Big data quality framework: a holistic approach to continuous quality management
Journal of Big Data
  • Ikbal Taleb, Zayed University
  • Mohamed Adel Serhani, United Arab Emirates University
  • Chafik Bouhaddioui, United Arab Emirates University
  • Rachida Dssouli, Concordia University
Document Type
Article
Publication Date
5-29-2021
Abstract

Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data encompasses how data is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in Data Quality may result in unpredictable consequences, in which case confidence in the data and its source, and their perceived worthiness, are lost. In the Big Data context, data characteristics such as volume, multiple heterogeneous data sources, and fast data generation increase the risk of quality degradation and require efficient mechanisms to check data worthiness. However, ensuring Big Data Quality (BDQ) is a very costly and time-consuming process, since excessive computing resources are required. Maintaining quality throughout the Big Data lifecycle requires quality profiling and verification before any processing decision is made. This paper proposes a BDQ Management Framework for enhancing pre-processing activities while strengthening data control. The proposed framework uses a new concept called the Big Data Quality Profile, which captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the framework's Big Data profiling and sampling components, a faster and more efficient data quality estimation is initiated before and after an intermediate pre-processing phase. The exploratory profiling component plays an initial role in quality profiling; it uses a set of predefined quality metrics to evaluate important data quality dimensions. It generates quality rules by applying various pre-processing activities and their related functions. These rules mainly target the Data Quality Profile and result in quality scores for the selected quality attributes.
The framework implementation and dataflow management across various quality management processes are discussed. The paper concludes with ongoing work on framework evaluation and deployment to support quality evaluation decisions.
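
The abstract describes estimating quality scores from a sample rather than from the full dataset. The following is a minimal sketch of that general idea and not the authors' implementation. It assumes completeness (the share of non-missing values) as one example quality dimension and uses simple random sampling; the function names `completeness` and `estimate_quality`, the record layout, and the choice of metric are all hypothetical and do not come from the paper.

```python
import random


def completeness(records, fields):
    """Return the fraction of non-missing values across the given fields.

    A value counts as missing if the key is absent, None, or an empty string.
    (Assumed definition, chosen for illustration only.)
    """
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing to check, so nothing is missing
    present = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return present / total


def estimate_quality(records, fields, sample_size, seed=0):
    """Estimate a completeness score from a simple random sample of records."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    k = min(sample_size, len(records))
    sample = rng.sample(records, k)
    return {"completeness": completeness(sample, fields), "sample_size": k}


if __name__ == "__main__":
    # Hypothetical data: every fourth record is missing its "age" value.
    data = [
        {"id": i, "age": None if i % 4 == 0 else 30}
        for i in range(10_000)
    ]
    print(estimate_quality(data, ["id", "age"], sample_size=500))
```

Scoring a sample trades some accuracy for a much smaller amount of data to scan; a larger `sample_size` narrows the gap between the estimate and the score computed on the full dataset.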

Keywords
  • Big data quality
  • Data quality profile
  • Pre-processing
  • Quality assessment
  • Quality metrics and scores
Scopus ID

85107381727

Creative Commons License
Creative Commons Attribution 4.0 International
Indexed in Scopus
Yes
Open Access
Yes
Open Access Type
Gold: This publication is openly available in an open access journal/series
Citation Information
Ikbal Taleb, Mohamed Adel Serhani, Chafik Bouhaddioui and Rachida Dssouli. "Big data quality framework: a holistic approach to continuous quality management" Journal of Big Data Vol. 8 (2021) ISSN: 2196-1115
Available at: http://works.bepress.com/ikbal-taleb/1/