Published October 10, 2023 | Version v1
Journal article Open

BIG DATA PREPROCESSING USING ENHANCED DATA QUALITY RULES DISCOVERY MODEL (EDQRM)

  • * Assistant Professor, Department of Computer Science, Sri Nehru Maha Vidyalaya College of Arts and Science, Coimbatore, Tamil Nadu
  • ** Assistant Professor, Department of Computer Science, Sri Krishna Adithya College of Arts and Science, Coimbatore, Tamil Nadu

Description

In the Big Data era, data is central to every governmental, institutional, and private organization. Efforts are directed toward extracting highly valuable insights, which is not possible when the data is of low quality. Data quality (DQ) is therefore considered a vital component of big data processing. At this stage, poor-quality data is kept from entering the Big Data value chain. This paper proposes the Enhanced Data Quality Rules Discovery Model (EDQRM) for quality assessment and Big Data pre-processing. EDQRM is a rules discovery model that improves pre-processing activities and targets them precisely, based on quality requirements. A set of pre-processing activities associated with data quality dimensions (DQDs) is defined to automate the EDQRM process. Rule optimization is applied to validated rules to avoid multi-pass pre-processing and to eliminate duplicate rules. The experiments conducted showed increased quality scores after the discovered and optimized EDQRM rules were applied to the data.
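The description names three mechanisms: rules that tie a pre-processing activity to a data quality dimension, an optimization step that drops duplicate rules, and a single pass over the data. The following Python sketch illustrates that idea only; it is not the authors' implementation. The `QualityRule` fields, the activity names in `ACTIVITIES`, and the `completeness` score are all hypothetical, chosen for illustration, since the record does not publish a rule schema or scoring formula.

```python
from dataclasses import dataclass


# Hypothetical rule shape (assumption): the record does not define a schema.
@dataclass(frozen=True)
class QualityRule:
    dimension: str  # data quality dimension, e.g. "completeness"
    column: str     # attribute the rule targets
    activity: str   # name of the pre-processing activity to run


# Illustrative activities (assumption); each maps a value to a cleaned value.
ACTIVITIES = {
    "trim": lambda v: v.strip() if isinstance(v, str) else v,
    "fill_missing": lambda v: "unknown" if v is None or v == "" else v,
}


def optimize(rules):
    """Drop duplicate rules while keeping first-seen order."""
    return list(dict.fromkeys(rules))


def apply_rules(rows, rules):
    """Apply every rule in a single pass over the rows."""
    by_column = {}
    for rule in rules:
        by_column.setdefault(rule.column, []).append(ACTIVITIES[rule.activity])
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for column, funcs in by_column.items():
            for func in funcs:
                new_row[column] = func(new_row.get(column))
        cleaned.append(new_row)
    return cleaned


def completeness(rows, column):
    """Fraction of rows with a non-empty value in the column."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


rules = [
    QualityRule("consistency", "city", "trim"),
    QualityRule("completeness", "city", "fill_missing"),
    QualityRule("consistency", "city", "trim"),  # duplicate, removed by optimize
]
rows = [{"city": " Coimbatore "}, {"city": None}, {"city": ""}]

optimized = optimize(rules)
cleaned = apply_rules(rows, optimized)
score_before = completeness(rows, "city")
score_after = completeness(cleaned, "city")
```

In this toy score an imputed placeholder counts as a filled value, which is a simplification; a real assessment would likely score imputed and observed values differently.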

Files

276.pdf (1.1 MB)
md5:cc078733e8ef6d986af58b485cef6bb9
