Conference paper Open Access
Poon, Lex;
Farshidi, Siamak;
Li, Na;
Zhao, Zhiming
{ "DOI": "10.1109/BigData52589.2021.9671672", "title": "Unsupervised Anomaly Detection in Data Quality Control", "issued": { "date-parts": [ [ 2021, 12, 15 ] ] }, "abstract": "Data is one of the most valuable assets of an organization and has a tremendous impact on its long-term success and decision-making processes. Typically, organizational data error and outlier detection processes perform manually and reactively, making them time-consuming and prone to human errors. Additionally, rich data types, unlabeled data, and increased volume have made such data more complex. Accordingly, an automated anomaly detection approach is required to improve data management and quality control processes. This study introduces an unsupervised anomaly detection approach based on models comparison, consensus learning, and a combination of rules of thumb with iterative hyper-parameter tuning to increase data quality. Furthermore, a domain expert is considered a human in the loop to evaluate and check the data quality and to judge the output of the unsupervised model. An experiment has been conducted to assess the proposed approach in the context of a case study. The experiment results confirm that the proposed approach can improve the quality of", "author": [ { "family": "Poon, Lex" }, { "family": "Farshidi, Siamak" }, { "family": "Li, Na" }, { "family": "Zhao, Zhiming" } ], "id": "5872438", "event-place": "Virtual", "version": "camera ready", "type": "paper-conference", "event": "7th International Workshop on Methods to Improve Big Data Science Projects (MIDP-2021), in IEEE BigData 2021 (MIDP-2021)" }