Conference paper Open Access
Poon, Lex;
Farshidi, Siamak;
Li, Na;
Zhao, Zhiming
Unsupervised Anomaly Detection in Data Quality Control

Published: 2021-12-15
Version: camera ready
DOI: https://doi.org/10.1109/BigData52589.2021.9671672
Record: https://zenodo.org/record/5872438
License: Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode)
Affiliation (all authors): University of Amsterdam
ORCID (Zhao, Zhiming): https://orcid.org/0000-0002-6717-9418
Presented at: 7th International Workshop on Methods to Improve Big Data Science Projects (MIDP-2021), in IEEE BigData 2021, Virtual (http://www.midp-info.org/)
Keywords: data quality, unsupervised learning, data quality control, data quality assessment, anomaly detection, automated data quality control

Abstract

Data is one of the most valuable assets of an organization and has a tremendous impact on its long-term success and decision-making processes. Typically, organizational data error and outlier detection processes are performed manually and reactively, making them time-consuming and prone to human error. Additionally, rich data types, unlabeled data, and increased volume have made such data more complex. Accordingly, an automated anomaly detection approach is required to improve data management and quality control processes. This study introduces an unsupervised anomaly detection approach based on model comparison, consensus learning, and a combination of rules of thumb with iterative hyper-parameter tuning to increase data quality. Furthermore, a domain expert acts as a human in the loop to evaluate and check the data quality and to judge the output of the unsupervised model. An experiment has been conducted to assess the proposed approach in the context of a case study. The experiment results confirm that the proposed approach can improve the quality of
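The abstract names consensus learning combined with rules of thumb but does not spell out the mechanics. As a rough illustration of the general idea only, and not the authors' method, the sketch below lets three common rule-of-thumb outlier tests vote on each value and flags a value when at least two agree. The function names, the choice of detectors, the thresholds, and the sample data are all assumptions made for this example.

```python
from statistics import mean, median, pstdev, quantiles


def zscore_flags(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag values outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def mad_flags(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median absolute
    deviation) exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def consensus_anomalies(values, min_votes=2):
    """Return indices flagged by at least `min_votes` of the three detectors.

    This is a simple majority vote, chosen for illustration; the paper's
    actual consensus scheme may differ.
    """
    votes = [zscore_flags(values), iqr_flags(values), mad_flags(values)]
    return [i for i, flags in enumerate(zip(*votes)) if sum(flags) >= min_votes]


if __name__ == "__main__":
    # Hypothetical sensor readings with one obviously bad entry at the end.
    readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 55.0]
    print(consensus_anomalies(readings))
```

Requiring agreement between detectors is one way to reduce false alarms from any single rule; the flagged indices would then go to a domain expert for review, in the spirit of the human-in-the-loop step the abstract describes.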