Conference paper Open Access

Unsupervised Anomaly Detection in Data Quality Control

Poon, Lex; Farshidi, Siamak; Li, Na; Zhao, Zhiming


Citation Style Language JSON Export

{
  "DOI": "10.1109/BigData52589.2021.9671672", 
  "title": "Unsupervised Anomaly Detection in Data Quality Control", 
  "issued": {
    "date-parts": [
      [
        2021, 
        12, 
        15
      ]
    ]
  }, 
  "abstract": "<p>Data is one of the most valuable assets of an organization and has a tremendous impact on its long-term success and decision-making processes. Typically, organizational data error and outlier detection processes are performed manually and reactively, making them time-consuming and prone to human errors. Additionally, rich data types, unlabeled data, and increased volume have made such data more complex. Accordingly, an automated anomaly detection approach is required to improve data management and quality control processes. This study introduces an unsupervised anomaly detection approach based on model comparison, consensus learning, and a combination of rules of thumb with iterative hyper-parameter tuning to increase data quality. Furthermore, a domain expert is considered a human in the loop to evaluate and check the data quality and to judge the output of the unsupervised model. An experiment has been conducted to assess the proposed approach in the context of a case study. The experiment results confirm that the proposed approach can improve the quality of</p>", 
  "author": [
    {
      "family": "Poon", 
      "given": "Lex"
    }, 
    {
      "family": "Farshidi", 
      "given": "Siamak"
    }, 
    {
      "family": "Li", 
      "given": "Na"
    }, 
    {
      "family": "Zhao", 
      "given": "Zhiming"
    }
  ], 
  "id": "5872438", 
  "event-place": "Virtual", 
  "version": "camera ready", 
  "type": "paper-conference", 
  "event": "7th International Workshop on Methods to Improve Big Data Science Projects (MIDP-2021), in IEEE BigData 2021"
}
Views 35
Downloads 46
Data volume 133.4 MB
Unique views 28
Unique downloads 45
