Conference paper Open Access

Unsupervised Anomaly Detection in Data Quality Control

Poon, Lex; Farshidi, Siamak; Li, Na; Zhao, Zhiming


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Poon, Lex</dc:creator>
  <dc:creator>Farshidi, Siamak</dc:creator>
  <dc:creator>Li, Na</dc:creator>
  <dc:creator>Zhao, Zhiming</dc:creator>
  <dc:date>2021-12-15</dc:date>
  <dc:description>Data is one of the most valuable assets of an
organization and has a tremendous impact on its long-term
success and decision-making processes. Typically, organizational
data error and outlier detection processes are performed manually and
reactively, making them time-consuming and prone to human error.
Additionally, rich data types, unlabeled data, and increased
volume have made such data more complex. Accordingly, an
automated anomaly detection approach is required to improve
data management and quality control processes. This study
introduces an unsupervised anomaly detection approach based
on model comparison, consensus learning, and a combination of
rules of thumb with iterative hyper-parameter tuning to increase
data quality. Furthermore, a domain expert serves as a
human in the loop to evaluate and check the data quality and to
judge the output of the unsupervised model. An experiment has
been conducted to assess the proposed approach in the context of
a case study. The experimental results confirm that the proposed
approach can improve the quality of</dc:description>
  <dc:identifier>https://zenodo.org/record/5872438</dc:identifier>
  <dc:identifier>10.1109/BigData52589.2021.9671672</dc:identifier>
  <dc:identifier>oai:zenodo.org:5872438</dc:identifier>
  <dc:relation>info:eu-repo/grantAgreement/EC/Horizon 2020 Framework Programme - European Training Networks/860627/</dc:relation>
  <dc:relation>info:eu-repo/grantAgreement/EC/H2020/862409/</dc:relation>
  <dc:relation>info:eu-repo/grantAgreement/EC/H2020/825134/</dc:relation>
  <dc:relation>info:eu-repo/grantAgreement/EC/H2020/824068/</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>data quality</dc:subject>
  <dc:subject>unsupervised learning</dc:subject>
  <dc:subject>data quality control</dc:subject>
  <dc:subject>data quality assessment</dc:subject>
  <dc:subject>anomaly detection</dc:subject>
  <dc:subject>automated data quality control</dc:subject>
  <dc:title>Unsupervised Anomaly Detection in Data Quality Control</dc:title>
  <dc:type>info:eu-repo/semantics/conferencePaper</dc:type>
  <dc:type>publication-conferencepaper</dc:type>
</oai_dc:dc>
Views 35
Downloads 46
Data volume 133.4 MB
Unique views 28
Unique downloads 45
