Conference paper Open Access
Poon, Lex;
Farshidi, Siamak;
Li, Na;
Zhao, Zhiming
<?xml version='1.0' encoding='utf-8'?> <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd"> <identifier identifierType="URL">https://zenodo.org/record/5872438</identifier> <creators> <creator> <creatorName>Poon, Lex</creatorName> <givenName>Lex</givenName> <familyName>Poon</familyName> <affiliation>University of Amsterdam</affiliation> </creator> <creator> <creatorName>Farshidi, Siamak</creatorName> <givenName>Siamak</givenName> <familyName>Farshidi</familyName> <affiliation>University of Amsterdam</affiliation> </creator> <creator> <creatorName>Li, Na</creatorName> <givenName>Na</givenName> <familyName>Li</familyName> <affiliation>University of Amsterdam</affiliation> </creator> <creator> <creatorName>Zhao, Zhiming</creatorName> <givenName>Zhiming</givenName> <familyName>Zhao</familyName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-6717-9418</nameIdentifier> <affiliation>University of Amsterdam</affiliation> </creator> </creators> <titles> <title>Unsupervised Anomaly Detection in Data Quality Control</title> </titles> <publisher>Zenodo</publisher> <publicationYear>2021</publicationYear> <subjects> <subject>data quality</subject> <subject>unsupervised learning</subject> <subject>data quality control</subject> <subject>data quality assessment</subject> <subject>anomaly detection</subject> <subject>automated data quality control</subject> </subjects> <dates> <date dateType="Issued">2021-12-15</date> </dates> <resourceType resourceTypeGeneral="ConferencePaper"/> <alternateIdentifiers> <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/5872438</alternateIdentifier> </alternateIdentifiers> <relatedIdentifiers> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.1109/BigData52589.2021.9671672</relatedIdentifier> 
</relatedIdentifiers> <version>camera ready</version> <rightsList> <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights> <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights> </rightsList> <descriptions> <description descriptionType="Abstract"><p>Data is one of the most valuable assets of an organization and has a tremendous impact on its long-term success and decision-making processes. Typically, organizational data error and outlier detection processes are performed manually and reactively, making them time-consuming and prone to human error. Additionally, rich data types, unlabeled data, and increased volume have made such data more complex. Accordingly, an automated anomaly detection approach is required to improve data management and quality control processes. This study introduces an unsupervised anomaly detection approach based on model comparison, consensus learning, and a combination of rules of thumb with iterative hyper-parameter tuning to increase data quality. Furthermore, a domain expert is considered a human in the loop to evaluate and check the data quality and to judge the output of the unsupervised model. An experiment has been conducted to assess the proposed approach in the context of a case study. The experimental results confirm that the proposed approach can improve the quality of</p></description> </descriptions>
<fundingReferences> <fundingReference> <funderName>European Commission</funderName> <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100010661</funderIdentifier> <awardNumber awardURI="info:eu-repo/grantAgreement/EC/Horizon 2020 Framework Programme - European Training Networks/860627/">860627</awardNumber> <awardTitle>CLoud ARtificial Intelligence For pathologY</awardTitle> </fundingReference> <fundingReference> <funderName>European Commission</funderName> <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100010661</funderIdentifier> <awardNumber awardURI="info:eu-repo/grantAgreement/EC/H2020/862409/">862409</awardNumber> <awardTitle>Blue-Cloud: Piloting innovative services for Marine Research &amp; the Blue Economy</awardTitle> </fundingReference> <fundingReference> <funderName>European Commission</funderName> <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100010661</funderIdentifier> <awardNumber awardURI="info:eu-repo/grantAgreement/EC/H2020/825134/">825134</awardNumber> <awardTitle>smART socIal media eCOsytstem in a blockchaiN Federated environment</awardTitle> </fundingReference> <fundingReference> <funderName>European Commission</funderName> <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100010661</funderIdentifier> <awardNumber awardURI="info:eu-repo/grantAgreement/EC/H2020/824068/">824068</awardNumber> <awardTitle>ENVironmental Research Infrastructures building Fair services Accessible for society, Innovation and Research</awardTitle> </fundingReference> </fundingReferences> </resource>