Conceptual Model and Framework for Collaborative Data Cleaning

Parulian, Nikolaus Nova; Ludäscher, Bertram

doi:10.5281/zenodo.6781134

Published June 13, 2022 | Version v1

Conference paper | Open access

1. University of Illinois at Urbana-Champaign

Data cleaning and preparation are essential parts of data curation lifecycles and scientific workflows. Exploratory data mining and data cleaning are also commonly said to take up 80% of the scientific research pipeline. However, a data cleaning task can be very tedious for a single user: it involves a great deal of exploration and iteration and is prone to error, especially when a curator finds various problems in the dataset. Single-user data cleaning can also introduce bias, because the cleaning quality will only be as good as that user's knowledge. A data cleaning task can therefore be assigned to multiple data curators who collaborate on curating the dataset. Involving multiple users, however, introduces new problems, such as disagreement over data changes and conflicting process dependencies. Understanding this variation in changes and analyzing the merging workflow is important for data curation, both to evolve the data cleaning workflow and to improve the dataset's quality. In line with the reusability theme of IDCC 2022, this approach can help improve the data curation pipeline by improving the data cleaning pipeline through collaboration.

Files

Name	Size
IDCC22_Parulian_ConceptualModel.pdf (md5:ec332c576d2469541f34172c911b243e)	289.2 kB
