Published July 12, 2023 | Version v1
Poster Open

Data-centric ML pipeline for data drift and data preprocessing

Creators

  • 1. Arm Ltd

Description

The main MLOps challenges in hardware verification originate from severe data heterogeneity and frequent data drift in both feature and type spaces. This study proposes a multi-purpose data schema, inferred in a bottom-up fashion, that can be used for data monitoring, type casting, and preprocessing. The approach provides a data ingestion step in an ML pipeline that increases transparency and flexibility in data preprocessing. Using this flexibility, we also demonstrate that tuning the data preprocessing can further improve model performance, which emphasizes the importance of data handling and data quality in building ML products.
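As a rough illustration of how a schema inferred bottom-up from the data could serve both monitoring and type casting, here is a minimal Python sketch. It is not the implementation from the poster: the function names (`infer_type`, `infer_schema`, `detect_drift`, `cast_record`), the coarse type labels, and the sample feature names are all hypothetical, and the majority-vote inference rule is an assumption chosen for brevity.

```python
from collections import Counter

def infer_type(value):
    """Return a coarse type label for one raw value (labels are hypothetical)."""
    if value is None or value == "":
        return "missing"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    try:
        float(value)
        return "number"
    except (TypeError, ValueError):
        return "string"

def infer_schema(records):
    """Infer a schema bottom-up: the most common non-missing type per feature."""
    counts = {}
    for record in records:
        for name, value in record.items():
            counts.setdefault(name, Counter())[infer_type(value)] += 1
    schema = {}
    for name, counter in counts.items():
        observed = {t: n for t, n in counter.items() if t != "missing"}
        schema[name] = max(observed, key=observed.get) if observed else "missing"
    return schema

def detect_drift(reference, current):
    """Compare two schemas and report drift in feature space and type space."""
    shared = set(reference) & set(current)
    return {
        "added": sorted(set(current) - set(reference)),
        "removed": sorted(set(reference) - set(current)),
        "type_changed": sorted(n for n in shared if reference[n] != current[n]),
    }

def cast_record(record, schema):
    """Cast one record to the schema; absent or uncastable values become None."""
    out = {}
    for name, dtype in schema.items():
        value = record.get(name)
        if infer_type(value) == "missing":
            out[name] = None
        elif dtype == "number":
            try:
                out[name] = float(value)
            except (TypeError, ValueError):
                out[name] = None
        elif dtype == "bool":
            out[name] = bool(value)
        else:
            out[name] = str(value)
    return out

# Made-up batches standing in for two data deliveries (assumption, not real data).
old_batch = [
    {"seed": "12", "test_name": "alu_smoke", "cycles": "1000"},
    {"seed": "7", "test_name": "fpu_rand", "cycles": "2500"},
]
new_batch = [
    {"seed": "0x1f", "test_name": "alu_smoke", "coverage": "0.82"},
    {"seed": "0x2a", "test_name": "lsu_rand", "coverage": "0.91"},
]

reference_schema = infer_schema(old_batch)
current_schema = infer_schema(new_batch)
drift_report = detect_drift(reference_schema, current_schema)
casted = cast_record(new_batch[0], reference_schema)
```

In this sketch the same inferred schema is reused for two purposes: comparing it across batches surfaces added or removed features and changed types, and applying it to a record gives an explicit, inspectable casting step at ingestion time.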

Files

SciPy_2023_Hongsup_Shin.pdf

Files (410.7 kB)

SciPy_2023_Hongsup_Shin.pdf (410.7 kB)
md5:8bf7f93762eea0e4569cabba66654eb2