Published October 30, 2021 | Version v1
Journal article Open

A Data-Centric Approach to Improve Machine Learning Model's Performance in Production

  • 1. B.Tech. Department of Computer Science & Engineering, Institute of Engineering & Management, Salt-Lake, Kolkata, India
  • 1. Publisher

Description

Machine learning teaches computers to think in a similar way to how humans do. ML models work by exploring data and identifying patterns with minimal human intervention. A supervised ML model learns to map an input to an output from labeled examples of input-output (X, y) pairs, whereas an unsupervised ML model discovers previously undetected patterns and information in unlabeled data. Because an ML project is a highly iterative process, the ML code/model and the datasets need to change repeatedly. When an ML model reaches 70-75% accuracy, the code or algorithm is most probably working fine. Nevertheless, in many cases, e.g., medical or spam detection models, 75% accuracy is too low to deploy in production. A medical model used in sensitive tasks such as detecting certain diseases must reach an accuracy level of 98-99%, which is a big challenge to achieve. In that scenario, we may already have a well-working model, so a model-centric approach may not help much in reaching the desired accuracy threshold. Improving the dataset, however, will improve the overall performance of the model. Improving the dataset does not always require adding more and more data. Improving the quality of the data by establishing a reasonable baseline level of performance, ensuring labeler consistency, and carrying out error analysis and performance auditing will substantially improve the model's accuracy. This review paper focuses on the data-centric approach to improving the performance of a production machine learning model.
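As a rough illustration of two of the data-quality checks named in the description, labeler consistency and error analysis, the following minimal Python sketch may help. It is not taken from the paper: the function names (`labeler_disagreements`, `cohen_kappa`, `accuracy_by_slice`), the use of Cohen's kappa as the agreement measure, and the toy spam/ham labels are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def labeler_disagreements(labels_a, labels_b):
    """Return the indices of examples that two labelers labeled differently."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelers must label the same examples")
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers (Cohen's kappa)."""
    n = len(labels_a)
    # Fraction of examples on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used a single identical label throughout
    return (observed - expected) / (1 - expected)


def accuracy_by_slice(y_true, y_pred, slices):
    """Error analysis: accuracy computed separately for each data slice."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, tag in zip(y_true, y_pred, slices):
        totals[tag] += 1
        hits[tag] += int(truth == pred)
    return {tag: hits[tag] / totals[tag] for tag in totals}


if __name__ == "__main__":
    # Hypothetical toy data: two labelers and one model on four messages.
    labeler_1 = ["spam", "ham", "spam", "ham"]
    labeler_2 = ["spam", "ham", "ham", "ham"]
    print(labeler_disagreements(labeler_1, labeler_2))
    print(cohen_kappa(labeler_1, labeler_2))
    print(accuracy_by_slice(labeler_1, labeler_2, ["short", "short", "long", "long"]))
```

Examples flagged by `labeler_disagreements` are candidates for relabeling under clearer instructions, and slices with low accuracy in `accuracy_by_slice` indicate where improving the data is likely to matter most.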

Files

A32011011121.pdf

Files (417.1 kB)

md5:62619022e56478ea2b862b6ada4b3173
417.1 kB

Additional details

Related works

Is cited by
Journal article: 2249-8958 (ISSN)

Subjects

ISSN
2249-8958
Retrieval Number
100.1/ijeat.A32011011121