A Data-Centric Approach to Improve Machine Learning Model's Performance in Production
Creators
- 1. B.Tech. Department of Computer Science & Engineering, Institute of Engineering & Management, Salt-Lake, Kolkata, India
Contributors
- 1. Publisher
Description
Machine learning teaches computers to think in a similar way to how humans do. An ML models work by exploring data and identifying patterns with minimal human intervention. A supervised ML model learns by mapping an input to an output based on labeled examples of input-output (X, y) pairs. Moreover, an unsupervised ML model works by discovering patterns and information that was previously undetected from unlabelled data. As an ML project is an extensively iterative process, there is always a need to change the ML code/model and datasets. However, when an ML model achieves 70-75% of accuracy, then the code or algorithm most probably works fine. Nevertheless, in many cases, e.g., medical or spam detection models, 75% accuracy is too low to deploy in production. A medical model used in susceptible tasks such as detecting certain diseases must have an accuracy label of 98-99%. Furthermore, that is a big challenge to achieve. In that scenario, we may have a good working model, so a model-centric approach may not help much achieve the desired accuracy threshold. However, improving the dataset will improve the overall performance of the model. Improving the dataset does not always require bringing more and more data into the dataset. Improving the quality of the data by establishing a reasonable baseline level of performance, labeler consistency, error analysis, and performance auditing will thoroughly improve the model's accuracy. This review paper focuses on the data-centric approach to improve the performance of a production machine learning model.
Files
A32011011121.pdf
Files
(417.1 kB)
Name | Size | Download all |
---|---|---|
md5:62619022e56478ea2b862b6ada4b3173
|
417.1 kB | Preview Download |
Additional details
Related works
- Is cited by
- Journal article: 2249-8958 (ISSN)
Subjects
- ISSN
- 2249-8958
- Retrieval Number
- 100.1/ijeat.A32011011121