
Published July 12, 2023 | Version v1
Journal article | Open Access

Training Data Alchemy: Balancing Quality and Quantity in Machine Learning Training

  • 1. Associate Professor, Ashoka Women's Engineering College, Kurnool
  • 2. Student, Ashoka Women's Engineering College, Kurnool
  • 3. Assistant Professor, Ashoka Women's Engineering College, Kurnool
  • 4. Professor, Ashoka Women's Engineering College, Kurnool
  • 5. Student, Alliance University, Bangalore
  • 6. Student, G. Pullaiah College of Engineering and Technology, Kurnool

Description

Determining the optimal amount of training data for machine learning algorithms is critical to building successful and accurate models. This paper surveys the research surrounding this question and provides insight into the factors that determine how much training data is required for effective machine learning. It explores the balance between data quality and quantity, the concept of overfitting, and the importance of representative and diverse datasets. It also discusses techniques for estimating the minimum amount of training data needed to reach a desired level of performance. By understanding how training data size affects model performance, researchers and practitioners can make informed decisions when selecting training datasets, maximizing the efficiency and effectiveness of machine learning algorithms.
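The description mentions techniques for estimating the minimum training data needed. One widely used approach (offered here as an illustration, not as the specific method from this paper) is the learning curve: train the model on progressively larger subsets and watch where validation performance plateaus. Below is a minimal scikit-learn sketch; the dataset and model are placeholders.

```python
# Learning-curve sketch (illustrative only; the dataset and model are
# placeholders, not taken from the paper).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

# Score the model with 5-fold cross-validation at 10 training-set sizes,
# from 10% to 100% of the available data.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring="accuracy",
)

for n, tr, va in zip(train_sizes,
                     train_scores.mean(axis=1),
                     val_scores.mean(axis=1)):
    print(f"n={n:5d}  train acc={tr:.3f}  val acc={va:.3f}")
```

When the validation curve flattens and the train/validation gap is small, additional data is unlikely to help much; a persistent gap instead suggests overfitting and a need for more, or more diverse, training data.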

Files

Training Data Alchemy -Formatted Paper.pdf (323.8 kB)
md5:27de4932018e76b974ae2ab35b911f99
