Training Data Alchemy: Balancing Quality and Quantity in Machine Learning Training
Creators
- 1. Associate Professor, Ashoka Women's Engineering College, Kurnool
- 2. Student, Ashoka Women's Engineering College, Kurnool
- 3. Assistant Professor, Ashoka Women's Engineering College, Kurnool
- 4. Professor, Ashoka Women's Engineering College, Kurnool
- 5. Student, Alliance University, Bangalore
- 6. Student, G. Pullaiah College of Engineering and Technology, Kurnool
Description
Determining the optimal amount of training data for a machine learning algorithm is critical to building successful, accurate models. This abstract surveys the research surrounding this question and identifies the factors that determine how much training data is required for effective learning. It examines the balance between data quality and quantity, the problem of overfitting, and the importance of representative and diverse datasets. It also discusses the techniques and approaches used to estimate the minimum amount of training data needed to reach a desired level of performance. By understanding how training data size affects model performance, researchers and practitioners can make informed decisions when selecting training datasets, thereby maximizing the efficiency and effectiveness of machine learning algorithms.
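One common way to estimate the minimum training data required, as the abstract describes, is to plot a learning curve: train the model on increasingly large subsets and watch where validation performance plateaus. The sketch below is a minimal illustration of this idea, assuming scikit-learn is available; the synthetic dataset and logistic-regression model are stand-ins, not the paper's own experimental setup.

```python
# Sketch: estimating data requirements with a learning curve.
# Assumption: scikit-learn is installed; the dataset here is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a real task's dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Evaluate validation accuracy at 5 training-set sizes (10% to 100%).
train_sizes, _, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

# Where the mean validation score flattens out, adding more data
# yields diminishing returns -- a practical "enough data" estimate.
mean_val = val_scores.mean(axis=1)
for n, s in zip(train_sizes, mean_val):
    print(f"{n:5d} training samples -> validation accuracy {s:.3f}")
```

In practice the same curve is drawn for the real model and dataset; if the curve is still rising at the largest available size, more data is likely to help, whereas a flat curve points toward improving data quality or the model instead.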
Files
- Training Data Alchemy -Formatted Paper.pdf (323.8 kB)
  md5:27de4932018e76b974ae2ab35b911f99
Additional details
References
- 1. Halevy, A., Norvig, P., & Pereira, F. (2009). The Unreasonable Effectiveness of Data. IEEE Intelligent Systems, 24(2), 8-12.
- 2. Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G. S., & Ng, A. (2012). Building High-Level Features Using Large-Scale Unsupervised Learning. In Proceedings of the 29th International Conference on Machine Learning (ICML-12) (pp. 1025-1032).
- 3. Rudin, C. (2019). The Mythos of Model Interpretability. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(3), 1019-1047.
- 4. He, H., Bai, Y., Garcia, E. A., & Li, S. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284.
- 5. Bengio, Y. (2012). Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade (pp. 437-478).