Published October 31, 2024 | Version v1
Journal article · Open access

Framework for Implementing Experiment Tracking in Machine Learning Development

Authors/Creators

Description

Machine learning (ML) projects often involve numerous experiments that must be tracked, compared, and reproduced to ensure consistent results and effective collaboration. This paper explores the significance of experiment tracking in ML workflows, discusses best practices, and addresses challenges in implementation. We present a comprehensive framework for experiment tracking that enhances reproducibility, accountability, and collaboration within ML teams. The paper emphasizes how systematic tracking can optimize workflows, accelerate model development, and improve the overall quality of machine learning projects.
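The paper's framework itself is not reproduced on this page, but the core idea it describes (recording each run's hyperparameters and metrics under a stable identifier so runs can be compared and reproduced) can be sketched in a few lines of Python. The class name, file layout, and hashing scheme below are illustrative assumptions, not details from the paper:

```python
import hashlib
import json
import time
from pathlib import Path


class ExperimentTracker:
    """Minimal file-based experiment tracker: records each run's
    hyperparameters, metrics, and timestamp as a JSON file so runs
    can be listed, compared, and reproduced later."""

    def __init__(self, root="runs"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params, metrics):
        # Derive a stable run ID from the hyperparameters, so identical
        # configurations map to the same ID across re-runs.
        payload = json.dumps(params, sort_keys=True)
        run_id = hashlib.md5(payload.encode()).hexdigest()[:8]
        record = {
            "run_id": run_id,
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        }
        (self.root / f"{run_id}.json").write_text(json.dumps(record, indent=2))
        return run_id

    def best_run(self, metric, maximize=True):
        # Load every recorded run and rank on a single metric.
        records = [json.loads(p.read_text()) for p in self.root.glob("*.json")]
        key = lambda r: r["metrics"][metric]
        return max(records, key=key) if maximize else min(records, key=key)
```

For example, logging two runs with different learning rates and then calling `best_run("accuracy")` returns the record of the higher-scoring run. Production systems such as MLflow [15] or Weights & Biases [21] add storage backends, UIs, and artifact versioning on top of essentially this pattern.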

Files

JSAER2024-11-10-118-123.pdf (147.4 kB, md5:b66dc0516f8ce2f6448af4bc133a8cc8)
Additional details

References

  • [1] D. Sculley et al., "Hidden technical debt in machine learning systems," in Advances in Neural Information Processing Systems, 2015, pp. 2503-2511.
  • [2] S. Amershi et al., "Software engineering for machine learning: A case study," in Proc. 41st Int. Conf. on Software Engineering: Software Engineering in Practice, 2019, pp. 291-300.
  • [3] M. Vartak et al., "ModelDB: A system for machine learning model management," in Proc. Workshop on Human-In-the-Loop Data Analytics, 2016, pp. 1-3.
  • [4] K. Greff et al., "The Sacred infrastructure for computational research," in Proc. Python in Science Conf. (SciPy), 2017.
  • [5] S. Schelter et al., "On challenges in machine learning model management," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 41, no. 4, 2018.
  • [6] H. Miao et al., "ModelHub: Deep learning lifecycle management," in 2017 IEEE 33rd Int. Conf. on Data Engineering (ICDE), 2017, pp. 1393-1394.
  • [7] D. Baylor et al., "TFX: A TensorFlow-based production-scale machine learning platform," in Proc. 23rd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2017, pp. 1387-1395.
  • [8] S. Idowu et al., "Machine learning experiment management: A systematic review," arXiv preprint arXiv:2101.00068, 2021.
  • [9] Z. C. Lipton and J. Steinhardt, "Troubling trends in machine learning scholarship," Queue, vol. 17, no. 1, pp. 45-77, 2019.
  • [10] H. Harutyunyan et al., "Improving exclusivity and independence in multiclass classification," in Int. Conf. on Machine Learning, 2020, pp. 4100-4110.
  • [11] A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
  • [12] J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 4171-4186.
  • [13] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Advances in Neural Information Processing Systems, 2017, pp. 4765-4774.
  • [14] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
  • [15] M. Zaharia et al., "Accelerating the machine learning lifecycle with MLflow," IEEE Data Eng. Bull., vol. 41, no. 4, pp. 39-45, 2018.
  • [16] C. Renggli et al., "Continuous integration of machine learning models with ease.ml/ci: Towards a rigorous yet practical treatment," in Proc. 2nd SysML Conf., 2019.
  • [17] E. Breck et al., "The ML test score: A rubric for ML production readiness and technical debt reduction," in 2017 IEEE Int. Conf. on Big Data (Big Data), 2017, pp. 1123-1132.
  • [18] A. Tsymbal, "The problem of concept drift: Definitions and related work," Computer Science Department, Trinity College Dublin, vol. 106, no. 2, p. 58, 2004.
  • [19] M. Feurer et al., "Efficient and robust automated machine learning," in Advances in Neural Information Processing Systems, 2015, pp. 2962-2970.
  • [20] D. Golovin et al., "Google Vizier: A service for black-box optimization," in Proc. 23rd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2017, pp. 1487-1495.
  • [21] L. Biewald, "Experiment tracking with Weights and Biases," 2022. [Online]. Available: https://www.wandb.com/
  • [22] N. Polyzotis et al., "Data lifecycle challenges in production machine learning: A survey," ACM SIGMOD Record, vol. 47, no. 2, pp. 17-28, 2018.