Published 2025 | Version v4
Journal article Open

Scalability and Maintainability Challenges and Solutions in Machine Learning: A Systematic Literature Review

  • 1. University of Oslo, Norway

Description

Background: The rapid advancement of Machine Learning (ML) across various domains has led to its widespread adoption in academia and industry. However, ML systems present unique maintainability and scalability challenges not encountered in conventional software projects. Understanding these challenges and their possible solutions is crucial for ensuring long-term value and preventing ML performance decline.

Objective: This research aims to identify and consolidate the maintainability and scalability challenges and solutions at different stages of the ML workflow, discern the interdependencies between these stages, and identify potential tradeoffs and strategies for overcoming these challenges.

Methodology: We conducted a rigorous and comprehensive systematic literature review, initially screening over 17,000 papers and subsequently selecting and reviewing 124 papers for inclusion in this study.

Contributions: Our study (i) presents a catalogue of maintainability and scalability challenges and solutions in various stages of the Data Engineering and Model Engineering workflows, as well as difficulties in building ML systems or applications in the current ecosystem of frameworks and tools; (ii) identifies and consolidates 41 maintainability challenges and 13 scalability challenges, with some potential solutions, tools, and recommendations; and (iii) synthesises a list of tradeoffs and strategies for striking a balance in achieving maintainability and scalability.

Conclusions: This study can help practitioners and organizations better understand the maintainability and scalability challenges and available solutions. This knowledge will help them avoid potential pitfalls and develop maintainable and scalable ML systems and applications. Moreover, the identified tradeoffs and challenges can serve as a foundation for future research, furthering the understanding of ML system maintainability and scalability at different stages of the ML workflow.

Files

SLR Coding Report.pdf