Machine Learning for Predictive Capacity Planning: Evolution from Analytical Modeling to Autonomous Infrastructure
Description
As digital infrastructures expanded rapidly throughout the 2010s, the complexity of managing dynamic workloads, fluctuating user demand, and distributed computing environments exposed the limitations of traditional capacity planning. Reactive methods based on static thresholds, manual scaling, and retrospective performance analysis proved inadequate for hybrid and cloud-native systems that required elasticity, scalability, and near real-time decision-making. In response, machine learning (ML) enabled predictive capacity planning, in which models trained on historical utilization data, workload telemetry, and application metrics forecast resource needs before demand arrives. By integrating statistical time-series analysis, ensemble learning, and deep neural forecasting, organizations could automate capacity optimization, balancing cost, performance, and reliability. This article explores how ML-based forecasting reshaped infrastructure management, tracing its progression from analytical models to reinforcement learning frameworks that support self-healing, autonomous infrastructure planning across modern digital ecosystems.
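The shift described above, from reactive thresholds to forecasting resource needs ahead of demand, can be illustrated with a minimal sketch. It assumes Holt's linear exponential smoothing (a classical statistical time-series method covered in Hyndman & Athanasopoulos [2]) as the forecaster and a simple headroom rule for turning forecast load into an instance count. The function names, smoothing constants, 70% target utilization, per-instance capacity, and sample load series are all hypothetical illustrations, not the article's method or any cloud provider's API.

```python
import math
from typing import List, Sequence


def holt_forecast(history: Sequence[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Holt's linear (double) exponential smoothing forecast.

    alpha smooths the level and beta smooths the trend; both defaults are
    illustrative assumptions, not tuned values.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first observation and the trend with the first difference.
    level = float(history[0])
    trend = float(history[1] - history[0])
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extrapolates the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


def instances_needed(load: float, per_instance_capacity: float,
                     target_utilization: float = 0.7) -> int:
    """Smallest instance count keeping each instance at or below target_utilization."""
    usable = per_instance_capacity * target_utilization
    return max(1, math.ceil(load / usable))


def plan_capacity(history: Sequence[float], horizon: int,
                  per_instance_capacity: float,
                  target_utilization: float = 0.7) -> int:
    """Predictive plan: size for the peak of the forecast window, not the current load."""
    forecast = holt_forecast(history, horizon)
    return instances_needed(max(forecast), per_instance_capacity, target_utilization)


if __name__ == "__main__":
    # Hypothetical hourly request rates (requests/s) with a steady upward trend.
    hourly_load = [100, 110, 120, 130, 140]
    print("forecast:", holt_forecast(hourly_load, horizon=3))
    # Reactive rule: size on the latest observation only.
    print("reactive instances:", instances_needed(hourly_load[-1], per_instance_capacity=100))
    print("predictive instances:", plan_capacity(hourly_load, horizon=3, per_instance_capacity=100))
```

With this sample series the forecast peak over the next three steps is about 170 requests/s, so the predictive plan provisions 3 instances, whereas the reactive rule sized on the latest observation (140 requests/s) provisions 2. Sizing on the forecast peak rather than the current value is what allows capacity to be added before the demand arrives.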
Files
EJAET-6-10-84-90.pdf (499.9 kB, md5:aa58582a4cdd80af81c2bde530ccb105)
Additional details
References
- [1]. Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control (5th ed.). Wiley. https://doi.org/10.1002/9781118619193
- [2]. Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). OTexts. https://otexts.com/fpp2/
- [3]. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
- [4]. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 785–794. https://doi.org/10.1145/2939672.2939785
- [5]. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (NeurIPS), 27. https://doi.org/10.48550/arXiv.1409.3215
- [6]. Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2019). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3), 1181–1191. https://doi.org/10.1016/j.ijforecast.2019.07.001
- [7]. Menasce, D. A., & Almeida, V. A. F. (2002). Capacity Planning for Web Services: Metrics, Models, and Methods. Prentice Hall.
- [8]. Harchol-Balter, M. (2013). Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press. https://doi.org/10.1017/CBO9781139226424
- [9]. Lorido-Botran, T., Miguel-Alonso, J., & Lozano, J. A. (2014). A review of auto-scaling techniques for elastic applications in cloud environments. Journal of Grid Computing, 12(4), 559–592. https://doi.org/10.1007/s10723-014-9314-7
- [10]. Delimitrou, C., & Kozyrakis, C. (2014). Quasar: Resource-efficient and QoS-aware cluster management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 127–144. https://doi.org/10.1145/2541940.2541941
- [11]. Mao, H., Alizadeh, M., Menache, I., & Kandula, S. (2016). Resource management with deep reinforcement learning. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets), 50–56. https://doi.org/10.1145/3005745.3005750
- [12]. Mao, H., Schwarzkopf, M., Venkatakrishnan, S. B., Meng, Z., & Alizadeh, M. (2019). Learning scheduling algorithms for data processing clusters. Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 270–288. https://doi.org/10.1145/3341302.3342080
- [13]. Netflix Technology Blog. (2014). Scryer: Netflix's predictive auto-scaling engine. Retrieved from https://netflixtechblog.com/scryer-netflixs-predictive-auto-scaling-engine-3eec6f9b6d3a
- [14]. Amazon Web Services. (2018). Predictive scaling for EC2 Auto Scaling. AWS Compute Blog. Retrieved from https://aws.amazon.com/blogs/compute/introducing-predictive-scaling-for-ec2/
- [15]. Beyer, B., Jones, C., Petoff, J., & Murphy, N. (Eds.). (2016). Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media. https://sre.google/sre-book/table-of-contents/