Eugen Betke
Julian M Kunkel
2021-01-29
<p><strong>Every day, supercomputers execute thousands of jobs with different characteristics. Data centers monitor the behavior of jobs to support users and improve the infrastructure, for instance, by optimizing jobs or by determining guidelines for the next procurement. Classifying jobs into groups with similar run-time behavior aids this analysis, as it reduces the number of representative jobs that must be inspected.</strong></p>
<p><strong>This work uses machine learning techniques to cluster and classify parallel jobs based on the similarity of their temporal I/O behavior. Our contributions are the qualitative and quantitative evaluation of different I/O characterizations and similarity measures, and the development of a suitable clustering algorithm.</strong></p>
<p><strong>In the evaluation, we explore I/O characteristics from the monitoring data of one million parallel jobs and cluster them into groups of similar jobs. To this end, the time series of various I/O statistics are converted into features using different similarity metrics, which customize the classification.</strong></p>
<p><strong>General-purpose clustering techniques yield suboptimal results. Additionally, we extract phases of I/O activity from jobs. Finally, we simplify the grouping algorithm to improve performance and discuss the impact of these changes on the clustering quality.</strong></p>
https://doi.org/10.5281/zenodo.4478960
oai:zenodo.org:4478960
eng
Zenodo
https://zenodo.org/communities/jhps
https://doi.org/10.5281/zenodo.4478959
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Journal of High-Performance Storage, 1(1), (2021-01-29)
I/O fingerprinting
performance analysis
monitoring
Classifying Temporal Characteristics of Job I/O Using Machine Learning Techniques
info:eu-repo/semantics/article