Journal article Open Access

Classifying Temporal Characteristics of Job I/O Using Machine Learning Techniques

Eugen Betke; Julian M Kunkel

Every day, supercomputers execute thousands of jobs with different characteristics. Data centers monitor the behavior of jobs to support users and improve the infrastructure, for instance, by optimizing jobs or by determining guidelines for the next procurement. Classifying jobs into groups with similar run-time behavior aids this analysis because it reduces the number of representative jobs that must be inspected.

This work uses machine learning techniques to cluster and classify parallel jobs based on the similarity of their temporal I/O behavior. Our contribution is the qualitative and quantitative evaluation of different I/O characterizations and similarity measures, and the development of a suitable clustering algorithm.
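Clustering jobs by similarity needs a feature vector per job, a distance between vectors, and a rule for forming groups. As a minimal illustration only, the sketch below uses the classic single-pass leader algorithm with Euclidean distance; both choices, the function names, and the `threshold` parameter are assumptions for illustration and are not the clustering algorithm developed in this work.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def leader_cluster(
    jobs: List[Vector],
    threshold: float,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> List[int]:
    """Hypothetical leader clustering: each job joins the first existing
    representative within `threshold`; otherwise it becomes the
    representative of a new cluster. Returns one cluster label per job."""
    representatives: List[Vector] = []
    labels: List[int] = []
    for job in jobs:
        for cluster_id, rep in enumerate(representatives):
            if distance(job, rep) <= threshold:
                labels.append(cluster_id)
                break
        else:
            # No representative was close enough: open a new cluster.
            representatives.append(job)
            labels.append(len(representatives) - 1)
    return labels


if __name__ == "__main__":
    features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(leader_cluster(features, threshold=0.5))  # [0, 0, 1, 1]
```

Each job is compared only against the current representatives, so a single pass costs roughly jobs × clusters distance evaluations rather than all job pairs; the result, however, depends on the order in which jobs are visited.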

In the evaluation, we explore I/O characteristics from the monitoring data of one million parallel jobs and cluster them into groups of similar jobs. To this end, the time series of various I/O statistics are converted into features, using different similarity metrics that customize the classification.
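One simple way to turn an I/O time series into comparable features is to quantize each time segment into a discrete activity level and then compare jobs segment by segment. The sketch below is hypothetical: the threshold values (for example in MiB/s), the level encoding, and the rule that segments beyond the shorter job count as mismatches are all assumptions, not the characterizations or metrics evaluated in the article.

```python
from typing import List, Sequence


def quantize(series: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Map each sample to a discrete level: the number of (ascending)
    thresholds the sample reaches. 0 means idle or below the first threshold."""
    return [sum(value >= t for t in thresholds) for value in series]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of time segments with the same level. Segments beyond the
    end of the shorter job count as mismatches, so the result is in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


if __name__ == "__main__":
    thresholds = [1.0, 100.0]  # hypothetical level boundaries, e.g. MiB/s
    job_a = quantize([0.0, 50.0, 500.0, 0.0], thresholds)  # [0, 1, 2, 0]
    job_b = quantize([0.5, 80.0, 20.0], thresholds)        # [0, 1, 1]
    print(similarity(job_a, job_b))                        # 0.5
```

A distance such as `1 - similarity(a, b)` can then be fed to any threshold-based grouping; changing the thresholds or the comparison rule changes which jobs end up together, which is one way a metric "customizes" the classification.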

Using general-purpose clustering techniques yields suboptimal results. Additionally, we extract phases of I/O activity from jobs. Finally, we simplify the grouping algorithm to improve performance. We discuss the impact of these changes on the clustering quality.
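An I/O phase can be read as a maximal run of consecutive time segments with activity. The sketch below extracts such runs from one job's series; treating any value above a single `threshold` as "active" is a simplifying assumption for illustration, not necessarily how phases are defined in the article.

```python
from typing import List, Optional, Sequence, Tuple


def extract_phases(
    series: Sequence[float], threshold: float = 0.0
) -> List[Tuple[int, int]]:
    """Return (start, end) segment indices (end exclusive) of maximal runs
    of consecutive segments whose value exceeds `threshold`."""
    phases: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(series):
        active = value > threshold
        if active and start is None:
            start = i  # a new phase begins
        elif not active and start is not None:
            phases.append((start, i))  # the current phase ends before i
            start = None
    if start is not None:
        phases.append((start, len(series)))  # phase runs to the last segment
    return phases


if __name__ == "__main__":
    print(extract_phases([0, 3, 4, 0, 0, 7, 0, 2]))  # [(1, 3), (5, 6), (7, 8)]
```

Comparing jobs phase by phase, instead of over the full run time, makes the comparison less sensitive to idle periods before, between, or after I/O activity.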
