Topologically Aware Job Scheduling for SLURM
Description
SLURM is a popular resource management system that is used on many supercomputers in the TOP500 list. In this
work, we describe AUCSCHED3, a new SLURM scheduler plug-in that extends our earlier AUCSCHED2 plug-in
with topologically aware mapping of jobs onto hierarchically interconnected systems such as trees or fat
trees. Our approach builds on the auction-based scheduling algorithm of AUCSCHED2 and generates bids
for topologically good mappings of jobs onto the resources. Job priorities are also adjusted slightly, without
changing the original priority ordering of the jobs, so as to favour topologically better candidate mappings. SLURM emulation
results are presented for a heterogeneous 1024-node system with 16 cores and 3 GPUs per node. The
results show that our heuristic generates better topological mappings than SLURM/Backfill. AUCSCHED3 is available
at http://code.google.com/p/slurm-ipsched/.
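
The idea of nudging priorities toward topologically better mappings without reordering jobs can be illustrated with a minimal sketch. This is not the AUCSCHED3 implementation: the leaf-switch size `NODES_PER_SWITCH`, the function names, and the bonus formula are all assumptions made for illustration. It assumes integer job priorities and a two-level tree in which consecutive node indices share a leaf switch, and it adds a bonus of at most 0.5 to each candidate mapping, so a lower-priority job can never overtake a higher-priority one.

```python
from math import ceil

# Assumed leaf-switch size for a two-level tree; not taken from the paper.
NODES_PER_SWITCH = 16


def switches_spanned(nodes):
    """Count the distinct leaf switches touched by a set of node indices."""
    return len({n // NODES_PER_SWITCH for n in nodes})


def adjusted_priority(priority, nodes):
    """Return the integer priority plus a compactness bonus in (0, 0.5].

    The bonus is largest when the mapping uses the fewest leaf switches
    possible for its node count. Because it never reaches 1, the ordering
    of jobs with distinct integer priorities is unchanged.
    """
    minimal = ceil(len(nodes) / NODES_PER_SWITCH)
    return priority + 0.5 * minimal / switches_spanned(nodes)


# Two hypothetical 32-node candidate mappings.
compact = list(range(32))             # fills exactly 2 leaf switches
scattered = list(range(0, 512, 16))   # one node on each of 32 switches

# Within one job, the compact candidate is favoured.
better = adjusted_priority(10, compact) > adjusted_priority(10, scattered)

# Across jobs, the higher-priority job still wins even with a poor mapping.
order_kept = adjusted_priority(11, scattered) > adjusted_priority(10, compact)
```

In this sketch the bonus only breaks ties between candidate mappings of the same job; any scheme whose adjustment stays below the smallest gap between job priorities would have the same order-preserving property.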
Files
Name | Size | MD5 |
---|---|---|
WP180.pdf | 493.9 kB | b16059fa6be2a0b4e97aa85aed35ff52 |