Published August 31, 2014 | Version v1
Working paper Open

Topologically Aware Job Scheduling for SLURM

Creators

  • 1. Computer Engineering Department, Bogazici University, Istanbul, Turkey

Description

SLURM is a popular resource management system used on many supercomputers in the TOP500 list. In this
work, we describe AUCSCHED3, our new SLURM scheduler plug-in, which extends our earlier AUCSCHED2 plug-in
with the ability to map jobs in a topologically aware manner onto hierarchically interconnected systems such as trees
or fat trees. Our approach builds on the auction-based scheduling algorithm of AUCSCHED2 and generates bids
for topologically good mappings of jobs onto resources. Job priorities are also adjusted slightly, without
changing the original priority ordering of jobs, so as to favour topologically better candidate mappings. SLURM emulation
results are presented for a heterogeneous 1024-node system with 16 cores and 3 GPUs on each node. The
results show that our heuristic generates better topological mappings than SLURM/Backfill. AUCSCHED3 is available
at http://code.google.com/p/slurm-ipsched/.
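
The idea of favouring topologically better mappings without disturbing the priority order can be illustrated with a small sketch. This is not the AUCSCHED3 implementation: the leaf-switch size (NODES_PER_SWITCH), the quality score, the function names and the 0.5 bonus factor are all assumptions made for illustration. The sketch scores a candidate mapping by how few leaf switches it spans and adds a bonus that is strictly smaller than the smallest gap between distinct job priorities, so a bid of a higher-priority job always outranks any bid of a lower-priority job.

```python
from math import ceil

NODES_PER_SWITCH = 32  # hypothetical leaf-switch size, not taken from the paper


def switches_spanned(nodes, nodes_per_switch=NODES_PER_SWITCH):
    """Count the distinct leaf switches touched by a candidate mapping."""
    return len({n // nodes_per_switch for n in nodes})


def topology_quality(nodes, nodes_per_switch=NODES_PER_SWITCH):
    """Score in (0, 1]; 1.0 means the mapping uses the fewest possible switches."""
    fewest = ceil(len(nodes) / nodes_per_switch)
    return fewest / switches_spanned(nodes, nodes_per_switch)


def bid_value(priority, nodes, min_gap=1.0):
    """Job priority plus a bonus of at most half the smallest priority gap.

    Assumes distinct job priorities differ by at least min_gap, so the
    bonus can reorder candidate mappings of one job but never two jobs.
    """
    return priority + 0.5 * min_gap * topology_quality(nodes)


compact = list(range(8))                  # 8 nodes under one leaf switch
scattered = [i * 32 for i in range(8)]    # 8 nodes under 8 different switches

print(bid_value(100, compact))    # compact mapping of the priority-100 job
print(bid_value(100, scattered))  # scattered mapping of the same job
print(bid_value(101, scattered))  # scattered mapping of a higher-priority job
```

In this sketch the compact mapping of a job always bids higher than a scattered mapping of the same job, while even the worst mapping of the priority-101 job still bids above the best mapping of the priority-100 job.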

Files

WP180.pdf

Files (493.9 kB)

md5:b16059fa6be2a0b4e97aa85aed35ff52
493.9 kB

Additional details

Funding

PRACE-2IP – PRACE - Second Implementation Phase Project 283493
European Commission