Published February 28, 2013 | Version v1
Working paper Open

An Auction Based SLURM Scheduler for Heterogeneous Supercomputers and its Comparative Performance Study

Creators

  • 1. Computer Engineering Department, Bogazici University, Istanbul, Turkey

Contributors

  • 1. Computer Engineering Department, Bogazici University, Istanbul, Turkey

Description

SLURM is a resource management system that is used on many TOP500 supercomputers. We present a heterogeneous
CPU-GPU scheduler plug-in, called AUCSCHED, for SLURM that implements an auction based algorithm. In order
to tune the topological mapping of jobs to resources, our plug-in determines at scheduling time, for each job, the best
resource choices based on node contiguity from available ones. Each of these choices is then expressed as a bid that a
job makes in an auction. Our algorithm takes a window of jobs from the front of the job queue, generates multiple bids
for available resources for each job, and solves an assignment problem that maximizes an objective function involving
priorities of jobs. We generate several CPU-GPU synthetic workloads and perform realistic SLURM emulation tests
to compare the performance of our auction based scheduler with that of SLURM's own backfill scheduler. In general,
AUCSCHED achieves a few percentage points better utilization than the SLURM/BF plug-in, but topologically SLURM/BF
leads to less fragmentation whereas AUCSCHED leads to less spread. Both SLURM's plug-in and ours produce
high utilizations around 90% when workloads are made up of jobs requesting no more than 1 GPU per node. On the
other hand, when workloads contain jobs that request 2 GPUs per node, it is observed that the system utilization drops
drastically to the 65-75% range with both AUCSCHED and SLURM's own plug-in. This points to the
need for further study of scheduling jobs that utilize multiple GPU cards on nodes. Our plug-in, which builds on our
earlier plug-in called IPSCHED, is available at http://code.google.com/p/slurm-ipsched/.
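
The bid-and-assign idea described above can be illustrated with a toy sketch. It is not the AUCSCHED implementation: the function names, the use of node-index span as a stand-in for contiguity, the sum-of-priorities objective, and the exhaustive search in place of a real solver are all assumptions made for illustration, and GPUs are ignored.

```python
from itertools import product


def contiguous_bids(free_nodes, size, max_bids=3):
    """Return up to max_bids candidate node sets of the given size.

    Candidates are windows over the sorted free node indices, ranked by how
    tightly packed they are (smaller index span first). Hypothetical helper.
    """
    free = sorted(free_nodes)
    ranked = []
    for i in range(len(free) - size + 1):
        window = free[i:i + size]
        span = window[-1] - window[0] + 1  # range of node indices covered
        ranked.append((span, window[0], frozenset(window)))
    ranked.sort(key=lambda b: (b[0], b[1]))
    return [nodes for _, _, nodes in ranked[:max_bids]]


def run_auction(jobs, free_nodes, max_bids=3):
    """Pick at most one bid per job so that no node is used twice.

    jobs is a list of (name, priority, node_count) tuples taken from the
    front of the queue. The objective assumed here is the sum of priorities
    of the scheduled jobs. Exhaustive search is only viable for tiny windows.
    """
    # None means "this job is not scheduled in this round".
    options = [[None] + contiguous_bids(free_nodes, size, max_bids)
               for _, _, size in jobs]
    best_value, best_assignment = 0, {}
    for choice in product(*options):
        used, value, assignment = set(), 0, {}
        feasible = True
        for (name, priority, _), nodes in zip(jobs, choice):
            if nodes is None:
                continue
            if used & nodes:  # two winning bids may not share a node
                feasible = False
                break
            used |= nodes
            value += priority
            assignment[name] = nodes
        if feasible and value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


# Six free nodes with a gap at index 4, and a window of three queued jobs.
free = {0, 1, 2, 3, 5, 6}
window = [("A", 10, 4), ("B", 6, 2), ("C", 5, 2)]
value, assignment = run_auction(window, free)
```

In this example job A's most contiguous bid (nodes 0-3) would block every bid made by B and C, so the search selects A's less compact bid on nodes 2, 3, 5, 6 together with B on nodes 0, 1. This is the motivation for letting each job submit several bids rather than a single preferred allocation.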

Files

An Auction Based SLURM Scheduler Heterogeneous Supercomputers and its Comparative Study.pdf

Additional details

Funding

PRACE-2IP – PRACE - Second Implementation Phase Project 283493
European Commission