An Auction Based SLURM Scheduler for Heterogeneous Supercomputers and its Comparative Performance Study
Contributors
Other:
- 1. Computer Engineering Department, Bogazici University, Istanbul, Turkey
Description
SLURM is a resource management system that is used on many TOP500 supercomputers. We present a heterogeneous
CPU-GPU scheduler plug-in, called AUCSCHED, for SLURM that implements an auction based algorithm. In order
to tune the topological mapping of jobs to resources, our plug-in determines at scheduling time, for each job, the best
choices among the available resources based on node contiguity. Each of these choices is then expressed as a bid that a
job makes in an auction. Our algorithm takes a window of jobs from the front of the job queue, generates multiple bids
for available resources for each job, and solves an assignment problem that maximizes an objective function involving
priorities of jobs. We generate several CPU-GPU synthetic workloads and perform realistic SLURM emulation tests
to compare the performance of our auction based scheduler with that of SLURM's own backfill scheduler. In general,
AUCSCHED achieves a few percentage points better utilization than the SLURM/BF plug-in; topologically, SLURM/BF
leads to less fragmentation, whereas AUCSCHED leads to less spread. Both SLURM's plug-in and ours produce
high utilizations of around 90% when workloads are made up of jobs requesting no more than 1 GPU per node. On the
other hand, when workloads contain jobs that request 2 GPUs per node, system utilization drops
drastically to the 65-75% range with both AUCSCHED and SLURM's own plug-in. This points to the
need for further study of scheduling jobs that utilize multiple GPU cards on nodes. Our plug-in, which builds on our
earlier plug-in called IPSCHED, is available at http://code.google.com/p/slurm-ipsched/.
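
The window-bid-assign cycle described above can be illustrated with a minimal Python sketch. It is not the AUCSCHED implementation: the function names, the tuple layout of a job, the `max_bids` limit, and the objective (plain sum of priorities of scheduled jobs) are all assumptions made for illustration, GPUs are ignored, and the assignment problem is solved by exhaustive search, which is only workable for a tiny window.

```python
from itertools import product


def contiguous_bids(free_nodes, n_needed, max_bids=4):
    """Return up to max_bids candidate node sets with consecutive node ids.

    Hypothetical helper: "contiguity" here simply means consecutive integers.
    """
    free = sorted(free_nodes)
    bids = []
    for i in range(len(free) - n_needed + 1):
        chunk = free[i:i + n_needed]
        if chunk[-1] - chunk[0] == n_needed - 1:  # ids form an unbroken run
            bids.append(frozenset(chunk))
            if len(bids) == max_bids:
                break
    return bids


def run_auction(window, free_nodes):
    """Pick at most one bid per job so that no node is used twice.

    window: list of (job_id, priority, nodes_needed) taken from the queue front.
    Returns (objective value, {job_id: sorted node list}).
    Assumed objective: sum of priorities of the jobs that win a bid.
    """
    # Each job may also "lose" the auction, represented by None.
    options = [[None] + contiguous_bids(free_nodes, n) for _, _, n in window]
    best_value, best_choice = 0, {}
    for combo in product(*options):  # exhaustive search over bid combinations
        used, value, choice, feasible = set(), 0, {}, True
        for (job_id, priority, _), bid in zip(window, combo):
            if bid is None:
                continue
            if used & bid:  # two jobs would share a node
                feasible = False
                break
            used |= bid
            value += priority
            choice[job_id] = sorted(bid)
        if feasible and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


if __name__ == "__main__":
    window = [("A", 10, 3), ("B", 7, 2), ("C", 5, 2)]
    free = {0, 1, 2, 3, 5, 6}  # node 4 is busy
    print(run_auction(window, free))
```

In this toy instance there are not enough free nodes for all three jobs, so the search keeps the combination of non-overlapping bids with the highest total priority. A real scheduler would replace the exhaustive loop with a proper optimization solver and extend each bid with per-node CPU and GPU counts.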
Files
An Auction Based SLURM Scheduler Heterogeneous Supercomputers and its Comparative Study.pdf
(426.0 kB, md5:3931fd04f027a95bbc0f8826dd8e1ac8)