Efficiently Supporting Dynamic Task-Parallelism on Heterogeneous Cache-Coherent Systems
Description
Manycore processors, with tens to hundreds of tiny cores but no hardware-based cache coherence, can offer tremendous peak throughput on highly parallel programs while being complexity and energy efficient. Manycore processors can be combined with a few high-performance big cores for executing operating systems, legacy code, and serial regions. These systems use heterogeneous cache coherence (HCC) with hardware-based cache coherence between big cores and software-centric cache coherence between tiny cores. Unfortunately, programming these heterogeneous cache-coherent systems to enable collaborative execution is challenging, especially when considering dynamic task parallelism. This paper seeks to address this challenge using a combination of light-weight software and hardware techniques. We provide a detailed description of how to implement a work-stealing runtime to enable dynamic task parallelism on heterogeneous cache-coherent systems. We also propose direct task stealing (DTS), a new technique based on user-level interrupts to bypass the memory system and thus improve the performance and energy efficiency of work stealing. Our results demonstrate that executing dynamic task-parallel applications on a 64-core system (4 big, 60 tiny) with complexity-effective HCC and DTS can achieve: 7× speedup over a single big core; 1.4× speedup over an area-equivalent eight big core system with hardware-based cache coherence; and 21% better performance and similar energy efficiency compared to a 64-core system (4 big, 60 tiny) with full-system hardware-based cache coherence.
Here we provide a Docker image containing the source code and datasets used in the paper. Please refer to the README file for instructions on importing the Docker image, building the binaries from source, and running the simulations reported in the paper.
Files
README.md
Total size: 13.7 GB
Checksum | Size |
---|---|
md5:ceb53a8d9caa8af604a89fd93ecce4e6 | 17.0 kB |
md5:52010ef6a416f1d363dff1e2232aff77 | 11.1 GB |
md5:8567f23174a2e99035845e4b64879402 | 2.7 GB |
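The MD5 checksums listed for each file can be used to confirm that a download completed intact before importing the Docker image. The following is a minimal sketch, not part of the original artifact: the helper names `md5_of_file` and `matches_published` are hypothetical, and because the file names are not shown in the listing, the checksums are keyed by the listed file size instead.

```python
import hashlib

# Checksums copied from the file listing of this record, keyed by listed size.
# Keying by size is an assumption made here because file names are not shown.
PUBLISHED_MD5 = {
    "17.0 kB": "ceb53a8d9caa8af604a89fd93ecce4e6",
    "11.1 GB": "52010ef6a416f1d363dff1e2232aff77",
    "2.7 GB": "8567f23174a2e99035845e4b64879402",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, listed_size):
    """Compare a local file's MD5 with the checksum published for that size."""
    return md5_of_file(path) == PUBLISHED_MD5[listed_size]
```

For example, calling `matches_published(path_to_download, "11.1 GB")` on the locally saved 11.1 GB file should return `True` when the download is intact. The local path is whatever name the file was saved under. The README remains the reference for the actual import and build steps.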