big.VLITTLE: On-Demand Data-Parallel Acceleration for Mobile Systems on Chip
Creators
- Cornell University
Description
Single-ISA heterogeneous multi-core architectures offer a compelling high-performance, high-efficiency solution for executing task-parallel workloads in mobile systems on chip (SoCs). In addition to task-parallel workloads, many data-parallel applications, such as machine learning, computer vision, and data analytics, increasingly run on mobile SoCs to provide real-time user interactions. Next-generation scalable vector architectures such as the RISC-V vector extension and Arm SVE have recently emerged as unified vector abstractions for both large- and small-scale systems. In this paper, we propose big.VLITTLE, a family of novel area-efficient, high-performance architectures that support next-generation vector architectures to efficiently accelerate data-parallel workloads in conventional big.LITTLE systems. big.VLITTLE architectures reconfigure multiple little cores on demand to work as a decoupled vector engine when executing data-parallel workloads. Our results show that a big.VLITTLE system can achieve a 1.6× speedup over an area-comparable big.LITTLE system equipped with an integrated vector unit across multiple data-parallel applications, and a 1.9× speedup over an aggressive decoupled vector engine for task-parallel workloads.
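A key property of scalable (vector-length-agnostic) architectures such as the RISC-V vector extension is that software asks the hardware how many elements it may process per instruction instead of hard-coding a vector length. This is what lets a single binary run on vector hardware of different widths. The sketch below is a plain Python model of that strip-mining pattern, not code from the provided gem5 models. The names `setvl` and `vla_saxpy` and the `vlmax` values are hypothetical, and the model makes no claim about how big.VLITTLE actually groups little cores.

```python
def setvl(avl: int, vlmax: int) -> int:
    """Hypothetical model of a vsetvl-style request.

    Grants at most `vlmax` elements. min(avl, vlmax) is one simple policy
    the RISC-V V spec permits; real hardware has some freedom when
    vlmax < avl < 2 * vlmax.
    """
    return min(avl, vlmax)


def vla_saxpy(a: float, x: list, y: list, vlmax: int):
    """Strip-mined a*x + y that never hard-codes the hardware vector length.

    Returns the result and the number of strips (vector "iterations") used.
    """
    assert len(x) == len(y), "x and y must have the same length"
    out = list(y)
    n = len(x)
    i = 0
    strips = 0
    while i < n:
        vl = setvl(n - i, vlmax)      # ask hardware how many elements to do now
        for j in range(i, i + vl):    # stands in for one vector instruction
            out[j] = a * x[j] + y[j]
        i += vl
        strips += 1
    return out, strips


if __name__ == "__main__":
    xs = [float(k) for k in range(10)]
    ys = [1.0] * 10
    # Same "binary", two hypothetical hardware vector lengths.
    narrow, s_narrow = vla_saxpy(2.0, xs, ys, vlmax=4)
    wide, s_wide = vla_saxpy(2.0, xs, ys, vlmax=16)
    print(narrow == wide, s_narrow, s_wide)
```

Running it prints `True 3 1`: both configurations compute the same result, and only the number of strips changes with the available vector length.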
Here we provide the source code of the gem5 cycle-level models used in this work in the attached Docker image, along with the pre-built software tools and dependencies required for our experiments. Please refer to the README file for instructions on loading the Docker image, building the gem5 simulator, cross-compiling applications, running experiments, and plotting the generated results.
Files
Total size: 6.4 GB

| File (MD5 checksum) | Size |
|---|---|
| md5:27f497d1d7fef58984c568129ee7e619 | 6.4 GB |
| md5:c094f00e6a0b7ca22548a5f64ccf1639 | 5.1 kB |
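Because the Docker image is large (6.4 GB), it is worth checking a download against the MD5 checksum listed above before loading it. The sketch below hashes a file in chunks so the whole image never has to sit in memory. The function names are hypothetical, and the file names are not given in this record, so pass whatever path you saved the download to. MD5 detects accidental corruption but does not protect against deliberate tampering.

```python
import hashlib

# Checksums as listed in this record. The corresponding file names are not
# given here, so the keys are descriptive labels only.
EXPECTED_MD5 = {
    "6.4 GB file": "27f497d1d7fef58984c568129ee7e619",
    "5.1 kB file": "c094f00e6a0b7ca22548a5f64ccf1639",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches("/path/to/downloaded/file", EXPECTED_MD5["6.4 GB file"])` should return `True` for an intact copy of the large download.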