big.VLITTLE: On-Demand Data-Parallel Acceleration for Mobile Systems on Chip
Creators
- Cornell University
Description
Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and high-efficiency solution for executing task-parallel workloads in mobile systems on chip (SoCs). Beyond task-parallel workloads, many data-parallel applications, such as machine learning, computer vision, and data analytics, increasingly run on mobile SoCs to provide real-time user interactions. Next-generation scalable vector architectures, such as the RISC-V vector extension and Arm SVE, have recently emerged as unified vector abstractions for both large- and small-scale systems. In this paper, we propose big.VLITTLE, a novel area-efficient, high-performance architecture that supports next-generation vector architectures to efficiently accelerate data-parallel workloads in conventional big.LITTLE systems. big.VLITTLE architectures reconfigure multiple little cores on demand to work as a decoupled vector engine when executing data-parallel workloads. Our results show that a big.VLITTLE system achieves a 1.6× performance speedup over an area-comparable big.LITTLE system equipped with an integrated vector unit across multiple data-parallel applications, and a 1.7× speedup over an aggressive decoupled vector engine on task-parallel workloads.
The attached Docker image provides the source code of the gem5 cycle-level models used in this work, along with pre-built software tools and the dependencies required for our experiments. Please refer to the README file for instructions on loading the Docker image, building the gem5 simulator, cross-compiling the applications, running the experiments, and plotting the generated results.
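The overall workflow can be sketched as the shell session below. The image tarball name, container tag, and in-container paths are illustrative placeholders; the README in the archive gives the exact names and commands.

```shell
# Hypothetical walkthrough of the artifact workflow; all file names,
# tags, and paths below are assumptions, not the artifact's actual names.

# 1. Load the attached Docker image into the local Docker daemon.
docker load -i big-vlittle-artifact.tar

# 2. Start an interactive container from the loaded image.
docker run -it --rm big-vlittle-artifact /bin/bash

# Inside the container (illustrative steps, consult the README):
# 3. Build the gem5 simulator (gem5 builds with SCons).
#    scons build/RISCV/gem5.opt -j"$(nproc)"
# 4. Cross-compile the data-parallel applications for RISC-V.
#    make -C apps CROSS_COMPILE=riscv64-unknown-linux-gnu-
# 5. Run the experiments and plot the generated results.
#    ./run_experiments.sh
#    ./plot_results.py
```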
Files (6.0 GB)

Name | Size | md5
---|---|---
Docker image | 6.0 GB | 20ffe190cbdc3218e14b04e9cf6dfadb
README.md | 5.5 kB | d8df22b5ffa5c0ad50d3239e43f4f7f6