# ASPLOS22 Artifact - AStitch Machine Learning Optimizing Compiler
# Description
This is the artifact of the AStitch paper accepted at ASPLOS 2022.
We provide a docker image to ease the environment setup. You just need to pull the docker image and launch a container:
docker pull jamesthez/astitch:astitch_asplos_ae
docker run --gpus all --net=host --pid=host -it --name <your-container-name> \
jamesthez/astitch:astitch_asplos_ae bash
Alternatively, you can download the attached tar.gz file, decompress it into a tar file, and import the tar file as a docker image:
gzip -d astitch_asplos_ae.tar.gz
docker import - astitch_asplos_ae < astitch_asplos_ae.tar
docker run --gpus all --net=host --pid=host -it astitch_asplos_ae bash
Use sudo to run docker if necessary.
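Before running the experiments, you may want to sanity-check that the container can actually see the GPU. A minimal check, assuming the image's Python environment ships the TensorFlow build used by the benchmarks:

```python
# Quick GPU visibility check inside the container.
# tf.test.is_gpu_available() is the TF 1.x API; on TF 2.x,
# tf.config.list_physical_devices("GPU") is the replacement.
import tensorflow as tf
print(tf.test.is_gpu_available())  # expect True
```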
After launching the container, please follow the instructions below to reproduce the key results of AStitch. (You can also find the instructions at `/root/README.md` in the docker container.)
=============================================================
# Collect Performance Results.
Run the following command to collect all performance results at once.
cd /root/scripts
sh run_all.sh
It takes about 2.5 hours to execute all the experiments. Please do not run other heavy workloads, either CPU or GPU, during the execution.
All results will be written into `/root/scripts/results`.
# Manual Finishing
The results in `/root/scripts/results` contain the data to draw Figure 11-a, Figure 11-b, and Figure 12, and to fill in Table 3. We have prepared scripts to draw the figures (in `/root/scripts/plot`). Some manual finishing is required to fill the data into the corresponding scripts:
## Inference Speedup (Figure 11-a)
Script to draw figure: `/root/scripts/plot/infer_speedup_fig_11_a.py`
Fill the execution times of the five inference benchmarks into lines 9-12 of `infer_speedup_fig_11_a.py`, i.e., the times of naive TensorFlow (tf), XLA (xla), TensorRT (trt), and AStitch (astitch).
1. Take CRNN as an example.
Go to directory `/root/scripts/results/crnn_infer/`.
The execution times of naive TensorFlow, XLA, TensorRT, and AStitch are in the files named `tf_time.txt`, `xla_time.txt`, `trt_time.txt`, and `astitch_time.txt`, respectively.
For TensorFlow, XLA, and AStitch, the execution time is on the first line of the file.
For TensorRT, the execution time is at the bottom of the file.
Please fill the corresponding data into `/root/scripts/plot/infer_speedup_fig_11_a.py`; CRNN corresponds to the first column of the table in lines 9-12.
2. Fill in the performance results of all five inference workloads in `/root/scripts/plot/infer_speedup_fig_11_a.py` (a helper sketch to extract these values follows step 3).
Performance result of CRNN: /root/scripts/results/crnn_infer/
Performance result of ASR: /root/scripts/results/asr_infer/
Performance result of BERT: /root/scripts/results/bert_infer/
Performance result of Transformer: /root/scripts/results/transformer_infer/
Performance result of DIEN: /root/scripts/results/dien_infer/
3. Draw figure.
After filling the data, please execute the following command and you will find the generated figure named inference_speedup.pdf.
python /root/scripts/plot/infer_speedup_fig_11_a.py
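To reduce copy-paste errors, you can also gather the numbers with a small script. Below is a minimal sketch (hypothetical, not shipped with the artifact) that prints the values to paste into lines 9-12 of `infer_speedup_fig_11_a.py`, assuming the file layout described above:

```python
# collect_infer_times.py -- hypothetical helper, not part of the artifact.
# Assumes the tf/xla/astitch time is on the first line of its file and
# the TensorRT time is on the last non-empty line of trt_time.txt.
import os

RESULTS = "/root/scripts/results"
BENCHMARKS = ["crnn", "asr", "bert", "transformer", "dien"]

def first_line(path):
    with open(path) as f:
        return f.readline().strip()

def last_line(path):
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    return lines[-1]

for bench in BENCHMARKS:
    d = os.path.join(RESULTS, bench + "_infer")
    print(bench,
          "tf:", first_line(os.path.join(d, "tf_time.txt")),
          "xla:", first_line(os.path.join(d, "xla_time.txt")),
          "trt:", last_line(os.path.join(d, "trt_time.txt")),
          "astitch:", first_line(os.path.join(d, "astitch_time.txt")))
```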
## Training Speedup (Figure 11-b)
We can only provide the evaluation of BERT and Transformer in the artifact. We are not able to release the training scripts and input data of DIEN due to company policy.
Script to draw figure: `/root/scripts/plot/train_speedup_fig_11_b.py`
Fill the execution times of the two training benchmarks into lines 9-11 of `train_speedup_fig_11_b.py`, i.e., the times of naive TensorFlow (tf), XLA (xla), and AStitch (astitch).
1. Take BERT as an example.
Go to directory `/root/scripts/results/bert_train/`.
The execution times (in seconds) of naive TensorFlow, XLA, and AStitch are in the files named `tf_time_sec.txt`, `xla_time_sec.txt`, and `astitch_time_sec.txt`, respectively.
Please fill the corresponding data into `/root/scripts/plot/train_speedup_fig_11_b.py`; BERT corresponds to the first column of the table in lines 9-11.
2. Fill in all the performance results in `/root/scripts/plot/train_speedup_fig_11_b.py` (see the sketch after step 3).
Performance result of BERT: /root/scripts/results/bert_train/
Performance result of Transformer: /root/scripts/results/transformer_train/
3. Draw figure.
After filling the data, please execute the following command and you will find the generated figure named train_speedup.pdf.
python /root/scripts/plot/train_speedup_fig_11_b.py
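The same idea works here. A minimal sketch (hypothetical, assuming each `*_time_sec.txt` stores its time on the first line) to print the values for lines 9-11:

```python
# collect_train_times.py -- hypothetical helper, not part of the artifact.
import os

RESULTS = "/root/scripts/results"
for bench in ["bert", "transformer"]:
    d = os.path.join(RESULTS, bench + "_train")
    for system in ["tf", "xla", "astitch"]:
        with open(os.path.join(d, system + "_time_sec.txt")) as f:
            print(bench, system, f.readline().strip())  # time in seconds
```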
The steps above already reproduce the most important result (the speedup of AStitch). If you are interested in the breakdown, please follow the instructions below.
## Breakdown (Figure 12 and Table 3)
1. Figure 12
Figure 12 shows the breakdown of execution time into memory-intensive computations and overhead, for both XLA and AStitch.
You need to fill the data into `/root/scripts/plot/breakdown_fig_12.py` (lines 8-14).
Take CRNN as an example. You can find the XLA data in `/root/scripts/results/crnn_infer/xla_time.txt` and the AStitch data in `/root/scripts/results/crnn_infer/astitch_time.txt`.
There is a table at the bottom of each file (the last 3 lines). The execution time of memory-intensive computations (i.e., mem) is the value at {row-1, column-3} of the table.
The overhead (i.e., cpu) is the value at {row-1, column-1} of the table.
(Note that {row-2, column-1} is empty.)
As noted above, the end-to-end (i.e., e2e) time is on the first line of the file.
Finally, fill the mem, cpu, and e2e values from xla_time.txt into lines 8-10 of `/root/scripts/plot/breakdown_fig_12.py`, and the corresponding values from astitch_time.txt into lines 12-14. (A parsing sketch covering this table follows step 2.)
After filling in the data for all five inference workloads, run the following command to get the figure file (breakdown.pdf):
python /root/scripts/plot/breakdown_fig_12.py
2. Table 3
Table 3 shows the number of kernels for memory-intensive computations (i.e., mem) and the number of CUDA memcpy/memset calls (i.e., cpy). You can find the data as follows.
Take CRNN as an example. You can find the XLA data in `/root/scripts/results/crnn_infer/xla_time.txt` and the AStitch data in `/root/scripts/results/crnn_infer/astitch_time.txt`.
There is a table at the bottom of each file (the last 3 lines).
The kernel count of memory-intensive computations (i.e., mem) is the value at {row-2, column-3} of the table.
The number of CUDA memcpy/memset calls (i.e., cpy) is the value at {row-2, column-4} of the table.
(Note that {row-2, column-1} is empty.)
You can find the corresponding Table 3 data for all five inference workloads, for both XLA and AStitch, by repeating the above process.
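Since the Figure 12 values and the Table 3 counts come from the same 3-line table, they can be extracted together. Below is a minimal sketch under two stated assumptions: the columns are whitespace-separated, and the 3-line table is a header line followed by row-1 and row-2. Because {row-2, column-1} is empty, a plain split of row-2 shifts its fields left by one, so column-3 lands at index 1 and column-4 at index 2. Verify the indices against a real file before trusting them:

```python
# parse_time_file.py -- hypothetical helper, not part of the artifact.
# Extracts the Figure 12 values (e2e, cpu, mem) and the Table 3 counts
# (mem kernels, cpy calls) from one *_time.txt file.
def parse_time_file(path):
    with open(path) as f:
        lines = [ln for ln in f if ln.strip()]
    e2e = lines[0].strip()    # end-to-end time, first line of the file
    row1 = lines[-2].split()  # row-1 of the bottom table
    row2 = lines[-1].split()  # row-2; column-1 is empty, so a plain split
                              # shifts the remaining fields left by one
    return {
        "e2e": e2e,
        "cpu": row1[0],          # {row-1, column-1}: overhead
        "mem": row1[2],          # {row-1, column-3}: mem-intensive time
        "mem_kernels": row2[1],  # {row-2, column-3}: kernel count
        "cpy_calls": row2[2],    # {row-2, column-4}: memcpy/memset count
    }

for system in ["xla", "astitch"]:
    path = "/root/scripts/results/crnn_infer/" + system + "_time.txt"
    print(system, parse_time_file(path))
```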
===============================================
If you have any problems with the evaluation, or any other questions about AStitch, please email me at zzchman@gmail.com.
# Files
The attached archive (`astitch_asplos_ae.tar.gz`, 4.3 GB) has md5 checksum 82cc489e51ced2611c8c0b63df798956.