We claim that the results for Figures 8-12 are replicable (within reasonable expectations of performance variation).
We conducted the experiments on a machine with four Non-Uniform Memory Access (NUMA) nodes, each holding an 8-core Intel(R) Xeon(R) E5-4650 processor (32 cores in total) operating at 2.70 GHz, with a 32 KB L1 data cache and a 256 KB L2 cache per core, and an 80 MB LLC shared across the 4 NUMA nodes. The machine had 190 GB of RAM and 8 GB of swap space.
The artifact expects the machine to have at least 32 cores. Schedule generation requires ~120 GB of RAM; however, we have included the final schedules in the `test_schedules` directory so that the user does not need that much RAM. If you perform the full schedule generation, you will need an extra 10 GB of disk space. Since the tensors take about 10 GB, executing the artifact requires at least 20 GB of disk space in total.
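To check that your machine meets these requirements, standard Linux utilities are enough (these checks are our suggestion and are not part of the artifact scripts):

```bash
nproc      # core count; the artifact expects at least 32
free -h    # available RAM; ~120 GB is needed for full schedule generation
df -h .    # free disk space; at least 20 GB is needed
```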
The artifact requires Linux with cmake, gcc, g++, python3, python3-pip, python3-venv, git, wget, libomp-dev, zip, and unzip, as well as the Python packages z3, z3-solver, matplotlib, numpy, pandas, pillow, regex, and seaborn.
Most of the scripts must be run from the `tensor-schedules` directory.
The artifact is bundled as an OCI container image created with Docker (the Dockerfile is available as part of the tarball on Zenodo). The Docker image is tarred as `sparseauto-docker-image.tar`.
The image can be loaded into the local Docker store as follows:
docker load --input sparseauto-docker-image.tar
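Once the load completes, you can confirm that the image is present (a generic Docker check, not an artifact-specific step):

```bash
docker images | grep sparseauto
```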
For your convenience, we have also included a Dockerfile so that the reviewer can build the Docker image themselves (you need to run this from inside the `oopsla24-artefact` directory):
docker image build -t sparseauto-oopsla -f Dockerfile .
If you are running the Docker image on an ARM-based Mac, you can use the command in `dockerhelp.md`.
Once you have the image, start the session as follows:
docker run --name sparseauto-oopsla-instance -td sparseauto-oopsla
Then, log into the container:
docker exec -it sparseauto-oopsla-instance bash
Once you log into the Docker container, you should be inside the `/home/oopsla/tensor-schedules` directory.
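You can verify this with a quick check (our suggestion):

```bash
pwd   # expected: /home/oopsla/tensor-schedules
```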
If you prefer not to run Docker, please refer to the Step-by-step Guide / Building from the source section.
From inside the `tensor-schedules` directory, execute the commands below to confirm that everything is working properly.
# download matrices/tensors - these are already pre-bundled with the docker
# image/source artifact. Confirm all the tensors are downloaded by running this script.
./download_tensors.sh
# build the TACO/SparseLNR project
./build_taco.sh
# execute a simple script to generate a plot
./kick_the_tires.sh
Once the execution of `kick_the_tires.sh` finishes, there should be a plot at `tensor-schedules/plots/fig8/plot3.png`. The script should take less than 30 minutes to finish. This plot contains a subset of the tensors for the first subplot in Fig. 8 in the paper.
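To confirm that the plot was produced, you can list it from inside `tensor-schedules` (our suggestion, not an artifact step):

```bash
ls -lh plots/fig8/plot3.png
```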
If you're inside the Docker container, you can copy the image to your local machine to view it using the command below.
docker cp sparseauto-oopsla-instance:/home/oopsla/tensor-schedules/plots/fig8/plot3.png ~/<directory you want the image to be saved>/
After completing the Getting Started guide, execute the scripts below from inside the `tensor-schedules` directory to obtain the subplots of Figures 8-12 in the paper.
./figure8.sh
./figure9.sh
./figure10.sh
./figure11.sh
./figure12.sh
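Some of these scripts run for a long time (see the note on runtimes below), so you may want to run them in the background and capture a log; this is our suggestion rather than part of the artifact:

```bash
# run a figure script in the background and keep its output in a log file
nohup ./figure9.sh > figure9.log 2>&1 &
```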
Due to time constraints, we have reduced the number of iterations in some of the tests. For example, we originally used 32 iterations for the Figure 8 generation, but for this artifact we have reduced the number of iterations to 4 so that the reviewer does not have to wait too long to generate the plots. The number of iterations can be changed by passing the argument `--iterations <number of iterations>` to the `src.main_run_test_modified` Python script call in the figure script.
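For example, a hypothetical edit inside one of the figure scripts might look like the following (the surrounding flags are elided here; take them from the script itself):

```bash
# hypothetical excerpt from figure8.sh: restore the original iteration count
python3 -m src.main_run_test_modified ... --iterations 32
```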
The combined Figure 8 plot is saved in the `tensor-schedules/plots/fig8` directory under the name `plots-combined`, and the plots for Figures 9-12 are saved in the `tensor-schedules/plots/fig9-12` directories. The data points used to generate the plots are saved in a directory called `tensor-schedules/csv_results`, but we do not expect the reviewers to read them, as the generated plots are saved directly to the `tensor-schedules/plots` directory.
Testing is done by generating the schedule as a string and then substituting it into the file `sparseLNR/test/test-workspaces.cpp`. If a process hangs, it may leave this file in a state that causes compilation errors. For this scenario, we have included a `fixme.sh` script that copies a fresh file from `tensor-schedules/fixme/test-workspaces.cpp` to `sparseLNR/test/test-workspaces.cpp`.
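If you hit such compilation errors, running the script from inside `tensor-schedules` restores a clean copy:

```bash
./fixme.sh
```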
We expect that the user can finish all testing in less than a week.
You can also use the source directly, as it may give faster execution times. We assume that the user has access to a Linux/Ubuntu-based machine.
Please install the packages below (you can look at the Dockerfile for the exact steps).
# install packages
sudo apt-get -y update && sudo apt-get -y install cmake gcc g++ python3 python3-pip python3-venv git wget libomp-dev zip unzip
# from inside the oopsla24-artefact directory, create a python virtual environment and
# install python modules
pip install --no-cache-dir --upgrade pip
pip install --no-cache-dir virtualenv
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --no-cache-dir -r ./tensor-schedules/requirements.txt
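As a quick sanity check (our suggestion, not an artifact step), verify that the required Python modules import cleanly; note that the pillow package is imported as PIL:

```bash
python3 -c "import z3, matplotlib, numpy, pandas, PIL, regex, seaborn"
```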
Now move inside the `tensor-schedules` directory:
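Assuming you are at the artifact root (`oopsla24-artefact`):

```bash
cd tensor-schedules
```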
The artifact contains two main directories. The first, `sparseLNR`, contains the code generation component.
We have added tests to `sparseLNR/test/test-workspaces.cpp`. Should the user intend to add their own tests, they should follow one of the tests in that file. More information on writing TACO code can be found on the TACO website and the Documentation page.
The schedule generation framework is included in the directory called `tensor-schedules`. The generated schedule is compiled into an equivalent SparseAuto schedule string. We paste this schedule string into the `sparseLNR/test/test-workspaces.cpp` file between the `/* BEGIN <test name> */` and `/* END <test name> */` tags.
Schedule generation is driven by a config file in which the user provides all the constraints. Example config files are included in the `tensor-schedules/test_configs` directory.
Please refer to `tensor-schedules/test_configs/test3_config_kick_the_tires.json` for an example config and `tensor-schedules/kick_the_tires.sh` for the execution commands. The test corresponding to `test3_config_kick_the_tires.json` is given by the "test_name" attribute in the JSON file, here `sddmm_spmm_real`. The annotated config is shown below:
{
"accesses": { # A(i,l) = B(i,j) * C(i,k) * D(j,k) * E(j,l)
"A": ["i", "l"],
"B": ["i", "j"],
"C": ["i", "k"],
"D": ["j", "k"],
"E": ["j", "l"]
},
"tensor_idx_order_constraints": {
"B": [ # B: B(i,j) here is sparse, therefore j should appear after i
["j", "i"]
]
},
"output_tensor": "A", # output tensor in the computation is A
# defines files to save the configs
"test_json_file": "test3_without_z3_pruning.json",
"test_json_file_without_depth": "test3_without_depth_pruning.json",
"test_json_file_after_z3": "test3_with_z3_pruning.json",
"test_best_schedule_file": "test3_best_schedule.json",
# corresponding test name in sparseLNR/test/tests-workspaces.cpp file
# corresponding default test name is default_sddmm_spmm_real
"test_name": "sddmm_spmm_real",
# definition of z3 constraints for filter stage 3
"z3_constraints": [
"i >= 11000", "i <= 1000000",
"j >= 11000", "j <= 1000000",
"k >= 8", "k <= 256",
"l >= 8", "l <= 256",
"jpos >= 0", "jpos <= j",
"1000 * i * jpos < i * j",
"i * j < 1000000 * i * jpos"
],
# timing values after comparing against default TACO schedule
"output_csv_file": "test3.csv",
# matrices to evaluate on
"eval_files": ["bcsstk17.mtx", "cant.mtx", "consph.mtx", "cop20k_A.mtx"],
# actual bounds of the tensors for runtime pruning stages
"actual_values": {
"bcsstk17.mtx": {"i": 11000, "j": 11000, "k": 16, "l": 16, "jpos": 39},
"cant.mtx": {"i": 62000, "j": 62000, "k": 16, "l": 16, "jpos": 65},
"consph.mtx": {"i": 83000, "j": 83000, "k": 16, "l": 16, "jpos": 72},
"cop20k_A.mtx": {"i": 12000, "j": 12000, "k": 16, "l": 16, "jpos": 218}
}
}
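To see how a config file like this is consumed, look at the commands in `kick_the_tires.sh`. A call might look roughly like the following; the arguments shown here are hypothetical, so take the exact invocation from the script:

```bash
# hypothetical invocation of the schedule-generation entry point
python3 -m src.main_run_test_modified test_configs/test3_config_kick_the_tires.json --iterations 4
```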
There is a corresponding test in `sparseLNR/test/tests-workspaces.cpp` whose name contains `sddmm_spmm_real`. The template below describes the basic structure of the tests included in `tests-workspaces.cpp`.
TEST(workspaces, <test name>) {
[variable declarations]
[load tensor file for reading]
[tensor declarations and packing]
[index declarations]
[computation declaration]
...
/* BEGIN <test name> */
...
/* END <test name> */
[extra transformations]
...
[declare expected (no transformations)]
[declare timing variables]
for (int [var] = 0; [var] < [any integer]; [var]++) {
[time the computation]
[time computation without transformations]
...
}
...
}
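After building, an individual test can be run through GoogleTest's name filter. The binary path below is an assumption based on a typical TACO-style build tree; adjust it to your actual build directory:

```bash
# run only the tests whose names contain sddmm_spmm_real (hypothetical binary path)
./sparseLNR/build/bin/taco-test --gtest_filter='workspaces.*sddmm_spmm_real*'
```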
Should the user want to change the schedule generation algorithm, they should look at the `get_schedules_unfused` and other functions in `tensor-schedules/src/autosched.py`. Depth-based pruning logic is implemented in `src/prune.py`. POSET-based pruning logic is implemented in `src/solver_config.py`. Z3-based pruning logic is also implemented in the `prune_baskets` function in `src/solver_config.py`.
The user can either choose to change these functions or add new pruning stages
following the same implementation logic.
When we save schedules after the compile-time stages, we allocate them to baskets, where a basket holds schedules with the same time and memory complexities. Runtime pruning logic is implemented in `src/basket.py`.
The code generation logic is implemented in `src/generate_taco_schedule.py`.