Dataset Open Access
Chris Cummins
This dataset contains 493k LLVM-IRs taken from a wide range of projects and source programming languages, and includes labels for several compiler data analyses. We also include the logs for the machine learning jobs which produced our published experimental results.
The uncompressed dataset uses the following layout:
labels/
labels/<analysis>/<source>.<id>.<lang>.ProgramFeaturesList.pb
graphs/
graphs/<source>.<id>.<lang>.ProgramGraph.pb
ir/
ir/<source>.<id>.<lang>.ll: LLVM-IR text, generated with clang -emit-llvm -S or equivalent.
test/: indicates which graphs in the graphs/ directory should be used as part of the test set.
train/: indicates which graphs in the graphs/ directory should be used as part of the training set.
val/: indicates which graphs in the graphs/ directory should be used as part of the validation set.
vocab/
vocab/<type>.csv
For further information please see our ProGraML repository.
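The naming scheme in the layout can be handled mechanically. The following is a minimal sketch, not part of the dataset tooling, that splits a graph filename into its `<source>`, `<id>`, and `<lang>` fields and derives the matching label and IR paths. The helper names (`ProgramName`, `parse_graph_name`, `related_paths`) and the example values are hypothetical, and the sketch assumes that `<id>` and `<lang>` never contain a dot and that the `<analysis>` directory name is supplied by the caller.

```python
from pathlib import PurePosixPath
from typing import Dict, NamedTuple


class ProgramName(NamedTuple):
    """The three fields encoded in a dataset filename (hypothetical helper)."""
    source: str
    program_id: str
    lang: str


def parse_graph_name(filename: str) -> ProgramName:
    """Split '<source>.<id>.<lang>.ProgramGraph.pb' into its fields."""
    suffix = ".ProgramGraph.pb"
    if not filename.endswith(suffix):
        raise ValueError(f"not a ProgramGraph file: {filename}")
    stem = filename[: -len(suffix)]
    # Split from the right so a <source> that contains dots stays intact.
    # Assumption: <id> and <lang> never contain a dot.
    parts = stem.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected filename shape: {filename}")
    return ProgramName(*parts)


def related_paths(name: ProgramName, analysis: str) -> Dict[str, PurePosixPath]:
    """Build the relative paths that share this program's base name."""
    base = f"{name.source}.{name.program_id}.{name.lang}"
    return {
        "graph": PurePosixPath("graphs") / f"{base}.ProgramGraph.pb",
        "labels": PurePosixPath("labels") / analysis / f"{base}.ProgramFeaturesList.pb",
        "ir": PurePosixPath("ir") / f"{base}.ll",
    }


if __name__ == "__main__":
    # 'example', '123', 'c' and 'reachability' are illustrative values only.
    name = parse_graph_name("example.123.c.ProgramGraph.pb")
    for kind, path in related_paths(name, "reachability").items():
        print(kind, path)
```

Splitting from the right keeps the sketch tolerant of dotted source names; if the real identifiers can contain dots, the parsing rule would need to change.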
Name | MD5 | Size |
---|---|---|
classifyapp_2020.05.06.tar.bz2 | 8398df80abc564cf74143ef4740ec833 | 1.9 GB |
dataflow_logs_20.06.01.tar.bz2 | 10ad56f31bafa85a96d896f4ea0b387f | 265.6 MB |
devmap_2020.06.27.tar.bz2 | 4491105b61eb534ce42f7d342e88af27 | 10.6 MB |
graphs_20.06.01.tar.bz2 | 5812e41db6f11720454003762e7a8b0b | 3.8 GB |
labels_datadep_20.06.01.tar.bz2 | ec91e691882eb658be138fd0fbed1b26 | 69.8 MB |
labels_domtree_20.06.01.tar.bz2 | d515819b6041b27eb9f592d46761639f | 69.0 MB |
labels_liveness_20.06.01.tar.bz2 | c3879f4c3fa1d339a3aad7cd9d4c2188 | 124.8 MB |
labels_reachability_20.06.01.tar.bz2 | 96bbc6a8d44fe6b17a8f7f76ea40148e | 84.1 MB |
labels_subexpressions_20.06.01.tar.bz2 | 128f0e67fb9bd2b72ede055ab236c49e | 71.4 MB |
llvm_bc_20.06.01.tar.bz2 | 76815e3344101a504b224f10175b7dfa | 1.3 GB |
llvm_ir_20.06.01.tar.bz2 | a9303e635f60b521119c2801972b6781 | 1.1 GB |
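The MD5 checksum listed with each archive can be used to confirm that a download completed intact. A minimal sketch, assuming the archive has been saved to the current directory under its listed name; the function names `md5_of_file` and `verify` are hypothetical helpers, not part of any ProGraML tooling. The file is read in chunks so that the multi-gigabyte archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 digest against the published value."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Archive name and checksum copied from the file listing; the local
    # path is an assumption about where the download was saved.
    archive = Path("devmap_2020.06.27.tar.bz2")
    if archive.exists():
        ok = verify(archive, "4491105b61eb534ce42f7d342e88af27")
        print(f"{archive}: {'OK' if ok else 'checksum mismatch'}")
```

MD5 is adequate here for detecting corrupted or truncated downloads; it should not be relied on as protection against deliberate tampering.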
Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H. (2020). ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. arXiv preprint arXiv:2003.10536.