Published June 1, 2020 | Version 2020.06.01
Dataset
Open
DeepDataFlow
Description
This dataset contains 493k LLVM-IRs taken from a wide range of projects and source programming languages, and includes labels for several compiler data analyses. We also include the logs for the machine learning jobs which produced our published experimental results.
The uncompressed dataset uses the following layout:
labels/
- Directory containing machine learning features and labels for programs for compiler data flow analyses.
labels/<analysis>/<source>.<id>.<lang>.ProgramFeaturesList.pb
- A ProgramFeaturesList protocol buffer containing a list of features resulting from running a data flow analysis on a program.
graphs/
- Directory containing ProGraML representations of LLVM IRs.
graphs/<source>.<id>.<lang>.ProgramGraph.pb
- A ProgramGraph protocol buffer of an LLVM IR in the ProGraML representation.
ll/
- Directory containing LLVM-IR files.
ir/<source>.<id>.<lang>.ll
- An LLVM IR in text format, as produced by clang -emit-llvm -S or equivalent.
test/
- A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the test set.
train/
- A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the training set.
val/
- A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the validation set.
vocab/
- Directory containing vocabulary files.
vocab/<type>.csv
- A vocabulary file, which lists unique node texts, their frequency in the dataset, and the cumulative proportion of total unique node texts that is covered.
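As a rough illustration of the cumulative-coverage figure described for the vocabulary files, the sketch below computes it from (node text, frequency) pairs. It assumes that coverage means the running share of all node occurrences, with rows ordered from most to least frequent. The column names and ordering of the actual CSV files are not specified here, so the hypothetical function `cumulative_coverage` works on in-memory pairs rather than parsing a file.

```python
from typing import Iterable, List, Tuple


def cumulative_coverage(
    counts: Iterable[Tuple[str, int]],
) -> List[Tuple[str, int, float]]:
    """Return (text, frequency, cumulative proportion) rows, most frequent first.

    Assumption: the cumulative proportion is the running sum of frequencies
    divided by the total frequency over all node texts.
    """
    rows = sorted(counts, key=lambda item: item[1], reverse=True)
    total = sum(freq for _, freq in rows)
    result = []
    running = 0
    for text, freq in rows:
        running += freq
        # Guard against an empty or all-zero vocabulary.
        result.append((text, freq, running / total if total else 0.0))
    return result
```

A table like this makes it easy to pick a vocabulary cutoff, for example the smallest prefix of rows whose cumulative proportion reaches a chosen threshold.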
For further information please see our ProGraML repository.
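The naming scheme in the layout above can be navigated with a few path helpers. The following is a minimal sketch, not part of the dataset tooling: the names `ProgramName`, `parse_graph_name`, `labels_path`, and `split_graphs` are hypothetical, and the parser assumes that the `<id>` and `<lang>` components contain no dots (the `<source>` component may).

```python
from pathlib import Path
from typing import List, NamedTuple

GRAPH_SUFFIX = ".ProgramGraph.pb"
LABELS_SUFFIX = ".ProgramFeaturesList.pb"


class ProgramName(NamedTuple):
    source: str
    id: str
    lang: str


def parse_graph_name(filename: str) -> ProgramName:
    """Split '<source>.<id>.<lang>.ProgramGraph.pb' into its components."""
    if not filename.endswith(GRAPH_SUFFIX):
        raise ValueError(f"not a ProgramGraph file name: {filename}")
    stem = filename[: -len(GRAPH_SUFFIX)]
    # Split from the right so that dots inside <source> are preserved.
    parts = stem.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <source>.<id>.<lang>, got: {stem}")
    return ProgramName(*parts)


def labels_path(root: Path, analysis: str, name: ProgramName) -> Path:
    """Path of the labels file for one program and one analysis."""
    stem = f"{name.source}.{name.id}.{name.lang}"
    return root / "labels" / analysis / f"{stem}{LABELS_SUFFIX}"


def split_graphs(root: Path, split: str) -> List[Path]:
    """Resolve the symlinks in a split directory (train, val, or test)."""
    return sorted(p.resolve() for p in (root / split).iterdir() if p.is_symlink())
```

For example, each path returned by `split_graphs(root, "train")` can be passed by name to `parse_graph_name`, and the result to `labels_path`, to pair a training graph with its labels for a given analysis.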
Files
(8.8 GB)
MD5 checksum | Size |
---|---|
md5:8398df80abc564cf74143ef4740ec833 | 1.9 GB |
md5:10ad56f31bafa85a96d896f4ea0b387f | 265.6 MB |
md5:4491105b61eb534ce42f7d342e88af27 | 10.6 MB |
md5:5812e41db6f11720454003762e7a8b0b | 3.8 GB |
md5:ec91e691882eb658be138fd0fbed1b26 | 69.8 MB |
md5:d515819b6041b27eb9f592d46761639f | 69.0 MB |
md5:c3879f4c3fa1d339a3aad7cd9d4c2188 | 124.8 MB |
md5:96bbc6a8d44fe6b17a8f7f76ea40148e | 84.1 MB |
md5:128f0e67fb9bd2b72ede055ab236c49e | 71.4 MB |
md5:76815e3344101a504b224f10175b7dfa | 1.3 GB |
md5:a9303e635f60b521119c2801972b6781 | 1.1 GB |
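The MD5 checksums listed above can be used to check a download for corruption. The sketch below is one way to do this with the Python standard library. The function names are hypothetical, and because archive file names are not given here, the caller supplies the local path of whichever file was downloaded.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for multi-gigabyte archives.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:8398df80...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.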
Additional details
Related works
- Is cited by
- Preprint: arXiv:2003.10536 (arXiv)
References
- Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H. (2020). ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. arXiv preprint arXiv:2003.10536.