10.5281/zenodo.3403078
https://zenodo.org/records/3403078
oai:zenodo.org:3403078
Lacomis, Jeremy
Jeremy
Lacomis
0000-0003-0653-5738
Carnegie Mellon University
Yin, Pengcheng
Pengcheng
Yin
Carnegie Mellon University
Schwartz, Edward J.
Edward J.
Schwartz
Carnegie Mellon University Software Engineering Institute
Allamanis, Miltiadis
Miltiadis
Allamanis
Microsoft Research
Le Goues, Claire
Claire
Le Goues
Carnegie Mellon University
Neubig, Graham
Graham
Neubig
Carnegie Mellon University
Vasilescu, Bogdan
Bogdan
Vasilescu
Carnegie Mellon University
DIRE: A Neural Approach to Decompiled Identifier Naming
Zenodo
2019
2019-09-09
10.5281/zenodo.3403077
MIT License
This dataset is released as a companion to the paper "DIRE: A Neural Approach to Decompiled Identifier Naming", appearing in the proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019).
It contains information produced by decompiling 3,195,962 functions found in 164,632 unique binaries built from C code scraped from GitHub. For practicality, the dataset is partitioned into 16 archives by the first hexadecimal digit of the SHA-256 hash of the binary from which each file was generated. Each archive contains approximately 10,000 JSONL files, each named after the hash of its binary. Each JSONL file contains one JSON object per line, and each object corresponds to a single function in the decompiled binary.
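The layout described above can be sketched in Python as follows. This is a minimal illustration, not the tooling used to build the dataset: the function names `archive_bucket` and `iter_functions` are hypothetical, the hash is assumed to be computed over the raw bytes of the binary file, and each JSONL file is assumed to be UTF-8 text that has already been extracted from its archive. The fields inside each JSON object are not described here, so the sketch does not inspect them; see the README for the actual schema.

```python
import hashlib
import json
from typing import Iterator


def archive_bucket(binary_bytes: bytes) -> str:
    """Return the first hexadecimal digit of the SHA-256 hash of a binary.

    Assumption: the hash is taken over the raw file contents. The digit
    ('0'-'9' or 'a'-'f') selects one of the 16 archives.
    """
    return hashlib.sha256(binary_bytes).hexdigest()[0]


def iter_functions(jsonl_path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file.

    Each object corresponds to one decompiled function.
    """
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate a trailing blank line
                yield json.loads(line)
```

Reading lazily, one line at a time, avoids loading a whole file into memory, which may matter for binaries that contain many functions.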
Archives are provided in both GZIP and BZIP2 formats.
See the README file for more information.