Dataset Open Access

DIRE: A Neural Approach to Decompiled Identifier Naming

Lacomis, Jeremy; Yin, Pengcheng; Schwartz, Edward J.; Allamanis, Miltiadis; Le Goues, Claire; Neubig, Graham; Vasilescu, Bogdan

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Lacomis, Jeremy</dc:creator>
  <dc:creator>Yin, Pengcheng</dc:creator>
  <dc:creator>Schwartz, Edward J.</dc:creator>
  <dc:creator>Allamanis, Miltiadis</dc:creator>
  <dc:creator>Le Goues, Claire</dc:creator>
  <dc:creator>Neubig, Graham</dc:creator>
  <dc:creator>Vasilescu, Bogdan</dc:creator>
  <dc:description>This dataset is released as a companion to the paper "DIRE: A Neural Approach to Decompiled Identifier Naming", appearing in the proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019).

It contains information generated by decompiling 3,195,962 functions found in 164,632 unique binaries built from C code scraped from GitHub. For practicality, the dataset is partitioned into 16 archives by the first hexadecimal digit of the SHA-256 hash of the binary each file was generated from. Each of the 16 archives contains approximately 10,000 JSONL files, each named after the hash of the corresponding binary. Each JSONL file contains one JSON object per line, and each object corresponds to a single function in the decompiled binary.

Archives are provided in both GZIP and BZIP2 format.

See the README file for more information.</dc:description>
  <dc:title>DIRE: A Neural Approach to Decompiled Identifier Naming</dc:title>
</oai_dc:dc>
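The layout in the description can be illustrated with a short Python sketch. This is a hypothetical reader, not tooling distributed with the dataset. It assumes that each archive is a tar file compressed with gzip or bzip2, which `tarfile` detects automatically in mode `"r:*"`. It also assumes that the per-binary files inside each archive end in `.jsonl`. The helper names `partition_key`, `iter_functions`, and `iter_archive` are illustrative only. The README documents the fields of each function object, so the sketch makes no assumptions about them and decodes each line into a plain `dict`.

```python
import hashlib
import io
import json
import tarfile
from typing import Iterable, Iterator, Tuple


def partition_key(binary_bytes: bytes) -> str:
    """Return the first hex digit (0-9, a-f) of the binary's SHA-256 hash.

    Per the description, this digit selects which of the 16 archives
    holds the binary's decompiled functions.
    """
    return hashlib.sha256(binary_bytes).hexdigest()[0]


def iter_functions(lines: Iterable[str]) -> Iterator[dict]:
    """Decode JSONL: one JSON object (one decompiled function) per line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


def iter_archive(path: str) -> Iterator[Tuple[str, dict]]:
    """Yield (member name, function object) pairs from one archive.

    Assumptions (not confirmed by the record): the archive is a tar file,
    and the per-binary files inside it end in ".jsonl".
    """
    # "r:*" lets tarfile detect gzip or bzip2 compression transparently.
    with tarfile.open(path, mode="r:*") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".jsonl")):
                continue
            raw = tar.extractfile(member)
            if raw is None:
                continue
            with io.TextIOWrapper(raw, encoding="utf-8") as text:
                for func in iter_functions(text):
                    yield member.name, func
```

For example, calling `partition_key` on a binary's raw bytes gives the hex digit of the archive that should contain it. The record does not say how the archive files are named on disk, so the mapping from that digit to a file name is left open here.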
                  All versions   This version
Views             759            759
Downloads         7,657          7,657
Data volume       1.3 TB         1.3 TB
Unique views      679            679
Unique downloads  431            431

