
There is a newer version of the record available.

Published November 15, 2020 | Version v1
Dataset Open

Neural Reverse Engineering of Stripped Binaries using Augmented Control Flow Graphs

  • 1. Technion, Israel

Description

This dataset and the pre-trained models are released as a companion to our OOPSLA '20 publication, "Neural Reverse Engineering of Stripped Binaries using Augmented Control Flow Graphs":

  1. The dataset file (nero_dataset_binaries.tar.gz) consists of binary executables, grouped by package, that were produced by compiling several GNU source-code packages. We used these executables to evaluate our approach, as implemented in our prototype "Nero", and to compare it with other approaches. All executables contain debug information, which serves as the ground truth for the procedure name predictions. The packages are split into three sets: training, validation and test.
    1. Executable file names follow the pattern "<compiler>-<compiler version>__O<optimization level (u for default)>__<package name>[-<optional package version>]__<executable name>". For example: "gcc-5__Ou__cssc__sccs".
  2. The pre-trained model file (nero_gnn_model.tar.gz) was trained on the above dataset and contains:
    1. The pre-trained model and training log.
    2. The prediction results log.
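The executable file-name convention described above can be parsed mechanically, for example to group executables by compiler or optimization level. The following is a minimal Python sketch using only the standard library; `BinaryName` and `parse_binary_name` are hypothetical names, not part of the released code. Because package names may themselves contain hyphens, treating a trailing "-<token starting with a digit>" as the package version is a heuristic assumption, not part of the documented format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BinaryName:
    """Fields of one executable file name (hypothetical helper type)."""
    compiler: str
    compiler_version: str
    optimization: str              # "u" stands for the default optimization level
    package: str
    package_version: Optional[str]
    executable: str


# Heuristic (assumption): a trailing "-<token starting with a digit>" is the
# optional package version; otherwise the whole field is the package name.
_PACKAGE_RE = re.compile(r"(?P<name>.+)-(?P<version>\d[\w.]*)")


def parse_binary_name(file_name: str) -> BinaryName:
    """Split '<compiler>-<ver>__O<opt>__<package>[-<ver>]__<executable>' into fields."""
    # At most 4 fields; any further '__' stays inside the executable name.
    parts = file_name.split("__", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {file_name!r}")
    compiler_field, opt_field, package_field, executable = parts

    # "<compiler>-<compiler version>", split on the first hyphen.
    compiler, sep, compiler_version = compiler_field.partition("-")
    if not sep or not opt_field.startswith("O"):
        raise ValueError(f"unexpected file name: {file_name!r}")

    match = _PACKAGE_RE.fullmatch(package_field)
    if match:
        package, package_version = match.group("name"), match.group("version")
    else:
        package, package_version = package_field, None

    return BinaryName(compiler, compiler_version, opt_field[1:],
                      package, package_version, executable)


if __name__ == "__main__":
    print(parse_binary_name("gcc-5__Ou__cssc__sccs"))
    # BinaryName(compiler='gcc', compiler_version='5', optimization='u', package='cssc', package_version=None, executable='sccs')
```

Splitting with a maximum of three separators keeps the parser strict about the four documented fields while leaving the executable name untouched.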

For the code of the "Nero" prototype, see our GitHub repository.

Files (134.7 MB)

Size      Checksum
36.0 MB   md5:96ecb494acdee1f723fa5c350b0af846
98.7 MB   md5:b2f244402a2241945f60f33406fdcff9
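The record publishes an MD5 checksum for each archive, but as listed here the checksums are not paired with file names. A minimal Python sketch, assuming only the standard library's `hashlib`, that checks a downloaded file against both published values; `md5_of` and `matches_published_checksum` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union

# MD5 checksums as listed in the Files table of this record. They are not
# paired with file names there, so this only checks set membership.
PUBLISHED_MD5 = {
    "96ecb494acdee1f723fa5c350b0af846",  # listed with the 36.0 MB file
    "b2f244402a2241945f60f33406fdcff9",  # listed with the 98.7 MB file
}


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: Union[str, Path]) -> bool:
    """True if the file's MD5 equals one of the checksums published on the record."""
    return md5_of(path) in PUBLISHED_MD5


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        digest = md5_of(name)
        status = "OK" if digest in PUBLISHED_MD5 else "MISMATCH"
        print(f"{status}  {digest}  {name}")
```

Run it with the downloaded archives as arguments, for example nero_dataset_binaries.tar.gz and nero_gnn_model.tar.gz; each should report OK.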

Additional details

Related works

Is documented by
Conference paper: 10.1145/3428293 (DOI)
Preprint: arXiv:1902.09122 (arXiv)