Published August 31, 2023 | Version v1
Conference paper Open

SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings

  • 1. Ohio State University
  • 2. Columbia University

Description

Predicting function names in stripped binaries is an extremely useful but challenging task, as it requires summarizing the execution behavior and semantics of the function in human language. Recently, there has been significant progress in this direction with machine learning. However, existing approaches fail to model the exhaustive function behavior and thus suffer from poor generalizability to unseen binaries. To advance the state of the art, we present a function Symbol name prediction and binary Language Modeling (SymLM) framework, with a novel neural architecture that learns the comprehensive function semantics by jointly modeling the execution behavior of the calling context and instructions via a novel fusing encoder. We have evaluated SymLM with 1,431,169 binary functions from 27 popular open source projects, compiled with 4 optimizations (O0-O3) for 4 different architectures (i.e., x64, x86, ARM, and MIPS) and 4 obfuscations. SymLM outperforms the state-of-the-art function name prediction tools by up to 15.4%, 59.6%, and 35.0% in precision, recall, and F1 score, respectively, with significantly better generalizability and obfuscation resistance. Ablation studies also show that our design choices (e.g., fusing components of the calling context and execution behavior) substantially boost the performance of function name prediction. Finally, our case studies further demonstrate the practical use cases of SymLM in analyzing firmware images.
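As a rough illustration of how precision, recall, and F1 score can be computed for a predicted function name, the sketch below compares names as sets of word tokens. The token-level comparison, the underscore splitting rule, and the helper names `name_tokens` and `token_prf` are assumptions made for this example; they are not taken from the SymLM code or its evaluation scripts.

```python
def name_tokens(name):
    """Split a function name into lowercase word tokens (assumed rule: split on '_')."""
    return {tok for tok in name.lower().split("_") if tok}


def token_prf(predicted, truth):
    """Return (precision, recall, F1) over the word tokens of two function names."""
    pred_tokens = name_tokens(predicted)
    true_tokens = name_tokens(truth)
    overlap = len(pred_tokens & true_tokens)

    # Precision: share of predicted tokens that appear in the ground truth.
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    # Recall: share of ground-truth tokens that were predicted.
    recall = overlap / len(true_tokens) if true_tokens else 0.0
    # F1: harmonic mean of precision and recall (0 when nothing overlaps).
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical prediction vs. hypothetical ground-truth name.
    print(token_prf("read_file", "read_config_file"))
```

Under this assumed scheme, a prediction that omits one of three ground-truth tokens keeps full precision but loses recall, which is why the three metrics are reported separately.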

 

This repository contains the dataset used to train the SymLM model. It includes binaries for four architectures (x64, x86, ARM, and MIPS) at four optimization levels (O0, O1, O2, and O3).

Files

arm.zip

Files (9.4 GB)

Checksum                               Size
md5:aa35d14da31d7a4e769dc8d128e9ce9e   606.9 MB
md5:71524714dd00362617499490cc6e64c9   548.8 MB
md5:c629bfbf804ad047724693bba7ccfc4c   4.4 GB
md5:0804d80c2275bf7ea246065bb1a75d8a   3.8 GB
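The MD5 checksums listed above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module. Because the listing does not show which checksum belongs to which archive, the sketch only checks whether a file's digest matches any of the four listed values; the helper names `md5_of_file` and `is_listed_archive` are hypothetical, as is the local path in the usage example.

```python
import hashlib

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "aa35d14da31d7a4e769dc8d128e9ce9e",
    "71524714dd00362617499490cc6e64c9",
    "c629bfbf804ad047724693bba7ccfc4c",
    "0804d80c2275bf7ea246065bb1a75d8a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_archive(path):
    """True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5


if __name__ == "__main__":
    import sys

    # Usage (hypothetical local path): python verify.py arm.zip
    for archive in sys.argv[1:]:
        status = "OK" if is_listed_archive(archive) else "MISMATCH"
        print(f"{archive}: {status}")
```

Reading in chunks matters here because the largest archives are several gigabytes and would not fit comfortably in memory if loaded in one call.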