Function Name Recovery in Stripped Binaries: An Experience Report on Preprocessing, Evaluation, and Reproducibility
Abstract:
Recovering function names from stripped binaries remains a bottleneck in software maintenance, program comprehension, binary debugging, and security analysis. Although recent years have seen a wave of machine-learning-based techniques, the practical state of the art remains difficult to assess. Prior studies are confounded by three recurring problems: (1) a widespread assumption that heavy manual preprocessing is needed to help tokenizers, even though such processing can erase domain-specific semantics or simplify labels in ways that inflate scores; (2) evaluations that are not directly comparable, because tools rely on different function-discovery backends or on permissive metrics such as token-level top-$k$; and (3) severe reproducibility barriers caused by missing artifacts, undocumented bugs, and extreme computational cost.
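To make the preprocessing concern concrete, the following minimal Python sketch shows one kind of manual segmentation and normalization of function-name labels. The helpers `segment` and `normalize` and their rules (split on underscores and case boundaries, lower-case every token, drop purely numeric tokens) are hypothetical illustrations chosen for this example, not the preprocessing configurations evaluated in the paper.

```python
import re

def segment(name: str) -> list[str]:
    """Hypothetical manual segmentation: split on underscores, then on
    camelCase/PascalCase boundaries and digit runs."""
    tokens: list[str] = []
    for part in re.split(r"_+", name):
        # Alternatives, in order: an acronym followed by a capitalized word
        # ("HTTP" in "HTTPHeader"), a (capitalized) lowercase word,
        # an all-caps run, or a run of digits.
        tokens.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return tokens

def normalize(tokens: list[str]) -> list[str]:
    """Hypothetical aggressive normalization: lower-case and drop numeric tokens."""
    return [t.lower() for t in tokens if not t.isdigit()]

print(segment("parseHTTPHeader2"))         # ['parse', 'HTTP', 'Header', '2']
print(normalize(segment("SHA1_Init")))    # ['sha', 'init']
print(normalize(segment("SHA256_Init")))  # ['sha', 'init']
```

Under these assumed rules, `SHA1_Init` and `SHA256_Init` collapse to the same label, which illustrates how normalization can silently discard semantics that matter to an analyst while making the prediction target easier to hit.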
This experience paper reports our effort to systematize and re-evaluate function name recovery through a within-pipeline sensitivity analysis. We reproduce four representative state-of-the-art models on a common dataset within a controlled pipeline, then retrain them under multiple preprocessing configurations to test whether manual segmentation and normalization are necessary. Across models, we find that these hand-engineered strategies often provide limited benefit over modern tokenizers and can silently discard useful semantic information. We further re-evaluate model outputs under stricter, analyst-facing criteria and show that permissive scoring schemes can substantially overstate practical performance. Finally, we document the scalability and reproducibility challenges encountered during reproduction, including missing artifacts, software bugs, and prohibitive resource demands. Based on these findings, we propose a unified evaluation framework and concrete best practices for more robust, comparable, and reproducible research on function name recovery.
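The gap between permissive and strict scoring can be illustrated with a small Python sketch. It contrasts token-level precision/recall/F1 over token multisets, one common permissive scheme in this literature, with exact match of the full token sequence. The function names are hypothetical, and neither function is claimed to be the paper's evaluation framework or its exact definition of token-level top-$k$.

```python
from collections import Counter

def token_prf(pred: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Token-level precision, recall, and F1 computed over token multisets."""
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return precision, recall, 2 * precision * recall / (precision + recall)

def exact_match(pred: list[str], gold: list[str]) -> bool:
    """Strict criterion: the predicted token sequence must equal the reference."""
    return pred == gold

gold = ["read", "config", "file"]
pred = ["write", "config", "file"]
p, r, f1 = token_prf(pred, gold)
print(round(f1, 3), exact_match(pred, gold))  # 0.667 False
```

Here a prediction with the opposite meaning (`write` instead of `read`) still earns two thirds of the token-level credit, while the strict criterion rejects it; this is the kind of discrepancy an analyst-facing evaluation is meant to expose.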
Files (2.9 GB total), including README.md:

| MD5 checksum | Size |
|---|---|
| 454ae69ca0e7f30c72cdb1763b9893f0 | 2.8 kB |
| e41e569413be5d40bbc381e6d8ac16e5 | 15.4 kB |
| 91ebff236f09fc86110b42221dff80de | 47.8 MB |
| 7921a06212f314abbb4b169de528ca80 | 715.6 MB |
| 8c43ef12662fd12de334a8ac15533f6c | 2.1 GB |
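Each file in this record is listed with an MD5 checksum. The following minimal Python sketch checks a downloaded copy against its listed value using only the standard-library `hashlib` module. The helper names `md5sum` and `verify` are illustrative, and because the listing does not preserve file names, the local path must be supplied by the user.

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks
    so multi-gigabyte archives do not have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()
```

For example, `verify("path/to/downloaded_file", "8c43ef12662fd12de334a8ac15533f6c")` (the path is a placeholder) returns `True` only if the local copy matches the 2.1 GB entry. Chunked reading keeps memory use constant regardless of file size.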