Published March 26, 2026 | Version v1
Dataset Open

Function Name Recovery in Stripped Binaries: An Experience Report on Preprocessing, Evaluation, and Reproducibility

Authors/Creators

Description

Abstract:


Recovering function names from stripped binaries remains a bottleneck in software maintenance, program comprehension, binary debugging, and security analysis. Although recent years have seen a wave of machine-learning-based techniques, the practical state of the art remains difficult to assess. Prior studies are confounded by three recurring problems: (1) a widespread assumption that heavy manual preprocessing is needed to help tokenizers, even though such processing can erase domain-specific semantics or simplify labels in ways that inflate scores; (2) evaluations that are not directly comparable, because tools rely on different function-discovery backends or on permissive metrics such as token-level top-k; and (3) severe reproducibility barriers caused by missing artifacts, undocumented bugs, and extreme computational cost.
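To make the gap between permissive and strict scoring concrete, here is a minimal Python sketch. It assumes a simple snake_case/camelCase tokenizer and defines token-level F1, exact match, and a best-of-top-k token F1. All names (`split_name`, `token_f1`, `exact_match`, `topk_token_f1`) are hypothetical and not taken from the released artifacts, and the top-k variant is only one plausible reading of "token-level top-k", not the definition used by any particular tool.

```python
import re
from collections import Counter


def split_name(name: str) -> list[str]:
    """Split a function name into lowercase tokens (assumed tokenizer, for illustration)."""
    # Insert an underscore at lower/digit -> upper transitions, e.g. "getPtr" -> "get_Ptr".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return [t.lower() for t in spaced.split("_") if t]


def token_f1(pred: str, truth: str) -> float:
    """Token-level F1 over bags of tokens; word order is ignored (permissive)."""
    p, t = Counter(split_name(pred)), Counter(split_name(truth))
    overlap = sum((p & t).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(t.values())
    return 2 * precision * recall / (precision + recall)


def exact_match(pred: str, truth: str) -> bool:
    """Strict criterion: the token sequences must be identical, in order."""
    return split_name(pred) == split_name(truth)


def topk_token_f1(candidates: list[str], truth: str, k: int) -> float:
    """Hypothetical permissive variant: best token F1 among the first k candidates."""
    return max((token_f1(c, truth) for c in candidates[:k]), default=0.0)
```

Under these assumptions, `list_free` earns a perfect token-level F1 against `free_list` while failing exact match, and a correct name anywhere in the first k candidates lifts the top-k score to 1.0 regardless of what the first-ranked guess was.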

This experience paper reports our effort to systematize and re-evaluate function-name recovery through a within-pipeline sensitivity analysis. We reproduce four representative state-of-the-art models on a common dataset within a controlled pipeline, then retrain them under multiple preprocessing configurations to test whether manual segmentation and normalization are necessary. Across models, we find that these hand-engineered strategies often provide limited benefit over modern tokenizers and can silently discard useful semantic information. We further re-evaluate model outputs under stricter, analyst-facing criteria and show that permissive scoring schemes can substantially overstate practical performance. Finally, we document the scalability and reproducibility challenges encountered during reproduction, including missing artifacts, software bugs, and prohibitive resource demands. Based on these findings, we propose a unified evaluation framework and concrete best practices for more robust, comparable, and reproducible research on function name recovery.
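The kind of silent information loss described above can be illustrated with a small, hypothetical normalizer. The sketch assumes a pipeline that lowercases names, strips leading and trailing underscores, drops digits, and expands a tiny abbreviation table. `ABBREVIATIONS`, `aggressive_normalize`, and `collisions` are illustrative names, not part of this dataset or of any reproduced model, and the rules are not the specific configurations evaluated in the paper.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; real pipelines typically use larger, hand-curated lists.
ABBREVIATIONS = {"ptr": "pointer", "buf": "buffer", "len": "length"}


def aggressive_normalize(name: str) -> str:
    """Hypothetical hand-engineered normalizer (illustration only)."""
    name = name.lower().strip("_")   # drop case and leading/trailing underscores
    name = re.sub(r"\d+", "", name)  # drop digits, e.g. "sha256" -> "sha"
    tokens = [t for t in name.split("_") if t]
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]  # expand known abbreviations
    return "_".join(tokens)


def collisions(names: list[str]) -> dict[str, list[str]]:
    """Group distinct raw names that become identical after normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for n in names:
        groups[aggressive_normalize(n)].append(n)
    return {label: raw for label, raw in groups.items() if len(raw) > 1}
```

For example, `collisions(["sha256_update", "sha1_update", "__init_buf", "init_buf", "get_len"])` merges `sha256_update` with `sha1_update` and `__init_buf` with `init_buf`. Once two labels are merged, a prediction of either name is scored as correct, which is one way label simplification can both discard semantics and inflate scores.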

Files

README.md

Files (2.9 GB)

Size        Checksum
2.8 kB      md5:454ae69ca0e7f30c72cdb1763b9893f0
15.4 kB     md5:e41e569413be5d40bbc381e6d8ac16e5
47.8 MB     md5:91ebff236f09fc86110b42221dff80de
715.6 MB    md5:7921a06212f314abbb4b169de528ca80
2.1 GB      md5:8c43ef12662fd12de334a8ac15533f6c