Published April 21, 2026 | Version v3
Software Open

Analyzing Bytes: Pre-Disassembly Static Binary Analysis

Authors/Creators

Contributors

Project member:

  • 1. Stony Brook University

Description

Binary code analysis plays a central role in software security, performance optimization, reverse engineering, and many other applications. Existing techniques must first disassemble a binary into functions in assembly code before any analysis can be performed. However, disassembly and function identification have proven to be major challenges for complex variable-length instruction sets such as x86. A recent trend has been to use static analysis to improve the accuracy of these tasks. This raises a chicken-and-egg problem: a disassembly is needed for static analysis, but static analysis is needed for accurate disassembly! We overcome this problem by developing a novel static analysis approach that can operate before committing to a disassembly. Our analysis operates on the output of exhaustive disassembly, which treats every offset in a binary as a potential instruction start and constructs what is known as a superset control-flow graph (CFG). The central technical challenge in analyzing this CFG is that it mixes legitimate instructions with unintended ones, causing analysis results from invalid code paths to pollute those of legitimate ones. To overcome this challenge, we begin with a key new insight: if we focus on backward analyses, we can ensure the accuracy of analysis results at intended instructions even though we have no idea where these intended instructions are! Moreover, our analysis runs in time linear in the size of the binary. For the task it is intended for, namely providing static analysis results for any offset in a binary, it is orders of magnitude faster than previous techniques. Finally, our evaluation results show that these performance benefits are achieved without sacrificing analysis accuracy.
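To make the idea of a backward analysis over a superset CFG concrete, here is a minimal sketch in C++. It is not the code from this record's repository: the toy instruction set, the `Insn` struct, and the `decode` and `liveness` functions are all hypothetical and were invented for illustration, and the simple worklist loop makes no attempt to reproduce the linear-time bound claimed above. The sketch decodes a candidate instruction at every byte offset and computes register liveness, a classic backward analysis, for each of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy ISA (NOT x86), for illustration only:
//   0x01     NOP    1 byte
//   0x02 r   USE r  2 bytes, reads register r (r < 8)
//   0x03 r   DEF r  2 bytes, writes register r (r < 8)
//   0x04 d   JMP d  2 bytes, signed 8-bit displacement from the next instruction
//   0x05     RET    1 byte
struct Insn {
    std::uint8_t use = 0;            // bitmask of registers read
    std::uint8_t def = 0;            // bitmask of registers written
    std::vector<std::size_t> succ;   // successor offsets in the superset CFG
};

// Decode the candidate instruction starting at `off`; nullopt if the bytes are invalid.
std::optional<Insn> decode(const std::vector<std::uint8_t>& code, std::size_t off) {
    const std::size_t n = code.size();
    if (off >= n) return std::nullopt;
    Insn in;
    switch (code[off]) {
    case 0x01: in.succ = {off + 1}; return in;
    case 0x05: return in;            // RET has no successors
    case 0x02:
    case 0x03: {
        if (off + 1 >= n || code[off + 1] >= 8) return std::nullopt;
        const auto bit = static_cast<std::uint8_t>(1u << code[off + 1]);
        (code[off] == 0x02 ? in.use : in.def) = bit;
        in.succ = {off + 2};
        return in;
    }
    case 0x04: {
        if (off + 1 >= n) return std::nullopt;
        const long target = static_cast<long>(off) + 2 +
                            static_cast<std::int8_t>(code[off + 1]);
        if (target < 0 || target > static_cast<long>(n)) return std::nullopt;
        in.succ = {static_cast<std::size_t>(target)};
        return in;
    }
    default: return std::nullopt;
    }
}

// Backward liveness for EVERY offset. Entry i of the result is the set of
// registers live on entry to the candidate instruction at offset i.
// Invalid offsets and the end of the buffer (index n) keep the empty set.
std::vector<std::uint8_t> liveness(const std::vector<std::uint8_t>& code) {
    const std::size_t n = code.size();
    std::vector<std::optional<Insn>> cfg(n);
    std::vector<std::vector<std::size_t>> preds(n + 1);
    std::vector<std::size_t> work;
    for (std::size_t off = 0; off < n; ++off) {
        cfg[off] = decode(code, off);
        if (!cfg[off]) continue;
        work.push_back(off);
        for (std::size_t s : cfg[off]->succ) {
            assert(s <= n);          // decode() only emits in-range successors
            preds[s].push_back(off);
        }
    }
    std::vector<std::uint8_t> live(n + 1, 0);
    while (!work.empty()) {          // popping from the back visits high offsets first
        const std::size_t off = work.back();
        work.pop_back();
        const Insn& in = *cfg[off];
        std::uint8_t out = 0;
        for (std::size_t s : in.succ) out |= live[s];
        const auto now = static_cast<std::uint8_t>(in.use | (out & ~in.def));
        if (now != live[off]) {      // facts only flow from successors to predecessors
            live[off] = now;
            for (std::size_t p : preds[off]) work.push_back(p);
        }
    }
    return live;
}
```

As a usage example, take the bytes `03 02 02 01 05`. The intended instruction stream is DEF r2 at offset 0, USE r1 at offset 2, and RET at offset 4, but offset 1 also decodes as an unintended USE r2. Calling `liveness` on these bytes yields the mask `0x02` (only r1 live) at offset 0 and `0x04` (only r2 live) at offset 1. The result at the intended offset 0 depends only on its successor at offset 2, so the spurious instruction at offset 1 cannot affect it. In this toy setting, that is the property the description attributes to backward analyses.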

Files

README.md

Files (19.9 GB)

Name Size
md5:853af18ab6b0b0d3199e43b62a3d9b7f 6.0 GB
md5:7aed1841d4e131605b66870228f719a3 13.6 GB
md5:14bcce03e0a4dd41e227507ea0110cf2 39.0 kB
md5:d2e0b645e774b431b475f62cb97a1f02 235.7 MB

Additional details

Software

Repository URL
https://github.com/nhuhuan/sba
Programming language
C++
Development Status
Active