Analyzing Bytes: Pre-Disassembly Static Binary Analysis
Authors/Creators
Description
Binary code analysis plays a central role in numerous applications in software security, performance optimization, reverse engineering, and so on. Existing techniques must first disassemble a binary into functions in assembly code before any analysis can be performed. However, disassembly and function identification have proven to be major challenges for complex variable-length instruction sets such as x86. A recent trend has been to use static analysis to improve the accuracy of these tasks. This raises a chicken-and-egg problem: a disassembly is needed for static analysis, but a static analysis is needed for accurate disassembly! We overcome this problem by developing a novel static analysis approach that can operate before committing to a disassembly. Our analysis operates on the output of exhaustive disassembly, which treats every possible offset in a binary as the start of an instruction, and constructs what is known as a superset control-flow graph (CFG). The central technical challenge in analyzing this CFG is that it mixes legitimate instructions with unintended ones, so analysis results from invalid code paths pollute those of legitimate ones. To overcome this challenge, we begin with a key new insight: if we focus on backward analyses, we can ensure the accuracy of analysis results at intended instructions even though we have no idea where these intended instructions are! Moreover, our analysis runs in time that is linear in the size of the binary. For the task it is intended for, namely, providing static analysis results for any offset in a binary, it is orders of magnitude faster than previous techniques. Finally, our evaluation results show that these performance benefits are achieved without sacrificing analysis accuracy.
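To make the idea concrete, the following is a minimal sketch of exhaustive disassembly followed by a backward analysis over a superset CFG. It is not taken from the SBA repository: the five-opcode toy instruction set, the names `Insn`, `superset_decode`, and `live_in`, and the choice of register liveness as the backward analysis are all assumptions made for illustration. The fixpoint loop is a simple round-robin iteration, not the linear-time algorithm described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy ISA (not x86, and not the representation used by SBA):
//   0x00      NOP     1 byte,  falls through
//   0x01 r    DEF r   2 bytes, writes register r & 7
//   0x02 r    USE r   2 bytes, reads register r & 7
//   0x03 d    JMP d   2 bytes, jumps to next offset + (int8_t)d
//   0x04      RET     1 byte,  no successors
struct Insn {
    bool valid = false;
    std::uint8_t def = 0, use = 0;    // register bitmasks
    std::vector<std::size_t> succs;   // successor offsets in the superset CFG
};

// Exhaustive disassembly: decode a candidate instruction at every offset.
std::vector<Insn> superset_decode(const std::vector<std::uint8_t>& code) {
    const std::size_t n = code.size();
    std::vector<Insn> insns(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t op = code[i];
        const std::size_t len = (op == 0x00 || op == 0x04) ? 1 : 2;
        if (op > 0x04 || i + len > n) continue;   // undecodable candidate
        Insn& in = insns[i];
        in.valid = true;
        const std::size_t next = i + len;
        if (op == 0x01) in.def = static_cast<std::uint8_t>(1u << (code[i + 1] & 7));
        if (op == 0x02) in.use = static_cast<std::uint8_t>(1u << (code[i + 1] & 7));
        if (op == 0x03) {
            const long target = static_cast<long>(next) +
                                static_cast<std::int8_t>(code[i + 1]);
            if (target >= 0 && target < static_cast<long>(n))
                in.succs.push_back(static_cast<std::size_t>(target));
        } else if (op != 0x04 && next < n) {
            in.succs.push_back(next);             // fall-through edge
        }
    }
    return insns;
}

// Backward liveness over the superset CFG:
//   live_in[i] = use[i] | (union of live_in[s] for successors s, minus def[i])
// Facts flow only from successors to predecessors, so the value at an offset
// depends solely on the instructions reachable from it.
std::vector<std::uint8_t> live_in(const std::vector<Insn>& insns) {
    std::vector<std::uint8_t> live(insns.size(), 0);
    bool changed = true;
    while (changed) {                             // iterate to a fixpoint
        changed = false;
        for (std::size_t i = insns.size(); i-- > 0;) {
            if (!insns[i].valid) continue;
            std::uint8_t out = 0;
            for (std::size_t s : insns[i].succs) out |= live[s];
            const std::uint8_t in = static_cast<std::uint8_t>(
                insns[i].use | (out & ~insns[i].def));
            if (in != live[i]) { live[i] = in; changed = true; }
            assert((live[i] & insns[i].use) == insns[i].use);
        }
    }
    return live;
}
```

For the byte sequence `01 00 02 02 04`, the intended instructions sit at offsets 0, 2, and 4 (`DEF r0`, `USE r2`, `RET`), while offsets 1 and 3 decode to unintended candidates (`NOP` and `USE r4`). The unintended `NOP` at offset 1 falls through into the intended instruction at offset 2, yet the liveness result at offset 2 is unaffected by it, because a backward analysis never propagates facts from a predecessor to its successor. A forward analysis would instead merge whatever state offset 1 carries into offset 2.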
Files
README.md
Additional details
Software
- Repository URL: https://github.com/nhuhuan/sba
- Programming language: C++
- Development Status: Active