Published June 22, 2024 | Version v1
Model | Restricted

Graph-Driven Indirect Call Prediction in Binary Code with Cross-Reference Augmented Control Flow Representations


Description

Abstract:

Binary code analysis has extensive downstream applications such as binary rewriting, recompilation, and software security. A significant challenge in maintaining the integrity of static analysis is resolving indirect call targets. This difficulty arises because the target operand of an indirect call instruction (e.g., call rax) is typically not known until runtime, which leaves the inter-procedural control flow graph (CFG) incomplete. Traditional solutions suffer from low accuracy or poor scalability, prompting recent work to explore machine learning (ML) approaches. However, the quality of ground-truth data critically affects the accuracy of ML models. In this paper, we present NeuCall, a novel approach for resolving indirect calls using graph neural networks (GNNs).
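Recovering the missing CFG edges can be framed as link prediction between an indirect call site and candidate callee functions. The Python sketch below illustrates only that framing, under stated assumptions: the embeddings are hand-written stand-ins for what a trained GNN encoder would output, and the dot-product scorer with a 0.5 threshold is an assumption, not necessarily how NeuCall scores candidates. All function and node names are hypothetical.

```python
import math

# Hypothetical embeddings: in a real pipeline these would come from a trained
# GNN encoder; here they are hand-written vectors for illustration only.
Embedding = list[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def link_score(call_site: Embedding, callee: Embedding) -> float:
    """Dot-product score of a (call site, candidate callee) pair, mapped to (0, 1)."""
    return sigmoid(sum(a * b for a, b in zip(call_site, callee)))


def predict_targets(call_site: Embedding,
                    candidates: dict[str, Embedding],
                    threshold: float = 0.5) -> list[str]:
    """Names of candidate functions whose score reaches the (assumed) threshold."""
    return sorted(name for name, emb in candidates.items()
                  if link_score(call_site, emb) >= threshold)


def add_call_edges(cfg_edges: set[tuple[str, str]], call_site: str,
                   targets: list[str]) -> set[tuple[str, str]]:
    """Return a new edge set with one call edge per predicted target added."""
    return cfg_edges | {(call_site, t) for t in targets}


if __name__ == "__main__":
    site = [1.0, 0.5]  # embedding of the block containing `call rax`
    funcs = {"handler_a": [2.0, 1.0], "handler_b": [-1.0, -2.0], "log_msg": [0.1, -0.3]}
    targets = predict_targets(site, funcs)
    print(targets)  # ['handler_a']
    print(add_call_edges({("main", "dispatch")}, "dispatch", targets))
```

Only `handler_a` clears the threshold here (score about 0.92); `log_msg` falls just short (about 0.49), which shows how the threshold trades missed targets against spurious edges.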


Repository Structure


We have uploaded both the source code and the dataset used in our experiments. The repository layout is shown below.
.
├── batch
├── binary_collection
├── database
├── env
├── graph_generation
├── model
├── model.checkpoint
├── predictor.checkpoint
├── README.md
└── util

Dataset:

  • Binaries: binary.zip
  • Graphs: graph.zip


Detailed Workflow


Step 1: Environment Setup

  • To replicate our experiments, first set up the environment using the provided YAML files, so that all dependencies and configurations match our setup.

Step 2: Binary Collection

  • Using modified versions of the Typro_CFI and GHCC tools, we compile source code from Arch Linux and GitHub and annotate the resulting binaries with ground-truth information to build our binary dataset.
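The annotation format is not described here. As a hedged illustration of what per-call-site ground truth can look like, the Python sketch below stores, for each indirect call instruction, the binary it belongs to and the addresses of the functions it may call, and round-trips the records through JSON Lines. The `IndirectCallLabel` class, its field names, and the example addresses are hypothetical, not the schema produced by the modified Typro_CFI/GHCC pipeline.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IndirectCallLabel:
    """Hypothetical ground-truth record for a single indirect call site."""
    binary: str                                       # path of the compiled binary
    call_site: int                                    # address of the indirect call instruction
    targets: list[int] = field(default_factory=list)  # addresses of its valid callees


def dump_jsonl(labels: list[IndirectCallLabel]) -> str:
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(asdict(label)) for label in labels)


def load_jsonl(text: str) -> list[IndirectCallLabel]:
    """Parse the output of dump_jsonl, skipping blank lines."""
    return [IndirectCallLabel(**json.loads(line))
            for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    labels = [IndirectCallLabel("bin/example", 0x401A2C, [0x401000, 0x401080])]
    text = dump_jsonl(labels)
    assert load_jsonl(text) == labels  # lossless round trip
    print(text)
```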

Step 3: Graph Generation

  • Convert the collected binary information into heterogeneous graphs that capture control-flow and data-flow information. These graphs serve as the input to our GNN model.
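To make "heterogeneous graph" concrete: nodes carry a type, and each edge list is keyed by a (source type, relation, destination type) triple, the canonical edge-type convention also used by libraries such as DGL and PyTorch Geometric. The Python sketch below is illustrative only; the node types (`function`, `block`) and relations (`contains`, `flow`, `xref`) are assumptions loosely inspired by the cross-reference augmented CFG in the title, not the exact schema produced by graph_generation.

```python
from collections import defaultdict

EdgeType = tuple[str, str, str]  # (source node type, relation, destination node type)


class HeteroGraph:
    """Minimal typed graph: each node has a type, each edge list is keyed by EdgeType."""

    def __init__(self) -> None:
        self.node_type: dict[str, str] = {}
        self.edges: defaultdict[EdgeType, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, node: str, ntype: str) -> None:
        self.node_type[node] = ntype

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # The canonical edge type is derived from the endpoint node types.
        key = (self.node_type[src], relation, self.node_type[dst])
        self.edges[key].append((src, dst))

    def edge_types(self) -> list[EdgeType]:
        return sorted(self.edges)


def example_graph() -> HeteroGraph:
    """Toy graph: `main` has two blocks, and bb0 takes the address of `handler`."""
    g = HeteroGraph()
    for fn in ("main", "handler"):
        g.add_node(fn, "function")
    for bb in ("bb0", "bb1"):
        g.add_node(bb, "block")
    g.add_edge("main", "contains", "bb0")
    g.add_edge("main", "contains", "bb1")
    g.add_edge("bb0", "flow", "bb1")      # intra-procedural control flow
    g.add_edge("bb0", "xref", "handler")  # e.g. `lea rax, [handler]` references a function address
    return g


if __name__ == "__main__":
    print(example_graph().edge_types())
    # [('block', 'flow', 'block'), ('block', 'xref', 'function'), ('function', 'contains', 'block')]
```

Keying edges by typed triples lets a GNN keep a separate set of weights per relation, which is the usual reason for using a heterogeneous rather than a homogeneous graph.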

Step 4: Model Training

  • Train the GNN model on the generated graphs. To optimize performance, experiment with different model settings, such as the model size, the number of layers, and the types of graph elements included.
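One way to organize such experiments is a grid over the three dimensions named in this step. The Python sketch below enumerates hypothetical configurations; the hidden sizes, layer counts, and edge-type subsets are placeholder values rather than the settings used for NeuCall, and the training routine itself is left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ModelConfig:
    hidden_dim: int              # model size (width of node embeddings)
    num_layers: int              # number of message-passing layers
    edge_types: frozenset[str]   # graph elements kept in the input graph


def config_grid(hidden_dims, layer_counts, edge_type_sets) -> list[ModelConfig]:
    """Cartesian product of the three experiment dimensions."""
    return [ModelConfig(h, n, frozenset(e))
            for h, n, e in product(hidden_dims, layer_counts, edge_type_sets)]


if __name__ == "__main__":
    grid = config_grid(
        hidden_dims=[64, 128],   # placeholder values
        layer_counts=[2, 3],     # placeholder values
        edge_type_sets=[{"flow", "contains"}, {"flow", "contains", "xref"}],
    )
    print(len(grid))  # 8
    for cfg in grid:
        print(cfg)  # each config would be passed to a (hypothetical) training routine
```

Storing `edge_types` as a frozenset keeps each configuration hashable, so configurations can double as dictionary keys when collecting results.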

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.