Is Call Graph Pruning Really Effective? An Empirical Re-evaluation

Anonymous

doi:10.5281/zenodo.17204367

Published September 25, 2025 | Version v2

Dataset Open

This artifact contains the dataset, results, and source code associated with the paper. It comprises three archives and one spreadsheet:

datasets.zip

This archive includes the datasets produced by this study.

Directory contents:

NJR1
- Automatic
  - list of programs as well as false and true labels.
- Manual
  - manual.csv file (final labels used in the paper)
  - recorded_evidence.csv (contains call chain and allocation trace evidence)
Xcorpus
- Automatic
  - list of programs as well as false and true labels.
- Manual
  - list of programs
    - manual.csv file (final labels used in the paper)
    - recorded_evidence.csv (contains call chain and allocation trace evidence)
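As a starting point for exploring the manual labels, the following minimal sketch locates every manual.csv and recorded_evidence.csv under an extracted copy of datasets.zip and reads only the header row of each file. The extraction directory passed as `root` and the helper names are assumptions made for illustration; the column names of the CSV files are not documented here, so the sketch does not assume any.

```python
import csv
from pathlib import Path

# File names taken from the directory listing above.
LABEL_FILES = ("manual.csv", "recorded_evidence.csv")


def find_label_files(root):
    """Return {file name: sorted list of matching paths} found under root."""
    root = Path(root)
    return {name: sorted(root.rglob(name)) for name in LABEL_FILES}


def read_header(path):
    """Return the first row of a CSV file, or an empty list if the file is empty."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

Calling `find_label_files("datasets")` on a hypothetical extraction directory named `datasets` and then `read_header` on each returned path is a quick way to discover the actual columns before loading the files with a data-analysis library.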

experimental_data.zip

This archive includes all data generated in this study.

Directory contents:

generated_cgs/ – Automatically generated static call graphs and their associated labels.
feature_vectors/ – Structured features and token-based (semantic) features, the latter extracted using pre-trained CodeBERT and CodeT5 models.
ML_results/ – Contains all output files, including final results and plots used in the paper.

source-code.zip

This archive includes all scripts used to generate the dataset and conduct experiments.

Directory contents:

static_cg_generation/ – Scripts for running WALA, DOOP, and OPAL with multiple configurations to generate static call graphs. Each tool’s settings can be found under its config/ subdirectory.
dataset_generation/ – Scripts for dataset construction:
- manual_sampling/ – Stratified sampling of call graph edges.
- semantic_features/ – Extraction of raw and fine-tuned semantic features.
- structured_features/ – Generation of structured graph features.

approach/ – Machine learning experiments and evaluation pipelines described in the paper.
paper/ – Scripts used to generate plots and visualizations presented in the paper.

Each directory includes a README file explaining its structure and usage.
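To illustrate what stratified sampling of call graph edges (the purpose of manual_sampling/) can look like, here is a minimal sketch. It is not the sampling script shipped in the archive: the function name, the edge representation, the `stratum_of` callback, and the fixed per-stratum sample size are all assumptions. The actual strata and sample sizes are defined by the scripts and the paper.

```python
import random
from collections import defaultdict


def stratified_sample(edges, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` edges from each stratum.

    `stratum_of` is a hypothetical callback mapping an edge to its stratum key.
    Keys must be mutually comparable because strata are visited in sorted order
    to keep the result reproducible for a given seed.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for edge in edges:
        strata[stratum_of(edge)].append(edge)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Small strata contribute all of their members.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

For example, with edges stored as hypothetical `(caller, callee, tool)` tuples, `stratified_sample(edges, lambda e: e[2], per_stratum=50)` would draw at most 50 edges per tool.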

Configurations.xlsx

This file contains the configurations we used for each tool in this study.

File contents:

WALA_full_configuration : all the selected configurations for WALA.
Doop_full_configuration : all the selected configurations for DOOP.
Opal_full_configuration : all the selected configurations for OPAL.

WALA_partial_order : pairs of WALA configurations, related by a partial order, that we used to generate false labels.
Doop_partial_order : pairs of DOOP configurations, related by a partial order, that we used to generate false labels.
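The sketch below shows one way such configuration pairs could be turned into false labels. It rests on an assumption that this description does not confirm: that each pair orders a more precise configuration before a less precise one, so that edges reported only by the less precise configuration become candidates for the false label. The function name and the representation of a call graph as a set of `(caller, callee)` tuples are likewise hypothetical; the paper and the scripts in the source-code archive define the actual procedure.

```python
def false_label_candidates(graphs, ordered_pairs):
    """Collect edges reported by the less precise configuration of each pair only.

    graphs: dict mapping a configuration name to its set of (caller, callee) edges.
    ordered_pairs: iterable of (more_precise, less_precise) configuration names
                   (assumed ordering, see the note above).
    """
    candidates = set()
    for more_precise, less_precise in ordered_pairs:
        # Set difference: edges that the more precise configuration ruled out.
        candidates |= graphs[less_precise] - graphs[more_precise]
    return candidates
```

Under this assumption, an edge that appears in both graphs of a pair is never labeled by that pair, since the two configurations agree on it.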

This artifact enables full reproducibility of the dataset creation, feature extraction, and experimental results discussed in the paper.

Files (13.5 GB)

Name	Size	MD5
Configurations.xlsx	109.2 kB	1d7c70f54b7af2a8afd4812c4f72e592
datasets.zip	3.3 MB	1d2c21aff67cd3394790d1971ab73c00
experimental_data.zip	13.5 GB	8e6573bc3c3ea8bbe1c8aeaa8e5b1812
source-code.zip	63.2 MB	099ad8e439ba321b12f30a7f3e5570a9
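Because experimental_data.zip is large, it can be worth checking the downloads against the MD5 digests listed above before extracting them. The following minimal sketch does this with the Python standard library. The digests are copied from the file listing; the function names and the default download directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "Configurations.xlsx": "1d7c70f54b7af2a8afd4812c4f72e592",
    "datasets.zip": "1d2c21aff67cd3394790d1971ab73c00",
    "experimental_data.zip": "8e6573bc3c3ea8bbe1c8aeaa8e5b1812",
    "source-code.zip": "099ad8e439ba321b12f30a7f3e5570a9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(download_dir="."):
    """Map each listed file name to 'ok', 'mismatch', or 'missing'."""
    status = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of(path) == expected:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status
```

Calling `verify_downloads("path/to/downloads")` returns one status per file; any entry other than "ok" suggests the file should be downloaded again.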

Additional details

Other: artifact

Submitted: 2025-07-17

	All versions	This version
Views	157	36
Downloads	78	19
Data volume	596.3 GB	40.7 GB
