Is Call Graph Pruning Really Effective? An Empirical Re-evaluation
Description
This artifact contains the datasets, results, and source code associated with the paper. It is divided into three archives plus a configuration spreadsheet:
datasets.zip
This archive includes the datasets produced by this study.
Directory contents:
- NJR1
  - Automatic
    - list of programs, as well as false and true labels
  - Manual
    - manual.csv file (final labels used in the paper)
    - recorded_evidence.csv (contains call chain and allocation trace evidence)
- Xcorpus
  - Automatic
    - list of programs, as well as false and true labels
  - Manual
    - list of programs
    - manual.csv file (final labels used in the paper)
    - recorded_evidence.csv (contains call chain and allocation trace evidence)
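The recorded_evidence.csv files hold the call chain evidence used during manual labeling; their exact format is not described here. To illustrate what a call chain witness is, here is a minimal sketch, assuming a hypothetical in-memory call graph (a dict from caller signature to callee signatures) rather than the artifact's file format. A breadth-first search returns a shortest chain of calls from an entry point to a given method, or None if the method is unreachable from that entry point.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical representation: caller method signature -> set of callee signatures.
CallGraph = Dict[str, Set[str]]


def find_call_chain(cg: CallGraph, entry: str, target: str) -> Optional[List[str]]:
    """Return a shortest chain of methods from `entry` to `target`, or None.

    BFS records each method's predecessor so the chain can be rebuilt
    once `target` is first reached.
    """
    if entry == target:
        return [entry]
    parent: Dict[str, str] = {}
    seen = {entry}
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        for callee in sorted(cg.get(caller, ())):  # sorted for deterministic output
            if callee in seen:
                continue
            seen.add(callee)
            parent[callee] = caller
            if callee == target:
                # Walk predecessors back to the entry point, then reverse.
                chain = [target]
                while chain[-1] != entry:
                    chain.append(parent[chain[-1]])
                return chain[::-1]
            queue.append(callee)
    return None
```

For example, with `{"Main.main()": {"A.run()"}, "A.run()": {"C.process()"}}`, asking for `C.process()` from `Main.main()` yields the chain `Main.main() -> A.run() -> C.process()`.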
experimental_data.zip
This archive includes all the generated data in this study.
Directory contents:
- generated_cgs/ – Automatically generated static call graphs and their associated labels.
- feature_vectors/ – Structured graph features, plus token-based semantic features extracted with the pre-trained CodeBERT and CodeT5 models.
- ML_results/ – All output files, including the final results and plots used in the paper.
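The structured features in feature_vectors/ are produced by the structured_features/ scripts, and the exact feature set is not listed here. As a hedged illustration of what a structured (graph-based) edge feature can look like, the sketch below computes a hypothetical degree-based vector for each call graph edge; the function name and the choice of features are assumptions, not the study's definition.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (caller, callee) method signatures


def structural_features(edges: Iterable[Edge]) -> Dict[Edge, List[int]]:
    """Map each edge to a small vector of degree-based features.

    Hypothetical feature set:
    [caller out-degree, caller in-degree, callee in-degree, callee out-degree].
    Duplicate edges are counted once.
    """
    unique = set(edges)
    out_deg = Counter(src for src, _ in unique)
    in_deg = Counter(dst for _, dst in unique)
    # Counter returns 0 for methods with no incoming/outgoing edges.
    return {
        (src, dst): [out_deg[src], in_deg[src], in_deg[dst], out_deg[dst]]
        for src, dst in unique
    }
```

In a full pipeline such a vector would be concatenated with the semantic embedding of the edge before training; how the study combines them is documented in the archive itself.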
source_code.zip
This archive includes all scripts used to generate the dataset and conduct experiments.
Directory contents:
- static_cg_generation/ – Scripts for running WALA, Doop, and OPAL with multiple configurations to generate static call graphs. Each tool's settings can be found in its config/ subdirectory.
- dataset_generation/ – Scripts for dataset construction:
  - manual_sampling/ – Stratified sampling of call graph edges.
  - semantic_features/ – Extraction of raw and fine-tuned semantic features.
  - structured_features/ – Generation of structured graph features.
- approach/ – Machine learning experiments and evaluation pipelines described in the paper.
- paper/ – Scripts used to generate the plots and visualizations presented in the paper.
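manual_sampling/ draws a stratified sample of call graph edges for manual labeling. The strata actually used are defined in those scripts; the sketch below only shows the general technique, assuming each edge comes with a hypothetical stratum key (for example, the program it belongs to). It samples the same fraction from every stratum and keeps at least one edge per stratum so that small strata are not lost.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    items: Sequence[T], strata: Sequence[Hashable], fraction: float, seed: int = 0
) -> List[T]:
    """Draw `fraction` of the items from every stratum (at least one each)."""
    if len(items) != len(strata):
        raise ValueError("items and strata must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for item, stratum in zip(items, strata):
        groups[stratum].append(item)
    rng = random.Random(seed)  # seeded for reproducible samples
    sample: List[T] = []
    for stratum in sorted(groups, key=repr):  # deterministic stratum order
        members = groups[stratum]
        k = max(1, round(len(members) * fraction))  # never exceeds len(members)
        sample.extend(rng.sample(members, k))
    return sample
```

With 80 edges in one stratum and 20 in another, a fraction of 0.1 yields 8 and 2 edges respectively, preserving the proportions of the population.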
Each directory includes a README file explaining its structure and usage.
Configurations.xlsx
This file contains the configurations we used for each tool in this study.
File contents:
- WALA_full_configuration: all selected configurations for WALA.
- Doop_full_configuration: all selected configurations for Doop.
- Opal_full_configuration: all selected configurations for OPAL.
- WALA_partial_order: pairs of configurations used to generate false labels from partial orders for WALA.
- Doop_partial_order: pairs of configurations used to generate false labels from partial orders for Doop.
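The *_partial_order entries list configuration pairs used to derive false labels. One plausible reading, which we assume here and which this description does not spell out, is that each pair is (less precise, more precise): an edge reported by the less precise configuration but absent from the more precise one is labeled false. A minimal sketch, with hypothetical configuration names and a set-of-edges representation:

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def false_labels_from_partial_order(
    call_graphs: Dict[str, Set[Edge]],
    pairs: Iterable[Tuple[str, str]],
) -> Set[Edge]:
    """Collect edges labeled false using (less_precise, more_precise) pairs.

    Assumption: the more precise configuration only removes spurious edges,
    so any edge it drops relative to the less precise one is treated as false.
    """
    false_edges: Set[Edge] = set()
    for coarse, fine in pairs:
        false_edges |= call_graphs[coarse] - call_graphs[fine]  # set difference
    return false_edges
```

For instance, if a hypothetical "cha" graph contains ("m", "b") but the "0cfa" graph paired with it does not, that edge is labeled false.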
This artifact enables full reproducibility of the dataset creation, feature extraction, and experimental results discussed in the paper.
Additional details
Identifiers
- Other: artifact
Dates
- Submitted: 2025-07-17