There is a newer version of the record available.

Published September 25, 2025 | Version v2
Dataset Open

Is Call Graph Pruning Really Effective? An Empirical Re-evaluation

Authors/Creators

Description

This artifact contains the datasets, results, and source code associated with the paper. It is divided into three archives, accompanied by a spreadsheet of tool configurations:

datasets.zip

This archive includes the datasets produced by this study.

Directory contents:

  • NJR1
    • Automatic
      • list of programs with their true and false labels.
    • Manual
      • manual.csv file (final labels used in the paper)
      • recorded_evidence.csv (contains call chain and allocation trace evidence)
  • Xcorpus
    • Automatic
      • list of programs with their true and false labels.
    • Manual
      • list of programs 
        • manual.csv file (final labels used in the paper)
        • recorded_evidence.csv (contains call chain and allocation trace evidence)
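
As a first look at the manual labels, the following minimal sketch walks an extracted copy of datasets.zip and reports the header row and row count of every manual.csv and recorded_evidence.csv it finds. The extraction directory name `datasets` and the presence of a header row in each file are assumptions; the column names are not documented in this record, so the script only prints whatever it finds.

```python
import csv
from pathlib import Path

# File names taken from the directory listing above.
LABEL_FILES = ("manual.csv", "recorded_evidence.csv")


def summarize_label_files(root):
    """Return {relative path: (header row, number of data rows)} per label file.

    Assumes each CSV starts with a header row (not confirmed by the record).
    """
    root = Path(root)
    summary = {}
    for path in sorted(root.rglob("*.csv")):
        if path.name not in LABEL_FILES:
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])          # first row, or [] if file is empty
            count = sum(1 for _ in reader)     # remaining rows, streamed
        summary[path.relative_to(root).as_posix()] = (header, count)
    return summary


if __name__ == "__main__":
    # "datasets" is a hypothetical directory where datasets.zip was extracted.
    for name, (header, count) in summarize_label_files("datasets").items():
        print(f"{name}: {count} rows, columns = {header}")
```

Streaming the rows instead of loading them keeps memory use flat even if an evidence file turns out to be large.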

 

experimental_data.zip

This archive includes all the generated data in this study.

Directory contents:

  • generated_cgs/ – Automatically generated static call graphs and their associated labels.

  • feature_vectors/ – Structured and token-based features extracted using pre-trained CodeBERT and CodeT5 models.

  • ML_results/ – Contains all output files, including final results and plots used in the paper.

 

 

source_code.zip

This archive includes all scripts used to generate the dataset and conduct experiments.

Directory contents:

  • static_cg_generation/ – Scripts for running WALA, Doop, and OPAL with multiple configurations to generate static call graphs. Each tool’s settings can be found under its config/ subdirectory.

  • dataset_generation/ – Scripts for dataset construction:

    • manual_sampling/ – Stratified sampling of call graph edges.

    • semantic_features/ – Extraction of raw and fine-tuned semantic features.

    • structured_features/ – Generation of structured graph features.

  • approach/ – Machine learning experiments and evaluation pipelines described in the paper.

  • paper/ – Scripts used to generate plots and visualizations presented in the paper.

Each directory includes a README file explaining its structure and usage.
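
To illustrate what the stratified sampling of call graph edges in manual_sampling/ involves in general, here is a minimal sketch. It is not the authors' script: the edge representation (caller, callee, tool), the choice of the tool as the stratum, and the names `stratified_sample`, `stratum_of`, and `per_stratum` are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(edges, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` edges from each stratum, reproducibly."""
    rng = random.Random(seed)           # fixed seed so the sample can be repeated
    strata = defaultdict(list)
    for edge in edges:
        strata[stratum_of(edge)].append(edge)

    sample = []
    for key in sorted(strata):          # deterministic order over strata
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


if __name__ == "__main__":
    # Hypothetical edges: (caller, callee, tool that reported the edge).
    edges = [
        ("A.main", "B.run", "WALA"),
        ("A.main", "C.init", "WALA"),
        ("B.run", "C.init", "WALA"),
        ("A.main", "B.run", "Doop"),
    ]
    for edge in stratified_sample(edges, lambda e: e[2], per_stratum=2):
        print(edge)
```

Sampling per stratum rather than over the pooled edges keeps small strata represented, which matters when one tool or configuration contributes far more edges than another.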

 

Configurations.xlsx

This file contains the configurations we used for each tool in this study.

File contents:

  • WALA_full_configuration – all the selected configurations for WALA.

  • Doop_full_configuration – all the selected configurations for Doop.

  • Opal_full_configuration – all the selected configurations for OPAL.

  • WALA_partial_order – pairs of WALA configurations used to generate false labels via partial orders.

  • Doop_partial_order – pairs of Doop configurations used to generate false labels via partial orders.
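
One way such configuration pairs could yield false labels is sketched below. It rests on an assumption that this record does not confirm: each pair is ordered as (more precise, less precise), both configurations are equally sound, and therefore any edge reported only by the less precise configuration is treated as a false positive. The function name, the dictionary layout, and the configuration names are hypothetical.

```python
def false_labels_from_pairs(call_graphs, ordered_pairs):
    """Collect edges that a less precise configuration reports but a more
    precise one does not.

    call_graphs:   {configuration name: iterable of (caller, callee) edges}
    ordered_pairs: iterable of (more_precise, less_precise) configuration names
                   (this ordering is an assumption, see the text above).
    """
    labeled_false = set()
    for more_precise, less_precise in ordered_pairs:
        extra = set(call_graphs[less_precise]) - set(call_graphs[more_precise])
        labeled_false |= extra
    return labeled_false


if __name__ == "__main__":
    # Hypothetical configurations and edges, for illustration only.
    graphs = {
        "context_sensitive": {("A.main", "B.run")},
        "context_insensitive": {("A.main", "B.run"), ("A.main", "C.init")},
    }
    pairs = [("context_sensitive", "context_insensitive")]
    print(false_labels_from_pairs(graphs, pairs))
```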

 

This artifact enables full reproducibility of the dataset creation, feature extraction, and experimental results discussed in the paper.

 

Files

Files (13.5 GB)

  • md5:1d7c70f54b7af2a8afd4812c4f72e592 (109.2 kB)
  • md5:1d2c21aff67cd3394790d1971ab73c00 (3.3 MB)
  • md5:8e6573bc3c3ea8bbe1c8aeaa8e5b1812 (13.5 GB)
  • md5:099ad8e439ba321b12f30a7f3e5570a9 (63.2 MB)
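
Since the largest file is 13.5 GB, it may be worth checking a download against its listed MD5 checksum before extracting it. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_checksum` and the local file name in the usage section are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <downloaded file> <md5:... value from the list>
    if len(sys.argv) == 3:
        ok = matches_listed_checksum(sys.argv[1], sys.argv[2])
        print("checksum matches" if ok else "checksum MISMATCH")
```

Reading in chunks keeps memory use constant regardless of archive size. `str.removeprefix` requires Python 3.9 or later.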

Additional details

Identifiers

Other
artifact

Dates

Submitted
2025-07-17