Published November 17, 2023 | Version v2
Dataset Open

Artifact for An Extensive Empirical Study of Nondeterministic Behavior in Static Analysis Tools

Creators

Description

This repository contains data for 'An Extensive Empirical Study of Nondeterministic Behavior in Static Analysis Tools' and the source code of the tool NDDetector that is used for performing the experiments in RQ2.

There are two directories, data and tool:

<data> contains the data behind the conclusions drawn for the two research questions, RQ1 and RQ2. (rq1/ holds the data for Research Question 1; rq2/ holds the data for Research Question 2.)

In rq1/ there are:

final_results.csv - Contains 43 distinct results from 4 repositories (SOOT, WALA, FlowDroid, DroidSafe) that fix or report nondeterminism.

summary.pdf - Reports the number of nondeterminism results by tool repository at each stage of the qualitative study.

categorization.pdf - Reports the number of nondeterminism results by root-cause category, for each component of the analysis codebase in which the nondeterminism occurs.

raw_data.zip - Contains the raw commits and issues extracted from 9 repositories (SOOT, DOOP, WALA, FlowDroid, DroidSafe, AmanDroid, TAJS, Code2Flow, PyCG).

key_words_results.zip - Contains the results extracted by each keyword (concurrency, concurrent, consistent, determinism, deterministic, different, flakiness, flaky, parallel, thread) from the raw data.
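The keyword extraction described above can be approximated with a short sketch. The keyword list is taken from the description; everything else is an assumption made for illustration, including the `(record_id, text)` input shape, the function names, and the use of case-insensitive substring matching. The artifact does not state how its matching was actually done.

```python
# Keywords listed in the description of key_words_results.zip.
KEYWORDS = [
    "concurrency", "concurrent", "consistent", "determinism", "deterministic",
    "different", "flakiness", "flaky", "parallel", "thread",
]


def matching_keywords(text):
    """Return the keywords found in text (case-insensitive substring match).

    Substring matching is an assumption: it means "nondeterministic" matches
    "deterministic" and "inconsistent" matches "consistent".
    """
    lowered = text.lower()
    return [kw for kw in KEYWORDS if kw in lowered]


def group_by_keyword(records):
    """Map each keyword to the ids of the records whose text contains it.

    records is an iterable of (record_id, text) pairs; this shape is
    hypothetical and is not the format used inside raw_data.zip.
    """
    groups = {kw: [] for kw in KEYWORDS}
    for record_id, text in records:
        for kw in matching_keywords(text):
            groups[kw].append(record_id)
    return groups
```

A record can land in more than one keyword group, which is one reason a later manual deduplication step would be needed before counting distinct results.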

In rq2/ there are:

ICSE2024_AGGREGATE_DATA.csv - Contains the result distribution for each combination of target program, configuration hash, and tool, as well as the calculated consistency score.

analyze_results.py - Script that generates the aggregate data above.

node_freqs - Contains the frequency of each node in the nondeterministic results we observed. It also records whether each node is a caller, a callee, or a source/sink.

edge_dists - Contains the actual edge distributions of all results that behaved nondeterministically. For each result (edge/flow), it records which repetitions contained that edge/flow and which did not. If you are interested in the actual differences across the results generated by a tool, edge_dists/ is the place to look.

figure_8 - The raw data and the occurrences-per-node sheet used to generate Figure 8.
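To make the idea behind edge_dists concrete, the following is a minimal sketch of how per-edge presence across repetitions could be computed. It assumes each repetition is available as a set of edges; the input shape and the function names are hypothetical and do not reflect the file format in edge_dists/ or the logic of analyze_results.py.

```python
from collections import defaultdict


def edge_distribution(runs):
    """Map each edge to the list of repetition indices that contain it.

    runs is a list of edge sets, one per repetition (hypothetical input shape).
    Indices are appended in increasing order, so each list is already sorted.
    """
    present = defaultdict(list)
    for index, edges in enumerate(runs):
        for edge in edges:
            present[edge].append(index)
    return dict(present)


def nondeterministic_edges(runs):
    """Return the edges that appear in some repetitions but not in all.

    For each such edge, report which repetitions contained it and which did not.
    """
    total = len(runs)
    return {
        edge: {
            "present": reps,
            "absent": [i for i in range(total) if i not in reps],
        }
        for edge, reps in edge_distribution(runs).items()
        if len(reps) < total
    }
```

For example, given three repetitions where the edge `("b", "c")` is missing from the second one, `nondeterministic_edges` reports that edge as present in repetitions 0 and 2 and absent in repetition 1, while edges found in every repetition are omitted.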

<tool> contains the framework we used to conduct the experiments, together with its source code and the scripts that post-process the detected nondeterministic behavior and generate the summarized results in Section 4.

Files

data_repov2.zip (1.7 GB)

md5:33e3dfc00e513b4aa791aca9d779a6c8
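Because the archive is large, it may be worth checking the download against the published MD5 checksum before unpacking it. The sketch below uses Python's standard `hashlib` module and reads the file in chunks to keep memory use low. The function names and the assumption that the archive sits in the current directory are illustrative only.

```python
import hashlib

# Checksum published in the file listing for data_repov2.zip.
EXPECTED_MD5 = "33e3dfc00e513b4aa791aca9d779a6c8"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at path matches the expected MD5 checksum."""
    return md5_of_file(path) == expected
```

Calling `verify_archive("data_repov2.zip")` returns `True` only when the local file matches the published checksum; a `False` result usually indicates an incomplete or corrupted download.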