NJR is a Normalized Java Resource.
The NJR-1 dataset consists of 293 Java bytecode programs, each of which executes at least 100 unique application methods at runtime. Additionally, 5 static analysis tools (SpotBugs, Wala, Doop, Soot, Petablox) successfully run on these programs.
These programs come from repositories selected from the set of Java 8 projects on GitHub that compile and run successfully.
Each of these programs comes with an executable jar file, the compiled bytecode file, and the Java source code.
There are 3 files available for download: njr-1_dataset.zip, scripts.zip, benchmark_stats.csv.
njr-1_dataset.zip contains the dataset programs themselves. scripts.zip contains Python 3 scripts that run the analysis tools (SpotBugs, Wala, Doop, Soot, Petablox) on the entire dataset. benchmark_stats.csv lists, for each benchmark, the number of nodes and edges in its dynamic application call-graph, as well as the number of edges in its static application call-graph (as computed by Wala).
Summary statistics of these call-graph sizes across all benchmarks:

Statistic   Dynamic nodes   Dynamic edges   Static edges
Mean                  205             469           1404
St. dev.              199             464           2523
Median                149             327            610
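Summary figures of this kind could be recomputed from benchmark_stats.csv along the following lines. This is a minimal sketch, not the script used to produce the numbers above: the column names dynamic_nodes, dynamic_edges and static_edges are assumptions and should be replaced with whatever the file's header row actually contains, and the sample rows are made up for illustration. It also uses the sample standard deviation, which may differ from the convention used for the listed values.

```python
import csv
import io
import statistics

# Hypothetical column names: check the header row of benchmark_stats.csv
# and adjust these before running on the real file.
COLUMNS = ["dynamic_nodes", "dynamic_edges", "static_edges"]


def summarize(csv_file, columns=COLUMNS):
    """Return {column: (mean, stdev, median)} for the given numeric columns.

    csv_file is any open text file (or file-like object) with a header row.
    """
    rows = list(csv.DictReader(csv_file))
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = (
            statistics.mean(values),
            statistics.stdev(values),   # sample standard deviation (n - 1)
            statistics.median(values),
        )
    return summary


# Made-up sample rows, NOT real NJR-1 data; the benchmark names are invented.
sample = io.StringIO(
    "benchmark,dynamic_nodes,dynamic_edges,static_edges\n"
    "prog_a,100,10,500\n"
    "prog_b,200,20,700\n"
    "prog_c,300,60,900\n"
)

for column, (mean, stdev, median) in summarize(sample).items():
    print(f"{column}: mean={mean:.0f} stdev={stdev:.0f} median={median:.0f}")
```

To run it on the downloaded file, pass an open file handle instead of the sample, for example summarize(open("benchmark_stats.csv", newline="")).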
To cite the dataset, please cite the following paper:
Jens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource.
In Proceedings of ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.