There is a newer version of this record available.

Dataset Open Access

NJR-1 Dataset

Utture, Akshay; Kalhauge, Christian Gram; Liu, Shuyang; Palsberg, Jens

NJR is a Normalized Java Resource.

The NJR-1 dataset consists of 293 Java bytecode programs, each of which executes at least 100 unique application methods at runtime. In addition, five static analysis tools (SpotBugs, Wala, Doop, Soot, Petablox) run successfully on every program.
The programs are repositories selected from the set of Java-8 projects on GitHub that compile and run successfully.
Each program comes with an executable jar file, the compiled bytecode file, and the Java source code.

There are 3 files available for download: one with the actual dataset programs, one with Python3 scripts to run the analysis tools (SpotBugs, Wala, Doop, Soot, Petablox) on the entire dataset, and benchmark_stats.csv. The benchmark_stats.csv file lists, for each benchmark, the number of nodes and edges in its dynamic application call-graph, as well as the number of edges in its static application call-graph (as computed by Wala).
These statistics are summarized below:

Statistics  Dynamic-Nodes  Dynamic-Edges  Static-Edges
Mean                  205            469          1404
St.Dev                199            464          2523
Median                149            327           610
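
A summary like the one above could be recomputed from benchmark_stats.csv roughly as follows. This is only a sketch under stated assumptions: the column names are taken from the table headers and may not match the actual headers in the CSV file, the function name summarize is hypothetical, and the use of the sample (rather than population) standard deviation is a guess.

```python
import csv
import statistics

# Assumed column names, copied from the summary table headers; the real
# headers in benchmark_stats.csv may differ.
COLUMNS = ("Dynamic-Nodes", "Dynamic-Edges", "Static-Edges")


def summarize(lines, columns=COLUMNS):
    """Compute mean, standard deviation, and median for each column.

    `lines` is any iterable of CSV lines with a header row, for example
    an open file object or a list of strings.
    """
    rows = list(csv.DictReader(lines))
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = {
            "mean": statistics.mean(values),
            # Sample standard deviation; an assumption about how the
            # published St.Dev row was computed.
            "stdev": statistics.stdev(values),
            "median": statistics.median(values),
        }
    return summary
```

To run it on the downloaded file, open benchmark_stats.csv with open("benchmark_stats.csv", newline="") and pass the file object to summarize.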

To cite the dataset, please cite the following paper:
Jens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource. 
In Proceedings of ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.

Funded by the following NSF grant (