There is a newer version of this record available.

Dataset Open Access

NJR-1 Dataset

Utture, Akshay; Kalhauge, Christian Gram; Liu, Shuyang; Palsberg, Jens

NJR is a Normalized Java Resource.

The NJR-1 dataset consists of 293 Java bytecode programs, each of which runs successfully with the following 12 Java static analysis tools:

1.  SpotBugs
2.  Wala
3.  Doop
4.  Soot
5.  Petablox
6.  Infer
7.  Error-Prone
8.  Checker-Framework
9.  Opium (Opal-framework)
10. Spoon
11. PMD
12. CheckStyle

Additionally, each program executes at least 100 unique application methods at runtime. The programs are repositories picked from the set of Java-8 projects on GitHub that compile and run successfully. Each program comes with a jar file, the compiled bytecode files, the compiled library files, and the Java source code. It also comes with a list of source files, a list of declared methods, an application-classes list, and the main-class names. The availability of the files both in jar-file form and in source-code form (with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.

There are 3 files available for download: one containing the actual dataset programs, one containing Python3 scripts that run each tool on the entire dataset, and benchmark_stats.csv. The benchmark_stats.csv file lists, for each benchmark, the number of nodes and edges in its dynamic application call-graph, as well as the number of edges in its static application call-graph (as computed by Wala) when using the main function listed in the info/mainclassname file.
These statistics are summarized here:

Statistics  Dynamic-Nodes  Dynamic-Edges  Static-Edges
Mean                  205            469          1404
St.Dev                199            464          2523
Median                149            327           610
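
As a rough illustration, a summary of this kind could be recomputed from benchmark_stats.csv with a few lines of Python3. This is only a sketch under stated assumptions: the column names dynamic_nodes, dynamic_edges and static_edges are hypothetical (the real header of the file is not given here), and the sample standard deviation is used, which may or may not be the variant behind the St.Dev row above.

```python
import csv
import io
import statistics


def summarize(csv_file, columns):
    """Return {column: (mean, stdev, median)} for the given numeric columns.

    csv_file is any open text file with a header row. The column names are
    supplied by the caller; the ones used in the demo are assumptions, not
    the documented header of benchmark_stats.csv.
    """
    values = {name: [] for name in columns}
    for row in csv.DictReader(csv_file):
        for name in columns:
            values[name].append(float(row[name]))
    return {
        name: (
            statistics.mean(v),
            statistics.stdev(v),  # sample standard deviation (assumption)
            statistics.median(v),
        )
        for name, v in values.items()
    }


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real file.
    sample = io.StringIO(
        "benchmark,dynamic_nodes,dynamic_edges,static_edges\n"
        "a,100,200,300\n"
        "b,200,400,900\n"
        "c,300,600,600\n"
    )
    columns = ["dynamic_nodes", "dynamic_edges", "static_edges"]
    for name, (mean, stdev, median) in summarize(sample, columns).items():
        print(f"{name}: mean={mean:.0f} stdev={stdev:.0f} median={median:.0f}")
```

To run it on the real file, open benchmark_stats.csv with open(path, newline="") and pass the column names that actually appear in its header.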

To cite the dataset, please cite the following paper:
Jens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource. 
In Proceedings of ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.

Funded by an NSF grant.