There is a newer version of this record available.

Dataset Open Access

NJR-1 Dataset

Utture, Akshay; Kalhauge, Christian Gram; Liu, Shuyang; Palsberg, Jens

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Utture, Akshay</dc:creator>
  <dc:creator>Kalhauge, Christian Gram</dc:creator>
  <dc:creator>Liu, Shuyang</dc:creator>
  <dc:creator>Palsberg, Jens</dc:creator>
  <dc:description>NJR is a Normalized Java Resource.

The NJR-1 dataset consists of 293 Java bytecode programs, each of which runs successfully with the following 12 Java static analysis tools:

1.  SpotBugs
2.  Wala
3.  Doop
4.  Soot
5.  Petablox
6.  Infer
7.  Error-Prone
8.  Checker-Framework
9.  Opium (Opal-framework)
10. Spoon
11. PMD
12. CheckStyle

Additionally, each program executes at least 100 unique application methods at runtime. These programs are repositories picked from the set of Java-8 projects on GitHub that compile and run successfully. Each program comes with a jar file, the compiled bytecode files, the compiled library files, and the Java source code. Each also comes with a list of source files, a list of declared methods, a list of application classes, and the main-class names. The availability of the files both in jar-file form and in source-code form (with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.

There are 3 files available for download, one of which is benchmark_stats.csv. Of the other two, one has the actual dataset programs and the other contains Python3 scripts for each tool, to run it on the entire dataset. The benchmark_stats.csv file lists, for each benchmark, the number of nodes and edges in its dynamic application call-graph, as well as the number of edges in its static application call-graph (as computed by Wala) when using the main function listed in the info/mainclassname file.
A summary of the same is listed here:

Statistics  Dynamic-Nodes  Dynamic-Edges  Static-Edges
Mean                  205            469          1404
St.Dev                199            464          2523
Median                149            327           610
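
The summary rows above could be recomputed from benchmark_stats.csv along the following lines. This is a minimal Python 3 sketch and not one of the dataset's own scripts. The functions summarize and summarize_file are hypothetical helpers, and the column names passed to them are assumptions that must be replaced with the header names actually used in the file. The record does not say whether the listed St.Dev is the sample or the population form; the sketch assumes the sample form.

```python
import csv
import statistics


def summarize(rows, column):
    """Return (mean, standard deviation, median) of one numeric column.

    rows is an iterable of dicts, for example the rows of csv.DictReader.
    """
    values = [float(row[column]) for row in rows]
    return (
        statistics.mean(values),
        statistics.stdev(values),  # sample standard deviation (assumption)
        statistics.median(values),
    )


def summarize_file(path, columns):
    """Summarize the named columns of a CSV file that has a header row."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return {column: summarize(rows, column) for column in columns}
```

For example, summarize_file("benchmark_stats.csv", ["dynamic_nodes"]) would return a dictionary mapping the column name to its three summary values, where dynamic_nodes is a placeholder for the real column header.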

To cite the dataset, please cite the following paper:
Jens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource. 
In Proceedings of ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.</dc:description>
  <dc:description>Funded by the following NSF grant (;HistoricalAwards=false)</dc:description>
  <dc:subject>Static Analysis, Java</dc:subject>
  <dc:title>NJR-1 Dataset</dc:title>