There is a newer version of this record available.

Dataset Open Access

NJR-1 Dataset

Utture, Akshay; Kalhauge, Christian Gram; Liu, Shuyang; Palsberg, Jens

NJR is a Normalized Java Resource.

The NJR-1 dataset consists of 293 Java bytecode programs, each of which runs successfully with the following 12 Java static analysis tools:

1.  SpotBugs (https://spotbugs.github.io)
2.  Wala (https://wala.github.io)
3.  Doop (https://bitbucket.org/yanniss/doop)
4.  Soot (https://github.com/soot-oss/soot)
5.  Petablox (https://github.com/petablox/petablox)
6.  Infer (https://fbinfer.com)
7.  Error-Prone (http://errorprone.info)
8.  Checker-Framework (https://checkerframework.org)
9.  Opium (Opal-framework) (https://www.opal-project.de)
10. Spoon (https://spoon.gforge.inria.fr)
11. PMD (https://pmd.github.io)
12. CheckStyle (https://checkstyle.org)

Additionally, each program executes at least 100 unique application methods at runtime. The programs are repositories picked from the set of Java 8 projects on GitHub that compile and run successfully. Each program comes with a jar file, the compiled bytecode files, the compiled library files, and the Java source code. Each also comes with a list of source files, a list of declared methods, an application-classes list, and the main-class names. The availability of the files both in jar form and in source-code form (with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.
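As an illustration of how the per-program metadata might be consumed, the following minimal sketch collects the main-class names for every benchmark. It assumes that the extracted archive contains one directory per benchmark and that the info/mainclassname file named in this record sits directly under each benchmark directory; the function name read_main_classes and the dataset_root parameter are hypothetical and not part of the dataset's scripts.

```python
from pathlib import Path


def read_main_classes(dataset_root):
    """Map each benchmark directory name to the main-class names it lists.

    Assumed layout (not confirmed by the record):
        <dataset_root>/<benchmark>/info/mainclassname
    """
    main_classes = {}
    for benchmark_dir in sorted(Path(dataset_root).iterdir()):
        name_file = benchmark_dir / "info" / "mainclassname"
        if name_file.is_file():
            # Split on whitespace so a file with several names yields a list.
            main_classes[benchmark_dir.name] = name_file.read_text().split()
    return main_classes
```

Benchmarks without the file are skipped rather than reported as errors, which keeps the sketch usable even if the real layout differs slightly.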

There are 3 files available for download: njr-1_dataset.zip, scripts.zip, benchmark_stats.csv.

njr-1_dataset.zip contains the dataset programs themselves. scripts.zip contains Python3 scripts, one per tool, that run the tool on the entire dataset. The benchmark_stats.csv file lists, for each benchmark, the number of nodes and edges in its dynamic application call-graph, as well as the number of edges in its static application call-graph (as computed by Wala) when using the main function listed in the info/mainclassname file.
A summary of these statistics is listed here:

Statistics  Dynamic-Nodes  Dynamic-Edges  Static-Edges
Mean                  205            469          1404
St.Dev                199            464          2523
Median                149            327           610
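A summary of this kind could be recomputed from benchmark_stats.csv along the following lines. This is a sketch under stated assumptions: the CSV is assumed to have a header row, the column names must be supplied by the caller because this record does not list them, and the sample standard deviation is used even though the record does not say which variant produced the St.Dev row.

```python
import csv
import statistics


def summarize(rows, columns):
    """Compute mean, standard deviation, and median for the given columns.

    `rows` is an iterable of dicts, such as those produced by csv.DictReader.
    """
    rows = list(rows)
    summary = {}
    for column in columns:
        values = [float(row[column]) for row in rows]
        summary[column] = {
            "mean": statistics.mean(values),
            # Sample standard deviation; this choice is an assumption.
            "stdev": statistics.stdev(values),
            "median": statistics.median(values),
        }
    return summary


def summarize_csv(path, columns):
    """Read a CSV file that has a header row and summarize the named columns."""
    with open(path, newline="") as handle:
        return summarize(csv.DictReader(handle), columns)
```

To use it, inspect the header of benchmark_stats.csv first and pass the actual column names to summarize_csv.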

To cite the dataset, please cite the following paper:
Jens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource. 
In Proceedings of ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.

Funded by the following NSF grant (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1823360&HistoricalAwards=false)

Files (2.6 GB)

Name                 Size      Checksum
benchmark_stats.csv  25.5 kB   md5:f9b0d46bf68b0609722e980c1e317ea8
njr-1_dataset.zip    2.6 GB    md5:c0a57feaf93f4b9374b32885b01997fc
scripts.zip          377.6 kB  md5:8f9cf0a0df99f8c1d3cfcd9a45cea8ac
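
The MD5 checksums listed for the three files can be used to check that a download completed intact. The sketch below uses only the Python standard library; the function names md5_of and verify_downloads are hypothetical, and it assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing on this record.
EXPECTED_MD5 = {
    "benchmark_stats.csv": "f9b0d46bf68b0609722e980c1e317ea8",
    "njr-1_dataset.zip": "c0a57feaf93f4b9374b32885b01997fc",
    "scripts.zip": "8f9cf0a0df99f8c1d3cfcd9a45cea8ac",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {filename: True, False, or None}; None means the file is absent."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Reading in chunks matters here because njr-1_dataset.zip is 2.6 GB and should not be loaded into memory at once.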