NJR is a Normalized Java Resource.
The NJR-1 dataset consists of 293 Java bytecode programs, each of which runs successfully with the following 12 Java static analysis tools:
1. SpotBugs (https://spotbugs.github.io)
2. Wala (https://wala.github.io)
3. Doop (https://bitbucket.org/yanniss/doop)
4. Soot (https://github.com/soot-oss/soot)
5. Petablox (https://github.com/petablox/petablox)
6. Infer (https://fbinfer.com)
7. Error-Prone (http://errorprone.info)
8. Checker-Framework (https://checkerframework.org)
9. Opium (Opal-framework) (https://www.opal-project.de)
10. Spoon (https://spoon.gforge.inria.fr)
11. PMD (https://pmd.github.io)
12. CheckStyle (https://checkstyle.org)
Additionally, each program executes at least 100 unique application methods at runtime. The programs are repositories picked from the set of Java 8 projects on GitHub that compile and run successfully. Each program comes with a jar file, the compiled bytecode files, the compiled library files, and the Java source code. Each also comes with a list of source files, a list of declared methods, a list of application classes, and the main-class names. The availability of the files both in jar-file form and in source-code form (with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.
There are 3 files available for download: njr-1_dataset.zip, scripts.zip, benchmark_stats.csv.
njr-1_dataset.zip contains the dataset programs. scripts.zip contains Python 3 scripts, one for each tool, that run the tool on the entire dataset. The benchmark_stats.csv file lists, for each benchmark, the number of nodes and edges in its dynamic application call-graph, as well as the number of edges in its static application call-graph (as computed by Wala) when using the main function listed in the info/mainclassname file.
A summary of these call-graph statistics is listed here:
Statistic    Dynamic-Nodes    Dynamic-Edges    Static-Edges
Mean                   205              469            1404
Std. Dev.              199              464            2523
Median                 149              327             610
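A summary like the one above could be recomputed from benchmark_stats.csv along the following lines. This is only a minimal sketch: the column names (dynamic_nodes, dynamic_edges, static_edges) and the sample rows are hypothetical placeholders, not taken from the dataset, so check the header row of the real file before using it.

```python
import csv
import io
import statistics


def summarize(csv_file, columns):
    """Return {column: (mean, sample std dev, median)} for numeric columns."""
    rows = list(csv.DictReader(csv_file))
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = (
            statistics.mean(values),
            statistics.stdev(values),  # sample standard deviation (n - 1)
            statistics.median(values),
        )
    return summary


# Hypothetical column names; the real header of benchmark_stats.csv may differ.
COLUMNS = ["dynamic_nodes", "dynamic_edges", "static_edges"]

# Made-up rows standing in for the real file, so the sketch runs on its own.
sample = io.StringIO(
    "benchmark,dynamic_nodes,dynamic_edges,static_edges\n"
    "a,100,200,400\n"
    "b,150,300,600\n"
    "c,200,400,1100\n"
)

for col, (mean, std, median) in summarize(sample, COLUMNS).items():
    print(f"{col}: mean={mean:.0f} std={std:.0f} median={median:.0f}")
```

To run it on the real data, pass an open file handle instead of the sample, for example open("benchmark_stats.csv", newline=""). The description does not say whether the published standard deviation is the sample or the population value; statistics.pstdev gives the population variant if the numbers do not match.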
To cite the dataset, please cite the following paper:
Jens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource.
In Proceedings of ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.