There is a newer version of this record available.

Dataset Open Access

NJR-1 Dataset

Utture, Akshay; Kalhauge, Christian Gram; Liu, Shuyang; Palsberg, Jens

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4632231", 
  "language": "eng", 
  "title": "NJR-1 Dataset", 
  "issued": {
    "date-parts": [
  "abstract": "<p>NJR is a Normalized Java Resource.</p>\n\n<p>The <em>NJR-1</em> dataset consists of 293 Java bytecode programs, each of which runs successfully with the following 12 Java static analysis tools:</p>\n\n<p>1. SpotBugs<br>\n2. Wala<br>\n3. Doop<br>\n4. Soot<br>\n5. Petablox<br>\n6. Infer<br>\n7. Error-Prone<br>\n8. Checker-Framework<br>\n9. Opium (Opal-framework)<br>\n10. Spoon<br>\n11. PMD<br>\n12. CheckStyle</p>\n\n<p>Additionally, each program executes at least 100 unique application methods at runtime. The programs are repositories selected from the Java 8 projects on GitHub that compile and run successfully. Each program comes with a jar file, its compiled bytecode files, the compiled library files, and its Java source code, along with lists of its source files, declared methods, and application classes, and its main-class names. Providing every program both as a jar file and as source code (together with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.</p>\n\n<p>There are 3 files available for download: <em>,, benchmark_stats.csv</em>.</p>\n\n<p><em></em> contains the dataset programs. <em></em> contains Python 3 scripts that run each tool on the entire dataset. The <em>benchmark_stats.csv</em> file lists, for each benchmark, the number of nodes and edges in its dynamic application call graph and the number of edges in its static application call graph (as computed by Wala), using the main method of the class listed in the <em>info/mainclassname</em> file as the entry point. The statistics are summarized below:</p>\n\n<table>\n<tr><th>Statistic</th><th>Dynamic nodes</th><th>Dynamic edges</th><th>Static edges</th></tr>\n<tr><td>Mean</td><td>205</td><td>469</td><td>1404</td></tr>\n<tr><td>Std. dev.</td><td>199</td><td>464</td><td>2523</td></tr>\n<tr><td>Median</td><td>149</td><td>327</td><td>610</td></tr>\n</table>\n\n<p>To cite the dataset, please cite the following paper:<br>\nJens Palsberg and Cristina V. Lopes. NJR: A Normalized Java Resource.<br>\nIn Proceedings of the ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.</p>", 
  "author": [
      "family": "Utture, Akshay"
      "family": "Kalhauge, Christian Gram"
      "family": "Liu, Shuyang"
      "family": "Palsberg, Jens"
  "note": "Funded by an NSF grant.", 
  "version": "1.0.1", 
  "type": "dataset", 
  "id": "4632231"
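The abstract says each benchmark ships with its jar file, bytecode, libraries, source code, and an info/mainclassname file. The Python sketch below collects the main class of every benchmark in an unpacked copy of the dataset. Only the info/mainclassname path comes from this record; the assumption that each immediate subdirectory of the dataset root is one benchmark, the assumption that the class name sits on the first non-empty line, and the function name main_classes are all hypothetical.

```python
from pathlib import Path
from typing import Dict


def main_classes(dataset_root: str) -> Dict[str, str]:
    """Map each benchmark directory name to the class named in info/mainclassname.

    Hypothetical layout (not confirmed by the record): every immediate
    subdirectory of dataset_root is one benchmark, and its
    info/mainclassname file holds the main class name on the first
    non-empty line.
    """
    result: Dict[str, str] = {}
    for bench in sorted(Path(dataset_root).iterdir()):
        info_file = bench / "info" / "mainclassname"
        if not info_file.is_file():
            continue  # skip entries that do not match the assumed layout
        names = [line.strip() for line in info_file.read_text().splitlines() if line.strip()]
        if names:
            result[bench.name] = names[0]
    return result
```

For example, `main_classes("/path/to/unpacked/dataset")` returns a dictionary keyed by benchmark directory name; entries without an info/mainclassname file are skipped rather than raising, so a mismatch between the assumed and real layout shows up as missing keys.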

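The summary table in the abstract (mean, standard deviation, and median of dynamic nodes, dynamic edges, and static edges) can in principle be recomputed from benchmark_stats.csv with Python's csv and statistics modules, as sketched below. The column names in COLUMNS are hypothetical placeholders, because this record does not give the file's header; the sketch also uses the sample standard deviation (statistics.stdev), whereas the record does not say which formula produced the published figures.

```python
import csv
import statistics
from typing import Dict, Iterable, Sequence

# Hypothetical column names: adjust to the real header of benchmark_stats.csv.
COLUMNS = ("dynamic_nodes", "dynamic_edges", "static_edges")


def summarize(rows: Iterable[Dict[str, str]],
              columns: Sequence[str] = COLUMNS) -> Dict[str, Dict[str, float]]:
    """Compute mean, sample standard deviation, and median for each column."""
    values: Dict[str, list] = {col: [] for col in columns}
    for row in rows:
        for col in columns:
            values[col].append(float(row[col]))
    return {
        col: {
            "mean": statistics.mean(vals),
            # Sample (n - 1) standard deviation; use statistics.pstdev for
            # the population formula instead.
            "stdev": statistics.stdev(vals),
            "median": statistics.median(vals),
        }
        for col, vals in values.items()
    }


def summarize_file(path: str,
                   columns: Sequence[str] = COLUMNS) -> Dict[str, Dict[str, float]]:
    """Read a CSV file with a header row and summarize the given columns."""
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f), columns)
```

Once COLUMNS matches the real header, `summarize_file("benchmark_stats.csv")` yields one mean/stdev/median triple per column, which can be rounded and compared against the published table. The large gap between mean and median for static edges suggests a skewed distribution, so the median is the more representative single figure there.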
