There is a newer version of this record available.

Dataset Open Access

NJR-1 Dataset

Utture, Akshay; Kalhauge, Christian Gram; Liu, Shuyang; Palsberg, Jens


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>NJR is a Normalized Java Resource.</p>\n\n<p>The <em>NJR-1</em> dataset consists of 293 Java bytecode programs, each of which runs successfully with the following 12 Java static analysis tools:</p>\n\n<ol>\n<li>SpotBugs (https://spotbugs.github.io)</li>\n<li>Wala (https://wala.github.io)</li>\n<li>Doop (https://bitbucket.org/yanniss/doop)</li>\n<li>Soot (https://github.com/soot-oss/soot)</li>\n<li>Petablox (https://github.com/petablox/petablox)</li>\n<li>Infer (https://fbinfer.com)</li>\n<li>Error-Prone (http://errorprone.info)</li>\n<li>Checker-Framework (https://checkerframework.org)</li>\n<li>Opium (Opal-framework) (https://www.opal-project.de)</li>\n<li>Spoon (https://spoon.gforge.inria.fr)</li>\n<li>PMD (https://pmd.github.io)</li>\n<li>CheckStyle (https://checkstyle.org)</li>\n</ol>\n\n<p>Additionally, each program executes at least 100 unique application methods at runtime. The programs are repositories selected from the set of Java 8 projects on GitHub that compile and run successfully. Each program comes with a jar file, its compiled bytecode files, the compiled library files, and its Java source code. It also comes with lists of its source files, declared methods, and application classes, as well as its main-class names. The availability of every program both as a jar file and as source code (together with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.</p>\n\n<p>There are 3 files available for download: <em>njr-1_dataset.zip, scripts.zip, benchmark_stats.csv.</em></p>\n\n<p><em>njr-1_dataset.zip</em> contains the dataset programs. <em>scripts.zip</em> contains a Python 3 script for each tool that runs the tool on the entire dataset. The <em>benchmark_stats.csv</em> file lists, for each benchmark, the number of nodes and edges in its dynamic application call graph, as well as the number of edges in its static application call graph (as computed by Wala) when using the main function listed in the <em>info/mainclassname</em> file. These statistics are summarized below:</p>\n\n<table>\n<tr><th>Statistic</th><th>Dynamic nodes</th><th>Dynamic edges</th><th>Static edges</th></tr>\n<tr><td>Mean</td><td>205</td><td>469</td><td>1404</td></tr>\n<tr><td>St. dev.</td><td>199</td><td>464</td><td>2523</td></tr>\n<tr><td>Median</td><td>149</td><td>327</td><td>610</td></tr>\n</table>\n\n<p>To cite the dataset, please cite the following paper:<br>\nJens Palsberg and Cristina V. Lopes, NJR: a Normalized Java Resource.<br>\nIn Proceedings of the ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "UCLA", 
      "@id": "https://orcid.org/0000-0002-9623-3049", 
      "@type": "Person", 
      "name": "Utture, Akshay"
    }, 
    {
      "affiliation": "UCLA", 
      "@type": "Person", 
      "name": "Kalhauge, Christian Gram"
    }, 
    {
      "affiliation": "UCLA", 
      "@type": "Person", 
      "name": "Liu, Shuyang"
    }, 
    {
      "affiliation": "UCLA", 
      "@type": "Person", 
      "name": "Palsberg, Jens"
    }
  ], 
  "url": "https://zenodo.org/record/4632231", 
  "datePublished": "2020-06-16", 
  "version": "1.0.1", 
  "keywords": [
    "Static Analysis, Java"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/ec39320b-0f2b-453d-b132-0c6379c25f71/benchmark_stats.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/ec39320b-0f2b-453d-b132-0c6379c25f71/njr-1_dataset.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/ec39320b-0f2b-453d-b132-0c6379c25f71/scripts.zip", 
      "encodingFormat": "zip", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.4632231", 
  "@id": "https://doi.org/10.5281/zenodo.4632231", 
  "@type": "Dataset", 
  "name": "NJR-1 Dataset"
}
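The Mean, St. dev. and Median rows summarized in the description can, in principle, be recomputed from benchmark_stats.csv. The following is a minimal sketch using Python's standard csv and statistics modules. It assumes hypothetical column names dynamic_nodes, dynamic_edges and static_edges (the real header of the file may differ), and it assumes the sample standard deviation (statistics.stdev), since the record does not say which variant was used. The toy rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
import statistics

# Hypothetical column names: adjust to match the actual header of benchmark_stats.csv.
COLUMNS = ("dynamic_nodes", "dynamic_edges", "static_edges")

# Toy data for illustration only; NOT values from the NJR-1 dataset.
SAMPLE_CSV = (
    "benchmark,dynamic_nodes,dynamic_edges,static_edges\n"
    "prog_a,100,200,500\n"
    "prog_b,150,300,700\n"
    "prog_c,350,900,3000\n"
)


def summarize(csv_text: str, columns=COLUMNS) -> dict:
    """Return mean, standard deviation and median for each requested column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = {
            "mean": statistics.mean(values),
            # Assumption: sample standard deviation; use statistics.pstdev
            # instead if the published table used the population variant.
            "stdev": statistics.stdev(values),
            "median": statistics.median(values),
        }
    return summary


if __name__ == "__main__":
    for col, stats in summarize(SAMPLE_CSV).items():
        # Round to integers, as in the summary table.
        print(col, {name: round(value) for name, value in stats.items()})
```

To try it on the real file, one could pass the contents of benchmark_stats.csv (for example, open("benchmark_stats.csv").read()) to summarize after setting COLUMNS to the file's actual header names; the rounded results may then be compared against the table above.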