Published March 24, 2018 | Version v1
Dataset Open

Apache POI pre-processed data for the first DocGen challenge at DySDoc 3

  • 1. McGill University
  • 2. The University of Texas at Dallas
  • 3. University of Adelaide
  • 4. Università della Svizzera italiana
  • 5. University of Delaware
  • 6. University of Victoria
  • 7. Northern Arizona University
  • 8. Nara Institute of Science and Technology
  • 9. Tokyo Institute of Technology
  • 10. Colorado State University
  • 11. University of Alberta
  • 12. ABB Corporate Research

Description

Apache POI pre-processed data for the first DocGen challenge

The pre-processed data for the First Software Documentation Generation Challenge (DocGen), hosted at the Third International Workshop on Dynamic Software Documentation (DySDoc 3), includes the following datasets for Apache POI 3.17:

Call graph between methods and classes

File: call-graph-poi-3.17-all.zip

CSV file with the call graph between methods and between classes. Class A calls class B if a method of class A calls a method of class B. The call graph was produced by the tool java-callgraph.

The CSV file contains the following columns:

  • call_type: call between (C)lasses or (M)ethods
  • caller: the Fully Qualified Name (FQN) of the caller
  • method_call_type: the type of method call:
    • M for invokevirtual calls
    • I for invokeinterface calls
    • O for invokespecial calls
    • S for invokestatic calls
    • D for invokedynamic calls
  • callee: the FQN of the callee

For more details about the format and each type of method call, check the tool README.
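As a rough illustration of how the columns above might be consumed, the following sketch builds an adjacency map from caller to callees. It assumes the CSV is comma-delimited and has a header row using the column names listed above; neither is confirmed here, so check the actual file first. The function name `build_call_graph` and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def build_call_graph(csv_file, call_type="M"):
    """Map each caller FQN to the set of callee FQNs for one call_type.

    call_type is "C" for class-level rows or "M" for method-level rows.
    Assumes a header row with the documented column names.
    """
    graph = defaultdict(set)
    for row in csv.DictReader(csv_file):
        if row["call_type"] == call_type:
            graph[row["caller"]].add(row["callee"])
    return dict(graph)


# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE = (
    "call_type,caller,method_call_type,callee\n"
    "C,org.example.A,,org.example.B\n"
    "M,org.example.A:foo(),M,org.example.B:bar()\n"
    "M,org.example.A:foo(),S,org.example.B:baz()\n"
)

class_graph = build_call_graph(io.StringIO(SAMPLE), call_type="C")
method_graph = build_call_graph(io.StringIO(SAMPLE), call_type="M")
```

For the real data, the same function could be given a file opened with `open(path, newline="")` after extracting the ZIP.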

Inheritance hierarchy

File: poi-3.17-inheritance.zip

A CSV file with the inheritance hierarchy of POI, which was extracted using BCEL 6.2.

The CSV file contains the following columns:

  • record_id: sequential number
  • parent_class: the parent class
  • child_class: the child class
  • relationship_type: the type of relationship between classes, i.e., the child class 'extends' or 'implements' the parent class
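One way to use these columns is to collect every direct and indirect supertype of a class. The sketch below assumes a comma-delimited file with a header row matching the column names above, which should be verified against the actual CSV. The helper names `load_parents` and `all_ancestors` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_parents(csv_file):
    """Map each child class to the set of its direct parents."""
    parents = defaultdict(set)
    for row in csv.DictReader(csv_file):
        parents[row["child_class"]].add(row["parent_class"])
    return parents


def all_ancestors(cls, parents):
    """Return every class or interface reachable by following parent links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE = (
    "record_id,parent_class,child_class,relationship_type\n"
    "1,org.example.Base,org.example.Mid,extends\n"
    "2,org.example.Mid,org.example.Leaf,extends\n"
    "3,org.example.Closeable,org.example.Leaf,implements\n"
)

parents = load_parents(io.StringIO(SAMPLE))
leaf_ancestors = all_ancestors("org.example.Leaf", parents)
```

The traversal ignores relationship_type, so 'extends' and 'implements' edges are followed alike; filtering on that column while loading would restrict the result to one kind.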

Issues

File: bugzilla-poi-dump.zip

CSV file with the list of issues of Apache POI (timestamp: Tue Feb 27, 2018, 18:41:40 UTC).

The CSV file contains the following columns:

  • record_id: sequential number
  • issue_id: the ID that identifies the issue in the issue tracker
  • issue_url: the URL of the issue in the issue tracker
  • issue_title: the title of the issue
  • xml_path: the path to the XML of the issue, which contains all the issue information provided by the issue tracker

All the issues in XML format can be found in the "poi" folder in the ZIP file.
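The index CSV can serve as a lookup table before opening any of the XML files. The sketch below searches issue titles for a keyword and returns the matching XML paths. It assumes a comma-delimited file with a header row using the column names above; the function name `find_issues`, the URLs, and the paths in the sample are invented for illustration.

```python
import csv
import io


def find_issues(csv_file, keyword):
    """Return (issue_id, issue_title, xml_path) for titles containing keyword.

    Matching is case-insensitive. Assumes a header row with the documented
    column names.
    """
    needle = keyword.lower()
    return [
        (row["issue_id"], row["issue_title"], row["xml_path"])
        for row in csv.DictReader(csv_file)
        if needle in row["issue_title"].lower()
    ]


# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE = (
    "record_id,issue_id,issue_url,issue_title,xml_path\n"
    "1,101,https://example.org/bug/101,Crash when reading XLSX,poi/101.xml\n"
    '2,102,https://example.org/bug/102,"Wrong cell style, HSSF only",poi/102.xml\n'
)

matches = find_issues(io.StringIO(SAMPLE), "xlsx")
```

Each returned xml_path could then be opened with `xml.etree.ElementTree.parse`, keeping in mind that the element names inside the XML depend on the issue tracker export and are not described here.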

Commits

File: poi-commits.zip

A JSON file with commit information for POI 3.17 (until revision 219dff00e6, on Sept. 8, 2017). The information was extracted using the tools Historage and Kataribe.

For each commit, we provide:

  • Commit hash
  • Parent commit hash (if one exists)
  • Commit message
  • Commit time
  • Committer name
  • Method-level changes (addition/deletion/modification/renaming and method FQN).
    • The FQN contains information about the class (CN) and method (MT) or constructor (CS)

Stack Overflow posts

File: apache-poi-SO.zip

JSON file with all 6,299 Stack Overflow threads with the apache-poi tag.

Files

Files (184.3 MB)

  • md5:d1323fce43b7bb2dd28c4371a871c795 (12.4 MB)
  • md5:ed33d2df3151cc1c18db4cbf9f4d11f2 (166.8 MB)
  • md5:94167dd47a30b2b995c9cbd4d866cf36 (2.3 MB)
  • md5:d2da7a882e80164fa08d115c0629cb71 (85.3 kB)
  • md5:39606442e9240c99fb3f570bb4c8b6b6 (2.7 MB)
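The md5 values listed above can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the caller has to supply whichever checksum corresponds to the file that was downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:d132...'."""
    # Accept the value with or without the leading "md5:" prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use flat even for the largest archive. A call might look like `matches_checksum("poi-commits.zip", "md5:...")`, with the elided value taken from the list above.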