Published March 24, 2018 | Version v1
Dataset Open

Apache POI pre-processed data for the first DocGen challenge at DySDoc 3

  • 1. McGill University
  • 2. The University of Texas at Dallas
  • 3. University of Adelaide
  • 4. Università della Svizzera italiana
  • 5. University of Delaware
  • 6. University of Victoria
  • 7. Northern Arizona University
  • 8. Nara Institute of Science and Technology
  • 9. Tokyo Institute of Technology
  • 10. Colorado State University
  • 11. University of Alberta
  • 12. ABB Corporate Research


Apache POI pre-processed data for the first DocGen challenge

The pre-processed data for the First Software Documentation Generation Challenge (DocGen), hosted at the Third International Workshop on Dynamic Software Documentation (DySDoc 3), includes the following datasets for Apache POI 3.17:

Call graph between methods and classes


A CSV file with the call graph between methods and between classes. Class A is considered to call class B if any method of class A calls a method of class B. The call graph was produced by the tool java-callgraph.

The CSV file contains the following columns:

  • call_type: call between (C)lasses or (M)ethods
  • caller: the Fully Qualified Name (FQN) of the caller
  • method_call_type: the type of method call:
    • M for invokevirtual calls
    • I for invokeinterface calls
    • O for invokespecial calls
    • S for invokestatic calls
    • D for invokedynamic calls
  • callee: the FQN of the callee

For more details about the format and each type of method call, check the tool README.
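As an illustration of how the columns above might be consumed, the following minimal sketch groups method-level calls by caller. It assumes the CSV has a header row using the column names listed above; the sample rows and the `org.example` names are made up for the example and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_method_calls(csv_file):
    """Map each caller method FQN to a set of (method_call_type, callee) pairs.

    Assumption: the file has a header row naming the four columns
    call_type, caller, method_call_type and callee.
    """
    calls = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # Keep method-level edges only; class-level edges have call_type "C".
        if row["call_type"] == "M":
            calls[row["caller"]].add((row["method_call_type"], row["callee"]))
    return calls


# Tiny in-memory sample with made-up values, standing in for the real file.
sample = io.StringIO(
    "call_type,caller,method_call_type,callee\n"
    "C,org.example.A,,org.example.B\n"
    "M,org.example.A:run(),S,org.example.B:help()\n"
    "M,org.example.A:run(),M,org.example.B:toString()\n"
)
method_calls = load_method_calls(sample)

# Select only the invokestatic ("S") callees of one caller.
static_callees = sorted(
    callee for kind, callee in method_calls["org.example.A:run()"] if kind == "S"
)
print(static_callees)
```

To run this against the real data, pass an opened file object instead of the `io.StringIO` sample, after checking the header and FQN format in the actual file.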

Inheritance hierarchy


A CSV file with the inheritance hierarchy of POI, which was extracted using bcel 6.2.

The CSV file contains the following columns:

  • record_id: sequential number
  • parent_class: the parent class
  • child_class: the child class
  • relationship_type: the type of relationship between classes, i.e., the child class 'extends' or 'implements' the parent class
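One possible use of these columns is to collect every class that directly or transitively extends or implements a given class. The sketch below assumes a header row with the column names listed above; the class names in the sample are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def load_children(csv_file):
    """Map each parent class to the set of its direct children."""
    children = defaultdict(set)
    for row in csv.DictReader(csv_file):
        children[row["parent_class"]].add(row["child_class"])
    return children


def all_descendants(children, root):
    """Collect every class that transitively extends or implements `root`."""
    seen = set()
    stack = [root]
    while stack:
        # .get() avoids creating empty entries in the defaultdict.
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "record_id,parent_class,child_class,relationship_type\n"
    "1,org.example.Shape,org.example.Polygon,implements\n"
    "2,org.example.Polygon,org.example.Square,extends\n"
    "3,org.example.Other,org.example.Thing,extends\n"
)
descendants = all_descendants(load_children(sample), "org.example.Shape")
print(sorted(descendants))
```

The traversal ignores `relationship_type`; filtering on 'extends' or 'implements' before building the map would restrict it to one kind of relationship.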



A CSV file with the list of issues of Apache POI (timestamp: Tue Feb 27, 2018, 18:41:40 UTC).

The CSV file contains the following columns:

  • record_id: sequential number
  • issue_id: the ID that identifies the issue in the issue tracker
  • issue_url: the URL of the issue in the issue tracker
  • issue_title: the title of the issue
  • xml_path: the path to the XML of the issue, which contains all the issue information provided by the issue tracker

All the issues in XML format can be found in the "poi" folder in the ZIP file.
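A minimal sketch of indexing the issue list by `issue_id` and locating each issue's XML file follows. It assumes a header row with the column names listed above and that `xml_path` is relative to the directory where the ZIP file was extracted; both are assumptions, and the sample row, the URL and the `extracted` directory name are hypothetical.

```python
import csv
import io
from pathlib import Path


def index_issues(csv_file, extracted_root):
    """Index issues by issue_id, resolving each xml_path under extracted_root.

    Assumption: xml_path is relative to the ZIP extraction directory.
    """
    issues = {}
    for row in csv.DictReader(csv_file):
        issues[row["issue_id"]] = {
            "title": row["issue_title"],
            "url": row["issue_url"],
            "xml_file": Path(extracted_root) / row["xml_path"],
        }
    return issues


# Made-up row standing in for the real file.
sample = io.StringIO(
    "record_id,issue_id,issue_url,issue_title,xml_path\n"
    "1,12345,https://issues.example.org/12345,Sample title,poi/12345.xml\n"
)
issues = index_issues(sample, "extracted")
print(issues["12345"]["xml_file"])
```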



A JSON file with commit information for POI 3.17 (until revision 219dff00e6, on Sept. 8, 2017). The information was extracted using the tools Historage and Kataribe.

For each commit, we provide:

  • Commit hash
  • Parent commit hash (if any)
  • Commit message
  • Commit time
  • Committer name
  • Method-level changes (addition/deletion/modification/renaming and method FQN).
    • The FQN contains information about the class (CN) and method (MT) or constructor (CS)
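The description above does not give the JSON key names, so the following is only a sketch of how method-level changes might be tallied per method. The keys `hash`, `changes`, `type` and `method`, as well as the sample values, are hypothetical placeholders; they would need to be replaced with the names used in the actual file.

```python
import json
from collections import Counter


def count_method_changes(commits):
    """Count how many commits touched each method.

    Assumption: each commit is a dict with a "changes" list whose items
    carry a "method" key holding the method FQN (hypothetical key names).
    """
    counts = Counter()
    for commit in commits:
        for change in commit.get("changes", []):
            counts[change["method"]] += 1
    return counts


# Made-up data standing in for the real JSON file.
sample_json = """
[
  {"hash": "aaa", "changes": [{"type": "addition", "method": "A.run"}]},
  {"hash": "bbb", "changes": [{"type": "modification", "method": "A.run"},
                              {"type": "addition", "method": "B.help"}]},
  {"hash": "ccc"}
]
"""
change_counts = count_method_changes(json.loads(sample_json))
print(change_counts.most_common(1))
```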

Stack Overflow posts


A JSON file with all 6,299 Stack Overflow threads with the apache-poi tag.


Files (184.3 MB)

Size
12.4 MB
166.8 MB
2.3 MB
85.3 kB
2.7 MB