Dataset Open Access

Apache POI pre-processed data for the first DocGen challenge at DySDoc 3

Robillard, Martin; Marcus, Andrian; Treude, Christoph; Lanza, Michele; Chaparro, Oscar; Clause, James; Ernst, Neil; Gerosa, Marco; Hata, Hideaki; Hayashi, Shinpei; Kobayashi, Takashi; Moreno, Laura; Nadi, Sarah; Shepherd, David


The pre-processed data for the First Software Documentation Generation Challenge (DocGen), hosted at the Third International Workshop on Dynamic Software Documentation (DySDoc 3), includes the following datasets for Apache POI 3.17:

Call graph between methods and classes


A CSV file with the call graph between methods and between classes. Class A calls class B if there is a call from a method of class A to a method of class B. The call graph was produced by the tool java-callgraph.

The CSV file contains the following columns:

  • call_type: call between (C)lasses or (M)ethods
  • caller: the Fully Qualified Name (FQN) of the caller
  • method_call_type: the type of method call:
    • M for invokevirtual calls
    • I for invokeinterface calls
    • O for invokespecial calls
    • S for invokestatic calls
    • D for invokedynamic calls
  • callee: the FQN of the callee

For more details about the format and each type of method call, check the tool README.
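As a minimal sketch of how the columns above could be consumed, the following Python snippet splits the rows into class-level and method-level adjacency maps. It assumes the CSV is comma-delimited, has a header row using exactly the column names listed above, and leaves method_call_type empty for class-level rows; none of this is confirmed by the dataset description. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Expansion of the method_call_type codes listed above.
CALL_KINDS = {
    "M": "invokevirtual",
    "I": "invokeinterface",
    "O": "invokespecial",
    "S": "invokestatic",
    "D": "invokedynamic",
}


def load_call_graph(lines):
    """Split rows into class-level and method-level adjacency maps."""
    class_calls = defaultdict(set)
    method_calls = defaultdict(set)
    for row in csv.DictReader(lines):
        if row["call_type"] == "C":
            class_calls[row["caller"]].add(row["callee"])
        elif row["call_type"] == "M":
            kind = CALL_KINDS.get(row["method_call_type"], "unknown")
            method_calls[row["caller"]].add((kind, row["callee"]))
    return class_calls, method_calls


# Made-up rows for illustration only; they are not taken from the dataset,
# and the way method FQNs are written here is an assumption.
SAMPLE = """call_type,caller,method_call_type,callee
C,org.example.A,,org.example.B
M,org.example.A:run(),S,org.example.B:help()
"""

class_calls, method_calls = load_call_graph(io.StringIO(SAMPLE))
print(sorted(class_calls["org.example.A"]))
print(sorted(method_calls["org.example.A:run()"]))
```

To run this against the real file, pass an open file handle (opened with newline="") instead of the in-memory sample.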

Inheritance hierarchy


A CSV file with the inheritance hierarchy of POI, which was extracted using BCEL 6.2.

The CSV file contains the following columns:

  • record_id: sequential number
  • parent_class: the parent class
  • child_class: the child class
  • relationship_type: the type of relationship between classes, i.e., the child class 'extends' or 'implements' the parent class
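The parent/child rows described above lend themselves to a simple tree walk. The sketch below builds a parent-to-children map and collects all transitive subtypes of a class. It assumes a comma-delimited file with a header row matching the column names above; the class names in the sample are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_hierarchy(lines):
    """Map each parent class to its (child_class, relationship_type) pairs."""
    children = defaultdict(list)
    for row in csv.DictReader(lines):
        children[row["parent_class"]].append(
            (row["child_class"], row["relationship_type"])
        )
    return children


def descendants(children, root):
    """Return every class that directly or transitively derives from root."""
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        # .get avoids adding empty entries to the defaultdict while walking.
        for child, _relationship in children.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """record_id,parent_class,child_class,relationship_type
1,org.example.Shape,org.example.Polygon,implements
2,org.example.Polygon,org.example.Square,extends
"""

hierarchy = load_hierarchy(io.StringIO(SAMPLE))
subtypes = descendants(hierarchy, "org.example.Shape")
print(sorted(subtypes))
```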



Issues


A CSV file with the list of Apache POI issues (timestamp: Tue Feb 27, 2018, 18:41:40 UTC).

The CSV file contains the following columns:

  • record_id: sequential number
  • issue_id: the ID that identifies the issue in the issue tracker
  • issue_url: the URL of the issue in the issue tracker
  • issue_title: the title of the issue
  • xml_path: the path to the XML of the issue, which contains all the issue information provided by the issue tracker

All the issues in XML format can be found in the "poi" folder in the ZIP file.
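One way to use the issue list is as an index into the XML files. The sketch below searches issue titles for a keyword and resolves each match to its XML file. It assumes a comma-delimited file with a header row matching the column names above, and that xml_path is relative to the directory where the ZIP file was extracted (the hypothetical zip_root parameter); both are assumptions. The sample rows and URLs are made up.

```python
import csv
import io
from pathlib import Path


def find_issues(lines, keyword, zip_root="."):
    """Return (issue_id, issue_url, xml_file) for titles containing keyword."""
    keyword = keyword.lower()
    matches = []
    for row in csv.DictReader(lines):
        if keyword in row["issue_title"].lower():
            # Assumption: xml_path is relative to the extracted ZIP root.
            xml_file = Path(zip_root) / row["xml_path"]
            matches.append((row["issue_id"], row["issue_url"], xml_file))
    return matches


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """record_id,issue_id,issue_url,issue_title,xml_path
1,101,https://example.org/issues/101,Workbook fails to open,poi/101.xml
2,102,https://example.org/issues/102,Cell style is lost on save,poi/102.xml
"""

hits = find_issues(io.StringIO(SAMPLE), "style", zip_root="extracted")
for issue_id, url, xml_file in hits:
    print(issue_id, url, xml_file)
```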



Commits


A JSON file with commit information for POI 3.17 (until revision 219dff00e6, on Sept. 8, 2017). The information was extracted using the tools Historage and Kataribe.

For each commit, we provide:

  • Commit hash
  • Parent commit hash (if one exists)
  • Commit message
  • Commit time
  • Committer name
  • Method-level changes (addition/deletion/modification/renaming and method FQN).
    • The FQN contains information about the class (CN) and method (MT) or constructor (CS)
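The fields listed above can be modelled as a small record type. The following is a sketch only: the description does not give the JSON key names or the overall layout, so the keys used here ("hash", "parent", "message", "time", "committer", "changes", "type", "fqn") and the top-level list are hypothetical and should be checked against the actual file. The sample data, including the FQN placeholders, is made up.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MethodChange:
    kind: str  # addition, deletion, modification, or renaming
    fqn: str   # FQN of the changed method or constructor


@dataclass
class Commit:
    commit_hash: str
    parent_hash: Optional[str]  # None when no parent exists
    message: str
    time: str
    committer: str
    changes: List[MethodChange] = field(default_factory=list)


def parse_commits(text):
    """Parse a JSON list of commits; all key names are hypothetical."""
    commits = []
    for item in json.loads(text):
        changes = [MethodChange(c["type"], c["fqn"]) for c in item.get("changes", [])]
        commits.append(
            Commit(item["hash"], item.get("parent"), item["message"],
                   item["time"], item["committer"], changes)
        )
    return commits


def count_change_kinds(commits):
    """Count method-level changes by kind across all commits."""
    return Counter(ch.kind for c in commits for ch in c.changes)


# Made-up data for illustration only; not taken from the dataset.
SAMPLE = """[
  {"hash": "aaa111", "message": "Initial import", "time": "2017-01-01T00:00:00Z",
   "committer": "Example Committer",
   "changes": [{"type": "addition", "fqn": "example-fqn-1"}]},
  {"hash": "bbb222", "parent": "aaa111", "message": "Fix bug",
   "time": "2017-01-02T00:00:00Z", "committer": "Example Committer",
   "changes": [{"type": "modification", "fqn": "example-fqn-1"}]}
]"""

commits = parse_commits(SAMPLE)
kind_counts = count_change_kinds(commits)
print(kind_counts)
```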

Stack Overflow posts


JSON file with all 6,299 Stack Overflow threads with the apache-poi tag,

Files (184.3 MB): five files of 12.4 MB, 166.8 MB, 2.3 MB, 85.3 kB, and 2.7 MB.