Published January 31, 2020 | Version 1.0.0
Dataset Open

LogChunks: A Data Set for Build Log Analysis

  • Delft University of Technology

Description

We collected 797 Travis CI logs from 80 GitHub repositories spanning 29 main development languages.
You can find our collection tool in `log-collection` and the logs sorted by language and repository in `logs`.
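
As a quick sanity check after unpacking, the logs can be tallied per language. The sketch below is only illustrative: it assumes a layout of `logs/<language>/<repository>/<log file>`, which the description suggests but does not spell out, and `count_logs_per_language` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def count_logs_per_language(logs_root):
    """Count files under logs_root, keyed by the first directory level.

    Assumption: logs are stored as logs/<language>/<repository>/<log file>.
    """
    root = Path(logs_root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Skip stray files that sit directly in the root folder.
        if len(parts) < 2:
            continue
        # The first component below the root is taken as the language.
        counts[parts[0]] += 1
    return counts
```

If the assumed layout holds, `sum(count_logs_per_language("logs").values())` should match the number of logs stated above.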

We manually labeled the part (chunk) of each log that describes why the build failed. In addition, each chunk is annotated with the keywords we would use to search for it and categorized according to its structural representation within the log.
You can find this data in one XML file per repository in `build-failure-reason`.
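
The element names used in these XML files are not listed here, so a reasonable first step is to inspect them rather than guess. The following sketch counts how often each tag occurs across the files. It assumes the files carry an `.xml` extension and sit directly inside the given folder; `tag_histogram` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def tag_histogram(xml_dir):
    """Count element tags across all *.xml files directly inside xml_dir."""
    counts = Counter()
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        tree = ET.parse(xml_path)
        # iter() walks the root element and all of its descendants.
        for element in tree.iter():
            counts[element.tag] += 1
    return counts
```

Calling `tag_histogram("build-failure-reason")` gives an overview of the structure before any schema-specific parsing is written.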

Files

LogChunks.zip (24.1 MB)
md5:aafa45079bdae44e340f4474ca5c4340
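
The MD5 checksum listed for the archive can be used to confirm that a download is intact. A minimal sketch, assuming the archive has been saved locally (the local path and the helper names `md5_of_file` and `verify_archive` are illustrative):

```python
import hashlib

# Checksum as given in the file listing for LogChunks.zip.
EXPECTED_MD5 = "aafa45079bdae44e340f4474ca5c4340"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

For example, `verify_archive("LogChunks.zip")` returns `True` only when the local copy matches the published checksum.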