LogChunks: A Data Set for Build Log Analysis
We collected 797 Travis CI logs from 80 GitHub repositories spanning 29 different main development languages.
Our collection tool is in `log-collection`, and the logs, sorted by language and repository, are in `logs`.
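A minimal sketch of walking the `logs` directory to count logs per language and repository. It assumes the layout `logs/<language>/<repository>/<log file>`, which is inferred from "sorted by language and repository" and is not confirmed here. The function name `count_logs` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_logs(logs_dir):
    """Count log files per (language, repository).

    Assumes the hypothetical layout logs/<language>/<repository>/<log file>;
    adjust the loops if the actual directory structure differs.
    """
    counts = Counter()
    root = Path(logs_dir)
    for language_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for repo_dir in sorted(p for p in language_dir.iterdir() if p.is_dir()):
            # Every regular file inside a repository folder is treated as one log.
            counts[(language_dir.name, repo_dir.name)] = sum(
                1 for f in repo_dir.iterdir() if f.is_file()
            )
    return counts
```

For example, `sum(count_logs("logs").values())` gives the total number of log files found under that assumed layout.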
We manually labeled the part of each log (the chunk) that describes why the build failed. In addition, each chunk is annotated with the keywords we would use to search for it and categorized by its structural representation within the log.
This data is stored in `build-failure-reason`, with one XML file per repository.
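A minimal sketch of loading the per-repository XML files with Python's standard `xml.etree.ElementTree`. The element names of the LogChunks schema are not described here, so the sketch does not assume any: it only parses each file and counts the direct children of the root by tag. Treating the file name (without `.xml`) as the repository identifier is an assumption, and `load_annotations` and `child_tag_counts` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_annotations(annotation_dir):
    """Parse every *.xml file in annotation_dir.

    Returns a dict mapping the file stem (assumed to identify the
    repository) to the parsed root element.
    """
    roots = {}
    for xml_file in sorted(Path(annotation_dir).glob("*.xml")):
        roots[xml_file.stem] = ET.parse(str(xml_file)).getroot()
    return roots


def child_tag_counts(root):
    """Count the direct children of an XML root element by tag name.

    The tag names depend on the actual annotation schema, which is
    not assumed here.
    """
    counts = {}
    for child in root:
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return counts
```

Inspecting `child_tag_counts` for one file is a quick way to discover the real element names before writing schema-specific code.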