{ "access": { "embargo": { "active": false, "reason": null }, "files": "public", "record": "public", "status": "open" }, "created": "2018-01-09T19:05:46.098646+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "files": { "count": 5, "enabled": true, "entries": { "tr_projects_sample_filtered.csv": { "checksum": "md5:abde8855d3e00d07f92a9b33e6f7508e", "ext": "csv", "id": "88877050-f48a-408e-9124-74702c56253a", "key": "tr_projects_sample_filtered.csv", "metadata": null, "mimetype": "text/csv", "size": 56573 }, "tr_sample_commits_default_branch_before_ci.csv": { "checksum": "md5:8151aae585b3200b53dc8dd279a8b4b9", "ext": "csv", "id": "d8ee1b4d-6b54-4926-a5ef-cb78dd2b01a4", "key": "tr_sample_commits_default_branch_before_ci.csv", "metadata": null, "mimetype": "text/csv", "size": 14189298 }, "tr_sample_commits_default_branch_during_ci.csv": { "checksum": "md5:9fa59d1b42aaf1f9d986265f316531c4", "ext": "csv", "id": "35c8106d-4ed8-4f57-9bef-80a0cdc30316", "key": "tr_sample_commits_default_branch_during_ci.csv", "metadata": null, "mimetype": "text/csv", "size": 14463352 }, "tr_sample_merges_default_branch_before_ci.csv": { "checksum": "md5:79b265b0a6b6f0015a8c95d16e8858f0", "ext": "csv", "id": "6d7d6417-2fb2-413b-8084-119627633ba6", "key": "tr_sample_merges_default_branch_before_ci.csv", "metadata": null, "mimetype": "text/csv", "size": 1653515 }, "tr_sample_merges_default_branch_during_ci.csv": { "checksum": "md5:c15caa9e6cf4cec1e5beacf0c5b2f7dc", "ext": "csv", "id": "fc3b3c0a-7858-4703-b87e-28c26bb5ba77", "key": "tr_sample_merges_default_branch_during_ci.csv", "metadata": null, "mimetype": "text/csv", "size": 2582292 } }, "order": [], "total_bytes": 32945030 }, "id": "1140261", "is_draft": false, "is_published": true, "links": { "access": "https://zenodo.org/api/records/1140261/access", "access_links": "https://zenodo.org/api/records/1140261/access/links", "access_request": "https://zenodo.org/api/records/1140261/access/request", 
"access_users": "https://zenodo.org/api/records/1140261/access/users", "archive": "https://zenodo.org/api/records/1140261/files-archive", "archive_media": "https://zenodo.org/api/records/1140261/media-files-archive", "communities": "https://zenodo.org/api/records/1140261/communities", "communities-suggestions": "https://zenodo.org/api/records/1140261/communities-suggestions", "doi": "https://doi.org/10.5281/zenodo.1140261", "draft": "https://zenodo.org/api/records/1140261/draft", "files": "https://zenodo.org/api/records/1140261/files", "latest": "https://zenodo.org/api/records/1140261/versions/latest", "latest_html": "https://zenodo.org/records/1140261/latest", "media_files": "https://zenodo.org/api/records/1140261/media-files", "parent": "https://zenodo.org/api/records/1140260", "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.1140260", "parent_html": "https://zenodo.org/records/1140260", "requests": "https://zenodo.org/api/records/1140261/requests", "reserve_doi": "https://zenodo.org/api/records/1140261/draft/pids/doi", "self": "https://zenodo.org/api/records/1140261", "self_doi": "https://zenodo.org/doi/10.5281/zenodo.1140261", "self_html": "https://zenodo.org/records/1140261", "self_iiif_manifest": "https://zenodo.org/api/iiif/record:1140261/manifest", "self_iiif_sequence": "https://zenodo.org/api/iiif/record:1140261/sequence/default", "versions": "https://zenodo.org/api/records/1140261/versions" }, "media_files": { "count": 0, "enabled": false, "entries": {}, "order": [], "total_bytes": 0 }, "metadata": { "creators": [ { "affiliations": [ { "name": "University of Trier" } ], "person_or_org": { "family_name": "Baltes", "given_name": "Sebastian", "identifiers": [ { "identifier": "0000-0002-2442-7522", "scheme": "orcid" } ], "name": "Baltes, Sebastian", "type": "personal" } }, { "affiliations": [ { "name": "University of Trier" } ], "person_or_org": { "family_name": "Knack", "given_name": "Jascha", "name": "Knack, Jascha", "type": "personal" } } ], 
"description": "
This dataset is based on the TravisTorrent dataset released 2017-01-11 (https://travistorrent.testroots.org), the Google BigQuery GHTorrent dataset accessed 2017-07-03, and the Git log history of all projects in the dataset, retrieved between 2017-07-16 and 2017-07-17.
\n\nWe selected projects hosted on GitHub that employ the Continuous Integration (CI) system Travis CI. We identified the projects using the TravisTorrent data set and considered projects that:
\n\nTo derive the time frames, we employed the GHTorrent BigQuery dataset. The resulting sample contains 321 projects. Of these projects, 214 are Ruby projects and 107 are Java projects. The mean time span before_ci was 2.9 years (SD=1.9, Mdn=2.3), and the mean time span during_ci was 3.2 years (SD=1.1, Mdn=3.3). For our analysis, we only consider the activity one year before and one year after the first build.
\n\nWe cloned the selected project repositories and extracted the version history for all branches (see https://github.com/sbaltes/git-log-parser). For each repo and branch, we created one log file with all regular commits and one log file with all merges. We only considered commits changing non-binary files and applied a file extension filter to only consider changes to Java or Ruby source code files. From the log files, we then extracted metadata about the commits and stored this data in CSV files (see https://github.com/sbaltes/git-log-parser).
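\n\nThe file extension filter described above can be sketched as follows. This is a minimal illustration, not the code of the linked tools: the function name touches_source_code and the extension set ('.java', '.rb') are assumptions made for this example.

```python
import os

# Assumed extension set; the tools linked above may use a different list.
SOURCE_EXTENSIONS = {'.java', '.rb'}


def touches_source_code(changed_paths):
    '''Return True if any changed path has a Java or Ruby extension.'''
    return any(
        os.path.splitext(path)[1].lower() in SOURCE_EXTENSIONS
        for path in changed_paths
    )


# Hypothetical paths for illustration only.
print(touches_source_code(['README.md', 'lib/app.rb']))
print(touches_source_code(['logo.png']))
```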
\n\nThe dataset contains the following files:
\n\ntr_projects_sample_filtered.csv
\nA CSV file with information about the 321 selected projects.
\n\ntr_sample_commits_default_branch_before_ci.csv
\ntr_sample_commits_default_branch_during_ci.csv
\nTwo CSV files with information about all commits to the default branch before and after the first CI build, respectively. Only commits modifying, adding, or deleting Java or Ruby source code files were considered. Both CSV files have the following columns:
\nproject: GitHub project name ("/" replaced by "_").
\nbranch: The branch to which the commit was made.
\nhash_value: The SHA1 hash value of the commit.
\nauthor_name: The author name.
\nauthor_email: The author email address.
\nauthor_date: The authoring timestamp.
\ncommit_name: The committer name.
\ncommit_email: The committer email address.
\ncommit_date: The commit timestamp.
\nlog_message_length: The length of the Git commit message (in characters).
\nfile_count: Number of files changed with this commit.
\nlines_added: Total number of lines added to all files changed with this commit.
\nlines_deleted: Total number of lines deleted in all files changed with this commit.
\nfile_extensions: Distinct file extensions of files changed with this commit.
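\n\nAs a rough sketch of how the commit columns above might be consumed, the following example sums the added and deleted lines per project. It assumes that the CSV files start with a header row using the column names listed above and that fields are comma-separated; neither is stated in this description. The function name churn_per_project and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def churn_per_project(csv_file):
    '''Count commits and sum lines_added / lines_deleted per project.

    Assumes a header row with the listed column names and a comma
    delimiter; adjust the reader if the real files differ.
    '''
    totals = defaultdict(lambda: {'commits': 0, 'added': 0, 'deleted': 0})
    for row in csv.DictReader(csv_file):
        entry = totals[row['project']]
        entry['commits'] += 1
        entry['added'] += int(row['lines_added'])
        entry['deleted'] += int(row['lines_deleted'])
    return dict(totals)


# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO('''project,lines_added,lines_deleted
owner_repo,10,2
owner_repo,5,1
other_lib,3,0
''')

result = churn_per_project(sample)
print(result['owner_repo'])
```

For the real files, an open file handle (for example, open('tr_sample_commits_default_branch_before_ci.csv', newline='')) can be passed instead of the in-memory sample.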
\n\ntr_sample_merges_default_branch_before_ci.csv
\ntr_sample_merges_default_branch_during_ci.csv
\nTwo CSV files with information about all merges into the default branch before and after the first CI build, respectively. Only merges modifying, adding, or deleting Java or Ruby source code files were considered. Both CSV files have the following columns:
\nproject: GitHub project name ("/" replaced by "_").
\nbranch: The destination branch of the merge.
\nhash_value: The SHA1 hash value of the merge commit.
\nmerged_commits: Unique hash value prefixes of the commits merged with this commit.
\nauthor_name: The author name.
\nauthor_email: The author email address.
\nauthor_date: The authoring timestamp.
\ncommit_name: The committer name.
\ncommit_email: The committer email address.
\ncommit_date: The commit timestamp.
\nlog_message_length: The length of the Git commit message (in characters).
\nfile_count: Number of files changed with this commit.
\nlines_added: Total number of lines added to all files changed with this commit.
\nlines_deleted: Total number of lines deleted in all files changed with this commit.
\nfile_extensions: Distinct file extensions of files changed with this commit.
\npull_request_id: ID of the GitHub pull request that has been merged with this commit (extracted from log message).
\nsource_user: GitHub login name of the user who initiated the pull request (extracted from log message).
\nsource_branch: Source branch of the pull request (extracted from log message).
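\n\nBecause the pull request columns are extracted from the log message, they may be empty for merges that did not originate from a GitHub pull request. The following sketch computes the share of merges with a pull request ID. It assumes a header row with the column names listed above, a comma delimiter, and an empty field where no ID could be extracted; these are assumptions, not documented properties of the files. The function name pull_request_share and the sample rows are hypothetical.

```python
import csv
import io


def pull_request_share(csv_file):
    '''Return the fraction of merge rows with a non-empty pull_request_id.'''
    total = 0
    with_pr = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # Treat missing or whitespace-only values as 'no pull request'.
        if (row.get('pull_request_id') or '').strip():
            with_pr += 1
    return with_pr / total if total else 0.0


# Made-up rows for illustration only; not taken from the dataset.
merges = io.StringIO('''project,pull_request_id,source_user,source_branch
owner_repo,12,alice,fix-typo
owner_repo,,,
other_lib,7,bob,feature
other_lib,9,carol,patch-1
''')

share = pull_request_share(merges)
print(share)
```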