Published October 11, 2023 | Version v1
Dataset Open

Angular GitHub Commits Object-centric Event Log

  • 1. Chair of Process and Data Science (PADS), RWTH Aachen University

Contributors

  • 1. Chair of Process and Data Science (PADS), RWTH Aachen University

Description

Overview

This real-world object-centric event log, in the OCEL 2.0 standard, contains commit information extracted from the GitHub repository used to develop the Angular platform. Each code commit in the repository is abstracted to one event in the log. The dataset contains essential information for each commit, such as the timestamp and the contributor's details. Crucially, commit information is connected to two classes of objects: the file(s) affected by the commit, and the branch(es) in the repository containing the commit.

Description

GitHub, a popular platform for developers that offers the functionality of the Git version control system, lets contributors record individual modifications to software projects; these modifications are grouped in units called commits. A commit contains all details of the edits made to a group of files in the project. The commits of a project therefore constitute a ledger that makes it possible to rewind or fast-forward through all contributions to the project.

Commits in a project are arranged in branches, which form a tree-like structure. A contributor may create a new branch, essentially a copy of the project, in order to commit modifications safely. Once the contributor is satisfied with the edits, they may merge the new branch back into the pre-existing branch (this is realized by applying the modifications of all the new commits sequentially, and then resolving any conflicts that arise).

This log contains an extraction of the commit information of the Angular project on GitHub. The abstraction level is such that every commit corresponds to an event in the log.

For each event, the following information is recorded:

  • a unique identifier (hash)
  • the author's timestamp of the commit (includes timezone information)
  • an activity label: the Angular project conforms to the Conventional Commits initiative, which mandates commit messages containing an initial identifier. This helps to reconstruct a clean activity notion. Some of the labels have been cleaned by hand (for instance, in case of typos)
  • the message of the commit
  • the contributor's name
  • the contributor's email (resource)
  • a merge flag; True if the commit is a merge, False otherwise
  • information related to the files edited by the commit (in case of renames, we track the new name)
  • information related to the branches in which the commit appears
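
As a rough illustration of how an activity label can be read off a commit message, the sketch below parses the header format described by the Conventional Commits specification (a type, an optional scope in parentheses, an optional "!", then a colon). The function name `activity_label` and the fallback value "unknown" are hypothetical; the manual cleaning mentioned above is not reproduced, and this is not necessarily how the labels in the log were derived.

```python
import re

# Conventional Commits header: type, optional "(scope)", optional "!", then ": ".
HEADER = re.compile(r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^)]*)\))?(?P<breaking>!)?:\s")


def activity_label(message, default="unknown"):
    """Return the lower-cased commit type of a message, or `default` if none is found.

    `default` is a hypothetical placeholder for messages without a valid header.
    """
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    match = HEADER.match(first_line)
    return match.group("type").lower() if match else default


print(activity_label("fix(compiler): handle empty templates"))  # fix
print(activity_label("Merge branch 'main' into feature"))       # unknown
```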

Files and branches are two distinct object types in this log. Note that a commit might not be associated with any file. Conversely, a commit always appears in at least one branch.

This event log has been extracted with the help of PyDriller.
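
The exact extraction script is not part of this record. A minimal sketch of how the fields listed above could be collected with PyDriller follows, assuming PyDriller 2.x attribute names (`hash`, `author_date`, `msg`, `author`, `merge`, `modified_files`, `branches`). The function names, the dictionary keys, and the fallback to the old path for deleted files are assumptions made for illustration; the activity label is left out. The stand-in commit at the end uses made-up values so that the mapping can be tried without cloning a repository.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def commit_to_event(commit):
    """Map a PyDriller-style commit object to a flat event record (hypothetical layout)."""
    files = []
    for mf in commit.modified_files:
        # For renames, keep the new name; new_path is None for deleted files,
        # so fall back to the old path (an assumption, not stated in the record).
        path = mf.new_path or mf.old_path
        if path is not None:
            files.append(path)
    return {
        "id": commit.hash,
        "timestamp": commit.author_date.isoformat(),  # keeps the timezone offset
        "message": commit.msg,
        "contributor": commit.author.name,
        "resource": commit.author.email,
        "merge": commit.merge,
        "files": sorted(set(files)),
        "branches": sorted(commit.branches),
    }


def extract_events(repo_path):
    """Yield one event record per commit of the repository at `repo_path`."""
    from pydriller import Repository  # third-party: pip install pydriller

    for commit in Repository(repo_path).traverse_commits():
        yield commit_to_event(commit)


# Stand-in commit with made-up values, only to exercise the mapping.
demo = SimpleNamespace(
    hash="0a1b2c",
    author_date=datetime(2023, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=1))),
    msg="fix(core): example",
    author=SimpleNamespace(name="Jane Doe", email="jane@example.com"),
    merge=False,
    modified_files=[
        SimpleNamespace(old_path="a/old.ts", new_path="a/new.ts"),
        SimpleNamespace(old_path="b/gone.ts", new_path=None),
    ],
    branches={"main"},
)
event = commit_to_event(demo)
print(event["files"])  # ['a/new.ts', 'b/gone.ts']
```

On a local clone, `extract_events` would be iterated directly, with the path to the clone as its only argument.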

Properties

This event log has the following properties:

Property             Value
Events               27847
Activity Labels      67
Object Types         2
Objects (files)      35392
Objects (branches)   119

Get started

Download the dataset and place it in the working directory of your Python script or console.

pip install pm4py

To manipulate object-centric logs programmatically, use the functionality of the ocel package in the PM4Py library. Additionally, check out the tool support for object-centric event logs!

from pm4py import ocel
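
A possible way to load the log and inspect its basic counts is sketched below. The helper names (`READERS`, `reader_name`, `load_log`, `summarize`) are hypothetical. The reader functions named in the mapping are PM4Py functions, but whether a given file of this dataset matches the layout a reader expects (in particular the CSV reader) is an assumption to verify. The column names `ocel:activity` and `ocel:type` are PM4Py's defaults for its OCEL tables.

```python
from pathlib import Path

# File extension -> name of the PM4Py reader tried for it (assumed mapping).
READERS = {
    ".csv": "read_ocel_csv",
    ".json": "read_ocel2_json",
    ".xml": "read_ocel2_xml",
    ".sqlite": "read_ocel2_sqlite",
}


def reader_name(path):
    """Return the name of the PM4Py reader assumed to match the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no reader known for extension {suffix!r}")
    return READERS[suffix]


def load_log(path):
    """Load an object-centric event log with the reader chosen by `reader_name`."""
    import pm4py  # imported lazily so reader_name works without PM4Py installed

    return getattr(pm4py, reader_name(path))(path)


def summarize(ocel):
    """Count events, activity labels and objects per type from the OCEL tables."""
    return {
        "events": len(ocel.events),
        "activity_labels": ocel.events["ocel:activity"].nunique(),
        "objects_per_type": ocel.objects["ocel:type"].value_counts().to_dict(),
    }


# Usage, once the file has been downloaded next to the script:
# ocel = load_log("angular_github_commits_ocel.csv")
# print(summarize(ocel))
```

The counts returned by `summarize` can be compared against the values listed under Properties as a sanity check after loading.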

Acknowledgements

We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.

Files

angular_github_commits_ocel.csv

Files (326.1 MB)

Checksum                               Size
md5:20e844f9f56cec07fc00160ccbaea205   57.9 MB
md5:9d485d0aa4799862305f45c1a3412088   90.1 MB
md5:6c6200127174d953379ebd50e0ab7e70   178.1 MB