Angular GitHub Commits Object-centric Event Log
Contributors
Supervisor:
- 1. Chair of Process and Data Science (PADS), RWTH Aachen University
Description
Overview
This real-world object-centric event log in the OCEL 2.0 standard contains an extraction of the commit information from the GitHub repository used to develop the Angular platform. A single code commit in the repository is abstracted to one event in the log. The dataset contains essential information for each commit, such as the timestamp and the contributor's details. Crucially, commit information is connected to two classes of objects: the file(s) affected by the commit, and the branch(es) in the repository containing the commit.
Description
GitHub, a popular platform for developers that offers the functionality of the Git version control system, allows contributors to record individual modifications to software projects; such modifications are grouped in units called commits. Commits contain all details of the edits made to a group of files in the project. Therefore, all commits of a project constitute a ledger that makes it possible to rewind or fast-forward all contributions to the project.
Commits in a project are arranged in branches, which form a tree-like structure. A contributor may create a new branch, essentially a copy of the project, in order to commit modifications safely. Once the contributor is satisfied with the edits, they may merge their new branch back into the pre-existing branch (realized by applying the modifications of all the new commits sequentially, and then resolving any conflicts that may arise).
This log contains an extraction of the commit information of the Angular project on GitHub. The abstraction level is such that every commit corresponds to an event in the log.
For each event, the following information is recorded:
- a unique identifier (hash)
- the author's timestamp of the commit (includes timezone information)
- an activity label: the Angular project conforms to the Conventional Commits initiative, which requires each commit message to start with a type identifier (for example, fix or feat). This helps to reconstruct a clean activity notion. Some of the labels have been cleaned by hand (for instance, in case of typos)
- the message of the commit
- the contributor's name
- the contributor's email (resource)
- a merge flag; True if the commit is a merge, False otherwise
- information related to the files edited by the commit (in case of renames, we track the new name)
- information related to the branches in which the commit appears
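To illustrate how an activity label can be derived from a commit message, here is a minimal sketch that reads the leading identifier of a Conventional Commits header (for example, "fix(core): ..."). The regular expression, the function name activity_label and the fallback value are assumptions made for illustration; they are not the extraction code used for this log, where some labels were additionally cleaned by hand.

```python
import re

# Conventional Commits header: <type>[(scope)][!]: <description>
# This pattern is an illustrative assumption, not the authors' actual rule.
_HEADER = re.compile(r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^)]*)\))?!?:\s")


def activity_label(message: str, fallback: str = "unlabeled") -> str:
    """Return the lowercased commit type from the first line of a message.

    Messages that do not follow the convention (for instance, default
    merge messages) get the hypothetical `fallback` label.
    """
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    match = _HEADER.match(first_line)
    return match.group("type").lower() if match else fallback


if __name__ == "__main__":
    for msg in ("fix(core): handle null input", "Merge pull request #1 from x/y"):
        print(repr(msg), "->", activity_label(msg))
```

Lowercasing the identifier merges trivially different spellings such as "Docs" and "docs"; real typos still require manual cleaning, as noted above.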
Files and branches are two distinct object types in this log. Note that a commit might not be associated with any file. Conversely, a commit always appears in at least one branch.
This event log has been extracted with the help of PyDriller.
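For readers who want to run a similar extraction on another repository, the following sketch shows how the per-commit fields listed above map onto PyDriller's Commit objects. It is a simplified illustration, not the script used to build this log: the dictionary keys, the function names and the choice to fall back to the old path for deleted files are assumptions.

```python
from typing import Any, Dict, Iterator


def commit_to_event(commit: Any) -> Dict[str, Any]:
    """Flatten one PyDriller Commit into a plain dict (key names are hypothetical)."""
    # For renames, new_path holds the new name. For deleted files new_path is
    # None, so we fall back to old_path (an assumption of this sketch).
    files = sorted({
        f.new_path or f.old_path
        for f in commit.modified_files
        if (f.new_path or f.old_path)
    })
    return {
        "id": commit.hash,
        "timestamp": commit.author_date,  # timezone-aware datetime
        "message": commit.msg,
        "contributor_name": commit.author.name,
        "resource": commit.author.email,
        "merge": commit.merge,
        "files": files,
        "branches": sorted(commit.branches),
    }


def iter_events(repo_path: str) -> Iterator[Dict[str, Any]]:
    """Yield one event dict per commit of the repository at `repo_path`."""
    from pydriller import Repository  # third-party: pip install pydriller

    for commit in Repository(repo_path).traverse_commits():
        yield commit_to_event(commit)
```

Calling iter_events on a local clone yields one dictionary per commit. Turning these dictionaries into an OCEL 2.0 log (events, file objects, branch objects and their relations) is a separate step that is not shown here.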
Properties
This event log has the following properties:
Property | Value |
Events | 27847 |
Activity Labels | 67 |
Object Types | 2 |
Objects (files) | 35392 |
Objects (branches) | 119 |
Get started
Download the dataset and place it in the folder of your Python script or console.
pip install pm4py
To manipulate object-centric logs programmatically, use the functionality of the ocel package in the PM4Py library. Additionally, check out the tool support for object-centric event logs!
from pm4py import ocel
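As a starting point, the sketch below loads the log with PM4Py and computes a few counts that can be compared against the Properties table above. It assumes that the downloaded file is one of the usual OCEL 2.0 serializations (JSON, XML or SQLite) and selects the matching PM4Py reader by file extension; the helper names and the file name in the usage comment are hypothetical.

```python
import os

# File extension -> name of the PM4Py reader for that OCEL 2.0 serialization.
_READERS = {
    ".json": "read_ocel2_json",
    ".jsonocel": "read_ocel2_json",
    ".xml": "read_ocel2_xml",
    ".xmlocel": "read_ocel2_xml",
    ".sqlite": "read_ocel2_sqlite",
}


def reader_name_for(path: str) -> str:
    """Return the name of the PM4Py reader function matching the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in _READERS:
        raise ValueError(f"unrecognized OCEL 2.0 file extension: {ext!r}")
    return _READERS[ext]


def summarize(path: str) -> dict:
    """Load the log and count events, activity labels and objects per type."""
    import pm4py  # imported lazily so the helper above works without PM4Py

    ocel = getattr(pm4py, reader_name_for(path))(path)
    return {
        "events": len(ocel.events),
        "activity_labels": ocel.events[ocel.event_activity].nunique(),
        "objects_per_type": ocel.objects[ocel.object_type_column]
        .value_counts()
        .to_dict(),
    }


# Usage (the file name is a placeholder, replace it with the downloaded file):
# print(summarize("angular_commits.sqlite"))
```

The events and objects tables of the loaded log are pandas DataFrames, so the usual pandas operations can be used for further filtering and aggregation.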
Acknowledgements
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.