Replication Package for "Beyond the YAML File: Understanding Real-World GitHub Actions Workflow Adoption"

Ali Khatami; Carolin Brandt; Andy Zaidman

doi:10.5281/zenodo.18258226

Published 2026 | Version v2

Resource type: Dataset | Access: Open

This repository contains the replication package for the study "Beyond the YAML File: Understanding Real-World GitHub Actions Workflow Adoption". It provides the code, data artifacts, and instructions needed to reproduce the quantitative and qualitative analyses from the paper and to access the underlying data.

Repository structure

- data-pipelines-and-analysis/
  - data_pipeline/: End-to-end data collection, storage, and analysis pipeline.
    - collect_data.py: Entry point to collect data from the GitHub API.
    - config/: Configuration (API tokens, runtime options).
    - crawlers/: Modular crawlers for repositories, commits, pull requests, workflow runs, and jobs.
    - database/: DB connection and ORM-like models used during collection.
    - persistence/: Storage interfaces and stores.
    - data/: Lightweight helpers and logs for local data handling.
    - analysis/: Reproduction scripts for figures, tables, and statistics used in the paper. Includes figures/ and the intermediate data/ CSVs used by the scripts.
    - services/ and scripts/: Utilities and checks (e.g., data quality scripts).
    - README.md: Detailed setup, configuration, and execution steps for the pipeline and analyses.
  - manual_and_qualitative/: CSVs and notes for manual and qualitative analysis. See its README.md for details and suggested usage.
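
After unpacking the archive, a quick layout check can confirm that the key entries named in this description are present. The following is a minimal sketch, not part of the package: the helper name `missing_paths` and the list `EXPECTED_PATHS` are hypothetical, the nesting under data_pipeline/ is inferred from the paths mentioned in this description, and the archive may unpack into an extra top-level folder, so `root` has to point at the directory that contains data-pipelines-and-analysis/.

```python
from pathlib import Path

# Paths named in this description, relative to the unpacked package root.
# Assumption: collect_data.py and requirements.txt sit directly under data_pipeline/.
EXPECTED_PATHS = [
    "data-pipelines-and-analysis/data_pipeline/collect_data.py",
    "data-pipelines-and-analysis/data_pipeline/requirements.txt",
    "data-pipelines-and-analysis/data_pipeline/README.md",
    "data-pipelines-and-analysis/data_pipeline/analysis",
    "data-pipelines-and-analysis/manual_and_qualitative/README.md",
]


def missing_paths(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [entry for entry in EXPECTED_PATHS if not (root / entry).exists()]
```

An empty result from `missing_paths("path/to/unpacked/package")` suggests the archive was extracted completely; any listed entry points at a different layout or an incomplete extraction.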

Quick start

Reproducing the pipeline and analysis:

1. Navigate to data-pipelines-and-analysis/data_pipeline/.
2. Create and activate a virtual environment.
3. Install requirements: pip install -r requirements.txt
4. Follow the instructions in data-pipelines-and-analysis/data_pipeline/README.md to configure credentials, run data collection, and execute analysis scripts in analysis/.
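
Steps 1 to 3 above can also be scripted. The sketch below is one possible way to do so with the Python standard library, under stated assumptions: the function names and the `.venv` directory name are hypothetical and do not come from the package, and credentials still have to be configured as described in the pipeline README before any data collection.

```python
import os
import subprocess
import venv
from pathlib import Path

# Location of the pipeline inside the unpacked package (from step 1).
PIPELINE_DIR = Path("data-pipelines-and-analysis") / "data_pipeline"


def venv_python(env_dir):
    """Return the interpreter path inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":  # Windows layout
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir, requirements="requirements.txt"):
    """Build the pip command from step 3, using the environment's interpreter."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def set_up_environment(package_root, env_name=".venv"):
    """Create the virtual environment (step 2) and install requirements (step 3)."""
    # Resolve to an absolute path so the interpreter is found after changing cwd.
    pipeline_dir = Path(package_root).resolve() / PIPELINE_DIR
    env_dir = pipeline_dir / env_name
    venv.create(env_dir, with_pip=True)
    subprocess.run(install_command(env_dir), cwd=pipeline_dir, check=True)
    return env_dir
```

Calling `set_up_environment("path/to/unpacked/package")` creates the environment and installs the dependencies; it needs network access for pip.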

For qualitative artifacts and manual coding resources, see data-pipelines-and-analysis/manual_and_qualitative/README.md.

Reproducing figures and tables

The analysis/ folder inside data_pipeline/ contains the scripts that generate the figures and summary tables used in the paper. Many scripts read from analysis/data/ and write their outputs to analysis/figures/. For inputs, outputs, and usage, refer to each script's docstring and to data_pipeline/README.md.
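
Since the scripts are documented through their docstrings, it can help to list them all at once before running anything. The sketch below does this with the standard `ast` module, without importing or executing the scripts. It assumes the scripts are .py files directly under analysis/ and that their documentation is a module-level docstring; the function name `script_docstrings` is hypothetical.

```python
import ast
from pathlib import Path


def script_docstrings(analysis_dir):
    """Map each .py file directly under analysis_dir to its module docstring (or None)."""
    docs = {}
    for path in sorted(Path(analysis_dir).glob("*.py")):
        # Parse only; the script itself is never executed.
        tree = ast.parse(path.read_text(encoding="utf-8"))
        docs[path.name] = ast.get_docstring(tree)
    return docs
```

Printing the items of `script_docstrings("data-pipelines-and-analysis/data_pipeline/analysis")` gives a quick overview of what each script is meant to produce; scripts without a module docstring map to `None`.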

Files

Name	Size	MD5
Beyond the YAML File Understanding Real-World GitHub Actions Workflow Adoption - Replication Package.zip	137.3 MB	70a5ce9452725b8748eb1e11f136c074
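
The MD5 checksum listed for the archive can be used to verify a download before unpacking it. A minimal sketch using the standard `hashlib` module follows; the function names are hypothetical, and the file is read in chunks so the 137.3 MB archive does not need to fit in memory at once.

```python
import hashlib

# Checksum published on this record for the replication package archive.
EXPECTED_MD5 = "70a5ce9452725b8748eb1e11f136c074"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 equals the checksum listed on this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

If `matches_published_checksum` returns `False` for the downloaded zip, the download is likely incomplete or corrupted and should be repeated.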

Additional details

Programming language: Python
