Code Appendix: The innovation dynamics of programming technologies

Borchers, Conrad; Braesemann, Fabian

doi:10.5281/zenodo.17106338

Published September 12, 2025 | Version 1.0

# Stack Overflow Technology Evolution Dataset & Code

This repository contains the data processing pipeline and analysis code for the paper:

**"The Combinatorial Technology Evolution of Digital Programming Languages"**
Conrad Borchers (Carnegie Mellon University)
Fabian Braesemann (Oxford Internet Institute, ECDF, DWG)

## Overview

This project analyzes the combinatorial evolution of programming technologies using a large-scale dataset of Stack Overflow questions. We investigate how programming tags co-evolve over time, revealing distinct phases in technological development and the network dynamics of innovation.

Our analysis leverages weekly tag-level usage data from Stack Overflow and constructs correlation-based co-usage networks to capture the structure and shifts in digital programming ecosystems.
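
The step from question-level data to a correlation-based co-usage network can be sketched in base R as follows. This is a toy illustration under stated assumptions, not the code in `R/functions.R`: the example rows are made up, tags are assumed to be pipe-separated within each question, and the correlation `threshold` is a hypothetical value chosen only for the example.

```r
# Toy stand-in for the exported question table (made-up rows).
questions <- data.frame(
  tags = c("python|pandas", "python|pandas", "r|ggplot2",
           "python|pandas", "r|ggplot2", "r|ggplot2",
           "python|pandas", "python|pandas", "python|pandas", "r|ggplot2",
           "r|ggplot2", "r|ggplot2"),
  creation_date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
                            "2022-01-10", "2022-01-11", "2022-01-12",
                            "2022-01-17", "2022-01-18", "2022-01-19", "2022-01-20",
                            "2022-01-24", "2022-01-25")),
  stringsAsFactors = FALSE
)

# Assumption: tags are pipe-separated within each question.
tag_list <- strsplit(questions$tags, "|", fixed = TRUE)

# One row per (tag, week); weeks are labelled by their Monday start date.
week_label <- as.character(cut(questions$creation_date, "week"))
long <- data.frame(
  tag  = unlist(tag_list),
  week = rep(week_label, lengths(tag_list)),
  stringsAsFactors = FALSE
)

# Weekly usage matrix: rows are weeks, columns are tags.
weekly <- unclass(table(long$week, long$tag))

# Pairwise Pearson correlation of weekly usage between tags.
cors <- cor(weekly)

# Keep each tag pair once (upper triangle) if it exceeds the threshold.
threshold <- 0.5  # hypothetical cut-off
edges <- which(cors > threshold & upper.tri(cors), arr.ind = TRUE)
edge_list <- data.frame(
  from = colnames(cors)[edges[, "row"]],
  to   = colnames(cors)[edges[, "col"]],
  r    = cors[edges],
  stringsAsFactors = FALSE
)
print(edge_list)
```

In this toy data, tags that always appear together (`python` with `pandas`, `r` with `ggplot2`) have perfectly correlated weekly counts and therefore form the only two edges.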

## Repository Structure

```
.
├── R/
│   ├── functions.R    # Core data wrangling and modeling functions
│   └── utils.R        # Helper functions
├── _targets.R         # Pipeline definition using {targets} package
├── plots/
│   └── plots.R        # Script to generate figures used in the paper
└── README.md          # This file
```

## Data Access

The question data were extracted from the `posts_questions` table of the [Stack Overflow public dataset on Google BigQuery](https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow?hl=en-GB) on **April 22, 2022**, with the following query:

```sql
#standardSQL
SELECT tags, creation_date, owner_user_id, view_count AS views, score,
answer_count AS answers, favorite_count AS favs
FROM `bigquery-public-data.stackoverflow.posts_questions`
```

Because the Stack Overflow dataset is continuously updated, rerunning the query may not reproduce exactly the same data. The original export produced by the query above (`SO_April_2022_Conrad.csv`) is available from the authors **upon reasonable request**. The pipeline can then be run with that single file.

## Dependencies

This project is built in R and uses the `{targets}` package for reproducible workflows. Please refer to the script headers for required packages.
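
For orientation, a `{targets}` pipeline is declared in `_targets.R` as a list of targets and executed with `targets::tar_make()`. The fragment below is a minimal sketch of that pattern, not the repository's actual `_targets.R`: the target names `raw_file`, `questions`, and `n_questions` are hypothetical, and it assumes the requested CSV sits in the project root.

```r
# _targets.R (illustrative sketch; target names are hypothetical)
library(targets)

list(
  # Track the raw export as a file so that changes to it invalidate downstream targets.
  tar_target(raw_file, "SO_April_2022_Conrad.csv", format = "file"),
  # Read the export; the real pipeline may use a different reader.
  tar_target(questions, read.csv(raw_file)),
  # A trivial downstream target, standing in for the actual analysis steps.
  tar_target(n_questions, nrow(questions))
)
```

With such a file in place, calling `targets::tar_make()` from the project root builds all outdated targets, and `targets::tar_read(n_questions)` retrieves a stored result.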

## Citation

If you use this dataset or code, please cite our paper and acknowledge the Stack Overflow dataset.

[CITATION TO BE ADDED]

Files (170.0 kB)

| Name | MD5 | Size |
| --- | --- | --- |
| `_targets.R` | 0fb029e12817738f9a70c3a0847d0bd2 | 13.0 kB |
| `functions.R` | 5ab2754c6fe88542618aa8ac22944641 | 120.4 kB |
| `New Plots.R` | 937e257e828cd82330a20a010cebd812 | 22.5 kB |
| `plots.R` | 67b06787231e9a1cccd179be48eebc80 | 8.5 kB |
| `README.md` | 81824e929683b0bae3fb572cd9097487 | 2.4 kB |
| `utils.R` | 7ec3fad13727e59d2cd0d6a93c409568 | 3.2 kB |

Additional details

Accepted: 2025-09-12

Repository URL: https://github.com/conradborchers/stack-overflow-evolution

