Published September 12, 2025 | Version 1.0
Software Open

Code Appendix: The innovation dynamics of programming technologies

# Stack Overflow Technology Evolution Dataset & Code

This repository contains the data processing pipeline and analysis code for the paper:

**"The Combinatorial Technology Evolution of Digital Programming Languages"**  
Conrad Borchers (Carnegie Mellon University)  
Fabian Braesemann (Oxford Internet Institute, ECDF, DWG)

## Overview

This project analyzes the combinatorial evolution of programming technologies using a large-scale dataset of Stack Overflow questions. We investigate how programming tags co-evolve over time, revealing distinct phases in technological development and the network dynamics of innovation.

Our analysis leverages weekly tag-level usage data from Stack Overflow and constructs correlation-based co-usage networks to capture the structure and shifts in digital programming ecosystems.
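As a rough illustration of what a correlation-based co-usage network can look like, the base-R sketch below correlates the weekly time series of three tags and keeps an edge wherever the correlation exceeds a cutoff. The toy data, the tag names, the use of Pearson correlation on raw counts, and the `0.5` threshold are all assumptions made for this example; the paper's actual procedure is defined in `R/functions.R` and may differ.

```r
# Hypothetical toy data: rows are weeks, columns are tags, values are question counts.
set.seed(1)
weeks <- 52
trend <- seq_len(weeks)
weekly_counts <- cbind(
  python = 100 + 2 * trend + rnorm(weeks, sd = 5),  # rising
  pandas = 40 + 1 * trend + rnorm(weeks, sd = 5),   # rising
  jquery = 200 - 2 * trend + rnorm(weeks, sd = 5)   # declining
)

# Pairwise Pearson correlation between the tags' weekly series.
tag_cor <- cor(weekly_counts)

# Keep an edge where the correlation exceeds an assumed, illustrative cutoff.
threshold <- 0.5
adjacency <- (tag_cor > threshold) * 1
diag(adjacency) <- 0  # no self-loops

# Edge list of the resulting undirected network (upper triangle only).
idx <- which(upper.tri(adjacency) & adjacency == 1, arr.ind = TRUE)
edges <- data.frame(
  from = colnames(adjacency)[idx[, "row"]],
  to = colnames(adjacency)[idx[, "col"]],
  weight = tag_cor[idx]
)
print(edges)
```

In this toy setup the two rising tags end up connected, while the declining tag stays isolated because its correlation with the others is negative.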

## Repository Structure

```
.
├── R/
│   ├── functions.R           # Core data wrangling and modeling functions
│   └── utils.R               # Helper functions
├── _targets.R                # Pipeline definition using {targets} package
├── plots/
│   └── plots.R               # Script to generate figures used in the paper
└── README.md                 # This file
```

## Data Access

The post data were extracted from the `posts_questions` table of the [Stack Overflow public dataset on Google BigQuery](https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow?hl=en-GB) on **April 22<sup>nd</sup>, 2022**, using the following query:

```sql
#standardSQL
SELECT tags, creation_date, owner_user_id, view_count AS views, score, 
       answer_count AS answers, favorite_count AS favs
FROM `bigquery-public-data.stackoverflow.posts_questions`
```

Because the Stack Overflow dataset is continuously updated, rerunning the query may not reproduce exactly the same data. The original export produced by the query above (`SO_April_2022_Conrad.csv`) is available from the authors **upon reasonable request**. The pipeline can then be run with that single file.
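To give a sense of how such an export might be turned into weekly tag-level counts, here is a minimal base-R sketch on a few made-up rows. It assumes that `tags` holds pipe-delimited tag names and that `creation_date` starts with a `YYYY-MM-DD` date; both are assumptions about the export format, and the repository's own wrangling code in `R/functions.R` is the authoritative version.

```r
# Hypothetical rows mimicking two columns of the export; the real data would be
# loaded with something like read.csv("SO_April_2022_Conrad.csv").
posts <- data.frame(
  tags = c("python|pandas", "python", "javascript|jquery", "python|numpy"),
  creation_date = c("2022-01-03 10:15:00 UTC", "2022-01-04 08:00:00 UTC",
                    "2022-01-05 12:30:00 UTC", "2022-01-11 09:45:00 UTC"),
  stringsAsFactors = FALSE
)

# Assumption: tags are pipe-delimited, one string per question.
tag_list <- strsplit(posts$tags, "|", fixed = TRUE)

# One row per (question, tag) pair, keeping only the date part of the timestamp.
long <- data.frame(
  tag = unlist(tag_list),
  date = rep(as.Date(substr(posts$creation_date, 1, 10)), lengths(tag_list))
)

# Label each date with the Monday that starts its week ("%u": Monday = 1).
long$week <- long$date - (as.integer(format(long$date, "%u")) - 1)

# Count questions per week and tag.
long$n <- 1L
weekly_counts <- aggregate(n ~ week + tag, data = long, FUN = sum)
print(weekly_counts)
```

The result is a long table with one row per week and tag, which can be reshaped into a week-by-tag matrix for further analysis.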

## Dependencies

This project is built in R and uses the `{targets}` package for reproducible workflows. Please refer to the script headers for required packages.
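A `{targets}` pipeline is typically run from the directory that contains `_targets.R`. The sketch below wraps that step in a small helper; `run_pipeline()` is a hypothetical name introduced here and not part of the repository, while `targets::tar_make()` is the package's standard entry point for building outdated targets.

```r
# Hypothetical helper: run the {targets} pipeline only if the package is
# installed and a _targets.R script exists in the working directory.
run_pipeline <- function() {
  if (!requireNamespace("targets", quietly = TRUE)) {
    message("Install the package first: install.packages(\"targets\")")
    return(FALSE)
  }
  if (!file.exists("_targets.R")) {
    message("No _targets.R in ", getwd(), "; run this from the repository root.")
    return(FALSE)
  }
  targets::tar_make()  # builds only the targets that are out of date
  TRUE
}
```

Calling `run_pipeline()` from the repository root starts the build; any other packages listed in the script headers still need to be installed beforehand.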

## Citation

If you use this dataset or code, please cite our paper and acknowledge the Stack Overflow dataset.

[CITATION TO BE ADDED]

Additional details

Dates

Accepted
2025-09-12