Code Appendix: The innovation dynamics of programming technologies
# Stack Overflow Technology Evolution Dataset & Code
This repository contains the data processing pipeline and analysis code for the paper:
**"The Combinatorial Technology Evolution of Digital Programming Languages"**
- Conrad Borchers (Carnegie Mellon University)
- Fabian Braesemann (Oxford Internet Institute, ECDF, DWG)
## Overview
This project analyzes the combinatorial evolution of programming technologies using a large-scale dataset of Stack Overflow questions. We investigate how programming tags co-evolve over time, revealing distinct phases in technological development and the network dynamics of innovation.
Our analysis leverages weekly tag-level usage data from Stack Overflow and constructs correlation-based co-usage networks to capture the structure and shifts in digital programming ecosystems.
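To make the idea of a correlation-based co-usage network concrete, here is a minimal sketch in Python. It is not the repository's R implementation: the function names, the made-up weekly counts, and the 0.9 cutoff are all assumptions chosen for illustration, and the paper's actual network construction may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def correlation_network(series, threshold):
    """Return {(tag_a, tag_b): r} for tag pairs whose correlation is >= threshold."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges


# Made-up weekly question counts per tag (illustrative only, not real data).
weekly_counts = {
    "python": [10, 12, 15, 18, 22],
    "pandas": [2, 3, 4, 5, 7],
    "jquery": [20, 18, 15, 13, 10],
}

# Hypothetical cutoff; the paper's actual construction may differ.
network = correlation_network(weekly_counts, threshold=0.9)
for (a, b), r in network.items():
    print(f"{a} -- {b}: r = {r:.2f}")
```

In this toy input, only tags whose weekly counts rise and fall together end up connected; tags that move in opposite directions get no edge.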
## Repository Structure
```
.
├── R/
│   ├── functions.R   # Core data wrangling and modeling functions
│   └── utils.R       # Helper functions
├── _targets.R        # Pipeline definition using the {targets} package
├── plots/
│   └── plots.R       # Script to generate figures used in the paper
└── README.md         # This file
```
## Data Access
The post data was extracted from the `posts_questions` table of the [Stack Overflow public dataset on Google BigQuery](https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow?hl=en-GB) on **April 22<sup>nd</sup>, 2022**, with the following query:
```sql
#standardSQL
SELECT tags, creation_date, owner_user_id, view_count AS views, score,
answer_count AS answers, favorite_count AS favs
FROM `bigquery-public-data.stackoverflow.posts_questions`
```
Because the Stack Overflow dataset is continuously updated, re-running the query today is unlikely to reproduce exactly the same data. The original export produced by the query above (`SO_April_2022_Conrad.csv`) is available from the authors **upon reasonable request**. The pipeline can then be run with that single file.
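As a rough illustration of how an export with the `tags` and `creation_date` columns from the query above could be turned into weekly tag-level counts, consider the following Python sketch. It is not part of the R pipeline. It assumes that tags are pipe-separated (for example `python|pandas`) and that each timestamp begins with an ISO date. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_tag_counts(rows):
    """Count questions per (week, tag).

    `rows` is an iterable of dicts with `tags` and `creation_date` keys.
    """
    counts = Counter()
    for row in rows:
        # Assumption: the timestamp starts with an ISO date (YYYY-MM-DD).
        day = date.fromisoformat(row["creation_date"][:10])
        # Assumption: tags are pipe-separated, e.g. "python|pandas".
        for tag in filter(None, row["tags"].split("|")):
            counts[(week_start(day), tag)] += 1
    return counts


# Tiny made-up sample in the assumed export format (not real data).
sample = io.StringIO(
    "tags,creation_date\n"
    "python|pandas,2022-04-04 09:15:00 UTC\n"
    "python,2022-04-06 18:30:00 UTC\n"
    "r|ggplot2,2022-04-12 07:45:00 UTC\n"
)

counts = weekly_tag_counts(csv.DictReader(sample))
for (week, tag), n in sorted(counts.items()):
    print(week, tag, n)
```

Keying the counter on the Monday of each week keeps the buckets sortable as plain dates; a different week convention would only require changing `week_start`.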
## Dependencies
This project is built in R and uses the `{targets}` package for reproducible workflows. Please refer to the script headers for required packages.
## Citation
If you use this dataset or code, please cite our paper and acknowledge the Stack Overflow dataset.
[CITATION TO BE ADDED]
Additional details
Dates
- Accepted: 2025-09-12
Software
- Repository URL: https://github.com/conradborchers/stack-overflow-evolution