Geographic Diversity in Public Code Contributions - Replication Package

This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Geographic Diversity in Public Code Contributions - An Exploratory Large-Scale Study Over 50 Years. In 19th International Conference on Mining Software Repositories (MSR ’22), May 23-24, Pittsburgh, PA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3524842.3528471

This document comes with the software needed to mine and analyze the data presented in the paper.

Prerequisites

These instructions assume the use of the bash shell, the Python programming language, the PostgreSQL DBMS (version 11 or later), the zstd compression utility, and common *nix shell utilities (cat, pv, …), all of which are available for multiple architectures and operating systems.
It is advisable to create a Python virtual environment and install the following PyPI packages at the pinned versions:

click==8.0.4
cycler==0.11.0
fonttools==4.31.2
kiwisolver==1.4.0
matplotlib==3.5.1
numpy==1.22.3
packaging==21.3
pandas==1.4.1
patsy==0.5.2
Pillow==9.0.1
pyparsing==3.0.7
python-dateutil==2.8.2
pytz==2022.1
scipy==1.8.0
six==1.16.0
statsmodels==0.13.2
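
Once the virtual environment exists (for instance one created with python3 -m venv) and the packages have been installed with pip, it can help to confirm that the installed versions match the pins above. The following is a minimal sketch of such a check, not part of the replication package itself: the helper names parse_pins and check_pins are hypothetical, and the script relies only on the standard importlib.metadata module (Python 3.8 or later).

```python
from importlib import metadata

# Pinned versions copied from the list above.
PINNED = """\
click==8.0.4
cycler==0.11.0
fonttools==4.31.2
kiwisolver==1.4.0
matplotlib==3.5.1
numpy==1.22.3
packaging==21.3
pandas==1.4.1
patsy==0.5.2
Pillow==9.0.1
pyparsing==3.0.7
python-dateutil==2.8.2
pytz==2022.1
scipy==1.8.0
six==1.16.0
statsmodels==0.13.2
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict (hypothetical helper)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


def check_pins(pins):
    """Return (name, wanted, found) tuples for every mismatch.

    found is None when the package is not installed in the current environment.
    """
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    for name, wanted, found in check_pins(parse_pins(PINNED)):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run inside the activated virtual environment, the script prints one line per package whose installed version differs from the pin, and prints nothing when everything matches.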

Initial data

Data preparation

Zone detection by email

Database creation and initial data ingestion

Zone detection by name

Extraction and graphs