Replication Package for ESEC/FSE 2023 Paper "How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?"
- 1. Peking University
- 2. Beijing Institute of Technology
Description
This replication package can be used for replicating results in the paper. It contains 1) a dataset of 290,255 repositories; and 2) Python scripts for training and interpreting models. The GitHub repository of the paper is available at https://github.com/mcxwx123/Sustainable_projects.
We recommend manually setting up the required environment on a commodity Linux machine with at least 1 CPU core, 8 GB of memory, and 100 GB of free storage space. We developed and ran all our experiments on an Ubuntu 20.04 server with two Intel Xeon Gold CPUs, 320 GB of memory, and 36 TB of RAID 5 storage.
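As a quick pre-flight check against the minimum requirements above, something like the following sketch could be used. It is not part of the replication package: the function names are hypothetical, the memory probe relies on `os.sysconf` and so assumes a Linux-like system, and treating "GB" as GiB is an assumption that may be slightly strict on machines that report a little under their nominal memory.

```python
import os
import shutil

GIB = 1024 ** 3  # assumption: interpret the stated "GB" figures as GiB


def unmet_requirements(cpu_cores, memory_bytes, free_disk_bytes,
                       min_cores=1, min_memory_gib=8, min_disk_gib=100):
    """Return a list of human-readable problems; empty if all minimums are met."""
    problems = []
    if cpu_cores < min_cores:
        problems.append(f"need >= {min_cores} CPU core(s), found {cpu_cores}")
    if memory_bytes < min_memory_gib * GIB:
        problems.append(f"need >= {min_memory_gib} GiB memory")
    if free_disk_bytes < min_disk_gib * GIB:
        problems.append(f"need >= {min_disk_gib} GiB free disk space")
    return problems


def probe_machine(path="."):
    """Read core count, physical memory, and free disk space for `path`."""
    cores = os.cpu_count() or 0
    memory = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free = shutil.disk_usage(path).free
    return cores, memory, free


if __name__ == "__main__":
    try:
        issues = unmet_requirements(*probe_machine())
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are unavailable on this platform.
        issues = ["could not probe this machine automatically"]
    print("OK" if not issues else "\n".join(issues))
```

Separating the pure comparison from the machine probe keeps the thresholds easy to adjust if a different interpretation of the stated sizes is preferred.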
We use GHTorrent to restore the historical states of 290,255 repositories with more than 57 commits, 4 PRs, 1 issue, 1 fork, and 2 stars. The raw data of the repositories (collected in their first 1, 3, and 5 month(s)) are stored in `Replication Package/data/prodata_1.pkl`, `Replication Package/data/prodata_3.pkl`, and `Replication Package/data/prodata_5.pkl`, respectively. The feature contributions produced by the LIME model are stored in `Replication Package/data/limeres_m3_t2_k1.pkl`.
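To get a first look at these `.pkl` files without running the full pipeline, a minimal sketch along the following lines may help. The internal structure of the pickled objects is not described here, so the code only reports each object's type and length; the helper names are hypothetical. If the files contain pandas or NumPy objects (an assumption), those libraries must be installed for unpickling to succeed. As with any pickle, only load files from a source you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load and return the single object stored in a pickle file."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Return (type name, length or None) as a coarse summary of an object."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # Paths follow the layout described in the text above.
    data_dir = Path("Replication Package/data")
    for months in (1, 3, 5):
        path = data_dir / f"prodata_{months}.pkl"
        if path.exists():
            print(path.name, describe(load_pickle(path)))
        else:
            print(path.name, "not found")
```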
`Replication Package/data/X_test_m3_t2_k1.pkl` and `Replication Package/data/y_test_m3_t2_k1.pkl` store the test dataset for the LIME model. You can run `Replication Package/fitdata.py` to get the results in Tables 3 and 4, `Replication Package/draw_compare_variable.py` to get Figure 2, and `Replication Package/allvari_statistics.py` to get Table 5. `Replication Package/Variable_comparison_with_different_parameter.pdf` shows the LIME results under different parameters. `Replication Package/sample_pros.csv` provides the list of randomly selected repositories described in Section 3.1.
The explanations for collecting the variables, the examples of variable effects on project sustainability, and the hyperparameter setting of the machine learning models are provided in the README.md file.
Files
Name | Size | Checksum |
---|---|---|
Replication Package.zip | 2.0 GB | md5:9b263281dc1daf84e9963e5c0186eb6a |
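After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the function name and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
import os

# Checksum as listed for Replication Package.zip on this record.
EXPECTED_MD5 = "9b263281dc1daf84e9963e5c0186eb6a"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = "Replication Package.zip"  # assumed download location
    if os.path.exists(archive):
        actual = md5_of_file(archive)
        print("checksum OK" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use flat, which matters for a 2.0 GB archive on a machine with the minimum recommended memory.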