Dataset Open Access

A Dataset of Pull Requests and A Trained Random Forest Model for predicting Pull Request Acceptance

Tapajit Dey; Audris Mockus

A curated dataset of 470,925 pull requests for 3,349 popular NPM packages, together with a description of the variables, a code snippet for creating a Random Forest model that predicts pull request acceptance, and a pre-trained Random Forest model (in R). The dataset accompanies the ESEM 2020 paper "Effect of Technical and Social Factors on Pull Request Quality for the NPM Ecosystem" (https://arxiv.org/abs/2007.04816).

Citation:

@inproceedings{dey2020effect,
  title={Effect of technical and social factors on pull request quality for the npm ecosystem},
  author={Dey, Tapajit and Mockus, Audris},
  booktitle={Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)},
  pages={1--11},
  year={2020}
}

Files (294.3 MB)

Name                             Size        MD5
Curated_Pull_Request_Data.csv    35.8 MB     9ba00c622679e3ae6d2ff12bde44e3e7
description.pdf                  40.4 kB     4c5c559bb644f3e2c991d71dc19932b5
PRMODEL.Rdata                    258.4 MB    ba1eb93c488e1090ab051901aeb370f2
snippet.R                        841 Bytes   5cf376f3114ce94edf9e16fddaa50185
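
The MD5 checksums listed above can be used to confirm that the downloaded files are intact. The following is a minimal sketch in Python, assuming the four files have been saved under their original names in one directory. The helper names `md5_of` and `verify` are illustrative and are not part of the dataset or of `snippet.R`.

```python
import hashlib
from pathlib import Path

# MD5 checksums exactly as listed in the file table on this page.
EXPECTED_MD5 = {
    "Curated_Pull_Request_Data.csv": "9ba00c622679e3ae6d2ff12bde44e3e7",
    "description.pdf": "4c5c559bb644f3e2c991d71dc19932b5",
    "PRMODEL.Rdata": "ba1eb93c488e1090ab051901aeb370f2",
    "snippet.R": "5cf376f3114ce94edf9e16fddaa50185",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the 258 MB model file is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory=".", expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    for name, status in verify().items():
        print(f"{name}: {status}")
```

A `mismatch` usually points to an interrupted or corrupted download; fetching the file again is the simplest remedy.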

                  All versions   This version
Views             120            113
Downloads         129            128
Data volume       6.3 GB         6.3 GB
Unique views      101            98
Unique downloads  74             74
