An Empirical Survey of GitHub Repositories at U.S. Research Universities
Description
In this work we aim to partially answer the question, "Just how many research software projects are out there?" by searching for open-source GitHub projects affiliated with research universities in the United States. We explore this through keyword searches on GitHub itself and by scraping university websites for links to GitHub repositories. We then filter these results by using a large language model to classify each GitHub repository as a research software engineering (RSE) project or not, finding over 35,000 RSE repositories. We report our results by university. We then analyze these repositories against popularity metrics, such as stars and forks, and find that just under 14,000 RSE repositories meet our minimum criteria for projects that have a community. Based on the time since a developer last pushed a change to an RSE repository with a community, we further posit that 3,300 RSE repositories with communities and a link to a research university are at risk of dying and thus may benefit from sustainability support. Finally, across all RSE projects linked to a research university, we empirically find that the top repository languages are Python, C++, and Jupyter Notebook.
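One step described above is scraping university web pages for links to GitHub repositories. The record does not describe the authors' scraper, so the following is only a minimal Python sketch of that kind of step, assuming a page's HTML has already been downloaded. The regular expression, the `NON_REPO_OWNERS` exclusion list, and the names `LinkCollector` and `extract_github_repos` are hypothetical illustrations, not the authors' code.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for https://github.com/<owner>/<repo> links.
# Matching stops at "/", "?" or "#", so deeper paths such as /tree/main are ignored.
GITHUB_REPO_RE = re.compile(
    r"^https?://(?:www\.)?github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)",
    re.IGNORECASE,
)

# Assumed, partial list of github.com paths that are site pages, not accounts.
NON_REPO_OWNERS = {
    "about", "features", "pricing", "orgs", "topics",
    "sponsors", "login", "settings", "marketplace", "explore",
}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_github_repos(html: str) -> set[tuple[str, str]]:
    """Return normalized (owner, repo) pairs linked from an HTML page."""
    parser = LinkCollector()
    parser.feed(html)
    repos = set()
    for href in parser.hrefs:
        match = GITHUB_REPO_RE.match(href.strip())
        if not match:
            continue  # not a github.com/<owner>/<repo> link
        owner, repo = match.group(1), match.group(2)
        if repo.endswith(".git"):
            repo = repo[:-4]  # clone URLs point at the same repository
        if owner.lower() in NON_REPO_OWNERS or not repo:
            continue
        # GitHub names are case-insensitive, so lowercase for deduplication.
        repos.add((owner.lower(), repo.lower()))
    return repos
```

Normalizing case and stripping `.git` lets a repository linked from several pages be counted once. A real pipeline would still need to confirm that each pair exists (for example, through the GitHub REST API) before any classification step.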
Files
rse_paper_2024.pdf (221.0 kB)
md5:bd2194327097f2c7c0e118641c7b5d8e