Published 2026 | Version v2
Conference paper Open

"It Makes the Code Clearer": Why Developers Adopt Modern Python Features in Open Source Projects

Authors/Creators

Description

This dataset accompanies the study “It Makes the Code Clearer: Why Developers Adopt Modern Python Features in Open Source Projects.” It contains the tools, scripts, and data used in the quantitative and qualitative analyses presented in the paper.

  • BooksCodes.xlsx — Qualitative results spreadsheet: contains the 77 coded excerpts (quote substrings) and their assigned themes/categories from the manual thematic analysis.

  • github_api_scrapper-master.zip — Python-based infrastructure used to collect pull request comments from GitHub’s REST API and assemble the raw qualitative corpus.

  • PullRequestCommentsDataset.xlsx — The final set of 494 manually verified pull request comments discussing Python source code rejuvenation, including repository identifiers and coding metadata.

  • PyMiner-develop.zip — The PyMiner tool used to mine the source code history of 424 GitHub repositories, parse ASTs to detect modern Python features, and emit project-level CSV outputs.

  • pyminer-postgres-backup.sql — PostgreSQL dump of the full raw PR-comments corpus (~395,702 comments) and related tables (schema, data, constraints). Suitable for restoration via psql (e.g., psql -U <user> -d <db> -f pyminer-postgres-backup.sql).
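The AST-based detection mentioned for PyMiner can be illustrated with a small sketch. This is not PyMiner's implementation: the `FEATURE_NODES` mapping, the feature labels, and the `count_modern_features` helper are assumptions made for illustration, and the set of features the tool actually tracks may differ. The sketch uses only Python's standard `ast` module.

```python
import ast
from collections import Counter

# Hypothetical mapping from AST node types to feature labels.
# The features PyMiner actually tracks may differ.
FEATURE_NODES = {
    ast.JoinedStr: "f-string",
    ast.NamedExpr: "walrus operator",
    ast.AnnAssign: "variable annotation",
    ast.AsyncFunctionDef: "async function",
    ast.Await: "await expression",
}
# Structural pattern matching is only available on Python 3.10+.
if hasattr(ast, "Match"):
    FEATURE_NODES[ast.Match] = "match statement"


def count_modern_features(source: str) -> Counter:
    """Count occurrences of selected modern-syntax nodes in one source file."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        label = FEATURE_NODES.get(type(node))
        if label is not None:
            counts[label] += 1
    return counts


sample = '''
name: str = "world"
if (n := len(name)) > 3:
    print(f"hello {name}")
'''

feature_counts = count_modern_features(sample)
for feature, total in sorted(feature_counts.items()):
    print(feature, total)
```

Counting raw node types is a simplification: an f-string format specification is itself a `JoinedStr` node, so a production miner would need extra rules to avoid double counting, and it would also have to handle files that fail to parse under the running interpreter's grammar.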

All materials are provided to support replication and further research on source code rejuvenation and language feature adoption in Python.

Files


Files (432.2 MB)

Checksum Size
md5:26b4533c2e3716b46e2b79cd1421c1ea 27.9 kB
md5:416c85353a9558741137aa7e1340cc4a 5.4 MB
md5:b8d065eb63014a093461fbf072443925 80.0 kB
md5:c81848d32b208f9bb12de45211465fbe 31.5 MB
md5:cf22a3280177c8013c944ec33035d5a4 395.2 MB
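The md5 values listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_checksum` are assumptions introduced here and are not part of the published tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("downloaded_file", "md5:26b4533c2e3716b46e2b79cd1421c1ea")` would return True only if the local copy matches the 27.9 kB entry; `downloaded_file` is a placeholder path. Chunked reading matters mainly for the largest file, which should not need to be loaded into memory at once. `str.removeprefix` requires Python 3.9 or later.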

Additional details

Software

Programming language
Python