OpenDORS: Open Dataset of Openly Referenced Open Research Software
Description
In many academic disciplines, software is created during the research process, or specifically to enable and improve critical research. The crucial role of software for research is increasingly acknowledged. The application of software engineering to research software has been formalized as research software engineering, with the aim of creating better software that enables better research. Despite this, large-scale studies of research software and its development are still lacking.
OpenDORS, the Open Dataset of Openly Referenced Open Research Software, enables such studies.
OpenDORS contains:
- 134,352 unique open research software projects with referencing publications
- 134,154 source code repositories
- 122,425 latest versions in source code repositories, with metadata on, e.g.,
  - license information,
  - programming languages, and
  - descriptive metadata files.
Usage
- Decompress the dataset archive, e.g.,
  tar -xJvf OpenDORS.v2025-11.tar.xz
- Refer to schema.json for the structure of the dataset.
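For scripted use, the decompression step can also be done from Python with the standard-library `tarfile` module. This is a minimal sketch, not part of the published workflow: the helper names `extract_archive` and `load_schema` and the target directory `opendors` are hypothetical, and it assumes `schema.json` has been downloaded alongside the archive and holds a single JSON document.

```python
from __future__ import annotations

import json
import tarfile
from pathlib import Path


def extract_archive(archive: str | Path, dest: str | Path = ".") -> list[str]:
    """Extract an xz-compressed tar archive into `dest`; return the member names."""
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse absolute paths,
            # links pointing outside `dest`, and similar unsafe members.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


def load_schema(path: str | Path) -> dict:
    """Parse a JSON file such as schema.json (assumed to hold one JSON object)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Example usage (archive and schema names as given on the record; the
# "opendors" target directory is a hypothetical choice):
#   names = extract_archive("OpenDORS.v2025-11.tar.xz", "opendors")
#   schema = load_schema("schema.json")
#   print(sorted(schema))
```

Using the `data` extraction filter where available keeps a malformed archive from writing outside the chosen directory; on older Python versions the sketch falls back to plain extraction, equivalent to the `tar -xJvf` command.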
Reproducibility
- To reproduce the dataset, extract OpenDORS-construction-workflow.tar.xz and refer to the included README.md.
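The included README.md can also be read without unpacking the whole workflow archive. The sketch below is illustrative only: the helper name `read_member_text` is hypothetical, and because the record does not say where README.md sits inside the archive, it matches on the file name alone.

```python
import tarfile
from pathlib import PurePosixPath


def read_member_text(archive, filename="README.md", encoding="utf-8") -> str:
    """Return the text of the first regular file called `filename` inside an
    xz-compressed tar archive, without extracting anything to disk."""
    with tarfile.open(archive, mode="r:xz") as tar:
        for member in tar.getmembers():
            # Tar member names use "/" separators; compare only the final
            # component, since the README may sit inside a top-level folder.
            if member.isfile() and PurePosixPath(member.name).name == filename:
                fh = tar.extractfile(member)
                if fh is not None:
                    return fh.read().decode(encoding)
    raise FileNotFoundError(f"{filename!r} not found in {archive}")


# Example usage:
#   print(read_member_text("OpenDORS-construction-workflow.tar.xz"))
```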
Files
(16.1 MB in total)
| Size | Checksum |
|---|---|
| 6.3 kB | md5:1437349999e220d41042b8e04f33b05a |
| 16.0 MB | md5:e5f0875f880f8aabfd65f5b6910d70b5 |
| 24.3 kB | md5:244f0b9339747a33375c95d9e2a3863d |
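A downloaded file can be checked against its listed MD5 checksum before use. This is a minimal sketch with hypothetical helper names (`md5_of`, `verify_md5`); the checksum is passed in as written in the table, with or without the `md5:` prefix, and the pairing of checksum to file has to be taken from the record page.

```python
import hashlib


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large archives need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, listed: str) -> bool:
    """Compare a file with a checksum written as on the record, e.g. 'md5:<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

MD5 only detects accidental corruption during download; it is not a protection against deliberate tampering.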
Additional details
Software
- Repository URL: https://codebase.helmholtz.cloud/dlr-sc/opendors-workflow
- Programming languages: Python, Snakemake, Shell
- Development status: WIP