Published November 8, 2025 | Version v2025-11
Dataset Open

OpenDORS: Open Dataset of Openly Referenced Open Research Software

Authors/Creators

  • 1. Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)

Description

In many academic disciplines, software is created during the research process or specifically for the purpose of enabling and improving critical research. The crucial role of software for research is increasingly acknowledged. The application of software engineering to research software has been formalized as research software engineering, to create better software that enables better research. Despite this, large-scale studies of research software and its development are still lacking.

OpenDORS, the Open Dataset of Openly Referenced Open Research Software, enables such studies.

OpenDORS contains

  • 134,352 unique open research software projects with referencing publications
  • 134,154 source code repositories
  • 122,425 latest versions in source code repositories with metadata on, e.g.,
    • license information
    • programming languages
    • descriptive metadata files

Usage

  • Decompress the dataset archive, e.g., tar -xJvf OpenDORS.v2025-11.tar.xz
  • Refer to schema.json for the structure of the dataset.
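
The two usage steps above can also be carried out from Python. The following is a minimal sketch, not part of the dataset's own tooling: the helper names extract_archive and load_schema are hypothetical, and it assumes that schema.json is a plain JSON document stored at a path you supply. The internal layout of the archive is not described here, so the sketch returns the member names for inspection rather than assuming any.

```python
import json
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.xz archive into dest and return its member names.

    Hypothetical helper; equivalent in spirit to `tar -xJvf <archive>`.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        # Prefer the safe "data" extraction filter where the interpreter has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def load_schema(schema_path: Path) -> dict:
    """Load schema.json, assuming it is a regular JSON document."""
    with schema_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, extract_archive(Path("OpenDORS.v2025-11.tar.xz"), Path("opendors")) unpacks the dataset into a directory named opendors (an arbitrary choice), and load_schema(Path("schema.json")) reads the schema once that file has been downloaded next to the archive.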

Reproducibility

  • To reproduce the dataset, extract OpenDORS-construction-workflow.tar.xz and refer to the included README.md.

Files

Files (16.1 MB)

Checksum                                 Size
md5:1437349999e220d41042b8e04f33b05a     6.3 kB
md5:e5f0875f880f8aabfd65f5b6910d70b5     16.0 MB
md5:244f0b9339747a33375c95d9e2a3863d     24.3 kB
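
The MD5 checksums listed above can be used to check that a download is intact. The sketch below is one way to do this with the Python standard library. The function names md5_of and matches_listing are hypothetical, and because the listing here does not show which checksum belongs to which file name, the caller has to pair each downloaded file with its checksum.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: Path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.strip().removeprefix("md5:").lower()
    return md5_of(path) == expected
```

Calling matches_listing(path, "md5:...") with a downloaded file and the corresponding entry from the listing returns True when the digests agree. MD5 is adequate for detecting transfer corruption, but it is not a defence against deliberate tampering.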

Additional details

Software

Repository URL
https://codebase.helmholtz.cloud/dlr-sc/opendors-workflow
Programming language
Python, Snakemake, Shell
Development Status
WIP (work in progress)