Published March 2, 2026 | Version v1.0.0
Presentation | Open access

A metadata extraction tool for GitLab repositories

  • 1. ZB MED - Information Centre for Life Sciences

Description

Research software is fundamental to scientific reproducibility; however, its reuse is still difficult due to, for instance, poor documentation. Although READMEs are becoming more popular, they are not yet consistently adopted across disciplines and often lack the depth other researchers need to build upon existing work. Initiatives such as FAIR for Research Software (FAIR4RS), Software Management Plans (SMPs), and efforts around structured metadata for research software, e.g., CodeMeta and the NFDI Research Software Metadata Working Group, help raise awareness of the need to share and document software together with its metadata. However, the adoption of machine-readable, semantically structured metadata, which is critical for achieving FAIRness, remains limited. Factors that keep researchers from sharing metadata in a structured way include a lack of knowledge of how to do it, a lack of time to re-enter information that already exists in their repositories in a different format, and a lack of awareness of good practices.

Based on an extension to schema.org and CodeMeta that includes metadata relevant to SMPs, namely the machine-actionable SMP (maSMP) metadata schema, we have created a metadata extraction tool that retrieves information from GitHub repositories so that researchers can easily obtain structured metadata for their software. In this talk, we will introduce an extension to this tool that supports metadata extraction from GitLab repositories. The tool relies on an extraction, transformation, and loading (ETL) pipeline that offers both an API and an end-user web application. Thanks to its modularity, it could be integrated into projects such as Connected Open-Source Software (ConnOSS) and SMP platforms such as the Research Data Management Organiser (RDMO). Our GitHub/GitLab metadata extraction tool aims to enhance research software quality, reproducibility, and documentation. By automating metadata extraction and aligning it with the maSMP metadata schema and the FAIR4RS principles, we aim to contribute to the adoption of better software management practices across all scientific fields.
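To make the ETL idea more concrete, the following is a minimal Python sketch of one extract-transform-load pass for a single GitLab project. It is not the authors' implementation: it assumes the public gitlab.com REST API v4 (`GET /projects/:id`, where `:id` may be the URL-encoded namespaced path), maps only a handful of fields to CodeMeta/schema.org terms, and leaves out all maSMP-specific properties. The function names `project_api_url`, `extract`, `transform`, and `load` are hypothetical.

```python
"""Hypothetical ETL sketch: GitLab project metadata -> CodeMeta JSON-LD (not the tool's actual code)."""
import json
import urllib.parse
import urllib.request
from typing import Any

# Assumption: the public gitlab.com instance; self-hosted instances use their own base URL.
GITLAB_API = "https://gitlab.com/api/v4"
CODEMETA_CONTEXT = "https://w3id.org/codemeta/3.0"


def project_api_url(path_with_namespace: str, base: str = GITLAB_API) -> str:
    """Build the project endpoint URL; the namespaced path must be fully URL-encoded."""
    encoded = urllib.parse.quote(path_with_namespace, safe="")
    # license=true asks GitLab to include the detected license in the response.
    return f"{base}/projects/{encoded}?license=true"


def extract(path_with_namespace: str) -> dict[str, Any]:
    """Extract: fetch the raw project record (needs network access; public projects only)."""
    with urllib.request.urlopen(project_api_url(path_with_namespace)) as resp:
        return json.load(resp)


def transform(project: dict[str, Any]) -> dict[str, Any]:
    """Transform: map a small subset of GitLab fields onto CodeMeta/schema.org terms."""
    record: dict[str, Any] = {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": project.get("name"),
        "description": project.get("description"),
        "codeRepository": project.get("web_url"),
        "keywords": project.get("topics") or [],
        "dateCreated": project.get("created_at"),
        "dateModified": project.get("last_activity_at"),
    }
    license_info = project.get("license") or {}
    license_url = license_info.get("source_url") or license_info.get("html_url")
    if license_url:
        record["license"] = license_url
    # Drop empty values so the output only states what GitLab actually reported.
    return {k: v for k, v in record.items() if v not in (None, "", [])}


def load(record: dict[str, Any], path: str = "codemeta.json") -> None:
    """Load: write the record to a codemeta.json file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
```

For a public project, `load(transform(extract("group/project")))` would write a `codemeta.json` file, where `"group/project"` is a placeholder path. Private projects would additionally need an access token, which this sketch does not handle. Keeping the three stages as separate functions mirrors the modularity described above: a GitHub extractor could feed the same `transform` logic after renaming its fields.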

This work is part of the deRSE26 - Conference for Research Software Engineering in Germany, see https://events.hifis.net/event/2945/contributions/21166/

This work has been partially supported by the German Research Foundation (DFG) through the project ConnOSS with project number 561044496 and DMP4NFDI through Base4NFDI with project number 521453681.

Files

deRSE26 - Gitlab_Metadata_extraction.pdf

Files (6.5 MB)

2.5 MB (md5:4e4e7f7dca955093628b180d15094282)
1.3 MB (md5:0f815718b918493e23bad191d9bc9874)
2.7 MB (md5:f1cc1541c7e5972f22f62cda736ebbaa)

Additional details

Funding

Deutsche Forschungsgemeinschaft: Connected Open-Source Software (ConnOSS), 561044496
Deutsche Forschungsgemeinschaft: Base4NFDI - Basisdienste für die NFDI, 521453681