Published August 4, 2025 | Version v1
Conference paper Open

Bioschemas and Schemas.science at NFDI

  • 1. ZB MED Information Centre for Life Sciences
  • 2. CNRS
  • 3. The University of Manchester
  • 4. Forschungszentrum Jülich

Contributors

  • 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
  • 2. University of Amsterdam

Description

Bioschemas [1] is a community-driven initiative that aims to improve the findability of life sciences resources hosted on web pages. It builds upon Schema.org in two ways: (i) providing specific types tailored to the life sciences domain, such as BioChemEntity, Protein, and Gene; and (ii) providing usage recommendations for existing types, known as profiles. Bioschemas types therefore allow researchers to express biological information semantically, leveraging the infrastructure created by search engines. Furthermore, profiles build on the limited guidance that Schema.org provides on the use of its types by specifying valid property values, marginality (whether a property is minimum, recommended, or optional), and cardinality. In this way, search engines and other services can index the data with an understanding of its biological context. Adding Bioschemas-compatible structured markup to web pages makes these resources more accessible and reusable, in line with the FAIR (Findable, Accessible, Interoperable, Reusable) principles [2]. Bioschemas markup has been leveraged to combine distributed resources in a federated manner [3], and to create Knowledge Graphs for the Chemical and Plant communities in ELIXIR, including partners from consortia of the German National Research Data Infrastructure (NFDI), namely NFDI4Chem and DataPlant [4]. FAIRAgro has also expressed interest in working on a Schema.org extension compatible with Bioschemas. Bioschemas profiles corresponding to cross-domain research artifacts are also exposed on its sister website, Schemas.science, which currently hosts seven cross-domain profiles: Dataset, DataRepository, ComputationalTool, ComputationalWorkflow, Course, CourseInstance and TrainingMaterial. Schemas.science will facilitate collaborations with partners outside the 'bio' domains. The cross-domain and multi-disciplinary nature of research, as seen through emerging ESFRI Research Infrastructures (RIs), is becoming very prominent.
Schemas.science can play an important role as a central resource and a collaborative hosting space for such work, building out from the 'exemplar' community of Bioschemas. Such a collaborative space would provide a forum to discuss the metadata standards and definitions needed to bridge disciplinary boundaries. For instance, there is interest from the Artificial Intelligence (AI) community in (i) extending the Dataset profile to include elements from Croissant ML [5] as properties and (ii) creating a profile corresponding to the FAIR4ML vocabulary [6]. Croissant ML deals with the description of datasets used in AI approaches, i.e., AI-ready datasets, while FAIR4ML aims at describing Machine Learning (ML) models. In addition, training communities (e.g., DALIA [7], TeSS [8] and mTeSS-X) and those working on data and software management plans (e.g., DMP4NFDI [9] and maSMP [10,11]) would also benefit from some core agreements on the usage of Schema.org, facilitated through Bioschemas profiles. Bioschemas, together with its sister Schemas.science, aims to make it easier for researchers to implement FAIR for a wide variety of research artifacts, and to make that methodology more consistently implementable in a domain- and discipline-agnostic manner.
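
As an illustration of the profile idea described above, the following minimal Python sketch checks a piece of Schema.org Dataset markup against a simplified profile that assigns a marginality and a cardinality to each property. The profile contents, the function name check_markup, and the example dataset are assumptions made for illustration only; they are not the published Bioschemas Dataset profile or any official validator.

```python
import json

# Hypothetical, simplified profile: property -> (marginality, cardinality).
# Illustrative assumption only; NOT the published Bioschemas Dataset profile.
PROFILE = {
    "name": ("minimum", "one"),
    "description": ("minimum", "one"),
    "url": ("minimum", "one"),
    "keywords": ("recommended", "many"),
    "license": ("recommended", "many"),
}


def check_markup(markup, profile):
    """Return a list of issues found when comparing markup to the profile."""
    issues = []
    for prop, (marginality, cardinality) in profile.items():
        value = markup.get(prop)
        if value is None:
            # Only 'minimum' properties are treated as mandatory here.
            if marginality == "minimum":
                issues.append(f"missing minimum property: {prop}")
            continue
        # A property with cardinality 'one' must not carry several values.
        if cardinality == "one" and isinstance(value, list) and len(value) > 1:
            issues.append(f"property {prop} allows one value, found {len(value)}")
    return issues


# Example JSON-LD markup as it might be embedded in a web page (made-up values).
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example expression dataset",
    "description": "A made-up dataset used only to illustrate the check.",
    "keywords": ["transcriptomics", "example"],
}

if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
    print(check_markup(dataset, PROFILE))
```

In this sketch the example markup omits url, so the check reports one missing minimum property. A real profile would also constrain the expected types of property values, which is left out here for brevity.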

Files

CoRDI_2025_paper_93.pdf (136.2 kB)
md5:d5e85e03e5904a3eded3716d21ee7876