Published March 12, 2024 | Version v3
Dataset Open

OpenAIRE Graph Beginner's Kit Dataset

  • 1. ISTI - CNR
  • 2. OpenAIRE AMKE
  • 3. ISTI - CNR & OpenAIRE AMKE
  • 4. Athena Research and Innovation Centre
  • 5. University of Warsaw
  • 6. Athena Research and Innovation Centre & OpenAIRE AMKE
  • 7. Athena Research and Innovation Centre & OpenAIRE AMKE
  • 8. University of Bielefeld
  • 9. CERN

Description

The OpenAIRE Graph is an Open Access dataset containing metadata about research products (literature, datasets, software, etc.) linked to other entities of the research ecosystem, such as organisations, project grants, and data sources.

The large size of the OpenAIRE Graph is a major obstacle for beginners who want to familiarise themselves with the underlying data model and explore its contents. Working with the full Graph typically requires access to a large distributed computing infrastructure, which is not easily accessible to everyone.

The OpenAIRE Beginner’s Kit aims to address this issue. It consists of two components:

  • A subset of the OpenAIRE Graph consisting of the research products published between 2023-06-30 and 2024-02-29, all the entities connected to them, and the respective relationships. The subset is made up of the following parts:

    • publication.tar: metadata records about research literature (includes types of publications listed here)

    • dataset.tar: metadata records about research data (includes the subtypes listed here)

    • software.tar: metadata records about research software (includes the subtypes listed here)

    • otherresearchproduct.tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here)

    • organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, and funders.

    • datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, and funders' databases.

    • project.tar: metadata records about project grants.

    • relation.tar: metadata records about relations between entities in the graph.

    • communities_infrastructures.tar: metadata records about research communities and research infrastructures.

      Each of these files is a tar archive containing gzip-compressed (.gz) files, each with one JSON record per line. Each JSON record complies with the schema available at https://zenodo.org/records/10519039.
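The archives can be read without unpacking them to disk. The sketch below uses only Python's standard library to walk every `.gz` member of one `.tar` file and yield each line as a parsed JSON object. The function name `iter_records` is illustrative, and the sketch assumes that the JSON lines are UTF-8 encoded; it is not part of any official OpenAIRE tooling.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line of every .gz member in a tar archive."""
    # Mode "r" opens the tar with transparent compression detection;
    # members are visited in archive order, one at a time.
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and any member that is not a gzip-compressed file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress on the fly; ASSUMPTION: lines are UTF-8 encoded JSON.
            with raw, gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)
```

For example, `next(iter_records("publication.tar"))` returns the first publication record, whose fields can then be compared with the schema. Because the function is a generator, memory use stays bounded by one record at a time, which matters for the multi-gigabyte archives.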

Notes

This is a subset of the OpenAIRE Graph that you can use to get familiar with the data model and to test your code on a smaller dataset. For your data analysis tasks, you can use the full dump available at https://zenodo.org/records/10488385. For more information on the OpenAIRE Graph and its data model, see https://graph.openaire.eu/docs/data-model/.
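As a first test on the subset, one might check how the research products in an archive are distributed relative to the stated publication window (2023-06-30 to 2024-02-29). The sketch below is hypothetical: it assumes each research-product record stores an ISO 8601 date string in a top-level field named `publicationDate`, which should be confirmed against the schema at https://zenodo.org/records/10519039 before relying on it. The function name `check_window` is illustrative.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def check_window(records: Iterable[dict], start: date, end: date,
                 field: str = "publicationDate") -> Counter:
    """Tally records as 'inside', 'outside' or 'unparsed' relative to [start, end].

    ASSUMPTION: the date is an ISO 8601 string ("YYYY-MM-DD", possibly
    followed by a time part) stored under `field`; verify against the schema.
    """
    tally: Counter = Counter()
    for record in records:
        value = record.get(field)
        try:
            # Keep only the "YYYY-MM-DD" prefix before parsing.
            day = date.fromisoformat(value[:10])
        except (TypeError, ValueError):
            # Missing field (None), non-string value, or malformed date.
            tally["unparsed"] += 1
            continue
        # Both bounds are inclusive.
        tally["inside" if start <= day <= end else "outside"] += 1
    return tally
```

The function accepts any iterable of parsed records, for example the lines of the .gz files inside publication.tar decoded with `json.loads`, so the same check can later be run unchanged against the full dump. How many records land in "outside" or "unparsed" is worth inspecting rather than assuming to be zero.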

Files (4.8 GB)

  • md5:c5fef68aa9a5872d0197e8462db6806b (61.4 kB)
  • md5:240a41822d79b36abd8fdfca4b0a445c (717.3 MB)
  • md5:78672dbba3fbd6688003c22b959af158 (5.1 MB)
  • md5:792844158a84d0e3c2b49dff0135b104 (5.9 MB)
  • md5:cb9c866a859d1afa359c168f72d102a3 (87.4 MB)
  • md5:c7a67b570e351d1639fa83dd6c5c71d5 (27.6 MB)
  • md5:f82846945f2a636ec904150a894d6680 (3.2 GB)
  • md5:cbbc59bb47f64353694da78c55745d0c (730.1 MB)
  • md5:48bd52c42718147e05abf8d6dc90c5fe (12.1 MB)
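Each file in this record is listed with an MD5 checksum. After downloading, an archive's integrity can be checked with Python's standard `hashlib` module; the minimal sketch below reads the file in chunks so that even the 3.2 GB archive does not need to fit in memory. The names `md5_of` and `verify` are illustrative, not part of any OpenAIRE or Zenodo tooling.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file with a checksum written as 'md5:<hex>' or as plain '<hex>'."""
    expected_hex = expected.removeprefix("md5:").strip().lower()
    return md5_of(path) == expected_hex
```

For example, `verify("relation.tar", "md5:...")`, with the checksum shown for that file on the record page, returns `True` only if the download is intact. `str.removeprefix` requires Python 3.9 or later.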

Additional details

Related works

Is documented by
10.5281/zenodo.10519039 (DOI)

Funding

OpenAIRE Nexus – OpenAIRE-Nexus Scholarly Communication Services for EOSC users 101017452
European Commission