Published February 19, 2025 | Version v4
Dataset Open

OpenAIRE Graph Beginner's Kit Dataset

  • 1. ISTI - CNR
  • 2. OpenAIRE AMKE
  • 3. ISTI - CNR & OpenAIRE AMKE
  • 4. Athena Research and Innovation Centre
  • 5. University of Warsaw
  • 6. Athena Research and Innovation Centre & OpenAIRE AMKE
  • 7. Athena Research and Innovation Centre & OpenAIRE AMKE
  • 8. CERN

Description

The OpenAIRE Graph is an Open Access dataset containing metadata about research products (literature, datasets, software, etc.) linked to other entities of the research ecosystem, such as organisations, project grants, and data sources.

The large size of the OpenAIRE Graph is a major obstacle for beginners who want to familiarise themselves with the underlying data model and explore its contents. Working with the full Graph typically requires access to a large distributed computing infrastructure, which is not easily accessible to everyone.

The OpenAIRE Beginner’s Kit aims to address this issue. It consists of two components:

  • A subset of the OpenAIRE Graph composed of the research products published between 2024-06-01 and 2024-12-31, together with all the entities connected to them and the corresponding relationships. The subset consists of the following parts:

    • publication.tar: metadata records about research literature (includes types of publications listed here)

    • dataset.tar: metadata records about research data (includes the subtypes listed here)

    • software.tar: metadata records about research software (includes the subtypes listed here)

    • otherresearchproduct.tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here)

    • organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, and funders.

    • datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, and funders' databases.

    • project.tar: metadata records about project grants.

    • relation.tar: metadata records about relations between entities in the graph.

    • communities_infrastructures.tar: metadata records about research communities and research infrastructures

    Each file is a tar archive containing gzip-compressed (.gz) files, each holding one JSON record per line. Each JSON record complies with the schema available at https://doi.org/10.5281/zenodo.14608526.
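The archive layout described above can be read without unpacking anything to disk. The following is a minimal Python sketch, assuming that the part files inside each tar archive have names ending in .gz and contain UTF-8 JSON lines; the function name iter_records is hypothetical and not part of any OpenAIRE tooling.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per line from every .gz member of a tar archive.

    Hypothetical helper: assumes part files end in ".gz" and hold UTF-8 JSON lines.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip-compressed part file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress on the fly and read the JSON-lines content line by line.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines, e.g. a trailing newline
                        yield json.loads(line)
```

For example, iterating over iter_records("publication.tar") yields one dict per research product. Because records are streamed rather than loaded all at once, memory use stays bounded even for the largest archives in the subset.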
       


 

Notes

This is a subset of the OpenAIRE Graph that you can use to get familiar with the data model and to test your code on a smaller dataset. For actual data analysis tasks, you can use the full dump available at https://zenodo.org/records/10488385. For more information on the OpenAIRE Graph and its data model, see https://graph.openaire.eu/docs/data-model/.
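To get a first feel for how entities are linked, a short Python sketch can tally relation records by their semantic name and collect the targets reachable from each source identifier. It assumes that each relation record is a dict with "source" and "target" identifiers and a nested "relType" object carrying a "name" field; these field names are an assumption based on the published dump schema and should be checked against the schema record before relying on them. The function summarise_relations is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def summarise_relations(relations: Iterable[dict]) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count relations by semantic name and collect outgoing targets per source id.

    Assumed field names: "source", "target", and "relType" -> "name".
    Verify them against the dump schema before use.
    """
    by_name: Counter = Counter()
    neighbours: Dict[str, List[str]] = defaultdict(list)
    for rel in relations:
        # Fall back to "unknown" if the (assumed) relType/name fields are missing.
        name = rel.get("relType", {}).get("name", "unknown")
        by_name[name] += 1
        # Record a directed edge from the source entity to the target entity.
        neighbours[rel["source"]].append(rel["target"])
    return by_name, dict(neighbours)
```

Since it accepts any iterable of parsed records, it can be tried on a handful of lines from relation.tar before scaling the same logic up to the full dump.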

Files (4.2 GB)

Checksum                              Size
md5:db359742b3b6cfebee46a933e21418bf  71.7 kB
md5:d1c419d6cbbe63bcfba7769fbecd3eb1  247.2 MB
md5:44df54dda28c94991b236ce11b190815  5.8 MB
md5:9443f8d31af53d3a40a7fdf1a8b8d904  5.3 MB
md5:bf1cb8ef59b4a8b03257cf8a9b09f62f  786.2 MB
md5:b03ba5ec3d51aaa79157dc4ab5434d1d  26.4 MB
md5:628732874cc7f24f0a5cfd742de4da5b  2.6 GB
md5:5289c36c382aacddeeb59a33e1cc102d  507.2 MB
md5:98eb294aee7b1a592ec04f01351b6a4a  12.0 MB
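Because some of the archives are several hundred megabytes or more, it is worth checking each download against the MD5 value listed for it before unpacking. The following is a minimal sketch using Python's standard hashlib module, reading the file in chunks so that memory use stays flat; the function names md5_of_file and verify are illustrative only.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a local file with a listed checksum, accepting an optional "md5:" prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For instance, verify("publication.tar", "<checksum from the listing>") returns True only when the local copy matches the listed value; a mismatch usually means an incomplete or corrupted download.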

Additional details

Related works

Is documented by
10.5281/zenodo.14608526 (DOI)

Funding

European Commission
OpenAIRE Nexus - OpenAIRE-Nexus Scholarly Communication Services for EOSC users 101017452