Published March 12, 2024 | Version v3
Dataset Open

OpenAIRE Graph Beginner's Kit Dataset

  • 1. ISTI - CNR
  • 2. OpenAIRE AMKE
  • 3. ISTI - CNR & OpenAIRE AMKE
  • 4. Athena Research and Innovation Centre
  • 5. University of Warsaw
  • 6. Athena Research and Innovation Centre & OpenAIRE AMKE
  • 7. Athena Research and Innovation Centre & OpenAIRE AMKE
  • 8. University of Bielefeld
  • 9. CERN


The OpenAIRE Graph is an Open Access dataset containing metadata about research products (literature, datasets, software, etc.) linked to other entities of the research ecosystem like organisations, project grants, and data sources. 

The large size of the OpenAIRE Graph is a major impediment for beginners who want to familiarise themselves with the underlying data model and explore its contents. Working with the Graph at its full size typically requires access to a large distributed computing infrastructure, which is not easily accessible to everyone.

The OpenAIRE Beginner’s Kit aims to address this issue. It consists of two components:

  • A subset of the OpenAIRE Graph composed of the research products published between 2023-06-30 and 2024-02-29, all the entities connected to them and the respective relationships. The subset is composed of the following parts:

    • publication.tar: metadata records about research literature (includes types of publications listed here)

    • dataset.tar: metadata records about research data (includes the subtypes listed here)

    • software.tar: metadata records about research software (includes the subtypes listed here)

    • otherresearchproduct.tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here)

    • organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, funders.

    • datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, funders' databases.

    • project.tar: metadata records about project grants.

    • relation.tar: metadata records about relations between entities in the graph.

    • communities_infrastructures.tar: metadata records about research communities and research infrastructures

      Each file is a tar archive containing gz files, each with one JSON record per line. Each JSON record is compliant with the schema available at
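The layout described above (a tar archive holding gz files, each with one JSON record per line) can be read without unpacking anything to disk. The following is a minimal sketch using only the Python standard library; the function name `iter_records` is hypothetical, and the sketch assumes that the inner files have names ending in `.gz` and are UTF-8 encoded, which the record page does not state explicitly.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one dict per JSON line from every .gz member of a tar archive.

    Assumption: inner members are gzip files whose names end in ".gz"
    and whose content is UTF-8 text with one JSON object per line.
    """
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gz file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress the member as a text stream, line by line.
            with gzip.open(raw, "rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # tolerate blank lines
                        yield json.loads(line)
```

For example, `next(iter_records("project.tar"))` would return the first record of the project archive as a dictionary, assuming the file has been downloaded to the working directory. Because the function is a generator, records are streamed one at a time, which matters for the larger archives.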



This is a subset of the OpenAIRE Graph that you can use to get familiar with the data model and test your code on a smaller dataset. For your data analysis tasks, you can use the full dump available at For more information on the OpenAIRE Graph and its data model, see (


Files (4.8 GB)

Size
61.4 kB
717.3 MB
5.1 MB
5.9 MB
87.4 MB
27.6 MB
3.2 GB
730.1 MB
12.1 MB

Additional details

Related works

Is documented by
10.5281/zenodo.10519039 (DOI)


OpenAIRE Nexus – OpenAIRE-Nexus Scholarly Communication Services for EOSC users 101017452
European Commission