Published October 22, 2025 | Version v1
Dataset Open

Publications related to AI, ML, and NLP within Medicine and Biology_OpenAlex

  • 1. Athena Research and Innovation Center In Information Communication & Knowledge Technologies

Description

This dataset contains a set of publications retrieved from the October 2024 OpenAlex snapshot that are associated with the top-level concepts “Medicine” and “Biology”. From this subset, we selected works additionally categorized under “Artificial Intelligence”, “Machine Learning”, or “Natural Language Processing”.
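The two-step selection described above can be sketched roughly as follows. This is only an illustration, not the pipeline used to build the dataset: the function name `is_selected` and the plain list of concept names are hypothetical, and the sketch assumes that a work qualifies when it carries at least one of the two domain concepts, which the description does not state explicitly.

```python
# Illustrative sketch only; not the authors' actual selection code.
# Assumption: each work is reduced to a plain list of concept names.
# Assumption: one domain concept is enough (the description does not
# say whether both "Medicine" and "Biology" are required).
DOMAIN_CONCEPTS = {"medicine", "biology"}
METHOD_CONCEPTS = {
    "artificial intelligence",
    "machine learning",
    "natural language processing",
}


def is_selected(concept_names):
    """Return True if a work has a domain concept and an AI/ML/NLP concept."""
    names = {name.strip().lower() for name in concept_names}
    in_domain = bool(names & DOMAIN_CONCEPTS)      # step 1: Medicine/Biology
    uses_method = bool(names & METHOD_CONCEPTS)    # step 2: AI, ML, or NLP
    return in_domain and uses_method


# Made-up examples
print(is_selected(["Medicine", "Machine Learning"]))   # kept
print(is_selected(["Biology", "Ecology"]))             # dropped: no AI/ML/NLP concept
```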

Each record includes the following metadata fields:

DOI – Digital Object Identifier of the publication

PMID – PubMed identifier (when available)

Year – Year of publication

Countries – Countries of the authors’ affiliations

concepts_AI-ML-NLP_L1 – Level-1 OpenAlex concept IDs, names, and confidence scores related to AI, ML, or NLP

concepts_children_L2 – Level-2 OpenAlex sub-concepts with corresponding IDs, names, and confidence scores

The dataset represents a structured sample for mapping the intersection of AI/ML/NLP research with the biomedical and life sciences domains.
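As a minimal sketch of how the metadata fields listed above might be modelled once loaded, the example below defines a record type and counts publications per year for one Level-1 concept. The file format is not stated on this page, so the in-memory layout, the class and function names (`Concept`, `Record`, `count_by_year`), and every sample value (DOIs, concept IDs, scores) are placeholders invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    concept_id: str   # OpenAlex concept ID (placeholder values below)
    name: str
    score: float      # confidence score for the concept


@dataclass
class Record:
    doi: str
    pmid: Optional[str]   # PMID is only present when available
    year: int
    countries: list[str]
    concepts_l1: list[Concept] = field(default_factory=list)  # concepts_AI-ML-NLP_L1
    concepts_l2: list[Concept] = field(default_factory=list)  # concepts_children_L2


def count_by_year(records, concept_name, min_score=0.0):
    """Count records per year tagged with a Level-1 concept at or above min_score."""
    wanted = concept_name.lower()
    counts = Counter()
    for rec in records:
        if any(c.name.lower() == wanted and c.score >= min_score
               for c in rec.concepts_l1):
            counts[rec.year] += 1
    return counts


# Made-up sample records; none of these identifiers come from the dataset.
sample = [
    Record("10.0000/example.1", None, 2021, ["GR"],
           [Concept("C-PLACEHOLDER-1", "Machine learning", 0.72)]),
    Record("10.0000/example.2", "00000000", 2021, ["GR", "DE"],
           [Concept("C-PLACEHOLDER-1", "Machine learning", 0.31)]),
    Record("10.0000/example.3", None, 2023, ["FR"],
           [Concept("C-PLACEHOLDER-2", "Natural language processing", 0.88)]),
]

print(count_by_year(sample, "Machine learning", min_score=0.5))
```

Raising `min_score` is one way to trade coverage for precision, since the concept assignments carry confidence scores rather than hard labels.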

Files

Files (34.4 MB)

Name Size
md5:81a6997669b87cde254ba610a11c277b
34.4 MB
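After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and the local file path is left to the caller because the file name is not shown on this page.

```python
import hashlib

# Checksum as published in the Files section of this record.
EXPECTED_MD5 = "81a6997669b87cde254ba610a11c277b"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file at `path` has the published MD5 checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

A call such as `matches_published_checksum(local_path)`, where `local_path` is wherever the download was saved, returns `False` if the file is incomplete or was altered in transit.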

Additional details

Funding

European Commission
ELIXIR - European Life-science Infrastructure for Biological Information 211601