Published April 23, 2025 | Version v1
Dataset Open

Variantscape datasets

  • 1. University of St Gallen School of Medicine

Description

Variantscape dataset
Datasets produced by LLM-based extraction of genetic variants and biomedical entities from the titles and abstracts of biomedical publications. They support the analysis of literature-derived co-associations between genetic variants, cancer types, and treatments, enabling downstream network analysis, hypothesis generation, and discovery in precision oncology.

 

1. Dataset: Cleaned literature dataset for biomedical entity extraction (2014–2024)
"cleaned_OpenAlex.csv"
A pre-processed, cleaned, and structured dataset of cancer-related biomedical publications (2014–2024) retrieved from OpenAlex, containing titles, abstracts, and metadata curated for downstream NLP and LLM-based biomedical entity extraction.
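Because this file is about 5.2 GB, it may be more practical to inspect it row by row than to load it whole. The following is a minimal sketch using only the Python standard library; the UTF-8 encoding and the presence of a header row are assumptions, and `peek_csv` is a hypothetical helper name, not part of the Variantscape repository.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=5):
    """Return the header and the first n_rows rows without reading the whole file."""
    # Abstracts can be long, so raise the per-field size cap (value chosen to be
    # safe on platforms where the C long type is 32 bits wide).
    csv.field_size_limit(2**31 - 1)
    # Assumption: the file is UTF-8 encoded and its first line is a header.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows


# Example call, once the file has been downloaded:
# header, rows = peek_csv("cleaned_OpenAlex.csv")
```

Reading lazily keeps memory use flat regardless of file size, which is the main reason to prefer it for a first look at the column names.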

 

2. Dataset: Binary entity matrix for co-association and network analysis
"dataset_for_analysis.csv"
Final binary matrix dataset derived from NLP- and LLM-based entity extraction on cancer-related literature. Entities include genetic variants, cancer types, and treatments, enabling co-occurrence and network analysis, and the investigation of literature-derived co-associations.
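As an illustration of the co-occurrence analysis a binary matrix allows, the sketch below counts how often two entities are flagged in the same row. It assumes that each row is one publication, each entity column holds "0" or "1", and the column names are known; the entity names in the toy rows are placeholders, not values from the dataset.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(rows, entity_columns):
    """Count, for each pair of entity columns, the rows in which both are 1."""
    counts = Counter()
    for row in rows:
        # Assumption: cells are the strings "0" or "1", as read by csv.DictReader.
        present = sorted(
            col for col in entity_columns if str(row.get(col, "0")).strip() == "1"
        )
        # Every unordered pair of entities present in this row co-occurs once.
        counts.update(combinations(present, 2))
    return counts


# Toy rows with placeholder entity names (hypothetical, for illustration only).
toy_rows = [
    {"variant_A": "1", "cancer_X": "1", "treatment_Y": "1"},
    {"variant_A": "1", "cancer_X": "1", "treatment_Y": "0"},
    {"variant_A": "0", "cancer_X": "0", "treatment_Y": "1"},
]
toy_counts = cooccurrence_counts(toy_rows, ["variant_A", "cancer_X", "treatment_Y"])
```

The resulting pair counts can serve as edge weights in a co-occurrence network, with entities as nodes.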

 

3. Dataset: LLM-based classification of variant-treatment co-associations
"variant_treatment_relationship_consensus.csv"
Dataset capturing LLM-based classification and consensus on co-associations between genetic variants and treatments.

 

4. Dataset: Metadata mapping for entity extraction and analysis
"metadata_mapping_transposed.csv"
Transposed, row-indexed metadata mapping file that identifies each column as a variant, cancer type, treatment, study design element, or publication-derived metadata.
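A mapping of this kind can be used to select columns by entity type. The sketch below groups column names by category; the exact layout of the mapping file is not described here, so the field names `column_name` and `category` are hypothetical and would need to be replaced with the actual ones.

```python
from collections import defaultdict


def group_columns_by_category(mapping_rows, name_key="column_name", category_key="category"):
    """Group column names by their category label.

    Assumption: each mapping row describes one column and carries both a
    column name and a category; the key names used here are hypothetical.
    """
    groups = defaultdict(list)
    for row in mapping_rows:
        groups[row[category_key]].append(row[name_key])
    return dict(groups)


# Toy mapping rows with placeholder names (for illustration only).
toy_mapping = [
    {"column_name": "variant_A", "category": "variant"},
    {"column_name": "cancer_X", "category": "cancer type"},
    {"column_name": "treatment_Y", "category": "treatment"},
    {"column_name": "variant_B", "category": "variant"},
]
toy_groups = group_columns_by_category(toy_mapping)
```

With the groups in hand, an analysis can be restricted to, for example, variant and treatment columns only.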

Files

Files (5.3 GB)

Checksum Size
md5:1de25a9727bd220cacecf08899b181d1 5.2 GB
md5:83c347ccf0aa7a21b00110d8a4a396bb 85.9 MB
md5:4ecede74d8981b106c3781f1fccca7b8 100.4 kB
md5:c1b39fd6d420ad5aa8346caa7d274aab 592.5 kB
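The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's `hashlib`; `md5_of_file` and `matches_checksum` are hypothetical helper names, and since this listing does not state which checksum belongs to which file, that pairing is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:<hex digest>'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use small even for the 5.2 GB file.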

Additional details

Dates

Created
2025-04-23

Software

Repository URL
https://github.com/hastingslab-org/Variantscape
Programming language
Python
Development Status
Active