Published January 7, 2025 | Version v1
Dataset | Open access

LORE PMKB-CV

  • 1. Taiwan AI Labs
  • 1. Taiwan AI Labs & Foundation
  • 2. Academia Sinica
  • 3. National Taiwan University
  • 4. Taiwan AI Labs

Description

LORE PMKB-CV

  • Knowledge graph (LLM-ORE)
    • 70M relations between 8k diseases (MeSH) and 18k genes (NCBI, human protein-coding), curated by LLMs reading PubMed
    • Data format: CSV file with (D_id, G_id, PMID, relation) rows
  • Semantic embedding (LLM-EMB)
    • 2.5M disease-gene (DG) vectors created by LLMs reading the knowledge graph
    • Data format: pickle (.pkl) file with (D_id, G_id, vector) entries
  • DG pathogenicity scores (ML-Ranker)
    • 3.1M DG scores predicted by pretrained models
    • Features, training annotations, and pretrained models are also provided
  • Curated key semantics taxonomy
    • A manually curated taxonomy of 105 semantic tags about DG pathogenicity in the knowledge graph
    • Use the Key-Semantics module in the LORE GitHub repository to apply the taxonomy as tags and add them to the knowledge graph
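A minimal Python sketch for reading the knowledge-graph CSV and the embedding pickle described above, using only the standard library. It assumes the CSV holds one (D_id, G_id, PMID, relation) row per extracted relation, possibly preceded by a header row whose first cell is "D_id", and that the pickle holds an iterable of (D_id, G_id, vector) triples. Neither layout is confirmed by this record, so inspect the files first; the function names are hypothetical and not part of the LORE repository.

```python
import csv
import io
import pickle
from collections import defaultdict


def iter_relations(text_file):
    """Yield (D_id, G_id, PMID, relation) tuples from an open text file.

    Assumed (hypothetical) layout: four columns in this order, with an
    optional header row whose first cell is "D_id".
    """
    for rec in csv.reader(text_file):
        if not rec or rec[0] == "D_id":
            continue  # skip blank lines and the optional header
        d_id, g_id, pmid, *rest = rec
        # Re-join extra fields in case the relation text has unquoted commas.
        yield (d_id, g_id, pmid, ",".join(rest))


def pmids_per_pair(rows):
    """Count distinct supporting PMIDs for each (D_id, G_id) pair."""
    pmids = defaultdict(set)
    for d_id, g_id, pmid, _relation in rows:
        pmids[(d_id, g_id)].add(pmid)
    return {pair: len(ids) for pair, ids in pmids.items()}


def load_vector_index(binary_file):
    """Map (D_id, G_id) -> vector from an open .pkl file object.

    Assumes the pickle holds an iterable of (D_id, G_id, vector) triples.
    Only unpickle files from a trusted source, such as this record.
    """
    triples = pickle.load(binary_file)
    return {(d_id, g_id): vector for d_id, g_id, vector in triples}


if __name__ == "__main__":
    # Toy input with made-up IDs, standing in for the real CSV.
    sample = io.StringIO(
        "D_id,G_id,PMID,relation\n"
        "D1,G1,111,made-up relation A\n"
        "D1,G1,222,made-up relation B\n"
        "D1,G2,111,made-up relation C\n"
    )
    print(pmids_per_pair(iter_relations(sample)))
    # {('D1', 'G1'): 2, ('D1', 'G2'): 1}
```

For the real files, open the CSV with `open(path, newline="", encoding="utf-8")` and the pickle with `open(path, "rb")`. `iter_relations` is a generator so the 70M rows are streamed rather than held in one list; only the per-pair PMID sets stay in memory.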


Files

One file (6.3 GB), md5:dbb08704d80cb59b1fb0d32603df9ec1
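Because the download is large, it is worth checking it against the MD5 checksum listed above before unpacking. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so the 6.3 GB archive is never loaded into memory at once; the script name and the command-line usage are hypothetical, since the file name is not shown here.

```python
import hashlib

# Checksum shown in the Files listing of this record.
EXPECTED_MD5 = "dbb08704d80cb59b1fb0d32603df9ec1"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python verify_md5.py <downloaded file>
    print("OK" if verify_download(sys.argv[1]) else "MISMATCH")
```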

Additional details

Related works

References
Journal article: 10.1093/bib/bbaf070 (DOI)
Preprint: 10.1101/2024.08.10.24311801 (DOI)
Journal article: 10.1093/nar/gkac310 (DOI)

Dates

Created
2025-01-07

Software

Repository URL
https://github.com/ailabstw/LORE
Programming language
Python
Development Status
Active