Published January 7, 2025
| Version v1
Dataset
Open
LORE PMKB-CV
Description
- Knowledge graph (LLM-ORE)
  - 70M relations between 8k diseases (MeSH) and 18k genes (NCBI, human protein-coding), curated by LLMs reading PubMed
  - Data format: (D_id, G_id, PMID, relation) CSV file
- Semantic embedding (LLM-EMB)
  - 2.5M disease-gene (DG) vectors created by LLMs reading the knowledge graph
  - Data format: (D_id, G_id, vector) pkl file
- DG pathogenicity scores (ML-Ranker)
  - 3.1M DG scores predicted by pretrained models
  - Features, training annotations, and pretrained models are also provided
- Curated key semantics taxonomy
  - A manually curated taxonomy of 105 semantic tags describing DG pathogenicity in the knowledge graph
  - Use the Key-Semantics module in the LORE GitHub repository to apply the taxonomy as tags on the knowledge graph
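As a rough illustration of working with the (D_id, G_id, PMID, relation) format described above, the sketch below groups relation rows by disease-gene pair using only the Python standard library. It assumes the column order matches the stated format and that the file has no header row (pass `has_header=True` otherwise); neither detail is confirmed here. The function name `group_relations` and the sample rows are hypothetical and are not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, taken from the stated format (D_id, G_id, PMID, relation).
COLUMNS = ("D_id", "G_id", "PMID", "relation")


def group_relations(lines, has_header=False):
    """Group relation rows by (disease, gene) pair.

    `lines` is any iterable of CSV text lines (an open text file works).
    Returns {(D_id, G_id): [(PMID, relation), ...]}.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    grouped = defaultdict(list)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guess at their meaning
        d_id, g_id, pmid, relation = row
        grouped[(d_id, g_id)].append((pmid, relation))
    return dict(grouped)


# Made-up rows for illustration only; these are not records from the dataset.
sample = io.StringIO(
    "D000001,1001,11111111,example relation A\n"
    "D000001,1001,22222222,example relation B\n"
    "D000002,1002,33333333,example relation C\n"
)
relations = group_relations(sample)
print(len(relations[("D000001", "1001")]))
```

With 70M rows, holding every relation in memory may not be practical; the same reader loop can instead filter for a single disease or gene of interest while streaming the file. The pkl file would presumably be read with `pickle.load`, which should only be used on files from a trusted source.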
Source project
- https://github.com/ailabstw/LORE
  - Tools for running LLM-ORE relation extraction, LLM-EMB embedding, ML-Ranker prediction, and Key-Semantics curation on custom datasets
- https://doi.org/10.1093/bib/bbaf070
  - Research article describing the LORE framework, analyses, experiments, and details of the PMKB-CV dataset
Files
(6.3 GB)
| Name | Size |
|---|---|
| md5:dbb08704d80cb59b1fb0d32603df9ec1 | 6.3 GB |
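Because the download is large, it may be worth checking it against the MD5 checksum listed for the file before use. The sketch below streams the file in chunks so the 6.3 GB archive never has to fit in memory. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local file path is left to the caller because the file name is not shown in this record.

```python
import hashlib

# Checksum listed for the 6.3 GB file in this record.
EXPECTED_MD5 = "dbb08704d80cb59b1fb0d32603df9ec1"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

A call such as `verify_download(local_path)` returns `False` for a truncated or corrupted download, in which case the file should be fetched again.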
Additional details
Related works
- References
  - Journal article: 10.1093/bib/bbaf070 (DOI)
  - Preprint: 10.1101/2024.08.10.24311801 (DOI)
  - Journal article: 10.1093/nar/gkac310 (DOI)
Dates
- Created: 2025-01-07
Software
- Repository URL: https://github.com/ailabstw/LORE
- Programming language: Python
- Development status: Active