PheKnowLator Human Disease Knowledge Graph Benchmarks Archive
Description
PKT Human Disease KG Benchmark Builds
The PheKnowLator (PKT) Human Disease KG (PKT-KG) was built to model mechanisms of human disease. It includes the Central Dogma and represents multiple biological scales of organization, including molecular, cellular, tissue, and organ. The knowledge representation was designed in collaboration with a PhD-level molecular biologist (Figure).
The PKT Human Disease KG was constructed using 12 OBO Foundry ontologies, 31 Linked Open Data sets, and results from two large-scale experiments (Supplementary Material). The 12 OBO Foundry ontologies were selected to represent chemicals and vaccines (i.e., ChEBI and Vaccine Ontology), cells and cell lines (i.e., Cell Ontology and Cell Line Ontology), gene/gene product attributes (i.e., Gene Ontology), phenotypes and diseases (i.e., Human Phenotype Ontology and Mondo Disease Ontology), proteins, including complexes and isoforms (i.e., Protein Ontology), pathways (i.e., Pathway Ontology), types and attributes of biological sequences (i.e., Sequence Ontology), and anatomical entities (i.e., Uberon). The Relation Ontology (RO) is used to provide relationships between the core OBO Foundry ontologies and database entities.
The PKT Human Disease KG contained 18 node types and 33 edge types. Note that these counts reflect only the node and edge types that are explicitly added to the core set of OBO Foundry ontologies; they do not include the node and edge types provided by the ontologies themselves. These node and edge types were used to construct 12 different PKT Human Disease benchmark KGs by altering the Knowledge Model (i.e., class- vs. instance-based), Relation Strategy (i.e., standard vs. inverse relations), and Semantic Abstraction (i.e., OWL-NETS (yes/no) with and without Knowledge Model harmonization [OWL-NETS Only vs. OWL-NETS + Harmonization]) parameters. Benchmarks within the PheKnowLator ecosystem are different versions of a KG that can be built under alternative knowledge models, relation strategies, and with or without semantic abstraction. They let users evaluate different modeling decisions (based on the parameters listed above) and examine the impact of these decisions on different downstream tasks.
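The count of 12 benchmark KGs follows from combining the three parameters described above. The sketch below enumerates those combinations, assuming Semantic Abstraction takes three values (no OWL-NETS, OWL-NETS Only, OWL-NETS + Harmonization). The labels and the function name `enumerate_benchmark_builds` are illustrative only and are not official build names or part of the PheKnowLator code base.

```python
from itertools import product

# Illustrative labels for the three build parameters (assumed, not official names).
KNOWLEDGE_MODELS = ["class-based", "instance-based"]
RELATION_STRATEGIES = ["standard relations", "inverse relations"]
# Assumption: "OWL" stands for a build without OWL-NETS semantic abstraction.
SEMANTIC_ABSTRACTIONS = ["OWL", "OWL-NETS Only", "OWL-NETS + Harmonization"]


def enumerate_benchmark_builds():
    """Return one dict per combination of the three build parameters."""
    return [
        {
            "knowledge_model": knowledge_model,
            "relation_strategy": relation_strategy,
            "semantic_abstraction": semantic_abstraction,
        }
        for knowledge_model, relation_strategy, semantic_abstraction in product(
            KNOWLEDGE_MODELS, RELATION_STRATEGIES, SEMANTIC_ABSTRACTIONS
        )
    ]


if __name__ == "__main__":
    builds = enumerate_benchmark_builds()
    print(f"{len(builds)} benchmark configurations")
    for build in builds:
        print(" | ".join(build.values()))
```

Under this reading, 2 knowledge models x 2 relation strategies x 3 semantic abstraction settings gives the 12 benchmark configurations.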
The Figures and Tables explaining attributes in the builds can be found here.
Build Data Access
Important Build Information
The benchmarks were originally built and stored using Google Cloud Platform (GCP) resources. Details and a complete description of this process can be found on GitHub (here). Note that we have developed this Zenodo-based archive for the builds. While the original GCP storage contained everything needed to generate the builds, the file size upload limits associated with each archive mean that we have limited the uploaded files to the KGs, associated metadata, and log files. The list of resources, including their URLs and dates of download, can be found in the logs associated with each build.
🗂 For additional information on the KG file types, please see the following Wiki page, which is also available as a download from this repository (PheKnowLator_HumanDiseaseKG_Output_FileInformation.xlsx).
v1.0.0
- KGs: https://zenodo.org/doi/10.5281/zenodo.7030200
- Embeddings: https://zenodo.org/doi/10.5281/zenodo.7030188
All Other Build Versions
Class-based Builds
Standard Relations
- OWL Build
- OWL-NETS Build
Inverse Relations
- OWL Build
- OWL-NETS Build
Instance-based Builds
Standard Relations
- OWL Build
- OWL-NETS Build
Inverse Relations
Files
Checksum | Size |
---|---|
md5:71b044a87de10a34eda5ef0b9f859cb7 | 17.4 kB |
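The MD5 checksum listed for the archived file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever the user saved the download as.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "71b044a87de10a34eda5ef0b9f859cb7"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_checksum("path/to/downloaded_file")` returns `True` only when the local copy has the listed digest. MD5 is suitable here as an integrity check against corrupted transfers, not as protection against deliberate tampering.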
Additional details
Related works
- Is identical to
- Dataset: https://github.com/callahantiff/PheKnowLator/wiki/Archived-Builds (URL)
- References
- Dataset: https://console.cloud.google.com/storage/browser/pheknowlator (URL)