PheKnowLator Human Disease Knowledge Graphs -- Build Archive

Callahan, Tiffany J

doi:10.5281/zenodo.7051238

Published September 5, 2022 | Version v.3.0.2

Dataset Open

Callahan, Tiffany J¹

1. University of Colorado Anschutz Medical Campus

RELEASE V3.0.2 KNOWLEDGE GRAPH BENCHMARK ARCHIVE

Website: https://github.com/callahantiff/PheKnowLator/wiki/Benchmarks-and-Builds

The goal of the PheKnowLator (PKT) Human Disease knowledge graph (KG) benchmarks is to provide KG builds that represent human disease mechanisms, including the central dogma.

PKT Human Disease KG Build Information

To enable customization in the way that knowledge is represented when constructing a KG, three configurable parameters are provided:

Knowledge Model. PKT-KG defines a KG as K=⟨T, A⟩, where T is the TBox and A is the ABox. The TBox describes “classes”, properties, and assertions that are assumed to generally be true (e.g., a gene is a heritable unit of DNA located in the nucleus of cells). The ABox describes “individuals” or instances of classes and assertions that are specific to an instance (e.g., A2M is a type of gene that may cause Alzheimer’s Disease). PKT-KG KGs are grounded in Open Biological and Biomedical Ontology (OBO) Foundry ontologies, which are enhanced with database entities. Database entities are added to the core ontologies using either a TBox (i.e., “class”-based) or an ABox (i.e., “instance”-based) knowledge model. For the class-based approach, database entities are made a subclass of an existing ontology class. For the instance-based approach, database entities are made an instance of an existing ontology class. Both approaches require the alignment of database entities to a core ontology class.
Relation Strategy. PKT-KG provides two relation strategies. With standard relations, each relationship is represented unidirectionally through a single edge (e.g., “gene causes phenotype”). With inverse relations, relationships are represented bidirectionally, either by inferring a relation’s inverse when the relation comes from an ontology like the Relations Ontology (e.g., “chemical participates in pathway” and “pathway has participant chemical”) or by inferring implicitly symmetric relations for edge types that represent biological interactions (e.g., gene-gene interactions).
Semantic Abstraction. KGs built using expressive languages like OWL are structurally complex and contain triples or edges that are not biologically meaningful (e.g., syntactic entities used to express OWL axioms). PKT-KG includes the OWL-NETS (PMC5737627) semantic abstraction algorithm, which enables the creation of versions of PheKnowLator KGs that only include biologically meaningful edges (i.e., all edges representing OWL syntactic elements have been removed). OWL-NETS v2.0 includes additional functionality that harmonizes a semantically abstracted KG to a class- or instance-based knowledge model. For class-based knowledge models, all triples or edges containing rdf:type are updated to rdfs:subClassOf, and for instance-based knowledge models, all triples or edges containing rdfs:subClassOf are updated to rdf:type. For additional details, see the OWL-NETS v2.0 documentation.
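
The three parameters above can be illustrated on a handful of toy triples. The following is a minimal sketch in Python and not the PheKnowLator implementation: the function names (attach_entity, add_inverse_edges, harmonize), the relation labels, and the inverse and symmetric lookup tables are assumptions made for illustration, and identifiers such as chemical_x are placeholders rather than real database entities.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical lookup tables. A real build would derive inverses from an
# ontology such as the Relations Ontology rather than hard-coding them.
INVERSE_RELATIONS = {
    "participates_in": "has_participant",
    "has_participant": "participates_in",
}
SYMMETRIC_RELATIONS = {"interacts_with"}


def attach_entity(entity: str, ontology_class: str, knowledge_model: str) -> Triple:
    """Knowledge Model: link a database entity to an existing ontology class."""
    if knowledge_model == "class":
        return (entity, RDFS_SUBCLASS_OF, ontology_class)
    if knowledge_model == "instance":
        return (entity, RDF_TYPE, ontology_class)
    raise ValueError(f"unknown knowledge model: {knowledge_model!r}")


def add_inverse_edges(edges: List[Triple]) -> List[Triple]:
    """Relation Strategy: add inverse or symmetric edges where one is known."""
    result = list(edges)
    seen = set(edges)
    for subject, relation, obj in edges:
        if relation in INVERSE_RELATIONS:
            candidate = (obj, INVERSE_RELATIONS[relation], subject)
        elif relation in SYMMETRIC_RELATIONS:
            candidate = (obj, relation, subject)
        else:
            continue  # standard (unidirectional) edge, nothing to infer
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


def harmonize(edges: List[Triple], knowledge_model: str) -> List[Triple]:
    """Semantic Abstraction: align edge predicates with the knowledge model."""
    if knowledge_model == "class":
        old, new = RDF_TYPE, RDFS_SUBCLASS_OF
    elif knowledge_model == "instance":
        old, new = RDFS_SUBCLASS_OF, RDF_TYPE
    else:
        raise ValueError(f"unknown knowledge model: {knowledge_model!r}")
    return [(s, new if p == old else p, o) for s, p, o in edges]


if __name__ == "__main__":
    edges = [
        attach_entity("A2M", "gene", "instance"),
        ("chemical_x", "participates_in", "pathway_y"),
        ("gene_a", "interacts_with", "gene_b"),
    ]
    for triple in harmonize(add_inverse_edges(edges), "class"):
        print(triple)
```

Plain tuples are used here so the sketch has no dependencies; a real OWL-encoded KG would carry full IRIs and many more axioms than these three edges.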

The PKT Human Disease KG benchmark was designed to model mechanisms of human disease, which included the Central Dogma and represented a total of six biological scales of organization. The PKT Human Disease KG was developed in collaboration with a PhD-level molecular biologist (knowledge representation). The PKT Human Disease KGs were constructed using 12 OBO Foundry ontologies. Combining these sources results in 18 node types and facilitates the addition of 35 edge types. Note that the node and edge type counts listed reflect those that are explicitly added to the core set of ontologies and do not take into account the node and edge types provided by the ontologies themselves.

These data are used to construct 12 different versions of the PKT Human Disease KG by altering the Knowledge Model (i.e., class- vs. instance-based) and Relation Strategy (i.e., standard vs. inverse relations) parameters and by applying Semantic Abstraction (i.e., OWL-NETS (yes/no), with and without Knowledge Model harmonization), as described in the Construct KGs element of the Knowledge Graph Construction Resources component.
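
One way to read the count of 12 is as the product of two knowledge models, two relation strategies, and three semantic abstraction settings (no OWL-NETS, OWL-NETS without harmonization, OWL-NETS with harmonization). The short Python sketch below enumerates that product; the option labels are assumptions chosen for readability and are not the directory or file names used in the archive.

```python
from itertools import product

# Assumed labels for each parameter; the archive may name builds differently.
KNOWLEDGE_MODELS = ("class", "instance")
RELATION_STRATEGIES = ("standard", "inverse")
SEMANTIC_ABSTRACTION = (
    "OWL",
    "OWL-NETS",
    "OWL-NETS + harmonization",
)

# Every combination of the three parameters is one candidate build.
builds = list(product(KNOWLEDGE_MODELS, RELATION_STRATEGIES, SEMANTIC_ABSTRACTION))

if __name__ == "__main__":
    for number, (model, relations, abstraction) in enumerate(builds, start=1):
        print(f"{number:2d}. {model}-based | {relations} relations | {abstraction}")
```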

Files in this Directory

PheKnowLator_HumanDiseaseKG_Output_FileInformation.xlsx: contains information on the different types of files that are output for each build.
pheknowlator_builds.json: a list of all KG builds (identifier-labeled edge lists).
full_pheknowlator_build_files.json: a list of all output files for each build.

Google Cloud Storage Bucket Access:

https://console.cloud.google.com/storage/browser/pheknowlator

Build Data:

Google Cloud Storage

Zenodo

Knowledge Graph Data:

v1.0.0

Google Cloud Storage

https://console.cloud.google.com/storage/browser/pheknowlator/archived_builds/release_v1.0.0

Zenodo

10.5281/zenodo.7030200

All Other Versions

Google Cloud Storage

Zenodo

Class-based Knowledge Model + Standard Relations
- OWL
- OWL-NETS
Class-based Knowledge Model + Inverse Relations
- OWL
- OWL-NETS
Instance-based Knowledge Model + Standard Relations
- OWL
- OWL-NETS
Instance-based Knowledge Model + Inverse Relations
- OWL
- OWL-NETS

Knowledge Graph Embeddings:

Google Cloud Storage

v1.0.0_03SEP2019

Zenodo:

v1.0.0_03SEP2019

Files (497.5 kB)

Name	md5	Size
full_pheknowlator_build_files.json	521333a15b1249cb26589e014fb8c8fd	453.1 kB
pheknowlator_builds.json	0ec5681f880f4910a6a7fc9c0e5818fd	33.5 kB
PheKnowLator_HumanDiseaseKG_Output_FileInformation.xlsx	c1d134551f517eaa1c1d03ff1979b1a5	10.8 kB
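
The md5 checksums listed for these files can be used to confirm that a download is intact. The following Python sketch shows one way to do this with the standard library; the helper names (md5_of_file, verify_downloads) and the assumption that all three files sit in a single local directory are illustrative choices and not part of the archive.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "full_pheknowlator_build_files.json": "521333a15b1249cb26589e014fb8c8fd",
    "pheknowlator_builds.json": "0ec5681f880f4910a6a7fc9c0e5818fd",
    "PheKnowLator_HumanDiseaseKG_Output_FileInformation.xlsx": "c1d134551f517eaa1c1d03ff1979b1a5",
}


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory) -> dict:
    """Map each expected file name to True if it exists and its md5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current working directory.
    for name, ok in verify_downloads(".").items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {name}")
```

A missing file is reported the same way as a checksum mismatch, which keeps the output to one line per expected file.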
