PheKnowLator Human Disease Knowledge Graphs -- Build Archive
Description
RELEASE V3.0.2 KNOWLEDGE GRAPH BENCHMARK ARCHIVE
Website: https://github.com/callahantiff/PheKnowLator/wiki/Benchmarks-and-Builds
The goal of the PheKnowLator (PKT) Human Disease knowledge graph (KG) benchmarks is to provide KG builds that represent human disease mechanisms, including the central dogma.
PKT Human Disease KG Build Information
To enable customization in the way that knowledge is represented when constructing a KG, three configurable parameters are provided:
- Knowledge Model. PKT-KG defines a KG as K=〈T, A〉, where T is the TBox and A is the ABox. The TBox describes “classes”, properties, and assertions that are assumed to generally be true (e.g., a gene is a heritable unit of DNA located in the nucleus of cells). The ABox describes “individuals” or instances of classes and assertions that are specific to an instance (e.g., A2M is a type of gene that may cause Alzheimer’s Disease). PKT-KG KGs are grounded in Open Biological and Biomedical Ontology (OBO) Foundry ontologies, which are enhanced with database entities. Database entities are added to the core ontologies using either a TBox (i.e., “class”-based) or an ABox (i.e., “instance”-based) knowledge model. For the class-based approach, database entities are made a subclass of an existing ontology class. For the instance-based approach, database entities are made an instance of an existing ontology class. Both approaches require the alignment of database entities to a core ontology class.
- Relation Strategy. PKT-KG provides two relation strategies: standard, in which a relation is represented unidirectionally through a single edge (e.g., “gene causes phenotype”), and inverse, in which a relation is represented bidirectionally. Bidirectional edges are created by inferring a relation’s inverse when the relation comes from an ontology like the Relations Ontology (e.g., “chemical participates in pathway” and “pathway has participant chemical”), and by inferring implicitly symmetric relations for edge types that represent biological interactions (e.g., gene-gene interactions).
- Semantic Abstraction. KGs built using expressive languages like OWL are structurally complex and contain triples or edges that are not biologically meaningful (e.g., syntactic entities used to express OWL axioms). PKT-KG includes the OWL-NETS (PMC5737627) semantic abstraction algorithm, which produces versions of PheKnowLator KGs that only include biologically meaningful edges (i.e., all edges representing OWL syntactic elements have been removed). OWL-NETS v2.0 includes additional functionality that harmonizes a semantically abstracted KG to a class- or instance-based knowledge model. For class-based knowledge models, all triples or edges containing rdf:type are updated to rdfs:subClassOf, and for instance-based knowledge models, all triples or edges containing rdfs:subClassOf are updated to rdf:type. For additional details, see the OWL-NETS v2.0 documentation.
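The three parameters above can be illustrated with a minimal sketch in plain Python. This is not the PheKnowLator implementation: the function names (link_database_entity, add_inverse_edges, harmonize), the string identifiers, and the representation of edges as simple tuples are all assumptions made for illustration only.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def link_database_entity(entity, ontology_class, knowledge_model):
    """Knowledge Model: attach a database entity to an ontology class."""
    if knowledge_model == "class":
        # Class-based: the entity becomes a subclass of the ontology class.
        return (entity, RDFS_SUBCLASS_OF, ontology_class)
    if knowledge_model == "instance":
        # Instance-based: the entity becomes an instance of the ontology class.
        return (entity, RDF_TYPE, ontology_class)
    raise ValueError(f"unknown knowledge model: {knowledge_model!r}")


def add_inverse_edges(edges, inverse_of, symmetric):
    """Relation Strategy: return the edges plus inferred reverse edges."""
    result = list(edges)
    seen = set(edges)
    for subj, pred, obj in edges:
        if pred in inverse_of:
            # The relation has a declared inverse (hypothetical lookup table).
            inferred = (obj, inverse_of[pred], subj)
        elif pred in symmetric:
            # Interaction-style relations are treated as symmetric.
            inferred = (obj, pred, subj)
        else:
            continue
        if inferred not in seen:
            seen.add(inferred)
            result.append(inferred)
    return result


def harmonize(edges, knowledge_model):
    """Semantic Abstraction: align an abstracted edge list to one model."""
    if knowledge_model == "class":
        old, new = RDF_TYPE, RDFS_SUBCLASS_OF
    elif knowledge_model == "instance":
        old, new = RDFS_SUBCLASS_OF, RDF_TYPE
    else:
        raise ValueError(f"unknown knowledge model: {knowledge_model!r}")
    return [(s, new if p == old else p, o) for s, p, o in edges]


edges = [
    ("chemical", "participates in", "pathway"),
    ("gene A", "interacts with", "gene B"),
    ("gene", "causes", "phenotype"),
]
bidirectional = add_inverse_edges(
    edges,
    inverse_of={"participates in": "has participant"},
    symmetric={"interacts with"},
)
for edge in bidirectional:
    print(edge)
```

In this sketch, “causes” has no entry in the inverse table and is not marked symmetric, so it stays a single unidirectional edge, while the other two relations each gain a reverse edge.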
The PKT Human Disease KG benchmark was designed to model mechanisms of human disease, which included the Central Dogma and represented a total of six biological scales of organization. The PKT Human Disease KG was developed in collaboration with a PhD-level molecular biologist (knowledge representation). The PKT Human Disease KGs were constructed using 12 OBO Foundry ontologies. Combining these sources results in 18 node types and facilitates the addition of 35 edge types. Note that the counts of node and edge types listed reflect those that are explicitly added to the core set of ontologies and do not take into account the node and edge types provided by the ontologies.
These data are used to construct 12 different versions of the PKT Human Disease KG by varying the Knowledge Model (i.e., class- vs. instance-based), Relation Strategy (i.e., standard vs. inverse relations), and Semantic Abstraction (i.e., with or without OWL-NETS and, when OWL-NETS is applied, with or without Knowledge Model harmonization) parameters described in the Construct KGs element of the Knowledge Graph Construction Resources component.
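The count of 12 follows from crossing two knowledge models, two relation strategies, and three semantic abstraction settings (2 x 2 x 3). A minimal sketch of that enumeration is shown here; the option labels and dictionary keys are hypothetical and are not the build names used in the archive.

```python
from itertools import product

# Hypothetical labels for the three configurable parameters.
KNOWLEDGE_MODELS = ("class", "instance")
RELATION_STRATEGIES = ("standard", "inverse")
SEMANTIC_ABSTRACTION = ("owl", "owlnets", "owlnets_harmonized")

# Every combination of the three parameters corresponds to one build.
builds = [
    {
        "knowledge_model": km,
        "relation_strategy": rs,
        "semantic_abstraction": sa,
    }
    for km, rs, sa in product(
        KNOWLEDGE_MODELS, RELATION_STRATEGIES, SEMANTIC_ABSTRACTION
    )
]

for build in builds:
    print(build)
print(len(builds))  # 12
```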
Files in this Directory
- PheKnowLator_HumanDiseaseKG_Output_FileInformation.xlsx: contains information on the different types of files that are output for each build.
- pheknowlator_builds.json: a list of all KG builds (identifier-labeled edge lists).
- full_pheknowlator_build_files.json: a list of all output files for each build.
Google Cloud Storage Bucket Access:
Build Data:
Google Cloud Storage
Zenodo
Knowledge Graph Data:
v1.0.0
Google Cloud Storage
Zenodo
All Other Versions
Google Cloud Storage
Zenodo
- Class-based Knowledge Model + Standard Relations
- Class-based Knowledge Model + Inverse Relations
- Instance-based Knowledge Model + Standard Relations
- Instance-based Knowledge Model + Inverse Relations
Knowledge Graph Embeddings:
Google Cloud Storage
Zenodo: