Published July 1, 2022 | Version 0.9.0beta
Software Open

Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: a use case studying depression as a risk factor for Alzheimer's disease

  • 1. University of Pittsburgh School of Medicine
  • 2. University of Pittsburgh School of Information

Description

Problem: Determining the causes of disease is a central focus of biomedical science. Randomized studies are the gold standard for establishing causal relations between an exposure and an outcome of interest, but they are not always feasible, so researchers often rely on observational studies instead. Unfortunately, confounding bias threatens the validity of observational studies for causal inference. Conditioning on common causes (confounders) of the exposure and the outcome may reduce confounding. However, the knowledge required to identify such variables is limited by our ability to organize and analyze biomedical knowledge at scale. Computable knowledge extracted from the literature could be useful, but it is incomplete and may be inaccurate. Furthermore, variables may exhibit complex behavior by playing roles besides confounder, including collider (common effect) and mediator (intermediate). Variables with complex behavior may introduce bias if they are conditioned on. This repository contains the code we used to construct and analyze our knowledge graph application to address these problems in a use case studying depression as a risk factor for Alzheimer's disease (AD).


Solution: We build a large knowledge graph (KG) using PheKnowLator, an ontology-grounded KG workflow developed by molecular biologists, and combine it with computable knowledge extracted from the literature on clinical studies of AD. Our workflow also harmonizes structured knowledge extracted from a scoped literature corpus using three machine reading systems. This approach integrates clinical disease definitions with biomedical knowledge using semantic web technologies and the standardization embedded in the ontology-grounded resources. Additionally, our methods address the incompleteness of the literature-derived knowledge by applying graph closure procedures over the structured knowledge. Finally, we translate standard epidemiological definitions for confounders, colliders, and mediators into SPARQL queries for searching the KG.
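The closure step and the role definitions can be illustrated with a minimal Python sketch. It is not the repository's code: the repository expresses these searches as SPARQL queries over the KG, whereas this sketch uses plain Python sets. The function names, the toy edge list, and the node labels (E for exposure, O for outcome) are hypothetical, and transitive closure over a single "causes" relation is assumed here as one simple form of graph closure, which may differ from the procedures used in the repository.

```python
from itertools import product


def transitive_closure(edges):
    """Return the edges plus every (a, d) implied by a chain a -> ... -> d."""
    closure = set(edges)
    while True:
        implied = {
            (a, d)
            for (a, b), (c, d) in product(closure, closure)
            if b == c and a != d  # skip self-loops that cycles would create
        }
        if implied <= closure:
            return closure
        closure |= implied


def classify_roles(edges, exposure, outcome):
    """Apply the standard epidemiological definitions to directed causal edges."""
    causes, effects = {}, {}
    for a, b in edges:
        effects.setdefault(a, set()).add(b)
        causes.setdefault(b, set()).add(a)

    def causes_of(node):
        return causes.get(node, set())

    def effects_of(node):
        return effects.get(node, set())

    skip = {exposure, outcome}
    return {
        # common cause: X -> exposure and X -> outcome
        "confounders": (causes_of(exposure) & causes_of(outcome)) - skip,
        # common effect: exposure -> X and outcome -> X
        "colliders": (effects_of(exposure) & effects_of(outcome)) - skip,
        # intermediate: exposure -> X and X -> outcome
        "mediators": (effects_of(exposure) & causes_of(outcome)) - skip,
    }


# Hypothetical toy edges; not taken from the repository's data.
toy_edges = {
    ("Z", "C"), ("C", "E"), ("C", "O"),
    ("E", "M"), ("M", "O"),
    ("E", "K"), ("O", "K"),
}

print(classify_roles(toy_edges, "E", "O"))
print(classify_roles(transitive_closure(toy_edges), "E", "O"))
```

In this toy graph, Z is reported as a confounder only after closure, because its links to E and O are implied through C rather than stated directly. This is the kind of gap in literature-derived knowledge that a closure step is meant to fill.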

Objective: Build a KG that integrates granular definitions of clinical disease phenotypes with biomedical knowledge of molecular processes and with a superset of knowledge inferred from the literature-derived computable knowledge, in order to identify potential confounders, colliders, and mediators.

Notes

Latest version of data on GitHub

Files

alz_lbgcm-master.zip

Files (417.6 MB)

Checksum | Size
md5:54d823a72046dc50c4c8670df98b1dcd | 417.5 MB
md5:3be59e808c87407485348159eefcf00e | 28.4 kB
md5:2a713f31853ded45789f106376ade3da | 30.3 kB
md5:56c169ed89dd35233b37f96c617829d8 | 5.9 kB
md5:77eb4054863c4a7448132262d42a0ba2 | 4.4 kB
md5:c598c79cee3432d91e521e6ec6bc18bf | 30.2 kB
md5:554d84adc98b9da585d2269ecb724872 | 30.0 kB
md5:e7921ff31d3d904a46566dfad7d8de83 | 6.5 kB

Additional details

Related works