Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: a use case studying depression as a risk factor for Alzheimer's disease
Authors/Creators
- 1. University of Pittsburgh School of Medicine
- 2. University of Pittsburgh School of Information
Description
Problem: Determining the causes of disease is a central focus of biomedical science. Randomized studies are the gold standard for establishing causal relations between an exposure and an outcome of interest, but they are not always feasible, and researchers must then rely on observational studies. Unfortunately, confounding bias threatens the validity of such studies for causal inference. Conditioning on common causes (confounders) of the exposure and the outcome may reduce confounding. However, the knowledge required for this task is limited by our ability to organize and analyze biomedical knowledge at scale. Computable knowledge extracted from the literature could be useful, but it is incomplete and may be inaccurate. Furthermore, variables may exhibit complex behavior by playing roles besides confounder, including collider (common effect) and mediator (intermediate). Variables with complex behavior may introduce bias. This repository contains the code we used to construct and analyze our knowledge graph application to address these problems in a use case studying depression as a risk factor for Alzheimer's disease (AD).
Solution: We build a large knowledge graph (KG) using PheKnowLator, an ontology-grounded KG workflow developed by molecular biologists, and combine it with computable knowledge extracted from the literature on clinical studies of AD. Our workflow also harmonizes structured knowledge extracted from a scoped literature corpus using three machine reading systems. This approach addresses the integration of clinical disease definitions with biomedical knowledge using semantic web technologies and the standardization embedded in the ontology-grounded resources. Additionally, our methods address the incompleteness of the literature-derived knowledge by applying graph closure procedures over the structured knowledge. Finally, we translate standard epidemiological definitions for confounders, colliders, and mediators into SPARQL queries for searching the KG.
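The idea behind applying a closure procedure to incomplete extracted knowledge can be illustrated with a minimal Python sketch. It assumes the simplest case, a transitive closure over a single directed "causes" relation; the record does not specify which closure procedures the repository applies, and the function name `transitive_closure` and the node labels are hypothetical.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (source, target) pair connected by a path of one or more edges."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    closure = set()
    for start in list(successors):
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating empty entries for nodes without successors
            stack.extend(successors.get(node, ()))
    return closure


# Hypothetical toy assertions, not taken from the repository.
extracted = {("A", "B"), ("B", "C")}

# Pairs implied by the extracted assertions but not stated directly.
inferred = transitive_closure(extracted) - extracted
print(inferred)
```

Here the pair `("A", "C")` is never asserted directly, yet it follows from the two extracted assertions, which is the sense in which closure yields a superset of the extracted knowledge.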
Objective: Build a KG that integrates granular definitions of clinical disease phenotypes with biomedical knowledge of molecular processes, combined with a superset of knowledge inferred from the literature-derived computable knowledge, for identifying potential confounders, colliders, and mediators.
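The three roles named above follow standard epidemiological definitions: relative to an exposure and an outcome, a confounder is a common cause of both, a collider is a common effect of both, and a mediator lies on a path from exposure to outcome. The sketch below expresses those definitions in plain Python over a set of directed "causes" edges. It is an illustration only, not the SPARQL queries in the repository; the function name `classify_roles` and the variables `Z1` to `Z4` are hypothetical.

```python
def classify_roles(edges, exposure, outcome):
    """Group variables by the role(s) they play relative to exposure and outcome."""
    causes = set(edges)
    variables = {node for edge in causes for node in edge} - {exposure, outcome}
    roles = {"confounder": set(), "collider": set(), "mediator": set()}
    for v in variables:
        # Confounder: common cause of the exposure and the outcome.
        if (v, exposure) in causes and (v, outcome) in causes:
            roles["confounder"].add(v)
        # Collider: common effect of the exposure and the outcome.
        if (exposure, v) in causes and (outcome, v) in causes:
            roles["collider"].add(v)
        # Mediator: caused by the exposure and itself a cause of the outcome.
        if (exposure, v) in causes and (v, outcome) in causes:
            roles["mediator"].add(v)
    return roles


# Hypothetical toy edges; Z1..Z4 are placeholders, not findings.
X, Y = "depression", "alzheimers_disease"
toy_edges = {
    ("Z1", X), ("Z1", Y),                        # Z1: confounder
    (X, "Z2"), (Y, "Z2"),                        # Z2: collider
    (X, "Z3"), ("Z3", Y),                        # Z3: mediator
    ("Z4", X), ("Z4", Y), (X, "Z4"), (Y, "Z4"),  # Z4: several roles at once
}
roles = classify_roles(toy_edges, X, Y)
for role, members in roles.items():
    print(role, sorted(members))
```

Because the checks are independent, a variable such as `Z4` can appear under more than one role, which mirrors the complex behavior described in the problem statement.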
Files
alz_lbgcm-master.zip
(417.6 MB)
| MD5 checksum | Size |
|---|---|
| md5:54d823a72046dc50c4c8670df98b1dcd | 417.5 MB |
| md5:3be59e808c87407485348159eefcf00e | 28.4 kB |
| md5:2a713f31853ded45789f106376ade3da | 30.3 kB |
| md5:56c169ed89dd35233b37f96c617829d8 | 5.9 kB |
| md5:77eb4054863c4a7448132262d42a0ba2 | 4.4 kB |
| md5:c598c79cee3432d91e521e6ec6bc18bf | 30.2 kB |
| md5:554d84adc98b9da585d2269ecb724872 | 30.0 kB |
| md5:e7921ff31d3d904a46566dfad7d8de83 | 6.5 kB |
Additional details
Related works
- Is referenced by
- Journal article: https://www.biorxiv.org/content/10.1101/2022.07.18.500549v1.article-metrics (URL)