There is a newer version of the record available.

Published November 20, 2021 | Version v0.0.0
Thesis Open

Learning Deep Translational Patient Representations: Systematic Integration of Clinical Records and Biomedical Knowledge

Authors/Creators

  • 1. Computational Bioscience Program, The University of Colorado Anschutz Medical Campus

Contributors

  • 1. Department of Pediatrics, Section of Pediatric Critical Care, School of Medicine, The University of Colorado Anschutz Medical Campus
  • 2. Computational Bioscience Program, The University of Colorado Anschutz Medical Campus

Description

Traditional computational phenotypes (CPs) identify patient cohorts without considering the underlying pathophysiological mechanisms. Deeper patient-level characterizations are necessary for personalized medicine, and while advanced methods exist, their application in clinical settings remains largely unrealized. This thesis advances deep CPs through several experiments designed to address four requirements: stability, scalability, interoperability, and multimodality.

Stability was examined through three experiments. First, a multiphase study identified resources and remediation plans as barriers preventing data quality (DQ) assessment. Then, in two experiments, the Harmonized DQ Framework was used to characterize DQ checks from six clinical organizations and 12 biomedical ontologies. Atemporal Plausibility, Completeness, and Value Conformance were the most common clinical checks, and Value and Relation Conformance were the most common biomedical ontology checks.

Scalability was examined through three experiments. First, a novel composite patient similarity algorithm was developed; it demonstrated that information from clinical terminology hierarchies improved patient representations when applied to small populations. Then, ablation studies showed that the combination of data type, sampling window, and clinical domain used to characterize rare disease patients differed by disease. Finally, an algorithm that losslessly transforms complex knowledge graphs (KGs) into representations more suitable for inductive inference was developed and validated by generating expert-verified, plausible novel drug candidates.

Interoperability was examined through two experiments. First, 36 strategies for aligning five eMERGE CPs to standard clinical terminologies were examined, revealing lower false negative and false positive counts in adult than in pediatric patient populations. Then, hospital-scale mappings between clinical terminologies and biomedical ontologies were developed and found to be accurate, generalizable, and logically consistent.

Multimodality was examined through two experiments. First, a novel ecosystem was created for constructing ontologically grounded KGs under alternative knowledge models using different relation and abstraction strategies. Then, the resulting KGs were validated by successfully enriching portions of the preeclampsia molecular signature that had no previously known literature associations.

These experiments were used to develop a joint learning framework for inferring molecular characterizations of patients from clinical data. The utility of this framework was demonstrated through the accurate inference of EHR-derived rare disease patient genotypes/phenotypes from publicly available molecular data.

Notes

This thesis is licensed as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). As defined by creativecommons.org: "CC BY-NC-SA: This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms." CC BY-NC-SA includes the following elements: BY – Credit must be given to the creator; NC – Only noncommercial uses of the work are permitted; SA – Adaptations must be shared under the same terms. A copy of this license has been uploaded to this record (LICENSE.pdf).

Files

Callahan_Dissertation_V2.2_October2021.pdf

Files (283.7 MB)

Checksum                               Size
md5:a84f464f8decd561ee550a5e83289d93   122.3 MB
md5:075ca1ffdeaf245ece1602540d0fb27f   161.4 MB
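
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch in Python, not something provided by the record itself: the function names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the listing here does not state which checksum belongs to which file, so the pairing in the usage note is an assumption to be checked against the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that files of a few hundred MB do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a checksum written as 'md5:<hex>'
    (the 'md5:' prefix is optional; comparison is case-insensitive)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, assuming the dissertation PDF was saved locally under its original name, `matches_listed_checksum("Callahan_Dissertation_V2.2_October2021.pdf", "md5:a84f464f8decd561ee550a5e83289d93")` would return `True` only if that checksum is the one belonging to that file and the download is uncorrupted. MD5 is adequate for detecting accidental corruption but should not be relied on as protection against deliberate tampering.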

Additional details