Published November 20, 2021 | Version v1.0.0
Thesis Open

Learning Deep Translational Patient Representations: Systematic Integration of Clinical Records and Biomedical Knowledge

  • 1. Computational Bioscience Program, The University of Colorado Anschutz Medical Campus
  • 1. Department of Pediatrics, Section of Pediatric Critical Care, School of Medicine, The University of Colorado Anschutz Medical Campus
  • 2. Computational Bioscience Program, The University of Colorado Anschutz Medical Campus
  • 3. University of Colorado Denver School of Pharmacy, The University of Colorado Anschutz Medical Campus
  • 4. Department of Pediatrics, Sections of Informatics and Data Science and Critical Care Medicine, School of Medicine, University of Colorado School of Medicine
  • 5. Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus
  • 6. Department of Pediatrics, Breathing Institute, Pediatric Pulmonary Section, Children's Hospital Colorado

Description

Traditional computational phenotypes (CPs) identify patient cohorts without consideration of underlying pathophysiological mechanisms. Deeper patient-level characterizations are necessary for personalized medicine, and while advanced methods exist, their application in clinical settings remains largely unrealized. This thesis advances deep CPs through several experiments designed to address four requirements: stability, scalability, interoperability, and multimodality.

Stability was examined through three experiments. First, a multiphase study identified resources and remediation plans as barriers preventing data quality (DQ) assessment. Then, in two experiments, the Harmonized DQ Framework was used to characterize DQ checks from six clinical organizations and 12 biomedical ontologies. Atemporal Plausibility, Completeness, and Value Conformance were the most common clinical checks, and Value and Relation Conformance were the most common biomedical ontology checks.

Scalability was examined through three experiments. First, a novel composite patient similarity algorithm was developed and demonstrated that information from clinical terminology hierarchies improved patient representations when applied to small populations. Then, ablation studies showed that the combination of data type, sampling window, and clinical domain used to characterize rare disease patients differed by disease. Finally, an algorithm that losslessly transforms complex knowledge graphs (KGs) into representations more suitable for inductive inference was developed and validated through the generation of expert-verified plausible novel drug candidates.

Interoperability was examined through two experiments. First, an examination of 36 strategies to align five eMERGE CPs to standard clinical terminologies revealed lower false negative and false positive counts in adult than in pediatric patient populations. Then, hospital-scale mappings between clinical terminologies and biomedical ontologies were developed and found to be accurate, generalizable, and logically consistent.

Multimodality was examined through two experiments. A novel ecosystem was created for constructing ontologically-grounded KGs under alternative knowledge models using different relation and abstraction strategies. The resulting KGs were validated by successfully enriching portions of the preeclampsia molecular signature with no previously known literature associations.

These experiments were used to develop a joint learning framework for inferring molecular characterizations of patients from clinical data. The utility of this framework was demonstrated through the accurate inference of EHR-derived rare disease patient genotypes/phenotypes from publicly available molecular data.

Notes

This thesis is licensed as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). A copy of the license has been attached to this record (Creative Commons — Attribution-NonCommercial-ShareAlike 4.0 International — CC BY-NC-SA 4.0.pdf)

Files

Callahan_Dissertation_V2.2_October2021.pdf

Files (283.6 MB)

Additional details