Published January 30, 2023 | Version V3.0
Preprint | Open access

Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

  • 1. University of Colorado Anschutz Medical Campus; Columbia University Irving Medical Center
  • 2. University of Colorado Anschutz Medical Campus
  • 3. National Institutes of Health
  • 4. Columbia University Irving Medical Center
  • 5. Georgia State University
  • 6. University of Pittsburgh School of Medicine
  • 7. Università degli Studi di Milano
  • 8. The Jackson Laboratory for Genomic Medicine
  • 9. University of Cambridge
  • 10. Children's Hospital Colorado
  • 11. University of Colorado Anschutz School of Medicine
  • 12. University of Colorado School of Medicine
  • 13. Semanticly
  • 14. University of Southern California
  • 15. Lawrence Berkeley National Laboratory
  • 16. HealthLinc
  • 17. University of Pittsburgh
  • 18. University of Colorado Anschutz Skaggs School of Pharmacy and Pharmaceutical Sciences and School of Medicine
  • 19. Tufts University
  • 20. Sema4
  • 21. Janssen Research and Development

Description

Background: Common data models solve many of the challenges of standardizing electronic health record (EHR) data, but they cannot semantically integrate all the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise.

Objective: We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies.

Results: Using OMOP2OBO, we produced mappings for 92,367 conditions, 8,611 drug ingredients, and 10,673 measurement results. When examined across 24 hospitals, these mappings covered 68-99% of the concepts used in clinical practice. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing.

Conclusions: By aligning OMOP vocabularies to OBO ontologies, our algorithm presents new opportunities to advance EHR-based deep phenotyping.

Notes

This work was supported by National Library of Medicine grants T15LM009451 (to Lawrence E. Hunter) and T15LM007079 (to George Hripcsak).

Files

OMOP2OBO_Manuscript_v3.pdf

Files (6.3 MB)

md5:71359b07f8cf75622fb1d6adf5129d7b (3.9 MB)
md5:88cef4b67889ef068775edb77e0465b7 (2.4 MB)

Additional details

Related works