Published October 25, 2022 | Version V1.5
Dataset | Open access

OMOP2OBO Condition Occurrence Mappings

  • 1. Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz Medical Campus
  • 2. Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz School of Medicine
  • 3. Translational and Integrative Sciences Lab, University of Colorado Anschutz School of Medicine
  • 4. Department of Pediatrics, Section of Pediatric Critical Care, School of Medicine, University of Colorado Anschutz School of Medicine
  • 5. Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine
  • 6. Computational Bioscience Program, University of Colorado Anschutz Medical Campus
  • 1. Department of Computer Science, Georgia State University
  • 2. Department of Biomedical Informatics, University of Pittsburgh School of Medicine
  • 3. Center for Health AI, University of Colorado Anschutz Medical Campus
  • 4. Keck School of Medicine, University of Southern California
  • 5. Department of Biomedical Informatics, Columbia University Medical Center
  • 6. Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
  • 7. Department of Clinical Pharmacy and Medicine, University of Colorado Anschutz Skaggs School of Pharmacy and Pharmaceutical Sciences and School of Medicine
  • 8. Tufts Institute for Clinical Research and Health Policy Studies, Tufts University
  • 9. Semanticly Ltd
  • 10. The Jackson Laboratory for Genomic Medicine
  • 11. Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  • 12. Department of Research Informatics, Children's Hospital Colorado

Description

OMOP2OBO Condition Occurrence Mappings V1.0

These mappings were created by the OMOP2OBO mapping algorithm (see links below). OMOP2OBO provides the first health system-wide, disease-agnostic mappings between standardized clinical terminologies and eight Open Biomedical Ontology (OBO) Foundry ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, vaccines, and proteins. These mappings are also the first to be created explicitly from the standard terminologies of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), ensuring both semantic and clinical interoperability across a space of N conditions (and N relationships curated in these ontologies).

The mappings in this repository link OMOP standard condition occurrence concepts (i.e., SNOMED CT) to the Human Phenotype Ontology (HPO) and the Mondo Disease Ontology (Mondo). The National Library of Medicine's Unified Medical Language System (UMLS) Semantic Types were first used to filter out all concepts without a biological origin (accidents, injuries, external complications, and findings without clear interpretations). The Semantic Types were then used to prioritize mappings: concepts typed as findings or symptoms were mapped preferentially to HPO, and concepts with Semantic Types indicative of disease were mapped preferentially to Mondo. For this OMOP domain, the owl:intersectionOf (“and”) and owl:unionOf (“or”) constructors were used to build semantically expressive mappings.
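The filtering and routing step can be sketched as follows. This is a minimal illustration, not the OMOP2OBO implementation: the groupings of UMLS Semantic Type names (`EXCLUDED_TYPES`, `HPO_PRIORITY_TYPES`, `MONDO_PRIORITY_TYPES`) and the helpers `route_concept` and `owl_expression` are hypothetical, and the class expression is rendered in OWL Functional Syntax with CURIEs standing in for prefixed names.

```python
from typing import Iterable, List, Optional

# Hypothetical groupings of UMLS Semantic Type names, for illustration only;
# the lists actually used by OMOP2OBO may differ.
EXCLUDED_TYPES = {"Injury or Poisoning"}
HPO_PRIORITY_TYPES = {"Sign or Symptom", "Finding"}
MONDO_PRIORITY_TYPES = {"Disease or Syndrome", "Neoplastic Process"}


def route_concept(semantic_types: Iterable[str]) -> Optional[str]:
    """Return the ontology to try first for a concept, or None if it is filtered out."""
    types = set(semantic_types)
    if types & EXCLUDED_TYPES:
        return None  # treated as lacking a biological origin
    if types & MONDO_PRIORITY_TYPES:
        return "Mondo"  # disease-like types take precedence in this sketch
    if types & HPO_PRIORITY_TYPES:
        return "HPO"
    return None  # no prioritized type: left for other handling


def owl_expression(obo_ids: List[str], constructor: str = "and") -> str:
    """Render one or more OBO classes as an OWL class expression (Functional Syntax)."""
    if len(obo_ids) == 1:
        return obo_ids[0]
    keyword = {"and": "ObjectIntersectionOf", "or": "ObjectUnionOf"}[constructor]
    return f"{keyword}({' '.join(obo_ids)})"
```

For example, a one-to-many mapping written as `owl_expression(["HP:A", "HP:B"], "or")` (placeholder identifiers) yields `ObjectUnionOf(HP:A HP:B)`.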


Mapping Details
Mappings included in this set were generated automatically by OMOP2OBO, either by matching labels, synonyms, and database cross-references (dbXRefs) or with a bag-of-words embedding model built with TF-IDF. Cosine similarity is used to score all pairwise combinations of OMOP and OBO concepts, including their ancestor concepts. To improve efficiency, the algorithm searches only the top n most similar results and, among pairs with scores >= 0.25, keeps those at or above the 75th percentile. Manually created mappings are also included.
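A minimal, standard-library sketch of this scoring step is shown below. It assumes whitespace tokenization, a smoothed IDF, a hypothetical default of `top_n=5` for n, and a nearest-rank 75th percentile computed over the pooled scores; it scores concept labels only (ancestor labels could be added as extra rows) and is not the OMOP2OBO code. The function names and the ID-to-label dictionaries are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (real pipelines normalize more)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: token -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents containing each token
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # smoothed IDF so tokens shared by every document keep a small weight
            tok: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_mappings(omop_labels, obo_labels, top_n=5, min_score=0.25, percentile=75):
    """Score OMOP x OBO label pairs and keep high-scoring candidates for review."""
    omop_ids, obo_ids = list(omop_labels), list(obo_labels)
    # fit TF-IDF over both vocabularies so IDF weights are shared
    vecs = tfidf_vectors([omop_labels[i] for i in omop_ids] + [obo_labels[i] for i in obo_ids])
    omop_vecs, obo_vecs = vecs[:len(omop_ids)], vecs[len(omop_ids):]
    pooled = []
    for omop_id, ov in zip(omop_ids, omop_vecs):
        scored = sorted(
            ((cosine(ov, bv), obo_id) for obo_id, bv in zip(obo_ids, obo_vecs)),
            reverse=True,
        )
        # keep the top-n candidates per OMOP concept that clear the score floor
        pooled += [(omop_id, obo_id, s) for s, obo_id in scored[:top_n] if s >= min_score]
    if not pooled:
        return []
    # nearest-rank percentile over the pooled scores; keep pairs at or above it
    scores = sorted(s for _, _, s in pooled)
    cutoff = scores[max(1, math.ceil(percentile / 100 * len(scores))) - 1]
    return [p for p in pooled if p[2] >= cutoff]
```

On a toy input, an exact label match scores about 1.0 and survives the percentile cut, while a weaker partial overlap can be dropped even when it clears the 0.25 floor, which shows how the relative cut can be stricter than the absolute one.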

Mapping Categories

  • Automatic One-to-One Concept: Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1
  • Automatic One-to-One Ancestor: Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1
  • Automatic One-to-Many Concept: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
  • Automatic One-to-Many Ancestor: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept ancestor-level; 1:Many
  • Manual One-to-One: Hand mapping created using expert suggested resources; 1:1
  • Manual One-to-Many: Hand mapping created using expert suggested resources; 1:Many
  • Cosine Similarity: Mapping suggested by cosine similarity score and manually verified
  • Unmapped: No suitable mapping found, or concept type not mapped
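One way to read these categories is as a function of three properties of a mapping: how it was made (automatic, manual, or a cosine-similarity suggestion), the level at which it matched (concept or ancestor), and how many OBO concepts it uses. The sketch below encodes that reading; the `Mapping` record, its field names, and `mapping_category` are hypothetical and not part of the deposited files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mapping:
    """One OMOP concept's mapping result (hypothetical record layout)."""
    omop_id: str
    obo_ids: List[str]
    manual: bool = False          # created by hand rather than automatically
    ancestor_level: bool = False  # matched via an ancestor rather than the concept itself
    cosine_only: bool = False     # suggested by cosine similarity, then manually verified


def mapping_category(m: Mapping) -> str:
    """Derive the category label from how a mapping was made and its cardinality."""
    if not m.obo_ids:
        return "Unmapped"
    if m.cosine_only:
        return "Cosine Similarity"
    cardinality = "One-to-One" if len(m.obo_ids) == 1 else "One-to-Many"
    if m.manual:
        return f"Manual {cardinality}"
    level = "Ancestor" if m.ancestor_level else "Concept"
    return f"Automatic {cardinality} {level}"
```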


Mapping Statistics
The table below gives the number of OMOP concepts in each mapping category for each ontology:

Mapping Category                    HPO   Mondo
Automatic One-to-One Concept       4767    9097
Automatic One-to-Many Concept       150     885
Cosine Similarity                  1375     667
Automatic One-to-One Ancestor     13595    8911
Automatic One-to-Many Ancestor    38080   40224
Manual One-to-One                  5131     755
Manual One-to-Many                10326    2835
Unmapped                          36301   46345
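The per-ontology totals implied by the table can be checked with a few lines of arithmetic. The sketch below copies the counts from the table and treats every category other than Unmapped as mapped, which is an assumption about how the categories aggregate; each column is summed independently, and `summarize` is a hypothetical helper.

```python
# Counts copied from the Mapping Statistics table: category -> (HPO, Mondo).
COUNTS = {
    "Automatic One-to-One Concept": (4767, 9097),
    "Automatic One-to-Many Concept": (150, 885),
    "Cosine Similarity": (1375, 667),
    "Automatic One-to-One Ancestor": (13595, 8911),
    "Automatic One-to-Many Ancestor": (38080, 40224),
    "Manual One-to-One": (5131, 755),
    "Manual One-to-Many": (10326, 2835),
    "Unmapped": (36301, 46345),
}


def summarize(counts):
    """Return {ontology: (total, mapped, share_mapped)} for each table column."""
    summary = {}
    for col, ontology in enumerate(("HPO", "Mondo")):
        total = sum(row[col] for row in counts.values())
        # assumption: every category except Unmapped counts as mapped
        mapped = total - counts["Unmapped"][col]
        summary[ontology] = (total, mapped, mapped / total)
    return summary


if __name__ == "__main__":
    for ontology, (total, mapped, share) in summarize(COUNTS).items():
        print(f"{ontology}: {mapped} of {total} concepts mapped ({share:.1%})")
```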


Provenance and Versioning: The deposited V1.0 mappings were created by OMOP2OBO v1.0.0 in October 2022 using the OMOP Common Data Model V5.0 and OBO Foundry ontologies downloaded on September 14, 2020.

Caveats: The deposited files only contain the mappings that were generated automatically by the algorithm. The manually generated mappings will be deposited with the official preprint manuscript. Please note that these are the original mappings that were created for the preprint. They have not been updated to current versions of the ontologies. In our experience, this should result in very few errors, but we do suggest that you check the ontology concepts used against current versions of each ontology before using them.
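As a starting point for that check, the sketch below parses an ontology release in OBO flat-file format and reports whether each mapped identifier is still current, obsolete (with any `replaced_by` targets), or missing. It relies only on the standard `id`, `is_obsolete`, and `replaced_by` tags; the function names and the `EX:` identifiers in the sample are hypothetical, and it ignores `alt_id` and `consider` tags, so merged terms will be reported as missing.

```python
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:1
name: current example term

[Term]
id: EX:2
name: retired example term
is_obsolete: true
replaced_by: EX:1
"""


def parse_obo_terms(obo_text):
    """Map each [Term] id to {'obsolete': bool, 'replaced_by': [ids]}."""
    terms, current, in_term = {}, None, False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # a new stanza begins; only [Term] stanzas are collected
            in_term, current = line == "[Term]", None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" !", 1)[0].strip()  # drop trailing "! label" comments
        if tag == "id":
            current = terms.setdefault(value, {"obsolete": False, "replaced_by": []})
        elif current is not None and tag == "is_obsolete":
            current["obsolete"] = value == "true"
        elif current is not None and tag == "replaced_by":
            current["replaced_by"].append(value)
    return terms


def check_mapped_ids(mapped_ids, terms):
    """Classify each mapped OBO id as 'ok', 'obsolete -> ...', or 'missing'."""
    report = {}
    for obo_id in mapped_ids:
        term = terms.get(obo_id)
        if term is None:
            report[obo_id] = "missing"
        elif term["obsolete"]:
            targets = ", ".join(term["replaced_by"]) or "no replacement"
            report[obo_id] = f"obsolete -> {targets}"
        else:
            report[obo_id] = "ok"
    return report
```

In practice `obo_text` would be the contents of a current `.obo` release of HPO or Mondo, and `mapped_ids` the ontology identifiers read from the deposited mapping files.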

 

Important Resources and Documentation

Files

Total size: 94.6 MB
md5:d6bd189c5f42d5a177551f1bf37b57c0
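The download can be checked against the listed MD5 checksum with Python's standard `hashlib` module. The file name is not given on this page, so the path passed to `verify_download` is left to the reader; `md5_of_file` and `verify_download` are hypothetical helper names.

```python
import hashlib

# Checksum listed for this deposit on the record page.
EXPECTED_MD5 = "d6bd189c5f42d5a177551f1bf37b57c0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```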

Additional details

Related works

Is cited by
Preprint: 10.5281/zenodo.5716421 (DOI)
Is compiled by
Software: https://github.com/callahantiff/OMOP2OBO (URL)
Is published in
Other: http://tiffanycallahan.com/OMOP2OBO_Dashboard (URL)
Is referenced by
Thesis: 10.5281/zenodo.5716401 (DOI)