There is a newer version of the record available.

Published July 15, 2021 | Version v1.5.1
Dataset Open

Ontology Quality Check -- Harmonized Data Quality Framework Alignment

  • 1. University of Colorado Anschutz Medical Campus

Description

Ontologies play an important role in the representation, standardization, and integration of biomedical data, but are known to have data quality (DQ) issues. We aimed to understand whether the Harmonized Data Quality Framework (HDQF), developed to standardize electronic health record DQ assessment strategies, could be used to improve ontology quality assessment. A novel set of 14 ontology checks was developed. These DQ checks were aligned to the HDQF and examined by HDQF developers. The ontology checks were evaluated using 11 Open Biomedical Ontology Foundry ontologies. Of the 14 ontology checks, 85.7% (n=12) were successfully aligned to at least 1 HDQF category. Accommodating the unmapped DQ checks (n=2) required modifying an original HDQF category and adding a new Data Dependency category. The HDQF is a valuable resource within the clinical domain, and this work demonstrates its ability to categorize ontology quality assessment strategies.

 

This repository contains the following:

  • Results of mapping the ontology quality checks to the HDQF (Ontology_DQA_v1.5.1.xlsx).
  • The Jupyter Notebook containing the code used to perform the ontology quality checks (Ontology_Cleaning.ipynb).
  • An example of the Ontology Quality Report, taken from the v2.1.0 01 MAY2021 PheKnowLator Build (ontology_quality_report_v2.1.0_01MAY2021.txt).

Files

Files (1.5 MB)

Size     Checksum
31.3 kB  md5:887853c62cba7898dc576cd902593aab
33.1 kB  md5:59c3a347cc1a91314280f2eb0905c028
1.4 MB   md5:aaf3cf23757300ef8edebc68eeafc02a
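
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_listed_checksum` are illustrative and are not part of this repository or its notebook.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or as bare hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

To use it, pass the local path of a downloaded file together with the checksum shown for that file on the record page, for example `matches_listed_checksum(local_path, "md5:...")`. A result of `False` suggests the download is incomplete or corrupted and should be repeated.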