Ontology Quality Check -- Harmonized Data Quality Framework Alignment
Description
Ontologies play an important role in the representation, standardization, and integration of biomedical data, but they are known to have data quality (DQ) issues. We aimed to determine whether the Harmonized Data Quality Framework (HDQF), developed to standardize electronic health record (EHR) DQ assessment strategies, could be used to improve ontology quality assessment. We developed a novel set of 14 ontology DQ checks, aligned them to the HDQF, and had the alignment reviewed by HDQF developers. The checks were evaluated using 11 Open Biological and Biomedical Ontology (OBO) Foundry ontologies. Of the 14 checks, 12 (85.7%) were successfully aligned to at least one HDQF category. Accommodating the two unmapped checks required modifying an existing HDQF category and adding a new Data Dependency category. The HDQF is a valuable resource within the clinical domain, and this work demonstrates that it can also be used to categorize ontology quality assessment strategies.
This repository contains the following:
- Results of mapping the ontology quality checks to the HDQF (Ontology_DQA_v1.5.1.xlsx).
- The Jupyter Notebook containing the code used to perform the ontology quality checks (Ontology_Cleaning.ipynb).
- An example Ontology Quality Report, taken from the PheKnowLator v2.1.0 build of 01 May 2021 (ontology_quality_report_v2.1.0_01MAY2021.txt).