Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching
- 1. University of Oxford
- 2. City, University of London
- 3. Samsung Research UK
Description
NOTE: This version is used in the Bio-ML track of the OAEI 2022; a new version for the OAEI 2023 will be available soon.
About
The purpose of these datasets is to support equivalence and subsumption ontology matching.
There are five ontology pairs extracted from MONDO and UMLS:
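| Source | Task | Category | #SrcCls | #TgtCls | #TgtCls (subs) | #Ref (equiv) | #Ref (subs) |
|--------|------|----------|---------|---------|----------------|--------------|-------------|
| Mondo | OMIM-ORDO | Disease | 9,642 | 8,838 | 8,735 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 6,835 | 8,448 | 5,113 | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 24,182 | 64,726 | 59,567 | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 16,045 | 15,250 | 12,462 | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 11,271 | 13,956 | 13,790 | 3,804 | 213 |
#SrcCls and #TgtCls are the numbers of source and target classes; #TgtCls (subs) is the number of target classes in the subsumption matching setting; #Ref (equiv) and #Ref (subs) are the numbers of reference equivalence and subsumption mappings.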
Each pair comes with three folders, "raw_data", "equiv_match", and "subs_match", which contain, respectively, the downloaded source ontologies, the package for equivalence matching, and the package for subsumption matching.
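Because every pair follows the same three-folder layout, a quick sanity check can be run before loading any files. The following is a minimal Python sketch that assumes the dataset has been extracted so that each pair sits in its own directory under a common root. The pair directory name used here ("ncit-doid") and the helper `pair_layout` are hypothetical, since this page does not state the exact directory names.

```python
from pathlib import Path
from typing import Dict

# The three per-pair folders named in the dataset description.
PAIR_FOLDERS = ("raw_data", "equiv_match", "subs_match")


def pair_layout(dataset_root: str, pair_dir: str) -> Dict[str, Path]:
    """Hypothetical helper: map each folder name to its path under
    <dataset_root>/<pair_dir>, raising if any of the three is missing."""
    base = Path(dataset_root) / pair_dir
    layout = {name: base / name for name in PAIR_FOLDERS}
    missing = [name for name, path in layout.items() if not path.is_dir()]
    if missing:
        raise FileNotFoundError(f"{base} is missing: {', '.join(missing)}")
    return layout


if __name__ == "__main__":
    import sys

    # Usage (pair directory name is an assumption): python check_layout.py <root> ncit-doid
    if len(sys.argv) == 3:
        for name, path in pair_layout(sys.argv[1], sys.argv[2]).items():
            print(f"{name}: {path}")
```

Raising on a missing folder, rather than silently skipping it, makes an incomplete download fail early instead of surfacing later as an empty evaluation.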
Citation
@inproceedings{he2022machine,
title={Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching},
author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},
booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},
pages={575--591},
year={2022},
organization={Springer}
}
Links
- See detailed instructions at: https://krr-oxford.github.io/DeepOnto/bio-ml.
- See the OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/
- See our resource paper on arXiv or Springer (accepted at ISWC 2022 and nominated as a best resource paper candidate).
Changelog
Updates relative to previous versions:
- Class entities in the mapping files are now represented by their full IRIs (relative to v2).
- The candidate mapping files used for local ranking evaluation have been consolidated into a single .tsv file.
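Local ranking evaluation scores, for each source class, its reference target class against a set of candidate target classes. The Python sketch below shows one way to read such a candidate file and compute Hits@K and MRR. It assumes a tab-separated file with a header row and columns named SrcEntity, TgtEntity, and TgtCandidates, the last holding a Python-style tuple of full IRIs; this layout is an assumption to be checked against the DeepOnto instructions. The scoring function stands in for a matching system, and Hits@K and MRR are common ranking metrics, not necessarily the exact official OAEI evaluation.

```python
import ast
import csv
import io
from typing import Callable, Dict, List, Sequence, Tuple

# Row type: (source IRI, reference target IRI, candidate target IRIs)
CandRow = Tuple[str, str, List[str]]


def load_candidates(tsv_text: str) -> List[CandRow]:
    """Parse a candidate file; the column names are an assumption."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows: List[CandRow] = []
    for row in reader:
        # TgtCandidates is assumed to be a tuple literal of full IRIs
        cands = list(ast.literal_eval(row["TgtCandidates"]))
        rows.append((row["SrcEntity"], row["TgtEntity"], cands))
    return rows


def local_ranking_metrics(
    rows: Sequence[CandRow],
    score: Callable[[str, str], float],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[str, float]:
    """Compute Hits@K and MRR of each reference target within its candidate pool."""
    if not rows:
        raise ValueError("no candidate rows to evaluate")
    hits = {k: 0 for k in ks}
    rr_sum = 0.0
    for src, ref_tgt, cands in rows:
        # Deduplicate while keeping order; append the reference if it is not a candidate
        pool = list(dict.fromkeys(list(cands) + [ref_tgt]))
        # Stable sort: tied targets keep their order in the pool
        ranked = sorted(pool, key=lambda tgt: score(src, tgt), reverse=True)
        rank = ranked.index(ref_tgt) + 1
        rr_sum += 1.0 / rank
        for k in ks:
            if rank <= k:
                hits[k] += 1
    n = len(rows)
    metrics = {f"Hits@{k}": hits[k] / n for k in ks}
    metrics["MRR"] = rr_sum / n
    return metrics


if __name__ == "__main__":
    # Toy data with made-up example.org IRIs, for illustration only
    demo = (
        "SrcEntity\tTgtEntity\tTgtCandidates\n"
        "http://example.org/src#A\thttp://example.org/tgt#A\t"
        "('http://example.org/tgt#B', 'http://example.org/tgt#A')\n"
    )

    # Hypothetical scorer: 1.0 if the IRI fragments match, else 0.0
    def same_fragment(s: str, t: str) -> float:
        return float(s.rsplit("#", 1)[-1] == t.rsplit("#", 1)[-1])

    print(local_ranking_metrics(load_candidates(demo), same_fragment))
```

Because the reference mappings now use full IRIs, the scorer receives complete identifiers; a real system would replace the toy scorer with its own similarity between the source and target classes.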