Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching
- 1. University of Oxford
- 2. City, University of London
- 3. Samsung Research UK
Description
This version is used in the Bio-ML track of OAEI 2023; a few significant changes have been made compared to the OAEI 2022 version.
Overview
The purpose of these datasets is to support equivalence and subsumption ontology matching.
There are five ontology pairs extracted from Mondo and UMLS:
| Source | Task | Category | #SrcCls | #TgtCls | #Ref (equiv) | #Ref (subs) |
|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,648 (+6) | 9,275 (+437) | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 (+8,927) | 8,465 (+17) | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 34,418 (+10,236) | 88,955 (+24,229) | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 29,500 (+13,455) | 22,136 (+6,886) | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 (+11,700) | 20,247 (+6,291) | 3,804 | 213 |
The "+" numbers reflect the changes due to locality module enrichment.
The main track is available at "bio-ml", where each pair is associated with a task folder containing the source and target ontologies, the reference equivalence mappings (in "refs_equiv"), and the reference subsumption mappings (in "refs_subs").
The special sub-track is available at "bio-llm", where each pair is associated with a task folder, containing the source and target ontologies, and the test candidate mappings.
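To check what an unpacked task folder actually contains, a minimal Python sketch such as the following may help. It assumes only that "refs_equiv" and "refs_subs" are subfolders of the task folder, as the description above suggests; the helper names and the example path are hypothetical, and no file names or file formats are assumed.

```python
from pathlib import Path


def list_task_files(task_dir):
    """Return every file under task_dir as a sorted list of POSIX-style relative paths."""
    root = Path(task_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def files_in_subfolder(task_dir, subfolder):
    """Return the files from list_task_files that sit under the given subfolder."""
    prefix = subfolder.rstrip("/") + "/"
    return [f for f in list_task_files(task_dir) if f.startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical location of one unpacked task folder; adjust to your copy.
    task_dir = Path("bio-ml/omim-ordo")
    if task_dir.is_dir():
        print("all files:", list_task_files(task_dir))
        print("equivalence refs:", files_in_subfolder(task_dir, "refs_equiv"))
        print("subsumption refs:", files_in_subfolder(task_dir, "refs_subs"))
    else:
        print(f"{task_dir} not found; point task_dir at an unpacked task folder")
```

Listing the files first, rather than hard-coding file names, keeps the sketch valid regardless of how the mapping files are named in a given release; the detailed documentation linked below describes the actual layout.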
Citation
Bio-ML
```
@inproceedings{he2022machine, title={Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching}, author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian}, booktitle={International Semantic Web Conference}, pages={575--591}, year={2022}, organization={Springer} }
```
Bio-LLM
```
@article{he2023exploring, title={Exploring large language models for ontology alignment}, author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian}, journal={arXiv preprint arXiv:2309.07172}, year={2023} }
```
Important Links
- See detailed documentation at: https://krr-oxford.github.io/DeepOnto/bio-ml.
- See the OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/
- See our resource paper for the original Bio-ML at arXiv or Springer (accepted at ISWC 2022 and nominated as a best resource paper candidate). See our poster paper for the new Bio-LLM at arXiv (accepted at ISWC 2023 Posters & Demos).
Changelog
Several significant changes have been made; they are described in the Bio-ML documentation.