Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching
- 1. University of Oxford
- 2. City, University of London
- 3. Samsung Research UK
Description
This version is used in the Bio-ML track of the OAEI 2024; the only change compared with the OAEI 2023 version is the deletion of certain training subsumption mappings.
Overview
The purpose of these datasets is to support equivalence and subsumption ontology matching.
There are five ontology pairs extracted from MONDO and UMLS:
Source | Task | Category | #SrcCls | #TgtCls | #Ref (equiv) | #Ref (subs) |
---|---|---|---|---|---|---|
Mondo | OMIM-ORDO | Disease | 9,648 | 9,275 | 3,721 | 103 |
Mondo | NCIT-DOID | Disease | 15,762 | 8,465 | 4,686 | 3,338 (-1) |
UMLS | SNOMED-FMA | Body | 34,418 | 88,955 | 7,256 | 5,453 (-53) |
UMLS | SNOMED-NCIT | Pharm | 29,500 | 22,136 | 5,803 | 4,224 (-1) |
UMLS | SNOMED-NCIT | Neoplas | 22,971 | 20,247 | 3,804 | 213 |
The negative numbers in parentheses reflect the changes caused by the deletion of certain training subsumption mappings.
The main track is available at "bio-ml", where each pair is associated with a task folder containing the source and target ontologies, the reference equivalence mappings (in "refs_equiv"), and the reference subsumption mappings (in "refs_subs").
The special sub-track is available at "bio-llm", where each pair is associated with a task folder, containing the source and target ontologies, and the test candidate mappings.
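The task folders can be read with a few lines of Python. The sketch below is a minimal reader under stated assumptions, not the official loader: it assumes each reference file is a tab-separated file with a header row containing `SrcEntity` and `TgtEntity` columns, and the `<split>.tsv` file names (e.g. `test.tsv`) inside "refs_equiv" and "refs_subs" are hypothetical. Check the unzipped folders and the DeepOnto documentation for the actual layout before relying on it.

```python
import csv
import io
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

Mapping = Tuple[str, str]  # (source class IRI, target class IRI)


def parse_mappings(lines: Iterable[str]) -> List[Mapping]:
    """Parse tab-separated mappings that start with a header row.

    Assumes columns named "SrcEntity" and "TgtEntity" (an assumption; verify
    against the real files). Any extra columns, such as a score, are ignored.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [(row["SrcEntity"], row["TgtEntity"]) for row in reader]


def load_task(task_dir: Path, split: str = "test") -> Dict[str, List[Mapping]]:
    """Load equivalence and subsumption mappings for one task folder.

    The "<split>.tsv" file names are hypothetical; adjust them to whatever
    the unzipped task folder actually contains.
    """
    result: Dict[str, List[Mapping]] = {}
    for kind in ("refs_equiv", "refs_subs"):
        path = task_dir / kind / f"{split}.tsv"
        with path.open(newline="", encoding="utf-8") as f:
            result[kind] = parse_mappings(f)
    return result


if __name__ == "__main__":
    # Small in-memory sample in the assumed format.
    sample = io.StringIO(
        "SrcEntity\tTgtEntity\tScore\n"
        "http://example.org/src#A\thttp://example.org/tgt#X\t1.0\n"
    )
    print(parse_mappings(sample))
```

Reading through `csv.DictReader` keys each row by column name, so the reader keeps working if extra columns are present or their order differs.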
Citation
Bio-ML (Main Track)
```
@inproceedings{he2022machine, title={Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching}, author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian}, booktitle={International Semantic Web Conference}, pages={575--591}, year={2022}, organization={Springer} }
```
Bio-LLM (Sub-track)
```
@article{he2023exploring, title={Exploring large language models for ontology alignment}, author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian}, journal={arXiv preprint arXiv:2309.07172}, year={2023} }
```
Important Links
- See detailed documentation at: https://krr-oxford.github.io/DeepOnto/bio-ml
- See the OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/
- See our resource paper for the original Bio-ML on arXiv or Springer (accepted at ISWC-2022 and nominated as a best resource paper candidate). See our poster paper for the Bio-LLM sub-track on arXiv (accepted at ISWC-2023 Posters & Demos).
Changelog
The only change in this version compared with the OAEI 2023 version is the deletion of certain training subsumption mappings that can be directly exploited through deductive reasoning.
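To illustrate what "directly exploited through deductive reasoning" can mean, here is one plausible reading, sketched as a hypothetical check; it is not necessarily the exact criterion used to select the deleted mappings. A training subsumption mapping (c, d) lets a system deduce a test mapping (c, d') whenever d' is a (transitive) named superclass of d in the target ontology. The `parents` dictionary, standing in for the target ontology's asserted subclass edges, is an assumption of the sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Mapping = Tuple[str, str]  # (source class, target class)


def ancestors(cls: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All transitive named superclasses of `cls`, given direct-parent edges."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def exploitable_training(train: Iterable[Mapping], test: Iterable[Mapping],
                         parents: Dict[str, Set[str]]) -> List[Mapping]:
    """Return training mappings (c, d) from which some test mapping (c, d')
    follows because d' is a superclass of d in the target ontology.

    Hypothetical criterion for illustration only.
    """
    test_by_src: Dict[str, Set[str]] = defaultdict(set)
    for src, tgt in test:
        test_by_src[src].add(tgt)
    flagged = []
    for src, tgt in train:
        # A non-empty overlap means a test target is deducible from this mapping.
        if test_by_src.get(src, set()) & ancestors(tgt, parents):
            flagged.append((src, tgt))
    return flagged
```

For example, with target edges X ⊑ Y ⊑ Z, a training mapping (A, X) together with the ontology entails the test mapping (A, Z), so (A, X) would be flagged; (B, X) would not be flagged against a test mapping (B, W) because W is not a superclass of X.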
Files (41.7 MB)
MD5 checksum | Size |
---|---|
d5d82bb5c17c89a2b5f9aa9a5b7ad15a | 5.9 MB |
7adf13e1726071f8922fdae9d653aeed | 4.5 MB |
94c9ec76584a204aa7dc6bf5985ba66d | 13.2 MB |
960c58698e4cc1ebbd2bde33c49bf1bd | 7.2 MB |
c4ed2701ff569b2cb92e37714636a112 | 10.9 MB |
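Since MD5 checksums are listed for the downloads, a quick integrity check after downloading can be done with Python's standard `hashlib`. This is a minimal sketch: the local file name `downloaded_archive.zip` is hypothetical, and because the file names for each checksum are not listed above, you need to pick the checksum that corresponds to the file you downloaded.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file on disk."""
    with path.open("rb") as f:
        return md5_of_stream(f)


def matches_listed(actual: str, listed: str) -> bool:
    """Compare a computed digest with a listed checksum, which may carry an
    "md5:" prefix; the comparison is case-insensitive."""
    return actual.lower() == listed.strip().lower().removeprefix("md5:")


if __name__ == "__main__":
    # Sanity check against the well-known MD5 of the empty input.
    assert md5_of_stream(io.BytesIO(b"")) == "d41d8cd98f00b204e9800998ecf8427e"
    # Hypothetical local file name; replace it with the archive you downloaded
    # and use the checksum listed for that archive.
    archive = Path("downloaded_archive.zip")
    if archive.exists():
        ok = matches_listed(md5_of_file(archive), "md5:d5d82bb5c17c89a2b5f9aa9a5b7ad15a")
        print("checksum OK" if ok else "checksum MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of archive size, which matters little for files of a few MB but costs nothing.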