Published July 28, 2024 | Version OAEI Bio-ML 2024
Dataset | Open access

Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

  • 1. University of Oxford
  • 2. City, University of London
  • 3. Samsung Research UK

Description


This version is used in the Bio-ML track of OAEI 2024. The only change from the OAEI 2023 version is the deletion of certain training subsumption mappings.


Overview

These datasets are designed to support both equivalence and subsumption ontology matching.

They comprise five ontology pairs extracted from Mondo and UMLS:

| Source | Task | Category | #SrcCls | #TgtCls | #Ref (equiv) | #Ref (subs) |
|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,648 | 9,275 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 | 8,465 | 4,686 | 3,338 (-1) |
| UMLS | SNOMED-FMA | Body | 34,418 | 88,955 | 7,256 | 5,453 (-53) |
| UMLS | SNOMED-NCIT | Pharm | 29,500 | 22,136 | 5,803 | 4,224 (-1) |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 | 20,247 | 3,804 | 213 |

The negative numbers in parentheses give the change relative to OAEI 2023 caused by the deletion of certain training subsumption mappings.
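As a quick check on the table, the sketch below recovers the OAEI 2023 subsumption reference counts by undoing the parenthesized changes. It assumes that the listed #Ref (subs) values are the post-deletion (OAEI 2024) counts; the dictionary keys are illustrative labels, not file or folder names from the release.

```python
# Reference subsumption counts from the table (OAEI 2024 version), each paired
# with the change shown in parentheses (0 where no change is listed).
# Assumption: the listed counts are post-deletion values.
SUBS_2024 = {
    "OMIM-ORDO": (103, 0),
    "NCIT-DOID": (3_338, -1),
    "SNOMED-FMA (Body)": (5_453, -53),
    "SNOMED-NCIT (Pharm)": (4_224, -1),
    "SNOMED-NCIT (Neoplas)": (213, 0),
}


def previous_counts(table: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Recover the OAEI 2023 counts by undoing each (negative) change."""
    return {task: count - delta for task, (count, delta) in table.items()}


if __name__ == "__main__":
    for task, count in previous_counts(SUBS_2024).items():
        print(f"{task}: {count:,}")
```

Under this assumption, 55 subsumption mappings were removed in total, most of them from SNOMED-FMA.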

The main track is available at "bio-ml", where each pair has its own task folder containing the source and target ontologies, the reference equivalence mappings (in "refs_equiv"), and the reference subsumption mappings (in "refs_subs").
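A minimal loader for one task folder might look like the following. It assumes (this page does not say so) that the files under "refs_equiv" and "refs_subs" are tab-separated with a header row naming SrcEntity, TgtEntity and, optionally, Score columns. `Mapping`, `load_mappings` and `load_task` are hypothetical names, and the file names inside each folder are discovered with a glob rather than assumed.

```python
import csv
from pathlib import Path
from typing import NamedTuple


class Mapping(NamedTuple):
    src: str      # source class IRI
    tgt: str      # target class IRI
    score: float  # confidence; defaults to 1.0 when no Score column is present


def load_mappings(tsv_path: Path) -> list[Mapping]:
    """Read a tab-separated mapping file with a header row.

    Assumed columns: SrcEntity, TgtEntity and (optionally) Score.
    """
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return [
            Mapping(row["SrcEntity"], row["TgtEntity"], float(row.get("Score") or 1.0))
            for row in reader
        ]


def load_task(task_dir: Path) -> dict[str, list[Mapping]]:
    """Load every *.tsv file under refs_equiv/ and refs_subs/ of one task folder."""
    refs: dict[str, list[Mapping]] = {}
    for sub in ("refs_equiv", "refs_subs"):
        for tsv in sorted((task_dir / sub).glob("*.tsv")):
            refs[f"{sub}/{tsv.name}"] = load_mappings(tsv)
    return refs
```

Keying the result by relative path keeps any train/test splits that a folder may contain separate without hard-coding their names.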

The special sub-track is available at "bio-llm", where each pair has its own task folder containing the source and target ontologies and the test candidate mappings.
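Test candidate mappings lend themselves to ranking-style evaluation. The sketch below assumes a TgtCandidates field holding a Python-style tuple of target class IRIs for each source class, and a user-supplied `score` function standing in for a matching system. The field format, the helper names, and the choice of Hits@k and mean reciprocal rank (MRR) as metrics are all assumptions made for illustration, not the track's official evaluation protocol.

```python
import ast
from typing import Callable, Iterable


def parse_candidates(field: str) -> list[str]:
    """Parse a TgtCandidates cell, assumed to be a Python-style tuple of IRIs."""
    return list(ast.literal_eval(field))


def evaluate_ranking(
    rows: Iterable[tuple[str, str, list[str]]],
    score: Callable[[str, str], float],
    k: int = 1,
) -> tuple[float, float]:
    """Return (Hits@k, MRR) over rows of (source IRI, gold target IRI, candidates)."""
    hits = rr = 0.0
    n = 0
    for src, gold, cands in rows:
        # Rank candidates by descending score; ties keep their original order.
        ranked = sorted(cands, key=lambda c: score(src, c), reverse=True)
        n += 1
        if gold in ranked:
            rank = ranked.index(gold) + 1
            hits += rank <= k
            rr += 1.0 / rank
    return (hits / n, rr / n) if n else (0.0, 0.0)
```

A gold target missing from the candidate list contributes zero to both metrics.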


Citation

Bio-ML (Main Track)

```
@inproceedings{he2022machine,
  title={Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching},
  author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},
  booktitle={International Semantic Web Conference},
  pages={575--591},
  year={2022},
  organization={Springer}
}
```

Bio-LLM (Sub-track)

```
@article{he2023exploring,
  title={Exploring large language models for ontology alignment},
  author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian},
  journal={arXiv preprint arXiv:2309.07172},
  year={2023}
}
```


Important Links


Changelog

The only change in this version compared to the OAEI 2023 version is the deletion of certain training subsumption mappings that could be exploited directly through deductive reasoning.
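To illustrate what "exploited directly through deductive reasoning" can mean, the sketch below flags subsumption mappings (s, t) that follow from an equivalence mapping (s, e) plus the target ontology's class hierarchy, i.e. when t is e itself or one of e's ancestors. This is one plausible reading only: the exact deletion criterion used for this version is not given here, and the inputs are simple in-memory structures (a child-to-parents dictionary) rather than parsed ontologies.

```python
from collections import defaultdict


def ancestors(parents: dict[str, set[str]], cls: str) -> set[str]:
    """All strict superclasses of cls under the transitive closure of `parents`."""
    seen: set[str] = set()
    stack = list(parents.get(cls, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def entailed_subsumptions(
    subs: list[tuple[str, str]],
    equivs: list[tuple[str, str]],
    tgt_parents: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Return the subsumption mappings (s, t) entailed by some equivalence
    mapping (s, e) together with the target hierarchy (e subClassOf* t)."""
    eq_targets: dict[str, set[str]] = defaultdict(set)
    for s, e in equivs:
        eq_targets[s].add(e)
    entailed = []
    for s, t in subs:
        if any(t == e or t in ancestors(tgt_parents, e) for e in eq_targets.get(s, ())):
            entailed.append((s, t))
    return entailed
```

Mappings flagged this way carry no information beyond the equivalence mappings and the target hierarchy, which is why leaving them in the training data would let a system score them by reasoning alone.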

Files

ncit-doid.zip

Files (41.7 MB)

| Size | MD5 checksum |
|---|---|
| 5.9 MB | d5d82bb5c17c89a2b5f9aa9a5b7ad15a |
| 4.5 MB | 7adf13e1726071f8922fdae9d653aeed |
| 13.2 MB | 94c9ec76584a204aa7dc6bf5985ba66d |
| 7.2 MB | 960c58698e4cc1ebbd2bde33c49bf1bd |
| 10.9 MB | c4ed2701ff569b2cb92e37714636a112 |
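Downloads can be checked against the MD5 checksums listed above with Python's standard `hashlib` module. The helper names `md5sum` and `verify` are hypothetical. Since the listing does not pair each checksum with a file name, pass the value shown next to the file you actually downloaded.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's digest with a listed checksum (an "md5:" prefix is optional)."""
    return md5sum(path) == expected.removeprefix("md5:").lower()
```

For example, `verify(Path("ncit-doid.zip"), "md5:<checksum listed for that file>")` returns `True` only when the download is intact. Reading in chunks keeps memory use flat even for the larger archives.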