There is a newer version of the record available.

Published July 28, 2022 | Version OAEI Bio-ML 2022
Dataset Open

Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

  • 1. University of Oxford
  • 2. City, University of London
  • 3. Samsung Research UK

Description

NOTE: This version is used in the Bio-ML track of the OAEI 2022; a new version for the OAEI 2023 will be available soon.

About

The purpose of these datasets is to support equivalence and subsumption ontology matching.

There are five ontology pairs extracted from MONDO and UMLS:

Source  Task         Category  #SrcCls  #TgtCls  #TgtCls (subs)  #Ref (equiv)  #Ref (subs)
Mondo   OMIM-ORDO    Disease   9,642    8,838    8,735           3,721         103
Mondo   NCIT-DOID    Disease   6,835    8,448    5,113           4,686         3,339
UMLS    SNOMED-FMA   Body      24,182   64,726   59,567          7,256         5,506
UMLS    SNOMED-NCIT  Pharm     16,045   15,250   12,462          5,803         4,225
UMLS    SNOMED-NCIT  Neoplas   11,271   13,956   13,790          3,804         213

Each pair comes with three folders, "raw_data", "equiv_match", and "subs_match", which contain the downloaded source ontologies, the package for equivalence matching, and the package for subsumption matching, respectively.

Citation

@inproceedings{he2022machine,
title={Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching},
author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian}, 
booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},
pages={575--591},
year={2022},
organization={Springer}
}

Changelog

Changes from previous versions:

  • Class entities in the mapping files are now represented using their full IRIs (changed from v2).
  • Files for candidate mappings for local ranking evaluation are simplified to just one .tsv file.
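
As a rough illustration of working with the two changes above, the sketch below reads a tab-separated mapping file and flags rows whose class identifiers are not full IRIs. The column names "SrcEntity" and "TgtEntity" and the file name in the usage note are assumptions made for illustration; check the header row of the actual .tsv files before relying on them.

```python
import csv
from pathlib import Path


def load_tsv_rows(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def is_full_iri(value):
    """Rough check that an identifier is a full IRI rather than a prefixed name."""
    return value.startswith(("http://", "https://"))


def non_iri_rows(rows, columns):
    """Return the rows in which any of the given columns does not hold a full IRI."""
    return [row for row in rows if not all(is_full_iri(row[c]) for c in columns)]
```

A call such as `non_iri_rows(load_tsv_rows("mappings.tsv"), ["SrcEntity", "TgtEntity"])` (hypothetical file and column names) would return an empty list if every entity in those columns is given as a full IRI.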

Files

MONDO.zip

Files (158.8 MB)

Checksum                              Size
md5:ae3ac1e70c371ce70142799b4fcbf3d3  52.7 MB
md5:36e05c19a8385f64436a8de3f43dc3d1  106.2 MB
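
The MD5 values listed above can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` follows; the function names are illustrative, and because this listing does not say which archive each checksum belongs to, the expected value has to be supplied by the caller.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("MONDO.zip", "md5:...")` with the appropriate checksum from the listing should return `True` for an uncorrupted download.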