Published August 12, 2024 | Version v2
Dataset Open

OWL2VecOA Resources for Bio-ML 2023

  • 1. City, University of London

Contributors

  • 1. City, University of London

Description

1. The archive omim2ordo_exp_results.zip contains the results of applying our extended OWL2VecOA method to biomedical ontology alignment, specifically the alignment between OMIM and ORDO.

The parameter settings are: walk depth = 3, embedding size = 100, iteration = 70, walker iteration k = 20.

The alignment process integrated results from two well-established ontology matching systems, AML and LogMap. Specifically, the following input configurations were used:

  • Train.tsv (from the Bio-ML Track 2023) combined with the intersection of AML and LogMap alignments
  • Train.tsv combined with the union of AML and LogMap alignments
  • Train.tsv combined with LogMap alignments (Logmapping)
  • Train.tsv combined with LogMap alignments (Anchor Mappings)
  • Train.tsv combined with LogMap alignments (OverEstimation Mappings)
  • Train.tsv only
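The record does not describe how the training pairs and system alignments were merged, so the following is only a minimal sketch under the assumption that "combined" means a set union of (source, target) pairs. The function name `combine_seed_mappings` and the example identifiers are hypothetical and are not taken from the archives.

```python
from typing import Iterable, Set, Tuple

# A mapping is a (source entity, target entity) pair, e.g. (OMIM id, ORDO id).
Mapping = Tuple[str, str]


def combine_seed_mappings(
    train: Iterable[Mapping],
    aml: Iterable[Mapping],
    logmap: Iterable[Mapping],
    mode: str = "intersection",
) -> Set[Mapping]:
    """Merge training pairs with the intersection or union of two system outputs.

    Assumption: merging is a plain set union of pairs, with no confidence weighting.
    """
    if mode == "intersection":
        system_pairs = set(aml) & set(logmap)
    elif mode == "union":
        system_pairs = set(aml) | set(logmap)
    else:
        raise ValueError("mode must be 'intersection' or 'union'")
    return set(train) | system_pairs


# Hypothetical toy identifiers, for illustration only.
train = {("omim:1", "ordo:A")}
aml = {("omim:2", "ordo:B"), ("omim:3", "ordo:C")}
logmap = {("omim:2", "ordo:B"), ("omim:4", "ordo:D")}

seeds_intersection = combine_seed_mappings(train, aml, logmap, "intersection")
seeds_union = combine_seed_mappings(train, aml, logmap, "union")
```

The intersection setting keeps only pairs on which both systems agree, while the union setting keeps every pair proposed by either system.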

2. The archive "owl2vecstart_initres_2&3.zip" contains the results of applying the initial version of the OWL2VecStar method to the 2023 biomedical ontology alignment tasks OMIM-ORDO (o2o), NCIT-DOID (ncit2doid), SNOMED-NCIT-N (s2nn), and SNOMED-NCIT-PHARMA (sn2p), with walk depths of 2 and 3. Similarly, the archive "owl2vecstar_initres_4&5.zip" contains analogous results with walk depths of 4 and 5.

3. The archive "owl2vecOA_results_2&3.zip" contains the results of applying our extended OWL2VecOA method to the Bio-ML datasets with walk depths of 2 and 3.

Each results package contains three key components for each input configuration: an embedding file, a cosine similarity scores file, and a Euclidean distance scores file. The embedding files can be used for various downstream ML tasks, while the similarity and distance scores provide direct measures of entity relatedness, potentially useful for ontology alignment, entity matching, or other biomedical informatics applications.
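As one illustration of how similarity scores might feed into alignment, the sketch below keeps the highest-scoring target for each source entity. The layout of the score files is not documented in this record, so the (source, target, score) triple format, the `best_matches` helper, the threshold value, and the identifiers are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Assumed row layout: (source entity, target entity, cosine similarity score).
ScoredPair = Tuple[str, str, float]


def best_matches(
    scored_pairs: Iterable[ScoredPair], threshold: float = 0.0
) -> Dict[str, Tuple[str, float]]:
    """Return the top-scoring target per source, ignoring scores below threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for source, target, score in scored_pairs:
        if score < threshold:
            continue
        # Keep the candidate only if it beats the current best for this source.
        if source not in best or score > best[source][1]:
            best[source] = (target, score)
    return best


# Hypothetical toy scores, for illustration only.
scores = [
    ("omim:1", "ordo:A", 0.91),
    ("omim:1", "ordo:B", 0.42),
    ("omim:2", "ordo:C", 0.30),
]
selected = best_matches(scores, threshold=0.5)
```

With Euclidean distance scores the comparison would be reversed, since smaller distances indicate closer entities.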

Notes

  • Embedding file: This file contains vector representations of ontology entities (concepts, properties, and instances) from both OMIM and ORDO. These embeddings capture semantic information from the ontologies in a dense vector space.
  • Cosine similarity scores file: This file provides cosine similarity scores between the vector representations of source (OMIM) and target (ORDO) entities. Cosine similarity is a measure of similarity between two non-zero vectors, often used to compare the semantic relatedness of entities in vector space.
  • Euclidean distance scores file: This file contains Euclidean distance scores between the vector representations of source and target entities. Euclidean distance measures the straight-line distance between two points in the vector space and can be used as another metric for entity similarity.
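The two measures described above can be computed as in the following minimal sketch, which uses only the Python standard library. The function names and the toy vectors are illustrative; this is not the code used to produce the score files.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two points of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy 2-dimensional vectors; the embeddings in the archives have 100 dimensions.
orthogonal_sim = cosine_similarity([1.0, 0.0], [0.0, 1.0])
orthogonal_dist = euclidean_distance([1.0, 0.0], [0.0, 1.0])
identical_sim = cosine_similarity([3.0, 4.0], [3.0, 4.0])
identical_dist = euclidean_distance([3.0, 4.0], [3.0, 4.0])
```

Cosine similarity ignores vector length and ranges from -1 to 1, whereas Euclidean distance is sensitive to magnitude and is 0 only for identical vectors.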

Files

omim2ordo_exp_results.zip

Files (1.2 GB)

Checksum Size
md5:faf9920fbb75aa34780999c965afc736 210.3 MB
md5:69c83a04edd7c66191d6c439a3f7cea8 490.6 MB
md5:a68a8394287739af9a024a0a051a6785 250.3 MB
md5:837d3241261e359ada1f3625f28c0871 205.6 MB
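A downloaded archive can be checked against the MD5 values listed above. The sketch below uses Python's standard `hashlib` module; the helper names are hypothetical, and the demonstration hashes an in-memory byte string rather than one of the actual archives.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Hash a file on disk, e.g. a downloaded .zip archive."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches_checksum(actual_hex: str, listed: str) -> bool:
    """Compare a computed digest with a listed value such as 'md5:faf9...'."""
    expected_hex = listed.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected_hex


# Demonstration on in-memory bytes; replace with md5_of_file(<local path>) in practice.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
demo_ok = matches_checksum(demo_digest, "md5:5d41402abc4b2a76b9719d911017c592")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.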

Additional details

Dates

Created
2024-08-12