Journal article Open Access

ClinSpEn-OC Data: Parallel English-Spanish Ontology Concepts

Lima, Salvador; Johan, Darryl; Krallinger, Martin

UPDATE August 22nd 2022: The data in this repository has been merged with the rest of the ClinSpEn data and can be accessed here: https://doi.org/10.5281/zenodo.6497350


This repository contains the sample, test and background data for the ClinSpEn-Ontology Concepts sub-track. The direction of this sub-track is EN>ES.

ClinSpEn is part of the Biomedical WMT 2022 shared task. It aims to promote the development and evaluation of machine translation systems adapted to the medical domain through three highly relevant sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.

The concepts have been extracted from various open biomedical ontologies and taxonomies and then manually translated by a professional medical translator. Due to their origin, these concepts may pose challenges that terms extracted from free text do not, such as semi-structured concepts.

The sample data includes 400 concepts. The terms are presented as a tab-separated values (TSV) file, with the first column containing the English terms and the second column the Spanish terms. The third column gives each term's origin ontology and its corresponding ID, while the fourth includes a link to the concept in OBO Library.

The test and background data is provided as a TSV file with two columns: concept number and English concept.
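
The two file layouts described above could be read with a short Python sketch like the one below. It is only an illustration, not an official loader. It assumes UTF-8 text, no header row and no quoting in the files, none of which is stated in this description. The names `SampleConcept`, `SourceConcept`, `read_sample` and `read_test` are hypothetical, and the example row is invented for the demonstration rather than taken from the dataset.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class SampleConcept(NamedTuple):
    """One row of the four-column sample file."""
    english: str
    spanish: str
    origin: str    # origin ontology and concept ID, kept as one raw string
    obo_link: str  # link to the concept in OBO Library


class SourceConcept(NamedTuple):
    """One row of the two-column test/background file."""
    number: str
    english: str


def _rows(lines: Iterable[str], width: int) -> List[List[str]]:
    # QUOTE_NONE treats quote characters as ordinary text (an assumption
    # about the files); rows with an unexpected column count are skipped.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if len(row) == width]


def read_sample(lines: Iterable[str]) -> List[SampleConcept]:
    return [SampleConcept(*row) for row in _rows(lines, 4)]


def read_test(lines: Iterable[str]) -> List[SourceConcept]:
    return [SourceConcept(*row) for row in _rows(lines, 2)]


if __name__ == "__main__":
    # Invented row, for illustration only.
    demo = io.StringIO("heart\tcorazón\tEXAMPLE:0000001\thttp://example.org/concept\n")
    for concept in read_sample(demo):
        print(concept.english, "->", concept.spanish)
```

With the real files, the same functions would take an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points to wherever the downloaded TSV was saved. If the files turn out to have a header row, the first element of the returned list would need to be dropped.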

Related Links:

- Sub-track website with more information: https://temu.bsc.es/clinspen/

- WMT website: https://www.statmt.org/wmt22/

- CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696


- ClinSpEn-CC (Clinical Cases): https://doi.org/10.5281/zenodo.6497350

- ClinSpEn-CT (Clinical Terms): https://doi.org/10.5281/zenodo.6497372

- ClinSpEn-OC (Ontology Concepts): https://doi.org/10.5281/zenodo.6497388

