UPDATE August 22nd 2022: The data in this repository has been merged with the rest of the ClinSpEn data. You may access it here: https://doi.org/10.5281/zenodo.6497350
This repository contains the sample, test and background data for the ClinSpEn-Ontology Concepts sub-track. The direction of this sub-track is EN>ES.
ClinSpEn is part of the Biomedical WMT 2022 shared task. It aims to promote the development and evaluation of machine translation systems adapted to the medical domain through three sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.
The concepts have been extracted from various open biomedical ontologies and taxonomies and then manually translated by a professional medical translator. Due to their origin, these concepts may present challenges that terms extracted from free text do not, such as semi-structured concepts.
The sample data includes 400 concepts. The terms are presented as a tab-separated values (TSV) file, with the first column corresponding to English terms and the second column to Spanish terms. The third column includes the term's origin ontology and its corresponding ID, while the fourth one includes a link to the concept in the OBO Library.
The test and background data consist of a TSV file with two columns: concept number and English concept.
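The two file layouts described above could be read along the following lines. This is a minimal sketch, not an official loader: the column labels are illustrative names chosen here, the rows are made-up placeholders rather than entries from the dataset, and it assumes the files have no header row.

```python
import csv
import io

# Illustrative column labels (an assumption; not headers taken from the released files).
SAMPLE_COLUMNS = ["english", "spanish", "ontology_and_id", "obo_link"]
TEST_COLUMNS = ["concept_number", "english"]


def read_concepts(handle, columns):
    """Parse tab-separated rows into dicts keyed by the given column labels."""
    # QUOTE_NONE keeps any quotation marks inside a term as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} columns, got {len(fields)}: {fields!r}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


# Placeholder rows that only mimic the described layout; the ID and link are invented.
sample_text = "heart\tcorazón\tEXAMPLE:0000001\thttps://example.org/EXAMPLE_0000001\n"
test_text = "1\tkidney\n2\tliver\n"

sample_rows = read_concepts(io.StringIO(sample_text), SAMPLE_COLUMNS)
test_rows = read_concepts(io.StringIO(test_text), TEST_COLUMNS)

print(sample_rows[0]["english"], "->", sample_rows[0]["spanish"])
print(len(test_rows), "test concepts")
```

For a downloaded file, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is whatever name the file has locally. If the released files do start with a header row, the first parsed row should be dropped.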
- Sub-track website with more information: https://temu.bsc.es/clinspen/
- WMT website: https://www.statmt.org/wmt22/
- ClinSpEn-CC (Clinical Cases): https://doi.org/10.5281/zenodo.6497350
- ClinSpEn-CT (Clinical Terms): https://doi.org/10.5281/zenodo.6497372
- ClinSpEn-OC (Ontology Concepts): https://doi.org/10.5281/zenodo.6497388