Published August 1, 2022 | Version v1
Dataset Open

ClinSpEn-CC (Clinical Cases) Test + Background Set

  • 1. Barcelona Supercomputing Center

Description

This repository contains the test and background data for the ClinSpEn-Clinical Cases sub-track. ClinSpEn is part of the Biomedical WMT 2022 shared task and aims to promote the development and evaluation of machine translation systems adapted to the medical domain. It comprises three highly relevant sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.

The data consists of a single TSV file with three columns: document number, line number, and English line. The translation direction of this sub-track is English to Spanish (EN>ES). The clinical cases include COVID-19 case reports as well as diverse content extracted from PubMed.
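As a rough illustration of the three-column layout described above, the following Python sketch groups the English lines by document number. It makes several assumptions not stated on this page: that the columns are tab-separated without quoting, that line numbers are integers, and that a header row, if present, can be recognised by a non-numeric second column. The function name `read_clinical_cases` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_clinical_cases(stream):
    """Group English lines by document number, ordered by line number.

    Assumes three tab-separated columns: document number, line number,
    English line. Rows that do not fit this shape are skipped.
    """
    docs = defaultdict(list)
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # blank or malformed row
        doc_id, line_no, text = row
        if not line_no.strip().isdigit():
            continue  # probably a header row (assumption)
        docs[doc_id].append((int(line_no), text))
    return {
        doc_id: [text for _, text in sorted(lines, key=lambda pair: pair[0])]
        for doc_id, lines in docs.items()
    }


# Made-up rows for illustration only; they are not taken from the dataset.
sample = (
    "1\t2\tChest X-ray was normal.\n"
    "1\t1\tA 45-year-old man presented with fever.\n"
    "2\t1\tA 60-year-old woman reported cough.\n"
)
cases = read_clinical_cases(io.StringIO(sample))
```

To read the downloaded file instead of the in-memory sample, the same function can be given a file object opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the TSV was saved. The UTF-8 encoding is also an assumption.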

 

Related Links:

- Data website with more information: https://temu.bsc.es/clinspen/

- WMT website (includes schedule, registration, ...): https://www.statmt.org/wmt22/

- CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696

 

ClinSpEn SAMPLE SETS:

- ClinSpEn-CC Sample Set (Clinical Cases): https://doi.org/10.5281/zenodo.6497350

- ClinSpEn-CT Sample Set (Clinical Terms): https://doi.org/10.5281/zenodo.6497372

- ClinSpEn-OC Sample Set (Ontology Concepts): https://doi.org/10.5281/zenodo.6497388

ClinSpEn TEST SETS:

- ClinSpEn-CC Test Set (Clinical Cases): https://doi.org/10.5281/zenodo.6948634

- ClinSpEn-CT Test Set (Clinical Terms): https://doi.org/10.5281/zenodo.6948669

- ClinSpEn-OC Test Set (Ontology Concepts): https://doi.org/10.5281/zenodo.6948679

 

Files

Files (12.9 MB)

md5:ad56ba4ad2fefd084095727062897d0f (12.9 MB)
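
The MD5 checksum listed above can be used to check that a download completed intact. The sketch below is one way to do this in Python with the standard `hashlib` module. The helper names `md5_of_file` and `matches_published_checksum` are hypothetical, and the local path is whatever name the file was saved under, since the file name is not shown on this page.

```python
import hashlib

# Checksum as published in the Files section of this record.
EXPECTED_MD5 = "ad56ba4ad2fefd084095727062897d0f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Compare a local file's MD5 digest with the published value."""
    return md5_of_file(path) == EXPECTED_MD5
```

A mismatch usually points to an incomplete or corrupted download. MD5 is suitable here as an integrity check only, not as a security guarantee.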