Published October 13, 2022 | Version v1
Dataset | Open Access

Biomedical Entities and Relations on Spanish Clinical Case Corpus: BERSCCC

  • 1. Ontology Engineering Group, Universidad Politécnica de Madrid

Description

This first version of a Spanish corpus contains 200 clinical reports annotated with biomedical entities and semantic relations.
These reports are taken from the Spanish Clinical Case Corpus (SPACCC) (https://doi.org/10.5281/zenodo.2560316),
and each of them was annotated by three people working in the medical, biomolecular or pharmaceutical fields.

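The annotators had to identify the following thirteen entity types in Spanish text (labels kept in Spanish, English glosses in parentheses): Enfermedad/Síndrome (disease/syndrome), Gen (gene), Parte del cuerpo/Órgano (body part/organ), Glúcido (carbohydrate), Procedimiento de Diagnóstico (diagnostic procedure), Proteína (protein), Procedimiento Terapeútico (therapeutic procedure), Síntoma/Signo (symptom/sign), Sustancia Farmacológica (pharmacological substance), Lípido (lipid), Organismo (organism), Químico Orgánico (organic chemical) and Abreviatura/Sigla/Alias (abbreviation/acronym/alias).
They also had to identify the following eight semantic relations: Analiza (analyzes), Altera (alters), Causa (causes), Diagnostica (diagnoses), Manifestación de (manifestation of), Produce (produces), Trata (treats) and Refiere a (refers to).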

In total, 6,636 biomedical entities (37,081 mentions) and 4,864 semantic relations (7,622 mentions) were identified.
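As a reading aid, these counts imply an average number of mentions per annotated item. This assumes that the first figure in each pair counts distinct entities or relations and the parenthesized figure counts their occurrences in the reports:

```latex
\frac{37{,}081 \text{ entity mentions}}{6{,}636 \text{ entities}} \approx 5.59,
\qquad
\frac{7{,}622 \text{ relation mentions}}{4{,}864 \text{ relations}} \approx 1.57
```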

These resources are freely distributed under a Creative Commons Attribution 4.0 International License.
The scripts used to create this corpus can be found at: https://github.com/drugs4covid/bio-corpora

Author: Lucía Sánchez González, Ontology Engineering Group, Universidad Politécnica de Madrid.

Supervisors: 

- Carlos Badenes Olmedo, Ontology Engineering Group, Universidad Politécnica de Madrid.

- María Poveda Villalón, Ontology Engineering Group, Universidad Politécnica de Madrid.

Project Member: Óscar Corcho García, Ontology Engineering Group, Universidad Politécnica de Madrid.

Contact:

Lucía Sánchez González at lu.sanchez@alumnos.upm.es or lusangonz99@gmail.com

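Acknowledgments to Project DRUGS4COVID++: Servicios de Inteligencia Artificial para la creación de un grafo de conocimientos sobre fármacos usados en el control clínico de la enfermedad, a partir de la explotación de grandes corpus de documentación científica sobre SARS-COV-2 y COVID-19 (Artificial intelligence services for building a knowledge graph of drugs used in the clinical management of the disease, by exploiting large corpora of scientific literature on SARS-CoV-2 and COVID-19), under the call AYUDAS FUNDACIÓN BBVA A EQUIPOS DE INVESTIGACIÓN CIENTÍFICA SARS-CoV-2 y COVID-19 (BBVA Foundation grants for scientific research teams on SARS-CoV-2 and COVID-19).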


Files

BERSCCC.zip (42.3 MB)
md5:97e99824c676647aed323100c3bfec7d
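The record lists an MD5 checksum for the archive. Below is a minimal Python sketch of checking a downloaded copy against that checksum and then listing the archive's contents. It assumes the file was saved as `BERSCCC.zip` in the working directory; the helper names `md5sum` and `verify_and_list` are hypothetical and are not part of the corpus scripts. It makes no assumption about the format of the annotation files inside the archive.

```python
import hashlib
import zipfile
from pathlib import Path

# Checksum published on the record page for BERSCCC.zip.
EXPECTED_MD5 = "97e99824c676647aed323100c3bfec7d"


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_list(path: Path, expected_md5: str = EXPECTED_MD5) -> list[str]:
    """Check the archive against the expected MD5 and return its member names."""
    actual = md5sum(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: got {actual}, expected {expected_md5}")
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


if __name__ == "__main__":
    archive = Path("BERSCCC.zip")  # hypothetical download location
    if archive.exists():
        for name in verify_and_list(archive):
            print(name)
    else:
        print(f"{archive} not found; download it from the record page first.")
```

Reading the file in chunks keeps memory use constant, which matters little for a 42.3 MB archive but makes the helper safe to reuse on larger downloads.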