NUT ALLERGY CORPUS
Creators
- 1. Computer Science and Engineering Department
Contributors
Contact person:
Data collectors:
Data managers:
Description
The first corpus of clinical notes on allergies in Spanish: a collection of 828 texts drawn from the clinical notes of 197 patients who visited the Allergy Unit and Emergency Department of the Hospital Universitario Fundación Alcorcón.
The collection contains a total of 70,272 words and 3,938 sentences, with an average of 85 words and five sentences per note. The longest text has 533 words, and the maximum number of sentences in a single text is 50. The notes contain medical terms that are difficult for readers without medical training to understand. The structure of a clinical note depends on the template used to collect the patient information. The template types are anamnesis, personal and family history, physical examination, medical evolution, diagnostic tests, summary of the situation, diagnosis, medical treatment, and recommendations.
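As a quick consistency check, the per-note averages can be recomputed from the totals reported above. This is only a minimal sketch: the constant and function names are illustrative and are not part of the corpus distribution.

```python
# Totals as reported in the corpus description.
NUM_NOTES = 828
NUM_WORDS = 70_272
NUM_SENTENCES = 3_938


def per_note_average(total: int, notes: int = NUM_NOTES) -> float:
    """Return the mean count per clinical note."""
    return total / notes


avg_words = per_note_average(NUM_WORDS)
avg_sentences = per_note_average(NUM_SENTENCES)

# Rounded to whole numbers, these match the averages quoted in the description.
print(f"words per note: {avg_words:.1f} (rounded: {round(avg_words)})")
print(f"sentences per note: {avg_sentences:.1f} (rounded: {round(avg_sentences)})")
```

The unrounded values are about 84.9 words and 4.8 sentences per note, which round to the 85 and five stated in the description.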
The texts are written in an informal clinical style and contain typos, abbreviations, and incomplete sentences. Some clinical notes include the results of analyses or skin tests performed on the patient. The corpus also contains spelling errors, tokenization errors, and words that should not have been anonymised.
This corpus was built for research and educational purposes.