ToxHabits-NER: A Gold-Standard Annotated Dataset for Named Entity Recognition in Toxic Habits Context
- 1. Barcelona Supercomputing Center
Description
ToxHabits-NER stands for Toxic Habits Named Entity Recognition. It is a gold-standard annotated dataset for detecting and classifying mentions of toxic habits (such as tobacco use, alcohol consumption, and illicit drug use) in clinical texts written in Spanish.
This repository includes the corpus' train and test sets in multiple formats. The annotations cover mentions of toxic habits, associated behaviors, and contextual cues relevant for clinical and epidemiological studies. For more information, please check the attached README file.
ToxHabits-NER was developed by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis team. It is intended as a resource to advance research on toxic habit extraction and normalization tasks from medical documents.
This dataset is part of the ToxHabits Shared Task of BioCreative IX.
For more information on the corpus, annotation scheme, and usage instructions, please visit the Zenodo landing page: https://doi.org/10.5281/zenodo.15304758
Please cite if you use this resource:
Salvador Lima-López, Wesam Alnabki, Gabriel Vayá-Abad, and Martin Krallinger.
ToxHabits-NER: A Gold-Standard Annotated Dataset for Named Entity Recognition in Toxic Habits Context.
Zenodo. 2025.
DOI: 10.5281/zenodo.15304758.
@misc{toxhabitsner,
title={ToxHabits-NER: A Gold-Standard Annotated Dataset for Named Entity Recognition in Toxic Habits Context},
author={Lima-López, Salvador and Alnabki, Wesam and Vayá-Abad, Gabriel and Krallinger, Martin},
year={2025},
publisher={Zenodo},
doi={10.5281/zenodo.15304758},
url={https://doi.org/10.5281/zenodo.15304758}
}
Related Links:
- Annotation Guidelines (Spanish): https://doi.org/10.5281/zenodo.15753952
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contact
If you have any questions or suggestions, please contact us at:
- Wesam Alnabki (<wesam [dot] alnabki [dot] bsc [at] gmail [dot] com>)
- Gabriel Vaya (<gvaya [dot] bsc [at] gmail [dot] com>)