Published May 28, 2025 | Version v5
Dataset Open

ToxHabits-NER: A Gold-Standard Annotated Dataset for Named Entity Recognition in Toxic Habits Context

  • 1. Barcelona Supercomputing Center

Description

ToxHabits-NER stands for Toxic Habits Named Entity Recognition. It is a gold-standard annotated dataset focused on the detection and classification of toxic habit mentions — such as tobacco use, alcohol consumption, and illicit drug use — within clinical texts written in Spanish.

This repository includes the corpus' train and test sets in multiple formats. The annotations cover mentions of toxic habits, associated behaviors, and contextual cues relevant for clinical and epidemiological studies. For more information, please check the attached README file.
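As a rough illustration of reading the annotations, the sketch below parses entity lines in brat standoff (.ann) format. This is an assumption: the archive name ToxHabits(ToxNER)_Test_ANNFiles.zip suggests .ann files, but the README is the authoritative description of the formats. The function name `parse_ann`, the `Entity` type, and the label and text in the usage note are hypothetical and are not taken from the corpus.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str                    # e.g. "T1"
    label: str                     # entity type as written in the file
    spans: List[Tuple[int, int]]   # one (start, end) pair per fragment
    text: str                      # the annotated surface text


def parse_ann(content: str) -> List[Entity]:
    """Parse text-bound ("T") lines of a brat standoff file.

    Assumes the usual layout: ID <TAB> LABEL START END[;START END...] <TAB> TEXT.
    Other line types (notes, relations, attributes) are skipped.
    """
    entities = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous mentions separate their fragments with ";".
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, text))
    return entities
```

For example, `parse_ann("T1\tLABEL 9 16\tfumador")` yields one entity whose single span is `(9, 16)`; `LABEL` stands in for whichever entity types the corpus actually defines.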

ToxHabits-NER was developed by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis team. It is intended as a resource to advance research on toxic habit extraction and normalization tasks from medical documents.

This dataset is part of the ToxHabits Shared Task of BioCreative IX. 

For more information on the corpus, annotation scheme, and usage instructions, please see the attached README file and the annotation guidelines listed under Related Links.

Please cite if you use this resource:

Salvador Lima-López, Wesam Alnabki, Gabriel Vayá-Abad, and Martin Krallinger.
ToxHabits-NER: A Gold-Standard Annotated Dataset for Named Entity Recognition in Toxic Habits Context.
Zenodo. 2025.
DOI: 10.5281/zenodo.15304758.

@misc{toxhabitsner,
    title={ToxHabits-NER: A Gold-Standard Annotated Dataset for Named Entity Recognition in Toxic Habits Context},
    author={Lima-López, Salvador and Alnabki, Wesam and Vayá-Abad, Gabriel and Krallinger, Martin},
    year={2025},
    publisher={Zenodo},
    doi={10.5281/zenodo.15304758},
    url={https://doi.org/10.5281/zenodo.15304758}
}

Related Links:

  • Annotation Guidelines (Spanish): https://doi.org/10.5281/zenodo.15753952

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contact

If you have any questions or suggestions, please contact us at:

- Wesam Alnabki (<wesam [dot] alnabki [dot] bsc [at] gmail [dot] com>)

- Gabriel Vaya (<gvaya [dot] bsc [at] gmail [dot] com>)

Files

ToxHabits(ToxNER)_Test_ANNFiles.zip

Files (8.6 MB)

Checksum                                Size
md5:cfbc2a1efa2a1918530743f55667cd9d    657.4 kB
md5:9ccddc4f13451a38ecd6432d3bcfd7a6    3.6 MB
md5:768ae9cd4dbb9c12e1a717257f25e15d    657.4 kB
md5:9fdf40012a0a46ac2ec3ca17822e901c    3.7 MB
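The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. Because the listing does not pair checksums with file names, it only checks whether each archive's digest is among the published values; the helper names `md5_of_file` and `check_downloads` and the assumption that the downloads are .zip files in one directory are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 values as published in the file listing on this page.
PUBLISHED_MD5 = {
    "cfbc2a1efa2a1918530743f55667cd9d",
    "9ccddc4f13451a38ecd6432d3bcfd7a6",
    "768ae9cd4dbb9c12e1a717257f25e15d",
    "9fdf40012a0a46ac2ec3ca17822e901c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory):
    """Map each .zip file name in `directory` to True if its MD5 is a published one."""
    return {
        path.name: md5_of_file(path) in PUBLISHED_MD5
        for path in sorted(Path(directory).glob("*.zip"))
    }
```

Calling `check_downloads(".")` in the download folder returns a dictionary of file names and booleans; a `False` entry suggests a corrupted or incomplete download, or a file from a different version of the record.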