Published April 24, 2025 | Version v1
Dataset Restricted

MentalRiskES corpus

Description

MentalRiskES is a new Spanish-language dataset on mental disorders. It is divided into three subsets, one per disorder:

  • Eating Disorder
  • Depression
  • Anxiety

Each subset contains a set of subjects and, for each subject, their message thread in a Telegram chat.

How is it constructed?
We accessed public groups on the Telegram social network and extracted conversations from them. The data was processed to keep only text messages, excluding images, audio, etc. For annotation, a subset of messages was extracted from each subject. Each message thread was annotated by 10 different annotators, recruited through the Prolific platform, using the Doccano annotation tool.

In this way, we associated a user ID with some tags that emerged after averaging the annotators' decisions. The labels available for each set are:

  • Eating Disorder: suffer (s), control (c)
  • Depression: suffer + in favour (sf), suffer + against (sa), suffer + other (so), control (c)
  • Anxiety: suffer (s), control (c)

Labels
The values available in Anxiety files are:

  • bs (binary suffer): 1 if the subject suffers and 0 if not, according to the frequency of the labels (in case of a tie the subject is marked as suffering)
  • bc (binary control): 1 if the subject does not suffer and 0 if they do, according to the frequency of the labels (in case of a tie the subject is marked as suffering, so bc is 0)
  • rbs (regression binary suffer): proportion of the 10 annotators who marked the subject as suffering
  • rbc (regression binary control): proportion of the 10 annotators who marked the subject as not suffering
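
The aggregation described above can be illustrated with a minimal sketch. It assumes each subject comes with a list of per-annotator votes encoded as 's' (suffer) or 'c' (control), and that 'rbs' and 'rbc' are proportions of the annotators. The function name `aggregate_binary` and the vote encoding are hypothetical and not part of the released files.

```python
from collections import Counter


def aggregate_binary(votes):
    """Aggregate per-annotator votes ('s' = suffer, 'c' = control) for one subject.

    Hypothetical helper: the input encoding is an assumption for illustration.
    """
    n = len(votes)
    if n == 0:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    suffer, control = counts["s"], counts["c"]
    # A tie is resolved in favour of 'suffer', as stated in the label description.
    bs = 1 if suffer >= control else 0
    return {
        "bs": bs,
        "bc": 1 - bs,
        "rbs": suffer / n,   # proportion of annotators who chose 'suffer'
        "rbc": control / n,  # proportion of annotators who chose 'control'
    }
```

For example, ten votes split five to five would give bs = 1 and bc = 0 under the tie rule, with rbs and rbc both equal to 0.5.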

The values available in the Depression and Eating Disorders files are:

  • bs (binary suffer): 1 if the subject suffers and 0 if not, according to the frequency of the labels (in case of a tie the subject is marked as suffering)
  • bsf (binary suffer favour): 1 if the subject suffers and is in favour, and 0 if not, according to the frequency of the labels
  • bsa (binary suffer against): 1 if the subject suffers and is against, and 0 if not, according to the frequency of the labels
  • bso (binary suffer other): 1 if the subject suffers and is neither in favour nor against, and 0 if not, according to the frequency of the labels
  • bc (binary control): 1 if the subject does not suffer and 0 if they do, according to the frequency of the labels (in case of a tie the subject is marked as suffering, so bc is 0)
  • rbs (regression binary suffer): proportion of the 10 annotators who marked the subject as suffering
  • rbc (regression binary control): proportion of the 10 annotators who marked the subject as not suffering
  • rsf (regression suffer favour): proportion of the 10 annotators who marked the subject as suffering and in favour
  • rsa (regression suffer against): proportion of the 10 annotators who marked the subject as suffering and against
  • rso (regression suffer other): proportion of the 10 annotators who marked the subject as suffering and neither in favour nor against
  • rc (regression control): proportion of the 10 annotators who marked the subject as not suffering (note that it is equal to 'rbc')

Consequently, the labels 'rbs' and 'rbc' must sum to 1, and the labels 'rsf', 'rsa', 'rso' and 'rc' must also sum to 1.
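
The sum constraints can be used as a sanity check when loading the label files. The following is a minimal sketch, assuming each subject's labels have already been read into a dictionary keyed by the label names listed above. The function name `check_label_sums` and the dictionary layout are hypothetical; the actual file format is not described here.

```python
import math


def check_label_sums(row, tol=1e-9):
    """Return True if one subject's regression labels satisfy the stated constraints.

    Hypothetical helper: `row` is assumed to map label names to float values.
    """
    # 'rbs' and 'rbc' must sum to 1.
    binary_ok = math.isclose(row["rbs"] + row["rbc"], 1.0, abs_tol=tol)
    # 'rsf', 'rsa', 'rso' and 'rc' must sum to 1.
    multi_ok = math.isclose(
        row["rsf"] + row["rsa"] + row["rso"] + row["rc"], 1.0, abs_tol=tol
    )
    # 'rc' is stated to be equal to 'rbc'.
    control_ok = math.isclose(row["rc"], row["rbc"], abs_tol=tol)
    return binary_ok and multi_ok and control_ok
```

A tolerance is used instead of exact equality because sums of decimal fractions such as 0.4 + 0.2 + 0.1 + 0.3 are not always exactly 1 in floating-point arithmetic.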

Preprocessing
The corpus is provided in two versions that differ only in how emojis are represented: the 'processed' folder contains the corpus with emojis in text format, while the 'raw' folder contains the corpus with emojis in their original format.

MentalRiskES evaluation campaign
MentalRiskES is a shared task organized at IberLEF. The aim of this task is to promote the early detection of mental disorder risk in Spanish. The task used the MentalRiskES corpus; the partitions used are available in MentalRiskES2023edition.zip, provided in the GitHub repository (https://github.com/sinai-uja/corpusMentalRiskES). To cite the task: Mármol-Romero, A. M., Moreno-Muñoz, A., Plaza-del-Arco, F. M., Molina-González, M. D., Martín-Valdivia, M. T., Ureña-López, L. A., & Montejo-Ráez, A. (2023). Overview of MentalriskES at IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish. Procesamiento del Lenguaje Natural, 71, 329-350.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Additional titles

Subtitle (English)
Corpus for early detection of mental disorders in Spanish

Related works

Is supplement to
Dataset: 10.5281/zenodo.8055604 (DOI)

References

  • Mármol-Romero, A. M., Moreno-Muñoz, A., Plaza-del-Arco, F. M., Molina-González, M. D., Martín-Valdivia, M. T., Ureña-López, L. A., & Montejo-Ráez, A. (2024). MentalRiskES: A new corpus for early detection of mental disorders in Spanish. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 11204–11214). ELRA and ICCL.