Dataset of suicidal ideation texts in Brazilian Portuguese - Boamente System
Description
We obtained non-clinical texts from tweets (user posts on the online social network Twitter). To find suicide-related tweets, we used the Twitter API to download tweets with customized queries based on search terms associated with suicide. After several experiments to retrieve relevant texts, 5699 tweets were collected in May 2021. Each downloaded tweet carried user-specific information (for example, user ID, timestamp, language, location, and number of likes), but we kept only the post content (suicide-related texts) and discarded the additional data. All texts were therefore anonymized.
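The step of keeping only the post content can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout and field names (`user_id`, `timestamp`, `lang`, `likes`, `text`) are hypothetical and do not reproduce the Twitter API schema.

```python
# Hypothetical raw records; field names are illustrative assumptions,
# not the actual Twitter API response format.
raw_tweets = [
    {"user_id": "u1", "timestamp": "2021-05-03T10:00:00Z",
     "lang": "pt", "likes": 2, "text": "exemplo de texto 1"},
    {"user_id": "u2", "timestamp": "2021-05-04T11:30:00Z",
     "lang": "pt", "likes": 0, "text": "exemplo de texto 2"},
]


def keep_text_only(records, text_field="text"):
    """Return only the post content, dropping every other field."""
    return [record[text_field] for record in records]


texts = keep_text_only(raw_tweets)
```

After this step no user-specific field remains alongside the text, which is the sense in which the description calls the texts anonymized.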
After data collection, three psychologists were invited to annotate the data, each labeling every tweet individually. To avoid bias in the annotation process, we selected psychologists with different psychological approaches, namely cognitive behavioral theory, psychoanalytic theory, and humanistic theory. Each professional classified every tweet as negative for suicidal ideation (annotated as 0) or positive for suicidal ideation (annotated as 1).
All tweets with at least one divergence between psychologists (n = 1513) were excluded, resulting in a dataset with 4186 instances. Then, 398 duplicate tweets were removed, leaving 3788 instances. The final dataset consists of 2691 instances labeled negative and 1097 labeled positive.
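The filtering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the toy rows are invented, and treating duplicates as exact text matches is an assumption, since the description does not say how duplicates were identified.

```python
def filter_unanimous(rows):
    """Keep only rows where all annotators gave the same label.

    Each row is (text, labels); returns (text, label) pairs.
    """
    kept = []
    for text, labels in rows:
        if len(set(labels)) == 1:
            kept.append((text, labels[0]))
    return kept


def drop_duplicates(pairs):
    """Remove repeated texts, keeping the first occurrence.

    Exact string equality is an assumption made for this sketch.
    """
    seen = set()
    unique = []
    for text, label in pairs:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique


# Invented toy rows: (text, labels from three annotators).
rows = [
    ("texto a", (1, 1, 1)),
    ("texto b", (0, 1, 0)),  # divergence between annotators: excluded
    ("texto c", (0, 0, 0)),
    ("texto a", (1, 1, 1)),  # duplicate text: removed
]
final = drop_duplicates(filter_unanimous(rows))

# Consistency check on the counts reported in the description.
collected, divergent, duplicates = 5699, 1513, 398
after_agreement = collected - divergent        # 4186
after_dedup = after_agreement - duplicates     # 3788
negatives, positives = 2691, 1097
counts_consistent = (after_dedup == negatives + positives)
```

The closing check confirms that the reported counts add up: 5699 - 1513 = 4186, and 4186 - 398 = 3788 = 2691 + 1097.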
Files
Name | Size |
---|---|
boamente_dataset.csv (md5:222acfd0582d89ad85a4005c62370e18) | 349.6 kB |
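A downloaded copy can be checked against the MD5 checksum listed for the file. The sketch below uses only the Python standard library; the local path `boamente_dataset.csv` is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for boamente_dataset.csv.
EXPECTED_MD5 = "222acfd0582d89ad85a4005c62370e18"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected


if __name__ == "__main__":
    local_copy = Path("boamente_dataset.csv")  # assumed download location
    if local_copy.exists():
        print("checksum ok" if verify_download(local_copy) else "checksum mismatch")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.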
Additional details
Related works
- Is cited by
- Conference proceeding: 10.1016/j.procs.2022.09.093 (DOI)
References
- Diniz, Evandro J. S., José E. Fontenele, Adonias C. de Oliveira, Victor H. Bastos, Silmar Teixeira, Ricardo L. Rabêlo, Dario B. Calçada, Renato M. dos Santos, Ana K. de Oliveira, and Ariel S. Teles. 2022. "Boamente: A Natural Language Processing-Based Digital Phenotyping Tool for Smart Monitoring of Suicidal Ideation." Healthcare 10, no. 4: 698. https://doi.org/10.3390/healthcare10040698