Published January 15, 2026 | Version v1
Dataset | Open access

TWEDDIT: A Dataset of Triggering Experiences Predominantly Shared by Women on Reddit

Description

https://doi.org/10.48550/arXiv.2601.11819

TWEDDIT consists of 5,000 Reddit posts collected from approximately 22 support-oriented subreddits covering mental health, reproductive health, interpersonal relationships, and trauma. Source communities include abortion, BabyBumps, Miscarriage, OBGYN, WomensHealth, depression, PTSD, CPTSD, therapy, metoo, harassment, assault, relationships, relationship_advice, Parenting, Pregnant, raisedbynarcissists, offmychest, TrueOffMyChest, and AmITheAsshole, among others.

Each instance contains the post title, the post body, and a set of post-level trigger-warning annotations (Tags). The tags flag potentially sensitive content using categories such as Abuse, Aggression, Discrimination, Medical, Mental Health, Pregnancy, and Sexual, plus a not-applicable category.
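A minimal sketch for loading the file and counting how many posts carry each tag, using only the Python standard library. The column name `Tags` and the comma-separated tag format are assumptions: the description names a Tags field but not how multiple tags are encoded. Inspect the header row and a few records of TWeddit.csv, then adjust `TAGS_COLUMN` and `TAG_DELIMITER` before relying on the counts.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# ASSUMPTION: hypothetical column name and delimiter; check the header row
# of TWeddit.csv and how multiple tags are stored, then adjust these.
TAGS_COLUMN = "Tags"
TAG_DELIMITER = ","


def parse_tags(raw: str) -> list[str]:
    """Split one post's tag string into clean, non-empty category names."""
    return [t.strip() for t in raw.split(TAG_DELIMITER) if t.strip()]


def tag_counts(handle: TextIO) -> Counter:
    """Count how many posts carry each tag (one count per post per tag)."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        # set() so a tag repeated within a single post is counted once
        counts.update(set(parse_tags(row[TAGS_COLUMN])))
    return counts


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file, in the assumed format.
    sample = io.StringIO(
        "Title,Body,Tags\n"
        '"t1","b1","Abuse, Mental Health"\n'
        '"t2","b2","Pregnancy, Medical"\n'
        '"t3","b3","Mental Health"\n'
    )
    print(tag_counts(sample).most_common())
```

For the real file, open it with `open("TWeddit.csv", newline="", encoding="utf-8")` (the UTF-8 encoding is also an assumption) and pass the handle to `tag_counts`. Counting each tag at most once per post matches the post-level granularity of the annotations.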

The dataset is designed to support research on automatic trigger-warning detection, sensitive-content moderation, and computational social science analysis of self-disclosure and support-seeking behavior on social media. A single data structure and annotation schema, shared by every record, fixes the expected fields and label conventions across the corpus.
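Framed as a learning task, trigger-warning detection is multi-label: a single post can carry several tags at once. The sketch below turns one post's tag list into a 0/1 target vector for a multi-label classifier. The label vocabulary, including the spelling `Not Applicable`, is a hypothetical reading of the categories named in the description, not the file's confirmed label set.

```python
# ASSUMPTION: hypothetical vocabulary built from the categories named in the
# description; the released file may use other spellings or extra labels.
LABELS: tuple[str, ...] = (
    "Abuse", "Aggression", "Discrimination", "Medical",
    "Mental Health", "Pregnancy", "Sexual", "Not Applicable",
)


def multi_hot(tags: list[str], labels: tuple[str, ...] = LABELS) -> list[int]:
    """Encode one post's tags as a 0/1 vector aligned with `labels`."""
    present = set(tags)
    unknown = present - set(labels)
    if unknown:
        # Fail loudly instead of silently dropping unrecognised tags.
        raise ValueError(f"tags outside the label vocabulary: {sorted(unknown)}")
    return [1 if label in present else 0 for label in labels]


if __name__ == "__main__":
    print(multi_hot(["Abuse", "Mental Health"]))  # [1, 0, 0, 0, 1, 0, 0, 0]
```

Raising on unknown tags is a deliberate choice: a spelling mismatch between this vocabulary and the released file surfaces immediately instead of producing misleading all-zero rows.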

Important Note: For proper attribution, researchers who use this dataset in their work are kindly requested to cite the following paper describing the dataset and its analyses:

Bandela, S. R., Parthasarathy, S., & Garg, V. (2026). TWeddit: A Dataset of Triggering Stories Predominantly Shared by Women on Reddit. arXiv preprint arXiv:2601.11819. DOI: https://doi.org/10.48550/arXiv.2601.11819


Files (9.0 MB)

Name | Size | Checksum
TWeddit.csv | 9.0 MB | md5:5898753a1f5c43ae0b4dde13a19002e2
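After downloading, the local copy of TWeddit.csv can be checked against the published md5 checksum. A minimal Python sketch follows; it reads the file in 1 MiB chunks so memory use stays flat, and the function names `md5_of` and `verify` are illustrative, not part of the dataset release.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# md5 checksum published for TWeddit.csv on the dataset page.
EXPECTED_MD5 = "5898753a1f5c43ae0b4dde13a19002e2"


def md5_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches `expected` (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

For example, `verify("TWeddit.csv")` returns `True` only if the local file matches the published digest. On Python 3.11 and later, `hashlib.file_digest(f, "md5")` performs the same chunked read on a binary file object.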