TWEDDIT: A Dataset of Triggering Experiences Predominantly Shared by Women on Reddit
Creators
Description
TWEDDIT consists of 5,000 Reddit posts collected from approximately 22 support-oriented subreddits spanning mental health, reproductive health, interpersonal relationships, and trauma-related discussions. The dataset includes content from communities such as abortion, BabyBumps, Miscarriage, OBGYN, WomensHealth, depression, PTSD, CPTSD, therapy, metoo, harassment, assault, relationships, relationship_advice, Parenting, Pregnant, raisedbynarcissists, offmychest, TrueOffMyChest, AmITheAsshole, and related forums.
Each instance contains the post title, the post body, and a set of post-level trigger-warning annotations (Tags). The tags mark the presence of potentially sensitive content in categories such as Abuse, Aggression, Discrimination, Medical, Mental Health, Pregnancy, and Sexual, with a not-applicable label for posts to which no category applies.
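As an illustration of how the post-level tags could be prepared for multi-label classification, the sketch below converts a Tags cell into a 0/1 vector over the categories named above. It is a minimal sketch, not the authors' pipeline: the column names (`title`, `body`, `Tags`), the comma separator, the `N/A` spelling of the not-applicable label, and the sample rows are all assumptions made for illustration and should be checked against the header of TWeddit.csv.

```python
import csv
import io

# Categories as named in the description; spellings in the file may differ.
TAGS = ["Abuse", "Aggression", "Discrimination", "Medical",
        "Mental Health", "Pregnancy", "Sexual"]


def parse_tags(cell, sep=","):
    """Split a raw Tags cell into cleaned labels (the separator is an assumption)."""
    return [t.strip() for t in cell.split(sep) if t.strip()]


def to_multi_hot(tags):
    """Map a list of labels to a 0/1 vector over TAGS; unknown labels are ignored."""
    present = set(tags)
    return [1 if t in present else 0 for t in TAGS]


# Made-up rows standing in for TWeddit.csv; column names are hypothetical.
SAMPLE = '''title,body,Tags
"Example title","Example body text","Medical, Pregnancy"
"Another title","More text","N/A"
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
vectors = [to_multi_hot(parse_tags(r["Tags"])) for r in rows]
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle. A not-applicable post maps to an all-zero vector here, which is one possible convention rather than a documented one.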
The dataset is designed to support research on automatic trigger-warning detection, sensitive content moderation, and computational social science analysis of self-disclosure and support-seeking behavior on social media. The same data structure and annotation schema apply across the corpus, so every post has the same fields and follows the same label conventions.
Important Note: For proper attribution, researchers who use this dataset in their work are kindly requested to cite the following paper describing the dataset and its analyses:
Bandela, S. R., Parthasarathy, S., & Garg, V. (2026). TWeddit: A Dataset of Triggering Stories Predominantly Shared by Women on Reddit. arXiv preprint arXiv:2601.11819. DOI: https://doi.org/10.48550/arXiv.2601.11819
Files
| Name | Size | Checksum |
|---|---|---|
| TWeddit.csv | 9.0 MB | md5:5898753a1f5c43ae0b4dde13a19002e2 |
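The MD5 checksum listed for TWeddit.csv can be used to confirm that a download is complete and unmodified. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify` are hypothetical, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# Checksum as published in the file listing for TWeddit.csv.
EXPECTED_MD5 = "5898753a1f5c43ae0b4dde13a19002e2"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

Calling `verify("TWeddit.csv")` on a local copy returns `True` only when the digest matches the published value. A mismatch usually points to an interrupted download or a different file version.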