Detecting Symptoms of Depression on Reddit
Authors/Creators
- 1. National Institute on Drug Abuse, National Institutes of Health
- 2. University of Pennsylvania
- 3. Stanford University
Description
Depression is known to have heterogeneous symptom manifestations. Investigating various symptoms of depression is essential to understanding underlying mechanisms and personalizing treatments.
We use Reddit posts from depression and mental health-related subreddits to detect symptoms of depression in a distantly supervised manner. Specifically:
- We identified the online language markers of 13 symptoms of depression using 1,318,749 posts from 43 subreddit communities.
- We built 13 prediction models (based on RoBERTa embeddings) that distinguish symptom-specific discourse from posts in control subreddits contributed by the same Reddit users.
- We validated the prediction models on a sample of participants who shared their Facebook posts and completed self-report surveys for depression (PHQ-9), anxiety (GAD-7), and loneliness (UCLA-3).
The data and models are described in our full paper published at the 15th ACM Web Science Conference (2023).
Brief Description of the Lexica
We tokenized all posts with happierfuntokenizer from the DLATK Python library. Using Latent Dirichlet Allocation (LDA) as implemented in MALLET, we generated 200 topics with the alpha hyperparameter set to 5. We then analyzed the topic distribution of all posts in the Reddit dataset for each depression symptom.
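As a rough illustration of the last step, the sketch below computes a per-post topic distribution from tokenized text. It assumes a hypothetical table `topic_given_word` holding p(topic | word) values derived from a fitted LDA model, and it uses a plain whitespace tokenizer as a stand-in for the DLATK tokenizer. The words, topic names, and probabilities are invented for the example and are not taken from the released data.

```python
from collections import Counter

# Hypothetical p(topic | word) values; in practice these would be
# derived from a fitted LDA model, not written by hand.
topic_given_word = {
    "sleep": {"topic_sleep": 0.8, "topic_mood": 0.2},
    "tired": {"topic_sleep": 0.5, "topic_mood": 0.5},
    "sad": {"topic_mood": 1.0},
}


def tokenize(text):
    """Stand-in tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def topic_distribution(text):
    """Return p(topic | post) = sum over words of p(topic | word) * p(word | post)."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    usage = {}
    for word, count in counts.items():
        # Words missing from the table contribute nothing to any topic.
        for topic, prob in topic_given_word.get(word, {}).items():
            usage[topic] = usage.get(topic, 0.0) + prob * count / total
    return usage


post = "so tired cannot sleep and sad"
print(topic_distribution(post))
```

Because out-of-vocabulary words are skipped, the resulting values need not sum to 1; they reflect only the share of the post covered by words in the table.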
The current data includes:
- symptom vs. control
- symptom vs. control + all other symptoms
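One way to read these two settings is as two different choices of negative class for each symptom. The sketch below builds binary labels under that reading. The `group` field, the symptom names, and the example posts are hypothetical and serve only to illustrate the contrast.

```python
def make_labels(posts, symptom, include_other_symptoms=False):
    """Build (text, label) pairs for one symptom.

    Posts from the target symptom are labeled 1. Control posts are labeled 0.
    Posts from other symptoms are labeled 0 only when
    include_other_symptoms is True; otherwise they are left out.
    """
    labeled = []
    for post in posts:
        group = post["group"]
        if group == symptom:
            labeled.append((post["text"], 1))
        elif group == "control" or include_other_symptoms:
            labeled.append((post["text"], 0))
    return labeled


# Hypothetical posts; "group" is the symptom name or "control".
posts = [
    {"text": "nothing feels fun anymore", "group": "anhedonia"},
    {"text": "great hike this weekend", "group": "control"},
    {"text": "awake until 4am again", "group": "insomnia"},
]

symptom_vs_control = make_labels(posts, "anhedonia")
symptom_vs_rest = make_labels(posts, "anhedonia", include_other_symptoms=True)
print(len(symptom_vs_control), len(symptom_vs_rest))
```

Under this reading, the second setting asks a harder question: whether a post reflects one particular symptom rather than depression-related discourse in general.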
To learn more about how to utilize the lexica, please refer to this link: https://github.com/sjgiorgi/dlatk-lexica