Reddit climate emotions and topic analysis
Authors/Creators
Description
# Data Files Description for reddit_Paper1_main_NHB_plots.ipynb
# ================================================================
## Main Data Files
- paper1_fastopic_themes.json: Topic-to-theme mapping (Solution, Cause, Catastrophic Impact, Societal/Scientific Response) for 100 FasTopic topics
- df_joined_w_FasTopics.pkl: Main Reddit dataset (submissions + comments) with assigned topic IDs and metadata (scores, timestamps, etc.)
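
As a rough illustration of how the theme mapping might be used, the sketch below loads `paper1_fastopic_themes.json` and counts topics per theme. The file's internal schema is not documented here, so the flat `{topic_id: theme}` layout is an assumption, and the helper names `load_theme_mapping` and `topics_per_theme` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_theme_mapping(path):
    """Load a topic-to-theme mapping.

    ASSUMPTION: the JSON is a flat object of the form {"<topic_id>": "<theme>"}.
    Adjust the parsing if the actual file is nested differently.
    """
    with Path(path).open(encoding="utf-8") as fh:
        raw = json.load(fh)
    # JSON object keys are always strings, so convert them to integer topic IDs.
    return {int(topic_id): theme for topic_id, theme in raw.items()}


def topics_per_theme(mapping):
    """Count how many topics are assigned to each theme."""
    return Counter(mapping.values())
```

If the assumed layout holds, `topics_per_theme(load_theme_mapping("paper1_fastopic_themes.json"))` would give the number of FasTopic topics under each of the four themes.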
## Statistical Analysis
- paper1_fig1_KStest.json: Kolmogorov-Smirnov test results comparing per-topic distributions against a global baseline (endorsement and engagement metrics)
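
For readers unfamiliar with the test, the following minimal sketch computes a two-sample Kolmogorov-Smirnov statistic in plain Python. It is not the notebook's code: whether the stored results come from a one- or two-sample test is not stated, and the function name `ks_statistic` is hypothetical. In practice `scipy.stats.ks_2samp` would also return a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample, baseline):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    ASSUMPTION: `sample` holds one topic's metric values (e.g. scores) and
    `baseline` holds the values for the whole dataset.
    """
    a, b = sorted(sample), sorted(baseline)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value finds the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A statistic near 0 means the topic's distribution closely tracks the baseline; a value near 1 means the two barely overlap.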
## Emotion Analysis - Climate Data
- df_emotions_w_topics/*.pkl (20 files): Chunked emotion predictions for climate-related Reddit comments, merged with topic assignments. Each file contains 28 emotion scores per comment (joy, fear, anger, etc.) from an emotion classification model.
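
Because the emotion predictions are split across multiple pickle files, they need to be read chunk by chunk. The sketch below shows one way to do that with the standard library; the helper name `load_chunks` is hypothetical, and it is an assumption that each chunk is a pickled pandas DataFrame (in which case pandas must be installed for unpickling to succeed).

```python
import pickle
from pathlib import Path


def load_chunks(directory, pattern="*.pkl"):
    """Load every pickle file in `directory`, in sorted filename order.

    Only unpickle files from a source you trust: unpickling can execute
    arbitrary code.
    """
    paths = sorted(Path(directory).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    chunks = []
    for path in paths:
        with path.open("rb") as fh:
            chunks.append(pickle.load(fh))
    return chunks
```

If the chunks are indeed DataFrames, `pandas.concat(load_chunks("df_emotions_w_topics"), ignore_index=True)` would combine them into a single table.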
## Emotion Analysis - Baseline Comparisons
- emotions-casualconversation-baseline/: CasualConversation subreddit baseline (100k sample) - neutral conversation control
- emotions-nostupidquestions-baseline/: NoStupidQuestions subreddit baseline (100k sample) - informational Q&A control
- emotions-fitness-baseline/: Fitness subreddit baseline (100k sample) - lifestyle/health control
Each baseline contains:
- emotions_model_outputs_*.pkl: Emotion predictions for baseline comments
- *_100k_sample.parquet: Original comment text and metadata
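
To check that an extracted copy of the archive matches the layout described above, a small sketch like the following can list the files in each baseline directory. The directory names and filename patterns come from this description; the helper name `find_baseline_files` and the location of the extracted archive (`root`) are assumptions.

```python
from pathlib import Path

# Baseline directory names as listed in this description.
BASELINES = (
    "emotions-casualconversation-baseline",
    "emotions-nostupidquestions-baseline",
    "emotions-fitness-baseline",
)


def find_baseline_files(root, baseline):
    """Return the emotion-output and sample files found for one baseline.

    ASSUMPTION: `root` is the folder that data.zip was extracted into and
    the baseline directories sit directly beneath it.
    """
    folder = Path(root) / baseline
    return {
        "emotion_outputs": sorted(folder.glob("emotions_model_outputs_*.pkl")),
        "samples": sorted(folder.glob("*_100k_sample.parquet")),
    }
```

Looping over `BASELINES` and calling `find_baseline_files(root, name)` gives a quick inventory before loading anything; the parquet samples can then be read with a parquet-capable reader such as `pandas.read_parquet`.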
## Output Data
- 00_fig3_communication.json: Communication pattern analysis results (engagement vs emotional content) for Figure 3
## Purpose
This dataset supports analysis of climate discourse on Reddit: topic volume/endorsement/engagement patterns (Fig 1), emotion distributions vs baselines (Fig 2), and communication-emotion relationships (Fig 3).
Files
data.zip
Additional details
Software
- Programming language: Python