Dataset Open Access

Reddit Mental Health Dataset

Low, Daniel M.; Rumker, Laurie; Talker, Tanya; Torous, John; Cecchi, Guillermo; Ghosh, Satrajit S.


Citation Style Language JSON Export

{
  "DOI": "10.17605/OSF.IO/7PEYQ", 
  "language": "eng", 
  "title": "Reddit Mental Health Dataset", 
  "issued": {
    "date-parts": [
      [
        2020, 
        7, 
        13
      ]
    ]
  }, 
  "abstract": "<p>This dataset contains posts from 28 subreddits (15 of them mental health support groups) from 2018 to 2020. We used this dataset to understand the impact of COVID-19 on mental health support groups from January to April 2020, and included older timeframes to obtain baseline posts from before COVID-19.</p>\n\n<p><strong>Please cite if you use this dataset:</strong></p>\n\n<p>Low DM, Rumker L, Talker T, Torous J, Cecchi G, Ghosh SS (2020). Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19. <em>PsyArXiv</em>. https://doi.org/10.31234/osf.io/xvwcy</p>\n\n<p><strong>License</strong></p>\n\n<p>This dataset is made available under the Public Domain Dedication and License v1.0, whose full text can be found at: <a href=\"http://www.opendatacommons.org/licenses/pddl/1.0/\">http://www.opendatacommons.org/licenses/pddl/1.0/</a></p>\n\n<p>It was downloaded using the Pushshift API. Re-use of this data is subject to the Reddit API terms.</p>\n\n<p><strong>Reddit Mental Health Dataset</strong></p>\n\n<p>Contains posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:</p>\n\n<ul>\n\t<li><strong>15 specific mental health support groups</strong> (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)</li>\n\t<li><strong>2 broad mental health</strong> subreddits (r/mentalhealth, r/COVID19_support)</li>\n\t<li><strong>11 non-mental health subreddits</strong> (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching)</li>\n</ul>\n\n<p>Filename tags and their corresponding timeframes:</p>\n\n<ul>\n\t<li><code>post</code>: January 1 to April 20, 2020 (called &quot;mid-pandemic&quot; in the manuscript; r/COVID19_support appears in this window). Unique users: 320,364.</li>\n\t<li><code>pre</code>: December 2018 to December 2019. A full year that provides more data for a baseline of Reddit posts. Unique users: 327,289.</li>\n\t<li><code>2019</code>: January 1 to April 20, 2019 (r/EDAnonymous appears in this window). A control for seasonal fluctuations, matched to the <code>post</code> window. Unique users: 282,560.</li>\n\t<li><code>2018</code>: January 1 to April 20, 2018. A control for seasonal fluctuations, matched to the <code>post</code> window. Unique users: 177,089.</li>\n</ul>\n\n<p>Unique users across all time windows: 826,961 (the <code>pre</code> and <code>2019</code> windows overlap in time).</p>\n\n<p>See the manuscript's Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.</p>\n\n<p>Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.</p>",
  "author": [
    {
      "family": "Low",
      "given": "Daniel M."
    },
    {
      "family": "Rumker",
      "given": "Laurie"
    },
    {
      "family": "Talker",
      "given": "Tanya"
    },
    {
      "family": "Torous",
      "given": "John"
    },
    {
      "family": "Cecchi",
      "given": "Guillermo"
    },
    {
      "family": "Ghosh",
      "given": "Satrajit S."
    }
  ], 
  "version": "01", 
  "type": "dataset", 
  "id": "3941387"
}
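The abstract recommends bootstrapping analyses when subsampling to balance subreddits. Below is a minimal sketch of one way to do that. It assumes the per-post feature of interest has already been loaded into a dictionary that maps subreddit name to a list of numeric values. The loading step, the helper name `bootstrap_balanced`, and the toy values are hypothetical and are not part of the dataset release. Each replicate draws the same number of posts from every subreddit, with replacement (a stratified bootstrap), and then computes a statistic. The replicates are summarized as a mean and a 95% percentile interval.

```python
import random
import statistics
from typing import Callable, Dict, List, Optional, Sequence, Tuple


# Hypothetical helper for illustration; not part of the dataset release.
def bootstrap_balanced(
    features_by_subreddit: Dict[str, Sequence[float]],
    statistic: Callable[[Dict[str, List[float]]], float],
    n_per_subreddit: Optional[int] = None,
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Stratified bootstrap with an equal number of draws per subreddit.

    Returns (mean, lower, upper): the mean of the statistic over all
    replicates and the bounds of a 95% percentile interval.
    """
    if not features_by_subreddit or any(
        len(v) == 0 for v in features_by_subreddit.values()
    ):
        raise ValueError("every subreddit needs at least one post")
    if n_boot < 1:
        raise ValueError("n_boot must be positive")
    # Default to the size of the smallest subreddit so every group is
    # represented by the same number of draws in each replicate.
    if n_per_subreddit is None:
        n_per_subreddit = min(len(v) for v in features_by_subreddit.values())
    rng = random.Random(seed)  # seeded for reproducible replicates
    replicates: List[float] = []
    for _ in range(n_boot):
        # Resample with replacement, the same count from every subreddit.
        sample = {
            name: rng.choices(list(values), k=n_per_subreddit)
            for name, values in features_by_subreddit.items()
        }
        replicates.append(statistic(sample))
    replicates.sort()
    lower = replicates[int(0.025 * (n_boot - 1))]
    upper = replicates[int(0.975 * (n_boot - 1))]
    return statistics.fmean(replicates), lower, upper


if __name__ == "__main__":
    # Toy values standing in for one per-post text feature (hypothetical).
    toy = {
        "depression": [0.42, 0.51, 0.38, 0.47, 0.55, 0.40, 0.49],
        "lonely": [0.33, 0.29, 0.41],
        "fitness": [0.12, 0.18, 0.15, 0.10, 0.20],
    }

    def depression_minus_fitness(s: Dict[str, List[float]]) -> float:
        return statistics.fmean(s["depression"]) - statistics.fmean(s["fitness"])

    mean, lo, hi = bootstrap_balanced(toy, depression_minus_fitness, n_boot=2000)
    print(f"mean difference {mean:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Drawing with replacement keeps every replicate the same size for every subreddit. If sampling without replacement is preferred, `rng.sample` could be used instead of `rng.choices`. That change turns the procedure into repeated random subsampling rather than a bootstrap.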