Dataset Open Access
Low, Daniel M.;
Rumker, Laurie;
Talkar, Tanya;
Torous, John;
Cecchi, Guillermo;
Ghosh, Satrajit S.
{ "DOI": "10.17605/OSF.IO/7PEYQ", "language": "eng", "title": "Reddit Mental Health Dataset", "issued": { "date-parts": [ [ 2020, 7, 13 ] ] }, "abstract": "<div> </div>\n\n<p> </p>\n\n<p>This dataset contains posts from 28 subreddits (15 mental health support groups) from 2018-2020. We used this dataset to understand the impact of COVID-19 on mental health support groups from January to April, 2020 and included older timeframes to obtain baseline posts before COVID-19.</p>\n\n<p><strong>Please cite if you use this dataset:</strong></p>\n\n<p>Low, D. M., Rumker, L., Torous, J., Cecchi, G., Ghosh, S. S., & Talkar, T. (2020). Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. <em>Journal of medical Internet research</em>, <em>22</em>(10), e22635.</p>\n\n<pre>@article{low2020natural,\n title={Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study},\n author={Low, Daniel M and Rumker, Laurie and Torous, John and Cecchi, Guillermo and Ghosh, Satrajit S and Talkar, Tanya},\n journal={Journal of medical Internet research},\n volume={22},\n number={10},\n pages={e22635},\n year={2020},\n publisher={JMIR Publications Inc., Toronto, Canada}\n}</pre>\n\n<p><br>\n<strong>License</strong></p>\n\n<p>This dataset is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: <a href=\"http://www.opendatacommons.org/licenses/pddl/1.0/\">http://www.opendatacommons.org/licenses/pddl/1.0/</a></p>\n\n<p>It was downloaded using pushshift API. 
Re-use of this data is subject to Reddit API terms.</p>\n\n<p> </p>\n\n<p><strong>Reddit Mental Health Dataset</strong></p>\n\n<p>Contains posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:</p>\n\n<ul>\n\t<li><strong>15 specific mental health support groups</strong> (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)</li>\n\t<li><strong>2 broad mental health</strong> subreddits (r/mentalhealth, r/COVID19_support)</li>\n\t<li><strong>11 non-mental health subreddits</strong> (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).</li>\n</ul>\n\n<p><code>filenames</code> and corresponding timeframes:</p>\n\n<ul>\n\t<li><code>post:</code> Jan 1 to April 20, 2020 (called "mid-pandemic" in manuscript; r/COVID19_support appears). Unique users: 320,364. </li>\n\t<li><code>pre:</code> Dec 2018 to Dec 2019. A full year which provides more data for a baseline of Reddit posts. Unique users: 327,289.</li>\n\t<li><code>2019:</code> Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match <code>post</code> data. Unique users: 282,560.</li>\n\t<li><code>2018:</code> Jan 1 to April 20, 2018. A control for seasonal fluctuations to match <code>post</code> data. Unique users: 177,089</li>\n</ul>\n\n<p>Unique users across all time windows (pre and 2019 overlap): 826,961.</p>\n\n<p>See manuscript Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.</p>\n\n<p>Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.</p>\n\n<p> </p>", "author": [ { "family": "Low, Daniel M." 
}, { "family": "Rumker, Laurie" }, { "family": "Talkar, Tanya" }, { "family": "Torous, John" }, { "family": "Cecchi, Guillermo" }, { "family": "Ghosh, Satrajit S." } ], "version": "01", "type": "dataset", "id": "3941387" }
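The description's note on subsampling recommends bootstrapping when balancing subreddits. The following is a minimal sketch of one way to do that, not the authors' procedure: it assumes the posts have already been loaded as `(subreddit, feature_value)` pairs, and the function name, parameters, and data layout are all hypothetical. Each replicate resamples the same number of posts, with replacement, from every subreddit and records the pooled mean, so the spread of the replicate means gives a rough percentile interval.

```python
import random
import statistics
from collections import defaultdict


def balanced_bootstrap(posts, n_per_subreddit, n_boot=1000, seed=0):
    """Bootstrap the mean of a feature over balanced subsamples.

    posts: iterable of (subreddit, feature_value) pairs (hypothetical layout).
    Each replicate draws n_per_subreddit values with replacement from every
    subreddit, so all subreddits contribute equally to the pooled mean.
    Returns (mean of replicate means, lower bound, upper bound), where the
    bounds are approximate 2.5th and 97.5th percentiles of the replicates.
    """
    rng = random.Random(seed)

    # Group feature values by subreddit.
    by_subreddit = defaultdict(list)
    for subreddit, value in posts:
        by_subreddit[subreddit].append(value)

    replicate_means = []
    for _ in range(n_boot):
        sample = []
        for values in by_subreddit.values():
            # Sampling with replacement lets small subreddits match large ones.
            sample.extend(rng.choices(values, k=n_per_subreddit))
        replicate_means.append(statistics.fmean(sample))

    replicate_means.sort()
    lower = replicate_means[int(0.025 * (n_boot - 1))]
    upper = replicate_means[int(0.975 * (n_boot - 1))]
    return statistics.fmean(replicate_means), lower, upper
```

Calling `balanced_bootstrap(posts, n_per_subreddit=500)` on one timeframe (for example the `post` files) and again on a baseline timeframe would give two intervals to compare. The choice of 500 is arbitrary and should be adapted to the smallest subreddit of interest.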