Dataset Open Access
Low, Daniel M.;
Rumker, Laurie;
Talkar, Tanya;
Torous, John;
Cecchi, Guillermo;
Ghosh, Satrajit S.
<?xml version='1.0' encoding='utf-8'?> <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd"> <identifier identifierType="URL">https://zenodo.org/record/3941387</identifier> <creators> <creator> <creatorName>Low, Daniel M.</creatorName> <givenName>Daniel M.</givenName> <familyName>Low</familyName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-8866-8667</nameIdentifier> <affiliation>Harvard Medical School &amp; MIT</affiliation> </creator> <creator> <creatorName>Rumker, Laurie</creatorName> <givenName>Laurie</givenName> <familyName>Rumker</familyName> <affiliation>Harvard Medical School</affiliation> </creator> <creator> <creatorName>Talkar, Tanya</creatorName> <givenName>Tanya</givenName> <familyName>Talkar</familyName> <affiliation>Harvard Medical School &amp; MIT Lincoln Labs</affiliation> </creator> <creator> <creatorName>Torous, John</creatorName> <givenName>John</givenName> <familyName>Torous</familyName> <affiliation>Beth Israel Deaconess Medical Center, Harvard Medical School</affiliation> </creator> <creator> <creatorName>Cecchi, Guillermo</creatorName> <givenName>Guillermo</givenName> <familyName>Cecchi</familyName> <affiliation>IBM</affiliation> </creator> <creator> <creatorName>Ghosh, Satrajit S.</creatorName> <givenName>Satrajit S.</givenName> <familyName>Ghosh</familyName> <affiliation>MIT &amp; Harvard Medical School</affiliation> </creator> </creators> <titles> <title>Reddit Mental Health Dataset</title> </titles> <publisher>Zenodo</publisher> <publicationYear>2020</publicationYear> <subjects> <subject>Natural Language Processing</subject> <subject>Mental Health</subject> <subject>Psychiatry</subject> <subject>COVID-19</subject> <subject>Reddit</subject> <subject>Social Media</subject> </subjects> <dates> <date dateType="Issued">2020-07-13</date> </dates>
<language>en</language> <resourceType resourceTypeGeneral="Dataset"/> <alternateIdentifiers> <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3941387</alternateIdentifier> </alternateIdentifiers> <relatedIdentifiers> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo" resourceTypeGeneral="Dataset">10.17605/OSF.IO/7PEYQ</relatedIdentifier> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsDocumentedBy" resourceTypeGeneral="Text">10.31234/osf.io/xvwcy</relatedIdentifier> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.17605/OSF.IO/7PEYQ</relatedIdentifier> <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/covid-19</relatedIdentifier> <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/medicalnlp</relatedIdentifier> <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/natural-language-processing</relatedIdentifier> <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/zenodo</relatedIdentifier> </relatedIdentifiers> <version>01</version> <rightsList> <rights rightsURI="http://www.opendefinition.org/licenses/odc-pddl">Open Data Commons Public Domain Dedication and Licence 1.0</rights> <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights> </rightsList>
<descriptions> <description descriptionType="Abstract"><p>This dataset contains posts from 28 subreddits (including 15 mental health support groups) from 2018 to 2020. We used this dataset to understand the impact of COVID-19 on mental health support groups from January to April 2020 and included older timeframes to obtain baseline posts from before COVID-19.</p> <p><strong>Please cite if you use this dataset:</strong></p> <p>Low, D. M., Rumker, L., Torous, J., Cecchi, G., Ghosh, S. S., &amp; Talkar, T. (2020). Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. <em>Journal of Medical Internet Research</em>, <em>22</em>(10), e22635.</p> <pre>@article{low2020natural, title={Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study}, author={Low, Daniel M and Rumker, Laurie and Torous, John and Cecchi, Guillermo and Ghosh, Satrajit S and Talkar, Tanya}, journal={Journal of Medical Internet Research}, volume={22}, number={10}, pages={e22635}, year={2020}, publisher={JMIR Publications Inc., Toronto, Canada} }</pre>
<p><strong>License</strong></p> <p>This dataset is made available under the Public Domain Dedication and License v1.0, whose full text can be found at: <a href="http://www.opendatacommons.org/licenses/pddl/1.0/">http://www.opendatacommons.org/licenses/pddl/1.0/</a></p> <p>The data were downloaded using the Pushshift API. Re-use of this data is subject to the Reddit API terms.</p>
<p><strong>Reddit Mental Health Dataset</strong></p> <p>Contains posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:</p> <ul> <li><strong>15 specific mental health support groups</strong> (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)</li> <li><strong>2 broad mental health</strong> subreddits (r/mentalhealth, r/COVID19_support)</li> <li><strong>11 non-mental health subreddits</strong> (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).</li> </ul>
<p><code>filenames</code> and corresponding timeframes:</p> <ul> <li><code>post:</code> Jan 1 to April 20, 2020 (called &quot;mid-pandemic&quot; in manuscript; r/COVID19_support appears). Unique users: 320,364.</li> <li><code>pre:</code> Dec 2018 to Dec 2019. A full year which provides more data for a baseline of Reddit posts. Unique users: 327,289.</li> <li><code>2019:</code> Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match <code>post</code> data. Unique users: 282,560.</li> <li><code>2018:</code> Jan 1 to April 20, 2018. A control for seasonal fluctuations to match <code>post</code> data. Unique users: 177,089.</li> </ul> <p>Unique users across all time windows (pre and 2019 overlap): 826,961.</p> <p>See manuscript Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.</p> <p>Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.</p></description> </descriptions> </resource>
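The note on subsampling in the description can be illustrated with a minimal sketch. It assumes the posts have already been loaded into a mapping from subreddit name to a list of numeric feature values; the function name `balanced_bootstrap`, the toy word counts, and the percentile interval are illustrative assumptions and are not taken from the dataset or the manuscript. Each iteration draws the same number of posts from every subreddit (with replacement), so that no single large subreddit dominates the statistic, and the estimate is summarized across iterations instead of relying on one arbitrary subsample.

```python
import random
import statistics


def balanced_bootstrap(groups, statistic, n_iter=1000, seed=0):
    """Estimate `statistic` over repeated balanced resamples.

    groups: dict mapping a label (e.g. a subreddit name) to a list of values.
    Returns (mean of the estimates, (2.5th percentile, 97.5th percentile)).
    """
    rng = random.Random(seed)
    # Balance by drawing as many items per group as the smallest group has.
    n_per_group = min(len(values) for values in groups.values())
    estimates = []
    for _ in range(n_iter):
        sample = []
        for values in groups.values():
            # Sample with replacement within each group.
            sample.extend(rng.choices(values, k=n_per_group))
        estimates.append(statistic(sample))
    estimates.sort()
    low = estimates[int(0.025 * (n_iter - 1))]
    high = estimates[int(0.975 * (n_iter - 1))]
    return statistics.mean(estimates), (low, high)


# Hypothetical toy data: word counts of posts in two subreddits.
toy_groups = {
    "r/anxiety": [120, 95, 143, 88, 101, 132],
    "r/fitness": [60, 72, 55],
}
mean_estimate, (low, high) = balanced_bootstrap(toy_groups, statistics.mean, n_iter=500)
print(f"mean word count: {mean_estimate:.1f} (95% interval {low:.1f} to {high:.1f})")
```

Reporting the spread across iterations, and not only a point estimate, makes visible how much a result depends on which posts happened to be drawn.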