Dataset Open Access

Reddit Mental Health Dataset

Low, Daniel M.; Rumker, Laurie; Talker, Tanya; Torous, John; Cecchi, Guillermo; Ghosh, Satrajit S.


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="URL">https://zenodo.org/record/3941387</identifier>
  <creators>
    <creator>
      <creatorName>Low, Daniel M.</creatorName>
      <givenName>Daniel M.</givenName>
      <familyName>Low</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-8866-8667</nameIdentifier>
      <affiliation>Harvard Medical School &amp; MIT</affiliation>
    </creator>
    <creator>
      <creatorName>Rumker, Laurie</creatorName>
      <givenName>Laurie</givenName>
      <familyName>Rumker</familyName>
      <affiliation>Harvard Medical School</affiliation>
    </creator>
    <creator>
      <creatorName>Talker, Tanya</creatorName>
      <givenName>Tanya</givenName>
      <familyName>Talker</familyName>
      <affiliation>Harvard Medical School &amp; MIT Lincoln Labs</affiliation>
    </creator>
    <creator>
      <creatorName>Torous, John</creatorName>
      <givenName>John</givenName>
      <familyName>Torous</familyName>
      <affiliation>Beth Israel Deaconess Medical Center, Harvard Medical School</affiliation>
    </creator>
    <creator>
      <creatorName>Cecchi, Guillermo</creatorName>
      <givenName>Guillermo</givenName>
      <familyName>Cecchi</familyName>
      <affiliation>IBM</affiliation>
    </creator>
    <creator>
      <creatorName>Ghosh, Satrajit S.</creatorName>
      <givenName>Satrajit S.</givenName>
      <familyName>Ghosh</familyName>
      <affiliation>MIT &amp; Harvard Medical School</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Reddit Mental Health Dataset</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <subjects>
    <subject>Natural Language Processing</subject>
    <subject>Mental Health</subject>
    <subject>Psychiatry</subject>
    <subject>COVID-19</subject>
    <subject>Reddit</subject>
    <subject>Social Media</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2020-07-13</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3941387</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo" resourceTypeGeneral="Dataset">10.17605/OSF.IO/7PEYQ</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsDocumentedBy" resourceTypeGeneral="Text">10.31234/osf.io/xvwcy</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/covid-19</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/medicalnlp</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/natural-language-processing</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/zenodo</relatedIdentifier>
  </relatedIdentifiers>
  <version>01</version>
  <rightsList>
    <rights rightsURI="http://www.opendefinition.org/licenses/odc-pddl">Open Data Commons Public Domain Dedication and Licence 1.0</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;div&gt;&amp;nbsp;&lt;/div&gt;

&lt;p&gt;This dataset contains posts from 28 subreddits, 15 of which are mental health support groups, from 2018 to 2020. We used it to study the impact of COVID-19 on mental health support groups from January to April 2020 and included earlier timeframes to obtain baseline posts from before COVID-19.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please cite if you use this dataset:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Low DM, Rumker L, Talker T, Torous J, Cecchi G, Ghosh SS (2020). Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19.&amp;nbsp;&lt;em&gt;PsyArXiv&lt;/em&gt;.&amp;nbsp;https://doi.org/10.31234/osf.io/xvwcy&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This dataset is made available under the Public Domain Dedication and License v1.0 whose full text can be found at:&amp;nbsp;&lt;a href="http://www.opendatacommons.org/licenses/pddl/1.0/"&gt;http://www.opendatacommons.org/licenses/pddl/1.0/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data were downloaded using the Pushshift API. Re-use of this data is subject to the Reddit API terms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reddit Mental Health Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contains posts and text features from the following 28 mental health and non-mental health subreddits:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;15 specific mental health support groups&lt;/strong&gt;&amp;nbsp;(r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;2 broad mental health&lt;/strong&gt;&amp;nbsp;subreddits (r/mentalhealth, r/COVID19_support)&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;11 non-mental health subreddits&lt;/strong&gt;&amp;nbsp;(r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;filenames&lt;/code&gt;&amp;nbsp;and corresponding timeframes:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;code&gt;post:&lt;/code&gt;&amp;nbsp;Jan 1 to April 20, 2020 (called &amp;quot;mid-pandemic&amp;quot; in the manuscript; r/COVID19_support appears).&amp;nbsp;Unique users: 320,364.&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;pre:&lt;/code&gt;&amp;nbsp;Dec 2018 to Dec 2019. A full year which provides more data for a baseline of Reddit posts.&amp;nbsp;Unique users: 327,289.&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;2019:&lt;/code&gt;&amp;nbsp;Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match&amp;nbsp;&lt;code&gt;post&lt;/code&gt;&amp;nbsp;data.&amp;nbsp;Unique users: 282,560.&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;2018:&lt;/code&gt;&amp;nbsp;Jan 1 to April 20, 2018. A control for seasonal fluctuations to match&amp;nbsp;&lt;code&gt;post&lt;/code&gt;&amp;nbsp;data.&amp;nbsp;Unique users: 177,089.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unique users across all time windows (pre and 2019 overlap): 826,961.&lt;/p&gt;

&lt;p&gt;See manuscript Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.&lt;/p&gt;

&lt;p&gt;Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.&lt;/p&gt;

</description>
  </descriptions>
</resource>
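
Reading the export programmatically

The record above is namespaced DataCite XML, so element lookups need the `http://datacite.org/schema/kernel-4` namespace. The following is a minimal sketch, using only the Python standard library, of how the title, creators, and related identifiers might be pulled out. The `summarize` function and the abbreviated `SAMPLE` string are illustrative assumptions, not part of Zenodo's or DataCite's tooling.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <resource> root element of the export.
NS = {"dc": "http://datacite.org/schema/kernel-4"}

# Abbreviated copy of the record, inlined so the sketch is self-contained.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <creators>
    <creator><creatorName>Low, Daniel M.</creatorName></creator>
    <creator><creatorName>Rumker, Laurie</creatorName></creator>
  </creators>
  <titles><title>Reddit Mental Health Dataset</title></titles>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.17605/OSF.IO/7PEYQ</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsDocumentedBy">10.31234/osf.io/xvwcy</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""


def summarize(xml_text):
    """Return title, creator names, and related identifiers from a DataCite record."""
    root = ET.fromstring(xml_text)
    creators = [
        creator.findtext("dc:creatorName", namespaces=NS)
        for creator in root.findall("dc:creators/dc:creator", NS)
    ]
    related = [
        (rel.get("relationType"), rel.get("relatedIdentifierType"), rel.text)
        for rel in root.findall("dc:relatedIdentifiers/dc:relatedIdentifier", NS)
    ]
    return {
        "title": root.findtext("dc:titles/dc:title", namespaces=NS),
        "creators": creators,
        "related": related,
    }


if __name__ == "__main__":
    info = summarize(SAMPLE)
    print(info["title"])
    for relation, id_type, value in info["related"]:
        print(relation, id_type, value)
```

The same function should work on the full export if it is passed as bytes or with the XML declaration as the very first characters of the string.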
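
Subsampling with bootstrapping

The description recommends bootstrapping when subsampling, for example to balance subreddits, but does not spell out a procedure. One common reading is to draw an equal-sized resample from each subreddit many times and look at the spread of the resulting estimates rather than relying on a single draw. The sketch below assumes that reading. The `balanced_bootstrap` and `percentile_interval` helpers, the toy values, and the choice of the mean as the statistic are all hypothetical and are not taken from the manuscript.

```python
import random
from statistics import mean


def balanced_bootstrap(groups, n_per_group, n_iter=1000, seed=0):
    """Resample each group to the same size and collect the mean of each draw.

    groups: dict mapping a subreddit name to a list of numeric values
            (a hypothetical per-post text feature).
    """
    rng = random.Random(seed)
    estimates = {name: [] for name in groups}
    for _ in range(n_iter):
        for name, values in groups.items():
            # Sampling with replacement, so every group contributes
            # exactly n_per_group values regardless of its original size.
            sample = rng.choices(values, k=n_per_group)
            estimates[name].append(mean(sample))
    return estimates


def percentile_interval(values, lower=2.5, upper=97.5):
    """Nearest-rank percentile interval over a list of bootstrap estimates."""
    ordered = sorted(values)

    def pick(percent):
        index = int(round(percent / 100 * (len(ordered) - 1)))
        return ordered[index]

    return pick(lower), pick(upper)


if __name__ == "__main__":
    # Toy data only: a large group and a small group with made-up values.
    toy = {
        "r/anxiety": [1, 2, 3, 4, 5] * 20,
        "r/fitness": [10, 11],
    }
    results = balanced_bootstrap(toy, n_per_group=2, n_iter=200)
    for name, draws in results.items():
        print(name, percentile_interval(draws))
```

Reporting an interval over many resamples, instead of one subsample, keeps a single unlucky draw from driving the comparison between subreddits.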