Published January 14, 2020 | Version v1
Conference paper Open

The Pushshift Reddit Dataset

  • 1.
  • 2. Max Planck Institute
  • 3. University of Colorado Boulder
  • 4. Elon University
  • 5. Binghamton University


We provide a small sample of the Pushshift Reddit dataset. The sample consists of two files:

RS_2019-04.zst: All Reddit submissions that were posted during April 2019.

RC_2019-04.zst: All Reddit comments that were posted during April 2019.

The full dataset can be downloaded from: for submissions and for comments. On the website, you can find one file for each month of our data collection. Each file is a newline-delimited JSON (ndjson) file, where each line contains the JSON object of a submission or a comment.
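
Because each line of a monthly file is a separate JSON object, the files can be processed as a stream, one record at a time, without loading a whole month into memory. The following is a minimal sketch of that idea, not an official loader. The function names iter_ndjson and count_by_field are hypothetical, the three sample records are made up for illustration, and the "subreddit" field is an assumption about the record layout.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


def count_by_field(records: Iterable[dict], field: str) -> Counter:
    """Count occurrences of each value of `field`; records lacking it are skipped."""
    return Counter(r[field] for r in records if field in r)


# Made-up sample standing in for a decompressed monthly file.
# The field names here are assumptions, not taken from the dataset description.
sample = io.StringIO(
    '{"id": "a1", "subreddit": "python", "title": "first"}\n'
    '{"id": "a2", "subreddit": "datasets", "title": "second"}\n'
    '\n'
    '{"id": "a3", "subreddit": "python", "title": "third"}\n'
)

counts = count_by_field(iter_ndjson(sample), "subreddit")
print(counts.most_common())
```

To apply the same functions to RS_2019-04.zst or RC_2019-04.zst, the file first has to be decompressed into a text stream, for example by piping the output of a Zstandard decompressor into a script that passes sys.stdin to iter_ndjson. Depending on how an archive was compressed, the decompressor may need a larger window setting.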





Files (21.1 GB)

Name Size
15.5 GB
5.6 GB