Conference paper Open Access

The Pushshift Reddit Dataset

Baumgartner, Jason; Zannettou, Savvas; Keegan, Brian; Squire, Megan; Blackburn, Jeremy


We provide a small sample of the Pushshift Reddit dataset. The sample consists of two files:

RS_2019-04.zst: All Reddit submissions that were posted during April 2019.

RC_2019-04.zst: All Reddit comments that were posted during April 2019.

The full dataset can be downloaded from https://files.pushshift.io/reddit/submissions/ (submissions) and https://files.pushshift.io/reddit/comments/ (comments). On the website, you will find one file for each month of our data collection. Each file is a newline-delimited JSON (NDJSON) file, where each line contains the JSON object of a single submission or comment.
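Because each line is a standalone JSON object, a file can be processed as a stream instead of being loaded into memory whole. Below is a minimal Python sketch that assumes the two sample files are Zstandard-compressed (as the .zst extension indicates) and that the third-party `zstandard` package is installed (`pip install zstandard`). The helper names `iter_ndjson` and `iter_zst_ndjson` are hypothetical. The `max_window_size=2**31` setting is also an assumption, not something this record documents: it raises the decompressor's window limit in case a file was compressed with a very large window.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def iter_zst_ndjson(path: str) -> Iterator[dict]:
    """Stream objects out of a Zstandard-compressed NDJSON file.

    Assumes the third-party `zstandard` package. max_window_size=2**31 is an
    assumption meant to allow frames compressed with a large window.
    """
    import zstandard  # imported lazily so iter_ndjson works without it

    with open(path, "rb") as fh:
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        with dctx.stream_reader(fh) as reader:
            # Decode the decompressed bytes as UTF-8 text, line by line.
            text = io.TextIOWrapper(reader, encoding="utf-8")
            yield from iter_ndjson(text)


# Example usage (hypothetical; field names follow Reddit's JSON and are assumptions):
# for post in iter_zst_ndjson("RS_2019-04.zst"):
#     print(post.get("subreddit"), post.get("title"))
```

Streaming keeps memory use roughly constant regardless of file size, which matters here because the compressed sample files alone total 21.1 GB.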

Files (21.1 GB)

Name             Size      MD5 checksum
RC_2019-04.zst   15.5 GB   5651d5fc9ab9577a56be33e8f52c2bdf
RS_2019-04.zst   5.6 GB    e24ecb20e08751f0bf3b9189860d7ac9
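Each file in the list above comes with an MD5 checksum. A minimal Python sketch for checking a downloaded file against it is shown below; it reads the file in 1 MiB chunks so that multi-gigabyte files never have to fit in memory. The function names and chunk size are illustrative choices, not part of the dataset. On Linux, `md5sum RC_2019-04.zst` from GNU coreutils should report the same digest.

```python
import hashlib
from typing import BinaryIO

# Expected checksums, copied from the file list of this record.
EXPECTED_MD5 = {
    "RC_2019-04.zst": "5651d5fc9ab9577a56be33e8f52c2bdf",
    "RS_2019-04.zst": "e24ecb20e08751f0bf3b9189860d7ac9",
}


def md5_of_stream(fh: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in fixed-size chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected: str) -> bool:
    """True if the file at `path` hashes to `expected` (case-insensitive hex)."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh) == expected.lower()


# Example usage, after downloading the file into the current directory:
# ok = verify_file("RS_2019-04.zst", EXPECTED_MD5["RS_2019-04.zst"])
```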