Conference paper Open Access

The Pushshift Telegram Dataset

Baumgartner, Jason; Zannettou, Savvas; Squire, Megan; Blackburn, Jeremy


The dataset consists of three files, each distributed zstd-compressed (.zst):

accounts.ndjson: data for 2.2M Telegram users who were active in the channels we crawled.

channels.ndjson: data for the 28K Telegram channels we crawled.

messages.ndjson: 317M Telegram messages posted by those 2.2M users in the 28K channels.

Each file is in newline-delimited JSON (NDJSON) format and contains one JSON object per line, one for each account, channel, or message. Each object follows the format of the Telethon API (https://docs.telethon.dev/en/latest/), a Python interface to Telegram's API.
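Because the files are zstd-compressed and messages.ndjson.zst alone is 51.9 GB, it is usually preferable to stream records instead of decompressing them to disk. The Python sketch below shows one way to do this. It assumes the third-party `zstandard` package is installed. The helper names `iter_ndjson` and `iter_zst_ndjson` are hypothetical and are not part of the dataset or of Telethon. The `max_window_size` value is a precaution, since other Pushshift dumps were compressed with a long window, and it may be unnecessary for these files.

```python
import io
import json
from typing import IO, Iterator


def iter_ndjson(text_stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON text stream."""
    for line in text_stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def iter_zst_ndjson(path: str) -> Iterator[dict]:
    """Stream objects from a zstd-compressed NDJSON file without writing it to disk.

    Assumes the third-party `zstandard` package (pip install zstandard).
    max_window_size=2**31 is an assumption: some Pushshift dumps need a larger
    decompression window than the library's default limit.
    """
    import zstandard  # imported lazily so iter_ndjson works without it

    with open(path, "rb") as fh:
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        with dctx.stream_reader(fh) as reader:
            # Decode the decompressed bytes as UTF-8 text, line by line.
            text = io.TextIOWrapper(reader, encoding="utf-8")
            yield from iter_ndjson(text)
```

For example, `next(iter_zst_ndjson("channels.ndjson.zst"))` returns the first channel object. You can then inspect its keys before writing any field-specific processing, since this sketch makes no assumptions about the Telethon field layout.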

Files (52.1 GB)

Name                   Size       MD5
accounts.ndjson.zst    125.2 MB   a68ec75718218796caba58b8a06f871b
channels.ndjson.zst    7.3 MB     80e678333976597384dcda0054677b67
messages.ndjson.zst    51.9 GB    ed90ff295afebccd47262280aecf33f2