Conference paper Open Access

The Pushshift Telegram Dataset

Baumgartner, Jason; Zannettou, Savvas; Squire, Megan; Blackburn, Jeremy

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Baumgartner, Jason</dc:creator>
  <dc:creator>Zannettou, Savvas</dc:creator>
  <dc:creator>Squire, Megan</dc:creator>
  <dc:creator>Blackburn, Jeremy</dc:creator>
  <dc:description>The Pushshift Telegram Dataset

The dataset consists of three files:

Accounts.ndjson: Provides data for 2.2M Telegram users that were active in the channels we crawled.

Channels.ndjson: Provides data for 28K Telegram channels that we crawled.

Messages.ndjson: Provides data for 317M Telegram messages that were posted by 2.2M Telegram users in 28K Telegram channels.

Each file is a newline-delimited JSON (NDJSON) file in which every line holds one JSON object describing a single account, channel, or message. Each object follows the format of the Telethon API, a Python interface to Telegram's API.</dc:description>
  <dc:title>The Pushshift Telegram Dataset</dc:title>
</oai_dc:dc>
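Because each file stores one JSON object per line, it can be processed as a stream instead of being loaded whole, which matters at the scale of 317M messages. The following is a minimal Python sketch under stated assumptions: `iter_ndjson` and `count_by` are hypothetical helpers, and the `channel_id` field in the sample data is illustrative only, not the dataset's actual schema. Since the objects follow Telethon's format, real field names should be checked against Telethon's documentation.

```python
import io
import json
from collections import Counter
from typing import Callable, Iterable, Iterator


def iter_ndjson(stream: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line, one line at a time."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is malformed instead of failing silently.
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


def count_by(records: Iterable[dict], key: Callable[[dict], object]) -> Counter:
    """Tally records by the value that `key` extracts from each one."""
    return Counter(key(record) for record in records)


if __name__ == "__main__":
    # Hypothetical sample records: the "channel_id" field name is an assumption
    # made for illustration, not taken from the dataset.
    sample = io.StringIO(
        '{"id": 1, "channel_id": 10, "message": "hello"}\n'
        '{"id": 2, "channel_id": 10, "message": "world"}\n'
        "\n"
        '{"id": 3, "channel_id": 20, "message": "hi"}\n'
    )
    counts = count_by(iter_ndjson(sample), key=lambda m: m.get("channel_id"))
    print(counts.most_common())  # [(10, 2), (20, 1)]
```

To run it on an actual file, pass an open text handle instead of the in-memory sample, e.g. `with open("Messages.ndjson", encoding="utf-8") as f: counts = count_by(iter_ndjson(f), key=...)`; iterating a Python file object yields one line at a time, so memory use stays roughly constant regardless of file size.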
Statistic         All versions  This version
Views                    3,733         3,733
Downloads                4,593         4,593
Data volume           178.2 TB      178.2 TB
Unique views             3,279         3,279
Unique downloads         1,896         1,896

