Published January 14, 2020 | Version v1
Conference paper Open

The Pushshift Telegram Dataset

  • 1.
  • 2. Max Planck Institute
  • 3. Elon University
  • 4. Binghamton University


The dataset consists of three files:

Accounts.ndjson: Provides data for 2.2M Telegram users that were active in the channels we crawled.

Channels.ndjson: Provides data for 28K Telegram channels that we crawled.

Messages.ndjson: Provides data for 317M Telegram messages that were posted by 2.2M Telegram users in 28K Telegram channels.

Each file is in newline-delimited JSON (ndjson) format, with one JSON object per line holding the data for a single account, channel, or message. Each object follows the format of the Telethon API, which is a Python interface for Telegram's API.
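
Because each line is an independent JSON object, the files can be read as a stream without loading them into memory, which matters for the messages file. The following is a minimal sketch, assuming the file has already been downloaded and is available as uncompressed UTF-8 text; the helper names `iter_ndjson` and `count_records` are illustrative and not part of the dataset or of Telethon, and no particular field names are assumed.

```python
import json
from typing import Iterator


def iter_ndjson(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an ndjson file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)


def count_records(path: str) -> int:
    """Count the records in an ndjson file without holding them in memory."""
    return sum(1 for _ in iter_ndjson(path))
```

For example, `count_records("Channels.ndjson")` would stream through the channels file once, assuming it is stored under that name in the working directory. Inspecting the keys of the first few objects is a reasonable way to learn the actual record layout before writing any analysis code.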


Files (52.1 GB)

Name Size
125.2 MB
7.3 MB
51.9 GB