Published January 14, 2020 | Version v1
Conference paper
Open access
The Pushshift Telegram Dataset
1. Pushshift.io
2. Max Planck Institute
3. Elon University
4. Binghamton University
Description
The dataset consists of three files:
- Accounts.ndjson: data for 2.2M Telegram users who were active in the channels we crawled.
- Channels.ndjson: data for the 28K Telegram channels we crawled.
- Messages.ndjson: data for 317M Telegram messages posted by those 2.2M users in those 28K channels.
Each file is in newline-delimited JSON (NDJSON) format: every line holds one JSON object with the data for a single account, channel, or message. The objects follow the format of the Telethon API (https://docs.telethon.dev/en/latest/), a Python interface to Telegram's API.
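Because each line is a self-contained JSON object, the files can be streamed one record at a time instead of being loaded whole, which matters for the multi-gigabyte files. The sketch below uses only Python's standard library and assumes an uncompressed file with the `.ndjson` names given above; the helper names `iter_ndjson` and `count_records` are hypothetical, and no particular Telethon field names are assumed.

```python
import json
import sys
from typing import Any, Dict, Iterator


def iter_ndjson(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per line without reading the whole file into memory."""
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline at end of file
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report where the bad record is instead of silently skipping it.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def count_records(path: str) -> int:
    """Count records by streaming through the file once."""
    return sum(1 for _ in iter_ndjson(path))


if __name__ == "__main__":
    # Hypothetical usage: python count_ndjson.py Channels.ndjson Accounts.ndjson
    for p in sys.argv[1:]:
        print(p, count_records(p))
```

Raising on a malformed line (with its line number) is a deliberate choice here; a crawl-scale analysis might instead log and skip such lines.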
Files (52.1 GB)
| MD5 checksum | Size |
|---|---|
| a68ec75718218796caba58b8a06f871b | 125.2 MB |
| 80e678333976597384dcda0054677b67 | 7.3 MB |
| ed90ff295afebccd47262280aecf33f2 | 51.9 GB |
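The MD5 checksums listed for the files can be used to confirm that a download completed intact. A minimal sketch using Python's standard `hashlib`, assuming the file is already on local disk; the helper names and the file path in the usage comment are placeholders. Hashing in fixed-size chunks keeps memory use constant even for the largest file.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Hypothetical usage; pick the checksum from the table row whose size matches the file:
# verify("path/to/downloaded/file", "80e678333976597384dcda0054677b67")
```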