Preview of the TeraGram dataset
Description
This is a preview of the TeraGram dataset that contains approximately 1% of the downloaded chats of the full dataset.
TeraGram is a dataset of publicly available messages from the Telegram platform. The full version of the dataset contains metadata for over 5.9 billion messages from 712 thousand channels and groups, spanning the period from 2015 to 2025. For convenience, we provide here a sample of the full dataset in CSV format that contains all data related to 7,000 chats.
For details regarding dataset collection and preliminary results, please refer to our paper "TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger" (currently under review).
A datasheet for the dataset is available here.
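Because the sample is distributed as CSV files, some of them several gigabytes in size, it can help to inspect a file's header and first rows before loading it in full. The following is a minimal sketch using only the Python standard library. It assumes the files are comma-delimited, UTF-8 encoded, and start with a header row; none of this is confirmed by the description above, and the function name `peek_csv` is hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def peek_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) without reading the whole file.

    Assumes a comma-delimited, UTF-8 encoded file whose first row is a header.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # stops reading after n_rows rows
    return header, rows


if __name__ == "__main__":
    # chat_language.csv is a file name shown on this page; the local path
    # is an assumption about where the download was saved.
    sample = Path("chat_language.csv")
    if sample.exists():
        header, rows = peek_csv(sample)
        print(header)
        for row in rows:
            print(row)
```

Reading lazily through `csv.reader` keeps memory use small regardless of file size, which matters for the multi-gigabyte files in this record.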
Files
chat_language.csv
Total size: 7.3 GB
| MD5 checksum | Size |
|---|---|
| f59c94c8a2a1e2259738c8f82d23e263 | 232.3 kB |
| 525688fcf1dfecd6babc24078cbe8603 | 1.7 MB |
| 07831f80db10ab85b980f936b0db48c5 | 1.5 MB |
| dc86dc12de5d9760d11acb67b91659e2 | 2.3 GB |
| 11ebdd8783c0e2c4aefe558228dc937d | 210.4 MB |
| 339c98bd3eaa1b44db89f7fb2940f171 | 1.6 GB |
| 2c5671cfe308010b9ef3eb45a3bf74a0 | 55.9 MB |
| 3be1d2c1d27805b704b9050579909675 | 34.7 MB |
| 388fe4d7fcdb630bdeda89dda6d5c0f6 | 2.9 GB |
| d7c27d018f8e7cfa1b57974ce2adaa54 | 236.4 MB |
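The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The sketch below hashes a file in chunks so that the multi-gigabyte files do not need to fit in memory. The helper names `md5_of_file` and `verify_md5` and the local file name are hypothetical; the pairing of a local file with a checksum must be taken from the record page, since the mapping of file names to checksums is not shown here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches `expected`.

    `expected` may be given with or without the "md5:" prefix.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Placeholder pairing: replace both values with the file you downloaded
    # and the checksum listed next to it on the record page.
    local_file = Path("downloaded_file.csv")
    listed_checksum = "md5:f59c94c8a2a1e2259738c8f82d23e263"
    if local_file.exists():
        print("OK" if verify_md5(local_file, listed_checksum) else "MISMATCH")
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again.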
Additional details
Software
- Repository URL: https://github.com/Priesemann-Group/telegram_quality_control
- Programming language: Python