Published January 15, 2026 | Version 0.1.0
Dataset Open

Preview of the TeraGram dataset

Description

This preview of the TeraGram dataset contains approximately 1% of the chats downloaded for the full dataset.

TeraGram is a dataset of publicly available messages from the Telegram platform. The full version of the dataset contains metadata for over 5.9 billion messages from 712 thousand channels and groups, spanning the period from 2015 to 2025. For convenience, we provide here a sample of the full dataset in CSV format that contains all data related to 7,000 chats.

For details regarding dataset collection and preliminary results, please refer to our paper "TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger" (currently under review). 

A datasheet for the dataset is available here

Files

chat_language.csv

Files (7.3 GB)

Checksum                                Size
md5:f59c94c8a2a1e2259738c8f82d23e263    232.3 kB
md5:525688fcf1dfecd6babc24078cbe8603    1.7 MB
md5:07831f80db10ab85b980f936b0db48c5    1.5 MB
md5:dc86dc12de5d9760d11acb67b91659e2    2.3 GB
md5:11ebdd8783c0e2c4aefe558228dc937d    210.4 MB
md5:339c98bd3eaa1b44db89f7fb2940f171    1.6 GB
md5:2c5671cfe308010b9ef3eb45a3bf74a0    55.9 MB
md5:3be1d2c1d27805b704b9050579909675    34.7 MB
md5:388fe4d7fcdb630bdeda89dda6d5c0f6    2.9 GB
md5:d7c27d018f8e7cfa1b57974ce2adaa54    236.4 MB
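
Because several of the files are multiple gigabytes, it may be worth checking a download against its listed MD5 checksum before use. The following is a minimal sketch in Python, not part of the dataset's own tooling: the function names `md5_of_file` and `matches_listed_checksum` are hypothetical helpers introduced here, and the file is read in chunks so that the larger files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex.

    Hypothetical helper: the 'md5:' prefix handling simply mirrors how the
    checksums are printed in the file list.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("some_file.csv", "md5:f59c94c8a2a1e2259738c8f82d23e263")` returns `True` only if the local file matches that checksum. Here `some_file.csv` is a placeholder path; the listing does not show which file name each checksum belongs to, so the pairing has to be taken from the record page itself.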

Additional details

Software