Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict
Creators
- 1. University of Potsdam
- 2. TIB - Leibniz Information Centre for Science and Technology
Description
We present a dataset of tweets about the Russo-Ukrainian war posted by news media channels worldwide. The dataset spans February 2022 to May 2023. It is unique in its global scope, covering tweets in many languages and from different parts of the world. Additionally, we extracted the stance, sentiment, and prominent entities and concepts that occur in the tweets in order to answer questions about the discourse: who says what (prominent entities), who stands where on which aspect (stance and prominent concepts), and how the aspects are portrayed (sentiment). We also downloaded the images attached to the tweets and classified them to extract image tags for each image. The dataset includes 1,524,832 tweets in 60 languages, of which 306,295 have images.
The source code for the collection and processing of tweets can be found here: https://github.com/sherzod-hakimov/ru-ua-news-discourse-twitter
Each entry in the dataset is a single JSON line with the following fields:
{
'tweet_id':
'lang':
'stanza_output':
'stanza_named_entities':
'sentiment':
'stance':
'channel':
'country':
'verified':
'image_tags':
}
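Because each entry is one JSON line, the file can be streamed without loading it into memory at once. The following is a minimal sketch, not the authors' code: the helper names `iter_entries` and `count_by_field` are hypothetical, and it assumes that fields such as 'lang' and 'country' hold plain string values, which the field list above does not confirm.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between entries
            yield json.loads(line)


def count_by_field(entries: Iterable[dict], field: str) -> Counter:
    """Count entries per value of a top-level field such as 'lang'.

    Assumes the field value is hashable (for example a string);
    entries that lack the field are counted under None.
    """
    return Counter(entry.get(field) for entry in entries)
```

With a downloaded file, an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as handle: count_by_field(iter_entries(handle), "lang")`, where `path` is wherever the data file was saved.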
If you need access to the full text of the dataset, please contact us by email: sherzodhakimov (at sign) gmail.com
If you find the resources useful, please cite us:
```
@misc{hakimov2023unveiling,
title={Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict},
author={Sherzod Hakimov and Gullal S. Cheema},
year={2023},
eprint={2306.12886},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Files
Total size: 7.5 GB. The record includes a README.md.
Size | Checksum |
---|---|
7.5 GB | md5:bd384c37b2c330b1b0314d91e83a1132 |
1.6 kB | md5:b1602f31e93e63a14d1fa58fa80dd549 |
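The md5 values listed for the files can be used to check that a download completed without corruption. The sketch below shows one way to do this with Python's standard `hashlib` module; the function names are hypothetical, and the path of the downloaded file is whatever location you saved it to.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so a multi-gigabyte file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(path, "md5:bd384c37b2c330b1b0314d91e83a1132")` should return `True` for an intact copy of the 7.5 GB file.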