Published January 24, 2024 | Version v2
Dataset Open

Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict

  • 1. University of Potsdam
  • 2. TIB - Leibniz Information Centre for Science and Technology

Description

We present a dataset of tweets from news media channels worldwide that pertain to the Russo-Ukrainian war. The dataset spans February 2022 to May 2023. It is unique in its global scope, encompassing tweets in many languages and from different parts of the world. We also extracted the stance, sentiment, and prominent entities and concepts that occur in the tweets, so that the data can answer questions about the discourse: who says what (prominent entities), who takes which position (stance) on which aspect (prominent concepts), and how those aspects are portrayed (sentiment). In addition, we downloaded the images attached to the tweets and classified them to extract tags for each image. The dataset includes 1,524,826 tweets in 60 languages, of which 306,295 have images.

The source code for collecting and processing the tweets is available here: https://github.com/sherzod-hakimov/ru-ua-news-discourse-twitter

Each entry in the dataset is a single JSON line with the following fields:

{
'tweet_id': 
'lang':
'stanza_output':
'stanza_named_entities':
'sentiment':
'stance':
'channel':
'country': 
'verified':
'image_tags': }
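
As a minimal sketch of how such a file could be read, the snippet below parses JSON lines one at a time and counts entries per value of a chosen field. It only assumes that each line is a standalone JSON object carrying the field names listed above. The helper names `iter_entries` and `count_by_field` and the sample values are hypothetical and used purely for illustration; the actual file name and value types are not specified here, so check them against the downloaded data.

```python
import io
import json
from collections import Counter


def iter_entries(lines):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_field(entries, field):
    """Count entries by the value of a top-level field.

    Intended for scalar fields such as 'lang' or 'country';
    entries that lack the field are counted under None.
    """
    return Counter(entry.get(field) for entry in entries)


# Hypothetical sample lines; the values are invented for illustration only.
sample = io.StringIO(
    '{"tweet_id": "1", "lang": "en", "country": "US"}\n'
    '{"tweet_id": "2", "lang": "de", "country": "DE"}\n'
    '{"tweet_id": "3", "lang": "en", "country": "GB"}\n'
)

lang_counts = count_by_field(iter_entries(sample), "lang")
print(lang_counts.most_common())
```

To run this on the real data, replace `sample` with an opened file handle. Reading line by line keeps memory use low, which matters for a multi-gigabyte file.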
 

If you need access to the full text of the dataset, please contact us by email: sherzodhakimov (at sign) gmail.com

If you find these resources useful, please cite us:

```

@inproceedings{hakimov2023unveiling,
      title={Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict}, 
      author={Sherzod Hakimov and Gullal S. Cheema},
      booktitle={Proceedings of the 2024 {ACM} International Conference on Multimedia Retrieval, {ICMR} 2024},
      year={2024}
}
```

Files

README.md

Files (7.6 GB)

  • md5:d2b86ab6e064fb382292b861d29be464 (7.6 GB)
  • md5:8cd8ef79b87dbde066b54cf32b277be6 (1.7 kB)