There is a newer version of the record available.

Published October 20, 2020 | Version 1.0.2
Dataset Open

A Data set for Information Spreading over the News

  • 1. Jozef Stefan Institute and Jozef Stefan International Postgraduate School

Description

Abstract:

Analyzing the spread of information related to a specific event in the news has many potential applications. Consequently, various systems have been developed to facilitate the analysis of information spreading, such as detecting disease propagation and identifying the spread of fake news through social media. There are several open challenges in the process of discerning information propagation, among them the lack of resources for training and evaluation. This paper describes the process of compiling a corpus from the EventRegistry global media monitoring system. We focus on information spreading in three domains: sports (i.e., the FIFA World Cup), natural disasters (i.e., earthquakes), and climate change (i.e., global warming). This corpus is a valuable addition to the currently available datasets for examining the spreading of information about various kinds of events.

Introduction:

Domain-specific gaps in information spreading are ubiquitous and may exist due to economic conditions, political factors, or linguistic, geographical, time-zone, cultural, and other barriers. These factors can obstruct the flow of local as well as international news. We believe that few research studies examine, identify, and uncover the reasons for barriers in information spreading. Additionally, few datasets are available that contain news text together with metadata such as time, place, source, and other relevant information. When a piece of information starts spreading, it implicitly raises questions such as:

  1. How far does the information, in the form of news, reach the public?
  2. Does the content of the news remain the same, or does it change to some extent?
  3. Do cultural values affect the information, especially when the same news is translated into other languages?

Statistics about datasets:

------------------------------------------------------------------------------------------------------
#   Domain             Event Type        Articles Per Language                      Total Articles
------------------------------------------------------------------------------------------------------
1   Sports             FIFA World Cup    983-en, 762-sp, 711-de, 10-sl, 216-pt      2679
2   Natural Disaster   Earthquake        941-en, 999-sp, 937-de, 19-sl, 251-pt      3194
3   Climate Changes    Global Warming    996-en, 298-sp, 545-de, 8-sl, 97-pt        1945
------------------------------------------------------------------------------------------------------
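
The per-language counts in the table can be tallied to see how unevenly the languages are represented and how the sums compare with the Total Articles column. The following is a minimal sketch with the numbers copied from the table; the dictionary layout and the helper name `language_shares` are illustrative assumptions, not part of the dataset. The record does not state how the totals were computed, so a difference between a sum and the listed total should be read as something to check against the data files, not as a confirmed error.

```python
# Per-language article counts copied from the statistics table
# (language codes as written in the table: en, sp, de, sl, pt).
counts = {
    "FIFA World Cup": {"en": 983, "sp": 762, "de": 711, "sl": 10, "pt": 216},
    "Earthquake": {"en": 941, "sp": 999, "de": 937, "sl": 19, "pt": 251},
    "Global Warming": {"en": 996, "sp": 298, "de": 545, "sl": 8, "pt": 97},
}

# Values from the "Total Articles" column of the same table.
listed_totals = {
    "FIFA World Cup": 2679,
    "Earthquake": 3194,
    "Global Warming": 1945,
}


def language_shares(per_language):
    """Return each language's fraction of the summed per-language counts."""
    total = sum(per_language.values())
    return {lang: n / total for lang, n in per_language.items()}


if __name__ == "__main__":
    for event, per_language in counts.items():
        summed = sum(per_language.values())
        shares = language_shares(per_language)
        print(f"{event}: summed={summed}, listed={listed_totals[event]}, "
              f"difference={summed - listed_totals[event]}")
        for lang, share in shares.items():
            print(f"  {lang}: {share:.1%}")
```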

Files

Earthquake - Metadata.csv

Files (24.3 MB)

Checksum                                Size
md5:20fdd239047b52ce756640ba18a5a438    880.5 kB
md5:44320c69650daf92dd2dd349625ee7c7    6.3 MB
md5:af0b0bf78d17378ca23da612387248c6    1.6 MB
md5:aa6e04d49e0f0485ae893a1587c50206    5.9 kB
md5:2c998649ea8ec6981fd96cb9f50021cb    9.6 kB
md5:e72d21452d0e3d77a8ee039a52d9fbfa    2.8 kB
md5:b123d87240e9554f8b1421d4a4c82292    44.3 kB
md5:21d9562f9f29aaf38bc03d87d7f477b8    32.1 kB
md5:ea3c48c66709db95af475666f6c5ad5f    8.0 kB
md5:840044f4bcf56598dd6333b1c541df41    6.5 MB
md5:67c1f47c271930cbbd34dfb595109a25    761.7 kB
md5:a5a6af65128797cd1acca7dd33560e0c    1.3 MB
md5:966bda8fc63290c7f7f1087e49992383    13.0 kB
md5:a69f7a66cc18674cda62c6f7de7e3a58    11.4 kB
md5:f44d388484cb5efc4ce66177fe4ef058    4.1 kB
md5:6b88d23e645b3637e885bb8f5f389df5    20.7 kB
md5:e23e567caccc3c3b11892d98e2724644    10.9 kB
md5:1038e0875b1d9357fdc525008e928a27    8.4 kB
md5:4811da7726f19d3afe429efef0087da5    5.3 MB
md5:9aab8ab7156e06651b9e69f10f8ad103    557.0 kB
md5:78451b831912313821db15307e9ee1f3    945.7 kB
md5:7592d2abc8afa143d71e9e9fccaac724    3.9 kB
md5:083d8162ba96d313d29928a430b2146d    8.0 kB
md5:dd0758b4d2a26f81bd5cb108a23d1619    4.3 kB
md5:5ae1aa036c2099acecaab7f19e1fbe5f    14.5 kB
md5:7fbfee6b67a10417faef28aeefea01ee    5.3 kB
md5:aa231fffddc382a32de7968ae36ac479    9.2 kB
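
Each file in the list above is accompanied by an MD5 checksum, which can be used to confirm that a download is complete and unaltered. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `matches_checksum` are illustrative assumptions, and the listing here does not show which checksum belongs to which file name, so the expected value has to be taken from the record page entry for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that multi-megabyte files
    do not have to be loaded into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file's MD5 with a listed value.

    `listed` may be written with or without the 'md5:' prefix
    used in the file list.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, after downloading a file, `matches_checksum(local_path, listed_value)` returns `True` only if the local copy hashes to the listed value; both arguments are placeholders to be filled in by the reader.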

Additional details

Funding

Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy 812997
European Commission