Augmented dataset of rumours and non-rumours for rumour detection

Sooji Han; Jie Gao; Fabio Ciravegna

doi:10.5281/zenodo.3249977

Published June 19, 2019 | Version 1.0

Dataset Open


1. University of Sheffield

This data set contains a collection of Twitter rumours and non-rumours posted during three real-world events: 1) 2013 Boston marathon bombings, 2) 2014 Ottawa shooting, 3) 2014 Sydney siege.

It is an augmented version of the PHEME dataset of rumours and non-rumours, built from two data sets: the PHEME data [1] (downloaded via https://figshare.com/articles/PHEME_dataset_for_Rumour_Detection_and_Veracity_Classification/6392078), and the CrisisLexT26 data [2] (downloaded via https://github.com/sajao/CrisisLex/tree/master/data/CrisisLexT26/2013_Boston_bombings).

This data is the first released version of our data augmentation project. More data sets will be released from our follow-up work.

The statistics of the released (v1.0) rumour data for the "2013 Boston marathon bombings" event are as follows:

* 2013 Boston marathon bombings: 165 rumours and 228 non-rumours (1,238 rumours and 3,714 non-rumours before context filtering)

Augmented data for the "2014 Ottawa Shooting" and "2014 Sydney Siege" will be available shortly.

The data structure follows the format of the PHEME data [1]. Each event has a directory with two subfolders, rumours and non-rumours. Each of these two folders contains folders named after a tweet ID. The tweet itself can be found in the 'source-tweet' directory of the tweet in question, and the 'reactions' directory holds the set of tweets responding to that source tweet. Each folder also contains 'aug_complete.csv' and 'reference.csv'.
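The layout described above can be traversed with a short script. The following is a minimal sketch, not part of the released data: the function name list_threads is hypothetical, and it assumes only the directory names given in the description ('rumours', 'non-rumours', 'reactions'). It makes no assumption about the format of the tweet files and simply counts them.

```python
from pathlib import Path


def list_threads(event_dir):
    """Map each label to a list of (tweet_id, reaction_count) pairs."""
    event_dir = Path(event_dir)
    threads = {}
    for label in ("rumours", "non-rumours"):
        label_dir = event_dir / label
        entries = []
        if label_dir.is_dir():
            for thread_dir in sorted(label_dir.iterdir()):
                # Skip plain files such as aug_complete.csv and reference.csv.
                if not thread_dir.is_dir():
                    continue
                reactions_dir = thread_dir / "reactions"
                count = 0
                if reactions_dir.is_dir():
                    # Count visible files only; the file format is not assumed.
                    count = sum(
                        1 for p in reactions_dir.iterdir()
                        if p.is_file() and not p.name.startswith(".")
                    )
                entries.append((thread_dir.name, count))
        threads[label] = entries
    return threads
```

Calling list_threads on an extracted event directory returns a dictionary with the keys 'rumours' and 'non-rumours'. Source tweets that have no 'reactions' directory are reported with a count of 0.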

The 'aug_complete.csv' file contains the metadata (tweet ID, tweet text, timestamp, and rumour label) of the augmented tweets before deduplication and before the removal of tweets without context (i.e., without replies).

The 'reference.csv' file contains manually annotated reference tweets [1, 2].
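Either CSV file can be inspected with the Python standard library before any further processing. The sketch below is illustrative only: the function name summarise_csv is hypothetical, and it assumes that the files are UTF-8 encoded and start with a header row. The exact column names are not given in this description, so the label column is passed in by the caller after checking the header.

```python
import csv
from collections import Counter


def summarise_csv(path, label_column=None):
    """Return (column_names, row_count, label_counts) for a CSV file.

    Assumes a header row and UTF-8 encoding; both are assumptions,
    not facts stated by the dataset description.
    """
    labels = Counter()
    n_rows = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        for row in reader:
            n_rows += 1
            if label_column is not None:
                # row.get returns None if the column does not exist.
                labels[row.get(label_column)] += 1
    return columns, n_rows, labels
```

A first call without label_column shows the actual column names. A second call with the name of the rumour-label column gives the class distribution, which can be compared with the counts reported before context filtering.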

If you use our augmented data (PHEME-Aug v1.0), please also cite:

Han, S., Gao, J., Ciravegna, F. (2019). "Data Augmentation for Rumor Detection Using Context-Sensitive Neural Language Model With Large-Scale Credibility Corpus", Seventh International Conference on Learning Representations (ICLR) LLD, May 2019, New Orleans, Louisiana, US

[1] Kochkina, E., Liakata, M., & Zubiaga, A. (2018). All-in-one: Multi-task Learning for Rumour Verification. COLING.

[2] Olteanu, A., Vieweg, S., & Castillo, C. (2015, February). What to expect when the unexpected happens: Social media communications across crises. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing (pp. 994-1009). ACM

Files

Files (650.4 kB)

Name	Size
aug-rnr-data.tar.bz2 (md5:05017d1d69563e01c372828aff5ccd80)	650.4 kB
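The MD5 checksum listed above can be used to check a downloaded copy of the archive before unpacking it. The following is a minimal sketch using only the Python standard library. The function names md5_of_file and verify_and_extract are hypothetical, and the local paths are chosen by the caller.

```python
import hashlib
import os
import tarfile

# Checksum published alongside aug-rnr-data.tar.bz2.
EXPECTED_MD5 = "05017d1d69563e01c372828aff5ccd80"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive_path, dest_dir, expected_md5=EXPECTED_MD5):
    """Extract a .tar.bz2 archive only if its MD5 checksum matches."""
    actual = md5_of_file(archive_path)
    if actual != expected_md5:
        raise ValueError(
            f"MD5 mismatch: expected {expected_md5}, got {actual}"
        )
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(dest_dir)
```

For example, verify_and_extract("aug-rnr-data.tar.bz2", "aug-rnr-data") unpacks the archive into a local directory and raises ValueError if the download is corrupted or incomplete.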


