Augmented dataset of rumours and non-rumours for rumour detection
Description
This data set contains a collection of Twitter rumours and non-rumours during three real-world events: 1) 2013 Boston marathon bombings, 2) 2014 Ottawa shooting, 3) 2014 Sydney siege.
It is an augmented version of the PHEME dataset of rumours and non-rumours, built from two data sets: the PHEME data [1] (downloaded via https://figshare.com/articles/PHEME_dataset_for_Rumour_Detection_and_Veracity_Classification/6392078) and the CrisisLexT26 data [2] (downloaded via https://github.com/sajao/CrisisLex/tree/master/data/CrisisLexT26/2013_Boston_bombings).
This is the first data release from our data augmentation project. More data sets will be released from our follow-up work.
The statistics of the released (v1.0) rumour data for the "2013 Boston marathon bombings" event are as follows:
* 2013 Boston marathon bombings: 165 rumours and 228 non-rumours (1,238 rumours and 3,714 non-rumours before context filtering)
Augmented data for the "2014 Ottawa Shooting" and "2014 Sydney Siege" will be available shortly.
The data structure follows the format of the PHEME data [1]. Each event has a directory with two subfolders, 'rumours' and 'non-rumours'. Each of these two folders contains folders named with a tweet ID. The tweet itself can be found in the 'source-tweet' directory of the tweet in question, and the 'reactions' directory holds the set of tweets responding to that source tweet. Each folder also contains 'aug_complete.csv' and 'reference.csv'.
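The layout described above can be traversed with a few lines of Python. This is a minimal sketch, not part of the release: the function names `list_threads` and `_files_in` are hypothetical, the folder names are taken from the description, and nothing is assumed about the file format inside 'source-tweet' and 'reactions' (files are listed, not parsed).

```python
from pathlib import Path


def _files_in(directory):
    """Return the sorted files directly inside `directory` (empty if absent)."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


def list_threads(event_dir, labels=("rumours", "non-rumours")):
    """Map each label folder to the tweet-ID folders found inside it.

    Folder names follow the description above; adjust `labels` or the
    'source-tweet' / 'reactions' names if the unpacked archive differs.
    """
    event_dir = Path(event_dir)
    threads = {}
    for label in labels:
        label_dir = event_dir / label
        threads[label] = []
        if not label_dir.is_dir():
            continue
        for thread_dir in sorted(label_dir.iterdir()):
            if not thread_dir.is_dir():
                continue  # skip plain files such as CSVs at this level
            threads[label].append({
                "tweet_id": thread_dir.name,
                "source": _files_in(thread_dir / "source-tweet"),
                "reactions": _files_in(thread_dir / "reactions"),
            })
    return threads
```

Calling `list_threads("path/to/event")` (the path is a placeholder) returns a dictionary keyed by 'rumours' and 'non-rumours', so `len(result["rumours"])` gives the number of rumour folders found on disk.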
The 'aug_complete.csv' file contains the metadata (tweet ID, tweet text, timestamp, and rumour label) of the augmented tweets before deduplication and before filtering out tweets without context (i.e., without replies).
The 'reference.csv' file contains the manually annotated reference tweets [1, 2].
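As a rough illustration of how the CSV files might be inspected, the sketch below reads a file with Python's standard `csv` module and counts rows per label. The exact column headers are not stated in this description, so the helper names `read_rows` and `count_labels` are hypothetical and the label column name is left as a parameter; check the header row returned by `read_rows` first. The `csv` module returns every field as a string, which keeps long tweet IDs intact.

```python
import csv
from collections import Counter


def read_rows(csv_path, encoding="utf-8"):
    """Return (column_names, rows) where each row is a dict of strings.

    The encoding is an assumption; change it if the file fails to decode.
    """
    with open(csv_path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def count_labels(rows, label_column):
    """Count rows per value of `label_column` (name supplied by the caller)."""
    return Counter(row[label_column] for row in rows)
```

A typical use is to call `read_rows` on 'aug_complete.csv', print the returned column names, and then pass whichever column holds the rumour label to `count_labels`.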
If you use our augmented data (PHEME-Aug v1.0), please also cite:
Han, S., Gao, J., Ciravegna, F. (2019). "Data Augmentation for Rumor Detection Using Context-Sensitive Neural Language Model With Large-Scale Credibility Corpus", Seventh International Conference on Learning Representations (ICLR) LLD, May 2019, New Orleans, Louisiana, US
[1] Kochkina, E., Liakata, M., & Zubiaga, A. (2018). All-in-one: Multi-task Learning for Rumour Verification. COLING.
[2] Olteanu, A., Vieweg, S., & Castillo, C. (2015, February). What to expect when the unexpected happens: Social media communications across crises. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing (pp. 994-1009). ACM.
Files
Size: 650.4 kB
md5:05017d1d69563e01c372828aff5ccd80