Disaster Tweet Corpus 2020

Matti Wiegmann; Jens Kersten; Friederike Klan; Martin Potthast; Benno Stein

doi:10.5281/zenodo.3713920

Published March 17, 2020 | Version 1.0.0

Dataset Open


1. Bauhaus-Universität Weimar
2. German Aerospace Center (DLR)
3. Leipzig University

This dataset consists of tweets collected during 48 disasters across 10 disaster types, with human annotations indicating whether each tweet is related to the respective disaster. The collection is intended as a benchmark dataset for filtering algorithms.

Dataset Specification

Tweets are split into one file per disaster, and each file contains a balanced number of positive and negative examples. The naming scheme is as follows:

<disaster type>-<name or region>[-<sub-type>]-<year>.ndjson
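As a rough illustration of the naming scheme, the sketch below splits a file name into its parts. It assumes that the disaster type contains no hyphen and that the year has four digits. The pattern alone cannot tell an optional sub-type apart from a hyphenated name or region, so the middle part is returned unsplit. The names `CorpusFileName` and `parse_corpus_file_name` and the file names in the usage note are hypothetical and not taken from the corpus.

```python
import re
from typing import NamedTuple, Optional


class CorpusFileName(NamedTuple):
    disaster_type: str
    name_and_subtype: str  # "<name or region>" plus the optional "-<sub-type>"
    year: int


# Assumption: the disaster type has no hyphen and the year is four digits.
_PATTERN = re.compile(r"^(?P<type>[^-]+)-(?P<middle>.+)-(?P<year>\d{4})\.ndjson$")


def parse_corpus_file_name(file_name: str) -> Optional[CorpusFileName]:
    """Return the parts of a corpus file name, or None if it does not match."""
    match = _PATTERN.match(file_name)
    if match is None:
        return None
    return CorpusFileName(match["type"], match["middle"], int(match["year"]))
```

For a made-up name such as `earthquake-nepal-2015.ndjson`, this yields the type `earthquake`, the middle part `nepal`, and the year `2015`. Names that do not follow the scheme return `None`.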

Each line in the data files is a complete JSON object containing the tweet ID, the text, and the annotation:

{"id": "12345", "text": "let's all pray for nepal!", "relevance": 1}
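Because every line is an independent JSON object, a file can be read line by line without loading it whole. The following is a minimal sketch using only the Python standard library. The helper names `read_tweets` and `count_relevance` are hypothetical, and treating `0` as the label for unrelated tweets is an assumption, since the example above only shows `1`.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def read_tweets(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed tweet per non-empty line of NDJSON input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def count_relevance(lines: Iterable[str]) -> Counter:
    """Count how often each value of the "relevance" field occurs."""
    # Assumption: 1 marks a related tweet; 0 is presumed to mark an unrelated one.
    return Counter(tweet["relevance"] for tweet in read_tweets(lines))
```

Both helpers accept any iterable of lines, so an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as handle: counts = count_relevance(handle)`. On a file from this corpus, the counts for the two labels should come out roughly equal, given the balanced design described above.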

References

To reference this collection as a whole, please use the following citation:

Wiegmann, M., Kersten, J., Klan, F., Potthast, M., Stein, B. (2020). Analysis of Filtering Models for Disaster-Related Tweets. Proceedings of the 17th ISCRAM.

This dataset compiles tweets that were collected, annotated, and published in several other works. Please consider citing those as well:

1. Imran, M., Castillo, C., Lucas, J., Meier, P., and Vieweg, S. (2014). AIDR: artificial intelligence for disaster response. In: WWW (Companion Volume).

2. Olteanu, A., Castillo, C., Diaz, F., and Vieweg, S. (2014). CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. Proceedings of the 8th ICWSM.

3. Olteanu, A., Vieweg, S., and Castillo, C. (2015). What to Expect When the Unexpected Happens: Social Media Communications Across Crises. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing.

4. Imran, M., Mitra, P., and Srivastava, J. (2016). Enabling Rapid Classification of Social Media Communications During Crises. IJISCRAM 8.

5. Alam, F., Ofli, F., and Imran, M. (2018). CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. Proceedings of the 12th ICWSM.

6. Stowe, K., Palmer, M., Anderson, J., Kogan, M., Palen, L., Anderson, K. M., Morss, R., Demuth, J., and Lazrus, H. (2018). Developing and Evaluating Annotation Procedures for Twitter Data during Hazard Events. Proceedings of the LAW-MWE-CxG-2018.

7. McMinn, A. J., Moshfeghi, Y., and Jose, J. M. (2013). Building a Large-scale Corpus for Evaluating Event Detection on Twitter. Proceedings of the 22nd ACM CIKM.

Files (202.5 MB)

Name	Size	MD5
disaster-tweet-filtering-incident-tweets.zip	7.8 MB	c61c2cb776f420fc51e86731f3a2b544
disaster-tweet-filtering-tranquil-tweets.zip	194.7 MB	7752eb9c08c9eb587d0924f873e938df
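The MD5 checksums listed with the files can be used to check that a download arrived intact. The sketch below uses Python's standard `hashlib` module. The helper names `md5_of_file` and `is_intact` are hypothetical, and the local path passed in is whatever location the archive was saved to.

```python
import hashlib

# Checksums as listed for the two archives of this record.
EXPECTED_MD5 = {
    "disaster-tweet-filtering-incident-tweets.zip": "c61c2cb776f420fc51e86731f3a2b544",
    "disaster-tweet-filtering-tranquil-tweets.zip": "7752eb9c08c9eb587d0924f873e938df",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the larger archive need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, file_name: str) -> bool:
    """Compare a downloaded file against the checksum listed for file_name."""
    return md5_of_file(path) == EXPECTED_MD5[file_name]
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.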

