Disaster Tweet Corpus 2020
1. Bauhaus-Universität Weimar
2. German Aerospace Center (DLR)
3. Leipzig University
Description
This dataset consists of tweets collected during 48 disasters spanning 10 disaster types, with human annotations indicating whether each tweet is related to the respective disaster. The collection is intended as a benchmark dataset for filtering algorithms.
Dataset Specification
Tweets are split into one file per disaster, and each file contains a balanced number of positive and negative examples. The files are named according to the following scheme:
<disaster type>-<name or region>[-<sub-type>]-<year>.ndjson
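The naming scheme can be unpacked programmatically. The following is a minimal sketch, not part of the dataset's tooling: the function and class names are hypothetical, the example filenames are made up for illustration, and it assumes that no individual field contains a hyphen, which the scheme itself does not guarantee.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class DisasterFile(NamedTuple):
    """Fields of <disaster type>-<name or region>[-<sub-type>]-<year>.ndjson."""
    disaster_type: str
    name_or_region: str
    sub_type: Optional[str]
    year: int


def parse_disaster_filename(filename: str) -> DisasterFile:
    """Split a data file name into its fields.

    Assumption: fields are separated by single hyphens and contain none
    themselves, so 3 parts means "no sub-type" and 4 parts means "with sub-type".
    """
    path = Path(filename)
    if path.suffix != ".ndjson":
        raise ValueError(f"expected an .ndjson file, got {filename!r}")
    parts = path.stem.split("-")
    if len(parts) == 3:
        disaster_type, name_or_region, year = parts
        sub_type = None
    elif len(parts) == 4:
        disaster_type, name_or_region, sub_type, year = parts
    else:
        raise ValueError(f"cannot split {filename!r} unambiguously")
    return DisasterFile(disaster_type, name_or_region, sub_type, int(year))


# Hypothetical filenames, used only to illustrate the scheme.
print(parse_disaster_filename("earthquake-nepal-2015.ndjson"))
print(parse_disaster_filename("flood-region-subtype-2013.ndjson"))
```

If a name in the archive contains additional hyphens, the function raises a `ValueError` rather than guessing which field they belong to.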
Each line in the data files is a complete JSON object containing the tweet ID, the text, and the relevance annotation, for example:
{"id": "12345", "text": "let's all pray for nepal!", "relevance": 1}
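Because every line is a standalone JSON object, a file can be read one line at a time. The sketch below is illustrative only: the helper names are hypothetical, the second sample record is invented, and the use of `0` as the label for unrelated tweets is an assumption, since the example above only shows `"relevance": 1`.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def iter_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed tweet per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        yield json.loads(line)


def count_labels(lines: Iterable[str]) -> Dict[int, int]:
    """Count how often each value of the "relevance" field occurs."""
    counts: Counter = Counter()
    for tweet in iter_tweets(lines):
        counts[tweet["relevance"]] += 1
    return dict(counts)


# First record is the example from the description; the second is made up,
# and its label 0 is an assumed encoding for "not related".
SAMPLE = """\
{"id": "12345", "text": "let's all pray for nepal!", "relevance": 1}

{"id": "67890", "text": "made-up unrelated tweet", "relevance": 0}
"""

print(count_labels(SAMPLE.splitlines()))
```

To process an actual data file, pass an open file handle instead of the sample lines, for instance `with open(path, encoding="utf-8") as f: count_labels(f)`. Since the files are described as balanced, the counts for the two labels should come out roughly equal.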
References
To reference this collection as a whole, please use the following citation:
Wiegmann, M., Kersten, J., Klan, F., Potthast, M., Stein, B. (2020). Analysis of Filtering Models for Disaster-Related Tweets. Proceedings of the 17th ISCRAM.
This dataset compiles tweets that were collected, annotated, and published in several other works. Please consider citing those as well:
1. Imran, M., Castillo, C., Lucas, J., Meier, P., and Vieweg, S. (2014). AIDR: artificial intelligence for disaster response. In: WWW (Companion Volume).
2. Olteanu, A., Castillo, C., Diaz, F., and Vieweg, S. (2014). CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. Proceedings of the 8th ICWSM.
3. Olteanu, A., Vieweg, S., and Castillo, C. (2015). What to Expect When the Unexpected Happens: Social Media Communications Across Crises. Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing.
4. Imran, M., Mitra, P., and Srivastava, J. (2016). Enabling Rapid Classification of Social Media Communications During Crises. IJISCRAM 8.
5. Alam, F., Ofli, F., and Imran, M. (2018). CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. Proceedings of the 12th ICWSM.
6. Stowe, K., Palmer, M., Anderson, J., Kogan, M., Palen, L., Anderson, K. M., Morss, R., Demuth, J., and Lazrus, H. (2018). Developing and Evaluating Annotation Procedures for Twitter Data during Hazard Events. Proceedings of the LAW-MWE-CxG-2018.
7. McMinn, A. J., Moshfeghi, Y., and Jose, J. M. (2013). Building a Large-scale Corpus for Evaluating Event Detection on Twitter. Proceedings of the 22nd ACM CIKM.
Files
disaster-tweet-filtering-incident-tweets.zip
Total size: 202.5 MB
File (MD5 checksum) | Size |
---|---|
md5:c61c2cb776f420fc51e86731f3a2b544 | 7.8 MB |
md5:7752eb9c08c9eb587d0924f873e938df | 194.7 MB |
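The MD5 checksums listed for the files can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the local path in the usage note are hypothetical, and the listing above does not say which checksum belongs to which file name.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum, with or without the "md5:" prefix."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("downloads/archive.zip", "md5:7752eb9c08c9eb587d0924f873e938df")` would return `True` only if the file at that (hypothetical) path is byte-for-byte identical to the published 194.7 MB file.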