Published January 15, 2019 | Version 1
Dataset Open

A Twitter Dataset for Spatial Infectious Disease Surveillance

Description

Dengue is a mosquito-borne viral disease that infects millions of people every year, especially in developing countries. Some of the main challenges in fighting the disease are reporting risk indicators and rapidly detecting outbreaks. Traditional surveillance systems rely on passive reporting from health-care facilities, often ignoring human mobility and locating each individual by their home address. Yet geolocated data are becoming commonplace in social media, which is widely used as a means to discuss a large variety of health topics, including users' own health status. In this dataset paper, we make available two large collections of dengue-related, labeled Twitter data. One is a set of tweets available through the Streaming API using the keywords "dengue" and "aedes" from 2010 to 2016. The other is the set of all geolocated tweets in Brazil during 2015 (also available through the Streaming API). We detail the process of collecting the tweets and of labeling each tweet that contains dengue-related keywords with one of five categories: personal experience, information, opinion, campaign, and joke. This dataset can be useful for developing models for spatial disease surveillance, and also for scenarios such as understanding health-related content in a language other than English and studying human mobility.
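As a rough illustration of the keyword matching and the five-category label set described above, the following Python sketch may help. It is not the authors' collection or labeling code: the names `Label`, `KEYWORDS`, and `mentions_dengue` are hypothetical, and the case-insensitive substring match is an assumption about how the keywords might be applied.

```python
import re
from enum import Enum


class Label(Enum):
    """The five categories named in the description (identifiers are hypothetical)."""
    PERSONAL_EXPERIENCE = "personal experience"
    INFORMATION = "information"
    OPINION = "opinion"
    CAMPAIGN = "campaign"
    JOKE = "joke"


# Keywords named in the description; case-insensitive matching is an assumption.
KEYWORDS = ("dengue", "aedes")
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def mentions_dengue(text: str) -> bool:
    """Return True if the text contains any of the keywords."""
    return _PATTERN.search(text) is not None
```

A tweet that passes `mentions_dengue` would then be a candidate for one of the `Label` values; how the labels were actually assigned is covered in the dataset paper, not here.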

Files

cities.zip

Files (1.8 GB)

Checksum                                Size
md5:896f2df001866828cb1f78dd36c3a46b    1.8 GB
md5:fb56dfa02c899cf64d505eec0b5a6c90    946 Bytes
md5:8d8cf763e8a5439402c3b974385b0426    613.7 kB