A Twitter Dataset for Spatial Infectious Disease Surveillance
Creators
- 1. Universidade Federal de Minas Gerais
Description
Dengue is a mosquito-borne viral disease which infects millions of people every year, specially in developing countries. Some of the main challenges facing the disease are reporting risk indicators and rapidly detecting outbreaks. Traditional surveillance systems rely on passive reporting from health-care facilities, often ignoring human mobility and locating each individual by their home address. Yet, geolocated data are becoming commonplace in social media, which is widely used as means to discuss a large variety of health topics, including the users' health status. In this dataset paper, we make available two large collections of dengue related labeled Twitter data. One is a set of tweets available through the Streaming API using the keywords dengue and aedes from 2010 to 2016. The other is the set of all geolocated tweets in Brazil during the year of 2015 (available also through the Streaming API). We detail the process of collecting and labeling each tweet containing keywords related to dengue in one of 5 categories: personal experience, information, opinion, campaign, and joke. This dataset can be useful for the development of models for spatial disease surveillance, but also scenarios such as understanding health-related content in a language other than English, and studying human mobility.