Published January 20, 2023 | Version v1
Dataset Open

TweetC19SR-Spa - Manually annotated dataset of Spanish language COVID-19 tweets containing self-reports of symptoms

Description

In this work, we release two expert-curated, manually annotated datasets of COVID-19 self-reported symptoms. The first dataset contains tweets in English and the second contains tweets in Spanish, for a total of around 36,500 tweets across both. These datasets were used for the Sixth and Seventh Workshops on Social Media Mining for Health (2021 and 2022).

Files

complete_training_set_spanish_mentions.zip

Files (1.3 MB)

MD5 checksum                            Size
md5:0b2b2dc22a05886bcf1bfaca2f8f20fb    927.3 kB
md5:48e085d402e5b1d09e27db779d25a123    11.4 kB
md5:94bb796da6947604d7c1a1547b00fb1c    335.6 kB
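
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch in Python, not part of the original record: the function names `md5_of_file` and `matches_listing` are hypothetical, and because the listing does not show which checksum belongs to which file name, the check only tests whether a local file matches any of the three listed digests.

```python
import hashlib
import sys

# MD5 digests copied from the file listing of this record.
# The listing does not pair them with file names, so they are kept as a set.
LISTED_MD5 = {
    "0b2b2dc22a05886bcf1bfaca2f8f20fb",
    "48e085d402e5b1d09e27db779d25a123",
    "94bb796da6947604d7c1a1547b00fb1c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 digest is one of the listed digests."""
    return md5_of_file(path) in LISTED_MD5


if __name__ == "__main__":
    # Paths are supplied by the user; no file names are assumed here.
    for path in sys.argv[1:]:
        status = "matches a listed checksum" if matches_listing(path) else "NO MATCH"
        print(f"{path}: {md5_of_file(path)} ({status})")
```

Saved under a name of your choosing, the script takes the paths of the downloaded files as command-line arguments and prints one line per file. A file reported as "NO MATCH" is worth downloading again before use.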

Additional details

Funding

Google (United States)
Towards more equitable representation of Latin American Spanish natural language processing resources for social media mining of health-related applications
Award for Inclusion Research program