Dataset: Characterizing Anti-Asian Rhetoric During The COVID-19 Pandemic: A Sentiment Analysis Case Study on Twitter

doi:10.5281/zenodo.6523152

Published May 6, 2022 | Version 1

Dataset Open

1. Georgia State University
2. University of Waterloo
3. University of California, Berkeley
4. Coronavirus Visualization Team

This record contains the dataset, trained model, and software companion for the paper "Characterizing Anti-Asian Rhetoric During The COVID-19 Pandemic: A Sentiment Analysis Case Study on Twitter", accepted at the Workshop on Data for the Wellbeing of Most Vulnerable at the ICWSM 2022 conference.

The COVID-19 pandemic has coincided with a measurable increase in the use of sinophobic comments and terms on online social media platforms. In the United States, Asian Americans have been the primary targets of violence and hate speech stemming from negative sentiments about the origins of the novel SARS-CoV-2 virus. While most published research focuses on extracting these sentiments from social media data, it does not connect specific news events during the pandemic with changes in negative sentiment on social media platforms. In this work, we combine and enhance publicly available resources with our own manually annotated set of tweets to build machine learning classification models that characterize sinophobic behavior. We then apply our classifier to a pre-filtered longitudinal dataset spanning two years of pandemic-related tweets and overlay our findings with relevant news events.

Files (1.2 GB)

Name	MD5	Size
buildClassical.py	d2e9ca2bd5078bcb007f168798d0f30d	4.7 kB
buildTransformers.py	8ece887971d16743b83eef42b3e52648	5.4 kB
covidbert.tar.gz	b944a9c86d2d26e53861c8079c0ec743	1.2 GB
ICWSM2022_new_dataset.tsv	b820032b8b6f239e0afcce79af1ab3a6	845.7 kB
prepDatasets.py	ee38f401427e345e7e93b7a43b216280	4.1 kB
Readme.pdf	16ea1c770535b5d705d25a0ee0381103	64.4 kB
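
The MD5 checksums in the file listing above can be used to confirm that each download arrived intact. The following is a minimal sketch, not part of the published software: it assumes the files were saved under their listed names in a single local directory, and the helper names `md5_of` and `verify` are hypothetical, introduced here only for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "buildClassical.py": "d2e9ca2bd5078bcb007f168798d0f30d",
    "buildTransformers.py": "8ece887971d16743b83eef42b3e52648",
    "covidbert.tar.gz": "b944a9c86d2d26e53861c8079c0ec743",
    "ICWSM2022_new_dataset.tsv": "b820032b8b6f239e0afcce79af1ab3a6",
    "prepDatasets.py": "ee38f401427e345e7e93b7a43b216280",
    "Readme.pdf": "16ea1c770535b5d705d25a0ee0381103",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the 1.2 GB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the downloads sit in the current working directory.
    for name, status in verify(".").items():
        print(f"{status:8s} {name}")
```

A file reported as `mismatch` was most likely truncated or corrupted in transit and should be downloaded again before it is used.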

Additional details

Is published in: Conference paper: 10.36190/2022.81 (DOI)

	All versions	This version
Views	176	174
Downloads	129	128
Data volume	26.2 GB	26.2 GB
