Datasets for "Auditing Elon Musk's Impact on Hate Speech and Bots"

doi:10.5281/zenodo.10578271

Published December 18, 2023 | Version v2

Dataset Open

1. Oregon State University
2. University of California Merced

Datasets for the publication "Auditing Elon Musk's Impact on Hate Speech and Bots" [1].

File information:

baseline_tweet_ids_2022.csv, hate_tweet_ids_2022.csv: Lists of tweet IDs and their corresponding dates from the "baseline" and "hate" samples of tweets used in the publication, respectively. The former is used to compute the number of baseline tweets each day ('baseline_freq.csv'), and the latter the number of hate tweets each day ('hate_freq.csv'). We share the date each tweet was posted as well as its tweet ID, from which you can find the original tweet's URL with the help of this web page. As you explore these data, you may notice a minority of cases in which hate tweets are not hateful or, conversely, baseline tweets are hateful. This is a product of the filtering method we used to collect and analyze tweets at scale. We welcome suggestions for improving the tweet filtering process.
baseline_freq.csv, hate_freq.csv: Number of collected tweets per day for the baseline and hate samples, respectively. The file 'freq_data.py' calculates these frequencies from the raw data. Consult it if you have questions about how the frequencies are calculated or want to change how the data are aggregated. Use these data to recreate Figure 2 from Hickey et al. [1]. See the Methods section of the publication for more details.
user_hate_levels_by_day.csv: CSV file with dates (YYYY-MM-DD format) and the mean proportion of slurs used by hateful users each day from October 1st to November 30th, 2022. Use these data to recreate Figure 1 from Hickey et al. [1]. See the Methods section of the publication for more details.
hate_keywords.txt: Words used to query the Twitter Academic API for hate tweets.
unfiltered_tweets_containing_hate_words.csv: All collected tweets containing hate words, with their values for the Perspective API attributes.
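
The per-day counts described above can be sketched as follows. This is a minimal illustration and not the contents of 'freq_data.py'. The column names `tweet_id` and `date` are assumptions, so check the CSV header first. The `https://twitter.com/i/web/status/<id>` URL pattern is also an assumption and may not match the lookup page the authors link to.

```python
import csv
import io
from collections import Counter


def daily_counts(csv_file, date_col="date"):
    """Count rows per calendar day; `date_col` is an assumed column name."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Keep only the YYYY-MM-DD part in case a time of day is present.
        counts[row[date_col][:10]] += 1
    return dict(sorted(counts.items()))


def tweet_url(tweet_id):
    """Build a tweet URL from an ID (assumed URL pattern, see lead-in)."""
    return f"https://twitter.com/i/web/status/{tweet_id}"


# Tiny in-memory sample standing in for hate_tweet_ids_2022.csv (made-up IDs).
sample = io.StringIO(
    "tweet_id,date\n"
    "1576000000000000001,2022-10-01\n"
    "1576000000000000002,2022-10-01\n"
    "1576400000000000003,2022-10-02\n"
)
counts = daily_counts(sample)
print(counts)
```

To run it on the real data, replace `sample` with an open file handle for one of the tweet ID files and adjust `date_col` to match the actual header.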

Reference:

[1] Hickey, D., Schmitz, M., Fessler, D. M. T., Smaldino, P., Muric, G., & Burghardt, K. (2023). Auditing Elon Musk's Impact on Hate Speech and Bots. In Proceedings of the 17th International AAAI Conference on Web and Social Media.

V2 note: This is an updated version of this dataset that was previously uploaded on 1/23/2024. In the previous version, tweet IDs were stored as floating-point values, which truncated them. In this version, that issue has been fixed and all IDs are complete.
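
The truncation mentioned in the V2 note can reappear on the reader's side if a tool parses the ID column as floating point. The sketch below uses a made-up ID of realistic magnitude and an assumed column name `tweet_id`. It shows why reading IDs as text is the safer default.

```python
import csv
import io

tweet_id = "1587000000000000001"  # made-up ID of realistic magnitude

# A 64-bit float carries 53 bits of precision, so the low digits are lost.
as_float = float(tweet_id)
round_tripped = str(int(as_float))
print(round_tripped == tweet_id)  # False

# Reading the column as text keeps every digit.
sample = io.StringIO("tweet_id,date\n1587000000000000001,2022-11-01\n")
ids = [row["tweet_id"] for row in csv.DictReader(sample)]
print(ids[0] == tweet_id)  # True
```

With pandas, passing `dtype=str` for the ID column to `read_csv` has the same effect.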

Files (204.6 MB)

Name	MD5	Size
baseline_freq.csv	3399486031fa5af35ad4b1d0531a241b	6.2 kB
baseline_tweet_ids_2022.csv	ce82759a3612454802660e56fdf139bb	196.9 MB
freq_data.py	c6c911a5f374dcac345c2ca680751546	1.1 kB
hate_freq.csv	32c914b00985270064ba5939e59ee295	5.0 kB
hate_keywords.txt	a5b976d51ed00ff08a86b10993367119	399 Bytes
hate_tweet_ids_2022.csv	883f88b69218fc30ee103e549e4c41d9	159.8 kB
unfiltered_tweets_containing_hate_words.csv	08c3c83517f57c28ef5bd818b1764b0c	7.5 MB
user_hate_levels_by_day.csv	3ccf8b4395247f5c4b64b373c81e9d07	2.0 kB
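
The MD5 values in the listing can be used to check that downloads are intact. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `check_downloads` are illustrative, and the files are assumed to sit in a single local directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "baseline_freq.csv": "3399486031fa5af35ad4b1d0531a241b",
    "baseline_tweet_ids_2022.csv": "ce82759a3612454802660e56fdf139bb",
    "freq_data.py": "c6c911a5f374dcac345c2ca680751546",
    "hate_freq.csv": "32c914b00985270064ba5939e59ee295",
    "hate_keywords.txt": "a5b976d51ed00ff08a86b10993367119",
    "hate_tweet_ids_2022.csv": "883f88b69218fc30ee103e549e4c41d9",
    "unfiltered_tweets_containing_hate_words.csv": "08c3c83517f57c28ef5bd818b1764b0c",
    "user_hate_levels_by_day.csv": "3ccf8b4395247f5c4b64b373c81e9d07",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so the largest file is not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Return 'ok', 'mismatch', or 'missing' for each listed file."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in check_downloads().items():
        print(f"{status:8} {name}")
```

A "mismatch" usually indicates an incomplete download or a file from a different version of the record.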

