Published December 18, 2023 | Version v2
Dataset Open

Datasets for "Auditing Elon Musk's Impact on Hate Speech and Bots"

  • 1. Oregon State University
  • 2. University of California Merced

Description

Datasets for the publication "Auditing Elon Musk's Impact on Hate Speech and Bots" [1].

File information:

  • baseline_tweet_ids_2022.csv, hate_tweet_ids_2022.csv: Lists of tweet IDs and their corresponding dates from the "baseline" and "hate" samples of tweets used in the publication, respectively. The former is used to compute the number of baseline tweets each day ('baseline_freq.csv'), and the latter the number of hate tweets each day ('hate_freq.csv'). We share the date each tweet was posted as well as its tweet ID, from which you can find the original tweet's URL with the help of this web page. As you explore these data, you may notice a minority of cases in which hate tweets are not hateful or, alternatively, baseline tweets are hateful. This is a product of the filtering method we used to collect and analyze tweets at scale. We always look forward to hearing your suggestions to improve the tweet filtering process.
  • baseline_freq.csv, hate_freq.csv: Number of collected tweets per day for the baseline and hate samples, respectively. The script 'freq_data.py' calculates these frequencies from the raw data. Consult it if you have questions about how the frequencies are calculated or want to change how the data are aggregated. Use these data to recreate Figure 2 from Hickey et al. [1]. See the Methods section of the publication for more details.
  • user_hate_levels_per_day.csv: CSV file with dates (YYYY-MM-DD format) and the mean proportion of slurs used by hateful users each day from October 1st to November 30th, 2022. Use these data to recreate Figure 1 from Hickey et al. [1]. See the Methods section of the publication for more details.
  • hate_keywords.txt: Words used to query the Twitter Academic API for hate tweets.
  • unfiltered_tweets_containing_hate_words.csv: All collected tweets containing hate words, together with their values for the Perspective API attributes.
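
The relationship between the tweet ID files and the per-day frequency files can be illustrated with a minimal sketch. This is not the authors' 'freq_data.py'; it only shows one plausible way to count tweets per day and to turn an ID into a URL. The column names `tweet_id` and `date`, the sample rows, and the URL pattern `https://twitter.com/i/web/status/<id>` are all assumptions made for illustration and may not match the actual files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real header names in
# hate_tweet_ids_2022.csv may differ from "tweet_id" and "date".
SAMPLE = """tweet_id,date
1585841080431321088,2022-10-28
1585841080431321089,2022-10-28
1586000000000000000,2022-10-29
"""


def daily_counts(rows, date_col="date"):
    """Count rows per calendar day and return a dict sorted by date."""
    counts = Counter(row[date_col] for row in rows)
    return dict(sorted(counts.items()))


def tweet_url(tweet_id):
    """Build a status URL from an ID kept as a string (assumed URL pattern)."""
    return f"https://twitter.com/i/web/status/{tweet_id}"


# csv.DictReader yields every field as a string, so the IDs keep all digits.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = daily_counts(rows)
urls = [tweet_url(row["tweet_id"]) for row in rows]
```

To run this against a downloaded file, `io.StringIO(SAMPLE)` could be swapped for an opened file handle, after checking the actual header names.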

Reference:

  1. Hickey, D., Schmitz, M., Fessler, D.M.T., Smaldino, P., Muric, G., & Burghardt, K. Auditing Elon Musk's Impact on Hate Speech and Bots. In Proceedings of the 17th International AAAI Conference on Web and Social Media (2023).

 

V2 note: This is an updated version of the dataset that was previously uploaded on 1/23/2024. In the previous version, tweet IDs were stored as floating-point values, which truncated them. In this version, that issue has been fixed and all IDs are complete.
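
The truncation described in the note can be reproduced in a few lines. The sketch below uses a made-up 19-digit ID, not one taken from the dataset: double-precision floats represent every integer exactly only up to 2**53, so larger IDs lose their low-order digits when converted.

```python
# Hypothetical 19-digit tweet ID, chosen only for illustration.
tweet_id = 1585841080431321089

# Largest range in which every integer is exactly representable as a float.
exact_limit = 2 ** 53

# Converting to float and back silently changes the low-order digits.
round_tripped = int(float(tweet_id))
lost_precision = round_tripped != tweet_id

# Keeping the ID as a string (or a Python int) preserves every digit.
as_text = str(tweet_id)
preserved = int(as_text) == tweet_id
```

When loading the ID files with a library that infers column types, it is therefore safer to request a string or 64-bit integer type for the ID column explicitly.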

Files (204.6 MB)

md5:3399486031fa5af35ad4b1d0531a241b (6.2 kB)
md5:ce82759a3612454802660e56fdf139bb (196.9 MB)
md5:c6c911a5f374dcac345c2ca680751546 (1.1 kB)
md5:32c914b00985270064ba5939e59ee295 (5.0 kB)
md5:a5b976d51ed00ff08a86b10993367119 (399 Bytes)
md5:883f88b69218fc30ee103e549e4c41d9 (159.8 kB)
md5:08c3c83517f57c28ef5bd818b1764b0c (7.5 MB)
md5:3ccf8b4395247f5c4b64b373c81e9d07 (2.0 kB)
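
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_of_file` and the chunk size are illustrative choices, and the path and expected checksum in the usage note are placeholders to be filled in from the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For a downloaded file, comparing `md5_of_file("<path to file>")` against the hex string that follows `md5:` in the matching entry indicates whether the file arrived unmodified. Reading in chunks keeps memory use low for the largest file in the listing.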