Published April 28, 2024 | Version v1
Dataset Open

Exploring Behavioral Tendencies on Social Media: A Perspective Through Claim Check-Worthiness

Description

Randomly Sampled Users Dataset (RSU.csv)

This dataset consists of 11,173 users collected through Twitter's APIs. We collected 10,000 random English tweets in February 2023 using Twitter's Volume Stream API. The tweets were posted by around 3,000 users. For each user, we collected up to 100 of their most recent followees using Twitter's Following API. Through the Timeline and Liking APIs, for each user, we collected their most recent tweets (up to 3,200, due to Twitter's limit) and liked tweets (also up to 3,200). We then filtered out users with insufficient tweets (fewer than 100 original tweets or fewer than 80 retweets/liked tweets) to ensure that per-user sample sizes are large enough for statistically meaningful analyses. The final dataset contains 11,173 users and 40,405,150 tweets.

Humanities Dataset (HUM.csv)

This dataset contains 341,285 tweets and 498 Twitter accounts from selected Twitter lists including Book Author, Christianity, Artists, Buddhism, Musician, and Philosophers. We used Twitter's List and Timeline APIs to collect the accounts and their most recent tweets (up to 1,000). The dataset was collected in January 2024.

Politics Dataset (POL.csv)

This dataset contains all tweets from selected U.S. news media and U.S. politicians including Senators, House Members, US Governors, US Secretaries of State, US Cabinet, and US Election Officials at collection time. We used Twitter's Timeline API to collect the accounts' tweets (up to 3,200 tweets). The dataset was collected in May 2023, with 8,153,745 tweets and 3,784 Twitter accounts. 

Data Fields

Due to Twitter's content redistribution policy, we are only allowed to publish tweet IDs and user IDs. Therefore, in each dataset, each row (datapoint) represents a tweet and contains two fields: tweet_id and user_id.
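
As an illustration of working with the two-field layout described above, the following minimal sketch counts tweets per user. It assumes each CSV file starts with a header row naming the columns tweet_id and user_id; this record does not confirm that, so check the first line of a file before relying on it. The function name count_tweets_per_user is hypothetical. IDs are kept as strings, since tweet IDs are large integers that some tools round when parsed as floating-point numbers.

```python
import csv
import io
from collections import Counter


def count_tweets_per_user(lines):
    """Count rows per user_id.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Assumes a header row with the columns tweet_id and user_id.
    """
    reader = csv.DictReader(lines)
    counts = Counter()
    for row in reader:
        # Keep IDs as strings to avoid any loss of precision.
        counts[row["user_id"]] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory sample with made-up IDs, for illustration only.
    sample = io.StringIO("tweet_id,user_id\n1001,42\n1002,42\n1003,7\n")
    for user_id, n_tweets in count_tweets_per_user(sample).most_common():
        print(user_id, n_tweets)
```

For a downloaded file, the same function can be called on an open file handle, for example `with open("HUM.csv", newline="") as fh: counts = count_tweets_per_user(fh)`.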

Files

Files (1.6 GB)

Size        Checksum
11.2 MB     md5:273c1f36bd98594d2bf4a26bae5c678b
256.7 MB    md5:d0a0656cc034aa2a69507c56fcd98936
1.3 GB      md5:31aba71cb02621f2f4d23e0e3af5aedf
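
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below is one way to do this with Python's standard library; the helper names md5_of_file and matches_checksum are hypothetical, and the listing here does not state which checksum belongs to which file, so compare against the value shown next to the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("HUM.csv", "md5:<value listed for that file>")` returns True when the local copy matches the published checksum.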