Published May 24, 2024 | Version v1
Dataset Open

Supersharers of fake news on Twitter

  • 1. Ben-Gurion University of the Negev
  • 2. Northeastern University

Description

Governments may have the capacity to flood social media with fake news, but little is known about the use of flooding by ordinary voters. In this work, we identify 2,107 registered US voters who account for 80% of the fake news shared on Twitter during the 2020 US presidential election by an entire panel of 664,391 voters. We find that supersharers are important members of the network, reaching a sizable 5.2% of registered voters on the platform. Supersharers have a significant overrepresentation of women, older adults, and registered Republicans. Supersharers' massive volume does not seem automated but is rather generated through manual and persistent retweeting. These findings highlight a vulnerability of social media for democracy, where a small group of people distort the political reality for many.

Methods

This dataset contains aggregated information necessary to replicate the results reported in our work on Supersharers of Fake News on Twitter while respecting and preserving the privacy expectations of individuals included in the analysis. No individual-level data is provided as part of this dataset. 

The data collection process behind this dataset leveraged a large-scale panel of registered U.S. voters matched to Twitter accounts. We examined the activity of 664,391 panel members who were active on Twitter during the months of the 2020 U.S. presidential election (August to November 2020, inclusive) and identified a subset of 2,107 supersharers: the most prolific sharers of fake news in the panel, who together account for 80% of the fake news content shared by panel members on the platform. We rely on a source-level definition of fake news that uses the manually labeled list of fake news sites by Grinberg et al. (2019) and an updated list based on NewsGuard ratings (commercially available, but not provided as part of this dataset); the results were robust to different operationalizations of fake news sources. We restrict the analysis to tweets with external links that were identified as political by a machine learning classifier that we trained and validated against human coders, similar to the approach used in prior work.

We address our research questions by contrasting supersharers with three reference groups: the most prolific sharers of non-fake political tweets (supersharers non-fake group; SS-NF), a group of average fake news sharers, and a random sample of panel members. In particular, we identify the distinct sociodemographic characteristics of supersharers using a series of multilevel regressions, examine their use of Twitter through existing tools and additional statistical analysis, and study supersharers' reach by examining the consumption patterns of voters who follow supersharers.
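
The supersharer definition above (the most prolific sharers who together account for 80% of fake news shares) can be illustrated with a minimal sketch. This is one plausible reading of that definition, not the authors' actual procedure: the function name `find_supersharers`, the `share_counts` mapping, and the tie-breaking rule are all hypothetical, and individual-level counts of this kind are not part of this dataset.

```python
def find_supersharers(share_counts, coverage=0.80):
    """Return the smallest group of top sharers reaching `coverage` of all shares.

    `share_counts` is a hypothetical mapping of user id -> number of
    fake news shares. Users are ranked by share count (ties broken by id),
    and added until the cumulative count reaches the coverage threshold.
    """
    total = sum(share_counts.values())
    if total == 0:
        return []

    # Rank from most to least prolific; the id tie-break keeps output stable.
    ranked = sorted(share_counts.items(), key=lambda item: (-item[1], item[0]))

    selected = []
    cumulative = 0
    for user, count in ranked:
        if cumulative >= coverage * total:
            break
        selected.append(user)
        cumulative += count
    return selected


# Toy example with made-up counts: two users cover 85 of 100 shares.
toy_counts = {"a": 60, "b": 25, "c": 10, "d": 5}
top_group = find_supersharers(toy_counts)
```

In the toy example, `top_group` contains the two most prolific users, since the first user alone covers only 60% of shares and adding the second brings the total to 85%.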

Files

Archive20240523.zip

Files (3.0 MB)

Checksum Size
md5:97c4a36e3a25ea98462517149820f23f 3.0 MB
md5:7fc0f078a7a2d47fefee228552b38b3c 6.5 kB
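
The MD5 checksums listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and pairing the 3.0 MB checksum with `Archive20240523.zip` is an assumption based on the listed size, since the listing does not name the files.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Assuming the archive was saved under its original name in the working directory, a call such as `matches_listed_md5("Archive20240523.zip", "md5:97c4a36e3a25ea98462517149820f23f")` should return `True` for an uncorrupted download.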