Published May 12, 2026 | Version v2
Dataset Open

Congressional Tweets Annotated for Anti-Democratic Rhetoric

  • 1. University of Richmond

Description

This dataset was collected to measure Anti-Democratic Rhetoric (ADR) in Twitter posts by members of the U.S. Congress, in order to analyze one aspect of democratic backsliding in the United States. It comprises a text corpus of all tweets sent from the official accounts of sitting members of the 117th Congress from January 2020 through June 2022, a period that encompasses the 2020 election and the events of January 6, 2021.

The scraped tweets are stored in this Excel file. It contains 1,048,515 rows, each representing one tweet. Each tweet is accompanied by Twitter-derived metadata (e.g., timestamps, hashtags, and number of replies) and by demographic data about the member who posted it (name, Twitter username, party ID, gender, state and district represented, chamber of Congress, and tenure in office), drawn from Ballotpedia.org and members’ own websites.
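
With over a million rows, streaming the sheet is usually gentler on memory than loading it all at once. The following is a minimal sketch, not the authors' code. It assumes the third-party openpyxl package, that the data sits on the first worksheet with column names in the first row, and that a column called "party" exists. The column name and the file name in the usage comment are hypothetical placeholders; check the actual headers before relying on them.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_sheet_rows(path: str) -> Iterator[Tuple[Any, ...]]:
    """Stream rows from the first worksheet of an .xlsx file."""
    # openpyxl is third-party; it is imported lazily so the helpers below
    # can be used (and tested) without it installed.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True)
    try:
        # Assumption: the tweets are on the first worksheet.
        worksheet = workbook.worksheets[0]
        yield from worksheet.iter_rows(values_only=True)
    finally:
        workbook.close()


def rows_to_dicts(rows: Iterable[Tuple[Any, ...]]) -> Iterator[Dict[str, Any]]:
    """Turn raw rows into dicts keyed by the header row (assumed to be first)."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    names = [str(name) for name in header]
    for row in rows:
        yield dict(zip(names, row))


def count_by(records: Iterable[Dict[str, Any]], column: str) -> Counter:
    """Count records per distinct value of one column."""
    return Counter(record.get(column) for record in records)


# Usage (file name and column name are placeholders, not taken from the record):
# counts = count_by(rows_to_dicts(iter_sheet_rows("congressional_tweets.xlsx")), "party")
```

Because the helpers accept any iterable of rows, the same counting logic works on a CSV export of the sheet.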

Unlike Version v1 of this dataset, this version has been annotated to include five additional columns reflecting instances of ADR found by our analytical model (see paper below), both overall and disaggregated into four subcategories. There are no other changes.
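
As an illustration of how the annotation columns might be summarized, the sketch below computes the share of flagged tweets per group. The column names "adr_overall" and "party" are hypothetical, and the encoding of the flags (treated here as 1/0, booleans, or strings such as "1" or "true") is an assumption. Neither is specified in this description, so inspect the file before using it.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def is_flagged(value: Any) -> bool:
    """Interpret a cell as an ADR flag. The accepted encodings are assumptions."""
    if value is None:
        return False
    if isinstance(value, float) and value != value:  # NaN marks an empty cell
        return False
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes"}
    return bool(value)


def adr_rate_by_group(
    records: Iterable[Dict[str, Any]],
    flag_column: str,
    group_column: str,
) -> Dict[Any, float]:
    """Return, for each group, the fraction of records whose flag is set."""
    totals: Dict[Any, int] = defaultdict(int)
    flagged: Dict[Any, int] = defaultdict(int)
    for record in records:
        group = record.get(group_column)
        totals[group] += 1
        if is_flagged(record.get(flag_column)):
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


# Usage with hypothetical column names:
# rates = adr_rate_by_group(records, flag_column="adr_overall", group_column="party")
```

The same function can be pointed at any of the four subcategory columns by changing flag_column.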

The data-scraping code and the metadata for members of Congress are available on GitHub at https://github.com/yucongj/congressional-tweets.

Analysis of this dataset has been published in:
Miller, C. J., & Jiang, Y. (2025). Congressional rhetoric on Twitter and the crisis of democracy. Communication and Democracy, 59(1), 161–204. https://doi.org/10.1080/27671127.2025.2478863

Files

Files (553.6 MB)

Checksum: md5:1344050ea3b8e7066eeab03485e9004c
Size: 553.6 MB
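
A download of this size is worth verifying against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library. The file name in the usage comment is a placeholder, since the record does not show the actual name.

```python
import hashlib

# Checksum as listed in this record's Files section.
EXPECTED_MD5 = "1344050ea3b8e7066eeab03485e9004c"


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a 500+ MB file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 with an expected value; an 'md5:' prefix is allowed."""
    expected_hex = expected.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex


# Usage (placeholder file name):
# print(matches_checksum("congressional_tweets.xlsx"))
```

A mismatch usually indicates an incomplete or corrupted download, or a file from a different version of the record.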

Additional details

Related works

Is compiled by
Software: https://github.com/yucongj/congressional-tweets (URL)
Is supplement to
Journal article: 10.1080/27671127.2025.2478863 (DOI)