There is a newer version of the record available.

Published August 4, 2023 | Version v1
Dataset Open

Congressional Tweets

  • 1. University of Richmond

Description

This dataset was collected to measure Anti-Democratic Rhetoric (ADR) in Twitter posts by members of the U.S. Congress, in order to analyze one aspect of democratic backsliding in the United States. It comprises a textual corpus of all tweets sent from the official accounts of sitting members of the 117th Congress from January 2020 through June 2022, a period that encompasses the 2020 election and the events of January 6, 2021.

The scraped tweets are stored in this Excel file. There are 1,048,515 rows, each representing one tweet. Each tweet is accompanied by Twitter-derived metadata (e.g., timestamps, hashtags, and number of replies) as well as relevant demographic data about the member who sent it (including each congressmember’s name, Twitter username, party ID, gender, state and district represented, chamber of Congress, and tenure in office), drawn from Ballotpedia.org and members’ own websites.

The data-scraping code and the metadata for members of Congress are available on GitHub at https://github.com/yucongj/congressional-tweets.

Analysis of this dataset has been published in:
Miller, C. J., & Jiang, Y. (2025). Congressional rhetoric on Twitter and the crisis of democracy. Communication and Democracy, 59(1), 161–204. https://doi.org/10.1080/27671127.2025.2478863

Files

Files (531.9 MB)

Size: 531.9 MB
Checksum: md5:518d6407e9a310d7e7401a462bbdaa12
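
Because the file is large, it may be worth confirming that a download completed intact by comparing its MD5 digest against the checksum listed above. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `matches_record` are illustrative and are not part of the dataset or its GitHub repository, and the local file path is whatever name the download was saved under.

```python
import hashlib

# MD5 checksum as listed in the Files section of this record.
EXPECTED_MD5 = "518d6407e9a310d7e7401a462bbdaa12"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so the whole file never has to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum.
    (Hypothetical helper name, used here for illustration only.)"""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_record` with the path of the downloaded file returns `True` only when the local copy matches the published checksum; a `False` result suggests an incomplete or corrupted download, or a different version of the record.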

Additional details

Related works

Is compiled by
Software: https://github.com/yucongj/congressional-tweets (URL)
Is supplement to
Journal article: 10.1080/27671127.2025.2478863 (DOI)