Detecting lexical political unpredictability in speech transcripts with generative AI
Description
This is replication data for the paper "Detecting lexical political unpredictability in speech transcripts with generative AI", submitted for peer review.
Abstract:
This study introduces Lexical Political Unpredictability (LPU), a novel generative AI-based metric of unpredictability in political speech. We analyzed 989 speeches delivered by 43 U.S. Presidents, computing LPU from the cosine similarity between actual presidential sentences and sentences generated by GPT-3 (2020 release) and GPT-4o-mini (2024 release). Across both models, the results consistently show that greater unpredictability in presidential speech is strongly correlated with higher Economic Policy Uncertainty (EPU), higher integrative complexity, and weaker democratic mandates. Granger causality tests provide consistent evidence that economic uncertainty predicts subsequent increases in presidential unpredictability, pointing to economic factors as key determinants of political rhetoric.
Methodologically, the study sheds light on the stability and variability of results obtained with generative AI. Cross-model validation shows that the relationships between LPU, integrative complexity, and EPU remain highly robust when moving from GPT-3 to GPT-4o-mini. However, some relationships, notably those involving presidential greatness ratings and the direction of Granger-causal links, vary considerably between models. This finding underscores a critical methodological lesson: while many core conclusions drawn from earlier generative AI models remain reliable, others, particularly nuanced or theoretically ambiguous ones, should be routinely revalidated with current models. The study therefore advocates ongoing, model-aware replication to distinguish genuinely stable empirical relationships from artifacts of particular generative architectures. Overall, the LPU methodology offers a replicable quantitative approach to assessing political unpredictability, with practical implications for policymakers, researchers, and the broader public during periods of economic and political volatility.
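The record does not include the scoring code itself. As a rough illustration of the computation the abstract describes, the Python sketch below scores one speech as the sample standard deviation (the convention of R's `sd()`) of sentence-level cosine similarities between actual and generated sentences. The bag-of-words vectorization, the one-to-one pairing of actual and generated sentences, and the function names `bag_of_words`, `cosine_similarity` and `lpu_score` are assumptions made for illustration; the paper's own pipeline, and what distinguishes the sg1sd, sg2sd, sl1sd and sl2sd variants, may differ.

```python
import math
import re
import statistics
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lower-cased word counts (assumed vectorization, not necessarily the paper's)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lpu_score(actual: list[str], generated: list[str]) -> float:
    """Hypothetical LPU: sample SD of per-sentence cosine similarities.

    Assumes generated[i] is the model's output for the context preceding
    actual[i]; requires at least two sentence pairs.
    """
    if len(actual) != len(generated):
        raise ValueError("actual and generated must be aligned sentence by sentence")
    sims = [
        cosine_similarity(bag_of_words(a), bag_of_words(g))
        for a, g in zip(actual, generated)
    ]
    return statistics.stdev(sims)
```

For example, if the model reproduces one sentence exactly and shares no words with the other, the similarities are 1.0 and 0.0 and the score is the square root of 0.5. Under this reading, a speech whose sentences alternate between predictable and surprising scores higher than one whose sentences are uniformly predictable.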
The repository includes six datasets in .RData format:
- df_GPT3.RData: Contains Lexical Political Unpredictability scores calculated using the GPT-3 model for each presidential speech. The dataframe includes cosine similarity standard deviation scores (sg1sd, sg2sd, sl1sd, sl2sd), speech dates, president names, presidential terms, political party affiliations, and speech transcripts.
- df_GPT4o_mini.RData: Similar to df_GPT3, but with scores computed using the GPT-4o-mini generative AI model.
- df_GPT4o_mini_mod_prompt.RData: Contains Lexical Political Unpredictability scores computed using GPT-4o-mini with a modified prompt incorporating personalized context about each president.
- epu.RData: Contains monthly Economic Policy Uncertainty (EPU) index values for the United States from January 1900 through March 2022, alongside corresponding year-month identifiers.
- pres_rank.RData: Provides presidential greatness rankings, including normalized scores from the 2021 C-SPAN Presidential Historians Survey and the 2022 Siena College Presidential Rankings.
- pres_personal_data: Contains integrative complexity and human capital scores for individual U.S. presidents.
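Linking speeches to the monthly EPU series amounts to mapping each speech date to a year-month identifier. The Python sketch below assumes identifiers of the form "YYYY-MM" and an optional lag in months; the actual identifier format and column names in epu.RData are not documented here, and `year_month_key` and `attach_epu` are hypothetical helpers. Speeches dated after March 2022 fall outside the EPU coverage and map to `None`.

```python
from datetime import date
from typing import Optional


def year_month_key(d: date, lag_months: int = 0) -> str:
    """'YYYY-MM' key for the month `lag_months` before date d (assumed format)."""
    # Count months from year 0 so that lags cross year boundaries correctly.
    index = d.year * 12 + (d.month - 1) - lag_months
    year, month0 = divmod(index, 12)
    return f"{year:04d}-{month0 + 1:02d}"


def attach_epu(
    speech_dates: list[date],
    epu_by_month: dict[str, float],
    lag_months: int = 0,
) -> list[Optional[float]]:
    """EPU value for each speech's (optionally lagged) month, or None if missing."""
    return [epu_by_month.get(year_month_key(d, lag_months)) for d in speech_dates]
```

A lag of one month, for instance, pairs a speech given on 20 January 2021 with the EPU value for December 2020.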
All dataframes were used in the statistical analyses reported in the manuscript, including regression modeling and Granger causality testing. The independent variables in these analyses include integrative complexity, presidential human capital, presidential greatness rankings, electoral mandate strength (percentage of the popular vote), and Economic Policy Uncertainty (EPU).
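For readers unfamiliar with the Granger causality tests mentioned above, the numpy sketch below computes the standard F statistic for whether p lags of a series x improve an autoregressive model of a series y (for example, EPU as x and LPU as y). It is an illustration only: the lag order, any stationarity transformations, and the software used in the manuscript are not documented in this record, and `lagged`, `rss` and `granger_f` are hypothetical names. Established implementations include `lmtest::grangertest` in R and `statsmodels.tsa.stattools.grangercausalitytests` in Python.

```python
import numpy as np


def lagged(series: np.ndarray, p: int) -> np.ndarray:
    """Columns series[t-1], ..., series[t-p] for t = p, ..., n-1."""
    n = len(series)
    return np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])


def rss(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(y, x, p: int) -> float:
    """F statistic for H0: lags 1..p of x add nothing to an AR(p) model of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    const = np.ones((len(target), 1))
    # Restricted model: constant + own lags; unrestricted adds lags of x.
    restricted = np.hstack([const, lagged(y, p)])
    unrestricted = np.hstack([restricted, lagged(x, p)])
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    df_denom = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df_denom)
```

A p-value follows from the F distribution with p and `df_denom` degrees of freedom (the usable observations minus the 2p + 1 estimated coefficients), for example via `scipy.stats.f.sf`. Running the test in both directions, as `granger_f(lpu, epu, p)` and `granger_f(epu, lpu, p)`, is what reveals the directionality that the abstract reports as model-dependent.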
Files (21.8 MB)
MD5 checksum | Size |
---|---|
a103c20ed71cdd08ff2024caab9ca387 | 7.1 MB |
3cb8d106946b777ad946bfb717ac65a7 | 7.1 MB |
8ac2d4a09601e459acac19db176dcc15 | 7.5 MB |
792dafe0a5552fc35c546cb037ba5f84 | 18.3 kB |
33138fd106a8df1f7e33125f2539d732 | 1.1 kB |
ae80840a9cfb7b6c87d5cf8a1528b29f | 1.6 kB |
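Because the file names were not preserved in the listing above, the Python sketch below checks a downloaded file against the set of published MD5 checksums rather than against a specific name. The helper names `md5sum` and `matches_published` are illustrative; the checksums are copied from the listing.

```python
import hashlib

# Published MD5 checksums from the file listing (file names not preserved).
PUBLISHED_MD5 = {
    "a103c20ed71cdd08ff2024caab9ca387",
    "3cb8d106946b777ad946bfb717ac65a7",
    "8ac2d4a09601e459acac19db176dcc15",
    "792dafe0a5552fc35c546cb037ba5f84",
    "33138fd106a8df1f7e33125f2539d732",
    "ae80840a9cfb7b6c87d5cf8a1528b29f",
}


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str) -> bool:
    """True if the file's MD5 equals one of the published checksums."""
    return md5sum(path) in PUBLISHED_MD5
```

Checking membership in the whole set confirms that a download is intact, but not which dataset it is; the file name recorded by the repository at download time has to serve for that.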
Additional details
Dates
- Created: 2025-05-19