10.5281/zenodo.3250516
https://zenodo.org/records/3250516
oai:zenodo.org:3250516
Jeremy Silver
0000-0003-1502-6249
University of Melbourne
Mark Quigley
0000-0002-4430-4212
University of Melbourne
Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages
Zenodo
2019
US Politics
Text processing
Keyword counts
2019-06-20
eng
10.5281/zenodo.3250515
V1.0
Creative Commons Attribution 4.0 International
Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages. The counts were generated with the Python scripts available at https://github.com/JeremySilver/KeywordCountsPresidentialMessages. The raw text is from The American Presidency Project (UCSB); some Presidential Budget Messages were extracted from US Federal Budget documents available through FRASER (a digital library of U.S. economic, financial, and banking history) or, for the more recent documents, from the White House website.
The data headings are:
pid: In most cases, the index of the text document as archived on The American Presidency Project website. In some cases, the filename of a plain-text file that was read directly.
year: Year that the message was delivered.
date: Date that the message was delivered.
name: Name of the US President delivering the message.
count_of_all_words: Count of all words in the document.
count_of_keywords: Count of all keywords encountered in that document.
Keyword-specific columns: three per keyword. For example, for the 'energy' keyword, the 'energy' column gives the number of times the 'energy' keyword was counted in the message, 'energy_pct_of_keywords' gives this count as a percentage of all keywords, and 'energy_pct_of_all_words' gives this count as a percentage of all words.
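The relationship between the three keyword-specific columns can be illustrated with a minimal Python sketch. It is not taken from the published scripts: the function name keyword_columns and the example counts are hypothetical, and it assumes that count_of_keywords is the sum of the individual keyword counts for the document.

```python
def keyword_columns(counts, count_of_all_words):
    """Build the summary and per-keyword columns for one document.

    counts: dict mapping keyword -> number of matches in the document.
    Assumption: count_of_keywords is the sum of all keyword counts.
    """
    count_of_keywords = sum(counts.values())
    row = {
        "count_of_all_words": count_of_all_words,
        "count_of_keywords": count_of_keywords,
    }
    for keyword, n in counts.items():
        row[keyword] = n
        # Guard against division by zero for empty documents.
        row[f"{keyword}_pct_of_keywords"] = (
            100.0 * n / count_of_keywords if count_of_keywords else 0.0
        )
        row[f"{keyword}_pct_of_all_words"] = (
            100.0 * n / count_of_all_words if count_of_all_words else 0.0
        )
    return row


# Hypothetical counts, for illustration only.
example = keyword_columns({"energy": 3, "tax": 9}, count_of_all_words=1200)
```

With these illustrative inputs, 'energy' accounts for 3 of 12 keyword matches and 3 of 1200 words.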
Below, each keyword is listed with the words it matches when the search is applied to a dictionary file containing over 99,000 US English words.
energy: 'energy'
tax: 'nontaxable', 'overtax', 'overtaxed', 'overtaxes', 'overtaxing', 'surtax', 'surtaxed', 'surtaxes', 'surtaxing', 'surtaxs', 'tax', 'taxable', 'taxation', 'taxations', 'taxed', 'taxes', 'taxing', 'taxpayer', 'taxpayers', 'taxs'
defense: 'defend', 'defense'
education: 'education'
employment: 'employ', 'employable', 'employe', 'employed', 'employee', 'employees', 'employer', 'employers', 'employes', 'employing', 'employment', 'employments', 'employs', 'underemployed', 'unemployable', 'unemployed', 'unemployeds', 'unemployment', 'unemployments'
research: 'research', 'researched', 'researcher', 'researchers', 'researches', 'researching', 'researchs'
shooting: 'shooting'
space: 'space'
nuclear: 'nuclear'
natural resources: 'natural resources'
racism: 'racism', 'civil rights'
crime: 'crime', 'crimes', 'criminal', 'criminally', 'criminals', 'decriminalization', 'decriminalizations', 'decriminalize', 'decriminalized', 'decriminalizes', 'decriminalizing'
environment: 'environment', 'environmental', 'environmentalism', 'environmentalisms', 'environmentalist', 'environmentalists', 'environmentally', 'environments'
religion: 'faith', 'god', 'prayer', 'religion'
health: 'health', 'healthful', 'healthfully', 'healthfulness', 'healthfulnesss', 'healthier', 'healthiest', 'healthily', 'healthiness', 'healthinesss', 'healths', 'healthy', 'unhealthful', 'unhealthier', 'unhealthiest', 'unhealthy'
terror: 'terror', 'terrorism', 'terrorisms', 'terrorist', 'terrorists', 'terrorize', 'terrorized', 'terrorizes', 'terrorizing', 'terrors'
war: 'war', 'warrior', 'warriors', 'wars'
economy: 'economic', 'economical', 'economically', 'economics', 'economicss', 'economy', 'economys', 'microeconomics', 'microeconomicss', 'socioeconomic', 'uneconomic', 'uneconomical'
jobs: 'jobs'
business: 'agribusiness', 'agribusinesses', 'agribusinesss', 'business', 'businesses', 'businesslike', 'businessman', 'businessmans', 'businessmen', 'businesss', 'businesswoman', 'businesswomans', 'businesswomen'
drugs: 'drugs', 'narcotics'
inflation: 'inflation'
climate: 'climate'
science: 'science', 'sciences', 'scientific', 'scientifically', 'scientist', 'scientists'
gun: 'gun', 'gunfire', 'gunman', 'guns', 'handgun', 'rifle', 'shotgun'
tech: 'biotechnology', 'biotechnologys', 'technical', 'technological', 'technologically', 'technologies', 'technologist', 'technologists', 'technology', 'technologys'
military: 'military'
security: 'security'
housing: 'housing'
pollution: 'pollution'
The dictionary file is standard on Linux systems; the version used was provided with version 7.1-1 of the Ubuntu 'wamerican' package. Two extra phrases, which do not appear in the dictionary file, were added to the list: 'civil rights' (under the 'racism' keyword) and 'natural resources' (under the 'natural resources' keyword).
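As a rough illustration of how word lists like those above could be applied to a message, the following Python sketch counts whole-word matches, including the two-word phrases. It is an assumption about the general approach, not the code from the repository: the function count_keywords, the WORD_LISTS subset, and the sample sentence are hypothetical, and the published scripts may tokenise text differently.

```python
import re

# A small subset of the word lists given in this description.
WORD_LISTS = {
    "war": ["war", "warrior", "warriors", "wars"],
    "racism": ["racism", "civil rights"],
    "energy": ["energy"],
}


def count_keywords(text, word_lists):
    """Count case-insensitive whole-word matches for each keyword."""
    lowered = text.lower()
    counts = {}
    for keyword, terms in word_lists.items():
        total = 0
        for term in terms:
            # Join the parts of a phrase with \s+ so that 'civil rights'
            # also matches across line breaks or repeated spaces.
            body = r"\s+".join(re.escape(part) for part in term.split())
            # \b anchors stop 'war' from matching inside 'warmed' or 'award'.
            total += len(re.findall(r"\b" + body + r"\b", lowered))
        counts[keyword] = total
    return counts


# Hypothetical sample text, for illustration only.
sample = "The war on poverty and the wars abroad warmed debate on civil rights and energy."
sample_counts = count_keywords(sample, WORD_LISTS)
```

Matching on word boundaries rather than substrings keeps each listed word distinct, which is why the lists enumerate forms such as 'war' and 'wars' separately.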