Published June 20, 2019 | Version V1.0
Dataset Open

Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages

  • 1. University of Melbourne


Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages. The counts were produced using the Python scripts provided under […]. The raw text data is from The American Presidency Project (UCSB). Some Presidential Budget Messages were instead extracted from US Federal Budget documents available through FRASER (a digital library of U.S. economic, financial, and banking history) or, for the more recent documents, from the White House website.

The data headings are:

  • pid: In most cases, this is the document's index in The American Presidency Project archive. In some cases, it is the filename of a plain-text file that was read directly.
  • year: Year that the message was delivered.
  • date: Date that the message was delivered.
  • name: Name of the US President delivering the message.
  • count_of_all_words: Count of all words in the document.
  • count_of_keywords: Count of all keywords encountered in that document.
  • Keyword-specific columns: three per keyword. For example, for the 'energy' keyword, the 'energy' column gives the number of times the keyword was counted in the message, 'energy_pct_of_keywords' gives this count as a percentage of all keywords counted in the message, and 'energy_pct_of_all_words' gives it as a percentage of all words in the message.
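The three keyword-specific columns follow directly from the raw counts. Below is a minimal sketch of that arithmetic, assuming percentages are on a 0-100 scale and that a zero denominator yields 0.0; the function name keyword_columns is hypothetical, and the original scripts may handle these details differently.

```python
def keyword_columns(keyword: str, keyword_count: int,
                    count_of_keywords: int, count_of_all_words: int) -> dict:
    """Build the three keyword-specific columns for one message (hypothetical helper)."""

    def pct(part: int, whole: int) -> float:
        # Assumption: an empty denominator is reported as 0.0 rather than raising.
        return 100.0 * part / whole if whole else 0.0

    return {
        keyword: keyword_count,
        f"{keyword}_pct_of_keywords": pct(keyword_count, count_of_keywords),
        f"{keyword}_pct_of_all_words": pct(keyword_count, count_of_all_words),
    }


# A message with 6,000 words, 400 keyword hits, 12 of them for 'energy':
print(keyword_columns("energy", 12, 400, 6000))
```

Under these assumptions, 'energy_pct_of_keywords' is 3.0 and 'energy_pct_of_all_words' is 0.2 for the example message.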

Below, each keyword is followed by the words it matches when the search is applied to a dictionary file containing over 99,000 US English words.

  • energy: 'energy'
  • tax: 'nontaxable', 'overtax', 'overtaxed', 'overtaxes', 'overtaxing', 'surtax', 'surtaxed', 'surtaxes', 'surtaxing', 'surtaxs', 'tax', 'taxable', 'taxation', 'taxations', 'taxed', 'taxes', 'taxing', 'taxpayer', 'taxpayers', 'taxs'
  • defense: 'defend', 'defense'
  • education: 'education'
  • employment: 'employ', 'employable', 'employe', 'employed', 'employee', 'employees', 'employer', 'employers', 'employes', 'employing', 'employment', 'employments', 'employs', 'underemployed', 'unemployable', 'unemployed', 'unemployeds', 'unemployment', 'unemployments'
  • research: 'research', 'researched', 'researcher', 'researchers', 'researches', 'researching', 'researchs'
  • shooting: 'shooting'
  • space: 'space'
  • nuclear: 'nuclear'
  • natural resources: 'natural resources'
  • racism: 'racism', 'civil rights'
  • crime: 'crime', 'crimes', 'criminal', 'criminally', 'criminals', 'decriminalization', 'decriminalizations', 'decriminalize', 'decriminalized', 'decriminalizes', 'decriminalizing'
  • environment: 'environment', 'environmental', 'environmentalism', 'environmentalisms', 'environmentalist', 'environmentalists', 'environmentally', 'environments'
  • religion: 'faith', 'god', 'prayer', 'religion'
  • health: 'health', 'healthful', 'healthfully', 'healthfulness', 'healthfulnesss', 'healthier', 'healthiest', 'healthily', 'healthiness', 'healthinesss', 'healths', 'healthy', 'unhealthful', 'unhealthier', 'unhealthiest', 'unhealthy'
  • terror: 'terror', 'terrorism', 'terrorisms', 'terrorist', 'terrorists', 'terrorize', 'terrorized', 'terrorizes', 'terrorizing', 'terrors'
  • war: 'war', 'warrior', 'warriors', 'wars'
  • economy: 'economic', 'economical', 'economically', 'economics', 'economicss', 'economy', 'economys', 'microeconomics', 'microeconomicss', 'socioeconomic', 'uneconomic', 'uneconomical'
  • jobs: 'jobs'
  • business: 'agribusiness', 'agribusinesses', 'agribusinesss', 'business', 'businesses', 'businesslike', 'businessman', 'businessmans', 'businessmen', 'businesss', 'businesswoman', 'businesswomans', 'businesswomen'
  • drugs: 'drugs', 'narcotics'
  • inflation: 'inflation'
  • climate: 'climate'
  • science: 'science', 'sciences', 'scientific', 'scientifically', 'scientist', 'scientists'
  • gun: 'gun', 'gunfire', 'gunman', 'guns', 'handgun', 'rifle', 'shotgun'
  • tech: 'biotechnology', 'biotechnologys', 'technical', 'technological', 'technologically', 'technologies', 'technologist', 'technologists', 'technology', 'technologys'
  • military: 'military'
  • security: 'security'
  • housing: 'housing'
  • pollution: 'pollution'
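The list above maps each keyword to the terms that count towards it. Below is a minimal sketch of how such a table could be applied to a message, assuming lowercase tokenization on runs of letters and exact term matching (with a sliding window so that two-word phrases such as 'civil rights' are counted). KEYWORD_TERMS is an illustrative subset of the list, and count_keywords is a hypothetical name; how the original scripts tokenize and count words is not stated here.

```python
import re
from collections import Counter

# Illustrative subset of the keyword -> matched-terms table above.
KEYWORD_TERMS = {
    "energy": ["energy"],
    "defense": ["defend", "defense"],
    "racism": ["racism", "civil rights"],
    "natural resources": ["natural resources"],
}

# Assumption: a "word" is a run of ASCII letters after lowercasing.
WORD_RE = re.compile(r"[a-z]+")


def count_keywords(text: str, keyword_terms: dict) -> tuple:
    """Return (count_of_all_words, count_of_keywords, per-keyword Counter)."""
    words = WORD_RE.findall(text.lower())
    counts = Counter()
    for keyword, terms in keyword_terms.items():
        for term in terms:
            term_words = term.split()
            n = len(term_words)
            # Slide a window the length of the term over the tokens so that
            # multi-word phrases are matched as well as single words.
            counts[keyword] += sum(
                1 for i in range(len(words) - n + 1) if words[i:i + n] == term_words
            )
    return len(words), sum(counts.values()), counts


total, hits, per_keyword = count_keywords(
    "We must defend our energy future. Civil rights and natural resources "
    "matter; energy, energy!",
    KEYWORD_TERMS,
)
print(total, hits, dict(per_keyword))
```

Because no term appears under more than one keyword in the list above, summing the per-keyword counts gives the total keyword count without double counting.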

The dictionary file is a standard file on Linux systems; the version used was provided with version 7.1-1 of the Ubuntu 'wamerican' package. Two extra phrases that do not appear in the dictionary file were added to the list: 'civil rights' (under the 'racism' keyword) and 'natural resources' (under the 'natural resources' keyword).
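The 'wamerican' package installs its word list at /usr/share/dict/american-english. The source does not say how each keyword's terms were selected from that file, so the sketch below is illustrative only: it assumes dictionary entries are lowercased with apostrophes removed (one possible explanation for entries such as 'taxs' and 'economicss', since the file contains possessives like "tax's"), and TAX_PATTERN is a hypothetical regular expression chosen to reproduce part of the 'tax' list while excluding words such as 'syntax' and 'taxi'.

```python
import re
from pathlib import Path

# Word list installed by the Debian/Ubuntu 'wamerican' package.
DICT_PATH = Path("/usr/share/dict/american-english")

# Hypothetical pattern for the 'tax' keyword; not taken from the original scripts.
TAX_PATTERN = r"(?:non|over|sur)?tax(?:es|ed|ing|s|able|ation|ations|payer|payers)?"


def normalize(word: str) -> str:
    # Assumption: entries are lowercased and apostrophes dropped ("tax's" -> "taxs").
    return word.strip().lower().replace("'", "")


def load_dictionary(path: Path = DICT_PATH) -> set:
    """Read the word list and return the set of normalized, non-empty entries."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return {normalize(line) for line in text.splitlines() if line.strip()}


def matching_terms(pattern: str, words) -> list:
    """Return the sorted, de-duplicated words that match `pattern` in full."""
    regex = re.compile(pattern)
    return sorted({w for w in words if regex.fullmatch(w)})


sample = [normalize(w) for w in ["tax", "taxes", "syntax", "taxi", "surtax", "taxpayer", "Tax's"]]
print(matching_terms(TAX_PATTERN, sample))
```

On a system with 'wamerican' installed, `matching_terms(TAX_PATTERN, load_dictionary())` would list the dictionary words this hypothetical pattern accepts. Phrases such as 'civil rights' and 'natural resources' cannot come from the dictionary this way and would be appended by hand.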



Files (76.5 kB)

  • 40.7 kB
  • 35.9 kB