Dataset Open Access

Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages

Jeremy Silver; Mark Quigley


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages. The counts were produced with the Python scripts available at&nbsp;<a href=\"https://github.com/JeremySilver/KeywordCountsPresidentialMessages\">https://github.com/JeremySilver/KeywordCountsPresidentialMessages</a>. The raw text data is from&nbsp;<a href=\"http://www.presidency.ucsb.edu/\">The American Presidency Project</a>&nbsp;(<a href=\"http://www.ucsb.edu/\">UCSB</a>), with some&nbsp;Presidential Budget Messages extracted from US Federal Budget documents available through&nbsp;<a href=\"https://fraser.stlouisfed.org/\">FRASER</a>&nbsp;(a digital library of U.S. economic, financial, and banking history) or, for the more recent documents, from the website of the&nbsp;<a href=\"https://www.whitehouse.gov/\">White House</a>.</p>\n\n<p>The column headings are:</p>\n\n<ul>\n\t<li>pid: in most cases, the index of the text document as archived on&nbsp;<a href=\"http://www.presidency.ucsb.edu/\">The American Presidency Project</a>&nbsp;website; in some cases, the filename of a plain-text file that was read directly.</li>\n\t<li>year: Year in which the message was delivered.</li>\n\t<li>date: Date on which the message was delivered.</li>\n\t<li>name: Name of the US President delivering the message.</li>\n\t<li>count_of_all_words: Count of all words in the document.</li>\n\t<li>count_of_keywords: Count of all keyword occurrences in the document.</li>\n\t<li>Keyword-specific columns, three per keyword. For example, for the &#39;energy&#39; keyword, the&nbsp;&#39;energy&#39; column gives the number of times the &#39;energy&#39; keyword was counted in the message, &#39;energy_pct_of_keywords&#39; gives this count as a percentage of all keywords, and &#39;energy_pct_of_all_words&#39; gives this count as a percentage of all words.</li>\n</ul>\n\n<p>Below, each keyword is listed with the words it matched when the search was applied to a dictionary file containing over 99,000 US English words.</p>\n\n<ul>\n\t<li>energy: &#39;energy&#39;</li>\n\t<li>tax: &#39;nontaxable&#39;, &#39;overtax&#39;, &#39;overtaxed&#39;, &#39;overtaxes&#39;, &#39;overtaxing&#39;, &#39;surtax&#39;, &#39;surtaxed&#39;, &#39;surtaxes&#39;, &#39;surtaxing&#39;, &#39;surtaxs&#39;, &#39;tax&#39;, &#39;taxable&#39;, &#39;taxation&#39;, &#39;taxations&#39;, &#39;taxed&#39;, &#39;taxes&#39;, &#39;taxing&#39;, &#39;taxpayer&#39;, &#39;taxpayers&#39;, &#39;taxs&#39;</li>\n\t<li>defense: &#39;defend&#39;, &#39;defense&#39;</li>\n\t<li>education: &#39;education&#39;</li>\n\t<li>employment: &#39;employ&#39;, &#39;employable&#39;, &#39;employe&#39;, &#39;employed&#39;, &#39;employee&#39;, &#39;employees&#39;, &#39;employer&#39;, &#39;employers&#39;, &#39;employes&#39;, &#39;employing&#39;, &#39;employment&#39;, &#39;employments&#39;, &#39;employs&#39;, &#39;underemployed&#39;, &#39;unemployable&#39;, &#39;unemployed&#39;, &#39;unemployeds&#39;, &#39;unemployment&#39;, &#39;unemployments&#39;</li>\n\t<li>research: &#39;research&#39;, &#39;researched&#39;, &#39;researcher&#39;, &#39;researchers&#39;, &#39;researches&#39;, &#39;researching&#39;, &#39;researchs&#39;</li>\n\t<li>shooting: &#39;shooting&#39;</li>\n\t<li>space: &#39;space&#39;</li>\n\t<li>nuclear: &#39;nuclear&#39;</li>\n\t<li>natural&nbsp;resources: &#39;natural&nbsp;resources&#39;</li>\n\t<li>racism: &#39;racism&#39;, &#39;civil rights&#39;</li>\n\t<li>crime: &#39;crime&#39;, &#39;crimes&#39;, &#39;criminal&#39;, &#39;criminally&#39;, &#39;criminals&#39;, &#39;decriminalization&#39;, &#39;decriminalizations&#39;, &#39;decriminalize&#39;, &#39;decriminalized&#39;, &#39;decriminalizes&#39;, &#39;decriminalizing&#39;</li>\n\t<li>environment: &#39;environment&#39;, &#39;environmental&#39;, &#39;environmentalism&#39;, &#39;environmentalisms&#39;, &#39;environmentalist&#39;, &#39;environmentalists&#39;, &#39;environmentally&#39;, &#39;environments&#39;</li>\n\t<li>religion: &#39;faith&#39;, &#39;god&#39;, &#39;prayer&#39;, &#39;religion&#39;</li>\n\t<li>health: &#39;health&#39;, &#39;healthful&#39;, &#39;healthfully&#39;, &#39;healthfulness&#39;, &#39;healthfulnesss&#39;, &#39;healthier&#39;, &#39;healthiest&#39;, &#39;healthily&#39;, &#39;healthiness&#39;, &#39;healthinesss&#39;, &#39;healths&#39;, &#39;healthy&#39;, &#39;unhealthful&#39;, &#39;unhealthier&#39;, &#39;unhealthiest&#39;, &#39;unhealthy&#39;</li>\n\t<li>terror: &#39;terror&#39;, &#39;terrorism&#39;, &#39;terrorisms&#39;, &#39;terrorist&#39;, &#39;terrorists&#39;, &#39;terrorize&#39;, &#39;terrorized&#39;, &#39;terrorizes&#39;, &#39;terrorizing&#39;, &#39;terrors&#39;</li>\n\t<li>war: &#39;war&#39;, &#39;warrior&#39;, &#39;warriors&#39;, &#39;wars&#39;</li>\n\t<li>economy: &#39;economic&#39;, &#39;economical&#39;, &#39;economically&#39;, &#39;economics&#39;, &#39;economicss&#39;, &#39;economy&#39;, &#39;economys&#39;, &#39;microeconomics&#39;, &#39;microeconomicss&#39;, &#39;socioeconomic&#39;, &#39;uneconomic&#39;, &#39;uneconomical&#39;</li>\n\t<li>jobs: &#39;jobs&#39;</li>\n\t<li>business: &#39;agribusiness&#39;, &#39;agribusinesses&#39;, &#39;agribusinesss&#39;, &#39;business&#39;, &#39;businesses&#39;, &#39;businesslike&#39;, &#39;businessman&#39;, &#39;businessmans&#39;, &#39;businessmen&#39;, &#39;businesss&#39;, &#39;businesswoman&#39;, &#39;businesswomans&#39;, &#39;businesswomen&#39;</li>\n\t<li>drugs: &#39;drugs&#39;, &#39;narcotics&#39;</li>\n\t<li>inflation: &#39;inflation&#39;</li>\n\t<li>climate: &#39;climate&#39;</li>\n\t<li>science: &#39;science&#39;, &#39;sciences&#39;, &#39;scientific&#39;, &#39;scientifically&#39;, &#39;scientist&#39;, &#39;scientists&#39;</li>\n\t<li>gun: &#39;gun&#39;, &#39;gunfire&#39;, &#39;gunman&#39;, &#39;guns&#39;, &#39;handgun&#39;, &#39;rifle&#39;, &#39;shotgun&#39;</li>\n\t<li>tech: &#39;biotechnology&#39;, &#39;biotechnologys&#39;, &#39;technical&#39;, &#39;technological&#39;, &#39;technologically&#39;, &#39;technologies&#39;, &#39;technologist&#39;, &#39;technologists&#39;, &#39;technology&#39;, &#39;technologys&#39;</li>\n\t<li>military: &#39;military&#39;</li>\n\t<li>security: &#39;security&#39;</li>\n\t<li>housing: &#39;housing&#39;</li>\n\t<li>pollution: &#39;pollution&#39;</li>\n</ul>\n\n<p>The dictionary file is a standard word list commonly found on Linux systems; the version used was provided with version 7.1-1 of the Ubuntu &#39;wamerican&#39; package.&nbsp;Two extra phrases that do not appear in the dictionary file were added to the list: &#39;civil rights&#39; (under the &#39;racism&#39; keyword) and &#39;natural&nbsp;resources&#39; (under the &#39;natural&nbsp;resources&#39; keyword).</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of Melbourne", 
      "@id": "https://orcid.org/0000-0003-1502-6249", 
      "@type": "Person", 
      "name": "Jeremy Silver"
    }, 
    {
      "affiliation": "University of Melbourne", 
      "@id": "https://orcid.org/0000-0002-4430-4212", 
      "@type": "Person", 
      "name": "Mark Quigley"
    }
  ], 
  "url": "https://zenodo.org/record/3250516", 
  "datePublished": "2019-06-20", 
  "version": "V1.0", 
  "keywords": [
    "US Politics", 
    "Text processing", 
    "Keyword counts"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/7f86334a-ec6a-47a6-bae5-793e39806579/results_PBM.txt", 
      "encodingFormat": "txt", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7f86334a-ec6a-47a6-bae5-793e39806579/results_SoU.txt", 
      "encodingFormat": "txt", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.3250516", 
  "@id": "https://doi.org/10.5281/zenodo.3250516", 
  "@type": "Dataset", 
  "name": "Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages"
}
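Each keyword contributes three columns, and the two percentage columns are simple ratios of the raw count. The following is a minimal sketch of how they could be recomputed from the raw counts. It assumes that "pct" means count ÷ total × 100, as the column descriptions imply, and that a ratio is reported as 0.0 when its denominator is zero; the zero handling is an assumption and is not documented for the dataset. The function name `keyword_percentages` is hypothetical, and the counts in the usage example are made up for illustration, not taken from the data files.

```python
from typing import Dict


def keyword_percentages(keyword_counts: Dict[str, int],
                        count_of_keywords: int,
                        count_of_all_words: int) -> Dict[str, float]:
    """Return '<kw>_pct_of_keywords' and '<kw>_pct_of_all_words' for each keyword.

    Hypothetical helper: assumes pct = 100 * count / total, and 0.0 when total is 0.
    """
    row: Dict[str, float] = {}
    for kw, n in keyword_counts.items():
        # Share of all keyword occurrences in the message (0.0 if there were none).
        row[f"{kw}_pct_of_keywords"] = (
            100.0 * n / count_of_keywords if count_of_keywords else 0.0
        )
        # Share of all words in the message (0.0 for an empty message).
        row[f"{kw}_pct_of_all_words"] = (
            100.0 * n / count_of_all_words if count_of_all_words else 0.0
        )
    return row


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from the dataset.
    row = keyword_percentages({"energy": 12, "tax": 28},
                              count_of_keywords=200, count_of_all_words=5000)
    print(row["energy_pct_of_keywords"], row["energy_pct_of_all_words"])  # 6.0 0.24
```

Keeping the computation in one function makes it easy to cross-check a row of the published files: feed in the 'count_of_keywords' and 'count_of_all_words' values and compare the result with the corresponding percentage columns.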
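The description lists which words are counted under each keyword but does not state exactly how the scripts tokenize a message or match multi-word phrases. The following is a minimal sketch, not the authors' implementation, of counting those words in a message. Its assumptions are:

- Matching is case-insensitive and applies to whole alphabetic tokens.
- Apostrophes are dropped before matching. The lists include forms such as 'taxs' and 'economys', which look like possessives with the apostrophe removed.
- A multi-word entry such as 'civil rights' matches as consecutive tokens.

`KEYWORD_WORDS` is a hypothetical excerpt of the lists above, and `count_keywords` is a hypothetical name. Under these assumptions, `count_of_all_words` is simply the number of tokens and `count_of_keywords` is the sum of the per-keyword counts. Either may differ from what the original scripts report.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical excerpt of the keyword -> matched-words lists in the description.
KEYWORD_WORDS: Dict[str, List[str]] = {
    "energy": ["energy"],
    "defense": ["defend", "defense"],
    "jobs": ["jobs"],
    "racism": ["racism", "civil rights"],
}


def count_keywords(text: str, keyword_words: Dict[str, List[str]]) -> Dict[str, int]:
    """Count matches per keyword (a sketch; the tokenization rules are assumed)."""
    # Assumption: straight and curly apostrophes are removed, so "tax's" -> "taxs".
    cleaned = text.lower().replace("'", "").replace("\u2019", "")
    tokens = re.findall(r"[a-z]+", cleaned)

    # Cache one n-gram counter per phrase length so each length is built once.
    ngram_counters: Dict[int, Counter] = {}

    def ngrams(n: int) -> Counter:
        if n not in ngram_counters:
            ngram_counters[n] = Counter(
                " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
            )
        return ngram_counters[n]

    result: Dict[str, int] = {}
    for keyword, words in keyword_words.items():
        # Single words are looked up as unigrams, "civil rights" as a bigram, etc.
        result[keyword] = sum(ngrams(len(w.split()))[w] for w in words)

    # Assumption: count_of_keywords is the sum of the per-keyword counts.
    result["count_of_keywords"] = sum(result.values())
    result["count_of_all_words"] = len(tokens)
    return result


if __name__ == "__main__":
    text = ("We must defend our jobs. Energy and civil rights; "
            "the defense of civil rights creates jobs.")
    print(count_keywords(text, KEYWORD_WORDS))
    # {'energy': 1, 'defense': 2, 'jobs': 2, 'racism': 2, 'count_of_keywords': 7, 'count_of_all_words': 16}
```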