Dataset Open Access

Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages

Jeremy Silver; Mark Quigley


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/7f86334a-ec6a-47a6-bae5-793e39806579/results_PBM.txt"
      }, 
      "checksum": "md5:fd44fceda37d3f1bef39543ed87dc11d", 
      "bucket": "7f86334a-ec6a-47a6-bae5-793e39806579", 
      "key": "results_PBM.txt", 
      "type": "txt", 
      "size": 40658
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7f86334a-ec6a-47a6-bae5-793e39806579/results_SoU.txt"
      }, 
      "checksum": "md5:19dd5e5fb940579f362b13e386213de2", 
      "bucket": "7f86334a-ec6a-47a6-bae5-793e39806579", 
      "key": "results_SoU.txt", 
      "type": "txt", 
      "size": 35875
    }
  ], 
  "owners": [
    70144
  ], 
  "doi": "10.5281/zenodo.3250516", 
  "stats": {
    "version_unique_downloads": 21.0, 
    "unique_views": 99.0, 
    "views": 134.0, 
    "version_views": 134.0, 
    "unique_downloads": 21.0, 
    "version_unique_views": 99.0, 
    "volume": 1112120.0, 
    "version_downloads": 29.0, 
    "downloads": 29.0, 
    "version_volume": 1112120.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3250516", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.3250515", 
    "bucket": "https://zenodo.org/api/files/7f86334a-ec6a-47a6-bae5-793e39806579", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3250515.svg", 
    "html": "https://zenodo.org/record/3250516", 
    "latest_html": "https://zenodo.org/record/3250516", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3250516.svg", 
    "latest": "https://zenodo.org/api/records/3250516"
  }, 
  "conceptdoi": "10.5281/zenodo.3250515", 
  "created": "2019-06-20T05:24:40.325594+00:00", 
  "updated": "2020-01-21T07:23:09.538088+00:00", 
  "conceptrecid": "3250515", 
  "revision": 4, 
  "id": 3250516, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.3250516", 
    "description": "<p>Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages. This was done using the Python scripts provided under&nbsp;<a href=\"https://github.com/JeremySilver/KeywordCountsPresidentialMessages\">https://github.com/JeremySilver/KeywordCountsPresidentialMessages</a>. The raw text data is from&nbsp;<a href=\"http://www.presidency.ucsb.edu/\">The American Presidency Project</a>&nbsp;(<a href=\"http://www.ucsb.edu/\">UCSB</a>), with some&nbsp;Presidential Budget Messages being extracted from US Federal Budget documents available through&nbsp;<a href=\"https://fraser.stlouisfed.org/\">FRASER</a>&nbsp;(a digital library of U.S. economic, financial, and banking history) or, for the more recent documents the website of the&nbsp;<a href=\"https://www.whitehouse.gov/\">White House</a>.</p>\n\n<p>The data headings are:</p>\n\n<ul>\n\t<li>pid: in most cases, this is the index for the text document as archived on&nbsp;<a href=\"http://www.presidency.ucsb.edu/\">The American Presidency Project</a>&nbsp;website. In some cases, this was the filename of a plain-text file read directly.</li>\n\t<li>year: Year that the message was delivered.</li>\n\t<li>date: Date that the message was delivered.</li>\n\t<li>name: Name of the US President delivering the message.</li>\n\t<li>count_of_all_words: Count of all words in the document.</li>\n\t<li>count_of_keywords: Count of all keywords encountered in that document.</li>\n\t<li>Keyword specific columns - three per keyword. 
For example, for the &#39;energy&#39; keyword, the&nbsp;&#39;energy&#39; column gives the number of times the &#39;energy&#39; keyword was counted in the message, &#39;energy_pct_of_keywords&#39; gives this count as a percentage of all keywords, and &#39;energy_pct_of_all_words&#39; gives this count as a percentage of all words</li>\n</ul>\n\n<p>Below is the list of keywords that match when the search is applied to a dictionary file containing over 99,000 US English words.</p>\n\n<ul>\n\t<li>energy: &#39;energy&#39;</li>\n\t<li>tax: &#39;nontaxable&#39;, &#39;overtax&#39;, &#39;overtaxed&#39;, &#39;overtaxes&#39;, &#39;overtaxing&#39;, &#39;surtax&#39;, &#39;surtaxed&#39;, &#39;surtaxes&#39;, &#39;surtaxing&#39;, &#39;surtaxs&#39;, &#39;tax&#39;, &#39;taxable&#39;, &#39;taxation&#39;, &#39;taxations&#39;, &#39;taxed&#39;, &#39;taxes&#39;, &#39;taxing&#39;, &#39;taxpayer&#39;, &#39;taxpayers&#39;, &#39;taxs&#39;</li>\n\t<li>defense: &#39;defend&#39;, &#39;defense&#39;</li>\n\t<li>education: &#39;education&#39;</li>\n\t<li>employment: &#39;employ&#39;, &#39;employable&#39;, &#39;employe&#39;, &#39;employed&#39;, &#39;employee&#39;, &#39;employees&#39;, &#39;employer&#39;, &#39;employers&#39;, &#39;employes&#39;, &#39;employing&#39;, &#39;employment&#39;, &#39;employments&#39;, &#39;employs&#39;, &#39;underemployed&#39;, &#39;unemployable&#39;, &#39;unemployed&#39;, &#39;unemployeds&#39;, &#39;unemployment&#39;, &#39;unemployments&#39;</li>\n\t<li>research: &#39;research&#39;, &#39;researched&#39;, &#39;researcher&#39;, &#39;researchers&#39;, &#39;researches&#39;, &#39;researching&#39;, &#39;researchs&#39;</li>\n\t<li>shooting: &#39;shooting&#39;</li>\n\t<li>space: &#39;space&#39;</li>\n\t<li>nuclear: &#39;nuclear&#39;</li>\n\t<li>natural&nbsp;resources: &#39;natural&nbsp;resources&#39;</li>\n\t<li>racism: &#39;racism&#39;, &#39;civil rights&#39;</li>\n\t<li>crime: &#39;crime&#39;, &#39;crimes&#39;, &#39;criminal&#39;, &#39;criminally&#39;, &#39;criminals&#39;, 
&#39;decriminalization&#39;, &#39;decriminalizations&#39;, &#39;decriminalize&#39;, &#39;decriminalized&#39;, &#39;decriminalizes&#39;, &#39;decriminalizing&#39;</li>\n\t<li>environment: &#39;environment&#39;, &#39;environmental&#39;, &#39;environmentalism&#39;, &#39;environmentalisms&#39;, &#39;environmentalist&#39;, &#39;environmentalists&#39;, &#39;environmentally&#39;, &#39;environments&#39;</li>\n\t<li>religion: &#39;faith&#39;, &#39;god&#39;, &#39;prayer&#39;, &#39;religion&#39;</li>\n\t<li>health: &#39;health&#39;, &#39;healthful&#39;, &#39;healthfully&#39;, &#39;healthfulness&#39;, &#39;healthfulnesss&#39;, &#39;healthier&#39;, &#39;healthiest&#39;, &#39;healthily&#39;, &#39;healthiness&#39;, &#39;healthinesss&#39;, &#39;healths&#39;, &#39;healthy&#39;, &#39;unhealthful&#39;, &#39;unhealthier&#39;, &#39;unhealthiest&#39;, &#39;unhealthy&#39;</li>\n\t<li>terror: &#39;terror&#39;, &#39;terrorism&#39;, &#39;terrorisms&#39;, &#39;terrorist&#39;, &#39;terrorists&#39;, &#39;terrorize&#39;, &#39;terrorized&#39;, &#39;terrorizes&#39;, &#39;terrorizing&#39;, &#39;terrors&#39;</li>\n\t<li>war: &#39;war&#39;, &#39;warrior&#39;, &#39;warriors&#39;, &#39;wars&#39;</li>\n\t<li>economy: &#39;economic&#39;, &#39;economical&#39;, &#39;economically&#39;, &#39;economics&#39;, &#39;economicss&#39;, &#39;economy&#39;, &#39;economys&#39;, &#39;microeconomics&#39;, &#39;microeconomicss&#39;, &#39;socioeconomic&#39;, &#39;uneconomic&#39;, &#39;uneconomical&#39;</li>\n\t<li>jobs: &#39;jobs&#39;</li>\n\t<li>business: &#39;agribusiness&#39;, &#39;agribusinesses&#39;, &#39;agribusinesss&#39;, &#39;business&#39;, &#39;businesses&#39;, &#39;businesslike&#39;, &#39;businessman&#39;, &#39;businessmans&#39;, &#39;businessmen&#39;, &#39;businesss&#39;, &#39;businesswoman&#39;, &#39;businesswomans&#39;, &#39;businesswomen&#39;</li>\n\t<li>drugs: &#39;drugs&#39;, &#39;narcotics&#39;</li>\n\t<li>inflation: &#39;inflation&#39;</li>\n\t<li>climate: &#39;climate&#39;</li>\n\t<li>science: 
&#39;science&#39;, &#39;sciences&#39;, &#39;scientific&#39;, &#39;scientifically&#39;, &#39;scientist&#39;, &#39;scientists&#39;</li>\n\t<li>gun: &#39;gun&#39;, &#39;gunfire&#39;, &#39;gunman&#39;, &#39;guns&#39;, &#39;handgun&#39;, &#39;rifle&#39;, &#39;shotgun&#39;</li>\n\t<li>tech: &#39;biotechnology&#39;, &#39;biotechnologys&#39;, &#39;technical&#39;, &#39;technological&#39;, &#39;technologically&#39;, &#39;technologies&#39;, &#39;technologist&#39;, &#39;technologists&#39;, &#39;technology&#39;, &#39;technologys&#39;</li>\n\t<li>military: &#39;military&#39;</li>\n\t<li>security: &#39;security&#39;</li>\n\t<li>housing: &#39;housing&#39;</li>\n\t<li>pollution: &#39;pollution&#39;</li>\n</ul>\n\n<p>The dictionary file used is a standard file among Linux systems, and the version used was provided with version 7.1-1 of the Ubuntu &#39;wamerican&#39; package.&nbsp;Two extra phrases, which do not appear in the dictionary file, are added to the list: &#39;civil rights&#39; (under the &#39;racism&#39; keyword) and &#39;natural&nbsp;resources&#39; (under the &#39;natural&nbsp;resources&#39; theme).</p>", 
    "language": "eng", 
    "title": "Keyword counts from US Presidential State of the Union Addresses and Presidential Budget Messages", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "3250515"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3250516"
          }
        }
      ]
    }, 
    "version": "V1.0", 
    "keywords": [
      "US Politics", 
      "Text processing", 
      "Keyword counts"
    ], 
    "publication_date": "2019-06-20", 
    "creators": [
      {
        "orcid": "0000-0003-1502-6249", 
        "affiliation": "University of Melbourne", 
        "name": "Jeremy Silver"
      }, 
      {
        "orcid": "0000-0002-4430-4212", 
        "affiliation": "University of Melbourne", 
        "name": "Mark Quigley"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.3250515", 
        "relation": "isVersionOf"
      }
    ]
  }
}
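
The column layout described in the record's "description" field (a raw count per keyword, plus that count as a percentage of all keywords and of all words) can be illustrated with a short Python sketch. This is not the authors' script from the linked repository. The tokenizer, the exact whole-word matching rule, the function name `keyword_columns`, and the three-group excerpt `KEYWORD_GROUPS` are assumptions made for illustration; only the column names and the word lists come from the description.

```python
import re
from collections import Counter

# Excerpt of the keyword groups listed in the record description.
# The full dataset uses about 30 groups; three are enough to show the layout.
KEYWORD_GROUPS = {
    "energy": ["energy"],
    "war": ["war", "warrior", "warriors", "wars"],
    "racism": ["racism", "civil rights"],
}


def keyword_columns(text, groups=KEYWORD_GROUPS):
    """Build one row of counts and percentages for a single message.

    Assumption: words are lowercase alphabetic tokens and a term matches
    only as a whole word (or whole phrase, for multi-word terms).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    normalized = " ".join(tokens)  # used for multi-word phrases
    token_counts = Counter(tokens)

    counts = {}
    for keyword, terms in groups.items():
        total = 0
        for term in terms:
            if " " in term:
                # Multi-word phrase such as 'civil rights'
                pattern = r"\b" + re.escape(term) + r"\b"
                total += len(re.findall(pattern, normalized))
            else:
                total += token_counts[term]
        counts[keyword] = total

    n_words = len(tokens)
    n_keywords = sum(counts.values())
    row = {"count_of_all_words": n_words, "count_of_keywords": n_keywords}
    for keyword, count in counts.items():
        row[keyword] = count
        # Guard against division by zero for empty or keyword-free texts.
        row[keyword + "_pct_of_keywords"] = (
            100.0 * count / n_keywords if n_keywords else 0.0
        )
        row[keyword + "_pct_of_all_words"] = (
            100.0 * count / n_words if n_words else 0.0
        )
    return row


if __name__ == "__main__":
    sample = "The war on energy waste: civil rights and energy wars."
    for column, value in keyword_columns(sample).items():
        print(column, value)
```

For the sample sentence, the 'energy' group is counted twice among five keyword hits and ten words, so 'energy_pct_of_keywords' is 40.0 and 'energy_pct_of_all_words' is 20.0. The published files may differ in tokenization and matching details, so values computed this way should not be expected to reproduce results_SoU.txt or results_PBM.txt exactly.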
