pid: In most cases, this is the index of the text document as archived on The American Presidency Project website. In some cases, it is the filename of a plain-text file that was read directly.
year: Year that the message was delivered.
date: Date that the message was delivered.
name: Name of the US President delivering the message.
count_of_all_words: Count of all words in the document.
count_of_keywords: Count of all keywords encountered in that document.
Keyword-specific columns: three per keyword. For example, for the 'energy' keyword, the 'energy' column gives the number of times the 'energy' keyword was counted in the message, 'energy_pct_of_keywords' gives this count as a percentage of all keywords (count_of_keywords), and 'energy_pct_of_all_words' gives this count as a percentage of all words (count_of_all_words).
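The relationship between the count columns and the per-keyword columns can be sketched as follows. This is a minimal illustration, not the script that produced the data. The function name `keyword_columns` is hypothetical. It assumes that count_of_keywords is the sum of the per-keyword counts, that percentages are on a 0-100 scale rather than fractions, and that a zero denominator yields 0.0.

```python
def keyword_columns(keyword_counts: dict[str, int], count_of_all_words: int) -> dict[str, float]:
    """Hypothetical helper: build the count and per-keyword columns for one message.

    keyword_counts maps each keyword (e.g. 'energy') to the number of times
    it was counted in the message.
    Assumptions (not stated in the source): count_of_keywords is the sum of
    the per-keyword counts, percentages use a 0-100 scale, and a zero
    denominator produces 0.0.
    """
    count_of_keywords = sum(keyword_counts.values())
    row: dict[str, float] = {
        "count_of_all_words": count_of_all_words,
        "count_of_keywords": count_of_keywords,
    }
    for keyword, n in keyword_counts.items():
        # Raw count for this keyword, e.g. the 'energy' column.
        row[keyword] = n
        # Share of all keyword hits in the message.
        row[f"{keyword}_pct_of_keywords"] = (
            100.0 * n / count_of_keywords if count_of_keywords else 0.0
        )
        # Share of all words in the message.
        row[f"{keyword}_pct_of_all_words"] = (
            100.0 * n / count_of_all_words if count_of_all_words else 0.0
        )
    return row
```

For a 200-word message with 3 'energy' hits and 1 'racism' hit, `keyword_columns({"energy": 3, "racism": 1}, 200)` gives count_of_keywords = 4, energy_pct_of_keywords = 75.0 and energy_pct_of_all_words = 1.5.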
Below is the list of keywords that match when the search is applied to a dictionary file containing over 99,000 US English words.
The dictionary file is a standard file on Linux systems; the version used was provided with version 7.1-1 of the Ubuntu 'wamerican' package. Two extra phrases that do not appear in the dictionary file are added to the list: 'civil rights' (under the 'racism' keyword) and 'natural resources' (under the 'natural resources' keyword).