Create a quanteda dictionary class object, either from a list or by importing from a foreign format. Currently supported input file formats are the Wordstat, LIWC, Lexicoder v2 and v3, and Yoshikoder formats. The import using the LIWC format works with all currently available dictionary files supplied as part of the LIWC 2001, 2007, and 2015 software (see References).
dictionary(x, file = NULL, format = NULL, separator = " ", tolower = TRUE, encoding = "auto")
| Argument | Description |
|---|---|
| x | a named list of character vector dictionary entries, including valuetype pattern matches, and including multi-word expressions separated by `separator` |
| file | file identifier for a foreign dictionary |
| format | character identifier for the format of the foreign dictionary. If not supplied, the format is guessed from the dictionary file's extension. The supported formats are those listed in the description: Wordstat, LIWC, Lexicoder v2 and v3, and Yoshikoder. |
| separator | the character in between multi-word dictionary values. This defaults to `" "`. |
| tolower | if `TRUE`, convert all dictionary values to lowercase |
| encoding | additional optional encoding value for reading in imported dictionaries. This uses the iconv labels for encoding. See the "Encoding" section of the help for file. |
A dictionary class object, essentially a specially classed named list of characters.
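Because the returned object behaves like a named list, it can be inspected with ordinary list tools. The following is a minimal sketch, assuming quanteda is installed; the object name `sketch_dict` and its keys and values are hypothetical and chosen only for illustration.

```r
library(quanteda)

# Hypothetical keys and values, used only to illustrate the list input.
sketch_dict <- dictionary(list(
  holiday = c("Christmas", "Santa Claus"),  # "Santa Claus" is a multi-word value
  tax     = c("tax*", "levy")               # "tax*" is a pattern, not a literal word
))

is.dictionary(sketch_dict)  # check that the result is a dictionary object
names(sketch_dict)          # keys, accessed as for any named list
as.list(sketch_dict)        # convert back to a plain list of character vectors
```

Keys are kept as supplied, while values are subject to the `tolower` and `separator` arguments described above.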
Wordstat dictionaries page, from Provalis Research http://provalisresearch.com/products/content-analysis-software/wordstat-dictionary/.

Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., & Booth, R.J. (2007). The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX (www.liwc.net).

Yoshikoder page, from Will Lowe http://conjugateprior.org/software/yoshikoder/.

Lexicoder format, http://www.lexicoder.com
mycorpus <- corpus_subset(data_corpus_inaugural, Year > 1900)
mydict <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                          opposition = c("Opposition", "reject", "notincorpus"),
                          taxing = "taxing",
                          taxation = "taxation",
                          taxregex = "tax*",
                          country = "america"))
head(dfm(mycorpus, dictionary = mydict))
#> Document-feature matrix of: 30 documents, 6 features (71.7% sparse).
#> (showing first 6 documents and first 6 features)
#>                 features
#> docs             christmas opposition taxing taxation taxregex country
#>   1901-McKinley          0          2      0        1        1       0
#>   1905-Roosevelt         0          0      0        0        0       0
#>   1909-Taft              0          1      0        4        6       4
#>   1913-Wilson            0          0      0        1        1       0
#>   1917-Wilson            0          0      0        0        0       2
#>   1921-Harding           0          0      0        1        2      15

## Not run:
# import the Laver-Garry dictionary from Provalis Research
dictfile <- tempfile()
download.file("https://provalisresearch.com/Download/LaverGarry.zip",
              dictfile, mode = "wb")
unzip(dictfile, exdir = (td <- tempdir()))
lgdict <- dictionary(file = paste(td, "LaverGarry.cat", sep = "/"))
head(dfm(data_corpus_inaugural, dictionary = lgdict))

# import a LIWC formatted dictionary from http://www.moralfoundations.org
download.file("https://goo.gl/5gmwXq", tf <- tempfile())
mfdict <- dictionary(file = tf, format = "LIWC")
head(dfm(data_corpus_inaugural, dictionary = mfdict))
## End(Not run)