Get or set variables associated with a document in a corpus, tokens or dfm object.
```r
docvars(x, field = NULL)
docvars(x, field = NULL) <- value
```
Argument | Description
---|---
x | corpus, tokens, or dfm object whose document-level variables will be read or set
field | string containing the document-level variable name
value | the new values of the document-level variable
`docvars` returns a data.frame of the document-level variables, dropping the second dimension to form a vector if a single docvar is returned.

`docvars<-` assigns `value` to the named `field`.
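The difference between the two return shapes can be checked directly. This is a minimal sketch, assuming quanteda is installed and using its bundled `data_corpus_inaugural`; the object names `dv_all` and `dv_year` are illustrative only.

```r
library(quanteda)

# With no field, docvars() returns all document-level variables as a data.frame
dv_all <- docvars(data_corpus_inaugural)

# With a single field, the second dimension is dropped and a plain vector comes back
dv_year <- docvars(data_corpus_inaugural, "Year")

# One row (or element) per document in either case
str(dv_year)
stopifnot(is.data.frame(dv_all), !is.data.frame(dv_year))
```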
Reassigning document variables for a tokens or dfm object is allowed, but discouraged. A better, more reproducible workflow is to create your docvars as desired in the corpus, and let these continue to be attached "downstream" after tokenization and forming a document-feature matrix. Recognizing that in some cases, you may need to modify or add document variables to downstream objects, the assignment operator is defined for tokens or dfm objects as well. Use with caution.
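The recommended workflow, setting docvars once on the corpus and letting them travel downstream, might look like the sketch below. It assumes a quanteda release in which `tokens()` and `dfm()` carry the corpus docvars along; the toy texts and the variable names `author` and `reviewed` are hypothetical.

```r
library(quanteda)

# Hypothetical two-document corpus; the vector names become the docnames
corp <- corpus(c(d1 = "A first short text.", d2 = "A second short text."))

# Preferred: create the document variable on the corpus
docvars(corp, "author") <- c("Smith", "Jones")

# The docvar is attached "downstream" without any further assignment
toks <- tokens(corp)
dfmat <- dfm(toks)
docvars(toks, "author")
docvars(dfmat, "author")

# Allowed but discouraged: adding a docvar directly to a downstream object
docvars(dfmat, "reviewed") <- c(TRUE, FALSE)
```

Because `reviewed` was added to the dfm only, it exists neither on `corp` nor on `toks`, which is exactly why the corpus-first workflow is easier to reproduce.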
Another way to access and set docvars is through indexing of the corpus `j` element, such as `data_corpus_irishbudget2010[, c("foren", "name")]`; or, for a single docvar, `data_corpus_irishbudget2010[["name"]]`. The latter also permits assignment, including the easy creation of new document variables, e.g. `data_corpus_irishbudget2010[["newvar"]] <- 1:ndoc(data_corpus_irishbudget2010)`. See `[.corpus` for details.
```r
# retrieving docvars from a corpus
head(docvars(data_corpus_inaugural))
#>                 Year  President FirstName
#> 1789-Washington 1789 Washington    George
#> 1793-Washington 1793 Washington    George
#> 1797-Adams      1797      Adams      John
#> 1801-Jefferson  1801  Jefferson    Thomas
#> 1805-Jefferson  1805  Jefferson    Thomas
#> 1809-Madison    1809    Madison     James
tail(docvars(data_corpus_inaugural, "President"), 10)
#>  [1] "Reagan"  "Reagan"  "Bush"    "Clinton" "Clinton" "Bush"    "Bush"
#>  [8] "Obama"   "Obama"   "Trump"

# assigning document variables to a corpus
corp <- data_corpus_inaugural
docvars(corp, "President") <- paste("prez", 1:ndoc(corp), sep = "")
head(docvars(corp))
#>                 Year President FirstName
#> 1789-Washington 1789     prez1    George
#> 1793-Washington 1793     prez2    George
#> 1797-Adams      1797     prez3      John
#> 1801-Jefferson  1801     prez4    Thomas
#> 1805-Jefferson  1805     prez5    Thomas
#> 1809-Madison    1809     prez6     James

# alternative using indexing
head(corp[, "Year"])
#> [1] 1789 1793 1797 1801 1805 1809
corp[["President2"]] <- paste("prezTwo", 1:ndoc(corp), sep = "")
head(docvars(corp))
#>                 Year President FirstName President2
#> 1789-Washington 1789     prez1    George   prezTwo1
#> 1793-Washington 1793     prez2    George   prezTwo2
#> 1797-Adams      1797     prez3      John   prezTwo3
#> 1801-Jefferson  1801     prez4    Thomas   prezTwo4
#> 1805-Jefferson  1805     prez5    Thomas   prezTwo5
#> 1809-Madison    1809     prez6     James   prezTwo6
```