Returns the subset of documents in a tokens object that meet certain conditions, including
direct logical operations on docvars (document-level variables).
tokens_subset() functions identically to subset.data.frame(), using non-standard evaluation to evaluate
conditions based on the docvars in the tokens object.
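For comparison, the same semantics can be seen with base R's subset.data.frame() applied to a plain data frame that stands in for the docvars. This is a base R illustration only, not quanteda code. The NA value given to d2 is an assumption added to show that a condition evaluating to NA drops the row.

```r
# Docvars for four documents, mirroring the example corpus below;
# grp for d2 is set to NA here (an assumption) to show how missing values behave.
dv <- data.frame(doc = c("d1", "d2", "d3", "d4"),
                 grp = c(1, NA, 2, 3))

# subset.data.frame() evaluates `grp > 1` inside dv (non-standard evaluation):
# the bare name `grp` is looked up among dv's columns.
kept <- subset(dv, grp > 1)

# NA > 1 is NA, and subset() treats NA as FALSE, so d2 is dropped
# along with d1; only d3 and d4 remain.
kept$doc
```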
tokens_subset(x, subset, ...)
Argument | Description |
---|---|
x | tokens object to be subsetted |
subset | logical expression indicating the documents to keep; missing values are taken as false |
... | not used |
A tokens object containing the subset of documents (and their docvars) selected according to the arguments.
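A minimal sketch of the idea, assuming a tokens object can be approximated by a named list of character vectors and its docvars by a data frame with one row per document. The helper subset_docs() is hypothetical and is not quanteda's implementation; it only shows how non-standard evaluation, the missing-as-false rule, and keeping documents and docvars aligned fit together.

```r
# Hypothetical helper (not part of quanteda): keeps the documents whose
# docvars satisfy `subset`, mimicking the non-standard evaluation used by
# subset.data.frame().
subset_docs <- function(docs, docvars, subset) {
  # Capture the unevaluated expression and evaluate it with the docvars
  # columns in scope, falling back to the caller's environment for other names.
  r <- eval(substitute(subset), docvars, parent.frame())
  if (!is.logical(r)) stop("'subset' must evaluate to a logical vector")
  # Missing values are taken as FALSE.
  keep <- r & !is.na(r)
  # Subset documents and their docvars together so they stay aligned.
  list(docs = docs[keep], docvars = docvars[keep, , drop = FALSE])
}

docs <- list(d1 = c("a", "b", "c", "d"), d2 = c("a", "a", "b", "e"),
             d3 = c("b", "b", "c", "e"), d4 = c("e", "e", "f", "a", "b"))
dv <- data.frame(grp = c(1, 1, 2, 3))

res <- subset_docs(docs, dv, grp > 1)
names(res$docs)
```

Because the expression is captured with substitute() and evaluated with the docvars data frame as its data, grp resolves to the docvars column, while any other names in the expression are looked up in the calling environment.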
```r
library("quanteda")

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# selecting on a docvars condition
tokens_subset(toks, grp > 1)
#> Tokens consisting of 2 documents and 1 docvar.
#> d3 :
#> [1] "b" "b" "c" "e"
#>
#> d4 :
#> [1] "e" "e" "f" "a" "b"
#>

# selecting on a supplied logical vector
tokens_subset(toks, c(TRUE, FALSE, TRUE, FALSE))
#> Tokens consisting of 2 documents and 1 docvar.
#> d1 :
#> [1] "a" "b" "c" "d"
#>
#> d3 :
#> [1] "b" "b" "c" "e"
#>
```