Returns the subset of documents in a tokens object that meet certain conditions, including
direct logical operations on docvars (document-level variables).
tokens_subset functions identically to subset.data.frame, using non-standard evaluation to evaluate
conditions based on the docvars in the tokens.
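Because the behaviour is described by analogy with subset.data.frame, a small base R sketch may help show what non-standard evaluation means here. It uses only base R, not tokens_subset itself; the data frame `dv` and its columns `doc` and `grp` are illustrative stand-ins for a set of docvars.

```r
# Base R only: a data frame standing in for document-level variables (docvars).
# The names `dv`, `doc`, and `grp` are illustrative, not part of any package.
dv <- data.frame(doc = c("d1", "d2", "d3", "d4"),
                 grp = c(1, 1, 2, 3),
                 stringsAsFactors = FALSE)

# Non-standard evaluation: `grp` is looked up as a column of `dv`,
# so the condition can be written without the `dv$` prefix.
kept_nse <- subset(dv, grp > 1)

# The standard-evaluation form names the column explicitly.
kept_std <- dv[dv$grp > 1, ]

# Both forms keep the same rows.
kept_nse$doc
kept_std$doc
```

The same convenience is what allows a condition such as `grp > 1` to be passed directly, with `grp` resolved among the docvars rather than in the calling environment.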
tokens_subset(x, subset, select, ...)
Argument | Description |
---|---|
x | tokens object to be subsetted |
subset | logical expression indicating the documents to keep: missing values are taken as false |
select | expression, indicating the docvars to select from the tokens; or a tokens object, in which case the returned tokens will contain the same documents in the same order as the original tokens, even if these are empty. |
... | not used |
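The rule that missing values are taken as false, and the use of an expression to select variables, both come from subset.data.frame. The sketch below illustrates them in base R only; the data frame `dv` and the column names `doc`, `grp`, and `lang` are hypothetical, and it is an assumption based on the description above that tokens_subset treats docvars in the same way.

```r
# Base R only: illustrative stand-in for docvars, with one missing value.
dv <- data.frame(doc  = c("d1", "d2", "d3", "d4"),
                 grp  = c(1, NA, 2, 3),
                 lang = c("en", "en", "de", "fr"),
                 stringsAsFactors = FALSE)

# `grp > 1` evaluates to FALSE, NA, TRUE, TRUE.
# subset() treats the NA as FALSE, so d2 is dropped rather than
# returned as a row of missing values.
res <- subset(dv, grp > 1, select = c(doc, lang))

res$doc     # "d3" "d4"
names(res)  # "doc" "lang"
```

By contrast, plain logical indexing such as `dv[dv$grp > 1, ]` would return an all-NA row for d2, which is why the missing-value rule is worth stating explicitly.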
tokens object, with a subset of documents (and docvars) selected according to arguments
subset.data.frame
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e",
                 d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# selecting on a docvars condition
tokens_subset(toks, grp > 1)
#> tokens from 2 documents.
#> d3 :
#> [1] "b" "b" "c" "e"
#>
#> d4 :
#> [1] "e" "e" "f" "a" "b"
#>

# selecting on a supplied vector
tokens_subset(toks, c(TRUE, FALSE, TRUE, FALSE))
#> tokens from 2 documents.
#> d1 :
#> [1] "a" "b" "c" "d"
#>
#> d3 :
#> [1] "b" "b" "c" "e"
#>

# selecting on a tokens
toks1 <- tokens(c(d1 = "a b b c", d2 = "b b c d"))
toks2 <- tokens(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
tokens_subset(toks1, subset = toks2)
#> tokens from 3 documents.
#> d1 :
#> [1] "a" "b" "b" "c"
#>
#> d2 :
#> [1] "b" "b" "c" "d"
#>
#> d3 :
#> character(0)
#>
tokens_subset(toks1, subset = toks2[c(3,1,2)])
#> tokens from 3 documents.
#> d3 :
#> character(0)
#>
#> d1 :
#> [1] "a" "b" "b" "c"
#>
#> d2 :
#> [1] "b" "b" "c" "d"
#>