Calls C++ code for fast selection or removal of features from a set of tokens.
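The keep/remove logic can be sketched in base R on a plain list of character vectors. This is a minimal sketch that assumes exact, case-sensitive matching. `select_tokens_sketch` is a hypothetical helper, not part of quanteda, and it does not reproduce the C++ code the function actually calls.

```r
# Hypothetical helper (not part of quanteda): keep or remove tokens that
# exactly match any element of `features`, document by document.
select_tokens_sketch <- function(toks, features, selection = c("keep", "remove")) {
  selection <- match.arg(selection)
  lapply(toks, function(doc) {
    hit <- doc %in% features            # exact, case-sensitive match
    if (selection == "keep") doc[hit] else doc[!hit]
  })
}

toks <- list(text1 = c("This", "is", "some", "example", "text"),
             text2 = c("More", "of", "the", "example", "text"))
select_tokens_sketch(toks, c("example", "text"), "keep")
select_tokens_sketch(toks, c("is", "of", "the"), "remove")
```

Each document is filtered independently, so documents keep their names and their token order.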
```r
selectFeaturesOLD(x, ...)

# S3 method for tokenizedTexts
selectFeaturesOLD(x, features, selection = c("keep", "remove"),
  valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE,
  verbose = TRUE, ...)
```
Argument | Description |
---|---|
x | object whose features will be selected |
... | supplementary arguments passed to the underlying functions in |
features | one of: a character vector of features to be selected, a dfm whose features will be used for selection, or a dictionary class object whose values (not keys) will provide the features to be selected. For dfm objects, see details in the Value section below. |
selection | whether to keep (`"keep"`, the default) or remove (`"remove"`) the features |
valuetype | how to interpret keyword expressions: one of `"glob"` (the default), `"regex"`, or `"fixed"` |
case_insensitive | ignore the case of dictionary values if `TRUE` (the default) |
verbose | if `TRUE` (the default), print status messages |
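The three `valuetype` options can be illustrated with base R pattern matching. The sketch below assumes that glob patterns behave like `utils::glob2rx()` translations and that `case_insensitive` applies to both tokens and features. `match_features_sketch` is hypothetical and is not quanteda's internal matcher.

```r
# Hypothetical helper (not part of quanteda): TRUE for each token that matches
# any pattern in `features`, interpreted according to `valuetype`.
match_features_sketch <- function(tokens, features,
                                  valuetype = c("glob", "regex", "fixed"),
                                  case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  if (valuetype == "fixed") {
    # Exact matching; lower-case both sides when ignoring case.
    if (case_insensitive) return(tolower(tokens) %in% tolower(features))
    return(tokens %in% features)
  }
  # Glob wildcards ("*", "?") are translated into anchored regular expressions;
  # regex patterns are used as-is and may match anywhere in a token.
  patterns <- if (valuetype == "glob") utils::glob2rx(features) else features
  matches <- lapply(patterns, function(p) grepl(p, tokens, ignore.case = case_insensitive))
  # A token is selected if it matches at least one pattern.
  Reduce(`|`, matches, rep(FALSE, length(tokens)))
}

toks <- c("example", "text", "Extra", "the")
match_features_sketch(toks, "ex", "regex")
match_features_sketch(toks, "ex*", "glob")
match_features_sketch(toks, "extra", "fixed", case_insensitive = FALSE)
```

The logical vector can then drive the selection step: `tokens[hits]` to keep, `tokens[!hits]` to remove.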
```r
toks <- tokenize(c("This is some example text from me.", "More of the example text."),
                 remove_punct = TRUE)
selectFeaturesOLD(toks, stopwords("english"), "remove")
#> tokenizedTexts from 2 documents.
#> Component 1 :
#> [1] "example" "text"
#>
#> Component 2 :
#> [1] "example" "text"
#>
selectFeaturesOLD(toks, "ex", "keep", valuetype = "regex")
#> tokenizedTexts from 2 documents.
#> Component 1 :
#> [1] "example" "text"
#>
#> Component 2 :
#> [1] "example" "text"
#>
```
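The second example keeps `"text"` as well as `"example"` because an unanchored regular expression matches anywhere inside a token. Assuming glob patterns are anchored the way `utils::glob2rx()` anchors them, the pattern `"ex"` would match neither token under `valuetype = "glob"`, while `"ex*"` would match only `"example"`. This base R sketch does not call quanteda.

```r
tokens <- c("example", "text")

# Regex: "ex" matches as a substring anywhere in the token.
grepl("ex", tokens)                    # TRUE TRUE

# Glob (assumed glob2rx-style): "ex" becomes "^ex$", an exact match.
grepl(utils::glob2rx("ex"), tokens)    # FALSE FALSE

# Glob with a trailing wildcard: "ex*" becomes "^ex", a prefix match.
grepl(utils::glob2rx("ex*"), tokens)   # TRUE FALSE
```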