Pattern(s) for use in matching features, tokens, and keywords through a valuetype pattern.
| Argument | Description |
|---|---|
| pattern | a character vector, list of character vectors, dictionary, collocations, or dfm. See the details that follow. |
The pattern argument is a vector of patterns, including sequences,
to match in a target object, whose match type is specified by valuetype.
Note that an empty pattern ("") will match "padding" in a tokens object.
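As a minimal sketch of the padding behaviour, the snippet below creates pads with tokens_remove(..., padding = TRUE) and then targets them with an empty pattern. The sample text and the object names (toks, toks_pad, toks_nopad) are made up for illustration.

```r
library(quanteda)

# Hypothetical example text
toks <- tokens("the president of the republic")

# Remove some tokens but leave empty "pads" in their positions
toks_pad <- tokens_remove(toks, pattern = c("the", "of"), padding = TRUE)
as.list(toks_pad)

# The empty pattern "" matches those pads, so this drops them
toks_nopad <- tokens_remove(toks_pad, pattern = "")
as.list(toks_nopad)
```

Keeping pads is useful when token positions matter (for example, so that removed words do not make their neighbours look adjacent); the empty pattern then gives a way to clean them up afterwards.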
character
A character vector of token patterns to be selected or removed.
Whitespace is not privileged, so in a character vector whitespace is interpreted
literally. If you wish to treat whitespace-separated elements as sequences of tokens,
wrap the argument in phrase().
list of character objects
If the list elements are character vectors of
length 1, then this is equivalent to a character vector. If a list element contains
a character vector longer than length 1, then matching treats its elements
as a sequence of tokens, equivalent to wrapping the argument in phrase(),
except when matching dfm features, where this does not apply.
dictionary
Values in a dictionary are used as patterns, for literal matches.
Multi-word values are automatically converted into phrases, so performing selection or
compounding using a dictionary is the same as wrapping the dictionary in phrase().
collocations
Collocations objects created from textstat_collocations(),
which are treated as phrases automatically.
dfm
Only dfm_select() accepts a dfm as the pattern, creating a new dfm
identical in its feature set, using a fixed match.
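To make the difference between literal and phrase matching concrete, here is a small sketch that applies a character vector, a phrase() pattern, and a list pattern to a tokens object with tokens_select() and tokens_compound(). The sentence and the object names are hypothetical, and the default glob valuetype with case-insensitive matching is assumed.

```r
library(quanteda)

# Hypothetical example text
toks <- tokens("The House of Representatives met the president at the White House.")

# Character vector: "white house" is read literally as ONE token, so nothing matches
toks_literal <- tokens_select(toks, pattern = "white house")
ntoken(toks_literal)

# Wrapped in phrase(): whitespace now separates the tokens of a sequence
toks_phrase <- tokens_select(toks, pattern = phrase("white house"))
as.list(toks_phrase)

# List with multi-element vectors: each vector is treated as a token sequence
patt_list <- list(c("white", "house"), c("house", "of", "representatives"))
toks_list <- tokens_compound(toks, pattern = patt_list)

# The same patterns written as whitespace-separated strings inside phrase()
toks_phr <- tokens_compound(
  toks,
  pattern = phrase(c("white house", "house of representatives"))
)

# Both forms should compound the same sequences
identical(as.list(toks_list), as.list(toks_phr))
```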
# these are interpreted literally
(patt1 <- c('president', 'white house', 'house of representatives'))
#> [1] "president" "white house"
#> [3] "house of representatives"
# as multi-word sequences
phrase(patt1)
#> [[1]]
#> [1] "president"
#>
#> [[2]]
#> [1] "white" "house"
#>
#> [[3]]
#> [1] "house" "of" "representatives"
#>

# three single-word patterns
(patt2 <- c('president', 'white_house', 'house_of_representatives'))
#> [1] "president" "white_house"
#> [3] "house_of_representatives"
phrase(patt2)
#> [[1]]
#> [1] "president"
#>
#> [[2]]
#> [1] "white_house"
#>
#> [[3]]
#> [1] "house_of_representatives"
#>

# this is equivalent to phrase(patt1)
(patt3 <- list(c('president'), c('white', 'house'), c('house', 'of', 'representatives')))
#> [[1]]
#> [1] "president"
#>
#> [[2]]
#> [1] "white" "house"
#>
#> [[3]]
#> [1] "house" "of" "representatives"
#>

# glob expression can be used
phrase(patt4 <- c('president?', 'white house', 'house * representatives'))
#> [[1]]
#> [1] "president?"
#>
#> [[2]]
#> [1] "white" "house"
#>
#> [[3]]
#> [1] "house" "*" "representatives"
#>

# this is equivalent to phrase(patt4)
(patt5 <- list(c('president?'), c('white', 'house'), c('house', '*', 'representatives')))
#> [[1]]
#> [1] "president?"
#>
#> [[2]]
#> [1] "white" "house"
#>
#> [[3]]
#> [1] "house" "*" "representatives"
#>

# dictionary with multi-word matches
(dict1 <- dictionary(list(us = c('president', 'white house', 'house of representatives'))))
#> Dictionary object with 1 key entry.
#> - [us]:
#>   - president, white house, house of representatives
phrase(dict1)
#> [[1]]
#> [1] "president"
#>
#> [[2]]
#> [1] "white" "house"
#>
#> [[3]]
#> [1] "house" "of" "representatives"
#>
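
The examples above only show how phrase() splits patterns. As a further sketch, the snippet below passes a dictionary like dict1 directly to tokens_select() and tokens_compound(), to illustrate that multi-word dictionary values behave as phrases without an explicit phrase() call. The sample sentence and the names toks, toks_dict, toks_phr and toks_comp are assumptions for illustration.

```r
library(quanteda)

# Same dictionary as dict1 in the examples, redefined so the block is self-contained
dict1 <- dictionary(list(us = c("president", "white house", "house of representatives")))

# Hypothetical example text
toks <- tokens("The president spoke at the White House.")

# Selecting with the dictionary itself ...
toks_dict <- tokens_select(toks, pattern = dict1)

# ... and with the dictionary wrapped in phrase()
toks_phr <- tokens_select(toks, pattern = phrase(dict1))

# Both selections should keep the same tokens
identical(as.list(toks_dict), as.list(toks_phr))

# Compounding with the dictionary joins the multi-word match into one token
toks_comp <- tokens_compound(toks, pattern = dict1)
as.list(toks_comp)
```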