Substitute features based on vectorized one-to-one matching for lemmatization or user-defined stemming.
dfm_replace(x, pattern, replacement = NULL, case_insensitive = TRUE, verbose = quanteda_options("verbose"))
Argument | Description |
---|---|
x | dfm whose features will be replaced |
pattern | a character vector or dictionary. See pattern for more details. |
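replacement | if `pattern` is a character vector, then `replacement` must be a character vector of equal length, for a one-to-one match; if `pattern` is a dictionary, `replacement` should be left as `NULL` |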
case_insensitive | ignore case when matching, if `TRUE` |
verbose | print status messages if `TRUE` |
    mydfm <- dfm(data_corpus_irishbudget2010)

    # lemmatization
    infle <- c("foci", "focus", "focused", "focuses", "focusing", "focussed", "focusses")
    lemma <- rep("focus", length(infle))
    mydfm2 <- dfm_replace(mydfm, infle, lemma)
    featnames(dfm_select(mydfm2, infle))
    #> [1] "focus"

    # stemming
    feat <- featnames(mydfm)
    stem <- char_wordstem(feat, "porter")
    mydfm3 <- dfm_replace(mydfm, feat, stem, case_insensitive = FALSE)
    identical(mydfm3, dfm_wordstem(mydfm, "porter"))
    #> [1] TRUE
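To make the one-to-one matching concrete, the following base R sketch mimics the idea on a plain count matrix instead of a dfm. It is an illustration only and not quanteda's implementation: `replace_features()` and the toy `counts` matrix are hypothetical names introduced here. Each feature name is looked up in `pattern`, matched features take the corresponding `replacement`, and features that end up with the same name are merged by summing their counts.

```r
# Hypothetical helper, not part of quanteda: mimics one-to-one feature
# replacement on a plain count matrix (documents in rows, features in columns).
replace_features <- function(counts, pattern, replacement,
                             case_insensitive = TRUE) {
  stopifnot(length(pattern) == length(replacement))
  feat <- colnames(counts)

  # Look up each feature in `pattern`, optionally ignoring case
  key <- if (case_insensitive) tolower(feat) else feat
  pat <- if (case_insensitive) tolower(pattern) else pattern
  idx <- match(key, pat)

  # Matched features take their replacement; the rest keep their name
  new_feat <- ifelse(is.na(idx), feat, replacement[idx])

  # Features that now share a name are merged by summing their counts
  t(rowsum(t(counts), group = new_feat, reorder = FALSE))
}

# Toy document-feature counts (assumed data, for illustration only)
counts <- matrix(c(1, 0, 2,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("focus", "Focused", "budget")))

merged <- replace_features(counts,
                           pattern = c("focused", "focuses"),
                           replacement = c("focus", "focus"))
print(merged)
```

With the default `case_insensitive = TRUE`, the column `Focused` is folded into `focus`; calling the helper with `case_insensitive = FALSE` leaves it untouched because `"Focused"` no longer matches `"focused"`.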