Substitute features based on vectorized one-to-one matching for lemmatization or user-defined stemming.
dfm_replace( x, pattern, replacement, case_insensitive = TRUE, verbose = quanteda_options("verbose") )
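Argument | Description |
---|---|
x | dfm whose features will be replaced |
pattern | a character vector. See pattern for more details. |
replacement | if pattern is a character vector, then replacement must be a character vector of equal length, for a 1:1 match |
case_insensitive | logical; if TRUE, ignore case when matching |
verbose | print status messages if TRUE |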
dfmat1 <- dfm(data_corpus_inaugural)

# lemmatization
taxwords <- c("tax", "taxing", "taxed", "taxed", "taxation")
lemma <- rep("TAX", length(taxwords))
featnames(dfm_select(dfmat1, pattern = taxwords))
#> [1] "tax" "taxation" "taxing"
dfmat2 <- dfm_replace(dfmat1, pattern = taxwords, replacement = lemma)
featnames(dfm_select(dfmat2, pattern = taxwords))
#> [1] "TAX"

# stemming
feat <- featnames(dfmat1)
featstem <- char_wordstem(feat, "porter")
dfmat3 <- dfm_replace(dfmat1, pattern = feat, replacement = featstem,
                      case_insensitive = FALSE)
identical(dfmat3, dfm_wordstem(dfmat1, "porter"))
#> [1] TRUE
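
To make the one-to-one matching concrete, the following is a minimal base-R sketch of what feature replacement amounts to on a plain count matrix: look up each feature name in pattern, swap in the corresponding replacement, then sum the columns that end up sharing a name. The helper `replace_features()` and the toy matrix are hypothetical and exist only for this illustration; this is not quanteda's implementation, and it assumes exact (optionally case-folded) matching of whole feature names.

```r
# Toy document-feature count matrix (plain base R, not a quanteda dfm).
counts <- matrix(
  c(1, 0, 2,
    0, 3, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("tax", "Taxed", "income"))
)

pattern     <- c("tax", "taxed")
replacement <- c("TAX", "TAX")

# Hypothetical helper (illustration only): rename columns by one-to-one
# lookup, then sum columns that now share the same name.
replace_features <- function(m, pattern, replacement, case_insensitive = TRUE) {
  stopifnot(length(pattern) == length(replacement))
  feat <- colnames(m)
  key <- if (case_insensitive) tolower(feat) else feat
  pat <- if (case_insensitive) tolower(pattern) else pattern
  idx <- match(key, pat)  # NA where a feature matches no pattern
  new_feat <- ifelse(is.na(idx), feat, replacement[idx])
  # rowsum() sums rows by group, so transpose, group, and transpose back
  t(rowsum(t(m), group = new_feat, reorder = FALSE))
}

# Case-insensitive: "tax" and "Taxed" both map to "TAX" and are merged.
merged <- replace_features(counts, pattern, replacement)

# Case-sensitive: "Taxed" does not match "taxed" and is left unchanged.
strict <- replace_features(counts, pattern, replacement,
                           case_insensitive = FALSE)
```

In this sketch, `merged` has the two columns "TAX" and "income", while `strict` keeps three columns because the capitalized feature no longer matches. This mirrors why the stemming example passes `case_insensitive = FALSE`: each feature should be mapped only by its own exact entry in pattern.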