Get or replace the texts in a corpus, with grouping options.
Works for plain character vectors too, if `groups` is a factor.
```r
texts(x, groups = NULL, spacer = " ")

texts(x) <- value

# S3 method for corpus
as.character(x, ...)
```
Argument | Description |
---|---|
x | a corpus or character object |
groups | either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. See `groups` for details. |
spacer | when concatenating texts by using `groups`, the spacing added between texts (default is a single space, `" "`) |
value | character vector of the new texts |
... | unused |
For `texts`, a character vector of the texts in the corpus.
For `texts<-`, the corpus with the updated texts.
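The replacement form `texts(x) <- value` relies on R's replacement-function mechanism, which is also why an assignment such as `texts(BritCorpus)[2] <- ...` works: R extracts the texts, modifies element 2, and passes the whole modified vector back to the replacement function. The sketch below illustrates that mechanism on a hypothetical list-based object; `new_mini_corpus()`, `mini_texts()` and `` `mini_texts<-` `` are invented for illustration and are not part of quanteda.

```r
# Hypothetical corpus-like object (NOT quanteda's corpus class): a list that
# stores a named character vector of texts.
new_mini_corpus <- function(txt) {
  names(txt) <- paste0("text", seq_along(txt))
  structure(list(texts = txt), class = "mini_corpus")
}

# Getter: return the stored texts.
mini_texts <- function(x) x$texts

# Replacement function: `mini_texts(x) <- value` calls this and reassigns x.
`mini_texts<-` <- function(x, value) {
  stopifnot(is.character(value), length(value) == length(x$texts))
  # Empty-index assignment keeps the existing document names.
  x$texts[] <- value
  x
}

bc <- new_mini_corpus(c("We must prioritise.", "Second text."))

# Whole-vector replacement.
mini_texts(bc) <- sub("ise", "ize", mini_texts(bc))

# Element replacement; R evaluates this roughly as
# bc <- `mini_texts<-`(bc, value = `[<-`(mini_texts(bc), 2, "New text number 2."))
mini_texts(bc)[2] <- "New text number 2."

mini_texts(bc)
```

Because the replacement function receives the full vector, length checks and name preservation can live in one place.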
`as.character(x)`, where `x` is a corpus, is equivalent to calling `texts(x)`.
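Because `as.character()` is an internal generic in base R, a package can provide an S3 method for its own class that simply delegates to its getter. A minimal sketch of that pattern, using a hypothetical class `mini_corpus` and a hypothetical getter `mini_texts()` rather than quanteda's actual code:

```r
# Hypothetical corpus-like class (not quanteda's), used only to show how an
# as.character() S3 method can delegate to a texts()-style getter.
new_mini_corpus <- function(txt) {
  structure(list(texts = txt), class = "mini_corpus")
}

# Generic getter plus a method for the hypothetical class.
mini_texts <- function(x, ...) UseMethod("mini_texts")
mini_texts.mini_corpus <- function(x, ...) x$texts

# S3 method for the base generic as.character(); dispatched on class "mini_corpus".
as.character.mini_corpus <- function(x, ...) {
  mini_texts(x)
}

mc <- new_mini_corpus(c(doc1 = "A short text.", doc2 = "Another one."))
identical(as.character(mc), mini_texts(mc))
```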
Texts that share the same value of `groups` are concatenated (separated by `spacer`), without any specified order of aggregation.
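The grouping behaviour can be approximated in base R by splitting the texts on a factor and collapsing each group with the spacer. The helper `group_texts()` below is hypothetical and is not quanteda's implementation; in particular, it keeps texts in their original order within each group, whereas `texts()` makes no promise about order. It also shows the plain character vector case, with a factor supplying the groups.

```r
# Hypothetical helper (not quanteda's implementation): concatenate texts that
# share the same value of a grouping factor, separating them with `spacer`.
group_texts <- function(x, groups, spacer = " ") {
  groups <- as.factor(groups)
  stopifnot(length(groups) == length(x))
  # split() returns one character vector per factor level (levels in sorted order).
  pieces <- split(unname(x), groups)
  # Collapse each group into a single string; result is named by group level.
  vapply(pieces, paste, character(1), collapse = spacer)
}

txt <- c(d1 = "alpha", d2 = "beta", d3 = "gamma")
grp <- c("A", "B", "A")

group_texts(txt, grp)                   # returns c(A = "alpha gamma", B = "beta")
group_texts(txt, grp, spacer = " | ")   # returns c(A = "alpha | gamma", B = "beta")
```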
You are strongly encouraged, as a matter of good text analysis workflow, not to modify the substance of the texts in a corpus.
This sort of processing is better performed through downstream
operations. For instance, do not lowercase the texts in a corpus, or you
will never be able to recover the original case. Instead, apply
`tokens_tolower()` after applying `tokens()` to a corpus, or use the option `tolower = TRUE` in `dfm()`.
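To make the advice concrete without relying on quanteda, the base R sketch below keeps the original texts intact and derives lowercased tokens as a separate object. The whitespace split via `strsplit()` is only a crude stand-in for `tokens()`, and `lapply(..., tolower)` stands in for `tokens_tolower()`; neither reproduces quanteda's tokenizer.

```r
# Original texts: never modified.
original <- c(text1 = "The United States", text2 = "New York City")

# Crude whitespace tokenization (an assumption; quanteda's tokens() does much more).
toks <- strsplit(original, "\\s+")

# Lowercase a derived copy, downstream of the original texts.
toks_lower <- lapply(toks, tolower)

toks_lower[[1]]        # "the" "united" "states"
original[["text1"]]    # still "The United States"
```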
```r
#> 1789-Washington 1793-Washington 1797-Adams 1801-Jefferson 1805-Jefferson
#> 8618 790 13876 10136 12907

# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))
#> Adams Jefferson Washington
#> 13876 23045 9410

# grouping a character vector using a factor
nchar(data_char_ukimmig2010[1:5])
#> BNP Coalition Conservative Greens Labour
#> 18567 1471 2692 3841 3854
nchar(texts(data_corpus_inaugural[1:5],
            groups = as.factor(data_corpus_inaugural[1:5, "President"])))
#> Adams Jefferson Washington
#> 13876 23045 9410

BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.",
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <-
    stringi::stri_replace_all_regex(texts(BritCorpus),
                                    c("ise", "([nlb])our", "nium"),
                                    c("ize", "$1or", "num"),
                                    vectorize_all = FALSE)
texts(BritCorpus)
#> text1
#> "We must prioritize honor in our neighborhood."
#> text2
#> "Aluminum is a valorous metal."

texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
#> text1
#> "We must prioritize honor in our neighborhood."
#> text2
#> "New text number 2."
```
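The `stri_replace_all_regex()` call in the examples uses `vectorize_all = FALSE`, which applies each pattern/replacement pair in turn to every text. A base R equivalent, sketched here with a hypothetical helper `americanize()`, loops over the pairs with `gsub()`. Note that base R writes the back-reference as `"\\1"` in a string literal, where stringi (ICU) uses `"$1"`.

```r
txt <- c("We must prioritise honour in our neighbourhood.",
         "Aluminium is a valourous metal.")
patterns     <- c("ise", "([nlb])our", "nium")
replacements <- c("ize", "\\1or", "num")   # base R back-reference syntax

# Hypothetical helper: apply each pattern/replacement pair sequentially to
# all strings, mirroring stri_replace_all_regex(..., vectorize_all = FALSE).
americanize <- function(x, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x)
  }
  x
}

americanize(txt, patterns, replacements)
# "We must prioritize honor in our neighborhood." "Aluminum is a valorous metal."
```

Because the pairs are applied in sequence, a later pattern sees the output of earlier replacements, so the order of `patterns` can matter.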