Convert texts or tokens to lower (or upper) case

toLower(x, keep_acronyms = FALSE, ...)

# S3 method for character
toLower(x, keep_acronyms = FALSE, ...)

# S3 method for NULL
toLower(x, ...)

# S3 method for tokenizedTexts
toLower(x, keep_acronyms = FALSE, ...)

# S3 method for tokens
toLower(x, ...)

# S3 method for tokens
toUpper(x, ...)

# S3 method for corpus
toLower(x, ...)

toUpper(x, ...)

# S3 method for character
toUpper(x, ...)

# S3 method for NULL
toUpper(x, ...)

# S3 method for tokenizedTexts
toUpper(x, ...)

# S3 method for corpus
toUpper(x, ...)

Arguments

x

texts to be lower-cased (or upper-cased)

keep_acronyms

if TRUE, do not lowercase any all-uppercase words. Only applies to toLower.

...

additional arguments, such as locale, passed to stringi functions (e.g. stri_trans_tolower)
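The effect of keep_acronyms can be approximated in base R. The following is a minimal sketch, not the package's implementation: the helper name lower_keep_acronyms is hypothetical, and it assumes an "all-uppercase word" means a run of two or more uppercase letters bounded by word breaks.

```r
# Hypothetical helper: lowercase a character vector but leave
# all-uppercase words untouched. The two-letter minimum is an assumption.
lower_keep_acronyms <- function(x) {
  # locate runs of two or more uppercase letters delimited by word boundaries
  matches <- gregexpr("\\b[[:upper:]]{2,}\\b", x, perl = TRUE)
  acronyms <- regmatches(x, matches)
  # lowercase everything, then write the saved acronyms back in place
  out <- tolower(x)
  regmatches(out, matches) <- acronyms
  out
}

lower_keep_acronyms("England and France are members of NATO and UNESCO")
```

Under this assumption a single capital letter such as the "A" in "A NASA rocket" is still lowercased, because it does not form a run of two or more letters.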

Value

Texts transformed into their lower- (or upper-)cased versions. If x is a character vector or a corpus, returns a character vector. If x is a list of tokenized texts, returns a list of tokenized texts.

Examples

test1 <- c(text1 = "England and France are members of NATO and UNESCO",
           text2 = "NASA sent a rocket into space.")
toLower(test1)
#> Warning: 'toLower.character' is deprecated.
#> Use 'char_tolower' instead.
#> See help("Deprecated")
#> text1
#> "england and france are members of nato and unesco"
#> text2
#> "nasa sent a rocket into space."
toLower(test1, keep_acronyms = TRUE)
#> Warning: 'toLower.character' is deprecated.
#> Use 'char_tolower' instead.
#> See help("Deprecated")
#> text1
#> "england and france are members of NATO and UNESCO"
#> text2
#> "NASA sent a rocket into space."
test2 <- tokenize(test1, remove_punct = TRUE)
toLower(test2)
#> tokenizedTexts from 2 documents.
#> text1 :
#> [1] "england" "and" "france" "are" "members" "of" "nato"
#> [8] "and" "unesco"
#>
#> text2 :
#> [1] "nasa" "sent" "a" "rocket" "into" "space"
#>
toLower(test2, keep_acronyms = TRUE)
#> tokenizedTexts from 2 documents.
#> text1 :
#> [1] "england" "and" "france" "are" "members" "of" "NATO"
#> [8] "and" "UNESCO"
#>
#> text2 :
#> [1] "NASA" "sent" "a" "rocket" "into" "space"
#>
test1 <- c(text1 = "England and France are members of NATO and UNESCO",
           text2 = "NASA sent a rocket into space.")
toUpper(test1)
#> Warning: 'toUpper.character' is deprecated.
#> Use 'char_toupper' instead.
#> See help("Deprecated")
#> text1
#> "ENGLAND AND FRANCE ARE MEMBERS OF NATO AND UNESCO"
#> text2
#> "NASA SENT A ROCKET INTO SPACE."
test2 <- tokenize(test1, remove_punct = TRUE)
toUpper(test2)
#> Warning: 'toUpper.character' is deprecated.
#> Use 'char_toupper' instead.
#> See help("Deprecated")
#> Warning: 'toUpper.character' is deprecated.
#> Use 'char_toupper' instead.
#> See help("Deprecated")
#> tokenizedTexts from 2 documents.
#> text1 :
#> [1] "ENGLAND" "AND" "FRANCE" "ARE" "MEMBERS" "OF" "NATO"
#> [8] "AND" "UNESCO"
#>
#> text2 :
#> [1] "NASA" "SENT" "A" "ROCKET" "INTO" "SPACE"
#>
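The deprecation warnings above name char_tolower and char_toupper as the replacements for character input. A minimal migration sketch follows, assuming the quanteda package is installed; the requireNamespace guard and the variable names lowered and uppered are illustrative additions, not part of the original examples.

```r
test1 <- c(text1 = "England and France are members of NATO and UNESCO",
           text2 = "NASA sent a rocket into space.")

# Run only when quanteda is available (assumption: an installed version
# that exports the replacement functions named in the warnings)
if (requireNamespace("quanteda", quietly = TRUE)) {
  lowered <- quanteda::char_tolower(test1)
  uppered <- quanteda::char_toupper(test1)
  print(lowered)
  print(uppered)
}
```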