Creates a hashed object of tokens, called by tokens().

tokens_hash(x, types_reserved, ...)

Arguments

x

a tokenizedTexts object from which to create the hashed tokens

types_reserved

optional pre-existing types for mapping of tokens

...

additional arguments

Value

a list of the hashed tokens found in each text

Details

This was formerly used to create a tokenizedTextsHashed object, but because all tokens objects are now hashed, the function is exported only for testing until it becomes internal-only.
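The "hashing" described here means mapping each unique token type to an integer index, so documents are stored as integer vectors that share one types vector. As a rough illustration of that idea (a hypothetical Python sketch, not quanteda's R implementation, which uses 1-based indexing and C++ internals):

```python
def tokens_hash(texts, types_reserved=None):
    """Map each token to an integer index into a shared types list.

    texts: dict of document name -> list of token strings.
    types_reserved: optional pre-existing types, whose indices are kept.
    Returns (hashed, types): hashed maps each document to integer IDs,
    and types is the shared vector of unique token types.
    """
    types = list(types_reserved) if types_reserved else []
    index = {t: i for i, t in enumerate(types)}
    hashed = {}
    for name, toks in texts.items():
        ids = []
        for tok in toks:
            if tok not in index:
                # First occurrence of this type: append it to the
                # shared types vector and record its index.
                index[tok] = len(types)
                types.append(tok)
            ids.append(index[tok])
        hashed[name] = ids
    return hashed, types

docs = {
    "doc1": ["the", "quick", "brown", "fox"],
    "doc2": ["the", "fox"],
}
hashed, types = tokens_hash(docs)
print(hashed)  # {'doc1': [0, 1, 2, 3], 'doc2': [0, 3]}
print(types)   # ['the', 'quick', 'brown', 'fox']
```

Repeated types (such as "the" and "fox" above) are stored once in the types vector, which is why hashed tokens are more compact and faster to process than character tokens.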

Note

This function will soon become internal-only.

See also

tokenize

Examples

txt <- c(doc1 = "The quick brown fox jumped over the lazy dog.",
         doc2 = "The dog jumped and ate the fox.")
toks <- tokenize(char_tolower(txt), remove_punct = TRUE)
toksHashed <- tokens_hash(toks)
toksHashed
#> tokens from 2 documents.
#> doc1 :
#> [1] "the"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"
#> [9] "dog"
#>
#> doc2 :
#> [1] "the"    "dog"    "jumped" "and"    "ate"    "the"    "fox"
#>
# returned as a list
as.list(toksHashed)
#> $doc1
#> [1] "the"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"
#> [9] "dog"
#>
#> $doc2
#> [1] "the"    "dog"    "jumped" "and"    "ate"    "the"    "fox"
#>
# returned as a tokenizedTexts object
as.tokenizedTexts(toksHashed)
#> tokenizedTexts from 2 documents.
#> doc1 :
#> [1] "the"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"
#> [9] "dog"
#>
#> doc2 :
#> [1] "the"    "dog"    "jumped" "and"    "ate"    "the"    "fox"
#>
# change case
toks <- tokens_hash(tokenize(c(one = "a b c d A B C D", two = "A B C d")))