Creates a hashed object of tokens, called by `tokens`.
`tokens_hash(x, types_reserved, ...)`
Argument | Description
---|---
`x` | a source of tokenized text
`types_reserved` | optional pre-existing types for the mapping of tokens
`...` | additional arguments
Returns a list of the hashed tokens found in each text.
This function was formerly used to create a tokenizedTextsHashed
object. All tokens objects are now hashed, so it is exported only for
testing and will soon become internal only.
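To illustrate what "hashing" means here, the sketch below shows the general idea in conceptual form: each token string is replaced by an integer index into a shared table of unique types, so repeated tokens are stored only once. This is a minimal illustration of the technique, not quanteda's implementation; the function name `tokens_hash` and the `types_reserved` parameter are borrowed from the signature above, but the internals are assumptions.

```python
# Conceptual sketch of token hashing (NOT quanteda's actual implementation):
# replace each token string with an integer index into a vector of unique
# types, optionally seeded with a pre-existing set of reserved types.
def tokens_hash(texts, types_reserved=None):
    """Map each document's tokens to integer indices into a types table."""
    types = list(types_reserved) if types_reserved else []
    index = {t: i for i, t in enumerate(types)}
    hashed = {}
    for name, toks in texts.items():
        ids = []
        for tok in toks:
            if tok not in index:
                # first time we see this type: append it and record its index
                index[tok] = len(types)
                types.append(tok)
            ids.append(index[tok])
        hashed[name] = ids
    return hashed, types

docs = {
    "doc1": ["the", "quick", "brown", "fox"],
    "doc2": ["the", "fox"],
}
hashed, types = tokens_hash(docs)
print(hashed)  # doc2 reuses the indices already assigned to "the" and "fox"
print(types)
```

Because the integer codes index into one shared types vector, converting back to strings (as `as.list` does for a tokens object) is just a lookup per index.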
```r
txt <- c(doc1 = "The quick brown fox jumped over the lazy dog.",
         doc2 = "The dog jumped and ate the fox.")
toks <- tokenize(char_tolower(txt), remove_punct = TRUE)
toksHashed <- tokens_hash(toks)
toksHashed
#> tokens from 2 documents.
#> doc1 :
#> [1] "the"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"
#> [9] "dog"
#>
#> doc2 :
#> [1] "the"    "dog"    "jumped" "and"    "ate"    "the"    "fox"

# returned as a list
as.list(toksHashed)
#> $doc1
#> [1] "the"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"
#> [9] "dog"
#>
#> $doc2
#> [1] "the"    "dog"    "jumped" "and"    "ate"    "the"    "fox"
```