Corpus functions
Functions for constructing and manipulating corpus class objects.
| corpus_reshape | recast the document units of a corpus |
| corpus_sample | randomly sample documents from a corpus |
| corpus_segment char_segment | segment texts into component elements |
| corpus_subset | extract a subset of a corpus |
| corpus_trim char_trim | remove sentences based on their token lengths or a pattern match |
| corpus | construct a corpus object |
| metacorpus metacorpus<- | get or set corpus metadata |
| docvars docvars<- | get or set document-level variables |
| metadoc metadoc<- | get or set document-level metadata |
| texts texts<- as.character | get or assign corpus texts |
| as.corpus | coerce a compressed corpus to a standard corpus |
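
As a rough illustration of how a few of the corpus functions listed above fit together, the following sketch builds a small corpus, attaches a document-level variable, subsets it, and reshapes it into sentences. The texts and the `party` variable are invented for this example, and argument names may differ between quanteda releases.

```r
library(quanteda)

# Three toy documents; the texts and the "party" values are made up.
txt <- c(d1 = "Taxes are too high. Cut them now.",
         d2 = "Public services need more funding.",
         d3 = "Lower taxes help growth. Spending must fall.")

# corpus: construct a corpus object from a named character vector
corp <- corpus(txt)

# docvars<-: set a document-level variable (one value per document)
docvars(corp, "party") <- c("A", "B", "A")

# corpus_subset: keep only the documents whose docvars meet a condition
corp_a <- corpus_subset(corp, party == "A")
ndoc(corp_a)

# corpus_reshape: recast the document units, here from documents to sentences
corp_sent <- corpus_reshape(corp_a, to = "sentences")
ndoc(corp_sent)
```

Reshaping after subsetting keeps the example small; the two steps are independent and could be applied in either order.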
Text matrix functions
Functions for constructing and manipulating a document-feature matrix (dfm) or feature co-occurrence matrix (fcm) object.
| dfm | create a document-feature matrix |
| dfm_compress fcm_compress | recombine a dfm or fcm by combining identical dimension elements |
| dfm_group | combine documents in a dfm by a grouping variable |
| dfm_lookup | apply a dictionary to a dfm |
| dfm_sample | randomly sample documents or features from a dfm |
| dfm_select dfm_remove fcm_select fcm_remove | select features from a dfm or fcm |
| dfm_subset | extract a subset of a dfm |
| dfm_sort | sort a dfm by frequency of one or more margins |
| dfm_tolower dfm_toupper fcm_tolower fcm_toupper | convert the case of the features of a dfm or fcm and combine features that become identical |
| dfm_trim | trim a dfm using frequency threshold-based feature selection |
| dfm_weight dfm_smooth | weight the feature frequencies in a dfm |
| tokens_wordstem char_wordstem dfm_wordstem | stem the terms in an object |
| head tail | return the first or last part of a dfm |
| is.dfm as.dfm | coercion and checking functions for dfm objects |
| as.matrix as.data.frame | coerce a dfm to a matrix or data.frame |
| fcm | create a feature co-occurrence matrix |
| fcm_sort | sort an fcm in alphabetical order of the features |
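
To make the dfm entries above more concrete, here is a minimal sketch that creates a document-feature matrix and applies a few of the listed operations. The toy texts and the grouping factor are assumptions made for this example. It tokenizes first with `tokens()`, a quanteda function not listed in this section, because some quanteda releases no longer accept raw text in `dfm()`; argument names such as `pattern` and `groups` may also differ in older releases.

```r
library(quanteda)

# Toy texts, invented for illustration.
txt <- c(d1 = "Taxes taxes spending",
         d2 = "spending growth",
         d3 = "Growth taxes")

# dfm: create a document-feature matrix (features are lowercased by default)
mat <- dfm(tokens(txt))

# dfm_remove: drop features matching a pattern
mat_small <- dfm_remove(mat, pattern = "spending")
featnames(mat_small)

# dfm_group: combine documents by a grouping variable
# (the factor below is a hypothetical grouping, one value per document)
mat_grp <- dfm_group(mat, groups = factor(c("x", "y", "x")))
ndoc(mat_grp)

# dfm_sort: order features by overall frequency, most frequent first
mat_sorted <- dfm_sort(mat, decreasing = TRUE, margin = "features")
featnames(mat_sorted)

# as.matrix: coerce the sparse dfm to a dense base R matrix
as.matrix(mat)
```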