These functions compute a distance matrix between documents and/or features from a dfm and return a standard dist object.
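As a rough illustration of the kind of object returned, the sketch below uses only base R. The matrix `counts` and its labels are hypothetical and stand in for a dfm, and `stats::dist()` stands in for the functions documented here; their internals may differ.

```r
# Hypothetical document-by-feature counts standing in for a dfm
counts <- matrix(
  c(2, 0, 1,
    0, 3, 1,
    1, 1, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("apple", "berry", "cherry"))
)

# Distances between documents: dist() compares the rows of its input
doc_dist <- dist(counts, method = "euclidean")

# Distances between features: transpose so that features become rows
feat_dist <- dist(t(counts), method = "euclidean")

class(doc_dist)           # a standard "dist" object
attr(doc_dist, "Labels")  # row names are carried along as labels
as.matrix(doc_dist)       # expand to a full square matrix for inspection
```

Transposing the input is one simple way to switch between the document margin and the feature margin when working with a plain matrix.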
Slots
selection
- character or character vector of document names or feature
labels from the dfm
n
- the top n most similar items will be returned. If n is NULL, return all items.
margin
- identifies the margin of the dfm on which similarity will be computed: documents for documents or features for word/term features.
method
- the distance measure to be used. Options are "euclidean", "hamming", "Chisquared", "Chisquared2" and "kullback"; the default is "euclidean". More options are available in textstat_simil.
normalize
- a deprecated argument retained (temporarily) for legacy reasons. To compute similarity on a "normalized" dfm object (e.g. x), wrap it in weight(x, "relFreq").
digits
- decimal places to display similarity values
tri
- whether the upper triangle of the symmetric \(V \times V\) matrix is recorded
diag
- whether the diagonal of the distance matrix should be recorded
dist
- whether the distance matrix should be converted into an object of class "dist". The distance matrix produced by some methods, such as "kullback", is not symmetric.
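The asymmetry noted for "kullback" can be seen in a small base R sketch. It assumes the textbook Kullback-Leibler divergence on relative frequencies; the package's own "kullback" method may differ in details such as smoothing. The matrix `counts` and the helper `kl()` are hypothetical, and `prop.table()` is used here only as a stand-in for relative-frequency weighting.

```r
# Hypothetical counts standing in for a dfm; every cell is positive
# so that the logarithm below stays finite
counts <- matrix(
  c(4, 1, 1,
    1, 2, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"),
                  c("apple", "berry", "cherry"))
)

# Relative frequencies within each document (each row sums to 1)
rel <- prop.table(counts, margin = 1)

# Textbook Kullback-Leibler divergence between two probability vectors
kl <- function(p, q) sum(p * log(p / q))

# Full matrix of pairwise divergences between documents
n <- nrow(rel)
kl_mat <- matrix(0, n, n,
                 dimnames = list(rownames(rel), rownames(rel)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    kl_mat[i, j] <- kl(rel[i, ], rel[j, ])
  }
}

kl_mat               # the two off-diagonal entries differ
isSymmetric(kl_mat)  # FALSE

# as.dist() keeps only the lower triangle, so the value stored in
# kl_mat["doc1", "doc2"] is discarded by the conversion
as.dist(kl_mat)
```

Because a "dist" object stores one triangle only, converting an asymmetric matrix loses half of the values; keeping the full matrix is the safer choice for such methods.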
See also
dfm