These functions compute a distance matrix between documents and/or features from a dfm and return a standard dist object.

Slots

selection
character or character vector of document names or feature labels from the dfm

n
the number of most similar items to return. If n is NULL, all items are returned.

margin
identifies the margin of the dfm on which similarity will be computed: documents for documents or features for word/term features.

method
the distance measure to be used; options are "euclidean", "hamming", "Chisquared", "Chisquared2" and "kullback". The default is "euclidean". More options are available in textstat_simil.
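To make the first two measures concrete, here is a minimal sketch in base R on a hypothetical count matrix (the data and object names are illustrative and not part of the package). It assumes that "hamming" counts the features whose presence or absence differs between two documents; the package's exact definition may differ.

```r
# Toy document-feature counts (hypothetical data): rows = documents, columns = features
m <- matrix(c(2, 0, 1,
              0, 1, 1,
              2, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("a", "b", "c")))

# Euclidean distance between documents, returned as a base R "dist" object
d_euc <- dist(m, method = "euclidean")

# Hamming distance (assumed definition): recode counts as 0/1 presence,
# then count the features on which two documents disagree
b <- (m > 0) * 1
d_ham <- dist(b, method = "manhattan")
```

Both results are ordinary dist objects, so as.matrix(d_euc) gives the full square matrix if that is easier to inspect.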

normalize
a deprecated argument retained (temporarily) for legacy reasons. If you want to compute similarity on a "normalized" dfm object (e.g. x), wrap it in weight(x, "relFreq").
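As a rough illustration of what normalizing before computing distances does, the sketch below uses base R only. It assumes that relative-frequency weighting means dividing each document's counts by that document's total; the matrix and object names are hypothetical.

```r
# Toy document-feature counts (hypothetical data)
x <- matrix(c(2, 0, 2,
              1, 1, 2),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("d1", "d2"), c("a", "b", "c")))

# Divide each row by its total so every document sums to 1
x_rel <- prop.table(x, margin = 1)

# Distances are now computed on proportions rather than raw counts
d_rel <- dist(x_rel, method = "euclidean")
```

Working on proportions keeps long documents from appearing far from short ones simply because their raw counts are larger.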

digits
number of decimal places to display for similarity values

tri
whether the upper triangle of the symmetric \(V \times V\) matrix is recorded

diag
whether the diagonal of the distance matrix should be recorded

dist
whether the distance matrix should be converted into an object of class "dist". Distance matrices created by some methods, such as "kullback", are not symmetric.
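The asymmetry matters because a base R "dist" object stores only one triangle. The sketch below illustrates this with a hand-written Kullback-Leibler divergence; the helper kl() and the two distributions are hypothetical, and the formula used by the package may differ in detail (for example in how zeros are handled).

```r
# Kullback-Leibler divergence of q from p (natural log); assumes no zero entries
kl <- function(p, q) sum(p * log(p / q))

# Two hypothetical feature distributions, each summing to 1
p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.2, 0.6)

# Full 2 x 2 matrix of divergences; K[i, j] is the divergence from row i to column j
K <- matrix(c(0,        kl(p, q),
              kl(q, p), 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("p", "q"), c("p", "q")))

isSymmetric(K)   # FALSE, because kl(p, q) != kl(q, p)

# as.dist() keeps only the lower triangle, so K["p", "q"] is discarded
K_dist <- as.matrix(as.dist(K))
```

After the round trip, K_dist holds kl(q, p) in both off-diagonal cells, which is why converting an asymmetric result to class "dist" loses information.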

See also

dfm