distanceMatrix.Rd
Computes distance matrices among the samples of two or more multivariate time-series provided in a single dataframe (generally produced by prepareSequences), identified by a grouping column (argument grouping.column). Distances can be computed with the methods "manhattan", "euclidean", "chi", and "hellinger", and are implemented in the function distance. The function uses the packages parallel, foreach, and doParallel to compute distance matrices among different sequences in parallel. It is configured to use all processors available minus one.
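The parallel setup described above might look roughly like the following sketch. It assumes the foreach and doParallel packages are installed; the toy task (squaring numbers) and the variable names n.cores, my.cluster, and result are placeholders, and the package's actual internals may differ.

```r
library(parallel)
library(foreach)
library(doParallel)

# All available processors minus one, with a floor of one worker
# (detectCores() can return NA on some platforms).
n.cores <- parallel::detectCores() - 1
if (is.na(n.cores) || n.cores < 1) n.cores <- 1

# Create the cluster and register it as the foreach backend.
my.cluster <- parallel::makeCluster(n.cores)
doParallel::registerDoParallel(my.cluster)

# Placeholder task: each iteration is dispatched to a worker.
result <- foreach::foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Release the workers when done.
parallel::stopCluster(my.cluster)

result
```

Setting parallel.execution = FALSE, as in the example at the end of this page, avoids the cluster start-up cost, which can matter for small inputs.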
distanceMatrix(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  parallel.execution = TRUE
)
sequences | dataframe with multiple sequences identified by a grouping column. Generally the output of |
---|---|
grouping.column | character string, name of the column in |
time.column | character string, name of the column with time/depth/rank data. The data in this column is not modified. |
exclude.columns | character string or character vector with column names in |
method | character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error. |
parallel.execution | boolean, if |
A list with named slots containing the distance matrices of every possible combination of sequences according to grouping.column.
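To illustrate which matrices such a list would hold, the sketch below enumerates every pair of sequences with base R and builds one manhattan distance matrix per pair. The toy data, the column names id, a, and b, the row/column orientation, and the "A|B" slot-naming scheme are all assumptions made for illustration; they are not taken from the package.

```r
# Toy data: three sequences identified by the grouping column "id".
sequences <- data.frame(
  id = c("A", "A", "B", "B", "B", "C", "C"),
  a  = c(1, 2, 3, 4, 5, 6, 7),
  b  = c(7, 6, 5, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Every possible combination of two sequences.
pairs <- combn(unique(sequences$id), 2, simplify = FALSE)

# One distance matrix per pair. Assumed orientation: rows are samples
# of the first sequence, columns are samples of the second.
distance.matrices <- lapply(pairs, function(p) {
  s1 <- as.matrix(sequences[sequences$id == p[1], c("a", "b")])
  s2 <- as.matrix(sequences[sequences$id == p[2], c("a", "b")])
  m <- matrix(NA_real_, nrow = nrow(s1), ncol = nrow(s2))
  for (i in seq_len(nrow(s1))) {
    for (j in seq_len(nrow(s2))) {
      m[i, j] <- sum(abs(s1[i, ] - s2[j, ]))  # manhattan distance
    }
  }
  m
})

# Hypothetical slot names of the form "A|B".
names(distance.matrices) <- vapply(pairs, paste, character(1), collapse = "|")

names(distance.matrices)
dim(distance.matrices[["A|B"]])
```

With three sequences the list has three slots, one per unordered pair, and each matrix has as many rows and columns as the two sequences have samples.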
Distances are computed as:
manhattan: d <- sum(abs(x - y))
euclidean: d <- sqrt(sum((x - y)^2))
chi:
xy <- x + y
y. <- y / sum(y)
x. <- x / sum(x)
d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger: d <- sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
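The formulas above can be sketched as a small base-R function. This is an illustrative sketch, not the source of the package's distance function: the name distance_sketch and the way zeroes are replaced (a direct substitution of exact zeroes before computing "chi" or "hellinger") are assumptions. The "hellinger" branch follows the expression exactly as documented here; the textbook Hellinger distance instead squares each difference before summing, i.e. sum((sqrt(x) - sqrt(y))^2).

```r
# Illustrative sketch of the documented formulas (hypothetical helper,
# not the package's own distance() implementation).
distance_sketch <- function(x, y, method = "manhattan") {
  # Invalid method names throw an error.
  method <- match.arg(method, c("manhattan", "euclidean", "chi", "hellinger"))

  # Assumption: exact zeroes are swapped for 0.00001 for "chi" and "hellinger".
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      y. <- y / sum(y)
      x. <- x / sum(x)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # As documented: the sum of differences is squared as a whole.
    hellinger = sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
  )
}

x <- c(1, 2, 3)
y <- c(2, 4, 1)
distance_sketch(x, y, "manhattan")  # 1 + 2 + 2 = 5
distance_sketch(x, y, "euclidean")  # sqrt(1 + 4 + 4) = 3
```

Identical inputs give a distance of zero under all four methods, which is a quick sanity check when experimenting with the sketch.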
#loading data
data(sequenceA)
data(sequenceB)

#preparing datasets
AB.sequences <- prepareSequences(
  sequence.A = sequenceA,
  sequence.A.name = "A",
  sequence.B = sequenceB,
  sequence.B.name = "B",
  merge.mode = "complete",
  if.empty.cases = "zero",
  transformation = "hellinger"
)

#computing distance matrix
AB.distance.matrix <- distanceMatrix(
  sequences = AB.sequences,
  grouping.column = "id",
  method = "manhattan",
  parallel.execution = FALSE
)

#plot
plotMatrix(distance.matrix = AB.distance.matrix)