distancePairedSamples.Rd
Computes the distance (one of: "manhattan", "euclidean", "chi", or "hellinger") between pairs of aligned samples (same order/depth/age) in two or more multivariate time-series.
distancePairedSamples(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  same.time = FALSE,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = TRUE
)
Argument | Description |
---|---|
sequences | dataframe with multiple sequences identified by a grouping column. Generally the output of |
grouping.column | character string, name of the column in |
time.column | character string, name of the column with time/depth/rank data. The data in this column is not modified. |
exclude.columns | character string or character vector with column names in |
same.time | boolean. If |
method | character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error. |
sum.distances | boolean, if |
parallel.execution | boolean, if |
A list with named slots (the names of the two sequences separated by a vertical line, as in "A|B"). Each slot contains a numeric vector with the distances between paired samples, and there is one slot for every possible combination of sequences according to grouping.column.
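The naming scheme of the returned list can be illustrated with a small sketch. It assumes that "every possible combination" means every unordered pair of sequences. The identifiers `ids`, `pair.names`, and `mock` are hypothetical, and the numeric values are placeholders rather than output of the function.

```r
# Hypothetical sequence identifiers, for illustration only
ids <- c("A", "B", "C")

# Every unordered pair of sequences (assumption: order within a pair is ignored)
pairs <- utils::combn(ids, 2, simplify = FALSE)

# Slot names join the two sequence names with a vertical line, as in "A|B"
pair.names <- vapply(pairs, paste, character(1), collapse = "|")

# Mock result with placeholder values, shaped like the list described above
mock <- setNames(
  list(c(1.2, 0.4), c(2.0, 0.1), c(0.7, 0.9)),
  pair.names
)

# Because the names contain "|", use [[ ]] or backticks to reach a slot
mock[["A|B"]]
mock$`A|C`
```

With three sequences this yields the slots "A|B", "A|C", and "B|C".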
Distances are computed as:
manhattan: d <- sum(abs(x - y))
euclidean: d <- sqrt(sum((x - y)^2))
chi:
  xy <- x + y
  y. <- y / sum(y)
  x. <- x / sum(x)
  d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger: d <- sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
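As a minimal sketch, the four expressions listed above can be wrapped in a single helper and applied to one pair of samples. The function name `paired_distance` is hypothetical and this is not the package's internal code. The expressions are transcribed as documented, including the hellinger line, where the square is applied to the sum of the differences (the textbook Hellinger form squares each difference before summing). The zero replacement is assumed to apply to both vectors.

```r
# Hypothetical helper for illustration; not the package's implementation
paired_distance <- function(x, y, method = "manhattan") {
  # match.arg() throws an error for entries outside the valid set
  method <- match.arg(method, c("manhattan", "euclidean", "chi", "hellinger"))

  # Zeroes are replaced by 0.00001 for "chi" and "hellinger"
  # (assumption: the replacement applies to both x and y)
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      y. <- y / sum(y)
      x. <- x / sum(x)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # Transcribed exactly as documented above
    hellinger = sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
  )
}

# Two made-up samples with the same variables in the same order
x <- c(1, 2, 3)
y <- c(2, 4, 3)

paired_distance(x, y, method = "manhattan")  # 3
paired_distance(x, y, method = "euclidean")  # sqrt(5)
```

The function documented here computes such a distance for each pair of aligned samples, so a helper like this would be applied row by row to two sequences of equal length.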
#loading data
data(climate)

#preparing sequences
#notice the argument paired.samples
climate.prepared <- prepareSequences(
  sequences = climate,
  grouping.column = "sequenceId",
  time.column = "time",
  paired.samples = TRUE
)

#compute pairwise distances between paired samples
climate.prepared.distances <- distancePairedSamples(
  sequences = climate.prepared,
  grouping.column = "sequenceId",
  time.column = "time",
  exclude.columns = NULL,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = FALSE
)