Sort multidimensional data

kd_sort(x, ...)

# S3 method for matrix
kd_sort(x, parallel = TRUE, ...)

# S3 method for arrayvec
kd_sort(x, inplace = FALSE, parallel = TRUE, ...)

# S3 method for data.frame
kd_sort(x, cols = 1:ncol(x), parallel = TRUE, ...)

kd_order(x, ...)

# S3 method for matrix
kd_order(x, parallel = TRUE, ...)

# S3 method for arrayvec
kd_order(x, inplace = FALSE, parallel = TRUE, ...)

# S3 method for data.frame
kd_order(x, cols = 1:ncol(x), parallel = TRUE, ...)

kd_is_sorted(x, ...)

Arguments

x

a matrix, arrayvec object, or data frame

...

ignored

parallel

use multiple threads if true

inplace

sort as a side-effect if true

cols

integer vector of column indices

Details

The algorithm is a divide-and-conquer quicksort variant that recursively partitions a range of tuples using the median of each successive dimension. Ties are resolved by cycling over successive dimensions. The resulting order of tuples matches the order they would have if inserted into a kd-tree.
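The recursive median partitioning described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: kd_order_sketch is a hypothetical helper, it fully orders each range where a real implementation would likely only select the median, and its choice of median position for even-sized ranges is arbitrary.

```r
# Hypothetical helper (not part of the package): returns a permutation of
# row indices that arranges a numeric matrix in a kd-tree-like order.
kd_order_sketch <- function(x, idx = seq_len(nrow(x)), dim = 1L) {
  n <- length(idx)
  if (n <= 1L) return(idx)
  k <- ncol(x)
  # Compare on the active dimension first; break ties by cycling
  # through the remaining dimensions in order
  cyc <- (seq_len(k) + dim - 2L) %% k + 1L
  idx <- idx[do.call(order, lapply(cyc, function(j) x[idx, j]))]
  mid <- (n + 1L) %/% 2L           # position of the median row (assumed convention)
  next_dim <- dim %% k + 1L        # move on to the next dimension
  c(kd_order_sketch(x, idx[seq_len(mid - 1L)], next_dim),      # lower half
    idx[mid],                                                  # median row
    kd_order_sketch(x, idx[mid + seq_len(n - mid)], next_dim)) # upper half
}

set.seed(42)
x <- matrix(runif(20), ncol = 2)
o <- kd_order_sketch(x)   # permutation vector
y <- x[o, ]               # rows rearranged around successive medians
```

In the rearranged matrix the middle row splits the data on the first column, and the middle row of each half splits that half on the second column, and so on.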

kd_order returns a permutation vector that will order the rows of the original matrix, exactly as order does. If inplace is true, then kd_order will also sort the arrayvec object as a side effect. This can be more efficient when many subsequent queries are required.

kd_sort and kd_order have been extended to work directly on a data frame. All vector column types are supported (even lists of objects, as long as equality and comparison operators are defined). Additionally, the user can specify a sequence of column indices that will be used for sorting. These can be a subset of the columns and can be given in any order.
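A minimal usage sketch of the cols argument, following the data.frame signature shown in the usage section. It assumes the functions come from an installed package named kdtools, which this page does not state, so the call is guarded and skipped if that package is unavailable.

```r
# Assumption: kd_sort() is provided by a package named 'kdtools'
if (requireNamespace("kdtools", quietly = TRUE)) {
  z <- data.frame(real = runif(10),
                  int = as.integer(rpois(10, 2)),
                  char = sample(month.name, 10),
                  stringsAsFactors = FALSE)
  # Use only columns 3 (char) and 2 (int) as sort keys, in that order;
  # column 1 (real) is not used as a key
  s <- kdtools::kd_sort(z, cols = c(3, 2))
  print(s)
}
```

Listing the key columns in a different order changes which dimension is partitioned first, so cols = c(2, 3) will generally give a different arrangement than cols = c(3, 2).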

Note

The matrix version will be slower because of data structure conversions.

See also

Examples

z <- data.frame(real = runif(10), lgl = runif(10) > 0.5,
                int = as.integer(rpois(10, 2)), char = sample(month.name, 10),
                stringsAsFactors = FALSE)
kd_sort(z)
#>           real   lgl int      char
#> 8  0.289767245 FALSE   2   January
#> 5  0.007399441 FALSE   5      June
#> 4  0.157208442 FALSE   1 September
#> 1  0.080750138  TRUE   1     April
#> 6  0.466393497 FALSE   3   October
#> 7  0.497777389 FALSE   0       May
#> 3  0.600760886 FALSE   3     March
#> 9  0.732881987 FALSE   3  November
#> 2  0.834333037 FALSE   3  February
#> 10 0.772521511  TRUE   3      July
x <- matrix(runif(200), 100)
y <- kd_sort(x)
kd_is_sorted(y)
#> [1] TRUE
kd_order(x)
#>  [1]  27  44  67  42  58  50  46  53  13  98  92  11  77  38  57   6  59  63
#> [19]  61  26  29  41  81  47   9  18 100  83  30   7  80  16  37  93  95  85
#> [37]  84  20  24  71  31  34   5   4  39  25  14  28  70  54  35  89  22   8
#> [55]  76  36  62  86  23  32  91  65  12   1  40  10   2  96  69  55  33  75
#> [73]  79  88  94  90  97  74  19  43  72  49  73  17  51  60  56  82  15  87
#> [91]  52  66  78  21  68  48  64  99   3  45
plot(y, type = "o", pch = 19, col = "steelblue", asp = 1)