Sort multidimensional data
kd_sort(x, ...)

# S3 method for matrix
kd_sort(x, parallel = TRUE, ...)

# S3 method for arrayvec
kd_sort(x, inplace = FALSE, parallel = TRUE, ...)

# S3 method for data.frame
kd_sort(x, cols = 1:ncol(x), parallel = TRUE, ...)

kd_order(x, ...)

# S3 method for matrix
kd_order(x, parallel = TRUE, ...)

# S3 method for arrayvec
kd_order(x, inplace = FALSE, parallel = TRUE, ...)

# S3 method for data.frame
kd_order(x, cols = 1:ncol(x), parallel = TRUE, ...)

kd_is_sorted(x, ...)
Argument | Description |
---|---|
x | a matrix, arrayvec object, or data frame |
... | ignored |
parallel | use multiple threads if true |
inplace | sort as a side effect if true |
cols | integer vector of column indices |
The algorithm used is a divide-and-conquer quicksort variant that recursively partitions a range of tuples using the median of each successive dimension. Ties are resolved by cycling over successive dimensions. The result is an ordering of tuples matching the order they would have if they were inserted into a kd-tree.
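The recursion described above can be sketched in plain R. This is only an illustration under stated assumptions, not the package's implementation: the name kd_order_sketch is hypothetical, it accepts a numeric matrix only, and it uses a full sort at each level where a quicksort-style selection would normally be used to place the median.

```r
# Hypothetical helper, not part of the package: returns a permutation of row
# indices obtained by recursive median partitioning over cycling dimensions.
kd_order_sketch <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), ncol(x) >= 1L)
  k <- ncol(x)
  rec <- function(idx, d) {
    n <- length(idx)
    if (n <= 1L) return(idx)
    # Compare on dimension d first, then cycle through the remaining
    # dimensions so that ties are broken by successive coordinates.
    dims <- ((d - 1L + 0:(k - 1L)) %% k) + 1L
    keys <- lapply(dims, function(j) x[idx, j])
    idx <- idx[do.call(order, keys)]
    p <- n %/% 2L + 1L            # position of the median element
    nxt <- d %% k + 1L            # next dimension, wrapping around
    left <- idx[seq_len(p - 1L)]
    right <- idx[seq_len(n - p) + p]
    c(rec(left, nxt), idx[p], rec(right, nxt))
  }
  rec(seq_len(nrow(x)), 1L)
}

x <- matrix(c(5, 1, 4, 2, 3,
              1, 5, 2, 4, 3), ncol = 2)
p <- kd_order_sketch(x)
x[p, , drop = FALSE]
```

Each level places the median of the current dimension in the middle of its range and then treats the two halves independently on the next dimension, which is the layout a balanced kd-tree would induce.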
kd_order returns a permutation vector that will order the rows of the original matrix, exactly as order does. If inplace is true, then kd_order will also sort the arrayvec object as a side effect. This can be more efficient when many subsequent queries are required.
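The permutation convention can be illustrated with base R's order, followed by the analogous call to kd_order. This is a minimal sketch: the kdtools namespace and the guard around it are assumptions, and the second half only runs if a package of that name providing these functions is installed.

```r
x <- matrix(c(0.9, 0.1, 0.5,
              0.2, 0.8, 0.4), ncol = 2)

# Base R: order() returns a permutation; indexing rows by it sorts the matrix.
perm <- order(x[, 1])
by_first_column <- x[perm, , drop = FALSE]

# kd_order() is documented to follow the same convention, so indexing by its
# result should give the kd-sorted rows. (Package name is an assumption.)
if (requireNamespace("kdtools", quietly = TRUE)) {
  kd_perm <- kdtools::kd_order(x)
  kd_sorted <- x[kd_perm, , drop = FALSE]
  print(kdtools::kd_is_sorted(kd_sorted))
}
```

Keeping the permutation rather than the sorted copy is useful when other objects aligned with the rows, such as labels or weights, need to be reordered in the same way.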
kd_sort and kd_order have been extended to work directly on a data frame. All vector column types are supported (even lists of objects, as long as equality and comparison operators are defined). Additionally, the user can specify a sequence of column indices that will be used for sorting. These can be a subset of columns and can be given in any order. The matrix version will be slower because of data structure conversions.
z <- data.frame(real = runif(10), lgl = runif(10) > 0.5,
                int = as.integer(rpois(10, 2)), char = sample(month.name, 10),
                stringsAsFactors = FALSE)
kd_sort(z)
#>           real   lgl int      char
#> 8  0.289767245 FALSE   2   January
#> 5  0.007399441 FALSE   5      June
#> 4  0.157208442 FALSE   1 September
#> 1  0.080750138  TRUE   1     April
#> 6  0.466393497 FALSE   3   October
#> 7  0.497777389 FALSE   0       May
#> 3  0.600760886 FALSE   3     March
#> 9  0.732881987 FALSE   3  November
#> 2  0.834333037 FALSE   3  February
#> 10 0.772521511  TRUE   3      July
#> [1] TRUE
kd_order(x)
#>  [1]  27  44  67  42  58  50  46  53  13  98  92  11  77  38  57   6  59  63
#> [19]  61  26  29  41  81  47   9  18 100  83  30   7  80  16  37  93  95  85
#> [37]  84  20  24  71  31  34   5   4  39  25  14  28  70  54  35  89  22   8
#> [55]  76  36  62  86  23  32  91  65  12   1  40  10   2  96  69  55  33  75
#> [73]  79  88  94  90  97  74  19  43  72  49  73  17  51  60  56  82  15  87
#> [91]  52  66  78  21  68  48  64  99   3  45