Uses setkey and setkeyv from data.table to provide functionality similar to group_by in dplyr. This is both convenient and computationally efficient.
group_by_dt(.data, ..., cols = NULL)
group_exe_dt(.data, ...)
Argument | Description |
---|---|
.data | A data frame. |
... | For group_by_dt, the variables to group by. For group_exe_dt, the operation to run on each group. |
cols | A character vector of column names to group by. |
A data.table with keys
group_by_dt and group_exe_dt are a pair of functions designed to be used together. They rely on key setting in data.table, which gives high performance for grouped operations, especially when you operate on the same groups repeatedly.
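To make the key mechanism concrete, here is a minimal sketch that uses data.table directly. It assumes that grouping by key amounts to calling setkeyv and then passing the key to by; it illustrates the idea and is not necessarily how group_by_dt and group_exe_dt are implemented. The object names dt, cars, first_rows and mpg_sums are chosen for illustration only.

```r
library(data.table)

# Work on copies so the built-in data sets stay untouched
dt <- as.data.table(iris)
cars <- as.data.table(mtcars)

# Setting a key sorts the table by those columns and records them
# as an attribute that later operations can reuse
setkeyv(dt, "Species")
setkeyv(cars, c("cyl", "am"))

# Assumption: a "grouped" operation is then an ordinary by-group
# call that uses the stored key as the grouping columns
first_rows <- dt[, head(.SD, 1), by = key(dt)]
mpg_sums <- cars[, .(mpg_sum = sum(mpg)), by = key(cars)]

print(key(dt))
print(first_rows)
print(mpg_sums)
```

Because the key is stored on the table, it can be set once and reused by every later grouped call, which is where the benefit for repeated operations on the same groups comes from.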
# aggregation after grouping using group_exe_dt
as.data.table(iris) -> a
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(head(1))
#> Key: <Species>
#>       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>        <fctr>        <num>       <num>        <num>       <num>
#> 1:     setosa          5.1         3.5          1.4         0.2
#> 2: versicolor          7.0         3.2          4.7         1.4
#> 3:  virginica          6.3         3.3          6.0         2.5
#> Key: <Species>
#>       Species   sum
#>        <fctr> <num>
#> 1:     setosa  14.7
#> 2: versicolor  20.3
#> 3:  virginica  19.2
#> Key: <cyl, am>
#>      cyl    am mpg_sum
#>    <num> <num>   <num>
#> 1:     4     0    68.7
#> 2:     4     1   224.6
#> 3:     6     0    76.5
#> 4:     6     1    61.7
#> 5:     8     0   180.6
#> 6:     8     1    30.8
# equals to
mtcars %>%
  group_by_dt(cols = c("cyl","am")) %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )
#> Key: <cyl, am>
#>      cyl    am mpg_sum
#>    <num> <num>   <num>
#> 1:     4     0    68.7
#> 2:     4     1   224.6
#> 3:     6     0    76.5
#> 4:     6     1    61.7
#> 5:     8     0   180.6
#> 6:     8     1    30.8