This function converts wide data into long format. It allows multiple key-value pairs to be transformed from wide to long format in a single step.
to_long(data, keys, values, ..., labels = NULL, recode.key = FALSE)
Argument | Description |
---|---|
data | A |
keys | Character vector with name(s) of key column(s) to create in output. Either one key value per column group that should be gathered, or a single string. In the latter case, this name will be used as key column, and only one key column is created. See 'Examples'. |
values | Character vector with names of value columns (variable names) to create in output. Must be of same length as number of column groups that should be gathered. See 'Examples'. |
... | Specification of columns that should be gathered. Must be one character vector with variable names per column group, or a numeric vector with column indices indicating those columns that should be gathered. See 'Examples'. |
labels | Character vector of same length as |
recode.key | Logical, if |
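
The column groups passed through `...` can be typed out by hand, but they can also be derived from the column names. The following is a minimal base R sketch, assuming the wide columns share a common prefix; the data frame `wide` and the prefixes `score_` and `speed_` are illustrative and not part of the function's interface.

```r
# Hypothetical wide data; column names follow a "<measure>_t<wave>" pattern
wide <- data.frame(
  age = c(20, 30, 40),
  score_t1 = c(30, 35, 32),
  score_t2 = c(33, 34, 37),
  speed_t1 = c(2, 3, 1),
  speed_t2 = c(3, 4, 5)
)

# One column group as a character vector of variable names
score_cols <- grep("^score_", names(wide), value = TRUE)

# Another column group as a numeric vector of column indices
speed_idx <- grep("^speed_", names(wide))

score_cols
speed_idx
```

Either kind of vector could then be supplied as one of the `...` arguments, one vector per column group.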
This function enhances tidyr's gather function in that you can gather multiple column groups at once. Value and variable labels for non-gathered variables are preserved. However, gathered variables may have different variable label attributes. In this case, gather will drop these attributes. Hence, the newly created variables from gathered columns don't have any variable label attributes. In such cases, use the labels argument to set variable label attributes.
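
Why the label is lost can be illustrated in base R. This is only a sketch of the general mechanism, assuming the variable label is stored in a `"label"` attribute; it does not show how gather or to_long are implemented, and the vectors and label texts are made up for illustration.

```r
# Two wide columns, each carrying its own variable label attribute
score_t1 <- c(30, 35, 32)
attr(score_t1, "label") <- "Score at time 1"
score_t2 <- c(33, 34, 37)
attr(score_t2, "label") <- "Score at time 2"

# Stacking the values into one long column with c() yields a plain vector:
# c() drops all attributes except names, so neither label survives
score <- c(score_t1, score_t2)
is.null(attr(score, "label"))

# A single label for the new long column can be set afterwards
attr(score, "label") <- "Test Score"
attr(score, "label")
```

Since the two source columns carry different labels, there is no obvious label to keep for the stacked column, which is why one is supplied explicitly.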
# create sample
mydat <- data.frame(
  age = c(20, 30, 40),
  sex = c("Female", "Male", "Male"),
  score_t1 = c(30, 35, 32),
  score_t2 = c(33, 34, 37),
  score_t3 = c(36, 35, 38),
  speed_t1 = c(2, 3, 1),
  speed_t2 = c(3, 4, 5),
  speed_t3 = c(1, 8, 6)
)

# check tidyr. score is gathered, however, speed is not
tidyr::gather(mydat, "time", "score", score_t1, score_t2, score_t3)
#>   age    sex speed_t1 speed_t2 speed_t3     time score
#> 1  20 Female        2        3        1 score_t1    30
#> 2  30   Male        3        4        8 score_t1    35
#> 3  40   Male        1        5        6 score_t1    32
#> 4  20 Female        2        3        1 score_t2    33
#> 5  30   Male        3        4        8 score_t2    34
#> 6  40   Male        1        5        6 score_t2    37
#> 7  20 Female        2        3        1 score_t3    36
#> 8  30   Male        3        4        8 score_t3    35
#> 9  40   Male        1        5        6 score_t3    38

# gather multiple columns. both score and speed are gathered.
to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3")
)
#>   age    sex     time score speed
#> 1  20 Female score_t1    30     2
#> 2  30   Male score_t1    35     3
#> 3  40   Male score_t1    32     1
#> 4  20 Female score_t2    33     3
#> 5  30   Male score_t2    34     4
#> 6  40   Male score_t2    37     5
#> 7  20 Female score_t3    36     1
#> 8  30   Male score_t3    35     8
#> 9  40   Male score_t3    38     6

# gather multiple columns, use numeric key-value
to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3"),
  recode.key = TRUE
)
#>   age    sex time score speed
#> 1  20 Female    1    30     2
#> 2  30   Male    1    35     3
#> 3  40   Male    1    32     1
#> 4  20 Female    2    33     3
#> 5  30   Male    2    34     4
#> 6  40   Male    2    37     5
#> 7  20 Female    3    36     1
#> 8  30   Male    3    35     8
#> 9  40   Male    3    38     6

# gather multiple columns by column names and column indices
to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  6:8,
  recode.key = TRUE
)
#>   age    sex time score speed
#> 1  20 Female    1    30     2
#> 2  30   Male    1    35     3
#> 3  40   Male    1    32     1
#> 4  20 Female    2    33     3
#> 5  30   Male    2    34     4
#> 6  40   Male    2    37     5
#> 7  20 Female    3    36     1
#> 8  30   Male    3    35     8
#> 9  40   Male    3    38     6

# gather multiple columns, use separate key-columns
# for each value-vector
to_long(
  data = mydat,
  keys = c("time_score", "time_speed"),
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3")
)
#>   age    sex time_score score time_speed speed
#> 1  20 Female   score_t1    30   speed_t1     2
#> 2  30   Male   score_t1    35   speed_t1     3
#> 3  40   Male   score_t1    32   speed_t1     1
#> 4  20 Female   score_t2    33   speed_t2     3
#> 5  30   Male   score_t2    34   speed_t2     4
#> 6  40   Male   score_t2    37   speed_t2     5
#> 7  20 Female   score_t3    36   speed_t3     1
#> 8  30   Male   score_t3    35   speed_t3     8
#> 9  40   Male   score_t3    38   speed_t3     6

# gather multiple columns, label columns
mydat <- to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3"),
  labels = c("Test Score", "Time needed to finish")
)

library(sjlabelled)
str(mydat$score)
#>  num [1:9] 30 35 32 33 34 37 36 35 38
#>  - attr(*, "label")= chr "Test Score"
get_label(mydat$speed)
#> [1] "Time needed to finish"
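
For comparison, a similar long layout with a numeric time key can be approximated with base R's `reshape()`. This is a hedged cross-check rather than a description of how to_long works internally; the object names `wide` and `long` and the `id` column that `reshape()` adds are specific to this sketch.

```r
# Same sample data as in the examples, under a different object name
wide <- data.frame(
  age = c(20, 30, 40),
  sex = c("Female", "Male", "Male"),
  score_t1 = c(30, 35, 32),
  score_t2 = c(33, 34, 37),
  score_t3 = c(36, 35, 38),
  speed_t1 = c(2, 3, 1),
  speed_t2 = c(3, 4, 5),
  speed_t3 = c(1, 8, 6)
)

# Each element of `varying` is one column group; `v.names` names the
# value column created for that group, `timevar` names the key column
long <- reshape(
  wide,
  direction = "long",
  varying = list(
    c("score_t1", "score_t2", "score_t3"),
    c("speed_t1", "speed_t2", "speed_t3")
  ),
  v.names = c("score", "speed"),
  timevar = "time",
  times = 1:3,
  idvar = "id"
)

# Drop the composite row names and show the columns of interest
rownames(long) <- NULL
long[, c("age", "sex", "time", "score", "speed")]
```

The `score` and `speed` values should line up with the `recode.key = TRUE` example, with `time` running from 1 to 3.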