The following functions quickly compute descriptive statistics by levels of a factor or combination of factors.
cv_by()
For computing coefficient of variation.
max_by()
For computing maximum values.
means_by()
For computing arithmetic means.
min_by()
For computing minimum values.
n_by()
For getting the length.
sd_by()
For computing sample standard deviation.
sem_by()
For computing standard error of the mean.
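As a rough illustration of what the `*_by()` helpers compute, the sketch below reproduces a grouped mean and a grouped coefficient of variation with base R's `aggregate()`. The data frame `toy` and the helper `cv_pct()` are hypothetical and not part of the package, and the percent definition of the coefficient of variation is an assumption. The package functions return a tibble rather than a plain data frame.

```r
# Hypothetical toy data: two genotypes (GEN) in two environments (ENV)
toy <- data.frame(
  GEN = rep(c("H1", "H2"), each = 4),
  ENV = rep(c("A1", "A2"), times = 4),
  PH  = c(2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0)
)

# Coefficient of variation in percent (assumed definition: sd / mean * 100)
cv_pct <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm) * 100
}

# Mean and CV of PH for each GEN x ENV combination (one row per combination)
means_toy <- aggregate(PH ~ GEN + ENV, data = toy, FUN = mean)
cv_toy    <- aggregate(PH ~ GEN + ENV, data = toy, FUN = cv_pct)
```

Each result has one row per GEN x ENV combination, which is the same shape of summary that a call such as `cv_by(data_ge2, GEN, ENV)` produces.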
Useful functions for descriptive statistics. All of them work naturally with %>%, handle grouped data and multiple variables (all numeric variables from .data by default).
av_dev()
computes the average absolute deviation.
ci_mean()
computes the confidence interval for the mean.
cv()
computes the coefficient of variation.
freq_table()
Computes a frequency table. Handles grouped data.
hmean(), gmean()
compute the harmonic and geometric means, respectively. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. The geometric mean is the nth root of the product of n values.
kurt()
computes the kurtosis as used in SAS and SPSS.
range_data()
Computes the range of the values.
row_col_mean(), row_col_sum()
Add a row with the mean/sum of each variable and a column with the mean/sum for each row of the data.
sd_amo(), sd_pop()
Compute the sample and population standard deviation, respectively.
sem()
computes the standard error of the mean.
skew()
computes the skewness as used in SAS and SPSS.
sum_dev()
computes the sum of the absolute deviations.
sum_sq_dev()
computes the sum of the squared deviations.
var_amo(), var_pop()
compute the sample and population variance, respectively.
valid_n()
Returns the number of valid (non-NA) values in the data.
desc_stat
is a wrapper function around the ones above and can be used to quickly compute all these statistics at once.
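The definitions above can be checked by hand with base R. The sketch below is not the package's implementation: the vector `x` is hypothetical, the deviations are assumed to be taken around the arithmetic mean, and the coefficient of variation is assumed to be expressed in percent.

```r
x <- c(1, 2, 4, NA)   # hypothetical sample with one missing value
v <- x[!is.na(x)]     # drop NA, as na.rm = TRUE would
n <- length(v)        # valid (non-NA) length

h_mean  <- 1 / mean(1 / v)          # harmonic mean: reciprocal of the mean reciprocal
g_mean  <- prod(v)^(1 / n)          # geometric mean: nth root of the product
avg_dev <- mean(abs(v - mean(v)))   # average absolute deviation
abs_dev <- sum(abs(v - mean(v)))    # sum of the absolute deviations
ss_dev  <- sum((v - mean(v))^2)     # sum of the squared deviations
var_s   <- ss_dev / (n - 1)         # sample variance
var_p   <- ss_dev / n               # population variance
sd_s    <- sqrt(var_s)              # sample standard deviation
sd_p    <- sqrt(var_p)              # population standard deviation
se_mean <- sd_s / sqrt(n)           # standard error of the mean
cv_pct  <- sd_s / mean(v) * 100     # coefficient of variation (assumed in percent)
```

The sample and population versions differ only in the divisor (`n - 1` versus `n`), so the population variance is always the smaller of the two for the same data.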
av_dev(.data, ..., na.rm = FALSE)
ci_mean(.data, ..., na.rm = FALSE, level = 0.95)
cv(.data, ..., na.rm = FALSE)
freq_table(.data, ...)
hmean(.data, ..., na.rm = FALSE)
gmean(.data, ..., na.rm = FALSE)
kurt(.data, ..., na.rm = FALSE)
pseudo_sigma(.data, ..., na.rm = FALSE)
range_data(.data, ..., na.rm = FALSE)
row_col_mean(.data, na.rm = FALSE)
row_col_sum(.data, na.rm = FALSE)
sd_amo(.data, ..., na.rm = FALSE)
sd_pop(.data, ..., na.rm = FALSE)
sem(.data, ..., na.rm = FALSE)
skew(.data, ..., na.rm = FALSE)
sum_dev(.data, ..., na.rm = FALSE)
sum_sq_dev(.data, ..., na.rm = FALSE)
var_pop(.data, ..., na.rm = FALSE)
var_amo(.data, ..., na.rm = FALSE)
valid_n(.data, ..., na.rm = FALSE)
cv_by(.data, ..., na.rm = FALSE)
max_by(.data, ..., na.rm = FALSE)
means_by(.data, ..., na.rm = FALSE)
min_by(.data, ..., na.rm = FALSE)
n_by(.data, ..., na.rm = FALSE)
sd_by(.data, ..., na.rm = FALSE)
sem_by(.data, ..., na.rm = FALSE)
sum_by(.data, ..., na.rm = FALSE)
Argument | Description |
---|---|
.data | A data frame or a numeric vector. |
... | The argument depends on the function used. |
na.rm | A logical value indicating whether NA values should be removed before the computation. Defaults to FALSE. |
level | The confidence level for the confidence interval of the mean. Defaults to 0.95. |
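To make the role of `level` concrete, the sketch below computes a textbook t-based confidence interval for a mean in base R. Whether `ci_mean()` uses exactly this formula, or reports the half-width rather than the bounds, is an assumption here; the vector `x` is hypothetical.

```r
x     <- c(10, 12, 14, 16)   # hypothetical sample
level <- 0.95                # confidence level, as in the `level` argument
n     <- length(x)

se_mean    <- sd(x) / sqrt(n)                        # standard error of the mean
t_crit     <- qt(1 - (1 - level) / 2, df = n - 1)    # two-sided t quantile
half_width <- t_crit * se_mean                       # half-width of the interval
ci_bounds  <- c(mean(x) - half_width, mean(x) + half_width)
```

Raising `level` (for example to 0.99) increases `t_crit` and therefore widens the interval.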
Functions *_by() return a tbl_df with the computed statistics for each level of the factor(s) declared in ...
All other functions return a named integer if the input is a data frame or a numeric value if the input is a numeric vector.
Tiago Olivoto tiagoolivoto@gmail.com
#> # A tibble: 52 x 17
#>    GEN   ENV      PH    EH    EP    EL    ED    CL    CD    CW    KW    NR   NKR
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 H1    A1     2.72  1.68 0.626  15.4  51.1  28.1  15.7  23.5  203.  16.3  33.3
#>  2 H1    A2     2.93  1.80 0.612  15.0  51.9  31.5  15.5  30.2  188.  17.1  31.4
#>  3 H1    A3     2.20  1.10 0.497  14.8  50.6  30.9  15.8  26.8  157.  15.9  28.4
#>  4 H1    A4     2.64  1.44 0.547  15.2  51.2  29.8  15.8  26.4  187.  17.2  35.7
#>  5 H10   A1     2.78  1.62 0.584  16.1  53.2  31.4  16.8  24.6  192.  16.7  31.2
#>  6 H10   A2     2.05 0.987 0.494  15.5  46.7  26.8  16.3  26.3  160.    14  33.5
#>  7 H10   A3     2.04  1.01 0.503  14.0  43.9  24.8  15.2  12.3  121.  15.3  33.3
#>  8 H10   A4     2.39  1.43 0.600  14.9  50.0  30.7  15.3  28.0  183.  16.4  31.6
#>  9 H11   A1     2.75  1.58 0.574  16.6  48.9  29.0  17.2  23.6  188.  15.2  34.6
#> 10 H11   A2     2.15  1.02 0.475  15.1  47.3  27.2  15.7  24.3  164.  13.7    35
#> # ... with 42 more rows, and 4 more variables: CDED <dbl>, PERK <dbl>,
#> #   TKW <dbl>, NKE <dbl>

# Coefficient of variation for all numeric variables
# by GEN and ENV
cv_by(data_ge2, GEN, ENV)
#> # A tibble: 52 x 17
#>    GEN   ENV      PH    EH    EP    EL    ED    CL    CD    CW    KW    NR
#>    <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 H1    A1     4.93  5.36  5.34  7.15  1.98  1.48  6.91  7.93  8.31  5.12
#>  2 H1    A2     3.46  5.98  2.71  2.04  1.92  1.92  1.98  3.75  5.44  3.58
#>  3 H1    A3     4.14  4.98  1.01  7.00  3.58  2.25  6.43  14.6  15.2  2.91
#>  4 H1    A4     8.66  9.84  1.89  9.88  3.72  5.51  11.4  19.1  17.9  4.65
#>  5 H10   A1     1.97  6.11  5.33  6.46  1.43  2.72  6.78  13.6  7.99  6.04
#>  6 H10   A2     5.31  6.83  12.8  5.29 0.800  1.38  4.89  8.81  9.34  7.56
#>  7 H10   A3     11.9  11.3  2.24  3.23 0.436  5.56  5.40  5.80  5.23  3.98
#>  8 H10   A4     4.38  3.53  8.82  3.11  3.42  3.89  3.09  5.02  6.07  2.44
#>  9 H11   A1    0.988  5.01  4.44  4.94  4.95  5.27  4.72  12.3  13.9  10.5
#> 10 H11   A2     1.43  3.00  4.00  4.84 0.762  2.48  5.02  15.8  5.37  3.36
#> # ... with 42 more rows, and 5 more variables: NKR <dbl>, CDED <dbl>,
#> #   PERK <dbl>, TKW <dbl>, NKE <dbl>

#> [1] 0.1977769

# Confidence interval 0.95 for the mean
# All numeric variables
# Grouped by levels of ENV
data_ge2 %>% group_by(ENV) %>% ci_mean()
#> # A tibble: 4 x 16
#>   ENV       PH     EH     EP    EL    ED    CL    CD    CW    KW    NR   NKR
#>   <fct>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A1    0.0401 0.0415 0.0142 0.382 0.598 0.725 0.352  1.83  6.31 0.638 0.948
#> 2 A2     0.129  0.113 0.0206 0.442 0.882 0.819 0.436  2.24  11.9 0.467  1.06
#> 3 A3    0.0724 0.0625 0.0153 0.326 0.897 0.781 0.307  1.66  7.03 0.498  1.01
#> 4 A4    0.0539 0.0483 0.0128 0.423 0.700 0.579 0.394  1.48  8.70 0.433  1.20
#> # ... with 4 more variables: CDED <dbl>, PERK <dbl>, TKW <dbl>, NKE <dbl>

# standard error of the mean
# Variable PH and EH
sem(data_ge2, PH, EH)
#> # A tibble: 1 x 2
#>       PH     EH
#>    <dbl>  <dbl>
#> 1 0.0267 0.0228

# Frequency table for variable NR
data_ge2 %>% freq_table(NR)
#> # A tibble: 20 x 4
#>       NR     n rel_freq cum_freq
#>    <dbl> <int>    <dbl>    <dbl>
#>  1  12.4     1  0.00641  0.00641
#>  2  13.2     3   0.0192   0.0256
#>  3  13.6     7   0.0449   0.0705
#>  4    14    10   0.0641    0.135
#>  5  14.4     8   0.0513    0.186
#>  6  14.8    11   0.0705    0.256
#>  7  15.2    12   0.0769    0.333
#>  8  15.6    17    0.109    0.442
#>  9    16    14   0.0897    0.532
#> 10  16.4    16    0.103    0.635
#> 11  16.8    11   0.0705    0.705
#> 12  17.2    13   0.0833    0.788
#> 13  17.6    12   0.0769    0.865
#> 14    18     9   0.0577    0.923
#> 15  18.4     3   0.0192    0.942
#> 16  18.8     2   0.0128    0.955
#> 17  19.6     1  0.00641    0.962
#> 18    20     3   0.0192    0.981
#> 19  20.4     2   0.0128    0.994
#> 20  21.2     1  0.00641        1