23 V
23.1 value
A single number or piece of data.
In a tidy dataset, each cell contains only one value.
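In the untidy table below, each cell in the age column contains 3 values: mean, minimum and maximum. The tidy version that follows it gives each value its own column.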
library(dplyr)
library(tidyr)
untidy <- data.frame(
  group = c("A", "B"),
  age = c("20.4 [18-25]", "19.9 [18-24]")
)
group | age |
---|---|
A | 20.4 [18-25] |
B | 19.9 [18-24] |
group | mean | min | max |
---|---|---|---|
A | 20.4 | 18 | 25 |
B | 19.9 | 18 | 24 |
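One way to get from the untidy table to the tidy one is to split the age column with a regular expression. The sketch below uses tidyr::extract(). The source does not show how its tidy table was produced, so both the choice of function and the regular expression are assumptions, and the pattern only works if every cell looks like "mean [min-max]".

```r
library(tidyr)

untidy <- data.frame(
  group = c("A", "B"),
  age = c("20.4 [18-25]", "19.9 [18-24]")
)

# Capture the mean, minimum and maximum as three separate groups.
# Assumption: every cell has the form "mean [min-max]".
tidy <- tidyr::extract(
  untidy,
  col = age,
  into = c("mean", "min", "max"),
  regex = "([0-9.]+) \\[([0-9]+)-([0-9]+)\\]",
  convert = TRUE  # turn the captured strings into numbers
)

tidy
```

Each cell of the result holds exactly one value, which is what makes the table tidy.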
23.2 variable
A word that identifies and stores the value of some data for later use.
Variables in R are usually referred to as objects. See the definition for object.
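As a minimal illustration, the sketch below stores a value under a name and then reuses it. The name age is hypothetical and chosen only for this example.

```r
# `age` is a hypothetical variable name used for illustration
age <- 25   # store the value 25 under the name age
age + 1     # use the stored value later
#> [1] 26
```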
23.3 variance
A descriptive statistic for how spread out data points are around their mean.
Variance is equal to standard deviation squared.
sd(data)^2
#> [1] 7.5
You calculate variance by summing the squared differences between each data point and the mean (sum(diff^2)) and dividing this by the number of data points minus 1 (n-1).
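The calculation can be sketched step by step and checked against the built-in var() and sd() functions. The vector used here is an assumption, because the source does not show the data behind its own output.

```r
# Hypothetical data; the source does not show its own vector
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

n <- length(data)           # number of data points
diff <- data - mean(data)   # difference between each point and the mean
variance <- sum(diff^2) / (n - 1)

variance
#> [1] 7.5

# Both built-in routes give the same answer
all.equal(variance, var(data))
all.equal(variance, sd(data)^2)
```

Dividing by n - 1 rather than n gives the sample variance, which is what var() returns.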
23.4 vector
A type of data structure that collects values with the same data type, like logical (TRUE/FALSE) values, numbers, or strings.
The following things are examples of vectors:
# use the c() function to make a vector of strings or numbers
ingredients <- c("vodka", "gin", "rum", "tequila", "triple sec",
                 "orange juice", "coke", "sour mix")
fun_to_play_at <- c(25, 13, 3, 1)
# the colon between two integers gives you all the numbers from the first to the last integer
likert <- 1:7
Elements are always the same data type. If you try to combine values with different data types, they are coerced into a common data type. Use a list to combine values with different types without coercion.
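A short sketch of coercion, using made-up values: combining a number, a string and a logical value with c() turns everything into strings, while list() keeps each element's own type.

```r
# Mixing types in c() coerces everything to the most flexible type
mixed <- c(1, "a", TRUE)
mixed
#> [1] "1"    "a"    "TRUE"
class(mixed)
#> [1] "character"

# Logical values combined with numbers become 1 (TRUE) and 0 (FALSE)
c(1, TRUE, FALSE)
#> [1] 1 1 0

# A list keeps each element's original type
kept <- list(1, "a", TRUE)
sapply(kept, class)
#> [1] "numeric"   "character" "logical"
```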
The variable letters is a built-in vector of the lowercase Latin letters in alphabetical order. You can select part of a vector by putting the numeric location (or index) of the element you want inside square brackets after the vector name. You can also put a vector of numbers inside the square brackets to select several elements at once.
letters[26]
#> [1] "z"
letters[1:5]
#> [1] "a" "b" "c" "d" "e"
letters[fun_to_play_at]
#> [1] "y" "m" "c" "a"
See Ch 20 of R for Data Science for a thorough explanation of vectors.
23.6 version control
A way to save a record of changes to your files.
Git is the version control system most commonly used with RStudio. GitHub is a cloud-based service for storing and sharing your version-controlled files.
Set up Git and GitHub with RStudio.