5 D
5.1 data frame
A container data type for storing tabular data.
data.frame(
id = 1:5,
name = c("Lisa", "Phil", "Helena", "Rachel", "Jack")
)
id | name |
---|---|
1 | Lisa |
2 | Phil |
3 | Helena |
4 | Rachel |
5 | Jack |
5.2 data type
The kind of data represented by an object.
-
integer (whole numbers like
1L
,-10L
,3000L
) -
double (numbers like
-0.223
,10.324
,1e4
) - character (letters or words like "I love R")
-
logical (
TRUE
orFALSE
) - complex (numbers with real and imaginary parts like
2i
)
Integers and doubles are both numeric.
If you want to know what data type a variable is, use the function typeof
.
typeof(10)
#> [1] "double"
typeof("10")
#> [1] "character"
typeof(10L)
#> [1] "integer"
typeof(10 == 10)
#> [1] "logical"
typeof(10i)
#> [1] "complex"
5.4 default value
A value that a function uses for an argument if it is skipped.
For example, mean()
has a default value of FALSE
for the argument na.rm
(don't ignore NA values).
So if you leave that argument out, it's the same as setting it to FALSE
.
mean(x, na.rm = FALSE)
#> [1] NA
And you have to explicitly set it if you want it to be different.
mean(x, na.rm = TRUE)
#> [1] 2
If an argument does not have a default value, you can't omit it. In the example below, there is no default value for n
.
x = rnorm()
#> Error in rnorm(): argument "n" is missing, with no default
5.5 degrees of freedom
The number of observations that are free to vary to produce a known outcome.
If you run 5 people and ask them their age, and you know the mean age of those 5 people is 20.1, then four of those people can have any age, but the 5th person must have a specific age to maintain the mean of 20.1. The mean age here is the known outcome and four people's age can vary freely, so the degrees of freedom is 4.
You need to know the degrees of freedom (usually abbreviated df) to interpret test statistics like t-values and F-values.

Figure 5.1: How p-values vary depending on the degrees of freedom for specific t-values in a t-test.
5.6 dependent variable
The target variable that is being analyzed, whose value is assumed to depend on other variables.
You are generally interested in how other variables you have measured impact the dependent variable. For example, if you are testing the effect of stress on anxiety, stress would be your independent variable and anxiety would be the dependent variable. The terms independent/dependent variable are typically used in an experimental context. In the context of regression, the dependent variable is often referred to as the response variable.
5.7 descriptive
Statistics that describe an aspect of data (e.g., mean, median, mode, variance, range)
Contrast with inferential statistics.
5.8 deviation score
A score minus the mean
score | mean | deviation_score |
---|---|---|
8 | 10 | -2 |
9 | 10 | -1 |
10 | 10 | 0 |
11 | 10 | 1 |
12 | 10 | 2 |
5.9 directory
A collection or "folder" of files on a computer.
In a file path, each directory is separated by forward slashes, e.g., "dir1/dir2/file.Rmd".
Sometimes you need to check if a directory exists and/or make a new directory. For example, a script may save images in a directory called "images" that is in the same directory as the script (so you can use the relative path). Below is code for checking whether that directory exists and making it if it doesn't.
path <- "images"
if (!dir.exists(path)) {
dir.create(path)
}
5.10 discrete
Data that can only take certain values, such as integers.
Discrete data are not continuous, so it doesn't make sense to have a value that is partway between two values. For example, the number of texts you send per day is discrete; you can't send 12.5 texts. Discrete data can be ordinal or nominal.
5.12 double brackets
Two pairs of square brackets used to select a single item from a container like a list, data frame, or matrix (e.g., data[[1]]).
data <- data.frame(
id = 1:3,
x = c(1.4, 2.3, 8.2)
)
# double brackets return the id column as a vector
data[["id"]]
#> [1] 1 2 3
See brackets for an explanation of the difference between single and double brackets.
5.13 double
A data type representing a real decimal number.
Examples of doubles are 1
, 1.0
, -0.01
, or 1e4
.
5.14 dynamic
Something that can change in response to user actions
For example, in a shiny app, the following code creates dynamic and static elements.
dynamic <- shiny::textOutput("This text can change")
static <- shiny::p("This text cannot change")