The rnorm_multi() function makes multiple normally distributed vectors with specified parameters and relationships.

Quick example

The following creates a sample of 100 observations of 3 variables, drawn from a population where A has a mean of 0 and an SD of 1, while B and C each have a mean of 20 and an SD of 5. A correlates with both B and C at r = 0.5, and B and C correlate with each other at r = 0.25.


dat <- rnorm_multi(n = 100, 
                  mu = c(0, 20, 20),
                  sd = c(1, 5, 5),
                  r = c(0.5, 0.5, 0.25), 
                  varnames = c("A", "B", "C"),
                  empirical = FALSE)

Sample stats

n    var     A     B     C   mean    sd
100  A    1.00  0.45  0.49   0.03  0.99
100  B    0.45  1.00  0.33  20.01  4.89
100  C    0.49  0.33  1.00  19.76  4.02
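
Because empirical = FALSE, the sample statistics differ from run to run and only approximate the population values. A minimal sketch for computing them with base R, assuming the faux package (which provides rnorm_multi()) is installed; sample_cor, sample_mean and sample_sd are illustrative names:

```r
library(faux)  # assumed to be installed; provides rnorm_multi()

dat <- rnorm_multi(n = 100,
                   mu = c(0, 20, 20),
                   sd = c(1, 5, 5),
                   r = c(0.5, 0.5, 0.25),
                   varnames = c("A", "B", "C"),
                   empirical = FALSE)

# Correlation matrix, column means and column SDs of the sample
sample_cor  <- round(cor(dat), 2)
sample_mean <- round(colMeans(dat), 2)
sample_sd   <- round(sapply(dat, sd), 2)

sample_cor
sample_mean
sample_sd
```

The printed values will not match the table above exactly, since each call draws a new random sample.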

Specify correlations

You can specify the correlations in one of four ways:

  • A single r for all pairs
  • A vars by vars matrix
  • A vars*vars length vector
  • A vars*(vars-1)/2 length vector

One Number

If you want all the pairs to have the same correlation, just specify a single number.

bvn <- rnorm_multi(100, 5, 0, 1, .3, varnames = letters[1:5])

Sample stats from a single rho

n    var     a     b     c     d     e   mean    sd
100  a    1.00  0.35  0.22  0.45  0.37  -0.04  1.09
100  b    0.35  1.00  0.19  0.36  0.28  -0.05  0.83
100  c    0.22  0.19  1.00  0.26  0.20   0.01  1.08
100  d    0.45  0.36  0.26  1.00  0.24   0.00  1.00
100  e    0.37  0.28  0.20  0.24  1.00   0.04  0.97

Matrix

If you already have a correlation matrix, such as the output of cor(), you can pass it directly as r to simulate data with that correlation structure.

cmat <- cor(iris[,1:4])
bvn <- rnorm_multi(100, 4, 0, 1, cmat, 
                  varnames = colnames(cmat))

Sample stats from a correlation matrix

n    var           Sepal.Length  Sepal.Width  Petal.Length  Petal.Width   mean    sd
100  Sepal.Length          1.00        -0.10          0.88         0.83  -0.01  1.05
100  Sepal.Width          -0.10         1.00         -0.38        -0.29  -0.19  1.09
100  Petal.Length          0.88        -0.38          1.00         0.96  -0.01  1.02
100  Petal.Width           0.83        -0.29          0.96         1.00  -0.05  0.98

Vector (vars*vars)

You can specify your correlation matrix by hand as a vars*vars length vector, which will include the correlations of 1 down the diagonal.

cmat <- c(1, .3, .5,
          .3, 1, 0,
          .5, 0, 1)
bvn <- rnorm_multi(100, 3, 0, 1, cmat, 
                  varnames = c("first", "second", "third"))

Sample stats from a vars*vars vector

n    var     first  second  third   mean    sd
100  first    1.00    0.33   0.45  -0.12  1.01
100  second   0.33    1.00  -0.04  -0.01  1.04
100  third    0.45   -0.04   1.00  -0.11  1.00
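
A hand-written vector is easy to get wrong, so it can help to confirm that it describes a valid correlation matrix before simulating. The following base R sketch is not part of rnorm_multi(); the names m, min_eigen and is_valid are illustrative. It reshapes the vector from the example above and checks that the result is symmetric, has 1s on the diagonal, and is positive definite.

```r
cmat <- c(1, .3, .5,
          .3, 1, 0,
          .5, 0, 1)

# Reshape the vars*vars vector into a 3 x 3 matrix
m <- matrix(cmat, nrow = 3)

# Smallest eigenvalue; it must be positive for a positive definite matrix
min_eigen <- min(eigen(m, symmetric = TRUE, only.values = TRUE)$values)

# A usable correlation matrix passes all three checks
is_valid <- isSymmetric(m) && all(diag(m) == 1) && min_eigen > 0
is_valid
```

For this example vector all three checks pass. A set of pairwise correlations that cannot occur together would fail the eigenvalue check.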

Vector (vars*(vars-1)/2)

You can specify your correlation matrix by hand as a vars*(vars-1)/2 length vector, skipping the diagonal and lower left duplicate values.
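
A minimal sketch of this form for four variables, assuming the faux package is installed. The correlation values and the rho names are illustrative assumptions. The ordering (upper triangle read row by row: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4) is inferred from the Quick example, where r = c(0.5, 0.5, 0.25) gives the A-B, A-C and B-C correlations.

```r
library(faux)  # assumed to be installed; provides rnorm_multi()

# Illustrative values: one correlation per pair of variables
rho1_2 <- .3
rho1_3 <- .5
rho1_4 <- .5
rho2_3 <- .2
rho2_4 <- 0
rho3_4 <- -.3

# 4 variables need 4 * (4 - 1) / 2 = 6 correlations
cmat <- c(rho1_2, rho1_3, rho1_4, rho2_3, rho2_4, rho3_4)

bvn <- rnorm_multi(100, 4, 0, 1, cmat,
                   varnames = letters[1:4])
```

Naming each value after the pair it describes makes it harder to put a correlation in the wrong position.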

Sample stats from a (vars*(vars-1)/2) vector

n    var     a     b      c      d   mean    sd
100  a    1.00  0.35   0.55   0.50  -0.13  1.01
100  b    0.35  1.00   0.16   0.09  -0.10  1.05
100  c    0.55  0.16   1.00  -0.21  -0.19  0.91
100  d    0.50  0.09  -0.21   1.00   0.12  0.97

empirical

If you want your samples to have the exact correlations, means, and SDs you entered, set empirical to TRUE.
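
A minimal sketch of such a call, assuming the faux package is installed. It reuses the single-correlation example (five variables, mean 0, SD 1, r = 0.3) and adds empirical = TRUE; the checks at the end use base R.

```r
library(faux)  # assumed to be installed; provides rnorm_multi()

bvn <- rnorm_multi(100, 5, 0, 1, .3,
                   varnames = letters[1:5],
                   empirical = TRUE)

# The sample statistics should now equal the requested values
round(cor(bvn), 2)
round(colMeans(bvn), 2)
round(sapply(bvn, sd), 2)
```

The individual observations are still random; only the summary statistics of the sample are fixed.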

Sample stats with empirical = TRUE

n    var    a    b    c    d    e  mean  sd
100  a    1.0  0.3  0.3  0.3  0.3     0   1
100  b    0.3  1.0  0.3  0.3  0.3     0   1
100  c    0.3  0.3  1.0  0.3  0.3     0   1
100  d    0.3  0.3  0.3  1.0  0.3     0   1
100  e    0.3  0.3  0.3  0.3  1.0     0   1

Pre-existing variable

Use rnorm_pre() to create a vector with a specified correlation to a pre-existing variable. The following code creates two vectors, sl.5.v1 and sl.5.v2, each with a mean of 10, an SD of 2 and a correlation of r = 0.5 to the Sepal.Length column in the built-in dataset iris.

sl <- iris$Sepal.Length

sl.5.v1 <- rnorm_pre(sl, mu = 10, sd = 2, r = 0.5)
sl.5.v2 <- rnorm_pre(sl, mu = 10, sd = 2, r = 0.5)

rnorm_pre

n    var        sl  sl.5.v1  sl.5.v2   mean    sd
150  sl       1.00     0.47     0.52   5.84  0.83
150  sl.5.v1  0.47     1.00     0.21  10.28  2.04
150  sl.5.v2  0.52     0.21     1.00  10.08  2.14

Set empirical = TRUE to return a vector with the exact specified parameters.

sl.5.v1 <- rnorm_pre(sl, mu = 10, sd = 2, r = 0.5, empirical = TRUE)
sl.5.v2 <- rnorm_pre(sl, mu = 10, sd = 2, r = 0.5, empirical = TRUE)

rnorm_pre with empirical = TRUE

n    var       sl  sl.5.v1  sl.5.v2   mean    sd
150  sl       1.0     0.50     0.50   5.84  0.83
150  sl.5.v1  0.5     1.00     0.33  10.00  2.00
150  sl.5.v2  0.5     0.33     1.00  10.00  2.00
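
The realised statistics of a new vector can be compared against the requested ones with base R. A minimal sketch, assuming the faux package is installed; new_vec and check are illustrative names:

```r
library(faux)  # assumed to be installed; provides rnorm_pre()

sl <- iris$Sepal.Length
new_vec <- rnorm_pre(sl, mu = 10, sd = 2, r = 0.5, empirical = TRUE)

# Correlation with the existing variable, plus mean and SD of the new vector
check <- c(r = cor(sl, new_vec), mean = mean(new_vec), sd = sd(new_vec))
round(check, 2)
```

Each new vector is tied only to the existing variable. As the tables above show, the correlation between two vectors generated this way (sl.5.v1 and sl.5.v2) is not controlled and differs between runs.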