Most functions for fitting multilevel and mixed effects models only
allow frequency weights to be specified, but not design (i.e. sampling or
probability) weights, which should be used when analyzing complex samples
and survey data.
scale_weights() implements an algorithm proposed by Asparouhov (2006)
and Carle (2009) to rescale design weights in survey data to account for
the grouping structure of multilevel models. The rescaled weights can then
be used for multilevel modelling.
scale_weights(x, cluster.id, pweight)
Argument | Description |
---|---|
x | A data frame. |
cluster.id | Variable indicating the grouping structure (strata) of the survey data (level-2-cluster variable). |
pweight | Variable indicating the probability (design or sampling) weights of the survey data (level-1-weight). |
Returns x, with two new variables: svywght_a and svywght_b, which
represent the rescaled design weights to use in multilevel models.
Rescaling is based on two methods: For svywght_a, the sample weights
pweight are adjusted by a factor that represents the ratio of the cluster
size to the sum of sampling weights within each cluster. The adjustment
factor for svywght_b is the sum of sample weights within each cluster
divided by the sum of squared sample weights within each cluster
(see Carle (2009), Appendix B).
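The two adjustment factors can be illustrated by hand. The following is a minimal base R sketch of the arithmetic described above, not the package's actual implementation; the toy data frame d and its columns cluster and pw are made up for illustration.

```r
# Hypothetical toy data: two clusters with made-up design weights
d <- data.frame(
  cluster = c(1, 1, 1, 2, 2),
  pw      = c(10, 20, 30, 5, 15)
)

# Per-cluster quantities, repeated for every row of the cluster
n_j    <- ave(d$pw, d$cluster, FUN = length)  # cluster size
sum_w  <- ave(d$pw, d$cluster, FUN = sum)     # sum of weights
sum_w2 <- ave(d$pw^2, d$cluster, FUN = sum)   # sum of squared weights

# Method A: factor = cluster size / sum of weights in the cluster
d$w_a <- d$pw * n_j / sum_w

# Method B: factor = sum of weights / sum of squared weights in the cluster
d$w_b <- d$pw * sum_w / sum_w2

d
```

With method A, the rescaled weights sum to the cluster size within each cluster (3 and 2 in this toy example), which is a quick sanity check when reproducing the calculation.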
Regarding the choice between scaling methods A and B, Carle suggests
that "analysts who wish to discuss point estimates should report results
based on weighting method A. For analysts more interested in residual
between-cluster variance, method B may generally provide the least biased
estimates". In general, it is recommended to fit a non-weighted model
and weighted models with both scaling methods, and then to compare the
models to see whether the "inferential decisions converge", to gain
confidence in the results.
Although the bias of the scaled weights decreases as cluster size
increases, method A is preferred when small cluster sizes are a concern.
The cluster ID and possibly the PSU may be used as random effects (e.g.
in a nested design, or with cluster and PSU as varying intercepts),
depending on the survey design that should be mimicked.
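The recommended comparison of an unweighted model with models using both scaling methods could look roughly like the sketch below. It assumes the lme4 package is installed and uses simulated data; the data frame d, its columns, and the model formula are hypothetical, and the weights are rescaled by hand here rather than with scale_weights(). Note that lme4 treats the weights argument as prior (precision) weights, so this only approximates a design-based analysis.

```r
library(lme4)

set.seed(123)

# Hypothetical toy survey: 20 clusters with 30 respondents each
d <- data.frame(cluster = rep(1:20, each = 30))
d$x  <- rnorm(nrow(d))
d$pw <- runif(nrow(d), 500, 5000)  # made-up design weights
d$y  <- 1 + 0.5 * d$x + rnorm(20)[d$cluster] + rnorm(nrow(d))

# Rescale the design weights within clusters (methods A and B)
sum_w <- ave(d$pw, d$cluster, FUN = sum)
d$w_a <- d$pw * ave(d$pw, d$cluster, FUN = length) / sum_w
d$w_b <- d$pw * sum_w / ave(d$pw^2, d$cluster, FUN = sum)

# Cluster ID as varying intercept: unweighted, method A, method B
m_0 <- lmer(y ~ x + (1 | cluster), data = d)
m_a <- lmer(y ~ x + (1 | cluster), data = d, weights = w_a)
m_b <- lmer(y ~ x + (1 | cluster), data = d, weights = w_b)

# Fixed effects side by side, to see whether conclusions converge
fixed <- sapply(list(unweighted = m_0, method_a = m_a, method_b = m_b), fixef)
fixed
```

If the three columns of fixed lead to the same substantive conclusions, that supports the results; the random-effect variances can be compared in the same way with VarCorr().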
Carle AC. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 2009, 9(49): 1–13
Asparouhov T. General Multi-Level Modeling with Sampling Weights. Communications in Statistics - Theory and Methods 2006, 35: 439–460
data(nhanes_sample)
scale_weights(nhanes_sample, SDMVSTRA, WTINT2YR)
#> # A tibble: 2,992 x 9
#>    total   age RIAGENDR RIDRETH1 SDMVPSU SDMVSTRA WTINT2YR svywght_a svywght_b
#>    <dbl> <dbl>    <dbl>    <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>
#>  1     1  2.2         1        3       2       31   97594.     1.57      1.20
#>  2     7  2.08        2        3       1       29   39599.     0.623     0.525
#>  3     3  1.48        2        1       2       42   26620.     0.898     0.544
#>  4     4  1.32        2        4       2       33   34999.     0.708     0.550
#>  5     1  2           2        1       1       41   14746.     0.422     0.312
#>  6     6  2.2         2        4       1       38   28232.     0.688     0.516
#>  7   350  1.6         1        3       2       33   93162.     1.89      1.46
#>  8    NA  1.48        2        3       1       29   82276.     1.29      1.09
#>  9     3  2.28        2        4       1       41   24726.     0.707     0.523
#> 10    30  0.84        1        3       2       35   39895.     0.760     0.594
#> # ... with 2,982 more rows