Many MORIE analyses use Statistics Canada Public Use Microdata Files (CCS, CSADS, CSUS, CADS, CPADS), which require complex-sample design weights for valid inference. This vignette covers the sampling and weighting helpers that wrap the same machinery.
library(rmorie)
set.seed(11)
n <- 1000
df <- data.frame(
  id = seq_len(n),
  region = sample(c("ON", "QC", "BC", "AB"), n, replace = TRUE),
  age = round(rnorm(n, 40, 12)),
  weight = runif(n, 0.5, 2.0)
)
# Stratified random sample within region.
strat <- morie_stratified_sample(df, strata_col = "region", n_per_stratum = 50)
# Single-stage cluster sample on region (all units in sampled
# clusters are kept).
clust <- morie_cluster_sample(df, cluster_col = "region", n_clusters = 2)
dim(strat)
#> [1] 200 5
dim(clust)
#> [1] 505 5

morie_compute_design_weights() and morie_calibration_weights() produce weights consistent with a known sampling scheme and known population marginals.
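
To make "weights consistent with a known sampling scheme" concrete, here is a minimal base-R sketch of the usual stratified case, where each sampled unit gets weight N_h / n_h (stratum size in the frame divided by the number sampled from that stratum). It does not use rmorie and is not a description of how the package computes its weights; the names `frame`, `smp` and `design_weight` are illustrative assumptions, and the simulated frame stands in for a real population.

```r
# Base-R sketch (assumption: NOT the rmorie implementation).
# Stratified sample, then design weights w = N_h / n_h.
set.seed(11)
frame <- data.frame(
  id = seq_len(1000),
  region = sample(c("ON", "QC", "BC", "AB"), 1000, replace = TRUE)
)

# Stratum sizes in the frame (N_h); the frame plays the role of the population.
N_h <- table(frame$region)

# Draw 50 rows without replacement inside each stratum.
n_per_stratum <- 50
rows_by_stratum <- split(seq_len(nrow(frame)), frame$region)
picked <- unlist(lapply(rows_by_stratum, sample, size = n_per_stratum))
smp <- frame[picked, ]

# Sample counts per stratum (n_h) and the design weight of each sampled row.
n_h <- table(smp$region)
smp$design_weight <- as.numeric(N_h[smp$region]) / as.numeric(n_h[smp$region])

# Within a stratum the weights sum to N_h, so the total recovers the frame size.
sum(smp$design_weight)
```

Under this scheme every row in a stratum carries the same weight, and the weights add up to the population count, which is a quick sanity check to run on any design-weight vector.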
# Design weights from a stratified scheme.
# Treat `region` as the stratum; supply known population sizes per stratum.
pop_sizes <- c(ON = 14000000, QC = 8500000, BC = 5200000, AB = 4400000)
design <- morie_compute_design_weights(
  df,
  strata_col = "region",
  population_sizes = pop_sizes
)
head(design)
#> [1] 34000.00 34000.00 18107.00 55555.56 18107.00 55555.56

# Effective sample size given a weight vector.
ess <- morie_effective_sample_size(df$weight)
ess
#> [1] 897.9353
# Design effect: how much variance inflates from the weighting.
deff <- morie_design_effect(df$weight)
deff
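#> [1] 1.113666

The effective sample size answers the practical question “how many independent observations does this weighted sample contain?” The design effect is the factor by which unequal weighting inflates the variance relative to an equally weighted sample of the same size. In the output above the two are tied together: 1000 / 897.9353 ≈ 1.1137, so the design effect equals n divided by the effective sample size.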
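
The relationship above matches Kish's approximation, which can be written in a few lines of base R. The sketch below is an illustration under that assumption, not a statement about the internals of morie_effective_sample_size() or morie_design_effect(); `kish_ess` and `kish_deff` are hypothetical helper names.

```r
# Kish approximation (assumed here for illustration):
#   n_eff = (sum w)^2 / sum(w^2),   deff = n / n_eff
set.seed(11)
w <- runif(1000, 0.5, 2.0)

kish_ess <- function(w) sum(w)^2 / sum(w^2)
kish_deff <- function(w) length(w) / kish_ess(w)

ess <- kish_ess(w)
deff <- kish_deff(w)

# Equivalent form: deff = 1 + CV(w)^2, using the divide-by-n variance.
cv2 <- mean((w - mean(w))^2) / mean(w)^2
all.equal(deff, 1 + cv2)

# Equal weights carry no penalty: the effective sample size equals n.
kish_ess(rep(1, 50))
```

The second form shows why highly variable weights are costly: the design effect grows with the squared coefficient of variation of the weights, regardless of their overall scale.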
# Bootstrap a weighted statistic: here, the weighted mean of `age`.
boot <- morie_bootstrap_sample(
  df,
  statistic = function(x) stats::weighted.mean(x$age, x$weight),
  n_bootstrap = 100
)
str(boot, max.level = 1)
#> List of 5
#> $ estimate : num 39.9
#> $ se : num 0.425
#> $ ci_lower : Named num 39.1
#> ..- attr(*, "names")= chr "2.5%"
#> $ ci_upper : Named num 40.6
#> ..- attr(*, "names")= chr "97.5%"
#> $ distribution: num [1:100] 40.3 40 39.1 39.9 39.6 ...

See the causal-inference vignette: the same estimate_* functions accept a weights argument.
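
For readers who want to see what a bootstrap of a weighted statistic involves, the following base-R sketch resamples rows with replacement and summarises the replicates with a standard error and a percentile interval, mirroring the fields in the str() output above. It is an assumption-laden illustration rather than the rmorie implementation: `stat`, `boot_dist` and `res` are hypothetical names, and simple row resampling ignores strata and clusters, so a design-based bootstrap (resampling within strata, or using survey-supplied replicate weights) would be needed for real complex-sample data.

```r
# Naive row bootstrap of a weighted mean (sketch; NOT the rmorie internals).
set.seed(11)
n <- 1000
df <- data.frame(
  age = round(rnorm(n, 40, 12)),
  weight = runif(n, 0.5, 2.0)
)

# Statistic of interest: weighted mean of age.
stat <- function(d) stats::weighted.mean(d$age, d$weight)

# Resample rows with replacement and recompute the statistic each time.
n_bootstrap <- 100
boot_dist <- replicate(n_bootstrap, {
  idx <- sample.int(n, n, replace = TRUE)
  stat(df[idx, ])
})

# Point estimate, bootstrap standard error, and 95% percentile interval.
res <- list(
  estimate = stat(df),
  se = stats::sd(boot_dist),
  ci_lower = stats::quantile(boot_dist, 0.025),
  ci_upper = stats::quantile(boot_dist, 0.975)
)
str(res)
```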