---
title: "Survey-weighted estimation with MORIE"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Survey-weighted estimation with MORIE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = requireNamespace("morie", quietly = TRUE)
)
```

# Overview

Many MORIE analyses use Statistics Canada Public Use Microdata Files
(CCS, CSADS, CSUS, CADS, CPADS), which require complex-sample design
weights for valid inference. This vignette covers the sampling and
weighting helpers that wrap the same machinery.

# Stratified, cluster, and PPS sampling

```{r sampling}
library(morie)
set.seed(11)

n <- 1000
df <- data.frame(
  id     = seq_len(n),
  region = sample(c("ON", "QC", "BC", "AB"), n, replace = TRUE),
  age    = round(rnorm(n, 40, 12)),
  weight = runif(n, 0.5, 2.0)
)

# Stratified random sample within region.
strat <- morie_stratified_sample(df, strata_col = "region", n_per_stratum = 50)

# Single-stage cluster sample on region (all units in sampled
# clusters are kept).
clust <- morie_cluster_sample(df, cluster_col = "region", n_clusters = 2)

dim(strat)
dim(clust)
```

# Calibration weights and design weights

`morie_compute_design_weights()` and `morie_calibration_weights()`
produce weights consistent with a known sampling scheme and known
population marginals.

```{r weights, eval = exists("morie_compute_design_weights")}
# Design weights from a stratified scheme.
# Treat `region` as the stratum; supply known population sizes per stratum.
pop_sizes <- c(ON = 14000000, QC = 8500000, BC = 5200000, AB = 4400000)
design <- morie_compute_design_weights(
  df,
  strata_col = "region",
  population_sizes = pop_sizes
)
head(design)
```

# Effective sample size and design effect

```{r ess-deff}
# Effective sample size given a weight vector.
ess <- morie_effective_sample_size(df$weight)
ess

# Design effect: how much variance inflates from the weighting.
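# For intuition, Kish's approximation gives ESS = (sum w)^2 / sum(w^2) and
# deff = n / ESS. That MORIE's helpers use exactly this definition is an
# assumption, not something this vignette confirms. Toy check in base R
# (`w_toy`, `kish_ess`, and `kish_deff` are illustrative names only):
w_toy <- c(1, 1, 2, 4)
kish_ess <- sum(w_toy)^2 / sum(w_toy^2)   # 64 / 22
kish_deff <- length(w_toy) / kish_ess     # 4 / (64 / 22) = 1.375
c(ess = kish_ess, deff = kish_deff)

# MORIE's helper applied to the vignette weights: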
deff <- morie_design_effect(df$weight)
deff
```

The effective sample size answers the practical question "how many
independent observations does this weighted sample contain?" The design
effect is the factor by which the weighting inflates the variance
relative to an unweighted sample of the same size.

# Bootstrap variance with weights

```{r bootstrap}
# Bootstrap a weighted statistic: here, the weighted mean of `age`.
boot <- morie_bootstrap_sample(
  df,
  statistic = function(x) stats::weighted.mean(x$age, x$weight),
  n_bootstrap = 100
)
str(boot, max.level = 1)
```

# Where to go next

- For complex-survey causal inference (survey-weighted ATE/AIPW), see the
  `causal-inference` vignette; the same `estimate_*` functions accept a
  `weights` argument.
- For Statistics Canada PUMF acknowledgments and citation requirements,
  see the package README.