---
title: "IPW deep-dive (Hajek and Horvitz--Thompson)"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{IPW deep-dive}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = requireNamespace("morie", quietly = TRUE)
)
```

# Overview

Inverse-probability weighting (IPW) is the simplest of the
singly robust causal estimators. This vignette shows the building
blocks that MORIE exposes: the Horvitz--Thompson and the
Hajek-stabilised IPW estimators, propensity-score modelling,
and weight-trimming diagnostics.

# Setting up

```{r setup}
library(morie)
set.seed(2026)
n <- 500
X1 <- rnorm(n)
X2 <- rnorm(n)
ps_true <- plogis(0.4 * X1 - 0.3 * X2)
treat   <- as.integer(ps_true > runif(n))
y       <- 1.0 * treat + 0.6 * X1 - 0.2 * X2 + rnorm(n, sd = 0.5)
df <- data.frame(y = y, treat = treat, X1 = X1, X2 = X2)
```

# Estimating propensities

`morie_estimate_ate()` fits a logistic propensity model internally and
returns the IPW estimate by default. To inspect the propensities, fit
the model yourself, store the fitted values in a column, and pass that
column's name via `propensity_col` when calling the estimator:

```{r ps}
ps_fit <- glm(treat ~ X1 + X2, family = binomial(), data = df)
df$ps  <- predict(ps_fit, type = "response")
summary(df$ps)
```
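
Before weighting, it helps to compare the fitted propensities across the two arms. The chunk below is a minimal base-R sketch on its own toy data (the `*_demo` objects are illustrative names, not part of MORIE); the same `tapply()` call could be applied to `df$ps` and `df$treat`.

```{r overlap-check}
# Toy data so the chunk runs on its own (not the vignette's `df`).
set.seed(1)
x_demo  <- rnorm(1000)
a_demo  <- rbinom(1000, 1, plogis(0.4 * x_demo))
ps_demo <- fitted(glm(a_demo ~ x_demo, family = binomial()))

# Quantiles of the fitted propensity within each treatment arm.
# Similar ranges in both arms indicate reasonable overlap.
overlap_demo <- tapply(ps_demo, a_demo, quantile,
                       probs = c(0, 0.05, 0.5, 0.95, 1))
overlap_demo
```

If one arm has propensities piling up near 0 or 1 while the other does not, the weights in that region will be large and the diagnostics later in this vignette become more important.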

# Hajek-stabilised IPW

`morie_estimate_ate()` defaults to the Hajek estimator, which divides
each arm's weighted outcome sum by that arm's sum of weights rather
than by the sample size. This normalisation usually lowers
finite-sample variance relative to Horvitz--Thompson, although it does
not remove the sensitivity to propensities near 0 or 1:

```{r hajek}
ate_hajek <- morie_estimate_ate(
  df,
  treatment      = "treat",
  outcome        = "y",
  covariates     = c("X1", "X2"),
  propensity_col = "ps"
)
ate_hajek$estimate
ate_hajek$se
```
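
To make the difference between the two estimators concrete, here is a hand-rolled base-R sketch of both. `ipw_by_hand()` and the `*_demo` objects are hypothetical names for illustration only; this is not MORIE's implementation, and it uses the true propensities of a small toy simulation rather than fitted ones.

```{r ht-hajek-manual}
# Hand-rolled IPW estimators of the ATE (illustrative, not MORIE's code).
ipw_by_hand <- function(y, a, ps) {
  w1 <- a / ps              # weights for treated units, zero for controls
  w0 <- (1 - a) / (1 - ps)  # weights for control units, zero for treated
  n  <- length(y)
  c(
    # Horvitz--Thompson: divide each weighted sum by n
    horvitz_thompson = sum(w1 * y) / n - sum(w0 * y) / n,
    # Hajek: divide each weighted sum by its own sum of weights
    hajek            = sum(w1 * y) / sum(w1) - sum(w0 * y) / sum(w0)
  )
}

# Small self-contained simulation with a true ATE of 1.
set.seed(1)
n_demo  <- 2000
x_demo  <- rnorm(n_demo)
ps_demo <- plogis(0.4 * x_demo)
a_demo  <- rbinom(n_demo, 1, ps_demo)
y_demo  <- 1.0 * a_demo + 0.6 * x_demo + rnorm(n_demo, sd = 0.5)

est_demo <- ipw_by_hand(y_demo, a_demo, ps_demo)
est_demo
```

One practical consequence of the normalisation: adding a constant to every outcome leaves the Hajek estimate unchanged, whereas the Horvitz--Thompson estimate shifts unless the weights in each arm happen to sum to `n`.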

# Weight diagnostics

In practice, IPW is sensitive to extreme propensities. Two common
diagnostics:

```{r diag}
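# ATE weights: 1 / ps for treated units, 1 / (1 - ps) for controls
w <- ifelse(df$treat == 1, 1 / df$ps, 1 / (1 - df$ps))

# Effective sample size after weighting
ess <- morie_effective_sample_size(w)
ess

# Range of weights (very large values suggest trimming)
range(w)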
```

If the effective sample size is much smaller than the number of
observations, consider:

- Trimming propensities to a sensible interval (e.g. [0.05, 0.95])
- Switching to a doubly-robust estimator (`morie_estimate_aipw()`)
- Revisiting the propensity model, since covariates that predict
  treatment almost perfectly push fitted propensities towards 0 or 1
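
The first option can be sketched in base R. The helpers below (`kish_ess()`, `clip_ps()`, `ate_weights()`) are hypothetical names for this illustration. The effective sample size here is Kish's formula, which is an assumption about what `morie_effective_sample_size()` computes, and "trimming" is read as clipping propensities to the interval rather than dropping units.

```{r trim-manual}
# Kish effective sample size: (sum of weights)^2 / sum of squared weights.
kish_ess <- function(w) sum(w)^2 / sum(w^2)

# Clip propensities to [lower, upper]; no observations are dropped.
clip_ps <- function(ps, lower = 0.05, upper = 0.95) {
  pmin(pmax(ps, lower), upper)
}

# ATE weights: 1 / ps for treated units, 1 / (1 - ps) for controls.
ate_weights <- function(a, ps) ifelse(a == 1, 1 / ps, 1 / (1 - ps))

# Tiny toy example with one very small propensity among the treated.
ps_demo <- c(0.01, 0.20, 0.50, 0.80, 0.99)
a_demo  <- c(1, 0, 1, 0, 1)

w_raw  <- ate_weights(a_demo, ps_demo)
w_clip <- ate_weights(a_demo, clip_ps(ps_demo))

c(raw = kish_ess(w_raw), clipped = kish_ess(w_clip))
```

Clipping trades variance for bias: the effective sample size goes up, but the clipped weights no longer correspond exactly to the fitted propensities, so the bounds are worth reporting alongside the estimate.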

# AIPW for protection against IPW failure

```{r aipw}
aipw <- morie_estimate_aipw(
  df,
  treatment  = "treat",
  outcome    = "y",
  covariates = c("X1", "X2")
)
aipw$estimate
```

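When propensities are well-behaved and both the propensity and outcome
models are reasonable, IPW and AIPW should agree to within sampling
noise. In this simulation both should land near 1.0, the treatment
effect built into `y`. Disagreement is informative: it suggests
either model misspecification or a fragile propensity model.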
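
For intuition about what a doubly robust estimator combines, the following base-R sketch computes an AIPW estimate by hand on toy data. `aipw_by_hand()` and `d_demo` are hypothetical names; the sketch assumes a logistic propensity model and arm-specific linear outcome models with no cross-fitting, and `morie_estimate_aipw()` may well differ in all of these choices.

```{r aipw-manual}
# Hand-rolled AIPW for the ATE (illustrative, not MORIE's implementation).
# Expects a data frame with columns y (outcome), a (0/1 treatment), x (covariate).
aipw_by_hand <- function(data) {
  # Propensity model
  ps <- fitted(glm(a ~ x, family = binomial(), data = data))
  # Outcome models fitted within each arm, predicted for every unit
  m1 <- predict(lm(y ~ x, data = data[data$a == 1, ]), newdata = data)
  m0 <- predict(lm(y ~ x, data = data[data$a == 0, ]), newdata = data)
  # Outcome-model contrast plus inverse-probability-weighted residuals
  phi <- m1 - m0 +
    data$a * (data$y - m1) / ps -
    (1 - data$a) * (data$y - m0) / (1 - ps)
  c(estimate = mean(phi), se = sd(phi) / sqrt(nrow(data)))
}

# Small self-contained simulation with a true ATE of 1.
set.seed(1)
n_demo <- 2000
d_demo <- data.frame(x = rnorm(n_demo))
d_demo$a <- rbinom(n_demo, 1, plogis(0.4 * d_demo$x))
d_demo$y <- 1.0 * d_demo$a + 0.6 * d_demo$x + rnorm(n_demo, sd = 0.5)

aipw_demo <- aipw_by_hand(d_demo)
aipw_demo
```

The weighted residual terms correct the outcome-model contrast when the outcome model is wrong, and the outcome model absorbs most of the variation when the weights are noisy, which is why the estimator remains consistent if either nuisance model is correctly specified.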

# Where to go next

- The `causal-inference` vignette covers ATT / ATC / CATE / GATE.
- The `survey-weighted` vignette covers IPW under complex-sample
  designs (when survey weights and propensities both apply).