Auditing a system for group disparity

What this vignette covers

morie 0.8.0 adds a fairness & disparity-audit subsystem. It does not build or deploy a predictive-policing or risk-assessment system — it measures whether an existing one treats demographic groups differently, so the system can be held accountable.

This vignette walks through the R surface: the group-disparity metrics and the predictive-policing calibration audit.

library(rmorie)

Group-disparity metrics

Suppose a risk-flagging system has flagged a set of individuals. Group "A" is flagged at 100%, group "B" at only 60%.

flagged <- c(rep(1, 5), 1, 1, 1, 0, 0)        # 10 individuals
race    <- c(rep("A", 5), rep("B", 5))

di <- morie_fairness_disparate_impact(flagged, race, privileged = "A")
di$value            # disparate-impact ratio
#> [1] 0.6
di$adverse_impact   # below the 0.80 four-fifths threshold?
#> [1] TRUE

A disparate-impact ratio below 0.80 is the standard legal indicator of adverse impact. The demographic-parity gap reports the same disparity as an additive difference:

morie_fairness_demographic_parity(flagged, race, privileged = "A")$value
#>    B 
#> -0.4

When ground-truth outcomes are available, morie_fairness_equalized_odds audits the system’s error rates, not just its flagging rates:

truth <- c(1, 0, 1, 0, 1, 0, 1, 0)
pred  <- c(1, 0, 1, 0, 1, 1, 0, 1)
grp   <- c(rep("A", 4), rep("B", 4))
morie_fairness_equalized_odds(truth, pred, grp, privileged = "A")$violation
#> [1] TRUE

Predictive-policing calibration audit

morie_predpol_calibration_audit generalises the SciencesPo Predictive-policing-Chicago district analysis: it ranks areas by the risk an algorithm predicts and by their realised outcome rate, then tests whether the disagreement tracks the areas’ demographics.

areas        <- c("d1", "d2", "d3", "d4", "d5", "d6")
mean_risk    <- c(90, 80, 70, 30, 20, 10)   # the algorithm's ranking
outcome_rate <- c(10, 20, 30, 70, 80, 90)   # realised — the opposite
area_group   <- c("X", "X", "X", "Y", "Y", "Y")

audit <- morie_predpol_calibration_audit(areas, mean_risk, outcome_rate,
                                   area_group)
audit$group_rank_gap     # per-group mean rank gap
#> $X
#> [1] 3
#> 
#> $Y
#> [1] -3
audit$interpretation
#> [1] "Overall the ranking is miscalibrated (Spearman rho = -1.00). Group 'X' is over-predicted: its areas are ranked, on average, 3.0 rank positions more dangerous than their realised outcomes warrant."

A positive rank gap means the algorithm ranks that group’s areas more dangerous than their realised outcomes warrant — the signature of disparate over-policing.

Further reading

  • ?frns_metrics — all six disparity metrics.
  • ?frns_predpol — the predictive-policing calibration audit.
  • ?frns_temporal — the multi-city temporal audit.

The methods are clean-room reimplementations from published descriptions (IBM AIF360; the SciencesPo Predictive-policing-Chicago project; Barman & Barman, arXiv:2603.18987; the COMPAS XAI Stories audit).