MRM empirical callables (OTIS / TPS / SIU)

Overview

This vignette documents the mrm_otis_*(), mrm_tps_*(), and mrm_siu_*() empirical callables. Each function is a one-line entry point to a verified analysis used in the MRM empirical paper (Ruhela 2026, in preparation). The evaluated examples below run on the small reference samples bundled with the package, so the vignette builds without network access. The region-locality and SIU chunks need the full data and are shown without output.

For the full datasets:

  • OTIS → morie_load_dataset("otisb01") (downloads via CKAN on first call; subsequent calls hit the local SQLite cache)
  • TPS → morie_fetch_tps("Assault") (ArcGIS REST)
  • SIU → morie_fetch_siu() (on-demand scrape of public reports)

See vignette("mrm-dataset-fetchers") for the dataset side.

library(morie)
b01 <- morie_sample("otis_b01")
b09 <- morie_sample("otis_b09")
tps <- morie_sample("tps_assault")

OTIS suite

Placement-count concentration on b09

The b09 long-format file publishes counts of individuals in segregation for each (fiscal year × placement-count band × gender) cell. The callable expands the banded counts using band midpoints and returns, per fiscal year and pooled, the Hill-MLE Pareto exponent, the Gini coefficient, the mean number of placements per individual, and the top-k% concentration share.

mrm_otis_placement_concentration(b09)
#>     year n_individuals n_placements mean_per_individual   gini hill_alpha
#> 1   2023         12647        55421            4.382146 0.5331     1.5967
#> 2   2024         10881        47123            4.330760 0.5862     1.6430
#> 3   2025          9608        46893            4.880620 0.6057     1.6202
#> 4 pooled         33136       149437            4.509808 0.5748     1.6181
#>   top_pct_share
#> 1        0.2932
#> 2        0.3351
#> 3        0.3215
#> 4        0.3180

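The values are computed within each fiscal year: the OTIS UniqueIndividual_ID has the format YYYY-XXXXX-SG and is randomly reassigned every fiscal year, so cross-year tracking is invalid by design. The pooled row therefore counts person-years; its n_individuals and n_placements are the sums of the yearly rows.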
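The midpoint expansion and two of the concentration measures can be sketched in base R. This is a minimal illustration, not the package's code: the band encoding (separate lower and upper bounds plus an individual count per band), the helper names expand_bands(), gini() and top_share(), and the 10% default for the top share are all hypothetical. An open-ended top band (such as "10+") has no midpoint and would need an assumed upper bound.

```r
# Hypothetical band encoding: each band has a lower and an upper placement
# count and the number of individuals who fall in it.
expand_bands <- function(lower, upper, n_individuals) {
  # One entry per individual, valued at the band midpoint
  rep((lower + upper) / 2, times = n_individuals)
}

# Gini coefficient via the sorted-values formula
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Share of all placements held by the top `pct` fraction of individuals
top_share <- function(x, pct = 0.10) {
  x <- sort(x, decreasing = TRUE)
  k <- max(1L, ceiling(pct * length(x)))
  sum(x[seq_len(k)]) / sum(x)
}

placements <- expand_bands(lower = c(1, 3, 6), upper = c(2, 5, 10),
                           n_individuals = c(50, 30, 20))
mean(placements)                  # mean placements per individual
gini(placements)
top_share(placements, pct = 0.10)
```

Midpoint expansion understates dispersion within each band, so Gini and top-share values computed this way are approximations to what individual-level counts would give.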

Segregation-duration KM on b01

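NumberConsecutiveDays_Segregation is the duration of each placement in days. There is no censoring (all durations are observed), so the Kaplan-Meier (KM) estimator reduces to the empirical distribution. The callable reports, per stratum, the mean, median, and q25 duration, the percentage of placements above the UN Mandela 15-day cutoff, and the median duration among those placements.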

mrm_otis_seg_duration_km(b01)
#>   stratum    n mean_days median_days q25_days pct_above_mandela
#> 1  pooled 1000      3.03           2        3               0.7
#>   median_among_above_mandela
#> 1                         54
mrm_otis_seg_duration_km(b01, group_cols = "MentalHealth_Alert")
#>   stratum   n mean_days median_days q25_days pct_above_mandela
#> 1      No 499      2.76           2        3               0.6
#> 2     Yes 501      3.29           2        4               0.8
#>   median_among_above_mandela
#> 1                       54.0
#> 2                       41.5

This callable replaces analyses that treat YYYY-XXXXX-SG as a persistent person identifier; that misreading produces a spurious cross-year "time-to-readmission" artifact.
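The per-stratum summaries can be sketched in base R as below. This is a minimal sketch, assuming a numeric vector of durations and an optional stratum vector; duration_summary() is a hypothetical helper, and the q25_days column is left out because its exact definition in the package is not stated here.

```r
# With no censoring, the KM survival curve is 1 - ECDF, so these are plain
# empirical summaries of the observed durations.
duration_summary <- function(days, stratum = rep("pooled", length(days)),
                             cutoff = 15) {
  rows <- lapply(split(days, stratum), function(d) {
    above <- d[d > cutoff]            # placements beyond the Mandela cutoff
    data.frame(n = length(d),
               mean_days = mean(d),
               median_days = median(d),
               pct_above_mandela = 100 * mean(d > cutoff),
               median_among_above_mandela =
                 if (length(above) > 0) median(above) else NA_real_)
  })
  out <- do.call(rbind, rows)
  out <- cbind(stratum = names(rows), out)
  rownames(out) <- NULL
  out
}

duration_summary(c(1, 2, 3, 20, 30))
```

Passing an alert column as stratum (for example the MentalHealth_Alert values) gives one row per level, mirroring the group_cols call above.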

Mortification co-occurrence (alert columns)

The three b01 alert flags (MentalHealth_Alert, SuicideRisk_Alert, SuicideWatch_Alert) co-occur far more often than independence would predict. The substantive figure is the MentalHealth × SuicideRisk Cramér's V.

mrm_otis_mortification_cooccurrence(b01)
#>              alert_a            alert_b    n   chi2 df   p_value
#> 1 MentalHealth_Alert  SuicideRisk_Alert 1000  33.25  1  8.12e-09
#> 2 MentalHealth_Alert SuicideWatch_Alert 1000  12.09  1  5.08e-04
#> 3  SuicideRisk_Alert SuicideWatch_Alert 1000 470.37  1 2.66e-104
#>   morie_cramers_v
#> 1          0.1823
#> 2          0.1099
#> 3          0.6858
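For a 2×2 table, Cramér's V is sqrt(chi2 / n), and the printed rows satisfy this: sqrt(33.25 / 1000) ≈ 0.1823. A minimal base-R sketch of the pairwise computation follows. cramers_v() is a hypothetical helper, and whether the package applies Yates' continuity correction is not stated here; the sketch assumes it does not (correct = FALSE).

```r
# Cramér's V for two categorical vectors, via Pearson's chi-square.
cramers_v <- function(a, b) {
  tab <- table(a, b)
  # Uncorrected chi-square (assumption); small-count warnings are silenced
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
}

cramers_v(c("Yes", "No", "Yes", "No"), c("Yes", "No", "Yes", "No"))
```

The min(dim(tab)) - 1 term makes the same function valid beyond 2×2 tables; for binary alert flags it equals 1.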

Region locality

Ontario provincial segregation/restrictive-confinement (seg/RC) placement is overwhelmingly locality-preserving: in the full b01, over 95% of placements remain within the same region.

# Region columns are present only in the full b01, not in the bundled
# sample, so this chunk is not run here. To run it, replace b01 with the
# full file from morie_load_dataset("otisb01"), then uncomment.
# res <- mrm_otis_region_locality(b01)
# print(res$table)
# cat("diagonal share:", res$diagonal_share, "  V:", res$morie_cramers_v, "\n")
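The diagonal share reported above is the fraction of placements on the diagonal of a region-by-region cross-tabulation. A minimal sketch, assuming two region vectors (called from and to here, hypothetically; the actual b01 column names are not given in this vignette):

```r
# Share of placements whose two region labels agree. Both factors share one
# level set so that diag() compares the same region on each axis.
diagonal_share <- function(from, to) {
  lv <- sort(union(unique(from), unique(to)))
  tab <- table(factor(from, levels = lv), factor(to, levels = lv))
  sum(diag(tab)) / sum(tab)
}

diagonal_share(c("East", "East", "West", "North"),
               c("East", "West", "West", "North"))
```

Without the shared levels, a region that appears in only one column would shift the table's rows against its columns and diag() would pair the wrong cells.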

Mandela classification

mrm_classify_mandela() shipped in v0.1.14 and remains the canonical Mandela classifier in v0.2.0. It supports three operationalisations, selected with the denominator argument:

mrm_classify_mandela(b01, denominator = "row")           # per-placement
#>     year denominator n_mandela        rate  pct n_broader_rc rate_broader
#> 1   2023         362         0 0.000000000 0.00            0  0.000000000
#> 2   2024         337         3 0.008902077 0.89            3  0.008902077
#> 3   2025         301         4 0.013289037 1.33            4  0.013289037
#> 4 pooled        1000         7 0.007000000 0.70            7  0.007000000
mrm_classify_mandela(b01, denominator = "individual_any") # per-person
#>     year denominator n_mandela        rate  pct n_broader_rc rate_broader
#> 1   2023         354         0 0.000000000 0.00            0  0.000000000
#> 2   2024         329         3 0.009118541 0.91            3  0.009118541
#> 3   2025         289         4 0.013840830 1.38            4  0.013840830
#> 4 pooled         972         7 0.007201646 0.72            7  0.007201646
mrm_classify_mandela(b01, denominator = "individual_cumulative")
#>     year denominator n_mandela        rate  pct n_broader_rc rate_broader
#> 1   2023         354         0 0.000000000 0.00            0  0.000000000
#> 2   2024         329         3 0.009118541 0.91            3  0.009118541
#> 3   2025         289         5 0.017301038 1.73            5  0.017301038
#> 4 pooled         972         8 0.008230453 0.82            8  0.008230453

The provincial-canonical 12.5/16.5/20.6% torture rates from c11 require the c11 aggregate (loaded via morie_sample("otis_c11")); see the MRM empirical paper, §6.
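The three denominators can be illustrated with a base-R sketch. The readings of the denominators are assumptions inferred from the argument names, not taken from the package source: "row" flags each placement longer than 15 days, "individual_any" flags a person-year with at least one such placement, and "individual_cumulative" flags a person-year whose summed days exceed 15. mandela_rates() is a hypothetical helper, and the broader-RC columns are not reproduced.

```r
mandela_rates <- function(year, id, days, cutoff = 15) {
  long <- days > cutoff
  # Per-placement rate within each fiscal year
  row_rate <- tapply(long, year, mean)
  # Person-year key: IDs are reassigned every fiscal year
  key <- paste(year, id, sep = "|")
  any_long <- tapply(long, key, any)          # any single long placement
  cum_long <- tapply(days, key, sum) > cutoff # summed days over the cutoff
  key_year <- sub("\\|.*$", "", names(any_long))
  list(row = row_rate,
       individual_any = tapply(as.vector(any_long), key_year, mean),
       individual_cumulative = tapply(as.vector(cum_long), key_year, mean))
}

mandela_rates(year = c(2023, 2023, 2023, 2024), id = c("a", "a", "b", "c"),
              days = c(10, 10, 16, 3))
```

In this toy input, person "a" has two 10-day placements: neither crosses the cutoff alone, but together they do, which is how a cumulative count can exceed a per-placement one, as in the 2025 rows above.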

TPS suite

Lévy-flight Hill exponent on inter-event step lengths

The callable treats all events, sorted chronologically, as a single stream and computes the haversine distance (km) between consecutive events. It returns the Hill maximum-likelihood (Hill-MLE) exponent estimated from the steps above min_step_km (0.5 km by default, as the output shows).

mrm_tps_levy_scaling(tps)
#> $n_events
#> [1] 1000
#> 
#> $n_steps_tail
#> [1] 995
#> 
#> $min_step_km
#> [1] 0.5
#> 
#> $hill_alpha
#> [1] 1.3043
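The step-length and tail-exponent computation can be sketched in base R. This assumes plain latitude, longitude, and time vectors, a 6371 km Earth radius, and the Pareto tail-index convention alpha = n / sum(log(x / x_min)); the package's exact convention and tail cut (at or strictly above min_step_km) are not stated here. All function names are hypothetical.

```r
# Great-circle (haversine) distance in km
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))   # pmin guards against rounding above 1
}

# Step lengths between consecutive events in one chronological stream
levy_steps <- function(lat, lon, time) {
  o <- order(time)
  lat <- lat[o]; lon <- lon[o]
  n <- length(lat)
  haversine_km(lat[-n], lon[-n], lat[-1], lon[-1])
}

# Hill / Pareto MLE of the tail exponent for steps at or above x_min
hill_alpha <- function(x, x_min) {
  tail <- x[x >= x_min]
  length(tail) / sum(log(tail / x_min))
}

steps <- levy_steps(lat = c(43.65, 43.70, 43.66, 43.75),
                    lon = c(-79.38, -79.40, -79.30, -79.45),
                    time = 1:4)
hill_alpha(steps, x_min = 0.5)
```

Sorting by time before differencing is what makes the result a "flight" across the whole city rather than a per-neighbourhood statistic.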

Moran’s I + DBSCAN clustering

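The callable bins the WGS84 bounding box of the events into a coarse grid whose resolution is set by grid_resolution, counts events per cell, and computes global Moran's I using a rook-contiguity weights matrix. It also runs DBSCAN on the raw latitude/longitude points (rescaled to km) and reports the number of clusters, the number of noise points, and the size of the largest cluster.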

mrm_tps_moran_clustering(tps, grid_resolution = 20L)
#> $morans_I
#> [1] -0.000138
#> 
#> $morans_z
#> [1] 0.67
#> 
#> $dbscan_n_clusters
#> [1] 21
#> 
#> $dbscan_n_noise
#> [1] 730
#> 
#> $dbscan_largest
#> [1] 82

For the high-precision computation on the full 254,378-event Assault file, use the morie Python tps_spatial_advanced pipeline; the R version is for quick interactive auditing.
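The gridding and Moran's I steps can be sketched in base R. This assumes binary (not row-standardised) rook weights and equal-width bins over the points' bounding box; the package's weighting, its z-score, and the DBSCAN step (which needs an add-on package) are not reproduced. grid_counts() and morans_i_grid() are hypothetical names.

```r
# Bin points into a res x res grid of event counts over their bounding box
grid_counts <- function(lat, lon, res = 20L) {
  ri <- factor(cut(lat, breaks = res, labels = FALSE), levels = seq_len(res))
  ci <- factor(cut(lon, breaks = res, labels = FALSE), levels = seq_len(res))
  unclass(table(ri, ci))
}

# Global Moran's I with binary rook-contiguity weights on a grid
morans_i_grid <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  z <- as.vector(m) - mean(m)    # column-major: cell (i, j) -> (j - 1) * nr + i
  num <- 0; w_total <- 0
  for (j in seq_len(nc)) for (i in seq_len(nr)) {
    k <- (j - 1) * nr + i
    nb <- c(if (i > 1) k - 1, if (i < nr) k + 1,     # up, down
            if (j > 1) k - nr, if (j < nc) k + nr)   # left, right
    num <- num + sum(z[k] * z[nb])
    w_total <- w_total + length(nb)
  }
  (length(z) / w_total) * num / sum(z^2)
}

checkerboard <- outer(1:4, 1:4, function(i, j) (i + j) %% 2)
morans_i_grid(checkerboard)   # perfect alternation gives I = -1
```

A grid with identical counts in every cell makes the denominator zero, so Moran's I is undefined (NaN) there.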

Neighbourhood inter-event recurrence

For each HOOD_158 neighbourhood, the callable sorts events chronologically, computes the gap in days between consecutive events, and summarises the gaps by their mean, median, and 25th and 75th percentiles.

head(mrm_tps_neighbourhood_recurrence_km(tps))
#>   hood n_events n_gaps mean_gap_days median_gap_days p25_gap_days p75_gap_days
#> 1  001       17     16        267.06           213.5       101.00       356.75
#> 2  002       12     11        447.91           360.0        74.00       606.00
#> 3  003        5      4        887.75           869.0       557.75      1199.00
#> 4  004        3      2       1781.00          1781.0      1065.50      2496.50
#> 5  005        4      3        776.00           676.0       537.00       965.00
#> 6  006        5      4        845.00           731.0       289.50      1286.50
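A minimal base-R sketch of the per-neighbourhood gaps, assuming a neighbourhood code vector and an event-date vector (neighbourhood_gaps() is a hypothetical name). It uses quantile() with R's default type; the printed p25/p75 values for hood 004 (two gaps, 1065.50 and 2496.50 around a mean of 1781) are consistent with that choice.

```r
neighbourhood_gaps <- function(hood, date) {
  by_hood <- split(as.Date(date), hood)
  rows <- lapply(names(by_hood), function(h) {
    x <- sort(by_hood[[h]])
    g <- as.numeric(diff(x), units = "days")    # consecutive gaps in days
    has_gaps <- length(g) > 0
    q <- if (has_gaps) quantile(g, c(0.25, 0.5, 0.75), names = FALSE)
         else rep(NA_real_, 3)
    data.frame(hood = h, n_events = length(x), n_gaps = length(g),
               mean_gap_days = if (has_gaps) mean(g) else NA_real_,
               median_gap_days = q[2], p25_gap_days = q[1], p75_gap_days = q[3])
  })
  do.call(rbind, rows)
}

neighbourhood_gaps(hood = c("A", "A", "A", "B"),
                   date = c("2020-01-31", "2020-01-01", "2020-01-11", "2020-03-01"))
```

A neighbourhood with a single event has no gaps and gets NA summaries rather than being dropped.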

Hawkes manifest loader

mrm_tps_load_hawkes_refit() reads paper_hawkes_refit.json (the per-category Hawkes refit table from the MRM empirical paper, §7.1-7.2) and returns it as a tidy data.frame. The reference manifest ships with the package and is read by default, so the path argument is needed only to load a different manifest.

SIU suite

The SIU callables operate on the SIU.csv file produced by morie_fetch_siu() (an on-demand scraper of the public Director’s Reports). The scraped corpus is not shipped, but the callables themselves do not depend on shipped data.

# morie_fetch_siu() scrapes the public reports, so this chunk needs network access.
siu_path <- morie_fetch_siu()
siu <- read.csv(siu_path)
res <- mrm_siu_case_to_decision_km(siu)
print(res$pooled)
head(res$by_service[order(-res$by_service$n),])
mrm_siu_per_service_rate(siu)
mrm_siu_outcome_classifier(siu)

The verified pooled median in our test snapshot is 120 days from incident to Director’s decision (n = 1,711 cases). Per-service medians cluster tightly around 120, indicating a system-wide processing cadence rather than a per-jurisdiction effect.
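The pooled and per-service medians can be sketched in base R. This assumes incident-date, decision-date, and service vectors (the SIU.csv column names are not given here), treats every case as decided, and ignores any censoring of cases still open at scrape time; case_to_decision() is a hypothetical name.

```r
case_to_decision <- function(incident_date, decision_date, service) {
  days <- as.numeric(as.Date(decision_date) - as.Date(incident_date))
  ok <- !is.na(days) & days >= 0        # drop undated or inconsistent cases
  d <- days[ok]; s <- service[ok]
  n_by <- tapply(d, s, length)
  by_service <- data.frame(service = names(n_by), n = as.vector(n_by),
                           median_days = as.vector(tapply(d, s, median)))
  list(pooled = data.frame(n = length(d), median_days = median(d)),
       by_service = by_service[order(-by_service$n), ])
}

case_to_decision(incident_date = c("2020-01-01", "2020-01-01", "2020-01-01"),
                 decision_date = c("2020-01-11", "2020-01-21", "2020-05-01"),
                 service = c("A", "A", "B"))
```

Sorting by_service by descending n matches the head() call in the SIU chunk above.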

References

  • MRM theoretical paper: Ruhela (2026), MRM: Multilevel Reconciliation Methodology — A Multi-Source Statistical Foundation for Canadian Carceral, Police, and Oversight Data.
  • MRM empirical paper: Ruhela (2026), Solitary Confinement, Self-Excitation, and Institutional Churn: Empirical Applications of MRM to Canadian Carceral and Police Data.
  • OTIS data dictionary: data.ontario.ca/dataset/data-on-inmates-in-ontario.
  • Toronto Police Open Data: data.torontopolice.on.ca/.
  • SIU public Director's Reports: siu.on.ca/en/case_directors_reports.php.