The MRM empirical callables (mrm_otis_*,
mrm_tps_*, mrm_siu_*) operate on three
external data sources. This vignette documents how morie
makes each of them accessible to end users:

- OTIS: data.ontario.ca, reached via the existing morie_load_dataset() infrastructure.
- TPS: morie_fetch_tps().
- SIU: morie_fetch_siu() (the corpus is not shipped because redistribution licensing is unsettled).

Four small reference samples are bundled with the package
(inst/extdata/) so every example runs offline.
The samples live in inst/extdata/ and total ~420 KB.
Each is a 1000-row random draw (seed 42) except otis_b09
and otis_c11, which are shipped whole (already small).
# Bundled samples
b01 <- morie_sample("otis_b01")
b09 <- morie_sample("otis_b09")
c11 <- morie_sample("otis_c11")
tps <- morie_sample("tps_assault")
# Schema sanity
str(b01, max.level = 1)
#> 'data.frame': 1000 obs. of 18 variables:
#> $ EndFiscalYear : int 2025 2023 2023 2023 2024 2024 2025 2025 2024 2023 ...
#> $ UniqueIndividual_ID : chr "2025-08960-SG" "2023-11152-SG" "2023-08955-SG" "2023-11360-SG" ...
#> $ Gender : chr "Male" "Male" "Female" "Male" ...
#> $ Region_AtTimeOfPlacement : chr "Eastern" "Northern" "Central" "Central" ...
#> $ Region_MostRecentPlacement : chr "Eastern" "Northern" "Central" "Central" ...
#> $ Age_Category : chr "25 to 49" "25 to 49" "25 to 49" "25 to 49" ...
#> $ NumberConsecutiveDays_Segregation : int 4 1 6 1 2 1 1 1 2 3 ...
#> $ SegReason_SecurityOfInstitution_SafetyOfOthers : chr "No" "No" "No" "No" ...
#> $ SegReason_InmateNeedsProtection : chr "No" "Yes" "No" "No" ...
#> $ SegReason_InmateNeedsProtection_Medical : chr "Yes" "No" "No" "No" ...
#> $ SegReason_SecurityOfInstitution_SafetyOfOthers_Medical: chr "No" "No" "Yes" "Yes" ...
#> $ SegReason_Disciplinary_Segregation : chr "No" "No" "No" "No" ...
#> $ SegReason_InmateRefuseSearch_Scan : chr "No" "No" "No" "No" ...
#> $ MentalHealth_Alert : chr "Yes" "No" "Yes" "No" ...
#> $ SuicideRisk_Alert : chr "Yes" "No" "Yes" "No" ...
#> $ SuicideWatch_Alert : chr "Yes" "No" "Yes" "No" ...
#> $ SegReason_Other : chr "No" "" "" "" ...
#> $ Number_Of_Placements : int 1 8 1 1 1 2 2 1 2 1 ...
nrow(b09)
#> [1] 78
nrow(c11)
#> [1] 33
ncol(tps)
#> [1] 31

OTIS (data.ontario.ca)

The OTIS public release is hosted at
data.ontario.ca/dataset/data-on-inmates-in-ontario as 28
CSV resources. The MORIE catalog has four of them registered with their
canonical CKAN resource IDs:
cat <- morie_dataset_catalog()
cat[cat$source == "otis", c("key", "name", "table_name", "ckan_resource_id")]
#> key name
#> 37 otisa01 OTIS a01: Restrictive Confinement - Detailed Dataset
#> 38 otisb01 OTIS b01: Segregation - Detailed Dataset
#> 39 otisb09 OTIS b09: Individuals in Segregation - Number of Placements
#> 40 otisc11 OTIS c11: Individuals in Segregation/RC by Aggregate Length
#> table_name ckan_resource_id
#> 37 otisa01 5a0c5804-a055-4031-9743-73f556e43bb4
#> 38 otisb01 406e6d90-d568-4553-8ca7-bc9f90e133b9
#> 39 otisb09 df24e943-d52b-43a8-a10e-a3cc906e26bb
#> 40 otisc11 9c7b74a5-53ad-4ef0-a7a6-97772cd01c55

To pull the full b01 (82,001 rows) from CKAN:
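The call is presumably the same morie_load_dataset("otisb01") that appears at the end of this vignette. A minimal sketch follows; the load_full_b01() wrapper and its requireNamespace() guard are illustrative additions, not part of the package, so that the snippet does nothing where morie is unavailable.

```r
# Hypothetical wrapper (not part of morie): load the full OTIS b01 table,
# or return NULL when the package is not installed.
load_full_b01 <- function() {
  if (!requireNamespace("morie", quietly = TRUE)) {
    message("morie is not installed; nothing downloaded")
    return(NULL)
  }
  # "otisb01" is the catalog key listed by morie_dataset_catalog()
  morie::morie_load_dataset("otisb01")
}

b01_full <- load_full_b01()
if (!is.null(b01_full)) print(nrow(b01_full))
```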
The loader downloads the CSV on first call and caches it in the package SQLite database; subsequent calls return a data.frame from that cache without downloading again.
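The download-once, read-from-cache behaviour can be illustrated with a small cache-aside helper. This is only a sketch: it caches to an .rds file rather than SQLite so that it stays in base R, and cached_load(), fetch_fn, and cache_dir are hypothetical names, not morie's API.

```r
# Hypothetical cache-aside helper; morie itself caches into SQLite.
cached_load <- function(key, fetch_fn, cache_dir = tempdir()) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))        # later calls: served from the cache
  }
  df <- fetch_fn(key)            # first call: run the (slow) download
  saveRDS(df, path)
  df
}

# Stand-in for a network download that counts how often it runs.
n_downloads <- 0L
fake_fetch <- function(key) {
  n_downloads <<- n_downloads + 1L
  data.frame(key = key, value = 1:3)
}

cache_dir <- tempfile("morie-cache-")
dir.create(cache_dir)
first  <- cached_load("otisb01", fake_fetch, cache_dir)
second <- cached_load("otisb01", fake_fetch, cache_dir)
```

After both calls the stand-in download has run only once, and the second result comes from the cached file.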
Toronto Police Service publishes per-category crime events through ArcGIS Online. MORIE knows the layer URLs for nine categories.
morie_tps_layer_urls()
#> Assault
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Assault_Open_Data/FeatureServer/0"
#> AutoTheft
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Auto_Theft_Open_Data/FeatureServer/0"
#> BicycleTheft
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Bicycle_Thefts_Open_Data/FeatureServer/0"
#> BreakAndEnter
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Break_and_Enter_Open_Data/FeatureServer/0"
#> Homicides
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Homicides_Open_Data_ASR_RC_TBL_002/FeatureServer/0"
#> Robbery
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Robbery_Open_Data/FeatureServer/0"
#> ShootingAndFirearmDiscarges
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Shooting_and_Firearm_Discharges_Open_Data/FeatureServer/0"
#> TheftFromMV
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Theft_From_Motor_Vehicle_Open_Data/FeatureServer/0"
#> TheftOver
#> "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/Theft_Over_Open_Data/FeatureServer/0"

To fetch a single category:
csv_path <- morie_fetch_tps("Assault")
assault <- utils::read.csv(csv_path)
nrow(assault)
# 254378 (as of mid-2026)

The fetcher pages through /query with a
2000-record-per-page cap (the ArcGIS-imposed maximum) and writes a tidy
CSV to ~/.cache/morie/tps/tps_Assault.csv. Subsequent calls
return the cached path unless overwrite = TRUE.
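The paging scheme can be sketched as a vector of /query URLs, one per 2000-record page. resultOffset, resultRecordCount, where, outFields, and f are standard ArcGIS REST query parameters; tps_page_urls() is a hypothetical helper, and how morie_fetch_tps() builds its requests internally is not shown in this vignette.

```r
# Hypothetical helper: one /query URL per page of `page_size` records.
tps_page_urls <- function(layer_url, total, page_size = 2000L) {
  if (total <= 0) return(character(0))
  offsets <- seq(0, total - 1, by = page_size)
  sprintf(
    "%s/query?where=%s&outFields=%s&resultOffset=%d&resultRecordCount=%d&f=json",
    layer_url,
    utils::URLencode("1=1", reserved = TRUE),  # match every record
    utils::URLencode("*", reserved = TRUE),    # request all fields
    as.integer(offsets),
    as.integer(page_size)
  )
}

# Placeholder layer URL; a real one comes from morie_tps_layer_urls().
urls <- tps_page_urls("https://example.org/FeatureServer/0", total = 4500)
length(urls)  # 3 pages, at offsets 0, 2000 and 4000
```

With the 254378 Assault records quoted above, the same arithmetic gives 128 requests.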
Filtering at the server side:
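One way to express a server-side filter is through the where parameter of the ArcGIS REST /query endpoint, which takes an SQL-style predicate. The sketch below only builds the request URL. tps_filtered_query() is a hypothetical helper, the OCC_YEAR field name is an assumption about the Assault layer's schema, and whether morie_fetch_tps() exposes a comparable argument is not shown here.

```r
# Hypothetical helper: build a filtered /query URL for an ArcGIS layer.
tps_filtered_query <- function(layer_url, where, out_fields = "*") {
  paste0(
    layer_url, "/query",
    "?where=", utils::URLencode(where, reserved = TRUE),
    "&outFields=", utils::URLencode(out_fields, reserved = TRUE),
    "&f=json"
  )
}

assault_layer <- paste0(
  "https://services.arcgis.com/S9th0jAJ7bqgIRjw/arcgis/rest/services/",
  "Assault_Open_Data/FeatureServer/0"
)
# OCC_YEAR is an assumed field name; check the layer metadata first.
url <- tps_filtered_query(assault_layer, "OCC_YEAR >= 2020")
```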
The Ontario SIU publishes Director’s Reports at
siu.on.ca/en/case_directors_reports.php. MORIE includes an
on-demand scraper:
csv_path <- morie_fetch_siu() # full unfiltered index
siu <- utils::read.csv(csv_path)
nrow(siu)
# Or restricted to specific years:
csv_path <- morie_fetch_siu(years = 2020:2025, overwrite = TRUE)

The legal status of redistributing a single tabular copy of public oversight reports is not clearly established. Running the scraper per-user is unambiguously fair use of public information; bundling the scraped corpus might be more questionable. The scraper itself respects a 2-second rate limit, sets a clear User-Agent, and follows the SIU site’s published structure.
morie_fetch_siu() in R is a thin reticulate
wrapper around morie.siu_fetch.fetch_siu_cases() in Python.
This keeps the regex parsing logic in one canonical place. If
reticulate isn’t installed, fall back to calling the Python function
directly:
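A minimal sketch of that fallback, kept in R for consistency with the rest of this vignette, shells out to the Python interpreter with system2(). The siu_fetch_plan() helper, the python3 executable name, and the assumption that fetch_siu_cases() can be called without arguments are all illustrative; check the Python module for its actual signature.

```r
# Hypothetical helper: decide how to reach morie.siu_fetch.fetch_siu_cases().
siu_fetch_plan <- function(
    has_reticulate = requireNamespace("reticulate", quietly = TRUE),
    python = "python3") {
  if (has_reticulate) {
    return(list(via = "reticulate"))  # morie_fetch_siu() can be used as-is
  }
  # Assumes fetch_siu_cases() needs no arguments.
  py_code <- "from morie.siu_fetch import fetch_siu_cases; fetch_siu_cases()"
  list(via = "python", command = python, args = c("-c", shQuote(py_code)))
}

plan <- siu_fetch_plan(has_reticulate = FALSE)
# To actually run the fallback (requires Python with the morie module):
# system2(plan$command, plan$args)
```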
Putting it together, here is a complete worked example without any network call:
b01 <- morie_sample("otis_b01")
# Mandela classification with the "row" denominator (the default is "individual_any")
mrm_classify_mandela(b01, denominator = "row")
#> year denominator n_mandela rate pct n_broader_rc rate_broader
#> 1 2023 362 0 0.000000000 0.00 0 0.000000000
#> 2 2024 337 3 0.008902077 0.89 3 0.008902077
#> 3 2025 301 4 0.013289037 1.33 4 0.013289037
#> 4 pooled 1000 7 0.007000000 0.70 7 0.007000000
# Segregation duration (Kaplan-Meier), stratified by mental-health alert
mrm_otis_seg_duration_km(b01, group_cols = "MentalHealth_Alert")
#> stratum n mean_days median_days q25_days pct_above_mandela
#> 1 No 499 2.76 2 3 0.6
#> 2 Yes 501 3.29 2 4 0.8
#> median_among_above_mandela
#> 1 54.0
#> 2 41.5
# Mortification co-occurrence: Cramer's V across alert pairs
mrm_otis_mortification_cooccurrence(b01)
#> alert_a alert_b n chi2 df p_value
#> 1 MentalHealth_Alert SuicideRisk_Alert 1000 33.25 1 8.12e-09
#> 2 MentalHealth_Alert SuicideWatch_Alert 1000 12.09 1 5.08e-04
#> 3 SuicideRisk_Alert SuicideWatch_Alert 1000 470.37 1 2.66e-104
#> morie_cramers_v
#> 1 0.1823
#> 2 0.1099
#> 3 0.6858

For a full-data version, swap morie_sample("otis_b01")
for morie_load_dataset("otisb01") and re-run the same
callables.
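As a cross-check, the morie_cramers_v column is consistent with the textbook definition V = sqrt(chi2 / (n * min(r - 1, c - 1))), which for a 2x2 table reduces to sqrt(chi2 / n). The cramers_v() helper below is a hypothetical stand-in for checking the printed values, not the package's implementation.

```r
# Hypothetical helper: Cramer's V from a chi-squared statistic.
cramers_v <- function(chi2, n, n_rows = 2L, n_cols = 2L) {
  sqrt(chi2 / (n * min(n_rows - 1L, n_cols - 1L)))
}

# chi2 values copied from the co-occurrence output above (n = 1000).
chi2 <- c(33.25, 12.09, 470.37)
v <- cramers_v(chi2, n = 1000)
round(v, 3)
```

The results agree with the printed column to about three decimals; small differences in the fourth decimal are expected because the chi2 values shown are themselves rounded.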
| Cache path | Populated by | Size (full) |
|---|---|---|
| ~/.cache/morie/morie.db (SQLite) | morie_load_dataset(*) | a few MB to ~1 GB depending on selection |
| ~/.cache/morie/tps/tps_*.csv | morie_fetch_tps() | ~5–50 MB per category |
| ~/.cache/morie/siu/SIU.csv | morie_fetch_siu() | ~5 MB |
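
To see what has accumulated on disk, the cache can be inspected with base R. The cache_usage() helper is hypothetical, and its default root simply follows the ~/.cache/morie paths in the table; it returns an empty data frame when nothing has been cached yet.

```r
# Hypothetical helper: list cached files under `root` with sizes in MB.
cache_usage <- function(root = path.expand("~/.cache/morie")) {
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  data.frame(
    file = files,
    size_mb = round(file.size(files) / 1024^2, 2),
    stringsAsFactors = FALSE
  )
}

usage <- cache_usage()
usage[order(-usage$size_mb), ]  # largest files first
```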

- OTIS: data.ontario.ca/dataset/data-on-inmates-in-ontario
- TPS: data.torontopolice.on.ca/
- SIU: siu.on.ca/en/case_directors_reports.php
- citation("morie") for the full bibentry