| Title: | Multi-Domain Open Research and Inferential Estimation |
|---|---|
| Description: | Multi-domain scientific computing toolkit for observational inference and intervention analysis across scientific-experimentation contexts, hosting the MRM (Multilevel Reconciliation Methodology) framework for Canadian carceral, police, and oversight data as its primary application. Provides general-purpose causal estimators (ATE, ATT, ATC, GATE, CATE, LATE, AIPW, G-computation), survey sampling methods (stratified, cluster, PPS, bootstrap, calibration weights), propensity-score and doubly-robust estimators, and sensitivity analyses (E-value, Rosenbaum bounds). Companion modules support signal processing and spectral analysis, cryptographic primitives, spatial statistics, statistical physics of crime (Hawkes self-exciting processes, reaction-diffusion, Levy flight, urban scaling), and classical-test-theory and item-response-theory psychometrics, alongside ingestion utilities for officially published Ontario Special Investigations Unit (police-oversight) and federal Structured Intervention Unit reports. |
| Authors: | Vansh Singh Ruhela [aut, cre] |
| Maintainer: | Vansh Singh Ruhela <[email protected]> |
| License: | AGPL (>= 3) + file LICENSE |
| Version: | 0.9.5.12 |
| Built: | 2026-06-23 11:11:12 UTC |
| Source: | https://github.com/rootcoder007/rmorie |
Front-end over stats::p.adjust and the rmorie wrappers
above. Preserves the rmorie API used by
stat_commands.
adjust_p_values(p_values, method = "bh", alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
method |
One of |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
Sorted vector of all command names + aliases
all_stat_command_names()
Renders Tables 1, 2 and 3 in a single roll-up morie_result.
Figures 1-4 (time-series CANSIM data) are out-of-scope here; pass
your own series to decoupling_test() once they are available.
analyze_doob_full_affidavit()
A morie_result named-list.
Renders Table 1 and computes overall success / revocation rates.
analyze_doob_table1_releases()
A morie_result named-list with title, summary_lines,
tables, interpretation, and payload.
Renders Table 2 plus year-over-year changes and 5-year averages.
analyze_doob_table2_flow()
A morie_result named-list.
Renders Table 3 plus age-group IRRs for CSC custody and admissions vs Canadian adult population.
analyze_doob_table3_age_overrepresentation()
A morie_result named-list.
Uses stats::anova for sequential (Type-I) tests, or
car::Anova for Type-II/III if car is installed.
anova_table(model, typ = 2L, digits = 3L, output_format = "dataframe", title = "ANOVA Table")
model |
An |
typ |
ANOVA type (1, 2, 3). |
digits |
Decimal places. |
output_format |
Output target. |
title |
Title. |
ARSAU = the Ontario Ministry of the Solicitor General's provincial release of Police Use-of-Force incident records (formally "Race-Based and Identity-Based Data on Police Use of Force in Ontario"). Published on the Ontario Data Catalogue at https://data.ontario.ca/dataset/police-use-of-force-race-based-data.
This file ships the R-side equivalents of the Python
morie.arsau_datasets module:
ARSAU_REGISTRY(): returns the registered (year x kind)
entries as a list of lists.
morie_arsau_load_main_records(),
morie_arsau_load_individual_records(),
morie_arsau_load_probe_cycle_records(),
morie_arsau_load_weapon_records(),
morie_arsau_load_aggregate_summary(),
morie_arsau_load_detailed_dataset(): per-record-type
loaders, returning a named list with data,
schema, sidecar, year, kind,
language, is_valid, n_rows,
n_cols, interpretation.
morie_arsau_available_years(),
morie_arsau_available_datasets(),
morie_arsau_describe(): discovery callables.
No path on the maintainer's workstation is hard-coded. All file
resolution goes through .morie_resolve_arsau_dir (defined
below), which honours, in order:
1. an explicit data_dir = argument
2. the MORIE_ARSAU_DIR environment variable
3. MORIE_DATA_DIR/arsau
4. morie_cache_dir("arsau") (only if already populated by a previous morie_arsau_download() call; never auto-created at read-time, per CRAN policy)
5. system.file("extdata", "arsau", package = "rmorie"), the bundled tiny fixture for unit tests and tutorials
6. otherwise, stop with a remediation paragraph
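The lookup order above can be sketched in base R as follows. This is only an illustration, not the package's `.morie_resolve_arsau_dir`: the name `resolve_arsau_dir_sketch` and its `cache_dir` argument are hypothetical, the cache location is supplied by the caller instead of coming from `morie_cache_dir()`, and the sketch assumes that a candidate which does not exist on disk is skipped (the real resolver may instead error on a bad explicit `data_dir`).

```r
# Hypothetical sketch of an ordered directory lookup; not the rmorie internals.
resolve_arsau_dir_sketch <- function(data_dir = NULL, cache_dir = NULL) {
  env_arsau <- Sys.getenv("MORIE_ARSAU_DIR", unset = "")
  env_data  <- Sys.getenv("MORIE_DATA_DIR", unset = "")
  candidates <- c(
    if (!is.null(data_dir)) data_dir,                    # 1. explicit argument
    if (nzchar(env_arsau)) env_arsau,                    # 2. MORIE_ARSAU_DIR
    if (nzchar(env_data)) file.path(env_data, "arsau"),  # 3. MORIE_DATA_DIR/arsau
    if (!is.null(cache_dir)) cache_dir,                  # 4. existing cache (never created here)
    system.file("extdata", "arsau", package = "rmorie")  # 5. bundled fixture ("" if absent)
  )
  for (path in candidates) {
    # Assumption: a missing directory falls through to the next candidate.
    if (nzchar(path) && dir.exists(path)) return(path)
  }
  # 6. nothing found: fail with remediation advice
  stop("No ARSAU data directory found. Pass data_dir =, set MORIE_ARSAU_DIR ",
       "or MORIE_DATA_DIR, or download the data first.", call. = FALSE)
}
```

Because the cache directory is only ever tested with `dir.exists()`, nothing is created on disk at read time, which is the property the list above attributes to CRAN policy.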
The 2023 release ships uof_weapon_records_invaliddata.csv,
flagged by the ministry as non-compliant.
morie_arsau_load_weapon_records(2023) signals an error unless
the caller passes allow_invalid = TRUE; when allowed, the
returned object's is_valid field is FALSE and its
warnings list opens with an explicit caveat paragraph.
R counterpart of morie.arsau_analyze.
Each public callable in this file loads one ARSAU dataset via the
morie_arsau_load_* loaders defined in R/arsau.R and
chains the jurisdiction-agnostic MRM Use-of-Force primitives from
R/mrm_uof.R over it, returning a single named-list result
(classed c("morie_arsau_result", "morie_rich_result", "list"))
that bundles the loaded data, every sub-analysis, and a
multi-paragraph natural-language interpretation.
These analyzers do NOT invent new statistical methods. They wire
the generic mrm_uof_* callables against the column names
that the Ontario open-data release actually publishes. If the
upstream schema changes, the generic callables in
R/mrm_uof.R continue to work; only the column-name constants
below need patching.
Public callables:
morie_arsau_analyze_main_records
(per year, 2023 / 2024)
morie_arsau_analyze_individual_records
(per year, 2023 / 2024)
morie_arsau_analyze_probe_cycle_records
(per year, 2023 / 2024)
morie_arsau_analyze_weapon_records
(per year, 2023 / 2024; 2023 requires
allow_invalid = TRUE)
morie_arsau_analyze_aggregate_summary
(2020-2022)
morie_arsau_analyze_detailed_dataset
(2020-2022)
Every analyzer returns a list whose named slots (data,
sidecar, force_concentration, data_quality,
disparity_by_race, ...) hold the constituent sub-results,
so callers can drill into individual tests without re-running the
full pipeline.
Ontario Ministry of the Solicitor General. Annual Report on Special and Adaptive Units / Data on Police Use of Force in Ontario: 2020-2022, 2023, and 2024 releases. https://data.ontario.ca/dataset/police-use-of-force-race-based-data. Technical notes accompanying each annual release describe the data-quality reasons for the 2023 weapon-records invalidity flag.
R counterpart of morie.arsau_datasets.
The main R-side loaders + registry list-of-lists already live in
R/arsau.R (morie_arsau_load_main_records(),
morie_arsau_load_individual_records(),
morie_arsau_load_probe_cycle_records(),
morie_arsau_load_weapon_records(),
morie_arsau_load_aggregate_summary(),
morie_arsau_load_detailed_dataset(), plus
ARSAU_REGISTRY(), ARSAU_YEARS(), ARSAU_KINDS(),
morie_arsau_read_sidecar(), morie_arsau_available_*(),
and morie_arsau_describe()). This file does NOT duplicate
those. It adds the remaining surface from the Python source:
morie_arsau_registry_df: the registry as a
tidy data.frame (one row per published file), the
canonical R equivalent of ARSAU_REGISTRY in Python.
morie_arsau_sidecar_schema: simplified
[name, type, notes] extraction from a CKAN sidecar
(mirrors morie.arsau_datasets.sidecar_schema).
morie_arsau_sidecar_to_frame: convert the
array-of-arrays records body of a CKAN sidecar to a
data.frame keyed by fields[].id (mirrors
morie.arsau_datasets.sidecar_to_frame).
morie_arsau_read_xlsx_dictionary: read an
Ontario-Catalogue XLSX data-dictionary sidecar
(requires readxl; gated with requireNamespace).
morie_arsau_read_markdown_dictionary: read an
Ontario-Catalogue Markdown data-dictionary sidecar with no
extra dependencies.
morie_arsau_ckan_url +
morie_arsau_fetch_sidecar: build the upstream
CKAN datastore_search URL and (optionally) fetch
the sidecar JSON via httr2.
Ontario Ministry of the Solicitor General. Data on Police Use
of Force in Ontario, 2020-2022 / 2023 / 2024 releases. Published
on the Ontario Data Catalogue:
https://data.ontario.ca/dataset/police-use-of-force-race-based-data.
CKAN datastore_search endpoint:
datastore_search (https://data.ontario.ca/).
Each annual release ships per-resource technical notes; the 2023
weapon_records release is explicitly flagged as containing
non-compliant data and the open-data file is renamed accordingly.
Each entry is itself a named list with year_or_range,
kind, csv_filename, sidecar_filename, expected
row / column counts, is_valid, and bilingual descriptions.
ARSAU_REGISTRY()
Named list-of-lists.
Known ARSAU year/range keys.
ARSAU_YEARS()
Comprehensive calibration assessment for binary outcomes
assess_calibration(y_true, y_pred, n_groups = 10L)
y_true |
Integer 0/1 vector. |
y_pred |
Predicted probabilities. |
n_groups |
Hosmer-Lemeshow groups. |
Discrimination assessment for binary classifier
assess_discrimination(y_true, y_pred, y_pred_ref = NULL, n_bootstrap = 1000L, confidence = 0.95, random_state = 42L)
y_true |
Integer 0/1 vector. |
y_pred |
Predicted probabilities. |
y_pred_ref |
Optional reference-model probabilities for NRI/IDI. |
n_bootstrap |
Bootstrap reps for AUC CI. |
confidence |
Confidence level. |
random_state |
Seed. |
R counterpart of morie.audit_variables. Walks every column
in every OTIS and ARSAU dataset known to the package, classifies
each variable via morie_classify_variable, and
returns a single audit object (or pair of objects, when domain =
"both") summarising coverage, levels of measurement, roles,
cross-year-safety, and recommended methods per variable.
Pure R; no C/C++ hot path is needed because the taxonomy is regex plus
lookup, not CPU-bound. Rcpp would be warranted only if profiling
showed a real bottleneck.
Decision logic:
If y is NULL, one-sample t-test against zero.
If paired = TRUE, paired t-test if differences are
normal, otherwise Wilcoxon signed-rank.
If two independent samples, check both-normal (Shapiro-Wilk for n<=5000, otherwise D'Agostino-Pearson); if both normal, run Student's or Welch's t depending on Levene's test; otherwise Mann-Whitney U.
auto_test(x, y = NULL, paired = FALSE, confidence = 0.95)
x |
Numeric vector. |
y |
Optional second sample. |
paired |
Whether samples are paired. |
confidence |
Confidence level. |
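The two-independent-samples branch of the decision logic described for auto_test can be sketched with base R alone. The helper name `choose_two_sample_test` is hypothetical and this is not the package's implementation. It assumes both samples have between 3 and 5000 observations (so Shapiro-Wilk applies and the D'Agostino-Pearson branch is not needed), a 0.05 threshold for the preliminary tests, and the median-centred (Brown-Forsythe) variant of Levene's test, computed as a one-way ANOVA on absolute deviations.

```r
# Hypothetical helper illustrating the decision tree; not part of rmorie.
choose_two_sample_test <- function(x, y, alpha = 0.05) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  # Step 1: normality of both samples (Shapiro-Wilk, valid for 3 <= n <= 5000).
  both_normal <- stats::shapiro.test(x)$p.value > alpha &&
    stats::shapiro.test(y)$p.value > alpha
  if (!both_normal) {
    return(list(test = "Mann-Whitney U", result = stats::wilcox.test(x, y)))
  }
  # Step 2: Levene-type test (Brown-Forsythe variant) as an ANOVA on
  # absolute deviations from each group's median.
  values <- c(x, y)
  group <- factor(rep(c("x", "y"), c(length(x), length(y))))
  abs_dev <- abs(values - stats::ave(values, group, FUN = stats::median))
  levene_p <- stats::anova(stats::lm(abs_dev ~ group))[["Pr(>F)"]][1]
  equal_var <- levene_p > alpha
  # Step 3: Student's t if variances look equal, Welch's t otherwise.
  list(
    test = if (equal_var) "Student t" else "Welch t",
    result = stats::t.test(x, y, var.equal = equal_var)
  )
}
```

Returning both the chosen test's name and the `htest` object keeps the selection step auditable, which matters because pre-testing changes the operating characteristics of the final test.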
Bartlett's test for equality of variances
bartlett_test(...)
... |
Two or more numeric vectors. |
Thin wrapper over stats::p.adjust(method = "BH").
benjamini_hochberg(p_values, alpha = 0.05, labels = NULL)
bh(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
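For orientation, the underlying base-R call that this wrapper is described as delegating to looks like the sketch below. The example p-values are made up, and the rejection rule `p_adj <= alpha` is an assumption about how a wrapper would typically use `alpha`; the structure of the morie result object is not reproduced here.

```r
# Benjamini-Hochberg adjustment with base R; illustrative values only.
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.2)
alpha <- 0.05

p_adj <- stats::p.adjust(p_raw, method = "BH")
rejected <- p_adj <= alpha  # assumed decision rule

data.frame(p_raw = p_raw, p_adj = p_adj, rejected = rejected)
```

Note that the third raw p-value (0.039) is below 0.05 but is not rejected after adjustment, which is the behaviour the FDR correction is meant to produce.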
Thin wrapper over stats::p.adjust(method = "BY").
benjamini_yekutieli(p_values, alpha = 0.05, labels = NULL)
by_fdr(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
Bias-adjusted treatment effect (Ding & VanderWeele 2016)
bias_adjusted_estimate(estimate, se, rr_ud, rr_eu, prevalence_confounder = 0.5)
estimate |
Observed treatment effect on the log-RR / coefficient scale. |
se |
Standard error. |
rr_ud |
RR linking confounder to outcome. |
rr_eu |
RR linking treatment to confounder. |
prevalence_confounder |
Confounder prevalence. Default 0.5. |
Named list with adjusted_estimate, bias,
adjusted_ci_lower, adjusted_ci_upper, original_estimate.
Resamples blocks of consecutive observations. Delegates to
boot::tsboot when boot is installed (fixed and
stationary / geometric blocks); a circular-block path is
implemented inline because boot::tsboot does not expose a
circular-block sim mode directly. Falls back to a pure-R loop
otherwise.
block_bootstrap(data, statistic, block_size, n_boot = 2000L, ci_level = 0.95, method = "circular", seed = 42L)
data |
Numeric vector or matrix. |
statistic |
Function returning a scalar. |
block_size |
Integer block length. |
n_boot |
Number of replicates. |
ci_level |
Confidence level. |
method |
One of |
seed |
Random seed. |
A morie_bootstrap_result.
See also: boot::tsboot.
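A single circular-block resample, the case described above as implemented inline, can be sketched as follows. The function name `circular_block_resample` is hypothetical and the code is a generic textbook version of the technique, not the rmorie source: block start positions are drawn uniformly, and indices wrap around the end of the series so every observation has the same chance of being selected.

```r
# Hypothetical sketch of one circular-block bootstrap resample.
circular_block_resample <- function(x, block_size) {
  n <- length(x)
  n_blocks <- ceiling(n / block_size)
  starts <- sample.int(n, n_blocks, replace = TRUE)
  offsets <- seq_len(block_size) - 1L
  # Wrap indices past n back to the start of the series (the "circular" part).
  idx <- unlist(lapply(starts, function(s) ((s - 1L + offsets) %% n) + 1L))
  x[idx[seq_len(n)]]  # trim to the original length
}

set.seed(42)
x <- 1:20
resampled <- circular_block_resample(x, block_size = 5)
```

Repeating this resample many times and applying a statistic to each replicate gives the bootstrap distribution from which a confidence interval would be computed.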
Thin wrapper over stats::p.adjust(method = "bonferroni").
bonferroni(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
A morie_rich_result list (see
morie_multiple_testing).
Resamples observations with replacement and computes a confidence
interval via the percentile, normal, basic, BCa, or studentized
method. Delegates the resampling loop and CI computation to
boot::boot and boot::boot.ci when the boot
package is installed. Falls back to an inline implementation
otherwise so the wrapper keeps working on minimal installs.
Stratified and cluster resamples are supported in both arms.
bootstrap(data, statistic, n_boot = 2000L, ci_level = 0.95, ci_method = "bca", seed = 42L, stratify = NULL, cluster = NULL)
data |
A numeric vector or matrix of observations. |
statistic |
Function of one argument that returns a scalar. |
n_boot |
Number of bootstrap replicates (default 2000). |
ci_level |
Confidence level (default 0.95). |
ci_method |
One of |
seed |
Random seed. |
stratify |
Optional vector of stratum labels (length n). |
cluster |
Optional vector of cluster labels (length n). |
A morie_bootstrap_result list.
See also: boot::boot, boot::boot.ci,
morie_boot_run(), morie_boot_basic_ci().
Computes apparent error, mean OOB bootstrap error, the .632
estimator (Efron 1983), and the .632+ no-information-adjusted
estimator (Efron and Tibshirani 1997). ipred::errorest(...,
estimator = "632plus") implements the same family in
ipred; it is cross-referenced for users who already work
with ipred's predict.\<learner\> ecosystem. The inline
implementation is retained because rmorie's API takes naked
model_fn / score_fn callables and is consumed by
downstream MRM code.
bootstrap_632(X, y, model_fn, score_fn, n_boot = 200L, seed = 42L)
X |
Numeric design matrix (n x p). |
y |
Numeric response (length n). |
model_fn |
Function |
score_fn |
Function |
n_boot |
Number of bootstrap replicates. |
seed |
Random seed. |
Named numeric list with apparent_error, bootstrap_error, error_632, error_632plus.
See also: ipred::errorest.
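How the .632 and .632+ estimators combine their ingredients can be shown with a small sketch. The function `combine_632` is hypothetical; it takes the apparent error, the mean out-of-bag error, and the no-information error rate `gamma` as already-computed numbers and applies the weighting from Efron (1983) and Efron and Tibshirani (1997). How rmorie computes those three inputs from `model_fn` and `score_fn` is not shown here.

```r
# Hypothetical sketch: combine error components into .632 and .632+ estimates.
combine_632 <- function(apparent, oob, gamma) {
  # .632 estimator: fixed weights on apparent and out-of-bag error.
  err_632 <- 0.368 * apparent + 0.632 * oob

  # .632+ estimator: the weight grows with the relative overfitting rate r.
  oob_capped <- min(oob, gamma)
  r <- if (oob_capped > apparent && gamma > apparent) {
    (oob_capped - apparent) / (gamma - apparent)
  } else {
    0
  }
  w <- 0.632 / (1 - 0.368 * r)
  err_632plus <- (1 - w) * apparent + w * oob_capped

  c(error_632 = err_632, error_632plus = err_632plus)
}

res <- combine_632(apparent = 0.1, oob = 0.3, gamma = 0.5)
```

With no overfitting (r = 0) the weight reduces to 0.632 and both estimators coincide; as r approaches 1 the .632+ estimate moves toward the out-of-bag error.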
Generic bootstrap CI wrapper for any effect-size function
bootstrap_effect_size_ci(func, ..., n_boot = 2000L, confidence = 0.95, seed = 42L)
func |
A function taking one or more numeric vectors, returning a scalar. |
... |
Numeric vectors to bootstrap, in |
n_boot |
Bootstrap replicates (default 2000). |
confidence |
Confidence level. Default 0.95. |
seed |
RNG seed. Default 42. |
A morie_effect_size.
Bootstrap .632 / .632+ validation
bootstrap_validate(fit_fn, predict_fn, X, y, n_bootstraps = 200L, scoring = "roc_auc", method = "632plus", random_state = 42L)
fit_fn |
A function |
predict_fn |
A function |
X |
Matrix or data frame of features. |
y |
Vector of targets. |
n_bootstraps |
Number of bootstrap replicates. |
scoring |
"roc_auc", "accuracy", "brier". |
method |
"632" or "632plus". |
random_state |
Seed. |
Zero-phase Butterworth bandpass filter. Isolates a frequency band of interest (e.g., 0.5–40 Hz for EEG, 25–400 Hz for phonocardiogram).
buttbp(x, fs, low, high, order = 4L)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). |
low |
Lower cutoff (Hz). |
high |
Upper cutoff (Hz). |
order |
Filter order (default 4). |
List with filtered (numeric vector), fs, order, name.
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  t <- seq(0, 1, length.out = 1000)
  # 2 Hz drift + 10 Hz band of interest + 60 Hz noise
  x <- sin(2 * pi * 2 * t) + sin(2 * pi * 10 * t) + 0.3 * sin(2 * pi * 60 * t)
  y <- buttbp(x, fs = 1000, low = 5, high = 20)
  length(y$filtered)
}
Zero-phase Butterworth bandstop filter. Default 59–61 Hz removes North- American AC mains hum (60 Hz); use 49–51 Hz for European mains.
buttbs(x, fs, low = 59, high = 61, order = 4L)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). |
low |
Lower cutoff (Hz, default 59). |
high |
Upper cutoff (Hz, default 61). |
order |
Filter order (default 4). |
List with filtered (numeric vector), fs, order, name.
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  t <- seq(0, 1, length.out = 1000)
  x <- sin(2 * pi * 10 * t) + sin(2 * pi * 60 * t)
  y <- buttbs(x, fs = 1000)  # remove 60 Hz mains
  length(y$filtered)
}
Zero-phase Butterworth highpass filter. Removes low-frequency drift while preserving higher-frequency content; useful for de-trending physiological signals (EEG, ECG) prior to analysis.
butthp(x, fs, cutoff, order = 4L)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). |
cutoff |
Cutoff frequency (Hz). |
order |
Filter order (default 4). |
List with filtered (numeric vector), fs, order, name.
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  t <- seq(0, 1, length.out = 500)
  x <- 5 * t + sin(2 * pi * 10 * t)  # linear drift + 10 Hz signal
  y <- butthp(x, fs = 500, cutoff = 1)
  length(y$filtered)
}
Zero-phase Butterworth lowpass filter via the suggested signal package's
butter() + filtfilt(). Useful for removing high-frequency noise from
biological or geophysical time series.
buttlp(x, fs, cutoff, order = 4L)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). |
cutoff |
Cutoff frequency (Hz). |
order |
Filter order (default 4). |
List with filtered (numeric vector), fs, order, name.
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  t <- seq(0, 1, length.out = 500)
  x <- sin(2 * pi * 5 * t) + 0.5 * sin(2 * pi * 60 * t)  # 5 Hz + 60 Hz
  y <- buttlp(x, fs = 500, cutoff = 20)
  length(y$filtered)  # 500
}
Robust to arbitrary correlation structure. No clean CRAN equivalent (the ACAT package is GitHub-only); kept as an in-house implementation.
cauchy_combination(p_values, weights = NULL)
p_values |
Numeric vector of raw p-values. |
weights |
Optional non-negative weights summing to 1. |
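The Cauchy combination (ACAT) rule itself is short enough to sketch in base R. The function name `cauchy_combine_sketch` is hypothetical, and defaulting to equal weights when `weights` is NULL is an assumption about the package's behaviour. Each p-value is mapped to a standard Cauchy variate, the weighted sum is formed, and the combined p-value is the upper tail of the standard Cauchy distribution at that sum.

```r
# Hypothetical sketch of the Cauchy combination test.
cauchy_combine_sketch <- function(p_values, weights = NULL) {
  if (is.null(weights)) {
    weights <- rep(1 / length(p_values), length(p_values))  # assumed default
  }
  stopifnot(
    length(weights) == length(p_values),
    all(weights >= 0),
    abs(sum(weights) - 1) < 1e-8
  )
  # Transform each p-value to a Cauchy variate and take the weighted sum.
  t_stat <- sum(weights * tan((0.5 - p_values) * pi))
  # Upper tail of the standard Cauchy: 0.5 - atan(t_stat) / pi.
  stats::pcauchy(t_stat, lower.tail = FALSE)
}

cauchy_combine_sketch(c(0.01, 0.20, 0.45))
```

P-values extremely close to 0 or 1 make `tan()` numerically unstable, so a production implementation would usually special-case them; this sketch does not.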
CCRSO Table 1 — 5-year average annual conditional releases
CCRSO_TABLE1_RELEASES
An object of class data.frame with 3 rows and 10 columns.
CCRSO Table 2 — prisoner flow 2013/14-2017/18
CCRSO_TABLE2_FLOW
An object of class data.frame with 5 rows and 6 columns.
CCRSO/StatsCan Table 3 — 2018 age distribution
CCRSO_TABLE3_AGE
An object of class data.frame with 3 rows and 7 columns.
Real cepstrum, c[n] = Re(IFFT(log|FFT(x)|)). Useful
for pitch-period estimation and any analysis where the multiplicative
magnitude structure of the spectrum is best handled additively in the
quefrency domain.
cepst(x, n_fft = NULL)
x |
Numeric vector (1-D signal). |
n_fft |
FFT length (default: next power of 2 |
Reference: Rangayyan, R.M. (2015) Biomedical Signal Analysis, 2nd ed., Wiley/IEEE Press, chapter on cepstral analysis.
List with filtered (real cepstral coefficients),
name, fs, n_samples, and extra (quefrency, n_fft).
set.seed(1)
x <- sin(2 * pi * 5 * seq(0, 1, length.out = 512))
res <- cepst(x)
length(res$filtered)
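The real cepstrum can also be computed directly with `stats::fft`, which may help when checking results by hand. The function `real_cepstrum_sketch` below is hypothetical and is not the cepst source. It assumes that the default FFT length is the smallest power of two at or above the signal length, and it adds machine epsilon before taking the logarithm to avoid log(0); both choices are assumptions of this sketch.

```r
# Hypothetical sketch: real cepstrum as the inverse FFT of the log magnitude.
real_cepstrum_sketch <- function(x, n_fft = NULL) {
  if (is.null(n_fft)) n_fft <- 2^ceiling(log2(length(x)))  # assumed default
  # Zero-pad (or truncate) the signal to n_fft samples.
  padded <- c(x, rep(0, max(0, n_fft - length(x))))[seq_len(n_fft)]
  log_mag <- log(abs(stats::fft(padded)) + .Machine$double.eps)
  # R's inverse FFT is unnormalised, so divide by n_fft.
  Re(stats::fft(log_mag, inverse = TRUE)) / n_fft
}

x_demo <- c(1, 0.5, 0.25, 0.125, 0, 0, 0, 0)
cep <- real_cepstrum_sketch(x_demo)
```

Scaling the input by a constant only shifts the zero-quefrency coefficient by the log of that constant, which is the additive behaviour the description above refers to.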
Mirrors the morie cheatsheet CLI subcommand: a one-screen
reference of install / learn / run / pull / ingest / help commands.
cheatsheet()
Invisibly returns a character scalar of the cheatsheet. Called for its side effect of printing to the console.
cheatsheet()
Check referential integrity (child FK -> parent PK)
check_referential_integrity(child, parent, child_key, parent_key)
child |
Data frame with foreign key. |
parent |
Data frame with primary key. |
child_key, parent_key
|
Column names. |
Chi-squared goodness-of-fit test
chi2_goodness_of_fit(observed, expected = NULL)
observed |
Observed counts. |
expected |
Expected counts or NULL for uniform. |
Chi-squared test of independence
chi2_independence(contingency_table, correction = TRUE)
contingency_table |
A matrix or table of counts. |
correction |
Yates's continuity correction (2x2). |
Package IDs and metadata URLs for accessing CPADS, CSADS, and CSUS datasets via the Canadian Open Data CKAN API.
ckan_metadata
A data.frame with columns:
Survey abbreviation: cpads, csads, csus
Full survey name
CKAN package UUID
URL to retrieve full package metadata
data(ckan_metadata)
ckan_metadata$metadata_url
Estimates P(X > Y) for randomly drawn observations from each group.
cles(x, y, confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
Cliff's delta
cliffs_delta(x, y, confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
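Both the common-language effect size and Cliff's delta are functions of the pairwise comparisons between the two samples, so their point estimates can be sketched in a few lines of base R. The variable names are illustrative, and how the package treats ties or builds its confidence intervals is not covered; here ties simply count toward neither P(X > Y) nor P(X < Y).

```r
# Point estimates from all pairwise comparisons; illustrative data.
x <- c(3, 4, 5)
y <- c(1, 2, 3)

prop_greater <- mean(outer(x, y, ">"))  # estimate of P(X > Y)
prop_less    <- mean(outer(x, y, "<"))  # estimate of P(X < Y)

cles_hat  <- prop_greater              # common-language effect size
delta_hat <- prop_greater - prop_less  # Cliff's delta
```

In this toy example eight of the nine pairs have x larger than y and one pair is tied, so both estimates equal 8/9.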
Cochran's Q test
cochrans_q(...)
... |
Three or more matched binary 0/1 vectors. |
cmprsk::crr does not ship a coef.crr method, so a bare
stats::coef() on a crr fit falls through to coef.default()
and returns NULL. morie registers this method (via S3method
in NAMESPACE, generated by roxygen @exportS3Method) so the
standard accessor returns the fitted coefficient vector for any
caller – not just morie_survival_finegray.
## S3 method for class 'crr'
coef(object, ...)
object |
A |
... |
Ignored. |
Named numeric vector of regression coefficients.
Coefficient of variation
coefficient_of_variation(x)
x |
Numeric vector. |
A morie_effect_size.
Cohen's f from eta-squared
cohens_f(eta2)
eta2 |
Eta-squared value. |
A morie_effect_size.
Cohen's kappa for two raters
cohens_kappa(rater1, rater2, confidence = 0.95)
rater1, rater2
|
Equal-length categorical vectors. |
confidence |
Confidence level. |
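The kappa point estimate compares observed agreement with the agreement expected from the raters' marginal distributions. The sketch below uses a hypothetical function name, `kappa_sketch`, and omits the confidence interval that cohens_kappa reports.

```r
# Hypothetical sketch of Cohen's kappa (point estimate only).
kappa_sketch <- function(rater1, rater2) {
  # Use a shared set of levels so the cross-table is square.
  lev <- sort(unique(c(as.character(rater1), as.character(rater2))))
  tab <- table(factor(rater1, levels = lev), factor(rater2, levels = lev))
  prop <- tab / sum(tab)
  p_obs <- sum(diag(prop))                     # observed agreement
  p_exp <- sum(rowSums(prop) * colSums(prop))  # chance agreement
  (p_obs - p_exp) / (1 - p_exp)
}

kappa_sketch(c("a", "a", "b", "b"), c("a", "b", "b", "b"))
```

Here the raters agree on three of four items (0.75) while chance agreement is 0.5, giving kappa = 0.5.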
Cohen's w for chi-squared
cohens_w(observed, expected = NULL)
observed |
Observed frequencies (numeric vector). |
expected |
Expected frequencies (or NULL for uniform). |
A morie_effect_size.
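Cohen's w is the square root of the chi-squared statistic divided by the sample size, or equivalently a distance between observed and expected proportions. A minimal base-R sketch, with the hypothetical name `cohens_w_sketch` and without the surrounding morie_effect_size object, is:

```r
# Hypothetical sketch of Cohen's w for a one-way table of counts.
cohens_w_sketch <- function(observed, expected = NULL) {
  if (is.null(expected)) {
    # Uniform expectation when none is supplied.
    expected <- rep(sum(observed) / length(observed), length(observed))
  }
  p_obs <- observed / sum(observed)
  p_exp <- expected / sum(expected)
  sqrt(sum((p_obs - p_exp)^2 / p_exp))
}

w <- cohens_w_sketch(c(30, 20))
```

For counts of 30 and 20 against a uniform expectation the chi-squared statistic is 2 with n = 50, so w = sqrt(2 / 50) = 0.2.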
Comprehensive multicollinearity diagnostics
collinearity_diagnostics(X, column_names = NULL)
X |
Design matrix. |
column_names |
Optional column names. |
A morie_collinearity_diagnostics list.
Construct a column rule
column_rule(name, dtype = NULL, required = TRUE, nullable = TRUE, null_threshold = 1, min_val = NULL, max_val = NULL, allowed_values = NULL, unique = FALSE, regex = NULL, custom = NULL)
name |
Column name. |
dtype |
One of "numeric", "character"/"object", "datetime", or NULL. |
required |
Whether the column must be present. |
nullable |
Whether NA values are allowed. |
null_threshold |
Maximum allowed fraction of NA (0–1). |
min_val, max_val
|
Numeric bounds (or NULL). |
allowed_values |
Vector of permitted values (or NULL). |
unique |
Logical; whether values must be unique. |
regex |
Regex pattern for string columns (or NULL). |
custom |
Optional function |
A column_rule list.
Commands grouped by category
commands_by_category()
Named list of character vectors of command names.
R^2 and adjusted R^2 for linear models; McFadden pseudo-R^2, deviance and Pearson chi-squared for logistic / Poisson; AIC, BIC, log-likelihood, and the omnibus F-test for linear models.
compute_goodness_of_fit(y, y_hat, X, model_type = "linear", log_likelihood = NULL)
y |
Response vector. |
y_hat |
Fitted values. |
X |
Design matrix. |
model_type |
|
log_likelihood |
Optional precomputed log-likelihood. |
A morie_goodness_of_fit list.
Hat-matrix diagonal, Cook's distance (stats::cooks.distance
is preferred for fitted lms; this function works straight
from X, y, and a fitted y_hat), DFFITS,
DFBETAS, and COVRATIO.
compute_influence(y, X, y_hat = NULL)
y |
Response vector. |
X |
Design matrix. |
y_hat |
Optional fitted values (OLS is used if NULL). |
A morie_influence_diagnostics list.
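Two of the listed diagnostics, the hat-matrix diagonal and Cook's distance, can be computed straight from X and y as sketched below. The function `influence_sketch` is hypothetical. It assumes that X is supplied without an intercept column and that an intercept should be added, which may differ from what compute_influence expects; DFFITS, DFBETAS, and COVRATIO are left out.

```r
# Hypothetical sketch: leverage and Cook's distance from X and y via QR.
influence_sketch <- function(y, X) {
  X1 <- cbind(1, as.matrix(X))       # assumed: add an intercept column
  qrx <- qr(X1)
  h <- rowSums(qr.Q(qrx)^2)          # hat-matrix diagonal (leverage)
  res <- qr.resid(qrx, y)            # OLS residuals
  p <- ncol(X1)
  s2 <- sum(res^2) / (length(y) - p) # residual variance
  cooks <- res^2 * h / (p * s2 * (1 - h)^2)
  list(hat = h, cooks_distance = cooks)
}

x_demo <- c(1, 2, 3, 4, 5, 6, 7, 20)
y_demo <- c(1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 30)
out <- influence_sketch(y_demo, x_demo)
```

The results should agree with `stats::hatvalues()` and `stats::cooks.distance()` applied to `lm(y_demo ~ x_demo)`, which is a convenient cross-check.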
Returns raw / standardised / externally studentised residuals along with normality, heteroskedasticity (Breusch-Pagan), and autocorrelation (Durbin-Watson) tests. Optionally also returns deviance and Pearson residuals for logistic / Poisson GLMs.
compute_residuals(y, y_hat, X, model_type = "linear")
y |
Observed response. |
y_hat |
Fitted values. |
X |
Design matrix. |
model_type |
|
A morie_residual_diagnostics list.
For each column j of X, regresses column j on the remaining columns (plus an intercept) and returns 1/(1 - R^2).
compute_vif(X, column_names = NULL)
X |
Design matrix (without intercept). |
column_names |
Optional character vector of names. |
Named numeric vector of VIFs.
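The column-by-column regression described above translates almost directly into base R. The function name `vif_sketch` is hypothetical and the code is an illustration of the 1/(1 - R^2) definition, not the compute_vif source; `lm()` supplies the intercept, so X is passed without one.

```r
# Hypothetical sketch: VIF for each column as 1 / (1 - R^2).
vif_sketch <- function(X) {
  X <- as.matrix(X)
  vif <- vapply(seq_len(ncol(X)), function(j) {
    # Regress column j on all remaining columns (lm adds the intercept).
    fit <- stats::lm(X[, j] ~ X[, -j, drop = FALSE])
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
  names(vif) <- colnames(X)
  vif
}

# Orthogonal columns give VIF = 1; nearly collinear columns give large VIFs.
X_orth <- cbind(a = c(1, -1, 1, -1), b = c(1, 1, -1, -1))
X_coll <- cbind(x1 = 1:6, x2 = 1:6 + c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1))
vif_orth <- vif_sketch(X_orth)
vif_coll <- vif_sketch(X_coll)
```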
Pairwise correlation matrix with p-values
correlation_matrix(data, method = "pearson")
data |
Data frame; numeric columns are used. |
method |
One of "pearson", "spearman", "kendall". |
List with components r (correlations) and p
(p-values), both data.frame objects with matching dimensions.
Pairwise correlation matrix with significance stars
correlation_table(data, method = "pearson", show_stars = TRUE, mask_diagonal = TRUE, digits = 3L, output_format = "dataframe", title = "Correlation Matrix")
data |
Data frame. |
method |
"pearson", "spearman", "kendall". |
show_stars |
Annotate cells with significance stars. |
mask_diagonal |
Replace diagonal with "-". |
digits |
Decimal places. |
output_format |
Output target. |
title |
Title. |
Create a manifest of the environment for reproducibility
create_reproducibility_manifest(data, parameters = NULL, seeds = NULL)
data |
Data frame (used for a SHA-256 checksum). |
parameters |
Optional list of analysis parameters. |
seeds |
Optional named list of random seeds. |
Cross-validate a model using a user-supplied fit/predict pair
cross_validate(fit_fn, predict_fn, X, y, method = "stratified_kfold", n_folds = 5L, n_repeats = 10L, scoring = "roc_auc", groups = NULL, confidence = 0.95, random_state = 42L)
fit_fn |
A function |
predict_fn |
A function |
X |
Matrix or data frame of features. |
y |
Vector of targets. |
method |
Resampling strategy: "kfold", "stratified_kfold", "grouped_kfold", "loo", "monte_carlo", "time_series". |
n_folds |
Number of folds. |
n_repeats |
Repeats for monte_carlo. |
scoring |
"roc_auc", "accuracy", "brier". |
groups |
Group labels for grouped_kfold. |
confidence |
Confidence level for the score CI. |
random_state |
Seed. |
Uses the Kraemer & Kupfer (2006) approximation.
d_to_nnt(d, base_rate = 0.5)
d |
Cohen's d. |
base_rate |
Control event rate (default 0.5). |
Numeric NNT.
Convert Cohen's d to odds ratio
d_to_or(d)
d |
Cohen's d. |
Numeric OR.
Convert Cohen's d to Pearson r
d_to_r(d, n1 = NULL, n2 = NULL)
d |
Cohen's d. |
n1, n2
|
Sample sizes (or NULL). |
Numeric r.
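The manual does not state which formulas d_to_or and d_to_r use. The sketch below shows the conversions most commonly given in meta-analysis texts, under hypothetical function names: the logistic-distribution conversion for the odds ratio, and the point-biserial conversion for r with a sample-size correction term. Treat the correspondence with rmorie's output as an assumption to be checked.

```r
# Hypothetical sketches of common effect-size conversions.

# Cohen's d to odds ratio, assuming a logistic latent distribution:
# log(OR) = d * pi / sqrt(3).
d_to_or_sketch <- function(d) exp(d * pi / sqrt(3))

# Cohen's d to r: r = d / sqrt(d^2 + a), where a = (n1 + n2)^2 / (n1 * n2)
# and a = 4 when group sizes are equal or unknown.
d_to_r_sketch <- function(d, n1 = NULL, n2 = NULL) {
  a <- if (is.null(n1) || is.null(n2)) 4 else (n1 + n2)^2 / (n1 * n2)
  d / sqrt(d^2 + a)
}

d_to_or_sketch(0.5)
d_to_r_sketch(0.5, n1 = 30, n2 = 60)
```

Unequal group sizes increase the correction term `a` above 4, which pulls r toward zero for the same d.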
API stub: implemented via the K2 statistic = Z(skew)^2 + Z(kurt)^2 (D'Agostino & Pearson 1973). Recommended n >= 20.
dagostino_pearson(x)
x |
Numeric vector. |
A data.frame listing all Canadian public health datasets available through the MORIE data management system. Each row describes one dataset with its source, survey, year, format, and access metadata.
dataset_catalog
A data.frame with columns:
Unique catalog key (e.g., "opencanada_cpads_2021")
Human-readable dataset name
Data source: opencanada, healthinfobase, or cihi
Survey abbreviation: cpads, ccs, csads, csus, or indicators
Year or year range (e.g., "2021-2022")
File format: csv or xlsx
Data type: pumf, bootstrap, aggregate, or indicator
Logical; TRUE for bootstrap weight files (>100MB)
Relative path to the local data file
SQLite table name in the DBI cache
CKAN DataStore resource ID (empty if unavailable)
Health Canada, CIHI, Statistics Canada open data portals.
data(dataset_catalog)
head(dataset_catalog)
Decision curve analysis
decision_curve_analysis(y_true, y_pred, thresholds = NULL)
y_true |
Integer 0/1 vector. |
y_pred |
Predicted probabilities. |
thresholds |
Numeric vector of thresholds (defaults to
|
Tests Doob's central thesis that imprisonment is decoupled from crime by computing the Pearson correlation between the two time series and (optionally) running Pettitt change-point on each.
decoupling_test(crime_series, imprisonment_series, years = NULL)
crime_series |
Numeric vector of per-period crime rates. |
imprisonment_series |
Numeric vector of per-period
imprisonment rates, same length as |
years |
Optional integer vector of period labels. |
A morie_result named-list.
Generalised jackknife removing d observations per replicate;
all subsets are enumerated when their number does not exceed
max_subsets, and Monte-Carlo sampled otherwise. The resample
package exposes the equivalent generalised jackknife API via
resample::jackknife (cross-referenced).
delete_d_jackknife(data, statistic, d = 2L, ci_level = 0.95, max_subsets = 5000L, seed = 42L)
data |
Numeric vector or matrix. |
statistic |
Function returning a scalar. |
d |
Number of observations to delete per replicate. |
ci_level |
Confidence level. |
max_subsets |
Maximum subsets to evaluate. |
seed |
Random seed. |
A morie_jackknife_result.
resample::jackknife.
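The exhaustive branch of a delete-d jackknife can be sketched in base R as follows. This is an illustration under stated assumptions, not the rmorie code: the helper name `delete_d_sketch` is hypothetical, only vector input is handled, and the variance uses the standard delete-d form (n - d) / (d * N) times the sum of squared deviations over the N enumerated subsets.

```r
# Hypothetical helper: exhaustive delete-d jackknife for a numeric vector.
delete_d_sketch <- function(x, statistic, d = 2L) {
  n <- length(x)
  # Each column of `drop` holds one set of d indices to delete.
  drop <- utils::combn(n, d)
  reps <- apply(drop, 2, function(idx) statistic(x[-idx]))
  n_sub <- length(reps)
  # Delete-d jackknife variance estimate.
  v <- (n - d) / (d * n_sub) * sum((reps - mean(reps))^2)
  list(estimate = statistic(x), se = sqrt(v), n_subsets = n_sub)
}

set.seed(42)
x <- rnorm(12)
res <- delete_d_sketch(x, mean, d = 2L)
```

For the sample mean this estimator reproduces the usual standard error sd(x) / sqrt(n), which makes it a convenient sanity check.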
Bootstrap optimism-corrected performance
detect_overfitting(fit_fn, predict_fn, X, y, scoring = "roc_auc", n_bootstrap = 200L, random_state = 42L)
fit_fn |
A function |
predict_fn |
A function |
X |
Matrix or data frame of features. |
y |
Vector of targets. |
scoring |
"roc_auc", "accuracy", "brier". |
n_bootstrap |
Integer; number of bootstrap resamples used to estimate the optimism correction (default 200). |
random_state |
Seed. |
Estimates the DFA scaling exponent $\alpha$. White noise gives
$\alpha \approx 0.5$; pink (1/f) noise $\alpha \approx 1.0$;
Brownian motion $\alpha \approx 1.5$.
dfa(x, scales = NULL)
x |
Numeric vector (length |
scales |
Integer vector of window sizes (auto-generated if |
Reference: Peng, C.-K., Havlin, S., Stanley, H.E. & Goldberger, A.L. (1995) "Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series", Chaos 5(1):82–87.
List with value (alpha), name, and extra
(scales, fluctuation).
set.seed(1)
x <- cumsum(rnorm(2048))
res <- dfa(x)
res$value
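For readers who want to see the steps behind the exponent, here is a minimal first-order DFA sketch in base R. It assumes non-overlapping windows, linear detrending, and a fixed set of window sizes; the helper name `dfa_sketch` and the default scales are hypothetical and may differ from what `dfa()` does internally.

```r
# Hypothetical helper: first-order detrended fluctuation analysis.
dfa_sketch <- function(x, scales = c(8L, 16L, 32L, 64L, 128L)) {
  profile <- cumsum(x - mean(x))  # integrated, mean-centred series
  fluct <- vapply(scales, function(s) {
    n_win <- length(profile) %/% s
    t_idx <- seq_len(s)
    rms <- vapply(seq_len(n_win), function(w) {
      seg <- profile[((w - 1L) * s + 1L):(w * s)]
      # Remove a linear trend within the window, keep the residual RMS.
      sqrt(mean(stats::residuals(stats::lm(seg ~ t_idx))^2))
    }, numeric(1))
    sqrt(mean(rms^2))
  }, numeric(1))
  # The slope of log F(s) against log s is the scaling exponent.
  unname(stats::coef(stats::lm(log(fluct) ~ log(scales)))[2])
}

set.seed(1)
alpha_white <- dfa_sketch(rnorm(2048))
```

White noise should land near 0.5, while its cumulative sum (as in the example above) should land near 1.5.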
Replicates the analytical contribution of Prof. Anthony N. Doob's expert-witness affidavit in Canadian Civil Liberties Association et al. v. The Attorney General of Canada (Federal Court file T-539-20, Application Record Vol. 3 of 5, pp. 778-795).
Doob's national-aggregate analyses (Figures 1-4 + Tables 1-3) sit ALONGSIDE the per-row MRM modules on OTIS provincial data and the MRM chi-square family on aggregate contingency tables.
Doob, A. N. (2020). Affidavit (T-539-20) of Anthony Doob — Federal Court of Canada, Application Record Vol. 3 of 5. CCLA et al. v. Attorney General of Canada.
Thin wrapper over EValue::evalues.OLS() when EValue is
installed and an outcome SD is supplied via the sd_y argument
(recommended workflow per VanderWeele & Ding 2017). Without
EValue, or when sd_y is left at its default of 1, falls
back to the closed-form continuous-scale RR proxy used by the
Python port so both ports stay numerically aligned.
e_value(ate, se, null = 0, sd_y = 1)
ate |
Point estimate of the treatment effect. |
se |
Standard error of the ATE (must be > 0). |
null |
Null value. Default 0. |
sd_y |
Outcome standard deviation. Default 1 (use the
closed-form proxy). Pass the empirical sd to route through
|
Scalar E-value (>= 1).
Converts d to an RR scale via the VanderWeele-Ding approximation
RR ~ exp(0.91 * d), then applies e_value_rr().
e_value_d(d, se = NULL, n = NULL)
d |
Standardised mean difference. |
se |
Standard error of d (optional). |
n |
Sample size for SE approximation (optional). |
A morie_evalue named-list.
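The two-step conversion described above can be sketched as follows, assuming the approximation RR ~ exp(0.91 * d) and the closed-form E-value RR + sqrt(RR * (RR - 1)). The helper name `e_value_d_sketch` is hypothetical and returns only the point E-value, not a `morie_evalue` object.

```r
# Hypothetical helper: E-value for a standardised mean difference.
e_value_d_sketch <- function(d) {
  rr <- exp(0.91 * abs(d))   # approximate risk ratio, taken away from the null
  rr + sqrt(rr * (rr - 1))   # closed-form E-value for RR >= 1
}

ev_small <- e_value_d_sketch(0.2)
ev_large <- e_value_d_sketch(0.8)
```

Taking the absolute value of d is a simplification that treats protective and harmful effects symmetrically.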
Uses the HR-to-RR approximation from VanderWeele (2017).
e_value_hr(hr, ci_lower = NULL, ci_upper = NULL)
hr |
Hazard ratio. |
ci_lower, ci_upper
|
Optional 95% CI of HR. |
A morie_evalue named-list.
Uses Zhang & Yu (1998) OR-to-RR correction when prevalence >= 0.15.
e_value_or(odds_ratio, ci_lower = NULL, ci_upper = NULL, prevalence = NULL)
odds_ratio |
Observed odds ratio. |
ci_lower, ci_upper
|
Optional 95% CI. |
prevalence |
Outcome prevalence (optional). |
A morie_evalue named-list.
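A sketch of the Zhang & Yu (1998) conversion, RR = OR / (1 - p0 + p0 * OR). It assumes that `prevalence` plays the role of p0 (the outcome risk in the reference group) and that the odds ratio is used as the risk ratio directly below the 0.15 threshold; both are assumptions about the intended behaviour, and `or_to_rr_sketch` is a hypothetical name.

```r
# Hypothetical helper: odds ratio to approximate risk ratio.
or_to_rr_sketch <- function(or, p0 = NULL, threshold = 0.15) {
  # Rare outcome (or unknown prevalence): treat OR as an approximation of RR.
  if (is.null(p0) || p0 < threshold) return(or)
  # Common outcome: Zhang & Yu correction.
  or / (1 - p0 + p0 * or)
}

rr_rare   <- or_to_rr_sketch(2, p0 = 0.05)
rr_common <- or_to_rr_sketch(2, p0 = 0.50)
```

The corrected value is always closer to 1 than the odds ratio, so the resulting E-value is smaller for common outcomes.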
Wraps EValue when available; otherwise applies the VanderWeele-Ding closed-form formula directly.
e_value_rr(rr, ci_lower = NULL, ci_upper = NULL)
rr |
Observed risk ratio. |
ci_lower |
Lower 95% CI of the RR (optional). |
ci_upper |
Upper 95% CI of the RR (optional). |
A morie_evalue named-list.
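The closed-form VanderWeele-Ding formula mentioned above is short enough to write out. The sketch below covers the point estimate only; `e_value_rr_sketch` is a hypothetical name, not the rmorie function.

```r
# Hypothetical helper: closed-form E-value for a risk ratio.
e_value_rr_sketch <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)  # protective effects: invert first
  rr + sqrt(rr * (rr - 1))
}

ev_harm    <- e_value_rr_sketch(2)    # 2 + sqrt(2)
ev_protect <- e_value_rr_sketch(0.5)  # same value as for RR = 2
```

For a confidence limit, the same formula is applied to the limit closer to the null, with the E-value set to 1 when the interval crosses 1.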
Pan-Tompkins QRS detection: bandpass (5–15 Hz) -> differentiate -> square -> moving-window integration -> adaptive thresholding -> refinement against the raw ECG.
ecgdet(ecg, fs)
ecg |
Numeric vector (1-D ECG signal). |
fs |
Sampling frequency in Hz. |
Reference: Pan, J. & Tompkins, W.J. (1985) "A real-time QRS detection algorithm", IEEE Trans. Biomed. Eng. BME-32(3):230–236.
List with filtered (raw ECG echoed), name, fs,
n_samples, and extra (r_peaks = 1-based sample indices,
n_peaks).
set.seed(1)
fs <- 250
t <- seq(0, 4, by = 1 / fs)
ecg <- sin(2 * pi * 1.2 * t) + 0.1 * rnorm(length(t))
if (requireNamespace("signal", quietly = TRUE)) {
  res <- ecgdet(ecg, fs)
  res$extra$n_peaks
}
Returns a named-list with class c("morie_effect_size", "list").
effect_size_result(measure, estimate, ci_lower = NA_real_, ci_upper = NA_real_, se = NA_real_, n = NA_integer_, extra = list())
measure |
Name of the effect-size statistic. |
estimate |
Point estimate (numeric). |
ci_lower |
Lower confidence bound (or NA). |
ci_upper |
Upper confidence bound (or NA). |
se |
Standard error (or NA). |
n |
Sample size (or NA). |
extra |
Named list of additional outputs. |
A morie_effect_size named-list.
Effect-size estimators used in biomedical and social-science research, each with analytic or bootstrap confidence intervals.
Families: standardised mean differences (Cohen's d, Hedges' g, Glass's delta); common-language ES (CLES); correlation-based (r, R^2, eta^2, partial eta^2, omega^2, epsilon^2); contingency (OR, RR, RD, NNT, NNH, rate ratio, IRD); association (Cohen's w, Cramer's V, phi); non-parametric (rank-biserial, Cliff's delta, Vargha-Delaney A); regression (standardised beta, CV); and meta- analysis (fixed-/random-effects pooling, I^2, prediction interval).
Cohen (1988); Hedges & Olkin (1985); Borenstein et al. (2009); Vargha & Delaney (2000); DerSimonian & Laird (1986).
Two function families live here:
Treatment-effect estimators (legacy)
estimate_ate() – IPW-weighted OLS ATE.
estimate_plr() – Partially Linear Regression via
DoubleML; falls back to base R cross-fit ridge.
estimate_pliv() – Partially Linear IV (LATE) via
DoubleML; falls back to 2SLS.
estimate_ate_gcomputation() – G-computation ATE.
Thin wrapper over stdReg::stdGlm() when installed; falls
back to inline bootstrap standardisation otherwise.
sensitivity_rosenbaum() – Rosenbaum bounds. Thin
wrapper over rbounds::psens() when installed; otherwise
normal-approximation Wilcoxon signed-rank bounds.
e_value() – VanderWeele-Ding E-value. Thin wrapper
over EValue::evalues.OLS() when installed; otherwise the
closed-form continuous-scale RR proxy.
Marginal-effects extenders (Phase 1.j additions; thin wrappers over Vincent Arel-Bundock's marginaleffects API and the emmeans / broom ecosystems):
morie_effects_emmeans() -> emmeans::emmeans().
morie_effects_predictions() ->
marginaleffects::predictions().
morie_effects_comparisons() ->
marginaleffects::comparisons().
morie_effects_slopes() ->
marginaleffects::slopes().
morie_effects_tidy() -> broom::tidy() (falls
back to a summary()-based tidy frame when broom
is unavailable).
Each extender requires the underlying CRAN package and signals a
clean stop() when it is missing, leaving the upstream model
object untouched. They return the underlying package's native
object verbatim so downstream code (e.g. ggplot2 plumbing,
rmorie's MRM step) keeps working with the canonical API.
Chernozhukov et al. (2018); Robins (1986); VanderWeele & Ding (2017); Rosenbaum (2002); Arel-Bundock (2024, marginaleffects); Lenth (2024, emmeans); Robinson, Hayes & Couch (2024, broom).
Epsilon-squared (Kelley, 1935)
epsilon_squared(ss_effect, ss_total, df_effect, ms_error)
ss_effect, ss_total
|
Sums of squares. |
df_effect |
Numerator d.f. of the effect. |
ms_error |
Error mean square. |
A morie_effect_size.
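As a worked illustration of the arguments, epsilon-squared is commonly computed as (SS_effect - df_effect * MS_error) / SS_total. The sketch below assumes that form and returns a bare number; `epsilon_squared_sketch` is a hypothetical name and the ANOVA quantities are made up for the example.

```r
# Hypothetical helper: epsilon-squared from ANOVA table quantities.
epsilon_squared_sketch <- function(ss_effect, ss_total, df_effect, ms_error) {
  (ss_effect - df_effect * ms_error) / ss_total
}

# Illustrative values: SS_effect = 20, SS_total = 50, df = 2, MS_error = 1.5.
eps2 <- epsilon_squared_sketch(20, 50, 2, 1.5)
```

Unlike eta-squared (SS_effect / SS_total), this estimate subtracts the part of the effect sum of squares expected from error alone, so it can be slightly negative for very small effects.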
Thin wrapper over a weighted stats::lm() plus HC3 robust SEs
from sandwich + lmtest when installed. Note: this is the
legacy shape used by older MRM pipelines; new code should prefer
morie_estimate_ate() (in R/causal.R) for the richer
morie_te_result return shape.
estimate_ate(data, outcome, treatment, weights_col)
data |
Data frame containing the analytical sample. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary treatment column. |
weights_col |
Name of the weights column (e.g. IPTW). |
Named list with ate and se (HC3-robust when available).
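The weighted-regression idea can be sketched with base R alone. The example below simulates data with a known effect of 2, builds inverse-probability-of-treatment weights from the true propensity score, and reads the ATE off a weighted `stats::lm()` fit. All variable names and the simulation are illustrative assumptions; the HC3 step via sandwich and lmtest is omitted.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
ps <- stats::plogis(0.5 * x)          # true propensity score
treat <- stats::rbinom(n, 1, ps)
y <- 2 * treat + x + rnorm(n)         # true ATE = 2

dat <- data.frame(
  y = y,
  treat = treat,
  w = ifelse(treat == 1, 1 / ps, 1 / (1 - ps))  # IPTW weights
)

# Weighted OLS of the outcome on treatment; the slope is the ATE estimate.
fit <- stats::lm(y ~ treat, data = dat, weights = w)
ate_hat <- unname(stats::coef(fit)["treat"])
```

The model-based standard errors from `summary(fit)` ignore the weighting, which is why a robust covariance estimator is the usual companion to this fit.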
Thin wrapper over stdReg::stdGlm() (Sjolander's
regression-standardisation back end) when stdReg is
installed. Without stdReg, falls back to an inline outcome-
regression + bootstrap implementation (500 resamples, seed 42)
that mirrors the legacy rmorie behaviour.
estimate_ate_gcomputation(data, treatment, outcome, covariates, outcome_model = "linear")
data |
Data frame with all required columns. |
treatment |
Binary treatment column (0/1). |
outcome |
Outcome column. |
covariates |
Character vector of covariates. |
outcome_model |
|
Named list with ate, se, ci_lower, ci_upper,
n_obs, outcome_model.
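The point-estimate part of G-computation (regression standardisation) can be sketched in a few lines: fit an outcome model, predict every unit's outcome under treatment and under control, and average the difference. The simulated data and variable names are illustrative assumptions, and the bootstrap for the standard error is left out.

```r
set.seed(42)
n <- 400
x <- rnorm(n)
treat <- stats::rbinom(n, 1, stats::plogis(0.5 * x))
y <- 2 * treat + x + rnorm(n)
dat <- data.frame(y = y, treat = treat, x = x)

# Outcome regression on treatment and covariates.
fit <- stats::lm(y ~ treat + x, data = dat)

# Counterfactual predictions with treatment set to 1 and to 0 for everyone.
mu1 <- stats::predict(fit, newdata = transform(dat, treat = 1))
mu0 <- stats::predict(fit, newdata = transform(dat, treat = 0))
ate_hat <- mean(mu1 - mu0)
```

With a linear model and no treatment-covariate interaction, the standardised estimate coincides with the treatment coefficient; the two differ once interactions or a nonlinear link are introduced.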
Delegates to qvalue::pi0est (Bioconductor) when installed
and method = "storey". Otherwise computes the cutoff-based
or bootstrap-based estimator inline.
estimate_pi0(p_values, method = c("storey", "bootstrap", "two_step"))
p_values |
Numeric vector of raw p-values. |
method |
One of |
A scalar pi0 estimate in [0, 1].
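A cutoff-based estimator in the style of Storey can be sketched as the share of p-values above a cutoff lambda, rescaled by 1 - lambda. The fixed cutoff of 0.5 and the helper name `pi0_cutoff_sketch` are assumptions for illustration; the inline rmorie estimator may choose lambda differently.

```r
# Hypothetical helper: cutoff-based estimate of the null proportion.
pi0_cutoff_sketch <- function(p, lambda = 0.5) {
  # Under the null, p-values are uniform, so about (1 - lambda) of them
  # should exceed lambda; cap the estimate at 1.
  min(1, mean(p > lambda) / (1 - lambda))
}

set.seed(42)
p_null <- stats::runif(1000)               # all nulls true
p_mix  <- c(stats::runif(500), rep(1e-4, 500))  # half nulls, half strong signals
pi0_null <- pi0_cutoff_sketch(p_null)
pi0_mix  <- pi0_cutoff_sketch(p_mix)
```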
Wraps DoubleML when available. Otherwise falls back to 2SLS:
first stage D ~ Z + X, second stage Y ~ D_hat + X, base R OLS.
estimate_pliv(data, treatment, outcome, instrument, covariates, n_folds = 5L, random_state = 42L)
data |
Data frame with all required columns. |
treatment |
Endogenous treatment column name. |
outcome |
Outcome column name. |
instrument |
Instrument column name. |
covariates |
Exogenous covariate column names. |
n_folds |
Cross-fitting folds (DoubleML path). Default 5. |
random_state |
RNG seed. Default 42. |
Named list with late, se, ci_lower, ci_upper,
pval, n_obs, method.
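The 2SLS fallback described above follows the two regressions literally. The sketch below simulates an endogenous treatment with a true effect of 1.5 and recovers it with base R OLS; all names and the data-generating process are illustrative assumptions.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)                 # exogenous covariate
z <- rnorm(n)                 # instrument
u <- rnorm(n)                 # unobserved confounder
d <- 0.8 * z + 0.5 * x + u + rnorm(n)
y <- 1.5 * d + 0.5 * x + 2 * u + rnorm(n)

# First stage: D ~ Z + X.
stage1 <- stats::lm(d ~ z + x)
d_hat <- stats::fitted(stage1)

# Second stage: Y ~ D_hat + X.
stage2 <- stats::lm(y ~ d_hat + x)
late_hat <- unname(stats::coef(stage2)["d_hat"])

# Naive OLS for comparison; biased upward by the confounder u.
ols_hat <- unname(stats::coef(stats::lm(y ~ d + x))["d"])
```

The standard errors reported by the second-stage `lm()` are not valid 2SLS standard errors, because they use residuals computed from `d_hat` rather than from `d`.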
Wraps DoubleML (+ mlr3 / mlr3learners) when available. Without DoubleML, falls back to a hand-rolled cross-fitting estimator using ridge regression (glmnet) or, last-ditch, OLS partialling out.
estimate_plr(data, treatment, outcome, covariates, n_folds = 5L, random_state = 42L)
data |
Data frame with all required columns. |
treatment |
Column name of the treatment variable. |
outcome |
Column name of the outcome variable. |
covariates |
Character vector of covariate column names. |
n_folds |
Cross-fitting folds. Default 5. |
random_state |
RNG seed. Default 42. |
Named list with ate, se, ci_lower, ci_upper,
pval, n_obs, method.
Looks up a one-paragraph + short-table explanation by filename (any leading directory components are stripped). Falls back to matching on the filename stem if the extension differs.
explain_file(filename)
filename |
The CSV filename, with or without a path. |
A character scalar containing the explanation. If no registered entry matches, returns a fallback listing the known files.
cat(explain_file("power_summary.csv"))
Names of all morie output CSVs with registered explanations
explain_known_files()
Character vector of filenames.
Thin wrapper-as-extender entry points that delegate to canonical
CRAN packages for local-FDR estimation (locfdr), FDR /
q-values (fdrtool), quantile regression (quantreg),
nonparametric kernel regression (np), Bayesian
nonparametric Dirichlet-process mixtures
(dirichletprocess), and latent-class mixed models
(lcmm). Each function returns a two-element list with
$method (the qualified upstream function name, or a
multi-step string for compound pipelines) and $raw (the
upstream return object), so downstream callers can pattern-match
on shape while keeping the full upstream object available for
inspection.
Thin wrapper-as-extender entry points that delegate to canonical
CRAN packages for regression-discontinuity diagnostics
(rddensity, rdlocrand, rdpower), ordinal
vignette analysis (anchors), and alpha-NOMINATE
ideal-point estimation (anominate). Each function returns
a two-element list with $method (the qualified upstream
function name) and $raw (the upstream return object), so
downstream callers can pattern-match on shape while keeping the
full upstream object available for inspection.
Thin wrapper-as-extender entry points that delegate to canonical
CRAN packages for geostatistics (gstat), multivariate
dependence (copula), kernel methods (kernlab),
meta-analysis (metafor) and the multivariate normal / t
distributions (mvtnorm). Each function returns a
two-element list with $method (the qualified upstream
function name) and $raw (the upstream return object), so
downstream callers can pattern-match on shape while keeping the
full upstream object available for inspection.
Thin wrapper-as-extender entry points that delegate to canonical
CRAN statistics packages. Each function returns a two-element
list with $method (the qualified upstream function name)
and $raw (the upstream return object), so downstream
callers can pattern-match on shape while keeping the full
upstream object available for inspection.
External validation on new data
external_validate(predict_fn, X_external, y_external, X_development = NULL)
predict_fn |
A function |
X_external |
Matrix or data frame of features. |
y_external |
Outcome vector. |
X_development |
Optional development-data features for KS-based domain-shift diagnostics. |
R port of the Python module morie.fairness.cityprofile.
The disparity audit operates on a canonical per-area schema:
area, risk, outcome, population,
group. A morie_city_profile records which columns
of one city's open-data export carry those five canonical fields,
and morie_fairness_apply_profile renames an arbitrary
city data.frame onto the canonical schema so the audit code
never needs to know which city the data came from.
morie_fairness_city_profile: constructor for a
city profile object.
morie_fairness_register_city: register a
profile in the process-local registry.
morie_fairness_get_city: look up a registered
profile by case-insensitive name.
morie_fairness_list_cities: list registered
profile names.
morie_fairness_apply_profile: rename a
data.frame onto the canonical schema.
R port of morie.fairness.metrics. Each callable is an
audit measure: given decisions a system made (and, where
available, the realised ground truth) plus a protected attribute,
it quantifies whether outcomes differ across groups. None of these
functions make predictions; they only measure disparity in
predictions that already exist.
morie_fairness_disparate_impact: the four-fifths
rule.
morie_fairness_demographic_parity:
favourable-rate gap.
morie_fairness_equalized_odds: TPR/FPR gaps
(needs ground truth).
morie_fairness_average_odds_difference: mean
TPR+FPR gap.
morie_fairness_gini: concentration of a score
distribution.
morie_fairness_bias_amplification: composite
of parity gap and inequality.
Prior art reimplemented independently (no code copied): the COMPAS fairness audit in pbiecek's XAI Stories and IBM's AI Fairness 360 definitions; the predictive-policing disparity framing of the SciencesPo Predictive-policing-Chicago project (Lacherade, Szabo, Krikava & Aeby, 2021) and Barman & Barman, arXiv:2603.18987.
R port of morie.fairness.temporal. Reimplements Barman &
Barman, Unmasking Algorithmic Bias in Predictive Policing
(arXiv:2603.18987): the four disparity metrics - Disparate Impact
Ratio, Demographic Parity Gap, Gini coefficient, and Bias
Amplification Score - are computed for every (city,
period) cell and assembled into a time series so that
temporal instability and cross-city divergence
become visible.
Builds on the metrics in fairness_metrics.
Wiens-Dmitrienko fallback; gMCP can express the same procedure as a graphical multiple-comparison transition.
fallback_procedure(p_values, weights, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
weights |
Numeric vector of non-negative weights summing to 1. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
Delegates to poolr::fisher when installed; otherwise
computes the chi-square statistic inline.
fisher_combined(p_values)
p_values |
Numeric vector of raw p-values. |
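The inline computation amounts to Fisher's statistic X^2 = -2 * sum(log(p)), referred to a chi-square distribution with 2k degrees of freedom for k independent tests. A minimal sketch, with the hypothetical helper name `fisher_combined_sketch`:

```r
# Hypothetical helper: Fisher's combined probability test.
fisher_combined_sketch <- function(p) {
  stat <- -2 * sum(log(p))
  df <- 2 * length(p)
  list(
    statistic = stat,
    df = df,
    p_value = stats::pchisq(stat, df = df, lower.tail = FALSE)
  )
}

res <- fisher_combined_sketch(c(0.04, 0.10, 0.20))
```

The chi-square reference distribution relies on the p-values being independent; dependent tests call for a different combiner.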
Fisher's exact test for a 2x2 table
fisher_exact_test(contingency_table, alternative = "two.sided")
contingency_table |
2x2 matrix. |
alternative |
One of "two.sided", "less", "greater". |
Fixed-effects (inverse-variance) meta-analytic pooling
fixed_effects_meta(estimates, standard_errors, confidence = 0.95)
estimates |
Numeric vector of effect-size estimates. |
standard_errors |
Numeric vector of SEs. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size with Q + Q p-value in extra.
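Inverse-variance pooling weights each study by 1 / SE^2. The sketch below computes the pooled estimate, its standard error, a normal-theory confidence interval, and Cochran's Q with its p-value; the helper name `fixed_effects_sketch` and the plain-list return shape are illustrative assumptions.

```r
# Hypothetical helper: fixed-effects (inverse-variance) pooling.
fixed_effects_sketch <- function(est, se, confidence = 0.95) {
  w <- 1 / se^2
  pooled <- sum(w * est) / sum(w)
  pooled_se <- sqrt(1 / sum(w))
  z <- stats::qnorm(1 - (1 - confidence) / 2)
  q <- sum(w * (est - pooled)^2)          # Cochran's Q
  list(
    estimate = pooled,
    se = pooled_se,
    ci_lower = pooled - z * pooled_se,
    ci_upper = pooled + z * pooled_se,
    q = q,
    q_pvalue = stats::pchisq(q, df = length(est) - 1, lower.tail = FALSE)
  )
}

res <- fixed_effects_sketch(c(0.2, 0.4), c(0.1, 0.1))
```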
Tests are evaluated in the given order and the procedure stops at the first non-rejection; reached hypotheses need no multiplicity adjustment. The gMCP package implements the equivalent procedure via a chain graph.
fixed_sequence(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
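The stopping rule is easy to express with a cumulative product: each hypothesis is tested at the full alpha, and once one test fails, every later hypothesis is retained. The helper name `fixed_sequence_sketch` is hypothetical and returns only the rejection flags.

```r
# Hypothetical helper: fixed-sequence rejection decisions.
fixed_sequence_sketch <- function(p, alpha = 0.05) {
  # cumprod() of the 0/1 pass flags drops to 0 at the first failure
  # and stays 0 for everything after it.
  as.logical(cumprod(p <= alpha))
}

rejected <- fixed_sequence_sketch(c(0.01, 0.03, 0.20, 0.001))
# rejected is TRUE, TRUE, FALSE, FALSE
```

The fourth p-value is small but is never tested, because the procedure stopped at the third hypothesis.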
Apply uniform formatting to numeric columns
format_dataframe(df, numeric_fmt = "%.2f", pval_cols = NULL, output_format = "dataframe", title = "")
df |
Data frame. |
numeric_fmt |
sprintf-style format spec without leading "%". |
pval_cols |
Columns to format as p-values. |
output_format |
Output target. |
title |
Title. |
Format a single number according to style conventions
format_number(x, style = c("fixed", "scientific", "percent", "integer"), digits = 2L, apa = FALSE)
x |
Numeric. |
style |
"fixed", "scientific", "percent", "integer". |
digits |
Decimal places. |
apa |
APA-style leading-zero suppression. |
Friedman test (repeated-measures rank ANOVA)
friedman_test(...)
... |
Three or more equal-length numeric vectors. |
R parity for the Python morie.fairness.metrics module. Every
callable here is an audit measure: given the decisions a system
made (and, where available, the realised ground truth) plus a
protected attribute such as race, it quantifies whether outcomes
differ across groups. None of these functions make predictions; they
only measure disparity in predictions that already exist.
Functions:
morie_fairness_disparate_impact(): the EEOC four-fifths rule.
morie_fairness_demographic_parity(): favourable-rate gap.
morie_fairness_equalized_odds(): TPR/FPR gaps (needs ground truth).
morie_fairness_average_odds_difference(): mean TPR+FPR gap.
morie_fairness_gini(): concentration of a score distribution.
morie_fairness_bias_amplification(): composite Delta_parity * G.
Each returns a named list with the metric value, a per-group
breakdown, any advisory warnings, and a plain-language
interpretation, mirroring the payload of the Python
RichResult.
Prior art reimplemented independently (no code copied): IBM AI Fairness 360 metric definitions; the COMPAS audit in pbiecek's XAI Stories; the SciencesPo Predictive-policing-Chicago project (Lacherade, Szabo, Krikava & Aeby, 2021); and Barman & Barman, arXiv:2603.18987 (the Bias Amplification Score).
Each callable in this module returns a named list with the
metric value, a per-group breakdown, advisory warnings, and
a plain-language interpretation.
pred <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)
race <- c(rep("A", 5), rep("B", 5))
morie_fairness_disparate_impact(pred, race, privileged = "A")$value
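The four-fifths rule behind the disparate-impact metric compares favourable-outcome rates between groups. The base R sketch below reproduces the ratio by hand for the same toy data, assuming the ratio is defined as the unprivileged rate divided by the privileged rate; it does not call rmorie.

```r
pred <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)
race <- c(rep("A", 5), rep("B", 5))

# Favourable-outcome rate per group: A = 5/5, B = 3/5.
rates <- tapply(pred, race, mean)

# Ratio of the unprivileged rate (B) to the privileged rate (A).
di_ratio <- unname(rates["B"] / rates["A"])
flagged <- di_ratio < 0.8   # four-fifths threshold
```

Here the ratio is 0.6, below the 0.8 threshold, so the toy decisions would be flagged under the four-fifths rule.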
R parity for the Python morie.fairness.predpol module. A
clean-room, city-agnostic reimplementation of the district-level
analysis of the SciencesPo Predictive-policing-Chicago project
(Lacherade, Szabo, Krikava & Aeby, 2021): rank areas by the risk an
algorithm predicts, rank them by their realised outcome rate, and
test whether the disagreement tracks the areas' demographic
composition.
Functions:
morie_predpol_aggregate_areas(): roll per-record data up to one row
per area.
morie_predpol_calibration_audit(): Spearman calibration plus a
per-group mean rank gap (the over-/under-prediction signal).
morie_predpol_score_disparity(): descriptive per-group risk-score
summary with a one-way ANOVA.
Written from the project's published methodology; no code copied (that repository carries no licence and is not redistributable).
morie_predpol_aggregate_areas() returns a per-area
data.frame; morie_predpol_calibration_audit() and
morie_predpol_score_disparity() return named lists of audit
statistics, per-group breakdowns, and a plain-language
interpretation.
agg <- morie_predpol_aggregate_areas(
  area = c("a", "a", "b", "b"),
  risk = c(10, 20, 30, 40),
  outcome = c(1, 0, 1, 1)
)
agg$mean_risk
R parity for the Python morie.fairness.temporal module. The four
disparity metrics — Disparate Impact Ratio, Demographic Parity Gap,
Gini coefficient, and Bias Amplification Score — are computed for
each (city, period) cell and aggregated per city, so temporal
instability and cross-city divergence become visible.
Reimplements the longitudinal, multi-city audit of Barman & Barman, arXiv:2603.18987. Its central lesson: bias metrics are not stable from one deployment cycle to the next and must be recomputed per period and per city.
The module's audit callable returns a named list with the
worst per-city Disparate Impact Ratio range, per-city and per-cell
breakdowns, and a plain-language interpretation.
period <- c(rep("p1", 10), rep("p2", 10))
city <- rep("A", 20)
pred <- rep(c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0), 2)
grp <- rep(c(rep("X", 5), rep("Y", 5)), 2)
morie_predpol_temporal_audit(period, city, pred, grp, privileged = "X")
Runs residual, influence, collinearity, goodness-of-fit, and specification tests, then summarises the overall assessment.
full_diagnostics(y, X, y_hat = NULL, model_type = "linear", column_names = NULL)
y |
Response. |
X |
Design matrix. |
y_hat |
Optional fitted values (OLS used if NULL). |
model_type |
|
column_names |
Optional column names for X. |
A morie_diagnostic_report.
A penalised-spline alternative to the kernel methods above. Fits
y ~ s(x, k = k) and returns fitted values at x_eval.
gam_smoother(x, y, x_eval = NULL, k = 10, family = stats::gaussian())
x |
Numeric covariate vector. |
y |
Numeric outcome vector. |
x_eval |
Evaluation grid (defaults to |
k |
Basis dimension for the smoother (default 10). |
family |
GLM family for |
A list with fit (the fitted gam object),
x_eval, y_hat (predictions), and edf
(effective degrees of freedom).
Glass's delta — control-group SD denominator
glass_delta(x, y, control = "y", confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
control |
Which group is the control: |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
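The point estimate divides the mean difference by the control group's standard deviation only. The sketch below assumes the difference is taken as mean(x) - mean(y) and omits the confidence interval; `glass_delta_sketch` is a hypothetical name.

```r
# Hypothetical helper: Glass's delta point estimate.
glass_delta_sketch <- function(x, y, control = "y") {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  # Only the control group's SD enters the denominator.
  sd_control <- if (control == "y") stats::sd(y) else stats::sd(x)
  (mean(x) - mean(y)) / sd_control
}

delta <- glass_delta_sketch(c(2, 4, 6), c(1, 2, 3))
```

Using the control SD alone is what distinguishes this from Cohen's d, which pools the two groups' variances.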
For tests that may be dependent. Delegates to
harmonicmeanp::p.hmp when installed; otherwise returns the
raw harmonic mean (Wilson 2019 asymptotic approximation).
harmonic_mean_p(p_values)
p_values |
Numeric vector of raw p-values. |
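The raw (equally weighted) harmonic mean used in the fallback is the number of tests divided by the sum of reciprocal p-values. A minimal sketch, with the hypothetical helper name `harmonic_mean_sketch`:

```r
# Hypothetical helper: equally weighted harmonic mean of p-values.
harmonic_mean_sketch <- function(p) {
  length(p) / sum(1 / p)
}

hmp_same  <- harmonic_mean_sketch(c(0.01, 0.01))
hmp_mixed <- harmonic_mean_sketch(c(0.01, 0.50))
```

The harmonic mean is dominated by the smallest p-values, so a single strong signal keeps the combined value small even when other tests are unremarkable.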
Hazard-ratio table from Cox model components
hazard_ratio_table(params, se, pvalues, confidence = 0.95, digits = 3L, apa = FALSE, output_format = "dataframe", title = "Hazard Ratios")
params |
Named numeric vector of log-HR coefficients. |
se |
Named numeric vector of standard errors. |
pvalues |
Named numeric vector of p-values. |
confidence |
Confidence level. |
digits |
Decimal places. |
apa |
APA formatting. |
output_format |
Output target. |
title |
Title. |
Complex cepstrum: inverse FFT of $\log|X(\omega)| + j\,\arg X(\omega)$ using the unwrapped
phase. Unlike the real cepstrum, it preserves enough information to
invert the operation, which is what enables homomorphic deconvolution.
hcepst(x, n_fft = NULL)
x |
Numeric vector (1-D signal). |
n_fft |
FFT length (default: next power of 2 |
Reference: Oppenheim, A.V. & Schafer, R.W. (2009) Discrete-Time Signal Processing, 3rd ed., Pearson, chapter on cepstral analysis.
List with filtered (complex cepstrum, real-valued),
name, fs, n_samples, and extra (quefrency, n_fft,
original_length).
set.seed(1)
x <- sin(2 * pi * 5 * seq(0, 1, length.out = 512))
res <- hcepst(x)
length(res$filtered)
Separates a convolved signal into a minimum-phase
impulse-response component and an excitation by
low-time liftering of the complex cepstrum.
hdecon(x, cutoff, n_fft = NULL)
x |
Numeric vector (assumed convolution |
cutoff |
Liftering cutoff (quefrency index). Coefficients above are zeroed to isolate the slow-varying component. |
n_fft |
FFT length (default: next power of 2 |
Reference: Oppenheim & Schafer (2009), Discrete-Time Signal Processing, 3rd ed., on homomorphic systems for convolution.
List with filtered (minimum-phase component ),
name, fs, n_samples, and extra (excitation, cutoff, n_fft).
set.seed(1)
x <- sin(2 * pi * 5 * seq(0, 1, length.out = 512))
res <- hdecon(x, cutoff = 20)
length(res$filtered)
Estimates the Higuchi (1988) fractal dimension of a 1-D time series via
length scaling across k time-lags. Values typically fall in [1, 2];
higher values indicate greater signal complexity.
hfd(x, kmax = 10L)
x |
Numeric vector (length |
kmax |
Maximum k (default 10). |
Reference: Higuchi, T. (1988) "Approach to an irregular time series on the basis of the fractal theory", Physica D 31(2):277–283.
List with value (D), name, and extra (kmax, n,
L_k).
set.seed(1)
x <- cumsum(rnorm(1000))
hfd(x, kmax = 10)$value
Families are tested in order; if a family produces no rejections, subsequent families are blocked from testing. rmorie-specific stage-list return shape; no drop-in CRAN equivalent (gMCP covers the concept but with a different graphical API).
hierarchical_bonferroni(p_values_by_family, alpha = 0.05, propagate_alpha = TRUE)
p_values_by_family |
List of numeric vectors, one per family. |
alpha |
Overall FWER level. |
propagate_alpha |
Logical; currently keeps alpha constant across families (mirrors the Python reference). |
A morie_rich_result list with one stage entry per
family and an overall_rejected logical vector.
Thin wrapper over stats::p.adjust(method = "hochberg").
hochberg(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
Thin wrapper over stats::p.adjust(method = "holm");
uniformly more powerful than Bonferroni.
holm(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
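Because the function wraps `stats::p.adjust()`, the adjusted p-values can be checked directly in base R. The p-values below are illustrative; the rmorie return shape (labels, rejection flags) is not reproduced here.

```r
p_raw <- c(0.01, 0.02, 0.03)

# Holm step-down adjustment: sorted p-values are multiplied by 3, 2, 1
# and then made monotone, giving 0.03, 0.04, 0.04.
p_holm <- stats::p.adjust(p_raw, method = "holm")

# Bonferroni for comparison: every p-value is multiplied by 3.
p_bonf <- stats::p.adjust(p_raw, method = "bonferroni")

rejected <- p_holm <= 0.05
```

Holm's adjusted values are never larger than Bonferroni's, which is the sense in which the procedure is uniformly more powerful.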
Step-down Holm with Sidak's closed-form adjustment per step;
equivalent to mutoss::SidakSD when that package is
installed.
holm_sidak(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
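The step-down Sidak adjustment replaces Holm's multiplication by (m - i + 1) with 1 - (1 - p)^(m - i + 1) at each step and then enforces monotonicity. A minimal sketch, with the hypothetical helper name `holm_sidak_sketch` returning adjusted p-values only:

```r
# Hypothetical helper: Holm-Sidak step-down adjusted p-values.
holm_sidak_sketch <- function(p) {
  m <- length(p)
  o <- order(p)
  # Sidak adjustment with a shrinking exponent as we step down.
  adj <- 1 - (1 - p[o])^(m - seq_len(m) + 1)
  adj <- cummax(adj)          # enforce monotone adjusted p-values
  out <- numeric(m)
  out[o] <- adj               # restore the original order
  out
}

p_raw <- c(0.01, 0.04, 0.03)
p_hs <- holm_sidak_sketch(p_raw)
```

The adjusted values are never larger than the Holm-adjusted ones, since 1 - (1 - p)^k is at most k * p.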
Thin wrapper over stats::p.adjust(method = "hommel").
hommel(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
Hosmer-Lemeshow goodness-of-fit test for logistic regression
hosmer_lemeshow_test(y, y_prob, n_groups = 10L)
y |
Binary response vector. |
y_prob |
Predicted probabilities. |
n_groups |
Number of decile groups (default 10). |
A morie_specification_test.
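A minimal sketch of the Hosmer-Lemeshow statistic, assuming quantile-based groups of the predicted probabilities and a chi-square reference distribution with g - 2 degrees of freedom. The helper name `hl_sketch` and the simulated data are illustrative; ties in the predicted probabilities (which can collapse groups) are not handled.

```r
# Hypothetical helper: Hosmer-Lemeshow goodness-of-fit statistic.
hl_sketch <- function(y, p, g = 10L) {
  breaks <- stats::quantile(p, probs = seq(0, 1, length.out = g + 1L))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs  <- tapply(y, grp, sum)     # observed events per group
  expd <- tapply(p, grp, sum)     # expected events per group
  n_g  <- tapply(y, grp, length)  # group sizes
  stat <- sum((obs - expd)^2 / (expd * (1 - expd / n_g)))
  list(
    statistic = stat,
    df = g - 2L,
    p_value = stats::pchisq(stat, df = g - 2L, lower.tail = FALSE)
  )
}

set.seed(42)
p <- stats::runif(500, 0.05, 0.95)
y <- stats::rbinom(500, 1, p)      # outcomes drawn from the stated probabilities
res <- hl_sketch(y, p)
```

Because the outcomes are simulated from the predicted probabilities themselves, the model is well calibrated by construction and a small p-value would be a chance result.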
Resamples the RR-interval series uniformly at fs_interp Hz, estimates
a Welch PSD, and integrates VLF (0.003–0.04 Hz), LF (0.04–0.15 Hz),
and HF (0.15–0.40 Hz) bands.
hrvfd(rr, fs_interp = 4)
rr |
Numeric vector of RR intervals in milliseconds. |
fs_interp |
Uniform resampling frequency in Hz (default 4). |
Reference: Task Force (1996), Circulation 93(5):1043–1065.
List with value (total power), name, and extra
(vlf, lf, hf, lf_hf_ratio, total_power, lf_norm,
hf_norm).
set.seed(1)
rr <- 800 + cumsum(rnorm(200, sd = 20))
res <- hrvfd(rr)
res$extra$lf_hf_ratio
Computes the short- and long-axis standard deviations of the Poincare plot: SD1 (short-term variability) and SD2 (long-term).
hrvnl(rr)
rr |
Numeric vector of RR intervals in milliseconds. |
Reference: Brennan, M., Palaniswami, M. & Kamen, P. (2001) "Do existing measures of Poincare plot geometry reflect nonlinear features of heart rate variability?", IEEE Trans. Biomed. Eng. 48(11):1342–1347.
List with value (SD1), name, and extra
(sd1, sd2, sd1_sd2_ratio, n_intervals).
set.seed(1)
rr <- 800 + cumsum(rnorm(200, sd = 20))
res <- hrvnl(rr)
res$extra$sd1
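The Poincare descriptors have closed forms in terms of ordinary variances: SD1^2 is half the variance of successive differences, and SD2^2 is twice the variance of the RR series minus that half. The sketch below assumes those identities and sample variances; it does not call `hrvnl()`, whose conventions may differ slightly.

```r
set.seed(1)
rr <- 800 + cumsum(rnorm(200, sd = 20))

d <- diff(rr)                                        # successive RR differences
sd1 <- sqrt(stats::var(d) / 2)                       # short-axis spread
sd2 <- sqrt(2 * stats::var(rr) - stats::var(d) / 2)  # long-axis spread
ratio <- sd1 / sd2
```

By construction SD1^2 + SD2^2 equals twice the variance of the RR series, which is a quick consistency check on any implementation.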
Computes the standard HRV time-domain indices on an RR-interval series: SDNN, RMSSD, pNN50, mean RR, mean HR, and the HRV triangular index.
hrvtd(rr)
rr |
Numeric vector of RR intervals in milliseconds. |
Reference: Task Force (1996), Circulation 93(5):1043–1065.
List with value (SDNN), name, and extra
(sdnn, rmssd, pnn50, mean_rr, mean_hr,
hrv_triangular_index, n_intervals).
set.seed(1)
rr <- 800 + cumsum(rnorm(200, sd = 20))
res <- hrvtd(rr)
res$extra$rmssd
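For orientation, the main time-domain indices reduce to a few lines of base R. The sketch below uses the usual definitions; `hrv_time_domain` is a hypothetical name, and computing mean HR as 60000 / mean RR is an assumption (the package may average instantaneous rates instead). The triangular index is omitted because it depends on a histogram bin width not stated here.

```r
# Usual time-domain definitions (hypothetical helper, not the package code).
hrv_time_domain <- function(rr) {
  d <- diff(rr)
  list(
    sdnn    = stats::sd(rr),          # SD of all RR intervals (ms)
    rmssd   = sqrt(mean(d^2)),        # root mean square of successive differences
    pnn50   = 100 * mean(abs(d) > 50),# % of successive differences above 50 ms
    mean_rr = mean(rr),
    mean_hr = 60000 / mean(rr)        # beats per minute; assumed convention
  )
}

hrv_time_domain(c(800, 860, 800, 820))
```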
I^2 heterogeneity statistic (Higgins)
i_squared(estimates, standard_errors)
estimates |
Effect-size estimates. |
standard_errors |
Standard errors. |
Numeric I^2 percentage.
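A minimal sketch of the Higgins I^2 computation, assuming inverse-variance weights and Cochran's Q; `i_squared_sketch` is a hypothetical name and the package may treat edge cases differently.

```r
# I^2 = max(0, (Q - df) / Q) * 100 with fixed-effect weights (illustrative only).
i_squared_sketch <- function(estimates, standard_errors) {
  w <- 1 / standard_errors^2
  pooled <- sum(w * estimates) / sum(w)   # inverse-variance pooled estimate
  q <- sum(w * (estimates - pooled)^2)    # Cochran's Q
  df <- length(estimates) - 1
  if (q <= 0) return(0)
  100 * max(0, (q - df) / q)
}

i_squared_sketch(c(0.1, 0.5, 0.9), c(0.1, 0.1, 0.1))
```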
Incidence rate difference (IRD)
incidence_rate_difference(
  events1,
  person_time1,
  events2,
  person_time2,
  confidence = 0.95
)
events1, person_time1
|
Events and person-time in group 1. |
events2, person_time2
|
Events and person-time in group 2. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
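The incidence rate difference and a Wald-type interval can be sketched as follows, assuming Poisson event counts. `ird_sketch` and its return shape are hypothetical; the package returns a `morie_effect_size` object whose fields are not reproduced here.

```r
# Wald interval for a difference of two Poisson rates (illustrative only).
ird_sketch <- function(events1, person_time1, events2, person_time2,
                       confidence = 0.95) {
  ird <- events1 / person_time1 - events2 / person_time2
  se <- sqrt(events1 / person_time1^2 + events2 / person_time2^2)
  z <- stats::qnorm(1 - (1 - confidence) / 2)
  c(estimate = ird, ci_lower = ird - z * se, ci_upper = ird + z * se)
}

ird_sketch(30, 1000, 15, 1000)
```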
Intraclass correlation coefficient (Shrout & Fleiss 1979)
intraclass_correlation(data, targets, raters, ratings, icc_type = "ICC3k")
data |
Long-format data frame. |
targets |
Subject ID column. |
raters |
Rater ID column. |
ratings |
Numeric rating column. |
icc_type |
One of "ICC1", "ICC1k", "ICC2", "ICC2k", "ICC3", "ICC3k". |
Computes the leave-one-out estimates, pseudovalues, influence
values, and bias-corrected jackknife estimate. The
bootstrap package's bootstrap::jackknife is the
canonical CRAN reference; it is invoked when installed and the
rmorie-shape result is reconstructed around it. Falls back to an
inline loop otherwise.
jackknife(data, statistic, ci_level = 0.95)
data |
Numeric vector or matrix. |
statistic |
Function returning a scalar. |
ci_level |
Confidence level. |
A morie_jackknife_result.
bootstrap::jackknife, resample::jackknife.
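The quantities named above (leave-one-out estimates, pseudovalues, bias-corrected estimate) can be sketched with an inline loop for the vector case. `jackknife_sketch` is a hypothetical helper under the standard definitions; it does not reproduce the `morie_jackknife_result` structure.

```r
# Standard delete-one jackknife for a numeric vector (illustrative only).
jackknife_sketch <- function(data, statistic) {
  n <- length(data)
  theta_hat <- statistic(data)
  loo <- vapply(seq_len(n), function(i) statistic(data[-i]), numeric(1))
  pseudo <- n * theta_hat - (n - 1) * loo            # pseudovalues
  bias <- (n - 1) * (mean(loo) - theta_hat)          # jackknife bias estimate
  se <- sqrt((n - 1) / n * sum((loo - mean(loo))^2)) # jackknife standard error
  list(estimate = theta_hat, loo = loo, pseudovalues = pseudo,
       bias = bias, corrected = theta_hat - bias, se = se)
}

jackknife_sketch(c(1, 2, 3, 4, 5), mean)
```

For the sample mean the pseudovalues equal the data and the bias estimate is zero, which makes this a convenient sanity check.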
Computes $\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$.
kde(x, x_eval, bandwidth, kernel_type = KERNEL_GAUSSIAN)
x |
Numeric data vector. |
x_eval |
Evaluation grid. |
bandwidth |
Positive bandwidth. |
kernel_type |
Integer code or kernel name. |
Numeric vector of estimated densities.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall.
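The estimator above can be written directly in base R for the Gaussian kernel. `kde_sketch` is a hypothetical name; it ignores the `kernel_type` argument and is meant only to make the formula concrete.

```r
# Direct evaluation of the kernel density formula with a Gaussian kernel.
kde_sketch <- function(x, x_eval, bandwidth) {
  n <- length(x)
  vapply(x_eval, function(x0) {
    sum(stats::dnorm((x0 - x) / bandwidth)) / (n * bandwidth)
  }, numeric(1))
}

set.seed(1)
x <- rnorm(100)
kde_sketch(x, x_eval = c(-1, 0, 1), bandwidth = 0.4)
```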
Kendall's tau-b correlation
kendall_correlation(x, y)
x, y
|
Numeric vectors. |
Useful for the conditional outcome stage of TMLE / AIPW.
kernel_cond_moments(x, y, x_eval, bandwidth, return_variance = TRUE)
x |
Numeric covariate vector. |
y |
Numeric outcome vector. |
x_eval |
Evaluation grid. |
bandwidth |
Positive bandwidth. |
return_variance |
Logical; if FALSE, only the mean is returned. |
Either a numeric vector (mean only) or a list with
mean and variance.
Evaluate a kernel function at point u
kernel_eval(u, kernel_type = KERNEL_GAUSSIAN)
u |
Numeric evaluation point (scaled by bandwidth). |
kernel_type |
Integer code or kernel name. One of
|
Kernel density value K(u).
Integer codes used by morie's C++ semiparametric bridge to select
the kernel function for local-polynomial smoothing. They mirror the
Python morie.semipar.KernelType enum, so an R caller can pass these
constants directly to any C++ kernel routine.
KERNEL_GAUSSIAN
KERNEL_EPANECHNIKOV
KERNEL_UNIFORM
KERNEL_TRIANGULAR
KERNEL_BIWEIGHT
Integer scalars (0L, 1L, 2L, 3L, 4L).
An object of class integer of length 1.
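KERNEL_GAUSSIAN: $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$
KERNEL_EPANECHNIKOV: $K(u) = \frac{3}{4}(1 - u^2)$ on |u| <= 1
KERNEL_UNIFORM: $K(u) = \frac{1}{2}$ on |u| <= 1
KERNEL_TRIANGULAR: $K(u) = 1 - |u|$ on |u| <= 1
KERNEL_BIWEIGHT: $K(u) = \frac{15}{16}(1 - u^2)^2$ on |u| <= 1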
Katz fractal dimension $D$ of a 1-D signal. $L$ is the total path length and
$d$ is the diameter (max distance from the first sample).
kfd(x)
x |
Numeric vector. |
Reference: Katz, M.J. (1988) "Fractals and the analysis of waveforms", Comput. Biol. Med. 18(3):145–156.
List with value (D), name, and extra (L, d, n).
set.seed(1)
x <- cumsum(rnorm(1000))
res <- kfd(x)
res$value
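Katz's definition is commonly written as D = log10(n) / (log10(n) + log10(d / L)), with n the number of steps. The sketch below follows that form under two assumptions that may differ from `kfd()`: distances are measured on amplitudes only, and n is taken as `length(x) - 1`. The name `katz_fd_sketch` is hypothetical.

```r
# Amplitude-only Katz fractal dimension (illustrative; conventions assumed).
katz_fd_sketch <- function(x) {
  n <- length(x) - 1          # number of steps (assumed convention)
  L <- sum(abs(diff(x)))      # total path length
  d <- max(abs(x - x[1]))     # diameter: farthest sample from the first one
  log10(n) / (log10(n) + log10(d / L))
}

set.seed(1)
katz_fd_sketch(cumsum(rnorm(1000)))
```

A monotone straight line has d equal to L, so the sketch returns exactly 1 for it; rougher signals give values above 1.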
One-sample Kolmogorov-Smirnov test
ks_test_one_sample(x, cdf = "pnorm", args = list())
x |
Numeric vector. |
cdf |
Name of a CDF function (e.g. "pnorm", "pexp"). Defaults to "pnorm". A bare distribution name like "norm" is auto-prefixed. |
args |
List of extra arguments to pass to |
Two-sample Kolmogorov-Smirnov test
ks_test_two_sample(x, y)
x, y
|
Numeric vectors. |
Convenience wrapper that calls .boot_cross_validate()
with n_folds = length(y). rsample::loo_cv is the
tidymodels equivalent (cross-referenced).
leave_one_out_cv(X, y, model_fn, score_fn)
X |
Numeric matrix or data.frame of predictors. |
y |
Numeric or factor outcome vector aligned with rows of |
model_fn |
Function |
score_fn |
Function |
A morie_cv_result.
rsample::loo_cv, caret::trainControl.
Levene's test for equality of variances
levene_test(..., center = "median")
... |
Two or more numeric vectors. |
center |
One of "median" (Brown-Forsythe), "mean", "trimmed". |
R parity of morie._license_check. Exposes the
FSF GPL-compatible licence list and a
check_plugin_license() helper that downstream R packages /
plugins can call to confirm GPL compatibility before linking
against morie internals. The guard is advisory — it warns or
raises but does not enforce at the R-namespace level. For
stronger guarantees see the companion userspace LSM-style daemon
(daemon/morie_lsm.py) and the kernel companion module
(kernel-module/morie.c).
morie_gpl_compatible_licenses() returns a character vector
of GPL-compatible SPDX identifiers; check_plugin_license() returns
a logical (invisibly), signalling a warning or error when the supplied
licence is not GPL-compatible.
morie_gpl_compatible_licenses()
Likelihood ratio test for nested models
likelihood_ratio_test(ll_restricted, ll_full, df_diff)
ll_restricted |
Log-likelihood of the restricted model. |
ll_full |
Log-likelihood of the full model. |
df_diff |
Difference in degrees of freedom. |
A morie_specification_test.
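The likelihood ratio statistic is twice the log-likelihood gain of the full model, referred to a chi-squared distribution with `df_diff` degrees of freedom. A minimal sketch, with the hypothetical name `lrt_sketch` and a plain list in place of the `morie_specification_test` object:

```r
# Classical likelihood ratio test for nested models (illustrative only).
lrt_sketch <- function(ll_restricted, ll_full, df_diff) {
  statistic <- 2 * (ll_full - ll_restricted)
  p_value <- stats::pchisq(statistic, df = df_diff, lower.tail = FALSE)
  list(statistic = statistic, df = df_diff, p_value = p_value)
}

lrt_sketch(ll_restricted = -105, ll_full = -100, df_diff = 2)
```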
Uses nortest::lillie.test when available; otherwise falls back to a plain KS test with estimated parameters (p-value approximate).
lilliefors_test(x)
x |
Numeric vector. |
Pregibon link test
link_test(y, X, model_type = "linear")
y |
Response. |
X |
Design matrix. |
model_type |
|
A morie_specification_test.
Estimates the local FDR for each test as
$\mathrm{lfdr}(z_i) = \hat{\pi}_0 f_0(z_i) / \hat{f}(z_i)$, where
$z_i$ are two-sided z-scores,
$f_0$ is the standard-normal null density, $\hat{f}$ is a kernel
density estimate of the observed z-scores, and $\hat{\pi}_0$ is the
proportion of null hypotheses estimated by the Storey-style cutoff
at a threshold $\lambda$.
local_fdr(p_values, pi0_method = "bootstrap", labels = NULL)
p_values |
Numeric vector of raw p-values in |
pi0_method |
Pi-zero estimator. Accepted: |
labels |
Optional character vector of test labels. |
A data frame with columns p_value, z_score,
local_fdr, and (if supplied) label. The data frame
additionally carries class morie_rich_result.
set.seed(1)
p <- c(stats::runif(80), stats::pnorm(-abs(stats::rnorm(20, mean = 3))) * 2)
lfdr <- local_fdr(p)
head(lfdr)
Avoids the boundary bias of Nadaraya-Watson by fitting a local linear model at each evaluation point.
local_linear(x, y, x_eval, bandwidth, return_slope = FALSE)
x |
Numeric covariate vector. |
y |
Numeric outcome vector. |
x_eval |
Evaluation grid. |
bandwidth |
Positive bandwidth. |
return_slope |
Logical; if TRUE, also return local slopes. |
If return_slope = FALSE, a numeric vector of fitted
values; otherwise a list with y_hat and beta_hat.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall.
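To make the idea concrete, a local linear fit at a point x0 is a weighted least-squares line in (x - x0), and the fitted value is its intercept. The sketch below uses a Gaussian kernel and `stats::lm.wfit`; `local_linear_sketch` is a hypothetical name and does not cover the `return_slope` option.

```r
# Local linear regression via kernel-weighted least squares (illustrative only).
local_linear_sketch <- function(x, y, x_eval, bandwidth) {
  vapply(x_eval, function(x0) {
    w <- stats::dnorm((x - x0) / bandwidth)          # kernel weights around x0
    fit <- stats::lm.wfit(cbind(1, x - x0), y, w)    # weighted line centred at x0
    unname(fit$coefficients[1])                      # intercept = fitted value at x0
  }, numeric(1))
}

x <- seq(0, 1, length.out = 21)
y <- 2 + 3 * x
local_linear_sketch(x, y, x_eval = c(0, 0.5, 1), bandwidth = 0.2)
```

On exactly linear data the fit reproduces the line at the boundaries as well as in the interior, which is the boundary-bias property mentioned above.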
Clean-room R parity of morie.longitudinal_sim for synchronised
multivariate longitudinal-panel simulation. Implements SyncRNG,
VAR coefficient generation with stationarity preservation, MVN
draws under structured covariance kernels, and tidy panel output.
Clean-room note: this module re-implements the techniques used in the Hlozek–Bangari Collaborative-CIFAR-Catalyst project (https://github.com/bangari-19/Collaborative-CIFAR-Project-) without copying any source. That repository is unlicensed. The techniques themselves — synchronised PRNG streams, lagged AR coefficient matrices, multivariate normal generation under Toeplitz / compound-symmetric covariance — are standard methods from Hamilton (1994) and Diggle, Liang, Zeger (1994), implemented here independently.
The simulation callables return tidy longitudinal-panel
data.frames; morie_sync_rng() returns an environment
exposing synchronised rnorm, runif, and sample
methods.
rng <- morie_sync_rng(42)
Minimises $\mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^{n} \bigl(Y_i - \hat{m}_{h,-i}(X_i)\bigr)^2$
on a grid spanning bw_min to bw_max, where $\hat{m}_{h,-i}$ is the
fit with bandwidth $h$ computed without observation $i$.
loocv_bandwidth(x, y, bw_min = NULL, bw_max = NULL, n_grid = 30L)
x |
Numeric covariate vector. |
y |
Numeric outcome vector. |
bw_min |
Minimum candidate bandwidth (defaults to 0.1 times the Silverman bandwidth). |
bw_max |
Maximum candidate bandwidth (defaults to 2.0 times the Silverman bandwidth). |
n_grid |
Number of candidate values. |
Optimal bandwidth (numeric scalar).
Hardle, W. (1990). Applied Nonparametric Regression. Cambridge.
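The criterion can be sketched with a Nadaraya-Watson smoother, where leaving observation i out amounts to zeroing the diagonal of the kernel matrix. The choice of smoother, the Gaussian kernel, and the name `loocv_bandwidth_sketch` are assumptions for illustration; the grid is passed explicitly rather than derived from the Silverman bandwidth.

```r
# Leave-one-out CV over a bandwidth grid, Nadaraya-Watson smoother (assumed).
loocv_bandwidth_sketch <- function(x, y, grid) {
  cv <- vapply(grid, function(h) {
    k <- stats::dnorm(outer(x, x, "-") / h)
    diag(k) <- 0                               # drop observation i from its own fit
    m_loo <- as.vector(k %*% y) / rowSums(k)   # leave-one-out fitted values
    mean((y - m_loo)^2)
  }, numeric(1))
  grid[which.min(cv)]
}

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
loocv_bandwidth_sketch(x, y, grid = seq(0.02, 0.5, length.out = 20))
```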
Mann-Whitney U / Wilcoxon rank-sum test
mann_whitney_u(x, y, alternative = "two.sided")
x, y
|
Numeric vectors. |
alternative |
One of "two.sided", "less", "greater". |
Under no assumptions about selection, the ATE is only partially
identified. Returns a named list with lower_bound, upper_bound,
point_estimate, width.
manski_bounds(
  outcome_treated,
  outcome_control,
  p_treated,
  outcome_range = NULL
)
outcome_treated |
Outcomes for treated units. |
outcome_control |
Outcomes for control units. |
p_treated |
Proportion treated. |
outcome_range |
c(min, max) on the outcome. Default c(0, 1). |
Named list.
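The no-assumption bounds fill each unobserved potential outcome with the extremes of the outcome range. A minimal sketch under that textbook construction follows; `manski_sketch` is a hypothetical name, and `point_estimate` is omitted because its definition is not stated here.

```r
# Worst-case (Manski) bounds on the ATE for a bounded outcome (illustrative only).
manski_sketch <- function(outcome_treated, outcome_control, p_treated,
                          outcome_range = c(0, 1)) {
  y_min <- outcome_range[1]
  y_max <- outcome_range[2]
  m1 <- mean(outcome_treated)
  m0 <- mean(outcome_control)
  p <- p_treated
  # E[Y(1)] is observed for the treated share p, E[Y(0)] for the share 1 - p.
  lower <- (p * m1 + (1 - p) * y_min) - ((1 - p) * m0 + p * y_max)
  upper <- (p * m1 + (1 - p) * y_max) - ((1 - p) * m0 + p * y_min)
  list(lower_bound = lower, upper_bound = upper, width = upper - lower)
}

manski_sketch(c(1, 1, 0, 1), c(0, 0, 1, 0), p_treated = 0.5)
```

The width always equals the length of the outcome range, which is why these bounds are wide without further assumptions.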
McNemar's test (paired nominal data)
mcnemar_test(contingency_table, exact = FALSE)
contingency_table |
2x2 table. |
exact |
Use exact binomial. |
Identical to rank(x, ties.method = "average") plus a tie-correction
term, the sum of (t_j^3 - t_j) over tied groups, where t_j is the size of group j.
midranks(x)
x |
Numeric vector. |
Named list: midranks, n, ties, tie_correction.
midranks(x = rnorm(50))
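A small sketch of the two pieces described above, average ranks and the tie-correction sum, using only base R. `midranks_sketch` is a hypothetical name and returns just these two fields.

```r
# Average ranks plus the tie-correction term sum(t_j^3 - t_j) (illustrative only).
midranks_sketch <- function(x) {
  r <- rank(x, ties.method = "average")
  t_j <- as.vector(table(x))          # size of each group of equal values
  list(midranks = r, tie_correction = sum(t_j^3 - t_j))
}

midranks_sketch(c(1, 2, 2, 3, 3, 3))
```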
Compare multiple model fits on AIC, BIC, log-likelihood and (optional) LR tests
model_comparison_table(
  models,
  nested = FALSE,
  digits = 3L,
  output_format = "dataframe",
  title = "Model Comparison"
)
models |
Named list of fitted models. |
nested |
If TRUE, run LR tests against the previous model in the list. |
digits |
Decimal places. |
output_format |
Output target. |
title |
Title. |
Thin extender over anominate::anominate for Bayesian
alpha-NOMINATE ideal-point estimation on roll-call legislative
data (Carroll, Lewis, Lo, Poole & Rosenthal, 2013).
morie_anominate_ideal_points(rcObject, ...)
rcObject |
A roll-call object as built by
|
... |
Further arguments forwarded to
|
A list with $method = "anominate::anominate" and
$raw (an anominate posterior-sample object).
## Not run:
if (requireNamespace("anominate", quietly = TRUE) &&
    requireNamespace("pscl", quietly = TRUE)) {
  data("sen90", package = "anominate")
  morie_anominate_ideal_points(sen90, dims = 1, nsamp = 200, burnin = 100)
}
## End(Not run)
One-way ANOVA
morie_anova_one_way(...)
... |
Numeric vectors, one per group. |
Named list: F, df_between, df_within, p_value,
eta_squared.
morie_anova_one_way(rnorm(30, 0), rnorm(30, 0.5), rnorm(30, 1))
ARCH(1)-in-mean model
morie_arch_in_mean(x)
x |
Numeric return series. |
Named list with mu, delta, omega, alpha, loglik,
conditional_variance, n, method.
morie_arch_in_mean(x = rnorm(50))
The aggregate file is a long-format
YEAR_2020 / YEAR_2021 / YEAR_2022 panel keyed by
(SECTION, CATEGORY, UNITS OF MEASURE). This function
rebuilds the implied time series, runs year-on-year change against
the REPORT_SCOPE rows (the headline volume series), and
surfaces a data-quality audit.
Builds an implied YoY series from the YEAR_2020 / YEAR_2021 / YEAR_2022 columns against the REPORT_SCOPE headline volume row.
morie_arsau_analyze_aggregate_summary(
  year_range = "2020-2022",
  language = "en",
  data_dir = NULL
)
year_range |
"2020-2022". |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
A list classed
c("morie_arsau_result", "morie_rich_result", "list").
Ontario Ministry of the Solicitor General, ARSAU 2020-2022 aggregate-summary-by-year technical notes.
Chains:
mrm_uof_force_concentration on POLICE_SERVICE
mrm_uof_weapon_diversity on POLICE_SERVICE x ASSIGNMENT_TYPE
mrm_uof_yoy_change on REPORTING_YEAR
morie_arsau_analyze_detailed_dataset(
  year_range = "2020-2022",
  language = "en",
  data_dir = NULL
)
year_range |
"2020-2022". |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
A list classed
c("morie_arsau_result", "morie_rich_result", "list").
Ontario Ministry of the Solicitor General, ARSAU 2020-2022 detailed_dataset technical notes.
Chains demographic-disparity tests over Race, Gender, and
AgeCategory against the IndivInjuries_PhysicalInjuries
outcome column, plus a data-quality audit against the sidecar.
Chains demographic disparity by Race, Gender, AgeCategory against the IndivInjuries_PhysicalInjuries outcome (Yes/No coerced). Tolerates the 2023 trailing-space typo in the outcome column.
morie_arsau_analyze_individual_records(
  year,
  language = "en",
  data_dir = NULL,
  bootstrap_reps = 0L
)
year |
2023 or 2024. |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
bootstrap_reps |
Forwarded to
|
A list classed
c("morie_arsau_result", "morie_rich_result", "list").
Ontario Ministry of the Solicitor General, ARSAU 2023 and 2024 individual_records technical release notes.
Chains:
mrm_uof_force_concentration over PoliceService
mrm_uof_weapon_diversity over IncidentType x PoliceService
mrm_uof_data_quality_audit against the published CKAN sidecar (when present)
Chains mrm_uof_force_concentration (PoliceService),
mrm_uof_weapon_diversity (IncidentType x PoliceService), and
mrm_uof_data_quality_audit (against the CKAN sidecar).
morie_arsau_analyze_main_records(year, language = "en", data_dir = NULL)
year |
2023 or 2024. |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
Region-locality is NOT meaningful for main_records: only the
OPP_PoliceService_Region column is published, so a region-locality
comparison would pair that one column with itself. See
morie_arsau_analyze_detailed_dataset for the
2020-2022 layout, which exposes more region columns.
A list classed
c("morie_arsau_result", "morie_rich_result", "list").
Ontario Ministry of the Solicitor General, ARSAU 2023 and 2024 main_records technical release notes.
The probe-cycle file is intentionally narrow (BatchFileName + Indiv_Index + a comma-separated cycle string). This function computes the cycle-count distribution per incident and runs a data-quality audit.
morie_arsau_analyze_probe_cycle_records(year, language = "en", data_dir = NULL)
year |
2023 or 2024. |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
A list classed
c("morie_arsau_result", "morie_rich_result", "list").
Ontario Ministry of the Solicitor General, ARSAU probe_cycle_records technical notes (2023 and 2024).
Chains mrm_uof_weapon_diversity over
Weapon x Location (the only two categorical columns the file
publishes) plus a Weapon-only frequency table plus a data-quality
audit.
Chains Weapon x Location chi-square + weapon frequency table + DQ
audit. 2023 needs allow_invalid = TRUE.
morie_arsau_analyze_weapon_records(
  year,
  allow_invalid = FALSE,
  language = "en",
  data_dir = NULL
)
year |
2023 or 2024. |
allow_invalid |
|
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
The 2023 file is the ministry-flagged-invalid release and requires
allow_invalid = TRUE.
A list classed
c("morie_arsau_result", "morie_rich_result", "list").
Ontario Ministry of the Solicitor General, ARSAU 2023 and 2024 weapon_records technical notes – the 2023 release accompanies an explicit invalidity flag.
List ARSAU dataset kinds, optionally restricted to one year.
morie_arsau_available_datasets(year = NULL, language = "en", data_dir = NULL)
year |
Optional year; |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
List ARSAU year / year-range buckets.
morie_arsau_available_years(data_dir = NULL, language = "en")
data_dir |
Optional explicit ARSAU root. |
language |
"en" or "fr". |
datastore_search URL for a registry entry.
Returns NA_character_ for entries that do not publish a
sidecar (e.g. the 2023 weapon_records release).
morie_arsau_ckan_url(kind, year, limit = 5000L)
kind |
One of |
year |
One of |
limit |
Integer; CKAN |
Character scalar URL, or NA_character_.
Ontario Data Catalogue CKAN API:
datastore_search (https://data.ontario.ca/).
Describe a single ARSAU dataset entry.
morie_arsau_describe(
  kind,
  year,
  language = "en",
  data_dir = NULL,
  n_preview_rows = 3L
)
kind |
One of |
year |
One of |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
n_preview_rows |
Number of rows from the CSV head to include. |
This is the R-side equivalent of running the maintainer's
scripts/refresh_arsau.py mirror — a non-trivial pipeline that
walks the CKAN package, follows per-resource redirects, handles
rate-limits, verifies SHA digests against the published values, and
lands the files under MORIE_ARSAU_DIR. Porting it requires
an end-to-end retry + checksum manager that does not yet have a
tested R analogue; per the morie maintenance policy, network bulk
fetches must be reproducible across CRAN test environments before
the wrapper is exposed. Stubbed for now.
morie_arsau_download(target_dir, ...)
target_dir |
Destination directory. |
... |
Reserved. |
Stops with NotYetPorted.
Optional helper. Requires httr2 (and jsonlite via the
existing morie_arsau_read_sidecar contract).
morie_arsau_fetch_sidecar(kind, year, limit = 5000L, timeout_sec = 30L)
kind |
One of |
year |
One of |
limit |
Integer; CKAN |
timeout_sec |
Request timeout in seconds. Default 30. |
A list with fields and records, ready for
morie_arsau_sidecar_schema /
morie_arsau_sidecar_to_frame.
Ontario Data Catalogue CKAN API.
Load ARSAU aggregate-summary-by-year CSV (2020-2022 only).
morie_arsau_load_aggregate_summary(
  year_range = "2020-2022",
  language = "en",
  data_dir = NULL
)
year_range |
"2020-2022". |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
Load ARSAU detailed-incident-level CSV (2020-2022 only).
morie_arsau_load_detailed_dataset(
  year_range = "2020-2022",
  language = "en",
  data_dir = NULL
)
year_range |
"2020-2022". |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
Load ARSAU individual_records CSV.
morie_arsau_load_individual_records(year, language = "en", data_dir = NULL)
year |
2023 or 2024. |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
Load ARSAU main_records CSV for the given year.
morie_arsau_load_main_records(year, language = "en", data_dir = NULL)
year |
2023 or 2024. |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
Load ARSAU probe_cycle_records CSV (CEW telemetry).
morie_arsau_load_probe_cycle_records(year, language = "en", data_dir = NULL)
year |
2023 or 2024. |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
2023 requires allow_invalid = TRUE (ministry-flagged invalid).
morie_arsau_load_weapon_records(
  year,
  allow_invalid = FALSE,
  language = "en",
  data_dir = NULL
)
year |
2023 or 2024. |
allow_invalid |
Logical; required |
language |
"en" or "fr". |
data_dir |
Optional explicit ARSAU root. |
Parses a simple pipe-table Markdown sidecar of the form
| name | type | notes |
|------|------|-------|
| foo  | int  | ...   |
as published by some ARSAU releases. No external dependencies are required: the parser is pure base R.
morie_arsau_read_markdown_dictionary(path)
path |
Path to the Markdown file. |
A data.frame with one row per table row. Returns
an empty data.frame if the file has no parseable pipe
table.
Ontario Ministry of the Solicitor General data dictionaries accompanying the ARSAU CSV releases.
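A pipe table of this shape can be parsed with base R alone. The sketch below is one possible approach, not the package's parser: `parse_pipe_table` is a hypothetical helper that takes already-read lines, skips the separator row, and does not handle ragged rows or escaped pipes.

```r
# Minimal base-R pipe-table parser (hypothetical helper; edge cases not handled).
parse_pipe_table <- function(lines) {
  # Keep only lines that start and end with a pipe.
  rows <- grep("^\\s*\\|.*\\|\\s*$", lines, value = TRUE, perl = TRUE)
  # Drop the header/body separator such as |------|------|
  rows <- rows[!grepl("^[\\s|:-]+$", rows, perl = TRUE)]
  if (length(rows) < 2L) return(data.frame())
  cells <- lapply(rows, function(r) {
    r <- sub("^\\s*\\|", "", r, perl = TRUE)   # strip leading pipe
    r <- sub("\\|\\s*$", "", r, perl = TRUE)   # strip trailing pipe
    trimws(strsplit(r, "|", fixed = TRUE)[[1]])
  })
  out <- as.data.frame(do.call(rbind, cells[-1]), stringsAsFactors = FALSE)
  names(out) <- cells[[1]]                      # first row is the header
  out
}

parse_pipe_table(c(
  "| name | type | notes |",
  "|------|------|-------|",
  "| foo  | int  | count |"
))
```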
Handles both bare {fields, records} and the
{result: {fields, records}} wrapper shape.
morie_arsau_read_sidecar(path)
path |
Path to the JSON file. |
Named list with fields and records.
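Handling the two shapes comes down to unwrapping an optional `result` element before picking out `fields` and `records`. A minimal sketch on an already-parsed list follows; `normalise_sidecar` is a hypothetical name and no JSON reading is shown.

```r
# Accept either {fields, records} or {result: {fields, records}} (illustrative).
normalise_sidecar <- function(parsed) {
  # [[ ]] gives exact matching; $ would partially match element names.
  if (!is.null(parsed[["result"]])) parsed <- parsed[["result"]]
  list(fields = parsed[["fields"]], records = parsed[["records"]])
}

bare <- list(fields = list(list(id = "a")), records = list(list(1)))
wrapped <- list(result = bare)
identical(normalise_sidecar(bare), normalise_sidecar(wrapped))
```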
Some ARSAU releases ship a companion *.xlsx file alongside
the CSV that holds the column-level data dictionary (variable name
+ dtype + notes). This helper reads the first sheet via
readxl and normalises the column names to
name / type / notes. Requires the optional readxl
dependency.
morie_arsau_read_xlsx_dictionary(path, sheet = 1L)
path |
Path to the XLSX file. |
sheet |
Sheet identifier (name or 1-based integer). Default
|
A data.frame with columns name, type,
notes. Other columns from the XLSX are preserved with
their upstream names.
Ontario Ministry of the Solicitor General data dictionaries accompanying the ARSAU CSV releases.
data.frame.
Returns one row per (year_or_range, kind) entry in the
package's internal registry, with the same columns as the Python
ARSAU_REGISTRY mapping but in row-major data.frame
form. The underlying list-of-lists is still available via
ARSAU_REGISTRY.
morie_arsau_registry_df(language = "en")
language |
"en" or "fr"; selects the description column. |
A data.frame with columns year_or_range,
kind, csv_filename, sidecar_filename,
expected_rows, expected_cols, is_valid,
description.
Ontario Ministry of the Solicitor General, ARSAU per-resource technical release notes (2020-2022 / 2023 / 2024).
[name, type, notes] schema from a
parsed CKAN sidecar.
Accepts the result of morie_arsau_read_sidecar (a
list with fields and records entries) and returns a
tidy data.frame of column metadata. Entries that lack an
id are dropped.
morie_arsau_sidecar_schema(sidecar)
sidecar |
A list as returned by
|
A data.frame with columns name,
type, notes.
CKAN datastore_search response schema, as
served by datastore_search (https://data.ontario.ca/).
records array-of-arrays into a
data.frame.
The fields[].id array supplies the column names; records
are array-of-array, so the column order in the JSON matches the
column order in the resulting data.frame.
morie_arsau_sidecar_to_frame(sidecar)
sidecar |
A list as returned by
|
A data.frame (zero rows if records is empty).
CKAN datastore_search response schema.
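The conversion can be sketched as: take column names from `fields[].id`, stack the record arrays row by row, and return a zero-row frame when there are no records. `sidecar_to_frame_sketch` is a hypothetical helper; keeping every column as character and mapping JSON nulls to NA are simplifying assumptions.

```r
# Array-of-arrays records to a data.frame (illustrative; all columns character).
sidecar_to_frame_sketch <- function(sidecar) {
  col_names <- vapply(sidecar[["fields"]],
                      function(f) as.character(f[["id"]]), character(1))
  records <- sidecar[["records"]]
  if (length(records) == 0L) {
    m <- matrix(character(0), nrow = 0, ncol = length(col_names))
  } else {
    rows <- lapply(records, function(r) {
      vapply(r, function(v) if (is.null(v)) NA_character_ else as.character(v),
             character(1))
    })
    m <- do.call(rbind, rows)   # one row per record, JSON column order preserved
  }
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

sidecar <- list(
  fields  = list(list(id = "a"), list(id = "b")),
  records = list(list(1, "x"), list(2, "y"))
)
sidecar_to_frame_sketch(sidecar)
```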
Audit both OTIS and ARSAU.
morie_audit_all_variables(otis_specs = NULL, arsau_specs = NULL)
otis_specs, arsau_specs
|
See per-domain functions. |
Named list with $otis and $arsau audit results.
Audit every ARSAU variable.
morie_audit_arsau_variables(dataset_specs = NULL)
dataset_specs |
A list with class morie_audit_result.
For each OTIS dataset, this function expects a list of column
specifications. By default it constructs the specs from the
columns in DATASET_REGISTRY (in R, these are stored on the
Python side via the dictionary parser; on the R side we fall
back to a minimal name-only list and rely on the heuristic
classifier when dtype/valid_values are unknown).
morie_audit_otis_variables(dataset_specs = NULL)
dataset_specs |
Optional list keyed by dataset id, each entry
a list of |
For a richer audit that consults the bilingual XLSX dictionary,
use the Python module morie.audit_variables.
A list with class morie_audit_result.
Audit declared outputs against files on disk
morie_audit_public_outputs(project_root = NULL, manifest = NULL)
project_root |
Project root directory. |
manifest |
Manifest data frame. If |
Data frame containing declared and observed output status.
# Craft a tempdir manifest + output file, then audit:
tdir <- tempfile("morie-doc-")
dir.create(tdir)
writeLines("x,y\n1,2", file.path(tdir, "results.csv"))
man <- data.frame(
  output = "results.csv",
  public_path = file.path(tdir, "results.csv"),
  size_kb = 0.01,
  modified = format(Sys.Date())
)
morie_audit_public_outputs(project_root = tdir, manifest = man)
BayesC-pi spike-and-slab variable selection (short Gibbs)
morie_bayes_cpi_genomic(
  x,
  y,
  n_iter = 300,
  burn = 100,
  pi_init = 0.1,
  seed = 0,
  deterministic_seed = NULL
)
x |
(n x p) marker matrix. |
y |
Numeric response. |
n_iter |
Iterations. |
burn |
Burn-in. |
pi_init |
Initial inclusion probability. |
seed |
Seed. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, beta, beta_pip, pi, sigma_b2, sigma2, n_iter, n, p, method).
Habier-Fernando-Kizilkaya-Garrick (2011); Montesinos Lopez Ch 4.
morie_bayes_cpi_genomic(x = rnorm(50), y = rnorm(50))
Per-marker variance with scaled inverse chi-squared prior.
morie_bayes_ridge_gibbs(
  x,
  y,
  n_iter = 200,
  burn = 50,
  df0 = 4,
  S0 = NULL,
  seed = 0,
  deterministic_seed = NULL
)
x |
(n x p) marker matrix. |
y |
Numeric response. |
n_iter |
Iterations. |
burn |
Burn-in. |
df0 |
Prior df (default 4). |
S0 |
Prior scale (default anchors to var(y)/p). |
seed |
Seed. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, beta, beta_se, sigma_j2, sigma2, n_iter, n, p, method).
Meuwissen-Hayes-Goddard (2001) Genetics 157:1819.
morie_bayes_ridge_gibbs(x = rnorm(50), y = rnorm(50))
Bayesian LASSO (Park & Casella 2008 short Gibbs)
morie_bayesian_lasso_full(
  x,
  y,
  n_iter = 200,
  burn = 50,
  lam = NULL,
  seed = 0,
  deterministic_seed = NULL
)
x |
(n x p) marker matrix. |
y |
Numeric response. |
n_iter |
Total iterations (default 200). |
burn |
Burn-in (default 50). |
lam |
Optional fixed lambda (else empirical-Bayes updated). |
seed |
Random seed. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, beta, intercept, se, beta_se, lam, sigma2, n_iter, n, p, method).
Park & Casella (2008) JASA 103:681. Montesinos Lopez Ch 4.
morie_bayesian_lasso_full(
  x = matrix(rnorm(150), 50, 3),
  y = rnorm(50),
  n_iter = 50L,
  burn = 10L,
  lam = 1,
  seed = 1L,
  deterministic_seed = TRUE
)
beta_hat = solve(X'X + lambda * I) %*% X'y
morie_bayesian_ridge_regression(x, y, lam = NULL)
x |
(n x p) marker matrix. |
y |
Numeric response. |
lam |
Optional ridge parameter; default Endelman rrBLUP-style. |
list(estimate, beta, intercept, se, beta_se, lam, n, p, method).
Montesinos Lopez Ch 4.
morie_bayesian_ridge_regression(x = rnorm(50), y = rnorm(50))
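The closed-form ridge solution can be sketched directly from the formula. Centring the predictors and response so that the intercept is unpenalised is an assumption here, as is the name `ridge_sketch`; the default choice of `lam` used by the package is not reproduced.

```r
# Closed-form ridge estimate with an unpenalised intercept (illustrative only).
ridge_sketch <- function(x, y, lam) {
  x_c <- scale(x, center = TRUE, scale = FALSE)   # centre columns of x
  y_c <- y - mean(y)
  beta <- solve(crossprod(x_c) + lam * diag(ncol(x_c)), crossprod(x_c, y_c))
  intercept <- mean(y) - sum(colMeans(x) * beta)
  list(beta = as.vector(beta), intercept = intercept)
}

set.seed(1)
x <- matrix(rnorm(150), 50, 3)
y <- rnorm(50)
ridge_sketch(x, y, lam = 1)
```

With `lam = 0` the sketch reduces to ordinary least squares, which gives a quick consistency check against `lm()`.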
boot::boot.ci
Thin pass-through that computes bootstrap confidence intervals
from a boot object via boot::boot.ci, returning a
tidy named list with (ci_lower, ci_upper) per requested type.
morie_boot_basic_ci(boot_obj, type = c("perc", "bca", "basic", "norm", "stud"), conf = 0.95)
boot_obj |
A |
type |
Character vector of CI types; any of "perc", "bca", "basic", "norm", "stud". |
conf |
Confidence level (default 0.95). |
Named list of length length(type); each element is
a numeric length-2 vector c(ci_lower, ci_upper).
boot::boot.ci.
boot::boot
Thin pass-through that adapts an rmorie-style
statistic(data) -> scalar callable to boot::boot's
statistic(data, indices) -> scalar signature and returns
the raw boot object. Useful when the caller wants to use
boot's downstream helpers (boot::boot.ci,
boot::tilt.boot, boot::jack.after.boot) directly.
morie_boot_run(data, statistic, R = 2000L, strata = NULL, ...)
data |
A numeric vector, matrix, or data.frame. |
statistic |
Function |
R |
Number of bootstrap replicates. |
strata |
Optional integer stratification vector. |
... |
Forwarded to |
A boot object as returned by boot::boot.
boot::boot, morie_boot_basic_ci().
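The signature adaptation described for morie_boot_run can be sketched as follows. The helper name `adapt_statistic` is hypothetical and is not part of rmorie; the sketch only shows how a one-argument statistic might be wrapped so that boot::boot can call it with resampling indices.

```r
# Wrap statistic(data) so it accepts the (data, indices) pair boot::boot passes.
adapt_statistic <- function(statistic) {
  function(data, indices) {
    # Subset elements for plain vectors, rows for matrices and data frames.
    resampled <- if (is.null(dim(data))) {
      data[indices]
    } else {
      data[indices, , drop = FALSE]
    }
    statistic(resampled)
  }
}

set.seed(1)
x <- rnorm(40)
stat2 <- adapt_statistic(mean)

# boot is a recommended package but is guarded here in case it is absent.
if (requireNamespace("boot", quietly = TRUE)) {
  b <- boot::boot(x, statistic = stat2, R = 200)
  boot::boot.ci(b, type = "perc")
}
```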
Bootstrap resampling for any statistic
morie_bootstrap_sample(df, statistic, n_bootstrap = 1000L, seed = 42L)
df |
A data frame. |
statistic |
A function taking a data frame and returning a scalar. |
n_bootstrap |
Number of bootstrap replicates. |
seed |
Random seed. |
Named list: estimate, se, ci_lower, ci_upper,
distribution (numeric vector of bootstrap statistics).
df <- data.frame(x = rnorm(100))
morie_bootstrap_sample(df, statistic = function(d) mean(d$x))
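A minimal base-R sketch of a nonparametric bootstrap that returns the same fields is given below. It assumes row resampling with replacement and a 95 percent percentile interval; the function name `bootstrap_sketch` is hypothetical, and the package may compute its interval differently.

```r
bootstrap_sketch <- function(df, statistic, n_bootstrap = 1000L, seed = 42L) {
  set.seed(seed)
  n <- nrow(df)
  # Recompute the statistic on n rows drawn with replacement, n_bootstrap times.
  distribution <- vapply(seq_len(n_bootstrap), function(i) {
    statistic(df[sample.int(n, n, replace = TRUE), , drop = FALSE])
  }, numeric(1))
  ci <- quantile(distribution, c(0.025, 0.975), names = FALSE)  # assumed percentile CI
  list(
    estimate = statistic(df),
    se = sd(distribution),
    ci_lower = ci[1],
    ci_upper = ci[2],
    distribution = distribution
  )
}

df <- data.frame(x = rnorm(100))
res <- bootstrap_sketch(df, function(d) mean(d$x), n_bootstrap = 500L)
```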
Build an outputs manifest from a directory of artifacts
morie_build_outputs_manifest(output_dir, manifest_path, public_prefix = "data/manifest/outputs", extensions = c("csv", "pdf", "png", "html", "txt", "md"))
output_dir |
Directory containing output files. |
manifest_path |
CSV path to write. |
public_prefix |
Prefix used in |
extensions |
File extensions to include (without dots). |
Manifest data frame.
# Scan a tempdir of output files and build a manifest CSV:
tdir <- tempfile("morie-doc-")
dir.create(tdir)
writeLines("x,y\n1,2", file.path(tdir, "results.csv"))
writeLines("# report", file.path(tdir, "report.md"))
morie_build_outputs_manifest(tdir, file.path(tdir, "outputs_manifest.csv"))
Returns the path to morie.db that ships with the package
(inst/extdata/morie.db). This database contains all CPADS,
CCS, CSADS, CSUS, HealthInfobase, and CIHI datasets pre-loaded as
SQLite tables.
morie_builtin_db()
File path string.
morie_builtin_db()
Removes files cached by morie under
tools::R_user_dir("morie", "cache") (or
MORIE_CACHE_DIR if set). morie's default behaviour writes
caches to a session-scoped tempdir()
subdirectory, so this function only matters if you have explicitly
opted in to persistent caching by passing
cache_dir = morie_cache_dir(...) to any of the morie
fetchers.
morie_cache_clear(subdir = NULL, confirm = interactive())
subdir |
Optional subdirectory under the morie cache root to
target (e.g. |
confirm |
If |
Invisibly, the number of files removed.
# Non-interactive: skip the confirmation prompt.
morie_cache_clear("siu", confirm = FALSE)
morie functions that persist artifacts to disk (e.g.
morie_fetch_siu(cache_html = TRUE)) default to a
session-scoped subdirectory of tempdir(),
which R automatically removes when the session ends. This is the
most conservative CRAN-Policy-compliant default: nothing morie
writes ever survives the R session unless the user explicitly
opts in.
morie_cache_dir(subdir = NULL)
subdir |
Optional subdirectory under the morie cache root
(e.g. |
Users who want persistent caching across sessions opt in by
passing the result of morie_cache_dir(subdir) as the
cache_dir argument, e.g.:
morie_fetch_siu(
cache_dir = morie_cache_dir("siu"),
cache_html = TRUE
)
The persistent location is tools::R_user_dir("morie", "cache")
(R >= 4.0), which on Linux defaults to
~/.cache/R/morie/, on macOS to
~/Library/Caches/org.R-project.R/R/morie/, and on Windows to
%LOCALAPPDATA%/R/cache/R/morie/. Users can override this
location by setting the MORIE_CACHE_DIR environment variable
before calling morie_cache_dir().
Active management. CRAN Policy requires persistent caches
to be actively managed. Use morie_cache_clear() to
empty the persistent cache (or a subdirectory of it). Cached SIU
HTML is ~80-100 MB at full sweep, so clearing it occasionally is
usually unnecessary, but it is supported.
A file path string. The directory is not created; callers create it lazily only when they actually persist to disk.
# Persistent cache root (does not write anything to disk):
morie_cache_dir()
# Per-subsystem persistent path:
morie_cache_dir("siu")
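The resolution order described above (the MORIE_CACHE_DIR environment variable first, then tools::R_user_dir) might look roughly like this. The function name `cache_dir_sketch` is hypothetical and the package's internal logic may differ; like morie_cache_dir(), the sketch only builds a path and never creates the directory.

```r
cache_dir_sketch <- function(subdir = NULL) {
  # An explicit override through the environment variable takes precedence.
  root <- Sys.getenv("MORIE_CACHE_DIR", unset = "")
  if (!nzchar(root)) {
    # Platform-specific per-user cache location (requires R >= 4.0).
    root <- tools::R_user_dir("morie", which = "cache")
  }
  if (is.null(subdir)) root else file.path(root, subdir)
}

cache_dir_sketch()
cache_dir_sketch("siu")
```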
Reads a local file and writes it to the cache so that CI and Docker environments (which may lack the original files) can still run tests.
morie_cache_file(path, table_name, db_path = NULL, con = NULL)
path |
Path to a CSV or RDS file. |
table_name |
Name for the cached table. |
db_path |
Optional path to a SQLite file (default backend). |
con |
Optional pre-opened DBI connection (overrides |
Number of rows cached (invisible).
tdir <- tempfile("morie-cache-")
dir.create(tdir)
f <- file.path(tdir, "demo.csv")
write.csv(data.frame(x = 1:3, y = 4:6), f, row.names = FALSE)
morie_cache_file(f, "demo", db_path = file.path(tdir, "cache.db"))
List all tables in the MORIE cache
morie_cache_list(db_path = NULL, con = NULL)
db_path |
Optional path to a SQLite file (default backend). |
con |
Optional pre-opened DBI connection (overrides |
A data.frame with columns table and rows.
db <- tempfile(fileext = ".db")
morie_cache_store(data.frame(x = 1:3), "demo", db_path = db)
morie_cache_list(db_path = db)
file.remove(db)
Load a table from the MORIE cache
morie_cache_load(table_name, db_path = NULL, con = NULL)
table_name |
Name of the table. |
db_path |
Optional path to a SQLite file (default backend). |
con |
Optional pre-opened DBI connection (overrides |
A data.frame, or NULL if the table does not exist.
db <- tempfile(fileext = ".db")
morie_cache_store(data = data.frame(x = 1:5), table_name = "demo", db_path = db)
morie_cache_load(table_name = "demo", db_path = db)
file.remove(db)
Writes (or replaces) a table in the shared SQLite cache.
morie_cache_store(data, table_name, db_path = NULL, con = NULL)
data |
A data.frame to cache. |
table_name |
Name of the destination table. |
db_path |
Optional path to a SQLite file (default backend). |
con |
Optional pre-opened DBI connection. When supplied, the
table is written through |
Number of rows written (invisible).
db <- tempfile(fileext = ".db")
morie_cache_store(data = data.frame(x = rnorm(50), y = rnorm(50)), table_name = "demo", db_path = db)
file.remove(db)
Compute the continuous estimated Blood Alcohol Concentration using the
standard Widmark formula. Mirrors the Python morie.calculate_ebac().
morie_calculate_ebac(drinks, weight_lbs, hours, gender_constant)
drinks |
Number of standard drinks consumed (1 drink = 14 g alcohol). |
weight_lbs |
Body weight in pounds. |
hours |
Hours elapsed since drinking began. |
gender_constant |
Widmark gender multiplier (0.73 men, 0.66 women). |
The Widmark formula takes a gender constant (0.73 for men, 0.66 for women).
Returned values are clipped at zero.
Non-negative numeric scalar: estimated BAC.
morie_calculate_ebac(drinks = 4, weight_lbs = 180, hours = 2, gender_constant = 0.73)
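One widely used imperial-unit form of the Widmark formula is eBAC = (A * 5.14) / (W * r) - 0.015 * H, where A is fluid ounces of ethanol, W is body weight in pounds, r is the gender constant and H is hours elapsed. The sketch below assumes that form, about 0.6 fl oz of ethanol per 14 g standard drink, and an elimination rate of 0.015 per hour. These constants and the name `ebac_sketch` are assumptions; the values the package actually uses may differ.

```r
ebac_sketch <- function(drinks, weight_lbs, hours, gender_constant) {
  alcohol_floz <- drinks * 0.6   # assumed: one 14 g standard drink is about 0.6 fl oz
  bac <- (alcohol_floz * 5.14) / (weight_lbs * gender_constant) - 0.015 * hours
  max(bac, 0)                    # clip at zero, as the text describes
}

ebac_sketch(drinks = 4, weight_lbs = 180, hours = 2, gender_constant = 0.73)
```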
Mirrors the Python morie.calculate_ipw_weights(). Pure-R, no extra
dependencies.
morie_calculate_ipw_weights(data, treatment, ps_col, stabilized = FALSE, trim_quantiles = NULL)
data |
A |
treatment |
Column name (string) of the binary treatment. |
ps_col |
Column name (string) of the propensity scores. |
stabilized |
If |
trim_quantiles |
Optional length-2 numeric vector |
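Standard IPW: w = 1 / e for treated units and w = 1 / (1 - e) for
controls, with the propensity score e clipped at [0.01, 0.99] for stability.
Stabilised IPW replaces the two unit numerators with the marginal
treatment probability P(T = 1) and its complement 1 - P(T = 1) respectively.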
Numeric vector of IPTW weights, length nrow(data).
set.seed(1)
df <- data.frame(
  t = rbinom(100, 1, 0.4),
  ps = pmin(pmax(runif(100, 0.05, 0.95), 0.05), 0.95)
)
w <- morie_calculate_ipw_weights(df, treatment = "t", ps_col = "ps")
summary(w)
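The weighting rule can be sketched in a few lines of base R. The function name `ipw_sketch` is hypothetical, quantile trimming is omitted, and the marginal treatment probability is estimated by the sample mean of the treatment indicator, which is an assumption about how the package estimates it.

```r
ipw_sketch <- function(treatment, ps, stabilized = FALSE) {
  e <- pmin(pmax(ps, 0.01), 0.99)   # clip propensity scores to [0.01, 0.99]
  p_treat <- mean(treatment)        # marginal treatment probability (sample estimate)
  num_treated <- if (stabilized) p_treat else 1
  num_control <- if (stabilized) 1 - p_treat else 1
  # Treated units are weighted by 1 / e, controls by 1 / (1 - e).
  ifelse(treatment == 1, num_treated / e, num_control / (1 - e))
}

set.seed(1)
t_demo <- rbinom(100, 1, 0.4)
ps_demo <- runif(100, 0.05, 0.95)
summary(ipw_sketch(t_demo, ps_demo))
summary(ipw_sketch(t_demo, ps_demo, stabilized = TRUE))
```

Stabilised weights have a mean closer to 1, which usually reduces the variance of the weighted estimator.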
Adjusts initial design weights so that weighted marginal totals match known population totals for each auxiliary variable.
morie_calibration_weights(df, aux_vars, population_totals, initial_weights = NULL, max_iter = 50L, tol = 1e-06)
df |
A data frame. |
aux_vars |
Character vector of categorical auxiliary variable names. |
population_totals |
Named list: |
initial_weights |
Optional numeric vector of starting weights. |
max_iter |
Maximum IPF iterations. |
tol |
Convergence tolerance. |
Numeric vector of calibrated weights.
set.seed(1)
df <- data.frame(
  region = sample(c("A", "B"), 100, TRUE),
  sex = sample(c("M", "F"), 100, TRUE)
)
totals <- list(region_A = 60, region_B = 40, sex_M = 55, sex_F = 45)
morie_calibration_weights(df, aux_vars = c("region", "sex"), population_totals = totals)
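Iterative proportional fitting (raking) of this kind can be sketched as below. The function name `rake_sketch` is hypothetical, and the `variable_level` naming of `population_totals` is inferred from the example above; the package's convergence criterion may differ from the one assumed here.

```r
rake_sketch <- function(df, aux_vars, population_totals,
                        initial_weights = NULL, max_iter = 50L, tol = 1e-6) {
  w <- if (is.null(initial_weights)) rep(1, nrow(df)) else initial_weights
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (v in aux_vars) {
      for (lev in unique(df[[v]])) {
        target <- population_totals[[paste(v, lev, sep = "_")]]
        idx <- df[[v]] == lev
        ratio <- target / sum(w[idx])   # scale this margin to its known total
        w[idx] <- w[idx] * ratio
        max_change <- max(max_change, abs(ratio - 1))
      }
    }
    if (max_change < tol) break         # every margin already matches
  }
  w
}

set.seed(1)
df <- data.frame(
  region = sample(c("A", "B"), 100, TRUE),
  sex = sample(c("M", "F"), 100, TRUE)
)
totals <- list(region_A = 60, region_B = 40, sex_M = 55, sex_F = 45)
w <- rake_sketch(df, c("region", "sex"), totals)
tapply(w, df$region, sum)
tapply(w, df$sex, sum)
```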
Canonicalize raw CPADS PUMF columns
morie_canonicalize_cpads_data(data)
data |
Raw CPADS data frame. |
Data frame with canonical MORIE analysis columns.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Thin wrapper around CausalImpact::CausalImpact() (Brodersen
et al. 2015). Fits a Bayesian structural time-series counterfactual
to a single-series treatment using the pre-intervention window and
reports the post-intervention causal effect with credible
intervals.
morie_causal_impact(data, pre_period, post_period, model_args = NULL, alpha = 0.05)
data |
A data frame, matrix, or |
pre_period |
Integer length-2 vector giving the start and end
row indices (or time indices for |
post_period |
Integer length-2 vector giving the start and end row indices of the post-intervention window. |
model_args |
Optional named list passed to
|
alpha |
Tail probability for the posterior credible intervals; coverage is 1 - alpha (default 0.05, meaning 95 percent intervals). |
Hard-errors if CausalImpact is not installed – the upstream
Kalman-filter + spike-and-slab machinery has no compact inline
equivalent. The wrapper is documented as an extender so that
downstream rmorie callers have a stable morie_* entry point
to the package.
Named list with elements average_effect,
cumulative_effect, ci_lower, ci_upper,
posterior_prob_causal, and summary (the upstream
CausalImpact summary matrix), plus the original
impact object.
Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL (2015). Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics, 9(1):247-274.
Thin wrapper around sandwich variance estimators. Returns the requested variance-covariance matrix and the corresponding robust standard errors. Supports HC0-HC5 (cross-section), HAC (time-series), and clustered (one-way) variance.
morie_causal_robust_se(model, type = "HC3", cluster = NULL, ...)
model |
A fitted model object (typically from
|
type |
One of |
cluster |
Optional one-sided formula or vector identifying
the cluster variable (required when |
... |
Additional arguments forwarded to the chosen sandwich estimator. |
Hard-errors if sandwich is not installed – the HC sandwich algebra is well-tested upstream and re-implementing it inline would be both lengthy and error-prone.
Named list with elements vcov (variance matrix),
se (named numeric vector of robust SEs), type,
and n_coef.
Zeileis A, Koll S, Graham N (2020). Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R. Journal of Statistical Software, 95(1), 1-36.
Thin wrapper around WeightIt::weightit() exposing the full
WeightIt method palette ("glm", "cbps",
"ebal", "ps", "energy", "optweight",
and any future additions). Provides MORIE callers with a stable
morie_* entry point for balancing weights while preserving
the underlying object so callers can pipe into
survey::svyglm or cobalt::bal.tab downstream.
morie_causal_weighting(data, treatment, covariates, method = "glm", estimand = c("ATE", "ATT", "ATC"), ...)
data |
A data frame. |
treatment |
Name of the treatment column (binary, multinomial,
or continuous depending on |
covariates |
Character vector of covariate names. |
method |
One of |
estimand |
One of "ATE", "ATT", "ATC". |
... |
Additional arguments forwarded to
|
Hard-errors if WeightIt is not installed – the multi-method weighting machinery has no compact inline equivalent.
Named list with elements weights (numeric vector),
propensity_scores (numeric vector or NULL),
method, estimand, ess (effective sample
size), and weightit (the original WeightIt object).
Greifer N (2024). WeightIt: Weighting for Covariate Balance in Observational Studies. R package version 1.4.0.
Check whether a downstream package's SPDX is GPL-compatible
morie_check_plugin_license(plugin_spdx, raise_on_incompatible = FALSE)
plugin_spdx |
SPDX identifier (e.g. |
raise_on_incompatible |
If |
TRUE if compatible. Issues a warning (or error)
otherwise.
morie_check_plugin_license("MIT")
## Not run:
# The next call demonstrates the error path; runs only on
# explicit example() with run.dontrun = TRUE.
morie_check_plugin_license("LicenseRef-Proprietary", raise_on_incompatible = TRUE)
## End(Not run)
Chi-square test of independence or goodness-of-fit
morie_chi_square_test(observed, expected = NULL)
observed |
Observed counts (matrix for independence, vector for GOF). |
expected |
Expected counts for GOF (optional; uniform if NULL). |
Named list: chi_sq, df, p_value, cramers_v.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Wraps the CKAN package_search action so users can discover
datasets that are not in the built-in MORIE catalog and fetch them
through morie_fetch_ckan or morie_fetch.
morie_ckan_search(query, portal = "open.canada.ca", rows = 25L, ...)
query |
Free-text search string. |
portal |
A known portal name ( |
rows |
Maximum number of datasets to return (default 25). |
... |
Extra named CKAN |
A data.frame with one row per resource, columns:
dataset_title, dataset_id, resource_id,
resource_name, format, datastore_active,
url. Feed resource_id into
morie_fetch_ckan(resource_id = ...).
## Not run:
hits <- morie_ckan_search("cannabis survey", portal = "open.canada.ca")
head(hits[, c("dataset_title", "resource_id", "format")])
## End(Not run)
Classify one variable.
morie_classify_variable(col_name, dtype = "string", valid_values = NULL, dataset_name = "unknown")
col_name |
Character; the column name. |
dtype |
Character; one of |
valid_values |
Optional character vector of closed-set values. |
dataset_name |
Character; the owning dataset id (e.g.
|
A named list with classes morie_variable_taxonomy /
list.
Walks morie_datasets_browse() (9242 datasets across all portals)
and writes a single CSV with the columns the C++ rmorie binary's
catalog parser expects: dataset_key, portal, title,
description, url, license, formats.
morie_cli_dump_catalog(out_path = NULL)
out_path |
Where to write. Defaults to
|
The CLI binary prefers this unified file when present; otherwise it
falls back to scanning every inst/extdata/*_catalog.csv (which
have heterogeneous per-portal schemas and don't parse cleanly).
Idempotent: regenerate after any change to the catalog registry.
Invisibly, the path written.
Randomly selects n_clusters clusters, then takes all units within
selected clusters.
morie_cluster_sample(df, cluster_col, n_clusters, seed = 42L)
df |
A data frame. |
cluster_col |
Name of the cluster identifier column. |
n_clusters |
Number of clusters to select. |
seed |
Random seed. |
Data frame of selected units with .weight column.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
CNN genomic predictor (Conv1D + GAP + dense, base R)
morie_cnn_genomic(x, y, markers, n_filters = 8, kernel = 3, hidden = 8, n_epochs = 150, lr = 0.01, l2 = 0.001, seed = 0, deterministic_seed = NULL)
x |
Optional fixed-effect design. |
y |
Numeric response. |
markers |
(n x m) genotype matrix. |
n_filters, kernel, hidden, n_epochs, lr, l2, seed |
Hyperparameters. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, y_hat, W_conv, b_conv, W1, b1, w2, b2, se, n, method).
Montesinos Lopez Ch 13.
morie_cnn_genomic(x = rnorm(50), y = rnorm(50), markers = matrix(sample(0:2, 200, TRUE), 50, 4))
Magnitude-squared coherence between two time series
morie_coherence(x, y, nperseg = NULL, fs = 1)
x |
Numeric vector. |
y |
Numeric vector (same length). |
nperseg |
Segment length. Default n/4. |
fs |
Sampling frequency. Default 1. |
Named list with frequencies, morie_coherence, n_segments,
nperseg, fs, n, method.
morie_coherence(x = rnorm(50), y = rnorm(50))
Thin extender over coin::independence_test for
conditional / permutation tests of independence between
arbitrary response and covariate combinations.
morie_coin_independence(formula, data, ...)
formula |
A model formula, e.g. |
data |
A data frame. |
... |
Further arguments forwarded to
|
A list with $method =
"coin::independence_test" and $raw (an IndependenceTest
object).
Thin extender over coin::oneway_test for the
permutation analogue of the classical one-way ANOVA.
morie_coin_oneway(formula, data, ...)
formula |
A formula |
data |
A data frame. |
... |
Further arguments forwarded to
|
A list with $method = "coin::oneway_test" and
$raw (an IndependenceTest object).
Thin extender over coin::wilcox_test for two-sample
permutation Wilcoxon (Mann-Whitney) tests.
morie_coin_wilcoxon(formula, data, ...)
formula |
A two-sided formula |
data |
A data frame. |
... |
Further arguments forwarded to
|
A list with $method = "coin::wilcox_test" and
$raw (an IndependenceTest object).
Mirrors the Python morie.compare_nested_logistic_models(). Fits a
reduced and a full logistic model (the reduced model's predictors must be
a subset of the full model's), then performs an analysis-of-deviance LRT.
morie_compare_nested_logistic_models(data, outcome, predictors_full, predictors_reduced)
data |
A |
outcome |
Column name of the binary outcome. |
predictors_full |
Character vector: full model's predictors. |
predictors_reduced |
Character vector: reduced model's predictors.
Must be a subset of |
A list with chi_sq, df, p_value, aic_full, aic_reduced,
n.
set.seed(1)
df <- data.frame(
  y = rbinom(200, 1, 0.4),
  x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200)
)
morie_compare_nested_logistic_models(df,
  outcome = "y",
  predictors_full = c("x1", "x2", "x3"),
  predictors_reduced = c("x1")
)
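The underlying analysis-of-deviance likelihood-ratio test can be reproduced with stats::glm alone. This is a sketch of the technique rather than the package's code; the variable names are illustrative.

```r
set.seed(1)
df <- data.frame(
  y = rbinom(200, 1, 0.4),
  x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200)
)

# Fit the reduced and full logistic models on the same rows.
fit_reduced <- glm(reformulate("x1", response = "y"),
                   data = df, family = binomial())
fit_full <- glm(reformulate(c("x1", "x2", "x3"), response = "y"),
                data = df, family = binomial())

# LRT statistic: drop in residual deviance, referred to a chi-square
# distribution with df equal to the number of extra parameters.
chi_sq <- fit_reduced$deviance - fit_full$deviance
df_diff <- fit_reduced$df.residual - fit_full$df.residual
p_value <- pchisq(chi_sq, df = df_diff, lower.tail = FALSE)

list(chi_sq = chi_sq, df = df_diff, p_value = p_value,
     aic_full = AIC(fit_full), aic_reduced = AIC(fit_reduced), n = nrow(df))
```

The same statistic is reported by `anova(fit_reduced, fit_full, test = "Chisq")`.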
Compute inverse-probability design weights
morie_compute_design_weights(df, strata_col, population_sizes)
df |
A data frame. |
strata_col |
Name of the stratification column. |
population_sizes |
Named integer vector: stratum level -> population size. |
Numeric vector of design weights (same length as nrow(df)).
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
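For stratified simple random sampling, the inverse-probability design weight of a unit in stratum h is N_h / n_h, the stratum population size divided by the stratum sample size. The sketch below assumes that design; the name `design_weights_sketch` is hypothetical and the package may handle other designs or edge cases differently.

```r
design_weights_sketch <- function(df, strata_col, population_sizes) {
  strata <- as.character(df[[strata_col]])
  n_h <- table(strata)   # sample size per stratum
  # Inclusion probability is n_h / N_h, so the weight is N_h / n_h.
  as.numeric(population_sizes[strata]) / as.numeric(n_h[strata])
}

demo <- data.frame(stratum = c("urban", "urban", "rural"))
pop <- c(urban = 100L, rural = 50L)
w_design <- design_weights_sketch(demo, "stratum", pop)
sum(w_design)   # recovers the total population size
```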
Supports incomplete rankings via NA entries. For complete rankings, W = 12 S / (k^2 (n^3 - n)) where S is the sum of squared deviations of object rank-sums from their mean. Significance via chi-square approximation k(n-1) W ~ chi-square with n-1 df.
morie_concordance_incomplete(x)
x |
Matrix (n objects rows x k rankers cols); NA = not ranked. |
Named list: statistic (W), p_value, df, chi2, n, k.
morie_concordance_incomplete(x = rnorm(50))
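For the complete-rankings case, the formula for W and its chi-square approximation can be sketched directly. The function name `kendall_w_sketch` is hypothetical; the sketch assumes no ties and no NA entries, so it does not cover the incomplete-ranking extension.

```r
kendall_w_sketch <- function(x) {
  # x: n objects (rows) by k rankers (columns), complete rankings without ties
  n <- nrow(x)
  k <- ncol(x)
  rank_sums <- rowSums(x)
  S <- sum((rank_sums - mean(rank_sums))^2)   # squared deviations of rank sums
  W <- 12 * S / (k^2 * (n^3 - n))
  chi2 <- k * (n - 1) * W                     # approximately chi-square, n - 1 df
  list(
    statistic = W, chi2 = chi2, df = n - 1,
    p_value = pchisq(chi2, df = n - 1, lower.tail = FALSE),
    n = n, k = k
  )
}

# Three rankers in perfect agreement over five objects.
agree <- cbind(1:5, 1:5, 1:5)
res_w <- kendall_w_sketch(agree)
res_w$statistic
```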
Manually constructs the confusion matrix to avoid the caret dependency for what is fundamentally a tabulation.
morie_confusion_matrix_metrics(y_true, y_pred, labels = NULL)
y_true |
Observed labels. |
y_pred |
Predicted labels. |
labels |
Optional ordering vector. |
Named list: estimate (accuracy), accuracy, confusion_matrix, labels, precision, recall, f1, macro_precision, macro_recall, macro_f1, weighted_f1, n, method.
morie_confusion_matrix_metrics(y_true = rbinom(50, 1, 0.5), y_pred = rbinom(50, 1, 0.5))
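The tabulation behind these metrics needs only base::table. The sketch below uses made-up labels and puts observed classes in rows and predicted classes in columns; that orientation is an assumption, and the package's `confusion_matrix` may be laid out differently.

```r
y_true <- c(1, 0, 1, 1, 0, 0, 1, 0)
y_pred <- c(1, 0, 0, 1, 0, 1, 1, 0)

labels <- sort(unique(c(y_true, y_pred)))
cm <- table(
  observed = factor(y_true, levels = labels),
  predicted = factor(y_pred, levels = labels)
)

accuracy <- sum(diag(cm)) / sum(cm)
precision <- diag(cm) / colSums(cm)   # correct / predicted, per class
recall <- diag(cm) / rowSums(cm)      # correct / observed, per class
f1 <- 2 * precision * recall / (precision + recall)
macro_f1 <- mean(f1)
weighted_f1 <- sum(f1 * rowSums(cm)) / sum(cm)
```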
C = sqrt(chi^2 / (chi^2 + n)). Also reports Cramer's V and the maximum attainable C = sqrt((min(r,c)-1)/min(r,c)).
morie_contingency_coefficient(x)
x |
A 2-D contingency table of counts. |
Named list: statistic (C), morie_cramers_v, chi2, p_value, df, max_C, n.
morie_contingency_coefficient(x = matrix(sample(1:5, 50, TRUE), 10, 5))
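The three quantities follow directly from the Pearson chi-square statistic. The sketch below uses stats::chisq.test without continuity correction on a made-up 2 x 2 table; whether the package applies a correction is not stated, so that choice is an assumption.

```r
tab <- matrix(c(20, 5, 10, 15), nrow = 2)

chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n <- sum(tab)
q <- min(dim(tab))                        # min(r, c)

C <- sqrt(chi2 / (chi2 + n))              # Pearson contingency coefficient
max_C <- sqrt((q - 1) / q)                # largest value C can attain
cramers_v <- sqrt(chi2 / (n * (q - 1)))   # Cramer's V

c(C = C, max_C = max_C, cramers_v = cramers_v)
```

Because C cannot reach 1, comparing it against `max_C` gives a fairer sense of the strength of association than reading C alone.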
Mann-Whitney vs. control for each treatment group; Bonferroni-adjusted p-values by default.
morie_control_comparison(groups, control_index = 1L, adjust = c("bonferroni", "none"))
groups |
List of numeric vectors; first (or |
control_index |
Integer position of the control group. Default 1. |
adjust |
One of "bonferroni", "none". |
Named list: statistic, p_value, p_adjusted, n, k, control_n.
morie_control_comparison(groups = list(rnorm(20), rnorm(20), rnorm(20)))
Two-sample median test: contingency-table chi-square on the counts above/below the pooled-sample median.
morie_control_median_test(x, y)
x |
Numeric vector (control). |
y |
Numeric vector (treatment). |
Named list: statistic, p_value, df, n, grand_median, table.
morie_control_median_test(x = rnorm(50), y = rnorm(50))
Thin extender over copula::fitCopula that estimates
copula parameters from pseudo-observations on [0, 1]^d.
morie_copula_fit(copula, data, ...)
copula |
A |
data |
Numeric matrix of pseudo-observations on
[0, 1]^d. |
... |
Further arguments forwarded to
|
A list with $method = "copula::fitCopula" and
$raw (a fitCopula object with the estimated
parameters, log-likelihood and variance estimates).
## Not run:
if (requireNamespace("copula", quietly = TRUE)) {
  set.seed(1)
  cop <- copula::normalCopula(0.5, dim = 2)
  u <- copula::rCopula(200, cop)
  morie_copula_fit(copula::normalCopula(dim = 2), data = u)
}
## End(Not run)
Thin extender over copula::rCopula that generates
draws on [0, 1]^d from a specified copula.
morie_copula_sample(n, copula, ...)
n |
Integer; the number of multivariate observations to draw. |
copula |
A |
... |
Further arguments forwarded to |
A list with $method = "copula::rCopula" and
$raw (a numeric matrix of dimension n x d
with values in [0, 1]).
## Not run:
if (requireNamespace("copula", quietly = TRUE)) {
  set.seed(1)
  morie_copula_sample(100, copula::claytonCopula(2, dim = 3))
}
## End(Not run)
Returns the 12 morie short-name -> CKAN resource-id map for the Ontario "Use of Force in Correctional Institutions" dataset on data.ontario.ca. Useful for catalog discovery + sanity tests.
morie_corrections_uof_resource_ids()
Named list of 12 CKAN resource ids.
ids <- morie_corrections_uof_resource_ids()
length(ids)
names(ids)
First-pass canonicalization layer based on the public CPADS PUMF
field names. If frame already carries the canonical columns it is
returned unchanged (after validation). Otherwise raw PUMF columns
are remapped using .MORIE_CPADS_RAW_COLUMN_MAP and missing/DKNR
codes (98, 99) are converted to NA.
morie_cpads_canonicalize_frame(frame)
frame |
A |
A data.frame with the canonical CPADS analysis columns.
Describes the morie CPADS contract: the canonical analysis variables expected in a wrangled frame, the raw -> canonical column map, and the conventional on-disk cache path used when a user has wrangled the PUMF themselves.
morie_cpads_contract()
CPADS is open data (Open Government Licence – Canada). The
Public Use Microdata File is available at open.canada.ca, dataset
736fa9b2-62e4-4e31-aea4-51869605b363 (resource
d2639429-c304-45a6-90b3-770562f4d46d,
file cpads-2021-2022-pumf2.csv). Aggregate dashboards at
https://health-infobase.canada.ca/substance-use/reports/cpads/.
morie ships a 30-row synthetic file at
inst/extdata/cpads_pumf_synthetic.csv for offline CRAN-safe
tests; morie_datasets_cpads(offline = FALSE) fetches the live
PUMF. Earlier morie versions wrongly claimed CPADS was
"FOI/agreement-only"; that was incorrect and has been retracted
as of 3MMM.
A named list with fields source_kind,
expected_wrangled_path, required_variables,
raw_column_map, and note.
contract <- morie_cpads_contract()
contract$required_variables
Accepts either wtpumf (PUMF release) or wtdf (full dataset) as
the weight column; otherwise all documented raw PUMF columns must be
present.
morie_cpads_has_raw_columns(frame)
frame |
A |
Logical scalar; TRUE if the frame contains the raw CPADS
PUMF schema, FALSE otherwise.
Recognises .csv, .xlsx / .xls, and .rds. Raises an error for any
other extension.
morie_cpads_infer_file_format(path)
path |
A character scalar file path. |
One of "csv", "excel", or "rds".
morie_cpads_infer_file_format("data/cache/cpads_pumf_wrangled.rds")
Identify missing canonical CPADS variables in a column set.
morie_cpads_missing_variables(columns)
columns |
Character vector of column names (e.g. |
Character vector of missing canonical CPADS variables (empty if every required variable is present).
morie_cpads_missing_variables(c("weight", "alcohol_past12m"))
Validate a data frame against the canonical CPADS analysis contract.
morie_cpads_validate_frame(frame, strict = TRUE)
frame |
A |
strict |
Logical; if |
Character vector of missing canonical variable names (invisibly when strict and complete).
## Not run:
morie_cpads_validate_frame(df, strict = TRUE)
## End(Not run)
Phase 3JJJ1. Inverse of morie_crypto_chacha20_poly1305_encrypt().
Accepts the full ciphertext || tag buffer (concatenate as
c(ct, tag)).
morie_crypto_chacha20_poly1305_decrypt(key, nonce, ct_with_tag, aad = raw(0))
key |
32-byte raw vector. |
nonce |
12-byte raw vector. |
ct_with_tag |
Raw vector containing ciphertext appended with the 16-byte tag. |
aad |
Optional raw vector of additional authenticated data. |
Decrypted plaintext as raw vector.
Phase 3JJJ1. Wraps libsodium's
crypto_aead_chacha20poly1305_ietf_encrypt (RFC 8439 IETF
variant: 32-byte key, 12-byte nonce, 16-byte authentication tag).
morie_crypto_chacha20_poly1305_encrypt(key, nonce, plaintext, aad = raw(0))
key |
32-byte raw vector. |
nonce |
12-byte raw vector (single-use per key; reuse is catastrophic). |
plaintext |
Raw vector to encrypt (may be empty). |
aad |
Optional raw vector of additional authenticated data (default empty). |
Byte-compatible with the Python morie
chacha20_poly1305_encrypt(key, nonce, plaintext, aad). The C
transport returns ciphertext || tag as a single buffer;
this R wrapper splits it into list(ct = ..., tag = ...) to
match the Python tuple return shape.
List with ct (raw vector, length = length(plaintext))
and tag (raw vector, 16 bytes).
if (morie_crypto_sodium_available()) {
  k <- morie_crypto_random_bytes(32)
  n <- morie_crypto_random_bytes(12)
  r <- morie_crypto_chacha20_poly1305_encrypt(k, n, charToRaw("hello"))
  p <- morie_crypto_chacha20_poly1305_decrypt(k, n, c(r$ct, r$tag))
  rawToChar(p)
}
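The relationship between the single ciphertext || tag buffer and the list(ct, tag) shape can be sketched with plain raw vectors. split_ct_tag() is a hypothetical helper written for this illustration, and the bytes are stand-ins rather than real ciphertext.

```r
TAG_LEN <- 16L  # Poly1305 tag length in bytes

# Hypothetical helper: split a ciphertext || tag buffer into its two parts.
split_ct_tag <- function(buf) {
  stopifnot(is.raw(buf), length(buf) >= TAG_LEN)
  n <- length(buf)
  list(
    ct  = buf[seq_len(n - TAG_LEN)],
    tag = buf[(n - TAG_LEN + 1L):n]
  )
}

buf   <- as.raw(0:20)                   # 5 stand-in "ciphertext" bytes + 16 "tag" bytes
parts <- split_ct_tag(buf)
length(parts$ct)                        # 5
length(parts$tag)                       # 16
identical(c(parts$ct, parts$tag), buf)  # TRUE: c(ct, tag) rebuilds the buffer
```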
Phase 3JJJ1. Mirrors the Python
morie.crypto.hkdf_sha256(ikm, length=32, salt=b"", info=b"")
byte-for-byte. Empty salt defaults to a 32-byte zero-filled
salt per RFC 5869 §2.2 (matches Python).
morie_crypto_hkdf_sha256(ikm, length = 32L, salt = raw(0), info = raw(0))
ikm |
Input keying material (raw vector). |
length |
Output length in bytes (1..8160). |
salt |
Optional salt raw vector. Empty -> zero-fill. |
info |
Optional context/application info raw vector. |
Derived key material as a raw vector whose size in bytes is given by the length argument.
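The documented output range (1..8160 bytes) matches the RFC 5869 bound of 255 hash blocks for SHA-256. A small arithmetic sketch, using hypothetical names (hash_len, max_len, hkdf_blocks) that are not part of the package:

```r
hash_len <- 32L              # SHA-256 digest size in bytes
max_len  <- 255L * hash_len  # RFC 5869 limit on HKDF output length
max_len                      # 8160

# Number of HMAC blocks the expand step needs for a requested output length.
hkdf_blocks <- function(out_len) {
  stopifnot(out_len >= 1L, out_len <= max_len)
  as.integer(ceiling(out_len / hash_len))
}

hkdf_blocks(32L)    # 1
hkdf_blocks(33L)    # 2
hkdf_blocks(8160L)  # 255
```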
Hybrid decrypt: ML-KEM-768 + ChaCha20-Poly1305
morie_crypto_hybrid_decrypt(ciphertext, recipient_sk)
ciphertext |
Raw vector container. |
recipient_sk |
Raw vector: recipient's ML-KEM-768 secret key. |
Raw vector of decrypted plaintext.
Hybrid encrypt: ML-KEM-768 + ChaCha20-Poly1305
morie_crypto_hybrid_encrypt(plaintext, recipient_pk)
plaintext |
Raw vector or character string to encrypt. |
recipient_pk |
Raw vector: recipient's ML-KEM-768 public key. |
Raw vector container.
Generate an ML-KEM-768 key pair for hybrid encryption
morie_crypto_hybrid_keygen()
A named list with pk (raw) and sk (raw).
Create a new empty morie keystore
morie_crypto_keystore_create(password, path = .morie_keystore_default_path())
password |
Character scalar: keystore password. |
path |
File path. |
Invisibly, NULL.
List key names in the morie keystore
morie_crypto_keystore_list(password, path = .morie_keystore_default_path())
password |
Character scalar. |
path |
Keystore path. |
Character vector of identifiers.
Load a key pair from the morie keystore
morie_crypto_keystore_load(name, password, path = .morie_keystore_default_path())
name |
Identifier. |
password |
Character scalar. |
path |
Keystore path. |
Named list with pk (raw) and sk (raw).
Store a key pair in the morie keystore
morie_crypto_keystore_store(name, pk, sk, password, path = .morie_keystore_default_path())
name |
Identifier. |
pk |
Raw vector: public key. |
sk |
Raw vector: secret key. |
password |
Character scalar. |
path |
Keystore path. |
Invisibly, NULL.
Phase 3JJJ2. Returns TRUE when morie's compiled .so was linked
against the Open Quantum Safe library at install time. If
FALSE, install liboqs and reinstall morie:
morie_crypto_liboqs_available()
macOS: brew install liboqs
Debian: sudo apt-get install liboqs-dev
(or build from source: https://github.com/open-quantum-safe/liboqs)
Single logical.
liboqs runtime version string
morie_crypto_liboqs_version()
Single character (e.g. "0.15.0"); empty if liboqs absent.
Phase 3JJJ2. Generates a post-quantum signature keypair.
Sizes: pk = 1952 bytes, sk = 4032 bytes.
morie_crypto_mldsa65_keygen()
List with pk (raw, 1952 B) and sk (raw, 4032 B).
Sign a message with an ML-DSA-65 secret key. Signature length is variable up to a 3309-byte ceiling (typical: ~3293 B).
morie_crypto_mldsa65_sign(sk, message)
sk |
4032-byte raw vector (signer's secret key). |
message |
Raw vector to sign. |
Raw vector signature.
ML-DSA-65 signature verification
morie_crypto_mldsa65_verify(pk, message, signature)
pk |
1952-byte raw vector (signer's public key). |
message |
Raw vector that was signed. |
signature |
Raw vector signature returned by
|
Single logical: TRUE if signature is valid.
Recover the shared secret from an encapsulation ciphertext using the recipient's secret key.
morie_crypto_mlkem768_decaps(sk, ct)
sk |
2400-byte raw vector (recipient's ML-KEM-768 secret key). |
ct |
1088-byte raw vector (sender's encapsulation ciphertext). |
Raw vector (32 B), the shared secret.
Encapsulate a shared secret under a recipient's ML-KEM-768 public key. Returns the ciphertext (1088 B) the sender transmits, plus the 32-byte shared secret the sender holds locally.
morie_crypto_mlkem768_encaps(pk)
pk |
1184-byte raw vector (recipient's ML-KEM-768 public key). |
List with ct (raw, 1088 B) and shared_secret (raw, 32 B).
Phase 3JJJ2. Generates a post-quantum key encapsulation keypair.
Sizes: pk = 1184 bytes, sk = 2400 bytes.
morie_crypto_mlkem768_keygen()
List with pk (raw, 1184 B) and sk (raw, 2400 B).
if (morie_crypto_liboqs_available()) {
  kp <- morie_crypto_mlkem768_keygen()
  c(pk = length(kp$pk), sk = length(kp$sk))
}
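The byte sizes quoted for ML-KEM-768 in this reference can be collected into a small lookup for sanity-checking raw vectors before they are passed around. The names mlkem768_sizes and check_mlkem768_len() are hypothetical and exist only in this sketch.

```r
# Sizes quoted in this reference for ML-KEM-768, in bytes.
mlkem768_sizes <- c(pk = 1184L, sk = 2400L, ct = 1088L, shared_secret = 32L)

# Hypothetical helper: TRUE when `x` is a raw vector of the expected size.
check_mlkem768_len <- function(x, what) {
  expected <- mlkem768_sizes[[what]]
  is.raw(x) && length(x) == expected
}

check_mlkem768_len(raw(1184), "pk")  # TRUE
check_mlkem768_len(raw(1088), "sk")  # FALSE: 1088 is the ciphertext size
```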
Phase 3JJJ1. Wraps libsodium's randombytes_buf.
morie_crypto_random_bytes(n)
n |
Number of bytes to generate. |
Raw vector of length n.
Phase 3JJJ1. Returns TRUE when morie's compiled .so was linked
against libsodium at install time (detected by ./configure via
pkg-config --libs libsodium or a bare -lsodium probe). If
FALSE, install libsodium and reinstall morie:
morie_crypto_sodium_available()
macOS: brew install libsodium
Debian: sudo apt-get install libsodium-dev
Fedora: sudo dnf install libsodium-devel
Single logical.
morie_crypto_sodium_available()
Phase 3JJJ1. Returns the bundled libsodium version (e.g.,
"1.0.20"); empty string if libsodium wasn't linked.
morie_crypto_sodium_version()
Single character.
Returns a data.frame describing every dataset available through the MORIE data management system. Each row maps a short catalog key to its source, survey, year, file format, local path, SQLite table name, and CKAN resource ID (if available).
morie_dataset_catalog()
Keys match the Python DATASET_CATALOG in data.py exactly.
Use morie_load_dataset to load by key.
A data.frame with 44 rows (one per dataset) and columns:
key, name, source, survey, year, format, type, large_file,
local_path, table_name, ckan_resource_id, download_url, zip_member.
The download_url / zip_member columns are empty for
datasets reachable through the SQLite cache or the CKAN datastore.
cat <- morie_dataset_catalog()
nrow(cat)
head(cat[, c("key", "name", "source", "year")])
# Find Ontario carceral datasets:
cat[
  grepl("OTIS|Ontario", paste(cat$source, cat$survey)),
  c("key", "year")
]
Build a single-column profile record.
morie_dataset_column_profile(
  series,
  name,
  ordinal_threshold = 10L,
  binary_threshold = 2L
)
series |
A vector. |
name |
Column name. |
ordinal_threshold |
Integer; passed to |
binary_threshold |
Integer; max unique values to count as binary (default 2). |
Named list with fields name, dtype, level, n_unique,
missing_pct, is_binary, is_constant, suggested_role,
summary_stats.
Detect the suggested epidemiological role of a column.
morie_dataset_detect_role(x, name)
x |
A vector. |
name |
Column name (drives the heuristic patterns). |
One of "id", "weight", "stratum", "cluster",
"treatment", "outcome", "covariate".
Decision rules, in order:
Character / factor with n_unique <= ordinal_threshold and an ordinal
name hit (likert/grade/scale/...): "ordinal".
Character / factor otherwise: "nominal".
Logical: "nominal".
Numeric with n_unique <= 2 (binary): "nominal".
Numeric with n_unique <= 20 + ordinal name hit: "ordinal".
Double with interval name hit (year/index/date/...): "interval".
Double otherwise: "ratio".
Integer with non-negative range: "ratio"; else "interval".
Date / POSIXct: "interval".
morie_dataset_infer_level(x, name = NULL, ordinal_threshold = 10L)
x |
A vector (any atomic type or factor). |
name |
Optional column name to drive the name-based heuristics.
Defaults to |
ordinal_threshold |
Integer; max unique values for a categorical column to be considered ordinal (default 10). |
Character scalar; one of "nominal", "ordinal", "interval",
"ratio".
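The decision rules listed for level inference can be sketched in base R as follows. This is a simplified illustration, not the package's code: infer_level_sketch() is hypothetical, the name patterns are limited to the examples given in the rule list, dropping NA values before counting unique values is an assumption, and the Date / POSIXct check is moved to the front for convenience.

```r
infer_level_sketch <- function(x, name = "", ordinal_threshold = 10L) {
  # Hypothetical name patterns, taken from the examples in the rule list.
  ordinal_hit  <- grepl("likert|grade|scale", name, ignore.case = TRUE)
  interval_hit <- grepl("year|index|date", name, ignore.case = TRUE)
  n_unique <- length(unique(x[!is.na(x)]))

  if (inherits(x, c("Date", "POSIXct"))) return("interval")
  if (is.character(x) || is.factor(x)) {
    if (n_unique <= ordinal_threshold && ordinal_hit) return("ordinal")
    return("nominal")
  }
  if (is.logical(x)) return("nominal")
  if (is.numeric(x)) {
    if (n_unique <= 2L) return("nominal")
    if (n_unique <= 20L && ordinal_hit) return("ordinal")
    if (is.double(x)) return(if (interval_hit) "interval" else "ratio")
    # Integer: a non-negative range reads as a count (ratio), else interval.
    return(if (min(x, na.rm = TRUE) >= 0L) "ratio" else "interval")
  }
  stop("Unsupported vector type")
}

infer_level_sketch(c("a", "b", "c"), "region")          # "nominal"
infer_level_sketch(c(1L, 2L, 3L, 4L, 5L), "likert_q1")  # "ordinal"
infer_level_sketch(c(2019, 2020, 2021), "year")         # "interval"
infer_level_sketch(c(1.5, 2.7, 9.1), "income")          # "ratio"
```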
Get metadata for a single dataset
morie_dataset_info(key)
key |
Dataset catalog key (or fuzzy match). |
A named list with dataset metadata.
# Use a real catalog key (run `morie_dataset_catalog()$key` to list them):
info <- morie_dataset_info("ocp21")
info$source
info$year
# Fuzzy match works for partial / forgiving keys:
morie_dataset_info("cpads")$key
File format is detected from the extension. Supported extensions:
.csv, .tsv, .xlsx / .xls, .parquet / .pq, .json /
.jsonl.
morie_dataset_load(path, encoding = "UTF-8", ...)
path |
Character; file path. |
encoding |
Character; encoding for text formats (default
|
... |
Forwarded to the underlying reader
( |
A data.frame.
Aggregates every per-portal registry (Chicago, NYC NYPD, NYC
OpenData, TPS ArcGIS Hub, TPS PSDP, Ontario CKAN, Vancouver,
VPD GeoDASH, Statistics Canada CCJS, Montreal, Toronto, Calgary,
Edmonton, Ottawa) into a single tidy data.frame for cross-portal
discovery and tooling. Caches the result in a session-local
environment so repeated calls are O(1); call
morie_dataset_portal_catalog_clear_cache() to force a rebuild
after editing a registry in an interactive session.
morie_dataset_portal_catalog(portal = NULL)
portal |
Optional character filter restricting output to a
single portal. |
A data.frame with columns dataset_key, source,
id, api_modes, loader, dict_url, n_rows_bundled.
morie_dataset_portal_catalog_clear_cache(),
morie_datasets_load_by_key(), morie_datasets_browse()
# Per-portal slice: registry lives in code, fastest path.
nypd <- morie_dataset_portal_catalog(portal = "nyc_nypd")
nrow(nypd)
head(nypd$dataset_key)
# Full catalog: bulk portals (NYC OpenData, Chicago, Toronto Hub,
# etc.) prefer the rmoriedata companion when installed, otherwise
# contribute zero rows with a one-time warning per portal.
cat_df <- morie_dataset_portal_catalog()
table(cat_df$source)
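The session-local caching behaviour described for the portal catalog follows a common R pattern: keep the built object in a private environment and rebuild only when it is absent. A minimal sketch with hypothetical names (.cache_sketch, build_catalog_sketch(), catalog_sketch(), clear_cache_sketch()) and a two-row stand-in catalog:

```r
.cache_sketch <- new.env(parent = emptyenv())

build_catalog_sketch <- function() {
  # Stand-in for the expensive per-portal aggregation.
  data.frame(dataset_key = c("a", "b"), source = c("p1", "p2"))
}

catalog_sketch <- function() {
  # Build once, then serve the stored copy on later calls.
  if (is.null(.cache_sketch$catalog)) {
    .cache_sketch$catalog <- build_catalog_sketch()
  }
  .cache_sketch$catalog
}

clear_cache_sketch <- function() {
  rm(list = ls(.cache_sketch, all.names = TRUE), envir = .cache_sketch)
  invisible(NULL)
}

first <- catalog_sketch()   # builds and stores
again <- catalog_sketch()   # served from the environment
clear_cache_sketch()        # next call rebuilds
```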
Forces the next morie_dataset_portal_catalog() call to rebuild
from the per-portal registries. Useful after editing or extending
a registry in an interactive session.
morie_dataset_portal_catalog_clear_cache()
Invisibly NULL.
Walks every column, infers its NOIR level and epidemiological role, computes summary statistics, and resolves a best-guess treatment, outcome, and survey-weight column. User-supplied hints override heuristic detection.
morie_dataset_profile(
  df,
  hint_treatment = NULL,
  hint_outcome = NULL,
  hint_weights = NULL,
  ordinal_threshold = 10L,
  binary_threshold = 2L
)
df |
A |
hint_treatment |
Optional character; force this column as the treatment. |
hint_outcome |
Optional character; force this column as the outcome. |
hint_weights |
Optional character; force this column as the survey weight. |
ordinal_threshold |
Integer; max unique values for a categorical column to be classified as ordinal (default 10). |
binary_threshold |
Integer; max unique values for a binary column (default 2). |
A named list (the dataset profile) with fields n_rows,
n_cols, columns (named list of column profiles),
suggested_treatment, suggested_outcome, suggested_weights.
Plain text only; no rich dependency.
morie_dataset_profile_summary_table(profile)
profile |
A |
Character scalar with embedded newlines.
Suitable for JSON / RDS round-trips.
morie_dataset_profile_to_list(profile)
profile |
A |
Nested named list.
Uses the inferred measurement levels, binary indicators, and detected treatment/outcome/weight columns to recommend epidemiological analyses (descriptive profile, propensity scores, IPW-ATE, AIPW, ATT/ATC, double-ML, GATE, survey-weighted estimates).
morie_dataset_suggest_plan(profile)
profile |
A |
A list of suggestion lists; each has analysis, rationale,
and required_vars.
Interval / ratio columns get mean/sd/min/q25/median/q75/max; nominal
/ ordinal columns get a top_counts list of value -> count for the
top ten levels.
morie_dataset_summarize_column(x, level)
x |
A vector. |
level |
Inferred measurement level (one of nominal/ordinal/ interval/ratio). |
Named list of summary statistics.
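A rough base-R sketch of the two summary shapes described above (numeric summaries for interval / ratio columns, top counts for nominal / ordinal columns). summarize_column_sketch() is hypothetical; dropping NA values and the default quantile type are assumptions of this sketch.

```r
summarize_column_sketch <- function(x, level) {
  x <- x[!is.na(x)]
  if (level %in% c("interval", "ratio")) {
    q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
    list(mean = mean(x), sd = stats::sd(x), min = min(x),
         q25 = q[1], median = q[2], q75 = q[3], max = max(x))
  } else {
    # Frequency table, most common first, truncated to ten levels.
    top <- head(sort(table(x), decreasing = TRUE), 10)
    list(top_counts = stats::setNames(as.list(as.integer(top)), names(top)))
  }
}

summarize_column_sketch(c(1, 2, 3, 4, 5), "ratio")$median          # 3
summarize_column_sketch(c("a", "b", "a"), "nominal")$top_counts$a  # 2
```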
Portal-agnostic sibling to
morie_datasets_tps_arcgis_hub_by_id(). Works for ANY ArcGIS
Online item GUID (not just TPS Hub catalog entries). Same five
format paths (json / geojson / csv / shapefile / fgdb).
morie_datasets_arcgis_item_by_id(
  item_id,
  format = "json",
  where = "1=1",
  max_features = NULL,
  layer_idx = 0L,
  dest = NULL
)
item_id |
32-char hex GUID. |
format |
One of |
where |
Optional SoQL-style WHERE for the FeatureServer
query. Default |
max_features |
Optional row cap. |
layer_idx |
Integer layer index (default |
dest |
Optional destination path for binary downloads. |
The hub_id is ALWAYS resolved live (via the items API) because
there's no bundled catalog for non-TPS items. If you find
yourself calling this against the same item repeatedly, consider
adding a named wrapper (e.g. the shipped
morie_datasets_toronto_zoning_per_neighbourhood
wraps EsriCanadaEducation's af06159170914808983959df6163fc86
with bundled fixtures for offline use).
A data.frame (json / csv), parsed GeoJSON list, or
file path (binary).
Lightweight discovery helper – one network call to the ArcGIS
Online items API (/sharing/rest/content/items/<item_id>?f=json),
returns a single-row data.frame with the same columns the TPS Hub
catalog (morie_datasets_tps_arcgis_hub_layers)
returns: hub_id, title, type, feature_server_url, owner,
tags, snippet. Use this when the item is NOT in the bundled
TPS catalog (any non-TorontoPoliceService item).
morie_datasets_arcgis_item_metadata(item_id)
item_id |
32-char hex GUID for an ArcGIS Online item. |
A data.frame with one row.
# Vee's Toronto Zoning per Neighbourhood discovery
# m <- morie_datasets_arcgis_item_metadata(
#   "af06159170914808983959df6163fc86")
# m$title
#> "Toronto Zoning per Neighbourhood"
Ontario Use-of-Force aggregate summary (5-year 2020-2022, pre-RBDS rollup)
morie_datasets_arsau_aggregate_summary(offline = TRUE, resource_id = NULL)
offline |
If |
resource_id |
Optional override. |
A data.frame.
Ontario Use-of-Force detailed dataset (5-year 2020-2022, pre-RBDS)
morie_datasets_arsau_detailed_dataset(offline = TRUE, resource_id = NULL)
offline |
If |
resource_id |
Optional override. |
A data.frame.
Ontario Use-of-Force individual records (one row per individual-in-incident)
morie_datasets_arsau_uof_individual_records(
  year = "2024",
  offline = TRUE,
  resource_id = NULL
)
year |
Reporting year ( |
offline |
If |
resource_id |
Optional CKAN resource id override. |
A data.frame.
Wraps the Ontario Police Use-of-Force Race-Based Data Strategy
resource. Offline mode reads a small bundled synthetic fixture
from inst/extdata/arsau_uof_main_records_sample.csv (5 rows in
the canonical 23-column subset of the 65-column upstream schema,
clearly stamped SYNTHETIC-FIXTURE-XXX). Live mode hits the
Ontario CKAN datastore-dump JSON endpoint for the requested
reporting year.
morie_datasets_arsau_uof_main_records(
  year = "2024",
  offline = TRUE,
  resource_id = NULL
)
year |
Reporting year ( |
offline |
If |
resource_id |
Optional CKAN resource id override. |
A data.frame.
Ontario Open Data Catalogue, "Police Use of Force" (https://data.ontario.ca/dataset/police-use-of-force-race-based-data); Open Government Licence – Ontario.
df <- morie_datasets_arsau_uof_main_records(offline = TRUE)
head(df[, c("IncidentYear", "PoliceService", "IncidentType")])
Ontario Use-of-Force probe-cycle records (one row per CEW cartridge probe per individual-in-incident)
morie_datasets_arsau_uof_probe_cycle_records(
  year = "2024",
  offline = TRUE,
  resource_id = NULL
)
year |
Reporting year ( |
offline |
If |
resource_id |
Optional CKAN resource id override. |
A data.frame.
Year 2023 weapon_records is marked INVALID by the Ontario ministry's
technical report and has no published CKAN resource; passing
year = "2023" with offline = FALSE raises an error.
morie_datasets_arsau_uof_weapon_records(
  year = "2024",
  offline = TRUE,
  resource_id = NULL
)
year |
Reporting year ( |
offline |
If |
resource_id |
Optional CKAN resource id override. |
A data.frame.
Requires the bigrquery package and Application Default Credentials.
morie_datasets_bigquery(
  project,
  dataset,
  table,
  where = NULL,
  limit = NULL,
  select = "*",
  billing_project = NULL
)
project |
Source project (e.g. |
dataset |
Source dataset (e.g. |
table |
Source table (e.g. |
where |
Raw SQL WHERE clause (without leading |
limit |
Optional integer |
select |
Projection list; defaults to |
billing_project |
GCP project to bill; |
A data.frame.
Phase 3DDD4. Convenience wrapper over
morie_dataset_portal_catalog() that lets callers filter +
search by keyword, portal, api_mode, or loader pattern without
writing subset expressions by hand.
morie_datasets_browse(
  keyword = NULL,
  portal = NULL,
  api_mode = NULL,
  loader_pattern = NULL,
  keyword_includes_url = FALSE,
  sort_by = c("dataset_key", "source", "n_rows_bundled", "id")
)
keyword |
Optional case-insensitive substring to grep against
|
portal |
Optional portal name (see
|
api_mode |
Optional API mode substring to match against the
|
loader_pattern |
Optional perl-style regex against the
|
keyword_includes_url |
If |
sort_by |
Sort order: |
Filters compose with AND semantics. A keyword matches against
dataset_key + id + loader (case-insensitive). To match
anywhere including the dict URL, pass keyword_includes_url = TRUE.
A data.frame – the filtered subset of the catalog
with the same 7-column schema.
# All TPS datasets, alphabetical (offline; reads the cached
# cross-portal catalog -- no network).
tps <- morie_datasets_browse(portal = "tps_arcgis_hub")
nrow(tps)
# Anything mentioning "homicide"
h <- morie_datasets_browse(keyword = "homicide")
head(h$dataset_key)
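The AND composition of filters can be illustrated on a toy catalog. Everything below is hypothetical: the three rows are invented, browse_sketch() is not the package's function, and only the keyword and portal filters are modelled.

```r
# Invented rows that mimic part of the catalog schema.
toy_catalog <- data.frame(
  dataset_key = c("tps_homicide", "tps_assault", "nypd_homicide"),
  source      = c("tps_arcgis_hub", "tps_arcgis_hub", "nyc_nypd"),
  id          = c("x1", "x2", "x3"),
  loader      = c("load_a", "load_b", "load_c"),
  stringsAsFactors = FALSE
)

browse_sketch <- function(cat_df, keyword = NULL, portal = NULL) {
  keep <- rep(TRUE, nrow(cat_df))
  if (!is.null(keyword)) {
    # Case-insensitive substring match over dataset_key + id + loader.
    haystack <- tolower(paste(cat_df$dataset_key, cat_df$id, cat_df$loader))
    keep <- keep & grepl(tolower(keyword), haystack, fixed = TRUE)
  }
  if (!is.null(portal)) {
    keep <- keep & cat_df$source == portal
  }
  cat_df[keep, , drop = FALSE]
}

browse_sketch(toy_catalog, keyword = "HOMICIDE")$dataset_key
browse_sketch(toy_catalog, keyword = "HOMICIDE", portal = "tps_arcgis_hub")$dataset_key
```

Each active filter narrows the same logical mask, so adding a filter can only shrink the result.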
Phase 3FFF3. Bundled 200-row sample of Calgary's per-community
per-month crime counts (Socrata id 78gh-n26t). Covers all 8
canonical CPS categories.
Phase 3FFF3. Bundled 200-row sample of Calgary fire response
calls (Socrata id bdez-pds9).
morie_datasets_calgary_community_crime_stats(offline = TRUE, max_features = NULL)
morie_datasets_calgary_fire_response_calls(offline = TRUE, max_features = NULL)
morie_datasets_calgary_fire_stations(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with community, category, crime_count,
year, month.
Phase 3FFF3. Bundled snapshot of 157 City-of-Calgary Socrata datasets matched on crime-adjacent keywords (crime, police, fire, ambulance, traffic, incident, collision, bylaw, 311).
morie_datasets_calgary_open_crime_adjacent_layers(offline = TRUE)
morie_datasets_edmonton_open_crime_adjacent_layers(offline = TRUE)
morie_datasets_ottawa_open_crime_adjacent_layers(offline = TRUE)
offline |
If |
A data.frame with soda_id, title, type,
search_keyword.
Phase 3FFF3. Generic SODA2 fetch wrapper for arbitrary Calgary Socrata resources.
morie_datasets_calgary_socrata_by_id(soda_id, limit = 1000L)
morie_datasets_edmonton_socrata_by_id(soda_id, limit = 1000L)
soda_id |
4-4 Socrata resource ID. |
limit |
Page size (default 1000). |
A data.frame of records.
Wraps the City of Chicago "Arrests" open dataset (Socrata resource
id dpt3-jri9; portal landing
https://data.cityofchicago.org/Public-Safety/Arrests/dpt3-jri9/about_data).
24 columns covering up to four charges per arrest plus the
pipe-concatenated rollup quartet (charges_statute /
charges_description / charges_type / charges_class).
morie_datasets_chicago_arrests(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  mode = c("soda2", "soda3"),
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)
year |
Integer or |
max_features |
Integer or |
offline |
Logical; if |
resource_id |
Optional Socrata resource id override. Accepts
the UUID ( |
mode |
One of |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200). |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
Socrata accepts two resource specifiers interchangeably – the
numeric/UUID id (/resource/dpt3-jri9.json) and the human-readable
alias the publisher assigned (/resource/arrests.json). morie
defaults to the UUID for stability; pass resource_id = "arrests"
if you want to exercise the alias path.
Offline mode reads a bundled 5-row synthetic fixture
(inst/extdata/chicago_arrests_dpt3_jri9_sample.csv) carrying the
real upstream snake_case schema. Live mode hits the SODA2 endpoint
via .morie_dataset_socrata_fetch() and honours the 3OO opt-in
pagination (paginate = TRUE).
A data.frame with the documented 24-col Socrata schema.
City of Chicago Data Portal, "Arrests" (dpt3-jri9).
df <- morie_datasets_chicago_arrests(offline = TRUE)
df$arrest_date
Wraps the City of Chicago "Boundaries - Community Areas (current)"
open dataset (Socrata resource id cauq-8yn6; portal landing
https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6).
Covers the 77 canonical Chicago community areas
(Rogers Park, West Ridge, Uptown, Lincoln Square, ..., Edgewater)
and resolves the community_area foreign key carried by every
morie_datasets_chicago_crime() row.
morie_datasets_chicago_community_areas(
  offline = TRUE,
  geometry = FALSE,
  max_features = NULL,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional row cap. |
resource_id |
Optional view id override (default
|
paginate |
Logical; 3OO/3QQ opt-in pagination via
|
page_size |
Per-page row count when paginating. |
max_pages |
Safety net. |
app_token |
Optional Socrata app token (sent as
|
SODA3-only (same filtered/derived-view caveat as Wards).
Offline mode reads a bundled 77-row attribute-only fixture
(inst/extdata/chicago_community_areas.csv: 5 cols –
area_numbe, community, area_num_1, shape_area,
shape_len). The community column carries the official
canonical name in ALL CAPS.
A data.frame with 5 attribute cols (offline) or 6
including the_geom (live, geometry = TRUE).
City of Chicago Data Portal, "Boundaries - Community
Areas (current)" (cauq-8yn6).
df <- morie_datasets_chicago_community_areas(offline = TRUE)
head(df[, c("area_numbe", "community")])
Wraps the City of Chicago "Crimes – 2001 to Present" open dataset
(Socrata resource id ijzp-q8t2; portal landing
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data).
22-column schema, one row per reported crime incident (except
murders, which have one row per victim). Data are extracted from the
Chicago PD CLEAR system and refreshed daily with a seven-day lag;
addresses are redacted to block level.
morie_datasets_chicago_crime(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  mode = c("soda2", "soda3"),
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)
year |
Integer or |
max_features |
Integer or |
offline |
Logical; if |
mode |
One of
|
paginate |
Logical; if |
page_size |
Integer; per-page row count when |
max_pages |
Integer; safety net on |
app_token |
Optional Socrata app token (SODA3 only – sent
as the |
Scale warning. As of 2026-05 the live feed carries ~8,557,071
rows (8.56M; last refreshed 2026-05-23) – too large for
spreadsheet programs and slow even for programmatic pulls without
filtering. Always prefer narrowing the query first (year = ...
server-side filter) or paginating with paginate = TRUE + a
large page_size (and ideally an app_token). A full unfiltered
pull at the default page_size = 1000 would issue ~8,560
requests; with page_size = 50000 + an app_token it drops to
~172.
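The request counts quoted in the scale warning follow from dividing the row count by the page size and rounding up. A short arithmetic sketch (requests_needed() is a hypothetical helper; the row count is the figure quoted above):

```r
n_rows <- 8557071  # row count quoted above (as of 2026-05)

requests_needed <- function(n_rows, page_size) ceiling(n_rows / page_size)

requests_needed(n_rows, 1000)   # 8558
requests_needed(n_rows, 50000)  # 172

# First few $offset values a client-driven walk would send at page_size = 50000.
head(seq(0, n_rows - 1, by = 50000), 3)  # 0 50000 100000
```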
Socrata accepts both the numeric id (/resource/ijzp-q8t2.json)
and the publisher's crimes alias (/resource/crimes.json).
SODA3 endpoints are also available
(/api/v3/views/crimes/query.json), as are CSV variants
(/resource/crimes.csv, /api/v3/views/crimes/query.csv).
morie defaults to SODA2 JSON via the UUID for stability.
Cross-referenced datasets (Chicago Open Data). The 22-col schema carries geographic and crime-classification foreign keys that other Chicago datasets resolve:
morie wraps via morie_datasets_chicago_police_beats() (n9it-hstw).
morie wraps via morie_datasets_chicago_police_districts() (24zt-jpfn).
morie wraps via morie_datasets_chicago_wards() (sp34-6z76, 3UU).
morie wraps via morie_datasets_chicago_community_areas() (cauq-8yn6, 3UU).
morie wraps via morie_datasets_chicago_iucr_codes() (c7ck-438e, 3UU).
A data.frame with the documented Socrata schema.
Wraps the Socrata MAP VIEW derived from the main Crimes feed
(parent_fxf = ijzp-q8t2). Verified live as
type: map, parent_fxf: [ijzp-q8t2] via the Socrata catalog API;
landing page at
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present-Map/ahwe-kpsy.
morie_datasets_chicago_crime_map(
  date_from = NULL,
  date_to = NULL,
  where = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)
date_from |
Lower bound on |
date_to |
Upper bound on |
where |
Optional additional SoQL |
max_features |
Optional total row cap. |
offline |
Logical; if |
resource_id |
Optional view id override (default
|
paginate |
Logical; opt-in pagination via baked-in
|
page_size |
Per-page row count when paginating. |
max_pages |
Safety net on paginated walks. |
app_token |
Optional Socrata app token (sent as
|
SODA3-only. The SODA2 endpoint /resource/ahwe-kpsy.json does
technically return HTTP 200 but ships rows as empty objects
([{}]) – column resolution doesn't fire on map/filtered views.
This loader uses the SODA3 endpoint
/api/v3/views/ahwe-kpsy/query.json?query=SELECT ... WHERE ...
via .morie_dataset_soda3_query().
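As a rough illustration of the endpoint shape described above, the sketch below URL-encodes a SoQL string and appends it to the SODA3 path. The helper name `soda3_query_url()` is hypothetical (it is not morie's internal `.morie_dataset_soda3_query()`), and the sketch assumes the endpoint accepts the SoQL text as a `query` URL parameter, as the path above suggests.

```r
# Hypothetical helper (not part of morie): build a SODA3 query URL.
soda3_query_url <- function(view_id, soql,
                            host = "https://data.cityofchicago.org") {
  # reserved = TRUE also percent-encodes spaces, commas, '=' and quotes
  encoded <- utils::URLencode(soql, reserved = TRUE)
  sprintf("%s/api/v3/views/%s/query.json?query=%s", host, view_id, encoded)
}

url <- soda3_query_url(
  "ahwe-kpsy",
  "SELECT id, primary_type WHERE year = 2024 LIMIT 5"
)
print(url)
```

Encoding the whole SoQL string in one step keeps the clause text readable in calling code while still producing a valid URL.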
The live ahwe-kpsy view returns a 39-column schema:
22 base ijzp-q8t2 columns (id, case_number, date, ..., location)
4 reverse-geocoded extras (location_address, location_city,
location_state, location_zip)
4 Socrata-internal metadata cols (:id, :version,
:created_at, :updated_at)
9 :@computed_region_* spatial-overlay columns mapping each
row to other Chicago boundary layers (wards, community areas,
etc.) via Socrata's automatic point-in-polygon computation
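The four column families listed above can be told apart by name alone. The following is a minimal sketch of that grouping in base R; the function name `classify_map_columns()` and the sample `:@computed_region_abcd_1234` column are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: bucket column names into the four families above.
classify_map_columns <- function(cols) {
  group <- rep("base", length(cols))
  group[cols %in% c("location_address", "location_city",
                    "location_state", "location_zip")] <- "geocoded_extra"
  group[startsWith(cols, ":")] <- "socrata_metadata"
  # Assigned last so it overrides the generic ":" metadata match
  group[startsWith(cols, ":@computed_region_")] <- "computed_region"
  split(cols, group)
}

cols <- c("id", "case_number", "location", "location_city",
          ":id", ":updated_at", ":@computed_region_abcd_1234")
groups <- classify_map_columns(cols)
str(groups)
```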
Offline mode reads a bundled 5-row 39-col fixture
(inst/extdata/chicago_crime_map_ahwe_kpsy_sample.csv).
A data.frame with the 39-col schema.
City of Chicago Data Portal, "Crimes - 2001 to
Present - Map" (ahwe-kpsy), derived from ijzp-q8t2.
df <- morie_datasets_chicago_crime_map(offline = TRUE)
df$primary_type
Third Socrata API mode for the Crimes feed (ijzp-q8t2): OData v4 at
/api/odata/v4/<view_id>, the same protocol Tableau / Power BI /
Excel speak natively. Use this when you want morie to consume the
Crimes feed the same way those tools do, or when you want
server-driven @odata.nextLink pagination instead of the
client-driven $offset walk that SODA2/SODA3 use.
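To make the server-driven pagination concrete, here is a small sketch of a loop that keeps following `@odata.nextLink` until the server stops returning one. The page fetcher is mocked with an in-memory list so the example runs without a network; `fetch_page()` and `follow_next_links()` are hypothetical names, not morie functions.

```r
# Mock pages standing in for an HTTP GET + JSON parse. Each page carries
# `value` (the rows) and, except for the last one, "@odata.nextLink".
fake_pages <- list(
  page1 = list(value = data.frame(id = 1:2), "@odata.nextLink" = "page2"),
  page2 = list(value = data.frame(id = 3:4), "@odata.nextLink" = "page3"),
  page3 = list(value = data.frame(id = 5L))
)
fetch_page <- function(url) fake_pages[[url]]

follow_next_links <- function(first_url, fetch, max_pages = 200L) {
  rows <- list()
  url <- first_url
  pages <- 0L
  while (!is.null(url) && pages < max_pages) {
    page <- fetch(url)
    pages <- pages + 1L
    rows[[pages]] <- page$value
    url <- page[["@odata.nextLink"]]  # NULL on the last page ends the loop
  }
  do.call(rbind, rows)
}

all_rows <- follow_next_links("page1", fetch_page)
print(all_rows$id)
```

The client never computes an offset here; the `max_pages` cap plays the same safety-net role as the argument of the same name in the loader.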
morie_datasets_chicago_crime_odata(
  filter = NULL, select = NULL, orderby = NULL, top = NULL, skip = NULL,
  max_features = NULL, offline = TRUE, resource_id = NULL,
  paginate = FALSE, max_pages = 200L, app_token = NULL
)
filter |
Optional OData |
select |
Optional comma-separated column list. |
orderby |
Optional OData |
top |
Optional per-request row count (= |
skip |
Optional start offset (= |
max_features |
Optional total row cap across pages. |
offline |
Logical; default |
resource_id |
Optional view id override (default
|
paginate |
Logical; if |
max_pages |
Safety net on paginated walks. |
app_token |
Optional Socrata app token (sent as |
When to reach for which API mode:
| Mode | morie wrapper | best for |
|---|---|---|
| SODA2 | morie_datasets_chicago_crime() | base-feed pulls + $where filtering |
| SODA3 (SoQL) | morie_datasets_chicago_crime_soql() | arbitrary SELECT ... WHERE |
| SODA3 (map view) | morie_datasets_chicago_crime_map() | derived/filtered views (ahwe-kpsy) |
| OData v4 | morie_datasets_chicago_crime_odata() | third-party tool ingestion |
Known Socrata limitation. $filter is unreliable on Socrata's
OData implementation: the parser frequently rejects equality
filters with "The types 'Edm.Boolean' and 'Edm.String' (or 'Edm.Decimal') are not compatible."
$top, $skip, $select and $orderby all work; for filtering, use SODA3.
A data.frame.
Socrata OData docs: https://support.socrata.com/hc/en-us/articles/115005364207-Access-Data-Insights-Data-using-OData
df <- morie_datasets_chicago_crime_odata(offline = TRUE)
nrow(df)
Phase 3VV+. Pulls a slice of morie_datasets_chicago_crime()
and left-joins each of its five canonical foreign keys against
the matching resolver dataset shipped in morie:
morie_datasets_chicago_crime_resolved(
  year = NULL, max_features = NULL, offline = TRUE,
  mode = c("soda2", "soda3"), paginate = FALSE,
  page_size = 1000L, max_pages = 200L, app_token = NULL,
  resolvers = c("ward", "community_area", "beat", "district", "iucr")
)
year |
Integer or |
max_features |
Integer or |
offline |
Logical; if |
mode |
One of
|
paginate |
Logical; if |
page_size |
Integer; per-page row count when |
max_pages |
Integer; safety net on |
app_token |
Optional Socrata app token (SODA3 only – sent
as the |
resolvers |
Character subset of the 5 resolver names to
join. Default joins all 5. Pass a shorter vector to skip
specific joins (e.g. |
| crime field | resolver | join key |
|---|---|---|
| beat | morie_datasets_chicago_police_beats() | beat == beat_num |
| district | morie_datasets_chicago_police_districts() | district == dist_num |
| ward | morie_datasets_chicago_wards() | ward == ward |
| community_area | morie_datasets_chicago_community_areas() | community_area == area_numbe |
| iucr | morie_datasets_chicago_iucr_codes() | iucr == iucr |
The resolvers are loaded in offline mode (they're all bundled +
small), so this analyzer only touches the network for the crime
pull itself. Resolver columns are prefixed with the source name
(ward_*, community_*, beat_*, district_*, iucr_*) to
avoid collisions with the crime schema.
Both mode = "soda2" and mode = "soda3" are honoured for the
crime fetch, matching the dual-API design from 3VV+.
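The prefixed left-join described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `left_join_prefixed()` is a hypothetical helper, and the two toy data frames stand in for a crime slice and the wards resolver.

```r
# Hypothetical helper: left-join a resolver table onto crime rows,
# prefixing the resolver's non-key columns to avoid name collisions.
left_join_prefixed <- function(crimes, resolver, by_x, by_y, prefix) {
  extra <- setdiff(names(resolver), by_y)
  names(resolver)[match(extra, names(resolver))] <- paste0(prefix, extra)
  crime_cols <- names(crimes)
  crimes$.row <- seq_len(nrow(crimes))       # remember the original row order
  out <- merge(crimes, resolver, by.x = by_x, by.y = by_y,
               all.x = TRUE, sort = FALSE)   # all.x = TRUE keeps unmatched crimes
  out <- out[order(out$.row), c(crime_cols, paste0(prefix, extra)), drop = FALSE]
  rownames(out) <- NULL
  out
}

crimes <- data.frame(id = 1:3, ward = c("2", "99", "1"))
wards  <- data.frame(ward = c("1", "2"), shape_area = c(10.5, 20.25))

resolved <- left_join_prefixed(crimes, wards, "ward", "ward", "ward_")
print(resolved)
```

Crime rows with no matching resolver entry (ward "99" here) are kept, with `NA` in the prefixed columns.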
A wide data.frame: crime columns first, then the
joined resolver columns with their canonical prefixes.
df <- morie_datasets_chicago_crime_resolved(
  offline = TRUE, max_features = 5L,
  resolvers = c("ward", "iucr"))
names(df)
Sibling to morie_datasets_chicago_crime() but hits the
SODA3 /api/v3/views/crimes/query.json endpoint instead of
SODA2's /resource/ijzp-q8t2.json. The 8.56M-row scale of the
base feed makes SODA2's URL-param $where clumsy for non-trivial
filters; SODA3 lets you send the full SoQL SELECT ... WHERE ... ORDER BY ... string in one go.
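One way to picture how separate `select`, `where` and `order` arguments could collapse into a single SoQL string is sketched below. `build_soql()` is a hypothetical helper written for this illustration; how the loader actually assembles its query is not shown in this page.

```r
# Hypothetical helper: assemble one SoQL string from optional clauses.
build_soql <- function(select = "*", where = NULL, order = NULL, limit = NULL) {
  parts <- c(
    paste("SELECT", select),
    if (!is.null(where)) paste("WHERE", where),
    if (!is.null(order)) paste("ORDER BY", order),
    if (!is.null(limit)) paste("LIMIT", as.integer(limit))
  )
  paste(parts, collapse = " ")   # NULL clauses are dropped by c()
}

soql <- build_soql(
  select = "id, date, primary_type",
  where  = "primary_type = 'HOMICIDE' AND year >= 2020",
  order  = "date DESC",
  limit  = 100
)
print(soql)
```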
morie_datasets_chicago_crime_soql(
  where = NULL, select = "*", order = NULL, max_features = NULL,
  offline = TRUE, resource_id = NULL, paginate = FALSE,
  page_size = 1000L, max_pages = 200L, app_token = NULL
)
where |
Optional SoQL |
select |
Projection list (default |
order |
Optional SoQL |
max_features |
Optional total row cap. |
offline |
Logical; if |
resource_id |
Optional view id override (default
|
paginate |
Logical; opt-in pagination. |
page_size |
Per-page row count when paginating. |
max_pages |
Safety net. |
app_token |
Optional Socrata app token (sent as header). |
A data.frame.
df <- morie_datasets_chicago_crime_soql(offline = TRUE)
nrow(df)
Wraps the City of Chicago "Chicago Police Department - Illinois
Uniform Crime Reporting (IUCR) Codes" reference table (Socrata
resource id c7ck-438e; portal landing
https://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e).
410 IUCR codes mapping the iucr foreign key carried by every
morie_datasets_chicago_crime() row to a human-readable
description. Five columns:
morie_datasets_chicago_iucr_codes(
  offline = TRUE, max_features = NULL, resource_id = NULL,
  mode = c("soda2", "soda3"), paginate = FALSE,
  page_size = 1000L, max_pages = 200L, app_token = NULL
)
offline |
If |
max_features |
Optional row cap. |
resource_id |
Optional view id override. |
mode |
One of |
paginate |
Logical; 3OO opt-in pagination. |
page_size |
Per-page row count when paginating. |
max_pages |
Safety net. |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
4-character IUCR code (e.g. "110" for homicide).
Top-level category (e.g.
"HOMICIDE").
Subcategory (e.g. "FIRST DEGREE MURDER").
"I" (FBI index crime) or other.
TRUE if the code is currently active.
Available via SODA2 (single-shot or paginated) – this is a base dataset, not a filtered view.
Offline mode reads a bundled 410-row complete fixture
(inst/extdata/chicago_iucr_codes.csv).
A data.frame.
City of Chicago Data Portal, "Chicago Police
Department - Illinois Uniform Crime Reporting (IUCR) Codes"
(c7ck-438e).
df <- morie_datasets_chicago_iucr_codes(offline = TRUE)
subset(df, primary_description == "HOMICIDE")
Wraps the City of Chicago "Boundaries - Neighborhoods" open dataset
(Socrata resource id y6yq-dbs2; portal landing
https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Neighborhoods/bbvz-uum9).
98 neighbourhoods, originally derived from Neighborhoods_2012b
(updated 2025-02-20). The City notes these boundaries are
approximate and the names are not official.
morie_datasets_chicago_neighborhoods(
  offline = TRUE, geometry = FALSE, max_features = NULL, resource_id = NULL,
  mode = c("soda2", "soda3"), paginate = FALSE,
  page_size = 1000L, max_pages = 200L, app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional cap on returned rows. |
resource_id |
Optional Socrata resource id override. |
mode |
One of |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1000, the unauthenticated Socrata ceiling). |
max_pages |
Safety net on paginated walks (default 200). |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
Offline mode reads a bundled 98-row attribute-only fixture
(pri_neigh, sec_neigh, shape_area, shape_len) – the
the_geom MultiPolygon column is stripped to keep the bundled
size sane (full GeoJSON is ~800 KB). Live mode hits the SODA2
endpoint via .morie_dataset_socrata_fetch() (mockable).
To get the polygons, pass geometry = TRUE in live mode, which
includes the SODA2 the_geom column.
A data.frame with 4 attribute columns (offline mode) or
5 cols including the_geom (live mode with geometry = TRUE).
City of Chicago Data Portal, "Boundaries - Neighborhoods"; based on Neighborhoods_2012b.
df <- morie_datasets_chicago_neighborhoods(offline = TRUE)
head(df[, c("pri_neigh", "sec_neigh")])
Wraps the City of Chicago "Boundaries - Police Beats (current)"
open dataset (Socrata resource id n9it-hstw; portal landing
https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Beats-current-/aerh-rz74).
Returns 277 Chicago Police beats with their parent sector + district
codes (verified live 2026-05). Attribute schema:
morie_datasets_chicago_police_beats(
  offline = TRUE, geometry = FALSE, max_features = NULL, resource_id = NULL,
  mode = c("soda2", "soda3"), paginate = FALSE,
  page_size = 1000L, max_pages = 200L, app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional row cap. |
resource_id |
Optional Socrata resource id override (UUID
default; pass |
mode |
One of |
paginate |
Logical; 3OO opt-in pagination. |
page_size |
Per-page row count when paginating. |
max_pages |
Safety net on paginated walks. |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
4-digit beat id (district + 2-digit beat).
Within-sector beat sequence number (string).
Within-district sector number (string).
Parent district number (string).
Offline mode reads a bundled attribute-only fixture
(inst/extdata/chicago_police_beats.csv) – the the_geom
MultiPolygon column is stripped to keep bundle size sane.
Live mode hits the SODA2 JSON endpoint via
.morie_dataset_socrata_fetch() (mockable); pass geometry = TRUE
to include the_geom. Threads through the 3OO pagination args.
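The `paginate`, `page_size` and `max_pages` arguments describe a client-driven offset walk. The sketch below shows the general shape of such a loop against a mocked source, so it runs offline; `paginate_offsets()` and `mock_fetch()` are hypothetical names, and a real fetcher would translate `limit` and `offset` into the endpoint's paging parameters.

```r
# Hypothetical helper: request fixed-size pages until a short or empty
# page arrives, or until the max_pages safety net is hit.
paginate_offsets <- function(fetch, page_size = 1000L, max_pages = 200L) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    offset <- (i - 1L) * page_size
    page <- fetch(limit = page_size, offset = offset)
    if (nrow(page) == 0L) break
    pages[[i]] <- page
    if (nrow(page) < page_size) break   # short page: nothing left to fetch
  }
  do.call(rbind, pages)
}

# Mock source standing in for a live endpoint (277 made-up rows).
source_rows <- data.frame(beat_num = sprintf("%04d", 1:277))
mock_fetch <- function(limit, offset) {
  idx <- seq_len(nrow(source_rows))
  source_rows[idx > offset & idx <= offset + limit, , drop = FALSE]
}

beats <- paginate_offsets(mock_fetch, page_size = 100L)
print(nrow(beats))
```

With a page size of 100 the walk stops after the third, short page; lowering `max_pages` truncates the result instead of looping indefinitely.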
A data.frame with 4 attribute cols (offline) or 5
including the_geom (live, geometry = TRUE).
City of Chicago Data Portal, "Boundaries - Police
Beats (current)" (n9it-hstw).
df <- morie_datasets_chicago_police_beats(offline = TRUE)
head(df)
Wraps the City of Chicago "Boundaries - Police Districts (current)"
open dataset (Socrata resource id 24zt-jpfn; portal landing
https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Districts-current-/fthy-xz3r).
Returns 22 active districts (1-12, 14-20, 22, 24, 25) plus the
special "31" headquarters polygon. Attribute schema:
morie_datasets_chicago_police_districts(
  offline = TRUE, geometry = FALSE, max_features = NULL, resource_id = NULL,
  mode = c("soda2", "soda3"), paginate = FALSE,
  page_size = 1000L, max_pages = 200L, app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional row cap. |
resource_id |
Optional Socrata resource id override. |
mode |
One of |
paginate |
Logical; 3OO opt-in pagination. |
page_size |
Per-page row count when paginating. |
max_pages |
Safety net on paginated walks. |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
District number (string, "1"-"31").
Display label (e.g. "1ST", "22ND").
Offline mode reads a bundled attribute-only fixture
(inst/extdata/chicago_police_districts.csv). Live mode hits
SODA2 JSON; pass geometry = TRUE for the_geom.
Socrata exposes this dataset in several API and format permutations (SODA2 or SODA3; JSON, GeoJSON, or CSV):
SODA2 JSON: /resource/24zt-jpfn.json
SODA2 GeoJSON: /resource/24zt-jpfn.geojson
SODA2 CSV: /resource/24zt-jpfn.csv
SODA3 JSON: /api/v3/views/24zt-jpfn/query.json
SODA3 GeoJSON: /api/v3/views/24zt-jpfn/query.geojson
morie defaults to SODA2 JSON; pass an explicit URL via
resource_id to exercise the others (e.g. for direct sf reads
you'd typically hit the GeoJSON variant via sf::st_read()
yourself rather than going through this loader).
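The endpoint patterns listed above follow two templates, one per API generation. A small sketch that generates them is shown here; `socrata_endpoint()` is a hypothetical helper, and the host is taken from the portal landing URL rather than from any morie default.

```r
# Hypothetical helper: build a Socrata endpoint URL for a view id.
socrata_endpoint <- function(view_id,
                             api = c("soda2", "soda3"),
                             format = c("json", "geojson", "csv"),
                             host = "https://data.cityofchicago.org") {
  api <- match.arg(api)
  format <- match.arg(format)
  path <- if (api == "soda2") {
    sprintf("/resource/%s.%s", view_id, format)
  } else {
    sprintf("/api/v3/views/%s/query.%s", view_id, format)
  }
  paste0(host, path)
}

print(socrata_endpoint("24zt-jpfn"))
print(socrata_endpoint("24zt-jpfn", api = "soda3", format = "geojson"))
```

A GeoJSON URL produced this way is the kind of string one would hand to a spatial reader directly, as the paragraph above suggests.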
A data.frame with 2 attribute cols (offline) or 3
including the_geom (live, geometry = TRUE).
City of Chicago Data Portal, "Boundaries - Police
Districts (current)" (24zt-jpfn).
df <- morie_datasets_chicago_police_districts(offline = TRUE)
head(df)
Wraps the City of Chicago "Boundaries - Wards (2023-)" open dataset
(Socrata resource id sp34-6z76; portal landing
https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2023-/sp34-6z76).
50 wards in the current City Council district map. Resolves the
ward foreign key carried by every
morie_datasets_chicago_crime() row.
morie_datasets_chicago_wards(
  offline = TRUE, geometry = FALSE, max_features = NULL, resource_id = NULL,
  paginate = FALSE, page_size = 1000L, max_pages = 200L, app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional row cap. |
resource_id |
Optional view id override (default
|
paginate |
Logical; 3OO/3QQ opt-in pagination via
|
page_size |
Per-page row count when paginating. |
max_pages |
Safety net. |
app_token |
Optional Socrata app token (sent as
|
SODA3-only. The SODA2 endpoint /resource/sp34-6z76.json
returns empty objects – this is a filtered/derived view on
Socrata. Live mode uses SODA3
(/api/v3/views/sp34-6z76/query.json) via
.morie_dataset_soda3_query().
Offline mode reads a bundled 50-row attribute-only fixture
(inst/extdata/chicago_wards.csv: ward / shape_leng /
shape_area). Live mode with geometry = TRUE also includes the
the_geom MultiPolygon column.
A data.frame with 3 attribute cols (offline) or 4
including the_geom (live, geometry = TRUE).
City of Chicago Data Portal, "Boundaries - Wards
(2023-)" (sp34-6z76).
df <- morie_datasets_chicago_wards(offline = TRUE)
head(df)
Pull every CSV resource of a CKAN package as a list of data frames.
morie_datasets_ckan_package(portal, package_id)
portal |
Character; CKAN portal base URL. |
package_id |
Character; CKAN package id or slug. |
Named list mapping resource_name -> data.frame.
Examples: "https://open.canada.ca/data", "https://data.ontario.ca",
"https://data.gov.uk", "https://data.europa.eu".
morie_datasets_ckan_search(portal, query, rows = 50L)
portal |
Character; base URL of the CKAN portal. |
query |
Character; free-text search. |
rows |
Integer; max packages to return (default 50). |
A data.frame of package metadata.
Inmate-participant ethnic origin
morie_datasets_corrections_uof_ethnic_origin(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Incident-type lookup
morie_datasets_corrections_uof_incident_type(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Use-of-force incidents (head dataset)
morie_datasets_corrections_uof_incidents(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
data.frame.
https://data.ontario.ca/dataset/use-of-force-in-correctional-institutions; Open Government Licence – Ontario.
Inmate-participant Indigenous identity
morie_datasets_corrections_uof_indigenous(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Inmate-to-incidents bridging table
morie_datasets_corrections_uof_inmate_incident(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Inmate-participant demographics (head)
morie_datasets_corrections_uof_inmate_participant(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Institution-level annual incident summary
morie_datasets_corrections_uof_institution_summary(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Location-of-incident annual summary
morie_datasets_corrections_uof_location_summary(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Inmate-participant race
morie_datasets_corrections_uof_race(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Inmate-participant religion
morie_datasets_corrections_uof_religion(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Select-incident-type annual summary
morie_datasets_corrections_uof_select_incident_summary(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Staff-to-incidents bridging table
morie_datasets_corrections_uof_staff_incident(offline = TRUE, resource_id = NULL, source = NULL)
offline |
Logical; |
resource_id |
Optional CKAN resource id override. |
source |
One of |
Resolves a CPADS analysis frame from one of three sources:
morie_datasets_cpads(
  offline = TRUE, mode = c("datastore_search", "csv"),
  limit = NULL, q = NULL
)
offline |
Logical. |
mode |
Character; one of |
limit |
Integer or |
q |
Character or |
A pre-wrangled local RDS at morie_cpads_contract()$expected_wrangled_path
(only useful if you've already produced one), OR
the bundled 30-row real PUMF sample at
inst/extdata/cpads_pumf_synthetic.csv (when offline = TRUE,
the default; carries all 394 raw PUMF columns plus the 11
morie canonical analysis aliases), OR
the live PUMF (when offline = FALSE), fetched via the CKAN datastore_search
API (default, supports limit/q) or as the full PUMF CSV (with mode = "csv"):
https://open.canada.ca/data/dataset/736fa9b2-62e4-4e31-aea4-51869605b363/resource/d2639429-c304-45a6-90b3-770562f4d46d/download/cpads-2021-2022-pumf2.csv
CPADS is open data published by Health Canada / Statistics Canada (Open Government Licence – Canada). Aggregate dashboards at https://health-infobase.canada.ca/substance-use/reports/cpads/; PUMF user guide: https://open.canada.ca/data/dataset/736fa9b2-62e4-4e31-aea4-51869605b363/resource/a078e4c3-a910-4349-b00e-6ea0d31d391d/download/20212022-cpads-pumf-user-guide.pdf. Sister surveys (CSADS, CSUS, CTADS) are at https://health-infobase.canada.ca/substance-use/.
A data.frame carrying every CPADS PUMF column plus
morie's canonical analysis aliases (weight,
alcohol_past12m, heavy_drinking_30d, ebac_tot,
ebac_legal, cannabis_any_use, age_group, gender,
province_region, mental_health, physical_health).
morie_cpads_contract() for the canonical schema +
column map; morie_datasets_load_by_key() for catalog-wide
dispatch.
Wraps the static historical arrests CSV published by the Chicago Police Department at https://www.chicagopolice.org/statistics-data/public-arrest-data/ covering adult and juvenile arrests from 01 JAN 2014 through 31 DEC 2017, with all personally identifying information removed. Ten upper-case-coded columns matching the CPD data dictionary:
morie_datasets_cpd_public_arrests(url = NULL, offline = TRUE, max_features = NULL)
url |
Optional direct-CSV URL. If |
offline |
Logical; if |
max_features |
Integer or |
Chicago PD district (geographic boundary).
Chicago PD beat (geographic boundary).
Calendar year of the arrest.
Calendar month of the arrest.
Perceived race code.
IUCR/FBI crime category code.
ILCS / MCC statute charged.
Plain-text statute title.
ILCS/MCC charge class code.
"M" = misdemeanour, "F" = felony.
Unlike the SODA2 feeds, CPD publishes this as a single direct CSV
download with no documented API; the file URL is not stable across
CPD's quarterly republications. morie therefore ships an
offline-first loader; pass a url for live mode (visit the landing page
to find the current direct-CSV URL).
A data.frame with the 10-col CPD schema.
Chicago Police Department, "Public Arrest Data"; landing page at https://www.chicagopolice.org/statistics-data/public-arrest-data/.
df <- morie_datasets_cpd_public_arrests(offline = TRUE)
df$STAT_DESCR
Phase 3FFF3. Bundled 10-row fixture of Edmonton Police Service
station locations (Socrata id e7aq-scxv).
morie_datasets_edmonton_police_stations(offline = TRUE, max_features = NULL)
morie_datasets_edmonton_fire_stations(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with name, address, latitude,
longitude.
Sibling discovery helper to morie_datasets_ontario_ckan_layers(),
covering the non-Ontario open-data Socrata portals morie ships
offline-mode fixtures + mocked live-mode dispatch for.
morie_datasets_external_socrata_layers()
Coverage:
City of Chicago "Crimes – 2001 to Present" (ijzp-q8t2).
City of Chicago "Arrests" (dpt3-jri9; 3PP).
City of Chicago "Boundaries-Neighborhoods" (y6yq-dbs2).
City of Chicago "Boundaries-Police-Beats (current)"
(n9it-hstw; 3PP+).
City of Chicago "Boundaries-Police-Districts (current)"
(24zt-jpfn; 3PP+).
City of Chicago "Boundaries-Wards (2023-)"
(sp34-6z76; 3UU, SODA3-only).
City of Chicago "Boundaries-Community-Areas (current)"
(cauq-8yn6; 3UU, SODA3-only).
City of Chicago "IUCR Code Dictionary"
(c7ck-438e; 3UU).
NYC OpenData NYPD Stop, Question and Frisk (SQF) microdata –
three published years (2022 = e4yi-bvqr, 2023 = rbed-zzin,
2024 = 7v9w-k82r).
All Chicago Socrata endpoints accept both the numeric/UUID
specifier (/resource/<id>.json) and the publisher's
human-readable alias (/resource/<alias>.json, e.g.
/resource/arrests.json or /resource/crimes.json). morie's
wrappers default to the UUID for stability; pass resource_id = "<alias>" to exercise the alias path.
A data.frame with columns dataset_key, label,
portal, resource_url, fixture.
dataset_key
Phase 3EEE4. Single entry point that dispatches to the right
loader function for any of the ~550 datasets in
morie_dataset_portal_catalog(). Lets callers say
morie_datasets_load_by_key("vpd_crime") or
morie_datasets_load_by_key("hub:b4d0...assault") without
remembering whether the relevant loader is
morie_datasets_vpd_crime() or
morie_datasets_tps_arcgis_hub_by_id().
morie_datasets_load_by_key(
  dataset_key, offline = TRUE, max_features = NULL,
  mode = c("auto", "soda2", "soda3", "odata"),
  app_token = NULL, source = NULL
)
dataset_key |
A |
offline |
If |
max_features |
Optional row cap forwarded to the underlying loader. |
mode |
One of |
app_token |
Optional Socrata application token forwarded to SODA3-capable loaders. 3FFF2. |
source |
Optional portal disambiguator (e.g.,
|
Resolution rules:
If the catalog's loader
column names a function that takes no required arguments
beyond optionally offline/max_features, that function is
called directly with offline = offline + max_features.
For Ontario / Montreal /
Toronto CKAN catalog entries whose loader is the generic
morie_datasets_*_ckan_resource(), the id (CKAN
package_name slug) is resolved to a primary resource via
package_show + first CSV resource, then fetched.
For TPS Hub entries
(source == "tps_arcgis_hub"), the bare hub_id is passed
to morie_datasets_tps_arcgis_hub_by_id().
For statcan_ccjs entries, returns the cube
metadata via morie_datasets_statcan_cube_metadata().
For vancouver_opendata entries
beyond the 9 bundled fixtures, dispatches to
morie_datasets_vancouver_opendata_by_id().
For datasets where the catalog only knows a key + portal
(no row-level fixture, no targeted wrapper), offline = TRUE
raises a clear error pointing at the right live-mode dispatcher.
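The resolution rules amount to a catalog lookup followed by a call to whichever loader the catalog names. A deliberately simplified sketch of that idea follows; the toy catalog, the `load_toy_*()` loaders and `load_by_key()` are all hypothetical stand-ins, and only two of the rules (direct loader call, hub-style id pass-through) are modelled.

```r
# Toy catalog: `loader` names the function to call for each key.
toy_catalog <- data.frame(
  dataset_key = c("toy_crime", "toy_hub"),
  source      = c("toy_portal", "tps_arcgis_hub"),
  loader      = c("load_toy_crime", "load_toy_hub_by_id"),
  stringsAsFactors = FALSE
)

# Hypothetical loaders standing in for real morie_datasets_*() functions.
load_toy_crime <- function(offline = TRUE, max_features = NULL) {
  df <- data.frame(id = 1:5)
  if (!is.null(max_features)) df <- utils::head(df, max_features)
  df
}
load_toy_hub_by_id <- function(hub_id, offline = TRUE, max_features = NULL) {
  data.frame(hub_id = hub_id, id = 1:3)
}

load_by_key <- function(key, offline = TRUE, max_features = NULL) {
  row <- toy_catalog[toy_catalog$dataset_key == key, , drop = FALSE]
  if (nrow(row) != 1L) stop("unknown dataset_key: ", key)
  loader <- get(row$loader, mode = "function")
  args <- list(offline = offline, max_features = max_features)
  # Hub-style entries also need the id itself passed through.
  if (row$source == "tps_arcgis_hub") args <- c(list(key), args)
  do.call(loader, args)
}

print(nrow(load_by_key("toy_crime", max_features = 2L)))
print(load_by_key("toy_hub")$hub_id[1])
```

Keeping the loader name in the catalog means a new dataset only needs a catalog row, not a new branch in the dispatcher.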
A data.frame (or, for StatCan, the WDS metadata list).
# All three calls below resolve to bundled offline fixtures (no
# network). The first call warms the cross-portal catalog cache
# (~2.8s); subsequent calls reuse it (<0.1s each).
df1 <- morie_datasets_load_by_key("vpd_crime")         # 550 rows
df2 <- morie_datasets_load_by_key("nypd_arrests_ytd")  # 5 rows
df3 <- morie_datasets_load_by_key("assault")           # 5 rows
c(vpd = nrow(df1), nypd = nrow(df2), tps_assault = nrow(df3))
Phase 3EEE1. Generic loader that hits CKAN's datastore_search
endpoint for a given resource_id. Useful for any MTL package
beyond the bundled SIM sample.
morie_datasets_montreal_ckan_resource(resource_id, limit = 100L, filters = NULL)
resource_id |
CKAN resource UUID (from |
limit |
Page size (CKAN default 100, max varies by host). |
filters |
Optional named list of |
A data.frame of records.
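For orientation, a `datastore_search` request is an ordinary GET with `resource_id`, `limit` and an optional JSON `filters` object. The sketch below builds such a URL; `ckan_datastore_url()` is a hypothetical helper, the `/api/3/action/` path assumes the portal exposes the standard CKAN action API, and the resource id and filter value are placeholders.

```r
# Hypothetical helper: build a CKAN datastore_search URL.
ckan_datastore_url <- function(portal, resource_id, limit = 100L, filters = NULL) {
  query <- c(resource_id = resource_id, limit = as.character(as.integer(limit)))
  if (!is.null(filters)) {
    # Minimal JSON object for string-valued filters only (no quote escaping).
    pairs <- sprintf('"%s":"%s"', names(filters), unlist(filters))
    query <- c(query, filters = paste0("{", paste(pairs, collapse = ","), "}"))
  }
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(portal, "/api/3/action/datastore_search?",
         paste(names(query), encoded, sep = "=", collapse = "&"))
}

url <- ckan_datastore_url(
  "https://donnees.montreal.ca",
  resource_id = "abc-def",            # placeholder id
  limit = 50,
  filters = list(CASERNE = "25")      # placeholder filter value
)
print(url)
```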
## Not run:
# Hypothetical SPVM station boundaries:
df <- morie_datasets_montreal_ckan_resource(
  resource_id = "abc-def-...", limit = 50)
## End(Not run)
Phase 3EEE1. Bundled 23-row snapshot of every CKAN package in the Law / Justice / Public Safety group on donnees.montreal.ca. Includes the SIM fire/EMS interventions dataset, SPVM police station boundaries, municipal regulations, traffic collisions, and ~20 others.
morie_datasets_montreal_justice_safety_layers(offline = TRUE)
offline |
If |
A data.frame with package_name, title,
num_resources, metadata_modified, language, license.
cat_df <- morie_datasets_montreal_justice_safety_layers()
nrow(cat_df)  # 23
head(cat_df$title)
Phase 3EEE1. Bundled lookup table mapping the
INCIDENT_TYPE_DESC codes used in SIM interventions to their
canonical French descriptions (from the dataset's own
type-interventions-descriptions20161122.csv sidecar).
morie_datasets_montreal_sim_intervention_types()
A data.frame with INCIDENT_TYPE_DESCRIPTION +
Description.
d <- morie_datasets_montreal_sim_intervention_types()
nrow(d)
head(d)
Phase 3EEE1. Bundled stratified 349-row sample (up to 50 rows per DESCRIPTION_GROUPE category) of SIM (Service de securite incendie de Montreal) interventions, drawn from the full 172,899-row open feed for years 2005-2026.
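A stratified draw of this kind can be sketched in a few lines of base R. The example caps each group at 50 rows, so a group holding fewer rows contributes all of them, which is one way a total that is not a multiple of 50 can arise. `stratified_sample()` and the toy feed are hypothetical; how the bundled fixture was actually drawn is not documented here.

```r
# Hypothetical helper: sample at most n_per_group rows from each group.
stratified_sample <- function(df, group_col, n_per_group = 50L, seed = 1L) {
  set.seed(seed)
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  keep <- unlist(lapply(idx_by_group, function(idx) {
    if (length(idx) <= n_per_group) idx
    else idx[sample.int(length(idx), n_per_group)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

# Toy feed: group "A" has 120 rows, "B" has 60, "C" only 49.
feed <- data.frame(
  INCIDENT_NBR = 1:229,
  DESCRIPTION_GROUPE = rep(c("A", "B", "C"), times = c(120, 60, 49))
)
sampled <- stratified_sample(feed, "DESCRIPTION_GROUPE")
print(table(sampled$DESCRIPTION_GROUPE))
```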
morie_datasets_montreal_sim_interventions(offline = TRUE, csv_path = NULL, max_features = NULL)
offline |
If |
csv_path |
Optional path to a user-downloaded full CSV. |
max_features |
Optional row cap. |
Three source modes:
offline = TRUE (default): Bundled 349-row sample for tests + intro examples.
csv_path = "...": Reads a user-downloaded
donneesouvertes-interventions-sim.csv (or yearly variant)
from the CKAN resource link.
Columns (13): INCIDENT_NBR (per-year incident id), CREATION_DATE_TIME, INCIDENT_TYPE_DESC, DESCRIPTION_GROUPE, CASERNE (fire-hall number), NOM_VILLE, NOM_ARROND (arrondissement), DIVISION, NOMBRE_UNITES (vehicles deployed), MTM8_X, MTM8_Y (Quebec MTM zone 8 NAD83 / EPSG:32188), LONGITUDE, LATITUDE (WGS84, obfuscated to intersections per privacy policy).
A data.frame with the 13 SIM columns.
CKAN package
interventions-service-securite-incendie-montreal,
https://donnees.montreal.ca/dataset/interventions-service-securite-incendie-montreal.
df <- morie_datasets_montreal_sim_interventions(offline = TRUE)
nrow(df) # 349
table(df$DESCRIPTION_GROUPE)
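The bundled sample is described as stratified by DESCRIPTION_GROUPE. A minimal base-R sketch of that kind of capped per-group draw follows; the toy data, group labels, and the helper `stratified_sample()` are hypothetical illustrations and are not morie's own fixture-building code.

```r
# Toy stand-in for the full feed; group labels and sizes are made up.
set.seed(1)
full <- data.frame(
  INCIDENT_NBR = seq_len(500),
  DESCRIPTION_GROUPE = sample(
    c("Group A", "Group B", "Group C"), 500, replace = TRUE,
    prob = c(0.90, 0.06, 0.04)
  )
)

# Hypothetical helper: draw at most `n_per_group` rows from each stratum.
stratified_sample <- function(df, group_col, n_per_group = 50L) {
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  keep <- unlist(lapply(idx_by_group, function(idx) {
    # Sample positions within `idx` so a length-1 stratum is handled safely.
    idx[sample.int(length(idx), min(length(idx), n_per_group))]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

smp <- stratified_sample(full, "DESCRIPTION_GROUPE", 50L)
table(smp$DESCRIPTION_GROUPE)
```

Strata with fewer rows than the cap contribute all of their rows, which is one way a stratified sample can end up with a total that is not an exact multiple of the cap.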
NamUs missing-persons case metadata.
morie_datasets_namus_missing_persons(state = NULL, max_features = NULL, offline = FALSE)
state |
Character; two-letter US state code or |
max_features |
Integer or |
offline |
Logical; if |
A data.frame.
Requires an API key (api_key= or FBI_CDE_API_KEY env var).
morie_datasets_nibrs(
  year = NULL,
  max_features = NULL,
  state = NULL,
  offense = NULL,
  api_key = NULL,
  offline = FALSE
)
year |
Integer; reporting year (required unless |
max_features |
Integer or |
state |
Character; two-letter US state code, or |
offense |
Character; NIBRS offence slug, or |
api_key |
Character; FBI CDE API key (or |
offline |
Logical; if |
A data.frame.
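The key can come from the `api_key` argument or the FBI_CDE_API_KEY environment variable. A small sketch of how such a lookup might be resolved is below; the helper `resolve_api_key()` and the argument-over-environment precedence are assumptions for illustration, not morie's documented behaviour.

```r
# Hypothetical helper: prefer an explicit argument, then fall back to the
# environment variable named in the docs (precedence is an assumption).
resolve_api_key <- function(api_key = NULL, env_var = "FBI_CDE_API_KEY") {
  if (!is.null(api_key) && nzchar(api_key)) {
    return(api_key)
  }
  from_env <- Sys.getenv(env_var, unset = "")
  if (nzchar(from_env)) {
    return(from_env)
  }
  stop("No API key: pass `api_key` or set ", env_var, call. = FALSE)
}

Sys.setenv(FBI_CDE_API_KEY = "demo-key-from-env")
resolve_api_key()                     # falls back to the environment
resolve_api_key("demo-key-from-arg")  # explicit argument wins
```

Setting the variable in `.Renviron` keeps the key out of scripts that are shared or committed.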
NIST Reference Datasets (RDS) catalog metadata.
morie_datasets_nist_rds(
  dataset_id = NULL,
  query = NULL,
  max_features = NULL,
  offline = FALSE
)
dataset_id |
Character or |
query |
Character or |
max_features |
Integer or |
offline |
Logical; if |
A data.frame with the NIST RDS catalog schema.
NYC Borough Boundaries (gthc-hcne)
Wraps the NYC OpenData "Borough Boundaries" feed (5 NYC
boroughs: Manhattan, Bronx, Brooklyn, Queens, Staten Island).
Used as the resolver for the arrest_boro (1-letter codes) /
boro_nm (full names) / patrol_borough_name foreign keys on
NYPD CJ datasets (3NN).
morie_datasets_nyc_boroughs(
  offline = TRUE,
  geometry = FALSE,
  max_features = NULL,
  resource_id = NULL,
  mode = c("soda2", "soda3"),
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional row cap. |
resource_id |
Optional view id override (default
|
mode |
One of |
paginate |
Logical; 3OO/3QQ opt-in pagination via
|
page_size |
Per-page row count when paginating. |
max_pages |
Safety net. |
app_token |
Optional Socrata app token (sent as
|
Attribute schema: borocode (string, "1"-"5"), boroname
(capitalised name), shape_area, shape_leng. Live mode also
returns the_geom MultiPolygon when geometry = TRUE.
A data.frame.
Phase 3CCC2. One-stop index of every NYC boundary fixture morie ships, with its loader, SODA id, expected row count, and a note on its join key.
morie_datasets_nyc_boundaries_catalog()
NOTE: school/council/community/NTA boundaries are NOT directly
row-key joinable to NYPD CJ data – the CJ rows carry lat/long
(or just precinct/borough), not a district ID. Use these loaders
standalone for geographic context, or pair with a spatial join
via the sf package on the_geom (not bundled to keep morie
lightweight).
A data.frame with one row per boundary fixture.
morie_datasets_nyc_boundaries_catalog()
Phase 3CCC2. Bundled snapshot of NYC OpenData
5crt-au7u (71 districts).
morie_datasets_nyc_community_districts(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with boro_cd, shape_leng, shape_area.
Phase 3CCC2. Bundled snapshot of NYC OpenData
872g-cjhh (51 districts).
morie_datasets_nyc_council_districts(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with coundist, shape_leng, shape_area.
Phase 3CCC2. Bundled snapshot of NYC OpenData
9nt8-h7nd (262 NTAs from the 2020 census revision).
Carries boro + county FIPS + parent CDTA so it can be aggregated
upward without spatial intersection.
morie_datasets_nyc_ntas_2020(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with 11 cols including nta2020, ntaname,
borocode, boroname, countyfips, cdta2020, cdtaname.
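Because each NTA row carries its parent CDTA, counts keyed by NTA can be rolled up with an ordinary attribute join. The sketch below uses a toy lookup with made-up codes ("XX0101", "XX01") rather than the bundled fixture, and only the column names `nta2020` and `cdta2020` come from the schema above.

```r
# Toy NTA lookup; codes and names are placeholders, not real NTA/CDTA values.
ntas <- data.frame(
  nta2020  = c("XX0101", "XX0102", "XX0201"),
  cdta2020 = c("XX01", "XX01", "XX02"),
  stringsAsFactors = FALSE
)

# Toy event counts keyed by NTA (hypothetical data).
counts <- data.frame(
  nta2020 = c("XX0101", "XX0102", "XX0201", "XX0101"),
  n       = c(3, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Attribute join on the NTA code, then aggregate upward to the parent CDTA.
joined  <- merge(counts, ntas, by = "nta2020", all.x = TRUE)
by_cdta <- aggregate(n ~ cdta2020, data = joined, FUN = sum)
by_cdta
```

No geometry is touched at any step, which is the point of carrying the parent keys in the table.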
NYPD Arrests Data (Historic)
morie_datasets_nyc_nypd_arrests_historic(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
NYPD Arrest Data (Year to Date)
morie_datasets_nyc_nypd_arrests_ytd(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
NYPD CJ datasets reference boroughs through three different encodings depending on the table:
morie_datasets_nyc_nypd_boro_crosswalk()
arrest_boro: 1-letter code "B/Q/M/S/K" (Arrests).
boro_nm: UPPER full name "MANHATTAN" etc. (Complaints).
borocode / boroname: numeric "1"-"5" + Title-Case (Borough Boundaries gthc-hcne).
This helper returns a 5-row crosswalk between all four columns so
callers can left-join NYPD data against the
morie_datasets_nyc_boroughs() boundary table regardless of
which encoding their source uses. Used internally by
morie_datasets_nyc_nypd_resolved().
A data.frame with 4 columns: arrest_boro, boro_nm,
borocode, boroname.
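To illustrate the left-join the crosswalk is meant for, the sketch below hand-builds a 5-row table with the same four columns and joins toy records against it. The table is written out here only so the example is self-contained; in practice it would come from `morie_datasets_nyc_nypd_boro_crosswalk()`, and the toy `arrests` and `complaints` rows are made up.

```r
# Hand-built stand-in with the documented four columns.
crosswalk <- data.frame(
  arrest_boro = c("M", "B", "K", "Q", "S"),
  boro_nm     = c("MANHATTAN", "BRONX", "BROOKLYN", "QUEENS", "STATEN ISLAND"),
  borocode    = c("1", "2", "3", "4", "5"),
  boroname    = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"),
  stringsAsFactors = FALSE
)

# Toy records using two different borough encodings (hypothetical rows).
arrests    <- data.frame(arrest_key = 1:4,
                         arrest_boro = c("K", "M", "K", "Q"),
                         stringsAsFactors = FALSE)
complaints <- data.frame(cmplnt_num = 1:3,
                         boro_nm = c("BRONX", "QUEENS", NA),
                         stringsAsFactors = FALSE)

# all.x = TRUE keeps every source row, matched or not.
arrests_x    <- merge(arrests, crosswalk, by = "arrest_boro", all.x = TRUE)
complaints_x <- merge(complaints, crosswalk, by = "boro_nm", all.x = TRUE)
arrests_x[order(arrests_x$arrest_key), c("arrest_key", "borocode", "boroname")]
```

Note that `merge()` may reorder rows, so sort on the record key afterwards if order matters.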
Generic NYC NYPD dataset loader by registry key
morie_datasets_nyc_nypd_by_key(
  dataset_key,
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
dataset_key |
One of the keys in
|
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
A data.frame.
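The `page_size` and `max_pages` arguments bound a paginated walk at `page_size * max_pages` rows (1,000 x 200 = 200,000 with the defaults). The following is a generic limit/offset sketch against a mock fetcher; `paginate()` and `mock_fetch()` are hypothetical names for illustration and do not reflect morie's internal implementation.

```r
# Generic limit/offset walk: stop on an empty or short page, or at max_pages.
paginate <- function(fetch_page, page_size = 1000L, max_pages = 200L) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    offset <- (i - 1L) * page_size
    page <- fetch_page(limit = page_size, offset = offset)
    if (nrow(page) == 0L) break
    pages[[length(pages) + 1L]] <- page
    if (nrow(page) < page_size) break  # a short page is the last page
  }
  if (length(pages) == 0L) return(data.frame())
  do.call(rbind, pages)
}

# Mock "remote" table standing in for a live feed (2,500 toy rows).
remote <- data.frame(id = seq_len(2500L))
mock_fetch <- function(limit, offset) {
  if (offset >= nrow(remote)) return(remote[0, , drop = FALSE])
  remote[(offset + 1L):min(offset + limit, nrow(remote)), , drop = FALSE]
}

all_rows <- paginate(mock_fetch)                  # walks three pages
capped   <- paginate(mock_fetch, max_pages = 2L)  # safety net cuts it short
c(nrow(all_rows), nrow(capped))
```

The `max_pages` cap acts purely as a safety net: the walk normally ends earlier, at the first short page.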
NYPD Complaint Data Historic
morie_datasets_nyc_nypd_complaint_historic(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
NYPD Complaint Data Current (Year To Date)
morie_datasets_nyc_nypd_complaint_ytd(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
NYPD Hate Crimes
morie_datasets_nyc_nypd_hate_crimes(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
Phase 3CCC1. Maps the leading alpha prefix of an NYPD law_code
(e.g., "PL" in "PL 1601005") to its human-readable statute
book name + jurisdiction (NYS vs NYC). Covers all 22 distinct
prefixes observed in the YTD arrests feed + 24 additional canonical
NYS / NYC codes that appear in complaint, summons, and historical
arrest data.
morie_datasets_nyc_nypd_law_books()
A data.frame with columns book, name, jurisdiction.
books <- morie_datasets_nyc_nypd_law_books()
subset(books, book == "PL")
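To use the lookup against raw records, the leading alphabetic prefix has to be pulled out of each `law_code` first. A minimal sketch follows; `extract_book()` is a hypothetical helper, the two-row `books` table is a hand-written stand-in for the bundled lookup, and the sample codes other than "PL 1601005" are illustrative.

```r
# Hypothetical helper: leading run of letters, or NA when there is none.
extract_book <- function(law_code) {
  has_prefix <- grepl("^[A-Za-z]+", law_code)
  ifelse(has_prefix, sub("^([A-Za-z]+).*$", "\\1", law_code), NA_character_)
}

# Stand-in for the bundled lookup, with the documented column names.
books <- data.frame(
  book         = c("PL", "VTL"),
  name         = c("Penal Law", "Vehicle and Traffic Law"),
  jurisdiction = c("NYS", "NYS"),
  stringsAsFactors = FALSE
)

law_codes <- c("PL 1601005", "VTL0511001", "1234")
prefix    <- extract_book(law_codes)

# match() keeps one output row per input code, with NA rows for misses.
resolved <- books[match(prefix, books$book), ]
data.frame(law_code = law_codes, book = prefix, name = resolved$name)
```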
List the NYPD criminal-justice Socrata datasets wrapped by morie
morie_datasets_nyc_nypd_layers()
A data.frame with 8 columns:
dataset_key, label, resource_id, resource_url,
permalink (data.cityofnewyork.us/d/<id> stable redirect),
data_dictionary_url (XLSX, when published as a dataset
attachment; NA_character_ otherwise),
footnotes_url (PDF, when published; NA_character_ otherwise),
fixture (bundled-fixture filename).
Currently only nypd_arrests_ytd carries the
canonical NYC OpenData attachment URLs (XLSX dictionary + PDF
footnotes). The other 7 entries leave those slots NA; PRs
welcome to fill them in when the asset UUIDs are looked up at
the dataset's landing page.
NYPD offense codes (ky_cd + pd_cd + descriptions)
NYC OpenData does NOT publish a standalone NYPD-offense-code
table; the canonical mapping is implicit in the
(ky_cd, ofns_desc, pd_cd, pd_desc, law_cat_cd) tuples
carried by every Arrests / Complaints record. This bundled
fixture was derived by running a $group query on the NYPD
Arrests YTD feed (uip8-fykc) at fixture-creation time, giving
the 246 distinct offense tuples currently in active use.
Mirrors the chicago_iucr_codes pattern (3UU).
morie_datasets_nyc_nypd_offense_codes(max_features = NULL)
max_features |
Optional row cap. |
Schema (all character):
ky_cd: 3-digit Key Code (top-level offense category).
ofns_desc: Description for ky_cd (NYPD-truncated to 30 chars; e.g. "MURDER & NON-NEGL. MANSLAUGHTE").
pd_cd: 3-digit Penal-Detailed code (subcategory).
pd_desc: Description for pd_cd (same truncation).
law_cat_cd: Penal classification: "F" felony / "M" misdemeanor / "V" violation / "I" infraction / (blank).
The string descriptions ARE truncated at 30 chars in the
upstream NYPD feeds; this is NOT a morie processing bug – it's
how NYPD's NYS DCJS warehouse stores them. PRs welcome to add a
parallel pd_desc_full column once a canonical un-truncated
source is identified.
Refreshing the fixture:
# Re-derive when the YTD feed adds new offense tuples (rare):
# curl "https://data.cityofnewyork.us/resource/uip8-fykc.json
#   ?$select=ky_cd,ofns_desc,pd_cd,pd_desc,law_cat_cd
#   &$group=ky_cd,ofns_desc,pd_cd,pd_desc,law_cat_cd
#   &$order=ky_cd,pd_cd&$limit=10000"
# then write to inst/extdata/nyc_nypd_offense_codes.csv.
A data.frame with 246 rows x 5 cols.
codes <- morie_datasets_nyc_nypd_offense_codes()
subset(codes, ky_cd == "104") # all RAPE subcategories
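The server-side `$group` query reduces record-level rows to their distinct offense tuples. The same reduction can be sketched locally in base R on already-downloaded records; the toy rows below use placeholder codes and descriptions, and only the five column names come from the schema above.

```r
# Toy record-level rows (placeholder values, not real NYPD codes).
records <- data.frame(
  arrest_key = 1:5,
  ky_cd      = c("902", "901", "901", "902", "901"),
  ofns_desc  = c("TOY OFFENSE B", "TOY OFFENSE A", "TOY OFFENSE A",
                 "TOY OFFENSE B", "TOY OFFENSE A"),
  pd_cd      = c("020", "011", "010", "020", "010"),
  pd_desc    = c("TOY DETAIL B1", "TOY DETAIL A2", "TOY DETAIL A1",
                 "TOY DETAIL B1", "TOY DETAIL A1"),
  law_cat_cd = c("M", "F", "F", "M", "F"),
  stringsAsFactors = FALSE
)

tuple_cols <- c("ky_cd", "ofns_desc", "pd_cd", "pd_desc", "law_cat_cd")

# Local equivalent of grouping on all five columns, then ordering.
codes <- unique(records[, tuple_cols])
codes <- codes[order(codes$ky_cd, codes$pd_cd), ]
rownames(codes) <- NULL
codes
```

Running the grouping server-side remains the cheaper option for the live feed, since only the distinct tuples cross the network.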
Phase 3AAA. Pulls a slice of any
morie_datasets_nyc_nypd_by_key()-resolvable dataset and
left-joins its borough + precinct foreign keys against the
bundled resolvers (morie_datasets_nyc_boroughs() +
morie_datasets_nyc_police_precincts()).
morie_datasets_nyc_nypd_resolved(
  dataset_key,
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  mode = c("soda2", "soda3"),
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL,
  resolvers = c("boro", "precinct", "offense", "law_code")
)
dataset_key |
One of the keys in
|
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
mode |
One of |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
resolvers |
Character subset of |
Auto-detects the borough + precinct columns per dataset:
| NYPD dataset | boro column | precinct column |
|---|---|---|
| nypd_arrests_historic | arrest_boro (M/B/K/Q/S) | arrest_precinct |
| nypd_arrests_ytd | arrest_boro | arrest_precinct |
| nypd_complaint_historic | boro_nm (UPPER) | addr_pct_cd |
| nypd_complaint_ytd | boro_nm (UPPER) | addr_pct_cd |
| nypd_hate_crimes | patrol_borough_name | complaint_precinct_code |
| nypd_uof_incidents | (none directly; precinct only) | precinct |
Resolver columns are prefixed boro_* + precinct_* to avoid
collisions. Joins use left-join semantics, so the row count is preserved.
A wide data.frame: NYPD columns first, then prefixed
resolver columns.
df <- morie_datasets_nyc_nypd_resolved("nypd_arrests_ytd", offline = TRUE)
names(df)
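The prefixed, row-preserving join described above can be sketched in base R with `match()`. The helper `attach_resolver()` and the toy tables are hypothetical; this shows the general pattern only and is not morie's internal code.

```r
# Toy NYPD-style records and a toy precinct resolver (made-up values).
records <- data.frame(
  arrest_key      = c(11, 12, 13),
  arrest_precinct = c("75", "1", "999"),
  stringsAsFactors = FALSE
)
precincts <- data.frame(
  precinct   = c("1", "75"),
  shape_area = c(10.5, 20.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-join by lookup, prefixing the resolver columns.
attach_resolver <- function(df, key_col, resolver, resolver_key, prefix) {
  hit <- match(df[[key_col]], resolver[[resolver_key]])
  extra <- resolver[hit, , drop = FALSE]  # unmatched keys become NA rows
  names(extra) <- paste0(prefix, names(extra))
  rownames(extra) <- NULL
  cbind(df, extra)
}

wide <- attach_resolver(records, "arrest_precinct",
                        precincts, "precinct", "precinct_")
names(wide)
```

Using `match()` instead of `merge()` keeps the source row order as well as the row count, and the prefix means a resolver column can never overwrite a source column of the same name.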
NYPD Use of Force Incidents
morie_datasets_nyc_nypd_uof_incidents(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
NYPD Use of Force: Subjects
morie_datasets_nyc_nypd_uof_subjects(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
NYPD Vehicle Stop Reports
morie_datasets_nyc_nypd_vehicle_stops(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  resource_id = NULL,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  mode = c("soda2", "soda3"),
  app_token = NULL
)
year |
Optional year filter (server-side SoQL). |
max_features |
Optional row cap. When |
offline |
If |
resource_id |
Optional Socrata resource id override. |
paginate |
Logical; if |
page_size |
Per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Safety net on paginated walks (default 200 -> up to 200,000 rows without an app_token). |
mode |
One of |
app_token |
Optional Socrata API app token for higher rate
limits; passed as the |
Phase 3HHH1. Bundled snapshot of every CKAN package on donnees.montreal.ca – substantially broader than the 23-row Loi/Justice/Securite subset from 3EEE1.
Phase 3HHH2. Bundled snapshot of every dataset on opendata.vancouver.ca with richer schema (publisher, theme, license, records_count).
morie_datasets_nyc_opendata_bulk_layers(offline = TRUE)
morie_datasets_chicago_opendata_bulk_layers(offline = TRUE)
morie_datasets_toronto_opendata_bulk_layers(offline = TRUE)
morie_datasets_calgary_opendata_bulk_layers(offline = TRUE)
morie_datasets_edmonton_opendata_bulk_layers(offline = TRUE)
morie_datasets_ottawa_opendata_bulk_layers(offline = TRUE)
morie_datasets_montreal_opendata_bulk_layers(offline = TRUE)
morie_datasets_vancouver_opendata_bulk_layers(offline = TRUE)
offline |
If |
Tabular catalog snapshot.
NYC Police Precincts (y76i-bdw7)
Wraps the NYC OpenData "Police Precincts" feed (77 precincts +
the special precinct 22 / Central Park alias = 78 rows in this
fixture). Used as the resolver for the arrest_precinct /
addr_pct_cd / complaint_precinct_code foreign keys on every
NYPD CJ dataset (3NN).
morie_datasets_nyc_police_precincts(
  offline = TRUE,
  geometry = FALSE,
  max_features = NULL,
  resource_id = NULL,
  mode = c("soda2", "soda3"),
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)
offline |
If |
geometry |
If |
max_features |
Optional row cap. |
resource_id |
Optional view id override (default
|
mode |
One of |
paginate |
Logical; 3OO/3QQ opt-in pagination via
|
page_size |
Per-page row count when paginating. |
max_pages |
Safety net. |
app_token |
Optional Socrata app token (sent as
|
Attribute schema: precinct (string, "1"-"123"), shape_leng,
shape_area. Live mode also returns the_geom MultiPolygon
when geometry = TRUE.
A data.frame.
Phase 3CCC2. Bundled snapshot of NYC OpenData
8ugf-3d8u (33 districts).
morie_datasets_nyc_school_districts(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with schooldist, shape_leng, shape_area.
df <- morie_datasets_nyc_school_districts(offline = TRUE)
nrow(df) # 33
Fetch a NYC OpenData Socrata dataset by ID
Fetch a Chicago Open Data Socrata dataset by ID
morie_datasets_nyc_socrata_by_id(soda_id, limit = 1000L)
morie_datasets_chicago_socrata_by_id(soda_id, limit = 1000L)
soda_id |
4-4 Socrata resource ID. |
limit |
Page size. |
A data.frame of records.
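A "4-4" resource ID is two four-character groups joined by a hyphen, such as `uip8-fykc`. A quick pre-flight check can catch typos before any request is made. The helper `is_socrata_id()` is hypothetical, and the lowercase-alphanumeric pattern is an assumption based on the IDs that appear in this manual.

```r
# Hypothetical validator: four characters, a hyphen, four characters.
# Lowercase letters and digits only is an assumption, not a Socrata guarantee.
is_socrata_id <- function(x) {
  grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", x)
}

ids <- c("uip8-fykc", "gthc-hcne", "not-an-id", "UIP8FYKC")
data.frame(id = ids, valid = is_socrata_id(ids))
```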
NYPD Stop, Question and Frisk (SQF) microdata via NYC OpenData.
morie_datasets_nyc_stop_and_frisk(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L
)
year |
Integer or |
max_features |
Integer or |
offline |
Logical; if |
paginate |
Logical; if |
page_size |
Integer; per-page row count when paginating (default 1,000, the unauthenticated SODA2 ceiling). |
max_pages |
Integer; safety net on paginated walks (default 200). |
A data.frame. Schema is NOT normalised across years.
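Because the schema differs between years, stacking several years with plain `rbind()` will fail on mismatched columns. One base-R workaround is to pad each frame to the union of columns first, as in the sketch below. The helper `bind_rows_fill()` and the toy column names are hypothetical; harmonising the actual SQF variables across years is a separate, manual step.

```r
# Hypothetical helper: stack frames after padding missing columns with NA.
bind_rows_fill <- function(frames) {
  all_cols <- unique(unlist(lapply(frames, names)))
  filled <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, all_cols, drop = FALSE]
  })
  do.call(rbind, filled)
}

# Two toy "years" with different column names (illustrative only).
y_old <- data.frame(year = 2011L, pct = "75", frisked = "Y",
                    stringsAsFactors = FALSE)
y_new <- data.frame(year = 2019L, stop_location_precinct = "75",
                    frisked_flag = "Y", stringsAsFactors = FALSE)

stacked <- bind_rows_fill(list(y_old, y_new))
stacked
```

The result keeps every original column side by side, so equivalent variables (here `pct` and `stop_location_precinct`) still need to be reconciled by hand.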
Phase 3CCC2. Bundled snapshot of NYC OpenData
35j5-n34v (221 ZCTAs intersecting NYC). ZCTAs are the Census
Bureau's geographic approximation of USPS ZIP code service areas
– pair with NYPD address-bearing data via ZIP code lookups for
a coarser-than-precinct, finer-than-borough geography.
morie_datasets_nyc_zctas(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with zcta5, arealand, areawater,
centlat, centlon, intptlat, intptlon.
Generic Ontario CKAN dataset loader (by registry key)
morie_datasets_ontario_ckan_by_key(dataset_key, offline = TRUE, resource_id = NULL)
dataset_key |
One of the keys in
|
offline |
If |
resource_id |
Optional CKAN resource_id override. |
A data.frame.
Returns the consolidated registry of every Ontario Open Data feed
for which morie ships an offline-mode fixture and a mocked live-mode dispatch.
Pair with morie_datasets_ontario_ckan_by_key() for generic factory
access by dataset_key.
morie_datasets_ontario_ckan_layers()
Coverage as of this release:
ARSAU UoF: 2 main + 2 individual + 2 probe_cycle + 1 weapon (2024)
2 5-year aggregates (aggregate_summary + detailed_dataset).
OTIS: d01 Deaths-in-Custody.
d02-d07 OTIS deaths variants + OTIS a01/b/c families: known but resource_ids not yet wired in (PRs welcome).
A data.frame with columns dataset_key, label,
resource_id, family, year, fixture.
Thin compatibility shim that delegates to
morie_datasets_otis_a01_restrictive_confinement(). The OTIS A01
dataset is published openly at
https://data.ontario.ca/dataset/data-on-inmates-in-ontario
(Ontario Solicitor General; Open Government Licence – Ontario,
CKAN resource id 5a0c5804-a055-4031-9743-73f556e43bb4).
morie_datasets_otis_a01(offline = TRUE, ...)
offline |
Logical. |
... |
Forwarded to
|
Earlier morie versions wrongly claimed this data was FOI-only; that claim was retracted as of 3MMM.
A data.frame.
morie_datasets_otis_a01_restrictive_confinement(),
morie_datasets_load_by_key().
OTIS a01 – Restrictive Confinement (detailed per-individual)
morie_datasets_otis_a01_restrictive_confinement(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
A data.frame with the canonical 10-col schema
(EndFiscalYear, UniqueIndividual_ID,
Region_AtTimeOfPlacement, Region_MostRecentPlacement,
Gender, Age_Category, MentalHealth_Alert,
SuicideRisk_Alert, SuicideWatch_Alert,
Number_Of_Placements).
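When reading a user-downloaded copy of this resource, it can help to confirm that the columns match the canonical 10-column schema before analysis. The check below is a sketch; `check_schema()` is a hypothetical helper, and the empty toy frame only stands in for real data.

```r
# Canonical column names as listed above.
otis_a01_cols <- c(
  "EndFiscalYear", "UniqueIndividual_ID",
  "Region_AtTimeOfPlacement", "Region_MostRecentPlacement",
  "Gender", "Age_Category", "MentalHealth_Alert",
  "SuicideRisk_Alert", "SuicideWatch_Alert", "Number_Of_Placements"
)

# Hypothetical helper: report columns that are absent or unexpected.
check_schema <- function(df, expected) {
  absent <- setdiff(expected, names(df))
  extra  <- setdiff(names(df), expected)
  list(ok = length(absent) == 0L && length(extra) == 0L,
       absent = absent, extra = extra)
}

# Zero-row toy frame with the expected columns.
toy <- as.data.frame(
  setNames(replicate(10, character(0), simplify = FALSE), otis_a01_cols),
  stringsAsFactors = FALSE
)

check_schema(toy, otis_a01_cols)$ok
check_schema(toy[, -1, drop = FALSE], otis_a01_cols)$absent
```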
OTIS b01 – Segregation detailed (per-individual episodes)
morie_datasets_otis_b01_segregation_detailed(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
A data.frame with the canonical 18-col schema.
OTIS b02 – Segregation total days per individual
morie_datasets_otis_b02_segregation_total_days(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b03 – Segregation placements: alerts + hold by institution
morie_datasets_otis_b03_seg_alerts_by_institution(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b04 – Segregation consecutive durations by region
morie_datasets_otis_b04_seg_consecutive_by_region(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b05 – Segregation placements by consecutive-length bucket
morie_datasets_otis_b05_seg_consecutive_lengths(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b06 – Segregation placements: reason for placement by institution
morie_datasets_otis_b06_seg_reason_by_institution(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b07 – Segregation placements: alerts + hold by gender
morie_datasets_otis_b07_seg_alerts_by_gender(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b08 – Segregation consecutive durations by institution
morie_datasets_otis_b08_seg_consecutive_by_institution(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS b09 – Individuals in segregation by number of times placed
morie_datasets_otis_b09_seg_n_times(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c01 – Total individuals (in custody / restrictive confinement / segregation)
morie_datasets_otis_c01_individuals_total(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c02 – Individuals by institution
morie_datasets_otis_c02_individuals_by_institution(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c03 – Individuals by race x gender
morie_datasets_otis_c03_individuals_race_by_gender(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c04 – Individuals by race x region
morie_datasets_otis_c04_individuals_race_by_region(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c05 – Individuals by religion x region
morie_datasets_otis_c05_individuals_religion_by_region(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c06 – Individuals by age category x region
morie_datasets_otis_c06_individuals_age_by_region(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c07 – Individuals: alerts + hold flags
morie_datasets_otis_c07_individuals_alerts(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c08 – Individuals by religion x gender
morie_datasets_otis_c08_individuals_religion_by_gender(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c09 – Individuals by age category x gender
morie_datasets_otis_c09_individuals_age_by_gender(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c10 – Aggregate durations by institution
morie_datasets_otis_c10_aggregate_durations_by_institution(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c11 – Aggregate lengths
morie_datasets_otis_c11_aggregate_lengths(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS c12 – Aggregate durations by region
morie_datasets_otis_c12_aggregate_durations_by_region(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
Wraps the Ontario "Data on Inmates in Ontario" d01 resource.
Schema: Year, UniqueIndividual_ID, Region_AtTimeOfDeath, HousingUnit_Type, MedicalCauseofDeath, MeansofDeath.
morie_datasets_otis_d01_deaths_in_custody(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id override. |
source |
One of |
A data.frame.
Ontario Open Data Catalogue, "Data on Inmates in Ontario" (https://data.ontario.ca/dataset/data-on-inmates-in-ontario); Open Government Licence – Ontario.
df <- morie_datasets_otis_d01_deaths_in_custody(offline = TRUE)
table(df$Region_AtTimeOfDeath)
OTIS d02 – Deaths in custody by gender
morie_datasets_otis_d02_deaths_by_gender(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS d03 – Deaths in custody by race
morie_datasets_otis_d03_deaths_by_race(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS d04 – Deaths in custody by religion
morie_datasets_otis_d04_deaths_by_religion(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS d05 – Deaths in custody by age category
morie_datasets_otis_d05_deaths_by_age_category(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS d06 – Deaths in custody by alert type x institution
morie_datasets_otis_d06_cause_by_alert(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
OTIS d07 – Deaths in custody alerts x housing unit
morie_datasets_otis_d07_alerts_by_housing_unit(offline = TRUE, resource_id = NULL, source = NULL)
offline |
If |
resource_id |
Optional CKAN resource id (required for live). |
source |
One of |
The SIU relaunched its site in 2025 with a JS-rendered case list; this function returns the legacy-pattern anchor frame, which may be empty.
morie_datasets_siu_director_reports()
A data.frame with columns case_number, url, posted_date.
Extract structured fields from an SIU director's-report text or URL.
morie_datasets_siu_report_fields(text_or_url)
text_or_url |
Character scalar; either the report text (used as-is) or a PDF URL (fetched and parsed first). |
A named list with fields report_id, incident_date,
conclusion, sections.
Download an SIU director's-report PDF and return its plain text.
morie_datasets_siu_report_text(url = NULL, offline = FALSE)
url |
Character; direct PDF URL. Required unless |
offline |
Logical; if |
Character scalar (the plain text).
Phase 3DDD3. Bundled 10-row registry of high-traffic Canadian Centre for Justice and Community Safety Statistics cubes published through StatCan CODR – the federal-level complement to morie's provincial loaders (Ontario OTIS, BC VPD, etc.).
morie_datasets_statcan_ccjs_cubes()
A data.frame with product_id, cube_title_en,
dimensions, frequency.
cubes <- morie_datasets_statcan_ccjs_cubes()
subset(cubes, grepl("homicide", cube_title_en, ignore.case = TRUE))
Phase 3DDD3. Wraps the getCubeMetadata POST endpoint.
morie_datasets_statcan_cube_metadata(product_id, timeout_s = 60L)
product_id |
Integer cube ID (e.g., |
timeout_s |
HTTP timeout in seconds. |
A list with status and object (dimensions, members,
release info, etc.). Errors if status != "SUCCESS".
## Not run:
meta <- morie_datasets_statcan_cube_metadata(35100177)
meta$object$cubeTitleEn
length(meta$object$dimension)
## End(Not run)
Phase 3DDD3. Wraps the getFullTableDownloadCSV/<productId>/en
GET endpoint. Returns the temporary download URL for the cube's
bulk CSV ZIP. Caller is responsible for downloading the ZIP
(typically large – often hundreds of MB).
morie_datasets_statcan_full_csv_url(product_id, language = c("en", "fr"))
product_id |
Integer cube ID. |
language |
One of |
Character URL string.
## Not run:
url <- morie_datasets_statcan_full_csv_url(35100177)
# download.file(url, "ccjs_177.zip")
## End(Not run)
Phase 3DDD3. Wraps the getDataFromVectorsAndLatestNPeriods
POST endpoint. A "vector" is StatCan's atomic time series ID
(e.g., v109502878 for "Canada – Total, all violations").
morie_datasets_statcan_vectors(vector_ids, n_periods = 5L, timeout_s = 60L)
vector_ids |
Integer vector of StatCan vector IDs (no |
n_periods |
Number of latest periods per vector (default 5). |
timeout_s |
HTTP timeout. |
A data.frame with one row per (vector, period)
observation: vector_id, coordinate, ref_period, value,
decimals, status, symbol, scalar_factor.
## Not run:
df <- morie_datasets_statcan_vectors(c(109502878L, 109502879L), n_periods = 3)
nrow(df) # ~6
## End(Not run)
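The request behind this loader can be sketched in base R. The snippet below only builds a JSON body for the getDataFromVectorsAndLatestNPeriods endpoint and sends nothing. The helper name build_vector_request() is hypothetical, and the body shape (an array of vectorId / latestN objects) is taken from the public StatCan Web Data Service documentation, not from rmorie's internals.

```r
# Hypothetical helper (not part of rmorie): build the JSON body for
# StatCan's getDataFromVectorsAndLatestNPeriods endpoint.
build_vector_request <- function(vector_ids, n_periods = 5L) {
  # Accept "v109502878"-style labels as well as bare numbers.
  ids <- as.integer(sub("^[vV]", "", as.character(vector_ids)))
  if (anyNA(ids)) stop("vector_ids must be numeric, optionally prefixed with 'v'")
  # One {"vectorId": ..., "latestN": ...} object per requested vector.
  items <- sprintf('{"vectorId":%d,"latestN":%d}', ids, as.integer(n_periods))
  paste0("[", paste(items, collapse = ","), "]")
}

body <- build_vector_request(c("v109502878", "109502879"), n_periods = 3)
print(body)
```

With two vectors and n_periods = 3 the service would be asked for up to six observations, which is where the "~6" in the example above comes from.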
Phase 3DDD4. Returns a compact summary of how many datasets each portal contributes + which API protocols are used. Useful as a quick "what does morie ship?" smoke test.
morie_datasets_summary()
A data.frame with one row per portal: source,
n_datasets, api_modes, n_with_bundled_fixture.
morie_datasets_summary()
Phase 3EEE2. Bundled snapshot of ambulance-station-locations
(46 EMS stations across Toronto). Useful as a control overlay
for crime + EMS dispatch analyses.
morie_datasets_toronto_ambulance_stations(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with full station address + EMS metadata.
Phase 3EEE2. Bundled snapshot of
police-annual-statistical-report-miscellaneous-data – 40 rows
of year x section x category x subtype aggregates covering hate
crime counts, IMPACT calls, and other Toronto Police aggregates
that aren't in the per-incident ArcGIS Hub layers.
morie_datasets_toronto_asr_miscellaneous(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with YEAR, SECTION, CATEGORY,
SUBTYPE, COUNT_.
Phase 3EEE2. Generic loader hitting CKAN's datastore_search
endpoint for an arbitrary Toronto package resource.
morie_datasets_toronto_open_ckan_resource(resource_id, limit = 100L)
resource_id |
CKAN resource UUID (from |
limit |
Page size (max 32000 per CKAN; sane default 100). |
A data.frame of records.
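Because limit is a page size, pulling a large resource means issuing several datastore_search calls with increasing offsets. The sketch below builds those page URLs in base R without fetching anything. The helper name ckan_page_urls(), the base URL, and the resource id are placeholders for illustration; limit and offset are standard datastore_search parameters, and the 32000 cap is the one quoted above.

```r
# Hypothetical helper (not part of rmorie): build datastore_search URLs
# that page through a CKAN resource `limit` rows at a time.
ckan_page_urls <- function(base_url, resource_id, total_rows, limit = 100L) {
  stopifnot(limit >= 1L, limit <= 32000L, total_rows >= 0L)
  # Offsets 0, limit, 2 * limit, ... up to the last row index.
  offsets <- seq(0L, max(total_rows - 1L, 0L), by = limit)
  sprintf("%s/api/3/action/datastore_search?resource_id=%s&limit=%d&offset=%d",
          base_url, resource_id, as.integer(limit), as.integer(offsets))
}

# Placeholder base URL and resource id, for illustration only.
urls <- ckan_page_urls("https://ckan.example.org",
                       "00000000-0000-0000-0000-000000000000",
                       total_rows = 250L, limit = 100L)
print(urls)
```

A resource with 250 rows and a page size of 100 needs three requests (offsets 0, 100, and 200).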
Phase 3EEE2. Bundled snapshot of 208 City-of-Toronto CKAN
packages matched on crime-adjacent keywords (311, fire, police,
ambulance, parking, traffic collision, by-law, emergency, crime,
wellbeing). Each row identifies a package by its CKAN slug;
callers fetch records via
morie_datasets_toronto_open_ckan_resource() or visit the
open.toronto.ca dataset page.
morie_datasets_toronto_open_crime_adjacent_layers(offline = TRUE)
offline |
If |
A data.frame with package_name, title,
num_resources, metadata_modified, search_keyword.
Wraps EsriCanadaEducation's ArcGIS Online Feature Service
ZonesofToronto_Neighbourhoods
(item id af06159170914808983959df6163fc86; FeatureServer at
services.arcgis.com/As5CFN3ThbQpy8Ph/.../ZonesofToronto_Neighbourhoods/FeatureServer).
Two layers in the service:
morie_datasets_toronto_zoning_per_neighbourhood(layer = c("neighbourhoods", "zoning_stats"), format = "json", where = "1=1", max_features = NULL, offline = TRUE)
layer |
One of |
format |
One of |
where |
Optional FeatureServer WHERE filter (live mode). |
max_features |
Optional row cap. |
offline |
Logical; if |
layer = "neighbourhoods" (FeatureServer layer 0): Polygon
boundaries for Toronto neighbourhoods with a 39-column
demographic schema – total population, sex split, 18 age
brackets (0-4 through 85+), senior + youth + child aggregates,
and 10 specific language counts (Chinese, Italian, Korean,
Persian, Portuguese, Russian, Spanish, Tagalog, Tamil, Urdu)
plus a HomeLanguageCategory total.
layer = "zoning_stats" (FeatureServer table 1): Per-neighbourhood
zoning-area stats – 4 columns (OBJECTID,
ZoneDesc, Neighbourhood_Name, SUM_Area). Many rows per
neighbourhood, one per ZoneDesc (Commercial, Residential,
Industrial, etc.).
Offline mode reads bundled 5-row synthetic fixtures
(toronto_zoning_neighbourhoods_sample.csv /
toronto_zoning_stats_sample.csv) – SYNTH-stamped, not
attributable to actual Toronto neighbourhoods. Live mode hits
the FeatureServer via the 3SS+ generic
morie_datasets_arcgis_item_by_id() resolver.
A data.frame (json / csv / offline), parsed GeoJSON
list, or file path (binary).
Esri Canada Education – ArcGIS Online item
af06159170914808983959df6163fc86.
df <- morie_datasets_toronto_zoning_per_neighbourhood(offline = TRUE)
head(df[, c("Neighbourhood", "Total_Population", "Seniors65andover")])
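Since the zoning_stats table has many rows per neighbourhood, a common first step is to turn SUM_Area into a within-neighbourhood share. The sketch below does this in base R on a made-up data.frame that mimics the four documented columns. The values and neighbourhood labels are invented for illustration and are not drawn from the bundled fixture.

```r
# Synthetic rows shaped like the zoning_stats table (values are made up).
zoning <- data.frame(
  OBJECTID = 1:5,
  ZoneDesc = c("Residential", "Commercial", "Residential", "Industrial", "Commercial"),
  Neighbourhood_Name = c("A", "A", "B", "B", "B"),
  SUM_Area = c(300, 100, 500, 250, 250),
  stringsAsFactors = FALSE
)

# Total zoned area per neighbourhood, repeated on each of its rows.
total_area <- ave(zoning$SUM_Area, zoning$Neighbourhood_Name, FUN = sum)
zoning$area_share <- zoning$SUM_Area / total_area

# Neighbourhood x zone-type table of shares; absent zone types show as 0.
shares <- xtabs(area_share ~ Neighbourhood_Name + ZoneDesc, data = zoning)
print(shares)
```

Each row of shares sums to 1, which gives a quick sanity check before joining the result to the neighbourhoods layer.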
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id b8e3ef826ea84cbcb85951d051afc2fa.
morie_datasets_tps_2008_firs(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
2008 Field Information Requests (FIRS)
Tags: FIRS; Field Information Requests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id de8ba3b4899b48bc8fbf4421f4945ed6.
morie_datasets_tps_2009_firs(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
2009 Field Information Requests (FIRS)
Tags: FIRS; Field Information Requests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 9a5c8a4fdfa54e7f97236a769b196f16.
morie_datasets_tps_2010_firs(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
2010 Field Information Requests (FIRS)
Tags: FIRS; Field Information Requests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 78361ed81cca40aebd1032a26ef52e5b.
morie_datasets_tps_2011_firs(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
2011 Field Information Requests (FIRS)
Tags: FIRS; Field Information Requests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 7a690ba1d7714063983ed78024d5b2af.
morie_datasets_tps_2012_firs(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
2012 Field Information Requests (FIRS)
Tags: FIRS; Field Information Requests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 95a29d4453894944be7a79f537a432b1.
morie_datasets_tps_2013_firs(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
2013 Field Information Requests (FIRS)
Tags: FIRS; Field Information Requests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 135607d9250b4e5ea930e7ea39780a77.
morie_datasets_tps_administrative(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a breakdown of administrative information. This data is compiled and provided by several units of the Toronto Police Service.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Administrative
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Single entry point for the 71 datasets listed by
morie_datasets_tps_arcgis_hub_layers(). Supports five export
formats:
morie_datasets_tps_arcgis_hub_by_id(hub_id, format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
hub_id |
32-char hex GUID. See
|
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
"json" (default): Hits the FeatureServer
/0/query?where=...&outFields=*&f=json endpoint and parses
attributes into a tidy data.frame. Same path the existing
TPS PSDP loaders (3FF) use; honours an arbitrary where
clause and max_features cap.
"geojson": Hits ?f=geojson and returns the raw GeoJSON
as a parsed R list. Caller can pass to sf::st_read().
"csv": Hits the Hub CSV exporter at
hub.arcgis.com/api/v3/datasets/<hub_id>_0/downloads/data?format=csv
and parses the returned CSV into a data.frame.
"shapefile" / "fgdb": Downloads the binary archive
(Esri Shapefile zip / File Geodatabase zip) to dest
(default tempfile()) and returns the file path. Caller can
sf::st_read() the unzipped contents.
For boundary layers (Police Divisions, Patrol Zone, Facilities)
you'll typically want format = "geojson" to get the polygon
geometry. For tabular outputs (UoF tables, KSI counts, ASR
tables, budgets) format = "json" is sufficient and lightest.
A data.frame (json / csv), a parsed GeoJSON list, or a
file path (binary).
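The "json" and "csv" paths described above differ mainly in which URL gets requested. The base-R sketch below assembles both URL shapes without making a network call. The helper names arcgis_query_url() and hub_csv_url() are hypothetical, the layer URL is a placeholder, and the use of resultRecordCount as the row-cap parameter is an assumption based on the ArcGIS REST API, not something confirmed from rmorie's source.

```r
# Hypothetical helper (not part of rmorie): assemble a FeatureServer
# /query URL from a layer URL, a where clause, and an optional row cap.
arcgis_query_url <- function(layer_url, where = "1=1", max_features = NULL,
                             format = c("json", "geojson")) {
  format <- match.arg(format)
  params <- c(where = where, outFields = "*", f = format)
  if (!is.null(max_features)) {
    # Assumption: resultRecordCount is the ArcGIS REST row-cap parameter.
    params <- c(params, resultRecordCount = as.character(as.integer(max_features)))
  }
  # Percent-encode each value, including reserved characters such as "=".
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(layer_url, "/query?", paste0(names(params), "=", encoded, collapse = "&"))
}

# Hub CSV exporter URL, following the pattern quoted in the "csv" entry.
hub_csv_url <- function(hub_id, layer_idx = 0L) {
  sprintf("https://hub.arcgis.com/api/v3/datasets/%s_%d/downloads/data?format=csv",
          hub_id, as.integer(layer_idx))
}

# Placeholder layer URL and hub id, for illustration only.
query_url <- arcgis_query_url("https://example.org/FeatureServer/0",
                              where = "YEAR = 2023", max_features = 10)
csv_url <- hub_csv_url("abc")
print(c(query_url, csv_url))
```

The query URL carries the where clause and row cap to the server, while the Hub exporter URL has no filter parameters at all, which is one reason "json" is the lighter choice for filtered tabular pulls.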
Thin wrapper that just hits the Hub downloads endpoint without the
FeatureServer /query round-trip. Use this when you want the
canonical CSV / GeoJSON / Shapefile / FGDB exactly as the Hub UI
serves them (including any column-name and projection differences
that on-the-fly exports introduce vs the FeatureServer source).
morie_datasets_tps_arcgis_hub_download(hub_id, format = "csv", layer_idx = 0L, dest = NULL)
hub_id |
32-char hex GUID. |
format |
One of |
layer_idx |
Integer layer index (default |
dest |
Optional destination path; defaults to |
Path to the downloaded file.
Sibling discovery helper to morie_datasets_external_socrata_layers()
and morie_datasets_ontario_ckan_layers(), covering all 71 datasets
currently published on data.tps.ca.
morie_datasets_tps_arcgis_hub_layers(offline = TRUE)
offline |
If |
Verified live against the canonical TPS Hub catalog API
(https://data.tps.ca/api/search/v1/collections/dataset/items?limit=100)
on 2026-05-24 – returned numberMatched: 71, all owned by
TorontoPoliceService.
Coverage spans nine families:
Assault, Auto Theft, Bicycle Thefts, Break and Enter, Hate Crimes, Homicides, Intimate Partner + Family Violence, Robbery, Shooting + Firearm Discharges, Theft From Motor Vehicle, Theft Over.
Six per-axis breakdown tables plus the Types-and-Perceived-Weapons crosstab.
Administrative, Calls for Service, Enforcement, Firearms, Misc., Personnel, Public Complaints, Reported Crimes, Regulated Interactions, Search of Persons, Traffic, Victims of Crime.
Main + per-mode (Automobile / Cyclist / Fatals / Motorcyclist / Passenger / Pedestrian).
Annual figures 2020 through 2026 plus
Budget_by_Command.
ASR-PB family + Staffing_by_Command.
Annual files 2008 through 2013.
Facilities, Patrol Zone, Police Divisions.
Community Safety Indicators, Mental Health Act Apprehensions, Neighbourhood Crime Rates, Persons in Crisis Calls for Service Attended.
A data.frame with columns hub_id, title, type,
feature_server_url, owner, tags, snippet.
TPS Public Safety Data Portal, https://data.tps.ca/search?collection=dataset.
cat <- morie_datasets_tps_arcgis_hub_layers()
nrow(cat) # 71
head(cat$title)
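Once the catalog is loaded, the tags column is a convenient way to pick hub_id values to pass to morie_datasets_tps_arcgis_hub_by_id(). The sketch below filters a small stand-in catalog in base R. It assumes tags is a single semicolon-delimited string per row, as in the "Tags:" lines of the wrapper pages; the has_tag() helper is hypothetical and two of the titles are illustrative.

```r
# Stand-in for the catalog data.frame; hub ids and tags are copied from
# the wrapper pages in this manual, two of the titles are illustrative.
catalog <- data.frame(
  hub_id = c("b8e3ef826ea84cbcb85951d051afc2fa",
             "a89d10d5e28444ceb0c8d1d4c0ee39cc",
             "daca9df799ea4a54a29955ce7fb972d4"),
  title = c("2008 Field Information Requests (FIRS)", "Bicycle Thefts", "Budget 2020"),
  tags = c("FIRS; Field Information Requests",
           "Bike; Bicycle; Thefts; Crime; Toronto; TPS; Toronto Police",
           "Budget; Toronto Police Service; TPS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where `tag` matches a whole tag, ignoring case.
has_tag <- function(tags, tag) {
  vapply(strsplit(tags, ";", fixed = TRUE),
         function(x) tolower(tag) %in% tolower(trimws(x)),
         logical(1))
}

tps_ids <- catalog$hub_id[has_tag(catalog$tags, "TPS")]
print(tps_ids)
```

Matching whole tags avoids false hits that a plain substring search would produce, for example "TPS" inside a longer tag.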
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 4702e79fd2404f7d93dd9866f45d7ec2.
morie_datasets_tps_arrested_and_charged_persons(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides an aggregate count of persons who have been arrested and charged. The data is aggregated by division, neighbourhood, sex, age, crime category, and crime subtype.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Arrests
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 899f1b3b047c47659a9843e9c5858269.
morie_datasets_tps_arrests_and_strip_searches(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset includes information related to all arrests and strip searches.
Tags: RBDC; race; race based data
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Assault
morie_datasets_tps_assault(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 9a21cd6f550748c3a25ac89a108ca5c5.
morie_datasets_tps_automobile_ksi(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Automobile-related KSI collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; Automobile; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Auto Theft
morie_datasets_tps_autotheft(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id a89d10d5e28444ceb0c8d1d4c0ee39cc.
morie_datasets_tps_bicycle_thefts(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Bicycle Theft occurrences by reported date.
Tags: Bike; Bicycle; Thefts; Crime; Toronto; TPS; Toronto Police
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Bicycle Theft
morie_datasets_tps_bicycletheft(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
TPS PSDP – Break and Enter
morie_datasets_tps_breakandenter(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id daca9df799ea4a54a29955ce7fb972d4.
morie_datasets_tps_budget_2020(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id b511c476865b4f0a993cb7bb1c6be7cf.
morie_datasets_tps_budget_2021(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 25c20a7f15e44579acb947510405ab24.
morie_datasets_tps_budget_2022(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 6ac1f56513ab481091ce16f435c390b7.
morie_datasets_tps_budget_2023(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 584b12967d214bb9a673505d97295eea.
morie_datasets_tps_budget_2024(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id cae4c92e80f84e949de156b3ac0d4fef.
morie_datasets_tps_budget_2025(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id d80f9e0b3cc74f649e5e4593cdda207e.
morie_datasets_tps_budget_2026(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 3dca9164b32e4ca7b9c23f41efc9904b.
morie_datasets_tps_budget_by_command(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 46c7581a136445c78831acb657a4fb0d.
morie_datasets_tps_calls_for_service_attended(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a count of calls for service attended aggregated by division and neighbourhood.
Tags: ASR; TPS; Annual Statistical Report; Toronto Police; Calls for Service
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 0a239a5563a344a3bbf8452504ed8d68.
morie_datasets_tps_community_safety_indicators(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Community Safety Indicators (CSI) occurrences by reported date.
Tags: Community Safety Indicators; CSI; Crime; Toronto; TPS; Toronto Police
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 8f3cbe34f3724f93b3aa321b3e957092.
morie_datasets_tps_complaint_dispositions(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a breakdown of the total investigated complaints by disposition of the complaint submitted.
Tags: ASR; TPS; Toronto Police; Open Data; Annual Statistical Report; Complaints
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id b38c2524696943bb86398d314a06a42a.
morie_datasets_tps_cyclist_ksi(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Cyclist-related KSI collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; Cyclist; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 9cfcd6fe0d374f65afda69c4b9bdc60a.
morie_datasets_tps_dispatched_calls_by_division(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a count of the dispatched calls by division, including some specific units such as PRIME, Parking and “Other”. This data includes the command level at the time of reporting.
Tags: ASR; TPS; Annual Statistical Report; Toronto Police; Dispatched Calls
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 6288c8314c594bc9a384df2cf17f8474.
morie_datasets_tps_facilities(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Police stations and other TPS facilities.
Tags: Police Facilities; TPS Facilities; Facilities
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 317e768682d14fad94de83eaefbf5954.
morie_datasets_tps_fatals_ksi(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Fatal-related KSI collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; Fatal; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 9b1b38ed56764b968c25ce6b74e5dc0d.
morie_datasets_tps_firearms_top_calibres(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
The dataset provides a list of the most common types of pistols, revolvers, rifles and shotguns that comprise the crime guns seized by the Toronto Police Service.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Firearms
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 6cb7e76c7d5b4bf5bce0c533ca7fdf40.
morie_datasets_tps_gross_expenditures_by_division(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a breakdown of the Gross Expenditures for each division. This data includes the command level at the time of reporting.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Budget
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 8e95b932cd2d404b9d9d26c2ecc8ebec.
morie_datasets_tps_gross_operating_budget(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides the Gross Operating Budget incurred by the Toronto Police Service.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Budget
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Hate Crimes
morie_datasets_tps_hatecrimes(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
TPS Homicides feed.
morie_datasets_tps_homicide(year = NULL, max_features = NULL)
year |
Integer or |
max_features |
Integer or |
A data.frame.
TPS PSDP – Homicides
morie_datasets_tps_homicides(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
TPS PSDP – Intimate Partner and Family Violence
morie_datasets_tps_intimate_partner_family_violence(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id aaea16d94ae64da8a790d9649788de4c.
morie_datasets_tps_investigated_alleged_misconduct(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a breakdown of the total investigated complaints by type of complaint submitted.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Misconduct
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 0a1ee9d9436546dcbdc0ee9301e45e83.
morie_datasets_tps_killed_and_seriously_injured(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Killed or Seriously Injured (KSI)-related collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; MVC; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
List the TPS open-data layers bundled with morie.
morie_datasets_tps_layers()
A data.frame with columns name and url.
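A registry with `name` and `url` columns lends itself to a simple keyed lookup. The sketch below uses a toy registry with placeholder names and URLs (not the layers bundled with morie), and `layer_url_for()` is a hypothetical helper, not a package function.

```r
# Toy registry shaped like the documented return value (columns name, url).
# Names and URLs are placeholders.
layers <- data.frame(
  name = c("layer_a", "layer_b"),
  url = c("https://example.invalid/a/FeatureServer/0",
          "https://example.invalid/b/FeatureServer/0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the URL for exactly one named layer.
layer_url_for <- function(layers, key) {
  hit <- layers$url[layers$name == key]
  if (length(hit) != 1L) stop("expected exactly one layer named ", key)
  hit
}

u <- layer_url_for(layers, "layer_b")
u
```

Failing loudly on a missing or duplicated key keeps a mistyped layer name from silently returning an empty result.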
TPS Major Crime Indicators feed.
morie_datasets_tps_major_crime(year = NULL, max_features = NULL, include_geometry = FALSE, offline = FALSE)
year |
Integer or |
max_features |
Integer or |
include_geometry |
Logical; include |
offline |
Logical; return the bundled synthetic frame instead of hitting the live ArcGIS endpoint. |
A data.frame with the documented TPS schema.
Wraps the TPS Public Safety Data Portal "Mental Health Act
Apprehensions" layer (one row per police-attended MHA event).
Carries both HOOD_158 and HOOD_140 columns – callers should pick
a version via morie_tps_resolve_hood_col() before downstream
analysis.
morie_datasets_tps_mha_apprehensions(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
A data.frame.
TPS Public Safety Data Portal, "Mental Health Act Apprehensions Open Data" (https://data.tps.ca/datasets/333c4e1c96314741a83425045b6a7642_0/explore).
df <- morie_datasets_tps_mha_apprehensions(offline = TRUE)
table(df$APPREHENSION_TYPE)
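Since the layer carries both HOOD_158 and HOOD_140, downstream code has to commit to one neighbourhood vintage before aggregating. A minimal sketch of that step on made-up data follows; `pick_hood_col()` is a hypothetical helper written for illustration and is not the package's `morie_tps_resolve_hood_col()`, whose signature is not shown here.

```r
# Toy frame with both neighbourhood vintages; all values are made up.
mha <- data.frame(
  EVENT_ID = 1:4,
  HOOD_158 = c(1L, 1L, 70L, 158L),
  HOOD_140 = c(1L, 1L, 77L, 140L)
)

# Hypothetical helper: choose one vintage and return its column name.
pick_hood_col <- function(df, version = c("158", "140")) {
  version <- match.arg(version)
  col <- paste0("HOOD_", version)
  if (!col %in% names(df)) stop("column not found: ", col)
  col
}

hood <- pick_hood_col(mha, "158")
counts <- table(mha[[hood]])  # events per neighbourhood, 158-scheme
counts
```

Picking the column once and reusing its name avoids mixing the two schemes within a single analysis.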
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 542374a83ea64b3ba222c41309445b8e.
morie_datasets_tps_miscellaneous_calls_for_service(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset includes the following categories of data: Languages, Calls Received, and Alarm Calls.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Calls for Service
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id bc229f576f174e24946bd1649c98aa43.
morie_datasets_tps_miscellaneous_data(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset contains information pertaining to intimate partner violence, hate crimes and budget.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Hate Crime; Budget; IPV
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 7f9e4439a6e749dea32dab9e1704b58a.
morie_datasets_tps_miscellaneous_firearms(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Firearms
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id d691a9391c2a4c6d85bb761530d33310.
morie_datasets_tps_motorcylist_ksi(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Motorcyclist-related KSI collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; Motorcyclist; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id ea0cfecdb1de416884e6b0bf08a9e195.
morie_datasets_tps_neighbourhood_crime_rates(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Neighbourhood crime rates per 100,000.
Tags: Neighbourhood; Crime; Rate; Crime Rates; Community Safety Indicators; CSI; Toronto; TPS; Toronto Police
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id e4e28a899191479d8e53754414894870.
morie_datasets_tps_passenger_ksi(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Passenger-related KSI collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; Passenger; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 4a02ac3ed83d478c914d62c3064d6bc4.
morie_datasets_tps_patrol_zone(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Police patrol zones.
Tags: Patrol Zone; Boundary Files; Patrol Zones
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id a96252bf61b84fc68c3926bb7485970e.
morie_datasets_tps_pedestrian_ksi(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Pedestrian-related KSI collisions (2006 - 2023).
Tags: Killed or Seriously Injured; KSI; Pedestrian; Traffic; Collision; Traffic Collision
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 1f58109772e2484fba0f509c1ab49fe8.
morie_datasets_tps_personnel_by_command(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a count of personnel broken down by command level.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Personnel
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 62016275c866412d8de5db539dc0cb8a.
morie_datasets_tps_personnel_by_rank(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a count of personnel broken down by rank classification for Uniform, Civilian, and Other Staff.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Personnel
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id e29b8d05c4754b3b8fc234324811a897.
morie_datasets_tps_personnel_by_rank_by_division(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a count of personnel broken down by rank classification for Uniform & Civilian staff by division. This data includes the command level at the time of reporting.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Personnel
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 79c8e950bfe54ce39334ba108e1b325f.
morie_datasets_tps_persons_in_crisis_calls_for_service_attended(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Persons in crisis calls for service attended.
Tags: Persons in Crisis; PIC; Crisis; Apprehensions; MHA; Calls; Calls for Service; Toronto; TPS; Toronto Police
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id fda21b25213c4c07b08c5162cba5081f.
Phase 3CCC3. Bundled snapshot of the TPS Hub
fda21b25213c4c07b08c5162cba5081f (TPS_POLICE_DIVISIONS) – the
16 post-amalgamation TPS divisions (D11, D12, D13, D14, D22, D23,
D31, D32, D33, D41, D42, D43, D51, D52, D53, D55) with unit name,
address, and area_sqkm.
morie_datasets_tps_police_divisions(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
Police divisions (post D54/D55 amalgamation).
Tags: City of Toronto; Toronto; Open Data; Feature Class; Update; Data Load; Divisions; Police Divisions
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
A data.frame with DIV, UNIT_NAME, ADDRESS,
CITY, AREA_SQKM, plus shape area / perimeter fields.
df <- morie_datasets_tps_police_divisions(offline = TRUE)
nrow(df) # 16
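A quick sanity check on a divisions frame is to compare its DIV column against the 16 codes listed in the description. The sketch below runs on a toy frame built from those codes, so it does not need the package; `check_divisions()` is a hypothetical helper.

```r
# The 16 division codes listed in the description above.
div_codes <- c("D11", "D12", "D13", "D14", "D22", "D23", "D31", "D32",
               "D33", "D41", "D42", "D43", "D51", "D52", "D53", "D55")

# Hypothetical helper: report codes that are missing or unexpected.
check_divisions <- function(df, expected = div_codes) {
  list(
    n = nrow(df),
    missing = setdiff(expected, df$DIV),
    unexpected = setdiff(df$DIV, expected)
  )
}

# Toy stand-in for the bundled snapshot; AREA_SQKM left as NA on purpose.
toy <- data.frame(DIV = div_codes, AREA_SQKM = NA_real_,
                  stringsAsFactors = FALSE)
res <- check_divisions(toy)
res$n
```

The same check could be pointed at the data.frame returned by the loader, assuming its DIV values use the "D11"-style codes shown above.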
Phase 3CCC3. Pulls a TPS PSDP crime dataset
(morie_datasets_tps_assault() etc.) and left-joins its native
DIVISION + HOOD_158 + HOOD_140 columns against the bundled
boundary metadata loaders (morie_datasets_tps_police_divisions(),
morie_to_neighbourhoods() 158 + 140 + NIA flags) and the PSDP
layer registry (morie_tps_psdp_layers()).
morie_datasets_tps_psdp_resolved(layer_key, year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL, resolvers = c("division", "hood158", "hood140", "nia", "psdp_class"))
layer_key |
One of the PSDP |
year |
Optional year filter passed through to the underlying loader. |
max_features |
Optional row cap. |
offline |
If |
layer_url |
Backward-compat override for non-canonical FeatureServer URL. |
resolvers |
Character subset of
|
Per-row PSDP datasets carry these ID columns natively:
DIVISION (e.g., "D11") – joins to police_divisions.DIV
HOOD_158 (integer 1-158) – joins to 158-neighbourhoods
NEIGHBOURHOOD_158 (denormalised name)
HOOD_140 (integer 1-140) – joins to 140-neighbourhoods
NEIGHBOURHOOD_140 (denormalised name)
Resolver columns are prefixed division_*, hood158_*,
hood140_*, nia_*, psdp_* to avoid collisions. Left-join
semantics (row count preserved).
Mirrors the Chicago morie_datasets_chicago_crime_resolved()
(3VV+) and NYPD morie_datasets_nyc_nypd_resolved() (3AAA-3CCC1)
patterns.
A wide data.frame: PSDP columns first, then prefixed
resolver columns.
df <- morie_datasets_tps_psdp_resolved("assault", offline = TRUE)
names(df)
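The prefixed left-join described above can be illustrated with base R on toy data. This is a sketch of the general technique only, using `merge(..., all.x = TRUE)`; it is not the package's resolver code, and the event rows and unit names are placeholders.

```r
# Toy per-row crime data; "D99" deliberately has no matching division.
crimes <- data.frame(
  EVENT_ID = 1:4,
  DIVISION = c("D11", "D14", "D11", "D99"),
  stringsAsFactors = FALSE
)

# Toy division metadata with placeholder unit names.
divisions <- data.frame(
  DIV = c("D11", "D14"),
  UNIT_NAME = c("Unit A", "Unit B"),
  stringsAsFactors = FALSE
)

# Prefix every lookup column so joined names cannot collide.
lookup <- divisions
names(lookup) <- paste0("division_", names(lookup))

# Left join: keep every crime row, fill NA where no division matches.
resolved <- merge(crimes, lookup,
                  by.x = "DIVISION", by.y = "division_DIV",
                  all.x = TRUE, sort = FALSE)
resolved <- resolved[order(resolved$EVENT_ID), ]  # merge() may reorder rows
resolved
```

The row count is preserved only when the lookup key is unique, which is why the toy `divisions` table has one row per `DIV`.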
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 1cd5d478ef79424a8a6d5319a44edb0a.
morie_datasets_tps_regulated_interactions(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
The counts provided describe situations involving regulated interactions.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Regulated Interactions
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id fe2e40a464e64cb3a0e69ac3ccd17dfa.
morie_datasets_tps_reported_crimes(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset includes all reported crime offences by reported date aggregated by division.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Reported Crimes
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Robbery
morie_datasets_tps_robbery(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 8ee1697ce6af44a78640686a1feeeefb.
morie_datasets_tps_search_of_persons(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset includes all Level 3 and Level 4 searches that were conducted.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Search of Person
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Shooting and Firearm Discharges
morie_datasets_tps_shooting_firearm_discharges(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
TPS Shootings and Firearm Discharges feed.
morie_datasets_tps_shootings(year = NULL, max_features = NULL)
year |
Integer or |
max_features |
Integer or |
A data.frame.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 9d97ef7e8b494095be4abc0a628d7ce3.
morie_datasets_tps_staffing_by_command(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Toronto Police Service dataset.
Tags: Staffing; Budget; Toronto Police Service; TPS
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
TPS PSDP – Theft From Motor Vehicle
morie_datasets_tps_theft_from_motor_vehicle(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
TPS PSDP – Theft Over
morie_datasets_tps_theft_over(year = NULL, max_features = NULL, offline = TRUE, layer_url = NULL)
year |
Optional reporting year filter (applies an
|
max_features |
Optional cap on returned rows
( |
offline |
If |
layer_url |
Optional ArcGIS layer URL override. |
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 5069c21b5b364194807bf1958556b1ff.
morie_datasets_tps_tickets_issued(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides an aggregated count of tickets issued by year, ticket type, offence, age group, division, and neighbourhood.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Tickets
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id a83aa604fed240acaf2dfe64e1b323f8.
morie_datasets_tps_top_20_offences_of_firearm_seizures(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a list of top 20 offences ranked by volume, for occurrences linked to a firearm seizure.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Firearms
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id a16edf4bc9484e94ad7e00bc22727544.
morie_datasets_tps_total_public_complaints(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset provides a breakdown of the total number of public complaints from the Law Enforcement Complaints Agency (L.E.C.A.) broken down by complaints that were investigated and not investigated.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Complaints
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id bc4c72a793014a55a674984ef175a6f3.
morie_datasets_tps_traffic_collisions(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
Collision occurrences by occurrence date.
Tags: Traffic; Collision; Traffic Collisions; Motor Vehicle Collisions; Toronto; TPS; Toronto Police
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 04633ebdaba941efaa82f2cdaaa00bb8.
morie_datasets_tps_use_of_force_call_for_service_types(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides information about the types of calls for service which resulted in an enforcement action and/or reported use of force.
Tags: race; race based data; RBDC
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 98d88b18c0364c8d86e6a7c690037b85.
morie_datasets_tps_use_of_force_call_sources_by_month(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides monthly counts of incidents which officers responded to from different call sources and which resulted in an enforcement action and/or reported use of force.
Tags: RBDC; race based data; race
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id de9284945c3e479e938c4b77586535b1.
morie_datasets_tps_use_of_force_gender_composition(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides information about the genders of people involved in enforcement action incidents, including those that may be associated with a reported use of force.
Tags: race; race based data; rbdc
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 0e7f95cb45704c8e8c9a05973422211c.
morie_datasets_tps_use_of_force_location_of_occurrences(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides location information, aggregated to the division level.
Tags: RBDC; race; race based data
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id a9b6bef1d34b44eea814e1869fdcda62.
morie_datasets_tps_use_of_force_occurrence_category(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides information about the nature of the incident or the most serious offence associated with the incident, after officers arrive to the scene.
Tags: rbdc; race; race based data
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id b2bd6427e19a4706a4727338824a82b6.
morie_datasets_tps_use_of_force_time_of_day_trends(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides information on the time periods of the day in which incidents took place and which resulted in an enforcement action and/or reported use of force.
Tags: RBDC; race; race based data
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 9388798a44cd4ee5bc175669d8b6fb13.
morie_datasets_tps_use_of_force_use_of_force_types_and_perceived_weapons(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This table provides information about reported use of force incidents and the highest type of force used as well as whether officers perceived weapons were carried by people involved.
Tags: race; race based data; rbdc
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Toronto PS ArcGIS Hub dataset wrapper. Thin dispatch to
morie_datasets_tps_arcgis_hub_by_id() with the canonical
hub item_id 6afabfd5109847a2bbba3eaeb0275e35.
morie_datasets_tps_victims_of_crime(format = "json", where = "1=1", max_features = NULL, layer_idx = 0L, offline = TRUE, dest = NULL)
format |
One of |
where |
SoQL-style |
max_features |
Optional cap on returned rows. Passed as
|
layer_idx |
Integer index of the FeatureServer layer to pull
(default |
offline |
Logical; if |
dest |
Optional path for binary downloads
( |
This dataset includes all identified victims of crimes against the person, including, but not limited to, those that may have been deemed unfounded after investigation, those that may have occurred outside the City of Toronto limits, or have no verified location.
Tags: ASR; TPS; Toronto Police; Annual Statistical Report; Victims
A data.frame / GeoJSON list / file path; see
morie_datasets_tps_arcgis_hub_by_id() for per-format semantics.
Phase 3CCC4. Hits the Opendatasoft v2.1 /records endpoint for
an arbitrary Vancouver dataset slug. Returns the results array
as a data.frame. For larger pulls, use format = "csv" to hit
the unrestricted /exports/csv endpoint instead.
morie_datasets_vancouver_opendata_by_id(dataset_id, limit = 100L, format = c("json", "csv"))
dataset_id |
Opendatasoft dataset slug (from
|
limit |
Page size (max 100 for |
format |
One of |
A data.frame of records.
## Not run:
df <- morie_datasets_vancouver_opendata_by_id("non-market-housing", limit = 50)
nrow(df)
## End(Not run)
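To make the two endpoints concrete, here is a minimal sketch of how the request URLs might be assembled. The helper name `vancouver_ods_url()` is hypothetical, and the base path is an assumption taken from the public layout of the Opendatasoft Explore API v2.1; the package may build its requests differently. The helper only builds strings and performs no network I/O.

```r
# Hypothetical helper: builds the request URL only, no network access.
vancouver_ods_url <- function(dataset_id, limit = 100L,
                              format = c("json", "csv")) {
  format <- match.arg(format)
  # Assumed base path for the Explore API v2.1 catalogue.
  base <- "https://opendata.vancouver.ca/api/explore/v2.1/catalog/datasets"
  slug <- utils::URLencode(dataset_id, reserved = TRUE)
  if (format == "json") {
    # /records is paginated; the page size is capped at 100.
    stopifnot(limit >= 1, limit <= 100)
    sprintf("%s/%s/records?limit=%d", base, slug, as.integer(limit))
  } else {
    # /exports/csv returns the whole dataset, so no limit is attached.
    sprintf("%s/%s/exports/csv", base, slug)
  }
}

vancouver_ods_url("non-market-housing", limit = 50)
vancouver_ods_url("non-market-housing", format = "csv")
```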
Phase 3CCC4. Bundled snapshot of every City-of-Vancouver dataset
published on opendata.vancouver.ca (190 datasets as of
2026-05-24). Each row identifies a dataset by its Opendatasoft
dataset_id slug (used as the URL path segment for records /
exports endpoints).
morie_datasets_vancouver_opendata_layers(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
A data.frame with dataset_id, title, publisher,
records_count.
Opendatasoft Explore API v2.1, https://opendata.vancouver.ca/api-console/explore/v2.1/.
cat_df <- morie_datasets_vancouver_opendata_layers(offline = TRUE)
nrow(cat_df) # 190
head(cat_df$title)
Phase 3DDD2. Loads VPD's open crime incident records. Three source modes:
morie_datasets_vpd_crime(offline = TRUE, zip_path = NULL, csv_path = NULL, max_features = NULL, accept_terms = FALSE)
offline |
If |
zip_path |
Optional path to a user-downloaded
|
csv_path |
Optional path to a pre-extracted CSV. Mutually
exclusive with |
max_features |
Optional row cap. |
accept_terms |
If reading from a user-supplied zip/csv,
pass |
offline = TRUE (default): Reads a bundled stratified
550-row sample (50 rows per TYPE x 11 categories) covering
years 2003-2026 and all 25 VPD-defined neighbourhoods.
Intended for tests + intro examples – NOT for analysis.
zip_path = "...": Reads from a local copy of VPD's
crimedata_csv_AllNeighbourhoods_AllYears.zip that the
caller has downloaded themselves from
https://geodash.vpd.ca/opendata/ (after accepting
VPD's terms + conditions there).
csv_path = "...": Reads from a pre-extracted CSV (skips the zip if the caller already has the CSV on disk).
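For readers unfamiliar with the "50 rows per TYPE" design of the bundled sample, the following is a minimal base-R sketch of equal-allocation stratified sampling. The toy incident table and its sizes are made up, and this is not necessarily how the bundled sample was drawn.

```r
set.seed(1)

# Toy incident table: three categories of unequal size (made-up data).
incidents <- data.frame(
  TYPE = rep(c("Mischief", "Other Theft", "Theft of Bicycle"),
             times = c(200, 500, 80)),
  YEAR = sample(2003:2026, 780, replace = TRUE)
)

per_stratum <- 50L

# Draw the same number of rows from each TYPE stratum.
rows_by_type <- split(seq_len(nrow(incidents)), incidents$TYPE)
picked <- unlist(lapply(rows_by_type, function(rows) {
  rows[sample.int(length(rows), min(per_stratum, length(rows)))]
}), use.names = FALSE)
sample_df <- incidents[sort(picked), ]

table(sample_df$TYPE)
```

Equal allocation over-represents rare categories relative to their true share, which is one reason such a sample suits tests and demonstrations rather than analysis.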
The bundled sample is open-licensed under VPD's GeoDASH terms;
the full feed requires manual T&C acceptance per VPD policy and
there is no automation-friendly API. See
morie_datasets_vpd_legal_disclaimer() for the full text.
Columns (10): TYPE, YEAR, MONTH, DAY, HOUR, MINUTE,
HUNDRED_BLOCK, NEIGHBOURHOOD, X, Y. Coordinates are UTM
Zone 10 N (NAD83 / EPSG:26910). For Offence-Against-a-Person
incidents the location is deliberately randomized + offset per
VPD's privacy policy.
Categories present in the sample:
Break and Enter Commercial
Break and Enter Residential/Other
Homicide
Mischief
Offence Against a Person (aggregated)
Other Theft
Theft from Vehicle
Theft of Bicycle
Theft of Vehicle
Vehicle Collision or Pedestrian Struck (with Fatality)
Vehicle Collision or Pedestrian Struck (with Injury)
A data.frame with 10 columns.
Source: extracted from the PRIME-BC Police Records Management System (RMS); filtered + aggregated to comply with the BC Freedom of Information & Protection of Privacy Act (BC FIPPA).
Offence Against a Person is INTENTIONALLY aggregated
to reduce re-identification risk. It bundles robbery,
assault (incl. sexual assault, domestic assault), and
other violent incidents EXCEPT Assaults Against Police.
Sub-categories are deliberately NOT exposed; do not
attempt to disaggregate this column.
Other Theft aggregates shoplifting, theft of
personal property (over / under $5000), mail theft, and
utilities theft.
Reporting method: 'All Offence' + 'Founded' (incidents the investigating officer determined did occur). This is NOT comparable to Statistics Canada's published numbers, which use 'UCR Survey' Most-Serious-Offence (MSO) scoring. Do not mix VPD GeoDASH totals with StatCan totals in the same denominator.
Location precision is deliberately reduced: person-crimes have their X/Y randomized to several blocks and offset to an intersection; no time/street-name is provided. Property-crimes are provided at the hundred-block level only. Never interpret a row's X/Y as the actual scene of the incident.
Crime classification + file status may change retroactively as investigations evolve. The dataset is a snapshot, not an archive of fact.
Update schedule: VPD refreshes the feed every Sunday morning. Cache locally for reproducible analysis.
Not a calls-for-service log: only incidents that passed the founded-categorization filter appear. Totals do not reflect total calls or complaints made to the VPD.
Liability disclaimer: VPD / Vancouver Police Board / City of Vancouver assume no liability for any decision made from this data. morie surfaces it as-is.
VPD GeoDASH Open Data, https://geodash.vpd.ca/opendata/.
df <- morie_datasets_vpd_crime(offline = TRUE)
nrow(df) # 550
table(df$TYPE)
table(df$NEIGHBOURHOOD)
Phase 3DDD2. Returns the legal disclaimer text shipped with VPD's open crime data download. Useful in headless or programmatic workflows where the human reader of the GeoDASH web UI is not the same as the script user.
morie_datasets_vpd_legal_disclaimer()
A character vector (one element per line).
Opens (or creates) the per-user cache database. The default backend
is DuckDB — zero-config like SQLite, but vectorised + columnar,
so it handles the multi-GB-scale open-data PUMFs (TPS, CPADS bulk)
that morie ingests without breaking down on analytical queries. For
back-compat, an existing SQLite cache at morie.db is reused; if
duckdb is unavailable, falls back to SQLite.
morie_db_connect(db_path = NULL)
db_path |
Optional path to a DuckDB ( |
For non-default backends (PostgreSQL, MariaDB, MS SQL Server, ...),
construct your own DBI connection and pass it as con to the
morie_cache_* and morie_load_dataset functions:
con <- DBI::dbConnect(RPostgres::Postgres(),
host = "...", dbname = "morie", user = "...", password = "...")
morie_load_dataset("ocp21", con = con)
A DBI connection object.
# DuckDB (default when 'duckdb' is installed); pass a '.db' path for SQLite.
if (requireNamespace("duckdb", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".duckdb")
  con <- morie_db_connect(db_path = tmp)
  DBI::dbListTables(con)
  DBI::dbDisconnect(con)
  file.remove(tmp)
}
Looks up table_name in the per-dataset index registry (see
.morie_db_index_registry() for the full list) and creates each
CREATE INDEX IF NOT EXISTS against con. Specs whose columns
aren't present in the actual table are silently skipped, so this is
safe to call on any morie cache table — including subsets that drop
some columns. Returns the number of CREATE INDEX statements that
actually ran (not the number registered).
morie_db_create_indexes(con, table_name)
con |
A DBI connection. |
table_name |
The cache table name (case-sensitive). Common
short names: |
Cardinality-based selection: every indexed column has > 30 distinct
values in the real published data; low-cardinality columns
(Gender, Yes/No alerts, Measure) are intentionally skipped
because the index overhead exceeds the lookup benefit.
Invisibly returns the integer count of CREATE INDEX
statements executed.
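The "skip specs whose columns are missing, count what actually ran" behaviour can be sketched without a database. Everything named here is hypothetical: the shape of a registry entry, the index names, the `tps_assault` table name, and the helper `build_index_sql()`. The sketch only generates the SQL strings.

```r
# Hypothetical registry entry: a list of index specs for one table.
specs <- list(
  list(name = "idx_year",     columns = "YEAR"),
  list(name = "idx_div_year", columns = c("DIVISION", "YEAR")),
  list(name = "idx_hood",     columns = "HOOD_158")
)

build_index_sql <- function(table_name, table_columns, specs) {
  # Keep only specs whose columns all exist in the actual table.
  usable <- Filter(function(s) all(s$columns %in% table_columns), specs)
  vapply(usable, function(s) {
    sprintf('CREATE INDEX IF NOT EXISTS %s ON "%s" (%s)',
            s$name, table_name,
            paste(sprintf('"%s"', s$columns), collapse = ", "))
  }, character(1))
}

# A subset table that dropped HOOD_158: the third spec is skipped.
sql <- build_index_sql("tps_assault", c("YEAR", "DIVISION"), specs)
length(sql)  # number of statements that would run, not the number registered
```

With a live connection, each string could then be passed to `DBI::dbExecute(con, statement)`, and the returned count would be `length(sql)`.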
Wraps dbscan::dbscan.
morie_dbscan_clustering(x, eps = 0.5, min_samples = 5L, metric = "euclidean")
x |
Numeric matrix. |
eps |
Neighbourhood radius. |
min_samples |
Minimum points in eps-neighbourhood for a core point. |
metric |
Distance metric (passed to dbscan). |
Named list: estimate, labels, n_clusters, n_noise, core_sample_indices, eps, min_samples, n, method.
morie_dbscan_clustering(x = rnorm(50))
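To illustrate what `eps` and `min_samples` control, here is a minimal base-R sketch of the core-point test at the heart of DBSCAN. It is not the `dbscan::dbscan` implementation and does not assign cluster labels; the coordinates are made up, and the neighbour count here is assumed to include the point itself.

```r
# Two tight groups plus one isolated point (made-up coordinates).
x <- rbind(
  cbind(c(0, 0.1, 0.2, 0.1, 0.0), c(0, 0.1, 0.0, 0.2, 0.1)),
  cbind(c(5, 5.1, 5.2, 5.1, 5.0), c(5, 5.1, 5.0, 5.2, 5.1)),
  c(10, 10)
)
eps <- 0.5
min_samples <- 5L

d <- as.matrix(dist(x))            # pairwise Euclidean distances
n_neighbours <- rowSums(d <= eps)  # each count includes the point itself
is_core <- n_neighbours >= min_samples

which(is_core)   # points in the two dense groups
which(!is_core)  # the isolated point, a candidate for noise
```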
Two-step DCC(1,1) on a panel of return series.
morie_dcc_multivariate_garch(x)
x |
Numeric matrix of returns (T x k). |
Named list with a, b, unconditional_correlation,
conditional_correlation, conditional_variance, loglik, n, k, method.
morie_dcc_multivariate_garch(x = matrix(rnorm(150), 50, 3))
CART tree via rpart::rpart, returning the root split structure
and feature importances.
morie_decision_tree_split(x, y, criterion = "gini", max_depth = 30L, seed = 0L)
x |
Numeric predictor matrix. |
y |
Response (factor for classification). |
criterion |
"gini" or "entropy". rpart names its classification splitting rules "gini" and "information", so "entropy" is mapped to rpart's "information" rule. |
max_depth |
Max tree depth. |
seed |
RNG seed. |
Named list: estimate, train_accuracy, root_feature, root_threshold, root_impurity, n_leaves, feature_importances, criterion, n, method.
morie_decision_tree_split(x = rnorm(50), y = rnorm(50))
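As background for `root_threshold` and `root_impurity`, the sketch below shows how a Gini-based root split on a single numeric feature can be chosen by exhaustive search. It is a base-R illustration with made-up data, not the rpart algorithm, and the helper names are hypothetical.

```r
gini_impurity <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Weighted impurity of the two children produced by x <= threshold.
split_impurity <- function(x, y, threshold) {
  left <- x <= threshold
  n <- length(y)
  (sum(left) / n) * gini_impurity(y[left]) +
    (sum(!left) / n) * gini_impurity(y[!left])
}

x <- c(1, 2, 3, 4, 5, 6)
y <- factor(c("a", "a", "a", "b", "b", "b"))

# Candidate thresholds: midpoints between consecutive sorted values.
xs <- sort(unique(x))
candidates <- (head(xs, -1) + tail(xs, -1)) / 2
scores <- vapply(candidates, function(t) split_impurity(x, y, t), numeric(1))
best <- candidates[which.min(scores)]

gini_impurity(y)  # impurity at the root before splitting
best              # threshold giving the purest children
```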
Single-hidden-layer MLP genomic predictor (base R)
morie_deep_learning_genomic(x, y, markers, hidden = 16, n_epochs = 200, lr = 0.01, l2 = 0.001, seed = 0, deterministic_seed = NULL)
x |
Fixed-effect design (optional). |
y |
Numeric response. |
markers |
Genotype matrix (n x m). |
hidden |
Hidden units (default 16). |
n_epochs |
Training epochs. |
lr |
Learning rate. |
l2 |
L2 weight decay. |
seed |
Seed. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, y_hat, beta, W1, b1, w2, b2, se, n, method).
Montesinos Lopez Ch 12.
morie_deep_learning_genomic(x = rnorm(50), y = rnorm(50), markers = matrix(sample(0:2, 200, TRUE), 50, 4))
Returns a named character vector mapping canonical variable keys used by
morie_generate_synthetic_data() to output column names.
morie_default_synthetic_name_map(profile = c("generic", "morie_legacy"))
profile |
Name profile. |
Named character vector.
morie_default_synthetic_name_map("generic")
Returns the default named map of workflow steps to project script paths.
morie_default_workflow_map()
Named character vector.
morie_default_workflow_map()
Thin extender over DescTools::Atkinson for the
inequality-aversion-parameter family of inequality indices.
morie_desc_atkinson(x, parameter = 0.5, ...)
x |
A non-negative numeric vector. |
parameter |
Inequality-aversion parameter (default 0.5),
forwarded to |
... |
Further arguments forwarded to
|
A list with $method = "DescTools::Atkinson" and
$raw (the Atkinson index).
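For reference, the Atkinson index with aversion parameter e is 1 minus the ratio of the "equally distributed equivalent" value to the arithmetic mean. The base-R sketch below follows that textbook definition; the helper name is hypothetical, and it is offered as an illustration rather than a copy of the DescTools code.

```r
atkinson_index <- function(x, parameter = 0.5) {
  stopifnot(is.numeric(x), all(x >= 0), parameter >= 0)
  if (parameter == 1) {
    # Limiting case: 1 - geometric mean / arithmetic mean.
    1 - exp(mean(log(x))) / mean(x)
  } else {
    1 - mean(x^(1 - parameter))^(1 / (1 - parameter)) / mean(x)
  }
}

atkinson_index(c(5, 5, 5))               # perfect equality
atkinson_index(c(1, 4))                  # default parameter 0.5
atkinson_index(c(1, 4), parameter = 1)   # geometric-mean case
```

Larger values of `parameter` weight the lower end of the distribution more heavily, so the same data produce a larger index.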
Thin extender over DescTools::CramerV for the symmetric
association statistic between two categorical variables.
morie_desc_cramers_v(x, y = NULL, ...)
x |
A factor, character vector, or contingency table. |
y |
Optional second categorical vector when |
... |
Further arguments forwarded to
|
A list with $method = "DescTools::CramerV" and
$raw (the upstream numeric or matrix).
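As a reminder of what the statistic measures, Cramér's V rescales the Pearson chi-squared statistic to the range 0 to 1. The base-R sketch below uses the uncorrected chi-squared statistic from `stats::chisq.test`; the helper name is hypothetical, and the upstream function may apply corrections or return confidence limits that this sketch does not.

```r
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Uncorrected Pearson chi-squared; small-count warnings are not relevant here.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  sqrt(as.numeric(chi2) / (n * (k - 1)))
}

# Perfect association between two binary variables.
cramers_v(c("a", "a", "b", "b"), c("u", "u", "v", "v"))

# No association: every cell has the same count.
cramers_v(rep(c("a", "b"), each = 4), rep(c("u", "v"), times = 4))
```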
Thin extender over DescTools::Gini for the Gini index of
(in)equality on a non-negative numeric vector.
morie_desc_gini(x, ...)
x |
A non-negative numeric vector. |
... |
Further arguments forwarded to |
A list with $method = "DescTools::Gini" and
$raw (the Gini estimate, optionally with CI).
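A minimal base-R sketch of the plain (uncorrected) Gini index, computed from the sorted values, is shown below for orientation. The helper name is hypothetical. The upstream function may apply a small-sample correction by default, so its output can differ slightly from this version on short vectors.

```r
gini_index <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  x <- sort(x)
  n <- length(x)
  # Rank-weighted sum form of the mean absolute difference.
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

gini_index(c(3, 3, 3, 3))    # perfect equality
gini_index(c(1, 3))
gini_index(c(0, 0, 0, 10))   # one unit holds everything: (n - 1) / n
```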
Thin extender over DescTools::CohenKappa (when y
is supplied) or DescTools::KappaM (when x is a
multi-rater matrix / data frame) for inter-rater agreement.
morie_desc_kappa(x, y = NULL, ...)
x |
A vector of ratings (paired with |
y |
Optional second rater vector when |
... |
Further arguments forwarded to the upstream function. |
A list with $method (qualified upstream name) and
$raw (the upstream return object).
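For the two-rater case, Cohen's kappa compares observed agreement with the agreement expected by chance from the raters' marginal distributions. The sketch below is an unweighted base-R illustration with a hypothetical helper name; it does not cover the multi-rater path or confidence intervals.

```r
cohen_kappa <- function(r1, r2) {
  # Use a shared level set so the agreement table is square.
  lev <- sort(unique(c(as.character(r1), as.character(r2))))
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  p <- tab / sum(tab)
  p_obs <- sum(diag(p))                  # observed agreement
  p_exp <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

cohen_kappa(c("y", "y", "n", "n"), c("y", "y", "n", "n"))  # full agreement
cohen_kappa(c("y", "y", "n", "n"), c("y", "n", "y", "n"))  # chance-level agreement
```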
Thin extender over DescTools::Winsorize for symmetric
winsorisation by quantile (default: the 5% and 95% quantiles, per the probs default).
morie_desc_winsorize(x, probs = c(0.05, 0.95), ...)
x |
A numeric vector. |
probs |
Length-2 numeric vector of lower / upper quantile
probabilities, forwarded to |
... |
Further arguments forwarded to
|
A list with $method = "DescTools::Winsorize" and
$raw (the winsorised numeric vector).
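Winsorisation clamps values beyond chosen quantiles to those quantiles instead of discarding them. A minimal base-R sketch, assuming the default quantile definition of `stats::quantile` and using a hypothetical helper name:

```r
winsorize <- function(x, probs = c(0.05, 0.95)) {
  stopifnot(is.numeric(x), length(probs) == 2, probs[1] < probs[2])
  lim <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Clamp from below, then from above.
  pmin(pmax(x, lim[1]), lim[2])
}

x <- c(1:19, 1000)   # one extreme value
w <- winsorize(x)
range(w)             # the outlier is pulled in to the 95% quantile
length(w) == length(x)
```

Unlike trimming, the result has the same length as the input, so it can be substituted directly into downstream summaries.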
Loads the describe_inst/extdata/describe_corpus.Rds bundle and prints it to
the console. This is the R-side mirror of the Python
morie.describe() function (closing the v0.9.5.4 parity
gap; shipped in v0.9.5.5).
morie_describe(callable)
callable |
A morie callable, as a function object (passed
unquoted), or a character scalar name. The lookup strips the
leading |
The lookup is forgiving: a leading morie_ prefix on the
callable name is stripped automatically, so
morie_describe("aalen") and
morie_describe("morie_aalen") resolve to the same
narrative.
Invisibly returns the narrative as a character scalar.
If no matching describe entry is found, returns NULL
and prints a helpful diagnostic.
See also morie_describe_by_name, the
string-only variant that does not capture symbol names.
## Not run:
morie_describe("aalen")
morie_describe("morie_aalen") # leading prefix stripped
morie_describe(morie_aalen) # function-object form
## End(Not run)
String-only variant of morie_describe. Use this when you want to pass a name as a string and avoid the
unquoted-symbol capture behaviour of morie_describe.
morie_describe_by_name(name)
name |
Character scalar, the callable's mnemonic name
(with or without the |
Invisibly returns the narrative as a character scalar.
If no matching describe entry is found, returns NULL
and prints a helpful diagnostic.
## Not run:
morie_describe_by_name("aalen")
morie_describe_by_name("morie_aalen")
## End(Not run)
Design effect (DEFF)
morie_design_effect(weights)
weights |
Numeric vector of sampling weights. |
Numeric design effect (= n / Kish effective sample size).
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
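The relation "design effect = n / Kish effective sample size" can be checked with a few lines of base R. `kish_deff()` is a hypothetical name used only for this sketch; it assumes strictly positive weights.

```r
# Kish design effect from sampling weights (base R).
# kish_deff() is a hypothetical name used only for this sketch.
kish_deff <- function(weights) {
  stopifnot(is.numeric(weights), all(weights > 0))
  n <- length(weights)
  n_eff <- sum(weights)^2 / sum(weights^2)  # Kish effective sample size
  n / n_eff
}

kish_deff(rep(1, 100))                 # equal weights: no precision loss
kish_deff(c(rep(1, 50), rep(3, 50)))   # unequal weights inflate the variance
```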
Given a callable / fixture name and an integer seed, derive a stable
R-side seed value via SHA-256, install it with set.seed(), and
return it invisibly. The matched Python helper
morie._det_rng.from_seed(name, seed) builds a numpy.random.Generator
from the same SHA digest so bootstrap / MCMC draws on the two sides
agree to Monte-Carlo tolerance (and are bit-identical when a
deterministic-pseudo-bootstrap mode is plumbed in).
morie_det_rng(name, seed)
name |
Character scalar; stable callable / fixture name. Must be identical to the string the Python side passes. |
seed |
Integer; user-supplied seed. |
Mechanism: SHA-256(paste0(name, ":", seed)) yields a 32-byte digest;
bytes [9:12] (1-indexed, i.e. hex chars 17..24) form a 32-bit value
reduced modulo 2^31 - 1 and passed to set.seed(). Bytes [1:8]
are reserved for the Python Philox key. See inst/python-stub/det_rng.py
(or the parent morie/_det_rng.py) for the Python counterpart.
SHA-256 is computed with the digest or openssl package when one of
them is installed. Both are widely available on CRAN; digest is tried
first, then openssl, and an internal pure-R SHA-256 implementation is
loaded only when neither is available. In practice CRAN reverse
dependencies of morie ship with at least one of the two.
Integer seed installed via set.seed() (invisibly).
morie_det_rng("ksr07_bootstrap", 42L)
rnorm(5)  # reproducible draws keyed by ("ksr07_bootstrap", 42)
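The byte-to-seed reduction described above can be sketched without a hashing package by starting from a hex string. The helper name `seed_from_hex()`, the big-endian byte order, and the placeholder digest are assumptions of this sketch; the real function derives the digest from `name` and `seed`.

```r
# Sketch: map hex chars 17..24 of a 64-char SHA-256 hex digest to a seed.
# seed_from_hex() and the big-endian byte order are assumptions of this sketch.
seed_from_hex <- function(hex) {
  stopifnot(is.character(hex), length(hex) == 1L, nchar(hex) == 64L)
  # strtoi() cannot represent 32-bit values above 2^31 - 1, so the four
  # bytes are parsed as two 16-bit halves and combined in double precision.
  hi <- strtoi(substr(hex, 17, 20), base = 16L)
  lo <- strtoi(substr(hex, 21, 24), base = 16L)
  value <- hi * 65536 + lo
  as.integer(value %% (2^31 - 1))
}

# Placeholder digest (not a real SHA-256 output) with bytes 9..12 = ff ff ff ff.
demo_hex <- paste0(strrep("0", 16), "ffffffff", strrep("0", 40))
seed <- seed_from_hex(demo_hex)
set.seed(seed)
```

Reducing modulo 2^31 - 1 keeps the result inside R's integer range, which is what `set.seed()` expects.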
Helper exposed so testthat can assert the Python and R sides compute
identical hex digests for the same (name, seed) pair before either
RNG is even consulted.
morie_det_rng_sha_hex(name, seed)
name |
Character scalar. |
seed |
Integer scalar. |
64-character lowercase hex string.
morie_det_rng_sha_hex(name = "example", seed = 1L)
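Estimates the canonical two-group / two-period DiD treatment effect
$$\delta = \left(E[Y \mid D = 1, \mathrm{Post} = 1] - E[Y \mid D = 1, \mathrm{Post} = 0]\right) - \left(E[Y \mid D = 0, \mathrm{Post} = 1] - E[Y \mid D = 0, \mathrm{Post} = 0]\right).$$
With covariates, fits the regression
$$Y_i = \beta_0 + \beta_1 D_i + \beta_2 \mathrm{Post}_i + \delta\,(D_i \times \mathrm{Post}_i) + X_i'\gamma + \varepsilon_i$$
and reports $\hat{\delta}$.
morie_did_2x2(data, outcome, treatment, post, covariates = NULL, cluster = NULL, alpha = 0.05)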
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
For multi-period staggered designs prefer
morie_did_group_time_att (Callaway-Sant'Anna via
did). morie_did_doubly_robust (via DRDID)
is the recommended option when pre-treatment covariates are
available.
A list with elements estimate, std_error,
t_stat, p_value, ci_lower, ci_upper,
n_treated, n_control, method, details.
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.
## Not run:
df <- data.frame(
  y = rnorm(200),
  d = rep(c(0, 1), each = 100),
  post = rep(c(0, 1), times = 100)
)
morie_did_2x2(df, "y", "d", "post")
## End(Not run)
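Without covariates, the 2x2 DiD estimate is the interaction coefficient of a saturated regression, which equals the difference of group-by-period mean differences. A base-R sketch on simulated data (column names and the data-generating process are illustrative, and no clustering is applied):

```r
# Base-R sketch of the 2x2 DiD: interaction coefficient vs. difference of
# group-by-period mean differences. All column names are illustrative.
set.seed(42)
n <- 400
df <- data.frame(
  d    = rep(c(0, 1), each = n / 2),
  post = rep(c(0, 1), times = n / 2)
)
df$y <- 1 + 0.5 * df$d + 0.3 * df$post + 2 * df$d * df$post + rnorm(n)

fit <- lm(y ~ d * post, data = df)
did_reg <- unname(coef(fit)["d:post"])

m <- tapply(df$y, list(df$d, df$post), mean)   # rows: d, columns: post
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

c(regression = did_reg, means = did_means)
```

The two numbers agree up to floating-point error because the model with both main effects and the interaction is saturated in the four cells.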
Mirrors the aggregation schemes available in
did::aggte (overall ATT, by-cohort, by-calendar-time,
by-event-time) but produces a tidy data.frame consumed by
the rmorie / MRM downstream pipelines.
morie_did_aggregate_gt_att(gt_results, aggregation = "overall", time_col = "time", cohort_col = "cohort", att_col = "att", se_col = "std_error")
gt_results |
Output of |
aggregation |
One of |
time_col, cohort_col, att_col, se_col
|
Column-name overrides. |
A data frame with group, estimate,
std_error, ci_lower, ci_upper.
Thin wrapper around bacondecomp::bacon. Decomposes a
two-way fixed-effects DiD estimate into a weighted average of all
possible 2x2 DiD comparisons. Hard-errors if bacondecomp is
not installed.
morie_did_bacon_decomposition(data, outcome, treatment, unit, time)
data |
Balanced panel data. |
outcome |
Outcome column. |
treatment |
Binary treatment indicator that turns on at onset. |
unit |
Unit identifier. |
time |
Time period. |
A list with components (data frame) and
overall_estimate.
Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254–277.
Thin wrapper around DIDmultiplegt::did_multiplegt. Computes
the instantaneous treatment effect for switchers using
appropriate comparisons. Hard-errors if DIDmultiplegt is
not installed.
morie_did_chaisemartin_dhaultfoeuille(data, outcome, treatment, unit, time, n_bootstrap = 200L, seed = 42L, alpha = 0.05)
data |
Panel data. |
outcome, treatment, unit, time
|
Column names. |
n_bootstrap |
Bootstrap replications (forwarded as
|
seed |
RNG seed. |
alpha |
Significance level. |
A result list; see morie_did_2x2.
de Chaisemartin, C., & D'Haultfoeuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9), 2964–2996.
Estimates the marginal effect of a one-unit increase in treatment intensity in the post period.
morie_did_continuous_treatment(data, outcome, dose, post, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
dose |
Continuous treatment-intensity column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A result list; see morie_did_2x2.
Reports group / period sample sizes, outcome distributions, and baseline covariate balance (standardised mean differences).
morie_did_diagnostics(data, outcome, treatment, post, covariates = NULL, cluster = NULL)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
For richer covariate-balance reporting (variance ratios, KS
statistics, love plots) prefer cobalt::bal.tab /
cobalt::love.plot.
A list with sample_sizes, outcome_stats,
covariate_balance.
Thin wrapper around DRDID::drdid_rc for the 2x2
repeated-cross-section setting. Combines an outcome regression
model with an inverse-probability weighting model and is
consistent if either model is correctly specified. Hard-errors if
DRDID is not installed.
morie_did_doubly_robust(data, outcome, treatment, post, covariates, ps_model = "logistic", or_model = "linear", cluster = NULL, n_bootstrap = 200L, seed = 42L, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Character vector of covariate column names (required; the usage gives no default). |
ps_model |
Unused; retained for back-compat. DRDID fits a logistic propensity-score model internally. |
or_model |
Unused; retained for back-compat. DRDID fits a linear outcome model internally. |
cluster |
Optional cluster ID column for CR1 standard errors. |
n_bootstrap |
Number of bootstrap replications (forwarded as
|
seed |
RNG seed (set before the call). |
alpha |
Significance level for confidence intervals (default 0.05). |
For panel data (same units observed in both periods) prefer
DRDID::drdid_panel directly.
A result list; see morie_did_2x2.
Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101–122.
Thin wrapper around fixest::feols with fixest::i()
relative-time dummies, plus unit and time fixed effects. The
reference_period is dropped as the baseline. Hard-errors if
fixest is not installed.
morie_did_event_study(data, outcome, unit, time, treatment_time, covariates = NULL, reference_period = -1L, leads = 4L, lags = 4L, cluster = NULL, alpha = 0.05)
data |
Panel data frame. |
outcome |
Outcome column. |
unit |
Unit identifier column. |
time |
Calendar-time column (integer-valued). |
treatment_time |
Column giving the period in which each unit
first received treatment ( |
covariates |
Optional time-varying covariates. |
reference_period |
Relative-time period omitted as baseline
(default |
leads |
Number of pre-treatment periods to include. |
lags |
Number of post-treatment periods to include. |
cluster |
Cluster variable for standard errors (defaults to
|
alpha |
Significance level. |
For Sun-Abraham interaction-weighted estimation prefer
fixest::sunab() directly.
A list with coefficients (data frame),
reference_period, pre_trend_f_stat,
pre_trend_p_value, and details.
Uses $Z_i \times \mathrm{Post}_t$ (assignment by post) as an instrument for
$D_i \times \mathrm{Post}_t$ (takeup by post) to recover a local average treatment
effect under imperfect compliance.
morie_did_fuzzy(data, outcome, assignment, takeup, post, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
assignment |
Intent-to-treat assignment column. |
takeup |
Actual treatment-takeup column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
For the de Chaisemartin-D'Haultfoeuille fuzzy DiD estimator on
panel data prefer morie_did_chaisemartin_dhaultfoeuille
(DIDmultiplegt).
A result list; see morie_did_2x2.
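In the just-identified case without covariates, the instrumented DiD reduces to a Wald ratio: the reduced-form DiD on the outcome divided by the first-stage DiD on take-up. The base-R sketch below illustrates that ratio on simulated data; variable names and the data-generating process are assumptions, and no standard errors are computed.

```r
# Base-R sketch of a fuzzy (instrumented) DiD as a Wald ratio:
# reduced-form DiD on the outcome divided by first-stage DiD on take-up.
# Variable names and the data-generating process are illustrative.
set.seed(7)
n <- 20000
assignment <- rep(c(0, 1), each = n / 2)
post       <- rep(c(0, 1), times = n / 2)
# Imperfect compliance: take-up rises by 0.6 for assigned units after onset.
takeup <- rbinom(n, 1, 0.1 + 0.6 * assignment * post)
y <- 1 + 0.4 * assignment + 0.2 * post + 1.5 * takeup + rnorm(n)

reduced_form <- coef(lm(y ~ assignment * post))["assignment:post"]
first_stage  <- coef(lm(takeup ~ assignment * post))["assignment:post"]
late <- unname(reduced_form / first_stage)
late
```

A weak first stage (a denominator near zero) makes this ratio unstable, which is why the first-stage coefficient is worth inspecting before interpreting the estimate.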
Thin wrapper around did::att_gt. For each cohort $g$ and
each post-treatment calendar period $t$, estimates
$ATT(g, t) = E[Y_t(g) - Y_t(0) \mid G = g]$. Hard-errors if did is
not installed.
morie_did_group_time_att(data, outcome, unit, time, treatment_time, covariates = NULL, method = "doubly_robust", control_group = "never_treated", n_bootstrap = 200L, seed = 42L, alpha = 0.05)
data |
Panel data. |
outcome |
Outcome column. |
unit |
Unit identifier. |
time |
Calendar-time column (integer). |
treatment_time |
Column with treatment-onset period (use
|
covariates |
Optional covariates for doubly-robust estimation. |
method |
One of |
control_group |
|
n_bootstrap |
Number of bootstrap replications for inference
(forwarded as |
seed |
RNG seed (unused; retained for back-compat). |
alpha |
Significance level. |
A data frame with columns cohort, time,
att, std_error, ci_lower, ci_upper,
p_value.
Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.
Splits the sample by quantiles (or categories) of a moderator and estimates separate 2x2 DiDs.
morie_did_heterogeneous(data, outcome, treatment, post, moderator, covariates = NULL, cluster = NULL, n_quantiles = 4L, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
moderator |
Column to split on. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
n_quantiles |
Number of quantile bins if the moderator is continuous. |
alpha |
Significance level for confidence intervals (default 0.05). |
A data frame with one row per stratum.
Thin wrapper around fixest::feols estimating
$Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + X_{it}'\gamma + \varepsilon_{it}$
with cluster-robust standard errors. Hard-errors if fixest is
not installed.
morie_did_panel_fe(data, outcome, treatment, unit, time, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
unit |
Unit identifier column. |
time |
Time period column. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A result list; see morie_did_2x2.
Group-by-time outcome means for parallel-trends visualisation
morie_did_parallel_trends_data(data, outcome, treatment, time, weights = NULL)
data |
A data frame. |
outcome, treatment, time
|
Column names. |
weights |
Optional survey weight column. |
A data frame with columns time, group,
mean_outcome, se, n.
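A table of this shape can be approximated in base R with `aggregate()`. The sketch below is unweighted (the `weights` argument is not modelled) and runs on simulated data; it only mirrors the column names listed above.

```r
# Base-R sketch: group-by-time outcome means, standard errors and counts.
# Column names mirror the return value described above; the data are simulated.
set.seed(3)
df <- data.frame(
  time  = rep(1:4, each = 50),
  group = rep(c(0, 1), times = 100)
)
df$y <- 0.2 * df$time + 0.5 * df$group + rnorm(nrow(df))

trend <- aggregate(
  y ~ time + group, data = df,
  FUN = function(v) c(mean_outcome = mean(v), se = sd(v) / sqrt(length(v)), n = length(v))
)
trend <- do.call(data.frame, trend)          # flatten the matrix column
names(trend) <- c("time", "group", "mean_outcome", "se", "n")
trend
```

Plotting `mean_outcome` against `time`, one line per `group`, gives the usual visual check of pre-treatment trends.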
Placebo DiD on sub-groups expected to be unaffected
morie_did_placebo_test_group(data, outcome, treatment, post, group_col, unaffected_groups, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
Data frame. |
outcome, treatment, post
|
Column names. |
group_col |
Column defining sub-groups. |
unaffected_groups |
Vector of group values where no effect is expected. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A data frame, one row per placebo group.
Placebo DiD on outcomes that should be unaffected
morie_did_placebo_test_outcome(data, placebo_outcomes, treatment, post, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
Data frame. |
placebo_outcomes |
Character vector of outcome columns expected to show no treatment effect. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A data frame, one row per placebo outcome.
For each candidate fake time in placebo_times, redefines the
post indicator and estimates a 2x2 DiD on pre-true-treatment data.
morie_did_placebo_test_time(data, outcome, treatment, time, true_treatment_time, placebo_times, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
Data frame. |
outcome, treatment, time
|
Column names. |
true_treatment_time |
The actual treatment-onset time (data are restricted to pre-period observations). |
placebo_times |
Vector of candidate fake treatment times. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A data frame, one row per placebo time.
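The placebo-in-time logic (restrict to pre-treatment data, redefine the post indicator at a fake onset, re-estimate the 2x2 DiD) can be sketched in base R. `placebo_did()` and all column names are hypothetical, and classical rather than clustered standard errors are used.

```r
# Base-R sketch of a placebo-in-time DiD: restrict to pre-treatment periods,
# pretend treatment began at a fake time, and re-estimate the 2x2 DiD.
# placebo_did() and all column names are hypothetical.
placebo_did <- function(df, true_time, fake_time) {
  pre <- df[df$time < true_time, ]
  pre$fake_post <- as.numeric(pre$time >= fake_time)
  fit <- lm(y ~ treat * fake_post, data = pre)
  summary(fit)$coefficients["treat:fake_post", c("Estimate", "Pr(>|t|)")]
}

set.seed(11)
df <- expand.grid(unit = 1:100, time = 1:8)
df$treat <- as.numeric(df$unit <= 50)
# True effect of 1 starts at time 6; trends are parallel before that.
df$y <- 0.1 * df$time + 0.5 * df$treat +
  1 * df$treat * (df$time >= 6) + rnorm(nrow(df))

res <- t(sapply(c(3, 4, 5), function(k) placebo_did(df, true_time = 6, fake_time = k)))
rownames(res) <- paste0("fake_time_", c(3, 4, 5))
res
```

Under parallel pre-trends the placebo estimates should scatter around zero; a systematic non-zero pattern is a warning sign.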
Same specification as morie_did_2x2 but accepts a survey
weight column. When weights is supplied, weighted least
squares is used.
morie_did_repeated_cross_section(data, outcome, treatment, post, covariates = NULL, weights = NULL, cluster = NULL, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
weights |
Optional column of (sampling / survey) weights. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A list of class results; see morie_did_2x2.
For each $\delta$ in delta_range, computes a bias-adjusted confidence
set under a bound, indexed by $\delta$, on the violation of parallel trends
(Rambachan & Roth, 2023, conservative version).
morie_did_sensitivity_analysis(data, outcome, treatment, post, covariates = NULL, delta_range = NULL, cluster = NULL, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
covariates |
Optional character vector of covariate column names. |
delta_range |
Numeric vector of |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
For the full Rambachan-Roth fixed-length-confidence-interval (FLCI)
procedure with event-time pre-trends prefer
HonestDiD::createSensitivityResults_relativeMagnitudes on
an event-study coefficient vector.
A data frame with columns delta, ci_lower,
ci_upper, covers_zero.
Rambachan, A., & Roth, J. (2023). A more credible approach to parallel trends. Review of Economic Studies, 90(5), 2555–2591.
Convenience wrapper around morie_did_group_time_att and
morie_did_aggregate_gt_att. For the canonical CRAN
aggregator interface see did::aggte.
morie_did_staggered(data, outcome, unit, time, treatment_time, covariates = NULL, n_bootstrap = 200L, seed = 42L, alpha = 0.05)
data |
Panel data. |
outcome |
Outcome column. |
unit |
Unit identifier. |
time |
Calendar-time column (integer). |
treatment_time |
Column with treatment-onset period (use
|
covariates |
Optional covariates for doubly-robust estimation. |
n_bootstrap |
Number of bootstrap replications for inference
(forwarded as |
seed |
RNG seed (unused; retained for back-compat). |
alpha |
Significance level. |
A list with group_time, overall, by_cohort,
by_event_time.
synthdid::synthdid_estimate (explicit-name API)
Parallel to morie_did_synthetic, this is the
explicit-name wrapper that surfaces the full synthdid
estimator and its placebo / jackknife variance pieces. Use this
when you want to pass through additional synthdid arguments
or inspect the unit / time weights side-by-side; use
morie_did_synthetic when you want the rmorie result-list
shape consumed by morie_did_* downstream code.
morie_did_synthdid_estimate(panel, unit, time, treatment, outcome, vcov_method = "placebo", ...)
panel |
Long-format balanced panel. |
unit |
Unit identifier column. |
time |
Time period column. |
treatment |
Binary treatment indicator that turns on at onset for treated units and is zero everywhere for controls (the synthdid W convention). |
outcome |
Outcome column. |
vcov_method |
Variance estimator passed to
|
... |
Additional arguments forwarded to
|
Wrapper-as-extender: rmorie already wraps synthdid once via
morie_did_synthetic; this entry point gives MRM / paper
callers the canonical Arkhangelsky et al. (2021) API with a
morie_* name so they don't need to load synthdid
directly.
An S3 list of class morie_did_synthdid_result with
elements att, std_error, vcov_method,
n_treated, n_control, n_pre,
n_post, method, and raw (the full
synthdid_estimate object).
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic difference-in-differences. American Economic Review, 111(12), 4088–4118.
Thin wrapper around synthdid::synthdid_estimate. Requires
the synthdid package
(remotes::install_github("synth-inference/synthdid"));
synthdid is not on CRAN and has no comparable alternative.
morie_did_synthetic(data, outcome, unit, time, treatment_time, treated_units = NULL, zeta = NULL, n_bootstrap = 200L, seed = 42L, alpha = 0.05)
data |
Balanced panel. |
outcome, unit, time, treatment_time
|
Column names. |
treated_units |
Optional explicit list of treated unit IDs. |
zeta |
Optional regularisation parameter (auto-selected if NULL). |
n_bootstrap |
Bootstrap replications for placebo SE. |
seed |
RNG seed. |
alpha |
Significance level. |
A result list; see morie_did_2x2.
Arkhangelsky, D., et al. (2021). Synthetic difference-in-differences. American Economic Review, 111(12), 4088–4118.
Regresses the outcome on group-by-time interactions in the pre-period and reports both per-period coefficients and a joint Wald (chi-square) test that they are all zero.
morie_did_test_parallel_trends(data, outcome, treatment, time, unit = NULL, cluster = NULL, pre_periods = NULL)
data |
A data frame. |
outcome |
Outcome column name. |
treatment |
Binary treatment-group indicator. |
time |
Time column (integer-valued). |
unit |
Optional unit identifier (currently unused; reserved). |
cluster |
Cluster variable for robust SE. |
pre_periods |
Optional explicit list of pre-treatment times. |
For the Callaway-Sant'Anna pre-test on the group-time ATTs prefer
did::conditional_did_pretest.
A list with coefficients, joint_chi2 (and
its alias joint_f_stat), joint_df,
joint_p_value, parallel_trends_plausible.
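A joint Wald (chi-square) test on pre-period group-by-time interactions can be sketched in base R as follows. This is a simplified illustration on simulated data: it uses the classical covariance matrix from `vcov()` rather than a cluster-robust one, and the object names only echo the return elements listed above.

```r
# Base-R sketch of a pre-period parallel-trends test: regress the outcome on
# group-by-time interactions and Wald-test that all interactions are zero.
# Uses the classical (non-clustered) covariance matrix for brevity.
set.seed(5)
df <- expand.grid(unit = 1:200, time = 1:4)     # pre-treatment periods only
df$treat <- as.numeric(df$unit <= 100)
df$y <- 0.3 * df$time + 0.5 * df$treat + rnorm(nrow(df))  # parallel by design

fit <- lm(y ~ treat * factor(time), data = df)
idx <- grep("^treat:factor\\(time\\)", names(coef(fit)))
b <- coef(fit)[idx]
V <- vcov(fit)[idx, idx]

joint_chi2 <- as.numeric(t(b) %*% solve(V) %*% b)
joint_df <- length(b)
joint_p_value <- pchisq(joint_chi2, df = joint_df, lower.tail = FALSE)
c(chi2 = joint_chi2, df = joint_df, p = joint_p_value)
```

A large p-value here is consistent with parallel pre-trends but does not prove them; the test can have low power with few pre-periods.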
Adds a third differencing dimension to the standard DiD specification.
morie_did_triple_difference(data, outcome, treatment, post, third_diff, covariates = NULL, cluster = NULL, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
third_diff |
Binary variable defining the additional differencing group. |
covariates |
Optional character vector of covariate column names. |
cluster |
Optional cluster ID column for CR1 standard errors. |
alpha |
Significance level for confidence intervals (default 0.05). |
A result list; see morie_did_2x2.
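In the saturated case without covariates, the triple difference is the coefficient on the three-way interaction, which equals the DiD in one level of the third dimension minus the DiD in the other. A base-R sketch on simulated data (all names are illustrative):

```r
# Base-R sketch of a triple difference (DDD): the coefficient on the
# three-way interaction is the DDD estimate. Names are illustrative.
set.seed(9)
df <- expand.grid(rep = 1:100, d = 0:1, post = 0:1, g = 0:1)
df$y <- 1 + 0.4 * df$d + 0.3 * df$post + 0.2 * df$g +
  1.0 * df$d * df$post * df$g + rnorm(nrow(df))

fit <- lm(y ~ d * post * g, data = df)
ddd_reg <- unname(coef(fit)["d:post:g"])

# Equivalent form: DiD within g = 1 minus DiD within g = 0.
did_in <- function(s) {
  m <- tapply(s$y, list(s$d, s$post), mean)
  (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
}
ddd_means <- did_in(df[df$g == 1, ]) - did_in(df[df$g == 0, ])
c(regression = ddd_reg, means = ddd_means)
```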
Thin interface to TwoWayFEWeights::twowayfeweights: returns the
decomposition of the two-way fixed-effects DiD estimand into the
weighted average of the unit-time ATEs. Use
this to quantify how many of the implicit comparisons receive
negative weight, which is the canonical diagnostic for whether a
TWFE specification can be interpreted as a convex combination of
treatment effects.
morie_did_twoway_fe_weights(panel, group, time, treatment, outcome = NULL, type = "feTR", ...)
panel |
A long-format balanced (or near-balanced) panel
|
group |
Name of the unit / group identifier column. |
time |
Name of the time period column. |
treatment |
Name of the binary or continuous treatment column. |
outcome |
Optional outcome column. When supplied,
TwoWayFEWeights computes the weights AND the implied TWFE
coefficient; when |
type |
Weight type passed through to
|
... |
Additional arguments forwarded to
|
Wrapper-as-extender: morie_did_panel_fe already estimates the
TWFE coefficient; this function exposes the diagnostic side of the
same backend so that downstream MRM analyses can flag heterogeneous-
treatment-effects bias without leaving the rmorie API.
An S3 list of class morie_did_twfe_diagnostics with
elements n_negative_weights, sum_weights,
sum_negative_weights, share_negative_weights,
method, and raw (the full
twowayfeweights object).
de Chaisemartin, C., & D'Haultfoeuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9), 2964–2996.
morie_did_panel_fe,
morie_did_chaisemartin_dhaultfoeuille.
Recommended when the number of clusters is small (< 50). Uses a
base-R Rademacher / Webb wild-cluster-bootstrap implementation.
Earlier rmorie versions also delegated to
fwildclusterboot::boottest when installed; that branch was
dropped in 0.9.5.12 because fwildclusterboot is GitHub-only and
transitively requires summclust, also GitHub-only, which made the
CI dependency resolver unreliable. Callers who want
fwildclusterboot should call it directly on a feols /
lm fit.
morie_did_wild_cluster_bootstrap(data, outcome, treatment, post, cluster, covariates = NULL, n_bootstrap = 999L, weight_type = "rademacher", seed = 42L, alpha = 0.05)
data |
A data frame containing the outcome, treatment, post and any covariate columns. |
outcome |
Name of the outcome column. |
treatment |
Name of the binary (0/1) treatment-group column. |
post |
Name of the binary (0/1) post-period column. |
cluster |
Cluster ID column defining the bootstrap clusters (required; the usage gives no default). |
covariates |
Optional character vector of covariate column names. |
n_bootstrap |
Number of bootstrap replications. |
weight_type |
|
seed |
RNG seed. |
alpha |
Significance level for confidence intervals (default 0.05). |
A result list; see morie_did_2x2. p_value
is the bootstrap p-value.
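To make the procedure concrete, here is a base-R sketch of a null-imposed wild cluster bootstrap-t with Rademacher weights for the DiD interaction. It is an illustration under stated assumptions, not rmorie's implementation: the helper `cluster_t()`, the simulated data, and the use of an uncorrected (CR0) sandwich are all choices made for this sketch, and Webb weights are not shown.

```r
# Base-R sketch of a null-imposed wild cluster bootstrap-t with Rademacher
# weights for the DiD interaction. Function and variable names are
# hypothetical; this is not rmorie's implementation.
cluster_t <- function(y, X, cl, k) {
  XtX_inv <- solve(crossprod(X))
  beta <- XtX_inv %*% crossprod(X, y)
  u <- as.vector(y - X %*% beta)
  scores <- rowsum(X * u, cl)                       # per-cluster score sums
  V <- XtX_inv %*% crossprod(scores) %*% XtX_inv    # CR0 sandwich
  as.numeric(beta[k] / sqrt(V[k, k]))
}

set.seed(42)
G <- 12; m <- 20
cl    <- rep(seq_len(G), each = m)
treat <- as.numeric(cl <= G / 2)
post  <- rep(rep(c(0, 1), each = m / 2), times = G)
y     <- 0.3 * treat + 0.2 * post + rnorm(G)[cl] + rnorm(G * m)

X <- cbind(1, treat, post, treat * post)
t_obs <- cluster_t(y, X, cl, k = 4)

# Impose the null (no DiD effect): refit without the interaction column.
X0   <- X[, -4]
fit0 <- X0 %*% solve(crossprod(X0), crossprod(X0, y))
u0   <- as.vector(y - fit0)

B <- 499
t_star <- replicate(B, {
  w <- sample(c(-1, 1), G, replace = TRUE)[cl]      # one sign per cluster
  cluster_t(as.vector(fit0) + u0 * w, X, cl, k = 4)
})
p_boot <- mean(abs(t_star) >= abs(t_obs))
p_boot
```

Flipping residual signs by cluster, rather than by observation, preserves the within-cluster dependence that the bootstrap is meant to respect.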
R parity for morie.fn.diffu.diffusion_forward.
morie_diffu_diffusion_forward(x0, t, betas = NULL, num_steps = 1000L, noise = NULL, seed = 0L)
x0 |
Clean sample. |
t |
Diffusion timestep (1.. |
betas |
Optional custom |
num_steps |
Total diffusion steps (default 1000). |
noise |
Pre-generated Gaussian noise. |
seed |
RNG seed. |
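Forward process: $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$, where $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$ and $\epsilon \sim N(0, I)$, with linear $\beta$ schedule from 1e-4 to 0.02.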
Named list (x_t, estimate, noise, alpha_bar, beta, method).
Ho, Jain & Abbeel (2020), NeurIPS.
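The closed-form forward (noising) step of Ho et al. (2020) under a linear beta schedule can be sketched in base R. `forward_sketch()` is a hypothetical name; it mirrors only part of the argument list above and is not the rmorie function.

```r
# Base-R sketch of the DDPM forward (noising) step under a linear beta
# schedule. forward_sketch() is a hypothetical name, not the rmorie function.
forward_sketch <- function(x0, t, num_steps = 1000L, noise = NULL, seed = 0L) {
  stopifnot(t >= 1, t <= num_steps)
  betas <- seq(1e-4, 0.02, length.out = num_steps)
  alpha_bar <- cumprod(1 - betas)
  if (is.null(noise)) {
    set.seed(seed)
    noise <- rnorm(length(x0))
  }
  x_t <- sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * noise
  list(x_t = x_t, noise = noise, alpha_bar = alpha_bar[t], beta = betas[t])
}

x0 <- c(1, -1, 0.5)
early <- forward_sketch(x0, t = 1L)
late  <- forward_sketch(x0, t = 1000L)
c(early$alpha_bar, late$alpha_bar)   # signal share shrinks as t grows
```

Because the cumulative product is precomputed, any timestep can be sampled directly from the clean sample without iterating through earlier steps.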
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Downloads large bootstrap weight CSVs that are too big to ship with the package. Data is cached in the user cache database for future use.
morie_download_bootstrap(survey = "all", limit = 32000L, db_path = NULL, con = NULL)
survey |
One of |
limit |
Max records per CKAN request (default 32000). |
db_path |
Optional path to a SQLite/DuckDB file (default backend). |
con |
Optional pre-opened DBI connection (overrides |
Invisibly, the number of CSV files successfully downloaded.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Thin extender over
dirichletprocess::DirichletProcessGaussian +
dirichletprocess::Fit for a Bayesian nonparametric
Dirichlet-process Gaussian mixture model (Ross & Markwick, 2018;
MacEachern, 1994). Constructs the DP object on y and
then runs the Gibbs sampler for iterations sweeps.
morie_dp_gaussian_mixture(y, iterations = 1000, ...)
y |
Numeric vector of observations to model with the DP Gaussian mixture. |
iterations |
Integer; number of Gibbs-sampler iterations to
run via |
... |
Further arguments forwarded to
|
A list with
$method = "dirichletprocess::DirichletProcessGaussian + Fit"
and $raw (the fitted dirichletprocess object
after the Gibbs run, containing the cluster assignments,
cluster parameters, and concentration-parameter trace).
## Not run:
if (requireNamespace("dirichletprocess", quietly = TRUE)) {
  set.seed(1)
  y <- c(stats::rnorm(50, -2), stats::rnorm(50, 2))
  morie_dp_gaussian_mixture(y, iterations = 200)
}
## End(Not run)
Inverse-rFFT of the PSD recovers the (biased) autocorrelation.
morie_dsp_acf_from_psd(psd)
psd |
PSD vector (one-sided). |
Numeric vector (autocorrelation).
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.3.
Sliding-window mean after trimming alpha fraction of values from
each tail of the sorted window. Robust to both Gaussian and impulsive
noise; reduces to the mean filter when alpha = 0 and to the median
filter as alpha -> 0.5.
morie_dsp_alpha_trimmed_mean(x, window = 5L, alpha = 0.2)
x |
Numeric vector. |
window |
Window length. Default 5. |
alpha |
Trim fraction (0 <= alpha < 0.5). Default 0.2. |
Filtered vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.4.
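The sort-trim-average step can be sketched in base R. `alpha_trimmed_sketch()` is a hypothetical helper; shrinking the window at the edges and trimming `floor(alpha * window_length)` samples per tail are assumptions of this sketch, and rmorie's edge handling may differ.

```r
# Base-R sketch of an alpha-trimmed mean filter. Edge samples use a shrunken
# window, which is an assumption; rmorie's edge handling may differ.
alpha_trimmed_sketch <- function(x, window = 5L, alpha = 0.2) {
  stopifnot(window %% 2 == 1, alpha >= 0, alpha < 0.5)
  half <- (window - 1L) %/% 2L
  n <- length(x)
  vapply(seq_len(n), function(i) {
    w <- sort(x[max(1L, i - half):min(n, i + half)])
    k <- floor(alpha * length(w))            # samples trimmed from each tail
    mean(w[(k + 1L):(length(w) - k)])
  }, numeric(1))
}

x <- c(1, 1, 1, 100, 1, 1, 1)               # single impulse
alpha_trimmed_sketch(x, window = 5L, alpha = 0)    # plain moving average
alpha_trimmed_sketch(x, window = 5L, alpha = 0.4)  # behaves like a median
```

With `alpha = 0` the impulse is smeared across its neighbours, while a larger trim fraction discards it before averaging.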
Equal-width histogram of x with n_bins bins. Returns counts,
bin centres, probabilities, and edges (parallel to numpy.histogram).
morie_dsp_amplitude_histogram(x, n_bins = 50L)
x |
Numeric vector. |
n_bins |
Number of bins. Default 50. |
List with counts, centers, probabilities, edges.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.6.
Polygonal length: sum(sqrt(1 + diff(x)^2)). Curve length under
unit time-step.
morie_dsp_arc_length(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.3.
Band power over [f_low, f_high]
Trapezoid-equivalent rectangular integration of the PSD over a band.
morie_dsp_band_power(psd, freqs, f_low, f_high)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
f_low |
Lower edge (Hz). |
f_high |
Upper edge (Hz). |
Band power (units of PSD x Hz).
Rangayyan & Krishnan (2015), Ch. 6.
cor(x - mean(x), y - mean(y)) with explicit zero-norm guard.
morie_dsp_baseline_correlation(x, y)
x |
Numeric vector. |
y |
Numeric vector. |
Scalar in [-1, 1].
Rangayyan & Krishnan (2015), Ch. 5.
Time-domain centre of energy: sum(t * x^2) / sum(x^2).
morie_dsp_centroidal_time(x, fs = 1)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
Scalar (seconds).
Rangayyan & Krishnan (2015), Ch. 5.
Delegates to signal::coherence when present; otherwise builds a
Welch-style estimator from morie_dsp_psd_welch and a parallel CSD.
morie_dsp_coherence(x, y, fs = 1, nperseg = 256L)
x |
Numeric vector. |
y |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
nperseg |
Segment length. Default 256. |
List with freqs and coh (magnitude squared coherence).
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.9.
Convenience pass-through so detection-side callers can stay within
the dsp_detection namespace.
morie_dsp_coherence_spectrum(x, y, fs = 1, nperseg = 256L)
x |
Numeric vector. |
y |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
nperseg |
Segment length. Default 256. |
Same as morie_dsp_coherence.
Rangayyan & Krishnan (2015), Ch. 4 & Ch. 6.
Successively notches the fundamental and its first n_harmonics
harmonics. Each notch below Nyquist is applied; aliased harmonics
are skipped.
morie_dsp_comb(x, fundamental, fs, n_harmonics = 5L, q = 30)
x |
Numeric vector. |
fundamental |
Fundamental frequency (Hz), e.g. 60 for North American mains. |
fs |
Sampling frequency (Hz). |
n_harmonics |
Number of harmonics to cancel. Default 5. |
q |
Quality factor per notch. Default 30. |
Filtered vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.7.
IFFT(log|X| + j * unwrap(angle(X))). Returns cepstrum and
quefrency indices.
morie_dsp_complex_cepstrum(x)
x |
Numeric vector. |
List with cepstrum and quefrency.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.10; Oppenheim & Schafer (2010).
Multiplies x by exp(-j 2 pi fc t), low-passes via Butterworth
(requires signal), and returns envelope and unwrapped phase.
morie_dsp_complex_demodulation(x, fc, fs = 1)
x |
Numeric vector. |
fc |
Carrier frequency (Hz). |
fs |
Sampling frequency (Hz). Default 1. |
List with envelope and phase, both length(x).
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.8.
sqrt(2) for a pure sine; large values indicate spiky waveforms.
morie_dsp_crest_factor(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.2.
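The sqrt(2) value quoted for a pure sine can be checked with a short base-R sketch. It assumes the usual definition of the crest factor as peak absolute amplitude divided by RMS; the helper name `sketch_crest_factor` is hypothetical and is not part of rmorie.

```r
# Hypothetical helper; assumes crest factor = max|x| / RMS(x).
sketch_crest_factor <- function(x) max(abs(x)) / sqrt(mean(x^2))

tt <- (0:999) / 1000                 # 1 s at 1 kHz
x_sine <- sin(2 * pi * 5 * tt)       # five full periods of a 5 Hz sine
cf_sine <- sketch_crest_factor(x_sine)

x_spike <- c(rep(0, 99), 1)          # a single impulse: spiky waveform
cf_spike <- sketch_crest_factor(x_spike)
```

The sine yields a value close to sqrt(2), while the isolated impulse yields a much larger value, which is the sense in which large crest factors flag spiky waveforms.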
Centres both inputs (subtracts the mean) and divides by the
geometric mean of their L2 norms. Returns lags -max_lag .. +max_lag.
morie_dsp_cross_correlation(x, y, max_lag = NULL)
x |
Numeric vector. |
y |
Numeric vector, same length as |
max_lag |
Maximum lag (defaults to |
Numeric vector of length 2 * max_lag + 1.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.4.
Hamming-windowed averaged CSD. Delegates to signal::cpsd if
present; otherwise computed from the same FFT loop as the coherence
fallback.
morie_dsp_csd(x, y, fs = 1, nperseg = 256L)
x |
Numeric vector. |
y |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
nperseg |
Segment length. Default 256. |
List with freqs (Hz) and csd (complex).
Rangayyan & Krishnan (2015), Ch. 4 & Ch. 6.
sd(x) / |mean(x)|. Inf when the mean is zero.
morie_dsp_cv(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 3.
Locates samples where the first derivative crosses zero from
positive to non-positive AND the prior slope magnitude exceeds
threshold_factor * max(|dx|).
morie_dsp_derivative_detect(x, fs = 1, threshold_factor = 0.5)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
threshold_factor |
Slope threshold fraction. Default 0.5. |
Integer vector of peak indices.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.3.
Finds prominent minima in the second derivative outside the
systolic onset window. Requires signal::findpeaks.
morie_dsp_dicrotic_notch(pulse, fs = 125)
pulse |
Numeric pulse vector. |
fs |
Sampling frequency (Hz). Default 125. |
Integer vector of notch indices.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.8.
Averages a (n_segments, n_samples) matrix down its rows. Standard
synchronous-averaging recipe for repeated stimulus responses (e.g.
evoked potentials).
morie_dsp_ensemble_average(segments)
segments |
Numeric matrix (rows = trials, cols = samples). |
Mean across trials (column means) as a length-ncol vector.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.3.
-sum(p log2 p) over the histogram probabilities.
morie_dsp_entropy_histogram(x, n_bins = 50L)
x |
Numeric vector. |
n_bins |
Number of bins. Default 50. |
Scalar (bits).
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.6.
Splits x into its even (symmetric) and odd (anti-symmetric)
components about the centre: x_even = (x + rev(x))/2,
x_odd = (x - rev(x))/2.
morie_dsp_even_odd(x)
x |
Numeric vector. |
List with even and odd.
Rangayyan & Krishnan (2015), Ch. 3.
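The decomposition formulas above can be written directly in base R. This is a sketch of the stated formulas only, not the package code; the variable names are illustrative.

```r
x <- c(1, 4, 2, 7, 3)

# Even (symmetric) and odd (anti-symmetric) parts about the centre.
x_even <- (x + rev(x)) / 2
x_odd  <- (x - rev(x)) / 2

# The two parts sum back to the original signal.
x_rebuilt <- x_even + x_odd
```

By construction `x_even` is unchanged by reversal and `x_odd` changes sign under reversal.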
Generates a length-N fBm via spectral shaping of white noise
with |f|^{-(H + 0.5)}, then cumulative integration. H = 0.5
recovers ordinary Brownian motion.
morie_dsp_fbm_synthesis(N, H = 0.5)
N |
Length. |
H |
Hurst exponent in (0, 1). Default 0.5. |
Numeric vector, length N.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.8; Mandelbrot & Van Ness (1968).
1.11 for a pure sine; deviations diagnose waveshape changes.
morie_dsp_form_factor(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.2.
Fits log10(psd) ~ log10(f) on positive bins; with slope -beta,
returns the 1/f fractal dimension (5 - beta) / 2. Falls back to
1.5 (Brownian) when fewer than two valid bins exist.
morie_dsp_fractal_dim_psd(psd, freqs)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
Scalar fractal dimension.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.8; Eke et al. (2002).
Convolves x with a normalised Hann (raised-cosine) window of
length window. Less ringing than the boxcar.
morie_dsp_hann_filter(x, window = 5L)
x |
Numeric vector. |
window |
Window length. Default 5. |
Smoothed vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3.
Slope of log(L(k)) vs. log(1/k) over k = 1..kmax curve-length
scales. Returns a value in approximately [1, 2] for real signals.
morie_dsp_higuchi_fd(x, kmax = 10L)
x |
Numeric vector. |
kmax |
Maximum scale. Default 10. |
Scalar fractal dimension.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.7; Higuchi (1988).
Analytic-signal magnitude via the Hilbert transform. Delegates to
signal::hilbert when available.
morie_dsp_hilbert_envelope(x)
x |
Numeric vector. |
Numeric vector, length(x).
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.6.
All three Hjorth parameters
morie_dsp_hjorth(x)
x |
Numeric vector. |
Named list: activity, mobility, complexity.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.5; Hjorth (1970).
First Hjorth parameter: signal variance.
morie_dsp_hjorth_activity(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.5; Hjorth (1970).
mobility(diff(x)) / mobility(x); bandwidth-like descriptor (1
for a pure sinusoid).
morie_dsp_hjorth_complexity(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.5.
sqrt(var(diff(x)) / var(x)); proportional to mean frequency.
morie_dsp_hjorth_mobility(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.5.
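The three Hjorth descriptors can be sketched together from the formulas given in the entries above (activity as variance, mobility as `sqrt(var(diff(x)) / var(x))`, complexity as a ratio of mobilities). The helper name `sketch_hjorth` is hypothetical, and the use of the sample variance from `stats::var` is an assumption about the normalisation.

```r
# Hypothetical helper built from the documented formulas; not rmorie code.
sketch_hjorth <- function(x) {
  dx  <- diff(x)
  ddx <- diff(dx)
  activity    <- stats::var(x)
  mobility    <- sqrt(stats::var(dx) / activity)
  mobility_dx <- sqrt(stats::var(ddx) / stats::var(dx))
  list(activity = activity,
       mobility = mobility,
       complexity = mobility_dx / mobility)
}

tt <- (0:999) / 1000
h <- sketch_hjorth(sin(2 * pi * 5 * tt))   # pure 5 Hz sinusoid
```

For the pure sinusoid the complexity comes out close to 1, as noted in the complexity entry; the small deviation comes from finite-length edge effects.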
Log -> rFFT high-pass at cutoff Hz -> exp. Reduces multiplicative
baseline drift in non-negative envelopes.
morie_dsp_homomorphic(x, cutoff = 0.1, fs = 1)
x |
Numeric vector. |
cutoff |
Cutoff frequency (Hz). Default 0.1. |
fs |
Sampling frequency (Hz). Default 1. |
Numeric vector, length(x).
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.9; Oppenheim & Schafer (2010).
60 / rr, with zero-RR turned into NA.
morie_dsp_hr_from_rr(rr_intervals)
rr_intervals |
Numeric vector (seconds). |
Numeric vector of BPM values.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.8.
Integrated EMG (sum of absolute values)
morie_dsp_integrated_emg(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.3.
Coarse box-counting on the amplitude-normalised signal across
n_scales log-spaced box widths; returns the slope of
log N(s) vs. log(1/s).
morie_dsp_katz_fd(x, n_scales = 10L)
x |
Numeric vector. |
n_scales |
Number of box sizes. Default 10. |
Scalar fractal dimension.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.7; Katz (1988).
Least-mean-squares adaptive transversal filter. Returns the filter
output y and instantaneous error e = d - y. Coefficient update:
w <- w + 2 * mu * e[i] * x_seg.
morie_dsp_lms(x, d, order = 16L, mu = 0.01)
x |
Input (reference) vector. |
d |
Desired vector, same length as |
order |
Filter order (taps). Default 16. |
mu |
Step size. Default 0.01. |
List with elements y and e, both length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.6; Widrow & Stearns (1985).
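The coefficient update above can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the rmorie implementation: the helper name `sketch_lms` is hypothetical, the tap vector starts at zero, and the input is zero-padded before the first sample.

```r
# Hypothetical helper; assumptions: zero initial weights, zero-padded start.
sketch_lms <- function(x, d, order = 16L, mu = 0.01) {
  n <- length(x)
  w <- numeric(order)
  y <- numeric(n)
  e <- numeric(n)
  for (i in seq_len(n)) {
    idx <- i - seq_len(order) + 1L                  # newest sample first
    x_seg <- ifelse(idx >= 1L, x[pmax(idx, 1L)], 0)
    y[i] <- sum(w * x_seg)                          # filter output
    e[i] <- d[i] - y[i]                             # instantaneous error
    w <- w + 2 * mu * e[i] * x_seg                  # LMS update
  }
  list(y = y, e = e, w = w)
}

# Noise-free system identification: d is x filtered by known taps h_true.
set.seed(1)
x <- rnorm(2000)
h_true <- c(0.5, -0.3)
d <- h_true[1] * x + h_true[2] * c(0, x[-length(x)])
fit <- sketch_lms(x, d, order = 2L, mu = 0.01)
```

In this noise-free setting the learned weights `fit$w` approach `h_true` and the error `fit$e` decays towards zero; the sketch also returns `w`, which the documented return value does not list.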
Optimal linear filter against white Gaussian noise: convolves x
with the time-reversed template, normalised by ||template||.
morie_dsp_matched(x, template)
x |
Numeric vector. |
template |
Reference waveform. |
Matched-filter output, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.5.
Mean absolute value
morie_dsp_mean_abs(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.3.
m1 / m0; equals the first moment normalised by total power.
morie_dsp_mean_frequency(psd, freqs)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
Scalar mean frequency (Hz).
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.6.
Sliding-window median. Robust to impulsive (salt-and-pepper) noise.
morie_dsp_median_filter(x, kernel_size = 5L)
x |
Numeric vector. |
kernel_size |
Odd positive integer kernel length. Default 5. |
Filtered vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.4.
Frequency at which the cumulative spectrum reaches half the total power. Robust to high-frequency outliers vs. the mean frequency.
morie_dsp_median_frequency(psd, freqs)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
Scalar (Hz).
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.6.
Folds the real cepstrum to the causal half then re-exponentiates
to produce a minimum-phase sequence with the same magnitude
spectrum as x (approximate).
morie_dsp_min_phase(x)
x |
Numeric vector. |
Numeric vector, length(x).
Rangayyan & Krishnan (2015), Ch. 5; Oppenheim & Schafer (2010).
Length-window boxcar convolution in "same" mode (output length
equals input length, edges biased by zero-padding).
morie_dsp_moving_average(x, window = 5L)
x |
Numeric vector. |
window |
Positive integer kernel length. Default 5. |
Numeric vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.3.
Fraction of samples with |x| > threshold (defaults to 2 * sd(x)).
morie_dsp_myopulse_rate(x, threshold = NULL)
x |
Numeric vector. |
threshold |
Optional threshold. |
Scalar in [0, 1].
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.4.
Normalised LMS: divides the step by the instantaneous input power
x_seg' x_seg + eps, giving robust convergence over a wider range
of input scales.
morie_dsp_nlms(x, d, order = 16L, mu = 0.5, eps = 1e-08)
x |
Input (reference) vector. |
d |
Desired vector, same length as |
order |
Filter order (taps). Default 16. |
mu |
Normalised step size (typically 0 < mu < 2). Default 0.5. |
eps |
Power-floor for division. Default 1e-8. |
List with y, e.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.6.
Applies a signal::butter-style IIR notch via the signal package's
iirnotch / filtfilt. Stops with a NotYetPorted error if
signal is unavailable.
morie_dsp_notch(x, freq, fs, q = 30)
x |
Numeric vector. |
freq |
Notch centre frequency (Hz). |
fs |
Sampling frequency (Hz). |
q |
Quality factor. Default 30. |
Filtered vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.7.
Smooths x^2 with an energy_window_ms boxcar and flags samples
where the smoothed energy crosses threshold_factor * median(energy).
Hysteresis: returns to "off" only when energy drops below baseline.
morie_dsp_onset_detect(x, fs, energy_window_ms = 20, threshold_factor = 3)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). |
energy_window_ms |
Smoothing window (ms). Default 20. |
threshold_factor |
Multiplier on baseline median. Default 3. |
Integer vector of onset indices.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.5.
Bandpass (5-15 Hz) -> differentiate -> square -> moving-window
integrate -> adaptive threshold with refractory period. Requires
signal for the Butterworth bandpass.
morie_dsp_pan_tompkins(ecg, fs = 360)
ecg |
ECG vector. |
fs |
Sampling frequency (Hz). Default 360. |
Integer vector of QRS sample indices.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.7; Pan & Tompkins (1985).
Gaussian-kernel KDE on a uniform grid. Bandwidth defaults to
Silverman's 1.06 * sd(x) * n^{-1/5}.
morie_dsp_parzen_pdf(x, bandwidth = NULL, n_points = 100L)
x |
Numeric vector. |
bandwidth |
Optional kernel bandwidth. |
n_points |
Grid size. Default 100. |
List with grid and density.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.6; Parzen (1962); Silverman (1986).
Splits x into n_segments non-overlapping equal segments,
periodograms each, and averages the result. Variance scales as
1 / n_segments at the cost of frequency resolution.
morie_dsp_psd_bartlett(x, fs = 1, n_segments = 8L)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
n_segments |
Number of equal segments. Default 8. |
List with freqs and psd.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.4; Bartlett (1948).
One-sided rFFT-based periodogram. Inner bins are doubled to fold negative-frequency power into the one-sided spectrum.
morie_dsp_psd_periodogram(x, fs = 1)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
List with freqs and psd, both length floor(N/2)+1.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.4.
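The folding of negative-frequency power into the one-sided spectrum can be sketched with `stats::fft`. This is an illustration, not the package code: the helper name `sketch_periodogram` is hypothetical, and the density scaling by `fs * N` is an assumption about the normalisation.

```r
# Hypothetical helper; assumes density scaling |X|^2 / (fs * N).
sketch_periodogram <- function(x, fs = 1) {
  n <- length(x)
  k <- n %/% 2L + 1L                       # floor(N/2) + 1 one-sided bins
  X <- stats::fft(x)[seq_len(k)]
  psd <- Mod(X)^2 / (fs * n)
  # Double the inner bins only: DC is never doubled, and the Nyquist bin
  # (present when N is even) is not doubled either.
  last_inner <- if (n %% 2L == 0L) k - 1L else k
  if (last_inner >= 2L) psd[2:last_inner] <- 2 * psd[2:last_inner]
  list(freqs = (seq_len(k) - 1L) * fs / n, psd = psd)
}

fs <- 128
tt <- (0:255) / fs
x <- sin(2 * pi * 10 * tt)                 # 10 Hz tone, exactly on a bin
pg <- sketch_periodogram(x, fs = fs)
peak_hz <- pg$freqs[which.max(pg$psd)]
```

Under this scaling, summing the one-sided PSD times the bin width `fs / N` recovers the mean-square value of the signal (Parseval), which is a convenient sanity check on the doubling.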
10 * log10(max(psd, 1e-20)).
morie_dsp_psd_to_db(psd)
psd |
PSD vector. |
PSD in dB.
Rangayyan & Krishnan (2015), Ch. 6.
Thin wrapper that prefers the signal package's pwelch-style
routine for the production estimator; falls back to a pure-R
Hamming-windowed averaged periodogram when signal is unavailable.
morie_dsp_psd_welch(x, fs = 1, nperseg = 256L, noverlap = NULL)
x |
Numeric vector. |
fs |
Sampling frequency (Hz). Default 1. |
nperseg |
Segment length. Default 256. |
noverlap |
Overlap in samples. Default |
List with freqs and psd.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.4; Welch (1967).
Returns peak amplitude, duration (samples), absolute area (trapezoid), up/down slopes, and peak index for a single beat.
morie_dsp_qrs_features(beat)
beat |
Numeric vector covering one beat. |
Named list of features.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.9.
Recursive least-squares with forgetting factor lam and initial
inverse-correlation P0 = delta * I. Faster convergence than LMS
at the cost of O(order^2) per sample.
morie_dsp_rls(x, d, order = 16L, lam = 0.99, delta = 100)
x |
Input (reference) vector. |
d |
Desired vector, same length as |
order |
Filter order (taps). Default 16. |
lam |
Forgetting factor in (0, 1]. Default 0.99. |
delta |
Initial P diagonal. Default 100. |
List with y, e.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.6; Haykin (2002).
sqrt(mean(x^2)).
morie_dsp_rms(x)
x |
Numeric vector. |
Scalar RMS.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.2.
Slope-based estimator from polygonal lengths at log-spaced ruler sizes. Equivalent up to sign convention to Higuchi.
morie_dsp_ruler_fd(x, n_rulers = 10L)
x |
Numeric vector. |
n_rulers |
Number of ruler sizes. Default 10. |
Scalar fractal dimension.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.7.
-x^2 * log(x^2) with a small floor to guard log(0). Amplifies
moderate-energy components, useful as a preprocessor for heart-sound
segmentation.
morie_dsp_shannon_energy(x)
x |
Numeric vector. |
Numeric vector, length(x).
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.5; Liang et al. (1997).
Dimensionless waveshape descriptor: E|x| / (E sqrt|x|)^2.
morie_dsp_shape_factor(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.2.
Sign changes in diff(x) where the absolute next slope exceeds
threshold. Hudgins TD feature.
morie_dsp_slope_sign_changes(x, threshold = 0)
x |
Numeric vector. |
threshold |
Magnitude threshold. Default 0. |
Integer count.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.4.
10 * log10(mean(signal^2) / mean(noise^2)). Returns Inf when
noise power is zero.
morie_dsp_snr(signal, noise)
signal |
Numeric vector. |
noise |
Numeric vector. |
SNR in dB.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.2.
Compares SNR of (clean vs. clean-noisy) before filtering with SNR of (clean vs. clean-filtered) after; positive values mean the filter reduced noise relative to clean.
morie_dsp_snr_improvement(x_noisy, x_clean, x_filtered)
x_noisy |
Observed noisy vector. |
x_clean |
Clean reference. |
x_filtered |
Filter output. |
Delta SNR in dB.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.2.
Frequency below which pct fraction of total power lies. SEF95
(pct = 0.95) is a classical EEG depth-of-anaesthesia marker.
morie_dsp_spectral_edge(psd, freqs, pct = 0.95)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
pct |
Cumulative fraction in (0, 1]. Default 0.95. |
Scalar (Hz).
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.6.
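A cumulative-sum sketch of the spectral edge definition above. The helper name `sketch_spectral_edge` is hypothetical, and returning the first frequency bin whose cumulative fraction reaches `pct` (with no interpolation between bins) is an assumption.

```r
# Hypothetical helper; assumes the first bin reaching pct is returned,
# with no interpolation between bins.
sketch_spectral_edge <- function(psd, freqs, pct = 0.95) {
  cum_frac <- cumsum(psd) / sum(psd)
  freqs[which(cum_frac >= pct)[1L]]
}

freqs <- 1:100
psd_flat <- rep(1, 100)                    # flat spectrum over 1..100 Hz
sef95 <- sketch_spectral_edge(psd_flat, freqs, pct = 0.95)
sef50 <- sketch_spectral_edge(psd_flat, freqs, pct = 0.50)
```

With `pct = 0.5` the same computation gives the median frequency described earlier, so the two descriptors differ only in the chosen fraction.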
Normalises PSD to a probability mass function and returns its Shannon entropy in bits.
morie_dsp_spectral_entropy(psd)
psd |
PSD vector. |
Scalar in [0, log2(length(psd))].
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.7; Inouye et al. (1991).
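The bounds on the returned value can be illustrated with a short sketch: normalise the PSD to sum to one, then take the Shannon entropy in bits. The helper name `sketch_spectral_entropy` is hypothetical, and dropping zero-probability bins (treating 0 * log2(0) as 0) is an assumption.

```r
# Hypothetical helper; zero bins are dropped so that 0 * log2(0) counts as 0.
sketch_spectral_entropy <- function(psd) {
  p <- psd / sum(psd)
  p <- p[p > 0]
  -sum(p * log2(p))
}

h_flat  <- sketch_spectral_entropy(rep(1, 8))        # white: maximum entropy
h_tonal <- sketch_spectral_entropy(c(0, 0, 1, 0))    # single tone: minimum
```

A flat spectrum of length 8 reaches the upper bound log2(8) = 3 bits, and a spectrum concentrated in one bin gives 0.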
Geometric-to-arithmetic mean ratio of the positive PSD bins; values near 1 indicate a white spectrum, near 0 indicate tonal concentration.
morie_dsp_spectral_flatness(psd)
psd |
PSD vector. |
Scalar in [0, 1].
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.7.
Standardised fourth central moment of frequency under the PSD treated as a probability mass.
morie_dsp_spectral_kurtosis(psd, freqs)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 6.
m_k = sum(freqs^k * psd) * df. Used to derive mean, median, edge,
and higher-order frequency descriptors.
morie_dsp_spectral_moment(psd, freqs, order = 0L)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
order |
Moment order. Default 0. |
Scalar moment.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.6.
Returns bandpower(psd, band1) / bandpower(psd, band2).
morie_dsp_spectral_ratio(psd, freqs, band1, band2)
psd |
PSD vector. |
freqs |
Matching frequency vector. |
band1 |
Length-2 numeric (low, high) Hz. |
band2 |
Length-2 numeric (low, high) Hz. |
Scalar ratio.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.6.
Extracts length-window epochs centred on each trigger_indices
value and averages them. Out-of-bounds triggers are dropped.
morie_dsp_synchronized_average(x, trigger_indices, window = 100L)
x |
Numeric vector. |
trigger_indices |
Integer vector of 1-based event indices. |
window |
Epoch length (centred). Default 100. |
Mean epoch, length window.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.3.
For each QRS index, searches [loc + 0.2 * fs, loc + 0.5 * fs] for
the absolute maximum and records its global index.
morie_dsp_t_wave(ecg, qrs_locs, fs = 360)
ecg |
ECG vector. |
qrs_locs |
QRS indices (1-based) from a detector like
|
fs |
Sampling frequency (Hz). Default 360. |
Integer vector of T-peak indices.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.8.
psi[n] = x[n]^2 - x[n-1] * x[n+1]; sensitive to instantaneous
amplitude AND frequency.
morie_dsp_teager_energy(x)
x |
Numeric vector. |
Numeric vector, length(x); ends are zero.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.5; Kaiser (1990).
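The operator above can be sketched in vectorised base R. The helper name `sketch_teager` is hypothetical; the zero end samples follow the documented return value. For a sampled sinusoid `A * cos(w * n + phi)` the operator evaluates to the constant `A^2 * sin(w)^2`, which shows the joint dependence on amplitude and frequency.

```r
# Hypothetical helper implementing psi[n] = x[n]^2 - x[n-1] * x[n+1],
# with both end samples left at zero.
sketch_teager <- function(x) {
  n <- length(x)
  psi <- numeric(n)
  if (n >= 3L) {
    i <- 2:(n - 1L)
    psi[i] <- x[i]^2 - x[i - 1L] * x[i + 1L]
  }
  psi
}

A <- 2
w <- 0.3                                   # radians per sample
x <- A * cos(w * (0:199) + 0.7)
psi <- sketch_teager(x)
expected <- A^2 * sin(w)^2                 # constant for a pure sinusoid
```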
Slides template across x, computing the centred Pearson-style
correlation per offset. Returns indices and correlation values
meeting threshold.
morie_dsp_template_match(x, template, threshold = 0.7)
x |
Numeric vector. |
template |
Numeric vector (shorter than |
threshold |
Minimum correlation. Default 0.7. |
List with indices (1-based) and correlations.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.4.
Returns sample indices (1-based) where x crosses threshold in
the chosen direction. min_distance enforces a minimum gap (in
samples) between successive events.
morie_dsp_threshold_detect(x, threshold, min_distance = 1L, direction = "above")
x |
Numeric vector. |
threshold |
Scalar threshold. |
min_distance |
Minimum gap between events. Default 1. |
direction |
One of "above", "below", "either". Default "above". |
Integer vector of detected indices.
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.2.
Counts strict turning points and z-scores against the i.i.d.
expectation 2(n-2)/3. |z| < 1.96 is consistent with weak
stationarity at the 5 percent level.
morie_dsp_turning_points(x)
x |
Numeric vector. |
List with turning_points, expected, z_statistic,
stationary.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.2.
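A sketch of the turning-point z-score described above. The expectation 2(n-2)/3 is taken from the text; the variance (16n - 29)/90 is the standard i.i.d. result for the turning-point test and is an assumption about what the package uses. The helper name `sketch_turning_points` is hypothetical.

```r
# Hypothetical helper; the variance (16n - 29) / 90 is the textbook i.i.d.
# value and is assumed here, since the entry only states the expectation.
sketch_turning_points <- function(x) {
  n <- length(x)
  d <- diff(x)
  # Strict turning point: consecutive slopes have opposite signs.
  tp <- sum(d[-length(d)] * d[-1L] < 0)
  expected <- 2 * (n - 2) / 3
  z <- (tp - expected) / sqrt((16 * n - 29) / 90)
  list(turning_points = tp, expected = expected,
       z_statistic = z, stationary = abs(z) < 1.96)
}

ramp   <- sketch_turning_points(1:50)                 # monotone trend
zigzag <- sketch_turning_points(c(1, 3, 2, 4, 3, 5))  # alternating slopes
```

The monotone ramp has no turning points, so its z-score is strongly negative and the test rejects; the zigzag turns at every interior sample.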
Counts adjacent sign changes in diff(x) whose absolute slope
difference exceeds threshold. Used as a fatigue/load proxy in
sEMG analysis.
morie_dsp_turns_count(x, threshold = 0)
x |
Numeric vector. |
threshold |
Slope-difference threshold. Default 0. |
Integer count.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.4; Willison (1964).
var(x) / var(y); Inf if var(y) == 0.
morie_dsp_variance_ratio(x, y)
x |
Numeric vector. |
y |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5.
Sum of absolute first differences. Standard sEMG descriptor.
morie_dsp_waveform_length(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.3; Hudgins et al. (1993).
morie_dsp_waveform_length(x) / length(x).
morie_dsp_waveform_length_norm(x)
x |
Numeric vector. |
Scalar.
Rangayyan & Krishnan (2015), Ch. 5.
Classical scalar Wiener gain in the rFFT domain:
H(f) = Pxx(f) / (Pxx(f) + Pnn(f)). With noise_psd = NULL the
noise PSD is assumed flat at noise_fraction * mean(Pxx).
morie_dsp_wiener_filter(x, noise_psd = NULL, noise_fraction = 0.1)
x |
Numeric vector (signal + noise). |
noise_psd |
Optional noise PSD, length |
noise_fraction |
Fallback flat-noise scale. Default 0.1. |
Filtered vector, length(x).
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.5.
Returns w = solve(Rxx, rxd). Used as the optimal-FIR Wiener
solution; equivalent to lm.fit on Toeplitz inputs.
morie_dsp_wiener_hopf(Rxx, rxd)
Rxx |
Symmetric autocorrelation matrix (order x order). |
rxd |
Cross-correlation vector (length order). |
Optimal tap-weight vector.
Rangayyan & Krishnan (2015), Ch. 3, sec. 3.5.
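A worked sketch of the normal-equation solve described above, using only base R. It builds `Rxx` and `rxd` from sample moments of a lagged design matrix (an assumption made for illustration; these are sample estimates rather than an exactly Toeplitz matrix) and recovers known FIR taps in a noise-free setting. All variable names are illustrative.

```r
set.seed(1)
n <- 500
order <- 3L
h_true <- c(0.8, -0.4, 0.2)                 # FIR taps to recover

x <- rnorm(n)
# Lagged design matrix: columns are x[i], x[i-1], x[i-2] (zero-padded start).
X <- stats::embed(c(rep(0, order - 1L), x), order)
d <- drop(X %*% h_true)                     # desired signal, noise-free

Rxx <- crossprod(X) / n                     # sample autocorrelation matrix
rxd <- crossprod(X, d) / n                  # sample cross-correlation vector
w <- drop(solve(Rxx, rxd))                  # Wiener-Hopf solution
```

Because these are the least-squares normal equations for regressing `d` on the lagged inputs, the solve returns the same weights as an ordinary least-squares fit on `X`.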
Count of |diff(x)| > threshold (defaults to sd(x)).
morie_dsp_willison_amplitude(x, threshold = NULL)
x |
Numeric vector. |
threshold |
Optional threshold (default |
Integer count.
Rangayyan & Krishnan (2015), Ch. 5, sec. 5.4.
Returns a length-N window vector of the requested type. Supports
hamming, hann/hanning, blackman, bartlett/triangular, kaiser
(beta = 14), and rectangular/boxcar. Unknown types default to hamming.
morie_dsp_window(N, wtype = "hamming")
N |
Window length. |
wtype |
Type string. Default "hamming". |
Numeric vector of length N.
Rangayyan & Krishnan (2015), Ch. 6, sec. 6.5.
Whole-signal ZCR if frame_length = NULL; otherwise per-frame
ZCR over consecutive non-overlapping frames.
morie_dsp_zero_crossing(x, frame_length = NULL)
x |
Numeric vector. |
frame_length |
Optional frame length. |
Scalar (whole signal) or numeric vector (per frame).
Rangayyan & Krishnan (2015), Ch. 4, sec. 4.3.
The E-value quantifies the minimum strength of confounding association needed to fully explain away an observed treatment effect:
E = RR + sqrt(RR * (RR - 1))
morie_e_value(rr, rr_lower = NULL)
rr |
Risk ratio estimate (> 0). Supply > 1; if < 1, pass its reciprocal. |
rr_lower |
Lower bound of the 95% confidence interval for rr (optional; used to compute the E-value for the CI). |
For a risk ratio RR < 1, use 1/RR before applying the
formula.
Thin wrapper over EValue::evalue() when EValue is
installed; falls back to the inline closed-form computation
otherwise. Both arms produce numerically identical answers
(the formula above is the EValue closed-form for RR estimands).
Named list: morie_e_value, e_value_ci (for the CI
bound).
VanderWeele TJ, Ding P (2017). Sensitivity analysis in observational research: introducing the E-value. Annals of Internal Medicine, 167(4):268-274.
morie_e_value(rr = 3.9, rr_lower = 2.4)
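The closed-form computation for risk ratios (VanderWeele & Ding 2017) can be sketched in a few lines. The helper name `sketch_e_value` is hypothetical and this is not the rmorie or EValue code; it applies the reciprocal rule for risk ratios below 1 before evaluating the formula.

```r
# Hypothetical helper: E = RR + sqrt(RR * (RR - 1)), after replacing
# any RR < 1 by its reciprocal.
sketch_e_value <- function(rr) {
  stopifnot(is.numeric(rr), length(rr) == 1L, rr > 0)
  if (rr < 1) rr <- 1 / rr
  rr + sqrt(rr * (rr - 1))
}

e_point <- sketch_e_value(3.9)    # point estimate, approx. 7.26
e_lower <- sketch_e_value(2.4)    # CI bound, approx. 4.23
e_null  <- sketch_e_value(1)      # no association: E-value of 1
```

A protective risk ratio such as 0.5 gives the same E-value as its reciprocal 2, which is why the reciprocal is taken first.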
Thin extender over marginaleffects::comparisons() for unit-
level treatment-effect contrasts (counterfactual differences,
ratios, etc.).
morie_effects_comparisons(model, variables = NULL, ...)
model |
A fitted model object. |
variables |
Character vector or named list of variables to
contrast (see |
... |
Further arguments forwarded to
|
A marginaleffects data frame.
Thin extender over emmeans::emmeans(). The fitted model is
passed through unchanged; specs follows the usual emmeans
formula / list interface. Use emmeans::pairs() or
emmeans::contrast() on the returned object for pairwise or
custom contrasts.
morie_effects_emmeans(model, specs, ...)
model |
A fitted model object ( |
specs |
Specification for the marginal means – a formula
(e.g. |
... |
Further arguments forwarded to |
An emmGrid object.
Thin extender over marginaleffects::predictions() for unit-
level or grid-level adjusted predictions.
morie_effects_predictions(model, newdata = NULL, ...)
model |
A fitted model object supported by insight / marginaleffects. |
newdata |
Optional data frame for which to predict. Defaults
to the model frame when |
... |
Further arguments forwarded to
|
A marginaleffects data frame.
Thin extender over marginaleffects::slopes() for continuous
marginal effects (Stata-style margins, dydx()).
morie_effects_slopes(model, variables = NULL, ...)
model |
A fitted model object. |
variables |
Character vector of focal variables. When |
... |
Further arguments forwarded to
|
A marginaleffects data frame.
summary() coefficients)
Thin extender over broom::tidy(). When broom is not
installed, falls back to building a tidy-style data frame from
summary(model)$coefficients, which is sufficient for the
core term / estimate / std.error / statistic / p.value columns
on the model classes (lm / glm) that rmorie ships.
morie_effects_tidy(model, ...)
model |
A fitted model object. |
... |
Further arguments forwarded to |
A data frame with one row per model term.
Engle-Granger two-step cointegration test
morie_eg_coint(y1, y2, max_lag = NULL)
y1 |
Numeric, first series. |
y2 |
Numeric, second series. |
max_lag |
Max ADF augmentation lags. Default |
Named list with adf_statistic, p_value, beta, n, method.
morie_eg_coint(y1 = rnorm(100), y2 = rnorm(100))
EGARCH(1,1) asymmetric volatility model
morie_egarch_model(x)
x |
Numeric return series. |
Named list with omega, alpha, gamma, beta, loglik,
conditional_variance, n, method.
morie_egarch_model(x = rnorm(50))
Function-body helper for morie endpoints that require optional
Suggests: packages. If every package in pkgs is already
installed, returns silently. Otherwise:
morie_ensure_extras(pkgs, ask = interactive(), repos = NULL)
pkgs |
Character vector of required package names. |
ask |
Logical. If |
repos |
Optional CRAN repo URL(s). Default uses
|
In an interactive session, prompts the user once,
and on consent installs the missing packages via
utils::install.packages().
In a non-interactive session (R CMD check, CI,
Rscript), throws an informative error with the morie command
the user should run to fix things:
morie_install_extras(c("X", "Y")).
Why not auto-install silently? CRAN policy forbids packages from writing to the user's library or making network calls at function call time without explicit consent. The interactive prompt is the CRAN-blessed escape: the user IS the one consenting, and during R CMD check or any non-interactive run, the function refuses to touch the library.
Typical use inside a morie function body that needs DoubleML and ranger:
morie_estimate_irm <- function(...) {
morie_ensure_extras(c("DoubleML", "ranger"))
...
}
Invisibly TRUE if all pkgs are (now) installed; throws
otherwise.
morie_install_extras() for the user-facing bulk installer.
## Not run:
# Interactive (RStudio / R console): prompts to install if needed
morie_ensure_extras(c("DoubleML", "ranger"))
# CI / Rscript: errors with install-hint instead of installing
morie_ensure_extras(c("DoubleML"), ask = FALSE)
## End(Not run)
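The consent-gated behaviour described in this entry can be sketched with base R and utils only. This is an illustration of the pattern, not the rmorie source: the helper name `sketch_ensure_extras` and the exact error wording are hypothetical, while `requireNamespace()`, `utils::askYesNo()`, and `utils::install.packages()` are standard R functions.

```r
# Hypothetical helper illustrating the pattern; not the rmorie source.
sketch_ensure_extras <- function(pkgs, ask = interactive()) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!installed]
  if (length(absent) == 0L) return(invisible(TRUE))

  if (!ask) {
    # Non-interactive: never touch the library, only explain the fix.
    quoted <- paste(sprintf('"%s"', absent), collapse = ", ")
    stop(sprintf("Missing package(s): %s. Run morie_install_extras(c(%s)).",
                 toString(absent), quoted), call. = FALSE)
  }

  # Interactive: install only after explicit consent from the user.
  consent <- utils::askYesNo(
    sprintf("Install missing package(s) %s?", toString(absent)))
  if (!isTRUE(consent)) stop("Installation declined.", call. = FALSE)
  utils::install.packages(absent)
  invisible(all(vapply(absent, requireNamespace, logical(1), quietly = TRUE)))
}

ok <- sketch_ensure_extras("stats", ask = FALSE)   # already installed
err <- tryCatch(sketch_ensure_extras("pkgThatDoesNotExist0", ask = FALSE),
                error = identity)
```

The design choice mirrored here is that the non-interactive branch fails fast with an actionable message instead of installing, so automated runs never modify the library.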
R parity of morie.entheo_dmt.analyze_subject. Runs the
Layer-2 BOLD analyses (global-signal LZ complexity and dynamic
functional connectivity) on one subject under each condition and
returns a RichResult-style comparison summary.
morie_entheo_analyze_subject(
  subject_id,
  conditions = c("DMT", "PCB"),
  window = 30L,
  step = 5L
)
subject_id |
integer subject ID. |
conditions |
character vector. Conditions to evaluate. |
window |
integer dFC window (TRs). |
step |
integer dFC stride (TRs). |
RichResult-style named list with payload$rows as a
per-condition list of result rows.
R parity of morie.entheo_dmt.available_subjects. Scans the
fMRI/ directory of the on-disk DMT_Imaging mirror for
LongS{NN}{DMT,PCB}.mat filenames and returns the integer IDs.
morie_entheo_available_subjects()
integer vector of subject IDs sorted ascending. Empty if the
dataset root is missing or the fMRI/ folder is absent.
Timmermann, C. et al. (2023). Human brain effects of DMT assessed via EEG-fMRI. PNAS 120(13): e2218949120.
morie_entheo_clone_dmt_imaging() shells out to git clone to
fetch the open-source DMT_Imaging dataset published by Christopher
Timmermann's group at https://github.com/timmer500/DMT_Imaging.
After the clone completes, load_dmt_imaging() and
morie_entheo_load_*() will pick the real fixture up
automatically (they probe $MORIE_DMT_IMAGING_ROOT and the cache
dir).
morie_entheo_clone_dmt_imaging(root = NULL, overwrite = FALSE, branch = NULL)
root |
Optional destination directory. Defaults to
|
overwrite |
Logical; if |
branch |
Optional branch / tag / SHA to check out after
clone. |
This is opt-in – the package never auto-clones at load time. We
clone from a specific commit by default to make tests
reproducible; pass branch = NULL to track main.
Related upstream resources Vee surfaced 2026-05-25:
https://github.com/timmer500/DMT_Imaging.git (the actual EEG+fMRI dataset)
https://github.com/pnk314/psychedelics.git (analysis pipeline)
https://github.com/lisagirard/Psychedelics.git (review repository)
https://github.com/kianenigma/awesome-psychedelics.git (curated index)
Invisibly returns the destination path.
R parity of morie.entheo_dmt.dataset_overview. Returns a
RichResult-style summary of the on-disk DMT_Imaging mirror.
morie_entheo_dataset_overview()
named list with title, summary_lines,
interpretation, payload.
R parity of morie.entheo_dmt.dynamic_functional_connectivity.
For an AAL-parcellated BOLD matrix of shape (n_regions, n_TRs),
computes the upper-triangular Pearson correlation matrix in each
sliding window of window TRs advanced by step TRs.
Mirrors the ‘dRSFC.m’ Matlab script in
‘DMT_Imaging/Scripts/’.
morie_entheo_dynamic_functional_connectivity(bold, window = 30L, step = 5L)
bold |
numeric matrix (n_regions, n_TRs). |
window |
integer window length (TRs). Default 30. |
step |
integer window stride (TRs). Default 5. |
RichResult-style named list with per-window mean / std of the upper-triangular correlation vector.
Allen, E. A. et al. (2014). Tracking whole-brain connectivity dynamics in the resting state. Cereb. Cortex 24(3): 663-676.
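A sliding-window correlation of this kind can be sketched in base R as below. The function name `dfc_sketch` and the returned column names are assumptions for illustration; the package's own return structure (a RichResult-style list) is not reproduced here.

```r
# Hypothetical sketch: per-window mean / sd of the upper-triangular
# Pearson correlations of a (n_regions x n_TRs) matrix.
dfc_sketch <- function(bold, window = 30L, step = 5L) {
  n_trs <- ncol(bold)
  starts <- seq(1L, n_trs - window + 1L, by = step)
  t(vapply(starts, function(s) {
    # cor() correlates columns, so transpose the window to TRs x regions
    r <- stats::cor(t(bold[, s:(s + window - 1L), drop = FALSE]))
    v <- r[upper.tri(r)]
    c(start = s, mean = mean(v), sd = stats::sd(v))
  }, numeric(3)))
}

set.seed(1)
bold <- matrix(rnorm(6 * 100), nrow = 6) # 6 regions, 100 TRs of noise
out <- dfc_sketch(bold)
nrow(out) # 15 windows: starts at TR 1, 6, ..., 71
```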
R parity of morie.entheo_dmt.load_eeg_region. Reads
RegressorsInterpscrubbedIRASA_<region>.mat and returns the
DMT, PCB, and difference regressor cubes.
morie_entheo_load_eeg_region(region)
region |
character. One of Central, Frontal, Occipital, Parietal, Temporal. |
named list with elements regDMT, regPCB,
regdiff; each is a 3-D array of shape
(14 subj, 840 TRs, 5 bands).
R parity of morie.entheo_dmt.load_fmri_subject. Reads
LongS{NN}{DMT|PCB}.mat and extracts the
BOLD_AAL matrix (112 AAL regions x 840 TRs).
morie_entheo_load_fmri_subject(subject_id, condition = "DMT")
subject_id |
integer subject ID (e.g. 1, 2, 14). |
condition |
character: "DMT" (default) or "PCB". |
numeric matrix of shape (112, 840).
R parity of morie.entheo_dmt.lz_complexity. The
DMT-vs-PCB contrast on LZ complexity is one of Timmermann 2023's
headline findings: LZ rises under DMT, indicating increased
neural-signal diversity.
morie_entheo_lz_complexity(signal, threshold = NULL)
signal |
numeric vector. |
threshold |
numeric or NULL. Binarisation threshold. NULL = median (the standard choice). |
RichResult-style named list with raw and length-normalised LZ.
Lempel, A. & Ziv, J. (1976). On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1): 75-81. Schartner, M. et al. (2015). Complexity of multi-dimensional spontaneous EEG decreases during propofol-induced general anaesthesia. PLOS ONE 10(8): e0133532.
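For orientation, a compact sketch of Lempel-Ziv (1976) complexity on a median-binarised signal is given below. The names `lz76_count` and `lz_complexity_sketch` are hypothetical, and the normalisation `c * log2(n) / n` is a common convention assumed here rather than confirmed as the one morie uses.

```r
# Hypothetical sketch of LZ76 phrase counting on a 0/1 string.
lz76_count <- function(bits) {
  n <- nchar(bits)
  i <- 1L
  count <- 0L
  while (i <= n) {
    len <- 1L
    # Grow the candidate phrase while it already occurs in the text that
    # precedes its final character (overlap with itself is allowed).
    while (i + len - 1L <= n &&
      grepl(substr(bits, i, i + len - 1L),
        substr(bits, 1L, i + len - 2L),
        fixed = TRUE
      )) {
      len <- len + 1L
    }
    count <- count + 1L
    i <- i + len
  }
  count
}

lz_complexity_sketch <- function(signal, threshold = NULL) {
  if (is.null(threshold)) threshold <- stats::median(signal)
  bits <- paste(as.integer(signal > threshold), collapse = "")
  raw <- lz76_count(bits)
  n <- length(signal)
  list(raw = raw, normalised = raw * log2(n) / n) # assumed normalisation
}

# Textbook sequence parsed as 0 | 001 | 10 | 100 | 1000 | 101
lz76_count("0001101001000101") # 6
```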
R parity of morie.entheo_dmt.spectral_band_power. Wraps
rgpsd (morie's Welch PSD; same algorithm as SciPy's
welch) and integrates the PSD over each canonical band by
the trapezoidal rule.
morie_entheo_spectral_band_power(
  signal,
  fs = 200,
  bands = .MORIE_ENTHEO_DEFAULT_BANDS,
  nperseg = NULL
)
signal |
numeric vector. A 1-D EEG time series. |
fs |
numeric. Sampling frequency in Hz. Default 200 Hz (Timmermann 2023 acquisition). |
bands |
list of |
nperseg |
integer or NULL. Welch segment length. Defaults to
|
RichResult-style named list. payload$rows carries
per-band absolute and relative power.
Welch, P. (1967). The use of FFT for the estimation of power spectra. IEEE Trans. Audio Electroacoust. 15(2): 70-73. Rangayyan, R. M. & Krishnan, S. (2024). Biomedical Signal Analysis, 3rd ed., Ch. 5.
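The band-integration step can be illustrated on its own, assuming a PSD has already been estimated on a frequency grid. The helper name `band_power_sketch` and the example band edges are assumptions; the Welch estimate itself (rgpsd) is not reproduced.

```r
# Hypothetical sketch: trapezoidal integration of a PSD over one band.
band_power_sketch <- function(freq, psd, band) {
  keep <- freq >= band[1] & freq <= band[2]
  f <- freq[keep]
  p <- psd[keep]
  # Sum of trapezoid areas between consecutive frequency bins
  sum(diff(f) * (head(p, -1) + tail(p, -1)) / 2)
}

freq <- seq(0, 100, by = 0.5)
psd <- rep(2, length(freq)) # flat spectrum with density 2 per Hz
band_power_sketch(freq, psd, c(8, 13)) # 2 * (13 - 8) = 10
```

Relative power would then be each band's value divided by the total over all bands.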
Combines IPW and outcome regression corrections. Consistent if either the propensity model or the outcome model is correctly specified.
morie_estimate_aipw(
  data,
  treatment,
  outcome,
  covariates,
  propensity_col = NULL,
  outcome_model = c("linear", "logistic")
)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
propensity_col |
Optional: name of a pre-computed propensity score column. |
outcome_model |
Family for the outcome model: |
The propensity step delegates to WeightIt when installed
(via morie_estimate_propensity_scores). The outcome
regression and the doubly-robust influence-function score are
evaluated inline to preserve the closed-form SE used downstream.
Where richer outputs are desired, AIPW::AIPW (with SuperLearner
nuisance learners) is the canonical CRAN counterpart.
Named list: ate, se, ci_lower, ci_upper, n.
set.seed(1)
df <- data.frame(t = rbinom(200, 1, 0.4), y = rnorm(200), x = rnorm(200))
morie_estimate_aipw(df, "t", "y", "x")
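To make the doubly-robust score concrete, here is a minimal base-R sketch of an AIPW estimate with an influence-function standard error. It is an illustration under assumptions, not morie's code: the name `aipw_sketch`, the single covariate, and the choice of separate per-arm linear outcome models are all made up for the example.

```r
# Hypothetical AIPW sketch with one covariate.
aipw_sketch <- function(y, trt, x) {
  d <- data.frame(y = y, trt = trt, x = x)
  # Propensity score e(x) from a logistic regression
  e <- fitted(glm(trt ~ x, family = binomial(), data = d))
  # Outcome regressions fitted per arm, predicted for every unit
  mu1 <- predict(lm(y ~ x, data = d[d$trt == 1, ]), newdata = d)
  mu0 <- predict(lm(y ~ x, data = d[d$trt == 0, ]), newdata = d)
  # Doubly-robust score: regression contrast plus IPW residual corrections
  psi <- mu1 - mu0 + trt * (y - mu1) / e - (1 - trt) * (y - mu0) / (1 - e)
  ate <- mean(psi)
  se <- sd(psi) / sqrt(length(psi))
  list(
    ate = ate, se = se,
    ci_lower = ate - 1.96 * se, ci_upper = ate + 1.96 * se,
    n = length(psi)
  )
}

set.seed(1)
n <- 2000
x <- rnorm(n)
trt <- rbinom(n, 1, plogis(0.5 * x))
y <- 2 * trt + x + rnorm(n) # true ATE is 2
fit <- aipw_sketch(y, trt, x)
```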
Control units receive weight 1; treated units receive
$(1 - e(X_i)) / e(X_i)$, where $e(X_i)$ is the estimated propensity score.
morie_estimate_atc(data, treatment, outcome, covariates, propensity_col = NULL)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
propensity_col |
Optional: name of a pre-computed propensity score column. |
Propensity-score estimation delegates to WeightIt when
installed (via morie_estimate_propensity_scores); the
weighted-difference and influence-function SE run inline.
Named list: atc, se, ci_lower, ci_upper, n_control.
set.seed(1)
df <- data.frame(t = rbinom(200, 1, 0.4), y = rnorm(200), x = rnorm(200))
morie_estimate_atc(df, "t", "y", "x")
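The Hajek estimator uses stabilised IPW weights:
$$\hat{\tau}_{\text{Hajek}} = \frac{\sum_i w_i^{(1)} Y_i}{\sum_i w_i^{(1)}} - \frac{\sum_i w_i^{(0)} Y_i}{\sum_i w_i^{(0)}},$$
where $w_i^{(1)} = T_i / e(X_i)$
and $w_i^{(0)} = (1 - T_i) / (1 - e(X_i))$.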
morie_estimate_ate(data, treatment, outcome, covariates, propensity_col = NULL)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
propensity_col |
Optional: name of a pre-computed propensity score column. |
When WeightIt is installed the propensity step delegates to
WeightIt::weightit(); otherwise the inline logistic
regression is used. The Hajek difference and influence-function SE
below are evaluated inline either way so that the result list shape and
the closed-form variance are preserved.
Named list: ate, se, ci_lower, ci_upper, n, ess.
set.seed(1)
df <- data.frame(
  t = rbinom(200, 1, 0.4),
  y = rnorm(200),
  x = rnorm(200)
)
morie_estimate_ate(df, "t", "y", "x")
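As a small worked illustration of a Hajek (self-normalised) IPW contrast, the sketch below takes precomputed propensity scores and divides each arm's weighted outcome sum by the sum of its weights. The function name `hajek_sketch` is hypothetical and the influence-function SE is omitted.

```r
# Hypothetical sketch of the Hajek ATE given propensity scores e.
hajek_sketch <- function(y, trt, e) {
  w1 <- trt / e             # weights for treated units
  w0 <- (1 - trt) / (1 - e) # weights for control units
  sum(w1 * y) / sum(w1) - sum(w0 * y) / sum(w0)
}

# With constant e the estimator reduces to a difference in arm means:
y <- c(1, 2, 3, 4)
trt <- c(1, 1, 0, 0)
hajek_sketch(y, trt, e = rep(0.5, 4)) # 1.5 - 3.5 = -2
```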
Treated units receive weight 1; controls receive
$e(X_i) / (1 - e(X_i))$, where $e(X_i)$ is the estimated propensity score.
morie_estimate_att(data, treatment, outcome, covariates, propensity_col = NULL)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
propensity_col |
Optional: name of a pre-computed propensity score column. |
Propensity-score estimation delegates to WeightIt when
installed (via morie_estimate_propensity_scores); the
weighted-difference and influence-function SE run inline.
Named list: att, se, ci_lower, ci_upper, n_treated.
set.seed(2)
df <- data.frame(t = rbinom(200, 1, 0.4), y = rnorm(200), x = rnorm(200))
morie_estimate_att(df, "t", "y", "x")
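The standard ATT and ATC weighting schemes differ only in which arm is held at weight 1. A minimal sketch, assuming the usual odds-based weights and a hypothetical helper name `balancing_weights_sketch`:

```r
# Hypothetical sketch of ATT / ATC weights from propensity scores e.
balancing_weights_sketch <- function(trt, e, estimand = c("ATT", "ATC")) {
  estimand <- match.arg(estimand)
  if (estimand == "ATT") {
    # Treated stay at 1; controls are reweighted by the odds e / (1 - e)
    ifelse(trt == 1, 1, e / (1 - e))
  } else {
    # Controls stay at 1; treated are reweighted by (1 - e) / e
    ifelse(trt == 1, (1 - e) / e, 1)
  }
}

balancing_weights_sketch(c(1, 0), c(0.25, 0.25), "ATT") # 1, 1/3
balancing_weights_sketch(c(1, 0), c(0.25, 0.25), "ATC") # 3, 1
```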
The T-learner fits separate outcome models on treated and
control units, then predicts the counterfactual for each unit:
$\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$.
morie_estimate_cate(
  data,
  treatment,
  outcome,
  covariates,
  propensity_col = NULL,
  outcome_model = c("linear", "logistic"),
  meta_learner = c("t_learner", "s_learner")
)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
propensity_col |
Optional: name of a pre-computed propensity score column. |
outcome_model |
Family for the outcome model: |
meta_learner |
|
The S-learner fits one model with treatment as a feature.
For random-forest CATE estimation prefer grf::causal_forest
(richer heterogeneity, honest sample splitting).
Numeric vector of per-unit CATE estimates.
morie_estimate_cate(
  data = data.frame(
    t = stats::rbinom(100, 1, 0.4),
    y = stats::rbinom(100, 1, 0.3),
    x1 = stats::rnorm(100),
    x2 = stats::rnorm(100)
  ),
  treatment = "t",
  outcome = "y",
  covariates = c("x1", "x2")
)
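A T-learner with linear outcome models can be sketched in a few lines of base R. The function name `t_learner_sketch` and the simulated data are assumptions for illustration; morie's handling of logistic outcomes and the S-learner variant is not shown.

```r
# Hypothetical T-learner sketch: one linear model per arm, then contrast.
t_learner_sketch <- function(data, treatment, outcome, covariates) {
  f <- stats::reformulate(covariates, response = outcome)
  fit1 <- lm(f, data = data[data[[treatment]] == 1, ])
  fit0 <- lm(f, data = data[data[[treatment]] == 0, ])
  # Per-unit CATE: predicted outcome under treatment minus under control
  unname(predict(fit1, newdata = data) - predict(fit0, newdata = data))
}

set.seed(1)
n <- 1000
x <- rnorm(n)
trt <- rbinom(n, 1, 0.5)
y <- x + trt * (1 + 2 * x) + rnorm(n) # true CATE is 1 + 2 * x
tau_hat <- t_learner_sketch(data.frame(y, trt, x), "trt", "y", "x")
```

Because the effect here varies with `x`, the per-unit estimates should track `1 + 2 * x` rather than a single constant.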
Implements Chernozhukov et al. (2018) double/debiased machine
learning for the partially linear regression model. When the
DoubleML R package is installed, delegates to
DoubleML::DoubleMLPLR with random-forest nuisance learners.
Otherwise falls back to a hand-rolled cross-fit ridge
implementation: residualise the outcome $Y$ and the treatment $D$ on the covariates $X$ via
K-fold ridge, then regress the outcome residual on the treatment
residual.
morie_estimate_double_ml(
  data,
  outcome,
  treatment,
  covariates,
  n_folds = 5L,
  n_rep = 1L,
  random_state = 42L
)
data |
A data frame with treatment, outcome, and covariate columns. |
outcome |
Name of the continuous outcome column. |
treatment |
Name of the (binary) treatment column. |
covariates |
Character vector of covariate column names. |
n_folds |
Number of cross-fitting folds (default 5). |
n_rep |
Number of repeated cross-fitting repetitions (DoubleML only; ignored by the ridge fallback). Default 1. |
random_state |
Integer seed for cross-fit folds and learners (default 42). |
Named list with elements ate, se,
ci_lower, ci_upper, n, method.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
d <- rbinom(n, 1, plogis(X[, 1]))
y <- 0.5 * d + X[, 1] + rnorm(n)
df <- data.frame(y = y, d = d, x1 = X[, 1], x2 = X[, 2], x3 = X[, 3])
morie_estimate_double_ml(df, "y", "d", c("x1", "x2", "x3"))
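The cross-fit residual-on-residual idea behind the fallback can be sketched as follows. Note the swap: this illustration uses ordinary least squares for both nuisance regressions instead of ridge, purely to stay within base R, and the name `dml_plr_sketch` is hypothetical.

```r
# Hypothetical cross-fit partialling-out sketch (OLS stands in for ridge).
dml_plr_sketch <- function(y, d, X, n_folds = 5L, seed = 42L) {
  set.seed(seed)
  n <- length(y)
  fold <- sample(rep_len(seq_len(n_folds), n))
  dat <- data.frame(y = y, d = d, X)
  ry <- rd <- numeric(n)
  for (k in seq_len(n_folds)) {
    test <- fold == k
    # Nuisance models are fitted on the other folds only
    fit_y <- lm(y ~ . - d, data = dat[!test, ])
    fit_d <- lm(d ~ . - y, data = dat[!test, ])
    ry[test] <- y[test] - predict(fit_y, newdata = dat[test, ])
    rd[test] <- d[test] - predict(fit_d, newdata = dat[test, ])
  }
  # Regress the outcome residual on the treatment residual
  theta <- sum(rd * ry) / sum(rd^2)
  psi <- (ry - theta * rd) * rd
  se <- sqrt(mean(psi^2) / mean(rd^2)^2 / n)
  list(ate = theta, se = se, ci_lower = theta - 1.96 * se,
    ci_upper = theta + 1.96 * se, n = n)
}

set.seed(1)
n <- 2000
X <- matrix(rnorm(n * 3), n, 3)
d <- rbinom(n, 1, plogis(X[, 1]))
y <- 0.5 * d + X[, 1] + rnorm(n) # true effect is 0.5
fit <- dml_plr_sketch(y, d, X)
```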
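Estimates the ATE by:
fitting an outcome model for the outcome given treatment and covariates;
predicting each unit's outcome with treatment set to 1 and to 0;
averaging the difference between the two sets of predictions.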
morie_estimate_g_computation(
  data,
  treatment,
  outcome,
  covariates,
  outcome_model = c("linear", "logistic")
)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
outcome_model |
Family for the outcome model: |
Delegates the standardisation step to stdReg::stdGlm() when
stdReg is installed; otherwise computes the contrast inline
from a single stats::glm() fit with treatment-flipped
counterfactual datasets.
Named list: ate, se, ci_lower, ci_upper.
set.seed(1)
df <- data.frame(t = rbinom(200, 1, 0.4), y = rnorm(200), x = rnorm(200))
morie_estimate_g_computation(df, "t", "y", "x")
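The inline path, a single stats::glm() fit evaluated on treatment-flipped copies of the data, can be sketched as below. The function name `g_computation_sketch` is an assumption, and the standard error (which morie also returns) is left out.

```r
# Hypothetical g-computation sketch: one glm, two counterfactual datasets.
g_computation_sketch <- function(data, treatment, outcome, covariates,
                                 family = gaussian()) {
  f <- stats::reformulate(c(treatment, covariates), response = outcome)
  fit <- stats::glm(f, data = data, family = family)
  d1 <- d0 <- data
  d1[[treatment]] <- 1 # everyone treated
  d0[[treatment]] <- 0 # everyone untreated
  mean(predict(fit, newdata = d1, type = "response") -
    predict(fit, newdata = d0, type = "response"))
}

set.seed(1)
df <- data.frame(t = rbinom(200, 1, 0.4), x = rnorm(200))
df$y <- 2 * df$t + df$x + rnorm(200)
ate <- g_computation_sketch(df, "t", "y", "x")
```

For a linear model without interactions the standardised contrast equals the treatment coefficient; the two only diverge for non-linear links such as the logistic one.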
Applies AIPW within each level of group_col to estimate
stratum-specific treatment effects.
morie_estimate_gate(
  data,
  treatment,
  outcome,
  covariates,
  group_col,
  propensity_col = NULL,
  outcome_model = c("linear", "logistic")
)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
outcome |
Name of the outcome column. |
covariates |
Character vector of covariate names. |
group_col |
Name of the grouping variable (e.g. |
propensity_col |
Optional: name of a pre-computed propensity score column. |
outcome_model |
Family for the outcome model: |
Data frame with columns: group, ate, se,
ci_lower, ci_upper, n.
set.seed(3)
df <- data.frame(
  t = rbinom(300, 1, 0.4),
  y = rnorm(300),
  x = rnorm(300),
  g = sample(c("A", "B"), 300, replace = TRUE)
)
morie_estimate_gate(df, "t", "y", "x", "g")
Implements the IRM variant of Chernozhukov et al. (2018) double
machine learning, which allows treatment-effect heterogeneity by
fitting separate outcome regressions for $D = 1$ and $D = 0$
alongside a propensity model. Uses DoubleML::DoubleMLIRM
when available; otherwise falls back to a hand-rolled cross-fit
estimator using logistic regression for the propensity score and
ridge regression for the conditional outcome regressions.
Thin R wrapper that dispatches to the CRAN DoubleML package's
DoubleML::DoubleMLIRM R6Class, mirroring the Python sibling
morie.estimate_irm() (which dispatches to the Python DoubleML package).
morie_estimate_irm(
  data,
  treatment,
  outcome,
  covariates,
  n_folds = 5,
  random_state = 42
)
data |
A |
treatment |
Column name of the binary treatment. |
outcome |
Column name of the outcome. |
covariates |
Character vector of covariate column names. |
n_folds |
Number of cross-fitting folds (default 5). |
random_state |
Random seed (default 42). |
Following the DoubleML R package's own conventions, this uses
the mlr3 ecosystem for the nuisance learners (ml_g for the outcome regression $E[Y \mid D, X]$
and ml_m for the propensity score $P(D = 1 \mid X)$). Defaults are
lrn("regr.lm") and lrn("classif.log_reg"), which require nothing
beyond stats. For higher-capacity defaults, install ranger and pass
lrn("regr.ranger") / lrn("classif.ranger") via the underlying
DoubleML::DoubleMLIRM$new() directly.
Following Chernozhukov et al. (2018), the IRM extends the partially linear model by allowing fully heterogeneous treatment effects:
$$Y = g_0(D, X) + U, \quad E[U \mid X, D] = 0,$$
$$D = m_0(X) + V, \quad E[V \mid X] = 0.$$
A list with components: ate, se, ci_lower, ci_upper,
n, method ("IRM (DoubleML)").
Suggests
Requires the suggested packages DoubleML, mlr3, and mlr3learners.
Install with install.packages(c("DoubleML", "mlr3", "mlr3learners")).
If any are unavailable, the function raises an informative error.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68. doi:10.1111/ectj.12097
Bach, P., Chernozhukov, V., Kurz, M. S., & Spindler, M. (2024). DoubleML – An object-oriented implementation of double machine learning in R. Journal of Statistical Software, 108(3). doi:10.18637/jss.v108.i03
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
d <- rbinom(n, 1, plogis(X[, 1]))
y <- 0.5 * d + X[, 1] + rnorm(n)
df <- data.frame(y = y, d = d, x1 = X[, 1], x2 = X[, 2], x3 = X[, 3])
morie_estimate_irm(df, treatment = "d", outcome = "y", covariates = c("x1", "x2", "x3"))
if (requireNamespace("DoubleML", quietly = TRUE) &&
  requireNamespace("mlr3", quietly = TRUE) &&
  requireNamespace("mlr3learners", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  X <- matrix(rnorm(n * 5), n, 5)
  ps <- plogis(X[, 1] - X[, 2])
  T <- rbinom(n, 1, ps)
  Y <- 0.5 * T + X[, 1] + rnorm(n)
  df <- data.frame(Y = Y, T = T, X)
  morie_estimate_irm(df,
    treatment = "T", outcome = "Y",
    covariates = paste0("X", 1:5)
  )
}
Uses a binary instrument $Z$ to identify the LATE
(Imbens & Angrist, 1994):
$$\text{LATE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}.$$
morie_estimate_late(data, treatment, outcome, instrument, covariates = NULL)
data |
A data frame. |
treatment |
Name of the binary endogenous treatment column. |
outcome |
Name of the outcome column. |
instrument |
Name of the binary instrument column. |
covariates |
Optional character vector of exogenous covariates. |
With covariates, the routine delegates to ivreg::ivreg()
(ivreg) or AER::ivreg() when either package is
installed; otherwise it falls back to a manual two-stage OLS.
Without covariates the closed-form Wald estimator and its
delta-method SE are used.
Named list: late, se, ci_lower, ci_upper,
first_stage_f, n.
Imbens GW, Angrist JD (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2), 467-475.
set.seed(1)
n <- 300L
z <- rbinom(n, 1, 0.5)
t <- rbinom(n, 1, plogis(-0.2 + 1.5 * z))
y <- 0.8 * t + rnorm(n)
morie_estimate_late(data.frame(t = t, y = y, z = z), "t", "y", "z")
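Without covariates, the Wald estimator is the ratio of the instrument's effect on the outcome to its effect on treatment uptake. A minimal sketch, with the hypothetical name `wald_late_sketch` and the delta-method SE omitted:

```r
# Hypothetical sketch of the closed-form Wald estimator.
wald_late_sketch <- function(y, d, z) {
  reduced_form <- mean(y[z == 1]) - mean(y[z == 0]) # effect of Z on Y
  first_stage <- mean(d[z == 1]) - mean(d[z == 0])  # effect of Z on D
  reduced_form / first_stage
}

z <- c(1, 1, 1, 1, 0, 0, 0, 0)
d <- c(1, 1, 1, 0, 0, 0, 0, 1)
y <- 1 + 2 * d # outcome rises by exactly 2 when treated
wald_late_sketch(y, d, z) # (2.5 - 1.5) / (0.75 - 0.25) = 2
```

For a binary instrument this ratio coincides with the simple IV estimate `cov(y, z) / cov(d, z)`.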
Thin wrapper over WeightIt::weightit(method = "glm",
estimand = "ATE") when WeightIt is installed; falls back
to stats::glm(family = binomial()) otherwise.
morie_estimate_propensity_scores(
  data,
  treatment,
  covariates,
  trim = c(0.01, 0.99)
)
data |
A data frame. |
treatment |
Name of the binary treatment column. |
covariates |
Character vector of covariate names. |
trim |
Quantile pair used to winsorize extreme scores (default 0.01, 0.99). |
Numeric vector of propensity scores (same length as
nrow(data)).
df <- data.frame(t = c(0, 1, 0, 1, 0, 1), x = rnorm(6))
ps <- morie_estimate_propensity_scores(df, "t", "x")
EWMA volatility (RiskMetrics 1996)
morie_ewma_volatility(x, lambda = 0.94)
x |
Numeric return series. |
lambda |
Decay factor in (0,1). Default 0.94 (daily RiskMetrics). |
Named list with conditional_variance, conditional_volatility,
lambda, n, last_variance, last_volatility, method.
morie_ewma_volatility(x = rnorm(50))
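The RiskMetrics recursion updates each variance from the previous variance and the previous squared return. The sketch below is illustrative only: the name `ewma_variance_sketch` is hypothetical, and seeding the recursion with the mean squared return is an assumption, since the manual does not state how morie initialises it.

```r
# Hypothetical EWMA variance sketch:
# sigma2[t] = lambda * sigma2[t - 1] + (1 - lambda) * x[t - 1]^2
ewma_variance_sketch <- function(x, lambda = 0.94) {
  stopifnot(lambda > 0, lambda < 1, length(x) >= 2)
  n <- length(x)
  sigma2 <- numeric(n)
  sigma2[1] <- mean(x^2) # assumed initial value
  for (t in 2:n) {
    sigma2[t] <- lambda * sigma2[t - 1] + (1 - lambda) * x[t - 1]^2
  }
  sigma2
}

# A constant-magnitude series keeps the variance fixed at x^2:
ewma_variance_sketch(rep(2, 10)) # all 4
```

The conditional volatility is the square root of this series.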
Rename a city data.frame onto the canonical audit schema
morie_fairness_apply_profile(df, profile)
df |
A |
profile |
A |
A new data.frame with the profile's columns renamed
to the canonical names, retaining only those canonical columns.
df <- data.frame(beat = c("A", "B"), score = c(0.1, 0.9))
p <- morie_fairness_city_profile(
  "demo",
  area_col = "beat", risk_col = "score"
)
morie_fairness_apply_profile(df, p)
For each non-reference group, the average odds difference is
0.5 * ((FPR_group - FPR_ref) + (TPR_group - TPR_ref)).
Zero means parity of errors; values away from zero mean the
combined error profile favours one group over another. Used in
IBM AIF360 and in the COMPAS XAI Stories audit.
morie_fairness_average_odds_difference(
  y_true,
  y_pred,
  group,
  privileged = NULL,
  favorable = 1
)
y_true |
Vector of realised ground-truth outcomes. |
y_pred |
Vector of system decisions. |
group |
Vector of protected-attribute values. |
privileged |
The reference group (inferred if |
favorable |
The value treated as the positive class (default |
A morie_fairness_result; headline value is the
largest absolute AOD across groups.
A named list: value (largest absolute AOD),
average_odds_difference, rates, privileged, warnings,
interpretation.
truth <- c(1, 0, 1, 0, 1, 0, 1, 0)
pred <- c(1, 0, 1, 0, 1, 1, 0, 1)
race <- c(rep("A", 4), rep("B", 4))
res <- morie_fairness_average_odds_difference(truth, pred, race,
  privileged = "A"
)
res$value # 0.25
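The 0.25 in the example can be reproduced by hand. The sketch below computes per-group TPR and FPR and applies the formula directly; the helper names `group_rates` and `aod_sketch` are hypothetical and only the per-group differences are returned, not the full result object.

```r
# Hypothetical sketch: average odds difference from first principles.
group_rates <- function(y_true, y_pred, favorable = 1) {
  c(
    tpr = mean(y_pred[y_true == favorable] == favorable),
    fpr = mean(y_pred[y_true != favorable] == favorable)
  )
}

aod_sketch <- function(y_true, y_pred, group, privileged, favorable = 1) {
  is_ref <- group == privileged
  ref <- group_rates(y_true[is_ref], y_pred[is_ref], favorable)
  others <- setdiff(unique(group), privileged)
  vapply(others, function(g) {
    r <- group_rates(y_true[group == g], y_pred[group == g], favorable)
    0.5 * ((r[["fpr"]] - ref[["fpr"]]) + (r[["tpr"]] - ref[["tpr"]]))
  }, numeric(1))
}

truth <- c(1, 0, 1, 0, 1, 0, 1, 0)
pred <- c(1, 0, 1, 0, 1, 1, 0, 1)
race <- c(rep("A", 4), rep("B", 4))
# Group A: TPR 1, FPR 0. Group B: TPR 0.5, FPR 1.
aod <- aod_sketch(truth, pred, race, privileged = "A")
aod[["B"]] # 0.5 * ((1 - 0) + (0.5 - 1)) = 0.25
```

The example also shows why a single number can hide structure: group B's much higher FPR is partly offset by its lower TPR.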
BAS = delta_parity * G, where delta_parity is the
demographic-parity gap of the worst-affected group and G
is the Gini coefficient of the per-group favourable-outcome rates.
Large only when a directional disparity coincides with high
cross-group inequality.
morie_fairness_bias_amplification(
  y_pred,
  group,
  privileged = NULL,
  favorable = 1
)
y_pred |
Vector of decisions/assignments, one per individual. |
group |
Vector of protected-attribute values (e.g. race). |
privileged |
The reference group. If |
favorable |
The value of |
Reimplemented from Barman & Barman, "Unmasking Algorithmic Bias in Predictive Policing" (arXiv:2603.18987).
A morie_fairness_result; headline value is BAS.
A named list: value (BAS), bias_amplification_score,
demographic_parity_gap, gini, rates, privileged,
warnings, interpretation.
pred <- c(1, 1, 1, 1, 0, 0, 0, 0)
race <- c(rep("A", 4), rep("B", 4))
res <- morie_fairness_bias_amplification(pred, race, privileged = "A")
res$value # -0.5 (parity gap -1.0 times Gini 0.5)
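The arithmetic behind the example's -0.5 can be checked with a short sketch. The helper names `gini_sketch` and `bas_sketch` are hypothetical, and the mean-absolute-difference form of the Gini coefficient is an assumption chosen because it reproduces the documented Gini of 0.5 for rates (1, 0).

```r
# Hypothetical sketch: Gini via mean absolute difference (no small-sample
# correction), then BAS = worst parity gap * Gini of the group rates.
gini_sketch <- function(x) {
  sum(abs(outer(x, x, "-"))) / (2 * length(x)^2 * mean(x))
}

bas_sketch <- function(y_pred, group, privileged, favorable = 1) {
  rates <- tapply(y_pred == favorable, group, mean)
  gaps <- rates[names(rates) != privileged] - rates[[privileged]]
  worst <- gaps[[which.max(abs(gaps))]] # signed gap of largest magnitude
  worst * gini_sketch(as.numeric(rates))
}

pred <- c(1, 1, 1, 1, 0, 0, 0, 0)
race <- c(rep("A", 4), rep("B", 4))
bas_sketch(pred, race, privileged = "A") # (0 - 1) * 0.5 = -0.5
```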
The five canonical per-area fields the audit consumes.
MORIE_FAIRNESS_CANONICAL_FIELDS
An object of class character of length 5.
Each *_col argument names the column, in that city's own
export, that carries the corresponding canonical field.
risk_col and outcome_col may be NULL when a
city only supplies one side (e.g. risk scores but no realised-
outcome feed); the missing side must then be supplied separately
to the audit.
morie_fairness_city_profile(
  name,
  area_col,
  risk_col = NULL,
  outcome_col = NULL,
  population_col = NULL,
  group_col = NULL,
  notes = ""
)
name |
Character identifier used with
|
area_col |
Column holding the area / district / precinct identifier. Required. |
risk_col, outcome_col, population_col, group_col
|
Optional columns for predicted risk, realised-outcome count, area population, and protected attribute. |
notes |
Free-text provenance or caveats, surfaced to the user. |
A list of class morie_city_profile.
p <- morie_fairness_city_profile(
  "chicago",
  area_col = "community_area",
  risk_col = "rti", group_col = "majority_race"
)
p$name
Returns a named character vector mapping source column names (those defined on the profile) to canonical field names.
morie_fairness_column_map(profile)
profile |
A |
A named character vector c(source = canonical).
The gap is the additive difference in favourable-outcome rates,
rate(group) - rate(privileged). Demographic
parity holds when every group receives favourable outcomes at the
same rate, i.e. all gaps are zero. Unlike the disparate-impact
ratio, this additive form is well defined even when the privileged
rate is zero.
morie_fairness_demographic_parity(
  y_pred,
  group,
  privileged = NULL,
  favorable = 1
)
y_pred |
Vector of decisions/assignments, one per individual. |
group |
Vector of protected-attribute values (e.g. race). |
privileged |
The reference group. If |
favorable |
The value of |
A morie_fairness_result; headline value is the
largest absolute gap across groups.
A named list: value (largest absolute gap), gaps, rates,
privileged, warnings, interpretation.
pred <- c(1, 1, 1, 1, 0, 0, 0, 1, 0, 0)
race <- c(rep("A", 5), rep("B", 5))
res <- morie_fairness_demographic_parity(pred, race, privileged = "A")
res$value # -0.6 (group B rate 0.2 minus group A rate 0.8)
For each group, the disparate-impact ratio is its favourable-outcome rate divided by the privileged group's rate. A value below 0.8 is the standard legal indicator of adverse impact.
morie_fairness_disparate_impact(
  y_pred,
  group,
  privileged = NULL,
  favorable = 1
)
y_pred |
Vector of decisions/assignments, one per individual. |
group |
Vector of protected-attribute values (e.g. race). |
privileged |
The reference group. If |
favorable |
The value of |
A morie_fairness_result; headline value is the
worst (smallest) ratio across groups.
A named list: value (worst ratio), ratios, rates,
privileged, adverse_impact, threshold, warnings,
interpretation.
pred <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)
race <- c(rep("A", 5), rep("B", 5))
res <- morie_fairness_disparate_impact(pred, race, privileged = "A")
res$value # 0.6 (group B rate 0.6 / group A rate 1.0)
res$adverse_impact # TRUE
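The ratio and the 0.8 check reduce to a per-group mean and a division. A minimal sketch follows; the function name `disparate_impact_sketch` is hypothetical and it returns only a subset of the fields listed above.

```r
# Hypothetical sketch of the disparate-impact ratio and the 0.8 threshold.
disparate_impact_sketch <- function(y_pred, group, privileged,
                                    favorable = 1, threshold = 0.8) {
  rates <- tapply(y_pred == favorable, group, mean)
  ratios <- rates[names(rates) != privileged] / rates[[privileged]]
  worst <- min(ratios)
  list(value = worst, ratios = ratios, adverse_impact = worst < threshold)
}

pred <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)
race <- c(rep("A", 5), rep("B", 5))
out <- disparate_impact_sketch(pred, race, privileged = "A")
out$value          # 0.6 / 1.0 = 0.6
out$adverse_impact # TRUE, since 0.6 < 0.8
```

If the privileged group's rate is zero the ratio is undefined, which is the case where the additive parity gap is the safer summary.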
Equalized odds holds when both the true-positive rate (TPR) and the false-positive rate (FPR) are equal across groups. It needs ground-truth labels, so it audits a system's errors rather than just its decision rates: a system can satisfy demographic parity yet produce many more false positives against one group.
morie_fairness_equalized_odds(
  y_true,
  y_pred,
  group,
  privileged = NULL,
  favorable = 1
)
y_true |
Vector of realised ground-truth outcomes. |
y_pred |
Vector of system decisions. |
group |
Vector of protected-attribute values. |
privileged |
The reference group (inferred if |
favorable |
The value treated as the positive class (default |
A morie_fairness_result; the headline value is the
largest absolute TPR-or-FPR gap. It is a named list with value
(largest absolute TPR/FPR gap), tpr_gaps, fpr_gaps, rates,
privileged, violation, warnings, and interpretation.
truth <- c(1, 0, 1, 0, 1, 0, 1, 0)
pred <- c(1, 0, 1, 0, 1, 1, 0, 1)
race <- c(rep("A", 4), rep("B", 4))
res <- morie_fairness_equalized_odds(truth, pred, race, privileged = "A")
res$violation # TRUE
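The gap computation can be sketched in base R as follows. `eo_gap_sketch` is a hypothetical helper, not the package's code; it assumes gaps are signed differences from the privileged group and that every group contains both favourable and unfavourable ground-truth cases.

```r
# Hypothetical helper: TPR and FPR per group, and the largest absolute
# gap relative to the reference (privileged) group.
eo_gap_sketch <- function(y_true, y_pred, group, privileged, favorable = 1) {
  pos <- y_true == favorable                   # ground-truth favourable cases
  hit <- y_pred == favorable                   # favourable decisions
  tpr <- tapply(hit[pos],  group[pos],  mean)  # P(decision favourable | truth favourable)
  fpr <- tapply(hit[!pos], group[!pos], mean)  # P(decision favourable | truth unfavourable)
  others   <- setdiff(names(tpr), privileged)
  tpr_gaps <- tpr[others] - tpr[[privileged]]
  fpr_gaps <- fpr[others] - fpr[[privileged]]
  list(value = max(abs(c(tpr_gaps, fpr_gaps))),
       tpr_gaps = tpr_gaps, fpr_gaps = fpr_gaps)
}

truth <- c(1, 0, 1, 0, 1, 0, 1, 0)
pred  <- c(1, 0, 1, 0, 1, 1, 0, 1)
race  <- c(rep("A", 4), rep("B", 4))
eo <- eo_gap_sketch(truth, pred, race, privileged = "A")
eo$value  # 1: group B has FPR 1 against FPR 0 for group A
```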
Look up a registered city profile by case-insensitive name
morie_fairness_get_city(name)
name |
Character. The profile name (case-insensitive). |
A morie_city_profile.
Ranges from 0 (perfect equality) to nearly 1 (one unit holds
everything). Applied to risk scores, stop counts, or patrol counts,
it measures how unequally a predictive system concentrates its
attention. When group is supplied, per-group Gini values are also
reported.
morie_fairness_gini(values, group = NULL)
values |
Vector of non-negative quantities. |
group |
Optional vector of protected-attribute values; enables the per-group breakdown. |
A morie_fairness_result; the headline value is the
overall Gini. It is a named list with value (overall Gini), gini,
per_group, warnings, and interpretation.
morie_fairness_gini(c(5, 5, 5, 5))$value   # 0
morie_fairness_gini(c(0, 0, 0, 100))$value # 0.75
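For reference, the mean-absolute-difference form of the Gini coefficient reproduces both values above. This is a minimal base-R sketch; `gini_sketch` is a hypothetical name, and whether the package uses this exact estimator (rather than a bias-corrected variant) is an assumption.

```r
# Hypothetical helper: Gini as the mean absolute pairwise difference
# divided by twice the mean. The maximum is (n - 1) / n, hence "nearly 1".
gini_sketch <- function(values) {
  n  <- length(values)
  mu <- mean(values)
  if (mu == 0) return(0)  # all-zero input: treat as perfect equality
  sum(abs(outer(values, values, "-"))) / (2 * n^2 * mu)
}

gini_sketch(c(5, 5, 5, 5))    # 0
gini_sketch(c(0, 0, 0, 100))  # 0.75
```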
List registered city profile names
morie_fairness_list_cities()
Sorted character vector of registered profile names.
For each crime location, computes 1 - (1 - p_detect)^k where
k is the number of officers within radius.
morie_fairness_noisy_or_detection(crime_xy, officer_xy, radius, p_detect = 0.85, seed = NULL)
crime_xy |
Numeric (n, 2) matrix of crime coordinates. |
officer_xy |
Numeric (m, 2) matrix of officer coordinates. |
radius |
Detection radius (positive). |
p_detect |
Per-officer detection probability in (0, 1]. |
seed |
Optional integer; when supplied, a Bernoulli outcome is
sampled per crime and returned in |
morie_fairness_result with $probabilities,
$officers_in_range, optional $detected.
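The Noisy-OR formula can be sketched in a few lines of base R. `noisy_or_sketch` is a hypothetical helper; Euclidean distance and an inclusive radius are assumptions, and the optional Bernoulli sampling step is omitted.

```r
# Hypothetical helper: count officers within `radius` of each crime and
# apply 1 - (1 - p_detect)^k.
noisy_or_sketch <- function(crime_xy, officer_xy, radius, p_detect = 0.85) {
  # Distance from every crime (rows) to every officer (columns)
  dx <- outer(crime_xy[, 1], officer_xy[, 1], "-")
  dy <- outer(crime_xy[, 2], officer_xy[, 2], "-")
  k  <- rowSums(sqrt(dx^2 + dy^2) <= radius)
  list(officers_in_range = k, probabilities = 1 - (1 - p_detect)^k)
}

crimes   <- rbind(c(0, 0), c(10, 10))
officers <- rbind(c(0, 1), c(1, 0), c(50, 50))
det <- noisy_or_sketch(crimes, officers, radius = 2)
det$probabilities  # 0.9775 (two officers in range) and 0 (none in range)
```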
Three callables that mirror the SciencesPo Predictive-Policing-Chicago analysis, city-agnostically:
morie_fairness_predpol_aggregate_areas: per-record to
per-area roll-up.
morie_fairness_predpol_calibration_audit: Spearman
correlation between predicted risk rank and realised outcome
rank, plus per-group rank-gap analysis.
morie_fairness_predpol_score_disparity: descriptive
score-by-group summary with one-way ANOVA.
Aggregate per-record predictive-policing data to per-area roll-ups
morie_fairness_predpol_aggregate_areas(area, risk, outcome, group = NULL, population = NULL)
area |
Area identifier per record. |
risk |
Predicted-risk score per record. |
outcome |
Realised-outcome indicator/count per record. |
group |
Optional protected attribute per record; the per-area majority becomes the area's label. |
population |
Optional named numeric vector mapping area to population, or a per-record vector (taken as constant within an area). When supplied, outcome rate is per 10,000 inhabitants. |
A list with areas, mean_risk,
outcome_rate, group, n_records.
Predicted-vs-realised rank audit by demographic group
morie_fairness_predpol_calibration_audit(areas, mean_risk, outcome_rate, group)
areas |
Area identifiers (per area, not per record). |
mean_risk |
Mean predicted-risk score per area. |
outcome_rate |
Realised-outcome rate per area. |
group |
Majority/dominant group per area. |
morie_fairness_result; $value is the
largest-magnitude per-group mean rank gap (positive = over-policed).
Descriptive score-by-group disparity
morie_fairness_predpol_score_disparity(score, group, reference = NULL)
score |
Continuous risk score per individual. |
group |
Protected attribute per individual. |
reference |
Optional reference group label (default: lowest-mean). |
morie_fairness_result; $value is the spread
(max - min) of per-group mean scores.
For every (city, period) cell the audit computes the
four disparity metrics, then aggregates per city - reporting the
mean of each metric, the count of periods with DIR > 1
(over-prediction periods), and the DIR temporal range (max - min)
which quantifies how unstable the metric is across the audited
window.
morie_fairness_predpol_temporal_audit(period, city, y_pred, group, privileged = NULL, favorable = 1)
period |
Time-period label per record (e.g. |
city |
City label per record. |
y_pred |
Decision / assignment per record. |
group |
Protected attribute per record. |
privileged |
Reference group. If |
favorable |
Value of |
The reference (privileged) group is inferred globally from the pooled data when not supplied, so every cell uses the same reference.
A morie_fairness_result; headline value is the
largest per-city DIR temporal range - the worst temporal
instability found in the audited window.
period <- c(rep("p1", 10), rep("p2", 10))
city <- rep("A", 20)
pred <- rep(c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0), 2)
grp <- rep(c(rep("X", 5), rep("Y", 5)), 2)
res <- morie_fairness_predpol_temporal_audit(period, city, pred, grp, privileged = "X")
res$payload$per_city$A$dir_range
Register a city profile in the process-local registry
morie_fairness_register_city(profile, overwrite = FALSE)
profile |
A |
overwrite |
If |
Invisibly returns the registered profile.
Generates per-record data with area, group,
true_outcome (group-independent Bernoulli at base_rate),
detected (group-dependent), and risk_score (0–500,
shifted up by bias * 100 points for non-reference groups).
The bias input is the ground truth the audits should recover.
morie_fairness_simulate_biased_crime_data(n = 2000L, groups = c("A", "B"), group_props = NULL, n_areas = 20L, base_rate = 0.3, bias = 0.5, seed = 0L)
n |
Number of records. |
groups |
Character vector of group labels (the first entry is treated as the reference group). |
group_props |
Optional sampling proportions. |
n_areas |
Number of areas (>= number of groups). |
base_rate |
Reference-group favourable-outcome rate in 0–1. |
bias |
Injected disparity in -1–1. |
seed |
Reproducibility seed. |
A data.frame with columns area, group, true_outcome, detected, risk_score.
Pure base-R ports of the Noisy-OR detection model and the biased
crime-data simulator from morie.fairness.simulation, both
originally distilled from Barman & Barman (arXiv:2603.18987).
No optional dependencies.
R ports of the explainer suite in morie.fairness.xai.
Prefers iml for permutation importance / PDP / SHAP-ish
attributions when available; otherwise computes the same quantities
in base R from first principles. Every callable takes a
predict_fn closure (matrix -> numeric vector) so it works on
any classifier or risk model.
morie_fairness_xai_permutation_importance
morie_fairness_xai_partial_dependence
morie_fairness_xai_ale
morie_fairness_xai_ceteris_paribus
morie_fairness_xai_shap_values
First-order Accumulated Local Effects (Apley & Zhu)
morie_fairness_xai_ale(predict_fn, X, feature, feature_names = NULL, n_bins = 10L)
predict_fn |
Function mapping an (n, d) matrix to n numeric predictions. |
X |
Numeric matrix or data.frame. |
feature |
Index or name of the feature to sweep. |
feature_names |
Optional character vector. |
n_bins |
Number of quantile bins. |
morie_fairness_result; $value is the ALE range.
Holds every feature of x fixed except feature, sweeps
it across the range observed in X_ref, and reports the
resulting prediction profile.
morie_fairness_xai_ceteris_paribus(predict_fn, x, feature, X_ref, feature_names = NULL, grid_size = 20L)
predict_fn |
Function mapping (n, d) matrix to n predictions. |
x |
Numeric vector of length d (the instance). |
feature |
Index or name of the feature to vary. |
X_ref |
Reference matrix used for the feature range. |
feature_names |
Optional character vector. |
grid_size |
Number of grid points. |
morie_fairness_result; $value is the
profile's swing (max - min).
Partial dependence on one feature (Friedman)
morie_fairness_xai_partial_dependence(predict_fn, X, feature, feature_names = NULL, grid_size = 20L)
predict_fn |
Function mapping an (n, d) matrix to n numeric predictions. |
X |
Numeric matrix or data.frame. |
feature |
Index or name of the feature to sweep. |
feature_names |
Optional character vector. |
grid_size |
Number of grid points. |
morie_fairness_result; $value is the PD range.
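A minimal base-R sketch of partial dependence: fix the chosen feature at each grid value for every row, predict, and average. `pd_sketch` is a hypothetical helper; the evenly spaced grid over the observed range is an assumption about how the grid is built.

```r
# Hypothetical helper: partial dependence of predict_fn on one feature.
pd_sketch <- function(predict_fn, X, feature, grid_size = 20L) {
  X <- as.matrix(X)
  grid <- seq(min(X[, feature]), max(X[, feature]), length.out = grid_size)
  pd <- vapply(grid, function(v) {
    Xg <- X
    Xg[, feature] <- v     # set the feature to v for all rows
    mean(predict_fn(Xg))   # average prediction over the data
  }, numeric(1))
  list(grid = grid, pd = pd, value = max(pd) - min(pd))
}

set.seed(1)
X <- cbind(x1 = seq(0, 1, length.out = 11), x2 = rnorm(11))
f <- function(M) 3 * M[, 1] + M[, 2]  # toy linear risk model
pd_sketch(f, X, feature = "x1")$value # 3: slope 3 over a range of width 1
```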
Permutation feature importance (model-agnostic)
morie_fairness_xai_permutation_importance(predict_fn, X, feature_names = NULL, n_repeats = 10L, protected = NULL, seed = 0L)
predict_fn |
Function mapping an (n, d) matrix to n numeric predictions. |
X |
Numeric matrix or data.frame. |
feature_names |
Optional character vector. |
n_repeats |
Shuffles averaged per feature. |
protected |
Character vector of protected-attribute names; any that rank in the top third trigger a bias warning. |
seed |
Reproducibility seed. |
morie_fairness_result; $value is the largest
importance.
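Because the signature takes no outcome vector, one plausible label-free reading of permutation importance is the mean absolute change in predictions after shuffling a column. The sketch below implements that reading in base R; `perm_importance_sketch` is a hypothetical helper and the importance measure itself is an assumption, not the package's documented definition.

```r
# Hypothetical helper: importance of column j = mean absolute change in
# predictions when column j is shuffled, averaged over n_repeats shuffles.
perm_importance_sketch <- function(predict_fn, X, n_repeats = 10L, seed = 0L) {
  set.seed(seed)
  X <- as.matrix(X)
  base <- predict_fn(X)
  imp <- vapply(seq_len(ncol(X)), function(j) {
    mean(replicate(n_repeats, {
      Xp <- X
      Xp[, j] <- sample(Xp[, j])  # break the link between column j and the rest
      mean(abs(predict_fn(Xp) - base))
    }))
  }, numeric(1))
  names(imp) <- colnames(X)
  imp
}

f <- function(M) 2 * M[, 1]     # toy model that ignores column "b"
X <- cbind(a = 1:20, b = 20:1)
imp <- perm_importance_sketch(f, X)
imp  # "b" has zero importance because the model never reads it
```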
Shapley feature attributions for one instance (sampling estimator)
morie_fairness_xai_shap_values(predict_fn, x, background, feature_names = NULL, n_samples = 200L, seed = 0L)
predict_fn |
Function mapping (n, d) matrix to n predictions. |
x |
Numeric vector of length d (the instance). |
background |
Reference matrix (n_bg, d) for marginal averaging. |
feature_names |
Optional character vector. |
n_samples |
Number of random permutations averaged. |
seed |
Reproducibility seed. |
morie_fairness_result; $value is the
largest-magnitude SHAP value.
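A permutation-sampling Shapley estimator can be sketched as follows: draw a random feature order and a random background row, switch features to the instance's values one at a time, and credit each feature with the resulting change in prediction. `shap_sampling_sketch` is a hypothetical helper written under those assumptions; it is not taken from the package source.

```r
# Hypothetical helper: Monte Carlo Shapley attributions for one instance x.
shap_sampling_sketch <- function(predict_fn, x, background,
                                 n_samples = 200L, seed = 0L) {
  set.seed(seed)
  d <- length(x)
  phi <- numeric(d)
  for (s in seq_len(n_samples)) {
    perm <- sample(d)                               # random feature order
    z <- background[sample(nrow(background), 1), ]  # random background row
    prev <- predict_fn(matrix(z, nrow = 1))
    for (j in perm) {
      z[j] <- x[j]                                  # switch feature j to the instance value
      cur <- predict_fn(matrix(z, nrow = 1))
      phi[j] <- phi[j] + (cur - prev)               # marginal contribution of j
      prev <- cur
    }
  }
  phi / n_samples
}

f  <- function(M) 2 * M[, 1] + 3 * M[, 2]  # toy linear model
bg <- matrix(0, nrow = 5, ncol = 2)        # all-zero background
phi <- shap_sampling_sketch(f, x = c(1, 2), background = bg)
phi  # c(2, 6): for a linear model, weight times (x - background)
```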
Mirrors morie.fast.is_jit_available() on the Python side.
Returns TRUE when the Rcpp .so was built and loaded; FALSE when
falling back to base-R implementations.
morie_fast_available()
A logical scalar: TRUE when the compiled Rcpp backend was
built and loaded, FALSE when falling back to base-R kernels.
morie_fast_available()
Thin extender over fdrtool::fdrtool for the
Strimmer (2008) shrinkage estimator of local and tail-area
false-discovery rates, q-values, and the underlying
null-distribution scale parameter.
morie_fdr_qvalues(x, statistic = "normal", ...)
x |
Numeric vector of test statistics or p-values, as
appropriate to |
statistic |
Character; the type of statistic in |
... |
Further arguments forwarded to
|
A list with $method = "fdrtool::fdrtool" and
$raw (the fdrtool return list with pval,
qval, lfdr, and param).
## Not run:
if (requireNamespace("fdrtool", quietly = TRUE)) {
  set.seed(1)
  x <- c(stats::rnorm(900), stats::rnorm(100, mean = 3))
  morie_fdr_qvalues(x, statistic = "normal")
}
## End(Not run)
A universal data-access entry point. Given a URL, MORIE detects the
format from the HTTP Content-Type header (falling back to the
URL extension), downloads the resource, and parses it into an R
object. The behaviour is automatic by default but every step is
controllable: pass an explicit format, extra query
params, a zip_member to extract, or reader arguments
via ....
morie_fetch(url, format = c("auto", "csv", "tsv", "json", "xml", "html", "xlsx", "zip", "arcgis"), params = NULL, zip_member = "", simplify = TRUE, ...)
url |
The resource URL. |
format |
One of |
params |
Optional named list appended to |
zip_member |
For |
simplify |
For |
... |
Passed to the underlying reader (e.g. |
Supported formats: csv, tsv, json, xml,
html, xlsx, zip (extract one member), and
arcgis (delegates to morie_fetch_arcgis).
A data.frame for tabular formats; a list or document object
for non-tabular json/xml/html.
morie_ckan_search, morie_fetch_arcgis
## Not run:
# Examples use placeholder URLs (example.org). Replace with a
# real CSV / JSON endpoint when running.
df <- morie_fetch("https://example.org/data.csv")
js <- morie_fetch("https://api.example.org/records",
  format = "json",
  params = list(limit = 100)
)
## End(Not run)
Pulls attribute records from an ArcGIS REST layer, paginating through
the server transfer limit automatically (ArcGIS caps a single query
at maxRecordCount features, typically 1000-2000).
morie_fetch_arcgis(layer_url, where = "1=1", out_fields = "*", params = NULL, page_size = 2000L, max_records = Inf)
layer_url |
The layer URL, ending in |
where |
SQL-style WHERE filter (default |
out_fields |
Comma-separated field list (default |
params |
Optional named list of extra query parameters. |
page_size |
Records requested per page (default 2000). |
max_records |
Cap on the total number of records (default
|
A data.frame of feature attributes (geometry is dropped).
## Not run:
layer <- paste0(
  "https://services.arcgis.com/ORG/arcgis/rest/",
  "services/Assault/FeatureServer/0"
)
df <- morie_fetch_arcgis(layer)
## End(Not run)
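The pagination logic can be illustrated offline with a stand-in for the HTTP call. In this sketch `paginate_sketch`, `fake_fetch`, and `fake_layer` are all hypothetical; a real client would translate `offset` and `count` into the ArcGIS `resultOffset` and `resultRecordCount` query parameters, which is an assumption about how the package pages.

```r
# Hypothetical helper: request pages until a short or empty page arrives
# or max_records is reached.
paginate_sketch <- function(fetch_page, page_size = 2000L, max_records = Inf) {
  pages <- list()
  offset <- 0L
  total <- 0L
  repeat {
    want <- min(page_size, max_records - total)
    if (want <= 0) break
    page <- fetch_page(offset = offset, count = want)
    if (nrow(page) == 0L) break
    pages[[length(pages) + 1L]] <- page
    total  <- total + nrow(page)
    offset <- offset + nrow(page)
    if (nrow(page) < want) break  # short page: the server has no more rows
  }
  if (length(pages) == 0L) data.frame() else do.call(rbind, pages)
}

# Stand-in for a remote layer holding 4500 features
fake_layer <- data.frame(id = 1:4500)
fake_fetch <- function(offset, count) {
  idx <- seq_len(nrow(fake_layer))
  fake_layer[idx > offset & idx <= offset + count, , drop = FALSE]
}

nrow(paginate_sketch(fake_fetch, page_size = 2000L))  # 4500, fetched in three pages
```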
Fetch data from the CKAN API and cache it
morie_fetch_ckan(dataset_key = "cpads", limit = Inf, db_path = NULL, resource_id = NULL, con = NULL)
dataset_key |
One of |
limit |
Maximum records to fetch. The CKAN datastore caps a
single request at 32000 rows, so larger resources are paged through
with |
db_path |
Optional override for the database path. |
resource_id |
Optional CKAN datastore resource id. When supplied
(e.g. from |
con |
Optional pre-opened DBI connection (overrides |
A data.frame.
## Not run:
# Requires network access. Fetches the first 5000 rows of the
# Canadian Postsecondary Alcohol and Drug Use Survey from the
# Government of Canada CKAN datastore:
cpads <- morie_fetch_ckan(dataset_key = "cpads", limit = 5000L)
nrow(cpads)
## End(Not run)
Fetches and parses the Ontario Special Investigations Unit (police-oversight) corpus – every director's report and the news releases they link – into a single CSV with the canonical 64-column schema, one row per case.
morie_fetch_siu(cache_dir = file.path(tempdir(), "morie", "siu"), overwrite = FALSE, max_drid = NULL, concurrency = 4L, rate_rps = 4, use_manifest = TRUE, lang = c("all", "en", "fr"), cache_html = FALSE, progress = TRUE)
cache_dir |
Output directory. Defaults to a session-scoped
subdirectory of |
overwrite |
Logical; if |
max_drid |
Highest director's-report id to fetch. |
concurrency |
Maximum simultaneous HTTP transfers. Default
|
rate_rps |
Maximum request starts per second across the pool
(token-bucket throttle). Default |
use_manifest |
If |
lang |
Language filter on the manifest. |
cache_html |
If |
progress |
Logical; print progress messages. |
The parser is implemented entirely in C/C++ (src/siu_parser.cpp):
libcurl drives the HTTP transport and a concurrent curl_multi
pool fetches the ~9,000+ pages, while the 64-field extraction is C++
std::regex parsing. There is no Python dependency.
This is the Ontario Special Investigations Unit – distinct from the federal Structured Intervention Units and from OTIS. The parsed corpus is not shipped with the package; each user runs the parser themselves, which is fair use of public oversight reports.
Path to the written SIU.csv.
## Not run:
# Network: parses the full Ontario SIU corpus (~15-25 min at the
# default polite rate of 4 RPS).
csv <- morie_fetch_siu(cache_dir = tempdir())
siu <- utils::read.csv(csv)
nrow(siu)
## End(Not run)
Pages through the ArcGIS /query endpoint and writes a tidy CSV to
the morie cache directory. Falls back to a cached file on subsequent
calls unless overwrite = TRUE.
morie_fetch_tps(category, cache_dir = file.path(tempdir(), "morie", "tps"), where = "1=1", overwrite = FALSE, max_per_page = 2000L)
category |
One of |
cache_dir |
Directory for the CSV. Defaults to a
session-scoped subdirectory of |
where |
ArcGIS SQL where clause (default |
overwrite |
Logical; if |
max_per_page |
ArcGIS page size (default |
Path to the CSV.
## Not run:
# Network: fetches major-crime indicators from the Toronto Police
# ArcGIS open-data layer.
csv <- morie_fetch_tps(
  category = "Assault",
  cache_dir = tempdir(),
  where = "OCC_YEAR = 2024"
)
tps <- utils::read.csv(csv)
nrow(tps)
## End(Not run)
Fisher's exact test for 2x2 tables
morie_fisher_exact_test(table_2x2, alternative = c("two.sided", "greater", "less"))
table_2x2 |
A 2x2 matrix or data frame of counts. |
alternative |
|
Named list: odds_ratio, ci, p_value.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
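The same three quantities can be obtained from base R with `stats::fisher.test`, shown here on a made-up 2x2 table. Whether `morie_fisher_exact_test` delegates to `stats::fisher.test` internally is an assumption; the list below only mirrors the documented return names.

```r
# Illustrative 2x2 table of counts (made-up data)
tab <- matrix(c(8, 2,
                1, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

ft <- stats::fisher.test(tab, alternative = "two.sided")

# Repackage under the names listed in the return value above
fisher_out <- list(
  odds_ratio = unname(ft$estimate),  # conditional maximum-likelihood odds ratio
  ci         = as.numeric(ft$conf.int),
  p_value    = ft$p.value
)
fisher_out
```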
morie_garch_fit(x)
x |
Numeric return series. |
Named list with omega, alpha, beta, persistence, loglik,
conditional_variance, n, method.
morie_garch_fit(x = rnorm(50))
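The returned names (omega, alpha, beta, persistence) suggest a GARCH(1,1) model; that reading is an assumption, as the entry carries no description. Under it, the conditional-variance recursion and Gaussian log-likelihood for fixed parameters look like the sketch below. `garch11_variance_sketch` is a hypothetical helper, the sample-variance initialisation is an assumption, and no parameter estimation is attempted.

```r
# Hypothetical helper: GARCH(1,1) recursion
#   sigma2[t] = omega + alpha * x[t-1]^2 + beta * sigma2[t-1]
garch11_variance_sketch <- function(x, omega, alpha, beta) {
  n  <- length(x)
  s2 <- numeric(n)
  s2[1] <- stats::var(x)  # assumed initial variance
  for (t in 2:n) {
    s2[t] <- omega + alpha * x[t - 1]^2 + beta * s2[t - 1]
  }
  loglik <- -0.5 * sum(log(2 * pi) + log(s2) + x^2 / s2)  # Gaussian log-likelihood
  list(conditional_variance = s2,
       persistence = alpha + beta,
       loglik = loglik)
}

set.seed(1)
g <- garch11_variance_sketch(rnorm(50), omega = 0.1, alpha = 0.1, beta = 0.8)
g$persistence  # 0.9
```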
Solves Henderson's MME with VanRaden G.
morie_gblup_full(x, y, markers, lambda_gblup = NULL)
x |
Fixed-effect design (vector or matrix). |
y |
Numeric response. |
markers |
Genotype matrix (n x m). |
lambda_gblup |
Optional ratio sigma_e^2 / sigma_g^2 (default h^2=0.5). |
Named list (estimate, g_hat, beta, se, lambda_gblup, n, method).
Montesinos Lopez Ch 3.
morie_gblup_full(x = rnorm(50), y = rnorm(50), markers = matrix(sample(0:2, 200, TRUE), 50, 4))
Generate a stationarity-preserving AR coefficient matrix
morie_generate_ar_coefficients(p, rng, spectral_radius = 0.8, diagonal_bias = 0.4)
p |
Dimension (number of variables). |
rng |
An environment from |
spectral_radius |
Target spectral radius < 1. |
diagonal_bias |
Mixture weight between diagonal autoregression (1) and full off-diagonal coupling (0). |
A p x p numeric matrix A.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
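One way to obtain a stationary AR(1) coefficient matrix is to draw a random matrix and rescale it so that its largest eigenvalue modulus equals the target spectral radius. The sketch below does this in base R; `ar_matrix_sketch` is a hypothetical helper, it uses the global RNG rather than the package's `rng` object, and the way `diagonal_bias` mixes the diagonal and full matrices is an assumption.

```r
# Hypothetical helper: random p x p matrix rescaled to a target spectral radius.
ar_matrix_sketch <- function(p, spectral_radius = 0.8, diagonal_bias = 0.4) {
  full <- matrix(stats::rnorm(p * p), p, p)
  # Assumed mixture: weight on the diagonal part versus the full matrix
  A <- diagonal_bias * diag(diag(full), p) + (1 - diagonal_bias) * full
  rho <- max(Mod(eigen(A, only.values = TRUE)$values))  # current spectral radius
  A * (spectral_radius / rho)                           # eigenvalues scale linearly
}

set.seed(1)
A <- ar_matrix_sketch(4)
max(Mod(eigen(A, only.values = TRUE)$values))  # 0.8 up to floating-point error
```

A spectral radius below 1 is what keeps the process x_t = A x_{t-1} + e_t stationary.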
Generates non-identifying synthetic data suitable for development, testing,
and demos. The generator uses a canonical variable set and allows output
column renaming through name_map so it can be adapted to multiple studies.
Synthetic data should not be used for final inferential reporting.
morie_generate_synthetic_data(n = 5000L, seed = 42L, special_code_rate = 0.02, profile = c("generic", "morie_legacy"), name_map = NULL)
n |
Number of rows. |
seed |
Random seed for reproducibility. |
special_code_rate |
Proportion of values replaced with survey-style
special missing codes ( |
profile |
Convenience profile for output naming; ignored when
|
name_map |
Optional named character vector mapping canonical keys to
output column names. Use |
A data.frame with synthetic records.
df <- morie_generate_synthetic_data(n = 200, seed = 1)
nrow(df)
Generate a VAR(L) coefficient array as a 3-d list
morie_generate_var_coefficients(p, lags, rng, spectral_radius = 0.8, decay = 0.6)
p |
Number of variables. |
lags |
Number of lag matrices. |
rng |
|
spectral_radius |
Per-lag target spectral radius. |
decay |
Geometric decay rate of spectral radius across lags. |
A list of length lags, each a p x p matrix.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
K-fold cross-validation for genomic-prediction accuracy
morie_genomic_cross_validation(x, y, K = 5, lam = 1, seed = 0)
x |
(n x p) predictor matrix. |
y |
Numeric response. |
K |
Number of folds. |
lam |
Ridge penalty within each fold. |
seed |
Seed. |
list(estimate, r_per_fold, y_hat, mse, mspe, slope, n, K, method).
Montesinos Lopez Ch 2.
morie_genomic_cross_validation(x = rnorm(50), y = rnorm(50))
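The procedure can be sketched as K-fold cross-validation of a ridge regression, with predictive accuracy measured as the correlation between observed and predicted values in each held-out fold. `ridge_cv_sketch` is a hypothetical helper; the unpenalised intercept and the random fold assignment are assumptions rather than documented behaviour.

```r
# Hypothetical helper: K-fold CV of ridge regression with penalty lam.
ridge_cv_sketch <- function(X, y, K = 5, lam = 1, seed = 0) {
  set.seed(seed)
  X <- as.matrix(X)
  fold  <- sample(rep_len(seq_len(K), length(y)))  # random fold labels
  y_hat <- numeric(length(y))
  for (k in seq_len(K)) {
    tr  <- fold != k
    Xc  <- cbind(1, X[tr, , drop = FALSE])
    pen <- diag(c(0, rep(lam, ncol(X))), ncol(Xc))  # no penalty on the intercept
    b   <- solve(crossprod(Xc) + pen, crossprod(Xc, y[tr]))
    y_hat[!tr] <- cbind(1, X[!tr, , drop = FALSE]) %*% b
  }
  r <- vapply(seq_len(K),
              function(k) stats::cor(y[fold == k], y_hat[fold == k]),
              numeric(1))
  list(estimate = mean(r), r_per_fold = r, y_hat = y_hat,
       mspe = mean((y - y_hat)^2))
}

set.seed(1)
X <- matrix(rnorm(200 * 3), 200, 3)
y <- as.vector(X %*% c(1, 2, 3)) + rnorm(200, sd = 0.1)
cv <- ridge_cv_sketch(X, y)
cv$estimate  # mean held-out correlation across the 5 folds
```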
Thin extender over gstat::krige for simple, ordinary or
universal kriging given a fitted variogram model.
morie_geostat_krige(formula, data, newdata, model, ...)
formula |
A formula describing the response and trend terms
(e.g. |
data |
Spatial object with the observed locations and response. |
newdata |
Spatial object with the prediction locations. |
model |
A fitted variogram model (e.g. from
|
... |
Further arguments forwarded to |
A list with $method = "gstat::krige" and
$raw (a spatial object with kriging predictions and
variances).
## Not run:
if (requireNamespace("gstat", quietly = TRUE) &&
    requireNamespace("sp", quietly = TRUE)) {
  data(meuse, package = "sp")
  data(meuse.grid, package = "sp")
  sp::coordinates(meuse) <- ~ x + y
  sp::coordinates(meuse.grid) <- ~ x + y
  vg <- gstat::variogram(log(zinc) ~ 1, data = meuse)
  mod <- gstat::fit.variogram(vg, gstat::vgm(1, "Sph", 900, 1))
  morie_geostat_krige(log(zinc) ~ 1, meuse, meuse.grid, mod)
}
## End(Not run)
Thin extender over gstat::variogram that computes the
sample (semi-)variogram of a spatially-indexed response for use
in kriging and other geostatistical workflows.
morie_geostat_variogram(formula, data, ...)
formula |
A formula describing the response and any trend
terms (e.g. |
data |
A spatial object (e.g. |
... |
Further arguments forwarded to
|
A list with $method = "gstat::variogram" and
$raw (a gstatVariogram data frame with the
binned distances and semivariance estimates).
## Not run:
if (requireNamespace("gstat", quietly = TRUE) &&
    requireNamespace("sp", quietly = TRUE)) {
  data(meuse, package = "sp")
  sp::coordinates(meuse) <- ~ x + y
  morie_geostat_variogram(log(zinc) ~ 1, data = meuse)
}
## End(Not run)
Adaptive contraction rates over a smoothness grid.
morie_ghosal_adaptation(x, betas = NULL, d = 1)
x |
Numeric data vector (used only for sample-size n). |
betas |
Numeric vector of smoothness exponents (default seq(0.5, 3, length.out = 11)). |
d |
Integer dimension (default 1). |
Named list with estimate, betas, rates, best_beta, n, d, method.
morie_ghosal_adaptation(x = rnorm(50))
BvM diagnostic for the mean functional under a DP prior.
morie_ghosal_bernstein_von_mises(x, theta0 = NULL, B = 500, seed = 0, deterministic_seed = NULL)
x |
Numeric data vector. |
theta0 |
Optional null value for the mean functional. |
B |
Integer number of bootstrap draws (default 500). |
seed |
Integer RNG seed (default 0). |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
Named list with estimate, se, theta_hat, z_ks_stat, z_ks_pvalue, wald, wald_pvalue, n, B, method.
morie_ghosal_bernstein_von_mises(x = rnorm(50))
Returns eps_n = n raised to the power of -beta/(2*beta+d).
morie_ghosal_contraction_rate(x, beta = 1, d = 1)
x |
Numeric data vector (used only for sample-size n). |
beta |
Numeric smoothness exponent (default 1.0). |
d |
Integer dimension (default 1). |
Named list with estimate, log_rate_correction, parametric_rate, n, beta, d, method.
morie_ghosal_contraction_rate(x = rnorm(50))
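The rate formula is a one-liner in base R. `contraction_rate_sketch` is a hypothetical helper that takes the sample size directly; the log-rate correction and parametric rate listed in the return value are not reproduced here.

```r
# Hypothetical helper: eps_n = n^(-beta / (2 * beta + d))
contraction_rate_sketch <- function(n, beta = 1, d = 1) {
  n^(-beta / (2 * beta + d))
}

contraction_rate_sketch(1000)            # 1000^(-1/3) = 0.1
contraction_rate_sketch(1000, beta = 3)  # smoother truth, faster rate: 1000^(-3/7)
```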
Posterior of G given X_1, ..., X_n for G ~ DP(alpha, G0) with
G0 = N(base_mean, base_sd^2). Returns the posterior-mean CDF
evaluated on a grid plus the headline estimate at mean(x).
morie_ghosal_dirichlet_posterior(x, alpha = 1, base_mean = 0, base_sd = 1, grid = NULL)
x |
numeric vector. |
alpha |
concentration. |
base_mean, base_sd
|
base measure (N). |
grid |
optional grid (default: 51 pts spanning x). |
named list with estimate, cdf_grid, cdf_post,
cdf_var, alpha_post, n, method.
morie_ghosal_dirichlet_posterior(x = rnorm(50))
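By conjugacy the posterior is again a Dirichlet process, with concentration alpha + n and a base measure that mixes G0 with the empirical distribution, so the posterior-mean CDF is a weighted average of the two. The sketch below computes it in base R; `dp_posterior_cdf_sketch` is a hypothetical helper, and the pointwise variance uses the Beta marginal of the posterior, which may or may not match how the package reports `cdf_var`.

```r
# Hypothetical helper: posterior-mean CDF under DP(alpha, N(base_mean, base_sd^2)).
dp_posterior_cdf_sketch <- function(x, alpha = 1, base_mean = 0, base_sd = 1,
                                    grid = seq(min(x), max(x), length.out = 51)) {
  n <- length(x)
  prior_cdf <- stats::pnorm(grid, base_mean, base_sd)  # G0(t)
  emp_cdf   <- stats::ecdf(x)(grid)                    # empirical CDF F_n(t)
  post <- (alpha * prior_cdf + n * emp_cdf) / (alpha + n)
  list(cdf_grid = grid,
       cdf_post = post,
       cdf_var  = post * (1 - post) / (alpha + n + 1),  # Beta marginal variance
       alpha_post = alpha + n)
}

dp <- dp_posterior_cdf_sketch(c(-1, 0, 1), alpha = 1, grid = 0)
dp$cdf_post  # (1 * 0.5 + 3 * 2/3) / 4 = 0.625
```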
DP mixture density estimate (Neal 2000 algorithm 3)
morie_ghosal_dpmixture_density(x, alpha = 1, sigma = NULL, grid = NULL, n_iter = 120, burn = 40, seed = 0, deterministic_seed = NULL)
x |
numeric vector |
alpha, sigma
|
DP and within-cluster sd (sigma defaults to Silverman bw) |
grid |
evaluation grid |
n_iter, burn, seed
|
Gibbs settings |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
named list with estimate, grid, density, k_post, n
morie_ghosal_dpmixture_density(x = rnorm(50))
Empirical-Bayes alpha MLE for a DP, given the observed K_n.
morie_ghosal_empirical_bayes(x, alpha_grid = NULL)
x |
Numeric data vector. |
alpha_grid |
Optional numeric grid of alpha values to maximise over. |
Named list with estimate (alpha-hat), K_n, log_lik_at_estimate, n, method.
morie_ghosal_empirical_bayes(x = rnorm(50))
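Given the number of distinct clusters K_n among n observations, the DP likelihood in alpha is proportional to alpha^K_n * Gamma(alpha) / Gamma(alpha + n), which can be maximised over a grid. The sketch below takes K_n and n directly; `eb_alpha_sketch` and its default grid are hypothetical, and how the package derives K_n from `x` is not shown in this entry.

```r
# Hypothetical helper: grid maximiser of the DP likelihood in alpha given K_n.
eb_alpha_sketch <- function(K_n, n, alpha_grid = seq(0.01, 10, by = 0.01)) {
  # Terms that do not depend on alpha are dropped from the log-likelihood
  loglik <- K_n * log(alpha_grid) + lgamma(alpha_grid) - lgamma(alpha_grid + n)
  best <- which.max(loglik)
  list(estimate = alpha_grid[best], log_lik_at_estimate = loglik[best])
}

eb <- eb_alpha_sketch(K_n = 5, n = 50)
eb$estimate
# At the maximiser the expected cluster count matches K_n (approximately, on a grid)
sum(eb$estimate / (eb$estimate + 0:49))
```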
GP posterior mean with Matern kernel.
morie_ghosal_gp_matern(x, y, nu = 1.5, length_scale = NULL, sigma_f = 1, noise = NULL, x_star = NULL)
x |
Numeric vector or matrix of input points. |
y |
Numeric response vector. |
nu |
Matern smoothness parameter (default 1.5). |
length_scale |
Optional kernel length-scale. |
sigma_f |
Numeric signal sd (default 1). |
noise |
Optional observation noise sd. |
x_star |
Optional matrix of prediction points (defaults to x). |
Named list with estimate, se, mu, sd, length_scale, nu, noise, n, method.
morie_ghosal_gp_matern(x = rnorm(50), y = rnorm(50))
GP posterior mean with squared-exponential kernel.
morie_ghosal_gp_squared_exponential(x, y, length_scale = NULL, sigma_f = 1, noise = NULL, x_star = NULL)
x |
Numeric vector or matrix of input points. |
y |
Numeric response vector. |
length_scale |
Optional kernel length-scale. |
sigma_f |
Numeric signal sd (default 1). |
noise |
Optional observation noise sd. |
x_star |
Optional matrix of prediction points (defaults to x). |
Named list with estimate, se, mu, sd, length_scale, noise, n, method.
morie_ghosal_gp_squared_exponential(x = rnorm(50), y = rnorm(50))
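For readers who want to see what a GP posterior mean with a squared-exponential kernel involves, here is a minimal base-R sketch for one-dimensional inputs. It is not the rmorie implementation. `se_kernel` and `gp_se_posterior` are hypothetical names, and the fixed defaults for `length_scale` and `noise` are assumptions (the package chooses them when they are NULL, by a rule not documented here).

```r
# Squared-exponential kernel between two numeric vectors (1-D inputs).
se_kernel <- function(a, b, length_scale = 1, sigma_f = 1) {
  d2 <- outer(a, b, function(u, v) (u - v)^2)
  sigma_f^2 * exp(-0.5 * d2 / length_scale^2)
}

# GP posterior mean and sd at x_star, given noisy observations (x, y).
gp_se_posterior <- function(x, y, x_star = x,
                            length_scale = 1, sigma_f = 1, noise = 0.1) {
  K   <- se_kernel(x, x, length_scale, sigma_f) + diag(noise^2, length(x))
  Ks  <- se_kernel(x_star, x, length_scale, sigma_f)
  Kss <- se_kernel(x_star, x_star, length_scale, sigma_f)
  R   <- chol(K)                                  # K = t(R) %*% R, R upper-triangular
  alpha <- backsolve(R, forwardsolve(t(R), y))    # solves K %*% alpha = y
  mu  <- as.vector(Ks %*% alpha)
  V   <- forwardsolve(t(R), t(Ks))
  sd  <- sqrt(pmax(diag(Kss) - colSums(V^2), 0))  # clip tiny negative round-off
  list(mu = mu, sd = sd)
}

set.seed(1)
x <- seq(0, 5, length.out = 30)
y <- sin(x) + rnorm(30, sd = 0.1)
fit <- gp_se_posterior(x, y)
```

The Cholesky factor is reused for both the mean and the variance, which avoids forming an explicit inverse of the kernel matrix.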
Escobar-West augmentation for alpha given K_n with a Gamma(a, b) hyperprior.
morie_ghosal_hierarchical_bayes(
  x, a_prior = 1, b_prior = 1, M = 400, seed = 0, deterministic_seed = NULL
)
x |
Numeric data vector. |
a_prior |
Gamma shape hyperparameter (default 1). |
b_prior |
Gamma rate hyperparameter (default 1). |
M |
Integer number of MCMC iterations (default 400). |
seed |
Integer RNG seed (default 0). |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
Named list with estimate (alpha post mean), alpha_se, alpha_draws, K_n, n, method.
morie_ghosal_hierarchical_bayes(x = rnorm(50))
Log-spline density estimator (Stone 1990, Ghosal Ch 8).
morie_ghosal_log_density(x, K = 5, grid = NULL)
x |
Numeric data vector. |
K |
Integer polynomial degree (default 5). |
grid |
Optional numeric evaluation grid. |
Named list with estimate, theta, log_lik, grid, log_density, K, n, method.
morie_ghosal_log_density(x = rnorm(50))
Posterior mean / variance of G(A) for DP(alpha, G0) and A = (A_lower, A_upper].
morie_ghosal_moment_matching(
  x, alpha = 1, A_lower = NULL, A_upper = NULL, base_mean = 0, base_sd = 1
)
x |
Numeric data vector. |
alpha |
DP concentration parameter (default 1). |
A_lower |
Optional numeric lower bound of set A (default -Inf). |
A_upper |
Optional numeric upper bound of set A (default mean(x)). |
base_mean |
Numeric base-measure mean (default 0). |
base_sd |
Numeric base-measure sd (default 1). |
Named list with estimate, se, prior_mean, prior_var, n_A, n, alpha, method.
morie_ghosal_moment_matching(x = rnorm(50))
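The posterior moments of G(A) follow from the conjugacy of the Dirichlet process: given n observations of which n_A fall in A, G(A) has a Beta(alpha * G0(A) + n_A, alpha * (1 - G0(A)) + n - n_A) posterior. The sketch below works this out in base R. It assumes a normal base measure G0 = N(base_mean, base_sd^2), which the parameter names suggest but the text does not state, and `dp_set_moments` is a hypothetical name rather than a package function.

```r
# Posterior mean and sd of G(A) for A = (A_lower, A_upper] under DP(alpha, G0),
# with G0 assumed to be N(base_mean, base_sd^2).
dp_set_moments <- function(x, alpha = 1, A_lower = -Inf, A_upper = mean(x),
                           base_mean = 0, base_sd = 1) {
  n    <- length(x)
  n_A  <- sum(x > A_lower & x <= A_upper)
  g0_A <- pnorm(A_upper, base_mean, base_sd) - pnorm(A_lower, base_mean, base_sd)
  a <- alpha * g0_A + n_A              # Beta shape 1
  b <- alpha * (1 - g0_A) + (n - n_A)  # Beta shape 2
  post_mean <- a / (a + b)
  post_var  <- post_mean * (1 - post_mean) / (a + b + 1)
  list(estimate = post_mean, se = sqrt(post_var), n_A = n_A, n = n)
}

res <- dp_set_moments(c(-1, 0, 1, 2), alpha = 1)
```

Here A_upper defaults to mean(x) = 0.5, so two of the four points fall in A and the posterior mean is (pnorm(0.5) + 2) / 5.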
Neutral-to-the-right posterior survival (Doksum 1974).
morie_ghosal_neutral_right(time, event = NULL, c = 1, lam0 = NULL)
time |
Numeric vector of observed times. |
event |
Optional integer/logical event indicator (1 = event, 0 = censored). |
c |
Numeric prior concentration (default 1). |
lam0 |
Optional baseline hazard rate. |
Named list with estimate, times, S_post, H_post, c, lam0, n, method.
morie_ghosal_neutral_right(time = cumsum(rexp(50)))
Probit-GP classifier (Laplace approximation).
morie_ghosal_np_classification(
  x, y, length_scale = NULL, sigma_f = 1, n_iter = 300, seed = 0
)
x |
Numeric matrix of features. |
y |
Numeric binary labels (0/1). |
length_scale |
Optional kernel length-scale. |
sigma_f |
Numeric signal sd (default 1). |
n_iter |
Integer maximum Laplace iterations (default 300). |
seed |
Integer RNG seed (default 0). |
Named list with estimate, p_hat, accuracy, length_scale, n, method.
morie_ghosal_np_classification(x = rnorm(50), y = rbinom(50, 1, 0.5))
Wraps morie_ghosal_gp_squared_exponential.
morie_ghosal_np_regression(x, y, length_scale = NULL, sigma_f = 1, noise = NULL)
x |
Numeric vector or matrix of input points. |
y |
Numeric response vector. |
length_scale |
Optional kernel length-scale. |
sigma_f |
Numeric signal sd (default 1). |
noise |
Optional observation noise sd. |
Named list with estimate, se, mu, sd, ci_lower, ci_upper, r2, log_marginal, length_scale, noise, n, method.
morie_ghosal_np_regression(x = rnorm(50), y = rnorm(50))
Polya-tree Bayes factor for H0: F = N(loc, scale^2).
morie_ghosal_np_testing(x, ref_loc = 0, ref_scale = 1, depth = 6, c = 1)
x |
Numeric data vector. |
ref_loc |
Numeric reference location (default 0). |
ref_scale |
Numeric reference scale (default 1). |
depth |
Integer Polya-tree depth (default 6). |
c |
Numeric Polya-tree concentration (default 1). |
Named list with statistic (log BF), p_value, BF10, log_BF10, n, depth, method.
morie_ghosal_np_testing(x = rnorm(50))
Schwartz posterior-consistency diagnostic (Bayesian bootstrap).
morie_ghosal_posterior_consistency(
  x, ref_loc = NULL, ref_scale = NULL, eps = 0.1, K = 200, seed = 0
)
x |
Numeric data vector. |
ref_loc |
Optional numeric reference location. |
ref_scale |
Optional numeric reference scale. |
eps |
Numeric KS-distance tolerance (default 0.1). |
K |
Integer number of bootstrap draws (default 200). |
seed |
Integer RNG seed (default 0). |
Named list with estimate, ks_mean, ks_se, schwartz_bound, n, eps, method.
morie_ghosal_posterior_consistency(x = rnorm(50))
Bernstein-polynomial sieve density estimator (Petrone 1999).
morie_ghosal_sieve_prior(x, K = NULL)
x |
Numeric data vector. |
K |
Optional integer sieve degree (default round(n^(1/3))). |
Named list with estimate, log_lik_per_obs, weights, K, n, method.
morie_ghosal_sieve_prior(x = rnorm(50))
Truncated stick-breaking representation of DP(alpha, G0).
morie_ghosal_stick_breaking_trunc(
  x, alpha = 1, K = 50, seed = 0, base_mean = NULL, base_sd = NULL,
  deterministic_seed = NULL
)
x |
Numeric data vector. |
alpha |
DP concentration parameter (default 1). |
K |
Integer truncation level (default 50). |
seed |
Integer RNG seed (default 0). |
base_mean |
Optional base-measure mean. |
base_sd |
Optional base-measure sd. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
Named list with estimate, weights, atoms, effective_K, trunc_err_bound, n, method.
morie_ghosal_stick_breaking_trunc(x = rnorm(50))
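A truncated stick-breaking construction can be sketched in a few lines of base R: draw v_k ~ Beta(1, alpha), set w_k = v_k * prod_{j < k} (1 - v_j), and force v_K = 1 so the K weights sum to one. This is a generic illustration of the prior draw only (it ignores the data x), and `stick_breaking_weights` is a hypothetical helper, not the package function. The N(0, 1) atoms are likewise an assumed base measure.

```r
# Draw K stick-breaking weights for DP(alpha, G0), truncated at level K.
stick_breaking_weights <- function(alpha = 1, K = 50) {
  v <- rbeta(K, 1, alpha)
  v[K] <- 1                          # close the stick so the weights sum to 1
  v * cumprod(c(1, 1 - v[-K]))       # w_k = v_k * prod_{j < k} (1 - v_j)
}

set.seed(0)
w <- stick_breaking_weights(alpha = 1, K = 50)
atoms <- rnorm(50)                   # atoms from an assumed N(0, 1) base measure

# One realisation of the random mean of G under the truncated representation.
g_mean <- sum(w * atoms)
```

With alpha = 1 most of the mass sits on the first few sticks, so a truncation level of 50 leaves a negligible remainder.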
Beta-process posterior survival (Hjort 1990).
morie_ghosal_survival_beta_process(time, event = NULL, c = 1, lam0 = NULL)
time |
Numeric vector of observed times. |
event |
Optional integer/logical event indicator (1 = event, 0 = censored). |
c |
Numeric prior concentration (default 1). |
lam0 |
Optional baseline hazard rate. |
Named list with estimate, times, S_post, H_post, c, lam0, n, method.
morie_ghosal_survival_beta_process(time = cumsum(rexp(50)))
Haar-wavelet spike-and-slab BayesThresh estimator (Abramovich 1998).
morie_ghosal_wavelet_prior(x, pi = 0.5, sigma = NULL, noise = NULL)
x |
Numeric data vector. |
pi |
Numeric prior inclusion probability (default 0.5). |
sigma |
Optional slab sd. |
noise |
Optional noise sd. |
Named list with estimate, fitted, noise, sigma, inclusion, n, method.
morie_ghosal_wavelet_prior(x = rnorm(50))
Mirrors the FSF list at https://www.gnu.org/licenses/license-list.html. Apache-2.0 is GPL-3 compatible but not GPL-2 compatible; morie is GPL-2.0-only so the choice rests with downstream consumers.
morie_gpl_compatible_licenses()
Character vector of SPDX identifiers.
morie_gpl_compatible_licenses()
Wraps gbm::gbm when available, otherwise falls back to
xgboost as a portable boosted-trees backend.
morie_gradient_boosting_ensemble(
  x, y, n_estimators = 100L, learning_rate = 0.1, max_depth = 3L,
  task = "auto", seed = 0L, deterministic_seed = NULL
)
x |
Numeric predictor matrix. |
y |
Response. |
n_estimators |
Number of boosting iterations. |
learning_rate |
Shrinkage nu. |
max_depth |
Depth of each tree. |
task |
"auto", "classification", or "regression". |
seed |
RNG seed. |
deterministic_seed |
Integer or NULL. If supplied, the RNG state
is derived from the SHA-keyed |
Named list: estimate, train_score, feature_importances, n_estimators, learning_rate, max_depth, task, n, method.
morie_gradient_boosting_ensemble(x = rnorm(50), y = rnorm(50))
Uses gbm if available; otherwise base-R boosted stumps.
morie_gradient_boosting_genomic(
  x, y, markers, n_estimators = 100, learning_rate = 0.1, max_depth = 3, seed = 0
)
x |
Optional fixed features. |
y |
Numeric response. |
markers |
(n x m) genotype matrix. |
n_estimators |
Boosting rounds. |
learning_rate |
Shrinkage. |
max_depth |
Tree depth (gbm only). |
seed |
Seed. |
list(estimate, y_hat, train_loss, se, n, method).
Friedman (2001); Montesinos Lopez Ch 9.
morie_gradient_boosting_genomic(
  x = rnorm(50),
  y = rnorm(50),
  markers = matrix(sample(0:2, 200, TRUE), 50, 4)
)
theta := theta - lr * (2/n) X' (X theta - y), intercept included.
Validates against stats::lm reference.
morie_gradient_descent_vanilla(x, y, lr = 0.01, n_iter = 1000, tol = 1e-08)
x |
Numeric matrix / vector of predictors. |
y |
Numeric response vector. |
lr |
Learning rate. |
n_iter |
Max iterations. |
tol |
L2 step-norm tolerance for early stopping. |
Named list with estimate, reference_ols,
n_iter, loss, n, method.
morie_gradient_descent_vanilla(x = rnorm(50), y = rnorm(50))
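The update rule theta := theta - lr * (2/n) X' (X theta - y) can be written out directly in base R and checked against `stats::lm`. The following is a minimal sketch under the stated rule, with the intercept added as a column of ones. `gd_ols` is a hypothetical name, and the learning rate used in the call is chosen for this simulated data rather than taken from the package defaults.

```r
# Batch gradient descent for least squares, intercept included.
gd_ols <- function(x, y, lr = 0.01, n_iter = 1000, tol = 1e-8) {
  X <- cbind(1, as.matrix(x))
  n <- nrow(X)
  theta <- rep(0, ncol(X))
  for (it in seq_len(n_iter)) {
    grad <- (2 / n) * crossprod(X, X %*% theta - y)  # gradient of the mean squared error
    step <- lr * as.vector(grad)
    theta <- theta - step
    if (sqrt(sum(step^2)) < tol) break               # L2 step-norm early stopping
  }
  list(estimate = theta, n_iter = it)
}

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)
fit <- gd_ols(x, y, lr = 0.1, n_iter = 5000)
ref <- unname(coef(lm(y ~ x)))                       # closed-form OLS reference
```

On well-scaled predictors the iterates converge to the OLS solution; badly scaled columns would need a smaller `lr` or standardisation first.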
Wraps caret::train with method = "glm" (classification) or
"lm" (regression) by default; users can pass any caret method.
morie_grid_search_cv(
  x, y, method = NULL, tune_grid = NULL, cv = 5L, task = "auto", seed = 0L
)
x |
Numeric predictor matrix. |
y |
Response. |
method |
caret method id (default chosen by task). |
tune_grid |
data.frame of hyperparameter combos to evaluate. |
cv |
CV folds. |
task |
"auto", "classification", or "regression". |
seed |
RNG seed. |
Named list: estimate (best CV score), best_params, best_score, cv_results_params, cv_results_mean_score, task, n, method.
morie_grid_search_cv(
  x = matrix(rnorm(150), 50, 3),
  y = rnorm(50),
  method = "lm",
  tune_grid = data.frame(intercept = c(TRUE, FALSE)),
  cv = 3L,
  task = "regression",
  seed = 1L
)
Computes G = ZZ' / (2 sum p_j(1-p_j)) for method 1 (default), or the per-locus-scaled variant for method 2.
morie_grm_vanraden(markers, method = 1)
markers |
Numeric (n x m) genotype matrix (coded 0/1/2). |
method |
1 or 2 (VanRaden 2008). |
Named list with estimate (G matrix), diag_mean, off_mean, p, n, m, method.
VanRaden (2008) J Dairy Sci 91:4414. Montesinos Lopez Ch 3.
morie_grm_vanraden(markers = matrix(sample(0:2, 200, TRUE), 50, 4))
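The method 1 formula G = ZZ' / (2 sum p_j(1-p_j)) can be sketched in base R as follows, where Z is the genotype matrix centred by twice the allele frequency of each marker. The sketch assumes allele frequencies are estimated from the supplied markers as column means divided by two. `grm_vanraden1` is a hypothetical name and not the package function.

```r
# VanRaden (2008) method 1 genomic relationship matrix.
grm_vanraden1 <- function(markers) {
  M <- as.matrix(markers)
  p <- colMeans(M) / 2                 # allele frequency per marker (assumed estimator)
  Z <- sweep(M, 2, 2 * p)              # centre each column by 2 * p_j
  G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))
  list(estimate = G, p = p)
}

set.seed(1)
markers <- matrix(sample(0:2, 200, TRUE), 50, 4)
res <- grm_vanraden1(markers)
dim(res$estimate)
```

Because every column of Z sums to zero, the entries of G also sum to zero, which is a convenient check on the centring step.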
Two-way GxE ANOVA with EMS variance components
morie_gxe_interaction_model(x, y, env)
x |
Genotype IDs (length n). |
y |
Numeric response. |
env |
Environment IDs (length n). |
list(estimate, g, e, ge, var_g, var_e, var_ge, var_eps, se, n, method).
Montesinos Lopez Ch 11.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Fits a one-dimensional Hawkes process with a constant baseline to a vector of event times. The conditional intensity is
lambda(t) = nu + eta * sum_{t_i < t} g(t - t_i),
where nu = exp(a0) is the baseline rate, eta the
branching ratio, and g the chosen triggering kernel.
morie_hawkes_fit(
  times,
  end_time = NULL,
  kernel = c("exponential", "weibull", "lomax", "gamma")
)
times |
numeric vector of sorted, non-decreasing event times. |
end_time |
observation horizon; defaults to the last event time. |
kernel |
triggering kernel: one of |
The negative log-likelihood is evaluated by the shared morie C++
core (the same kernels the Python package uses); without a compiled
core it falls back to a pure-R likelihood.
An object of class morie_hawkes_fit: a list with the
parameter estimate, loglik, aic,
branching_ratio, baseline_rate, n_events,
converged and the backend used.
set.seed(1)
ev <- cumsum(rexp(200, rate = 2))
fit <- morie_hawkes_fit(ev, kernel = "exponential")
print(fit)
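To make the likelihood concrete, here is a minimal pure-R sketch of the log-likelihood for the exponential kernel. It assumes an intensity of the form nu + eta * sum over past events of g(t - t_i), with g(u) = beta * exp(-beta * u) a density so that eta acts as the branching ratio, and an observation window starting at 0. The function name `hawkes_exp_loglik` and the `beta` parameterisation are assumptions for illustration and do not describe the package's C++ core.

```r
# Log-likelihood of a Hawkes process with exponential triggering kernel.
hawkes_exp_loglik <- function(times, nu, eta, beta, end_time = max(times)) {
  n <- length(times)
  A <- numeric(n)  # A[i] = sum_{j < i} exp(-beta * (t_i - t_j)), built recursively
  if (n > 1) {
    for (i in 2:n) {
      A[i] <- exp(-beta * (times[i] - times[i - 1])) * (1 + A[i - 1])
    }
  }
  intensity   <- nu + eta * beta * A   # intensity evaluated at each event time
  compensator <- nu * end_time +
    eta * sum(1 - exp(-beta * (end_time - times)))
  sum(log(intensity)) - compensator
}

set.seed(1)
ev <- cumsum(rexp(200, rate = 2))
ll <- hawkes_exp_loglik(ev, nu = 0.5, eta = 0.4, beta = 1.5)

# With eta = 0 the model reduces to a homogeneous Poisson process.
ll_poisson <- hawkes_exp_loglik(ev, nu = 2, eta = 0, beta = 1)
```

The recursion keeps the cost linear in the number of events, which is specific to the exponential kernel; other kernels generally need the full double sum.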
Estimates the Hurst exponent H, a long-memory measure of a time
series. H = 0.5 indicates uncorrelated (Brownian) increments;
H > 0.5 indicates persistent (trending) behaviour;
H < 0.5 indicates anti-persistent (mean-reverting) behaviour.
morie_hurst_r(x)
x |
Numeric vector (time series). |
List with H (numeric) and interpretation
("persistent"/"anti-persistent"/"random").
if (requireNamespace("pracma", quietly = TRUE)) {
  set.seed(1)
  x <- cumsum(rnorm(2048)) # Brownian motion, expected H ~ 0.5
  res <- morie_hurst_r(x)
  res$interpretation
}
Mirrors the Python morie.infer_measurement_level(). Heuristically
classifies a vector as one of "binary", "nominal", "ordinal",
"interval", or "ratio" based on Stevens' (1946) typology.
morie_infer_measurement_level(x)
x |
A vector (any atomic type or factor). |
Rules: logical or 2-level factor/character -> "binary"; ordered factor ->
"ordinal"; unordered factor or character -> "nominal"; integer/numeric
with non-negative range -> "ratio"; otherwise -> "interval".
Character scalar in
c("binary", "nominal", "ordinal", "interval", "ratio").
morie_infer_measurement_level(c(0, 1, 1, 0)) # "binary"
morie_infer_measurement_level(factor(c("a", "b", "c"))) # "nominal"
morie_infer_measurement_level(ordered(c("low", "med", "high"))) # "ordinal"
morie_infer_measurement_level(c(1.2, 3.4, 5.6)) # "ratio"
morie_infer_measurement_level(c(-1.5, 0.0, 2.3)) # "interval"
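The classification rules can be approximated with a short base-R sketch. Two points are assumptions rather than documented behaviour: ordered factors are checked first, and any vector with exactly two distinct non-missing values is treated as binary (which is what the numeric 0/1 example implies). `infer_level_sketch` is a hypothetical name, not the package function.

```r
# Heuristic measurement-level classifier following the stated rules.
infer_level_sketch <- function(x) {
  if (is.ordered(x)) return("ordinal")        # assumed to take precedence
  vals <- unique(x[!is.na(x)])
  if (is.logical(x) || length(vals) == 2L) return("binary")
  if (is.factor(x) || is.character(x)) return("nominal")
  if (is.numeric(x) && all(vals >= 0)) return("ratio")
  "interval"
}

levels_found <- c(
  infer_level_sketch(c(0, 1, 1, 0)),
  infer_level_sketch(factor(c("a", "b", "c"))),
  infer_level_sketch(ordered(c("low", "med", "high"))),
  infer_level_sketch(c(1.2, 3.4, 5.6)),
  infer_level_sketch(c(-1.5, 0.0, 2.3))
)
```

A heuristic like this cannot tell a genuinely interval-scaled variable that happens to be non-negative (for example, temperatures in Celsius above zero) from a ratio-scaled one, so the result is best treated as a default to be overridden.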
Builds a SELECT ... FROM `project`.`dataset`.`table`
string with identifier validation and backtick-quoting.
The where argument is passed through unchanged; callers compose SQL
fragments themselves and are responsible for not injecting hostile
clauses (same contract as
morie_ingest_bigquery_query).
morie_ingest_bigquery_build_sql(
  project, dataset, table, where = NULL, limit = NULL, select = "*"
)
project, dataset, table
|
Fully-qualified BigQuery table
reference, e.g. project |
where |
Optional raw SQL |
limit |
Optional |
select |
Projection list (default |
A SQL string.
morie_ingest_bigquery_build_sql(
  project = "bigquery-public-data",
  dataset = "chicago_crime",
  table = "crime",
  where = "year = 2024",
  limit = 10000L
)
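Identifier validation plus backtick-quoting can be sketched as below. The validation rule (letters, digits, underscores and hyphens only) is an assumption made for illustration, and the package's actual rule is not documented here. `build_sql_sketch` is a hypothetical name. As in the documented contract, the `where` fragment is appended verbatim.

```r
# Build a SELECT against a backtick-quoted, validated table reference.
build_sql_sketch <- function(project, dataset, table,
                             where = NULL, limit = NULL, select = "*") {
  ids <- c(project = project, dataset = dataset, table = table)
  # Assumed rule: reject anything outside [A-Za-z0-9_-], including backticks.
  bad <- !grepl("^[A-Za-z0-9_-]+$", ids)
  if (any(bad)) {
    stop("invalid identifier: ", paste(names(ids)[bad], collapse = ", "))
  }
  sql <- sprintf("SELECT %s FROM `%s`.`%s`.`%s`", select, project, dataset, table)
  if (!is.null(where)) sql <- paste(sql, "WHERE", where)   # raw, caller-supplied
  if (!is.null(limit)) sql <- sprintf("%s LIMIT %d", sql, as.integer(limit))
  sql
}

sql <- build_sql_sketch(
  project = "bigquery-public-data",
  dataset = "chicago_crime",
  table = "crime",
  where = "year = 2024",
  limit = 10000L
)
```

Rejecting backticks in identifiers is what makes the quoting safe; the `where` text is still the caller's responsibility.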
Runs arbitrary SQL against BigQuery via bigrquery, downloads
the full result set, and returns it as a base R data.frame.
Authentication uses Application Default Credentials (the same flow
the rest of the HADES-LLM stack uses); to authenticate
interactively, run bigrquery::bq_auth() first.
morie_ingest_bigquery_query(
  sql, billing_project = NULL, page_size = 10000L, max_rows = Inf, quiet = TRUE
)
sql |
A SQL string to execute. |
billing_project |
Project to bill the query to. |
page_size |
Rows per download page (forwarded to
|
max_rows |
Optional cap on rows downloaded (defaults to
|
quiet |
Suppress bigrquery progress output. |
Billing project is resolved from billing_project, then the
GCP_PROJECT environment variable, then ADC discovery; if
none of those yields a project the call errors out with a clear
message before contacting BigQuery.
A base R data.frame.
morie_ingest_bigquery_table,
morie_ingest_bigquery_build_sql
## Not run:
# Requires the 'bigrquery' package, ADC, and a billing project.
Sys.setenv(GCP_PROJECT = "my-billing-project")
df <- morie_ingest_bigquery_query(
  "SELECT year, COUNT(*) AS n
   FROM `bigquery-public-data.chicago_crime.crime`
   GROUP BY year
   ORDER BY year"
)
head(df)
## End(Not run)
Convenience wrapper around
morie_ingest_bigquery_build_sql +
morie_ingest_bigquery_query: builds a validated,
backtick-quoted SELECT against a fully-qualified table and
downloads the result.
morie_ingest_bigquery_table(
  project, dataset, table, where = NULL, limit = NULL, select = "*",
  billing_project = NULL, page_size = 10000L, max_rows = Inf, quiet = TRUE
)
project, dataset, table
|
Fully-qualified BigQuery table
reference, e.g. |
where |
Optional raw SQL |
limit |
Optional |
select |
Projection list (default |
billing_project |
Billing project; falls back to
|
page_size |
Rows per download page. |
max_rows |
Optional cap on rows downloaded. |
quiet |
Suppress bigrquery progress output. |
A base R data.frame.
## Not run:
# Requires the 'bigrquery' package, ADC, and a billing project.
df <- morie_ingest_bigquery_table(
  project = "bigquery-public-data",
  dataset = "chicago_crime",
  table = "crime",
  where = "year = 2024",
  limit = 10000L,
  billing_project = "my-billing-project"
)
head(df)
## End(Not run)
Returns a data.frame with the documented Socrata schema (snake_case
column names preserved): id, case_number,
date, block, iucr, primary_type,
description, location_description, arrest,
domestic, beat, district, ward,
community_area, fbi_code, x_coordinate,
y_coordinate, year, updated_on,
latitude, longitude.
morie_ingest_chicago_crime(
  year = NULL, where = NULL, max_features = NULL, app_token = NULL,
  user_agent = .MORIE_CHICAGO_DEFAULT_UA,
  timeout = .MORIE_CHICAGO_DEFAULT_TIMEOUT
)
year |
Optional reporting year (e.g. 2024); when set, applies
the server-side SoQL filter |
where |
Optional raw SoQL |
max_features |
Optional hard cap on returned rows. |
app_token |
Optional Socrata X-App-Token for higher rate limits. |
user_agent, timeout
|
Standard request knobs. |
A base R data.frame.
morie_ingest_chicago_socrata,
morie_ingest_bigquery_table for the BigQuery
public-data mirror (bigquery-public-data.chicago_crime).
## Not run:
df <- morie_ingest_chicago_crime(year = 2024, max_features = 10000L)
head(df)
## End(Not run)
Convenience wrapper around
morie_ingest_bigquery_table that pulls
bigquery-public-data.chicago_crime.crime - the Google
BigQuery public-data mirror of the Socrata feed served by
morie_ingest_chicago_crime. Use this path when you
want SQL-side filtering or the full historical depth of the dataset
without paging through SoQL.
morie_ingest_chicago_crime_bigquery(
  where = NULL, year = NULL, limit = NULL, select = "*",
  billing_project = NULL, page_size = 10000L, max_rows = Inf, quiet = TRUE
)
where |
Optional raw SQL |
year |
Convenience shortcut: when |
limit |
Optional |
select |
Projection list (default |
billing_project, page_size, max_rows, quiet
|
Forwarded to
|
Requires the optional bigrquery package and a billing project
(billing_project arg or GCP_PROJECT env var); public
datasets are billed to the caller's project, not the dataset
owner's.
A base R data.frame.
morie_ingest_chicago_crime,
morie_ingest_bigquery_table
Returns the canonical Chicago open-data Socrata endpoints morie
ships with as a flat data.frame. Useful for discovery and for the
CLI --list surface.
morie_ingest_chicago_resources()
A base R data.frame with columns name,
url.
Pages transparently through $offset until either the server
returns fewer rows than page_size (the last page) or
max_features is reached. Works against any Socrata-shaped
portal (Chicago, NYC, Seattle, etc.).
morie_ingest_chicago_socrata(
  resource_url, where = NULL, select = NULL, order = NULL,
  page_size = .MORIE_CHICAGO_DEFAULT_PAGE_SIZE, max_features = NULL,
  app_token = NULL, user_agent = .MORIE_CHICAGO_DEFAULT_UA,
  timeout = .MORIE_CHICAGO_DEFAULT_TIMEOUT
)
resource_url |
Full Socrata resource URL ending in
|
where, select, order
|
SoQL clauses; see
https://dev.socrata.com/docs/queries/. |
page_size |
Rows per request (capped at 50,000 server-side). |
max_features |
Optional hard cap on total returned rows. |
app_token |
Optional Socrata application token (anonymous calls share a throttled pool; tokens give per-app quotas). |
user_agent, timeout
|
Standard request knobs. |
A base R data.frame.
## Not run:
df <- morie_ingest_chicago_socrata(
  "https://data.cityofnewyork.us/resource/uip8-fykc.json",
  where = "arrest_year = 2023",
  max_features = 5000L
)
## End(Not run)
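The offset-paging logic (stop on a short page or once `max_features` is reached) can be shown without any network access. In the sketch below, `page_all` and `fake_fetch` are hypothetical helpers, and `fake_fetch` stands in for the HTTP request by serving rows from an in-memory table. None of this is the package's own code.

```r
# Page through a source until a short page arrives or max_features is reached.
page_all <- function(fetch_page, page_size = 1000L, max_features = NULL) {
  pages <- list()
  offset <- 0L
  total <- 0L
  repeat {
    limit <- page_size
    if (!is.null(max_features)) limit <- min(limit, max_features - total)
    if (limit <= 0L) break                    # cap reached
    page <- fetch_page(limit = limit, offset = offset)
    pages[[length(pages) + 1L]] <- page
    total <- total + nrow(page)
    offset <- offset + nrow(page)
    if (nrow(page) < limit) break             # short page: no more rows upstream
  }
  if (length(pages) == 0L) return(data.frame())
  do.call(rbind, pages)
}

# Stand-in for an HTTP call: serves rows from an in-memory table of 2,500 rows.
fake_rows <- data.frame(id = seq_len(2500))
fake_fetch <- function(limit, offset) {
  idx <- seq_len(nrow(fake_rows))
  fake_rows[idx > offset & idx <= offset + limit, , drop = FALSE]
}

all_rows <- page_all(fake_fetch, page_size = 1000L)
capped   <- page_all(fake_fetch, page_size = 1000L, max_features = 1200L)
```

Shrinking the last request to the remaining quota avoids downloading rows that would be discarded anyway.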
Download a CIHI indicator .xlsx data table
morie_ingest_cihi_xlsx(
  url, sheet = NULL, timeout = 120,
  user_agent = "morie/r (+https://hadesllm.com)", ...
)
url |
Direct URL of the CIHI .xlsx data table. |
sheet |
Worksheet name or 1-based index. NULL = largest sheet. |
timeout |
HTTP timeout in seconds (default 120). |
user_agent |
User-Agent string. |
... |
forwarded to readxl::read_excel. |
base R data.frame.
Mirrors the Python fetch_package_csvs helper: pulls the
package metadata, walks its resources list, and downloads
each CSV / TSV into a named list of data.frames keyed by resource
name (falling back to url, then id). Non-CSV /
TSV resources are skipped; individual download failures are
captured as a single-row error data.frame keyed
_failed_<name> so the overall fetch still returns the
successful ones.
morie_ingest_ckan_fetch_package_csvs(
  portal, package_id, api_key = NULL,
  user_agent = .MORIE_CKAN_DEFAULT_UA, timeout = .MORIE_CKAN_DEFAULT_TIMEOUT
)
portal |
Base URL of the CKAN portal. |
package_id |
Package id or slug. |
api_key |
Optional CKAN API key. |
user_agent |
User-Agent header sent with the request. |
timeout |
HTTP timeout in seconds. |
A named list of data.frames.
Calls the CKAN package_search Action verb against a portal
base URL and returns the raw result list (count,
results, etc.). For a flattened metadata data.frame, use
morie_ingest_ckan_search_packages.
morie_ingest_ckan_package_search(
  portal, query = NULL, fq = NULL, rows = 100L, start = 0L, api_key = NULL,
  user_agent = .MORIE_CKAN_DEFAULT_UA, timeout = .MORIE_CKAN_DEFAULT_TIMEOUT
)
portal |
Base URL of the CKAN portal, e.g.
|
query |
Free-text |
fq |
CKAN |
rows |
Maximum rows to return (default 100). |
start |
Pagination offset (default 0). |
api_key |
Optional CKAN API key (rare for open portals). |
user_agent |
User-Agent header sent with the request. |
timeout |
HTTP timeout in seconds. |
A named list as returned by the CKAN Action API.
## Not run:
res <- morie_ingest_ckan_package_search(
  "https://open.canada.ca/data",
  query = "corrections"
)
length(res$results)
## End(Not run)
Calls the CKAN package_show verb. The returned list
contains title, notes, resources, etc. -
resources are the individual downloadable files in the package.
morie_ingest_ckan_package_show(
  portal, package_id, api_key = NULL,
  user_agent = .MORIE_CKAN_DEFAULT_UA, timeout = .MORIE_CKAN_DEFAULT_TIMEOUT
)
portal |
Base URL of the CKAN portal. |
package_id |
Package id or slug, e.g.
|
api_key |
Optional CKAN API key. |
user_agent |
User-Agent header sent with the request. |
timeout |
HTTP timeout in seconds. |
The package metadata list.
url_or_id may be a direct download URL (as it appears in
resource$url) or a CKAN resource id, in which case the URL
is resolved via morie_ingest_ckan_resource_show.
morie_ingest_ckan_read_resource(
  portal, url_or_id, as_format = NULL, api_key = NULL,
  user_agent = .MORIE_CKAN_DEFAULT_UA, timeout = .MORIE_CKAN_DEFAULT_TIMEOUT
)
portal |
Base URL of the CKAN portal (only used when
|
url_or_id |
A direct URL or a CKAN resource id. |
as_format |
Optional format override
( |
api_key |
Optional CKAN API key. |
user_agent |
User-Agent header sent with the request. |
timeout |
HTTP timeout in seconds. |
Format detection: if as_format is given it wins. Otherwise
the extension is sniffed off the URL; unknown extensions fall back
to CSV (matching the Python helper).
Excel / JSON / Parquet readers require optional dependencies (readxl / jsonlite / arrow) and error with an install hint if missing.
A base R data.frame.
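The format-detection order described above (explicit override first, then the URL extension, then a CSV fallback) can be sketched like this. The list of recognised extensions is an assumption based on the readers mentioned in the text, and `detect_format_sketch` is a hypothetical helper rather than a package function.

```r
# Resolve a resource format: override wins, else URL extension, else "csv".
detect_format_sketch <- function(url, as_format = NULL) {
  if (!is.null(as_format)) return(tolower(as_format))
  path <- sub("[?#].*$", "", url)              # drop query string and fragment
  ext  <- tolower(tools::file_ext(path))
  known <- c("csv", "tsv", "xlsx", "xls", "json", "parquet")  # assumed set
  if (ext %in% known) ext else "csv"
}

fmt_json     <- detect_format_sketch("https://example.org/a.json?download=1")
fmt_fallback <- detect_format_sketch("https://example.org/download")
fmt_override <- detect_format_sketch("https://example.org/a.csv", as_format = "parquet")
```

Stripping the query string before sniffing matters because many portals append download tokens after the file name.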
Calls the CKAN resource_show verb to resolve a resource id
into its download URL plus metadata.
morie_ingest_ckan_resource_show(
  portal, resource_id, api_key = NULL,
  user_agent = .MORIE_CKAN_DEFAULT_UA, timeout = .MORIE_CKAN_DEFAULT_TIMEOUT
)
portal |
Base URL of the CKAN portal. |
resource_id |
CKAN resource id (UUID). |
api_key |
Optional CKAN API key. |
user_agent |
User-Agent header sent with the request. |
timeout |
HTTP timeout in seconds. |
The resource metadata list.
Convenience wrapper over
morie_ingest_ckan_package_search that flattens the
most-useful columns into a single data.frame: id,
name, title, organization, license_id,
metadata_modified, num_resources, url (the
canonical <portal>/dataset/<name> URL).
morie_ingest_ckan_search_packages(
  portal, query, rows = 50L, api_key = NULL,
  user_agent = .MORIE_CKAN_DEFAULT_UA, timeout = .MORIE_CKAN_DEFAULT_TIMEOUT
)
portal |
Base URL of the CKAN portal. |
query |
Free-text query string. |
rows |
Maximum rows to return (default 50). |
api_key |
Optional CKAN API key. |
user_agent |
User-Agent header sent with the request. |
timeout |
HTTP timeout in seconds. |
A base R data.frame.
Posts a JSON-body search request to the NamUs MissingPersons
endpoint (/api/CaseSets/NamUs/MissingPersons/Search) and
pages through the results. Returns morie's documented schema -
case_number, state, county, dlc_date
(date last contact), sex, race, age_min,
age_max, height_cm_min, height_cm_max,
weight_kg_min, weight_kg_max, first_name,
last_name, city, circumstances.
morie_ingest_forensics_namus_missing(
  state = NULL, max_features = NULL, page_size = 200L,
  user_agent = .MORIE_FORENSICS_DEFAULT_UA,
  timeout = .MORIE_FORENSICS_DEFAULT_TIMEOUT
)
state |
Two-letter US state code; |
max_features |
Optional hard cap on returned rows. |
page_size |
Records per request (default 200). |
user_agent, timeout
|
Standard request knobs. |
A base R data.frame.
## Not run:
df <- morie_ingest_forensics_namus_missing(state = "CA", max_features = 1000L)
head(df)
## End(Not run)
Queries the FBI Crime Data Explorer NIBRS endpoint
(/crime/fbi/cde/nibrs/<state>/<offense>?year=...). Requires
an API key from https://api.data.gov/signup/; pass via
api_key= or set the FBI_CDE_API_KEY environment
variable. Returns one row per offence-event with nested
sub-objects flattened using dotted keys
(offense.code, victim.age, ...).
morie_ingest_forensics_nibrs(
  year, offense = NULL, state = NULL, api_key = NULL, max_features = NULL,
  page_size = 500L, timeout = .MORIE_FORENSICS_DEFAULT_TIMEOUT
)
year |
Reporting year (e.g. 2023). Required - CDE forces a year scope. |
offense |
NIBRS offence slug (e.g. |
state |
Two-letter US state code (e.g. |
api_key |
FBI CDE API key; falls back to
|
max_features |
Optional hard cap on returned rows. |
page_size |
CDE page size; server-side cap varies by endpoint. |
timeout |
HTTP timeout in seconds. |
A base R data.frame, one row per offence-event.
## Not run:
df <- morie_ingest_forensics_nibrs(
  year = 2023,
  offense = "aggravated-assault",
  state = "GA",
  api_key = Sys.getenv("FBI_CDE_API_KEY"),
  max_features = 5000L
)
head(df)
## End(Not run)
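Flattening nested sub-objects into dotted keys can be illustrated with a small recursive helper. The records below are made up for the example and do not reflect the real CDE payload, and `flatten_record` is a hypothetical name, not the package's flattener.

```r
# Recursively flatten a nested named list into dotted keys.
flatten_record <- function(rec, prefix = NULL) {
  out <- list()
  for (key in names(rec)) {
    name  <- if (is.null(prefix)) key else paste(prefix, key, sep = ".")
    value <- rec[[key]]
    if (is.list(value)) {
      out <- c(out, flatten_record(value, name))   # descend into sub-object
    } else {
      out[[name]] <- value                         # leaf: keep value and type
    }
  }
  out
}

# Made-up records standing in for parsed JSON.
records <- list(
  list(offense = list(code = "13A"), victim = list(age = 34L)),
  list(offense = list(code = "13B"), victim = list(age = 51L))
)
rows <- lapply(records, function(r) as.data.frame(flatten_record(r)))
df <- do.call(rbind, rows)
names(df)
```

This simple version assumes every record has the same keys; real payloads with missing fields would need the columns aligned before binding.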
The raw reference datasets (CSAFE bullets/cartridges, NSRL hash library, ...) are multi-gigabyte and shipped on dedicated download servers; this function returns only the catalog records so the caller can pick what to download separately.
morie_ingest_forensics_nist_rds(
  dataset_id = NULL,
  query = NULL,
  max_features = NULL,
  page_size = 50L,
  raw = FALSE,
  timeout = .MORIE_FORENSICS_DEFAULT_TIMEOUT
)
dataset_id |
Specific NIST RDS / EDI id (e.g.
|
query |
Free-text search over title / description / keyword.
Ignored when |
max_features |
Optional hard cap on returned rows. |
page_size |
Records per request (default 50). |
raw |
If |
timeout |
HTTP timeout in seconds. |
A base R data.frame.
Convenience wrapper around the CRAN cansim package, which
talks to the Statistics Canada NDM ("cansim") tabular data API.
Use this for canonical CANSIM tables (e.g. "35-10-0177-01")
rather than for PUMF _CSV.zip downloads — those go
through morie_ingest_statcan_csv.
morie_ingest_statcan_cansim(
  table_id,
  language = c("eng", "fra"),
  refresh = FALSE,
  ...
)
table_id |
A StatCan / NDM table identifier, e.g.
|
language |
One of |
refresh |
If |
... |
Further arguments forwarded to
|
If the STATCAN_API_KEY environment variable is set, it is
passed to cansim::set_cansim_api_key() so authenticated
rate limits apply.
A base R data.frame.
## Not run:
# Requires the 'cansim' package and network access.
df <- morie_ingest_statcan_cansim("35-10-0177")
head(df)
## End(Not run)
Downloads a Statistics Canada _CSV.zip product from
www150.statcan.gc.ca, extracts a CSV member, and returns
the contents as a base R data.frame. The archive is
streamed to a session-scoped tempfile (PUMF zips can be hundreds
of megabytes), and the tempfile is removed when the function
returns. Nothing is written under ~/.cache unless the
caller explicitly opts in via morie_cache_dir.
morie_ingest_statcan_csv(
  url,
  member = NULL,
  timeout = 600,
  user_agent = "morie/r (+https://hadesllm.com)",
  ...
)
url |
Direct URL of the StatCan |
member |
Name of the CSV inside the archive; defaults to the
first |
timeout |
HTTP timeout in seconds (default 600). |
user_agent |
User-Agent string sent with the request. |
... |
Further arguments forwarded to
|
Note that a StatCan catalogue page (e.g.
/n1/en/catalogue/82M0013X) is only an HTML index — the
actual data is linked from the product page
(/n1/pub/82m0013x/82m0013x2024001-eng.htm), which points at
the real ..._CSV.zip.
A base R data.frame.
morie_ingest_statcan_cansim,
morie_cache_dir
## Not run:
# Requires network access.
url <- paste0(
  "https://www150.statcan.gc.ca/n1/pub/82m0013x/",
  "2024001/2022_CSV.zip"
)
df <- morie_ingest_statcan_csv(url)
head(df)
## End(Not run)
ArcGIS FeatureServer queries cap at the layer's server-side
maxRecordCount (2,000 for the TPS layers). This function
pages through transparently using resultOffset and the
exceededTransferLimit flag, emitting one data.frame.
morie_ingest_tps_feature_layer(
  layer_url,
  where = "1=1",
  out_fields = "*",
  return_geometry = FALSE,
  page_size = 2000L,
  max_features = NULL,
  user_agent = .MORIE_TPS_DEFAULT_UA,
  timeout = .MORIE_TPS_DEFAULT_TIMEOUT
)
layer_url |
Full URL to a FeatureServer layer, e.g. one of
the entries in |
where |
ArcGIS WHERE clause. Default |
out_fields |
Comma-separated attribute list, or |
return_geometry |
If |
page_size |
Records per request; clamped server-side to 2,000 for TPS layers. |
max_features |
Optional hard cap on total returned rows. |
user_agent, timeout
|
Standard request knobs. |
A base R data.frame.
## Not run:
df <- morie_ingest_tps_feature_layer(
  morie_ingest_tps_layers()$url[
    morie_ingest_tps_layers()$name == "major-crime"
  ],
  where = "OCC_YEAR >= 2023",
  max_features = 5000L
)
nrow(df)
## End(Not run)
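The paging behaviour described above (advance `resultOffset` until `exceededTransferLimit` is no longer set) can be sketched offline. Everything here is an assumption for illustration: `fake_layer`, `fetch_page()` and `fetch_all()` are hypothetical names, and `fetch_page()` stands in for a single HTTP request so the loop can run without network access.

```r
# Hypothetical in-memory "layer" standing in for the remote FeatureServer.
fake_layer <- data.frame(OBJECTID = 1:4500, OCC_YEAR = 2023L)

# Stand-in for one request: returns one page of rows plus the
# exceededTransferLimit flag the server would send.
fetch_page <- function(offset, page_size) {
  idx <- seq_len(nrow(fake_layer))
  keep <- idx[idx > offset & idx <= offset + page_size]
  list(
    features = fake_layer[keep, , drop = FALSE],
    exceededTransferLimit = (offset + page_size) < nrow(fake_layer)
  )
}

# Page until the server stops signalling more rows, or the cap is reached.
fetch_all <- function(page_size = 2000L, max_features = NULL) {
  pages <- list()
  offset <- 0L
  repeat {
    page <- fetch_page(offset, page_size)
    pages[[length(pages) + 1L]] <- page$features
    offset <- offset + nrow(page$features)
    hit_cap <- !is.null(max_features) && offset >= max_features
    done <- !isTRUE(page$exceededTransferLimit) || nrow(page$features) == 0L
    if (done || hit_cap) break
  }
  out <- do.call(rbind, pages)
  if (!is.null(max_features)) out <- utils::head(out, max_features)
  out
}

nrow(fetch_all())
nrow(fetch_all(max_features = 2500L))
```

Trimming with `head()` after the loop keeps the cap exact even when the last page overshoots it.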
Convenience wrapper around
morie_ingest_tps_feature_layer that takes a
registry short-name (e.g. "major-crime",
"shooting-firearms") instead of a raw FeatureServer URL.
morie_ingest_tps_fetch(
  layer,
  year = NULL,
  where = NULL,
  return_geometry = FALSE,
  max_features = NULL,
  ...
)
layer |
Short name from |
year |
Optional shortcut for |
where |
Raw ArcGIS WHERE clause (overrides |
return_geometry |
Include longitude/latitude columns. |
max_features |
Optional hard cap on rows. |
... |
Forwarded to |
A base R data.frame.
Returns the canonical Toronto Police Service open-data ArcGIS
FeatureServer layer URLs morie ships with as a flat data.frame.
Useful for discovery and for the CLI --list surface.
morie_ingest_tps_layers()
A base R data.frame with columns name,
url.
morie_ingest_tps_layers()
Mirrors the Python morie.inspect_output(). Reads a structured output
file and returns a brief summary of its contents.
morie_inspect_output(path)
path |
Path to a JSON, CSV, or RDS file. |
Supported formats: .json (via jsonlite), .csv (via base
utils::read.csv), .rds (via base::readRDS).
A list with components path, format, exists, size_bytes,
and (on success) contents_preview plus type-appropriate metadata.
tmp <- tempfile(fileext = ".json")
if (requireNamespace("jsonlite", quietly = TRUE)) {
  jsonlite::write_json(list(estimate = 0.123, se = 0.045), tmp)
  morie_inspect_output(tmp)
  unlink(tmp)
}
morie's Suggests: list spans ~50 R packages (causal/ML/spatial/IO
families). CRAN policy requires us to leave their install to the
user (no install.packages() at load time, no user-home writes).
This helper resolves which Suggests are missing and (with user
confirmation) installs them, plus prints platform-specific install
hints for the system libraries morie's C/C++ backends use
(libcurl, libsodium, optional liboqs).
morie_install_extras(
  which = "missing",
  ask = interactive(),
  repos = NULL,
  dependencies = NA,
  ...
)
which |
Either |
ask |
Logical. If |
repos |
The CRAN-like repository URL(s) to install from.
Default uses |
dependencies |
Passed through to |
... |
Extra args forwarded to |
Invisibly: a list with installed (character of packages
added this call), already_present (already installed),
failed (failed to install), and system_libs (named logical of
detected system libraries).
morie's compiled backends need three C libraries available at build time. Two are typically pre-installed on developer machines; one is optional and gates the post-quantum cryptography family. Install BEFORE installing/upgrading morie so the configure-time probes pick them up.
libcurl (required for HTTP fetchers)
Debian/Ubuntu: sudo apt-get install libcurl4-openssl-dev
Fedora/RHEL: sudo dnf install libcurl-devel
macOS: pre-installed (Apple's libcurl); or brew install curl
Windows: bundled with Rtools
libsodium (required for ChaCha20-Poly1305 + HKDF-SHA256)
Debian/Ubuntu: sudo apt-get install libsodium-dev
Fedora/RHEL: sudo dnf install libsodium-devel
macOS: brew install libsodium
liboqs (optional, gates ML-KEM-768 + ML-DSA-65)
Debian/Ubuntu: build from source (not yet packaged); https://github.com/open-quantum-safe/liboqs
Fedora/RHEL: sudo dnf install liboqs-devel (recent releases)
macOS: brew install liboqs
## Not run:
# Interactive: install whichever Suggests are missing
morie_install_extras()

# CI / scripted: install all, no prompt
morie_install_extras(which = "all", ask = FALSE)

# Just one family
morie_install_extras(which = c("hawkes", "sf", "spdep"))
## End(Not run)
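The "resolve which Suggests are missing" step can be approximated with `requireNamespace()`. This is a sketch under stated assumptions: the `suggested` vector is a hypothetical stand-in (the real list lives in the package DESCRIPTION), and nothing is installed.

```r
# Hypothetical subset of suggested packages; the last name is deliberately
# one that should not exist on any machine.
suggested <- c("stats", "utils", "aPackageThatDoesNotExist")

# requireNamespace() returns FALSE (without error) when a package is absent.
is_installed <- vapply(
  suggested,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

missing_pkgs <- suggested[!is_installed]
missing_pkgs
```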
Test whether an eBAC exceeds a legal driving limit
morie_is_over_legal_limit(ebac, limit = 0.08)
ebac |
Numeric eBAC value (e.g. from |
limit |
Legal threshold (default 0.08, the per-se limit in most Canadian and US jurisdictions). |
Integer 1 if ebac >= limit, 0 otherwise. (Integer, not
logical, to match the Python sibling and ease binary-outcome modelling.)
morie_is_over_legal_limit(0.09)
morie_is_over_legal_limit(0.05, limit = 0.05)
Anderson-Rubin (AR) weak-IV-robust test
morie_iv_anderson_rubin(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  beta0 = NULL,
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
beta0 |
Numeric scalar or vector; the structural-coefficient
value(s) to test under H0. Length must match |
alpha |
Significance level (default |
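For orientation, the Anderson-Rubin idea can be sketched in base R for one endogenous regressor and no extra exogenous covariates: under H0 the residual `y - x * beta0` should not be explained by the instruments, so a joint F-test on the instruments serves as the test. The simulated data and the helper name `ar_test()` are hypothetical; this is not the package's implementation.

```r
set.seed(1)
n <- 500
z1 <- rnorm(n)
z2 <- rnorm(n)
u <- rnorm(n)
x <- 0.5 * z1 + 0.3 * z2 + 0.8 * u + rnorm(n)  # endogenous regressor
y <- 1 + 2 * x + u                             # true coefficient is 2

# AR test of H0: beta = beta0, via an F-test of the instruments in a
# regression of the H0-implied structural residual.
ar_test <- function(y, x, Z, beta0) {
  resid0 <- y - x * beta0
  null_fit <- lm(resid0 ~ 1)
  full_fit <- lm(resid0 ~ Z)
  tab <- anova(null_fit, full_fit)
  list(statistic = tab$F[2], p_value = tab$`Pr(>F)`[2])
}

Z <- cbind(z1, z2)
ar_test(y, x, Z, beta0 = 2)  # hypothesised value equal to the truth
ar_test(y, x, Z, beta0 = 0)  # hypothesised value far from the truth
```

The test's validity does not depend on instrument strength, which is why it is described as weak-IV-robust.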
Grid-based Anderson-Rubin confidence interval for a single endogenous variable.
morie_iv_anderson_rubin_ci(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  grid_min = -10,
  grid_max = 10,
  grid_n = 200,
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
grid_min |
Numeric; lower bound of the AR confidence-set
grid search over candidate |
grid_max |
Numeric; upper bound of the AR confidence-set grid. |
grid_n |
Integer; number of grid points used in
|
alpha |
Significance level (default |
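A grid-based AR confidence set is obtained by inverting the test: every candidate coefficient on the grid that is not rejected at level `alpha` belongs to the set. The sketch below assumes a single instrument and simulated data; `ar_pvalue()` is a hypothetical helper, and the grid bounds mirror the documented defaults.

```r
set.seed(2)
n <- 400
z <- rnorm(n)
u <- rnorm(n)
x <- 0.7 * z + 0.5 * u + rnorm(n)
y <- 1.5 * x + u

# p-value of the AR test at one candidate value beta0.
ar_pvalue <- function(beta0) {
  fit <- summary(lm(I(y - beta0 * x) ~ z))
  f <- fit$fstatistic
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

grid <- seq(-10, 10, length.out = 200)
p <- vapply(grid, ar_pvalue, numeric(1))

# Confidence set: grid points where H0 is not rejected at alpha = 0.05.
accepted <- grid[p > 0.05]
range(accepted)
```

A coarse grid can miss a narrow confidence set, and with weak instruments the accepted region may be unbounded or disconnected, so the reported range is only as good as the grid.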
Conditional likelihood-ratio (CLR) test of Moreira (2003)
morie_iv_conditional_lr(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  beta0 = 0
)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
beta0 |
Numeric scalar or vector; the structural-coefficient
value(s) to test under H0. Length must match |
Control-function (residual augmentation) IV
morie_iv_control_function(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  robust = TRUE,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
robust |
Logical; if |
alpha |
Significance level for confidence intervals. |
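Residual augmentation can be shown in a few lines of base R: the first-stage residual is added to the outcome equation as an extra regressor. The simulated data and variable names are hypothetical, and the comparison with a manual 2SLS fit is included only to show that the two point estimates coincide in the linear case.

```r
set.seed(3)
n <- 1000
z <- rnorm(n)
w <- rnorm(n)
u <- rnorm(n)
x <- 0.6 * z + 0.4 * w + 0.7 * u + rnorm(n)  # endogenous regressor
y <- 1 + 2 * x - w + u

# First stage: endogenous regressor on instrument and exogenous covariate.
first_stage <- lm(x ~ z + w)
v_hat <- resid(first_stage)

# Control function: add the first-stage residual to the outcome equation.
cf_fit <- lm(y ~ x + w + v_hat)

# Manual 2SLS for comparison: replace x by its first-stage fitted value.
x_hat <- fitted(first_stage)
tsls_fit <- lm(y ~ x_hat + w)

c(
  control_function = unname(coef(cf_fit)["x"]),
  tsls = unname(coef(tsls_fit)["x_hat"])
)
```

The coefficient on `v_hat` doubles as an endogeneity check; the naive standard errors printed by `lm()` ignore the generated regressor and should not be used as-is.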
Computes the Cragg-Donald (1993) weak-instrument statistic. The
statistic is a function of the first-stage regression and is
independent of the outcome variable; outcome only needs to
name a numeric column in data so ivreg can compile
a formula. When outcome = NULL (default), the first
endogenous regressor is reused as the outcome; this works because
ivreg's weak-IV diagnostic comes from the first stage
regardless of y.
morie_iv_cragg_donald(
  data,
  endogenous,
  instruments,
  exogenous = NULL,
  outcome = NULL
)
data |
Data frame. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional exogenous covariates. |
outcome |
Optional outcome column name. Default |
Named list with statistic, p_value,
name, details.
Continuously-Updated GMM (CUE-GMM)
morie_iv_cue_gmm(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  max_iter = 100,
  tol = 1e-08,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
max_iter |
Outer iteration cap (default 100). |
tol |
Convergence tolerance on the objective. |
alpha |
Significance level for confidence intervals. |
Composite IV diagnostics
morie_iv_diagnostics(data, outcome, endogenous, instruments, exogenous = NULL)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
Durbin-Wu-Hausman test of endogeneity
morie_iv_durbin_wu_hausman(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL
)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
First-stage F-statistics and partial R^2
morie_iv_first_stage_diagnostics(
  data,
  endogenous,
  instruments,
  exogenous = NULL
)
data |
A |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
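The first-stage F-statistic and partial R^2 for the excluded instruments can be computed by comparing two nested first-stage regressions. The data below are simulated and the variable names hypothetical; this is a base-R sketch of the quantities, not the package's code.

```r
set.seed(4)
n <- 500
z1 <- rnorm(n)
z2 <- rnorm(n)
w <- rnorm(n)                                   # exogenous covariate
x <- 0.5 * z1 + 0.3 * z2 + 0.5 * w + rnorm(n)   # endogenous regressor

# Restricted first stage omits the excluded instruments.
restricted <- lm(x ~ w)
unrestricted <- lm(x ~ w + z1 + z2)

# F-statistic for the joint significance of the excluded instruments.
f_stat <- anova(restricted, unrestricted)$F[2]

# Partial R^2: share of the restricted residual variation that the
# instruments explain.
rss_r <- sum(resid(restricted)^2)
rss_u <- sum(resid(unrestricted)^2)
partial_r2 <- (rss_r - rss_u) / rss_r

c(F = f_stat, partial_R2 = partial_r2)
```

A common rule of thumb treats a first-stage F below 10 as a sign of weak instruments.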
Two-step efficient GMM via gmm::gmm; falls back to 2SLS when gmm::gmm is unavailable.
morie_iv_gmm(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  weight_matrix = "optimal",
  robust = TRUE,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
weight_matrix |
One of |
robust |
Logical; if |
alpha |
Significance level for confidence intervals. |
Hansen J test of overidentifying restrictions (robust)
morie_iv_hansen_j(data, outcome, endogenous, instruments, exogenous = NULL)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
Hausman test: OLS vs 2SLS
morie_iv_hausman(data, outcome, endogenous, instruments, exogenous = NULL)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
Jackknife IV (JIVE; Angrist, Imbens & Krueger 1999)
morie_iv_jive(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
alpha |
Significance level for confidence intervals. |
Kleibergen-Paap rank statistic
morie_iv_kleibergen_paap(data, endogenous, instruments, exogenous = NULL)
data |
A |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
Solves the LIML eigenvalue problem; falls back to ivreg::ivreg(...,
method = "M") if available.
morie_iv_liml(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  robust = TRUE,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
robust |
Logical; if |
alpha |
Significance level for confidence intervals. |
Panel IV with unit (and optional time) fixed effects via within-transform
morie_iv_panel(
  data,
  outcome,
  endogenous,
  instruments,
  unit,
  exogenous = NULL,
  time_fe = NULL,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
unit |
Cluster / unit identifier column. |
exogenous |
Optional character vector of exogenous covariate names. |
time_fe |
Optional time-FE column. |
alpha |
Significance level for confidence intervals. |
IV Probit (Rivers-Vuong control function)
morie_iv_probit(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
alpha |
Significance level for confidence intervals. |
IV residual analysis
morie_iv_residual_analysis(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL
)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
Sargan test of overidentifying restrictions (homoskedastic)
morie_iv_sargan(data, outcome, endogenous, instruments, exogenous = NULL)
data |
A |
outcome |
Character; column name of the response variable. |
endogenous |
Character vector; column names of the endogenous regressors. |
instruments |
Character vector; column names of the instrumental variables. |
exogenous |
Optional character vector of additional exogenous
regressors included in both the structural equation and the
first stage. |
Split-sample IV
morie_iv_split_sample(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  split_fraction = 0.5,
  seed = 42,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
split_fraction |
Fraction of the data used in the first stage. |
seed |
RNG seed. |
alpha |
Significance level for confidence intervals. |
Stock-Yogo critical values
morie_iv_stock_yogo(n_endogenous = 1, n_instruments = 1)
n_endogenous |
Integer; number of endogenous regressors used to look up the Stock-Yogo critical-value table. |
n_instruments |
Integer; number of instruments used to look up the Stock-Yogo critical-value table. |
Estimates a linear IV model via 2SLS, preferring ivreg::ivreg.
morie_iv_tsls(
  data,
  outcome,
  endogenous,
  instruments,
  exogenous = NULL,
  cluster = NULL,
  robust = TRUE,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Name of the outcome column. |
endogenous |
Character vector of endogenous regressor names. |
instruments |
Character vector of excluded-instrument names. |
exogenous |
Optional character vector of exogenous covariate names. |
cluster |
Optional name of a cluster ID column. |
robust |
Logical; if |
alpha |
Significance level for confidence intervals. |
A list with class morie_iv_result containing coefficients,
standard errors, t-statistics, p-values, confidence interval bounds,
variable names, sample size, method label, and a details list.
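As a reference point for what a 2SLS fit computes, the estimator can be written out with matrix algebra in base R. The sketch uses simulated data and classical (homoskedastic) standard errors; it does not reproduce the robust or clustered options, and none of it is taken from the package's source.

```r
set.seed(5)
n <- 800
z <- rnorm(n)
u <- rnorm(n)
x <- 0.8 * z + 0.6 * u + rnorm(n)  # endogenous regressor
y <- 0.5 + 1.2 * x + u

X <- cbind(1, x)  # structural regressors (intercept + endogenous)
Z <- cbind(1, z)  # instrument matrix (intercept + excluded instrument)

# Projection of X onto the column space of Z: P_Z X.
PZX <- Z %*% solve(crossprod(Z), crossprod(Z, X))

# 2SLS coefficients: (X' P_Z X)^{-1} X' P_Z y.
beta <- solve(crossprod(PZX, X), crossprod(PZX, y))

# Residuals use the original X, not the projected one.
res <- y - X %*% beta
sigma2 <- sum(res^2) / (n - ncol(X))
se <- sqrt(diag(sigma2 * solve(crossprod(PZX))))

cbind(estimate = as.numeric(beta), se = se)
```

Computing residuals from the original regressors is the step most often gotten wrong when 2SLS is run as two literal `lm()` calls.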
Wald IV estimator with a binary instrument
morie_iv_wald(data, outcome, treatment, instrument, alpha = 0.05)
data |
Data frame. |
outcome |
Outcome column. |
treatment |
Endogenous treatment column. |
instrument |
Binary instrument column. |
alpha |
Significance level. |
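With a binary instrument, the Wald estimator is the ratio of the instrument's effect on the outcome to its effect on the treatment. The sketch below uses simulated data with hypothetical variable names and a constant treatment effect of 2; it illustrates the ratio only, without standard errors.

```r
set.seed(6)
n <- 2000
z <- rbinom(n, 1, 0.5)  # binary instrument
u <- rnorm(n)           # unobserved confounder
d <- as.integer(0.2 + 0.5 * z + 0.3 * u + rnorm(n, sd = 0.3) > 0.5)
y <- 1 + 2 * d + u

# Reduced form and first stage as differences in group means.
reduced_form <- mean(y[z == 1]) - mean(y[z == 0])
first_stage <- mean(d[z == 1]) - mean(d[z == 0])

wald <- reduced_form / first_stage
wald
```

The same number is obtained from `cov(z, y) / cov(z, d)`, which is the just-identified IV estimator.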
Delete-1 jackknife variance estimate
morie_jackknife_estimate(df, statistic)
df |
A data frame. |
statistic |
A function taking a data frame and returning a scalar. |
Named list: estimate, se, bias.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
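The delete-1 jackknife itself is short enough to write out. The helper `jackknife()` below is a hypothetical base-R sketch that returns the same three components named in the return value above; it is not the package function.

```r
jackknife <- function(df, statistic) {
  n <- nrow(df)
  theta_hat <- statistic(df)
  # Recompute the statistic with each row left out in turn.
  loo <- vapply(
    seq_len(n),
    function(i) statistic(df[-i, , drop = FALSE]),
    numeric(1)
  )
  theta_bar <- mean(loo)
  list(
    estimate = theta_hat,
    se = sqrt((n - 1) / n * sum((loo - theta_bar)^2)),
    bias = (n - 1) * (theta_bar - theta_hat)
  )
}

set.seed(7)
df <- data.frame(x = rnorm(40))
out <- jackknife(df, function(d) mean(d$x))
out
```

For the sample mean the jackknife standard error equals `sd(x) / sqrt(n)` and the bias estimate is zero, which makes it a convenient sanity check.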
Johansen trace test for cointegration
morie_johansen_cointegration(x, k_ar_diff = 1)
x |
Numeric matrix (T x k) of I(1) candidate series. |
k_ar_diff |
Number of lagged differences. Default 1. |
Named list with eigenvalues, trace_stat, crit_values,
rank, n, k, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Defaults to a univariate local-level model when matrices are omitted.
morie_kalman_filter(
  x,
  transition = NULL,
  H = NULL,
  Q = NULL,
  R = NULL,
  x0 = NULL,
  P0 = NULL
)
x |
Numeric vector or matrix of observations. |
transition |
Transition matrix (default identity). |
H |
Observation matrix (default identity). |
Q |
State-innovation covariance (default sigma^2 I). |
R |
Observation covariance (default sigma^2 I). |
x0 |
Initial state mean. |
P0 |
Initial state covariance. |
Named list with state, state_cov, innovations,
innovation_variance, loglik, n, method.
morie_kalman_filter(x = rnorm(50))
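For readers unfamiliar with the univariate local-level model, the filter recursion can be sketched in base R. The function `local_level_filter()` and its defaults (`q`, `r`, a diffuse `p0`) are assumptions made for this illustration; the package's actual default variances are not stated here.

```r
# Local-level model: state_t = state_{t-1} + w_t (variance q),
#                    y_t     = state_t + v_t     (variance r).
local_level_filter <- function(y, q = 1, r = 1, x0 = 0, p0 = 1e6) {
  n <- length(y)
  state <- numeric(n)
  state_var <- numeric(n)
  innov <- numeric(n)
  innov_var <- numeric(n)
  x <- x0
  p <- p0
  for (t in seq_len(n)) {
    p_pred <- p + q                 # predict step (random-walk state)
    innov[t] <- y[t] - x            # one-step-ahead forecast error
    innov_var[t] <- p_pred + r
    k <- p_pred / innov_var[t]      # Kalman gain
    x <- x + k * innov[t]           # update state mean
    p <- (1 - k) * p_pred           # update state variance
    state[t] <- x
    state_var[t] <- p
  }
  # Gaussian log-likelihood from the prediction-error decomposition.
  loglik <- -0.5 * sum(log(2 * pi * innov_var) + innov^2 / innov_var)
  list(state = state, state_cov = state_var, innovations = innov,
       innovation_variance = innov_var, loglik = loglik, n = n)
}

out <- local_level_filter(rnorm(50))
str(out)
```

With `q = 0` the state is constant and the filtered value converges to the running mean of the observations, which is a quick way to check the recursion.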
Kendall's tau-b
morie_kendall_tau(x, y)
x |
Numeric vector. |
y |
Numeric vector. |
Named list: tau, p_value.
morie_kendall_tau(x = rnorm(50), y = rnorm(50))
tau_xy.z = (tau_xy - tau_xz tau_yz) / sqrt((1 - tau_xz^2)(1 - tau_yz^2))
morie_kendall_tau_partial(x, y, z)
x, y, z
|
Numeric vectors of equal length. |
Named list: statistic (partial tau), p_value, tau_xy, tau_xz, tau_yz, z, n.
morie_kendall_tau_partial(x = rnorm(50), y = rnorm(50), z = rnorm(50))
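The partial-tau formula above can be evaluated directly from three pairwise Kendall correlations. The sketch uses base R's `cor(method = "kendall")`, which returns tau-b; the simulated data are hypothetical and no p-value is computed.

```r
set.seed(8)
z <- rnorm(100)
x <- z + rnorm(100)  # x and y are related only through z
y <- z + rnorm(100)

tau <- function(a, b) cor(a, b, method = "kendall")
t_xy <- tau(x, y)
t_xz <- tau(x, z)
t_yz <- tau(y, z)

# tau_xy.z = (tau_xy - tau_xz * tau_yz) / sqrt((1 - tau_xz^2) * (1 - tau_yz^2))
tau_partial <- (t_xy - t_xz * t_yz) / sqrt((1 - t_xz^2) * (1 - t_yz^2))

c(tau_xy = t_xy, tau_xy_given_z = tau_partial)
```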
Thin extender over kernlab::kpca that performs PCA in a
feature space induced by a reproducing-kernel.
morie_kernel_pca(x, ...)
x |
Numeric matrix or data frame of features (rows = observations). |
... |
Further arguments forwarded to |
A list with $method = "kernlab::kpca" and
$raw (a kpca S4 object with eigenvalues,
eigenvectors and the projected rotated data).
## Not run:
if (requireNamespace("kernlab", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(stats::rnorm(200), ncol = 4)
  morie_kernel_pca(x, kernel = "rbfdot", features = 2)
}
## End(Not run)
Wraps stats::kmeans with Hartigan-Wong (the default).
morie_kmeans_clustering(
  x,
  n_clusters = 3L,
  n_init = 10L,
  max_iter = 300L,
  seed = 0L
)
x |
Numeric matrix. |
n_clusters |
Number of clusters K. |
n_init |
Number of random restarts. |
max_iter |
Maximum number of iterations per run. |
seed |
RNG seed. |
Named list: estimate (inertia), labels, centers, inertia, n_iter, n_clusters, n, method.
morie_kmeans_clustering(x = rnorm(50))
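The underlying `stats::kmeans` call can be tried directly on a small two-cluster matrix. How the wrapper's arguments map onto `centers`, `nstart` and `iter.max` is an assumption made for this sketch; the "inertia" reported by the wrapper is taken here to correspond to `tot.withinss`.

```r
set.seed(0)
# Two well-separated hypothetical clusters of 50 points each.
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)

fit <- stats::kmeans(
  x,
  centers = 2,
  nstart = 10,
  iter.max = 300,
  algorithm = "Hartigan-Wong"
)

fit$size          # points per cluster
fit$tot.withinss  # total within-cluster sum of squares ("inertia")
```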
Kruskal-Wallis non-parametric ANOVA
morie_kruskal_wallis_test(...)
... |
Numeric vectors, one per group. |
Named list: H, df, p_value.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
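Until a package-level example is available, the same three quantities (H, df, p-value) can be obtained from base R's `stats::kruskal.test`. Whether the wrapper delegates to this function is an assumption; the groups below are simulated.

```r
set.seed(9)
g1 <- rnorm(20)
g2 <- rnorm(20, mean = 0.5)
g3 <- rnorm(20, mean = 1)

# kruskal.test() accepts a list of numeric vectors, one per group.
kw <- kruskal.test(list(g1, g2, g3))

c(
  H = unname(kw$statistic),
  df = unname(kw$parameter),
  p_value = kw$p.value
)
```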
R port of morie.laniyonu.actuarial_risk_disparity. Audits
the Correctional Service of Canada's four ordinal risk instruments
(Static, DFIA-R Dynamic, Offender Security Level, Reintegration
Potential) and the two binary downstream outcomes (parole granted;
institutional housing level) for race x gender bias.
Pass one of the four ordinal risk scores ("static",
"dynamic", "osl", "reintegration") to run the
Stage 1 threshold-specific ordinal logit; pass "parole" or
"housing" to run the Stage 2 score-net-residual binary logit.
morie_laniyonu_actuarial_risk_disparity(
  df,
  outcome,
  race_cols,
  gender_col = "gender",
  score_col = NULL,
  control_cols = NULL,
  ordinal_levels = c("low", "medium", "high"),
  outcome_col = NULL,
  split_by_gender = TRUE,
  bootstrap_replicates = 200L,
  random_state = 20260513L
)
df |
Sentence-level (one row per sentence) CSC microdata. |
outcome |
One of |
race_cols |
Character vector of 0/1 race indicator columns (White is the implicit reference; pass non-reference levels). |
gender_col |
Categorical gender column. |
score_col |
Required when |
control_cols |
Optional additional control columns (age, priors, sentence length, etc.). Pre-dummy any categoricals. |
ordinal_levels |
Level ordering for ordinal outcomes
(default |
outcome_col |
Optional override for the default column name. |
split_by_gender |
If |
bootstrap_replicates |
Stage 2 only; bootstrap reps for residual race-coefficient SEs. |
random_state |
Seed for bootstrap. |
Two empirical stages match the paper:
Stage 1 (ordinal scores): threshold-specific cumulative-logit, fit as two separate binary logits at the low->medium and medium->high cutoffs, plus a proportional-odds LR test. The headline pattern is much larger |beta| at the low->medium cut than at medium->high.
Stage 2 (binary outcomes): the score-net-residual audit - logistic regression of outcome on actuarial score + race indicators (+ controls). A non-zero residual race coefficient is the disparate-treatment signal.
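The Stage 1 construction, two separate binary logits at the low->medium and medium->high cutoffs, can be sketched on simulated data. All names here (`group`, `age`, `score`) are hypothetical, the data are generated under proportional odds, and the proportional-odds LR test is not reproduced.

```r
set.seed(10)
n <- 2000
group <- rbinom(n, 1, 0.3)  # hypothetical 0/1 indicator column
age <- rnorm(n)             # hypothetical control
latent <- 0.6 * group + 0.4 * age + rlogis(n)
score <- cut(
  latent,
  breaks = c(-Inf, -0.5, 1, Inf),
  labels = c("low", "medium", "high"),
  ordered_result = TRUE
)

# One binary outcome per cutoff of the ordinal score.
above_low <- as.integer(score > "low")        # low -> medium cutoff
above_medium <- as.integer(score > "medium")  # medium -> high cutoff

fit_low_medium <- glm(above_low ~ group + age, family = binomial())
fit_medium_high <- glm(above_medium ~ group + age, family = binomial())

rbind(
  low_to_medium = coef(fit_low_medium),
  medium_to_high = coef(fit_medium_high)
)
```

Because these data satisfy proportional odds, the two `group` coefficients should be similar; a large gap between them in real data is the pattern the threshold-specific fit is designed to expose.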
Caveat surfaced via warning() on every call (Goel et al. 2021):
a non-zero residual race coefficient is evidence of OUTPUT disparity,
not PREDICTIVE-VALIDITY disparity. The two are conceptually
distinct; the paper's disparate-treatment claim rests on the former.
A named list of class morie_laniyonu_ard_result
carrying the per-stratum coefficients and a multi-paragraph
interpretation string.
O'Connell, C., & Laniyonu, A. (2025). Race, gender, and risk assessments in Canadian federal prison. Race & Justice, 15(3), 428-453.
Goel, S., Shroff, R., Skeem, J., & Slobogin, C. (2021). The accuracy, equity, and jurisprudence of criminal risk assessment. In Research Handbook on Big Data Law (pp. 9-28).
R port of morie.laniyonu.gentrification_policing. Estimates
the direct, indirect (spatial spillover), and total effect of
gentrification on NYPD stop-and-frisk rates at the census-tract x
year level via a Spatial Durbin Model (SDM) decomposition.
morie_laniyonu_gentrification_policing(
  df,
  year_col = "year",
  tract_id_col = "tract_id",
  stops_col = "stops",
  population_col = "population",
  crime_col = "felony_count",
  demand_col = "calls_311_omp",
  baseline_income_col = "median_inc_2000",
  baseline_rent_col = "median_rent_2000",
  growth_college_col = NULL,
  growth_rent_col = NULL,
  follow_income_col = "median_inc_2014",
  follow_rent_col = "median_rent_2014",
  baseline_college_col = "pct_ba_2000",
  follow_college_col = "pct_ba_2014",
  additional_controls = NULL,
  weight_matrix = NULL,
  weight_matrix_kind = c("queen", "knn"),
  fitted_rho = NULL,
  fitted_beta_direct = NULL,
  fitted_beta_spatial = NULL,
  years = NULL,
  log_outcome = TRUE
)
df |
Tract-year panel. One row per tract per year. |
year_col, tract_id_col, stops_col, population_col, crime_col, demand_col
|
Column names; defaults match the morie toy bundle schema. |
baseline_income_col, baseline_rent_col
|
Baseline-period income and rent (2000 in the paper). |
growth_college_col, growth_rent_col
|
Growth columns. If
|
follow_income_col, follow_rent_col, baseline_college_col, follow_college_col
|
Used when growth columns are not pre-computed. |
additional_controls |
Extra tract-year controls (pct_black, etc.). |
weight_matrix |
Pre-computed (N, N) row-standardised spatial
weights. Required when |
weight_matrix_kind |
Provenance label only. |
fitted_rho, fitted_beta_direct, fitted_beta_spatial
|
Pre-fitted SDM outputs. Pass these to bypass lite-mode. |
years |
Subset of years to analyse. |
log_outcome |
If |
The paper's headline finding: gentrification has roughly zero direct effect on stops/capita inside the gentrifying tract, but a +51% spillover effect on stops/capita in neighbouring tracts.
Two modes are supported:
Pre-fitted mode (preferred): pass fitted_rho,
fitted_beta_direct, fitted_beta_spatial from your
own SDM fit (e.g. spatialreg::lagsarlm with Durbin terms).
This wrapper handles the diagnostic ladder + Kelejian-Prucha
spillover decomposition.
Lite mode (fall-back): OLS + Moran's I on residuals, decomposition with rho=0. Useful for sanity-checks only.
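The spillover decomposition can be sketched in base R. This is a minimal illustration of the summary direct, indirect, and total impacts defined in LeSage & Pace (2009) for a single covariate. The helper name sdm_impacts and the toy ring of five tracts are hypothetical, and whether the wrapper averages impacts in exactly this way is an assumption.

```r
# Direct / indirect / total impacts for a spatial Durbin model
#   y = rho * W y + x * beta + (W x) * theta + e
# sdm_impacts() is a hypothetical helper, not part of rmorie.
sdm_impacts <- function(W, rho, beta, theta) {
  n <- nrow(W)
  I_n <- diag(n)
  # Matrix of partial derivatives dy_i / dx_j
  S <- solve(I_n - rho * W, I_n * beta + W * theta)
  direct <- mean(diag(S))      # average own-tract effect
  total <- mean(rowSums(S))    # average effect of changing x everywhere
  c(direct = direct, indirect = total - direct, total = total)
}

# Toy row-standardised weights: each tract has two neighbours on a ring
n <- 5
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, c(i %% n + 1, (i - 2) %% n + 1)] <- 0.5
}

imp_lite <- sdm_impacts(W, rho = 0, beta = 0.02, theta = 0.4)    # lite mode, rho = 0
imp_full <- sdm_impacts(W, rho = 0.3, beta = 0.02, theta = 0.4)  # pre-fitted rho
```

With rho = 0 the direct impact collapses to beta and the indirect impact to theta, which is why lite mode is only a sanity check: it ignores the feedback that a non-zero rho adds to both components.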
A list of morie_laniyonu_gp_result objects, one per
year analysed.
Laniyonu, A. (2018). Coffee shops and street stops: Policing practices in gentrifying neighborhoods. Urban Affairs Review, 54(5), 898-930.
LeSage, J. P., & Pace, R. K. (2009). Introduction to Spatial Econometrics. CRC Press.
R port of morie.laniyonu.smi_force_disparity. Estimates a
hierarchical negative-binomial model with a synthetic area-exposure
(SAE) offset for persons-with-serious-mental-illness (PwSMI), with
year fixed effects and an area random intercept.
Composes a synthetic-area-exposure (SAE) step (base-R logistic on survey microdata, predicted at ACS tract marginals) into a negative-binomial GLM with year fixed effects and an area random intercept approximated by ridge-penalised area dummies.
morie_laniyonu_smi_force_disparity(
  df,
  survey_df,
  survey_trait_col = "smi",
  survey_covariate_cols,
  area_covariate_cols = NULL,
  force_count_col = "force_events",
  non_smi_count_col = NULL,
  geog_col = "tract_id",
  year_col = "year",
  population_col = "pop_18plus",
  baseline_year = NULL,
  include_year_fe = TRUE,
  include_area_re = TRUE,
  max_iter = 500L,
  tol = 1e-06,
  return_design = FALSE
)
df |
Force-event panel, one row per (area, year). |
survey_df |
Survey microdata for fitting P(SMI | covariates). |
survey_trait_col |
Binary column in |
survey_covariate_cols |
Covariates available in BOTH survey_df and df. |
area_covariate_cols |
Optional rename map for df. |
force_count_col |
Count of force events against PwSMI per (area, year). |
non_smi_count_col |
Count of force events against non-SMI per
(area, year). If |
geog_col, year_col, population_col
|
Column names. |
baseline_year |
Year to drop as the reference (default = min). |
include_year_fe, include_area_re
|
Toggle the year FE / area RE blocks. |
max_iter, tol
|
Optimiser controls. |
return_design |
Attach |
The trick: there is no administrative census of who has SMI at the tract level, so the denominator is built by:
Fitting P(SMI | age, sex, race, income, ...) on a national survey using only covariates also tabulated at the tract level by the ACS.
Applying those coefficients to ACS tract marginals to get a per-tract predicted P(SMI).
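Multiplying by adult population for a synthetic exposure
denominator $E_{gat}$.
The count model is
$$y_{gat} \sim \mathrm{NegBin}(\mu_{gat}, \theta), \qquad \log \mu_{gat} = \log E_{gat} + \alpha + \beta\,\mathbb{1}[g = \mathrm{PwSMI}] + \gamma_t + u_a,$$
with $g$ = PwSMI vs non-SMI, $t$ = year, $a$ = area.
The headline coefficient $\beta$ is the log relative-risk of
police use of force against PwSMI vs non-SMI.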
Paper headlines: RR PwSMI = 11.6x (tract); 10.2x (precinct).
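The three synthetic-exposure steps described above can be sketched in base R. Everything in this block is illustrative: the survey and tract data are simulated, the covariate names age65 and low_income are hypothetical, and plugging tract-level shares into an individual-level logistic fit is a simplifying approximation rather than the package's documented procedure.

```r
set.seed(1)

# Hypothetical survey microdata: binary SMI flag plus two binary covariates
survey <- data.frame(
  age65 = rbinom(2000, 1, 0.2),
  low_income = rbinom(2000, 1, 0.3)
)
survey$smi <- rbinom(
  2000, 1,
  plogis(-3 + 0.3 * survey$age65 + 0.8 * survey$low_income)
)

# Step 1: fit P(SMI | covariates) on the survey
fit <- glm(smi ~ age65 + low_income, family = binomial(), data = survey)

# Step 2: predict at tract marginals (shares stand in for the 0/1 covariates)
tracts <- data.frame(
  tract_id = c("A", "B", "C"),
  age65 = c(0.10, 0.25, 0.18),
  low_income = c(0.15, 0.40, 0.30),
  pop_18plus = c(3200, 4100, 2800)
)
tracts$p_smi <- predict(fit, newdata = tracts, type = "response")

# Step 3: synthetic exposure denominators for the two groups
tracts$exposure_smi <- tracts$p_smi * tracts$pop_18plus
tracts$exposure_non_smi <- tracts$pop_18plus - tracts$exposure_smi
```

The log of each exposure column would then enter the count model as an offset, so the group coefficient is read as a log rate ratio per person at risk.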
This R port is a frequentist MLE approximation (via
MASS::glm.nb, falling back to a hand-rolled NB MLE
on stats::optim if MASS is unavailable). For paper-grade
Bayesian credible intervals, fit in brms / rstanarm
using the design matrix returned with return_design=TRUE.
Surfaces a warning() on every call: the SMI flag on force
events is a proxy biased TOWARD THE NULL (officers miss more SMI
than they over-attribute), so the estimated relative risk is a
conservative lower bound on the true disparity.
A list of class morie_laniyonu_smi_result.
Laniyonu, A., & Goff, P. A. (2021). Measuring disparities in police use of force and injury among persons with serious mental illness. BMC Psychiatry, 21(1), 500.
Thin extender over lcmm::lcmm for the
Proust-Lima et al. (2017) latent-class linear mixed model on
longitudinal / repeated-measures data.
morie_lcmm_latent_class(fixed, random = ~1, subject, data, ng = 2, ...)
fixed |
A two-sided formula for the fixed-effects part of the model. |
random |
A one-sided formula for the random-effects part
(default |
subject |
Character; the name of the column in |
data |
A data frame containing the variables in
|
ng |
Integer; the number of latent classes (default
|
... |
Further arguments forwarded to |
A list with $method = "lcmm::lcmm" and $raw
(an lcmm object with the class-membership probabilities,
class-specific fixed-effect estimates, and convergence
diagnostics).
## Not run:
if (requireNamespace("lcmm", quietly = TRUE)) {
  data("data_hlme", package = "lcmm")
  morie_lcmm_latent_class(
    fixed = Y ~ Time, random = ~ Time, subject = "ID",
    data = data_hlme, ng = 2, mixture = ~ Time
  )
}
## End(Not run)
Manual implementation of the sklearn learning_curve flow: shuffle, split into k folds, then for each train fraction fit on a prefix of the training fold and score on the held-out fold.
morie_learning_curve(x, y, sizes = NULL, cv = 5L, seed = 0L)
x |
Numeric matrix predictors. |
y |
Numeric response. |
sizes |
Training-set fractions (default seq(0.1, 1.0, length=5)). |
cv |
Number of CV folds. |
seed |
RNG seed for shuffling. |
Named list: estimate (final val MSE), train_sizes, train_scores, val_scores, n, method.
morie_learning_curve(x = rnorm(50), y = rnorm(50))
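The shuffle, fold, and prefix flow can be sketched in base R. This is a minimal illustration, not the package's code: the helper name learning_curve_sketch is hypothetical, and using ordinary least squares as the base learner with mean squared error as the score is an assumption.

```r
# Hypothetical helper illustrating the shuffle / k-fold / prefix flow
learning_curve_sketch <- function(x, y, sizes = seq(0.1, 1, length.out = 5),
                                  cv = 5L, seed = 0L) {
  set.seed(seed)
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- sample(n)                           # shuffle row indices
  fold <- rep(seq_len(cv), length.out = n)   # fold label per shuffled position
  train_mse <- val_mse <- matrix(NA_real_, length(sizes), cv)
  for (k in seq_len(cv)) {
    val_id <- idx[fold == k]
    train_id <- idx[fold != k]
    for (s in seq_along(sizes)) {
      # Prefix of the training fold, kept large enough for OLS to be identified
      m <- max(ncol(x) + 2L, floor(sizes[s] * length(train_id)))
      use <- train_id[seq_len(min(m, length(train_id)))]
      fit <- lm.fit(cbind(1, x[use, , drop = FALSE]), y[use])
      pred <- function(id) drop(cbind(1, x[id, , drop = FALSE]) %*% fit$coefficients)
      train_mse[s, k] <- mean((y[use] - pred(use))^2)
      val_mse[s, k] <- mean((y[val_id] - pred(val_id))^2)
    }
  }
  list(train_sizes = sizes,
       train_scores = rowMeans(train_mse),
       val_scores = rowMeans(val_mse))
}

set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
lc <- learning_curve_sketch(x, y)
```

Plotting val_scores against train_sizes shows whether more data is still reducing held-out error.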
Levene test for equality of variances
morie_levene_test(...)
... |
Numeric vectors, one per group. |
Named list: F, p_value.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
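As an illustration of the underlying test, the sketch below runs a one-way ANOVA on absolute deviations from each group's median (the Brown-Forsythe variant of Levene's test). The helper name levene_sketch is hypothetical, and the choice of median centring is an assumption; the page does not say which centre morie_levene_test uses.

```r
# Hypothetical helper: Levene-type test via ANOVA on absolute deviations
levene_sketch <- function(...) {
  groups <- list(...)
  g <- factor(rep(seq_along(groups), lengths(groups)))
  # Absolute deviation of each observation from its own group's median
  z <- unlist(lapply(groups, function(v) abs(v - median(v))))
  tab <- anova(lm(z ~ g))
  list(F = tab[["F value"]][1], p_value = tab[["Pr(>F)"]][1])
}

set.seed(1)
res <- levene_sketch(rnorm(40, sd = 1), rnorm(40, sd = 1), rnorm(40, sd = 4))
```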
morie's SPDX-style licence metadata
morie_license_metadata()
A named list summarising morie's licence posture, useful for pipeline build manifests, auditd logs, and downstream compliance pipelines.
morie_license_metadata()
Wraps stats::lm and returns coefficients plus classical OLS
standard errors.
morie_linear_regression_ols(x, y)
x |
Numeric matrix or vector of predictors. |
y |
Numeric response vector. |
Named list with estimate (intercept + slopes),
se, n, method.
Hastie, Tibshirani & Friedman, Elements of Statistical Learning (2009).
morie_linear_regression_ols(x = rnorm(50), y = rnorm(50))
List all datasets with cache status
morie_list_datasets(db_path = NULL, con = NULL)
db_path |
Optional path to a SQLite/DuckDB file (default backend). |
con |
Optional pre-opened DBI connection (overrides |
A data.frame with columns: key, name, source, survey, year, type, cached (logical), rows (integer or NA).
morie_list_datasets()
List implemented MORIE CPADS modules
morie_list_morie_modules()
Data frame describing the implemented module surface.
morie_list_morie_modules()
Return TRUE when at least one live LLM provider is available
morie_llm_agent_available()
Logical scalar.
R port of morie.llm.ask. Tries each provider in priority order; on
HTTP/timeout failure falls through to the next, and finally to a
static local help string.
morie_llm_ask(
  prompt,
  context = NULL,
  model = NULL,
  provider = NULL,
  system_prompt = NULL,
  timeout = 120
)
prompt |
User question or instruction. |
context |
Optional named list injected as text into the system prompt. |
model |
Optional model override. |
provider |
Optional provider override (ollama/gemini/api/openai/local). NULL = auto-detect. |
system_prompt |
Optional full system-prompt override. |
timeout |
HTTP timeout in seconds. Default 120. |
Character scalar response text, or local-fallback text when all providers fail.
R port of morie.llm.ask_multi. Unlike morie_llm_ask(), this
accepts a pre-built messages list (each element: role/content)
enabling multi-turn conversation. Streaming is not supported in the R
port – this always returns a single character scalar.
morie_llm_ask_multi(messages, providers = NULL, model = NULL, timeout = 120)
messages |
list of role/content lists. |
providers |
Optional character vector forcing a specific provider ordering. When NULL the auto-detected provider is tried first, then the remaining providers in priority order. |
model |
Optional model identifier. |
timeout |
HTTP timeout in seconds. |
Provider fall-through order mirrors morie_llm_detect_provider():
ollama -> freeapi -> gemini -> api -> openai -> local.
Character scalar response text.
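The fall-through behaviour can be sketched with tryCatch. The provider functions below are hypothetical stand-ins that either fail or return a string; real providers would issue HTTP requests, and the helper name ask_with_fallthrough is not part of rmorie.

```r
# Hypothetical provider callables, listed in priority order
providers <- list(
  ollama = function(messages) stop("connection refused"),
  gemini = function(messages) stop("timeout"),
  local  = function(messages) "Static local help text."
)

# Try each provider in turn; the first one that returns a string wins
ask_with_fallthrough <- function(messages, providers) {
  for (name in names(providers)) {
    out <- tryCatch(providers[[name]](messages), error = function(e) NULL)
    if (is.character(out) && length(out) == 1L) {
      return(list(provider = name, text = out))
    }
  }
  stop("all providers failed")
}

res <- ask_with_fallthrough(
  list(list(role = "user", content = "What is an ATT?")),
  providers
)
```

Keeping a provider that cannot fail at the end of the list is what guarantees a character scalar is always returned.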
Detect the active LLM provider
morie_llm_detect_provider()
Character scalar provider key: ollama / gemini / api / openai / local.
R port of list_freeapi_models. Walks any JSON files in
inst/ollama_json/ (mirroring the Python morie/ollama_json/
vendoring) and emits a data.frame with one row per unique model. When the
catalogue directory is absent (R-only install), a single fallback row for
the default model is returned so downstream callers always get a usable
table.
morie_llm_list_freeapi_models()
data.frame with columns model / family / size / label / alias.
R port of the Python _probe_freeapi helper. Returns TRUE when at
least one free remote model is reachable. The result is cached in
options(morie.llm.freeapi_cached) for the process lifetime. A single
one-second retry is performed because community servers can be slow.
morie_llm_probe_freeapi(timeout = 4)
timeout |
Probe timeout in seconds. Default 4. |
Logical scalar.
Probe a local Ollama instance
morie_llm_probe_ollama(timeout = 2)
timeout |
Probe timeout in seconds. |
Logical scalar – TRUE when reachable.
POST a chat-completion request to an OpenAI-compatible endpoint
morie_llm_request_completion(
  base_url,
  model,
  messages,
  api_key = NULL,
  timeout = 120
)
base_url |
Provider base URL. |
model |
Model identifier. |
messages |
List of role/content lists. |
api_key |
Optional bearer token (NULL for local Ollama). |
timeout |
Seconds. Default 120. |
Parsed JSON list (the response body).
Resolution order:
1. Local RDS/CSV files in standard project locations
2. SQLite cache (data/cache/morie.db)
3. CKAN API fetch (requires internet)
morie_load_cpads(db_path = NULL, use_ckan = TRUE, con = NULL)
db_path |
Optional path to a SQLite/DuckDB file (default backend). |
use_ckan |
Logical; if TRUE and data not found locally or in cache, attempt to fetch from the CKAN API. |
con |
Optional pre-opened DBI connection (overrides |
A data.frame with canonical CPADS columns.
## Not run:
# Needs the CPADS PUMF (local file, cache, or a live CKAN fetch).
cpads <- morie_load_cpads(use_ckan = TRUE)
if (!is.null(cpads)) head(cpads)
## End(Not run)
Load the real CPADS CSV from this repository
morie_load_cpads_data(cpads_csv = .cpads_default_csv())
cpads_csv |
Path to the CPADS CSV. |
Canonicalized CPADS data frame.
# Reads and canonicalises the CPADS PUMF CSV. The default CSV lives in
# a morie project tree; the CKAN-fetched PUMF works identically (see
# morie_load_dataset("ocp21")). The tryCatch guard lets the example
# render cleanly on machines without the CSV checked out locally.
tryCatch(morie_load_cpads_data(), error = function(e) message(conditionMessage(e)))
Resolution tiers, tried in order: built-in DB -> user cache -> local
file -> CKAN datastore -> direct download URL -> ArcGIS layer ->
error. Supports fuzzy matching: morie_load_dataset("cpads_2021")
resolves to ocp21.
morie_load_dataset(key, db_path = NULL, refresh = FALSE, con = NULL)
key |
Dataset catalog key (or fuzzy match). |
db_path |
Optional path to a SQLite/DuckDB file (default backend). |
refresh |
If |
con |
Optional pre-opened DBI connection for the user cache
(overrides |
A data.frame.
morie_fetch, morie_ckan_search
## Not run:
df <- morie_load_dataset("ocp21")                  # CPADS 2021-2022 (default DuckDB cache)
df <- morie_load_dataset("ocp21", refresh = TRUE)  # force re-fetch
# PostgreSQL cache (run a server first):
# con <- DBI::dbConnect(RPostgres::Postgres(),
#                       host = "localhost", dbname = "morie", user = "...")
# df <- morie_load_dataset("ocp21", con = con)
## End(Not run)
Thin extender over locfdr::locfdr for Efron's
empirical-Bayes local false-discovery-rate estimation from a
vector of z-scores (Efron, 2004; Efron, 2010).
morie_locfdr_estimate(zz, ...)
zz |
Numeric vector of test statistics (typically z-scores) for the locfdr empirical-null fit. |
... |
Further arguments forwarded to |
A list with $method = "locfdr::locfdr" and
$raw (the locfdr object with the fitted local-FDR
curve, empirical-null parameters, and the fdr / Fdrleft /
Fdrright summaries).
## Not run:
if (requireNamespace("locfdr", quietly = TRUE)) {
  set.seed(1)
  zz <- c(stats::rnorm(900), stats::rnorm(100, mean = 3))
  morie_locfdr_estimate(zz)
}
## End(Not run)
Mann-Whitney U test (Wilcoxon rank-sum)
morie_mann_whitney_test(
  x1,
  x2,
  alternative = c("two.sided", "greater", "less")
)
x1 |
Numeric vector (group 1). |
x2 |
Numeric vector (group 2). |
alternative |
|
Named list: W, p_value, r (effect size).
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Reports sigma_m^2 via the VanRaden split sigma_g^2 / (2 sum p_j q_j) alongside the naive sigma_g^2 / p form. sigma_g^2 is obtained from a quick GBLUP fit.
morie_marker_variance(x, y, markers)
x |
Fixed-effect design (optional). |
y |
Numeric response. |
markers |
(n x m) genotype matrix coded 0/1/2. |
list(estimate, sigma_g2, sigma_e2, h2, sigma_m2_vanraden, sigma_m2_naive, sum_2pq, p_freq, n, p, method).
VanRaden (2008); Montesinos Lopez Ch 3.
morie_marker_variance(
  x = rnorm(50),
  y = rnorm(50),
  markers = matrix(sample(0:2, 200, TRUE), 50, 4)
)
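The two per-marker variance forms can be illustrated directly. This sketch assumes a value for sigma_g^2 instead of fitting GBLUP, and it reads the "p" in the naive form as the number of markers, which is an interpretation of the description above rather than something the page states outright.

```r
set.seed(1)
markers <- matrix(sample(0:2, 200, TRUE), 50, 4)  # 50 individuals, 4 markers coded 0/1/2
sigma_g2 <- 1.5                                   # assumed here; normally from a GBLUP fit

p_freq <- colMeans(markers) / 2                   # allele frequency per marker
sum_2pq <- sum(2 * p_freq * (1 - p_freq))         # VanRaden scaling term

sigma_m2_vanraden <- sigma_g2 / sum_2pq           # sigma_g^2 / (2 * sum p_j q_j)
sigma_m2_naive <- sigma_g2 / ncol(markers)        # sigma_g^2 / number of markers
```

The two forms agree only when every marker has 2 p_j q_j = 1, which 0/1/2 coding cannot reach (the maximum is 0.5), so the VanRaden value is always the larger of the two.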
Computes the conditional-variance Abadie-Imbens SE accounting for the fact that matching introduces correlation across matched observations.
morie_matching_abadie_imbens_se(
  data,
  outcome,
  treatment,
  match_pairs,
  n_matches = 1L
)
data |
Data frame. |
outcome, treatment
|
Column names. |
match_pairs |
Data frame of matched indices. |
n_matches |
Number of matches per treated unit (carried for parity). |
Scalar numeric Abadie-Imbens SE.
Abadie, A., & Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1), 235–267.
## Not run:
morie_matching_abadie_imbens_se(df, "y", "d", res$match_pairs)
## End(Not run)
Estimates the Average Treatment Effect on the Controls. Uses the
explicit _matched suffix to distinguish it from the IPW estimator
morie_estimate_atc in causal.R.
morie_matching_atc_matched(data, outcome, treatment, match_pairs, alpha = 0.05)
data |
Data frame. |
outcome |
Outcome column name. |
treatment |
Binary treatment column name. |
match_pairs |
Data frame with columns |
alpha |
Significance level for confidence intervals. |
A list of class morie_te_result.
## Not run:
morie_matching_atc_matched(df, "y", "d", res$match_pairs)
## End(Not run)
Estimates the Average Treatment Effect via a (weighted) mean difference
between treated and control outcomes. Uses the explicit
_matched suffix to distinguish it from the IPW estimator
morie_estimate_ate in causal.R.
morie_matching_ate_matched(
  data,
  outcome,
  treatment,
  covariates,
  weights = NULL,
  alpha = 0.05
)
data |
Data frame. |
outcome, treatment
|
Column names. |
covariates |
Character vector of covariates (carried for parity with the Python signature). |
weights |
Optional column of matching / weighting weights. |
alpha |
Significance level for confidence intervals. |
A list of class morie_te_result.
## Not run:
morie_matching_ate_matched(df, "y", "d", c("x1", "x2"), weights = "._cem_weight")
## End(Not run)
Estimates the Average Treatment Effect on the Treated using paired
differences from a matched sample. Uses the explicit _matched
suffix to distinguish it from the IPW estimator
morie_estimate_att in causal.R.
morie_matching_att_matched(
  data,
  outcome,
  treatment,
  match_pairs,
  weights = NULL,
  alpha = 0.05
)
data |
Data frame. |
outcome |
Outcome column name. |
treatment |
Binary treatment column name. |
match_pairs |
Data frame with columns |
weights |
Optional column of matching weights. |
alpha |
Significance level for confidence intervals. |
A list of class morie_te_result.
## Not run:
res <- morie_matching_nearest_neighbor(df, "d", c("x1", "x2"))
morie_matching_att_matched(df, "y", "d", res$match_pairs)
## End(Not run)
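The paired-difference idea behind the matched ATT can be sketched in base R. The match_pairs layout used here (columns treated_idx and control_idx holding row indices) is hypothetical, since the column names are not shown above, and the t-based interval is an assumption about how the confidence limits could be formed.

```r
# Toy data: rows 1-5 are treated, rows 6-10 are controls
df <- data.frame(
  d = rep(c(1, 0), each = 5),
  y = c(5, 6, 7, 8, 9, 4, 4, 6, 6, 7)
)

# Hypothetical match_pairs layout: one row per treated unit and its matched control
match_pairs <- data.frame(treated_idx = 1:5, control_idx = 6:10)

# ATT as the mean of within-pair outcome differences
diffs <- df$y[match_pairs$treated_idx] - df$y[match_pairs$control_idx]
att <- mean(diffs)
se <- sd(diffs) / sqrt(length(diffs))

alpha <- 0.05
ci <- att + c(-1, 1) * qt(1 - alpha / 2, df = length(diffs) - 1) * se
```

This naive standard error treats pairs as independent; matching with replacement breaks that assumption, which is the case morie_matching_abadie_imbens_se is meant to address.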
Reports standardised mean differences (SMD), variance ratios, and
Kolmogorov-Smirnov statistics for each covariate. For a richer
covariate-balance report (including continuous + categorical handling
and Love-plot rendering), see
cobalt::bal.tab /
cobalt::love.plot.
morie_matching_balance(
  data,
  treatment,
  covariates,
  weights = NULL,
  threshold = 0.1
)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
weights |
Optional column name of matching / weighting weights. |
threshold |
Absolute-SMD threshold for the |
A list of class morie_balance_result with
balance_table (a data frame) and scalar summaries
overall_balance, max_smd, balanced.
## Not run:
morie_matching_balance(df, "d", c("x1", "x2"))
## End(Not run)
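The three diagnostics can be computed by hand for the unweighted case. The helper balance_sketch below is hypothetical, and the pooled-SD denominator sqrt((var_t + var_c) / 2) is one common SMD convention assumed here; the package may standardise differently.

```r
# Hypothetical helper: unweighted SMD, variance ratio and KS statistic per covariate
balance_sketch <- function(data, treatment, covariates, threshold = 0.1) {
  is_t <- data[[treatment]] == 1
  rows <- lapply(covariates, function(v) {
    xt <- data[[v]][is_t]
    xc <- data[[v]][!is_t]
    data.frame(
      covariate = v,
      smd = (mean(xt) - mean(xc)) / sqrt((var(xt) + var(xc)) / 2),
      variance_ratio = var(xt) / var(xc),
      ks_stat = unname(suppressWarnings(ks.test(xt, xc)$statistic))
    )
  })
  tab <- do.call(rbind, rows)
  list(balance_table = tab,
       max_smd = max(abs(tab$smd)),
       balanced = all(abs(tab$smd) < threshold))
}

set.seed(1)
df <- data.frame(
  d = rep(c(1, 0), each = 50),
  x1 = c(rnorm(50, mean = 0.5), rnorm(50)),
  x2 = rnorm(100)
)
res <- balance_sketch(df, "d", c("x1", "x2"))
```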
Thin wrapper around morie_matching_balance returning only the
data-frame component. See cobalt::bal.tab
for an alternative with categorical-variable support.
morie_matching_balance_table(data, treatment, covariates, weights = NULL)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
weights |
Optional column name of matching / weighting weights. |
A data frame.
## Not run:
morie_matching_balance_table(df, "d", c("x1", "x2"))
## End(Not run)
Finds the largest matched sample with maximum absolute SMD below
balance_threshold. Uses an iterative caliper-tightening
heuristic over morie_matching_nearest_neighbor; for an
exact mixed-integer-programming alternative see
designmatch::cardmatch.
morie_matching_cardinality(
  data,
  treatment,
  covariates,
  balance_threshold = 0.1,
  ps = NULL
)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
balance_threshold |
Maximum absolute SMD tolerated (default 0.1). |
ps |
Optional pre-computed propensity scores. |
A list of class morie_match_result.
Zubizarreta, J. R. (2012). Using mixed integer programming for matching in an observational study of kidney failure after surgery. JASA, 107(500), 1360–1371.
## Not run:
morie_matching_cardinality(df, "d", c("x1", "x2"), balance_threshold = 0.1)
## End(Not run)
Thin wrapper around MatchIt::matchit(method = "cem"), which in
turn calls the cem package.
morie_matching_cem(data, treatment, covariates, n_bins = 5L)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
n_bins |
Either a single integer (applied to every covariate) or a
named list mapping covariate name to the number of bins
(forwarded as |
A list of class morie_match_result.
Iacus, S. M., King, G., & Porro, G. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 20(1), 1–24.
## Not run:
morie_matching_cem(df, "d", c("x1", "x2"), n_bins = 5)
## End(Not run)
Drops units whose propensity score falls outside the overlap region of treated and control units.
morie_matching_common_support(
  data,
  treatment,
  ps_col = "propensity_score",
  method = "minmax"
)
data |
Data frame. |
treatment |
Binary treatment column name. |
ps_col |
Propensity-score column name (default
|
method |
One of |
A subset of data on common support.
## Not run:
df$propensity_score <- morie_matching_estimate_propensity(df, "d", c("x1", "x2"))
morie_matching_common_support(df, "d")
## End(Not run)
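A min-max overlap rule can be sketched as follows. The helper common_support_minmax is hypothetical, and reading "minmax" as "keep scores between the larger of the two group minima and the smaller of the two group maxima" is an assumption about what the default method does.

```r
# Hypothetical helper: keep rows whose propensity score lies in the overlap region
common_support_minmax <- function(data, treatment, ps_col = "propensity_score") {
  ps <- data[[ps_col]]
  is_t <- data[[treatment]] == 1
  lo <- max(min(ps[is_t]), min(ps[!is_t]))  # larger of the two group minima
  hi <- min(max(ps[is_t]), max(ps[!is_t]))  # smaller of the two group maxima
  data[ps >= lo & ps <= hi, , drop = FALSE]
}

df <- data.frame(
  d = c(1, 1, 1, 0, 0, 0),
  propensity_score = c(0.9, 0.6, 0.4, 0.5, 0.3, 0.1)
)
trimmed <- common_support_minmax(df, "d")
```

In this toy data the overlap region is [0.4, 0.5], so only one treated and one control row survive, which shows how aggressively the rule can trim when the score distributions barely overlap.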
Matches on the propensity score, then applies bias-corrected linear regression adjustment within the matched sample. Standard errors come from a non-parametric bootstrap.
morie_matching_doubly_robust(
  data,
  outcome,
  treatment,
  covariates,
  ps = NULL,
  n_bootstrap = 200L,
  seed = 42L,
  alpha = 0.05
)
data |
Data frame. |
outcome, treatment
|
Column names. |
covariates |
Character vector of covariates. |
ps |
Optional pre-computed propensity scores. |
n_bootstrap |
Number of bootstrap replications. |
seed |
Random seed. |
alpha |
Significance level. |
A list of class morie_te_result with estimand
"ATT_DR".
## Not run:
morie_matching_doubly_robust(df, "y", "d", c("x1", "x2"), n_bootstrap = 200)
## End(Not run)
Thin wrapper around WeightIt::weightit(method = "ebal") (or
ebal::ebalance if WeightIt is unavailable). Computes
weights for the control group so that the weighted moments of the
covariates match those of the treated group.
morie_matching_entropy_balance(
  data,
  treatment,
  covariates,
  max_moment = 1L,
  max_iter = 500L,
  tol = 1e-06
)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
max_moment |
Highest moment to balance (1 = means, 2 = means + var, 3 = + skewness). |
max_iter |
Maximum Newton iterations (forwarded to ebal). |
tol |
Convergence tolerance (forwarded to ebal). |
A numeric vector of weights aligned to the rows of data
after dropping NAs. Treated units receive weight 1.
Hainmueller, J. (2012). Entropy balancing for causal effects. Political Analysis, 20(1), 25–46.
## Not run:
w <- morie_matching_entropy_balance(df, "d", c("x1", "x2"))
## End(Not run)
Estimates the probability of treatment via logistic regression or gradient boosting on a set of covariates.
morie_matching_estimate_propensity(
  data,
  treatment,
  covariates,
  model = "logistic",
  max_iter = 1000
)
data |
Data frame. |
treatment |
Name of the binary treatment column (0/1). |
covariates |
Character vector of covariate names. |
model |
One of |
max_iter |
Maximum iterations for logistic regression. |
A numeric vector of propensity scores aligned to the rows of
data (after dropping NAs in treatment or
covariates); the names of the vector are the row names
of the retained rows.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
## Not run:
df <- data.frame(d = rbinom(200, 1, 0.4), x1 = rnorm(200), x2 = rnorm(200))
ps <- morie_matching_estimate_propensity(df, "d", c("x1", "x2"))
## End(Not run)
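For the logistic case, the documented return shape (scores aligned to the retained rows and named by their row names) can be reproduced with stats::glm. This is a sketch of the idea only; how the package itself drops incomplete rows and passes max_iter to the fitter is an assumption.

```r
set.seed(1)
df <- data.frame(d = rbinom(200, 1, 0.4), x1 = rnorm(200), x2 = rnorm(200))
df$x1[3] <- NA                                   # one incomplete row for illustration

# Drop rows with NAs in the treatment or covariates before fitting
keep <- complete.cases(df[, c("d", "x1", "x2")])
fit <- glm(d ~ x1 + x2, family = binomial(), data = df[keep, ],
           control = glm.control(maxit = 1000))

# Fitted probabilities, named by the row names of the retained rows
ps <- fitted(fit)
```

Because the names carry the original row labels, the scores can be merged back onto df without relying on positional alignment.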
Thin wrapper around MatchIt::matchit(method = "exact").
morie_matching_exact(data, treatment, exact_vars)
data |
Data frame. |
treatment |
Binary treatment column name. |
exact_vars |
Character vector of discrete variables for exact matching. |
A list of class morie_match_result.
## Not run:
morie_matching_exact(df, "d", c("region", "year"))
## End(Not run)
Thin wrapper around MatchIt::matchit(method = "full"), which
calls optmatch.
morie_matching_full(data, treatment, covariates, ps = NULL, n_subclasses = 10L)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
ps |
Optional pre-computed propensity scores (ignored; retained for back-compat). |
n_subclasses |
Carried for back-compat (ignored under MatchIt). |
A list of class morie_match_result.
Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. JASA, 99(467), 609–618.
## Not run:
morie_matching_full(df, "d", c("x1", "x2"))
## End(Not run)
Thin wrapper around Matching::GenMatch + Matching::Match
that returns a morie_match_result. Uses a genetic algorithm
to find covariate weights for Mahalanobis distance matching that
maximise covariate balance.
morie_matching_genetic(
  data,
  treatment,
  covariates,
  n_neighbors = 1L,
  pop_size = 50L,
  n_generations = 20L,
  seed = 42L
)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
n_neighbors |
Number of matches per treated unit
( |
pop_size |
Genetic-algorithm population size (default 50). |
n_generations |
Number of GA generations. |
seed |
Random seed. |
A list of class morie_match_result.
Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects. Review of Economics and Statistics, 95(3), 932–945.
## Not run:
morie_matching_genetic(df, "d", c("x1", "x2"), pop_size = 50, n_generations = 20)
## End(Not run)
Matches treated and control units on the basis of their pre-treatment covariate values.
morie_matching_longitudinal(
  data,
  treatment,
  covariates,
  unit,
  time,
  treatment_time,
  n_pre_periods = 1L,
  method = "nearest_neighbor"
)
data |
Panel data frame. |
treatment |
Binary treatment indicator column. |
covariates |
Character vector of covariates. |
unit |
Column name identifying units. |
time |
Column name identifying time. |
treatment_time |
Column giving the (per-unit) start of treatment; non-finite values indicate never-treated. |
n_pre_periods |
Number of pre-treatment periods to summarise. |
method |
One of |
A list of class morie_match_result.
## Not run:
morie_matching_longitudinal(panel, "d", c("x1"), unit = "id", time = "t", treatment_time = "t0")
## End(Not run)
Returns a data frame suitable for plotting absolute SMDs before and
after matching. For a publication-ready plot, pass the same
matchit object to cobalt::love.plot.
morie_matching_love_plot_data(
  unmatched_data,
  matched_data,
  treatment,
  covariates,
  weights_col = NULL
)
unmatched_data, matched_data
|
Data frames. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
weights_col |
Optional column of matching weights in
|
A data frame with columns covariate, smd_before,
smd_after, abs_smd_before, abs_smd_after.
## Not run:
morie_matching_love_plot_data(df, res$matched_data, "d", c("x1", "x2"))
## End(Not run)
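As a rough illustration of what the smd_before / smd_after columns measure, the base-R sketch below computes a standardized mean difference for one covariate. The helper name smd_sketch and the pooled-SD denominator are assumptions made for illustration; the formula rmorie uses internally is not stated here and may differ.

```r
# Hypothetical helper (not part of rmorie): standardized mean difference
# with a pooled-SD denominator; `w` holds optional matching weights.
smd_sketch <- function(x, d, w = rep(1, length(x))) {
  m1 <- stats::weighted.mean(x[d == 1], w[d == 1])
  m0 <- stats::weighted.mean(x[d == 0], w[d == 0])
  s  <- sqrt((stats::var(x[d == 1]) + stats::var(x[d == 0])) / 2)
  (m1 - m0) / s
}

set.seed(1)
d <- rep(c(0, 1), each = 100)
x <- rnorm(200, mean = 0.5 * d)   # treated units are shifted upwards
smd_raw <- smd_sketch(x, d)
```

Taking abs() of the result gives the quantity usually shown on a love plot.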
Thin wrapper around MatchIt::matchit(distance = "mahalanobis").
morie_matching_mahalanobis(data, treatment, covariates, n_neighbors = 1L, caliper = NULL, replace = FALSE, exact = NULL)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of continuous covariates. |
n_neighbors |
Number of matches per treated unit. |
caliper |
Maximum Mahalanobis distance for a valid match. |
replace |
If |
exact |
Optional character vector of variables to match exactly prior to distance matching. |
A list of class morie_match_result.
## Not run:
morie_matching_mahalanobis(df, "d", c("x1", "x2"), n_neighbors = 1)
## End(Not run)
For each non-reference treatment level, matches treated units to the reference group via the chosen binary matching method.
morie_matching_multi_treatment(data, treatment, covariates, reference_group = NULL, method = "nearest_neighbor")
data |
Data frame. |
treatment |
Treatment column (may take more than two levels). |
covariates |
Character vector of covariates. |
reference_group |
Optional reference level (defaults to the modal level). |
method |
One of |
A named list whose keys are treatment levels and whose values
are morie_match_result objects.
## Not run:
morie_matching_multi_treatment(df, "treat3", c("x1", "x2"))
## End(Not run)
Thin wrapper around MatchIt::matchit(method = "nearest") that
returns a morie_match_result list for compatibility with the
rmorie balance / treatment-effect helpers.
morie_matching_nearest_neighbor(data, treatment, covariates, n_neighbors = 1L, caliper = NULL, replace = FALSE, ps = NULL, alpha = 0.05)
data |
Data frame. |
treatment |
Binary treatment column (0/1). |
covariates |
Character vector of covariates for the propensity model. |
n_neighbors |
Number of matches per treated unit
(forwarded as |
caliper |
Maximum logit-propensity distance for a valid match,
expressed in SD units of the logit (or |
replace |
If |
ps |
Optional pre-computed propensity scores (ignored; retained for back-compat). |
alpha |
Significance level (carried through to |
A list of class morie_match_result.
## Not run:
res <- morie_matching_nearest_neighbor(df, "d", c("x1", "x2"), caliper = 0.2)
## End(Not run)
Thin wrapper around MatchIt::matchit(method = "optimal"), which
calls optmatch.
morie_matching_optimal_pair(data, treatment, covariates, distance = "propensity", ps = NULL)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
distance |
One of |
ps |
Optional pre-computed propensity scores (ignored; retained for back-compat). |
A list of class morie_match_result.
## Not run:
morie_matching_optimal_pair(df, "d", c("x1", "x2"))
## End(Not run)
Reports the propensity-score range overlap between treated and control, the number / percentage of units off support, and the IPW effective sample size.
morie_matching_overlap(data, treatment, covariates, ps = NULL)
data |
Data frame. |
treatment |
Binary treatment column. |
covariates |
Character vector of covariates. |
ps |
Optional pre-computed propensity scores. |
A list with ps_summary (per-group quantiles),
overlap_region, n_off_support, pct_off_support,
and effective_sample_size.
## Not run:
morie_matching_overlap(df, "d", c("x1", "x2"))
## End(Not run)
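The quantities reported above can be sketched in base R. The example below is only an illustration under stated assumptions: it estimates propensity scores with a plain logistic regression, takes the overlap region as the intersection of the two groups' score ranges, and uses the Kish formula for the effective sample size of ATE-type inverse-probability weights. rmorie's own definitions may differ in detail.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x))
ps <- stats::fitted(stats::glm(d ~ x, family = stats::binomial()))

# Common-support region: intersection of the treated and control score ranges.
lo <- max(min(ps[d == 1]), min(ps[d == 0]))
hi <- min(max(ps[d == 1]), max(ps[d == 0]))
off_support <- ps < lo | ps > hi
pct_off <- 100 * mean(off_support)

# Kish effective sample size of the inverse-probability weights.
w   <- ifelse(d == 1, 1 / ps, 1 / (1 - ps))
ess <- sum(w)^2 / sum(w^2)
```

An effective sample size far below n signals that a few units carry most of the weight, which is usually a sign of poor overlap.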
Compares balance before and after matching and reports percent bias reduction, count of balanced covariates, and overlap statistics.
morie_matching_quality(unmatched_data, matched_data, treatment, covariates, weights = NULL)
unmatched_data, matched_data
|
Data frames. |
treatment |
Binary treatment column. |
covariates |
Character vector of covariates. |
weights |
Optional column of matching weights in |
A list with balance_before, balance_after,
bias_reduction, mean_bias_reduction,
pct_balanced_before, pct_balanced_after,
n_obs_before, n_obs_after.
## Not run:
morie_matching_quality(df, res$matched_data, "d", c("x1", "x2"))
## End(Not run)
Computes bounds on the p-value for the treatment effect over a grid of
values of gamma (the maximum odds ratio of differential treatment
assignment due to an unobserved confounder). Uses the Wilcoxon
signed-rank approach. For exact bounds, see
sensitivitymv::senmv or the
rbounds package.
morie_matching_rosenbaum_bounds(data, outcome, treatment, match_pairs, gamma_range = NULL)
data |
Data frame. |
outcome, treatment
|
Column names. |
match_pairs |
Data frame of matched indices. |
gamma_range |
Optional numeric vector of |
A data frame with columns gamma, p_lower,
p_upper, significant_lower, significant_upper.
Rosenbaum, P. R. (2002). Observational Studies (2nd ed.). Springer.
## Not run:
morie_matching_rosenbaum_bounds(df, "y", "d", res$match_pairs)
## End(Not run)
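To make the role of gamma concrete, here is a minimal sketch of the large-sample Rosenbaum bounds for the Wilcoxon signed-rank statistic, computed on matched-pair outcome differences. The helper name rosenbaum_sketch, its input format (a vector of treated-minus-control differences) and the one-sided upper-tail p-values are assumptions for illustration, not a description of the rmorie implementation.

```r
# Hypothetical helper: normal-approximation bounds on the one-sided p-value
# of the signed-rank statistic under hidden bias of magnitude gamma.
rosenbaum_sketch <- function(diffs, gammas = c(1, 1.5, 2, 3)) {
  diffs <- diffs[diffs != 0]                      # drop zero differences
  n     <- length(diffs)
  t_obs <- sum(rank(abs(diffs))[diffs > 0])       # sum of positive ranks
  total <- n * (n + 1) / 2
  out <- lapply(gammas, function(g) {
    p_plus  <- g / (1 + g)                        # most favourable sign probability
    p_minus <- 1 / (1 + g)                        # least favourable sign probability
    v <- p_plus * (1 - p_plus) * n * (n + 1) * (2 * n + 1) / 6
    data.frame(
      gamma   = g,
      p_lower = stats::pnorm((t_obs - p_minus * total) / sqrt(v), lower.tail = FALSE),
      p_upper = stats::pnorm((t_obs - p_plus * total) / sqrt(v), lower.tail = FALSE)
    )
  })
  do.call(rbind, out)
}

set.seed(1)
bounds <- rosenbaum_sketch(rnorm(40, mean = 0.5))
```

At gamma = 1 the two bounds coincide with the usual signed-rank p-value; the smallest gamma at which p_upper crosses the significance level indicates how much hidden bias would be needed to overturn the finding.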
Thin wrapper around MatchIt::matchit(method = "subclass") that
reports within-stratum sample sizes and PS ranges, preserving the
rmorie return shape (data_with_strata + stratum_effects).
morie_matching_subclassify(data, treatment, covariates, ps = NULL, n_strata = 5L)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
ps |
Optional pre-computed propensity scores (ignored; retained for back-compat). |
n_strata |
Number of quantile-based strata (default 5). |
A list with components data_with_strata (the matched
data augmented with ._stratum and ._ps columns) and
stratum_effects (per-stratum sample sizes and PS ranges).
## Not run:
morie_matching_subclassify(df, "d", c("x1", "x2"), n_strata = 5)
## End(Not run)
Clips propensity scores to [lower, upper].
morie_matching_trim_propensity(ps, lower = 0.01, upper = 0.99)
ps |
Numeric vector of propensity scores. |
lower, upper
|
Numeric clip bounds (defaults 0.01, 0.99). |
A numeric vector of the same length as ps.
morie_matching_trim_propensity(c(0.001, 0.5, 0.999))
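The clipping operation itself is a one-liner in base R. The stand-in below (clip_ps is a hypothetical name, not an rmorie function) shows the behaviour one would expect from the description above: values below lower are raised to lower, values above upper are lowered to upper, and everything else passes through unchanged.

```r
# Hypothetical stand-in for the clipping step: bound scores to [lower, upper].
clip_ps <- function(ps, lower = 0.01, upper = 0.99) {
  pmin(pmax(ps, lower), upper)
}

clipped <- clip_ps(c(0.001, 0.5, 0.999))   # 0.01, 0.50, 0.99
```

Clipping keeps inverse-probability weights bounded (here at most 100) at the cost of a small bias for units with extreme scores.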
Thin wrapper around MatchIt::matchit(method = "nearest",
ratio = max_ratio, min.controls = min_ratio) which supports
variable-ratio nearest-neighbour matching natively.
morie_matching_variable_ratio(data, treatment, covariates, min_ratio = 1L, max_ratio = 5L, caliper = 0.2, ps = NULL)
data |
Data frame. |
treatment |
Binary treatment column name. |
covariates |
Character vector of covariates. |
min_ratio, max_ratio
|
Match-count bounds per treated unit. |
caliper |
Caliper on the propensity score (in SD units). |
ps |
Optional pre-computed propensity scores (ignored; retained for back-compat). |
A list of class morie_match_result.
## Not run:
morie_matching_variable_ratio(df, "d", c("x1", "x2"), min_ratio = 1, max_ratio = 3)
## End(Not run)
Thin extender over metafor::rma that fits a (possibly
moderated) random- or fixed-effects meta-analytic model to
per-study effect sizes and their sampling variances.
morie_meta_rma(yi, vi, data = NULL, ...)
yi |
Numeric vector of study-level effect-size estimates. |
vi |
Numeric vector of sampling variances corresponding to
|
data |
Optional data frame to evaluate |
... |
Further arguments forwarded to |
A list with $method = "metafor::rma" and
$raw (an rma.uni object with the pooled
estimate, heterogeneity statistics and moderator effects).
## Not run:
if (requireNamespace("metafor", quietly = TRUE)) {
  set.seed(1)
  k <- 12
  vi <- stats::runif(k, 0.02, 0.10)
  yi <- stats::rnorm(k, mean = 0.3, sd = sqrt(vi))
  morie_meta_rma(yi = yi, vi = vi)
}
## End(Not run)
MIDAS regression with Beta-polynomial weights
morie_midas_regression(x, y, K = NULL)
x |
High-frequency regressor matrix (n_t x K) or flat vector. |
y |
Low-frequency target (length n_t). |
K |
Number of high-frequency lags (required when x is flat). |
Named list with beta0, beta1, theta1, theta2, weights,
r2, n, K, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Mini-batch stochastic gradient descent for OLS (R parity)
morie_mini_batch_gradient(x, y, lr = 0.01, n_epochs = 200, batch_size = 32L, seed = 0L)
x |
Numeric matrix / vector predictors. |
y |
Numeric response. |
lr |
Learning rate. |
n_epochs |
Number of passes over the data. |
batch_size |
Mini-batch size. |
seed |
RNG seed for shuffling. |
Named list: estimate, reference_ols, n_epochs, batch_size, loss, n, method.
morie_mini_batch_gradient(x = rnorm(50), y = rnorm(50))
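For readers who want to see the mechanics, the following base-R sketch runs mini-batch gradient descent on the squared-error loss and compares the result with the closed-form OLS fit. The learning rate, data-generating process and variable names are illustrative assumptions; this is not the rmorie source.

```r
set.seed(0)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)
X <- cbind(1, x)                       # design matrix with an intercept column
beta <- c(0, 0)
lr <- 0.05; batch_size <- 32; n_epochs <- 200

for (epoch in seq_len(n_epochs)) {
  idx <- sample(n)                     # reshuffle the rows every epoch
  for (start in seq(1, n, by = batch_size)) {
    b     <- idx[start:min(start + batch_size - 1, n)]
    Xb    <- X[b, , drop = FALSE]
    resid <- drop(Xb %*% beta) - y[b]
    grad  <- 2 * drop(crossprod(Xb, resid)) / length(b)   # gradient of the batch MSE
    beta  <- beta - lr * grad
  }
}

ols <- unname(stats::coef(stats::lm(y ~ x)))   # reference solution
```

With a constant learning rate the iterates hover near, rather than exactly at, the OLS solution, which is why a reference_ols component is useful for comparison.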
R port of morie.ml.apply_smote. Uses smotefamily::SMOTE when
installed and feasible; falls back to random oversampling (duplicate
minority rows) otherwise. Returns the resampled (X, y) together
with a status list of before/after counts and the method used.
morie_ml_apply_smote(X, y, random_state = 42L, k_neighbors = NULL)
X |
Feature data.frame. |
y |
Binary outcome vector (numeric or factor). |
random_state |
Integer seed for the random fallback. Default 42. |
k_neighbors |
Integer or NULL. SMOTE neighbour count; auto-picked
to |
list(X, y, status) where status mirrors the Python dict
keys (method, minority_before, majority_before,
imbalance_ratio_before, total_before, total_after, plus
class_<label>_before / class_<label>_after).
Fits a 100-tree Random Forest on training data and reports a
classification report (precision / recall / F1 / support per class)
on the held-out test set. Mirrors morie.ml.eval_robustness.
morie_ml_eval_robustness(X, y, test_X, test_y, n_estimators = 100L, random_state = 42L)
X |
Training features (data.frame or matrix). |
y |
Training labels (factor or coercible to factor). |
test_X |
Test features. |
test_y |
Test labels. |
n_estimators |
Number of trees. Default 100. |
random_state |
Integer seed. Default 42. |
Named list keyed by class label and accuracy with
precision / recall / f1-score / support per class, mirroring
sklearn's classification_report(output_dict=True).
Multi-trait GBLUP via vec-stacked mixed-model equations
morie_multi_trait_gblup(x, y, markers, Sigma_g = NULL, Sigma_e = NULL)
x |
Fixed-effect design (vector or matrix). |
y |
Multi-trait response (n x t). |
markers |
Genotype matrix (n x m). |
Sigma_g |
Optional t x t genetic covariance. |
Sigma_e |
Optional t x t residual covariance. |
list(estimate, G_hat, B_hat, Sigma_g, Sigma_e, n, t, method).
Montesinos Lopez Ch 10.
morie_multi_trait_gblup(
  x = rnorm(50),
  y = rnorm(50),
  markers = matrix(sample(0:2, 200, TRUE), 50, 4)
)
R port of morie.multiple_testing. Provides p-value adjustment
methods controlling the family-wise error rate (FWER) and the false
discovery rate (FDR), simultaneous-inference helpers, p-value
combination procedures, and gatekeeping / hierarchical testing.
Every adjustment routine returns a morie_rich_result list
carrying original, adjusted, rejected,
method, alpha, n_rejected, n_tests, and
an interpretation paragraph. P-value combination procedures
return a list with the test statistic, combined p-value, and
interpretation.
FWER and FDR methods delegate to stats::p.adjust for the
textbook procedures (Bonferroni, Holm, Hochberg, Hommel,
Benjamini-Hochberg, Benjamini-Yekutieli). Combined-p tests delegate
to poolr when installed and fall back to inline math
otherwise. storey_q / estimate_pi0 delegate to
qvalue when installed.
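Because the textbook adjustments delegate to stats::p.adjust, their behaviour can be previewed directly in base R. The sketch below adjusts a small vector of made-up p-values under three of the procedures named above and flags rejections at alpha = 0.05; the input values are illustrative only.

```r
p_raw <- c(0.001, 0.012, 0.03, 0.04, 0.2)
alpha <- 0.05

# One column of adjusted p-values per procedure.
adj <- sapply(c(holm = "holm", BH = "BH", BY = "BY"),
              function(m) stats::p.adjust(p_raw, method = m))

rejected   <- adj <= alpha          # logical matrix, same shape as `adj`
n_rejected <- colSums(rejected)     # rejections per procedure
```

Holm controls the family-wise error rate and is therefore never less conservative than Benjamini-Hochberg, which controls the false discovery rate.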
Draw multivariate normal samples under a structured covariance
morie_mvn_with_covariance(n, p, rng, kernel = c("ar1", "independent", "compound", "toeplitz"), rho = 0.5, mean = NULL)
n |
Number of samples. |
p |
Dimension. |
rng |
|
kernel |
One of |
rho |
Correlation parameter. |
mean |
Optional length-p mean vector. |
An n x p matrix of samples.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
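A minimal base-R sketch of the idea follows, assuming the conventional definitions of the "ar1" kernel (rho^|i - j|) and the "compound" kernel (rho off the diagonal, 1 on it). How rmorie parameterises the "toeplitz" kernel and the rng argument is not stated here, so both are left out.

```r
p <- 4; rho <- 0.5; n <- 1000

# AR(1) kernel: correlation decays geometrically with the lag |i - j|.
Sigma_ar1 <- rho^abs(outer(seq_len(p), seq_len(p), "-"))

# Compound-symmetry kernel: constant correlation between every pair.
Sigma_cs <- matrix(rho, p, p)
diag(Sigma_cs) <- 1

# Draw via the Cholesky factor R (t(R) %*% R = Sigma): rows of Z %*% R
# are multivariate normal with covariance Sigma.
set.seed(1)
Z <- matrix(rnorm(n * p), n, p)
X <- Z %*% chol(Sigma_ar1)
```

Adding a mean vector is a matter of sweeping it onto the columns, for example sweep(X, 2, mu, "+") for a length-p vector mu.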
Thin extender over mvtnorm::pmvnorm that evaluates the
multivariate normal CDF over the hyper-rectangle
bounded by lower and upper.
morie_mvnorm_pmv(lower, upper, mean = rep(0, length(lower)), sigma, ...)
lower |
Numeric vector of lower integration limits
( |
upper |
Numeric vector of upper integration limits
( |
mean |
Numeric mean vector of the same length as
|
sigma |
Numeric positive-(semi)definite covariance matrix. |
... |
Further arguments forwarded to
|
A list with $method = "mvtnorm::pmvnorm" and
$raw (a numeric scalar with the estimated probability
and Monte-Carlo error attributes attached).
## Not run:
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  S <- matrix(c(1, 0.4, 0.4, 1), 2, 2)
  morie_mvnorm_pmv(lower = c(-1, -1), upper = c(1, 1), mean = c(0, 0), sigma = S)
}
## End(Not run)
Thin extender over mvtnorm::rmvnorm that draws n
observations from the multivariate normal distribution with a
given mean vector and covariance matrix.
morie_mvnorm_sample(n, mean = rep(0, ncol(sigma)), sigma, ...)
n |
Integer; the number of multivariate observations to draw. |
mean |
Numeric vector of length |
sigma |
Numeric positive-(semi)definite covariance matrix. |
... |
Further arguments forwarded to
|
A list with $method = "mvtnorm::rmvnorm" and
$raw (a numeric matrix with n rows and
ncol(sigma) columns).
## Not run:
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(1)
  S <- matrix(c(1, 0.4, 0.4, 1), 2, 2)
  morie_mvnorm_sample(100, mean = c(0, 0), sigma = S)
}
## End(Not run)
N-BEATS-style polynomial + Fourier basis-expansion forecasting
morie_nbeats_basis(x, horizon = 1, n_trend = 3, n_season = 5, period = 12)
x |
Numeric history. |
horizon |
Forecast horizon. Default 1. |
n_trend |
Polynomial-trend degree. Default 3. |
n_season |
Number of Fourier harmonics. Default 5. |
period |
Seasonal period. Default 12. |
Named list with forecast, fitted, trend, seasonal,
theta_trend, theta_seasonal, r2, n, horizon, method.
morie_nbeats_basis(x = rnorm(50))
Thin extender over np::npregbw + np::npreg for
kernel-smoothed nonparametric regression with data-driven
bandwidth selection (Hayfield & Racine, 2008). Runs the
bandwidth-selection routine first and then fits the regression
using the chosen bandwidths.
morie_np_kernel_reg(formula, data, ...)
formula |
A model formula of the form
|
data |
A data frame containing the variables in
|
... |
Further arguments forwarded to |
A list with $method = "np::npreg (bws via
npregbw)" and $raw, where $raw is itself a
list with $bws (the rbandwidth object from
np::npregbw) and $fit (the npregression
object from np::npreg).
## Not run:
if (requireNamespace("np", quietly = TRUE)) {
  set.seed(1)
  n <- 50
  df <- data.frame(x = stats::runif(n, -1, 1))
  df$y <- sin(pi * df$x) + stats::rnorm(n, sd = 0.1)
  morie_np_kernel_reg(y ~ x, data = df)
}
## End(Not run)
All NYC OpenData SODA2 endpoints apply a default cap of 1,000 rows
per request unless an explicit $limit (or $$app_token for
authenticated requests) is supplied. For the NYPD CJ datasets
wrapped here that means:
morie_datasets_nyc_nypd_arrests_ytd(offline = FALSE) returns
only 1,000 rows by default, even though the live feed
carries ~69,300 rows.
Pass max_features = N to lift the single-request cap to N
rows (Socrata enforces a hard server-side cap of 50,000 rows
per request).
Pagination (wired in 3OO). For full pulls over the cap,
pass paginate = TRUE. morie walks SODA2 $offset in
page_size-row chunks until the server returns a short page
(exhausted) or max_features is reached. Without an app_token
the per-request ceiling is 1,000 rows so page_size = 1000 is
the default; with page_size = 50000 and an app_token you can
pull the full ~69K-row arrests_ytd feed in two requests.
max_pages (default 200) is a safety net against runaway pulls.
Worked example:
# Full live pull of the YTD arrests feed (~69K rows over ~70 pages).
df <- morie_datasets_nyc_nypd_arrests_ytd(offline = FALSE, paginate = TRUE)
# First 5,000 rows only (5 paged requests of 1,000 each).
df <- morie_datasets_nyc_nypd_arrests_ytd(offline = FALSE, paginate = TRUE, max_features = 5000L)
The bundled fixtures (offline mode) are unaffected – they ship 5
rows each as deterministic sample data, and max_features simply
truncates the fixture.
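The paging rule described above (advance the offset by one page at a time, stop on a short page, on max_features, or on max_pages) can be sketched without any network access. In the example below, mock_fetch and paginate_sketch are hypothetical names, and the mock stands in for an HTTP request that would carry $limit and $offset; none of this is the rmorie implementation.

```r
# Mock of a SODA2-style endpoint holding `total` rows: returns up to `limit`
# rows starting at `offset`. A real pull would issue an HTTP request instead.
mock_fetch <- function(offset, limit, total = 2350L) {
  if (offset >= total) return(data.frame(id = integer(0)))
  data.frame(id = seq.int(offset + 1L, min(offset + limit, total)))
}

paginate_sketch <- function(fetch, page_size = 1000L, max_features = Inf,
                            max_pages = 200L) {
  pages  <- list()
  n_rows <- 0L
  for (i in seq_len(max_pages)) {          # max_pages guards against runaway pulls
    want <- min(page_size, max_features - n_rows)
    if (want <= 0) break                   # max_features reached
    page <- fetch(offset = n_rows, limit = want)
    pages[[i]] <- page
    n_rows <- n_rows + nrow(page)
    if (nrow(page) < want) break           # short page: feed exhausted
  }
  do.call(rbind, pages)
}

all_rows   <- paginate_sketch(mock_fetch)                        # 3 requests
first_1500 <- paginate_sketch(mock_fetch, max_features = 1500L)  # 2 requests
```

Using the running row count as the next offset keeps pages contiguous even when the final page is requested with a smaller limit.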
Odds ratio and (1 - alpha) confidence interval (95% by default) from a 2x2 contingency table
morie_odds_ratio_ci(table_2x2, alpha = 0.05)
table_2x2 |
A 2x2 matrix: rows are treatment, columns are outcome. |
alpha |
Significance level. |
Named list: or, ci_lower, ci_upper, p_value.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
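One standard way to obtain such an interval is the Woolf (log-scale Wald) method, sketched below in base R. Whether rmorie uses this method or another (for example an exact conditional interval) is not stated above, so treat the helper or_ci_sketch and its Wald p-value as assumptions for illustration.

```r
# Hypothetical helper: Woolf interval for the odds ratio of a 2x2 table
# (rows = treatment, columns = outcome).
or_ci_sketch <- function(tab, alpha = 0.05) {
  or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  se <- sqrt(sum(1 / tab))                 # standard error of log(OR)
  z  <- stats::qnorm(1 - alpha / 2)
  list(or       = or,
       ci_lower = exp(log(or) - z * se),
       ci_upper = exp(log(or) + z * se),
       p_value  = 2 * stats::pnorm(-abs(log(or) / se)))
}

tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(treatment = c("yes", "no"),
                              outcome   = c("event", "no event")))
res <- or_ci_sketch(tab)   # odds ratio = (30 * 40) / (20 * 10) = 6
```

The Woolf interval breaks down when any cell is zero; a continuity correction or an exact method is the usual remedy in that case.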
Omega-squared (less biased than eta-squared)
morie_omega_squared(f_stat, df_between, df_within, n)
f_stat |
F statistic. |
df_between |
Degrees of freedom (numerator). |
df_within |
Degrees of freedom (denominator). |
n |
Total sample size. |
Numeric omega-squared.
morie_omega_squared(f_stat = 5.2, df_between = 2, df_within = 87, n = 90)
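For a one-way design, omega-squared can be written purely in terms of the F statistic: df_between * (F - 1) / (df_between * (F - 1) + n). The sketch below applies that textbook form to the inputs of the example above; the helper name and the truncation at zero are assumptions, and rmorie may handle negative estimates differently.

```r
# Hypothetical helper using the F-statistic form of omega-squared.
omega_sq_sketch <- function(f_stat, df_between, n) {
  num <- df_between * (f_stat - 1)
  max(0, num / (num + n))     # truncate at zero, a common convention
}

w2 <- omega_sq_sketch(f_stat = 5.2, df_between = 2, n = 90)   # 8.4 / 98.4
```

Eta-squared for the same inputs is 2 * 5.2 / (2 * 5.2 + 87), which is larger, illustrating the upward bias that omega-squared corrects.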
For an ordered sample the coverages U_i = F(X_(i)) - F(X_(i-1)) are identically distributed as Beta(1, n) under H0 (they are exchangeable but not independent, since all n + 1 coverages sum to 1). Returns empirical coverages (rank-based) plus the cumulative coverage F(X_(n)) - F(X_(1)).
morie_one_sample_coverage(x)
x |
Numeric vector. |
Named list: coverages, cumulative, expected, n, sample_min, sample_max, method.
morie_one_sample_coverage(x = rnorm(50))
One-sample t-test
morie_one_sample_t_test(x, mu0 = 0, alternative = c("two.sided", "greater", "less"))
x |
Numeric vector. |
mu0 |
Null hypothesis mean (default 0). |
alternative |
|
Named list: t, df, p_value, ci.
morie_one_sample_t_test(x = rnorm(50))
Tests H0: F_1 = ... = F_k against the ordered alternative H1: F_1 <= F_2 <= ... <= F_k. J = sum over i<j of U_ij (Mann-Whitney counts with 1/2 weight for ties).
morie_ordered_alternatives_test(groups)
groups |
List of numeric vectors in monotone hypothesised order. |
Normal approximation:
E_J = (N^2 - sum n_i^2) / 4
Var_J = (N^2 (2N + 3) - sum n_i^2 (2 n_i + 3)) / 72
Named list: statistic, p_value, z, E_J, Var_J, n, k, method.
morie_ordered_alternatives_test(groups = list(rnorm(20), rnorm(20), rnorm(20)))
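The statistic and its normal approximation are short enough to write out in base R. The sketch below follows the formulas given above (pairwise Mann-Whitney counts with half weight for ties, then E_J and Var_J); the helper name jt_sketch and the one-sided upper-tail p-value are assumptions for illustration.

```r
# Hypothetical helper: Jonckheere-Terpstra J with the normal approximation.
jt_sketch <- function(groups) {
  k <- length(groups)
  J <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      d <- outer(groups[[i]], groups[[j]], "-")   # x_i - x_j for every pair
      J <- J + sum(d < 0) + 0.5 * sum(d == 0)     # count x_i < x_j, ties count 1/2
    }
  }
  n_i   <- lengths(groups)
  N     <- sum(n_i)
  E_J   <- (N^2 - sum(n_i^2)) / 4
  Var_J <- (N^2 * (2 * N + 3) - sum(n_i^2 * (2 * n_i + 3))) / 72
  z     <- (J - E_J) / sqrt(Var_J)
  list(statistic = J, E_J = E_J, Var_J = Var_J, z = z,
       p_value = stats::pnorm(z, lower.tail = FALSE))
}

# Perfectly ordered groups: every cross-group pair is concordant.
res <- jt_sketch(list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9)))
# J = 27, E_J = 13.5, Var_J = 20.25, z = 3
```

The variance formula shown does not include a tie correction, so heavily tied data call for a corrected variance or a permutation p-value.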
M^2 = (n - 1) * cor(u, v)^2 ~ chi^2_1 under independence, where (u, v) are row/column scores weighted by cell counts.
morie_ordered_categories(x, row_scores = NULL, col_scores = NULL)
x |
r x c contingency table. |
row_scores |
Length-r row scores; default 1..r. |
col_scores |
Length-c col scores; default 1..c. |
Named list: statistic (M^2), p_value, df, n, correlation.
morie_ordered_categories(x = matrix(c(20, 10, 5, 10, 15, 10, 5, 10, 20), nrow = 3, byrow = TRUE))
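The M^2 statistic can be reproduced in base R by computing the count-weighted correlation between row and column scores. The helper m2_sketch below is a hypothetical illustration with the default integer scores 1..r and 1..c; it is not the rmorie source.

```r
# Hypothetical helper: linear-by-linear association test for an r x c table.
m2_sketch <- function(tab, u = seq_len(nrow(tab)), v = seq_len(ncol(tab))) {
  n    <- sum(tab)
  pr   <- rowSums(tab) / n                 # row marginal proportions
  pc   <- colSums(tab) / n                 # column marginal proportions
  ubar <- sum(u * pr)
  vbar <- sum(v * pc)
  cov_uv <- sum(outer(u - ubar, v - vbar) * tab) / n
  r  <- cov_uv / sqrt(sum((u - ubar)^2 * pr) * sum((v - vbar)^2 * pc))
  M2 <- (n - 1) * r^2
  list(statistic = M2, correlation = r, df = 1, n = n,
       p_value = stats::pchisq(M2, df = 1, lower.tail = FALSE))
}

tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 20), nrow = 3, byrow = TRUE)
res <- m2_sketch(tab)
```

Because the test spends a single degree of freedom on the linear trend, it is typically more powerful than the general chi-squared test of independence when both variables are ordinal.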
Uses n_folds-fold cross-fitting: the propensity model (ridge-penalised
logistic regression) and the outcome regressions (OLS, fit separately
for D=1 and D=0) are fit on n_folds - 1 folds and predicted on the
held-out fold. The doubly-robust influence-function values
(Robins-Rotnitzky-Zhao 1994) are then averaged to yield the ATE.
morie_otis_aipw_ate(df, treatment, outcome, covariates, n_folds = 5L, seed = 123L, eps = 0.02)
df |
A data frame containing |
treatment |
Name of the binary treatment column. |
outcome |
Name of the (numeric) outcome column. |
covariates |
Character vector of covariate names. |
n_folds |
Number of cross-fitting folds (default 5). |
seed |
Integer seed for the fold partition (default 123). |
eps |
Propensity clip bound (default 0.02). |
A morie_causal_estimate list.
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427), 846-866.
set.seed(1)
n <- 300L
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
y <- 0.5 * d + x + rnorm(n)
df <- data.frame(d = d, y = y, x = x)
morie_otis_aipw_ate(df, treatment = "d", outcome = "y", covariates = "x", n_folds = 3L)
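The cross-fitting recipe can be sketched in base R as follows. This is a simplified illustration, not the rmorie implementation: it uses an unpenalised logistic regression for the propensity model instead of the ridge fit described above, and all variable names are chosen for the example.

```r
set.seed(1)
n <- 600
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
y <- 0.5 * d + x + rnorm(n)              # true ATE is 0.5
df <- data.frame(d = d, y = y, x = x)

n_folds <- 3; eps <- 0.02
fold <- sample(rep(seq_len(n_folds), length.out = n))
psi  <- numeric(n)                       # per-unit influence-function values

for (k in seq_len(n_folds)) {
  train <- df[fold != k, ]
  test  <- df[fold == k, ]
  # Nuisance models are fit on the other folds only (plain logit, not ridge).
  ps_fit <- stats::glm(d ~ x, family = stats::binomial(), data = train)
  e_hat  <- stats::predict(ps_fit, newdata = test, type = "response")
  e_hat  <- pmin(pmax(e_hat, eps), 1 - eps)          # clip to [eps, 1 - eps]
  m1 <- stats::predict(stats::lm(y ~ x, data = train[train$d == 1, ]), newdata = test)
  m0 <- stats::predict(stats::lm(y ~ x, data = train[train$d == 0, ]), newdata = test)
  # Doubly-robust score: outcome-model contrast plus weighted residual corrections.
  psi[fold == k] <- m1 - m0 +
    test$d * (test$y - m1) / e_hat -
    (1 - test$d) * (test$y - m0) / (1 - e_hat)
}

ate <- mean(psi)
se  <- stats::sd(psi) / sqrt(n)          # influence-function standard error
```

The estimate stays consistent if either the propensity model or the outcome regressions are correctly specified, which is the sense in which the estimator is doubly robust.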
The Python version stacks RF + ridge + OLS/logit + mean (and optionally xgboost) via cross-validated convex weights. The R port would require pulling in SuperLearner or hand-rolling the stacked-cross-fit construction.
morie_otis_aipw_superlearner(...)
... |
Arguments mirroring |
Stops with a NotYetPorted message; for the time
being, call morie_otis_aipw_ate() with the default
cross-fit OLS+logit stack.
## Not run:
morie_otis_aipw_superlearner(df, treatment = "d", outcome = "y", covariates = "x")
## End(Not run)
Calls morie_otis_rplace / morie_otis_astcmb /
morie_otis_volat / morie_otis_rctrnd /
morie_otis_otdesc on df for one fiscal year and
returns a named list of morie_otis_result objects. If
out_dir is supplied, each result is also written to disk as a
.txt (format()) and a .json
(jsonlite::toJSON when available, else dput).
morie_otis_all_analyses(df, year, sex = NULL, out_dir = NULL)
df |
OTIS data.frame. |
year |
Integer fiscal year. |
sex |
Optional gender filter passed to
|
out_dir |
Optional output directory. When non-NULL the directory is created if missing. |
CRAN-safe: with out_dir = NULL (default) no files are written.
Named list of morie_otis_results.
## Not run:
df <- morie_otis_load()
res <- morie_otis_all_analyses(df, year = 2024)
## End(Not run)
R port of morie.otis_all_analyze. Pairs with the OTIS loaders
(see ?morie_otis and the b01/c-series/d-series CSV files
under data/datasets/OTIS/) and chains the existing MRM-OTIS
callables in mrm_otis.R the same way
morie_arsau_analyze_* (in mrm_arsau.R) chains the
generic MRM-UoF callables.
For every dataset id (b01..d07) this module exposes
morie_otis_analyze_<id>(data). Each analyzer returns a named
list with class c("morie_otis_analysis_result",
"morie_rich_result", "list") containing
title / summary_lines / tables /
interpretation / warnings / payload, mirroring
the Python RichResult shape used in
src/morie/otis_all_analyze.py.
Cross-year invariant: UniqueIndividual_ID is reassigned
every fiscal year (see variable_taxonomy.R). Every analyzer
that touches that column is within-year only.
Wraps the full causal pipeline for the canonical Restrictive Confinement Detailed Dataset: 8-state alert-combo encoding -> MatchIt 1:1 NN PSM -> IRM-DML with RF nuisances -> multi-way clustered SE.
morie_otis_analyze_a01(data = NULL, out_dir = NULL)
data |
a01 data.frame (loaded from
|
out_dir |
Optional output directory. |
A morie_otis_analysis_result. If the morie causal
helpers aren't loaded, returns a "not yet ported" stub.
## Not run:
morie_otis_analyze_a01(otis_a01)
## End(Not run)
a01 alt-T Ruhela: Age 50+ -> vm count.
morie_otis_analyze_a01_ruhela_alt_age(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm_alt_age(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_alt_age()
a01 alt-T Ruhela: Female -> vm count.
morie_otis_analyze_a01_ruhela_alt_gender(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm_alt_gender(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_alt_gender()
a01 alt-T Ruhela: Toronto region -> vm count.
morie_otis_analyze_a01_ruhela_alt_toronto(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm_alt_toronto(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_alt_toronto()
Runs the complete OTIS-RC methodology arc (IPW + AIPW + g-comp + PSM-NN + PSM-subclass + IRM-DML + match_first + ATC + PLR + SuperLearner) on the canonical alert-complexity -> regional-volatility formulation.
morie_otis_analyze_a01_ruhela_formulations(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_dlrm(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_formulations(otis_a01)
## End(Not run)
Per-year full-DLRM on a01 canonical formulation.
morie_otis_analyze_a01_ruhela_per_year(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm_per_year(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_per_year()
a01 subgroup Ruhela: Female-only cell frame.
morie_otis_analyze_a01_ruhela_subgroup_female(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm_subgroup_female(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_subgroup_female()
a01 subgroup Ruhela: Male-only cell frame.
morie_otis_analyze_a01_ruhela_subgroup_male(data = NULL, out_dir = NULL)
morie_otis_analyze_a01_mrm_subgroup_male(data = NULL, out_dir = NULL)
data |
Optional a01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_ruhela_subgroup_male()
Wires together morie_otis_analyze_a01 (causal IRM-DML) with
the Toronto Police Service / StatsCan CSI context. The R port
requires the morie causal pipeline and TPS-CSI helpers to be loaded;
otherwise returns a "not yet ported" stub.
morie_otis_analyze_a01_with_csi_context(data = NULL, variant = "total", rebase_to_year = 2023L, out_dir = NULL)
data |
Optional a01 data.frame. |
variant |
CSI variant: |
rebase_to_year |
Anchor year for the CSI index column
(default 2023). Use |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run:
morie_otis_analyze_a01_with_csi_context(otis_a01)
## End(Not run)
Counterpart to Python analyze_all().
morie_otis_analyze_all(datasets, out_dir = NULL)
datasets |
Named list |
out_dir |
Optional directory to write per-dataset
|
Named list of morie_otis_analysis_results.
Person-level segregation-placement analysis (b01)
morie_otis_analyze_b01(data)
data |
b01 data.frame (76,934 rows in the public release). |
A morie_otis_analysis_result list with reason / alert /
year-trend tables. Within-year only – UniqueIndividual_ID
is not cross-year-safe.
b01 alt-T Ruhela: Age 50+ -> vm count.
morie_otis_analyze_b01_ruhela_alt_age(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm_alt_age(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_alt_age()
b01 alt-T Ruhela: Female -> vm count.
morie_otis_analyze_b01_ruhela_alt_gender(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm_alt_gender(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_alt_gender()
b01 alt-T Ruhela: Toronto region -> vm count.
morie_otis_analyze_b01_ruhela_alt_toronto(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm_alt_toronto(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_alt_toronto()
OTIS b01 Ruhela formulations (full DLRM).
morie_otis_analyze_b01_ruhela_formulations(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_dlrm(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_formulations(otis_b01) ## End(Not run)
Per-year full-DLRM on b01 canonical formulation.
morie_otis_analyze_b01_ruhela_per_year(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm_per_year(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_per_year()
b01 subgroup Ruhela: Female-only cell frame.
morie_otis_analyze_b01_ruhela_subgroup_female(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm_subgroup_female(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_subgroup_female()
b01 subgroup Ruhela: Male-only cell frame.
morie_otis_analyze_b01_ruhela_subgroup_male(data = NULL, out_dir = NULL)
morie_otis_analyze_b01_mrm_subgroup_male(data = NULL, out_dir = NULL)
data |
Optional b01 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b01_ruhela_subgroup_male()
Aggregate segregation days per person per year (b02)
morie_otis_analyze_b02(data)
data |
b02 data.frame. |
b02 alt-T Ruhela: Age 50+ -> total seg days.
morie_otis_analyze_b02_ruhela_alt_age(data = NULL, out_dir = NULL)
morie_otis_analyze_b02_mrm_alt_age(data = NULL, out_dir = NULL)
data |
Optional b02 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b02_ruhela_alt_age()
b02 alt-T Ruhela: Toronto region -> total seg days.
morie_otis_analyze_b02_ruhela_alt_region(data = NULL, out_dir = NULL)
morie_otis_analyze_b02_mrm_alt_region(data = NULL, out_dir = NULL)
data |
Optional b02 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b02_ruhela_alt_region()
OTIS b02 Ruhela formulations: T=Female -> seg-day count.
morie_otis_analyze_b02_ruhela_formulations(data = NULL, out_dir = NULL)
morie_otis_analyze_b02_dlrm(data = NULL, out_dir = NULL)
morie_otis_analyze_b02_mrm(data = NULL, out_dir = NULL)
data |
Optional b02 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b02_ruhela_formulations(otis_b02) ## End(Not run)
Segregation placements by alert x institution (b03)
morie_otis_analyze_b03(data)
data |
b03 data.frame. |
b03 aggregate Ruhela: Alert presence -> seg placements.
morie_otis_analyze_b03_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_b03_mrm_aggregate(data, out_dir = NULL)
data |
b03 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b03_ruhela_aggregate(otis_b03)
Placement durations by region & gender (b04)
morie_otis_analyze_b04(data)
data |
b04 data.frame. |
b04 aggregate Ruhela: Female -> median seg duration.
morie_otis_analyze_b04_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_b04_mrm_aggregate(data, out_dir = NULL)
data |
b04 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b04_ruhela_aggregate(otis_b04)
Distribution of placements by binned duration (b05)
morie_otis_analyze_b05(data)
data |
b05 data.frame. |
Applies the Sprott-Doob 15-day Mandela threshold to OTIS b05 (Ontario provincial segregation placement counts by binned duration).
morie_otis_analyze_b05_mandela_classification(data, out_dir = NULL)
data |
b05 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b05_mandela_classification(otis_b05)
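As a rough illustration of applying a 15-day threshold to binned durations, the sketch below flags each bin whose lower bound exceeds 15 days and computes the share of placements in those bins. The bin labels and counts are invented for the example, and the parsing rule (first run of digits is the lower bound) is an assumption rather than the package's actual logic; only the three column names come from the published b05 schema.

```r
# Hypothetical b05-like frame; bin labels and counts are made up for illustration.
b05 <- data.frame(
  EndFiscalYear = 2023L,
  Consecutive_Duration = c("1-5 days", "6-15 days", "16-30 days", "31+ days"),
  Number_SegregationPlacements = c(400L, 250L, 90L, 60L)
)

# Assumed parsing rule: the first run of digits in the label is the bin's lower bound.
lower_days <- as.integer(
  sub("^[^0-9]*([0-9]+).*$", "\\1", b05$Consecutive_Duration)
)

# A bin counts as "prolonged" only when every duration in it exceeds 15 days.
b05$prolonged <- lower_days > 15L

# Share of placements that fall in prolonged bins.
share_prolonged <- sum(b05$Number_SegregationPlacements[b05$prolonged]) /
  sum(b05$Number_SegregationPlacements)
```

A bin that straddles the threshold (for example a hypothetical "11-20 days" label) would need an explicit policy; this sketch would treat it as not prolonged.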
OTIS b05 (segregation placements by consecutive duration) does
not carry a demographic treatment variable – the published
schema is just EndFiscalYear, Consecutive_Duration,
Number_SegregationPlacements. The "Ruhela formulation" presumes
a binary treatment column (typically Gender, Race, or alert
status) for the aggregate RF test, so b05 has no meaningful
aggregate Ruhela analysis on its own. Returns a structured
"not applicable" wrapper rather than erroring so dispatcher
loops over the b03..b09 family stay green.
morie_otis_analyze_b05_ruhela_aggregate(data, out_dir = NULL)
data |
b05 data.frame. |
out_dir |
Optional output directory (unused, accepted for parity with sibling aggregators). |
morie_otis_analysis_result carrying a "not
applicable" note in warnings.
## Not run: morie_otis_analyze_b05_ruhela_aggregate(otis_b05)
Reasons for placement x institution x gender (b06)
morie_otis_analyze_b06(data)
data |
b06 data.frame. |
b06 aggregate Ruhela: Disciplinary reason -> seg placements.
morie_otis_analyze_b06_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_b06_mrm_aggregate(data, out_dir = NULL)
data |
b06 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b06_ruhela_aggregate(otis_b06)
Alerts x gender (b07)
morie_otis_analyze_b07(data)
data |
b07 data.frame. |
b07 aggregate Ruhela (pivot to long): With-alert -> seg placements.
morie_otis_analyze_b07_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_b07_mrm_aggregate(data, out_dir = NULL)
data |
b07 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b07_ruhela_aggregate(otis_b07)
Durations by institution & gender (b08)
morie_otis_analyze_b08(data)
data |
b08 data.frame. |
b08 aggregate Ruhela: Female -> median seg duration (institution-clustered).
morie_otis_analyze_b08_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_b08_mrm_aggregate(data, out_dir = NULL)
data |
b08 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b08_ruhela_aggregate(otis_b08)
Individuals by number of placements x gender (b09)
morie_otis_analyze_b09(data)
data |
b09 data.frame. |
b09 aggregate Ruhela: Female -> individuals in segregation.
morie_otis_analyze_b09_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_b09_mrm_aggregate(data, out_dir = NULL)
data |
b09 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_b09_ruhela_aggregate(otis_b09)
Pearson chi-square + Cramer's V on every meaningful 2-way slice of the c-series datasets. Named in honour of Prof. Doob's chi-square tradition in Canadian corrections research.
morie_otis_analyze_c_chi2(
  datasets,
  contingency_value = "NumberIndividuals_RestrictiveConfinement",
  out_dir = NULL
)
datasets |
Named list of c-series data.frames
(e.g. |
contingency_value |
Count column to pivot on
(default |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c_chi2(list(c03 = otis_c03, c04 = otis_c04)) ## End(Not run)
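For readers unfamiliar with the two statistics, here is a minimal base-R sketch of a Pearson chi-square test and Cramer's V on a single 2-way table. The counts and dimension labels are invented, and the snippet does not reproduce how morie_otis_analyze_c_chi2() pivots the c-series files; it only shows the arithmetic for one slice.

```r
# Hypothetical 2 x 2 contingency table (counts are invented).
tab <- matrix(
  c(30, 10,
    20, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), status = c("RC", "not_RC"))
)

# Pearson chi-square without Yates continuity correction.
chi <- stats::chisq.test(tab, correct = FALSE)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).
n <- sum(tab)
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))
```

Cramer's V rescales the chi-square statistic to the 0..1 range, which makes effect sizes comparable across tables with different totals and dimensions.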
Total individuals x custody/RC/seg x gender (c01)
morie_otis_analyze_c01(data)
data |
c01 data.frame. |
c01 aggregate Ruhela: Female -> RC count.
morie_otis_analyze_c01_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c01_mrm_aggregate(data, out_dir = NULL)
data |
c01 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c01_ruhela_aggregate(otis_c01)
c01 region-cluster variant (year-clustered GEE).
morie_otis_analyze_c01_ruhela_aggregate_region_cluster(data, out_dir = NULL)
morie_otis_analyze_c01_mrm_aggregate_region_cluster(data, out_dir = NULL)
data |
c01 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c01_ruhela_aggregate_region_cluster(otis_c01) ## End(Not run)
Individuals in RC/seg by institution (c02)
morie_otis_analyze_c02(data)
data |
c02 data.frame. |
c02 aggregate Ruhela: Female -> RC (institution GEE).
morie_otis_analyze_c02_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c02_mrm_aggregate(data, out_dir = NULL)
data |
c02 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c02_ruhela_aggregate(otis_c02)
Individuals x race x gender (c03)
morie_otis_analyze_c03(data)
data |
c03 data.frame. |
c03 aggregate Ruhela: Indigenous -> RC.
morie_otis_analyze_c03_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c03_mrm_aggregate(data, out_dir = NULL)
data |
c03 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c03_ruhela_aggregate(otis_c03)
Individuals in RC/seg by race x region (c04)
morie_otis_analyze_c04(data)
data |
c04 data.frame from OTIS. |
RichResult with summary + race-by-region crosstab.
c04 aggregate Ruhela: Indigenous -> RC (by region).
morie_otis_analyze_c04_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c04_mrm_aggregate(data, out_dir = NULL)
data |
c04 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c04_ruhela_aggregate(otis_c04)
c04 region-cluster variant.
morie_otis_analyze_c04_ruhela_aggregate_region_cluster(data, out_dir = NULL)
morie_otis_analyze_c04_mrm_aggregate_region_cluster(data, out_dir = NULL)
data |
c04 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c04_ruhela_aggregate_region_cluster(otis_c04) ## End(Not run)
Individuals in RC/seg by religion x region (c05)
morie_otis_analyze_c05(data)
data |
c05 data.frame from OTIS. |
RichResult with summary + religion-by-region crosstab.
c05 aggregate Ruhela: non-majority religion -> RC.
morie_otis_analyze_c05_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c05_mrm_aggregate(data, out_dir = NULL)
data |
c05 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c05_ruhela_aggregate(otis_c05)
Individuals in RC/seg by age category x region (c06)
morie_otis_analyze_c06(data)
data |
c06 data.frame from OTIS. |
RichResult with summary + age-by-region crosstab.
c06 aggregate Ruhela: Age 50+ -> RC.
morie_otis_analyze_c06_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c06_mrm_aggregate(data, out_dir = NULL)
data |
c06 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c06_ruhela_aggregate(otis_c06)
Individuals x alerts x gender (c07)
morie_otis_analyze_c07(data)
data |
c07 data.frame. |
c07 aggregate Ruhela: Alert presence x Gender -> RC.
morie_otis_analyze_c07_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c07_mrm_aggregate(data, out_dir = NULL)
data |
c07 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c07_ruhela_aggregate(otis_c07)
Individuals by religion x gender (c08)
morie_otis_analyze_c08(data)
data |
c08 data.frame from OTIS. |
RichResult with summary + religion-by-gender crosstab.
c08 aggregate Ruhela: non-majority religion x gender -> RC.
morie_otis_analyze_c08_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c08_mrm_aggregate(data, out_dir = NULL)
data |
c08 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c08_ruhela_aggregate(otis_c08)
Individuals by age category x gender (c09)
morie_otis_analyze_c09(data)
data |
c09 data.frame from OTIS. |
RichResult with summary + age-by-gender crosstab.
c09 aggregate Ruhela: Age 50+ x gender -> RC.
morie_otis_analyze_c09_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c09_mrm_aggregate(data, out_dir = NULL)
data |
c09 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c09_ruhela_aggregate(otis_c09)
RC/seg aggregate durations by institution (c10)
morie_otis_analyze_c10(data)
data |
c10 data.frame. |
c10 aggregate Ruhela: Female -> median RC days (institution GEE).
morie_otis_analyze_c10_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c10_mrm_aggregate(data, out_dir = NULL)
data |
c10 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c10_ruhela_aggregate(otis_c10)
Individuals by aggregate-duration bin (c11)
morie_otis_analyze_c11(data)
data |
c11 data.frame. |
Applies the 15-day threshold to OTIS c11 (Ontario provincial counts of INDIVIDUALS by binned aggregate duration). Reports both restrictive-confinement and segregation-only views.
morie_otis_analyze_c11_mandela_classification(data, out_dir = NULL)
data |
c11 data.frame. |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c11_mandela_classification(otis_c11)
c11 aggregate Ruhela: long-duration bin (>=16 days) -> RC.
morie_otis_analyze_c11_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c11_mrm_aggregate(data, out_dir = NULL)
data |
c11 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c11_ruhela_aggregate(otis_c11)
RC/seg aggregate durations by region & gender (c12)
morie_otis_analyze_c12(data)
data |
c12 data.frame. |
c12 aggregate Ruhela: Female -> median RC days (by region).
morie_otis_analyze_c12_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_c12_mrm_aggregate(data, out_dir = NULL)
data |
c12 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_c12_ruhela_aggregate(otis_c12)
Yearly trend (d01 Poisson CIs) + Alert x Cause / Housing contingency chi^2 + Cramer's V on d06 / d07.
morie_otis_analyze_d_chi2(datasets, out_dir = NULL)
datasets |
Named list with |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_d_chi2(list(d01 = otis_d01, d06 = otis_d06, d07 = otis_d07)) ## End(Not run)
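One common way to attach Poisson confidence intervals to yearly counts is the exact (Garwood) interval based on chi-square quantiles. The sketch below is an assumption about the general technique, not a statement of which interval the package computes; the yearly counts and the helper name poisson_exact_ci are hypothetical.

```r
# Hypothetical yearly death counts; not taken from OTIS d01.
deaths <- c(`2019` = 10, `2020` = 0, `2021` = 25)

# Exact (Garwood) two-sided interval for a Poisson count x:
#   lower = qchisq(alpha / 2, 2x) / 2          (0 when x == 0)
#   upper = qchisq(1 - alpha / 2, 2(x + 1)) / 2
poisson_exact_ci <- function(x, conf_level = 0.95) {
  alpha <- 1 - conf_level
  lower <- ifelse(x == 0, 0, stats::qchisq(alpha / 2, 2 * x) / 2)
  upper <- stats::qchisq(1 - alpha / 2, 2 * (x + 1)) / 2
  cbind(count = x, lower = lower, upper = upper)
}

ci <- poisson_exact_ci(deaths)
```

The same limits can be cross-checked for a single count with stats::poisson.test(x)$conf.int.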
Person-level custodial deaths (d01)
morie_otis_analyze_d01(data)
data |
d01 data.frame. |
Custodial deaths by gender (d02)
morie_otis_analyze_d02(data)
data |
d02 data.frame from OTIS. |
RichResult with summary + deaths-by-gender crosstab.
d02 aggregate Ruhela: Female -> custodial deaths.
morie_otis_analyze_d02_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_d02_mrm_aggregate(data, out_dir = NULL)
data |
d02 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_d02_ruhela_aggregate(otis_d02)
Custodial deaths by race (d03)
morie_otis_analyze_d03(data)
data |
d03 data.frame from OTIS. |
RichResult with summary + deaths-by-race crosstab.
d03 aggregate Ruhela: Indigenous -> custodial deaths.
morie_otis_analyze_d03_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_d03_mrm_aggregate(data, out_dir = NULL)
data |
d03 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_d03_ruhela_aggregate(otis_d03)
Custodial deaths by religion (d04)
morie_otis_analyze_d04(data)
data |
d04 data.frame from OTIS. |
RichResult with summary + deaths-by-religion crosstab.
d04 aggregate Ruhela: non-majority religion -> custodial deaths.
morie_otis_analyze_d04_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_d04_mrm_aggregate(data, out_dir = NULL)
data |
d04 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_d04_ruhela_aggregate(otis_d04)
Custodial deaths by age category (d05)
morie_otis_analyze_d05(data)
data |
d05 data.frame from OTIS. |
RichResult with summary + deaths-by-age crosstab.
d05 aggregate Ruhela: Age 50+ -> custodial deaths.
morie_otis_analyze_d05_ruhela_aggregate(data, out_dir = NULL)
morie_otis_analyze_d05_mrm_aggregate(data, out_dir = NULL)
data |
d05 data.frame. |
out_dir |
Optional. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_d05_ruhela_aggregate(otis_d05)
Custodial deaths by alert x medical cause (d06)
morie_otis_analyze_d06(data)
data |
d06 data.frame from OTIS. |
RichResult with summary + medical-cause-by-alert crosstab.
Custodial deaths by alert x housing unit (d07)
morie_otis_analyze_d07(data)
data |
d07 data.frame from OTIS. |
RichResult with summary + housing-unit-by-alert crosstab.
Cross-references the c11 Mandela classification against the Sprott-Doob Feb 2021 federal SIU figures (Table 19, N=1960).
morie_otis_analyze_otis_mandela_provincial_vs_federal(data, out_dir = NULL)
data |
c11 data.frame (used to derive the provincial figures). |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_otis_mandela_provincial_vs_federal(otis_c11)
Runs every aggregate Ruhela formulation analyzer against the supplied named datasets list and presents a single primary-IRR comparison table (GEE cluster-robust > NB GLM > Poisson GLM).
morie_otis_analyze_ruhela_grid(datasets, out_dir = NULL)
datasets |
Named list keyed by dataset id (b03..d05). |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_ruhela_grid(list(b03 = otis_b03, c01 = otis_c01)) ## End(Not run)
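To make the notion of a "primary IRR" concrete, the sketch below estimates an incidence rate ratio from a plain Poisson GLM, the last fallback in the preference order described above. It uses simulated data and base R only; the column names treated and y are hypothetical, and the GEE and negative-binomial tiers are deliberately not shown because they need extra packages.

```r
# Simulated aggregate counts; `treated` and `y` are hypothetical column names.
set.seed(1)
d <- data.frame(treated = rep(c(0, 1), each = 50))
d$y <- stats::rpois(100, lambda = ifelse(d$treated == 1, 6, 3))

# Poisson GLM with log link: exp(coefficient) is the incidence rate ratio.
fit <- stats::glm(y ~ treated, family = stats::poisson(), data = d)
irr <- exp(stats::coef(fit)[["treated"]])

# Wald 95% interval on the log scale, back-transformed to the IRR scale.
irr_ci <- exp(stats::confint.default(fit)["treated", ])
```

With a single binary regressor the fitted IRR equals the ratio of the two group means, which is a handy sanity check before moving to cluster-robust or overdispersed models.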
Sections:
Aggregate Ruhela formulations grid
(Optional) per-row Ruhela formulations on a01/b01/b02
MRM chi-square family on c-series + d-series
Mandela-RF cross-comparison (provincial vs federal)
morie_otis_analyze_ruhela_master(
  datasets,
  include_per_row = FALSE,
  out_dir = NULL
)
datasets |
Named list of OTIS data.frames. |
include_per_row |
Logical; if |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_ruhela_master(datasets_list) ## End(Not run)
Runs the complete 10-estimator DLRM separately on each fiscal year. This is a heavy operation (~7x the single-year runtime).
morie_otis_analyze_ruhela_per_year(
  data,
  ds_id,
  treatment,
  outcome,
  covariates,
  year_col = "EndFiscalYear",
  cluster_col = "EndFiscalYear",
  out_dir = NULL
)
data |
Long-format data.frame with treatment / outcome / cov. |
ds_id |
Dataset id label. |
treatment |
Treatment column name. |
outcome |
Outcome column name. |
covariates |
Character vector of covariate column names. |
year_col |
Year column (default |
cluster_col |
Cluster axis for SE, or |
out_dir |
Optional output directory. |
morie_otis_analysis_result.
## Not run: morie_otis_analyze_ruhela_per_year(df, ds_id = "a01", treatment = "T", outcome = "Y", covariates = c("Gender")) ## End(Not run)
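The per-year pattern amounts to splitting the frame on the year column and running the estimator once per slice, which is why runtime grows roughly linearly with the number of fiscal years. The sketch below shows only that split-apply shape; toy_estimator is a hypothetical stand-in (a difference in means), not the 10-estimator DLRM, and the data are simulated.

```r
# Simulated long-format frame; T, Y and the toy estimator are assumptions.
set.seed(42)
df <- data.frame(
  EndFiscalYear = rep(2019:2021, each = 40),
  T = rep(c(0L, 1L), times = 60)
)
df$Y <- stats::rpois(nrow(df), lambda = 2 + df$T)

# Stand-in estimator: mean outcome of treated rows minus mean of control rows.
toy_estimator <- function(d) mean(d$Y[d$T == 1L]) - mean(d$Y[d$T == 0L])

# One independent fit per fiscal year; result is a named numeric vector.
per_year <- vapply(split(df, df$EndFiscalYear), toy_estimator, numeric(1))
```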
Mirrors _ANALYSES in
src/morie/otis_all_analyze.py.
morie_otis_analyzers()
Encodes 3 binary alert flags into 8 combinations (a1..a8) and
aggregates per (id, fiscal year), computing a complexity index
ac = number of distinct combinations observed.
morie_otis_astcmb(
  df,
  alert_cols = c("mental_health_alert", "suicide_risk_alert", "suicide_watch_alert"),
  id_col = "unique_individual_id",
  year_col = "end_fiscal_year"
)
df |
data.frame with the three alert columns and id/year cols. |
alert_cols |
Character-3 of alert column names. Default mirrors the lower-case Python schema. |
id_col, year_col
|
Column names. |
morie_otis_result.
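The encoding described above can be sketched in base R as a 3-bit integer followed by a per-(id, year) count of distinct combinations. The bit order (mental health as the lowest bit) and the mapping of integer 0..7 onto a1..a8 are assumptions made for illustration; the package may order the labels differently. Column names follow the defaults in the usage above, and the rows are invented.

```r
# Invented placement rows using the default lower-case column names.
df <- data.frame(
  unique_individual_id = c("p1", "p1", "p1", "p2"),
  end_fiscal_year = c(2023L, 2023L, 2023L, 2023L),
  mental_health_alert = c(1L, 1L, 0L, 0L),
  suicide_risk_alert = c(0L, 1L, 0L, 0L),
  suicide_watch_alert = c(0L, 0L, 0L, 0L)
)

# Assumed bit order: mental health = 1, suicide risk = 2, suicide watch = 4.
combo <- df$mental_health_alert +
  2L * df$suicide_risk_alert +
  4L * df$suicide_watch_alert

# Assumed labelling: integer 0..7 maps to a1..a8.
df$combo_label <- paste0("a", combo + 1L)

# Complexity index: number of distinct combinations per (id, fiscal year).
ac <- stats::aggregate(
  combo_label ~ unique_individual_id + end_fiscal_year,
  data = df,
  FUN = function(x) length(unique(x))
)
names(ac)[3] <- "ac"
```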
Returns one row per (pair, estimator) combination with the ATE,
SE, 95% CI, and per-row notes. The IRM-DML row uses the ATE
component of morie_otis_irm_dml()'s output (not the ATTE /
ATC). Concordance across all three estimators is the strongest
evidence of an identified causal effect under conditional
exchangeability.
morie_otis_causal_grid(df = NULL, seed = 123L)
df |
OTIS placement-level data.frame. If |
seed |
Integer seed for the cross-fitting (default 123). |
Data.frame with columns pair, estimator,
n, p_treat, ate, ate_se,
ate_pval, ci95_lo, ci95_hi, notes.
## Not run: morie_otis_causal_grid() ## End(Not run)
Eleven callables operationalising Goffman's "total institution" framework (Goffman 1961) on the OTIS dataset:
morie_otis_repeat_placement_concentration(b09)
morie_otis_within_year_placement_count(b01)
morie_otis_within_year_region_diversity(b01)
morie_otis_mortification_cooccurrence(b01)
morie_otis_disciplinary_medical_overlap(b01)
morie_otis_embedding_distribution(b02)
morie_otis_intra_year_transition_matrix(a01)
morie_otis_path_complexity_gini(b01)
morie_otis_region_alert_state_richness(b01)
morie_otis_regC_demog_contingency(b01)
morie_otis_irr_glmm_vm(b01): Poisson + NB2 IRR
(requires MASS for the negative-binomial fit; falls back
to Poisson-only when MASS is unavailable).
All metrics are intra-fiscal-year by construction. OTIS
UniqueIndividual_ID is anonymised as YYYY-XXXXX-AA,
randomly reassigned each fiscal year and each dataset file, so
longitudinal individual-level and cross-dataset linkage are
impossible by design (see docs/methods/otis_linkage.md).
The variable_taxonomy.R registry enforces this with
cross_year_safe = FALSE.
Goffman, E. (1961). Asylums: Essays on the social situation of mental patients and other inmates. Anchor Books.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5), 1163-1174.
Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661-703.
Calls every morie_otis_* churn callable on its respective
input data.frame and returns a named list of results. Each
input is independent: pass NULL (or omit) to skip a metric.
If out_dir is supplied, each result is also serialised to
disk.
morie_otis_churn_analyze_all(
  b01 = NULL,
  b02 = NULL,
  b09 = NULL,
  a01 = NULL,
  out_dir = NULL
)
b01, b02, b09, a01
|
Input data.frames (any may be |
out_dir |
Optional output directory. |
CRAN-safe: with out_dir = NULL no files are written.
Named list of morie_otis_result.
Encodes three binary alert flags (MentalHealth, SuicideRisk,
SuicideWatch) as a bitfield, giving the 8-state combo label used by
Ruhela's primary RF (Ruhela Formulation), and maps the alert profile
to a Mandela-rule category (compliant / at-risk / torture). When a
row's NumberConsecutiveDays_Segregation is supplied via days, the
call delegates to the duration-aware morie_siu_classify_mandela();
in the default flags-only mode the categorisation is "alert-only"
and degrades gracefully when duration is missing.
morie_otis_classify_mandela_combo(mh, sr, sw, days = NA_real_, hours_per_day = 22)
mh |
Integer/logical 0/1 mental-health flag. |
sr |
Integer/logical 0/1 suicide-risk flag. |
sw |
Integer/logical 0/1 suicide-watch flag. |
days |
Optional numeric consecutive-days-segregation. If
supplied, delegates to |
hours_per_day |
Optional numeric daily hours in segregation (default 22, the OTIS Restrictive Confinement convention). |
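Mandela Rules 43-45 thresholds: more than 15 consecutive days of segregation counts as prolonged solitary confinement, which the Rules class with torture or other cruel, inhuman or degrading treatment (UN GA Res. 70/175). Confinement of >= 22 h/day for more than 15 consecutive days is the canonical torture-eligible band.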
Named list with combo (integer 0..7), combo_label
(one of a1..a8), alert_count (0..3),
mandela_category, and notes.
morie_otis_classify_mandela_combo(1, 0, 0)
morie_otis_classify_mandela_combo(1, 1, 1, days = 20, hours_per_day = 23)
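The bitfield encoding described above can be sketched in base R. This is an illustration only: the helper name encode_alert_combo, the bit order (mh = 1, sr = 2, sw = 4), and the mapping of combo 0..7 onto labels a1..a8 are assumptions, not the package's confirmed convention.

```r
# Hypothetical helper: packs three 0/1 flags into an integer 0..7.
# Bit order (mh = 1, sr = 2, sw = 4) and the a1..a8 labelling are assumptions.
encode_alert_combo <- function(mh, sr, sw) {
  flags <- cbind(mh = as.integer(mh), sr = as.integer(sr), sw = as.integer(sw))
  stopifnot(all(flags %in% c(0L, 1L)))
  combo <- as.integer(flags[, "mh"] + 2L * flags[, "sr"] + 4L * flags[, "sw"])
  data.frame(
    combo = combo,
    combo_label = paste0("a", combo + 1L),
    alert_count = as.integer(rowSums(flags)),
    stringsAsFactors = FALSE
  )
}

res <- encode_alert_combo(mh = c(1, 0, 1), sr = c(0, 0, 1), sw = c(0, 0, 1))
res
```

Keeping the integer combo alongside the label makes it cheap to count distinct alert states per person-year later on.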
Goffman's "tinkering trades" tension: same person classified by
both punitive and therapeutic rationales. Detects any
SegReason_Disciplinary* flag co-occurring with any
SegReason_*Medical* flag.
morie_otis_disciplinary_medical_overlap(df)
df |
b01 data.frame. |
morie_otis_result.
Fits lognormal, Pareto, and exponential distributions to
TotalAggregatedDays_Segregation, compares them by AIC, and reports
which family wins.
morie_otis_embedding_distribution(df)
df |
b02 data.frame. |
morie_otis_result.
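A minimal sketch of an AIC comparison across the three families, using closed-form maximum-likelihood fits in base R. The helper name compare_families_aic is hypothetical, and fixing the Pareto x_min at the sample minimum is an assumption; the package's own fitting choices may differ.

```r
# Closed-form MLE fits and AIC for three candidate families (base R only).
# The Pareto fit fixes x_min at the sample minimum, which is an assumption.
compare_families_aic <- function(x) {
  x <- x[is.finite(x) & x > 0]
  n <- length(x)
  lx <- log(x)
  # Lognormal: MLE of meanlog / sdlog (sdlog uses the 1/n variance).
  mu <- mean(lx)
  sigma <- sqrt(mean((lx - mu)^2))
  ll_lnorm <- sum(dlnorm(x, meanlog = mu, sdlog = sigma, log = TRUE))
  # Exponential: MLE rate = 1 / mean.
  ll_exp <- sum(dexp(x, rate = 1 / mean(x), log = TRUE))
  # Pareto (type I): alpha = n / sum(log(x / x_min)).
  x_min <- min(x)
  alpha <- n / sum(lx - log(x_min))
  ll_pareto <- n * log(alpha) + n * alpha * log(x_min) - (alpha + 1) * sum(lx)
  aic <- c(
    lognormal   = 2 * 2 - 2 * ll_lnorm,
    exponential = 2 * 1 - 2 * ll_exp,
    pareto      = 2 * 1 - 2 * ll_pareto
  )
  list(aic = aic, best = names(aic)[which.min(aic)])
}

set.seed(1)
fit <- compare_families_aic(rlnorm(500, meanlog = 1, sdlog = 0.5))
fit$aic
fit$best
```

The family with the smallest AIC is reported as the winner; differences of only a few AIC units should be read as weak evidence.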
Markov transition matrix on Region_AtTimeOfPlacement within
each person-year, with stationary distribution and off-diagonal
Theil-T concentration.
morie_otis_intra_year_transition_matrix(df)
df |
a01 data.frame. |
morie_otis_result.
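The transition-matrix and stationary-distribution steps can be illustrated as follows. The helper names are hypothetical, rows are assumed to be in placement order within each person-year, and the Theil-T concentration is not covered.

```r
# Hypothetical helper: first-order transition counts within each person-year.
# Rows are assumed to be already in placement order inside each group.
region_transitions <- function(group, region) {
  region <- factor(region)
  lev <- levels(region)
  counts <- matrix(0, length(lev), length(lev),
                   dimnames = list(from = lev, to = lev))
  for (rows in split(seq_along(group), group)) {
    if (length(rows) < 2L) next
    from <- as.integer(region[rows[-length(rows)]])
    to <- as.integer(region[rows[-1L]])
    for (k in seq_along(from)) {
      counts[from[k], to[k]] <- counts[from[k], to[k]] + 1
    }
  }
  # Row-normalise; rows with no outgoing moves stay at zero.
  P <- counts / pmax(rowSums(counts), 1)
  list(counts = counts, P = P)
}

stationary_distribution <- function(P) {
  # Left eigenvector of P for the eigenvalue closest to 1, scaled to sum to 1.
  ev <- eigen(t(P))
  v <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
  v / sum(v)
}

tm <- region_transitions(
  group  = c("p1", "p1", "p1", "p2", "p2"),
  region = c("A", "B", "A", "A", "A")
)
tm$P
pi_hat <- stationary_distribution(tm$P)
```

Transitions are counted only inside a group, so no move is ever inferred across two different person-years.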
Fits a logistic-regression propensity model on covariates,
clips propensities to [eps, 1 - eps], and
computes the Hajek-normalised difference of weighted means. The SE
follows the Lunceford-Davidian (2004) sandwich influence-function
form.
morie_otis_ipw_ate(df, treatment, outcome, covariates, eps = 0.02)
df |
A data frame containing |
treatment |
Name of the binary treatment column. Strings
|
outcome |
Name of the (numeric) outcome column. |
covariates |
Character vector of covariate names. Character / factor columns are converted to drop-first dummies. |
eps |
Numeric in |
A morie_causal_estimate list with estimator,
ate, ate_se, ate_pval, ate_ci95,
n, n_treated, p_treat, notes.
Lunceford, J. K. & Davidian, M. (2004). Statistics in Medicine 23(19), 2937-2960.
set.seed(1)
n <- 300L
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
y <- 0.5 * d + x + rnorm(n)
df <- data.frame(d = d, y = y, x = x)
morie_otis_ipw_ate(df, treatment = "d", outcome = "y", covariates = "x")
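For orientation, the Hajek-normalised point estimate can be written in a few lines of base R. This is a sketch under stated assumptions, not the package's source: hajek_ipw_ate is a hypothetical name, the clipping interval is taken to be [eps, 1 - eps], and the sandwich SE is omitted.

```r
# Hajek-normalised IPW with propensity clipping (point estimate only).
hajek_ipw_ate <- function(y, d, x, eps = 0.02) {
  ps_fit <- glm(d ~ x, family = binomial())
  e_hat <- pmin(pmax(fitted(ps_fit), eps), 1 - eps)  # clip to [eps, 1 - eps]
  w1 <- d / e_hat
  w0 <- (1 - d) / (1 - e_hat)
  # Each arm's weighted mean is normalised by its own weight sum (Hajek).
  sum(w1 * y) / sum(w1) - sum(w0 * y) / sum(w0)
}

set.seed(1)
n <- 2000L
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
y <- 0.5 * d + x + rnorm(n)   # true effect is 0.5
ate_hat <- hajek_ipw_ate(y, d, x)
ate_hat
```

Normalising by the weight sums rather than by n keeps the estimate inside the range of the observed outcomes, which tends to stabilise it when a few propensities sit near the clip bounds.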
Computes the doubly-robust ATE / ATTE / ATC via the Chernozhukov et
al. (2018) IRM score with cross-fitted nuisance models. Delegates
to DoubleML's DoubleMLIRM when the package (with
mlr3 + mlr3learners) is installed; otherwise falls back
to a self-contained cross-fit using .otis_logit_fit for the
propensity and OLS for the per-arm outcome regressions (mirroring
the Python module's ml_outcome="ols", ml_propensity="logit"
branch).
morie_otis_irm_dml(df, treatment, outcome, covariates, cluster_cols = NULL, n_folds = 3L, seed = 123L, eps = 0.02, match_first = FALSE, match_caliper_sd = 0.2)
df |
A data frame. |
treatment |
Binary treatment column name. |
outcome |
Outcome column name. |
covariates |
Character vector of covariate names. |
cluster_cols |
|
n_folds |
Number of cross-fitting folds (default 3). |
seed |
Integer seed (default 123). |
eps |
Propensity clip bound (default 0.02). |
match_first |
Logical; if |
match_caliper_sd |
Caliper width (default 0.2 * SD of logit-e). |
Cluster-robust SE: pass cluster_cols as a single column name (one-way)
or a character vector of names (multi-way, Cameron-Gelbach-Miller 2011;
at most two-way). cluster_cols = NULL gives the
heteroskedasticity-consistent SE.
Optional match_first = TRUE runs 1:1 nearest-neighbour
propensity-score matching on logit(e(X)) with caliper
match_caliper_sd * SD(logit(e)) first, then fits IRM-DML on
the matched subset. Mirrors the MatchIt-then-DML pipeline of
OTIS-RC/notez1a.qmd.
Named list with ate, ate_se, ate_pval,
ate_ci95, atte, atte_se, atte_pval,
atte_ci95, atc, atc_se, atc_pval,
atc_ci95, n, n_treated, p_treat,
se_kind.
Chernozhukov, V. et al. (2018). Econometrics Journal 21(1), C1-C68.
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). JBES 29(2), 238-249.
set.seed(1)
n <- 300L
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
y <- 0.5 * d + x + rnorm(n)
df <- data.frame(d = d, y = y, x = x, id = sample.int(50, n, replace = TRUE))
morie_otis_irm_dml(df, treatment = "d", outcome = "y", covariates = "x", n_folds = 3L)
Builds the (id x fiscal year) cell with outcome
vm (number of distinct regions visited) and treatment
T_high_ac = 1 if the person-year alert-complexity ac >=
2, then fits Poisson and (optionally) negative-binomial GLMs
adjusting for Year, Gender, and Age. The NB fit uses
MASS::glm.nb when available; if not, only Poisson is
reported.
morie_otis_irr_glmm_vm(df)
df |
b01 data.frame. |
The model has no random effects and reports no cluster-robust SE; for paper-grade inference, use the dedicated OTIS DML pipeline.
morie_otis_result.
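A small simulated illustration of the Poisson-plus-optional-NB2 pattern described above. The data and the variable names (vm, t_high, toy) are made up for the sketch, and covariate adjustment is left out for brevity.

```r
set.seed(1)
n <- 2000L
t_high <- rbinom(n, 1, 0.3)
# Simulated overdispersed counts; the true rate ratio is exp(0.4).
vm <- rnbinom(n, size = 2, mu = exp(0.1 + 0.4 * t_high))
toy <- data.frame(vm = vm, t_high = t_high)

pois_fit <- glm(vm ~ t_high, family = poisson(), data = toy)
irr_pois <- exp(coef(pois_fit)[["t_high"]])
ci_pois <- exp(confint.default(pois_fit)["t_high", ])  # Wald interval

# Negative-binomial fit only when MASS is installed, else Poisson-only.
irr_nb <- NA_real_
if (requireNamespace("MASS", quietly = TRUE)) {
  nb_fit <- MASS::glm.nb(vm ~ t_high, data = toy)
  irr_nb <- exp(coef(nb_fit)[["t_high"]])
}
c(poisson = irr_pois, negbin = irr_nb)
```

The incidence-rate ratio is the exponentiated treatment coefficient. With overdispersed counts the two point estimates are usually close, while the Poisson standard errors tend to be too narrow.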
Reads the Rscript-exported mirror at
file.path(morie_cache_dir("otis"), "otis_main.csv") unless
csv_path is supplied. The expected schema is 10 columns:
end_fiscal_year, unique_individual_id,
region_at_time_of_placement,
region_most_recent_placement, gender,
age_category, mental_health_alert,
suicide_risk_alert, suicide_watch_alert,
number_of_placements.
morie_otis_load(csv_path = NULL, use_readr = FALSE)
csv_path |
Optional explicit CSV path. |
use_readr |
If |
To refresh the cache from the canonical
correctional_stats_report_environment.RData fixture, run the
repository script scripts/export_otis_csv.R.
data.frame.
## Not run:
df <- morie_otis_load()
## End(Not run)
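Callers who supply their own csv_path may want to check the 10-column schema before running analyses. The check below is a hypothetical convenience (missing_otis_cols is not a package function); only the column names are taken from the description above.

```r
otis_expected_cols <- c(
  "end_fiscal_year", "unique_individual_id",
  "region_at_time_of_placement", "region_most_recent_placement",
  "gender", "age_category", "mental_health_alert",
  "suicide_risk_alert", "suicide_watch_alert", "number_of_placements"
)

# Hypothetical helper: returns the names of missing columns (empty if none).
missing_otis_cols <- function(df, expected = otis_expected_cols) {
  setdiff(expected, names(df))
}

# Zero-row stand-in with the expected columns.
toy <- as.data.frame(
  setNames(replicate(10, integer(0), simplify = FALSE), otis_expected_cols)
)
ok <- missing_otis_cols(toy)
bad <- missing_otis_cols(toy[, -1, drop = FALSE])
```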
The clinical-alert chain: do mental-health flags causally elevate subsequent suicide-risk-alert occurrence, conditional on demographics and region?
morie_otis_make_pair_a(df)
df |
OTIS placement-level data.frame. |
Named list list(data, T = "T_a", Y = "Y_a", covariates).
## Not run:
morie_otis_make_pair_a(morie_otis_load())
## End(Not run)
Same as morie_otis_make_pair_alert_to_volatility_ruhela() but
auto-loads a01 (Restrictive Confinement Detailed) via the registered
morie_otis_load() loader when df = NULL. a01 is the
canonical file the published OTIS-RC res_pool /
res_by_year / res_all are computed on.
morie_otis_make_pair_alert_to_volatility_a01(df = NULL)
df |
Optional OTIS a01 data.frame. If |
Same shape as
morie_otis_make_pair_alert_to_volatility_ruhela().
## Not run:
morie_otis_make_pair_alert_to_volatility_a01()
## End(Not run)
Convenience wrapper that returns both formulations side-by-side for RDF (Ruhela Dual Formulation) robustness analyses.
morie_otis_make_pair_alert_to_volatility_all(df)
df |
OTIS placement-level data.frame. |
Named list list(ruhela = ..., naive = ...) where
each element is the output of the corresponding make-pair builder.
## Not run:
morie_otis_make_pair_alert_to_volatility_all(morie_otis_load())
## End(Not run)
Robustness alternative to
morie_otis_make_pair_alert_to_volatility_ruhela(): treatment
= "max simultaneous flags across the year's rows >= 2"; outcome =
"any placement row with regA != regB" (binary). Produces a different
treatment marginal (~24\
encoding) and a binary rather than count outcome. Used side-by-side
as the Naive arm of an RDF (Ruhela Dual Formulation).
morie_otis_make_pair_alert_to_volatility_naive(df)
df |
OTIS placement-level data.frame. |
Named list with data, T = "T_high_ac",
Y = "Y_vm_any", covariates =
c("Gender", "Age_Category", "EndFiscalYear").
## Not run:
df <- morie_otis_load()
morie_otis_make_pair_alert_to_volatility_naive(df)
## End(Not run)
Implements Ruhela's "ac >= 2 -> vm" RF (Ruhela Formulation): the
8-state combo encoding documented in OTIS-RC/notez1a.qmd and used
for the published res_pool / res_by_year / res_all
estimates. Per (UniqueIndividual_ID, EndFiscalYear), the
alert-state complexity ac is the number of distinct
alert combos with positive support across that person-year's rows
(NOT the max of simultaneous flags – see the Naive arm for that
alternative). Treatment T_high_ac = 1L iff ac >= 2.
Outcome Y_vm_count sums the within-row and across-row
regional-volatility-move indicators.
morie_otis_make_pair_alert_to_volatility_ruhela(df)
df |
OTIS placement-level data.frame (b01 / a01 schema). |
A named list with elements data (the person-year
data.frame), T = "T_high_ac", Y = "Y_vm_count", and
covariates = c("Gender", "Age_Category", "EndFiscalYear").
## Not run:
df <- morie_otis_load()
pair <- morie_otis_make_pair_alert_to_volatility_ruhela(df)
morie_otis_irm_dml(pair$data, treatment = pair$T, outcome = pair$Y, covariates = pair$covariates)
## End(Not run)
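The difference between the two treatment definitions (distinct alert combos per person-year versus maximum simultaneous flags) is easiest to see on a toy table. The column names and the bit encoding used here are illustrative assumptions, not the OTIS schema.

```r
toy <- data.frame(
  id   = c("A", "A", "B", "B", "C"),
  year = 2024L,
  mh   = c(1, 1, 1, 0, 1),
  sr   = c(0, 0, 0, 1, 1),
  sw   = c(0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)
toy$combo <- toy$mh + 2 * toy$sr + 4 * toy$sw   # assumed bit order
toy$n_flags <- toy$mh + toy$sr + toy$sw

key <- interaction(toy$id, toy$year, drop = TRUE)
# Combo-based arm: number of DISTINCT alert combos in the person-year.
ac <- tapply(toy$combo, key, function(v) length(unique(v)))
# Naive arm: maximum number of flags active in any single row.
max_flags <- tapply(toy$n_flags, key, max)

T_ruhela <- as.integer(ac >= 2)
T_naive  <- as.integer(max_flags >= 2)
data.frame(person_year = levels(key), ac = as.integer(ac), T_ruhela, T_naive)
```

Person B changes alert state between rows without ever holding two flags at once, so only the combo-based definition treats B. Person C holds two flags in a single unchanging state, so only the naive definition treats C.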
Treatment T_b = 1 iff at least 2 of (MentalHealth, SuicideRisk, SuicideWatch) alerts are simultaneously active in the row. Outcome Y_b = 1 iff Number_Of_Placements >= 2 (proxy for any future readmission).
morie_otis_make_pair_b(df)
df |
OTIS placement-level data.frame. |
Named list list(data, T = "T_b", Y = "Y_b", covariates).
## Not run:
morie_otis_make_pair_b(morie_otis_load())
## End(Not run)
Treatment T_c = 1 iff Region_AtTimeOfPlacement != Region_MostRecent. Outcome Y_c = NumberConsecutiveDays_Segregation winsorised at the 99th percentile.
morie_otis_make_pair_c(df)
df |
OTIS placement-level data.frame. |
Named list list(data, T = "T_c", Y = "Y_c", covariates).
## Not run:
morie_otis_make_pair_c(morie_otis_load())
## End(Not run)
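Winsorising at the 99th percentile, as used for the Y_c outcome, can be sketched as below. The helper winsorise_upper is hypothetical, and the default quantile type of stats::quantile is assumed; the package may use a different quantile definition.

```r
# Upper-tail winsorisation: values above the cutoff quantile are set to it.
winsorise_upper <- function(x, prob = 0.99) {
  cap <- quantile(x, probs = prob, na.rm = TRUE, names = FALSE)
  pmin(x, cap)
}

set.seed(1)
days <- c(rpois(200, 5), 400)   # one extreme stay
y_c <- winsorise_upper(days)
range(y_c)
```

Capping rather than dropping the extreme rows keeps the sample size fixed while limiting the leverage of very long segregation spells.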
Counts concurrent alert flags per placement and tests independence of MentalHealth vs SuicideRisk via chi-square + Cramer's V.
morie_otis_mortification_cooccurrence(df)
df |
b01 data.frame. |
morie_otis_result.
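The chi-square test and Cramer's V can be reproduced with base R. The helper below is a sketch (cramers_v is a hypothetical name) and uses the uncorrected chi-square statistic, which is an assumption about the package's choice.

```r
cramers_v <- function(a, b) {
  tab <- table(a, b)
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  list(
    chi_sq = unname(chi$statistic),
    p_value = chi$p.value,
    cramers_v = sqrt(unname(chi$statistic) / (n * k))
  )
}

mh <- rep(c(0, 1), each = 50)
sr_same <- mh                       # perfectly associated flags
sr_indep <- rep(c(0, 1), times = 50) # balanced and unrelated to mh
v_same <- cramers_v(mh, sr_same)$cramers_v
v_indep <- cramers_v(mh, sr_indep)$cramers_v
```

V ranges from 0 (no association) to 1 (perfect association), which makes it easier to compare across tables of different sizes than the raw chi-square statistic.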
Returns unique-individual counts overall and by fiscal year, region, age category, and gender, plus the per-individual placement-count five-number summary.
morie_otis_otdesc(df, id_col = "unique_individual_id", year_col = "end_fiscal_year")
df |
data.frame. |
id_col, year_col
|
Column names. |
morie_otis_result.
Wraps a Frisch-Waugh-Lovell partialling-out estimator with
n_folds cross-fitting on the OLS nuisance functions
E[Y | X] and E[D | X], then regresses outcome residuals on
treatment residuals for the ATE, with heteroskedasticity-robust
standard errors. ATT is the ATE divided by the treated share (a simple
weighting approximation; for production-grade DML use
DoubleML).
morie_otis_otdml(df, outcome = "Y", treatment = "D", covariates = NULL, n_folds = 3L, seed = 123L)
df |
data.frame. |
outcome, treatment
|
Column names. |
covariates |
Character vector of covariate column names. If
|
n_folds |
Integer fold count (default |
seed |
Integer RNG seed. |
Categorical covariates are dummy-coded with model.matrix.
morie_otis_result.
Chernozhukov, V. et al. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1), C1-C68.
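The cross-fitted partialling-out idea can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: crossfit_plr is a hypothetical name, both nuisance models are plain OLS, and the SE is an HC0-style formula for the residual-on-residual slope.

```r
crossfit_plr <- function(y, d, X, n_folds = 3L, seed = 123L) {
  set.seed(seed)
  n <- length(y)
  fold <- sample(rep_len(seq_len(n_folds), n))
  dat <- data.frame(y = y, d = d, X)
  ry <- rd <- numeric(n)
  for (k in seq_len(n_folds)) {
    train <- dat[fold != k, , drop = FALSE]
    test <- dat[fold == k, , drop = FALSE]
    fy <- lm(y ~ . - d, data = train)   # outcome on covariates only
    fd <- lm(d ~ . - y, data = train)   # treatment on covariates only
    # Out-of-fold residuals for the held-out rows.
    ry[fold == k] <- test$y - predict(fy, newdata = test)
    rd[fold == k] <- test$d - predict(fd, newdata = test)
  }
  theta <- sum(rd * ry) / sum(rd^2)
  psi <- rd * (ry - theta * rd)
  se <- sqrt(sum(psi^2)) / sum(rd^2)    # HC0-style robust SE
  list(ate = theta, se = se)
}

set.seed(1)
n <- 1000L
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
y <- 0.5 * d + x + rnorm(n)   # true effect is 0.5
fit <- crossfit_plr(y, d, X = data.frame(x = x))
fit
```

Fitting the nuisance models on the other folds and predicting on the held-out fold is what separates this from a single full-sample regression of y on d and x.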
Per-(id, year, region) placement counts, with the Gini coefficient reported overall and split by fiscal year and region.
morie_otis_path_complexity_gini(df)
df |
b01 data.frame. |
morie_otis_result.
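For reference, a Gini coefficient on a vector of non-negative counts can be computed from the sorted values. The function name gini_coef is hypothetical; the package's internal helper may use a different but equivalent formula or a small-sample correction.

```r
gini_coef <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  if (n == 0L || sum(x) == 0) return(NA_real_)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

gini_coef(c(1, 1, 1, 1))   # 0: placements spread evenly
gini_coef(c(0, 0, 0, 1))   # 0.75: all placements in one cell

# Split by a grouping variable, e.g. fiscal year.
counts <- c(1, 1, 1, 1, 0, 0, 0, 1)
year <- rep(c(2023, 2024), each = 4)
by_year <- tapply(counts, year, gini_coef)
```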
Wraps the six OTIS primitives in otis.R as morie RichResult
lists and exposes:
morie_otis_load: canonical CSV loader (reads
the Rscript-exported otis_main.csv mirror).
morie_otis_all_analyses: driver that runs
rplace / astcmb / volat / rctrnd / otdesc on one data.frame and
optionally serialises each result to disk under a user-supplied
directory (CRAN-safe: never writes without an explicit
out_dir).
morie_otis_otdml is excluded from the bundle because it
requires the caller to specify (treatment, outcome,
covariates) – call it directly when needed.
OTIS UniqueIndividual_ID is randomly reassigned every fiscal
year. All analyses are computed within EndFiscalYear;
cross-year ID joins are forbidden (the
variable_taxonomy.R registry sets
cross_year_safe = FALSE).
The Python version uses scikit-learn random-forest nuisance models for the
Frisch-Waugh-Lovell partialling-out construction. For the R port,
use the analogous morie_estimate_double_ml() from
causal.R, which already wraps DoubleML::DoubleMLPLR
(with mlr3 ranger learners) and a cross-fit ridge fallback.
morie_otis_plr(...)
... |
Arguments mirroring |
Stops with a redirect to morie_estimate_double_ml().
## Not run:
morie_otis_plr(df, treatment = "d", outcome = "y", covariates = "x")
## End(Not run)
Six lightweight callables mirroring the Python module
morie.otis: regional-placement matrices, alert-state combo
encoding, regional volatility, restrictive-confinement trends,
descriptive statistics, and a partialled-out Plug-in DML (PLR)
ATE/ATT estimator. Each public callable returns a named list with
classes c("morie_otis_result", "morie_rich_result", "list")
carrying summary_lines, optional tables, a
plain-language interpretation, and machine-readable
payload entries.
morie_otis_regional_placement(...)
morie_otis_alert_state_combo(...)
morie_otis_volatility(...)
morie_otis_rc_trends(...)
morie_otis_descriptives(...)
morie_otis_dml(...)
... |
Arguments forwarded verbatim to the canonical short-named OTIS primitive (e.g. |
Data sources: anonymized Ontario MCSCS placement records released
under the Jahn v. Ontario (2020) settlement. The canonical OTIS
table has 76,934 rows (FY 2022/23 – 2024/25). See
morie_otis_load in otis_analyze.R for the
canonical loader.
OTIS UniqueIndividual_ID (format YYYY-XXXXX-AA) is
randomly reassigned every fiscal year and re-randomized per dataset
file even within a year. The variable_taxonomy.R registry
enforces cross_year_safe = FALSE for this column. Every
aggregation below operates within EndFiscalYear; cross-year
joins on the ID are forbidden by design.
Ontario Ministry of the Solicitor General (2025). Restrictive Confinement Detailed Dataset. https://data.ontario.ca.
Jahn v. Ontario (2020). Settlement Agreement – Inmate Data Disclosure.
Chernozhukov, V. et al. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1), C1-C68.
The Python implementation provides a greedy 1:k nearest-neighbour (NN)
matcher on logit-PS with an Austin (2011) 0.2-SD caliper. In R, prefer the
canonical MatchIt implementation
(MatchIt::matchit(method = "nearest", caliper = 0.2,
std.caliper = TRUE)); the present stub holds the Python-API
surface so callers can detect the rename.
morie_otis_psm(...)
... |
Arguments mirroring |
Stops with a redirect to MatchIt.
## Not run:
morie_otis_psm(df, treatment = "d", outcome = "y", covariates = "x")
## End(Not run)
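To make the caliper rule concrete, here is a base-R sketch of greedy nearest-neighbour matching on the logit propensity score. It is a simplified 1:1 variant without replacement, written for illustration only; greedy_caliper_match is a hypothetical name, and neither the Python matcher nor MatchIt is claimed to work exactly this way.

```r
greedy_caliper_match <- function(d, lps, caliper_sd = 0.2) {
  caliper <- caliper_sd * sd(lps)      # caliper in SD units of the logit PS
  treated <- which(d == 1)
  pool <- which(d == 0)                # controls still available
  pairs <- list()
  for (i in treated) {
    if (length(pool) == 0L) break
    gap <- abs(lps[pool] - lps[i])
    j <- which.min(gap)
    if (gap[j] <= caliper) {
      pairs[[length(pairs) + 1L]] <- c(treated = i, control = pool[j])
      pool <- pool[-j]                 # match without replacement
    }
  }
  if (length(pairs) == 0L) {
    return(matrix(integer(0), 0, 2, dimnames = list(NULL, c("treated", "control"))))
  }
  do.call(rbind, pairs)
}

set.seed(1)
n <- 200L
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.4 * x))
lps <- unname(predict(glm(d ~ x, family = binomial()), type = "link"))
m <- greedy_caliper_match(d, lps)
head(m)
```

Treated units with no control inside the caliper are left unmatched, so the matched sample can be smaller than the treated group.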
Rosenbaum-Rubin (1983) PS-stratification (default n_strata = 5,
Cochran 1968 convention). Use MatchIt::matchit(method =
"subclass") for an R-side equivalent.
morie_otis_psm_subclass(...)
... |
Arguments mirroring |
Stops with a redirect to MatchIt subclassification.
## Not run:
morie_otis_psm_subclass(df, treatment = "d", outcome = "y", covariates = "x")
## End(Not run)
Per-(fiscal year, region) counts of unique individuals and total placements.
morie_otis_rctrnd(df, id_col = "unique_individual_id", year_col = "end_fiscal_year", region_col = "region_at_time_of_placement")
df |
data.frame. |
id_col, year_col, region_col
|
Column names. |
morie_otis_result (the trends table is in
payload$trends).
Per-person-year multi-region indicator (regC >= 2) cross-
tabulated with Gender and Age_Category; reports chi-square +
Cramer's V on each.
morie_otis_regC_demog_contingency(df)
df |
b01 data.frame. |
morie_otis_result.
Distinct (region x alert-combo) states occupied per person-year.
morie_otis_region_alert_state_richness(df)
df |
b01 data.frame. |
morie_otis_result.
Expands the OTIS b09 banded counts into a per-individual placement-
count vector, then reports the Gini coefficient, Hill-MLE power-law
alpha, top-10\
exponential null. Reuses the .gini_int / .hill_mle
helpers from mrm_otis.R.
morie_otis_repeat_placement_concentration(df, band_col = "NumberPlacements_Segregation", count_col = "NumberIndividuals_Segregation")
df |
b09 long-form data.frame. |
band_col, count_col
|
Column names. |
morie_otis_result.
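The expansion and tail-index steps can be sketched on a toy banded table. The column names, the purely numeric bands, and the threshold x_min = 2 are assumptions made for illustration; real b09 bands may be ranges or open-ended and would need a mapping first.

```r
# Toy banded table: 'placements' is the band value, 'individuals' its count.
band <- data.frame(placements = c(1, 2, 3, 10), individuals = c(60, 25, 10, 5))

# Expand to one entry per individual.
per_person <- rep(band$placements, times = band$individuals)

# Share of all placements held by the top 10% of individuals.
top_n <- ceiling(0.10 * length(per_person))
top10_share <- sum(sort(per_person, decreasing = TRUE)[seq_len(top_n)]) /
  sum(per_person)

# Hill (1975) tail-index estimate above a chosen threshold x_min.
hill_tail_index <- function(x, x_min) {
  tail_x <- x[x >= x_min]
  length(tail_x) / sum(log(tail_x / x_min))
}
hill_hat <- hill_tail_index(per_person, x_min = 2)
```

In the Clauset, Shalizi and Newman (2009) parameterisation, the power-law density exponent is one plus this tail index, so it is worth checking which convention a reported alpha follows.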
Builds a count matrix (age x region) and the row-normalised proportion matrix of unique-individual placements for one fiscal year, optionally filtered by gender.
morie_otis_rplace(df, year, sex = NULL, id_col = "unique_individual_id", age_col = "age_category", region_col = "region_at_time_of_placement", year_col = "end_fiscal_year", gender_col = "gender")
df |
data.frame of OTIS placement records. |
year |
Integer fiscal year (e.g. |
sex |
Optional gender filter ( |
id_col, age_col, region_col, year_col, gender_col
|
Column names. |
morie_otis_result list.
## Not run:
df <- morie_otis_load()
morie_otis_rplace(df, year = 2024)
## End(Not run)
Run all OTIS-TPS overlay analyses
morie_otis_tps_analyze_all(otis_b01, tps_datasets, out_dir = NULL)
otis_b01 |
OTIS b01 data.frame. |
tps_datasets |
Named list of TPS data.frames (one per category). |
out_dir |
Optional output directory for |
Named list of morie_otis_analysis_results
(region_rollup, yoy_correlation).
Same body as morie_otis_tps_yoy_correlation; the alias
preserves the Python entry-point name.
morie_otis_tps_composite_overlay(otis_b01, tps_datasets)
otis_b01 |
OTIS b01 data.frame
(e.g. |
tps_datasets |
A named list of TPS data.frames, one per
category (e.g. |
R port of morie.otis_tps_overlay. Both feeds touch Toronto,
so three overlay analyses are meaningful:
morie_otis_tps_yoy_correlation() – year-over-year
Pearson r between OTIS Toronto-region segregation placements
and TPS incident counts (per category).
morie_otis_tps_per_region_rollup() – OTIS seg/RC
totals per region x year, with the Toronto row flagged for
overlay use.
morie_otis_tps_composite_overlay() – alias for
morie_otis_tps_yoy_correlation() (preserves the Python
name, same body).
All three return a morie_otis_analysis_result (the shared
RichResult-shaped list from otis_all_analyze.R).
OTIS UniqueIndividual_ID is reassigned every fiscal
year (see variable_taxonomy.R); the overlay therefore
joins at the year grain, never at the person grain.
OTIS uses fiscal-year (EndFiscalYear); TPS uses
calendar OCC_YEAR or REPORT_YEAR. The Pearson r
here is computed on the year-aligned intersection – there is
a small fiscal/calendar misalignment that is documented in the
interpretation but not corrected. Toronto OTIS data covers only
2023-2025, so common-year samples are necessarily small.
OTIS seg/RC totals per region x year (Toronto row flagged for TPS-overlay use)
morie_otis_tps_per_region_rollup(otis_b01)
otis_b01 |
OTIS b01 data.frame. |
A morie_otis_analysis_result with a year x region
count matrix.
Year-over-year correlation between OTIS Toronto-region segregation placements and TPS incident counts (per category)
morie_otis_tps_yoy_correlation(otis_b01, tps_datasets)
otis_b01 |
OTIS b01 data.frame
(e.g. |
tps_datasets |
A named list of TPS data.frames, one per
category (e.g. |
A morie_otis_analysis_result with a per-category
Pearson r table.
if (FALSE) {
  b01 <- read.csv("b01_segregation_detailed_dataset.csv")
  tps <- list(
    assault = read.csv("Assault_Open_Data.csv"),
    robbery = read.csv("Robbery_Open_Data.csv")
  )
  morie_otis_tps_yoy_correlation(b01, tps)
}
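The year-aligned correlation reduces to an inner join on year followed by a Pearson r. The sketch below uses made-up yearly totals and hypothetical column names; how the package aggregates b01 and TPS rows to yearly counts is not shown here.

```r
# Toy yearly counts; real inputs would be aggregated from b01 and TPS files.
otis_by_year <- data.frame(year = 2023:2025, otis_n = c(120, 150, 180))
tps_by_year <- data.frame(year = 2021:2025,
                          tps_n = c(900, 950, 1000, 1100, 1200))

# Inner join keeps only years present in both feeds.
common <- merge(otis_by_year, tps_by_year, by = "year")

# With so few common years, r is descriptive at best.
r <- if (nrow(common) >= 3L) cor(common$otis_n, common$tps_n) else NA_real_
```

With only three common years, any correlation has a single degree of freedom for testing, so it should be read as a descriptive overlay rather than as evidence of association.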
Counts the number of distinct regions an individual was placed in
within one fiscal year (union of region_at_time_of_placement
and region_most_recent_placement).
morie_otis_volat(df, id_col = "unique_individual_id", year_col = "end_fiscal_year", regA_col = "region_at_time_of_placement", regB_col = "region_most_recent_placement")
df |
data.frame. |
id_col, year_col, regA_col, regB_col
|
Column names. |
morie_otis_result.
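Counting distinct regions over the union of both region columns can be sketched by stacking the columns first. The toy column names (id, year, regA, regB) stand in for the defaults listed in the usage above.

```r
toy <- data.frame(
  id   = c("a", "a", "b"),
  year = c(2024L, 2024L, 2024L),
  regA = c("Toronto", "Toronto", "Eastern"),
  regB = c("Toronto", "Northern", "Eastern"),
  stringsAsFactors = FALSE
)

# Stack both region columns, then count distinct regions per person-year.
long <- rbind(
  data.frame(id = toy$id, year = toy$year, region = toy$regA,
             stringsAsFactors = FALSE),
  data.frame(id = toy$id, year = toy$year, region = toy$regB,
             stringsAsFactors = FALSE)
)
vm <- aggregate(region ~ id + year, data = long,
                FUN = function(v) length(unique(v)))
names(vm)[names(vm) == "region"] <- "n_regions"
vm
```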
Distribution of segregation placements per (individual x fiscal
year) cell. Because OTIS IDs are year-locked
(YYYY-XXXXX-AA), each cell is one anonymous person-year;
cross-year readmission is not measurable.
morie_otis_within_year_placement_count(df)
df |
b01 data.frame. |
morie_otis_result.
Distinct Region_AtTimeOfPlacement values per person-year.
morie_otis_within_year_region_diversity(df)
df |
b01 data.frame. |
morie_otis_result.
Paired t-test
morie_paired_t_test(x1, x2, alternative = c("two.sided", "greater", "less"))
x1 |
Numeric vector (before/condition 1). |
x2 |
Numeric vector (after/condition 2). |
alternative |
|
Named list: t, df, p_value, ci_diff, mean_diff.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Parse an NYPD law_code string into its structural fields
Phase 3CCC1. NYPD law codes are space-or-zero-padded composites of a 1-4 char statute book prefix and a numeric/alpha section identifier.
morie_parse_nypd_law_code(law_code)
law_code |
Character vector of NYPD |
Examples:
"PL 1601005" -> book=PL, section=1601005 (Penal Law)
"VTL0511000" -> book=VTL, section=0511000
"AC 0019190" -> book=AC, section=0019190 (NYC Admin Code)
"ABC0064A00" -> book=ABC, section=0064A00
The book prefix is extracted as the leading run of uppercase ASCII letters; the section is everything after the prefix with leading whitespace stripped. NA / empty inputs return NA fields.
A data.frame with book, section columns aligned to
law_code. Length-preserving.
morie_parse_nypd_law_code(c("PL 1601005", "AC 0019190", "ABC0064A00"))
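The parsing rule stated above (leading run of uppercase letters, then the remainder with leading whitespace stripped) can be sketched with base-R regular expressions. parse_law_code is a hypothetical stand-in written from that description, not the package's source.

```r
parse_law_code <- function(law_code) {
  law_code <- as.character(law_code)
  # Length of the leading run of uppercase ASCII letters (-1 if none).
  len <- attr(regexpr("^[A-Z]+", law_code), "match.length")
  has_book <- !is.na(law_code) & !is.na(len) & len > 0
  book <- ifelse(has_book, substr(law_code, 1L, len), NA_character_)
  section <- ifelse(
    has_book,
    sub("^\\s+", "", substring(law_code, len + 1L)),
    NA_character_
  )
  data.frame(book = book, section = section, stringsAsFactors = FALSE)
}

res <- parse_law_code(
  c("PL 1601005", "VTL0511000", "AC 0019190", "ABC0064A00", NA, "")
)
res
```

The output has one row per input element, so it can be column-bound back onto the source table without a join.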
Resolve standard project paths
morie_paths(project_root = NULL)
project_root |
Project root directory. If |
Named list of key paths.
tryCatch(
  morie_paths(),
  error = function(e) message("not inside a morie project tree")
)
Wraps stats::prcomp.
morie_pca_dimension_reduction(x, n_components = NULL, seed = 0L)
x |
Numeric matrix. |
n_components |
Number of components (default min(n, p)). |
seed |
Unused for the SVD path; kept for API parity. |
Named list: estimate, components, explained_variance, explained_variance_ratio, singular_values, scores, n_components, n, method.
morie_pca_dimension_reduction(x = rnorm(50))
Convenience preset wrapping buttbp() with the standard PCG band
(25–400 Hz at 2000 Hz sampling). Attenuates baseline drift below 25 Hz and
high-frequency noise above 400 Hz.
morie_pcg_filter(x, fs = 2000, low = 25, high = 400)
x |
Numeric vector (PCG signal). |
fs |
Sampling frequency (Hz, default 2000). |
low |
Lower cutoff (Hz, default 25). |
high |
Upper cutoff (Hz, default 400). |
List with filtered signal (see buttbp()).
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(2000)  # 1 second of white-noise PCG-like input
  y <- morie_pcg_filter(x)
  length(y$filtered)
}
Uses glmnet if available; otherwise the base-R coordinate-descent fallback. Both solve:
morie_penalized_regression(x, y, alpha = 0.5, lam = 1, max_iter = 1000, tol = 1e-06)
x |
(n x p) predictor matrix. |
y |
Numeric response. |
alpha |
0 (ridge) to 1 (LASSO). |
lam |
Penalty strength. |
max_iter, tol
|
Convergence controls. |
min 1/(2n) ||y - X beta||^2 + lam (alpha ||beta||_1 + (1-alpha)/2 ||beta||_2^2).
list(estimate, beta, intercept, se, alpha, lam, n_iter, n, p, method).
Friedman, Hastie & Tibshirani (2010); Montesinos Lopez Ch 6.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
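A compact coordinate-descent solver for the elastic-net objective stated above, 1/(2n) ||y - X beta||^2 + lam (alpha ||beta||_1 + (1-alpha)/2 ||beta||_2^2), can be sketched as follows. The names soft_threshold and enet_cd are hypothetical, and handling the unpenalised intercept by centring is an assumption; the package's fallback may differ in its details.

```r
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

enet_cd <- function(x, y, alpha = 0.5, lam = 1, max_iter = 1000, tol = 1e-6) {
  n <- nrow(x)
  p <- ncol(x)
  x_mean <- colMeans(x)
  y_mean <- mean(y)
  xc <- sweep(x, 2, x_mean)          # centring absorbs the intercept
  resid <- y - y_mean
  col_ss <- colSums(xc^2) / n
  beta <- numeric(p)
  for (it in seq_len(max_iter)) {
    max_step <- 0
    for (j in seq_len(p)) {
      # Correlation of column j with the partial residual (j excluded).
      rho <- sum(xc[, j] * resid) / n + col_ss[j] * beta[j]
      new <- soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
      step <- new - beta[j]
      if (step != 0) {
        resid <- resid - xc[, j] * step
        beta[j] <- new
      }
      max_step <- max(max_step, abs(step))
    }
    if (max_step < tol) break
  }
  list(beta = beta, intercept = y_mean - sum(x_mean * beta), n_iter = it)
}

set.seed(1)
x <- matrix(rnorm(200 * 3), 200, 3)
y <- drop(1 + x %*% c(2, 0, -1) + rnorm(200))
ols_like <- enet_cd(x, y, lam = 0)               # no penalty
sparse <- enet_cd(x, y, alpha = 1, lam = 100)    # heavy LASSO penalty
```

With lam = 0 the updates reduce to Gauss-Seidel iterations for ordinary least squares, which gives a convenient correctness check against stats::lm.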
Trims central ranks; only tail ranks contribute. Score: a_i = max(R_i - (1-q)(N+1), 0) - max(q(N+1) - R_i, 0)
morie_percentile_modified_rank(x, y, q = 0.25)
x, y
|
Numeric vectors. |
q |
Tail fraction in (0, 0.5). Default 0.25. |
Named list: statistic, p_value, z, n, m, q.
morie_percentile_modified_rank(x = rnorm(50), y = rnorm(50))
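The score formula above can be evaluated directly. The sketch below also forms a linear rank statistic with the usual permutation mean and variance; that standardisation is an assumption, since the package's exact choice is not stated, and pmr_scores / pmr_test are hypothetical names.

```r
pmr_scores <- function(ranks, N, q = 0.25) {
  pmax(ranks - (1 - q) * (N + 1), 0) - pmax(q * (N + 1) - ranks, 0)
}

# With N = 7 and q = 0.25 only the extreme ranks get non-zero scores.
a7 <- pmr_scores(1:7, N = 7)

pmr_test <- function(x, y, q = 0.25) {
  n <- length(x)
  m <- length(y)
  N <- n + m
  a <- pmr_scores(rank(c(x, y)), N, q)
  stat <- sum(a[seq_len(n)])                 # sum of scores in the x sample
  mu <- n * mean(a)                          # permutation mean
  v <- n * m / (N * (N - 1)) * sum((a - mean(a))^2)  # permutation variance
  z <- (stat - mu) / sqrt(v)
  list(statistic = stat, z = z, p_value = 2 * pnorm(-abs(z)), n = n, m = m, q = q)
}

set.seed(1)
res <- pmr_test(rnorm(50), rnorm(50))
```

Central ranks receive a score of zero, so the statistic responds only to how the two samples are distributed across the lower and upper tails.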
Thin extender over performance::check_collinearity for
the variance-inflation-factor diagnostic.
morie_performance_check_collinearity(model, ...)
model |
A fitted model object supported by performance. |
... |
Further arguments forwarded to
|
A list with $method =
"performance::check_collinearity" and $raw (the
upstream VIF data frame).
Thin extender over performance::check_model that returns
the diagnostic-plot grob list for a fitted model.
morie_performance_check_model(model, ...)
model |
A fitted model object supported by insight / performance. |
... |
Further arguments forwarded to
|
A list with $method = "performance::check_model"
and $raw (the upstream check_model object).
Thin extender over performance::check_outliers for
composite outlier detection on a fitted model or numeric data.
morie_performance_check_outliers(x, ...)
x |
A fitted model object or numeric data frame supported
by |
... |
Further arguments forwarded to
|
A list with $method =
"performance::check_outliers" and $raw (the upstream
outlier-check object).
Thin extender over performance::r2 returning the
appropriate R-squared (Nakagawa, McFadden, Tjur, ...) for a
supported fitted model.
morie_performance_r2(model, ...)
model |
A fitted model object supported by performance. |
... |
Further arguments forwarded to |
A list with $method = "performance::r2" and
$raw (the upstream R-squared object).
Point-biserial correlation
morie_point_biserial_r(binary_var, continuous_var)
binary_var |
Binary numeric vector (0/1). |
continuous_var |
Continuous numeric vector. |
Named list: r, p_value.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
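The point-biserial correlation is the Pearson correlation between a 0/1 variable and a continuous one, and it has an equivalent group-means form. The sketch below checks that identity on simulated data; whether the package computes r this way internally is not stated, so treat it as background rather than as the implementation.

```r
set.seed(1)
g <- rbinom(100, 1, 0.4)
v <- rnorm(100, mean = 0.5 * g)

r_pearson <- cor(g, v)

# Group-means form: (M1 - M0) / s_n * sqrt(p * q), with s_n using divisor n.
n1 <- sum(g == 1)
n0 <- sum(g == 0)
n <- n1 + n0
s_n <- sqrt(mean((v - mean(v))^2))
r_formula <- (mean(v[g == 1]) - mean(v[g == 0])) / s_n * sqrt(n1 * n0 / n^2)

p_value <- cor.test(g, v)$p.value
```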
Polynomial feature expansion + OLS via stats::poly +
stats::lm. Uses raw (not orthogonal) polynomials for parity
with scikit-learn's PolynomialFeatures.
morie_polynomial_regression(x, y, degree = 2L)
x |
Numeric vector or matrix. |
y |
Numeric response. |
degree |
Polynomial degree. |
Named list: estimate, se, feature_names, degree, n, method.
morie_polynomial_regression(x = rnorm(50), y = rnorm(50))
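The underlying stats::poly + stats::lm combination with raw polynomials looks like this on a noiseless quadratic. This shows the base-R calls only, not the wrapper's return structure.

```r
x <- seq(-2, 2, length.out = 25)
y <- 1 + 2 * x + 3 * x^2            # noiseless quadratic

# raw = TRUE keeps the columns as x and x^2 rather than orthogonal polynomials.
fit <- lm(y ~ poly(x, degree = 2, raw = TRUE))
beta_hat <- unname(coef(fit))
beta_hat
```

With raw polynomials the coefficients line up directly with the powers of x. Orthogonal polynomials give the same fitted values but coefficients that are not comparable to scikit-learn's PolynomialFeatures output.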
Mirrors R's power.prop.test().
morie_power_prop_test(n = NULL, p1 = NULL, p2 = NULL, sig_level = 0.05, power = NULL, alternative = c("two.sided", "one.sided"))
n |
Sample size per group. |
p1 |
Proportion in group 1. |
p2 |
Proportion in group 2. |
sig_level |
Type I error rate. |
power |
Desired power. |
alternative |
|
Result of stats::power.prop.test().
morie_power_prop_test(p1 = 0.30, p2 = 0.20, power = 0.80)
Solve for any missing parameter (n, delta, sd, sig.level,
or power). Mirrors R's power.t.test().
morie_power_t_test(
  n = NULL,
  delta = NULL,
  sd = 1,
  sig_level = 0.05,
  power = NULL,
  alternative = c("two.sided", "one.sided"),
  type = c("two.sample", "one.sample", "paired")
)
n |
Sample size per group (NULL to solve for it). |
delta |
Effect size (difference in means). |
sd |
Standard deviation (pooled). |
sig_level |
Type I error rate (alpha). |
power |
Desired power (1 - beta). |
alternative |
|
type |
|
Result of stats::power.t.test().
morie_power_t_test(n = NULL, delta = 0.5, power = 0.80)
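Because the wrapper returns the result of stats::power.t.test(), the "solve for the missing parameter" behaviour can be previewed with the upstream function directly. This is a sketch of the upstream call only; how the wrapper maps sig_level onto sig.level is assumed, not shown here.

```r
# Leave exactly one of n, delta, sd, sig.level, power as NULL;
# stats::power.t.test() solves for it.
res_n <- stats::power.t.test(
  n = NULL, delta = 0.5, sd = 1,
  sig.level = 0.05, power = 0.80,
  type = "two.sample", alternative = "two.sided"
)
n_per_group <- ceiling(res_n$n)  # round up to whole participants

# The same function solves for power when n is supplied instead.
res_power <- stats::power.t.test(
  n = n_per_group, delta = 0.5, sd = 1, sig.level = 0.05
)
achieved_power <- res_power$power
```

Rounding n up means the achieved power is at least the requested 0.80.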
Thin extender over ppcor::pcor (matrix-wise) or
ppcor::pcor.test (when y and z are
supplied) for partial correlations controlling for one or more
variables.
morie_ppcor_partial(x, y = NULL, z = NULL, method = "pearson", ...)
x |
Numeric vector, matrix, or data frame. When |
y |
Optional second numeric vector. |
z |
Optional numeric vector / matrix of control variables. |
method |
Correlation method ( |
... |
Further arguments forwarded to the upstream function. |
A list with $method (qualified upstream name) and
$raw (the upstream return object).
Thin extender over ppcor::spcor (matrix-wise) or
ppcor::spcor.test (when y and z are
supplied) for semi-partial (part) correlations.
morie_ppcor_semipartial(x, y = NULL, z = NULL, method = "pearson", ...)
x |
Numeric vector, matrix, or data frame. When |
y |
Optional second numeric vector. |
z |
Optional numeric vector / matrix of control variables. |
method |
Correlation method ( |
... |
Further arguments forwarded to the upstream function. |
A list with $method (qualified upstream name) and
$raw (the upstream return object).
Probability proportional to size (PPS) sampling
morie_pps_sample(df, size_col, n, seed = 42L, replace = FALSE)
df |
A data frame. |
size_col |
Name of the size measure column. |
n |
Number of units to select. |
seed |
Random seed. |
replace |
Logical; |
Data frame of selected units with .weight (Hansen-Hurwitz weights).
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
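For orientation, the with-replacement case can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the toy data frame, the column names size and y, and the weight formula 1 / (n * p_i) (the usual Hansen-Hurwitz form) are all assumptions, and the without-replacement default of morie_pps_sample() is not reproduced.

```r
# PPS sampling with replacement and Hansen-Hurwitz weights (sketch).
set.seed(42)
df <- data.frame(
  id   = 1:8,
  size = c(5, 10, 15, 20, 50, 100, 200, 600),
  y    = c(2, 3, 7, 9, 20, 45, 80, 260)
)
n <- 4

# Single-draw selection probability, proportional to the size measure.
p <- df$size / sum(df$size)

# Draw n units with replacement using those probabilities.
idx <- sample(seq_len(nrow(df)), size = n, replace = TRUE, prob = p)
smp <- df[idx, ]

# Hansen-Hurwitz weight: inverse of the expected number of selections.
smp$.weight <- 1 / (n * p[idx])

# Weighted estimate of the population total of y.
total_hat <- sum(smp$.weight * smp$y)
```

Large units are drawn more often but carry smaller weights, which keeps the total estimator unbiased under this design.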
Reports Pearson r, Spearman rho, MSE/MSPE, RMSE, R^2, calibration slope and intercept.
morie_prediction_accuracy(y_true, y_pred)
y_true |
Numeric observed. |
y_pred |
Numeric predicted. |
list(estimate (Pearson r), pearson_r, morie_spearman_rho, mse, mspe, rmse, r2, slope, intercept, n, method).
Montesinos Lopez Ch 2.
morie_prediction_accuracy(y_true = rbinom(50, 1, 0.5), y_pred = rbinom(50, 1, 0.5))
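Most of the reported quantities have one-line base-R counterparts. The sketch below uses simulated continuous data; the direction of the calibration regression (observed on predicted) is an assumption, and R^2 is left out because its definition is not stated above.

```r
# Base-R counterparts of the reported accuracy metrics (sketch).
set.seed(1)
y_true <- stats::rnorm(50)
y_pred <- 0.8 * y_true + stats::rnorm(50, sd = 0.5)

pearson_r    <- stats::cor(y_true, y_pred)
spearman_rho <- stats::cor(y_true, y_pred, method = "spearman")
mse          <- mean((y_true - y_pred)^2)
rmse         <- sqrt(mse)

# Calibration line: regress observed on predicted (assumed direction).
# A slope near 1 and an intercept near 0 indicate good calibration.
calib     <- stats::coef(stats::lm(y_true ~ y_pred))
intercept <- unname(calib[1])
slope     <- unname(calib[2])
```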
Aggregate per-record predictive-policing data to one row per area
morie_predpol_aggregate_areas(
  area,
  risk,
  outcome,
  group = NULL,
  population = NULL
)
area |
Area identifier for each record. |
risk |
Predicted risk score for each record. |
outcome |
Realised-outcome indicator/count for each record. |
group |
Optional protected attribute per record; the per-area majority value becomes that area's group label. |
population |
Optional area population: a named numeric vector
( |
A named list: areas, mean_risk, outcome_rate, group,
n_records.
agg <- morie_predpol_aggregate_areas(
  area = c("a", "a", "b", "b"),
  risk = c(10, 20, 30, 40),
  outcome = c(1, 0, 1, 1)
)
agg$mean_risk     # 15 35
agg$outcome_rate  # 0.5 1.0
Ranks areas by predicted risk and by realised outcome rate (rank 1 =
highest), forms rank_gap = outcome_rank - risk_rank per area
(positive = over-predicted), and averages the gap within each group.
A Spearman correlation summarises overall calibration.
morie_predpol_calibration_audit(areas, mean_risk, outcome_rate, group)
areas |
Area identifiers (one per area). |
mean_risk |
Mean predicted risk per area. |
outcome_rate |
Realised outcome rate per area. |
group |
Majority protected-attribute label per area. |
A named list: value (worst per-group mean gap), spearman,
spearman_pvalue, group_rank_gap, worst_group, rank_gap,
warnings, interpretation.
res <- morie_predpol_calibration_audit(
  areas = c("d1", "d2", "d3", "d4", "d5", "d6"),
  mean_risk = c(90, 80, 70, 30, 20, 10),
  outcome_rate = c(10, 20, 30, 70, 80, 90),
  group = c("X", "X", "X", "Y", "Y", "Y")
)
res$group_rank_gap$X  # 3 (group X over-predicted)
res$spearman          # -1 (perfectly miscalibrated)
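The rank-gap arithmetic described for this audit can be traced by hand in base R on the same six areas. This is a sketch of the description, not the package's code; in particular, the use of average ranks for ties is an assumption.

```r
# Rank-gap calibration audit, step by step (sketch).
mean_risk    <- c(90, 80, 70, 30, 20, 10)
outcome_rate <- c(10, 20, 30, 70, 80, 90)
group        <- c("X", "X", "X", "Y", "Y", "Y")

# Rank 1 = highest, so rank the negated values.
risk_rank    <- rank(-mean_risk)
outcome_rank <- rank(-outcome_rate)

# Positive gap: the area ranks higher on predicted risk than on
# realised outcomes, i.e. it is over-predicted.
rank_gap <- outcome_rank - risk_rank

# Average the gap within each group.
group_rank_gap <- tapply(rank_gap, group, mean)

# Overall agreement between predicted and realised ordering.
spearman <- stats::cor(mean_risk, outcome_rate, method = "spearman")
```

Group X has gaps 5, 3 and 1 (mean 3) and group Y has gaps -1, -3 and -5 (mean -3), which matches the value shown in the example.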
Reports per-group n / mean / median / sd, a one-way ANOVA for
whether group membership relates to the score, and each group's
mean-score gap from a reference group. A significant gap is not
itself proof of bias; pair this with morie_predpol_calibration_audit().
morie_predpol_score_disparity(score, group, reference = NULL)
score |
Continuous risk score, one per individual. |
group |
Protected attribute, one per individual. |
reference |
Reference group for the gaps; defaults to the lowest-scoring group. |
A named list: value (mean-score spread), spread,
group_means, gaps, anova_f, anova_pvalue, significant,
reference, warnings, interpretation.
res <- morie_predpol_score_disparity(
  score = c(9, 10, 11, 19, 20, 21),
  group = c("A", "A", "A", "B", "B", "B")
)
res$value        # 10 (group means 10 and 20)
res$significant  # TRUE
For every (city, period) cell the four disparity metrics are
computed; per city the audit then reports the mean of each metric,
the count of periods with DIR above 1, and the DIR temporal range
(max minus min) — the headline measure of instability.
morie_predpol_temporal_audit(
  period,
  city,
  y_pred,
  group,
  privileged = NULL,
  favorable = 1
)
period |
Time-period label for each record (e.g. |
city |
City label for each record. |
y_pred |
The decision/assignment for each record. |
group |
Protected attribute for each record. |
privileged |
Reference group; inferred globally from the pooled
data when |
favorable |
Value of |
A named list: value (worst per-city DIR range),
worst_dir_range, cross_city_dir_spread, per_city, cells,
privileged, warnings, interpretation.
period <- c(rep("p1", 10), rep("p2", 10))
city <- rep("A", 20)
pred <- rep(c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0), 2)
grp <- rep(c(rep("X", 5), rep("Y", 5)), 2)
res <- morie_predpol_temporal_audit(period, city, pred, grp, privileged = "X")
res$per_city$A$dir_range  # 0: disparity is stable across periods
Mirrors the Python morie.profile_dataset(). Returns a list of
per-column profiles plus dataset-level metadata.
morie_profile_dataset(df)
df |
A |
A list with components:
n_rows, n_cols: Dataset dimensions.
columns: A named list, one entry per column, each containing
name, dtype, measurement_level, n_missing, n_unique, and
(for numeric columns) mean, sd, min, max, q25, q50, q75.
p <- morie_profile_dataset(iris)
p$columns$Species
p$columns$Sepal.Length
Prophet-style additive decomposition (linear trend + Fourier seasonality)
morie_prophet_components(x, period = 12)
x |
Numeric univariate series. |
period |
Seasonal period. Default 12. |
Named list with trend, seasonal, residual, slope,
intercept, fourier_terms, period, n, method.
morie_prophet_components(x = rnorm(50))
Wilson score confidence interval for a proportion
morie_proportion_ci(
  successes,
  n,
  alpha = 0.05,
  method = c("wilson", "exact", "wald")
)
successes |
Number of successes. |
n |
Total observations. |
alpha |
Significance level (default 0.05 -> 95% CI). |
method |
|
Named list: p_hat, ci_lower, ci_upper.
morie_proportion_ci(35, 100)
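The Wilson interval has a closed form that is easy to check against base R. The helper below (wilson_ci, a hypothetical name) is a sketch of the textbook formula, not the package's code; it is compared with stats::prop.test(correct = FALSE), which reports the same score interval.

```r
# Wilson score interval for a binomial proportion (sketch).
wilson_ci <- function(successes, n, alpha = 0.05) {
  z      <- stats::qnorm(1 - alpha / 2)
  p_hat  <- successes / n
  denom  <- 1 + z^2 / n
  centre <- (p_hat + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
  list(p_hat = p_hat, ci_lower = centre - half, ci_upper = centre + half)
}

ci  <- wilson_ci(35, 100)

# Reference: the uncorrected prop.test interval is the Wilson interval.
ref <- as.numeric(stats::prop.test(35, 100, correct = FALSE)$conf.int)
```

Unlike the Wald interval, the Wilson interval is centred on a value shrunk towards 0.5 and never leaves the unit interval.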
Hand-rolled base-R implementation. When the psych package is installed,
results agree with psych::alpha()$total to numerical precision.
morie_psymet_alpha(data, ci = 0.95)
data |
Numeric matrix or data.frame: items as columns, respondents as rows. |
ci |
Confidence level (default 0.95). |
A list with components raw, std, avgr, k, n, ci_lo, ci_hi.
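As a point of reference, raw and standardised alpha follow from the item variances and the mean inter-item correlation. The sketch below uses simulated items and the textbook formulas; it assumes that raw, std and avgr correspond to these quantities, and it omits the confidence interval.

```r
# Cronbach's alpha from first principles (sketch).
set.seed(1)
latent <- stats::rnorm(200)
items  <- sapply(1:5, function(j) latent + stats::rnorm(200))
k      <- ncol(items)

# Raw alpha: compares the sum of item variances to the total-score variance.
item_var  <- apply(items, 2, stats::var)
total_var <- stats::var(rowSums(items))
alpha_raw <- k / (k - 1) * (1 - sum(item_var) / total_var)

# Standardised alpha: based on the mean inter-item correlation.
R         <- stats::cor(items)
avg_r     <- mean(R[lower.tri(R)])
alpha_std <- k * avg_r / (1 + (k - 1) * avg_r)
```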
Alpha if item deleted
morie_psymet_alphadel(data)
data |
Numeric matrix / data.frame. |
data.frame with item, adel.
Average variance extracted (AVE) from factor loadings: AVE = mean(lambda^2).
morie_psymet_ave(loads)
loads |
Numeric vector of standardised factor loadings (lambda). |
Bartlett's test of sphericity.
morie_psymet_bartlett(data)
data |
Numeric matrix or data.frame of items. |
list with chisq, df, pval.
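The test statistic has a short closed form based on the determinant of the correlation matrix. The sketch below uses the standard chi-square approximation on simulated data; it illustrates the classical formula and is not taken from the package source.

```r
# Bartlett's test of sphericity (sketch).
set.seed(1)
n <- 150
f <- stats::rnorm(n)
data <- cbind(
  f + stats::rnorm(n),
  f + stats::rnorm(n),
  f + stats::rnorm(n),
  stats::rnorm(n)
)
p <- ncol(data)
R <- stats::cor(data)

# H0: R is the identity matrix (no shared variance to factor).
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df    <- p * (p - 1) / 2
pval  <- stats::pchisq(chisq, df = df, lower.tail = FALSE)
```

A small p-value indicates that the items are correlated enough for factor analysis to be worthwhile.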
Composite reliability from standardised factor loadings.
CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2)).
morie_psymet_cr(loads)
loads |
Numeric vector of standardised factor loadings (lambda). |
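Both AVE and CR are direct functions of the loading vector, as the following sketch shows. The loadings are made-up illustrative values, and the code simply transcribes the two formulas stated above.

```r
# AVE and composite reliability from standardised loadings (sketch).
loads <- c(0.7, 0.8, 0.9)  # illustrative loadings, not real data

# Average variance extracted: mean squared loading.
ave <- mean(loads^2)

# Composite reliability: squared sum of loadings relative to that
# quantity plus the summed error variances (1 - lambda^2).
cr <- sum(loads)^2 / (sum(loads)^2 + sum(1 - loads^2))
```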
Upper/lower groups by total score (default 27% per Kelley).
morie_psymet_discrimination(data, pct = 0.27)
data |
Numeric matrix or data.frame of items. |
pct |
Numeric in (0, 0.5); proportion for the upper/lower group split (default 0.27, the Kelley-Cureton rule). |
data.frame with item, d.
Corrected item-total correlations
morie_psymet_itemtotal(data)
data |
Numeric matrix / data.frame. |
data.frame with columns item, r_total, r_corr.
Delegates to psych::KMO() when available; otherwise computes from the
partial-correlation anti-image matrix using base R.
morie_psymet_kmo(data)
data |
Numeric matrix or data.frame of items. |
list with msa (overall) and named numeric vector items.
Delegates to psych::omega() when available; otherwise uses a single-factor
principal-axis approximation.
morie_psymet_omega(data, nf = 1)
data |
Numeric matrix / data.frame of items. |
nf |
Number of factors (default 1). |
list with total, hier, alpha, nf, expvar.
Delegates to psych::fa.parallel() when available; otherwise compares
observed eigenvalues to the 95th percentile of random-data eigenvalues.
morie_psymet_parallel(data, nsim = 100, seed = 42)
data |
Numeric matrix or data.frame of items. |
nsim |
Integer; number of simulated random datasets (default 100). |
seed |
Integer; RNG seed for reproducibility. |
Spearman-Brown split-half reliability.
morie_psymet_splithalf(data, method = c("first_last", "odd_even"))
data |
Numeric matrix or data.frame of items. |
method |
"first_last" or "odd_even". |
Thin extender over quantreg::rq for Koenker-Bassett
quantile regression at one or more conditional quantiles
(Koenker & Bassett, 1978; Koenker, 2005).
morie_quantile_reg(formula, tau = 0.5, data, ...)
formula |
A model formula of the form
|
tau |
Numeric scalar or vector in |
data |
A data frame containing the variables in
|
... |
Further arguments forwarded to |
A list with $method = "quantreg::rq" and
$raw (an rq / rqs object with the fitted
coefficients at each tau).
## Not run:
if (requireNamespace("quantreg", quietly = TRUE)) {
  set.seed(1)
  n <- 100
  df <- data.frame(x = stats::rnorm(n))
  df$y <- 1 + 2 * df$x + stats::rnorm(n)
  morie_quantile_reg(y ~ x, tau = c(0.25, 0.5, 0.75), data = df)
}
## End(Not run)
Wraps randomForest::randomForest. Auto-detects task from y
(factor / integer-like -> classification, otherwise regression).
morie_random_forest_ensemble(
  x,
  y,
  n_estimators = 100L,
  max_depth = NULL,
  task = "auto",
  seed = 0L,
  deterministic_seed = NULL
)
x |
Numeric predictor matrix. |
y |
Response. |
n_estimators |
Number of trees. |
max_depth |
Max tree depth (NULL -> unrestricted). |
task |
"auto", "classification", or "regression". |
seed |
RNG seed. |
deterministic_seed |
Integer or NULL. If supplied, the RNG state
is derived from the SHA-keyed |
Named list: estimate, train_score, oob_score, feature_importances, n_estimators, task, n, method.
morie_random_forest_ensemble(x = rnorm(50), y = rnorm(50))
Uses randomForest if available; otherwise a base-R bagged-tree fallback (regression CART approximation).
morie_random_forest_genomic(
  x,
  y,
  markers,
  n_trees = 100,
  max_depth = 10,
  min_samples = 2,
  mtry = NULL,
  seed = 0
)
x |
Optional fixed features. |
y |
Numeric response. |
markers |
Genotype matrix (n x m). |
n_trees |
Number of trees. |
max_depth |
Max depth (fallback only). |
min_samples |
Min samples per node. |
mtry |
Features sampled per split (default sqrt(p)). |
seed |
Seed. |
list(estimate, y_hat, oob_score, feature_importance, se, n, method).
Breiman (2001); Montesinos Lopez Ch 8.
morie_random_forest_genomic(
  x = rnorm(50),
  y = rnorm(50),
  markers = matrix(sample(0:2, 200, TRUE), 50, 4)
)
Uses caret::train with search = "random".
morie_random_search_cv(
  x,
  y,
  method = NULL,
  n_iter = 20L,
  cv = 5L,
  task = "auto",
  seed = 0L,
  deterministic_seed = NULL
)
x |
Numeric predictor matrix. |
y |
Response. |
method |
caret method id (default by task). |
n_iter |
Number of random draws. |
cv |
CV folds. |
task |
"auto" / "classification" / "regression". |
seed |
RNG seed. |
deterministic_seed |
Integer or NULL. If supplied, the RNG state
is derived from the SHA-keyed |
Named list: estimate, best_params, best_score, sampled_params, sampled_scores, n_iter, task, n, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Thin extender over randtests::bartels.rank.test for the
Bartels rank von-Neumann test of randomness in a numeric
sequence.
morie_randtests_bartels(x, ...)
x |
A numeric vector. |
... |
Further arguments forwarded to
|
A list with $method =
"randtests::bartels.rank.test" and $raw (an
htest object).
Thin extender over randtests::runs.test for a
non-parametric test of randomness in a numeric sequence.
morie_randtests_runs(x, ...)
x |
A numeric vector. |
... |
Further arguments forwarded to
|
A list with $method = "randtests::runs.test"
and $raw (an htest object).
Thin extender over randtests::turning.point.test for the
classical turning-point test of randomness in a numeric
sequence.
morie_randtests_turning_point(x, ...)
x |
A numeric vector. |
... |
Further arguments forwarded to
|
A list with $method =
"randtests::turning.point.test" and $raw (an
htest object).
Kendall tau between the observation and its time index t = 1..n. Tests H0: no monotone trend.
morie_rank_based_test(x)
x |
Numeric vector of sequential observations. |
Named list: statistic (tau), p_value, n, inversions, z.
morie_rank_based_test(x = rnorm(50))
Signed ranks R_i^+ = sign(D_i) * rank(|D_i|) used by Wilcoxon signed-rank.
morie_rank_order_statistics(x, mu0 = 0)
x |
Numeric vector of differences (or values; mu0 is subtracted). |
mu0 |
Hypothesised median (default 0). |
Named list: signed_ranks, abs_ranks, W_plus, W_minus, n_nonzero, n.
morie_rank_order_statistics(x = rnorm(50))
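The signed-rank construction can be reproduced in a few lines of base R and cross-checked against stats::wilcox.test(). The sketch assumes continuous data (no ties) and drops zero differences before ranking, which is the conventional treatment; whether the package does the same is an assumption.

```r
# Signed ranks and the Wilcoxon W+ / W- statistics (sketch).
set.seed(1)
x   <- stats::rnorm(20)
mu0 <- 0

d <- x - mu0
d <- d[d != 0]                    # zero differences carry no sign

abs_ranks    <- rank(abs(d))      # ranks of |D_i|
signed_ranks <- sign(d) * abs_ranks

W_plus  <- sum(abs_ranks[d > 0])  # sum of ranks of positive differences
W_minus <- sum(abs_ranks[d < 0])  # sum of ranks of negative differences
n_nonzero <- length(d)

# Reference: wilcox.test reports V, the sum of positive ranks.
V <- unname(stats::wilcox.test(x, mu = mu0)$statistic)
```

W_plus and W_minus always sum to n_nonzero * (n_nonzero + 1) / 2, so either one determines the other.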
For each Y_j: placement P_j = number of X_i less than Y_j. Their sum is the Mann-Whitney U statistic for Y vs X.
morie_rank_placements(x, y)
x, y |
Numeric vectors. |
Named list: placements, ranks_y, U_y, E_U, Var_U, m, n.
morie_rank_placements(x = rnorm(50), y = rnorm(50))
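The placement definition translates directly into base R, and the resulting sum can be checked against stats::wilcox.test(). In this sketch m is taken to be length(x) and n to be length(y); that labelling, and the null moments shown, are the standard Mann-Whitney conventions rather than something confirmed by the package source.

```r
# Placements of Y among X and the Mann-Whitney U statistic (sketch).
set.seed(1)
x <- stats::rnorm(12)
y <- stats::rnorm(15, mean = 0.5)
m <- length(x)
n <- length(y)

# P_j = number of X_i strictly less than Y_j.
placements <- vapply(y, function(yj) sum(x < yj), numeric(1))
U_y <- sum(placements)

# Null mean and variance of U (no ties).
E_U   <- m * n / 2
Var_U <- m * n * (m + n + 1) / 12
z     <- (U_y - E_U) / sqrt(Var_U)

# Reference: W from wilcox.test(y, x) counts pairs with y_j > x_i.
W <- unname(stats::wilcox.test(y, x)$statistic)
```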
Calonico-Cattaneo-Titiunik (CCT) MSE-optimal bandwidth
morie_rdd_bandwidth_cct(x, y, cutoff = 0, kernel = "triangular", p = 1)
x |
Numeric vector of running-variable values (used by
bandwidth selectors + density tests that don't take a
|
y |
Numeric vector of outcome values aligned with |
cutoff |
Numeric scalar; the threshold on |
kernel |
One of |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
Dispatches to rdrobust::rdbwselect(bwselect = "mserd"), which
implements the modern IK-equivalent CCT MSE-optimal rule.
morie_rdd_bandwidth_ik(x, y, cutoff = 0, kernel = "triangular")
x |
Numeric vector of running-variable values (used by
bandwidth selectors + density tests that don't take a
|
y |
Numeric vector of outcome values aligned with |
cutoff |
Numeric scalar; the threshold on |
kernel |
One of |
Rule-of-thumb (ROT) bandwidth – Silverman-style on running variable
morie_rdd_bandwidth_rot(x, y, cutoff = 0)
x |
Numeric vector of running-variable values (used by
bandwidth selectors + density tests that don't take a
|
y |
Numeric vector of outcome values aligned with |
cutoff |
Numeric scalar; the threshold on |
Bandwidth sensitivity sweep
morie_rdd_bandwidth_sensitivity(
  data,
  outcome,
  running,
  cutoff = 0,
  bandwidth_range = NULL,
  p = 1,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
bandwidth_range |
Numeric vector of candidate bandwidths used by the sensitivity analysis. |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
alpha |
Significance level (default |
CCT bias-corrected, robust-SE RDD inference
morie_rdd_bias_corrected(
  data,
  outcome,
  running,
  cutoff = 0,
  bandwidth = NULL,
  rho = 1,
  p = 1,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
rho |
Bandwidth ratio for bias correction (Calonico, Cattaneo
& Titiunik 2014); default |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
alpha |
Significance level (default |
Cattaneo-Jansson-Ma (2020) local-polynomial density test
morie_rdd_cattaneo_density(
  x,
  cutoff = 0,
  p = 2,
  kernel = "triangular",
  bandwidth = NULL
)
x |
Numeric vector of running-variable values (used by
bandwidth selectors + density tests that don't take a
|
cutoff |
Numeric scalar; the threshold on |
p |
Integer; local-polynomial order. This function defaults to 2 (see the usage above); 1 is local-linear, and 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
Runs a sharp-RDD null test on each covariate.
morie_rdd_covariate_balance(
  data,
  running,
  covariates,
  cutoff = 0,
  bandwidth = NULL,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
running |
Character; column name of the running (forcing)
variable in |
covariates |
Character vector of column names whose balance at the cutoff is checked. |
cutoff |
Numeric scalar; the threshold on |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
kernel |
One of |
alpha |
Significance level (default |
Thin extender over rddensity::rddensity for the
manipulation / discontinuity-in-density test of Cattaneo, Jansson
& Ma (2020), the modern replacement for the original McCrary
(2008) test.
morie_rdd_density_test(X, cutoff = 0, ...)
X |
Numeric vector of the running / forcing variable. |
cutoff |
Numeric scalar; the threshold value of |
... |
Further arguments forwarded to
|
A list with $method = "rddensity::rddensity" and
$raw (an rddensity object containing the
estimates and the manipulation test statistic).
## Not run:
if (requireNamespace("rddensity", quietly = TRUE)) {
  set.seed(1)
  x <- c(rnorm(500, -0.2), rnorm(500, 0.2))
  morie_rdd_density_test(x, cutoff = 0)
}
## End(Not run)
RDD with discrete running variable
morie_rdd_discrete(
  data,
  outcome,
  running,
  cutoff = 0,
  bandwidth = NULL,
  p = 0,
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
p |
Integer; local-polynomial order. This function defaults to 0 (local-constant; see the usage above); 1 is local-linear, and 2 picks up quadratic curvature for bias correction. |
alpha |
Significance level (default |
Donut-hole RDD
morie_rdd_donut(
  data,
  outcome,
  running,
  cutoff = 0,
  donut = 0,
  bandwidth = NULL,
  p = 1,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
donut |
Numeric; symmetric window around the cutoff to drop
in a donut-RDD robustness check (default |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
alpha |
Significance level (default |
Fuzzy RDD treatment effect via instrumented Wald ratio
morie_rdd_fuzzy(
  data,
  outcome,
  running,
  treatment,
  cutoff = 0,
  bandwidth = NULL,
  p = 1,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
treatment |
Character; column name of the treatment-receipt variable (fuzzy designs). |
cutoff |
Numeric scalar; the threshold on |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
alpha |
Significance level (default |
Geographic / boundary RDD on a signed distance
morie_rdd_geographic(
  data,
  outcome,
  distance_to_boundary,
  side,
  bandwidth = NULL,
  p = 1,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
distance_to_boundary |
Character; column name of the signed
distance to the geographic boundary in |
side |
Character; column name encoding the treatment side
(e.g. |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
alpha |
Significance level (default |
Vectorised kernel functions on the support |u| <= 1 (Gaussian is on the real line). Used by RDD local-linear estimators and friends for kernel weighting around the cutoff.
morie_rdd_kernel_triangular(u)
morie_rdd_kernel_epanechnikov(u)
morie_rdd_kernel_uniform(u)
morie_rdd_kernel_gaussian(u)
u |
Numeric vector of standardised distances from the cutoff
(i.e. |
morie_rdd_kernel_triangular: K(u) = 1 - |u| on |u| <= 1
morie_rdd_kernel_epanechnikov: K(u) = (3/4) * (1 - u^2) on |u| <= 1
morie_rdd_kernel_uniform: K(u) = 1/2 on |u| <= 1
morie_rdd_kernel_gaussian: K(u) = exp(-u^2 / 2) / sqrt(2 * pi), the standard normal density
Numeric vector of kernel weights, same length as u.
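A minimal sketch of the four kernels in their textbook normalisation follows. The helper names (k_triangular and so on) are hypothetical stand-ins, the normalising constants are the conventional ones rather than values confirmed from the package source, and the standardisation u = (x - cutoff) / h is likewise assumed.

```r
# Textbook kernel weight functions for RDD (sketch).
k_triangular   <- function(u) pmax(1 - abs(u), 0)
k_epanechnikov <- function(u) 0.75 * pmax(1 - u^2, 0)
k_uniform      <- function(u) 0.5 * (abs(u) <= 1)
k_gaussian     <- function(u) stats::dnorm(u)

# Standardised distance from the cutoff (assumed convention).
x      <- c(-0.4, -0.1, 0.05, 0.3, 1.2)
cutoff <- 0
h      <- 0.5
u      <- (x - cutoff) / h

# Observations outside the bandwidth (|u| > 1) get zero weight under
# the compact-support kernels.
w <- k_triangular(u)
```

The triangular kernel puts the most weight on observations nearest the cutoff, which is why it is the default throughout the RDD functions.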
Regression kink design – slope discontinuity at the cutoff
morie_rdd_kink(
  data,
  outcome,
  running,
  cutoff = 0,
  bandwidth = NULL,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
kernel |
One of |
alpha |
Significance level (default |
Local polynomial regression at user-supplied evaluation points
morie_rdd_local_polynomial(x, y, eval_points, h, p = 1, kernel = "triangular")
x |
Running variable (numeric). |
y |
Outcome (numeric). |
eval_points |
Points at which to evaluate the fit. |
h |
Bandwidth. |
p |
Polynomial order (default 1, i.e. local linear). |
kernel |
One of |
A data frame of fitted values and standard errors.
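Local polynomial regression at a point amounts to a kernel-weighted least-squares fit centred on that point. The helper below (local_poly_fit, a hypothetical name) is a sketch assuming a triangular kernel and p >= 1; it returns only the fitted value, not the standard errors the package reports.

```r
# Local polynomial fit at a single evaluation point (sketch).
local_poly_fit <- function(x, y, x0, h, p = 1) {
  u <- (x - x0) / h
  w <- pmax(1 - abs(u), 0)            # triangular kernel weights
  keep <- w > 0                       # only points inside the bandwidth
  d <- data.frame(y = y[keep], z = x[keep] - x0, w = w[keep])
  # Centring on x0 makes the intercept the fitted value at x0.
  fit <- stats::lm(
    y ~ stats::poly(z, degree = p, raw = TRUE),
    data = d, weights = w
  )
  unname(stats::coef(fit)[1])
}

# Noise-free linear data: the local-linear fit recovers 1 + 2 * x0.
x <- seq(-1, 1, length.out = 101)
y <- 1 + 2 * x
eval_points <- c(-0.5, 0, 0.5)
fitted_vals <- vapply(
  eval_points,
  function(x0) local_poly_fit(x, y, x0, h = 0.3),
  numeric(1)
)
```

In an RDD the same fit is run separately on each side of the cutoff, and the treatment effect is the difference between the two intercepts.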
Thin extender over rdlocrand::rdrandinf for finite-sample
randomisation-inference around the cutoff window
(Cattaneo, Frandsen & Titiunik, 2015; Cattaneo, Titiunik & Vazquez-Bare,
2016).
morie_rdd_local_randinf(Y, R, wl, wr, ...)
Y |
Numeric outcome vector. |
R |
Numeric running / forcing variable. |
wl |
Numeric scalar; left edge of the randomisation window. |
wr |
Numeric scalar; right edge of the randomisation window. |
... |
Further arguments forwarded to
|
A list with $method = "rdlocrand::rdrandinf" and
$raw (an rdrandinf object with the
randomisation-inference p-values and the observed statistic).
## Not run:
if (requireNamespace("rdlocrand", quietly = TRUE)) {
  set.seed(1)
  R <- runif(200, -1, 1)
  Y <- 0.5 * R + (R >= 0) * 0.3 + rnorm(200, sd = 0.5)
  morie_rdd_local_randinf(Y, R, wl = -0.1, wr = 0.1)
}
## End(Not run)
Local-randomisation RDD via permutation in a fixed window
morie_rdd_local_randomisation(
  data,
  outcome,
  running,
  cutoff = 0,
  window = 1,
  n_permutations = 1000,
  seed = 42,
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
window |
Numeric; half-width of the local randomisation window. |
n_permutations |
Integer; permutation count for the randomisation-based inference. |
seed |
Integer; RNG seed for permutation / bootstrap routines. |
alpha |
Significance level (default |
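The idea behind local randomisation can be sketched as a permutation test on the observations inside the window. The simulated data, the difference-in-means statistic and the two-sided p-value below are illustrative assumptions, not a description of the package's internals.

```r
# Permutation test inside a fixed window around the cutoff (sketch).
set.seed(42)
running <- stats::runif(400, -1, 1)
outcome <- 0.5 * (running >= 0) + stats::rnorm(400, sd = 0.5)
cutoff  <- 0
window  <- 0.2

# Keep only units within the half-width of the cutoff.
in_win  <- abs(running - cutoff) <= window
y       <- outcome[in_win]
treated <- running[in_win] >= cutoff

# Observed difference in means, treated minus control.
obs <- mean(y[treated]) - mean(y[!treated])

# Re-assign treatment labels at random, keeping group sizes fixed.
perm <- replicate(1000, {
  s <- sample(treated)
  mean(y[s]) - mean(y[!s])
})

# Two-sided permutation p-value.
p_value <- mean(abs(perm) >= abs(obs))
```

The approach is credible only if treatment is as good as randomly assigned within the window, so narrow windows are preferred.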
McCrary (2008) density manipulation test
morie_rdd_mccrary(x, cutoff = 0, n_bins = 50, bandwidth = NULL)
x |
Numeric vector of running-variable values (used by
bandwidth selectors + density tests that don't take a
|
cutoff |
Numeric scalar; the threshold on |
n_bins |
Integer; bin count for histogram-based density tests and binned-plot reductions. |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
Placebo cutoff falsification test
morie_rdd_placebo_cutoff(
  data,
  outcome,
  running,
  true_cutoff,
  placebo_cutoffs,
  bandwidth = NULL,
  p = 1,
  kernel = "triangular",
  alpha = 0.05
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
true_cutoff |
Numeric; the actual policy cutoff (placebo
robustness re-runs the analysis at |
placebo_cutoffs |
Numeric vector of false cutoffs to test. |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
p |
Integer; local-polynomial order (default 1 for local-linear). 2 picks up quadratic curvature for bias correction. |
kernel |
One of |
alpha |
Significance level (default |
Binned scatter + global-polynomial data for an RD plot
morie_rdd_plot_data(
  data,
  outcome,
  running,
  cutoff = 0,
  n_bins = 20,
  p_global = 4,
  p_local = 1,
  bandwidth = NULL,
  kernel = "triangular"
)
data |
A |
outcome |
Character; column name of the response variable in
|
running |
Character; column name of the running (forcing)
variable in |
cutoff |
Numeric scalar; the threshold on |
n_bins |
Integer; bin count for histogram-based density tests and binned-plot reductions. |
p_global |
Integer; polynomial order for the global
component of |
p_local |
Integer; polynomial order for the local component
of |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
kernel |
One of |
RDD power calculation
morie_rdd_power(n, tau, sigma, cutoff_density = 1, bandwidth = NULL, kernel = "triangular", alpha = 0.05)
n |
Integer; sample-size argument to |
tau |
Numeric; the treatment-effect size used by power / sample-size calculators. |
sigma |
Numeric; outcome standard deviation. |
cutoff_density |
Numeric; running-variable density at the cutoff. |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
kernel |
One of |
alpha |
Significance level (default |
Thin extender over rdpower::rdpower for sharp / fuzzy RDD
power analysis (Cattaneo, Titiunik & Vazquez-Bare, 2019).
morie_rdd_power_calc(data, cutoff = 0, ...)
data |
Numeric matrix or data frame with two columns: the
outcome |
cutoff |
Numeric scalar; the cutoff for the running
variable (default |
... |
Further arguments forwarded to
|
Named morie_rdd_power_calc rather than morie_rdd_power
because the latter is already taken in R/rdd.R by a closed-form
analytical power formula that takes scalar (n, tau, sigma) rather
than a data frame; this wrapper preserves that function and offers the
full rdpower simulation-based surface alongside it.
A list with $method = "rdpower::rdpower" and
$raw (an rdpower object with the simulated
power and effective sample sizes).
## Not run:
if (requireNamespace("rdpower", quietly = TRUE)) {
  set.seed(1)
  R <- runif(500, -1, 1)
  Y <- 0.4 * R + (R >= 0) * 0.2 + rnorm(500, sd = 0.5)
  morie_rdd_power_calc(cbind(Y, R), cutoff = 0, tau = 0.2)
}
## End(Not run)
RDD sample-size determination
morie_rdd_sample_size(tau, sigma, cutoff_density = 1, bandwidth = 1, power = 0.8, kernel = "triangular", alpha = 0.05)
tau |
Numeric; the treatment-effect size used by power / sample-size calculators. |
sigma |
Numeric; outcome standard deviation. |
cutoff_density |
Numeric; running-variable density at the cutoff. |
bandwidth |
Numeric; the local-polynomial bandwidth on each
side of the cutoff. |
power |
Numeric in |
kernel |
One of |
alpha |
Significance level (default |
Sharp RDD treatment effect at the cutoff
morie_rdd_sharp(data, outcome, running, cutoff = 0, bandwidth = NULL, p = 1, kernel = "triangular", cluster = NULL, covariates = NULL, alpha = 0.05)
data |
Data frame. |
outcome |
Outcome column. |
running |
Running variable column. |
cutoff |
Threshold (default 0). |
bandwidth |
Optional bandwidth; if |
p |
Local-polynomial order. |
kernel |
Kernel name. |
cluster |
Optional cluster column. |
covariates |
Optional character vector of covariate names. |
alpha |
Significance level. |
Read outputs manifest from a project
morie_read_outputs_manifest(project_root = NULL, manifest_path = NULL, validate = TRUE)
project_root |
Project root path. |
manifest_path |
Optional explicit manifest path. |
validate |
If |
Manifest data frame.
# Craft a minimal manifest in tempdir and read it back:
tdir <- tempfile("morie-doc-")
dir.create(tdir)
man <- file.path(tdir, "outputs_manifest.csv")
write.csv(
  data.frame(
    output = "results.csv",
    public_path = file.path(tdir, "results.csv"),
    size_kb = 0.01,
    modified = format(Sys.Date())
  ),
  man,
  row.names = FALSE
)
writeLines("x,y\n1,2", file.path(tdir, "results.csv"))
morie_read_outputs_manifest(manifest_path = man)
Looks up the (level_a, level_b) combination and returns the right default test (Stevens-1946 hierarchy).
morie_recommended_pair_test(tax_a, tax_b)
tax_a, tax_b
|
Two |
Character scalar — recommended test name.
Recommended summary statistic for a single variable.
morie_recommended_summary(tax)
tax |
A |
Character scalar — plain-language hint at which summary suits.
Fit a constant-mean, switching-variance K-regime Markov-switching model by EM (Hamilton filter).
morie_regime_switching(x, k_regimes = 2)
x |
Numeric univariate series. |
k_regimes |
Number of latent regimes. Default 2. |
Named list with mu, sigma, transition,
smoothed_probabilities, loglik, n, k_regimes, method.
morie_regime_switching(x = rnorm(50))
Wraps glmnet::glmnet. Returns the coefficient path across
the supplied alphas (lambda grid in glmnet terminology).
morie_regularization_path(x, y, penalty = c("ridge", "lasso", "elasticnet"), alphas = NULL, l1_ratio = 0.5)
x |
Numeric matrix of predictors. |
y |
Numeric response. |
penalty |
One of "ridge", "lasso", "elasticnet". |
alphas |
Lambda grid. Defaults to a logspace. |
l1_ratio |
glmnet alpha; only used when penalty = "elasticnet". |
Named list: estimate, coef_path, alphas, penalty, l1_ratio, n, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Risk difference (ARD) with Newcombe CI
morie_risk_difference_ci(table_2x2, alpha = 0.05)
table_2x2 |
A 2x2 matrix: rows are exposure, columns are outcome. |
alpha |
Significance level. |
Named list: rd, ci_lower, ci_upper.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
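For orientation, a Newcombe-style interval for a risk difference can be sketched in base R by combining the Wilson score intervals of the two proportions (Newcombe's hybrid score method). The helper names `wilson_ci` and `newcombe_rd` are hypothetical, and the sketch assumes the event of interest is in column 1 of the table; the package's own layout and variant may differ.

```r
# Wilson score interval for a single proportion x / n.
wilson_ci <- function(x, n, z) {
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half, upper = centre + half)
}

# Hybrid score interval for p1 - p2 (rows = exposure, column 1 = event).
newcombe_rd <- function(tab, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  n1 <- sum(tab[1, ]); n2 <- sum(tab[2, ])
  p1 <- tab[1, 1] / n1; p2 <- tab[2, 1] / n2
  w1 <- wilson_ci(tab[1, 1], n1, z)
  w2 <- wilson_ci(tab[2, 1], n2, z)
  rd <- p1 - p2
  list(
    rd = rd,
    ci_lower = unname(rd - sqrt((p1 - w1["lower"])^2 + (w2["upper"] - p2)^2)),
    ci_upper = unname(rd + sqrt((w1["upper"] - p1)^2 + (p2 - w2["lower"])^2))
  )
}

tab <- matrix(c(30, 70, 15, 85), nrow = 2, byrow = TRUE)
out <- newcombe_rd(tab)
out
```

Unlike the simple Wald interval, this construction stays within [-1, 1] and behaves better when a proportion is close to 0 or 1.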
Risk ratio (relative risk) with log-normal CI
morie_risk_ratio_ci(table_2x2, alpha = 0.05)
table_2x2 |
A 2x2 matrix: rows are exposure, columns are outcome (disease = col 1). |
alpha |
Significance level. |
Named list: rr, ci_lower, ci_upper.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
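The log-normal interval for a risk ratio is built on the log scale and then exponentiated. A minimal base-R sketch follows, using the layout described above (rows are exposure, column 1 is disease); the helper name `risk_ratio_sketch` is hypothetical and the package may handle zero cells differently.

```r
risk_ratio_sketch <- function(tab, alpha = 0.05) {
  a  <- tab[1, 1]; n1 <- sum(tab[1, ])   # exposed: events, total
  c0 <- tab[2, 1]; n2 <- sum(tab[2, ])   # unexposed: events, total
  rr <- (a / n1) / (c0 / n2)
  se_log <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n2)  # SE of log(RR)
  z <- qnorm(1 - alpha / 2)
  list(rr = rr,
       ci_lower = exp(log(rr) - z * se_log),
       ci_upper = exp(log(rr) + z * se_log))
}

out <- risk_ratio_sketch(matrix(c(30, 70, 15, 85), nrow = 2, byrow = TRUE))
out
```

The interval is symmetric around the estimate on the log scale, not on the ratio scale.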
RKHS regression with Gaussian kernel
morie_rkhs_full(x, y, markers, h = NULL, lam = 1)
x |
Fixed-effect design. |
y |
Numeric response. |
markers |
Genotype matrix (n x m). |
h |
Kernel bandwidth (default median ||m_i - m_j||^2). |
lam |
Ridge regulariser on alpha (default 1). |
list(estimate, alpha, beta, K, f_hat, se, h, n, method).
Gianola & van Kaam (2008). Montesinos Lopez Ch 5.
morie_rkhs_full(x = rnorm(50), y = rnorm(50), markers = matrix(sample(0:2, 200, TRUE), 50, 4))
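The core of RKHS regression with a Gaussian kernel is a kernel ridge solve. The sketch below shows that step in base R, assuming the kernel is parameterised as exp(-d^2 / h) with the median squared distance as default bandwidth, and ignoring the fixed-effect design `x`. Both choices are simplifying assumptions, not a description of the package internals.

```r
set.seed(1)
n <- 40; m <- 10
markers <- matrix(sample(0:2, n * m, replace = TRUE), n, m)
y <- as.numeric(markers %*% rnorm(m, sd = 0.3) + rnorm(n, sd = 0.5))

D2 <- as.matrix(dist(markers))^2           # squared Euclidean distances
h <- median(D2[upper.tri(D2)])             # median-heuristic bandwidth
K <- exp(-D2 / h)                          # Gaussian kernel matrix
lam <- 1                                   # ridge penalty on alpha
mu <- mean(y)
alpha <- solve(K + lam * diag(n), y - mu)  # kernel ridge coefficients
f_hat <- mu + as.numeric(K %*% alpha)      # fitted values
```

Larger `lam` shrinks the fit towards the mean; smaller `h` makes the kernel more local.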
Vanilla RNN genomic predictor (BPTT, base R)
morie_rnn_genomic(x, y, markers, hidden = 8, n_epochs = 150, lr = 0.01, l2 = 0.001, seed = 0, deterministic_seed = NULL)
x |
Optional fixed-effect design. |
y |
Numeric response. |
markers |
(n x L) marker sequence. |
hidden, n_epochs, lr, l2, seed
|
Hyperparameters. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, y_hat, W_h, W_x, b_h, w_o, b_o, se, n, method).
Montesinos Lopez Ch 14.
morie_rnn_genomic(x = rnorm(50), y = rnorm(50), markers = matrix(sample(0:2, 200, TRUE), 50, 4))
Wraps pROC::roc.
morie_roc_auc_score(y_true, y_score)
y_true |
Binary labels. |
y_score |
Predicted scores for the positive class. |
Named list: estimate, auc, fpr, tpr, thresholds, n, n_positive, n_negative, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
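As a dependency-free cross-check of the `auc` value, the area under the ROC curve equals the Mann-Whitney statistic rescaled to [0, 1]. A base-R sketch follows; the helper name `auc_rank` is hypothetical and labels are assumed to be coded 0/1.

```r
auc_rank <- function(y_true, y_score) {
  r <- rank(y_score)                 # mid-ranks handle tied scores
  n_pos <- sum(y_true == 1)
  n_neg <- sum(y_true == 0)
  (sum(r[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

y_true  <- c(0, 0, 1, 1)
y_score <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(y_true, y_score)   # 0.75
```

Three of the four positive-negative pairs are ranked correctly, which gives 0.75.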
rsample::bootstraps
Thin pass-through that builds a tidymodels-style
rset of bootstrap resamples via
rsample::bootstraps and returns it untouched.
morie_rsample_bootstraps(data, times = 25L, ...)
data |
A data.frame. |
times |
Number of bootstrap resamples (default 25). |
... |
Forwarded to |
An rset rsample object.
rsample::bootstraps, rsample::vfold_cv.
Mirrors the core outputs of the old 07_ebac_ipw.R workflow.
morie_run_ebac_selection_ipw_analysis(data, output_dir = NULL, treatment = "cannabis_any_use", covariates = c("age_group", "gender", "province_region", "mental_health", "physical_health"))
data |
Analysis data frame. |
output_dir |
Optional directory for CSV outputs. |
treatment |
Treatment column name. |
covariates |
Covariate names used in the observation model. |
Named list of output tables and the observed-domain analysis frame.
# Run on a synthetic CPADS-shaped frame (the CKAN-fetched PUMF works
# identically -- see morie_load_cpads_data() for the real frame):
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  cpads <- data.frame(
    weight = runif(n, 0.5, 2),
    alcohol_past12m = rbinom(n, 1, 0.8),
    heavy_drinking_30d = rbinom(n, 1, 0.3),
    ebac_tot = abs(rnorm(n, 0.05, 0.03)),
    ebac_legal = rbinom(n, 1, 0.7),
    cannabis_any_use = rbinom(n, 1, 0.3),
    age_group = sample(1:6, n, TRUE),
    gender = sample(1:2, n, TRUE),
    province_region = sample(1:5, n, TRUE),
    mental_health = sample(1:5, n, TRUE),
    physical_health = sample(1:5, n, TRUE)
  )
  morie_run_ebac_selection_ipw_analysis(cpads)
}
Run one implemented MORIE module against CPADS data
morie_run_morie_module(module_name, cpads_csv = .cpads_default_csv(), output_dir = NULL)
module_name |
Module name. |
cpads_csv |
Path to the CPADS CSV. |
output_dir |
Optional directory for CSV outputs. |
Named list of data-frame outputs.
# Dispatch one MORIE module against the canonical CPADS CSV. The CSV
# ships with a morie project tree, or is fetched via the CKAN endpoint
# (morie_load_dataset("ocp21")). Wrapped in tryCatch so the example
# documents usage even when the CSV is not checked out locally.
tryCatch(
  morie_run_morie_module("descriptive-statistics"),
  error = function(e) message(conditionMessage(e))
)
Run multiple implemented MORIE modules
morie_run_morie_modules(modules = morie_list_morie_modules()$name, cpads_csv = .cpads_default_csv(), output_dir = NULL)
modules |
Character vector of module names. |
cpads_csv |
Path to the CPADS CSV. |
output_dir |
Optional directory for CSV outputs. |
Named list of module outputs.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Run multiple workflow steps
morie_run_pipeline(steps = NULL, project_root = NULL, script_map = morie_default_workflow_map(), stop_on_error = TRUE, verbose = TRUE)
steps |
Ordered vector of workflow step names. |
project_root |
Project root directory. |
script_map |
Named character vector mapping steps to script paths. |
stop_on_error |
If |
verbose |
If |
Data frame of step statuses.
# Build a one-step pipeline in tempdir and dispatch it. The
# real package's morie_default_workflow_map() points at scripts that
# live in a morie project tree.
tdir <- tempfile("morie-doc-")
dir.create(tdir)
step <- file.path(tdir, "step.R")
writeLines('cat("hello from pipeline\\n")', step)
morie_run_pipeline(
  steps = "demo",
  project_root = tdir,
  script_map = c(demo = step),
  verbose = FALSE
)
Mirrors the core outputs of the old 07_propensity.R workflow.
morie_run_propensity_ipw_analysis(data, output_dir = NULL, trim = c(0.01, 0.99), treatment = "cannabis_any_use", outcome = "heavy_drinking_30d", covariates = c("age_group", "gender", "province_region", "mental_health", "physical_health"))
data |
Analysis data frame. |
output_dir |
Optional directory for CSV outputs. |
trim |
Quantile pair used to trim extreme IPW values. |
treatment |
Binary treatment column. |
outcome |
Binary outcome column. |
covariates |
Covariate names for the propensity model. |
Named list of output tables and the analysis data.
# Run on a synthetic CPADS-shaped frame (the CKAN-fetched PUMF works
# identically -- see morie_load_cpads_data() for the real frame):
set.seed(1)
n <- 200
cpads <- data.frame(
  weight = runif(n, 0.5, 2),
  alcohol_past12m = rbinom(n, 1, 0.8),
  heavy_drinking_30d = rbinom(n, 1, 0.3),
  ebac_tot = abs(rnorm(n, 0.05, 0.03)),
  ebac_legal = rbinom(n, 1, 0.7),
  cannabis_any_use = rbinom(n, 1, 0.3),
  age_group = sample(1:6, n, TRUE),
  gender = sample(1:2, n, TRUE),
  province_region = sample(1:5, n, TRUE),
  mental_health = sample(1:5, n, TRUE),
  physical_health = sample(1:5, n, TRUE)
)
morie_run_propensity_ipw_analysis(cpads)
Mirrors the Python morie.run_treatment_effects_analysis(). Convenience
wrapper around morie_estimate_ate() that also produces a 95% confidence
interval (delta-method approximation).
R port of investigation.run_treatment_effects_analysis. Returns a
list with:
treatment_effects_summary – data.frame of ATE / ATT / ATC
point estimates (SE / CI columns present but NA, matching Python).
cate_subgroup_estimates – data.frame of within-subgroup
Hajek IPW CATEs with sandwich-style SEs and Wald 95% CIs.
analysis_frame – the trimmed data.frame with attached
propensity score and IPW weight columns (ps, w_ate, w_att,
w_atc).
Convenience scalars ate / att / atc / se /
ci_lower / ci_upper / n / method preserved from the
previous R surface for backward compatibility.
morie_run_treatment_effects_analysis(data, treatment, outcome, covariates)
data |
data.frame containing treatment, outcome, and covariates. |
treatment |
Treatment column name. |
outcome |
Outcome column name. |
covariates |
Character vector of covariate column names. |
A list with ate, se, ci_lower, ci_upper, n, method.
Named list as described above.
set.seed(1)
df <- data.frame(
  y = rnorm(200),
  t = rbinom(200, 1, 0.5),
  x1 = rnorm(200),
  x2 = rnorm(200)
)
morie_run_treatment_effects_analysis(df,
  treatment = "t", outcome = "y",
  covariates = c("x1", "x2")
)
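The weight columns named above (`ps`, `w_ate`, `w_att`, `w_atc`) follow the usual inverse-probability-weighting definitions. The base-R sketch below shows how Hajek (normalised) IPW estimates of the ATE, ATT, and ATC can be formed from them. It assumes a logistic propensity model and omits trimming and standard errors; the helper name `hajek` and the simulated data are illustrative only.

```r
set.seed(1)
n <- 2000
x1 <- rnorm(n); x2 <- rnorm(n)
trt <- rbinom(n, 1, plogis(0.5 * x1 - 0.3 * x2))
y <- 1 * trt + x1 + 0.5 * x2 + rnorm(n)        # true effect = 1 for everyone

ps <- fitted(glm(trt ~ x1 + x2, family = binomial()))  # propensity score
w_ate <- trt / ps + (1 - trt) / (1 - ps)
w_att <- trt + (1 - trt) * ps / (1 - ps)
w_atc <- trt * (1 - ps) / ps + (1 - trt)

# Hajek estimator: difference of weight-normalised group means.
hajek <- function(w) {
  weighted.mean(y[trt == 1], w[trt == 1]) -
    weighted.mean(y[trt == 0], w[trt == 0])
}
est <- c(ate = hajek(w_ate), att = hajek(w_att), atc = hajek(w_atc))
est
```

Because the simulated effect is constant, all three estimands coincide here; with heterogeneous effects they generally differ.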
Mirrors the Python morie.run_weighted_logistic_analysis(). Fits a
binary-outcome model using survey weights via survey::svyglm() if the
suggested survey package is available, otherwise falls back to base
glm() with case weights.
morie_run_weighted_logistic_analysis(data, outcome, predictors, weights_col = NULL)
data |
A |
outcome |
Column name of the binary outcome. |
predictors |
Character vector of predictor column names. |
weights_col |
Optional column name of analytical weights. |
A list with components coefficients (named numeric vector),
std_errors, p_values, n, method ("svyglm" or "glm-weighted").
set.seed(1)
df <- data.frame(
  y = rbinom(200, 1, 0.4),
  x1 = rnorm(200),
  x2 = rnorm(200),
  w = runif(200, 0.5, 1.5)
)
morie_run_weighted_logistic_analysis(df,
  outcome = "y",
  predictors = c("x1", "x2"), weights_col = "w"
)
Run one project workflow step
morie_run_workflow_step(step, project_root = NULL, script_map = morie_default_workflow_map(), rscript_bin = file.path(R.home("bin"), "Rscript"), verbose = TRUE)
step |
Step name present in |
project_root |
Project root directory. |
script_map |
Named character vector mapping steps to script paths. |
rscript_bin |
Optional path to |
verbose |
If |
Named list with step metadata and exit status.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Returns a small CSV that ships with the package, suitable for
running examples and tests of the mrm_*() callables without any
network or external data dependency.
morie_sample(name = c("otis_b01", "otis_b09", "otis_c11", "tps_assault"))
name |
One of |
A data.frame.
b01 <- morie_sample("otis_b01")
head(b01)
Uses the formula from Hsieh et al. (1998):
morie_sample_size_logistic(p0, or, alpha = 0.05, power = 0.8, two_sided = TRUE)
p0 |
Prevalence under control. |
or |
Target odds ratio. |
alpha |
Significance level. |
power |
Desired power. |
two_sided |
Logical. |
Integer sample size.
Hsieh FY, Bloch DA, Larsen MD (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17(14):1623-1634.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
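The formula itself is not reproduced in the text above. For orientation only, one formula from Hsieh et al. (1998), for a single standardised continuous covariate, is n = (z_{1-alpha/2} + z_{power})^2 / (p0 (1 - p0) log(OR)^2). Whether the package uses exactly this variant is an assumption; the sketch below is not taken from the package source and the helper name `hsieh_n` is hypothetical.

```r
# Assumed variant: Hsieh et al. (1998), one continuous covariate,
# `or` interpreted as the odds ratio per one-SD increase.
hsieh_n <- function(p0, or, alpha = 0.05, power = 0.8, two_sided = TRUE) {
  z_a <- qnorm(1 - alpha / (if (two_sided) 2 else 1))
  z_b <- qnorm(power)
  ceiling((z_a + z_b)^2 / (p0 * (1 - p0) * log(or)^2))
}

hsieh_n(p0 = 0.5, or = 1.5)
```

The required sample size falls as the odds ratio moves away from 1 and rises as `p0` moves away from 0.5.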
R port of morie.semipar_bridge. Provides the kernel-based
building blocks used by morie's nuisance estimation pipelines
(TMLE, AIPW, DML): kernel evaluation, Nadaraya-Watson regression,
local linear regression, kernel density estimation, and bandwidth
selection.
The Python module loads a C shared library (semipar_kernels.dylib
/ .so) and falls back to NumPy. The R port is pure R: it
implements the same algorithms in vectorised form and additionally
wraps mgcv::gam for a high-quality penalised-spline smoother
as an alternative to manual bandwidth selection.
kernel_eval: evaluate a kernel function.
nw_regression: Nadaraya-Watson kernel regression.
local_linear: local linear kernel regression.
kde: kernel density estimation.
silverman_bandwidth: rule-of-thumb bandwidth.
loocv_bandwidth: leave-one-out CV bandwidth
for NW regression.
kernel_cond_moments: kernel-weighted mean
and variance.
gam_smoother: mgcv::gam thin-plate
smoother fit + predict.
SemiparKernels: object-style wrapper.
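Two of the building blocks listed above, the rule-of-thumb bandwidth and Nadaraya-Watson regression, can be sketched in a few lines of base R. The names `silverman_sketch` and `nw_sketch` are hypothetical stand-ins, a Gaussian kernel is assumed, and the package's own signatures may differ.

```r
# Silverman's rule of thumb (the same rule as stats::bw.nrd0).
silverman_sketch <- function(x) {
  0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1 / 5)
}

# Nadaraya-Watson estimator: kernel-weighted local average of y.
nw_sketch <- function(x, y, x_eval, h) {
  sapply(x_eval, function(x0) {
    w <- dnorm((x - x0) / h)   # Gaussian kernel weights
    sum(w * y) / sum(w)
  })
}

set.seed(1)
x <- runif(300, 0, 2 * pi)
y <- sin(x) + rnorm(300, sd = 0.2)
h <- silverman_sketch(x)
fit <- nw_sketch(x, y, x_eval = c(pi / 2, 3 * pi / 2), h = h)
fit
```

Silverman's rule targets density estimation, so for regression it tends to oversmooth; a leave-one-out CV bandwidth is usually the better choice when the curve has sharp features.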
Thin interface to the EValue dispatch family
(evalues.OLS, evalues.RR, evalues.OR,
evalues.HR, evalues.MD), exposed under the
morie_sensitivity_* namespace so MRM / paper callers can
reach the full EValue surface without loading EValue
directly. Pairs with e_value_rr /
e_value_or / e_value_hr /
e_value_d, which are the typed convenience
wrappers around the same backend.
morie_sensitivity_evalue(estimate, se = NULL, sd = NULL, type = c("OLS", "RR", "OR", "HR", "MD"), rare = TRUE, true = NULL, ci_lower = NULL, ci_upper = NULL, ...)
estimate |
Observed effect on the requested scale. |
se |
Standard error of |
sd |
Outcome standard deviation (required by |
type |
One of |
rare |
Logical; only relevant for |
true |
Reference value on the appropriate scale (default 0 for OLS / MD, 1 for ratio scales). |
ci_lower, ci_upper
|
Optional 95% CI on the same scale as
|
... |
Additional arguments forwarded to the underlying
|
A list of class morie_sensitivity_evalue with
estimate, e_value_point, e_value_ci,
type, method, and raw (the full EValue
matrix).
VanderWeele, T. J., & Ding, P. (2017). Sensitivity analysis in observational research: introducing the E-value. Annals of Internal Medicine, 167(4), 268–274.
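For the risk-ratio scale, the E-value of VanderWeele and Ding (2017) has a closed form: E = RR + sqrt(RR (RR - 1)) for RR >= 1, applied to 1/RR otherwise. The base-R sketch below covers only that case. The helper name `evalue_rr_sketch` is hypothetical, and the conversions for other effect scales that the EValue package performs are not reproduced.

```r
evalue_rr_sketch <- function(rr, ci_lower = NULL, ci_upper = NULL) {
  e <- function(r) {
    if (r < 1) r <- 1 / r          # work on the side above the null
    r + sqrt(r * (r - 1))
  }
  out <- list(e_value_point = e(rr), e_value_ci = NA_real_)
  if (!is.null(ci_lower) && !is.null(ci_upper)) {
    crosses_null <- ci_lower <= 1 && ci_upper >= 1
    limit <- if (rr >= 1) ci_lower else ci_upper   # limit closest to the null
    out$e_value_ci <- if (crosses_null) 1 else e(limit)
  }
  out
}

ev <- evalue_rr_sketch(3.9, ci_lower = 1.8, ci_upper = 8.7)
ev
```

With RR = 3.9 and a lower confidence limit of 1.8, this gives roughly 7.26 for the point estimate and 3.0 for the confidence limit.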
Thin interface to konfound::pkonfound: how many cases would
need to be replaced with average-treatment-effect cases (or how
large would an omitted-variable correlation have to be) to invalidate
the inference? Pairs with
morie_sensitivity_omitted_var_bias (which uses the
Cinelli-Hazlett partial-R-squared framing instead of the
Frank et al. percent-bias-to-invalidate framing).
morie_sensitivity_konfound(estimate, se, n, n_covariates = 0L, alpha = 0.05, ...)
estimate |
Treatment-coefficient estimate. |
se |
Standard error of |
n |
Number of observations. |
n_covariates |
Number of covariates in the model (excluding the intercept and the treatment). Default 0. |
alpha |
Significance level. Default 0.05. |
... |
Additional arguments forwarded to
|
A list of class morie_sensitivity_konfound with
the percent-bias-to-invalidate, the impact-threshold-of-a-
confounding-variable (ITCV), and the raw konfound object.
Frank, K. A., Maroulis, S. J., Duong, M. Q., & Kelcey, B. M. (2013). What would it take to change an inference? Educational Evaluation and Policy Analysis, 35(4), 437–460.
Thin interface to sensemakr::sensemakr: returns the full
Cinelli-Hazlett robustness-value object including benchmark
bounds, adjusted t-statistics, and the data needed to draw
contour plots. Pairs with omitted_variable_bias,
which is the closed-form version that takes estimate +
se + degrees of freedom directly (useful when you don't
have an lm object handy).
morie_sensitivity_omitted_var_bias(model, treatment, benchmark_covariates = NULL, kd = c(1, 2, 3), ky = NULL, q = 1, alpha = 0.05, ...)
model |
A fitted regression model ( |
treatment |
Name of the treatment variable (coefficient). |
benchmark_covariates |
Optional character vector of covariate names whose strengths bound the unmeasured-confounder strength. |
kd |
Multipliers on the benchmark covariate strength.
Default |
ky |
Multipliers on the benchmark covariate's outcome
strength. Default equal to |
q |
Fraction of the estimate to be explained away. Default 1. |
alpha |
Significance level. Default 0.05. |
... |
Additional arguments forwarded to
|
A list of class morie_sensitivity_omitted_var_bias
with the robustness values, partial R-squared of treatment,
benchmark bounds, and the full sensemakr object as raw.
Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: extending omitted variable bias. Journal of the Royal Statistical Society B, 82(1), 39–67.
For a range of hidden-confounding levels Γ (gamma_range),
tests whether the treatment effect remains significant. A large
Γ at which the result remains significant
indicates robustness.
morie_sensitivity_rosenbaum(treated, control, gamma_range = seq(1, 3, by = 0.2))
treated |
Numeric vector of outcomes for treated units. |
control |
Numeric vector of outcomes for control units
(may differ in length from |
gamma_range |
Numeric vector of |
Delegates to rbounds::psens() when rbounds is
installed and pairs-of-equal-length data are supplied;
alternatively delegates to sensitivitymv::senmv() when
sensitivitymv is installed. Otherwise falls back to inline
sign-score bounds (Rosenbaum 2002, Section 4.3).
Data frame with columns: gamma, p_lower, p_upper.
Rosenbaum PR (2002). Observational Studies (2nd ed.). Springer.
morie_sensitivity_rosenbaum(treated = rnorm(30, 0.5), control = rnorm(30))
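The sign-score bounds behind the inline fallback can be illustrated for matched pairs. Under hidden bias of at most Γ, the probability that a pair difference is positive lies between 1/(1 + Γ) and Γ/(1 + Γ), which yields binomial bounds on the one-sided p-value. The base-R sketch below assumes equal-length, pair-aligned vectors; the helper name `sign_bounds_sketch` is hypothetical and the package's fallback may differ in detail.

```r
sign_bounds_sketch <- function(treated, control,
                               gamma_range = seq(1, 3, by = 0.5)) {
  d <- treated - control
  d <- d[d != 0]                   # ties carry no sign information
  n <- length(d)
  n_pos <- sum(d > 0)              # sign statistic
  data.frame(
    gamma = gamma_range,
    # P(S >= n_pos) under the most and least favourable sign probabilities
    p_lower = pbinom(n_pos - 1, n, 1 / (1 + gamma_range), lower.tail = FALSE),
    p_upper = pbinom(n_pos - 1, n, gamma_range / (1 + gamma_range),
                     lower.tail = FALSE)
  )
}

set.seed(1)
treated <- rnorm(50, mean = 1)
control <- rnorm(50)
b <- sign_bounds_sketch(treated, control)
b
```

At Γ = 1 the two bounds coincide with the ordinary sign-test p-value; the smallest Γ at which `p_upper` exceeds the significance level summarises how much hidden bias the finding can tolerate.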
Thin interface to tipr::tip: returns the minimum value of
the standardised mean difference (smd) or partial R-squared
(R2) of an unmeasured confounder that would tip the lower
(or upper) bound of the confidence interval back to the null.
Pairs with tipping_point_analysis, which targets
missing-data sensitivity rather than unmeasured-confounder
sensitivity.
morie_sensitivity_tipping_point(estimate, smd = NULL, r2 = NULL, ...)
estimate |
Observed treatment effect on the coefficient scale. |
smd |
Hypothesised standardised mean difference of the unmeasured confounder between treatment groups. |
r2 |
Hypothesised partial R-squared of the unmeasured
confounder with the outcome. Forwarded as the |
... |
Additional arguments forwarded to |
A list of class morie_sensitivity_tipping_point
with the tipped point estimate and the raw tipr object.
D'Agostino McGowan, L. (2022). tipr: An R package for sensitivity analyses for unmeasured confounders. Journal of Open Source Software, 7(77), 4495.
Polynomial-fit smoothing filter. Preserves higher moments (peak heights, shape) better than a moving average and is the standard tool for chromatography, spectroscopy, and biosensor smoothing.
morie_sgolay_smooth(x, window_length = 11L, polyorder = 3L)
x |
Numeric vector. |
window_length |
Window length (odd integer, default 11). |
polyorder |
Polynomial order (default 3). |
List with filtered (numeric vector) and name.
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  t <- seq(0, 1, length.out = 200)
  x <- sin(2 * pi * 3 * t) + rnorm(200, sd = 0.2)
  y <- morie_sgolay_smooth(x, window_length = 11, polyorder = 3)
  length(y$filtered)
}
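To make the polynomial-fit idea concrete, the base-R sketch below derives the Savitzky-Golay weights for the window centre by least squares and applies them as a moving weighted sum. It leaves the first and last half-window unsmoothed, which is a simplification; the helper name `sgolay_sketch` is hypothetical and this is not the package implementation, which the example above suggests relies on the signal package.

```r
sgolay_sketch <- function(x, window_length = 11L, polyorder = 3L) {
  stopifnot(window_length %% 2 == 1, polyorder < window_length)
  m <- (window_length - 1) / 2
  X <- outer(-m:m, 0:polyorder, `^`)             # polynomial design on the window
  coefs <- (solve(crossprod(X)) %*% t(X))[1, ]   # weights for the fitted centre value
  out <- x                                       # edges are left as-is in this sketch
  for (i in (m + 1):(length(x) - m)) {
    out[i] <- sum(coefs * x[(i - m):(i + m)])
  }
  out
}

# A cubic is reproduced exactly by a polyorder-3 filter.
tt <- 1:50
cubic <- 2 + 0.5 * tt - 0.01 * tt^2 + 0.001 * tt^3
all.equal(sgolay_sketch(cubic), cubic)
```

Exact reproduction of polynomials up to `polyorder` is what lets the filter keep peak heights and shapes that a moving average would flatten.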
Shapiro-Wilk normality test
morie_shapiro_wilk_test(x, alpha = 0.05)
x |
Numeric vector. |
alpha |
Significance level for the |
Named list: W, p_value, is_normal.
morie_shapiro_wilk_test(x = rnorm(50))
Builds the discrete rejection region under H0: p = 0.5 with size <= alpha, then evaluates power at the alternative p_alt.
morie_sign_test_power(x, mu0 = 0, p_alt = 0.7, alpha = 0.05)
x |
Numeric vector (only |
mu0 |
Null median. |
p_alt |
Alternative success probability P(X > mu0). |
alpha |
Test level. |
Named list: statistic (power), n, p_alt, alpha, size, k_lower, k_upper.
morie_sign_test_power(x = rnorm(50))
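The construction described above can be sketched in base R. The version below assumes a two-sided, equal-tailed rejection region (reject when the sign count S is at most `k_lower` or at least `k_upper`), which matches the `k_lower` / `k_upper` fields in the return value but is otherwise an assumption; the helper name `sign_power_sketch` is hypothetical.

```r
sign_power_sketch <- function(n, p_alt = 0.7, alpha = 0.05) {
  ks <- 0:n
  ok <- ks[pbinom(ks, n, 0.5) <= alpha / 2]   # lower-tail cut points with size <= alpha / 2
  k_lower <- if (length(ok)) max(ok) else -1  # -1 means an empty lower rejection region
  k_upper <- n - k_lower                      # symmetric upper cut point
  size <- pbinom(k_lower, n, 0.5) +
    pbinom(k_upper - 1, n, 0.5, lower.tail = FALSE)
  power <- pbinom(k_lower, n, p_alt) +
    pbinom(k_upper - 1, n, p_alt, lower.tail = FALSE)
  list(statistic = power, n = n, p_alt = p_alt, alpha = alpha,
       size = size, k_lower = k_lower, k_upper = k_upper)
}

res <- sign_power_sketch(n = 50, p_alt = 0.7)
res
```

Because the binomial distribution is discrete, the attained size is usually strictly below `alpha`, which is why it is reported alongside the power.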
Simple random sample from a data frame
morie_simple_random_sample(df, n, replace = FALSE, seed = 42L)
df |
A data frame. |
n |
Number of units to select. |
replace |
Sample with replacement? Default |
seed |
Random seed for reproducibility. |
A data frame of n sampled rows with a .weight column added.
df <- data.frame(x = 1:100)
srs_sample <- morie_simple_random_sample(df, 20)
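For readers who want to see what such a draw involves, here is a base-R sketch. It assumes the `.weight` column holds the usual design weight N / n (the inverse inclusion probability under sampling without replacement); the text above does not state this, and the helper name `srs_sketch` is hypothetical.

```r
srs_sketch <- function(df, n, replace = FALSE, seed = 42L) {
  set.seed(seed)                                  # reproducible draw
  idx <- sample.int(nrow(df), size = n, replace = replace)
  out <- df[idx, , drop = FALSE]
  out$.weight <- nrow(df) / n                     # assumed design weight N / n
  out
}

s <- srs_sketch(data.frame(x = 1:100), 20)
sum(s$.weight)   # weights add up to the population size
```

With weights of this form, weighted totals estimate population totals without bias under simple random sampling.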
simpleboot::two.boot
Thin pass-through to simpleboot::two.boot for the
common two-sample bootstrap-of-a-statistic case (e.g. difference
of means, ratio of medians). Returns a boot object the
user can hand to boot::boot.ci or morie_boot_basic_ci().
morie_simpleboot_two(x, y, statistic = mean, R = 1000L, ...)
x, y
|
Numeric vectors. |
statistic |
A scalar function applied to one sample at a
time (e.g. |
R |
Number of bootstrap replicates (default 1000). |
... |
Forwarded to |
A boot object.
simpleboot::two.boot, simpleboot::one.boot,
simpleboot::lm.boot.
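The two-sample bootstrap of a statistic resamples each sample independently and recomputes the contrast each time. A base-R sketch of the difference case follows. It returns a plain numeric vector rather than a boot object, and the helper name `two_boot_sketch` is hypothetical.

```r
two_boot_sketch <- function(x, y, statistic = mean, R = 1000L) {
  replicate(R, {
    statistic(sample(x, replace = TRUE)) -   # resample each group separately
      statistic(sample(y, replace = TRUE))
  })
}

set.seed(1)
x <- rnorm(100, mean = 1)
y <- rnorm(100, mean = 0)
reps <- two_boot_sketch(x, y, R = 500L)
quantile(reps, c(0.025, 0.975))   # percentile interval for the difference in means
```

Resampling the groups separately keeps the two sample sizes fixed, which is the design assumption behind a two-sample comparison.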
Simulate a longitudinal panel and return a tidy long-format data.frame
morie_simulate_longitudinal_panel(n_individuals = 50, n_timepoints = 20, p_variables = 3, cov_kernel = "ar1", cov_rho = 0.5, ar_lags = 1L, ar_spectral_radius = 0.8, ar_decay = 0.6, missing_fraction = 0, outlier_fraction = 0, outlier_scale = 5, seed = 42L)
n_individuals |
Number of subjects. |
n_timepoints |
Number of time-points per subject. |
p_variables |
Number of variables. |
cov_kernel |
Innovation covariance kernel. |
cov_rho |
Correlation parameter. |
ar_lags |
VAR lag order. |
ar_spectral_radius |
Target spectral radius (per lag). |
ar_decay |
Geometric decay across lags. |
missing_fraction |
Probability of NA mask per entry. |
outlier_fraction |
Probability of outlier amplification. |
outlier_scale |
Multiplicative factor for outliers. |
seed |
Non-negative integer seed. |
A data.frame with columns subject_id, t, variable, value.
if (FALSE) {
  df <- morie_simulate_longitudinal_panel(
    n_individuals = 30, n_timepoints = 10, p_variables = 4
  )
  head(df)
}
Convenience wrapper that runs each of the analysis surfaces and
optionally writes a .txt dump (and .json payload if
jsonlite is available) per result. Failures are captured
per-analysis: one bad surface does not stop the rest.
morie_siu_all_analyses(data = NULL, out_dir = NULL)
data |
Either a data frame (e.g. the output of
|
out_dir |
Optional directory; if non- |
A named list of morie_rich_result objects.
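The per-analysis failure capture described above can be sketched with `tryCatch`. The runner `run_all_sketch`, the two toy analyses, and the shape of the failure record are all hypothetical; they only illustrate how one failing surface can be recorded without stopping the rest.

```r
# Hypothetical runner: apply each analysis and capture errors per surface.
run_all_sketch <- function(analyses, data) {
  lapply(analyses, function(f) {
    tryCatch(
      f(data),
      error = function(e) {
        # Record the failure instead of aborting the whole run.
        list(title = "failed", warnings = conditionMessage(e))
      }
    )
  })
}

analyses <- list(
  n_rows = function(d) list(title = "rows", payload = nrow(d)),
  broken = function(d) stop("column missing")
)
res <- run_all_sketch(analyses, data.frame(a = 1:3))
names(res)
```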
(morie.siu.analyze)
Turns a scraped SIU_by_case.csv (the canonical output of
morie_fetch_siu) into a set of RichResult-style
descriptive analyses. Each callable accepts either a path to the
CSV or a data frame directly and returns a structured list with
title, summary_lines, tables,
interpretation, warnings, and payload
fields. The structure mirrors the Python analyze.py surfaces so
that the same multi-section render works through morie's
rich-output dispatch.
These analyses are deliberately distinct from the published-table
replicators in morie_siu_sprott_doob_feb2021 and the
Mandela classifier in morie_siu_classify_mandela:
those work from CSC SIU person-stay data, whereas these analyses
work from the Ontario SIU director's-report corpus.
morie_fetch_siu,
morie_siu_classify_mandela,
mrm_siu_per_service_rate.
For each populated field in the parser's row, ask the LLM whether the extracted value is supported by the cached report HTML. Used to surface fields where the regex parser is plausibly wrong – the LLM's verdicts are not authoritative, just an automated way to triage which rows a human should re-read against the HTML.
morie_siu_anomaly_check(
  case_number,
  model = c("ollama", "gemini"),
  cache_dir = file.path(tempdir(), "morie", "siu"),
  max_html_chars = 80000L,
  mock_response_text = NULL
)
case_number |
An SIU case number (e.g. |
model |
One of |
cache_dir |
Directory holding the harvester's SIU.csv and
the optional |
max_html_chars |
Soft cap on the HTML payload sent to the model (default 80,000 – larger than any real SIU report, small enough to stay under typical context budgets). |
mock_response_text |
For testing only: if non-NULL, skip the network call and use this string as the model's raw reply. |
One API call is made per case (all fields batched into a single prompt with structured-JSON output).
A data frame with one row per populated parser field:
field, parser_value, verdict (one of
"agree" / "disagree" / "unclear"), and
reason (a short sentence pointing to the report passage).
## Not run:
Sys.setenv(GOOGLE_API_KEY = "your-gemini-key")
a <- morie_siu_anomaly_check("17-OVI-201", model = "gemini")
subset(a, verdict == "disagree")
## End(Not run)
For any case_number (or drid), return the parser's 64-column row
together with the raw HTML pages it was extracted from – the
director's-report page and, when linked, the news-release page.
This is the per-row ground truth: every field in the emitted CSV
is reproducible from report_html via the parser, and any
disagreement with another data source can be adjudicated against
the saved HTML.
morie_siu_audit_case(
  case_number,
  cache_dir = file.path(tempdir(), "morie", "siu"),
  fetch_if_missing = TRUE
)
case_number |
An SIU case number (e.g. |
cache_dir |
Directory holding the harvester's SIU.csv and the
optional |
fetch_if_missing |
If |
Reads from the local cache at <cache_dir>/html/ (populated
by morie_fetch_siu(cache_html = TRUE)) when available, and
falls back to a polite live fetch when the cache is missing.
A list with elements row (the parser's 1-row data
frame for this case), drid, nrid,
report_html, news_html, report_text
(HTML-stripped plain text of the report) and news_text.
## Not run:
a <- morie_siu_audit_case(
  "17-OVI-201",
  cache_dir = file.path(tempdir(), "morie", "siu")
)
cat(substr(a$report_text, 1, 1000), "\n")
## End(Not run)
Runs morie_siu_anomaly_check() on a vector of case_numbers
and aggregates per-field across them. Output is a data frame with
one row per SIU column, ordered by how often the LLM auditor
agreed with the C++ parser. The worst-ranked rows are the
parser fields that most deserve regex / extraction-logic fixes.
morie_siu_audit_columns(
  case_numbers,
  model = c("ollama", "gemini"),
  cache_dir = file.path(tempdir(), "morie", "siu"),
  max_html_chars = 80000L,
  max_examples_per_field = 5L,
  progress = TRUE
)
case_numbers |
Character vector of SIU case numbers to audit. |
model |
One of |
cache_dir |
Directory holding the harvester's SIU.csv and
the optional |
max_html_chars |
Soft cap on the HTML payload sent to the model (default 80,000 – larger than any real SIU report, small enough to stay under typical context budgets). |
max_examples_per_field |
Maximum disagreement examples retained per field (default 5). |
progress |
Logical; print a per-case progress line. |
Examples of LLM-flagged disagreements are attached as the
"examples" attribute of the returned data frame (one
nested data frame per field), with at most
max_examples_per_field cases each. Each example carries
the case_number, the parser_value, and the LLM's one-sentence
reason – enough for a maintainer to pop the cached HTML for
that case, see who's right, and decide whether to refine the
regex pattern for that field.
Designed for cheap local audit: with model = "ollama"
pointed at a local Gemma / Qwen / DeepSeek instance, auditing
50-100 cases costs zero API spend and finishes in a few
minutes. With model = c("gemini", "ollama") the chain
uses paid Gemini first and silently falls back to the local
model on quota / network errors.
A data frame with columns field, n_audited,
n_agree, n_disagree, n_unclear,
agree_rate. Sorted ascending by agree_rate so the
most-broken fields land at the top. The "examples"
attribute holds nested data frames of flagged cases per field.
## Not run:
Sys.setenv(
  OLLAMA_HOST = "http://localhost:11434",
  OLLAMA_MODEL = "gemma3:4b"
)
csv <- morie_fetch_siu(cache_html = TRUE)
df <- utils::read.csv(csv, colClasses = "character")
sample <- sample(df$case_number[nzchar(df$case_number)], 50L)
audit <- morie_siu_audit_columns(sample, model = "ollama")
# Worst 8 fields, ripe for parser fixes:
head(audit, 8)
# See concrete disagreements for the worst field:
attr(audit, "examples")[[audit$field[1L]]]
## End(Not run)
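The per-field aggregation can be illustrated offline with a few made-up verdict rows. The sketch below assumes that `agree_rate` is `n_agree / n_audited` (so "unclear" verdicts count in the denominator); that definition and the toy data are assumptions, not confirmed package behaviour.

```r
# Hypothetical per-case verdicts stacked into one data frame.
verdicts <- data.frame(
  field   = c("police_service", "police_service", "location_of_call",
              "location_of_call", "location_of_call"),
  verdict = c("agree", "agree", "disagree", "agree", "unclear"),
  stringsAsFactors = FALSE
)

# Cross-tabulate field by verdict, keeping all three verdict levels.
tab <- table(
  verdicts$field,
  factor(verdicts$verdict, levels = c("agree", "disagree", "unclear"))
)

audit <- data.frame(
  field      = rownames(tab),
  n_audited  = as.integer(rowSums(tab)),
  n_agree    = as.integer(tab[, "agree"]),
  n_disagree = as.integer(tab[, "disagree"]),
  n_unclear  = as.integer(tab[, "unclear"]),
  stringsAsFactors = FALSE
)
audit$agree_rate <- audit$n_agree / audit$n_audited  # assumed definition
audit <- audit[order(audit$agree_rate), ]            # lowest agreement first
audit
```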
Per-police-service tabulation of case counts, charges-recommended counts, and the implied charge rate. Sorted by case count (descending), capped at the top 30 in the rendered table.
morie_siu_by_police_service(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list with summary lines,
one table, and a payload of raw counts.
Year-over-year case volume plus charges-rate, parsed from the
first four characters of date_of_incident_iso.
morie_siu_by_year(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list.
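As a rough illustration of the year-over-year tabulation, the following base-R sketch derives the year from the first four characters of `date_of_incident_iso` and computes a charge rate per year. The toy rows and the "Yes"/"No" coding of `charges_recommended` are assumptions made for the example.

```r
# Hypothetical toy corpus; real input would be the harvested SIU table.
cases <- data.frame(
  date_of_incident_iso = c("2019-03-14", "2019-11-02", "2020-01-20", "2020-06-05"),
  charges_recommended  = c("No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

cases$year <- substr(cases$date_of_incident_iso, 1, 4)   # first four characters

n_cases   <- tapply(cases$charges_recommended, cases$year, length)
n_charged <- tapply(cases$charges_recommended == "Yes", cases$year, sum)

by_year <- data.frame(
  year        = names(n_cases),
  n_cases     = as.integer(n_cases),
  charge_rate = as.numeric(n_charged / n_cases),
  stringsAsFactors = FALSE
)
by_year
```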
Returns the path <cache_dir>/SIU.csv, creating
cache_dir if needed. Default is file.path(tempdir(),
"morie", "siu"); pass morie_cache_dir("siu") for persistent
caching.
morie_siu_cache_path(cache_dir = file.path(tempdir(), "morie", "siu"))
cache_dir |
Output directory. |
Absolute path to SIU.csv (file may not exist yet).
p <- morie_siu_cache_path(tempfile("siu_demo_"))
p
Distribution (n, mean, median, min, max) of subject-officials, witness-officials, civilian-witnesses, and officers-involved.
morie_siu_case_counts(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list.
Cross-tabulates charges_recommended against incident-year
and tests independence with a Pearson chi-square (
stats::chisq.test; p-value via stats::pchisq). Years with
zero charge-decided cases are dropped. Complements
morie_siu_verify_chi2 in sprott_doob.R, which
tests specific published 2x2 tables; this is a generic "did the
charge rate move?" probe over the harvested SIU corpus.
morie_siu_charges_by_year_chi2(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list including the contingency
table, statistic, df, and p-value.
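A small base-R sketch of the test described above: cross-tabulate year against `charges_recommended`, drop empty years, and run a Pearson chi-square. The counts are invented for illustration, and the manual `pchisq` call simply shows where the p-value comes from.

```r
# Hypothetical counts: 40, 50 and 60 charge-decided cases in three years.
cases <- data.frame(
  year = rep(c("2019", "2020", "2021"), times = c(40, 50, 60)),
  charges_recommended = c(
    rep(c("Yes", "No"), c(4, 36)),
    rep(c("Yes", "No"), c(5, 45)),
    rep(c("Yes", "No"), c(12, 48))
  ),
  stringsAsFactors = FALSE
)

tab <- table(cases$year, cases$charges_recommended)
tab <- tab[rowSums(tab) > 0, , drop = FALSE]     # drop years with no decided cases

test <- stats::chisq.test(tab, correct = FALSE)  # Pearson chi-square of independence

# The same p-value recovered from the statistic and its degrees of freedom.
p_manual <- stats::pchisq(
  unname(test$statistic), df = unname(test$parameter), lower.tail = FALSE
)
```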
Operationalizes UN Mandela Rules 43 and 44 against a single SIU person-stay's days, average hours out of cell, and percent-of-days that missed the legislatively-required 4 hours out of cell.
morie_siu_classify_mandela(
  days_in_siu,
  hrs_out_of_cell_avg,
  missed_full_4hrs_pct_of_days
)
days_in_siu |
Length of the stay (days). |
hrs_out_of_cell_avg |
Average hours out of cell per day during the stay. |
missed_full_4hrs_pct_of_days |
Percent of days (0-100) where the inmate did not receive the legislatively-required 4 hours out of cell. |
Solitary Confinement (Rule 44): <=2 hrs avg out of cell, missed full 4 hrs every day, stay <=15 days.
Torture (Rules 43+44): same conditions but stay length >=16 days (crosses the "prolonged" threshold).
All other: did not meet either threshold.
A named list with elements category, rule,
and reason.
morie_siu_classify_mandela(20, 1.5, 100)$category
morie_siu_classify_mandela(8, 1.5, 100)$category
morie_siu_classify_mandela(20, 5, 50)$category
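The three-way rule above can be written out as a short decision function. This is a sketch only: the name `classify_mandela_sketch`, the category labels, and the reading of "missed full 4 hrs every day" as a value of 100 percent are assumptions, and the real function also returns `rule` and `reason` fields that are omitted here.

```r
# Hypothetical re-statement of the documented thresholds.
classify_mandela_sketch <- function(days_in_siu, hrs_out_of_cell_avg,
                                    missed_full_4hrs_pct_of_days) {
  # Isolation conditions shared by both flagged categories.
  isolated <- hrs_out_of_cell_avg <= 2 && missed_full_4hrs_pct_of_days >= 100
  if (isolated && days_in_siu >= 16) {
    "torture"                 # Rules 43 + 44: prolonged (16 days or more)
  } else if (isolated && days_in_siu <= 15) {
    "solitary_confinement"    # Rule 44: up to 15 days
  } else {
    "other"                   # neither threshold met
  }
}

classify_mandela_sketch(20, 1.5, 100)
classify_mandela_sketch(8, 1.5, 100)
classify_mandela_sketch(20, 5, 50)
```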
For one case_number, line up the parser's value against the same field in a user-supplied external data source – and, critically, show the surrounding report HTML so the user can adjudicate any disagreement against the actual source document.
morie_siu_compare(
  case_number,
  external,
  field_map = NULL,
  external_case_col = "Q1",
  cache_dir = file.path(tempdir(), "morie", "siu")
)
case_number |
A case number (e.g. |
external |
A data frame of external answers, OR a path to an
|
field_map |
A named list mapping external-column names to morie field names. |
external_case_col |
Name of the external column carrying the case-number key. |
cache_dir |
Directory holding the harvester's SIU.csv and optional cached HTML. |
The ground truth is the SIU director's-report HTML itself. The HTML is what the SIU published; the parser's job is to extract structured fields from it faithfully, and any field's correctness is decidable by reading the cached HTML for that case. Any external reference – a hand-coded survey, an independently-scraped CSV, a colleague's analysis – is just another extraction attempt, possibly with its own errors. This function does not endorse any external source; it only displays both side-by-side with the HTML excerpt so you can decide.
The default field map covers the common SIU-extraction column
layout (Q1 = case_number, Q3 = police_service,
Q4 = number_of_officers_involved, ...). Pass a custom
field_map for any other external schema.
A data frame with one row per mapped field: field,
parser_value, external_value, agree, and
html_excerpt (a 240-character window around the first
occurrence of either value in the cleaned report text). When
parser and external disagree, the html_excerpt is the
tie-breaker.
## Not run:
# Caller supplies their own external table; nothing about the
# mapping or the file format is canonical to morie.
external <- data.frame(case_id = "17-OVI-201", officers = 1L)
cmp <- morie_siu_compare(
  "17-OVI-201",
  external = external,
  field_map = list(officers = "number_of_officers_involved"),
  external_case_col = "case_id"
)
subset(cmp, !agree)
## End(Not run)
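The `html_excerpt` idea, a fixed-width window of report text around the first occurrence of a value, can be sketched in base R. The helper `excerpt_sketch` and the way the window is centred are assumptions; the package may position or trim the window differently.

```r
# Hypothetical helper: a window of `width` characters around the first
# literal occurrence of `value` in `text`.
excerpt_sketch <- function(text, value, width = 240L) {
  pos <- as.integer(regexpr(value, text, fixed = TRUE))  # -1 when absent
  if (pos < 0L) return(NA_character_)
  start <- max(1L, pos - width %/% 2L)                   # roughly centre the match
  substr(text, start, start + width - 1L)
}

# Synthetic report text with the phrase of interest in the middle.
report_text <- paste(
  strrep("x", 300),
  "on Clair Road East in the City of Guelph",
  strrep("y", 300)
)
ex <- excerpt_sketch(report_text, "Clair Road East")
nchar(ex)
```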
Day-delta distributions for the three SIU process intervals:
From date_of_incident_iso to date_siu_notified_iso.
From date_siu_notified_iso to date_of_director_decision_iso.
The end-to-end interval.
morie_siu_decision_timing(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list.
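The three intervals can be computed as day differences between ISO dates. The sketch below uses invented rows and treats an empty string as a missing date; the output column names are hypothetical, not the package's.

```r
# Hypothetical toy rows using the three ISO date columns named above.
cases <- data.frame(
  date_of_incident_iso          = c("2020-01-01", "2020-02-10", "2020-03-05"),
  date_siu_notified_iso         = c("2020-01-01", "2020-02-12", "2020-03-05"),
  date_of_director_decision_iso = c("2020-04-30", "2020-06-11", ""),
  stringsAsFactors = FALSE
)

# Empty strings become NA before parsing.
to_date <- function(x) {
  as.Date(ifelse(nzchar(x), x, NA_character_), format = "%Y-%m-%d")
}

incident <- to_date(cases$date_of_incident_iso)
notified <- to_date(cases$date_siu_notified_iso)
decided  <- to_date(cases$date_of_director_decision_iso)

deltas <- data.frame(
  incident_to_notification = as.integer(notified - incident),
  notification_to_decision = as.integer(decided - notified),
  incident_to_decision     = as.integer(decided - incident)
)
summary(deltas$notification_to_decision)
```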
Sex/gender frequency table and age-distribution summary.
morie_siu_demographics(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list.
On-demand scraper for the Ontario Special Investigations Unit (SIU)
Director's Reports index at https://www.siu.on.ca/en/directors_reports.php.
This is the R port of morie.siu_fetch – the lightweight
httr2/rvest path that complements the C/C++ harvester
in morie_fetch_siu. Use this when:
morie_siu_index_url()
you want a tiny R-only dependency footprint (no compiled code);
you only need the header / index fields (case_number, police_service, incident date, decision date) – not the full 64-column schema;
you are running on a host where the C++ parser does not build.
Distribution policy (2026-05): the scraped corpus is NOT shipped with the package. Each user runs the scraper themselves, which is unambiguously fair use of public oversight reports.
The scraper is conservative: a 2-second delay between requests,
retries on 5xx, and a descriptive user-agent. The latest published
year as of release is 2023; years = NULL (the default) scrapes
the unfiltered index, which surfaces the most recent posts.
By default this writes SIU.csv under tempdir()
so R cleans it up at end of session. Pass cache_dir =
morie_cache_dir("siu") explicitly to opt into a persistent cross-
session cache; see morie_cache_dir and
morie_cache_clear (no implicit writes to ~/.cache).
Pulls the SIU index, walks every linked case-detail page, and writes a
seven-column CSV (case_number, police_service,
incident_iso, notification_iso, decision_iso,
director_decision_text, source_url) into
cache_dir.
morie_siu_fetch_cases(
  years = NULL,
  cache_dir = file.path(tempdir(), "morie", "siu"),
  overwrite = FALSE,
  progress = TRUE
)
years |
Integer vector of fiscal years to scrape, or |
cache_dir |
Output directory. Default
|
overwrite |
Logical; if |
progress |
Logical; print a one-line status per index / case
fetch when |
This is the lightweight R-only path. For the full 64-column corpus
use morie_fetch_siu (compiled C++ harvester).
Path to the written SIU.csv.
## Not run:
# Network: scrapes the SIU index (~5-15 min at the polite rate).
csv <- morie_siu_fetch_cases(cache_dir = tempfile("siu_"))
utils::head(utils::read.csv(csv))
## End(Not run)
Thin wrapper over morie_siu_fetch_cases, returning a
data frame instead of the CSV path. Mirrors the Python
fetch_siu_dataframe() adapter used by the dataset catalog.
morie_siu_fetch_dataframe(...)
... |
Forwarded to |
A data frame with the seven-column SIU header schema.
## Not run:
df <- morie_siu_fetch_dataframe(cache_dir = tempfile("siu_"))
utils::head(df)
## End(Not run)
Returns the shipped drid manifest as a data frame – one row per
director's-report id morie has verified, with the parsed case
number, detected language, and the canonical drid (the English
drid for that case, or the first drid if no English version
exists). This is the index morie_fetch_siu() uses
internally; exposing it lets users:
morie_siu_index(lang = c("all", "en", "fr", "valid"), canonical_only = FALSE)
lang |
Filter rows by detected language. |
canonical_only |
If |
see exactly which drids ship as known-valid (no need to fetch to find out);
subset to English-only or French-only case lists without running the full harvester;
map between drid (URL fragment) and case_number (SIU's own identifier) offline.
The manifest is refreshed by maintainers via
morie_siu_refresh_manifest(); it ships gzipped under
inst/extdata/ at ~50 KB.
A data frame with columns drid, http_code,
body_bytes, attempts, case_number,
_language, source, retrieved_at_utc,
canonical_drid.
idx <- morie_siu_index(lang = "en")
head(idx)
# How many drids are English vs French vs unknown?
table(morie_siu_index()$`_language`)
# Unique-case index (English-preferred)
canon <- morie_siu_index(canonical_only = TRUE)
nrow(canon)
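The canonical-drid rule (the English drid for a case, or the first drid if no English version exists) can be sketched on a toy manifest. The drid values and case labels below are made up, and the helper `canonical_for_case` is hypothetical.

```r
# Hypothetical manifest rows; only the columns needed for the rule.
manifest <- data.frame(
  drid        = c(101L, 102L, 103L, 104L),
  case_number = c("CASE-A", "CASE-A", "CASE-B", "CASE-C"),
  `_language` = c("fr", "en", "fr", "en"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# English drid if one exists for the case, otherwise the first drid.
canonical_for_case <- function(rows) {
  en <- rows$drid[rows$`_language` == "en"]
  if (length(en) > 0) en[1] else rows$drid[1]
}

canon <- vapply(
  split(manifest, manifest$case_number), canonical_for_case, integer(1)
)
manifest$canonical_drid <- unname(canon[manifest$case_number])
manifest
```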
Sends the cached director's-report HTML for one case through a
large-language-model endpoint and asks it to return the 64-column
morie schema as JSON. The result is in the SAME row format as the
C++ parser, so it drops straight into morie_siu_compare()
as the external argument for an independent diff against
the parser.
morie_siu_llm_extract(
  case_number,
  model = c("ollama", "gemini"),
  cache_dir = file.path(tempdir(), "morie", "siu"),
  max_html_chars = 80000L,
  mock_response_text = NULL
)
case_number |
An SIU case number (e.g. |
model |
One of |
cache_dir |
Directory holding the harvester's SIU.csv and
the optional |
max_html_chars |
Soft cap on the HTML payload sent to the model (default 80,000 – larger than any real SIU report, small enough to stay under typical context budgets). |
mock_response_text |
For testing only: if non-NULL, skip the network call and use this string as the model's raw reply. |
The cached HTML remains the ground truth. This function does not claim the LLM is more accurate than the regex parser; it provides a fast second extraction so disagreements between two independent methods (regex vs. LLM) can be flagged for human review against the saved report.
Credentials are read from environment variables only – never
hard-coded, never passed as function arguments – so secrets do
not leak into call traces, logs, or scripts. Set
GOOGLE_API_KEY for Gemini, ANTHROPIC_API_KEY for
Claude, or OLLAMA_HOST (e.g.
"http://localhost:11434" or an OllamaFreeAPI base URL) plus
optionally OLLAMA_MODEL (default "llama3.2:3b") for
Ollama-compatible open-weight endpoints.
A one-row data frame with the 64 morie SIU columns. Any field the model could not extract is the empty string (matching the C++ parser's convention).
## Not run:
Sys.setenv(GOOGLE_API_KEY = "your-gemini-key")
r <- morie_siu_llm_extract("17-OVI-201", model = "gemini")
# Diff parser vs LLM against the HTML:
morie_siu_compare(
  "17-OVI-201",
  external = r,
  field_map = setNames(as.list(names(r)), names(r)),
  external_case_col = "case_number"
)
## End(Not run)
Frequency table of the semicolon-delimited keyword tags written by
the parser into mental_health_or_race_indications. The
parser scans narratives against a closed vocabulary
(see morie.siu._parser._MH_RACE_KEYWORDS); this analysis
tallies those tags across the case corpus.
morie_siu_mental_health_race_indicators(data = NULL)
data |
Either a data frame (e.g. the output of
|
A morie_rich_result list, including a warning
noting that keyword-presence is a signal, not a verdict.
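Tallying semicolon-delimited tags is a short split-and-count in base R. The tag strings below are invented placeholders, not the parser's actual closed vocabulary.

```r
# Hypothetical values of mental_health_or_race_indications, one per case.
tags_col <- c("mental health; crisis", "", "crisis", "mental health")

tags <- trimws(unlist(strsplit(tags_col, ";", fixed = TRUE)))  # split and trim
tags <- tags[nzchar(tags)]                                     # drop empty entries

freq <- sort(table(tags), decreasing = TRUE)                   # frequency table
freq
```

As the returned warning notes, a count of keyword hits is a signal for review and says nothing definitive about any individual case.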
Parse a SIU director's-report HTML page (pure-R)
morie_siu_parse_html(html, drid = NA_integer_, source_url = NULL)
html |
Raw HTML response body. |
drid |
Optional integer drid from the request URL – useful when the page itself doesn't echo it. |
source_url |
Optional canonical URL – used to derive drid
and recorded as |
A list with every SIU_COLUMNS key (NAs for
unfound fields).
News-release pages live at news_template.php?nrid=<N> and
have a different layout than the director's reports – a single
headline, short summary paragraph, signed-by-Director line.
morie_siu_parse_news_html(html, nrid = NA_integer_, source_url = NULL)
html |
Raw HTML response body. |
nrid |
Optional integer nrid from the request URL. |
source_url |
Optional canonical URL. |
A list with nrid, source_url_news,
news_release_title, news_release_date_iso,
news_release_date_raw, news_release_summary,
case_number, and directors_name.
(morie.siu._parser)
Parses one SIU director's-report HTML page (or one news-release
page) into a structured row list. The production parser lives in
the Rcpp / C++ backend (.siu_parse_report,
.siu_parse_news); this pure-R port is provided as a
reference implementation and as a fallback for environments
where the compiled libmorie backend is unavailable.
Suggested dependencies. These functions optionally use
rvest + xml2 for DOM walking; without them, a
regex-based fallback over flat tag-stripped text is used. Either
way the parser is pure (no network) – hand it a raw HTML string
and it returns a row dict matching SIU_COLUMNS.
Hardened against the SIU page markup shifting over time by:
looking for several label variants per field,
falling back to regex on stripped text when DOM structure shifts,
preserving the verbatim narrative_full
regardless of parse success.
Saves a (case_number, field, verified_value) tuple to a local
overrides CSV at <cache_dir>/canonical_overrides.csv. Every
subsequent morie_fetch_siu() on that cache_dir will
overlay these corrections onto the regex-parsed output. The shipped
inst/extdata/siu_canonical_overrides.csv.gz carries
maintainer-confirmed corrections; this function lets users add
their own without touching the package source.
morie_siu_record_correction(
  case_number,
  field,
  verified_value,
  note = "",
  cache_dir = file.path(tempdir(), "morie", "siu")
)
case_number |
SIU case number, e.g. |
field |
Name of the column in the SIU schema (e.g.
|
verified_value |
The correct value, verified against the
cached HTML (see |
note |
Optional one-line note describing the basis for the correction (HTML excerpt, LLM verdict, etc.). |
cache_dir |
Directory holding the harvester's SIU.csv. |
This is the "memory" of the parser: every wrong cell you find and fix becomes permanent for that cache_dir. Maintainers can submit corrections upstream by sharing the resulting CSV file.
Invisibly, the path to the updated overrides CSV.
# Writes the correction to a temp cache so the example never
# touches the per-user cache directory.
tmp <- tempfile("morie_siu_"); dir.create(tmp, recursive = TRUE)
morie_siu_record_correction(
  case_number = "17-OVI-201",
  field = "location_of_call",
  verified_value = "Clair Road East, City of Guelph",
  note = "HTML excerpt: 'on Clair Road East in the City of Guelph'",
  cache_dir = tmp
)
unlink(tmp, recursive = TRUE)
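How recorded corrections might be overlaid onto parsed output can be sketched as follows. The helper `apply_overrides_sketch` and the toy tables are hypothetical; the overrides columns follow the (case_number, field, verified_value) tuple described above, but the package's actual overlay logic is not shown in this manual.

```r
# Hypothetical parsed rows and one recorded correction.
parsed <- data.frame(
  case_number      = c("17-OVI-201", "CASE-B"),
  location_of_call = c("Guelph", "Ottawa"),
  stringsAsFactors = FALSE
)
overrides <- data.frame(
  case_number    = "17-OVI-201",
  field          = "location_of_call",
  verified_value = "Clair Road East, City of Guelph",
  stringsAsFactors = FALSE
)

# Overwrite each (case, field) cell named in the overrides table.
apply_overrides_sketch <- function(parsed, overrides) {
  for (i in seq_len(nrow(overrides))) {
    hit <- parsed$case_number == overrides$case_number[i]
    parsed[hit, overrides$field[i]] <- overrides$verified_value[i]
  }
  parsed
}

fixed <- apply_overrides_sketch(parsed, overrides)
fixed
```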
Sweeps director's-report ids 1..max_drid and writes a small
CSV recording which ids return a healthy report page, the parsed
case number, and the response body size. The harvester
(morie_fetch_siu) then uses this manifest to short-circuit
the ~30-50 percent of ids that have no report, saving bandwidth and
WAF-trigger risk on every run.
morie_siu_refresh_manifest(
  out_path = NULL,
  max_drid = NULL,
  min_drid = 1L,
  concurrency = 4L,
  rate_rps = 4,
  progress = TRUE
)
out_path |
Path to write the gzipped CSV. Default is the in-place manifest location (only useful for maintainers building from a source checkout). |
max_drid |
Highest drid to probe. Default |
min_drid |
Lowest drid to probe (default |
concurrency |
Maximum simultaneous transfers (default |
rate_rps |
Maximum request starts per second (default |
progress |
Logical; print a per-batch progress line. |
The shipped manifest at inst/extdata/siu_drid_manifest.csv.gz
is a snapshot. Users who want the latest can call this function;
it is also how morie maintainers regenerate the snapshot.
Invisibly, a data frame of the full sweep (every probed drid,
including misses), parallel to what was written to out_path.
## Not run:
# Network: refreshes the manifest by probing the SIU site
# (~25-40 min at the default polite rate of 4 RPS for ~6000 ids).
df <- morie_siu_refresh_manifest(out_path = tempfile(fileext = ".csv.gz"))
table(df$http_code)
## End(Not run)
For every row in a parser-emitted SIU table, flag cells that
don't match the expected format for their column – case_number
that doesn't look like an SIU case id, date_*_iso that isn't a
valid ISO 8601 date, number_of_* that isn't a positive integer,
charges_recommended that isn't "Yes" / "No", etc. Returns a
data frame ranked by issue count so the most-broken rows surface
at the top for manual inspection against the cached HTML.
morie_siu_sanity_check(df)
df |
A data frame in the morie SIU 64-column schema, or a path to such a CSV. |
Designed to be a fast first-pass quality filter – runs in
milliseconds, no network, no LLM, no API key. Doesn't try to
verify correctness against the underlying report (that's what
morie_siu_audit_columns() is for); just checks that each
value MATCHES THE EXPECTED FORMAT for its field. A clean sanity
check is necessary but not sufficient for correctness.
A data frame with one row per source row, columns:
case_number, drid, issues_count (integer
number of suspicious cells), issues (semicolon-separated
string of field:reason pairs). Ordered descending by
issues_count.
## Not run:
csv <- morie_fetch_siu(cache_dir = tempdir(), cache_html = TRUE)
sanity <- morie_siu_sanity_check(csv)
head(sanity, 10)   # worst 10 rows -- inspect against HTML
table(sanity$issues_count)
## End(Not run)
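A format-only check of this kind can be sketched for two columns. The helper `sanity_sketch` is hypothetical, covers only an ISO-date check and a "Yes"/"No" check, and, unlike the real function may do, treats empty values as issues.

```r
# Hypothetical format checker for two SIU columns.
sanity_sketch <- function(df) {
  # ISO 8601 shape plus a real calendar date.
  is_iso <- function(x) {
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &
      !is.na(as.Date(x, format = "%Y-%m-%d"))
  }
  issues <- vapply(seq_len(nrow(df)), function(i) {
    bad <- character(0)
    if (!is_iso(df$date_of_incident_iso[i])) {
      bad <- c(bad, "date_of_incident_iso:not an ISO date")
    }
    if (!df$charges_recommended[i] %in% c("Yes", "No")) {
      bad <- c(bad, "charges_recommended:not Yes/No")
    }
    paste(bad, collapse = "; ")
  }, character(1))
  out <- data.frame(case_number = df$case_number, issues = issues,
                    stringsAsFactors = FALSE)
  out$issues_count <- lengths(strsplit(issues, "; ", fixed = TRUE))
  out[order(-out$issues_count), ]                # most suspicious rows first
}

toy <- data.frame(
  case_number          = c("A", "B", "C"),
  date_of_incident_iso = c("2020-01-15", "15/01/2020", "2020-02-30"),
  charges_recommended  = c("Yes", "Maybe", "No"),
  stringsAsFactors = FALSE
)
res <- sanity_sketch(toy)
res
```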
Bundles Tables 13, 19, and 23 (the three headline tables) into a
single morie_siu_result with cross-references.
morie_siu_sprott_doob_feb2021()
A morie_siu_result.
morie_siu_sprott_doob_feb2021()$payload$headline_findings$n_total_stays
Sprott-Doob-Iftene (May 2021) Table 1: IEDM-reviewed population
morie_siu_sprott_doob_iftene_table1()
A morie_siu_result.
Sprott-Doob-Iftene (May 2021) Table 10: per-IEDM decision variance
morie_siu_sprott_doob_iftene_table10()
A morie_siu_result.
Sprott-Doob-Iftene (May 2021) Table 15: long-stay no-IEDM cases
morie_siu_sprott_doob_iftene_table15()
A morie_siu_result.
Sprott-Doob-Iftene (May 2021) Table 9: IEDM review outcomes
morie_siu_sprott_doob_iftene_table9()
A morie_siu_result.
Sprott-Doob (Feb 2021) Table 11: Region x stay length
morie_siu_sprott_doob_table11()
A morie_siu_result.
Sprott-Doob (Feb 2021) Table 12: regional over-/under-representation
morie_siu_sprott_doob_table12()
A morie_siu_result.
Sprott-Doob (Feb 2021) Table 13: regional SIU person-stay rates
morie_siu_sprott_doob_table13()
A morie_siu_result (named list with the replicated
table, summary lines, interpretation, and payload).
morie_siu_sprott_doob_table13()$payload$qc_on_short_stay_ratio
Sprott-Doob (Feb 2021) Table 15: Region x MH-flag
morie_siu_sprott_doob_table15()
A morie_siu_result.
Sprott-Doob (Feb 2021) Table 19: Mandela-Rules classification
morie_siu_sprott_doob_table19()
A morie_siu_result.
morie_siu_sprott_doob_table19()$payload$pct_problematic
Sprott-Doob (Feb 2021) Table 22: Region x Mandela groups
morie_siu_sprott_doob_table22()
A morie_siu_result.
Sprott-Doob (Feb 2021) Table 23: regional torture/solitary rates
morie_siu_sprott_doob_table23()
A morie_siu_result.
morie_siu_sprott_doob_table23()$payload$pac_on_torture_ratio
Sprott-Doob (Feb 2021) Table 4: length-of-stay distribution
morie_siu_sprott_doob_table4()
A morie_siu_result.
For SIU cases whose parser-emitted text isn't in the reader's
preferred language, translate the long-form text fields into
target_lang via a local Ollama model (default $0 cost,
no API key) and save each translation as a canonical override.
Subsequent morie_fetch_siu() runs then return text in
target_lang for those cases automatically.
morie_siu_translate_fr_to_en is a thin
back-compat wrapper that calls morie_siu_translate
with target_lang = "en", source_lang = "fr".
morie_siu_translate(
  target_lang = NULL,
  source_lang = NULL,
  case_numbers = NULL,
  model = "ollama",
  fields = c("narrative_summary", "news_release_summary", "news_release_title", "relevant_legislation"),
  cache_dir = file.path(tempdir(), "morie", "siu"),
  progress = TRUE
)
morie_siu_translate_fr_to_en(
  case_numbers = NULL,
  model = "ollama",
  fields = c("narrative_summary", "news_release_summary", "news_release_title", "relevant_legislation"),
  cache_dir = file.path(tempdir(), "morie", "siu"),
  progress = TRUE
)
target_lang |
Target ISO 639-1 language code (or full
language name). Defaults to |
source_lang |
Source language code, or |
case_numbers |
Character vector of SIU case numbers to
translate. Defaults to every row whose |
model |
LLM model chain (see |
fields |
Which text fields to translate. Defaults to the
long-form fields that benefit most from translation:
narrative_summary, news_release_summary, news_release_title,
and relevant_legislation. |
cache_dir |
Directory holding the harvester's SIU.csv and cached HTML. |
progress |
Print per-case progress. |
Use cases:
French-only SIU reports (a few per year of SIU output) that have no English-paired drid – translate to "en" so downstream analyses can join them with the rest.
English SIU reports that a Hindi / Spanish / Mandarin / Punjabi / Arabic / etc. reader needs – translate to their first language for accessibility.
Any cross-language pivot for community-oriented publication, where the reader's first language isn't what the SIU originally published in.
Idempotent (skips cases that already have an override on file
for this target_lang). Self-improving (every translation
accumulates in <cache_dir>/canonical_overrides.csv, so
the SIU table becomes more accessible every time you run this).
Maintainers can promote the resulting overrides into the
shipped inst/extdata/siu_canonical_overrides.csv.gz.
For best speed/quality on multilingual translation use
OLLAMA_MODEL=translategemma:latest – a Gemma model
fine-tuned for translation. Falls back to whatever model
OLLAMA_MODEL points at.
Invisibly, a data frame of newly-recorded (case_number, field, verified_value) translations.
## Not run:
Sys.setenv(
  OLLAMA_HOST = "http://localhost:11434",
  OLLAMA_MODEL = "translategemma:latest"
)
csv <- morie_fetch_siu(cache_html = TRUE)
# Translate every non-English row to English:
morie_siu_translate(target_lang = "en")
# Or translate everything to Hindi for a Hindi-first reader:
morie_siu_translate(target_lang = "hi")
# Re-fetch picks up the new overrides automatically:
csv <- morie_fetch_siu(overwrite = TRUE)
## End(Not run)
Pure-base-R Pearson chi-square without Yates correction. Intended for quick self-checks of the transcribed cell counts against the published chi-square values.
morie_siu_verify_chi2(observed)
observed |
A 2D matrix or data frame of non-negative counts. |
A named list with elements chi2, df,
p_value, expected, and n.
morie_siu_verify_chi2(matrix(c(10, 10, 10, 10), nrow = 2))$chi2
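As a rough cross-check of what this function computes, the statistic can be reproduced in a few lines of base R. This is a minimal sketch, not the package's implementation: the helper name `pearson_chi2_sketch` is hypothetical, and the comparison against `stats::chisq.test(correct = FALSE)` is only there to confirm the arithmetic.

```r
# Hypothetical helper (not part of morie): Pearson chi-square, no Yates correction.
pearson_chi2_sketch <- function(observed) {
  observed <- as.matrix(observed)
  n <- sum(observed)
  # Expected count for cell (i, j) is row_total_i * col_total_j / n.
  expected <- outer(rowSums(observed), colSums(observed)) / n
  chi2 <- sum((observed - expected)^2 / expected)
  df <- (nrow(observed) - 1) * (ncol(observed) - 1)
  list(chi2 = chi2, df = df,
       p_value = stats::pchisq(chi2, df, lower.tail = FALSE),
       expected = expected, n = n)
}

tab <- matrix(c(12, 5, 7, 9), nrow = 2)
res <- pearson_chi2_sketch(tab)
# Reference value from base R with the continuity correction switched off.
ref <- stats::chisq.test(tab, correct = FALSE)
all.equal(res$chi2, unname(ref$statistic))
```

A table with identical cells, such as the one in the example above, has expected counts equal to the observed counts, so the statistic is zero.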
Cross-checks Sprott-Doob Tables 11, 15, 22 (Feb 2021) and Sprott-Doob-Iftene Tables 5, 10 (May 2021) by recomputing the chi-square from each transcribed contingency table and comparing it to the published value. A "pass" means the recomputed chi-square is within rounding tolerance (1.0-1.5 units) of the published value.
morie_siu_verify_published_chi_squares()
A morie_siu_result with the verification table and
per-table warnings for any mismatch.
v <- morie_siu_verify_published_chi_squares()
v$payload$n_pass
Federal Court affidavits / expert evidence indexed by morie.
MORIE_SIUIAP_AFFIDAVITS
An object of class list of length 1.
Searches MORIE_SIUIAP_REPORTS, then MORIE_SIUIAP_CRIMSL_REPORTS,
then MORIE_SIUIAP_AFFIDAVITS, in order, and returns a one-line
citation in the form <authors> (<year>). <title>. <publisher>..
morie_siuiap_cite(report_id = "final_2024")
report_id |
Character scalar. One of the names of
|
A character scalar citation. Errors on unknown report_id.
morie_siuiap_cite("final_2024")
morie_siuiap_cite("sprott_doob_torture_solitary_2021")
CRIMSL UToronto Sprott / Doob / Iftene research reports (2020-2021).
MORIE_SIUIAP_CRIMSL_REPORTS
An object of class list of length 4.
Earlier (Doob-chaired) panel, established 2019, dissolved mid-2020.
MORIE_SIUIAP_ORIGINAL_PANEL_2019_2020
An object of class list of length 6.
SIU IAP panel mandate (long-form prose).
MORIE_SIUIAP_PANEL_MANDATE
An object of class character of length 1.
SIU IAP panel members (2021-2024 panel, Sapers-chaired).
MORIE_SIUIAP_PANEL_MEMBERS
An object of class list of length 3.
Human-readable summary of the SIU IAP panel.
morie_siuiap_panel_summary()
A character scalar summarising chair, members, mandate dates, and the Public Safety Canada URL.
cat(morie_siuiap_panel_summary())
SIU IAP panel reports (Public Safety Canada, 2022-2024).
MORIE_SIUIAP_REPORTS
An object of class list of length 5.
SIU IAP – Public Safety Canada landing page URL.
MORIE_SIUIAP_URL
An object of class character of length 1.
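Recovers latent stimulus positions from perceptual placement data by
estimating respondent-specific intercepts $\alpha_i$ and slopes $\beta_i$
in the model $z_{ij} = \alpha_i + \beta_i \zeta_j + \varepsilon_{ij}$, where $z_{ij}$ is respondent $i$'s placement of stimulus $j$ and $\zeta_j$ is the latent position of stimulus $j$.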
Delegates to basicspace::aldmck when the basicspace package is
installed; otherwise a hand-rolled EM/least-squares fallback is used.
morie_spatial_voting_aldrich_mckelvey(Z, n_dims = 1L, max_iter = 100L, tol = 1e-06)
Z |
A respondent-by-stimulus numeric matrix of perceptual
placements. |
n_dims |
Number of latent dimensions (typically 1). |
max_iter |
Maximum EM iterations for the fallback solver. |
tol |
Convergence tolerance on the stimulus configuration. |
A list with components zhat (stimulus positions), alpha,
beta, weights, iterations, converged, and engine
("basicspace" or "fallback").
Aldrich, J. H. and McKelvey, R. D. (1977). "A Method of Scaling with Applications to the 1968 and 1972 Presidential Elections." American Political Science Review, 71(1), 111-130.
Poole, K. T. (1998). "Recovering a Basic Space from a Set of Issue Scales." American Journal of Political Science, 42(3), 954-993.
Armstrong, D. A., Bakker, R., Carroll, R., Hare, C., Poole, K. T., and Rosenthal, H. (2021). Analyzing Spatial Models of Choice and Judgment, 2nd ed. Chapman & Hall/CRC.
set.seed(1)
Z <- matrix(rnorm(20 * 5), 20, 5)
fit <- morie_spatial_voting_aldrich_mckelvey(Z)
fit$zhat
Carroll et al. (2013) mixture model between Gaussian and quadratic utility, sampled via slice sampling (Neal 2003). Stub: the slice sampler has not been ported, so calling this function raises NotYetPorted.
morie_spatial_voting_alpha_nominate(votes, n_dims = 2L, n_samples = 500L, burn_in = 100L, seed = 42L)
votes |
Vote matrix. |
n_dims |
Integer; latent ideal-point dimensionality (default 2). |
n_samples |
MCMC samples. |
burn_in |
Integer; MCMC burn-in iterations to discard before summarising the posterior. |
seed |
RNG seed. |
Never returns; raises NotYetPorted.
Carroll, R., Lewis, J. B., Lo, J., Poole, K. T., and Rosenthal, H. (2013); Neal, R. M. (2003) Annals of Statistics.
## Not run: morie_spatial_voting_alpha_nominate(matrix(0, 5, 5))
Anchoring vignettes for DIF correction
morie_spatial_voting_anchoring_vignettes(Y, V, n_categories = 5L)
Y |
Vector of self-placement ratings. |
V |
Respondent-by-vignette ratings. |
n_categories |
Number of ordered categories. |
List with corrected_scores, thresholds, dif_estimates,
vignette_order, n_respondents, n_vignettes.
King, G., Murray, C. J. L., Salomon, J. A., and Tandon, A. (2003). "Enhancing the Validity and Cross-Cultural Comparability of Measurement in Survey Research." APSR, 97(4), 567-583.
Y <- sample(1:5, 30, replace = TRUE)
V <- matrix(sample(1:5, 30 * 3, replace = TRUE), 30, 3)
morie_spatial_voting_anchoring_vignettes(Y, V)
Bayesian Aldrich-McKelvey scaling (stub)
morie_spatial_voting_bayesian_am(Z, n_samples = 1000L, burn_in = 200L, prior_sd = 10)
Z |
Perceptual placement matrix. |
n_samples |
MCMC samples. |
burn_in |
Burn-in length. |
prior_sd |
Prior SD on stimulus positions. |
Never returns; raises NotYetPorted.
Hare, C., Armstrong, D. A., Bakker, R., Carroll, R., and Poole, K. T. (2015). "Using Bayesian Aldrich-McKelvey Scaling to Study Citizens' Ideological Preferences and Perceptions." AJPS, 59(3).
## Not run: morie_spatial_voting_bayesian_am(matrix(rnorm(50), 10, 5))
Bayesian IRT likelihood (deterministic part of CJR machinery)
morie_spatial_voting_bayesian_irt_likelihood(votes, x, alpha, beta)
votes |
Binary matrix of vote data (rows = legislators, columns = roll-call votes). |
x |
Ideal points. |
alpha |
Difficulty. |
beta |
Discrimination. |
List with loglik, vote_probs, n_correct, n_total,
accuracy.
Clinton, Jackman & Rivers (2004).
v <- matrix(stats::rbinom(20, 1, 0.5), 4, 5)
morie_spatial_voting_bayesian_irt_likelihood(
  v, matrix(rnorm(4), 4, 1), rep(0, 5), matrix(rnorm(5), 5, 1))
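The deterministic likelihood step can be sketched as follows, assuming the probit parameterisation of Clinton, Jackman & Rivers (2004) in which the probability of a yea is `pnorm(x %*% t(beta) - alpha)`. The sign convention, the clamping constant, and the helper name `irt_loglik_sketch` are assumptions for illustration; the package's internals may differ.

```r
# Hypothetical helper (not part of morie): probit IRT log-likelihood.
# ASSUMPTION: P(yea_ij) = pnorm(sum_k x_ik * beta_jk - alpha_j).
irt_loglik_sketch <- function(votes, x, alpha, beta) {
  eta <- x %*% t(beta)                    # n_leg x n_votes linear predictor
  eta <- sweep(eta, 2, alpha, "-")        # subtract the per-vote difficulty
  p <- stats::pnorm(eta)
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)    # keep log() finite
  obs <- !is.na(votes)                    # skip missing votes
  loglik <- sum(votes[obs] * log(p[obs]) + (1 - votes[obs]) * log(1 - p[obs]))
  predicted <- (p > 0.5) * 1
  n_correct <- sum(predicted[obs] == votes[obs])
  list(loglik = loglik, vote_probs = p,
       n_correct = n_correct, n_total = sum(obs),
       accuracy = n_correct / sum(obs))
}

set.seed(1)
v <- matrix(stats::rbinom(20, 1, 0.5), 4, 5)
out <- irt_loglik_sketch(v, matrix(rnorm(4), 4, 1), rep(0, 5), matrix(rnorm(5), 5, 1))
out$loglik
```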
Posterior summaries for a Bayesian IRT chain
morie_spatial_voting_bayesian_irt_posterior(chain, standardize = TRUE)
chain |
Array of shape (n_samples, n_leg, n_dims). |
standardize |
Whether to per-sample standardise. |
List with posterior_mean, posterior_sd, ci_lower,
ci_upper, n_samples, standardized.
Jackman (2009).
ch <- array(rnorm(100 * 5 * 2), c(100, 5, 2))
morie_spatial_voting_bayesian_irt_posterior(ch)
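For readers who want to see how summaries of this shape can be derived from a chain, here is a minimal base-R sketch. It assumes equal-tailed 95% intervals and skips the per-sample standardisation step entirely; both choices, and the helper name `posterior_summary_sketch`, are illustrative assumptions rather than a description of the package's code.

```r
# Hypothetical helper (not part of morie): summarise an
# (n_samples, n_leg, n_dims) array by collapsing the sample dimension.
# ASSUMPTION: equal-tailed 95% intervals; no standardisation is applied.
posterior_summary_sketch <- function(chain, probs = c(0.025, 0.975)) {
  list(
    posterior_mean = apply(chain, c(2, 3), mean),
    posterior_sd   = apply(chain, c(2, 3), stats::sd),
    ci_lower       = apply(chain, c(2, 3), stats::quantile, probs = probs[1]),
    ci_upper       = apply(chain, c(2, 3), stats::quantile, probs = probs[2]),
    n_samples      = dim(chain)[1]
  )
}

set.seed(1)
ch <- array(rnorm(100 * 5 * 2), c(100, 5, 2))
s <- posterior_summary_sketch(ch)
dim(s$posterior_mean)
```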
Bayesian MDS (stub) – log-normal distances via Metropolis
morie_spatial_voting_bayesian_mds(D, n_dims = 2L, n_samples = 1000L, burn_in = 200L, sigma_init = 1)
D |
Distance matrix. |
n_dims |
Integer; latent dimensionality. |
n_samples |
Integer; posterior-sample count. |
burn_in |
Burn-in length. |
sigma_init |
Numeric; initial value for the latent-coordinate scale (default 1). |
Never returns; raises NotYetPorted.
Oh & Raftery (2001) JASA 96(455).
## Not run: morie_spatial_voting_bayesian_mds(matrix(0, 5, 5))
Bayesian unfolding (stub) – Bakker & Poole sampler
morie_spatial_voting_bayesian_unfolding(D, n_dims = 2L, n_samples = 1000L, burn_in = 200L)
D |
Respondent-stimulus dissimilarity matrix. |
n_dims |
Latent dimensions. |
n_samples |
Integer; posterior-sample count. |
burn_in |
Burn-in length. |
Never returns; raises NotYetPorted.
Bakker, R. and Poole, K. T. (2013).
## Not run: morie_spatial_voting_bayesian_unfolding(matrix(0, 3, 4))
Recovers respondent ideal points from an issue-scale response matrix
via SVD on the column-centred matrix. Implements Poole's (1998)
decomposition $X = \Psi W^\top + J_n c^\top + E$ (ideal points $\Psi$, stimulus weights $W$, constants $c$, error $E$). Delegates to
basicspace::blackbox when available.
morie_spatial_voting_blackbox(X, n_dims = 2L)
X |
A respondent-by-issue numeric matrix of responses
( |
n_dims |
Number of dimensions to extract. |
A list with ideal_points, stimuli_weights, eigenvalues,
singular_values, explained_variance, col_means, n_dims, and
engine.
Poole, K. T. (1998); Armstrong et al. (2021).
set.seed(1)
X <- matrix(rnorm(30 * 6), 30, 6)
morie_spatial_voting_blackbox(X, n_dims = 2)
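The SVD step described for this function can be sketched in base R for a complete response matrix. This is an illustration under stated assumptions, not the package's or basicspace's algorithm: it ignores missing responses, it scales ideal points as `U %*% diag(d)`, and the helper name `svd_scaling_sketch` is hypothetical.

```r
# Hypothetical helper (not part of morie): SVD of the column-centred matrix.
# ASSUMPTIONS: no missing data; ideal points are taken as U * D.
svd_scaling_sketch <- function(X, n_dims = 2L) {
  col_means <- colMeans(X)
  Xc <- sweep(X, 2, col_means, "-")       # centre each issue scale
  s <- svd(Xc)
  k <- seq_len(n_dims)
  list(
    ideal_points       = s$u[, k, drop = FALSE] %*% diag(s$d[k], n_dims),
    stimuli_weights    = s$v[, k, drop = FALSE],
    singular_values    = s$d,
    explained_variance = s$d^2 / sum(s$d^2),
    col_means          = col_means
  )
}

set.seed(1)
X <- matrix(rnorm(30 * 6), 30, 6)
fit <- svd_scaling_sketch(X, n_dims = 2L)
round(fit$explained_variance, 3)
```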
Clinton-Jackman-Rivers Bayesian IRT (stub)
morie_spatial_voting_cjr_irt(votes, n_dims = 1L, n_samples = 1000L, burn_in = 200L)
votes |
Binary roll-call matrix. |
n_dims |
Ideal-point dimensions. |
n_samples |
MCMC samples. |
burn_in |
Integer; MCMC burn-in iterations. |
Never returns; raises NotYetPorted.
Clinton, Jackman & Rivers (2004).
## Not run: morie_spatial_voting_cjr_irt(matrix(0, 5, 5))
Torgerson scaling via eigendecomposition of the double-centred matrix.
morie_spatial_voting_classical_mds(D, n_dims = 2L)
D |
Symmetric numeric distance matrix. |
n_dims |
Number of dimensions to extract. |
A list with coordinates, eigenvalues, stress, fit,
B_matrix.
Torgerson, W. S. (1952); Armstrong et al. (2021).
D <- as.matrix(dist(matrix(rnorm(40), 10)))
morie_spatial_voting_classical_mds(D, n_dims = 2)
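Torgerson scaling as described here can be sketched in base R: double-centre the squared distances, take the leading eigenpairs, and scale the eigenvectors by the square roots of the eigenvalues. The helper name `classical_mds_sketch` is hypothetical, and clipping negative eigenvalues at zero is a choice made for this sketch, not a documented behaviour of the package.

```r
# Hypothetical helper (not part of morie): classical (Torgerson) MDS.
classical_mds_sketch <- function(D, n_dims = 2L) {
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)      # centring matrix
  B <- -0.5 * J %*% (D^2) %*% J           # double-centred squared distances
  e <- eigen(B, symmetric = TRUE)
  k <- seq_len(n_dims)
  lambda <- pmax(e$values[k], 0)          # ASSUMPTION: clip negatives at zero
  coords <- e$vectors[, k, drop = FALSE] %*% diag(sqrt(lambda), n_dims)
  list(coordinates = coords, eigenvalues = e$values, B_matrix = B)
}

set.seed(1)
P <- matrix(rnorm(20), 10, 2)             # a true 2-D configuration
D <- as.matrix(dist(P))
fit <- classical_mds_sketch(D, n_dims = 2L)
# For Euclidean input the recovered distances match up to rounding.
max(abs(as.matrix(dist(fit$coordinates)) - D))
```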
Cutting-line endpoints for Coombs-mesh plots
morie_spatial_voting_cutting_lines(normals, cutpoints, xlim = c(-1, 1))
normals |
(n_votes x n_dims) normal vectors. |
cutpoints |
Numeric cutpoint offsets. |
xlim |
Length-2 numeric vector of x-axis limits. |
List with endpoints (list of pairs), angles, midpoints,
n_lines.
Poole (2005).
morie_spatial_voting_cutting_lines(matrix(rnorm(6), 3, 2), c(0.1, -0.2, 0))
Computes $B = -\tfrac{1}{2} J D^{(2)} J$ with
$J = I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^\top$, where $D^{(2)}$ is the matrix of squared distances.
morie_spatial_voting_double_centering(D)
D |
Symmetric numeric distance matrix. |
The double-centered matrix $B$.
Torgerson (1952); Armstrong et al. (2021), Section 3.
morie_spatial_voting_double_centering(as.matrix(dist(matrix(rnorm(20), 5))))
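A short base-R check may help show why double centering is useful: for Euclidean distances, the double-centred matrix of squared distances equals the Gram matrix of the column-centred coordinates. The helper name `double_center_sketch` is hypothetical, and the sketch assumes the standard Torgerson form (squared distances, factor of minus one half).

```r
# Hypothetical helper (not part of morie): Torgerson double centering.
# ASSUMPTION: distances are squared elementwise before centring.
double_center_sketch <- function(D) {
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)      # centring matrix
  -0.5 * J %*% (D^2) %*% J
}

set.seed(1)
P <- matrix(rnorm(20), 5, 4)
B <- double_center_sketch(as.matrix(dist(P)))
Pc <- scale(P, center = TRUE, scale = FALSE)
# B should reproduce the inner products of the centred points.
max(abs(B - Pc %*% t(Pc)))
```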
Gaussian-error NOMINATE variant supporting comparable scores across legislative sessions.
morie_spatial_voting_dw_nominate(votes, n_dims = 2L, max_iter = 100L, tol = 1e-06)
votes |
Legislator-by-vote binary matrix. |
n_dims |
Latent dimensions. |
max_iter |
Maximum iterations. |
tol |
Tolerance (unused; kept for API parity). |
List with ideal_points, dim_weights, normal_vectors,
cutpoints, log_lik, gmp, n_dims.
Poole, K. T. and Rosenthal, H. (1997). Congress: A Political-Economic History of Roll Call Voting. Oxford University Press.
set.seed(1)
v <- matrix(stats::rbinom(20 * 30, 1, 0.5), 20, 30)
morie_spatial_voting_dw_nominate(v, max_iter = 20)
Time-series IRT where ideal points evolve via a random walk:
$x_{i,t} = x_{i,t-1} + \eta_{i,t}$, with $\eta_{i,t} \sim N(0, \tau^2)$.
morie_spatial_voting_dynamic_irt(votes, time_periods, n_samples = 500L, burn_in = 100L, seed = 42L)
votes |
Vote matrix. |
time_periods |
Integer vector of period indices (one per roll call) for the dynamic-IRT random-walk prior on ideal points. |
n_samples |
MCMC samples. |
burn_in |
Integer; MCMC burn-in iterations. |
seed |
RNG seed. |
Never returns; raises NotYetPorted.
Martin, A. D. and Quinn, K. M. (2002). "Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953-1999." Political Analysis, 10(2).
## Not run: morie_spatial_voting_dynamic_irt(matrix(0, 4, 4), 1:4)
Imai, Lo & Olmsted (2016) closed-form EM updates for binary IRT, suitable for very large vote matrices where MCMC is infeasible.
morie_spatial_voting_em_irt(votes, n_dims = 1L, max_iter = 100L, tol = 1e-06)
votes |
Legislator-by-vote binary matrix. |
n_dims |
Latent dimensions. |
max_iter |
Maximum EM iterations. |
tol |
Convergence tolerance on ideal-point change. |
List with ideal_points, discrimination, difficulty,
log_lik, iterations.
Imai, K., Lo, J., and Olmsted, J. (2016). "Fast Estimation of Ideal Points with Massive Data." APSR, 110(4), 631-656.
set.seed(1)
v <- matrix(stats::rbinom(20 * 30, 1, 0.5), 20, 30)
morie_spatial_voting_em_irt(v, max_iter = 20L)
Ideal-point recovery from unfolding output
morie_spatial_voting_ideal_point_recovery(X_r, X_s = NULL)
X_r |
Respondent coordinates. |
X_s |
Stimulus coordinates (unused; the respondent row IS the ideal point). |
Numeric matrix of respondent ideal points.
Armstrong et al. (2021), Section 4.5.
morie_spatial_voting_ideal_point_recovery(matrix(rnorm(6), 3, 2))
Carroll & Chang (1970) weighted MDS with a shared stimulus space and per-individual dimension weights.
morie_spatial_voting_indscal(dissimilarities, n_dims = 2L, max_iter = 300L, tol = 1e-06)
dissimilarities |
List of (n_stim x n_stim) dissimilarity matrices. |
n_dims |
Number of dimensions. |
max_iter |
Maximum ALS iterations. |
tol |
Convergence tolerance on configuration change. |
List with group_config, weights, stress, iterations,
n_individuals, n_stimuli.
Carroll, J. D. and Chang, J.-J. (1970). "Analysis of Individual Differences in Multidimensional Scaling via an N-way Generalization of Eckart-Young Decomposition." Psychometrika, 35(3).
D1 <- as.matrix(dist(matrix(rnorm(20), 5)))
D2 <- as.matrix(dist(matrix(rnorm(20), 5)))
morie_spatial_voting_indscal(list(D1, D2), n_dims = 2L, max_iter = 30L)
MDS fit statistics (Mardia criterion)
morie_spatial_voting_mds_fit_stats(eigenvalues)
eigenvalues |
Numeric vector of MDS eigenvalues. |
List with fit_by_dim, cumulative_fit, eigenvalues.
Mardia, K. V. (1978). "Some Properties of Classical Multi-Dimensional Scaling." Communications in Statistics.
morie_spatial_voting_mds_fit_stats(c(4, 2, 1, 0.5))
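One common reading of the Mardia criterion is the share of the summed absolute eigenvalues captured by the first k dimensions. The sketch below assumes that absolute-value form (Mardia also gives a squared-eigenvalue variant, and this manual does not say which one the package uses); the helper name `mds_fit_sketch` is hypothetical.

```r
# Hypothetical helper (not part of morie): per-dimension and cumulative fit.
# ASSUMPTION: fit is measured on absolute eigenvalues.
mds_fit_sketch <- function(eigenvalues) {
  a <- abs(eigenvalues)
  list(
    fit_by_dim     = a / sum(a),
    cumulative_fit = cumsum(a) / sum(a),
    eigenvalues    = eigenvalues
  )
}

fs <- mds_fit_sketch(c(4, 2, 1, 0.5))
fs$cumulative_fit
```

With these eigenvalues the first two dimensions account for 6 of the 7.5 units of total absolute eigenvalue mass, so a two-dimensional solution would retain 80% under this definition.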
Multidimensional Least-Squares Metric Unfolding (Poole 1984; Bakker & Poole 2013). Alternates between respondent and stimulus coordinates, restarting from random seeds and keeping the lowest-stress fit.
morie_spatial_voting_mlsmu6(D, n_dims = 2L, max_iter = 200L, tol = 1e-06, n_restarts = 5L)
D |
Respondent-by-stimulus distance/rating matrix. |
n_dims |
Number of latent dimensions. |
max_iter |
Maximum alternations per restart. |
tol |
Convergence tolerance on relative stress change. |
n_restarts |
Number of random restarts. |
A list with respondent_coords, stimulus_coords, stress,
iterations, converged.
Poole, K. T. (1984). "Least Squares Metric, Unidimensional Unfolding." Psychometrika, 49(3). Bakker, R. and Poole, K. T. (2013).
D <- matrix(stats::runif(20 * 6), 20, 6)
morie_spatial_voting_mlsmu6(D, n_dims = 2, n_restarts = 1, max_iter = 50)
Lewis & Poole (2004) parametric bootstrap: simulate roll-call matrices from fitted probabilities, re-estimate per bootstrap replicate, compute SE from the bootstrap distribution.
morie_spatial_voting_nominate_bootstrap(votes, ideal_points, normal_vectors_arr, cutpoints, n_boot = 100L, seed = 42L)
votes |
Original vote matrix. |
ideal_points |
Fitted ideal points. |
normal_vectors_arr |
Fitted normal vectors. |
cutpoints |
Fitted cutpoints. |
n_boot |
Number of bootstrap replications. |
seed |
RNG seed. |
List with se_ideal_points, boot_means, n_boot.
Lewis, J. B. and Poole, K. T. (2004). "Measuring Bias and Uncertainty in Ideal Point Estimates via the Parametric Bootstrap." Political Analysis, 12(2).
set.seed(1)
v <- matrix(stats::rbinom(40, 1, 0.5), 5, 8)
fit <- morie_spatial_voting_dw_nominate(v, max_iter = 5L)
morie_spatial_voting_nominate_bootstrap(
  v, fit$ideal_points, fit$normal_vectors, fit$cutpoints, n_boot = 5L)
NOMINATE log-likelihood and GMP
morie_spatial_voting_nominate_loglik(votes, x, z_yea, z_nay, beta = 15, w = NULL)
votes |
Legislator-by-vote binary matrix. |
x |
Ideal points. |
z_yea |
Yea outcomes. |
z_nay |
Nay outcomes. |
beta |
Signal-to-noise. |
w |
Dimension weights. |
List with loglik, GMP, n_correct, n_total.
Poole & Rosenthal (1997).
v <- matrix(stats::rbinom(20, 1, 0.5), 4, 5)
x <- matrix(rnorm(4), 4, 1)
zy <- matrix(rnorm(5), 5, 1)
zn <- matrix(rnorm(5), 5, 1)
morie_spatial_voting_nominate_loglik(v, x, zy, zn)
Computes Poole-Rosenthal NOMINATE utilities and vote probabilities.
morie_spatial_voting_nominate_utility(x, z_yea, z_nay, beta = 15, w = NULL)
x |
Legislator ideal points (n_leg x n_dims). |
z_yea |
Yea outcome locations (n_votes x n_dims). |
z_nay |
Nay outcome locations (n_votes x n_dims). |
beta |
Signal-to-noise ratio. |
w |
Dimension weights (length n_dims; defaults to 1). |
List with U_yea, U_nay, utility_diff, vote_probs.
Poole, K. T. and Rosenthal, H. (1985); Armstrong et al. (2021), Ch. 5.
x <- matrix(rnorm(8), 4, 2)
zy <- matrix(rnorm(6), 3, 2)
zn <- matrix(rnorm(6), 3, 2)
morie_spatial_voting_nominate_utility(x, zy, zn)
Single NOMINATE vote probability
morie_spatial_voting_nominate_vote_prob(x_i, z_yea_j, z_nay_j, beta = 15, w = NULL)
x_i |
Legislator ideal point (vector). |
z_yea_j |
Yea outcome (vector). |
z_nay_j |
Nay outcome (vector). |
beta |
Signal-to-noise ratio. |
w |
Dimension weights. |
Numeric scalar in (0,1).
Poole & Rosenthal (1985).
morie_spatial_voting_nominate_vote_prob(c(0.1), c(0.3), c(-0.3))
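To make the roles of `beta` and `w` concrete, here is a minimal sketch of a NOMINATE-style vote probability. It assumes Gaussian (bell-shaped) deterministic utility with squared dimension weights and a normal link on the utility difference; the package may use a different link or weighting, and the helper name `nominate_prob_sketch` is hypothetical.

```r
# Hypothetical helper (not part of morie): single-vote yea probability.
# ASSUMPTIONS: utility = beta * exp(-0.5 * sum(w^2 * d^2)); normal link.
nominate_prob_sketch <- function(x_i, z_yea_j, z_nay_j, beta = 15, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x_i))   # unit weights by default
  u_yea <- beta * exp(-0.5 * sum(w^2 * (x_i - z_yea_j)^2))
  u_nay <- beta * exp(-0.5 * sum(w^2 * (x_i - z_nay_j)^2))
  stats::pnorm(u_yea - u_nay)
}

# The legislator at 0.1 sits closer to the yea outcome (0.3) than to the
# nay outcome (-0.3), so the yea probability exceeds one half.
p <- nominate_prob_sketch(c(0.1), c(0.3), c(-0.3))
p
```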
Kruskal-style nonmetric MDS using pool-adjacent-violators monotone regression on dissimilarity ranks.
morie_spatial_voting_nonmetric_mds(D, n_dims = 2L, max_iter = 300L, tol = 1e-06)
D |
Symmetric dissimilarity matrix. |
n_dims |
Number of dimensions. |
max_iter |
Maximum iterations. |
tol |
Convergence tolerance. |
A list with coordinates, stress, iterations, converged.
Kruskal, J. B. (1964). "Nonmetric Multidimensional Scaling: A Numerical Method." Psychometrika, 29(2), 115-129.
D <- as.matrix(dist(matrix(rnorm(40), 10)))
morie_spatial_voting_nonmetric_mds(D)
Efron & Tibshirani (1993) resampling of respondents for Aldrich-McKelvey and Basic Space scaling SEs.
morie_spatial_voting_nonparametric_bootstrap(Z, scale_fn = "am", n_boot = 200L, seed = 42L)
Z |
Perception matrix. |
scale_fn |
One of |
n_boot |
Number of bootstrap replications. |
seed |
RNG seed. |
List with se_positions, boot_mean, ci_lower,
ci_upper, n_boot.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall.
set.seed(1)
Z <- matrix(rnorm(20 * 5), 20, 5)
morie_spatial_voting_nonparametric_bootstrap(Z, n_boot = 10L)
Normal-vector projection of an external measure
morie_spatial_voting_normal_vectors(ideal_points, external_measure)
ideal_points |
Ideal-point coordinates. |
external_measure |
Vector to project. |
List with normal_vector, angle_degrees, angle_radians,
r_squared, coefficients.
Armstrong et al. (2021), Section 2.6.
morie_spatial_voting_normal_vectors(matrix(rnorm(20), 10, 2), rnorm(10))
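One way to obtain a normal vector for an external measure is to regress it on the ideal-point coordinates and normalise the slope vector to unit length. The sketch below assumes that regression-based approach and a two-dimensional space for the angle; the helper name `normal_vector_sketch` is hypothetical and the package may compute these quantities differently.

```r
# Hypothetical helper (not part of morie): regression-based normal vector.
# ASSUMPTION: OLS of the external measure on the coordinates.
normal_vector_sketch <- function(ideal_points, external_measure) {
  fit <- stats::lm(external_measure ~ ideal_points)
  b <- unname(stats::coef(fit)[-1])        # slopes only, intercept dropped
  nv <- b / sqrt(sum(b^2))                 # unit-length direction
  angle <- atan2(nv[2], nv[1])             # meaningful for 2 dimensions
  list(normal_vector = nv,
       angle_radians = angle,
       angle_degrees = angle * 180 / pi,
       r_squared = summary(fit)$r.squared,
       coefficients = stats::coef(fit))
}

set.seed(1)
X <- matrix(rnorm(40), 20, 2)
y <- 2 * X[, 1] + 1 * X[, 2] + rnorm(20, sd = 0.1)
nvfit <- normal_vector_sketch(X, y)
nvfit$normal_vector
```

Because `y` is built mostly from twice the first coordinate plus the second, the recovered direction should lie close to `c(2, 1) / sqrt(5)`.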
Nonparametric ideal-point estimation from binary roll-call votes that minimises the number of classification errors (Poole 2000).
morie_spatial_voting_optimal_classification(votes, n_dims = 1L, max_iter = 500L, n_restarts = 10L, seed = 42L)
votes |
Legislator-by-vote matrix; |
n_dims |
Number of latent dimensions. |
max_iter |
Maximum iterations per restart. |
n_restarts |
Random restarts (best PRE retained). |
seed |
RNG seed. |
A list with ideal_points, cutting_normals, PRE, APRE,
total_errors, null_errors, n_dims.
Poole, K. T. (2000). "Non-Parametric Unfolding of Binary Choice Data." Political Analysis, 8(3), 211-237.
set.seed(1)
v <- matrix(stats::rbinom(20 * 30, 1, 0.5), 20, 30)
morie_spatial_voting_optimal_classification(v)
Ordered Optimal Classification for ordinal scales
morie_spatial_voting_ordered_oc(Y, n_dims = 2L, max_iter = 500L, tol = 1e-06)
Y |
Respondent-by-item ordinal response matrix. |
n_dims |
Latent dimensions. |
max_iter |
Maximum iterations. |
tol |
Tolerance (unused; kept for API parity). |
List with ideal_points, cutpoints, normals,
correct_class, iterations.
Hare, C., Liu, T.-P., and Lupton, R. N. (2018). "What Ordered Optimal Classification reveals about ideological structure, cleavages, and polarization in the American mass public." Public Choice, 176(1), 57-78.
Y <- matrix(sample(1:4, 60, replace = TRUE), 15, 4)
morie_spatial_voting_ordered_oc(Y, n_dims = 1L, max_iter = 20L)
Ordinal IRT / Quinn factor model (stub)
morie_spatial_voting_ordinal_irt(Y, n_dims = 1L, n_samples = 500L, burn_in = 100L, seed = 42L)
Y |
Ordinal response matrix. |
n_dims |
Latent dimensions. |
n_samples |
MCMC samples. |
burn_in |
Integer; MCMC burn-in iterations. |
seed |
RNG seed. |
Never returns; raises NotYetPorted.
Quinn, K. M. (2004). "Bayesian Factor Analysis for Mixed Ordinal and Continuous Responses." Political Analysis, 12(4).
## Not run: morie_spatial_voting_ordinal_irt(matrix(1L, 5, 3))
Orthogonal rotation aligning X to X_target (with reflection
protection against improper rotations).
morie_spatial_voting_procrustes(X, X_target)
X |
Configuration to rotate. |
X_target |
Target configuration. |
List with rotated, rotation_matrix, scale, mse.
Gower & Dijksterhuis (2004).
A <- matrix(rnorm(20), 10, 2)
B <- A + 0.05 * matrix(rnorm(20), 10, 2)
morie_spatial_voting_procrustes(A, B)
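The orthogonal Procrustes rotation has a closed-form SVD solution, which the sketch below illustrates together with one possible reflection guard (flipping the last left singular vector when the determinant is negative). The guard, the omission of centring and scaling, and the helper name `procrustes_sketch` are assumptions for illustration; the package returns a `scale` component that this sketch does not estimate.

```r
# Hypothetical helper (not part of morie): orthogonal Procrustes via SVD.
# ASSUMPTIONS: no centring or scaling; reflections are ruled out by
# flipping the singular vector tied to the smallest singular value.
procrustes_sketch <- function(X, X_target) {
  s <- svd(t(X) %*% X_target)
  R <- s$u %*% t(s$v)                      # minimiser of ||X R - X_target||
  if (det(R) < 0) {
    s$u[, ncol(s$u)] <- -s$u[, ncol(s$u)]  # force det(R) = +1
    R <- s$u %*% t(s$v)
  }
  rotated <- X %*% R
  list(rotated = rotated, rotation_matrix = R,
       mse = mean((rotated - X_target)^2))
}

set.seed(1)
A <- matrix(rnorm(20), 10, 2)
theta <- pi / 6
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
pfit <- procrustes_sketch(A, A %*% Q)     # target is an exact rotation of A
max(abs(pfit$rotation_matrix - Q))
```

When the target is an exact rotation of the input, as here, the recovered matrix matches the known rotation up to rounding and the mean squared error is effectively zero.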
Iterative majorisation algorithm for metric MDS.
morie_spatial_voting_smacof(D, n_dims = 2L, max_iter = 300L, tol = 1e-06, weights = NULL, init = NULL)
D |
Symmetric dissimilarity matrix. |
n_dims |
Number of dimensions. |
max_iter |
Maximum iterations. |
tol |
Convergence tolerance on stress change. |
weights |
Optional weight matrix (defaults to uniform). |
init |
Optional initial configuration (n x n_dims). |
A list with coordinates, stress, iterations, converged.
De Leeuw, J. (1977). "Applications of Convex Analysis to Multidimensional Scaling." In Recent Developments in Statistics, 133-145. Borg & Groenen (2005).
D <- as.matrix(dist(matrix(rnorm(40), 10)))
morie_spatial_voting_smacof(D)
Majorisation-based unfolding embedding respondents and stimuli into a common space.
morie_spatial_voting_smacof_unfolding(D, n_dims = 2L, max_iter = 300L, tol = 1e-06)
D |
Respondent-by-stimulus dissimilarity matrix. |
n_dims |
Latent dimensions. |
max_iter |
Maximum iterations. |
tol |
Convergence tolerance. |
A list with respondent_coords, stimulus_coords, stress,
iterations, converged.
Borg & Groenen (2005); Armstrong et al. (2021), Ch. 4.
D <- matrix(stats::runif(12), 3, 4)
morie_spatial_voting_smacof_unfolding(D, max_iter = 20)
Compute unfolding stress
morie_spatial_voting_unfolding_stress(X_r, X_s, D, weights = NULL)
X_r |
Respondent coordinates (n_r x n_dims). |
X_s |
Stimulus coordinates (n_s x n_dims). |
D |
Observed respondent-stimulus dissimilarities. |
weights |
Optional weight matrix. |
A numeric scalar, the weighted sum of squared residuals.
Coombs (1964); Armstrong et al. (2021).
Xr <- matrix(rnorm(6), 3, 2); Xs <- matrix(rnorm(8), 4, 2)
D <- matrix(stats::runif(12), 3, 4)
morie_spatial_voting_unfolding_stress(Xr, Xs, D)
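The quantity returned here, a weighted sum of squared residuals, can be written out directly. The sketch below assumes Euclidean distances and no normalisation of the stress; `unfolding_stress_sketch` is a hypothetical name and may differ from the package function in detail.

```r
# Raw unfolding stress sketch (hypothetical helper, base R only).
unfolding_stress_sketch <- function(X_r, X_s, D, weights = NULL) {
  if (is.null(weights)) weights <- matrix(1, nrow(D), ncol(D))
  # Squared Euclidean distance between every respondent and every stimulus.
  sq <- outer(rowSums(X_r^2), rowSums(X_s^2), "+") - 2 * tcrossprod(X_r, X_s)
  d_hat <- sqrt(pmax(sq, 0))     # guard against tiny negative rounding error
  sum(weights * (D - d_hat)^2)
}

X_r <- matrix(c(0, 0), 1, 2)               # one respondent at the origin
X_s <- rbind(c(3, 4), c(0, 1))             # two stimuli at distances 5 and 1
unfolding_stress_sketch(X_r, X_s, matrix(c(5, 1), 1, 2))  # 0: distances match exactly
```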
Slapin & Proksch (2008) one-dimensional Poisson IRT for estimating document positions from word-count data.
morie_spatial_voting_wordfish(dtm, max_iter = 100L, tol = 1e-06)
dtm |
Document-by-term integer count matrix. |
max_iter |
Maximum EM iterations. |
tol |
Convergence tolerance. |
List with positions, word_weights, word_fixed,
doc_fixed, log_lik, iterations.
Slapin, J. B. and Proksch, S.-O. (2008). "A Scaling Model for Estimating Time-Series Party Positions from Texts." AJPS, 52(3).
set.seed(1)
dtm <- matrix(stats::rpois(20 * 30, 5), 20, 30)
morie_spatial_voting_wordfish(dtm, max_iter = 20L)
Spearman rank correlation
morie_spearman_rho(x, y)
x |
Numeric vector. |
y |
Numeric vector. |
Named list: rho, p_value.
morie_spearman_rho(x = rnorm(50), y = rnorm(50))
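As a point of reference, Spearman's rho is the Pearson correlation of the ranks. The sketch below uses a t-approximation for the p-value, which is an assumption; the package may use a different null distribution. `spearman_sketch` is a hypothetical name.

```r
# Spearman rank correlation sketch (hypothetical helper, base R only).
spearman_sketch <- function(x, y) {
  rho <- stats::cor(rank(x), rank(y))          # Pearson correlation of mid-ranks
  n <- length(x)
  t_stat <- rho * sqrt((n - 2) / (1 - rho^2))  # t-approximation (assumed)
  p_value <- 2 * stats::pt(-abs(t_stat), df = n - 2)
  list(rho = rho, p_value = p_value)
}

set.seed(1)
x <- rnorm(50)
spearman_sketch(x, x + rnorm(50))
```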
Convenience helper: given a data.frame just-loaded by
morie_arsau_load_*(), returns the
list(name, dtype, valid_values) structure expected by the
audit functions.
morie_specs_from_df(df)
df |
Loaded data.frame. |
List of column specs.
Thin extender over kernlab::specc that performs Ng / Jordan
/ Weiss spectral clustering on a feature matrix.
morie_spectral_cluster(x, centers, ...)
x |
Numeric matrix or data frame of features (rows = observations). |
centers |
Integer; the number of clusters to extract. |
... |
Further arguments forwarded to |
A list with $method = "kernlab::specc" and
$raw (a specc S4 object containing the cluster
assignments, centres and within-cluster sums of squares).
## Not run:
if (requireNamespace("kernlab", quietly = TRUE)) {
  set.seed(1)
  x <- rbind(
    matrix(stats::rnorm(80, mean = -2), ncol = 2),
    matrix(stats::rnorm(80, mean = 2), ncol = 2)
  )
  morie_spectral_cluster(x, centers = 2)
}
## End(Not run)
Welch power spectral density
morie_spectral_density(x, fs = 1, nperseg = NULL)
x |
Numeric univariate series. |
fs |
Sampling frequency. Default 1. |
nperseg |
Segment length. Default max(n/4, 8). |
Named list with frequencies, psd, n_segments, nperseg,
fs, n, method.
morie_spectral_density(x = rnorm(50))
Replicates the analytical contribution of the four CRIMSL UToronto research reports authored by Prof. Jane B. Sprott (TMU, formerly Ryerson) and Prof. Anthony N. Doob (University of Toronto), with Prof. Adelina Iftene (Dalhousie) co-authoring the May 2021 paper on Independent External Decision Makers (IEDMs).
Understanding the Operation of CSC's Structured Intervention Units – first systematic outside analysis of CSC SIU data.
COVID attribution – tests CSC's COVID-attribution defense.
Solitary Confinement, Torture, and Canada's SIUs – introduces the Mandela-Rules classifier; the most data-intensive of the four.
Independent External Decision Makers – evaluates the IEDM review mechanism.
Headline tables (Feb 2021): Tables 13, 19, 23 reproduce SIU person-stay rates per 1000 prisoners, the Mandela-Rules classification of N=1960 SIU stays (solitary 28.4%, torture 9.9%, all-other 61.7%), and the regional torture/solitary rates.
Headline tables (May 2021): Tables 1, 3, 5, 7, 8, 9, 10, 14, 15 reproduce IEDM-reviewed population characteristics and review outcomes (N=265 stays, 380 reviews).
Sprott, J. B., & Doob, A. N. (2021, February). Solitary Confinement, Torture, and Canada's Structured Intervention Units. Centre for Criminology & Sociolegal Studies, U. of Toronto.
R port of morie.stat_bridge. Exposes the same three modes
available on the Python side — registry enumeration, a formatted
help dump, and command execution — so an external runner (e.g.
the Go TIDE TUI, a shell pipeline) can drive morie's R surface via
Rscript -e 'rmorie::stat_bridge_main(...)'.
Two layers are provided:
Programmatic helpers (stat_bridge_registry_json,
stat_bridge_help, stat_bridge_exec) callable from
ordinary R code.
A dispatcher (stat_bridge_main) that mimics the
command-line entry point of the Python module so the same
invocation pattern works from either runtime.
R port of morie.stat_commands. Maintains a flat registry of
R-callable statistical command entries plus aliases, allowing
downstream tooling (the Go TIDE bridge, headless workers, REPL
frontends) to enumerate, resolve, and dispatch operations from a
single namespace.
Entries are stored in the package-level environment
.morie_stat_commands so that the registry is shared across
sessions within a single R process and can be appended by extension
packages.
stat_command: constructor for a single command.
register_stat_command: add an entry to the
registry, validating uniqueness of name and aliases.
resolve_stat_command: look up by canonical
name or alias.
all_stat_command_names: sorted vector of all
names + aliases.
commands_by_category: list of entries grouped
by category.
run_stat_command: invoke a command's REPL
handler with the supplied arguments.
Local-level state-space model (Kalman filter+smoother)
morie_state_space_model(x)
x |
Numeric univariate series. |
Named list with filtered_state, filtered_state_variance,
smoothed_state, loglik, Q, R, n, method.
morie_state_space_model(x = rnorm(50))
Proportional or fixed stratified random sample
morie_stratified_sample(df, strata_col, n_per_stratum, proportional = FALSE, seed = 42L)
df |
A data frame. |
strata_col |
Name of the stratification column. |
n_per_stratum |
Either an integer (equal allocation) or a named integer
vector mapping stratum levels to sample sizes. If |
proportional |
Logical; if |
seed |
Random seed. |
Data frame of sampled rows with a .weight column.
df <- data.frame(g = c(rep("A", 60), rep("B", 40)), x = rnorm(100))
morie_stratified_sample(df, "g", n_per_stratum = 10)
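A minimal base-R sketch of fixed (equal-allocation) stratified sampling follows. It assumes that `.weight` is the usual expansion weight N_h / n_h, which the documentation does not state explicitly, and `stratified_sample_sketch` is a hypothetical name.

```r
# Equal-allocation stratified sample sketch (hypothetical helper, base R only).
stratified_sample_sketch <- function(df, strata_col, n_per_stratum, seed = 42L) {
  set.seed(seed)
  parts <- lapply(split(df, df[[strata_col]]), function(stratum) {
    n_h <- min(n_per_stratum, nrow(stratum))       # never request more rows than exist
    picked <- stratum[sample.int(nrow(stratum), n_h), , drop = FALSE]
    picked$.weight <- nrow(stratum) / n_h          # assumed weight: N_h / n_h
    picked
  })
  do.call(rbind, parts)
}

df <- data.frame(g = c(rep("A", 60), rep("B", 40)), x = rnorm(100))
out <- stratified_sample_sketch(df, "g", n_per_stratum = 10)
tapply(out$.weight, out$g, unique)   # A = 6, B = 4
```

With these weights the sampled rows add back up to the frame size (here 100), which is a quick sanity check on any allocation.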
Mirrors the Python morie.suggest_analysis_plan(). Inspects the output
of morie_profile_dataset() and returns plain-English recommendations for
candidate analyses.
morie_suggest_analysis_plan(profile)
profile |
A list returned by |
Character vector of suggestion strings, one per recommendation.
morie_suggest_analysis_plan(morie_profile_dataset(iris))
Mann-Whitney U on the absolute deviations from the pooled median. Tests equality of scales given (approximately) equal medians.
morie_sukhatme_test(x, y)
x, y |
Numeric vectors. |
Named list: statistic (z), p_value, U, n, m.
morie_sukhatme_test(x = rnorm(50), y = rnorm(50))
Summarize an output audit
morie_summarize_output_audit(audit_tbl)
audit_tbl |
Result from |
Named list with high-level diagnostics.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
For multi-variable marginals use morie_weights_rake(); this helper is the
single-variable convenience.
morie_survey_calibrate(df, aux_vars, population_totals, max_iter = 50, tol = 1e-06)
df |
A |
aux_vars |
Character vector of column names of auxiliary variables used for calibration (raking, GREG, etc.). |
population_totals |
Named numeric vector of population totals
to calibrate to (one entry per |
max_iter |
Iteration cap for the iterative calibration loop. |
tol |
Convergence tolerance for calibration. |
Complex-survey GLM constructor (single-shot wrapper that builds a design
and fits a svyglm in one call). Cluster-robust SEs via the design.
morie_survey_complex_glm(df, formula, weight_col, family = "gaussian", cluster_col = NULL, strata_col = NULL)
df |
A |
formula |
A |
weight_col |
Character; column name of the design weight
variable in |
family |
A |
cluster_col |
Character; column name of the cluster identifier
in |
strata_col |
Character; column name of the stratum identifier
in |
Returns a survey::svydesign object when survey is available; otherwise
returns a lightweight list with the same fields the morie helpers consume.
morie_survey_design(data, weights_col, strata_col = NULL, cluster_col = NULL, fpc_col = NULL, nest = FALSE)
data |
data.frame. |
weights_col |
Column name of analytic/probability weights. |
strata_col |
Optional strata column. |
cluster_col |
Optional PSU/cluster column. |
fpc_col |
Optional finite-population-correction column. |
nest |
If TRUE, treat cluster IDs as nested within strata. |
Wraps survey::svyglm(). Family argument accepts the same strings as the
Python module ("gaussian", "binomial", "poisson", "gamma", "negativebinomial")
or any R family object.
morie_survey_glm(design, formula, family = c("gaussian", "binomial", "poisson", "gamma", "negativebinomial"))
design |
A |
formula |
A |
family |
A |
Hajek (ratio) estimator of a population mean.
morie_survey_hajek_mean(y, weights)
y |
Numeric vector of outcome values aligned with the sample. |
weights |
Numeric vector of design weights aligned with |
Horvitz-Thompson estimator of a population total.
morie_survey_ht_total(y, inclusion_probs)
y |
Numeric vector of outcome values aligned with the sample. |
inclusion_probs |
Numeric vector of inclusion probabilities
( |
list with total, se, ci_lower, ci_upper.
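The point estimates behind the Horvitz-Thompson total and the Hajek mean documented above are short enough to write out. The sketch covers the point estimates only; the standard error and interval returned by the package depend on design assumptions not stated here. Both function names are hypothetical.

```r
# Horvitz-Thompson total and Hajek mean sketches (hypothetical helpers, base R only).
ht_total_sketch <- function(y, inclusion_probs) {
  sum(y / inclusion_probs)               # each unit stands in for 1 / pi_i units
}
hajek_mean_sketch <- function(y, weights) {
  sum(weights * y) / sum(weights)        # ratio of estimated total to estimated size
}

y <- c(10, 20, 30)
p <- c(0.5, 0.25, 0.5)
ht_total_sketch(y, p)                    # 20 + 80 + 60 = 160
hajek_mean_sketch(y, 1 / p)              # 160 / 8 = 20
```

The Hajek form divides by the estimated rather than the known population size, which usually reduces variance when the weights vary.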
Survey-weighted mean (delegates to survey::svymean when available).
morie_survey_mean(design, variable)
design |
A |
variable |
Character; column name of the outcome variable in
the |
Delegates to survey::postStratify() when given a design; otherwise
computes raw post-stratification factors in base R.
morie_survey_poststratify(df, strata_col, population_counts)
df |
A |
strata_col |
Character; column name of the stratum identifier
in |
population_counts |
Named numeric vector of population counts by stratum (post-stratification target). |
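The base-R fallback mentioned above (raw post-stratification factors) can be illustrated as follows. This is a sketch assuming equal starting weights within each stratum; the output column name `.ps_weight` and the function name are hypothetical.

```r
# Raw post-stratification factor sketch (hypothetical helper, base R only).
poststratify_sketch <- function(df, strata_col, population_counts) {
  sample_counts <- table(df[[strata_col]])
  # Factor per stratum: known population count / observed sample count.
  factors <- population_counts[names(sample_counts)] / as.numeric(sample_counts)
  df$.ps_weight <- as.numeric(factors[as.character(df[[strata_col]])])
  df
}

df <- data.frame(g = c("A", "A", "A", "B", "B"))
poststratify_sketch(df, "g", c(A = 300, B = 100))   # weights 100 for A, 50 for B
```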
Ratio estimator of a population total using known X_pop.
morie_survey_ratio(y, x, weights, X_population_total)
y |
Numeric vector of outcome values aligned with the sample. |
x |
Numeric vector of auxiliary values aligned with |
weights |
Numeric vector of design weights aligned with |
X_population_total |
Known population total for the auxiliary
variable |
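The ratio estimator scales the known auxiliary total by the weighted sample ratio of y to x. A minimal sketch of the point estimate, with a hypothetical function name:

```r
# Ratio estimator of a population total (hypothetical helper, base R only).
ratio_total_sketch <- function(y, x, weights, X_population_total) {
  r_hat <- sum(weights * y) / sum(weights * x)   # weighted sample ratio
  X_population_total * r_hat
}

ratio_total_sketch(y = c(2, 4, 6), x = c(1, 2, 3), weights = c(1, 1, 1),
                   X_population_total = 60)      # ratio 2, so total 120
```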
Subpopulation (domain) mean with Woodruff linearised SE.
morie_survey_subpop(df, domain_col, domain_value, outcome_col, weight_col)
df |
A |
domain_col |
Character; column name of the subpopulation /
domain indicator in |
domain_value |
Value (matching |
outcome_col |
Character; column name of the outcome variable
in |
weight_col |
Character; column name of the design weight
variable in |
Wraps survival::survreg(). Supported dist: "weibull", "lognormal",
"loglogistic", "exponential", "gaussian".
morie_survival_aft(data, duration_col, event_col, covariate_cols, dist = c("weibull", "lognormal", "loglogistic", "exponential", "gaussian"))
data |
A |
duration_col |
Character; column name of the event/censoring
time in |
event_col |
Character; column name of the event-indicator
variable in |
covariate_cols |
Character vector of covariate column names. |
dist |
Distribution name for parametric/AFT fits (e.g.
|
Wraps survival::survfit() with multi-state Surv().
morie_survival_cif(time, event, event_of_interest = 1L, confidence = 0.95)
time |
Numeric vector of event/censoring times. |
event |
Integer event code: 0 = censored, 1 = event of interest, >= 2 = competing event. |
event_of_interest |
Integer event-type code for competing-risks analyses (Fine-Gray, CIF). |
confidence |
Confidence level for interval estimates (default
|
Compare parametric survival models by AIC/BIC.
morie_survival_compare_parametric(time, event)
time |
Numeric vector of event/censoring times. |
event |
Integer/logical vector; 1 = event, 0 = censored. |
Uses survival::concordance() (which handles ties + censoring correctly).
morie_survival_concordance(time, event, risk_score)
time |
Numeric vector of event/censoring times. |
event |
Integer/logical vector; 1 = event, 0 = censored. |
risk_score |
Numeric vector of predicted risk scores aligned
with |
Wraps survival::coxph() with Efron (default) or Breslow tie handling
and returns a tidy list including hazard ratios, CIs, p-values, and the
Breslow baseline cumulative hazard.
morie_survival_cox(data, duration_col, event_col, covariate_cols, ties = c("efron", "breslow"), confidence = 0.95, penalizer = 0)
data |
data.frame. |
duration_col |
Name of the time column. |
event_col |
Name of the 0/1 event column. |
covariate_cols |
Character vector of covariate column names. |
ties |
"efron" (default) or "breslow". |
confidence |
Confidence level (default 0.95). |
penalizer |
L2 penalty (passed via |
Cox-Snell residuals from a fitted morie Cox model.
morie_survival_coxsnell(cox_result)
cox_result |
A |
Deviance residuals.
morie_survival_deviance(cox_result)
cox_result |
A |
Requires the cmprsk package.
morie_survival_finegray(data, duration_col, event_col, covariate_cols, event_of_interest = 1L, confidence = 0.95)
data |
A |
duration_col |
Character; column name of the event/censoring
time in |
event_col |
Character; column name of the event-indicator
variable in |
covariate_cols |
Character vector of covariate column names. |
event_of_interest |
Integer event-type code for competing-risks analyses (Fine-Gray, CIF). |
confidence |
Confidence level for interval estimates (default
|
Hazard ratio between two groups via a simple Cox model.
morie_survival_hr(time, event, group, confidence = 0.95)
time |
Numeric vector of event/censoring times. |
event |
Integer/logical vector; 1 = event, 0 = censored. |
group |
Factor/character grouping variable for HR / log-rank stratified comparisons. |
confidence |
Confidence level for interval estimates (default
|
Thin wrapper around survival::survfit() returning a tidy list with
Greenwood or complementary-log-log confidence bands.
morie_survival_km(time, event, confidence = 0.95, ci_method = c("greenwood", "log-log"))
time |
Numeric vector of observation times. |
event |
0/1 event indicator (1 = event observed). |
confidence |
Confidence level (default 0.95). |
ci_method |
"greenwood" (plain) or "log-log". |
list with times, survival, ci_lower, ci_upper,
at_risk, events, censored, median_survival, method.
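To make the returned quantities concrete, here is a hand-rolled Kaplan-Meier sketch with plain Greenwood standard errors. It is an illustration in base R rather than the `survival::survfit()` wrapper itself, it covers only the plain interval (not the log-log variant), and `km_sketch` is a hypothetical name.

```r
# Kaplan-Meier with Greenwood standard errors (hypothetical helper, base R only).
km_sketch <- function(time, event, confidence = 0.95) {
  t_ev <- sort(unique(time[event == 1]))                       # distinct event times
  at_risk <- vapply(t_ev, function(t) sum(time >= t), numeric(1))
  events <- vapply(t_ev, function(t) sum(time == t & event == 1), numeric(1))
  survival <- cumprod(1 - events / at_risk)
  # Greenwood: Var S(t) = S(t)^2 * sum d / (n * (n - d)); undefined once n == d.
  se <- survival * sqrt(cumsum(events / (at_risk * (at_risk - events))))
  z <- stats::qnorm(1 - (1 - confidence) / 2)
  list(times = t_ev, survival = survival, at_risk = at_risk, events = events,
       ci_lower = pmax(survival - z * se, 0), ci_upper = pmin(survival + z * se, 1))
}

km_sketch(time = c(1, 2, 3, 4, 5), event = c(1, 0, 1, 1, 0))$survival
```

In the toy data the censored observation at time 2 leaves the survival estimate unchanged but shrinks the risk set for later event times.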
Landmark dataset constructor.
morie_survival_landmark(data, duration_col, event_col, landmark_time)
data |
A |
duration_col |
Character; column name of the event/censoring
time in |
event_col |
Character; column name of the event-indicator
variable in |
landmark_time |
Numeric landmark time at which to subset the cohort before fitting (lead-time bias correction). |
Left-truncated Kaplan-Meier with delayed entry.
morie_survival_left_truncated_km(entry_time, exit_time, event, confidence = 0.95)
entry_time |
Left-truncation entry times. |
exit_time |
Exit (event/censoring) times for the left-truncated KM estimator. |
event |
Integer/logical vector; 1 = event, 0 = censored. |
confidence |
Confidence level for interval estimates (default
|
Delegates to survival::survdiff() for the standard log-rank weight (rho=0)
and Peto-Peto (rho=1). Gehan/Tarone-Ware are not supported by survdiff
directly and currently fall back to rho=1 (Peto) as the closest analogue;
use survival::survdiff(..., rho=1) plus FH weights for exact equivalents.
morie_survival_logrank(time, event, group, weight = c("logrank", "peto", "gehan", "tarone"))
time, event, group |
Vectors. |
weight |
One of "logrank", "peto", "gehan", "tarone". |
Martingale residuals.
morie_survival_martingale(cox_result)
cox_result |
A |
Nelson-Aalen cumulative-hazard estimator.
morie_survival_nelsonaalen(time, event, confidence = 0.95)
time |
Numeric vector of observation times. |
event |
0/1 event indicator (1 = event observed). |
confidence |
Confidence level (default 0.95). |
list with times, cumhaz, ci_lower, ci_upper,
at_risk, events, censored.
For "exponential", "weibull", "lognormal", "loglogistic", "gaussian".
Use morie_survival_aft() for covariate-adjusted parametric models.
morie_survival_parametric(time, event, dist = c("weibull", "exponential", "lognormal", "loglogistic", "gaussian"))
time |
Numeric vector of event/censoring times. |
event |
Integer/logical vector; 1 = event, 0 = censored. |
dist |
Distribution name for parametric/AFT fits (e.g.
|
Integrates the Kaplan-Meier estimator from 0 to tau using trapezoidal
integration on the step-function. SE follows the Klein-Moeschberger
formula (approximation matches the Python module).
morie_survival_rmst(time, event, tau = NULL, confidence = 0.95)
time |
Numeric vector of event/censoring times. |
event |
Integer/logical vector; 1 = event, 0 = censored. |
tau |
RMST truncation horizon ( |
confidence |
Confidence level for interval estimates (default
|
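The restricted mean survival time is the area under the Kaplan-Meier curve up to tau. The sketch below integrates the step function exactly as a sum of rectangles and omits the standard error; the default `tau = max(time)` is an assumption, since the documented default is not shown here, and `rmst_sketch` is a hypothetical name.

```r
# RMST as the area under the Kaplan-Meier step function (hypothetical helper).
rmst_sketch <- function(time, event, tau = NULL) {
  if (is.null(tau)) tau <- max(time)            # assumed default horizon
  t_ev <- sort(unique(time[event == 1 & time <= tau]))
  surv <- cumprod(vapply(t_ev, function(t) {
    1 - sum(time == t & event == 1) / sum(time >= t)
  }, numeric(1)))
  knots <- c(0, t_ev, tau)
  heights <- c(1, surv)                         # S(t) on [knots[i], knots[i + 1])
  sum(diff(knots) * heights)
}

rmst_sketch(time = c(1, 2, 3), event = c(1, 1, 1))   # no censoring: equals mean(time) = 2
```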
Difference in RMST between two groups.
morie_survival_rmst_diff(time1, event1, time2, event2, tau = NULL, confidence = 0.95)
time1 |
Time vector for group 1 ( |
event1 |
Event vector for group 1 ( |
time2 |
Time vector for group 2 ( |
event2 |
Event vector for group 2 ( |
tau |
RMST truncation horizon ( |
confidence |
Confidence level for interval estimates (default
|
Wraps survival::cox.zph() (scaled Schoenfeld residuals).
morie_survival_schoenfeld(cox_result)
cox_result |
Object returned by |
Delegates to survival::survfit() with Surv(left, right, type = "interval2").
Hand-rolled EM is left as a stub for environments without survival.
morie_survival_turnbull(left, right, max_iter = 200, tol = 1e-06)
left |
Left-bracket times for interval-censored data
( |
right |
Right-bracket times for interval-censored data. |
max_iter |
Iteration cap for the Turnbull NPMLE EM loop. |
tol |
Convergence tolerance for the Turnbull EM. |
Support-vector regression for genomic prediction
morie_svm_genomic(x, y, markers, C = 1, epsilon = 0.1, gamma = "scale")
x |
Optional fixed-effect features. |
y |
Numeric response. |
markers |
(n x m) genotype matrix. |
C |
Cost (default 1). |
epsilon |
SVR tube width (default 0.1). |
gamma |
RBF kernel scale ("scale" = 1/(m * var(M)) or numeric). |
list(estimate, y_hat, alpha, support_indices, se, n, method).
Vapnik (1995); Montesinos Lopez Ch 7.
morie_svm_genomic(x = rnorm(50), y = rnorm(50), markers = matrix(sample(0:2, 200, TRUE), 50, 4))
Wraps e1071::svm with a linear kernel.
morie_svm_hinge_primal(x, y, C = 1, seed = 0L)
x |
Numeric predictor matrix. |
y |
Binary response. |
C |
Soft-margin inverse regularisation. |
seed |
RNG seed. |
Named list: estimate, intercept, weights, train_accuracy, C, classes, n, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Wraps e1071::svm.
morie_svm_kernel_trick(x, y, kernel = "rbf", C = 1, gamma = "scale", degree = 3L, seed = 0L)
x |
Numeric predictor matrix. |
y |
Binary response. |
kernel |
One of "rbf" (radial), "poly", "sigmoid", "linear". |
C |
Cost parameter. |
gamma |
Kernel coefficient ("scale" -> 1/(ncol(x)*var(x)), "auto" -> 1/p, or numeric). |
degree |
Polynomial degree. |
seed |
RNG seed. |
Named list: estimate, train_accuracy, n_support, kernel, C, gamma, degree, n, method.
morie_svm_kernel_trick(x = rnorm(50), y = rnorm(50))
Returns a function that mimics stats::runif but is seeded
from seed. Pairs with morie.longitudinal_sim.sync_rng
on the Python side so the two emit identical streams when given the
same seed.
morie_sync_rng(seed)
seed |
Non-negative integer seed. |
An environment with rnorm, runif, sample
methods that share the same underlying RNG state.
morie_sync_rng(seed = 1L)
Returns a small data.frame mirroring the column shape of the
published corrections-UoF resource for the given short key.
Schemas are derived from the bundled
inst/extdata/corrections_uof_dictionary.json (parsed from
datadictionary_correctionsribd_en_fr20250822.xlsx). Values
are uniformly drawn from the dictionary "Data Values" examples
or generic ranges; used only for offline-no-fixture fallback in
tests and demos. NOT a substitute for the real upstream data
published at https://data.ontario.ca/dataset/use-of-force-in-correctional-institutions.
morie_synth_corrections_uof(key, n = 30L, seed = 1L)
key |
Character; one of the 12 short names returned by
|
n |
Integer; number of synthetic rows. Default 30. |
seed |
Integer; RNG seed. Default 1. |
A data.frame.
morie_datasets_corrections_uof_incidents and
the 11 sibling loaders.
df <- morie_synth_corrections_uof("incidents", n = 10)
nrow(df)
Returns a data.frame mirroring the column shape + categorical level
set of the published OTIS dataset for the given id (a01,
b01..b09, c01..c12, or d01..d07). Schema is derived from the
Ontario MCSCS XLSX data dictionary that ships at
inst/extdata/otis_dictionary.json. Values are randomly drawn
from the dictionary categorical levels (or 0..80 for count
columns); used only for offline-no-fixture fallback in tests and
demos. NOT a substitute for the real OTIS data published at
https://data.ontario.ca/dataset/data-on-inmates-in-ontario.
morie_synth_otis(id, n = 200L, seed = 1L)
id |
Character; one of |
n |
Integer; number of synthetic rows. Default 200. |
seed |
Integer; RNG seed for reproducibility. Default 1. |
A data.frame with the dictionary-derived schema.
morie_synth_otis_all for the full 29-dataset
list; morie_datasets_otis_a01 and friends for the
real bundled+live loaders.
df <- morie_synth_otis("c11", n = 50)
head(df)
Returns a named list of synthetic data.frames keyed by OTIS
publication id (a01, b01..b09, c01..c12, d01..d07). Each frame is
built by morie_synth_otis with a per-id seed offset
for reproducibility.
morie_synth_otis_all(n = 80L, seed = 2L)
n |
Integer; rows per dataset. Default 80. |
seed |
Integer; base RNG seed. Default 2. |
Named list of 29 data.frames.
all_otis <- morie_synth_otis_all(n = 30)
names(all_otis)
Replaces pooled ranks with Blom-approximated normal scores a_i = qnorm((R_i - 3/8) / (N + 1/4)). Statistic stat_t = sum of scores from the first sample.
morie_terry_hoeffding_test(x, y)
x, y |
Numeric vectors. |
Named list: statistic, p_value, z, n, m.
morie_terry_hoeffding_test(x = rnorm(50), y = rnorm(50))
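The score construction described above can be sketched in a few lines. The Blom scores and the statistic follow the stated formula; the normal approximation built from the permutation mean and variance is an assumption about how the p-value might be obtained, and `terry_hoeffding_sketch` is a hypothetical name.

```r
# Terry-Hoeffding normal-scores sketch (hypothetical helper, base R only).
terry_hoeffding_sketch <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  # Blom-approximated normal scores of the pooled ranks.
  scores <- stats::qnorm((rank(c(x, y)) - 3 / 8) / (N + 1 / 4))
  stat_t <- sum(scores[seq_len(m)])             # sum of scores from the first sample
  # Permutation moments of a sum of m scores drawn without replacement from N.
  mu <- m * mean(scores)
  sigma2 <- m * n / (N * (N - 1)) * sum((scores - mean(scores))^2)
  z <- (stat_t - mu) / sqrt(sigma2)
  list(statistic = stat_t, z = z, p_value = 2 * stats::pnorm(-abs(z)))
}

set.seed(1)
terry_hoeffding_sketch(rnorm(50), rnorm(50, mean = 0.5))
```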
GJR-GARCH(1,1) threshold GARCH
morie_tgarch_model(x)
x |
Numeric return series. |
Named list with omega, alpha, gamma, beta, persistence,
loglik, conditional_variance, n, method.
morie_tgarch_model(x = rnorm(50))
Two-regime self-exciting threshold autoregressive (SETAR) model
morie_threshold_autoregression(x, p = 1, d = 1, n_grid = 50)
x |
Numeric univariate series. |
p |
AR order in each regime. Default 1. |
d |
Delay parameter for the threshold variable. Default 1. |
n_grid |
Grid size for threshold search. Default 50. |
Named list with threshold, phi_lower, phi_upper, p, d,
regime_sizes, sse, n, method.
morie_threshold_autoregression(x = rnorm(50))
Returns the bundled inst/extdata/to_hood_158_140_crosswalk.csv,
computed from polygon intersection of the two upstream Open Toronto
GeoJSON layers (Neighbourhoods - 4326.geojson and
Neighbourhoods - historical 140 - 4326.geojson) reprojected to
EPSG:3347 (NAD83 Statistics Canada Lambert – metric, accurate
areas). Seven columns:
morie_to_hood_crosswalk()
3-char zero-padded historical short code
historical neighbourhood name (carries "(NN)" suffix)
3-char zero-padded current short code
current neighbourhood name
FORWARD: percent of the 140's area inside this
158. Per-140 rows sum to 100. Used by
morie_tps_disaggregate_140_to_158() as the cake-cutting
weight under a uniform-density assumption.
REVERSE: percent of the 158's area inside this
140. Per-158 rows sum to 100. For pure cake-cuts every split
child has pct_158_in_140 == 100 (each 158 is entirely
inside its parent 140), so
morie_tps_aggregate_158_to_140() is mathematically EXACT
(lossless sum) for the 1:1 + split cohort. Only the one
split+merge sliver in the bundled OT data has a non-100
reverse percent.
"1:1" / "split" (one 140 -> N 158s) / "merge" (multiple 140s -> one 158) / "split+merge"
Empirical distribution on the bundled OT data:
123 1:1 rows (78%)
34 split rows (16 historical hoods) – pct_158_in_140 == 100
1 merge – both percents == 100
1 split+merge – one sliver < 100
A data.frame with the columns above. hood_140 and
hood_158 are character (zero-padded to 3 chars).
Returns a data.frame with the canonical City of Toronto Open Data
schema (_id, AREA_ID, AREA_ATTR_ID, PARENT_AREA_ID,
AREA_SHORT_CODE, AREA_LONG_CODE, AREA_NAME, AREA_DESC,
CLASSIFICATION, CLASSIFICATION_CODE, OBJECTID, geometry)
for the requested version.
morie_to_neighbourhoods(version = c("158", "140", "nia"), offline = TRUE, resource_id = NULL)
version |
One of |
offline |
If |
resource_id |
Optional CKAN resource id override. Used only
when |
A data.frame.
City of Toronto Open Data, "Neighbourhoods" dataset (https://open.toronto.ca/dataset/neighbourhoods/); "Neighbourhood Improvement Areas" (https://open.toronto.ca/dataset/neighbourhood-improvement-areas/); licensed under the Open Government Licence – Toronto.
df <- morie_to_neighbourhoods("158", offline = TRUE)
head(df[, c("AREA_SHORT_CODE", "AREA_NAME")])
Closed-form Wilks (1941) tolerance-interval probability that the
sample interval from min(x) to max(x) covers at least coverage of the
population. Gibbons & Chakraborti Ch 2.11.
morie_tolerance_limits(x, coverage = 0.9, confidence = 0.95)
x |
Numeric vector. |
coverage |
Desired population coverage |
confidence |
Desired confidence (default 0.95). |
P(coverage of (X_(1), X_(n)) >= beta) = 1 - n * beta^(n-1) + (n - 1) * beta^n
Named list: lower, upper, coverage_requested, confidence_achieved, n, method.
Wilks (1941); Gibbons & Chakraborti (6e) Ch 2.11.
morie_tolerance_limits(1:100, coverage = 0.90, confidence = 0.95)
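The closed-form probability stated above is easy to evaluate and to cross-check, because the coverage of the interval from the sample minimum to the sample maximum follows a Beta(n - 1, 2) distribution for a continuous population. A short sketch with a hypothetical function name:

```r
# Wilks confidence that (min(x), max(x)) covers at least `coverage` of the population.
wilks_confidence_sketch <- function(n, coverage) {
  1 - n * coverage^(n - 1) + (n - 1) * coverage^n
}

wilks_confidence_sketch(n = 100, coverage = 0.90)
# Cross-check against the upper tail of the Beta(n - 1, 2) distribution.
stats::pbeta(0.90, 100 - 1, 2, lower.tail = FALSE)
```

The achieved confidence depends only on n and the requested coverage, not on the data values, which is what makes the interval distribution-free.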
Wraps loading of the City of Toronto neighbourhood polygon layers
for the CURRENT 158-neighbourhood scheme (HOOD_158), the
HISTORICAL 140-neighbourhood scheme (HOOD_140), and the
Neighbourhood Improvement Area (NIA) layer. Also provides
version-resolution helpers so downstream analyses do not silently
mix the two schemes across years.
The mirror of morie_tps_add_hood_158_from_140(). For 158-hoods
that did not exist in the 140-scheme (the newly-split children
like Etobicoke City Centre / Islington from 14-Islington-City-
Centre-West) the result is the historical parent 140-hood.
morie_tps_add_hood_140_from_158(df, col_in = NULL, col_out = "HOOD_140_equiv", crosswalk = NULL)
df |
A TPS crime |
col_in |
Name of the input HOOD_158 column. |
col_out |
Name of the new column. Default |
crosswalk |
Optional pre-loaded crosswalk; defaults to
|
df with the equivalent-code column appended.
Looks up each row's HOOD_140 (or hood_140 / NEIGHBOURHOOD_140
/ neighbourhood_140) in the bundled crosswalk and writes the
PRIMARY-overlap 158 hood code into a new column (default name
HOOD_158_equiv).
morie_tps_add_hood_158_from_140(df, col_in = NULL, col_out = "HOOD_158_equiv", crosswalk = NULL)
df |
A TPS crime |
col_in |
Name of the input HOOD_140 column. By default the
first match from |
col_out |
Name of the new column to add. Default
"HOOD_158_equiv". |
crosswalk |
Optional pre-loaded crosswalk; defaults to
|
For 1:1 mappings the result is exact. For splits (1 historical hood -> 2–4 current hoods) the 158 hood with the largest area overlap wins; this is lossy – analyses at the 158-level should ideally re-aggregate from the per-incident lat/lon rather than relying on the primary-overlap join.
df with the equivalent-code column appended.
df <- data.frame(EVENT_ID = 1:3, HOOD_140 = c("082", "001", "075"))
morie_tps_add_hood_158_from_140(df)
Cake-cutting in the REVERSE direction. Given a data.frame of per-
current-hood counts (one row per hood_158, one or more numeric
count columns), sums across the 158-children of each 140 weighted
by pct_158_in_140 / 100. For 1:1 hoods the count passes through.
For splits the children's counts are summed exactly (each child has
pct_158_in_140 == 100 for clean cake-cuts).
morie_tps_aggregate_158_to_140(df, hood_158_col = "HOOD_158", count_cols = NULL, crosswalk = NULL)
df |
A |
hood_158_col |
Name of the 158-hood column. Default
"HOOD_158". |
count_cols |
Character vector of numeric count columns. By
default every numeric column except |
crosswalk |
Optional pre-loaded crosswalk. |
Unlike morie_tps_disaggregate_140_to_158(), this requires NO
uniform-density assumption when the source is a clean cake-cut –
the partition is exhaustive and disjoint by construction. The only
lossy case is the split+merge edge (one Willowdale East sliver
in the bundled OT data); the function handles it via the
pct_158_in_140 weights regardless.
A data.frame with one row per 140-hood, columns
hood_140, name_140, and the summed count_cols.
df <- data.frame(HOOD_158 = c("167", "168", "001"), incidents = c(40, 60, 42))
morie_tps_aggregate_158_to_140(df)
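The reverse cake-cut described above amounts to a weighted merge followed by a group sum. The sketch below uses a hypothetical three-row crosswalk; the bundled crosswalk's actual column layout may differ, and only the `pct_158_in_140` weight name is taken from the text.

```r
# Toy crosswalk (assumption): 158-hoods 167 and 168 are clean children of
# 140-hood 075, and 001 maps 1:1.
xwalk_rev <- data.frame(
  hood_158 = c("167", "168", "001"),
  hood_140 = c("075", "075", "001"),
  pct_158_in_140 = c(100, 100, 100),
  stringsAsFactors = FALSE
)
counts_158 <- data.frame(
  hood_158 = c("167", "168", "001"),
  incidents = c(40, 60, 42),
  stringsAsFactors = FALSE
)

# Weight each child count, then sum within each historical parent.
merged_rev <- merge(counts_158, xwalk_rev, by = "hood_158")
merged_rev$weighted <- merged_rev$incidents * merged_rev$pct_158_in_140 / 100
agg_140 <- aggregate(weighted ~ hood_140, data = merged_rev, FUN = sum)
```

Because every child here has a weight of 100, the parent total is the exact sum of its children and the overall total is preserved.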
R-side port of morie.tps_all_analyze. Provides a uniform
bundle (temporal + spatial + offence + neighbourhood-concentration)
that runs on any of Toronto Police Service's 13 public crime CSVs
plus a cross-category comparison driver.
morie_tps_temporal_summary: year / month / dow / hour rollups.
morie_tps_spatial_summary: neighbourhood / division /
premises / location-type + lat-lon bbox.
morie_tps_offence_summary: OFFENCE / UCR / CSI rollups.
morie_tps_neighbourhood_concentration: Gini +
top-10/top-20 share across HOOD_158.
morie_tps_crime_compare: side-by-side counts
and YoY across multiple TPS data.frames.
morie_tps_analyze_one: full bundle on one frame.
morie_tps_analyze_all: full bundle across every
TPS data.frame supplied in a named list.
morie_tps_analyze_assault(), ...,
morie_tps_analyze_theftover(): 13 thin convenience aliases.
Each summary callable returns a named list carrying summary
lines, optional tables, optional warnings, and a plain-language
interpretation. The aggregate morie_tps_analyze_one()
nests these sub-results under named keys.
Mirrors morie.tps_all_analyze.analyze_all. Caller supplies
the data.frames; loading from disk is left to the user (R-side loaders
live in R/data_access.R and R/dataset_catalog.R). An
optional out_dir writes per-dataset tps_<name>.txt
transcripts.
morie_tps_analyze_all(dfs, out_dir = NULL)
dfs |
Named |
out_dir |
Optional directory to write per-dataset text dumps.
When |
A named list of morie_tps_result values, plus
a `__cross_compare__` entry from
morie_tps_crime_compare.
Convenience alias: full TPS bundle on the Assault dataset.
morie_tps_analyze_assault(df)
morie_tps_analyze_autotheft(df)
morie_tps_analyze_bicycletheft(df)
morie_tps_analyze_breakandenter(df)
morie_tps_analyze_communitysafetyindicators(df)
morie_tps_analyze_hatecrimes(df)
morie_tps_analyze_homicides(df)
morie_tps_analyze_intimatepartnerandfamilyviolence(df)
morie_tps_analyze_neighbourhoodcrimerates(df)
morie_tps_analyze_robbery(df)
morie_tps_analyze_shootingandfirearmdiscarges(df)
morie_tps_analyze_theftfrommovingvehicle(df)
morie_tps_analyze_theftover(df)
df |
A TPS Assault data.frame. |
A morie_tps_result.
High-level orchestration: takes a named list dfs of TPS
open-data data.frames (one per CSI category) and returns both the
per-year and per-neighbourhood CSI as a single rich-result object.
morie_tps_analyze_csi_from_dataframes(dfs, year_col = "OCC_YEAR", hood_col = "HOOD_158", variant = c("total", "violent"))
dfs |
Named list of TPS data.frames. Keys outside of
|
year_col |
Year column name (default "OCC_YEAR"). |
hood_col |
Neighbourhood column name (default "HOOD_158"). |
variant |
One of "total" or "violent" (default "total"). |
A morie_tps_result named list carrying by_year
and by_hood data.frames in payload.
Chains temporal + spatial + offence + concentration into a single
nested result. This is the function the 13 convenience aliases
(morie_tps_analyze_assault(), etc.) wrap.
morie_tps_analyze_one(df, name = "?")
df |
A TPS crime data.frame. |
name |
The canonical TPS dataset name (used in titles). |
A morie_tps_result with named sub-results under
temporal, spatial, offences, concentration.
Builds a monthly count series via stats::ts, fits
ARIMA(1,1,1) with stats::arima, and forecasts h
periods ahead with stats::predict. AIC is reported from
the fit; BIC is computed manually as
AIC + k * (log(n) - 2).
morie_tps_arima_forecast(df, h = 12L, ds_name = "?")
df |
A |
h |
Forecast horizon in months. |
ds_name |
Character label. |
A morie_rich_result list with forecast,
aic, bic, n_train.
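The fit, forecast, and manual BIC steps described above can be sketched with base `stats` calls. The series below is synthetic, and the choice of `k` (estimated coefficients plus the innovation variance) is an assumption about how the parameter count is defined; the package may count differently.

```r
set.seed(1)
# Synthetic monthly series standing in for TPS monthly counts (assumption).
sim <- stats::arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 120)
y <- stats::ts(100 + as.numeric(sim), frequency = 12, start = c(2015, 1))

fit <- stats::arima(y, order = c(1, 1, 1), method = "ML")
fc <- stats::predict(fit, n.ahead = 12)  # 12-month-ahead point forecasts and SEs

# BIC recovered from AIC: BIC = AIC + k * (log(n) - 2)
k <- length(stats::coef(fit)) + 1  # AR + MA coefficients plus sigma^2 (assumed)
n <- fit$nobs                      # observations used after differencing
bic_manual <- stats::AIC(fit) + k * (log(n) - 2)
```

Under this definition of `k`, the manual value agrees with `stats::BIC(fit)`, which is a convenient cross-check.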
Errors when the expected schema's column is absent. Warns when BOTH schemas are present (downstream code MAY accidentally use the wrong one).
morie_tps_assert_hood_version(df, expected = c("158", "140"))
df |
A TPS crime |
expected |
Either "158" or "140". |
Invisibly TRUE on success.
Always returns csv. excel requires readxl; spatial formats
require sf. If a needed namespace isn't installed, that format
is omitted from the returned vector.
morie_tps_available_formats()
Character vector of available format names, sorted.
Generalises morie_tps_polygon_morans_i to two
attributes: measures the cross-correlation between attribute X at
location i and attribute Y at neighbouring locations j.
morie_tps_bivariate_moran(polygons, x_col, y_col, ds_name = "NeighbourhoodCrimeRates", k_neighbours = 5L, centroid_lat_col = "lat", centroid_lon_col = "lon")
polygons |
An sf object, or a data.frame with centroid columns. |
x_col, y_col
|
The two attributes (column names). |
ds_name |
Tag for the result title. |
k_neighbours |
k for the k-NN weights graph. |
centroid_lat_col, centroid_lon_col
|
Names of centroid columns
when |
Polygon centroids and k-NN weights are constructed exactly as in
morie_tps_polygon_morans_i; distances use the
haversine formula for parity with the Python source.
A named list with I_xy, n, x_col,
y_col.
set.seed(2026)
polys <- data.frame(
  HOOD_ID = letters[1:16],
  lat = rep(43.6 + (0:3) * 0.02, 4),
  lon = rep(-79.4 + (0:3) * 0.02, each = 4),
  ASSAULT_RATE_2024 = rpois(16, 30),
  HOMICIDE_RATE_2024 = rpois(16, 2)
)
morie_tps_bivariate_moran(polys, x_col = "ASSAULT_RATE_2024", y_col = "HOMICIDE_RATE_2024", centroid_lat_col = "lat", centroid_lon_col = "lon")
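The details above state that centroid distances use the haversine formula. A minimal sketch of that formula follows; the function name `haversine_km` and the 6371 km Earth radius are assumptions, not values confirmed by the package.

```r
# Hypothetical helper: great-circle distance in km between WGS84 points.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator.
d_eq <- haversine_km(0, 0, 0, 1)
```

The function is vectorised, so a full centroid distance matrix could be built with `outer()` before selecting the k nearest neighbours of each row.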
Tests whether category A's per-neighbourhood count co-varies with category B's count in NEIGHBOURING neighbourhoods (spatial spillover). Builds a k-NN row-standardised spatial weights matrix from per-hood centroids derived from category A's WGS84 latitude/longitude. Reports Pearson r alongside as a non-spatial baseline.
morie_tps_bivariate_morans_i(dfs, cat_a, cat_b, k_neighbours = 5L)
dfs |
Named list of TPS data.frames keyed by category. |
cat_a |
Name of category A in dfs. |
cat_b |
Name of category B in dfs. |
k_neighbours |
Number of nearest neighbours per row in W
(default 5). |
A morie_tps_result named list.
For every category in dfs, computes per-HOOD_158 counts,
aligns onto a common (union) hood index, and reports the Pearson
correlation matrix.
morie_tps_category_correlation_matrix(dfs)
dfs |
Named list of TPS data.frames keyed by category. |
A morie_tps_result named list with a single
correlation table.
Implements Pettitt's non-parametric change-point statistic U_t = sum_i sum_j sign(x_i - x_j) for i <= t < j, then reports the year maximising |U_t| and an approximate p-value. No external change-point dependency is required.
morie_tps_changepoint_detection(df, year_col = "OCC_YEAR", ds_name = "?")
df |
A |
year_col |
Year column name. |
ds_name |
Character label. |
A morie_rich_result list with
changepoint_year, K_statistic, p_value,
pre_mean, post_mean.
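The Pettitt statistic described above can be computed from a matrix of pairwise signs. The sketch below is illustrative only: the helper name is hypothetical, and the p-value uses the common approximation 2 * exp(-6 K^2 / (n^3 + n^2)), which may not be the exact form the package applies.

```r
# Hypothetical helper: Pettitt change-point statistic on a numeric series.
pettitt_sketch <- function(x) {
  n <- length(x)
  s <- sign(outer(x, x, "-"))  # s[i, j] = sign(x_i - x_j)
  # U_t sums the signs over all pairs with i <= t < j
  U <- vapply(seq_len(n - 1),
              function(t) sum(s[1:t, (t + 1):n]),
              numeric(1))
  t_hat <- which.max(abs(U))
  K <- abs(U[t_hat])
  p <- min(1, 2 * exp(-6 * K^2 / (n^3 + n^2)))  # assumed approximation
  list(index = t_hat, K = K, p_value = p)
}

# Yearly counts with a level shift after the fifth observation.
x <- c(10, 11, 9, 10, 12, 20, 21, 19, 22, 20)
res <- pettitt_sketch(x)
```

Here `res$index` is the position of the last observation before the shift; mapping it back to a calendar year is a lookup into the year vector.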
Fits every supplied (kernel, baseline) combination and ranks by AIC. Mirrors Section 5 of Kwan-Chen-Dunsmuir (2024): the Markovian classical Hawkes is the (exponential, constant) row; the non-Markovian non-stationary models are everything else.
morie_tps_compare_hawkes_kernels(df, ds_name = "?", max_n = 4000L, baselines = .TPS_HAWKES_BASELINES, kernels = .TPS_HAWKES_KERNELS)
df |
Data frame with |
ds_name |
Dataset name used in titles. |
max_n |
Maximum events to fit. |
baselines |
Baseline kinds to sweep over. |
kernels |
Kernel kinds to sweep over. |
Combinations that fail to converge are recorded with an error message rather than aborting the whole comparison.
A morie_rich_result with a per-combination summary
table, the best (lowest-AIC) combination, and the AIC gap
between the classical Markovian model and the winner.
Kwan TKJ, Chen F, Dunsmuir WTM (2024). arXiv:2408.09710.
## Not run:
df <- morie_tps_load_tps_dataset("Assault", nrows = 3000)
rr <- morie_tps_compare_hawkes_kernels(df, ds_name = "Assault")
## End(Not run)
For each TPS category, computes per-HOOD_158 counts, z-standardises across neighbourhoods, and sums (or weight-and-sums) the z-scores to yield a single composite per neighbourhood. Positive composite = neighbourhood with elevated incidence across many crime types; near-zero = average; negative = below-average exposure.
morie_tps_composite_index(dfs, categories = NULL, weights = NULL, top_n = 25L)
dfs |
Named list of TPS data.frames keyed by category. |
categories |
Optional character vector restricting categories. |
weights |
Optional named numeric vector of per-category weights; defaults to 1.0 for every loaded category. |
top_n |
How many top/bottom neighbourhoods to surface in the
tables (default 25). |
A morie_tps_result named list.
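The z-standardise-and-sum construction described above can be illustrated on a small count matrix. The counts and equal weights below are made up, and `scale()` uses the sample standard deviation, which is an assumption about the package's convention.

```r
# Toy per-neighbourhood counts (rows) by category (columns); values are invented.
counts <- cbind(Assault = c(120, 80, 40, 60),
                Robbery = c(30, 10, 5, 15))
rownames(counts) <- c("001", "002", "003", "004")

z <- scale(counts)                 # column-wise z-scores across neighbourhoods
w <- c(Assault = 1, Robbery = 1)   # default weight of 1.0 per category
composite <- as.numeric(z %*% w[colnames(z)])
names(composite) <- rownames(counts)

ranked <- sort(composite, decreasing = TRUE)
```

Because each column is centred, the composite scores sum to zero, so positive values mark neighbourhoods above the cross-category average.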
Accepts a named list of TPS data.frames (e.g.
list(Assault = df_a, Robbery = df_r, ...)) and returns a
morie_tps_result with a total-counts table and (when
OCC_YEAR is present in every frame) a side-by-side
year-by-year matrix.
morie_tps_crime_compare(dfs)
dfs |
Named |
A morie_tps_result list.
Builds a co-occurrence network in lieu of the co-offender network from D'Orsogna & Perc (2015) Fig. 9 / Diviak et al. (2019). Public TPS data has no co-offender records, so we approximate by projecting (top-N premise types) x (HOOD_158 neighbourhoods) onto a premise-by-premise edge-weighted graph. Edge weight is the count of neighbourhoods in which both premise types appear.
morie_tps_criminal_network_graph(category = "Assault", sample_rows = 30000L, top_n_premises = 20L, save_fig = TRUE)
category |
TPS category name. |
sample_rows |
Maximum rows to load. |
top_n_premises |
Number of premise nodes to keep. |
save_fig |
Whether to emit a circular layout PNG. |
A morie_rich_result with node count, edge count,
strongest edge weight, and the adjacency payload.
Diviak T, Dijkstra JK, Snijders TAB (2019). Structure, multiplexity, and centrality in a corruption network. Trends in Organized Crime 22: 274-297.
## Not run:
rr <- morie_tps_criminal_network_graph("Assault", top_n_premises = 10L, save_fig = FALSE)
print(rr$summary_lines)
## End(Not run)
Canonical CSI category names (the 9 TPS open-data feeds).
MORIE_TPS_CSI_CATEGORIES()
Character vector.
Mirrors morie_tps_csi_per_year but groups by
neighbourhood ID rather than fiscal year. No population division is
applied here because TPS open data does not ship a per-neighbourhood
population table; callers are expected to merge in the per-neighbourhood
population from the City of Toronto Open Data NeighbourhoodCrimeRates
layer to obtain per-capita rates. Returns the un-normalised weighted sum + total count.
morie_tps_csi_per_neighbourhood(counts_per_hood, variant = c("total", "violent"), weights = NULL)
counts_per_hood |
Long data.frame (columns |
variant |
One of "total" or "violent". |
weights |
Optional override vector of weights. |
A data.frame with one row per neighbourhood.
Accepts either a long-format data.frame (columns year,
category, count) or a nested list keyed
[[year]][[category]] = count.
morie_tps_csi_per_year(counts_per_year, variant = c("total", "violent"), weights = NULL, population = NULL, per_capita_unit = 100000L, rebase_to_year = NULL, rebase_to_value = 100)
counts_per_year |
Long data.frame or nested list (see above). |
variant |
One of "total" or "violent". |
weights |
Optional override vector of weights. |
population |
Optional named integer vector (year -> pop);
defaults to |
per_capita_unit |
Rate denominator (default 100000). |
rebase_to_year |
Optional anchor year for the index. |
rebase_to_value |
Index value at the anchor year (default 100). |
Returns a data.frame indexed by year with columns:
raw_weighted_sum – sum_c w_c * n_(c,year)
total_count – sum_c n_(c,year)
population – Toronto population that year
csi_per_capita – raw_weighted_sum / population *
per_capita_unit
simple_count_rate – total_count / population *
per_capita_unit
When rebase_to_year is supplied, an additional
csi_index column is added, anchored so that that year's value
equals rebase_to_value.
A data.frame with one row per year.
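The output columns listed above follow from a weighted group sum and a per-capita scaling. The sketch below reproduces that arithmetic on invented numbers; the weights and populations are hypothetical placeholders, not the package's built-in tables.

```r
# Long-format toy input (columns year, category, count); values are invented.
counts <- data.frame(
  year = c(2022, 2022, 2023, 2023),
  category = c("Assault", "Robbery", "Assault", "Robbery"),
  count = c(100, 20, 110, 25),
  stringsAsFactors = FALSE
)
w <- c(Assault = 2, Robbery = 5)              # hypothetical severity weights
pop <- c(`2022` = 3000000, `2023` = 3050000)  # hypothetical populations
unit <- 100000

counts$weighted <- counts$count * unname(w[counts$category])
by_year <- aggregate(cbind(weighted, count) ~ year, data = counts, FUN = sum)
names(by_year) <- c("year", "raw_weighted_sum", "total_count")

by_year$population <- unname(pop[as.character(by_year$year)])
by_year$csi_per_capita <- by_year$raw_weighted_sum / by_year$population * unit
by_year$simple_count_rate <- by_year$total_count / by_year$population * unit

# Rebase so that the anchor year (2022 here) equals 100.
anchor <- by_year$csi_per_capita[by_year$year == 2022]
by_year$csi_index <- by_year$csi_per_capita / anchor * 100
```

The rebasing step only rescales the per-capita series, so year-over-year ratios are unchanged by the choice of anchor.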
Return the CSI weight for a TPS open-data category.
morie_tps_csi_weight(category, variant = c("total", "violent"), weights = NULL)
category |
TPS category name (e.g. "Assault", "Homicides"). |
variant |
One of "total" or "violent". |
weights |
Optional named numeric vector overriding the built-in
tables. When supplied, takes precedence over |
Numeric scalar (0 if unknown).
Resolves to <repo>/data/datasets/TPS/ when morie is loaded out
of a source checkout. Users can override per-call via the path
argument of morie_tps_load_dataset().
morie_tps_data_dir()
Requires the optional dbscan package. Coordinates are
projected to km via the small-angle latitude factor so eps_km
is interpretable as a kilometre-scale radius.
morie_tps_dbscan_clusters(df, ds_name = "?", eps_km = 0.25, min_samples = 30L, max_n = 30000L, lat_col = "LAT_WGS84", lon_col = "LONG_WGS84")
df |
Incident-level data.frame. |
ds_name |
Tag for the result title. |
eps_km |
Neighbourhood radius in km. |
min_samples |
DBSCAN |
max_n |
Subsample cap to keep DBSCAN tractable. |
lat_col, lon_col
|
WGS84 column names. |
A named list with the per-cluster table, the count of noise points, and the largest-cluster size.
if (requireNamespace("dbscan", quietly = TRUE)) {
  set.seed(2026)
  df <- data.frame(
    LAT_WGS84 = c(rnorm(60, 43.65, 0.005), rnorm(60, 43.70, 0.005)),
    LONG_WGS84 = c(rnorm(60, -79.40, 0.005), rnorm(60, -79.38, 0.005))
  )
  morie_tps_dbscan_clusters(df, eps_km = 0.5, min_samples = 5L)
}
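The small-angle projection mentioned above can be sketched as a local equirectangular mapping. The helper name `project_km` and the constant of 111.32 km per degree are assumptions; the package's exact constants are not documented here.

```r
# Hypothetical helper: project WGS84 degrees to local km offsets so that a
# Euclidean radius (eps_km) is meaningful.
project_km <- function(lat, lon, km_per_deg = 111.32) {
  lat0 <- mean(lat)
  cbind(
    # longitude degrees shrink by cos(latitude) away from the equator
    x_km = (lon - mean(lon)) * km_per_deg * cos(lat0 * pi / 180),
    y_km = (lat - lat0) * km_per_deg
  )
}

xy <- project_km(lat = c(43.65, 43.66), lon = c(-79.40, -79.40))
gap_km <- unname(diff(xy[, "y_km"]))  # 0.01 degrees of latitude in km
```

With the optional dbscan package installed, the projected matrix could then be passed to `dbscan::dbscan(xy, eps = eps_km, minPts = min_samples)`.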
Cake-cutting in the FORWARD direction. Given a data.frame of per-
historical-hood counts (one row per hood_140, one or more numeric
count columns), splits each 140's count across its 158-children in
proportion to pct_140_in_158 / 100. For 1:1 hoods the count is
passed through unchanged. For splits the count is partitioned
(e.g. 140-75 Church-Yonge Corridor's 100 incidents become 59.24 in
158-168 Downtown Yonge East and 40.76 in 158-167 Church-Wellesley).
morie_tps_disaggregate_140_to_158(df, hood_140_col = "HOOD_140", count_cols = NULL, crosswalk = NULL)
df |
A |
hood_140_col |
Name of the 140-hood column. Default
"HOOD_140". |
count_cols |
Character vector of numeric count columns to
disaggregate. Default: every numeric column in |
crosswalk |
Optional pre-loaded crosswalk; defaults to
|
This assumes UNIFORM SPATIAL DENSITY of the underlying events
inside each 140-hood – which is the best you can do without per-
incident lat/lon. If you have lat/lon, prefer re-binning from
points (via sf::st_join against
morie_to_neighbourhoods("158", offline = FALSE)) over this
uniform-density approximation.
A data.frame with columns hood_158, name_158,
hood_140, all chosen count_cols (now per-158 fractional
counts), and pct_140_in_158 (the cake-cut weight applied).
df <- data.frame(HOOD_140 = c("075", "001"), incidents = c(100, 42))
morie_tps_disaggregate_140_to_158(df)
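The forward cake-cut can be illustrated with the Church-Yonge Corridor split quoted in the description. The crosswalk below is a hypothetical three-row stand-in for the bundled table; only the 59.24 / 40.76 split and the `pct_140_in_158` name come from the text.

```r
# Toy crosswalk (assumption): 140-hood 075 splits into 158-hoods 168 and 167.
xwalk_fwd <- data.frame(
  hood_140 = c("075", "075", "001"),
  hood_158 = c("168", "167", "001"),
  pct_140_in_158 = c(59.24, 40.76, 100),
  stringsAsFactors = FALSE
)
counts_140 <- data.frame(
  hood_140 = c("075", "001"),
  incidents = c(100, 42),
  stringsAsFactors = FALSE
)

# Each parent count is repeated per child, then scaled by the child's share.
parts <- merge(counts_140, xwalk_fwd, by = "hood_140")
parts$incidents_158 <- parts$incidents * parts$pct_140_in_158 / 100
```

The shares of each parent sum to 100, so the citywide total is preserved even though individual 158-level counts become fractional.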
Toronto amalgamated in 1998; the six former municipalities (Etobicoke, North York, Scarborough, York, Old Toronto, East York) are still in common use for district-level reporting and are bbox-defined here.
morie_tps_district_for_centroid(lat, lon)
lat |
Numeric latitude (WGS84). |
lon |
Numeric longitude (WGS84). |
A character scalar. Defaults to "Old Toronto" when no
bbox matches.
Walks the ArcGIS REST /query endpoint for the category's
FeatureServer layer, accumulates all features in memory, and
writes a single CSV file to cache_dir. Returns the CSV path.
morie_tps_fetch_category(category, cache_dir = NULL, where = "1=1", overwrite = FALSE, max_records_per_page = 2000L)
category |
One of |
cache_dir |
Directory to write the CSV into. Defaults to
|
where |
ArcGIS SQL |
overwrite |
If |
max_records_per_page |
Pagination size (server caps at 2000). |
Path to the written CSV file.
Thin wrapper over morie_tps_fetch_category(): writes the CSV
then reads it back. Mirrors the Python fetch_tps_dataframe
convenience used as a DATASET_CATALOG fetcher.
morie_tps_fetch_dataframe(category, ...)
category |
One of |
... |
Passed through to |
A data.frame.
Fits the OU parameters (theta, mu, sigma) on daily counts (same
OLS-on-first-differences as morie_tps_langevin_simulate),
then evolves an initial Gaussian density centred on the last
observation by an explicit advection-diffusion finite-difference
scheme with reflective boundaries on a grid spanning
from 0 to 1.5 * max(counts) + 1.
morie_tps_fokker_planck_grid(df, ds_name = "?", n_grid = 64L, n_steps = 200L)
df |
A |
ds_name |
Character label. |
n_grid |
Grid points (default 64). |
n_steps |
Time steps (default 200, each of length 0.05 days). |
A morie_rich_result list with theta,
mu, sigma, grid, density,
stationary_peak.
Returns Gi* per neighbourhood (count vector aggregated from the incident data.frame), using a binary k-NN spatial weights matrix with self-inclusion (Gi* convention). z-score interpretation: Gi* > 1.96 = significant hot spot at alpha = 0.05; Gi* < -1.96 = significant cold spot.
morie_tps_getis_ord_g_star(df, ds_name = "?", hood_col = "HOOD_158", k_neighbours = 5L, top_n = 20L, lat_col = "LAT_WGS84", lon_col = "LONG_WGS84")
df |
Incident-level data.frame. |
ds_name |
Tag for the result title. |
hood_col |
Neighbourhood id column. |
k_neighbours |
k for the (binary) k-NN weights graph. |
top_n |
Number of top hot/cold spots to surface. |
lat_col, lon_col
|
WGS84 column names. |
A named list with Gi* per hood, the top hot/cold spot tables, and tallies of hot/cold spots at alpha=0.05.
set.seed(2026)
df <- data.frame(
  HOOD_158 = sample(letters[1:20], 400, replace = TRUE),
  LAT_WGS84 = 43.6 + runif(400, 0, 0.2),
  LONG_WGS84 = -79.4 + runif(400, 0, 0.2)
)
morie_tps_getis_ord_g_star(df)
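The Gi* z-score with a binary, self-inclusive k-NN weights matrix can be sketched as follows. This uses the standard Getis-Ord Gi* formula on a toy one-dimensional layout; the helper name and data are hypothetical, and the package's tie-breaking and centroid handling may differ.

```r
# Hypothetical helper: Getis-Ord Gi* z-scores for counts x and weights W.
gi_star <- function(x, W) {
  n <- length(x)
  xbar <- mean(x)
  S <- sqrt(sum(x^2) / n - xbar^2)
  wsum <- rowSums(W)
  w2sum <- rowSums(W^2)
  num <- as.numeric(W %*% x) - xbar * wsum
  den <- S * sqrt((n * w2sum - wsum^2) / (n - 1))
  num / den
}

# Ten units on a line; the first three carry most of the incidents.
coords <- 1:10
x <- c(50, 48, 52, 5, 4, 6, 5, 4, 6, 5)
d <- abs(outer(coords, coords, "-"))
k <- 2
W <- matrix(0, 10, 10)
for (i in 1:10) {
  # self (distance 0) plus the k nearest neighbours, binary weights
  W[i, order(d[i, ])[seq_len(k + 1)]] <- 1
}
z <- gi_star(x, W)
hot <- which(z > 1.96)
```

Units whose window covers the high-count cluster exceed the 1.96 threshold, which is the hot-spot rule stated in the description.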
G = 0 means incidents are spread perfectly evenly; G = 1 means they are perfectly concentrated. Used for the neighbourhood-concentration callable below; exposed because tests and downstream code may want it directly.
morie_tps_gini_concentration(x)
x |
Numeric vector (e.g. per-spatial-unit incident counts). |
A scalar Gini coefficient in [0, 1] (or NA when input is empty).
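A Gini coefficient over per-unit counts can be sketched with the sorted-rank formula. The helper name is hypothetical, and this version applies no small-sample correction, so its maximum for n units is (n - 1) / n; whether the package rescales to reach exactly 1 is not stated here.

```r
# Hypothetical helper: Gini coefficient of a non-negative numeric vector.
gini_sketch <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  if (n == 0 || sum(x) == 0) return(NA_real_)
  i <- seq_len(n)
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

g_even <- gini_sketch(c(5, 5, 5, 5))     # identical counts
g_conc <- gini_sketch(c(0, 0, 0, 10))    # everything in one unit
g_empty <- gini_sketch(numeric(0))       # empty input
```

Sorting first makes the result independent of the order in which neighbourhoods are supplied.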
R port of morie.tps_hawkes_advanced. Implements the Kwan,
Chen and Dunsmuir (2024, arXiv:2408.09710v1) methodology for Hawkes
process likelihood inference when the baseline intensity is
time-varying and the excitation kernel is non-exponential
(so the intensity process is non-Markovian).
The complete intensity is
$\lambda(t) = \mu(t) + \sum_{t_i < t} g(t - t_i)$,
with kernel decomposition $g(u) = \kappa\, h(u)$, where
$\kappa = \int_0^\infty g(u)\, du$ is the branching ratio (mean offspring per
event) and $h$ is a probability density on
$(0, \infty)$. Stationarity requires $\kappa < 1$.
Supported kernels: exponential, gamma, Weibull, Lomax (Pareto-II). Supported baselines: constant and sinusoidal-with-trend.
Companion to morie_tps_hawkes_temporal_fit (exponential /
constant Markovian special case) in morie.tps_stochastic.
Goodness-of-fit uses time-rescaling residuals (Brown et al. 2002 Neural Comput. 14: 325-346) and a Kolmogorov-Smirnov test against Uniform(0,1).
If the optional R package hawkes or emhawkes is installed it is consulted for the exponential-kernel constant-baseline fast path; otherwise the negative log-likelihood is computed in base R via direct O(n^2) summation. The non-Markovian kernels (gamma, Weibull, Lomax) always use the base-R path – those kernels lack the memorylessness required for O(n) recursion.
morie_tps_hawkes_advanced_fit – fit one
(kernel, baseline) combination and produce a rich result with
time-rescaling KS diagnostics.
morie_tps_compare_hawkes_kernels – 8-way AIC
comparison across (kernel, baseline) combinations.
morie_tps_hawkes_markovian_vs_nonmarkovian –
focused 2x2 comparison: classical exp/const vs gamma/
sinusoidal.
Kwan TKJ, Chen F, Dunsmuir WTM (2024). Likelihood inference for non-stationary Hawkes processes. arXiv:2408.09710v1.
Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation 14: 325-346.
Mohler GO, Short MB, Brantingham PJ, Schoenberg FP, Tita GE (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association 106: 100-108.
Companion to the Markovian exponential / constant fit in
morie_tps_hawkes_temporal_fit. Supports the four kernels
(exponential, gamma, Weibull, Lomax) and two baselines (constant,
sinusoidal) of Kwan-Chen-Dunsmuir (2024).
morie_tps_hawkes_advanced_fit(df, kernel = "gamma", baseline = "sinusoidal", ds_name = "?", max_n = 5000L)
df |
A data frame with an |
kernel |
Excitation kernel: one of |
baseline |
Baseline kind: |
ds_name |
Dataset name used in titles and warnings. |
max_n |
Maximum number of events to retain (for tractable O(n^2) MLE on the non-Markovian path). |
If the optional packages hawkes or emhawkes are available the (exponential, constant) special case can delegate to their compiled likelihood routines; the non-Markovian kernels always use the base-R O(n^2) negative log-likelihood with L-BFGS-B optimisation under explicit box constraints.
Goodness-of-fit is reported via time-rescaling residuals (Brown et al. 2002) and a Kolmogorov-Smirnov test against Uniform(0, 1).
A morie_rich_result with branching ratio,
stationarity verdict, kernel and baseline parameters,
negative log-likelihood, AIC, BIC, and time-rescaling KS
statistic.
Kwan TKJ, Chen F, Dunsmuir WTM (2024). Likelihood inference for non-stationary Hawkes processes. arXiv:2408.09710v1.
## Not run:
df <- morie_tps_load_tps_dataset("Assault", nrows = 4000)
rr <- morie_tps_hawkes_advanced_fit(df, kernel = "gamma", baseline = "sinusoidal", ds_name = "Assault")
print(rr$summary_lines)
## End(Not run)
Fits the four (kernel, baseline) combinations of the 2x2 sub-grid spanned by the two endpoints of the Kwan-Chen-Dunsmuir framework: the classical exponential / constant Markovian model and the gamma / sinusoidal non-Markovian model. Faster to run than the full 8-way comparison and suitable for dashboard surfaces.
morie_tps_hawkes_markovian_vs_nonmarkovian(df, ds_name = "?", max_n = 4000L)
df |
Data frame with |
ds_name |
Dataset name used in titles. |
max_n |
Maximum events to fit. |
A morie_rich_result from
morie_tps_compare_hawkes_kernels restricted to the 2x2
sub-grid.
Kwan TKJ, Chen F, Dunsmuir WTM (2024). arXiv:2408.09710.
## Not run:
df <- morie_tps_load_tps_dataset("Assault", nrows = 2000)
rr <- morie_tps_hawkes_markovian_vs_nonmarkovian(df, ds_name = "Assault")
## End(Not run)
Maximum-likelihood fit of a temporal-only exponential Hawkes
process to incident times. Optimisation runs in base R
(stats::optim, Nelder-Mead). Reports background rate mu,
branching ratio kappa, decay omega, and the AIC / BIC of the fit.
morie_tps_hawkes_temporal_fit(df, ds_name = "?", max_n = 5000L)
df |
A |
ds_name |
Character label for the dataset. |
max_n |
Maximum number of incident times to fit (random subsample seeded with 42 if exceeded). |
A morie_rich_result list with mu,
kappa, omega, branching, nll,
aic, bic.
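The maximum-likelihood fit described above minimises the negative log-likelihood of an exponential Hawkes process. The sketch below assumes the parameterisation lambda(t) = mu + kappa * omega * sum(exp(-omega * (t - t_i))), so that kappa is the branching ratio; the helper name, starting values, and synthetic event times are all hypothetical, and the package's internals may differ.

```r
# Hypothetical helper: O(n^2) negative log-likelihood on the window [0, T_end].
hawkes_exp_nll <- function(par, times, T_end) {
  mu <- par[1]; kappa <- par[2]; omega <- par[3]
  if (mu <= 0 || kappa < 0 || omega <= 0) return(Inf)
  n <- length(times)
  lam <- numeric(n)
  for (i in seq_len(n)) {
    past <- times[seq_len(i - 1)]  # events strictly before event i
    lam[i] <- mu + kappa * omega * sum(exp(-omega * (times[i] - past)))
  }
  # Integrated intensity (compensator) over the observation window
  compensator <- mu * T_end + kappa * sum(1 - exp(-omega * (T_end - times)))
  -(sum(log(lam)) - compensator)
}

set.seed(42)
times <- sort(runif(80, 0, 100))  # synthetic event times (assumption)
start <- c(0.5, 0.3, 1)
fit <- stats::optim(start, hawkes_exp_nll, times = times, T_end = 100,
                    method = "Nelder-Mead")
k <- 3
aic <- 2 * fit$value + 2 * k
```

Setting kappa to zero collapses the model to a homogeneous Poisson process, which gives a simple check on the likelihood code.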
Three-strategy replicator dynamics (cooperator C, defector / predator P, punisher / inspector O) swept across a grid in the (temptation T, inspection cost gamma) plane. Each grid point runs the replicator update to steady state and records the defector share as a "crime rate" proxy. Reproduces the qualitative phase diagram from D'Orsogna & Perc (2015) sec. 5 / Fig. 8.
morie_tps_inspection_game_phase(n_temptations = 20L, n_costs = 20L, n_steps = 600L, save_fig = TRUE)
n_temptations, n_costs
|
Grid resolution. |
n_steps |
Replicator iterations per grid point. |
save_fig |
Whether to write the phase-diagram PNG. |
A morie_rich_result containing the mean, min, max
steady-state defector frequency across the grid, plus the
resolution and step count used.
Helbing D, Szolnoki A, Perc M, Szabo G (2010). Punish, but not too hard. New Journal of Physics 12: 083005.
## Not run:
rr <- morie_tps_inspection_game_phase(n_temptations = 8L, n_costs = 8L, n_steps = 120L, save_fig = FALSE)
print(rr$summary_lines)
## End(Not run)
Evaluates a Gaussian KDE on incident lat/long and returns summary
statistics plus the (lat, lon) of the densest observation. Prefers
MASS::kde2d when available; otherwise uses a pure
base-R Gaussian kernel evaluated at the observation points (i.e.
kernel density at each datum).
morie_tps_kde_density(df, bandwidth = 0.005, ds_name = "?", lat_col = "LAT_WGS84", lon_col = "LONG_WGS84")
df |
Incident-level data.frame. |
bandwidth |
Bandwidth multiplier passed to the 2-D KDE; see
|
ds_name |
Tag for the result title. |
lat_col, lon_col
|
WGS84 column names. |
A named list with summary stats including max/mean/median density and the (lat, lon) of the densest observation.
set.seed(2026)
df <- data.frame(
  LAT_WGS84 = 43.6 + rnorm(120, 0, 0.05),
  LONG_WGS84 = -79.4 + rnorm(120, 0, 0.05)
)
morie_tps_kde_density(df, bandwidth = 0.01)
Fits an OU process
$dX_t = \theta (\mu - X_t)\, dt + \sigma\, dW_t$
to daily incident counts via OLS on first-differences, then runs
n_paths forward simulations of length T_days.
morie_tps_langevin_simulate(df, ds_name = "?", n_paths = 100L, T_days = 365L, dt = 1, seed = 42L)
df |
A |
ds_name |
Character label. |
n_paths |
Number of forward paths to simulate. |
T_days |
Forward horizon in days. |
dt |
Time step (days). |
seed |
RNG seed. |
A morie_rich_result list with theta,
mu, sigma, paths (matrix of n_paths x
n_steps), and final-day quantiles.
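A minimal sketch of the two steps named above, assuming the OU form dX = theta * (mu - X) dt + sigma dW. The helpers `fit_ou` and `simulate_ou` are hypothetical and the data are synthetic; the package's own estimator may differ in detail (for example in how the residual scale is computed).

```r
# Hypothetical sketch: fit dX = theta * (mu - X) dt + sigma dW by OLS on
# first differences, then simulate forward with Euler-Maruyama.
fit_ou <- function(x, dt = 1) {
  dx <- diff(x)
  x_lag <- x[-length(x)]
  fit <- stats::lm(dx ~ x_lag)
  b <- unname(stats::coef(fit))
  theta <- -b[2] / dt               # slope = -theta * dt
  mu <- -b[1] / b[2]                # intercept = theta * mu * dt
  sigma <- stats::sd(stats::residuals(fit)) / sqrt(dt)
  list(theta = theta, mu = mu, sigma = sigma)
}

simulate_ou <- function(par, x0, n_paths = 100L, n_steps = 365L, dt = 1) {
  paths <- matrix(NA_real_, nrow = n_paths, ncol = n_steps)
  x <- rep(x0, n_paths)
  for (k in seq_len(n_steps)) {
    # One Euler-Maruyama step applied to all paths at once.
    x <- x + par$theta * (par$mu - x) * dt +
      par$sigma * sqrt(dt) * stats::rnorm(n_paths)
    paths[, k] <- x
  }
  paths
}

set.seed(42)
# Synthetic "daily counts" generated from a known OU process.
true_par <- list(theta = 0.2, mu = 50, sigma = 3)
obs <- as.numeric(simulate_ou(true_par, x0 = 50, n_paths = 1L, n_steps = 2000L))
est <- fit_ou(obs)
paths <- simulate_ou(est, x0 = obs[length(obs)], n_paths = 100L, n_steps = 365L)
stats::quantile(paths[, 365], c(0.05, 0.5, 0.95))
```

The regression of the first difference on the lagged level gives theta from the slope and mu from the ratio of intercept to slope, which is why a near-zero slope (weak mean reversion) makes mu unstable.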
Each layer's /query endpoint is appended at request time.
Folded into the morie_tps_layer_urls() Rd via @rdname to
avoid the case-insensitive filesystem collision between
MORIE_TPS_LAYER_URLS.Rd and morie_tps_layer_urls.Rd.
morie_tps_layer_urls()
MORIE_TPS_LAYER_URLS
An object of class character of length 9.
Named character vector mapping TPS category names to ArcGIS FeatureServer layer roots.
urls <- morie_tps_layer_urls()
names(urls)   # categories: Assault, AutoTheft, Homicide, ...
length(urls)  # number of layers
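Computes the Hill maximum-likelihood estimator of the upper-tail
Pareto exponent of the step-length distribution between
chronologically consecutive incidents, following Brockmann,
Hufnagel & Geisel (2006). For a power-law tail $P(L > \ell) \propto \ell^{-\alpha}$ on $\ell \ge \ell_{\min}$, the Hill MLE is
$$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \ln(\ell_i / \ell_{\min})},$$
where $n$ is the number of step lengths with $\ell_i \ge \ell_{\min}$.
The standard error is obtained from 200 nonparametric bootstrap resamples.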
morie_tps_levy_flight_alpha(
  category = "Assault",
  sample_rows = 30000L,
  lmin_km = 0.5,
  save_fig = TRUE
)
category |
TPS category name. |
sample_rows |
Maximum rows to load. |
lmin_km |
Lower tail cutoff in km. |
save_fig |
Whether to emit a log-log empirical-vs-fit PNG. |
A morie_rich_result with $\hat{\alpha}$, its
bootstrap SE, sample-size diagnostics, and a Lévy-regime
interpretation.
Brockmann D, Hufnagel L, Geisel T (2006). The scaling laws of human travel. Nature 439: 462-465.
## Not run:
rr <- morie_tps_levy_flight_alpha("Assault", save_fig = FALSE)
print(rr$summary_lines$alpha)
## End(Not run)
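The Hill estimate and its bootstrap standard error can be sketched in a few lines of base R. The helper `hill_alpha` is hypothetical, the step lengths are synthetic Pareto draws rather than TPS data, and the tail convention P(L > l) proportional to l^(-alpha) is an assumption about how the package parametrises the exponent.

```r
# Hypothetical helper: Hill estimator of the tail exponent alpha for
# P(L > l) ~ l^(-alpha) on l >= lmin, with a nonparametric bootstrap SE.
hill_alpha <- function(steps, lmin, n_boot = 200L) {
  tail_steps <- steps[is.finite(steps) & steps >= lmin]
  stopifnot(length(tail_steps) >= 2L)
  hill <- function(l) length(l) / sum(log(l / lmin))
  boot <- replicate(n_boot, hill(sample(tail_steps, replace = TRUE)))
  list(alpha = hill(tail_steps), se = stats::sd(boot),
       n_tail = length(tail_steps))
}

set.seed(2026)
# Synthetic Pareto step lengths (km) with true alpha = 1.5 and lmin = 0.5,
# drawn by inverse-transform sampling.
steps <- 0.5 * stats::runif(5000)^(-1 / 1.5)
fit <- hill_alpha(steps, lmin = 0.5)
c(alpha = fit$alpha, se = fit$se)
```

The estimate is sensitive to the cutoff, so it is worth rerunning with a few values of `lmin` and checking that the exponent is stable.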
List TPS categories known to the fetcher.
morie_tps_list_categories()
Character vector of category names, sorted.
Returns one row per registered category with columns name,
description, and primary_date.
morie_tps_list_datasets()
A data.frame sorted by name.
morie_tps_list_datasets()
Formats whose sibling directory or file is not present on disk are omitted from the returned named character vector. Use this to discover which formats a given category actually exports.
morie_tps_list_formats(name)
name |
TPS category. Case-insensitive. |
Named character vector (format -> file path).
Loads name in the given format. Mirror of Python's morie.tps_io.load_tps.
The csv and excel formats work with base R / readxl; all spatial formats
(geojson, featurecollection, kml, geopackage, sqlitegeodatabase,
shapefile, filegeodatabase) are gated behind
requireNamespace("sf") and surface a clean install message if
the dependency is missing.
morie_tps_load(name, format = "csv", nrows = NULL)
name |
TPS category. Case-insensitive. |
format |
One of MORIE_TPS_SUPPORTED_FORMATS. |
nrows |
Optional integer cap on rows. |
A data.frame (spatial readers return a plain data frame with the sf
class dropped; the geometry column is preserved as an sfc).
name is case-insensitive. Pass nrows = N for a quick sample
while developing against the largest tables.
morie_tps_load_dataset(name, path = NULL, csv_filename = NULL, nrows = NULL)
name |
Character scalar. One of |
path |
Optional character scalar. Override the CSV file or
directory to load from. If a directory, the first |
csv_filename |
Optional filename inside the category's |
nrows |
Optional integer. Cap on rows to load. |
For non-CSV sibling formats (Excel, GeoJSON, KML, GeoPackage,
Shapefile, etc.), use morie_tps_load() from tps_io.R instead.
A data.frame (the CSV contents) with tolerant
OCCURRENCE_* / REPORTED_* column renaming applied.
## Not run:
df <- morie_tps_load_dataset("Assault", nrows = 1000L)
## End(Not run)
Computes local Moran's I_i for each neighbourhood given a k-NN spatial weights graph on centroid lat/long, with HH / LL / HL / LH quadrant classification.
morie_tps_local_morans_i(
  df,
  hood_col = "HOOD_158",
  ds_name = "?",
  k_neighbours = 5L,
  top_n = 20L,
  lat_col = "LAT_WGS84",
  lon_col = "LONG_WGS84"
)
df |
Incident-level data.frame. |
hood_col |
Neighbourhood id column. |
ds_name |
Tag for the result title. |
k_neighbours |
k for the spatial weights graph. |
top_n |
Number of top-I_i rows to surface in the result table. |
lat_col, lon_col
|
WGS84 column names. |
A named list with table (data.frame of per-hood
I_i, z, Wz, quadrant) and quadrant tallies.
set.seed(2026)
df <- data.frame(
  HOOD_158 = sample(letters[1:15], 300, replace = TRUE),
  LAT_WGS84 = 43.6 + runif(300, 0, 0.2),
  LONG_WGS84 = -79.4 + runif(300, 0, 0.2)
)
morie_tps_local_morans_i(df, top_n = 5L)
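For readers who want to see the computation behind the table columns (I_i, z, Wz, quadrant), here is a minimal base-R sketch. The helpers `knn_weights` and `local_moran` are hypothetical, the coordinates are treated as planar, and the standardisation with the 1/n variance is an assumption about the package's convention.

```r
# Hypothetical helper: row-standardised k-NN weights from planar coordinates.
knn_weights <- function(x, y, k = 5L) {
  n <- length(x)
  d <- as.matrix(stats::dist(cbind(x, y)))
  diag(d) <- Inf                        # a unit is never its own neighbour
  w <- matrix(0, n, n)
  for (i in seq_len(n)) w[i, order(d[i, ])[seq_len(k)]] <- 1 / k
  w
}

# Local Moran's I_i with HH / LL / HL / LH quadrant labels.
local_moran <- function(v, w) {
  dev <- v - mean(v)
  z <- dev / sqrt(mean(dev^2))          # standardise with the 1/n variance
  wz <- as.numeric(w %*% z)             # spatial lag of z
  quadrant <- ifelse(z >= 0,
                     ifelse(wz >= 0, "HH", "HL"),
                     ifelse(wz >= 0, "LH", "LL"))
  data.frame(I_i = z * wz, z = z, Wz = wz, quadrant = quadrant)
}

set.seed(2026)
n <- 40
x <- runif(n); y <- runif(n)               # stand-ins for hood centroids
counts <- rpois(n, lambda = 20 + 30 * x)   # counts trend along x
w <- knn_weights(x, y, k = 5L)
tab <- local_moran(counts, w)
table(tab$quadrant)
```

With row-standardised weights the mean of the I_i values equals the global Moran's I, which is a convenient consistency check on any implementation.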
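Treats yearly category counts as the prey $x$ and a 3-year
rolling mean as a placeholder predator $y$ (TPS does not yet
expose a public mass-stop / use-of-force time series). Under the
classical Lotka-Volterra system,
$$\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y,$$
the small-amplitude oscillation around the equilibrium
$(x^*, y^*) = (\gamma/\delta,\ \alpha/\beta)$ has period
$T = 2\pi / \sqrt{\alpha\gamma}$. The growth rate $\alpha$ is
estimated from log-differences of x; $\gamma$ symmetrically
from y; the interaction rates $\beta$ and $\delta$ follow from
the equilibrium relations.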
morie_tps_lotka_volterra_police_crime(category = "Assault", save_fig = TRUE)
category |
TPS category name. |
save_fig |
Whether to write a yearly time-series PNG. |
A morie_rich_result with the four LV parameters, the
linearised cycle period, the year range, and a qualitative
interpretation.
D'Orsogna MR, Perc M (2015). Statistical physics of crime: A review. Physics of Life Reviews 12: sec. 3.4.
## Not run:
rr <- morie_tps_lotka_volterra_police_crime("Assault", save_fig = FALSE)
print(rr$summary_lines)
## End(Not run)
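The link between the rates, the equilibrium, and the linearised period can be illustrated with a short sketch. It assumes the textbook parametrisation dx/dt = alpha*x - beta*x*y, dy/dt = delta*x*y - gamma*y; the helper `lv_from_rates`, the symbol names, and the input numbers are all hypothetical, and the sketch takes the two growth rates as given rather than reproducing the package's log-difference estimator.

```r
# Hypothetical helper: Lotka-Volterra interaction rates and linearised period.
# Model assumed: dx/dt = alpha*x - beta*x*y, dy/dt = delta*x*y - gamma*y.
lv_from_rates <- function(alpha, gamma, x_eq, y_eq) {
  stopifnot(alpha > 0, gamma > 0, x_eq > 0, y_eq > 0)
  list(
    alpha = alpha, gamma = gamma,
    beta = alpha / y_eq,     # from the equilibrium y* = alpha / beta
    delta = gamma / x_eq,    # from the equilibrium x* = gamma / delta
    period = 2 * pi / sqrt(alpha * gamma)
  )
}

# Illustrative numbers only: rates per year, equilibrium levels in counts.
lv <- lv_from_rates(alpha = 0.4, gamma = 0.1, x_eq = 20000, y_eq = 19500)
lv$period   # in the same time unit as the rates (years here)
```

Because the placeholder predator is a rolling mean of the prey series, any cycle period obtained this way is descriptive rather than evidence of a genuine predator-prey mechanism.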
Loops morie_tps_polygon_morans_i over a grid of
value-column prefixes and years, returning the resulting matrix
of Moran's I values for downstream visualisation as a heatmap.
morie_tps_moran_sweep_heatmap(
  polygons,
  category_prefixes = NULL,
  years = NULL,
  k_neighbours = 5L,
  ds_name = "NeighbourhoodCrimeRates",
  centroid_lat_col = "lat",
  centroid_lon_col = "lon"
)
polygons |
An sf object or data.frame with centroid columns and per-year value columns. |
category_prefixes |
Character vector of column prefixes. Defaults to the 9 published TPS rate categories. |
years |
Integer vector of years. Defaults to 2014:2024. |
k_neighbours |
k for the k-NN weights graph passed down. |
ds_name |
Tag for the result title. |
centroid_lat_col, centroid_lon_col
|
Centroid column names
forwarded to |
Column names are constructed as paste0(prefix, "_", year).
A named list with the (category x year) Moran's I matrix.
set.seed(2026)
polys <- data.frame(
  HOOD_ID = letters[1:16],
  lat = rep(43.6 + (0:3) * 0.02, 4),
  lon = rep(-79.4 + (0:3) * 0.02, each = 4),
  ASSAULT_RATE_2023 = rpois(16, 30),
  ASSAULT_RATE_2024 = rpois(16, 32),
  HOMICIDE_RATE_2023 = rpois(16, 2),
  HOMICIDE_RATE_2024 = rpois(16, 2)
)
morie_tps_moran_sweep_heatmap(
  polys,
  category_prefixes = c("ASSAULT_RATE", "HOMICIDE_RATE"),
  years = c(2023L, 2024L),
  centroid_lat_col = "lat",
  centroid_lon_col = "lon"
)
Builds a k-NN spatial weights matrix from neighbourhood centroids (mean LAT/LONG of incidents in each hood) and computes the global Moran's I on the count vector. The Cliff-Ord normal-assumption variance is used for the z-score and two-sided p-value.
morie_tps_morans_i_neighbourhood(
  df,
  hood_col = "HOOD_158",
  ds_name = "?",
  k_neighbours = 5L,
  lat_col = "LAT_WGS84",
  lon_col = "LONG_WGS84",
  use_spdep = FALSE
)
df |
Incident-level data.frame. |
hood_col |
Character. Neighbourhood id column (default
|
ds_name |
Character. Tag for the result title. |
k_neighbours |
k for the k-NN spatial weights graph (default 5). |
lat_col, lon_col
|
WGS84 column names (default
|
use_spdep |
If |
A named list with classes morie_tps_spatial_result,
morie_rich_result, list. Numeric outputs include
moran_I, expected_I, var_I, z_score,
p_value, n.
set.seed(2026)
n_inc <- 400
df <- data.frame(
  HOOD_158 = sample(letters[1:20], n_inc, replace = TRUE),
  LAT_WGS84 = 43.6 + runif(n_inc, 0, 0.2),
  LONG_WGS84 = -79.4 + runif(n_inc, 0, 0.2)
)
morie_tps_morans_i_neighbourhood(df)
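The numeric outputs listed above (moran_I, expected_I, var_I, z_score, p_value) can be reproduced in principle with the standard global Moran's I and its variance under the normality assumption. The sketch below is a hypothetical base-R version, not the package's code; `global_moran` and the synthetic centroids are illustrative only.

```r
# Hypothetical helper, not the rmorie implementation: global Moran's I
# with the variance under the normality assumption (Cliff and Ord).
global_moran <- function(v, w) {
  n <- length(v)
  dev <- v - mean(v)
  s0 <- sum(w)
  s1 <- 0.5 * sum((w + t(w))^2)
  s2 <- sum((rowSums(w) + colSums(w))^2)
  moran <- (n / s0) * sum(dev * as.numeric(w %*% dev)) / sum(dev^2)
  expected <- -1 / (n - 1)
  variance <- (n^2 * s1 - n * s2 + 3 * s0^2) / ((n^2 - 1) * s0^2) - expected^2
  z <- (moran - expected) / sqrt(variance)
  list(moran_I = moran, expected_I = expected, var_I = variance,
       z_score = z, p_value = 2 * stats::pnorm(-abs(z)), n = n)
}

set.seed(2026)
n <- 30
x <- runif(n); y <- runif(n)               # stand-ins for hood centroids
counts <- rpois(n, lambda = 20 + 30 * x)   # counts trend along x

# Row-standardised 5-nearest-neighbour weights.
d <- as.matrix(stats::dist(cbind(x, y)))
diag(d) <- Inf
w <- matrix(0, n, n)
for (i in seq_len(n)) w[i, order(d[i, ])[1:5]] <- 1 / 5

res <- global_moran(counts, w)
unlist(res)
```

A k-NN graph is generally asymmetric, which is why the variance terms use w + t(w) and row plus column sums rather than assuming a symmetric matrix.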
Uses HOOD_158 (the 158-neighbourhood scheme) and reports a Gini coefficient plus the cumulative share of incidents in the top-10 and top-20 neighbourhoods.
morie_tps_neighbourhood_concentration(df, ds_name = "?")
df |
A TPS crime data.frame. |
ds_name |
Optional dataset label used in the result title. |
A morie_tps_result list with payload$gini,
payload$n_hoods, payload$p_top10, payload$p_top20.
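As an illustration of the quantities in the payload, the following sketch computes a Gini coefficient and top-k shares from a vector of neighbourhood ids. `hood_concentration` is a hypothetical helper, the ids are synthetic, and returning the shares as fractions (rather than percentages) is an assumption.

```r
# Hypothetical helper: Gini coefficient and top-k shares of incident
# counts per neighbourhood.
hood_concentration <- function(hood) {
  counts <- sort(as.numeric(table(hood)))        # ascending counts
  n <- length(counts)
  # Rank-based Gini formula on the sorted counts.
  gini <- sum((2 * seq_len(n) - n - 1) * counts) / (n * sum(counts))
  top_share <- function(k) sum(utils::tail(counts, k)) / sum(counts)
  list(gini = gini, n_hoods = n,
       p_top10 = top_share(10), p_top20 = top_share(20))
}

set.seed(2026)
# Synthetic HOOD_158-style ids with skewed incident probabilities.
hood <- sample(sprintf("%03d", 1:158), 5000, replace = TRUE,
               prob = (1:158)^2)
conc <- hood_concentration(hood)
conc
```

Neighbourhoods with zero incidents do not appear in the table, so this sketch measures concentration among observed neighbourhoods only.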
Top-20 OFFENCE / UCR_CODE / CSI_CATEGORY tables (whichever are present).
morie_tps_offence_summary(df, ds_name = "?")
df |
A TPS crime data.frame. |
ds_name |
Optional dataset label used in the result title. |
A morie_tps_result named list.
Accepts an sf object (recommended) carrying neighbourhood
polygons and a numeric value column, computes polygon centroids
via sf::st_centroid, then runs Moran's I with a k-NN row-
standardised weights matrix on those centroids. Falls back to a
data.frame carrying precomputed centroid columns when
sf is unavailable.
morie_tps_polygon_morans_i(
  polygons,
  value_col,
  ds_name = "NeighbourhoodCrimeRates",
  k_neighbours = 5L,
  centroid_lat_col = "lat",
  centroid_lon_col = "lon"
)
polygons |
An sf object, or a data.frame with centroid columns. |
value_col |
Column to test for spatial autocorrelation. |
ds_name |
Tag for the result title. |
k_neighbours |
k for the k-NN weights graph. |
centroid_lat_col, centroid_lon_col
|
Names of the centroid
columns when |
A named list with moran_I, z_score,
p_value, n.
set.seed(2026)
polys <- data.frame(
  HOOD_ID = letters[1:16],
  lat = rep(43.6 + (0:3) * 0.02, 4),
  lon = rep(-79.4 + (0:3) * 0.02, each = 4),
  ASSAULT_RATE_2024 = rpois(16, 30)
)
morie_tps_polygon_morans_i(
  polys,
  value_col = "ASSAULT_RATE_2024",
  centroid_lat_col = "lat",
  centroid_lon_col = "lon"
)
Converts a column name into a display label, for example
ASSAULT_RATE_2024 -> "Assault rate * 2024". Replaces underscores
and normalises case so plot titles read like prose.
morie_tps_pretty_label(s)
s |
Character scalar (column name). |
Character scalar (display label).
Equirectangular projection centred at (lat_c, lon_c), then rotated
rot_deg_cw degrees clockwise (default 17.5). Returns
kilometres east-of-centre (after rotation) and kilometres
north-of-centre (after rotation).
morie_tps_project_xy(
  lat,
  lon,
  rot_deg_cw = .MORIE_TPS_ROT_DEG_CW,
  lat_c = .MORIE_TPS_LAT_C,
  lon_c = .MORIE_TPS_LON_C
)
lat |
Numeric vector of latitudes (WGS84 degrees). |
lon |
Numeric vector of longitudes (WGS84 degrees). |
rot_deg_cw |
Clockwise rotation in degrees (default 17.5). |
lat_c, lon_c
|
Centre-point of the projection (default downtown Toronto: 43.70 N, 79.40 W). |
Clockwise convention: positive rot_deg_cw rotates the map
so a line that previously sloped up-right slopes less (or
down-right).
A named list with numeric vectors x (km east of centre)
and y (km north of centre).
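The projection and rotation described above can be sketched as follows. This is a hypothetical re-implementation for illustration: `project_xy`, the Earth radius, and the hard-coded defaults (17.5 degrees, 43.70 N, 79.40 W, taken from the argument descriptions) are assumptions and may not match the package's internal constants.

```r
# Hypothetical re-implementation; the Earth radius and the exact
# projection constants used by rmorie are assumptions.
project_xy <- function(lat, lon, rot_deg_cw = 17.5,
                       lat_c = 43.70, lon_c = -79.40) {
  r_km <- 6371.0088                         # mean Earth radius in km
  rad <- pi / 180
  # Equirectangular projection about the centre point.
  x <- r_km * (lon - lon_c) * rad * cos(lat_c * rad)
  y <- r_km * (lat - lat_c) * rad
  # Clockwise rotation by phi.
  phi <- rot_deg_cw * rad
  list(x = x * cos(phi) + y * sin(phi),
       y = -x * sin(phi) + y * cos(phi))
}

# The centre point, and a point 0.1 degrees due north of it.
p <- project_xy(lat = c(43.70, 43.80), lon = c(-79.40, -79.40))
p
```

After a clockwise rotation a point due north of the centre lands up and to the right, which matches the stated convention that an up-right line slopes less steeply after rotation.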
List the TPS PSDP layers wrapped by morie
morie_tps_psdp_layers()
A data.frame with columns layer_key, label,
arcgis_url, fixture, hub_id (3TT+ canonical id matching
the TPS Hub catalog).
A named list of one-row metadata records keyed by canonical
category name. Each entry holds description, primary_date
(canonical date column name), and has_geometry (whether
LAT/LONG WGS84 columns are expected).
MORIE_TPS_REGISTRY
An object of class list of length 13.
R-side port of morie.tps_render. Carries the two design
rules from the Python module (per the author, 2026-05-07):
No floating neighbourhood text labels on the map – hot-spot
identification is delivered via the morie_tps_* result
tables, not via on-canvas text.
Map is rotated approximately 17.5 degrees clockwise in projected space so Lake Ontario's shoreline sits level horizontally – matching the Sigar Li 2022 "Hotspot Policing for the City of Toronto" poster aesthetic and the Hohl 2024 ALMI homicide-cluster map.
Plotting back-ends are gated behind ggplot2; without it, the
callables fall back to base plot(). Heavy panels
(kernel-density, LISA, Getis-Ord, Kulldorff scan) that depend on
the Python TPS spatial modules are intentionally not ported here:
the projection + base choropleth / point-pattern primitives below
are enough for the empirical paper's figures.
morie_tps_project_xy: degrees -> rotated planar km.
morie_tps_pretty_label: ASSAULT_RATE_2024 ->
"Assault rate * 2024".
morie_tps_district_for_centroid: lat/lon ->
pre-1998 borough name.
morie_tps_render_choropleth: polygon choropleth
(ggplot2 if available, else base R).
morie_tps_render_points: point-pattern map
(incident dots + optional DBSCAN colouring).
morie_tps_render_yearly_grid: small-multiples
by year.
Renders a sequential-colour choropleth from a polygon data.frame
carrying one row per neighbourhood with a list-column of WGS84
(lon, lat) rings (geometry) and a numeric rate_col.
This signature deliberately matches what
rmorie::morie_fetch("https://.../NeighbourhoodCrimeRates...",
format = "geojson") returns once unrolled.
morie_tps_render_choropleth(
  polys,
  rate_col = "ASSAULT_RATE_2024",
  title = NULL,
  cmap = "YlOrRd",
  outfile = NULL,
  fig_w = 12,
  fig_h = 7,
  show_ids = TRUE,
  border_color = "#1a1a1a",
  border_lw = 0.7
)
polys |
A |
rate_col |
Name of the metric column. Default |
title |
Plot title; defaults to a Hohl-2024-style auto-label. |
cmap |
Sequential colour palette name (default |
outfile |
Path to write the image ( |
fig_w, fig_h
|
Figure size in inches. |
show_ids |
When |
border_color |
Polygon edge colour. |
border_lw |
Polygon edge linewidth. |
When ggplot2 is available the render uses
geom_polygon + scale_fill_distiller; otherwise the
base R polygon() primitive is used.
A ggplot object (when ggplot2 is loaded) or
invisible(NULL) for the base-R fallback; the file path is
returned invisibly when outfile is supplied.
Runs DBSCAN on rotated-km coordinates and colours points by cluster label, with noise rendered grey. Requires the suggested dbscan package; without it a base-graphics single-colour fallback is drawn.
morie_tps_render_dbscan(
  points_df,
  eps_km = 0.5,
  min_samples = 8L,
  outfile = NULL,
  ...
)
points_df |
data.frame with columns |
eps_km |
DBSCAN epsilon in kilometres. |
min_samples |
Minimum samples per cluster. |
outfile |
Optional output path. |
... |
Extra plotting args (size, alpha, palette). |
ggplot object or invisible NULL.
Renders one centroid-anchored circle per polygon row, sized
proportionally to count_col. Useful for showing per-district
incident counts without colour-coding the polygons themselves.
morie_tps_render_district_proportional(
  polys,
  count_col,
  max_radius_km = 3,
  outfile = NULL
)
polys |
data.frame with one row per polygon, including a
|
count_col |
Name of the numeric column. |
max_radius_km |
Largest symbol radius in km. |
outfile |
Optional output path. |
ggplot object or invisible NULL.
Projects (LAT_WGS84, LONG_WGS84) to the rotated Toronto canvas and
draws one dot per incident. When eps_km and min_samples
are supplied AND the dbscan package is installed, points are
coloured by DBSCAN cluster label.
morie_tps_render_points(
  df,
  category = "Assault",
  eps_km = NULL,
  min_samples = 20L,
  outfile = NULL,
  show_top = 12L,
  fig_w = 12,
  fig_h = 7.5
)
df |
A TPS data.frame with columns |
category |
Optional category label used in the title. |
eps_km |
DBSCAN neighbourhood radius in km. When |
min_samples |
DBSCAN minimum cluster size. |
outfile |
Path to write the image, or |
show_top |
Cap on how many clusters appear in the legend. |
fig_w, fig_h
|
Figure size. |
A ggplot (when ggplot2 is available) or
invisible(NULL) for the base-R path.
Lays out a 2x2 quad combining a choropleth, point pattern, yearly grid
summary, and (when available) a DBSCAN cluster panel. Falls back to
base graphics with par(mfrow = c(2, 2)) when ggplot2 is absent.
morie_tps_render_quad(data, outfile = NULL, ...)
data |
Named list with elements: |
outfile |
Optional output path; when NULL the rendered object is returned (ggplot or invisible NULL for base). |
... |
Forwarded to the underlying single-panel renderers. |
A patchwork-or-list object (ggplot2 path) or invisible NULL.
Renders Kulldorff-style circular candidate windows on the TPS canvas.
Currently a thin layer over centroids + radius circles; the full
likelihood-ratio overlay and significance ranking depend on the Python
morie.tps_satscan module and are stubbed.
morie_tps_render_satscan_panel(clusters, outfile = NULL)
clusters |
data.frame with columns |
outfile |
Optional output path. |
ggplot object or invisible NULL.
Walks polys once and renders one ggplot facet per year for
columns named <prefix>_<year>.
morie_tps_render_yearly_grid(
  polys,
  prefix = "ASSAULT_RATE",
  years = 2014:2024,
  cmap = "Reds",
  outfile = NULL,
  ncols = 4L
)
polys |
Polygon data.frame (see
|
prefix |
Column-name prefix (default |
years |
Integer vector of years (default 2014:2024). |
cmap |
Sequential palette name (default |
outfile |
Optional output path. |
ncols |
Number of facet columns. |
A ggplot (when ggplot2 is loaded) or
invisible(NULL) for the base-R fallback.
Many TPS PSDP crime layers carry BOTH HOOD_158 (current) and
HOOD_140 (historical 2014–2021) columns. Pick the version your
analysis needs explicitly so the two schemes are not silently mixed
across years.
morie_tps_resolve_hood_col(df, prefer = c("158", "140"), fallback = TRUE)
df |
A TPS crime |
prefer |
Either |
fallback |
If |
Character scalar (the chosen column name), or NULL if no
suitable column is present.
df <- data.frame(OCC_YEAR = 2024L, HOOD_158 = "82", HOOD_140 = "82")
morie_tps_resolve_hood_col(df, prefer = "158")
Computes Ripley's K(r) at each user-supplied radius (km), the Besag-centred L(r)-r transformation, and the CSR baseline pi*r^2. Coordinates are projected to km via the small-angle latitude factor; for typical city-scale point patterns this is accurate enough that haversine is unnecessary.
morie_tps_ripley_k(
  df,
  ds_name = "?",
  radii_km = c(0.25, 0.5, 1, 2, 3, 5),
  max_n = 5000L,
  lat_col = "LAT_WGS84",
  lon_col = "LONG_WGS84"
)
df |
Incident-level data.frame. |
ds_name |
Tag for the result title. |
radii_km |
Numeric vector of radii in km (default 0.25, 0.5, 1, 2, 3, 5). |
max_n |
Subsample cap (default 5000) to keep the pairwise distance matrix tractable. |
lat_col, lon_col
|
WGS84 column names. |
A named list with the per-radius table, intensity, and bounding-box area.
set.seed(2026)
df <- data.frame(
  LAT_WGS84 = 43.6 + rnorm(80, 0, 0.04),
  LONG_WGS84 = -79.4 + rnorm(80, 0, 0.04)
)
morie_tps_ripley_k(df, radii_km = c(0.5, 1, 2))
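A naive version of the per-radius table can be sketched in base R. The helper `ripley_k` below is hypothetical: it applies no edge correction, uses the bounding box as the study window, and assumes roughly 111.32 km per degree of latitude, so its numbers are illustrative rather than a reproduction of the package output.

```r
# Hypothetical helper: naive Ripley's K (no edge correction) on a
# bounding-box window, plus L(r) - r and the CSR baseline pi * r^2.
ripley_k <- function(lat, lon, radii_km) {
  km_per_deg <- 111.32                       # approximate; an assumption
  # Small-angle projection: scale longitude by cos(mean latitude).
  x <- (lon - mean(lon)) * km_per_deg * cos(mean(lat) * pi / 180)
  y <- (lat - mean(lat)) * km_per_deg
  n <- length(x)
  area <- diff(range(x)) * diff(range(y))    # bounding-box area (km^2)
  d <- as.matrix(stats::dist(cbind(x, y)))
  diag(d) <- Inf                             # exclude self-pairs
  k <- vapply(radii_km,
              function(r) area * sum(d <= r) / (n * (n - 1)),
              numeric(1))
  data.frame(r_km = radii_km, K = k, csr = pi * radii_km^2,
             L_minus_r = sqrt(k / pi) - radii_km)
}

set.seed(2026)
lat <- 43.6 + rnorm(80, 0, 0.04)
lon <- -79.4 + rnorm(80, 0, 0.04)
out <- ripley_k(lat, lon, radii_km = c(0.5, 1, 2))
out
```

Values of L(r) - r above zero suggest clustering at that radius relative to complete spatial randomness, subject to the missing edge correction.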
Hold-out validation forecast: fits SARIMA(p,d,q)(P,D,Q)_12 with
stats::arima to the leading training months, forecasts the
last h months, and reports MAPE / RMSE.
morie_tps_sarima_forecast(
  df,
  ds_name = "?",
  h = 12L,
  order = c(1L, 1L, 1L),
  seasonal = c(0L, 1L, 1L, 12L)
)
df |
A |
ds_name |
Character label. |
h |
Hold-out horizon in months (default 12). |
order |
Non-seasonal ARIMA order |
seasonal |
Seasonal order |
A morie_rich_result list with aic,
bic, mape_pct, rmse, forecast,
actual.
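The hold-out procedure can be sketched with stats::arima on a built-in series. The example uses datasets::AirPassengers purely as a stand-in for monthly incident counts; the train/test split and the error metrics follow the description above, but the variable names are illustrative and not the package's.

```r
# Hold-out sketch on the built-in AirPassengers series (a stand-in for
# monthly incident counts; not TPS data).
y <- datasets::AirPassengers
h <- 12L
n <- length(y)
train <- stats::window(y, end = stats::time(y)[n - h])
test <- as.numeric(stats::window(y, start = stats::time(y)[n - h + 1L]))

# SARIMA(1,1,1)(0,1,1)_12 fitted to the training months only.
fit <- stats::arima(train, order = c(1L, 1L, 1L),
                    seasonal = list(order = c(0L, 1L, 1L), period = 12L))
fc <- as.numeric(stats::predict(fit, n.ahead = h)$pred)

mape_pct <- 100 * mean(abs((test - fc) / test))
rmse <- sqrt(mean((test - fc)^2))
c(aic = stats::AIC(fit), bic = stats::BIC(fit),
  mape_pct = mape_pct, rmse = rmse)
```

MAPE is undefined when a hold-out month has zero incidents, so rare categories may need a different error metric.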
Solves the coupled reaction-diffusion system for the attractiveness
field A(x, t) and the criminal density rho(x, t)
on a cosine-corrected Toronto grid seeded by the observed incident
histogram. Localised attractiveness spikes emerge whenever
the coefficients (eta, omega, theta, D, gamma) place the system in the
instability regime (D'Orsogna & Perc 2015, sec. 3.2).
morie_tps_sdb_reaction_diffusion(
  category = "Assault",
  sample_rows = 30000L,
  eta = 0.05,
  omega = 0.3,
  theta = 1.5,
  D = 0.1,
  gamma = 0.05,
  n_steps = 800L,
  dt = 0.04,
  nx = 90L,
  ny = 60L,
  save_fig = TRUE
)
category |
TPS category name (default |
sample_rows |
Maximum number of incident rows to load
( |
eta, omega, theta, D, gamma
|
PDE coefficients. |
n_steps |
Number of forward-Euler integration steps. |
dt |
Integration step size. |
nx, ny
|
Grid resolution. |
save_fig |
Whether to write a 1x3 PNG triptych (seed / A(x,t) / rho(x,t)) to the manifest figure directory. |
Steady-state spike count is compared against a DBSCAN cluster count
on the raw incidents (delegated to morie_tps_dbscan_clusters
when available).
A morie_rich_result list with the steady-state
spike count, mean field values, DBSCAN comparison, and the
integration parameters.
Short MB, D'Orsogna MR, Pasour VB, Tita GE, Brantingham PJ, Bertozzi AL, Chayes LB (2008). A statistical model of criminal behavior. M3AS 18(supp01): 1249-1267.
## Not run:
rr <- morie_tps_sdb_reaction_diffusion(
  "Assault", sample_rows = 5000, n_steps = 200, save_fig = FALSE
)
print(rr$summary_lines)
## End(Not run)
Reproduces the localised hot-spot lattice from Short, D'Orsogna & Brantingham (2008) Fig. 4 / D'Orsogna & Perc (2015) Fig. 5 on a clean periodic grid, seeded by a homogeneous steady state plus small Gaussian noise. The parameters chosen here place the system in the Turing-instability regime so the homogeneous solution is unstable and the system self-organises into a near-hexagonal lattice of localised spikes.
morie_tps_sdb_turing_demo(
  eta = 0.2,
  omega = 0.033,
  theta = 0.56,
  D = 30,
  gamma = 0.019,
  n_steps = 6000L,
  dt = 0.005,
  n = 80L,
  save_fig = TRUE
)
eta, omega, theta, D, gamma
|
PDE coefficients. |
n_steps |
Integration steps. |
dt |
Step size. |
n |
Grid side length. |
save_fig |
Whether to write a 1x3 snapshot panel PNG. |
A morie_rich_result with the steady-state spike
count, mean fields, and the integration parameters.
Short MB, D'Orsogna MR, Brantingham PJ et al. (2008). M3AS 18(supp01): 1249-1267.
## Not run:
rr <- morie_tps_sdb_turing_demo(n = 32L, n_steps = 300L, save_fig = FALSE)
print(rr$summary_lines$SteadySpikes)
## End(Not run)
Counts incidents by month-of-year, day-of-week, and hour-of-day, then runs a chi-square goodness-of-fit test against a uniform distribution on each cycle.
morie_tps_seasonal_pattern(df, ds_name = "?")
df |
A |
ds_name |
Character label. |
A morie_rich_result list with per-cycle counts and
chi-square p-values.
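The per-cycle uniformity test can be sketched with stats::chisq.test. The helper `cycle_uniformity` is hypothetical and the dates are synthetic; the sketch covers the month and day-of-week cycles only, since hour-of-day would need a timestamp column.

```r
# Hypothetical helper: chi-square goodness-of-fit of cycle counts against
# a uniform distribution over all possible levels of the cycle.
cycle_uniformity <- function(values, levels) {
  counts <- table(factor(values, levels = levels))
  test <- stats::chisq.test(counts)   # default p: equal probabilities
  list(counts = counts, statistic = unname(test$statistic),
       p_value = test$p.value)
}

set.seed(2026)
# Synthetic occurrence dates in 2024 with a mid-year excess.
dates <- as.Date("2024-01-01") +
  sample(0:365, 3000, replace = TRUE,
         prob = 1 + 0.5 * sin(2 * pi * (0:365 - 100) / 366))
by_month <- cycle_uniformity(as.integer(format(dates, "%m")), 1:12)
by_dow <- cycle_uniformity(as.integer(format(dates, "%u")), 1:7)
c(month_p = by_month$p_value, dow_p = by_dow$p_value)
```

A strictly uniform null over months ignores the differing month lengths, so a small departure from uniformity at monthly resolution is expected even without seasonality.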
Neighbourhood + division + premises + location-type rollups plus a lat/long bounding-box summary. Tolerates missing columns.
morie_tps_spatial_summary(df, ds_name = "?")
df |
A TPS crime data.frame. |
ds_name |
Optional dataset label used in the result title. |
A morie_tps_result named list.
R port of morie.tps_statphysics. Implements the four canonical
methods reviewed by D'Orsogna & Perc (2015), Statistical
physics of crime: A review, Physics of Life Reviews 12: 1-21
(arXiv:1411.1743), together with two illustrative companions
(canonical Turing-pattern demo and Helbing-Szolnoki inspection-game
phase diagram) and a premise x neighbourhood co-occurrence network.
Each callable consumes one TPS category and returns a multi-section
morie_rich_result. Cosine-corrected projection and DBSCAN
delegation are deferred to companion modules (tps_render,
tps_spatial_advanced); when those collaborators are not
available the routines fall back to a stop-stub explaining the gap.
morie_tps_sdb_reaction_diffusion — Short,
D'Orsogna and Brantingham (2008) hot-spot PDE, data-seeded.
morie_tps_levy_flight_alpha — Hill-MLE
Levy-flight tail exponent following Brockmann, Hufnagel and
Geisel (2006).
morie_tps_urban_scaling_beta — Bettencourt
et al. (2007) urban-scaling beta across the 158 Toronto
neighbourhoods.
morie_tps_lotka_volterra_police_crime — Lotka-
Volterra predator-prey on yearly counts.
morie_tps_sdb_turing_demo — canonical Turing-
instability demo on a periodic lattice.
morie_tps_inspection_game_phase — three-
strategy replicator phase diagram (Helbing, Szolnoki & Perc
2010).
morie_tps_criminal_network_graph — premise x
neighbourhood co-occurrence network (Diviak et al.
2019-style projection from public TPS data).
D'Orsogna MR, Perc M (2015). Statistical physics of crime: A review. Physics of Life Reviews 12: 1-21.
Short MB, D'Orsogna MR, Pasour VB, Tita GE, Brantingham PJ, Bertozzi AL, Chayes LB (2008). A statistical model of criminal behavior. Mathematical Models and Methods in Applied Sciences 18(supp01): 1249-1267.
Brockmann D, Hufnagel L, Geisel T (2006). The scaling laws of human travel. Nature 439: 462-465.
Bettencourt LMA, Lobo J, Helbing D, Kuhnert C, West GB (2007). Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences 104: 7301-7306.
Helbing D, Szolnoki A, Perc M, Szabo G (2010). Punish, but not too hard: how costly punishment spreads in the spatial public goods game. New Journal of Physics 12: 083005.
Diviak T, Dijkstra JK, Snijders TAB (2019). Structure, multiplexity, and centrality in a corruption network. Trends in Organized Crime 22: 274-297.
Convenience wrapper that calls morie_tps_sdb_reaction_diffusion,
morie_tps_levy_flight_alpha,
morie_tps_urban_scaling_beta, and
morie_tps_lotka_volterra_police_crime on every category in
the supplied list. Returns a nested list keyed first by category
and then by method.
morie_tps_statphysics_analyze_all(categories = NULL, save_fig = TRUE)
categories |
Character vector of TPS category names; default is the canonical nine-category TPS set. |
save_fig |
Whether to ask each sub-routine to write its figure. |
A named list of lists of morie_rich_result objects.
D'Orsogna MR, Perc M (2015). Physics of Life Reviews 12: 1-21.
## Not run:
res <- morie_tps_statphysics_analyze_all(c("Assault", "Robbery"), save_fig = FALSE)
## End(Not run)
R port of morie.tps_stochastic. Four jurisdiction-agnostic
callables: temporal-only exponential Hawkes self-exciting fit,
seasonal ARIMA forecast on monthly counts, Euler-Maruyama
Ornstein-Uhlenbeck simulation, and a 1-D Fokker-Planck density
evolution. The R port keeps optimisation in base R
(stats::optim) and the seasonal forecast in
stats::arima so no external time-series package is needed.
All functions return a multi-section morie_rich_result list.
morie_tps_hawkes_temporal_fit: fit mu, kappa,
omega of an exponential-kernel Hawkes process to incident
times; report branching ratio + AIC/BIC.
morie_tps_sarima_forecast: seasonal ARIMA on
monthly counts with train / hold-out MAPE.
morie_tps_langevin_simulate: Euler-Maruyama
OU SDE paths fitted to daily counts.
morie_tps_fokker_planck_grid: 1-D
finite-difference density evolution under OU drift+diffusion.
References: Mohler et al. 2011 (self-exciting point process crime); Short, D'Orsogna, Bertozzi 2010 (stochastic physics of crime).
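To make the Hawkes fit in the list above concrete, here is a minimal sketch of an exponential-kernel log-likelihood maximised with stats::optim. The parametrisation (intensity mu + kappa * omega * sum of exp(-omega * lag), so that kappa is the branching ratio) is an assumption, as are the helper name `hawkes_negloglik`, the starting values, and the synthetic event times.

```r
# Hypothetical sketch: exponential-kernel Hawkes process with intensity
# lambda(t) = mu + kappa * omega * sum_{t_i < t} exp(-omega * (t - t_i)),
# so kappa is the branching ratio. This parametrisation is an assumption.
hawkes_negloglik <- function(log_par, times, t_end) {
  mu <- exp(log_par[1]); kappa <- exp(log_par[2]); omega <- exp(log_par[3])
  n <- length(times)
  r <- numeric(n)   # r[i] = sum over j < i of exp(-omega * (t_i - t_j))
  for (i in seq_len(n)[-1]) {
    r[i] <- exp(-omega * (times[i] - times[i - 1])) * (1 + r[i - 1])
  }
  # Log-likelihood: sum of log-intensities minus the integrated intensity.
  loglik <- sum(log(mu + kappa * omega * r)) - mu * t_end -
    kappa * sum(1 - exp(-omega * (t_end - times)))
  -loglik
}

set.seed(2026)
t_end <- 365
times <- sort(stats::runif(300, 0, t_end))   # synthetic event times (days)
start <- log(c(0.5, 0.3, 1))                 # log(mu), log(kappa), log(omega)
fit <- stats::optim(start, hawkes_negloglik, times = times, t_end = t_end)

est <- stats::setNames(exp(fit$par), c("mu", "kappa", "omega"))
aic <- 2 * 3 + 2 * fit$value
bic <- 3 * log(length(times)) + 2 * fit$value
list(estimates = est, branching_ratio = est[["kappa"]], aic = aic, bic = bic)
```

Optimising on the log scale keeps all three parameters positive without box constraints; because the synthetic times here are not self-exciting, the fitted branching ratio is expected to be small.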
Format names that morie_tps_load() knows how to dispatch.
MORIE_TPS_SUPPORTED_FORMATS
An object of class character of length 9.
R port of morie.tps_temporal. Four jurisdiction-agnostic
callables operating on a Toronto Police Service-shaped
data.frame: yearly trend, seasonal cyclic stats,
Pettitt-style change-point on yearly counts, and an ARIMA(1,1,1)
forecast on monthly counts. All functions return a multi-section
morie_rich_result list so output can be printed directly
to a notebook.
morie_tps_year_over_year_trend: OLS slope /
intercept / R-squared on yearly incident counts.
morie_tps_seasonal_pattern: month / DOW /
hour cyclic counts plus chi-square uniformity tests.
morie_tps_changepoint_detection: Pettitt's
non-parametric change-point on yearly counts (no external
change-point dependency).
morie_tps_arima_forecast: ARIMA(1,1,1)
forecast on monthly counts via stats::arima.
Year / month / day-of-week / hour-of-day rollups, plus a coverage line. Robust to missing columns: only includes tables for the fields actually present.
morie_tps_temporal_summary(df, ds_name = "?")
df |
A TPS crime data.frame. |
ds_name |
Optional dataset label used in the result title. |
A morie_tps_result named list.
Toronto reference population by fiscal year (StatsCan 17-10-0009-01).
MORIE_TPS_TORONTO_POPULATION_BY_YEAR()
Named integer vector (year-as-string -> population).
Total-CSI weights for the 9 TPS open-data categories.
MORIE_TPS_TOTAL_CSI_WEIGHTS()
Named numeric vector.
Performs the standard log-log OLS scaling fit
$\log Y_i = \log Y_0 + \beta \log N_i + \varepsilon_i$,
where $Y_i$ is the crime count and $N_i$ is the population
of ward i. $\beta > 1$ indicates super-linear (crime
grows faster than population), $\beta = 1$ linear, and
$\beta < 1$ sub-linear (protective) scaling (Bettencourt
et al. 2007; D'Orsogna & Perc 2015 sec. 4.1).
morie_tps_urban_scaling_beta(
  category = "Assault",
  year = 2024L,
  save_fig = TRUE
)
category |
TPS category name. |
year |
Reference year used to choose the appropriate population and crime columns. |
save_fig |
Whether to write a log-log scatter + fit PNG. |
A morie_rich_result with the estimated scaling exponent $\beta$, its
standard error, R-squared, the back-transformed prefactor
$Y_0$, and a regime label (sub-linear, linear, super-linear).
Bettencourt LMA, Lobo J, Helbing D, Kuhnert C, West GB (2007). Growth, innovation, scaling, and the pace of life in cities. PNAS 104: 7301-7306.
## Not run:
rr <- morie_tps_urban_scaling_beta("Assault", year = 2024, save_fig = FALSE)
print(rr$summary_lines)
## End(Not run)
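The log-log scaling fit described above can be sketched in base R. This is a minimal illustration on synthetic ward data, not the package's implementation: the variable names, the simulated populations, and the two-standard-error rule used for the regime label are all assumptions made for the example.

```r
# Synthetic ward-level data (hypothetical values, not TPS figures).
set.seed(1)
population <- round(exp(runif(25, log(2e4), log(2e5))))
true_beta <- 1.15
crime <- exp(log(0.002) + true_beta * log(population) + rnorm(25, 0, 0.1))

# Log-log OLS: log(Y_i) = log(Y_0) + beta * log(N_i) + e_i
fit <- lm(log(crime) ~ log(population))
coefs <- summary(fit)$coefficients
beta_hat <- coefs[2, "Estimate"]
beta_se <- coefs[2, "Std. Error"]
prefactor <- exp(coefs[1, "Estimate"]) # back-transformed Y_0
r2 <- summary(fit)$r.squared

# Regime label: compare beta to 1 with a +/- 2 SE band (an assumed rule).
regime <- if (beta_hat - 2 * beta_se > 1) {
  "super-linear"
} else if (beta_hat + 2 * beta_se < 1) {
  "sub-linear"
} else {
  "linear"
}
```

The slope of the regression is the scaling exponent, and the exponentiated intercept is the prefactor, which is why both appear in the returned result.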
Compact callable mirroring morie.fn.tpsuof.tps_use_of_force.
Computes a use-of-force rate over a known encounter denominator and
returns a per-type count distribution, packaged as a rich-result
list compatible with the morie_mrm_uof_result family in
R/mrm_uof.R.
morie_tps_use_of_force(force_types, n_encounters)
morie_tpsuof(force_types, n_encounters)
force_types |
Character vector of use-of-force-type labels (one row per use-of-force incident). |
n_encounters |
Positive integer total number of police-public encounters in the denominator. |
rate = length(force_types) / n_encounters
type_counts = table(force_types)
A named list with classes
morie_tps_use_of_force_result, morie_mrm_uof_result,
morie_rich_result, list. Slots: rate,
n, population, type_counts, n_types,
interpretation.
force_types <- c("Physical Control", "Physical Control", "CEW", "Firearm", "OC Spray")
morie_tps_use_of_force(force_types, n_encounters = 1000L)
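The two formulas stated for this function can be checked by hand in base R. The sketch below only reproduces `rate` and `type_counts` as documented; it does not build the rich-result classes or the interpretation text.

```r
force_types <- c("Physical Control", "Physical Control", "CEW", "Firearm", "OC Spray")
n_encounters <- 1000L

# rate = number of use-of-force incidents per encounter
rate <- length(force_types) / n_encounters

# type_counts = incidents per force type
type_counts <- table(force_types)
n_types <- length(type_counts)

rate    # 0.005
n_types # 4
```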
Violent-CSI weights for the 9 TPS open-data categories.
MORIE_TPS_VIOLENT_CSI_WEIGHTS()
Named numeric vector.
Aggregates incident counts by year, restricts to the 1990-2030 window, fits an OLS line, and reports slope, intercept, and R-squared.
morie_tps_year_over_year_trend(df, year_col = "OCC_YEAR", ds_name = "?")
df |
A |
year_col |
Character. Name of the year column
(default "OCC_YEAR"). |
ds_name |
Character label for the dataset shown in titles. |
A morie_rich_result list with slope,
intercept, r2, direction, years,
counts, fitted.
The City of Toronto adopted the 158-neighbourhood scheme in 2022. Pre-2022 TPS crime records are most faithfully analysed in the historical 140-neighbourhood scheme; 2022-onwards records align with the 158-scheme. TPS often back-fills both columns onto the same record via lat/lon re-geocoding, but the polygon boundaries do not match.
morie_tps_year_to_hood_version(year)
year |
Integer year (or vector of years). |
Character vector of "158" / "140" recommendations,
parallel to year.
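The recommendation rule described above amounts to a cutoff at 2022. A minimal sketch, assuming the cutoff is inclusive of 2022 as the description suggests; `hood_version_for_year` is a hypothetical name and not a package function.

```r
# Hypothetical helper: pre-2022 records map to the 140-neighbourhood scheme,
# records from 2022 onwards map to the 158-neighbourhood scheme.
hood_version_for_year <- function(year) {
  ifelse(as.integer(year) >= 2022L, "158", "140")
}

versions <- hood_version_for_year(c(2014, 2021, 2022, 2024))
versions # "140" "140" "158" "158"
```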
For each named TPS category in dfs, groups by the dataset's
year column (OCC_YEAR preferred, REPORT_YEAR
fallback), restricts to plausible years (1990-2030), and joins all
series column-wise into a single panel of incident counts.
morie_tps_yoy_panel(dfs, categories = NULL)
dfs |
Named list of TPS data.frames keyed by category name. |
categories |
Optional character vector restricting which keys
of |
A morie_tps_result named list with a single
year-by-category table.
Random fixed projections + ridge head on the mean-pooled context vector.
morie_transformer_genomic(
  x,
  y,
  markers,
  d_model = 8,
  lam = 1,
  seed = 0,
  deterministic_seed = NULL
)
x |
Optional fixed-effect features. |
y |
Numeric response. |
markers |
(n x L) marker sequence. |
d_model |
Model dimension. |
lam |
Ridge regulariser for the linear head. |
seed |
Seed. |
deterministic_seed |
Optional integer; if supplied, RNG state is
derived via |
list(estimate, y_hat, beta, attention, context, se, n, method).
Vaswani et al. (2017). Montesinos Lopez Ch 15.
morie_transformer_genomic(
  x = rnorm(50),
  y = rnorm(50),
  markers = matrix(sample(0:2, 200, TRUE), 50, 4)
)
Wraps Rtsne::Rtsne.
morie_tsne_reduction(
  x,
  n_components = 2L,
  perplexity = 30,
  learning_rate = "auto",
  n_iter = 1000L,
  seed = 0L,
  deterministic_seed = NULL
)
x |
Numeric matrix. |
n_components |
Embedding dimension. |
perplexity |
t-SNE perplexity. |
learning_rate |
Unused by Rtsne (kept for API parity). |
n_iter |
Max iterations. |
seed |
RNG seed. |
deterministic_seed |
Integer or NULL. If supplied, the RNG state
is derived from the SHA-keyed |
Named list: estimate (shape), embedding, kl_divergence, perplexity, n_components, n, method.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Block frequencies of Y in the m+1 intervals defined by the
ordered X-sample. Under H0 the expected block proportion is
1 / (m + 1).
morie_two_sample_coverage(x, y)
x |
Numeric vector (first sample). |
y |
Numeric vector (second sample). |
Named list: block_freq, block_prop, expected_prop, m, n, cumulative, method.
morie_two_sample_coverage(x = rnorm(50), y = rnorm(50))
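The block frequencies can be sketched in base R. This is an illustration of the idea rather than the package code: the ordered X-sample of size m cuts the real line into m + 1 intervals, and each Y value is assigned to the interval it falls in. Handling of ties at the interval boundaries is an assumption here.

```r
set.seed(1)
x <- rnorm(20)
y <- rnorm(30)
m <- length(x)
n <- length(y)

# findInterval() returns how many ordered x-values are <= each y (0..m);
# adding 1 gives the block index 1..(m + 1).
block_index <- findInterval(y, sort(x)) + 1L
block_freq <- tabulate(block_index, nbins = m + 1L)
block_prop <- block_freq / n

# Under H0 every block has the same expected proportion.
expected_prop <- 1 / (m + 1)
```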
Two-sample t-test with tidy output
morie_two_sample_t_test(
  x1,
  x2,
  equal_var = FALSE,
  alternative = c("two.sided", "greater", "less")
)
x1 |
Numeric vector (group 1). |
x2 |
Numeric vector (group 2). |
equal_var |
Assume equal variances? Default FALSE. |
alternative |
|
Named list: t, df, p_value, ci_diff, cohens_d.
morie_two_sample_t_test(rnorm(50, 0.5), rnorm(50, 0))
Unobserved-components decomposition (trend + seasonal + irregular)
morie_unobserved_components(x, period = 12, trend = "local linear")
x |
Numeric univariate series. |
period |
Seasonal period (pass 0 to omit). Default 12. |
trend |
Trend specification, "local level" or "local linear". |
Named list with trend, seasonal, irregular, loglik, n,
period, method.
morie_unobserved_components(x = rnorm(50))
Lists or retrieves bundled userguide PDF files. These are the official PUMF codebooks and user guides from Health Canada / Statistics Canada.
morie_userguide(name = NULL)
name |
Filename (e.g., |
File path string, or character vector of filenames.
morie_userguide()
Validate a CPADS analysis data frame
morie_validate_cpads_data(data, strict = TRUE)
data |
Data frame to validate. |
strict |
If |
Character vector of missing variable names.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Validate outputs manifest structure
morie_validate_outputs_manifest(manifest, strict = TRUE)
manifest |
Data frame to validate. |
strict |
If |
TRUE when validation passes.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Scores a_i = qnorm(R_i / (N + 1)); statistic = sum over the first sample.
morie_van_der_waerden_test(x, y)
x, y
|
Numeric vectors. |
Named list: statistic, p_value, z, n, m.
morie_van_der_waerden_test(x = rnorm(50), y = rnorm(50))
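The score and statistic definitions above can be sketched in base R. The scores and the sum over the first sample follow the description; the normal approximation used for `z` and `p_value` is the textbook permutation-variance form and is an assumption about how the package computes them.

```r
set.seed(1)
x <- rnorm(15, mean = 0.5) # first sample
y <- rnorm(20)             # second sample
m <- length(x)
N <- m + length(y)

# Normal scores a_i = qnorm(R_i / (N + 1)) from the pooled ranks.
ranks <- rank(c(x, y))
scores <- qnorm(ranks / (N + 1))

# Statistic: sum of scores belonging to the first sample.
statistic <- sum(scores[seq_len(m)])

# Assumed normal approximation: permutation variance of the score sum
# (the scores sum to zero when there are no ties).
var_stat <- m * (N - m) / (N * (N - 1)) * sum(scores^2)
z <- statistic / sqrt(var_stat)
p_value <- 2 * pnorm(-abs(z))
```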
Vector error-correction model (VECM)
morie_vecm(Y, k_ar = 1, coint_rank = 1)
Y |
Numeric matrix (T x k) of I(1) candidate series. |
k_ar |
Number of lagged differences. Default 1. |
coint_rank |
Cointegration rank. Default 1. |
Named list with alpha, beta, Gamma, Sigma, loglik, n, k,
rank, method.
morie_vecm(Y = matrix(rnorm(100), 50, 2))
Mirrors the Python morie.verify_statistical_output(). Runs a small
suite of sanity checks on a JSON output containing fields commonly used
across MORIE estimators: ate, se, ci_lower, ci_upper, n,
p_value. Each check is a named boolean; the verification passes if all
checks are TRUE.
morie_verify_statistical_output(path)
path |
Path to a JSON output file. |
Checks: SE non-negative; CI lower < CI upper; estimate inside the CI; n positive; p-value (if present) in [0, 1]; estimate finite.
A list with path, passed (logical), and checks (named list
of boolean check results).
tmp <- tempfile(fileext = ".json")
if (requireNamespace("jsonlite", quietly = TRUE)) {
  jsonlite::write_json(
    list(ate = 0.5, se = 0.1, ci_lower = 0.3, ci_upper = 0.7, n = 200),
    tmp,
    auto_unbox = TRUE
  )
  morie_verify_statistical_output(tmp)
  unlink(tmp)
}
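The six checks listed above can be expressed as a named list of logicals. This sketch works on an in-memory list rather than a JSON file, and the check names are hypothetical; it is meant to show the shape of the logic, not the package's exact output.

```r
out <- list(ate = 0.5, se = 0.1, ci_lower = 0.3, ci_upper = 0.7, n = 200)

checks <- list(
  se_non_negative = out$se >= 0,
  ci_ordered      = out$ci_lower < out$ci_upper,
  estimate_in_ci  = out$ate >= out$ci_lower && out$ate <= out$ci_upper,
  n_positive      = out$n > 0,
  # The p-value check is skipped (treated as passing) when no p-value is present.
  p_value_valid   = is.null(out$p_value) ||
    (out$p_value >= 0 && out$p_value <= 1),
  estimate_finite = is.finite(out$ate)
)

# Verification passes only if every check is TRUE.
passed <- all(unlist(checks))
```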
Fetch and cache a Google Cloud access token via gcloud
morie_vertex_access_token(cfg = NULL)
cfg |
Config list, or NULL to resolve. |
Character bearer token.
R port of morie.vertex.ask_gemini. POSTs to the Vertex AI REST
endpoint :generateContent and returns the concatenated text from
the first candidate.
morie_vertex_ask_gemini(
  prompt,
  model = NULL,
  system = NULL,
  temperature = 0.1,
  max_output_tokens = 2048L,
  timeout_s = 120,
  cfg = NULL
)
prompt |
Character scalar – the user prompt. |
model |
Optional Gemini model override. |
system |
Optional system instruction. |
temperature |
Numeric. Default 0.1. |
max_output_tokens |
Integer. Default 2048. |
timeout_s |
Numeric HTTP timeout. Default 120. |
cfg |
Pre-resolved config list, or NULL to auto-resolve. |
Character scalar – trimmed generated text.
Tiny smoke test for the Vertex AI client
morie_vertex_health_check()
Named list (ok / error / model / project / location / reply).
Resolve Vertex AI configuration from environment variables
morie_vertex_resolve_config()
Named list: project / location / model / token_ttl_s / gcloud_path.
VPD's GeoDASH Open Data portal (https://geodash.vpd.ca/opendata/)
has no automation API: every download requires a manual click-
through of VPD's terms-of-use plus a popup-based file save per
(year, neighbourhood) selection. Calling
morie_vpd_download_instructions() prints the exact steps
the user needs to follow to get a full-fidelity crime CSV that
morie_datasets_vpd_crime can consume.
morie_vpd_download_instructions(to = NULL)
to |
Optional file path. If supplied, the instructions are
ALSO written to that file (so the user can read them outside R).
Default NULL. |
Invisibly the instruction text (character vector, one line per element).
morie_datasets_vpd_crime() for the loader that
accepts the downloaded file.
morie_vpd_download_instructions()
Discrete wavelet decomposition for a time series
morie_wavelet_time_series(x, wavelet = "haar", level = NULL)
x |
Numeric univariate series. |
wavelet |
Wavelet family. Default "haar". |
level |
Decomposition depth. Default floor(log2 n) capped at 6. |
Named list with approximation, details, energies, level,
n, wavelet, method.
morie_wavelet_time_series(x = rnorm(50))
Bootstrap replicate weights (Rao-Wu rescaling within strata).
morie_weights_bootstrap(weights, n_replicates = 200, strata = NULL, seed = 42)
weights |
Numeric vector of unit-level design weights. |
n_replicates |
Integer; replicate count. |
strata |
Optional vector of stratum identifiers aligned with
|
seed |
Integer RNG seed. |
Each stratum is split into two halves; signs from a random Hadamard-like
matrix double one half and zero the other. For exact Hadamard ordering use
survey::as.svrepdesign(..., type = "BRR").
morie_weights_brr(weights, strata, n_replicates = NULL, seed = 42)
weights |
Numeric vector of unit-level design weights. |
strata |
Optional vector of stratum identifiers aligned with
|
n_replicates |
Integer; replicate count. |
seed |
Integer RNG seed. |
Dispatch helper – calibrate to totals via "raking" or "greg".
morie_weights_calibrate_to_totals(
  weights,
  df,
  totals,
  method = c("raking", "greg"),
  ...
)
weights |
Numeric vector of unit-level design weights. |
df |
A |
totals |
Named numeric vector of calibration targets used by
|
method |
Character; calibration / smoothing / variance method. Allowed values depend on the caller. |
... |
Additional method-specific arguments. |
Combined design x nonresponse x post-strat (x trim) pipeline.
morie_weights_combined(
  selection_probs,
  responded,
  adjustment_cells = NULL,
  calibration_strata = NULL,
  population_totals = NULL,
  trim_percentiles = NULL
)
selection_probs |
Numeric vector of selection probabilities
for |
responded |
Logical/integer 0/1 vector of response indicators. |
adjustment_cells |
Optional cell identifiers used by
|
calibration_strata |
Optional strata for the calibration
step of |
population_totals |
Named numeric vector of target totals to calibrate to. |
trim_percentiles |
Two-element numeric vector |
Kish design effect (n / ESS).
morie_weights_deff(weights)
weights |
Numeric vector of unit-level design weights. |
Design (base) weights as inverse selection probabilities: $w_i = 1 / \pi_i$.
morie_weights_design(selection_probs)
selection_probs |
Numeric vector of selection probabilities
for |
Detect extreme weights at +/- k * IQR or by absolute percentile.
morie_weights_detect_extreme(weights, k = 3)
weights |
Numeric vector of unit-level design weights. |
k |
Numeric multiplier for |
Returns a named list with summary statistics, Kish ESS, design effect, weight-range ratio, and percentile vector.
morie_weights_diagnostics(weights)
weights |
Numeric vector of unit-level design weights. |
Kish effective sample size: $\mathrm{ESS} = (\sum_i w_i)^2 / \sum_i w_i^2$.
morie_weights_ess(weights)
weights |
Numeric vector of unit-level design weights. |
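The Kish effective sample size and the matching design effect (n / ESS) are short enough to verify by hand. A minimal base-R sketch with made-up weights:

```r
weights <- c(1, 1, 2, 2, 4)
n <- length(weights)

# Kish effective sample size: (sum of weights)^2 / sum of squared weights.
ess <- sum(weights)^2 / sum(weights^2) # 100 / 26

# Kish design effect: n / ESS.
deff <- n / ess                        # 5 * 26 / 100 = 1.3

# With equal weights the ESS equals n and the design effect is 1.
ess_equal <- sum(rep(2, 5))^2 / sum(rep(2, 5)^2)
```

Unequal weights always push the ESS below n, so a design effect above 1 is a quick signal of how much precision the weighting costs.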
Fay's BRR weights with perturbation coefficient fay_coefficient in [0,1).
morie_weights_fay_brr(
  weights,
  strata,
  fay_coefficient = 0.5,
  n_replicates = NULL,
  seed = 42
)
weights |
Numeric vector of unit-level design weights. |
strata |
Optional vector of stratum identifiers aligned with
|
fay_coefficient |
Fay's coefficient ( |
n_replicates |
Integer; replicate count. |
seed |
Integer RNG seed. |
Closed-form linear calibration to match population totals on auxiliary X.
When survey is installed, defers to survey::calibrate() for a fully
design-aware result; otherwise computes the linear adjustment in base R.
morie_weights_greg(weights, X, population_totals, max_iter = 50, tol = 1e-08)
weights |
Numeric vector of unit-level design weights. |
X |
A numeric matrix or |
population_totals |
Named numeric vector of target totals to calibrate to. |
max_iter |
Iteration cap for calibration / IPF. |
tol |
Convergence tolerance for calibration. |
When the survey package is installed and strata is supplied, defers
to survey::as.svrepdesign(..., type = "JKn") for variance compatibility.
morie_weights_jackknife(weights, strata = NULL, jk_type = c("JK1", "JKn"))
weights |
Numeric vector of unit-level design weights. |
strata |
Optional vector of stratum identifiers aligned with
|
jk_type |
Character; jackknife type ("JK1" or "JKn"). |
Multi-frame (dual-frame) survey weights (Hartley compositing).
morie_weights_multiframe(
  weights_a,
  weights_b,
  overlap_a,
  overlap_b,
  method = c("hartley", "optimal"),
  theta = 0.5
)
weights_a |
Numeric vector of weights from frame A
( |
weights_b |
Numeric vector of weights from frame B
( |
overlap_a |
Logical vector flagging frame-A units in overlap. |
overlap_b |
Logical vector flagging frame-B units in overlap. |
method |
Character; calibration / smoothing / variance method. Allowed values depend on the caller. |
theta |
Numeric tuning parameter passed to |
Within each cell, scales respondent weights up by total/responder ratio. Non-respondents end up with weight 0.
morie_weights_nonresponse(weights, responded, adjustment_cells = NULL)
weights |
Numeric vector of unit-level design weights. |
responded |
Logical/integer 0/1 vector of response indicators. |
adjustment_cells |
Optional cell identifiers used by
|
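The weighting-class adjustment described for morie_weights_nonresponse can be sketched in base R. The sketch assumes the "total/responder ratio" means the cell's total weight divided by the cell's respondent weight; the data are made up.

```r
weights <- c(10, 10, 10, 20, 20, 20)
responded <- c(1, 1, 0, 1, 0, 0)
cells <- c("a", "a", "a", "b", "b", "b")

resp <- responded == 1

# Per-unit copies of the cell totals: all units, then respondents only.
cell_total <- ave(weights, cells, FUN = sum)
cell_resp <- ave(weights * resp, cells, FUN = sum)

# Respondents are scaled up by total / respondent weight; non-respondents get 0.
adjusted <- ifelse(resp, weights * cell_total / cell_resp, 0)
adjusted # 15 15 0 60 0 0
```

The adjusted weights of the respondents sum to the original cell totals, so the overall weighted total is preserved.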
Normalise weights so they sum to n (sample) or N (population).
morie_weights_normalize(
  weights,
  target = c("sample_size", "population"),
  population_size = NULL
)
weights |
Numeric vector of unit-level design weights. |
target |
Numeric target sum ( |
population_size |
Optional known population size used by
|
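Post-stratification weights: $w_i^{*} = w_i \, N_h / \sum_{j \in h} w_j$ for unit $i$ in stratum $h$ with population total $N_h$.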
morie_weights_poststratify(weights, strata, population_totals)
weights |
Numeric vector of unit-level design weights. |
strata |
Optional vector of stratum identifiers aligned with
|
population_totals |
Named numeric vector of target totals to calibrate to. |
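A common form of post-stratification rescales the weights within each stratum so that they sum to the known population total for that stratum. The base-R sketch below assumes that form (the original formula is not shown in this extract) and uses made-up strata and totals.

```r
weights <- c(1, 1, 2, 2, 2)
strata <- c("urban", "urban", "rural", "rural", "rural")
population_totals <- c(urban = 100, rural = 60)

# Current weighted total of each unit's stratum.
stratum_sum <- ave(weights, strata, FUN = sum)

# Scale each weight by (population total / current weighted total).
adjusted <- weights * unname(population_totals[strata]) / stratum_sum
adjusted # 50 50 20 20 20

# Weighted totals by stratum now match the targets.
achieved <- tapply(adjusted, strata, sum)
```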
Propensity-score non-response weights (logistic).
morie_weights_propensity_nonresponse(weights, responded, X)
weights |
Numeric vector of unit-level design weights. |
responded |
Logical/integer 0/1 vector of response indicators. |
X |
A numeric matrix or |
Adjusts weights so that within each calibration variable the weighted sums
match the supplied marginal targets. margins is a named list keyed by
variable name; each entry is a named numeric vector mapping category values
(as strings) to target totals.
morie_weights_rake(
  weights,
  df,
  margins,
  max_iter = 100,
  tol = 1e-06,
  bounds = NULL
)
weights |
Initial numeric weights (length n). |
df |
data.frame containing the calibration variables. |
margins |
Named list of named numeric vectors. |
max_iter |
Maximum IPF iterations (default 100). |
tol |
Convergence tolerance on max relative adjustment (default 1e-6). |
bounds |
Optional |
list with weights, converged, iterations, max_adjustment,
diagnostics (from morie_weights_diagnostics).
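Raking by iterative proportional fitting (IPF) can be sketched in a few lines of base R. `rake_sketch` is a hypothetical stand-in written for this illustration: it follows the `margins` layout described above but omits the `bounds` argument and the diagnostics, and its convergence rule (largest relative adjustment in a full sweep below `tol`) is an assumption.

```r
rake_sketch <- function(weights, df, margins, max_iter = 100, tol = 1e-6) {
  w <- weights
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    max_adj <- 0
    for (v in names(margins)) {
      cats <- as.character(df[[v]])
      # Current weighted total per category of variable v.
      current <- vapply(split(w, cats), sum, numeric(1))
      ratio <- margins[[v]][names(current)] / current
      w <- w * unname(ratio[cats])
      max_adj <- max(max_adj, abs(ratio - 1))
    }
    if (max_adj < tol) {
      converged <- TRUE
      break
    }
  }
  list(weights = w, converged = converged,
       iterations = iter, max_adjustment = max_adj)
}

df <- data.frame(
  sex = c("F", "F", "M", "M", "M", "F"),
  region = c("N", "S", "N", "S", "N", "S")
)
margins <- list(sex = c(F = 60, M = 40), region = c(N = 50, S = 50))
res <- rake_sketch(rep(1, nrow(df)), df, margins)
tapply(res$weights, df$sex, sum)
tapply(res$weights, df$region, sum)
```

Each sweep matches one margin exactly and slightly disturbs the others, which is why the loop repeats until the adjustments become negligible.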
method selects the rescaling: "JK1", "JKn", "BRR", "Fay", "bootstrap", "SDR".
morie_weights_replicate_variance(
  full_estimate,
  replicate_estimates,
  method = c("JK1", "JKn", "BRR", "Fay", "bootstrap", "SDR"),
  fay_coefficient = 0,
  strata = NULL
)
full_estimate |
Numeric scalar; full-sample point estimate. |
replicate_estimates |
Numeric vector of replicate point estimates. |
method |
Character; calibration / smoothing / variance method. Allowed values depend on the caller. |
fay_coefficient |
Fay's coefficient ( |
strata |
Optional vector of stratum identifiers aligned with
|
Successive Difference Replication (SDR) weights.
morie_weights_sdr(weights, n_replicates = 100, seed = 42)
weights |
Numeric vector of unit-level design weights. |
n_replicates |
Integer; replicate count. |
seed |
Integer RNG seed. |
Smooth survey weights via shrinkage toward the mean (or log-mean).
morie_weights_smooth(
  weights,
  method = c("linear_shrinkage", "log_transform"),
  shrinkage_factor = 0.5
)
weights |
Numeric vector of unit-level design weights. |
method |
Character; calibration / smoothing / variance method. Allowed values depend on the caller. |
shrinkage_factor |
Numeric in |
method = "percentile" clips at the specified percentiles;
method = "winsorize" replaces outliers with the boundary values.
morie_weights_trim(
  weights,
  lower_percentile = 1,
  upper_percentile = 99,
  method = c("percentile", "winsorize")
)
weights |
Numeric vector of unit-level design weights. |
lower_percentile |
Lower percentile cut for |
upper_percentile |
Upper percentile cut for |
method |
Character; calibration / smoothing / variance method. Allowed values depend on the caller. |
Simulates samples of size length(x) from Normal(effect_size, 1)
and reports the rejection rate of two-sided wilcox.test at level
alpha.
morie_wilcoxon_power(x, effect_size = 0.5, alpha = 0.05, nsim = 2000, seed = 0)
x |
Numeric vector (only length(x) is used). |
effect_size |
Location shift under H1. |
alpha |
Test level. |
nsim |
Replicates. |
seed |
Reproducibility seed (NULL = no fix). |
Named list: statistic (power), n, effect_size, alpha, nsim, se.
morie_wilcoxon_power(x = rnorm(50))
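The simulation described above can be sketched in base R. `wilcoxon_power_sketch` is a hypothetical name; the sketch assumes a one-sample signed-rank test against a location of zero and uses a smaller default `nsim` to keep the run short. The `se` field is taken to be the binomial standard error of the rejection rate, which is also an assumption.

```r
wilcoxon_power_sketch <- function(n, effect_size = 0.5, alpha = 0.05,
                                  nsim = 500, seed = 0) {
  if (!is.null(seed)) set.seed(seed)
  # One replicate: draw from Normal(effect_size, 1) and test H0: location = 0.
  rejections <- replicate(nsim, {
    sim <- rnorm(n, mean = effect_size, sd = 1)
    wilcox.test(sim, mu = 0, alternative = "two.sided")$p.value < alpha
  })
  power <- mean(rejections)
  list(
    statistic = power, n = n, effect_size = effect_size,
    alpha = alpha, nsim = nsim,
    se = sqrt(power * (1 - power) / nsim) # Monte Carlo standard error
  )
}

res <- wilcoxon_power_sketch(n = 50)
res$statistic
```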
Wilcoxon signed-rank test (paired)
morie_wilcoxon_signed_rank_test(
  x1,
  x2,
  alternative = c("two.sided", "greater", "less")
)
x1 |
Numeric vector (before). |
x2 |
Numeric vector (after). |
alternative |
|
Named list: V, p_value.
# See the package vignettes for usage examples:
# vignette(package = "rmorie")
Write a Markdown audit report.
morie_write_audit_markdown(out_path, audit_result)
out_path |
Path to write to. |
audit_result |
A |
The path written.
Write synthetic epidemiology-style data to CSV
morie_write_synthetic_data(
  path,
  n = 5000L,
  seed = 42L,
  special_code_rate = 0.02,
  profile = c("generic", "morie_legacy"),
  name_map = NULL,
  overwrite = FALSE
)
path |
Output CSV path. |
n |
Number of rows. |
seed |
Random seed. |
special_code_rate |
Proportion of survey-style missing codes. |
profile |
Naming profile for output columns. |
name_map |
Optional custom variable name map. |
overwrite |
If |
Normalized output path.
out <- morie_write_synthetic_data(tempfile(fileext = ".csv"), n = 200, seed = 1)
file.exists(out)
Wraps the xgboost package. If xgboost isn't installed, falls
back to gbm (gradient boosting) so users still get a usable
boosted-trees result; the backend is flagged in the output.
morie_xgboost_objective(
  x,
  y,
  n_estimators = 100L,
  learning_rate = 0.1,
  max_depth = 3L,
  reg_lambda = 1,
  reg_alpha = 0,
  task = "auto",
  seed = 0L,
  deterministic_seed = NULL
)
x |
Numeric predictor matrix. |
y |
Response. |
n_estimators |
Number of boosting rounds. |
learning_rate |
eta / shrinkage. |
max_depth |
Tree depth. |
reg_lambda |
L2 leaf penalty. |
reg_alpha |
L1 leaf penalty. |
task |
"auto", "classification", or "regression". |
seed |
RNG seed. |
deterministic_seed |
Integer or NULL. If supplied, the RNG state
is derived from the SHA-keyed |
Named list: estimate, train_score, feature_importances, backend, n_estimators, learning_rate, max_depth, reg_lambda, reg_alpha, task, n, method.
morie_xgboost_objective(x = rnorm(50), y = rnorm(50))
One-way ANOVA with pairwise Bonferroni-adjusted t-tests
mrm_anova_bonferroni(data, response_col, group_col, alpha = 0.05)
data |
data.frame. |
response_col |
Response column name. |
group_col |
Group column name. |
alpha |
Family-wise error rate (default 0.05). |
Named list with f_statistic, p_value, n_groups, n_pairs, alpha, alpha_per_pair, pairs (data.frame), interpretation.
set.seed(2026)
n <- 30L
df <- data.frame(
  y = c(rnorm(n, 0), rnorm(n, 0.5), rnorm(n, 1)),
  g = rep(c("A", "B", "C"), each = n)
)
res <- mrm_anova_bonferroni(df, response_col = "y", group_col = "g")
res$alpha_per_pair # Bonferroni-corrected per-pair alpha
res$pairs # per-pair t-tests with adjusted significance flags
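The Bonferroni step divides the family-wise alpha by the number of pairwise comparisons. The base-R sketch below shows that arithmetic on the same simulated data; the use of Welch t-tests for each pair and the column names of `pair_table` are assumptions, not a description of the package internals.

```r
set.seed(2026)
n <- 30L
df <- data.frame(
  y = c(rnorm(n, 0), rnorm(n, 0.5), rnorm(n, 1)),
  g = rep(c("A", "B", "C"), each = n)
)

alpha <- 0.05
groups <- sort(unique(df$g))
n_pairs <- choose(length(groups), 2) # 3 pairs for 3 groups
alpha_per_pair <- alpha / n_pairs    # Bonferroni-corrected threshold

# One t-test per pair (Welch by default), flagged against the corrected alpha.
pairs <- t(combn(groups, 2))
p_raw <- apply(pairs, 1, function(p) {
  t.test(df$y[df$g == p[1]], df$y[df$g == p[2]])$p.value
})
pair_table <- data.frame(
  group1 = pairs[, 1], group2 = pairs[, 2],
  p_value = p_raw, significant = p_raw < alpha_per_pair
)
```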
One-way ANOVA + Tukey HSD post-hoc
mrm_anova_oneway(data, response_col, group_col, alpha = 0.05)
data |
data.frame containing response and group columns. |
response_col, group_col
|
Column names. |
alpha |
CI level (default 0.05). |
Named list with f_statistic, p_value, df_between, df_within, means, n_per_group, tukey_hsd, interpretation.
set.seed(2026)
n <- 30L
df <- data.frame(
  y = c(rnorm(n, 0), rnorm(n, 0.5), rnorm(n, 1)),
  g = rep(c("A", "B", "C"), each = n)
)
res <- mrm_anova_oneway(df, response_col = "y", group_col = "g")
res$f_statistic
res$p_value
res$tukey_hsd
Power of one-way ANOVA given Cohen's f
mrm_anova_power(k_groups, n_per_group, effect_size_f, alpha = 0.05)
k_groups |
Number of groups. |
n_per_group |
Per-group sample size. |
effect_size_f |
Cohen's f. |
alpha |
Type-I error (default 0.05). |
Named list with k_groups, n_per_group, N_total, effect_size_f, alpha, df1, df2, noncentrality, F_critical, power, interpretation.
# Power to detect a medium effect (Cohen's f = 0.25) with 4 groups
# of 30 each at alpha = 0.05:
res <- mrm_anova_power(
  k_groups = 4, n_per_group = 30,
  effect_size_f = 0.25, alpha = 0.05
)
res$power
res$F_critical

# Sample-size sensitivity: what power do I get with smaller groups?
sapply(c(10, 20, 30, 50, 100), function(n) {
  mrm_anova_power(
    k_groups = 3, n_per_group = n,
    effect_size_f = 0.25
  )$power
})
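The returned fields (df1, df2, noncentrality, F_critical, power) correspond to the usual noncentral-F power calculation. The sketch below assumes the standard convention in which the noncentrality parameter is Cohen's f squared times the total sample size; whether the package uses exactly this convention is not stated here.

```r
k_groups <- 4
n_per_group <- 30
effect_size_f <- 0.25
alpha <- 0.05

N_total <- k_groups * n_per_group
df1 <- k_groups - 1
df2 <- N_total - k_groups

# Assumed convention: lambda = f^2 * N.
noncentrality <- effect_size_f^2 * N_total

# Reject when F exceeds the central-F critical value ...
F_critical <- qf(1 - alpha, df1, df2)

# ... and power is the noncentral-F probability of exceeding it.
power <- pf(F_critical, df1, df2, ncp = noncentrality, lower.tail = FALSE)
```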
Six analyzers that load one ARSAU dataset via the loaders in
R/arsau.R and chain the generic MRM Use-of-Force callables
from R/mrm_uof.R, producing a single named list with
multi-paragraph interpretation, the loaded data, all sub-analyses,
and the source sidecar (if present).
Each analyzer accepts the same year / language /
data_dir arguments as the matching loader, and returns a
named list whose constituent sub-results are available under named
keys (force_concentration, disparity_by_race, etc.).
Returns each assumption with diagnostic evidence + a flag.
mrm_assumptions_check(data, treatment_col, outcome_col, covariates)
data |
data.frame. |
treatment_col |
Binary 0/1 column. |
outcome_col |
Outcome column (presently unused; reserved for future E-value evidence). |
covariates |
Character vector of covariate columns. |
Named list with sutva, unconfoundedness, probabilistic_assignment, overall_verdict sub-lists.
set.seed(2026)
n <- 300L
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))
y <- 0.7 * D + 0.3 * x + rnorm(n)
df <- data.frame(D = D, y = y, age = x)
chk <- mrm_assumptions_check(df,
  treatment_col = "D", outcome_col = "y",
  covariates = "age"
)
chk$overall_verdict
Designed-experiment convenience wrapper around the morie causal estimator family
mrm_causal_design(
  data,
  treatment_col,
  outcome_col,
  covariates = character(0),
  estimator = c("ipw", "diff_in_means")
)
data |
data.frame with treatment, outcome, optional covariates. |
treatment_col |
Binary 0/1 treatment column. |
outcome_col |
Continuous outcome column. |
covariates |
Optional character vector of covariate columns. |
estimator |
One of "ipw" or "diff_in_means". |
Named list with estimator, estimate, se, ci_lower, ci_upper, p_value, n, n_treated, interpretation.
set.seed(2026)
n <- 200L
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))
y <- 0.7 * D + 0.3 * x + rnorm(n, 0, 0.5)
df <- data.frame(D = D, y = y, age = x)

# IPW-adjusted ATE
ipw <- mrm_causal_design(df,
  treatment_col = "D", outcome_col = "y",
  covariates = "age", estimator = "ipw"
)

# Naive difference in means for comparison
raw <- mrm_causal_design(df,
  treatment_col = "D", outcome_col = "y",
  estimator = "diff_in_means"
)
c(ipw = ipw$estimate, raw = raw$estimate)
A design is "balanced on X" if every covariate satisfies |SMD(X_i)| <= threshold_pct, with the standardised mean difference expressed in percent.
mrm_check_balancing(data, treatment_col, covariates, threshold_pct = 10)
data |
data.frame. |
treatment_col |
Binary 0/1 treatment column. |
covariates |
Character vector of covariate columns. |
threshold_pct |
%SMD imbalance threshold (default 10). |
Named list with table, threshold_pct, n_imbalanced, overall_balanced, interpretation.
set.seed(2026)
n <- 200L
df <- data.frame(
  D = rbinom(n, 1, 0.4),
  age = rnorm(n, 50, 10),
  bmi = rnorm(n, 27, 4)
)
df$age[df$D == 1] <- df$age[df$D == 1] + 3 # imbalance on age
bal <- mrm_check_balancing(df,
  treatment_col = "D",
  covariates = c("age", "bmi")
)
bal$overall_balanced
bal$interpretation
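The balance rule can be sketched in a few lines of base R. The helper smd_pct() is hypothetical (it is not an rmorie function), and the pooled-standard-deviation denominator follows the Rosenbaum and Rubin (1985) convention; the package's exact formula is not shown in this manual, so treat the denominator as an assumption.

```r
# Hypothetical helper: percent standardised mean difference with a
# pooled-SD denominator (assumed convention, Rosenbaum & Rubin 1985)
smd_pct <- function(x, d) {
  m1 <- mean(x[d == 1])
  m0 <- mean(x[d == 0])
  s_pooled <- sqrt((var(x[d == 1]) + var(x[d == 0])) / 2)
  100 * (m1 - m0) / s_pooled
}

set.seed(2026)
n <- 200L
d <- rbinom(n, 1, 0.4)
age <- rnorm(n, 50, 10) + 3 * d   # shifted in the treated group
bmi <- rnorm(n, 27, 4)

smd <- c(age = smd_pct(age, d), bmi = smd_pct(bmi, d))
threshold_pct <- 10
overall_balanced <- all(abs(smd) <= threshold_pct)
```

A single covariate over the threshold is enough to flag the whole design as imbalanced, which is why the rule is stated as "every" covariate.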
Propensity-score support overlap diagnostic (Cole-Hernan 2008)
mrm_check_overlap(data, treatment_col, covariates)
data |
data.frame. |
treatment_col |
Binary 0/1 treatment column. |
covariates |
Character vector of covariates. |
Named list with e_treated_quantiles, e_control_quantiles, common_support_lower, common_support_upper, n_outside_support, positivity_violations, interpretation.
set.seed(2026)
n <- 300L
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))
df <- data.frame(D = D, age = x)
ovl <- mrm_check_overlap(df,
  treatment_col = "D",
  covariates = "age"
)
ovl$positivity_violations
ovl$interpretation
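A minimal sketch of a common-support check, assuming a logistic propensity model and a min/max definition of the common-support interval. Both are assumptions made for illustration; the rule mrm_check_overlap() uses to set common_support_lower and common_support_upper is not documented here.

```r
set.seed(2026)
n <- 300L
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))

# Estimated propensity scores (assumed model: logistic regression)
e_hat <- fitted(glm(D ~ x, family = binomial()))

# Common support: the range of scores observed in BOTH arms
lower <- max(min(e_hat[D == 1]), min(e_hat[D == 0]))
upper <- min(max(e_hat[D == 1]), max(e_hat[D == 0]))

# Units whose score has no counterpart in the other arm
n_outside <- sum(e_hat < lower | e_hat > upper)

# Per-arm quantiles, analogous to the e_*_quantiles elements above
q_treated <- quantile(e_hat[D == 1], c(0.05, 0.5, 0.95))
q_control <- quantile(e_hat[D == 0], c(0.05, 0.5, 0.95))
```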
Classify placement records under the United Nations Standard Minimum Rules for the Treatment of Prisoners (the Nelson Mandela Rules, UN A/RES/70/175), which define prolonged solitary confinement as any continuous placement exceeding fifteen consecutive days. Provides three denominator conventions and an optional broader restrictive-confinement classification that adds high-alert placements to the numerator.
mrm_classify_mandela(
  data,
  duration_col = "NumberConsecutiveDays_Segregation",
  year_col = "EndFiscalYear",
  id_col = "UniqueIndividual_ID",
  threshold_days = 15L,
  denominator = c("individual_any", "row", "individual_cumulative"),
  broader_rc = FALSE,
  alert_cols = c("MentalHealth_Alert", "SuicideRisk_Alert", "SuicideWatch_Alert"),
  meaningful_contact_col = NULL
)
data |
A data.frame or data.table containing at minimum the placement-duration column and a fiscal-year column. |
duration_col |
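Column name (character) of consecutive-day
placement durations. Default "NumberConsecutiveDays_Segregation". |
year_col |
Column name of the fiscal-year identifier. Default
"EndFiscalYear". |
id_col |
Column name of the per-year individual identifier.
Default "UniqueIndividual_ID". |
threshold_days |
Mandela duration threshold in days. Default
15L. |
denominator |
One of "individual_any", "row", or "individual_cumulative". |
broader_rc |
Logical. If TRUE, also reports the broader restrictive-confinement rate, which adds high-alert placements to the numerator. Default FALSE. |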
alert_cols |
Character vector of binary alert columns used to compute alert-complexity for the broader rate. Default the three b01 alert columns. |
meaningful_contact_col |
Optional. Column name of a 1-if-meaningful-contact indicator (federal Sprott-Doob style). When supplied, rows with met-contact are excluded from the numerator. |
The provincial classification operates on duration alone (the
duration_col column). The federal classification additionally
requires unmet "meaningful contact" criteria (Sprott & Doob, 2021);
if meaningful_contact_col is supplied, that column is treated as a
1-if-met indicator and rows with met-contact are excluded from the
numerator regardless of duration.
A data.frame with columns:
Fiscal year (or "pooled").
Total denominator under the chosen convention.
Numerator: count of records (or individuals) classified as Mandela-prolonged.
Proportion n_mandela / denominator, in the unit interval.
Same as rate expressed as percentage.
Broader-rate numerator (if broader_rc).
Broader-rate proportion (if broader_rc).
United Nations General Assembly (2015). United Nations Standard Minimum Rules for the Treatment of Prisoners (the Nelson Mandela Rules). A/RES/70/175.
Sprott, J. B., & Doob, A. N. (2021). Solitary Confinement, Torture, and Canada's Structured Intervention Units. Centre for Criminology and Sociolegal Studies, University of Toronto. Available at the Centre for Criminology and Sociolegal Studies web site: crimsl.utoronto.ca (file TortureSolitarySIUsSprottDoob23Feb2021_0.pdf).
Iftene, A., & Doob, A. N. (2024). Do Independent External Decision Makers Ensure that "An Inmate's Confinement in a Structured Intervention Unit Is to End as Soon as Possible"? (Corrections and Conditional Release Act, Section 33). Dalhousie Schulich School of Law, report 51. https://digitalcommons.schulichlaw.dal.ca/reports/51/
# Strict provincial Mandela on b01:
# mrm_classify_mandela(b01_data)
#
# Broader restrictive-confinement (adds high-alert placements):
# mrm_classify_mandela(b01_data, broader_rc = TRUE)
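To see how the three denominator conventions can give different rates on the same records, here is a base-R sketch on a toy data frame. The toy values are invented for illustration, and the definitions follow the denominator descriptions given for mrm_otis_mandela_spectrum(); it is assumed, not confirmed, that mrm_classify_mandela() uses the same ones.

```r
# Toy placement records; column names follow the documented defaults
toy <- data.frame(
  UniqueIndividual_ID = c("p1", "p1", "p2", "p3", "p3"),
  EndFiscalYear = 2023L,
  NumberConsecutiveDays_Segregation = c(10, 9, 20, 3, 4)
)
threshold_days <- 15
dur <- toy$NumberConsecutiveDays_Segregation
id <- toy$UniqueIndividual_ID

# "row": share of placements longer than the threshold
rate_row <- mean(dur > threshold_days)

# "individual_any": share of individuals with at least one such placement
rate_any <- mean(tapply(dur > threshold_days, id, any))

# "individual_cumulative": share of individuals whose summed days
# within the year exceed the threshold
rate_cum <- mean(tapply(dur, id, sum) > threshold_days)

c(row = rate_row, individual_any = rate_any, individual_cumulative = rate_cum)
```

Person p1 never has a single placement over 15 days but accumulates 19 days in the year, so p1 counts only under the cumulative convention.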
Generate sample means from a base distribution.
mrm_clt_demo(
  base_distribution = "unif",
  n_samples = 1000L,
  sample_size = 30L,
  seed = 42L,
  ...
)
base_distribution |
Distribution suffix passed to
|
n_samples |
Number of sample means. |
sample_size |
Size of each sample. |
seed |
RNG seed. |
... |
Additional parameters passed to |
data.frame with sample_index, sample_mean, z_score.
# 1000 sample means of size 30 from an exponential(1) base;
# standardised z-scores converge to N(0,1):
res <- mrm_clt_demo(
  base_distribution = "exp",
  n_samples = 1000L,
  sample_size = 30L,
  seed = 42L,
  rate = 1
)
summary(res$z_score) # mean ~ 0, sd ~ 1
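The same demonstration can be written directly in base R, which makes the standardisation step explicit. This sketch standardises with the theoretical mean and standard deviation of the exponential(1) distribution (both equal to 1); whether mrm_clt_demo() uses theoretical or empirical moments for z_score is an assumption left open here.

```r
set.seed(42)
n_samples <- 1000L
sample_size <- 30L

# One sample mean per replicate, drawn from an exponential(1) base
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# Exponential(1): mean 1, sd 1, so the standard error is 1 / sqrt(sample_size)
z_score <- (sample_means - 1) / (1 / sqrt(sample_size))

res <- data.frame(
  sample_index = seq_len(n_samples),
  sample_mean = sample_means,
  z_score = z_score
)
c(mean = mean(res$z_score), sd = sd(res$z_score))
```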
R parity of morie.mrm_design (Python). Four
general-purpose statistical-design entry points covering the
designexptr.org pedagogical sequence: two-treatment comparison,
one-way ANOVA with Tukey HSD, 2^k factorial design, and a
designed-experiment convenience wrapper around the morie causal
estimator family.
Each design callable returns a named list of estimates,
test statistics, p-values, and a plain-language interpretation.
Box, G. E. P., Hunter, J. S., & Hunter, W. G. (2005). Statistics for Experimenters. Wiley.
set.seed(2026)
a <- rnorm(40, mean = 5, sd = 1.2)
b <- rnorm(40, mean = 5.5, sd = 1.5)
mrm_two_treatment_test(a, b)$p_welch
Balance, overlap, SUTVA-style assumption checks, and the median
causal effect estimator. R parity of morie.mrm_diagnostics.
Each diagnostic callable returns a named list of balance
and overlap statistics (or the estimated effect) together with a
plain-language interpretation.
Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social and Biomedical Sciences. Cambridge University Press.
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1), 33-38.
Cole, S. R., & Hernan, M. A. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664.
set.seed(2026)
n <- 200L
df <- data.frame(
  D = rbinom(n, 1, 0.4),
  age = rnorm(n, 50, 10),
  bmi = rnorm(n, 27, 4)
)
mrm_standardised_difference(df,
  treatment_col = "D",
  covariates = c("age", "bmi")
)
R parity of morie.mrm_doe. Closes the Chapter-3/4/5 coverage
gap from designexptr.org.
Each design-of-experiments callable returns a named list
holding the constructed design or the analysis result and a
plain-language interpretation.
Box, G. E. P., Hunter, J. S., & Hunter, W. G. (2005). Statistics for Experimenters (2nd ed.). Wiley.
Cochran, W. G., & Cox, G. M. (1957). Experimental Designs (2nd ed.). Wiley.
Montgomery, D. C. (2017). Design and Analysis of Experiments (9th ed.).
Box, G. E. P., & Wilson, K. B. (1951). On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society: Series B, 13(1), 1-45.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences.
set.seed(2026)
n <- 30L
df <- data.frame(
  y = c(rnorm(n, 0), rnorm(n, 0.5), rnorm(n, 1)),
  g = rep(c("A", "B", "C"), each = n)
)
mrm_anova_bonferroni(df, response_col = "y", group_col = "g")$alpha_per_pair
Returns main effects (difference of means at +1 vs -1 per factor), all interaction effects, and half-normal-plot coordinates for Daniel's method (which lets the user separate active effects from a null half-normal line on the same axes).
mrm_factorial_2k(data, response_col, factor_cols)
data |
data.frame with response_col and factor_cols. Factor
columns may be coded as |
response_col |
Numeric response column. |
factor_cols |
Character vector of factor column names. |
Named list with main_effects, interaction_effects, half_normal_coords (data.frame), n, k, interpretation.
# 2^3 full factorial: 8 runs, factors A, B, C in {-1, +1}.
set.seed(2026)
lvl <- c(-1, 1)
df <- expand.grid(A = lvl, B = lvl, C = lvl)
df$y <- 10 + 2 * df$A + 1.5 * df$B + 0.5 * df$A * df$B + rnorm(8, 0, 0.2)
res <- mrm_factorial_2k(df,
  response_col = "y",
  factor_cols = c("A", "B", "C")
)
res$main_effects
res$interaction_effects
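The "difference of means at +1 vs -1" definition can be checked by hand on a noise-free version of the example above. The helper contrast_effect() is hypothetical, and the half-normal plotting positions use one common convention for Daniel's plot; the positions returned in half_normal_coords may follow a different one.

```r
lvl <- c(-1, 1)
df <- expand.grid(A = lvl, B = lvl, C = lvl)
# Noise-free response so the effects can be read off exactly
df$y <- 10 + 2 * df$A + 1.5 * df$B + 0.5 * df$A * df$B

# Hypothetical helper: mean response at +1 minus mean response at -1
contrast_effect <- function(y, f) mean(y[f == 1]) - mean(y[f == -1])

effects <- c(
  A  = contrast_effect(df$y, df$A),
  B  = contrast_effect(df$y, df$B),
  C  = contrast_effect(df$y, df$C),
  AB = contrast_effect(df$y, df$A * df$B)  # interaction contrast column
)
effects  # A = 4, B = 3, C = 0, AB = 1

# Half-normal plotting positions for the sorted absolute effects
abs_sorted <- sort(abs(effects))
m <- length(abs_sorted)
half_normal_q <- qnorm(0.5 + 0.5 * (seq_len(m) - 0.5) / m)
```

Each effect is twice the corresponding regression coefficient because the factor moves two coded units, from -1 to +1.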
Factor columns are assumed to be coded -1/+1.
mrm_fractional_factorial(data, response_col, factor_cols, generator = NULL)
data |
data.frame. |
response_col |
Response column. |
factor_cols |
Character vector of factor columns (each coded -1 or +1). |
generator |
Optional generator string "X=YZ,..." for aliasing. |
Named list with main_effects, alias_structure, n, k, interpretation.
# 2^(3-1) fractional factorial with generator C = A*B (defining
# relation I = ABC): 4 runs instead of 8.
set.seed(2026)
df <- data.frame(
  A = c(-1, 1, -1, 1),
  B = c(-1, -1, 1, 1),
  C = c(1, -1, -1, 1)
)
df$y <- 5 + 2 * df$A + 1.5 * df$B + rnorm(4, 0, 0.3)
res <- mrm_fractional_factorial(df,
  response_col = "y",
  factor_cols = c("A", "B", "C")
)
res$main_effects
Three-level categorical coding of tract-level gentrification,
mirroring the Python module morie.mrm_primitives.gentrification.
Adapted from Laniyonu (2018) Urban Affairs Review 54(5):898-930,
which itself adapts Chapple / Freeman / Maciag.
The key insight: continuous gentrification indices conflate two distinct populations – already-affluent tracts (immune to gentrification by construction) and marginalised tracts that DID or DID NOT change. The cleanest comparator is the marginalised-but-did-not-gentrify tract, so this primitive emits a 3-level factor:
ineligible – tract was above the baseline-marginalisation
cutoff (top 50% at the default quantile) and so
cannot meaningfully "gentrify". Drop from analyses that want
the gentrification comparator.
eligible – tract was below the cutoff at t=0 AND
did NOT cross the gentrification threshold by t=1. This is the
control: marginalised, did-not-change.
gentrified – tract was below the cutoff at t=0 AND
DID cross the gentrification threshold (top-tercile growth in
college share AND top-tercile growth in median rent).
Implements the Laniyonu (2018) operationalisation:
mrm_gentrification_panel(
  df,
  baseline_income_col,
  baseline_rent_col,
  growth_college_col,
  growth_rent_col,
  baseline_marginalisation_quantile = 0.5,
  gentrification_growth_quantile = 0.667
)
df |
A |
baseline_income_col |
Character. Column carrying baseline (period t=0) income. |
baseline_rent_col |
Character. Column carrying baseline rent. |
growth_college_col |
Character. Column carrying college / BA-share growth between baseline and follow-up. |
growth_rent_col |
Character. Column carrying median-rent growth between baseline and follow-up. |
baseline_marginalisation_quantile |
Numeric in (0, 1); default
0.5. Tract is eligible if baseline income AND rent are
at or below this quantile of the panel. |
gentrification_growth_quantile |
Numeric in (0, 1); default
0.667. Tract gentrifies if college growth AND rent growth are
at or above this quantile of the panel. |
Tract is eligible to gentrify iff baseline income
AND baseline rent are at or below
baseline_marginalisation_quantile of the panel.
Among the eligible, the tract is gentrified iff
growth-in-college-share AND growth-in-rent are at or above
gentrification_growth_quantile.
Everything above the baseline cut is ineligible.
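The three rules above translate into a short base-R sketch. It assumes quantile() with its default interpolation (type 7) for the four cut-points, which may not match the package, and the variable names are illustrative only.

```r
set.seed(1)
df <- data.frame(
  inc0 = runif(50, 20000, 80000),
  rent0 = runif(50, 500, 2000),
  coll_g = rnorm(50),
  rent_g = rnorm(50)
)

# Four cut-points (assumed: stats::quantile default type)
inc_cut <- quantile(df$inc0, 0.5)
rent_cut <- quantile(df$rent0, 0.5)
coll_cut <- quantile(df$coll_g, 0.667)
rgrow_cut <- quantile(df$rent_g, 0.667)

# Rule 1: marginalised at baseline on BOTH income and rent
can_gentrify <- df$inc0 <= inc_cut & df$rent0 <= rent_cut
# Rule 2: among those, top growth on BOTH college share and rent
did_gentrify <- can_gentrify & df$coll_g >= coll_cut & df$rent_g >= rgrow_cut

# Rule 3: everything else above the baseline cut is ineligible
tract_label <- ifelse(!can_gentrify, "ineligible",
  ifelse(did_gentrify, "gentrified", "eligible")
)
table(tract_label)
```

Because both conditions in each rule are joined with AND, the gentrified group is usually small; with independent growth columns, only about one in nine baseline-marginalised tracts would clear both top-tercile cuts.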
A named list with classes morie_mrm_result,
morie_rich_result, list. Carries labels
(character vector of length nrow(df)), thresholds
(list of four cut-points), counts (table of label levels),
plus interpretation + warnings.
set.seed(1)
df <- data.frame(
  inc0 = runif(50, 20000, 80000),
  rent0 = runif(50, 500, 2000),
  coll_g = rnorm(50),
  rent_g = rnorm(50)
)
res <- mrm_gentrification_panel(
  df,
  baseline_income_col = "inc0",
  baseline_rent_col = "rent0",
  growth_college_col = "coll_g",
  growth_rent_col = "rent_g"
)
table(res$labels)
Graeco-Latin square four-way ANOVA (row, col, Latin, Greek)
mrm_graeco_latin(data, response_col, row_col, col_col, latin_col, greek_col)
data |
data.frame. |
response_col, row_col, col_col, latin_col, greek_col |
Column names. |
Named list with anova, n, interpretation.
# Hardcoded 4 x 4 orthogonal Graeco-Latin square (two random Latin
# squares are generally NOT orthogonal, so we use a known pair):
L <- matrix(c(
  "A", "B", "C", "D",
  "B", "A", "D", "C",
  "C", "D", "A", "B",
  "D", "C", "B", "A"
), nrow = 4L, byrow = TRUE)
G <- matrix(c(
  "a", "b", "c", "d",
  "c", "d", "a", "b",
  "d", "c", "b", "a",
  "b", "a", "d", "c"
), nrow = 4L, byrow = TRUE)
set.seed(2026)
df <- expand.grid(row = paste0("R", 1:4), col = paste0("C", 1:4))
df$latin <- as.vector(L)
df$greek <- as.vector(G)
df$y <- match(df$latin, LETTERS) * 1.2 +
  match(df$greek, letters) * 0.5 +
  rnorm(16, 0, 0.3)
res <- mrm_graeco_latin(df,
  response_col = "y",
  row_col = "row", col_col = "col",
  latin_col = "latin", greek_col = "greek"
)
res$anova
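The orthogonality requirement mentioned in the example comment is easy to verify: two Latin squares are orthogonal when superimposing them produces every (Latin, Greek) pair exactly once. The check below is a standalone base-R sketch applied to the same two squares; it does not call any rmorie function.

```r
L <- matrix(c(
  "A", "B", "C", "D",
  "B", "A", "D", "C",
  "C", "D", "A", "B",
  "D", "C", "B", "A"
), nrow = 4L, byrow = TRUE)
G <- matrix(c(
  "a", "b", "c", "d",
  "c", "d", "a", "b",
  "d", "c", "b", "a",
  "b", "a", "d", "c"
), nrow = 4L, byrow = TRUE)

# Superimpose the squares cell by cell
cell_pairs <- paste0(as.vector(L), as.vector(G))

# Orthogonal iff all k^2 pairs are distinct
is_orthogonal <- anyDuplicated(cell_pairs) == 0L
is_orthogonal
```

Running the same check on two independently drawn random Latin squares will usually return FALSE, which is why the example hardcodes a known pair.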
R parity of morie.mrm_kulldorff.mrm_tps_kulldorff_scan().
Implements Kulldorff's 1997 Poisson log-likelihood-ratio space-time
scan with a Monte-Carlo permutation test for significance.
The scan iterates over (centre, radius, time-window) tuples,
computing the Poisson LRT against the null hypothesis H0 (events uniformly
distributed in space and time). The maximum LRT is the test
statistic; permutations of event timestamps generate the null distribution.
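For a single candidate cylinder, the Poisson log-likelihood ratio of Kulldorff (1997) and the Monte-Carlo p-value can be sketched as follows. The function names are hypothetical, and the sketch covers only the scoring of one window and the rank-based p-value, not the enumeration of centres, radii, and time windows that the package performs.

```r
# Hypothetical helper: Poisson LLR for one window.
# c_in  = observed events inside the window
# e_in  = expected events inside under H0
# c_tot = total observed events (expected total equals c_tot)
poisson_llr <- function(c_in, e_in, c_tot) {
  if (c_in <= e_in) return(0)  # only excess-risk windows are scored
  c_out <- c_tot - c_in
  e_out <- c_tot - e_in
  inside <- c_in * log(c_in / e_in)
  outside <- if (c_out > 0) c_out * log(c_out / e_out) else 0
  inside + outside
}

# Hypothetical helper: rank-based Monte-Carlo p-value, where null_stats
# holds the maximum LLR from each permuted dataset
mc_p_value <- function(observed, null_stats) {
  (1 + sum(null_stats >= observed)) / (length(null_stats) + 1)
}

poisson_llr(c_in = 20, e_in = 10, c_tot = 100)
mc_p_value(observed = 5, null_stats = c(1, 2, 3))
```

With this p-value formula, 49 permutations give a smallest attainable p-value of 1/50 = 0.02, so larger permutation counts are needed for finer resolution.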
mrm_tps_kulldorff_scan() returns a named list with
the most likely cluster, its Poisson log-likelihood-ratio statistic,
the Monte-Carlo permutation p-value, and a plain-language
interpretation.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26(6), 1481–1496.
if (FALSE) {
  tps <- morie_sample("tps_assault")
  mrm_tps_kulldorff_scan(tps, n_permutations = 49)
}
Latin-square three-way ANOVA (row, col, treatment)
mrm_latin_square(data, response_col, row_col, col_col, treatment_col)
data |
data.frame. |
response_col, row_col, col_col, treatment_col |
Column names. |
Named list with anova, n, k, interpretation.
# 4 x 4 Latin square: each treatment appears once per row and column.
# `mrm_random_latin()` returns integer codes 0..k-1; convert to
# letters for a more readable example.
sq <- mrm_random_latin(k = 4, seed = 2026)
df <- expand.grid(row = paste0("R", 1:4), col = paste0("C", 1:4))
df$treatment <- LETTERS[as.integer(as.vector(sq)) + 1L]
set.seed(2026)
df$y <- match(df$treatment, LETTERS) * 1.5 + rnorm(16, 0, 0.4)
res <- mrm_latin_square(df,
  response_col = "y",
  row_col = "row", col_col = "col",
  treatment_col = "treatment"
)
res$anova
R parity of morie.mrm_tps_lisa() and
morie.mrm_tps_polygon_moran_per_year(). Local Moran's I per
polygon centroid with 999-permutation MC significance, plus a
convenience wrapper for the per-year time series used by the morie
empirical paper Section 7.11.
The LISA callables return named lists with per-polygon
local Moran's I, permutation p-values, cluster classifications, and
(for the per-year wrapper) the time series of global Moran's I.
Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis, 27(2), 93–115.
if (FALSE) {
  ncr <- read.csv("Neighbourhood_Crime_Rates_Open_Data.csv")
  mrm_tps_lisa(ncr,
    count_col = "ASSAULT_2024",
    lat_col = "lat", lon_col = "lon"
  )
}
R parity of morie.mrm_mathstats. Closes the Chapter-2
coverage gap from
designexptr.org/mathematical-statistics-simulation-and-computation.html.
Each callable returns a named list with the computed
statistic(s) and a plain-language interpretation.
Wilks, S. S. (1962). Mathematical Statistics. Wiley.
Casella, G., & Berger, R. L. (2002). Statistical Inference. Duxbury.
Lehmann, E. L., & Romano, J. P. (2005). Testing Statistical Hypotheses.
mrm_oneprop_test(x = 58, n = 100, p0 = 0.5)
Empirical Monte-Carlo power
mrm_mc_power(simulator, n_sims = 1000L, alpha = 0.05, seed = 42L)
simulator |
A function(seed) returning a p-value. |
n_sims |
Number of simulated datasets. |
alpha |
Type-I error level. |
seed |
Seed for outer RNG. |
Named list with empirical_power, se, ci95 bounds.
# Empirical power of a one-sample t-test against H0: mu = 0
# with true mu = 0.4 and n = 30.
my_sim <- function(seed) {
  set.seed(seed)
  x <- rnorm(30, mean = 0.4, sd = 1)
  stats::t.test(x, mu = 0)$p.value
}
res <- mrm_mc_power(my_sim, n_sims = 500L, alpha = 0.05)
res$empirical_power
res$ci95_lower
res$ci95_upper
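The quantity being estimated is the rejection rate across simulated datasets, with a binomial standard error. The sketch below reproduces that calculation in base R under two assumptions that the manual does not confirm: per-simulation seeds are drawn from the outer RNG with sample.int(), and the 95% interval is a Wald interval.

```r
my_sim <- function(seed) {
  set.seed(seed)
  x <- rnorm(30, mean = 0.4, sd = 1)
  stats::t.test(x, mu = 0)$p.value
}

n_sims <- 500L
alpha <- 0.05

# Assumed seeding scheme: one integer seed per simulated dataset
set.seed(42)
seeds <- sample.int(.Machine$integer.max, n_sims)
p_values <- vapply(seeds, my_sim, numeric(1))

# Empirical power = share of simulations that reject at level alpha
empirical_power <- mean(p_values < alpha)
se <- sqrt(empirical_power * (1 - empirical_power) / n_sims)
ci95 <- empirical_power + c(-1, 1) * qnorm(0.975) * se
```

The standard error shrinks with the square root of n_sims, so halving the width of the interval takes four times as many simulations.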
Median causal effect via 1:1 nearest-neighbour PS matching
mrm_median_causal_effect(data, treatment_col, outcome_col, covariates)
data |
data.frame. |
treatment_col |
Binary 0/1 column. |
outcome_col |
Outcome column name. |
covariates |
Character vector of covariate columns. |
Named list with median_y1, median_y0, median_treatment_effect, n_matched, interpretation.
set.seed(2026)
n <- 200L
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))
y <- 0.7 * D + 0.3 * x + rnorm(n, 0, 0.5)
df <- data.frame(D = D, y = y, age = x)
res <- mrm_median_causal_effect(df,
  treatment_col = "D", outcome_col = "y",
  covariates = "age"
)
res$median_treatment_effect
res$n_matched
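A rough base-R sketch of 1:1 nearest-neighbour propensity-score matching followed by a difference of medians. Several choices here are assumptions rather than documented behaviour: matching is done with replacement and without a caliper, the propensity model is logistic, and the effect is taken as median(Y1) minus median(Y0) over the matched sample, as the returned element names suggest.

```r
set.seed(2026)
n <- 200L
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))
y <- 0.7 * D + 0.3 * x + rnorm(n, 0, 0.5)

# Assumed propensity model
e_hat <- fitted(glm(D ~ x, family = binomial()))

treated <- which(D == 1)
control <- which(D == 0)

# For each treated unit, the control with the closest propensity score
# (with replacement: a control may be reused)
match_idx <- vapply(
  treated,
  function(i) control[which.min(abs(e_hat[control] - e_hat[i]))],
  integer(1)
)

median_y1 <- median(y[treated])
median_y0 <- median(y[match_idx])
median_treatment_effect <- median_y1 - median_y0
n_matched <- length(treated)
```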
First rung of the diagnostic ladder: if OLS residuals show significant Moran's I, an SDM (or SEM/SAR) is warranted over OLS.
mrm_morans_i(residuals, W)
residuals |
Numeric vector of length N (e.g. OLS residuals). |
W |
Numeric matrix of shape (N, N) – the spatial weight
matrix. Need not be row-standardised but must be aligned with
residuals. |
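Statistic: $I = \frac{N}{S_0} \cdot \frac{e^\top W e}{e^\top e}$, where $e$ is the
(mean-centred) residual vector and $S_0 = \sum_{i,j} w_{ij}$ is the sum of
all weights. Positive -> clustering, negative ->
dispersion, ~0 -> spatial randomness.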
A named list with classes morie_mrm_result,
morie_rich_result, list. Carries
morans_i (the scalar statistic) plus interpretation +
warnings.
set.seed(4)
N <- 20
W <- matrix(runif(N * N), N, N); diag(W) <- 0; W <- W / rowSums(W)
resid <- rnorm(N)
mrm_morans_i(resid, W)$morans_i
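For reference, the textbook form of global Moran's I is short enough to write out. The function morans_i_sketch() is a hypothetical stand-in, not the package function; it centres the input explicitly and normalises by the sum of all weights, which is the standard definition and is assumed (not confirmed) to match mrm_morans_i().

```r
# Hypothetical helper: global Moran's I in its standard form
morans_i_sketch <- function(e, W) {
  z <- e - mean(e)                       # OLS residuals are already ~mean-zero
  s0 <- sum(W)                           # total weight
  num <- as.numeric(t(z) %*% W %*% z)    # spatial cross-product
  (length(z) / s0) * num / sum(z^2)
}

# Four sites on a line, neighbours share an edge
W <- matrix(c(
  0, 1, 0, 0,
  1, 0, 1, 0,
  0, 1, 0, 1,
  0, 0, 1, 0
), nrow = 4L, byrow = TRUE)

# A smooth gradient along the line gives positive autocorrelation
morans_i_sketch(c(1, 2, 3, 4), W)  # 1/3
```

When W is row-standardised, the sum of weights equals N and the leading factor reduces to 1.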
One-proportion test (binomial exact + Wald approximation)
mrm_oneprop_test(x, n, p0, alpha = 0.05)
x |
Number of successes. |
n |
Number of trials. |
p0 |
Null-hypothesis proportion. |
alpha |
Significance level for the confidence intervals (default 0.05 gives a 95% CI). |
Named list with p_hat, p0, n, z_wald, p_value_wald, p_value_exact, ci95_wald_lower/upper, ci95_exact_lower/upper, interpretation.
# H0: proportion = 0.5 against the observed 58/100 successes
mrm_oneprop_test(x = 58, n = 100, p0 = 0.5)
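The two tests named in the title can be reproduced with base R for comparison. This sketch assumes the Wald statistic uses the standard error evaluated at the sample proportion; a variant that evaluates it at p0 (the score form) is also common, and the manual does not say which one mrm_oneprop_test() reports.

```r
x <- 58
n <- 100
p0 <- 0.5
alpha <- 0.05

p_hat <- x / n

# Wald z statistic and two-sided p-value (SE at p_hat: an assumption)
se_wald <- sqrt(p_hat * (1 - p_hat) / n)
z_wald <- (p_hat - p0) / se_wald
p_value_wald <- 2 * pnorm(-abs(z_wald))
ci_wald <- p_hat + c(-1, 1) * qnorm(1 - alpha / 2) * se_wald

# Exact binomial test with Clopper-Pearson interval
exact <- stats::binom.test(x, n, p = p0, conf.level = 1 - alpha)
p_value_exact <- exact$p.value
ci_exact <- as.numeric(exact$conf.int)
```

The exact interval is typically a little wider than the Wald interval, and the two p-values can fall on opposite sides of alpha for borderline counts.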
Five callables for the OTIS (Offender Tracking Information System) public-release datasets, used in the MRM (Multilevel Reconciliation Methodology) empirical companion paper. Every analysis is computed directly from the OTIS CSV files; no precomputed artifacts are required.
Functions:
mrm_otis_placement_concentration(): Hill-MLE Pareto exponent +
Gini coefficient + top-k% concentration on the b09 per-individual
placement-count distribution (within-fiscal-year).
mrm_otis_seg_duration_km(): Kaplan-Meier survival on the b01
NumberConsecutiveDays_Segregation durations (per-placement;
strata = alert profile).
mrm_otis_mortification_cooccurrence(): pairwise Cramer's V across
the three b01 alert flags (MentalHealth, SuicideRisk, SuicideWatch).
mrm_otis_region_locality(): chi-square + Cramer's V on the
Region_AtTimeOfPlacement x Region_MostRecentPlacement
contingency table, with the diagonal/off-diagonal share.
Plus the existing mrm_classify_mandela() (in mandela.R).
The OTIS UniqueIndividual_ID column has format YYYY-XXXXX-SG and
is randomly reassigned every fiscal year. Cross-year tracking is
therefore invalid; all analyses below operate within fiscal year.
Each mrm_otis_*() callable returns a named list with
the computed statistics (concentration indices, survival curves, or
association measures) and a plain-language interpretation.
if (FALSE) {
  b09 <- read.csv("b09_individuals_in_segregation.csv")
  mrm_otis_placement_concentration(b09)
}
R parity of morie.mrm_otis_mandela_spectrum(). Computes a
full grid of provincial Mandela-classified rates across four
denominator conventions x three meaningful-contact proxies, so the
cross-jurisdiction comparison table in the MRM formulations
paper (Section 5.3) can be reproduced from a single function call.
mrm_otis_mandela_spectrum(
  data,
  duration_col = "NumberConsecutiveDays_Segregation",
  year_col = "EndFiscalYear",
  id_col = "UniqueIndividual_ID",
  threshold_days = 15L,
  alert_cols = c("MentalHealth_Alert", "SuicideRisk_Alert", "SuicideWatch_Alert"),
  contact_proxies = c("none", "any_alert", "no_alert"),
  denominators = c("row", "individual_any", "individual_cumulative"),
  c11_data = NULL
)
data |
OTIS b01 data.frame. |
duration_col, year_col, id_col |
Column names. |
threshold_days |
Rule 43 duration threshold (UN: 15 days). |
alert_cols |
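Three b01 alert columns (default "MentalHealth_Alert", "SuicideRisk_Alert", "SuicideWatch_Alert"). |
contact_proxies |
Subset of c("none", "any_alert", "no_alert"). |
denominators |
Subset of c("row", "individual_any", "individual_cumulative", "c11_aggregate"). |
c11_data |
Optional c11 aggregate frame for the "c11_aggregate" denominator. |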
Denominator conventions:
row: per-placement rate (b01 row count)
individual_any: share of within-year individuals with any placement satisfying the criterion
individual_cumulative: share of within-year individuals whose cumulative within-year segregation days exceed the threshold
c11_aggregate: the duration-band aggregate from c11 (requires c11_data to be supplied)
Meaningful-contact proxies (Rule 44, derived from b01 alert flags):
none: Rule 43 only; no contact proxy applied
any_alert: Rule 43 AND (any of MH/SR/SW alert active); these placements receive staff contact, so this is the looser contact-failure proxy
no_alert: Rule 43 AND (no alert active); strictest contact-failure proxy
Tidy long-format data.frame with one row per
(year, denominator, contact_proxy) cell, columns
year, denominator, contact_proxy,
n_eligible, n_mandela, rate, pct.
United Nations General Assembly (2015). United Nations Standard Minimum Rules for the Treatment of Prisoners (the Nelson Mandela Rules). A/RES/70/175. Rule 43 = prolonged (more than 15 days). Rule 44 = at least 22 hours/day, no meaningful human contact.
if (FALSE) {
  b01 <- read.csv("b01_segregation_detailed_dataset.csv")
  spec <- mrm_otis_mandela_spectrum(b01)
  head(spec)
}
Computes the pairwise Cramer's V (and chi-square test) for every pair of the three OTIS b01 alert columns. The MentalHealth x SuicideRisk Cramer's V is the substantive "mortification co-occurrence" figure used in the MRM paper.
mrm_otis_mortification_cooccurrence(
  data,
  alert_cols = c("MentalHealth_Alert", "SuicideRisk_Alert", "SuicideWatch_Alert")
)
data |
A data.frame with at least the three alert columns. |
alert_cols |
Character vector of alert column names (default the three b01 alert columns). |
Values are computed by treating "Yes" as 1 and any other value as
0; rows with NA in either alert column are dropped from that pair.
A data.frame with one row per pair, columns alert_a,
alert_b, n, chi2, df, p_value, morie_cramers_v.
if (FALSE) {
  b01 <- read.csv("b01_segregation_detailed_dataset.csv")
  mrm_otis_mortification_cooccurrence(b01)
}
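For one pair of alert columns, the coding rule ("Yes" as 1, anything else as 0, NA rows dropped per pair) and Cramer's V can be sketched in base R. The helper name is hypothetical, and the sketch uses the uncorrected chi-square statistic; whether the package applies a continuity correction is not stated, so that choice is an assumption.

```r
# Hypothetical helper: Cramer's V for two "Yes"/other alert vectors
cramers_v_yes <- function(a, b) {
  keep <- !is.na(a) & !is.na(b)          # drop rows with NA in either column
  a01 <- as.integer(a[keep] == "Yes")    # "Yes" -> 1, any other value -> 0
  b01 <- as.integer(b[keep] == "Yes")
  tab <- table(factor(a01, levels = 0:1), factor(b01, levels = 0:1))
  # Uncorrected Pearson chi-square (assumption: no Yates correction)
  chi2 <- suppressWarnings(stats::chisq.test(tab, correct = FALSE)$statistic)
  sqrt(as.numeric(chi2) / (sum(tab) * (min(dim(tab)) - 1)))
}

mh <- c("Yes", "Yes", "No", "No", NA, "Yes", "No", "No")
sr <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No")
cramers_v_yes(mh, sr)  # 1: the flags agree on every retained row
```

For a 2 x 2 table, V equals the absolute value of the phi coefficient, so it ranges from 0 (no association) to 1 (perfect agreement or perfect disagreement).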
Expands the OTIS b09 banded per-individual placement counts into a
per-person vector using band midpoints (the published bands are
{1, 2, 3, 4, 5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, >40}),
then computes Hill-MLE Pareto exponent, Gini coefficient, and top-k%
concentration within each fiscal year and pooled.
mrm_otis_placement_concentration(
  data,
  year_col = "EndFiscalYear",
  band_col = "NumberPlacements_Segregation",
  count_col = "NumberIndividuals_Segregation",
  gender_col = NULL,
  gender_keep = NULL,
  x_min = 1L,
  top_pct = 0.05
)
data |
A data.frame in b09 long format with the columns named
in year_col, band_col, and count_col. |
year_col |
Column name of the fiscal-year identifier
(default "EndFiscalYear"). |
band_col |
Column name of the placement-count band
(default "NumberPlacements_Segregation"). |
count_col |
Column name of the per-band individual count
(default "NumberIndividuals_Segregation"). |
gender_col |
Optional gender filter column. If supplied with
gender_keep, only rows whose gender value is in gender_keep are retained. |
gender_keep |
Character vector of gender values to retain. |
x_min |
Hill-MLE lower-tail cutoff (default 1L). |
top_pct |
Numeric in (0, 1); top concentration cutoff
(default 0.05). |
A data.frame with one row per fiscal year plus a final
"pooled" row, containing columns year, n_individuals,
n_placements, mean_per_individual, gini, hill_alpha,
top_pct_share.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5), 1163-1174.
Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661-703.
if (FALSE) {
  b09 <- read.csv("b09_individuals_in_segregation_number_of_times_in_segregation.csv")
  mrm_otis_placement_concentration(b09)
}
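The pipeline described above (expand bands to a per-person vector, then compute three concentration measures) can be sketched in base R. The band counts are invented for illustration and all three helpers are hypothetical. The Hill estimator is written in its tail-index form, n / sum(log(x / x_min)); the density-exponent form of Clauset et al. (2009) adds 1 to this, and the manual does not say which one hill_alpha reports.

```r
# Invented counts: 50 people with 1 placement, 20 with 2, 10 with 3,
# and 5 in the 6-10 band (midpoint 8)
band_mid <- c(1, 2, 3, 8)
n_people <- c(50, 20, 10, 5)
x <- rep(band_mid, times = n_people)   # one entry per individual

# Gini coefficient from the sorted values
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Hill estimator of the Pareto tail index above x_min (assumed form)
hill_alpha <- function(x, x_min = 1) {
  x_tail <- x[x >= x_min]
  length(x_tail) / sum(log(x_tail / x_min))
}

# Share of all placements held by the top top_pct of individuals
top_share <- function(x, top_pct = 0.05) {
  k <- ceiling(top_pct * length(x))
  sum(sort(x, decreasing = TRUE)[seq_len(k)]) / sum(x)
}

c(gini = gini(x), hill_alpha = hill_alpha(x), top_5pct = top_share(x))
```

Midpoint expansion discards within-band variation and the open-ended top band has no natural midpoint, so all three measures are approximations to what person-level counts would give.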
Constructs the contingency table
Region_AtTimeOfPlacement x Region_MostRecentPlacement and reports
the chi-square statistic, Cramer's V, and the share of placements on
the diagonal (within-region staying) vs off-diagonal (cross-region
churn). Ontario seg/RC placement is overwhelmingly diagonal
(locality-preserving) in the public release.
mrm_otis_region_locality(
  data,
  region_at_col = "Region_AtTimeOfPlacement",
  region_recent_col = "Region_MostRecentPlacement"
)
data |
A data.frame with the columns named in region_at_col and region_recent_col. |
region_at_col |
Column name of the at-placement region
(default "Region_AtTimeOfPlacement"). |
region_recent_col |
Column name of the most-recent region
(default "Region_MostRecentPlacement"). |
A list with named elements table (the contingency matrix),
chi2, df, p_value, morie_cramers_v, diagonal_share,
off_diagonal_share.
if (FALSE) {
  b01 <- read.csv("b01_segregation_detailed_dataset.csv")
  mrm_otis_region_locality(b01)
}
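The quantities reported above can be reproduced in base R on any two categorical columns. The sketch below uses simulated region labels (hypothetical data, with about 90% of placements staying in the same region) and the usual definition of Cramer's V, sqrt(chi2 / (n * (min(r, c) - 1))); it is not the package's implementation.

```r
set.seed(1)
regions <- c("Central", "Eastern", "Northern", "Western")
at <- sample(regions, 500, replace = TRUE)
# Hypothetical churn: roughly 90% of rows keep their at-placement region.
recent <- ifelse(runif(500) < 0.9, at, sample(regions, 500, replace = TRUE))

# Square contingency table with a fixed level order on both margins.
tab <- table(at = factor(at, regions), recent = factor(recent, regions))
test <- suppressWarnings(chisq.test(tab, correct = FALSE))
n <- sum(tab)

cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))
diagonal_share <- sum(diag(tab)) / n

res <- list(
  chi2 = unname(test$statistic),
  df = unname(test$parameter),
  p_value = test$p.value,
  cramers_v = cramers_v,
  diagonal_share = diagonal_share,
  off_diagonal_share = 1 - diagonal_share
)
res
```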
Treats each row of OTIS b01 as one observed placement with duration
NumberConsecutiveDays_Segregation and no censoring (all durations
are observed end-to-end within the fiscal year). Returns the median
duration and the requested-quantile survival probabilities by stratum.
mrm_otis_seg_duration_km(
  data,
  duration_col = "NumberConsecutiveDays_Segregation",
  group_cols = NULL,
  probs = c(0.5, 0.25, 0.1, 0.05, 0.01),
  mandela_threshold = 15L
)
data |
A data.frame containing duration_col and any columns named in group_cols. |
duration_col |
Column name of segregation duration in days
(default "NumberConsecutiveDays_Segregation"). |
group_cols |
Optional character vector of stratifying-column names. NULL pools all rows. |
probs |
Quantiles of the survival function to report
(default c(0.5, 0.25, 0.1, 0.05, 0.01)). |
mandela_threshold |
Day cutoff (default 15L) above which a placement counts toward pct_above_mandela. |
This replaces the misreading of UniqueIndividual_ID = YYYY-XXXXX-SG
as a persistent person identifier (which produces a spurious
~210-day cross-year TTR artifact). The valid quantity here is the
distribution of how long a placement lasts, not how long until the
next placement.
A data.frame with one row per stratum (or one pooled row),
columns stratum, n, mean_days, median_days,
q25_days, pct_above_mandela, median_among_above_mandela.
if (FALSE) {
  b01 <- read.csv("b01_segregation_detailed_dataset.csv")
  mrm_otis_seg_duration_km(b01)
  mrm_otis_seg_duration_km(b01, group_cols = "MentalHealth_Alert")
}
Permutes treatment labels within each block.
mrm_perm_block(
  data,
  response_col,
  treatment_col,
  block_col,
  n_perm = 1000L,
  seed = 42L
)
data |
data.frame. |
response_col, treatment_col, block_col
|
Column names. |
n_perm |
Number of permutations. |
seed |
RNG seed. |
Named list with observed_statistic, n_perm, p_value, interpretation.
set.seed(2026)
df <- expand.grid(
  block = paste0("B", 1:6),
  treatment = c("ctrl", "drug")
)
# Block-level baseline + treatment effect
df$y <- as.numeric(df$block) * 1.2 +
  ifelse(df$treatment == "drug", 0.7, 0) +
  rnorm(nrow(df), 0, 0.4)
res <- mrm_perm_block(df,
  response_col = "y", treatment_col = "treatment",
  block_col = "block", n_perm = 500L
)
res$p_value
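For readers who want to see the mechanics, here is a minimal base-R sketch of a within-block permutation test. It assumes a two-level treatment and uses the difference in mean response as the test statistic; the statistic used by mrm_perm_block() is not stated above, so perm_block() below is a hypothetical stand-in rather than the package's code.

```r
perm_block <- function(y, treatment, block, n_perm = 1000L, seed = 42L) {
  set.seed(seed)
  # Statistic: mean response at the second treatment level minus the first.
  stat <- function(tr) {
    m <- tapply(y, tr, mean)
    unname(m[2] - m[1])
  }
  observed <- stat(treatment)
  perm_stats <- replicate(n_perm, {
    # Shuffle row indices separately inside every block.
    shuffled <- ave(seq_along(treatment), block,
                    FUN = function(idx) idx[sample.int(length(idx))])
    stat(treatment[shuffled])
  })
  # Two-sided p-value with the usual +1 correction.
  p_value <- (1 + sum(abs(perm_stats) >= abs(observed))) / (n_perm + 1)
  list(observed_statistic = observed, n_perm = n_perm, p_value = p_value)
}

set.seed(2026)
df <- expand.grid(block = paste0("B", 1:6), treatment = c("ctrl", "drug"))
df$y <- as.numeric(df$block) * 1.2 +
  ifelse(df$treatment == "drug", 0.7, 0) +
  rnorm(nrow(df), 0, 0.4)
res <- perm_block(df$y, df$treatment, df$block, n_perm = 500L)
res$p_value
```

Because labels only move within a block, block-level baselines never leak into the null distribution, which is the point of the design.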
If X ~ F, then F(X) ~ Uniform(0,1). Returned U should be approx uniform if the assumed F is correct. Attaches a KS p-value of U against Uniform(0,1) as the diagnostic for fit quality.
mrm_pit(sample, dist = "norm", ...)
sample |
Numeric vector. |
dist |
Distribution suffix for the CDF function p<dist> (e.g. "norm" selects pnorm). |
... |
Additional parameters for the CDF (e.g. df = 3 when dist = "t"). |
data.frame with raw, U columns and attributes ks_stat, ks_pvalue.
set.seed(2026)
x <- rnorm(200)
# Under correct distributional assumption, U should be ~Uniform(0,1):
pit <- mrm_pit(x, dist = "norm")
attr(pit, "ks_pvalue") # large p-value => no evidence against fit
# If we deliberately misspecify (claim t_3 fits the normal sample):
pit_wrong <- mrm_pit(x, dist = "t", df = 3)
# A small p-value would flag the misspecification; t_3 is close to the
# normal, so n = 200 may not be enough to detect it.
attr(pit_wrong, "ks_pvalue")
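The transform itself is a few lines of base R. The sketch below is a hypothetical re-implementation (pit() is not the package function) that looks up the CDF by its p<dist> name and attaches the Kolmogorov-Smirnov diagnostic; a deliberately shifted mean is used as the misspecified case because it is easy to detect.

```r
pit <- function(sample, dist = "norm", ...) {
  cdf <- match.fun(paste0("p", dist))   # e.g. "norm" -> pnorm
  u <- cdf(sample, ...)                 # F(X), uniform if F is correct
  ks <- ks.test(u, "punif")
  out <- data.frame(raw = sample, U = u)
  attr(out, "ks_stat") <- unname(ks$statistic)
  attr(out, "ks_pvalue") <- ks$p.value
  out
}

set.seed(2026)
x <- rnorm(200)
ok <- pit(x, dist = "norm")              # correctly specified
bad <- pit(x, dist = "norm", mean = 2)   # wrong location, clearly misspecified
c(ok = attr(ok, "ks_pvalue"), bad = attr(bad, "ks_pvalue"))
```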
R parity of morie.mrm_primitives.threshold_specific_ordinal().
Adapted from O'Connell & Laniyonu (2025) Race & Justice
15(3):428–453, where a Bayesian cumulative-logit model is fit with
race / gender coefficients allowed to VARY by cumulative threshold.
The empirically critical finding – bias concentrated at the
low->medium cutoff but not the medium->high cutoff – is invisible
to standard proportional-odds specifications.
This R port is the frequentist analogue: for each cutpoint $k = 1, \dots, K-1$
a separate binary logit is fit to the
cumulative indicator at that cutpoint, so the coefficient vector $\beta_k$
is unconstrained across thresholds. When
MASS is available we delegate to polr
for the proportional-odds (PO) baseline; otherwise the PO baseline
is fit by a stacked-IRLS approximation matching the Python
implementation. The threshold-specific fits always run via
glm with family = binomial("logit").
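Standard threshold (proportional-odds, K levels, p covariates), in the MASS::polr parameterisation: $\operatorname{logit} P(Y \le k \mid x) = \alpha_k - x^\top \beta$ for $k = 1, \dots, K-1$.
Threshold-specific extension (one coefficient vector per cutpoint): $\operatorname{logit} P(Y \le k \mid x) = \alpha_k - x^\top \beta_k$ for $k = 1, \dots, K-1$.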
O'Connell, M. & Laniyonu, A. (2025). Threshold-specific cumulative-logit models for actuarial-risk audit. Race & Justice, 15(3), 428–453.
mrm_score_net_residual (internal helper)
Q-Q plot coordinates against a reference distribution
mrm_qq_plot(sample, dist = "norm", ...)
sample |
Numeric vector. |
dist |
Either |
... |
Additional parameters passed to the reference distribution's quantile function. |
data.frame with rank, empirical, theoretical, plotting_position columns (Blom 1958 plotting positions).
set.seed(2026)
x <- rnorm(100)
qq <- mrm_qq_plot(x, dist = "norm")
head(qq)
# plot(qq$theoretical, qq$empirical); abline(0, 1)
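The Blom plotting positions are (i - 3/8) / (n + 1/4) for rank i. A minimal sketch of how the four output columns could be built in base R follows; qq_coords() is a hypothetical helper that resolves the quantile function from its q<dist> name, which is an assumption about how dist is interpreted.

```r
qq_coords <- function(sample, dist = "norm", ...) {
  qfun <- match.fun(paste0("q", dist))   # e.g. "norm" -> qnorm
  n <- length(sample)
  rank <- seq_len(n)
  pp <- (rank - 0.375) / (n + 0.25)      # Blom (1958) plotting positions
  data.frame(
    rank = rank,
    empirical = sort(sample),
    theoretical = qfun(pp, ...),
    plotting_position = pp
  )
}

set.seed(2026)
qq <- qq_coords(rnorm(100), dist = "norm")
head(qq)
```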
Builds the cyclic Latin square then permutes rows, columns, and symbols. Uniform over a subset of Latin squares (not all).
mrm_random_latin(k, seed = 42L)
k |
Side length. |
seed |
RNG seed. |
A k x k integer matrix (codes 0..k-1) with row names R1..Rk and column names C1..Ck. Each code appears exactly once per row and per column.
# 4 x 4 random Latin square: each of {0, 1, 2, 3} appears once
# per row and per column.
mrm_random_latin(k = 4, seed = 42L)
# Reproducible across runs with the same seed:
identical(
  mrm_random_latin(5, seed = 7),
  mrm_random_latin(5, seed = 7)
)
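The construction described above (cyclic square, then row, column, and symbol permutations) can be sketched in base R as follows. random_latin() is a hypothetical stand-in written for illustration; the order in which the package draws its permutations is not documented here, so outputs will not match mrm_random_latin() seed for seed.

```r
random_latin <- function(k, seed = 42L) {
  set.seed(seed)
  # Cyclic square: entry (i, j) is (i + j) mod k, using 0-based codes.
  sq <- outer(0:(k - 1), 0:(k - 1), function(i, j) (i + j) %% k)
  # Permute rows and columns.
  sq <- sq[sample.int(k), sample.int(k), drop = FALSE]
  # Permute symbols by relabelling the codes.
  relabel <- sample.int(k) - 1L
  sq[] <- relabel[sq + 1L]
  storage.mode(sq) <- "integer"
  dimnames(sq) <- list(paste0("R", seq_len(k)), paste0("C", seq_len(k)))
  sq
}

sq <- random_latin(5, seed = 7)
sq
```

Every permutation step maps a Latin square to a Latin square, so the row and column constraints hold by construction; the price is that only squares reachable from the cyclic one can be drawn.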
Model: y_ij = mu + tau_i (treatment) + beta_j (block) + eps_ij.
Returns Type-I ANOVA: block enters first, then treatment.
mrm_rcbd(data, response_col, treatment_col, block_col)
data |
data.frame. |
response_col, treatment_col, block_col
|
Column names. |
Named list with anova (data.frame), n, n_treatments, n_blocks, interpretation.
set.seed(2026)
df <- expand.grid(
  treatment = c("A", "B", "C"),
  block = c("B1", "B2", "B3", "B4")
)
# Treatment effect + block effect + noise
df$y <- as.numeric(df$treatment) * 2 +
  as.numeric(df$block) * 0.5 +
  rnorm(nrow(df), 0, 0.3)
res <- mrm_rcbd(df,
  response_col = "y", treatment_col = "treatment",
  block_col = "block"
)
res$anova
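The same sequential (Type-I) decomposition can be obtained directly from stats::aov by listing the block term first. This is a sketch of the underlying model on the example data, not the package's internal code.

```r
set.seed(2026)
df <- expand.grid(
  treatment = c("A", "B", "C"),
  block = c("B1", "B2", "B3", "B4")
)
df$y <- as.numeric(df$treatment) * 2 +
  as.numeric(df$block) * 0.5 +
  rnorm(nrow(df), 0, 0.3)

# Sequential sums of squares: block is adjusted for first, then treatment.
fit <- aov(y ~ block + treatment, data = df)
anova_tab <- summary(fit)[[1]]
anova_tab
```

With 3 treatments and 4 blocks the degrees of freedom split as 3 (block), 2 (treatment), and 6 (residual).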
Fits y = b0 + sum b_i x_i + sum b_ii x_i^2 + sum b_ij x_i x_j and returns the stationary point if the quadratic matrix B is invertible.
mrm_response_surface(data, response_col, factor_cols)
data |
data.frame. |
response_col |
Response column. |
factor_cols |
Character vector of factor columns. |
Named list with coefficients, stationary_point, stationary_y, stationary_nature, eigenvalues, n, interpretation.
# 5 x 5 factorial grid on (x1, x2) at central-composite levels, with quadratic response.
set.seed(2026)
df <- expand.grid(
  x1 = c(-1.4, -1, 0, 1, 1.4),
  x2 = c(-1.4, -1, 0, 1, 1.4)
)
df$y <- 10 + 2 * df$x1 + 1.5 * df$x2 -
  df$x1^2 - 1.2 * df$x2^2 +
  rnorm(nrow(df), 0, 0.2)
res <- mrm_response_surface(df,
  response_col = "y",
  factor_cols = c("x1", "x2")
)
res$stationary_point
res$stationary_nature
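Writing the fitted surface as y = b0 + x'b + x'Bx, where B carries the pure-quadratic coefficients on its diagonal and half the interaction coefficients off it, the stationary point is x_s = -B^{-1} b / 2 and the eigenvalues of B classify it. A base-R sketch on the same simulated data (variable names are illustrative):

```r
set.seed(2026)
df <- expand.grid(
  x1 = c(-1.4, -1, 0, 1, 1.4),
  x2 = c(-1.4, -1, 0, 1, 1.4)
)
df$y <- 10 + 2 * df$x1 + 1.5 * df$x2 -
  df$x1^2 - 1.2 * df$x2^2 + rnorm(nrow(df), 0, 0.2)

fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = df)
co <- coef(fit)

b <- unname(co[c("x1", "x2")])                    # linear part
B <- matrix(unname(c(co["I(x1^2)"], co["x1:x2"] / 2,
                     co["x1:x2"] / 2, co["I(x2^2)"])), 2, 2)

x_s <- -0.5 * solve(B, b)                          # stationary point
y_s <- unname(co["(Intercept)"]) + 0.5 * sum(b * x_s)
ev <- eigen(B, symmetric = TRUE)$values
nature <- if (all(ev < 0)) "maximum" else if (all(ev > 0)) "minimum" else "saddle"

list(stationary_point = x_s, stationary_y = y_s, nature = nature)
```

The simulated surface has its true maximum at (1, 0.625), so the estimate should land close to that point.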
MORIE ships a small set of reference CSVs in inst/extdata/ so that
the mrm_otis_*() and mrm_tps_*() callables can be exercised
without any network call. For full datasets, the on-demand fetchers
pull from the original public sources:
OTIS: data.ontario.ca CKAN package data-on-inmates-in-ontario.
Resource IDs are baked into morie_dataset_catalog(); use
morie_load_dataset("otisb01") (etc.) which calls the existing
CKAN fetcher.
TPS: Toronto Police Open Data ArcGIS REST. Use
morie_fetch_tps(category = "Assault").
SIU: Ontario SIU Director's Reports site. Use
morie_fetch_siu() which parses the public reports site on
demand (per-user, since redistribution of the parsed corpus is
not clearly licensed).
The on-demand fetchers (morie_fetch_tps(),
morie_fetch_siu()) return the file path to the downloaded or
cached CSV; morie_load_dataset() returns the loaded
data.frame.
if (FALSE) {
  b01 <- morie_load_dataset("otisb01")
  head(b01)
}
Three callables for SIU case-level CSVs. Unlike OTIS (no placement
dates) and TPS (no per-person ID), SIU exposes per-case dates with a
stable police_service jurisdiction column, enabling a real
"time-to-outcome" KM survival analysis.
Functions:
mrm_siu_case_to_decision_km(): Kaplan-Meier on the gap from
date_of_incident_iso to date_of_director_decision_iso,
stratified by police_service. This is the valid TTR analysis that the
MA-thesis "210-day TTR" claim should have used.
mrm_siu_per_service_rate(): Per-police-service case rate by
year and stratum, useful for cross-jurisdiction comparisons.
mrm_siu_outcome_classifier(): Tabulates the Director's-decision
categories (charges_laid, no_charges, etc.) by service and
by year, reporting both raw counts and shares.
Each mrm_siu_*() callable returns a named list with
the survival, per-service rate, or outcome-classification result and a
plain-language interpretation.
if (FALSE) {
  siu <- read.csv("SIU.csv")
  mrm_siu_case_to_decision_km(siu)
}
Computes the gap (in days) between the incident date and the
Director's decision date for every SIU case, dropping rows whose
incident date is missing (and, when censor_open_cases = FALSE, rows
with no decision date). Reports per-stratum median + IQR + n.
Cases without a decision date as of the snapshot are right-censored
if censor_open_cases = TRUE (default).
mrm_siu_case_to_decision_km(
  data,
  incident_col = "date_of_incident_iso",
  decision_col = "date_of_director_decision_iso",
  service_col = "police_service",
  censor_open_cases = TRUE,
  min_n = 5L
)
data |
A data.frame with the SIU case schema. |
incident_col |
Column with ISO incident date
(default "date_of_incident_iso"). |
decision_col |
Column with ISO Director's decision date
(default "date_of_director_decision_iso"). |
service_col |
Stratifying jurisdiction column
(default "police_service"). |
censor_open_cases |
Logical (default TRUE). |
min_n |
Minimum cases per service to retain in the per-service
summary (default 5L). |
This is the substantive "time-to-outcome" analysis the MA-thesis "210-day TTR" claim should have been; it operates on real per-case dates with a stable jurisdiction identifier.
A list with elements:
pooled: a single-row data.frame with the pooled median,
mean, IQR, n, n_censored.
by_service: per-service data.frame with the same columns.
if (FALSE) {
  siu <- read.csv("SIU.csv")
  res <- mrm_siu_case_to_decision_km(siu)
  head(res$by_service)
}
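To make the right-censoring step concrete, the sketch below builds the incident-to-decision gap, censors open cases at a snapshot date, and reads the median off a hand-rolled Kaplan-Meier curve. The five cases, the snapshot date, and the km_median() helper are all hypothetical; the package may rely on a different estimator or on the survival package.

```r
# Kaplan-Meier median: first event time at which S(t) falls to 0.5 or below.
km_median <- function(time, event) {
  t_event <- sort(unique(time[event == 1]))
  if (!length(t_event)) return(NA_real_)
  at_risk <- vapply(t_event, function(t) sum(time >= t), numeric(1))
  deaths <- vapply(t_event, function(t) sum(time == t & event == 1), numeric(1))
  surv <- cumprod(1 - deaths / at_risk)
  hit <- which(surv <= 0.5)
  if (length(hit)) t_event[hit[1]] else NA_real_
}

siu <- data.frame(
  date_of_incident_iso = as.Date(c("2023-01-10", "2023-02-01", "2023-03-15",
                                   "2023-05-01", "2023-06-20")),
  date_of_director_decision_iso = as.Date(c("2023-05-10", "2023-04-02", NA,
                                            "2023-09-28", NA))
)
snapshot <- as.Date("2023-12-31")  # hypothetical data-extraction date

event <- !is.na(siu$date_of_director_decision_iso)
end <- siu$date_of_director_decision_iso
end[!event] <- snapshot            # open cases are censored at the snapshot
gap_days <- as.numeric(end - siu$date_of_incident_iso)

km_median(gap_days, as.integer(event))
```

Dropping the two open cases instead of censoring them would bias the median downward, since long-running cases are exactly the ones still open.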
Cross-tabulates a categorical outcome column (default
director_decision_category) by service and year, reporting both
raw counts and within-service shares. If the supplied
outcome_col is not present, looks for a few common alternatives
(director_decision, outcome).
mrm_siu_outcome_classifier(
  data,
  outcome_col = "director_decision_category",
  service_col = "police_service"
)
data |
A data.frame in the SIU case schema. |
outcome_col |
Outcome category column
(default "director_decision_category"). |
service_col |
Police-service column
(default "police_service"). |
A data.frame with columns service, outcome, n_cases,
share_within_service.
if (FALSE) {
  siu <- read.csv("SIU.csv")
  mrm_siu_outcome_classifier(siu)
}
Tabulates the number of SIU cases per police service per year, and
optionally per reason_for_interaction stratum.
mrm_siu_per_service_rate(
  data,
  service_col = "police_service",
  incident_col = "date_of_incident_iso",
  stratify_col = NULL
)
data |
A data.frame in the SIU case schema. |
service_col |
Police-service column (default "police_service"). |
incident_col |
Incident-date column for year extraction
(default "date_of_incident_iso"). |
stratify_col |
Optional second stratifying column. |
A data.frame with columns service, year, optional
stratum, n_cases.
if (FALSE) {
  siu <- read.csv("SIU.csv")
  mrm_siu_per_service_rate(siu)
}
Mirrors the Python module
morie.mrm_primitives.spatial_spillover, adapted from
Laniyonu (2018) Urban Affairs Review 54(5):898-930, which in turn
uses LeSage & Pace (2009) + Elhorst (2010) + the Yang/Noah/Shoff
(2015) decomposition formula.
The Laniyonu (2018) result, that gentrification's effect on stops-per-capita is ~0 direct but +51 to +90% indirect (spilling into neighbouring tracts), only surfaces once you decompose. An OLS or non-spatial FE model would report "no effect" and miss the entire story.
This primitive is the SDM with the canonical decomposition + the
Moran's-I diagnostic that justifies SDM over OLS. We deliberately
do NOT fit the SDM ourselves – spdep/spatialreg are
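hard deps we don't want to force. The caller passes the estimated
rho, beta_direct, and beta_spatial values; this primitive does the
decomposition arithmetic.
Implements the standard LeSage & Pace formula $S_k(W) = (I_N - \rho W)^{-1} (\beta_k I_N + \theta_k W)$, with $\beta_k$ and $\theta_k$ the k-th entries of beta_direct and beta_spatial.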
mrm_spatial_spillover_decomposition(
  rho,
  beta_direct,
  beta_spatial,
  W,
  coefficient_names = NULL
)
rho |
Numeric scalar. Spatial-autoregressive coefficient from the fitted SDM. |
beta_direct |
Numeric vector of length K. Coefficients on the covariates X from the fitted SDM. |
beta_spatial |
Numeric vector of length K. Coefficients on the spatially lagged covariates WX from the fitted SDM. |
W |
Numeric matrix of shape (N, N). Row-standardised spatial weight matrix. |
coefficient_names |
Optional character vector of length K with
human-readable covariate names; if NULL, default names are generated. |
The formula is evaluated for each covariate k = 1, ..., K. The diagonal of the resulting
per-observation effects matrix is averaged for the direct
effect; the average off-diagonal-row-sum is the indirect
effect; total = direct + indirect.
A named list with classes morie_mrm_result,
morie_rich_result, list. Carries
decomposition (a data.frame with columns
coefficient, direct, indirect, total,
note), rho, plus interpretation +
warnings.
set.seed(3)
N <- 12
W <- matrix(runif(N * N), N, N)
diag(W) <- 0
W <- W / rowSums(W)
res <- mrm_spatial_spillover_decomposition(
  rho = 0.4,
  beta_direct = c(0.10, -0.05),
  beta_spatial = c(0.30, 0.00),
  W = W,
  coefficient_names = c("gentrification", "controls")
)
res$decomposition
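The decomposition arithmetic is short enough to write out. The sketch below forms the LeSage & Pace effects matrix (I - rho W)^{-1} (beta_k I + theta_k W) per covariate and averages its diagonal and row sums; spillover() is a hypothetical helper, not the package function, and it returns only the three numeric columns.

```r
spillover <- function(rho, beta_direct, beta_spatial, W) {
  N <- nrow(W)
  A_inv <- solve(diag(N) - rho * W)            # (I - rho W)^{-1}
  rows <- lapply(seq_along(beta_direct), function(k) {
    S <- A_inv %*% (beta_direct[k] * diag(N) + beta_spatial[k] * W)
    direct <- mean(diag(S))                    # average own-unit effect
    total <- mean(rowSums(S))                  # average total effect
    c(direct = direct, indirect = total - direct, total = total)
  })
  as.data.frame(do.call(rbind, rows))
}

set.seed(3)
N <- 12
W <- matrix(runif(N * N), N, N)
diag(W) <- 0
W <- W / rowSums(W)                            # row-standardise
dec <- spillover(rho = 0.4, beta_direct = c(0.10, -0.05),
                 beta_spatial = c(0.30, 0.00), W = W)
dec
```

A useful sanity check: with a row-standardised W the total effect collapses to (beta_k + theta_k) / (1 - rho), here 0.4 / 0.6 and -0.05 / 0.6.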
For continuous X: SMD = (mean_t - mean_c) / sqrt((s2_t + s2_c)/2).
For binary X: SMD = (p_t - p_c) / sqrt((p_t(1-p_t) + p_c(1-p_c))/2).
Returned as percent. |SMD| > 10 is the Austin (2009) imbalance threshold.
mrm_standardised_difference(data, treatment_col, covariates)
data |
data.frame. |
treatment_col |
Binary 0/1 treatment column name. |
covariates |
Character vector of covariate columns. |
data.frame with covariate, mean_treated, mean_control, pooled_sd, smd_pct, imbalanced columns.
set.seed(2026)
n <- 200L
df <- data.frame(
  D = rbinom(n, 1, 0.4),
  age = rnorm(n, 50, 10),
  bmi = rnorm(n, 27, 4)
)
df$age[df$D == 1] <- df$age[df$D == 1] + 3 # deliberate imbalance
mrm_standardised_difference(df,
  treatment_col = "D",
  covariates = c("age", "bmi")
)
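Both formulas fit in one small helper. The sketch below is a hypothetical base-R version (smd_pct() is not part of the package) that treats a covariate as binary when it takes only the values 0 and 1, which is an assumption about how the two cases are told apart.

```r
smd_pct <- function(x, d) {
  xt <- x[d == 1]
  xc <- x[d == 0]
  if (all(x %in% c(0, 1))) {
    # Binary covariate: variance is p * (1 - p) in each arm.
    pt <- mean(xt)
    pc <- mean(xc)
    100 * (pt - pc) / sqrt((pt * (1 - pt) + pc * (1 - pc)) / 2)
  } else {
    # Continuous covariate: average the two sample variances.
    100 * (mean(xt) - mean(xc)) / sqrt((var(xt) + var(xc)) / 2)
  }
}

d <- c(0, 0, 0, 0, 1, 1, 1, 1)
age <- c(1, 2, 3, 2, 3, 4, 5, 4)      # continuous example
smoker <- c(0, 1, 0, 1, 1, 1, 0, 1)   # binary example
c(age = smd_pct(age, d), smoker = smd_pct(smoker, d))
```

Both toy covariates exceed the 10% threshold by a wide margin, so each would be flagged as imbalanced.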
1. Fit a logistic regression of the binary trait on the shared covariates using
the survey microdata.
2. Apply fitted coefficients to area-level marginals from
area_df.
3. Multiply predicted rate by area population to obtain the synthetic "population at risk" exposure offset.
mrm_synthetic_area_exposure(
  survey_df,
  survey_trait_col,
  survey_covariate_cols,
  area_df,
  area_population_col,
  fit_callable = NULL,
  return_per_area_rate = FALSE
)
survey_df |
A data.frame of survey microdata containing survey_trait_col and survey_covariate_cols. |
survey_trait_col |
Character. Name of the binary trait column. |
survey_covariate_cols |
Character vector of covariates that are present in BOTH the survey and the area dataset. |
area_df |
A data.frame with one row per area, containing survey_covariate_cols and area_population_col. |
area_population_col |
Character. Adult-population column in
area_df. |
fit_callable |
Optional user-supplied fitting function used in place of the default logistic fit (default NULL). |
return_per_area_rate |
Logical; default FALSE. If TRUE, the per-area predicted_rate is also returned. |
A named list with classes morie_mrm_result,
morie_rich_result, list. Carries
exposure (named numeric vector, one entry per area row),
predicted_rate (when requested), coef (the fitted
logistic coefficient vector), plus interpretation +
warnings.
set.seed(2)
n_survey <- 500
x1 <- rnorm(n_survey); x2 <- rnorm(n_survey)
p <- 1 / (1 + exp(-(-2 + 0.6 * x1 - 0.4 * x2)))
y <- rbinom(n_survey, 1, p)
survey <- data.frame(trait = y, x1 = x1, x2 = x2)
area <- data.frame(
  x1 = rnorm(20), x2 = rnorm(20),
  pop = sample(800:1500, 20, replace = TRUE)
)
rownames(area) <- paste0("area_", seq_len(20))
res <- mrm_synthetic_area_exposure(
  survey_df = survey,
  survey_trait_col = "trait",
  survey_covariate_cols = c("x1", "x2"),
  area_df = area,
  area_population_col = "pop"
)
head(res$exposure)
Mirrors the Python module
morie.mrm_primitives.synthetic_exposure, adapted from
Laniyonu & Goff (2021) BMC Psychiatry 21(1):500.
The trick: when you need a rate-per-hidden-subpopulation (force-per-PwSMI, contact-per-undocumented, contact-per-homeless) and no administrative census of that subpopulation exists, you can:
1. Fit a logistic model of the binary trait on a national
probability sample (NCS-R for SMI; ACS-style survey for other
traits) using ONLY covariates also available at the area level.
2. Apply the fitted coefficients to area-level marginals from
ACS / census to predict the trait rate per area.
3. Multiply the predicted rate by area-level adult population to get a synthetic "population at risk" denominator.
Generalises far beyond Laniyonu & Goff's SMI application: police use-of-force rates against people experiencing homelessness, LGBTQ stop-and-frisk rates, undocumented-immigrant ICE-contact rates, or any other "rate per hidden subpopulation" estimand.
The returned exposure is suitable for use as an offset,
offset = log(exposure), in a Poisson / negative-binomial GLM that
counts trait-specific events.
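The three steps, plus the downstream offset model, can be sketched with stats::glm alone. Everything below is simulated: the covariates, the events column, and the coefficients are hypothetical, and the Poisson model at the end only shows where log(exposure) enters.

```r
set.seed(2)
survey <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
survey$trait <- rbinom(500, 1, plogis(-2 + 0.6 * survey$x1 - 0.4 * survey$x2))

area <- data.frame(
  x1 = rnorm(20), x2 = rnorm(20),
  pop = sample(800:1500, 20, replace = TRUE)
)
area$events <- rpois(20, lambda = 5)   # hypothetical trait-specific event counts

# Step 1: logistic fit on the survey microdata.
fit <- glm(trait ~ x1 + x2, family = binomial("logit"), data = survey)

# Step 2: predicted trait rate per area from area-level covariates.
area$rate <- predict(fit, newdata = area, type = "response")

# Step 3: synthetic population at risk.
area$exposure <- area$rate * area$pop

# Downstream use: event counts modelled per synthetic person at risk.
pois <- glm(events ~ x1 + offset(log(exposure)), family = poisson, data = area)
coef(pois)
```

The design constraint worth remembering is in step 1: only covariates that also exist at the area level can enter the survey model, otherwise step 2 has nothing to predict from.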
Convenience accessor mirroring
ThresholdSpecificOrdinalResult.coefficient_by_threshold().
mrm_threshold_coefficient(x, covariate)
x |
A result from mrm_threshold_specific_ordinal(). |
covariate |
Character, name of one covariate. |
A named numeric vector keyed by threshold label.
For each cumulative cutpoint $k = 1, \dots, K-1$, fits an
independent logistic regression of the cumulative indicator at that cutpoint on the
covariates. Optionally fits the proportional-odds baseline and
returns the likelihood-ratio test of PO vs. threshold-specific.
mrm_threshold_specific_ordinal(
  data,
  outcome_col,
  covariate_cols,
  ordinal_levels = NULL,
  fit_proportional_odds_first = TRUE,
  max_iter = 200L,
  tol = 1e-06
)
data |
data.frame, one row per unit. |
outcome_col |
Character; name of the ordinal outcome column.
Either an ordered factor / integer code or a character column
(in which case ordinal_levels should be supplied). |
covariate_cols |
Character vector of predictor columns. Categorical predictors should be one-hot dummied before passing. |
ordinal_levels |
Optional character vector giving the explicit
ordering of the outcome categories (low-to-high). If NULL, the ordering is taken from the outcome column itself. |
fit_proportional_odds_first |
Logical; if TRUE (default), the proportional-odds baseline is fit first so that the likelihood-ratio test against the threshold-specific model can be reported. |
max_iter, tol
|
IRLS / GLM control passed to the underlying fitting routines. |
An object of class c("mrm_threshold_specific_ordinal",
"morie_mrm_result", "list") with elements
threshold_labels, covariate_names,
coefficients (a (K-1) x p matrix), cutpoints,
log_likelihood, n_obs, and (if requested)
proportional_odds_lr_stat, proportional_odds_lr_df,
proportional_odds_p.
if (FALSE) {
  df <- data.frame(
    y = sample(c("low", "med", "high"), 200, replace = TRUE),
    race = rbinom(200, 1, 0.4),
    age = rnorm(200)
  )
  mrm_threshold_specific_ordinal(df,
    outcome_col = "y",
    covariate_cols = c("race", "age"),
    ordinal_levels = c("low", "med", "high")
  )
}
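The threshold-specific part reduces to one binomial glm per cutpoint. The sketch below assumes the indicator is "outcome above the k-th level" (the sign convention of the package's cumulative indicator is not stated above) and collects the slopes into a (K-1) x p matrix; the data are random, so no threshold pattern should be expected.

```r
set.seed(1)
n <- 400
levels_y <- c("low", "med", "high")
df <- data.frame(
  y = factor(sample(levels_y, n, replace = TRUE), levels = levels_y, ordered = TRUE),
  race = rbinom(n, 1, 0.4),
  age = rnorm(n)
)

cuts <- levels_y[-length(levels_y)]              # K - 1 cutpoints
fits <- lapply(seq_along(cuts), function(k) {
  # Assumed indicator: 1 if the outcome lies above the k-th level.
  df$above <- as.integer(as.integer(df$y) > k)
  glm(above ~ race + age, family = binomial("logit"), data = df)
})

# One row of slopes per cutpoint, one column per covariate.
coef_mat <- t(vapply(fits, function(f) coef(f)[c("race", "age")], numeric(2)))
rownames(coef_mat) <- paste0(cuts, "->", levels_y[-1])
coef_mat
```

Comparing the rows of coef_mat is the informal version of the PO likelihood-ratio test: under proportional odds the rows should agree up to sampling noise.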
Four callables for TPS public-release crime-incident CSVs, used in the MRM empirical companion paper.
Functions:
mrm_tps_levy_scaling(): Hill-MLE Pareto exponent of inter-incident
step-length distribution on the lat/long-coded event stream.
mrm_tps_moran_clustering(): global Moran's I + DBSCAN cluster
summary on the lat/long-coded event stream.
mrm_tps_neighbourhood_recurrence_km(): Kaplan-Meier inter-event
gap distribution per HOOD_158 neighbourhood.
mrm_tps_load_hawkes_refit(): convenience loader that pulls the
precomputed per-category Hawkes (Markovian + Weibull/sin)
fits from the paper_hawkes_refit.json manifest if available.
Each mrm_tps_*() callable returns a named list with
the computed statistic (Pareto exponent, Moran's I, or survival curve)
and a plain-language interpretation; mrm_tps_load_hawkes_refit()
returns the parsed Hawkes-refit manifest as a list.
if (FALSE) {
  tps <- read.csv("Assault_Open_Data.csv")
  mrm_tps_levy_scaling(tps)
}
Run a 3-d (lat, lon, time) Kulldorff scan with MC inference
mrm_tps_kulldorff_scan(
  data,
  date_col = "OCC_DATE",
  lat_col = "LAT_WGS84",
  lon_col = "LONG_WGS84",
  radii_km = c(1, 2, 3, 5, 8),
  window_years = 4,
  n_centers = 60L,
  n_permutations = 199L,
  n_top_clusters = 1L,
  seed = 42L
)
data |
data.frame with date_col, lat_col, lon_col. |
date_col |
Column name of the event date (default
"OCC_DATE"). |
lat_col, lon_col
|
WGS84 lat/long column names. |
radii_km |
Candidate cylinder radii in km. |
window_years |
Time-cylinder length in years. |
n_centers |
Number of random candidate centres sub-sampled. |
n_permutations |
Monte-Carlo permutations. |
n_top_clusters |
Integer; number of top clusters to return.
Accepted for Python signature parity. The current implementation
returns a single primary cluster (the secondary-cluster loop of the
Python implementation is not ported). |
seed |
Random seed. |
A one-row data.frame describing the top cluster, with
columns center_lat, center_lon, radius_km,
t_start, t_end, n_observed, n_expected,
relative_risk, log_lrt, p_value.
if (FALSE) {
  tps <- morie_sample("tps_assault")
  mrm_tps_kulldorff_scan(tps, n_permutations = 49)
}
Treats consecutive events in chronological order as a single stream
and computes the inter-event step length (km) via haversine on
WGS84 latitude/longitude. Returns the Hill-MLE exponent restricted
to steps above min_step_km.
mrm_tps_levy_scaling(
  data,
  date_col = "OCC_DATE",
  lat_col = "LAT_WGS84",
  lon_col = "LONG_WGS84",
  min_step_km = 0.5,
  x_min = NULL
)
data |
A data.frame with at least the columns named in
date_col, lat_col, and lon_col. |
date_col |
Column name of the date / timestamp
(default "OCC_DATE"). |
lat_col |
Column name of WGS84 latitude
(default "LAT_WGS84"). |
lon_col |
Column name of WGS84 longitude
(default "LONG_WGS84"). |
min_step_km |
Lower-tail cutoff in km (default 0.5). |
x_min |
Hill-MLE cutoff (default = min_step_km). |
A list with n_events, n_steps_tail, min_step_km,
hill_alpha.
if (FALSE) {
  tps <- read.csv("Assault_Open_Data.csv")
  mrm_tps_levy_scaling(tps)
}
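A compact base-R sketch of the two ingredients, haversine step lengths and the Hill estimator on the tail, is given below. The event stream is simulated with Gaussian scatter around a Toronto-like centre (hypothetical data, not heavy-tailed by construction), and both helper functions are illustrative rather than the package's own.

```r
# Great-circle distance in km between paired WGS84 coordinates.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371.0088) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hill estimator of the tail index on steps at or above x_min.
hill_alpha <- function(steps, x_min) {
  tail_steps <- steps[steps >= x_min]
  length(tail_steps) / sum(log(tail_steps / x_min))
}

set.seed(11)
events <- data.frame(
  OCC_DATE = as.Date("2024-01-01") + sample(0:364, 300, replace = TRUE),
  LAT_WGS84 = 43.70 + rnorm(300, 0, 0.05),
  LONG_WGS84 = -79.40 + rnorm(300, 0, 0.07)
)
events <- events[order(events$OCC_DATE), ]   # single chronological stream

n <- nrow(events)
steps <- haversine_km(events$LAT_WGS84[-n], events$LONG_WGS84[-n],
                      events$LAT_WGS84[-1], events$LONG_WGS84[-1])
min_step_km <- 0.5
alpha_hat <- hill_alpha(steps, x_min = min_step_km)
alpha_hat
```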
Local Moran's I per polygon + quadrant + 999-permutation significance
mrm_tps_lisa(
  data,
  count_col,
  lat_col = "lat",
  lon_col = "lon",
  id_col = NULL,
  k = 6L,
  n_permutations = 999L,
  seed = 42L
)
data |
data.frame with one row per polygon. |
count_col |
Column with per-polygon counts (e.g. "ASSAULT_2024"). |
lat_col, lon_col
|
WGS84 centroid columns. |
id_col |
Optional polygon-ID column (passed through to output). |
k |
k-NN spatial-weights neighbourhood (default 6). |
n_permutations |
MC permutations (default 999, the spatial-statistics convention). |
seed |
RNG seed. |
A list with elements n_polygons, global_moran_I,
permutations, knn_k, table (per-polygon
data.frame), quadrants_all, quadrants_significant_p05,
n_significant_p05.
if (FALSE) {
  ncr <- read.csv("Neighbourhood_Crime_Rates_Open_Data.csv")
  res <- mrm_tps_lisa(ncr,
    count_col = "ASSAULT_2024",
    lat_col = "lat", lon_col = "lon"
  )
}
Reads paper_hawkes_refit.json and returns the Markovian (exponential
kernel, constant baseline) and non-Markovian (Weibull kernel,
sinusoidal baseline) AIC, branching ratio, and KS p-value per
category as a tidy data.frame.
mrm_tps_load_hawkes_refit(manifest_path)
manifest_path |
Path to paper_hawkes_refit.json. |
A data.frame with one row per category, columns
category, n_fitted, T_days, aic_mark, kappa_mark,
ks_p_mark, aic_nm, eta_nm, ks_p_nm, delta_aic.
if (FALSE) {
  mrm_tps_load_hawkes_refit("paper_hawkes_refit.json")
}
Grids the lat/long extent of data into a coarse raster of
grid_resolution cells, counts events per cell, and computes the
global Moran's I via a rook contiguity matrix. Also runs DBSCAN on
the raw lat/long points (rescaled to km) and reports cluster
counts.
mrm_tps_moran_clustering(
  data,
  lat_col = "LAT_WGS84",
  lon_col = "LONG_WGS84",
  grid_resolution = 40L,
  dbscan_eps = 0.3,
  dbscan_minpts = 5L
)
data |
A data.frame with the columns named in lat_col and lon_col. |
lat_col |
Column name of WGS84 latitude. |
lon_col |
Column name of WGS84 longitude. |
grid_resolution |
Number of cells per axis (default 40L). |
dbscan_eps |
DBSCAN radius in km (default 0.3). |
dbscan_minpts |
DBSCAN minimum points per core (default 5L). |
This function is a thin computational wrapper. For high-precision
computations on full-sized TPS files use the morie Python
tps_spatial_advanced pipeline; the R version is for quick
interactive auditing.
A list with morans_I, morans_z, dbscan_n_clusters,
dbscan_n_noise, dbscan_largest.
if (FALSE) {
  tps <- read.csv("Assault_Open_Data.csv")
  mrm_tps_moran_clustering(tps)
}
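The gridding and rook-contiguity step can be sketched without building an explicit weights matrix, because on a regular grid the rook neighbours are simply the cells one row or one column away. The helper below assumes binary (unstandardised) weights, which may differ from the package's choice, and leaves out the DBSCAN part since base R has no implementation of it.

```r
morans_i_rook <- function(counts) {
  nr <- nrow(counts)
  nc <- ncol(counts)
  z <- counts - mean(counts)
  # Sum of z_i * z_j over ordered rook-neighbour pairs (each edge counted twice).
  cross <- 2 * (sum(z[-nr, ] * z[-1, ]) + sum(z[, -nc] * z[, -1]))
  n_links <- 2 * ((nr - 1) * nc + nr * (nc - 1))   # sum of the binary weights
  (length(counts) / n_links) * cross / sum(z^2)
}

# Hypothetical point pattern binned into a 10 x 10 raster.
set.seed(3)
lat <- 43.70 + rnorm(2000, 0, 0.05)
lon <- -79.40 + rnorm(2000, 0, 0.07)
grid_counts <- unclass(table(cut(lat, 10), cut(lon, 10)))
morans_i_rook(grid_counts)

# Reference patterns: a checkerboard is perfectly dispersed, a ramp is clustered.
checker <- outer(1:4, 1:4, function(i, j) (i + j) %% 2)
ramp <- matrix(rep(1:4, each = 4), 4, 4)
c(checkerboard = morans_i_rook(checker), ramp = morans_i_rook(ramp))
```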
For each HOOD_158 neighbourhood, sorts events chronologically and computes the gap (in days) between consecutive events. Returns the per-neighbourhood mean, median, and total gaps. No censoring is applied (every gap is observed).
mrm_tps_neighbourhood_recurrence_km(
  data,
  date_col = "OCC_DATE",
  hood_col = "HOOD_158",
  min_gap_days = 0
)
data |
A data.frame with the columns named in date_col and hood_col. |
date_col |
Column name of the date column
(default "OCC_DATE"). |
hood_col |
Column name of the neighbourhood ID
(default "HOOD_158"). |
min_gap_days |
Smallest gap to include (default 0). |
A data.frame with one row per neighbourhood, columns
hood, n_events, n_gaps, mean_gap_days, median_gap_days,
p25_gap_days, p75_gap_days.
if (FALSE) {
  tps <- read.csv("Assault_Open_Data.csv")
  mrm_tps_neighbourhood_recurrence_km(tps)
}
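The per-neighbourhood gap computation is a split, sort, and diff. A minimal sketch with a hypothetical gap_summary() helper and four made-up events follows; it reports only a subset of the columns listed above.

```r
gap_summary <- function(dates, hood) {
  rows <- lapply(split(as.Date(dates), hood), function(d) {
    gaps <- as.numeric(diff(sort(d)))   # days between consecutive events
    data.frame(
      n_events = length(d),
      n_gaps = length(gaps),
      mean_gap_days = if (length(gaps)) mean(gaps) else NA_real_,
      median_gap_days = if (length(gaps)) median(gaps) else NA_real_
    )
  })
  out <- do.call(rbind, rows)
  data.frame(hood = names(rows), out, row.names = NULL)
}

tps <- data.frame(
  OCC_DATE = c("2024-01-10", "2024-01-01", "2024-01-04", "2024-02-01"),
  HOOD_158 = c("001", "001", "001", "002")
)
gaps <- gap_summary(tps$OCC_DATE, tps$HOOD_158)
gaps
```

A neighbourhood with a single event has no gaps, so its summary columns are NA rather than zero.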
Convenience wrapper that loops mrm_tps_lisa over a vector
of per-year count columns.
mrm_tps_polygon_moran_per_year(
  data,
  year_cols,
  lat_col = "lat",
  lon_col = "lon",
  k = 6L,
  n_permutations = 999L,
  seed = 42L
)
data |
Polygon-level data.frame. |
year_cols |
Character vector of per-year count column names
(e.g. c("ASSAULT_2023", "ASSAULT_2024")). |
lat_col, lon_col, k, n_permutations, seed
|
as in
mrm_tps_lisa. |
data.frame with columns year, n_events,
moran_I, global_p_value.
# 4 x 4 polygon grid with two yearly count columns.
set.seed(2026)
grid <- expand.grid(
  lat = 43.6 + (0:3) * 0.02,
  lon = -79.4 + (0:3) * 0.02
)
grid$ASSAULT_2023 <- rpois(nrow(grid), lambda = grid$lat * 10)
grid$ASSAULT_2024 <- rpois(nrow(grid), lambda = grid$lat * 12)
res <- mrm_tps_polygon_moran_per_year(
  grid,
  year_cols = c("ASSAULT_2023", "ASSAULT_2024"),
  lat_col = "lat", lon_col = "lon",
  k = 4L, n_permutations = 99L, seed = 42L
)
res
Always returns Welch t (unequal variance, canonical), Student t (equal variance), and Mann-Whitney U (rank-based). The Welch p-value is the canonical answer; the others are the sensitivity range.
mrm_two_treatment_test(a, b, alpha = 0.05)
a, b
|
Outcome vectors under treatments A and B. |
alpha |
Significance level; the confidence interval has level 1 - alpha (default 0.05). |
Named list with estimate, se, t_statistic, df, p_welch, p_student, p_mannwhitney, ci_lower, ci_upper, n_a, n_b, interpretation.
set.seed(2026)
a <- rnorm(40, mean = 5, sd = 1.2)
b <- rnorm(40, mean = 5.5, sd = 1.5)
res <- mrm_two_treatment_test(a, b)
res$estimate       # mean(a) - mean(b)
res$p_welch        # canonical p-value
res$p_mannwhitney  # rank-based sensitivity check
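The three tests named in this entry are all available in base R. The following is a minimal sketch, assuming only the `stats` package, of how Welch, Student, and Mann-Whitney p-values could be collected into one list. The helper name `two_treatment_sketch` is hypothetical and this is not the rmorie implementation.

```r
# Hypothetical helper for illustration; not part of rmorie.
two_treatment_sketch <- function(a, b, alpha = 0.05) {
  welch   <- stats::t.test(a, b, var.equal = FALSE, conf.level = 1 - alpha)
  student <- stats::t.test(a, b, var.equal = TRUE)
  mw      <- stats::wilcox.test(a, b, exact = FALSE)  # normal approximation
  list(
    estimate      = mean(a) - mean(b),
    p_welch       = welch$p.value,     # treated as the primary answer
    p_student     = student$p.value,   # sensitivity check
    p_mannwhitney = mw$p.value,        # rank-based sensitivity check
    ci_lower      = welch$conf.int[1],
    ci_upper      = welch$conf.int[2]
  )
}

set.seed(2026)
a <- rnorm(40, mean = 5, sd = 1.2)
b <- rnorm(40, mean = 5.5, sd = 1.5)
sketch <- two_treatment_sketch(a, b)
```

If the three p-values disagree sharply, that usually points to unequal variances or heavy tails rather than to a coding problem.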
Two-proportion test (chi-square + Fisher exact + Wald)
mrm_twoprop_test(x1, n1, x2, n2, alpha = 0.05)
x1, n1
|
Successes and trials in group 1. |
x2, n2
|
Successes and trials in group 2. |
alpha |
Significance level; the confidence interval has level 1 - alpha (default 0.05). |
Named list with p1, p2, diff, chi2, df, p_value_chi2, p_value_fisher, z_wald, p_value_wald, ci95_diff_lower/upper, interpretation.
# Compare 47/100 vs 31/100; two-sided test.
mrm_twoprop_test(x1 = 47, n1 = 100, x2 = 31, n2 = 100)
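For orientation, the same three quantities can be reproduced with base R. This is a sketch under stated assumptions: the chi-square test is run without continuity correction and the Wald statistic uses the unpooled standard error. Whether rmorie makes the same choices is not stated in this entry.

```r
x1 <- 47; n1 <- 100; x2 <- 31; n2 <- 100
tab <- matrix(c(x1, n1 - x1, x2, n2 - x2), nrow = 2, byrow = TRUE)

# Pearson chi-square (no continuity correction: an assumption) and Fisher exact
chi2   <- stats::chisq.test(tab, correct = FALSE)
fisher <- stats::fisher.test(tab)

# Wald z-test on the difference in proportions, unpooled standard error
p1 <- x1 / n1
p2 <- x2 / n2
se_wald <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_wald  <- (p1 - p2) / se_wald
p_wald  <- 2 * stats::pnorm(-abs(z_wald))
ci95    <- (p1 - p2) + c(-1, 1) * stats::qnorm(0.975) * se_wald
```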
Six jurisdiction-agnostic analyses for police Use-of-Force data,
mirroring the Python module morie.mrm_uof. Every function
accepts a data.frame (or tibble) and returns a named
list carrying both the numeric outputs and a multi-paragraph
plain-language interpretation, so the result can be printed
to a notebook without further post-processing.
mrm_uof_force_concentration: Hill-MLE Pareto
exponent + Gini coefficient + top-5 / top-10 share for incident
counts aggregated by force / service.
mrm_uof_weapon_diversity: weapon-by-force
contingency: chi-square, Cramer's V, and the top-3 cells by
standardised Pearson residual.
mrm_uof_yoy_change: year-on-year percentage
change with a manual largest-gap change-point fallback (the R
side does not require ruptures).
mrm_uof_region_locality: region-at-time vs.
region-now contingency: diagonal share, chi-square, Cramer's V.
mrm_uof_demographic_disparity: per-category
outcome rates with Wilson 95% confidence intervals, risk ratios
against a baseline group, optional non-parametric bootstrap percentile
interval on the risk ratio.
mrm_uof_data_quality_audit: per-column null
and dtype audit, with optional schema-comparison against a
supplied CKAN sidecar list or column-spec list.
Schema, null, and suspect-value audit
mrm_uof_data_quality_audit(df, sidecar = NULL, expected_schema = NULL)
df |
A data.frame. |
sidecar |
Optional list with |
expected_schema |
Optional list with a |
Named list with per_column, missing_columns,
extra_columns, dtype_mismatches, suspect_flags.
Demographic disparity in outcome rates with risk-ratio CIs
mrm_uof_demographic_disparity(df, demo_col, outcome_col, baseline = NULL, bootstrap_reps = 0L)
df |
A data.frame. |
demo_col |
Categorical demographic column. |
outcome_col |
Binary outcome column (0/1 or logical). |
baseline |
Optional baseline category (default: largest-N group). |
bootstrap_reps |
Bootstrap replications for the RR percentile CI. Set to 0 (default) to skip. |
Named list with baseline, baseline_rate,
per_category (list of lists), risk_ratios.
Aggregates per-force incident counts and reports a Hill-MLE Pareto tail exponent, the Gini coefficient, and the top-5 / top-10 concentration shares.
mrm_uof_force_concentration(df, force_col, count_col = NULL)
df |
A |
force_col |
Character. Name of the column identifying the force / service / agency. |
count_col |
Character or |
A named list with classes morie_mrm_uof_result,
morie_rich_result, list. Numeric outputs include
pareto_alpha_mle, gini, top5_share,
top10_share, n_forces, n_incidents.
df <- data.frame(force = c(rep("A", 50), rep("B", 5)))
res <- mrm_uof_force_concentration(df, "force")
res$gini
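To make the concentration measures concrete, here is a base-R sketch of a Gini coefficient, a Hill-type maximum-likelihood Pareto exponent, and a top-k share computed from per-force counts. The helper names, the toy counts, the uncorrected Gini formula, and the choice of the sample minimum as the Pareto threshold are all assumptions made for illustration; the package may use different conventions.

```r
# Toy per-force incident counts (invented for illustration)
counts <- c(A = 120, B = 45, C = 30, D = 12, E = 8, F = 5)

# Gini coefficient from the sorted values (no small-sample correction)
gini_sketch <- function(x) {
  x <- sort(as.numeric(x))
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Pareto tail exponent by maximum likelihood above a threshold x_min
hill_alpha_sketch <- function(x, x_min = min(x)) {
  tail_x <- as.numeric(x[x >= x_min])
  length(tail_x) / sum(log(tail_x / x_min))
}

# Share of all incidents held by the k largest forces
top_share_sketch <- function(x, k) {
  x <- sort(as.numeric(x), decreasing = TRUE)
  sum(x[seq_len(min(k, length(x)))]) / sum(x)
}

gini_sketch(counts)
hill_alpha_sketch(counts)
top_share_sketch(counts, 5)
```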
Region-at-time vs region-now locality contingency
mrm_uof_region_locality(df, region_at_col, region_now_col)
df |
A data.frame. |
region_at_col |
Region at the time of the incident. |
region_now_col |
Most-recent region. |
Named list with diagonal_share, chi2,
pvalue, df, cramers_v.
Builds a weapon x force contingency table, runs a chi-square test of independence, computes Cramer's V, and reports the top-3 (weapon, force) cells by standardised Pearson residual.
mrm_uof_weapon_diversity(df, weapon_col, force_col)
df |
A data.frame. |
weapon_col |
Categorical weapon column. |
force_col |
Categorical force / service column. |
A named list with chi2, pvalue, df,
cramers_v, top_residuals (list-of-lists), and an
interpretation paragraph.
Either supply dfs_by_year (named list mapping year string /
integer to a data.frame) or df + year_col.
mrm_uof_yoy_change(dfs_by_year = NULL, df = NULL, year_col = NULL, count_col = NULL)
dfs_by_year |
Named list of |
df |
A data.frame to be grouped by |
year_col |
Required when |
count_col |
Optional column to sum within each year (rows counted otherwise). |
Change-point detection is the manual largest-absolute-difference heuristic (the R port does not require changepoint).
Named list with years, counts, yoy_pct,
change_point_year, mean_abs_yoy_pct.
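A largest-absolute-difference heuristic can be sketched in a few lines of base R. The counts below are invented, and reporting the change point as the year in which the largest jump lands is an assumption about the convention, not something this entry specifies.

```r
years  <- 2019:2024
counts <- c(410, 425, 398, 612, 630, 641)   # toy yearly incident counts

# Year-on-year percentage change relative to the previous year
yoy_pct <- 100 * diff(counts) / head(counts, -1)

# Change point: the year that follows the largest absolute jump
abs_diff <- abs(diff(counts))
change_point_year <- years[-1][which.max(abs_diff)]

mean_abs_yoy_pct <- mean(abs(yoy_pct))
```

The heuristic always returns a year, even for a flat series, so it is best read alongside the size of the jump rather than as a significance test.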
Chi-square test for variance (Wilks 1962)
mrm_var_test(sample, sigma0_sq, alpha = 0.05)
sample |
Numeric vector (assumed iid normal). |
sigma0_sq |
Null hypothesis variance. |
alpha |
Significance level; the confidence interval has level 1 - alpha (default 0.05). |
Named list with s_sq, sigma0_sq, chi2_stat, df, p_value_two_sided, p_value_one_sided_greater/less, ci95_lower/upper, interpretation.
set.seed(2026)
x <- rnorm(50, mean = 0, sd = 1.2)
# H0: variance = 1.
mrm_var_test(sample = x, sigma0_sq = 1)
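The chi-square variance test is short enough to write out by hand. The sketch below uses base R only and assumes iid normal data; the doubling rule used for the two-sided p-value is one common convention and is an assumption here.

```r
set.seed(2026)
x <- rnorm(50, mean = 0, sd = 1.2)
sigma0_sq <- 1
alpha <- 0.05

n  <- length(x)
df <- n - 1
s_sq <- stats::var(x)

# Under H0, (n - 1) * s^2 / sigma0^2 follows a chi-square with n - 1 d.f.
chi2_stat <- df * s_sq / sigma0_sq
p_greater <- stats::pchisq(chi2_stat, df, lower.tail = FALSE)
p_less    <- stats::pchisq(chi2_stat, df)
p_two_sided <- min(1, 2 * min(p_greater, p_less))

# Confidence interval for the variance at level 1 - alpha
ci <- df * s_sq / stats::qchisq(c(1 - alpha / 2, alpha / 2), df)
```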
Delegates to poolr::meff when installed (poolr
implements Galwey, Li-Ji, and Nyholt). Otherwise computes the
chosen estimator inline.
n_effective_tests(correlation_matrix, method = c("galwey", "li_ji", "nyholt"))
correlation_matrix |
Square symmetric correlation matrix. |
method |
One of |
Effective number of tests (>= 1).
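All three estimators work from the eigenvalues of the correlation matrix. As an illustration, the sketch below implements the Galwey and Nyholt formulas in base R; Li-Ji is omitted for brevity. The helper name `meff_sketch` is hypothetical, and the code is not the package's inline fallback.

```r
# Hypothetical helper for illustration; not part of rmorie or poolr.
meff_sketch <- function(R, method = c("galwey", "nyholt")) {
  method <- match.arg(method)
  lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  m <- length(lambda)
  switch(method,
    # Galwey: (sum of sqrt eigenvalues)^2 / sum of eigenvalues, negatives set to 0
    galwey = {
      l <- pmax(lambda, 0)
      sum(sqrt(l))^2 / sum(l)
    },
    # Nyholt: 1 + (M - 1) * (1 - Var(lambda) / M)
    nyholt = 1 + (m - 1) * (1 - stats::var(lambda) / m)
  )
}

# Three tests with pairwise correlation 0.5
R <- matrix(0.5, 3, 3)
diag(R) <- 1
meff_sketch(R, "galwey")
meff_sketch(R, "nyholt")
```

Independent tests (identity matrix) give an effective number equal to the number of tests, and perfectly correlated tests give 1, which is a quick sanity check for any implementation.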
Total number of registered commands (excluding aliases)
n_stat_commands()
Performs nested K-fold CV: the outer loop estimates generalisation performance while an inner CV grid search picks the best hyperparameter configuration on each outer training fold. Two calling conventions are supported for backward compatibility:
nested_cross_validate(fit_fn = NULL, predict_fn = NULL, X = NULL, y = NULL, score_fn = NULL, hyperparam_grid = NULL, outer_k = 5L, inner_k = 3L, scoring = "roc_auc", random_state = 42L, tune_fn = NULL, outer_folds = NULL)
fit_fn |
Function with signature |
predict_fn |
Function with signature |
X |
Numeric predictor matrix (or coercible). |
y |
Response vector. |
score_fn |
Optional custom scoring function
|
hyperparam_grid |
Named list of candidate vectors (one per hyperparameter). The Cartesian product defines the search grid. |
outer_k |
Number of outer folds (default 5). |
inner_k |
Number of inner folds (default 3). |
scoring |
Named scoring rule passed to the internal scorer
( |
random_state |
Integer seed for fold construction (default 42). |
tune_fn |
Deprecated legacy positional argument; see Description. |
outer_folds |
Deprecated alias for |
Legacy stub form: nested_cross_validate(tune_fn,
predict_fn, X, y, outer_folds, scoring, random_state) where
tune_fn(X, y) returns a fitted model (no grid argument).
In this mode no inner search is run.
Full form: pass fit_fn, predict_fn,
score_fn, and hyperparam_grid (a named list of
candidate vectors). The function enumerates the Cartesian
product, runs inner K-fold CV on each outer training fold,
picks the best configuration, refits on the full outer-train
fold, and scores on the held-out outer fold.
Named list with outer_scores (numeric vector, length
outer_k), best_hyperparams_per_fold (list of named lists),
mean_score, se_score, and n_configs.
set.seed(1)
n <- 120
X <- matrix(rnorm(n * 3), n, 3)
y <- as.integer(plogis(X[, 1]) > runif(n))
fit_fn <- function(X, y, hp) {
  df <- data.frame(y = y, X)
  suppressWarnings(stats::glm(y ~ ., data = df, family = stats::binomial()))
}
predict_fn <- function(model, X) {
  stats::predict(model, newdata = data.frame(X), type = "response")
}
nested_cross_validate(
  fit_fn = fit_fn, predict_fn = predict_fn, X = X, y = y,
  hyperparam_grid = list(dummy = c(1)),
  outer_k = 3L, inner_k = 2L
)
Internal Shapiro-Wilk and Jarque-Bera helpers were removed in v0.9.6;
the suite now calls stats::shapiro.test directly and computes
the Jarque-Bera statistic inline. Users wanting a stand-alone
Jarque-Bera test should call tseries::jarque.bera.test.
normality_suite(x)
x |
Numeric vector. |
A list of morie_test_result.
Number needed to harm (NNH) — sign-reversed NNT
number_needed_to_harm(a, b, c, d, confidence = 0.95)
a, b, c, d
|
Cell counts. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
Number needed to treat (NNT) = 1 / |RD|
number_needed_to_treat(a, b, c, d, confidence = 0.95)
a, b, c, d
|
Cell counts. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
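The relation NNT = 1 / |RD| can be checked by hand. The sketch assumes the 2x2 layout has treated units in the first row (a events, b non-events) and controls in the second row (c events, d non-events); the entry itself only says "cell counts", so that layout is an assumption.

```r
# Assumed layout: row 1 = treated, row 2 = control; column 1 = events
a  <- 15; b <- 85
c_ <- 30; d <- 70    # c_ rather than c, to avoid shadowing base::c

risk_treated <- a / (a + b)
risk_control <- c_ / (c_ + d)
rd  <- risk_treated - risk_control   # risk difference
nnt <- 1 / abs(rd)                   # number needed to treat
```

Here the risk difference is -0.15, so roughly seven units need to be treated to prevent one event.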
Computes the kernel-weighted local mean estimator m-hat(x) = sum K_h(x - X_i) Y_i divided by sum K_h(x - X_i), with a Gaussian kernel.
nw_regression(x, y, x_eval, bandwidth)
x |
Numeric vector of observed covariate values, length n. |
y |
Numeric vector of observed outcomes, length n. |
x_eval |
Numeric vector of evaluation points. |
bandwidth |
Positive bandwidth h. |
Numeric vector of fitted values at x_eval.
Nadaraya, E. A. (1964). On Estimating Regression. Theory of Probability and Its Applications, 9(1), 141-142.
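The estimator is a kernel-weighted average, which the following base-R sketch makes explicit. The helper name `nw_sketch` is hypothetical; it uses a Gaussian kernel via `stats::dnorm` and is meant as an illustration, not as the package's code.

```r
# Hypothetical helper for illustration; not part of rmorie.
nw_sketch <- function(x, y, x_eval, bandwidth) {
  vapply(x_eval, function(x0) {
    w <- stats::dnorm((x0 - x) / bandwidth)  # Gaussian kernel weights
    sum(w * y) / sum(w)                      # weighted local mean
  }, numeric(1))
}

set.seed(1)
x <- sort(runif(200, 0, 2 * pi))
y <- sin(x) + rnorm(200, sd = 0.2)
fit <- nw_sketch(x, y, x_eval = c(pi / 2, 3 * pi / 2), bandwidth = 0.3)
```

The kernel's normalising constant cancels in the ratio, so only the shape of the kernel and the bandwidth matter.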
Odds ratio for a 2x2 table [[a, b], [c, d]]
odds_ratio(a, b, c, d, confidence = 0.95)
a, b, c, d
|
Cell counts. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
Odds-ratio table from a fitted logistic GLM
odds_ratio_table(model, confidence = 0.95, digits = 3L, apa = FALSE, output_format = "dataframe", title = "Odds Ratios")
model |
A fitted |
confidence |
Confidence level. |
digits |
Decimal places. |
apa |
APA formatting. |
output_format |
Output target. |
title |
Title. |
Omega-squared — less biased than eta-squared
omega_squared(ss_effect, ss_total, df_effect, ms_error)
ss_effect, ss_total
|
Sums of squares. |
df_effect |
Numerator d.f. of the effect. |
ms_error |
Error mean square. |
A morie_effect_size.
Closed-form Cinelli-Hazlett robustness-value implementation in
base R. For the full sensemakr treatment (benchmark plots,
adjusted t-statistics, contour plots) on a fitted lm
object, use morie_sensitivity_omitted_var_bias.
omitted_variable_bias(estimate, se, dof, r2_yd_x, partial_r2_treatment, q = 1, alpha = 0.05, benchmark_covariates = NULL)
estimate |
Treatment coefficient. |
se |
SE of the estimate. |
dof |
Residual degrees of freedom. |
r2_yd_x |
Partial R^2 of treatment with outcome. |
partial_r2_treatment |
Same as |
q |
Fraction of the estimate to be explained away. Default 1. |
alpha |
Significance level. Default 0.05. |
benchmark_covariates |
Named list mapping covariate name -> partial R^2. |
A morie_ovb named-list.
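The closed-form robustness value of Cinelli and Hazlett depends only on the t-statistic and the residual degrees of freedom. Below is a minimal base-R sketch with a hypothetical helper name and invented inputs; it covers the point-estimate robustness value RV_q only, not the significance-adjusted variant or benchmark bounds.

```r
# Hypothetical helper for illustration; not part of rmorie or sensemakr.
robustness_value_sketch <- function(estimate, se, dof, q = 1) {
  t_stat <- estimate / se
  f_q <- q * abs(t_stat) / sqrt(dof)   # scaled partial Cohen's f
  list(
    partial_r2_treatment = t_stat^2 / (t_stat^2 + dof),
    robustness_value     = 0.5 * (sqrt(f_q^4 + 4 * f_q^2) - f_q^2)
  )
}

rv <- robustness_value_sketch(estimate = 0.10, se = 0.02, dof = 100)
rv$robustness_value
```

A larger robustness value means a stronger unobserved confounder would be needed to reduce the estimate by the fraction `q`.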
One-sample z-test for a proportion
one_proportion_ztest(count, nobs, value = 0.5, confidence = 0.95)
count |
Successes. |
nobs |
Total observations. |
value |
Hypothesised proportion. |
confidence |
Confidence level (Wilson CI). |
One-sample Student's t-test
one_sample_ttest(x, mu0 = 0, confidence = 0.95)
x |
Numeric vector. |
mu0 |
Hypothesised mean. |
confidence |
Confidence level (default 0.95). |
A morie_test_result.
One-way between-subjects ANOVA
one_way_anova(...)
... |
Two or more numeric vectors (groups). |
morie_test_result with eta-squared effect size.
Convert odds ratio to Cohen's d (Hasselblad & Hedges, 1995)
or_to_d(or_val)
or_val |
Odds ratio. |
Numeric d.
Convert OR to Pearson r via d
or_to_r(or_val)
or_val |
Odds ratio. |
Numeric r.
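The two conversions chain together. The sketch below uses the logit-based formula d = ln(OR) * sqrt(3) / pi and the equal-group-size conversion r = d / sqrt(d^2 + 4). The helper names are hypothetical, and the equal-group-size assumption in the second step may differ from what the package does.

```r
# Hypothetical helpers for illustration; not part of rmorie.
or_to_d_sketch <- function(or_val) log(or_val) * sqrt(3) / pi
d_to_r_sketch  <- function(d) d / sqrt(d^2 + 4)   # assumes equal group sizes
or_to_r_sketch <- function(or_val) d_to_r_sketch(or_to_d_sketch(or_val))

or_to_d_sketch(2.5)
or_to_r_sketch(2.5)
```

An odds ratio of 1 maps to d = 0 and r = 0, and odds ratios below 1 map to negative values.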
Performs a sign-flipping paired permutation test on the paired
differences. coin::symmetry_test(distribution = "approximate")
is the canonical CRAN equivalent (cross-referenced); rmorie keeps
the inline path because the rmorie API returns the full null
distribution.
paired_permutation_test(x, y, statistic = "mean_diff", n_permutations = 9999L, alternative = "two-sided", seed = 42L)
x, y
|
Paired numeric vectors (same length). |
statistic |
|
n_permutations |
Number of permutations. |
alternative |
|
seed |
Random seed. |
A morie_permutation_test_result.
coin::symmetry_test.
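Sign-flipping works because, under the null hypothesis of no paired difference, each difference is equally likely to be positive or negative. A minimal base-R sketch follows, with a hypothetical helper name, a two-sided mean-difference statistic, and the add-one p-value convention as assumptions.

```r
# Hypothetical helper for illustration; not part of rmorie.
sign_flip_sketch <- function(x, y, n_permutations = 999L, seed = 42L) {
  set.seed(seed)
  d <- x - y
  observed <- mean(d)
  # Each replicate flips the sign of every paired difference at random
  null <- replicate(
    n_permutations,
    mean(d * sample(c(-1, 1), length(d), replace = TRUE))
  )
  p <- (1 + sum(abs(null) >= abs(observed))) / (n_permutations + 1)
  list(observed = observed, p_value = p, null_distribution = null)
}

set.seed(1)
y <- rnorm(30)
x <- y + 1 + rnorm(30, sd = 0.5)
res <- sign_flip_sketch(x, y)
res$p_value
```

Returning the null distribution alongside the p-value is what allows downstream code to plot or re-threshold it.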
Paired-sample t-test
paired_ttest(x, y, confidence = 0.95)
x, y
|
Equal-length numeric vectors. |
confidence |
Confidence level. |
Generates bootstrap samples from a fitted parametric distribution
rather than from the empirical sample. Delegates to
boot::boot(sim = "parametric") when boot is installed;
otherwise uses an inline rnorm/rpois/rbinom/rexp/rgamma loop.
parametric_bootstrap(data, statistic, distribution = "normal", n_boot = 2000L, ci_level = 0.95, seed = 42L, ...)
data |
Original numeric data (used to fit the distribution). |
statistic |
Function returning a scalar. |
distribution |
One of |
n_boot |
Number of replicates. |
ci_level |
Confidence level. |
seed |
Random seed. |
... |
Distribution-specific parameters (mu, sigma, lam, p, scale, shape). |
A morie_bootstrap_result.
boot::boot.
Partial Pearson correlation controlling for covariates
partial_correlation(x, y, covariates, confidence = 0.95)
x, y
|
Numeric vectors of interest. |
covariates |
Matrix or data frame of covariates. |
confidence |
Confidence level. |
Partial eta-squared
partial_eta_squared(ss_effect, ss_error)
ss_effect |
Sum of squares for the effect. |
ss_error |
Error sum of squares. |
A morie_effect_size.
Burg AR-spectrum estimation: parametric PSD via the Burg algorithm for AR-coefficient estimation. Well-suited to short HRV windows where Welch suffers from low spectral resolution.
pburg(x, fs, order = 16L, nfft = 256L)
x |
Numeric vector (1-D signal). |
fs |
Sampling frequency in Hz. |
order |
AR model order (default 16). |
nfft |
FFT length for PSD evaluation (default 256). |
Reference: Marple, S.L. (1987) Digital Spectral Analysis with Applications, Prentice-Hall, on the Burg algorithm.
List with filtered (PSD), name, fs, n_samples, and
extra (freqs, order, ar_coefficients).
set.seed(1)
t <- seq(0, 1, length.out = 512)
x <- sin(2 * pi * 10 * t) + 0.5 * rnorm(length(t))
res <- pburg(x, fs = 512)
length(res$filtered)
Shannon-energy envelope of a phonocardiogram (PCG): normalises the
signal, computes the Shannon energy E = -x^2 log(x^2), then box-smooths
over a 20 ms window. The standard envelope used for S1/S2 segmentation.
pcgenv(pcg, fs)
pcg |
Numeric vector (1-D PCG signal). |
fs |
Sampling frequency in Hz. |
Reference: Liang, H., Lukkarinen, S. & Hartimo, I. (1997) "Heart sound segmentation algorithm based on heart sound envelogram", Comput. Cardiol., pp. 105–108.
List with filtered (envelope), name, fs, n_samples.
set.seed(1)
pcg <- rnorm(4000)
res <- pcgenv(pcg, fs = 2000)
length(res$filtered)
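The envelope computation can be sketched in base R. The helper name is hypothetical, the Shannon energy is taken to be -x^2 log(x^2) on the amplitude-normalised signal, and the centred moving average leaves `NA` at the edges; all of these are assumptions rather than documented package behaviour.

```r
# Hypothetical helper for illustration; not part of rmorie.
shannon_envelope_sketch <- function(pcg, fs, window_ms = 20) {
  x  <- pcg / max(abs(pcg))                   # normalise to [-1, 1]
  x2 <- x^2
  energy <- ifelse(x2 > 0, -x2 * log(x2), 0)  # Shannon energy, 0 at x = 0
  k <- max(1L, round(fs * window_ms / 1000))  # box width in samples
  # Centred moving average; the first and last few samples are NA
  as.numeric(stats::filter(energy, rep(1 / k, k), sides = 2))
}

set.seed(1)
pcg <- rnorm(4000)
env <- shannon_envelope_sketch(pcg, fs = 2000)
```

Shannon energy emphasises medium-amplitude samples over both very small and very large ones, which is why it is preferred to the squared signal for heart-sound envelopes.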
Combines a 100–400 Hz band-energy ratio, normalised spectral entropy,
and the Higuchi fractal dimension of the PCG into a murmur-likelihood
score in [0, 1].
pcgmur(pcg, fs)
pcg |
Numeric vector (1-D PCG signal). |
fs |
Sampling frequency in Hz. |
Reference: Rangayyan, R.M. (2015) Biomedical Signal Analysis, 2nd ed., Wiley/IEEE Press, chapter on heart-sound analysis.
List with value (score in [0, 1]), name, and extra
(fractal_dimension, hf_energy_ratio, spectral_entropy,
fd_score, hf_score, ent_score).
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  pcg <- rnorm(4000)
  res <- pcgmur(pcg, fs = 2000)
  res$value
}
Segments a PCG Shannon-energy envelope into S1 (systolic) and S2 (diastolic) heart-sound events: threshold, find above-threshold runs, merge close peaks, label alternating events.
pcgseg(envelope, fs = 2000, min_gap_ms = 100)
envelope |
Numeric vector (Shannon-energy envelope). |
fs |
Sampling frequency in Hz (default 2000). |
min_gap_ms |
Minimum gap between peaks in ms (default 100). |
Reference: Liang, Lukkarinen & Hartimo (1997), Comput. Cardiol., pp. 105–108.
List with value (cycle count), name, and extra
(s1_indices, s2_indices, n_cycles, n_peaks).
set.seed(1)
env <- abs(sin(seq(0, 20, length.out = 4000))) + 0.05 * rnorm(4000)
env[env < 0] <- 0
res <- pcgseg(env, fs = 2000)
res$extra$n_cycles
Pearson product-moment correlation
pearson_correlation(x, y, confidence = 0.95)
x, y
|
Numeric vectors. |
confidence |
Confidence level. |
Estimates the false discovery rate at each candidate threshold
using a matrix of p-values computed under the permutation null,
and selects the largest threshold whose estimated FDR is at most
alpha. Q-values are assigned as the minimum estimated FDR
across thresholds at least as large as each observed p-value.
permutation_fdr(test_stats, null_stats, alpha = 0.05, labels = NULL)
test_stats |
Numeric vector of observed p-values (named after the Python sibling argument to keep cross-language parity). |
null_stats |
Numeric matrix of permutation-null p-values with
|
alpha |
Target FDR level (default 0.05). |
labels |
Optional character vector of test labels. |
A morie_rich_result list with original (raw
p-values), adjusted (q-values), rejected,
method, alpha, n_rejected, n_tests.
set.seed(1)
m <- 20; nperm <- 200
p_obs <- c(stats::runif(m - 4), c(1e-4, 1e-3, 1e-3, 5e-3))
p_null <- matrix(stats::runif(nperm * m), nperm, m)
permutation_fdr(p_obs, p_null)
Given observed test statistics and a matrix of test statistics under the permutation null, computes step-down max-T adjusted p-values that strongly control the family-wise error rate without requiring independence across tests.
permutation_fwer(test_stats, null_stats, alternative = c("two_sided", "greater", "less"), alpha = 0.05, labels = NULL)
test_stats |
Numeric vector of length |
null_stats |
Numeric matrix with |
alternative |
One of |
alpha |
Significance level for rejection (default 0.05). |
labels |
Optional character vector of test labels. |
A morie_rich_result list with original,
adjusted, rejected, method, alpha,
n_rejected, n_tests.
set.seed(1)
m <- 10; nperm <- 200
obs <- c(rnorm(m - 2), 4.0, 3.5)
null <- matrix(rnorm(nperm * m), nperm, m)
permutation_fwer(obs, null)
Shuffles the combined samples n_permutations times to
construct the null distribution of the chosen test statistic.
coin's coin::oneway_test(distribution = "approximate")
implements the same test with a Monte-Carlo null; it is delegated
to when coin is installed and statistic is the
default "mean_diff" (the rmorie API allows arbitrary
f(g1, g2) which coin does not expose, so a custom statistic
falls back to the inline shuffle loop). The inline path keeps the
full null distribution which downstream MRM code consumes.
permutation_test(group1, group2, statistic = "mean_diff", n_permutations = 9999L, alternative = "two-sided", seed = 42L)
group1, group2
|
Numeric vectors. |
statistic |
Either |
n_permutations |
Number of permutations. |
alternative |
|
seed |
Random seed. |
A morie_permutation_test_result.
coin::oneway_test, coin::independence_test.
Pettitt's (1979) test for a single change-point in a time series. Returns the change-point index, the test statistic, and an approximate p-value.
pettitt_changepoint(series)
series |
Numeric vector / time series. |
Named list with change_point_index, U_max, p_value,
note.
Pettitt (1979). A non-parametric approach to the change-point problem. J. R. Stat. Soc. C, 28(2), 126–135.
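Pettitt's statistic is built from pairwise signs across a candidate split. The sketch below is a direct base-R transcription with a hypothetical helper name, using the usual approximation p = 2 exp(-6 K^2 / (n^3 + n^2)). It computes an n x n sign matrix, so it is suitable for short series only.

```r
# Hypothetical helper for illustration; not part of rmorie.
pettitt_sketch <- function(x) {
  n <- length(x)
  s <- sign(outer(x, x, "-"))   # s[i, j] = sign(x_i - x_j)
  # U_t sums the signs between observations up to t and those after t
  u <- vapply(
    seq_len(n - 1),
    function(t) sum(s[seq_len(t), (t + 1):n]),
    numeric(1)
  )
  k <- max(abs(u))
  list(
    change_point_index = which.max(abs(u)),
    U_max = k,
    p_value = min(1, 2 * exp(-6 * k^2 / (n^3 + n^2)))
  )
}

# Deterministic series with a level shift after observation 40
series <- c(0.1 * sin(1:40), 5 + 0.1 * sin(41:80))
res <- pettitt_sketch(series)
res$change_point_index
```

With the convention used here, the reported index is the last observation before the shift.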
Petrosian fractal dimension D = log10(n) / (log10(n) + log10(n / (n + 0.4 n_delta))),
where n is the signal length and n_delta counts sign
changes of the first difference. A fast complexity proxy for EEG/ECG.
pfd(x)
x |
Numeric vector. |
Reference: Petrosian, A. (1995) "Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns", Proc. 8th IEEE Symp. Comput.-Based Med. Syst., pp. 212–217.
List with value (D), name, and extra (n_delta, n).
set.seed(1)
x <- cumsum(rnorm(1000))
res <- pfd(x)
res$value
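The Petrosian dimension needs only the signal length and the number of sign changes in the first difference. A minimal base-R sketch with a hypothetical helper name, using the standard formula D = log10(n) / (log10(n) + log10(n / (n + 0.4 n_delta))):

```r
# Hypothetical helper for illustration; not part of rmorie.
pfd_sketch <- function(x) {
  n  <- length(x)
  dx <- diff(x)
  # A sign change occurs where consecutive first differences have opposite sign
  n_delta <- sum(dx[-1] * dx[-length(dx)] < 0)
  log10(n) / (log10(n) + log10(n / (n + 0.4 * n_delta)))
}

set.seed(1)
rough  <- pfd_sketch(rnorm(1000))          # many sign changes
smooth <- pfd_sketch(cumsum(rnorm(1000)))  # fewer sign changes
```

A monotone signal has no sign changes and therefore a dimension of exactly 1; rougher signals score higher.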
Correlates each covariate with rank-transformed event times.
ph_assumption_test(survival_times, event_indicator, covariates, covariate_names = NULL)
survival_times |
Event/censoring times. |
event_indicator |
1 = event, 0 = censored. |
covariates |
Covariate matrix. |
covariate_names |
Optional names. |
A list of morie_specification_test objects, one per
covariate.
Phi coefficient for a 2x2 contingency table
phi_coefficient(contingency_table)
contingency_table |
2x2 numeric matrix. |
A morie_effect_size.
Point-biserial correlation
point_biserial_correlation(binary, continuous, confidence = 0.95)
binary |
0/1 vector. |
continuous |
Numeric vector. |
confidence |
Confidence level. |
Prediction interval for a new study (random-effects meta)
prediction_interval(estimates, standard_errors, confidence = 0.95)
estimates |
Numeric vector of effect-size estimates. |
standard_errors |
Numeric vector of SEs. |
confidence |
Confidence level. Default 0.95. |
Numeric c(lower, upper).
Print method for audit results.
## S3 method for class 'morie_audit_result'
print(x, ...)
x |
A |
... |
Unused. |
Pretty-print method for morie_tps_result objects.
## S3 method for class 'morie_tps_result'
print(x, ...)
x |
A |
... |
Ignored. |
Print method for taxonomy entries.
## S3 method for class 'morie_variable_taxonomy'
print(x, ...)
x |
A |
... |
Unused. |
Draws bias parameters from prior distributions and returns the
distribution of bias-adjusted estimates. Cross-references
episensr (episensr::probsens) for the canonical
multi-bias version with separate selection-bias and
misclassification-bias models; use episensr directly when
you need those.
probabilistic_bias_analysis(estimate, se, n_simulations = 10000L, bias_parms = NULL, seed = 42L)
estimate |
Observed estimate. |
se |
Standard error. |
n_simulations |
Number of MC draws. Default 10000. |
bias_parms |
Named list with |
seed |
RNG seed. Default 42. |
Named list with bias-adjusted distribution summaries.
Pearson r as an effect size with Fisher-z CI
r_effect_size(x, y, confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
Coefficient of determination R^2
r_squared(x, y)
x, y
|
Numeric vectors (NA dropped). |
A morie_effect_size.
Convert Pearson r to Cohen's d
r_to_d(r)
r |
Pearson r. |
Numeric d.
Convert Pearson r to OR via d
r_to_or(r)
r |
Pearson r. |
Numeric OR.
Ramsey RESET test for functional-form misspecification
ramsey_reset_test(y, X, powers = c(2, 3))
y |
Response vector. |
X |
Design matrix (with intercept). |
powers |
Integer vector of powers of fitted values to add
to the auxiliary regression (default |
A morie_specification_test.
Random-effects (DerSimonian-Laird) meta-analytic pooling
random_effects_meta(estimates, standard_errors, confidence = 0.95, method = "DL")
estimates |
Numeric vector of effect-size estimates. |
standard_errors |
Numeric vector of SEs. |
confidence |
Confidence level. Default 0.95. |
method |
Tau^2 estimator. Only |
A morie_effect_size with tau^2, I^2, Q, prediction
interval in extra.
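The DerSimonian-Laird estimator is a method-of-moments calculation that fits in a few lines. The base-R sketch below uses a hypothetical helper name and invented study data. The t-based prediction interval with k - 2 degrees of freedom follows a common convention and is an assumption; the package may compute its interval differently.

```r
# Hypothetical helper for illustration; not part of rmorie.
dl_meta_sketch <- function(estimates, standard_errors, confidence = 0.95) {
  k <- length(estimates)
  w <- 1 / standard_errors^2                     # fixed-effect weights
  mu_fixed <- sum(w * estimates) / sum(w)
  q <- sum(w * (estimates - mu_fixed)^2)         # Cochran's Q
  c_dl <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (k - 1)) / c_dl)           # between-study variance
  w_re <- 1 / (standard_errors^2 + tau2)         # random-effects weights
  mu <- sum(w_re * estimates) / sum(w_re)
  se_mu <- sqrt(1 / sum(w_re))
  p_upper <- 1 - (1 - confidence) / 2
  list(
    estimate = mu,
    se = se_mu,
    ci = mu + c(-1, 1) * stats::qnorm(p_upper) * se_mu,
    tau2 = tau2,
    Q = q,
    I2 = if (q > 0) max(0, (q - (k - 1)) / q) else 0,
    prediction_interval =
      mu + c(-1, 1) * stats::qt(p_upper, df = k - 2) * sqrt(tau2 + se_mu^2)
  )
}

res <- dl_meta_sketch(
  estimates = c(0.10, 0.35, 0.22, 0.51),
  standard_errors = c(0.08, 0.10, 0.12, 0.09)
)
```

The prediction interval is wider than the confidence interval because it adds the between-study variance to the uncertainty of the pooled mean.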
Rank-biserial correlation (matched rank version)
rank_biserial_correlation(x, y, confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
Incidence rate ratio (IRR)
rate_ratio(events1, person_time1, events2, person_time2, confidence = 0.95)
events1, person_time1
|
Events and person-time in group 1. |
events2, person_time2
|
Events and person-time in group 2. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
Register a stat_command in the package-level registry
register_stat_command(cmd)
cmd |
A |
The command name, invisibly.
Side-by-side regression table for multiple model fits
regression_table(models, exponentiate = FALSE, show_ci = TRUE, show_stars = TRUE, confidence = 0.95, digits = 3L, model_stats = c("nobs", "rsquared", "aic", "bic", "llf"), apa = FALSE, output_format = "dataframe", title = "Regression Results")
models |
Named list of fitted models (e.g. |
exponentiate |
Exponentiate coefficients (for OR / HR). |
show_ci |
Include CI line under each coefficient. |
show_stars |
Append significance stars. |
confidence |
Confidence level for CIs. |
digits |
Decimal places. |
model_stats |
Vector of model-stat keys from
|
apa |
APA p-value formatting. |
output_format |
Output target. |
title |
Title. |
Repeats .boot_cross_validate() n_repeats times
with different RNG seeds and pools the per-fold scores.
caret::trainControl(method = "repeatedcv") and
rsample::vfold_cv both implement the same partitioning
(cross-referenced).
repeated_cv(X, y, model_fn, score_fn, n_folds = 10L, n_repeats = 10L, seed = 42L)
X |
Numeric matrix or data.frame of predictors. |
y |
Numeric or factor outcome vector aligned with rows of |
model_fn |
Function |
score_fn |
Function |
n_folds |
Integer; number of folds per repeat (default 10). |
n_repeats |
Number of repetitions. |
seed |
Integer RNG seed for reproducibility. |
A morie_cv_result pooling scores across repeats.
caret::trainControl, rsample::vfold_cv.
Sphericity assumption is left to the user; this routine computes the
uncorrected F. Pair with ez::ezANOVA for GG/HF correction if
needed.
repeated_measures_anova(data, outcome, subject, within)
data |
Long-format data frame. |
outcome, subject, within
|
Column names. |
Resolve a command by canonical name or alias
resolve_stat_command(name)
name |
Character scalar. |
A morie_stat_command or NULL.
Risk difference for a 2x2 table
risk_difference(a, b, c, d, confidence = 0.95)
a, b, c, d
|
Cell counts. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
Risk ratio (relative risk) for a 2x2 table
risk_ratio(a, b, c, d, confidence = 0.95)
a, b, c, d
|
Cell counts. |
confidence |
Confidence level. Default 0.95. |
A morie_effect_size.
Phase 1.g delegates to rbounds when installed and the
wilcoxon or sign method is requested; otherwise
falls back to the base-R normal-approximation implementation
originally shipped with rmorie. The mcnemar path is always
served by the inline binomial formula (rbounds does not expose a
McNemar entry point on CRAN).
rosenbaum_bounds(treated_outcomes, control_outcomes, gamma_range = NULL, method = "wilcoxon")
treated_outcomes |
Vector of outcomes for treated units. |
control_outcomes |
Vector of outcomes for matched controls. |
gamma_range |
Numeric vector of Gamma values (default
|
method |
One of |
A morie_rosenbaum_bounds named-list.
Computes the RR (beat-to-beat) interval series in milliseconds from a vector of R-peak sample indices.
rrint(r_peaks, fs)
r_peaks |
Integer vector of R-peak sample indices. |
fs |
Sampling frequency in Hz. |
Reference: Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996) "Heart rate variability: standards of measurement, physiological interpretation, and clinical use", Circulation 93(5):1043–1065.
List with value (mean RR in ms), name, and extra
(rr_ms, mean_rr, std_rr, n_intervals).
rr <- rrint(c(100, 350, 600, 850, 1100), fs = 250)
rr$value
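The conversion from R-peak sample indices to RR intervals can be sketched in base R as below. This is an illustration of the arithmetic only, not rmorie's implementation; the variable names are hypothetical.

```r
# Illustrative arithmetic only: successive differences of the peak indices,
# converted from samples to milliseconds.
r_peaks <- c(100, 350, 600, 850, 1100)  # sample indices, as in the example above
fs      <- 250                           # sampling frequency in Hz
rr_ms   <- diff(r_peaks) / fs * 1000     # samples -> seconds -> milliseconds
mean(rr_ms)
```

With peaks 250 samples apart at 250 Hz, every interval is 1000 ms.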
Run a command's REPL handler with positional/keyword arguments
run_stat_command(name, ...)
name |
Command name or alias. |
... |
Arguments forwarded to the REPL handler. |
Whatever the handler returns. Stops with an informative error if the command is not registered.
Wald-Wolfowitz runs test for randomness
runs_test(x, cutoff = NULL)
x |
Numeric sequence. |
cutoff |
Cut-off (median by default). |
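A minimal base-R sketch of the Wald-Wolfowitz statistic with the usual normal approximation follows. Dropping values equal to the cut-off is an assumption made here; the entry above does not say how rmorie handles ties. The name `runs_test_sketch` is hypothetical.

```r
# Hypothetical helper, not rmorie's implementation.
runs_test_sketch <- function(x, cutoff = stats::median(x)) {
  s    <- x[x != cutoff] > cutoff            # drop ties with the cut-off (assumption)
  n1   <- sum(s)
  n2   <- sum(!s)
  runs <- 1 + sum(s[-1] != s[-length(s)])    # a new run starts at every sign change
  mu   <- 2 * n1 * n2 / (n1 + n2) + 1        # expected number of runs under H0
  v    <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
          ((n1 + n2)^2 * (n1 + n2 - 1))      # variance of the run count under H0
  z    <- (runs - mu) / sqrt(v)
  list(runs = runs, expected = mu, z = z, p_value = 2 * stats::pnorm(-abs(z)))
}

# A strictly alternating sequence has far more runs than chance would produce.
alt <- runs_test_sketch(rep(c(1, 3), 10))
alt$p_value
```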
Sample entropy (SampEn) of a 1-D signal: $\mathrm{SampEn} = -\ln(A / B)$, where
$B$ counts template-vector matches at embedding dimension $m$
and $A$ at dimension $m + 1$, with Chebyshev distance
tolerance $r \cdot \mathrm{sd}(x)$.
sampen(x, m = 2L, r = 0.2)
x |
Numeric vector. |
m |
Embedding dimension (default 2). |
r |
Tolerance as fraction of sd (default 0.2). |
Reference: Richman, J.S. & Moorman, J.R. (2000) "Physiological time-series analysis using approximate entropy and sample entropy", Am. J. Physiol. Heart Circ. Physiol. 278(6):H2039–H2049 (refines Pincus, S.M. (1991), Proc. Natl. Acad. Sci. USA 88:2297).
List with value (SampEn), name, and extra
(m, r, tolerance, A, B).
set.seed(1)
x <- sin(seq(0, 10 * pi, length.out = 500)) + 0.1 * rnorm(500)
res <- sampen(x)
res$value
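For readers who want to see the counting behind the A and B fields, a brute-force base-R sketch is given below. It follows the Richman and Moorman convention of using the same n - m templates at both dimensions and excluding self-matches; whether rmorie counts a match with `<=` or `<` is not stated above, so the `<=` here is an assumption. The name `sampen_sketch` is hypothetical.

```r
# Hypothetical O(n^2) illustration, not rmorie's implementation.
sampen_sketch <- function(x, m = 2L, r = 0.2) {
  n   <- length(x)
  tol <- r * stats::sd(x)            # tolerance as a fraction of sd
  n_templates <- n - m               # same template count for both lengths
  count_matches <- function(len) {
    total <- 0
    for (i in seq_len(n_templates - 1L)) {
      for (j in (i + 1L):n_templates) {
        # Chebyshev distance between the templates starting at i and j
        d <- max(abs(x[i:(i + len - 1L)] - x[j:(j + len - 1L)]))
        if (d <= tol) total <- total + 1
      }
    }
    total
  }
  B <- count_matches(m)        # matches at dimension m
  A <- count_matches(m + 1L)   # matches at dimension m + 1
  list(value = -log(A / B), A = A, B = B, tolerance = tol)
}

set.seed(1)
x_se <- sin(seq(0, 10 * pi, length.out = 200)) + 0.1 * rnorm(200)
se   <- sampen_sketch(x_se)
se$value
```

Because every match at dimension m + 1 is also a match at dimension m, A never exceeds B and the value is non-negative.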
Multi-dimensional data quality scores
score_data_quality(data, date_cols = NULL, freshness_days = 365L, key_cols = NULL, consistency_rules = NULL)
data |
Data frame. |
date_cols |
Datetime column names for timeliness. |
freshness_days |
Days for full timeliness score. |
key_cols |
Columns that should be unique together. |
consistency_rules |
List of functions |
Score (Lagrange multiplier) test
score_test(score_vector, information_matrix)
score_vector |
Score vector evaluated under H0. |
information_matrix |
Information matrix under H0. |
A morie_specification_test.
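The Lagrange multiplier statistic is the quadratic form of the score in the inverse information matrix. A minimal sketch, assuming one restriction per score component for the degrees of freedom (the entry above does not state how rmorie sets them); `score_test_sketch` is a hypothetical name.

```r
# Hypothetical helper, not rmorie's implementation.
score_test_sketch <- function(score_vector, information_matrix) {
  s       <- as.numeric(score_vector)
  lm_stat <- drop(t(s) %*% solve(information_matrix, s))  # s' I^{-1} s
  df      <- length(s)   # assumption: one restriction per score component
  list(statistic = lm_stat, df = df,
       p_value = stats::pchisq(lm_stat, df = df, lower.tail = FALSE))
}

lm_res <- score_test_sketch(c(2, -1), diag(c(4, 1)))
lm_res$statistic
```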
Semi-partial (part) correlation
semi_partial_correlation(x, y, covariates)
x, y
|
Numeric vectors of interest. |
covariates |
Matrix or data frame of covariates. |
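A semi-partial correlation removes the covariates from only one of the two variables. The sketch below residualises x and leaves y untouched; which variable rmorie residualises is not stated above, so that choice is an assumption. All variable names are hypothetical.

```r
# Illustrative only: correlate y with the part of x not explained by z.
set.seed(1)
z_cov <- rnorm(100)
x_sp  <- 0.6 * z_cov + rnorm(100)
y_sp  <- 0.5 * x_sp + 0.4 * z_cov + rnorm(100)

x_resid <- stats::resid(stats::lm(x_sp ~ z_cov))  # x with z partialled out
sr      <- stats::cor(y_sp, x_resid)              # semi-partial correlation

# The squared value equals the gain in R^2 from adding x to a model with z.
r2_full    <- summary(stats::lm(y_sp ~ x_sp + z_cov))$r.squared
r2_reduced <- summary(stats::lm(y_sp ~ z_cov))$r.squared
c(sr_squared = sr^2, r2_gain = r2_full - r2_reduced)
```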
Returns a list of closures bound to the same backend (always pure R in this port; the Python module additionally supports a C backend).
SemiparKernels()
A list with class morie_semipar_kernels carrying
methods nw_regression, local_linear, kde,
silverman_bandwidth, loocv_bandwidth,
kernel_cond_moments, plus a backend string.
Tools to assess the robustness of causal effect estimates to unmeasured confounding, model specification, and other threats to internal validity. Includes Rosenbaum bounds, the E-value family, Ding-VanderWeele bias formulas, tipping-point analysis, omitted-variable bias (Cinelli-Hazlett), Manski bounds, probabilistic (Monte-Carlo) bias analysis, and specification curve analysis.
Wraps CRAN EValue, tipr, sensemakr, specr, rbounds, episensr, and konfound when available; falls back to base-R closed-form implementations otherwise.
Rosenbaum (2002); VanderWeele & Ding (2017); Cinelli & Hazlett (2020); Manski (1990); Ding & VanderWeele (2016).
Thin wrapper over rbounds::psens() when rbounds is
installed (rank-matched-pair signed-rank bounds across a Gamma
grid). Without rbounds, falls back to a base R normal-
approximation Wilcoxon signed-rank computation.
sensitivity_rosenbaum(data, treatment, outcome, covariates, gamma_range = c(1, 3), n_gamma = 20L)
data |
Data frame with treatment + outcome columns. |
treatment |
Binary treatment column (0/1). |
outcome |
Outcome column. |
covariates |
Covariates (used only for matching approximation, here a simple rank-match). |
gamma_range |
c(min, max) of Gamma. Default c(1, 3). |
n_gamma |
Number of Gamma values. Default 20. |
Data frame with Gamma, p_lower, p_upper.
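The normal-approximation fallback can be sketched as follows for a vector of matched-pair differences (treated minus control). Under a hidden-bias level Gamma, the chance that a pair contributes a positive rank is bounded between 1 / (1 + Gamma) and Gamma / (1 + Gamma), which shifts the mean and variance of the signed-rank statistic. This is a textbook-style illustration, not rmorie's code; `rosenbaum_sketch` and its arguments are hypothetical, and the matching step is omitted.

```r
# Hypothetical helper: Rosenbaum p-value bounds for the signed-rank statistic.
rosenbaum_sketch <- function(d, gammas = seq(1, 3, length.out = 20)) {
  d     <- d[d != 0]                        # drop zero differences
  n     <- length(d)
  t_obs <- sum(rank(abs(d))[d > 0])         # Wilcoxon signed-rank statistic
  s     <- n * (n + 1) / 2                  # sum of all ranks
  bound <- function(p) {
    mu <- p * s
    v  <- p * (1 - p) * n * (n + 1) * (2 * n + 1) / 6
    stats::pnorm((t_obs - mu) / sqrt(v), lower.tail = FALSE)
  }
  data.frame(Gamma   = gammas,
             p_lower = bound(1 / (1 + gammas)),
             p_upper = bound(gammas / (1 + gammas)))
}

set.seed(1)
rb <- rosenbaum_sketch(rnorm(50, mean = 0.5))
head(rb, 3)
```

At Gamma = 1 the two bounds coincide with the ordinary one-sided signed-rank p-value; the upper bound then grows with Gamma.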
Produces a tidy data.frame with the estimate, CI, p-value, applicable E-values (RR / OR / HR), and a tipping-point delta.
sensitivity_summary(estimate, se, rr = NULL, odds_ratio = NULL, hazard_ratio = NULL, prevalence = NULL)
estimate |
Treatment-effect estimate. |
se |
Standard error. |
rr, odds_ratio, hazard_ratio
|
Optional effect on each scale. |
prevalence |
Outcome prevalence (for OR-to-RR). |
A data.frame with metric, value.
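For the risk-ratio scale, the E-value of VanderWeele and Ding (2017) has a closed form. The sketch below covers the point estimate only; the OR and HR conversions and the confidence-limit E-value are not shown. `e_value_sketch` is a hypothetical name, not an rmorie export.

```r
# Hypothetical helper: E-value for a risk ratio point estimate.
e_value_sketch <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)   # protective effects: invert first
  rr + sqrt(rr * (rr - 1))
}

e_value_sketch(c(2, 0.5, 1))
```

A risk ratio of 2 and its reciprocal 0.5 give the same E-value, and a null risk ratio of 1 gives 1.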
Direct short-name export of the Savitzky-Golay smoother (matches the
Python morie.signal.sgolay name). For the long-form, see
morie_sgolay_smooth(), which this function delegates to.
sgolay(x, window = 11L, polyorder = 3L)
x |
Numeric vector. |
window |
Window length (odd, default 11). |
polyorder |
Polynomial order (default 3). |
Reference: Savitzky, A. & Golay, M.J.E. (1964) "Smoothing and differentiation of data by simplified least-squares procedures", Anal. Chem. 36(8):1627–1639.
List with filtered, name, fs, n_samples, extra
(window, polyorder).
if (requireNamespace("signal", quietly = TRUE)) {
  set.seed(1)
  x <- sin(seq(0, 2 * pi, length.out = 200)) + 0.1 * rnorm(200)
  res <- sgolay(x)
  length(res$filtered)
}
Slightly less conservative than Bonferroni under independence.
Computes the Sidak adjustment directly (closed form); the
mutoss package offers an equivalent step-down variant via
mutoss::SidakSD.
sidak(p_values, alpha = 0.05, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
labels |
Optional character vector of test labels. |
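The single-step Sidak correction has two equivalent closed forms: an adjusted p-value, 1 - (1 - p)^m, or a per-test threshold, 1 - (1 - alpha)^(1/m). A minimal sketch follows; which of the two rmorie reports is not stated above, and `sidak_sketch` with its column names is hypothetical.

```r
# Hypothetical helper showing both closed forms side by side.
sidak_sketch <- function(p_values, alpha = 0.05) {
  m <- length(p_values)
  alpha_per_test <- 1 - (1 - alpha)^(1 / m)
  data.frame(p_raw          = p_values,
             p_adjusted     = 1 - (1 - p_values)^m,
             alpha_per_test = alpha_per_test,
             reject         = p_values <= alpha_per_test)
}

sk <- sidak_sketch(c(0.001, 0.02, 0.04))
sk
```

The adjusted p-values never exceed the Bonferroni values m * p, which is the sense in which the method is less conservative.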
Returns $h = 0.9 \, \min(\hat{\sigma}, \mathrm{IQR} / 1.34) \, n^{-1/5}$.
silverman_bandwidth(x)
x |
Numeric data vector. |
Bandwidth (numeric scalar).
Silverman, B. W. (1986), p. 48.
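The rule of thumb is a one-liner in base R. The sketch below is an illustration rather than rmorie's code (`silverman_sketch` is a hypothetical name); for non-degenerate data it agrees with `stats::bw.nrd0`, which implements the same rule.

```r
# Hypothetical helper: Silverman's rule-of-thumb bandwidth.
silverman_sketch <- function(x) {
  0.9 * min(stats::sd(x), stats::IQR(x) / 1.34) * length(x)^(-1 / 5)
}

set.seed(1)
x_bw <- rnorm(200)
h    <- silverman_sketch(x_bw)
c(sketch = h, bw_nrd0 = stats::bw.nrd0(x_bw))
```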
Simes test for the global null
simes_combined(p_values)
p_values |
Numeric vector of raw p-values. |
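The Simes global p-value is the minimum of m * p_(i) / i over the ordered p-values. A minimal sketch (`simes_sketch` is a hypothetical name, not the rmorie export):

```r
# Hypothetical helper: Simes combined p-value for the global null.
simes_sketch <- function(p_values) {
  m <- length(p_values)
  min(m * sort(p_values) / seq_len(m))
}

simes_p <- simes_sketch(c(0.01, 0.04, 0.03))
simes_p
```

Here the ordered terms are 0.03, 0.045 and 0.04, so the combined p-value is 0.03.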
Spearman rank correlation
spearman_correlation(x, y, confidence = 0.95)
x, y
|
Numeric vectors. |
confidence |
Confidence level. |
Estimates the treatment effect across many reasonable model
specifications to assess robustness. Combines covariate sets x
sample filters x model families. Cross-references specr
(specr::specr) as the canonical modern implementation with
built-in plotting; use specr directly when you want the
published specification-curve plot.
specification_curve(data, outcome, treatment, covariate_sets, sample_filters = NULL, model_types = NULL, alpha = 0.05)
data |
Analysis data.frame. |
outcome |
Outcome variable name. |
treatment |
Treatment variable name. |
covariate_sets |
List of character vectors (one per spec). |
sample_filters |
Optional. Accepted shapes (for Python<->R parity):
(a) |
model_types |
Character vector of model families:
|
alpha |
Significance level. Default 0.05. |
A morie_spec_curve named-list.
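The core loop of a specification curve, restricted to covariate sets and a single OLS family, might look like the sketch below. It uses the built-in `mtcars` data with `am` as a stand-in treatment and `mpg` as the outcome; sample filters and multiple model families are left out, and none of the names here come from rmorie.

```r
# Illustrative only: one lm() per covariate set, collecting the treatment row.
covariate_sets <- list(none = character(0), wt = "wt", wt_hp = c("wt", "hp"))

spec_rows <- lapply(names(covariate_sets), function(nm) {
  f   <- stats::reformulate(c("am", covariate_sets[[nm]]), response = "mpg")
  fit <- stats::lm(f, data = mtcars)
  est <- summary(fit)$coefficients["am", ]
  data.frame(spec = nm, estimate = est[["Estimate"]],
             se = est[["Std. Error"]], p_value = est[["Pr(>|t|)"]])
})

spec_curve <- do.call(rbind, spec_rows)
spec_curve <- spec_curve[order(spec_curve$estimate), ]  # sorted, as in a curve plot
spec_curve
```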
Standardises X and y to zero mean and unit variance before OLS via
stats::lm.
standardized_coefficients(X, y)
X |
Predictor matrix or data.frame (n x p). |
y |
Outcome vector. |
A data.frame with columns variable, beta, se, t, p_value.
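The standardise-then-fit recipe can be sketched in base R as below. The output columns mirror those listed above, but the code is an illustration on simulated data, not rmorie's implementation; all object names are hypothetical.

```r
# Illustrative only: z-score X and y, then fit OLS and keep the slope rows.
set.seed(1)
X_sc <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
y_sc <- 2 * X_sc$x1 - X_sc$x2 + rnorm(100)

Z    <- as.data.frame(scale(X_sc))       # zero mean, unit variance predictors
Z$zy <- as.numeric(scale(y_sc))          # standardised outcome
fit  <- stats::lm(zy ~ ., data = Z)
cf   <- summary(fit)$coefficients[-1, , drop = FALSE]  # drop the intercept row

std_coefs <- data.frame(variable = rownames(cf), beta = cf[, "Estimate"],
                        se = cf[, "Std. Error"], t = cf[, "t value"],
                        p_value = cf[, "Pr(>|t|)"], row.names = NULL)
std_coefs
```

Each beta equals the raw OLS slope multiplied by sd(x) / sd(y), which is a convenient check on any implementation.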
Execute a single command and return the resulting text
stat_bridge_exec(cmd_str)
cmd_str |
A whitespace-delimited command line, e.g.
|
Captured handler output as a single string.
Inspect a single command by name
stat_bridge_fn_info(name)
name |
Command name or alias. |
Multi-line description string or an explanatory error string.
Search the registry for matching commands
stat_bridge_fn_search(query, max_results = 20L)
query |
Free-text query; matched against names, categories, descriptions, and aliases. |
max_results |
Cap on the number of matches returned. |
Multi-line summary string.
Formatted text dump of the command registry
stat_bridge_help()
A length-1 character string.
Mirrors python -m morie.stat_bridge <mode> [...] so the same
invocation pattern is available via Rscript -e.
stat_bridge_main(args = NULL)
args |
Character vector of CLI arguments (mode + parameters).
When |
Recognised modes: "registry-json", "help",
"exec", "fn-info", "fn-search", "verify".
Invisibly returns the printed text; primarily called for side effects (printing to stdout).
JSON enumeration of all registered commands
stat_bridge_registry_json()
A length-1 character vector containing JSON text.
Calls every registered handler with no arguments inside
tryCatch, reporting which entries can be invoked safely.
Intended to be called from CI smoke tests.
stat_bridge_verify()
A data.frame with columns name, ok, message.
Construct a stat_command entry
stat_command(name, category, usage, description, handler_repl, handler_stat = NULL, aliases = character(0), module = "", is_compound = FALSE, is_r_bridge = FALSE)
name |
Canonical command name (character scalar). |
category |
Category label used for grouping. |
usage |
One-line usage string for help screens. |
description |
Short description. |
handler_repl |
Function implementing the command. |
handler_stat |
Optional terminal handler taking
|
aliases |
Character vector of additional names. |
module |
Source module string (informational). |
is_compound |
Logical; flags compound workflows. |
is_r_bridge |
Logical; flags Python <-> R bridge calls. |
A list with class morie_stat_command.
R port of the Python module morie.statistics. Every function
returns a named list (class "morie_test_result")
carrying the test statistic, p-value, degrees of freedom, confidence
interval, effect size, point estimate, sample size and a free-form
extra list, so downstream code can post-process programmatically.
Location: one_sample_ttest, two_sample_ttest,
welch_ttest, paired_ttest
ANOVA / non-parametric ANOVA: one_way_anova,
two_way_anova, repeated_measures_anova,
friedman_test (stats::kruskal.test for K-W)
Chi-squared family: chi2_goodness_of_fit,
chi2_independence, mcnemar_test, cochrans_q
Correlation: pearson_correlation,
spearman_correlation, kendall_correlation,
point_biserial_correlation, partial_correlation,
semi_partial_correlation
Non-parametric: mann_whitney_u,
wilcoxon_signed_rank, ks_test_one_sample,
ks_test_two_sample, levene_test,
bartlett_test, runs_test
(nortest::ad.test for Anderson-Darling)
Normality: dagostino_pearson, lilliefors_test
(stats::shapiro.test for Shapiro-Wilk,
tseries::jarque.bera.test for Jarque-Bera)
Proportions: one_proportion_ztest,
two_proportion_ztest, fisher_exact_test
Agreement: cohens_kappa, intraclass_correlation
(irr::kappam.fleiss for Fleiss' kappa)
Convenience: normality_suite,
variance_equality_suite, correlation_matrix,
auto_test
Estimates the proportion of true null hypotheses (pi0) and
tightens the BH thresholds by that factor. Delegates to
qvalue::qvalue (Bioconductor) when installed; otherwise
falls back to an inline Storey-style cutoff so the wrapper keeps
working on CRAN-only installs.
storey_q(p_values, alpha = 0.05, lambda_param = 0.5, labels = NULL)
p_values |
Numeric vector of raw p-values. |
alpha |
Significance level. |
lambda_param |
Tuning parameter in (0, 1) for the pi0 estimator. |
labels |
Optional character vector of test labels. |
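The pi0 estimate at a single lambda, and one simple way to use it, can be sketched as follows. Scaling BH-adjusted p-values by pi0 is a common Storey-style shortcut and is used here only as an illustration; it is an assumption that rmorie's inline fallback behaves similarly. `storey_pi0_sketch` is a hypothetical name.

```r
# Hypothetical helper: share of true nulls estimated from the p-values above lambda.
storey_pi0_sketch <- function(p_values, lambda_param = 0.5) {
  min(1, mean(p_values > lambda_param) / (1 - lambda_param))
}

set.seed(1)
p_mix  <- c(runif(80), rbeta(20, 0.2, 5))   # mostly nulls plus some signal
pi0    <- storey_pi0_sketch(p_mix)
bh     <- stats::p.adjust(p_mix, method = "BH")
q_like <- pmin(1, pi0 * bh)                 # BH values tightened by pi0
c(pi0 = pi0, n_bh = sum(bh <= 0.05), n_q = sum(q_like <= 0.05))
```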
Delegates to poolr::stouffer when installed and no weights
are supplied; otherwise computes the weighted z-sum inline.
stouffer_combined(p_values, weights = NULL)
p_values |
Numeric vector of raw p-values. |
weights |
Optional non-negative weights (any scale). |
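The weighted z-sum can be sketched as below, treating the inputs as one-sided p-values (an assumption; the entry above does not state the sidedness). `stouffer_sketch` is a hypothetical name.

```r
# Hypothetical helper: Stouffer's weighted z-sum combination.
stouffer_sketch <- function(p_values, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(p_values))
  z      <- stats::qnorm(p_values, lower.tail = FALSE)   # one-sided z-scores
  z_comb <- sum(weights * z) / sqrt(sum(weights^2))
  stats::pnorm(z_comb, lower.tail = FALSE)
}

st_p <- stouffer_sketch(c(0.05, 0.05))
st_p
```

Dividing by the root sum of squared weights is what makes the result invariant to the scale of the weights.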
Draws without replacement at a smaller sample size; valid under
weaker conditions than the bootstrap. No clean CRAN function
exposes the same (data, statistic, subsample_size, n_subsamples)
API; np's npsubsample is closest but is kernel-specific.
Kept as an in-house implementation and flagged
novel/no-clean-CRAN-equivalent.
subsampling(data, statistic, subsample_size = NULL, n_subsamples = 1000L, ci_level = 0.95, seed = 42L)
data |
Numeric vector or matrix. |
statistic |
Function returning a scalar. |
subsample_size |
Subsample size; default |
n_subsamples |
Number of subsamples. |
ci_level |
Confidence level. |
seed |
Random seed. |
A morie_bootstrap_result.
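A minimal subsampling interval for a scalar statistic is sketched below. It assumes a root-n convergence rate and requires the caller to choose the subsample size; rmorie's default size and interval construction are not given above, so this is an illustration only. `subsample_sketch` is a hypothetical name.

```r
# Hypothetical helper: subsampling CI under an assumed sqrt(n) rate.
subsample_sketch <- function(data, statistic, subsample_size,
                             n_subsamples = 1000L, ci_level = 0.95, seed = 42L) {
  set.seed(seed)
  n         <- length(data)
  theta_hat <- statistic(data)
  theta_b   <- replicate(n_subsamples,
                         statistic(sample(data, subsample_size, replace = FALSE)))
  # Root sqrt(b) * (theta_b - theta_hat) approximates sqrt(n) * (theta_hat - theta)
  root <- sqrt(subsample_size) * (theta_b - theta_hat)
  a    <- (1 - ci_level) / 2
  q    <- stats::quantile(root, c(1 - a, a), names = FALSE)
  list(estimate = theta_hat, ci = theta_hat - q / sqrt(n), replicates = theta_b)
}

set.seed(1)
ss <- subsample_sketch(rexp(400), mean, subsample_size = 50L)
ss$ci
```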
Canonical substance category mapping used across CSUS HealthInfobase data files. Maps short keys to human-readable labels and source filenames.
substance_categories
A data.frame with columns:
Short key (e.g., "alcohol", "cannabis")
Display label (e.g., "Alcohol", "Cannabis")
Filename in healthinfobase/CSUS/ directory
Canadian Substance Use Survey (CSUS) via Health Infobase Canada.
data(substance_categories)
substance_categories$label
Descriptive statistics for a set of variables
summary_statistics_table(data, variables = NULL, stats = c("n", "mean", "sd", "median", "min", "max", "missing"), digits = 2L, output_format = "dataframe", title = "Summary Statistics")
data |
Data frame. |
variables |
Variable names (auto-detect numeric if NULL). |
stats |
Vector of statistic names. |
digits |
Decimal places. |
output_format |
Output target. |
title |
Title. |
Table 1 (baseline characteristics) stratified by group
table1(data, group_col = NULL, continuous_vars = NULL, categorical_vars = NULL, continuous_summary = c("mean_sd", "median_iqr", "mean_ci"), show_p = TRUE, show_smd = TRUE, show_missing = TRUE, weights = NULL, digits = 2L, apa = FALSE, output_format = "dataframe", title = "Table 1. Baseline Characteristics")
data |
Data frame. |
group_col |
Column defining groups, or NULL. |
continuous_vars |
Continuous variable names (auto-detect numeric non-group columns if NULL). |
categorical_vars |
Categorical names (auto-detect character / factor / logical if NULL). |
continuous_summary |
"mean_sd", "median_iqr" or "mean_ci". |
show_p |
Include p-value column. |
show_smd |
Include SMD column (2 groups only). |
show_missing |
Include missing count. |
weights |
Column name for survey weights or NULL. |
digits |
Decimal places. |
apa |
APA-style p-value formatting. |
output_format |
"dataframe", "latex", "html", "markdown", "text", "csv". |
title |
Table title. |
R port of the Python module morie.tables_pub. Builds Table 1
(baseline characteristics), regression tables, odds-ratio and
hazard-ratio tables, correlation matrices, model comparison tables,
ANOVA tables, summary-statistics tables and treatment-effect tables.
Output rendering goes through knitr::kable for "latex", "html",
"markdown", "pipe" and "rst" formats; "dataframe" returns the raw
data.frame, and "text" returns utils::capture.output on
the frame. The gt package is supported as an optional
richer-output backend when installed (Suggests-gated).
Functions consume R-native model objects:
regression_table accepts lm, glm or any
object responding to coef, vcov and
confint.
odds_ratio_table accepts a fitted glm with
family = binomial().
hazard_ratio_table accepts a coxph fit, or
per-parameter beta, se and p vectors.
anova_table wraps stats::anova / car::Anova.
Train on earlier data, test on later data
temporal_validate(fit_fn, predict_fn, X, y, date_col, split_date = NULL, split_quantile = 0.7, scoring = "roc_auc")
fit_fn, predict_fn
|
As in |
X |
Data frame including |
y |
Target vector. |
date_col |
Name of date column in |
split_date |
Date to split on, or NULL. |
split_quantile |
Quantile of dates (if |
scoring |
Scoring metric. |
Delegates to poolr::tippett when installed; otherwise
computes the closed form inline.
tippett_combined(p_values)
p_values |
Numeric vector of raw p-values. |
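Tippett's method combines independent p-values through their minimum. A one-line sketch of the closed form (`tippett_sketch` is a hypothetical name):

```r
# Hypothetical helper: Tippett's minimum-p combination for m independent tests.
tippett_sketch <- function(p_values) 1 - (1 - min(p_values))^length(p_values)

tp <- tippett_sketch(c(0.01, 0.20, 0.50))
tp
```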
How much would unobserved outcomes need to differ from observed
ones for the treatment effect to become non-significant?
Phase 1.g cross-references tipr for the unmeasured-confounder
family of tipping-point calculations
(see also morie_sensitivity_tipping_point).
tipping_point_analysis(estimate, se, n_treated, n_control, delta_range = NULL, outcome_type = "continuous")
estimate |
Observed treatment effect. |
se |
Standard error of the estimate. |
n_treated |
Number of treated units. |
n_control |
Number of control units. |
delta_range |
Numeric vector of bias parameters (default
|
outcome_type |
|
A morie_tipping_point named-list.
R-side port of morie.tps_crime. Where the Python module
uses TPS_REGISTRY + load_tps_dataset to materialise
per-category data.frames, the R-side callables here accept a named
list of pre-loaded data.frames (one entry per TPS category) via the
dfs argument. Callers are responsible for loading the CSVs
(e.g. via utils::read.csv or readr::read_csv) and
passing them in keyed by canonical TPS category name (e.g.
"Assault", "Homicides", "BicycleTheft").
Callables:
morie_tps_yoy_panel(): side-by-side year-over-year
panel across TPS categories.
morie_tps_composite_index(): per-neighbourhood
composite crime-risk index (sum of z-standardised counts,
optionally weighted).
morie_tps_bivariate_morans_i(): bivariate Moran's I
between two TPS categories on a shared HOOD_158 footprint
using a k-NN row-standardised spatial weights matrix.
morie_tps_category_correlation_matrix(): Pearson r
on per-hood incident counts across all supplied categories.
Each callable returns a named list with class
c("morie_tps_result", "morie_rich_result", "list") carrying
title, summary_lines, tables (when applicable),
interpretation, warnings, and a free-form
payload.
R-side port of morie.tps_csi. The Crime Severity Index
(Wallace et al., 2009; Statistics Canada Catalogue 85-004-X)
weights each Criminal Code offence by the product of the
average sentence length (days) and the proportion of offenders
incarcerated, so that violent offences with high incarceration
rates and long sentences contribute disproportionately to a city's
per-capita CSI score.
This file exposes the weights used for the 9 Toronto Police Service open-data categories (Assault, Auto Theft, Bicycle Theft, Break and Enter, Homicide, Robbery, Shooting and Firearm Discharges, Theft from Motor Vehicle, Theft Over) and provides per-year + per-neighbourhood CSI aggregates.
TPS open-data categories aggregate over multiple Criminal Code sub-offences. The weights here are representative blends reflecting the typical distribution of sub-offences within each TPS category for FY2023; for an exact reproduction of Statistics Canada's CSI for the City of Toronto one must work directly from the CCJS UCR microdata, which is not in TPS open data.
Weights are pinned to the last published StatsCan methodology
update (Reweighting the Crime Severity Index, Catalogue
85-004-X) and the Toronto-specific override tables in the CCJS
Annual Statistics 2023. Newer revisions (StatsCan revises every
5 years) may shift values by 5-15%
ordering. Override via the weights argument.
Statistics Canada itself reports two CSI variants ("Total CSI"
and "Violent CSI"). Functions here default to Total but accept
variant = "violent" to use violent-only weights, where
non-violent categories (B&E, theft) are zeroed.
Wallace, M., Turner, J., Babyak, C., & Matarazzo, A. (2009). Measuring Crime in Canada: Introducing the Crime Severity Index and Improvements to the Uniform Crime Reporting Survey. Statistics Canada Catalogue 85-004-X.
Statistics Canada (2024). Crime Severity Index, Census Metropolitan Areas, 2023. Catalogue 35-10-0190-01.
R parity of morie.tps_spatial: Moran's I (global), LISA
(local Moran's Ii) for hot/cold spots, and 2-D kernel density
estimation of incident lat/long. Each function accepts a
data.frame of incident-level rows with a neighbourhood id
column plus WGS84 lat/long columns, and returns a named
list carrying numeric outputs alongside a multi-paragraph
interpretation so the result prints in a notebook without
further post-processing.
Spatial weights are built with an internal base-R k-nearest-
neighbours routine; if the optional FNN package is installed
it is used for the KNN graph. The 2-D kernel density estimator
prefers MASS::kde2d when available, otherwise falls
back to a Gaussian density evaluated at the observation points.
If spdep is installed, callers can delegate the global
Moran's I test to spdep::moran.test via the
use_spdep = TRUE switch.
morie_tps_morans_i_neighbourhood: global
Moran's I on neighbourhood-level incident counts.
morie_tps_local_morans_i: LISA (local
Moran's Ii) per neighbourhood with HH/LL/HL/LH quadrant
classification.
morie_tps_kde_density: 2-D kernel density
estimate of geocoded incident points.
R parity of morie.tps_spatial_advanced. Builds on
tps_spatial (global Moran's I, LISA, KDE) with:
morie_tps_ripley_k: Ripley's K function for
point-pattern clustering at multiple radii.
morie_tps_getis_ord_g_star: local
Getis-Ord Gi* hot/cold-spot z-scores.
morie_tps_dbscan_clusters: density-based
clusters on lat/long (via dbscan, optional).
morie_tps_polygon_morans_i: polygon-aware
Moran's I from an sf object's actual polygon centroids
(instead of the centroid-only k-NN approximation in
morie_tps_morans_i_neighbourhood).
morie_tps_bivariate_moran: bivariate Moran's I
between two attributes at the same polygons.
morie_tps_moran_sweep_heatmap: a (category x
year) sweep of polygon Moran's I.
Polygon functions accept either an sf object (gated with
requireNamespace("sf")) or a plain data.frame carrying
precomputed centroid columns. KNN graphs prefer FNN; DBSCAN
requires the optional dbscan package; spatial autocorrelation
tests can optionally be delegated to spdep.
Summary table of causal effect estimates from multiple estimators
treatment_effect_table(estimators, digits = 3L, output_format = "dataframe", title = "Treatment Effect Estimates")
estimators |
Named list of lists; each inner list provides
numeric |
digits |
Decimal places. |
output_format |
Output target. |
title |
Title. |
Two-sample z-test for the difference in proportions
two_proportion_ztest(count1, nobs1, count2, nobs2, confidence = 0.95)
count1, nobs1
|
First sample. |
count2, nobs2
|
Second sample. |
confidence |
Confidence level. |
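A base-R sketch of the two-sample proportion z-test follows. It uses the pooled standard error for the test statistic and the unpooled standard error for the interval, which is the common textbook convention and an assumption about rmorie's behaviour. `two_prop_z_sketch` is a hypothetical name.

```r
# Hypothetical helper: pooled z-test with an unpooled Wald interval.
two_prop_z_sketch <- function(count1, nobs1, count2, nobs2, confidence = 0.95) {
  p1     <- count1 / nobs1
  p2     <- count2 / nobs2
  p_pool <- (count1 + count2) / (nobs1 + nobs2)
  z      <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / nobs1 + 1 / nobs2))
  se_un  <- sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
  half   <- stats::qnorm(1 - (1 - confidence) / 2) * se_un
  list(z = z, p_value = 2 * stats::pnorm(-abs(z)),
       ci = c(p1 - p2 - half, p1 - p2 + half))
}

zt <- two_prop_z_sketch(45, 100, 30, 100)
zt$z
```

The squared z statistic matches the chi-squared statistic from `stats::prop.test(..., correct = FALSE)`.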
Independent two-sample t-test (equal or unequal variance)
two_sample_ttest(x, y, equal_var = TRUE, confidence = 0.95)
x, y
|
Numeric vectors. |
equal_var |
If FALSE, use Welch's correction. |
confidence |
Confidence level. |
Uses base R aov, then drop1 for Type-II sums of squares.
two_way_anova(data, outcome, factor_a, factor_b)
data |
A data frame. |
outcome |
Name of dependent-variable column. |
factor_a, factor_b
|
Names of factor columns. |
Validate a data frame against a list of column rules
validate_schema(data, rules, raise_on_error = FALSE)
data |
A data frame. |
rules |
List of |
raise_on_error |
If TRUE, throw on first error. |
R port of the Python module morie.validation: schema validation,
data quality scoring, cross-validation, calibration / discrimination /
decision-curve analysis, overfitting detection, temporal / external
validation, and reproducibility manifests.
Most callables return a named list (class
"morie_validation_result") so the R caller does not need
S4 or R6. Model-fitting routines accept a user-supplied
fit_fn of signature function(X, y) -> model and a
predict_fn of signature function(model, X) -> prob; this
keeps the port framework-agnostic (works with glm, glmnet,
randomForest, xgboost, etc.).
Phase 3DDD1. Five small fixtures harvested live from
opendata.vancouver.ca for offline reproducibility – chosen to
surface neighbourhood-level civic context useful in carceral /
policing analysis even though VPD itself publishes crime data
separately (see morie_datasets_vpd_crime()).
Phase 3EEE3. Bundled 27-row snapshot of City-run community centres. Useful as an "anchor institutions" overlay for analyses of neighbourhood-level crime + social-service access.
Phase 3EEE3. Bundled 91-row snapshot of community + farmers markets across Vancouver. Useful for food-access / quality-of-life overlays.
Phase 3EEE3. Bundled 100-row sample of designated disability parking locations across Vancouver (out of 159 total).
Phase 3EEE3. Bundled 100-row sample of Vancouver's public art registry (out of 747 total) – artist, install year, neighbourhood, primary material. Useful as a CPTED-style "place-making" overlay variable.
morie_datasets_vancouver_graffiti(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_noise_control_areas(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_homeless_shelters(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_property_use_inspection_districts(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_fire_halls(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_community_centres(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_community_food_markets(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_disability_parking(offline = TRUE, max_features = NULL)
morie_datasets_vancouver_public_art(offline = TRUE, max_features = NULL)
offline |
If |
max_features |
Optional row cap. |
| Loader | Dataset slug | Rows |
|---|---|---|
| morie_datasets_vancouver_graffiti() | graffiti | 100 (of 7683) |
| morie_datasets_vancouver_noise_control_areas() | noise-control-areas | 3 |
| morie_datasets_vancouver_homeless_shelters() | homeless-shelter-locations | 17 |
| morie_datasets_vancouver_property_use_inspection_districts() | property-use-inspection-districts | 23 |
| morie_datasets_vancouver_fire_halls() | fire-halls | 20 |
All loaders accept the same offline = TRUE (default) /
max_features interface as the other morie dataset wrappers.
Vargha-Delaney A statistic
vargha_delaney_a(x, y, confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
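The point estimate of the Vargha-Delaney A statistic, P(X > Y) + 0.5 P(X = Y), can be computed from the rank sum of the first sample. A minimal sketch without the confidence interval (`vd_a_sketch` is a hypothetical name):

```r
# Hypothetical helper: Vargha-Delaney A from the rank sum of x.
vd_a_sketch <- function(x, y) {
  x  <- x[!is.na(x)]
  y  <- y[!is.na(y)]
  r1 <- sum(rank(c(x, y))[seq_along(x)])    # rank sum of the x sample (mid-ranks for ties)
  (r1 / length(x) - (length(x) + 1) / 2) / length(y)
}

c(all_lower  = vd_a_sketch(1:3, 4:6),
  all_higher = vd_a_sketch(4:6, 1:3),
  identical  = vd_a_sketch(c(1, 2), c(1, 2)))
```

The three cases return 0, 1 and 0.5 respectively: complete separation in either direction, and no stochastic difference.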
Classifies every column in OTIS / ARSAU datasets by Stevens-1946 level of measurement (nominal / ordinal / interval / ratio + the practical extensions boolean / date / datetime / identifier / free-text), cardinality, functional role (identifier / outcome / covariate / weight / metadata), and cross-year safety.
Drives a method dispatcher (morie_recommended_summary,
morie_recommended_pair_test) that picks the right
statistical analysis per variable based on its measurement level.
Hard-coded invariant overrides (the data dictionary itself states these, but we encode them in code so analyses cannot accidentally violate them):
OTIS UniqueIndividual_ID: random per-fiscal-year
reassignment -> cross_year_safe = FALSE,
role = "identifier". Cross-year joins on this
column are statistically meaningless.
ARSAU BatchFileName / Indiv_Index: per-
incident identifiers -> role = "identifier".
ARSAU IndivInjuries_PhysicalInjuries: boolean
injury outcome -> role = "outcome".
Stevens, S.S. (1946) "On the theory of scales of measurement." Science, 103(2684), 677-680.
Velleman, P.F. and Wilkinson, L. (1993) "Nominal, ordinal, interval, and ratio typologies are misleading." The American Statistician, 47(1), 65-72.
Run a suite of homogeneity-of-variance tests
variance_equality_suite(...)
... |
Two or more numeric vectors. |
Variance ratio (F-test for equality of variances)
variance_ratio(x, y, confidence = 0.95)
x, y
|
Numeric vectors (NA dropped). |
confidence |
Confidence level for CI. Default 0.95. |
A morie_effect_size.
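The variance ratio and its F-based interval can be sketched in a few lines of base R; the result agrees with `stats::var.test`. This is an illustration, not rmorie's code, and `variance_ratio_sketch` is a hypothetical name.

```r
# Hypothetical helper: variance ratio with an F-distribution interval.
variance_ratio_sketch <- function(x, y, confidence = 0.95) {
  x     <- x[!is.na(x)]
  y     <- y[!is.na(y)]
  ratio <- stats::var(x) / stats::var(y)
  df1   <- length(x) - 1
  df2   <- length(y) - 1
  a     <- (1 - confidence) / 2
  c(estimate = ratio,
    lower    = ratio / stats::qf(1 - a, df1, df2),
    upper    = ratio / stats::qf(a, df1, df2))
}

set.seed(1)
vx <- rnorm(30, sd = 2)
vy <- rnorm(40)
vr <- variance_ratio_sketch(vx, vy)
vr
```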
Tests H0: R %*% beta = r.
wald_test(estimates, vcov, R = NULL, r = NULL)
estimates |
Parameter estimates. |
vcov |
Variance-covariance matrix. |
R |
Optional restriction matrix (default identity). |
r |
Optional restriction vector (default zeros). |
A morie_specification_test.
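The Wald statistic for H0: R %*% beta = r is the quadratic form (R b - r)' (R V R')^{-1} (R b - r), referred to a chi-squared distribution with one degree of freedom per restriction. A minimal sketch using the defaults described above (identity R, zero r); `wald_sketch` is a hypothetical name.

```r
# Hypothetical helper: chi-squared Wald test of linear restrictions.
wald_sketch <- function(estimates, vcov, R = NULL, r = NULL) {
  b <- as.numeric(estimates)
  if (is.null(R)) R <- diag(length(b))     # default: test every coefficient
  if (is.null(r)) r <- rep(0, nrow(R))     # default: against zero
  gap <- R %*% b - r
  w   <- drop(t(gap) %*% solve(R %*% vcov %*% t(R), gap))
  list(statistic = w, df = nrow(R),
       p_value = stats::pchisq(w, df = nrow(R), lower.tail = FALSE))
}

wd <- wald_sketch(c(1, 2), diag(c(0.25, 1)))
wd$statistic
```

With a diagonal covariance matrix the statistic is just the sum of squared z-scores, here 1 / 0.25 + 4 / 1 = 8.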
Welch's averaged periodogram PSD: split into segments, window (Hanning),
periodogram each, average. Delegates to oce::pwelch() if available,
otherwise computes a Hanning-windowed, 50%-overlap implementation in
base R.
welch(x, fs, nperseg = 256L)
x |
Numeric vector (1-D signal). |
fs |
Sampling frequency in Hz. |
nperseg |
Segment length (default 256). |
Reference: Welch, P.D. (1967) "The use of fast Fourier transform for the estimation of power spectra", IEEE Trans. Audio Electroacoust. AU-15(2):70–73.
List with filtered (PSD), name, fs, n_samples, and
extra (freqs).
set.seed(1)
t <- seq(0, 1, length.out = 1024)
x <- sin(2 * pi * 50 * t) + 0.3 * rnorm(length(t))
res <- welch(x, fs = 1024)
length(res$filtered)
Welch's t-test (convenience wrapper)
welch_ttest(x, y, confidence = 0.95)
x, y
|
Numeric vectors. |
confidence |
Confidence level. |
Wilcoxon signed-rank test (one-sample or paired)
wilcoxon_signed_rank(x, y = NULL, alternative = "two.sided")
x |
Numeric vector. |
y |
Optional paired vector. |
alternative |
One of "two.sided", "less", "greater". |
Multiplies the residuals by random weights (Rademacher or Mammen)
and refits OLS. sandwich::vcovBS implements the standard
wild bootstrap variance-covariance and
fwildclusterboot::boottest adds cluster-wild
p-values; both are cross-referenced here. The inline
implementation is retained because rmorie's API returns the
resampled coefficient distribution (not just a vcov), which is
what downstream MRM analyses consume.
wild_bootstrap(y, X, statistic_idx = 2L, n_boot = 999L, ci_level = 0.95, weight_distribution = "rademacher", seed = 42L)
y |
Numeric response vector. |
X |
Numeric design matrix (include an intercept column). |
statistic_idx |
Column index of the coefficient of interest (1-based). |
n_boot |
Number of replicates. |
ci_level |
Confidence level. |
weight_distribution |
|
seed |
Random seed. |
A morie_bootstrap_result.
sandwich::vcovBS, fwildclusterboot::boottest,
morie_did_wild_cluster_bootstrap().
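The resample-and-refit loop described for the wild bootstrap can be sketched in base R as below, using Rademacher weights and a percentile interval. The Mammen weights, the interval type rmorie uses, and the shape of its return object are not reproduced; `wild_boot_sketch` is a hypothetical name.

```r
# Hypothetical helper: residual wild bootstrap for one OLS coefficient.
wild_boot_sketch <- function(y, X, statistic_idx = 2L, n_boot = 999L,
                             ci_level = 0.95, seed = 42L) {
  set.seed(seed)
  fit  <- stats::lm.fit(X, y)
  boot <- replicate(n_boot, {
    v      <- sample(c(-1, 1), length(y), replace = TRUE)   # Rademacher weights
    y_star <- fit$fitted.values + fit$residuals * v         # perturbed response
    stats::lm.fit(X, y_star)$coefficients[statistic_idx]    # refit OLS
  })
  a <- (1 - ci_level) / 2
  list(estimate   = unname(fit$coefficients[statistic_idx]),
       replicates = unname(boot),
       ci         = stats::quantile(boot, c(a, 1 - a), names = FALSE))
}

set.seed(1)
x_wb <- rnorm(200)
y_wb <- 1 + 2 * x_wb + rnorm(200) * (1 + abs(x_wb))   # heteroskedastic errors
wb   <- wild_boot_sketch(y_wb, cbind(1, x_wb))
wb$ci
```

Keeping the full vector of replicates, rather than only a variance estimate, is what lets downstream code build its own intervals or tests from the resampled coefficients.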