---
title: "Getting started with MORIE in R"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Getting started with MORIE in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = requireNamespace("rmorie", quietly = TRUE)
)
```

# Overview

MORIE is a multi-domain scientific-computing toolkit with parallel Python and R packages. The R package mirrors a substantial subset of the Python package, focused on the surfaces that are most useful from within an R workflow: dataset loading, causal estimators, survey sampling and weighting, basic spectral analysis, and helpers for the MRM (McNamara--Ruhela--Medina) framework that is MORIE's primary sociolegal-data application.

This vignette walks through a minimal end-to-end session: load the package, look at the bundled dataset catalogue, load one dataset, and run an average-treatment-effect estimator on a small synthetic example. A second vignette (`mrm-otis-walkthrough`) covers the MRM ten-estimator ensemble on OTIS provincial data.

# Loading the package

```{r setup}
library(rmorie)
```

# The dataset catalogue

`morie_dataset_catalog()` returns a data frame summarising every dataset bundled with the package or accessible via the package's loaders. This is the easiest way to discover what's available without leaving the R session.

```{r catalog, eval = FALSE}
catalog <- morie_dataset_catalog()
head(catalog)
```

For details on a single dataset (variables, source, citation), use `morie_dataset_info()`:

```{r dataset-info, eval = FALSE}
morie_dataset_info("cpads-2122")
```

# Loading a dataset

`morie_load_dataset()` returns a tibble (or data frame) for any dataset in the catalogue.
Public-use datasets that ship inside the package require no further configuration; for datasets backed by remote SQLite mirrors, configure `MORIE_LOCAL_DB_DIR` (local directory of `.sqlite` files) or `MORIE_REMOTE_URL` (HTTP endpoint).

```{r load, eval = FALSE}
df <- morie_load_dataset("cpads-2122")
dim(df)
```

# A simple ATE estimate

For users who already have a treatment / outcome / covariate dataset in hand, the estimators are designed to work on any tibble or data frame; there is no hard-coded column-name convention. The example below is fully synthetic and runnable without any external data.

```{r ate-synth}
set.seed(2026)
n <- 500
X1 <- rnorm(n)
X2 <- rnorm(n)
# Confounded treatment assignment.
treat <- as.integer(plogis(0.5 * X1 - 0.3 * X2) > runif(n))
# Outcome with a true ATE of +1.0 plus covariate effects.
y <- 1.0 * treat + 0.7 * X1 - 0.2 * X2 + rnorm(n, sd = 0.5)
df_synth <- data.frame(y = y, treat = treat, X1 = X1, X2 = X2)

result <- morie_estimate_ate(
  data = df_synth,
  outcome = "y",
  treatment = "treat",
  covariates = c("X1", "X2")
)
print(result)
```

The returned object is a list with the point estimate, standard error, confidence interval, and the underlying nuisance fits, in the `RichResult`-compatible structure described in the Python package paper.

# Companion estimators

`morie_estimate_att()`, `morie_estimate_atc()`, and `morie_estimate_aipw()` follow the same calling convention. The augmented IPW estimator (`morie_estimate_aipw()`) is doubly robust: it remains consistent when either the propensity model or the outcome model is correctly specified.

```{r aipw-synth, eval = FALSE}
result_aipw <- morie_estimate_aipw(
  data = df_synth,
  outcome = "y",
  treatment = "treat",
  covariates = c("X1", "X2")
)
print(result_aipw)
```

# Where to go next

- The `mrm-otis-walkthrough` vignette demonstrates the ten-estimator MRM ensemble on Ontario OTIS provincial restrictive-confinement microdata.
- The MORIE package paper describes the wider scope of the toolkit beyond R: signal processing, cryptography, spatial statistics, statistical-physics-of-crime models, psychometrics, and the full Python interface.
- Citation: see `citation("rmorie")`.
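
# A base-R cross-check

The synthetic ATE example in this vignette can be sanity-checked without `rmorie` at all. The sketch below regenerates the same data and compares a naive difference in means with a regression-adjusted estimate from `lm()`. This is ordinary regression adjustment from the `stats` package, shown only as an independent check; it is not a description of how `morie_estimate_ate()` works internally. The variable names `naive`, `adjusted`, and `ci` are introduced here for illustration.

```r
# Base-R cross-check of the synthetic example; uses only the stats package.
set.seed(2026)
n  <- 500
X1 <- rnorm(n)
X2 <- rnorm(n)

# Same confounded treatment assignment and outcome model as the vignette
# example: the true ATE is +1.0.
treat <- as.integer(plogis(0.5 * X1 - 0.3 * X2) > runif(n))
y     <- 1.0 * treat + 0.7 * X1 - 0.2 * X2 + rnorm(n, sd = 0.5)
df_synth <- data.frame(y = y, treat = treat, X1 = X1, X2 = X2)

# Naive difference in means: ignores confounding by X1 and X2.
naive <- mean(df_synth$y[df_synth$treat == 1]) -
  mean(df_synth$y[df_synth$treat == 0])

# Regression adjustment: the coefficient on `treat` estimates the ATE
# when the outcome model is linear and the effect is homogeneous,
# which holds here by construction.
fit      <- lm(y ~ treat + X1 + X2, data = df_synth)
adjusted <- unname(coef(fit)["treat"])
ci       <- confint(fit, "treat", level = 0.95)

cat("naive difference in means:", round(naive, 3), "\n")
cat("regression-adjusted ATE:  ", round(adjusted, 3), "\n")
cat("95% CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
```

Because treated units tend to have higher `X1` and lower `X2`, both of which raise `y`, the naive contrast should overstate the effect, while the adjusted estimate should land close to the true value of 1.0. An estimate from `morie_estimate_ate()` that differs sharply from the adjusted figure on this data would be worth investigating.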