OMOPy¶

Pythonic, type-safe interface for OMOP CDM databases.

OMOPy is a single Python package that reimplements the DARWIN-EU R package ecosystem for working with OMOP Common Data Model databases. It provides lazy database access via Ibis, type-safe data structures via Pydantic and Polars, and a clean Pythonic API with full type hints.

Modules¶

Module	R Equivalent	Description
`omopy.generics`	omopgenerics	Core type system: CDM references, tables, codelists, summarised results
`omopy.connector`	CDMConnector	Database connections, CDM loading, cohort generation, CIRCE engine
`omopy.profiles`	PatientProfiles	Patient-level enrichment: demographics, intersections, death
`omopy.codelist`	CodelistGenerator	Vocabulary search, hierarchy traversal, codelist operations
`omopy.vis`	visOmopResults	Format, tabulate, and plot summarised results
`omopy.characteristics`	CohortCharacteristics	Cohort characterization: summarise, tabulate, plot
`omopy.incidence`	IncidencePrevalence	Incidence rates and prevalence proportions
`omopy.drug`	DrugUtilisation	Drug cohort generation, utilisation metrics, dose analysis
`omopy.survival`	CohortSurvival	Kaplan-Meier survival, competing risks, survival plots
`omopy.treatment`	TreatmentPatterns	Treatment pathway analysis, Sankey and sunburst plots
`omopy.drug_diagnostics`	DrugExposureDiagnostics	Drug exposure quality checks and diagnostics
`omopy.pregnancy`	PregnancyIdentifier	Pregnancy episode identification (HIPPS algorithm)
`omopy.testing`	TestGenerator	Test data generation for OMOP CDM studies

Quick Example¶

from omopy.connector import cdm_from_con, generate_concept_cohort_set
from omopy.generics import Codelist

# Connect to a DuckDB OMOP CDM database
cdm = cdm_from_con("path/to/omop.duckdb", cdm_schema="cdm")

# Define a concept-based cohort
codelist = Codelist({"hypertension": [320128]})
cdm = generate_concept_cohort_set(cdm, codelist, name="hypertension_cohort")

# Enrich with demographics
from omopy.profiles import add_demographics
result = add_demographics(cdm["hypertension_cohort"], cdm)

# Collect to a Polars DataFrame
df = result.collect()
print(df)

Design Principles¶

Single package — one pip install omopy replaces 17 R packages
Lazy by default — Ibis constructs SQL queries; nothing executes until you call .collect()
Type-safe — Pydantic models with frozen immutability; full type annotations throughout
Pythonic — snake_case, context managers, keyword arguments, no R idioms
Database-agnostic — DuckDB, PostgreSQL, SQL Server, Snowflake, BigQuery, and more via Ibis backends

Requirements¶

Python >= 3.14
A database with OMOP CDM v5.3 or v5.4 tables

Status¶

Phase	Module	Status
Phase 0	`omopy.generics`	Complete (236 tests)
Phase 1+2	`omopy.connector`	Complete (292 tests)
Phase 3A	`omopy.profiles`	Complete (122 tests)
Phase 3B	`omopy.codelist`	Complete (122 tests)
Phase 3C	`omopy.vis`	Complete (115 tests)
Phase 4A	`omopy.characteristics`	Complete (73 tests)
Phase 4B	`omopy.incidence`	Complete (86 tests)
Phase 5A	`omopy.drug`	Complete (101 tests)
Phase 5B	`omopy.survival`	Complete (80 tests)
Phase 6A	`omopy.treatment`	Complete (127 tests)
Phase 6B	`omopy.drug_diagnostics`	Complete (80 tests)
Phase 7A	`omopy.pregnancy`	Complete (122 tests)
Phase 8A	`omopy.testing`	Complete (63 tests)

Total: 1619 tests, all passing.