OMOPy

Pythonic, type-safe interface for OMOP CDM databases.

OMOPy is a single Python package that reimplements the DARWIN-EU R package ecosystem for working with OMOP Common Data Model databases. It provides lazy database access via Ibis, type-safe data structures via Pydantic and Polars, and a clean Pythonic API with full type hints.

Modules

| Module | R equivalent | Description |
|---|---|---|
| omopy.generics | omopgenerics | Core type system: CDM references, tables, codelists, summarised results |
| omopy.connector | CDMConnector | Database connections, CDM loading, cohort generation, CIRCE engine |
| omopy.profiles | PatientProfiles | Patient-level enrichment: demographics, intersections, death |
| omopy.codelist | CodelistGenerator | Vocabulary search, hierarchy traversal, codelist operations |
| omopy.vis | visOmopResults | Format, tabulate, and plot summarised results |
| omopy.characteristics | CohortCharacteristics | Cohort characterization: summarise, tabulate, plot |
| omopy.incidence | IncidencePrevalence | Incidence rates and prevalence proportions |
| omopy.drug | DrugUtilisation | Drug cohort generation, utilisation metrics, dose analysis |
| omopy.survival | CohortSurvival | Kaplan-Meier survival, competing risks, survival plots |
| omopy.treatment | TreatmentPatterns | Treatment pathway analysis, Sankey and sunburst plots |
| omopy.drug_diagnostics | DrugExposureDiagnostics | Drug exposure quality checks and diagnostics |
| omopy.pregnancy | PregnancyIdentifier | Pregnancy episode identification (HIPPS algorithm) |
| omopy.testing | TestGenerator | Test data generation for OMOP CDM studies |
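
The R packages in the table use camelCase function names (for example `cdmFromCon` and `generateConceptCohortSet` in CDMConnector, `addDemographics` in PatientProfiles), whereas the Quick Example uses `cdm_from_con`, `generate_concept_cohort_set` and `add_demographics`. When porting an R script, a naive converter such as the hypothetical `r_to_python_name` below gives a first guess at the Python name. That the whole OMOPy API follows this pattern is an assumption; confirm each name against the module's own documentation.

```python
import re


def r_to_python_name(r_name: str) -> str:
    """Convert a camelCase R function name to snake_case (hypothetical helper).

    Naive rule: insert an underscore before every uppercase letter that is
    not at the start of the name, then lowercase everything. Runs of capitals
    (acronyms) are split letter by letter, so check those cases by hand.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", r_name).lower()


for name in ["cdmFromCon", "generateConceptCohortSet", "addDemographics"]:
    print(f"{name} -> {r_to_python_name(name)}")
```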

Quick Example

from omopy.connector import cdm_from_con, generate_concept_cohort_set
from omopy.generics import Codelist

# Connect to a DuckDB OMOP CDM database
cdm = cdm_from_con("path/to/omop.duckdb", cdm_schema="cdm")

# Define a concept-based cohort
codelist = Codelist({"hypertension": [320128]})
cdm = generate_concept_cohort_set(cdm, codelist, name="hypertension_cohort")

# Enrich with demographics
from omopy.profiles import add_demographics
result = add_demographics(cdm["hypertension_cohort"], cdm)

# Collect to a Polars DataFrame
df = result.collect()
print(df)
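
To make the two steps above concrete, here is a deliberately simplified, pure-Python illustration of what a concept-based cohort and an age column conceptually contain. It does not use OMOPy: the dataclasses, `concept_cohort` and `age_on` are hypothetical, the rows are invented toy data, and keeping only each person's earliest matching record is an assumption made for brevity. The real `generate_concept_cohort_set` runs in the database through Ibis, and its exact entry and exit rules (and the full set of columns `add_demographics` adds) are not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:  # hypothetical stand-in for a row of the OMOP person table
    person_id: int
    birth_date: date


@dataclass(frozen=True)
class ConditionOccurrence:  # hypothetical stand-in for a condition_occurrence row
    person_id: int
    condition_concept_id: int
    condition_start_date: date


@dataclass(frozen=True)
class CohortEntry:
    cohort_definition_id: int
    subject_id: int
    cohort_start_date: date
    age: int  # whole years at cohort start, i.e. the "demographics" enrichment


def age_on(birth: date, on: date) -> int:
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def concept_cohort(codelist: dict[str, list[int]], persons, conditions) -> list[CohortEntry]:
    births = {p.person_id: p.birth_date for p in persons}
    entries = []
    # Assumption: cohort ids are assigned 1, 2, ... in sorted codelist-name order.
    for cohort_id, (_name, concept_ids) in enumerate(sorted(codelist.items()), start=1):
        wanted = set(concept_ids)
        first: dict[int, date] = {}
        for c in conditions:
            # Simplification: keep only each person's earliest matching record.
            if c.condition_concept_id in wanted:
                prev = first.get(c.person_id)
                if prev is None or c.condition_start_date < prev:
                    first[c.person_id] = c.condition_start_date
        for pid, start in sorted(first.items()):
            entries.append(CohortEntry(cohort_id, pid, start, age_on(births[pid], start)))
    return entries


persons = [Person(1, date(1960, 5, 20)), Person(2, date(1975, 1, 1))]
conditions = [
    ConditionOccurrence(1, 320128, date(2020, 3, 1)),
    ConditionOccurrence(1, 320128, date(2018, 7, 4)),
    ConditionOccurrence(2, 201826, date(2019, 1, 1)),  # a concept not in the codelist
]
for entry in concept_cohort({"hypertension": [320128]}, persons, conditions):
    print(entry)  # one entry: subject 1, start 2018-07-04, age 58
```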

Design Principles

  • Single package — one pip install omopy replaces 17 R packages
  • Lazy by default — Ibis constructs SQL queries; nothing executes until you call .collect()
  • Type-safe — Pydantic models with frozen immutability; full type annotations throughout
  • Pythonic — snake_case, context managers, keyword arguments, no R idioms
  • Database-agnostic — DuckDB, PostgreSQL, SQL Server, Snowflake, BigQuery, and more via Ibis backends
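
The "lazy by default" and "frozen immutability" principles can be illustrated with a toy, pure-Python analogue. This is not the Ibis or Pydantic API: `LazyTable` is hypothetical, plain functions stand in for the SQL that Ibis would compile, and a frozen standard-library dataclass stands in for a frozen Pydantic model. Each method returns a new object that records one more step, and the data source is only read when `collect()` is called.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class LazyTable:
    """Toy deferred table: records steps and runs them only in collect()."""

    source: Callable[[], list[Row]]
    steps: tuple[Callable[[list[Row]], list[Row]], ...] = ()

    def filter(self, predicate: Callable[[Row], bool]) -> LazyTable:
        # Returns a NEW object with one more step; self is never mutated.
        return replace(self, steps=self.steps + (lambda rows: [r for r in rows if predicate(r)],))

    def select(self, *cols: str) -> LazyTable:
        return replace(self, steps=self.steps + (lambda rows: [{c: r[c] for c in cols} for r in rows],))

    def collect(self) -> list[Row]:
        rows = self.source()  # the "query" only executes here
        for step in self.steps:
            rows = step(rows)
        return rows


calls: list[str] = []


def load() -> list[Row]:
    calls.append("executed")  # records every time the source is actually read
    return [{"person_id": 1, "year_of_birth": 1960}, {"person_id": 2, "year_of_birth": 1990}]


base = LazyTable(load)
query = base.filter(lambda r: r["year_of_birth"] < 1980).select("person_id")
print(calls)            # [] : nothing has run yet
print(query.collect())  # [{'person_id': 1}]
print(calls)            # ['executed']
print(base.steps)       # () : the original object is unchanged
```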

Requirements

  • Python >= 3.14
  • A database with OMOP CDM v5.3 or v5.4 tables
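
Before pointing OMOPy at a database, it can help to confirm that the CDM tables are actually present. The sketch below uses the standard-library sqlite3 module only so that it runs without extra dependencies; it is not a claim that OMOPy supports SQLite, and SQLite has no schemas, so the cdm_schema="cdm" argument from the Quick Example has no counterpart here. `CORE_TABLES` is an assumed, deliberately incomplete subset of the OMOP CDM tables, and `missing_cdm_tables` is hypothetical. On DuckDB or PostgreSQL the same check would instead query information_schema.tables filtered by schema.

```python
import sqlite3

# Assumption: a small subset of core OMOP CDM table names, not the full v5.3/v5.4 list.
CORE_TABLES = {
    "person",
    "observation_period",
    "condition_occurrence",
    "drug_exposure",
    "concept",
    "cdm_source",
}


def missing_cdm_tables(con: sqlite3.Connection) -> set[str]:
    """Return the names in CORE_TABLES that the database does not contain."""
    rows = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    present = {name.lower() for (name,) in rows}  # compare case-insensitively
    return CORE_TABLES - present


# Toy in-memory database that only has two of the expected tables.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY)")
missing = missing_cdm_tables(con)
print(sorted(missing))
# ['cdm_source', 'condition_occurrence', 'drug_exposure', 'observation_period']
```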

Status

| Phase | Module | Status |
|---|---|---|
| Phase 0 | omopy.generics | Complete (236 tests) |
| Phase 1+2 | omopy.connector | Complete (292 tests) |
| Phase 3A | omopy.profiles | Complete (122 tests) |
| Phase 3B | omopy.codelist | Complete (122 tests) |
| Phase 3C | omopy.vis | Complete (115 tests) |
| Phase 4A | omopy.characteristics | Complete (73 tests) |
| Phase 4B | omopy.incidence | Complete (86 tests) |
| Phase 5A | omopy.drug | Complete (101 tests) |
| Phase 5B | omopy.survival | Complete (80 tests) |
| Phase 6A | omopy.treatment | Complete (127 tests) |
| Phase 6B | omopy.drug_diagnostics | Complete (80 tests) |
| Phase 7A | omopy.pregnancy | Complete (122 tests) |
| Phase 8A | omopy.testing | Complete (63 tests) |

Total: 1619 tests, all passing.