Cohort Survival

The omopy.survival module provides survival analysis functions for OMOP CDM cohorts. It is the Python equivalent of the R CohortSurvival package.

Overview

The module has four layers:

1. Add survival data: enrich a cohort with time-to-event and status columns
2. Estimate: compute Kaplan-Meier or competing-risk survival curves
3. Table: format results as publication-ready tables (via omopy.vis)
4. Plot: visualize survival curves as plotly figures

Step 1: Set Up Cohorts

Survival analysis requires a target cohort (the exposed/index population) and an outcome cohort (the event of interest). These are standard OMOP cohort tables in your CDM:

from omopy.connector import cdm_from_con, generate_concept_cohort_set
from omopy.generics import Codelist

cdm = cdm_from_con("path/to/omop.duckdb", cdm_schema="cdm")

# Define target and outcome cohorts
cdm = generate_concept_cohort_set(
    cdm,
    Codelist({"diabetes": [201826]}),
    name="target",
)
cdm = generate_concept_cohort_set(
    cdm,
    Codelist({"stroke": [439847]}),
    name="outcome",
)

Step 2: Add Survival Columns

The add_cohort_survival() function enriches each row of a cohort with time (days to event or censoring) and status (1 = event, 0 = censored):

from omopy.survival import add_cohort_survival

cohort = add_cohort_survival(
    cdm["target"],
    cdm,
    outcome_cohort_table="outcome",
    outcome_cohort_id=1,
    outcome_washout=180,       # Exclude persons with prior event in 180 days
    censor_on_cohort_exit=True,
    follow_up_days=365,        # Cap follow-up at 1 year
)

print(cohort.collect())
# subject_id | cohort_start_date | cohort_end_date | time | status
# 1          | 2020-01-15        | 2021-01-14      | 87   | 1
# 2          | 2020-03-01        | 2021-02-28      | 364  | 0

Censoring Hierarchy

Survival time is computed from the index date (target cohort start) to the earliest of:

- Event date: first outcome occurrence after the index
- Cohort exit: target cohort end date (if censor_on_cohort_exit=True)
- Censor date: custom column value (if censor_on_date is specified)
- Follow-up cap: maximum days of follow-up (if follow_up_days is finite)
- Observation end: end of the observation period
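
The "earliest of" rule can be sketched for a single subject in plain Python. This is only an illustration of the logic described above, not omopy's implementation: the function name time_and_status and its arguments are hypothetical, and it assumes that an event falling on the index date or on the censoring date still counts as an event.

```python
from datetime import date, timedelta
from typing import Optional


def time_and_status(
    index: date,
    observation_end: date,
    event: Optional[date] = None,
    cohort_exit: Optional[date] = None,
    censor_date: Optional[date] = None,
    follow_up_days: Optional[int] = None,
) -> tuple[int, int]:
    """Return (time in days, status); status 1 = event, 0 = censored."""
    # Observation end always applies; the other candidates are optional.
    censor_candidates = [observation_end]
    if cohort_exit is not None:
        censor_candidates.append(cohort_exit)
    if censor_date is not None:
        censor_candidates.append(censor_date)
    if follow_up_days is not None:
        censor_candidates.append(index + timedelta(days=follow_up_days))
    censor_at = min(censor_candidates)

    # Assumption: ties between event and censoring resolve as an event.
    if event is not None and index <= event <= censor_at:
        return (event - index).days, 1
    return (censor_at - index).days, 0


# Event 87 days after the index, well before any censoring date.
print(time_and_status(
    index=date(2020, 1, 15),
    observation_end=date(2022, 12, 31),
    event=date(2020, 4, 11),
    cohort_exit=date(2021, 1, 14),
    follow_up_days=365,
))  # (87, 1)
```

Passing None for an optional date simply removes that candidate, which mirrors the "if ..." conditions in the list.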

Washout

The outcome_washout parameter excludes persons who had the outcome event within the given number of days before their index date. Set it to inf (the default) to require that the entire prior history be event-free.
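
As a rough sketch of the washout check for one person, assuming the window is inclusive at its boundary (the document does not say whether it is) and using a hypothetical helper name passes_washout:

```python
import math
from datetime import date


def passes_washout(
    index: date,
    outcome_dates: list[date],
    washout_days: float = math.inf,
) -> bool:
    """True if no outcome event falls inside the washout window before index."""
    for event in outcome_dates:
        if event >= index:
            continue  # only events strictly before the index are relevant
        days_before = (index - event).days
        if days_before <= washout_days:  # assumption: boundary is inclusive
            return False
    return True


index = date(2020, 6, 1)
old_event = [date(2019, 1, 1)]  # 517 days before the index

print(passes_washout(index, old_event, washout_days=180))  # True: outside the window
print(passes_washout(index, old_event))                     # False: inf covers all history
```

With a finite window, an old event does not exclude the person; with the infinite default, any prior event does.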

Step 3: Estimate Survival

Single Event (Kaplan-Meier)

The primary function, estimate_single_event_survival(), estimates Kaplan-Meier survival for each target/outcome cohort pair:

from omopy.survival import estimate_single_event_survival

result = estimate_single_event_survival(
    cdm,
    target_cohort_table="target",
    outcome_cohort_table="outcome",
    outcome_washout=180,
    censor_on_cohort_exit=False,
    follow_up_days=365,
    strata=["sex"],      # Stratify by pre-added columns
    event_gap=30,        # Risk table interval width (days)
    estimate_gap=1,      # Survival curve resolution (days)
)

The result is a SummarisedResult containing four types of data:

| Result type | Content |
|---|---|
| `survival_estimates` | Time-point survival probabilities with CIs |
| `survival_events` | Risk table (n_risk, n_event, n_censor per interval) |
| `survival_summary` | Median survival, RMST, quantiles |
| `survival_attrition` | Step-by-step subject counts through the pipeline |
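
For intuition about what the survival probabilities in survival_estimates represent, here is a minimal Kaplan-Meier (product-limit) sketch in plain Python. It is a textbook version operating on time and status lists like those produced in Step 2, not omopy's code; the function name kaplan_meier is hypothetical and confidence intervals are left out.

```python
from collections import Counter


def kaplan_meier(times: list[int], statuses: list[int]) -> list[tuple[int, float]]:
    """Return (time, survival probability) at each distinct event time."""
    events = Counter(t for t, s in zip(times, statuses) if s == 1)
    exits = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


times = [5, 8, 8, 12, 20]
statuses = [1, 1, 0, 0, 1]  # 1 = event, 0 = censored
print([(t, round(s, 3)) for t, s in kaplan_meier(times, statuses)])
# [(5, 0.8), (8, 0.6), (20, 0.0)]
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.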

Competing Risks (Aalen-Johansen)

When a competing event can prevent the outcome of interest, estimate the cumulative incidence function with estimate_competing_risk_survival():

from omopy.survival import estimate_competing_risk_survival

# Add a competing risk cohort (e.g., death)
cdm = generate_concept_cohort_set(
    cdm,
    Codelist({"death": [4306655]}),
    name="competing",
)

result = estimate_competing_risk_survival(
    cdm,
    target_cohort_table="target",
    outcome_cohort_table="outcome",
    competing_outcome_cohort_table="competing",
    follow_up_days=365,
)

The competing risk result reports cumulative incidence (the probability that the outcome has occurred by a given time, accounting for the competing event) rather than survival probability.
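
To illustrate how cumulative incidence differs from one minus a Kaplan-Meier estimate, the following is a small textbook sketch of the Aalen-Johansen estimator for one cause. The status coding (0 = censored, 1 = outcome, 2 = competing event) and the function name cumulative_incidence are assumptions made for this example, not omopy's internal representation.

```python
def cumulative_incidence(
    times: list[int], statuses: list[int], cause: int = 1
) -> list[tuple[int, float]]:
    """Return (time, cumulative incidence of `cause`) at each time it occurs."""
    at_risk = len(times)
    event_free = 1.0  # probability of no event of any type before time t
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        at_t = [s for ti, s in zip(times, statuses) if ti == t]
        d_cause = sum(1 for s in at_t if s == cause)
        d_any = sum(1 for s in at_t if s != 0)
        if d_cause:
            # Increment: still event-free just before t, then fail from `cause` at t.
            cif += event_free * d_cause / at_risk
            curve.append((t, cif))
        event_free *= 1 - d_any / at_risk
        at_risk -= len(at_t)
    return curve


times = [2, 4, 6, 8]
statuses = [1, 2, 1, 0]  # outcome, competing event, outcome, censored
print(cumulative_incidence(times, statuses))  # [(2, 0.25), (6, 0.5)]
```

Treating the competing event at time 4 as ordinary censoring would instead give a one-minus-Kaplan-Meier value of 0.625 at time 6, which overstates the probability of the outcome.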

Multiple Cohort Combinations

Both estimation functions accept lists of cohort IDs to analyse all combinations:

result = estimate_single_event_survival(
    cdm,
    target_cohort_table="target",
    outcome_cohort_table="outcome",
    target_cohort_id=[1, 2],    # Analyse both target definitions
    outcome_cohort_id=[1, 2],   # Against both outcome definitions
)
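
"All combinations" presumably means every target/outcome pairing, that is, the Cartesian product of the two ID lists. Under that assumption (not confirmed against the library), the call above would cover the pairs enumerated by this short sketch:

```python
from itertools import product

target_cohort_id = [1, 2]
outcome_cohort_id = [1, 2]

# One analysis per (target, outcome) pair.
pairs = list(product(target_cohort_id, outcome_cohort_id))
print(pairs)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The number of analyses grows multiplicatively, so long ID lists can make a single call expensive.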

Step 4: Convert Results

The as_survival_result() function converts the long-format SummarisedResult into structured wide-format DataFrames:

from omopy.survival import as_survival_result

wide = as_survival_result(result)

# Dict with keys: "estimates", "events", "summary", "attrition"
print(wide["estimates"].columns)
# ['result_id', 'cdm_name', 'target_cohort', 'outcome',
#  'strata_name', 'strata_level', 'time', 'estimate', 'estimate_95CI_lower', ...]

print(wide["summary"])
# median_survival, restricted_mean_survival, quantiles
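
The summary statistics can be understood as simple functionals of a step survival curve. The sketch below uses common textbook definitions: median survival as the first time survival falls to 0.5 or below, and restricted mean survival time (RMST) as the area under the curve up to a horizon. Whether omopy applies exactly these conventions is not confirmed here, and the helper names and the (time, survival) input format are hypothetical.

```python
from typing import Optional


def median_survival(curve: list[tuple[int, float]]) -> Optional[int]:
    """First time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def restricted_mean_survival(curve: list[tuple[int, float]], horizon: int) -> float:
    """Area under the right-continuous step curve from time 0 to `horizon`."""
    area = 0.0
    prev_t, prev_s = 0, 1.0  # survival is 1.0 before the first event
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (horizon - prev_t)  # last segment up to the horizon
    return area


curve = [(5, 0.8), (8, 0.6), (20, 0.0)]  # (time in days, survival probability)
print(median_survival(curve))                         # 20
print(round(restricted_mean_survival(curve, 20), 2))  # 14.6
```

A median of None corresponds to a curve that never drops to 0.5 within follow-up, which is common when follow_up_days is short.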

Step 5: Tables

All table functions format the SummarisedResult into publication-ready output:

from omopy.survival import (
    table_survival,
    table_survival_events,
    table_survival_attrition,
)

# Survival summary table (median, RMST, quantiles)
tbl = table_survival(result, type="polars")

# Risk table (n at risk, events, censored per interval)
events_tbl = table_survival_events(result, type="polars")

# Attrition table (step-by-step counts)
att_tbl = table_survival_attrition(result, type="polars")

You can also pass type="gt" to get a great_tables.GT object suited to rich HTML display.

Table Options

Query the default options for customization:

from omopy.survival import options_table_survival

defaults = options_table_survival()
# {'header': 'estimate', 'group_column': ['target_cohort'], ...}

Step 6: Plots

Survival Curves

The plot_survival() function creates Kaplan-Meier or cumulative incidence curves with optional confidence interval ribbons:

from omopy.survival import plot_survival

fig = plot_survival(result)
fig.show()

# With CI ribbons and faceting
fig = plot_survival(
    result,
    confidence_interval=True,
    facet="target_cohort",
    colour="outcome",
    time_scale="days",
)
fig.show()

Risk Tables

The plot can include an integrated risk table below the curve:

fig = plot_survival(
    result,
    risk_table=True,
    risk_table_times=[0, 30, 90, 180, 365],
)
fig.show()
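
As a sketch of what a risk table summarises, the function below counts subjects at risk at the start of each interval, plus events and censorings inside it, from plain time and status lists. The function name risk_table, the half-open interval convention, and the output layout are assumptions for illustration only.

```python
def risk_table(times: list[int], statuses: list[int], breaks: list[int]) -> list[dict]:
    """Counts per interval [start, next_start); the last interval is open-ended."""
    ends = breaks[1:] + [float("inf")]
    rows = []
    for start, end in zip(breaks, ends):
        in_interval = [s for t, s in zip(times, statuses) if start <= t < end]
        rows.append({
            "time": start,
            # At risk: everyone whose follow-up has not ended before `start`.
            "n_risk": sum(1 for t in times if t >= start),
            "n_event": sum(1 for s in in_interval if s == 1),
            "n_censor": sum(1 for s in in_interval if s == 0),
        })
    return rows


times = [5, 8, 8, 12, 20]
statuses = [1, 1, 0, 0, 1]  # 1 = event, 0 = censored
for row in risk_table(times, statuses, breaks=[0, 10, 20]):
    print(row)
# {'time': 0, 'n_risk': 5, 'n_event': 2, 'n_censor': 1}
# {'time': 10, 'n_risk': 2, 'n_event': 0, 'n_censor': 1}
# {'time': 20, 'n_risk': 1, 'n_event': 1, 'n_censor': 0}
```

Each row's n_risk equals the previous row's n_risk minus its events and censorings, which is a useful consistency check on any risk table.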

Grouping Columns

Discover which columns are available for faceting or colouring:

from omopy.survival import available_survival_grouping

# All available grouping columns
cols = available_survival_grouping(result)
# ['target_cohort', 'outcome', 'analysis_type', ...]

# Only columns with more than one value
varying = available_survival_grouping(result, varying=True)
# ['outcome']
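
The idea behind varying=True, keeping only columns that take more than one distinct value, can be sketched on a list of row dictionaries. The helper varying_columns and the sample rows are hypothetical and only illustrate the filtering rule, not how omopy inspects a result.

```python
def varying_columns(rows: list[dict], columns: list[str]) -> list[str]:
    """Keep the columns that have more than one distinct value across rows."""
    return [c for c in columns if len({row[c] for row in rows}) > 1]


rows = [
    {"target_cohort": "diabetes", "outcome": "stroke", "analysis_type": "single_event"},
    {"target_cohort": "diabetes", "outcome": "death", "analysis_type": "single_event"},
]
print(varying_columns(rows, ["target_cohort", "outcome", "analysis_type"]))
# ['outcome']
```

Columns with a single value add nothing as a facet or colour, which is why filtering them out is convenient before plotting.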

Mock Data

Generate synthetic CDM data for testing survival workflows:

from omopy.survival import mock_survival

mock_cdm = mock_survival(
    n_persons=200,
    seed=42,
    event_rate=0.3,
    competing_rate=0.15,
    max_follow_up=3650,
    include_strata=True,
)

# The mock CDM has target, outcome, and competing cohort tables
print(mock_cdm["target"].collect())
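
If you only need toy time and status values, for example to try out an estimator by hand, a seeded generator is enough. The sketch below is not how mock_survival() works: it is a standalone, hypothetical helper that interprets event_rate and competing_rate as per-person probabilities and draws follow-up times uniformly.

```python
import random


def toy_survival_rows(
    n_persons: int = 200,
    seed: int = 42,
    event_rate: float = 0.3,
    competing_rate: float = 0.15,
    max_follow_up: int = 3650,
) -> list[dict]:
    """Generate reproducible toy rows: status 1 = outcome, 2 = competing, 0 = censored."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rows = []
    for subject_id in range(1, n_persons + 1):
        draw = rng.random()
        if draw < event_rate:
            status = 1
        elif draw < event_rate + competing_rate:
            status = 2
        else:
            status = 0
        rows.append({
            "subject_id": subject_id,
            "time": rng.randint(1, max_follow_up),  # days, inclusive bounds
            "status": status,
        })
    return rows


rows = toy_survival_rows()
print(len(rows), rows[0].keys())
```

Using the same seed always yields the same rows, which keeps tests built on the toy data deterministic.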

Working with Results

All estimation functions return SummarisedResult objects from omopy.generics. These support standard operations:

# Tidy format (unpack group/strata into named columns)
tidy_df = result.tidy()

# Filter by settings
filtered = result.filter_settings(result_type="survival_estimates")

# Apply minimum cell count suppression
suppressed = result.suppress(min_cell_count=5)

# Split by group
groups = result.split_group()

See the SummarisedResult reference for full details.
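
To make the idea of minimum cell count suppression concrete, here is a minimal sketch on plain dictionaries. It assumes that counts greater than zero but below the threshold are masked while zeros are kept; that rule, the helper name suppress_counts, and the row layout are assumptions for illustration and may differ from what result.suppress() does.

```python
from typing import Optional


def suppress_counts(
    rows: list[dict],
    min_cell_count: int = 5,
    count_keys: tuple[str, ...] = ("n_risk", "n_event", "n_censor"),
) -> list[dict]:
    """Return copies of rows with small non-zero counts replaced by None."""
    suppressed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for key in count_keys:
            value: Optional[int] = new_row.get(key)
            if value is not None and 0 < value < min_cell_count:
                new_row[key] = None  # masked: too few subjects to report
        suppressed.append(new_row)
    return suppressed


rows = [{"time": 0, "n_risk": 12, "n_event": 3, "n_censor": 0}]
print(suppress_counts(rows))
# [{'time': 0, 'n_risk': 12, 'n_event': None, 'n_censor': 0}]
```

Masking small cells reduces the risk of re-identifying individuals when results are shared outside the data partner.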