omopy.incidence

Incidence and prevalence estimation: generate denominator cohorts, compute incidence rates and prevalence proportions, and present results as tables and plots.

This module is the Python equivalent of the R IncidencePrevalence package. Table rendering delegates to omopy.vis; plot rendering uses plotly; confidence intervals use scipy.

Denominator Generation

Build denominator cohorts from observation periods, optionally stratified by age, sex, and prior observation requirements.

generate_denominator_cohort_set

generate_denominator_cohort_set(
    cdm: CdmReference,
    name: str = "denominator",
    *,
    cohort_date_range: tuple[date | None, date | None] = (
        None,
        None,
    ),
    age_group: list[tuple[int, int]] | None = None,
    sex: list[Literal["Both", "Male", "Female"]]
    | Literal["Both", "Male", "Female"] = "Both",
    days_prior_observation: int | list[int] = 0,
    requirement_interactions: bool = True,
) -> CdmReference

Generate denominator cohorts from the general population.

Creates one or more denominator cohorts based on observation periods, optionally stratified by age group, sex, and required prior observation. When requirement_interactions is True, every combination of the supplied criteria generates a separate cohort.

Parameters

cdm: CDM reference containing person and observation_period tables.
name: Name for the output cohort table in the CDM.
cohort_date_range: (start_date, end_date) study window. None values use the earliest/latest observation dates in the database.
age_group: List of (min_age, max_age) tuples. Defaults to [(0, 150)] when None.
sex: "Both", "Male", "Female", or a list of these.
days_prior_observation: Required days of prior observation. Integer or list of integers.
requirement_interactions: If True, create cohorts for all combinations of criteria. If False, criteria are applied independently.

Returns

CdmReference: The CDM with the new denominator cohort table attached.
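To make the effect of requirement_interactions=True concrete, the sketch below enumerates every combination of criteria with itertools.product. It is a plain-Python illustration, not the library's implementation: the helper name enumerate_cohort_settings, the cohort_definition_id numbering, and the ordering of the combinations are all assumptions.

```python
from itertools import product


def enumerate_cohort_settings(age_group, sex, days_prior_observation):
    """Return one settings dict per combination of the supplied criteria.

    Hypothetical helper for illustration only; not part of omopy.
    """
    # Normalise scalar arguments to lists so each criterion can be iterated.
    ages = age_group if age_group is not None else [(0, 150)]
    sexes = [sex] if isinstance(sex, str) else list(sex)
    priors = (
        [days_prior_observation]
        if isinstance(days_prior_observation, int)
        else list(days_prior_observation)
    )

    settings = []
    # Assumed numbering: cohorts are numbered from 1 in product order.
    for cohort_id, (age, s, prior) in enumerate(product(ages, sexes, priors), start=1):
        settings.append(
            {
                "cohort_definition_id": cohort_id,
                "age_group": age,
                "sex": s,
                "days_prior_observation": prior,
            }
        )
    return settings


settings = enumerate_cohort_settings(
    age_group=[(0, 17), (18, 64)],
    sex=["Male", "Female"],
    days_prior_observation=[0, 365],
)
print(len(settings))  # 2 age groups * 2 sexes * 2 prior-observation values = 8
```

Two values for each of three criteria give eight cohorts, so the number of denominator cohorts grows multiplicatively with each additional criterion value.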

generate_target_denominator_cohort_set

generate_target_denominator_cohort_set(
    cdm: CdmReference,
    name: str = "denominator",
    *,
    target_cohort_table: str,
    target_cohort_id: int | list[int] | None = None,
    cohort_date_range: tuple[date | None, date | None] = (
        None,
        None,
    ),
    time_at_risk: tuple[int, float]
    | list[tuple[int, float]]
    | None = None,
    age_group: list[tuple[int, int]] | None = None,
    sex: list[Literal["Both", "Male", "Female"]]
    | Literal["Both", "Male", "Female"] = "Both",
    days_prior_observation: int | list[int] = 0,
    requirements_at_entry: bool = True,
    requirement_interactions: bool = True,
) -> CdmReference

Generate denominator cohorts scoped to a target cohort.

Like generate_denominator_cohort_set, but restricts time contribution to periods when a person is in a target cohort, with optional time-at-risk windows relative to target cohort entry.

Parameters

cdm: CDM reference.
name: Name for the output cohort table.
target_cohort_table: Name of an existing cohort table in the CDM to use as the target.
target_cohort_id: Which cohort IDs from the target table to use. None = all.
cohort_date_range: Study window.
time_at_risk: (start_offset, end_offset) in days relative to target cohort entry. float('inf') as the end offset means follow-up runs to the end of observation. Can be a list for multiple windows.
age_group, sex, days_prior_observation: Stratification criteria.
requirements_at_entry: If True, age and prior observation criteria must be met at target cohort start. If False, contribution starts once criteria are met during follow-up.
requirement_interactions: If True, create cohorts for the cross-product of all criteria.

Returns

CdmReference: The CDM with the new denominator cohort table attached.
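The time_at_risk offsets can be pictured as a date window anchored on target cohort entry. The following is a minimal sketch under stated assumptions: the helper name time_at_risk_window is hypothetical, both window ends are treated as inclusive, and a finite end offset is clipped to the end of observation. The library may handle these edge cases differently.

```python
import math
from datetime import date, timedelta


def time_at_risk_window(target_start, observation_end, time_at_risk):
    """Return (window_start, window_end) or None if the window is empty.

    Hypothetical helper for illustration only; not part of omopy.
    """
    start_offset, end_offset = time_at_risk
    window_start = target_start + timedelta(days=start_offset)

    if math.isinf(end_offset):
        # An infinite end offset means follow-up runs to observation end.
        window_end = observation_end
    else:
        # Assumption: a finite window never extends past observation end.
        window_end = min(target_start + timedelta(days=int(end_offset)), observation_end)

    if window_start > window_end:
        return None
    return window_start, window_end


print(time_at_risk_window(date(2020, 1, 1), date(2020, 12, 31), (0, 30)))
print(time_at_risk_window(date(2020, 1, 1), date(2020, 12, 31), (31, float("inf"))))
```

Passing a list of such tuples, for example [(0, 30), (31, float("inf"))], would correspond to one window per tuple.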

Core Estimation

Compute incidence rates and prevalence proportions over calendar intervals with confidence intervals, washout logic, and strata support.

estimate_incidence

estimate_incidence(
    cdm: CdmReference,
    denominator_table: str,
    outcome_table: str,
    *,
    censor_table: str | None = None,
    denominator_cohort_id: int | list[int] | None = None,
    outcome_cohort_id: int | list[int] | None = None,
    censor_cohort_id: int | list[int] | None = None,
    interval: Literal[
        "weeks", "months", "quarters", "years", "overall"
    ] = "years",
    complete_database_intervals: bool = True,
    outcome_washout: int | float = float("inf"),
    repeated_events: bool = False,
    strata: list[str] | None = None,
    include_overall_strata: bool = True,
) -> SummarisedResult

Estimate incidence rates from denominator and outcome cohorts.

Computes incidence as outcome events per 100,000 person-years with exact Poisson confidence intervals.

Parameters

cdm: CDM reference.
denominator_table: Name of the denominator cohort table.
outcome_table: Name of the outcome cohort table.
censor_table: Optional censoring cohort table name.
denominator_cohort_id: Which denominator cohort IDs to use. None = all.
outcome_cohort_id: Which outcome cohort IDs to use. None = all.
censor_cohort_id: Which censor cohort IDs to use.
interval: Time interval for rate calculation.
complete_database_intervals: Only include intervals fully captured by database observation.
outcome_washout: Days between events. float('inf') means first event only.
repeated_events: Allow multiple events per person.
strata: Column names in the denominator table for stratification.
include_overall_strata: Include an overall (unstratified) analysis.

Returns

SummarisedResult: Summarised result with incidence estimates.
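As a worked illustration of "events per 100,000 person-years with exact Poisson confidence intervals", the sketch below computes a rate and its exact (Garwood) interval using only the standard library. It is not the module's code: the module is described as using scipy, whereas this sketch swaps in a bisection on the Poisson CDF so that it runs without dependencies. The helper names and the 365.25 days-per-year conversion are assumptions; the estimate names follow the defaults shown for plot_incidence.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); adequate for small event counts."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def solve_rate(k, target):
    """Find lam with poisson_cdf(k, lam) == target (the CDF decreases in lam)."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(k, hi) > target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(events, alpha=0.05):
    """Exact two-sided interval for the Poisson mean given an observed count."""
    lower = 0.0 if events == 0 else solve_rate(events - 1, 1.0 - alpha / 2.0)
    upper = solve_rate(events, alpha / 2.0)
    return lower, upper


def incidence_per_100k(events, person_days):
    """Rate per 100,000 person-years; 365.25 days per year is an assumption."""
    person_years = person_days / 365.25
    lower, upper = exact_poisson_ci(events)
    scale = 100_000 / person_years
    return {
        "incidence_100000_pys": events * scale,
        "incidence_100000_pys_95ci_lower": lower * scale,
        "incidence_100000_pys_95ci_upper": upper * scale,
    }


print(incidence_per_100k(events=10, person_days=365_250))
```

The same bounds are more commonly obtained from chi-squared quantiles, which is preferable for large event counts where the direct CDF sum above loses precision.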

estimate_point_prevalence

estimate_point_prevalence(
    cdm: CdmReference,
    denominator_table: str,
    outcome_table: str,
    *,
    denominator_cohort_id: int | list[int] | None = None,
    outcome_cohort_id: int | list[int] | None = None,
    interval: Literal[
        "weeks", "months", "quarters", "years", "overall"
    ] = "years",
    time_point: Literal["start", "middle", "end"] = "start",
    strata: list[str] | None = None,
    include_overall_strata: bool = True,
) -> SummarisedResult

Estimate point prevalence at a specific time within each interval.

Computes the proportion of persons with the outcome on a given date within each calendar interval.

Parameters

cdm: CDM reference.
denominator_table: Name of the denominator cohort table.
outcome_table: Name of the outcome cohort table.
denominator_cohort_id, outcome_cohort_id: Cohort ID filters.
interval: Calendar interval.
time_point: Where in the interval to measure: "start", "middle", or "end".
strata: Stratification columns.
include_overall_strata: Include unstratified analysis.

Returns

SummarisedResult: Summarised result with point prevalence estimates.
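The role of time_point can be sketched with plain dates: pick one date per interval, then divide the number of people in the outcome cohort on that date by the number in the denominator on that date. The helpers below are hypothetical, and the definition of "middle" as the midpoint rounded down to a whole day is an assumption rather than documented behaviour.

```python
from datetime import date, timedelta


def pick_time_point(interval_start, interval_end, time_point="start"):
    """Choose the measurement date inside a calendar interval."""
    if time_point == "start":
        return interval_start
    if time_point == "end":
        return interval_end
    if time_point == "middle":
        # Assumption: the midpoint is rounded down to a whole day.
        half = (interval_end - interval_start).days // 2
        return interval_start + timedelta(days=half)
    raise ValueError(f"unknown time_point: {time_point!r}")


def point_prevalence(denominator, outcome, on_date):
    """Proportion of persons observed on on_date who are in the outcome cohort.

    Both inputs are lists of (person_id, start_date, end_date) tuples.
    """
    observed = {pid for pid, start, end in denominator if start <= on_date <= end}
    cases = {
        pid for pid, start, end in outcome if pid in observed and start <= on_date <= end
    }
    return len(cases) / len(observed) if observed else None


denominator = [(pid, date(2020, 1, 1), date(2020, 12, 31)) for pid in (1, 2, 3, 4)]
outcome = [
    (1, date(2020, 3, 1), date(2020, 9, 30)),
    (2, date(2020, 1, 1), date(2020, 1, 31)),
    (3, date(2020, 6, 1), date(2020, 12, 31)),
]

for time_point in ("start", "middle", "end"):
    on_date = pick_time_point(date(2020, 1, 1), date(2020, 12, 31), time_point)
    print(time_point, on_date, point_prevalence(denominator, outcome, on_date))
```

With this toy data the estimate depends on the chosen date, which is why time_point is part of the analysis settings.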

estimate_period_prevalence

estimate_period_prevalence(
    cdm: CdmReference,
    denominator_table: str,
    outcome_table: str,
    *,
    denominator_cohort_id: int | list[int] | None = None,
    outcome_cohort_id: int | list[int] | None = None,
    interval: Literal[
        "weeks", "months", "quarters", "years", "overall"
    ] = "years",
    complete_database_intervals: bool = True,
    full_contribution: bool = False,
    strata: list[str] | None = None,
    include_overall_strata: bool = True,
) -> SummarisedResult

Estimate period prevalence over each calendar interval.

Computes the proportion of persons whose outcome overlaps each interval at any point, among those contributing time to that interval.

Parameters

cdm: CDM reference.
denominator_table: Name of the denominator cohort table.
outcome_table: Name of the outcome cohort table.
denominator_cohort_id, outcome_cohort_id: Cohort ID filters.
interval: Calendar interval.
complete_database_intervals: Only include intervals fully captured by observation.
full_contribution: Require the person to be observed for the full interval.
strata: Stratification columns.
include_overall_strata: Include unstratified analysis.

Returns

SummarisedResult: Summarised result with period prevalence estimates.
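The difference that full_contribution makes can be seen in a small plain-Python sketch. The helper below is hypothetical and simplifies the real analysis: it treats a person as a case if their outcome period overlaps the interval at all, without checking that the overlap falls within their own contributed time.

```python
from datetime import date


def period_prevalence(denominator, outcome, interval_start, interval_end,
                      full_contribution=False):
    """Proportion of contributing persons with any outcome overlap in the interval.

    Both inputs are lists of (person_id, start_date, end_date) tuples.
    """
    contributing = set()
    for pid, start, end in denominator:
        overlaps = start <= interval_end and end >= interval_start
        covers = start <= interval_start and end >= interval_end
        # With full_contribution, only persons observed for the whole interval count.
        if covers if full_contribution else overlaps:
            contributing.add(pid)

    cases = {
        pid
        for pid, start, end in outcome
        if pid in contributing and start <= interval_end and end >= interval_start
    }
    return len(cases) / len(contributing) if contributing else None


year_start, year_end = date(2020, 1, 1), date(2020, 12, 31)
denominator = [
    (1, year_start, year_end),
    (2, year_start, year_end),
    (3, date(2020, 7, 1), year_end),  # observed for only half the year
    (4, year_start, year_end),
]
outcome = [
    (1, date(2020, 3, 1), date(2020, 3, 31)),
    (3, date(2020, 8, 1), date(2020, 8, 10)),
]

print(period_prevalence(denominator, outcome, year_start, year_end))
print(period_prevalence(denominator, outcome, year_start, year_end, full_contribution=True))
```

Requiring full contribution drops person 3 from both numerator and denominator, so the estimate changes from 2 of 4 to 1 of 3.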

Result Conversion

Pivot long-form SummarisedResult objects into wide tidy DataFrames with named columns for each estimate.

as_incidence_result

as_incidence_result(
    result: SummarisedResult, *, metadata: bool = False
) -> pl.DataFrame

Convert a summarised result to a tidy incidence DataFrame.

Pivots the long-form SummarisedResult into a wide DataFrame with one row per interval and columns for each estimate.

Parameters

result: A SummarisedResult from estimate_incidence.
metadata: If True, include settings metadata columns.

Returns

pl.DataFrame: Wide-form incidence results.
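The long-to-wide pivot can be illustrated without any DataFrame library. The sketch below uses plain dictionaries; the long-form column names estimate_name and estimate_value are assumptions about the SummarisedResult layout (variable_level is taken from the plotting defaults on this page), and the helper pivot_estimates is hypothetical.

```python
def pivot_estimates(long_rows, index="variable_level"):
    """Turn one-row-per-estimate records into one-row-per-interval records."""
    wide = {}
    for row in long_rows:
        # One output record per distinct value of the index column.
        record = wide.setdefault(row[index], {index: row[index]})
        # Assumption: values are stored as strings in long form.
        record[row["estimate_name"]] = float(row["estimate_value"])
    return list(wide.values())


long_rows = [
    {"variable_level": "2019", "estimate_name": "incidence_100000_pys", "estimate_value": "120.5"},
    {"variable_level": "2019", "estimate_name": "incidence_100000_pys_95ci_lower", "estimate_value": "98.1"},
    {"variable_level": "2020", "estimate_name": "incidence_100000_pys", "estimate_value": "133.0"},
    {"variable_level": "2020", "estimate_name": "incidence_100000_pys_95ci_lower", "estimate_value": "110.2"},
]

for record in pivot_estimates(long_rows):
    print(record)
```

Each estimate name becomes its own column, which is the shape the plotting and table helpers refer to by name.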

as_prevalence_result

as_prevalence_result(
    result: SummarisedResult, *, metadata: bool = False
) -> pl.DataFrame

Convert a summarised result to a tidy prevalence DataFrame.

Pivots the long-form SummarisedResult into a wide DataFrame with one row per interval and columns for each estimate.

Parameters

result: A SummarisedResult from estimate_point_prevalence or estimate_period_prevalence.
metadata: If True, include settings metadata columns.

Returns

pl.DataFrame: Wide-form prevalence results.

Table Functions

Thin wrappers around vis_omop_table() with epidemiological formatting defaults.

table_incidence

table_incidence(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    settings_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    options: dict[str, Any] | None = None,
) -> Any

Render an incidence results table.

Parameters

result: A SummarisedResult from estimate_incidence.
type: "gt" for great_tables, "polars" for DataFrame.
header: Columns to pivot into header.
group_column: Row grouping columns.
settings_column: Settings columns to include.
hide: Columns to hide.
style: Table style configuration.
options: Override options from options_table_incidence.

Returns

great_tables.GT or polars.DataFrame

table_prevalence

table_prevalence(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    settings_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    options: dict[str, Any] | None = None,
) -> Any

Render a prevalence results table.

Parameters

result: A SummarisedResult from a prevalence estimation function.
type: "gt" for great_tables, "polars" for DataFrame.
header: Columns to pivot into header.
group_column: Row grouping columns.
settings_column: Settings columns to include.
hide: Columns to hide.
style: Table style configuration.
options: Override options from options_table_prevalence.

Returns

great_tables.GT or polars.DataFrame

table_incidence_attrition

table_incidence_attrition(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    settings_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
) -> Any

Render an attrition table for incidence analyses.

Parameters

result: A SummarisedResult from estimate_incidence.
type, header, group_column, settings_column, hide, style: Table rendering options.

Returns

great_tables.GT or polars.DataFrame

table_prevalence_attrition

table_prevalence_attrition(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    settings_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
) -> Any

Render an attrition table for prevalence analyses.

Parameters

result: A SummarisedResult from a prevalence estimation function.
type, header, group_column, settings_column, hide, style: Table rendering options.

Returns

great_tables.GT or polars.DataFrame

options_table_incidence

options_table_incidence() -> dict[str, Any]

Return default table options for incidence results.

Returns

dict: Default options for table_incidence.

options_table_prevalence

options_table_prevalence() -> dict[str, Any]

Return default table options for prevalence results.

Returns

dict: Default options for table_prevalence.

Plot Functions

Wrappers around scatter_plot() and bar_plot() with epidemiological defaults.

plot_incidence

plot_incidence(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "incidence_100000_pys",
    line: bool = True,
    point: bool = True,
    ribbon: bool = True,
    y_min: str | None = "incidence_100000_pys_95ci_lower",
    y_max: str | None = "incidence_100000_pys_95ci_upper",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Plot incidence rates as a line plot with confidence ribbons.

Parameters

result: A SummarisedResult from estimate_incidence.
x: Column for x-axis (default: interval labels).
y: Estimate name for y-axis.
line, point, ribbon: Display elements.
y_min, y_max: Estimate names for CI ribbon bounds.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure

plot_prevalence

plot_prevalence(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "prevalence",
    line: bool = True,
    point: bool = True,
    ribbon: bool = True,
    y_min: str | None = "prevalence_95ci_lower",
    y_max: str | None = "prevalence_95ci_upper",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Plot prevalence proportions as a line plot with confidence ribbons.

Parameters

result: A SummarisedResult from a prevalence estimation function.
x: Column for x-axis.
y: Estimate name for y-axis.
line, point, ribbon: Display elements.
y_min, y_max: Estimate names for CI ribbon bounds.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure

plot_incidence_population

plot_incidence_population(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "n_persons",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Bar plot of denominator population counts from incidence results.

Parameters

result: A SummarisedResult from estimate_incidence.
x: Column for x-axis.
y: Estimate name for y-axis.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure

plot_prevalence_population

plot_prevalence_population(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "n_persons",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Bar plot of denominator population counts from prevalence results.

Parameters

result: A SummarisedResult from a prevalence estimation function.
x: Column for x-axis.
y: Estimate name for y-axis.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure

Grouping Helpers

Discover available grouping columns for faceted plots.

available_incidence_grouping

available_incidence_grouping(
    result: SummarisedResult, *, varying: bool = False
) -> list[str]

List variables available for grouping/faceting incidence plots.

Parameters

result: A SummarisedResult from estimate_incidence.
varying: If True, only return variables with more than one unique value.

Returns

list[str]: Available grouping variable names.

available_prevalence_grouping

available_prevalence_grouping(
    result: SummarisedResult, *, varying: bool = False
) -> list[str]

List variables available for grouping/faceting prevalence plots.

Parameters

result: A SummarisedResult from a prevalence estimation function.
varying: If True, only return variables with more than one unique value.

Returns

list[str]: Available grouping variable names.
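The varying flag in both grouping helpers amounts to keeping only columns that take more than one distinct value. A minimal sketch of that filter over plain records follows; the helper name grouping_columns and the example column names are hypothetical, and how the library chooses its candidate columns is not shown here.

```python
def grouping_columns(rows, candidates, varying=False):
    """Return candidate columns present in every row, optionally only varying ones."""
    available = [c for c in candidates if all(c in row for row in rows)]
    if not varying:
        return available
    # A column "varies" when it has more than one distinct value across rows.
    return [c for c in available if len({row[c] for row in rows}) > 1]


rows = [
    {"cdm_name": "mock", "denominator_sex": "Male", "denominator_age_group": "0 to 150"},
    {"cdm_name": "mock", "denominator_sex": "Female", "denominator_age_group": "0 to 150"},
]
candidates = ["cdm_name", "denominator_sex", "denominator_age_group"]

print(grouping_columns(rows, candidates))
print(grouping_columns(rows, candidates, varying=True))
```

Restricting to varying columns is useful when choosing facet or colour arguments, since a column with a single value would produce only one panel or one colour.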

Mock Data & Benchmarking

mock_incidence_prevalence

mock_incidence_prevalence(
    *,
    sample_size: int = 100,
    outcome_prevalence: float = 0.2,
    seed: int | None = None,
    study_start: date | None = None,
    study_end: date | None = None,
) -> CdmReference

Create a mock CDM reference for incidence/prevalence testing.

Generates synthetic person, observation_period, and cohort tables suitable for testing the estimation functions.

Parameters

sample_size: Number of persons.
outcome_prevalence: Probability that each person has an outcome event.
seed: Random seed for reproducibility.
study_start: Start of study period. Default: 2010-01-01.
study_end: End of study period. Default: 2020-12-31.

Returns

CdmReference: CDM with person, observation_period, target cohort, and outcome cohort tables.

benchmark_incidence_prevalence

benchmark_incidence_prevalence(
    cdm: CdmReference, *, analysis_type: str = "all"
) -> dict[str, float]

Run timing benchmarks on incidence and prevalence estimation.

Parameters

cdm: CDM reference with denominator and outcome cohorts.
analysis_type: "all", "incidence", or "prevalence".

Returns

dict[str, float]: Timing results in seconds.
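The return shape, a mapping from task name to elapsed seconds, can be reproduced with time.perf_counter. The sketch below times arbitrary callables and is only an illustration of that shape: the helper time_tasks and the stand-in workloads are hypothetical, and the task names the library actually reports are not documented here.

```python
import time


def time_tasks(tasks):
    """Run each callable once and return elapsed wall-clock seconds per task."""
    timings = {}
    for name, task in tasks.items():
        started = time.perf_counter()
        task()
        timings[name] = time.perf_counter() - started
    return timings


# Stand-in workloads; a real benchmark would call the estimation functions.
timings = time_tasks(
    {
        "incidence": lambda: sum(range(10_000)),
        "prevalence": lambda: sorted(range(10_000), reverse=True),
    }
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```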