omopy.incidence

Incidence and prevalence estimation — generate denominator cohorts, compute incidence rates and prevalence proportions, and present results as tables and plots.

This module is the Python equivalent of the R IncidencePrevalence package. Table rendering delegates to omopy.vis; plot rendering uses plotly; confidence intervals use scipy.

Denominator Generation

Build denominator cohorts from observation periods, optionally stratified by age, sex, and prior observation requirements.

generate_denominator_cohort_set

generate_denominator_cohort_set(
    cdm: CdmReference,
    name: str = "denominator",
    *,
    cohort_date_range: tuple[date | None, date | None] = (
        None,
        None,
    ),
    age_group: list[tuple[int, int]] | None = None,
    sex: list[Literal["Both", "Male", "Female"]]
    | Literal["Both", "Male", "Female"] = "Both",
    days_prior_observation: int | list[int] = 0,
    requirement_interactions: bool = True,
) -> CdmReference

Generate denominator cohorts from the general population.

Creates one or more denominator cohorts based on observation periods, optionally stratified by age group, sex, and required prior observation. When requirement_interactions is True, every combination of the supplied criteria generates a separate cohort.
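The crossing of criteria can be sketched with itertools.product to see how many cohorts to expect. This is a hypothetical illustration, not omopy's internal code: expand_denominator_criteria and its dict keys are invented for the example, and the requirement_interactions=False case is not modelled.

```python
from __future__ import annotations

from itertools import product


def expand_denominator_criteria(
    age_group: list[tuple[int, int]],
    sex: list[str],
    days_prior_observation: list[int],
) -> list[dict]:
    """Hypothetical helper: one dict per denominator cohort when all criteria are crossed."""
    return [
        {"age_group": ag, "sex": s, "days_prior_observation": d}
        for ag, s, d in product(age_group, sex, days_prior_observation)
    ]


cohorts = expand_denominator_criteria(
    age_group=[(0, 17), (18, 150)],
    sex=["Both", "Male", "Female"],
    days_prior_observation=[0, 365],
)
print(len(cohorts))  # 2 age groups x 3 sexes x 2 prior-observation values = 12
```

Adding one more value to any criterion multiplies, rather than adds to, the number of cohorts, which is worth keeping in mind on large databases.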

Parameters

cdm: CDM reference containing the person and observation_period tables.
name: Name for the output cohort table in the CDM.
cohort_date_range: (start_date, end_date) study window. A None bound uses the earliest (start) or latest (end) observation date in the database.
age_group: List of (min_age, max_age) tuples. If None (the default), a single group (0, 150) is used.
sex: "Both", "Male", "Female", or a list of these.
days_prior_observation: Required days of prior observation, as an integer or a list of integers.
requirement_interactions: If True, create a cohort for every combination of the criteria. If False, the criteria are applied independently.

Returns

CdmReference: The CDM with the new denominator cohort table attached.
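A minimal sketch of how these criteria could combine into one person's denominator window for a single observation period. It assumes conventions similar to those of the R IncidencePrevalence package as we understand them: entry on the latest of study start, observation start plus days_prior_observation, and the date the person reaches min_age; exit on the earliest of study end, observation end, and the day before they turn max_age + 1. add_years and denominator_window are hypothetical helpers, not omopy API.

```python
from __future__ import annotations

from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Same month/day `years` later; 29 February maps to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def denominator_window(
    birth: date,
    obs_start: date,
    obs_end: date,
    study_start: date | None,
    study_end: date | None,
    min_age: int,
    max_age: int,
    days_prior: int,
) -> tuple[date, date] | None:
    """Hypothetical helper: (entry, exit) for one observation period, or None if ineligible."""
    starts = [obs_start + timedelta(days=days_prior), add_years(birth, min_age)]
    ends = [obs_end, add_years(birth, max_age + 1) - timedelta(days=1)]
    # A None study bound falls back to the database-wide observation extent,
    # which can never be tighter than this person's own observation period,
    # so it can simply be skipped here.
    if study_start is not None:
        starts.append(study_start)
    if study_end is not None:
        ends.append(study_end)
    start, end = max(starts), min(ends)
    return (start, end) if start <= end else None


w = denominator_window(
    date(1980, 6, 15), date(2005, 1, 1), date(2015, 12, 31),
    date(2010, 1, 1), None, min_age=30, max_age=34, days_prior=365,
)
print(w)  # (datetime.date(2010, 6, 15), datetime.date(2015, 6, 14))
```

Here the age group is the binding constraint at both ends: the person turns 30 after the study starts and turns 35 before observation ends.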

generate_target_denominator_cohort_set

generate_target_denominator_cohort_set(
    cdm: CdmReference,
    name: str = "denominator",
    *,
    target_cohort_table: str,
    target_cohort_id: int | list[int] | None = None,
    cohort_date_range: tuple[date | None, date | None] = (
        None,
        None,
    ),
    time_at_risk: tuple[int, float]
    | list[tuple[int, float]]
    | None = None,
    age_group: list[tuple[int, int]] | None = None,
    sex: list[Literal["Both", "Male", "Female"]]
    | Literal["Both", "Male", "Female"] = "Both",
    days_prior_observation: int | list[int] = 0,
    requirements_at_entry: bool = True,
    requirement_interactions: bool = True,
) -> CdmReference

Generate denominator cohorts scoped to a target cohort.

Like generate_denominator_cohort_set(), but restricts time contribution to periods when a person is in a target cohort, with optional time-at-risk windows relative to target cohort entry.
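One way to picture a single time-at-risk window: the offsets are added to the target cohort entry date, and the result is clipped so that no time falls after the target cohort end or the observation end. The clipping rule and the helper time_at_risk_window are assumptions for illustration, not omopy's implementation.

```python
from __future__ import annotations

import math
from datetime import date, timedelta


def time_at_risk_window(
    entry: date,
    target_end: date,
    obs_end: date,
    window: tuple[int, float],
) -> tuple[date, date] | None:
    """Hypothetical helper: contribution period for one target cohort entry."""
    start_offset, end_offset = window
    start = entry + timedelta(days=start_offset)
    # Assumed cap: time outside the target cohort or after observation end never counts.
    cap = min(target_end, obs_end)
    if math.isinf(end_offset):
        end = cap  # no offset-based limit
    else:
        end = min(cap, entry + timedelta(days=int(end_offset)))
    return (start, end) if start <= end else None


entry = date(2012, 3, 1)
for tar in [(0, 30), (0, float("inf")), (400, float("inf"))]:
    print(tar, time_at_risk_window(entry, date(2013, 3, 1), date(2015, 12, 31), tar))
```

A window that starts after the cap (the third case) yields no contribution at all.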

Parameters

cdm: CDM reference.
name: Name for the output cohort table.
target_cohort_table: Name of an existing cohort table in the CDM to use as the target.
target_cohort_id: Cohort IDs from the target table to use. None uses all.
cohort_date_range: Study window.
time_at_risk: (start_offset, end_offset) in days relative to target cohort entry. Use float('inf') as the end offset to follow up until observation end. Pass a list for multiple windows.
age_group, sex, days_prior_observation: Stratification criteria, as in generate_denominator_cohort_set().
requirements_at_entry: If True, the age and prior observation criteria must be met at target cohort start. If False, contribution starts once the criteria are met during follow-up.
requirement_interactions: If True, create a cohort for every combination of the criteria.

Returns

CdmReference: The CDM with the new denominator cohort table attached.
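The effect of requirements_at_entry can be sketched for one target cohort entry. Here criteria_met stands for the first date on which both the age and prior-observation requirements hold; that input and the helper contribution_start are hypothetical.

```python
from __future__ import annotations

from datetime import date


def contribution_start(
    entry: date,
    exit_date: date,
    criteria_met: date,
    requirements_at_entry: bool,
) -> date | None:
    """Hypothetical helper: first day this target cohort entry contributes time, or None."""
    if requirements_at_entry:
        # Criteria must already hold on the target cohort start date.
        return entry if criteria_met <= entry else None
    # Otherwise contribution begins once the criteria are met during follow-up.
    start = max(entry, criteria_met)
    return start if start <= exit_date else None


print(contribution_start(date(2012, 1, 1), date(2013, 1, 1), date(2012, 6, 1), True))
print(contribution_start(date(2012, 1, 1), date(2013, 1, 1), date(2012, 6, 1), False))
```

With requirements_at_entry=True the entry is dropped entirely; with False the person contributes only the later part of the entry.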

Core Estimation

Compute incidence rates and prevalence proportions over calendar intervals with confidence intervals, washout logic, and strata support.
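The calendar intervals that these functions report on can be sketched as follows, together with a filter in the spirit of complete_database_intervals. Assumptions for the illustration: weeks start on Monday, "overall" is a single interval spanning the study window, and an interval counts as complete when it lies entirely within the database observation span. The helper names are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def _floor(d: date, interval: str) -> date:
    """First day of the calendar interval containing d."""
    if interval == "weeks":
        return d - timedelta(days=d.weekday())  # assumed Monday-start weeks
    if interval == "months":
        return d.replace(day=1)
    if interval == "quarters":
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    if interval == "years":
        return date(d.year, 1, 1)
    raise ValueError(f"unsupported interval: {interval!r}")


def _next_start(d: date, interval: str) -> date:
    """First day of the interval following the one that starts on d."""
    if interval == "weeks":
        return d + timedelta(days=7)
    months = {"months": 1, "quarters": 3, "years": 12}[interval]
    m = d.month - 1 + months
    return date(d.year + m // 12, m % 12 + 1, 1)


def calendar_intervals(start: date, end: date, interval: str) -> list[tuple[date, date]]:
    """Calendar intervals (inclusive bounds) overlapping [start, end]."""
    if interval == "overall":
        return [(start, end)]
    out = []
    current = _floor(start, interval)
    while current <= end:
        nxt = _next_start(current, interval)
        out.append((current, nxt - timedelta(days=1)))
        current = nxt
    return out


def complete_intervals(intervals: list[tuple[date, date]], db_start: date, db_end: date):
    """Keep only intervals lying entirely within the database observation span."""
    return [(a, b) for a, b in intervals if a >= db_start and b <= db_end]


months = calendar_intervals(date(2020, 1, 15), date(2020, 3, 10), "months")
print(complete_intervals(months, date(2020, 1, 1), date(2020, 3, 10)))
```

March is dropped in the usage example because the database span ends on 10 March.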

estimate_incidence

estimate_incidence(
    cdm: CdmReference,
    denominator_table: str,
    outcome_table: str,
    *,
    censor_table: str | None = None,
    denominator_cohort_id: int | list[int] | None = None,
    outcome_cohort_id: int | list[int] | None = None,
    censor_cohort_id: int | list[int] | None = None,
    interval: Literal[
        "weeks", "months", "quarters", "years", "overall"
    ] = "years",
    complete_database_intervals: bool = True,
    outcome_washout: int | float = float("inf"),
    repeated_events: bool = False,
    strata: list[str] | None = None,
    include_overall_strata: bool = True,
) -> SummarisedResult

Estimate incidence rates from denominator and outcome cohorts.

Computes incidence as outcome events per 100,000 person-years with exact Poisson confidence intervals.
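As a worked sketch of the rate and its interval: the rate is events divided by person-years, scaled to 100,000, and the exact (Garwood) Poisson interval for a count k is the pair of Poisson means at which observing at least k, or at most k, events has probability alpha/2. We assume this is what "exact Poisson" means here and that person-years are person-days / 365.25. The code solves for the limits by bisection in plain Python; with scipy the same limits are scipy.stats.chi2.ppf(alpha / 2, 2 * k) / 2 and scipy.stats.chi2.ppf(1 - alpha / 2, 2 * k + 2) / 2. All function names are hypothetical.

```python
from __future__ import annotations

import math


def _poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    log_mu = math.log(mu)
    return math.fsum(
        math.exp(-mu + i * log_mu - math.lgamma(i + 1)) for i in range(k + 1)
    )


def _solve_decreasing(g, target: float, hi: float) -> float:
    """Bisection for g(mu) == target, where g decreases in mu on (0, hi]."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Garwood) confidence limits for the mean of a Poisson count k."""
    hi = 2.0 * k + 20.0  # safely above the upper limit for any k
    # Lower limit: P(X >= k) = alpha / 2, i.e. P(X <= k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda mu: _poisson_cdf(k - 1, mu), 1 - alpha / 2, hi
    )
    # Upper limit: P(X <= k) = alpha / 2.
    upper = _solve_decreasing(lambda mu: _poisson_cdf(k, mu), alpha / 2, hi)
    return lower, upper


def incidence_per_100000_pys(
    n_events: int, person_days: float, alpha: float = 0.05
) -> tuple[float, float, float]:
    """Rate per 100,000 person-years with exact Poisson limits."""
    person_years = person_days / 365.25  # assumed days-per-year convention
    scale = 100_000 / person_years
    lower, upper = exact_poisson_ci(n_events, alpha)
    return n_events * scale, lower * scale, upper * scale


print(incidence_per_100000_pys(10, 36_525))  # 10 events over 100 person-years
```

For 10 events over 36,525 person-days (100 person-years) this gives 10,000 per 100,000 person-years, with limits of roughly 4,795 and 18,390.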

Parameters

cdm: CDM reference.
denominator_table: Name of the denominator cohort table.
outcome_table: Name of the outcome cohort table.
censor_table: Optional name of a censoring cohort table.
denominator_cohort_id: Denominator cohort IDs to use. None uses all.
outcome_cohort_id: Outcome cohort IDs to use. None uses all.
censor_cohort_id: Censor cohort IDs to use.
interval: Calendar interval over which rates are calculated.
complete_database_intervals: If True, only include intervals fully captured by database observation.
outcome_washout: Number of days after an outcome before the person can contribute time at risk again. float('inf') (the default) means only the first event counts.
repeated_events: If True, allow multiple events per person.
strata: Column names in the denominator table to stratify by.
include_overall_strata: If True, also include an overall (unstratified) analysis.

Returns

SummarisedResult: Summarised result with incidence estimates.
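How outcome_washout and repeated_events might shape one person's time at risk can be sketched as follows. The assumptions are ours: the washout is counted from the outcome date, events that fall inside a washout are ignored, and outcomes before the person's start date are not considered. at_risk_periods and person_days are hypothetical helpers.

```python
from __future__ import annotations

import math
from datetime import date, timedelta


def at_risk_periods(
    start: date,
    end: date,
    events: list[date],
    washout: float,
    repeated_events: bool,
) -> tuple[list[tuple[date, date]], int]:
    """Hypothetical helper: at-risk periods and counted events for one person."""
    periods: list[tuple[date, date]] = []
    n_events = 0
    current: date | None = start  # start of the current at-risk period
    for event in sorted(events):
        if current is None:
            break  # no further time at risk
        if event < current or event > end:
            continue  # outside time at risk (e.g. during a washout): not counted
        periods.append((current, event))
        n_events += 1
        if repeated_events and not math.isinf(washout):
            # Not at risk on the event day or the following `washout` days.
            current = event + timedelta(days=int(washout) + 1)
        else:
            current = None  # first event only
    if current is not None and current <= end:
        periods.append((current, end))
    return periods, n_events


def person_days(periods: list[tuple[date, date]]) -> int:
    """Total days at risk, counting both ends of each period."""
    return sum((b - a).days + 1 for a, b in periods)


events = [date(2010, 3, 1), date(2010, 3, 15), date(2010, 6, 1)]
print(at_risk_periods(date(2010, 1, 1), date(2010, 12, 31), events, 30, True))
```

With a 30-day washout the 15 March event is skipped, so two events are counted; with the default infinite washout, follow-up ends at the first event.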

estimate_point_prevalence

estimate_point_prevalence(
    cdm: CdmReference,
    denominator_table: str,
    outcome_table: str,
    *,
    denominator_cohort_id: int | list[int] | None = None,
    outcome_cohort_id: int | list[int] | None = None,
    interval: Literal[
        "weeks", "months", "quarters", "years", "overall"
    ] = "years",
    time_point: Literal["start", "middle", "end"] = "start",
    strata: list[str] | None = None,
    include_overall_strata: bool = True,
) -> SummarisedResult

Estimate point prevalence at a specific time within each interval.

Computes the proportion of persons in the denominator who have the outcome on a given date within each calendar interval.
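A sketch of how time_point could select the measurement date within an interval, and how the count then works. Rounding "middle" down to a whole day, and representing cohorts as dicts of (start, end) date pairs keyed by person, are assumptions made for this illustration; both helpers are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def measurement_date(start: date, end: date, time_point: str) -> date:
    """Date within [start, end] at which point prevalence is assessed."""
    if time_point == "start":
        return start
    if time_point == "end":
        return end
    if time_point == "middle":
        return start + timedelta(days=(end - start).days // 2)  # rounded down
    raise ValueError(f"unknown time_point: {time_point!r}")


def point_prevalence(
    denominator: dict[int, tuple[date, date]],
    outcomes: dict[int, list[tuple[date, date]]],
    on: date,
) -> tuple[int, int]:
    """(cases, persons in the denominator) on a single date."""
    at_risk = [p for p, (s, e) in denominator.items() if s <= on <= e]
    cases = [p for p in at_risk if any(s <= on <= e for s, e in outcomes.get(p, []))]
    return len(cases), len(at_risk)


print(measurement_date(date(2020, 1, 1), date(2020, 12, 31), "middle"))
```

The prevalence for the interval is then cases divided by persons in the denominator on that date.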

Parameters

cdm: CDM reference.
denominator_table: Name of the denominator cohort table.
outcome_table: Name of the outcome cohort table.
denominator_cohort_id, outcome_cohort_id: Cohort ID filters. None uses all.
interval: Calendar interval.
time_point: Where in each interval to measure prevalence: "start", "middle", or "end".
strata: Stratification columns.
include_overall_strata: If True, also include an unstratified analysis.

Returns

SummarisedResult: Summarised result with point prevalence estimates.

estimate_period_prevalence

estimate_period_prevalence(
    cdm: CdmReference,
    denominator_table: str,
    outcome_table: str,
    *,
    denominator_cohort_id: int | list[int] | None = None,
    outcome_cohort_id: int | list[int] | None = None,
    interval: Literal[
        "weeks", "months", "quarters", "years", "overall"
    ] = "years",
    complete_database_intervals: bool = True,
    full_contribution: bool = False,
    strata: list[str] | None = None,
    include_overall_strata: bool = True,
) -> SummarisedResult

Estimate period prevalence over each calendar interval.

Computes, for each calendar interval, the proportion of persons contributing time whose outcome overlaps that interval.

Parameters

cdm: CDM reference.
denominator_table: Name of the denominator cohort table.
outcome_table: Name of the outcome cohort table.
denominator_cohort_id, outcome_cohort_id: Cohort ID filters. None uses all.
interval: Calendar interval.
complete_database_intervals: If True, only include intervals fully captured by database observation.
full_contribution: If True, require the person to be observed for the full interval.
strata: Stratification columns.
include_overall_strata: If True, also include an unstratified analysis.

Returns

SummarisedResult: Summarised result with period prevalence estimates.
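The period prevalence rule and the full_contribution flag can be sketched with dicts of (start, end) date pairs keyed by person. In this illustration a person is a case when an outcome overlaps the part of the interval in which they contribute time; that reading, and the helper period_prevalence, are assumptions.

```python
from __future__ import annotations

from datetime import date


def period_prevalence(
    denominator: dict[int, tuple[date, date]],
    outcomes: dict[int, list[tuple[date, date]]],
    interval: tuple[date, date],
    full_contribution: bool = False,
) -> tuple[int, int]:
    """(cases, contributing persons) for one calendar interval."""
    i_start, i_end = interval
    cases = contributing = 0
    for person, (s, e) in denominator.items():
        # Part of the interval during which this person contributes time.
        w_start, w_end = max(s, i_start), min(e, i_end)
        if w_start > w_end:
            continue  # no time in this interval
        if full_contribution and (w_start, w_end) != (i_start, i_end):
            continue  # observed for only part of the interval
        contributing += 1
        if any(o_s <= w_end and w_start <= o_e for o_s, o_e in outcomes.get(person, [])):
            cases += 1
    return cases, contributing


year_2020 = (date(2020, 1, 1), date(2020, 12, 31))
denominator = {
    1: (date(2019, 1, 1), date(2021, 12, 31)),
    2: (date(2020, 6, 1), date(2020, 12, 31)),
    3: (date(2021, 1, 1), date(2021, 12, 31)),
}
outcomes = {1: [(date(2020, 5, 1), date(2020, 5, 10))], 2: [(date(2020, 1, 1), date(2020, 3, 1))]}
print(period_prevalence(denominator, outcomes, year_2020))
print(period_prevalence(denominator, outcomes, year_2020, full_contribution=True))
```

Person 2's outcome ends before they start contributing, so it does not count; with full_contribution=True person 2 drops out of the denominator entirely.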

Result Conversion

Pivot long-form SummarisedResult objects into wide tidy DataFrames with named columns for each estimate.

as_incidence_result

as_incidence_result(
    result: SummarisedResult, *, metadata: bool = False
) -> pl.DataFrame

Convert a summarised result to a tidy incidence DataFrame.

Pivots the long-form SummarisedResult into a wide DataFrame with one row per interval and columns for each estimate.
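To illustrate the reshaping, the sketch below pivots long rows into one record per interval in plain Python. It assumes the long form has variable_level, estimate_name, and string-valued estimate_value columns, as in the omopgenerics summarised_result layout; the toy rows are invented, and the real function returns a polars DataFrame rather than a list of dicts. pivot_estimates is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict


def pivot_estimates(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Long rows -> one dict per variable_level with a column per estimate_name."""
    wide: dict[str, dict[str, object]] = defaultdict(dict)
    for row in rows:
        record = wide[row["variable_level"]]
        record["variable_level"] = row["variable_level"]
        # Estimates are stored as strings in the long form; convert to numbers.
        record[row["estimate_name"]] = float(row["estimate_value"])
    return list(wide.values())


# Toy input for illustration only.
long_rows = [
    {"variable_level": "2010", "estimate_name": "n_persons", "estimate_value": "100"},
    {"variable_level": "2010", "estimate_name": "incidence_100000_pys", "estimate_value": "3000"},
    {"variable_level": "2011", "estimate_name": "n_persons", "estimate_value": "120"},
]
for record in pivot_estimates(long_rows):
    print(record)
```

Intervals that lack an estimate simply have no key for it here; a DataFrame pivot would fill those cells with nulls instead.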

Parameters

result: A SummarisedResult from estimate_incidence().
metadata: If True, include settings metadata columns.

Returns

pl.DataFrame: Wide-form incidence results.

as_prevalence_result

as_prevalence_result(
    result: SummarisedResult, *, metadata: bool = False
) -> pl.DataFrame

Convert a summarised result to a tidy prevalence DataFrame.

Pivots the long-form SummarisedResult into a wide DataFrame with one row per interval and columns for each estimate.

options_table_incidence

options_table_incidence() -> dict[str, Any]

Return default table options for incidence results.

Returns

dict: Default options for table_incidence().

options_table_prevalence

options_table_prevalence() -> dict[str, Any]

Return default table options for prevalence results.

Returns

dict: Default options for table_prevalence().

Plot Functions

Wrappers around scatter_plot() and bar_plot() with epidemiological defaults.

plot_incidence

plot_incidence(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "incidence_100000_pys",
    line: bool = True,
    point: bool = True,
    ribbon: bool = True,
    y_min: str | None = "incidence_100000_pys_95ci_lower",
    y_max: str | None = "incidence_100000_pys_95ci_upper",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Plot incidence rates as a line plot with confidence ribbons.

Parameters

result: A SummarisedResult from estimate_incidence().
x: Column for the x-axis (default: interval labels).
y: Estimate name for the y-axis.
line, point, ribbon: Whether to draw lines, points, and the confidence ribbon.
y_min, y_max: Estimate names for the lower and upper ribbon bounds.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure
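A confidence ribbon is commonly drawn in plotly as one closed polygon: the upper bound traced left to right, then the lower bound right to left, passed to plotly.graph_objects.Scatter with fill="toself". Whether plot_incidence builds its ribbon this way is not stated here; ribbon_polygon is a hypothetical helper that computes those coordinates without needing plotly.

```python
from __future__ import annotations


def ribbon_polygon(
    x: list, lower: list[float], upper: list[float]
) -> tuple[list, list[float]]:
    """Closed outline of a CI band: upper bound forwards, lower bound backwards."""
    if not (len(x) == len(lower) == len(upper)):
        raise ValueError("x, lower and upper must have the same length")
    xs = list(x) + list(reversed(x))
    ys = list(upper) + list(reversed(lower))
    return xs, ys


xs, ys = ribbon_polygon(["2010", "2011", "2012"], [1.0, 2.0, 1.5], [3.0, 4.0, 3.5])
print(xs)
print(ys)
```

The resulting coordinates can be passed as go.Scatter(x=xs, y=ys, fill="toself") alongside a separate line trace for the point estimate.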

plot_prevalence

plot_prevalence(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "prevalence",
    line: bool = True,
    point: bool = True,
    ribbon: bool = True,
    y_min: str | None = "prevalence_95ci_lower",
    y_max: str | None = "prevalence_95ci_upper",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Plot prevalence proportions as a line plot with confidence ribbons.

Parameters

result: A SummarisedResult from estimate_point_prevalence() or estimate_period_prevalence().
x: Column for the x-axis.
y: Estimate name for the y-axis.
line, point, ribbon: Whether to draw lines, points, and the confidence ribbon.
y_min, y_max: Estimate names for the lower and upper ribbon bounds.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure

plot_incidence_population

plot_incidence_population(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "n_persons",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Bar plot of denominator population counts from incidence results.

Parameters

result: A SummarisedResult from estimate_incidence().
x: Column for the x-axis.
y: Estimate name for the y-axis.
facet: Faceting column(s).
colour: Colour grouping column.

Returns

plotly.graph_objects.Figure

plot_prevalence_population

plot_prevalence_population(
    result: SummarisedResult,
    *,
    x: str = "variable_level",
    y: str = "n_persons",
    facet: str | list[str] | None = None,
    colour: str | None = None,
) -> Any

Bar plot of denominator population counts from prevalence results.

mock_incidence_prevalence

mock_incidence_prevalence(
    *,
    sample_size: int = 100,
    outcome_prevalence: float = 0.2,
    seed: int | None = None,
    study_start: date | None = None,
    study_end: date | None = None,
) -> CdmReference

Create a mock CDM reference for incidence/prevalence testing.

Generates synthetic person, observation_period, and cohort tables suitable for testing the estimation functions.

Parameters

sample_size: Number of persons.
outcome_prevalence: Probability that each person has an outcome event.
seed: Random seed for reproducibility.
study_start: Start of the study period. If None (the default), 2010-01-01 is used.
study_end: End of the study period. If None (the default), 2020-12-31 is used.

Returns

CdmReference: CDM with person, observation_period, target cohort, and outcome cohort tables.
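For intuition about this kind of synthetic data, the sketch below draws persons with random observation periods inside the study window and gives each an outcome with probability outcome_prevalence, using a seeded random.Random for reproducibility. It is a hypothetical stand-in, not omopy's generator, and the record layout is an assumption.

```python
from __future__ import annotations

import random
from datetime import date, timedelta


def mock_people(
    sample_size: int = 100,
    outcome_prevalence: float = 0.2,
    seed: int | None = None,
    study_start: date | None = None,
    study_end: date | None = None,
) -> list[dict]:
    """Hypothetical stand-in: one record per person with an observation period
    inside the study window and, with probability outcome_prevalence, an outcome date."""
    study_start = study_start if study_start is not None else date(2010, 1, 1)
    study_end = study_end if study_end is not None else date(2020, 12, 31)
    rng = random.Random(seed)  # seeded generator gives reproducible output
    span = (study_end - study_start).days
    people = []
    for person_id in range(1, sample_size + 1):
        a, b = sorted(rng.randint(0, span) for _ in range(2))
        obs_start = study_start + timedelta(days=a)
        obs_end = study_start + timedelta(days=b)
        outcome = None
        if rng.random() < outcome_prevalence:
            # Outcome date drawn uniformly within the observation period.
            outcome = obs_start + timedelta(days=rng.randint(0, b - a))
        people.append({
            "person_id": person_id,
            "observation_period": (obs_start, obs_end),
            "outcome_date": outcome,
        })
    return people


sample = mock_people(sample_size=5, seed=42)
print(sum(p["outcome_date"] is not None for p in sample), "of", len(sample), "with an outcome")
```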

benchmark_incidence_prevalence

benchmark_incidence_prevalence(
    cdm: CdmReference, *, analysis_type: str = "all"
) -> dict[str, float]

Run timing benchmarks on incidence and prevalence estimation.

Parameters

cdm: CDM reference with denominator and outcome cohorts.
analysis_type: "all", "incidence", or "prevalence".

Returns

dict[str, float]: Timing results in seconds.
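Timings like these are typically taken with time.perf_counter around each analysis. The sketch below shows that pattern with a hypothetical time_analyses helper. The keys of the dictionary that benchmark_incidence_prevalence returns are not documented here, so the names in the example are placeholders.

```python
from __future__ import annotations

import time
from typing import Callable


def time_analyses(analyses: dict[str, Callable[[], object]]) -> dict[str, float]:
    """Hypothetical helper: wall-clock seconds for each zero-argument callable."""
    timings: dict[str, float] = {}
    for name, run in analyses.items():
        t0 = time.perf_counter()  # monotonic, high-resolution clock
        run()
        timings[name] = time.perf_counter() - t0
    return timings


print(time_analyses({
    "placeholder_a": lambda: sum(range(100_000)),
    "placeholder_b": lambda: sorted(range(10_000), reverse=True),
}))
```

Single runs are noisy; repeating each analysis and keeping the minimum gives more stable comparisons.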