omopy.incidence¶
Incidence and prevalence estimation — generate denominator cohorts, compute incidence rates and prevalence proportions, and present results as tables and plots.
This module is the Python equivalent of the R IncidencePrevalence package.
Table rendering delegates to omopy.vis, plot rendering uses plotly, and confidence intervals use scipy.
Denominator Generation¶
Build denominator cohorts from observation periods, optionally stratified by age, sex, and prior observation requirements.
generate_denominator_cohort_set
¶
generate_denominator_cohort_set(
cdm: CdmReference,
name: str = "denominator",
*,
cohort_date_range: tuple[date | None, date | None] = (
None,
None,
),
age_group: list[tuple[int, int]] | None = None,
sex: list[Literal["Both", "Male", "Female"]]
| Literal["Both", "Male", "Female"] = "Both",
days_prior_observation: int | list[int] = 0,
requirement_interactions: bool = True,
) -> CdmReference
Generate denominator cohorts from the general population.
Creates one or more denominator cohorts based on observation periods,
optionally stratified by age group, sex, and required prior observation.
When requirement_interactions is True, every combination of
the supplied criteria generates a separate cohort.
Parameters¶
cdm
CDM reference containing person and observation_period tables.
name
Name for the output cohort table in the CDM.
cohort_date_range
(start_date, end_date) study window. None values use the
earliest/latest observation dates in the database.
age_group
List of (min_age, max_age) tuples. None (the default) is treated as [(0, 150)].
sex
"Both", "Male", "Female", or a list of these.
days_prior_observation
Required days of prior observation. Integer or list of integers.
requirement_interactions
If True, create cohorts for all combinations of criteria.
If False, criteria are applied independently.
Returns¶
CdmReference The CDM with the new denominator cohort table attached.
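As a rough illustration of what requirement_interactions=True implies, the sketch below enumerates every combination of the supplied criteria with itertools.product. It does not call omopy; the helper name expand_requirements and the dictionary layout are hypothetical and only show how quickly the number of cohorts grows.

```python
from itertools import product


def expand_requirements(age_groups, sexes, prior_obs_days):
    """List every (age_group, sex, days_prior_observation) combination.

    Hypothetical helper for illustration only; not part of omopy.
    """
    return [
        {"age_group": age, "sex": sex, "days_prior_observation": days}
        for age, sex, days in product(age_groups, sexes, prior_obs_days)
    ]


combos = expand_requirements(
    age_groups=[(0, 17), (18, 64), (65, 150)],
    sexes=["Male", "Female"],
    prior_obs_days=[0, 365],
)
print(len(combos))  # 3 age groups * 2 sexes * 2 prior-observation values
print(combos[0])
```

With three age groups, two sexes, and two prior-observation values, the cross-product yields twelve cohort definitions, so it is worth keeping the criteria lists short.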
generate_target_denominator_cohort_set
¶
generate_target_denominator_cohort_set(
cdm: CdmReference,
name: str = "denominator",
*,
target_cohort_table: str,
target_cohort_id: int | list[int] | None = None,
cohort_date_range: tuple[date | None, date | None] = (
None,
None,
),
time_at_risk: tuple[int, float]
| list[tuple[int, float]]
| None = None,
age_group: list[tuple[int, int]] | None = None,
sex: list[Literal["Both", "Male", "Female"]]
| Literal["Both", "Male", "Female"] = "Both",
days_prior_observation: int | list[int] = 0,
requirements_at_entry: bool = True,
requirement_interactions: bool = True,
) -> CdmReference
Generate denominator cohorts scoped to a target cohort.
Like generate_denominator_cohort_set, but restricts time
contribution to periods when a person is in a target cohort, with optional
time-at-risk windows relative to target cohort entry.
Parameters¶
cdm
CDM reference.
name
Name for the output cohort table.
target_cohort_table
Name of an existing cohort table in the CDM to use as target.
target_cohort_id
Which cohort IDs from the target table to use. None = all.
cohort_date_range
Study window.
time_at_risk
(start_offset, end_offset) in days relative to target cohort
entry. An end offset of float('inf') follows the person until the
end of observation. Pass a list of tuples to define multiple windows.
age_group, sex, days_prior_observation
Stratification criteria.
requirements_at_entry
If True, age/prior observation criteria must be met at
target cohort start. If False, contribution starts once
criteria are met during follow-up.
requirement_interactions
If True, create cohorts for the cross-product of all criteria.
Returns¶
CdmReference The CDM with the new denominator cohort table attached.
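The time_at_risk offsets can be read as a date window anchored at target cohort entry. The sketch below is a minimal stdlib illustration, not omopy code: the helper time_at_risk_window is hypothetical, and clamping a finite end offset to the observation end is an assumption about sensible behaviour rather than something stated above.

```python
import math
from datetime import date, timedelta


def time_at_risk_window(cohort_start, observation_end, time_at_risk):
    """Return (window_start, window_end), or None if the window is empty.

    Hypothetical helper: offsets are days relative to target cohort entry,
    and an infinite end offset means "follow until observation end".
    """
    start_offset, end_offset = time_at_risk
    window_start = cohort_start + timedelta(days=start_offset)
    if math.isinf(end_offset):
        window_end = observation_end
    else:
        # Assumption: follow-up never extends past the observation period.
        window_end = min(
            cohort_start + timedelta(days=int(end_offset)), observation_end
        )
    if window_start > window_end:
        return None
    return window_start, window_end


entry = date(2015, 3, 1)
obs_end = date(2016, 12, 31)
first_month = time_at_risk_window(entry, obs_end, (0, 30))
remainder = time_at_risk_window(entry, obs_end, (31, float("inf")))
print(first_month)
print(remainder)
```

Passing both tuples as a list to time_at_risk would correspond to two separate windows: the first 31 days after entry, and everything after that until observation ends.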
Core Estimation¶
Compute incidence rates and prevalence proportions over calendar intervals with confidence intervals, washout logic, and strata support.
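To make the idea of calendar intervals concrete, the sketch below builds yearly intervals inside a study window and optionally drops those that are only partly covered, which is the idea behind complete_database_intervals. The helper yearly_intervals is hypothetical, and truncating partial intervals to the window when complete_only is False is an assumption.

```python
from datetime import date


def yearly_intervals(window_start, window_end, complete_only=True):
    """Return (start, end) date pairs for each calendar year in the window.

    Hypothetical helper; not part of omopy.
    """
    intervals = []
    for year in range(window_start.year, window_end.year + 1):
        start, end = date(year, 1, 1), date(year, 12, 31)
        if complete_only:
            # Skip years that are not fully inside the window.
            if start < window_start or end > window_end:
                continue
        else:
            # Assumption: partial years are truncated to the window.
            start, end = max(start, window_start), min(end, window_end)
        intervals.append((start, end))
    return intervals


complete = yearly_intervals(date(2010, 6, 1), date(2013, 3, 31))
partial = yearly_intervals(date(2010, 6, 1), date(2013, 3, 31), complete_only=False)
print(complete)
print(partial)
```

For a window from June 2010 to March 2013, only 2011 and 2012 survive as complete years; allowing partial intervals adds the truncated 2010 and 2013 periods.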
estimate_incidence
¶
estimate_incidence(
cdm: CdmReference,
denominator_table: str,
outcome_table: str,
*,
censor_table: str | None = None,
denominator_cohort_id: int | list[int] | None = None,
outcome_cohort_id: int | list[int] | None = None,
censor_cohort_id: int | list[int] | None = None,
interval: Literal[
"weeks", "months", "quarters", "years", "overall"
] = "years",
complete_database_intervals: bool = True,
outcome_washout: int | float = float("inf"),
repeated_events: bool = False,
strata: list[str] | None = None,
include_overall_strata: bool = True,
) -> SummarisedResult
Estimate incidence rates from denominator and outcome cohorts.
Computes incidence as outcome events per 100,000 person-years with exact Poisson confidence intervals.
Parameters¶
cdm
CDM reference.
denominator_table
Name of the denominator cohort table.
outcome_table
Name of the outcome cohort table.
censor_table
Optional censoring cohort table name.
denominator_cohort_id
Which denominator cohort IDs to use. None = all.
outcome_cohort_id
Which outcome cohort IDs to use. None = all.
censor_cohort_id
Which censor cohort IDs to use.
interval
Time interval for rate calculation.
complete_database_intervals
Only include intervals fully captured by database observation.
outcome_washout
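Washout period in days required after an outcome event before a person can have another event counted. float('inf') means only the first event per person is counted.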
repeated_events
Allow multiple events per person.
strata
Column names in the denominator table for stratification.
include_overall_strata
Include an overall (unstratified) analysis.
Returns¶
SummarisedResult Summarised result with incidence estimates.
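The rate and its interval can be reproduced by hand for a single interval. The sketch below computes events per 100,000 person-years with an exact (Garwood) Poisson interval. The module itself is documented as using scipy for confidence intervals; this example instead solves for the bounds by bisection on the Poisson CDF so that it runs with the standard library alone. The 365.25 days-per-year conversion and all function names are assumptions made for illustration.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption: conversion from person-days to person-years


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def solve_mean(k, target):
    """Find mu such that poisson_cdf(k, mu) == target by bisection."""
    lo, hi = 0.0, float(k + 1)
    while poisson_cdf(k, hi) > target:  # the CDF decreases as mu grows
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(events, alpha=0.05):
    """Exact two-sided confidence interval for a Poisson count."""
    lower = 0.0 if events == 0 else solve_mean(events - 1, 1 - alpha / 2)
    upper = solve_mean(events, alpha / 2)
    return lower, upper


def incidence_per_100k(events, person_days):
    """Return (rate, lower, upper) per 100,000 person-years."""
    person_years = person_days / DAYS_PER_YEAR
    lower, upper = exact_poisson_ci(events)
    scale = 100_000 / person_years
    return events * scale, lower * scale, upper * scale


rate, ci_low, ci_high = incidence_per_100k(events=10, person_days=365_250)
print(round(rate, 1), round(ci_low, 1), round(ci_high, 1))
```

The bisection is only suitable for small to moderate counts, because math.exp(-mu) underflows for very large means; a chi-square quantile function, such as the one in scipy, avoids that limitation.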
estimate_point_prevalence
¶
estimate_point_prevalence(
cdm: CdmReference,
denominator_table: str,
outcome_table: str,
*,
denominator_cohort_id: int | list[int] | None = None,
outcome_cohort_id: int | list[int] | None = None,
interval: Literal[
"weeks", "months", "quarters", "years", "overall"
] = "years",
time_point: Literal["start", "middle", "end"] = "start",
strata: list[str] | None = None,
include_overall_strata: bool = True,
) -> SummarisedResult
Estimate point prevalence at a specific time within each interval.
Computes the proportion of persons in the denominator who have the outcome on a given date within each calendar interval.
Parameters¶
cdm
CDM reference.
denominator_table
Name of the denominator cohort table.
outcome_table
Name of the outcome cohort table.
denominator_cohort_id, outcome_cohort_id
Cohort ID filters.
interval
Calendar interval.
time_point
Where in the interval to measure: "start", "middle", or "end".
strata
Stratification columns.
include_overall_strata
Include unstratified analysis.
Returns¶
SummarisedResult Summarised result with point prevalence estimates.
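A small worked example may help show how time_point changes the estimate. The sketch below uses plain tuples instead of cohort tables; the helpers time_point_date and point_prevalence are hypothetical, and defining "middle" as the start plus half the interval length (rounded down to a whole day) is an assumption.

```python
from datetime import date


def time_point_date(interval_start, interval_end, time_point):
    """Pick the measurement date inside an interval (hypothetical helper)."""
    if time_point == "start":
        return interval_start
    if time_point == "end":
        return interval_end
    if time_point == "middle":
        # Assumption: midpoint rounded down to a whole day.
        return interval_start + (interval_end - interval_start) // 2
    raise ValueError(f"unknown time_point: {time_point!r}")


def point_prevalence(denominator, outcome, on_date):
    """Proportion of persons observed on `on_date` who have the outcome.

    Both inputs are lists of (person_id, start_date, end_date) tuples.
    """
    observed = {p for p, start, end in denominator if start <= on_date <= end}
    cases = {p for p, start, end in outcome if start <= on_date <= end} & observed
    return len(cases) / len(observed) if observed else float("nan")


denominator = [
    (1, date(2015, 1, 1), date(2015, 12, 31)),
    (2, date(2015, 1, 1), date(2015, 12, 31)),
    (3, date(2015, 6, 1), date(2015, 12, 31)),
    (4, date(2015, 1, 1), date(2015, 3, 31)),
]
outcome = [
    (1, date(2015, 2, 1), date(2015, 8, 31)),
    (3, date(2015, 7, 1), date(2015, 7, 31)),
]

year = (date(2015, 1, 1), date(2015, 12, 31))
estimates = {
    tp: point_prevalence(denominator, outcome, time_point_date(*year, tp))
    for tp in ("start", "middle", "end")
}
print(estimates)
```

In this toy data no outcome is active on 1 January or 31 December, while two of the three persons observed mid-year have one, so the choice of time_point can change the estimate substantially.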
estimate_period_prevalence
¶
estimate_period_prevalence(
cdm: CdmReference,
denominator_table: str,
outcome_table: str,
*,
denominator_cohort_id: int | list[int] | None = None,
outcome_cohort_id: int | list[int] | None = None,
interval: Literal[
"weeks", "months", "quarters", "years", "overall"
] = "years",
complete_database_intervals: bool = True,
full_contribution: bool = False,
strata: list[str] | None = None,
include_overall_strata: bool = True,
) -> SummarisedResult
Estimate period prevalence over each calendar interval.
Computes the proportion of persons whose outcome overlaps each interval, among those contributing time to that interval.
Parameters¶
cdm
CDM reference.
denominator_table
Name of the denominator cohort table.
outcome_table
Name of the outcome cohort table.
denominator_cohort_id, outcome_cohort_id
Cohort ID filters.
interval
Calendar interval.
complete_database_intervals
Only include intervals fully captured by observation.
full_contribution
Require the person to be observed for the full interval.
strata
Stratification columns.
include_overall_strata
Include unstratified analysis.
Returns¶
SummarisedResult Summarised result with period prevalence estimates.
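The effect of full_contribution is easiest to see on a toy interval. The sketch below is a simplified illustration with plain tuples, not the omopy implementation: the helper period_prevalence is hypothetical, and it treats any outcome overlap with the interval as a case without checking that the overlap falls inside the person's own denominator time.

```python
from datetime import date


def period_prevalence(denominator, outcome, interval_start, interval_end,
                      full_contribution=False):
    """Proportion of contributing persons whose outcome overlaps the interval.

    Both inputs are lists of (person_id, start_date, end_date) tuples.
    """
    def overlaps(start, end):
        return start <= interval_end and end >= interval_start

    def covers(start, end):
        return start <= interval_start and end >= interval_end

    # With full_contribution, only persons observed for the whole interval count.
    keep = covers if full_contribution else overlaps
    contributors = {p for p, start, end in denominator if keep(start, end)}
    cases = {p for p, start, end in outcome if overlaps(start, end)} & contributors
    return len(cases) / len(contributors) if contributors else float("nan")


denominator = [
    (1, date(2015, 1, 1), date(2015, 12, 31)),
    (2, date(2015, 1, 1), date(2015, 12, 31)),
    (3, date(2015, 6, 1), date(2015, 12, 31)),
    (4, date(2015, 1, 1), date(2015, 3, 31)),
]
outcome = [
    (1, date(2015, 2, 1), date(2015, 8, 31)),
    (3, date(2015, 7, 1), date(2015, 7, 31)),
    (4, date(2015, 2, 1), date(2015, 2, 15)),
]

year = (date(2015, 1, 1), date(2015, 12, 31))
any_time = period_prevalence(denominator, outcome, *year)
full_only = period_prevalence(denominator, outcome, *year, full_contribution=True)
print(any_time, full_only)
```

Requiring full contribution removes persons 3 and 4 from the denominator, which here lowers the estimate because both of them had an outcome during the year.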
Result Conversion¶
Pivot long-form SummarisedResult objects into wide tidy DataFrames with
named columns for each estimate.
as_incidence_result
¶
Convert a summarised result to a tidy incidence DataFrame.
Pivots the long-form SummarisedResult into a wide DataFrame with one row per interval and columns for each estimate.
Parameters¶
result
A SummarisedResult from estimate_incidence.
metadata
If True, include settings metadata columns.
Returns¶
pl.DataFrame Wide-form incidence results.
as_prevalence_result
¶
Convert a summarised result to a tidy prevalence DataFrame.
Pivots the long-form SummarisedResult into a wide DataFrame with one row per interval and columns for each estimate.
Parameters¶
result
A SummarisedResult from estimate_point_prevalence or
estimate_period_prevalence.
metadata
If True, include settings metadata columns.
Returns¶
pl.DataFrame Wide-form prevalence results.
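The long-to-wide pivot performed by the conversion functions above can be pictured with plain dictionaries. In the sketch below, the column names estimate_name and estimate_value, the string-typed values, and the numbers themselves are assumptions made up for illustration; only variable_level and the incidence estimate names appear elsewhere in this reference. The real functions return a polars DataFrame, which this stdlib example does not attempt to reproduce.

```python
from collections import defaultdict

# Made-up long-form rows: one row per (interval, estimate) pair.
long_rows = [
    {"variable_level": "2019", "estimate_name": "incidence_100000_pys", "estimate_value": "812.4"},
    {"variable_level": "2019", "estimate_name": "incidence_100000_pys_95ci_lower", "estimate_value": "701.9"},
    {"variable_level": "2019", "estimate_name": "incidence_100000_pys_95ci_upper", "estimate_value": "935.6"},
    {"variable_level": "2020", "estimate_name": "incidence_100000_pys", "estimate_value": "640.0"},
    {"variable_level": "2020", "estimate_name": "incidence_100000_pys_95ci_lower", "estimate_value": "545.2"},
    {"variable_level": "2020", "estimate_name": "incidence_100000_pys_95ci_upper", "estimate_value": "746.8"},
]


def pivot_wide(rows, index="variable_level"):
    """Collect each estimate into its own column, one output row per index value."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[index]][row["estimate_name"]] = float(row["estimate_value"])
    return [{index: key, **estimates} for key, estimates in wide.items()]


wide_rows = pivot_wide(long_rows)
for row in wide_rows:
    print(row)
```

Each output row now carries the point estimate and both interval bounds side by side, which is the shape the plotting defaults (y, y_min, y_max) expect to find by name.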
Table Functions¶
Thin wrappers around vis_omop_table() with epidemiological formatting
defaults.
table_incidence
¶
table_incidence(
result: SummarisedResult,
*,
type: Literal["gt", "polars"] | None = None,
header: list[str] | None = None,
group_column: list[str] | None = None,
settings_column: list[str] | None = None,
hide: list[str] | None = None,
style: Any | None = None,
options: dict[str, Any] | None = None,
) -> Any
Render an incidence results table.
Parameters¶
result
A SummarisedResult from estimate_incidence.
type
"gt" for great_tables, "polars" for DataFrame.
header
Columns to pivot into header.
group_column
Row grouping columns.
settings_column
Settings columns to include.
hide
Columns to hide.
style
Table style configuration.
options
Override options from options_table_incidence.
Returns¶
great_tables.GT or polars.DataFrame
table_prevalence
¶
table_prevalence(
result: SummarisedResult,
*,
type: Literal["gt", "polars"] | None = None,
header: list[str] | None = None,
group_column: list[str] | None = None,
settings_column: list[str] | None = None,
hide: list[str] | None = None,
style: Any | None = None,
options: dict[str, Any] | None = None,
) -> Any
Render a prevalence results table.
Parameters¶
result
A SummarisedResult from a prevalence estimation function.
type
"gt" for great_tables, "polars" for DataFrame.
header
Columns to pivot into header.
group_column
Row grouping columns.
settings_column
Settings columns to include.
hide
Columns to hide.
style
Table style configuration.
options
Override options from options_table_prevalence.
Returns¶
great_tables.GT or polars.DataFrame
table_incidence_attrition
¶
table_prevalence_attrition
¶
options_table_incidence
¶
Return default table options for incidence results.
Returns¶
dict
Default options for table_incidence.
options_table_prevalence
¶
Return default table options for prevalence results.
Returns¶
dict
Default options for table_prevalence.
Plot Functions¶
Wrappers around scatter_plot() and bar_plot() with epidemiological
defaults.
plot_incidence
¶
plot_incidence(
result: SummarisedResult,
*,
x: str = "variable_level",
y: str = "incidence_100000_pys",
line: bool = True,
point: bool = True,
ribbon: bool = True,
y_min: str | None = "incidence_100000_pys_95ci_lower",
y_max: str | None = "incidence_100000_pys_95ci_upper",
facet: str | list[str] | None = None,
colour: str | None = None,
) -> Any
Plot incidence rates as a line plot with confidence ribbons.
Parameters¶
result
A SummarisedResult from estimate_incidence.
x
Column for x-axis (default: interval labels).
y
Estimate name for y-axis.
line, point, ribbon
Display elements.
y_min, y_max
Estimate names for CI ribbon bounds.
facet
Faceting column(s).
colour
Colour grouping column.
Returns¶
plotly.graph_objects.Figure
plot_prevalence
¶
plot_prevalence(
result: SummarisedResult,
*,
x: str = "variable_level",
y: str = "prevalence",
line: bool = True,
point: bool = True,
ribbon: bool = True,
y_min: str | None = "prevalence_95ci_lower",
y_max: str | None = "prevalence_95ci_upper",
facet: str | list[str] | None = None,
colour: str | None = None,
) -> Any
Plot prevalence proportions as a line plot with confidence ribbons.
Parameters¶
result
A SummarisedResult from a prevalence estimation function.
x
Column for x-axis.
y
Estimate name for y-axis.
line, point, ribbon
Display elements.
y_min, y_max
Estimate names for CI ribbon bounds.
facet
Faceting column(s).
colour
Colour grouping column.
Returns¶
plotly.graph_objects.Figure
plot_incidence_population
¶
plot_prevalence_population
¶
Grouping Helpers¶
Discover available grouping columns for faceted plots.
available_incidence_grouping
¶
available_prevalence_grouping
¶
Mock Data & Benchmarking¶
mock_incidence_prevalence
¶
mock_incidence_prevalence(
*,
sample_size: int = 100,
outcome_prevalence: float = 0.2,
seed: int | None = None,
study_start: date | None = None,
study_end: date | None = None,
) -> CdmReference
Create a mock CDM reference for incidence/prevalence testing.
Generates synthetic person, observation_period, and cohort tables suitable for testing the estimation functions.
Parameters¶
sample_size
Number of persons.
outcome_prevalence
Probability that each person has an outcome event.
seed
Random seed for reproducibility.
study_start
Start of study period. Default: 2010-01-01.
study_end
End of study period. Default: 2020-12-31.
Returns¶
CdmReference
CDM with person, observation_period, target cohort,
and outcome cohort tables.
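The roles of seed and outcome_prevalence can be illustrated without the library. The sketch below draws one Bernoulli flag per person from a seeded generator; the helper draw_outcome_flags is hypothetical and makes no claim about how mock_incidence_prevalence generates its tables internally.

```python
import random


def draw_outcome_flags(sample_size=100, outcome_prevalence=0.2, seed=None):
    """Return one True/False outcome flag per synthetic person.

    Hypothetical helper; a fixed seed makes the draws reproducible.
    """
    rng = random.Random(seed)
    return [rng.random() < outcome_prevalence for _ in range(sample_size)]


first = draw_outcome_flags(seed=42)
second = draw_outcome_flags(seed=42)
print(first == second)  # identical draws for the same seed
print(sum(first), "of", len(first), "persons have an outcome")
```

Because each person is drawn independently, the realised share of persons with an outcome varies around outcome_prevalence, so tests that depend on exact counts should fix the seed.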