omopy.characteristics
Cohort characterisation analytics: summarise, tabulate, and plot cohort characteristics, counts, attrition, timing, overlap, large-scale characteristics, and codelist usage.
This module is the Python equivalent of the R CohortCharacteristics package.
Table rendering delegates to omopy.vis; plot rendering uses Plotly.
Summarise Functions
The core analytical functions. Each queries cohort data (optionally enriching
with demographics and intersections via omopy.profiles) and produces a
SummarisedResult.
summarise_characteristics
summarise_characteristics(
cohort: CohortTable,
*,
cohort_id: list[int] | None = None,
strata: list[str | list[str]] | None = None,
counts: bool = True,
demographics: bool = True,
age_group: dict[str, tuple[float, float]]
| list[tuple[float, float]]
| None = None,
table_intersect_flag: list[dict[str, Any]]
| None = None,
table_intersect_count: list[dict[str, Any]]
| None = None,
table_intersect_date: list[dict[str, Any]]
| None = None,
table_intersect_days: list[dict[str, Any]]
| None = None,
cohort_intersect_flag: list[dict[str, Any]]
| None = None,
cohort_intersect_count: list[dict[str, Any]]
| None = None,
cohort_intersect_date: list[dict[str, Any]]
| None = None,
cohort_intersect_days: list[dict[str, Any]]
| None = None,
concept_intersect_flag: list[dict[str, Any]]
| None = None,
concept_intersect_count: list[dict[str, Any]]
| None = None,
concept_intersect_date: list[dict[str, Any]]
| None = None,
concept_intersect_days: list[dict[str, Any]]
| None = None,
other_variables: list[str] | None = None,
estimates: dict[str, tuple[str, ...]] | None = None,
) -> SummarisedResult
Summarise cohort characteristics including demographics and intersections.
This is the main entry point for cohort characterisation. It:
- Enriches cohort records with demographics (age, sex, observation periods)
- Adds any requested intersections (table, cohort, concept)
- Aggregates per cohort × stratum into a standardised SummarisedResult
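Age groups are part of the demographics enrichment. The sketch below shows one way an age_group specification (dict or list form, see the parameter below) could map ages to labels. label_age is a hypothetical stdlib helper, and the inclusive bounds and "lo to hi" labels for the list form are assumptions, not documented omopy behaviour.

```python
# Hypothetical helper (not an omopy API): maps an age to an age-group label.
def label_age(age, age_group):
    """Return the label of the first group whose bounds contain age, else None."""
    if isinstance(age_group, dict):
        # Dict form: explicit labels mapped to (min, max) bounds.
        groups = list(age_group.items())
    else:
        # List form: labels generated from the bounds, e.g. (0, 17) -> "0 to 17".
        # This labelling scheme is an assumption.
        groups = [(f"{lo:g} to {hi:g}", (lo, hi)) for lo, hi in age_group]
    for label, (lo, hi) in groups:
        if lo <= age <= hi:  # bounds treated as inclusive (assumption)
            return label
    return None


named = {"child": (0, 17), "adult": (18, float("inf"))}
print(label_age(42, named))                # adult
print(label_age(10, [(0, 17), (18, 64)]))  # 0 to 17
print(label_age(70, [(0, 17), (18, 64)]))  # None
```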
Parameters
cohort
A CohortTable to summarise.
cohort_id
Restrict to specific cohort definition IDs. None = all.
strata
Stratification columns. Each element is a column name or list of
column names to cross-stratify. The overall (unstratified) result
is always included.
counts
Include subject/record counts.
demographics
Include demographic variables (age, sex, prior/future observation,
days in cohort).
age_group
Age grouping specification, forwarded to add_demographics().
table_intersect_flag, table_intersect_count,
table_intersect_date, table_intersect_days
Lists of keyword-argument dicts forwarded to the corresponding
omopy.profiles.add_table_intersect_*() function.
cohort_intersect_flag, cohort_intersect_count,
cohort_intersect_date, cohort_intersect_days
Lists of keyword-argument dicts forwarded to the corresponding
omopy.profiles.add_cohort_intersect_*() function.
concept_intersect_flag, concept_intersect_count,
concept_intersect_date, concept_intersect_days
Lists of keyword-argument dicts forwarded to the corresponding
omopy.profiles.add_concept_intersect_*() function.
other_variables
Additional columns already present in the cohort to summarise.
estimates
Override default estimates per variable name. Keys are variable
names, values are tuples of estimate names.
Returns
SummarisedResult
With result_type="summarise_characteristics".
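To make the strata semantics concrete, the stdlib sketch below counts records per strata group for a specification such as ["sex", ["sex", "age_group"]]. It assumes the SummarisedResult convention of joining several names and levels with " &&& " (as in the group_name values of the timing and overlap results) and of labelling the unstratified group "overall". expand_strata is a hypothetical helper, not omopy's implementation.

```python
from collections import Counter


# Hypothetical helper: expands a strata specification into
# (strata_name, strata_level) groups and counts records in each.
def expand_strata(rows, strata):
    counts = Counter()
    for row in rows:
        # The overall (unstratified) group is always included.
        counts[("overall", "overall")] += 1
        for spec in strata:
            # A single column name or a list of names to cross-stratify.
            cols = [spec] if isinstance(spec, str) else list(spec)
            name = " &&& ".join(cols)
            level = " &&& ".join(str(row[c]) for c in cols)
            counts[(name, level)] += 1
    return dict(counts)


rows = [
    {"subject_id": 1, "sex": "Female", "age_group": "0 to 49"},
    {"subject_id": 2, "sex": "Male", "age_group": "50 to 150"},
    {"subject_id": 3, "sex": "Female", "age_group": "50 to 150"},
]
result = expand_strata(rows, ["sex", ["sex", "age_group"]])
print(result[("sex", "Female")])  # 2
```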
summarise_cohort_count
summarise_cohort_count(
cohort: CohortTable,
*,
cohort_id: list[int] | None = None,
strata: list[str | list[str]] | None = None,
) -> SummarisedResult
Summarise subject and record counts per cohort.
Thin wrapper around summarise_characteristics() with
counts=True, demographics=False.
Parameters
cohort
A CohortTable to count.
cohort_id
Restrict to specific cohort definition IDs. None = all.
strata
Stratification columns.
Returns
SummarisedResult
With result_type="summarise_cohort_count".
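The difference between record and subject counts can be shown with a small stdlib sketch: each cohort row is one record, and a subject with several entries is counted once. cohort_counts is hypothetical, and the keys number_records/number_subjects mirror the OMOP attrition columns rather than the exact variable names omopy writes.

```python
# Hypothetical sketch (not omopy's implementation): records vs subjects.
def cohort_counts(rows):
    records = {}
    subjects = {}
    for row in rows:
        cid = row["cohort_definition_id"]
        records[cid] = records.get(cid, 0) + 1                   # every entry counts
        subjects.setdefault(cid, set()).add(row["subject_id"])   # distinct people
    return {
        cid: {"number_records": records[cid], "number_subjects": len(subjects[cid])}
        for cid in records
    }


cohort = [
    {"cohort_definition_id": 1, "subject_id": 10},
    {"cohort_definition_id": 1, "subject_id": 10},  # second entry, same person
    {"cohort_definition_id": 1, "subject_id": 11},
    {"cohort_definition_id": 2, "subject_id": 10},
]
counts = cohort_counts(cohort)
print(counts[1])  # {'number_records': 3, 'number_subjects': 2}
```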
summarise_cohort_attrition
summarise_cohort_attrition(
cohort: CohortTable,
*,
cohort_id: list[int] | None = None,
) -> SummarisedResult
Summarise cohort attrition as a SummarisedResult.
Pivots the attrition table (reasons, excluded counts) into the standard long-format result.
Parameters
cohort
A CohortTable with attrition data.
cohort_id
Restrict to specific cohort definition IDs. None = all.
Returns
SummarisedResult
With result_type="summarise_cohort_attrition",
strata_name="reason", additional_name="reason_id".
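A sketch of the pivot, assuming attrition rows with the usual OMOP attrition columns (reason_id, reason, number_records, number_subjects, excluded_records, excluded_subjects). The output keeps only a simplified subset of SummarisedResult columns, with estimate values stored as strings, and pivot_attrition is a hypothetical helper.

```python
# Hypothetical sketch: wide attrition table -> long-format rows.
ATTRITION_COUNTS = (
    "number_records",
    "number_subjects",
    "excluded_records",
    "excluded_subjects",
)


def pivot_attrition(attrition):
    long_rows = []
    for row in sorted(attrition, key=lambda r: r["reason_id"]):
        # One long row per reason and per count column.
        for variable in ATTRITION_COUNTS:
            long_rows.append({
                "strata_name": "reason",
                "strata_level": row["reason"],
                "variable_name": variable,
                "estimate_value": str(row[variable]),
                "additional_name": "reason_id",
                "additional_level": str(row["reason_id"]),
            })
    return long_rows


attrition = [
    {"reason_id": 1, "reason": "Initial qualifying events", "number_records": 120,
     "number_subjects": 100, "excluded_records": 0, "excluded_subjects": 0},
    {"reason_id": 2, "reason": "Age >= 18", "number_records": 90,
     "number_subjects": 75, "excluded_records": 30, "excluded_subjects": 25},
]
long_rows = pivot_attrition(attrition)
print(len(long_rows))  # 8
```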
summarise_cohort_timing
summarise_cohort_timing(
cohort: CohortTable,
*,
cohort_id: list[int] | None = None,
strata: list[str | list[str]] | None = None,
restrict_to_first_entry: bool = True,
estimates: tuple[str, ...] = (
"min",
"q25",
"median",
"q75",
"max",
),
) -> SummarisedResult
Summarise pairwise timing between cohort entries.
For each pair of cohorts, computes the distribution of days between cohort entries for subjects appearing in both.
Parameters
cohort
A CohortTable.
cohort_id
Restrict to specific cohort definition IDs. None = all.
strata
Stratification columns (must exist before the join).
restrict_to_first_entry
If True, only consider the first entry per subject per cohort.
estimates
Statistics to compute on days_between_cohort_entries.
Returns
SummarisedResult
With result_type="summarise_cohort_timing",
group_name="cohort_name_reference &&& cohort_name_comparator".
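A stdlib sketch of the pairwise timing computation with restrict_to_first_entry=True: take each subject's first entry in the reference and comparator cohorts, keep subjects present in both, and summarise the day differences. The sign convention (comparator minus reference) and the quartile method (statistics.quantiles with method="inclusive") are assumptions; first_entries and timing are hypothetical helpers.

```python
import statistics
from datetime import date, timedelta


def first_entries(rows, cohort):
    """Earliest cohort start date per subject within one cohort."""
    firsts = {}
    for r in rows:
        if r["cohort"] == cohort:
            sid = r["subject_id"]
            if sid not in firsts or r["start"] < firsts[sid]:
                firsts[sid] = r["start"]
    return firsts


def timing(rows, reference, comparator):
    """Distribution of days from reference entry to comparator entry."""
    ref = first_entries(rows, reference)
    comp = first_entries(rows, comparator)
    # Only subjects present in both cohorts contribute.
    days = sorted((comp[s] - ref[s]).days for s in ref.keys() & comp.keys())
    q25, _, q75 = statistics.quantiles(days, n=4, method="inclusive")
    return {
        "min": days[0],
        "q25": q25,
        "median": statistics.median(days),
        "q75": q75,
        "max": days[-1],
    }


base = date(2020, 1, 1)
rows = [{"cohort": "A", "subject_id": s, "start": base} for s in range(1, 7)]
rows += [
    {"cohort": "B", "subject_id": s, "start": base + timedelta(days=10 * s)}
    for s in range(1, 6)
]
# A later second entry in B for subject 1 is ignored (first entry only).
rows.append({"cohort": "B", "subject_id": 1, "start": base + timedelta(days=500)})

stats = timing(rows, "A", "B")
print(stats)  # {'min': 10, 'q25': 20.0, 'median': 30, 'q75': 40.0, 'max': 50}
```

Subject 6 appears only in cohort A, so it contributes nothing.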
summarise_cohort_overlap
summarise_cohort_overlap(
cohort: CohortTable,
*,
cohort_id: list[int] | None = None,
strata: list[str | list[str]] | None = None,
overlap_by: str = "subject_id",
) -> SummarisedResult
Summarise pairwise overlap between cohorts.
For each pair of cohorts, counts subjects in only the reference, only the comparator, or in both.
Parameters
cohort
A CohortTable.
cohort_id
Restrict to specific cohort definition IDs. None = all.
strata
Stratification columns.
overlap_by
Column identifying unique entities (default: "subject_id").
Returns
SummarisedResult
With result_type="summarise_cohort_overlap",
group_name="cohort_name_reference &&& cohort_name_comparator".
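With overlap_by="subject_id", the pairwise overlap reduces to set arithmetic on each cohort's distinct subjects. The stdlib sketch below computes it for every ordered (reference, comparator) pair; overlap is a hypothetical helper, and its category names are illustrative rather than the variable names omopy emits.

```python
from itertools import permutations


def overlap(rows, by="subject_id"):
    """Pairwise overlap counts of distinct `by` values between cohorts."""
    members = {}
    for r in rows:
        members.setdefault(r["cohort_name"], set()).add(r[by])
    result = {}
    # Every ordered (reference, comparator) pair, e.g. A->B and B->A.
    for ref, comp in permutations(sorted(members), 2):
        a, b = members[ref], members[comp]
        result[(ref, comp)] = {
            "only_in_reference": len(a - b),
            "in_both": len(a & b),
            "only_in_comparator": len(b - a),
        }
    return result


rows = [{"cohort_name": "asthma", "subject_id": s} for s in (1, 2, 3, 4)]
rows += [{"cohort_name": "copd", "subject_id": s} for s in (3, 4, 5)]
rows.append({"cohort_name": "copd", "subject_id": 5})  # repeat entry, counted once
pairs = overlap(rows)
print(pairs[("asthma", "copd")])
# {'only_in_reference': 2, 'in_both': 2, 'only_in_comparator': 1}
```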
summarise_large_scale_characteristics
summarise_large_scale_characteristics(
cohort: CohortTable,
*,
cohort_id: list[int] | None = None,
strata: list[str | list[str]] | None = None,
window: list[Window] | None = None,
event_in_window: list[str] | None = None,
episode_in_window: list[str] | None = None,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
minimum_frequency: float = 0.005,
excluded_codes: list[int] | None = None,
) -> SummarisedResult
Summarise large-scale characteristics (concept-level prevalence).
For each specified OMOP domain table and time window, computes the frequency of each concept relative to the cohort.
Parameters
cohort
A CohortTable.
cohort_id
Restrict to specific cohort definition IDs. None = all.
strata
Stratification columns.
window
Time windows as (lower, upper) day offsets from index_date.
Defaults to standard epidemiological windows.
event_in_window
OMOP table names to count events (point-in-time). E.g.
["condition_occurrence", "drug_exposure"].
episode_in_window
OMOP table names to count episodes (interval overlap).
index_date
Column name for the index date.
censor_date
Column name for censoring. None = no censoring.
minimum_frequency
Minimum frequency threshold (0–1) to include a concept.
excluded_codes
Concept IDs to exclude from results.
Returns
SummarisedResult
With result_type="summarise_large_scale_characteristics".
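A minimal sketch of concept-level prevalence for event_in_window. A subject counts towards a concept if any event with that concept falls within [lower, upper] days of the index date. The frequency is that subject count divided by the cohort size, and concepts below minimum_frequency or listed in excluded_codes are dropped. concept_frequencies is hypothetical and assumes one cohort record per subject; the real function handles multiple windows, tables, and strata.

```python
from datetime import date, timedelta


def concept_frequencies(cohort, events, window, minimum_frequency=0.005, excluded_codes=()):
    """Share of cohort subjects with at least one event of each concept in the window."""
    lower, upper = window
    # Assumes one cohort record per subject, keyed by subject_id.
    index = {c["subject_id"]: c["cohort_start_date"] for c in cohort}
    hits = {}
    for e in events:
        sid, concept = e["subject_id"], e["concept_id"]
        if sid not in index or concept in excluded_codes:
            continue
        offset = (e["date"] - index[sid]).days  # days relative to the index date
        if lower <= offset <= upper:
            hits.setdefault(concept, set()).add(sid)
    n = len(index)
    freqs = {concept: len(sids) / n for concept, sids in hits.items()}
    return {c: f for c, f in freqs.items() if f >= minimum_frequency}


start = date(2021, 6, 1)
cohort = [{"subject_id": s, "cohort_start_date": start} for s in (1, 2, 3, 4)]
events = [
    {"subject_id": 1, "concept_id": 111, "date": start - timedelta(days=10)},
    {"subject_id": 3, "concept_id": 111, "date": start - timedelta(days=200)},
    {"subject_id": 2, "concept_id": 111, "date": start - timedelta(days=400)},  # outside window
    {"subject_id": 4, "concept_id": 222, "date": start - timedelta(days=1)},
    {"subject_id": 4, "concept_id": 999, "date": start - timedelta(days=3)},
]
prevalence = concept_frequencies(
    cohort, events, (-365, -1), minimum_frequency=0.3, excluded_codes=[999]
)
print(prevalence)  # {111: 0.5}
```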
summarise_cohort_codelist
Table Functions
Thin wrappers around vis_omop_table() / vis_table() with
domain-specific defaults for estimate formatting, headers, and grouping.
table_characteristics
table_characteristics(
result: SummarisedResult,
*,
type: Literal["gt", "polars"] | None = None,
header: list[str] | None = None,
group_column: list[str] | None = None,
hide: list[str] | None = None,
style: Any | None = None,
**options: Any,
) -> Any
Render a characteristics table.
Parameters
result
A SummarisedResult with result_type="summarise_characteristics".
type
Output format: "gt" for great_tables, "polars" for DataFrame.
header
Columns to pivot into header. Defaults to
["cdm_name", "cohort_name"].
group_column
Columns for row grouping.
hide
Columns to hide.
style
A TableStyle for styling.
Returns
great_tables.GT or polars.DataFrame
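As a rough illustration of the header option, the stdlib sketch below turns each distinct combination of the header columns (here the default ["cdm_name", "cohort_name"]) into one output column and keys rows by the remaining label columns. pivot_header and its " | " column labels are hypothetical; the real rendering is delegated to omopy.vis and may lay things out differently.

```python
# Hypothetical sketch of pivoting long-format rows into header columns.
def pivot_header(rows, header, row_keys, value="estimate_value"):
    columns = []  # output column labels, in first-seen order
    table = {}    # row key -> {column label: value}
    for r in rows:
        col = " | ".join(str(r[h]) for h in header)
        if col not in columns:
            columns.append(col)
        key = tuple(r[k] for k in row_keys)
        table.setdefault(key, {})[col] = r[value]
    return columns, table


rows = [
    {"cdm_name": "db1", "cohort_name": "asthma", "variable_name": "Number subjects", "estimate_value": "100"},
    {"cdm_name": "db1", "cohort_name": "copd", "variable_name": "Number subjects", "estimate_value": "80"},
    {"cdm_name": "db1", "cohort_name": "asthma", "variable_name": "Age", "estimate_value": "54"},
]
columns, table = pivot_header(rows, ["cdm_name", "cohort_name"], ["variable_name"])
print(columns)  # ['db1 | asthma', 'db1 | copd']
```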
table_cohort_count
table_cohort_attrition
table_cohort_timing
table_cohort_timing(
result: SummarisedResult,
*,
time_scale: Literal["days", "years"] = "days",
unique_combinations: bool = True,
type: Literal["gt", "polars"] | None = None,
header: list[str] | None = None,
group_column: list[str] | None = None,
hide: list[str] | None = None,
style: Any | None = None,
**options: Any,
) -> Any
Render a cohort timing table.
Parameters
result
A SummarisedResult with result_type="summarise_cohort_timing".
time_scale
Unit for timing values: "days", or "years" (day values divided by 365.25).
unique_combinations
If True, show only unique cohort pairs (A→B but not B→A).
type, header, group_column, hide, style
See table_characteristics().
Returns
great_tables.GT or polars.DataFrame
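The two timing-specific options can be sketched on already-summarised rows: time_scale="years" divides day values by 365.25, and unique_combinations=True keeps one direction per cohort pair. prepare_timing is hypothetical, the rows use a simplified numeric "value" field, and which direction omopy keeps is not documented here, so keeping the pair whose reference name sorts first is an assumption.

```python
# Hypothetical sketch of the time_scale and unique_combinations options.
def prepare_timing(rows, time_scale="days", unique_combinations=True):
    out = []
    for r in rows:
        ref, comp = r["cohort_name_reference"], r["cohort_name_comparator"]
        if unique_combinations and ref > comp:
            continue  # drop the B->A duplicate of an A->B pair (assumed rule)
        value = r["value"] / 365.25 if time_scale == "years" else r["value"]
        out.append({**r, "value": value})
    return out


rows = [
    {"cohort_name_reference": "A", "cohort_name_comparator": "B", "estimate_name": "median", "value": 730.5},
    {"cohort_name_reference": "B", "cohort_name_comparator": "A", "estimate_name": "median", "value": -730.5},
]
in_years = prepare_timing(rows, time_scale="years")
print(in_years[0]["value"])  # 2.0
```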
table_cohort_overlap
table_large_scale_characteristics
table_top_large_scale_characteristics
available_table_columns
Plot Functions
Wrappers around bar_plot(), scatter_plot(), box_plot(), and custom
Plotly visualizations.
plot_characteristics
plot_cohort_count
plot_cohort_count(
result: SummarisedResult,
*,
x: str | None = None,
facet: str | list[str] | None = None,
colour: str | None = None,
style: Any | None = None,
) -> Any
Plot cohort counts as a bar chart.
Parameters
result
A SummarisedResult with result_type="summarise_cohort_count".
x
Column for x-axis. Defaults to "cohort_name".
facet
Column(s) for faceting. Defaults to ["cdm_name"].
colour
Column for colour grouping.
style
A PlotStyle for styling.
Returns
plotly.graph_objects.Figure
plot_cohort_attrition
Render an attrition flowchart as a Plotly figure.
Unlike the R version, which uses DiagrammeR, this function renders a simplified vertical flowchart using Plotly shapes and annotations.
Parameters
result
A SummarisedResult with
result_type="summarise_cohort_attrition".
show
Which counts to display: ["subjects"], ["records"],
or ["subjects", "records"] (default).
Returns
plotly.graph_objects.Figure
A flowchart figure.
plot_cohort_overlap
plot_cohort_overlap(
result: SummarisedResult,
*,
unique_combinations: bool = True,
facet: str | list[str] | None = None,
colour: str | None = None,
style: Any | None = None,
) -> Any
Plot cohort overlap as a stacked bar chart.
Parameters
result
A SummarisedResult with result_type="summarise_cohort_overlap".
unique_combinations
If True, show only unique cohort pairs.
facet
Column(s) for faceting. Defaults to
["cdm_name", "cohort_name_reference"].
colour
Column for colour grouping. Defaults to "variable_name".
style
A PlotStyle for styling.
Returns
plotly.graph_objects.Figure
plot_cohort_timing
plot_cohort_timing(
result: SummarisedResult,
*,
plot_type: Literal[
"boxplot", "densityplot"
] = "boxplot",
time_scale: Literal["days", "years"] = "days",
unique_combinations: bool = True,
facet: str | list[str] | None = None,
colour: str | list[str] | None = None,
style: Any | None = None,
) -> Any
Plot cohort timing distributions.
Parameters
result
A SummarisedResult with result_type="summarise_cohort_timing".
plot_type
"boxplot" or "densityplot".
time_scale
Unit for timing values: "days", or "years" (day values divided by 365.25).
unique_combinations
If True, show only unique cohort pairs.
facet
Column(s) for faceting. Defaults to
["cdm_name", "cohort_name_reference"].
colour
Column for colour grouping. Defaults to
"cohort_name_comparator".
style
A PlotStyle for styling.
Returns
plotly.graph_objects.Figure
plot_large_scale_characteristics
plot_large_scale_characteristics(
result: SummarisedResult,
*,
facet: str | list[str] | None = None,
colour: str | None = None,
style: Any | None = None,
) -> Any
Plot large-scale characteristics as a scatter plot.
Parameters
result
A SummarisedResult with
result_type="summarise_large_scale_characteristics".
facet
Column(s) for faceting. Defaults to
["cdm_name", "cohort_name"].
colour
Column for colour grouping. Defaults to "variable_level".
style
A PlotStyle for styling.
Returns
plotly.graph_objects.Figure
plot_compared_large_scale_characteristics
plot_compared_large_scale_characteristics(
result: SummarisedResult,
*,
colour: str,
reference: str | None = None,
facet: str | list[str] | None = None,
missings: float | None = 0.0,
style: Any | None = None,
) -> Any
Plot compared large-scale characteristics.
Shows a scatter plot where x is the reference group's percentage and y is each comparison group's percentage, with a diagonal reference line.
Parameters
result
A SummarisedResult with
result_type="summarise_large_scale_characteristics".
colour
Required. Column to colour by (e.g. "cohort_name").
reference
Level of the colour column to use as the reference (x-axis). Defaults to
the first level in alphabetical order.
facet
Column(s) for faceting.
missings
Replace missing percentages with this value. None = drop.
style
A PlotStyle for styling.
Returns
plotly.graph_objects.Figure
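The data preparation behind the compared plot can be sketched in plain Python: pair each concept's percentage in the reference level (x) with its percentage in every other level (y). Concepts missing on one side get the missings value, or are dropped when missings is None, and the reference defaults to the first level alphabetically. compare_levels is hypothetical and works on a simplified {level: {concept: percentage}} mapping rather than a SummarisedResult.

```python
# Hypothetical sketch of (x, y) point preparation for a compared plot.
def compare_levels(percentages, reference=None, missings=0.0):
    """percentages maps level -> {concept: percentage}."""
    if reference is None:
        reference = min(percentages)  # first level in alphabetical order
    ref = percentages[reference]
    points = []
    for level, values in sorted(percentages.items()):
        if level == reference:
            continue
        for concept in sorted(ref.keys() | values.keys()):
            x, y = ref.get(concept, missings), values.get(concept, missings)
            if x is None or y is None:
                continue  # missings=None drops incomplete pairs
            points.append((level, concept, x, y))
    return points


percentages = {
    "asthma": {"dyspnoea": 40.0, "cough": 25.0},
    "copd": {"dyspnoea": 55.0, "smoking": 60.0},
}
points = compare_levels(percentages)  # reference defaults to "asthma"
print(points)
# [('copd', 'cough', 25.0, 0.0), ('copd', 'dyspnoea', 40.0, 55.0), ('copd', 'smoking', 0.0, 60.0)]
```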
Mock Data
mock_cohort_characteristics
mock_cohort_characteristics(
*, n_cohorts: int = 2, n_strata: int = 0, seed: int = 42
) -> SummarisedResult
Generate a mock SummarisedResult for cohort characteristics.
Creates synthetic data representative of a
summarise_characteristics() output, useful for testing
table/plot functions without requiring a database.
Parameters
n_cohorts
Number of cohorts to simulate.
n_strata
Number of additional strata to include (0 = overall only).
seed
Random seed for reproducibility.
Returns
SummarisedResult
With result_type="summarise_characteristics".