omopy.characteristics¶

Cohort characterisation analytics: summarise, tabulate, and plot cohort characteristics, counts, attrition, timing, overlap, large-scale characteristics, and codelist usage.

This module is the Python equivalent of the R CohortCharacteristics package. Table rendering delegates to omopy.vis; plot rendering uses plotly.

Summarise Functions¶

The core analytical functions. Each queries cohort data (optionally enriching with demographics and intersections via omopy.profiles) and produces a SummarisedResult.

summarise_characteristics ¶

summarise_characteristics(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    counts: bool = True,
    demographics: bool = True,
    age_group: dict[str, tuple[float, float]]
    | list[tuple[float, float]]
    | None = None,
    table_intersect_flag: list[dict[str, Any]]
    | None = None,
    table_intersect_count: list[dict[str, Any]]
    | None = None,
    table_intersect_date: list[dict[str, Any]]
    | None = None,
    table_intersect_days: list[dict[str, Any]]
    | None = None,
    cohort_intersect_flag: list[dict[str, Any]]
    | None = None,
    cohort_intersect_count: list[dict[str, Any]]
    | None = None,
    cohort_intersect_date: list[dict[str, Any]]
    | None = None,
    cohort_intersect_days: list[dict[str, Any]]
    | None = None,
    concept_intersect_flag: list[dict[str, Any]]
    | None = None,
    concept_intersect_count: list[dict[str, Any]]
    | None = None,
    concept_intersect_date: list[dict[str, Any]]
    | None = None,
    concept_intersect_days: list[dict[str, Any]]
    | None = None,
    other_variables: list[str] | None = None,
    estimates: dict[str, tuple[str, ...]] | None = None,
) -> SummarisedResult

Summarise cohort characteristics including demographics and intersections.

This is the main entry point for cohort characterisation. It:

1. Enriches cohort records with demographics (age, sex, observation periods).
2. Adds any requested intersections (table, cohort, concept).
3. Aggregates per cohort × stratum into a standardised SummarisedResult.

Parameters¶

cohort: A CohortTable to summarise.
cohort_id: Restrict to specific cohort definition IDs. None = all.
strata: Stratification columns. Each element is a column name or a list of column names to cross-stratify. The overall (unstratified) result is always included.
counts: Include subject/record counts.
demographics: Include demographic variables (age, sex, prior/future observation, days in cohort).
age_group: Age grouping specification, forwarded to add_demographics().
table_intersect_flag, table_intersect_count, table_intersect_date, table_intersect_days: Lists of keyword-argument dicts forwarded to the corresponding omopy.profiles.add_table_intersect_*() function.
cohort_intersect_flag, cohort_intersect_count, cohort_intersect_date, cohort_intersect_days: Lists of keyword-argument dicts forwarded to the corresponding omopy.profiles.add_cohort_intersect_*() function.
concept_intersect_flag, concept_intersect_count, concept_intersect_date, concept_intersect_days: Lists of keyword-argument dicts forwarded to the corresponding omopy.profiles.add_concept_intersect_*() function.
other_variables: Additional columns already present in the cohort to summarise.
estimates: Override default estimates per variable name. Keys are variable names, values are tuples of estimate names.

Returns¶

SummarisedResult With result_type="summarise_characteristics".
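The strata semantics described above (each element is one column or a cross of several, and the overall result is always included) can be sketched in plain Python. This is an illustration only, not omopy's implementation: the helper names `expand_strata` and `count_by_strata` are hypothetical, and joining names and levels with " &&& " is an assumption borrowed from the group_name convention used elsewhere on this page.

```python
from collections import Counter


def expand_strata(strata):
    """Normalise each strata element to a tuple of column names.

    The empty tuple stands for the overall (unstratified) result,
    which is always included.
    """
    specs = [()]
    for item in strata or []:
        specs.append((item,) if isinstance(item, str) else tuple(item))
    return specs


def count_by_strata(rows, strata):
    """Count records per stratum level for every strata specification."""
    out = {}
    for spec in expand_strata(strata):
        # Assumed naming: cross-stratified columns joined with " &&& ".
        name = " &&& ".join(spec) if spec else "overall"
        levels = Counter(
            " &&& ".join(str(row[col]) for col in spec) if spec else "overall"
            for row in rows
        )
        out[name] = dict(levels)
    return out


# Hypothetical cohort rows that already carry the strata columns.
rows = [
    {"sex": "F", "age_group": "0-49"},
    {"sex": "F", "age_group": "50+"},
    {"sex": "M", "age_group": "50+"},
]
result = count_by_strata(rows, ["sex", ["sex", "age_group"]])
```

Here `result` has three entries: the overall count, the counts by sex, and the counts by the sex and age_group cross.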

summarise_cohort_count ¶

summarise_cohort_count(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
) -> SummarisedResult

Summarise subject and record counts per cohort.

Thin wrapper around summarise_characteristics() with counts=True, demographics=False.

Parameters¶

cohort: A CohortTable to count.
cohort_id: Restrict to specific cohort definition IDs. None = all.
strata: Stratification columns.

Returns¶

SummarisedResult With result_type="summarise_cohort_count".

summarise_cohort_attrition ¶

summarise_cohort_attrition(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
) -> SummarisedResult

Summarise cohort attrition as a SummarisedResult.

Pivots the attrition table (reasons, excluded counts) into the standard long-format result.

Parameters¶

cohort: A CohortTable with attrition data.
cohort_id: Restrict to specific cohort definition IDs. None = all.

Returns¶

SummarisedResult With result_type="summarise_cohort_attrition", strata_name="reason", additional_name="reason_id".
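As a rough picture of the pivot described above, the sketch below turns a wide attrition table into long-format rows keyed by reason and reason_id. The input column names (number_records, number_subjects, excluded_records, excluded_subjects), the "count" estimate name, and the helper `attrition_to_long` are assumptions made for illustration; the real attrition table and result layout may differ.

```python
# Assumed count columns of a wide attrition table (hypothetical names).
COUNT_COLUMNS = (
    "number_records",
    "number_subjects",
    "excluded_records",
    "excluded_subjects",
)


def attrition_to_long(attrition_rows):
    """Emit one long-format row per (attrition step, count column)."""
    long_rows = []
    for row in attrition_rows:
        for column in COUNT_COLUMNS:
            long_rows.append(
                {
                    "strata_name": "reason",
                    "strata_level": row["reason"],
                    "additional_name": "reason_id",
                    "additional_level": str(row["reason_id"]),
                    "variable_name": column,
                    "estimate_name": "count",
                    "estimate_value": str(row[column]),
                }
            )
    return long_rows


# Made-up attrition steps for a single cohort.
attrition = [
    {"reason_id": 1, "reason": "Initial qualifying events",
     "number_records": 100, "number_subjects": 80,
     "excluded_records": 0, "excluded_subjects": 0},
    {"reason_id": 2, "reason": "Prior observation >= 365 days",
     "number_records": 70, "number_subjects": 60,
     "excluded_records": 30, "excluded_subjects": 20},
]
long_rows = attrition_to_long(attrition)
```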

summarise_cohort_timing ¶

summarise_cohort_timing(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    restrict_to_first_entry: bool = True,
    estimates: tuple[str, ...] = (
        "min",
        "q25",
        "median",
        "q75",
        "max",
    ),
) -> SummarisedResult

Summarise pairwise timing between cohort entries.

For each pair of cohorts, computes the distribution of days between cohort entries for subjects appearing in both.

Parameters¶

cohort: A CohortTable.
cohort_id: Restrict to specific cohort definition IDs. None = all.
strata: Stratification columns (must exist before the join).
restrict_to_first_entry: If True, only consider the first entry per subject per cohort.
estimates: Statistics to compute on days_between_cohort_entries.

Returns¶

SummarisedResult With result_type="summarise_cohort_timing", group_name="cohort_name_reference &&& cohort_name_comparator".
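The pairwise timing calculation can be approximated in plain Python as follows. This is a minimal sketch, not the library's code: it assumes restrict_to_first_entry=True, assumes the difference is taken as comparator start minus reference start, and uses hypothetical helper names and made-up records.

```python
from datetime import date
from itertools import permutations
from statistics import median


def first_entries(records):
    """Map (cohort_name, subject_id) to the earliest cohort start date."""
    first = {}
    for cohort_name, subject_id, start in records:
        key = (cohort_name, subject_id)
        if key not in first or start < first[key]:
            first[key] = start
    return first


def days_between_entries(records):
    """Days between first entries for each ordered cohort pair.

    Only subjects present in both cohorts contribute. The sign
    convention (comparator minus reference) is an assumption.
    """
    first = first_entries(records)
    cohorts = sorted({cohort for cohort, _ in first})
    out = {}
    for ref, comp in permutations(cohorts, 2):
        diffs = [
            (first[(comp, subject)] - first[(ref, subject)]).days
            for (cohort, subject) in first
            if cohort == ref and (comp, subject) in first
        ]
        if diffs:
            out[f"{ref} &&& {comp}"] = sorted(diffs)
    return out


def summarise(values):
    """A subset of the default estimates, computed on one pair."""
    return {"min": min(values), "median": median(values), "max": max(values)}


records = [
    ("a", 1, date(2020, 1, 1)),
    ("a", 1, date(2020, 6, 1)),   # later entry, ignored for subject 1
    ("b", 1, date(2020, 1, 11)),
    ("a", 2, date(2020, 3, 1)),
    ("b", 2, date(2020, 2, 10)),
    ("a", 3, date(2020, 5, 1)),   # subject 3 is only in cohort "a"
]
timing = days_between_entries(records)
```

The keys of `timing` follow the "reference &&& comparator" pattern of group_name, so both directions of each pair are present.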

summarise_cohort_overlap ¶

summarise_cohort_overlap(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    overlap_by: str = "subject_id",
) -> SummarisedResult

Summarise pairwise overlap between cohorts.

For each pair of cohorts, counts subjects in only the reference, only the comparator, or in both.

Parameters¶

cohort: A CohortTable.
cohort_id: Restrict to specific cohort definition IDs. None = all.
strata: Stratification columns.
overlap_by: Column identifying unique entities (default: "subject_id").

Returns¶

SummarisedResult With result_type="summarise_cohort_overlap", group_name="cohort_name_reference &&& cohort_name_comparator".
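The three overlap counts map directly onto set operations. The sketch below assumes each cohort has already been reduced to a set of subject_id values; the function name and the labels of the three counts are illustrative and may not match the variable names in the actual result.

```python
from itertools import permutations


def cohort_overlap(members):
    """Count entities in only the reference, both, or only the comparator.

    `members` maps a cohort name to the set of entity IDs it contains
    (for example subject_id values).
    """
    out = {}
    for ref, comp in permutations(sorted(members), 2):
        ref_ids, comp_ids = members[ref], members[comp]
        out[f"{ref} &&& {comp}"] = {
            "only_in_reference": len(ref_ids - comp_ids),
            "in_both": len(ref_ids & comp_ids),
            "only_in_comparator": len(comp_ids - ref_ids),
        }
    return out


# Made-up cohort membership.
members = {"a": {1, 2, 3}, "b": {2, 3, 4, 5}}
overlap = cohort_overlap(members)
```

Each ordered pair appears once, so "a &&& b" and "b &&& a" carry mirrored counts.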

summarise_large_scale_characteristics ¶

summarise_large_scale_characteristics(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    window: list[Window] | None = None,
    event_in_window: list[str] | None = None,
    episode_in_window: list[str] | None = None,
    index_date: str = "cohort_start_date",
    censor_date: str | None = None,
    minimum_frequency: float = 0.005,
    excluded_codes: list[int] | None = None,
) -> SummarisedResult

Summarise large-scale characteristics (concept-level prevalence).

For each specified OMOP domain table and time window, computes the frequency of each concept relative to the cohort.

Parameters¶

cohort: A CohortTable.
cohort_id: Restrict to specific cohort definition IDs. None = all.
strata: Stratification columns.
window: Time windows as (lower, upper) day offsets from index_date. Defaults to standard epidemiological windows.
event_in_window: OMOP table names to count events (point-in-time). E.g. ["condition_occurrence", "drug_exposure"].
episode_in_window: OMOP table names to count episodes (interval overlap).
index_date: Column name for the index date.
censor_date: Column name for censoring. None = no censoring.
minimum_frequency: Minimum frequency threshold (0–1) to include a concept.
excluded_codes: Concept IDs to exclude from results.

Returns¶

SummarisedResult With result_type="summarise_large_scale_characteristics".
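To make the roles of minimum_frequency and excluded_codes concrete, here is a small sketch of concept-level prevalence for a single window. It assumes the denominator is the number of subjects in the cohort and that a subject counts once per concept; both are assumptions, and `concept_frequencies` is a hypothetical helper rather than part of omopy.

```python
def concept_frequencies(concepts_by_subject, n_subjects,
                        minimum_frequency=0.005, excluded_codes=()):
    """Fraction of subjects with each concept, after filtering.

    `concepts_by_subject` maps subject_id to the concept IDs observed
    in the window. Concepts below `minimum_frequency` or listed in
    `excluded_codes` are dropped.
    """
    counts = {}
    for concept_ids in concepts_by_subject.values():
        for concept_id in set(concept_ids):  # one count per subject
            counts[concept_id] = counts.get(concept_id, 0) + 1

    excluded = set(excluded_codes)
    return {
        concept_id: n / n_subjects
        for concept_id, n in counts.items()
        if concept_id not in excluded and n / n_subjects >= minimum_frequency
    }


# Made-up concept IDs observed for four subjects.
observed = {1: [100, 100, 200], 2: [100], 3: [300], 4: []}
frequent = concept_frequencies(observed, n_subjects=4, minimum_frequency=0.5)
without_100 = concept_frequencies(
    observed, n_subjects=4, minimum_frequency=0.5, excluded_codes=[100]
)
```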

summarise_cohort_codelist ¶

summarise_cohort_codelist(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
) -> SummarisedResult

Summarise the codelist used to define each cohort.

Parameters¶

cohort: A CohortTable with codelist metadata.
cohort_id: Restrict to specific cohort definition IDs. None = all.

Returns¶

SummarisedResult With result_type="summarise_cohort_codelist", strata_name="codelist_name &&& codelist_type".

Table Functions¶

Thin wrappers around vis_omop_table() / vis_table() with domain-specific defaults for estimate formatting, headers, and grouping.

table_characteristics ¶

table_characteristics(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a characteristics table.

Parameters¶

result: A SummarisedResult with result_type="summarise_characteristics".
type: Output format: "gt" for great_tables, "polars" for DataFrame.
header: Columns to pivot into header. Defaults to ["cdm_name", "cohort_name"].
group_column: Columns for row grouping.
hide: Columns to hide.
style: A TableStyle for styling.

Returns¶

great_tables.GT or polars.DataFrame

table_cohort_count ¶

table_cohort_count(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort count table.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_count".
type, header, group_column, hide, style: See table_characteristics().

Returns¶

great_tables.GT or polars.DataFrame

table_cohort_attrition ¶

table_cohort_attrition(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort attrition table.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_attrition".
type, header, group_column, hide, style: See table_characteristics().

Returns¶

great_tables.GT or polars.DataFrame

table_cohort_timing ¶

table_cohort_timing(
    result: SummarisedResult,
    *,
    time_scale: Literal["days", "years"] = "days",
    unique_combinations: bool = True,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort timing table.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_timing".
time_scale: "days" or "years" (divides by 365.25).
unique_combinations: If True, show only unique cohort pairs (A→B but not B→A).
type, header, group_column, hide, style: See table_characteristics().

Returns¶

great_tables.GT or polars.DataFrame
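The two timing-specific options can be illustrated without the library. The sketch below converts day counts to years by dividing by 365.25 and keeps one direction of each cohort pair; which direction survives (here, the first one encountered) is an assumption, as are the helper names.

```python
def to_time_scale(days, time_scale="days"):
    """Convert a day count to the requested time scale."""
    if time_scale == "days":
        return days
    if time_scale == "years":
        return days / 365.25
    raise ValueError(f"unknown time_scale: {time_scale!r}")


def unique_pairs(group_levels):
    """Keep one "reference &&& comparator" level per unordered pair."""
    seen = set()
    kept = []
    for level in group_levels:
        reference, comparator = level.split(" &&& ")
        pair = frozenset((reference, comparator))
        if pair not in seen:
            seen.add(pair)
            kept.append(level)
    return kept


levels = ["a &&& b", "b &&& a", "a &&& c", "c &&& a"]
kept = unique_pairs(levels)
years = to_time_scale(730.5, "years")
```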

table_cohort_overlap ¶

table_cohort_overlap(
    result: SummarisedResult,
    *,
    unique_combinations: bool = True,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort overlap table.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_overlap".
unique_combinations: If True, show only unique cohort pairs.
type, header, group_column, hide, style: See table_characteristics().

Returns¶

great_tables.GT or polars.DataFrame

table_large_scale_characteristics ¶

table_large_scale_characteristics(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
) -> Any

Render the full large-scale characteristics table.

Parameters¶

result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
type: Output format.
hide: Columns to hide.
style: Table style.

Returns¶

great_tables.GT or polars.DataFrame

table_top_large_scale_characteristics ¶

table_top_large_scale_characteristics(
    result: SummarisedResult,
    *,
    top_concepts: int = 10,
    type: Literal["gt", "polars"] | None = None,
    style: Any | None = None,
) -> Any

Render the top N most frequent concepts as a table.

Parameters¶

result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
top_concepts: Number of top concepts per group to display.
type: Output format.
style: Table style.

Returns¶

great_tables.GT or polars.DataFrame

available_table_columns ¶

available_table_columns(
    result: SummarisedResult,
) -> list[str]

Return columns available for table customisation.

Parameters¶

result: Any characteristics SummarisedResult.

Returns¶

list[str] Column names from cdm_name, group, strata, additional, and settings columns.

Plot Functions¶

Wrappers around bar_plot(), scatter_plot(), box_plot(), and custom Plotly visualisations.

plot_characteristics ¶

plot_characteristics(
    result: SummarisedResult,
    *,
    plot_type: Literal[
        "barplot", "scatterplot", "boxplot"
    ] = "barplot",
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot characteristics results.

Parameters¶

result: A SummarisedResult with result_type="summarise_characteristics".
plot_type: "barplot", "scatterplot", or "boxplot".
facet: Column(s) for faceting.
colour: Column for colour grouping.
style: A PlotStyle for styling.

Returns¶

plotly.graph_objects.Figure

plot_cohort_count ¶

plot_cohort_count(
    result: SummarisedResult,
    *,
    x: str | None = None,
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot cohort counts as a bar chart.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_count".
x: Column for x-axis. Defaults to "cohort_name".
facet: Column(s) for faceting. Defaults to ["cdm_name"].
colour: Column for colour grouping.
style: A PlotStyle for styling.

Returns¶

plotly.graph_objects.Figure

plot_cohort_attrition ¶

plot_cohort_attrition(
    result: SummarisedResult,
    *,
    show: list[str] | None = None,
) -> Any

Render an attrition flowchart as a Plotly figure.

Unlike the R version, which uses DiagrammeR, this function renders a simplified vertical flowchart using Plotly shapes and annotations.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_attrition".
show: Which counts to display: ["subjects"], ["records"], or ["subjects", "records"] (default).

Returns¶

plotly.graph_objects.Figure A flowchart figure.

plot_cohort_overlap ¶

plot_cohort_overlap(
    result: SummarisedResult,
    *,
    unique_combinations: bool = True,
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot cohort overlap as a stacked bar chart.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_overlap".
unique_combinations: If True, show only unique cohort pairs.
facet: Column(s) for faceting. Defaults to ["cdm_name", "cohort_name_reference"].
colour: Column for colour grouping. Defaults to "variable_name".
style: A PlotStyle for styling.

Returns¶

plotly.graph_objects.Figure

plot_cohort_timing ¶

plot_cohort_timing(
    result: SummarisedResult,
    *,
    plot_type: Literal[
        "boxplot", "densityplot"
    ] = "boxplot",
    time_scale: Literal["days", "years"] = "days",
    unique_combinations: bool = True,
    facet: str | list[str] | None = None,
    colour: str | list[str] | None = None,
    style: Any | None = None,
) -> Any

Plot cohort timing distributions.

Parameters¶

result: A SummarisedResult with result_type="summarise_cohort_timing".
plot_type: "boxplot" or "densityplot".
time_scale: "days" or "years" (divides by 365.25).
unique_combinations: If True, show only unique cohort pairs.
facet: Column(s) for faceting. Defaults to ["cdm_name", "cohort_name_reference"].
colour: Column for colour grouping. Defaults to "cohort_name_comparator".
style: A PlotStyle for styling.

Returns¶

plotly.graph_objects.Figure

plot_large_scale_characteristics ¶

plot_large_scale_characteristics(
    result: SummarisedResult,
    *,
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot large-scale characteristics as a scatter plot.

Parameters¶

result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
facet: Column(s) for faceting. Defaults to ["cdm_name", "cohort_name"].
colour: Column for colour grouping. Defaults to "variable_level".
style: A PlotStyle for styling.

Returns¶

plotly.graph_objects.Figure

plot_compared_large_scale_characteristics ¶

plot_compared_large_scale_characteristics(
    result: SummarisedResult,
    *,
    colour: str,
    reference: str | None = None,
    facet: str | list[str] | None = None,
    missings: float | None = 0.0,
    style: Any | None = None,
) -> Any

Plot compared large-scale characteristics.

Shows a scatter plot where x is the reference group's percentage and y is each comparison group's percentage, with a diagonal reference line.

Parameters¶

result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
colour: Required. Column to colour by (e.g. "cohort_name").
reference: Level of the colour column to use as the reference (x-axis). Defaults to the first level in alphabetical order.
facet: Column(s) for faceting.
missings: Value that replaces missing percentages. None = drop.
style: A PlotStyle for styling.

Returns¶

plotly.graph_objects.Figure
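The pairing of reference and comparison percentages behind this plot can be sketched as below. The input layout ({group: {concept: percentage}}), the helper name, and the exact handling of missing values are assumptions for illustration; only the defaulting of reference to the first alphabetical level and the replace-or-drop behaviour of missings come from the parameter descriptions.

```python
def compare_to_reference(percentages, reference=None, missings=0.0):
    """Build (group, concept, x, y) points for a comparison scatter plot.

    x is the reference group's percentage and y the comparison group's.
    A concept absent from one side is filled with `missings`, or the
    point is dropped when `missings` is None.
    """
    groups = sorted(percentages)
    if reference is None:
        reference = groups[0]  # first level in alphabetical order
    concepts = sorted({c for values in percentages.values() for c in values})

    points = []
    for group in groups:
        if group == reference:
            continue
        for concept in concepts:
            x = percentages[reference].get(concept, missings)
            y = percentages[group].get(concept, missings)
            if x is None or y is None:
                continue  # missings=None drops incomplete points
            points.append((group, concept, x, y))
    return points


# Made-up percentages for two cohorts.
percentages = {
    "cohort_a": {"aspirin": 40.0, "statin": 10.0},
    "cohort_b": {"aspirin": 25.0, "insulin": 5.0},
}
filled = compare_to_reference(percentages)
dropped = compare_to_reference(percentages, missings=None)
```

Points that fall on the diagonal (x equal to y) correspond to concepts with the same prevalence in both groups.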

Mock Data¶

mock_cohort_characteristics ¶

mock_cohort_characteristics(
    *, n_cohorts: int = 2, n_strata: int = 0, seed: int = 42
) -> SummarisedResult

Generate a mock SummarisedResult for cohort characteristics.

Creates synthetic data representative of a summarise_characteristics() output, useful for testing table/plot functions without requiring a database.

Parameters¶

n_cohorts: Number of cohorts to simulate.
n_strata: Number of additional strata to include (0 = overall only).
seed: Random seed for reproducibility.

Returns¶

SummarisedResult With result_type="summarise_characteristics".