omopy.characteristics

Cohort characterisation analytics: summarise, tabulate, and plot cohort characteristics, counts, attrition, timing, overlap, large-scale characteristics, and codelist usage.

This module is the Python equivalent of the R CohortCharacteristics package. Table rendering delegates to omopy.vis; plot rendering uses plotly.

Summarise Functions

The core analytical functions. Each queries cohort data (optionally enriching with demographics and intersections via omopy.profiles) and produces a SummarisedResult.

summarise_characteristics

summarise_characteristics(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    counts: bool = True,
    demographics: bool = True,
    age_group: dict[str, tuple[float, float]]
    | list[tuple[float, float]]
    | None = None,
    table_intersect_flag: list[dict[str, Any]]
    | None = None,
    table_intersect_count: list[dict[str, Any]]
    | None = None,
    table_intersect_date: list[dict[str, Any]]
    | None = None,
    table_intersect_days: list[dict[str, Any]]
    | None = None,
    cohort_intersect_flag: list[dict[str, Any]]
    | None = None,
    cohort_intersect_count: list[dict[str, Any]]
    | None = None,
    cohort_intersect_date: list[dict[str, Any]]
    | None = None,
    cohort_intersect_days: list[dict[str, Any]]
    | None = None,
    concept_intersect_flag: list[dict[str, Any]]
    | None = None,
    concept_intersect_count: list[dict[str, Any]]
    | None = None,
    concept_intersect_date: list[dict[str, Any]]
    | None = None,
    concept_intersect_days: list[dict[str, Any]]
    | None = None,
    other_variables: list[str] | None = None,
    estimates: dict[str, tuple[str, ...]] | None = None,
) -> SummarisedResult

Summarise cohort characteristics including demographics and intersections.

This is the main entry point for cohort characterisation. It:

  1. Enriches cohort records with demographics (age, sex, observation periods)
  2. Adds any requested intersections (table, cohort, concept)
  3. Aggregates per cohort × stratum into a standardised SummarisedResult

Parameters

  - cohort: A CohortTable to summarise.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.
  - strata: Stratification columns. Each element is a column name or a list of column names to cross-stratify. The overall (unstratified) result is always included.
  - counts: Include subject/record counts.
  - demographics: Include demographic variables (age, sex, prior/future observation, days in cohort).
  - age_group: Age grouping specification, forwarded to add_demographics().
  - table_intersect_flag, table_intersect_count, table_intersect_date, table_intersect_days: Lists of keyword-argument dicts forwarded to the corresponding omopy.profiles.add_table_intersect_*() function.
  - cohort_intersect_flag, cohort_intersect_count, cohort_intersect_date, cohort_intersect_days: Lists of keyword-argument dicts forwarded to the corresponding omopy.profiles.add_cohort_intersect_*() function.
  - concept_intersect_flag, concept_intersect_count, concept_intersect_date, concept_intersect_days: Lists of keyword-argument dicts forwarded to the corresponding omopy.profiles.add_concept_intersect_*() function.
  - other_variables: Additional columns already present in the cohort to summarise.
  - estimates: Override default estimates per variable name. Keys are variable names, values are tuples of estimate names.

Returns

SummarisedResult with result_type="summarise_characteristics".
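The strata semantics described above (a string is a single column, a list is a cross-stratification, and the overall result is always present) can be illustrated with a small standard-library sketch. This is not omopy's implementation: the helper name count_by_strata, the sample rows, and the use of " &&& " to join cross-stratified names (borrowed from the group_name convention used elsewhere in this module) are assumptions made for illustration.

```python
from collections import Counter


def count_by_strata(rows, strata):
    """Count rows overall and within each requested stratification."""
    counts = Counter()
    for row in rows:
        # The unstratified ("overall") result is always produced.
        counts[("overall", "overall")] += 1
        for entry in strata:
            # A plain string is one column; a list means cross-stratification.
            columns = [entry] if isinstance(entry, str) else list(entry)
            name = " &&& ".join(columns)
            level = " &&& ".join(str(row[column]) for column in columns)
            counts[(name, level)] += 1
    return dict(counts)


# Hypothetical cohort rows that already carry the stratification columns.
rows = [
    {"subject_id": 1, "sex": "Female", "age_group": "0 to 49"},
    {"subject_id": 2, "sex": "Male", "age_group": "0 to 49"},
    {"subject_id": 3, "sex": "Female", "age_group": "50 or above"},
]

strata_counts = count_by_strata(rows, ["sex", ["sex", "age_group"]])
```

Passing ["sex", ["sex", "age_group"]] yields three families of counts: overall, by sex, and by sex crossed with age group.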

summarise_cohort_count

summarise_cohort_count(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
) -> SummarisedResult

Summarise subject and record counts per cohort.

Thin wrapper around summarise_characteristics() with counts=True, demographics=False.

Parameters

  - cohort: A CohortTable to count.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.
  - strata: Stratification columns.

Returns

SummarisedResult with result_type="summarise_cohort_count".

summarise_cohort_attrition

summarise_cohort_attrition(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
) -> SummarisedResult

Summarise cohort attrition as a SummarisedResult.

Pivots the attrition table (reasons, excluded counts) into the standard long-format result.

Parameters

  - cohort: A CohortTable with attrition data.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.

Returns

SummarisedResult with result_type="summarise_cohort_attrition", strata_name="reason", additional_name="reason_id".
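The wide-to-long pivot described above can be sketched with plain dictionaries. This is only an illustration, not the library's code: the count column names (number_records, number_subjects, excluded_records, excluded_subjects), the "count" estimate name, and the helper attrition_to_long are assumptions; only strata_name="reason" and additional_name="reason_id" come from the documented return value.

```python
# Assumed count columns of a wide attrition table (hypothetical names).
COUNT_COLUMNS = (
    "number_records",
    "number_subjects",
    "excluded_records",
    "excluded_subjects",
)


def attrition_to_long(attrition_rows):
    """Turn one wide row per attrition step into one long row per count."""
    long_rows = []
    for row in attrition_rows:
        for column in COUNT_COLUMNS:
            long_rows.append(
                {
                    "strata_name": "reason",
                    "strata_level": row["reason"],
                    "additional_name": "reason_id",
                    "additional_level": str(row["reason_id"]),
                    "variable_name": column,
                    "estimate_name": "count",
                    # Values are stored as strings in this sketch.
                    "estimate_value": str(row[column]),
                }
            )
    return long_rows


# Made-up attrition steps for a single cohort.
attrition = [
    {"reason_id": 1, "reason": "Initial qualifying events",
     "number_records": 120, "number_subjects": 100,
     "excluded_records": 0, "excluded_subjects": 0},
    {"reason_id": 2, "reason": "Prior observation >= 365 days",
     "number_records": 90, "number_subjects": 80,
     "excluded_records": 30, "excluded_subjects": 20},
]

long_attrition = attrition_to_long(attrition)
```

Each attrition step contributes one long-format row per count column, so two steps produce eight rows here.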

summarise_cohort_timing

summarise_cohort_timing(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    restrict_to_first_entry: bool = True,
    estimates: tuple[str, ...] = (
        "min",
        "q25",
        "median",
        "q75",
        "max",
    ),
) -> SummarisedResult

Summarise pairwise timing between cohort entries.

For each pair of cohorts, computes the distribution of days between cohort entries for subjects appearing in both.

Parameters

  - cohort: A CohortTable.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.
  - strata: Stratification columns (must exist before the join).
  - restrict_to_first_entry: If True, only consider the first entry per subject per cohort.
  - estimates: Statistics to compute on days_between_cohort_entries.

Returns

SummarisedResult with result_type="summarise_cohort_timing", group_name="cohort_name_reference &&& cohort_name_comparator".
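A minimal sketch of the pairwise timing calculation, using only the standard library, may help clarify what restrict_to_first_entry changes. It is not omopy's implementation. The record layout (cohort name, subject ID, start date), the helper name, and the sign convention (comparator start minus reference start) are assumptions.

```python
from collections import defaultdict
from datetime import date
from itertools import permutations, product


def days_between_cohort_entries(records, restrict_to_first_entry=True):
    """Return sorted day differences for every ordered pair of cohorts."""
    starts = defaultdict(lambda: defaultdict(list))
    for cohort_name, subject_id, start_date in records:
        starts[cohort_name][subject_id].append(start_date)

    if restrict_to_first_entry:
        # Keep only the earliest entry per subject per cohort.
        starts = {
            cohort_name: {sid: [min(dates)] for sid, dates in subjects.items()}
            for cohort_name, subjects in starts.items()
        }

    timing = {}
    for reference, comparator in permutations(sorted(starts), 2):
        # Only subjects present in both cohorts contribute.
        shared = starts[reference].keys() & starts[comparator].keys()
        timing[(reference, comparator)] = sorted(
            (comparator_start - reference_start).days
            for sid in shared
            for reference_start, comparator_start in product(
                starts[reference][sid], starts[comparator][sid]
            )
        )
    return timing


# Made-up entries: (cohort_name, subject_id, cohort_start_date).
records = [
    ("a", 1, date(2020, 1, 1)),
    ("a", 1, date(2021, 1, 1)),
    ("a", 2, date(2020, 6, 1)),
    ("b", 1, date(2020, 1, 31)),
    ("b", 2, date(2020, 5, 22)),
    ("b", 3, date(2020, 3, 1)),
]

first_only = days_between_cohort_entries(records)
all_entries = days_between_cohort_entries(records, restrict_to_first_entry=False)
```

With first entries only, subject 1's second entry into cohort "a" is ignored; with all entries, it adds one more difference to the ("a", "b") distribution. Estimates such as the median or quartiles would then be computed over each list.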

summarise_cohort_overlap

summarise_cohort_overlap(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    overlap_by: str = "subject_id",
) -> SummarisedResult

Summarise pairwise overlap between cohorts.

For each pair of cohorts, counts subjects in only the reference, only the comparator, or in both.

Parameters

  - cohort: A CohortTable.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.
  - strata: Stratification columns.
  - overlap_by: Column identifying unique entities (default: "subject_id").

Returns

SummarisedResult with result_type="summarise_cohort_overlap", group_name="cohort_name_reference &&& cohort_name_comparator".
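The three overlap counts reduce to set operations on the entities identified by overlap_by. The sketch below shows that idea with the standard library only; the helper name pairwise_overlap and the labels used for the three counts are assumptions, not names taken from omopy.

```python
from itertools import permutations


def pairwise_overlap(members):
    """Count entities in only the reference, in both, and only the comparator."""
    overlap = {}
    for reference, comparator in permutations(sorted(members), 2):
        ref, comp = members[reference], members[comparator]
        overlap[(reference, comparator)] = {
            "only_in_reference": len(ref - comp),
            "in_both": len(ref & comp),
            "only_in_comparator": len(comp - ref),
        }
    return overlap


# Made-up subject IDs per cohort (the default overlap_by is "subject_id").
members = {"a": {1, 2, 3}, "b": {2, 3, 4, 5}}

overlap = pairwise_overlap(members)
```

Both orderings of each pair are produced here; the two orderings carry the same information with the reference and comparator roles swapped.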

summarise_large_scale_characteristics

summarise_large_scale_characteristics(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
    strata: list[str | list[str]] | None = None,
    window: list[Window] | None = None,
    event_in_window: list[str] | None = None,
    episode_in_window: list[str] | None = None,
    index_date: str = "cohort_start_date",
    censor_date: str | None = None,
    minimum_frequency: float = 0.005,
    excluded_codes: list[int] | None = None,
) -> SummarisedResult

Summarise large-scale characteristics (concept-level prevalence).

For each specified OMOP domain table and time window, computes the frequency of each concept relative to the cohort.

Parameters

  - cohort: A CohortTable.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.
  - strata: Stratification columns.
  - window: Time windows as (lower, upper) day offsets from index_date. Defaults to standard epidemiological windows.
  - event_in_window: OMOP table names to count events (point-in-time). E.g. ["condition_occurrence", "drug_exposure"].
  - episode_in_window: OMOP table names to count episodes (interval overlap).
  - index_date: Column name for the index date.
  - censor_date: Column name for censoring. None = no censoring.
  - minimum_frequency: Minimum frequency threshold (0–1) to include a concept.
  - excluded_codes: Concept IDs to exclude from results.

Returns

SummarisedResult with result_type="summarise_large_scale_characteristics".
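To make the interaction between window, minimum_frequency, and excluded_codes concrete, here is a standard-library sketch for point-in-time events. It is an illustration under stated assumptions rather than the library's algorithm: the event layout (subject ID, concept ID, day offset from the index date), the inclusive window bounds, the inclusive threshold, and the concept IDs are all hypothetical.

```python
from collections import defaultdict


def concept_frequency_in_window(
    events, cohort_size, window, minimum_frequency=0.005, excluded_codes=()
):
    """Fraction of the cohort with each concept recorded inside the window."""
    lower, upper = window
    excluded = set(excluded_codes)
    subjects_by_concept = defaultdict(set)
    for subject_id, concept_id, day_offset in events:
        if concept_id in excluded:
            continue
        # Assumed inclusive bounds on both ends of the window.
        if lower <= day_offset <= upper:
            # A set counts each subject once, however many events they have.
            subjects_by_concept[concept_id].add(subject_id)
    frequencies = {
        concept_id: len(subjects) / cohort_size
        for concept_id, subjects in subjects_by_concept.items()
    }
    # Drop concepts below the threshold (assumed inclusive).
    return {c: f for c, f in frequencies.items() if f >= minimum_frequency}


# Made-up events: (subject_id, concept_id, days relative to the index date).
events = [
    (1, 101, -10),
    (2, 101, -400),  # outside the window below
    (3, 101, -5),
    (1, 202, -20),
    (1, 202, -3),    # same subject again, counted once
    (4, 303, -1),
]

filtered = concept_frequency_in_window(
    events, cohort_size=4, window=(-365, -1),
    minimum_frequency=0.3, excluded_codes=[303],
)
unfiltered = concept_frequency_in_window(events, cohort_size=4, window=(-365, -1))
```

With a threshold of 0.3, only concept 101 survives (2 of 4 subjects); with the default threshold of 0.005 and no exclusions, all three concepts are kept.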

summarise_cohort_codelist

summarise_cohort_codelist(
    cohort: CohortTable,
    *,
    cohort_id: list[int] | None = None,
) -> SummarisedResult

Summarise the codelist used to define each cohort.

Parameters

  - cohort: A CohortTable with codelist metadata.
  - cohort_id: Restrict to specific cohort definition IDs. None = all.

Returns

SummarisedResult with result_type="summarise_cohort_codelist", strata_name="codelist_name &&& codelist_type".

Table Functions

Thin wrappers around vis_omop_table() / vis_table() with domain-specific defaults for estimate formatting, headers, and grouping.

table_characteristics

table_characteristics(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a characteristics table.

Parameters

  - result: A SummarisedResult with result_type="summarise_characteristics".
  - type: Output format: "gt" for great_tables, "polars" for DataFrame.
  - header: Columns to pivot into header. Defaults to ["cdm_name", "cohort_name"].
  - group_column: Columns for row grouping.
  - hide: Columns to hide.
  - style: A TableStyle for styling.

Returns

great_tables.GT or polars.DataFrame

table_cohort_count

table_cohort_count(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort count table.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_count".
  - type, header, group_column, hide, style: See table_characteristics().

Returns

great_tables.GT or polars.DataFrame

table_cohort_attrition

table_cohort_attrition(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort attrition table.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_attrition".
  - type, header, group_column, hide, style: See table_characteristics().

Returns

great_tables.GT or polars.DataFrame

table_cohort_timing

table_cohort_timing(
    result: SummarisedResult,
    *,
    time_scale: Literal["days", "years"] = "days",
    unique_combinations: bool = True,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort timing table.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_timing".
  - time_scale: "days" or "years" (divides by 365.25).
  - unique_combinations: If True, show only unique cohort pairs (A→B but not B→A).
  - type, header, group_column, hide, style: See table_characteristics().

Returns

great_tables.GT or polars.DataFrame
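The time_scale and unique_combinations options amount to two small transformations, sketched below with the standard library. The helper names are hypothetical, and which direction of a pair is kept (here, the first one encountered) is an assumption; the division by 365.25 is taken from the parameter description.

```python
def to_time_scale(days, time_scale="days"):
    """Convert a day count to the requested scale (years divide by 365.25)."""
    return days / 365.25 if time_scale == "years" else days


def keep_unique_pairs(pairs):
    """Keep one direction per cohort pair, e.g. A→B but not B→A."""
    seen = set()
    kept = []
    for reference, comparator in pairs:
        key = frozenset((reference, comparator))  # order-insensitive identity
        if key not in seen:
            seen.add(key)
            kept.append((reference, comparator))
    return kept


# Made-up ordered pairs as produced by a pairwise summary.
all_pairs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("c", "b")]

unique_pairs = keep_unique_pairs(all_pairs)
two_years = to_time_scale(730.5, "years")
```

Six ordered pairs collapse to three unordered ones, and 730.5 days becomes exactly 2.0 years.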

table_cohort_overlap

table_cohort_overlap(
    result: SummarisedResult,
    *,
    unique_combinations: bool = True,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
    **options: Any,
) -> Any

Render a cohort overlap table.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_overlap".
  - unique_combinations: If True, show only unique cohort pairs.
  - type, header, group_column, hide, style: See table_characteristics().

Returns

great_tables.GT or polars.DataFrame

table_large_scale_characteristics

table_large_scale_characteristics(
    result: SummarisedResult,
    *,
    type: Literal["gt", "polars"] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
) -> Any

Render the full large-scale characteristics table.

Parameters

  - result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
  - type: Output format.
  - hide: Columns to hide.
  - style: Table style.

Returns

great_tables.GT or polars.DataFrame

table_top_large_scale_characteristics

table_top_large_scale_characteristics(
    result: SummarisedResult,
    *,
    top_concepts: int = 10,
    type: Literal["gt", "polars"] | None = None,
    style: Any | None = None,
) -> Any

Render the top N most frequent concepts as a table.

Parameters

  - result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
  - top_concepts: Number of top concepts per group to display.
  - type: Output format.
  - style: Table style.

Returns

great_tables.GT or polars.DataFrame
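Selecting the top N concepts per group is a ranking step over concept frequencies. A minimal sketch with heapq follows; the input layout (group name mapped to concept frequencies), the helper name, and the sample values are assumptions, and how the real function breaks ties is not specified here.

```python
import heapq


def top_concepts_per_group(frequencies, top_concepts=10):
    """Return the most frequent concepts per group, highest first."""
    return {
        group: heapq.nlargest(
            top_concepts, concepts.items(), key=lambda item: item[1]
        )
        for group, concepts in frequencies.items()
    }


# Made-up concept frequencies per cohort.
frequencies = {
    "cohort_a": {"c1": 0.40, "c2": 0.10, "c3": 0.25},
    "cohort_b": {"c1": 0.05, "c4": 0.30},
}

top_two = top_concepts_per_group(frequencies, top_concepts=2)
```

Each group is ranked independently, so a concept that is common in one cohort can rank last in another.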

available_table_columns

available_table_columns(
    result: SummarisedResult,
) -> list[str]

Return columns available for table customisation.

Parameters

  - result: Any characteristics SummarisedResult.

Returns

list[str] of column names drawn from the cdm_name, group, strata, additional, and settings columns.

Plot Functions

Wrappers around bar_plot(), scatter_plot(), box_plot(), and custom Plotly visualizations.

plot_characteristics

plot_characteristics(
    result: SummarisedResult,
    *,
    plot_type: Literal[
        "barplot", "scatterplot", "boxplot"
    ] = "barplot",
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot characteristics results.

Parameters

  - result: A SummarisedResult with result_type="summarise_characteristics".
  - plot_type: "barplot", "scatterplot", or "boxplot".
  - facet: Column(s) for faceting.
  - colour: Column for colour grouping.
  - style: A PlotStyle for styling.

Returns

plotly.graph_objects.Figure

plot_cohort_count

plot_cohort_count(
    result: SummarisedResult,
    *,
    x: str | None = None,
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot cohort counts as a bar chart.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_count".
  - x: Column for x-axis. Defaults to "cohort_name".
  - facet: Column(s) for faceting. Defaults to ["cdm_name"].
  - colour: Column for colour grouping.
  - style: A PlotStyle for styling.

Returns

plotly.graph_objects.Figure

plot_cohort_attrition

plot_cohort_attrition(
    result: SummarisedResult,
    *,
    show: list[str] | None = None,
) -> Any

Render an attrition flowchart as a Plotly figure.

Unlike the R version, which uses DiagrammeR, this function renders a simplified vertical flowchart using Plotly shapes and annotations.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_attrition".
  - show: Which counts to display: ["subjects"], ["records"], or ["subjects", "records"] (default).

Returns

plotly.graph_objects.Figure containing the flowchart.

plot_cohort_overlap

plot_cohort_overlap(
    result: SummarisedResult,
    *,
    unique_combinations: bool = True,
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot cohort overlap as a stacked bar chart.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_overlap".
  - unique_combinations: If True, show only unique cohort pairs.
  - facet: Column(s) for faceting. Defaults to ["cdm_name", "cohort_name_reference"].
  - colour: Column for colour grouping. Defaults to "variable_name".
  - style: A PlotStyle for styling.

Returns

plotly.graph_objects.Figure

plot_cohort_timing

plot_cohort_timing(
    result: SummarisedResult,
    *,
    plot_type: Literal[
        "boxplot", "densityplot"
    ] = "boxplot",
    time_scale: Literal["days", "years"] = "days",
    unique_combinations: bool = True,
    facet: str | list[str] | None = None,
    colour: str | list[str] | None = None,
    style: Any | None = None,
) -> Any

Plot cohort timing distributions.

Parameters

  - result: A SummarisedResult with result_type="summarise_cohort_timing".
  - plot_type: "boxplot" or "densityplot".
  - time_scale: "days" or "years" (divides by 365.25).
  - unique_combinations: If True, show only unique cohort pairs.
  - facet: Column(s) for faceting. Defaults to ["cdm_name", "cohort_name_reference"].
  - colour: Column for colour grouping. Defaults to "cohort_name_comparator".
  - style: A PlotStyle for styling.

Returns

plotly.graph_objects.Figure

plot_large_scale_characteristics

plot_large_scale_characteristics(
    result: SummarisedResult,
    *,
    facet: str | list[str] | None = None,
    colour: str | None = None,
    style: Any | None = None,
) -> Any

Plot large-scale characteristics as a scatter plot.

Parameters

  - result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
  - facet: Column(s) for faceting. Defaults to ["cdm_name", "cohort_name"].
  - colour: Column for colour grouping. Defaults to "variable_level".
  - style: A PlotStyle for styling.

Returns

plotly.graph_objects.Figure

plot_compared_large_scale_characteristics

plot_compared_large_scale_characteristics(
    result: SummarisedResult,
    *,
    colour: str,
    reference: str | None = None,
    facet: str | list[str] | None = None,
    missings: float | None = 0.0,
    style: Any | None = None,
) -> Any

Plot compared large-scale characteristics.

Shows a scatter plot where x is the reference group's percentage and y is each comparison group's percentage, with a diagonal reference line.

Parameters

  - result: A SummarisedResult with result_type="summarise_large_scale_characteristics".
  - colour: Required. Column to colour by (e.g. "cohort_name").
  - reference: Level of colour to use as the reference (x-axis). Defaults to the alphabetically first level.
  - facet: Column(s) for faceting.
  - missings: Value that replaces missing percentages. None = drop those points.
  - style: A PlotStyle for styling.

Returns

plotly.graph_objects.Figure
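The data behind such a comparison plot can be sketched without any plotting library: pick a reference level, then pair its percentage with each other level's percentage per concept. The sketch below is an assumption-laden illustration, not omopy's code. The input layout (colour level mapped to concept percentages), the helper name, and the exact handling of concepts missing on either side are hypothetical; the default reference and the meaning of missings follow the parameter descriptions.

```python
def compare_to_reference(percentages, reference=None, missings=0.0):
    """Build (level, concept, x, y) points: x = reference %, y = comparison %."""
    levels = sorted(percentages)
    # Default reference: the alphabetically first level.
    reference = levels[0] if reference is None else reference
    points = []
    for level in levels:
        if level == reference:
            continue
        concepts = sorted(set(percentages[reference]) | set(percentages[level]))
        for concept in concepts:
            x = percentages[reference].get(concept, missings)
            y = percentages[level].get(concept, missings)
            if x is None or y is None:
                continue  # missings=None drops incomplete points
            points.append((level, concept, x, y))
    return points


# Made-up percentages per cohort_name level.
percentages = {
    "cohort_b": {"c1": 10.0, "c2": 5.0},
    "cohort_a": {"c1": 20.0, "c3": 1.0},
}

filled = compare_to_reference(percentages)
dropped = compare_to_reference(percentages, missings=None)
```

Points on the diagonal x = y would indicate concepts equally frequent in both groups; with missings=0.0, concepts seen in only one group land on an axis instead of disappearing.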

Mock Data

mock_cohort_characteristics

mock_cohort_characteristics(
    *, n_cohorts: int = 2, n_strata: int = 0, seed: int = 42
) -> SummarisedResult

Generate a mock SummarisedResult for cohort characteristics.

Creates synthetic data representative of a summarise_characteristics() output, useful for testing table/plot functions without requiring a database.

Parameters

  - n_cohorts: Number of cohorts to simulate.
  - n_strata: Number of additional strata to include (0 = overall only).
  - seed: Random seed for reproducibility.

Returns

SummarisedResult with result_type="summarise_characteristics".