omopy.drug_diagnostics

Drug exposure diagnostics — run configurable quality checks on drug_exposure records for specified ingredient concepts, summarise findings, and visualise results.

This module is the Python equivalent of the R DrugExposureDiagnostics package. Table rendering delegates to omopy.vis; plot rendering uses plotly.

Constants

AVAILABLE_CHECKS module-attribute

AVAILABLE_CHECKS: tuple[str, ...] = (
    "missing",
    "exposure_duration",
    "type",
    "route",
    "source_concept",
    "days_supply",
    "verbatim_end_date",
    "dose",
    "sig",
    "quantity",
    "days_between",
    "diagnostics_summary",
)

Core Types

Pydantic model for storing diagnostic check results.

DiagnosticsResult

Bases: BaseModel

Container for drug exposure diagnostics results.

Holds a named dict of Polars DataFrames (one per check) plus metadata about the execution. Immutable after creation.

Attributes

results: Dict mapping check names to Polars DataFrames with check results.
checks_performed: Tuple of check names that were actually run.
ingredient_concepts: Dict mapping ingredient concept IDs to their names.
cdm_name: Name of the CDM instance.
sample_size: Number of records sampled per ingredient (or None if no sampling).
min_cell_count: Minimum cell count threshold used for suppression.
execution_time_seconds: Total execution time in seconds.

__getitem__

__getitem__(key: str) -> pl.DataFrame

Allow dict-like access: result['missing'].

__iter__

__iter__()

Iterate over check names (not Pydantic fields).

keys

keys()

Return check names.

values

values()

Return result DataFrames.

items

items()

Return (check_name, DataFrame) pairs.
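
The dict-like methods above can be illustrated with a small stand-in class. This is a minimal sketch using only the standard library, not the omopy implementation: ResultContainer is a hypothetical name, it is a frozen dataclass rather than a Pydantic model, and plain lists of dicts stand in for the Polars DataFrames.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class ResultContainer:
    """Hypothetical stand-in for DiagnosticsResult (not the omopy class)."""

    results: dict[str, Any] = field(default_factory=dict)
    checks_performed: tuple[str, ...] = ()

    def __getitem__(self, key: str) -> Any:
        # Dict-like access: container["missing"]
        return self.results[key]

    def __iter__(self) -> Iterator[str]:
        # Iterate over check names, not over the dataclass fields
        return iter(self.results)

    def keys(self):
        return self.results.keys()

    def values(self):
        return self.results.values()

    def items(self):
        return self.results.items()


container = ResultContainer(
    results={"missing": [{"n_missing": 3}], "type": [{"n_records": 10}]},
    checks_performed=("missing", "type"),
)
check_names = list(container)        # iteration yields check names
first_row = container["missing"][0]  # lookup by check name
```

Because the stand-in defines both keys() and __getitem__, it can also be passed to dict() to obtain a plain mapping of check name to result.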

Diagnostic Computation

Run one or more diagnostic checks on drug exposure records.

execute_checks

execute_checks(
    cdm: CdmReference,
    ingredient_concept_ids: list[int] | int,
    *,
    checks: list[str] | tuple[str, ...] | None = None,
    sample_size: int | None = 10000,
    min_cell_count: int = 5,
) -> DiagnosticsResult

Run drug exposure diagnostic checks for specified ingredients.

This is the main entry point for the drug diagnostics module. For each ingredient concept ID, it resolves descendant drug concepts, fetches (and optionally samples) drug_exposure records, and runs each enabled check.

Parameters

cdm: A CdmReference connected to an OMOP CDM database.
ingredient_concept_ids: One or more ingredient concept IDs to diagnose.
checks: Which checks to run. Defaults to all available checks. See AVAILABLE_CHECKS for valid names.
sample_size: Maximum number of records to sample per ingredient. Set to None to use all records (can be slow for large datasets).
min_cell_count: Counts below this threshold are replaced with None for privacy protection. Set to 0 to disable.

Returns

DiagnosticsResult: Container with a dict of Polars DataFrames (one per check), plus metadata about the execution.

Raises

ValueError: If any check name is not in AVAILABLE_CHECKS.
TypeError: If cdm is not a CdmReference.

Examples

import omopy
cdm = omopy.connector.cdm_from_con(con, cdm_schema="base")
result = omopy.drug_diagnostics.execute_checks(
    cdm,
    ingredient_concept_ids=[1125315, 1503297],
    checks=["missing", "exposure_duration", "type"],
    sample_size=5000,
)
result["missing"]  # Polars DataFrame
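
The check-name validation and the min_cell_count suppression described for execute_checks can be sketched in plain Python. This is a minimal illustration and not the library's code: validate_checks and suppress_small_counts are hypothetical helper names, and the treatment of zero counts is an assumption, since the text only states that counts below the threshold become None.

```python
AVAILABLE_CHECKS = (
    "missing", "exposure_duration", "type", "route", "source_concept",
    "days_supply", "verbatim_end_date", "dose", "sig", "quantity",
    "days_between", "diagnostics_summary",
)


def validate_checks(checks=None):
    """Return the checks to run; raise ValueError on unknown names."""
    if checks is None:
        # Documented default: run every available check
        return AVAILABLE_CHECKS
    unknown = [name for name in checks if name not in AVAILABLE_CHECKS]
    if unknown:
        raise ValueError(f"Unknown check(s): {unknown}")
    return tuple(checks)


def suppress_small_counts(counts, min_cell_count=5):
    """Replace counts below the threshold with None (hypothetical helper)."""
    # Assumption: every count below the threshold is suppressed, zero included.
    return {
        key: (None if value < min_cell_count else value)
        for key, value in counts.items()
    }


selected = validate_checks(["missing", "type"])
suppressed = suppress_small_counts({"oral": 3, "topical": 5, "nasal": 12})
unsuppressed = suppress_small_counts({"oral": 3}, min_cell_count=0)
```

With min_cell_count=0 no count can fall below the threshold, which matches the documented way to disable suppression.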

Summarise Functions

Aggregate diagnostic results into a standardised SummarisedResult.

summarise_drug_diagnostics

summarise_drug_diagnostics(
    result: DiagnosticsResult,
) -> SummarisedResult

Convert drug diagnostics results to SummarisedResult format.

Transforms the dict-of-DataFrames output from execute_checks into the standard 13-column omopy.generics.SummarisedResult format used by table_drug_diagnostics() and plot_drug_diagnostics().

Parameters

result: Output from execute_checks.

Returns

SummarisedResult: Standardised result with one result_id per check type.

Examples

diag = omopy.drug_diagnostics.execute_checks(cdm, [1125315])
sr = omopy.drug_diagnostics.summarise_drug_diagnostics(diag)
sr.settings
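
To make the 13-column long format more concrete, the sketch below reshapes one set of estimates into rows of that shape. The column names are those of the summarised_result object in the R omopgenerics package; that omopy uses exactly the same names, the "overall" placeholders, and string-typed estimate values is an assumption. The helper to_long_rows and its arguments are hypothetical.

```python
# Column names taken from the R omopgenerics summarised_result object
# (assumed, not confirmed, to match omopy.generics.SummarisedResult).
SUMMARISED_RESULT_COLUMNS = (
    "result_id", "cdm_name", "group_name", "group_level",
    "strata_name", "strata_level", "variable_name", "variable_level",
    "estimate_name", "estimate_type", "estimate_value",
    "additional_name", "additional_level",
)


def to_long_rows(result_id, cdm_name, ingredient, variable, estimates):
    """Emit one long-format row per estimate (hypothetical helper)."""
    rows = []
    for name, value in estimates.items():
        rows.append({
            "result_id": result_id,          # one ID per check type
            "cdm_name": cdm_name,
            "group_name": "ingredient_name",
            "group_level": ingredient,
            "strata_name": "overall",
            "strata_level": "overall",
            "variable_name": variable,
            "variable_level": None,
            "estimate_name": name,
            "estimate_type": "integer" if isinstance(value, int) else "numeric",
            "estimate_value": str(value),    # values stored as strings
            "additional_name": "overall",
            "additional_level": "overall",
        })
    return rows


rows = to_long_rows(
    result_id=1,
    cdm_name="mock",
    ingredient="ingredient_a",
    variable="days_supply",
    estimates={"count": 40, "median": 30.0},
)
```

Each estimate becomes its own row, so a check with several statistics per ingredient produces several rows that share the same result_id.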

Table Functions

Format summarised results as publication-ready tables using omopy.vis.vis_omop_table().

table_drug_diagnostics

table_drug_diagnostics(
    result: SummarisedResult,
    *,
    check: str | None = None,
    type: Literal["gt", "polars"] | None = None,
    header: list[str] | None = None,
    group_column: list[str] | None = None,
    hide: list[str] | None = None,
    style: Any | None = None,
) -> Any

Format drug diagnostics results as a display-ready table.

Parameters

result: A SummarisedResult from summarise_drug_diagnostics.
check: Specific check to display (e.g. "missing", "exposure_duration"). If None, all checks are included.
type: Output format: "gt" for great_tables.GT, "polars" for a Polars DataFrame. Default is "polars".
header: Columns to use as multi-level headers.
group_column: Columns to use for row grouping.
hide: Columns to hide from the output.
style: Optional TableStyle for customisation.

Returns

great_tables.GT | polars.DataFrame: Formatted table.

Plot Functions

Visualise diagnostic results as bar charts and box plots.

plot_drug_diagnostics

plot_drug_diagnostics(
    result: SummarisedResult,
    *,
    check: str = "missing",
    facet: str | None = None,
    colour: str | None = None,
    title: str | None = None,
    style: Any | None = None,
) -> Any

Create a plot for drug diagnostics results.

Generates bar charts for categorical checks and box plots for quantile-based checks.

Parameters

result: A SummarisedResult from summarise_drug_diagnostics.
check: Which check to plot. One of: "missing", "exposure_duration", "type", "route", "source_concept", "sig", "quantity", "days_supply", "days_between".
facet: Column to facet by (currently unused, reserved for future use).
colour: Override colour for all bars/boxes.
title: Chart title. Defaults to a descriptive title based on the check.
style: Optional plot style configuration (reserved for future use).

Returns

plotly.graph_objects.Figure: Interactive plotly figure.

Raises

ValueError: If check is not a valid plottable check name.

Mock Data & Benchmarking

mock_drug_exposure

mock_drug_exposure(
    *,
    n_ingredients: int = 2,
    n_records_per_ingredient: int = 100,
    seed: int = 42,
    include_checks: list[str] | None = None,
) -> DiagnosticsResult

Generate a mock DiagnosticsResult for testing.

Creates synthetic data representative of execute_checks output, useful for testing table/plot/summarise functions without requiring a database.

Parameters

n_ingredients: Number of ingredient concepts to simulate.
n_records_per_ingredient: Number of drug exposure records per ingredient.
seed: Random seed for reproducibility.
include_checks: Which checks to include. Defaults to all available checks.

Returns

DiagnosticsResult: Mock results with realistic distributions.
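
The role of the seed parameter can be shown with a small standard-library sketch. It is not how mock_drug_exposure is implemented: mock_records is a hypothetical helper, the concept IDs are made up, and the value ranges are arbitrary. The point is only that a dedicated, seeded random generator makes the synthetic data repeatable.

```python
import random


def mock_records(n_ingredients=2, n_records_per_ingredient=100, seed=42):
    """Generate reproducible synthetic exposure records (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; global random state untouched
    records = []
    for i in range(n_ingredients):
        concept_id = 1_000_000 + i  # made-up ingredient concept ID
        for _ in range(n_records_per_ingredient):
            records.append({
                "ingredient_concept_id": concept_id,
                "days_supply": rng.choice([7, 14, 30, 60, 90]),
                "quantity": round(rng.uniform(1, 120), 1),
            })
    return records


first = mock_records(seed=1)
second = mock_records(seed=1)
same_seed_matches = first == second
```

Using a local random.Random instance rather than random.seed() keeps the generator from interfering with other code that draws from the module-level random state.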

benchmark_drug_diagnostics

benchmark_drug_diagnostics(
    cdm: Any,
    ingredient_concept_ids: list[int],
    *,
    checks: list[str] | None = None,
    sample_size: int | None = 10000,
    n_runs: int = 3,
) -> pl.DataFrame

Benchmark execute_checks performance.

Runs execute_checks multiple times and reports timing statistics.

Parameters

cdm: A CdmReference connected to an OMOP CDM database.
ingredient_concept_ids: Ingredient concept IDs to diagnose.
checks: Which checks to run. Defaults to all.
sample_size: Maximum records to sample per ingredient.
n_runs: Number of repetitions for timing.

Returns

polars.DataFrame: DataFrame with columns: run, ingredient_concept_id, n_records, execution_time_seconds.
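
The repeated-timing pattern behind a benchmark like this can be sketched with the standard library. This is an illustration under stated assumptions rather than the omopy implementation: time_runs is a hypothetical helper, a trivial lambda stands in for the execute_checks call, and only the run and execution_time_seconds columns are reproduced.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], n_runs: int = 3) -> list[dict]:
    """Call fn n_runs times and record the wall-clock time of each run."""
    rows = []
    for run in range(1, n_runs + 1):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        elapsed = time.perf_counter() - start
        rows.append({"run": run, "execution_time_seconds": elapsed})
    return rows


# Stand-in workload; in practice this would wrap the diagnostics call.
timings = time_runs(lambda: sum(range(100_000)), n_runs=3)
mean_seconds = statistics.mean(r["execution_time_seconds"] for r in timings)
```

Keeping one row per run, instead of only a mean, makes it easy to spot a slow first run caused by caching or connection warm-up.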