omopy.drug_diagnostics
Drug exposure diagnostics — run configurable quality checks on
drug_exposure records for specified ingredient concepts, summarise
findings, and visualise results.
This module is the Python equivalent of the R DrugExposureDiagnostics
package. Table rendering delegates to omopy.vis; plot rendering
uses plotly.
Constants
AVAILABLE_CHECKS
AVAILABLE_CHECKS: tuple[str, ...] = (
"missing",
"exposure_duration",
"type",
"route",
"source_concept",
"days_supply",
"verbatim_end_date",
"dose",
"sig",
"quantity",
"days_between",
"diagnostics_summary",
)
Core Types
Pydantic model for storing diagnostic check results.
DiagnosticsResult
Bases: BaseModel
Container for drug exposure diagnostics results.
Holds a dict of Polars DataFrames keyed by check name (one per check), plus metadata about the execution. Instances are immutable after creation.
Attributes
results
Dict mapping check names to Polars DataFrames with check results.
checks_performed
Tuple of check names that were actually run.
ingredient_concepts
Dict mapping ingredient concept IDs to their names.
cdm_name
Name of the CDM instance.
sample_size
Number of records sampled per ingredient (or None if no sampling).
min_cell_count
Minimum cell count threshold used for suppression.
execution_time_seconds
Total execution time in seconds.
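The immutability described above can be pictured with a stdlib-only sketch. DiagnosticsResultSketch is a hypothetical stand-in, not the real pydantic class: a frozen dataclass that also wraps its dict fields in read-only MappingProxyType views, so neither the attributes nor the per-check mapping can be changed after construction. Field names mirror the attributes listed above; plain lists stand in for the Polars DataFrames, and all values are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class DiagnosticsResultSketch:
    """Hypothetical stdlib stand-in for DiagnosticsResult (not the real class)."""

    results: Mapping[str, Any]
    checks_performed: tuple[str, ...]
    ingredient_concepts: Mapping[int, str]
    cdm_name: str
    sample_size: int | None
    min_cell_count: int
    execution_time_seconds: float

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so use object.__setattr__ once
        # to swap the dicts for read-only proxies (no adding/removing checks).
        object.__setattr__(self, "results", MappingProxyType(dict(self.results)))
        object.__setattr__(
            self, "ingredient_concepts", MappingProxyType(dict(self.ingredient_concepts))
        )


res = DiagnosticsResultSketch(
    results={"missing": [("drug_concept_id", 0)]},  # placeholder for a DataFrame
    checks_performed=("missing",),
    ingredient_concepts={1125315: "ingredient_1125315"},  # placeholder name
    cdm_name="mock_cdm",
    sample_size=10000,
    min_cell_count=5,
    execution_time_seconds=0.1,
)
```

Attempting `res.cdm_name = "other"` raises an AttributeError subclass, and `res.results["x"] = ...` raises TypeError.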
Diagnostic Computation
Run one or more diagnostic checks on drug exposure records.
execute_checks
execute_checks(
cdm: CdmReference,
ingredient_concept_ids: list[int] | int,
*,
checks: list[str] | tuple[str, ...] | None = None,
sample_size: int | None = 10000,
min_cell_count: int = 5,
) -> DiagnosticsResult
Run drug exposure diagnostic checks for specified ingredients.
This is the main entry point for the drug diagnostics module. For each ingredient concept ID, it resolves descendant drug concepts, fetches (and optionally samples) drug_exposure records, and runs each enabled check.
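A rough sketch of the first two steps (resolving descendants and fetching matching records), using plain Python lists in place of database tables. It assumes the standard OMOP CDM concept_ancestor table, read here as (ancestor_concept_id, descendant_concept_id) pairs; the helper names and toy concept IDs are hypothetical, and the real function queries the CDM instead.

```python
from __future__ import annotations


def descendant_concepts(ingredient_id: int, concept_ancestor: list[tuple[int, int]]) -> set[int]:
    # Hypothetical helper: every concept whose ancestor is the ingredient,
    # plus the ingredient itself (OMOP also stores a self-referencing row).
    found = {ingredient_id}
    found.update(desc for anc, desc in concept_ancestor if anc == ingredient_id)
    return found


def fetch_exposures(
    ingredient_id: int,
    concept_ancestor: list[tuple[int, int]],
    drug_exposure: list[dict[str, int]],
) -> list[dict[str, int]]:
    # Keep only exposures whose drug_concept_id falls under the ingredient.
    wanted = descendant_concepts(ingredient_id, concept_ancestor)
    return [row for row in drug_exposure if row["drug_concept_id"] in wanted]


# Toy data: ingredient 1 has two descendant drug products, ingredient 2 has one.
concept_ancestor = [(1, 1), (1, 10), (1, 11), (2, 2), (2, 20)]
drug_exposure = [
    {"drug_exposure_id": 100, "drug_concept_id": 10},
    {"drug_exposure_id": 101, "drug_concept_id": 11},
    {"drug_exposure_id": 102, "drug_concept_id": 20},
    {"drug_exposure_id": 103, "drug_concept_id": 99},
]
rows = fetch_exposures(1, concept_ancestor, drug_exposure)
```

Here `rows` holds exposures 100 and 101; exposure 103 matches no ingredient and is never checked.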
Parameters
cdm
A CdmReference connected to an OMOP CDM database.
ingredient_concept_ids
One or more ingredient concept IDs to diagnose.
checks
Which checks to run. Defaults to all available checks.
See AVAILABLE_CHECKS for valid names.
sample_size
Maximum number of records to sample per ingredient. Set to
None to use all records (can be slow for large datasets).
min_cell_count
Counts below this threshold are replaced with None for
privacy protection. Set to 0 to disable.
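The sample_size and min_cell_count semantics can be sketched as two small pure functions. This is an illustration, not the module's code: sample_records and suppress are hypothetical names, sampling uses a seeded random.Random so repeated calls agree, and suppress follows the wording above literally, so a count of 0 is also suppressed when the threshold is positive (the real implementation may treat zeros differently).

```python
from __future__ import annotations

import random


def sample_records(records: list, sample_size: int | None, seed: int = 42) -> list:
    # None means "use every record"; otherwise draw at most sample_size
    # records without replacement from a private, seeded generator.
    if sample_size is None or len(records) <= sample_size:
        return list(records)
    return random.Random(seed).sample(records, sample_size)


def suppress(counts: dict[str, int], min_cell_count: int) -> dict[str, int | None]:
    # Counts strictly below the threshold become None; a threshold of 0
    # never matches a non-negative count, so it disables suppression.
    return {key: (None if value < min_cell_count else value) for key, value in counts.items()}


small = suppress({"route": 3, "sig": 5, "quantity": 120}, min_cell_count=5)
# small == {"route": None, "sig": 5, "quantity": 120}
```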
Returns
DiagnosticsResult
Container with a dict of Polars DataFrames (one per check), plus metadata about the execution.
Raises
ValueError
If any check name is not in AVAILABLE_CHECKS.
TypeError
If cdm is not a CdmReference.
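The argument handling implied by the defaults and the ValueError above might look like the sketch below. resolve_checks and normalise_ingredients are hypothetical helpers; the tuple is copied from AVAILABLE_CHECKS so the example runs without omopy installed. The TypeError check on cdm is omitted because it depends on the CdmReference class.

```python
from __future__ import annotations

# Copied from omopy.drug_diagnostics.AVAILABLE_CHECKS so the sketch is standalone.
AVAILABLE_CHECKS: tuple[str, ...] = (
    "missing", "exposure_duration", "type", "route", "source_concept",
    "days_supply", "verbatim_end_date", "dose", "sig", "quantity",
    "days_between", "diagnostics_summary",
)


def resolve_checks(checks: list[str] | tuple[str, ...] | None) -> tuple[str, ...]:
    # None selects every available check, mirroring the documented default.
    if checks is None:
        return AVAILABLE_CHECKS
    unknown = [c for c in checks if c not in AVAILABLE_CHECKS]
    if unknown:
        raise ValueError(f"Unknown check(s): {unknown}. Valid names: {AVAILABLE_CHECKS}")
    return tuple(checks)


def normalise_ingredients(ids: list[int] | int) -> list[int]:
    # A single int is accepted as shorthand for a one-element list.
    return [ids] if isinstance(ids, int) else list(ids)
```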
Examples
>>> import omopy
>>> cdm = omopy.connector.cdm_from_con(con, cdm_schema="base")
>>> result = omopy.drug_diagnostics.execute_checks(
...     cdm,
...     ingredient_concept_ids=[1125315, 1503297],
...     checks=["missing", "exposure_duration", "type"],
...     sample_size=5000,
... )
>>> result["missing"]  # Polars DataFrame
Summarise Functions
Aggregate diagnostic results into a standardised SummarisedResult.
summarise_drug_diagnostics
Convert drug diagnostics results to SummarisedResult format.
Transforms the dict-of-DataFrames output from execute_checks()
into the standard 13-column omopy.generics.SummarisedResult
format used by table_drug_diagnostics() and plot_drug_diagnostics().
Parameters
result
Output from execute_checks().
Returns
SummarisedResult
Standardised result with one result_id per check type.
Examples
>>> diag = omopy.drug_diagnostics.execute_checks(cdm, [1125315])
>>> sr = omopy.drug_diagnostics.summarise_drug_diagnostics(diag)
>>> sr.settings
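To show what a long-format row looks like, the sketch below flattens one check's counts into dicts carrying 13 columns. The column names are an assumption borrowed from the R omopgenerics SummarisedResult convention that omopy.generics appears to mirror; the helper name, the "count_missing" estimate name, the "overall" placeholders and the handling of suppressed (None) counts are likewise illustrative, not omopy's actual mapping.

```python
from __future__ import annotations

# Assumed column order, following the R omopgenerics convention.
SUMMARISED_COLUMNS = (
    "result_id", "cdm_name", "group_name", "group_level",
    "strata_name", "strata_level", "variable_name", "variable_level",
    "estimate_name", "estimate_type", "estimate_value",
    "additional_name", "additional_level",
)


def counts_to_long(
    result_id: int,
    cdm_name: str,
    ingredient: str,
    counts: dict[str, int | None],
) -> list[dict[str, object]]:
    """Hypothetical: one long-format row per (variable, count) pair."""
    rows = []
    for variable, count in counts.items():
        rows.append({
            "result_id": result_id,
            "cdm_name": cdm_name,
            "group_name": "ingredient",
            "group_level": ingredient,
            "strata_name": "overall",
            "strata_level": "overall",
            "variable_name": variable,
            "variable_level": None,
            "estimate_name": "count_missing",
            "estimate_type": "integer",
            # Estimates are stored as text; suppressed counts stay None here.
            "estimate_value": None if count is None else str(count),
            "additional_name": "overall",
            "additional_level": "overall",
        })
    return rows


rows = counts_to_long(1, "mock_cdm", "ingredient_1125315", {"quantity": 12, "route_concept_id": None})
```

Storing every estimate as text in one column is what lets results from different checks share a single table.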
Table Functions
Format summarised results as publication-ready tables using
omopy.vis.vis_omop_table().
table_drug_diagnostics
table_drug_diagnostics(
result: SummarisedResult,
*,
check: str | None = None,
type: Literal["gt", "polars"] | None = None,
header: list[str] | None = None,
group_column: list[str] | None = None,
hide: list[str] | None = None,
style: Any | None = None,
) -> Any
Format drug diagnostics results as a display-ready table.
Parameters
result
A SummarisedResult from summarise_drug_diagnostics().
check
Specific check to display (e.g. "missing", "exposure_duration").
If None, all checks are included.
type
Output format: "gt" for a great_tables.GT object, "polars" for
a Polars DataFrame. If None (the default), "polars" is used.
header
Columns to use as multi-level headers.
group_column
Columns to use for row grouping.
hide
Columns to hide from the output.
style
Optional TableStyle for customisation.
Returns
great_tables.GT | polars.DataFrame
Formatted table.
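Conceptually, a header column spreads its distinct values across the table width. The stdlib sketch below mimics that for a single header column; pivot_wide is a hypothetical helper, not part of omopy.vis, whose vis_omop_table() handles multi-level headers, grouping and styling itself. The default index columns are an assumption.

```python
from __future__ import annotations


def pivot_wide(
    rows: list[dict[str, str]],
    header: str,
    index: tuple[str, ...] = ("variable_name", "estimate_name"),
    value: str = "estimate_value",
) -> list[dict[str, str]]:
    # Rows sharing the same index values merge into one output row;
    # each distinct value of `header` becomes a new column.
    merged: dict[tuple[str, ...], dict[str, str]] = {}
    for row in rows:
        key = tuple(row[col] for col in index)
        out = merged.setdefault(key, {col: row[col] for col in index})
        out[row[header]] = row[value]
    return list(merged.values())


long_rows = [
    {"cdm_name": "cdm_a", "variable_name": "quantity", "estimate_name": "count", "estimate_value": "12"},
    {"cdm_name": "cdm_b", "variable_name": "quantity", "estimate_name": "count", "estimate_value": "7"},
    {"cdm_name": "cdm_a", "variable_name": "sig", "estimate_name": "count", "estimate_value": "3"},
]
wide = pivot_wide(long_rows, header="cdm_name")
```

With `header="cdm_name"`, each database gets its own column, which is the usual layout for comparing CDMs side by side.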
Plot Functions
Visualise diagnostic results as bar charts and box plots.
plot_drug_diagnostics
plot_drug_diagnostics(
result: SummarisedResult,
*,
check: str = "missing",
facet: str | None = None,
colour: str | None = None,
title: str | None = None,
style: Any | None = None,
) -> Any
Create a plot for drug diagnostics results.
Generates bar charts for categorical checks and box plots for quantile-based checks.
Parameters
result
A SummarisedResult from summarise_drug_diagnostics().
check
Which check to plot. One of:
"missing", "exposure_duration", "type", "route",
"source_concept", "sig", "quantity", "days_supply",
"days_between".
facet
Column to facet by (currently unused, reserved for future use).
colour
Override colour for all bars/boxes.
title
Chart title. Defaults to a descriptive title based on the check.
style
Optional plot style configuration (reserved for future use).
Returns
plotly.graph_objects.Figure
Interactive plotly figure.
Raises
ValueError
If check is not a valid plottable check name.
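The choice between bar and box charts can be sketched as a lookup plus a quartile summary. Which checks count as quantile-based is an assumption inferred from the check names (durations, supplies, quantities and gaps), not read from omopy; chart_kind and box_summary are hypothetical, and the quartiles come from statistics.quantiles with the inclusive method, which may differ from the estimator omopy uses.

```python
from __future__ import annotations

import statistics

# Plottable names as listed in the check parameter above.
PLOTTABLE = {
    "missing", "exposure_duration", "type", "route", "source_concept",
    "sig", "quantity", "days_supply", "days_between",
}
# Assumption: numeric checks are drawn as box plots, the rest as bar charts.
QUANTILE_CHECKS = {"exposure_duration", "days_supply", "quantity", "days_between"}


def chart_kind(check: str) -> str:
    if check not in PLOTTABLE:
        raise ValueError(f"{check!r} is not a plottable check")
    return "box" if check in QUANTILE_CHECKS else "bar"


def box_summary(values: list[float]) -> dict[str, float]:
    # Five numbers a box plot draws: whisker ends, quartiles and median.
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q25": q25, "median": median, "q75": q75, "max": max(values)}
```

For example, `box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])` gives quartiles 3.0, 5.0 and 7.0.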
Mock Data & Benchmarking
mock_drug_exposure
mock_drug_exposure(
*,
n_ingredients: int = 2,
n_records_per_ingredient: int = 100,
seed: int = 42,
include_checks: list[str] | None = None,
) -> DiagnosticsResult
Generate a mock DiagnosticsResult for testing.
Creates synthetic data representative of execute_checks() output,
useful for testing table/plot/summarise functions without requiring a
database.
Parameters
n_ingredients
Number of ingredient concepts to simulate.
n_records_per_ingredient
Number of drug exposure records per ingredient.
seed
Random seed for reproducibility.
include_checks
Which checks to include. Defaults to all available checks.
Returns
DiagnosticsResult
Mock results with realistic distributions.
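Reproducibility from seed typically comes from drawing every random number from one private, seeded generator. The sketch below shows that pattern for a single synthetic quantity (exposure durations); mock_durations, the placeholder ingredient IDs, the exponential distribution and the 30-day mean are hypothetical choices, not what mock_drug_exposure actually generates.

```python
from __future__ import annotations

import random


def mock_durations(
    n_ingredients: int = 2,
    n_records_per_ingredient: int = 100,
    seed: int = 42,
) -> dict[int, list[int]]:
    # One private generator: the same seed always yields the same data,
    # and the global random state is left untouched.
    rng = random.Random(seed)
    data: dict[int, list[int]] = {}
    for i in range(1, n_ingredients + 1):
        # Hypothetical placeholder ingredient IDs 1..n; durations in days,
        # exponentially distributed with a 30-day mean, floored at 1 day.
        data[i] = [
            max(1, round(rng.expovariate(1 / 30)))
            for _ in range(n_records_per_ingredient)
        ]
    return data
```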
benchmark_drug_diagnostics
benchmark_drug_diagnostics(
cdm: Any,
ingredient_concept_ids: list[int],
*,
checks: list[str] | None = None,
sample_size: int | None = 10000,
n_runs: int = 3,
) -> pl.DataFrame
Benchmark the performance of execute_checks().
Runs execute_checks() multiple times and reports the timing of each run.
Parameters
cdm
A CdmReference connected to an OMOP CDM database.
ingredient_concept_ids
Ingredient concept IDs to diagnose.
checks
Which checks to run. Defaults to all.
sample_size
Maximum records to sample per ingredient.
n_runs
Number of repetitions for timing.
Returns
polars.DataFrame
DataFrame with columns: run, ingredient_concept_id,
n_records, execution_time_seconds.
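A timing loop producing rows with the documented columns might look like this. run_checks stands in for a per-ingredient call to execute_checks and is assumed to return the number of records processed; time.perf_counter is used because it is monotonic and high-resolution. Whether omopy times each ingredient separately, as here, is an assumption.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def benchmark(
    run_checks: Callable[[int], int],
    ingredient_concept_ids: list[int],
    n_runs: int = 3,
) -> list[dict[str, float | int]]:
    rows = []
    for run in range(1, n_runs + 1):
        for concept_id in ingredient_concept_ids:
            start = time.perf_counter()
            n_records = run_checks(concept_id)  # hypothetical workload
            rows.append({
                "run": run,
                "ingredient_concept_id": concept_id,
                "n_records": n_records,
                "execution_time_seconds": time.perf_counter() - start,
            })
    return rows


def fake_checks(concept_id: int) -> int:
    # Stand-in workload: pretend each ingredient has 1,000 records to scan.
    return len([x for x in range(1000) if x >= 0])


rows = benchmark(fake_checks, [1125315, 1503297], n_runs=3)
median_time = statistics.median(r["execution_time_seconds"] for r in rows)
```

Summarising with the median rather than the mean keeps one slow first run (for example, a cold database cache) from dominating the result.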