omopy.testing
Test data generation for OMOP CDM studies — read patient data from Excel/CSV, validate against CDM specifications, construct CdmReference objects, and generate mock test databases.
This module is the Python equivalent of the R TestGenerator package.
Excel I/O uses openpyxl; cohort timeline plots use plotly.
Read & Validate
Read patient data from files and validate against CDM specifications.
read_patients
read_patients(
    path: str | Path,
    *,
    cdm_version: str = "5.4",
    test_name: str = "test",
    output_path: str | Path | None = None,
) -> dict[str, pl.DataFrame]
Read patient data from an Excel file or CSV directory.
Auto-detects the format based on the path:
- If `path` ends with `.xlsx`, reads each sheet as a CDM table (sheet name = table name).
- If `path` is a directory, reads each `.csv` file as a CDM table (filename stem = table name).
The data is validated against the CDM specification. If validation
fails, a ValueError is raised with all error messages.
If output_path is provided, writes the data as a JSON file
(format: {"table_name": [{col: val, ...}, ...], ...}).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to an `.xlsx` file or a directory of `.csv` files. | required |
| `cdm_version` | `str` | CDM version string (…). | `'5.4'` |
| `test_name` | `str` | Name for this test patient set (used in JSON metadata). | `'test'` |
| `output_path` | `str \| Path \| None` | Optional path to write JSON output. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, DataFrame]` | A dict mapping table names to Polars DataFrames. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the data fails CDM validation. |
| `FileNotFoundError` | If the path does not exist or contains no data. |
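The format detection and JSON layout described for read_patients can be illustrated with a standard-library sketch. This is not the library's implementation: `detect_format` and `read_csv_dir` are hypothetical helpers, rows are plain dicts rather than Polars DataFrames, and CDM validation is left out.

```python
import csv
import json
import tempfile
from pathlib import Path


def detect_format(path: Path) -> str:
    """Hypothetical helper: classify the input the way the docs describe."""
    if path.suffix.lower() == ".xlsx":
        return "excel"
    if path.is_dir():
        return "csv_dir"
    raise FileNotFoundError(f"No patient data found at {path}")


def read_csv_dir(path: Path) -> dict[str, list[dict[str, str]]]:
    """Hypothetical helper: one table per .csv file, keyed by filename stem."""
    tables = {}
    for csv_file in sorted(path.glob("*.csv")):
        with csv_file.open(newline="") as handle:
            tables[csv_file.stem] = list(csv.DictReader(handle))
    if not tables:
        raise FileNotFoundError(f"No .csv files in {path}")
    return tables


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "person.csv").write_text("person_id,year_of_birth\n1,1980\n2,1975\n")
    fmt = detect_format(root)
    tables = read_csv_dir(root)
    # Same layout as the documented JSON output:
    # {"table_name": [{col: val, ...}, ...], ...}
    payload = json.dumps(tables)
```

Going by the signature above, the corresponding library call would be `read_patients(path, output_path=...)`, which also validates the tables before writing the JSON file.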
validate_patient_data
Validate patient data against the OMOP CDM specification.
Checks that each table name is a valid CDM table, that column names match the CDM field specs, and that required fields are present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, DataFrame]` | Mapping of table name to Polars DataFrame. | required |
| `cdm_version` | `str` | CDM version string (…). | `'5.4'` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | A list of error messages. An empty list means the data is valid. |
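The three checks (valid table name, known columns, required fields present) and the "list of error messages" return convention can be sketched without the library. Everything here is an assumption for illustration: `TOY_SPEC` is an abbreviated stand-in for the real CDM field specification, and `validate_tables` takes column names rather than DataFrames.

```python
# Hypothetical, abbreviated spec: table -> {column: is_required}.
# The real CDM specification has many more tables and fields.
TOY_SPEC = {
    "person": {
        "person_id": True,
        "gender_concept_id": True,
        "year_of_birth": True,
        "birth_datetime": False,
    },
}


def validate_tables(data: dict[str, list[str]]) -> list[str]:
    """Collect every problem instead of stopping at the first one."""
    errors = []
    for table, columns in data.items():
        spec = TOY_SPEC.get(table)
        if spec is None:
            errors.append(f"Unknown CDM table: {table}")
            continue
        for col in columns:
            if col not in spec:
                errors.append(f"{table}: unknown column '{col}'")
        for col, required in spec.items():
            if required and col not in columns:
                errors.append(f"{table}: missing required column '{col}'")
    return errors


errors_good = validate_tables({"person": ["person_id", "gender_concept_id", "year_of_birth"]})
errors_bad = validate_tables({"person": ["person_id", "age"], "patients": ["id"]})
```

Returning a list rather than raising lets a caller decide what to do with the errors. read_patients, for example, is documented as raising a single ValueError that contains all messages.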
CDM Construction
Build CdmReference objects from JSON test definitions or synthetic data.
patients_cdm
patients_cdm(
    json_path: str | Path,
    *,
    cdm_version: str = "5.4",
    cdm_name: str | None = None,
) -> CdmReference
Load patient data from a JSON file into a CdmReference.
Reads a JSON file with the format:
{
    "_meta": {"test_name": "...", "cdm_version": "5.4"},
    "person": [{"person_id": 1, ...}, ...],
    "observation_period": [...]
}
Creates Polars DataFrames for each table and wraps them as
CdmTable (or CohortTable when appropriate).
Unlike the R equivalent, which downloads an empty Eunomia CDM, this
function creates in-memory tables from the JSON data only. Vocabulary
tables are not included; use cdm_from_con with a real database
if vocabulary tables are needed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `json_path` | `str \| Path` | Path to the JSON file. | required |
| `cdm_version` | `str` | CDM version string (…). | `'5.4'` |
| `cdm_name` | `str \| None` | Human-readable name for this CDM. Defaults to the JSON file stem or … | `None` |
Returns:
| Type | Description |
|---|---|
| `CdmReference` | A CdmReference. |
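To make the JSON layout for patients_cdm concrete, the sketch below parses a document of that shape and separates the `_meta` entry from the table entries. It uses only the standard library and plain dicts; the sample values are made up, and how the library itself turns the rows into CdmTable objects is not shown here.

```python
import json

# Illustrative document in the layout described above (values are made up).
raw = """
{
  "_meta": {"test_name": "demo", "cdm_version": "5.4"},
  "person": [{"person_id": 1, "year_of_birth": 1980}],
  "observation_period": [
    {"observation_period_id": 1, "person_id": 1,
     "observation_period_start_date": "2020-01-01",
     "observation_period_end_date": "2020-12-31"}
  ]
}
"""

document = json.loads(raw)
# "_meta" carries metadata; every other key is a table name mapped to row dicts.
meta = document.pop("_meta", {})
tables = document
table_names = sorted(tables)
```

With the library, the same file would be passed by path, as in `patients_cdm(json_path)`.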
mock_test_cdm
mock_test_cdm(
    *,
    seed: int = 42,
    n_persons: int = 5,
    cdm_version: str = "5.4",
    include_conditions: bool = True,
    include_drugs: bool = True,
    include_measurements: bool = False,
) -> CdmReference
Create a small mock CDM with synthetic data for testing.
Generates realistic-looking synthetic data for person,
observation_period, and optionally condition_occurrence,
drug_exposure, and measurement tables.
This requires no database or file I/O — everything is created in-memory as Polars DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `seed` | `int` | Random seed for reproducibility. | `42` |
| `n_persons` | `int` | Number of persons to generate. | `5` |
| `cdm_version` | `str` | CDM version string (…). | `'5.4'` |
| `include_conditions` | `bool` | Whether to generate the condition_occurrence table. | `True` |
| `include_drugs` | `bool` | Whether to generate the drug_exposure table. | `True` |
| `include_measurements` | `bool` | Whether to generate the measurement table. | `False` |
Returns:
| Type | Description |
|---|---|
| `CdmReference` | A CdmReference. |
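The role of `seed` can be shown with a small standard-library sketch of seeded synthetic data. This is an assumption about the general technique, not the generator mock_test_cdm actually uses: the helper names, value ranges, and date windows are invented for illustration, and rows are plain dicts rather than Polars DataFrames.

```python
import random
from datetime import date, timedelta


def mock_persons(n_persons: int = 5, seed: int = 42) -> list[dict]:
    """Hypothetical helper: person rows from a seeded, local generator."""
    rng = random.Random(seed)  # local RNG: the same seed yields the same rows
    return [
        {
            "person_id": person_id,
            "gender_concept_id": rng.choice([8507, 8532]),  # OMOP male / female
            "year_of_birth": rng.randint(1940, 2000),
        }
        for person_id in range(1, n_persons + 1)
    ]


def mock_observation_periods(persons: list[dict], seed: int = 42) -> list[dict]:
    """Hypothetical helper: one observation period per person, start <= end."""
    rng = random.Random(seed + 1)  # separate stream from the person table
    rows = []
    for index, person in enumerate(persons, start=1):
        start = date(2010, 1, 1) + timedelta(days=rng.randint(0, 3650))
        end = start + timedelta(days=rng.randint(30, 3650))
        rows.append({
            "observation_period_id": index,
            "person_id": person["person_id"],
            "observation_period_start_date": start,
            "observation_period_end_date": end,
        })
    return rows


persons = mock_persons(n_persons=5, seed=42)
periods = mock_observation_periods(persons, seed=42)
```

Using a local `random.Random` instance instead of the module-level functions keeps the output reproducible even when other code draws random numbers in the same process.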
Template Generation
Generate blank Excel templates with CDM-compliant column headers.
generate_test_tables
generate_test_tables(
    table_names: list[str],
    *,
    cdm_version: str = "5.4",
    output_path: str | Path = ".",
    filename: str | None = None,
) -> Path
Generate an empty Excel file with sheets for specified CDM tables.
Each sheet contains the correct column headers from the CDM
specification. Vocabulary tables (concept, concept_ancestor,
etc.) are excluded automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `table_names` | `list[str]` | List of CDM table names to include as sheets. | required |
| `cdm_version` | `str` | CDM version string (…). | `'5.4'` |
| `output_path` | `str \| Path` | Directory where the file will be created. | `'.'` |
| `filename` | `str \| None` | Output filename. Defaults to … | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the created Excel file. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If any table name is not a valid CDM table, or if a vocabulary table is requested. |
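The mapping from requested tables to header rows can be sketched as follows. This is a simplified stand-in, not the library's code: `TOY_COLUMNS` and `VOCABULARY_TABLES` are hypothetical excerpts of the CDM specification, the sketch follows the Raises table above by rejecting vocabulary tables, and it returns headers in memory instead of writing an Excel file.

```python
# Hypothetical, abbreviated column lists; the real spec is much larger.
TOY_COLUMNS = {
    "person": ["person_id", "gender_concept_id", "year_of_birth"],
    "observation_period": [
        "observation_period_id",
        "person_id",
        "observation_period_start_date",
        "observation_period_end_date",
    ],
    "concept": ["concept_id", "concept_name"],
}
VOCABULARY_TABLES = {"concept"}


def template_headers(table_names: list[str]) -> dict[str, list[str]]:
    """Return sheet name -> header row, rejecting invalid requests."""
    unknown = [name for name in table_names if name not in TOY_COLUMNS]
    if unknown:
        raise ValueError(f"Not valid CDM tables: {unknown}")
    vocabulary = [name for name in table_names if name in VOCABULARY_TABLES]
    if vocabulary:
        raise ValueError(f"Vocabulary tables are not supported: {vocabulary}")
    return {name: TOY_COLUMNS[name] for name in table_names}


headers = template_headers(["person", "observation_period"])
```

Each entry of `headers` corresponds to one sheet in the template: the key is the sheet name and the list is its single header row.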
Visualization
Plot cohort membership timelines for individual patients.
graph_cohort
Plot cohort timelines for a single subject.
Each cohort is a named DataFrame with columns cohort_definition_id,
subject_id, cohort_start_date, cohort_end_date. This
function draws a horizontal segment for each cohort entry for the
given subject_id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subject_id` | `int` | The subject to visualize. | required |
| `cohorts` | `dict[str, DataFrame]` | Mapping of cohort name to cohort DataFrame. | required |
| `style` | `Any \| None` | Optional Plotly layout overrides (dict or …) | `None` |
Returns:
| Type | Description |
|---|---|
| `Any` | A … |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no cohort records found for the subject, or if required columns are missing. |
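The data preparation behind a per-subject timeline can be sketched without Plotly: filter each cohort to the subject and emit one (name, start, end) segment per entry. `cohort_segments` is a hypothetical helper operating on lists of row dicts instead of DataFrames; the drawing step and the library's actual error messages are not reproduced.

```python
from datetime import date

REQUIRED_COLUMNS = (
    "cohort_definition_id",
    "subject_id",
    "cohort_start_date",
    "cohort_end_date",
)


def cohort_segments(
    subject_id: int, cohorts: dict[str, list[dict]]
) -> list[tuple[str, date, date]]:
    """Hypothetical helper: one horizontal segment per cohort entry."""
    segments = []
    for name, rows in cohorts.items():
        for row in rows:
            missing = [col for col in REQUIRED_COLUMNS if col not in row]
            if missing:
                raise ValueError(f"Cohort '{name}' is missing columns: {missing}")
            if row["subject_id"] == subject_id:
                segments.append((name, row["cohort_start_date"], row["cohort_end_date"]))
    if not segments:
        raise ValueError(f"No cohort records found for subject {subject_id}")
    return segments


cohorts = {
    "exposure": [
        {"cohort_definition_id": 1, "subject_id": 1,
         "cohort_start_date": date(2020, 1, 1), "cohort_end_date": date(2020, 3, 1)},
        {"cohort_definition_id": 1, "subject_id": 2,
         "cohort_start_date": date(2020, 5, 1), "cohort_end_date": date(2020, 6, 1)},
    ],
    "outcome": [
        {"cohort_definition_id": 2, "subject_id": 1,
         "cohort_start_date": date(2020, 2, 15), "cohort_end_date": date(2020, 2, 20)},
    ],
}
segments = cohort_segments(1, cohorts)
```

Each tuple in `segments` maps to one horizontal line on the timeline, with the cohort name on the vertical axis and the two dates as the endpoints.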