omopy.generics¶
Core type system for OMOPy — foundational types that all other modules depend on.
This module is the Python equivalent of R's omopgenerics package.
CDM Container Classes¶
CdmReference¶
CdmReference
¶
CdmReference(
tables: dict[str, CdmTable] | None = None,
*,
cdm_version: CdmVersion = CdmVersion.V5_4,
cdm_name: str = "",
cdm_source: CdmSource | None = None,
)
Top-level container for an OMOP CDM instance.
Holds a collection of named CDM tables and optional source metadata.
Behaves like a dict: cdm["person"] returns the person CdmTable.
Usage::
    cdm = CdmReference(
        tables={"person": person_tbl, "observation_period": obs_tbl},
        cdm_version=CdmVersion.V5_4,
        cdm_name="my_cdm",
    )
    person = cdm["person"]
    cdm["my_cohort"] = my_cohort_table  # insert new table
CdmSource¶
CdmSource
¶
Bases: Protocol
Protocol for CDM data sources (database backends, local files, etc.).
Implementations in later phases:
- DbSource (Phase 1): database-backed via Ibis/SQLAlchemy
- LocalCdm (Phase 0): in-memory Polars DataFrames
This protocol defines the minimal interface that CdmReference needs from its backend.
CdmTable¶
CdmTable
¶
CdmTable(
data: DataFrame | LazyFrame | Any,
*,
tbl_name: str,
tbl_source: str = "local",
cdm: CdmReference | None = None,
)
A named table in an OMOP CDM, wrapping a concrete data source.
The class preserves three key pieces of metadata through transformations:
- tbl_name: canonical CDM table name (e.g. "person").
- tbl_source: string identifier for the source (e.g. "duckdb").
- cdm: weak back-reference to the parent CdmReference.
Creating derived tables (filter, join, etc.) should use _with_data
to produce a new CdmTable that inherits the metadata.
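The metadata-preserving pattern described above can be sketched in plain Python. This is an illustration only, not OMOPy's implementation: TableSketch and ParentSketch are hypothetical stand-ins, rows are plain dicts rather than Polars or Ibis objects, and the body of _with_data is an assumption about how such a helper might work.

```python
from __future__ import annotations

import weakref
from typing import Any, Callable


class TableSketch:
    """Hypothetical stand-in for CdmTable; rows are plain dicts."""

    def __init__(
        self,
        data: list[dict[str, Any]],
        *,
        tbl_name: str,
        tbl_source: str = "local",
        cdm: Any = None,
    ) -> None:
        self.data = data
        self.tbl_name = tbl_name
        self.tbl_source = tbl_source
        # Hold only a weak reference so the table does not keep its parent alive.
        self._cdm_ref = weakref.ref(cdm) if cdm is not None else None

    @property
    def cdm(self) -> Any:
        return self._cdm_ref() if self._cdm_ref is not None else None

    def _with_data(self, data: list[dict[str, Any]]) -> TableSketch:
        # Build a new table around new data, copying all three metadata fields.
        return type(self)(
            data, tbl_name=self.tbl_name, tbl_source=self.tbl_source, cdm=self.cdm
        )

    def filter(self, predicate: Callable[[dict[str, Any]], bool]) -> TableSketch:
        return self._with_data([row for row in self.data if predicate(row)])


class ParentSketch:
    """Hypothetical parent container (plain objects support weak references)."""


parent = ParentSketch()
person = TableSketch(
    [
        {"person_id": 1, "year_of_birth": 1980},
        {"person_id": 2, "year_of_birth": 2001},
    ],
    tbl_name="person",
    tbl_source="duckdb",
    cdm=parent,
)
born_before_2000 = person.filter(lambda row: row["year_of_birth"] < 2000)
```

Routing every derived table through one helper means a new transformation cannot forget to carry the metadata along.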
data
property
¶
The underlying data (Polars DataFrame/LazyFrame or Ibis table expression).
cdm
property
writable
¶
Back-reference to the parent CDM reference, if any.
filter
¶
Filter rows, preserving CdmTable metadata.
select
¶
Select columns, preserving CdmTable metadata.
join
¶
join(
other: CdmTable | DataFrame | LazyFrame | Any,
on: str | list[str] | None = None,
how: str = "inner",
**kwargs: Any,
) -> Self
Join with another table, preserving this table's metadata.
collect
¶
Materialize the data to a Polars DataFrame.
For lazy sources (LazyFrame, Ibis), this triggers execution. Uses PyArrow as the zero-copy interchange format when available.
count
¶
Return the number of rows.
For Ibis-backed tables, uses the database's COUNT(*) rather than materialising the full table.
CohortTable¶
CohortTable
¶
CohortTable(
data: DataFrame | LazyFrame | Any,
*,
tbl_name: str = "cohort",
tbl_source: str = "local",
cdm: CdmReference | None = None,
settings: DataFrame | None = None,
attrition: DataFrame | None = None,
cohort_codelist: DataFrame | None = None,
)
Bases: CdmTable
A specialised CDM table representing a generated cohort.
Extends CdmTable with three pieces of companion metadata:
- settings: a DataFrame mapping cohort_definition_id to cohort_name (and possibly other columns).
- attrition: a DataFrame tracking inclusion/exclusion at each step.
- cohort_codelist: a Codelist of concept IDs used to generate each cohort.
These mirror the R cohort_set, cohort_attrition, and cohort_codelist attributes.
cohort_count
¶
Compute number of records and subjects per cohort definition.
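As a rough illustration of what a per-cohort count involves, the sketch below aggregates plain dict rows. It does not use OMOPy; the function name and the output keys number_records and number_subjects are assumptions (borrowed from the naming used by R's omopgenerics), and the real method may return a DataFrame with a different layout.

```python
from collections import defaultdict


def cohort_count_sketch(rows):
    """Count records and distinct subjects per cohort_definition_id."""
    records = defaultdict(int)
    subjects = defaultdict(set)
    for row in rows:
        cohort_id = row["cohort_definition_id"]
        records[cohort_id] += 1                       # every row is one record
        subjects[cohort_id].add(row["subject_id"])    # a set keeps subjects distinct
    return [
        {
            "cohort_definition_id": cohort_id,
            "number_records": records[cohort_id],
            "number_subjects": len(subjects[cohort_id]),
        }
        for cohort_id in sorted(records)
    ]


cohort_rows = [
    {"cohort_definition_id": 1, "subject_id": 10},
    {"cohort_definition_id": 1, "subject_id": 10},  # same subject, second entry
    {"cohort_definition_id": 1, "subject_id": 11},
    {"cohort_definition_id": 2, "subject_id": 10},
]
counts = cohort_count_sketch(cohort_rows)
```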
Codelist Types¶
Codelist¶
Codelist
¶
Bases: dict[str, list[int]]
A named collection of concept ID lists.
Inherits from dict[str, list[int]]. Keys are codelist names;
values are lists of integer concept IDs.
Usage::
    cl = Codelist({"diabetes": [201826, 442793], "hypertension": [316866]})
    assert "diabetes" in cl
    assert cl["diabetes"] == [201826, 442793]
ConceptEntry¶
ConceptEntry
¶
Bases: BaseModel
A single concept within a concept set expression.
Matches the ATLAS JSON format::
    {
        "concept": {"CONCEPT_ID": 123, "CONCEPT_NAME": "Foo", ...},
        "isExcluded": false,
        "includeDescendants": true,
        "includeMapped": false
    }
ConceptSetExpression¶
ConceptSetExpression
¶
ConceptSetExpression(
data: dict[str, list[ConceptEntry]] | None = None,
/,
**kwargs: list[ConceptEntry],
)
Bases: dict[str, list[ConceptEntry]]
A named collection of concept set expressions (with flags).
Each entry includes concept metadata plus is_excluded,
include_descendants, and include_mapped flags.
Usage::
    cse = ConceptSetExpression({
        "diabetes": [
            ConceptEntry(concept_id=201826, include_descendants=True),
            ConceptEntry(concept_id=442793, is_excluded=True),
        ]
    })
to_codelist
¶
Convert to a simple Codelist.
Drops flags, keeping only included concepts.
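The conversion described above can be sketched without OMOPy as follows. EntrySketch is a hypothetical dataclass standing in for ConceptEntry, and the function is an assumption about the behaviour: it keeps the concept IDs of non-excluded entries and does not expand descendants or mapped concepts, since that would need vocabulary tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntrySketch:
    """Hypothetical stand-in for ConceptEntry with the three flags."""

    concept_id: int
    is_excluded: bool = False
    include_descendants: bool = False
    include_mapped: bool = False


def to_codelist_sketch(expression):
    """Drop the flags and keep only the IDs of entries that are not excluded."""
    return {
        name: [entry.concept_id for entry in entries if not entry.is_excluded]
        for name, entries in expression.items()
    }


expression = {
    "diabetes": [
        EntrySketch(concept_id=201826, include_descendants=True),
        EntrySketch(concept_id=442793, is_excluded=True),
    ]
}
simple = to_codelist_sketch(expression)
```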
Summarised Results¶
SummarisedResult¶
SummarisedResult
¶
Standard OHDSI summarised result format.
Wraps a Polars DataFrame with the 13 required columns plus a companion settings DataFrame. Provides methods for:
- Suppression (suppress)
- Splitting name-level pairs (split_group, split_strata, etc.)
- Uniting columns into name-level pairs (unite_group, unite_strata, etc.)
- Pivoting estimates (pivot_estimates)
- Adding settings (add_settings)
- Filtering by settings, strata, or group values
suppress
¶
Suppress estimate values where counts are below min_cell_count.
Following the R implementation:
1. Identify rows where variable_name is in GROUP_COUNT_VARIABLES and estimate_value < min_cell_count.
2. Mark those result_id + group + strata + variable combinations.
3. Set estimate_value to "-" (suppressed sentinel) for those rows and linked percentage rows.
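The three steps can be sketched on plain dict rows as shown below. This is a simplified reading, not the library's code: the contents of COUNT_VARIABLES, the estimate names "count" and "percentage", and the reduced key (group_level and strata_level only) are all assumptions made for the example.

```python
SUPPRESSED = "-"
# Assumed contents; this page does not list the value of GROUP_COUNT_VARIABLES.
COUNT_VARIABLES = ("number records", "number subjects")


def _key(row):
    # Simplified combination of result_id + group + strata + variable.
    return (
        row["result_id"],
        row["group_level"],
        row["strata_level"],
        row["variable_name"],
    )


def suppress_sketch(rows, min_cell_count=5):
    # Steps 1 and 2: collect combinations whose count is below the threshold.
    flagged = {
        _key(row)
        for row in rows
        if row["variable_name"] in COUNT_VARIABLES
        and row["estimate_name"] == "count"
        and float(row["estimate_value"]) < min_cell_count
    }
    # Step 3: blank the count and its linked percentage; copy rows, never mutate.
    result = []
    for row in rows:
        hit = _key(row) in flagged and row["estimate_name"] in ("count", "percentage")
        result.append({**row, "estimate_value": SUPPRESSED} if hit else dict(row))
    return result


base = {"result_id": 1, "strata_level": "overall", "variable_name": "number subjects"}
result_rows = [
    {**base, "group_level": "cohort_a", "estimate_name": "count", "estimate_value": "3"},
    {**base, "group_level": "cohort_a", "estimate_name": "percentage", "estimate_value": "1.5"},
    {**base, "group_level": "cohort_b", "estimate_name": "count", "estimate_value": "12"},
]
suppressed = suppress_sketch(result_rows, min_cell_count=5)
```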
split_strata
¶
Split strata_name/strata_level into individual columns.
split_additional
¶
Split additional_name/additional_level into individual columns.
unite_group
¶
Unite columns into group_name/group_level.
unite_strata
¶
Unite columns into strata_name/strata_level.
unite_additional
¶
Unite columns into additional_name/additional_level.
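The split_* and unite_* methods are inverse operations on name-level pairs. The sketch below shows the idea for a single pair of strings. The separator " &&& " and the "overall" placeholder follow the convention of R's omopgenerics and are assumptions here, since this page lists NAME_LEVEL_SEP and OVERALL without their values; split_pair and unite_pair are hypothetical helper names.

```python
NAME_LEVEL_SEP = " &&& "  # assumed value, as in R's omopgenerics
OVERALL = "overall"       # assumed placeholder for "no stratification"


def split_pair(name, level):
    """Turn e.g. ('sex &&& age_group', 'Female &&& 18 to 64') into columns."""
    if name == OVERALL:
        return {}
    names = name.split(NAME_LEVEL_SEP)
    levels = level.split(NAME_LEVEL_SEP)
    if len(names) != len(levels):
        raise ValueError(f"{len(names)} names but {len(levels)} levels")
    return dict(zip(names, levels))


def unite_pair(columns):
    """Inverse of split_pair: join column names and values into one pair."""
    if not columns:
        return OVERALL, OVERALL
    return NAME_LEVEL_SEP.join(columns), NAME_LEVEL_SEP.join(columns.values())


strata = split_pair("sex &&& age_group", "Female &&& 18 to 64")
round_trip = unite_pair(strata)
```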
pivot_estimates
¶
Pivot estimate_name/estimate_value into wide format.
Each unique estimate_name becomes a column, with values from
estimate_value, cast according to estimate_type.
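A minimal sketch of this pivot on plain dict rows follows. It is not the Polars-based implementation; the id_columns parameter is hypothetical, and the set of estimate_type values handled in _cast is an assumption based on the types used by R's omopgenerics.

```python
def _cast(value, estimate_type):
    # Assumed type names; anything unrecognised stays a string.
    if estimate_type == "integer":
        return int(value)
    if estimate_type in ("numeric", "percentage", "proportion"):
        return float(value)
    return value


def pivot_estimates_sketch(rows, id_columns):
    """One output row per id combination, one column per estimate_name."""
    wide = {}
    for row in rows:
        key = tuple(row[col] for col in id_columns)
        record = wide.setdefault(key, dict(zip(id_columns, key)))
        record[row["estimate_name"]] = _cast(row["estimate_value"], row["estimate_type"])
    return list(wide.values())


long_rows = [
    {"variable_name": "age", "estimate_name": "mean", "estimate_type": "numeric", "estimate_value": "41.5"},
    {"variable_name": "age", "estimate_name": "count", "estimate_type": "integer", "estimate_value": "120"},
    {"variable_name": "sex", "estimate_name": "count", "estimate_type": "integer", "estimate_value": "64"},
]
wide_rows = pivot_estimates_sketch(long_rows, ["variable_name"])
```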
add_settings
¶
Join settings columns to the result data.
If columns is None, all settings columns are joined.
filter_settings
¶
Filter by settings values.
Example::
    result.filter_settings(result_type="cohort_count")
filter_additional
¶
Filter by additional name-level pairs.
tidy
¶
Convert to a tidy DataFrame.
Equivalent to adding settings, splitting all name-level pairs, and pivoting estimates.
Schema Definitions¶
CdmSchema¶
CdmSchema
¶
Registry for OMOP CDM schema specifications.
All data is lazily loaded and cached at the class level on first access.
Usage::
    schema = CdmSchema(CdmVersion.V5_4)
    person_fields = schema.fields_for_table("person")
    required_tables = schema.required_table_names()
table_specs
property
¶
All table-level specs for this CDM version.
result_field_specs
property
¶
Specs for summarised_result / settings fields.
field_table_columns
property
¶
Semantic column mappings for clinical tables.
fields_for_table
¶
Return field specs for a specific table.
required_fields_for_table
¶
Return only required field specs for a table.
table_names
¶
Return all table names, optionally filtered by type.
required_table_names
¶
Return names of tables marked as required at table level.
table_names_in_group
¶
Return table names belonging to a logical group.
table_spec_for
¶
Return the TableSpec for a specific table, or None.
field_column_info
¶
Get semantic column mapping for a clinical table.
validate_columns
¶
validate_columns(
table_name: str,
columns: Sequence[str],
*,
check_required: bool = True,
) -> list[str]
Validate columns against the spec. Returns a list of error messages.
Checks:
1. If check_required, all required columns must be present.
2. (Warning-level) Extra columns not in spec are noted.
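The two checks can be sketched as a standalone function. Unlike the method above, this hypothetical version takes the spec as explicit arguments (spec_columns and required_columns) rather than looking it up by table name, and the message wording is invented for the example.

```python
def validate_columns_sketch(columns, spec_columns, required_columns, *, check_required=True):
    """Return a list of messages; an empty list means the columns look fine."""
    messages = []
    present = set(columns)
    # Check 1: every required column must be present.
    if check_required:
        for col in required_columns:
            if col not in present:
                messages.append(f"missing required column: {col}")
    # Check 2: note columns that the spec does not know about.
    known = set(spec_columns)
    for col in columns:
        if col not in known:
            messages.append(f"extra column not in spec: {col}")
    return messages


spec = ["person_id", "gender_concept_id", "year_of_birth", "month_of_birth"]
required = ["person_id", "gender_concept_id", "year_of_birth"]
problems = validate_columns_sketch(["person_id", "year_of_birth", "shoe_size"], spec, required)
```

Returning messages instead of raising lets a caller report every problem in one pass.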
FieldSpec¶
FieldSpec
¶
Bases: BaseModel
A single field in a CDM table (from fieldsTables).
varchar_length
property
¶
Extract max length from varchar(N) or varchar(max).
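One way such a property could parse the datatype string is with a regular expression, as in the sketch below. The function name is hypothetical, and returning None for varchar(max) and for non-varchar types is an assumption; the actual property may use a different convention.

```python
from __future__ import annotations

import re

_VARCHAR = re.compile(r"^varchar\((\d+|max)\)$", re.IGNORECASE)


def varchar_length_sketch(datatype: str) -> int | None:
    """Return N for 'varchar(N)'; None for 'varchar(max)' or other types."""
    match = _VARCHAR.match(datatype.strip())
    if match is None or match.group(1).lower() == "max":
        return None
    return int(match.group(1))


length = varchar_length_sketch("varchar(50)")
```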
TableSpec¶
TableSpec
¶
Bases: BaseModel
Table-level metadata from the CDM spec CSVs.
ResultFieldSpec¶
ResultFieldSpec
¶
Bases: BaseModel
Field specification for a summarised/compared result.
Enums¶
CdmVersion¶
CdmVersion
¶
Bases: StrEnum
Supported OMOP CDM versions.
CdmDataType¶
CdmDataType
¶
Bases: StrEnum
Data types used in OMOP CDM field specifications.
from_spec
classmethod
¶
Parse a CDM datatype string like 'varchar(50)' or 'integer'.
TableType¶
TableType
¶
Bases: StrEnum
Classification of CDM table types.
TableGroup¶
TableGroup
¶
Bases: StrEnum
Logical groupings of CDM tables for batch selection.
TableSchema¶
TableSchema
¶
Bases: StrEnum
Database schema a CDM table lives in.
Type Aliases & Constants¶
CdmVersionLiteral¶
SUPPORTED_CDM_VERSIONS¶
NAME_LEVEL_SEP¶
OVERALL¶
COHORT_REQUIRED_COLUMNS¶
COHORT_REQUIRED_COLUMNS
module-attribute
¶
COHORT_REQUIRED_COLUMNS: tuple[str, ...] = (
"cohort_definition_id",
"subject_id",
"cohort_start_date",
"cohort_end_date",
)
SUMMARISED_RESULT_COLUMNS¶
SUMMARISED_RESULT_COLUMNS
module-attribute
¶
SUMMARISED_RESULT_COLUMNS: tuple[str, ...] = (
"result_id",
"cdm_name",
"group_name",
"group_level",
"strata_name",
"strata_level",
"variable_name",
"variable_level",
"estimate_name",
"estimate_type",
"estimate_value",
"additional_name",
"additional_level",
)
SETTINGS_REQUIRED_COLUMNS¶
SETTINGS_REQUIRED_COLUMNS
module-attribute
¶
SETTINGS_REQUIRED_COLUMNS: tuple[str, ...] = (
"result_id",
"result_type",
"package_name",
"package_version",
)
GROUP_COUNT_VARIABLES¶
GROUP_COUNT_VARIABLES
module-attribute
¶
Validation Functions¶
assert_character
¶
assert_character(
value: Any,
*,
name: str = "value",
min_length: int | None = None,
max_length: int | None = None,
na_allowed: bool = True,
null_allowed: bool = False,
) -> None
Assert value is a string or sequence of strings.
assert_choice
¶
assert_choice(
value: Any,
choices: Sequence[Any],
*,
name: str = "value",
null_allowed: bool = False,
) -> None
Assert value is one of the given choices.
assert_class
¶
assert_class(
value: Any,
cls: type | tuple[type, ...],
*,
name: str = "value",
null_allowed: bool = False,
) -> None
Assert value is an instance of cls.
assert_date
¶
Assert value is a datetime.date (or datetime).
assert_list
¶
assert_list(
value: Any,
*,
name: str = "value",
element_class: type | None = None,
min_length: int | None = None,
null_allowed: bool = False,
) -> None
Assert value is a list (or sequence).
assert_logical
¶
Assert value is a boolean.
assert_numeric
¶
assert_numeric(
value: Any,
*,
name: str = "value",
min_val: int | float | None = None,
max_val: int | float | None = None,
null_allowed: bool = False,
) -> None
Assert value is numeric (int or float).
assert_true
¶
Assert a boolean condition is True.
assert_table_columns
¶
assert_table_columns(
columns: Sequence[str],
required: Sequence[str],
*,
table_name: str = "table",
) -> None
Assert all required columns are present in columns.
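A sketch of this kind of check is shown below, using the cohort column tuple listed earlier on this page. The function is a hypothetical re-implementation, and raising ValueError is an assumption: the exception type OMOPy raises is not stated here.

```python
COHORT_REQUIRED_COLUMNS = (
    "cohort_definition_id",
    "subject_id",
    "cohort_start_date",
    "cohort_end_date",
)


def assert_table_columns_sketch(columns, required, *, table_name="table"):
    """Raise if any required column is absent; return None otherwise."""
    present = set(columns)
    missing = [col for col in required if col not in present]
    if missing:
        # ValueError is an assumed choice for this illustration.
        raise ValueError(f"{table_name} is missing required columns: {', '.join(missing)}")


# Passes silently: all four required columns are present (extras are allowed).
assert_table_columns_sketch(
    [*COHORT_REQUIRED_COLUMNS, "extra_flag"], COHORT_REQUIRED_COLUMNS, table_name="my_cohort"
)

try:
    assert_table_columns_sketch(
        ["cohort_definition_id", "subject_id"], COHORT_REQUIRED_COLUMNS, table_name="my_cohort"
    )
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```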
I/O Functions¶
export_codelist
¶
Export a Codelist to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| codelist | Codelist | The codelist to export. | required |
| path | str \| Path | Directory to write files to. | required |
| format | str | | 'csv' |
import_codelist
¶
Import a Codelist from file(s).
If path is a CSV file, reads it (expects the columns codelist_name and concept_id).
If path is a directory, reads all .json files as individual concept sets.
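The CSV layout mentioned above (one row per concept, with codelist_name and concept_id columns) can be read with the standard library as sketched here. The helper name is hypothetical and it reads from a file-like object rather than a path; it only illustrates the expected shape of the data, not OMOPy's reader.

```python
import csv
import io


def read_codelist_csv(handle):
    """Group concept IDs by codelist name, preserving file order."""
    codelist = {}
    for row in csv.DictReader(handle):
        codelist.setdefault(row["codelist_name"], []).append(int(row["concept_id"]))
    return codelist


sample = io.StringIO(
    "codelist_name,concept_id\n"
    "diabetes,201826\n"
    "diabetes,442793\n"
    "hypertension,316866\n"
)
loaded = read_codelist_csv(sample)
```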
export_concept_set_expression
¶
export_concept_set_expression(
cse: ConceptSetExpression,
path: str | Path,
*,
format: str = "json",
) -> Path
Export a ConceptSetExpression to JSON files (one per concept set).
import_concept_set_expression
¶
import_concept_set_expression(
path: str | Path, *, format: str | None = None
) -> ConceptSetExpression
Import a ConceptSetExpression from JSON file(s) or a CSV.
export_summarised_result
¶
export_summarised_result(
result: SummarisedResult,
path: str | Path,
*,
min_cell_count: int = 5,
) -> Path
Export a SummarisedResult to a CSV file.
Applies suppression before export. Settings are stored as additional rows in the same CSV with a special marker column.
import_summarised_result
¶
Import a SummarisedResult from a CSV file.