omopy.profiles¶
Patient-level enrichment for OMOP CDM tables: adds demographic information, cohort/table/concept intersections, death data, and categorical binning.
This module is the Python equivalent of the R PatientProfiles package.
Demographics¶
add_age¶
add_age(
x: CdmTable,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
age_name: str = "age",
age_missing_month: int = 1,
age_missing_day: int = 1,
age_impose_month: bool = False,
age_impose_day: bool = False,
age_unit: Literal["years", "months", "days"] = "years",
age_group: dict[str, tuple[float, float]]
| list[tuple[float, float]]
| None = None,
missing_age_group_value: str = "None",
) -> CdmTable
Add an age column computed at the index date.
Uses the R PatientProfiles integer-arithmetic trick for year/month age computation.
Parameters¶
x
Input CDM table.
cdm
CDM reference. If None, uses x.cdm.
index_date
Column to compute age at.
age_name
Output column name.
age_missing_month, age_missing_day
Defaults for missing birth month/day.
age_impose_month, age_impose_day
Force defaults even when actual values exist.
age_unit
"years" (default), "months", or "days".
age_group
Optional age-group binning (dict or list of ranges).
missing_age_group_value
Value for missing age groups.
Returns¶
CdmTable Input table with the age column added.
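The integer-arithmetic approach mentioned in the description can be illustrated in plain Python. This is only a sketch under an assumption about the technique (encoding each date as a YYYYMMDD integer and integer-dividing the difference by 10000); `age_in_years` is a hypothetical helper, not part of omopy, and the library's actual implementation may differ.

```python
from datetime import date


def age_in_years(birth: date, index: date) -> int:
    """Completed years between birth and index (hypothetical helper)."""
    # Encode each date as the integer YYYYMMDD.
    birth_key = birth.year * 10000 + birth.month * 100 + birth.day
    index_key = index.year * 10000 + index.month * 100 + index.day
    # Integer division by 10000 discards the month/day remainder, so the
    # result only increments once the birthday has been reached.
    return (index_key - birth_key) // 10000


# The day before the 30th birthday still counts as 29 completed years.
print(age_in_years(date(1990, 6, 15), date(2020, 6, 14)))  # 29
print(age_in_years(date(1990, 6, 15), date(2020, 6, 15)))  # 30
```

The appeal of this formulation is that it needs only integer operations, which translate directly to SQL backends without date-difference functions.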
add_sex¶
add_sex(
x: CdmTable,
cdm: CdmReference | None = None,
*,
sex_name: str = "sex",
missing_sex_value: str = "None",
) -> CdmTable
Add a sex column (Male / Female / missing).
Maps gender_concept_id: 8507 → Male, 8532 → Female,
anything else → missing_sex_value.
Parameters¶
x
Input CDM table.
cdm
CDM reference. If None, uses x.cdm.
sex_name
Output column name.
missing_sex_value
Value when sex is unknown.
Returns¶
CdmTable Input table with the sex column added.
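The concept mapping described above amounts to a small lookup. The sketch below shows it for a single value in plain Python; `map_sex` is a hypothetical helper used for illustration, not the library's implementation.

```python
# Standard OMOP gender concept IDs named in the description above.
SEX_BY_CONCEPT_ID = {8507: "Male", 8532: "Female"}


def map_sex(gender_concept_id, missing_sex_value="None"):
    """Map a gender_concept_id to a sex label (hypothetical helper)."""
    # Any ID other than 8507/8532, including a missing ID, falls back
    # to the configured missing value.
    return SEX_BY_CONCEPT_ID.get(gender_concept_id, missing_sex_value)


print(map_sex(8507))                              # Male
print(map_sex(0))                                 # None (the string)
print(map_sex(None, missing_sex_value="Unknown")) # Unknown
```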
add_demographics¶
add_demographics(
x: CdmTable,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
age: bool = True,
age_name: str = "age",
age_missing_month: int = 1,
age_missing_day: int = 1,
age_impose_month: bool = False,
age_impose_day: bool = False,
age_unit: Literal["years", "months", "days"] = "years",
age_group: dict[str, tuple[float, float]]
| list[tuple[float, float]]
| None = None,
missing_age_group_value: str = "None",
sex: bool = True,
sex_name: str = "sex",
missing_sex_value: str = "None",
prior_observation: bool = True,
prior_observation_name: str = "prior_observation",
prior_observation_type: Literal[
"days", "date"
] = "days",
future_observation: bool = True,
future_observation_name: str = "future_observation",
future_observation_type: Literal[
"days", "date"
] = "days",
date_of_birth: bool = False,
date_of_birth_name: str = "date_of_birth",
) -> CdmTable
Add demographic columns to a CDM table.
Joins the person and observation_period tables to compute
age, sex, prior/future observation, and date of birth for each row.
Parameters¶
x
Input CDM table (must have a person identifier and index_date).
cdm
CDM reference. If None, uses x.cdm.
index_date
Column name of the date to compute demographics at.
age, sex, prior_observation, future_observation, date_of_birth
Whether to add each column.
age_name, sex_name, prior_observation_name,
future_observation_name, date_of_birth_name
Column names for the output.
age_missing_month, age_missing_day
Default month/day when birth month/day is missing.
age_impose_month, age_impose_day
Force the default month/day even when actual values exist.
age_unit
Unit for age: "years" (default), "months", or "days".
age_group
Optional age-group binning. Dict mapping label to (lower, upper)
range, or a list of (lower, upper) tuples (auto-labelled).
missing_age_group_value
Value for rows where age is missing.
missing_sex_value
Value for rows where sex is unknown.
prior_observation_type, future_observation_type
"days" for integer days, "date" for the actual date.
Returns¶
CdmTable The input table with new demographic columns.
add_date_of_birth¶
add_date_of_birth(
x: CdmTable,
cdm: CdmReference | None = None,
*,
date_of_birth_name: str = "date_of_birth",
missing_month: int = 1,
missing_day: int = 1,
impose_month: bool = False,
impose_day: bool = False,
) -> CdmTable
Add a date_of_birth column constructed from the person table.
Combines year_of_birth, month_of_birth, day_of_birth
from the person table into a single date column.
Parameters¶
x
Input CDM table.
cdm
CDM reference. If None, uses x.cdm.
date_of_birth_name
Output column name.
missing_month, missing_day
Defaults when birth month/day is missing.
impose_month, impose_day
Force defaults even when actual values exist.
Returns¶
CdmTable Input table with date of birth added.
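How the missing and impose parameters interact can be sketched for a single person record. `build_date_of_birth` is a hypothetical helper; in particular, returning `None` when the birth year itself is missing is an assumption, not documented behaviour.

```python
from datetime import date


def build_date_of_birth(
    year,
    month,
    day,
    *,
    missing_month=1,
    missing_day=1,
    impose_month=False,
    impose_day=False,
):
    """Combine birth year/month/day into a date (hypothetical helper)."""
    if year is None:
        # Assumption: without a birth year no date can be constructed.
        return None
    # Use the default when the value is missing, or always when imposed.
    if impose_month or month is None:
        month = missing_month
    if impose_day or day is None:
        day = missing_day
    return date(year, month, day)


print(build_date_of_birth(1980, None, None))          # 1980-01-01
print(build_date_of_birth(1980, 5, 20, impose_day=True))  # 1980-05-01
```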
add_prior_observation¶
add_prior_observation(
x: CdmTable,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
prior_observation_name: str = "prior_observation",
prior_observation_type: Literal[
"days", "date"
] = "days",
) -> CdmTable
Add a prior_observation column.
Computes the number of days from the start of the observation period containing the index date to the index date, or returns that start date itself when prior_observation_type is "date".
Parameters¶
x
Input CDM table.
cdm
CDM reference. If None, uses x.cdm.
index_date
Column to measure from.
prior_observation_name
Output column name.
prior_observation_type
"days" for integer days, "date" for the observation start date.
Returns¶
CdmTable Input table with prior observation added.
add_future_observation¶
add_future_observation(
x: CdmTable,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
future_observation_name: str = "future_observation",
future_observation_type: Literal[
"days", "date"
] = "days",
) -> CdmTable
Add a future_observation column.
Computes the number of days from the index date to the end of the observation period containing the index date, or returns that end date itself when future_observation_type is "date".
Parameters¶
x
Input CDM table.
cdm
CDM reference. If None, uses x.cdm.
index_date
Column to measure from.
future_observation_name
Output column name.
future_observation_type
"days" for integer days, "date" for the observation end date.
Returns¶
CdmTable Input table with future observation added.
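The "days" variants of prior and future observation are plain date differences against the observation period that contains the index date. A minimal sketch for one row follows; `observation_span` is a hypothetical helper, and returning `None` when the index date falls outside the period is an assumption.

```python
from datetime import date


def observation_span(index: date, obs_start: date, obs_end: date):
    """Return (prior_days, future_days) for one row (hypothetical helper)."""
    if not (obs_start <= index <= obs_end):
        # Assumption: no containing observation period means no value.
        return None, None
    prior_days = (index - obs_start).days
    future_days = (obs_end - index).days
    return prior_days, future_days


print(observation_span(date(2020, 3, 1), date(2020, 1, 1), date(2020, 12, 31)))
# (60, 305)
```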
add_in_observation¶
add_in_observation(
x: CdmTable,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
window: Window = (0, 0),
complete_interval: bool = False,
name_style: str = "in_observation",
) -> CdmTable
Add an in_observation flag (1/0) for each time window.
Checks whether the index date (± window) falls within a person's observation period.
Parameters¶
x
Input CDM table.
cdm
CDM reference. If None, uses x.cdm.
index_date
Column to check.
window
Time window relative to index date.
complete_interval
If True, requires the observation period to completely cover
the window. If False, any overlap suffices.
name_style
Output column name (or template with {window_name}).
Returns¶
CdmTable Input table with the in-observation flag(s) added.
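The difference between the two `complete_interval` settings can be made concrete for one row and one observation period. The helper below is hypothetical and, as a simplifying assumption, only handles finite window bounds.

```python
from datetime import date, timedelta


def in_observation_flag(
    index: date,
    obs_start: date,
    obs_end: date,
    window=(0, 0),
    complete_interval=False,
) -> int:
    """Return 1/0 for one row and one window (hypothetical helper)."""
    lower, upper = window
    # Assumption: both bounds are finite numbers of days.
    win_start = index + timedelta(days=lower)
    win_end = index + timedelta(days=upper)
    if complete_interval:
        # The observation period must cover the whole window.
        covered = obs_start <= win_start and win_end <= obs_end
    else:
        # Any overlap between window and observation period suffices.
        covered = win_start <= obs_end and obs_start <= win_end
    return int(covered)


index = date(2020, 1, 10)
period = (date(2020, 1, 1), date(2020, 1, 20))
print(in_observation_flag(index, *period, window=(-30, 0)))  # 1
print(in_observation_flag(index, *period, window=(-30, 0), complete_interval=True))  # 0
```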
Cohort Intersections¶
add_cohort_intersect_flag¶
add_cohort_intersect_flag(
x: CdmTable,
target_cohort_table: str | CohortTable,
cdm: CdmReference | None = None,
*,
target_cohort_id: list[int] | None = None,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
target_start_date: str = "cohort_start_date",
target_end_date: str = "cohort_end_date",
window: Window | list[Window] = (0, float("inf")),
name_style: str = "{cohort_name}_{window_name}",
) -> CdmTable
Add a binary flag (0/1) per cohort and time window.
Parameters¶
x
Input CDM table.
target_cohort_table
Name of a cohort table in the CDM, or a CohortTable directly.
cdm
CDM reference.
target_cohort_id
Subset of cohort IDs. None = all.
index_date
Reference date column in x.
censor_date
Optional censoring column.
target_start_date
Start date column in cohort table.
target_end_date
End date column in cohort table.
window
Time window(s).
name_style
Column naming template with {cohort_name} and {window_name}.
Returns¶
CdmTable Input table with flag columns added.
add_cohort_intersect_count¶
add_cohort_intersect_count(
x: CdmTable,
target_cohort_table: str | CohortTable,
cdm: CdmReference | None = None,
*,
target_cohort_id: list[int] | None = None,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
target_start_date: str = "cohort_start_date",
target_end_date: str = "cohort_end_date",
window: Window | list[Window] = (0, float("inf")),
name_style: str = "{cohort_name}_{window_name}",
) -> CdmTable
Add event count per cohort and time window.
Parameters¶
x
Input CDM table.
target_cohort_table
Cohort table name or CohortTable.
cdm
CDM reference.
target_cohort_id
Subset of cohort IDs.
index_date, censor_date
Reference and censoring dates.
target_start_date, target_end_date
Date columns in cohort table.
window
Time window(s).
name_style
Column naming template.
Returns¶
CdmTable Input table with count columns added.
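The flag and count variants can be thought of as the same interval-overlap test, with the flag being "count greater than zero". The sketch below assumes that a target cohort entry counts when its [start, end] interval overlaps the window around the index date; `intersect_count` and `intersect_flag` are hypothetical helpers and the library's exact semantics (including censoring) may differ.

```python
from datetime import date


def intersect_count(index: date, events, window=(0, float("inf"))) -> int:
    """Count (start, end) intervals overlapping the window (hypothetical)."""
    lower, upper = window
    count = 0
    for start, end in events:
        # Express the interval as day offsets relative to the index date,
        # which lets infinite window bounds be compared directly.
        start_offset = (start - index).days
        end_offset = (end - index).days
        if start_offset <= upper and end_offset >= lower:
            count += 1
    return count


def intersect_flag(index: date, events, window=(0, float("inf"))) -> int:
    """Binary 0/1 version of intersect_count (hypothetical)."""
    return int(intersect_count(index, events, window) > 0)


events = [
    (date(2019, 12, 1), date(2019, 12, 15)),
    (date(2019, 12, 20), date(2020, 1, 5)),
    (date(2020, 2, 1), date(2020, 2, 10)),
]
print(intersect_count(date(2020, 1, 1), events))                 # 2
print(intersect_count(date(2020, 1, 1), events, window=(0, 0)))  # 1
```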
add_cohort_intersect_date¶
add_cohort_intersect_date(
x: CdmTable,
target_cohort_table: str | CohortTable,
cdm: CdmReference | None = None,
*,
target_cohort_id: list[int] | None = None,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
target_date: str = "cohort_start_date",
order: Literal["first", "last"] = "first",
window: Window | list[Window] = (0, float("inf")),
name_style: str = "{cohort_name}_{window_name}",
) -> CdmTable
Add the date of the first/last cohort event per cohort and window.
Parameters¶
x
Input CDM table.
target_cohort_table
Cohort table name or CohortTable.
cdm
CDM reference.
target_cohort_id
Subset of cohort IDs.
index_date, censor_date
Reference and censoring dates.
target_date
Date column in cohort table (point-in-time).
order
"first" or "last" event.
window
Time window(s).
name_style
Column naming template.
Returns¶
CdmTable Input table with date columns added.
add_cohort_intersect_days¶
add_cohort_intersect_days(
x: CdmTable,
target_cohort_table: str | CohortTable,
cdm: CdmReference | None = None,
*,
target_cohort_id: list[int] | None = None,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
target_date: str = "cohort_start_date",
order: Literal["first", "last"] = "first",
window: Window | list[Window] = (0, float("inf")),
name_style: str = "{cohort_name}_{window_name}",
) -> CdmTable
Add days from index to first/last cohort event per cohort and window.
Parameters¶
x
Input CDM table.
target_cohort_table
Cohort table name or CohortTable.
cdm
CDM reference.
target_cohort_id
Subset of cohort IDs.
index_date, censor_date
Reference and censoring dates.
target_date
Date column in cohort table.
order
"first" or "last" event.
window
Time window(s).
name_style
Column naming template.
Returns¶
CdmTable Input table with days columns added.
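The date and days variants both select one point-in-time event per window and differ only in what they report. A minimal sketch for a single index date follows; `intersect_date` and `intersect_days` are hypothetical helpers, and returning `None` when no event falls in the window is an assumption.

```python
from datetime import date


def intersect_date(index: date, target_dates, window=(0, float("inf")), order="first"):
    """Pick the first/last target date inside the window (hypothetical)."""
    lower, upper = window
    in_window = [d for d in target_dates if lower <= (d - index).days <= upper]
    if not in_window:
        # Assumption: no qualifying event yields a missing value.
        return None
    return min(in_window) if order == "first" else max(in_window)


def intersect_days(index: date, target_dates, window=(0, float("inf")), order="first"):
    """Days from index to the selected event (hypothetical)."""
    selected = intersect_date(index, target_dates, window, order)
    return None if selected is None else (selected - index).days


dates = [date(2019, 6, 1), date(2020, 2, 1), date(2020, 3, 1)]
print(intersect_date(date(2020, 1, 1), dates))                # 2020-02-01
print(intersect_days(date(2020, 1, 1), dates, order="last"))  # 60
```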
add_cohort_intersect_field¶
add_cohort_intersect_field(
x: CdmTable,
target_cohort_table: str | CohortTable,
field: str,
cdm: CdmReference | None = None,
*,
target_cohort_id: list[int] | None = None,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
target_date: str = "cohort_start_date",
order: Literal["first", "last"] = "first",
window: Window | list[Window] = (0, float("inf")),
name_style: str = "{cohort_name}_{field}_{window_name}",
) -> CdmTable
Add a field value from the first/last cohort event.
Parameters¶
x
Input CDM table.
target_cohort_table
Cohort table name or CohortTable.
field
Column name in cohort table to extract.
cdm
CDM reference.
target_cohort_id
Subset of cohort IDs.
index_date, censor_date
Reference and censoring dates.
target_date
Date column in cohort table.
order
"first" or "last" event.
window
Time window(s).
name_style
Column naming template with {field}.
Returns¶
CdmTable Input table with field columns added.
Concept Intersections¶
add_concept_intersect_flag¶
add_concept_intersect_flag(
x: CdmTable,
concept_set: Codelist | dict[str, list[int]],
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
in_observation: bool = True,
name_style: str = "{concept_name}_{window_name}",
) -> CdmTable
Add a binary flag (0/1) per concept set and time window.
Parameters¶
x
Input CDM table.
concept_set
Mapping of concept set names to lists of concept IDs.
cdm
CDM reference. If None, uses x.cdm.
index_date
Column in x containing the reference date.
censor_date
Optional censoring column.
window
Time window(s) relative to index date.
in_observation
Restrict to events within observation period.
name_style
Column naming template. {concept_name} and {window_name}
are replaced.
Returns¶
CdmTable Input table with flag columns added.
add_concept_intersect_count¶
add_concept_intersect_count(
x: CdmTable,
concept_set: Codelist | dict[str, list[int]],
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
in_observation: bool = True,
name_style: str = "{concept_name}_{window_name}",
) -> CdmTable
Add event count per concept set and time window.
Parameters¶
x
Input CDM table.
concept_set
Mapping of concept set names to lists of concept IDs.
cdm
CDM reference.
index_date
Reference date column in x.
censor_date
Optional censoring column.
window
Time window(s).
in_observation
Restrict to observation period.
name_style
Column naming template.
Returns¶
CdmTable Input table with count columns added.
add_concept_intersect_date¶
add_concept_intersect_date(
x: CdmTable,
concept_set: Codelist | dict[str, list[int]],
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
in_observation: bool = True,
order: Literal["first", "last"] = "first",
name_style: str = "{concept_name}_{window_name}",
) -> CdmTable
Add the date of the first/last event per concept set and window.
Parameters¶
x
Input CDM table.
concept_set
Mapping of concept set names to lists of concept IDs.
cdm
CDM reference.
index_date
Reference date column.
censor_date
Optional censoring column.
window
Time window(s).
in_observation
Restrict to observation period.
order
"first" or "last" event.
name_style
Column naming template.
Returns¶
CdmTable Input table with date columns added.
add_concept_intersect_days¶
add_concept_intersect_days(
x: CdmTable,
concept_set: Codelist | dict[str, list[int]],
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
in_observation: bool = True,
order: Literal["first", "last"] = "first",
name_style: str = "{concept_name}_{window_name}",
) -> CdmTable
Add days from index to first/last event per concept set and window.
Parameters¶
x
Input CDM table.
concept_set
Mapping of concept set names to lists of concept IDs.
cdm
CDM reference.
index_date
Reference date column.
censor_date
Optional censoring column.
window
Time window(s).
in_observation
Restrict to observation period.
order
"first" or "last" event.
name_style
Column naming template.
Returns¶
CdmTable Input table with days columns added.
add_concept_intersect_field¶
add_concept_intersect_field(
x: CdmTable,
concept_set: Codelist | dict[str, list[int]],
field: str,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
in_observation: bool = True,
order: Literal["first", "last"] = "first",
name_style: str = "{concept_name}_{field}_{window_name}",
) -> CdmTable
Add a field value from the first/last event per concept set.
Note: Field extraction for concept intersects is limited because events
come from different domain tables. The field must be a column
present in the unified overlap table (currently only the
canonical columns are available).
Parameters¶
x
Input CDM table.
concept_set
Mapping of concept set names to lists of concept IDs.
field
Column name to extract.
cdm
CDM reference.
index_date
Reference date column.
censor_date
Optional censoring column.
window
Time window(s).
in_observation
Restrict to observation period.
order
"first" or "last" event.
name_style
Column naming template with {field}.
Returns¶
CdmTable Input table with field columns added.
Table Intersections¶
add_table_intersect_flag¶
add_table_intersect_flag(
x: CdmTable,
table_name: str,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
target_start_date: str | None = None,
target_end_date: str | None = None,
in_observation: bool = True,
name_style: str = "{table_name}_{window_name}",
) -> CdmTable
Add a binary flag (0/1) for events in an OMOP table.
Parameters¶
x
Input CDM table.
table_name
Name of the OMOP table to intersect with.
cdm
CDM reference. If None, uses x.cdm.
index_date
Column in x containing the reference date.
censor_date
Optional column in x for censoring events.
window
Time window(s) relative to index date.
target_start_date
Start date column in target. Auto-detected if None.
target_end_date
End date column in target. Auto-detected if None.
in_observation
Restrict to events within observation period.
name_style
Column naming template. Use {table_name} and {window_name}.
Returns¶
CdmTable Input table with flag column(s) added.
add_table_intersect_count¶
add_table_intersect_count(
x: CdmTable,
table_name: str,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
target_start_date: str | None = None,
target_end_date: str | None = None,
in_observation: bool = True,
name_style: str = "{table_name}_{window_name}",
) -> CdmTable
Add event count from an OMOP table.
Parameters¶
x
Input CDM table.
table_name
Name of the OMOP table to intersect with.
cdm
CDM reference.
index_date
Reference date column in x.
censor_date
Optional censoring column.
window
Time window(s).
target_start_date
Start date in target. Auto-detected if None.
target_end_date
End date in target. Auto-detected if None.
in_observation
Restrict to observation period.
name_style
Column naming template.
Returns¶
CdmTable Input table with count column(s) added.
add_table_intersect_date¶
add_table_intersect_date(
x: CdmTable,
table_name: str,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
target_date: str | None = None,
in_observation: bool = True,
order: Literal["first", "last"] = "first",
name_style: str = "{table_name}_{window_name}",
) -> CdmTable
Add the date of the first/last event from an OMOP table.
Parameters¶
x
Input CDM table.
table_name
Name of the OMOP table.
cdm
CDM reference.
index_date
Reference date column.
censor_date
Optional censoring column.
window
Time window(s).
target_date
Date column in target. Auto-detected if None.
in_observation
Restrict to observation period.
order
"first" or "last" event.
name_style
Column naming template.
Returns¶
CdmTable Input table with date column(s) added.
add_table_intersect_days¶
add_table_intersect_days(
x: CdmTable,
table_name: str,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
target_date: str | None = None,
in_observation: bool = True,
order: Literal["first", "last"] = "first",
name_style: str = "{table_name}_{window_name}",
) -> CdmTable
Add days from index to first/last event in an OMOP table.
Parameters¶
x
Input CDM table.
table_name
Name of the OMOP table.
cdm
CDM reference.
index_date
Reference date column.
censor_date
Optional censoring column.
window
Time window(s).
target_date
Date column in target. Auto-detected if None.
in_observation
Restrict to observation period.
order
"first" or "last" event.
name_style
Column naming template.
Returns¶
CdmTable Input table with days column(s) added.
add_table_intersect_field¶
add_table_intersect_field(
x: CdmTable,
table_name: str,
field: str,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
censor_date: str | None = None,
window: Window | list[Window] = (0, float("inf")),
target_date: str | None = None,
in_observation: bool = True,
order: Literal["first", "last"] = "first",
name_style: str = "{table_name}_{field}_{window_name}",
) -> CdmTable
Add a field value from the first/last event in an OMOP table.
Parameters¶
x
Input CDM table.
table_name
Name of the OMOP table.
field
Column name in the target table to extract.
cdm
CDM reference.
index_date
Reference date column.
censor_date
Optional censoring column.
window
Time window(s).
target_date
Date column in target. Auto-detected if None.
in_observation
Restrict to observation period.
order
"first" or "last" event.
name_style
Column naming template. Use {field} placeholder.
Returns¶
CdmTable Input table with field column(s) added.
Death¶
add_death_flag¶
add_death_date¶
add_death_days¶
Categories¶
add_categories¶
add_categories(
x: CdmTable,
variable: str,
categories: dict[str, list[tuple[float, float]]]
| dict[str, dict[str, tuple[float, float]]],
*,
missing_category_value: str = "None",
overlap: bool = False,
) -> CdmTable
Add categorical columns by binning a numeric variable.
Parameters¶
x
Input CDM table.
variable
Column name to categorize.
categories
Mapping of output column name to category definitions. Each category definition is either:
- A list of (lower, upper) tuples (auto-labelled).
- A dict mapping label to (lower, upper) tuple.
missing_category_value
Value for rows where the variable is NULL.
overlap
If True, allow overlapping ranges.
Returns¶
CdmTable Input table with new categorical columns.
Examples¶
>>> add_categories(
...     x, "age",
...     {"age_group": {
...         "young": (0, 17),
...         "adult": (18, 64),
...         "senior": (65, float("inf")),
...     }},
... )
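The binning itself can be sketched for a single value using the labelled-dict form of a category definition. `categorize` is a hypothetical helper; treating both bounds as inclusive and mapping out-of-range values to the missing value are assumptions, not documented behaviour.

```python
def categorize(value, groups, missing_category_value="None"):
    """Return the label whose range contains value (hypothetical helper)."""
    if value is None:
        return missing_category_value
    for label, (lower, upper) in groups.items():
        # Assumption: both bounds are inclusive.
        if lower <= value <= upper:
            return label
    # Assumption: values outside every range are treated as missing.
    return missing_category_value


age_groups = {
    "young": (0, 17),
    "adult": (18, 64),
    "senior": (65, float("inf")),
}
print([categorize(v, age_groups) for v in (17, 18, 70, None)])
# ['young', 'adult', 'senior', 'None']
```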
Utilities¶
add_cdm_name¶
add_cohort_name¶
add_concept_name¶
add_concept_name(
x: CdmTable,
cdm: CdmReference | None = None,
*,
column: str | None = None,
name_style: str = "{column}_name",
) -> CdmTable
Add concept name(s) by looking up concept IDs in the concept table.
Parameters¶
x
Input CDM table.
cdm
CDM reference.
column
Column containing concept IDs. If None, auto-detects columns
ending in _concept_id.
name_style
Template for new column names with {column} placeholder.
Returns¶
CdmTable Input table with concept name column(s) added.
filter_cohort_id¶
filter_in_observation¶
filter_in_observation(
x: CdmTable,
cdm: CdmReference | None = None,
*,
index_date: str = "cohort_start_date",
) -> CdmTable
Filter to rows where the index date is within an observation period.
INNER JOINs with observation_period and filters to rows where
obs_start <= index_date <= obs_end.
Parameters¶
x
Input CDM table.
cdm
CDM reference.
index_date
Column to check against observation periods.
Returns¶
CdmTable Filtered table (only rows within observation).
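The join-and-filter described above can be reproduced with an in-memory SQLite database. This is a sketch of the idea only: the cohort table and its rows are made up for illustration, dates are stored as ISO strings so that they compare correctly as text, and the SQL that omopy actually generates may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE cohort (subject_id INTEGER, cohort_start_date TEXT);
    CREATE TABLE observation_period (
        person_id INTEGER,
        observation_period_start_date TEXT,
        observation_period_end_date TEXT
    );
    -- Illustrative data: person 2 has no observation period at all.
    INSERT INTO cohort VALUES
        (1, '2020-03-01'), (1, '2022-01-01'), (2, '2020-05-05');
    INSERT INTO observation_period VALUES (1, '2019-01-01', '2020-12-31');
    """
)

# Keep only cohort rows whose index date lies inside an observation period.
rows = con.execute(
    """
    SELECT c.subject_id, c.cohort_start_date
    FROM cohort AS c
    INNER JOIN observation_period AS op
        ON op.person_id = c.subject_id
    WHERE c.cohort_start_date BETWEEN op.observation_period_start_date
                                  AND op.observation_period_end_date
    ORDER BY c.subject_id, c.cohort_start_date
    """
).fetchall()

print(rows)  # [(1, '2020-03-01')]
```

Because the join is an inner join, rows for people without any observation period are dropped along with rows whose index date falls outside every period.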
Column Helpers¶
start_date_column¶
Return the canonical start-date column name for an OMOP table.
Parameters¶
table_name
CDM table name (e.g. "condition_occurrence").
Returns¶
str
Column name (e.g. "condition_start_date").
For non-OMOP tables, returns "cohort_start_date".
Examples¶
>>> start_date_column("drug_exposure")
'drug_exposure_start_date'
>>> start_date_column("my_cohort")
'cohort_start_date'
end_date_column¶
Return the canonical end-date column name for an OMOP table.
Parameters¶
table_name
CDM table name (e.g. "condition_occurrence").
Returns¶
str
Column name (e.g. "condition_end_date").
For non-OMOP tables, returns "cohort_end_date".
Examples¶
>>> end_date_column("drug_exposure")
'drug_exposure_end_date'
>>> end_date_column("my_cohort")
'cohort_end_date'
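One way to picture the start and end column helpers is as a lookup table with a cohort fallback. The sketch below covers only a handful of OMOP tables, and the `lookup_*` functions are hypothetical stand-ins; how the library stores or derives these names is not stated here.

```python
# Partial, illustrative mapping of OMOP table names to their
# (start date, end date) columns.
_DATE_COLUMNS = {
    "condition_occurrence": ("condition_start_date", "condition_end_date"),
    "drug_exposure": ("drug_exposure_start_date", "drug_exposure_end_date"),
    "visit_occurrence": ("visit_start_date", "visit_end_date"),
    "observation_period": (
        "observation_period_start_date",
        "observation_period_end_date",
    ),
}
# Fallback used for tables not in the mapping, such as cohort tables.
_COHORT_COLUMNS = ("cohort_start_date", "cohort_end_date")


def lookup_start_date_column(table_name: str) -> str:
    """Start-date column for a table (hypothetical helper)."""
    return _DATE_COLUMNS.get(table_name, _COHORT_COLUMNS)[0]


def lookup_end_date_column(table_name: str) -> str:
    """End-date column for a table (hypothetical helper)."""
    return _DATE_COLUMNS.get(table_name, _COHORT_COLUMNS)[1]


print(lookup_start_date_column("drug_exposure"))  # drug_exposure_start_date
print(lookup_end_date_column("my_cohort"))        # cohort_end_date
```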
person_id_column¶
Detect the person identifier column from a table's column list.
Checks for "person_id" first, then "subject_id" (used in
cohort tables). Raises if neither is found.
Parameters¶
table_columns
Column names of the table.
Returns¶
str
"person_id" or "subject_id".
Raises¶
ValueError If neither column is found.
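The lookup order described above can be sketched as follows. `detect_person_id_column` is a hypothetical stand-in written for illustration, and the error message text is made up.

```python
def detect_person_id_column(table_columns) -> str:
    """Return 'person_id' or 'subject_id' (hypothetical helper)."""
    columns = set(table_columns)
    # "person_id" takes precedence; "subject_id" is the cohort-table name.
    for candidate in ("person_id", "subject_id"):
        if candidate in columns:
            return candidate
    raise ValueError("no person identifier column found")


print(detect_person_id_column(["subject_id", "cohort_start_date"]))  # subject_id
print(detect_person_id_column(["person_id", "subject_id"]))          # person_id
```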
standard_concept_id_column¶
source_concept_id_column¶
Windows¶
Window¶
validate_windows¶
Validate and normalise window specifications.
Accepts a single window tuple or a list of windows. Returns a list
of validated (lower, upper) tuples.
Parameters¶
windows
A single (lower, upper) tuple or a list of them.
Returns¶
list[Window] Validated window list.
Raises¶
ValueError
If any window has lower > upper.
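The normalisation and the lower/upper check can be sketched in a few lines. This assumes a window is a plain `(lower, upper)` tuple of numbers, which is not confirmed here; `normalise_windows` is a hypothetical stand-in, not the library function.

```python
# Assumption: a window is a (lower, upper) tuple of day offsets.
Window = tuple[float, float]


def normalise_windows(windows) -> list[Window]:
    """Wrap a single window in a list and validate bounds (hypothetical)."""
    is_single = (
        isinstance(windows, tuple)
        and len(windows) == 2
        and all(isinstance(bound, (int, float)) for bound in windows)
    )
    if is_single:
        windows = [windows]
    validated = []
    for lower, upper in windows:
        if lower > upper:
            raise ValueError(f"window lower bound {lower} exceeds upper bound {upper}")
        validated.append((lower, upper))
    return validated


print(normalise_windows((0, 30)))                # [(0, 30)]
print(normalise_windows([(-365, -1), (0, 30)]))  # [(-365, -1), (0, 30)]
```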
window_name¶
Generate a standardised name for a time window.
Follows the R convention: negative values prefixed with m,
infinity as inf.
Parameters¶
window
A (lower, upper) pair.
Returns¶
str
E.g. "0_to_inf", "m365_to_m1".
Examples¶
>>> window_name((0, float('inf')))
'0_to_inf'
>>> window_name((-365, -1))
'm365_to_m1'
>>> window_name((float('-inf'), float('inf')))
'minf_to_inf'
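The naming convention is simple enough to restate as code. The sketch below reproduces the documented examples; `name_window` is a hypothetical stand-in, and it assumes finite bounds are whole numbers of days.

```python
import math


def _bound_label(value: float) -> str:
    """Label one bound: 'inf' for infinity, 'm' prefix for negatives."""
    # Assumption: finite bounds are whole numbers of days.
    text = "inf" if math.isinf(value) else str(int(abs(value)))
    return f"m{text}" if value < 0 else text


def name_window(window) -> str:
    """Build a '<lower>_to_<upper>' name (hypothetical helper)."""
    lower, upper = window
    return f"{_bound_label(lower)}_to_{_bound_label(upper)}"


print(name_window((0, float("inf"))))              # 0_to_inf
print(name_window((-365, -1)))                     # m365_to_m1
print(name_window((float("-inf"), float("inf"))))  # minf_to_inf
```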
format_name_style¶
Format a name-style template with the given replacements.
Templates use {placeholder} syntax. The result is converted to
lowercase snake_case.
Parameters¶
template
A string like "{cohort_name}_{window_name}".
**replacements
Keyword arguments for each placeholder.
Returns¶
str Formatted, snake_case column name.
Examples¶
>>> format_name_style("{cohort_name}_{window_name}",
...     cohort_name="My Cohort", window_name="0_to_inf")
'my_cohort_0_to_inf'
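A simplified version of the template formatting can be written with `str.format` and a regular expression. `format_style` is a hypothetical stand-in: it replaces runs of non-alphanumeric characters with underscores and lowercases the result, which reproduces the documented example but may not match the library's snake_case conversion in every case (for instance, it does not split camelCase words).

```python
import re


def format_style(template: str, **replacements: str) -> str:
    """Fill a name-style template and normalise it (hypothetical helper)."""
    name = template.format(**replacements)
    # Collapse any run of non-alphanumeric characters into one underscore,
    # then trim stray underscores at either end.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")
    return name.lower()


print(
    format_style(
        "{cohort_name}_{window_name}",
        cohort_name="My Cohort",
        window_name="0_to_inf",
    )
)  # my_cohort_0_to_inf
```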