omopy.codelist

Vocabulary-based codelist generation and analysis: search the OMOP vocabulary, traverse concept hierarchies, and build and manipulate codelists for phenotyping.

This module is the Python equivalent of the R CodelistGenerator package.

get_candidate_codes

get_candidate_codes(
    cdm: CdmReference,
    keywords: str | list[str],
    *,
    search_synonyms: bool = False,
    exclude: str | list[str] | None = None,
    domains: str | list[str] | None = None,
    standard_concept: str | list[str] | None = None,
    vocabulary_id: str | list[str] | None = None,
    concept_class_id: str | list[str] | None = None,
    include_descendants: bool = False,
    name: str | None = None,
) -> Codelist

Search for concepts by keyword matching on concept_name.

Parameters

cdm
    CDM reference with access to concept (and optionally concept_synonym).
keywords
    One or more keyword strings. Each is matched using SQL LIKE (case-insensitive, surrounded by %).
search_synonyms
    If True, also search concept_synonym.concept_synonym_name.
exclude
    Keywords to exclude (concepts matching these are removed).
domains
    Restrict to specific domain_id(s) (e.g. "Condition").
standard_concept
    Filter by standard_concept value(s) ("S", "C", etc.). Defaults to None (no filter). Pass "S" for standard only.
vocabulary_id
    Restrict to specific vocabulary_id(s).
concept_class_id
    Restrict to specific concept_class_id(s).
include_descendants
    If True, include descendants of matching concepts via concept_ancestor.
name
    Name for the codelist entry. Defaults to the first keyword.

Returns

Codelist
    A codelist with one entry containing matching concept IDs.
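
The matching rule for keywords and exclude can be tried out without a CDM. The following is a minimal sketch, not omopy's implementation: it uses Python's built-in sqlite3 module and a made-up three-row concept table to show how each keyword becomes a %keyword% LIKE pattern and how exclude terms remove matches. The table contents, the concept IDs, and the helper name like_search are hypothetical.

```python
import sqlite3

# Hypothetical miniature concept table; IDs and names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER, concept_name TEXT)")
conn.executemany(
    "INSERT INTO concept VALUES (?, ?)",
    [
        (1, "Type 2 diabetes mellitus"),
        (2, "Gestational Diabetes"),
        (3, "Essential hypertension"),
    ],
)


def like_search(keywords, exclude=()):
    """Return sorted concept IDs matching any keyword and no exclude term."""

    def ids_for(term):
        # SQLite's LIKE is case-insensitive for ASCII text by default.
        rows = conn.execute(
            "SELECT concept_id FROM concept WHERE concept_name LIKE ?",
            (f"%{term}%",),
        )
        return {row[0] for row in rows}

    matched = set().union(*(ids_for(k) for k in keywords))
    removed = set().union(*(ids_for(e) for e in exclude))
    return sorted(matched - removed)


all_diabetes = like_search(["diabetes"])
no_gestational = like_search(["diabetes"], exclude=["gestational"])
```

In this sketch all_diabetes contains both diabetes concepts regardless of capitalisation, and no_gestational drops the one whose name also matches the exclude term.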

get_mappings

get_mappings(
    cdm: CdmReference,
    codelist: Codelist,
    *,
    relationship_id: str | list[str] = "Maps to",
    name_style: str = "{concept_set_name}",
) -> Codelist

Get mapped concepts via concept_relationship.

For each concept set in the codelist, finds concepts linked via the specified relationship(s) in concept_relationship.

Parameters

cdm
    CDM reference.
codelist
    Input codelist with concept IDs to find mappings for.
relationship_id
    Relationship type(s) to follow (e.g. "Maps to").
name_style
    Naming template. {concept_set_name} is replaced with the original concept set name.

Returns

Codelist
    New codelist with mapped concept IDs.
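
As an illustration of how relationship_id and name_style interact, here is a small sketch over plain Python data. It is not the library's code: the codelist is modelled as a dict of names to ID lists, the relationship rows are invented, and the direction of the lookup (input concept in concept_id_1, result in concept_id_2) is an assumption.

```python
# Hypothetical concept_relationship rows: (concept_id_1, concept_id_2, relationship_id).
CONCEPT_RELATIONSHIP = [
    (101, 201, "Maps to"),
    (102, 201, "Maps to"),
    (103, 202, "Maps to"),
    (101, 301, "Is a"),
]


def map_codelist(codelist, relationship_id="Maps to", name_style="{concept_set_name}"):
    """Follow relationships from each concept set and rename it via name_style."""
    if isinstance(relationship_id, str):
        wanted = {relationship_id}
    else:
        wanted = set(relationship_id)

    result = {}
    for set_name, concept_ids in codelist.items():
        source_ids = set(concept_ids)
        # Assumption: the input concept is concept_id_1, the mapped concept is concept_id_2.
        targets = {
            target
            for source, target, rel in CONCEPT_RELATIONSHIP
            if source in source_ids and rel in wanted
        }
        new_name = name_style.format(concept_set_name=set_name)
        result[new_name] = sorted(targets)
    return result


mapped = map_codelist({"asthma": [101, 102]}, name_style="{concept_set_name}_mapped")
```

Here both input concepts map to the same target, so the resulting "asthma_mapped" entry holds a single ID.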

Hierarchy Traversal

get_descendants

get_descendants(
    cdm: CdmReference,
    concept_id: Codelist | int | list[int],
    *,
    include_self: bool = True,
    name: str | None = None,
) -> Codelist

Get all descendant concepts via the concept_ancestor table.

Parameters

cdm
    CDM reference with access to concept_ancestor and concept.
concept_id
    One or more ancestor concept IDs, or a Codelist whose concept IDs will be extracted automatically.
include_self
    If True, include the input concept(s) themselves.
name
    Name for the resulting codelist entry. Defaults to "descendants_{id}".

Returns

Codelist
    A codelist mapping name to descendant concept IDs (standard only).
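
The lookup can be pictured as a filter on concept_ancestor rows. The sketch below is a simplified stand-in, not omopy's implementation: the rows are invented, the helper name descendants_of is hypothetical, and the "standard only" filter against the concept table is left out.

```python
# Hypothetical concept_ancestor rows: (ancestor_concept_id, descendant_concept_id).
# The OMOP table stores the full closure, so indirect descendants appear as rows too,
# and every concept is listed as its own ancestor.
CONCEPT_ANCESTOR = [
    (10, 10), (10, 11), (10, 12),
    (11, 11), (11, 12),
    (12, 12),
    (20, 20),
]


def descendants_of(concept_id, include_self=True):
    """Return sorted descendant IDs for one concept ID or a list of them."""
    ids = {concept_id} if isinstance(concept_id, int) else set(concept_id)
    found = {desc for anc, desc in CONCEPT_ANCESTOR if anc in ids}
    # Assumption: include_self=False removes every input ID from the result.
    return sorted((found | ids) if include_self else (found - ids))


with_self = descendants_of(10)
without_self = descendants_of(10, include_self=False)
```

Swapping the roles of the two columns in the comprehension gives the corresponding ancestor lookup.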

get_ancestors

get_ancestors(
    cdm: CdmReference,
    concept_id: Codelist | int | list[int],
    *,
    include_self: bool = True,
    name: str | None = None,
) -> Codelist

Get all ancestor concepts via the concept_ancestor table.

Parameters

cdm
    CDM reference with access to concept_ancestor and concept.
concept_id
    One or more descendant concept IDs, or a Codelist whose concept IDs will be extracted automatically.
include_self
    If True, include the input concept(s) themselves.
name
    Name for the resulting codelist entry.

Returns

Codelist
    A codelist mapping name to ancestor concept IDs (standard only).

Drug & ATC

get_drug_ingredient_codes

get_drug_ingredient_codes(
    cdm: CdmReference,
    ingredient: str | list[str] | int | list[int] | None = None,
    *,
    name: str | None = None,
) -> Codelist

Get drug concepts linked to specified ingredients.

If ingredient is a string, searches by keyword in concept_name where concept_class_id = 'Ingredient'. If an integer, uses concept_id directly.

Parameters

cdm
    CDM reference.
ingredient
    Ingredient name(s) or concept ID(s). If None, returns all standard ingredient concepts.
name
    Name for the codelist entry.

Returns

Codelist
    Codelist of drug ingredient concept IDs.

get_atc_codes

get_atc_codes(
    cdm: CdmReference,
    atc_name: str | None = None,
    *,
    level: str | list[str] | None = None,
    name: str | None = None,
) -> Codelist

Get ATC (Anatomical Therapeutic Chemical) codes.

Finds ATC concepts in the vocabulary and optionally their linked RxNorm concepts via concept_relationship.

Parameters

cdm
    CDM reference.
atc_name
    Keyword to search ATC concept names. If None, returns all.
level
    ATC concept class(es) to filter to (e.g. "ATC 1st", "ATC 2nd", "ATC 3rd", "ATC 4th", "ATC 5th").
name
    Name for the codelist entry.

Returns

Codelist
    Codelist of ATC concept IDs.

Set Operations

union_codelists

union_codelists(*codelists: Codelist) -> Codelist

Union multiple codelists, merging concept IDs per name.

Concept sets with the same name across different codelists are merged (set union). Concept sets with distinct names are preserved as-is.

Parameters

*codelists
    One or more Codelist objects.

Returns

Codelist
    Merged codelist.
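
The merge rule can be shown with plain dicts standing in for Codelist objects. This is a sketch of the described behaviour under that assumption, not the library's code; union_sketch is a hypothetical name.

```python
def union_sketch(*codelists):
    """Merge dict-based codelists: same-named sets are unioned, others kept as-is."""
    merged = {}
    for codelist in codelists:
        for name, concept_ids in codelist.items():
            merged.setdefault(name, set()).update(concept_ids)
    return {name: sorted(ids) for name, ids in merged.items()}


a = {"diabetes": [1, 2], "asthma": [7]}
b = {"diabetes": [2, 3], "copd": [9]}
merged = union_sketch(a, b)
```

Only "diabetes" exists in both inputs, so it is the only entry whose IDs are combined; "asthma" and "copd" pass through unchanged.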

intersect_codelists

intersect_codelists(*codelists: Codelist) -> Codelist

Intersect multiple codelists, keeping only shared concept IDs.

For each concept set name present in ALL input codelists, returns only the concept IDs that appear in every codelist's version of that concept set.

Parameters

*codelists
    Two or more Codelist objects.

Returns

Codelist
    Codelist with intersected concept sets.
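
A small sketch of the two-level intersection described above, again with plain dicts standing in for Codelist objects (an assumption) and a hypothetical helper name:

```python
def intersect_sketch(*codelists):
    """Keep names present in every input, and for each only the IDs shared by all."""
    if not codelists:
        return {}
    first, rest = codelists[0], codelists[1:]
    # Names must appear in ALL inputs to survive.
    shared_names = set(first).intersection(*rest)
    result = {}
    for name in sorted(shared_names):
        ids = set(first[name])
        for other in rest:
            ids &= set(other[name])
        result[name] = sorted(ids)
    return result


a = {"diabetes": [1, 2, 3], "asthma": [7]}
b = {"diabetes": [2, 3, 4], "copd": [9]}
shared = intersect_sketch(a, b)
```

"asthma" and "copd" each appear in only one input, so they are dropped entirely rather than returned empty.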

compare_codelists

compare_codelists(
    codelist_a: Codelist, codelist_b: Codelist
) -> dict[str, dict[str, list[int]]]

Compare two codelists element-by-element.

For each concept set name present in both codelists, computes:

- only_a: concept IDs only in codelist_a
- only_b: concept IDs only in codelist_b
- both: concept IDs in both

Parameters

codelist_a, codelist_b
    Two codelists to compare.

Returns

dict[str, dict[str, list[int]]]
    Mapping of concept set names to comparison results.
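
To make the shape of the returned dictionary concrete, here is a sketch that builds the same three keys from dict-based codelists. The helper name and the sorted ordering of the ID lists are assumptions, not documented behaviour.

```python
def compare_sketch(codelist_a, codelist_b):
    """Build only_a / only_b / both for every name present in both inputs."""
    report = {}
    for name in sorted(set(codelist_a) & set(codelist_b)):
        ids_a = set(codelist_a[name])
        ids_b = set(codelist_b[name])
        report[name] = {
            "only_a": sorted(ids_a - ids_b),
            "only_b": sorted(ids_b - ids_a),
            "both": sorted(ids_a & ids_b),
        }
    return report


comparison = compare_sketch(
    {"diabetes": [1, 2, 3], "asthma": [7]},
    {"diabetes": [2, 3, 4]},
)
```

"asthma" is absent from the second codelist, so it does not appear in the comparison.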

Subsetting

subset_to_codes_in_use

subset_to_codes_in_use(
    codelist: Codelist, cdm: CdmReference
) -> Codelist

Subset a codelist to only concepts that actually appear in the CDM.

Checks each domain table for the presence of concept IDs.

Parameters

codelist
    Input codelist.
cdm
    CDM reference with access to domain tables.

Returns

Codelist
    Codelist filtered to concepts found in the data.

subset_by_domain

subset_by_domain(
    codelist: Codelist,
    cdm: CdmReference,
    domain_id: str | list[str],
) -> Codelist

Subset a codelist to concepts in specific domain(s).

Parameters

codelist
    Input codelist.
cdm
    CDM reference (for concept table lookup).
domain_id
    Domain(s) to keep (e.g. "Condition", "Drug").

Returns

Codelist
    Codelist filtered to concepts in the specified domain(s).

subset_by_vocabulary

subset_by_vocabulary(
    codelist: Codelist,
    cdm: CdmReference,
    vocabulary_id: str | list[str],
) -> Codelist

Subset a codelist to concepts in specific vocabulary(ies).

Parameters

codelist
    Input codelist.
cdm
    CDM reference.
vocabulary_id
    Vocabulary ID(s) to keep (e.g. "SNOMED", "RxNorm").

Returns

Codelist
    Codelist filtered to concepts in the specified vocabulary(ies).

Stratification

stratify_by_domain

stratify_by_domain(
    codelist: Codelist, cdm: CdmReference
) -> Codelist

Split each concept set by domain_id.

Parameters

codelist
    Input codelist.
cdm
    CDM reference.

Returns

Codelist
    New codelist with entries like "{name}_{domain}".
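
The splitting and naming pattern can be sketched as follows. This is an illustration only: the concept-to-domain lookup is invented, the codelist is a plain dict, and whether the library normalises the domain text (for example, lowercasing it) before building the name is not stated here, so the sketch uses it verbatim.

```python
# Hypothetical lookup of concept_id -> domain_id, standing in for the concept table.
CONCEPT_DOMAIN = {1: "Condition", 2: "Condition", 3: "Observation"}


def stratify_sketch(codelist, key_by_concept):
    """Split each concept set into "{name}_{key}" entries, one per distinct key."""
    result = {}
    for name, concept_ids in codelist.items():
        for concept_id in concept_ids:
            key = key_by_concept[concept_id]
            result.setdefault(f"{name}_{key}", []).append(concept_id)
    return result


stratified = stratify_sketch({"diabetes": [1, 2, 3]}, CONCEPT_DOMAIN)
```

Passing a lookup keyed on vocabulary_id or concept_class_id instead gives the analogous split for the other stratification functions.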

stratify_by_vocabulary

stratify_by_vocabulary(
    codelist: Codelist, cdm: CdmReference
) -> Codelist

Split each concept set by vocabulary_id.

Parameters

codelist
    Input codelist.
cdm
    CDM reference.

Returns

Codelist
    New codelist with entries like "{name}_{vocabulary}".

stratify_by_concept_class

stratify_by_concept_class(
    codelist: Codelist, cdm: CdmReference
) -> Codelist

Split each concept set by concept_class_id.

Parameters

codelist
    Input codelist.
cdm
    CDM reference.

Returns

Codelist
    New codelist with entries like "{name}_{concept_class}".

Diagnostics

summarise_code_use

summarise_code_use(
    codelist: Codelist,
    cdm: CdmReference,
    *,
    count_by: str = "record",
) -> pl.DataFrame

Count usage of codelist concepts across CDM domain tables.

For each concept in the codelist, counts how many records (or distinct persons) reference it in the appropriate domain table.

Parameters

codelist
    Codelist to summarise.
cdm
    CDM reference.
count_by
    "record" for total record count or "person" for distinct person count.

Returns

pl.DataFrame
    DataFrame with columns: concept_set_name, concept_id, concept_name, domain_id, vocabulary_id, count.
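
The difference between the two count_by modes is easiest to see on a few rows. The sketch below uses invented (person_id, concept_id) records and plain Python instead of polars; it illustrates the counting rule only and is not the function's implementation.

```python
from collections import Counter

# Hypothetical domain-table rows: (person_id, concept_id).
RECORDS = [(1, 100), (1, 100), (2, 100), (2, 200), (3, 300)]


def count_use(concept_ids, count_by="record"):
    """Count records, or distinct persons, per concept ID found in RECORDS."""
    if count_by not in ("record", "person"):
        raise ValueError("count_by must be 'record' or 'person'")
    wanted = set(concept_ids)
    rows = [(person, concept) for person, concept in RECORDS if concept in wanted]
    if count_by == "person":
        # Collapse repeated (person, concept) pairs so each person counts once.
        rows = set(rows)
    return dict(Counter(concept for _, concept in rows))


by_record = count_use([100, 200])
by_person = count_use([100, 200], count_by="person")
```

Person 1 has two records for concept 100, so that concept's count is lower in by_person than in by_record.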

summarise_orphan_codes

summarise_orphan_codes(
    codelist: Codelist, cdm: CdmReference
) -> pl.DataFrame

Find related concepts used in the data but not in the codelist.

For each concept set, finds the following concepts, keeping only those that actually appear in the CDM data:

1. Descendants not in the codelist
2. Concepts mapped to codelist concepts but not included

Parameters

codelist
    Input codelist.
cdm
    CDM reference.

Returns

pl.DataFrame
    DataFrame with columns: concept_set_name, concept_id, concept_name, domain_id, relationship, count.
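
The descendant half of the orphan check can be sketched with plain Python data. Everything here is illustrative: the ancestor rows and usage counts are invented, the helper name is hypothetical, the "descendant" label in the relationship field is an assumed value, and the mapped-concept check is omitted.

```python
# Hypothetical concept_ancestor rows: (ancestor_concept_id, descendant_concept_id).
CONCEPT_ANCESTOR = [(10, 11), (10, 12), (10, 13)]
# Hypothetical record counts for concepts that occur in the data.
USED_IN_DATA = {11: 40, 13: 5}


def orphan_descendants(codelist):
    """List descendants that are used in the data but missing from the codelist."""
    rows = []
    for name, concept_ids in codelist.items():
        included = set(concept_ids)
        candidates = {d for a, d in CONCEPT_ANCESTOR if a in included} - included
        for concept_id in sorted(candidates):
            if concept_id in USED_IN_DATA:
                rows.append(
                    {
                        "concept_set_name": name,
                        "concept_id": concept_id,
                        "relationship": "descendant",  # assumed label
                        "count": USED_IN_DATA[concept_id],
                    }
                )
    return rows


orphans = orphan_descendants({"diabetes": [10, 11]})
```

Concept 12 is a missing descendant but never occurs in the data, so only concept 13 is reported.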