omopy.codelist¶
Vocabulary-based code list generation and analysis — search the OMOP vocabulary, traverse concept hierarchies, and build/manipulate code lists for phenotyping.
This module is the Python equivalent of the R CodelistGenerator package.
Vocabulary Search¶
get_candidate_codes¶
get_candidate_codes(
cdm: CdmReference,
keywords: str | list[str],
*,
search_synonyms: bool = False,
exclude: str | list[str] | None = None,
domains: str | list[str] | None = None,
standard_concept: str | list[str] | None = None,
vocabulary_id: str | list[str] | None = None,
concept_class_id: str | list[str] | None = None,
include_descendants: bool = False,
name: str | None = None,
) -> Codelist
Search for concepts by keyword matching on concept_name.
Parameters¶
cdm
CDM reference with access to concept (and optionally
concept_synonym).
keywords
One or more keyword strings. Each is matched using SQL LIKE
(case-insensitive, surrounded by %).
search_synonyms
If True, also search concept_synonym.concept_synonym_name.
exclude
Keywords to exclude (concepts matching these are removed).
domains
Restrict to specific domain_id(s) (e.g. "Condition").
standard_concept
Filter by standard_concept value(s) ("S", "C", etc.).
Defaults to None (no filter). Pass "S" for standard only.
vocabulary_id
Restrict to specific vocabulary_id(s).
concept_class_id
Restrict to specific concept_class_id(s).
include_descendants
If True, include descendants of matching concepts via
concept_ancestor.
name
Name for the codelist entry. Defaults to the first keyword.
Returns¶
Codelist
A codelist with one entry containing matching concept IDs.
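The keyword matching described above can be pictured with a small standalone sketch. It does not call omopy: an in-memory SQLite table stands in for concept, the helper name search_concepts and the sample rows (including the concept IDs) are hypothetical, and combining multiple keywords with OR is an assumption rather than documented behaviour.

```python
import sqlite3

# Toy stand-in for the OMOP `concept` table (rows and IDs are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, domain_id TEXT)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1, "Asthma", "Condition"),
        (2, "Childhood asthma", "Condition"),
        (3, "Cardiac asthma", "Condition"),
        (4, "Asthma screening", "Procedure"),
    ],
)


def search_concepts(conn, keywords, exclude=(), domain=None):
    """Hypothetical helper: case-insensitive LIKE '%keyword%' on concept_name."""
    # Assumption: a concept matches if ANY keyword matches.
    include = " OR ".join("LOWER(concept_name) LIKE ?" for _ in keywords)
    sql = f"SELECT concept_id FROM concept WHERE ({include})"
    params = [f"%{k.lower()}%" for k in keywords]
    # Concepts matching an exclusion keyword are dropped.
    for term in exclude:
        sql += " AND LOWER(concept_name) NOT LIKE ?"
        params.append(f"%{term.lower()}%")
    if domain is not None:
        sql += " AND domain_id = ?"
        params.append(domain)
    sql += " ORDER BY concept_id"
    return [row[0] for row in conn.execute(sql, params)]


matches = search_concepts(conn, ["asthma"], exclude=["cardiac"], domain="Condition")
all_matches = search_concepts(conn, ["ASTHMA"])
```

Here matches keeps the two Condition concepts that mention asthma but not cardiac, while all_matches shows that the keyword case does not matter.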
get_mappings¶
get_mappings(
cdm: CdmReference,
codelist: Codelist,
*,
relationship_id: str | list[str] = "Maps to",
name_style: str = "{concept_set_name}",
) -> Codelist
Get mapped concepts via concept_relationship.
For each concept set in the codelist, finds concepts linked via
the specified relationship(s) in concept_relationship.
Parameters¶
cdm
CDM reference.
codelist
Input codelist with concept IDs to find mappings for.
relationship_id
Relationship type(s) to follow (e.g. "Maps to").
name_style
Naming template. {concept_set_name} is replaced with the
original concept set name.
Returns¶
Codelist
New codelist with mapped concept IDs.
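As a rough illustration of the lookup and of name_style, the sketch below works on plain Python data instead of a CDM. The helper follow_relationships, the toy rows, and the use of str.format for the template are assumptions made for this example; the direction concept_id_1 to concept_id_2 is likewise assumed.

```python
# Toy stand-in for concept_relationship rows:
# (concept_id_1, concept_id_2, relationship_id). IDs are made up.
concept_relationship = [
    (101, 201, "Maps to"),
    (102, 201, "Maps to"),
    (103, 202, "Maps to"),
    (101, 301, "Is a"),
]


def follow_relationships(codelist, rows, relationship_id="Maps to",
                         name_style="{concept_set_name}"):
    """Hypothetical helper: map each concept set through the given relationship(s)."""
    if isinstance(relationship_id, str):
        wanted = {relationship_id}
    else:
        wanted = set(relationship_id)
    result = {}
    for set_name, concept_ids in codelist.items():
        ids = set(concept_ids)
        # Keep the target side of every matching relationship row.
        mapped = sorted({c2 for c1, c2, rel in rows if c1 in ids and rel in wanted})
        # Assumption: the template is filled in with str.format.
        result[name_style.format(concept_set_name=set_name)] = mapped
    return result


source = {"asthma": [101, 102]}
mapped = follow_relationships(
    source, concept_relationship, name_style="{concept_set_name}_mapped"
)
both_rels = follow_relationships(
    source, concept_relationship, relationship_id=["Maps to", "Is a"]
)
```

With the default template the output keeps the original concept set name; a template such as "{concept_set_name}_mapped" makes it easier to tell source and mapped sets apart.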
Hierarchy Traversal¶
get_descendants¶
get_descendants(
cdm: CdmReference,
concept_id: Codelist | int | list[int],
*,
include_self: bool = True,
name: str | None = None,
) -> Codelist
Get all descendant concepts via the concept_ancestor table.
Parameters¶
cdm
CDM reference with access to concept_ancestor and concept.
concept_id
One or more ancestor concept IDs, or a Codelist whose
concept IDs will be extracted automatically.
include_self
If True, include the input concept(s) themselves.
name
Name for the resulting codelist entry. Defaults to "descendants_{id}".
Returns¶
Codelist
A codelist mapping name to descendant concept IDs (standard only).
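The effect of include_self can be sketched on a toy concept_ancestor table. This is plain Python, not the omopy implementation; the helper descendants_of and the concept IDs are hypothetical, and the standard-concept filter mentioned above is left out for brevity.

```python
# Toy stand-in for concept_ancestor rows:
# (ancestor_concept_id, descendant_concept_id). IDs are made up.
# OMOP vocabularies typically also list each concept as its own ancestor.
concept_ancestor = [
    (10, 10), (10, 11), (10, 12),
    (11, 11), (11, 12),
    (12, 12),
    (20, 20),
]


def descendants_of(concept_ids, rows, include_self=True):
    """Hypothetical helper: collect descendants for one or more ancestor IDs."""
    if isinstance(concept_ids, int):
        concept_ids = [concept_ids]
    wanted = set(concept_ids)
    found = {desc for anc, desc in rows if anc in wanted}
    if include_self:
        found |= wanted  # make sure the inputs are present
    else:
        found -= wanted  # drop the inputs themselves
    return sorted(found)


with_self = descendants_of(10, concept_ancestor)
without_self = descendants_of(10, concept_ancestor, include_self=False)
several = descendants_of([11, 20], concept_ancestor)
```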
get_ancestors¶
get_ancestors(
cdm: CdmReference,
concept_id: Codelist | int | list[int],
*,
include_self: bool = True,
name: str | None = None,
) -> Codelist
Get all ancestor concepts via the concept_ancestor table.
Parameters¶
cdm
CDM reference with access to concept_ancestor and concept.
concept_id
One or more descendant concept IDs, or a Codelist whose
concept IDs will be extracted automatically.
include_self
If True, include the input concept(s) themselves.
name
Name for the resulting codelist entry.
Returns¶
Codelist
A codelist mapping name to ancestor concept IDs (standard only).
Drug & ATC¶
get_drug_ingredient_codes¶
get_drug_ingredient_codes(
cdm: CdmReference,
ingredient: str | list[str] | int | list[int] | None = None,
*,
name: str | None = None,
) -> Codelist
Get drug concepts linked to specified ingredients.
If ingredient is a string, it is matched by keyword against concept_name
among concepts where concept_class_id = 'Ingredient'. If it is an integer,
it is used as a concept_id directly.
Parameters¶
cdm
CDM reference.
ingredient
Ingredient name(s) or concept ID(s). If None, returns all
standard ingredient concepts.
name
Name for the codelist entry.
Returns¶
Codelist
Codelist of drug ingredient concept IDs.
get_atc_codes¶
get_atc_codes(
cdm: CdmReference,
atc_name: str | None = None,
*,
level: str | list[str] | None = None,
name: str | None = None,
) -> Codelist
Get ATC (Anatomical Therapeutic Chemical) codes.
Finds ATC concepts in the vocabulary and optionally their linked RxNorm concepts via concept_relationship.
Parameters¶
cdm
CDM reference.
atc_name
Keyword to search ATC concept names. If None, returns all.
level
ATC concept class(es) to filter to (e.g. "ATC 1st",
"ATC 2nd", "ATC 3rd", "ATC 4th", "ATC 5th").
name
Name for the codelist entry.
Returns¶
Codelist
Codelist of ATC concept IDs.
Set Operations¶
union_codelists¶
intersect_codelists¶
Intersect multiple codelists, keeping only shared concept IDs.
For each concept set name present in ALL input codelists, returns only the concept IDs that appear in every codelist's version of that concept set.
Parameters¶
*codelists
Two or more Codelist objects.
Returns¶
Codelist
Codelist with intersected concept sets.
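The intersection rule can be sketched on plain dictionaries that map concept set names to concept IDs. The helper intersect below is a hypothetical stand-in that mirrors the described semantics; it is not the library function and does not use the Codelist class.

```python
def intersect(*codelists):
    """Hypothetical helper: intersect concept sets shared by all inputs."""
    if len(codelists) < 2:
        raise ValueError("need two or more codelists")
    # Only concept set names present in every input survive.
    shared_names = set(codelists[0])
    for cl in codelists[1:]:
        shared_names &= set(cl)
    result = {}
    for name in sorted(shared_names):
        # Keep IDs that appear in every codelist's version of this set.
        ids = set(codelists[0][name])
        for cl in codelists[1:]:
            ids &= set(cl[name])
        result[name] = sorted(ids)
    return result


a = {"asthma": [1, 2, 3], "copd": [7, 8]}
b = {"asthma": [2, 3, 4], "diabetes": [9]}
c = {"asthma": [3, 2, 5]}
result = intersect(a, b, c)
```

Only "asthma" is present in all three inputs, so "copd" and "diabetes" are dropped entirely rather than returned as empty sets.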
compare_codelists¶
Compare two codelists, concept set by concept set.
For each concept set name present in both codelists, computes:
- only_a: concept IDs only in codelist_a
- only_b: concept IDs only in codelist_b
- both: concept IDs in both
Parameters¶
codelist_a, codelist_b
Two codelists to compare.
Returns¶
dict[str, dict[str, list[int]]]
Mapping of concept set names to comparison results.
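The shape of the comparison result can be illustrated with plain dictionaries. The helper compare is a hypothetical sketch of the described behaviour, not the library function, and sorting the ID lists is a choice made here for readability.

```python
def compare(codelist_a, codelist_b):
    """Hypothetical helper: per-set differences between two codelists."""
    result = {}
    # Only concept set names present in both codelists are compared.
    for name in sorted(set(codelist_a) & set(codelist_b)):
        ids_a, ids_b = set(codelist_a[name]), set(codelist_b[name])
        result[name] = {
            "only_a": sorted(ids_a - ids_b),
            "only_b": sorted(ids_b - ids_a),
            "both": sorted(ids_a & ids_b),
        }
    return result


codelist_a = {"asthma": [1, 2, 3], "copd": [7, 8]}
codelist_b = {"asthma": [2, 3, 4]}
comparison = compare(codelist_a, codelist_b)
```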
Subsetting¶
subset_to_codes_in_use¶
subset_by_domain¶
subset_by_vocabulary¶
Stratification¶
stratify_by_domain¶
stratify_by_vocabulary¶
stratify_by_concept_class¶
Diagnostics¶
summarise_code_use¶
summarise_code_use(
codelist: Codelist,
cdm: CdmReference,
*,
count_by: str = "record",
) -> pl.DataFrame
Count usage of codelist concepts across CDM domain tables.
For each concept in the codelist, counts how many records (or distinct persons) reference it in the appropriate domain table.
Parameters¶
codelist
Codelist to summarise.
cdm
CDM reference.
count_by
"record" for total record count or "person" for
distinct person count.
Returns¶
pl.DataFrame
DataFrame with columns: concept_set_name, concept_id,
concept_name, domain_id, vocabulary_id, count.
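The difference between the two count_by modes can be seen on a toy table. The sketch below uses only the standard library rather than polars or a CDM; the helper count_use and the rows standing in for condition_occurrence are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for condition_occurrence rows: (person_id, condition_concept_id).
condition_occurrence = [
    (1, 100), (1, 100), (2, 100),
    (2, 200),
    (3, 300),  # concept 300 is not in the codelist below
]


def count_use(concept_ids, rows, count_by="record"):
    """Hypothetical helper: count records or distinct persons per concept."""
    if count_by not in ("record", "person"):
        raise ValueError("count_by must be 'record' or 'person'")
    wanted = set(concept_ids)
    records = defaultdict(int)
    persons = defaultdict(set)
    for person_id, concept_id in rows:
        if concept_id in wanted:
            records[concept_id] += 1
            persons[concept_id].add(person_id)
    if count_by == "record":
        return dict(records)
    return {cid: len(people) for cid, people in persons.items()}


by_record = count_use([100, 200], condition_occurrence, count_by="record")
by_person = count_use([100, 200], condition_occurrence, count_by="person")
```

Concept 100 has three records but only two distinct persons, which is the kind of gap the "person" mode is meant to expose.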
summarise_orphan_codes¶
Find related concepts used in the data but not in the codelist.
For each concept set, finds concepts that actually appear in the CDM data and are either:
1. Descendants of codelist concepts that are not in the codelist
2. Concepts mapped to codelist concepts but not included in the codelist
Parameters¶
codelist
Input codelist.
cdm
CDM reference.
Returns¶
pl.DataFrame
DataFrame with columns: concept_set_name, concept_id,
concept_name, domain_id, relationship, count.
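The descendant half of the orphan check can be sketched as a set difference followed by a lookup in the observed data. This is plain Python under stated assumptions: the toy concept_ancestor rows, the observed_counts mapping, and all concept IDs are made up, and the mapped-concept half of the check is omitted.

```python
# Concept IDs currently in one concept set of the codelist (made-up IDs).
codelist_ids = {10, 11}

# Toy stand-in for concept_ancestor rows:
# (ancestor_concept_id, descendant_concept_id).
concept_ancestor = [
    (10, 10), (10, 11), (10, 12), (10, 13),
    (11, 11), (11, 12),
]

# Hypothetical record counts per concept ID as observed in the CDM data.
observed_counts = {11: 40, 12: 5, 99: 7}

# Descendants of any codelist concept.
descendants = {desc for anc, desc in concept_ancestor if anc in codelist_ids}

# Orphans: descendants missing from the codelist that do occur in the data.
orphans = {
    cid: observed_counts[cid]
    for cid in sorted(descendants - codelist_ids)
    if cid in observed_counts
}
```

Concept 13 is a descendant but never occurs in the data, and concept 99 occurs but is unrelated, so neither is reported; only concept 12 qualifies.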