Cohort Generation
OMOPy supports two approaches to cohort generation: concept-based cohorts (simple, code-driven) and CIRCE-based cohorts (complex, from ATLAS JSON definitions).
Concept-Based Cohorts
The simplest approach is to define cohorts from lists of OMOP concept IDs:
from omopy.generics import Codelist
from omopy.connector import generate_concept_cohort_set
# Define concept sets: one named list of OMOP concept IDs per cohort
codelist = Codelist({
    "hypertension": [320128],
    "viral_sinusitis": [40481087],
    "diabetes": [201826, 442793],
})
# Generate cohorts (cdm is an existing CDM reference created beforehand)
cdm = generate_concept_cohort_set(
    cdm,
    codelist,
    name="conditions",
)
# Access the result
cohort = cdm["conditions"]
df = cohort.collect()
print(df)
How It Works
For each concept set, generate_concept_cohort_set:
- Looks up which clinical domain each concept belongs to (Condition, Drug, etc.)
- Queries the corresponding domain table (e.g., condition_occurrence)
- Filters to rows matching the concept IDs
- Constrains to records within an observation period
- Collapses overlapping date ranges into continuous cohort eras
- Assigns cohort_definition_id based on the codelist order
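The era-collapsing step can be sketched in plain Python. This is a minimal illustration and not OMOPy's implementation: the function name `collapse_eras` and its `gap_days` parameter are hypothetical, and the input is assumed to be one person's (start, end) date pairs for a single concept set.

```python
from datetime import date


def collapse_eras(intervals, gap_days=0):
    """Merge (start, end) date pairs that overlap or lie within gap_days of each other.

    Hypothetical helper for illustration; not part of OMOPy.
    """
    eras = []
    for start, end in sorted(intervals):
        if eras and (start - eras[-1][1]).days <= gap_days:
            # Starts before (or close enough to) the end of the current era: extend it.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            # Otherwise open a new era.
            eras.append((start, end))
    return eras


records = [
    (date(2020, 1, 1), date(2020, 1, 10)),
    (date(2020, 1, 5), date(2020, 1, 20)),
    (date(2020, 3, 1), date(2020, 3, 5)),
]
eras = collapse_eras(records)
# The first two records overlap and become one era; the third stays separate.
```

Sorting first means each interval only needs to be compared with the most recent era, so the merge is a single pass.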
Options
cdm = generate_concept_cohort_set(
    cdm,
    codelist,
    name="conditions",
    limit="first",  # "first" (default) or "all"
    required_observation=(0, 0),  # (days_before, days_after) in observation period
    end="observation_period_end_date",  # cohort end date strategy
)
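To make the `limit` and `required_observation` options concrete, here is one plausible reading of them in plain Python. Both helper functions are hypothetical and the semantics are assumptions inferred from the option comments above, not taken from OMOPy's source.

```python
from datetime import date, timedelta


def has_required_observation(event_date, obs_start, obs_end, required_observation=(0, 0)):
    """Assumed meaning: the event needs days_before of prior and days_after of
    subsequent observation inside the same observation period."""
    days_before, days_after = required_observation
    return (
        obs_start <= event_date - timedelta(days=days_before)
        and event_date + timedelta(days=days_after) <= obs_end
    )


def apply_limit(event_dates, limit="first"):
    """Assumed meaning: "first" keeps only the earliest event, "all" keeps every event."""
    ordered = sorted(event_dates)
    return ordered[:1] if limit == "first" else ordered


obs_start, obs_end = date(2020, 1, 1), date(2020, 12, 31)
events = [date(2020, 5, 1), date(2020, 2, 1)]

# Keep events with at least 30 days of prior observation, then apply the limit.
eligible = [d for d in events if has_required_observation(d, obs_start, obs_end, (30, 0))]
first_only = apply_limit(eligible, limit="first")
```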
CIRCE-Based Cohorts (ATLAS JSON)
For complex cohort definitions created in ATLAS:
from omopy.connector import generate_cohort_set
# From a directory of JSON files
cdm = generate_cohort_set(
    cdm,
    "path/to/cohort_definitions/",
    name="atlas_cohorts",
)
# From a single JSON file
cdm = generate_cohort_set(
    cdm,
    "path/to/cohort.json",
    name="my_cohort",
)
CIRCE Engine Features
The CIRCE engine is a clean-room Python implementation supporting:
- Primary criteria — initial qualifying events from any clinical domain
- Inclusion rules — additional criteria with temporal windows
- Correlated criteria — count-based requirements (at least N, at most N, exactly N)
- Demographic criteria — age, gender, race, ethnicity filters
- End strategies — date offset, custom drug era
- Era collapsing — merge overlapping cohort periods with configurable gap days
- Censor windows — hard date boundaries
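As an illustration of a correlated criterion, the sketch below counts events inside a temporal window around an index date and compares the count against a requirement. The function name, the `window` convention (day offsets relative to the index date), and the `occurrence` tuple are all hypothetical; the CIRCE engine's internal representation may differ.

```python
import operator
from datetime import date, timedelta

# Map each count-based requirement type to a comparison operator.
COUNT_OPS = {
    "at_least": operator.ge,
    "at_most": operator.le,
    "exactly": operator.eq,
}


def meets_correlated_criterion(index_date, event_dates, window=(-365, 0), occurrence=("at_least", 1)):
    """Check whether the number of events inside the window satisfies the requirement.

    Hypothetical helper; window is (start_offset_days, end_offset_days), inclusive.
    """
    window_start = index_date + timedelta(days=window[0])
    window_end = index_date + timedelta(days=window[1])
    n_events = sum(window_start <= d <= window_end for d in event_dates)
    kind, count = occurrence
    return COUNT_OPS[kind](n_events, count)


index_date = date(2021, 1, 1)
prior_events = [date(2020, 6, 1), date(2020, 12, 1), date(2019, 1, 1)]

# Two of the three events fall in the 365 days before the index date.
two_or_more = meets_correlated_criterion(index_date, prior_events, occurrence=("at_least", 2))
```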
CohortTable Structure
Generated cohorts are stored as CohortTable objects with companion metadata:
cohort = cdm["conditions"]
# Required columns
# cohort_definition_id | subject_id | cohort_start_date | cohort_end_date
# Companion data
print(cohort.settings) # Polars DF: cohort_definition_id, cohort_name, ...
print(cohort.attrition) # Polars DF: step-by-step record counts
print(cohort.cohort_codelist) # Polars DF: concept IDs per cohort
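The four required columns suggest a simple sanity check. The sketch below works on plain dictionaries rather than a real CohortTable, and `check_cohort_rows` is a hypothetical helper, so treat it as a picture of the expected row shape rather than an OMOPy feature.

```python
from datetime import date

REQUIRED_COLUMNS = (
    "cohort_definition_id",
    "subject_id",
    "cohort_start_date",
    "cohort_end_date",
)


def check_cohort_rows(rows):
    """Return a list of human-readable problems found in cohort rows (dicts)."""
    problems = []
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        elif row["cohort_start_date"] > row["cohort_end_date"]:
            problems.append(f"row {i}: cohort_start_date is after cohort_end_date")
    return problems


rows = [
    {
        "cohort_definition_id": 1,
        "subject_id": 42,
        "cohort_start_date": date(2020, 1, 1),
        "cohort_end_date": date(2020, 1, 20),
    },
    {
        "cohort_definition_id": 1,
        "subject_id": 43,
        "cohort_start_date": date(2020, 5, 1),
        "cohort_end_date": date(2020, 4, 1),  # deliberately invalid
    },
]
problems = check_cohort_rows(rows)
```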
Working with Codelist Objects
from omopy.generics import Codelist
# Create from a dict
cl = Codelist({
    "aspirin": [1112807],
    "ibuprofen": [1177480],
})
# Access
print(cl.names) # ['aspirin', 'ibuprofen']
print(cl.all_concept_ids) # {1112807, 1177480}
print(cl["aspirin"]) # [1112807]
ConceptSetExpression (Rich Codelist)
For codelists with include/exclude/descendants flags: