OHDSI

LLMs can support automated mapping of ICD to SNOMED concepts

Large Language Model–Based Classification of ICD-10-CM to SNOMED Mappings
for Improved Semantic Fidelity in OHDSI

Dmytro Dymshyts  ·  Anna Ostropolets  ·  Martijn Schuemie

OHDSI European Symposium 2026  ·  Johnson & Johnson

The Problem

Concept mapping in OHDSI can lose or corrupt clinical meaning when source and target terminologies don't align perfectly. We propose using LLMs to classify each existing mapping by its type of semantic equivalence.

Why Mapping Quality Matters

ICD-10-CM → SNOMED CT gap

ICD-10-CM was designed for administrative billing, while SNOMED CT is a clinical ontology. The same condition can be described at different levels of specificity in each vocabulary.

Semantic mismatch types

equal: same clinical meaning
uphill: target more general
downhill: target more specific
up&downhill: both general & specific drift
incorrect: different condition entirely

Impact on research

Cohort definitions over- or under-include patients when mappings are not semantically equivalent.

Phenotyping algorithms inherit vocabulary biases, affecting reproducibility across data sources.

Federated analyses require consistent semantics — non-equal mappings introduce silent confounding.

Proposed solution

Use LLMs to automatically classify existing mappings by semantic equivalence type. Non-equal mappings can then be replaced with better-fitting SNOMED CT targets.

Pipeline Overview — 6 Steps

1. Candidate Retrieval: fetch SNOMED CT targets & synonyms from OMOP vocabulary DB
2. Source Normalisation: GPT-4o removes non-specific qualifiers (NOS, NEC, unspecified…)
3. Text-Match Pre-check: bag-of-words match → equal / high without LLM
4. LLM Classification: GPT-o3 classifies into 5 labels with confidence levels
5. Confidence Validation: adversarial re-check + majority-vote tiebreaker for low/moderate
6. Output Partitioning: consistent set (mappingCons) vs. conflicting set (mappingInc)

Decision provenance tags

text_match: Equal by bag-of-words; no LLM
LLM: Single o3 call
LLM (cached): Reused from cache for identical pair
LLM (validated): Confirmed by adversarial re-check
LLM (tiebreak): Majority vote across 3 calls
needs_review: All 3 calls differed; manual review
skipped: Source already has an 'equal' target

Step 1 — Candidate Retrieval & Step 2 — Source Normalisation

Step 1 — Candidate Mapping Retrieval

SNOMED CT targets and synonyms are fetched from a Databricks-hosted OMOP vocabulary database. Canonical names are prioritised over synonyms via the if_concept_name flag. No LLM calls are made in this step.

Step 2c — AND / OR logic

In source names, "and" is replaced by "or" (with some exceptions). In ICD-10 a coordinating "and" usually means "and/or", so treating it as "or" preserves the intended broader concept rather than falsely narrowing it.

# Example transformation
"Atrial fibrillation and flutter" → "Atrial fibrillation or flutter"
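
The "and" → "or" rule can be sketched as a whole-word substitution. This is a minimal Python illustration, not the authors' code: the function name `and_to_or` and the `KEEP_AND` exception set are hypothetical, and the poster does not list which names are exempt, so the set is left empty.

```python
import re

# Hypothetical exception set: lower-cased names in which "and" must be kept.
# The poster mentions exceptions but does not enumerate them.
KEEP_AND = frozenset()


def and_to_or(name: str, exceptions=KEEP_AND) -> str:
    """Replace the standalone word 'and' with 'or' unless the name is exempt."""
    if name.lower() in exceptions:
        return name
    # \b keeps words such as "gland" or "candida" untouched.
    return re.sub(r"\band\b", "or", name)


print(and_to_or("Atrial fibrillation and flutter"))
```

Passing a non-empty `exceptions` set leaves the listed names unchanged, which is one simple way to model the "with some exceptions" clause.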

Step 2 — Source Normalisation (GPT-4o)

Gate regexp — names sent to LLM only if they contain:

\bunspecified\b | \bother\b | \bnonspecific\b |
\buncomplicated\b | \bwithout\b | disorders | diseases |
not otherwise specified | not elsewhere classified
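
A gate of this kind can be sketched in Python as a single compiled pattern. This is an illustration under assumptions: the alternatives are read as nine separate patterns (with "diseases" and "not otherwise specified" treated as distinct entries), matching is case-insensitive, and the name `needs_normalisation` is hypothetical.

```python
import re

# Assumed reading of the gate: nine alternatives, matched case-insensitively.
GATE = re.compile(
    r"\bunspecified\b|\bother\b|\bnonspecific\b|\buncomplicated\b|\bwithout\b"
    r"|disorders|diseases|not otherwise specified|not elsewhere classified",
    re.IGNORECASE,
)


def needs_normalisation(name: str) -> bool:
    """Return True if the source name should be sent to the normalisation LLM."""
    return GATE.search(name) is not None


print(needs_normalisation("Other inherited spinal muscular atrophy"))
print(needs_normalisation("Acute necrotizing hemorrhagic encephalopathy"))
```

Names that do not pass the gate skip the normalisation call entirely, which keeps the number of LLM requests down.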

GPT-4o normalisation prompt (single-turn):

You are a clinical terminology expert.
Task: produce a NORMALIZED concept name.
REMOVE: unspecified, NOS, NEC, other, uncomplicated, in diseases classified elsewhere, 'without X' (keep main condition)
KEEP: body site, laterality, severity, stage/grade, acuity, etiology/cause, named subtypes (classical, atypical…)
Output: ONLY the normalized name. No explanation.

Step 3 — Text-Match Pre-check (No LLM)

Source–target pairs are compared as bags of words (sorted word multisets), so word order is ignored. A match assigns equal / high confidence without any LLM call.

3a — Stop-word removal

# Articles
a, an, the
# Prepositions
of, in, on, at, to, from, for, by, as
# Other
due

3b — Hyphen normalisation

# 1. Unicode dashes → ASCII hyphen
gsub("[—–‐]", "-", x)
# 2. Hyphen → space: iodine-deficiency → iodine deficiency
v_space <- gsub("-", " ", x)
# 3. Hyphen → nothing: non-toxic → nontoxic
v_none <- gsub("-", "", x)
# 4. 'non word' → 'nonword' (backslashes must be doubled inside R strings)
gsub("\\bnon\\s+([a-z0-9]+)\\b", "non\\1", s, perl = TRUE)

3c — Tokenisation

# Replace non-alphanumeric characters with a space
gsub("[^a-z0-9]+", " ", x)

Each name yields up to 6 normalised variants. A match in any source–target combination resolves the pair without an LLM call.
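
The pre-check as a whole can be sketched in Python (used here instead of R so the example is self-contained). This is a minimal illustration, not the authors' implementation: the function names `bag`, `variants`, and `text_match` are hypothetical, and this sketch does not claim to reproduce the exact six variants per name.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
              "from", "for", "by", "as", "due"}


def bag(text: str) -> tuple:
    """Tokenise on non-alphanumerics, drop stop words, return sorted tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    return tuple(sorted(t for t in tokens if t not in STOP_WORDS))


def variants(name: str) -> set:
    """Build the set of normalised bags for one concept name."""
    # Lower-case, then map Unicode dashes to the ASCII hyphen.
    x = re.sub("[\u2014\u2013\u2010]", "-", name.lower())
    # Hyphen as a space, and hyphen removed.
    forms = {x.replace("-", " "), x.replace("-", "")}
    # Also join 'non word' into 'nonword' for every form.
    forms |= {re.sub(r"\bnon\s+([a-z0-9]+)\b", r"non\1", f) for f in forms}
    return {bag(f) for f in forms}


def text_match(source: str, target: str) -> bool:
    """True if any source variant equals any target variant."""
    return not variants(source).isdisjoint(variants(target))


print(text_match("Iodine-deficiency", "Iodine deficiency"))
print(text_match("Non-toxic", "Nontoxic"))
```

Because each bag is a sorted tuple, two names that differ only in word order, stop words, or hyphenation compare equal.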

Examples:

"Iodine-deficiency" ↔ "Iodine deficiency"

"Non-toxic" ↔ "Nontoxic"

Step 4 — LLM Classification (GPT-o3)

Label taxonomy

Label: Definition
equal: Same clinical condition, or one is the only available subtype, e.g. "Clonic hemifacial spasm" = "Hemifacial spasm"
uphill: Target more general — source has features absent in target.
downhill: Target more specific — target adds features absent in source.
up&downhill: Both: source drops AND target adds clinically meaningful features.
incorrect: Different conditions — different disorder family or no semantic overlap.

Key interpretation rules

① Adjectives hypertensive, diabetic ≡ "due to" / "caused by"

② ~itis = inflammation of; ~osis = disorder of

③ Synonyms: kidney = renal; word order doesn't change meaning

④ Preserve: body site, laterality, severity, stage, acuity, etiology

⑤ Use incorrect ONLY for clinically different diseases

System prompt (GPT-o3, single-turn)

You are a clinical terminology expert working with OHDSI standardized vocabularies.
Your task is NOT to create mappings.
Your task is to CLASSIFY an existing ICD10CM → SNOMED mapping.
Return exactly one label: equal | uphill | downhill | incorrect | up&downhill
Decision process:
Same condition? → equal
Target drops detail? → uphill
Target adds detail? → downhill
Both adds and drops? → up&downhill
Different condition? → incorrect
Output: single JSON object only.
{ "label": "...", "confidence": "high|moderate|low" }
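
Since the prompt asks for a single JSON object, the reply can be validated before it is used. The sketch below is a Python illustration under assumptions: `parse_classification` is a hypothetical helper, and raising an error on an unexpected reply is one possible policy that the poster does not describe.

```python
import json

LABELS = {"equal", "uphill", "downhill", "incorrect", "up&downhill"}
CONFIDENCES = {"high", "moderate", "low"}


def parse_classification(raw: str) -> tuple:
    """Parse the model reply and check it against the allowed values."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError(f"expected a JSON object, got: {obj!r}")
    label, confidence = obj.get("label"), obj.get("confidence")
    if label not in LABELS or confidence not in CONFIDENCES:
        raise ValueError(f"unexpected classification: {obj!r}")
    return label, confidence


print(parse_classification('{"label": "uphill", "confidence": "moderate"}'))
```

Rejecting anything outside the five labels and three confidence levels keeps malformed replies out of the downstream validation and partitioning steps.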

Step 4c — Caching & skip logic

Results cached by (source, target) pair — identical pairs classified once.

Once any target is labelled equal, all remaining candidates are skipped.
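
The caching and skip rules can be sketched as a loop over one source's candidate targets. This is a Python illustration with hypothetical names: `classify` stands in for the LLM call and `cache` is a plain dictionary keyed by the (source, target) pair.

```python
def classify_candidates(source, targets, classify, cache):
    """Label each target for one source, reusing cached results and
    skipping remaining targets once an 'equal' label has been found.

    classify: hypothetical callable (source, target) -> label.
    cache:    dict mapping (source, target) -> label, shared across calls.
    """
    results = []
    found_equal = False
    for target in targets:
        if found_equal:
            results.append((target, None, "skipped"))
            continue
        key = (source, target)
        if key in cache:
            label, method = cache[key], "LLM (cached)"
        else:
            label = cache[key] = classify(source, target)
            method = "LLM"
        results.append((target, label, method))
        found_equal = label == "equal"
    return results


demo = classify_candidates(
    "S", ["A", "B", "C"],
    lambda s, t: "equal" if t == "B" else "uphill",
    {},
)
print(demo)
```

The order of candidates matters in this sketch: targets listed after the first equal one are never sent to the model.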

Step 5 — Validation & Step 6 — Output Partitioning

Pairs with low or moderate confidence undergo up to two additional LLM calls.

1. Initial Classification (Step 4)

GPT-o3 assigns label + high/moderate/low confidence. High → done. Low/moderate → Call 2.

2. Adversarial Re-check

A previous classification assigned <label> with <moderate/low> confidence. Carefully reconsider. Apply all rules strictly.

3. Neutral Tiebreaker (calls 1&2 disagree only)

Original prompt reissued. Majority vote across 3 calls decides final label.

Calls agree → LLM (validated)
Disagree → Call 3 → LLM (tiebreak)
All 3 differ → needs_review
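
The three-call flow can be sketched as follows. This is a Python illustration with hypothetical callables: `classify` stands in for the original prompt and `recheck` for the adversarial prompt. How the final confidence is derived is not stated on the poster, so the sketch returns only the label and the provenance tag.

```python
from collections import Counter


def validate(classify, recheck):
    """Return (final_label, provenance_tag) for one source-target pair.

    classify: hypothetical callable () -> (label, confidence), original prompt.
    recheck:  hypothetical callable (label, confidence) -> (label, confidence),
              adversarial prompt that quotes the previous result.
    """
    label1, conf1 = classify()
    if conf1 == "high":
        return label1, "LLM"
    label2, _ = recheck(label1, conf1)
    if label2 == label1:
        return label1, "LLM (validated)"
    # Calls 1 and 2 disagree: reissue the original prompt as a tiebreaker.
    label3, _ = classify()
    winner, votes = Counter([label1, label2, label3]).most_common(1)[0]
    if votes >= 2:
        return winner, "LLM (tiebreak)"
    return None, "needs_review"


print(validate(lambda: ("equal", "high"), lambda label, conf: (label, conf)))
```

A pair reaches needs_review only when all three calls return different labels, so at most three model calls are made per pair.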

Step 6 — Output Partitioning

mappingCons — Consistent set

All non-incorrect targets agree, or ≥1 target labelled equal. Ready to use in OHDSI.

mappingInc — Inconsistent set

Concepts with conflicting target labels. Require manual review before use.
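
The partitioning rule can be sketched in Python. This is an illustration under assumptions: `partition` and its input shape (source code mapped to a list of target labels) are hypothetical, and a source whose targets are all incorrect is treated as consistent because the "all non-incorrect targets agree" condition then holds vacuously; the poster does not state how that case is handled.

```python
def partition(labels_by_source):
    """Split sources into a consistent and an inconsistent set.

    labels_by_source: dict mapping a source code to the list of labels
    assigned to its candidate targets.
    """
    cons, inc = {}, {}
    for source, labels in labels_by_source.items():
        non_incorrect = {label for label in labels if label != "incorrect"}
        # Consistent: at least one equal target, or all remaining labels agree.
        # Assumption: an empty set (all targets incorrect) counts as agreeing.
        consistent = "equal" in non_incorrect or len(non_incorrect) <= 1
        (cons if consistent else inc)[source] = labels
    return cons, inc


mapping_cons, mapping_inc = partition({
    "A": ["equal", "uphill"],
    "B": ["uphill", "uphill", "incorrect"],
    "C": ["uphill", "downhill"],
})
print(sorted(mapping_cons), sorted(mapping_inc))
```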

F chapter summary

97.6% automation rate
6 flagged for manual review
7 incorrect mappings detected
17 expert corrections (substance use)

Results — ICD-10-CM F Chapter (715 mappings)

Distribution of classification labels across 715 ICD-10-CM F chapter 1:1 mappings

equal: 71.3% (510)
uphill: 21.5% (154)
downhill: 6.3% (45)
up&downhill: 0.6% (4)
incorrect: 1.0% (7)
manual: 0.8% (6)

Expert review: 17 labels corrected — model confused harmful use vs abuse vs disorder related to use of psychoactive substance.

71.3% of mappings are semantically equal: immediately usable
28.7% have semantic mismatch: candidates for replacement
97.6% automation rate: only 6 flagged for manual review

Quality check method

① Re-running the model several times on the same subset — expecting consistent results.

② Review by a medical professional across all classified pairs.

Results — Full Output Examples (G Chapter Subset)

Code | Original source name | Normalised name | SNOMED target | Step 1 (conf) | Step 2 (conf) | Step 3 (conf) | Final label (conf) | Method
G02 | Meningitis in other infectious and parasitic diseases classified elsewhere | Meningitis in infectious or parasitic diseases | Infective meningitis | equal (mod) | equal (high) | - | equal (high) | LLM (validated)
G04.00 | Acute disseminated encephalitis and encephalomyelitis, unspecified | Acute disseminated encephalitis or encephalomyelitis | Acute disseminated encephalomyelitis | - | - | - | equal (high) | LLM
G04.01 | Postinfectious acute disseminated encephalitis and encephalomyelitis (postinfectious ADEM) | Postinfectious acute disseminated encephalitis or encephalomyelitis (postinfectious ADEM) | Acute disseminated encephalomyelitis following infectious disease | downhill (mod) | downhill (mod) | - | downhill (mod) | LLM (validated)
G05.3 | Encephalitis and encephalomyelitis in diseases classified elsewhere | Encephalitis or encephalomyelitis | Encephalitis, myelitis and encephalomyelitis | - | - | - | uphill (high) | LLM
G12.1 | Other inherited spinal muscular atrophy | Inherited spinal muscular atrophy | Spinal muscular atrophy | uphill (mod) | equal (mod) | uphill (high) | equal (high) | LLM (tiebreak)
G31.09 | Other frontotemporal neurocognitive disorder | Frontotemporal neurocognitive disorder | Frontotemporal dementia | - | - | - | equal (high) | LLM
G31.83 | Neurocognitive disorder with Lewy bodies | Neurocognitive disorder with Lewy bodies | Senile dementia of the Lewy body type | downhill (mod) | equal (mod) | equal (high) | equal (high) | LLM (tiebreak)
G31.84 | Mild cognitive impairment of uncertain or unknown etiology | Mild cognitive impairment of uncertain or unknown etiology | Minimal cognitive impairment | equal (mod) | equal (mod) | - | equal (mod) | LLM (validated)
G40.219 | Localization-related (focal) symptomatic epilepsy with complex partial seizures, intractable, without status epilepticus | Localization-related epilepsy or epileptic syndromes with complex partial seizures, intractable | Focal epilepsy | - | - | - | uphill (high) | LLM
G40.5 | Epileptic seizures related to external causes | Epileptic seizures related to external causes | Epileptic seizure | - | - | - | uphill (high) | LLM
G40.8 | Other epilepsy and recurrent seizures | Epilepsy or recurrent seizures | Epilepsy | downhill (mod) | equal (mod) | equal (mod) | equal (mod) | LLM (tiebreak)
G40.824 | Epileptic spasms, intractable, without status epilepticus | Epileptic spasms, intractable | Refractory infantile spasms | - | - | - | downhill (high) | LLM
G04.3 | Acute necrotizing hemorrhagic encephalopathy | Acute necrotizing hemorrhagic encephalopathy | Acute hemorrhagic leukoencephalitis | - | - | - | incorrect (high) | LLM
G40.11 | Localization-related (focal) symptomatic epilepsy with simple partial seizures, intractable | Localization-related epilepsy or epileptic syndromes with simple partial seizures, intractable | Focal onset aware epileptic seizure | - | - | - | incorrect (high) | LLM
G40.A0 | Absence epileptic syndrome, not intractable | Absence epileptic syndrome, not intractable | Absence seizure | uphill (mod) | incorrect (high) | incorrect (mod) | incorrect (high) | LLM (tiebreak)
G43.B0 | Ophthalmoplegic migraine, not intractable | Ophthalmoplegic migraine, not intractable | Recurrent painful ophthalmoplegic neuropathy | uphill (mod) | uphill (high) | - | uphill (high) | LLM (validated)

Conclusion & Next Steps

Key findings

Automation is feasible. 97.6% of the F chapter classified automatically — only 6 mappings required manual review.

Most mappings are equal. 71.3% (510/715) of ICD-10-CM F chapter 1:1 mappings are semantically equivalent to SNOMED CT targets.

Substance use is hard. Model confused harmful use vs abuse vs disorder related to psychoactive substance — 17 labels corrected after expert review.

Reproducibility confirmed. Re-running the model on the same subset produced consistent results.

Next steps

  • Extend beyond the F chapter to full ICD-10-CM vocabulary
  • Replace non-equal mappings with semantically equivalent SNOMED CT targets
  • Define policy for cases where no equivalent target exists
  • Apply pipeline to multi-target (1:N) mappings
  • Evaluate impact on OHDSI cohort definitions
  • Explore ICD-10 international and other source vocabularies

GitHub repository: github.com/dimshitc/OHDSImappingClassifier