LLMs can support automated mapping of ICD to SNOMED concepts

Large Language Model–Based Classification of ICD-10-CM to SNOMED Mappings for Improved Semantic Fidelity in OHDSI

Dmytro Dymshyts · Anna Ostropolets · Martijn Schuemie

OHDSI European Symposium 2026 · Johnson & Johnson

The Problem

Concept mapping in OHDSI can lose or corrupt clinical meaning when source and target terminologies don't align perfectly. We propose using LLMs to classify each existing mapping by its type of semantic equivalence.

Why Mapping Quality Matters

ICD-10-CM → SNOMED CT gap

ICD-10-CM was designed for administrative billing, while SNOMED CT is a clinical ontology. The same condition can be described at different levels of specificity in each vocabulary.

Semantic mismatch types

equal	same clinical meaning
uphill	target more general
downhill	target more specific
up&downhill	both general and specific drift
incorrect	different condition entirely

Impact on research

📊

Cohort definitions over- or under-include patients when mappings are not semantically equivalent.

🔬

Phenotyping algorithms inherit vocabulary biases, affecting reproducibility across data sources.

🔗

Federated analyses require consistent semantics — non-equal mappings introduce silent confounding.

Proposed solution

Use LLMs to automatically classify existing mappings by semantic equivalence type. Non-equal mappings can then be replaced with better-fitting SNOMED CT targets.

Pipeline Overview — 6 Steps

1. Candidate Retrieval: fetch SNOMED CT targets and synonyms from the OMOP vocabulary DB
2. Source Normalisation: GPT-4o removes non-specific qualifiers (NOS, NEC, unspecified…)
3. Text-Match Pre-check: bag-of-words match → equal / high without LLM
4. LLM Classification: GPT-o3 classifies into 5 labels with confidence levels
5. Confidence Validation: adversarial re-check + majority-vote tiebreaker for low/moderate
6. Output Partitioning: consistent set (mappingCons) vs. conflicting set (mappingInc)

Decision provenance tags

text_match	Equal by bag-of-words; no LLM
LLM	Single o3 call
LLM (cached)	Reused from cache for identical pair
LLM (validated)	Confirmed by adversarial re-check
LLM (tiebreak)	Majority vote across 3 calls
needs_review	All 3 calls differed; manual review
skipped	Source already has an 'equal' target

Step 1 — Candidate Retrieval & Step 2 — Source Normalisation

Step 1 — Candidate Mapping Retrieval

SNOMED CT targets and synonyms fetched from a Databricks-hosted OMOP vocabulary database. Canonical names prioritised over synonyms via if_concept_name flag. No LLM calls in this step.

Step 2c — AND / OR logic

In source names, "and" is replaced by "or" (with some exceptions). In ICD-10 a coordinating "and" usually means "and/or", so treating it as "or" preserves the intended broader concept rather than falsely narrowing it.

# Example transformation
"Atrial fibrillation and flutter"
→ "Atrial fibrillation or flutter"
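
The replacement above can be sketched in R as a whole-word substitution. This is a minimal sketch, not the pipeline's code: the source mentions exceptions without listing them, so the `keep_and` argument is a hypothetical placeholder for names that should be left untouched.

```r
# Replace a coordinating "and" with "or" in a source name.
# `keep_and` is a hypothetical list of exception names; the actual
# exceptions used by the pipeline are not given in the source.
and_to_or <- function(name, keep_and = character(0)) {
  if (name %in% keep_and) return(name)
  # \b anchors keep words such as "band" or "gland" intact
  gsub("\\band\\b", "or", name, perl = TRUE)
}

and_to_or("Atrial fibrillation and flutter")
# "Atrial fibrillation or flutter"
```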
        

Step 2 — Source Normalisation (GPT-4o)

Gate regexp — names sent to LLM only if they contain:

\bunspecified\b | \bother\b | \bnonspecific\b
\buncomplicated\b | \bwithout\b | disorders | diseases
not otherwise specified | not elsewhere classified
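
As an illustration, the gate could be applied in R roughly as follows. The function name `needs_normalisation` and the case-insensitive matching are assumptions; the source only lists the alternatives.

```r
# Gate pattern assembled from the alternatives listed above.
gate_pattern <- paste(
  c("\\bunspecified\\b", "\\bother\\b", "\\bnonspecific\\b",
    "\\buncomplicated\\b", "\\bwithout\\b", "disorders", "diseases",
    "not otherwise specified", "not elsewhere classified"),
  collapse = "|"
)

# TRUE when a name contains a gate term and would be sent to the LLM.
# Case-insensitive matching is an assumption; the source does not say.
needs_normalisation <- function(name) {
  grepl(gate_pattern, name, ignore.case = TRUE, perl = TRUE)
}

needs_normalisation(c("Other inherited spinal muscular atrophy",
                      "Neurocognitive disorder with Lewy bodies"))
# TRUE FALSE
```

Names that fail the gate skip the normalisation call and are passed on unchanged.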
        

GPT-4o normalisation prompt (single-turn):

You are a clinical terminology expert.
Task: produce a NORMALIZED concept name.

REMOVE: unspecified, NOS, NEC, other,
  uncomplicated, in diseases classified elsewhere,
  'without X' (keep main condition)

KEEP: body site, laterality, severity,
  stage/grade, acuity, etiology/cause,
  named subtypes (classical, atypical…)

Output: ONLY the normalized name. No explanation.
        

Step 3 — Text-Match Pre-check (No LLM)

Source–target pairs compared as bag-of-words (sorted word multisets). A match assigns equal / high confidence without any LLM call.

3a — Stop-word removal

# Articles
a, an, the

# Prepositions
of, in, on, at, to, from,
for, by, as

# Other
due
      

3b — Hyphen normalisation

# 1. Unicode dashes → ASCII hyphen
x <- gsub("[—–‐]", "-", x)

# 2. Hyphen → space
# iodine-deficiency → iodine deficiency
v_space <- gsub("-", " ", x)

# 3. Hyphen → nothing
# non-toxic → nontoxic
v_none <- gsub("-", "", x)

# 4. 'non word' → 'nonword'
# (backslashes are doubled inside R string literals)
gsub("\\bnon\\s+([a-z0-9]+)\\b",
     "non\\1", s, perl=TRUE)
      

3c — Tokenisation

# Replace non-alphanumeric
gsub("[^a-z0-9]+", " ", x)
      

Each name yields up to 6 normalised variants. A match in any source–target combination resolves the pair without an LLM call.
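
Steps 3a to 3c can be combined into a small R sketch. It is a simplified illustration, not the pipeline's code: it builds only two hyphen variants per name rather than the up to 6 described above, leaves out the Unicode-dash step, and the function names are made up for this example.

```r
stop_words <- c("a", "an", "the", "of", "in", "on", "at", "to", "from",
                "for", "by", "as", "due")

# Turn one name variant into a sorted word multiset, stop words removed.
bag_of_words <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", " ", x)          # 3c: non-alphanumerics to space
  words <- strsplit(trimws(x), " +")[[1]]
  sort(words[!(words %in% stop_words)])    # 3a: drop stop words
}

# 3b (simplified): hyphen replaced by a space, and hyphen removed.
hyphen_variants <- function(x) {
  unique(c(gsub("-", " ", x), gsub("-", "", x)))
}

# TRUE if any source variant and any target variant give the same bag.
text_match <- function(source, target) {
  src <- lapply(hyphen_variants(source), bag_of_words)
  tgt <- lapply(hyphen_variants(target), bag_of_words)
  any(vapply(src, function(s) {
    any(vapply(tgt, function(b) identical(s, b), logical(1)))
  }, logical(1)))
}

text_match("Non-toxic", "Nontoxic")
# TRUE
```

Sorting the words makes the comparison independent of word order, while keeping duplicates preserves the multiset semantics.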

Examples:

"Iodine-deficiency" ↔ "Iodine deficiency"

"Non-toxic" ↔ "Nontoxic"

Step 4 — LLM Classification (GPT-o3)

Label taxonomy

Label	Definition
equal	Same clinical condition, or one is the only available subtype. e.g. "Clonic hemifacial spasm" = "Hemifacial spasm"
uphill	Target more general — source has features absent in target.
downhill	Target more specific — target adds features absent in source.
up&downhill	Both: source drops AND target adds clinically meaningful features.
incorrect	Different conditions — different disorder family or no semantic overlap.

Key interpretation rules

① Adjectives hypertensive, diabetic ≡ "due to" / "caused by"

② ~itis = inflammation of; ~osis = disorder of

③ Synonyms: kidney = renal; word order doesn't change meaning

④ Preserve: body site, laterality, severity, stage, acuity, etiology

⑤ Use incorrect ONLY for clinically different diseases

System prompt (GPT-o3, single-turn)

You are a clinical terminology expert working
with OHDSI standardized vocabularies.
Your task is NOT to create mappings.
Your task is to CLASSIFY an existing
ICD10CM → SNOMED mapping.

Return exactly one label:
equal | uphill | downhill | incorrect | up&downhill

Decision process:
  Same condition?          → equal
  Target drops detail?     → uphill
  Target adds detail?      → downhill
  Both adds and drops?     → up&downhill
  Different condition?     → incorrect

Output: single JSON object only.
{ "label": "...", "confidence": "high|moderate|low" }
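
A reply in this format can be checked before it enters the pipeline. The base-R sketch below is an illustration under stated assumptions: `extract_field` is a deliberately minimal, regex-based helper invented for this example (a JSON library would be more robust), and rejecting unexpected values with an error is an assumption about the desired behaviour.

```r
valid_labels <- c("equal", "uphill", "downhill", "incorrect", "up&downhill")
valid_conf   <- c("high", "moderate", "low")

# Minimal field extractor for a flat JSON object with string values.
# Hypothetical helper; not a general JSON parser.
extract_field <- function(json, field) {
  pattern <- sprintf('"%s"\\s*:\\s*"([^"]*)"', field)
  m <- regmatches(json, regexec(pattern, json, perl = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Return label and confidence, or stop if either is outside the allowed set.
parse_classification <- function(reply) {
  label <- extract_field(reply, "label")
  conf  <- extract_field(reply, "confidence")
  if (!(label %in% valid_labels) || !(conf %in% valid_conf)) {
    stop("unexpected model reply: ", reply)
  }
  list(label = label, confidence = conf)
}

parse_classification('{ "label": "uphill", "confidence": "moderate" }')
```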
      

Step 4c — Caching & skip logic

Results cached by (source, target) pair — identical pairs classified once.

Once any target is labelled equal, all remaining candidates are skipped.
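
The two rules above can be sketched in R with an environment as the cache. This is an illustration only: `classify_fn` stands in for the LLM call and is hypothetical, as are the function name and the choice to leave skipped candidates as `NA`.

```r
# Classify the candidate targets of one source concept.
# `classify_fn(source, target)` is a hypothetical stand-in for the LLM call.
# `cache` is an environment shared across calls, keyed by (source, target).
classify_candidates <- function(source, targets, classify_fn, cache) {
  labels <- setNames(rep(NA_character_, length(targets)), targets)
  for (target in targets) {
    key <- paste(source, target, sep = "\t")
    if (is.null(cache[[key]])) {
      cache[[key]] <- classify_fn(source, target)  # classified once per pair
    }
    labels[[target]] <- cache[[key]]
    if (labels[[target]] == "equal") break         # remaining candidates skipped
  }
  labels
}
```

Passing the same `cache` environment to every call is what lets identical pairs be reused across source concepts.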

Step 5 — Validation & Step 6 — Output Partitioning

Pairs with low or moderate confidence undergo up to two additional LLM calls.

1

Initial Classification (Step 4)

GPT-o3 assigns label + high/moderate/low confidence. High → done. Low/moderate → Call 2.

2

Adversarial Re-check

A previous classification assigned <label>
with <moderate/low> confidence.
Carefully reconsider. Apply all rules strictly.
            

3

Neutral Tiebreaker (only if calls 1 and 2 disagree)

Original prompt reissued. Majority vote across 3 calls decides final label.

Calls agree → LLM (validated)

Disagree → Call 3 → LLM (tiebreak)

All 3 differ → needs_review
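
The decision flow of Step 5 can be written as a small R function. This is a sketch under assumptions: the function name and arguments are made up for this example, and it takes the labels of the calls as inputs rather than issuing the calls itself.

```r
# Final label and provenance tag from the labels of up to three calls.
# call1_confidence is the confidence reported by the initial classification.
resolve_label <- function(call1, call1_confidence, call2 = NULL, call3 = NULL) {
  if (call1_confidence == "high") {
    return(list(label = call1, method = "LLM"))
  }
  if (identical(call1, call2)) {
    return(list(label = call1, method = "LLM (validated)"))
  }
  votes <- table(c(call1, call2, call3))
  if (max(votes) >= 2) {                 # two of three calls agree
    return(list(label = names(votes)[which.max(votes)],
                method = "LLM (tiebreak)"))
  }
  list(label = NA_character_, method = "needs_review")
}

resolve_label("uphill", "moderate", "incorrect", "incorrect")
# label "incorrect", method "LLM (tiebreak)"
```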

Step 6 — Output Partitioning

mappingCons — Consistent set

All non-incorrect targets agree, or ≥1 target labelled equal. Ready to use in OHDSI.

mappingInc — Inconsistent set

Concepts with conflicting target labels. Require manual review before use.
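
A minimal R sketch of this partitioning rule follows. The data frame layout (columns `source_code` and `label`) is an assumption made for illustration, and a source whose targets are all incorrect is treated as consistent here, which the source does not state explicitly.

```r
# TRUE if the target labels of one source concept count as consistent:
# at least one 'equal', or all non-incorrect labels agree.
is_consistent <- function(labels) {
  if ("equal" %in% labels) return(TRUE)
  length(unique(labels[labels != "incorrect"])) <= 1
}

# `mappings` is an illustrative data frame with columns source_code and label.
partition_mappings <- function(mappings) {
  ok <- tapply(mappings$label, mappings$source_code, is_consistent)
  consistent <- mappings$source_code %in% names(ok)[ok]
  list(mappingCons = mappings[consistent, ],
       mappingInc  = mappings[!consistent, ])
}
```

For example, a source with targets labelled uphill and downhill lands in mappingInc, while one with uphill and uphill lands in mappingCons.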

F chapter summary

97.6% automation rate
6 flagged for manual review
7 incorrect mappings detected
17 expert corrections (substance use)

Results — ICD-10-CM F Chapter (715 mappings)

Distribution of classification labels across 715 ICD-10-CM F chapter 1:1 mappings

Label	Share	Count
equal	71.3%	510
uphill	21.5%	154
downhill	6.3%	45
up&down	0.6%	4
incorrect	1.0%	7
manual	0.8%	6

Expert review: 17 labels corrected — model confused harmful use vs abuse vs disorder related to use of psychoactive substance.
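
Each reported share is the label count divided by the 715 mappings, rounded to one decimal place. The short R check below reproduces the reported percentages from the reported counts; the variable names are illustrative.

```r
# Counts per label as reported for the F chapter.
counts <- c(equal = 510, uphill = 154, downhill = 45,
            "up&down" = 4, incorrect = 7, manual = 6)

# Share of the 715 mappings, in percent.
shares <- round(100 * counts / 715, 1)
shares
```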

71.3% of mappings are semantically equal and immediately usable
28.7% have a semantic mismatch and are candidates for replacement
97.6% automation rate, with only 6 flagged for manual review

Quality check method

① Re-running the model several times on the same subset — expecting consistent results.

② Review by a medical professional across all classified pairs.

Results — Full Output Examples (G Chapter Subset)

Code	Original source name	Normalised name	SNOMED target	Call 1 label	Call 1 conf	Call 2 label	Call 2 conf	Call 3 label	Call 3 conf	Final label	Final conf	Method
G02	Meningitis in other infectious and parasitic diseases classified elsewhere	Meningitis in infectious or parasitic diseases	Infective meningitis	equal	mod	equal	high	—	—	equal	high	LLM (validated)
G04.00	Acute disseminated encephalitis and encephalomyelitis, unspecified	Acute disseminated encephalitis or encephalomyelitis	Acute disseminated encephalomyelitis	—	—	—	—	—	—	equal	high	LLM
G04.01	Postinfectious acute disseminated encephalitis and encephalomyelitis (postinfectious ADEM)	Postinfectious acute disseminated encephalitis or encephalomyelitis (postinfectious ADEM)	Acute disseminated encephalomyelitis following infectious disease	downhill	mod	downhill	mod	—	—	downhill	mod	LLM (validated)
G05.3	Encephalitis and encephalomyelitis in diseases classified elsewhere	Encephalitis or encephalomyelitis	Encephalitis, myelitis and encephalomyelitis	—	—	—	—	—	—	uphill	high	LLM
G12.1	Other inherited spinal muscular atrophy	Inherited spinal muscular atrophy	Spinal muscular atrophy	uphill	mod	equal	mod	uphill	high	equal	high	LLM (tiebreak)
G31.09	Other frontotemporal neurocognitive disorder	Frontotemporal neurocognitive disorder	Frontotemporal dementia	—	—	—	—	—	—	equal	high	LLM
G31.83	Neurocognitive disorder with Lewy bodies	Neurocognitive disorder with Lewy bodies	Senile dementia of the Lewy body type	downhill	mod	equal	mod	equal	high	equal	high	LLM (tiebreak)
G31.84	Mild cognitive impairment of uncertain or unknown etiology	Mild cognitive impairment of uncertain or unknown etiology	Minimal cognitive impairment	equal	mod	equal	mod	—	—	equal	mod	LLM (validated)
G40.219	Localization-related (focal) symptomatic epilepsy with complex partial seizures, intractable, without status epilepticus	Localization-related epilepsy or epileptic syndromes with complex partial seizures, intractable	Focal epilepsy	—	—	—	—	—	—	uphill	high	LLM
G40.5	Epileptic seizures related to external causes	Epileptic seizures related to external causes	Epileptic seizure	—	—	—	—	—	—	uphill	high	LLM
G40.8	Other epilepsy and recurrent seizures	Epilepsy or recurrent seizures	Epilepsy	downhill	mod	equal	mod	equal	mod	equal	mod	LLM (tiebreak)
G40.824	Epileptic spasms, intractable, without status epilepticus	Epileptic spasms, intractable	Refractory infantile spasms	—	—	—	—	—	—	downhill	high	LLM
G04.3	Acute necrotizing hemorrhagic encephalopathy	Acute necrotizing hemorrhagic encephalopathy	Acute hemorrhagic leukoencephalitis	—	—	—	—	—	—	incorrect	high	LLM
G40.11	Localization-related (focal) symptomatic epilepsy with simple partial seizures, intractable	Localization-related epilepsy or epileptic syndromes with simple partial seizures, intractable	Focal onset aware epileptic seizure	—	—	—	—	—	—	incorrect	high	LLM
G40.A0	Absence epileptic syndrome, not intractable	Absence epileptic syndrome, not intractable	Absence seizure	uphill	mod	incorrect	high	incorrect	mod	incorrect	high	LLM (tiebreak)
G43.B0	Ophthalmoplegic migraine, not intractable	Ophthalmoplegic migraine, not intractable	Recurrent painful ophthalmoplegic neuropathy	uphill	mod	uphill	high	—	—	uphill	high	LLM (validated)

Conclusion & Next Steps

Key findings

✅

Automation is feasible. 97.6% of the F chapter classified automatically — only 6 mappings required manual review.

📈

Most mappings are equal. 71.3% (510/715) of ICD-10-CM F chapter 1:1 mappings are semantically equivalent to SNOMED CT targets.

⚠️

Substance use is hard. The model confused harmful use, abuse, and disorder related to use of psychoactive substance; 17 labels were corrected after expert review.

🔁

Reproducibility confirmed. Re-running the model on the same subset produced consistent results.

Next steps

Extend beyond the F chapter to full ICD-10-CM vocabulary
Replace non-equal mappings with semantically equivalent SNOMED CT targets
Define policy for cases where no equivalent target exists
Apply pipeline to multi-target (1:N) mappings
Evaluate impact on OHDSI cohort definitions
Explore ICD-10 international and other source vocabularies

Scan to open repository

github.com/dimshitc/OHDSImappingClassifier

