
ETL Instructions for Mapping ICDO to SNOMED

Eduard Korchmar edited this page Jan 8, 2020 · 6 revisions

COMPLETE ICD-O SOURCE CODES

Cancer diagnoses are usually represented by a combination of ICD-O-3 histology and topography codes. To map this combination to SNOMED, follow these steps:

  1. Transform diagnosis SOURCE VALUE
  • Histology code. In the source, it is normally formatted like this: 8140/3, where 8140 is the histology type and 3 is the tumor behavior. If histology type and behavior are stored separately, concatenate them with a slash to get a single histology code.
  • Topography code. In the source, it is normally formatted like this: C15.3. Note the dot before the fourth character. If the source doesn't have the dot, insert it after the third character: C153 -> C15.3. If the source code contains only 3 characters, add '.9' to get the code for the "unspecified part of" subcategory: C50 -> C50.9.
  • Source value. Concatenate the histology code and the topography code with a hyphen: 8140/3-C15.3. This value will be stored in the CONDITION_OCCURRENCE.CONDITION_SOURCE_VALUE field.
  2. Extract value of diagnosis SOURCE_CONCEPT_ID. The concept ID for the combined histology/topography code is stored in the CONCEPT table. The following SQL shows how to extract its value for the above example:
    SELECT CONCEPT_ID
    FROM CONCEPT
    WHERE CONCEPT_CODE = '8140/3-C15.3' --Adenocarcinoma, NOS, of upper third of esophagus
    AND VOCABULARY_ID = 'ICDO3';
    
    The resulting value 44501519 will be stored in the CONDITION_OCCURRENCE.CONDITION_SOURCE_CONCEPT_ID field and will be used to map to a standard SNOMED concept or to itself (next step).
  3. Extract value of STANDARD CONCEPT ID. The source concept ID of the combined histology/topography code is mapped to itself or to a standard concept ID in the CONCEPT_RELATIONSHIP table. The following SQL shows how to extract its value for the above example:
    SELECT CONCEPT_ID_2
    FROM CONCEPT_RELATIONSHIP
    WHERE CONCEPT_ID_1 = 44501519
    AND RELATIONSHIP_ID = 'Maps to';
    
    The resulting value 36715848 will be stored in the CONDITION_OCCURRENCE.CONDITION_CONCEPT_ID field and/or the EPISODE.EPISODE_OBJECT_CONCEPT_ID field.
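
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official OHDSI implementation: the function names (normalize_histology, normalize_topography, build_source_value, lookup_concept_ids) are hypothetical, and the lookup assumes a Python DB-API connection to a database that holds the CONCEPT and CONCEPT_RELATIONSHIP tables. The `?` parameter placeholder is the sqlite3 style; other drivers may use a different one.

```python
import re


def normalize_histology(histology_type, behavior=None):
    """Return a histology code such as '8140/3'."""
    code = str(histology_type).strip()
    if behavior is not None:
        # Type and behavior stored separately: join them with a slash.
        code = f"{code}/{str(behavior).strip()}"
    if not re.fullmatch(r"\d{4}/\d", code):
        raise ValueError(f"unexpected histology code: {code!r}")
    return code


def normalize_topography(topography):
    """Return a topography code such as 'C15.3'."""
    code = str(topography).strip().upper()
    if re.fullmatch(r"C\d{2}", code):
        # Only 3 characters: use the "unspecified part of" subcategory.
        code += ".9"
    elif re.fullmatch(r"C\d{3}", code):
        # Dot is missing: insert it after the third character.
        code = f"{code[:3]}.{code[3]}"
    if not re.fullmatch(r"C\d{2}\.\d", code):
        raise ValueError(f"unexpected topography code: {code!r}")
    return code


def build_source_value(histology_type, topography, behavior=None):
    """Join histology and topography with a hyphen (CONDITION_SOURCE_VALUE)."""
    hist = normalize_histology(histology_type, behavior)
    topo = normalize_topography(topography)
    return f"{hist}-{topo}"


def lookup_concept_ids(conn, source_value):
    """Return (source_concept_id, standard_concept_id); None where not found."""
    cur = conn.cursor()
    cur.execute(
        "SELECT concept_id FROM concept "
        "WHERE concept_code = ? AND vocabulary_id = 'ICDO3'",
        (source_value,),
    )
    row = cur.fetchone()
    if row is None:
        return None, None
    source_concept_id = row[0]
    cur.execute(
        "SELECT concept_id_2 FROM concept_relationship "
        "WHERE concept_id_1 = ? AND relationship_id = 'Maps to'",
        (source_concept_id,),
    )
    row = cur.fetchone()
    return source_concept_id, (row[0] if row else None)
```

For the example used on this page, build_source_value("8140", "C153", behavior=3) yields 8140/3-C15.3, and the lookup is expected to return the pair of IDs shown in steps 2 and 3 when run against a populated vocabulary.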

INCOMPLETE ICD-O SOURCE CODES

When the source data are incomplete, apply the following approach.

  1. Tumor behavior is not known. Use 1 (uncertain behavior) to make your code complete: 8070 -> 8070/1
  2. Topography is unknown. If you have an ICD-10 code for this diagnosis, use the mappings from this file https://seer.cancer.gov/tools/conversion/ICD03toICD9CM-ICD10-ICD10CM.xls (last 3 tabs of this file) to obtain the topography. Note that if you have a long ICD-10-CM code, you need to truncate it to its first 5 characters (including the dot): C50.211 -> C50.2. When a patient has several cancer diagnoses, use the ICD-10 code whose date is closest to the ICD-O histology date.
  3. Either topography or histology is unretrievable. Use the string value 'NULL' (spelled out) instead of the code. For instance, a neoplasm of the endometrium without specified morphology will have the code 'NULL-C54.1', and a neuronevus without specified site can be coded as '8725/0-NULL'.
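
The fallback rules above can be sketched in Python as follows. This is a rough illustration only: all function names are hypothetical, diagnoses are assumed to arrive as (ICD-10 code, date) pairs, and the conversion from ICD-10 to ICD-O topography through the SEER file is not implemented here. Treating the case where both parts are missing as an error is an assumption, since the rules above cover only one missing part.

```python
from datetime import date


def complete_histology(histology):
    """Append behavior 1 (uncertain behavior) when no behavior is given."""
    code = str(histology).strip()
    return code if "/" in code else f"{code}/1"


def truncate_icd10(icd10_code):
    """Keep the first 5 characters of an ICD-10-CM code, dot included."""
    return icd10_code.strip()[:5]


def closest_icd10(diagnoses, histology_date):
    """Pick the ICD-10 code whose date is closest to the histology date.

    diagnoses: iterable of (icd10_code, diagnosis_date) pairs (assumed shape).
    """
    code, _ = min(diagnoses, key=lambda d: abs(d[1] - histology_date))
    return code


def build_incomplete_source_value(histology=None, topography=None):
    """Use the literal string 'NULL' for the part that cannot be retrieved."""
    if not histology and not topography:
        # Assumption: at least one part must be known.
        raise ValueError("neither histology nor topography is available")
    return f"{histology or 'NULL'}-{topography or 'NULL'}"


if __name__ == "__main__":
    print(complete_histology("8070"))
    print(truncate_icd10("C50.211"))
    print(build_incomplete_source_value(topography="C54.1"))
```

The truncated ICD-10 code still has to be looked up in the SEER conversion file to obtain the ICD-O topography before the source value is built.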

REFERENCES

Information about ICDO3 vocabulary is here: http://www.iacr.com.fr/index.php?option=com_content&view=category&layout=blog&id=100&Itemid=577

Information about our approach to mapping is here: http://www.ohdsi.org/web/wiki/lib/exe/fetch.php?media=documentation:oncology:poster2018-improvement_of_cancer_diagnosis_representation_in_omop_cdm3_1_.pdf

Detailed information about ICDO3 implementation in OMOP CDM is here: https://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:icdo3
