The following analysis was performed by the VA Informatics and Computing Infrastructure as part of the QA process for the VA’s Corporate Data Warehouse transformation to OMOP. This document was created using R Markdown, the VA OMOP data (which uses OMOP Version 5.2), and an MS-SQL environment. The SQL script used to create the tables used within this document is called “SQL Process for finding gender relatives with discordance.sql”.

Methods for evaluating administrative code relationships through SNOMED hierarchy in OMOP

Within the OMOP common data model, standard vocabularies are used within tables to define clinical concepts. Within the CONDITION domain, the SNOMED CT coding vocabulary is used to describe all clinical concepts. The SNOMED CT hierarchy, exposed through OMOP’s CONCEPT_ANCESTOR table, makes it simple to determine the relationships between CONDITION concepts and their hierarchical ancestors and descendants.

These relationships made it possible to identify potential gender siblings within the SNOMED hierarchy, that is, two different SNOMED codes that represent the same clinical concept but for opposite genders.

To create these gender siblings, the concept table was first limited to SNOMED concepts where concept_name like ‘%male%’ or ‘%female%’. Two methods were then applied to the remaining gender-specific SNOMED concepts.
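One caveat with the name filter described above is that the pattern ‘%male%’ also matches every name containing ‘female’, because "female" contains "male" as a substring. The following is a minimal Python sketch, not the VA SQL script, of how a concept name might be assigned to a gender using whole-word matching. The function name `concept_gender` and the return labels are hypothetical.

```python
import re

# Whole-word patterns: \b keeps "male" from matching inside "female".
FEMALE_RE = re.compile(r"\bfemale\b", re.IGNORECASE)
MALE_RE = re.compile(r"\bmale\b", re.IGNORECASE)

def concept_gender(concept_name):
    """Return 'female', 'male', 'both', or None for a concept name."""
    has_female = bool(FEMALE_RE.search(concept_name))
    has_male = bool(MALE_RE.search(concept_name))
    if has_female and has_male:
        return "both"
    if has_female:
        return "female"
    if has_male:
        return "male"
    return None
```

A name such as "Male pattern alopecia" would still be labelled 'male' by this sketch, so a word-based filter of this kind cannot by itself tell whether a code is truly gender-specific.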

Method 1. Concept_Name Match

If the descriptions of two SNOMED codes matched on all characters except for the words ‘male’ and ‘female’, the codes were considered gender siblings. The following is an example:

SELECT f_concept_name AS Female_SNOMED_sibling,
       M_Concept_name AS Male_SNOMED_sibling
FROM snogen.SNOMED_GenderRelatives2
WHERE f_concept_id = 133711

1 record
Female_SNOMED_sibling | Male_SNOMED_sibling
Overlapping malignant neoplasm of female breast | Overlapping malignant neoplasm of male breast
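The name-matching rule of Method 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the logic of the VA SQL script: the function names `neutral_name` and `name_match_pairs` are hypothetical, names are compared case-insensitively, and the inputs are assumed to be dictionaries mapping concept_id to concept_name.

```python
import re

def neutral_name(concept_name):
    """Replace the whole words 'female' and 'male' with a placeholder."""
    return re.sub(r"\b(female|male)\b", "<sex>", concept_name.lower())

def name_match_pairs(female_concepts, male_concepts):
    """Pair female and male concepts whose names differ only in the sex word.

    Each argument maps concept_id -> concept_name (assumed input shape).
    """
    # Index male concepts by their sex-neutral name for constant-time lookup.
    males_by_key = {}
    for m_id, m_name in male_concepts.items():
        males_by_key.setdefault(neutral_name(m_name), []).append(m_id)
    pairs = []
    for f_id, f_name in female_concepts.items():
        for m_id in males_by_key.get(neutral_name(f_name), []):
            pairs.append((f_id, m_id))
    return pairs
```

For example, passing `{133711: "Overlapping malignant neoplasm of female breast"}` together with a male dictionary containing "Overlapping malignant neoplasm of male breast" under a placeholder ID would yield one pair. The male concept ID is not given in this document, so any ID used for it is hypothetical.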

Method 2. Ancestry Match

The second method determined whether two codes could be gender siblings based on matching SNOMED ancestries. If two codes had exactly the same family tree (that is, both codes had exactly the same parent codes), but one code had the word ‘female’ in its description while the other contained the word ‘male’, then they were also considered gender siblings.

Figure 1. Gender Sibling According to SNOMED Ancestry


SELECT f_concept_id, f_concept_name, F_concept_code,
       m_Concept_ID, M_Concept_name, M_concept_code,
       f_ancestorct, m_ancestorct,
       AncestorTreeMatch, NameMatch, AncestorCtMatch
FROM snogen.SNOMED_GenderRelatives2
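The ancestry comparison of Method 2 can be illustrated with a short Python sketch. It assumes the ancestors of each concept have already been collected (for example, from CONCEPT_ANCESTOR) into a dictionary mapping concept_id to its ancestor concept_ids, and that each set excludes the concept itself, since two different concepts could otherwise never have identical sets. The function name `ancestry_match_pairs` and the input shape are hypothetical and are not taken from the VA SQL script.

```python
from collections import defaultdict

def ancestry_match_pairs(female_ancestors, male_ancestors):
    """Pair female and male concepts that share an identical ancestor set.

    Each argument maps concept_id -> iterable of ancestor concept_ids,
    with the concept itself excluded (assumed input shape).
    """
    # Group male concepts by their full ancestor set.
    males_by_tree = defaultdict(list)
    for m_id, ancestors in male_ancestors.items():
        males_by_tree[frozenset(ancestors)].append(m_id)
    pairs = []
    for f_id, ancestors in female_ancestors.items():
        for m_id in males_by_tree.get(frozenset(ancestors), []):
            pairs.append((f_id, m_id))
    return pairs
```

Using a `frozenset` as the dictionary key makes the comparison independent of ancestor order and requires the two sets to be identical, which is stricter than comparing ancestor counts alone.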

Once gender sibling pairs were created through these two methods, the siblings were separated by gender. Next, the male and female concept lists were joined to the OMOP CONDITION_OCCURRENCE table. All resulting rows were then joined to the OMOP PERSON table to ascertain the gender of the patients receiving those CONDITION diagnoses. Finally, a summary table was created with the resulting gender discrepancy for each gender-specific code.

SELECT CONDITION_CONCEPT_ID, concept_name, ConceptGender, Pat_ct,
       FemaleInstanceCt, MaleInstanceCt, DiscordantInstancepct
FROM snogen.discordance_summary
ORDER BY DiscordantInstancepct DESC
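The join-and-summarise step can be sketched in Python as follows. This is not the VA SQL script. In particular, the definition of the discordant percentage used here (opposite-gender instances divided by all instances, times 100) is an assumption, since the document does not state how DiscordantInstancepct is calculated. The function name, the input shapes, and the output keys are hypothetical, and the patient count (Pat_ct) is omitted for brevity.

```python
from collections import defaultdict

def discordance_summary(occurrences, person_gender, concept_gender):
    """Summarise patient-gender counts per gender-specific concept.

    occurrences: iterable of (person_id, condition_concept_id) rows.
    person_gender: person_id -> 'female' or 'male'.
    concept_gender: condition_concept_id -> 'female' or 'male'.
    """
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for person_id, concept_id in occurrences:
        gender = person_gender.get(person_id)
        # Skip rows for other concepts or for patients without a usable gender.
        if concept_id not in concept_gender or gender not in ("female", "male"):
            continue
        counts[concept_id][gender] += 1
    summary = []
    for concept_id, c in counts.items():
        total = c["female"] + c["male"]
        other = "male" if concept_gender[concept_id] == "female" else "female"
        summary.append({
            "condition_concept_id": concept_id,
            "concept_gender": concept_gender[concept_id],
            "female_instance_ct": c["female"],
            "male_instance_ct": c["male"],
            # Assumed definition: share of instances on opposite-gender patients.
            "discordant_instance_pct": 100.0 * c[other] / total,
        })
    summary.sort(key=lambda r: r["discordant_instance_pct"], reverse=True)
    return summary
```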

Results

As shown in the tables above, there were 481 gender sibling pairs in total, of which only 50 concepts appeared in the CONDITION_OCCURRENCE table. This is because some CONDITION domain SNOMED codes have never been used as a patient diagnosis in our records, and because some SNOMED codes belong to other domains (e.g., Procedure, Observation, Measurement).

Of those that existed, 24 were female codes and 26 were male codes.

## 
## female   male 
##     24     26

As seen below, of those concepts with more than 100 gender-discrepant instances, most are female codes (12 vs. 5), with a large number of instances (that is, diagnoses) being assigned incorrectly to male patients.

SELECT CONDITION_CONCEPT_ID, concept_name, ConceptGender, Pat_ct,
       FemaleInstanceCt, MaleInstanceCt, DiscordantInstancepct
FROM snogen.discordance_summary
WHERE (ConceptGender = 'female' AND MaleInstanceCt > 100)
   OR (ConceptGender = 'male' AND FemaleInstanceCt > 100)
ORDER BY DiscordantInstancepct DESC
## 
## female   male 
##     12      5
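The threshold filter and tally above can be sketched in Python as follows. It assumes the summary rows are available as dictionaries keyed by the column names from the query. The function name `high_discordance_counts` and the `threshold` parameter are hypothetical.

```python
from collections import Counter

def high_discordance_counts(summary_rows, threshold=100):
    """Tally concepts by gender where opposite-gender instances exceed threshold."""
    tally = Counter()
    for row in summary_rows:
        if row["ConceptGender"] == "female":
            discordant = row["MaleInstanceCt"]
        elif row["ConceptGender"] == "male":
            discordant = row["FemaleInstanceCt"]
        else:
            continue  # ignore rows without a recognised concept gender
        if discordant > threshold:
            tally[row["ConceptGender"]] += 1
    return tally
```

The comparison is strict (`>`), mirroring the `> 100` conditions in the query, so a concept with exactly 100 discordant instances is not counted.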

Limitations/Conclusions

It is, of course, possible that these two methods leave out many similar SNOMED codes that could be considered gender siblings under other definitions. ‘Male’ and ‘Female’ are also not the only gender-specific words in the SNOMED descriptions. Further, it is possible, although unlikely, for a code to contain ‘Male’ or ‘Female’ and not be gender-specific (e.g., Male Pattern Alopecia). Although it would be interesting to include all related but opposite-gender codes in an analysis of this kind, it made sense for this analysis to use an easily replicable method that is very specific. In subsequent analyses, a more sensitive approach might be helpful for understanding the breadth of the problem.

This process only takes into account the CONDITION_OCCURRENCE concepts, but gender-specific codes of this type may also exist within the PROCEDURE, OBSERVATION, DEVICE, and other OMOP domains. To evaluate all misclassification of gender-specific SNOMED codes, it would be necessary to look at every OMOP domain where these codes exist. This method and similar methods could also be used to create data plausibility and quality checks.