I would build the logic slightly differently:

1. Concepts.

- We have one authoritative source: SNOMED International in combination with SNOMED UK. Other components might follow later (DM+D, other country-specific versions).
- We get a stream of concepts from them:

  1. Attributes of existing concepts are overwritten by the new concepts
  2. New concepts are added
  3. Missing concepts are deprecated
  4. Explicitly deprecated (inactivated) concepts are deprecated
  5. We do domain assignments for all of them. This is done by building the entire hierarchical tree and defining “peaks”, of which all children inherit their domain.
  6. We define standard_concepts depending on their deprecation status and domain

- We get a stream of concept-to-concept relationships:

  1. New ones get added
  2. Missing ones: if the concepts are deprecated, we leave them alone; if the concepts are active, we deprecate the relationship
  3. Explicitly deprecated ones are deprecated

- We get a stream of update (inactive to active) relationships (only one may exist per deprecated concept):

  1. New ones get added
  2. Existing identical ones are left alone
  3. An existing update relationship to a different concept is deprecated, and the new one is added
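The concept, relationship, and update streams described above can be sketched as in-memory merges. This is a minimal Python sketch under several assumptions the proposal does not state: everything is kept in plain dictionaries, a relationship is identified by a hypothetical `(concept_code_1, relationship_id, concept_code_2)` triple, "missing" means absent from the new release, a missing relationship is deprecated only when both of its concepts are still active, and domain peaks are supplied as a hand-maintained `peaks` mapping. The hierarchy is walked breadth-first so the closest peak wins (ties are broken by parent code order), and "standard" is reduced to "active and has a domain", whereas the real rule would also depend on the domain. All function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Concept:
    code: str
    name: str
    active: bool = True            # False means deprecated
    domain: Optional[str] = None
    standard: bool = False


def merge_concepts(stored: Dict[str, Concept], release: Dict[str, Concept]) -> None:
    """Apply one concept release to the stored concepts, in place."""
    for code, new in release.items():
        if code not in stored:
            stored[code] = new                  # 2. new concepts are added
        else:
            stored[code].name = new.name        # 1. attributes are overwritten
            stored[code].active = new.active    # 4. the release's active flag wins
    for code, old in stored.items():
        if code not in release:
            old.active = False                  # 3. missing concepts are deprecated


def assign_domains(stored: Dict[str, Concept], parents: Dict[str, Set[str]],
                   peaks: Dict[str, str]) -> None:
    """5. Inherit the domain of the nearest peak; 6. derive the standard flag."""
    for code, concept in stored.items():
        concept.domain = None
        queue, seen = deque([code]), {code}
        while queue:
            current = queue.popleft()
            if current in peaks:                # breadth-first: the closest peak wins
                concept.domain = peaks[current]
                break
            for parent in sorted(parents.get(current, set())):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        # Assumed rule: active concepts that received a domain are standard
        concept.standard = concept.active and concept.domain is not None


Rel = Tuple[str, str, str]  # (concept_code_1, relationship_id, concept_code_2)


def merge_relationships(stored: Dict[Rel, bool], release: Dict[Rel, bool],
                        concepts: Dict[str, Concept]) -> None:
    """Values are True for active and False for deprecated relationships."""
    for rel, active in release.items():
        stored[rel] = active                    # new ones added, explicit deprecation applied
    for rel in stored:
        if rel not in release:
            ends = (concepts.get(rel[0]), concepts.get(rel[2]))
            if all(c is not None and c.active for c in ends):
                stored[rel] = False             # missing between active concepts: deprecate


def merge_updates(rows: List[list], release: Dict[str, str]) -> None:
    """rows are [deprecated_code, replacement_code, active]; one active row per code."""
    for old_code, target in release.items():
        current = [r for r in rows if r[0] == old_code and r[2]]
        if any(r[1] == target for r in current):
            continue                            # identical update: left alone
        for r in current:
            r[2] = False                        # update to a different concept: deprecated
        rows.append([old_code, target, True])   # new one added
```

Calling the functions in stream order (concepts, then domains, then relationships and updates) matters, because the relationship rule looks at the concepts' deprecation status after the concept release has been applied.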

Makes sense?

I am not sure we need UMLS for them. UMLS is really only a reformatting of SNOMED, and there isn't much going on, unless you found something in http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/SNOMEDCT_US. Take a look.

Import data from the SNOMED vocabulary.

The source SNOMED vocabulary can be obtained from SNOMED. SNOMED is also included in UMLS. We suggest using both of these resources in the import process.

A local copy of the vocabulary will be used to extract the concepts, and the UMLS web API will be used for additional concept analysis. The advantages of this approach are:

  • If the SNOMED vocabulary has been updated, we can process it without waiting for the next UMLS update.
  • Less load on the web API.
  • We can still use UMLS knowledge about the SNOMED vocabulary, especially for cross-vocabulary relations.
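The text does not say which format the local copy is in. Assuming it is a SNOMED RF2 release, concepts can be extracted from the tab-separated concept file (columns `id`, `effectiveTime`, `active`, `moduleId`, `definitionStatusId`) with the Python standard library alone. This sketch keeps the most recent row per `id`, which matters for Full and Delta files where the same concept appears several times; the function name is hypothetical and the sample rows are illustrative, not taken from a real release.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_rf2_concepts(stream: TextIO) -> Dict[str, bool]:
    """Return {concept id: is_active} from an RF2 concept file (tab-separated, header row)."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    latest: Dict[str, Tuple[str, bool]] = {}
    for row in reader:
        concept_id, effective = row["id"], row["effectiveTime"]
        # effectiveTime is YYYYMMDD, so string comparison orders it by date
        if concept_id not in latest or effective > latest[concept_id][0]:
            latest[concept_id] = (effective, row["active"] == "1")
    return {cid: active for cid, (_, active) in latest.items()}


# Illustrative rows only: one concept that was later inactivated, one still active.
sample = io.StringIO(
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    "1000001\t20020131\t1\t900000000000207008\t900000000000074008\n"
    "1000001\t20150131\t0\t900000000000207008\t900000000000074008\n"
    "1000002\t20020131\t1\t900000000000207008\t900000000000074008\n"
)
print(read_rf2_concepts(sample))  # {'1000001': False, '1000002': True}
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object in place of the `StringIO` sample.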

We'll start with the basic import process, which will give us more insight into the process itself.

The import process

Each concept in the source dictionary can be:

  • Identified
  • Validated

In the current scope, identification means that OMOP and UMLS already have information about the current Concept. Once a Concept is identified, it can be validated. Each Concept is described by its type, a set of attributes, and its relations with other Concepts. During validation, we compare the Source and UMLS descriptions of the Concept with the OMOP description. If the translation can be performed in both directions without violating data integrity or validity, we can say that the Concept is valid.

Identification

To identify the Concept we must:

  1. Search OMOP by the “CONCEPT_CODE”
  2. Query UMLS through the web API.
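Step 1 can be sketched as a lookup in the OMOP CONCEPT table. This assumes a CDM-style `concept` table with `concept_code` and `vocabulary_id` columns, and `'SNOMED'` as the vocabulary identifier. Because a `concept_code` is only unique within a vocabulary, the lookup filters on both. An in-memory SQLite database with a trimmed-down table and one illustrative row (the `concept_id` is made up) stands in for the real OMOP instance; `find_omop_concept` is a hypothetical name. Step 2 is left out because the UMLS web API calls are not specified here.

```python
import sqlite3
from typing import Optional, Tuple


def find_omop_concept(conn: sqlite3.Connection, concept_code: str,
                      vocabulary_id: str = "SNOMED") -> Optional[Tuple]:
    """Step 1: look a source code up in the OMOP CONCEPT table (None if absent)."""
    cursor = conn.execute(
        "SELECT concept_id, concept_name, domain_id, standard_concept "
        "FROM concept WHERE concept_code = ? AND vocabulary_id = ?",
        (concept_code, vocabulary_id),
    )
    return cursor.fetchone()


# Trimmed-down stand-in for the OMOP CONCEPT table, with an illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, domain_id TEXT, "
    "vocabulary_id TEXT, standard_concept TEXT, concept_code TEXT)"
)
conn.execute(
    "INSERT INTO concept VALUES "
    "(1001, 'Myocardial infarction', 'Condition', 'SNOMED', 'S', '22298006')"
)
print(find_omop_concept(conn, "22298006"))  # (1001, 'Myocardial infarction', 'Condition', 'S')
print(find_omop_concept(conn, "0000000"))   # None
```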

These checks yield the following data:

Records processed X
Records recognized only by OMOP Y
Records recognized only by UMLS Z
Records recognized by OMOP and UMLS N
Records not recognized M

From this table we can say that:

N - stable records, recognized by both systems; they are most likely valid.

Z - missing records that should be added to OMOP. We can use UMLS data for validation purposes.

Y - these records should be inspected. They might be invalid, or we might be importing a newer version of SNOMED than the one included in UMLS.

M - new records that were just added in the new version of SNOMED. We need to validate them using the source description.

Users must also be able to view each of these subsets as a table or export it to a file on request.
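Once the OMOP and UMLS lookups have produced the sets of codes each system recognizes, splitting the processed records into the N, Y, Z and M subsets is a matter of set membership, and X is simply the number of records processed. A minimal sketch, assuming both lookup results are already available as sets and that an export is a one-column CSV of concept codes; `classify` and `export_subset` are hypothetical names.

```python
import csv
import io
from typing import Dict, Iterable, Set, TextIO


def classify(source_codes: Iterable[str], omop_codes: Set[str],
             umls_codes: Set[str]) -> Dict[str, Set[str]]:
    """Split processed source codes into the N, Y, Z and M subsets."""
    subsets: Dict[str, Set[str]] = {"N": set(), "Y": set(), "Z": set(), "M": set()}
    for code in source_codes:
        in_omop, in_umls = code in omop_codes, code in umls_codes
        if in_omop and in_umls:
            subsets["N"].add(code)   # recognized by both: stable
        elif in_omop:
            subsets["Y"].add(code)   # only OMOP: to be inspected
        elif in_umls:
            subsets["Z"].add(code)   # only UMLS: missing from OMOP
        else:
            subsets["M"].add(code)   # neither: new in this SNOMED release
    return subsets


def export_subset(codes: Set[str], out: TextIO) -> None:
    """Write one subset as a one-column CSV, sorted for stable output."""
    writer = csv.writer(out)
    writer.writerow(["concept_code"])
    for code in sorted(codes):
        writer.writerow([code])


source = ["A", "B", "C", "D"]
subsets = classify(source, omop_codes={"A", "B"}, umls_codes={"A", "C"})
print("Records processed", len(source))
for key in ("Y", "Z", "N", "M"):
    print(key, sorted(subsets[key]))
buffer = io.StringIO()
export_subset(subsets["M"], buffer)
```

Passing an open file instead of `io.StringIO` writes the subset to disk; the viewing side (showing a subset as a table) would reuse the same `subsets` dictionary.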

Validation

This process ensures that OMOP describes the Concept exactly as the Source vocabulary does. We can also use the UMLS API for additional checks. First, we should determine the Concept's type. It can be:

  • Domain
  • Relationship
  • Standard Concept
  • Classification Concept
  • Vocabulary

Once the Concept's type has been determined, we can perform the additional checks specific to that type.

Domain

If the current Concept is a Domain, we can verify that:

  • There is a Domain entity connected with this Concept.
  • The string description of the Source Concept is equal to both CONCEPT_NAME and DOMAIN_NAME. If it is not, there must be at least one connected Concept Synonym with an equal CONCEPT_SYNONYM_NAME.
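The two Domain checks can be expressed as a function that returns a list of problems. This sketch assumes the OMOP side has already been loaded into a hypothetical `OmopDomainConcept` record (the concept name, the connected domain's name or `None`, and the synonym names), and it uses exact string equality because the text says "equal"; whether the comparison should ignore case is left open.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OmopDomainConcept:
    """Hypothetical view of the OMOP rows connected to one Domain concept."""
    concept_name: str
    domain_name: Optional[str]                 # None when no DOMAIN row is connected
    synonym_names: List[str] = field(default_factory=list)


def validate_domain(source_name: str, omop: OmopDomainConcept) -> List[str]:
    """Return a list of problems; an empty list means the Domain checks pass."""
    errors: List[str] = []
    if omop.domain_name is None:
        errors.append("no Domain entity connected to the Concept")
    names = [omop.concept_name]
    if omop.domain_name is not None:
        names.append(omop.domain_name)
    # The source name must equal every name, or else match a synonym exactly
    if any(name != source_name for name in names) and source_name not in omop.synonym_names:
        errors.append("name mismatch and no matching CONCEPT_SYNONYM_NAME")
    return errors


print(validate_domain("Condition", OmopDomainConcept("Condition", "Condition")))  # []
```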

Relationship

If the current Source Concept is a Relationship, we should compare it with the Relationship Mapping.

Classification Concept

  • Must belong to the same Domain as the Source Concept.
  • Must be part of the Vocabulary being imported.
  • Must have the same number of siblings as the Source Concept.
  • Must have an equal CONCEPT_NAME or CONCEPT_SYNONYM.
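The Classification Concept checks can be sketched in the same style. The sibling count here is the number of concepts that share at least one parent with the concept. That definition is an assumption, since SNOMED is a polyhierarchy and the text does not define siblings. `ConceptView`, `count_siblings` and `validate_classification` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ConceptView:
    """Hypothetical: the fields the Classification Concept checks need."""
    domain_id: str
    vocabulary_id: str
    sibling_count: int
    names: Set[str]  # CONCEPT_NAME plus any CONCEPT_SYNONYM names


def count_siblings(code: str, parents: Dict[str, Set[str]]) -> int:
    """Count concepts sharing at least one parent with `code` (itself excluded)."""
    own_parents = parents.get(code, set())
    return sum(1 for other, ps in parents.items() if other != code and ps & own_parents)


def validate_classification(source: ConceptView, omop: ConceptView,
                            importing_vocabulary: str) -> List[str]:
    """Return a list of problems; an empty list means all four checks pass."""
    errors: List[str] = []
    if omop.domain_id != source.domain_id:
        errors.append("domain differs")
    if omop.vocabulary_id != importing_vocabulary:
        errors.append("not part of the vocabulary being imported")
    if omop.sibling_count != source.sibling_count:
        errors.append("sibling count differs")
    if not omop.names & source.names:
        errors.append("no equal CONCEPT_NAME or CONCEPT_SYNONYM")
    return errors


parents = {"b": {"a"}, "c": {"a"}, "d": {"a"}}
source = ConceptView("Condition", "SNOMED", count_siblings("b", parents), {"Heart disease"})
omop = ConceptView("Condition", "SNOMED", 2, {"Heart disease", "Cardiac disease"})
print(validate_classification(source, omop, "SNOMED"))  # []
```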

Standard Concept

  • Must belong to the same Domain.
  • Must be part of the same Vocabulary.
  • Must have equal relations within the vocabulary being imported.
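For Standard Concepts, the relation check can be treated as a set comparison. A minimal sketch, assuming relations are hypothetical `(concept_code_1, relationship_id, concept_code_2)` triples and that "within the importing vocabulary" means both ends are codes of that vocabulary, so cross-vocabulary relations on the OMOP side are ignored. `validate_standard` is a hypothetical name.

```python
from typing import List, Set, Tuple

Relation = Tuple[str, str, str]  # (concept_code_1, relationship_id, concept_code_2)


def validate_standard(
    source_domain: str,
    omop_domain: str,
    omop_vocabulary: str,
    importing_vocabulary: str,
    source_relations: Set[Relation],
    omop_relations: Set[Relation],
    vocabulary_codes: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the Standard Concept passes."""
    errors: List[str] = []
    if omop_domain != source_domain:
        errors.append("domain differs")
    if omop_vocabulary != importing_vocabulary:
        errors.append("not part of the vocabulary being imported")

    def internal(rels: Set[Relation]) -> Set[Relation]:
        # Keep only relations whose both ends belong to the importing vocabulary
        return {r for r in rels if r[0] in vocabulary_codes and r[2] in vocabulary_codes}

    src, omop = internal(source_relations), internal(omop_relations)
    if src - omop:
        errors.append(f"missing relations: {sorted(src - omop)}")
    if omop - src:
        errors.append(f"extra relations: {sorted(omop - src)}")
    return errors


codes = {"1", "2", "3"}
print(validate_standard("Condition", "Condition", "SNOMED", "SNOMED",
                        {("2", "Is a", "1")},
                        {("2", "Is a", "1"), ("2", "Maps to", "X9")}, codes))  # []
```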
documentation/athena/import_snomed.txt · Last modified: 2015/04/02 04:30 by cgreich