Import data from UMLS

UMLS (Unified Medical Language System) contains a selection of vocabularies that should be imported into OMOP. The vocabularies are:

  • CPT4
  • HCPCS
  • ICD10
  • ICD9CM
  • LOINC
  • MedDRA
  • RxNorm
  • SNOMED

UMLS consists of three modules: the Metathesaurus (contains data about Concepts), the Semantic Network (contains data about Types and Relations), and SPECIALIST (the NLP module). Detailed info about UMLS: UMLS

The import process can be divided into the following subprocesses:

  • Analyze the Semantic Network components used in the source description.
  • Analyze source inner relations.
  • Analyze source external relations.
  • Load source to OMOP.
  • Validate the cross-vocabulary data integrity.
  • Validate the integrity of the data that is described with the imported vocabulary.
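
The order of these subprocesses matters: the load should only run if the analysis steps succeed. A minimal sketch of such a driver follows, assuming each subprocess is wrapped in a callable that returns True on success. The function name `run_import` and the step names are hypothetical and only illustrate the control flow.

```python
from typing import Callable, List, Tuple

# A step is a (name, callable) pair; the callable returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_import(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, step in steps:
        if not step():
            break
        completed.append(name)
    return completed


# Hypothetical placeholder steps; real ones would do the actual analysis and load.
example_steps: List[Step] = [
    ("semantic network analysis", lambda: True),
    ("inner relations analysis", lambda: False),  # simulated failure
    ("load", lambda: True),
]
print(run_import(example_steps))
```

In this example the load is never reached, because the second analysis step fails.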

Semantic Network analysis

In UMLS, a vocabulary is represented as an aggregate of Source Concepts. Each Source Concept is related to a UMLS Concept and describes it from the vocabulary’s point of view. A Concept is also described in the Semantic Network by Types and Relations: a Type defines the Concept’s meaning, and a Relation describes how different Concepts relate to each other. In OMOP, Concept types and relations are described by Links. When importing a vocabulary, we have to translate the Concept description from Semantic Network terms into OMOP Class & Relationship terms. Before the import starts, we should ensure that all Types and Relations used in the description of the current vocabulary can be translated to OMOP Class & Relationship terms without violating data validity.
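
The pre-import check described above can be sketched as a simple coverage test: collect the Types (or Relations) that the vocabulary uses and report any that have no OMOP counterpart. The function `find_unmapped` and the sample mapping are assumptions made for illustration. The mapping shown is not an official UMLS-to-OMOP translation table.

```python
from typing import Dict, Iterable, Set


def find_unmapped(used: Iterable[str], mapping: Dict[str, str]) -> Set[str]:
    """Return the Semantic Network terms that have no OMOP counterpart."""
    return {term for term in used if term not in mapping}


# Hypothetical sample data: Semantic Type -> OMOP Class (illustrative only).
type_to_class = {
    "Disease or Syndrome": "Clinical Finding",
    "Pharmacologic Substance": "Ingredient",
}
used_types = ["Disease or Syndrome", "Pharmacologic Substance", "Laboratory Procedure"]

unmapped = find_unmapped(used_types, type_to_class)
if unmapped:
    print("Import blocked; unmapped Semantic Types:", sorted(unmapped))
```

The same function can be applied to Relations by passing a Relation-to-Relationship mapping instead.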

Source inner relations analysis

We need to check all Concept-to-Concept relations inside the vocabulary being imported and ensure that all of those relations can be described in OMOP.
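
One way to run this check is to count, per relation label, the inner relations that cannot be expressed in OMOP. The sketch below assumes relations have already been extracted into simple records. The `Relation` type, the function name, and the sample labels and mapping are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, NamedTuple


class Relation(NamedTuple):
    source_vocab: str
    source_code: str
    label: str
    target_vocab: str
    target_code: str


def unmappable_inner_relations(
    relations: Iterable[Relation], vocab: str, relation_map: Mapping[str, str]
) -> Counter:
    """Count inner relations of `vocab` whose label has no OMOP Relationship."""
    counts: Counter = Counter()
    for r in relations:
        is_inner = r.source_vocab == vocab and r.target_vocab == vocab
        if is_inner and r.label not in relation_map:
            counts[r.label] += 1
    return counts


# Hypothetical sample data (illustrative labels and codes).
relation_map = {"isa": "Is a"}
relations = [
    Relation("SNOMED", "A", "isa", "SNOMED", "B"),
    Relation("SNOMED", "A", "part_of", "SNOMED", "C"),
    Relation("SNOMED", "B", "part_of", "SNOMED", "C"),
    Relation("SNOMED", "A", "mapped_to", "ICD10", "X"),  # external, ignored here
]
print(unmappable_inner_relations(relations, "SNOMED", relation_map))
```

An empty result means every inner relation of the vocabulary has an OMOP equivalent.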

Source external relations analysis

First, we need to find all vocabularies that are referenced from the current one and check which of them are present in OMOP. We then need to validate the relations between the Source Concepts of the vocabularies that exist in OMOP. If an external vocabulary does not exist in OMOP, its relations to the vocabulary being imported can be omitted.
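
The split between relations to validate and relations to omit can be sketched as follows, assuming the set of vocabularies already present in OMOP is known. The `Relation` record, the function name, and the sample data (including the vocabulary name "XYZ") are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Relation(NamedTuple):
    source_vocab: str
    source_code: str
    label: str
    target_vocab: str
    target_code: str


def split_external_relations(
    relations: Iterable[Relation], vocab: str, omop_vocabularies: Set[str]
) -> Tuple[List[Relation], List[Relation]]:
    """Split external relations of `vocab` into (to_validate, omitted)."""
    to_validate: List[Relation] = []
    omitted: List[Relation] = []
    for r in relations:
        # Keep only relations that start in `vocab` and point outside it.
        if r.source_vocab != vocab or r.target_vocab == vocab:
            continue
        if r.target_vocab in omop_vocabularies:
            to_validate.append(r)
        else:
            omitted.append(r)
    return to_validate, omitted


# Hypothetical sample data.
sample = [
    Relation("SNOMED", "A", "mapped_to", "ICD10", "X"),
    Relation("SNOMED", "A", "mapped_to", "XYZ", "Q"),  # "XYZ" is not in OMOP
    Relation("SNOMED", "A", "isa", "SNOMED", "B"),     # inner, skipped
]
to_validate, omitted = split_external_relations(sample, "SNOMED", {"SNOMED", "ICD10"})
print(len(to_validate), "to validate;", len(omitted), "omitted")
```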

Load source to OMOP

After the analysis has been performed, the actual load starts. The whole process must be logged in a human-readable form.
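
A minimal sketch of a logged load, using Python's standard `logging` module, is shown below. The `insert` callable stands in for whatever actually writes a record to OMOP. It, the function name `load_with_log`, and the logger name are assumptions for illustration.

```python
import logging
from typing import Callable, Iterable, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("umls_import")  # hypothetical logger name


def load_with_log(
    records: Iterable[dict], insert: Callable[[dict], None]
) -> Tuple[int, int]:
    """Insert records one by one, logging failures and a final summary.

    Returns (loaded, failed) counts.
    """
    loaded = failed = 0
    for record in records:
        try:
            insert(record)
            loaded += 1
        except Exception:
            failed += 1
            # logger.exception also records the traceback.
            logger.exception("Failed to load record %r", record)
    logger.info("Load finished: %d loaded, %d failed", loaded, failed)
    return loaded, failed
```

A failing record is logged and counted rather than aborting the load. Whether the real process should continue or stop on the first failure is a design decision the source does not specify.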

Validate cross-vocabulary data integrity

When the load process is complete, we need to ensure that there are no invalid relations between the new version of the vocabulary and other vocabularies.
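
One kind of invalid relation is a relationship row that points to a concept that no longer exists. The sketch below finds such rows with a LEFT JOIN, using an in-memory SQLite database and a heavily simplified schema modeled on the OMOP `concept` and `concept_relationship` tables. The reduced column set and the sample rows are assumptions for illustration.

```python
import sqlite3


def find_dangling_relationships(conn: sqlite3.Connection) -> list:
    """Return relationship rows where either side has no matching concept."""
    query = """
        SELECT cr.concept_id_1, cr.concept_id_2, cr.relationship_id
        FROM concept_relationship AS cr
        LEFT JOIN concept AS c1 ON c1.concept_id = cr.concept_id_1
        LEFT JOIN concept AS c2 ON c2.concept_id = cr.concept_id_2
        WHERE c1.concept_id IS NULL OR c2.concept_id IS NULL
        ORDER BY cr.concept_id_1, cr.concept_id_2
    """
    return conn.execute(query).fetchall()


# Simplified, hypothetical schema and data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, vocabulary_id TEXT);
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT
    );
    INSERT INTO concept VALUES (1, 'SNOMED'), (2, 'ICD10');
    INSERT INTO concept_relationship VALUES (1, 2, 'Maps to'), (1, 99, 'Maps to');
""")
print(find_dangling_relationships(conn))  # [(1, 99, 'Maps to')]
```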

Validate OMOP data integrity and validity

We also need to check the integrity and validity of all data that references the newly loaded vocabulary.
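
As a rough sketch of this check, the concept IDs referenced by each data table can be compared with the concept IDs that exist after the load. The function name and the sample IDs are hypothetical. The table names are used only as example keys.

```python
from typing import Dict, Iterable, List


def broken_references(
    references: Dict[str, Iterable[int]], valid_concept_ids: Iterable[int]
) -> Dict[str, List[int]]:
    """Map each table name to the referenced concept IDs that no longer exist."""
    valid = set(valid_concept_ids)
    report: Dict[str, List[int]] = {}
    for table, ids in references.items():
        missing = sorted(set(ids) - valid)
        if missing:
            report[table] = missing
    return report


# Hypothetical sample data: concept IDs referenced per data table.
references = {"condition_occurrence": [1, 2, 7], "drug_exposure": [3]}
concept_ids_after_load = [1, 2, 3]
print(broken_references(references, concept_ids_after_load))
```

Tables with no broken references are left out of the report, so an empty dictionary means the check passed.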

documentation/athena/import_data_from_umls.txt · Last modified: 2015/03/18 11:40 by gleb_malikov