User Tools

Site Tools


documentation:vocabulary:data_etl

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
documentation:vocabulary:data_etl [2015/01/24 13:03]
cgreich
documentation:vocabulary:data_etl [2015/01/24 14:23] (current)
cgreich
Line 2: Line 2:
 The most important impact the Standardized Vocabularies have on the ETL process from raw to CDM-formatted data is the Domain of each Concept. Irrespective from which source table a record comes, or what coding scheme it is represented by, the destination table will be determined by the domain_id of the respective Concept. Any ETL will have to follow the following logic for process every record in the source data: The most important impact the Standardized Vocabularies have on the ETL process from raw to CDM-formatted data is the Domain of each Concept. Irrespective from which source table a record comes, or what coding scheme it is represented by, the destination table will be determined by the domain_id of the respective Concept. Any ETL will have to follow the following logic for process every record in the source data:
   * ** Retrieve the source code** from the record.   * ** Retrieve the source code** from the record.
-  * **Find the Concept** that corresponds to the source code. In most cases, that is done by looking up the source code in the concept_code field with the correct content of the vocabulary_id field in the [[documentation:​cdm:​concept|CONCEPT]] table. Sometimes, the source code needs to be manipulated to find the right match. For example, [[documentation:​vocabulary:​icd9cm|ICD-9-CM]] codes have a dot after the 2nd or 3rd character, but in source data they are often stored without the dot. Or [[documentation:​vocabulary:​ndc|NDC]] codes come in a variety of formats (with dashes and asterisks or without, 9 or 11 digit), making the mapping process more complicated. Look into the specification of each vocabulary for specific recommendations of the lookup process.+  * **Find the Concept** that corresponds to the source code. In most cases, that is done by looking up the source code in the concept_code field with the correct content of the vocabulary_id field in the [[documentation:​cdm:​concept|CONCEPT]] table. Sometimes, the source code needs to be manipulated to find the right match. For example, [[documentation:​vocabulary:​icd9cm|ICD-9-CM]] codes have a dot after the 2nd or 3rd character, but in source data they are often stored without the dot. Or [[documentation:​vocabulary:​ndc|NDC]] codes come in a variety of formats (with dashes and asterisks or without, 9 or 11 digit), making the mapping process more complicated. Look into the specification of each [[documentation:​vocabulary:​vocabularies|Vocabulary]] ​for specific recommendations of the lookup process.
   * **Map to a Standard Concept** (standard_concept='​S'​) by retrieving all the active (invalid_reason field should be NULL) records from the [[documentation:​cdm:​concept_relationship|CONCEPT_RELATIONSHIP]] table. Use concept_id_1 for the Concept you want to map and concept_id_2 for the destination Concept, with relationship_id='​Maps to'. If the source code is a local code and you have a [[documentation:​cdm:​source_to_concept_map|SOURCE_TO_CONCEPT_MAP]] table for these, determine the destination Concept from the target_concept_id field. The destination Concept could be the Concept itself if it happens to be a Standard Concept, or a Concept in another vocabulary. In most cases, a source code maps to a single destination Standard Concept, but in some cases it could be two or three.   * **Map to a Standard Concept** (standard_concept='​S'​) by retrieving all the active (invalid_reason field should be NULL) records from the [[documentation:​cdm:​concept_relationship|CONCEPT_RELATIONSHIP]] table. Use concept_id_1 for the Concept you want to map and concept_id_2 for the destination Concept, with relationship_id='​Maps to'. If the source code is a local code and you have a [[documentation:​cdm:​source_to_concept_map|SOURCE_TO_CONCEPT_MAP]] table for these, determine the destination Concept from the target_concept_id field. The destination Concept could be the Concept itself if it happens to be a Standard Concept, or a Concept in another vocabulary. In most cases, a source code maps to a single destination Standard Concept, but in some cases it could be two or three.
   * **Write a record in the corresponding CDM table** for each destination Standard Concept based on the content of the domain_id field. Place the ID of the Standard Concept into the <​domain>​_concept_id,​ the Source Concept into the <​domain>​_source_concept_id and the source code into the <​domain>​_source_value field. Most data tables require a start_date and a Type Concept, and some have more fields that need consideration. See the details of each table in the [[documentation:​cdm:​standardized_clinical_data_tables|CDM Specifications]]. The corresponding table/field combination for each Domain is as follows:   * **Write a record in the corresponding CDM table** for each destination Standard Concept based on the content of the domain_id field. Place the ID of the Standard Concept into the <​domain>​_concept_id,​ the Source Concept into the <​domain>​_source_concept_id and the source code into the <​domain>​_source_value field. Most data tables require a start_date and a Type Concept, and some have more fields that need consideration. See the details of each table in the [[documentation:​cdm:​standardized_clinical_data_tables|CDM Specifications]]. The corresponding table/field combination for each Domain is as follows:
  
 ^domain_id^CDM table^Field^Comment^ ^domain_id^CDM table^Field^Comment^
-|Generic|Any|Any|Generic Concepts can be in any field that ends in concept_id.| +|[[documentation:​vocabulary:​generic|Generic]]|Any|Any|Generic Concepts can be in any field that ends in concept_id.| 
-|Gender|[[documentation:​cdm:​person|PERSON]]|gender_concept_id| | +|[[documentation:​vocabulary:​gender|Gender]]|[[documentation:​cdm:​person|PERSON]]|gender_concept_id| | 
-|Race|[[documentation:​cdm:​person|PERSON]]|race_concept_id| | +|[[documentation:​vocabulary:​race|Race]]|[[documentation:​cdm:​person|PERSON]]|race_concept_id| | 
-|Ethnicity|[[documentation:​cdm:​person|PERSON]]|ethnicity_concept_id| | +|[[documentation:​vocabulary:​ethnicity|Ethnicity]]|[[documentation:​cdm:​person|PERSON]]|ethnicity_concept_id| | 
-|Visit|[[documentation:​cdm:​visit_occurrence|VISIT_OCCURRENCE]]|visit_concept_id| | +|[[documentation:​vocabulary:​visit|Visit]]|[[documentation:​cdm:​visit_occurrence|VISIT_OCCURRENCE]]|visit_concept_id| | 
-|Procedure|[[documentation:​cdm:​procedure_occurrence|PROCEDURE_OCCURRENCE]]|procedure_concept_id| | +|[[documentation:​vocabulary:​procedure|Procedure]]|[[documentation:​cdm:​procedure_occurrence|PROCEDURE_OCCURRENCE]]|procedure_concept_id| | 
-|Modifier|[[documentation:​cdm:​procedure_occurrence|PROCEDURE_OCCURRENCE]]|modifier_concept_id| | +|[[documentation:​vocabulary:​modifier|Modifier]]|[[documentation:​cdm:​procedure_occurrence|PROCEDURE_OCCURRENCE]]|modifier_concept_id| | 
-|Drug|[[documentation:​cdm:​drug_exposure|DRUG_EXPOSURE]]|drug_concept_id| | +|[[documentation:​vocabulary:​drug|Drug]]|[[documentation:​cdm:​drug_exposure|DRUG_EXPOSURE]]|drug_concept_id| | 
-|Route|[[documentation:​cdm:​drug_exposure|DRUG_EXPOSURE]]|route_concept_id| | +|[[documentation:​vocabulary:​route|Route]]|[[documentation:​cdm:​drug_exposure|DRUG_EXPOSURE]]|route_concept_id| | 
-|Unit|[[documentation:​cdm:​measurement|MEASUREMENT]] or [[documentation:​cdm:​observation|OBSERVATION]] or [[documentation:​cdm:​specimen|SPECIMEN]]|unit_concept_id|Units are used in different contexts. *| +|[[documentation:​vocabulary:​unit|Unit]]|[[documentation:​cdm:​measurement|MEASUREMENT]] or [[documentation:​cdm:​observation|OBSERVATION]] or [[documentation:​cdm:​specimen|SPECIMEN]]|unit_concept_id|Units are used in different contexts. *| 
-|Device|[[documentation:​cdm:​device_exposure|DEVICE_EXPOSURE]]|device_concept_id| | +|[[documentation:​vocabulary:​device|Device]]|[[documentation:​cdm:​device_exposure|DEVICE_EXPOSURE]]|device_concept_id| | 
-|Condition|[[documentation:​cdm:​condition_occurrence|CONDITION_OCCURRENCE]]|condition_concept_id| | +|[[documentation:​vocabulary:​condition|Condition]]|[[documentation:​cdm:​condition_occurrence|CONDITION_OCCURRENCE]]|condition_concept_id| | 
-|Measurement|[[documentation:​cdm:​measurement|MEASUREMENT]]|measurement_concept_id| | +|[[documentation:​vocabulary:​measurement|Measurement]]|[[documentation:​cdm:​measurement|MEASUREMENT]]|measurement_concept_id| | 
-|Meas Value Operator|[[documentation:​cdm:​measurement|MEASUREMENT]]|operator_concept_id| | +|[[documentation:​vocabulary:​meas_value_operator|Meas Value Operator]]|[[documentation:​cdm:​measurement|MEASUREMENT]]|operator_concept_id| | 
-|Meas Value|[[documentation:​cdm:​measurement|MEASUREMENT]]|value_as_concept_id| | +|[[documentation:​vocabulary:​meas_value|Meas Value]]|[[documentation:​cdm:​measurement|MEASUREMENT]]|value_as_concept_id| | 
-|Observation|[[documentation:​cdm:​observation|OBSERVATION]]|observation_concept_id| | +|[[documentation:​vocabulary:​observation|Observation]]|[[documentation:​cdm:​observation|OBSERVATION]]|observation_concept_id| | 
-|Relationship|[[documentation:​cdm:​fact_relationship|FACT_RELATIONSHIP]]|relationship_concept_id| | +|[[documentation:​vocabulary:​relationship|Relationship]]|[[documentation:​cdm:​fact_relationship|FACT_RELATIONSHIP]]|relationship_concept_id| | 
-|Place of Service|[[documentation:​cdm:​care_site|CARE_SITE]]|place_of_service_concept_id| | +|[[documentation:​vocabulary:​place_of_service|Place of Service]]|[[documentation:​cdm:​care_site|CARE_SITE]]|place_of_service_concept_id| | 
-|Provider Specialty|[[documentation:​cdm:​provider|PROVIDER]]|specialty_concept_id| | +|[[documentation:​vocabulary:​provider_specialty|Provider Specialty]]|[[documentation:​cdm:​provider|PROVIDER]]|specialty_concept_id| | 
-|Currency|[[documentation:​cdm:visit_cost|VISIT_COST]] or [[documentation:​cdm:​procedure_cost|PROCEDURE_COST]] or [[documentation:​cdm:​drug_cost|DRUG_COST]] or [[documentation:​cdm:​device_cost|DEVICE_COST]]|currency_concept_id|Currency values appear in any of the *_COST tables. *| +|[[documentation:​vocabulary:currency|Currency]]|[[documentation:​cdm:​measurement|MEASUREMENT]] or [[documentation:​cdm:​observation|OBSERVATION]] or [[documentation:​cdm:​specimen|SPECIMEN]]|currency_concept_id|Currency values appear in any of the *_COST tables. *| 
-|Revenue Code|[[documentation:​cdm:​procedure_cost|PROCEDURE_COST]]|revenue_code_concept_id| | +|[[documentation:​vocabulary:​revenue_code|Revenue Code]]|[[documentation:​cdm:​procedure_cost|PROCEDURE_COST]]|revenue_code_concept_id| | 
-|Specimen|[[documentation:​cdm:​specimen|SPECIMEN]]|specimen_concept_id| | +|[[documentation:​vocabulary:​specimen|Specimen]]|[[documentation:​cdm:​specimen|SPECIMEN]]|specimen_concept_id| | 
-|Spec Anatomic Site|[[documentation:​cdm:​specimen|SPECIMEN]] or [[documentation:​cdm:​measurement|MEASUREMENT]] or [[documentation:​cdm:​observation|OBSERVATION]]|anatomic_site_concept_id or value_as_concept_id or value_as_concept_id|Anatomical Site Concepts are used to characterize the origin of a Specimen, but also the result of a Measurement or Observation. *| +|[[documentation:​vocabulary:​spec_anatomic_site|Spec Anatomic Site]]|[[documentation:​cdm:​specimen|SPECIMEN]] or [[documentation:​cdm:​measurement|MEASUREMENT]] or [[documentation:​cdm:​observation|OBSERVATION]]|anatomic_site_concept_id or value_as_concept_id or value_as_concept_id|Anatomical Site Concepts are used to characterize the origin of a Specimen, but also the result of a Measurement or Observation. *| 
-|Spec Disease Status|[[documentation:​cdm:​specimen|SPECIMEN]]|disease_status_concept_id| |+|[[documentation:​vocabulary:​spec_disease_status|Spec Disease Status]]|[[documentation:​cdm:​specimen|SPECIMEN]]|disease_status_concept_id| |
  
 * If there is more than one potential destination table the ETL needs to identify the context in which a Standard Concept is used, and select the right table from this table. * If there is more than one potential destination table the ETL needs to identify the context in which a Standard Concept is used, and select the right table from this table.
  
-  * For some Source Concepts there is **no mapping to a Standard Concept** (there is no record in the CONCEPT_RELATIONSHIP or SOURCE_TO_CONCEPT_MAP tables), usually because the Source Concept is too generic or otherwise ill-defined. In these cases, the domain_id field of the Source Concept itself should be used to place the record in the right table. Some Source Concepts have combination Domains, such as "​Device/​Procedure"​. In these cases, write a record into each of the combination Domain table (in this case to the DEVICE_OCCURRENCE and PROCEDURE_OCCURRENCE tables). As the Standard Concept write Concept 0 (concept_id=0) into the respective ​*_concept_id field.+  * For some Source Concepts there is **no mapping to a Standard Concept** (there is no record in the CONCEPT_RELATIONSHIP or SOURCE_TO_CONCEPT_MAP tables), usually because the Source Concept is too generic or otherwise ill-defined. In these cases, the domain_id field of the Source Concept itself should be used to place the record in the right table. Some Source Concepts have combination Domains, such as "​Device/​Procedure"​. In these cases, write a record into each of the combination Domain table (in this case to the DEVICE_OCCURRENCE and PROCEDURE_OCCURRENCE tables). As the Standard Concept write Concept 0 (concept_id=0) into the respective ​<​domain>​_concept_id field.
  
 If in the mapping process a Concept of the "​Domain"​ or "​Metadata"​ Domains are retrieved, an error has snuck into the mapping table. Please report those cases in the [[http://​forums.ohdsi.org/​c/​cdm-builders/​l/​latest|CDM-Builder]]. If in the mapping process a Concept of the "​Domain"​ or "​Metadata"​ Domains are retrieved, an error has snuck into the mapping table. Please report those cases in the [[http://​forums.ohdsi.org/​c/​cdm-builders/​l/​latest|CDM-Builder]].
  
-The same is true if a Type Concept Domain, like "Obs Period Type", "Death Type", "Visit Type", "​Procedure Type", etc. is produced. Though these Type Concepts are valid concepts and have to be placed into the <​domain>​_type_concept_id field of the respective CDM tables, they cannot be the result of the mapping process. They denote the origin of the record and the selection of Type Concepts ​have to be hard-wired ​in the ETL process.+The same is true if a Type Concept Domain, like "Obs Period Type", "Death Type", "Visit Type", "​Procedure Type", etc. is produced. Though these Type Concepts are valid concepts and have to be placed into the <​domain>​_type_concept_id field of the respective CDM tables, they cannot be the result of the mapping process. They denote the origin of the record and the selection of Type Concepts ​should ​be hard-wired ​into the ETL process.
documentation/vocabulary/data_etl.1422104611.txt.gz · Last modified: 2015/01/24 13:03 by cgreich