The most straightforward way to implement a local drug vocabulary is to use RxNorm Extension logic to extend the pool of standard Drug concepts and to map source drug products to concepts in the extended RxNorm hierarchy.
To incorporate a new set of drug information, a structure should be achieved that contains every Concept only once and preserves the RxNorm structure, no matter which vocabulary the additional Concept is coming from. In effect, the result should be a union of RxNorm and the additional drug vocabularies.
In order to achieve this, any two equivalent Concepts have to be matched through their components: Ingredients to Ingredients, Forms to Forms, Suppliers to Suppliers, etc. Concepts are defined as matching if all components match. For example, a Clinical Drug matches another Clinical Drug if it contains the same Ingredients at the same strength and the same Dose Form.
Standard concepts in the Drug domain are placed in a single comprehensive hierarchy based on their attributes. To correctly implement a vocabulary in the CDM, the following attributes must be extracted:
Such attributes may be given explicitly in a well-structured source vocabulary, or they may have to be extracted from drug product names. If drug products or discrete attributes such as Ingredients or Brand Names do not have their own codes, new codes have to be constructed by combining the word “OMOP” with a running number. The running number should be unique across all vocabularies. That means that each time a new vocabulary is added or refreshed, the next Concept Code should be the highest existing running number (the code without the 'OMOP' string) plus 1.
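The running-number rule can be sketched in a few lines of Python. This is only an illustration: the function name `next_omop_codes` and the sample codes are hypothetical, and the sketch assumes every generated code has the form `OMOP` followed by digits.

```python
import re

# Assumed shape of a generated code: the word OMOP followed by digits only.
OMOP_CODE = re.compile(r"^OMOP(\d+)$")


def next_omop_codes(existing_codes, count):
    """Return `count` new codes that continue after the highest OMOP number."""
    numbers = [int(m.group(1)) for c in existing_codes if (m := OMOP_CODE.match(c))]
    start = max(numbers, default=0) + 1
    return [f"OMOP{n}" for n in range(start, start + count)]


# Codes already used across all vocabularies (hypothetical sample).
existing = ["OMOP998", "OMOP1000", "A10AB01", "OMOP999"]
new_codes = next_omop_codes(existing, 3)
print(new_codes)  # ['OMOP1001', 'OMOP1002', 'OMOP1003']
```

Codes that do not follow the OMOP pattern, such as native source codes, are ignored when the highest number is determined.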
All source vocabulary concepts, extracted attributes, and dosage and packaging information must be staged in the standardized format of the input tables, which can then be processed for inclusion in the CDM Standardized Vocabularies.
To implement a tool to create and maintain the above structure, a number of issues need to be taken care of:
The new vocabulary should be prepared in the following tables:
|concept_name||Yes||string(255)||An unambiguous, meaningful and descriptive name for the Concept in English language|
|domain_id||Yes||string(20)||A foreign key to the DOMAIN table. The standard content is 'Drug', but for non-drugs it could be 'Device' or 'Observation'|
|vocabulary_id||Yes||string(20)||A foreign key to the VOCABULARY table. The value of this field should be identical for all records, indicating the new vocabulary being added.|
|concept_class_id||Yes||string(20)||One of the above listed RxNorm Concept Classes|
|concept_code||Yes||string(50)||The code in the source vocabulary. If the source vocabulary does not contain a code, e.g. for ingredients or dose forms, they will be created automatically (see below OMOP created codes)|
|source_concept_class_id||No||string(20)||Concept class that is given by the source vocabulary|
|possible_excipient||No||string(1)||A flag only relevant to ingredients, indicating whether they are inactive ingredients that could be omitted from an ingredient list. Currently ignored.|
|valid_start_date||No||date||Date when the Concept became valid. This may or may not coincide with the date the product went to market. Default value is 01.01.1970, unless source gives explicit date.|
|valid_end_date||No||date||Date when the Concept became invalid. Market withdrawal does not mean a Concept is invalid. Deprecated concepts have VALID_END_DATE of a day before update, unless source gives explicit date. VALID_END_DATE for all valid source concepts must be 31.12.2099|
|invalid_reason||No||string(1)||Flag indicating whether the Concept is active (today's date between valid_start_date and valid_end_date), or upgraded ('U') or deprecated ('D').|
This table is expected to contain as a minimum the comprehensive list of Concept Classes:
It may contain Branded or Clinical Drug Forms or Components, but if not they will be derived (see below). Note that units should not have their own concept in the DRUG_CONCEPT_STAGE table. Instead, they should be used verbatim. If the precise Concept Class is not known, the concept can be included as “Drug Product” and the correct Concept Class will be assigned automatically during the incorporation, based on the availability of Strength, Dose Form, Brand Name, Supplier, Quantity and Box Size information.
Brand Names that are simple combinations of generic international name of active substance and manufacturer name (e.g. “Aspirin Bayer”) should not appear as attributes for Drug Products. Manufacturer information should be stored as a concept with Supplier class.
Concepts that belong to the source vocabulary but do not belong to the Drug domain by OMOP rules should be classified as 'Device'. Typically, these belong to different substance groups:
Animal drugs can be handled as Drugs or Devices, depending on what their role in patient data can be expected to be. Note that only concepts from Drug domain can have attributes.
This table should contain the mapping between source codes and Standard Concepts for Ingredients, Brand Names, Dose Forms, Suppliers and Units. It may also contain mappings from source drugs to Standard Concepts for related ATC classes. All other relationships will be ignored.
|concept_code_1||Yes||string(255)||The source code|
|concept_id_2||Yes||integer||The existing target Concept|
|precedence||No||integer||For multiple concept_code_1/concept_id_2 combinations, the order of precedence in which they should be considered for equivalence testing. The mapping with the highest prevalence among the drugs will be used for writing a record to the CONCEPT_RELATIONSHIP table. A missing precedence will be interpreted as precedence 1. Every precedence value should be unique per concept_code_1.|
|conversion_factor||No||float||The factor used to convert the source code to the target Concept. This is usually defined for units|
This table should contain all mappings from the new to existing Concepts and their precedence.
Units should be mapped to Standard Concept Units. Weight units should be converted to milligram, volume units to milliliter, and molar units to millimole, each with the right conversion factor. The source_code field should contain the verbatim string of the unit. It is highly desirable to only use units that are in use by Standard native RxNorm concepts. Query the DRUG_STRENGTH table for a distinct list.
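As a rough illustration of how a conversion factor is meant to work, the sketch below converts a source value into the standard unit. The unit strings and the `UNIT_MAP` dictionary are hypothetical sample data; in RELATIONSHIP_TO_CONCEPT the target would be a concept_id_2, and unit names are used here only for readability.

```python
import math

# Hypothetical verbatim source units -> (standard unit, conversion_factor).
UNIT_MAP = {
    "g": ("mg", 1000.0),
    "mcg": ("mg", 0.001),
    "mg": ("mg", 1.0),
    "l": ("mL", 1000.0),
    "ml": ("mL", 1.0),
    "mol": ("mmol", 1000.0),
}


def to_standard(value, source_unit):
    """Multiply the source value by the conversion factor of its unit."""
    target_unit, factor = UNIT_MAP[source_unit]
    return value * factor, target_unit


grams = to_standard(0.5, "g")         # half a gram expressed in milligram
micrograms = to_standard(250, "mcg")  # 250 microgram expressed in milligram
print(grams, micrograms)
assert math.isclose(micrograms[0], 0.25)
```

The lookup is keyed by the verbatim unit string, which mirrors the requirement that the unit is stored exactly as it appears in the source.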
Ingredients must usually be mapped to Standard concepts one to one. If an ingredient is given as a mix (e.g. Co-dried gel of Magnesium Carbonate and Aluminium Hydroxide), it should be split into multiple entities with distinct new codes; each component of the mix must be mapped to a standard ingredient.
One to many mappings with precedence should be used if:
Dose Forms are commonly mapped to multiple RxNorm dose forms with precedence. Modified release forms should be first mapped to corresponding forms in RxNorm vocabulary (like Delayed Release Oral Capsule), and then to more generic forms (Oral Capsule) with lower precedence.
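The precedence rules can be illustrated with a small helper that orders the targets of each source code and rejects duplicate precedence values. The source codes `DR_CAPS` and `TAB` and the function name are hypothetical, and dose form names stand in for the concept_id_2 values that the real table would hold.

```python
from collections import defaultdict


def ordered_targets(rows):
    """Group targets per source code, sorted by precedence (missing means 1)."""
    grouped = defaultdict(list)
    for code, target, precedence in rows:
        grouped[code].append((1 if precedence is None else precedence, target))
    result = {}
    for code, pairs in grouped.items():
        ranks = [rank for rank, _ in pairs]
        if len(ranks) != len(set(ranks)):
            raise ValueError(f"duplicate precedence for {code}")
        result[code] = [target for _, target in sorted(pairs)]
    return result


rows = [
    ("DR_CAPS", "Oral Capsule", 2),                  # generic fallback
    ("DR_CAPS", "Delayed Release Oral Capsule", 1),  # most specific form first
    ("TAB", "Oral Tablet", None),                    # missing precedence
]
mapping = ordered_targets(rows)
print(mapping["DR_CAPS"])  # ['Delayed Release Oral Capsule', 'Oral Capsule']
```

Raising an error on duplicate precedence is a design choice of this sketch; it reflects the requirement that precedence values are unique per source code.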
|concept_code_1||Yes||string(255)||One source code of the pair|
|concept_code_2||Yes||string(255)||The other source code of the pair|
This table should contain relationships for each Drug Concept: to the Ingredients (always), the Dose Form (if appropriate), the Supplier (if appropriate) and the Brand Name (if appropriate). All other relationships will be derived, and will be ignored if they exist in the table. The relationships need not be symmetrical; only the one initiating from the Drug Concept is required.
If a Drug Product concept does not have an Ingredient attribute, it will not have any standard mapping target after processing. The Supplier attribute will not be considered for concepts without a DS_STAGE or PC_STAGE entry, since Marketed Product concepts cannot exist without dosage information.
|drug_concept_code||Yes||string(255)||The source code of the Drug or Drug Component, either Branded or Clinical|
|ingredient_concept_code||Yes||string(255)||The source code for one of the Ingredients|
|amount_value||No||float||The numeric value for absolute content (usually solid formulations)|
|amount_unit||No||string(255)||The verbatim unit of the absolute content (solids)|
|numerator_value||No||float||The numerator value for a concentration (usually liquid formulations)|
|numerator_unit||No||string(255)||The verbatim numerator unit of a concentration (liquids)|
|denominator_value||No||float||The denominator value for a concentration (usually liquid formulations). It should contain a number for Quantified products, and null for everything else.|
|denominator_unit||No||string(255)||The verbatim denominator unit of a concentration (liquids)|
|box_size||No||integer||The amount of units per box|
This table contains the dose of each ingredient in each drug, as well as the box_size. For drugs that have no strength information, or have it only for some of their ingredients, the ds_stage records must be omitted. '0' values in ds_stage are only allowed for inert drugs. Drug ingredients should match those in internal_relationship_stage. If several ingredients are mapped to the same target in relationship_to_concept, their dosages should be summed up. A drug should not contain ingredients in both solid (amount) and liquid (numerator/denominator) form. Such a mix might be caused by either a source data aberration or a drug pack, which must be split into separate Drug Products and processed in the PC_STAGE table.
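Two of these rules lend themselves to a short sketch: summing the dosages of ingredients that share a mapping target, and detecting drugs that mix solid and liquid strength rows. The function names, the codes (`D1`, `I1`, `T1`, ...) and the dictionary-based rows are hypothetical, and only amount-based rows are summed here to keep the example small.

```python
from collections import defaultdict


def merge_strengths(ds_rows, ingredient_target):
    """Sum amount_value of source ingredients that share one target, per drug."""
    totals = defaultdict(float)
    for row in ds_rows:
        if row.get("amount_value") is None:
            continue  # concentration rows are left out of this sketch
        target = ingredient_target[row["ingredient_concept_code"]]
        totals[(row["drug_concept_code"], target, row["amount_unit"])] += row["amount_value"]
    return dict(totals)


def mixed_form_drugs(ds_rows):
    """Return drug codes that combine amount rows with numerator rows."""
    kinds = defaultdict(set)
    for row in ds_rows:
        kind = "solid" if row.get("amount_value") is not None else "liquid"
        kinds[row["drug_concept_code"]].add(kind)
    return sorted(code for code, found in kinds.items() if len(found) > 1)


ds_rows = [
    {"drug_concept_code": "D1", "ingredient_concept_code": "I1", "amount_value": 100.0, "amount_unit": "mg"},
    {"drug_concept_code": "D1", "ingredient_concept_code": "I2", "amount_value": 50.0, "amount_unit": "mg"},
    {"drug_concept_code": "D2", "ingredient_concept_code": "I1", "amount_value": 10.0, "amount_unit": "mg"},
    {"drug_concept_code": "D2", "ingredient_concept_code": "I3", "numerator_value": 5.0,
     "numerator_unit": "mg", "denominator_unit": "ml"},
]
# I1 and I2 are mapped to the same target ingredient T1.
ingredient_target = {"I1": "T1", "I2": "T1", "I3": "T3"}

merged = merge_strengths(ds_rows, ingredient_target)
suspect = mixed_form_drugs(ds_rows)
print(merged[("D1", "T1", "mg")], suspect)  # 150.0 ['D2']
```

Here `D2` is flagged because it has one amount row and one concentration row, which points either to a data problem or to a pack that belongs in PC_STAGE.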
|pack_concept_code||Yes||string(255)||The source code of the Pack, either Branded or Clinical|
|drug_concept_code||Yes||string(255)||The component drug product in the Pack|
|amount||No||integer||The number of units of the drug product in drug_concept_code|
|box_size||No||integer||The number of packs if the pack is boxed (several packs in a larger container)|
This table contains the composition of a Clinical or Branded Pack: the Clinical or Branded Drug, the number of doses in each box and the number of boxes in each pack. If it is a boxed Pack, the table will also contain the box size, since Packs have no records in DS_STAGE like the other drug products. Packs are allowed to have branded drugs as components, although usually a Brand Name is only attributed to a pack as a whole. A Supplier may only be attributed to the pack as a whole.
This table contains alternative names for concepts. These are either alternative names provided by the source or names in original languages, since DRUG_CONCEPT_STAGE will contain English names only.
|synonym_concept_id||integer||Always an empty field in this table|
|synonym_name||string(255)||Alternative name of the concept. There is no need to copy the entry from DRUG_CONCEPT_STAGE|
|synonym_concept_code||string(50)||Concept code in source vocabulary|
|synonym_vocabulary_id||string(20)||VOCABULARY_ID of source vocabulary|
|language_concept_id||integer||CONCEPT_ID for Standard concept representing language|
This table allows source concepts to be mapped manually to existing standard concepts in the OMOP CDM, circumventing the standard vocabulary building process. This is useful to represent source concepts that belong to a different domain, to preserve relationships inside the vocabulary, or to map concepts to standard Drug concepts from outside of RxNorm and RxNorm Extension logic (e.g. standard concepts in the ATC or CVX vocabulary). It is also recommended to provide manual mappings for drugs that may have poor source representation yet are usually of special interest for researchers, like insulins or vaccines.
|concept_code_1||Yes||string(255)||CONCEPT_CODE of source concept in either CONCEPT or DRUG_CONCEPT_STAGE tables|
|concept_code_2||Yes||string(255)||CONCEPT_CODE of target concept in either CONCEPT or DRUG_CONCEPT_STAGE tables|
|vocabulary_id_1||Yes||string(20)||VOCABULARY_ID value of source concept|
|vocabulary_id_2||Yes||string(20)||VOCABULARY_ID value of target concept|
|relationship_id||Yes||string(20)||Indicates the type of relation from source to target; most usually it will indicate an equivalence mapping ('Maps to'). Must be one of the values from the RELATIONSHIP table|
|valid_start_date||No||date||Date when the relation became valid|
|valid_end_date||No||date||Date when the relation became invalid|
|invalid_reason||No||string(1)||Non-null entry allows for manual deprecation of existing relationship. Deprecated relationships that are absent from CONCEPT_RELATIONSHIP table will not be added to Standardized Vocabularies|
This table need not be symmetrical like CONCEPT_RELATIONSHIP; complementary relationships will be built automatically. Note that concepts with equivalence mappings in this table should not have relations to attributes in other input tables.
The input tables need to have the following quality requirements:
|Rule||If rule is violated|
|Each record should be unique in all tables.||The processing will fail.|
|Concept Codes should be unique and should not repeat for different products.||The processing will fail.|
|Combinations of product components should be unique. These are Ingredient-strength(s) combination, Dose Form, Brand Name, Quantity, Box size.||Only the highest Concept Code is retained, and the other ones are treated as non-standard Concepts and mapped to the highest.|
|Each product should have links (records in INTERNAL_RELATIONSHIP_STAGE) to all their Ingredients.||The product will be treated as if it had only the linked Ingredients. If no Ingredients are linked, the product will be processed into the CONCEPT_STAGE table, but as an orphan without any related Concept Classes.|
|Ingredients should be linked to their Standard Counterparts.||These Ingredients are treated as new Standard Ingredients.|
|Dose Forms should be linked to their Standard Counterparts.||The processing will fail.|
|Brand Names should be linked to their Valid Counterparts.||These Brand Names will be treated as new Concepts.|
|All % in source dosages should be converted into mg/ml (mg) unless it is a gas.||The drug will not be mapped to its Standard Concept.|
|Marketed Product (a drug that has a relationship to its supplier in INTERNAL_RELATIONSHIP_STAGE) should have both dosage and Dose Form.||The product won't be processed into CONCEPT_STAGE table.|
|Boxed drug should have both dosage and Dose Form.||The product won't be processed into CONCEPT_STAGE table.|
|Product ingredients should match in INTERNAL_RELATIONSHIP_STAGE and DS_STAGE||The processing will fail.|
|When Ingredients, Dose Forms or other attributes are mapped to multiple targets, precedence values must be present and unique for each source concept.||Processing will create orphaned meaningless branches of RxNorm Extension concepts.|
|Concepts with active 'Maps to' relations inside CONCEPT_RELATIONSHIP_MANUAL should not have any entries indicating relationships with attributes||Redundant branches of RxNorm Extension or multiple mappings may be created|
For quality assurance of the input tables you can use the drug_stage_tables_QA.sql script from the project's GitHub.
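To give a feel for what such a check involves, the sketch below tests one rule from the table above: product ingredients should match between INTERNAL_RELATIONSHIP_STAGE and DS_STAGE. It is an independent illustration and not a port of the QA script; the function name and all codes are hypothetical, and it assumes that drugs without any DS_STAGE record are exempt from the comparison.

```python
def ingredient_mismatches(irs_pairs, ds_pairs, ingredient_codes):
    """Return (drug, ingredient) pairs present in only one of the two tables."""
    # INTERNAL_RELATIONSHIP_STAGE also links forms, brands and suppliers,
    # so keep only the links that point to an Ingredient.
    irs = {(drug, code) for drug, code in irs_pairs if code in ingredient_codes}
    ds = set(ds_pairs)
    dosed_drugs = {drug for drug, _ in ds}
    # Assumption: drugs with no DS_STAGE rows at all are not compared.
    irs_dosed = {(drug, code) for drug, code in irs if drug in dosed_drugs}
    return sorted(irs_dosed ^ ds)


irs_pairs = [("D1", "I1"), ("D1", "I2"), ("D1", "FORM1"), ("D2", "I1")]
ds_pairs = [("D1", "I1"), ("D3", "I9")]
ingredient_codes = {"I1", "I2", "I9"}

problems = ingredient_mismatches(irs_pairs, ds_pairs, ingredient_codes)
print(problems)  # [('D1', 'I2'), ('D3', 'I9')]
```

`D1` lacks a dose for `I2`, and `D3` has a dose for an ingredient it is not linked to; `D2` has no strength records and is skipped.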
All proposals to add a new vocabulary to the CDM may be submitted (optionally with prepared input tables) as issues on GitHub.
Matching
This step is necessary to add inferred equivalence relationships between new and existing Standard Concepts. All matches are created.
The matching considers all components (Ingredient-strength(s), Dose Form, Brand Name, Quantity, Box size) in the order of precedence. A 10% mismatch between strength or total volume values is still considered a match. Matching between normal and Quantified products compares the Numerator Value of the non-quantified product to the Numerator Value divided by the Denominator Value of the Quantified product.
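The strength comparison can be sketched as follows. The text does not say which value the 10% is measured against, so this example assumes it is relative to the larger of the two values; the function names are hypothetical.

```python
def within_tolerance(a, b, tolerance=0.10):
    """True when a and b differ by at most `tolerance` of the larger value."""
    if a == b:
        return True
    return abs(a - b) <= tolerance * max(abs(a), abs(b))


def concentration(numerator_value, denominator_value=None):
    """Per-unit concentration; only Quantified products carry a denominator value."""
    if denominator_value is None:
        return numerator_value
    return numerator_value / denominator_value


# Non-quantified product: 10 mg/mL. Quantified product: 500 mg in 50 mL.
match = within_tolerance(concentration(10.0), concentration(500.0, 50.0))
print(match)                          # True
print(within_tolerance(100.0, 95.0))  # True, 5% apart
print(within_tolerance(100.0, 80.0))  # False, 20% apart
```

Dividing the numerator by the denominator puts the Quantified product on the same per-unit scale as the non-quantified one before the tolerance is applied.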
Inferring of missing Concept Classes
After the RxNorm Extension Standard concept with the fullest attribute set has been created for each source entity, all missing preceding Concept Classes are inferred from the existing ones, from the bottom upwards.
|Concept Class||Defined by|
|Clinical Drug Component||Ingredient-strength. Note that Clinical Components are always single-Ingredient. This is in contrast to all other Concept Classes|
|Branded Drug Component||Ingredient-strength(s), Brand Name|
|Clinical Drug Form||Ingredient(s), Dose Form|
|Branded Drug Form||Ingredient(s), Dose Form, Brand Name|
|Clinical Drug||Ingredient-strength(s), Dose Form|
|Branded Drug||Ingredient-strength(s), Dose Form, Brand Name|
|Quantified Clinical Drug||Ingredient-strength(s), Dose Form, Quantity|
|Quantified Branded Drug||Ingredient-strength(s), Dose Form, Brand Name, Quantity|
|Clinical Drug Box||Ingredient-strength(s), Dose Form, Box size|
|Branded Drug Box||Ingredient-strength(s), Dose Form, Brand Name, Box size|
|Quantified Clinical Box||Ingredient-strength(s), Dose Form, Quantity, Box size|
|Quantified Branded Box||Ingredient-strength(s), Dose Form, Brand Name, Quantity, Box size|
Even though all drug classes are inferred, only those will be written to the CONCEPT table that have no mapping to an existing equivalent Standard Concept.
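The definitions in the table above can be read as a simple decision rule over the attributes that are present. The sketch below is one hypothetical way to express it; it ignores the single-Ingredient restriction on Clinical Drug Components and returns `None` for attribute combinations that the table does not list.

```python
def concept_class(strength, dose_form, brand_name=False, quantity=False, box_size=False):
    """Pick the Concept Class implied by the available attributes."""
    kind = "Branded" if brand_name else "Clinical"
    if strength and dose_form:
        if quantity and box_size:
            return f"Quantified {kind} Box"
        if box_size:
            return f"{kind} Drug Box"
        if quantity:
            return f"Quantified {kind} Drug"
        return f"{kind} Drug"
    if quantity or box_size:
        return None  # Quantity and Box size need both strength and Dose Form
    if dose_form:
        return f"{kind} Drug Form"
    if strength:
        return f"{kind} Drug Component"
    return None


print(concept_class(True, True))                    # Clinical Drug
print(concept_class(False, True, brand_name=True))  # Branded Drug Form
print(concept_class(True, True, brand_name=True, quantity=True, box_size=True))
# Quantified Branded Box
```

The same rule describes how a generic “Drug Product” entry could be assigned its class once its attributes are known.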
Result
All records in the DRUG_CONCEPT_STAGE table are written to the CONCEPT_STAGE table as follows. The standard_concept field is set to 'S' for all products and Ingredients and Brand Names that have no match to existing Standard Concepts. Dose Forms, Brand Names and Suppliers are always written as non-standard.
All records linking drug products to their Ingredients, Dose Forms, Suppliers and Brand Names are written to the CONCEPT_RELATIONSHIP_STAGE table. Note that this can be a one or two step connection:
* Ingredients, Dose Forms, Suppliers and Brand Names that have no equivalent in RxNorm (and are therefore Standard Concepts): These are converted from the INTERNAL_RELATIONSHIP_STAGE table.
* Ingredients, Dose Forms and Brand Names that have at least one RxNorm equivalent: These are not written into the CONCEPT_RELATIONSHIP_STAGE table; the RxNorm equivalent is written instead, using the records from the RELATIONSHIP_TO_CONCEPT table with the relationship_id = 'Has standard ing', 'Has standard brand' and 'Has standard form'.
Drug Products and their derivatives (Drug Forms and Components) are connected through CONCEPT_RELATIONSHIP_STAGE records with the following relationship_id values:
|Concept Class 1||Concept Class 2||Relationship ID|
|Brand Name||Branded Drug||Brand name of|
|Brand Name||Branded Drug Comp||Brand name of|
|Brand Name||Branded Drug Form||Brand name of|
|Brand Name||Ingredient||Brand name of|
|Brand Name||Quant Branded Drug||Brand name of|
|Brand Name||Marketed Product||Brand name of|
|Branded Drug||Brand Name||RxNorm has ing|
|Branded Drug||Branded Drug Comp||Consists of|
|Branded Drug||Branded Drug Form||RxNorm is a|
|Branded Drug||Branded Pack||Contained in|
|Branded Drug||Clinical Drug||Tradename of|
|Branded Drug||Clinical Drug Comp||Consists of|
|Branded Drug||Dose Form||RxNorm has dose form|
|Branded Drug||Quant Branded Drug||Has quantified form|
|Branded Drug||Quant Clinical Drug||Tradename of|
|Branded Drug||Marketed Product||Has marketed form|
|Branded Drug Comp||Brand Name||RxNorm has ing|
|Branded Drug Comp||Branded Drug||Constitutes|
|Branded Drug Comp||Clinical Drug Comp||Tradename of|
|Branded Drug Comp||Quant Branded Drug||Constitutes|
|Branded Drug Form||Brand Name||RxNorm has ing|
|Branded Drug Form||Branded Drug||RxNorm inverse is a|
|Branded Drug Form||Clinical Drug Form||Tradename of|
|Branded Drug Form||Dose Form||RxNorm has dose form|
|Branded Drug Form||Quant Branded Drug||RxNorm inverse is a|
|Branded Pack||Branded Drug||Contains|
|Branded Pack||Clinical Drug||Contains|
|Branded Pack||Clinical Pack||Tradename of|
|Branded Pack||Dose Form||RxNorm has dose form|
|Branded Pack||Quant Branded Drug||Contains|
|Branded Pack||Quant Clinical Drug||Contains|
|Branded Pack||Marketed Product||Has marketed form|
|Clinical Drug||Branded Drug||Has tradename|
|Clinical Drug||Branded Pack||Contained in|
|Clinical Drug||Clinical Drug Comp||Consists of|
|Clinical Drug||Clinical Drug Form||RxNorm is a|
|Clinical Drug||Clinical Pack||Contained in|
|Clinical Drug||Dose Form||RxNorm has dose form|
|Clinical Drug||Quant Branded Drug||Has tradename|
|Clinical Drug||Quant Clinical Drug||Has quantified form|
|Clinical Drug||Marketed Product||Has marketed form|
|Clinical Drug||Marketed Product||Contained in|
|Clinical Drug Comp||Branded Drug||Constitutes|
|Clinical Drug Comp||Branded Drug Comp||Has tradename|
|Clinical Drug Comp||Clinical Drug||Constitutes|
|Clinical Drug Comp||Ingredient||Has precise ing|
|Clinical Drug Comp||Ingredient||RxNorm has ing|
|Clinical Drug Comp||Quant Branded Drug||Constitutes|
|Clinical Drug Comp||Quant Clinical Drug||Constitutes|
|Clinical Drug Form||Branded Drug Form||Has tradename|
|Clinical Drug Form||Clinical Drug||RxNorm inverse is a|
|Clinical Drug Form||Dose Form||RxNorm has dose form|
|Clinical Drug Form||Ingredient||RxNorm has ing|
|Clinical Drug Form||Quant Clinical Drug||RxNorm inverse is a|
|Clinical Pack||Branded Pack||Has tradename|
|Clinical Pack||Clinical Drug||Contains|
|Clinical Pack||Dose Form||RxNorm has dose form|
|Clinical Pack||Quant Clinical Drug||Contains|
|Clinical Pack||Marketed Product||Has quantified form|
|Dose Form||Branded Drug||RxNorm dose form of|
|Dose Form||Branded Drug Form||RxNorm dose form of|
|Dose Form||Branded Pack||RxNorm dose form of|
|Dose Form||Clinical Drug||RxNorm dose form of|
|Dose Form||Clinical Drug Form||RxNorm dose form of|
|Dose Form||Clinical Pack||RxNorm dose form of|
|Dose Form||Quant Branded Drug||RxNorm dose form of|
|Dose Form||Quant Clinical Drug||RxNorm dose form of|
|Dose Form||Marketed Product||RxNorm dose form of|
|Ingredient||Brand Name||Has brand name|
|Ingredient||Clinical Drug Comp||RxNorm ing of|
|Ingredient||Clinical Drug Form||RxNorm ing of|
|Marketed Product||Brand Name||Has brand name|
|Marketed Product||Branded Drug||Marketed form of|
|Marketed Product||Branded Pack||Marketed form of|
|Marketed Product||Clinical Drug||Marketed form of|
|Marketed Product||Clinical Drug||Contains|
|Marketed Product||Clinical Pack||Marketed form of|
|Marketed Product||Dose Form||RxNorm has dose form|
|Marketed Product||Quant Branded Drug||Marketed form of|
|Marketed Product||Quant Clinical Drug||Contains|
|Marketed Product||Quant Clinical Drug||Marketed form of|
|Marketed Product||Supplier||Has supplier|
|Quant Branded Drug||Brand Name||RxNorm has ing|
|Quant Branded Drug||Branded Drug||Quantified form of|
|Quant Branded Drug||Branded Drug Comp||Consists of|
|Quant Branded Drug||Branded Drug Form||RxNorm is a|
|Quant Branded Drug||Branded Pack||Contained in|
|Quant Branded Drug||Clinical Drug||Tradename of|
|Quant Branded Drug||Clinical Drug Comp||Consists of|
|Quant Branded Drug||Dose Form||RxNorm has dose form|
|Quant Branded Drug||Quant Clinical Drug||Tradename of|
|Quant Branded Drug||Marketed Product||Has marketed form|
|Quant Clinical Drug||Branded Drug||Has tradename|
|Quant Clinical Drug||Branded Pack||Contained in|
|Quant Clinical Drug||Clinical Drug||Quantified form of|
|Quant Clinical Drug||Clinical Drug Comp||Consists of|
|Quant Clinical Drug||Clinical Drug Form||RxNorm is a|
|Quant Clinical Drug||Clinical Pack||Contained in|
|Quant Clinical Drug||Dose Form||RxNorm has dose form|
|Quant Clinical Drug||Quant Branded Drug||Has tradename|
|Quant Clinical Drug||Marketed Product||Has marketed form|
|Quant Clinical Drug||Marketed Product||Contained in|
|Supplier||Marketed Product||Supplier of|
The name of a Branded or Clinical Pack concept is built by combining the names of its contents.
Relationships between any Drug Concept Class and a Classification Concept Class are recorded through the “Drug has drug class” and “Drug class of drug” generic relationship pair.
Finally, a new DRUG_STRENGTH_STAGE table should be created from DS_STAGE and the unit conversions in RELATIONSHIP_TO_CONCEPT, so the content can be added to the DRUG_STRENGTH table. This includes only Drug Concepts that have no mapping to an existing Standard Concept and are now Standard themselves.
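A minimal sketch of that last step, for a single concentration row, might look like this. The `UNIT_FACTOR` dictionary, the function name and the sample codes are hypothetical. The sketch also assumes that for non-quantified rows (no denominator value) the numerator is expressed per one standard denominator unit, which is why it is divided by the denominator's conversion factor.

```python
# Hypothetical verbatim units -> (standard unit, conversion_factor).
UNIT_FACTOR = {
    "g": ("mg", 1000.0),
    "mg": ("mg", 1.0),
    "l": ("mL", 1000.0),
    "ml": ("mL", 1.0),
}


def to_strength_row(ds_row, unit_factor=UNIT_FACTOR):
    """Convert one concentration row of DS_STAGE into standard units."""
    num_unit, num_factor = unit_factor[ds_row["numerator_unit"]]
    den_unit, den_factor = unit_factor[ds_row["denominator_unit"]]
    numerator = ds_row["numerator_value"] * num_factor
    denominator = ds_row.get("denominator_value")
    if denominator is None:
        # Non-quantified: rescale to "per one standard denominator unit".
        numerator /= den_factor
    else:
        denominator *= den_factor
    return {
        "drug_concept_code": ds_row["drug_concept_code"],
        "ingredient_concept_code": ds_row["ingredient_concept_code"],
        "numerator_value": numerator,
        "numerator_unit": num_unit,
        "denominator_value": denominator,
        "denominator_unit": den_unit,
    }


base = {"drug_concept_code": "D1", "ingredient_concept_code": "I1",
        "numerator_value": 2.0, "numerator_unit": "g", "denominator_unit": "l"}
plain = to_strength_row(base)                                    # 2 g/L
quantified = to_strength_row({**base, "denominator_value": 0.5})  # 2 g in 0.5 L
print(plain["numerator_value"], plain["denominator_value"])            # 2.0 None
print(quantified["numerator_value"], quantified["denominator_value"])  # 2000.0 500.0
```

Amount-based rows would follow the same pattern with a single factor applied to amount_value.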