Created 2015/01/20 00:31 by cgreich; revised 2016/06/18 16:08 by cgreich.
===== Background and Motivation =====
The availability of very large-scale healthcare databases in electronic form has opened the possibility of generating systematic, large-scale evidence and insights about the application of healthcare to patients. This discipline is called Observational Outcome Research. It uses longitudinal, patient-level clinical data to describe and understand the pathogenesis of disease and the effect of other clinical events, as well as of treatment interventions, on the progression of the disease. This research is a secondary use of the data, which are usually collected for purposes other than research. Typical sources are administrative data, such as insurance reimbursement claims, and Electronic Health or Medical Records (EHR, EMR).
  
Because the data are collected for a primary use, their format and representation follow that primary use, which also introduces artifacts and bias into the data. In addition, all source datasets differ from each other in format and content representation. Since healthcare systems differ between countries, the problem becomes even harder for research carried out internationally. All this makes robust, reproducible and automated research a significant challenge.
  
The solution is to standardize both the data and their representation. This allows methods and tools to operate on data of disparate origin, freeing the analyst from having to dissect the idiosyncrasies of a particular dataset and to manipulate the data to make them fit for research. It also allows analytical methods to be developed on one dataset and then applied to any other dataset in CDM format.
  
The OMOP CDM and Standardized Vocabularies provide such a framework for systematic research. It consists of the following components and mechanisms:
  
  - **Standardization:** Multiple Vocabularies used in observational data are consolidated into a common format. This relieves researchers from having to understand and handle the many different formats and life-cycle conventions of the individual Vocabularies.
  - **Unique Standard Concepts:** For each clinical entity there is only one Concept representing it, called the Standard Concept. Other equivalent or similar Concepts are designated non-Standard and mapped to the Standard ones.
  - **Domains:** Each Concept is assigned a Domain, and this assignment is reliable. The Domain also defines which CDM table a clinical entity should be placed into, or looked up in at query time.
  - **Comprehensive coverage:** All meaningful entities in a Domain have a Standard Concept. This ensures that any activity or observation relevant to a patient can be represented in the data.
  - **Hierarchy:** Within a Domain, all Concepts are organized in a hierarchical structure. This allows the researcher to query for all Concepts (e.g. drug products) that are hierarchically subsumed under a higher-level Concept (e.g. a drug class).
  - **Relationships** between Concepts within and across Vocabularies, and **Mappings** from non-Standard to Standard Concepts.

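How Standard Concepts, Mappings and Domains work together can be illustrated with a minimal sketch. It assumes an in-memory SQLite database holding only a few columns of the CDM vocabulary tables CONCEPT and CONCEPT_RELATIONSHIP, a "Maps to" relationship from non-Standard to Standard Concepts, and a hand-written lookup from Domain to CDM table. All concept IDs, codes and vocabulary names are made up, and the function names build_toy_vocabulary and route_source_code are hypothetical; this is not an OHDSI tool.

```python
import sqlite3

# Hypothetical Domain -> CDM table lookup; a small subset for illustration only.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Procedure": "procedure_occurrence",
    "Measurement": "measurement",
}


def build_toy_vocabulary() -> sqlite3.Connection:
    """Create a tiny in-memory vocabulary with made-up concept IDs and codes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE concept (
            concept_id       INTEGER PRIMARY KEY,
            concept_code     TEXT,
            vocabulary_id    TEXT,
            domain_id        TEXT,
            standard_concept TEXT  -- 'S' marks a Standard Concept
        );
        CREATE TABLE concept_relationship (
            concept_id_1    INTEGER,
            concept_id_2    INTEGER,
            relationship_id TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A-001", "SourceVocabA", "Condition", None),   # non-Standard
            (2, "B-9", "SourceVocabB", "Condition", None),     # equivalent code, other vocabulary
            (10, "S-100", "StandardVocab", "Condition", "S"),  # the one Standard Concept
        ],
    )
    # Both non-Standard codes are mapped to the same Standard Concept.
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, ?)",
        [(1, 10, "Maps to"), (2, 10, "Maps to")],
    )
    return conn


def route_source_code(conn, vocabulary_id, code):
    """Return (standard_concept_id, cdm_table) for a source code, or None if unmapped."""
    row = conn.execute(
        """
        SELECT std.concept_id, std.domain_id
        FROM concept AS src
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = src.concept_id AND cr.relationship_id = 'Maps to'
        JOIN concept AS std
          ON std.concept_id = cr.concept_id_2 AND std.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
        """,
        (vocabulary_id, code),
    ).fetchone()
    if row is None:
        return None
    concept_id, domain_id = row
    # The Domain of the Standard Concept decides the target CDM table.
    return concept_id, DOMAIN_TO_TABLE[domain_id]
```

With this toy data, route_source_code(conn, "SourceVocabA", "A-001") and route_source_code(conn, "SourceVocabB", "B-9") both return (10, 'condition_occurrence'): two codes from different vocabularies end up as one Standard Concept, stored in one table.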
It is important to note that these criteria are followed strictly for the purpose of observational research. In that regard the Standardized Vocabularies differ from large collections with equivalence mapping of concepts, such as the [[http://www.nlm.nih.gov/research/umls|UMLS]], which supports indexing and searching the entire biomedical literature. UMLS resources have been used heavily as a basis for constructing many of the Standardized Vocabulary components, but significant additional effort has gone into making the framework fit for this purpose:
  * Additional Vocabularies were created, mostly for metadata purposes.
  * Mappings and relationships were added to achieve comprehensive coverage. Where equivalence cannot be achieved, "uphill" relationships are created from more granular non-Standard Concepts to higher-level Standard Concepts.
  * A comprehensive domain structure was established, and each Concept was assigned a Domain (or a combination of Domains).
  * A hierarchical tree within Domains was built, representing classifications used in medical science and clinical practice.
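The hierarchical tree is what allows a query for a drug class to return every drug product beneath it. The following is a minimal sketch, assuming that direct "is a" links are available as (child, parent) pairs and are expanded into rows that mirror the ancestor_concept_id, descendant_concept_id and min_levels_of_separation columns of the CDM CONCEPT_ANCESTOR table (the real table also carries max_levels_of_separation, which is not computed here). All IDs, the exposure list and the helper names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical direct "is a" links as (child, parent); all IDs are made up.
PARENT_LINKS = [
    (201, 200),  # drug product 201 under ingredient 200
    (202, 200),  # drug product 202 under ingredient 200
    (200, 100),  # ingredient 200 under drug class 100
    (301, 300),  # drug product 301 under ingredient 300
    (300, 100),  # ingredient 300 under drug class 100
]


def build_ancestor_table(links):
    """Expand direct links into (ancestor_id, descendant_id, min_levels) rows.

    In this sketch each concept is also listed as its own ancestor at level 0,
    so a query for a class returns the class itself as well.
    """
    children = defaultdict(list)
    nodes = set()
    for child, parent in links:
        children[parent].append(child)
        nodes.update((child, parent))
    rows = []
    for ancestor in sorted(nodes):
        # Breadth-first search yields the minimum number of levels to each descendant.
        levels = {ancestor: 0}
        queue = deque([ancestor])
        while queue:
            node = queue.popleft()
            for child in children[node]:
                if child not in levels:
                    levels[child] = levels[node] + 1
                    queue.append(child)
        rows.extend((ancestor, desc, lvl) for desc, lvl in levels.items())
    return rows


def descendants(ancestor_table, ancestor_id):
    """Sorted IDs of all concepts subsumed under ancestor_id, including itself."""
    return sorted(desc for anc, desc, _ in ancestor_table if anc == ancestor_id)


def persons_exposed_to(exposures, ancestor_table, class_id):
    """Persons with at least one (person, concept_id) exposure under class_id."""
    wanted = set(descendants(ancestor_table, class_id))
    return sorted({person for person, concept_id in exposures if concept_id in wanted})
```

For example, descendants(build_ancestor_table(PARENT_LINKS), 100) returns [100, 200, 201, 202, 300, 301], so a cohort query written against class 100 picks up persons exposed to product 201 or 301 without listing those products explicitly. Pre-computing the closure once, as CONCEPT_ANCESTOR does, turns each such query into a simple lookup instead of a tree walk.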
  
However, significant work remains to be done to meet all the criteria in all of the Domains. Currently, the most complex Domains achieve the following compliance:

^Domain^Standardization^Unique Concepts^Reliable Domains^Comprehensive Coverage^Hierarchy^Mapping^
|Drug|  x  |  x  |  x  |In the US; other countries in progress|  x  |  x  |
|Condition|  x  |  x  |  x  |  x  |  x  |  x  |
|Procedure|  x  |Heavy overlapping|  x  |  x  |  |  |
|Measurement|  x  |  somewhat  |  mostly  |  x  |  minimal  |  |
|Device|  |  |  mostly  |  |  |  |
|Unit|  x  |  x  |  x  |  x  |  |  |
  
  
documentation/vocabulary/background.txt · Last modified: 2016/06/18 19:06 by cgreich