
This is an old revision of the document!


Principles

The Standardized Vocabularies are constructed with a few principles in mind. Not every principle has been executed to perfection, but together they represent the general motivation and direction of the ongoing improvement and development process:

  1. Standardization: Multiple Vocabularies used in observational data are consolidated into a common format. This relieves researchers from having to understand and handle the different formats and life cycle conventions of the individual Vocabularies.
  2. Unique Standard Concepts: For each Clinical Entity, there is only one Concept representing it, called the Standard Concept. Other equivalent or similar Concepts are designated non-Standard and mapped to the Standard ones.
  3. Domains: Each Concept is assigned a Domain, and that assignment has to be correct. The Domain also defines which CDM table a clinical entity should be placed into or looked up in at query time.
  4. Comprehensive coverage: The Standardized Vocabularies intend to capture every event that is relevant to the patient's clinical experience (e.g. Conditions, Procedures, Exposures to Drugs, etc.) and some of the administrative artifacts of the healthcare system (e.g. Visits, Care Sites, etc.). This means that in many cases a single Vocabulary is not sufficient to cover a Domain.
  5. Hierarchy: Within a Domain, all Concepts are organized in a hierarchical structure. This allows the researcher to query for all Concepts (e.g. drug products) that are hierarchically subsumed under a higher-level Concept (e.g. a drug class). This entails solving two separate problems:
    • Each Concept should have one or more classifications (bottom up).
    • Each Classification should contain all the relevant Concepts (top down).
  6. Relationships: Relationships are defined between Concepts within and across Vocabularies, together with Mappings from non-Standard to Standard Concepts.
  7. Life cycle: The data representation is kept up to date, while the processing of deprecated Concepts remains supported. This means that each deprecated Concept has to be assigned a replacement Concept that captures the meaning of the clinical entity.
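
Several of these principles can be illustrated with a small Python sketch. It is not how the Standardized Vocabularies are implemented; every Concept ID, name, table name, and helper function below is hypothetical, and the sketch only assumes that mappings, replacements, and parent links can be held in simple lookup tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: int
    name: str
    domain: str
    standard: bool


# Hypothetical toy data; none of these IDs or names come from a real Vocabulary.
CONCEPTS = {
    1: Concept(1, "Example drug class", "Drug", True),
    2: Concept(2, "Example ingredient", "Drug", True),
    3: Concept(3, "Example drug product", "Drug", True),
    4: Concept(4, "Source code for the same product", "Drug", False),
}
MAPS_TO = {4: 3}            # non-Standard -> Standard (principle 2)
REPLACED_BY = {5: 3}        # deprecated -> replacement (principle 7)
PARENTS = {3: [2], 2: [1]}  # child -> parents, i.e. bottom up (principle 5)
# Illustrative Domain-to-table routing (principle 3); table names are assumed.
DOMAIN_TABLE = {"Drug": "drug_exposure", "Condition": "condition_occurrence"}


def to_standard(concept_id):
    """Return the Standard Concept representing the given Concept."""
    concept = CONCEPTS[concept_id]
    return concept if concept.standard else CONCEPTS[MAPS_TO[concept_id]]


def target_table(concept_id):
    """Pick the table for a record based on the Standard Concept's Domain."""
    return DOMAIN_TABLE[to_standard(concept_id).domain]


def descendants(ancestor_id):
    """Collect all Concepts subsumed under ancestor_id (top down)."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, stack = set(), [ancestor_id]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def replace_deprecated(concept_id):
    """Follow replacement links until a non-deprecated Concept is reached."""
    seen = set()
    while concept_id in REPLACED_BY and concept_id not in seen:
        seen.add(concept_id)
        concept_id = REPLACED_BY[concept_id]
    return concept_id
```

With this toy data, `descendants(1)` collects the ingredient and the drug product under the drug class, and `target_table(4)` routes the non-Standard source code through its Standard Concept before choosing a table.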

It is important to note that these criteria are followed strictly for the purpose of observational research. In that regard, the Standardized Vocabularies differ from large collections with equivalence mappings of concepts such as the UMLS, which supports indexing and searching the entire biomedical literature. UMLS resources have been used heavily as a basis for constructing many of the Standardized Vocabulary components, but significant additional effort has gone into making the framework fit for purpose:

  • Additional Vocabularies were created, mostly for metadata purposes.
  • Mappings and relationships were added to achieve comprehensive coverage. If equivalence cannot be achieved, “uphill” relationships from more granular non-Standard Concepts to higher-level Standard Concepts are created.
  • A comprehensive domain structure was established and each Concept was assigned a Domain (or combination of Domains).
  • A hierarchical tree within Domains was built representing classifications used in medical science and clinical practice.
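
The difference between an equivalence mapping and an “uphill” mapping can be sketched as follows. This is a minimal illustration only: the source codes, Concept IDs, and the `map_source_code` helper are hypothetical and are not taken from the Standardized Vocabularies.

```python
# Hypothetical toy data: source codes mapped to Standard Concept IDs.
EQUIVALENT = {"SRC-A": 101}  # an exact Standard equivalent exists
UPHILL = {"SRC-B": 100}      # only a broader, higher-level Standard Concept exists


def map_source_code(code):
    """Return (standard_concept_id, kind), preferring an equivalence mapping."""
    if code in EQUIVALENT:
        return EQUIVALENT[code], "equivalent"
    if code in UPHILL:
        # Some granularity is lost, but the record stays queryable.
        return UPHILL[code], "uphill"
    return None, "unmapped"
```

Returning the kind of mapping alongside the target lets a caller see where detail was lost instead of treating every mapping as exact.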

However, significant work remains to meet all the criteria in all of the Domains. Currently, for the complex and non-administrative Domains, the following level of compliance has been achieved:

Domain | Standardization | Unique Concepts | Reliable Domains | Comprehensive Coverage | Hierarchy
Drug x x x In US, other countries in process x x
Condition x x x x x x
Procedure x heavily overlapping x x
Measurement x somewhat mostly x minimal
Device mostly
Unit x x x x

The Life Cycle is implemented for all Concepts, and its rules are described

documentation/vocabulary/principles.1466290228.txt.gz · Last modified: 2016/06/18 22:50 by cgreich