Metadata

Proposals are now tracked as GitHub issues.

Use case

  1. display metadata within Atlas-Achilles Web (when reviewing data characterization plots and tables)
  2. allow organizations with multiple OMOP CDM datasets to have a mechanism to store dataset metadata (analysis of this use will provide input for phase 2 of metadata standardization)
  3. only run certain data quality checks when they are appropriate to the dataset (e.g., general population dataset; this use case depends on proper concept level standardization)

CDM changes

The proposal adds a single table to the CDM specification. Phase 1 provides a mechanism for sites to capture metadata; concept-level standardization is planned for phase 2.

new METADATA table

Tablename: METADATA

This table relies on concept_ids that already exist for CDM tables. In Atlas, find them by using advanced search and selecting Metadata.

Column | Description | Data_type
METADATA_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the information you wish to track (e.g. 8 for metadata about a Visit) | INT
METADATA_TYPE_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the type of information you wish to track (e.g. 1 for metadata about Domains such as a Visit) | INT
NAME | Name of the CONCEPT_ID stored in METADATA_CONCEPT_ID; if there is no applicable CONCEPT_ID, NAME can describe the data stored (e.g. CDM_BUILDER VERSION) | VARCHAR(250)
VALUE | Store the metadata value you wish to capture | NVARCHAR

Modified proposal

Column | Description | Data_type | Required
METADATA_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the information you wish to track (e.g. 8 for metadata about a Visit) | INT |
METADATA_TYPE_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the type of information you wish to track (e.g. 1 for metadata about Domains such as a Visit) | INT |
NAME | Name of the CONCEPT_ID stored in METADATA_CONCEPT_ID; if there is no applicable CONCEPT_ID, NAME can describe the data stored (e.g. CDM_BUILDER VERSION) | VARCHAR(250) |
VALUE_AS_STRING | Store the metadata value (string) | NVARCHAR |
VALUE_AS_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that reflects the metadata value | int | No
METADATA_DATETIME | The date and time associated with the metadata | datetime | No
METADATA_DATE | date | date | No
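
As a rough illustration of the modified proposal, the sketch below creates the table in an in-memory SQLite database using Python's standard `sqlite3` module. The lowercase table and column names, the choice of SQLite, and the helper name `create_metadata_table` are assumptions made for this example. No NOT NULL constraints are declared, because the proposal leaves the Required value blank for the first four columns and marks the last three as not required.

```python
import sqlite3
from typing import List

# Hypothetical DDL mirroring the modified proposal. SQLite treats these type
# names as affinities only, so NVARCHAR and DATETIME are kept for readability.
METADATA_DDL = """
CREATE TABLE metadata (
    metadata_concept_id      INTEGER,
    metadata_type_concept_id INTEGER,
    name                     VARCHAR(250),
    value_as_string          NVARCHAR,
    value_as_concept_id      INTEGER,
    metadata_datetime        DATETIME,
    metadata_date            DATE
)
"""


def create_metadata_table(conn: sqlite3.Connection) -> List[str]:
    """Create the table and return its column names in declaration order."""
    conn.execute(METADATA_DDL)
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(metadata)")]


conn = sqlite3.connect(":memory:")
columns = create_metadata_table(conn)
```

A production CDM would normally use the DDL dialect of its own database platform; SQLite is used here only so the sketch runs without external dependencies.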

Example records:

METADATA_CONCEPT_ID | METADATA_TYPE_CONCEPT_ID | NAME | VALUE
51 | 1 | PERSON | Person information is pulled from insurance enrollment data where the individual has both medical and prescription benefits. The month of birth is not provided; however, for enrollees who start their enrollment in the year they are born, we extrapolate the month of birth from the month in which their enrollment starts. For the majority of patients only the year of birth is available. Persons who change gender or year of birth over their enrollments are excluded.
0 | 1 | OBSERVATION PERIOD | An observation period is a representation of when a patient was enrolled in a health insurance plan and had prescription benefits. Periods of continuous enrollment are consolidated by combining monthly records as long as the time between the end of one enrollment period and the start of the next is 32 days or less.
57 | 1 | CARE SITE | There is no clear care site information in this source, so no data will be captured within this table.
8 | 1 | VISIT | For the outpatient visits, all activity that is recorded on a single day for a person is considered to have occurred during one visit with the visit start and end date corresponding to this date.
55 | 1 | PROVIDER | Unique list of health care providers (physicians). Truven does provide some provider information; however, some of the providers listed by Truven may also be considered care sites or organizations. Since there is no clear way to distinguish among the items Truven identifies as providers, all of them are added to this table, regardless of whether they are truly organizations or care sites.
0 | 1 | DEATH | Death in Truven can be captured at discharge from an inpatient visit or in some cases by diagnosis code. The death data in this source should not be considered complete; for example, if a patient left a hospital and later died at home, that would not be captured. Additionally, if a death was recorded but the patient continues to have service charges more than 30 days after the death date, we assume the death data was faulty.
19 | 1 | CONDITION | Condition records are primarily recorded as codified claims data (e.g. ICD9 or ICD10 records that are submitted associated with a service). Additional condition information comes from patients who also have Health Risk Assessment data from Truven.
13 | 1 | DRUG | Drug exposure records are primarily recorded as codified claims data (e.g. an NDC code or a procedure code that includes a drug). If the OMOP Vocabulary deems a code from a vocabulary that is not traditionally drug-centric to be a drug exposure, the record will move to this table (e.g. CPT4- 90690- “Typhoid vaccine, live, oral” maps to a drug concept in the OMOP Vocabularies, so the CDM_BUILDER will move the record to the DRUG_EXPOSURE table instead of the procedure table). Additional drug exposure information comes from patients who also have Health Risk Assessment data from Truven.
10 | 1 | PROCEDURE | Procedure occurrence records are recorded as codified claims data (e.g. a CPT4 code or ICD9 procedure code). If the OMOP Vocabulary deems a procedure code to belong to another domain, the record moves to that domain's table (e.g. CPT4- 90690- “Typhoid vaccine, live, oral” maps to a drug concept in the OMOP Vocabularies, so the CDM_BUILDER will move the record to the DRUG_EXPOSURE table instead of the procedure table). Primary procedure codes, however, always write a record to this table in order to maintain cost data.
21 | 1 | MEASUREMENT | Measurement data traditionally comes from lab data supplied by laboratory service vendors; however, data vendors such as Truven do not have 100% representation in their lab results (e.g. they only receive lab data from vendors they have contracted with, such as Quest Diagnostics). If the OMOP Vocabulary deems a code from a vocabulary that is not traditionally measurement-centric to be a measurement, the record will move to this table (e.g. ICD9- V85.22- “Body Mass Index 26.0-26.9, adult”, usually thought of as a diagnosis code, maps to a measurement concept in the OMOP Vocabularies, so the CDM_BUILDER will move the record to the MEASUREMENT table). Additional measurement information comes from patients who also have Health Risk Assessment data from Truven.
27 | 1 | OBSERVATION | Codified data or Health Risk Assessment data that is not a diagnosis, drug exposure, procedure, or measurement will become an observation.
0 | 0 | CDM_BUILDER VERSION | 1.8.0.9
0 | 0 | DATASET_TYPE | Clinical Trial Data
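
To make the first use case more concrete (showing domain notes next to characterization reports), here is a minimal sketch that loads a few of the example records into SQLite and looks up the notes for one metadata type. It uses the four-column layout of the example records (with a single VALUE column). The VALUE texts are shortened paraphrases, and the function names `load_examples` and `domain_notes` are hypothetical; nothing here reflects how Atlas or Achilles actually read the table.

```python
import sqlite3
from typing import Dict

# A few example records; VALUE texts are shortened for the sketch.
EXAMPLE_ROWS = [
    (8, 1, "VISIT", "Outpatient activity on a single day is treated as one visit."),
    (57, 1, "CARE SITE", "No clear care site information in this source."),
    (0, 0, "CDM_BUILDER VERSION", "1.8.0.9"),
    (0, 0, "DATASET_TYPE", "Clinical Trial Data"),
]


def load_examples() -> sqlite3.Connection:
    """Create an in-memory METADATA table and insert the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        "metadata_concept_id INTEGER, metadata_type_concept_id INTEGER, "
        "name VARCHAR(250), value NVARCHAR)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", EXAMPLE_ROWS)
    return conn


def domain_notes(conn: sqlite3.Connection, type_concept_id: int = 1) -> Dict[str, str]:
    """Return {NAME: VALUE} for one metadata type (1 = Domains in the examples)."""
    rows = conn.execute(
        "SELECT name, value FROM metadata "
        "WHERE metadata_type_concept_id = ? ORDER BY name",
        (type_concept_id,),
    )
    return dict(rows.fetchall())


conn = load_examples()
notes = domain_notes(conn)               # domain-level descriptions
build_info = domain_notes(conn, 0)       # rows without a metadata type
```

A report page for the VISIT domain could then display `notes["VISIT"]` alongside its plots.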

The proposal encourages all CDM adopters to fully populate and utilize the existing CDM_SOURCE table.

END OF PROPOSAL












Text below only reflects some historical notes related to the proposal above.

Details 1

Proposing person: Patrick Ryan, Martijn Schuemie, Ajit Londhe, & Erica Voss

(may need to be updated)

Additionally, we would like the CDM_SOURCE table to store metadata about each of the domains. Our idea is to add one column per CDM domain to the CDM_SOURCE table (i.e. CDM_SOURCE.VISIT_OCCURRENCE, CDM_SOURCE.PERSON, etc.). This would let us display information about a specific domain in an ACHILLES report. For example, the VISIT_OCCURRENCE logic in PREMIER is fairly complex, and displaying a description of that logic at the point where someone is reviewing the data in ACHILLES would be beneficial.

Here is an example of some text for JMDC:

Database as a whole

(already has a column) The JMDC database consists of data from 60 Society-Managed Health Insurances covering workers aged 18 to 65 and their dependents (children younger than 18 years old and elderly people older than 65 years old). Elderly people (particularly those aged 66 or older) are less well represented than in the national population. Among people younger than 66 years old, the proportion of children younger than 18 years old in JMDC is approximately the same as in the national population. JMDC data includes data on the membership status of insured people and claims data provided by insurers under contract. Claims data are derived from monthly claims issued by clinics, hospitals and community pharmacies.

Person

JMDC covers workers aged 18 to 65 and their dependents (children younger than 18 years old and elderly people older than 65 years old). Elderly people (particularly those aged 66 or older) are less well represented than in the national population. Among people younger than 66 years old, the proportion of children younger than 18 years old in JMDC is approximately the same as in the national population. Only the year of birth is available, not the day or month.

Observation_period

The observation period is defined as the time of enrollment in the health insurance. If the member is a dependent, the enrollment depends on the enrollment of the main beneficiary.

Care_site

Care sites in JMDC are institutions where care is provided, typically a department in a hospital.


Details 2

debate about CDM_SOURCE table

CDM_SOURCE table

improve the guidance for this table

(superseded by inclusion of the below information in the METADATA table)

  • capture DATASET_TYPE_CONCEPT_ID Definition: Reference to a concept_id in the OHDSI/OMOP Terminology (class = “Dataset Type”) that indicates what type of data is in the dataset. Set to NULL if none of the concepts correctly characterizes the data. For large samples of a specialized population defined by insurance (e.g., US Medicaid), use the general population concepts.
    • Values are: General population EHR data, General population claims data, General Population EHR + Claims Data, Clinical Trial Data

Advanced Data Quality checks (inside Achilles Heel) would take advantage of this information in this new column.
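
One way such gating might look is sketched below. The check names and the `CHECK_APPLICABILITY` registry are invented for illustration and are not actual Achilles Heel rules; the dataset type strings are the values listed above.

```python
from typing import Dict, List, Optional, Set

GENERAL_POPULATION_TYPES: Set[str] = {
    "General population EHR data",
    "General population claims data",
    "General Population EHR + Claims Data",
}

# Hypothetical registry: check name -> dataset types it applies to.
# None means the check applies to every dataset.
CHECK_APPLICABILITY: Dict[str, Optional[Set[str]]] = {
    "population_age_distribution": GENERAL_POPULATION_TYPES,
    "missing_birth_year": None,
}


def applicable_checks(dataset_type: Optional[str]) -> List[str]:
    """Return the names of checks that should run for this dataset type.

    An undeclared type (None) only gets the checks that apply everywhere.
    """
    selected = []
    for name, allowed in CHECK_APPLICABILITY.items():
        if allowed is None or dataset_type in allowed:
            selected.append(name)
    return sorted(selected)


trial_checks = applicable_checks("Clinical Trial Data")
claims_checks = applicable_checks("General population claims data")
```

Keeping the applicability rules in data rather than in each check keeps the checks themselves unaware of dataset types, which is one possible design among several.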

DATASET_TYPE_CONCEPT_ID
  • if you don't want to (or can't) declare the type of data, use concept 0 (*)
  • Clinical trial data (dataset type) (*)
  • Multiple sources (dataset type)
  • Registry data (dataset type)
  • Predominantly Electronic Health Record data (dataset type)
  • Predominantly Administrative/Claims data (dataset type)
  • Predominantly Health Information Exchange data (dataset type)
  • Data limited to a single medical specialty/clinical domain, not covering general population (dataset type) (*)

“Predominantly” means that at least 51% of significant records come from a given source. Inpatient vs outpatient data can be determined from visit types and does not need to be classified above.
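
The 51% rule can be expressed as a small function. This is a sketch under assumptions: the notes do not define which records count as "significant", so the caller is expected to pass counts of whatever records it considers significant, and the function name `predominant_source` and the source labels are hypothetical.

```python
from typing import Dict, Optional


def predominant_source(record_counts: Dict[str, int], percent: int = 51) -> Optional[str]:
    """Return the source contributing at least `percent`% of records, if any.

    `record_counts` maps a source label (e.g. "EHR", "Claims") to the number
    of significant records from that source. Returns None when the dataset is
    empty or no single source reaches the threshold.
    """
    total = sum(record_counts.values())
    if total == 0:
        return None
    for source, count in record_counts.items():
        # Integer comparison avoids floating-point rounding at the boundary.
        if count * 100 >= percent * total:
            return source
    return None


mostly_ehr = predominant_source({"EHR": 70, "Claims": 30})
no_majority = predominant_source({"EHR": 50, "Claims": 50})
```

When the function returns None for a non-empty dataset, a site might reasonably choose the "Multiple sources" dataset type, although the notes do not state this explicitly.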


Column | Description | Data type
DATASET_TYPE_CONCEPT_ID | Type of dataset. Reference to OMOP Concept that provides dataset type classification. | integer

Details 3

Proposing person: Ajit Londhe & Erica Voss

We would like to propose the following table to hold metadata:

Tablename: METADATA

Column | Description | Data_type
METADATA_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the information you wish to track (e.g. 8 for metadata about a Visit) | INT
METADATA_TYPE_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the type of information you wish to track (e.g. 1 for metadata about Domains such as a Visit) | INT
NAME | Name of the CONCEPT_ID stored in METADATA_CONCEPT_ID; if there is no applicable CONCEPT_ID, NAME can describe the data stored (e.g. CDM_BUILDER VERSION) | VARCHAR(250)
VALUE | Store the metadata value you wish to capture | NVARCHAR

Example records:

METADATA_CONCEPT_ID | METADATA_TYPE_CONCEPT_ID | NAME | VALUE
8 | 1 | VISIT | For the outpatient visits, all activity that is recorded on a single day for a person is considered to have occurred during one visit with the visit start and end date corresponding to this date.
0 | 0 | CDM_BUILDER VERSION | 1.8.0.9

NOTES: the original table was

Column | Description | Data type
DATASET_TYPE_CONCEPT_ID | Type of dataset. Reference to OMOP Concept that provides dataset type classification. | integer
PERSON | | text
OBSERVATION_PERIOD | | text
VISIT_OCCURRENCE | Description of the logic used to populate the table (column name indicates the table). | text
PROCEDURE_OCCURRENCE | Description of the logic used to populate the table (column name indicates the table). | text
CONDITION_OCCURRENCE | Description of the logic used to populate the table (column name indicates the table). | text
DRUG_EXPOSURE | Description of the logic used to populate the table (column name indicates the table). | text
MEASUREMENT | Description of the logic used to populate the table (column name indicates the table). | text
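
For comparison, the sketch below turns one row of this original wide layout into rows of the METADATA layout. The mapping from column names to concept IDs is an assumption derived from the example records in the proposal (which use names such as VISIT rather than VISIT_OCCURRENCE), and `wide_to_long` is a hypothetical helper, not part of any OHDSI tool.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping, based on the IDs used in the proposal's example records.
# Columns without a known ID fall back to 0, as OBSERVATION PERIOD does there.
EXAMPLE_CONCEPT_IDS: Dict[str, int] = {
    "PERSON": 51,
    "VISIT_OCCURRENCE": 8,
    "PROCEDURE_OCCURRENCE": 10,
    "CONDITION_OCCURRENCE": 19,
    "DRUG_EXPOSURE": 13,
    "MEASUREMENT": 21,
}

DOMAIN_TYPE_CONCEPT_ID = 1  # "metadata about Domains" in the proposal

MetadataRow = Tuple[int, int, str, str]


def wide_to_long(wide_row: Dict[str, Optional[str]]) -> List[MetadataRow]:
    """Convert {column: description} into METADATA-style rows.

    Columns with no description are skipped instead of producing empty rows.
    """
    rows: List[MetadataRow] = []
    for column, text in wide_row.items():
        if not text:
            continue
        concept_id = EXAMPLE_CONCEPT_IDS.get(column, 0)
        rows.append((concept_id, DOMAIN_TYPE_CONCEPT_ID, column, text))
    return rows


long_rows = wide_to_long(
    {
        "PERSON": "Only the year of birth is available.",
        "OBSERVATION_PERIOD": "Time of enrollment in the health insurance.",
        "VISIT_OCCURRENCE": None,
    }
)
```

The long layout can take new kinds of metadata (for example a CDM_BUILDER version) without a schema change, which the wide layout could not.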
documentation/next_cdm/metadata.txt · Last modified: 2017/07/10 16:26 by clairblacketer