CDM Schema Revisions

(needs rewriting to accommodate schemas in the specs)

  • Requesters: Frank DeFalco, Patrick Ryan

Proposal

We propose changes to the tables included in the CDM schema in order to clarify their intent. Specifically, we propose:

  • The cohort, cohort_definition, cohort_attribute and attribute_definition tables move from the CDM specification to the results schema specification.
  • The results schema specification will be managed by a database migration package that will be developed leveraging Flyway.
  • The table named cohort_definition in the OHDSI schema and the table named cohort_definition in the results schema should not share a name, to avoid confusion. Option 1: rename cohort_definition in the OHDSI schema. Option 2: rename cohort_definition in the results schema. Since cohort_definition is already being moved from the CDM to the results schema, Option 2 seems the least disruptive.

Background

The OHDSI data architecture defines the categories of data, and their respective schemas, that are used within the broader OHDSI architecture: source (native schema), standardized (CDM schema), derived (results schema) and administrative (OHDSI schema).

In April 2012 the CDM V4 specification introduced the cohort table as a location to store records that share a particular feature during a particular time span and defined cohorts as a group of entities exposed to a common circumstance. This table has since been included as part of the DDL statements to create a CDM database.

When CIRCE, the initial tool for creating cohort definitions, was released, it added a new 'cohort' table in the results schema to store the people identified when a cohort definition is executed against a CDM database.

Since that time, many other tools have been created, and new tables have appropriately been deployed in the separate results schema to store the data that they derive from the CDM schema. This is the fundamental issue we are seeking to resolve: some derived results are stored in a table specified by the CDM schema, while others are defined and maintained in the results schema.

Conventions

Our proposed conventions are that all tables in the CDM schema should contain data that was derived from the original data source (also referred to as the “native schema”), and all tables in the RESULTS schema should contain data that was derived from the CDM schema. The RESULTS schema will include tables for Achilles results, cohort generation, Heracles results, estimation results, etc.

This is a subtle change, but one that provides clear conventions for the intent of the different schemas. The database migration package will also give users a new and more flexible way to create the tables needed to use OHDSI tools. This removes the current limitation whereby the only way to create RESULTS schema tables is to install and run the WebAPI. The WebAPI will instead use this migration package to validate and migrate the tables required for its operation.
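
To illustrate what a versioned migration package does, the following is a minimal hand-rolled sketch in Python using an in-memory SQLite database. It is not Flyway and does not reflect Flyway's API; the history table name schema_history, the migration list and the index name are assumptions made for the example. The idea it shows is that each change to the RESULTS schema gets a version number, applied versions are recorded, and re-running the migration only applies what is missing.

```python
import sqlite3

# Hypothetical ordered migrations for a results schema: (version, description, SQL).
MIGRATIONS = [
    (1, "create cohort table", """
        CREATE TABLE cohort (
            cohort_definition_id INTEGER NOT NULL,
            subject_id           INTEGER NOT NULL,
            cohort_start_date    TEXT    NOT NULL,
            cohort_end_date      TEXT    NOT NULL
        )"""),
    (2, "index cohort by subject",
        "CREATE INDEX idx_cohort_subject ON cohort (subject_id)"),
]


def migrate(conn, migrations=MIGRATIONS):
    """Apply pending migrations in version order and return the versions applied."""
    # Bookkeeping table (name is an assumption) recording what has already run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history ("
        "version INTEGER PRIMARY KEY, description TEXT NOT NULL)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_history")}

    newly_applied = []
    for version, description, sql in sorted(migrations, key=lambda m: m[0]):
        if version in applied:
            continue  # already present, nothing to do
        conn.execute(sql)
        conn.execute(
            "INSERT INTO schema_history (version, description) VALUES (?, ?)",
            (version, description),
        )
        newly_applied.append(version)
    conn.commit()
    return newly_applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn))  # first run applies everything: [1, 2]
    print(migrate(conn))  # second run finds nothing pending: []
```

Because the applied versions are stored in the database itself, a tool such as the WebAPI could call a routine like this at startup to validate and bring the tables up to date, instead of being the only way to create them.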

Additionally, we propose adopting a convention whereby no table name is reused across any of the schemas defined in the data architecture, in order to prevent collision or confusion.
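
The naming convention can be checked mechanically. The sketch below assumes the table names of each schema are available as plain sets; the function name find_name_collisions and the abbreviated table lists are hypothetical and only serve to show the check. The sample data reflects the situation described in the proposal, where cohort_definition would exist in both the results and OHDSI schemas.

```python
from collections import defaultdict


def find_name_collisions(schemas):
    """Return {table_name: [schemas...]} for names that appear in more than one schema."""
    owners = defaultdict(set)
    for schema, tables in schemas.items():
        for table in tables:
            # Compare case-insensitively, since many databases fold identifier case.
            owners[table.lower()].add(schema)
    return {table: sorted(found) for table, found in owners.items() if len(found) > 1}


# Hypothetical, abbreviated table lists for three of the schemas.
schemas = {
    "cdm": {"person", "observation_period"},
    "results": {"cohort", "cohort_definition"},
    "ohdsi": {"cohort_definition"},
}

collisions = find_name_collisions(schemas)
print(collisions)  # {'cohort_definition': ['ohdsi', 'results']}
```

A check like this could run as part of the migration package's validation step, failing when a new table name is already in use in another schema.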

Use Cases

  • Install / Migrate CDM schema
  • Install / Migrate RESULTS schema
documentation/next_cdm/schema_revisions.txt · Last modified: 2018/02/12 20:23 by patrick_ryan