This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
|
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/28 21:14] bchristian created |
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current) bchristian Updates from afternoon discussion |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| What do I have to have to do an OMOP conversion? | What do I have to have to do an OMOP conversion? | ||
| + | |||
| + | Separate the process into modules. | ||
| Pre-analysis | Pre-analysis | ||
| Line 7: | Line 9: | ||
| - OMOP vocabulary (reference codes) | - OMOP vocabulary (reference codes) | ||
| - are any fields masked or modified for privacy and if so, what is the algorithm? | - are any fields masked or modified for privacy and if so, what is the algorithm? | ||
| + | - Use min and max to measure range for observations and for each field. | ||
| + | - Define how to handle exceptions | ||
| - Identify subject matter expert for source. | - Identify subject matter expert for source. | ||
| - Identify OMOP subject matter expert. | - Identify OMOP subject matter expert. | ||
| Line 15: | Line 19: | ||
| - Is the conversion feasible? | - Is the conversion feasible? | ||
| - Draft ETL Specs. | - Draft ETL Specs. | ||
| + | - Privacy requirements | ||
| + | - Definition of visit | ||
| + | - Specific codes that are of interest. Ensure these are mapped. | ||
| + | - Business rules for handling conflicting information (e.g. visits after patient death). | ||
| + | - Specific metrics that are of interest. | ||
| Line 20: | Line 29: | ||
| Post-analysis | Post-analysis | ||
| + | - Revised Data dictionary | ||
| - Initial ETL Spec including: | - Initial ETL Spec including: | ||
| - | - Business rules for mapping in a detailed specification, preferably in a computer readable format, like White Rabbit. | + | - Business rules for mapping in a detailed specification, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|White Rabbit]]. |
| - | - View of mapping, preferably in a computer readable format, like rabbit in a hat. | + | - View of mapping, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|Rabbit-In-a-Hat]]. |
| - Identify any additional mapping needed: | - Identify any additional mapping needed: | ||
| - custom or local mapping of organizational codes | - custom or local mapping of organizational codes | ||
| - country codes not already in OHDSI vocabulary like OHIP codes | - country codes not already in OHDSI vocabulary like OHIP codes | ||
| + | - What to do with old codes that have been deactivated or invalidated but have no replacement codes | ||
| + | - add new values to be published in next version of vocabulary | ||
| - Preferred development methodology | - Preferred development methodology | ||
| Line 37: | Line 49: | ||
| - Dimension table first | - Dimension table first | ||
| - GRA table always last because derived | - GRA table always last because derived | ||
| + | - Define range of data to convert | ||
| + | - Observation period | ||
| + | - first/last? | ||
| + | - EMR vs Claims | ||
| + | - Use min and max to measure range for observations and for each field. | ||
| + | - exclude reversals for claims | ||
| + | - exclude invalid dates | ||
| + | - identify broken processes that generate invalid dates to help remove invalid dates | ||
| + | - Use start and end date of vocabulary items | ||
| + | - Document what to do with records that are missing required fields | ||
| + | - maybe you have a medical coder who can code from a description field | ||
| + | - Document what to do with records that have fields with invalid values | ||
| + | - Software lifecycle | ||
| + | - How do you develop, test, and accept for production? | ||
| + | - Jenkins for automated build | ||
| + | - SVN for source code control | ||
| + | - How do you manage effort and cost to convert millions of patient records in TB of data? | ||
| + | - Use a sample subset of the total data based on number of patients, amount of data, or processing time | ||
| + | - develop and test using a sample subset of entire data (150 thousand patients) | ||
| + | - business acceptance test using a large sample subset of entire data (1 million patients) | ||
| + | - production run using entire data (millions of patients) | ||
| + | - Define destination location(s) | ||
| + | - Always get the latest vocabularies before each refresh (development, test, or production run) | ||
| + | - Where do you get the most recent list of codes? | ||
| + | - Frequency or schedule of reviews | ||
| + | - How do you become aware of updates to CDM? | ||
| + | - How do you become aware of updates to vocabularies? | ||
| + | - Partitioning for parallelism to optimize performance | ||
| + | - Guidelines for incremental update | ||
| + | - Reusable code/Tables | ||
| + | - Intermediate model? | ||
| + | |||
| + | Quality Assurance (QA) | ||
| + | - How do we ensure ETL is good? | ||
| + | - metrics for success | ||
| + | - for each source and destination table | ||
| + | - top mapped codes with count of records | ||
| + | - top unmapped codes with count of records | ||
| + | - variance between previous and current run | ||
| + | - count of records | ||
| + | - % of records with mapped codes | ||
| + | - % unique codes that are mapped | ||
| + | - for select fields (demographics), show histogram of values | ||
| + | - compare actual to expected results | ||
| + | - ensure referential integrity on platforms that do not enforce it | ||
| + | - it would be awesome to compare histogram of values for source with equivalent destination | ||
| + | - it would be awesome to show improvements between runs due to better mapping and coding | ||
| + | - it would be nice to show average condition per visit | ||
| + | - some deidentification processes introduce variance in dates or id values | ||
| + | - How do we get business units to participate? | ||
| + | - How do we get approval from business units? | ||
| + | - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance. | ||
| + | - Standard model checks that are independent of data or volume | ||
| + | - automatic vs manual checks | ||
| + | - Frequency or schedule of reviews | ||
| + | - Tools | ||
| + | - Achilles | ||
| + | - Autosys | ||
| + | - Oozie | ||
| + | Operation | ||
| + | - Guidance for archive | ||
| + | - Tools | ||
| + | - monitoring | ||
| + | - kibana | ||