This shows you the differences between two versions of the page.
|
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 16:00] bchristian Discussion about QA |
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current) bchristian Updates from afternoon discussion |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| What do I need to do an OMOP conversion? | What do I need to do an OMOP conversion? | ||
| + | |||
| + | Separate the process into modules. | ||
| Pre-analysis | Pre-analysis | ||
| Line 29: | Line 31: | ||
| - Revised Data dictionary | - Revised Data dictionary | ||
| - Initial ETL Spec including: | - Initial ETL Spec including: | ||
| - | - Business rules for mapping in a detailed specification, preferably in a computer readable format, like White Rabbit. | + | - Business rules for mapping in a detailed specification, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|White Rabbit]]. |
| - | - View of mapping, preferably in a computer readable format, like rabbit in a hat. | + | - View of mapping, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|Rabbit-In-a-Hat]]. |
| - Identify any additional mapping needed: | - Identify any additional mapping needed: | ||
| - custom or local mapping of organizational codes | - custom or local mapping of organizational codes | ||
| Line 57: | Line 59: | ||
| - Use start and end date of vocabulary items | - Use start and end date of vocabulary items | ||
| - Document what to do with records that are missing required fields | - Document what to do with records that are missing required fields | ||
| + | - For example, a medical coder may be able to assign codes from a description field | ||
| - Document what to do with records that have fields with invalid values | - Document what to do with records that have fields with invalid values | ||
| - Software lifecycle | - Software lifecycle | ||
| - How do you develop, test, and accept for production? | - How do you develop, test, and accept for production? | ||
| - | - How do you manage effort and cost to convert millions of patient records in TB of data? | ||
| - | - develop and test using a sample subset of entire data (150 thousand patients) | ||
| - | - business acceptance test using a large sample subset of entire data () | ||
| - | - production run using entire data | ||
| - Jenkins for automated build | - Jenkins for automated build | ||
| - SVN for source code control | - SVN for source code control | ||
| + | - How do you manage effort and cost to convert millions of patient records in TB of data? | ||
| + | - Use a sample subset of the total data based on number of patients, amount of data, or processing time | ||
| + | - develop and test using a sample subset of entire data (150 thousand patients) | ||
| + | - business acceptance test using a large sample subset of entire data (1 million patients) | ||
| + | - production run using entire data (millions of patients) | ||
| - Define destination location(s) | - Define destination location(s) | ||
| - Always get the latest vocabularies before each refresh (development, test, or production run) | - Always get the latest vocabularies before each refresh (development, test, or production run) | ||
| - Where do you get the most recent list of codes? | - Where do you get the most recent list of codes? | ||
| + | - Frequency or schedule of reviews | ||
| - How do you become aware of updates to CDM? | - How do you become aware of updates to CDM? | ||
| - How do you become aware of updates to vocabularies? | - How do you become aware of updates to vocabularies? | ||
| - Partitioning for parallelism to optimize performance | - Partitioning for parallelism to optimize performance | ||
| + | - Guidelines for incremental update | ||
| + | - Reusable code/Tables | ||
| + | - Intermediate model? | ||
| - | QA | + | Quality Assurance (QA) |
| - How do we ensure ETL is good? | - How do we ensure ETL is good? | ||
| - metrics for success | - metrics for success | ||
| Line 89: | Line 97: | ||
| - it would be awesome to show improvements between runs due to better mapping and coding | - it would be awesome to show improvements between runs due to better mapping and coding | ||
| - it would be nice to show average condition per visit | - it would be nice to show average condition per visit | ||
| + | - some deidentification processes introduce variance in dates or id values | ||
| + | - How do we get business units to participate? | ||
| + | - How do we get approval from business units? | ||
| + | - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance. | ||
| + | - Standard model checks that are independent of data or volume | ||
| + | - automatic vs manual checks | ||
| + | - Frequency or schedule of reviews | ||
| + | - Tools | ||
| + | - Achilles | ||
| + | - AutoSys | ||
| + | - Oozie | ||
| + | Operation | ||
| + | - Guidance for archive | ||
| + | - Tools | ||
| + | - monitoring | ||
| + | - Kibana | ||
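
The sampling tiers listed under "Software lifecycle" (150 thousand patients for development and test, 1 million for business acceptance, the full data for production) could be drawn with a deterministic hash of the patient identifier. The following is a minimal sketch, not the method used by the page's authors: the function names, the `BUCKETS` constant, and the total of 5 million patients are all assumptions made for illustration. Hashing gives the same sample on every run, and a smaller sample is always a subset of a larger one, so patients tested in development also appear in the acceptance run.

```python
import hashlib

# Hypothetical constant: resolution of the sampling fraction (1/10,000 = 0.01%).
BUCKETS = 10_000


def patient_bucket(patient_id) -> int:
    """Map a patient id to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def sampling_fraction(target_patients: int, total_patients: int) -> float:
    """Fraction of the population needed to reach roughly target_patients."""
    if total_patients <= 0:
        raise ValueError("total_patients must be positive")
    return min(1.0, target_patients / total_patients)


def in_sample(patient_id, fraction: float) -> bool:
    """True if the patient falls in the sample for this fraction.

    Samples are nested: any patient selected at a small fraction is
    also selected at every larger fraction.
    """
    return patient_bucket(patient_id) < round(fraction * BUCKETS)


if __name__ == "__main__":
    total = 5_000_000  # assumed population size, not from the source page
    dev = sampling_fraction(150_000, total)           # development and test tier
    acceptance = sampling_fraction(1_000_000, total)  # business acceptance tier

    ids = range(1, 100_001)  # stand-in for real patient identifiers
    dev_ids = [i for i in ids if in_sample(i, dev)]
    acceptance_ids = [i for i in ids if in_sample(i, acceptance)]
    print(len(dev_ids), len(acceptance_ids))
```

In a real ETL the same predicate would more likely be applied in SQL or in the partitioning step, but the idea carries over: choose the tier by fraction and keep the selection repeatable between refreshes.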