User Tools

Site Tools


documentation:vocabulary_etl

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
documentation:vocabulary_etl [2015/06/16 12:45]
hardhouse add DRG
documentation:vocabulary_etl [2022/02/23 21:38] (current)
mkallfelz
Line 1: Line 1:
 +
 +
 +
 +''​!! This cookbook is outdated - do not follow these instructions !!''​
 +
 +
 +Vocabulary Users download vocabularies from Athena: [[https://​athena.ohdsi.org/​]]
 +
 +
 +----
 +
 ====== Background ====== ====== Background ======
  
Line 67: Line 78:
 | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​26/​subpack/​102/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​1kkxe04jo40gw5q3xs0obolnvx/​SNOMEDCT2/​18.0.0/​UK_SCT2CLFULL/​uk_sct2clfull_18.0.0_20141001000001.zip | | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​26/​subpack/​102/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​1kkxe04jo40gw5q3xs0obolnvx/​SNOMEDCT2/​18.0.0/​UK_SCT2CLFULL/​uk_sct2clfull_18.0.0_20141001000001.zip |
 | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​24/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​15sgy3n9hin6pvu7rwv11go46t/​NHSBSA/​6.1.0/​NHSBSA_DMD/​nhsbsa_dmd_6.1.0_20150608000001.zip | | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​24/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​15sgy3n9hin6pvu7rwv11go46t/​NHSBSA/​6.1.0/​NHSBSA_DMD/​nhsbsa_dmd_6.1.0_20150608000001.zip |
-| RxNorm, NDFRT, VA Product, VA Class, ATC| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​rxnorm/​docs/​rxnormfiles.html | http://​download.nlm.nih.gov/​umls/​kss/​rxnorm/​RxNorm_full_03022015.zip | +| SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​25/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​1mebfzrdolgi1kwge9xh0ddgtl/​NHSBSA/​6.3.0/​NHSBSA_DMDBONUS/​nhsbsa_dmdbonus_6.3.0_20150622000001.zip | 
-| HCPCS| CMS website | http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html | http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Downloads/​2015-Annual-Alpha-Numeric-HCPCS-File.zip |+| RxNorm, NDFRT, VA Product, VA Class, ATC, MeSH, ICD10, GCN_SEQNO, ETC, Indication ​| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​rxnorm/​docs/​rxnormfiles.html | http://​download.nlm.nih.gov/​umls/​kss/​rxnorm/​RxNorm_full_03022015.zip | 
 +| HCPCS| CMS website | http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html | https://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Downloads/​2016-Alpha-Numeric-HCPCS-File.zip |
 | ICD9CM and ICD9Proc | CMS website | http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​codes.html| http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​Downloads/​ICD-9-CM-v32-master-descriptions.zip | | ICD9CM and ICD9Proc | CMS website | http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​codes.html| http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​Downloads/​ICD-9-CM-v32-master-descriptions.zip |
 +| ICD10CM | CDC website | http://​www.cdc.gov/​nchs/​icd/​icd10cm.htm | ftp://​ftp.cdc.gov/​pub/​Health_Statistics/​NCHS/​Publications/​ICD10CM/​20xx/​ICD10CM_FY20xx_code_descriptions.zip |
 | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-table-csv-text-format| http://​loinc.org/​downloads/​files/​loinc-table-csv-text-format/​loinc-table-file-csv-text-format/​download | | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-table-csv-text-format| http://​loinc.org/​downloads/​files/​loinc-table-csv-text-format/​loinc-table-file-csv-text-format/​download |
 | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy| http://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy/​loinc-multiaxial-hierarchy-file/​download | | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy| http://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy/​loinc-multiaxial-hierarchy-file/​download |
Line 76: Line 89:
 | LOINC| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​mapping_projects/​loinc_to_cpt_map.html | http://​download.nlm.nih.gov/​umls/​kss/​mappings/​LNC215_TO_CPT2005/​LNC215_TO_CPT2005_MAPPINGS.zip | | LOINC| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​mapping_projects/​loinc_to_cpt_map.html | http://​download.nlm.nih.gov/​umls/​kss/​mappings/​LNC215_TO_CPT2005/​LNC215_TO_CPT2005_MAPPINGS.zip |
 | NDC-SPL| FDA website | http://​www.fda.gov/​Drugs/​InformationOnDrugs/​ucm142438.htm | http://​www.fda.gov/​downloads/​Drugs/​DevelopmentApprovalProcess/​UCM070838.zip | | NDC-SPL| FDA website | http://​www.fda.gov/​Drugs/​InformationOnDrugs/​ucm142438.htm | http://​www.fda.gov/​downloads/​Drugs/​DevelopmentApprovalProcess/​UCM070838.zip |
 +| NDC-SPL| Dailymed website | http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-drug-labels.cfm | ftp://​public.nlm.nih.gov/​nlmdata/​.dailymed/​dm_spl_release_human_rx_part1.zip |
 +| NDC-SPL| Dailymed website | http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-mapping-files.cfm | ftp://​public.nlm.nih.gov/​nlmdata/​.dailymed/​rxnorm_mappings.zip |
 | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​1/​subpack/​21/​releases. | select latest complete released file on the web page | | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​1/​subpack/​21/​releases. | select latest complete released file on the web page |
 | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​9/​subpack/​9/​releases | select latest complete released file on the web page | | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​9/​subpack/​9/​releases | select latest complete released file on the web page |
Line 101: Line 116:
   * ICD10   * ICD10
   * ICD9CM   * ICD9CM
 +  * ICD10CM
   * ICD9Proc   * ICD9Proc
   * LOINC   * LOINC
Line 109: Line 125:
   * SNOMED   * SNOMED
   * UMLS   * UMLS
 +  * DRG
 +  * MeSH
   * working   * working
   * Final_Assembly   * Final_Assembly
Line 236: Line 254:
   - Extract f_ampp2_xxxxxxx.xml,​ f_amp2_xxxxxxx.xml,​ f_vmpp2_xxxxxxx.xml,​ f_vmp2_xxxxxxx.xml,​ f_lookup2_xxxxxxx.xml,​ f_vtm2_xxxxxxx.xml and f_ingredient2_xxxxxxx.xml   - Extract f_ampp2_xxxxxxx.xml,​ f_amp2_xxxxxxx.xml,​ f_vmpp2_xxxxxxx.xml,​ f_vmp2_xxxxxxx.xml,​ f_lookup2_xxxxxxx.xml,​ f_vtm2_xxxxxxx.xml and f_ingredient2_xxxxxxx.xml
   - Load them into f_ampp2, f_amp2, f_vmpp2, f_vmp2, f_lookup2, f_vtm2 and f_ingredient2. Use the control files of the same name.   - Load them into f_ampp2, f_amp2, f_vmpp2, f_vmp2, f_lookup2, f_vtm2 and f_ingredient2. Use the control files of the same name.
 +  - Download nhsbsa_dmdbonus_X.X.X_YYYYMMDDXXXXXX.zip from https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​25/​releases
 +  - Extract weekXXYYYY-rX_X-BNF.zip/​f_bnf1_XXXXXXX.xml and rename him to dmdbonus.xml
 +  - Load dmdbonus.xml using dmdbonus.ctl
  
 Staging table load statements Staging table load statements
Line 255: Line 276:
 sqlldr PRODV5/<​password>​ CONTROL=f_ingredient2.ctl LOG=f_ingredient2.log BAD=f_ingredient2.bad ​ sqlldr PRODV5/<​password>​ CONTROL=f_ingredient2.ctl LOG=f_ingredient2.log BAD=f_ingredient2.bad ​
 sqlldr PRODV5/<​password>​ CONTROL=f_lookup2.ctl LOG=f_lookup2.log BAD=f_lookup2.bad ​ sqlldr PRODV5/<​password>​ CONTROL=f_lookup2.ctl LOG=f_lookup2.log BAD=f_lookup2.bad ​
 +sqlldr PRODV5/<​password>​ CONTROL=dmdbonus.ctl LOG=dmdbonus.log BAD=dmdbonus.bad ​
 </​code>​ </​code>​
  
Line 294: Line 316:
  
   - Run create_source_tables.sql   - Run create_source_tables.sql
-  - Download the latest file http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html,​ file name YYYY-Annual-Alpha-Numeric-HCPCS-File.zip +  - Download the latest file http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html,​ file name YYYY-Alpha-Numeric-HCPCS-File.zip 
-  - Extract ​HCPCYYYY_CONTR_ANWEB_v2.xlsx +  - Extract ​HCPCYYYY_CONTR_ANWEB.xlsx 
-  - Open the above file in excel and export a csv file (with double quoted fields) called ​HCPC2015_CONTR_ANWEB_v2.csv for loading into ANWEB_V2 staging table. ​+  - Open the above file in excel and export a csv file (with double quoted fields) called ​ANWEB_V2.csv for loading into ANWEB_V2 staging table. ​
   - Load the file data into ANWEB_V2. Use the control files of the same name.   - Load the file data into ANWEB_V2. Use the control files of the same name.
   - Run load_stage.sql   - Run load_stage.sql
Line 431: Line 453:
   - Extract loinc.csv, map_to.csv, source_organization.csv and LOINC_250_MULTI-AXIAL_HIERARCHY.CSV   - Extract loinc.csv, map_to.csv, source_organization.csv and LOINC_250_MULTI-AXIAL_HIERARCHY.CSV
   - Load them into LOINC, MAP_TO, SOURCE_ORGANIZATION and LOINC_HIERARCHY. Use the control files of the same name.   - Load them into LOINC, MAP_TO, SOURCE_ORGANIZATION and LOINC_HIERARCHY. Use the control files of the same name.
-  - Load LOINC Answers - Load LOINC_XXX_SELECTED_FORMS.zip from http://​loinc.org/​downloads/​files/​loinc-panels-and-forms-file/​loinc-panels-and-forms-file-all-selected-panels-and-forms/​download +  - Load LOINC Answers ​and LOINC Forms - Load LOINC_XXX_PanelsAndForms.zip from http://​loinc.org/​downloads/​files/​loinc-panels-and-forms-file/​loinc-panels-and-forms-file-all-selected-panels-and-forms/​download 
-  - Open LOINC_XXX_SELECTED_FORMS.xlsx and load worksheet "​ANSWERS"​ to table LOINC_ANSWERS+  - Open LOINC_XXX_PanelsAndForms.xlsx and load worksheet "​ANSWERS"​ to table LOINC_ANSWERS ​(clear columns after DisplayText,​ save as Unicode text (UTF-8 w/o BOM) and use loinc_answers.ctl),​ worksheet "​FORMS"​ to table LOINC_FORMS (clear columns after Loinc, save as Unicode text (UTF-8 w/o BOM) and use loinc_forms.ctl)
   - Open loinc_class.csv and load it into table loinc_class   - Open loinc_class.csv and load it into table loinc_class
   - Download SnomedCT_LOINC_TechnologyPreview_INT_xxxxxxxx.zip from https://​loinc.org/​news/​draft-loinc-snomed-ct-mappings-and-expression-associations-now-available.html   - Download SnomedCT_LOINC_TechnologyPreview_INT_xxxxxxxx.zip from https://​loinc.org/​news/​draft-loinc-snomed-ct-mappings-and-expression-associations-now-available.html
Line 486: Line 508:
   - Extract the product.txt file.   - Extract the product.txt file.
   - Load product.txt into PRODUCT using control file of the same name   - Load product.txt into PRODUCT using control file of the same name
 +  - Download additional source for SPL concepts and relationships from http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-drug-labels.cfm and http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-mapping-files.cfm
 +  - -Full Releases of HUMAN PRESCRIPTION LABELS, HUMAN OTC LABELS, HOMEOPATHIC LABELS and REMAINDER LABELS (1st link)
 +  - -SPL-RXNORM MAPPINGS (2d link)
 +  - Extract LABELS using unzipxml.sh and load xml files using loadxml.ctl
 +  - Extract rxnorm_mappings.zip and load rxnorm_mappings.txt using rxnorm_mappings.ctl
   - Run load_stage.sql   - Run load_stage.sql
   - Run generic_update.sql (from working directory)   - Run generic_update.sql (from working directory)
Line 525: Line 552:
 === Overview === === Overview ===
  
-Download DRG dataset+Download, the DRG dataset. Decompress the gzipped files. Run the ETL script to create the staging tables schema and load the datasets.
  
 === Details === === Details ===
Line 531: Line 558:
   - Run create_source_tables.sql   - Run create_source_tables.sql
   - Download "Files for FY" (Table 5 only) from http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Acute-Inpatient-Files-for-Download.html   - Download "Files for FY" (Table 5 only) from http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Acute-Inpatient-Files-for-Download.html
-  2011: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_2011_FR_Table_5.zip +  ​2011: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_2011_FR_Table_5.zip 
-  2012: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_12_NPRM_Table_5.zip +  ​2012: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_12_NPRM_Table_5.zip 
-  2013: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_13_FR_Table_5.zip +  ​2013: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_13_FR_Table_5.zip 
-  2014: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_14_FR_Table_5.zip +  ​2014: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_14_FR_Table_5.zip 
-  2015: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY2015-NPRM-Table-5.zip+  ​2015: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY2015-NPRM-Table-5.zip
   - Extract *.txt files from archives   - Extract *.txt files from archives
   - Rename them to FY2011.txt, FY2012.txt ... FY2015.txt and sequentially load using control files of the same names   - Rename them to FY2011.txt, FY2012.txt ... FY2015.txt and sequentially load using control files of the same names
   - Run load_stage.sql   - Run load_stage.sql
-  - Run generic_update.sql ​(from working directory)+  - generic_update.sql ​NOT needed
  
 staging table load statements staging table load statements
Line 552: Line 579:
  
 ==== Step 12 ==== ==== Step 12 ====
 +
 +=== Overview ===
 +
 +MeSH datasets contains in UMLS.
 +
 +=== Details ===
 +
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +
 +==== Step 13 ====
 +
 +=== Overview ===
 +
 +ICD10 datasets contains in UMLS.
 +
 +=== Details ===
 +
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +
 +
 +==== Step 14 ====
 +
 +=== Overview ===
 +
 +GCN_SEQNO datasets contains in RxNorm.
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Load NDDF_PRODUCT_INFO.TXT into NDDF_PRODUCT_INFO using control file of the same name
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +==== Step 15 ====
 +
 +=== Overview ===
 +
 +ETC datasets contains in "ETC sources.zip"​.
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Unpack "ETC sources.zip"​
 +  - Load all TXT files using control files of the same name
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +==== Step 16 ====
 +
 +=== Overview ===
 +
 +Indication datasets contains in "​Indication sources.zip"​.
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Unpack "​Indication sources.zip"​
 +  - Load all TXT files using control files of the same name
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +
 +
 +
 +==== Step 17 ====
 +
 +=== Overview ===
 +
 +Download the ICD10CM datasets. Decompress the gzipped files. Run the ETL script to create the staging tables schema and load the datasets. ​
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Download ftp://​ftp.cdc.gov/​pub/​Health_Statistics/​NCHS/​Publications/​ICD10CM/​20xx/​ICD10CM_FY20xx_code_descriptions.zip
 +  - Extract icd10cm_order_20xx.txt and rename to icd10cm.txt
 +  - Load into icd10cm_table with icd10cm.ctl
 +  - Unzip icd10cm2snomed.zip and load into icd10cm2snomed with icd10cm2snomed.ctl
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 + 
 +==== Step 18 ====
  
 === Overview === === Overview ===
Line 562: Line 675:
   - Run drug_strength.sql   - Run drug_strength.sql
  
-==== Step 13 ====+==== Step 19 ====
  
 === Overview === === Overview ===
Line 575: Line 688:
   - Run manual_changes_15-Mar-2015.sql ​   - Run manual_changes_15-Mar-2015.sql ​
  
-==== Step 13 ====+==== Step 20 ====
  
 === Overview === === Overview ===
documentation/vocabulary_etl.1434458702.txt.gz · Last modified: 2015/06/16 12:45 by hardhouse