User Tools

Site Tools


documentation:vocabulary_etl

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Last revision Both sides next revision
documentation:vocabulary_etl [2015/06/16 12:45]
hardhouse add DRG
documentation:vocabulary_etl [2016/03/29 12:49]
hardhouse add ICD10CM
Line 67: Line 67:
 | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​26/​subpack/​102/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​1kkxe04jo40gw5q3xs0obolnvx/​SNOMEDCT2/​18.0.0/​UK_SCT2CLFULL/​uk_sct2clfull_18.0.0_20141001000001.zip | | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​26/​subpack/​102/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​1kkxe04jo40gw5q3xs0obolnvx/​SNOMEDCT2/​18.0.0/​UK_SCT2CLFULL/​uk_sct2clfull_18.0.0_20141001000001.zip |
 | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​24/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​15sgy3n9hin6pvu7rwv11go46t/​NHSBSA/​6.1.0/​NHSBSA_DMD/​nhsbsa_dmd_6.1.0_20150608000001.zip | | SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​24/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​15sgy3n9hin6pvu7rwv11go46t/​NHSBSA/​6.1.0/​NHSBSA_DMD/​nhsbsa_dmd_6.1.0_20150608000001.zip |
-| RxNorm, NDFRT, VA Product, VA Class, ATC| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​rxnorm/​docs/​rxnormfiles.html | http://​download.nlm.nih.gov/​umls/​kss/​rxnorm/​RxNorm_full_03022015.zip | +| SNOMED | UK HSCIC Website ​  | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​25/​releases | https://​isd.hscic.gov.uk/​artefact/​trud3/​1mebfzrdolgi1kwge9xh0ddgtl/​NHSBSA/​6.3.0/​NHSBSA_DMDBONUS/​nhsbsa_dmdbonus_6.3.0_20150622000001.zip | 
-| HCPCS| CMS website | http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html | http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Downloads/​2015-Annual-Alpha-Numeric-HCPCS-File.zip |+| RxNorm, NDFRT, VA Product, VA Class, ATC, MeSH, ICD10, GCN_SEQNO, ETC, Indication ​| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​rxnorm/​docs/​rxnormfiles.html | http://​download.nlm.nih.gov/​umls/​kss/​rxnorm/​RxNorm_full_03022015.zip | 
 +| HCPCS| CMS website | http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html | https://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Downloads/​2016-Alpha-Numeric-HCPCS-File.zip |
 | ICD9CM and ICD9Proc | CMS website | http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​codes.html| http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​Downloads/​ICD-9-CM-v32-master-descriptions.zip | | ICD9CM and ICD9Proc | CMS website | http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​codes.html| http://​www.cms.gov/​Medicare/​Coding/​ICD9ProviderDiagnosticCodes/​Downloads/​ICD-9-CM-v32-master-descriptions.zip |
 +| ICD10CM | CDC website | http://​www.cdc.gov/​nchs/​icd/​icd10cm.htm | ftp://​ftp.cdc.gov/​pub/​Health_Statistics/​NCHS/​Publications/​ICD10CM/​20xx/​ICD10CM_FY20xx_code_descriptions.zip |
 | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-table-csv-text-format| http://​loinc.org/​downloads/​files/​loinc-table-csv-text-format/​loinc-table-file-csv-text-format/​download | | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-table-csv-text-format| http://​loinc.org/​downloads/​files/​loinc-table-csv-text-format/​loinc-table-file-csv-text-format/​download |
 | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy| http://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy/​loinc-multiaxial-hierarchy-file/​download | | LOINC| LOIC website | https://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy| http://​loinc.org/​downloads/​files/​loinc-multiaxial-hierarchy/​loinc-multiaxial-hierarchy-file/​download |
Line 76: Line 78:
 | LOINC| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​mapping_projects/​loinc_to_cpt_map.html | http://​download.nlm.nih.gov/​umls/​kss/​mappings/​LNC215_TO_CPT2005/​LNC215_TO_CPT2005_MAPPINGS.zip | | LOINC| UMLS website | http://​www.nlm.nih.gov/​research/​umls/​mapping_projects/​loinc_to_cpt_map.html | http://​download.nlm.nih.gov/​umls/​kss/​mappings/​LNC215_TO_CPT2005/​LNC215_TO_CPT2005_MAPPINGS.zip |
 | NDC-SPL| FDA website | http://​www.fda.gov/​Drugs/​InformationOnDrugs/​ucm142438.htm | http://​www.fda.gov/​downloads/​Drugs/​DevelopmentApprovalProcess/​UCM070838.zip | | NDC-SPL| FDA website | http://​www.fda.gov/​Drugs/​InformationOnDrugs/​ucm142438.htm | http://​www.fda.gov/​downloads/​Drugs/​DevelopmentApprovalProcess/​UCM070838.zip |
 +| NDC-SPL| Dailymed website | http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-drug-labels.cfm | ftp://​public.nlm.nih.gov/​nlmdata/​.dailymed/​dm_spl_release_human_rx_part1.zip |
 +| NDC-SPL| Dailymed website | http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-mapping-files.cfm | ftp://​public.nlm.nih.gov/​nlmdata/​.dailymed/​rxnorm_mappings.zip |
 | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​1/​subpack/​21/​releases. | select latest complete released file on the web page | | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​1/​subpack/​21/​releases. | select latest complete released file on the web page |
 | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​9/​subpack/​9/​releases | select latest complete released file on the web page | | READ| UK HSCIC website | https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​9/​subpack/​9/​releases | select latest complete released file on the web page |
Line 101: Line 105:
   * ICD10   * ICD10
   * ICD9CM   * ICD9CM
 +  * ICD10CM
   * ICD9Proc   * ICD9Proc
   * LOINC   * LOINC
Line 109: Line 114:
   * SNOMED   * SNOMED
   * UMLS   * UMLS
 +  * DRG
 +  * MeSH
   * working   * working
   * Final_Assembly   * Final_Assembly
Line 236: Line 243:
   - Extract f_ampp2_xxxxxxx.xml,​ f_amp2_xxxxxxx.xml,​ f_vmpp2_xxxxxxx.xml,​ f_vmp2_xxxxxxx.xml,​ f_lookup2_xxxxxxx.xml,​ f_vtm2_xxxxxxx.xml and f_ingredient2_xxxxxxx.xml   - Extract f_ampp2_xxxxxxx.xml,​ f_amp2_xxxxxxx.xml,​ f_vmpp2_xxxxxxx.xml,​ f_vmp2_xxxxxxx.xml,​ f_lookup2_xxxxxxx.xml,​ f_vtm2_xxxxxxx.xml and f_ingredient2_xxxxxxx.xml
   - Load them into f_ampp2, f_amp2, f_vmpp2, f_vmp2, f_lookup2, f_vtm2 and f_ingredient2. Use the control files of the same name.   - Load them into f_ampp2, f_amp2, f_vmpp2, f_vmp2, f_lookup2, f_vtm2 and f_ingredient2. Use the control files of the same name.
 +  - Download nhsbsa_dmdbonus_X.X.X_YYYYMMDDXXXXXX.zip from https://​isd.hscic.gov.uk/​trud3/​user/​authenticated/​group/​0/​pack/​6/​subpack/​25/​releases
 +  - Extract weekXXYYYY-rX_X-BNF.zip/​f_bnf1_XXXXXXX.xml and rename him to dmdbonus.xml
 +  - Load dmdbonus.xml using dmdbonus.ctl
  
 Staging table load statements Staging table load statements
Line 255: Line 265:
 sqlldr PRODV5/<​password>​ CONTROL=f_ingredient2.ctl LOG=f_ingredient2.log BAD=f_ingredient2.bad ​ sqlldr PRODV5/<​password>​ CONTROL=f_ingredient2.ctl LOG=f_ingredient2.log BAD=f_ingredient2.bad ​
 sqlldr PRODV5/<​password>​ CONTROL=f_lookup2.ctl LOG=f_lookup2.log BAD=f_lookup2.bad ​ sqlldr PRODV5/<​password>​ CONTROL=f_lookup2.ctl LOG=f_lookup2.log BAD=f_lookup2.bad ​
 +sqlldr PRODV5/<​password>​ CONTROL=dmdbonus.ctl LOG=dmdbonus.log BAD=dmdbonus.bad ​
 </​code>​ </​code>​
  
Line 294: Line 305:
  
   - Run create_source_tables.sql   - Run create_source_tables.sql
-  - Download the latest file http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html,​ file name YYYY-Annual-Alpha-Numeric-HCPCS-File.zip +  - Download the latest file http://​www.cms.gov/​Medicare/​Coding/​HCPCSReleaseCodeSets/​Alpha-Numeric-HCPCS.html,​ file name YYYY-Alpha-Numeric-HCPCS-File.zip 
-  - Extract ​HCPCYYYY_CONTR_ANWEB_v2.xlsx +  - Extract ​HCPCYYYY_CONTR_ANWEB.xlsx 
-  - Open the above file in excel and export a csv file (with double quoted fields) called ​HCPC2015_CONTR_ANWEB_v2.csv for loading into ANWEB_V2 staging table. ​+  - Open the above file in excel and export a csv file (with double quoted fields) called ​ANWEB_V2.csv for loading into ANWEB_V2 staging table. ​
   - Load the file data into ANWEB_V2. Use the control files of the same name.   - Load the file data into ANWEB_V2. Use the control files of the same name.
   - Run load_stage.sql   - Run load_stage.sql
Line 431: Line 442:
   - Extract loinc.csv, map_to.csv, source_organization.csv and LOINC_250_MULTI-AXIAL_HIERARCHY.CSV   - Extract loinc.csv, map_to.csv, source_organization.csv and LOINC_250_MULTI-AXIAL_HIERARCHY.CSV
   - Load them into LOINC, MAP_TO, SOURCE_ORGANIZATION and LOINC_HIERARCHY. Use the control files of the same name.   - Load them into LOINC, MAP_TO, SOURCE_ORGANIZATION and LOINC_HIERARCHY. Use the control files of the same name.
-  - Load LOINC Answers - Load LOINC_XXX_SELECTED_FORMS.zip from http://​loinc.org/​downloads/​files/​loinc-panels-and-forms-file/​loinc-panels-and-forms-file-all-selected-panels-and-forms/​download +  - Load LOINC Answers ​and LOINC Forms - Load LOINC_XXX_PanelsAndForms.zip from http://​loinc.org/​downloads/​files/​loinc-panels-and-forms-file/​loinc-panels-and-forms-file-all-selected-panels-and-forms/​download 
-  - Open LOINC_XXX_SELECTED_FORMS.xlsx and load worksheet "​ANSWERS"​ to table LOINC_ANSWERS+  - Open LOINC_XXX_PanelsAndForms.xlsx and load worksheet "​ANSWERS"​ to table LOINC_ANSWERS ​(clear columns after DisplayText,​ save as Unicode text (UTF-8 w/o BOM) and use loinc_answers.ctl),​ worksheet "​FORMS"​ to table LOINC_FORMS (clear columns after Loinc, save as Unicode text (UTF-8 w/o BOM) and use loinc_forms.ctl)
   - Open loinc_class.csv and load it into table loinc_class   - Open loinc_class.csv and load it into table loinc_class
   - Download SnomedCT_LOINC_TechnologyPreview_INT_xxxxxxxx.zip from https://​loinc.org/​news/​draft-loinc-snomed-ct-mappings-and-expression-associations-now-available.html   - Download SnomedCT_LOINC_TechnologyPreview_INT_xxxxxxxx.zip from https://​loinc.org/​news/​draft-loinc-snomed-ct-mappings-and-expression-associations-now-available.html
Line 486: Line 497:
   - Extract the product.txt file.   - Extract the product.txt file.
   - Load product.txt into PRODUCT using control file of the same name   - Load product.txt into PRODUCT using control file of the same name
 +  - Download additional source for SPL concepts and relationships from http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-drug-labels.cfm and http://​dailymed.nlm.nih.gov/​dailymed/​spl-resources-all-mapping-files.cfm
 +  - -Full Releases of HUMAN PRESCRIPTION LABELS, HUMAN OTC LABELS, HOMEOPATHIC LABELS and REMAINDER LABELS (1st link)
 +  - -SPL-RXNORM MAPPINGS (2d link)
 +  - Extract LABELS using unzipxml.sh and load xml files using loadxml.ctl
 +  - Extract rxnorm_mappings.zip and load rxnorm_mappings.txt using rxnorm_mappings.ctl
   - Run load_stage.sql   - Run load_stage.sql
   - Run generic_update.sql (from working directory)   - Run generic_update.sql (from working directory)
Line 525: Line 541:
 === Overview === === Overview ===
  
-Download DRG dataset+Download, the DRG dataset. Decompress the gzipped files. Run the ETL script to create the staging tables schema and load the datasets.
  
 === Details === === Details ===
Line 531: Line 547:
   - Run create_source_tables.sql   - Run create_source_tables.sql
   - Download "Files for FY" (Table 5 only) from http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Acute-Inpatient-Files-for-Download.html   - Download "Files for FY" (Table 5 only) from http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Acute-Inpatient-Files-for-Download.html
-  2011: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_2011_FR_Table_5.zip +  ​2011: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_2011_FR_Table_5.zip 
-  2012: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_12_NPRM_Table_5.zip +  ​2012: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_12_NPRM_Table_5.zip 
-  2013: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_13_FR_Table_5.zip +  ​2013: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_13_FR_Table_5.zip 
-  2014: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_14_FR_Table_5.zip +  ​2014: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY_14_FR_Table_5.zip 
-  2015: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY2015-NPRM-Table-5.zip+  ​2015: http://​www.cms.gov/​Medicare/​Medicare-Fee-for-Service-Payment/​AcuteInpatientPPS/​Downloads/​FY2015-NPRM-Table-5.zip
   - Extract *.txt files from archives   - Extract *.txt files from archives
   - Rename them to FY2011.txt, FY2012.txt ... FY2015.txt and sequentially load using control files of the same names   - Rename them to FY2011.txt, FY2012.txt ... FY2015.txt and sequentially load using control files of the same names
   - Run load_stage.sql   - Run load_stage.sql
-  - Run generic_update.sql ​(from working directory)+  - generic_update.sql ​NOT needed
  
 staging table load statements staging table load statements
Line 552: Line 568:
  
 ==== Step 12 ==== ==== Step 12 ====
 +
 +=== Overview ===
 +
 +MeSH datasets contains in UMLS.
 +
 +=== Details ===
 +
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +
 +==== Step 13 ====
 +
 +=== Overview ===
 +
 +ICD10 datasets contains in UMLS.
 +
 +=== Details ===
 +
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +
 +
 +==== Step 14 ====
 +
 +=== Overview ===
 +
 +GCN_SEQNO datasets contains in RxNorm.
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Load NDDF_PRODUCT_INFO.TXT into NDDF_PRODUCT_INFO using control file of the same name
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +==== Step 15 ====
 +
 +=== Overview ===
 +
 +ETC datasets contains in "ETC sources.zip"​.
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Unpack "ETC sources.zip"​
 +  - Load all TXT files using control files of the same name
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +==== Step 16 ====
 +
 +=== Overview ===
 +
 +Indication datasets contains in "​Indication sources.zip"​.
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Unpack "​Indication sources.zip"​
 +  - Load all TXT files using control files of the same name
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 +
 +
 +
 +==== Step 17 ====
 +
 +=== Overview ===
 +
 +Download the ICD10CM datasets. Decompress the gzipped files. Run the ETL script to create the staging tables schema and load the datasets. ​
 +
 +=== Details ===
 +
 +  - Run create_source_tables.sql
 +  - Download ftp://​ftp.cdc.gov/​pub/​Health_Statistics/​NCHS/​Publications/​ICD10CM/​20xx/​ICD10CM_FY20xx_code_descriptions.zip
 +  - Extract icd10cm_order_20xx.txt and rename to icd10cm.txt
 +  - Load into icd10cm_table with icd10cm.ctl
 +  - Unzip icd10cm2snomed.zip and load into icd10cm2snomed with icd10cm2snomed.ctl
 +  - Run load_stage.sql
 +  - Run generic_update.sql (from working directory)
 +
 + 
 +==== Step 18 ====
  
 === Overview === === Overview ===
Line 562: Line 664:
   - Run drug_strength.sql   - Run drug_strength.sql
  
-==== Step 13 ====+==== Step 19 ====
  
 === Overview === === Overview ===
Line 575: Line 677:
   - Run manual_changes_15-Mar-2015.sql ​   - Run manual_changes_15-Mar-2015.sql ​
  
-==== Step 13 ====+==== Step 20 ====
  
 === Overview === === Overview ===
documentation/vocabulary_etl.txt · Last modified: 2022/02/23 21:38 by mkallfelz