User Tools

Site Tools


documentation:software:whiterabbit

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
documentation:software:whiterabbit [2015/06/19 17:39]
aguynamedryan1 Provide instructions on switching or loading in new CDM
documentation:software:whiterabbit [2019/02/26 19:34]
ericavoss
Line 9: Line 9:
 WhiteRabbit is a software tool to help prepare for ETLs (Extraction,​ Transformation,​ Loading) of WhiteRabbit is a software tool to help prepare for ETLs (Extraction,​ Transformation,​ Loading) of
 longitudinal healthcare databases into the [[documentation:​cdm:​single-page|Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)]]. The source data can be in longitudinal healthcare databases into the [[documentation:​cdm:​single-page|Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)]]. The source data can be in
-comma-separated text files, or in a database (MySQL, SQL Server, ​ORACLE, PostgreSQL), the +comma-separated text files, or in a database (MySQL, SQL Server, ​Oracle, PostgreSQL, ​Microsoft APS, Microsoft Access, Amazon RedShift). Note that the CDM will need to be in one of limited set of database ​platforms ​(SQL Server, Oracle, PostgreSQL, Microsoft APS, Amazon RedShift). 
-CDM will be in a database (MySQL, ​SQL Server, PostgreSQL). WhiteRabbit is known not to work +
-with Teradata.+
  
 WhiteRabbit’s main function is to perform a scan of the source data, providing detailed WhiteRabbit’s main function is to perform a scan of the source data, providing detailed
 information on the tables, fields, and values that appear in a field. This scan will generate a information on the tables, fields, and values that appear in a field. This scan will generate a
-report that can be used as a reference when designing the ETL, for instance by using the RabbitIn-a-Hat+report that can be used as a reference when designing the ETL, for instance by using the Rabbit-In-a-Hat
 tool. White Rabbit differs from standard data profiling tools in that it attempts to tool. White Rabbit differs from standard data profiling tools in that it attempts to
 prevent the display of personally identifiable information (PII) data values in the generated prevent the display of personally identifiable information (PII) data values in the generated
Line 77: Line 75:
  
 **SQL Server** **SQL Server**
-  * //**Server location:​**//​ the name or IP address of the server running SQL Server+  * //**Server location:​**//​ the name or IP address of the server running SQL Server. You can also specify the port (ex: <​host>:<​port>​),​ which defaults to 1433. 
   * //**User name:**// name of the user used to log into the server. Optionally, the domain can be specified as <​domain>/<​user>​ (e.g. '​MyDomain/​Joe'​)   * //**User name:**// name of the user used to log into the server. Optionally, the domain can be specified as <​domain>/<​user>​ (e.g. '​MyDomain/​Joe'​)
   * //​**Password:​**//​ password for the supplied user name   * //​**Password:​**//​ password for the supplied user name
Line 86: Line 84:
   - Run it, thereby extracting its contents to a folder.   - Run it, thereby extracting its contents to a folder.
   - In the extracted folder you will find the file //​sqljdbc_4.0/​enu/​auth/​x64/​sqljdbc_auth.dll//​ (64-bits) or //​sqljdbc_4.0/​enu/​auth/​x86/​sqljdbc_auth.dll//​ (32-bits), which needs to be moved to a location on the system path, for example to //​c:/​windows/​system32//​.   - In the extracted folder you will find the file //​sqljdbc_4.0/​enu/​auth/​x64/​sqljdbc_auth.dll//​ (64-bits) or //​sqljdbc_4.0/​enu/​auth/​x86/​sqljdbc_auth.dll//​ (32-bits), which needs to be moved to a location on the system path, for example to //​c:/​windows/​system32//​.
 +
 +**MS APS**
 +
 +At this time WhiteRabbit does not run on this platform.
  
 **PostgreSQL** **PostgreSQL**
Line 109: Line 111:
  
 Once all settings are completed, press the “Scan tables” button. After the scan is completed the report will be written to the working folder. Once all settings are completed, press the “Scan tables” button. After the scan is completed the report will be written to the working folder.
 +
 +=== Running from the command line ===
 +
 +For various reasons one could prefer to run WhiteRabbit from the command line. This is possible by specifying all the options one would normally select in the user interface in an .ini file. An example ini file can be found [[https://​github.com/​OHDSI/​WhiteRabbit/​blob/​master/​iniFileExamples/​WhiteRabbit.ini|here]]. Then, we can reference the ini file when calling WhiteRabbit from the command line:
 +
 +  java -jar WhiteRabbit.jar -ini WhiteRabbit.ini
  
 === Reading the Scan === === Reading the Scan ===
Line 222: Line 230:
  
 Once the document is completed, this should be shared with the individuals who plan to implement the code to execute the ETL. Once the document is completed, this should be shared with the individuals who plan to implement the code to execute the ETL.
 +
 +===== Generating a testing framework =====
 +
 +To make sure the ETL process is working as specified, it is highly recommended to create [[https://​en.wikipedia.org/​wiki/​Unit_testing|unit tests]] that evaluate the behavior of the ETL process. To efficiently create a set of unit tests Rabbit-in-a-Hat can [[documentation:​software:​whiterabbit:​test_framework|generate a testing framework]].
  
 ====== Best Practices ====== ====== Best Practices ======
documentation/software/whiterabbit.txt · Last modified: 2020/02/18 14:12 by maximmoinat