documentation:software:whiterabbit

documentation:software:whiterabbit [2015/06/19 17:39] aguynamedryan1: Provide instructions on switching or loading in new CDM
documentation:software:whiterabbit [2019/12/24 16:29] (current) mkahn: BigQuery configuration instructions added by M Kahn
WhiteRabbit is a software tool to help prepare for ETLs (Extraction, Transformation, Loading) of longitudinal healthcare databases into the [[documentation:cdm:single-page|Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)]]. The source data can be in comma-separated text files, or in a database (MySQL, SQL Server, Oracle, PostgreSQL, Microsoft APS, Microsoft Access, Amazon RedShift, Google BigQuery). Note that the CDM will need to be in one of a limited set of database platforms (SQL Server, Oracle, PostgreSQL, Microsoft APS, Amazon RedShift, Google BigQuery).
  
WhiteRabbit’s main function is to perform a scan of the source data, providing detailed information on the tables, fields, and values that appear in a field. This scan will generate a report that can be used as a reference when designing the ETL, for instance by using the Rabbit-In-a-Hat tool. WhiteRabbit differs from standard data profiling tools in that it attempts to prevent the display of personally identifiable information (PII) data values in the generated
  
Here you can specify the location of the source data. The following source types are supported: delimited text files, MySQL, Oracle, SQL Server, PostgreSQL, and Google BigQuery. Below are connection instructions for each type of data source. Once you have entered the necessary information, the “Test connection” button can ensure a connection can be made.
  
**SQL Server**

  * //**Server location:**// the name or IP address of the server running SQL Server. You can also specify the port (e.g. <host>:<port>), which defaults to 1433.
  * //**User name:**// name of the user used to log into the server. Optionally, the domain can be specified as <domain>/<user> (e.g. 'MyDomain/Joe')
  * //**Password:**// password for the supplied user name
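The port and domain conventions above can be illustrated with a small sketch. This is purely illustrative: the helper functions below are hypothetical and not part of WhiteRabbit.

```python
def parse_server_location(value, default_port=1433):
    """Split a '<host>:<port>' server location; the port defaults to 1433."""
    host, sep, port = value.partition(":")
    return host, int(port) if sep else default_port

def parse_user_name(value):
    """Split an optional '<domain>/<user>' user name into (domain, user)."""
    domain, sep, user = value.partition("/")
    return (domain, user) if sep else (None, value)

print(parse_server_location("dbserver.example.com"))  # → ('dbserver.example.com', 1433)
print(parse_user_name("MyDomain/Joe"))                # → ('MyDomain', 'Joe')
```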
  - Run it, thereby extracting its contents to a folder.
  - In the extracted folder you will find the file //sqljdbc_4.0/enu/auth/x64/sqljdbc_auth.dll// (64-bit) or //sqljdbc_4.0/enu/auth/x86/sqljdbc_auth.dll// (32-bit), which needs to be moved to a location on the system path, for example to //c:/windows/system32//.

**MS APS**

At this time WhiteRabbit does not run on this platform.
  
**PostgreSQL**

  * //**Server location:**// this field contains the host name and database name (<host>/<database>)
  * //**User name:**// name of the user used to log into the server
  * //**Password:**// password for the supplied user name
  * //**Database name:**// this field contains the schema containing the source tables

**Google BigQuery**

Google BigQuery (GBQ) supports two different connection/authentication methods: application default credentials and service account authentication. The former method is considered more secure because it writes auditing events to Stackdriver. The specific method used is determined by the arguments provided in the configuration panel, as described below.

Authentication via application default credentials:

When using application default credentials authentication, you must run the following gcloud command in the user account only once: ''gcloud auth application-default login''. An application key is written to ~/.config/gcloud/application_default_credentials.json.

  * //**Server location:**// name of the GBQ ProjectID
  * //**User name:**// not used
  * //**Password:**// not used
  * //**Database name:**// data set name within the ProjectID named in the Server location field

Authentication via service account credentials:

  * //**Server location:**// name of the GBQ ProjectID
  * //**User name:**// OAuth service account email address
  * //**Password:**// OAuth private key path (the file location of the private key JSON file); must be a valid full file pathname
  * //**Database name:**// data set name within the ProjectID named in the Server location field
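The rule for choosing between the two methods can be sketched as follows. This is illustrative logic derived from the description above, not WhiteRabbit's actual implementation, and the example email address and key path are placeholders.

```python
def bigquery_auth_method(user_name, password):
    """Choose the GBQ authentication method from the panel fields:
    empty user name and password -> application default credentials;
    otherwise -> service account (email address plus key-file path)."""
    if not user_name and not password:
        return "application default credentials"
    return "service account"

print(bigquery_auth_method("", ""))  # → application default credentials
print(bigquery_auth_method("etl@myproject.iam.gserviceaccount.com",
                           "/path/to/key.json"))  # → service account
```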
  
==== Scanning a Database ====
  
Once all settings are completed, press the “Scan tables” button. After the scan is completed the report will be written to the working folder.

=== Running from the command line ===

In some situations it is preferable to run WhiteRabbit from the command line. This is possible by specifying all the options one would normally select in the user interface in a .ini file. An example ini file can be found [[https://github.com/OHDSI/WhiteRabbit/blob/master/iniFileExamples/WhiteRabbit.ini|here]]. The ini file is then referenced when calling WhiteRabbit from the command line:

  java -jar WhiteRabbit.jar -ini WhiteRabbit.ini
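For example, a scan of a PostgreSQL source could be configured along these lines. This is a sketch only: the key names follow the example WhiteRabbit.ini linked above, all values are placeholders, and that file remains the authoritative reference for the available options.

  WORKING_FOLDER = /home/me/whiterabbit
  DATA_TYPE = PostgreSQL
  SERVER_LOCATION = localhost/source_db
  USER_NAME = joe
  PASSWORD = secret
  DATABASE_NAME = public
  TABLES_TO_SCAN = *
  SCAN_FIELD_VALUES = yes
  MIN_CELL_COUNT = 5
  MAX_DISTINCT_VALUES = 1000
  ROWS_PER_TABLE = 100000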
  
=== Reading the Scan ===
  
Once the document is completed, it should be shared with the individuals who plan to implement the code to execute the ETL.

===== Generating a testing framework =====

To make sure the ETL process works as specified, it is highly recommended to create [[https://en.wikipedia.org/wiki/Unit_testing|unit tests]] that evaluate the behavior of the ETL process. To efficiently create a set of unit tests, Rabbit-In-a-Hat can [[documentation:software:whiterabbit:test_framework|generate a testing framework]].
  
====== Best Practices ======