User Tools

Site Tools


documentation:software:whiterabbit

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
documentation:software:whiterabbit [2016/12/22 13:59]
ericavoss [Specifying the Location of Data] Noting it doesn't work on APS
documentation:software:whiterabbit [2019/12/24 16:29] (current)
mkahn 24-12-2019: BigQuery configuration instructions added by M Kahn
Line 9: Line 9:
 WhiteRabbit is a software tool to help prepare for ETLs (Extraction,​ Transformation,​ Loading) of WhiteRabbit is a software tool to help prepare for ETLs (Extraction,​ Transformation,​ Loading) of
 longitudinal healthcare databases into the [[documentation:​cdm:​single-page|Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)]]. The source data can be in longitudinal healthcare databases into the [[documentation:​cdm:​single-page|Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)]]. The source data can be in
-comma-separated text files, or in a database (MySQL, SQL Server, Oracle, PostgreSQL, Microsoft APS, Microsoft Access, Amazon RedShift). Note that the CDM will need to be in one of a limited set of database platforms (SQL Server, Oracle, PostgreSQL, Microsoft APS, Amazon RedShift). ​+comma-separated text files, or in a database (MySQL, SQL Server, Oracle, PostgreSQL, Microsoft APS, Microsoft Access, Amazon RedShift, Google BigQuery). Note that the CDM will need to be in one of a limited set of database platforms (SQL Server, Oracle, PostgreSQL, Microsoft APS, Amazon RedShift, Google BigQuery). 
  
 WhiteRabbit’s main function is to perform a scan of the source data, providing detailed WhiteRabbit’s main function is to perform a scan of the source data, providing detailed
 information on the tables, fields, and values that appear in a field. This scan will generate a information on the tables, fields, and values that appear in a field. This scan will generate a
-report that can be used as a reference when designing the ETL, for instance by using the RabbitIn-a-Hat+report that can be used as a reference when designing the ETL, for instance by using the Rabbit-In-a-Hat
 tool. White Rabbit differs from standard data profiling tools in that it attempts to tool. White Rabbit differs from standard data profiling tools in that it attempts to
 prevent the display of personally identifiable information (PII) data values in the generated prevent the display of personally identifiable information (PII) data values in the generated
Line 52: Line 52:
  
 Here you can specify the location of the source data. The following source types are supported: Here you can specify the location of the source data. The following source types are supported:
-delimited text files, MySQL, Oracle, SQL Server, ​and PostgreSQL. Below are connection+delimited text files, MySQL, Oracle, SQL Server, PostgreSQL, and Google BigQuery. Below are connection
 instructions for each data type of data source. Once you have entered the necessary instructions for each data type of data source. Once you have entered the necessary
 information,​ the “Test connection” button can ensure a connection can be made. information,​ the “Test connection” button can ensure a connection can be made.
Line 75: Line 75:
  
 **SQL Server** **SQL Server**
-  * //**Server location:​**//​ the name or IP address of the server running SQL Server+  * //**Server location:​**//​ the name or IP address of the server running SQL Server. You can also specify the port (ex: <​host>:<​port>​),​ which defaults to 1433. 
   * //**User name:**// name of the user used to log into the server. Optionally, the domain can be specified as <​domain>/<​user>​ (e.g. '​MyDomain/​Joe'​)   * //**User name:**// name of the user used to log into the server. Optionally, the domain can be specified as <​domain>/<​user>​ (e.g. '​MyDomain/​Joe'​)
   * //​**Password:​**//​ password for the supplied user name   * //​**Password:​**//​ password for the supplied user name
Line 90: Line 90:
  
 **PostgreSQL** **PostgreSQL**
 +
   * //**Server location:​**//​ this field contains the host name and database name (<​host>/<​database>​)   * //**Server location:​**//​ this field contains the host name and database name (<​host>/<​database>​)
   * //**User name:**// name of the user used to log into the server   * //**User name:**// name of the user used to log into the server
   * //​**Password:​**//​ password for the supplied user name   * //​**Password:​**//​ password for the supplied user name
   * //​**Database name:**// this field contains the schema containing the source tables   * //​**Database name:**// this field contains the schema containing the source tables
 +
 +**Google BigQuery**
 +
 +Google BigQuery (GBQ) supports two different connection/​authentication methods -- application default credentials and service account authentication. The former method is considered more secure because it writes auditing events to stackdriver. The specific method used is determined by the arguments provided to the configuration panel as described below.
 +
 +
 +Authentication via application default credentials:​
 +
 +When using application default credentials authentication,​ you must run the following gcloud command in the user account only once: '​gcloud auth application-default login' (do not include the single quote characters). An application key is written to ~/​.config/​gcloud/​application_default_credentails.json.
 +
 +  * //**Server location:​**//​ name of the GBQ ProjectID
 +  * //**User name:**// not used
 +  * //​**Password:​**//​ not used
 +  * //​**Database name:**// data set name within ProjectID named in Server location field
 +
 +Authentication via service account credentials:​
 +  * //**Server location:​**//​ name of GBQ ProjectID
 +  * //**User name:**// OAuth service account email address
 +  * //​**Password:​**//​ OAuth private key path (file location of private key JSON file). Must be a valid full file pathname
 +  * //​**Database name:**// data set name within ProjectID named in Server location field
 +
  
 ==== Scanning a Database ==== ==== Scanning a Database ====
documentation/software/whiterabbit.1482415184.txt.gz · Last modified: 2016/12/22 13:59 by ericavoss