Background
Observational data present an opportunity to identify patterns that can be used to develop discriminative, personalized health outcome risk models. The most widely used framework for developing such models requires experts to define a set of independent variables, which are then fed into a logistic or Cox regression. A disadvantage of this framework is that defining a different set of independent variables for each health outcome and dataset is time-consuming. Furthermore, the model is generally restricted to a small selection of variables and therefore ignores large quantities of data whose inclusion might improve risk prediction performance. For example, restricting the independent variables to those supported by current expert knowledge may exclude highly predictive variables simply because their relevance is not yet known.
The PatientLevelPrediction package developed in OHDSI (Observational Health Data Science and Informatics) instead uses an adaptive, data-driven approach that explores all the data to find the independent variables that are predictive of the outcome. This is accomplished by fitting a logistic regression model that includes a large number of independent variables but places a Laplace prior on each coefficient. The prior acts as a form of regularization (equivalent to a lasso, or L1, penalty) and shrinks many coefficients to exactly zero, which limits overfitting. The independent variables that retain non-zero coefficients are the ones the model has selected as predictive of the health outcome. If such a framework performs well, it could be applied efficiently across the network of observational data available to the OHDSI community to develop risk models for many health outcomes. An added advantage of lasso logistic regression is that it learns a sparse, easy-to-interpret model, so it may also yield new medical insight by highlighting previously unknown risk factors.
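The selection mechanism described above can be illustrated with a small sketch. Maximizing the likelihood under a Laplace prior on each coefficient is equivalent to minimizing the log-loss plus a penalty proportional to the sum of absolute coefficient values. A proximal gradient step then sets weak coefficients exactly to zero. This is a toy Python illustration only, not the PatientLevelPrediction implementation. The helper names (`fit_lasso_logistic`, `simulate`), the simulated cohort, the penalty value `lam`, and the step size are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrinks z toward zero, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=1500):
    """Hypothetical helper: minimise mean log-loss + lam * sum_j |w_j| by proximal
    gradient descent (ISTA). The intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    ybar = sum(y) / n
    # Start the intercept at the observed log-odds of the outcome.
    b = math.log(ybar / (1.0 - ybar)) if 0.0 < ybar < 1.0 else 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual of the current prediction: predicted probability minus label.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        # Plain gradient step on the unpenalised intercept ...
        b -= step * grad_b / n
        # ... and a gradient step followed by soft-thresholding on the coefficients,
        # which is where coefficients are shrunk exactly to zero.
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam) for j in range(p)]
    return w, b


def simulate(n=400, p=5, seed=42):
    """Hypothetical toy cohort: p binary covariates, only the first two affect the outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(p)]
        logit = -1.0 + 4.0 * x[0] - 4.0 * x[1]
        X.append(x)
        y.append(1.0 if rng.random() < sigmoid(logit) else 0.0)
    return X, y


X, y = simulate()
w, b = fit_lasso_logistic(X, y, lam=0.06)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected covariates:", selected)
```

In this toy setting the covariates with no real effect should end with coefficients of exactly zero, while the two informative covariates are kept with shrunken coefficients. A larger `lam` (a tighter prior) zeroes out more coefficients. In practice the penalty strength is usually tuned, for example by cross-validation, rather than fixed by hand.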
In the study described here, we aim to provide a proof of principle for the PatientLevelPrediction package and to evaluate the performance and robustness of its risk prediction models across a range of health outcomes and datasets. We also aim to show that the package can be easily deployed in a distributed research network to develop risk models using different observational datasets.
Full Protocol: Word doc for the protocol
Initial Proposal Date:
Launch Date: TBD
Study Closure Date: TBD
Results Submission: Email
Tables Accessed: CONCEPT, CONCEPT_ANCESTOR, CONDITION_ERA, CONDITION_OCCURRENCE, DRUG_ERA, DRUG_EXPOSURE, MEASUREMENT, OBSERVATION, OBSERVATION_PERIOD, PERSON, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE
Database Dialects: SQL Server, Postgres, Oracle, PDW
Software: SQL and R