Proof of concept study for large-scale patient-level predictive modelling in the OHDSI data network

Objective: The objective of the large-scale patient-level predictive modelling study is to develop models using five commonly used classifiers for a single ‘at risk’ cohort (the pharmaceutically treated depression cohort) and 22 ‘outcome’ cohorts. The study is implemented across the OHDSI collaborator network to externally validate the models and assess their transportability across the world.

Rationale: Observational Health Data Sciences and Informatics (OHDSI) holds the promise of making massive-scale, patient-specific predictive modeling a reality. Storing the data in the common data model (CDM) enables uniform and transparent analysis. The large standardized populations contain rich data for building highly predictive large-scale models and also provide an immediate opportunity to serve large communities of patients who are most in need of improved quality of care. Effective exploitation of these massive datasets to develop patient-level prediction models demands a standardized pipeline for both model development and evaluation.

A patient-level prediction problem is defined by three elements: an ‘at risk’ cohort (the group of people for whom we wish to make the prediction), an ‘outcome’ cohort (the outcome we wish to predict) and an ‘at risk’ period (a time window relative to the index date of the ‘at risk’ cohort). At present, patient-level prediction models exist for only a limited number of conditions, and little is known about the feasibility of using observational databases to develop clinically useful patient-level prediction models at scale (for all suitable ‘at risk’ and ‘outcome’ cohort pairs). We want to fill this gap by using the observational databases to identify a very large number of ‘at risk’ and ‘outcome’ pairs and to develop prediction models for all of these pairs. This study is the start of that challenging but extremely interesting journey.
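
The three-part problem definition above can be illustrated with a minimal Python sketch. It is not the study's implementation: the class PredictionProblem, the function label_patients, the outcome name, the 1 to 365 day window and the toy dates are all hypothetical and chosen only to show how an ‘at risk’ period turns cohort entry dates and outcome dates into a binary label per patient.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class PredictionProblem:
    """Hypothetical container for the three parts of a prediction problem."""
    at_risk_cohort: str     # name of the 'at risk' cohort
    outcome_cohort: str     # name of the 'outcome' cohort (illustrative)
    risk_window_start: int  # days after the index date, inclusive (assumed)
    risk_window_end: int    # days after the index date, inclusive (assumed)


def label_patients(
    problem: PredictionProblem,
    index_dates: Dict[int, date],
    outcome_dates: Dict[int, List[date]],
) -> Dict[int, int]:
    """Return 1 for patients with an outcome inside the at-risk period, else 0."""
    labels = {}
    for patient_id, index_date in index_dates.items():
        window_start = index_date + timedelta(days=problem.risk_window_start)
        window_end = index_date + timedelta(days=problem.risk_window_end)
        events = outcome_dates.get(patient_id, [])
        labels[patient_id] = int(
            any(window_start <= event <= window_end for event in events)
        )
    return labels


# Toy data, invented for illustration only.
problem = PredictionProblem(
    at_risk_cohort="pharmaceutically treated depression",
    outcome_cohort="example outcome",
    risk_window_start=1,
    risk_window_end=365,
)
index_dates = {1: date(2015, 1, 1), 2: date(2015, 3, 1), 3: date(2015, 6, 1)}
outcome_dates = {1: [date(2015, 2, 15)], 2: [date(2016, 6, 1)]}

labels = label_patients(problem, index_dates, outcome_dates)
print(labels)
```

In this toy data, patient 1 has an outcome inside the window, patient 2 has one only after the window closes, and patient 3 has no recorded outcome. Repeating the same labelling for each ‘outcome’ cohort would yield one prediction problem per ‘at risk’ and ‘outcome’ pair.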

Project Lead(s): Peter Rijnbeek, Jenna Reps

Coordinating Institution(s): Erasmus MC Rotterdam, The Netherlands

Additional Participants (currently seeking additional collaborators):
Martijn Schuemie, Marc Suchard, Patrick Ryan

Full Protocol: Large-Scale Patient-Level Prediction Protocol

Initial Proposal Date: 2016-08-01

Launch Date: 2016-09-23

Receive Results for Analysis Date: TBD

Study Closure Date: TBD

Results Submission: Via the OHDSI Sharing module embedded in the study.


CDM: V5 only

Tables Accessed: person, drug_exposure, observations

Database Dialects: SQL Server, Postgres, Oracle

Software: R (>= 3.2.2, RTools), Java, Python (2.7)

Hardware: Recommended 8 cores, 64GB memory, 250GB free space

Datasets Run

  • CCAE (Janssen)
  • MDCD (Janssen)
  • MDCR (Janssen)
  • Optum (Janssen)

Datasets Running

  • Ajou University School of Medicine
  • Erasmus MC
  • Stanford
  • Columbia University
research/largescalepred.txt · Last modified: 2017/02/02 19:18 by prijnbeek