projects:workgroups:patient-level_prediction:best-practice

Differences

This shows you the differences between two versions of the page.

projects:workgroups:patient-level_prediction:best-practice [2016/05/04 08:23] jreps [Best practices]
projects:workgroups:patient-level_prediction:best-practice [2016/05/04 15:43] (current) prijnbeek [Best practices]
Line 11:
 ===== Best practices =====
  
-**Data characterisation and cleaning**: Before modelling it is important to characterize the cohorts, for example by looking at the prevalence of certain covariates. Tools are being developed in the community to facilitate this. A data cleaning step is recommend, e.g. remove outliers in lab values.
+**Data characterisation and cleaning**: Before modelling it is important to characterize the cohorts, for example by looking at the prevalence of certain covariates. Tools are being developed in the community to facilitate this. A data cleaning step is recommended, e.g. remove outliers in lab values.
  
 **Dealing with missing values**: A best practice still needs to be established.
Line 17:
 **Feature construction and selection**: Both feature construction and selection should be completely transparent and use a standardised approach, so that the modelling can be repeated and the model can be applied to unseen data.
  
-**Inclusion and exclusion criteria** should be made explicit. It is recommended to do sensitivity analyses not he choices made. Visualisation tools could help and this will be further explored in the WG.
+**Inclusion and exclusion criteria** should be made explicit. It is recommended to do sensitivity analyses on the choices made. Visualisation tools could help and this will be further explored in the WG.
  
 **Model development** is done using a split-sample approach. The percentage used for training could depend on the number of cases, but as a rule of thumb an 80/20 split is recommended. Hyper-parameter tuning should only be done on the training set.
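
The split-sample recommendation can be illustrated with a minimal Python sketch. This is not the working group's implementation: the function names `split_sample` and `cv_folds`, the stratification by outcome, the use of cross-validation folds for hyper-parameter selection, and the toy outcome vector are all assumptions made for illustration. It shows an 80/20 split and folds for hyper-parameter selection that are drawn only from the training portion, so the 20% test set is never touched during tuning.

```python
import random


def split_sample(outcomes, train_fraction=0.8, seed=42):
    """Split person indices into train and test sets, stratified by outcome."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(outcomes)):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def cv_folds(train_idx, k=5, seed=42):
    """Yield (fit, validate) index lists drawn only from the training set."""
    idx = list(train_idx)
    random.Random(seed).shuffle(idx)
    for fold in range(k):
        validate = idx[fold::k]
        held_out = set(validate)
        fit = [i for i in idx if i not in held_out]
        yield fit, validate


# Hypothetical outcome vector: 1 = case, 0 = non-case.
outcomes = [1] * 10 + [0] * 90
train_idx, test_idx = split_sample(outcomes)

# Hyper-parameter candidates would be compared on these folds only;
# test_idx is reserved for the final evaluation.
folds = list(cv_folds(train_idx, k=5))
```

Stratifying by outcome keeps the proportion of cases similar in both parts, which matters when cases are rare; whether that is appropriate for a given study is a design choice, not something the text above prescribes.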
projects/workgroups/patient-level_prediction/best-practice.1462350239.txt.gz · Last modified: 2016/05/04 08:23 by jreps