Model performance

How does model performance compare with clinician performance?

Our models outperform physicians in prognostic performance, based on existing studies of clinician predictive performance. Our models have achieved an area under the receiver operating characteristic curve (AUC) of 0.93-0.94, a significant improvement over the AUC of 0.65-0.75 reported for clinical ‘gut instinct’. Our models can be deployed into the hospital EMR to automatically risk-stratify every patient at the time of admission/discharge.
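For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. The sketch below computes it directly from that definition. The function name `auc_from_scores` and the toy data are illustrative assumptions, not part of our pipeline.

```python
def auc_from_scores(y_true, y_score):
    """Pairwise (Mann-Whitney) estimate of the AUC.

    y_true  : iterable of 0/1 outcomes (1 = event, e.g. death at 1 year)
    y_score : iterable of predicted probabilities or risk scores
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up toy data, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auc_from_scores(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of cases, so it suits small checks only. A rank-based implementation from a statistics library would be the usual choice for a full validation set.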

Our models should not be used to replace clinicians’ judgement or patient preferences. Instead, they are intended as a screening and risk-stratification tool that can increase the availability of serious illness conversations in appropriate clinical circumstances.

Figure 1. Left panel: The receiver operating characteristic (ROC) curve for the minSIA8 model. Right panel: The observed rate of death at 1 year within each of the 10 probability bins is plotted on the y-axis, and the predicted probability from the RF model is plotted on the x-axis. The dotted diagonal line represents a perfectly calibrated model. Each point on the graph represents one of the 10 probability bins. The bars delineate the 95% confidence intervals around the observed probability.

The models were developed and validated on a large population of hospitalised patients. The calibration curve, which measures how well the predicted probabilities track the observed probabilities in the validation set, is shown in Figure 1 (right panel). The minSIA8 model tracks the actual risk to within 10%. Using the model, 83% of deaths can be captured by reviewing the 20% of cases with the highest predicted risk.
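As a rough illustration of how a calibration curve of this kind can be built, the sketch below groups predictions into 10 probability bins and reports the observed event rate with a 95% confidence interval for each bin. It makes two assumptions that may differ from our actual analysis: the bins are equal-width, and the interval is a Wilson score interval. The function name `calibration_bins` and the toy data are hypothetical.

```python
import math


def calibration_bins(y_true, y_prob, n_bins=10, z=1.96):
    """Summarise calibration per probability bin.

    Assumes equal-width bins on [0, 1] and a Wilson score interval
    (z=1.96 gives roughly 95% coverage). Empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))

    rows = []
    for members in bins:
        n = len(members)
        if n == 0:
            continue
        observed = sum(y for y, _ in members) / n        # observed event rate
        mean_predicted = sum(p for _, p in members) / n  # mean predicted risk

        # Wilson score interval around the observed rate.
        denom = 1 + z * z / n
        centre = (observed + z * z / (2 * n)) / denom
        half = z * math.sqrt(observed * (1 - observed) / n + z * z / (4 * n * n)) / denom
        rows.append({
            "n": n,
            "mean_predicted": mean_predicted,
            "observed": observed,
            "ci_low": max(0.0, centre - half),
            "ci_high": min(1.0, centre + half),
        })
    return rows


# Made-up toy data, for illustration only.
toy_outcomes = [0, 0, 0, 1, 1, 1, 1, 0]
toy_probs = [0.05, 0.08, 0.12, 0.85, 0.91, 0.95, 0.55, 0.52]
for row in calibration_bins(toy_outcomes, toy_probs):
    print(row)
```

For a well-calibrated model, `observed` stays close to `mean_predicted` in every bin, which corresponds to points lying near the diagonal of the calibration plot.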

Figure 2: The recall plot shows the percentage of all cases in a given category that are captured (y-axis) when we apply the minSIA8 model and select the top k deciles of cases ranked by predicted probability (x-axis). For example, if the positivity threshold is set at the highest-ranking 20% of cases (by predicted probability), then 83% of true positives would be selected.
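The quantity on the y-axis of the recall plot can be sketched as follows: rank cases by predicted probability, keep the top fraction, and count the share of all true positives that fall inside it. The function name `recall_at_top_fraction` and the toy data are illustrative assumptions. The numbers printed come from the made-up data, not from the minSIA8 validation set.

```python
def recall_at_top_fraction(y_true, y_prob, fraction):
    """Share of all positive cases captured in the top `fraction` of cases
    when ranked by predicted probability (highest first)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("recall is undefined without positive cases")

    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    k = int(round(fraction * len(ranked)))  # number of cases flagged for review
    captured = sum(y for _, y in ranked[:k])
    return captured / total_positives


# Made-up toy data, for illustration only.
toy_outcomes = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_probs = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.60, 0.10, 0.05, 0.02]
for frac in (0.2, 0.4, 1.0):
    print(frac, recall_at_top_fraction(toy_outcomes, toy_probs, frac))
```

Evaluating the function at fractions 0.1, 0.2, ..., 1.0 gives the points of a decile-based recall curve.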