Review and evaluation of performance measures for survival prediction models in external validation settings.

Rahman, M Shafiqur; Ambler, Gareth; Choodari-Oskooei, Babak; Omar, Rumana Z

Affiliation

Rahman MS; Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh. shafiq@isrt.ac.bd.
Ambler G; Department of Statistical Science, University College London, London, UK.
Choodari-Oskooei B; Institute of Clinical Trials & Methodology, University College London, London, UK.
Omar RZ; Department of Statistical Science, University College London, London, UK.

BMC Med Res Methodol; 17(1): 60, 2017 Apr 18.

Article in English | MEDLINE | ID: mdl-28420338

ABSTRACT

BACKGROUND:

When developing a prediction model for survival data, it is essential to validate its performance in external validation settings using appropriate performance measures. Although a number of such measures have been proposed, there is only limited guidance regarding their use in the context of model validation. This paper reviews and evaluates a wide range of performance measures to provide guidance on their use in practice.

METHODS:

An extensive simulation study based on two clinical datasets was conducted to investigate the performance of the measures in external validation settings. Measures were selected from categories that assess the overall performance, discrimination and calibration of a survival prediction model. Some of these have been modified to allow their use with validation data, and a case study is provided to describe how these measures can be estimated in practice. The measures were evaluated with respect to their robustness to censoring and ease of interpretation. All measures are implemented, or are straightforward to implement, in statistical software.

RESULTS:

Most of the performance measures were reasonably robust to moderate levels of censoring. One exception was Harrell's concordance measure, which tended to increase as censoring increased.
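
To make the contrast between concordance estimators concrete, the following Python sketch computes Harrell's C and Uno's C on a toy dataset. It is a minimal illustration under stated assumptions, not the authors' code: it assumes a prognostic index `prog_idx` in which larger values mean higher risk, skips pairs with tied survival times for simplicity, and evaluates Uno's Kaplan-Meier censoring weights just before each event time, a common implementation choice that may differ from the one used in the paper. All function names and data are hypothetical. Harrell's C counts only pairs in which the shorter time is an observed event; Uno's C weights those same pairs by the inverse squared probability of remaining uncensored, which is intended to reduce its dependence on the censoring distribution.

```python
def harrell_c(time, event, prog_idx):
    """Harrell's C; a higher prognostic index should mean an earlier event."""
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is usable only if its shorter time is an observed event
        for j in range(n):
            if time[i] < time[j]:  # pairs with tied times are skipped in this sketch
                usable += 1
                if prog_idx[i] > prog_idx[j]:
                    concordant += 1.0
                elif prog_idx[i] == prog_idx[j]:
                    concordant += 0.5
    return concordant / usable


def censoring_km_before(time, event, t):
    """Kaplan-Meier estimate of the censoring survival function just before t, G(t-)."""
    g = 1.0
    for u in sorted(set(time)):
        if u >= t:
            break
        at_risk = sum(1 for x in time if x >= u)
        censored = sum(1 for x, d in zip(time, event) if x == u and not d)
        g *= 1.0 - censored / at_risk
    return g


def uno_c(time, event, prog_idx, tau):
    """Uno's C truncated at tau; each usable pair is weighted by G(T_i-)^-2."""
    num = 0.0
    den = 0.0
    n = len(time)
    for i in range(n):
        if not event[i] or time[i] >= tau:
            continue  # only observed events before the truncation time tau
        w = censoring_km_before(time, event, time[i]) ** -2
        for j in range(n):
            if time[i] < time[j]:
                den += w
                if prog_idx[i] > prog_idx[j]:
                    num += w
                elif prog_idx[i] == prog_idx[j]:
                    num += 0.5 * w
    return num / den


if __name__ == "__main__":
    # Hypothetical toy validation data: follow-up time, event indicator, PI
    time = [2, 3, 5, 7, 8]
    event = [1, 0, 1, 1, 0]
    prog_idx = [1.5, 0.2, 0.9, 0.1, 0.4]
    print(round(harrell_c(time, event, prog_idx), 3))  # prints 0.857
    print(round(uno_c(time, event, prog_idx, tau=10), 3))  # prints 0.81
```

The two estimates differ on this toy data only because of the censoring weights; five subjects say nothing general about bias. The nested loops are quadratic in the sample size and the censoring Kaplan-Meier curve is recomputed for every event, which is acceptable for illustration, but established survival software is preferable for real validation data.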

CONCLUSIONS:

We recommend that Uno's concordance measure be used to quantify concordance when there are moderate levels of censoring. Alternatively, Gönen and Heller's measure could be considered, especially if censoring is very high, but we suggest that the prediction model be re-calibrated first. We also recommend that Royston's D be routinely reported to assess discrimination, since it has an appealing interpretation. The calibration slope is useful in both internal and external validation settings, and we recommend reporting it routinely. We recommend using any of the predictive accuracy measures and providing the corresponding predictive accuracy curves. In addition, we recommend investigating the characteristics of the validation data, such as the level of censoring and the distribution of the prognostic index derived in the validation setting, before choosing the performance measures.
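
The calibration slope, Royston's D and the re-calibration step suggested before using Gönen and Heller's measure can all be sketched with a one-covariate Cox model. The Python below is a minimal illustration under stated assumptions, not the authors' code: it fits the Cox model by Newton-Raphson with Breslow handling of tied event times, approximates the expected normal order statistics (rankits) for Royston's D with Blom's formula, implements the basic unsmoothed form of Gönen and Heller's estimator, and treats re-calibration as multiplying the prognostic index by the estimated calibration slope. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def cox_coef_1d(time, event, x, iters=50, tol=1e-10):
    """Fit a one-covariate Cox model by Newton-Raphson, Breslow handling of ties.

    Assumes the MLE is finite and x varies within risk sets (otherwise info == 0).
    """
    beta = 0.0
    for _ in range(iters):
        score = 0.0
        info = 0.0
        for i in range(len(time)):
            if not event[i]:
                continue  # only observed events contribute to the partial likelihood
            s0 = s1 = s2 = 0.0
            for j in range(len(time)):
                if time[j] >= time[i]:  # subject j is still at risk at time[i]
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info  # Newton-Raphson update on the partial log-likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta


def calibration_slope(time, event, prog_idx):
    """Cox coefficient of the development-model PI in the validation data (1 is ideal)."""
    return cox_coef_1d(time, event, prog_idx)


def royston_d(time, event, prog_idx):
    """Royston's D: Cox coefficient on Blom rankits of the PI, times kappa = sqrt(8/pi)."""
    n = len(prog_idx)
    order = sorted(range(n), key=lambda k: prog_idx[k])  # tied PI values broken by position
    rankit = [0.0] * n
    for rank, k in enumerate(order, start=1):
        rankit[k] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    kappa = math.sqrt(8 / math.pi)
    return kappa * cox_coef_1d(time, event, rankit)


def gonen_heller(prog_idx):
    """Unsmoothed Gonen-Heller concordance estimate; depends on the PI only."""
    n = len(prog_idx)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(prog_idx[i] - prog_idx[j])
            if d > 0:  # pairs with tied PI contribute 0 in this unsmoothed form
                total += 1.0 / (1.0 + math.exp(-d))
    return 2.0 * total / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical toy validation data: follow-up time, event indicator, PI
    time = [2, 3, 5, 7, 8]
    event = [1, 0, 1, 1, 0]
    prog_idx = [1.5, 0.2, 0.9, 0.1, 0.4]
    slope = calibration_slope(time, event, prog_idx)
    print("calibration slope:", slope)
    print("Royston's D:", royston_d(time, event, prog_idx))
    # Re-calibrate the PI by the fitted slope before Gonen and Heller's measure
    print("Gonen-Heller:", gonen_heller([slope * p for p in prog_idx]))
```

A slope near 1 suggests the prognostic index is well calibrated in the validation data. Royston's D can be read approximately as the log hazard ratio between two groups split at the median prognostic index, assuming the index is roughly normally distributed. Because Gönen and Heller's estimator uses only the prognostic index and never the observed outcomes, any miscalibration feeds straight into it, which is why re-calibrating first matters.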

Subject(s)

Models, Biological; Models, Statistical; Survival Analysis; Breast Neoplasms; Cardiomyopathy, Hypertrophic; Computer Simulation; Datasets as Topic; Humans; Validation Studies as Topic

Keywords

Prognostic model; Survival analysis; Validation; calibration; discrimination

Collection: 01-internacional | Database: MEDLINE | Main subject: Survival Analysis / Models, Statistical / Models, Biological | Study type: Evaluation_studies / Guideline / Prognostic_studies / Risk_factors_studies | Limit: Humans | Language: English | Journal: BMC Med Res Methodol | Journal subject: MEDICINE | Year: 2017 | Document type: Article | Country of affiliation: Bangladesh