Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures.

Faucheux, Lilith; Resche-Rigon, Matthieu; Curis, Emmanuel; Soumelis, Vassili; Chevret, Sylvie

Affiliation

Faucheux L; Université de Paris, Sorbonne Paris Cité, ECSTRRA Team, INSERM UMR1153, Paris, France.
Resche-Rigon M; Université de Paris, Sorbonne Paris Cité, INSERM U976, Paris, France.
Curis E; Université de Paris, Sorbonne Paris Cité, ECSTRRA Team, INSERM UMR1153, Paris, France.
Soumelis V; Service de Biostatistique et Information Médicale, AP-HP, Hôpital Saint-Louis, Paris, France.
Chevret S; Service de Biostatistique et Information Médicale, AP-HP, Hôpital Saint-Louis, Paris, France.

Biom J; 63(2): 372-393, 2021 Feb.

Article in English | MEDLINE | ID: mdl-32627864

ABSTRACT

Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
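The general idea described in the abstract (impute left-censored values several times, cluster each imputed dataset with its own estimated number of clusters, then pool the partitions) can be illustrated with a minimal sketch. This is not the authors' algorithm: the truncated-normal imputation, the plain k-means, the silhouette criterion for choosing the number of clusters, the co-association matrix used for pooling, and all function and parameter names (`impute_left_censored`, `consensus_clustering`, `lods`, `m`, `k_max`) are assumptions chosen for illustration. Censored values are assumed to be coded as NaN.

```python
import numpy as np
from statistics import NormalDist


def impute_left_censored(x, lod, rng):
    """Fill NaN entries (assumed to be values below the detection limit
    `lod`) with draws from a normal distribution truncated above at `lod`."""
    x = np.asarray(x, dtype=float).copy()
    censored = np.isnan(x)
    if not censored.any():
        return x
    observed = x[~censored]
    # Crude assumption: mean and sd are estimated from observed values only.
    dist = NormalDist(observed.mean(), observed.std(ddof=1))
    p_lod = dist.cdf(lod)  # probability mass below the detection limit
    u = np.clip(rng.uniform(0.0, p_lod, size=int(censored.sum())), 1e-12, None)
    x[censored] = [dist.inv_cdf(p) for p in u]  # inverse-CDF sampling
    return x


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd k-means with random initial centers; returns labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def silhouette(X, labels):
    """Mean silhouette width; returns -1 when fewer than two clusters."""
    ks = np.unique(labels)
    if len(ks) < 2:
        return -1.0
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette taken as 0
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())


def consensus_clustering(X_cens, lods, m=10, k_max=4, seed=0):
    """Impute m times, pick the number of clusters per imputed dataset,
    and pool partitions into a co-association matrix."""
    rng = np.random.default_rng(seed)
    n, p = X_cens.shape
    coassoc = np.zeros((n, n))
    chosen_k = []
    for _ in range(m):
        X = np.column_stack(
            [impute_left_censored(X_cens[:, j], lods[j], rng) for j in range(p)]
        )
        candidates = [kmeans(X, k, rng) for k in range(2, k_max + 1)]
        best = max(candidates, key=lambda lab: silhouette(X, lab))
        chosen_k.append(len(np.unique(best)))
        coassoc += best[:, None] == best[None, :]
    return coassoc / m, chosen_k
```

Entry (i, j) of the returned matrix is the fraction of imputed datasets in which observations i and j fall in the same cluster. A final partition could then be derived from it, for example by leaving observations with no stable co-membership unclassified; the rule used in the paper is not stated in the abstract.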

Subjects

Algorithms; Cluster Analysis; Computer Simulation; Humans

Keywords

breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms Limit: Humans Language: En Journal: Biom J Publication year: 2021 Document type: Article Affiliation country: France