A comparative study of clustering methods on gene expression data for lung cancer prognosis.
BMC Res Notes; 16(1): 319, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37941025
Lung cancer subtyping based on gene expression data is important for identifying patient subgroups with differing survival prognosis, which can support treatment strategies tailored to each subtype. Unsupervised clustering methods are the traditional approach for grouping patients into subtypes. However, because these methods cluster patients using gene expression data alone, the resulting clusters may not be relevant to the survival outcome of interest. In recent years, semi-supervised and supervised methods have been proposed that leverage survival outcome data to identify clusters more relevant to survival prognosis. This paper compares the performance of different clustering methods for identifying clinically prognostic lung cancer subtypes on two lung adenocarcinoma datasets. For each method, we clustered patients into two clusters and assessed the difference in patient survival time between the clusters. In most cases, unsupervised methods produced large log-rank p-values and no significant results. Semi-supervised and supervised methods outperformed unsupervised methods and yielded highly significant p-values. These results indicate that, in most cases, unsupervised methods cannot identify clusters with significant differences in survival prognosis, whereas supervised and semi-supervised methods are better able to cluster patients into clinically useful subtypes.
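The evaluation described above (split patients into two clusters, then test whether survival differs between them) can be illustrated with a minimal sketch. The abstract does not name the specific clustering methods or software used, so this assumes plain k-means with k = 2 as a stand-in for an unsupervised method and the standard two-group log-rank statistic for the survival comparison. The function names `two_means` and `logrank` are hypothetical, and supervised or semi-supervised methods are not shown: they would differ in how the cluster labels are produced, not in this test.

```python
import math
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(data: Sequence[Sequence[float]], max_iter: int = 100) -> List[int]:
    """Hypothetical unsupervised step: k-means with k = 2 on expression
    profiles (one row per patient). Uses only expression data, no survival."""
    # Deterministic farthest-point initialisation: first patient, then the
    # patient farthest from it.
    c0 = list(data[0])
    c1 = list(max(data, key=lambda row: sq_dist(row, c0)))
    centers = [c0, c1]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
                      for r in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(data, labels) if lab == k]
            if members:  # keep the old centre if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def logrank(times: Sequence[float], events: Sequence[int],
            groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. events: 1 = death observed, 0 = censored;
    groups: cluster label 0 or 1. Returns (chi-square statistic, p-value)."""
    if not (len(times) == len(events) == len(groups)):
        raise ValueError("times, events and groups must have equal length")
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:  # patient still at risk at time t
                n += 1
                n1 += int(gi == 1)
                if ti == t and ei:  # death observed at time t
                    d += 1
                    d1 += int(gi == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

In use, the labels returned by any clustering method would be passed as `groups` together with each patient's survival time and event indicator; a large p-value corresponds to the abstract's finding that the two clusters do not differ significantly in survival.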
Collection: 01-internacional
Database: MEDLINE
Main subject: Lung Neoplasms
Limits: Humans
Language: English
Journal: BMC Res Notes
Year: 2023
Document type: Article