Subclassification of lung adenocarcinoma through comprehensive multi-omics data to benefit survival outcomes.
Wei, Jiayi; Wang, Xin; Guo, Hongping; Zhang, Ling; Shi, Yao; Wang, Xiao.
Affiliation
  • Wei J; Qingdao University, Qingdao, China.
  • Wang X; Qingdao University, Qingdao, China.
  • Guo H; Hubei Normal University, China.
  • Zhang L; Salk Institute for Biological Studies, La Jolla, CA, USA. Electronic address: linzhang@salk.edu.
  • Shi Y; Qingdao University, Qingdao, China. Electronic address: yshi@qdu.edu.cn.
  • Wang X; Qingdao University, Qingdao, China. Electronic address: xwang1020@qdu.edu.cn.
Comput Biol Chem ; 112: 108150, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39018587
ABSTRACT

OBJECTIVES:

Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer. Understanding the molecular mechanisms underlying tumor progression is of great clinical significance. This study aims to identify novel molecular markers associated with LUAD subtypes, with the goal of improving the precision of LUAD subtype classification. A further aim is to optimize the classification from the perspective of patient survival analysis.

MATERIALS AND METHODS:

We propose a comprehensive and robust feature-selection approach for LUAD classification. The proposed method integrates multi-omics data from The Cancer Genome Atlas (TCGA) and combines the max-relevance and min-redundancy (mRMR), least absolute shrinkage and selection operator (LASSO), and Boruta algorithms. The selected features were then used in six machine-learning classifiers: logistic regression, random forest, support vector machine, naive Bayes, k-nearest neighbor, and XGBoost.
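To illustrate the max-relevance and min-redundancy idea named above, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes absolute Pearson correlation as a stand-in for both relevance and redundancy (mutual information is the more common choice), and the feature names and values (`gene_a`, `gene_a_dup`, `gene_b`, `noise`) are hypothetical toy data, not TCGA measurements.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy mRMR-style selection (correlation-based variant, an assumption).

    At each step, pick the feature maximizing
        |corr(feature, labels)| - mean |corr(feature, already selected)|.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Sort candidates first so that ties are broken deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: 8 samples, binary subtype label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],      # strongly related to the label
    "gene_a_dup": [0, 0, 1, 1, 1, 1, 1, 1],  # relevant, but largely redundant with gene_a
    "gene_b": [0, 1, 0, 0, 1, 1, 0, 1],      # moderately relevant, weakly tied to gene_a
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],       # uncorrelated with the label
}

chosen = mrmr_select(features, labels, k=2)
print(chosen)
```

On this toy data the second pick is `gene_b` rather than `gene_a_dup`: the near-duplicate has higher relevance on its own, but its redundancy with the already selected `gene_a` outweighs it.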

RESULTS:

The proposed approach achieved an area under the receiver operating characteristic curve (AUC) of 0.9958 for logistic regression (LR). Notably, a composite model incorporating copy number, methylation, and RNA-sequencing data for expression of exons, genes, and miRNA mature strands surpassed models built on single-omics data or other multi-omics combinations in both accuracy and AUC. Survival analyses revealed that the support vector machine (SVM) classifier yielded the optimal classification, outperforming the classification provided by TCGA. To enhance model interpretability, SHapley Additive exPlanations (SHAP) values were used to elucidate the impact of each feature on the predictions. Gene Ontology (GO) enrichment analysis identified significant biological processes, molecular functions, and cellular components associated with LUAD subtypes.
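For readers unfamiliar with the AUC metric reported above, the following sketch shows one standard way to compute it for a binary problem: as the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc` and the four example scores are illustrative assumptions; they do not reproduce the study's data or its multi-class evaluation.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four samples.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(example_labels, example_scores)
print(auc)  # 0.75: three of the four positive-negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable on large cohorts.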

CONCLUSION:

In summary, our feature selection process, based on TCGA multi-omics data and combined with multiple machine learning classifiers, proficiently identifies molecular subtypes of lung adenocarcinoma and their corresponding significant genes. Our method could enhance the early detection and diagnosis of LUAD, expedite the development of targeted therapies and, ultimately, lengthen patient survival.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Adenocarcinoma of Lung / Lung Neoplasms Limits: Humans Language: English Journal: Comput Biol Chem Journal subject: BIOLOGY / MEDICAL INFORMATICS / CHEMISTRY Year: 2024 Document type: Article
...