A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis.

Borah, Kasmika; Das, Himanish Shekhar; Seth, Soumita; Mallick, Koushik; Rahaman, Zubair; Mallik, Saurav

Affiliation

Borah K; Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India.
Das HS; Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India. himanish.das@cottonuniversity.ac.in.
Seth S; Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata, 700150, West Bengal, India.
Mallick K; Department of Computer Science and Engineering, RCC Institute of Information Technology, Canal S Rd, Beleghata, Kolkata, 700015, West Bengal, India.
Rahaman Z; Vitas Healthcare, Kissimmee, FL, USA.
Mallik S; Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, 02115, USA. smallik@hsph.harvard.edu.

Funct Integr Genomics ; 24(5): 139, 2024 Aug 19.

Article in English | MEDLINE | ID: mdl-39158621

ABSTRACT

Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. High-dimensional NGS data, characterized by a large number of genomic, transcriptomic, proteomic, and metagenomic features relative to the number of biological samples, present significant challenges both for reducing feature dimensionality and for downstream analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Both feature selection and feature extraction methods can be categorized as statistical or machine learning methods. The present study conducts a comprehensive and comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for interpreting human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The techniques surveyed here, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review is intended to help readers apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
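The distinction between feature selection (keeping a subset of the original features) and feature extraction (deriving new features from all of them) can be illustrated with a minimal sketch. The variance filter and PCA used below are generic textbook examples chosen for illustration, not methods taken from the review; the function names, the random data, and the sizes (20 samples, 5,000 features) are hypothetical assumptions mimicking the "many features, few samples" shape of expression matrices.

```python
import numpy as np

def select_top_variance(X, k):
    """Feature selection (a simple statistical filter, chosen for illustration):
    keep the k ORIGINAL columns with the highest variance."""
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])  # indices of top-k columns
    return X[:, keep], keep

def pca_extract(X, k):
    """Feature extraction: project centred data onto its first k principal
    components (via SVD), giving k NEW features that mix all original ones."""
    Xc = X - X.mean(axis=0)
    # Economy SVD: Xc = U @ diag(S) @ Vt; rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Hypothetical data: many more features than samples (p >> n)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5000))

X_sel, kept = select_top_variance(X, 50)
X_pca = pca_extract(X, 10)
print(X_sel.shape, X_pca.shape)  # (20, 50) (20, 10)
```

The selected matrix still contains identifiable original features (the column indices in `kept`), which helps interpretation, whereas each PCA component is a weighted combination of all 5,000 inputs; this trade-off between interpretability and compactness is one of the contrasts the review discusses.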

Subject(s)

High-Throughput Nucleotide Sequencing; Machine Learning; Humans; High-Throughput Nucleotide Sequencing/methods; Deep Learning

Keywords

Dimensionality Reduction; Feature Extraction; Feature Selection; Next Generation Sequencing data

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: High-Throughput Nucleotide Sequencing / Machine Learning | Limit: Humans | Language: English | Journal: Funct Integr Genomics | Journal subject: Molecular Biology / Genetics | Year: 2024 | Document type: Article | Country of affiliation: India