Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES).

Choi, Yongjun; Cha, Junho; Choi, Sungkyoung

Affiliation

Choi Y; Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.
Cha J; Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.
Choi S; Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea. day0413@hanyang.ac.kr.

BMC Bioinformatics; 25(1): 56, 2024 Feb 02.

Article in English | MEDLINE | ID: mdl-38308205

ABSTRACT

BACKGROUND:

Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES).

RESULTS:

First, single-nucleotide polymorphisms (SNPs) were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator (LASSO), elastic net, smoothly clipped absolute deviation (SCAD), support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve (AUC), precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient (MCC), and area under the precision-recall curve. Additionally, three oversampling algorithms were used to address class imbalance.
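Most of the threshold-dependent metrics listed above derive from the 2×2 confusion matrix of predicted versus observed case status, while AUC and the area under the precision-recall curve are computed from continuous predicted scores. The following is a minimal, dependency-free Python sketch of those standard definitions, not the authors' code; the function names, the 1 = case / 0 = control coding, the 0.5 cutoff, and the toy data are illustrative assumptions. The precision-recall area is summarized here as average precision (a step-wise sum over distinct score thresholds), which is one common estimator.

```python
import math


def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for labels coded 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true, y_pred):
    """Metrics that depend on hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)            # sensitivity
    specificity = _ratio(tn, tn + fp)
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the row and column marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "kappa": _ratio(accuracy - p_chance, 1 - p_chance),
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Area under the precision-recall curve as a step-wise sum over distinct thresholds."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this score enter the predicted-positive set together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


# Hypothetical toy data: observed status and predicted risk scores.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # hypothetical 0.5 cutoff
metrics = threshold_metrics(y_true, y_pred)
metrics["auc"] = roc_auc(y_true, scores)
metrics["auprc"] = average_precision(y_true, scores)
```

For real analyses, scikit-learn's `sklearn.metrics` module provides tested equivalents such as `roc_auc_score`, `average_precision_score`, `cohen_kappa_score`, `balanced_accuracy_score`, and `matthews_corrcoef`. Balanced accuracy, kappa, MCC, and the precision-recall area are generally more informative than raw accuracy when cases are rare, which is why they suit imbalanced case-control data.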

CONCLUSIONS:

Our results show that penalized methods achieved better predictive performance for asthma than machine learning methods. In the oversampling study, however, random forest and boosting methods showed better overall predictive performance than penalized methods.
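The abstract does not name the three oversampling algorithms used in the oversampling study. SMOTE (synthetic minority oversampling technique) is a widely used one, so it serves here only as an assumed illustration of how oversampling rebalances cases and controls, not as a claim about the authors' pipeline. In this pure-Python sketch, the function names `smote` and `balance_with_smote`, the Euclidean distance, the default `k`, and the convention that cases (label 1) are the minority class are all hypothetical choices.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and one
    of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()                  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Add synthetic cases (label 1, assumed to be the minority class) until
    cases and controls are equal in number."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = len(y) - len(minority)
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new, y + [1] * len(new)


# Hypothetical toy data: 3 cases near the origin, 5 controls far away.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0], [7.0, 7.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_with_smote(X, y, k=2, seed=1)
```

Oversampling should be applied only to training folds, never to held-out test data, or the evaluation metrics become optimistic. Note also that SNP genotypes are usually coded as 0/1/2 allele counts, so plain interpolation yields non-integer dosages; variants of SMOTE for categorical features (such as SMOTE-NC) or rounding are possible alternatives.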

Subject(s)

Algorithms; Genome-Wide Association Study; Humans; Bayes Theorem; Machine Learning; Republic of Korea/epidemiology

Keywords

Asthma; Disease risk prediction model; Ensemble methods; GWAS; Genome-wide association study; KoGES; Korean Genome and Epidemiology Study; Large-scale genetic data; Machine learning methods; Oversampling; Penalized methods

Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Genome-Wide Association Study | Study type: Prognostic_studies / Risk_factors_studies / Screening_studies | Limit: Humans | Country/Region as subject: Asia | Language: English | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2024 | Document type: Article | Country of affiliation: South Korea