Results 1 - 6 of 6
1.
BMC Bioinformatics ; 25(1): 56, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38308205

ABSTRACT

BACKGROUND: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). RESULTS: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression, adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's Kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with class imbalance. CONCLUSIONS: Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. In the oversampling study, on the other hand, random forest and boosting methods showed better overall prediction performance than penalized methods.


Subject(s)
Algorithms , Genome-Wide Association Study , Humans , Bayes Theorem , Machine Learning , Republic of Korea/epidemiology
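
Several of the evaluation metrics named in the abstract above (precision, recall, F1-score, Cohen's Kappa, balanced accuracy, error rate, Matthews correlation coefficient) can be derived from a single binary confusion matrix. The following is a minimal sketch using their standard definitions, not the authors' code; the function name and the example counts are hypothetical.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    both are predicted at least once); no zero-division handling is done.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cohen_kappa": kappa,
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "mcc": mcc,
    }

# Hypothetical counts for an imbalanced data set (40 cases, 160 controls).
metrics = binary_metrics(tp=30, fp=20, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The two area-under-curve measures are omitted because they require ranked prediction scores rather than a single confusion matrix.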
2.
BMC Med Inform Decis Mak ; 24(1): 178, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38915008

ABSTRACT

OBJECTIVE: This study aimed to develop and validate a quantitative index system for evaluating the data quality of Electronic Medical Records (EMR) in disease risk prediction using Machine Learning (ML). MATERIALS AND METHODS: The index system was developed in four steps: (1) a preliminary index system was outlined based on a literature review; (2) we utilized the Delphi method to structure the indicators at all levels; (3) the weights of these indicators were determined using the Analytic Hierarchy Process (AHP) method; and (4) the developed index system was empirically validated using real-world EMR data in an ML-based disease risk prediction task. RESULTS: The synthesis of review findings and the expert consultations led to the formulation of a three-level index system with four first-level, 11 second-level, and 33 third-level indicators. The weights of these indicators were obtained through the AHP method. Results from the empirical analysis illustrated a positive relationship between the scores assigned by the proposed index system and the predictive performances of the datasets. DISCUSSION: The proposed index system for evaluating EMR data quality is grounded in extensive literature analysis and expert consultation. Moreover, the system's high reliability and suitability have been affirmed through empirical validation. CONCLUSION: The novel index system offers a robust framework for assessing the quality and suitability of EMR data in ML-based disease risk predictions. It can serve as a guide in building EMR databases, improving EMR data quality control, and generating reliable real-world evidence.


Subject(s)
Data Accuracy , Electronic Health Records , Machine Learning , Electronic Health Records/standards , Humans , Risk Assessment/standards , Delphi Technique
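
The abstract above states that indicator weights were obtained with the AHP method but does not say how the priorities were derived. As an illustration only, the sketch below uses the row geometric-mean approximation and Saaty's consistency ratio. The 4x4 pairwise comparison matrix (one row per first-level indicator) is entirely hypothetical, as are the function names.

```python
import math

# Saaty's random consistency index for matrices of size 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Return CR = CI / RI; values below 0.1 are conventionally acceptable."""
    n = len(matrix)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]

# Hypothetical expert judgments: entry [i][j] says how much more important
# indicator i is than indicator j on Saaty's 1-9 scale.
pairwise = [
    [1,     2,     3,     4],
    [1 / 2, 1,     2,     3],
    [1 / 3, 1 / 2, 1,     2],
    [1 / 4, 1 / 3, 1 / 2, 1],
]

weights = ahp_weights(pairwise)
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(consistency_ratio(pairwise, weights), 3))
```

In a multi-level system such as the one described, the same calculation would be repeated for each group of sibling indicators, and a lower-level weight multiplied by its parent's weight to obtain a global weight.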
3.
Stud Health Technol Inform ; 310: 1021-1025, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269969

ABSTRACT

Coronary artery disease (CAD) has the highest disease burden worldwide. To manage this burden, predictive models are required to screen patients for preventative treatment. A range of variables have been explored for their capacity to predict disease, including phenotypic (age, sex, BMI and smoking status), medical imaging (carotid artery thickness) and genotypic. We use machine learning models and the UK Biobank cohort to measure the prediction capacity of these 3 variable categories, both in combination and in isolation. We demonstrate that phenotypic variables from the Framingham risk score have the best prediction capacity, although a combination of phenotypic, medical imaging and genotypic variables delivers the most specific models. Furthermore, we demonstrate that Variant Spark, a random forest-based GWAS platform, performs effective feature selection for SNP-based genotype variables, identifying 115 SNPs significantly associated with the CAD phenotype.


Subject(s)
Coronary Artery Disease , Humans , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/genetics , Carotid Intima-Media Thickness , Phenotype , Genotype , Machine Learning
4.
Fundam Res ; 4(4): 752-760, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39156563

ABSTRACT

The potential to identify individuals at high disease risk solely based on genotype data has garnered significant interest. Although widely applied, traditional polygenic risk scoring (PRS) methods fall short, as they are built on additive models that fail to capture the intricate associations among single nucleotide polymorphisms (SNPs). This presents a limitation, as genetic diseases often arise from complex interactions between multiple SNPs. To address this challenge, we developed DeepRisk, a biological knowledge-driven deep learning method for modeling these complex, nonlinear associations among SNPs, to provide a more effective method for scoring the risk of common diseases with genome-wide genotype data. Evaluations demonstrated that DeepRisk outperforms existing PRS-based methods in identifying individuals at high risk for four common diseases: Alzheimer's disease, inflammatory bowel disease, type 2 diabetes, and breast cancer.
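
For context, the additive model that the abstract above contrasts with DeepRisk can be written as a weighted sum of risk-allele dosages. The sketch below shows that baseline only; it is not DeepRisk, and the effect sizes and genotype are hypothetical values chosen for illustration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive polygenic risk score: sum of effect size times allele dosage.

    dosages      -- risk-allele counts per SNP (0, 1 or 2)
    effect_sizes -- per-SNP weights, e.g. log odds ratios from a GWAS
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))

# Hypothetical effect sizes for three SNPs and one individual's genotype.
effect_sizes = [0.20, -0.10, 0.05]
genotype = [2, 1, 0]

score = polygenic_risk_score(genotype, effect_sizes)
print(f"PRS = {score:.2f}")
```

Because each SNP contributes independently of the others, changing the dosage at one SNP shifts the score by the same amount regardless of the genotype at any other SNP, which is why such a model cannot represent SNP-SNP interactions.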

5.
Comput Biol Med ; 178: 108763, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38889629

ABSTRACT

Current disease risk prediction models have many parameters and are too complex to run smoothly on mobile terminals such as tablets and mobile phones in imaginative elderly care application scenarios. To further reduce the number of model parameters and enable a disease risk prediction model to run smoothly on mobile terminals, we designed a model called Motico (An Attention Mechanism Network Model for Image Data Classification). To protect image features during implementation, we designed an image data preprocessing method and an attention mechanism network model for image data classification. The Motico model's parameter size is only 5.26 MB, and it occupies only 135.69 MB of memory. In the experiment, the accuracy of disease risk prediction was 96%, the precision was 97%, the recall was 93%, the specificity was 98%, the F1 score was 95%, and the AUC was 95%. This result shows that the Motico model can perform classification prediction on mobile terminals using the attention mechanism network for image data classification.


Subject(s)
Aging , Humans , Aged , Aging/physiology , Female , Image Processing, Computer-Assisted/methods , Male , Aged, 80 and over
6.
Front Genet ; 15: 1409755, 2024.
Article in English | MEDLINE | ID: mdl-38993480

ABSTRACT

This research aims to advance the detection of Chronic Kidney Disease (CKD) through a novel gene-based predictive model, leveraging recent breakthroughs in gene sequencing. We sourced and merged gene expression profiles of CKD-affected renal tissues from the Gene Expression Omnibus (GEO) database, dividing them into training and validation sets in a 7:3 ratio. The training set included 141 CKD and 33 non-CKD specimens, while the validation set had 60 and 14, respectively. The disease risk prediction model was constructed using the training dataset, while the validation dataset confirmed the model's identification capabilities. The development of our predictive model began with evaluating differentially expressed genes (DEGs) between the two groups. Using Lasso and random forest (RF) methods, we isolated six genes (DUSP1, GADD45B, IFI44L, IFI30, ATF3, and LYZ) that are critical in differentiating CKD from non-CKD tissues. We refined our RF model through 10-fold cross-validation, repeated five times, to optimize the mtry parameter. The performance of our model was robust, with an average AUC of 0.979 across the folds, translating to a 91.18% accuracy. Validation tests further confirmed its efficacy, with a 94.59% accuracy and an AUC of 0.990. External validation using dataset GSE180394 yielded an AUC of 0.913, 89.83% accuracy, and a sensitivity of 0.889, underscoring the model's reliability. In summary, the study identified critical genetic biomarkers and successfully developed a novel disease risk prediction model for CKD. This model can serve as a valuable tool for CKD risk assessment and contribute significantly to CKD identification.
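
The resampling scheme mentioned in the abstract above, 10-fold cross-validation repeated five times, produces 50 train/test splits of the training set. The sketch below generates such splits for the reported class counts (141 CKD, 33 non-CKD). Stratifying the folds by class is an assumption made here because the classes are imbalanced; the abstract does not state it. The function name is hypothetical, and model fitting and mtry tuning are left out.

```python
import random

def repeated_stratified_kfold(labels, n_splits=10, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for repeated stratified k-fold CV."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal indices round-robin so each fold keeps roughly the class ratio.
            for position, index in enumerate(shuffled):
                folds[position % n_splits].append(index)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(
                index
                for j, fold in enumerate(folds)
                if j != k
                for index in fold
            )
            yield train, test

# Class counts of the training set as reported in the abstract.
labels = [1] * 141 + [0] * 33

splits = list(repeated_stratified_kfold(labels))
print("number of splits:", len(splits))
print("first test fold size:", len(splits[0][1]))
```

A tuning loop would fit one model per candidate mtry value on each training split, score it on the matching test fold, and keep the value with the best average score over all splits.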
