Results 1 - 10 of 10
1.
Proteins ; 89(10): 1277-1288, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33993559

ABSTRACT

There is a close relationship between the tertiary structure and the function of a protein. One of the important steps in determining the tertiary structure is protein secondary structure prediction (PSSP). For this reason, predicting secondary structure with higher accuracy will give valuable information about the tertiary structure. Recently, deep learning techniques have obtained promising improvements in several machine learning applications, including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSI-BLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model, IGPRED, obtained Q3 accuracies of 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% on the CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.


Subjects
Computational Biology/methods ; Protein Conformation ; Proteins/chemistry ; Deep Learning ; Neural Networks, Computer
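
The Q3 accuracy reported in the abstract above is the percentage of residues whose three-state secondary structure label (helix, strand, coil) is predicted correctly. A minimal sketch of that metric follows; the function name and the toy label strings are illustrative assumptions, not data or code from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    if not observed:
        raise ValueError("label sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical toy labels (H = helix, E = strand, C = coil).
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEC"
print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")  # one mismatch in ten residues
```

In practice the score would be computed over all residues of a benchmark set such as CASP12 rather than over a single short sequence.
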
2.
PeerJ Comput Sci ; 10: e1981, 2024.
Article in English | MEDLINE | ID: mdl-38660198

ABSTRACT

Background: In today's world, numerous applications integral to various facets of daily life include automatic speech recognition methods. Thus, the development of a successful automatic speech recognition system can significantly augment the convenience of people's daily routines. While many automatic speech recognition systems have been established for widely spoken languages like English, there has been insufficient progress in developing such systems for less common languages such as Turkish. Moreover, due to its agglutinative structure, designing a speech recognition system for Turkish presents greater challenges compared to other language groups. Therefore, our study focused on proposing deep learning models for automatic speech recognition in Turkish, complemented by the integration of a language model. Methods: In our study, deep learning models were formulated by incorporating convolutional neural networks, gated recurrent units, long short-term memories, and transformer layers. The Zemberek library was employed to craft the language model to improve system performance. Furthermore, the Bayesian optimization method was applied to fine-tune the hyper-parameters of the deep learning models. To evaluate the model's performance, standard metrics widely used in automatic speech recognition systems, specifically word error rate and character error rate scores, were employed. Results: When optimal hyper-parameters are applied to models developed with various layers, the scores are as follows. Without the use of a language model, the Turkish Microphone Speech Corpus dataset yields a word error rate of 22.2 and a character error rate of 14.05, while the Turkish Speech Corpus dataset yields a word error rate of 11.5 and a character error rate of 4.15. Upon incorporating the language model, notable improvements were observed. Specifically, for the Turkish Microphone Speech Corpus dataset, the word error rate decreased to 9.85 and the character error rate to 5.35. Similarly, for the Turkish Speech Corpus dataset, the word error rate improved to 8.4 and the character error rate to 2.7. These results demonstrate that our model outperforms the studies found in the existing literature.
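
Word error rate (WER) and character error rate (CER), the two metrics used above, are conventionally defined as the Levenshtein edit distance between reference and hypothesis divided by the reference length, counted in words or in characters. The sketch below follows that standard definition and reports percentages, which is an assumption about the scale of the scores in the abstract; the function names and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, relative to the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Hypothetical transcript pair: the recognizer dropped one word.
reference = "hava cok guzel bugun"
hypothesis = "hava guzel bugun"
print(f"WER = {wer(reference, hypothesis):.2f}")
print(f"CER = {cer(reference, hypothesis):.2f}")
```

Because both rates are normalized by the reference length only, a hypothesis with many insertions can produce a score above 100.
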

3.
Diagnostics (Basel) ; 14(13)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-39001254

ABSTRACT

BACKGROUND: Diabetic retinopathy (DR) is a prevalent microvascular complication of diabetes mellitus, and early detection is crucial for effective management. Metabolomics profiling has emerged as a promising approach for identifying potential biomarkers associated with DR progression. This study aimed to develop a hybrid explainable artificial intelligence (XAI) model for targeted metabolomics analysis of patients with DR, utilizing a focused approach to identify specific metabolites exhibiting varying concentrations among individuals without DR (NDR), those with non-proliferative DR (NPDR), and individuals with proliferative DR (PDR) who have type 2 diabetes mellitus (T2DM). METHODS: A total of 317 T2DM patients, including 143 NDR, 123 NPDR, and 51 PDR cases, were included in the study. Serum samples underwent targeted metabolomics analysis using liquid chromatography and mass spectrometry. Several machine learning models, including Support Vector Machines (SVC), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Multilayer Perceptrons (MLP), were implemented as solo models and in a two-stage ensemble hybrid approach. The models were trained and validated using 10-fold cross-validation. SHapley Additive exPlanations (SHAP) were employed to interpret the contributions of each feature to the model predictions. Statistical analyses were conducted using the Shapiro-Wilk test for normality, the Kruskal-Wallis H test for group differences, and the Mann-Whitney U test with Bonferroni correction for post-hoc comparisons. RESULTS: The hybrid SVC + MLP model achieved the highest performance, with an accuracy of 89.58%, a precision of 87.18%, an F1-score of 88.20%, and an F-beta score of 87.55%. SHAP analysis revealed that glucose, glycine, and age were consistently important features across all DR classes, while creatinine and various phosphatidylcholines exhibited higher importance in the PDR class, suggesting their potential as biomarkers for severe DR. CONCLUSION: The hybrid XAI models, particularly the SVC + MLP ensemble, demonstrated superior performance in predicting DR progression compared to solo models. The application of SHAP facilitates the interpretation of feature importance, providing valuable insights into the metabolic and physiological markers associated with different stages of DR. These findings highlight the potential of hybrid XAI models combined with explainable techniques for early detection, targeted interventions, and personalized treatment strategies in DR management.
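
The F1 and F-beta scores quoted in the results above are both weighted harmonic means of precision and recall. The abstract does not state the recall or the value of beta, so the sketch below uses made-up inputs and beta = 2 purely as assumptions to show how the two scores relate.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the F1-score.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical precision and recall, not values from the study.
precision, recall = 0.80, 0.60
print(f"F1 = {f_beta(precision, recall):.4f}")
print(f"F2 = {f_beta(precision, recall, beta=2.0):.4f}")
```
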

4.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1104-1113, 2023.
Article in English | MEDLINE | ID: mdl-35849663

ABSTRACT

Protein secondary structure, solvent accessibility and torsion angle predictions are preliminary steps in predicting the 3D structure of a protein. Deep learning approaches have achieved significant improvements in predicting various features of protein structure. In this study, IGPRED-Multitask, a deep learning model with a multi-task learning architecture based on a deep inception network, a graph convolutional network and a bidirectional long short-term memory, is proposed. Moreover, the hyper-parameters of the model are fine-tuned using Bayesian optimization, which is faster and more effective than grid search. The same benchmark test data sets as in the OPUS-TASS paper, including TEST2016, TEST2018, CASP12, CASP13, CASPFM, HARD68, CAMEO93 and CAMEO93_HARD, as well as the same train and validation sets, are used for fair comparison with the literature. Statistically significant improvements are observed in secondary structure prediction on 4 datasets, in phi angle prediction on 2 datasets and in psi angle prediction on 3 datasets compared to the state-of-the-art methods. For solvent accessibility prediction, only the TEST2016 and TEST2018 datasets are used to assess the performance of the proposed model.


Subjects
Deep Learning ; Neural Networks, Computer ; Solvents/chemistry ; Bayes Theorem ; Proteins/chemistry
5.
Metabolites ; 13(12)2023 Dec 18.
Article in English | MEDLINE | ID: mdl-38132885

ABSTRACT

Diabetic retinopathy (DR), a common ocular microvascular complication of diabetes, contributes significantly to diabetes-related vision loss. This study addresses the imperative need for early diagnosis of DR and precise treatment strategies based on the explainable artificial intelligence (XAI) framework. The study integrated clinical, biochemical, and metabolomic biomarkers associated with the following classes: non-DR (NDR), non-proliferative diabetic retinopathy (NPDR), and proliferative diabetic retinopathy (PDR) in type 2 diabetes (T2D) patients. To create machine learning (ML) models, the data were split into a validation set (10%) and a discovery set (90%). The validation dataset was used for the hyperparameter optimization and feature selection stages, while the discovery dataset was used to measure the performance of the models. A 10-fold cross-validation technique was used to evaluate the performance of the ML models. Biomarker discovery was performed using minimum redundancy maximum relevance (mRMR), Boruta, and explainable boosting machine (EBM). The proposed predictive framework compares the results of eXtreme Gradient Boosting (XGBoost), natural gradient boosting for probabilistic prediction (NGBoost), and EBM models in determining the DR subclass. The hyperparameters of the models were optimized using Bayesian optimization. Combining EBM feature selection with XGBoost, the optimal model achieved (91.25 ± 1.88)% accuracy, (89.33 ± 1.80)% precision, (91.24 ± 1.67)% recall, (89.37 ± 1.52)% F1-score, and (97.00 ± 0.25)% area under the ROC curve (AUROC). According to the EBM explanation, the six most important biomarkers in determining the course of DR were tryptophan (Trp), phosphatidylcholine diacyl C42:2 (PC.aa.C42.2), butyrylcarnitine (C4), tyrosine (Tyr), hexadecanoyl carnitine (C16) and total dimethylarginine (DMA). The identified biomarkers may provide a better understanding of the progression of DR, paving the way for more precise and cost-effective diagnostic and treatment strategies.
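
The 10-fold cross-validation mentioned above partitions the samples into ten disjoint folds and uses each fold once for testing while the remaining nine are used for training. The sketch below shows only the index bookkeeping, using the standard library. It is an unstratified split with an arbitrary sample count and seed, all of which are assumptions; the study's own splitting procedure is not described in the abstract.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 100 samples.
for fold_number, (train, test) in enumerate(k_fold_indices(100), start=1):
    print(f"fold {fold_number}: {len(train)} train / {len(test)} test")
```

Averaging a metric over the ten test folds, and reporting its spread, gives figures in the "mean ± deviation" form used in the abstract.
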

6.
Diagnostics (Basel) ; 13(18)2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37761316

ABSTRACT

Obesity is the excessive accumulation of adipose tissue in the body that leads to health risks. The study aimed to classify obesity levels using a tree-based machine-learning approach considering physical activity and nutritional habits. Methods: The current study employed an observational design, collecting data from a public dataset via a web-based survey to assess eating habits and physical activity levels. The data included gender, age, height, weight, family history of being overweight, dietary patterns, physical activity frequency, and more. Data preprocessing involved addressing class imbalance using Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC) and feature selection using Recursive Feature Elimination (RFE). Three classification algorithms (logistic regression (LR), random forest (RF), and Extreme Gradient Boosting (XGBoost)) were used for obesity level prediction, and Bayesian optimization was employed for hyperparameter tuning. The performance of different models was evaluated using metrics such as accuracy, recall, precision, F1-score, area under the curve (AUC), and precision-recall curve. The LR model showed the best performance across most metrics, followed by RF and XGBoost. Feature selection improved the performance of LR and RF models, while XGBoost's performance was mixed. The study contributes to the understanding of obesity classification using machine-learning techniques based on physical activity and nutritional habits. The LR model demonstrated the most robust performance, and feature selection was shown to enhance model efficiency. The findings underscore the importance of considering both physical activity and nutritional habits in addressing the obesity epidemic.

7.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1909-1918, 2022.
Article in English | MEDLINE | ID: mdl-33476272

ABSTRACT

Behçet's Disease (BD) is a multi-system inflammatory disorder in which the etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes on thousands of genes should be analyzed. Besides, there is a need for extra analysis to find out which genetic factor affects the disease. Machine learning approaches have high potential for extracting the knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods, incorporating biological information and aimed to develop a machine-learning model for diagnosing Behçet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.


Subjects
Behcet Syndrome ; Polymorphism, Single Nucleotide ; Behcet Syndrome/diagnosis ; Behcet Syndrome/genetics ; Genetic Predisposition to Disease/genetics ; Humans ; Polymorphism, Single Nucleotide/genetics
8.
NPJ Digit Med ; 4(1): 53, 2021 Mar 19.
Article in English | MEDLINE | ID: mdl-33742069

ABSTRACT

Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).
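
The AUROC figure used above to benchmark PD status prediction can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The labels and scores are toy values, not challenge data, and the quadratic loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: 1 = PD, 0 = control.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(f"AUROC = {auroc(labels, scores):.2f}")  # 3 of 4 pairs ranked correctly
```
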

9.
J Microbiol Methods ; 177: 106045, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32890569

ABSTRACT

The accurate identification of lactobacilli is essential for the effective management of industrial practices associated with lactobacilli strains, such as the production of fermented foods or probiotic supplements. For this reason, in this study, we proposed the Multi Fragment Melting Analysis System (MFMAS)-lactobacilli, based on high resolution melting (HRM) analysis of multiple DNA regions that have high interspecies heterogeneity, for fast and reliable identification and characterization of lactobacilli. The MFMAS-lactobacilli is a new and customized version of the MFMAS, which was developed by our research group. MFMAS-lactobacilli is a combined system that consists of i) a ready-to-use plate, which is designed for multiple HRM analysis, and ii) data analysis software, which is used to characterize lactobacilli species by incorporating machine learning techniques. Simultaneous HRM analysis of multiple DNA fragments yields a fingerprint for each tested strain, and the identification is performed by comparing the fingerprints of unknown strains with those of known lactobacilli species registered in the MFMAS. In this study, a total of 254 isolates, which were recovered from fermented foods and probiotic supplements, were subjected to MFMAS analysis, and the results were confirmed by a combination of different molecular techniques. All of the analyzed isolates were differentiated and accurately identified by applying the single-step procedure of MFMAS, and it was determined that the tested isolates belonged to 18 different lactobacilli species. The individual analysis of each target DNA region provided identification with an accuracy ranging from 59% to 90% for all tested isolates. However, when all target DNA regions were analyzed simultaneously, perfect discrimination and 100% accurate identification were obtained, even in closely related species. As a result, it was concluded that MFMAS-lactobacilli is a multi-purpose method that can be used to differentiate, classify, and identify lactobacilli species. Hence, our proposed system could be a potential alternative to overcome the inconsistencies and difficulties of the current methods.


Subjects
Bacteriological Techniques/methods ; DNA, Bacterial/analysis ; Lactobacillus/genetics ; Lactobacillus/isolation & purification ; Food Microbiology ; Genes, Bacterial/genetics ; Logistic Models ; Machine Learning ; Polymerase Chain Reaction/methods ; Probiotics ; Sequence Analysis, DNA ; Software
10.
J Bioinform Comput Biol ; 16(5): 1850020, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30353781

ABSTRACT

Secondary structure and solvent accessibility prediction provide valuable information for estimating the three-dimensional structure of a protein. As new feature extraction methods are developed, the dimensionality of the input feature space increases steadily. Reducing the number of dimensions provides several advantages, such as faster model training, faster prediction and noise elimination. In this work, several dimensionality reduction techniques, including various feature selection methods, autoencoders and PCA, have been employed for protein secondary structure and solvent accessibility prediction. The reduced feature set is used to train a support vector machine at the second stage of a hybrid classifier. Cross-validation experiments on two difficult benchmarks demonstrate that the dimension of the input space can be reduced substantially while maintaining the prediction accuracy. This will enable the incorporation of additional informative features derived for predicting the structural properties of proteins without reducing the accuracy due to overfitting.


Subjects
Computational Biology/methods ; Proteins/chemistry ; Solvents/chemistry ; Algorithms ; Neural Networks, Computer ; Principal Component Analysis ; Protein Structure, Secondary ; Reproducibility of Results ; Support Vector Machine