Search | VHL Regional Portal

1.

SB-Net: Synergizing CNN and LSTM networks for uncovering retrosynthetic pathways in organic synthesis.

Mir, Bilal Ahmad; Tayara, Hilal; Chong, Kil To.

Comput Biol Chem ; 112: 108130, 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38954849

ABSTRACT

Retrosynthesis is vital in synthesizing target products, guiding reaction pathway design crucial for drug and material discovery. Current models often neglect multi-scale feature extraction, limiting efficacy in leveraging molecular descriptors. Our proposed SB-Net model, a deep-learning architecture tailored for retrosynthesis prediction, addresses this gap. SB-Net combines CNN and Bi-LSTM architectures, excelling in capturing multi-scale molecular features. It integrates parallel branches for processing one-hot encoded descriptors and ECFP, merging through dense layers. Experimental results demonstrate SB-Net's superiority, achieving 73.6% top-1 and 94.6% top-10 accuracy on USPTO-50k data. Versatility is validated on MetaNetX, with rates of 52.8% top-1, 74.3% top-3, 79.8% top-5, and 83.5% top-10. SB-Net's success in bioretrosynthesis prediction tasks indicates its efficacy. This research advances computational chemistry, offering a robust deep-learning model for retrosynthesis prediction. With implications for drug discovery and synthesis planning, SB-Net promises innovative and efficient pathways.
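The top-1 and top-10 figures quoted above are top-k accuracies. As a minimal sketch of how such a metric is usually computed (the function name and the SMILES strings below are hypothetical and are not taken from the SB-Net paper):

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of targets found among the first k ranked candidates."""
    if not targets or len(ranked_predictions) != len(targets):
        raise ValueError("need one non-empty ranked candidate list per target")
    hits = sum(
        1 for ranked, target in zip(ranked_predictions, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


# Hypothetical reactant candidates as SMILES strings, best guess first.
ranked = [
    ["CCO.CC(=O)O", "CCO.CC(=O)Cl", "CC=O"],
    ["c1ccccc1Br", "c1ccccc1I", "c1ccccc1Cl"],
    ["CN", "CCN", "CCCN"],
]
truth = ["CCO.CC(=O)O", "c1ccccc1Cl", "CCCCN"]

top1 = top_k_accuracy(ranked, truth, 1)  # only the first product is solved at rank 1
top3 = top_k_accuracy(ranked, truth, 3)  # the second product is recovered at rank 3
```

A real evaluation would typically canonicalize the SMILES strings before comparing them; that step is omitted here.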

2.

A graph neural network approach for predicting drug susceptibility in the human microbiome.

Rehman, Mobeen Ur; Hussain, Irfan; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 179: 108729, 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-38955124

ABSTRACT

Recent studies have illuminated the critical role of the human microbiome in maintaining health and influencing the pharmacological responses of drugs. Clinical trials, encompassing approximately 150 drugs, have unveiled interactions with the gastrointestinal microbiome, resulting in the conversion of these drugs into inactive metabolites. It is imperative to explore the field of pharmacomicrobiomics during the early stages of drug discovery, prior to clinical trials. To achieve this, the utilization of machine learning and deep learning models is highly desirable. In this study, we have proposed graph-based neural network models, namely GCN, GAT, and GINCOV models, utilizing the SMILES dataset of drug microbiome. Our primary objective was to classify the susceptibility of drugs to depletion by gut microbiota. Our results indicate that the GINCOV surpassed the other models, achieving impressive performance metrics, with an accuracy of 93% on the test dataset. This proposed Graph Neural Network (GNN) model offers a rapid and efficient method for screening drugs susceptible to gut microbiota depletion and also encourages the improvement of patient-specific dosage responses and formulations.

3.

Advancing Peptide-Based Cancer Therapy with AI: In-Depth Analysis of State-of-the-Art AI Models.

Bhattarai, Sadik; Tayara, Hilal; Chong, Kil To.

J Chem Inf Model ; 2024 Jun 14.

Article in English | MEDLINE | ID: mdl-38874445

ABSTRACT

Anticancer peptides (ACPs) play a vital role in selectively targeting and eliminating cancer cells. Evaluating and comparing predictions from various machine learning (ML) and deep learning (DL) techniques is challenging but crucial for anticancer drug research. We conducted a comprehensive analysis of 15 ML and 10 DL models, including the models released after 2022, and found that support vector machines (SVMs) with feature combination and selection significantly enhance overall performance. DL models, especially convolutional neural networks (CNNs) with light gradient boosting machine (LGBM) based feature selection approaches, demonstrate improved characterization. Assessment using a new test data set (ACP10) identifies ACPred, MLACP 2.0, AI4ACP, mACPred, and AntiCP2.0_AAC as successive optimal predictors, showcasing robust performance. Our review underscores current prediction tool limitations and advocates for an omnidirectional ACP prediction framework to propel ongoing research.

4.

NaII-Pred: An ensemble-learning framework for the identification and interpretation of sodium ion inhibitors by fusing multiple feature representation.

Hassan, Mir Tanveerul; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 178: 108737, 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38879934

ABSTRACT

High-affinity ligand peptides for ion channels are essential for controlling the flow of ions across the plasma membrane. These peptides are now being investigated as potential therapeutics for a variety of illnesses, including cancer and cardiovascular disease. Consequently, the identification and interpretation of ligand peptide inhibitors that control ion flow across cells have become pivotal. In this work, we developed an ensemble-based model, NaII-Pred, for the identification of sodium ion inhibitors. The ensemble model was trained, tested, and evaluated on three different datasets. The NaII-Pred method employs six different descriptors and a hybrid feature set in conjunction with five conventional machine learning classifiers to create 35 baseline models. Through an ensemble approach, the top five baseline models trained on the hybrid feature set were integrated to yield the final predictive model, NaII-Pred. Our proposed model, NaII-Pred, outperforms the baseline models and the current predictors on both datasets. We believe NaII-Pred will play a critical role in screening and identifying potential sodium ion inhibitors and will be an invaluable tool.

5.

An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction.

Pratiwi, Nor Kumalasari Caecar; Tayara, Hilal; Chong, Kil To.

Int J Mol Sci ; 25(11)2024 May 29.

Article in English | MEDLINE | ID: mdl-38892144

ABSTRACT

In this study, we present an innovative approach to improve the prediction of protein-protein interactions (PPIs) through the utilization of an ensemble classifier, specifically focusing on distinguishing between native and non-native interactions. Leveraging the strengths of various base models, including random forest, gradient boosting, extreme gradient boosting, and light gradient boosting, our ensemble classifier integrates these diverse predictions using a logistic regression meta-classifier. Our model was evaluated using a comprehensive dataset generated from molecular dynamics simulations. While the gains in AUC and other metrics might seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of various approaches, we compared the performance of logistic regression to four baseline models. Our results indicate that logistic regression consistently underperforms across all evaluated metrics. This suggests that it may not be well-suited to capture the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handling datasets effectively and incorporating regularizations to avoid over-fitting. Our findings indicate that the ensemble method enhances the predictive capability of PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.

Subject(s)

Molecular Dynamics Simulation , Protein Interaction Mapping , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Protein Binding , Logistic Models

6.

PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction.

Jeevan, Kandel; Palistha, Shrestha; Tayara, Hilal; Chong, Kil T.

J Cheminform ; 16(1): 66, 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38849917

ABSTRACT

Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.
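The DCA success rate mentioned above can be illustrated with a short sketch. It assumes the common convention that a prediction counts as a success when the predicted site center lies within 4 Å of any ligand atom. The abstract does not state this threshold, so both the threshold and the coordinates below are assumptions.

```python
import math


def dca_success_rate(predicted_centers, ligand_atoms, threshold=4.0):
    """Share of structures whose predicted site center lies within
    `threshold` angstroms of at least one ligand atom."""
    if not predicted_centers or len(predicted_centers) != len(ligand_atoms):
        raise ValueError("need one ligand atom list per predicted center")
    successes = 0
    for center, atoms in zip(predicted_centers, ligand_atoms):
        if any(math.dist(center, atom) <= threshold for atom in atoms):
            successes += 1
    return successes / len(predicted_centers)


# Hypothetical coordinates in angstroms for two protein-ligand complexes.
centers = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
ligands = [
    [(1.0, 1.0, 1.0), (8.0, 0.0, 0.0)],  # nearest atom about 1.7 angstroms away
    [(0.0, 0.0, 0.0)],                   # nearest atom about 17.3 angstroms away
]

rate = dca_success_rate(centers, ligands)
```

Benchmarks often score only the top-n predicted sites per structure; this sketch scores a single predicted center per structure for brevity.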

7.

AMPred-CNN: Ames mutagenicity prediction model based on convolutional neural networks.

Tran, Thi Tuyet Van; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 176: 108560, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38754218

ABSTRACT

Mutagenicity assessment plays a pivotal role in the safety evaluation of chemicals, pharmaceuticals, and environmental compounds. In recent years, the development of robust computational models for predicting chemical mutagenicity has gained significant attention, driven by the need for efficient and cost-effective toxicity assessments. In this paper, we proposed AMPred-CNN, an innovative Ames mutagenicity prediction model based on Convolutional Neural Networks (CNNs), uniquely employing molecular structures as images to leverage CNNs' powerful feature extraction capabilities. The study employs the widely used benchmark mutagenicity dataset from Hansen et al. for model development and evaluation. Comparative analyses with traditional ML models on different molecular features reveal substantial performance enhancements. AMPred-CNN outshines these models, demonstrating superior accuracy, AUC, F1 score, MCC, sensitivity, and specificity on the test set. Notably, AMPred-CNN is further benchmarked against seven recent ML and DL models, consistently showcasing superior performance with an impressive AUC of 0.954. Our study highlights the effectiveness of CNNs in advancing mutagenicity prediction, paving the way for broader applications in toxicology and drug development.

Subject(s)

Mutagenicity Tests , Mutagens , Neural Networks, Computer , Mutagens/toxicity

8.

Integrated Computational Approaches for Drug Design Targeting Cruzipain.

Parvez, Aiman; Lee, Jeong-Sang; Alam, Waleed; Tayara, Hilal; Chong, Kil To.

Int J Mol Sci ; 25(7)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38612558

ABSTRACT

Cruzipain inhibitors are sought as medications to treat Chagas disease because of the need for safer, more effective treatments. Trypanosoma cruzi is the source of cruzipain, a crucial cysteine protease that has driven interest in using computational methods to create more effective inhibitors. We employed a 3D-QSAR model, using a dataset of 36 known inhibitors, and a pharmacophore model to identify potential inhibitors for cruzipain. We also built a deep learning model using the DeepPurpose library, trained on 204 active compounds, and validated it with a specific test set. During a comprehensive screening of the DrugBank database of 8533 molecules, pharmacophore and deep learning models identified 1012 and 340 drug-like molecules, respectively. These molecules were further evaluated through molecular docking, followed by induced-fit docking. Ultimately, molecular dynamics simulation was performed for the final potent inhibitors that exhibited strong binding interactions. These results present four novel cruzipain inhibitors that can inhibit the cruzipain protein of T. cruzi.

Subject(s)

Chagas Disease , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protozoan Proteins , Chagas Disease/drug therapy , Drug Design

9.

Stack-AAgP: Computational prediction and interpretation of anti-angiogenic peptides using a meta-learning framework.

Gaffar, Saima; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 174: 108438, 2024 May.

Article in English | MEDLINE | ID: mdl-38613893

ABSTRACT

BACKGROUND: Angiogenesis plays a vital role in the pathogenesis of several human diseases, particularly in the case of solid tumors. In the realm of cancer treatment, recent investigations into peptides with anti-angiogenic properties have yielded encouraging outcomes, thereby creating a hopeful therapeutic avenue for the treatment of cancer. Therefore, correctly identifying the anti-angiogenic peptides is extremely important in comprehending their biophysical and biochemical traits, laying the groundwork for uncovering novel drugs to combat cancer. METHODS: In this work, we present a novel ensemble-learning-based model, Stack-AAgP, specifically designed for the accurate identification and interpretation of anti-angiogenic peptides (AAPs). Initially, a feature representation approach is employed, generating 24 baseline models through six machine learning algorithms (random forest [RF], extra tree classifier [ETC], extreme gradient boosting [XGB], light gradient boosting machine [LGBM], CatBoost, and SVM) and four feature encodings (pseudo-amino acid composition [PAAC], amphiphilic pseudo-amino acid composition [APAAC], composition of k-spaced amino acid pairs [CKSAAP], and quasi-sequence-order [QSOrder]). Subsequently, the output (predicted probabilities) from 24 baseline models was inputted into the same six machine-learning classifiers to generate their respective meta-classifiers. Finally, the meta-classifiers were stacked together using the ensemble-learning framework to construct the final predictive model. RESULTS: Findings from the independent test demonstrate that Stack-AAgP outperforms the state-of-the-art methods by a considerable margin. Systematic experiments were conducted to assess the influence of hyperparameters on the proposed model. 
Our model, Stack-AAgP, was evaluated on the independent NT15 dataset, revealing superiority over existing predictors with an accuracy improvement ranging from 5% to 7.5% and an increase in Matthews Correlation Coefficient (MCC) from 7.2% to 12.2%.
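The count of 24 baseline models follows from pairing each of the six algorithms with each of the four feature encodings. The sketch below only enumerates that grid and shows how one peptide's 24 predicted probabilities could be ordered into a meta-feature vector. The naming scheme and the helper function are hypothetical, and no models are trained here.

```python
from itertools import product

# Algorithms and encodings as listed in the abstract.
ALGORITHMS = ["RF", "ETC", "XGB", "LGBM", "CatBoost", "SVM"]
ENCODINGS = ["PAAC", "APAAC", "CKSAAP", "QSOrder"]

# One baseline model per (algorithm, encoding) pair: 6 x 4 = 24.
baseline_models = [f"{algo}+{enc}" for algo, enc in product(ALGORITHMS, ENCODINGS)]


def meta_feature_row(probabilities):
    """Order one peptide's baseline probabilities into a fixed-length vector
    that a meta-classifier could consume."""
    return [probabilities[name] for name in baseline_models]


# Hypothetical predicted probabilities for a single peptide.
example_probabilities = {name: 0.5 for name in baseline_models}
row = meta_feature_row(example_probabilities)
```

Each of the six meta-classifiers described in the abstract would then be trained on rows of this form, one row per peptide.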

Subject(s)

Angiogenesis Inhibitors , Machine Learning , Angiogenesis Inhibitors/therapeutic use , Humans , Peptides/chemistry , Computational Biology/methods , Algorithms

10.

Harnessing machine learning to predict cytochrome P450 inhibition through molecular properties.

Zahid, Hamza; Tayara, Hilal; Chong, Kil To.

Arch Toxicol ; 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38619593

ABSTRACT

Cytochrome P450 enzymes are a superfamily of enzymes responsible for the metabolism of a variety of medicines and xenobiotics. Among the Cytochrome P450 family, five isozymes (1A2, 2C9, 2C19, 2D6, and 3A4) are most important for the metabolism of xenobiotics. Inhibition of any of these five CYP isozymes causes drug-drug interactions with high pharmacological and toxicological effects. The prediction of inhibition or non-inhibition of these isozymes is therefore of great importance. Many techniques based on machine learning and deep learning algorithms are currently being used to predict whether these isozymes will be inhibited or not. In this study, three different molecular or substructural representations (Morgan, combined MACCS and Morgan, and RDKit) of the various molecules are used to train a distinct SVM model against each isozyme (1A2, 2C9, 2C19, 2D6, and 3A4). On the independent dataset, Morgan fingerprints provided the best results, while MACCS and Morgan (combined) achieved comparable results in terms of balanced accuracy (BA), sensitivity (Sn), and Matthews correlation coefficient (MCC). For the Morgan fingerprints, balanced accuracies (BA), Matthews correlation coefficients (MCC), and sensitivities (Sn) against each CYP isozyme (1A2, 2C9, 2C19, 2D6, and 3A4) on the independent dataset ranged between 0.81 and 0.85, 0.61 and 0.70, and 0.72 and 0.83, respectively. Similarly, on the independent dataset, MACCS and Morgan (combined) fingerprints achieved competitive results in terms of balanced accuracies (BA), Matthews correlation coefficients (MCC), and sensitivities (Sn) against each CYP isozyme (1A2, 2C9, 2C19, 2D6, and 3A4), which ranged between 0.79 and 0.85, 0.59 and 0.69, and 0.69 and 0.82, respectively.
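The three reported metrics can all be derived from a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical and are not taken from the study.

```python
import math


def inhibition_metrics(tp, fp, tn, fn):
    """Sensitivity, balanced accuracy, and Matthews correlation coefficient
    from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall on inhibitors
    specificity = tn / (tn + fp)   # recall on non-inhibitors
    balanced_accuracy = (sensitivity + specificity) / 2
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sn": sensitivity, "BA": balanced_accuracy, "MCC": mcc}


# Hypothetical counts for one isozyme on an independent test set.
metrics = inhibition_metrics(tp=80, fp=15, tn=85, fn=20)
```

Balanced accuracy and MCC are commonly preferred over plain accuracy when inhibitors and non-inhibitors are unevenly represented, which is one plausible reason the abstract reports them.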

11.

An integrative machine learning model for the identification of tumor T-cell antigens.

Hassan, Mir Tanveerul; Tayara, Hilal; Chong, Kil To.

Biosystems ; 237: 105177, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38458346

ABSTRACT

The escalating global incidence of cancer poses significant health challenges, underscoring the need for innovative and more efficacious treatments. Cancer immunotherapy, a promising approach leveraging the body's immune system against cancer, emerges as a compelling solution. Consequently, the identification and characterization of tumor T-cell antigens (TTCAs) have become pivotal for exploration. In this manuscript, we introduce TTCA-IF, an integrative machine learning-based framework designed for TTCAs identification. TTCA-IF employs ten feature encoding types in conjunction with five conventional machine learning classifiers. To establish a robust foundation, these classifiers are trained, resulting in the creation of 150 baseline models. The outputs from these baseline models are then fed back into the five classifiers, generating their respective meta-models. Through an ensemble approach, the five meta-models are seamlessly integrated to yield the final predictive model, the TTCA-IF model. Our proposed model, TTCA-IF, surpasses both baseline models and existing predictors in performance. In a comparative analysis involving nine novel peptide sequences, TTCA-IF demonstrated exceptional accuracy by correctly identifying 8 out of 9 peptides as TTCAs. As a tool for screening and pinpointing potential TTCAs, we anticipate TTCA-IF to be invaluable in advancing cancer immunotherapy.

Subject(s)

Machine Learning , Neoplasms , Humans , Thiazolidines , T-Lymphocytes , Neoplasms/therapy , Neoplasms/diagnosis

12.

Unveiling dominant recombination loss in perovskite solar cells with a XGBoost-based machine learning approach.

Akbar, Basir; Tayara, Hilal; Chong, Kil To.

iScience ; 27(3): 109200, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38420582

ABSTRACT

Remarkable and intelligent perovskite solar cells (PSCs) have attracted substantial attention from researchers and are undergoing rapid advancements in photovoltaic technology. These developments aim to create highly efficient energy devices with fewer dominant recombination losses within the realm of third-generation solar cells. Diverse machine learning (ML) algorithms were implemented to address dominant recombination losses in PSCs, focusing on grain boundaries (GBs), interfaces, and band-to-band recombination. The extreme gradient boosting (XGBoost) classifier effectively predicts the recombination losses. Our model was trained with 7-fold cross-validation to ensure generalizability and robustness. Using Optuna for hyperparameter optimization and Shapley additive explanations (SHAP) to investigate the influence of features on the target variables, the model achieved 85% accuracy on over 2 million simulated data points. Because of the input parameters (light intensity and open-circuit voltage), the performance evaluation measures for the dominant recombination losses predicted by the proposed model were superior to those of state-of-the-art models.

13.

SolPredictor: Predicting Solubility with Residual Gated Graph Neural Network.

Ahmad, Waqar; Tayara, Hilal; Shim, HyunJoo; Chong, Kil To.

Int J Mol Sci ; 25(2)2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38255790

ABSTRACT

Computational methods play a pivotal role in the pursuit of efficient drug discovery, enabling the rapid assessment of compound properties before costly and time-consuming laboratory experiments. With the advent of technology and large data availability, machine and deep learning methods have proven efficient in predicting molecular solubility. High-precision in silico solubility prediction has revolutionized drug development by enhancing formulation design, guiding lead optimization, and predicting pharmacokinetic parameters. These benefits result in considerable cost and time savings, resulting in a more efficient and shortened drug development process. The proposed SolPredictor is designed with the aim of developing a computational model for solubility prediction. The model is based on residual graph neural network convolution (RGNN). The RGNNs were designed to capture long-range dependencies in graph-structured data. Residual connections enable information to be utilized over various layers, allowing the model to capture and preserve essential features and patterns scattered throughout the network. The two largest datasets available to date are compiled, and the model uses a simplified molecular-input line-entry system (SMILES) representation. In ten-fold cross-validation, SolPredictor achieved a Pearson correlation coefficient R2 of 0.79±0.02 and a root mean square error (RMSE) of 1.03±0.04. The proposed model was evaluated using five independent datasets. Error analysis, hyperparameter optimization analysis, and model explainability were used to determine the molecular features that were most valuable for prediction.
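For readers unfamiliar with the two regression metrics above, here is a minimal sketch of both. It assumes that "Pearson correlation coefficient R2" means the squared Pearson correlation between measured and predicted solubility. The values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical measured and predicted log-solubility values.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

error = rmse(measured, predicted)
r_squared = pearson_r(measured, predicted) ** 2
```

In a ten-fold setting, both quantities would be computed once per fold and then summarized as a mean and standard deviation, which matches the "±" notation in the abstract.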

Subject(s)

Drug Development , Drug Discovery , Solubility , Correlation of Data , Neural Networks, Computer

14.

DL-SPhos: Prediction of serine phosphorylation sites using transformer language model.

Shrestha, Palistha; Kandel, Jeevan; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 169: 107925, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38183701

ABSTRACT

Serine phosphorylation plays a pivotal role in the pathogenesis of various cellular processes and diseases. Roughly 81% of human diseases have links to phosphorylation, and an overwhelming 86.4% of protein phosphorylation takes place at serine residues. In eukaryotes, over a quarter of proteins undergo phosphorylation, with more than half implicated in numerous disorders, notably cancer and reproductive system diseases. This study primarily focuses on serine-phosphorylation-driven pathogenesis and the critical role of conserved motif identification. While numerous techniques exist for predicting serine phosphorylation sites, traditional wet lab experiments are resource-intensive. Our paper introduces a cutting-edge deep learning tool for predicting S phosphorylation sites, integrating explainable AI for motif identification, a transformer language model, and deep neural network components. We trained our model on protein sequences from UniProt, validated it against the dbPTM benchmark dataset, and employed the PTMD dataset to explore motifs related to mammalian disorders. Our results highlight that our model surpasses other deep learning predictors by a significant 3%. Furthermore, we utilized the local interpretable model-agnostic explanations (LIME) approach to shed light on the predictions, emphasizing the amino acid residues crucial for S phosphorylation. Notably, our model also outperformed competitors in kinase-specific serine phosphorylation prediction on benchmark datasets.

Subject(s)

Neural Networks, Computer , Proteins , Animals , Humans , Phosphorylation , Proteins/metabolism , Amino Acid Sequence , Serine/metabolism , Mammals/metabolism

15.

Unlocking the therapeutic potential of drug combinations through synergy prediction using graph transformer networks.

Alam, Waleed; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 170: 108007, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38242015

ABSTRACT

Drug combinations are frequently used to treat cancer to reduce side effects and increase efficacy. The experimental discovery of drug combination synergy is time-consuming and expensive for large datasets. Therefore, an efficient and reliable computational approach is required to investigate these drug combinations. Advancements in deep learning can handle large datasets with various biological problems. In this study, we developed a SynergyGTN model based on the Graph Transformer Network to predict the synergistic drug combinations against an untreated cancer cell line expression profile. We represent each drug as a graph whose nodes carry nine types of atomic features and whose edges carry four bond features. The cell lines are represented by their gene expression profiles. The drug graph was passed through the GTN layers to extract a generalized feature map for each drug pair. The extracted drug-pair features and cell-line gene expression profiles were concatenated and subsequently processed through multiple densely connected layers. SynergyGTN outperformed the state-of-the-art methods, with a receiver operating characteristic area under the curve improvement of 5% on the 5-fold cross-validation. The accuracy of SynergyGTN was further verified through three cross-validation strategies, namely leave-drug-out, leave-combination-out, and leave-tissue-out, resulting in improvements in accuracy of 8%, 1%, and 2%, respectively. The AstraZeneca DREAM dataset was utilized as an independent dataset to validate and assess the generalizability of the proposed method, resulting in an improvement in balanced accuracy of 13%. In conclusion, SynergyGTN is a reliable and efficient computational approach for predicting drug combination synergy in cancer treatment. Finally, we developed a web server tool for the pharmaceutical industry and researchers, available at: http://nsclbio.jbnu.ac.kr/tools/SynergyGTN/.

Subject(s)

Computational Biology , Transcriptome , Drug Synergism , Computational Biology/methods , Drug Combinations , Cell Line, Tumor

16.

Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods.

Khan, Asad; Kandel, Jeevan; Tayara, Hilal; Chong, Kil To.

Mol Inform ; 43(2): e202300217, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38050743

ABSTRACT

Rapid and accurate prediction of bandgaps and efficiency of perovskite solar cells is a crucial challenge for various solar cell applications. Existing theoretical and experimental methods often accurately measure these parameters; however, these methods are costly and time-consuming. Machine learning-based approaches offer a promising and computationally efficient method to address this problem. In this study, we trained different machine learning (ML) models using previously reported experimental data. Among the different ML models, the CatBoostRegressor performed better for both bandgap and efficiency approximations. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP). SHAP provides valuable insights into how each feature contributes to the predictions of the proposed model. Furthermore, we validated the performance of the proposed model using an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that machine learning-based approaches, with the aid of SHAP, can provide a promising and computationally efficient method for the accurate and rapid prediction of perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available at GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.

Subject(s)

Calcium Compounds , Oxides , Titanium , Machine Learning

17.

IF-AIP: A machine learning method for the identification of anti-inflammatory peptides using multi-feature fusion strategy.

Gaffar, Saima; Hassan, Mir Tanveerul; Tayara, Hilal; Chong, Kil To.

Comput Biol Med ; 168: 107724, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37989075

ABSTRACT

BACKGROUND: The most commonly used therapy currently for inflammatory and autoimmune diseases is nonspecific anti-inflammatory drugs, which have various hazardous side effects. Recently, some anti-inflammatory peptides (AIPs) have been found to be a substitute therapy for inflammatory diseases like rheumatoid arthritis and Alzheimer's. Therefore, the identification of these AIPs is an emerging topic that is equally important. METHODS: In this work, we have proposed an identification model for AIPs using a voting classifier. We used eight different feature descriptors and five conventional machine-learning classifiers. The eight feature encodings were concatenated to get a hybrid feature set. The five baseline models trained on the hybrid feature set were integrated via a voting classifier. Finally, a feature selection algorithm was used to select the optimal feature set for the construction of our final model, named IF-AIP. RESULTS: We tested the proposed model on two independent datasets. On independent data 1, the IF-AIP model shows an improvement of 3%-5.6% in terms of accuracies and 6.7%-10.8% in terms of MCC compared to the existing methods. On the independent dataset 2, our model IF-AIP shows an overall improvement of 2.9%-5.7% in terms of accuracy and 8.3%-8.6% in terms of MCC score compared to the existing methods. A comparative performance analysis was conducted between the proposed model and existing methods using a set of 24 novel peptide sequences. Notably, the IF-AIP method exhibited exceptional accuracy, correctly identifying all 24 peptides as AIPs. The source code, pre-trained models, and all datasets are made available at https://github.com/Mir-Saima/IF-AIP.
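The abstract does not say whether the voting classifier uses soft or hard voting, so the sketch below shows both variants for five baseline classifiers. The function names and probabilities are hypothetical.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the classifiers' positive-class probabilities and threshold the mean."""
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, mean_probability >= threshold


def hard_vote(probabilities, threshold=0.5):
    """Let each classifier cast one vote and return True on a strict majority."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes > len(probabilities) / 2


# Hypothetical outputs of five baseline classifiers for one peptide.
peptide_probabilities = [0.9, 0.6, 0.4, 0.7, 0.8]
mean_probability, is_aip_soft = soft_vote(peptide_probabilities)
is_aip_hard = hard_vote(peptide_probabilities)
```

Soft voting keeps each classifier's confidence, while hard voting only counts decisions. With an odd number of classifiers, as here, hard voting cannot tie.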

Subject(s)

Machine Learning , Peptides , Algorithms , Anti-Inflammatory Agents/analysis , Software

18.

ORI-Explorer: a unified cell-specific tool for origin of replication sites prediction by feature fusion.

Abbas, Zeeshan; Rehman, Mobeen Ur; Tayara, Hilal; Chong, Kil To.

Bioinformatics ; 39(11)2023 Nov 01.

Article in English | MEDLINE | ID: mdl-37929975

ABSTRACT

MOTIVATION: The origins of replication sites (ORIs) are precise regions inside the DNA sequence where the replication process begins. These locations are critical for preserving the genome's integrity during cell division and guaranteeing the faithful transfer of genetic data from generation to generation. The advent of experimental techniques has aided in the discovery of ORIs in many species. Experimentation, on the other hand, is often more time-consuming and pricey than computational approaches, and it necessitates specific equipment and knowledge. Recently, ORI sites have been predicted using computational techniques like motif-based searches and artificial intelligence algorithms based on sequence characteristics and chromatin states. RESULTS: In this article, we developed ORI-Explorer, a unique artificial intelligence-based technique that combines multiple feature engineering techniques to train CatBoost Classifier for recognizing ORIs from four distinct eukaryotic species. ORI-Explorer was created by utilizing a unique combination of three traditional feature-encoding techniques and a feature set obtained from a deep-learning neural network model. The ORI-Explorer has significantly outperformed current predictors on the testing dataset. Furthermore, by employing the sophisticated SHapley Additive exPlanation method, we give crucial insights that aid in comprehending model success, highlighting the most relevant features vital for forecasting cell-specific ORIs. ORI-Explorer is also intended to aid community-wide attempts in discovering potential ORIs and developing innovative verifiable biological hypotheses. AVAILABILITY AND IMPLEMENTATION: The used datasets along with the source code are made available through https://github.com/Z-Abbas/ORI-Explorer and https://zenodo.org/record/8358679.

Subject(s)

Artificial Intelligence, Replication Origin, DNA Replication, Chromatin, Base Sequence

19.

Recent Studies of Artificial Intelligence on In Silico Drug Absorption.

Tran, Thi Tuyet Van; Tayara, Hilal; Chong, Kil To.

J Chem Inf Model ; 63(20): 6198-6211, 2023 Oct 23.

Article in English | MEDLINE | ID: mdl-37819031

ABSTRACT

Absorption is an important area of research in pharmacochemistry and drug development, because the drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. The research and continuous efforts of researchers in recent years have brought many successes and promises in drug absorption property prediction, especially in silico, which helps to reduce the time and cost significantly for screening undesirable drug candidates. In this report, we explicitly provide an overview of recent in silico studies on predicting absorption properties, especially from 2019 to the present, using artificial intelligence. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also proposed the challenges and development directions of absorption prediction in the future. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.

Subject(s)

Artificial Intelligence, Drug Discovery, Chemical Phenomena, Factual Databases

20.

Improving Enhancer Identification with a Multi-Classifier Stacked Ensemble Model.

Mir, Bilal Ahmad; Rehman, Mobeen Ur; Tayara, Hilal; Chong, Kil To.

J Mol Biol ; 435(23): 168314, 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-37852600

ABSTRACT

Enhancers are DNA regions that are responsible for controlling the expression of genes. Enhancers are usually found upstream or downstream of a gene, or even inside a gene's intron region, but are normally located at a distant location from the genes they control. By integrating experimental and computational approaches, it is possible to uncover enhancers within DNA sequences, which possess regulatory properties. Experimental techniques such as ChIP-seq and ATAC-seq can identify genomic regions that are associated with transcription factors or accessible to regulatory proteins. On the other hand, computational techniques can predict enhancers based on sequence features and epigenetic modifications. In our study, we have developed a multi-classifier stacked ensemble (MCSE-enhancer) model that can accurately identify enhancers. We utilized feature descriptors from various physicochemical properties as input for our six baseline classifiers and built a stacked classifier, which outperformed previous enhancer classification techniques in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, representing a 2-3% improvement over existing models.
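The four evaluation measures named in the abstract all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation
    coefficient (MCC) from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Made-up counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=80, fp=20, tn=80, fn=20)
print(m)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when classes may be imbalanced.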

Subject(s)

Computational Biology, Genetic Enhancer Elements, Machine Learning, DNA Sequence Analysis, Computational Biology/methods, DNA/chemistry, DNA/genetics, Transcription Factors/chemistry, DNA Sequence Analysis/methods
