1.
Comput Biol Med ; 182: 109128, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39270460

ABSTRACT

The immune system depends on antibodies (Abs) to recognize and attach to a wide range of antigens, playing a pivotal role in immunity. The precise prediction of the variable fragment (Fv) region of antibodies is vital for the progress of therapeutic and commercial applications, particularly in the treatment of diseases such as cancer. Although deep learning models exist for accurate antibody structure prediction, challenges persist, particularly in modeling complementarity-determining regions (CDRs) and the overall antibody Fv structures. We introduce FvFold, a deep learning approach that harnesses the ProtT5-XL-UniRef50 protein language model to predict accurate antibody Fv structures. In evaluations on various benchmarks, our model outperforms existing models, achieving lower Root Mean Square Deviation (RMSD) in almost all loops and lower Orientational Coordinate Distance (OCD) values on the RosettaAntibody, Therapeutic, and IgFold benchmarks than the previous top-performing model.
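
As an illustration of the RMSD metric mentioned in this abstract, the following is a minimal sketch, not the authors' evaluation code. It assumes the predicted and reference atoms are already superimposed and listed in the same order; the function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z) points.

    Assumes both structures are already superimposed and atom-matched.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical backbone atoms (in angstroms), offset by 1 along z.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(predicted, reference))
```

Lower values indicate that the predicted loop sits closer to the reference structure.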

2.
Pharmaceuticals (Basel) ; 17(9)2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39338272

ABSTRACT

Targeting epidermal growth factor receptor (EGFR) mutants is a promising strategy for treating non-small cell lung cancer (NSCLC). This study focused on the computational identification and characterization of potential EGFR mutant-selective inhibitors using pharmacophore design and validation by deep learning, virtual screening, ADMET (absorption, distribution, metabolism, excretion, and toxicity) analysis, and molecular docking and dynamics simulations. A pharmacophore model was generated using Pharmit based on the potent inhibitor JBJ-125, which targets the mutant EGFR (PDB 5D41), and was used for virtual screening of the ZINC database. In total, 16 hits were retrieved from 13,127,550 molecules and 122,276,899 conformers. The pharmacophore model was validated via DeepCoy, generating 100 inactive decoy structures for each active molecule, and ADMET tests were conducted using SwissADME and ProTox 3.0. Filtered compounds underwent molecular docking studies using Glide, revealing promising interactions with the EGFR allosteric site along with better docking scores. Molecular dynamics (MD) simulations confirmed the stability of the docked conformations. These results identify five novel compounds that can be evaluated as single agents or in combination with existing therapies, holding promise for treating EGFR-mutant NSCLC.

3.
Comput Biol Med ; 182: 109087, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39232403

ABSTRACT

Epigenetic modifications, particularly RNA methylation and histone alterations, play a crucial role in heredity, development, and disease. Among these, RNA 5-methylcytosine (m5C) is the most prevalent RNA modification in mammalian cells, essential for processes such as ribosome synthesis, translational fidelity, mRNA nuclear export, turnover, and translation. The increasing volume of nucleotide sequences has led to the development of machine learning-based predictors for m5C site prediction. However, these predictors often face challenges related to training data limitations and overfitting due to insufficient external validation. This study introduces m5C-Seq, an ensemble learning approach for RNA modification profiling, designed to address these issues. m5C-Seq employs a meta-classifier that integrates 15 probabilities generated from a novel, large dataset using systematic encoding methods to make final predictions. Demonstrating superior performance compared to existing predictors, m5C-Seq represents a significant advancement in accurate RNA modification profiling. The code and the newly established datasets are made available through GitHub at https://github.com/Z-Abbas/m5C-Seq.

4.
Nat Commun ; 15(1): 6699, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39107330

ABSTRACT

Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.


Subjects
Post-Translational Protein Processing , Proteins/metabolism , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Humans , Computational Biology/methods , Algorithms , Protein Databases
5.
Comput Biol Med ; 179: 108729, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38955124

ABSTRACT

Recent studies have illuminated the critical role of the human microbiome in maintaining health and influencing the pharmacological responses of drugs. Clinical trials encompassing approximately 150 drugs have revealed interactions with the gastrointestinal microbiome that convert these drugs into inactive metabolites. It is imperative to explore the field of pharmacomicrobiomics during the early stages of drug discovery, prior to clinical trials. Machine learning and deep learning models are well suited to this task. In this study, we propose graph-based neural network models, namely GCN, GAT, and GINCOV, using a SMILES dataset of drugs and the microbiome. Our primary objective was to classify the susceptibility of drugs to depletion by gut microbiota. Our results indicate that GINCOV surpassed the other models, achieving an accuracy of 93% on the test dataset. The proposed Graph Neural Network (GNN) model offers a rapid and efficient method for screening drugs susceptible to gut microbiota depletion and also encourages the improvement of patient-specific dosage responses and formulations.


Subjects
Gastrointestinal Microbiome , Computer Neural Networks , Humans , Gastrointestinal Microbiome/drug effects , Microbiota/drug effects , Machine Learning , Deep Learning
6.
Comput Biol Chem ; 112: 108130, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38954849

ABSTRACT

Retrosynthesis is vital in synthesizing target products, guiding reaction pathway design crucial for drug and material discovery. Current models often neglect multi-scale feature extraction, limiting efficacy in leveraging molecular descriptors. Our proposed SB-Net model, a deep-learning architecture tailored for retrosynthesis prediction, addresses this gap. SB-Net combines CNN and Bi-LSTM architectures, excelling in capturing multi-scale molecular features. It integrates parallel branches for processing one-hot encoded descriptors and ECFP, merging through dense layers. Experimental results demonstrate SB-Net's superiority, achieving 73.6 % top-1 and 94.6 % top-10 accuracy on USPTO-50k data. Versatility is validated on MetaNetX, with rates of 52.8 % top-1, 74.3 % top-3, 79.8 % top-5, and 83.5 % top-10. SB-Net's success in bioretrosynthesis prediction tasks indicates its efficacy. This research advances computational chemistry, offering a robust deep-learning model for retrosynthesis prediction. With implications for drug discovery and synthesis planning, SB-Net promises innovative and efficient pathways.
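
The top-1 and top-10 figures quoted in this abstract are top-k accuracies. A minimal sketch of how such a score is typically computed is given below; the function name `top_k_accuracy` and the reactant labels are hypothetical and are not taken from SB-Net.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of targets that appear among the first k ranked predictions."""
    if len(ranked_predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for preds, target in zip(ranked_predictions, targets) if target in preds[:k]
    )
    return hits / len(targets)


# Hypothetical ranked reactant candidates for four products (best guess first).
ranked = [
    ["r1", "r2", "r3"],
    ["r2", "r1", "r3"],
    ["r3", "r2", "r1"],
    ["r3", "r1", "r2"],
]
truth = ["r1", "r1", "r1", "r4"]

print(top_k_accuracy(ranked, truth, k=1))
print(top_k_accuracy(ranked, truth, k=3))
```

Top-k accuracy can only stay the same or increase as k grows, which is why the top-10 figure is higher than the top-1 figure.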

7.
Article in English | MEDLINE | ID: mdl-39042543

ABSTRACT

The emergence of immune-evasive mutations in the SARS-CoV-2 spike protein consistently challenges existing vaccines and therapies, making precise prediction of their escape potential a critical imperative. Artificial Intelligence (AI) holds great promise for deciphering the intricate language of proteins. Here, we employed a Generative Adversarial Network to decipher the hidden escape pathways within the spike protein by generating spikes that closely resemble natural ones. Through comprehensive analysis, we demonstrated that generated sequences capture natural escape characteristics. Moreover, incorporating these sequences into an AI-based escape prediction model significantly enhanced its performance, achieving a 7% increase in detecting natural escape mutations on the experimentally validated Greaney dataset. Similar improvements were observed on other datasets, demonstrating the model's generalizability. Precisely predicting immune-evasive spikes not only enables the design of strategically targeted therapies but also has the potential to expedite future viral therapeutics. This breakthrough carries profound implications for shaping a more resilient future against viral threats.

8.
Int J Mol Sci ; 25(11)2024 May 29.
Article in English | MEDLINE | ID: mdl-38892144

ABSTRACT

In this study, we present an innovative approach to improve the prediction of protein-protein interactions (PPIs) through the utilization of an ensemble classifier, specifically focusing on distinguishing between native and non-native interactions. Leveraging the strengths of various base models, including random forest, gradient boosting, extreme gradient boosting, and light gradient boosting, our ensemble classifier integrates these diverse predictions using a logistic regression meta-classifier. Our model was evaluated using a comprehensive dataset generated from molecular dynamics simulations. While the gains in AUC and other metrics might seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of various approaches, we compared the performance of logistic regression to four baseline models. Our results indicate that logistic regression consistently underperforms across all evaluated metrics. This suggests that it may not be well-suited to capture the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handling datasets effectively and incorporating regularizations to avoid over-fitting. Our findings indicate that the ensemble method enhances the predictive capability of PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.


Subjects
Molecular Dynamics Simulation , Protein Interaction Mapping , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Protein Binding , Logistic Models
9.
J Chem Inf Model ; 64(13): 4941-4957, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38874445

ABSTRACT

Anticancer peptides (ACPs) play a vital role in selectively targeting and eliminating cancer cells. Evaluating and comparing predictions from various machine learning (ML) and deep learning (DL) techniques is challenging but crucial for anticancer drug research. We conducted a comprehensive analysis of 15 ML and 10 DL models, including the models released after 2022, and found that support vector machines (SVMs) with feature combination and selection significantly enhance overall performance. DL models, especially convolutional neural networks (CNNs) with light gradient boosting machine (LGBM) based feature selection approaches, demonstrate improved characterization. Assessment using a new test data set (ACP10) identifies ACPred, MLACP 2.0, AI4ACP, mACPred, and AntiCP2.0_AAC as successive optimal predictors, showcasing robust performance. Our review underscores current prediction tool limitations and advocates for an omnidirectional ACP prediction framework to propel ongoing research.


Subjects
Antineoplastic Agents , Neoplasms , Peptides , Neoplasms/drug therapy , Peptides/chemistry , Humans , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Deep Learning , Machine Learning , Computer Neural Networks , Artificial Intelligence , Support Vector Machine
10.
J Cheminform ; 16(1): 66, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849917

ABSTRACT

Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.
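
The DCA success rate reported in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the PUResNetV2.0 evaluation code: DCA is taken here to be the distance from the predicted site center to the nearest ligand atom, and the 4 Å cutoff is a commonly used threshold that the abstract itself does not state. All names and coordinates are hypothetical.

```python
import math


def dca(predicted_center, ligand_atoms):
    """Distance from a predicted site center to the closest ligand atom."""
    return min(math.dist(predicted_center, atom) for atom in ligand_atoms)


def dca_success_rate(cases, threshold=4.0):
    """Fraction of (center, ligand_atoms) cases whose DCA is within the threshold.

    The default threshold of 4.0 angstroms is an assumption, not a value from the abstract.
    """
    successes = sum(1 for center, atoms in cases if dca(center, atoms) <= threshold)
    return successes / len(cases)


# Two hypothetical predictions: one 3 angstroms from the ligand, one 5 angstroms away.
cases = [
    ((0.0, 0.0, 0.0), [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]),
    ((0.0, 0.0, 0.0), [(5.0, 0.0, 0.0)]),
]
print(dca_success_rate(cases))
```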

11.
Comput Biol Med ; 178: 108737, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38879934

ABSTRACT

High-affinity ligand peptides for ion channels are essential for controlling the flow of ions across the plasma membrane. These peptides are now being investigated as potential therapeutics for a variety of illnesses, including cancer and cardiovascular disease. Identifying and interpreting ligand peptide inhibitors that control ion flow across cells is therefore pivotal. In this work, we developed an ensemble-based model, NaII-Pred, for the identification of sodium ion inhibitors. The ensemble model was trained, tested, and evaluated on three different datasets. The NaII-Pred method employs six different descriptors and a hybrid feature set in conjunction with five conventional machine learning classifiers to create 35 baseline models. Through an ensemble approach, the top five baseline models trained on the hybrid feature set were integrated to yield the final predictive model, NaII-Pred. Our proposed model, NaII-Pred, outperforms the baseline models and the current predictors on both datasets. We believe NaII-Pred will play a critical role in screening and identifying potential sodium ion inhibitors and will be an invaluable tool.


Subjects
Machine Learning , Sodium/metabolism , Sodium/chemistry , Humans , Sodium Channel Blockers/pharmacology
12.
Comput Biol Med ; 176: 108560, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38754218

ABSTRACT

Mutagenicity assessment plays a pivotal role in the safety evaluation of chemicals, pharmaceuticals, and environmental compounds. In recent years, the development of robust computational models for predicting chemical mutagenicity has gained significant attention, driven by the need for efficient and cost-effective toxicity assessments. In this paper, we proposed AMPred-CNN, an innovative Ames mutagenicity prediction model based on Convolutional Neural Networks (CNNs), uniquely employing molecular structures as images to leverage CNNs' powerful feature extraction capabilities. The study employs the widely used benchmark mutagenicity dataset from Hansen et al. for model development and evaluation. Comparative analyses with traditional ML models on different molecular features reveal substantial performance enhancements. AMPred-CNN outshines these models, demonstrating superior accuracy, AUC, F1 score, MCC, sensitivity, and specificity on the test set. Notably, AMPred-CNN is further benchmarked against seven recent ML and DL models, consistently showcasing superior performance with an impressive AUC of 0.954. Our study highlights the effectiveness of CNNs in advancing mutagenicity prediction, paving the way for broader applications in toxicology and drug development.


Subjects
Mutagenicity Tests , Mutagens , Computer Neural Networks , Mutagens/toxicity
13.
Int J Mol Sci ; 25(7)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38612558

ABSTRACT

Cruzipain inhibitors are sought as medications to treat Chagas disease because of the need for safer, more effective treatments. Trypanosoma cruzi is the source of cruzipain, a crucial cysteine protease that has driven interest in using computational methods to create more effective inhibitors. We employed a 3D-QSAR model, using a dataset of 36 known inhibitors, and a pharmacophore model to identify potential inhibitors of cruzipain. We also built a deep learning model using the DeepPurpose library, trained on 204 active compounds, and validated it with a specific test set. In a comprehensive screening of the DrugBank database of 8533 molecules, the pharmacophore and deep learning models identified 1012 and 340 drug-like molecules, respectively. These molecules were further evaluated through molecular docking, followed by induced-fit docking. Finally, molecular dynamics simulation was performed for the final potent inhibitors that exhibited strong binding interactions. These results present four novel inhibitors of the cruzipain protein of T. cruzi.


Subjects
Chagas Disease , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protozoan Proteins , Chagas Disease/drug therapy , Drug Design
14.
Comput Biol Med ; 174: 108438, 2024 May.
Article in English | MEDLINE | ID: mdl-38613893

ABSTRACT

BACKGROUND: Angiogenesis plays a vital role in the pathogenesis of several human diseases, particularly in the case of solid tumors. In the realm of cancer treatment, recent investigations into peptides with anti-angiogenic properties have yielded encouraging outcomes, thereby creating a hopeful therapeutic avenue for the treatment of cancer. Therefore, correctly identifying the anti-angiogenic peptides is extremely important in comprehending their biophysical and biochemical traits, laying the groundwork for uncovering novel drugs to combat cancer. METHODS: In this work, we present a novel ensemble-learning-based model, Stack-AAgP, specifically designed for the accurate identification and interpretation of anti-angiogenic peptides (AAPs). Initially, a feature representation approach is employed, generating 24 baseline models through six machine learning algorithms (random forest [RF], extra tree classifier [ETC], extreme gradient boosting [XGB], light gradient boosting machine [LGBM], CatBoost, and SVM) and four feature encodings (pseudo-amino acid composition [PAAC], amphiphilic pseudo-amino acid composition [APAAC], composition of k-spaced amino acid pairs [CKSAAP], and quasi-sequence-order [QSOrder]). Subsequently, the output (predicted probabilities) from 24 baseline models was inputted into the same six machine-learning classifiers to generate their respective meta-classifiers. Finally, the meta-classifiers were stacked together using the ensemble-learning framework to construct the final predictive model. RESULTS: Findings from the independent test demonstrate that Stack-AAgP outperforms the state-of-the-art methods by a considerable margin. Systematic experiments were conducted to assess the influence of hyperparameters on the proposed model. 
Our model, Stack-AAgP, was evaluated on the independent NT15 dataset, revealing superiority over existing predictors with an accuracy improvement ranging from 5% to 7.5% and an increase in Matthews Correlation Coefficient (MCC) from 7.2% to 12.2%.
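
The first stacking layer described in the METHODS can be sketched as below: six algorithms crossed with four feature encodings give 24 baseline models, whose predicted probabilities form the input vector for each meta-classifier. This is only a structural illustration; the helper `meta_features` and the placeholder probabilities are hypothetical, and no actual models are trained here.

```python
from itertools import product

# Algorithm and encoding names as listed in the abstract.
ALGORITHMS = ["RF", "ETC", "XGB", "LGBM", "CatBoost", "SVM"]
ENCODINGS = ["PAAC", "APAAC", "CKSAAP", "QSOrder"]

# One baseline model per (algorithm, encoding) pair.
baseline_models = [f"{algo}+{enc}" for algo, enc in product(ALGORITHMS, ENCODINGS)]


def meta_features(probabilities):
    """Arrange baseline probabilities into the fixed-order vector fed to a meta-classifier."""
    return [probabilities[name] for name in baseline_models]


# Placeholder probabilities for a single peptide (hypothetical values).
example = {name: 0.5 for name in baseline_models}
vector = meta_features(example)
print(len(baseline_models), len(vector))
```

Keeping the baseline order fixed matters because each meta-classifier expects every probability at the same position for every peptide.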


Subjects
Angiogenesis Inhibitors , Machine Learning , Angiogenesis Inhibitors/therapeutic use , Humans , Peptides/chemistry , Computational Biology/methods , Algorithms
15.
Arch Toxicol ; 98(8): 2647-2658, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38619593

ABSTRACT

Cytochrome P450 (CYP) enzymes are a superfamily of enzymes responsible for the metabolism of a variety of medicines and xenobiotics. Within the cytochrome P450 family, five isozymes (1A2, 2C9, 2C19, 2D6, and 3A4) are the most important for the metabolism of xenobiotics. Inhibition of any of these five CYP isozymes causes drug-drug interactions with pronounced pharmacological and toxicological effects, so predicting whether these isozymes are inhibited is of great importance. Many techniques based on machine learning and deep learning algorithms are currently used for this prediction. In this study, three molecular or substructural representations, namely Morgan, MACCS and Morgan (combined), and RDKit, are used to train a distinct SVM model for each isozyme (1A2, 2C9, 2C19, 2D6, and 3A4). On the independent dataset, Morgan fingerprints provided the best results, while MACCS and Morgan (combined) achieved comparable results in terms of balanced accuracy (BA), sensitivity (Sn), and Matthews correlation coefficient (MCC). For the Morgan fingerprints, BA, MCC, and Sn across the five CYP isozymes on the independent dataset ranged from 0.81 to 0.85, 0.61 to 0.70, and 0.72 to 0.83, respectively. Similarly, on the independent dataset, the MACCS and Morgan (combined) fingerprints achieved competitive BA, MCC, and Sn values across the five CYP isozymes, ranging from 0.79 to 0.85, 0.59 to 0.69, and 0.69 to 0.82, respectively.
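
The three metrics reported in this abstract (BA, Sn, and MCC) follow directly from a binary confusion matrix. A minimal sketch is shown below; the function name and the counts are hypothetical, and it assumes both inhibitors and non-inhibitors are present in the evaluated set.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, balanced accuracy, and Matthews correlation coefficient.

    Assumes at least one positive and one negative sample (tp + fn > 0, tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sn": sensitivity, "BA": balanced_accuracy, "MCC": mcc}


# Hypothetical counts for one isozyme: 100 inhibitors and 100 non-inhibitors.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

Balanced accuracy averages sensitivity and specificity, so it is less affected by class imbalance than plain accuracy.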


Subjects
Cytochrome P-450 Enzyme Inhibitors , Cytochrome P-450 Enzyme System , Machine Learning , Cytochrome P-450 Enzyme Inhibitors/pharmacology , Cytochrome P-450 Enzyme System/metabolism , Humans , Isoenzymes/metabolism , Drug Interactions , Xenobiotics/toxicity , Xenobiotics/metabolism , Support Vector Machine
16.
Biosystems ; 237: 105177, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38458346

ABSTRACT

The escalating global incidence of cancer poses significant health challenges, underscoring the need for innovative and more efficacious treatments. Cancer immunotherapy, a promising approach leveraging the body's immune system against cancer, emerges as a compelling solution. Consequently, the identification and characterization of tumor T-cell antigens (TTCAs) have become pivotal for exploration. In this manuscript, we introduce TTCA-IF, an integrative machine learning-based framework designed for TTCAs identification. TTCA-IF employs ten feature encoding types in conjunction with five conventional machine learning classifiers. To establish a robust foundation, these classifiers are trained, resulting in the creation of 150 baseline models. The outputs from these baseline models are then fed back into the five classifiers, generating their respective meta-models. Through an ensemble approach, the five meta-models are seamlessly integrated to yield the final predictive model, the TTCA-IF model. Our proposed model, TTCA-IF, surpasses both baseline models and existing predictors in performance. In a comparative analysis involving nine novel peptide sequences, TTCA-IF demonstrated exceptional accuracy by correctly identifying 8 out of 9 peptides as TTCAs. As a tool for screening and pinpointing potential TTCAs, we anticipate TTCA-IF to be invaluable in advancing cancer immunotherapy.


Subjects
Machine Learning , Neoplasms , Humans , Thiazolidines , T-Lymphocytes , Neoplasms/therapy , Neoplasms/diagnosis
17.
iScience ; 27(3): 109200, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38420582

ABSTRACT

Remarkable and intelligent perovskite solar cells (PSCs) have attracted substantial attention from researchers and are undergoing rapid advancements in photovoltaic technology. These developments aim to create highly efficient energy devices with fewer dominant recombination losses within the realm of third-generation solar cells. Diverse machine learning (ML) algorithms were implemented to address dominant losses due to recombination in PSCs, focusing on grain boundaries (GBs), interfaces, and band-to-band recombination. The extreme gradient boosting (XGBoost) classifier effectively predicts the recombination losses. Our model was trained with 7-fold cross-validation to ensure generalizability and robustness. Using Optuna for hyperparameter optimization and Shapley additive explanations (SHAP) to investigate the influence of features on the target variables, the model achieved 85% accuracy on over 2 million simulated data points. With light intensity and open-circuit voltage as input parameters, the performance measures for the dominant recombination losses predicted by the proposed model were superior to those of state-of-the-art models.

18.
Int J Mol Sci ; 25(2)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38255790

ABSTRACT

Computational methods play a pivotal role in the pursuit of efficient drug discovery, enabling the rapid assessment of compound properties before costly and time-consuming laboratory experiments. With advances in technology and the availability of large datasets, machine and deep learning methods have proven efficient in predicting molecular solubility. High-precision in silico solubility prediction has revolutionized drug development by enhancing formulation design, guiding lead optimization, and predicting pharmacokinetic parameters. These benefits yield considerable cost and time savings and a shorter, more efficient drug development process. The proposed SolPredictor is a computational model for solubility prediction based on residual graph neural network convolution (RGNN). The RGNNs were designed to capture long-range dependencies in graph-structured data. Residual connections enable information to be utilized over various layers, allowing the model to capture and preserve essential features and patterns scattered throughout the network. The two largest datasets available to date were compiled, and the model uses a simplified molecular-input line-entry system (SMILES) representation. In ten-fold cross-validation, SolPredictor achieved a Pearson correlation coefficient R2 of 0.79±0.02 and a root mean square error (RMSE) of 1.03±0.04. The proposed model was evaluated using five independent datasets. Error analysis, hyperparameter optimization analysis, and model explainability were used to determine the molecular features that were most valuable for prediction.
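
The two regression metrics quoted in this abstract can be computed as in the sketch below. It assumes that "R2" refers to the squared Pearson correlation between measured and predicted values, which the abstract's wording suggests but does not define; the function names and solubility values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    covariance = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return covariance / math.sqrt(var_t * var_p)


# Hypothetical log-solubility values: predictions are shifted by a constant 0.5.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.5, -3.5, -4.5]
print(rmse(measured, predicted), pearson_r(measured, predicted) ** 2)
```

The example shows why both metrics are reported together: a constant offset leaves the correlation perfect while the RMSE still reflects the error.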


Subjects
Drug Development , Drug Discovery , Solubility , Correlation of Data , Computer Neural Networks
19.
Comput Biol Med ; 170: 108007, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38242015

ABSTRACT

Drug combinations are frequently used to treat cancer to reduce side effects and increase efficacy. The experimental discovery of drug combination synergy is time-consuming and expensive for large datasets. Therefore, an efficient and reliable computational approach is required to investigate these drug combinations. Advancements in deep learning make it possible to handle large datasets across various biological problems. In this study, we developed SynergyGTN, a model based on the Graph Transformer Network (GTN), to predict synergistic drug combinations against an untreated cancer cell line expression profile. We represent each drug as a graph, with each node and edge containing nine types of atomic feature vectors and four bond features, respectively. The cell lines are represented by their gene expression profiles. The drug graphs were passed through the GTN layers to extract a generalized feature map for each drug pair. The extracted drug pair features and cell-line gene expression profiles were concatenated and then processed through multiple densely connected layers. SynergyGTN outperformed the state-of-the-art methods, improving the area under the receiver operating characteristic curve by 5% in 5-fold cross-validation. The accuracy of SynergyGTN was further verified through three cross-validation strategies, namely leave-drug-out, leave-combination-out, and leave-tissue-out, resulting in accuracy improvements of 8%, 1%, and 2%, respectively. The AstraZeneca DREAM dataset was used as an independent dataset to validate and assess the generalizability of the proposed method, resulting in an improvement in balanced accuracy of 13%. In conclusion, SynergyGTN is a reliable and efficient computational approach for predicting drug combination synergy in cancer treatment. Finally, we developed a web server tool for the pharmaceutical industry and researchers, available at: http://nsclbio.jbnu.ac.kr/tools/SynergyGTN/.
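
One of the validation strategies named in this abstract, leave-drug-out, can be sketched as follows. The abstract does not describe the exact protocol, so this shows one common interpretation in which every sample involving a held-out drug is moved to the test set. The function name, tuple layout, and sample values are hypothetical.

```python
def leave_drug_out_split(samples, held_out_drugs):
    """Split (drug_a, drug_b, cell_line, label) samples by drug identity.

    Any sample containing a held-out drug goes to the test set, so the
    training set never sees that drug in any combination.
    """
    held = set(held_out_drugs)
    train, test = [], []
    for sample in samples:
        drug_a, drug_b = sample[0], sample[1]
        if drug_a in held or drug_b in held:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Hypothetical drug-pair samples: (drug_a, drug_b, cell_line, synergy_label).
samples = [
    ("drugA", "drugB", "cell1", 1),
    ("drugA", "drugC", "cell1", 0),
    ("drugB", "drugC", "cell2", 1),
    ("drugC", "drugD", "cell2", 0),
]
train, test = leave_drug_out_split(samples, held_out_drugs=["drugA"])
print(len(train), len(test))
```

Leave-combination-out and leave-tissue-out would follow the same pattern, keyed on the drug pair or the tissue of the cell line instead of a single drug.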


Subjects
Computational Biology , Transcriptome , Drug Synergism , Computational Biology/methods , Drug Combinations , Tumor Cell Line
20.
Comput Biol Med ; 169: 107925, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38183701

ABSTRACT

Serine phosphorylation plays a pivotal role in the pathogenesis of various cellular processes and diseases. Roughly 81% of human diseases have links to phosphorylation, and an overwhelming 86.4% of protein phosphorylation takes place at serine residues. In eukaryotes, over a quarter of proteins undergo phosphorylation, with more than half implicated in numerous disorders, notably cancer and reproductive system diseases. This study primarily focuses on serine-phosphorylation-driven pathogenesis and the critical role of conserved motif identification. While numerous techniques exist for predicting serine phosphorylation sites, traditional wet lab experiments are resource-intensive. Our paper introduces a cutting-edge deep learning tool for predicting S phosphorylation sites, integrating explainable AI for motif identification, a transformer language model, and deep neural network components. We trained our model on protein sequences from UniProt, validated it against the dbPTM benchmark dataset, and employed the PTMD dataset to explore motifs related to mammalian disorders. Our results highlight that our model surpasses other deep learning predictors by a significant 3%. Furthermore, we utilized the local interpretable model-agnostic explanations (LIME) approach to shed light on the predictions, emphasizing the amino acid residues crucial for S phosphorylation. Notably, our model also outperformed competitors in kinase-specific serine phosphorylation prediction on benchmark datasets.
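
Site-level predictors of this kind usually score a fixed-length sequence window centered on each candidate serine. The sketch below illustrates that preprocessing step only. The window size, the padding character, and the function name are assumptions for illustration and do not come from the abstract.

```python
def serine_windows(sequence, flank=10, pad="X"):
    """Return (1-based position, window) for every serine in a protein sequence.

    Each window has length 2 * flank + 1 with the serine at its center;
    positions beyond the sequence ends are filled with the pad character.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for index, residue in enumerate(sequence):
        if residue == "S":
            # In the padded string the residue sits at index + flank,
            # so the window starts flank positions earlier, at index.
            windows.append((index + 1, padded[index : index + 2 * flank + 1]))
    return windows


# Hypothetical five-residue sequence with a small flank for readability.
result = serine_windows("MSTAS", flank=2)
print(result)
```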


Subjects
Computer Neural Networks , Proteins , Animals , Humans , Phosphorylation , Proteins/metabolism , Amino Acid Sequence , Serine/metabolism , Mammals/metabolism