Results 1 - 20 of 31
1.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38662579

ABSTRACT

MOTIVATION: Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted. RESULTS: Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing the per-residue embedding of the site-of-interest as well as the embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also examined the interpretability of ProtT5-derived embedding tensors in two ways: first, by scrutinizing the attention weights obtained from the transformer's encoder block; and second, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from a supervised embedding layer, through an intermediate fusion stacked generalization approach, using an n-mer window sequence (i.e., a peptide or fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets.
AVAILABILITY AND IMPLEMENTATION: LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.


Subjects
Proteins , Proteins/chemistry , Proteins/metabolism , Natural Language Processing , Computational Biology/methods , Databases, Protein , Software , Protein Processing, Post-Translational , Amino Acid Sequence
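The abstract contrasts embedding a window sequence on its own with running the pLM on the full-length protein and then taking the embeddings of the residues in a window around the site-of-interest. A minimal sketch of that second, slicing step follows, assuming the per-residue embeddings have already been computed and are held as plain lists. The function name and the zero-padding at the termini are illustrative assumptions, not necessarily LMCrot's choices.

```python
from typing import List, Sequence

Embedding = List[float]


def window_embeddings(per_residue: Sequence[Embedding], site: int,
                      half_window: int) -> List[Embedding]:
    """Return the 2*half_window + 1 residue embeddings centred on `site`.

    `per_residue` is assumed to be the L x d output of a pLM run on the
    full-length sequence. Positions outside the protein are filled with
    zero vectors (a padding choice assumed for illustration).
    """
    if not 0 <= site < len(per_residue):
        raise IndexError("site lies outside the sequence")
    dim = len(per_residue[0])
    window = []
    for pos in range(site - half_window, site + half_window + 1):
        if 0 <= pos < len(per_residue):
            window.append(list(per_residue[pos]))
        else:
            window.append([0.0] * dim)  # zero-pad beyond either terminus
    return window


if __name__ == "__main__":
    # Toy 4-residue "embedding" with d = 2 (ProtT5-XL vectors have d = 1024).
    emb = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    print(window_embeddings(emb, site=0, half_window=1))
    # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
```

Because every residue vector was computed with the whole protein as context, the sliced window carries information from outside the window itself, which is the property the full-sequence variant relies on.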
2.
Int J Mol Sci ; 24(21)2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37958983

ABSTRACT

O-linked β-N-acetylglucosamine (O-GlcNAc) is a distinct monosaccharide modification of serine (S) or threonine (T) residues of nucleocytoplasmic and mitochondrial proteins. O-GlcNAc modification (i.e., O-GlcNAcylation) is involved in the regulation of diverse cellular processes, including transcription, epigenetic modifications, and cell signaling. Despite great progress in experimentally mapping O-GlcNAc sites, there is an unmet need for robust prediction tools that can effectively locate O-GlcNAc sites in protein sequences of interest. In this work, we performed a comprehensive evaluation of a framework for predicting protein O-GlcNAc sites using embeddings from pre-trained protein language models. In particular, we compared the performance of three large protein language models (pLMs), Ankh, ESM-2, and ProtT5, for prediction of O-GlcNAc sites and also evaluated various ensemble strategies to integrate embeddings from these models. Upon investigation, the decision-level fusion approach that integrates the decisions of the three embedding models, which we call LM-OGlcNAc-Site, outperformed the models trained on the individual language models' embeddings, as well as other fusion approaches and other existing predictors, on almost all of the metrics evaluated. The precise prediction of O-GlcNAc sites will facilitate the probing of O-GlcNAc site-specific functions of proteins in physiology and diseases. Moreover, these findings indicate the effectiveness of combining multiple protein language models in post-translational modification prediction and open exciting avenues for further research in other protein downstream tasks. LM-OGlcNAc-Site's web server and source code are publicly available to the community.


Subjects
Protein Processing, Post-Translational , Proteins , Proteins/chemistry , Amino Acid Sequence , Acetylglucosamine/metabolism , N-Acetylglucosaminyltransferases/metabolism
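The abstract does not say how LM-OGlcNAc-Site combines the three models' decisions. Majority voting is one common decision-level fusion rule, so the hedged sketch below assumes it. The model keys are only labels, and the calls are made-up 0/1 predictions for four candidate sites.

```python
from typing import Dict, List


def majority_vote(decisions: Dict[str, List[int]]) -> List[int]:
    """Fuse binary per-site decisions (1 = modified, 0 = not) by majority vote.

    Each value in `decisions` holds one model's calls for the same ordered
    list of candidate S/T sites. An odd number of models avoids ties.
    """
    calls = list(decisions.values())
    n_models = len(calls)
    n_sites = len(calls[0])
    if any(len(c) != n_sites for c in calls):
        raise ValueError("all models must score the same sites")
    fused = []
    for i in range(n_sites):
        votes = sum(c[i] for c in calls)
        fused.append(1 if votes * 2 > n_models else 0)  # strict majority
    return fused


if __name__ == "__main__":
    # Hypothetical calls from three embedding-specific classifiers.
    calls = {
        "ankh": [1, 0, 1, 0],
        "esm2": [1, 1, 0, 0],
        "prott5": [0, 1, 1, 0],
    }
    print(majority_vote(calls))  # [1, 1, 1, 0]
```

Voting on hard decisions, rather than averaging scores, lets each model keep its own tuned threshold before the outputs are combined.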
3.
iScience ; 26(10): 107817, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37744034

ABSTRACT

Extracellular signal-regulated kinases 1 and 2 (ERK1/2) are dysregulated in many pervasive diseases. Recently, we discovered that ERK1/2 is oxidized by signal-generated hydrogen peroxide in various cell types. Since the putative sites of oxidation lie within or near ERK1/2's ligand-binding surfaces, we investigated how oxidation of ERK2 regulates interactions with the model substrates Sub-D and Sub-F. These studies revealed that ERK2 undergoes sulfenylation at C159 on its D-recruitment site surface and that this modification modulates ERK2 activity differentially between substrates. Integrated biochemical, computational, and mutational analyses suggest a plausible mechanism for peroxide-dependent changes in ERK2-substrate interactions. Interestingly, oxidation decreased ERK2's affinity for some D-site ligands while increasing its affinity for others. Finally, oxidation by signal-generated peroxide enhanced ERK1/2's ability to phosphorylate ribosomal S6 kinase A1 (RSK1) in HeLa cells. Together, these studies lay the foundation for examining crosstalk between redox- and phosphorylation-dependent signaling at the level of kinase-substrate selection.

4.
J Proteome Res ; 22(8): 2548-2557, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37459437

ABSTRACT

Phosphorylation is one of the most important post-translational modifications and plays a pivotal role in various cellular processes. Although several computational tools exist to predict phosphorylation sites, existing tools have not yet harnessed the knowledge distilled by pretrained protein language models. Herein, we present a novel deep learning-based approach called LMPhosSite for general phosphorylation site prediction that integrates embeddings from the local window sequence with the contextualized embedding obtained from the global (full) protein sequence using a pretrained protein language model to improve prediction performance. LMPhosSite consists of two base models: one captures an effective local representation, and the other captures the global per-residue contextualized embedding from a pretrained protein language model. The outputs of these base models are integrated using a score-level fusion approach. LMPhosSite achieves a precision, recall, Matthews correlation coefficient, and F1-score of 38.78%, 67.12%, 0.390, and 49.15%, respectively, on the combined serine and threonine independent test data set and 34.90%, 62.03%, 0.298, and 44.67%, respectively, on the tyrosine independent test data set, which is better than the compared approaches. These results demonstrate that LMPhosSite is a robust computational tool for predicting general phosphorylation sites in proteins.


Subjects
Deep Learning , Phosphorylation , Proteins/metabolism , Protein Processing, Post-Translational , Amino Acid Sequence
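As a quick consistency check, the F1-scores above follow from the reported precision and recall, since F1 is their harmonic mean. The sketch recomputes them; the last digit can differ slightly because the inputs are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision, recall and reported F1 for the two independent test sets.
    reported = {
        "S/T": (0.3878, 0.6712, 0.4915),
        "Y": (0.3490, 0.6203, 0.4467),
    }
    for name, (p, r, f1_reported) in reported.items():
        f1 = f1_score(p, r)
        print(f"{name}: recomputed F1 = {f1:.4f} (reported {f1_reported:.4f})")
```

Both recomputed values agree with the reported ones to within rounding, which confirms the F1 figures are consistent with the stated precision and recall.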
5.
Glycobiology ; 33(5): 411-422, 2023 06 03.
Article in English | MEDLINE | ID: mdl-37067908

ABSTRACT

Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to the creation of negative sets and leveraging the distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. LMNglyPred achieves a sensitivity, specificity, Matthews correlation coefficient, precision, and accuracy of 76.50%, 75.36%, 0.49, 60.99%, and 75.74%, respectively, on an independent benchmark test set. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.


Subjects
Amino Acids , Glycoproteins , Humans , Glycosylation , Glycoproteins/metabolism , Amino Acids/chemistry , Protein Processing, Post-Translational , Amino Acid Sequence
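Candidate sites for this task are the asparagines that sit in an N-X-[S/T] sequon. The sketch below, a hypothetical helper rather than part of LMNglyPred, enumerates them with a regular expression using the sequon definition given in the abstract; a classifier would then score each candidate.

```python
import re
from typing import List

# N, then any residue except proline, then S or T. The lookahead makes the
# match zero-width, so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return 0-based positions of the N in every N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_sequons("MNGTANPSKNVT"))  # [1, 9]; the N at 5 is skipped (X = P)
    print(find_sequons("NNST"))          # [0, 1]: overlapping sequons
```

Restricting prediction to these positions is what makes the sequon "necessary but not sufficient": every candidate satisfies the motif, and the model's job is to separate glycosylated from non-glycosylated ones.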
6.
Sensors (Basel) ; 23(8)2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37112326

ABSTRACT

Older adults are more vulnerable to falls because of normal age-related changes, and their falls are a serious medical risk with high healthcare and societal costs. However, automatic fall detection systems for older adults are lacking. This paper reports (1) a wireless, flexible, skin-wearable electronic device for both accurate motion sensing and user comfort, and (2) a deep learning-based classification algorithm for reliable fall detection in older adults. The cost-effective skin-wearable motion monitoring device is designed and fabricated using thin copper films. It includes a six-axis motion sensor and is laminated directly on the skin without adhesives to collect accurate motion data. To study accurate fall detection using the proposed device, different deep learning models, body locations for device placement, and input datasets are investigated using motion data from various human activities. Our results indicate that the optimal location for the device is the chest, achieving an accuracy of more than 98% for falls with motion data from older adults. Moreover, our results suggest that a large motion dataset collected directly from older adults is essential to improve the accuracy of fall detection for the older adult population.


Subjects
Deep Learning , Wearable Electronic Devices , Humans , Aged , Algorithms , Motion
7.
BMC Bioinformatics ; 24(1): 41, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36755242

ABSTRACT

BACKGROUND: Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling in all main classes of proteins. It is involved in several biological processes, including immune response, protein stability, transcription regulation, post-translational regulation, DNA damage repair, and redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. RESULTS: Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from a supervised embedding layer and a contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735, and 0.773 for MCC, sensitivity, and specificity, respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. CONCLUSION: Together, the experimental results suggest that pLMSNOSite achieves a significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite .


Subjects
Nitric Oxide , Proteins , Animals , Proteins/metabolism , Nitric Oxide/metabolism , Oxidation-Reduction , Protein Processing, Post-Translational , Signal Transduction
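MCC, unlike sensitivity and specificity, depends on the ratio of positive to negative sites, which is one plausible reason an MCC of 0.340 can accompany sensitivity and specificity above 0.7. The sketch computes all three from confusion-matrix counts; the counts are hypothetical and chosen only to show the effect of class imbalance.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)   # fraction of true sites recovered
    specificity = tn / (tn + fp)   # fraction of non-sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Same sensitivity and specificity (0.8 each), different class balance.
    print(binary_metrics(tp=80, fn=20, tn=80, fp=20))    # mcc = 0.6
    print(binary_metrics(tp=80, fn=20, tn=800, fp=200))  # mcc ~ 0.396
```

With ten negatives per positive, the same per-class rates produce many more false positives relative to true positives, and MCC drops accordingly.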
8.
Sci Rep ; 13(1): 3277, 2023 Feb 25.
Article in English | MEDLINE | ID: mdl-36841922

ABSTRACT

With technological advancement in recent years and the widespread use of magnetism in every sector of current technology, the search for low-cost magnetic materials is more important than ever. The discovery of magnetism in alternative materials, such as metal chalcogenides with abundant atomic constituents, would be a milestone in this scenario. However, considering the multitude of possible chalcogenide configurations, predictive computational modeling or experimental synthesis remains an open challenge. Here, we turn to a stacked generalization machine learning model to predict the magnetic moment (µB) in hexagonal Fe-based bimetallic chalcogenides, Fe_xA_yB, where A represents Ni, Co, Cr, or Mn, B represents S, Se, or Te, and x and y represent the concentrations of the respective atoms. The stacked generalization model is trained on a dataset obtained using first-principles density functional theory. The model achieves MSE, MAE, and R² values of 1.655 (µB)², 0.546 µB, and 0.922, respectively, on an independent test set, indicating that our model predicts the composition-dependent magnetism in bimetallic chalcogenides with a high degree of accuracy. A generalized algorithm is also developed to test the universality of our proposed model for any concentration of Ni, Co, Cr, or Mn up to 62.5% in bimetallic chalcogenides.
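The reported MSE is in (µB)² and the MAE in µB because MSE averages squared errors while MAE averages absolute errors. A small sketch of the three regression metrics, on made-up values rather than the paper's data:

```python
from typing import Sequence, Tuple


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MSE, MAE, R^2). MSE is in squared target units (uB^2)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - sum(r * r for r in residuals) / ss_tot
    return mse, mae, r2


if __name__ == "__main__":
    true_moments = [1.0, 2.0, 3.0]   # hypothetical DFT moments (uB)
    predicted = [1.0, 2.0, 4.0]      # hypothetical model predictions (uB)
    mse, mae, r2 = regression_metrics(true_moments, predicted)
    print(f"MSE = {mse:.3f} uB^2, MAE = {mae:.3f} uB, R^2 = {r2:.3f}")
```

R² is unitless because it compares the residual error with the spread of the true values themselves.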

9.
Sci Rep ; 12(1): 16933, 2022 10 08.
Article in English | MEDLINE | ID: mdl-36209286

ABSTRACT

Protein succinylation is an important post-translational modification (PTM) responsible for many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from supervised word embedding with embeddings from a protein language model called ProtT5-XL-UniRef50 (hereafter, ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to employ embeddings from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods, with performance scores of 0.36, 0.79, and 0.79 for MCC, sensitivity, and specificity, respectively. LMSuccSite is likely to serve as a valuable resource for exploration of succinylation and its role in cellular physiology and disease.


Subjects
Computational Biology , Lysine , Computational Biology/methods , Language , Lysine/metabolism , Protein Processing, Post-Translational , Proteins/metabolism
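LMSuccSite learns a supervised word embedding for the window peptide; an embedding layer of that kind consumes integer indices, one per residue. The sketch shows only that input-encoding step. The vocabulary order, the padding index 0, the index for non-standard residues and the window size are assumptions for illustration, not LMSuccSite's published settings.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = 0                          # index for positions beyond either terminus
UNKNOWN = len(AMINO_ACIDS) + 1   # index 21 for non-standard residues such as X
TOKEN_INDEX: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_window(sequence: str, site: int, half_window: int) -> List[int]:
    """Integer-encode the (2*half_window + 1)-mer centred on `site`."""
    tokens = []
    for pos in range(site - half_window, site + half_window + 1):
        if 0 <= pos < len(sequence):
            tokens.append(TOKEN_INDEX.get(sequence[pos].upper(), UNKNOWN))
        else:
            tokens.append(PAD)
    return tokens


if __name__ == "__main__":
    # Lysine K at index 1; one window position falls before the N-terminus.
    print(encode_window("MKAL", site=1, half_window=2))  # [0, 11, 9, 1, 10]
```

The embedding layer then maps each index to a trainable vector, so a vocabulary of 22 tokens (20 residues, padding, unknown) is all it needs.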
11.
Methods Mol Biol ; 2499: 65-104, 2022.
Article in English | MEDLINE | ID: mdl-35696075

ABSTRACT

Machine learning has become one of the most popular choices for developing computational approaches in protein structural bioinformatics. The ability to extract features from protein sequence or structure is often a crucial step in developing machine learning-based approaches. Over the years, various sequence, structural, and physicochemical descriptors have been developed for proteins, and these descriptors have been used to address various bioinformatics problems. Hence, several feature extraction tools have been developed to help researchers generate numeric features from protein sequences. Most of these tools have limitations regarding the number of sequences they can handle and require subsequent preprocessing of the generated features before they can be fed to machine learning methods. Here, we present Feature Extraction from Protein Sequences (FEPS), a toolkit for feature extraction. FEPS is a versatile software package for generating various descriptors from protein sequences and can handle any number of sequences, limited only by the available computational resources. In addition, the features extracted by FEPS require no subsequent processing and are ready to be fed to machine learning techniques, as FEPS provides various output formats as well as the ability to concatenate the generated features. FEPS is freely available via an online web server as well as a stand-alone toolkit. FEPS, a comprehensive toolkit for feature extraction, will help spur the development of machine learning-based models for various bioinformatics problems.


Subjects
Computational Biology , Software , Algorithms , Amino Acid Sequence , Computational Biology/methods , Machine Learning , Proteins/chemistry
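FEPS's own interface is not described here, so the sketch below does not use it. It illustrates one classic descriptor of the kind such toolkits generate, the 20-dimensional amino acid composition (AAC), written as one comma-separated row per sequence so that rows from many sequences form a feature table. Function names and input sequences are hypothetical.

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Fraction of each of the 20 standard residues (a 20-dim descriptor)."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]  # drop non-standard
    if not seq:
        raise ValueError("no standard residues in sequence")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    sequences = {"seq1": "MKTAYIAKQR", "seq2": "GGSAAC"}  # hypothetical inputs
    print(",".join(["id"] + list(AMINO_ACIDS)))
    for name, seq in sequences.items():
        comp = amino_acid_composition(seq)
        print(",".join([name] + [f"{comp[aa]:.3f}" for aa in AMINO_ACIDS]))
```

Because every sequence yields the same 20 columns regardless of length, the rows can be fed to a classifier without further preprocessing, and other fixed-length descriptors can be appended column-wise.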
12.
Methods Mol Biol ; 2499: 155-176, 2022.
Article in English | MEDLINE | ID: mdl-35696080

ABSTRACT

Peroxiredoxins (Prxs) are a protein superfamily, present in all organisms, that play a critical role in protecting cellular macromolecules from oxidative damage but also regulate intracellular and intercellular signaling processes involving redox-regulated proteins and pathways. Bioinformatic approaches using computational tools that focus on active site-proximal sequence fragments (known as active site signatures) and iterative clustering and searching methods (referred to as TuLIP and MISST) have recently enabled the recognition of over 38,000 peroxiredoxins, as well as their classification into six functionally relevant groups. With these data providing so many examples of Prxs in each class, machine learning approaches offer an opportunity to extract additional information about features characteristic of these protein groups. In this study, we developed a novel computational method named "RF-Prx" based on a random forest (RF) approach integrated with K-space amino acid pairs (KSAAP) to identify peroxiredoxins and classify them into one of six subgroups. Our method outperformed other machine learning classifiers. The RF approach integrated with K-space amino acid pairs also enabled the detection of potentially important class-specific conserved sequences outside the known functional centers. For example, drugs designed to target Prx proteins would likely suffer from cross-reactivity among distinct Prxs if targeted to conserved active sites, but this may be avoidable if remote, class-specific regions could be targeted instead.


Subjects
Computational Biology , Peroxiredoxins , Amino Acids/metabolism , Oxidation-Reduction , Oxidative Stress , Peroxiredoxins/chemistry
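K-space amino acid pairs (KSAAP) count how often each of the 400 ordered residue pairs occurs with exactly k residues between them. The abstract does not give RF-Prx's maximum k or its normalization, so the sketch below assumes normalization by the number of valid pairs and concatenates k = 0, 1, 2; the resulting vector is the kind of input a random forest classifier would receive.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def ksaap(sequence: str, k: int) -> Dict[str, float]:
    """Normalised counts of residue pairs separated by exactly k residues."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


if __name__ == "__main__":
    seq = "MSKGEELFTGVVPILVELDGDVNGHKF"  # hypothetical fragment
    vector = []
    for k in range(3):  # k = 0, 1, 2 -> 3 x 400 = 1200 features
        features = ksaap(seq, k)
        vector.extend(features[p] for p in PAIRS)
    print(len(vector))  # 1200
```

Larger k values capture residue co-occurrences that are further apart in sequence, which is how such features can pick up conserved patterns away from the active site.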
13.
Methods Mol Biol ; 2499: 285-322, 2022.
Article in English | MEDLINE | ID: mdl-35696087

ABSTRACT

Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes that gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTM is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass spectrometry-based approaches, have immensely helped in identifying and characterizing PTMs. However, experimental approaches alone are not enough to understand and characterize the more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, owing to advances in deep learning (DL) and the explosion of DL applications across fields, computational PTM prediction has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we review recent advances in a less-studied PTM, namely proteolytic cleavage prediction. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.


Subjects
Deep Learning , Proteomics , Mass Spectrometry , Protein Processing, Post-Translational , Proteins/chemistry
14.
Sci Rep ; 12(1): 6541, 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35449168

ABSTRACT

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. The natural sciences, however, are interested in finding a robust, interpretable function for the target phenomenon that can return predictions even outside the training domains. This paper focuses on the viscosity prediction problem in steelmaking and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation and is able to extrapolate to unseen domains. In addition, in the natural sciences some measurements are often unavailable or more expensive than others because of physical constraints. To this end, we employ a transfer learning framework based on Gaussian processes, which allows us to estimate the regression parameters using auxiliary measurements available at reasonable cost. In experiments using viscosity measurements in a high-temperature slag suspension system, ERR compared favorably with various machine learning approaches in interpolation settings, while outperforming all of them in extrapolation settings. Furthermore, after estimating parameters using the auxiliary dataset obtained at room temperature, an increase in accuracy was observed on the high-temperature dataset, which corroborates the effectiveness of the proposed approach.
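The abstract does not spell out the equation ERR parameterizes. A commonly quoted form of the Einstein-Roscoe equation for a suspension with solid volume fraction φ is η = η₀(1 − Rφ)^(−n), where Roscoe's classic values are R = 1.35 and n = 2.5. Assuming that form, and treating η₀ and R as known, the sketch below fits only n by least squares on the log-linearized equation. It is not ERR itself, which also involves a Gaussian-process transfer step.

```python
import math
from typing import Sequence


def fit_exponent(phi: Sequence[float], eta: Sequence[float],
                 eta0: float, r: float) -> float:
    """Least-squares estimate of n in eta = eta0 * (1 - r*phi) ** (-n).

    Taking logs gives ln(eta/eta0) = n * x with x = -ln(1 - r*phi), a line
    through the origin, so n_hat = sum(x*y) / sum(x*x).
    """
    xs = [-math.log(1 - r * p) for p in phi]   # requires r * phi < 1
    ys = [math.log(e / eta0) for e in eta]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    eta0, r, n_true = 0.5, 1.35, 2.5   # eta0 is an arbitrary liquid viscosity
    phi = [0.0, 0.1, 0.2, 0.3]         # hypothetical solid volume fractions
    eta = [eta0 * (1 - r * p) ** (-n_true) for p in phi]  # noise-free synthetic data
    print(round(fit_exponent(phi, eta, eta0, r), 6))  # 2.5
```

Because the fitted quantity is a coefficient of a physical law rather than an arbitrary regressor weight, the fitted curve can be evaluated at volume fractions outside the training range, which is the extrapolation property the abstract emphasizes.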

15.
Bioinformatics ; 38(6): 1754-1755, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34978562

ABSTRACT

MOTIVATION: Accurate and efficient predictions of protein structures play an important role in understanding their functions. Iterative Threading Assembly Refinement (I-TASSER) is one of the most successful and widely used protein structure prediction methods in the recent community-wide CASP experiments. Yet, the computational efficiency of I-TASSER is one of the limiting factors that prevent its application for large-scale structure modeling. RESULTS: We present I-TASSER for Graphics Processing Units (GPU-I-TASSER), a GPU accelerated I-TASSER protein structure prediction tool for fast and accurate protein structure prediction. Our implementation is based on OpenACC parallelization of the replica-exchange Monte Carlo simulations to enhance the speed of I-TASSER by extending its capabilities to the GPU architecture. On a benchmark dataset of 71 protein structures, GPU-I-TASSER achieves on average a 10× speedup with comparable structure prediction accuracy compared to the CPU version of the I-TASSER. AVAILABILITY AND IMPLEMENTATION: The complete source code for GPU-I-TASSER can be downloaded and used without restriction from https://zhanggroup.org/GPU-I-TASSER/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Proteins/chemistry , Monte Carlo Method , Algorithms
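GPU-I-TASSER's speedup comes from parallelizing replica-exchange Monte Carlo, in which several copies of the simulation run at different temperatures and periodically try to swap states. The sketch shows the standard Metropolis swap criterion only; it is not I-TASSER code, and here a replica's state is reduced to its energy for brevity.

```python
import math
import random
from typing import List


def swap_probability(beta_i: float, beta_j: float,
                     energy_i: float, energy_j: float) -> float:
    """Metropolis acceptance probability for exchanging two replicas' states.

    beta = 1 / (k_B T). The swap is always accepted when it moves the
    lower-energy state to the colder (higher-beta) replica.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(betas: List[float], energies: List[float],
                 i: int, j: int, rng=random) -> bool:
    """Swap the states (here just energies) of replicas i and j if accepted."""
    if rng.random() < swap_probability(betas[i], betas[j], energies[i], energies[j]):
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False


if __name__ == "__main__":
    betas = [1.0, 0.5]        # replica 0 is the colder one
    energies = [10.0, 5.0]    # but it currently holds the higher energy
    print(swap_probability(betas[0], betas[1], energies[0], energies[1]))  # 1.0
    attempt_swap(betas, energies, 0, 1)
    print(energies)           # [5.0, 10.0]
```

Between swap attempts each replica runs its own Monte Carlo moves independently, which is what makes the scheme a natural fit for parallel hardware such as GPUs.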
16.
Molecules ; 26(23)2021 Dec 02.
Article in English | MEDLINE | ID: mdl-34885895

ABSTRACT

Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most existing predictors for N-linked glycosylation use the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to N-X-[S/T] sequons. DeepNGlyPred will be a useful resource for the glycobiology community.


Subjects
Proteome/chemistry , Deep Learning , Glycosylation , Humans , Models, Biological , Neural Networks, Computer , Polysaccharides/analysis , Protein Processing, Post-Translational
17.
Front Cell Dev Biol ; 9: 662983, 2021.
Article in English | MEDLINE | ID: mdl-34249915

ABSTRACT

Phosphorylation, which is mediated by protein kinases and opposed by protein phosphatases, is an important post-translational modification that regulates many cellular processes, including cellular metabolism, cell migration, and cell division. Due to its essential role in cellular physiology, a great deal of attention has been devoted to identifying sites of phosphorylation on cellular proteins and understanding how modification of these sites affects their cellular functions. This has led to the development of several computational methods designed to predict sites of phosphorylation based on a protein's primary amino acid sequence. In contrast, much less attention has been paid to dephosphorylation and its role in regulating the phosphorylation status of proteins inside cells. Indeed, to date, dephosphorylation site prediction tools have been restricted to a few tyrosine phosphatases. To fill this knowledge gap, we employed a transfer learning strategy to develop a deep learning-based model to predict sites that are likely to be dephosphorylated. Based on independent test results, our model, which we termed DTL-DephosSite, achieved efficiency scores for phosphoserine/phosphothreonine residues of 84%, 84%, and 0.68 for sensitivity (SN), specificity (SP), and Matthews correlation coefficient (MCC), respectively. Similarly, DTL-DephosSite exhibited efficiency scores of 75%, 88%, and 0.64 for phosphotyrosine residues with respect to SN, SP, and MCC.

18.
Sci Rep ; 11(1): 12550, 2021 06 15.
Article in English | MEDLINE | ID: mdl-34131195

ABSTRACT

Protein phosphorylation, one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning-based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM) achieves the best performance in predicting phosphorylation sites in C. reinhardtii. This model, named Chlamy-EnPhosSite, achieved a best AUC and MCC of 0.90 and 0.64, respectively, on a combined serine (S) and threonine (T) dataset in independent testing, higher than those of other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite predicted 499,411 phosphorylation sites with a cut-off value of 0.5 and 237,949 phosphorylation sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in a blinded study: 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5, and 76.83% were successfully identified at a more stringent cut-off of 0.7. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) that did not appear in the training dataset, highlighting its prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has the potential to serve as a useful tool for the community. Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.


Subjects
Chlamydomonas reinhardtii/genetics , Deep Learning , Phosphoproteins/genetics , Proteome/genetics , Chromatography, Liquid , Phosphorylation/genetics , Protein Processing, Post-Translational/genetics , Serine/genetics , Tandem Mass Spectrometry , Threonine/genetics
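The two cut-offs above trade the number of predicted sites against stringency. The sketch applies a cut-off to probability scores (treating a score equal to the cut-off as positive, an assumption) and recomputes the proportions implied by the reported counts.

```python
from typing import Sequence


def count_predicted(scores: Sequence[float], cutoff: float) -> int:
    """Number of candidate sites whose probability is at or above `cutoff`."""
    return sum(1 for s in scores if s >= cutoff)


if __name__ == "__main__":
    toy_scores = [0.20, 0.55, 0.71, 0.90]   # hypothetical model outputs
    print(count_predicted(toy_scores, 0.5), count_predicted(toy_scores, 0.7))  # 3 2

    # Proportions implied by the proteome-wide counts reported above.
    total_st = 1_809_304
    for cutoff, predicted in [(0.5, 499_411), (0.7, 237_949)]:
        print(f"cut-off {cutoff}: {predicted / total_st:.2%} of S/T sites predicted")

    # Approximate number of the 2,663 LC-MS/MS sites recovered at each cut-off.
    n_exp = 2_663
    for cutoff, fraction in [(0.5, 0.8969), (0.7, 0.7683)]:
        print(f"cut-off {cutoff}: about {round(n_exp * fraction)} of {n_exp} sites")
```

This works out to about 27.6% and 13.2% of all S/T sites flagged, and roughly 2,388 and 2,046 of the 2,663 experimental sites recovered, so raising the cut-off roughly halves the predicted set while losing about 13 percentage points of recall.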
19.
Int J Mol Sci ; 22(11)2021 May 24.
Article in English | MEDLINE | ID: mdl-34074028

ABSTRACT

Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinnings of biology. Although recent advances in experimental approaches have greatly enhanced our ability to determine protein structures, the gap between the number of protein sequences and known protein structures keeps increasing. Computational protein structure prediction is one way to fill this gap. Recently, the protein structure prediction field has witnessed many advances due to deep learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and quality assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.


Subjects
Computational Biology/methods , Cryoelectron Microscopy/methods , Deep Learning , Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Databases, Protein , Models, Molecular , Neural Networks, Computer , Protein Conformation , Software
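For readers new to the contact-map step mentioned above: a common CASP-style definition calls residues i and j in contact when their Cβ atoms (Cα for glycine) are closer than 8 Å. The sketch builds such a map from coordinates; the minimum sequence separation of 6 is one conventional choice and is exposed as a parameter. A distogram generalizes this by binning the distance instead of applying a single threshold.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(cb_coords: Sequence[Coord], threshold: float = 8.0,
                min_separation: int = 6) -> List[List[bool]]:
    """Boolean residue-residue contact map from C-beta coordinates (angstroms).

    Residues i and j are in contact when their C-beta atoms (C-alpha for
    glycine) lie within `threshold`; only pairs at least `min_separation`
    apart in sequence are considered, so the map is symmetric.
    """
    n = len(cb_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(cb_coords[i], cb_coords[j]) < threshold:
                contacts[i][j] = contacts[j][i] = True
    return contacts


if __name__ == "__main__":
    # Three hypothetical residues on a line; min_separation lowered for the toy case.
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(contact_map(coords, min_separation=1))
    # [[False, True, False], [True, False, False], [False, False, False]]
```

Predicted maps of this form, or their distogram counterparts, are what later pipeline stages turn into 3D restraints.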
20.
BMC Bioinformatics ; 21(Suppl 3): 63, 2020 Apr 23.
Article in English | MEDLINE | ID: mdl-32321437

ABSTRACT

BACKGROUND: Protein succinylation has recently emerged as an important and common post-translational modification (PTM) that occurs on lysine residues. Succinylation is notable both for its size (at 100 Da, it is one of the larger chemical PTMs) and for its ability to change the net charge of the modified lysine residue from +1 to -1 at physiological pH. The gross local changes that occur in proteins upon succinylation have been shown to correspond with changes in gene activity and to be perturbed by defects in the citric acid cycle. These observations, together with the fact that succinate is generated as a metabolic intermediate during cellular respiration, have led to suggestions that protein succinylation may play a role in the interaction between cellular metabolism and important cellular functions. For instance, succinylation likely represents an important aspect of genomic regulation and repair and may have important consequences in the etiology of a number of disease states. In this study, we developed DeepSuccinylSite, a novel prediction tool that uses deep learning methodology along with embedding to identify succinylation sites in proteins based on their primary structure. RESULTS: Using an independent test set of experimentally identified succinylation sites, our method achieved efficiency scores of 79%, 68.7%, and 0.48 for sensitivity, specificity, and MCC, respectively, with an area under the receiver operating characteristic (ROC) curve of 0.8. In side-by-side comparisons with previously described succinylation predictors, DeepSuccinylSite represents a significant improvement in overall accuracy for prediction of succinylation sites. CONCLUSION: Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein succinylation.


Subjects
Deep Learning , Protein Processing, Post-Translational , Proteins/metabolism , Succinates/metabolism , Binding Sites , Citric Acid Cycle , Lysine/metabolism , Proteins/chemistry
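An area under the ROC curve of 0.8 can be read as a roughly 80% chance that a randomly chosen true succinylation site scores higher than a randomly chosen non-site. The sketch computes AUC directly from that pairwise definition, with ties counted as one half; the scores are hypothetical, and the quadratic loop favors clarity over speed.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    This pairwise (Mann-Whitney) form equals the area under the ROC curve;
    tied pairs contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    positives = [0.9, 0.4]   # hypothetical scores for true sites
    negatives = [0.5, 0.1]   # hypothetical scores for non-sites
    print(roc_auc(positives, negatives))  # 0.75: 3 of the 4 pairs are ranked correctly
```

Unlike sensitivity and specificity, this value does not depend on any particular decision threshold, which is why it is often reported alongside them.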