1.
BMC Bioinformatics ; 24(1): 41, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36755242

ABSTRACT

BACKGROUND: Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling of all main classes of protein. It is involved in several biological processes, including immune response, protein stability, transcription regulation, post-translational regulation, DNA damage repair, and redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. RESULTS: Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from a supervised embedding layer and a contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735 and 0.773 for MCC, sensitivity and specificity, respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. CONCLUSION: Together, the experimental results suggest that pLMSNOSite achieves significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite.
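
The abstracts in this listing report sensitivity, specificity and the Matthews correlation coefficient (MCC). As a reading aid, the sketch below shows how these three scores are conventionally derived from binary confusion-matrix counts. The function name `binary_metrics` is hypothetical, and nothing here is taken from the pLMSNOSite code base.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Undefined ratios (zero denominator) are returned as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

MCC uses all four counts, so it is less sensitive to class imbalance than accuracy, which may be why site-prediction papers tend to report it next to sensitivity and specificity.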


Subjects
Nitric Oxide; Proteins; Animals; Proteins/metabolism; Nitric Oxide/metabolism; Oxidation-Reduction; Protein Processing, Post-Translational; Signal Transduction
3.
Methods Mol Biol ; 2499: 285-322, 2022.
Article in English | MEDLINE | ID: mdl-35696087

ABSTRACT

Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTMs is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches alone are not enough to understand and characterize more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of deep learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTMs has also witnessed the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in a less-studied PTM, that is, proteolytic cleavage prediction. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.


Subjects
Deep Learning; Proteomics; Mass Spectrometry; Protein Processing, Post-Translational; Proteins/chemistry
4.
Sci Rep ; 12(1): 6541, 2022 Apr 21.
Article in English | MEDLINE | ID: mdl-35449168

ABSTRACT

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. The natural sciences, however, are interested in finding a robust, interpretable function for the target phenomenon that can return predictions even outside the training domain. This paper focuses on the viscosity prediction problem in steelmaking and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation and is able to extrapolate to unseen domains. In addition, it is often the case in the natural sciences that some measurements are unavailable or more expensive than others due to physical constraints. To this end, we employ a transfer learning framework based on Gaussian processes, which allows us to estimate the regression parameters using auxiliary measurements available at a reasonable cost. In experiments using viscosity measurements in a high-temperature slag suspension system, ERR compared favorably with various machine learning approaches in interpolation settings and outperformed all of them in extrapolation settings. Furthermore, after estimating parameters using the auxiliary dataset obtained at room temperature, an increase in accuracy is observed on the high-temperature dataset, which corroborates the effectiveness of the proposed approach.
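
To make "learning the coefficients of the Einstein-Roscoe equation" concrete, here is a minimal sketch. It assumes the commonly written form of the equation, eta = eta0 * (1 - a*f)^(-n), where f is the solid fraction and eta0 the viscosity of the particle-free liquid; the abstract itself does not state the formula. The fit is a plain grid search over a with a closed-form least-squares estimate of n in log space. This is a stand-in for illustration, not the Gaussian-process transfer learning described in the paper, and the names `einstein_roscoe` and `fit_a_n` are hypothetical.

```python
import numpy as np


def einstein_roscoe(f, eta0, a, n):
    """Assumed model form: eta = eta0 * (1 - a*f) ** (-n)."""
    return eta0 * (1.0 - a * f) ** (-n)


def fit_a_n(f, eta, eta0, a_grid=None):
    """Estimate (a, n) from solid fractions f and measured viscosities eta.

    For a fixed a, log(eta/eta0) = n * (-log(1 - a*f)) is linear in n, so n
    has a closed-form least-squares solution; a is chosen by grid search.
    """
    f = np.asarray(f, dtype=float)
    y = np.log(np.asarray(eta, dtype=float) / eta0)
    if a_grid is None:
        # Keep 1 - a*f strictly positive for every sample.
        a_grid = np.linspace(0.5, 0.99 / f.max(), 500)
    best = None
    for a in a_grid:
        x = -np.log(1.0 - a * f)
        n = float(x @ y / (x @ x))
        err = float(np.sum((y - n * x) ** 2))
        if best is None or err < best[0]:
            best = (err, float(a), n)
    return best[1], best[2]
```

Because the fitted quantities are the physical coefficients a and n rather than weights of a black-box regressor, the model can be evaluated at solid fractions outside the training range, which is the extrapolation property the abstract emphasizes.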

5.
Mol Omics ; 16(5): 448-454, 2020 Oct 12.
Article in English | MEDLINE | ID: mdl-32555810

ABSTRACT

Methylation, which is one of the most prominent post-translational modifications on proteins, regulates many important cellular functions. Though several model-based methylation site predictors have been reported, all existing methods employ machine learning strategies, such as support vector machines and random forest, to predict sites of methylation based on a set of "hand-selected" features. As a consequence, the resulting models may be biased toward one set of features. Moreover, due to the large number of features, model development can often be computationally expensive. In this paper, we propose an alternative approach based on deep learning to predict arginine methylation sites. Our model, which we termed DeepRMethylSite, is computationally less expensive than traditional feature-based methods while eliminating potential biases that can arise through feature selection. Based on independent testing on our dataset, DeepRMethylSite achieved efficiency scores of 68%, 82% and 0.51 with respect to sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC), respectively. Importantly, in side-by-side comparisons with other state-of-the-art methylation site predictors, our method performs on par or better in all scoring metrics tested.


Subjects
Algorithms; Arginine/metabolism; Deep Learning; Protein Processing, Post-Translational; Proteins/metabolism; Databases, Protein; Methylation; Neural Networks, Computer; ROC Curve; Reproducibility of Results
6.
BMC Bioinformatics ; 21(Suppl 3): 63, 2020 Apr 23.
Article in English | MEDLINE | ID: mdl-32321437

ABSTRACT

BACKGROUND: Protein succinylation has recently emerged as an important and common post-translational modification (PTM) that occurs on lysine residues. Succinylation is notable both in its size (e.g., at 100 Da, it is one of the larger chemical PTMs) and in its ability to modify the net charge of the modified lysine residue from +1 to -1 at physiological pH. The gross local changes that occur in proteins upon succinylation have been shown to correspond with changes in gene activity and to be perturbed by defects in the citric acid cycle. These observations, together with the fact that succinate is generated as a metabolic intermediate during cellular respiration, have led to suggestions that protein succinylation may play a role in the interaction between cellular metabolism and important cellular functions. For instance, succinylation likely represents an important aspect of genomic regulation and repair and may have important consequences in the etiology of a number of disease states. In this study, we developed DeepSuccinylSite, a novel prediction tool that uses deep learning methodology along with embedding to identify succinylation sites in proteins based on their primary structure. RESULTS: Using an independent test set of experimentally identified succinylation sites, our method achieved efficiency scores of 79%, 68.7% and 0.48 for sensitivity, specificity and MCC, respectively, with an area under the receiver operating characteristic (ROC) curve of 0.8. In side-by-side comparisons with previously described succinylation predictors, DeepSuccinylSite represents a significant improvement in overall accuracy for prediction of succinylation sites. CONCLUSION: Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein succinylation.


Subjects
Deep Learning; Protein Processing, Post-Translational; Proteins/metabolism; Succinates/metabolism; Binding Sites; Citric Acid Cycle; Lysine/metabolism; Proteins/chemistry
7.
Mol Omics ; 15(3): 189-204, 2019 Jun 01.
Article in English | MEDLINE | ID: mdl-31025681

ABSTRACT

Glutarylation, which is a newly identified posttranslational modification that occurs on lysine residues, has recently emerged as an important regulator of several metabolic and mitochondrial processes. However, the specific sites of modification on individual proteins, as well as the extent of glutarylation throughout the proteome, remain largely uncharacterized. Though informative, proteomic approaches based on mass spectrometry can be expensive, technically challenging and time-consuming. Therefore, the ability to predict glutarylation sites from protein primary sequences can complement proteomics analyses and help researchers study the characteristics and functional consequences of glutarylation. To this end, we used Random Forest (RF) machine learning strategies to identify the physicochemical and sequence-based features that correlated most substantially with glutarylation. We then used these features to develop a novel method to predict glutarylation sites from primary amino acid sequences using RF. Based on 10-fold cross-validation, the resulting algorithm, termed 'RF-GlutarySite', achieved efficiency scores of 75%, 81%, 68% and 0.50 with respect to accuracy (ACC), sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC), respectively. Likewise, using an independent test set, RF-GlutarySite exhibited ACC, SN, SP and MCC scores of 72%, 73%, 70% and 0.43, respectively. Results using both 10-fold cross-validation and an independent test set were on par with or better than those achieved by existing glutarylation site predictors. Notably, RF-GlutarySite achieved the highest SN score among available glutarylation site prediction tools. Consequently, our method has the potential to uncover new glutarylation sites and to facilitate the discovery of relationships between glutarylation and well-known lysine modifications, such as acetylation, methylation and SUMOylation, as well as a number of recently identified lysine modifications, such as malonylation and succinylation.
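
The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows: the samples are shuffled once, partitioned into k disjoint folds, and each fold serves as the held-out set exactly once. This is a generic illustration with a hypothetical helper name (`k_fold_indices`), and the RF-GlutarySite authors' actual splitting procedure may differ, for example by stratifying on the class label.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Scores reported "based on 10-fold cross-validation" are typically averages over the k held-out folds, whereas an independent test set is never seen during any of the k training runs.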


Subjects
Computational Biology/methods; Glutarates/metabolism; Proteomics/methods; Algorithms; Amino Acid Sequence; Amino Acids; Models, Chemical; Protein Conformation; Protein Processing, Post-Translational; Support Vector Machine
8.
IEEE/ACM Trans Comput Biol Bioinform ; 15(6): 1844-1852, 2018.
Article in English | MEDLINE | ID: mdl-29990125

ABSTRACT

The Nuclear Receptor (NR) superfamily plays an important role in key biological, developmental, and physiological processes. Developing a method for the classification of NR proteins is an important step towards understanding the structure and functions of newly discovered NR proteins. Recent studies on NR classification are either unable to achieve optimum accuracy or are not designed for all the known NR subfamilies. In this study, we developed RF-NR, which is a Random Forest based approach for improved classification of nuclear receptors. RF-NR can predict whether a query protein sequence belongs to one of the eight NR subfamilies or is a non-NR sequence. RF-NR uses spectrum-like features, namely Amino Acid Composition, Dipeptide Composition, and Tripeptide Composition. Benchmarked on two independent datasets with varying sequence redundancy reduction criteria, RF-NR achieves better (or comparable) accuracy than other existing methods. The added advantage of our approach is that we can also obtain biological insights about the important features that are required to classify NR subfamilies. RF-NR is freely available at http://bcb.ncat.edu/RF_NR.
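
The spectrum-like features named in this abstract are relative frequencies of k-mers over the 20 standard amino acids: k = 1 for amino acid composition, k = 2 for dipeptide composition, and k = 3 for tripeptide composition. The sketch below assumes overlapping k-mers and skips any k-mer containing a non-standard residue; these choices and the function names are illustrative assumptions, not details taken from RF-NR.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k):
    """Relative frequency of every k-mer over the 20 standard amino acids."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers with non-standard residues
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def spectrum_features(seq):
    """Concatenate amino acid, dipeptide and tripeptide compositions."""
    seq = seq.upper()
    return (kmer_composition(seq, 1)
            + kmer_composition(seq, 2)
            + kmer_composition(seq, 3))
```

The resulting vector has 20 + 400 + 8000 = 8420 entries and a fixed length regardless of sequence length, which is what lets a Random Forest consume proteins of different sizes.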


Subjects
Computational Biology/methods; Receptors, Cytoplasmic and Nuclear/chemistry; Receptors, Cytoplasmic and Nuclear/classification; Algorithms; Databases, Protein; Machine Learning
9.
Article in English | MEDLINE | ID: mdl-28113600

ABSTRACT

Computing similarity or dissimilarity between protein structures is an important task in structural biology. A conventional method to compute protein structure dissimilarity requires structural alignment of the proteins. However, defining one best alignment is difficult, especially when the structures are very different. In this paper, we propose a new similarity measure for protein structure comparisons using a set of multi-view 2D images of 3D protein structures. In this approach, each protein structure is represented by a subspace from the image set. The similarity between two protein structures is then characterized by the canonical angles between the two subspaces. The primary advantage of our method is that precise alignment is not needed. We employed Grassmann Discriminant Analysis (GDA) as the subspace-based learning method in the classification framework. We applied our method to the classification problem of seven SCOP structural classes of protein 3D structures. The proposed method outperformed the k-nearest neighbor method (k-NN) based on the conventional alignment-based methods CE, FATCAT, and TM-align. Our method was also applied to the classification of SCOP folds of membrane proteins, where the proposed method could recognize the fold HEM-binding four-helical bundle (f.21) much better than TM-align.
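
Canonical (principal) angles between two subspaces, the similarity measure this abstract relies on, can be computed with a standard linear-algebra recipe: orthonormalize a basis for each subspace, then take the arccosine of the singular values of the product of the two bases. The sketch below uses NumPy and assumes each input matrix has full column rank; how the paper builds the subspaces from multi-view images is not shown, and the function name is hypothetical.

```python
import numpy as np


def canonical_angles(A, B):
    """Canonical angles (radians, ascending) between span(A) and span(B).

    A and B are (d, p) and (d, q) matrices whose columns span the two
    subspaces; both are assumed to have full column rank.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis of span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip guards against round-off pushing cosines slightly above 1.
    return np.arccos(np.clip(s, -1.0, 1.0))
```

Angles of zero mean the subspaces share a direction, and angles of pi/2 mean the corresponding directions are orthogonal. Because the angles depend only on the subspaces and not on the particular basis chosen, no alignment between the two structures is required.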


Subjects
Computational Biology/methods; Protein Conformation; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Databases, Protein; Discriminant Analysis; Protein Folding; Proteins/classification
10.
BMC Bioinformatics ; 18(Suppl 16): 577, 2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29297322

ABSTRACT

BACKGROUND: The β-Lactamase (BL) enzyme family is an important class of enzymes that plays a key role in bacterial resistance to antibiotics. As the number of newly identified BL enzymes is increasing daily, it is imperative to develop a computational tool to classify the newly identified BL enzymes into one of its classes. There are two types of classification of BL enzymes: Molecular Classification and Functional Classification. Existing computational methods only address Molecular Classification, and the performance of these existing methods is unsatisfactory. RESULTS: We addressed the unsatisfactory performance of the existing methods by implementing a deep learning approach called Convolutional Neural Network (CNN). We developed CNN-BLPred, an approach for the classification of BL proteins. CNN-BLPred uses Gradient Boosted Feature Selection (GBFS) in order to select the ideal feature set for each BL classification. Based on rigorous benchmarking of CNN-BLPred using both leave-one-out cross-validation and independent test sets, CNN-BLPred performed better than the other existing algorithms. Compared with other architectures of CNN, Recurrent Neural Network, and Random Forest, the simple CNN architecture with only one convolutional layer performs the best. After feature extraction, we were able to remove ~95% of the 10,912 features using Gradient Boosted Trees. During 10-fold cross-validation, we increased the accuracy of the classic BL predictions by 7%. We also increased the accuracy of Class A, Class B, Class C, and Class D performance by an average of 25.64%. The independent test results followed a similar trend. CONCLUSIONS: We implemented a deep learning algorithm known as Convolutional Neural Network (CNN) to develop a classifier for BL classification. Combined with feature selection on an exhaustive feature set and using balancing methods such as Random Oversampling (ROS), Random Undersampling (RUS) and Synthetic Minority Oversampling Technique (SMOTE), CNN-BLPred performs significantly better than existing algorithms for BL classification.
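
Two of the balancing methods named in this abstract, Random Oversampling (ROS) and Random Undersampling (RUS), are simple enough to sketch in a few lines. ROS duplicates randomly chosen minority-class samples until every class matches the largest one, while RUS randomly discards samples until every class matches the smallest one. SMOTE, which synthesizes new minority samples by interpolation, is not shown. The function name and interface are hypothetical and are not taken from CNN-BLPred.

```python
import random
from collections import defaultdict


def random_resample(samples, labels, mode="over", seed=0):
    """Balance classes by random oversampling ("over") or undersampling ("under")."""
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    sizes = [len(xs) for xs in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if mode == "over":
            # Keep all originals, then draw extras with replacement.
            picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        else:
            # Draw a subset without replacement.
            picked = rng.sample(xs, target)
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y
```

Resampling of this kind is normally applied to the training portion only, so that held-out folds and independent test sets keep their natural class distribution.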


Subjects
Algorithms; Neural Networks, Computer; beta-Lactamases/classification; Amino Acid Sequence; Databases, Protein; Models, Molecular; ROC Curve; Reproducibility of Results