1.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2029-2040, 2023.
Article in English | MEDLINE | ID: mdl-37015594

ABSTRACT

Peptide-binding proteins play significant roles in various biological processes such as gene expression, metabolism, signal transmission, DNA (deoxyribonucleic acid) repair, and replication. Investigating the binding residues in protein-peptide complexes, especially from sequence alone, is challenging both experimentally and computationally. Although several computational approaches have been introduced to predict these binding residues, there is still ample room to improve prediction performance. In this work, we introduce a novel ensemble machine learning-based approach called SPPPred (Sequence-based Protein-Peptide binding residue Prediction) to predict protein-peptide binding residues. First, we extract relevant sequence information and employ a genetic programming algorithm for feature construction to find more distinctive features. We then build an ensemble-based machine learning classifier to predict binding residues. The proposed method shows consistent and comparable performance on both ten-fold cross-validation and an independent test set. Furthermore, SPPPred yields an F-Measure (F-M), Accuracy (ACC), and Matthews Correlation Coefficient (MCC) of 0.310, 0.949, and 0.230, respectively, on the independent test set, outperforming competing methods by up to approximately 9%. SPPPred is publicly available at https://github.com/GTaherzadeh/SPPPred.git.


Subjects
Peptides; Proteins; Proteins/chemistry; Peptides/genetics; Peptides/chemistry; Protein Binding; Machine Learning; DNA/chemistry; Algorithms
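The abstract above reports a high accuracy (0.949) next to a much lower F-measure and MCC, which is typical when binding residues are a small minority class. The following minimal sketch shows how the three metrics are computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the SPPPred paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # MCC uses all four cells, so it is not inflated by a large negative class.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc

# Hypothetical counts: 1,000 residues, of which 50 are binding residues.
acc, f_m, mcc = binary_metrics(tp=15, fp=20, tn=930, fn=35)
print(f"ACC={acc:.3f}  F-M={f_m:.3f}  MCC={mcc:.3f}")
```

With these illustrative counts the accuracy stays above 0.9 even though most binding residues are missed, which is why such studies usually report F-measure and MCC alongside accuracy.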
2.
Methods Mol Biol ; 2499: 177-186, 2022.
Article in English | MEDLINE | ID: mdl-35696081

ABSTRACT

Protein glycosylation is one of the most complex post-translational modifications (PTMs) and plays a fundamental role in protein function. Identification and annotation of glycosylation sites using experimental approaches are challenging and time-consuming. Hence, there is a demand for fast and efficient computational methods to address this problem. Here, we present the SPRINT-Gly framework, which contains the largest dataset and a prediction model of glycosylation sites for a given protein sequence. In this framework, we construct a large dataset containing N- and O-linked glycosylation sites of human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of a number of predictive models trained separately for glycosylation sites in human and mouse proteins. The method incorporates sequence-based, predicted structural, and physicochemical information of the residues neighboring each N- and O-linked glycosylation site and trains deep neural network and support vector machine classifiers. SPRINT-Gly outperformed other existing methods by achieving 18% and 50% higher Matthews correlation coefficient for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/ .


Subjects
Proteins; Support Vector Machine; Amino Acid Sequence; Animals; Computational Biology/methods; Glycosylation; Humans; Mice; Protein Processing, Post-Translational; Proteins/chemistry
3.
PLoS Comput Biol ; 17(9): e1009380, 2021 09.
Article in English | MEDLINE | ID: mdl-34491988

ABSTRACT

The SARS-CoV-2 pandemic highlights the need for a detailed molecular understanding of protective antibody responses. This is underscored by the emergence and spread of SARS-CoV-2 variants, including Alpha (B.1.1.7) and Delta (B.1.617.2), some of which appear to be less effectively targeted by current monoclonal antibodies and vaccines. Here we report a high resolution and comprehensive map of antibody recognition of the SARS-CoV-2 spike receptor binding domain (RBD), which is the target of most neutralizing antibodies, using computational structural analysis. With a dataset of nonredundant experimentally determined antibody-RBD structures, we classified antibodies by RBD residue binding determinants using unsupervised clustering. We also identified the energetic and conservation features of epitope residues and assessed the capacity of viral variant mutations to disrupt antibody recognition, revealing sets of antibodies predicted to effectively target recently described viral variants. This detailed structure-based reference of antibody RBD recognition signatures can inform therapeutic and vaccine design strategies.


Subjects
Antibodies, Viral; COVID-19/virology; SARS-CoV-2/genetics; Spike Glycoprotein, Coronavirus; Antibodies, Viral/chemistry; Antibodies, Viral/metabolism; Binding Sites; Cluster Analysis; Computational Biology; Humans; Models, Molecular; Protein Binding; Spike Glycoprotein, Coronavirus/chemistry; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism
4.
Comput Struct Biotechnol J ; 18: 3528-3538, 2020.
Article in English | MEDLINE | ID: mdl-33304452

ABSTRACT

RNA modification is an essential step towards the generation of new RNA structures. Such modification can potentially alter RNA function or stability. Among different modifications, 5-hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential for a series of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological functionality. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to obtain maximum sequential information. Our feature analysis shows that the proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS dramatically enhances prediction performance. iRNA5hmC-PS achieves 78.3% prediction performance, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source code, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.
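The abstract does not define the Position-Specific Gapped k-mer feature, so the sketch below is only one plausible reading, not the authors' definition: for k = 2, each feature records the start position, the gap length, and the two nucleotides separated by that gap. The function name `psg_kmer_features`, the `max_gap` parameter, and the feature-key format are all assumptions made for illustration.

```python
def psg_kmer_features(seq, max_gap=2):
    """Map a sequence to position-specific gapped 2-mer indicator features.

    Assumed encoding (hypothetical): the key "pos{i}_gap{g}_{a}{b}" is set to 1
    when nucleotide a occurs at position i and nucleotide b occurs g + 1
    positions later.
    """
    features = {}
    for gap in range(max_gap + 1):
        for i in range(len(seq) - gap - 1):
            key = f"pos{i}_gap{gap}_{seq[i]}{seq[i + gap + 1]}"
            features[key] = 1
    return features

# Toy RNA fragment; real inputs would be fixed-length windows around a site.
example = psg_kmer_features("ACGU", max_gap=1)
print(sorted(example))
```

Because every key embeds the position, sequences must share a common length for the features to be comparable, which is one reason fixed-length windows are assumed here.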

5.
IEEE Access ; 8: 77888-77902, 2020.
Article in English | MEDLINE | ID: mdl-33354488

ABSTRACT

Post-Translational Modification (PTM) is considered an important biological process with a tremendous impact on the function of proteins in both eukaryotic and prokaryotic cells. During the past decades, a wide range of PTMs has been identified. Among them, malonylation is a recently identified PTM which plays a vital role in a wide range of biological interactions. This modification also potentially plays a role in energy metabolism in different species, including Homo sapiens. The identification of PTM sites using experimental methods is time-consuming and costly. Hence, there is a demand for fast and cost-effective computational methods. In this study, we propose a new machine learning method, called Mal-Light, to address this problem. To build this model, we extract local evolutionary-based information according to the interaction of neighboring amino acids using a bi-peptide-based method. We then use Light Gradient Boosting Machine (LightGBM) as our classifier to predict malonylation sites. Our results demonstrate that Mal-Light significantly improves malonylation site prediction performance compared to previous studies in the literature. Using Mal-Light, we achieve Matthews correlation coefficient (MCC) of 0.74 and 0.60, Accuracy of 86.66% and 79.51%, Sensitivity of 78.26% and 67.27%, and Specificity of 95.05% and 91.75%, for Homo sapiens and Mus musculus proteins, respectively. Mal-Light is implemented as an online predictor which is publicly available at: (http://brl.uiu.ac.bd/MalLight/).

6.
Comput Biol Med ; 125: 104022, 2020 10.
Article in English | MEDLINE | ID: mdl-33022522

ABSTRACT

Post-Translational Modification (PTM) is a vital process which plays an important role in a wide range of biological interactions. One of the most recently identified PTMs is malonylation. It has been shown that malonylation has an important impact on different biological pathways, including glucose and fatty acid metabolism. Malonylation can be detected experimentally using mass spectrometry. However, this process is both costly and time-consuming, which has inspired research into faster and more efficient computational methods. This paper proposes a novel approach, called SEMal, to identify malonylation sites in protein sequences. It uses both structural and evolutionary-based features, with Rotation Forest (RoF) as its classification technique to predict malonylation sites. To the best of our knowledge, neither our extracted features nor our employed classifier has previously been used for this problem. SEMal outperforms previously proposed methods in all metrics, including sensitivity (0.94 and 0.89), accuracy (0.94 and 0.91), and Matthews correlation coefficient (0.88 and 0.82), for Homo sapiens and Mus musculus, respectively. SEMal is publicly available as an online predictor at: http://brl.uiu.ac.bd/SEMal/.


Subjects
Lysine; Protein Processing, Post-Translational; Amino Acid Sequence; Animals; Biological Evolution; Humans; Lysine/metabolism; Mice
7.
Genes (Basel) ; 11(9)2020 08 31.
Article in English | MEDLINE | ID: mdl-32878321

ABSTRACT

Post-Translational Modification (PTM) is defined as the alteration of protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and specific subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict glutarylation sites. However, despite these efforts, the prediction performance for glutarylation sites has remained limited. One of the main challenges is to extract features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut, which uses a bi-peptide-based evolutionary method for feature extraction. To build this model, we also use the Extra-Trees (ET) classifier, which, to the best of our knowledge, has never been used for this task. Our results demonstrate that BiPepGlut significantly outperforms previously proposed models. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews Correlation Coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.


Subjects
Evolution, Molecular; Glutarates/chemistry; Lysine/chemistry; Mycobacterium tuberculosis/metabolism; Peptide Fragments/chemistry; Protein Processing, Post-Translational; Proteins/chemistry; Algorithms; Amino Acid Sequence; Animals; Computational Biology; Glutarates/metabolism; Lysine/metabolism; Machine Learning; Mice; Mycobacterium tuberculosis/growth & development; Peptide Fragments/metabolism; Proteins/metabolism; Support Vector Machine
8.
J Theor Biol ; 496: 110278, 2020 07 07.
Article in English | MEDLINE | ID: mdl-32298689

ABSTRACT

MOTIVATION: Interactions between proteins and peptides influence biological functions. Predicting such bio-molecular interactions can lead to faster disease prevention and help in drug discovery. Experimental methods for determining protein-peptide binding sites are costly and time-consuming. Therefore, computational methods have become prevalent. However, existing models show extremely low detection rates of actual peptide binding sites in proteins. To address this problem, we employed a two-stage technique - first, we extracted the relevant features from protein sequences and transformed them into images applying a novel method and then, we applied a convolutional neural network to identify the peptide binding sites in proteins. RESULTS: We found that our approach achieves 67% sensitivity or recall (true positive rate) surpassing existing methods by over 35%.


Subjects
Neural Networks, Computer; Proteins; Binding Sites; Peptides/metabolism; Protein Binding
9.
Curr Opin Struct Biol ; 62: 56-69, 2020 06.
Article in English | MEDLINE | ID: mdl-31874386

ABSTRACT

Protein glycosylation is the most complex and prevalent post-translational modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins, it is important to gain insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms, combined with site-specific identification and occupancy, are necessary steps in this direction. Computational platforms have continued to mature, assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but they frequently support only dedicated workflows, and users rely on manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation, which are now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software, with an emphasis on those that are actively maintained and designed to support current analytical workflows.


Subjects
Databases, Protein; Glycomics/methods; Glycoproteins/chemistry; Proteomics/methods; Software; Animals; Bacteria/chemistry; Computational Biology; Glycosylation; Humans; Machine Learning; Plants/chemistry; Protein Processing, Post-Translational
10.
Bioinformatics ; 35(20): 4140-4146, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30903686

ABSTRACT

MOTIVATION: Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, due to the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N-/O-linked glycosylation sites, respectively. RESULTS: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross validation and independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well in human glycoproteins and vice versa; however, due to significant differences in O-linked sites, separate models were generated. Overall, SPRINT-Gly is 18% and 50% higher in Matthews correlation coefficient than the next best method compared in N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. AVAILABILITY AND IMPLEMENTATION: http://sparks-lab.org/server/SPRINT-Gly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Support Vector Machine; Animals; Glycoproteins; Glycosylation; Humans; Mice; Protein Processing, Post-Translational
11.
Molecules ; 23(12)2018 Dec 10.
Article in English | MEDLINE | ID: mdl-30544729

ABSTRACT

Post-Translational Modification (PTM) is defined as the modification of amino acids along protein sequences after the translation process. These modifications significantly impact the functioning of proteins. Therefore, a comprehensive understanding of the underlying mechanisms of PTMs is critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, which have never been used for this task. As a result, our proposed method outperforms previously proposed sumoylation site prediction methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than that reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.


Subjects
Computational Biology/methods; Proteins/chemistry; Amino Acid Sequence; Binding Sites; Internet; Machine Learning; Models, Molecular; Protein Structure, Secondary; Proteins/genetics; Sumoylation
12.
Curr Protoc Protein Sci ; 94(1): e75, 2018 11.
Article in English | MEDLINE | ID: mdl-30106511

ABSTRACT

Protein-carbohydrate interaction is essential for biological systems, and carbohydrate-binding proteins (CBPs) are important targets when designing antiviral and anticancer drugs. Due to the high cost and difficulty associated with experimental approaches, many computational methods have been developed as complementary approaches to predict CBPs or carbohydrate-binding sites. However, most of these computational methods are not publicly available. Here, we provide a comprehensive review of related studies and demonstrate our two recently developed bioinformatics methods. The method SPOT-CBP is a template-based method for detecting CBPs based on structure through structural homology search combined with a knowledge-based scoring function. This method can yield model complex structure in addition to accurate prediction of CBPs. Furthermore, it has been observed that similarly accurate predictions can be made using structures from homology modeling, which has significantly expanded its applicability. The other method, SPRINT-CBH, is a de novo approach that predicts binding residues directly from protein sequences by using sequence information and predicted structural properties. This approach does not need structurally similar templates and thus is not limited by the current database of known protein-carbohydrate complex structures. These two complementary methods are available at https://sparks-lab.org. © 2018 by John Wiley & Sons, Inc.


Subjects
Computer Simulation; Lectins/chemistry; Lectins/genetics; Sequence Analysis, Protein/methods; Binding Sites
13.
J Comput Chem ; 39(22): 1757-1763, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29761520

ABSTRACT

Malonylation is a recently discovered post-translational modification (PTM) in which a malonyl group attaches to a lysine (K) amino acid residue of a protein. In this work, a novel machine learning model, SPRINT-Mal, is developed to predict malonylation sites by employing sequence and predicted structural features. Evolutionary information and physicochemical properties are found to be the two most discriminative features whereas a structural feature called half-sphere exposure provides additional improvement to the prediction performance. SPRINT-Mal trained on mouse data yields robust performance for 10-fold cross validation and independent test set with Area Under the Curve (AUC) values of 0.74 and 0.76 and Matthews' Correlation Coefficient (MCC) of 0.213 and 0.20, respectively. Moreover, SPRINT-Mal achieved comparable performance when testing on H. sapiens proteins without species-specific training but not in bacterium S. erythraea. This suggests similar underlying physicochemical mechanisms between mouse and human but not between mouse and bacterium. SPRINT-Mal is freely available as an online server at: http://sparks-lab.org/server/SPRINT-Mal/. © 2018 Wiley Periodicals, Inc.


Subjects
Bacterial Proteins/chemistry; Lysine/chemistry; Machine Learning; Malonates/chemistry; Animals; Bacterial Proteins/metabolism; Hominidae/metabolism; Humans; Lysine/metabolism; Malonates/metabolism; Mice; Molecular Structure; Protein Processing, Post-Translational; Saccharopolyspora/chemistry; Saccharopolyspora/metabolism
14.
PLoS One ; 13(2): e0191900, 2018.
Article in English | MEDLINE | ID: mdl-29432431

ABSTRACT

Post-translational modification refers to the biological mechanism involved in the enzymatic modification of proteins after being translated in the ribosome. This mechanism comprises a wide range of structural modifications, which bring dramatic variations to the biological function of proteins. One of the recently discovered modifications is succinylation. Although succinylation can be detected through mass spectrometry, its current experimental detection is a time-consuming process unable to keep pace with the exponential growth of sequenced proteins. Therefore, the implementation of fast and accurate computational methods has emerged as a feasible solution. This paper proposes a novel classification approach, which effectively incorporates the secondary structure and evolutionary information of proteins through profile bigrams for succinylation prediction. The proposed predictor, abbreviated as SSEvol-Suc, uses the above features to train an AdaBoost classifier and predict succinylated lysine residues. When SSEvol-Suc was compared with four benchmark predictors, it outperformed them in metrics such as sensitivity (0.909), accuracy (0.875) and Matthews correlation coefficient (0.75).


Subjects
Biological Evolution; Proteins/chemistry; Succinic Acid/metabolism; Protein Processing, Post-Translational; Protein Structure, Secondary; Proteins/metabolism
15.
BMC Genomics ; 19(Suppl 1): 923, 2018 01 19.
Article in English | MEDLINE | ID: mdl-29363424

ABSTRACT

BACKGROUND: Post-translational modification is considered an important biological mechanism with critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is an expensive process, which consumes a lot of time and resources. Therefore, computational predictors of this covalent modification have emerged as a last resort to tackling lysine succinylation. RESULTS: In this paper, we propose a novel computational predictor called 'Success', which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information of surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. As a result, our proposed predictor showed a significant improvement over the compared predictors in statistical metrics, such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset. CONCLUSIONS: The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. The aforementioned aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection.


Subjects
Algorithms; Amino Acids/chemistry; Evolution, Molecular; Lysine/chemistry; Protein Processing, Post-Translational; Succinic Acid/metabolism; Amino Acid Sequence; Amino Acids/metabolism; Computational Biology/methods; Lysine/metabolism
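The abstract above describes each lysine as a vector built from its surrounding amino acids. A common first step for such predictors is to cut a fixed-length window centred on every lysine (K). The sketch below illustrates that step only; the flank size, the padding symbol "X", and the function name `lysine_windows` are assumptions for illustration, not details taken from the Success predictor.

```python
def lysine_windows(sequence, flank=3, pad="X"):
    """Return (1-based position, window) for every lysine in the sequence.

    Each window holds `flank` residues on either side of the lysine; positions
    beyond the sequence ends are filled with the padding symbol.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string the lysine sits at index i + flank.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows

# Toy sequence with lysines at positions 2 and 5.
toy_windows = lysine_windows("MKAAK", flank=2)
print(toy_windows)
```

Structural or evolutionary descriptors for the residues in each window could then be concatenated into the feature vector passed to a classifier.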
16.
J Comput Chem ; 39(8): 407-411, 2018 03 30.
Article in English | MEDLINE | ID: mdl-29164646

ABSTRACT

Determining the flexibility of structured biomolecules is important for understanding their biological functions. One quantitative measurement of flexibility is the atomic Debye-Waller factor or temperature B-factor. Most existing studies are limited to temperature B-factors of proteins and their prediction. Only one method has attempted to predict temperature B-factors of ribosomal RNA. Here, we developed and compared machine-learning techniques for the prediction of temperature B-factors of RNAs. The best model, based on Support Vector Machines, yields a Pearson's correlation coefficient of 0.51 for fivefold cross validation and 0.50 for the independent test. Analysis of the performance indicates that the model performs best on rRNAs, tRNAs, and protein-bound RNAs, for long chains in particular. The server is available at http://sparks-lab.org/server/RNAflex. © 2017 Wiley Periodicals, Inc.


Subjects
RNA, Ribosomal/chemistry; Support Vector Machine; Models, Molecular; Temperature
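The B-factor predictor above is evaluated with Pearson's correlation coefficient between predicted and observed values. As a reminder of what that score measures, here is a minimal pure-Python sketch; the sample values are made up for illustration and do not come from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observed vs. predicted per-nucleotide B-factors.
observed = [20.0, 35.0, 50.0, 42.0, 28.0]
predicted = [25.0, 30.0, 45.0, 47.0, 26.0]
r = pearson_r(observed, predicted)
print(f"r = {r:.2f}")
```

The coefficient is insensitive to scale and offset, so it rewards predictions that rank flexible and rigid positions correctly even when the absolute values differ.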
17.
Bioinformatics ; 34(3): 477-484, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29028926

ABSTRACT

Motivation: Protein-peptide interactions are among the most important biological interactions and play a crucial role in many diseases, including cancer. Therefore, knowledge of these interactions provides invaluable insights into cellular processes, functional mechanisms, and drug discovery. Protein-peptide interactions can be analyzed by studying the structures of protein-peptide complexes. However, only a small portion has known complex structures, and experimental determination of protein-peptide interactions is costly and inefficient. Thus, predicting peptide-binding sites computationally will be useful to improve the efficiency and cost-effectiveness of experimental studies. Here, we established a machine learning method called SPRINT-Str (Structure-based prediction of protein-Peptide Residue-level Interaction) to use structural information for predicting protein-peptide binding residues. These predicted binding residues are then employed to infer the peptide-binding site by a clustering algorithm. Results: SPRINT-Str achieves robust and consistent results for prediction of protein-peptide binding regions in terms of residues and sites. Matthews' Correlation Coefficient (MCC) for 10-fold cross validation and independent test set are 0.27 and 0.293, respectively, as well as 0.775 and 0.782, respectively, for area under the curve. The prediction outperforms other state-of-the-art methods, including our previously developed sequence-based method. A further spatial neighbor clustering of predicted binding residues leads to prediction of binding sites at 20-116% higher coverage than the next best method at all precision levels in the test set. The application of SPRINT-Str to protein binding with DNA, RNA and carbohydrate confirms the method's capability of separating peptide-binding sites from other functional sites. More importantly, similar performance in prediction of binding residues and sites is obtained when experimentally determined structures are replaced by unbound structures or quality model structures built from homologs, indicating its wide applicability. Availability and implementation: http://sparks-lab.org/server/SPRINT-Str. Contact: yangyd25@mail.sysu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning; Peptides/metabolism; Proteins/metabolism; Sequence Analysis, Protein/methods; Computational Biology/methods; Humans; Peptides/chemistry; Protein Binding; Protein Domains; Protein Tyrosine Phosphatase, Non-Receptor Type 4/metabolism; Proteins/chemistry
18.
J Theor Biol ; 425: 97-102, 2017 07 21.
Article in English | MEDLINE | ID: mdl-28483566

ABSTRACT

Post-translational modification (PTM) is a covalent and enzymatic modification of proteins, which contributes to diversifying the proteome. Among the many reported PTMs with essential roles in cellular functioning, lysine succinylation has emerged as a subject of particular interest. Because its experimental identification remains a costly and time-consuming process, computational predictors have recently been proposed for tackling this important issue. However, the performance of current predictors is still very limited. In this paper, we propose a new predictor called PSSM-Suc, which employs evolutionary information of amino acids for predicting succinylated lysine residues. Here we describe each lysine residue in terms of profile bigrams extracted from position-specific scoring matrices. We compared the performance of PSSM-Suc to that of existing predictors using a widely used benchmark dataset. PSSM-Suc showed a significant improvement in performance over state-of-the-art predictors. Its sensitivity, accuracy and Matthews correlation coefficient were 0.8159, 0.8199 and 0.6396, respectively.


Subjects
Computational Biology/methods; Lysine/metabolism; Position-Specific Scoring Matrices; Protein Processing, Post-Translational; Algorithms; Amino Acid Sequence; Amino Acids/chemistry; Animals; Evolution, Molecular; Sensitivity and Specificity
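Profile bigrams, mentioned in the abstract above, are commonly computed by summing, over consecutive rows of a position-specific scoring matrix (PSSM), the product of the score for amino acid m at one position and amino acid n at the next. The sketch below follows that commonly used formulation; whether PSSM-Suc normalises the matrix first or uses a different window is not stated in the abstract, so treat those details as assumptions. A toy matrix with two columns stands in for the usual 20.

```python
def profile_bigram(pssm):
    """Flattened bigram matrix B[m][n] = sum_i pssm[i][m] * pssm[i + 1][n].

    `pssm` is a list of rows, one per sequence position; each row holds one
    score per amino-acid type. A 20-column PSSM gives 400 features.
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair each row with its successor to capture neighbouring positions.
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += row[m] * next_row[n]
    return [value for bigram_row in bigram for value in bigram_row]

# Toy 3-position, 2-column matrix (hypothetical scores).
toy_pssm = [[1, 0], [0, 1], [1, 1]]
toy_features = profile_bigram(toy_pssm)
print(toy_features)
```

The resulting vector has a fixed length regardless of how many positions the window contains, which makes it convenient as classifier input.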
19.
Anal Biochem ; 527: 24-32, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28363440

ABSTRACT

Post-Translational Modification (PTM) is a biological reaction which contributes to diversifying the proteome. Among the many modifications with important roles in cellular activity, lysine succinylation has recently emerged as an important PTM mark. It alters the chemical structure of lysines, leading to remarkable changes in the structure and function of proteins. In contrast to the huge number of proteins being sequenced in the post-genome era, the experimental detection of succinylated residues remains expensive, inefficient and time-consuming. Therefore, the development of computational tools for accurately predicting succinylated lysines is an urgent necessity. To date, several approaches have been proposed, but their sensitivity has reportedly been poor. In this paper, we propose an approach that utilizes structural features of amino acids to improve lysine succinylation prediction. Succinylated and non-succinylated lysines were first retrieved from 670 proteins, and characteristics such as accessible surface area, backbone torsion angles and local structure conformations were incorporated. We used the k-nearest neighbors cleaning treatment to deal with class imbalance and designed a pruned decision tree for classification. Our predictor, referred to as SucStruct (Succinylation using Structural features), significantly improved performance when compared to previous predictors, with sensitivity, accuracy and Matthews correlation coefficient equal to 0.7334-0.7946, 0.7444-0.7608 and 0.4884-0.5240, respectively.


Subjects
Amino Acids/metabolism; Lysine/metabolism; Models, Statistical; Protein Processing, Post-Translational; Proteome/metabolism; Succinic Acid/metabolism; Algorithms; Amino Acid Sequence; Animals; Humans; Proteome/genetics; Rodentia/genetics; Rodentia/metabolism
20.
J Chem Inf Model ; 56(10): 2115-2122, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27623166

ABSTRACT

Carbohydrate-binding proteins play significant roles in many diseases, including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under the receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthews correlation coefficient of 0.34 and 0.29, respectively, for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 Genomes Project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH .


Subjects
Carbohydrate Metabolism; Proteins/metabolism; Support Vector Machine; Binding Sites; Carbohydrates/chemistry; Databases, Protein; Humans; Molecular Docking Simulation; Protein Binding; Proteins/chemistry; ROC Curve