Search | VHL Regional Portal

PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features.

Chandra, Abel; Sharma, Alok; Dehzangi, Iman; Tsunoda, Tatsuhiko; Sattar, Abdul.

Sci Rep ; 13(1): 20882, 2023 11 28.

Article in English | MEDLINE | ID: mdl-38016996

ABSTRACT

Protein-peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors leading to diseases such as cancer. Therefore, understanding these interactions is vital for both functional genomics and drug discovery efforts. Despite a significant increase in the availability of protein-peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in terms of prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By utilizing a combination of half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in terms of specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.

Subject(s)

Deep Learning; Proteins/metabolism; Peptides; Software; Amino Acid Sequence

Transformer-based deep learning for predicting protein properties in the life sciences.

Chandra, Abel; Tünnermann, Laura; Löfstedt, Tommy; Gratz, Regina.

Elife ; 12, 2023 01 18.

Article in English | MEDLINE | ID: mdl-36651724

ABSTRACT

Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and proteins with known properties based on lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on a particular model, the Transformer model. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how the Transformer models have quickly proven to be a very promising way to unravel information hidden in the sequences of amino acids.

Subject(s)

Biological Science Disciplines; Deep Learning; Amino Acid Sequence; Amino Acids; Language

RAM-PGK: Prediction of Lysine Phosphoglycerylation Based on Residue Adjacency Matrix.

Chandra, Abel Avitesh; Sharma, Alok; Dehzangi, Abdollah; Tsunoda, Tatushiko.

Genes (Basel) ; 11(12), 2020 12 20.

Article in English | MEDLINE | ID: mdl-33419274

ABSTRACT

BACKGROUND: Post-translational modification (PTM) is a biological process that is associated with the modification of the proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, among which lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have proven to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system for phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation is not a first, but it is necessary because the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. RESULTS: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted these matrices into feature vectors, which are used to build the phosphoglycerylation predictor. CONCLUSION: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient.

Subject(s)

Datasets as Topic; Glyceric Acids/metabolism; Lysine/metabolism; Protein Processing, Post-Translational; Support Vector Machine; Algorithms; Amino Acid Sequence; Lysine/chemistry; ROC Curve; Software
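
The abstracts in this listing report sensitivity, specificity, precision, accuracy, and the Matthews correlation coefficient (MCC). As a reminder of how these relate to one another, here is a minimal sketch that derives all five from confusion-matrix counts using the standard definitions. The function name `binary_metrics` and the example counts are hypothetical illustrations and are not taken from any of the papers.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    # MCC denominator: product of the four marginal totals, square-rooted.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts chosen only for illustration.
metrics = binary_metrics(tp=60, fp=40, tn=860, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because precision depends on how many positives the dataset contains, it can be very low even when sensitivity and specificity are moderate, as would be expected if positive sites are rare in the evaluation data.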

Bigram-PGK: phosphoglycerylation prediction using the technique of bigram probabilities of position specific scoring matrix.

Chandra, Abel; Sharma, Alok; Dehzangi, Abdollah; Shigemizu, Daichi; Tsunoda, Tatsuhiko.

BMC Mol Cell Biol ; 20(Suppl 2): 57, 2019 Dec 20.

Article in English | MEDLINE | ID: mdl-31856704

ABSTRACT

BACKGROUND: The biological process known as post-translational modification (PTM) modifies proteomes in ways that affect normal cell biology and hence pathogenesis. A number of PTMs have been discovered in recent years, and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as the cost, time consumption and inefficiency of experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far are limited in their ability to correctly predict this covalent modification. RESULTS: We propose a new predictor in this paper called Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. The benchmark dataset, which contains experimentally labelled sites, is employed for this purpose, and profile bigram occurrences are calculated from position-specific scoring matrices of amino acids in the protein sequences. The statistical measures of this work, namely sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under the ROC curve, are reported to be 0.9642, 0.8973, 0.8253, 0.9193, 0.8330 and 0.9306, respectively. CONCLUSIONS: The proposed predictor, based on the feature of evolutionary information and a support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.

Subject(s)

Computational Biology/methods; Lysine/metabolism; Protein Processing, Post-Translational; Algorithms; Amino Acid Sequence; Glycolysis; Lysine/chemistry; Position-Specific Scoring Matrices; Reproducibility of Results; Software; Support Vector Machine
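
The Bigram-PGK abstract above mentions profile bigram occurrences calculated from position-specific scoring matrices (PSSMs). The sketch below shows one common formulation of a profile bigram: for every ordered pair of columns (m, n), sum the product of column m at position i and column n at position i+1 over the sequence. It is an assumption that this matches the paper's computation; the abstract does not give the normalisation, the window around each lysine, or other preprocessing, and the names `profile_bigram` and `flatten` are hypothetical.

```python
def profile_bigram(pssm):
    """Return matrix B with B[m][n] = sum over i of pssm[i][m] * pssm[i+1][n].

    `pssm` is a list of rows (one per sequence position); each row holds one
    value per amino-acid column. Values are assumed to be already normalised
    (for example to probabilities), which is an assumption of this sketch.
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive position pairs (i, i+1).
    for current_row, next_row in zip(pssm, pssm[1:]):
        for m, p in enumerate(current_row):
            for n, q in enumerate(next_row):
                bigram[m][n] += p * q
    return bigram


def flatten(matrix):
    """Flatten the bigram matrix row by row into a single feature vector."""
    return [value for row in matrix for value in row]


# Toy 3-column "PSSM" over 3 positions; a real PSSM has 20 columns.
toy_pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_bigram = profile_bigram(toy_pssm)
feature_vector = flatten(toy_bigram)
print(toy_bigram)
```

With a 20-column PSSM the flattened vector has 400 entries regardless of sequence length, which is what makes this kind of feature convenient as input to a fixed-size classifier such as a support vector machine.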

EvolStruct-Phogly: incorporating structural properties and evolutionary information from profile bigrams for the phosphoglycerylation prediction.

Chandra, Abel Avitesh; Sharma, Alok; Dehzangi, Abdollah; Tsunoda, Tatushiko.

BMC Genomics ; 19(Suppl 9): 984, 2019 Apr 18.

Article in English | MEDLINE | ID: mdl-30999859

ABSTRACT

BACKGROUND: Post-translational modification (PTM) is a biological process that modifies the proteome, leading to changes in normal cell biology and pathogenesis. In recent times, many PTMs have been reported. Among these modifications, phosphoglycerylation has become a particular subject of interest. The experimental procedure for identifying phosphoglycerylated residues continues to be an expensive, inefficient and time-consuming effort, even with the large number of proteins sequenced in the post-genomic period. Computational methods are therefore being sought in order to effectively predict phosphoglycerylated lysines. Even though there are predictors available, the ability to detect phosphoglycerylated lysine residues remains inadequate. RESULTS: We introduce a new predictor in this paper named EvolStruct-Phogly that uses structural and evolutionary information relating to amino acids to predict phosphoglycerylated lysine residues. Benchmark data containing experimentally identified phosphoglycerylated and non-phosphoglycerylated lysines are employed. We then extracted three types of structural information (accessible surface area of amino acids, backbone torsion angles, and amino acids' local structure conformations) together with profile bigrams of position-specific scoring matrices. CONCLUSION: EvolStruct-Phogly showed a noteworthy improvement in performance when compared with the previous predictors. The performance metrics obtained are as follows: sensitivity 0.7744, specificity 0.8533, precision 0.7368, accuracy 0.8275, and Matthews correlation coefficient 0.6242. The software package and data of this work can be obtained from https://github.com/abelavit/EvolStruct-Phogly or www.alok-ai-lab.com.

Subject(s)

Computational Biology/methods; Diphosphoglyceric Acids/chemistry; Evolution, Molecular; Protein Processing, Post-Translational; Proteins/chemistry; Algorithms; Binding Sites; Diphosphoglyceric Acids/metabolism; Humans; Lysine/chemistry; Lysine/metabolism; Proteins/metabolism; Sequence Analysis, Protein; Software; Support Vector Machine

GlyStruct: glycation prediction using structural properties of amino acid residues.

Reddy, Hamendra Manhar; Sharma, Alok; Dehzangi, Abdollah; Shigemizu, Daichi; Chandra, Abel Avitesh; Tsunoda, Tatushiko.

BMC Bioinformatics ; 19(Suppl 13): 547, 2019 Feb 04.

Article in English | MEDLINE | ID: mdl-30717650

ABSTRACT

BACKGROUND: Glycation is one of the post-translational modifications (PTMs) in which sugar molecules and residues in protein sequences are covalently bonded. It has become one of the clinically important PTMs in recent times and is implicated in many chronic and age-related complications. Because it is a non-enzymatic reaction, its prediction is a great challenge due to the lack of significant bias in the sequence motifs. RESULTS: We developed a classifier, GlyStruct, based on a support vector machine, to predict glycated and non-glycated lysine residues using structural properties of amino acid residues. The features used were secondary structure, accessible surface area and the local backbone torsion angles. For this work, a benchmark dataset was extracted containing 235 glycated and 303 non-glycated lysine residues. GlyStruct demonstrated an improvement in performance of approximately 10% in comparison to the benchmark method Gly-PseAAC. The performance of GlyStruct on the metrics sensitivity, specificity, accuracy and Matthews correlation coefficient was 0.7013, 0.7989, 0.7562, and 0.5065, respectively, for 10-fold cross-validation. CONCLUSION: Glycation has emerged as one of the clinically important PTMs of proteins in recent times. Therefore, the development of computational tools becomes necessary to predict glycation, which could help medical professionals administer drugs and manage patients more effectively. The proposed predictor classifies glycated and non-glycated lysine residues with promising results consistently across various cross-validation schemes and outperforms other state-of-the-art methods.

Subject(s)

Algorithms; Amino Acids/chemistry; Computational Biology/methods; Amino Acid Sequence; Area Under Curve; Benchmarking; Glycosylation; Humans; Peptides/chemistry; Support Vector Machine
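
The GlyStruct abstract above reports results for 10-fold cross-validation on 235 glycated and 303 non-glycated lysine residues. The sketch below shows how such a dataset could be split into ten folds that each keep roughly the same class ratio. Stratification, the shuffling seed, and the function name `stratified_folds` are assumptions of this sketch; the abstract does not say how the folds were formed.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, preserving the class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Collect and shuffle the indices belonging to this class.
        class_indices = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(class_indices)
        # Deal the indices out to the folds in round-robin order.
        for position, index in enumerate(class_indices):
            folds[position % k].append(index)
    return folds


# Class sizes from the abstract: 235 glycated (1) and 303 non-glycated (0).
labels = [1] * 235 + [0] * 303
folds = stratified_folds(labels, k=10)

for fold_number, test_indices in enumerate(folds):
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_indices)
    print(fold_number, len(train_indices), len(test_indices), positives)
```

Each fold serves once as the held-out test set while the remaining nine are used for training, and the reported metric is typically the average over the ten runs.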

PhoglyStruct: Prediction of phosphoglycerylated lysine residues using structural properties of amino acids.

Chandra, Abel; Sharma, Alok; Dehzangi, Abdollah; Ranganathan, Shoba; Jokhan, Anjeela; Chou, Kuo-Chen; Tsunoda, Tatsuhiko.

Sci Rep ; 8(1): 17923, 2018 12 18.

Article in English | MEDLINE | ID: mdl-30560923

ABSTRACT

The biological process known as post-translational modification (PTM) contributes to diversifying the proteome, hence affecting many aspects of normal cell biology and pathogenesis. There have been many recently reported PTMs, but lysine phosphoglycerylation has emerged as the most recent subject of interest. Despite a large number of proteins being sequenced, the experimental method for detection of phosphoglycerylated residues remains an expensive, time-consuming and inefficient endeavor in the post-genomic era. Instead, computational methods are being proposed for accurately predicting phosphoglycerylated lysines. Though a number of predictors are available, performance in detecting phosphoglycerylated lysine residues is still limited. In this paper, we propose a new predictor called PhoglyStruct that utilizes structural information of amino acids alongside a multilayer perceptron classifier for predicting phosphoglycerylated and non-phosphoglycerylated lysine residues. For the experiment, we located phosphoglycerylated and non-phosphoglycerylated lysines in our employed benchmark. We then derived and integrated properties such as accessible surface area, backbone torsion angles, and local structure conformations. PhoglyStruct showed significant improvement in the ability to distinguish phosphoglycerylated residues from non-phosphoglycerylated ones when compared to previous predictors. The sensitivity, specificity, accuracy, Matthews correlation coefficient and AUC were 0.8542, 0.7597, 0.7834, 0.5468 and 0.8077, respectively. The data and Matlab/Octave software packages are available at https://github.com/abelavit/PhoglyStruct.

Subject(s)

Computational Biology/methods; Glycerol/chemistry; Lysine/chemistry; Algorithms; Molecular Conformation; Phosphorylation; Protein Processing, Post-Translational
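
Several abstracts in this listing, including the PhoglyStruct one above, report an AUC (area under the ROC curve). One way to read that number is as the probability that a randomly chosen positive site receives a higher classifier score than a randomly chosen negative site, with ties counted as half. The sketch below computes AUC directly from that pairwise definition. The function name `roc_auc` and the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Compute ROC AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correctly_ranked = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correctly_ranked += 1.0   # positive ranked above negative
            elif pos_score == neg_score:
                correctly_ranked += 0.5   # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


# Toy scores for illustration only; not taken from any of the papers.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
print(f"AUC = {auc:.4f}")
```

Unlike sensitivity or specificity, this value does not depend on a particular decision threshold, which is one reason it is often reported alongside the threshold-dependent metrics.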
