1.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36916746

ABSTRACT

MOTIVATION: Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired. RESULTS: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of the residue's local environment and trains a transformer to learn the correlation between local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type for each position based on its local environment within this structure, eventually acquiring a designed sequence with all residues fitting well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 s per protein on average. The designed proteins have their predicted structures perfectly resembling the target structures with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein. AVAILABILITY AND IMPLEMENTATION: The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.


Subjects
Proteins , Software , Amino Acid Sequence , Proteins/chemistry
2.
Genomics Proteomics Bioinformatics ; 21(5): 913-925, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37001856

ABSTRACT

Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start from assuming a probability distribution of protein structures given a target sequence and then find the most likely structure, while computer scientists formulate protein structure prediction as an optimization problem - finding the structural conformation with the lowest energy or minimizing the difference between predicted structure and native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts for protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, the algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge on protein folding are still highly desired.


Subjects
Algorithms , Proteins , Protein Conformation , Proteins/chemistry , Neural Networks, Computer , Protein Folding , Computational Biology/methods
3.
J Bioinform Comput Biol ; 5(2a): 297-311, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17589963

ABSTRACT

In protein identification by tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. To date, widely used database searching methods have adopted simple statistical models for this prediction. For some peptides, these models usually yield a theoretical spectrum that deviates significantly from the experimental one. In this paper, to derive an improved prediction model, we used a non-linear programming model to quantify the factors affecting peptide fragmentation. An iterative algorithm was then proposed to solve this optimization problem. On a training set of 1803 spectra, the experimental results agreed well with some known principles of peptide fragmentation, such as the tendency to cleave at the middle of the peptide and Pro's preference for N-terminal cleavage. Moreover, on a testing set of 941 spectra, comparison of the predicted spectra against the experimental ones showed that this method can generate reasonable predictions. The results in this paper can benefit both database searching and de novo methods.


Subjects
Algorithms , Mass Spectrometry/methods , Models, Chemical , Peptide Mapping/methods , Peptides/chemistry , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Nonlinear Dynamics
4.
BMC Bioinformatics ; 7: 222, 2006 Apr 26.
Article in English | MEDLINE | ID: mdl-16638152

ABSTRACT

BACKGROUND: Tandem mass spectrometry (MS/MS) is a powerful tool for protein identification. Although great efforts have been made in scoring the correlation between tandem mass spectra and an amino acid sequence database, improvements could be made in three aspects: characterization of peaks in spectra, adoption of effective scoring functions, and assessment of the reliability of matching between peptides and spectra. RESULTS: A novel scoring function is presented, along with criteria to estimate the performance confidence of the function. Through learning the types of product ions and the probability of generating them, a hypothetical spectrum was generated for each candidate peptide. Then relative entropy was introduced to measure the similarity between the hypothetical and the observed spectra. Based on the extreme value distribution (EVD) theory, a threshold was chosen to distinguish a true peptide assignment from a random one. Tests on a public MS/MS dataset demonstrated that this method performs better than the well-known SEQUEST. CONCLUSION: A reliable identification of proteins from the spectra promises a more efficient application of tandem mass spectrometry to proteomes with high complexity.


Subjects
Algorithms , Databases, Protein , Mass Spectrometry/methods , Peptide Mapping/methods , Peptides/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Peptides/analysis , Peptides/classification
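
The abstract of entry 4 measures similarity between a hypothetical and an observed spectrum with relative entropy. A minimal sketch of that idea follows, assuming both spectra have already been binned onto the same m/z grid and are given as lists of non-negative intensities. The function name `relative_entropy`, the smoothing constant `eps`, and the direction of the divergence (observed against hypothetical) are illustrative assumptions, not details taken from the paper.

```python
import math


def relative_entropy(observed, hypothetical, eps=1e-9):
    """Kullback-Leibler divergence D(observed || hypothetical).

    Both arguments are equal-length lists of non-negative intensities
    on a shared m/z binning (an assumption of this sketch). A small
    eps is added to every bin so that empty bins do not produce
    log(0) or division by zero.
    """
    if len(observed) != len(hypothetical):
        raise ValueError("spectra must share the same binning")
    p = [x + eps for x in observed]
    q = [x + eps for x in hypothetical]
    p_sum, q_sum = sum(p), sum(q)
    divergence = 0.0
    for pi, qi in zip(p, q):
        pn, qn = pi / p_sum, qi / q_sum  # normalize to probabilities
        divergence += pn * math.log(pn / qn)
    return divergence


# Toy intensities, invented for illustration only.
same = relative_entropy([10.0, 30.0, 60.0], [10.0, 30.0, 60.0])
different = relative_entropy([10.0, 30.0, 60.0], [60.0, 30.0, 10.0])
```

A lower value means the two intensity distributions are closer; identical spectra give zero. How the resulting scores are thresholded (the abstract uses extreme value distribution theory) is outside this sketch.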
5.
J Proteome Res ; 7(1): 202-8, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18092745

ABSTRACT

In protein identification through tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. The widely used prediction models, such as SEQUEST and MASCOT, ignore the intensity of the ions with important neutral losses, including water loss and ammonia loss. However, ignoring these neutral losses results in a significant deviation between the predicted theoretical spectrum and its experimental counterpart. Here, based on the "one peak, multiple explanations" observation, we proposed an expectation-maximization (EM) method to automatically learn the probabilities of water loss and ammonia loss for each amino acid. Then we employed these probabilities to design an improved statistical model for theoretical spectrum prediction. We implemented these methods and tested them on practical data. On a training set containing 1803 spectra, the experimental results show a good agreement with some known knowledge about neutral losses, such as the tendency of water loss from Asp, Glu, Ser, and Thr. Furthermore, on a testing set containing 941 spectra, the improved similarity between the experimental and predicted spectra demonstrates that this method can generate more reasonable predictions relative to the model that ignores neutral losses. As an application of the derived probabilities, we implemented a database searching method adopting the improved theoretical spectrum model with neutral loss ions estimated. Experimental results on Keller's data set demonstrate that this method can identify peptides more accurately than SEQUEST. In another application to validate SEQUEST's results, the reported peptide-spectrum pairs are reranked with respect to the similarity between experimental and predicted spectra. Experimental results on both LTQ and QSTAR data sets suggest that this reranking strategy can effectively distinguish the false negative predictions reported by SEQUEST.


Subjects
Amino Acids/chemistry , Ammonia/chemistry , Peptides/analysis , Tandem Mass Spectrometry/methods , Water/chemistry , Algorithms , Databases, Factual , Probability , Software
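
The "one peak, multiple explanations" idea in entry 5 can be illustrated with a toy expectation-maximization loop. This is a deliberately simplified sketch, not the model from the paper: it assumes each observed peak comes with a set of candidate explanations (here, hypothetical residue labels) and that exactly one of them is the true source. The function name `em_explanation_probs` and the input data are invented for illustration.

```python
def em_explanation_probs(peaks, n_iter=100):
    """Estimate how often each label is the true explanation of a peak.

    peaks: list of non-empty sets of candidate labels, one per peak.
    Returns a dict mapping label -> estimated probability (sums to 1).
    """
    if not peaks or any(len(cands) == 0 for cands in peaks):
        raise ValueError("every peak needs at least one candidate")
    labels = sorted({lab for cands in peaks for lab in cands})
    # Start from a uniform guess over all labels.
    theta = {lab: 1.0 / len(labels) for lab in labels}
    for _ in range(n_iter):
        counts = {lab: 0.0 for lab in labels}
        # E-step: split each peak among its candidates in proportion
        # to the current estimates.
        for cands in peaks:
            total = sum(theta[lab] for lab in cands)
            for lab in cands:
                counts[lab] += theta[lab] / total
        # M-step: re-estimate probabilities from the expected counts.
        theta = {lab: counts[lab] / len(peaks) for lab in labels}
    return theta


# Toy data: two peaks explained only by "Ser", one ambiguous peak.
probs = em_explanation_probs([{"Ser"}, {"Ser"}, {"Ser", "Thr"}])
```

In this toy case the unambiguous peaks pull the ambiguous one toward "Ser", so its estimated probability approaches 1 as the iterations proceed. The paper's method learns water-loss and ammonia-loss probabilities per amino acid; the sketch only shows the E-step/M-step pattern under the stated assumptions.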
6.
Article in English | MEDLINE | ID: mdl-17369654

ABSTRACT

In protein identification from MS/MS spectra, it is critical to accurately predict the theoretical spectrum of a peptide sequence, which depends heavily on a quantitative understanding of the fragmentation process. To date, widely used database searching methods have adopted a simple statistical model to predict the theoretical spectrum, yielding a spectrum that deviates significantly from the practical spectrum for some peptides and therefore preventing automated positive identification. Here, to derive an improved prediction model, we proposed a novel method to automatically learn the factors influencing fragmentation from a training set of MS/MS spectra. In this method, determining the factors is converted into an optimization problem: minimizing an objective function that measures the distance between the experimental spectrum and the theoretical one. An iterative algorithm was then proposed to minimize the non-linear objective function. We implemented the methods and tested them on experimental data. The examination of 1451 spectra is in good agreement with some known knowledge about peptide fragmentation, such as the tendency of cleavage towards the middle of the peptide and Pro's preference for N-terminal cleavage. Moreover, on a testing set containing 1425 spectra, comparison between predicted and practical spectra gives a median correlation of 0.759, showing this method's ability to predict a "realistic" spectrum. The results in this paper contribute to accurate protein identification through both database searching and de novo methods.


Subjects
Computational Biology/methods , Mass Spectrometry/methods , Peptides/chemistry , Algorithms , Automation , Computer Simulation , Databases, Protein , Humans , K562 Cells , Kinetics , Models, Statistical , Models, Theoretical , Protein Structure, Tertiary , Signal Processing, Computer-Assisted , Spectrophotometry/methods
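
Entry 6 summarizes prediction quality as a median correlation between predicted and practical spectra. The sketch below shows one way such a summary could be computed, assuming each spectrum is a list of intensities on a shared binning and that the correlation is Pearson's (the abstract does not say which correlation was used). The names `pearson` and `median_correlation` and the sample values are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def median_correlation(pairs):
    """Median Pearson correlation over (predicted, experimental) pairs."""
    return median(pearson(pred, exp) for pred, exp in pairs)


# Toy spectra, invented for illustration only.
pairs = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    ([1.0, 3.0, 2.0], [1.0, 3.0, 2.0]),
]
summary = median_correlation(pairs)
```

Using the median rather than the mean keeps a few badly predicted spectra from dominating the summary figure.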