Results 1 - 6 of 6
1.
Nature ; 596(7873): 583-589, 2021 08.
Article in English | MEDLINE | ID: mdl-34265844

ABSTRACT

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1-4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the 'protein folding problem' [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10-14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.


Subject(s)
Neural Networks, Computer; Protein Conformation; Protein Folding; Proteins/chemistry; Amino Acid Sequence; Computational Biology/methods; Computational Biology/standards; Databases, Protein; Deep Learning/standards; Models, Molecular; Reproducibility of Results; Sequence Alignment
2.
Nature ; 596(7873): 590-596, 2021 08.
Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
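
The coverage figures above (residues with a confident prediction, and the very-high-confidence subset) can be illustrated with a small tally. This is only a sketch: the abstract does not name the confidence metric or its cutoffs, so the 0-100 scale, the cutoffs of 70 and 90, the function name `coverage_by_confidence` and the toy data are all assumptions made for illustration.

```python
def coverage_by_confidence(scores, confident=70.0, very_high=90.0):
    """Return the fraction of residues at or above each confidence cutoff.

    scores: per-residue confidence values on an assumed 0-100 scale.
    The cutoffs are assumptions for illustration, not taken from the abstract.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return {
        "confident": sum(s >= confident for s in scores) / n,
        "very_high": sum(s >= very_high for s in scores) / n,
    }


# Toy "proteome" of two hypothetical proteins; the numbers are made up.
proteome = {
    "protein_a": [95.0, 92.0, 81.0, 40.0],
    "protein_b": [72.0, 55.0, 91.0, 30.0, 88.0, 60.0],
}
# Pool residues across proteins, since the abstract reports residue-level coverage.
all_scores = [s for residues in proteome.values() for s in residues]
summary = coverage_by_confidence(all_scores)
```

Because every very-high-confidence residue also passes the lower cutoff, the second fraction is always a subset of the first, which matches how the abstract reports 36% as part of the 58%.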


Subject(s)
Computational Biology/standards; Deep Learning/standards; Models, Molecular; Protein Conformation; Proteome/chemistry; Datasets as Topic/standards; Diacylglycerol O-Acyltransferase/chemistry; Glucose-6-Phosphatase/chemistry; Humans; Membrane Proteins/chemistry; Protein Folding; Reproducibility of Results
3.
Nature ; 577(7792): 706-710, 2020 01.
Article in English | MEDLINE | ID: mdl-31942072

ABSTRACT

Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence [1]. This problem is of fundamental importance as the structure of a protein largely determines its function [2]; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures [3]. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force [4] that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction [5] (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores [6] of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined [7].
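
The idea of optimizing a distance-derived potential by plain gradient descent can be shown on a toy problem. The abstract does not give the form of the potential, so this sketch swaps in a simple harmonic penalty on target pairwise distances and moves Cartesian coordinates directly; the function names, step size, iteration count and four-point toy structure are all assumptions, not the published method.

```python
import math
import random


def pair_distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def potential(coords, target):
    """Harmonic stand-in potential: sum over pairs of (distance - target)^2."""
    n = len(coords)
    return sum(
        (pair_distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def gradient(coords, target):
    """Analytic gradient of the potential with respect to every coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pair_distance(coords[i], coords[j])
            if d < 1e-9:
                continue  # direction is undefined for coincident points
            scale = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad


def descend(coords, target, step=0.01, iterations=2000):
    """Plain gradient descent: repeatedly step against the gradient."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        grad = gradient(coords, target)
        for i in range(len(coords)):
            for k in range(3):
                coords[i][k] -= step * grad[i][k]
    return coords


# Hypothetical "predicted" distances, taken here from a known toy structure.
reference = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0], [8.0, 4.5, 2.0]]
target = [[pair_distance(a, b) for b in reference] for a in reference]

# Start from random coordinates (fixed seed so the run is repeatable).
rng = random.Random(0)
start = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in reference]
final = descend(start, target)
start_energy = potential(start, target)
final_energy = potential(final, target)
```

Comparing `start_energy` with `final_energy` shows how far the descent lowered the potential. Pairwise distances cannot distinguish a structure from its mirror image, so a low final energy does not imply the coordinates match `reference` itself.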


Subject(s)
Deep Learning; Models, Molecular; Protein Conformation; Proteins/chemistry; Software; Amino Acid Sequence; Caspases/chemistry; Caspases/genetics; Datasets as Topic; Protein Folding; Proteins/genetics
4.
Proteins ; 89(12): 1711-1721, 2021 12.
Article in English | MEDLINE | ID: mdl-34599769

ABSTRACT

We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the "human" category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different to the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors' ranking by summed z scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modeling targets, and represents a significant improvement in the state of the art in protein structure prediction. We reported how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large-scale structure prediction.
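
For readers unfamiliar with the GDT_TS figure quoted above, the score averages the percentage of residues whose C-alpha atoms fall within 1, 2, 4 and 8 angstroms of the reference structure. The sketch below is a simplification: it assumes the two traces are already superimposed and aligned residue by residue, whereas the full measure searches over superpositions. The function name and example coordinates are made up for illustration.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS (0-100) for two already-superimposed C-alpha traces.

    Assumption: coordinates are in angstroms and residue i in `model`
    corresponds to residue i in `reference`. No superposition search is done.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty traces of equal length")
    deviations = [math.dist(a, b) for a, b in zip(model, reference)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up four-residue example with per-residue errors of 0.5, 1.5, 3.0 and 9.0.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
score = gdt_ts(model, reference)  # 56.25 for this toy input
```

Using four cutoffs rather than one means a model is rewarded for being roughly right over most of the chain as well as for being very accurate locally.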


Subject(s)
Models, Molecular; Neural Networks, Computer; Protein Folding; Proteins; Software; Amino Acid Sequence; Computational Biology; Deep Learning; Protein Conformation; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein
5.
Proteins ; 87(12): 1141-1148, 2019 12.
Article in English | MEDLINE | ID: mdl-31602685

ABSTRACT

We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
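
A ranking by summed z-scores, as mentioned above, can be sketched as follows. The abstract does not spell out how the assessors compute or threshold the z-scores, so this version assumes one score per group per target, a z-score taken across groups for each target, and a floor applied before summing. The function name, the floor of 0.0 and the group data are hypothetical.

```python
from statistics import mean, pstdev


def summed_z_scores(scores_by_group, floor=0.0):
    """Sum per-target z-scores for each group, flooring each z at `floor`.

    scores_by_group maps group name -> list of per-target scores (same target
    order for every group). The floor value is an assumption for illustration.
    """
    groups = list(scores_by_group)
    n_targets = len(scores_by_group[groups[0]])
    totals = {g: 0.0 for g in groups}
    for t in range(n_targets):
        column = [scores_by_group[g][t] for g in groups]
        mu, sigma = mean(column), pstdev(column)
        for g in groups:
            # If every group scored the same, no group stands out on this target.
            z = 0.0 if sigma == 0 else (scores_by_group[g][t] - mu) / sigma
            totals[g] += max(z, floor)
    return totals


# Three hypothetical groups scored on two targets; the numbers are made up.
scores = {
    "group_a": [90.0, 80.0],
    "group_b": [60.0, 50.0],
    "group_c": [60.0, 50.0],
}
totals = summed_z_scores(scores)
ranking = sorted(totals, key=totals.get, reverse=True)
```

Flooring the z-scores keeps a single very poor prediction from cancelling out strong results on other targets, which is one reason a summed score of this kind can differ from a ranking by average accuracy.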


Subject(s)
Computational Biology/methods; Neural Networks, Computer; Protein Conformation; Protein Folding; Proteins/chemistry; Algorithms; Databases, Protein; Models, Molecular
6.
Science ; 381(6664): eadg7492, 2023 09 22.
Article in English | MEDLINE | ID: mdl-37733863

ABSTRACT

The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.


Subject(s)
Amino Acid Substitution; Disease; Mutation, Missense; Proteome; Sequence Alignment; Humans; Amino Acid Substitution/genetics; Benchmarking; Conserved Sequence; Databases, Genetic; Disease/genetics; Genome, Human; Protein Conformation; Proteome/genetics; Sequence Alignment/methods; Machine Learning