Results 1 - 20 of 31
1.
Nature ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38718835

ABSTRACT

The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2-6]. In this paper, we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture, which is capable of joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. The new AlphaFold model demonstrates significantly improved accuracy over many previous specialised tools: far greater accuracy on protein-ligand interactions than state-of-the-art docking tools, much higher accuracy on protein-nucleic acid interactions than nucleic-acid-specific predictors, and significantly higher antibody-antigen prediction accuracy than AlphaFold-Multimer v2.3 [7,8]. Together these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep learning framework.

2.
Nucleic Acids Res ; 52(D1): D368-D375, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37933859

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.


The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-times expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.


Subjects
Artificial Intelligence, Protein Secondary Structure, Proteome, Amino Acid Sequence, Protein Databases, Search Engine, Proteins/chemistry
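
The data access routes mentioned in this abstract (direct file download and programmatic endpoints) can be illustrated with a small URL builder. This is only a sketch: the base URL comes from the abstract, but the path patterns, the default model version and the function names are assumptions that are not stated in the abstract and should be checked against the current AlphaFold DB documentation before use.

```python
from urllib.parse import quote

# Base URL as given in the abstract.
ALPHAFOLD_DB = "https://alphafold.ebi.ac.uk"


def prediction_api_url(accession: str) -> str:
    """Build the URL of the per-entry metadata endpoint.

    Assumption: metadata is served under /api/prediction/<UniProt accession>.
    """
    return f"{ALPHAFOLD_DB}/api/prediction/{quote(accession)}"


def model_file_url(accession: str, version: int = 4,
                   fragment: int = 1, ext: str = "pdb") -> str:
    """Build the URL of a predicted-structure coordinate file.

    Assumption: files follow the AF-<accession>-F<fragment>-model_v<version>
    naming pattern; the version number changes between database releases.
    """
    name = f"AF-{quote(accession)}-F{fragment}-model_v{version}.{ext}"
    return f"{ALPHAFOLD_DB}/files/{name}"


if __name__ == "__main__":
    # P69905 is the UniProt accession of human haemoglobin subunit alpha.
    print(prediction_api_url("P69905"))
    print(model_file_url("P69905"))
```

The functions only build strings, so nothing is downloaded; the resulting URLs can be passed to any HTTP client.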
3.
Science ; 381(6664): eadg7492, 2023 09 22.
Article in English | MEDLINE | ID: mdl-37733863

ABSTRACT

The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.


Subjects
Amino Acid Substitution, Disease, Missense Mutation, Proteome, Sequence Alignment, Humans, Amino Acid Substitution/genetics, Benchmarking, Conserved Sequence, Genetic Databases, Disease/genetics, Human Genome, Protein Conformation, Proteome/genetics, Sequence Alignment/methods, Machine Learning
4.
JAMA ; 330(15): 1425-1426, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37732824

ABSTRACT

In this Viewpoint, 2023 Lasker award winners John Jumper and Demis Hassabis describe their invention, the artificial intelligence-based system AlphaFold, which is able to predict protein structure with great accuracy.


Subjects
Awards and Prizes, Biomedical Research, Protein Conformation, Biomedical Research/history, Medicine, Molecular Structure, United Kingdom
5.
Gigascience ; 11, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36448847

ABSTRACT

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.


Subjects
Metadata, Records, Amino Acid Sequence, Protein Databases, Computer Simulation
7.
Nat Commun ; 13(1): 5500, 2022 09 20.
Article in English | MEDLINE | ID: mdl-36127359

ABSTRACT

Insulin-like growth factor (IGF) signaling is highly conserved and tightly regulated by proteases including Pregnancy-Associated Plasma Protein A (PAPP-A). PAPP-A and its paralog PAPP-A2 are metalloproteases that mediate IGF bioavailability through cleavage of IGF binding proteins (IGFBPs). Here, we present single-particle cryo-EM structures of the catalytically inactive mutant PAPP-A (E483A) in complex with a peptide from its substrate IGFBP5 (PAPP-ABP5) and also in its substrate-free form, by leveraging the power of AlphaFold to generate a high quality predicted model as a starting template. We show that PAPP-A is a flexible trans-dimer that binds IGFBP5 via a 25-amino acid anchor peptide which extends into the metalloprotease active site. This unique IGFBP5 anchor peptide that mediates the specific PAPP-A-IGFBP5 interaction is not found in other PAPP-A substrates. Additionally, we illustrate the critical role of the PAPP-A central domain as it mediates both IGFBP5 recognition and trans-dimerization. We further demonstrate that PAPP-A trans-dimer formation and distal inter-domain interactions are both required for efficient proteolysis of IGFBP4, but dispensable for IGFBP5 cleavage. Together the structural and biochemical studies reveal the mechanism of PAPP-A substrate binding and selectivity.


Subjects
Pregnancy-Associated Plasma Protein-A, Somatomedins, Amino Acids/metabolism, Peptides/metabolism, Pregnancy-Associated Plasma Protein-A/chemistry, Pregnancy-Associated Plasma Protein-A/metabolism, Protein Binding, Somatomedins/metabolism
8.
Nat Commun ; 13(1): 3880, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35794124

ABSTRACT

Sexual reproduction consists of genome reduction by meiosis and subsequent gamete fusion. The presence of genes homologous to eukaryotic meiotic genes in archaea and bacteria suggests that DNA repair mechanisms evolved towards meiotic recombination. However, fusogenic proteins resembling those found in gamete fusion in eukaryotes have so far not been found in prokaryotes. Here, we identify archaeal proteins that are homologs of fusexins, a superfamily of fusogens that mediate eukaryotic gamete and somatic cell fusion, as well as virus entry. The crystal structure of a trimeric archaeal fusexin (Fusexin1 or Fsx1) reveals an archetypical fusexin architecture with unique features such as a six-helix bundle and an additional globular domain. Ectopically expressed Fusexin1 can fuse mammalian cells, and this process involves the additional globular domain and a conserved fusion loop. Furthermore, archaeal fusexin genes are found within integrated mobile elements, suggesting potential roles in cell-cell fusion and gene exchange in archaea, as well as different scenarios for the evolutionary history of fusexins.


Subjects
Archaea, Eukaryota, Animals, Archaea/genetics, Cell Fusion, Eukaryota/genetics, Eukaryotic Cells, Germ Cells/metabolism, Mammals
9.
Nat Commun ; 13(1): 3526, 2022 06 20.
Article in English | MEDLINE | ID: mdl-35725571

ABSTRACT

Recognition of promoters in bacterial RNA polymerases (RNAPs) is controlled by sigma subunits. The key sequence motif recognized by the sigma, the -10 promoter element, is located in the non-template strand of the double-stranded DNA molecule ~10 nucleotides upstream of the transcription start site. Here, we explain the mechanism by which the phage AR9 non-virion RNAP (nvRNAP), a bacterial RNAP homolog, recognizes the -10 element of its deoxyuridine-containing promoter in the template strand. The AR9 sigma-like subunit, the nvRNAP enzyme core, and the template strand together form two nucleotide base-accepting pockets whose shapes dictate the requirement for the conserved deoxyuridines. A single amino acid substitution in the AR9 sigma-like subunit allows one of these pockets to accept a thymine thus expanding the promoter consensus. Our work demonstrates the extent to which viruses can evolve host-derived multisubunit enzymes to make transcription of their own genes independent of the host.


Subjects
Viral RNA, Viral Replicase Complex Proteins, DNA-Directed RNA Polymerases/metabolism, Deoxyuridine, Genetic Promoter Regions/genetics, Sigma Factor/metabolism, Genetic Transcription
10.
Nat Struct Mol Biol ; 29(3): 190-193, 2022 03.
Article in English | MEDLINE | ID: mdl-35273390

ABSTRACT

Glycoprotein 2 (GP2) and uromodulin (UMOD) filaments protect against gastrointestinal and urinary tract infections by acting as decoys for bacterial fimbrial lectin FimH. By combining AlphaFold2 predictions with X-ray crystallography and cryo-EM, we show that these proteins contain a bipartite decoy module whose new fold presents the high-mannose glycan recognized by FimH. The structure rationalizes UMOD mutations associated with kidney diseases and visualizes a key epitope implicated in cast nephropathy.


Subjects
Bacterial Adhesins, Bacterial Fimbriae, Bacterial Adhesins/genetics, X-Ray Crystallography, Fimbriae Proteins/chemistry, Fimbriae Proteins/genetics, Fimbriae Proteins/metabolism, Bacterial Fimbriae/chemistry, Bacterial Fimbriae/metabolism, GPI-Linked Proteins, Humans, Mannose/analysis, Uromodulin/analysis, Uromodulin/chemistry, Uromodulin/metabolism
12.
Nucleic Acids Res ; 50(D1): D439-D444, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791371

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.


Subjects
Protein Databases, Protein Folding, Proteins/chemistry, Software, Amino Acid Sequence, Animals, Bacteria/genetics, Bacteria/metabolism, Datasets as Topic, Dictyostelium/genetics, Dictyostelium/metabolism, Fungi/genetics, Fungi/metabolism, Humans, Internet, Molecular Models, Plants/genetics, Plants/metabolism, Alpha-Helical Protein Conformation, Beta-Strand Protein Conformation, Proteins/genetics, Proteins/metabolism, Trypanosoma cruzi/genetics, Trypanosoma cruzi/metabolism
13.
Proteins ; 89(12): 1711-1721, 2021 12.
Article in English | MEDLINE | ID: mdl-34599769

ABSTRACT

We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the "human" category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different to the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors' ranking by summed z scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modeling targets, and represents a significant improvement in the state of the art in protein structure prediction. We reported how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large-scale structure prediction.


Subjects
Molecular Models, Computer Neural Networks, Protein Folding, Proteins, Software, Amino Acid Sequence, Computational Biology, Deep Learning, Protein Conformation, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis
14.
Nat Methods ; 18(10): 1196-1203, 2021 10.
Article in English | MEDLINE | ID: mdl-34608324

ABSTRACT

How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.


Subjects
DNA/genetics, Genetic Databases, Genetic Epigenesis, Gene Expression Regulation, Machine Learning, Nerve Net, Animals, Cell Line, Genome, Genomics/methods, Humans, Mice, Quantitative Trait Loci
15.
Nature ; 596(7873): 583-589, 2021 08.
Article in English | MEDLINE | ID: mdl-34265844

ABSTRACT

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1-4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the 'protein folding problem' [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10-14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.


Subjects
Computer Neural Networks, Protein Conformation, Protein Folding, Proteins/chemistry, Amino Acid Sequence, Computational Biology/methods, Computational Biology/standards, Protein Databases, Deep Learning/standards, Molecular Models, Reproducibility of Results, Sequence Alignment
16.
Nature ; 596(7873): 590-596, 2021 08.
Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.


Subjects
Computational Biology/standards, Deep Learning/standards, Molecular Models, Protein Conformation, Proteome/chemistry, Datasets as Topic/standards, Diacylglycerol O-Acyltransferase/chemistry, Glucose-6-Phosphatase/chemistry, Humans, Membrane Proteins/chemistry, Protein Folding, Reproducibility of Results
17.
Nature ; 577(7792): 706-710, 2020 01.
Article in English | MEDLINE | ID: mdl-31942072

ABSTRACT

Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence [1]. This problem is of fundamental importance as the structure of a protein largely determines its function [2]; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures [3]. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force [4] that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction [5] (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores [6] of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined [7].


Subjects
Deep Learning, Molecular Models, Protein Conformation, Proteins/chemistry, Software, Amino Acid Sequence, Caspases/chemistry, Caspases/genetics, Datasets as Topic, Protein Folding, Proteins/genetics
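
The idea in this abstract of optimising a distance-based potential by plain gradient descent can be illustrated with a toy example. The sketch below is not the potential described in the paper: it assumes a simple harmonic penalty on pairwise distances, works directly on Cartesian coordinates of four points, and takes the target distances from a known reference shape rather than from a neural network. All function names, the step size and the number of steps are illustrative choices.

```python
import math
import random


def pair_distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def potential(coords, target):
    """Toy potential: sum over pairs of (distance - target distance)^2."""
    n = len(coords)
    return sum((pair_distance(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def gradient(coords, target, eps=1e-9):
    """Analytic gradient of the toy potential for every coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pair_distance(coords[i], coords[j])
            # eps guards against division by zero for coincident points.
            scale = 2.0 * (r - target[i][j]) / (r + eps)
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad


def minimise(coords, target, lr=0.02, steps=2000):
    """Fixed-step gradient descent; returns a new list of coordinates."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        grad = gradient(coords, target)
        for i, g in enumerate(grad):
            for k in range(3):
                coords[i][k] -= lr * g[k]
    return coords


# Target distances come from a known reference shape (stand-in for predictions).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
target = [[pair_distance(a, b) for b in reference] for a in reference]

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in reference]

initial_energy = potential(start, target)
final_coords = minimise(start, target)
final_energy = potential(final_coords, target)
print(f"potential before: {initial_energy:.4f}, after: {final_energy:.6f}")
```

Pairwise distances determine a shape only up to rotation, translation and mirror image, so the optimised coordinates need not coincide with the reference points even when the potential is close to zero.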
18.
Biophys J ; 117(8): 1429-1441, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31587831

ABSTRACT

Single-molecule force spectroscopy has proven extremely beneficial in elucidating folding pathways for membrane proteins. Here, we simulate these measurements, conducting hundreds of unfolding trajectories using our fast Upside algorithm for slow enough speeds to reproduce key experimental features that may be missed using all-atom methods. The speed also enables us to determine the logarithmic dependence of pulling velocities on the rupture levels to better compare to experimental values. For simulations of atomic force microscope measurements in which force is applied vertically to the C-terminus of bacteriorhodopsin, we reproduce the major experimental features including even the back-and-forth unfolding of single helical turns. When pulling laterally on GlpG to mimic the experiment, we observe quite different behavior depending on the stiffness of the spring. With a soft spring, as used in the experimental studies with magnetic tweezers, the force remains nearly constant after the initial unfolding event, and a few pathways and a high degree of cooperativity are observed in both the experiment and simulation. With a stiff spring, however, the force drops to near zero after each major unfolding event, and numerous intermediates are observed along a wide variety of pathways. Hence, the mode of force application significantly alters the perception of the folding landscape, including the number of intermediates and the degree of folding cooperativity, important issues that should be considered when designing experiments and interpreting unfolding data.


Subjects
DNA-Binding Proteins/chemistry, Endopeptidases/chemistry, Escherichia coli Proteins/chemistry, Membrane Proteins/chemistry, Molecular Dynamics Simulation, Protein Folding, Lipid Bilayers/chemistry
19.
Proteins ; 87(12): 1141-1148, 2019 12.
Article in English | MEDLINE | ID: mdl-31602685

ABSTRACT

We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.


Subjects
Computational Biology/methods, Computer Neural Networks, Protein Conformation, Protein Folding, Proteins/chemistry, Algorithms, Protein Databases, Molecular Models
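
For readers unfamiliar with the GDT_TS score quoted in this abstract, the following sketch shows how the number is commonly defined: the average, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted position lies within the cutoff of the experimental one. It is a simplification, not the CASP assessors' procedure: it assumes the per-residue deviations have already been measured after a single fixed superposition, whereas the full score searches for the best superposition at each cutoff. The function name and input format are illustrative.

```python
def gdt_ts(distances):
    """Simplified GDT_TS from per-residue deviations in angstroms.

    Assumption: `distances` holds one C-alpha deviation per residue, measured
    after one fixed superposition of prediction and experiment.
    """
    if not distances:
        raise ValueError("at least one residue is required")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    # Average the four fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Four residues: within 1, 2 and 4 angstroms, and one beyond 8 angstroms.
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # 56.25
```

Under this definition, a score of 70 or higher means that, averaged over the four cutoffs, at least 70% of residues fall within the cutoff distance.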
20.
PLoS Comput Biol ; 14(12): e1006578, 2018 12.
Article in English | MEDLINE | ID: mdl-30589834

ABSTRACT

An ongoing challenge in protein chemistry is to identify the underlying interaction energies that capture protein dynamics. The traditional trade-off in biomolecular simulation between accuracy and computational efficiency is predicated on the assumption that detailed force fields are typically well-parameterized, obtaining a significant fraction of possible accuracy. We re-examine this trade-off in the more realistic regime in which parameterization is a greater source of error than the level of detail in the force field. To address parameterization of coarse-grained force fields, we use the contrastive divergence technique from machine learning to train from simulations of 450 proteins. In our procedure, the computational efficiency of the model enables high accuracy through the precise tuning of the Boltzmann ensemble. This method is applied to our recently developed Upside model, where the free energy for side chains is rapidly calculated at every time-step, allowing for a smooth energy landscape without steric rattling of the side chains. After this contrastive divergence training, the model is able to de novo fold proteins up to 100 residues on a single core in days. This improved Upside model provides a starting point both for investigation of folding dynamics and as an inexpensive Bayesian prior for protein physics that can be integrated with additional experimental or bioinformatic data.


Subjects
Computational Biology/methods, Proteins/chemistry, Bayes Theorem, Computer Simulation, Machine Learning, Molecular Dynamics Simulation/statistics & numerical data, Protein Conformation, Protein Folding, Software, Thermodynamics