Results 1 - 20 of 73
1.
Nat Methods ; 21(1): 110-116, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036854

ABSTRACT

Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.


Subject(s)
Artificial Intelligence , Mental Processes , Crystallography , Protein Conformation
2.
Nat Methods ; 21(7): 1340-1348, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38918604

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of submitted ligand models and surrounding atoms was analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.


Subject(s)
Cryoelectron Microscopy , Models, Molecular , Cryoelectron Microscopy/methods , Ligands , SARS-CoV-2 , COVID-19/virology , Escherichia coli , beta-Galactosidase/chemistry , beta-Galactosidase/metabolism , Protein Conformation , Reproducibility of Results
3.
Nat Methods ; 19(11): 1376-1382, 2022 11.
Article in English | MEDLINE | ID: mdl-36266465

ABSTRACT

Machine-learning prediction algorithms such as AlphaFold and RoseTTAFold can create remarkably accurate protein models, but these models usually have some regions that are predicted with low confidence or poor accuracy. We hypothesized that by implicitly including new experimental information such as a density map, a greater portion of a model could be predicted accurately, and that this might synergistically improve parts of the model that were not fully addressed by either machine learning or experiment alone. An iterative procedure was developed in which AlphaFold models are automatically rebuilt on the basis of experimental density maps and the rebuilt models are used as templates in new AlphaFold predictions. We show that including experimental information improves prediction beyond the improvement obtained with simple rebuilding guided by the experimental data. This procedure for AlphaFold modeling with density has been incorporated into an automated procedure for interpretation of crystallographic and electron cryo-microscopy maps.


Subject(s)
Algorithms , Proteins , Models, Molecular , Cryoelectron Microscopy/methods , Proteins/chemistry , Machine Learning , Protein Conformation
4.
Nat Methods ; 18(2): 156-164, 2021 02.
Article in English | MEDLINE | ID: mdl-33542514

ABSTRACT

This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.


Subject(s)
Cryoelectron Microscopy/methods , Models, Molecular , Crystallography, X-Ray , Protein Conformation , Proteins/chemistry
5.
J Biol Chem ; 296: 100742, 2021.
Article in English | MEDLINE | ID: mdl-33957126

ABSTRACT

Ever since the first structures of proteins were determined in the 1960s, structural biologists have required methods to visualize biomolecular structures, both as an essential tool for their research and also to promote 3D comprehension of structural results by a wide audience of researchers, students, and the general public. In this review to celebrate the 50th anniversary of the Protein Data Bank, we present our own experiences in developing and applying methods of visualization and analysis to the ever-expanding archive of protein and nucleic acid structures in the worldwide Protein Data Bank. Across that timespan, Jane and David Richardson have concentrated on the organization inside and between the macromolecules, with ribbons to show the overall backbone "fold" and contact dots to show how the all-atom details fit together locally. David Goodsell has explored surface-based representations to present and explore biological subjects that range from molecules to cells. This review concludes with some ideas about the current challenges being addressed by the field of biomolecular visualization.


Subject(s)
Databases, Protein/history , Models, Molecular , Molecular Biology/history , History, 20th Century , History, 21st Century , Humans
6.
Biophys J ; 120(6): 1085-1096, 2021 03 16.
Article in English | MEDLINE | ID: mdl-33460600

ABSTRACT

This work builds upon the record-breaking speed and generous immediate release of new experimental three-dimensional structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins and complexes, which are crucial to downstream vaccine and drug development. We have surveyed those structures to catch the occasional errors that could be significant for those important uses and for which we were able to provide demonstrably higher-accuracy corrections. This process relied on new validation and correction methods such as CaBLAM and ISOLDE, which are not yet in routine use. We found such important and correctable problems in seven early SARS-CoV-2 structures. Two of the structures were soon superseded by new higher-resolution data, confirming our proposed changes. For the other five, we emailed the depositors a documented and illustrated report and encouraged them to make the model corrections themselves and use the new option at the worldwide Protein Data Bank for depositors to re-version their coordinates without changing the Protein Data Bank code. This quickly and easily makes the better-accuracy coordinates available to anyone who examines or downloads their structure, even before formal publication. The changes have involved sequence misalignments, incorrect RNA conformations near a bound inhibitor, incorrect metal ligands, and cis-trans or peptide flips that prevent good contact at interaction sites. These improvements have propagated into nearly all related structures done afterward. This process constitutes a new form of highly rigorous peer review, which is actually faster and more strict than standard publication review because it has access to coordinates and maps; journal peer review would also be strengthened by such access.


Subject(s)
Peer Review , SARS-CoV-2/chemistry , Adenosine Monophosphate/analogs & derivatives , Adenosine Monophosphate/chemistry , Adenosine Monophosphate/pharmacology , Alanine/analogs & derivatives , Alanine/chemistry , Alanine/pharmacology , Antibodies, Viral , Catalytic Domain , DNA-Directed RNA Polymerases/metabolism , Humans , Models, Molecular , Nucleocapsid/chemistry , Phosphoproteins/chemistry , RNA-Binding Proteins/chemistry , SARS-CoV-2/drug effects , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/metabolism , Zinc/metabolism
7.
J Struct Biol ; 204(2): 301-312, 2018 11.
Article in English | MEDLINE | ID: mdl-30107233

ABSTRACT

We find that the overall quite good methods used in the CryoEM Model Challenge could still benefit greatly from several strategies for improving local conformations. Our assessments primarily use validation criteria from the MolProbity web service. Those criteria include MolProbity's all-atom contact analysis, updated versions of standard conformational validations for protein and RNA, plus two recent additions: first, flags for cis-nonPro and twisted peptides, and second, the CaBLAM system for diagnosing secondary structure, validating Cα backbone, and validating adjacent peptide CO orientations in the context of the Cα trace. In general, automated ab initio building of starting models is quite good at backbone connectivity but often fails at local conformation or sequence register, especially at poorer than 3.5 Å resolution. However, we show that even if criteria (such as Ramachandran or rotamer) are explicitly restrained to improve refinement behavior and overall validation scores, automated optimization of a deposited structure seldom corrects specific misfittings that start in the wrong local minimum, but just hides them. Therefore, local problems should be identified, and as many as possible corrected, before starting refinement. Secondary structures are confusing at 3-4 Å but can be better recognized at 6-8 Å. In future model challenges, specific steps being tested (such as segmentation) and the required documentation (such as PDB code of starting model) should each be explicitly defined, so competing methods on a given task can be meaningfully compared. Individual local examples are presented here, to understand what local mistakes and corrections look like in 3D, how they probably arise, and what possible improvements to methodology might help avoid them. At these resolutions, both structural biologists and end-users need meaningful estimates of local uncertainty, perhaps through explicit ensembles. Fitting problems can best be diagnosed by validation that spans multiple residues; CaBLAM is such a multi-residue tool, and its effectiveness is demonstrated.


Subject(s)
Cryoelectron Microscopy/methods , Proteins/chemistry , Proteins/metabolism , Databases, Protein , Protein Conformation
8.
Nat Methods ; 17(7): 663-664, 2020 07.
Article in English | MEDLINE | ID: mdl-32616927
9.
Nucleic Acids Res ; 43(7): 3420-33, 2015 Apr 20.
Article in English | MEDLINE | ID: mdl-25813047

ABSTRACT

Hoogsteen (HG) base pairs (bps) provide an alternative pairing geometry to Watson-Crick (WC) bps and can play unique functional roles in duplex DNA. Here, we use structural features unique to HG bps (syn purine base, HG hydrogen bonds and constricted C1'-C1' distance across the bp) to search for HG bps in X-ray structures of DNA duplexes in the Protein Data Bank. The survey identifies 106 A•T and 34 G•C HG bps in DNA duplexes, many of which are undocumented in the literature. It also uncovers HG-like bps with syn purines lacking HG hydrogen bonds or constricted C1'-C1' distances that are analogous to conformations that have been proposed to populate the WC-to-HG transition pathway. The survey reveals HG preferences similar to those observed for transient HG bps in solution by nuclear magnetic resonance, including stronger preferences for A•T versus G•C bps, TA versus GG steps, and also suggests enrichment at terminal ends with a preference for 5'-purine. HG bps induce small local perturbations in neighboring bps and, surprisingly, a small but significant degree of DNA bending (∼14°) directed toward the major groove. The survey provides insights into the preferences and structural consequences of HG bps in duplex DNA.
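Of the three structural features named in the abstract above, the constricted C1'-C1' distance is the simplest to illustrate. The following is a minimal sketch only, not the survey's actual procedure: the function names are hypothetical, and the 9.5 Å cutoff is an assumption chosen to sit between the approximate distances commonly quoted for Watson-Crick (about 10.5 Å) and Hoogsteen (about 8.5 Å) pairs. The abstract itself gives no numeric threshold.

```python
import math

# Assumed cutoff in angstroms; NOT taken from the abstract.
ASSUMED_C1_CUTOFF = 9.5


def c1_c1_distance(c1_a, c1_b):
    """Euclidean distance between the two C1' atoms of a base pair."""
    return math.dist(c1_a, c1_b)


def has_constricted_c1_distance(c1_a, c1_b, cutoff=ASSUMED_C1_CUTOFF):
    """Return True when the C1'-C1' distance falls below the assumed cutoff.

    This checks only one of the three features listed in the abstract;
    the syn purine and Hoogsteen hydrogen-bond checks are not sketched.
    """
    return c1_c1_distance(c1_a, c1_b) < cutoff
```

A real screen would also need the glycosidic torsion (for the syn purine) and the hydrogen-bond geometry, so a distance test alone would at most flag candidates for closer inspection.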


Subject(s)
Base Pairing , DNA/chemistry , Nucleic Acid Conformation , Crystallography, X-Ray
10.
Proteins ; 84(9): 1177-89, 2016 09.
Article in English | MEDLINE | ID: mdl-27018641

ABSTRACT

Here we describe the updated MolProbity rotamer-library distributions derived from an order-of-magnitude larger and more stringently quality-filtered dataset of about 8000 (vs. 500) protein chains, and we explain the resulting changes and improvements to model validation as seen by users. To include only side-chains with satisfactory justification for their given conformation, we added residue-specific filters for electron-density value and model-to-density fit. The combined new protocol retains a million residues of data, while cleaning up false-positive noise in the multi-χ datapoint distributions. It enables unambiguous characterization of conformational clusters nearly 1000-fold less frequent than the most common ones. We describe examples of local interactions that favor these rare conformations, including the role of authentic covalent bond-angle deviations in enabling presumably strained side-chain conformations. Further, along with favored and outlier, an allowed category (0.3-2.0% occurrence in reference data) has been added, analogous to Ramachandran validation categories. The new rotamer distributions are used for current rotamer validation in MolProbity and PHENIX, and for rotamer choice in PHENIX model-building and refinement. The multi-dimensional χ distributions and Top8000 reference dataset are freely available on GitHub. These rotamers are termed "ultimate" because data sampling and quality are now fully adequate for this task, and also because we believe the future of conformational validation should integrate side-chain with backbone criteria. Proteins 2016; 84:1177-1189. © 2016 Wiley Periodicals, Inc.
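The three validation categories mentioned in the abstract above can be sketched as a simple threshold rule. This is an illustrative sketch under stated assumptions, not MolProbity's implementation: the function name is hypothetical, the 0.3% and 2.0% boundaries come from the abstract's "allowed" range, and treating both boundaries as inclusive for "allowed" is an assumption.

```python
def rotamer_category(occurrence_percent: float) -> str:
    """Classify a side-chain conformation by its occurrence in reference data.

    occurrence_percent is the percentage of reference data at least as
    rare as this conformation (hypothetical input; how it is computed
    from the chi distributions is not sketched here).
    """
    if not 0.0 <= occurrence_percent <= 100.0:
        raise ValueError("occurrence_percent must be between 0 and 100")
    if occurrence_percent < 0.3:
        return "outlier"
    if occurrence_percent <= 2.0:  # inclusive upper bound is an assumption
        return "allowed"
    return "favored"
```

For example, under these assumptions a conformation seen at 1.0% would be reported as "allowed", while one at 0.1% would be flagged as an "outlier".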


Subject(s)
Algorithms , Electrons , Peptide Library , Proteins/chemistry , Amino Acids/chemistry , Databases, Protein , Datasets as Topic , Protein Conformation , Proteins/classification , Statistical Distributions , Thermodynamics
11.
Nucleic Acids Res ; 42(20): 12833-46, 2014 Nov 10.
Article in English | MEDLINE | ID: mdl-25326328

ABSTRACT

The hepatitis delta virus (HDV) ribozyme is a self-cleaving RNA enzyme essential for processing viral transcripts during rolling circle viral replication. The first crystal structure of the cleaved ribozyme was solved in 1998, followed by structures of uncleaved, mutant-inhibited and ion-complexed forms. Recently, methods have been developed that make the task of modeling RNA structure and dynamics significantly easier and more reliable. We have used ERRASER and PHENIX to rebuild and re-refine the cleaved and cis-acting C75U-inhibited structures of the HDV ribozyme. The results correct local conformations and identify alternates for RNA residues, many in functionally important regions, leading to improved R values and model validation statistics for both structures. We compare the rebuilt structures to a higher resolution, trans-acting deoxy-inhibited structure of the ribozyme, and conclude that although both inhibited structures are consistent with the currently accepted hammerhead-like mechanism of cleavage, they do not add direct structural evidence to the biochemical and modeling data. However, the rebuilt structures (PDBs: 4PR6, 4PRF) provide a more robust starting point for research on the dynamics and catalytic mechanism of the HDV ribozyme and demonstrate the power of new techniques to make significant improvements in RNA structures that impact biologically relevant conclusions.


Subject(s)
Hepatitis Delta Virus/enzymology , RNA, Catalytic/chemistry , Base Pairing , Models, Molecular , Nucleic Acid Conformation , RNA Cleavage , RNA, Catalytic/metabolism , Ribonucleoprotein, U1 Small Nuclear/chemistry , Ribonucleoprotein, U1 Small Nuclear/metabolism
12.
Biophys J ; 106(3): 510-25, 2014 Feb 04.
Article in English | MEDLINE | ID: mdl-24507592

ABSTRACT

The United Nations has declared 2014 the International Year of Crystallography, and in commemoration, this review features a selection of 54 notable macromolecular crystal structures that have illuminated the field of biophysics in the 54 years since the first excitement of the myoglobin and hemoglobin structures in 1960. Chronological by publication of the earliest solved structure, each illustrated entry briefly describes key concepts or methods new at the time and key later work leveraged by knowledge of the three-dimensional atomic structure.


Subject(s)
Crystallography, X-Ray/methods , Proteins/chemistry , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Protein Conformation
13.
Acta Crystallogr D Biol Crystallogr ; 70(Pt 4): 1104-14, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24699654

ABSTRACT

Many macromolecular model-building and refinement programs can automatically place solvent atoms in electron density at moderate-to-high resolution. This process frequently builds water molecules in place of elemental ions, the identification of which must be performed manually. The solvent-picking algorithms in phenix.refine have been extended to build common ions based on an analysis of the chemical environment as well as physical properties such as occupancy, B factor and anomalous scattering. The method is most effective for heavier elements such as calcium and zinc, for which a majority of sites can be placed with few false positives in a diverse test set of structures. At atomic resolution, it is observed that it can also be possible to identify tightly bound sodium and magnesium ions. A number of challenges that contribute to the difficulty of completely automating the process of structure completion are discussed.


Subject(s)
Automation, Laboratory/methods , Crystallography, X-Ray/methods , Ions/chemistry , Models, Molecular , Protein Structure, Tertiary , Thermolysin/chemistry , Thrombin/chemistry
14.
ArXiv ; 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38076521

ABSTRACT

In January 2020, a workshop was held at EMBL-EBI (Hinxton, UK) to discuss data requirements for deposition and validation of cryoEM structures, with a focus on single-particle analysis. The meeting was attended by 47 experts in data processing, model building and refinement, validation, and archiving of such structures. This report describes the workshop's motivation and history, the topics discussed, and consensus recommendations resulting from the workshop. Some challenges for future methods-development efforts in this area are also highlighted, as is the implementation to date of some of the recommendations.

15.
Res Sq ; 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38343795

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.

16.
IUCrJ ; 11(Pt 2): 140-151, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38358351

ABSTRACT

In January 2020, a workshop was held at EMBL-EBI (Hinxton, UK) to discuss data requirements for the deposition and validation of cryoEM structures, with a focus on single-particle analysis. The meeting was attended by 47 experts in data processing, model building and refinement, validation, and archiving of such structures. This report describes the workshop's motivation and history, the topics discussed, and the resulting consensus recommendations. Some challenges for future methods-development efforts in this area are also highlighted, as is the implementation to date of some of the recommendations.


Subject(s)
Data Curation , Cryoelectron Microscopy/methods
17.
Biopolymers ; 99(3): 170-82, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23023928

ABSTRACT

Macromolecular crystal structures are among the best of scientific data, providing detailed insight into these complex and biologically important molecules with a relatively low level of error and subjectivity. However, there are two notable problems with getting the most information from them. The first is that the models are not perfect: there is still opportunity for improving them, and users need to evaluate whether the local reliability in a structure is up to answering their question of interest. The second is that protein and nucleic acid molecules are highly complex and individual, inherently handed and three-dimensional, and the cooperative and subtle interactions that govern their detailed structure and function are not intuitively evident. Thus there is a real need for graphical representations and descriptive classifications that enable molecular 3D literacy. We have spent our career working to understand these elegant molecules ourselves, and building tools to help us and others determine and understand them better. The Protein Data Bank (PDB) has of course been vital and central to this undertaking. Here we combine some history of our involvement as depositors, illustrators, evaluators, and end-users of PDB structures with commentary on how best to study and draw scientific inferences from them.


Subject(s)
Databases, Protein , Models, Molecular , Proteins/chemistry , Crystallography, X-Ray , Macromolecular Substances , Micrococcal Nuclease/chemistry , Staphylococcaceae/enzymology
18.
PLoS Comput Biol ; 8(8): e1002629, 2012.
Article in English | MEDLINE | ID: mdl-22876172

ABSTRACT

Amino acid substitutions in protein structures often require subtle backbone adjustments that are difficult to model in atomic detail. An improved ability to predict realistic backbone changes in response to engineered mutations would be of great utility for the blossoming field of rational protein design. One model that has recently grown in acceptance is the backrub motion, a low-energy dipeptide rotation with single-peptide counter-rotations, that is coupled to dynamic two-state sidechain rotamer jumps, as evidenced by alternate conformations in very high-resolution crystal structures. It has been speculated that backrubs may facilitate sequence changes equally well as rotamer changes. However, backrub-induced shifts and experimental uncertainty are of similar magnitude for backbone atoms in even high-resolution structures, so comparison of wildtype-vs.-mutant crystal structure pairs is not sufficient to directly link backrubs to mutations. In this study, we use two alternative approaches that bypass this limitation. First, we use a quality-filtered structure database to aggregate many examples for precisely defined motifs with single amino acid differences, and find that the effectively amplified backbone differences closely resemble backrubs. Second, we directly apply a provably-accurate, backrub-enabled protein design algorithm to idealized versions of these motifs, and discover that the lowest-energy computed models match the average-coordinate experimental structures. These results support the hypothesis that backrubs participate in natural protein evolution and validate their continued use for design of synthetic proteins.


Subject(s)
Movement , Mutation , Algorithms , Amino Acids/chemistry , Uncertainty
19.
Acta Crystallogr D Struct Biol ; 79(Pt 12): 1071-1078, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37921807

ABSTRACT

Model building and refinement, and the validation of their correctness, are very effective and reliable at local resolutions better than about 2.5 Å for both crystallography and cryo-EM. However, at local resolutions worse than 2.5 Å both the procedures and their validation break down and do not ensure reliably correct models. This is because in the broad density at lower resolution, critical features such as protein backbone carbonyl O atoms are not just less accurate but are not seen at all, and so peptide orientations are frequently wrongly fitted by 90-180°. This puts both backbone and side chains into the wrong local energy minimum, and they are then worsened rather than improved by further refinement into a valid but incorrect rotamer or Ramachandran region. On the positive side, new tools are being developed to locate this type of pernicious error in PDB depositions, such as CaBLAM, EMRinger, Pperp diagnosis of ribose puckers, and peptide flips in PDB-REDO, while interactive modeling in Coot or ISOLDE can help to fix many of them. Another positive trend is that artificial intelligence predictions such as those made by AlphaFold2 contribute additional evidence from large multiple sequence alignments, and in high-confidence parts they provide quite good starting models for loops, termini or whole domains with otherwise ambiguous density.


Subject(s)
Artificial Intelligence , Proteins , Models, Molecular , Proteins/chemistry , Crystallography, X-Ray , Peptides , Cryoelectron Microscopy/methods , Protein Conformation
20.
Acta Crystallogr D Struct Biol ; 79(Pt 3): 234-244, 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36876433

ABSTRACT

Experimental structure determination can be accelerated with artificial intelligence (AI)-based structure-prediction methods such as AlphaFold. Here, an automatic procedure requiring only sequence information and crystallographic data is presented that uses AlphaFold predictions to produce an electron-density map and a structural model. Iterating through cycles of structure prediction is a key element of this procedure: a predicted model rebuilt in one cycle is used as a template for prediction in the next cycle. This procedure was applied to X-ray data for 215 structures released by the Protein Data Bank in a recent six-month period. In 87% of cases our procedure yielded a model with at least 50% of Cα atoms matching those in the deposited models within 2 Å. Predictions from the iterative template-guided prediction procedure were more accurate than those obtained without templates. It is concluded that AlphaFold predictions obtained based on sequence information alone are usually accurate enough to solve the crystallographic phase problem with molecular replacement, and a general strategy for macromolecular structure determination that includes AI-based prediction both as a starting point and as a method of model optimization is suggested.
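The success criterion quoted in the abstract above (at least 50% of Cα atoms matching the deposited model within 2 Å) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and it assumes the two coordinate lists are already superposed and paired residue by residue, which the abstract does not specify.

```python
import math


def fraction_ca_within(model_ca, deposited_ca, cutoff=2.0):
    """Fraction of paired C-alpha atoms lying within `cutoff` angstroms.

    Assumes both lists hold (x, y, z) tuples in the same residue order
    and in the same coordinate frame (pairing rule is an assumption).
    """
    if len(model_ca) != len(deposited_ca):
        raise ValueError("coordinate lists must be the same length")
    if not deposited_ca:
        raise ValueError("coordinate lists must not be empty")
    matched = sum(
        1 for m, d in zip(model_ca, deposited_ca) if math.dist(m, d) <= cutoff
    )
    return matched / len(deposited_ca)


def meets_criterion(model_ca, deposited_ca, min_fraction=0.5, cutoff=2.0):
    """True when at least `min_fraction` of C-alpha atoms match within cutoff."""
    return fraction_ca_within(model_ca, deposited_ca, cutoff) >= min_fraction
```

With four paired atoms whose separations are 1.0, 0.0, 3.0 and 0.5 Å, three of the four fall within 2 Å, so this sketch would count the model as meeting the 50% criterion.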


Subject(s)
Artificial Intelligence , Crystallography , Databases, Protein , Models, Structural