ABSTRACT
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group: most of the best-scoring groups, led by PEZYFoldings, UM-TBM, and Yang Server, employed AF2 in one way or another. Many top groups paid special attention to generating deep multiple sequence alignments (MSAs) and testing variant MSAs, which allowed them to address some of the hardest targets successfully. Besides lacking templates, such difficult targets were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side-chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These groups include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas; the confidence estimates of the former were also notably accurate.
Subject(s)
Computational Biology , Furylfuramide , Computational Biology/methods , Models, Molecular , Proteins/chemistry , Sequence Alignment
ABSTRACT
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top-ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and X-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
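The two scoring ideas named above, a GDT-style metric and Z-score ranking, can be illustrated with a minimal Python sketch. It assumes that each model has already been superposed on the target and reduced to per-residue (or, for RNA, per-nucleotide) deviations in Å; real GDT searches over many superpositions, and the choice of representative atom is left open here. The ranking uses a simplified scheme in which negative Z-scores are clamped to zero; the outlier-removal details of the actual CASP15 procedure are not reproduced, and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Conventional GDT_TS distance cutoffs in Å.
GDT_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(deviations):
    """Simplified GDT_TS (0-100) from per-residue deviations of a pre-superposed model.

    Real GDT maximises the count at each cutoff over many superpositions;
    here a single fixed superposition is assumed.
    """
    counts = [sum(1 for d in deviations if d <= cutoff) for cutoff in GDT_CUTOFFS]
    return 100.0 * sum(counts) / (len(GDT_CUTOFFS) * len(deviations))


def zscore_ranking(scores):
    """Rank groups by summed per-target Z-scores.

    `scores` maps target -> {group: score}. Negative Z-scores are clamped to 0
    so that one poor prediction is not over-penalised (a simplification).
    """
    totals = {}
    for per_group in scores.values():
        values = list(per_group.values())
        mu, sigma = mean(values), pstdev(values)
        for group, score in per_group.items():
            z = (score - mu) / sigma if sigma > 0 else 0.0
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    # Highest summed Z-score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))  # 50.0
    example = {
        "T1": {"A": 80.0, "B": 60.0, "C": 40.0},
        "T2": {"A": 70.0, "B": 75.0, "C": 20.0},
    }
    print([group for group, _ in zscore_ranking(example)])  # ['A', 'B', 'C']
```

Summing clamped Z-scores rewards groups that are consistently above the field average on many targets, rather than groups with a few exceptional models.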
Subject(s)
Algorithms , RNA , Computational Biology/methods , Proteins/chemistry
ABSTRACT
The epididymal lumen contains a complex, cystatin-rich, nonpathological amyloid matrix with putative roles in sperm maturation and sperm protection. Despite our growing understanding of the biological function of this and other functional amyloids, a question remains: how do functional amyloids assemble, including their initial transition to early oligomeric forms? To examine this, we developed a protocol for the purification of nondenatured mouse CRES, a component of the epididymal amyloid matrix, allowing us to examine its assembly into amyloid under conditions that may mimic those in vivo. Herein we use X-ray crystallography, solution-state NMR, and solid-state NMR to follow at the atomic level the assembly of the CRES amyloidogenic precursor as it progresses from a monomeric folded protein to an advanced amyloid. We show that the CRES monomer has a typical cystatin fold that assembles into highly branched amyloid matrices, comparable to those in vivo, by forming β-sheet assemblies that, our data suggest, arise via two distinct mechanisms: a unique conformational switch of a highly flexible disulfide-anchored loop to a rigid β-strand, and traditional cystatin domain swapping. Our results provide key insight into functional amyloid assembly by revealing the earliest structural transitions from monomer to oligomer and by showing that some functional amyloid structures may be built by multiple, distinct assembly mechanisms.
Subject(s)
Amyloid/chemistry , Amyloidogenic Proteins/chemistry , Cystatins/chemistry , Amyloid/metabolism , Amyloid/ultrastructure , Amyloidogenic Proteins/metabolism , Animals , Crystallography, X-Ray , Cystatins/metabolism , Epididymis/metabolism , Magnetic Resonance Spectroscopy , Male , Mice , Models, Molecular , Protein Conformation , Protein Folding , Protein Multimerization
ABSTRACT
SUMMARY: Covariance-based predictions of residue contacts and inter-residue distances are an increasingly popular data type in protein bioinformatics. Here we present ConPlot, a web-based application for convenient display and analysis of contact maps and distograms. Integration of predicted contact data with other predictions is often required to facilitate inference of structural features. ConPlot can therefore use the empty space near the contact map diagonal to display multiple coloured tracks representing other sequence-based predictions. Popular file formats are natively read and bespoke data can also be flexibly displayed. This novel visualization will enable easier interpretation of predicted contact maps. AVAILABILITY AND IMPLEMENTATION: ConPlot is available online at www.conplot.org, along with documentation and examples. Alternatively, ConPlot can be installed and used locally using the docker image from the project's Docker Hub repository. ConPlot is licensed under the BSD 3-Clause license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
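To illustrate why the region next to the diagonal is free for extra tracks, the minimal sketch below parses contact lines of the form `i j d_min d_max probability` (the five-field layout of CASP RR contact files; header records and sequence lines are simply skipped), keeps only pairs separated by at least a minimum number of residues, and builds a symmetric contact map. The six-residue separation cutoff, the 1-based numbering and the function names are assumptions for illustration; this is not ConPlot's code.

```python
def parse_rr(text):
    """Parse five-field contact lines 'i j d_min d_max probability'.

    Lines with any other number of fields (PFRMAT, TARGET, MODEL, END,
    sequence lines) or with non-numeric fields are skipped.
    """
    contacts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            i, j, prob = int(fields[0]), int(fields[1]), float(fields[4])
        except ValueError:
            continue
        contacts.append((i, j, prob))
    return contacts


def contact_map(contacts, length, min_sep=6, top=None):
    """Symmetric 0/1 map for 1-based residues; short-range pairs are dropped."""
    kept = [c for c in contacts if abs(c[0] - c[1]) >= min_sep]
    kept.sort(key=lambda c: c[2], reverse=True)  # most confident first
    if top is not None:
        kept = kept[:top]
    grid = [[0] * length for _ in range(length)]
    for i, j, _ in kept:
        grid[i - 1][j - 1] = grid[j - 1][i - 1] = 1
    return grid


def band_is_empty(grid, min_sep):
    """True if no contact lies within min_sep of the diagonal (room for tracks)."""
    n = len(grid)
    return all(grid[i][j] == 0 for i in range(n) for j in range(n) if abs(i - j) < min_sep)


if __name__ == "__main__":
    rr = "PFRMAT RR\nMODEL 1\n1 9 0 8 0.9\n2 4 0 8 0.8\n3 10 0 8 0.4\nEND\n"
    grid = contact_map(parse_rr(rr), length=10)
    for row in grid:
        print("".join("#" if cell else "." for cell in row))
    print(band_is_empty(grid, min_sep=6))  # True
```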
Subject(s)
Proteins , Software , Internet , Proteins/genetics
ABSTRACT
Microsomal triglyceride transfer protein (MTP) plays an essential role in lipid metabolism, especially in the biogenesis of very low-density lipoproteins and chylomicrons via the transfer of neutral lipids and the assembly of apoB-containing lipoproteins. Our understanding of the molecular mechanisms of MTP has been hindered by a lack of structural information on this heterodimeric complex, which comprises an MTPα subunit and a protein disulfide isomerase (PDI) β-subunit. The structure of MTP presented here gives important insights into the potential mechanisms of action of this essential lipid transfer molecule, a structure-based rationale for previously reported disease-causing mutations, and a means for rational drug design against cardiovascular disease and obesity. In contrast to the previously reported structure of lipovitellin, which has a funnel-like lipid-binding cavity, the lipid-binding site of MTP is encompassed in a β-sandwich formed by two β-sheets from the C-terminal domain of MTPα. The lipid-binding cavity of MTPα is large enough to accommodate a single lipid. Independently, PDI has a major role in oxidative protein folding in the endoplasmic reticulum. Comparison of the mechanism of MTPα binding by PDI with previously published structures gives insights into large protein substrate binding by PDI and suggests that the previous structures of human PDI represent the "substrate-bound" and "free" states rather than differences arising from redox state.
Subject(s)
Carrier Proteins/chemistry , Binding Sites , Crystallography, X-Ray , Humans , Protein Conformation, beta-Strand
ABSTRACT
The application of state-of-the-art deep-learning approaches to the protein modeling problem has expanded the "high-accuracy" category in CASP14 to encompass all targets. Building on the metrics used for high-accuracy assessment in previous CASPs, we evaluated the performance of all groups that submitted models for at least 10 targets across all difficulty classes, and judged the usefulness of those produced by AlphaFold2 (AF2) as molecular replacement search models with AMPLE. Driven by the qualitative diversity of the targets submitted to CASP, we also introduce DipDiff as a new measure for the improvement in backbone geometry provided by a model versus available templates. Although AF2 delivered a large leap in high-accuracy modeling, the second-best method in CASP14 outperformed the best in CASP13, illustrating the role of community-based benchmarking in the development and evolution of the protein structure prediction field.
Subject(s)
Models, Molecular , Protein Conformation , Proteins , Software , Computational Biology/methods , Computational Biology/standards , Databases, Protein , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results , Sequence Analysis, Protein
ABSTRACT
The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real-world application. In CASP7, the metric for molecular replacement assessment involved full likelihood-based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood-based rigid-body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined log-likelihood-gain (LLG) score. This enabled multi-copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative-expected-LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X-ray, NMR or cryo-EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
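As background to why coordinate error estimates matter for likelihood-based MR scores, the sketch below evaluates the standard Luzzati-type factor D = exp(-2π²⟨Δr²⟩ / (3d²)), which gives the expected agreement between model and true structure factors at resolution d for isotropic Gaussian positional errors with rms Δr. It is not the reLLG itself, whose exact definition is not reproduced here; the function names are hypothetical.

```python
import math


def luzzati_d(rms_error, d_spacing):
    """Expected structure-factor agreement D at resolution d_spacing (Å)
    for isotropic Gaussian coordinate errors with rms displacement rms_error (Å)."""
    return math.exp(-2.0 * math.pi ** 2 * rms_error ** 2 / (3.0 * d_spacing ** 2))


def d_by_shell(rms_error, shells):
    """D for each resolution shell, showing where a model stops contributing."""
    return {d: luzzati_d(rms_error, d) for d in shells}


if __name__ == "__main__":
    # A well-predicted region (0.5 Å rms) versus a poorly predicted one (2.0 Å rms).
    for rms in (0.5, 2.0):
        shells = d_by_shell(rms, (6.0, 4.0, 3.0, 2.0))
        print(rms, {d: round(value, 3) for d, value in shells.items()})
```

Errors small relative to the resolution leave D close to 1, whereas a 2 Å error removes almost all of the signal at 2 Å resolution, which is one way to see why weighting parts of a model by their estimated error can add value.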
Subject(s)
Models, Molecular , Protein Conformation , Proteins , Software , Algorithms , Computational Biology , Crystallography, X-Ray , Magnetic Resonance Spectroscopy , Proteins/chemistry , Proteins/metabolism
ABSTRACT
AMPLE clusters and truncates ab initio protein structure predictions, producing search models for molecular replacement. Here, an interesting degree of complementarity is shown between targets solved using the different ab initio modelling programs QUARK and ROSETTA. Search models derived from either program collectively solve almost all of the all-helical targets in the test set. Initial solutions produced by Phaser after only 5 min perform surprisingly well, improving the prospects for in situ structure solution by AMPLE during synchrotron visits. Taken together, the results show the potential for AMPLE to run more quickly and successfully solve more targets than previously suspected.
Subject(s)
Proteins/chemistry , Software , Protein Conformation , Time Factors
ABSTRACT
The Nipah virus phosphoprotein (P) is multimeric and tethers the viral polymerase to the nucleocapsid. We present the crystal structure of the multimerization domain of Nipah virus P: a long, parallel, tetrameric, coiled coil with a small, α-helical cap structure. Across the paramyxoviruses, these domains share little sequence identity yet are similar in length and structural organization, suggesting a common requirement for scaffolding or spatial organization of the functions of P in the virus life cycle.
Subject(s)
Biopolymers/chemistry , Nipah Virus/chemistry , Phosphoproteins/chemistry , Crystallography, X-Ray , Protein Conformation
ABSTRACT
Echinomycin is a nonribosomal depsipeptide natural product with a range of interesting bioactivities that make it an important target for drug discovery and development. It contains a thioacetal bridge, a unique chemical motif derived from the disulfide bond of its precursor antibiotic triostin A by the action of an S-adenosyl-L-methionine-dependent methyltransferase, Ecm18. The crystal structure of Ecm18 in complex with its reaction products S-adenosyl-L-homocysteine and echinomycin was determined at 1.50 Å resolution. Phasing was achieved using a new molecular replacement package called AMPLE, which automatically derives search models from structure predictions based on ab initio protein modelling. Structural analysis indicates that a combination of proximity effects, medium effects, and catalysis by strain drives the unique transformation of the disulfide bond into the thioacetal linkage.
Subject(s)
Disulfides/chemistry , Echinomycin/biosynthesis , Catalysis , Crystallography, X-Ray , Echinomycin/chemistry , Homocysteine/biosynthesis , Homocysteine/chemistry , Hydrogen Bonding , Methionine/chemistry , Methionine/metabolism , Methyltransferases/metabolism , Protein Structure, Tertiary , Quinoxalines/chemistry
ABSTRACT
The availability of highly accurate protein structure predictions from AlphaFold2 (AF2) and similar tools has hugely expanded the applicability of molecular replacement (MR) for crystal structure solution. Many structures can be solved routinely using raw models, models processed to remove unreliable parts, or models split into distinct structural units. This raises the question of how many cases, and which ones, still require experimental phasing methods such as single-wavelength anomalous diffraction (SAD). Here, this question is addressed using a large set of PDB depositions that were solved by SAD. A large majority (87%) could be solved using unedited or minimally edited AF2 predictions. A further 18 (4%) yielded straightforwardly to MR after splitting of the AF2 prediction using Slice'N'Dice, although different splitting methods succeeded on slightly different sets of cases. Further unique targets could be solved by alternative modelling approaches such as ESMFold (four cases), alternative MR approaches such as ARCIMBOLDO and AMPLE (two cases each), and multimeric model building with AlphaFold-Multimer or UniFold (three cases). Ultimately, only 12 cases, or 3% of the SAD-phased set, did not yield to any form of MR tested here, offering valuable hints as to the number and characteristics of cases where experimental phasing remains essential for macromolecular structure solution.
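One common way of removing unreliable parts of a prediction is to trim residues by pLDDT (AF2 writes pLDDT into the B-factor column) and to convert the remaining confidences into pseudo-B-factors. The minimal sketch below assumes a cutoff of 70 and an empirical pLDDT-to-rms-error mapping, rms = 1.5·exp(4·(0.7 - pLDDT/100)), reported in the model-processing literature; treat both as assumptions to check against the program actually used, since the CCP4 and Phenix tools have their own options. B = 8π²·rms²/3 is the isotropic relation between rms positional error and B-factor. Function names are hypothetical.

```python
import math

PLDDT_CUTOFF = 70.0  # assumed cutoff; the AlphaFold DB labels 70-90 "confident"


def plddt_to_rms(plddt):
    """Assumed empirical mapping from pLDDT (0-100) to estimated rms error in Å."""
    return 1.5 * math.exp(4.0 * (0.7 - plddt / 100.0))


def rms_to_b(rms):
    """Isotropic B-factor equivalent to an rms positional error (Å)."""
    return 8.0 * math.pi ** 2 * rms ** 2 / 3.0


def process_prediction(residues, cutoff=PLDDT_CUTOFF):
    """residues: iterable of (residue_number, plddt).

    Drops residues below the cutoff and returns (residue_number, pseudo_b)
    for the rest, so low-confidence regions get larger B-factors.
    """
    return [(num, rms_to_b(plddt_to_rms(p))) for num, p in residues if p >= cutoff]


if __name__ == "__main__":
    for num, b in process_prediction([(1, 95.0), (2, 62.0), (3, 81.0)]):
        print(num, round(b, 1))
```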
ABSTRACT
The accuracy of the information in the Protein Data Bank (PDB) is of great importance for the myriad downstream applications that make use of protein structural information. Despite best efforts, the occasional introduction of errors is inevitable, especially where the experimental data are of limited resolution. A novel protein structure validation approach based on spotting inconsistencies between the residue contacts and distances observed in a structural model and those computationally predicted by methods such as AlphaFold2 has previously been established. It is particularly well suited to the detection of register errors. Importantly, this new approach is orthogonal to traditional methods based on stereochemistry or map-model agreement, and is resolution independent. Here, thousands of likely register errors are identified by scanning 3-5 Å resolution structures in the PDB. Unlike most methods, this approach also suggests corrections to the register of affected regions, and it is shown that these corrections, even in a limited implementation, lead to improved refinement statistics in the vast majority of cases. A few limitations and confounding factors, such as fold-switching proteins, are characterized, but this approach is expected to have broad application in spotting potential issues in current accessions and, through its implementation and distribution in CCP4, in helping to ensure the accuracy of future depositions.
ABSTRACT
AMPLE is a program developed for clustering and truncating ab initio protein structure predictions into search models for molecular replacement. Here, it is shown that its core cluster-and-truncate methods also work well for processing NMR ensembles into search models. Rosetta remodelling helps to extend success to NMR structures bearing low sequence identity or high structural divergence from the target protein. Potential future routes to improved performance are considered and practical, general guidelines on using AMPLE are provided.
Subject(s)
Amino Acid Substitution , Bacterial Outer Membrane Proteins/chemistry , Multigene Family , Nuclear Magnetic Resonance, Biomolecular/methods , Software , Thioredoxins/chemistry , Amino Acid Substitution/genetics , Bacterial Outer Membrane Proteins/genetics , Crystallography, X-Ray/methods , Forecasting , Models, Molecular , Protein Folding , Software/standards , Streptomyces coelicolor/chemistry , Streptomyces coelicolor/genetics , Thioredoxins/genetics
ABSTRACT
In late 2020, the results of CASP14, the 14th event in a series of competitions to assess the latest developments in computational protein structure-prediction methodology, revealed the giant leap forward that had been made by Google's DeepMind in tackling the prediction problem. Their predictions marked the first time a competitor had achieved a global distance test score better than 90 across all categories of difficulty. This achievement represents both a challenge and an opportunity for the field of experimental structural biology. For structure determination by macromolecular X-ray crystallography, access to highly accurate structure predictions is of great benefit, particularly when it comes to solving the phase problem. Here, details of new utilities and enhanced applications in the CCP4 suite, designed to allow users to exploit predicted models in determining macromolecular structures from X-ray diffraction data, are presented. The focus is mainly on applications that can be used to solve the phase problem through molecular replacement.
Subject(s)
Crystallography, X-Ray , X-Ray Diffraction
ABSTRACT
The Collaborative Computational Project No. 4 (CCP4) is a UK-led international collective with a mission to develop, test, distribute and promote software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs brought together by familiar execution routines, a set of common libraries and graphical interfaces. The CCP4 suite has experienced several considerable changes since its last reference article, involving new infrastructure, original programs and graphical interfaces. This article, which is intended as a general literature citation for the use of the CCP4 software suite in structure determination, will guide the reader through such transformations, offering a general overview of the new features and outlining future developments. As such, it aims to highlight the individual programs that comprise the suite and to provide the latest references to them for perusal by crystallographers around the world.
Subject(s)
Proteins , Software , Proteins/chemistry , Crystallography, X-Ray , Macromolecular Substances
ABSTRACT
Protein ab initio models predicted from sequence data alone can enable the elucidation of crystal structures by molecular replacement. However, the calculation of such ab initio models is typically computationally expensive. Here, a computational pipeline based on the clustering and truncation of cheaply obtained ab initio models for the preparation of structure ensembles is described. Clustering is used to select models and to quantitatively predict their local accuracy, allowing rational truncation of predicted inaccurate regions. The resulting ensembles, with or without rapidly added side chains, solved 43% of all test cases, with an 80% success rate for all-α proteins. A program implementing this approach, AMPLE, is included in the CCP4 suite of programs. It only requires the input of a FASTA sequence file and a diffraction data file. It carries out the modelling using locally installed Rosetta, creates search ensembles and automatically performs molecular replacement and model rebuilding.
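The truncation step can be pictured with a minimal sketch: given a cluster of models that have already been superposed (clustering and superposition are assumed to be done elsewhere), compute each residue's positional variance across the cluster and keep progressively smaller fractions of the least variable residues, each selection being a candidate search model. The C-alpha-only representation, the variance measure and the list of truncation fractions are illustrative assumptions, not AMPLE's actual settings or code.

```python
from statistics import fmean


def residue_variances(models):
    """Per-residue positional variance across superposed models.

    models: list of models, each a list of (x, y, z) C-alpha coordinates in the
    same residue order. Variance = mean squared distance to the ensemble mean
    position of that residue.
    """
    variances = []
    for r in range(len(models[0])):
        points = [model[r] for model in models]
        centre = [fmean(p[axis] for p in points) for axis in range(3)]
        variances.append(
            fmean(sum((p[axis] - centre[axis]) ** 2 for axis in range(3)) for p in points)
        )
    return variances


def truncate(variances, keep_fraction):
    """Indices (in sequence order) of the least variable keep_fraction of residues."""
    n_keep = max(1, round(len(variances) * keep_fraction))
    by_variance = sorted(range(len(variances)), key=lambda r: variances[r])
    return sorted(by_variance[:n_keep])


def truncation_series(variances, fractions=(1.0, 0.75, 0.5, 0.25)):
    """One residue selection per truncation level."""
    return {f: truncate(variances, f) for f in fractions}


if __name__ == "__main__":
    cluster = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 5.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.1, 0.0), (1.0, -5.0, 0.0), (2.0, 0.0, 0.0)],
    ]
    print(truncation_series(residue_variances(cluster)))
```

The rationale is that residues on which independently generated models disagree are the ones most likely to be wrong, so removing them first tends to leave a more accurate core.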
Subject(s)
Proteins/chemistry , Crystallography, X-Ray , Models, Molecular , Protein Conformation
ABSTRACT
Determination of protein structures typically entails building a model that satisfies the collected experimental observations and depositing it in the Protein Data Bank. Experimental limitations can lead to unavoidable uncertainties during the process of model building, which result in the introduction of errors into the deposited model. Many metrics are available for model validation, but most are limited to consideration of the physico-chemical aspects of the model or its match to the experimental data. The latest advances in the field of deep learning have enabled the increasingly accurate prediction of inter-residue distances, an advance which has played a pivotal role in the recent improvements observed in the field of protein ab initio modelling. Here, new validation methods are presented based on the use of these precise inter-residue distance predictions, which are compared with the distances observed in the protein model. Sequence-register errors are particularly clearly detected and the register shifts required for their correction can be reliably determined. The method is available in the ConKit package (https://www.conkit.org).
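The idea of spotting and correcting a register error can be illustrated with a minimal sketch (not the ConKit implementation): derive contacts from the deposited model, renumber a suspect segment by a trial shift, and count how many predicted contacts the shifted model reproduces. The 8 Å C-alpha cutoff, the minimum sequence separation of six, and the assumption that predicted and observed contacts use the same residue numbering are illustrative choices; the function names are hypothetical.

```python
import math


def model_contacts(coords, cutoff=8.0, min_sep=6):
    """Contacts (i, j), i < j, from C-alpha coordinates indexed by residue number."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def shift_segment(pairs, segment, shift):
    """Renumber residues inside `segment` by `shift`, keeping pairs ordered i < j."""
    shifted = set()
    for i, j in pairs:
        a = i + shift if i in segment else i
        b = j + shift if j in segment else j
        shifted.add((min(a, b), max(a, b)))
    return shifted


def best_register_shift(predicted, observed, segment, max_shift=3):
    """Return (best_shift, scores); scores[s] counts predicted contacts reproduced after shift s.

    Ties are broken in favour of the smallest |shift|, so 0 wins unless a shift helps.
    """
    scores = {
        s: len(shift_segment(observed, segment, s) & predicted)
        for s in range(-max_shift, max_shift + 1)
    }
    best = max(scores, key=lambda s: (scores[s], -abs(s)))
    return best, scores


if __name__ == "__main__":
    predicted = {(1, 10), (2, 11), (3, 12)}
    observed = {(1, 9), (2, 10), (3, 11)}  # segment 9-11 built one residue out of register
    print(best_register_shift(predicted, observed, segment=range(9, 12)))
```

A best shift other than zero flags a likely register error and, at the same time, proposes the renumbering that would correct it.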
Subject(s)
Deep Learning , Databases, Protein
ABSTRACT
Crystallographers have an array of search-model options for structure solution by molecular replacement (MR). The well-established options of homologous experimental structures and regular secondary-structure elements or motifs are increasingly supplemented by computational modelling. Such modelling may be carried out locally or may use pre-calculated predictions retrieved from databases such as the EBI AlphaFold database. MrParse is a new pipeline to help streamline the decision process in MR by consolidating bioinformatic predictions in one place. When reflection data are provided, MrParse can rank any experimental homologues found using eLLG, which indicates the likelihood that a given search model will work in MR. Inbuilt displays of predicted secondary structure, coiled-coil and transmembrane regions further inform the choice of MR protocol. MrParse can also identify and rank homologues in the EBI AlphaFold database, a function that will also interest other structural biologists and bioinformaticians.
Subject(s)
Proteins , Databases, Protein , Models, Molecular , Protein Domains , Protein Structure, Secondary , Proteins/chemistry
ABSTRACT
Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well-characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method's application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.
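A minimal sketch of the underlying idea, under stated assumptions: suppose that for a traced fragment the map yields, at each position, estimated probabilities for the amino-acid types. Each candidate sequence can then be scored by its best ungapped placement, summing log-odds against a uniform background of 1/20. Real pipelines use richer models (for example profile HMMs and gapped database searches); this scoring scheme, the probability floor and the function names are assumptions for illustration, not the published pipeline.

```python
import math

BACKGROUND = 1.0 / 20.0  # assumed uniform amino-acid background frequency
FLOOR = 1e-6  # avoids log(0) for residue types the map gives no support to


def best_placement(profile, sequence):
    """Best ungapped placement of a per-position probability profile on a sequence.

    profile: list of {one_letter_code: probability} dicts, one per traced residue.
    Returns (log_odds_score, offset); offset is None if the sequence is too short.
    """
    best_score, best_offset = -math.inf, None
    for offset in range(len(sequence) - len(profile) + 1):
        score = sum(
            math.log(max(column.get(sequence[offset + k], 0.0), FLOOR) / BACKGROUND)
            for k, column in enumerate(profile)
        )
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def rank_candidates(profile, candidates):
    """Names of candidate sequences ordered from best to worst placement score."""
    scored = {name: best_placement(profile, seq)[0] for name, seq in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    fragment = [{"W": 0.9, "F": 0.1}, {"Y": 0.8, "F": 0.2}, {"C": 0.7, "S": 0.3}]
    print(rank_candidates(fragment, {"decoy": "AAAAAAAAA", "hit": "AAAWYCAAA"}))
```

Because the score is computed for every candidate, the same routine can also serve as a crude check of an existing sequence assignment: the assigned sequence should rank at or near the top.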