ABSTRACT
The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of the submitted ligand models and surrounding atoms was analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.
Subject(s)
Cryoelectron Microscopy , Molecular Models , Cryoelectron Microscopy/methods , Ligands , SARS-CoV-2 , COVID-19/virology , Escherichia coli , beta-Galactosidase/chemistry , beta-Galactosidase/metabolism , Protein Conformation , Reproducibility of Results
ABSTRACT
The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.
ABSTRACT
Model building and refinement, and the validation of their correctness, are very effective and reliable at local resolutions better than about 2.5 Å for both crystallography and cryo-EM. However, at local resolutions worse than 2.5 Å both the procedures and their validation break down and do not ensure reliably correct models. This is because in the broad density at lower resolution, critical features such as protein backbone carbonyl O atoms are not just less accurate but are not seen at all, and so peptide orientations are frequently wrongly fitted by 90-180°. This puts both backbone and side chains into the wrong local energy minimum, and they are then worsened rather than improved by further refinement into a valid but incorrect rotamer or Ramachandran region. On the positive side, new tools are being developed to locate this type of pernicious error in PDB depositions, such as CaBLAM, EMRinger, Pperp diagnosis of ribose puckers, and peptide flips in PDB-REDO, while interactive modeling in Coot or ISOLDE can help to fix many of them. Another positive trend is that artificial intelligence predictions such as those made by AlphaFold2 contribute additional evidence from large multiple sequence alignments, and in high-confidence parts they provide quite good starting models for loops, termini or whole domains with otherwise ambiguous density.
Subject(s)
Artificial Intelligence , Proteins , Molecular Models , Proteins/chemistry , X-Ray Crystallography , Peptides , Cryoelectron Microscopy/methods , Protein Conformation
ABSTRACT
Myoglobin is a globular protein involved in oxygen storage and transport. No consensus yet exists on the atomic-level mechanism by which oxygen and other small nonpolar ligands move between myoglobin's buried heme, which is the ligand binding site, and the surrounding solvent. This study uses room-temperature molecular dynamics simulations to provide a complete atomic-level picture of ligand migration in myoglobin. Multiple trajectories, providing a cumulative total of 7 μs of simulation, are analyzed. Our simulation results are consistent with and tie together previous experimental findings. Specifically, we characterize: (i) explicit full trajectories in which the CO ligand shuttles between the internal binding site and the solvent and (ii) the pattern and structural origins of transient voids available for ligand migration. The computations are performed both in wild-type sperm whale myoglobin and in the sperm whale V68F myoglobin mutant, which is experimentally known to slow ligand-binding kinetics. On the basis of these independent but mutually consistent ligand migration and transient void computations, we find that there are two discrete dynamical pathways for ligand migration in myoglobin. Trajectory hops between these pathways are limited to two bottleneck regions. The ligand enters and exits the protein matrix through common identifiable portals on the protein surface. The pathways are located in the "softer" regions of the protein matrix and go between its helices and in its loop regions. Localized structural fluctuations are the primary physical origin of the simulated CO migration pathways inside the protein.
Subject(s)
Algorithms , Carbon Monoxide/chemistry , Myoglobin/chemistry , Solvents/chemistry , Animals , Binding Sites , Biological Transport , Ligands , Mutant Proteins/chemistry , Phenylalanine/genetics , Protein Secondary Structure , Valine/genetics , Whales
ABSTRACT
The MolProbity web service provides macromolecular model validation to help the worldwide structural biology community correct local errors. Here we highlight new validation features, and also describe how we are fighting back against outside developments which compromise that mission. Our new tool called UnDowser analyzes the properties and context of clashing HOH "waters" to diagnose what they might actually represent; a dozen distinct scenarios are illustrated and described. We now treat alternate conformations more thoroughly, and switching to the Neo4j database (graphical rather than relational) enables cleaner, more comprehensive, and much larger reference datasets. A problematic outside change is that refinement software now increasingly restrains traditional validation criteria (geometry, clashes, rotamers, and even Ramachandran) in order to supplement the sparser experimental data at the 3-4 Å resolutions typical of modern cryoEM. Unfortunately, the broad density allows model optimization without fixing underlying problems, which means these structures often score much better on validation than their actual quality warrants. CaBLAM, our tool designed for evaluating peptide orientations at lower resolutions, was described in the previous Tools issue, and here we demonstrate its effectiveness in diagnosing local errors even when other validation outliers have been artificially removed. Sophisticated hacking of the MolProbity server has required continual monitoring and various security measures short of restricting user access. The deprecation of Java applets now prevents KiNG interactive online display of outliers on the 3D model during a MolProbity run, but that important functionality has now been recaptured with a modified version of the JavaScript NGL Viewer.
Subject(s)
Computational Biology/methods , Macromolecular Substances/chemistry , Cryoelectron Microscopy , X-Ray Crystallography , Three-Dimensional Imaging , Molecular Models , Molecular Conformation , Software , Web Browser
ABSTRACT
Diffraction (X-ray, neutron and electron) and electron cryo-microscopy are powerful methods to determine three-dimensional macromolecular structures, which are required to understand biological processes and to develop new therapeutics against diseases. The overall structure-solution workflow is similar for these techniques, but nuances exist because the properties of the reduced experimental data are different. Software tools for structure determination should therefore be tailored for each method. Phenix is a comprehensive software package for macromolecular structure determination that handles data from any of these techniques. Tasks performed with Phenix include data-quality assessment, map improvement, model building, the validation/rebuilding/refinement cycle and deposition. Each tool caters to the type of experimental data. The design of Phenix emphasizes the automation of procedures, where possible, to minimize repetitive and time-consuming manual tasks, while default parameters are chosen to encourage best practice. A graphical user interface provides access to many command-line features of Phenix and streamlines the transition between programs, project tracking and re-running of previous tasks.
Subject(s)
Automation/methods , Macromolecular Substances/chemistry , Software Design , Software Validation , Cryoelectron Microscopy/methods , X-Ray Crystallography/methods , Molecular Models , Molecular Conformation
ABSTRACT
Traditionally, validation was considered to be a final gatekeeping function, but refinement is smoother and results are better if model validation actively guides corrections throughout structure solution. This shifts emphasis from global to local measures: primarily geometry, conformations and sterics. A fit into the wrong local minimum conformation usually produces outliers in multiple measures. Moving to the right local minimum should be prioritized, rather than small shifts across arbitrary borderlines. Steric criteria work best with all explicit H atoms. 'Backrub' motions should be used for side chains and 'P-perp' diagnostics to correct ribose puckers. A 'water' may actually be an ion, a relic of misfitting or an unmodeled alternate. Beware of wishful thinking in modeling ligands. At high resolution, internally consistent alternate conformations should be modeled and geometry in poor density should not be downweighted. At low resolution, CaBLAM should be used to diagnose protein secondary structure and ERRASER to correct RNA backbone. Not every atom should be forced inside density, sequence misalignment should be watched for, and very rare conformations such as cis-non-Pro peptides should be avoided. Automation continues to improve, but the crystallographer still must look at each outlier, in the context of density, and correct most of them. For the valid few with unambiguous density and something that is holding them in place, a functional reason should be sought. The expectation is a few outliers, not zero.
Subject(s)
X-Ray Crystallography/methods , Molecular Models , Validation Studies as Topic , Methods , Proteins/chemistry , RNA/chemistry
ABSTRACT
This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including rewrite of previous Java utilities to now use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and enhances the thorough integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy, one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset, the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray, and flagging of the very rare cis-nonProline and twisted peptides which have recently been greatly overused. Due to wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore.
Subject(s)
Protein Databases , Molecular Models , Programming Languages , Proteins/chemistry , Proteins/genetics
ABSTRACT
Geometrical validation around the Cα is described, with a new Cβ measure and updated Ramachandran plot. Deviation of the observed Cβ atom from ideal position provides a single measure encapsulating the major structure-validation information contained in bond angle distortions. Cβ deviation is sensitive to incompatibilities between sidechain and backbone caused by misfit conformations or inappropriate refinement restraints. A new φ,ψ plot using density-dependent smoothing for 81,234 non-Gly, non-Pro, and non-prePro residues with B < 30 from 500 high-resolution proteins shows sharp boundaries at critical edges and clear delineation between large empty areas and regions that are allowed but disfavored. One such region is the γ-turn conformation near +75°, -60°, counted as forbidden by common structure-validation programs; however, it occurs in well-ordered parts of good structures, it is overrepresented near functional sites, and strain is partly compensated by the γ-turn H-bond. Favored and allowed φ,ψ regions are also defined for Pro, pre-Pro, and Gly (important because Gly φ,ψ angles are more permissive but less accurately determined). Details of these accurate empirical distributions are poorly predicted by previous theoretical calculations, including a region left of α-helix, which rates as favorable in energy yet rarely occurs. A proposed factor explaining this discrepancy is that crowding of the two-peptide NHs permits donating only a single H-bond. New calculations by Hu et al. [Proteins 2002 (this issue)] for Ala and Gly dipeptides, using mixed quantum mechanics and molecular mechanics, fit our nonrepetitive data in excellent detail. To run our geometrical evaluations on a user-uploaded file, see MOLPROBITY (http://kinemage.biochem.duke.edu) or RAMPAGE (http://www-cryst.bioc.cam.ac.uk/rampage).
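For readers unfamiliar with the quantities plotted on a Ramachandran diagram, the following is a minimal sketch of how a backbone dihedral angle can be computed from four atom positions. It is an illustration only and is not the MolProbity or RAMPAGE implementation; the function name `dihedral` and the toy coordinates are assumptions introduced here. By the standard definition, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180].

    Hypothetical helper for illustration; each point is an (x, y, z) tuple.
    """
    b0 = _sub(p0, p1)          # from the central bond back to the first atom
    b1 = _sub(p2, p1)          # central bond p1 -> p2
    b2 = _sub(p3, p2)          # from the central bond on to the last atom
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not taken from any real structure).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applying `dihedral` to the two four-atom sets above for each residue yields the phi,psi pair that places that residue on the plot; the favored and allowed contours themselves come from the empirical reference data and are not reproduced in this sketch.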
Subject(s)
Amino Acids/chemistry , Protein Conformation , Proteins/chemistry , Carbon/chemistry , Glycine/chemistry , Three-Dimensional Imaging , Internet , Molecular Models , Molecular Structure , Proline/chemistry
ABSTRACT
Model validation has evolved from a passive final gatekeeping step to an ongoing diagnosis and healing process that enables significant improvement of accuracy. A recent phase of active development was spurred by the worldwide Protein Data Bank requiring data deposition and establishing Validation Task Force committees, by strong growth in high-quality reference data, by new speed and ease of computations, and by an upswing of interest in large molecular machines and structural ensembles. Progress includes automated correction methods, concise and user-friendly validation reports for referees and on the PDB websites, extension of error correction to RNA and error diagnosis to ligands, carbohydrates, and membrane proteins, and a good start on better methods for low resolution and for multiple conformations.