ABSTRACT
Intense X-rays available at powerful synchrotron beamlines provide macromolecular crystallographers with an incomparable tool for investigating biological phenomena on an atomic scale. The resulting insights into the mechanisms underlying biological processes have played an essential role and shaped biomedical sciences during the last 30 years, considered the "golden age" of structural biology. In this review, we analyze selected aspects of the impact of synchrotron radiation on structural biology. Synchrotron beamlines have been used to determine over 70% of all macromolecular structures deposited into the Protein Data Bank (PDB). These structures were deposited by over 13,000 different research groups. Interestingly, despite the impressive advances in synchrotron technologies, the median resolution of macromolecular structures determined using synchrotrons has remained constant throughout the last 30 years, at about 2 Å. Similarly, the median times from the data collection to the deposition and release have not changed significantly. We describe challenges to reproducibility related to recording all relevant data and metadata during the synchrotron experiments, including diffraction images. Finally, we discuss some of the recent opinions suggesting a diminishing importance of X-ray crystallography due to impressive advances in Cryo-EM and theoretical modeling. We believe that synchrotrons of the future will increasingly evolve towards a life science center model, where X-ray crystallography, Cryo-EM, and other experimental and computational resources and knowledge are encompassed within a versatile research facility. The recent response of crystallographers to the COVID-19 pandemic suggests that X-ray crystallography conducted at synchrotron beamlines will continue to play an essential role in structural biology and drug discovery for years to come.
ABSTRACT
Metal ions bound to macromolecules play an integral role in many cellular processes. They can directly participate in catalytic mechanisms or be essential for the structural integrity of proteins and nucleic acids. However, their unique nature in macromolecules can make them difficult to model and refine, and a substantial portion of metal ions in the PDB are misidentified or poorly refined. CheckMyMetal (CMM) is a validation tool that has gained widespread acceptance as an essential tool for researchers working on metal-macromolecule complexes. CMM can be used during structure determination or to validate metal binding sites in structural models within the PDB. The functionalities of CMM have recently been greatly enhanced and provide researchers with additional information that can guide modeling decisions. The new version of CMM shows metals in the context of electron density maps and allows for on-the-fly refinement of metal binding sites. The improvements should increase the reproducibility of biomedical research. The web server is available at https://cmm.minorlab.org.
Subject(s)
Metals , Proteins , Binding Sites , Reproducibility of Results , Models, Molecular , Proteins/chemistry , Metals/metabolism , Ions
ABSTRACT
The rut pathway of pyrimidine catabolism is a novel pathway that allows pyrimidine bases to serve as the sole nitrogen source in suboptimal temperatures. The rut operon in E. coli evaded detection until 2006, yet encodes seven proteins, RutA through RutG. These comprise a pyrimidine transporter and six enzymes that cleave and further process the uracil ring. Herein, we report the structure of RutD, a member of the α/β hydrolase superfamily, which is proposed to enhance the rate of hydrolysis of aminoacrylate, a toxic side product of uracil degradation, to malonic semialdehyde. Although this reaction will occur spontaneously in water, the toxicity of aminoacrylate necessitates catalysis by RutD for efficient growth with uracil as a nitrogen source. RutD has a novel and conserved arrangement of residues corresponding to the α/β hydrolase active site, where the spatial position of the nucleophile, occupied by Ser, Cys, or Asp in the canonical catalytic triad, is instead occupied by histidine. We have used a combination of crystallographic structure determination, modeling, and bioinformatics to propose a novel mechanism for this enzyme. This approach also revealed that RutD represents a previously undescribed family within the α/β hydrolases. We compare and contrast RutD with PcaD, which is the closest structural homolog to RutD. PcaD is a 3-oxoadipate-enol-lactonase with a classic arrangement of residues in the active site. We have modeled a substrate in the PcaD active site and proposed a reaction mechanism.
Subject(s)
Escherichia coli Proteins/chemistry , Hydrolases/chemistry , Carboxylic Ester Hydrolases/chemistry , Catalytic Domain , Escherichia coli/chemistry , Escherichia coli/enzymology , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Hydrolases/metabolism , Metabolic Networks and Pathways , Models, Molecular , Protein Binding , Protein Conformation , Pyrimidines/metabolism
ABSTRACT
RutC is the third enzyme in the Escherichia coli rut pathway of uracil degradation. RutC belongs to the highly conserved YjgF family of proteins. The structure of the RutC protein was determined and refined to 1.95 Å resolution. The crystal belonged to space group P2₁2₁2 and contained six molecules in the asymmetric unit. The structure was solved by SAD phasing and was refined to an Rwork of 19.3% (Rfree = 21.7%). The final model revealed that this protein has a Bacillus chorismate mutase-like fold and forms a homotrimer with a hydrophobic cavity in the center of the structure and ligand-binding clefts between two subunits. A likely function for RutC is the reduction of peroxy-aminoacrylate to aminoacrylate as a part of a detoxification process.
Subject(s)
Escherichia coli Proteins/chemistry , Escherichia coli/enzymology , Operon , Oxidoreductases/chemistry , Amino Acid Sequence , Catalytic Domain , Conserved Sequence , Crystallography, X-Ray , Escherichia coli/genetics , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Molecular Sequence Annotation , Molecular Sequence Data , Protein Structure, Quaternary , Protein Structure, Secondary , Sequence Alignment , Structural Homology, Protein
ABSTRACT
The crystal structure of a short-chain dehydrogenase/reductase from Bacillus anthracis strain 'Ames Ancestor' complexed with NADP has been determined and refined to 1.87 Å resolution. The structure of the enzyme consists of a Rossmann fold composed of seven parallel β-strands sandwiched by three α-helices on each side. An NADP molecule from an endogenous source is bound in the conserved binding pocket in the syn conformation. The loop region responsible for binding another substrate forms two perpendicular short helices connected by a sharp turn.
Subject(s)
Bacillus anthracis/enzymology , Oxidoreductases/chemistry , Binding Sites , Biocatalysis , Models, Molecular , Protein Structure, Quaternary , Protein Structure, Tertiary , Structural Homology, Protein , Substrate Specificity
ABSTRACT
Herein we present the newest version of the HKL-3000 system, which integrates data collection, data reduction, phasing, model building, refinement, and validation. The system significantly accelerates the process of structure determination and has proven its high value for the determination of very high-quality structures. The heuristic for choosing the best approach for every step of structure determination for various quality samples and diffraction data has been optimized. The latest modifications increase the likelihood of a successful structure determination with challenging data. HKL-3000 is the successor to the HKL and HKL-2000 programs. The use of the HKL family of programs has been reported for over 73,000 PDB deposits, that is, almost 50% of macromolecular structures determined with X-ray diffraction.
Subject(s)
Models, Molecular , Software , X-Ray Diffraction , Molecular Structure
ABSTRACT
A structure of the apo-form of the putative transcriptional regulator SCO0520 from Streptomyces coelicolor A3(2) was determined at 1.8 Å resolution. SCO0520 belongs to the TetR family of regulators. In the crystal lattice, the asymmetric unit contains two monomers that form an Ω-shaped dimer. The distance between the two DNA-recognition domains is much longer than the corresponding distances in the known structures of other TetR family proteins. In addition, the subunits in the dimer have different conformational states, resulting in different relative positions of the DNA-binding and regulatory domains. Similar conformational modifications are observed in other TetR regulators and result from ligand binding. These studies provide information about the flexibility of the SCO0520 molecule and its putative biological function.
Subject(s)
Bacterial Proteins/chemistry , DNA-Binding Proteins/chemistry , Streptomyces coelicolor/genetics , Transcription Factors/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Crystallography, X-Ray , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Helix-Turn-Helix Motifs , Models, Molecular , Molecular Sequence Data , Protein Conformation , Protein Multimerization , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, DNA , Streptomyces coelicolor/chemistry , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Chromodomains are modules implicated in the recognition of lysine-methylated histone tails and nucleic acids. CHD (for chromo-ATPase/helicase-DNA-binding) proteins regulate ATP-dependent nucleosome assembly and mobilization through their conserved double chromodomains and SWI2/SNF2 helicase/ATPase domain. The Drosophila CHD1 localizes to the interbands and puffs of the polytene chromosomes, which are classic sites of transcriptional activity. Other CHD isoforms (CHD3/4 or Mi-2) are important for nucleosome remodelling in histone deacetylase complexes. Deletion of chromodomains impairs nucleosome binding and remodelling by CHD proteins. Here we describe the structure of the tandem arrangement of the human CHD1 chromodomains, and its interactions with histone tails. Unlike HP1 and Polycomb proteins that use single chromodomains to bind to their respective methylated histone H3 tails, the two chromodomains of CHD1 cooperate to interact with one methylated H3 tail. We show that the human CHD1 double chromodomains target the lysine 4-methylated histone H3 tail (H3K4me), a hallmark of active chromatin. Methylammonium recognition involves two aromatic residues, not the three-residue aromatic cage used by chromodomains of HP1 and Polycomb proteins. Furthermore, unique inserts within chromodomain 1 of CHD1 block the expected site of H3 tail binding seen in HP1 and Polycomb, instead directing H3 binding to a groove at the inter-chromodomain junction.
Subject(s)
Chromatin/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Histones/chemistry , Histones/metabolism , Amino Acid Sequence , Animals , Chromatin/chemistry , Chromobox Protein Homolog 5 , Chromosomal Proteins, Non-Histone/chemistry , Crystallography, X-Ray , DNA Helicases , Drosophila Proteins/chemistry , Humans , Lysine/metabolism , Methylation , Models, Molecular , Molecular Sequence Data , Polycomb Repressive Complex 1 , Protein Structure, Tertiary , Structure-Activity Relationship
ABSTRACT
As part of the global mobilization to combat the present pandemic, almost 100 000 COVID-19-related papers have been published and nearly a thousand models of macromolecules encoded by SARS-CoV-2 have been deposited in the Protein Data Bank within less than a year. The avalanche of new structural data has given rise to multiple resources dedicated to assessing the correctness and quality of structural data and models. Here, an approach to evaluate the massive amounts of such data using the resource https://covid19.bioreproducibility.org is described, which offers a template that could be used in large-scale initiatives undertaken in response to future biomedical crises. Broader use of the described methodology could considerably curtail information noise and significantly improve the reproducibility of biomedical research.
ABSTRACT
The COVID-19 pandemic has triggered numerous scientific activities aimed at understanding the SARS-CoV-2 virus and ultimately developing treatments. Structural biologists have already determined hundreds of experimental X-ray, cryo-EM, and NMR structures of proteins and nucleic acids related to this coronavirus, and this number is still growing. To help biomedical researchers, who may not necessarily be experts in structural biology, navigate through the flood of structural models, we have created an online resource, covid19.bioreproducibility.org, that aggregates expert-verified information about SARS-CoV-2-related macromolecular models. In this article, we describe this web resource along with the suite of tools and methodologies used for assessing the structures presented therein.
Subject(s)
COVID-19/genetics , Internet , SARS-CoV-2/ultrastructure , Viral Proteins/ultrastructure , COVID-19/virology , Databases, Chemical , Humans , Models, Structural , Pandemics , Research , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Viral Proteins/chemistry , Viral Proteins/genetics
ABSTRACT
Efficient and comprehensive data management is an indispensable component of modern scientific research and requires effective tools for all but the most trivial experiments. The LabDB system developed and used in our laboratory was originally designed to track the progress of a structure determination pipeline in several large National Institutes of Health (NIH) projects. While initially designed for structural biology experiments, its modular nature makes it easily applied in laboratories of various sizes in many experimental fields. Over many years, LabDB has transformed into a sophisticated system integrating a range of biochemical, biophysical, and crystallographic experimental data, which harvests data both directly from laboratory instruments and through human input via a web interface. The core module of the system handles many types of universal laboratory management data, such as laboratory personnel, chemical inventories, storage locations, and custom stock solutions. LabDB also tracks various biochemical experiments, including spectrophotometric and fluorescent assays, thermal shift assays, isothermal titration calorimetry experiments, and more. LabDB has been used to manage data for experiments that resulted in over 1200 deposits to the Protein Data Bank (PDB); the system is currently used by the Center for Structural Genomics of Infectious Diseases (CSGID) and several large laboratories. This chapter also provides examples of data mining analyses and warnings about incomplete and inconsistent experimental data. These features, together with its capabilities for detailed tracking, analysis, and auditing of experimental data, make the described system uniquely suited to inspect potential sources of irreproducibility in life sciences research.
Subject(s)
Computational Biology , Database Management Systems , Databases, Protein , Humans , Reproducibility of Results
ABSTRACT
In macromolecular crystallography, the acquisition of a complete set of diffraction intensities typically involves a high cumulative dose of X-ray radiation. In the process of data acquisition, the irradiated crystal lattice undergoes a broad range of chemical and physical changes. These result in the gradual decay of diffraction intensities, accompanied by changes in the macroscopic organization of crystal lattice order and by localized changes in electron density that, owing to complex radiation chemistry, are specific for a particular macromolecule. The decay of diffraction intensities is a well defined physical process that is fully correctable during scaling and merging analysis and therefore, while limiting the amount of diffraction, it has no other impact on phasing procedures. Specific chemical changes, which are variable even between different crystal forms of the same macromolecule, are more difficult to predict, describe and correct in data. Appearing during the process of data collection, they result in gradual changes in structure factors and therefore have profound consequences in phasing procedures. Examples of various combinations of radiation-induced changes are presented and various considerations pertinent to the determination of the best strategies for handling diffraction data analysis in representative situations are discussed.
Subject(s)
Crystallography, X-Ray/methods , X-Rays , Models, Molecular , Protein Structure, Tertiary , Proteins/analysis , Proteins/chemistry
ABSTRACT
We tested the general applicability of in situ proteolysis to form protein crystals suitable for structure determination by adding a protease (chymotrypsin or trypsin) digestion step to crystallization trials of 55 bacterial and 14 human proteins that had proven recalcitrant to our best efforts at crystallization or structure determination. This is a work in progress; so far we determined structures of 9 bacterial proteins and the human aminoimidazole ribonucleotide synthetase (AIRS) domain.
Subject(s)
Crystallization/methods , Crystallography/methods , Peptide Hydrolases/chemistry , Proteins/chemistry , Proteins/ultrastructure , Protein Conformation
ABSTRACT
BACKGROUND: Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three different classes, I, II, and III, with class IIa being pediocin-like single entities and class IIb being two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. Several examples of cognates for IIa have already been solved structurally. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable. RESULTS: We have solved the crystal structure of the gene product of locus Spy_2152 from S. pyogenes (PDB: 2fu2) and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate this protein to be a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest this protein confers immunity to IIb-like peptides. CONCLUSIONS: Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecin 1, 2, 3 and 4.
Subject(s)
Bacteriocins/chemistry , Streptococcus pyogenes/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Models, Molecular , Molecular Sequence Data , Protein Conformation , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein
ABSTRACT
The 3' processing of most bacterial precursor tRNAs involves exonucleolytic trimming to yield a mature CCA end. This step is carried out by RNase T, a member of the large DEDD family of exonucleases. We report the crystal structures of RNase T from Escherichia coli and Pseudomonas aeruginosa, which show that this enzyme adopts an opposing dimeric arrangement, with the catalytic DEDD residues from one monomer closely juxtaposed with a large basic patch on the other monomer. This arrangement suggests that RNase T has to be dimeric for substrate specificity, and agrees very well with prior site-directed mutagenesis studies. The dimeric architecture of RNase T is very similar to the arrangement seen in oligoribonuclease, another bacterial DEDD family exoribonuclease. The catalytic residues in these two enzymes are organized very similarly to the catalytic domain of the third DEDD family exoribonuclease in E. coli, RNase D, which is monomeric.
Subject(s)
Exoribonucleases/chemistry , RNA, Transfer/chemistry , Amino Acid Sequence , Escherichia coli/enzymology , Escherichia coli/genetics , Exoribonucleases/genetics , Molecular Sequence Data , Pseudomonas aeruginosa/enzymology , Pseudomonas aeruginosa/genetics , RNA, Transfer/metabolism
ABSTRACT
It has been increasingly recognized that preservation and public accessibility of primary experimental data are cornerstones necessary for the reproducibility of empirical sciences. In the field of molecular crystallography, many journals now recommend that authors of manuscripts presenting a new crystal structure should deposit their primary experimental data (X-ray diffraction images) to one of the dedicated resources created in recent years. Here, we describe our experiences developing the Integrated Resource for Reproducibility in Molecular Crystallography (IRRMC) and describe several examples of a crucial role that diffraction data can play in improving previously determined protein structures. In its first four years, several hundred crystallographers have deposited data from over 5200 diffraction experiments performed at over 60 different synchrotron beamlines or home sources all over the world. In addition to improving the resource and curating submitted data, we have been building a pipeline for extraction or, in some cases, reconstruction of the metadata necessary for seamless automated processing. Preliminary analysis indicates that about 95% of the archived data can be automatically reprocessed. A high rate of reprocessing success shows the feasibility of using the automated metadata extraction and automated processing as a validation step for the deposition of raw diffraction images. The IRRMC is guided by the Findable, Accessible, Interoperable, and Reusable data management principles.
ABSTRACT
Crystal structures of two orthologs of the regulatory subunit of acetohydroxyacid synthase III (AHAS, EC 2.2.1.6) from Thermotoga maritima (TM0549) and Nitrosomonas europaea (NE1324) were determined by single-wavelength anomalous diffraction methods with the use of selenomethionine derivatives at 2.3 Å and 2.5 Å, respectively. TM0549 and NE1324 share the same fold, and in both proteins the polypeptide chain contains two separate domains of a similar size. Each protein contains a C-terminal domain with a ferredoxin-type fold and an N-terminal ACT domain, of which the latter is characteristic of several proteins involved in amino acid metabolism. The ferredoxin domain is stabilized by a calcium ion in the crystal structure of NE1324 and by a Mg(H2O)₆²⁺ ion in TM0549. Both TM0549 and NE1324 form dimeric assemblies in the crystal lattice.
Subject(s)
Acetolactate Synthase/chemistry , Bacterial Proteins/chemistry , Thermotoga maritima/enzymology , Acetolactate Synthase/genetics , Acetolactate Synthase/metabolism , Amino Acid Sequence , Arginine/chemistry , Arginine/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Calcium/chemistry , Calcium/metabolism , Crystallography, X-Ray , Isoenzymes/chemistry , Isoenzymes/genetics , Isoenzymes/metabolism , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Thermotoga maritima/genetics
ABSTRACT
Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near-'one-click' experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/.
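To illustrate the dead-end elimination idea mentioned in this abstract, the following is a minimal sketch of the widely used Goldstein elimination criterion applied to toy energy tables. It is not Fitmunk's implementation: the function name `goldstein_dee`, the data layout, and the toy energies are assumptions made for illustration, and the hybrid energy function described in the abstract is not reproduced here.

```python
def goldstein_dee(self_energy, pair_energy):
    """Prune rotamers that cannot be part of the lowest-energy assignment.

    self_energy[i][r]        -- energy of rotamer r at position i (toy values)
    pair_energy[(i, j)][r][s] -- interaction energy for i < j (toy values)
    Returns a list of sets holding the surviving rotamer indices per position.
    """
    n = len(self_energy)
    alive = [set(range(len(e))) for e in self_energy]

    def pair(i, r, j, s):
        # Pair energies are stored once per unordered pair of positions.
        if i < j:
            return pair_energy[(i, j)][r][s]
        return pair_energy[(j, i)][s][r]

    changed = True
    while changed:  # repeat until no further rotamer can be eliminated
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Goldstein criterion: r is a dead end if switching to t
                    # lowers the energy whatever the other positions choose.
                    gap = self_energy[i][r] - self_energy[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(i, r, j, s) - pair(i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy example: two positions with two candidate rotamers each.
toy_self = [[0.0, 5.0], [1.0, 0.0]]
toy_pair = {(0, 1): [[0.0, 0.5], [0.5, 0.0]]}
survivors = goldstein_dee(toy_self, toy_pair)
```

In this toy case a single rotamer survives at each position; in general, elimination only shrinks the search space, and the remaining combinations still have to be searched.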
Subject(s)
Models, Molecular , Software , Amino Acid Motifs , Amino Acids/chemistry , Proteins/chemistry
ABSTRACT
The misidentification of a protein sample, or contamination of a sample with the wrong protein, may be a potential reason for the non-reproducibility of experiments. This problem may occur in the process of heterologous overexpression and purification of recombinant proteins, as well as purification of proteins from natural sources. If the contaminated or misidentified sample is used for crystallization, in many cases the problem may not be detected until structures are determined. In the case of functional studies, the problem may not be detected for years. Here, several procedures that can be successfully used for the identification of crystallized protein contaminants are presented, including: (i) a lattice parameter search against known structures, (ii) sequence or fold identification from partially built models, and (iii) molecular replacement with common contaminants as search templates. A list of common contaminant structures to be used as alternative search models is provided. These methods were used to identify four cases of purification and crystallization artifacts. This report provides troubleshooting pointers for researchers facing difficulties in phasing or model building.
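A lattice parameter search of the kind listed as procedure (i) can be sketched as a tolerance-based comparison of unit-cell parameters. The sketch below is only illustrative: the function names, the tolerances, and the reference cells are hypothetical placeholders, not values from the report, and it assumes both cells are given in the same conventional setting (a real search would typically compare reduced cells).

```python
def cell_matches(query, reference, length_tol=0.03, angle_tol=2.0):
    """Compare two unit cells given as (a, b, c, alpha, beta, gamma).

    Lengths (in angstroms) must agree within a relative tolerance and
    angles (in degrees) within an absolute one. Both tolerances are
    assumed values chosen for illustration.
    """
    lengths_ok = all(abs(q - r) / r <= length_tol
                     for q, r in zip(query[:3], reference[:3]))
    angles_ok = all(abs(q - r) <= angle_tol
                    for q, r in zip(query[3:], reference[3:]))
    return lengths_ok and angles_ok


def search_cells(query, references, **tolerances):
    """Return the names of all reference cells compatible with the query."""
    return [name for name, cell in references.items()
            if cell_matches(query, cell, **tolerances)]


# Hypothetical reference table; the names and cells are made up.
known_cells = {
    "contaminant_A": (50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "contaminant_B": (80.0, 80.0, 120.0, 90.0, 90.0, 120.0),
}
hits = search_cells((50.5, 59.6, 70.9, 90.0, 90.0, 90.0), known_cells)
```

A hit from such a search is only a hint that the crystal may contain a known protein; it would still need to be confirmed, for example by molecular replacement with the matching structure.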
Subject(s)
Crystallization/methods , Proteins/chemistry , Acetyltransferases/chemistry , Acetyltransferases/isolation & purification , Animals , Artifacts , Bacterial Proteins/chemistry , Bacterial Proteins/isolation & purification , DNA-Directed RNA Polymerases/chemistry , DNA-Directed RNA Polymerases/isolation & purification , Escherichia coli/chemistry , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/isolation & purification , Proteins/isolation & purification , Recombinant Proteins/chemistry , Recombinant Proteins/isolation & purification , Reproducibility of Results , Sigma Factor/chemistry , Sigma Factor/isolation & purification , Staphylococcus aureus/chemistry , Survivin , Xenopus/metabolism , Xenopus Proteins/chemistry
ABSTRACT
The low reproducibility of published experimental results in many scientific disciplines has recently garnered negative attention in scientific journals and the general media. Public transparency, including the availability of 'raw' experimental data, will help to address growing concerns regarding scientific integrity. Macromolecular X-ray crystallography has led the way in requiring the public dissemination of atomic coordinates and a wealth of experimental data, making the field one of the most reproducible in the biological sciences. However, there remains no mandate for public disclosure of the original diffraction data. The Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) has been developed to archive raw data from diffraction experiments and, equally importantly, to provide related metadata. Currently, the database of our resource contains data from 2920 macromolecular diffraction experiments (5767 data sets), accounting for around 3% of all depositions in the Protein Data Bank (PDB), with their corresponding partially curated metadata. IRRMC utilizes distributed storage implemented using a federated architecture of many independent storage servers, which provides both scalability and sustainability. The resource, which is accessible via the web portal at http://www.proteindiffraction.org, can be searched using various criteria. All data are available for unrestricted access and download. The resource serves as a proof of concept and demonstrates the feasibility of archiving raw diffraction data and associated metadata from X-ray crystallographic studies of biological macromolecules.
The goal is to expand this resource and include data sets that failed to yield X-ray structures in order to facilitate collaborative efforts that will improve protein structure-determination methods and to ensure the availability of 'orphan' data left behind for various reasons by individual investigators and/or extinct structural genomics projects.