ABSTRACT
The Collaborative Computational Project No. 4 (CCP4) is a UK-led international collective with a mission to develop, test, distribute and promote software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs brought together by familiar execution routines, a set of common libraries and graphical interfaces. The CCP4 suite has experienced several considerable changes since its last reference article, involving new infrastructure, original programs and graphical interfaces. This article, which is intended as a general literature citation for the use of the CCP4 software suite in structure determination, will guide the reader through such transformations, offering a general overview of the new features and outlining future developments. As such, it aims to highlight the individual programs that comprise the suite and to provide the latest references to them for perusal by crystallographers around the world.
Subject(s)
Proteins , Software , Proteins/chemistry , Crystallography, X-Ray , Macromolecular Substances
ABSTRACT
The oligosaccharides in N-glycosylation provide key structural and functional contributions to a glycoprotein. These contributions are dependent on the composition and overall conformation of the glycans. The Privateer software allows structural biologists to evaluate and improve the atomic structures of carbohydrates, including N-glycans; this software has recently been extended to check glycan composition through the use of glycomics data. Here, a broadening of the scope of the software to analyse and validate the overall conformation of N-glycans is presented, focusing on a newly compiled set of glycosidic linkage torsional preferences harvested from a curated set of glycoprotein models.
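Each glycosidic linkage torsion is a signed dihedral angle over four consecutive atoms spanning the linkage. The following is a minimal sketch that assumes plain Cartesian coordinates given as 3-tuples and computes such an angle with the usual atan2 formulation. Which four atoms define φ and ψ for a given linkage is a naming convention that the abstract does not specify, and the helper names here are illustrative, not part of Privateer.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> list:
    """Component-wise difference a - b."""
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _cross(a: Vec, b: Vec) -> list:
    """Cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a: Vec, b: Vec) -> float:
    """Scalar product a . b."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180].

    Positive when, viewed along p1->p2, the p1-p0 bond must be rotated
    clockwise to eclipse the p2-p3 bond (IUPAC sign convention).
    """
    b0 = _sub(p1, p0)            # first bond
    b1 = _sub(p2, p1)            # central bond (rotation axis)
    b2 = _sub(p3, p2)            # last bond
    n1 = _cross(b0, b1)          # normal of plane (p0, p1, p2)
    n2 = _cross(b1, b2)          # normal of plane (p1, p2, p3)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` evaluates to +90 degrees (up to floating-point rounding). Using atan2 rather than an arccosine keeps the sign, which is what distinguishes the two halves of a torsion-angle plot.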
Subject(s)
Oligosaccharides , Polysaccharides , Polysaccharides/chemistry , Oligosaccharides/chemistry , Glycoproteins/chemistry , Glycosylation , Glycomics , Carbohydrate Conformation
ABSTRACT
Artificial intelligence-based protein structure prediction approaches have had a transformative effect on biomolecular sciences. The predicted protein models in the AlphaFold protein structure database, however, all lack coordinates for small molecules that are essential for molecular structure or function: hemoglobin lacks bound heme; zinc-finger motifs lack the zinc ions essential for their structural integrity; and metalloproteases lack the metal ions needed for catalysis. Ligands important for biological function are absent too: no ADP or ATP is bound to any of the ATPases or kinases. Here we present AlphaFill, an algorithm that uses sequence and structure similarity to 'transplant' such 'missing' small molecules and ions from experimentally determined structures to predicted protein models. The algorithm was successfully validated against experimental structures. A total of 12,029,789 transplants were performed on 995,411 AlphaFold models and are available together with associated validation metrics in the alphafill.eu databank, a resource to help scientists make new hypotheses and design targeted experiments.
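The sequence-similarity step can be pictured as a filter over candidate experimental templates. The sketch below is a hypothetical simplification, not AlphaFill's implementation. It assumes each template comes with a precomputed pairwise alignment to the predicted model, computes percent identity over ungapped columns, and lists which ligands would be candidates for transplantation above an illustrative identity cut-off. The structural superposition and validation that the real pipeline applies are left out, and the class and function names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Template:
    """A hypothetical experimental template aligned to one predicted model."""
    template_id: str
    model_row: str        # model sequence as aligned to this template ('-' = gap)
    template_row: str     # template sequence in the same alignment
    ligands: List[str] = field(default_factory=list)  # ligand codes, e.g. "HEM"


def percent_identity(row_a: str, row_b: str) -> float:
    """Identity (%) over alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def transplant_candidates(templates: List[Template],
                          min_identity: float = 25.0) -> Dict[str, List[str]]:
    """Map each ligand code to the templates that could donate it.

    min_identity is an illustrative threshold, not a published setting.
    """
    candidates: Dict[str, List[str]] = {}
    for t in templates:
        if percent_identity(t.model_row, t.template_row) >= min_identity:
            for ligand in t.ligands:
                candidates.setdefault(ligand, []).append(t.template_id)
    return candidates
```

Keeping a ligand-to-template mapping, rather than a flat list, makes it easy to see when several homologs agree on the same ligand.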
Subject(s)
Artificial Intelligence , Proteins , Protein Conformation , Proteins/chemistry , Zinc , Ions , Ligands
ABSTRACT
While scientists can often infer the biological function of proteins from their three-dimensional quaternary structures, the gap between the number of known protein sequences and the number of experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have specific strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify which models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.
Subject(s)
Metadata , Records , Amino Acid Sequence , Databases, Protein , Computer Simulation
ABSTRACT
Cohesin structures the genome through the formation of chromatin loops and by holding together the sister chromatids. The acetylation of cohesin's SMC3 subunit is a dynamic process that involves the acetyltransferase ESCO1 and deacetylase HDAC8. Here we show that this cohesin acetylation cycle controls the three-dimensional genome in human cells. ESCO1 restricts the length of chromatin loops, and of architectural stripes emanating from CTCF sites. HDAC8 conversely promotes the extension of such loops and stripes. This role in controlling loop length turns out to be distinct from the canonical role of cohesin acetylation that protects against WAPL-mediated DNA release. We reveal that acetylation controls the interaction of cohesin with PDS5A to restrict chromatin loop length. Our data support a model in which this PDS5A-bound state acts as a brake that enables the pausing and restart of loop enlargement. The cohesin acetylation cycle hereby provides punctuation in the process of genome folding.
Subject(s)
Cell Cycle Proteins , Chromosomal Proteins, Non-Histone , Acetylation , Cell Cycle Proteins/metabolism , Chromatids/metabolism , Chromatin , Chromosomal Proteins, Non-Histone/metabolism , Histone Deacetylases/genetics , Humans , Nuclear Proteins/metabolism , Repressor Proteins/genetics , Cohesins
ABSTRACT
The quality of macromolecular structure models crucially depends on refinement and validation targets, which optimally describe the expected chemistry. Commonly used software for these two procedures has been designed and developed in a protein-centric manner, resulting in relatively few established features for the refinement and validation of nucleic acid-containing structure models. Here, new nucleic acid-specific approaches implemented in PDB-REDO are described, including a new restraint model using noncovalent geometries (base-pair hydrogen bonding and base-pair stacking) as refinement targets. New validation routines are also presented, including a metric for Watson-Crick base-pair geometry normality (ZbpG). Applying the PDB-REDO pipeline with the new restraint model to the whole Protein Data Bank (PDB) demonstrates an overall positive effect on the quality of nucleic acid-containing structure models. Finally, we discuss examples of improvements in the geometry of specific nucleic acid structures in the PDB. The new PDB-REDO models and pipeline are available at https://pdb-redo.eu/.
Subject(s)
Computational Biology/methods , Nucleic Acid Conformation , Nucleic Acids/chemistry , Software , Models, Molecular
ABSTRACT
Comparison of homologous structure models is a key step in analyzing protein structure. With a wealth of homologous structures, comparison becomes a tedious process, and often only a small (user-biased) selection of data is used. A multitude of structural superposition algorithms are then typically used to visualize the structures together in 3D and to compare them. Here, the Local Annotation of Homology-Matched Amino acids (LAHMA) website (https://lahma.pdb-redo.eu) is presented, which compares any structure model with all of its close homologs from the PDB-REDO databank. LAHMA displays structural features in sequence space, allowing users to uncover differences between homologous structure models that can be analyzed for their relevance to chemistry or biology. LAHMA visualizes numerous structural features, also allowing one-click comparison of structure-quality plots (for example the Ramachandran plot) and 'in-browser' structural visualization of 3D models.
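Displaying structural features "in sequence space" amounts to projecting per-residue values from a homolog onto the residues of the query through a sequence alignment. The following is a minimal sketch that assumes a pairwise alignment given as two gapped strings and a hypothetical per-residue feature (for example a B-factor or an accessibility value) for the homolog. The function names and the tolerance are illustrative and do not describe LAHMA's code.

```python
from typing import List, Optional


def map_to_query(query_row: str, homolog_row: str,
                 homolog_values: List[float]) -> List[Optional[float]]:
    """Project homolog per-residue values onto query residues.

    Returns one entry per query residue; None where the homolog has a gap.
    """
    if len(query_row) != len(homolog_row):
        raise ValueError("aligned rows must have equal length")
    mapped: List[Optional[float]] = []
    h_index = 0                          # index into homolog_values
    for q, h in zip(query_row, homolog_row):
        value = None
        if h != "-":
            value = homolog_values[h_index]
            h_index += 1
        if q != "-":                     # only query residues get an entry
            mapped.append(value)
    return mapped


def differing_positions(query_values: List[float],
                        mapped: List[Optional[float]],
                        tolerance: float) -> List[int]:
    """0-based query positions where the homolog value differs by > tolerance."""
    return [i for i, (q, h) in enumerate(zip(query_values, mapped))
            if h is not None and abs(q - h) > tolerance]
```

Because every homolog is reduced to a list aligned to the query's residues, many homologs can be stacked position by position, which is what makes differences visible without any 3D superposition.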
Subject(s)
Algorithms , Models, Molecular , Proteins/chemistry , Structural Homology, Protein , Databases, Protein , Software
ABSTRACT
Ramachandran plots report the distribution of the (φ, ψ) torsion angles of the protein backbone and are one of the best quality metrics of experimental structure models. Typically, validation software reports the number of residues belonging to "outlier," "allowed," and "favored" regions. While "zero unexplained outliers" can be considered the current "gold standard," this can be misleading if deviations from expected distributions are not considered. We revisited the Ramachandran Z score (Rama-Z), a quality metric introduced more than two decades ago but underutilized. We describe a reimplementation of the Rama-Z score in the Computational Crystallography Toolbox along with an algorithm to estimate its uncertainty for individual models; final implementations are available in Phenix and PDB-REDO. We discuss the interpretation of the Rama-Z score and advocate including it in the validation reports provided by the Protein Data Bank. We also advocate reporting it alongside the outlier/allowed/favored counts in structural publications.
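At its core, a Z score compares an observed model-wide average with the mean and spread expected from reference structures. The sketch below assumes that per-residue Ramachandran scores (for example, log-densities looked up in a reference (φ, ψ) distribution) and the reference mean and standard deviation are already available. It does not reproduce the Rama-Z reference statistics or the uncertainty algorithm described in the paper; its naive standard-error estimate is an assumption for illustration only.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def rama_z(residue_scores: Sequence[float], ref_mean: float,
           ref_sd: float) -> Tuple[float, float]:
    """Return (Z, naive sigma of Z) for a model's per-residue scores.

    Z measures how far the model's average score lies from the reference
    mean, in units of the reference standard deviation. The sigma is the
    standard error of the mean rescaled by ref_sd (a simplification).
    """
    n = len(residue_scores)
    if n < 2:
        raise ValueError("need at least two residues")
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    average = mean(residue_scores)
    z = (average - ref_mean) / ref_sd
    sigma = stdev(residue_scores) / math.sqrt(n) / ref_sd
    return z, sigma
```

Unlike the outlier/allowed/favored counts, this single number is sensitive to the shape of the whole distribution. Reporting it together with an uncertainty helps judge whether a deviation from zero is meaningful for a model of a given size.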
Subject(s)
Algorithms , Models, Molecular , Proteins/ultrastructure , Bias , Cryoelectron Microscopy , Crystallography, X-Ray , Databases, Protein , Humans , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Software
ABSTRACT
Using genome-wide radiogenetic profiling, we functionally dissect vulnerabilities of cancer cells to ionizing radiation (IR). We identify ERCC6L2 as a major determinant of IR response, together with classical DNA damage response genes and members of the recently identified shieldin and CTC1-STN1-TEN1 (CST) complexes. We show that ERCC6L2 contributes to non-homologous end joining (NHEJ), and it may exert this function through interactions with SFPQ. In addition to causing radiosensitivity, ERCC6L2 loss restores DNA end resection and partially rescues homologous recombination (HR) in BRCA1-deficient cells. As a consequence, ERCC6L2 deficiency confers resistance to poly (ADP-ribose) polymerase (PARP) inhibition in tumors deficient for both BRCA1 and p53. Moreover, we show that ERCC6L2 mutations are found in human tumors and correlate with a better overall survival in patients treated with radiotherapy (RT); this finding suggests that ERCC6L2 is a predictive biomarker of RT response.
Subject(s)
DNA End-Joining Repair/radiation effects , DNA Helicases/metabolism , Animals , Humans , Mice
ABSTRACT
N-Glycosylation is one of the most common post-translational modifications and is implicated in, for example, protein folding and interaction with ligands and receptors. N-Glycosylation trees are complex structures of linked carbohydrate residues attached to asparagine residues. While carbohydrates are typically modeled in protein structures, they are often incomplete or have the wrong chemistry. Here, new tools are presented to automatically rebuild existing glycosylation trees, to extend them where possible, and to add new glycosylation trees if they are missing from the model. The method has been incorporated in the PDB-REDO pipeline and has been applied to build or rebuild 16 452 carbohydrate residues in 11 651 glycosylation trees in 4498 structure models, and is also available from the PDB-REDO web server. With better modeling of N-glycosylation, the biological function of this important modification can be better and more easily understood.
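N-glycans are attached mostly to asparagines within an N-X-S/T sequon, where X is any residue except proline. The abstract does not state how PDB-REDO selects sites for new trees, so the regular expression below is only a general-background sketch of a first-pass search for candidate asparagines. In a real model, the experimental density would still have to support any tree that is built.

```python
import re
from typing import List

# N, then any residue except proline, then serine or threonine.
# The zero-width lookahead lets overlapping motifs such as "NNST" both match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence: str) -> List[int]:
    """1-based positions of asparagines that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]
```

A plain `N[^P][ST]` pattern would consume characters and miss the second asparagine in a run like "NNST"; the lookahead avoids that.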
Subject(s)
Carbohydrate Conformation , Databases, Protein , Glycoproteins/chemistry , Polysaccharides/chemistry , Protein Conformation , Carbohydrate Sequence , Crystallography, X-Ray/methods , Humans , Models, Molecular
ABSTRACT
The West-Life project (https://about.west-life.eu/) is a Horizon 2020 project funded by the European Commission to provide data processing and data management services for the international community of structural biologists, and in particular to support integrative experimental approaches within the field of structural biology. It has developed enhancements to existing web services for structure solution and analysis, created new pipelines to link these services into more complex higher-level workflows, and added new data management facilities. Through this work it has striven to make the benefits of European e-Infrastructures more accessible to life-science researchers in general and structural biologists in particular.
ABSTRACT
Inherent protein flexibility, poor or low-resolution diffraction data or poorly defined electron-density maps often inhibit the building of complete structural models during X-ray structure determination. However, recent advances in crystallographic refinement and model building often allow completion of previously missing parts. This paper presents algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and 'graft' these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in the PDB-REDO pipeline has enabled the building of 24 962 missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and webserver. More complete protein structure models enable a higher quality public archive but also a better understanding of protein function, better comparison between homologous structures and more complete data mining in structural bioinformatics projects.
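Finding regions that are "missing in a certain model but present in homologous structures" can be sketched as a comparison of residue-number sets. The sketch assumes, hypothetically, that the homolog's residues have already been mapped into the model's numbering via a sequence alignment and that the first and last residue numbers of the full sequence are known. It also reports terminal gaps, which a loop-focused procedure might filter out; the function names are illustrative only.

```python
from typing import List, Set, Tuple

Range = Tuple[int, int]


def missing_ranges(observed: Set[int], first: int, last: int) -> List[Range]:
    """Contiguous runs of residue numbers in [first, last] absent from the model."""
    ranges: List[Range] = []
    start = None
    for resnum in range(first, last + 1):
        if resnum not in observed:
            if start is None:
                start = resnum          # a gap opens here
        elif start is not None:
            ranges.append((start, resnum - 1))
            start = None                # the gap closed at the previous residue
    if start is not None:               # gap runs to the C-terminal end
        ranges.append((start, last))
    return ranges


def graftable_ranges(observed: Set[int], first: int, last: int,
                     homolog_observed: Set[int]) -> List[Range]:
    """Missing ranges that the homolog covers completely."""
    return [(a, b) for a, b in missing_ranges(observed, first, last)
            if all(r in homolog_observed for r in range(a, b + 1))]
```

Requiring complete coverage keeps the sketch conservative: a gap that the homolog only partly resolves is not proposed for grafting.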
ABSTRACT
The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy-to-parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray crystallography. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics to cancer biology and protein design.
Subject(s)
Databases, Protein , Proteins/chemistry , Protein Conformation , Protein Structure, Secondary , Sequence Alignment , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
ABSTRACT
BACKGROUND: Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. RESULTS: In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources, including calculations on the 3D coordinates of the protein using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. The data are stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. CONCLUSIONS: We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similarly to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and detail. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.
Subject(s)
Genetic Diseases, Inborn/genetics , Mutation , Protein Conformation , Proteins/genetics , Software , Computational Biology/methods , Databases, Protein , Genome, Human , Humans , Proteins/chemistry , User-Computer Interface
ABSTRACT
SUMMARY: When referring to genes, authors often use synonyms instead of the official gene symbols. In order to accurately retrieve as many relevant documents as possible, we have developed GeneE, a web application that expands a gene query to include all known synonyms, and adds disambiguation information for ambiguous terms, before forwarding the query to either PubMed, Google or Jane. The query expansion algorithm is also available as a web service. AVAILABILITY: http://biosemantics.org/geneE
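Query expansion of this kind can be sketched as replacing a gene symbol with a Boolean OR over the symbol and its synonyms. In the sketch below, the synonym table is a hypothetical in-memory dictionary, and wrapping ambiguous synonyms in an AND with a context term is one illustrative way to add disambiguation. GeneE's actual synonym source and disambiguation method are not described here, and the function name is invented.

```python
from typing import Dict, Iterable, List


def expand_gene_query(symbol: str, synonyms: Dict[str, List[str]],
                      ambiguous: Iterable[str] = (), context: str = "") -> str:
    """Build a Boolean query string from a gene symbol and its synonyms."""
    ambiguous_keys = {term.lower() for term in ambiguous}
    seen = set()
    parts = []
    for term in [symbol, *synonyms.get(symbol, [])]:
        key = term.lower()
        if key in seen:                 # skip case-insensitive duplicates
            continue
        seen.add(key)
        if key in ambiguous_keys and context:
            # Require a context term alongside an ambiguous synonym.
            parts.append(f'("{term}" AND "{context}")')
        else:
            parts.append(f'"{term}"')
    return " OR ".join(parts)
```

For example, with the illustrative table `{"TP53": ["P53", "p53"]}`, `expand_gene_query("TP53", table)` returns `'"TP53" OR "P53"'`, because the lower-case variant is treated as a duplicate.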
Subject(s)
Genes/genetics , Information Storage and Retrieval/methods , Internet , Natural Language Processing , Pattern Recognition, Automated/methods , Proteins/classification , Proteins/genetics , PubMed , Terminology as Topic
ABSTRACT
Homology modelling is normally the technique of choice when experimental structure data are not available but three-dimensional coordinates are needed, for example, to aid with detailed interpretation of results of spectroscopic studies. Herein, the state of the art of homology modelling will be described in the light of a series of recent developments, and an overview will be given of the problems and opportunities encountered in this field. The major topic, the accuracy and precision of homology models, will be discussed extensively due to its influence on the reliability of conclusions drawn from the combination of homology models and spectroscopic data. Three real-world examples will illustrate how both homology modelling and spectroscopy can be beneficial for (bio)medical research.
Subject(s)
Models, Molecular , Sequence Homology , Spectrum Analysis/methods , Amino Acid Sequence , Humans , Molecular Sequence Data , Protein Structure, Tertiary , Proteins/chemistry , Proteins/metabolism , Spin Labels
ABSTRACT
Signature genes are genes that are unique to a taxonomic clade and are common within it. They contain a wealth of information about clade-specific processes and hold a strong evolutionary signal that can be used to phylogenetically characterize a set of sequences, such as a metagenomics sample. As signature genes are based on gene content, they provide a means to assess the taxonomic origin of a sequence sample that is complementary to sequence-based analyses. Here, we introduce Signature (http://www.cmbi.ru.nl/signature), a web server that identifies the signature genes in a set of query sequences and thereby phylogenetically characterizes the set. The server produces a list of taxonomic clades that share signature genes with the set of query sequences, along with an insightful image of the tree of life in which the clades are color-coded by the number of signature genes present. This allows the user to quickly see from which part(s) of the taxonomy the query sequences likely originate.
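Taking the definition above literally, a gene family is a signature of a clade when it occurs in no genome outside the clade and in most or all genomes inside it. The sketch below assumes genomes are described by sets of gene-family identifiers (for example, orthologous-group IDs) and expresses "common within" as a hypothetical `min_fraction` parameter. It is not the Signature server's implementation; the counting function only mirrors the idea of ranking clades by the number of signature genes they share with the query.

```python
from collections import Counter
from typing import Dict, Set


def signature_genes(presence: Dict[str, Set[str]], clade: Set[str],
                    min_fraction: float = 1.0) -> Set[str]:
    """Gene families absent outside `clade` and present in enough of its members.

    presence maps genome id -> set of gene-family ids; clade is a set of genome ids.
    """
    members = [presence[g] for g in clade]
    outside: Set[str] = set()
    for genome, families in presence.items():
        if genome not in clade:
            outside |= families
    counts = Counter(fam for families in members for fam in families)
    needed = min_fraction * len(members)
    return {fam for fam, n in counts.items() if n >= needed and fam not in outside}


def clade_hits(query_families: Set[str],
               signatures: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of signature genes each clade shares with the query set."""
    hits = {clade: len(query_families & sig) for clade, sig in signatures.items()}
    return {clade: n for clade, n in hits.items() if n > 0}
```

Lowering `min_fraction` admits families that are missing from a few clade members (for example, because of incomplete genomes), at the cost of a weaker signal. The per-clade counts from `clade_hits` are the kind of quantity that could drive the color coding of a tree-of-life view.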