ABSTRACT
In January 2020, a workshop was held at EMBL-EBI (Hinxton, UK) to discuss data requirements for the deposition and validation of cryoEM structures, with a focus on single-particle analysis. The meeting was attended by 47 experts in data processing, model building and refinement, validation, and archiving of such structures. This report describes the workshop's motivation and history, the topics discussed, and the resulting consensus recommendations. Some challenges for future methods-development efforts in this area are also highlighted, as is the implementation to date of some of the recommendations.
Subjects
Data Curation; Cryoelectron Microscopy/methods
ABSTRACT
Cryogenic electron microscopy (cryo-EM) has recently been established as a powerful technique for solving macromolecular structures. Although the best achievable resolutions are improving, a significant majority of data are still resolved at resolutions worse than 3 Å, where building or fitting atomic models is non-trivial. The map reconstructions, and the atomic models derived from them, are also prone to errors accumulated through the different stages of data processing. Here, we highlight the need to evaluate both model geometry and fit to data at different resolutions. Assessment of cryo-EM structures from SARS-CoV-2 reveals a bias towards optimising model geometry to agree with the most common conformations, relative to optimising agreement with the data. We present the CoVal web service, which provides multiple validation metrics reflecting the quality of atomic models derived from cryo-EM data of SARS-CoV-2 structures. We demonstrate that further refinement can improve agreement with the data without loss of geometric quality. We also discuss recent CCP-EM developments aimed at addressing some of the current shortcomings.
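Fit to data is often summarised as a real-space correlation between a map simulated from the atomic model and the experimental map. The minimal sketch below computes a plain correlation coefficient between two NumPy arrays sampled on the same grid, with an optional boolean mask. It assumes the model map has already been simulated at a suitable resolution. The function name `cross_correlation` is hypothetical and is not part of CoVal or CCP-EM.

```python
import numpy as np


def cross_correlation(model_map: np.ndarray, exp_map: np.ndarray, mask=None) -> float:
    """Correlation coefficient between two maps sampled on the same grid.

    Hypothetical helper: `model_map` is assumed to be a density already simulated
    from the atomic model; `mask` (boolean array) optionally restricts the
    comparison, e.g. to voxels around the model.
    """
    if model_map.shape != exp_map.shape:
        raise ValueError("maps must be sampled on the same grid")
    x = model_map.astype(float)
    y = exp_map.astype(float)
    if mask is not None:
        x, y = x[mask], y[mask]
    # Subtract the means so the score ignores constant offsets between maps.
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0:
        raise ValueError("correlation is undefined for a constant map")
    return float((x * y).sum() / denom)
```

Because the score is insensitive to linear rescaling of either map, it measures agreement in shape rather than absolute density level. Masking around the model is a common way to avoid rewarding agreement in empty solvent.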
Subjects
COVID-19; SARS-CoV-2; Humans; Cryoelectron Microscopy/methods; Models, Molecular; Protein Conformation; Software
ABSTRACT
Recently, there has been a dramatic improvement in the quality and quantity of data derived using cryogenic electron microscopy (cryo-EM), accompanied by a large increase in the number of atomic models built. Although the best achievable resolutions are improving, local resolution is often variable, and a significant majority of data are still resolved at resolutions worse than 3 Å. Model building and refinement are often challenging at these resolutions, so atomic model validation becomes even more crucial for identifying less reliable regions of the model. Here, a graphical user interface for atomic model validation, implemented in the CCP-EM software suite, is presented. The aim is to develop this into a platform where users can access multiple complementary validation metrics that work across a range of resolutions and obtain a summary of the evaluations. Validation estimates for atomic models associated with cryo-EM structures from SARS-CoV-2 show that the models typically favor adopting the most common conformations over agreeing with the observed data. At low resolutions, stereochemical quality may be favored over data fit, but care should be taken to ensure that the model agrees with the data in terms of resolvable features. It is demonstrated that re-refinement can improve agreement with the data without loss of geometric quality. This also highlights the need for improved resolution-dependent weight optimization in model refinement and for an effective test for overfitting to help guide the refinement process.
Subjects
Cryoelectron Microscopy/methods; Software Validation; Software; COVID-19; Image Processing, Computer-Assisted; Models, Molecular; Reproducibility of Results; User-Computer Interface
ABSTRACT
In crystallography, the phase problem can often be addressed by the careful preparation of molecular-replacement search models. This has led to the development of pipelines such as MrBUMP that can automatically identify homologous proteins from an input sequence and edit them to focus on the areas that are most conserved. Many of these approaches can be applied directly to cryo-EM to help discover, prepare and correctly place models (here called cryo-EM search models) into electrostatic potential maps. This can significantly reduce the amount of manual model building that is required for structure determination. Here, MrBUMP is repurposed to fit automatically obtained PDB-derived chains and domains into cryo-EM maps. MrBUMP was successfully able to identify and place cryo-EM search models across a range of resolutions. Methods such as map segmentation are also explored as potential routes to improved performance. Map segmentation was also found to improve the effectiveness of the pipeline for higher resolution (<8 Å) data sets.
Subjects
Cryoelectron Microscopy/methods; Proteins/ultrastructure; Software; Animals; Databases, Protein; Humans; Models, Molecular; Protein Conformation; Protein Domains; Proteins/chemistry
ABSTRACT
It is shown that the tendency of an archetypal antimicrobial peptide to insert into and perforate a simple lipid bilayer is strongly modulated by tensile stress in the membrane. The results, obtained through molecular dynamics simulations, have been demonstrated with several lipid compositions and appear to be general, although quantitative details differ. The findings imply that the potency of antimicrobial peptides may not be a purely intrinsic chemical property and, instead, depends on the mechanical state of the target membrane.
Subjects
Antimicrobial Cationic Peptides/chemistry; Lipid Bilayers/chemistry; Models, Chemical; Antimicrobial Cationic Peptides/metabolism; Computer Simulation; Lipid Bilayers/metabolism; Phosphatidylcholines/chemistry; Tensile Strength
ABSTRACT
BACKGROUND: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. RESULTS: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed 'histosketch' that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a 'real life' example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. CONCLUSIONS: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. 
We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space. ( https://github.com/will-rowe/hulk ).
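As a point of reference, the sketch below computes the exact quantity that a histogram sketch of this kind is, we assume, designed to approximate: a weighted Jaccard similarity between two k-mer count spectra (sum of per-k-mer minimum counts over sum of per-k-mer maximum counts). It does not implement histosketching or HULK itself. The function names are hypothetical, and reverse-complement canonicalisation is deliberately omitted for brevity.

```python
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every length-k substring of `seq` (no reverse-complement canonicalisation)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Exact weighted Jaccard similarity of two count spectra.

    A missing k-mer counts as 0, because Counter returns 0 for absent keys.
    """
    keys = a.keys() | b.keys()
    num = sum(min(a[x], b[x]) for x in keys)
    den = sum(max(a[x], b[x]) for x in keys)
    if den == 0:
        raise ValueError("similarity is undefined for two empty spectra")
    return num / den
```

For example, the 3-mer spectra of `"ACGTACGT"` and `"ACGTTT"` share only `ACG` and `CGT`, giving a similarity of 2/8 = 0.25. A sketch-based method trades this exact, memory-hungry computation for a fixed-size summary per sample.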
Subjects
Bacteria/classification; Gastrointestinal Microbiome; Metagenomics/methods; Anti-Bacterial Agents/therapeutic use; Bacterial Infections/drug therapy; Cohort Studies; Humans; Infant, Newborn; Infant, Premature; Machine Learning; Sequence Analysis, DNA; Software
ABSTRACT
Motivation: Antimicrobial resistance (AMR) remains a major threat to global health. Profiling the collective AMR genes within a metagenome (the 'resistome') facilitates greater understanding of AMR gene diversity and dynamics. In turn, this can allow for gene surveillance, individualized treatment of bacterial infections and more sustainable use of antimicrobials. However, resistome profiling can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed an efficient and accurate method for resistome profiling that addresses these complications and improves upon currently available tools. Results: Our method combines a variation graph representation of gene sets with a locality-sensitive hashing Forest indexing scheme to allow for fast classification of metagenomic sequence reads using similarity-search queries. Subsequent hierarchical local alignment of classified reads against graph traversals enables accurate reconstruction of full-length gene sequences using a scoring scheme. We provide our implementation, graphing Resistance Out Of meTagenomes (GROOT), and show it to be both faster and more accurate than a current reference-dependent tool for resistome profiling. GROOT runs on a laptop and can process a typical 2 gigabyte metagenome in 2 min using a single CPU. Our method is not restricted to resistome profiling and has the potential to improve current metagenomic workflows. Availability and implementation: GROOT is written in Go and is available at https://github.com/will-rowe/groot (MIT license). Supplementary information: Supplementary data are available at Bioinformatics online.
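The general idea of classifying reads by similarity search against an index of reference genes can be illustrated with MinHash signatures of k-mer sets and a banded locality-sensitive hashing table. This is an assumption-laden sketch, not GROOT's code. GROOT's LSH Forest and variation-graph traversals are replaced here by a plain banded index over whole sequences, and all names and parameters (k, number of hashes, bands, rows) are hypothetical.

```python
import hashlib
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(item: str, seed: int) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(items: set, num_hashes: int) -> tuple:
    """Signature: the minimum hash value of the (non-empty) set under each seeded hash."""
    return tuple(min(_hash(x, s) for x in items) for s in range(num_hashes))


def jaccard_estimate(sig_a: tuple, sig_b: tuple) -> float:
    """The fraction of equal signature positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class BandedLSH:
    """Signatures that agree on every value within at least one band become candidates."""

    def __init__(self, bands: int, rows: int):
        self.bands, self.rows = bands, rows
        self.tables = [defaultdict(set) for _ in range(bands)]

    def _band_keys(self, sig: tuple):
        if len(sig) != self.bands * self.rows:
            raise ValueError("signature length must equal bands * rows")
        for b in range(self.bands):
            yield b, sig[b * self.rows:(b + 1) * self.rows]

    def add(self, name: str, sig: tuple) -> None:
        for b, key in self._band_keys(sig):
            self.tables[b][key].add(name)

    def query(self, sig: tuple) -> set:
        hits = set()
        for b, key in self._band_keys(sig):
            hits |= self.tables[b].get(key, set())
        return hits
```

In use, each reference gene would be added with `index.add(name, minhash(kmers(gene, k), n))`, and each read queried the same way. Candidates returned by `query` would then go on to alignment. Banding makes the candidate probability rise steeply with similarity, so reads very similar to a gene are very likely, though not guaranteed, to retrieve it.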
Subjects
Metagenomics; Bacterial Infections; Humans; Metagenome
ABSTRACT
Increasing sophistication in molecular-replacement (MR) software and the rapid expansion of the PDB in recent years have allowed the technique to become the dominant method for determining the phases of a target structure in macromolecular X-ray crystallography. In addition, improvements in bioinformatic techniques for finding suitable homologous structures for use as MR search models, combined with developments in refinement and model-building techniques, have pushed the applicability of MR to lower sequence identities and made weak MR solutions more amenable to refinement and improvement. MrBUMP is a CCP4 pipeline which automates all stages of the MR procedure. Its scope covers everything from the sourcing and preparation of suitable search models right through to rebuilding of the positioned search model. Recent improvements to the pipeline include the adoption of more sensitive bioinformatic tools for sourcing search models, enhanced model-preparation techniques including better ensembling of homologues, and the use of phase improvement and model building on the resulting solution. The pipeline has also been deployed as an online service through CCP4 online, which allows its users to exploit large bioinformatic databases and coarse-grained parallelism to speed up the determination of a possible solution. Finally, the molecular-graphics application CCP4mg has been combined with MrBUMP to provide an interactive visual aid to the user during the process of selecting and manipulating search models for use in MR. Here, these developments in MrBUMP are described with a case study to explore how some of the enhancements to the pipeline and to CCP4mg can help to solve a difficult case.
Subjects
Computer Graphics; Protein Conformation; Proteins/analysis; Proteins/chemistry; Software Design; Computer Simulation; Crystallography, X-Ray; Humans; Models, Molecular
ABSTRACT
Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Although routine in many cases, it becomes more effortful and often impossible when the available experimental structures typically used as search models are only distantly homologous to the target. Nevertheless, with current powerful MR software, relatively small core structures shared between the target and known structure, of 20-40% of the overall structure for example, can succeed as search models where they can be isolated. Manual sculpting of such small structural cores is rarely attempted and is dependent on the crystallographer's expertise and understanding of the protein family in question. Automated search-model editing has previously been performed on the basis of sequence alignment, in order to eliminate, for example, side chains or loops that are not present in the target, or on the basis of structural features (e.g. solvent accessibility) or crystallographic parameters (e.g. B factors). Here, based on recent work demonstrating a correlation between evolutionary conservation and protein rigidity/packing, novel automated ways to derive edited search models from a given distant homologue over a range of sizes are presented. A variety of structure-based metrics, many readily obtained from online webservers, can be fed to the MR pipeline AMPLE to produce search models that succeed with a set of test cases where expertly manually edited comparators, further processed in diverse ways with MrBUMP, fail. Further significant performance gains result when the structure-based distance geometry method CONCOORD is used to generate ensembles from the distant homologue. To our knowledge, this is the first such approach whereby a single structure is meaningfully transformed into an ensemble for the purposes of MR. Additional cases further demonstrate the advantages of the approach. 
CONCOORD is freely available and computationally inexpensive, so these novel methods offer readily available new routes to solve difficult MR cases.
Subjects
Protein Conformation; Proteins/analysis; Proteins/chemistry; Software; Computer Simulation; Crystallography, X-Ray; Humans; Models, Molecular
ABSTRACT
BACKGROUND: De novo transcriptome assembly is an important technique for understanding gene expression in non-model organisms. Many de novo assemblers using the de Bruijn graph of a set of the RNA sequences rely on in-memory representation of this graph. However, current methods analyse the complete set of read-derived k-mer sequence at once, resulting in the need for computer hardware with large shared memory. RESULTS: We introduce a novel approach that clusters k-mers as the first step. The clusters correspond to small sets of gene products, which can be processed quickly to give candidate transcripts. We implement the clustering step using the MapReduce approach for parallelising the analysis of large datasets, which enables the use of compute clusters. The computational task is distributed across the compute system using the industry-standard MPI protocol, and no specialised hardware is required. Using this approach, we have re-implemented the Inchworm module from the widely used Trinity pipeline, and tested the method in the context of the full Trinity pipeline. Validation tests on a range of real datasets show large reductions in the runtime and per-node memory requirements, when making use of a compute cluster. CONCLUSIONS: Our study shows that MapReduce-based clustering has great potential for distributing challenging sequencing problems, without loss of accuracy. Although we have focussed on the Trinity package, we propose that such clustering is a useful initial step for other assembly pipelines.
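One way to picture a k-mer clustering step is as finding connected components of the de Bruijn graph, where each k-mer joins its (k-1)-mer prefix and its (k-1)-mer suffix. The single-process sketch below mimics three stages:

- a map step that emits each k-mer under both (k-1)-mer ends;
- a shuffle that groups the emitted k-mers by key;
- a reduce step that merges everything sharing a key.

A union-find structure stands in for the iterative merging that a distributed MPI/MapReduce implementation would need. This is an illustration under these assumptions, not the authors' re-implementation of Inchworm, and all function names are hypothetical.

```python
from collections import defaultdict


def kmers_of(seq: str, k: int) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find(parent: dict, x: str) -> str:
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: dict, a: str, b: str) -> None:
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def map_kmer(kmer: str):
    """Map step: emit the k-mer under its (k-1)-mer prefix and its (k-1)-mer suffix."""
    yield kmer[:-1], kmer
    yield kmer[1:], kmer


def reduce_group(values: list, parent: dict) -> None:
    """Reduce step: every k-mer touching this (k-1)-mer joins the same cluster."""
    first, *rest = values
    for other in rest:
        union(parent, first, other)


def cluster_kmers(kmers) -> list:
    kmers = set(kmers)
    parent = {k: k for k in kmers}
    groups = defaultdict(list)  # "shuffle": collect mapped values by key
    for kmer in kmers:
        for key, value in map_kmer(kmer):
            groups[key].append(value)
    for values in groups.values():
        reduce_group(values, parent)
    clusters = defaultdict(set)
    for k in kmers:
        clusters[find(parent, k)].add(k)
    return list(clusters.values())
```

Each resulting cluster is independent of the others, which is what would allow candidate transcripts to be assembled per cluster on separate nodes.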
Subjects
Algorithms; Cluster Analysis; High-Throughput Nucleotide Sequencing; RNA/chemistry; RNA/genetics; Sequence Analysis, RNA; Transcriptome
ABSTRACT
Epidermal growth factor receptor (EGFR) signalling is activated by ligand-induced receptor dimerization. Notably, ligand binding also induces EGFR oligomerization, but the structures and functions of the oligomers are poorly understood. Here, we use fluorophore localization imaging with photobleaching to probe the structure of EGFR oligomers. We find that at physiological epidermal growth factor (EGF) concentrations, EGFR assembles into oligomers, as indicated by pairwise distances of receptor-bound fluorophore-conjugated EGF ligands. The pairwise ligand distances correspond well with the predictions of our structural model of the oligomers constructed from molecular dynamics simulations. The model suggests that oligomerization is mediated extracellularly by unoccupied ligand-binding sites and that oligomerization organizes kinase-active dimers in ways optimal for auto-phosphorylation in trans between neighbouring dimers. We argue that ligand-induced oligomerization is essential to the regulation of EGFR signalling.
Subjects
ErbB Receptors/chemistry; ErbB Receptors/metabolism; Animals; Artifacts; Binding Sites; CHO Cells; Cricetinae; Cricetulus; Epidermal Growth Factor/metabolism; Fluorescence Resonance Energy Transfer; Ligands; Molecular Dynamics Simulation; Phosphorylation; Protein Domains; Protein Multimerization; Signal Transduction
ABSTRACT
For many protein families, the deluge of new sequence information together with new statistical protocols now allows the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were overcome in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
ABSTRACT
Coiled-coil protein folds are among the most abundant in nature. These folds consist of long α-helices wound around each other and are architecturally simple, but paradoxically their crystallographic structures are notoriously difficult to solve with molecular-replacement techniques. The program AMPLE can solve crystal structures by molecular replacement using ab initio search models in the absence of an existing homologous protein structure. AMPLE has been benchmarked on a large and diverse test set of coiled-coil crystal structures and has been found to solve 80% of all cases. Successes included structures with chain lengths of up to 253 residues and resolutions down to 2.9 Å, considerably extending the limits on size and resolution that are typically tractable by ab initio methodologies. The structures of two macromolecular complexes, one including DNA, were also successfully solved using their coiled-coil components. It is demonstrated that both the ab initio modelling and the use of ensemble search models contribute to the success of AMPLE by comparison with phasing attempts using single structures or ideal polyalanine helices. These successes suggest that molecular replacement with AMPLE should be the method of choice for the crystallographic elucidation of a coiled-coil structure. Furthermore, AMPLE may be able to exploit the presence of a coiled coil in a complex to provide a convenient route for phasing.
ABSTRACT
AMPLE clusters and truncates ab initio protein structure predictions, producing search models for molecular replacement. Here, an interesting degree of complementarity is shown between targets solved using the different ab initio modelling programs QUARK and ROSETTA. Search models derived from either program collectively solve almost all of the all-helical targets in the test set. Initial solutions produced by Phaser after only 5 min perform surprisingly well, improving the prospects for in situ structure solution by AMPLE during synchrotron visits. Taken together, the results show the potential for AMPLE to run more quickly and successfully solve more targets than previously suspected.
Subjects
Proteins/chemistry; Software; Protein Conformation; Time Factors
ABSTRACT
Echinomycin is a nonribosomal depsipeptide natural product with a range of interesting bioactivities that make it an important target for drug discovery and development. It contains a thioacetal bridge, a unique chemical motif derived from the disulfide bond of its precursor antibiotic triostin A by the action of an S-adenosyl-L-methionine-dependent methyltransferase, Ecm18. The crystal structure of Ecm18 in complex with its reaction products S-adenosyl-L-homocysteine and echinomycin was determined at 1.50 Å resolution. Phasing was achieved using a new molecular replacement package called AMPLE, which automatically derives search models from structure predictions based on ab initio protein modelling. Structural analysis indicates that a combination of proximity effects, medium effects, and catalysis by strain drives the unique transformation of the disulfide bond into the thioacetal linkage.
Subjects
Disulfides/chemistry; Echinomycin/biosynthesis; Catalysis; Crystallography, X-Ray; Echinomycin/chemistry; Homocysteine/biosynthesis; Homocysteine/chemistry; Hydrogen Bonding; Methionine/chemistry; Methionine/metabolism; Methyltransferases/metabolism; Protein Structure, Tertiary; Quinoxalines/chemistry
ABSTRACT
AMPLE is a program developed for clustering and truncating ab initio protein structure predictions into search models for molecular replacement. Here, it is shown that its core cluster-and-truncate methods also work well for processing NMR ensembles into search models. Rosetta remodelling helps to extend success to NMR structures bearing low sequence identity or high structural divergence from the target protein. Potential future routes to improved performance are considered and practical, general guidelines on using AMPLE are provided.
Subjects
Amino Acid Substitution; Bacterial Outer Membrane Proteins/chemistry; Multigene Family; Nuclear Magnetic Resonance, Biomolecular/methods; Software; Thioredoxins/chemistry; Amino Acid Substitution/genetics; Bacterial Outer Membrane Proteins/genetics; Crystallography, X-Ray/methods; Forecasting; Models, Molecular; Protein Folding; Software/standards; Streptomyces coelicolor/chemistry; Streptomyces coelicolor/genetics; Thioredoxins/genetics
ABSTRACT
The ectodomain of the human epidermal growth factor receptor (hEGFR) controls input to several cell signalling networks via binding with extracellular growth factors. To gain insight into the dynamics and ligand binding of the ectodomain, the hEGFR monomer was subjected to molecular dynamics simulation. The monomer was found to be substantially more flexible than the ectodomain dimer studied previously. Simulations where the endogenous ligand EGF binds to either Subdomain I or Subdomain III, or where hEGFR is unbound, show significant differences in dynamics. The molecular mechanics Poisson-Boltzmann surface area method has been used to derive relative free energies of ligand binding, and we find that the ligand is capable of binding either subdomain with a slight preference for III. Alanine-scanning calculations for the effect of selected ligand mutants on binding reproduce the trends of affinity measurements. Taken together, these results emphasize the possible role of the ectodomain monomer in the initial step of ligand binding, and add details to the static picture obtained from crystal structures.
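For reference, the textbook form of the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) estimate averages each free-energy term over simulation snapshots. The abstract does not state which terms or trajectories were used, so only the generic form is shown here:

$$
\Delta G_{\mathrm{bind}} = \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS
$$

$$
\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mutant}} - \Delta G_{\mathrm{bind}}^{\mathrm{wild\ type}}
$$

The terms are defined as follows:

- $E_{\mathrm{MM}}$ is the molecular mechanics energy.
- $G_{\mathrm{PB}}$ is the polar solvation energy from the Poisson-Boltzmann equation.
- $G_{\mathrm{SA}}$ is a nonpolar term proportional to the solvent-accessible surface area.
- $-TS$ is an entropic term that is often neglected when only relative values are compared.

In computational alanine scanning, a positive $\Delta\Delta G_{\mathrm{bind}}$ means the mutation weakens binding.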
Subjects
Epidermal Growth Factor/metabolism; ErbB Receptors/chemistry; ErbB Receptors/metabolism; Humans; Ligands; Protein Binding; Protein Conformation
ABSTRACT
Protein ab initio models predicted from sequence data alone can enable the elucidation of crystal structures by molecular replacement. However, the calculation of such ab initio models is typically computationally expensive. Here, a computational pipeline based on the clustering and truncation of cheaply obtained ab initio models for the preparation of structure ensembles is described. Clustering is used to select models and to quantitatively predict their local accuracy, allowing rational truncation of predicted inaccurate regions. The resulting ensembles, with or without rapidly added side chains, solved 43% of all test cases, with an 80% success rate for all-α proteins. A program implementing this approach, AMPLE, is included in the CCP4 suite of programs. It only requires the input of a FASTA sequence file and a diffraction data file. It carries out the modelling using locally installed Rosetta, creates search ensembles and automatically performs molecular replacement and model rebuilding.
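The truncation idea can be sketched with NumPy under two assumptions:

- the models in one cluster are already superposed;
- the inter-model positional variance of each residue serves as the proxy for local accuracy.

Truncation then keeps progressively smaller fractions of the least variable residues. The percentage levels and function names are illustrative assumptions. AMPLE's actual clustering, superposition and truncation schedule are not reproduced here.

```python
import numpy as np


def per_residue_variance(coords: np.ndarray) -> np.ndarray:
    """coords: (n_models, n_residues, 3) array of already-superposed C-alpha positions.

    Returns, for each residue, the mean squared distance of its positions
    across models from their average position.
    """
    mean = coords.mean(axis=0)                               # (n_residues, 3)
    return ((coords - mean) ** 2).sum(axis=-1).mean(axis=0)  # (n_residues,)


def truncation_levels(variance: np.ndarray, percents=(100, 75, 50, 25)) -> dict:
    """Map each percentage to the sorted indices of that fraction of residues
    with the lowest variance (at least one residue is always kept)."""
    order = np.argsort(variance, kind="stable")
    n = len(variance)
    return {p: np.sort(order[: max(1, round(n * p / 100))]) for p in percents}
```

Each index set would define one truncated search-model ensemble. Trying several levels hedges against not knowing in advance how much of the prediction is accurate enough for molecular replacement.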
Subjects
Proteins/chemistry; Crystallography, X-Ray; Models, Molecular; Protein Conformation
ABSTRACT
The CCP4 (Collaborative Computational Project, Number 4) software suite is a collection of programs and associated data and software libraries which can be used for macromolecular structure determination by X-ray crystallography. The suite is designed to be flexible, allowing users a number of methods of achieving their aims. The programs are from a wide variety of sources but are connected by a common infrastructure provided by standard file formats, data objects and graphical interfaces. Structure solution by macromolecular crystallography is becoming increasingly automated and the CCP4 suite includes several automation pipelines. After giving a brief description of the evolution of CCP4 over the last 30 years, an overview of the current suite is given. While detailed descriptions are given in the accompanying articles, here it is shown how the individual programs contribute to a complete software package.