Abstract
The interpretation of cryo-EM maps often includes the docking of known or predicted structures of the components, which is particularly useful when the map resolution is worse than 4 Å. Although it can be effective to search the entire map to find the best placement of a component, the process can be slow when the maps are large. However, frequently there is a well-founded hypothesis about where particular components are located. In such cases, a local search using a map subvolume will be much faster because the search volume is smaller, and more sensitive because optimizing the search volume for the rotation-search step enhances the signal to noise. A Fourier-space likelihood-based local search approach, based on the previously published em_placement software, has been implemented in the new emplace_local program. Tests confirm that the local search approach enhances the speed and sensitivity of the computations. An interactive graphical interface in the ChimeraX molecular-graphics program provides a convenient way to set up and evaluate docking calculations, particularly in defining the part of the map into which the components should be placed.
Subjects
Cryoelectron Microscopy; Molecular Docking Simulation; Software; Cryoelectron Microscopy/methods; Molecular Docking Simulation/methods; Protein Conformation
Abstract
Advances in computational tools for atomic model building are leading to accurate models of large molecular assemblies seen in electron microscopy, often at challenging resolutions of 3-4 Å. We describe new methods in the UCSF ChimeraX molecular modeling package that take advantage of machine-learning structure predictions, provide likelihood-based fitting in maps, and compute per-residue scores to identify modeling errors. Additional model-building tools assist analysis of mutations, post-translational modifications, and interactions with ligands. We present the latest ChimeraX model-building capabilities, including several community-developed extensions. ChimeraX is available free of charge for noncommercial use at https://www.rbvi.ucsf.edu/chimerax.
Subjects
Software; Cryoelectron Microscopy/methods; Likelihood Functions; Models, Molecular; Microscopy, Electron; Protein Conformation
Abstract
MOTIVATION: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KGs present a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by Python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
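As a rough illustration of what "typed nodes connected by typed edges" means in practice, the following minimal Python sketch models a toy knowledge graph and a one-hop neighborhood query. It is not SPOKE's implementation or API: the `TypedGraph` class, the identifiers, and the node and edge type names (`Compound`, `BINDS`, `TREATS`) are hypothetical placeholders chosen only for the example.

```python
from collections import defaultdict


class TypedGraph:
    """Toy knowledge graph with typed nodes and typed, directed edges (hypothetical)."""

    def __init__(self):
        self.node_type = {}             # node id -> node type
        self.edges = defaultdict(list)  # source id -> [(edge type, target id), ...]

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, edge_type, target):
        # Refuse edges whose endpoints were never declared, a simple integrity check.
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((edge_type, target))

    def neighborhood(self, node_id, edge_type=None):
        """Return targets one hop from node_id, optionally filtered by edge type."""
        return [
            target
            for (etype, target) in self.edges.get(node_id, [])
            if edge_type is None or etype == edge_type
        ]


# Illustrative placeholder data, not taken from SPOKE.
kg = TypedGraph()
kg.add_node("compound:aspirin", "Compound")
kg.add_node("gene:PTGS2", "Gene")
kg.add_node("disease:pain", "Disease")
kg.add_edge("compound:aspirin", "BINDS", "gene:PTGS2")
kg.add_edge("compound:aspirin", "TREATS", "disease:pain")

print(kg.neighborhood("compound:aspirin"))
print(kg.neighborhood("compound:aspirin", edge_type="TREATS"))
```

Filtering by edge type is what lets a neighborhood query stay meaningful when many relationship types meet at the same node.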
Subjects
Pattern Recognition, Automated; Precision Medicine; Databases, Factual
Abstract
Single-cell RNA-sequencing (scRNA-seq) has revolutionized molecular biology and medicine by enabling high-throughput studies of cellular heterogeneity in diverse tissues. Applying network biology approaches to scRNA-seq data can provide useful insights into genes driving heterogeneous cell-type compositions of tissues. Here, we present scNetViz, a Cytoscape app to aid biological interpretation of cell clusters in scRNA-seq data using network analysis. scNetViz calculates the differential expression of each gene across clusters and then creates a cluster-specific gene functional interaction network between the significantly differentially expressed genes for further analysis, such as pathway enrichment analysis. To automate a complete data analysis workflow, scNetViz integrates parts of the Scanpy software, which is a popular Python package for scRNA-seq data analysis, with Cytoscape apps such as stringApp, cyPlot, and enhancedGraphics. We describe our implementation of methods for accessing data from public single cell atlas projects, differential expression analysis, visualization, and automation. scNetViz enables users to analyze data from public atlases or their own experiments, which we illustrate with two use cases. Analysis can be performed via the Cytoscape GUI or CyREST programming interface using R (RCy3) or Python (py4cytoscape).
Subjects
Gene Regulatory Networks; Software; Automation; Data Analysis; Workflow
Abstract
UCSF ChimeraX is the next-generation interactive visualization program from the Resource for Biocomputing, Visualization, and Informatics (RBVI), following UCSF Chimera. ChimeraX brings (a) significant performance and graphics enhancements; (b) new implementations of Chimera's most highly used tools, many with further improvements; (c) several entirely new analysis features; (d) support for new areas such as virtual reality, light-sheet microscopy, and medical imaging data; (e) major ease-of-use advances, including toolbars with icons to perform actions with a single click, basic "undo" capabilities, and more logical and consistent commands; and (f) an app store for researchers to contribute new tools. ChimeraX includes full user documentation and is free for noncommercial use, with downloads available for Windows, Linux, and macOS from https://www.rbvi.ucsf.edu/chimerax.
Subjects
Computer Graphics; Imaging, Three-Dimensional; Models, Molecular; Software
Abstract
Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures. Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells. Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results. Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. 
We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
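The gap between the ROC and precision-recall results reported above is easier to see with a small worked example. The sketch below is a minimal pure-Python illustration, not the benchmarking code from the linked repository: `roc_auc` uses the pairwise (Mann-Whitney) formulation, and `average_precision` is one common estimator of the area under the precision-recall curve, which may differ from the estimator the authors used. The scores and labels are made up.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest score (tie order is arbitrary here).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical enrichment scores for one cell type signature across six clusters;
# label 1 marks clusters that truly belong to that cell type.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]

print(round(roc_auc(scores, labels), 3))            # 0.875
print(round(average_precision(scores, labels), 3))  # 0.833
```

Because true cell type assignments are rare relative to all cluster and signature pairs, a single highly ranked false positive lowers precision far more than it lowers the ROC AUC, which is one plausible reason the two metrics can diverge.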
Subjects
Leukocytes, Mononuclear; Algorithms; Animals; Gene Expression Profiling; Humans; Mice; RNA; Reproducibility of Results; Single-Cell Analysis
Abstract
The radical SAM superfamily contains over 100,000 homologous enzymes that catalyze a remarkably broad range of reactions required for life, including metabolism, nucleic acid modification, and biogenesis of cofactors. While the highly conserved SAM-binding motif responsible for formation of the key 5'-deoxyadenosyl radical intermediate is a key structural feature that simplifies identification of superfamily members, our understanding of their structure-function relationships is complicated by the modular nature of their structures, which exhibit varied and complex domain architectures. To gain new insight about these relationships, we classified the entire set of sequences into similarity-based subgroups that could be visualized using sequence similarity networks. This superfamily-wide analysis reveals important features that had not previously been appreciated from studies focused on one or a few members. Functional information mapped to the networks indicates which members have been experimentally or structurally characterized, their known reaction types, and their phylogenetic distribution. Despite the biological importance of radical SAM chemistry, the vast majority of superfamily members have never been experimentally characterized in any way, suggesting that many new reactions remain to be discovered. In addition to 20 subgroups with at least one known function, we identified additional subgroups made up entirely of sequences of unknown function. Importantly, our results indicate that even general reaction types fail to track well with our sequence similarity-based subgroupings, raising major challenges for function prediction for currently identified and new members that continue to be discovered. Interactive similarity networks and other data from this analysis are available from the Structure-Function Linkage Database.
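The idea behind a sequence similarity network can be sketched in a few lines: sequences are nodes, an edge is drawn when a pairwise similarity score passes a threshold, and the resulting connected components suggest candidate subgroups. The Python below is only an illustration under stated assumptions. The function name, the sequence labels, and the scores are hypothetical, and the scores are treated as "higher means more similar", whereas the abstract does not say which similarity measure or cutoff the authors used.

```python
def similarity_clusters(nodes, pair_scores, threshold):
    """Group nodes into connected components of the thresholded similarity network."""
    adjacency = {node: set() for node in nodes}
    for (a, b), score in pair_scores.items():
        if score >= threshold:  # keep only edges at or above the similarity cutoff
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects every node reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairwise similarity scores between four sequences.
nodes = ["seqA", "seqB", "seqC", "seqD"]
pair_scores = {
    ("seqA", "seqB"): 80,
    ("seqB", "seqC"): 45,
    ("seqC", "seqD"): 10,
}

print(similarity_clusters(nodes, pair_scores, threshold=40))
print(similarity_clusters(nodes, pair_scores, threshold=60))
```

Raising the threshold splits the network into smaller, more closely related groups, which is why the choice of cutoff shapes the subgroups that emerge.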
Subjects
Enzymes/classification; Free Radicals/metabolism; Protein Domains/genetics; S-Adenosylmethionine/metabolism; Amino Acid Sequence/genetics; Computational Biology; Enzymes/chemistry; Enzymes/genetics; Enzymes/metabolism; Evolution, Molecular; Free Radicals/chemistry; Phylogeny; S-Adenosylmethionine/chemistry; Sequence Alignment; Structure-Activity Relationship
Abstract
Can virtual reality be useful for visualizing and analyzing molecular structures and three-dimensional (3D) microscopy? Uses we are exploring include studies of drug binding to proteins and the effects of mutations, building accurate atomic models in electron microscopy and x-ray density maps, understanding how immune system cells move using 3D light microscopy, and teaching schoolchildren about biomolecules that are the machinery of life. Virtual reality (VR) offers immersive display with a wide field of view and head tracking for better perception of molecular architectures and uses 6-degree-of-freedom hand controllers for simple manipulation of 3D data. Conventional computer displays with trackpad, mouse and keyboard excel at two-dimensional tasks such as writing and studying research literature, uses for which VR technology is at present far inferior. Adding VR to the conventional computing environment could improve 3D capabilities if new user-interface problems can be solved. We have developed three VR applications: ChimeraX for analyzing molecular structures and electron and light microscopy data, AltPDB for collaborative discussions around atomic models, and Molecular Zoo for teaching young students characteristics of biomolecules. Investigations over three decades have produced an extensive literature evaluating the potential of VR in research and education. Consumer VR headsets are now affordable to researchers and educators, allowing direct tests of whether the technology is valuable in these areas. We survey here advantages and disadvantages of VR for molecular biology in the context of affordable and dramatically more powerful VR and graphics hardware than has been available in the past.
Subjects
Models, Molecular; Molecular Conformation; Software; Animals; Computer Simulation; Humans; Imaging, Three-Dimensional; Proteins/chemistry; User-Computer Interface
Abstract
UCSF ChimeraX is next-generation software for the visualization and analysis of molecular structures, density maps, 3D microscopy, and associated data. It addresses challenges in the size, scope, and disparate types of data attendant with cutting-edge experimental methods, while providing advanced options for high-quality rendering (interactive ambient occlusion, reliable molecular surface calculations, etc.) and professional approaches to software design and distribution. This article highlights some specific advances in the areas of visualization and usability, performance, and extensibility. ChimeraX is free for noncommercial use and is available from http://www.rbvi.ucsf.edu/chimerax/ for Windows, Mac, and Linux.
Subjects
Imaging, Three-Dimensional; Software; Molecular Structure
Abstract
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. Database URL: http://sfld.rbvi.ucsf.edu/.
Subjects
Databases, Protein; Enzymes/chemistry; Enzymes/genetics; Gene Ontology; Molecular Sequence Annotation; Structural Homology, Protein; Structure-Activity Relationship
Abstract
Integrating access to web services with desktop applications allows for an expanded set of application features, including performing computationally intensive tasks and convenient searches of databases. We describe how we have enhanced UCSF Chimera (http://www.rbvi.ucsf.edu/chimera/), a program for the interactive visualization and analysis of molecular structures and related data, through the addition of several web services (http://www.rbvi.ucsf.edu/chimera/docs/webservices.html). By streamlining access to web services, including the entire job submission, monitoring and retrieval process, Chimera makes it simpler for users to focus on their science projects rather than data manipulation. Chimera uses Opal, a toolkit for wrapping scientific applications as web services, to provide scalable and transparent access to several popular software packages. We illustrate Chimera's use of web services with an example workflow that interleaves use of these services with interactive manipulation of molecular sequences and structures, and we provide an example Python program to demonstrate how easily Opal-based web services can be accessed from within an application. Web server availability: http://webservices.rbvi.ucsf.edu/opal2/dashboard?command=serviceList.
Subjects
Molecular Structure; Software; Internet; Models, Molecular
Abstract
The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
Subjects
Databases, Protein; Enzymes/chemistry; Enzymes/classification; Enzymes/metabolism; Internet; Molecular Sequence Annotation; Sequence Alignment; Structure-Activity Relationship
Abstract
Structural modeling of macromolecular complexes greatly benefits from interactive visualization capabilities. Here we present the integration of several modeling tools into UCSF Chimera. These include comparative modeling by MODELLER, simultaneous fitting of multiple components into electron microscopy density maps by IMP MultiFit, computing of small-angle X-ray scattering profiles and fitting of the corresponding experimental profile by IMP FoXS, and assessment of amino acid sidechain conformations based on rotamer probabilities and local interactions by Chimera.
Subjects
Computer Simulation; Models, Molecular; Software; Amino Acid Sequence; Animals; Cattle; Escherichia coli Proteins/chemistry; Heat-Shock Proteins/chemistry; Macromolecular Substances/chemistry; Molecular Sequence Data; Protein Conformation; Protein Subunits/chemistry; Scattering, Small Angle; Structural Homology, Protein; X-Ray Diffraction
Abstract
In functionally diverse enzyme superfamilies (SFs), conserved structural and active site features reflect catalytic capabilities 'hard-wired' in each SF architecture. Overlaid on this foundation, evolutionary changes in active site machinery, structural topology and other aspects of structural organization and interactions support the emergence of new reactions, mechanisms, and substrate specificity. This review connects topological with functional variation in each of the haloalkanoic acid dehalogenase (HAD) and vicinal oxygen chelate fold (VOC) SFs and a set of redox-active thioredoxin (Trx)-fold SFs to illustrate a few of the varied themes nature has used to evolve new functions from a limited set of structural scaffolds.
Subjects
Enzymes/chemistry; Enzymes/metabolism; Evolution, Molecular; Animals; Catalytic Domain; Conserved Sequence; Humans; Hydrolases/chemistry; Hydrolases/metabolism; Models, Molecular; Protein Conformation; Substrate Specificity; Thioredoxins/chemistry
Abstract
ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains 10,355,444 reliable models for domains in 2,421,920 unique protein sequences. ModBase allows users to update comparative models on demand, and request modeling of additional sequences through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are available through the ModBase interface as well as the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the SALIGN server for multiple sequence and structure alignment (http://salilab.org/salign), the ModEval server for predicting the accuracy of protein structure models (http://salilab.org/modeval), the PCSS server for predicting which peptides bind to a given protein (http://salilab.org/pcss) and the FoXS server for calculating and fitting Small Angle X-ray Scattering profiles (http://salilab.org/foxs).
Subjects
Databases, Protein; Models, Molecular; Protein Structure, Tertiary; Bacterial Proteins/chemistry; Computer Graphics; Peptides/chemistry; Protein Interaction Mapping; Proteins/chemistry; Scattering, Small Angle; Sequence Alignment; Software; Structural Homology, Protein; User-Computer Interface; X-Ray Diffraction
Abstract
Linking proteomics and structural data is critical to our understanding of cellular processes, and interactive exploration of these complementary data sets can be extremely valuable for developing or confirming hypotheses in silico. However, few computational tools facilitate linking these types of data interactively. In addition, the tools that do exist are neither well understood nor widely used by the proteomics or structural biology communities. We briefly describe several relevant tools, and then, using three scenarios, we present in depth two tools for the integrated exploration of proteomics and structural data.
Subjects
Databases, Protein; Proteins/chemistry; Proteomics/methods; Animals; Humans; Models, Molecular; Mutant Proteins/chemistry; Protein Binding; Saccharomyces cerevisiae/enzymology; Software
Abstract
With an increasing interest in RNA therapeutics and for targeting RNA to treat disease, there is a need for the tools used in protein-based drug design, particularly DOCKing algorithms, to be extended or adapted for nucleic acids. Here, we have compiled a test set of RNA-ligand complexes to validate the ability of the DOCK suite of programs to successfully recreate experimentally determined binding poses. With the optimized parameters and a minimal scoring function, 70% of the test set with fewer than seven rotatable ligand bonds and 26% of the test set with fewer than 13 rotatable bonds can be successfully recreated within 2 Å heavy-atom RMSD. When DOCKed conformations are rescored with the implicit solvent models AMBER generalized Born with solvent-accessible surface area (GB/SA) and Poisson-Boltzmann with solvent-accessible surface area (PB/SA) in combination with explicit water molecules and sodium counterions, the success rate increases to 80% with PB/SA for fewer than seven rotatable bonds and 58% with AMBER GB/SA and 47% with PB/SA for fewer than 13 rotatable bonds. These results indicate that DOCK can indeed be useful for structure-based drug design aimed at RNA. Our studies also suggest that RNA-directed ligands often differ from typical protein-ligand complexes in their electrostatic properties, but these differences can be accommodated through the choice of potential function. In addition, in the course of the study, we explore a variety of newly added DOCK functions, demonstrating the ease with which new functions can be added to address new scientific questions.
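The success criterion used above, a docked pose within 2 Å heavy-atom RMSD of the experimental pose, can be sketched as follows. This is a minimal illustration rather than DOCK's own code. It assumes both poses list the same heavy atoms in the same order and share the receptor's coordinate frame, so no superposition is performed, and it ignores symmetry-equivalent atoms. The function names and coordinates are hypothetical.

```python
import math


def heavy_atom_rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))


def success_rate(pose_pairs, cutoff=2.0):
    """Fraction of (docked, experimental) pairs whose RMSD is at or below the cutoff in Å."""
    if not pose_pairs:
        raise ValueError("need at least one pose pair")
    hits = sum(1 for docked, ref in pose_pairs if heavy_atom_rmsd(docked, ref) <= cutoff)
    return hits / len(pose_pairs)


# Made-up two-atom ligands: one pose shifted by 1 Å, the other by 3 Å.
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
close_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
far_pose = [(0.0, 3.0, 0.0), (1.5, 3.0, 0.0)]

print(heavy_atom_rmsd(close_pose, experimental))
print(success_rate([(close_pose, experimental), (far_pose, experimental)]))
```

Skipping superposition is deliberate in this sketch: a docking pose is judged by where it sits in the binding site, not only by its internal conformation.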
Subjects
RNA/chemistry; Software; Algorithms; Binding Sites; Ligands; Models, Molecular; RNA/metabolism
Abstract
Structural and biochemical constraints force some segments of proteins to evolve more slowly than others, often allowing identification of conserved structural or sequence motifs that can be associated with substrate binding properties, chemical mechanisms, and molecular functions. We have assessed the functional and structural constraints imposed by cofactors on the evolution of new functions in a superfamily of flavoproteins characterized by two-dinucleotide binding domains, the "two dinucleotide binding domains" flavoproteins (tDBDF) superfamily. Although these enzymes catalyze many different types of oxidation/reduction reactions, each is initiated by a stereospecific hydride transfer reaction between two cofactors, a pyridine nucleotide and flavin adenine dinucleotide (FAD). Sequence and structural analysis of more than 1,600 members of the superfamily reveals new members and identifies details of the evolutionary connections among them. Our analysis shows that in all of the highly divergent families within the superfamily, these cofactors adopt a conserved configuration optimal for stereospecific hydride transfer that is stabilized by specific interactions with amino acids from several motifs distributed among both dinucleotide binding domains. The conservation of cofactor configuration in the active site restricts the pyridine nucleotide to interact with FAD from the re-side, limiting the flow of electrons from the re-side to the si-side. This directionality of electron flow constrains interactions with the different partner proteins of different families to occur on the same face of the cofactor binding domains. As a result, superimposing the structures of tDBDFs aligns not only these interacting proteins, but also their constituent electron acceptors, including heme and iron-sulfur clusters. 
Thus, not only are specific aspects of the cofactor-directed chemical mechanism conserved across the superfamily, the constraints they impose are manifested in the mode of protein-protein interactions. Overlaid on this foundation of conserved interactions, nature has conscripted different protein partners to serve as electron acceptors, thereby generating diversification of function across the superfamily.
Subjects
Biological Evolution; Coenzymes/chemistry; Conserved Sequence/physiology; Electron-Transferring Flavoproteins/chemistry; Electron-Transferring Flavoproteins/genetics; Protein Interaction Domains and Motifs; Allosteric Site/physiology; Catalytic Domain/physiology; Dinucleoside Phosphates/chemistry; Electron-Transferring Flavoproteins/metabolism; Flavin-Adenine Dinucleotide/chemistry; Protein Interaction Domains and Motifs/genetics; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Structure-Activity Relationship
Abstract
BACKGROUND: Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. RESULTS: The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. CONCLUSION: The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. 
UCSF Chimera is free for non-commercial use and is available for Microsoft Windows, Apple Mac OS X, Linux, and other platforms from http://www.cgl.ucsf.edu/chimera.