ABSTRACT
RNA polymerase II (RNAPII) lies at the core of dynamic control of gene expression. Using 53 RNAPII point mutants, we generated a point mutant epistatic miniarray profile (pE-MAP) comprising ~60,000 quantitative genetic interactions in Saccharomyces cerevisiae. This analysis enabled functional assignment of RNAPII subdomains and uncovered connections between individual regions and other protein complexes. Using splicing microarrays and mutants that alter elongation rates in vitro, we found an inverse relationship between RNAPII speed and in vivo splicing efficiency. Furthermore, the pE-MAP classified fast and slow mutants that favor upstream and downstream start site selection, respectively. The striking coordination of polymerization rate with transcription initiation and splicing suggests that transcription rate is tuned to regulate multiple gene expression steps. The pE-MAP approach provides a powerful strategy to understand other multifunctional machines at amino acid resolution.
Subject(s)
Epistasis, Genetic , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Alleles , Genome-Wide Association Study , Point Mutation , RNA Polymerase II/chemistry , RNA Splicing , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic , Transcriptome
ABSTRACT
Mass-spectrometry-based proteomics allows the quantification of thousands of proteins, protein variants, and their modifications, in many biological samples. These are derived from the measurement of peptide relative quantities, and it is not always possible to distinguish proteins with similar sequences due to the absence of protein-specific peptides. In such cases, peptide signals are reported in protein groups that can correspond to several genes. Here, we show that multi-gene protein groups have a limited impact on GO-term enrichment, but selecting only one gene per group affects network analysis. We thus present the Cytoscape app Proteo Visualizer (https://apps.cytoscape.org/apps/ProteoVisualizer) that is designed for retrieving protein interaction networks from STRING using protein groups as input and thus allows visualisation and network analysis of bottom-up MS-based proteomics data sets.
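The effect of collapsing a protein group to a single gene can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the gene names, the protein groups, and the interaction list are hypothetical toy values, not STRING data, and the code is not the Proteo Visualizer implementation.

```python
def edges_among(genes, known_edges):
    """Return the known interactions whose two endpoints are both in `genes`."""
    genes = set(genes)
    return {edge for edge in known_edges if edge[0] in genes and edge[1] in genes}


# Hypothetical protein groups: the first group cannot be resolved to one gene.
protein_groups = [["GENE_A1", "GENE_A2"], ["GENE_B"], ["GENE_C"]]

# Hypothetical interaction list standing in for a network database.
known_edges = {("GENE_A2", "GENE_B"), ("GENE_B", "GENE_C")}

# Strategy 1: keep only the first gene of every group.
first_only = [group[0] for group in protein_groups]

# Strategy 2: keep every member of every group.
all_members = [gene for group in protein_groups for gene in group]

print(len(edges_among(first_only, known_edges)))   # edges kept with one gene per group
print(len(edges_among(all_members, known_edges)))  # edges kept with all group members
```

In this toy case the interaction involving the second member of the ambiguous group is lost when only the first gene is kept, which is the kind of difference in network analysis the abstract refers to.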
ABSTRACT
MOTIVATION: Large language models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains such as biomedicine. Solutions such as pretraining and domain-specific fine-tuning add substantial computational overhead, requiring further domain-expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo, and GPT-4, to generate meaningful biomedical text rooted in established knowledge. RESULTS: Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising the accuracy, making a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion. 
AVAILABILITY AND IMPLEMENTATION: SPOKE KG can be accessed at https://spoke.rbvi.ucsf.edu/neighborhood.html. It can also be accessed using REST-API (https://spoke.rbvi.ucsf.edu/swagger/). KG-RAG code is made available at https://github.com/BaranziniLab/KG_RAG. Biomedical benchmark datasets used in this study are made available to the research community in the same GitHub repository.
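The embedding-based context pruning mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the KG-RAG code: the function names, the similarity threshold, the item limit, the three-dimensional toy vectors, and the context strings are all assumptions, and a real system would obtain vectors from an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_context(question_vec, candidates, min_similarity=0.5, max_items=2):
    """Keep the candidate context strings most similar to the question.

    candidates: list of (text, vector) pairs extracted from the graph.
    min_similarity and max_items are hypothetical tuning knobs.
    """
    scored = [(cosine(question_vec, vec), text) for text, vec in candidates]
    kept = [(score, text) for score, text in scored if score >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:max_items]]


# Toy embeddings (hypothetical values, chosen only to make the example readable).
question = [1.0, 0.0, 0.0]
candidates = [
    ("Disease X associates with Gene A", [0.9, 0.1, 0.0]),
    ("Disease X localizes to Anatomy B", [0.1, 0.9, 0.0]),
    ("Disease X is treated by Compound C", [0.7, 0.0, 0.7]),
]

pruned = prune_context(question, candidates)
print(pruned)
```

Only the pruned statements would be passed to the language model as context, which is how pruning reduces token consumption in this sketch.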
Subject(s)
Natural Language Processing , Computational Biology/methods , Algorithms , Humans
ABSTRACT
MOTIVATION: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KG presents a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Pattern Recognition, Automated , Precision Medicine , Databases, Factual
ABSTRACT
Ferroptosis is a caspase-independent, iron-dependent form of regulated necrosis extant in traumatic brain injury, Huntington disease, and hemorrhagic stroke. It can be activated by cystine deprivation leading to glutathione depletion, the insufficiency of the antioxidant glutathione peroxidase-4, and the hemolysis products hemoglobin and hemin. A cardinal feature of ferroptosis is extracellular signal-regulated kinase (ERK)1/2 activation culminating in its translocation to the nucleus. We have previously confirmed that the mitogen-activated protein (MAP) kinase kinase (MEK) inhibitor U0126 inhibits persistent ERK1/2 phosphorylation and ferroptosis. Here, we show that hemin exposure, a model of secondary injury in brain hemorrhage and ferroptosis, activated ERK1/2 in mouse neurons. Accordingly, MEK inhibitor U0126 protected against hemin-induced ferroptosis. Unexpectedly, U0126 prevented hemin-induced ferroptosis independent of its ability to inhibit ERK1/2 signaling. In contrast to classical ferroptosis in neurons or cancer cells, chemically diverse inhibitors of MEK did not block hemin-induced ferroptosis, nor did the forced expression of the ERK-selective MAP kinase phosphatase (MKP)3. We conclude that hemin or hemoglobin-induced ferroptosis, unlike glutathione depletion, is ERK1/2-independent. Together with recent studies, our findings suggest the existence of a novel subtype of neuronal ferroptosis relevant to bleeding in the brain that is 5-lipoxygenase-dependent, ERK-independent, and transcription-independent. Remarkably, our unbiased phosphoproteome analysis revealed dramatic differences in phosphorylation induced by two ferroptosis subtypes. 
As U0126 also reduced cell death and improved functional recovery after hemorrhagic stroke in male mice, our analysis also provides a template on which to build a search for U0126's effects in a variant of neuronal ferroptosis. SIGNIFICANCE STATEMENT: Ferroptosis is an iron-dependent mechanism of regulated necrosis that has been linked to hemorrhagic stroke. Common features of ferroptotic death induced by diverse stimuli are the depletion of the antioxidant glutathione, production of lipoxygenase-dependent reactive lipids, sensitivity to iron chelation, and persistent activation of extracellular signal-regulated kinase (ERK) signaling. Unlike classical ferroptosis induced in neurons or cancer cells, here we show that ferroptosis induced by hemin is ERK-independent. Paradoxically, the canonical MAP kinase kinase (MEK) inhibitor U0126 blocks brain hemorrhage-induced death. Altogether, these data suggest that a variant of ferroptosis is unleashed in hemorrhagic stroke. We present the first, unbiased phosphoproteomic analysis of ferroptosis as a template on which to understand distinct paths to cell death that meet the definition of ferroptosis.
Subject(s)
Ferroptosis , Hemorrhagic Stroke , Animals , Antioxidants/metabolism , Extracellular Signal-Regulated MAP Kinases/metabolism , Glutathione/metabolism , Hemin/metabolism , Hemin/pharmacology , Hemoglobins/metabolism , Intracranial Hemorrhages/metabolism , Iron/metabolism , Male , Mice , Mitogen-Activated Protein Kinase Kinases/metabolism , Necrosis/metabolism , Neurons/metabolism , Phosphorylation
ABSTRACT
BACKGROUND: Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity. RESULTS: The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein-protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. 
We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes. CONCLUSIONS: clusterMaker2 represents a significant advance over the previously published version, and most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms should be welcome to the large population of Cytoscape users, particularly the new dimensionality reduction and fuzzy clustering techniques.
Subject(s)
Mobile Applications , Saccharomyces cerevisiae , Algorithms , Protein Interaction Maps , Cluster Analysis
ABSTRACT
Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING database, the new stringApp 2.0 greatly improves the support for heterogeneous networks. Here, we highlight new functionality that makes it possible to create networks that contain proteins and interactions from STRING as well as other biological entities and associations from other sources. We exemplify this by complementing a published SARS-CoV-2 interactome with interactions from STRING. We have also extended stringApp with new data and query functionality for protein-protein interactions between eukaryotic parasites and their hosts. We show how this can be used to retrieve and visualize a cross-species network for a malaria parasite, its host, and its vector. Finally, the latest stringApp version has an improved user interface, allows retrieval of both functional associations and physical interactions, and supports group-wise enrichment analysis of different parts of a network to aid biological interpretation. stringApp is freely available at https://apps.cytoscape.org/apps/stringapp.
Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Software , Proteins , Eukaryota
ABSTRACT
SUMMARY: IntAct App is a Cytoscape 3 application that grants in-depth access to IntAct's molecular interaction data. It builds networks where nodes are interacting molecules (mainly proteins, but also genes, RNA, chemicals) and edges represent evidence of interaction. Users can query a network by providing its molecules, identified by different fields, and optionally include all their interacting partners in the resulting network. The app offers three visualizations: one only displaying interactions, another representing every evidence and the last one emphasizing evidence where mutated versions of proteins were used. Users can also filter networks and click on nodes and edges to access all their related details. Finally, the application supports automation of its main features via Cytoscape commands. AVAILABILITY AND IMPLEMENTATION: Implementation available at https://apps.cytoscape.org/apps/intactapp, while the source code is available at https://github.com/EBI-IntAct/IntactApp.
ABSTRACT
Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.
Subject(s)
Genomics/methods , Protein Interaction Mapping/methods , Software , Animals , Databases, Genetic , Gene Ontology , Humans
ABSTRACT
Biological network figures are ubiquitous in the biology and medical literature. On the one hand, a good network figure can quickly provide information about the nature and degree of interactions between items and enable inferences about the reason for those interactions. On the other hand, good network figures are difficult to create. In this paper, we outline 10 simple rules for creating biological network figures for communication, from choosing layouts, to applying color or other channels to show attributes, to the use of layering and separation. These rules are accompanied by illustrative examples. We also provide a concise set of references and additional resources for each rule.
Subject(s)
Computational Biology/methods , Computer Graphics , Attention , Color , Humans , Protein Interaction Maps/physiology , Signal Transduction/physiology , Visual Perception
ABSTRACT
Protein networks have become a popular tool for analyzing and visualizing the often long lists of proteins or genes obtained from proteomics and other high-throughput technologies. One of the most popular sources of such networks is the STRING database, which provides protein networks for more than 2000 organisms, including both physical interactions from experimental data and functional associations from curated pathways, automatic text mining, and prediction methods. However, its web interface is mainly intended for inspection of small networks and their underlying evidence. The Cytoscape software, on the other hand, is much better suited for working with large networks and offers greater flexibility in terms of network analysis, import, and visualization of additional data. To include both resources in the same workflow, we created stringApp, a Cytoscape app that makes it easy to import STRING networks into Cytoscape, retains the appearance and many of the features of STRING, and integrates data from associated databases. Here, we introduce many of the stringApp features and show how they can be used to carry out complex network analysis and visualization tasks on a typical proteomics data set, all through the Cytoscape user interface. stringApp is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/stringapp .
Subject(s)
Data Analysis , Proteomics/methods , Software , Computational Biology/methods , Internet , Protein Interaction Maps , User-Computer Interface
ABSTRACT
A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.
Subject(s)
Computational Biology/methods , Databases, Protein , Software , Models, Molecular , Protein Binding , Protein Conformation , Protein Interaction Mapping , Protein Interaction Maps , Proteins/chemistry , Proteins/metabolism , Structure-Activity Relationship , User-Computer Interface , Web Browser
ABSTRACT
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
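The self-identification criterion described above (a group identifies itself and nothing else) can be expressed as a simple set comparison. The sketch below is only an illustration under stated assumptions: the function name, the report fields, and the sequence identifiers are hypothetical, and the search itself (here just a precomputed set of hits) stands in for the profile-based GenBank search that MISST performs.

```python
def self_identification_report(group, hits):
    """Compare a candidate group with the hits its own profile retrieves.

    group: identifiers of the sequences currently assigned to the group.
    hits:  identifiers returned when the database is searched with that group.
    The group passes only if it retrieves all of its members and nothing else.
    """
    group, hits = set(group), set(hits)
    return {
        "missed": group - hits,   # members the group failed to retrieve
        "extra": hits - group,    # outside sequences the group also retrieved
        "passes": group == hits,
    }


# Hypothetical identifiers for illustration only.
candidate_group = {"seq1", "seq2", "seq3"}

# Case 1: the search returns exactly the group members.
clean = self_identification_report(candidate_group, {"seq1", "seq2", "seq3"})

# Case 2: the search also returns an outside sequence, so the group may need
# to be split or extended in a further iteration.
leaky = self_identification_report(candidate_group, {"seq1", "seq2", "seq3", "seq9"})

print(clean["passes"], leaky["passes"], leaky["extra"])
```

Framing the test as set equality, rather than a score cutoff, mirrors the abstract's point that groups are not selected by a single or shifting threshold.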
Subject(s)
Databases, Protein , Peroxiredoxins/chemistry , Peroxiredoxins/classification , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Binding Sites , Database Management Systems , Enzyme Activation , High-Throughput Screening Assays/methods , Molecular Sequence Data , Multigene Family , Peroxiredoxins/ultrastructure , Protein Binding
ABSTRACT
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host's cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV-human protein-protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Subject(s)
HIV-1/chemistry , HIV-1/metabolism , Host-Pathogen Interactions , Human Immunodeficiency Virus Proteins/metabolism , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Affinity Labels , Amino Acid Sequence , Conserved Sequence , Eukaryotic Initiation Factor-3/chemistry , Eukaryotic Initiation Factor-3/metabolism , HEK293 Cells , HIV Infections/metabolism , HIV Infections/virology , HIV Protease/metabolism , HIV-1/physiology , Human Immunodeficiency Virus Proteins/analysis , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/isolation & purification , Humans , Immunoprecipitation , Jurkat Cells , Mass Spectrometry , Protein Binding , Reproducibility of Results , Virus Replication
ABSTRACT
BACKGROUND: Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and can't keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations to DASP. RESULTS: The accuracy and efficiency of DASP was significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2. CONCLUSIONS: DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups.
Subject(s)
Algorithms , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Catalytic Domain , Cluster Analysis , Databases, Protein , Proteins/chemistry
ABSTRACT
MOTIVATION: cddApp is a Cytoscape extension that supports the annotation of protein networks with information about domains and specific functional sites from the National Center for Biotechnology Information's conserved domain database (CDD). CDD information is loaded for nodes annotated with NCBI numbers or UniProt identifiers and (optionally) Protein Data Bank structures. cddApp integrates with the Cytoscape apps structureViz2 and enhancedGraphics. Together, these three apps provide powerful tools to annotate nodes with CDD domain and site information and visualize that information in both network and structural contexts. AVAILABILITY AND IMPLEMENTATION: cddApp is written in Java and freely available for download from the Cytoscape app store (http://apps.cytoscape.org). Documentation is provided at http://www.rbvi.ucsf.edu/cytoscape, and the source is publicly available from GitHub at http://github.com/RBVI/cddApp.
Subject(s)
Bacterial Proteins/metabolism , Computational Biology/instrumentation , Metabolic Networks and Pathways , Molecular Sequence Annotation/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Bacillus , Bacterial Proteins/chemistry , Conserved Sequence , Databases, Protein , Humans , Protein Conformation , Protein Interaction Mapping
ABSTRACT
Integrating access to web services with desktop applications allows for an expanded set of application features, including performing computationally intensive tasks and convenient searches of databases. We describe how we have enhanced UCSF Chimera (http://www.rbvi.ucsf.edu/chimera/), a program for the interactive visualization and analysis of molecular structures and related data, through the addition of several web services (http://www.rbvi.ucsf.edu/chimera/docs/webservices.html). By streamlining access to web services, including the entire job submission, monitoring and retrieval process, Chimera makes it simpler for users to focus on their science projects rather than data manipulation. Chimera uses Opal, a toolkit for wrapping scientific applications as web services, to provide scalable and transparent access to several popular software packages. We illustrate Chimera's use of web services with an example workflow that interleaves use of these services with interactive manipulation of molecular sequences and structures, and we provide an example Python program to demonstrate how easily Opal-based web services can be accessed from within an application. Web server availability: http://webservices.rbvi.ucsf.edu/opal2/dashboard?command=serviceList.
Subject(s)
Molecular Structure , Software , Internet , Models, Molecular
ABSTRACT
The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
Subject(s)
Databases, Protein , Enzymes/chemistry , Enzymes/classification , Enzymes/metabolism , Internet , Molecular Sequence Annotation , Sequence Alignment , Structure-Activity Relationship
ABSTRACT
Cytoscape is an open-source bioinformatics environment for the analysis, integration, visualization, and query of biological networks. In this perspective piece, we describe our project to bring the Cytoscape desktop application to the web while explaining our strategy in ways relevant to others in the bioinformatics community. We examine opportunities and challenges in developing bioinformatics software that spans both the desktop and web, and we describe our ongoing efforts to build a Cytoscape web application, highlighting the principles that guide our development.
ABSTRACT
Advances in computational tools for atomic model building are leading to accurate models of large molecular assemblies seen in electron microscopy, often at challenging resolutions of 3-4 Å. We describe new methods in the UCSF ChimeraX molecular modeling package that take advantage of machine-learning structure predictions, provide likelihood-based fitting in maps, and compute per-residue scores to identify modeling errors. Additional model-building tools assist analysis of mutations, post-translational modifications, and interactions with ligands. We present the latest ChimeraX model-building capabilities, including several community-developed extensions. ChimeraX is available free of charge for noncommercial use at https://www.rbvi.ucsf.edu/chimerax.