
1.

From structure to systems: high-resolution, quantitative genetic analysis of RNA polymerase II.

Braberg, Hannes; Jin, Huiyan; Moehle, Erica A; Chan, Yujia A; Wang, Shuyi; Shales, Michael; Benschop, Joris J; Morris, John H; Qiu, Chenxi; Hu, Fuqu; Tang, Leung K; Fraser, James S; Holstege, Frank C P; Hieter, Philip; Guthrie, Christine; Kaplan, Craig D; Krogan, Nevan J.

Cell ; 154(4): 775-88, 2013 Aug 15.

Article in English | MEDLINE | ID: mdl-23932120

ABSTRACT

RNA polymerase II (RNAPII) lies at the core of dynamic control of gene expression. Using 53 RNAPII point mutants, we generated a point mutant epistatic miniarray profile (pE-MAP) comprising ~60,000 quantitative genetic interactions in Saccharomyces cerevisiae. This analysis enabled functional assignment of RNAPII subdomains and uncovered connections between individual regions and other protein complexes. Using splicing microarrays and mutants that alter elongation rates in vitro, we found an inverse relationship between RNAPII speed and in vivo splicing efficiency. Furthermore, the pE-MAP classified fast and slow mutants that favor upstream and downstream start site selection, respectively. The striking coordination of polymerization rate with transcription initiation and splicing suggests that transcription rate is tuned to regulate multiple gene expression steps. The pE-MAP approach provides a powerful strategy to understand other multifunctional machines at amino acid resolution.

Subject(s)

Epistasis, Genetic , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Alleles , Genome-Wide Association Study , Point Mutation , RNA Polymerase II/chemistry , RNA Splicing , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic , Transcriptome

2.

Biomedical knowledge graph-optimized prompt generation for large language models.

Soman, Karthik; Rose, Peter W; Morris, John H; Akbas, Rabia E; Smith, Brett; Peetoom, Braian; Villouta-Reyes, Catalina; Cerono, Gabriel; Shi, Yongmei; Rizk-Jackson, Angela; Israni, Sharat; Nelson, Charlotte A; Huang, Sui; Baranzini, Sergio E.

Bioinformatics ; 2024 Sep 17.

Article in English | MEDLINE | ID: mdl-39288310

ABSTRACT

MOTIVATION: Large Language Models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead and require further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to generate meaningful biomedical text rooted in established knowledge. RESULTS: Compared to the existing RAG technique for Knowledge Graphs, the proposed method uses a minimal graph schema for context extraction and embedding methods for context pruning. This optimization in context extraction results in a more than 50% reduction in token consumption without compromising accuracy, enabling a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human-curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token-optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion.
AVAILABILITY AND IMPLEMENTATION: SPOKE KG can be accessed at https://spoke.rbvi.ucsf.edu/neighborhood.html. It can also be accessed using REST-API (https://spoke.rbvi.ucsf.edu/swagger/). KG-RAG code is made available at https://github.com/BaranziniLab/KG_RAG. Biomedical benchmark datasets used in this study are made available to the research community in the same GitHub repository. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.

The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information.

Morris, John H; Soman, Karthik; Akbas, Rabia E; Zhou, Xiaoyuan; Smith, Brett; Meng, Elaine C; Huang, Conrad C; Cerono, Gabriel; Schenk, Gundolf; Rizk-Jackson, Angela; Harroud, Adil; Sanders, Lauren; Costes, Sylvain V; Bharat, Krish; Chakraborty, Arjun; Pico, Alexander R; Mardirossian, Taline; Keiser, Michael; Tang, Alice; Hardi, Josef; Shi, Yongmei; Musen, Mark; Israni, Sharat; Huang, Sui; Rose, Peter W; Nelson, Charlotte A; Baranzini, Sergio E.

Bioinformatics ; 39(2), 2023 02 03.

Article in English | MEDLINE | ID: mdl-36759942

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KGs present a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by Python scripts that download each resource, check it for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API, and users can submit searches directly via the API or a graphical user interface. CONCLUSIONS/SIGNIFICANCE: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Pattern Recognition, Automated , Precision Medicine , Databases, Factual

4.

Hemin-Induced Death Models Hemorrhagic Stroke and Is a Variant of Classical Neuronal Ferroptosis.

Zille, Marietta; Oses-Prieto, Juan A; Savage, Sara R; Karuppagounder, Saravanan S; Chen, Yingxin; Kumar, Amit; Morris, John H; Scheidt, Karl A; Burlingame, Alma L; Ratan, Rajiv R.

J Neurosci ; 42(10): 2065-2079, 2022 03 09.

Article in English | MEDLINE | ID: mdl-34987108

ABSTRACT

Ferroptosis is a caspase-independent, iron-dependent form of regulated necrosis extant in traumatic brain injury, Huntington disease, and hemorrhagic stroke. It can be activated by cystine deprivation leading to glutathione depletion, the insufficiency of the antioxidant glutathione peroxidase-4, and the hemolysis products hemoglobin and hemin. A cardinal feature of ferroptosis is extracellular signal-regulated kinase (ERK)1/2 activation culminating in its translocation to the nucleus. We have previously confirmed that the mitogen-activated protein (MAP) kinase kinase (MEK) inhibitor U0126 inhibits persistent ERK1/2 phosphorylation and ferroptosis. Here, we show that hemin exposure, a model of secondary injury in brain hemorrhage and ferroptosis, activated ERK1/2 in mouse neurons. Accordingly, MEK inhibitor U0126 protected against hemin-induced ferroptosis. Unexpectedly, U0126 prevented hemin-induced ferroptosis independent of its ability to inhibit ERK1/2 signaling. In contrast to classical ferroptosis in neurons or cancer cells, chemically diverse inhibitors of MEK did not block hemin-induced ferroptosis, nor did the forced expression of the ERK-selective MAP kinase phosphatase (MKP)3. We conclude that hemin or hemoglobin-induced ferroptosis, unlike glutathione depletion, is ERK1/2-independent. Together with recent studies, our findings suggest the existence of a novel subtype of neuronal ferroptosis relevant to bleeding in the brain that is 5-lipoxygenase-dependent, ERK-independent, and transcription-independent. Remarkably, our unbiased phosphoproteome analysis revealed dramatic differences in phosphorylation induced by two ferroptosis subtypes. 
As U0126 also reduced cell death and improved functional recovery after hemorrhagic stroke in male mice, our analysis also provides a template on which to build a search for U0126's effects in a variant of neuronal ferroptosis.

SIGNIFICANCE STATEMENT: Ferroptosis is an iron-dependent mechanism of regulated necrosis that has been linked to hemorrhagic stroke. Common features of ferroptotic death induced by diverse stimuli are the depletion of the antioxidant glutathione, production of lipoxygenase-dependent reactive lipids, sensitivity to iron chelation, and persistent activation of extracellular signal-regulated kinase (ERK) signaling. Unlike classical ferroptosis induced in neurons or cancer cells, here we show that ferroptosis induced by hemin is ERK-independent. Paradoxically, the canonical MAP kinase kinase (MEK) inhibitor U0126 blocks brain hemorrhage-induced death. Altogether, these data suggest that a variant of ferroptosis is unleashed in hemorrhagic stroke. We present the first, unbiased phosphoproteomic analysis of ferroptosis as a template on which to understand distinct paths to cell death that meet the definition of ferroptosis.

Subject(s)

Ferroptosis , Hemorrhagic Stroke , Animals , Antioxidants/metabolism , Extracellular Signal-Regulated MAP Kinases/metabolism , Glutathione/metabolism , Hemin/metabolism , Hemin/pharmacology , Hemoglobins/metabolism , Intracranial Hemorrhages/metabolism , Iron/metabolism , Male , Mice , Mitogen-Activated Protein Kinase Kinases/metabolism , Necrosis/metabolism , Neurons/metabolism , Phosphorylation

5.

clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape.

Utriainen, Maija; Morris, John H.

BMC Bioinformatics ; 24(1): 134, 2023 Apr 05.

Article in English | MEDLINE | ID: mdl-37020209

ABSTRACT

BACKGROUND: Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity. RESULTS: The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein-protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. 
We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes. CONCLUSIONS: clusterMaker2 represents a significant advance over the previously published version and, most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms, particularly the new dimensionality reduction and fuzzy clustering techniques, should be welcomed by the large population of Cytoscape users.

Subject(s)

Mobile Applications , Saccharomyces cerevisiae , Algorithms , Protein Interaction Maps , Cluster Analysis

6.

Cytoscape stringApp 2.0: Analysis and Visualization of Heterogeneous Biological Networks.

Doncheva, Nadezhda T; Morris, John H; Holze, Henrietta; Kirsch, Rebecca; Nastou, Katerina C; Cuesta-Astroz, Yesid; Rattei, Thomas; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars J.

J Proteome Res ; 22(2): 637-646, 2023 02 03.

Article in English | MEDLINE | ID: mdl-36512705

ABSTRACT

Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING database, the new stringApp 2.0 greatly improves the support for heterogeneous networks. Here, we highlight new functionality that makes it possible to create networks that contain proteins and interactions from STRING as well as other biological entities and associations from other sources. We exemplify this by complementing a published SARS-CoV-2 interactome with interactions from STRING. We have also extended stringApp with new data and query functionality for protein-protein interactions between eukaryotic parasites and their hosts. We show how this can be used to retrieve and visualize a cross-species network for a malaria parasite, its host, and its vector. Finally, the latest stringApp version has an improved user interface, allows retrieval of both functional associations and physical interactions, and supports group-wise enrichment analysis of different parts of a network to aid biological interpretation. stringApp is freely available at https://apps.cytoscape.org/apps/stringapp.

Subject(s)

COVID-19 , Humans , SARS-CoV-2 , Software , Proteins , Eukaryota

7.

IntAct App: a Cytoscape application for molecular interaction network visualization and analysis.

Ragueneau, Eliot; Shrivastava, Anjali; Morris, John H; Del-Toro, Noemi; Hermjakob, Henning; Porras, Pablo.

Bioinformatics ; 37(20): 3684-3685, 2021 Oct 25.

Article in English | MEDLINE | ID: mdl-33961020

ABSTRACT

SUMMARY: IntAct App is a Cytoscape 3 application that grants in-depth access to IntAct's molecular interaction data. It builds networks in which nodes are interacting molecules (mainly proteins, but also genes, RNA and chemicals) and edges represent evidence of interaction. Users can query a network by providing its molecules, identified by different fields, and can optionally include all their interacting partners in the resulting network. The app offers three visualizations: one displaying only interactions, another representing every piece of evidence, and a third emphasizing evidence in which mutated versions of proteins were used. Users can also filter networks and click on nodes and edges to access all their related details. Finally, the application supports automation of its main features via Cytoscape commands. AVAILABILITY AND IMPLEMENTATION: Implementation available at https://apps.cytoscape.org/apps/intactapp, while the source code is available at https://github.com/EBI-IntAct/IntactApp.

8.

STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.

Szklarczyk, Damian; Gable, Annika L; Lyon, David; Junge, Alexander; Wyder, Stefan; Huerta-Cepas, Jaime; Simonovic, Milan; Doncheva, Nadezhda T; Morris, John H; Bork, Peer; Jensen, Lars J; Mering, Christian von.

Nucleic Acids Res ; 47(D1): D607-D613, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30476243

ABSTRACT

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

Subject(s)

Genomics/methods , Protein Interaction Mapping/methods , Software , Animals , Databases, Genetic , Gene Ontology , Humans

9.

Ten simple rules to create biological network figures for communication.

Marai, G Elisabeta; Pinaud, Bruno; Bühler, Katja; Lex, Alexander; Morris, John H.

PLoS Comput Biol ; 15(9): e1007244, 2019 09.

Article in English | MEDLINE | ID: mdl-31557157

ABSTRACT

Biological network figures are ubiquitous in the biology and medical literature. On the one hand, a good network figure can quickly provide information about the nature and degree of interactions between items and enable inferences about the reason for those interactions. On the other hand, good network figures are difficult to create. In this paper, we outline 10 simple rules for creating biological network figures for communication, from choosing layouts, to applying color or other channels to show attributes, to the use of layering and separation. These rules are accompanied by illustrative examples. We also provide a concise set of references and additional resources for each rule.

Subject(s)

Computational Biology/methods , Computer Graphics , Attention , Color , Humans , Protein Interaction Maps/physiology , Signal Transduction/physiology , Visual Perception

10.

Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data.

Doncheva, Nadezhda T; Morris, John H; Gorodkin, Jan; Jensen, Lars J.

J Proteome Res ; 18(2): 623-632, 2019 02 01.

Article in English | MEDLINE | ID: mdl-30450911

ABSTRACT

Protein networks have become a popular tool for analyzing and visualizing the often long lists of proteins or genes obtained from proteomics and other high-throughput technologies. One of the most popular sources of such networks is the STRING database, which provides protein networks for more than 2000 organisms, including both physical interactions from experimental data and functional associations from curated pathways, automatic text mining, and prediction methods. However, its web interface is mainly intended for inspection of small networks and their underlying evidence. The Cytoscape software, on the other hand, is much better suited for working with large networks and offers greater flexibility in terms of network analysis, import, and visualization of additional data. To include both resources in the same workflow, we created stringApp, a Cytoscape app that makes it easy to import STRING networks into Cytoscape, retains the appearance and many of the features of STRING, and integrates data from associated databases. Here, we introduce many of the stringApp features and show how they can be used to carry out complex network analysis and visualization tasks on a typical proteomics data set, all through the Cytoscape user interface. stringApp is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/stringapp.

Subject(s)

Data Analysis , Proteomics/methods , Software , Computational Biology/methods , Internet , Protein Interaction Maps , User-Computer Interface

11.

The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible.

Szklarczyk, Damian; Morris, John H; Cook, Helen; Kuhn, Michael; Wyder, Stefan; Simonovic, Milan; Santos, Alberto; Doncheva, Nadezhda T; Roth, Alexander; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 45(D1): D362-D368, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27924014

ABSTRACT

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes concern data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

Subject(s)

Computational Biology/methods , Databases, Protein , Software , Models, Molecular , Protein Binding , Protein Conformation , Protein Interaction Mapping , Protein Interaction Maps , Proteins/chemistry , Proteins/metabolism , Structure-Activity Relationship , User-Computer Interface , Web Browser

12.

An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins.

Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C; Morris, John H; Ferrin, Thomas E; Poole, Leslie B; Fetrow, Jacquelyn S.

PLoS Comput Biol ; 13(2): e1005284, 2017 02.

Article in English | MEDLINE | ID: mdl-28187133

ABSTRACT

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected: rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily.
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.

Subject(s)

Databases, Protein , Peroxiredoxins/chemistry , Peroxiredoxins/classification , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Binding Sites , Database Management Systems , Enzyme Activation , High-Throughput Screening Assays/methods , Molecular Sequence Data , Multigene Family , Peroxiredoxins/ultrastructure , Protein Binding

13.

Global landscape of HIV-human protein complexes.

Jäger, Stefanie; Cimermancic, Peter; Gulbahce, Natali; Johnson, Jeffrey R; McGovern, Kathryn E; Clarke, Starlynn C; Shales, Michael; Mercenne, Gaelle; Pache, Lars; Li, Kathy; Hernandez, Hilda; Jang, Gwendolyn M; Roth, Shoshannah L; Akiva, Eyal; Marlett, John; Stephens, Melanie; D'Orso, Iván; Fernandes, Jason; Fahey, Marie; Mahon, Cathal; O'Donoghue, Anthony J; Todorovic, Aleksandar; Morris, John H; Maltby, David A; Alber, Tom; Cagney, Gerard; Bushman, Frederic D; Young, John A; Chanda, Sumit K; Sundquist, Wesley I; Kortemme, Tanja; Hernandez, Ryan D; Craik, Charles S; Burlingame, Alma; Sali, Andrej; Frankel, Alan D; Krogan, Nevan J.

Nature ; 481(7381): 365-70, 2011 Dec 21.

Article in English | MEDLINE | ID: mdl-22190034

ABSTRACT

Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host's cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV-human protein-protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.

Subject(s)

HIV-1/chemistry , HIV-1/metabolism , Host-Pathogen Interactions , Human Immunodeficiency Virus Proteins/metabolism , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Affinity Labels , Amino Acid Sequence , Conserved Sequence , Eukaryotic Initiation Factor-3/chemistry , Eukaryotic Initiation Factor-3/metabolism , HEK293 Cells , HIV Infections/metabolism , HIV Infections/virology , HIV Protease/metabolism , HIV-1/physiology , Human Immunodeficiency Virus Proteins/analysis , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/isolation & purification , Humans , Immunoprecipitation , Jurkat Cells , Mass Spectrometry , Protein Binding , Reproducibility of Results , Virus Replication

14.

DASP3: identification of protein sequences belonging to functionally relevant groups.

Leuthaeuser, Janelle B; Morris, John H; Harper, Angela F; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S.

BMC Bioinformatics ; 17(1): 458, 2016 Nov 11.

Article in English | MEDLINE | ID: mdl-27835946

ABSTRACT

BACKGROUND: Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and cannot keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations of DASP. RESULTS: The accuracy and efficiency of DASP were significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated that DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2. CONCLUSIONS: DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups.

Subject(s)

Algorithms , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Catalytic Domain , Cluster Analysis , Databases, Protein , Proteins/chemistry

15.

cddApp: a Cytoscape app for accessing the NCBI conserved domain database.

Morris, John H; Wu, Allan; Yamashita, Roxanne A; Marchler-Bauer, Aron; Ferrin, Thomas E.

Bioinformatics ; 31(1): 134-6, 2015 Jan 01.

Article in English | MEDLINE | ID: mdl-25212755

ABSTRACT

MOTIVATION: cddApp is a Cytoscape extension that supports the annotation of protein networks with information about domains and specific functional sites from the National Center for Biotechnology Information's conserved domain database (CDD). CDD information is loaded for nodes annotated with NCBI numbers or UniProt identifiers and (optionally) Protein Data Bank structures. cddApp integrates with the Cytoscape apps structureViz2 and enhancedGraphics. Together, these three apps provide powerful tools to annotate nodes with CDD domain and site information and visualize that information in both network and structural contexts. AVAILABILITY AND IMPLEMENTATION: cddApp is written in Java and freely available for download from the Cytoscape app store (http://apps.cytoscape.org). Documentation is provided at http://www.rbvi.ucsf.edu/cytoscape, and the source is publicly available from GitHub http://github.com/RBVI/cddApp.

Subject(s)

Bacterial Proteins/metabolism , Computational Biology/instrumentation , Metabolic Networks and Pathways , Molecular Sequence Annotation/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Bacillus , Bacterial Proteins/chemistry , Conserved Sequence , Databases, Protein , Humans , Protein Conformation , Protein Interaction Mapping

16.

Enhancing UCSF Chimera through web services.

Huang, Conrad C; Meng, Elaine C; Morris, John H; Pettersen, Eric F; Ferrin, Thomas E.

Nucleic Acids Res ; 42(Web Server issue): W478-84, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24861624

ABSTRACT

Integrating access to web services with desktop applications allows for an expanded set of application features, including performing computationally intensive tasks and convenient searches of databases. We describe how we have enhanced UCSF Chimera (http://www.rbvi.ucsf.edu/chimera/), a program for the interactive visualization and analysis of molecular structures and related data, through the addition of several web services (http://www.rbvi.ucsf.edu/chimera/docs/webservices.html). By streamlining access to web services, including the entire job submission, monitoring and retrieval process, Chimera makes it simpler for users to focus on their science projects rather than data manipulation. Chimera uses Opal, a toolkit for wrapping scientific applications as web services, to provide scalable and transparent access to several popular software packages. We illustrate Chimera's use of web services with an example workflow that interleaves use of these services with interactive manipulation of molecular sequences and structures, and we provide an example Python program to demonstrate how easily Opal-based web services can be accessed from within an application. Web server availability: http://webservices.rbvi.ucsf.edu/opal2/dashboard?command=serviceList.

Subject(s)

Molecular Structure , Software , Internet , Models, Molecular

17.

The Structure-Function Linkage Database.

Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E; Barber, Alan E; Custer, Ashley F; Hicks, Michael A; Huang, Conrad C; Lauck, Florian; Mashiyama, Susan T; Meng, Elaine C; Mischel, David; Morris, John H; Ojha, Sunil; Schnoes, Alexandra M; Stryke, Doug; Yunes, Jeffrey M; Ferrin, Thomas E; Holliday, Gemma L; Babbitt, Patricia C.

Nucleic Acids Res ; 42(Database issue): D521-30, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24271399

ABSTRACT

The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.

Subject(s)

Databases, Protein , Enzymes/chemistry , Enzymes/classification , Enzymes/metabolism , Internet , Molecular Sequence Annotation , Sequence Alignment , Structure-Activity Relationship
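The SFLD abstract above mentions sequence similarity networks. As a rough illustration of how such a network can be built, the sketch below turns all-by-all pairwise E-values into an undirected edge list, keeping only pairs at or below a cutoff and weighting each edge by -log10(E), one common convention. The function name `build_ssn`, the cutoff value, and the input format are hypothetical choices for this example and are not taken from the SFLD itself.

```python
import math


def build_ssn(pairwise_evalues, evalue_cutoff=1e-20):
    """Build an undirected sequence similarity network (hypothetical sketch).

    pairwise_evalues maps (seq_a, seq_b) -> E-value from an all-by-all
    comparison. Edges are kept only when the E-value is at or below the
    cutoff; the weight is -log10(E), so stronger similarity gives a larger
    weight. Returns {(a, b): weight} with each pair in sorted order.
    """
    edges = {}
    for (a, b), evalue in pairwise_evalues.items():
        if a == b:
            continue  # skip self-hits
        if evalue > evalue_cutoff:
            continue  # too weak to draw an edge
        key = tuple(sorted((a, b)))  # undirected: (A, B) and (B, A) merge
        weight = -math.log10(evalue) if evalue > 0 else math.inf
        # keep the strongest hit if both directions were reported
        edges[key] = max(edges.get(key, 0.0), weight)
    return edges


if __name__ == "__main__":
    hits = {("A", "B"): 1e-30, ("B", "A"): 1e-25, ("A", "C"): 1e-5}
    print(build_ssn(hits))
```

Keying edges on the sorted pair collapses the two directions of an all-by-all search into one undirected edge.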

18.

Translating desktop success to the web in the cytoscape project.

Pratt, Dexter; Pillich, Rudolf T; Morris, John H.

Front Bioinform ; 3: 1125949, 2023.

Article in English | MEDLINE | ID: mdl-37035036

ABSTRACT

Cytoscape is an open-source bioinformatics environment for the analysis, integration, visualization, and query of biological networks. In this perspective piece, we describe our project to bring the Cytoscape desktop application to the web while explaining our strategy in ways relevant to others in the bioinformatics community. We examine opportunities and challenges in developing bioinformatics software that spans both the desktop and web, and we describe our ongoing efforts to build a Cytoscape web application, highlighting the principles that guide our development.

19.

UCSF ChimeraX: Tools for structure building and analysis.

Meng, Elaine C; Goddard, Thomas D; Pettersen, Eric F; Couch, Greg S; Pearson, Zach J; Morris, John H; Ferrin, Thomas E.

Protein Sci ; 32(11): e4792, 2023 11.

Article in English | MEDLINE | ID: mdl-37774136

ABSTRACT

Advances in computational tools for atomic model building are leading to accurate models of large molecular assemblies seen in electron microscopy, often at challenging resolutions of 3-4 Å. We describe new methods in the UCSF ChimeraX molecular modeling package that take advantage of machine-learning structure predictions, provide likelihood-based fitting in maps, and compute per-residue scores to identify modeling errors. Additional model-building tools assist analysis of mutations, post-translational modifications, and interactions with ligands. We present the latest ChimeraX model-building capabilities, including several community-developed extensions. ChimeraX is available free of charge for noncommercial use at https://www.rbvi.ucsf.edu/chimerax.

Subject(s)

Software , Cryoelectron Microscopy/methods , Likelihood Functions , Models, Molecular , Microscopy, Electron , Protein Conformation

20.

Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution.

Apeltsin, Leonard; Morris, John H; Babbitt, Patricia C; Ferrin, Thomas E.

Bioinformatics ; 27(3): 326-33, 2011 Feb 01.

Article in English | MEDLINE | ID: mdl-21118823

ABSTRACT

MOTIVATION: Clustering protein sequence data into functionally specific families is a difficult but important problem in biological research. One useful approach for tackling this problem involves representing the sequence dataset as a protein similarity network, and afterwards clustering the network using advanced graph analysis techniques. Although a multitude of such network clustering algorithms have been developed over the past few years, comparing algorithms is often difficult because performance is affected by the specifics of network construction. We investigate an important aspect of network construction used in analyzing protein superfamilies and present a heuristic approach for improving the performance of several algorithms. RESULTS: We analyzed how the performance of network clustering algorithms relates to thresholding the network prior to clustering. Our results, over four different datasets, show how for each input dataset there exists an optimal threshold range over which an algorithm generates its most accurate clustering output. Our results further show how the optimal threshold range correlates with the shape of the edge weight distribution for the input similarity network. We used this correlation to develop an automated threshold selection heuristic to optimally filter a similarity network prior to clustering. This heuristic allows researchers to process their protein datasets with runtime-efficient network clustering algorithms without sacrificing the clustering accuracy of the final results. AVAILABILITY: Python code for implementing the automated threshold selection heuristic, together with the datasets used in our analysis, is available at http://www.rbvi.ucsf.edu/Research/cytoscape/threshold_scripts.zip.

Subject(s)

Algorithms , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Artificial Intelligence , Cluster Analysis , Proteins/metabolism , Software
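The abstract above describes thresholding a similarity network before clustering and observing how results change with the threshold. A minimal sketch of that pipeline, assuming edge weights where larger means more similar: drop edges below a threshold, then cluster. Connected components (via union-find) is used here only as a simple stand-in clustering method; the paper evaluated several clustering algorithms, and its automated threshold-selection heuristic based on the edge weight distribution is not reproduced here. All function names are hypothetical.

```python
def threshold_network(edges, threshold):
    """Keep only edges whose weight is at or above the threshold."""
    return {pair: w for pair, w in edges.items() if w >= threshold}


def connected_components(nodes, edges):
    """Cluster nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # walk to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:  # iterating a dict yields its (a, b) keys
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two clusters

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


def cluster_counts(nodes, edges, thresholds):
    """Number of clusters produced at each candidate threshold."""
    return {
        t: len(connected_components(nodes, threshold_network(edges, t)))
        for t in thresholds
    }


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = {("A", "B"): 50.0, ("B", "C"): 20.0, ("C", "D"): 5.0}
    print(cluster_counts(nodes, edges, [0, 10, 30]))
```

Sweeping the threshold this way exposes how sensitive the clustering is to the filtering step, which is the relationship the abstract reports studying.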
