ABSTRACT
A deficient interferon (IFN) response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been implicated as a determinant of severe coronavirus disease 2019 (COVID-19). To identify the molecular effectors that govern IFN control of SARS-CoV-2 infection, we conducted a large-scale gain-of-function analysis that evaluated the impact of human IFN-stimulated genes (ISGs) on viral replication. A limited subset of ISGs were found to control viral infection, including endosomal factors inhibiting viral entry, RNA binding proteins suppressing viral RNA synthesis, and a highly enriched cluster of endoplasmic reticulum (ER)/Golgi-resident ISGs inhibiting viral assembly/egress. These included broad-acting antiviral ISGs and eight ISGs that specifically inhibited SARS-CoV-2 and SARS-CoV-1 replication. Among the broad-acting ISGs was BST2/tetherin, which impeded viral release and is antagonized by SARS-CoV-2 Orf7a protein. Overall, these data illuminate a set of ISGs that underlie innate immune control of SARS-CoV-2/SARS-CoV-1 infection, which will facilitate the understanding of host determinants that impact disease severity and offer potential therapeutic strategies for COVID-19.
Subjects
CD Antigens/genetics , Host-Pathogen Interactions/genetics , Interferon Regulatory Factors/genetics , Interferon Type I/genetics , SARS-CoV-2/genetics , Viral Proteins/genetics , Animals , CD Antigens/chemistry , CD Antigens/immunology , Binding Sites , Tumor Cell Line , Chlorocebus aethiops , Endoplasmic Reticulum/genetics , Endoplasmic Reticulum/immunology , Endoplasmic Reticulum/virology , GPI-Linked Proteins/chemistry , GPI-Linked Proteins/genetics , GPI-Linked Proteins/immunology , Gene Expression Regulation , Golgi Apparatus/genetics , Golgi Apparatus/immunology , Golgi Apparatus/virology , HEK293 Cells , Host-Pathogen Interactions/immunology , Humans , Innate Immunity , Interferon Regulatory Factors/classification , Interferon Regulatory Factors/immunology , Interferon Type I/immunology , Molecular Docking Simulation , Protein Binding , alpha-Helical Protein Conformation , beta-Strand Protein Conformation , Protein Interaction Domains and Motifs , SARS-CoV-2/immunology , Signal Transduction , Vero Cells , Viral Proteins/chemistry , Viral Proteins/immunology , Virus Internalization , Virus Release/genetics , Virus Release/immunology , Virus Replication/genetics , Virus Replication/immunology
ABSTRACT
In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.
Subjects
Drug Repositioning , Software , Drug Repositioning/methods , Humans , Internet , Drug Discovery/methods , Systems Biology/methods , Computational Biology/methods
ABSTRACT
MOTIVATION: The investigation of sets of genes using biological pathways is a common task for researchers and is supported by a wide variety of software tools. This type of analysis generates hypotheses about the biological processes that are active or modulated in a specific experimental context. RESULTS: The Network Data Exchange Integrated Query (NDEx IQuery) is a new tool for network and pathway-based gene set interpretation that complements or extends existing resources. It combines novel sources of pathways, integration with Cytoscape, and the ability to store and share analysis results. The NDEx IQuery web application performs multiple gene set analyses based on diverse pathways and networks stored in NDEx. These include curated pathways from WikiPathways and SIGNOR, published pathway figures from the last 27 years, machine-assembled networks using the INDRA system, and the new NCI-PID v2.0, an updated version of the popular NCI Pathway Interaction Database. NDEx IQuery's integration with MSigDB and cBioPortal now provides pathway analysis in the context of these two resources. AVAILABILITY AND IMPLEMENTATION: NDEx IQuery is available at https://www.ndexbio.org/iquery and is implemented in JavaScript and Java.
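The abstract above does not say how NDEx IQuery scores the match between a query gene set and a pathway. As a generic illustration of pathway-based gene set analysis, the following minimal sketch computes a one-sided hypergeometric (over-representation) p-value. The function name `enrichment_p_value`, the toy gene symbols, and the universe size are all assumptions made for this example and are not taken from NDEx IQuery.

```python
from math import comb


def enrichment_p_value(query, pathway, universe_size):
    """One-sided hypergeometric p-value for the overlap of two gene sets.

    Hypothetical helper for illustration only: returns the probability of
    seeing at least the observed overlap if the query genes were drawn at
    random, without replacement, from a universe of `universe_size` genes.
    """
    query, pathway = set(query), set(pathway)
    k = len(query & pathway)   # observed overlap
    n = len(query)             # size of the query gene set
    K = len(pathway)           # size of the pathway gene set
    N = universe_size          # total number of genes considered
    total = comb(N, n)
    # Sum the upper tail: overlaps of size k, k+1, ..., min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Toy example: 3 of 4 query genes fall in a 5-gene pathway, universe of 100 genes.
p = enrichment_p_value(
    query=["TP53", "BRCA1", "ATM", "GAPDH"],
    pathway=["TP53", "BRCA1", "ATM", "CHEK2", "MDM2"],
    universe_size=100,
)
print(f"p = {p:.3g}")
```

A small p-value indicates that the overlap is larger than expected by chance. In practice such values are corrected for multiple testing across all pathways queried.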
Subjects
Computational Biology , Software , Computational Biology/methods , Protein Interaction Maps , Publications , Factual Databases , Internet
ABSTRACT
MOTIVATION: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called 'causal interaction' takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. RESULTS: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. AVAILABILITY AND IMPLEMENTATION: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Software , Causality , Humans
ABSTRACT
Detection of community structure has become a fundamental step in the analysis of biological networks with application to protein function annotation, disease gene prediction, and drug discovery. This recent impact creates a need to make these techniques and their accompanying visualization schemes available to a broad range of biologists. Here we present a service-oriented, end-to-end software framework, CDAPS (Community Detection APplication and Service), that integrates the identification, annotation, visualization, and interrogation of multiscale network communities, accessible within the popular Cytoscape network analysis platform. With novel design principles, CDAPS addresses unmet new challenges, such as identifying hierarchical community structures, comparison of outputs generated from diverse network resources, and easy deployment of new algorithms, to facilitate community-sourced science. We demonstrate that the CDAPS framework can be applied to high-throughput protein-protein interaction networks to gain novel insights, such as the identification of putative new members of known protein complexes.
Subjects
Cluster Analysis , Computational Biology/methods , Software , Algorithms , Genetic Databases , Humans , Protein Interaction Maps
ABSTRACT
Motivation: Seamless exchange of biological network data enables bioinformatic algorithms to integrate networks as prior knowledge input as well as to document resulting network output. However, the interoperability between pathway databases and various methods and platforms for analysis is currently lacking. The Network Data Exchange (NDEx) is an open-source data commons that facilitates the user-centered sharing and publication of networks of many types and formats. Results: Here, we present a software package that allows users to programmatically connect to and interface with NDEx servers from within R. The network repository can be searched and networks can be retrieved and converted into igraph-compatible objects. These networks can be modified and extended within R and uploaded back to the NDEx servers. Availability and implementation: ndexr is a free and open-source R package, available via GitHub (https://github.com/frankkramer-lab/ndexr) and Bioconductor (http://bioconductor.org/packages/ndexr/). Contact: florian.auer@med.uni-goettingen.de. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology/methods , Software , Algorithms , Metabolic Networks and Pathways , Protein Interaction Maps , Publications , Signal Transduction
ABSTRACT
Network propagation is an important and widely used algorithm in systems biology, with applications in protein function prediction, disease gene prioritization, and patient stratification. However, up to this point it has required significant expertise to run. Here we extend the popular network analysis program Cytoscape to perform network propagation as an integrated function. Such integration greatly increases the access to network propagation by putting it in the hands of biologists and linking it to the many other types of network analysis and visualization available through Cytoscape. We demonstrate the power and utility of the algorithm by identifying mutations conferring resistance to Vemurafenib.
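The abstract does not specify which formulation of network propagation the Cytoscape integration uses. A common one is random walk with restart, sketched below in plain Python under that assumption. The function name `propagate`, the restart weighting `alpha`, and the four-node toy network are hypothetical and chosen only for illustration.

```python
def propagate(adjacency, seeds, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network.

    adjacency: dict mapping each node to the set of its neighbours
               (assumed symmetric, with every node having at least one neighbour).
    seeds:     collection of seed nodes that receive the initial "heat".
    alpha:     fraction of heat that spreads along edges at each step;
               the remaining (1 - alpha) is returned to the seeds.
    """
    nodes = list(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {}
        for n in nodes:
            # Each neighbour m passes an equal share of its heat to n.
            incoming = sum(scores[m] / degree[m] for m in adjacency[n])
            updated[n] = alpha * incoming + (1.0 - alpha) * start[n]
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy path network A - B - C - D with a single seed at A.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = propagate(network, seeds={"A"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes closer to the seed end up with higher scores, which is the property that lets propagation rank candidate genes by network proximity to a known set.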
Subjects
Algorithms , Software , Systems Biology/methods , Animals , Neoplasm Drug Resistance , Indoles , Biological Models , Mutation , Protein Interaction Mapping/methods , Sulfonamides , Vemurafenib
ABSTRACT
The Network-extracted Ontology (NeXO) is a gene ontology inferred directly from large-scale molecular networks. While most ontologies are constructed through manual expert curation, NeXO uses a principled computational approach which integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a global hierarchy of cellular components and processes. Here, we describe the development of the NeXO Web platform (http://www.nexontology.org), an online database and graphical user interface for visualizing, browsing and performing term enrichment analysis using NeXO and the gene ontology. The platform applies state-of-the-art web technology and visualization techniques to provide an intuitive framework for investigating biological machinery captured by both data-driven and manually curated ontologies.
Subjects
Genetic Databases , Gene Ontology , Gene Regulatory Networks , Computer Graphics , Genetic Epistasis , Internet , Protein Interaction Mapping
ABSTRACT
Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP and SIGNOR demonstrate strong interaction prediction performance. These findings provide a benchmark for interactomes across diverse network biology applications and clarify factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.
ABSTRACT
Gene set analysis is a mainstay of functional genomics, but it relies on curated databases of gene functions that are incomplete. Here we evaluate five Large Language Models (LLMs) for their ability to discover the common biological functions represented by a gene set, substantiated by supporting rationale, citations and a confidence assessment. Benchmarking against canonical gene sets from the Gene Ontology, GPT-4 confidently recovered the curated name or a more general concept (73% of cases), while benchmarking against random gene sets correctly yielded zero confidence. Gemini-Pro and Mixtral-Instruct showed ability in naming but were falsely confident for random sets, whereas Llama2-70b had poor performance overall. In gene sets derived from 'omics data, GPT-4 identified novel functions not reported by classical functional enrichment (32% of cases), which independent review indicated were largely verifiable and not hallucinations. The ability to rapidly synthesize common gene functions positions LLMs as valuable 'omics assistants.
ABSTRACT
Motivation: Molecular Regulatory Pathways (MRPs) are crucial for understanding biological functions. Knowledge Graphs (KGs) have become vital in organizing and analyzing MRPs, providing structured representations of complex biological interactions. Current tools for mining KGs from biomedical literature are inadequate in capturing complex, hierarchical relationships and contextual information about MRPs. Large Language Models (LLMs) like GPT-4 offer a promising solution, with advanced capabilities to decipher the intricate nuances of language. However, their potential for end-to-end KG construction, particularly for MRPs, remains largely unexplored. Results: We present reguloGPT, a novel GPT-4-based in-context learning prompt, designed for the end-to-end joint named entity recognition, N-ary relationship extraction, and context predictions from a sentence that describes regulatory interactions with MRPs. Our reguloGPT approach introduces a context-aware relational graph that effectively embodies the hierarchical structure of MRPs and resolves semantic inconsistencies by embedding context directly within relational edges. We created a benchmark dataset including 400 annotated PubMed titles on N6-methyladenosine (m6A) regulations. Rigorous evaluation of reguloGPT on the benchmark dataset demonstrated marked improvement over existing algorithms. We further developed a novel G-Eval scheme, leveraging GPT-4 for annotation-free performance evaluation and demonstrated its agreement with traditional annotation-based evaluations. Utilizing reguloGPT predictions on m6A-related titles, we constructed the m6A-KG and demonstrated its utility in elucidating m6A's regulatory mechanisms in cancer phenotypes across various cancers. These results underscore reguloGPT's transformative potential for extracting biological knowledge from the literature.
Availability and implementation: The source code of reguloGPT, the m6A title and benchmark datasets, and m6A-KG are available at: https://github.com/Huang-AI4Medicine-Lab/reguloGPT.
ABSTRACT
Defining the subset of cellular factors governing SARS-CoV-2 replication can provide critical insights into viral pathogenesis and identify targets for host-directed antiviral therapies. While a number of genetic screens have previously reported SARS-CoV-2 host dependency factors, these approaches relied on utilizing pooled genome-scale CRISPR libraries, which are biased towards the discovery of host proteins impacting early stages of viral replication. To identify host factors involved throughout the SARS-CoV-2 infectious cycle, we conducted an arrayed genome-scale siRNA screen. Resulting data were integrated with published datasets to reveal pathways supported by orthogonal datasets, including transcriptional regulation, epigenetic modifications, and MAPK signalling. The identified proviral host factors were mapped into the SARS-CoV-2 infectious cycle, including 27 proteins that were determined to impact assembly and release. Additionally, a subset of proteins was tested across other coronaviruses, revealing 17 potential pan-coronavirus targets. Further studies illuminated a role for the heparan sulfate proteoglycan perlecan in SARS-CoV-2 viral entry, and found that inhibition of the non-canonical NF-κB pathway through targeting of BIRC2 restricts SARS-CoV-2 replication both in vitro and in vivo. These studies provide critical insight into the landscape of virus-host interactions driving SARS-CoV-2 replication as well as valuable targets for host-directed antivirals.
ABSTRACT
This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge2AI program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research.
ABSTRACT
Translating high-confidence (hc) autism spectrum disorder (ASD) genes into viable treatment targets remains elusive. We constructed a foundational protein-protein interaction (PPI) network in HEK293T cells involving 100 hcASD risk genes, revealing over 1,800 PPIs (87% novel). Interactors, expressed in the human brain and enriched for ASD but not schizophrenia genetic risk, converged on protein complexes involved in neurogenesis, tubulin biology, transcriptional regulation, and chromatin modification. A PPI map of 54 patient-derived missense variants identified differential physical interactions, and we leveraged AlphaFold-Multimer predictions to prioritize direct PPIs and specific variants for interrogation in Xenopus tropicalis and human forebrain organoids. A mutation in the transcription factor FOXP1 led to reconfiguration of DNA binding sites and altered development of deep cortical layer neurons in forebrain organoids. This work offers new insights into molecular mechanisms underlying ASD and describes a powerful platform to develop and test therapeutic strategies for many genetically-defined conditions.
ABSTRACT
BACKGROUND: Gene expression profiling and other genome-scale measurement technologies provide comprehensive information about molecular changes resulting from a chemical or genetic perturbation, or disease state. A critical challenge is the development of methods to interpret these large-scale data sets to identify specific biological mechanisms that can provide experimentally verifiable hypotheses and lead to the understanding of disease and drug action. RESULTS: We present a detailed description of Reverse Causal Reasoning (RCR), a reverse engineering methodology to infer mechanistic hypotheses from molecular profiling data. This methodology requires prior knowledge in the form of small networks that causally link a key upstream controller node representing a biological mechanism to downstream measurable quantities. These small directed networks are generated from a knowledge base of literature-curated qualitative biological cause-and-effect relationships expressed as a network. The small mechanism networks are evaluated as hypotheses to explain observed differential measurements. We provide a simple implementation of this methodology, Whistle, specifically geared towards the analysis of gene expression data and using prior knowledge expressed in Biological Expression Language (BEL). We present the Whistle analyses for three transcriptomic data sets using a publicly available knowledge base. The mechanisms inferred by Whistle are consistent with the expected biology for each data set. CONCLUSIONS: Reverse Causal Reasoning yields mechanistic insights to the interpretation of gene expression profiling data that are distinct from and complementary to the results of analyses using ontology or pathway gene sets. This reverse engineering algorithm provides an evidence-driven approach to the development of models of disease, drug action, and drug toxicity.
Subjects
High-Throughput Nucleotide Sequencing/methods , Knowledge Bases , Algorithms , Animals , Breast/cytology , Vascular Endothelium/cytology , Epithelial Cells/cytology , Gene Expression Profiling/methods , Human Genome , Histone-Lysine N-Methyltransferase/genetics , Humans , Insulin Resistance/genetics , Mice , Microarray Analysis , Molecular Probes/genetics , Nuclear Proteins/genetics
ABSTRACT
Cytoscape is an open-source bioinformatics environment for the analysis, integration, visualization, and query of biological networks. In this perspective piece, we describe our project to bring the Cytoscape desktop application to the web while explaining our strategy in ways relevant to others in the bioinformatics community. We examine opportunities and challenges in developing bioinformatics software that spans both the desktop and web, and we describe our ongoing efforts to build a Cytoscape web application, highlighting the principles that guide our development.
ABSTRACT
Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.
ABSTRACT
A longstanding goal of biomedicine is to understand how alterations in molecular and cellular networks give rise to the spectrum of human diseases. For diseases with shared etiology, understanding the common causes allows for improved diagnosis of each disease, development of new therapies and more comprehensive identification of disease genes. Accordingly, this protocol describes how to evaluate the extent to which two diseases, each characterized by a set of mapped genes, are colocalized in a reference gene interaction network. This procedure uses network propagation to measure the network 'distance' between gene sets. For colocalized diseases, the network can be further analyzed to extract common gene communities at progressive granularities. In particular, we show how to: (1) obtain input gene sets and a reference gene interaction network; (2) identify common subnetworks of genes that encompass or are in close proximity to all gene sets; (3) use multiscale community detection to identify systems and pathways represented by each common subnetwork to generate a network colocalized systems map; (4) validate identified genes and systems using a mouse variant database; and (5) visualize and further investigate select genes, interactions and systems for relevance to phenotype(s) of interest. We demonstrate the utility of this approach by identifying shared biological mechanisms underlying autism and congenital heart disease. However, this protocol is general and can be applied to any gene sets attributed to diseases or other phenotypes with suspected joint association. A typical NetColoc run takes less than an hour. Software and documentation are available at https://github.com/ucsd-ccbb/NetColoc.
Subjects
Gene Regulatory Networks , Software , Humans , Factual Databases , Computational Biology/methods
ABSTRACT
The DNA damage response (DDR) ensures error-free DNA replication and transcription and is disrupted in numerous diseases. An ongoing challenge is to determine the proteins orchestrating DDR and their organization into complexes, including constitutive interactions and those responding to genomic insult. Here, we use multi-conditional network analysis to systematically map DDR assemblies at multiple scales. Affinity purifications of 21 DDR proteins, with/without genotoxin exposure, are combined with multi-omics data to reveal a hierarchical organization of 605 proteins into 109 assemblies. The map captures canonical repair mechanisms and proposes new DDR-associated proteins extending to stress, transport, and chromatin functions. We find that protein assemblies closely align with genetic dependencies in processing specific genotoxins and that proteins in multiple assemblies typically act in multiple genotoxin responses. Follow-up by DDR functional readouts newly implicates 12 assembly members in double-strand-break repair. The DNA damage response assemblies map is available for interactive visualization and query (ccmi.org/ddram/).
Subjects
Chromatin , DNA Repair , DNA Repair/genetics , Chromatin/genetics , DNA Damage/genetics