ABSTRACT
Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.
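The abstract says SigCom LINCS ranks stored signatures as mimickers or reversers of a query given as up and down genes, but it does not give the scoring function. The sketch below assumes a deliberately simple score, not the server's algorithm: for each stored signature (a hypothetical mapping of gene to z-score), the mean z-score of the query's up genes minus the mean z-score of its down genes. Positive scores are treated as mimickers and negative scores as reversers. All signature and gene names are placeholders.

```python
from __future__ import annotations

from statistics import mean


def updown_score(signature: dict[str, float], up: set[str], down: set[str]) -> float:
    """Mean z-score of the query's up genes minus mean z-score of its down genes.

    Assumed convention: genes absent from the signature are skipped, and 0.0 is
    returned when either side has no measured genes.
    """
    up_vals = [signature[g] for g in up if g in signature]
    down_vals = [signature[g] for g in down if g in signature]
    if not up_vals or not down_vals:
        return 0.0
    return mean(up_vals) - mean(down_vals)


def rank_signatures(library, up, down):
    """Split signatures into mimickers (score > 0) and reversers (score < 0)."""
    scored = [(name, updown_score(sig, up, down)) for name, sig in library.items()]
    mimickers = sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])
    reversers = sorted((s for s in scored if s[1] < 0), key=lambda s: s[1])
    return mimickers, reversers


# Hypothetical toy library of signatures (gene -> z-score).
library = {
    "sig_A": {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.0},
    "sig_B": {"G1": -1.8, "G2": -1.2, "G3": 1.1, "G4": 2.2},
    "sig_C": {"G1": 0.1, "G2": -0.2, "G3": 0.0, "G4": 0.3},
}
mimickers, reversers = rank_signatures(library, up={"G1", "G2"}, down={"G3", "G4"})
```

Here sig_A pushes the up genes up and the down genes down, so it ranks as the strongest mimicker, while sig_B inverts the pattern and ranks as the strongest reverser. A production search would use a calibrated statistic over a genome-wide signature rather than this raw difference of means.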
Subject(s)
Metadata; Transcriptome; Transcriptome/genetics; Search Engine
ABSTRACT
Phosphoproteomics and proteomics experiments capture a global snapshot of the cellular signaling network, but these methods do not directly measure kinase state. Kinase Enrichment Analysis 3 (KEA3) is a webserver application that infers overrepresentation of upstream kinases whose putative substrates are in a user-submitted list of proteins. KEA3 can be applied to analyze data from phosphoproteomics and proteomics studies to predict the upstream kinases responsible for observed differential phosphorylation. The KEA3 background database contains measured and predicted kinase-substrate interactions (KSIs), kinase-protein interactions (KPIs), and interactions supported by co-expression and co-occurrence data. To benchmark the performance of KEA3, we examined whether KEA3 can predict the perturbed kinase in experiments where a single kinase was perturbed and gene expression was then profiled, and in phosphoproteomics data collected after treatment with kinase-targeting small molecules. We show that integrating KSIs and KPIs across data sources to produce a composite ranking improves the recovery of the expected kinase. The KEA3 webserver is available at https://maayanlab.cloud/kea3.
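The abstract credits the improved recovery of the expected kinase to a composite ranking across data sources. A minimal sketch of one such composite, assuming a mean-rank scheme in which each kinase's rank is averaged only over the libraries where it appears (the exact handling of missing kinases is an assumption here), could look like this. Library and kinase names are illustrative placeholders, and the per-library rankings are taken as given rather than computed.

```python
from __future__ import annotations

from collections import defaultdict


def mean_rank(library_rankings: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Average each kinase's 1-based rank over the libraries that rank it.

    Assumption: a kinase absent from a library contributes nothing for that
    library (it is not penalized with a worst-case rank).
    """
    ranks: dict[str, list[int]] = defaultdict(list)
    for ranking in library_rankings.values():
        for position, kinase in enumerate(ranking, start=1):
            ranks[kinase].append(position)
    composite = {k: sum(r) / len(r) for k, r in ranks.items()}
    # Lower mean rank is better; ties are broken alphabetically for stability.
    return sorted(composite.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical per-library rankings (best first) for one input protein list.
rankings = {
    "ksi_library": ["CDK1", "MAPK1", "AKT1"],
    "kpi_library": ["MAPK1", "CDK1", "GSK3B"],
    "coexpression_library": ["MAPK1", "AKT1", "CDK1"],
}
composite = mean_rank(rankings)
```

MAPK1 wins because it ranks near the top in every library, even though CDK1 tops one of them. The trade-off of skipping missing libraries is that a kinase seen in only one library is not penalized; a stricter variant would assign it the library length plus one.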
Subject(s)
Protein Kinases/metabolism; Software; Gene Expression; Humans; Phosphorylation; Protein Kinase Inhibitors; Proteomics; SARS-CoV-2/enzymology
ABSTRACT
The frequency with which genes are studied correlates with the prior knowledge accumulated about them. This leads to an imbalance in research attention, where some genes are highly investigated while others are ignored. Geneshot is a search engine developed to illuminate this gap and to promote attention to the understudied genome. Through a simple web interface, Geneshot enables researchers to enter arbitrary search terms and receive ranked lists of genes relevant to those terms. Returned ranked gene lists contain genes that were previously published in association with the search terms, as well as genes predicted to be associated with the terms based on data integration from multiple sources. The search results are presented with interactive visualizations. To predict gene function, Geneshot uses gene-gene similarity matrices computed from processed RNA-seq data or from gene-gene co-occurrence data obtained from multiple sources. In addition, Geneshot can be used to analyze the novelty of gene sets and augment gene sets with additional relevant genes. The Geneshot web-server and API are freely and openly available from https://amp.pharm.mssm.edu/geneshot.
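Geneshot predicts genes for a search term from gene-gene similarity matrices. The abstract does not give the scoring rule, so the sketch below assumes a simple one: score every gene not already linked to the term by its mean similarity to the genes that are. The similarity values and gene names are hypothetical placeholders standing in for an RNA-seq co-expression or co-occurrence matrix.

```python
from __future__ import annotations


def predict_genes(seed_genes: set[str], similarity: dict[str, dict[str, float]], top_k: int = 3):
    """Rank non-seed genes by mean similarity to the seed genes (assumed scheme)."""
    seeds = [g for g in seed_genes if g in similarity]
    if not seeds:
        return []
    candidates = [g for g in similarity if g not in seed_genes]
    scores = {
        gene: sum(similarity[s].get(gene, 0.0) for s in seeds) / len(seeds)
        for gene in candidates
    }
    # Highest mean similarity first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical symmetric gene-gene similarity matrix.
similarity = {
    "A": {"B": 0.9, "C": 0.1, "D": 0.4, "E": 0.0},
    "B": {"A": 0.9, "C": 0.2, "D": 0.5, "E": 0.1},
    "C": {"A": 0.1, "B": 0.2, "D": 0.3, "E": 0.8},
    "D": {"A": 0.4, "B": 0.5, "C": 0.3, "E": 0.2},
    "E": {"A": 0.0, "B": 0.1, "C": 0.8, "D": 0.2},
}
# Genes "A" and "B" play the role of genes already published with the term.
predicted = predict_genes({"A", "B"}, similarity, top_k=2)
```

Gene D is predicted first because it is moderately similar to both seeds, while E, which is similar only to a non-seed gene, ranks last.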
Subject(s)
Genes; Software; Data Mining; Gene Expression; Internet; Publications; RNA-Seq; Research Personnel; User-Computer Interface
ABSTRACT
High-throughput experiments produce increasingly large datasets that are difficult to analyze and integrate. While most data integration approaches focus on aligning metadata, data integration can also be achieved by abstracting experimental results into gene sets. Such gene sets can be made available for reuse through gene set enrichment analysis tools such as Enrichr. Enrichr currently supports only gene sets compiled from human and mouse, limiting accessibility for investigators who study other model organisms. modEnrichr is an expansion of Enrichr to four model organisms: fish, fly, worm, and yeast. The gene set libraries within FishEnrichr, FlyEnrichr, WormEnrichr, and YeastEnrichr are created from the Gene Ontology, mRNA expression profiles, GeneRIF, pathway databases, protein domain databases, and other organism-specific resources. Additionally, libraries were created by predicting gene function from RNA-seq co-expression data processed uniformly from the Gene Expression Omnibus (GEO) for each organism. The modEnrichr suite of tools can convert gene lists across species using an ortholog conversion tool that automatically detects the input species. For complex analyses, modEnrichr provides API access for submitting batch queries. In summary, modEnrichr leverages existing model organism databases and other resources to facilitate comprehensive hypothesis generation through data integration.
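The ortholog conversion step with automatic species detection can be sketched as follows. This is an illustration, not modEnrichr's implementation: the species is assumed to be the one whose symbol vocabulary matches the most input genes, and conversion to human symbols is an assumed target. The ortholog table is a tiny, deliberately incomplete hand-written sample (`ORTHOLOGS` and both function names are hypothetical); a real tool would load a curated ortholog database.

```python
from __future__ import annotations

# Tiny illustrative ortholog table (incomplete; assumed for this sketch only).
ORTHOLOGS: dict[str, dict[str, list[str]]] = {
    "yeast": {"CDC28": ["CDK1"], "TOR1": ["MTOR"]},
    "worm": {"daf-2": ["INSR", "IGF1R"], "let-363": ["MTOR"]},
}


def detect_species(genes: list[str], orthologs: dict[str, dict[str, list[str]]]) -> str | None:
    """Pick the species whose symbol set matches the most input genes."""
    counts = {sp: sum(g in table for g in genes) for sp, table in orthologs.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def to_human(genes: list[str], orthologs: dict[str, dict[str, list[str]]]):
    """Return (detected species, de-duplicated human orthologs in input order)."""
    species = detect_species(genes, orthologs)
    if species is None:
        return None, []
    human: list[str] = []
    for gene in genes:
        for ortholog in orthologs[species].get(gene, []):
            if ortholog not in human:
                human.append(ortholog)
    return species, human


species, human = to_human(["daf-2", "let-363"], ORTHOLOGS)
```

Detecting the species by vocabulary overlap works because symbol conventions differ strongly between organisms (for example, hyphenated lowercase worm symbols), but it can be ambiguous for short or shared symbols, which is why a real tool would likely also let users override the detected species.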
Subject(s)
Databases, Genetic; Gene Expression/genetics; Gene Library; Genomic Library; Software; Animals; Computational Biology; Gene Ontology; Humans; Metadata
ABSTRACT
While gene expression data at the mRNA level can be globally and accurately measured, profiling the activity of cell signaling pathways is currently much more difficult. eXpression2Kinases (X2K) computationally predicts involvement of upstream cell signaling pathways, given a signature of differentially expressed genes. X2K first computes enrichment for transcription factors likely to regulate the expression of the differentially expressed genes. The next step of X2K connects these enriched transcription factors through known protein-protein interactions (PPIs) to construct a subnetwork. The final step performs kinase enrichment analysis on the members of the subnetwork. X2K Web is a new implementation of the original eXpression2Kinases algorithm with important enhancements. X2K Web includes many new transcription factor and kinase libraries, and PPI networks. For demonstration, thousands of gene expression signatures induced by kinase inhibitors, applied to six breast cancer cell lines, are provided for fetching directly into X2K Web. The results are displayed as interactive downloadable vector graphic network images and bar graphs. Benchmarking various settings via random permutations enabled the identification of an optimal set of parameters to be used as the default settings in X2K Web. X2K Web is freely available from http://X2K.cloud.
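The three X2K steps described above (transcription factor enrichment, PPI subnetwork construction, kinase enrichment) can be sketched end to end on toy data. This is a structural sketch under stated assumptions, not X2K Web's code: raw overlap counts stand in for the statistical enrichment tests, and the subnetwork is simplified to the top transcription factors plus their direct PPI neighbors instead of paths through intermediate proteins. All library, gene, and function names are hypothetical.

```python
from __future__ import annotations


def rank_by_overlap(query: set[str], library: dict[str, set[str]]) -> list[str]:
    """Terms sharing at least one member with `query`, largest overlap first.

    Assumption: overlap counts stand in for a proper enrichment test.
    """
    scores = {name: len(query & members) for name, members in library.items()}
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


def x2k_sketch(de_genes, tf_targets, ppi_edges, kinase_substrates, top_tfs=2):
    # Step 1: transcription factors whose targets overlap the DE genes.
    tfs = rank_by_overlap(de_genes, tf_targets)[:top_tfs]
    # Step 2: connect the TFs through PPIs (simplified to direct neighbors).
    subnetwork = set(tfs)
    for a, b in ppi_edges:
        if a in tfs:
            subnetwork.add(b)
        if b in tfs:
            subnetwork.add(a)
    # Step 3: kinases whose substrates overlap the subnetwork members.
    kinases = rank_by_overlap(subnetwork, kinase_substrates)
    return tfs, subnetwork, kinases


# Hypothetical toy inputs.
de_genes = {"g1", "g2", "g3", "g4"}
tf_targets = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g3", "g4"}, "TF_C": {"g9"}}
ppi_edges = [("TF_A", "P1"), ("P2", "TF_B"), ("P3", "P4")]
kinase_substrates = {"KIN_X": {"TF_A", "P1", "P2"}, "KIN_Y": {"P3"}, "KIN_Z": {"TF_B"}}
tfs, subnetwork, kinases = x2k_sketch(de_genes, tf_targets, ppi_edges, kinase_substrates)
```

The `top_tfs` cutoff is the kind of parameter that X2K Web tuned by benchmarking; the value 2 here is arbitrary.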
Subject(s)
Gene Expression; Protein Kinases/metabolism; Signal Transduction; Software; Animals; Cell Line, Tumor; Gene Expression/drug effects; Humans; Internet; Mice; Protein Interaction Mapping; Protein Kinase Inhibitors/pharmacology; Signal Transduction/genetics; Transcription Factors/metabolism
ABSTRACT
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain, Enrichr. Enrichr contains a large collection of diverse gene set libraries available for analysis and download: in total, 180 184 annotated gene sets from 102 gene set libraries. New features include the ability to submit fuzzy sets and upload BED files, an improved application programming interface, and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.
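At its core, overrepresentation analysis of a gene list against a library asks whether the overlap with each annotated set is larger than expected by chance. Enrichr reports Fisher's exact test p-values among other scores; the sketch below computes only the one-sided Fisher p-value, written as the hypergeometric upper tail, using the standard library. The library, gene names, and background size are toy assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn).

    Equal to the one-sided (greater) Fisher exact test p-value.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrich(query: set, library: dict, background_size: int):
    """Return (term, overlap size, p-value) for overlapping terms, smallest p first."""
    results = []
    for term, members in library.items():
        overlap = query & members
        if not overlap:
            continue
        p = hypergeom_sf(len(overlap), background_size, len(members), len(query))
        results.append((term, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy library and query against a background of 100 genes.
library = {"TERM_1": {"A", "B", "C", "D"}, "TERM_2": {"A", "X", "Y", "Z"}}
results = enrich({"A", "B", "C"}, library, background_size=100)
```

TERM_1 captures all three query genes and gets p = 4 / C(100, 3), about 2.5e-5, whereas TERM_2 shares a single gene and is far less significant. Exact integer arithmetic via `math.comb` avoids overflow for genome-sized backgrounds, at some cost in speed.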
Subject(s)
Computational Biology/methods; Gene Library; Gene Ontology; User-Computer Interface; Benchmarking; Computational Biology/statistics & numerical data; Databases, Genetic; Gene Expression Profiling; Genome, Human; Humans; Internet; Molecular Sequence Annotation
ABSTRACT
Long non-coding ribonucleic acids (lncRNAs) account for the largest group of non-coding RNAs. However, knowledge about their function and regulation is limited. lncHUB2 is a web server database that provides known and inferred knowledge about the function of 18 705 human and 11 274 mouse lncRNAs. lncHUB2 produces reports that contain the secondary structure fold of the lncRNA, related publications, the most correlated coding genes, the most correlated lncRNAs, a network that visualizes the most correlated genes, predicted mouse phenotypes, predicted membership in biological processes and pathways, predicted upstream transcription factor regulators, and predicted disease associations. In addition, the reports include subcellular localization information; expression across tissues, cell types, and cell lines; and predicted small molecules and CRISPR knockout (CRISPR-KO) genes prioritized by their likelihood of up- or downregulating the expression of the lncRNA. Overall, lncHUB2 is a database with rich information about human and mouse lncRNAs, and as such it can facilitate hypothesis generation for many future studies. Database URL: https://maayanlab.cloud/lncHUB2.
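The "most correlated coding genes" section of a lncHUB2 report is a ranking of genes by co-expression with the lncRNA. The abstract does not say how correlations are computed, so the sketch below assumes Pearson correlation across samples on a hypothetical toy expression matrix; the gene and lncRNA names are placeholders.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def most_correlated(target: str, expression: dict, top_k: int = 2):
    """Genes ranked by correlation with `target` across samples, highest first."""
    ref = expression[target]
    scores = [(g, pearson(ref, v)) for g, v in expression.items() if g != target]
    return sorted(scores, key=lambda s: -s[1])[:top_k]


# Hypothetical expression values across four samples.
expression = {
    "LNC1": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with LNC1
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated
    "GENE_C": [1.0, 3.0, 2.0, 4.0],  # partially correlated (r = 0.8)
}
top = most_correlated("LNC1", expression)
```

Ranking by signed correlation puts GENE_B last; a report that also wanted strongly anti-correlated genes would rank by absolute value instead.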
Subject(s)
RNA, Long Noncoding; Humans; Animals; Mice; Cell Line; Clustered Regularly Interspaced Short Palindromic Repeats; Databases, Factual; Knowledge
ABSTRACT
Motivation: Biological and biomedical researchers commonly search for information about genes and drugs. For the most part, such information is served as landing pages in disparate data repositories and web portals.
Results: The Gene and Drug Landing Page Aggregator (GDLPA) provides users with access to 50 gene-centric and 19 drug-centric repositories, enabling them to retrieve landing pages corresponding to their gene and drug queries. Bringing these resources together into one dashboard that directs users to landing pages across many resources can help centralize gene- and drug-centric knowledge and raise awareness of available resources that may be missed when using standard search engines. To demonstrate the utility of GDLPA, we developed case studies for the gene klotho and the drug remdesivir. The first case study highlights the potential role of klotho as a drug target for aging and kidney disease, while the second gathers knowledge regarding the approval, usage, and safety of remdesivir, the first approved coronavirus disease 2019 (COVID-19) therapeutic. Finally, based on our experience, we provide guidelines for developing effective landing pages for genes and drugs.
Availability and implementation: GDLPA is open source and is available from: https://cfde-gene-pages.cloud/.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
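An aggregator of this kind can be thought of as a table of per-resource URL templates that are filled in with the user's gene or drug query. The sketch below assumes that design; the three templates are examples of public URL patterns chosen for illustration, not GDLPA's actual configuration of 50 gene and 19 drug resources, and `landing_pages` is a hypothetical helper.

```python
from urllib.parse import quote

# Illustrative URL templates (assumed examples, not GDLPA's resource list).
GENE_TEMPLATES = {
    "NCBI Gene": "https://www.ncbi.nlm.nih.gov/gene/?term={q}",
    "GeneCards": "https://www.genecards.org/cgi-bin/carddisp.pl?gene={q}",
}
DRUG_TEMPLATES = {
    "PubChem": "https://pubchem.ncbi.nlm.nih.gov/compound/{q}",
}


def landing_pages(query: str, templates: dict) -> dict:
    """Fill every template with the URL-escaped query, keyed by resource name."""
    escaped = quote(query, safe="")
    return {name: template.format(q=escaped) for name, template in templates.items()}


gene_links = landing_pages("KL", GENE_TEMPLATES)  # KL is the human klotho gene symbol
drug_links = landing_pages("remdesivir", DRUG_TEMPLATES)
```

Escaping the query with `safe=""` keeps characters such as spaces or slashes in drug names from breaking the generated URLs; some resources additionally need identifier lookups (for example, mapping a symbol to a database ID) that a template alone cannot express.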
ABSTRACT
Understanding the underlying molecular and structural similarities between seemingly heterogeneous sets of drugs can aid in identifying drug repurposing opportunities and assist in the discovery of novel properties of preclinical small molecules. A wealth of information about drug and small-molecule structure, targets, indications, and side effects; induced gene expression signatures; and other attributes is publicly available through web-based tools, databases, and repositories. By processing, abstracting, and aggregating information from these resources into drug set libraries, knowledge about novel properties of drugs and small molecules can be systematically imputed with machine learning. In addition, drug set libraries can be used as the underlying database for drug set enrichment analysis. Here, we present Drugmonizome, a database with a search engine for querying annotated sets of drugs and small molecules and for performing drug set enrichment analysis. Utilizing the data within Drugmonizome, we also developed Drugmonizome-ML. Drugmonizome-ML enables users to construct customized machine learning pipelines using the drug set libraries from Drugmonizome. To demonstrate the utility of Drugmonizome, drug sets from 12 independent SARS-CoV-2 in vitro screens were subjected to consensus enrichment analysis. Despite the low overlap among these 12 independent in vitro screens, we identified common biological processes critical for blocking viral replication. To demonstrate Drugmonizome-ML, we constructed a machine learning pipeline to predict whether approved and preclinical drugs may induce peripheral neuropathy as a potential side effect. Overall, the Drugmonizome and Drugmonizome-ML resources provide rich and diverse knowledge about drugs and small molecules for direct systems pharmacology applications. Database URL: https://maayanlab.cloud/drugmonizome/.
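The idea of imputing drug properties from drug set libraries rests on inverting each library (term to drugs) into per-drug binary features and then learning from labeled drugs. The abstract does not name the model Drugmonizome-ML uses, so the sketch below swaps in a plain k-nearest-neighbor classifier with Jaccard similarity as a stand-in. The library terms, drug names, and the True/False labels (standing for "induces peripheral neuropathy") are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def feature_matrix(drug_set_library: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert term -> drugs into drug -> terms (a sparse binary feature matrix)."""
    features: dict[str, set[str]] = {}
    for term, drugs in drug_set_library.items():
        for drug in drugs:
            features.setdefault(drug, set()).add(term)
    return features


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(drug, features, labels, k=1):
    """Majority label among the k labeled drugs with the most similar features.

    Stand-in model (k-nearest neighbors by Jaccard), assumed for illustration.
    """
    query = features.get(drug, set())
    neighbors = sorted(
        (d for d in labels if d != drug),
        key=lambda d: (-jaccard(query, features.get(d, set())), d),
    )[:k]
    votes = Counter(labels[d] for d in neighbors)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical drug set library and side-effect labels.
library = {
    "TARGET_T1": {"drug_a", "drug_b", "drug_q"},
    "SIDE_EFFECT_S1": {"drug_a", "drug_q"},
    "TARGET_T2": {"drug_c"},
}
labels = {"drug_a": True, "drug_b": False, "drug_c": False}
features = feature_matrix(library)
prediction = predict("drug_q", features, labels)
```

drug_q shares every annotation with drug_a, so it inherits drug_a's label. In practice the same feature matrix could feed any standard classifier, with cross-validation to judge whether the predictions generalize.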
Subject(s)
COVID-19 Drug Treatment; Databases, Pharmaceutical; SARS-CoV-2/drug effects; Antiviral Agents/chemistry; Antiviral Agents/pharmacology; COVID-19/virology; Drug Discovery; Drug Evaluation, Preclinical; Drug Repositioning; Drug-Related Side Effects and Adverse Reactions; Humans; In Vitro Techniques; Machine Learning; Peripheral Nervous System Diseases/chemically induced; SARS-CoV-2/physiology; Small Molecule Libraries; User-Computer Interface; Virus Replication/drug effects
ABSTRACT
Profiling samples from patients, tissues, and cells with genomics, transcriptomics, epigenomics, proteomics, and metabolomics ultimately produces lists of genes and proteins that need to be further analyzed and integrated in the context of known biology. Enrichr (Chen et al., 2013; Kuleshov et al., 2016) is a gene set search engine that enables the querying of hundreds of thousands of annotated gene sets. Enrichr uniquely integrates knowledge from many high-profile projects to provide synthesized information about mammalian genes and gene sets. The platform provides various methods to compute gene set enrichment, and the results are visualized in several interactive ways. This protocol provides a summary of the key features of Enrichr, which include using Enrichr programmatically and embedding an Enrichr button on any website. © 2021 Wiley Periodicals LLC.
Basic Protocol 1: Analyzing lists of differentially expressed genes from transcriptomics, proteomics and phosphoproteomics, GWAS studies, or other experimental studies
Basic Protocol 2: Searching Enrichr by a single gene or key search term
Basic Protocol 3: Preparing raw or processed RNA-seq data through BioJupies in preparation for Enrichr analysis
Basic Protocol 4: Analyzing gene sets for model organisms using modEnrichr
Basic Protocol 5: Using Enrichr in Geneshot
Basic Protocol 6: Using Enrichr in ARCHS4
Basic Protocol 7: Using the enrichment analysis visualization Appyter to visualize Enrichr results
Basic Protocol 8: Using the Enrichr API
Basic Protocol 9: Adding an Enrichr button to a website.
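Basic Protocol 8 covers programmatic access. A minimal standard-library sketch of the two-call pattern (submit a gene list, then fetch results for one library) is below. The endpoint names `addList` and `enrich` and the parameters `list`, `description`, `userListId`, and `backgroundType` follow Enrichr's public API documentation as we understand it; the base URL `https://maayanlab.cloud/Enrichr` (the abstract above cites the older amp.pharm.mssm.edu host) and the shape of the returned JSON are assumptions to verify against the current API help page. Only the request builders are exercised offline; `run_enrichment` performs network calls.

```python
import json
import urllib.parse
import urllib.request
import uuid

ENRICHR = "https://maayanlab.cloud/Enrichr"  # assumed current base URL


def build_add_list_request(genes, description=""):
    """Multipart POST that uploads a newline-separated gene list."""
    boundary = uuid.uuid4().hex
    fields = {"list": "\n".join(genes), "description": description}
    parts = [
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
        for name, value in fields.items()
    ]
    parts.append(f"--{boundary}--\r\n")
    return urllib.request.Request(
        f"{ENRICHR}/addList",
        data="".join(parts).encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_enrich_request(user_list_id, library):
    """GET request for enrichment results of one submitted list in one library."""
    query = urllib.parse.urlencode({"userListId": user_list_id, "backgroundType": library})
    return urllib.request.Request(f"{ENRICHR}/enrich?{query}")


def run_enrichment(genes, library):
    """Network call: submit the list, then fetch results (response shape assumed)."""
    with urllib.request.urlopen(build_add_list_request(genes)) as resp:
        user_list_id = json.load(resp)["userListId"]
    with urllib.request.urlopen(build_enrich_request(user_list_id, library)) as resp:
        return json.load(resp)[library]
```

Separating request construction from sending keeps the sketch testable without network access; for batch use, the `userListId` returned by the first call can be reused to query several libraries without re-uploading the list.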
Subject(s)
Knowledge Discovery; Software; Animals; Computational Biology; Genomics; Humans; RNA-Seq
ABSTRACT
The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has prompted a rapid response from the research community, offering suggestions for repurposing approved drugs and improving our understanding of the molecular mechanisms of the viral life cycle. In a short period, tens of thousands of research preprints and other publications have emerged, including those that report lists of experimentally validated drugs and compounds as potential COVID-19 therapies. In addition, gene sets from interacting virus-host proteins and differentially expressed genes from comparisons of infected and uninfected cells are being published at a fast rate. To organize this rapidly accumulating knowledge, we developed the COVID-19 Gene and Drug Set Library (https://amp.pharm.mssm.edu/covid19/), a collection of gene and drug sets related to COVID-19 research from multiple sources. The COVID-19 Gene and Drug Set Library is delivered as a web-based interface that enables users to view, download, analyze, visualize, and contribute gene and drug sets related to COVID-19 research. To evaluate the content of the library, we performed several analyses, including comparing the results from six in vitro drug screens for COVID-19 repurposing candidates. Surprisingly, we observe little overlap across these initial screens. The most frequent hit across these screens is mefloquine, an antimalarial drug that should receive more attention as a potential therapeutic for COVID-19. Overall, the library of gene and drug sets can be used to identify community consensus, make researchers and clinicians aware of new potential therapies, and allow the research community to work together toward a cure for COVID-19.
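Comparing drug screens as described above comes down to two simple measurements: how much each pair of hit lists overlaps, and how many screens each drug appears in. The sketch below assumes Jaccard similarity for the pairwise overlap and a plain hit count for consensus; the screen names and drug identifiers are hypothetical placeholders, not data from the library.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen_overlap(screens: dict):
    """Pairwise Jaccard between screens, plus drugs ordered by number of screens hit."""
    pairwise = {
        (x, y): jaccard(screens[x], screens[y])
        for x, y in combinations(sorted(screens), 2)
    }
    hit_counts = Counter(drug for hits in screens.values() for drug in hits)
    return pairwise, hit_counts.most_common()


# Hypothetical hit lists from three in vitro screens.
screens = {
    "screen_1": {"d1", "d2", "d3"},
    "screen_2": {"d1", "d4"},
    "screen_3": {"d1", "d2", "d5"},
}
pairwise, ranked_hits = screen_overlap(screens)
```

Low pairwise Jaccard values are what "little overlap" looks like numerically, while the top of `ranked_hits` surfaces consensus candidates such as a drug found in every screen.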
ABSTRACT
In a short period, many research publications that report sets of experimentally validated drugs as potential COVID-19 therapies have emerged. To organize this accumulating knowledge, we developed the COVID-19 Drug and Gene Set Library (https://amp.pharm.mssm.edu/covid19/), a collection of drug and gene sets related to COVID-19 research from multiple sources. The platform enables users to view, download, analyze, visualize, and contribute drug and gene sets related to COVID-19 research. To evaluate the content of the library, we compared the results from six in vitro drug screens for COVID-19 repurposing candidates. Surprisingly, we observe low overlap across screens, and we highlight overlapping candidates that should receive more attention as potential therapeutics for COVID-19. Overall, the COVID-19 Drug and Gene Set Library can be used to identify community consensus, make researchers and clinicians aware of new potential therapies, enable machine-learning applications, and help the research community work together toward a cure.
ABSTRACT
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as for evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes the sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.
ABSTRACT
The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program that catalogs how human cells globally respond to chemical, genetic, and disease perturbations. Resources generated by LINCS include experimental and computational methods, visualization tools, molecular and imaging data, and signatures. By assembling an integrated picture of the range of responses of human cells exposed to many perturbations, the LINCS program aims to better understand human disease and to advance the development of new therapies. Perturbations under study include drugs, genetic perturbations, tissue micro-environments, antibodies, and disease-causing mutations. Responses to perturbations are measured by transcript profiling, mass spectrometry, cell imaging, and biochemical methods, among other assays. The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenerative disorders. This Perspective describes LINCS technologies, datasets, tools, and approaches to data accessibility and reusability.