Search | BVS Bolivia

1.

Rummagene: massive mining of gene sets from supporting materials of biomedical research publications.

Clarke, Daniel J B; Marino, Giacomo B; Deng, Eden Z; Xie, Zhuorui; Evangelista, John Erol; Ma'ayan, Avi.

Commun Biol ; 7(1): 482, 2024 Apr 20.

Article in English | MEDLINE | ID: mdl-38643247

ABSTRACT

Many biomedical research publications contain gene sets in their supporting tables, and these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. So far, we scanned 5,448,589 articles to find 121,237 articles that contain 642,389 gene sets. These sets are served for enrichment analysis, free text, and table title search. Investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses, and for gene function predictions. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene brings to the surface the ability to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at https://rummagene.com .

Subject(s)

Biomedical Research , Data Mining , Animals , Software , Databases, Factual , Gene Expression Regulation , Mammals
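
The enrichment analysis mentioned in the Rummagene abstract can be illustrated with a gene set overlap test. The following is a minimal sketch that assumes a right-tailed hypergeometric test; the abstract does not state which statistic the server uses, and the function name, gene symbols, and background size are hypothetical.

```python
from math import comb


def overlap_pvalue(query, library_set, background_size):
    """Right-tailed hypergeometric p-value for the overlap of two gene sets.

    Assumption: every gene in both sets belongs to a shared background
    of `background_size` genes.
    """
    query, library_set = set(query), set(library_set)
    k = len(query & library_set)          # observed overlap
    n, m = len(query), len(library_set)   # sizes of the two sets
    total = comb(background_size, n)      # all ways to draw the query set
    # P(X >= k): sum the probabilities of every overlap at least as large
    p = sum(
        comb(m, i) * comb(background_size - m, n - i)
        for i in range(k, min(n, m) + 1)
    ) / total
    return k, p


# Hypothetical example: a user gene list against one published gene set
overlap, p_value = overlap_pvalue(
    ["STAT3", "JAK2", "SOCS3"], ["JAK2", "SOCS3", "IL6"], background_size=20000
)
```

A small p-value suggests the overlap is larger than expected by chance; in practice, the p-values obtained across many gene sets would also need multiple-testing correction.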

2.

D2H2: diabetes data and hypothesis hub.

Marino, Giacomo B; Ahmed, Nasheath; Xie, Zhuorui; Jagodnik, Kathleen M; Han, Jason; Clarke, Daniel J B; Lachmann, Alexander; Keller, Mark P; Attie, Alan D; Ma'ayan, Avi.

Bioinform Adv ; 3(1): vbad178, 2023.

Article in English | MEDLINE | ID: mdl-38107655

ABSTRACT

Motivation: There is a rapid growth in the production of omics datasets collected by the diabetes research community. However, such published data are underutilized for knowledge discovery. To make bioinformatics tools and published omics datasets from the diabetes field more accessible to biomedical researchers, we developed the Diabetes Data and Hypothesis Hub (D2H2). Results: D2H2 contains hundreds of high-quality curated transcriptomics datasets relevant to diabetes, accessible via a user-friendly web-based portal. The collected and processed datasets are curated from the Gene Expression Omnibus (GEO). Each curated study has a dedicated page that provides data visualization, differential gene expression analysis, and single-gene queries. To enable the investigation of these curated datasets and to provide easy access to bioinformatics tools that serve gene and gene set-related knowledge, we developed the D2H2 chatbot. Utilizing GPT, we prompt users to enter free text about their data analysis needs. Parsing the user prompt, together with specifying information about all D2H2 available tools and workflows, we answer user queries by invoking the most relevant tools via the tools' API. D2H2 also has a hypotheses generation module where gene sets are randomly selected from the bulk RNA-seq precomputed signatures. We then find highly overlapping gene sets extracted from publications listed in PubMed Central with abstract dissimilarity. With the help of GPT, we speculate about a possible explanation of the high overlap between the gene sets. Overall, D2H2 is a platform that provides a suite of bioinformatics tools and curated transcriptomics datasets for hypothesis generation. Availability and implementation: D2H2 is available at: https://d2h2.maayanlab.cloud/ and the source code is available from GitHub at https://github.com/MaayanLab/D2H2-site under the CC BY-NC 4.0 license.

3.

Dex-Benchmark: datasets and code to evaluate algorithms for transcriptomics data analysis.

Xie, Zhuorui; Chen, Clara; Ma'ayan, Avi.

PeerJ ; 11: e16351, 2023.

Article in English | MEDLINE | ID: mdl-37953774

ABSTRACT

Many tools and algorithms are available for analyzing transcriptomics data. These include algorithms for performing sequence alignment, data normalization and imputation, clustering, identifying differentially expressed genes, and performing gene set enrichment analysis. To make the best choice about which tools to use, objective benchmarks can be developed to compare the quality of different algorithms to extract biological knowledge maximally and accurately from these data. The Dexamethasone Benchmark (Dex-Benchmark) resource aims to fill this need by providing the community with datasets and code templates for benchmarking different gene expression analysis tools and algorithms. The resource provides access to a collection of curated RNA-seq, L1000, and ChIP-seq data from dexamethasone treatment as well as genetic perturbations of its known targets. In addition, the website provides Jupyter Notebooks that use these pre-processed curated datasets to demonstrate how to benchmark the different steps in gene expression analysis. By comparing two independent data sources and data types with some expected concordance, we can assess which tools and algorithms best recover such associations. To demonstrate the usefulness of the resource for discovering novel drug targets, we applied it to optimize data processing strategies for the chemical perturbations and CRISPR single gene knockouts from the L1000 transcriptomics data from the Library of Integrated Network Cellular Signatures (LINCS) program, with a focus on understudied proteins from the Illuminating the Druggable Genome (IDG) program. Overall, the Dex-Benchmark resource can be utilized to assess the quality of transcriptomics and other related bioinformatics data analysis workflows. The resource is available from: https://maayanlab.github.io/dex-benchmark.

Subject(s)

Benchmarking , Transcriptome , Transcriptome/genetics , Algorithms , Gene Expression Profiling , Dexamethasone

4.

Toxicology knowledge graph for structural birth defects.

Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Marino, Giacomo B; Utti, Vivian; Jenkins, Sherry L; Ahooyi, Taha Mohseni; Bologa, Cristian G; Yang, Jeremy J; Binder, Jessica L; Kumar, Praveen; Lambert, Christophe G; Grethe, Jeffrey S; Wenger, Eric; Taylor, Deanne; Oprea, Tudor I; de Bono, Bernard; Ma'ayan, Avi.

Commun Med (Lond) ; 3(1): 98, 2023 Jul 17.

Article in English | MEDLINE | ID: mdl-37460679

ABSTRACT

BACKGROUND: Birth defects are functional and structural abnormalities that impact about 1 in 33 births in the United States. They have been attributed to genetic and other factors such as drugs, cosmetics, food, and environmental pollutants during pregnancy, but for most birth defects there are no known causes. METHODS: To further characterize associations between small molecule compounds and their potential to induce specific birth abnormalities, we gathered knowledge from multiple sources to construct a reproductive toxicity Knowledge Graph (ReproTox-KG) with a focus on associations between birth defects, drugs, and genes. Specifically, we gathered data from drug/birth-defect associations from co-mentions in published abstracts, gene/birth-defect associations from genetic studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecules. RESULTS: Using ReproTox-KG and semi-supervised learning (SSL), we scored >30,000 preclinical small molecules for their potential to cross the placenta and induce birth defects, and identified >500 birth-defect/gene/drug cliques that can be used to explain molecular mechanisms for drug-induced birth defects. The ReproTox-KG can be accessed via a web-based user interface available at https://maayanlab.cloud/reprotox-kg . This site enables users to explore the associations between birth defects, approved and preclinical drugs, and all human genes. CONCLUSIONS: ReproTox-KG provides a resource for exploring knowledge about the molecular mechanisms of birth defects with the potential of predicting the likelihood of genes and preclinical small molecules to induce birth defects.

While birth defects are common, for most birth defects there are no known causes. During pregnancy, developing babies are exposed to drugs, cosmetics, food, and environmental pollutants that may cause birth defects. However, exactly how these environmental factors are involved in producing birth defects is difficult to discern. Also, birth defects can be a consequence of the genes inherited from the parents. We combined general data about human genes and drugs with specific data previously implicating genes and drugs in inducing birth defects to create a knowledge graph representation that connects genes, drugs, and birth defects. This knowledge graph can be used to explore new links that may explain why birth defects occur, particularly those that result from a combination of inherited and environmental influences.

5.

GeneRanger and TargetRanger: processed gene and protein expression levels across cells and tissues for target discovery.

Marino, Giacomo B; Ngai, Michael; Clarke, Daniel J B; Fleishman, Reid H; Deng, Eden Z; Xie, Zhuorui; Ahmed, Nasheath; Ma'ayan, Avi.

Nucleic Acids Res ; 51(W1): W213-W224, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166966

ABSTRACT

Several atlasing efforts aim to profile human gene and protein expression across tissues, cell types and cell lines in normal physiology, development and disease. One utility of these resources is to examine the expression of a single gene across all cell types, tissues and cell lines in each atlas. However, there is currently no centralized place that integrates data from several atlases to provide this type of data in a uniform format for visualization, analysis and download, and via an application programming interface. To address this need, GeneRanger is a web server that provides access to processed data about gene and protein expression across normal human cell types, tissues and cell lines from several atlases. At the same time, TargetRanger is a related web server that takes as input RNA-seq data from profiled human cells and tissues, and then compares the uploaded input data to expression levels across the atlases to identify genes that are highly expressed in the input and lowly expressed across normal human cell types and tissues. Identified targets can be filtered by transmembrane or secreted proteins. The results from GeneRanger and TargetRanger are visualized as box and scatter plots, and as interactive tables. GeneRanger and TargetRanger are available from https://generanger.maayanlab.cloud and https://targetranger.maayanlab.cloud, respectively.

Subject(s)

Proteomics , Pseudogenes , Software , Humans , Cell Line , RNA-Seq , Internet

6.

Enrichr-KG: bridging enrichment analysis across multiple libraries.

Evangelista, John Erol; Xie, Zhuorui; Marino, Giacomo B; Nguyen, Nhi; Clarke, Daniel J B; Ma'ayan, Avi.

Nucleic Acids Res ; 51(W1): W168-W179, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166973

ABSTRACT

Gene and protein set enrichment analysis is a critical step in the analysis of data collected from omics experiments. Enrichr is a popular gene set enrichment analysis web-server search engine that contains hundreds of thousands of annotated gene sets. While Enrichr has been useful in providing enrichment analysis with many gene set libraries from different categories, integrating enrichment results across libraries and domains of knowledge can further hypothesis generation. To this end, Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources. Enrichr-KG currently serves 26 gene set libraries from different categories that include transcription, pathways, ontologies, diseases/drugs, and cell types. To demonstrate the utility of Enrichr-KG we provide several case studies. Enrichr-KG is freely available at: https://maayanlab.cloud/enrichr-kg.

Subject(s)

Gene Library , Proteins , Software , Databases, Factual , Search Engine , Internet

7.

Computational screen to identify potential targets for immunotherapeutic identification and removal of senescent cells.

Deng, Eden Z; Fleishman, Reid H; Xie, Zhuorui; Marino, Giacomo B; Clarke, Daniel J B; Ma'ayan, Avi.

Aging Cell ; 22(6): e13809, 2023 06.

Article in English | MEDLINE | ID: mdl-37082798

ABSTRACT

To prioritize gene and protein candidates that may enable the selective identification and removal of senescent cells, we compared gene expression signatures from replicative senescent cells to transcriptomics and proteomics atlases of normal human tissues and cell types. RNA-seq samples from in vitro senescent cells (6 studies, 13 conditions) were analyzed to identify targets at the gene and transcript levels that are highly expressed in senescent cells compared to their expression in normal human tissues and cell types. A gene set made of 301 genes called SenoRanger was established based on consensus analysis across studies and backgrounds. Of the identified senescence-associated targets, 29% of the genes in SenoRanger are also highly differentially expressed in aged tissues from GTEx. The SenoRanger gene set includes previously known as well as novel senescence-associated genes. Pathway analysis that connected the SenoRanger genes to their functional annotations confirms their potential role in several aging and senescence-related processes. Overall, SenoRanger provides solid hypotheses about potentially useful targets for identifying and removing senescent cells.

Subject(s)

Aging , Cellular Senescence , Humans , Aged , Cellular Senescence/genetics , Aging/genetics , Gene Expression Profiling , Cell Line , Immunotherapy

8.

lncHUB2: aggregated and inferred knowledge about human and mouse lncRNAs.

Marino, Giacomo B; Wojciechowicz, Megan L; Clarke, Daniel J B; Kuleshov, Maxim V; Xie, Zhuorui; Jeon, Minji; Lachmann, Alexander; Ma'ayan, Avi.

Database (Oxford) ; 2023, 2023 03 04.

Article in English | MEDLINE | ID: mdl-36869839

ABSTRACT

Long non-coding ribonucleic acids (lncRNAs) account for the largest group of non-coding RNAs. However, knowledge about their function and regulation is limited. lncHUB2 is a web server database that provides known and inferred knowledge about the function of 18 705 human and 11 274 mouse lncRNAs. lncHUB2 produces reports that contain the secondary structure fold of the lncRNA, related publications, the most correlated coding genes, the most correlated lncRNAs, a network that visualizes the most correlated genes, predicted mouse phenotypes, predicted membership in biological processes and pathways, predicted upstream transcription factor regulators, and predicted disease associations. In addition, the reports include subcellular localization information; expression across tissues, cell types, and cell lines, and predicted small molecules and CRISPR knockout (CRISPR-KO) genes prioritized based on their likelihood to up- or downregulate the expression of the lncRNA. Overall, lncHUB2 is a database with rich information about human and mouse lncRNAs and as such it can facilitate hypothesis generation for many future studies. The lncHUB2 database is available at https://maayanlab.cloud/lncHUB2. Database URL: https://maayanlab.cloud/lncHUB2.

Subject(s)

RNA, Long Noncoding , Humans , Animals , Mice , Cell Line , Clustered Regularly Interspaced Short Palindromic Repeats , Databases, Factual , Knowledge

9.

A multi-omic analysis of MCF10A cells provides a resource for integrative assessment of ligand-mediated molecular and phenotypic responses.

Gross, Sean M; Dane, Mark A; Smith, Rebecca L; Devlin, Kaylyn L; McLean, Ian C; Derrick, Daniel S; Mills, Caitlin E; Subramanian, Kartik; London, Alexandra B; Torre, Denis; Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Erdem, Cemal; Lyons, Nicholas; Natoli, Ted; Pessa, Sarah; Lu, Xiaodong; Mullahoo, James; Li, Jonathan; Adam, Miriam; Wassie, Brook; Liu, Moqing; Kilburn, David F; Liby, Tiera A; Bucher, Elmar; Sanchez-Aguila, Crystal; Daily, Kenneth; Omberg, Larsson; Wang, Yunguan; Jacobson, Connor; Yapp, Clarence; Chung, Mirra; Vidovic, Dusica; Lu, Yiling; Schurer, Stephan; Lee, Albert; Pillai, Ajay; Subramanian, Aravind; Papanastasiou, Malvina; Fraenkel, Ernest; Feiler, Heidi S; Mills, Gordon B; Jaffe, Jake D; Ma'ayan, Avi; Birtwistle, Marc R; Sorger, Peter K; Korkola, James E; Gray, Joe W; Heiser, Laura M.

Commun Biol ; 5(1): 1066, 2022 10 07.

Article in English | MEDLINE | ID: mdl-36207580

ABSTRACT

The phenotype of a cell and its underlying molecular state are strongly influenced by extracellular signals, including growth factors, hormones, and extracellular matrix proteins. While these signals are normally tightly controlled, their dysregulation leads to phenotypic and molecular states associated with diverse diseases. To develop a detailed understanding of the linkage between molecular and phenotypic changes, we generated a comprehensive dataset that catalogs the transcriptional, proteomic, epigenomic and phenotypic responses of MCF10A mammary epithelial cells after exposure to the ligands EGF, HGF, OSM, IFNG, TGFB and BMP2. Systematic assessment of the molecular and cellular phenotypes induced by these ligands comprises the LINCS Microenvironment (ME) perturbation dataset, which has been curated and made publicly available for community-wide analysis and development of novel computational methods ( synapse.org/LINCS_MCF10A ). In illustrative analyses, we demonstrate how this dataset can be used to discover functionally related molecular features linked to specific cellular phenotypes. Beyond these analyses, this dataset will serve as a resource for the broader scientific community to mine for biological insights, to compare signals carried across distinct molecular modalities, and to develop new computational methods for integrative data analysis.

Subject(s)

Epidermal Growth Factor , Proteomics , Epidermal Growth Factor/pharmacology , Extracellular Matrix Proteins , Ligands , Phenotype

10.

Transforming L1000 profiles to RNA-seq-like profiles with deep learning.

Jeon, Minji; Xie, Zhuorui; Evangelista, John E; Wojciechowicz, Megan L; Clarke, Daniel J B; Ma'ayan, Avi.

BMC Bioinformatics ; 23(1): 374, 2022 Sep 13.

Article in English | MEDLINE | ID: mdl-36100892

ABSTRACT

The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to > 30,000 chemical and genetic perturbations. In total, there are currently over 3 million available L1000 profiles. Such a dataset is invaluable for the discovery of drug and target candidates and for inferring mechanisms of action for small molecules. The L1000 assay only measures the mRNA expression of 978 landmark genes, while 11,350 additional genes are reliably inferred computationally. The lack of full genome coverage limits knowledge discovery for half of the human protein coding genes, and the potential for integration with other transcriptomics profiling data. Here we present a Deep Learning two-step model that transforms L1000 profiles to RNA-seq-like profiles. The input to the model is the 978 measured landmark genes, while the output is a vector of 23,614 RNA-seq-like gene expression profiles. The model first transforms the landmark genes into RNA-seq-like 978 gene profiles using a modified CycleGAN model applied to unpaired data. The transformed 978 RNA-seq-like landmark genes are then extrapolated into the full genome space with a fully connected neural network model. The two-step model achieves a Pearson's correlation coefficient of 0.914 and a root mean square error of 1.167 when tested on a published paired L1000/RNA-seq dataset produced by the LINCS and GTEx programs. The processed RNA-seq-like profiles are made available for download, signature search, and gene centric reverse search with unique case studies.

Subject(s)

Deep Learning , Gene Expression Profiling , Humans , RNA-Seq , Transcriptome
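
The two evaluation metrics reported in the abstract above, Pearson's correlation and root mean square error, can be computed for a predicted profile and its paired measured profile roughly as follows. This is a minimal sketch using only the standard library; the function names and example values are illustrative and are not taken from the paper's code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(x, y):
    """Root mean square error between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical expression values for four genes in one sample
predicted = [5.1, 0.4, 7.9, 3.3]
measured = [4.8, 0.9, 8.2, 2.7]
r, error = pearson(predicted, measured), rmse(predicted, measured)
```

Correlation captures whether the predicted profile preserves the relative ordering and shape of the measured one, while RMSE captures the absolute deviation, which is why the two are often reported together.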

11.

Getting Started with LINCS Datasets and Tools.

Xie, Zhuorui; Kropiwnicki, Eryk; Wojciechowicz, Megan L; Jagodnik, Kathleen M; Shu, Ingrid; Bailey, Allison; Clarke, Daniel J B; Jeon, Minji; Evangelista, John Erol; Kuleshov, Maxim V; Lachmann, Alexander; Parigi, Abhijna A; Sanchez, Jose M; Jenkins, Sherry L; Ma'ayan, Avi.

Curr Protoc ; 2(7): e487, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35876555

ABSTRACT

The Library of Integrated Network-based Cellular Signatures (LINCS) was an NIH Common Fund program that aimed to expand our knowledge about human cellular responses to chemical, genetic, and microenvironment perturbations. Responses to perturbations were measured by transcriptomics, proteomics, cellular imaging, and other high content assays. The second phase of the LINCS program, which lasted 7 years, involved the engagement of six data and signature generation centers (DSGCs) and one data coordination and integration center (DCIC). The DSGCs and the DCIC developed several digital resources, including tools, databases, and workflows that aim to facilitate the use of the LINCS data and integrate this data with other publicly available data. The digital resources developed by the DSGCs and the DCIC can be used to gain new biological and pharmacological insights that can lead to the development of novel therapeutics. This protocol provides step-by-step instructions for processing the LINCS data into signatures, and utilizing the digital resources developed by the LINCS consortia for hypothesis generation and knowledge discovery. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol 1: Navigating L1000 tools and data in CLUE.io
Basic Protocol 2: Computing signatures from the L1000 data with the CD method
Basic Protocol 3: Analyzing lists of differentially expressed genes and querying them against the L1000 data with BioJupies and the Bulk RNA-seq Appyter
Basic Protocol 4: Utilizing the L1000FWD resource for drug discovery
Basic Protocol 5: KINOMEscan and the KINOMEscan Appyter
Basic Protocol 6: LINCS P100 and GCP Proteomics Assays
Basic Protocol 7: The LINCS Joint Project (LJP)
Basic Protocol 8: The LINCS Data Portals and SigCom LINCS
Basic Protocol 9: Creating and analyzing signatures with iLINCS.

Subject(s)

Drug Discovery , Proteomics , Databases, Factual , Drug Discovery/methods , Gene Library , Humans , Transcriptome

12.

SigCom LINCS: data and metadata search engine for a million gene expression signatures.

Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Lachmann, Alexander; Jeon, Minji; Chen, Kerwin; Jagodnik, Kathleen M; Jenkins, Sherry L; Kuleshov, Maxim V; Wojciechowicz, Megan L; Schürer, Stephan C; Medvedovic, Mario; Ma'ayan, Avi.

Nucleic Acids Res ; 50(W1): W697-W709, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35524556

ABSTRACT

Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.

Subject(s)

Metadata , Transcriptome , Transcriptome/genetics , Search Engine

13.

Gene and drug landing page aggregator.

Clarke, Daniel J B; Kuleshov, Maxim V; Xie, Zhuorui; Evangelista, John E; Meyers, Marilyn R; Kropiwnicki, Eryk; Jenkins, Sherry L; Ma'ayan, Avi.

Bioinform Adv ; 2(1): vbac013, 2022.

Article in English | MEDLINE | ID: mdl-35368424

ABSTRACT

Motivation: Many biological and biomedical researchers commonly search for information about genes and drugs to gather knowledge from these resources. For the most part, such information is served as landing pages in disparate data repositories and web portals. Results: The Gene and Drug Landing Page Aggregator (GDLPA) provides users with access to 50 gene-centric and 19 drug-centric repositories, enabling them to retrieve landing pages corresponding to their gene and drug queries. Bringing these resources together into one dashboard that directs users to the landing pages across many resources can help centralize gene- and drug-centric knowledge, as well as raise awareness of available resources that may be missed when using standard search engines. To demonstrate the utility of GDLPA, case studies for the gene klotho and the drug remdesivir were developed. The first case study highlights the potential role of klotho as a drug target for aging and kidney disease, while the second study gathers knowledge regarding approval, usage, and safety for remdesivir, the first approved coronavirus disease 2019 therapeutic. Finally, based on our experience, we provide guidelines for developing effective landing pages for genes and drugs. Availability and implementation: GDLPA is open source and is available from: https://cfde-gene-pages.cloud/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

14.

blitzGSEA: efficient computation of gene set enrichment analysis through gamma distribution approximation.

Lachmann, Alexander; Xie, Zhuorui; Ma'ayan, Avi.

Bioinformatics ; 38(8): 2356-2357, 2022 04 12.

Article in English | MEDLINE | ID: mdl-35143610

ABSTRACT

MOTIVATION: The identification of pathways and biological processes from differential gene expression is central for interpretation of data collected by transcriptomics assays. Gene set enrichment analysis (GSEA) is the most commonly used algorithm to calculate the significance of the relevance of an annotated gene set to a differential expression signature. To compute significance, GSEA implements permutation tests, which are slow and inaccurate for comparing many differential expression signatures to thousands of annotated gene sets. RESULTS: Here, we present blitzGSEA, an algorithm that is based on the same running sum statistic as GSEA, but instead of performing permutations, blitzGSEA approximates the enrichment score probabilities based on Gamma distributions. blitzGSEA achieves significant improvement in performance compared with prior GSEA implementations, while approximating small P-values more accurately. AVAILABILITY AND IMPLEMENTATION: The data, a Python package, together with all source code, and a detailed user guide are available from GitHub at: https://github.com/MaayanLab/blitzgsea. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Software , Gene Expression Profiling , Probability
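
The running sum statistic that blitzGSEA shares with GSEA can be sketched as below. This is an illustrative implementation of the classic weighted running sum only. It is not the blitzgsea package API, and it does not include the Gamma-distribution approximation of significance described in the abstract. The function name and the weighting exponent `p` are assumptions for illustration.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Maximum-deviation running sum over a ranked gene list.

    ranked_genes: genes ordered by differential expression (best first)
    scores:       the ranking metric for each gene, in the same order
    gene_set:     annotated genes to test for enrichment
    Assumes at least one hit with a nonzero score and at least one miss.
    """
    gene_set = set(gene_set)
    # Weight of each hit; misses contribute nothing to the hit total
    hit_weights = [
        abs(s) ** p if g in gene_set else 0.0
        for g, s in zip(ranked_genes, scores)
    ]
    hit_total = sum(hit_weights)
    n_miss = sum(1 for g in ranked_genes if g not in gene_set)

    running, best = 0.0, 0.0
    for gene, weight in zip(ranked_genes, hit_weights):
        if gene in gene_set:
            running += weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss         # step down on a miss
        if abs(running) > abs(best):
            best = running                  # keep the largest deviation
    return best


# Hypothetical ranked signature and gene set
es = enrichment_score(["G1", "G2", "G3", "G4"], [4.0, 3.0, 2.0, 1.0], {"G1", "G2"})
```

A permutation-based method would recompute this score for many shuffled gene lists to estimate significance, which is the step the abstract says blitzGSEA replaces with a distributional approximation.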

15.

DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules.

Kropiwnicki, Eryk; Lachmann, Alexander; Clarke, Daniel J B; Xie, Zhuorui; Jagodnik, Kathleen M; Ma'ayan, Avi.

BMC Bioinformatics ; 23(1): 76, 2022 Feb 19.

Article in English | MEDLINE | ID: mdl-35183110

ABSTRACT

BACKGROUND: PubMed contains millions of abstracts that co-mention terms that describe drugs with other biomedical terms such as genes or diseases. Unique opportunities exist for leveraging these co-mentions by integrating them with other drug-drug similarity resources such as the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 signatures to develop novel hypotheses. RESULTS: DrugShot is a web-based server application and an Appyter that enables users to enter any biomedical search term into a simple input form to receive ranked lists of drugs and other small molecules based on their relevance to the search term. To produce ranked lists of small molecules, DrugShot cross-references returned PubMed identifiers (PMIDs) with DrugRIF or AutoRIF, which are curated resources of drug-PMID associations, to produce an associated small molecule list where each small molecule is ranked according to total co-mentions with the search term from shared PubMed IDs. Additionally, using two types of drug-drug similarity matrices, lists of small molecules are predicted to be associated with the search term. Such predictions are based on literature co-mentions and signature similarity from LINCS L1000 drug-induced gene expression profiles. CONCLUSIONS: DrugShot prioritizes drugs and small molecules associated with biomedical search terms. In addition to listing known associations, DrugShot predicts additional drugs and small molecules related to any search term. Hence, DrugShot can be used to prioritize drugs and preclinical compounds for drug repurposing and suggest indications and adverse events for preclinical compounds. DrugShot is freely and openly available at: https://maayanlab.cloud/drugshot and https://appyters.maayanlab.cloud/#/DrugShot .

Subject(s)

Drug Repositioning , Software , Gene Library , Transcriptome

16.

KEA3: improved kinase enrichment analysis via data integration.

Kuleshov, Maxim V; Xie, Zhuorui; London, Alexandra B K; Yang, Janice; Evangelista, John Erol; Lachmann, Alexander; Shu, Ingrid; Torre, Denis; Ma'ayan, Avi.

Nucleic Acids Res ; 49(W1): W304-W316, 2021 07 02.

Article in English | MEDLINE | ID: mdl-34019655

ABSTRACT

Phosphoproteomics and proteomics experiments capture a global snapshot of the cellular signaling network, but these methods do not directly measure kinase state. Kinase Enrichment Analysis 3 (KEA3) is a webserver application that infers overrepresentation of upstream kinases whose putative substrates are in a user-inputted list of proteins. KEA3 can be applied to analyze data from phosphoproteomics and proteomics studies to predict the upstream kinases responsible for observed differential phosphorylations. The KEA3 background database contains measured and predicted kinase-substrate interactions (KSI), kinase-protein interactions (KPI), and interactions supported by co-expression and co-occurrence data. To benchmark the performance of KEA3, we examined whether KEA3 can predict the perturbed kinase from single-kinase perturbation followed by gene expression experiments, and phosphoproteomics data collected from kinase-targeting small molecules. We show that integrating KSIs and KPIs across data sources to produce a composite ranking improves the recovery of the expected kinase. The KEA3 webserver is available at https://maayanlab.cloud/kea3.

Subject(s)

Protein Kinases/metabolism , Software , Gene Expression , Humans , Phosphorylation , Protein Kinase Inhibitors , Proteomics , SARS-CoV-2/enzymology

17.

Appyters: Turning Jupyter Notebooks into data-driven web apps.

Clarke, Daniel J B; Jeon, Minji; Stein, Daniel J; Moiseyev, Nicole; Kropiwnicki, Eryk; Dai, Charles; Xie, Zhuorui; Wojciechowicz, Megan L; Litz, Skylar; Hom, Jason; Evangelista, John Erol; Goldman, Lucas; Zhang, Serena; Yoon, Christine; Ahamed, Tahmid; Bhuiyan, Samantha; Cheng, Minxuan; Karam, Julie; Jagodnik, Kathleen M; Shu, Ingrid; Lachmann, Alexander; Ayling, Sam; Jenkins, Sherry L; Ma'ayan, Avi.

Patterns (N Y) ; 2(3): 100213, 2021 Mar 12.

Article in English | MEDLINE | ID: mdl-33748796

ABSTRACT

Jupyter Notebooks have transformed the communication of data analysis pipelines by facilitating a modular structure that brings together code, markdown text, and interactive visualizations. Here, we extended Jupyter Notebooks to broaden their accessibility with Appyters. Appyters turn Jupyter Notebooks into fully functional standalone web-based bioinformatics applications. Appyters present to users an entry form enabling them to upload their data and set various parameters for a multitude of data analysis workflows. Once the form is filled, the Appyter executes the corresponding notebook in the cloud, producing the output without requiring the user to interact directly with the code. Appyters were used to create many bioinformatics web-based reusable workflows, including applications to build customized machine learning pipelines, analyze omics data, and produce publishable figures. These Appyters are served in the Appyters Catalog at https://appyters.maayanlab.cloud. In summary, Appyters enable the rapid development of interactive web-based bioinformatics applications.

18.

Gene Set Knowledge Discovery with Enrichr.

Xie, Zhuorui; Bailey, Allison; Kuleshov, Maxim V; Clarke, Daniel J B; Evangelista, John E; Jenkins, Sherry L; Lachmann, Alexander; Wojciechowicz, Megan L; Kropiwnicki, Eryk; Jagodnik, Kathleen M; Jeon, Minji; Ma'ayan, Avi.

Curr Protoc ; 1(3): e90, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33780170

ABSTRACT

Profiling samples from patients, tissues, and cells with genomics, transcriptomics, epigenomics, proteomics, and metabolomics ultimately produces lists of genes and proteins that need to be further analyzed and integrated in the context of known biology. Enrichr (Chen et al., 2013; Kuleshov et al., 2016) is a gene set search engine that enables the querying of hundreds of thousands of annotated gene sets. Enrichr uniquely integrates knowledge from many high-profile projects to provide synthesized information about mammalian genes and gene sets. The platform provides various methods to compute gene set enrichment, and the results are visualized in several interactive ways. This protocol provides a summary of the key features of Enrichr, which include using Enrichr programmatically and embedding an Enrichr button on any website. © 2021 Wiley Periodicals LLC.

Basic Protocol 1: Analyzing lists of differentially expressed genes from transcriptomics, proteomics and phosphoproteomics, GWAS studies, or other experimental studies
Basic Protocol 2: Searching Enrichr by a single gene or key search term
Basic Protocol 3: Preparing raw or processed RNA-seq data through BioJupies in preparation for Enrichr analysis
Basic Protocol 4: Analyzing gene sets for model organisms using modEnrichr
Basic Protocol 5: Using Enrichr in Geneshot
Basic Protocol 6: Using Enrichr in ARCHS4
Basic Protocol 7: Using the enrichment analysis visualization Appyter to visualize Enrichr results
Basic Protocol 8: Using the Enrichr API
Basic Protocol 9: Adding an Enrichr button to a website
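As an illustration of what computing gene set enrichment can involve, the sketch below implements a standard over-representation test: the hypergeometric upper-tail p-value for the overlap between a query gene list and an annotated gene set. It is a generic textbook calculation, not a description of Enrichr's own scoring; the function name and the background size passed in are assumptions.

```python
from math import comb


def hypergeom_pvalue(overlap, query_size, set_size, background_size):
    """Probability of seeing at least `overlap` shared genes by chance.

    overlap: number of genes shared by the query list and the gene set.
    query_size: number of genes in the query list.
    set_size: number of genes in the annotated gene set.
    background_size: total number of genes considered (an assumption
        the caller must supply; it must be >= set_size and >= query_size).
    """
    max_overlap = min(query_size, set_size)
    total = comb(background_size, query_size)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

In practice the p-values for many gene sets would also be corrected for multiple testing, which this sketch omits.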

Subject(s)

Knowledge Discovery , Software , Animals , Computational Biology , Genomics , Humans , RNA-Seq

19.

Interoperable RNA-Seq analysis in the cloud.

Lachmann, Alexander; Clarke, Daniel J B; Torre, Denis; Xie, Zhuorui; Ma'ayan, Avi.

Biochim Biophys Acta Gene Regul Mech ; 1863(6): 194521, 2020 Jun.

Article in English | MEDLINE | ID: mdl-32156561

ABSTRACT

RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by different aligners. Here we report an in-depth comparison of transcript quantification methods. Our goal is the specific use of cost-efficient RNA-Seq analysis for deployment in a cloud infrastructure composed of interacting microservices. The individual modules cover file transfer into the cloud and APIs to handle the cloud alignment jobs. We next demonstrate how newly generated RNA-Seq data can be placed in the context of thousands of previously published datasets in near real time. With in-depth benchmarks, we identify suitable gene count quantification methods to facilitate a cost-effective, accurate, cloud-based RNA-Seq analysis service. Pseudo-alignment algorithms such as kallisto and Salmon combine high read quality estimation with cost-efficient runtime performance. HISAT2 is the fastest of the classical aligners with good alignment quality. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.

Subject(s)

Cloud Computing , RNA Sequence Analysis , Algorithms , Animals , Benchmarking , Humans , Mice , Real-Time Polymerase Chain Reaction , Sequence Alignment
