Search | BVS Regional Portal

Rummagene: massive mining of gene sets from supporting materials of biomedical research publications.

Clarke, Daniel J B; Marino, Giacomo B; Deng, Eden Z; Xie, Zhuorui; Evangelista, John Erol; Ma'ayan, Avi.

Commun Biol ; 7(1): 482, 2024 Apr 20.

Article in English | MEDLINE | ID: mdl-38643247

ABSTRACT

Many biomedical research publications contain gene sets in their supporting tables, but these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. So far, we have scanned 5,448,589 articles and found 121,237 articles that contain 642,389 gene sets. These sets are served for enrichment analysis, free-text search, and table title search. By investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses and for gene function prediction. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene makes it possible to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at https://rummagene.com .

Subject(s)

Biomedical Research , Data Mining , Animals , Software , Databases, Factual , Gene Expression Regulation , Mammals
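The enrichment analysis that Rummagene serves can be illustrated with a minimal sketch, assuming that the significance of the overlap between a query gene set and each stored gene set is scored with a one-sided Fisher's exact test (the hypergeometric upper tail), a standard choice for gene set enrichment. The function names, the 20,000-gene background size, and the toy gene sets are hypothetical, and this is not Rummagene's actual implementation.

```python
from math import comb


def overlap_pvalue(query: set[str], library_set: set[str], n_background: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail) for the overlap
    between a query gene set and a single library gene set."""
    k = len(query & library_set)  # observed overlap
    n_q, n_l = len(query), len(library_set)
    total = comb(n_background, n_q)
    # P(X >= k): add up the probabilities of every overlap at least as large as observed.
    upper = min(n_q, n_l)
    hits = sum(
        comb(n_l, i) * comb(n_background - n_l, n_q - i) for i in range(k, upper + 1)
    )
    return hits / total


def rank_library(query: set[str], library: dict[str, set[str]], n_background: int = 20000):
    """Return (term, overlap size, p-value) tuples, most significant first.

    The default background size is a hypothetical placeholder.
    """
    results = [
        (term, len(query & genes), overlap_pvalue(query, genes, n_background))
        for term, genes in library.items()
    ]
    return sorted(results, key=lambda r: r[2])
```

For example, with a toy library of two sets, a query that shares three genes with one of them ranks that set first, while a set with no shared genes receives p = 1.0. A production search over hundreds of thousands of sets would also need multiple-testing correction.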

RummaGEO: Automatic Mining of Human and Mouse Gene Sets from GEO.

Marino, Giacomo B; Clarke, Daniel J B; Deng, Eden Z; Ma'ayan, Avi.

bioRxiv ; 2024 Apr 13.

Article in English | MEDLINE | ID: mdl-38645198

ABSTRACT

The Gene Expression Omnibus (GEO) is a major open biomedical research repository for transcriptomics and other omics datasets. It currently contains millions of gene expression samples from tens of thousands of studies collected by many biomedical research laboratories from around the world. While users of the GEO repository can search the metadata describing studies for locating relevant datasets, there are currently no methods or resources that facilitate global search of GEO at the data level. To address this shortcoming, we developed RummaGEO, a webserver application that enables gene expression signature search of a large collection of human and mouse RNA-seq studies deposited into GEO. To develop the search engine, we performed offline automatic identification of sample conditions from the uniformly aligned GEO studies available from ARCHS4. We then computed differential expression signatures to extract gene sets from these studies. In total, RummaGEO currently contains 135,264 human and 158,062 mouse gene sets extracted from 23,395 GEO studies. Next, we analyzed the contents of the RummaGEO database to identify statistical patterns and perform various global analyses. The contents of the RummaGEO database are provided as a web-server search engine with signature search, PubMed search, and metadata search functionalities. Overall, RummaGEO provides an unprecedented resource for the biomedical research community enabling hypothesis generation for many future studies. The RummaGEO search engine is available from: https://rummageo.com/.
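The step of turning a differential expression signature into gene sets can be sketched as follows. This sketch assumes that each comparison yields a log2 fold change and an adjusted p-value for every gene, and that the signature is split into up- and down-regulated sets. The `DEResult` type, the 0.05 cutoff, and the 500-gene cap are hypothetical choices for illustration, not RummaGEO's documented parameters.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """Hypothetical per-gene record from one differential expression comparison."""
    gene: str
    log2_fold_change: float
    adj_p_value: float


def signature_to_gene_sets(results, alpha=0.05, max_genes=500):
    """Split a signature into (up-regulated, down-regulated) gene lists.

    Keeps genes whose adjusted p-value is below `alpha` and retains at most
    `max_genes` per direction, ordered by the magnitude of the fold change.
    """
    significant = [r for r in results if r.adj_p_value < alpha]
    up = sorted(
        (r for r in significant if r.log2_fold_change > 0),
        key=lambda r: r.log2_fold_change,
        reverse=True,  # largest increase first
    )
    down = sorted(
        (r for r in significant if r.log2_fold_change < 0),
        key=lambda r: r.log2_fold_change,  # most negative first
    )
    return [r.gene for r in up[:max_genes]], [r.gene for r in down[:max_genes]]
```

Producing two directional sets per comparison is one plausible reason the database counts gene sets rather than studies, but the abstract does not say how many sets each study contributes.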

Pan-cancer proteogenomics characterization of tumor immunity.

Petralia, Francesca; Ma, Weiping; Yaron, Tomer M; Caruso, Francesca Pia; Tignor, Nicole; Wang, Joshua M; Charytonowicz, Daniel; Johnson, Jared L; Huntsman, Emily M; Marino, Giacomo B; Calinawan, Anna; Evangelista, John Erol; Selvan, Myvizhi Esai; Chowdhury, Shrabanti; Rykunov, Dmitry; Krek, Azra; Song, Xiaoyu; Turhan, Berk; Christianson, Karen E; Lewis, David A; Deng, Eden Z; Clarke, Daniel J B; Whiteaker, Jeffrey R; Kennedy, Jacob J; Zhao, Lei; Segura, Rossana Lazcano; Batra, Harsh; Raso, Maria Gabriela; Parra, Edwin Roger; Soundararajan, Rama; Tang, Ximing; Li, Yize; Yi, Xinpei; Satpathy, Shankha; Wang, Ying; Wiznerowicz, Maciej; González-Robles, Tania J; Iavarone, Antonio; Gosline, Sara J C; Reva, Boris; Robles, Ana I; Nesvizhskii, Alexey I; Mani, D R; Gillette, Michael A; Klein, Robert J; Cieslik, Marcin; Zhang, Bing; Paulovich, Amanda G; Sebra, Robert; Gümüs, Zeynep H.

Cell ; 187(5): 1255-1277.e27, 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38359819

ABSTRACT

Despite the successes of immunotherapy in cancer treatment over recent decades, fewer than 10%-20% of cancer cases have shown durable responses to immune checkpoint blockade. To enhance the efficacy of immunotherapies, combination therapies that suppress multiple immune evasion mechanisms are increasingly being considered. To better understand immune cell surveillance and the diverse immune evasion responses in tumor tissues, we comprehensively characterized the immune landscape of more than 1,000 tumors across ten cancer types using CPTAC pan-cancer proteogenomic data. We identified seven distinct immune subtypes based on integrative learning of cell type compositions and pathway activities. We then systematically categorized the unique genomic, epigenetic, transcriptomic, and proteomic changes associated with each subtype. Further leveraging the deep phosphoproteomic data, we studied kinase activities in the different immune subtypes, which revealed potential subtype-specific therapeutic targets. Insights from this work will facilitate the development of future immunotherapy strategies and enhance precision targeting with existing agents.

Subject(s)

Neoplasms , Proteogenomics , Humans , Combined Modality Therapy , Genomics , Neoplasms/genetics , Neoplasms/immunology , Neoplasms/therapy , Proteomics , Tumor Escape

D2H2: diabetes data and hypothesis hub.

Marino, Giacomo B; Ahmed, Nasheath; Xie, Zhuorui; Jagodnik, Kathleen M; Han, Jason; Clarke, Daniel J B; Lachmann, Alexander; Keller, Mark P; Attie, Alan D; Ma'ayan, Avi.

Bioinform Adv ; 3(1): vbad178, 2023.

Article in English | MEDLINE | ID: mdl-38107655

ABSTRACT

Motivation: The diabetes research community is producing omics datasets at a rapidly growing rate. However, such published data are underutilized for knowledge discovery. To make bioinformatics tools and published omics datasets from the diabetes field more accessible to biomedical researchers, we developed the Diabetes Data and Hypothesis Hub (D2H2). Results: D2H2 contains hundreds of high-quality curated transcriptomics datasets relevant to diabetes, accessible via a user-friendly web-based portal. The datasets are collected from the Gene Expression Omnibus (GEO), processed, and curated. Each curated study has a dedicated page that provides data visualization, differential gene expression analysis, and single-gene queries. To enable the investigation of these curated datasets and to provide easy access to bioinformatics tools that serve gene and gene set-related knowledge, we developed the D2H2 chatbot. The chatbot uses GPT and prompts users to describe their data analysis needs in free text. By parsing the user prompt together with information about all tools and workflows available in D2H2, the chatbot answers user queries by invoking the most relevant tools through their APIs. D2H2 also has a hypothesis generation module in which gene sets are randomly selected from the precomputed bulk RNA-seq signatures. We then find highly overlapping gene sets extracted from publications in PubMed Central whose abstracts are dissimilar. With the help of GPT, we speculate about possible explanations for the high overlap between the gene sets. Overall, D2H2 is a platform that provides a suite of bioinformatics tools and curated transcriptomics datasets for hypothesis generation. Availability and implementation: D2H2 is available at: https://d2h2.maayanlab.cloud/ and the source code is available from GitHub at https://github.com/MaayanLab/D2H2-site under the CC BY-NC 4.0 license.
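The hypothesis generation idea described above, pairing gene sets that overlap strongly while their source abstracts are dissimilar, can be sketched under simple assumptions: the Jaccard index measures gene set overlap, and bag-of-words cosine similarity measures how alike two abstracts are. The abstract does not specify which similarity measures D2H2 uses, so the measures, the thresholds, and the function names below are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Shared genes divided by all genes in either set."""
    return len(a & b) / len(a | b) if a or b else 0.0


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two abstracts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def surprising_pairs(entries, min_overlap=0.3, max_text_sim=0.2):
    """Return ID pairs whose gene sets overlap strongly but whose abstracts differ.

    `entries` maps a hypothetical ID to a (gene_set, abstract_text) tuple.
    """
    hits = []
    for (id_a, (genes_a, text_a)), (id_b, (genes_b, text_b)) in combinations(entries.items(), 2):
        if jaccard(genes_a, genes_b) >= min_overlap and cosine(text_a, text_b) <= max_text_sim:
            hits.append((id_a, id_b))
    return hits
```

The pairwise loop is quadratic and only suits small collections; searching hundreds of thousands of sets would call for an index rather than exhaustive comparison.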

Toxicology knowledge graph for structural birth defects.

Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Marino, Giacomo B; Utti, Vivian; Jenkins, Sherry L; Ahooyi, Taha Mohseni; Bologa, Cristian G; Yang, Jeremy J; Binder, Jessica L; Kumar, Praveen; Lambert, Christophe G; Grethe, Jeffrey S; Wenger, Eric; Taylor, Deanne; Oprea, Tudor I; de Bono, Bernard; Ma'ayan, Avi.

Commun Med (Lond) ; 3(1): 98, 2023 Jul 17.

Article in English | MEDLINE | ID: mdl-37460679

ABSTRACT

BACKGROUND: Birth defects are functional and structural abnormalities that affect about 1 in 33 births in the United States. They have been attributed to genetic and other factors such as drugs, cosmetics, food, and environmental pollutants during pregnancy, but for most birth defects there are no known causes. METHODS: To further characterize associations between small molecule compounds and their potential to induce specific birth abnormalities, we gathered knowledge from multiple sources to construct a reproductive toxicity Knowledge Graph (ReproTox-KG) with a focus on associations between birth defects, drugs, and genes. Specifically, we gathered drug/birth-defect associations from co-mentions in published abstracts, gene/birth-defect associations from genetic studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecules. RESULTS: Using ReproTox-KG and semi-supervised learning (SSL), we scored >30,000 preclinical small molecules for their potential to cross the placenta and induce birth defects, and identified >500 birth-defect/gene/drug cliques that can be used to explain molecular mechanisms of drug-induced birth defects. The ReproTox-KG can be accessed via a web-based user interface available at https://maayanlab.cloud/reprotox-kg . This site enables users to explore the associations between birth defects, approved and preclinical drugs, and all human genes. CONCLUSIONS: ReproTox-KG provides a resource for exploring knowledge about the molecular mechanisms of birth defects, with the potential to predict the likelihood that genes and preclinical small molecules induce birth defects.

While birth defects are common, for most birth defects there are no known causes. During pregnancy, developing babies are exposed to drugs, cosmetics, food, and environmental pollutants that may cause birth defects. However, exactly how these environmental factors are involved in producing birth defects is difficult to discern. Also, birth defects can be a consequence of the genes inherited from the parents. We combined general data about human genes and drugs with specific data previously implicating genes and drugs in inducing birth defects to create a knowledge graph representation that connects genes, drugs, and birth defects. This knowledge graph can be used to explore new links that may explain why birth defects occur, particularly those that result from a combination of inherited and environmental influences.
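A birth-defect/gene/drug clique is a triangle in which all three pairwise links exist. One simple way to enumerate such cliques is a triangle search over three edge lists. This minimal sketch assumes the graph is stored as sets of (source, target) pairs, one set per relation type. The identifiers in the usage note are hypothetical placeholders, not ReproTox-KG content, and the code is not the ReproTox-KG implementation.

```python
def find_cliques(defect_gene, gene_drug, drug_defect):
    """Return (defect, gene, drug) triples where all three pairwise links exist.

    Each argument is a set of (source, target) edges in a hypothetical
    representation of the knowledge graph.
    """
    genes_by_defect = {}
    for defect, gene in defect_gene:
        genes_by_defect.setdefault(defect, set()).add(gene)
    drugs_by_gene = {}
    for gene, drug in gene_drug:
        drugs_by_gene.setdefault(gene, set()).add(drug)

    cliques = set()
    for defect, genes in genes_by_defect.items():
        for gene in genes:
            for drug in drugs_by_gene.get(gene, ()):
                # Close the triangle: the drug must also be linked to the defect.
                if (drug, defect) in drug_defect:
                    cliques.add((defect, gene, drug))
    return cliques
```

For example, if defect_A is linked to GENE1, GENE1 to drug_1, and drug_1 back to defect_A, the triple (defect_A, GENE1, drug_1) is reported. A triple with any missing edge is skipped.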

GeneRanger and TargetRanger: processed gene and protein expression levels across cells and tissues for target discovery.

Marino, Giacomo B; Ngai, Michael; Clarke, Daniel J B; Fleishman, Reid H; Deng, Eden Z; Xie, Zhuorui; Ahmed, Nasheath; Ma'ayan, Avi.

Nucleic Acids Res ; 51(W1): W213-W224, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166966

ABSTRACT

Several atlasing efforts aim to profile human gene and protein expression across tissues, cell types, and cell lines in normal physiology, development, and disease. One use of these resources is to examine the expression of a single gene across all cell types, tissues, and cell lines in each atlas. However, there is currently no centralized resource that integrates data from several atlases and provides this type of data in a uniform format for visualization, analysis, and download, and via an application programming interface. To address this need, we developed GeneRanger, a web server that provides access to processed data about gene and protein expression across normal human cell types, tissues, and cell lines from several atlases. In addition, TargetRanger is a related web server that takes as input RNA-seq data from profiled human cells and tissues, and then compares the uploaded data to expression levels across the atlases to identify genes that are highly expressed in the input but lowly expressed across normal human cell types and tissues. Identified targets can be filtered to transmembrane or secreted proteins. The results from GeneRanger and TargetRanger are visualized as box and scatter plots, and as interactive tables. GeneRanger and TargetRanger are available from https://generanger.maayanlab.cloud and https://targetranger.maayanlab.cloud, respectively.

Subject(s)

Proteomics , Pseudogenes , Software , Humans , Cell Line , RNA-Seq , Internet
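TargetRanger compares uploaded expression against normal atlases. That comparison can be sketched as a per-gene contrast. This minimal sketch assumes log-scale expression values and scores each gene as its mean level in the input minus its highest level across any normal cell type or tissue, with an optional filter such as a set of transmembrane or secreted proteins. The scoring rule and function name are hypothetical and are not the statistics TargetRanger actually uses.

```python
from statistics import mean


def rank_targets(input_expr, background_expr, allowed_genes=None):
    """Rank genes by how far their input level exceeds their highest normal level.

    input_expr: gene -> list of log-scale expression values from uploaded samples.
    background_expr: gene -> list of log-scale values, one per normal cell type or tissue.
    allowed_genes: optional set used as a filter (e.g., transmembrane proteins).
    """
    scores = {}
    for gene, values in input_expr.items():
        if gene not in background_expr:
            continue  # no reference data to compare against
        if allowed_genes is not None and gene not in allowed_genes:
            continue
        # Contrast the input mean with the maximum seen in any normal tissue.
        scores[gene] = mean(values) - max(background_expr[gene])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Using the background maximum, rather than the mean, penalizes genes that are high in even one normal tissue. This is one reasonable design choice for finding candidates with low off-target expression.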

Enrichr-KG: bridging enrichment analysis across multiple libraries.

Evangelista, John Erol; Xie, Zhuorui; Marino, Giacomo B; Nguyen, Nhi; Clarke, Daniel J B; Ma'ayan, Avi.

Nucleic Acids Res ; 51(W1): W168-W179, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166973

ABSTRACT

Gene and protein set enrichment analysis is a critical step in the analysis of data collected from omics experiments. Enrichr is a popular gene set enrichment analysis web-server search engine that contains hundreds of thousands of annotated gene sets. While Enrichr has been useful in providing enrichment analysis with many gene set libraries from different categories, integrating enrichment results across libraries and domains of knowledge can further hypothesis generation. To this end, Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources. Enrichr-KG currently serves 26 gene set libraries from different categories that include transcription, pathways, ontologies, diseases/drugs, and cell types. To demonstrate the utility of Enrichr-KG we provide several case studies. Enrichr-KG is freely available at: https://maayanlab.cloud/enrichr-kg.

Subject(s)

Gene Library , Proteins , Software , Databases, Factual , Search Engine , Internet

Computational screen to identify potential targets for immunotherapeutic identification and removal of senescence cells.

Deng, Eden Z; Fleishman, Reid H; Xie, Zhuorui; Marino, Giacomo B; Clarke, Daniel J B; Ma'ayan, Avi.

Aging Cell ; 22(6): e13809, 2023 06.

Article in English | MEDLINE | ID: mdl-37082798

ABSTRACT

To prioritize gene and protein candidates that may enable the selective identification and removal of senescent cells, we compared gene expression signatures from replicative senescent cells to transcriptomics and proteomics atlases of normal human tissues and cell types. RNA-seq samples from in vitro senescent cells (6 studies, 13 conditions) were analyzed to identify targets, at the gene and transcript levels, that are highly expressed in senescent cells compared with their expression in normal human tissues and cell types. A gene set of 301 genes, called SenoRanger, was established based on consensus analysis across studies and backgrounds. Of the genes in SenoRanger, 29% are also highly differentially expressed in aged tissues from GTEx. The SenoRanger gene set includes previously known as well as novel senescence-associated genes. Pathway analysis connecting the SenoRanger genes to their functional annotations confirms their potential roles in several aging- and senescence-related processes. Overall, SenoRanger provides solid hypotheses about potentially useful targets for identifying and removing senescent cells.

Subject(s)

Aging , Cellular Senescence , Humans , Aged , Cellular Senescence/genetics , Aging/genetics , Gene Expression Profiling , Cell Line , Immunotherapy
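The consensus step and the reported overlap with aged tissues can be sketched as simple set operations. This sketch assumes each study contributes its own set of nominated targets, and that a gene enters the consensus set when enough studies nominate it. The abstract does not state the actual consensus rule, so the `min_studies` threshold and function names are hypothetical. The overlap helper shows how a figure like "29% of the genes" is computed as a fraction of the consensus set.

```python
from collections import Counter


def consensus_genes(per_study_targets, min_studies=2):
    """Return genes nominated as targets by at least `min_studies` studies.

    per_study_targets maps a study ID to the genes that study flagged as
    highly expressed in senescent cells relative to normal tissues.
    """
    counts = Counter(gene for genes in per_study_targets.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_studies)


def overlap_fraction(gene_set, other):
    """Fraction of gene_set that also appears in other (e.g., genes up in aged tissues)."""
    gene_set = set(gene_set)
    return len(gene_set & set(other)) / len(gene_set) if gene_set else 0.0
```

Raising `min_studies` trades coverage for robustness: fewer genes pass, but each one is supported by more independent experiments.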

lncHUB2: aggregated and inferred knowledge about human and mouse lncRNAs.

Marino, Giacomo B; Wojciechowicz, Megan L; Clarke, Daniel J B; Kuleshov, Maxim V; Xie, Zhuorui; Jeon, Minji; Lachmann, Alexander; Ma'ayan, Avi.

Database (Oxford) ; 2023, 2023 03 04.

Article in English | MEDLINE | ID: mdl-36869839

ABSTRACT

Long non-coding ribonucleic acids (lncRNAs) make up the largest group of non-coding RNAs. However, knowledge about their function and regulation is limited. lncHUB2 is a web server database that provides known and inferred knowledge about the function of 18,705 human and 11,274 mouse lncRNAs. lncHUB2 produces reports that contain the secondary structure fold of the lncRNA, related publications, the most correlated coding genes, the most correlated lncRNAs, a network that visualizes the most correlated genes, predicted mouse phenotypes, predicted membership in biological processes and pathways, predicted upstream transcription factor regulators, and predicted disease associations. In addition, the reports include subcellular localization information; expression across tissues, cell types, and cell lines; and predicted small molecules and CRISPR knockout (CRISPR-KO) genes prioritized by their likelihood to up- or downregulate the expression of the lncRNA. Overall, lncHUB2 is a database with rich information about human and mouse lncRNAs, and as such it can facilitate hypothesis generation for many future studies. Database URL: https://maayanlab.cloud/lncHUB2.

Subject(s)

RNA, Long Noncoding , Humans , Animals , Mice , Cell Line , Clustered Regularly Interspaced Short Palindromic Repeats , Databases, Factual , Knowledge
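The "most correlated coding genes" section of an lncHUB2 report can be illustrated by ranking coding genes by their co-expression with an lncRNA across the same samples. This minimal sketch assumes Pearson correlation over matched expression vectors. The abstract does not state which correlation measure or expression compendium lncHUB2 uses, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no variance, so its correlation is reported as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def top_correlated(lncrna_expr, coding_expr, k=10):
    """Return the k coding genes most positively correlated with the lncRNA."""
    scores = [(gene, pearson(lncrna_expr, values)) for gene, values in coding_expr.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

The same ranked list could also seed downstream predictions, such as pathway membership inferred from the functions of the top co-expressed genes. The abstract lists such predictions but does not say how they are derived.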
