Search | BVS CLAP/SMR-OPS/OMS

1.

Pan-cancer proteogenomics characterization of tumor immunity.

Petralia, Francesca; Ma, Weiping; Yaron, Tomer M; Caruso, Francesca Pia; Tignor, Nicole; Wang, Joshua M; Charytonowicz, Daniel; Johnson, Jared L; Huntsman, Emily M; Marino, Giacomo B; Calinawan, Anna; Evangelista, John Erol; Selvan, Myvizhi Esai; Chowdhury, Shrabanti; Rykunov, Dmitry; Krek, Azra; Song, Xiaoyu; Turhan, Berk; Christianson, Karen E; Lewis, David A; Deng, Eden Z; Clarke, Daniel J B; Whiteaker, Jeffrey R; Kennedy, Jacob J; Zhao, Lei; Segura, Rossana Lazcano; Batra, Harsh; Raso, Maria Gabriela; Parra, Edwin Roger; Soundararajan, Rama; Tang, Ximing; Li, Yize; Yi, Xinpei; Satpathy, Shankha; Wang, Ying; Wiznerowicz, Maciej; González-Robles, Tania J; Iavarone, Antonio; Gosline, Sara J C; Reva, Boris; Robles, Ana I; Nesvizhskii, Alexey I; Mani, D R; Gillette, Michael A; Klein, Robert J; Cieslik, Marcin; Zhang, Bing; Paulovich, Amanda G; Sebra, Robert; Gümüs, Zeynep H.

Cell ; 187(5): 1255-1277.e27, 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38359819

ABSTRACT

Despite the successes of immunotherapy in cancer treatment over recent decades, fewer than 10%-20% of cancer cases have demonstrated durable responses from immune checkpoint blockade. To enhance the efficacy of immunotherapies, combination therapies suppressing multiple immune evasion mechanisms are increasingly contemplated. To better understand immune cell surveillance and diverse immune evasion responses in tumor tissues, we comprehensively characterized the immune landscape of more than 1,000 tumors across ten different cancers using CPTAC pan-cancer proteogenomic data. We identified seven distinct immune subtypes based on integrative learning of cell type compositions and pathway activities. We then thoroughly categorized unique genomic, epigenetic, transcriptomic, and proteomic changes associated with each subtype. Further leveraging the deep phosphoproteomic data, we studied kinase activities in different immune subtypes, which revealed potential subtype-specific therapeutic targets. Insights from this work will facilitate the development of future immunotherapy strategies and enhance precision targeting with existing agents.

Subject(s)

Neoplasms; Proteogenomics; Humans; Combined Modality Therapy; Genomics; Neoplasms/genetics; Neoplasms/immunology; Neoplasms/therapy; Proteomics; Tumor Escape

2.

Enrichr-KG: bridging enrichment analysis across multiple libraries.

Evangelista, John Erol; Xie, Zhuorui; Marino, Giacomo B; Nguyen, Nhi; Clarke, Daniel J B; Ma'ayan, Avi.

Nucleic Acids Res ; 51(W1): W168-W179, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166973

ABSTRACT

Gene and protein set enrichment analysis is a critical step in the analysis of data collected from omics experiments. Enrichr is a popular gene set enrichment analysis web-server search engine that contains hundreds of thousands of annotated gene sets. While Enrichr has been useful in providing enrichment analysis with many gene set libraries from different categories, integrating enrichment results across libraries and domains of knowledge can further hypothesis generation. To this end, Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources. Enrichr-KG currently serves 26 gene set libraries from different categories that include transcription, pathways, ontologies, diseases/drugs, and cell types. To demonstrate the utility of Enrichr-KG we provide several case studies. Enrichr-KG is freely available at: https://maayanlab.cloud/enrichr-kg.

Subject(s)

Gene Library; Proteins; Software; Databases, Factual; Search Engine; Internet
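
As a rough illustration of the subgraph described in the Enrichr-KG abstract above, the sketch below scores a query gene set against terms from two toy libraries with a hypergeometric tail test, then emits gene-to-term links for the enriched terms. The library contents, the background size of 20,000 genes, the 0.05 cutoff, and all function names are assumptions made for illustration; this is not Enrichr-KG's code or API.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def enrichment_subgraph(query, libraries, background=20000, alpha=0.05):
    """Return (gene, term) links for terms enriched in the query gene set."""
    query = set(query)
    links = []
    for library, terms in libraries.items():
        for term, genes in terms.items():
            overlap = query & set(genes)
            if not overlap:
                continue
            p = hypergeom_tail(len(overlap), len(genes), len(query), background)
            if p < alpha:
                links.extend((gene, f"{library}:{term}") for gene in sorted(overlap))
    return links

# Hypothetical toy libraries (not real Enrichr content).
toy_libraries = {
    "Pathways": {"toy_pathway_A": ["G1", "G2", "G3", "G4"]},
    "CellTypes": {"toy_cell_type_B": ["G2", "G3", "G9"], "toy_cell_type_C": ["G7", "G8"]},
}
links = enrichment_subgraph(["G1", "G2", "G3"], toy_libraries)
```

Each link is one edge of the subgraph; a gene that appears in links to terms from different libraries is what ties the cross-library results together.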

3.

GeneRanger and TargetRanger: processed gene and protein expression levels across cells and tissues for target discovery.

Marino, Giacomo B; Ngai, Michael; Clarke, Daniel J B; Fleishman, Reid H; Deng, Eden Z; Xie, Zhuorui; Ahmed, Nasheath; Ma'ayan, Avi.

Nucleic Acids Res ; 51(W1): W213-W224, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166966

ABSTRACT

Several atlasing efforts aim to profile human gene and protein expression across tissues, cell types and cell lines in normal physiology, development and disease. One utility of these resources is to examine the expression of a single gene across all cell types, tissues and cell lines in each atlas. However, there is currently no centralized place that integrates data from several atlases to provide this type of data in a uniform format for visualization, analysis and download, and via an application programming interface. To address this need, GeneRanger is a web server that provides access to processed data about gene and protein expression across normal human cell types, tissues and cell lines from several atlases. At the same time, TargetRanger is a related web server that takes as input RNA-seq data from profiled human cells and tissues, and then compares the uploaded input data to expression levels across the atlases to identify genes that are highly expressed in the input and lowly expressed across normal human cell types and tissues. Identified targets can be filtered by transmembrane or secreted proteins. The results from GeneRanger and TargetRanger are visualized as box and scatter plots, and as interactive tables. GeneRanger and TargetRanger are available from https://generanger.maayanlab.cloud and https://targetranger.maayanlab.cloud, respectively.

Subject(s)

Proteomics; Pseudogenes; Software; Humans; Cell Line; RNA-Seq; Internet
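
The target-identification idea in the TargetRanger abstract above (high expression in the input, low expression across normal tissues) can be sketched with a simple fold-change filter. This is a simplified stand-in and not the server's statistical procedure; the thresholds, the pseudocount, the function name, and the toy values are all assumptions.

```python
def candidate_targets(tumor_expr, normal_atlas, min_tumor=10.0, min_fold=4.0):
    """Rank genes that are high in the input and low in every normal tissue.

    tumor_expr: {gene: expression in the uploaded sample}
    normal_atlas: {gene: {tissue: expression}}
    """
    hits = []
    for gene, level in tumor_expr.items():
        tissues = normal_atlas.get(gene)
        if not tissues or level < min_tumor:
            continue
        highest_normal = max(tissues.values())
        # Pseudocount avoids division by zero for genes silent in normal tissue.
        fold = level / (highest_normal + 1.0)
        if fold >= min_fold:
            hits.append((gene, round(fold, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical values in arbitrary units.
tumor = {"GENE_A": 120.0, "GENE_B": 80.0, "GENE_C": 5.0}
atlas = {
    "GENE_A": {"liver": 2.0, "lung": 4.0},
    "GENE_B": {"liver": 60.0, "lung": 3.0},
    "GENE_C": {"liver": 0.0, "lung": 0.0},
}
targets = candidate_targets(tumor, atlas)
```

Comparing against the highest normal tissue rather than the average is a deliberately conservative choice here: a gene that is abundant in even one normal tissue is excluded.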

4.

SigCom LINCS: data and metadata search engine for a million gene expression signatures.

Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Lachmann, Alexander; Jeon, Minji; Chen, Kerwin; Jagodnik, Kathleen M; Jenkins, Sherry L; Kuleshov, Maxim V; Wojciechowicz, Megan L; Schürer, Stephan C; Medvedovic, Mario; Ma'ayan, Avi.

Nucleic Acids Res ; 50(W1): W697-W709, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35524556

ABSTRACT

Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.

Subject(s)

Metadata; Transcriptome; Transcriptome/genetics; Search Engine
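
To make the notion of mimickers and reversers in the SigCom LINCS abstract above concrete, the following sketch scores stored signatures against a query of up and down gene sets by counting same-direction and opposite-direction overlaps. The scoring formula, function names, and toy signatures are assumptions for illustration; the abstract does not specify the similarity measure the server uses.

```python
def direction_score(query_up, query_down, sig_up, sig_down):
    """Positive for mimickers, negative for reversers of the query."""
    query_up, query_down = set(query_up), set(query_down)
    sig_up, sig_down = set(sig_up), set(sig_down)
    same = len(query_up & sig_up) + len(query_down & sig_down)
    opposite = len(query_up & sig_down) + len(query_down & sig_up)
    total = len(query_up) + len(query_down)
    return (same - opposite) / total if total else 0.0

def rank_signatures(query_up, query_down, signatures):
    """Sort signatures from strongest mimicker to strongest reverser."""
    scored = [
        (name, direction_score(query_up, query_down, sig["up"], sig["down"]))
        for name, sig in signatures.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical signatures, not LINCS data.
signatures = {
    "toy_drug_1": {"up": ["A", "B"], "down": ["X", "Y"]},
    "toy_drug_2": {"up": ["X", "Y"], "down": ["A", "B"]},
    "toy_drug_3": {"up": ["A"], "down": ["Q"]},
}
ranking = rank_signatures(["A", "B"], ["X", "Y"], signatures)
```

In this toy run the first entry of `ranking` is the mimicker (`toy_drug_1`, score 1.0) and the last is the reverser (`toy_drug_2`, score -1.0).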

5.

Transforming L1000 profiles to RNA-seq-like profiles with deep learning.

Jeon, Minji; Xie, Zhuorui; Evangelista, John E; Wojciechowicz, Megan L; Clarke, Daniel J B; Ma'ayan, Avi.

BMC Bioinformatics ; 23(1): 374, 2022 Sep 13.

Article in English | MEDLINE | ID: mdl-36100892

ABSTRACT

The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to >30,000 chemical and genetic perturbations. In total, there are currently over 3 million available L1000 profiles. Such a dataset is invaluable for the discovery of drug and target candidates and for inferring mechanisms of action for small molecules. The L1000 assay measures the mRNA expression of only 978 landmark genes, while 11,350 additional genes are reliably inferred computationally. The lack of full genome coverage limits knowledge discovery for half of the human protein-coding genes, and the potential for integration with other transcriptomics profiling data. Here we present a two-step deep learning model that transforms L1000 profiles to RNA-seq-like profiles. The input to the model is the measured expression of the 978 landmark genes, while the output is a vector of RNA-seq-like expression values for 23,614 genes. The model first transforms the landmark genes into RNA-seq-like 978-gene profiles using a modified CycleGAN model applied to unpaired data. The transformed 978 RNA-seq-like landmark genes are then extrapolated into the full genome space with a fully connected neural network model. The two-step model achieves a Pearson's correlation coefficient of 0.914 and a root mean square error of 1.167 when tested on a published paired L1000/RNA-seq dataset produced by the LINCS and GTEx programs. The processed RNA-seq-like profiles are made available for download, signature search, and gene-centric reverse search with unique case studies.

Subject(s)

Deep Learning; Gene Expression Profiling; Humans; RNA-Seq; Transcriptome
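
The two evaluation metrics named in the abstract above, Pearson's correlation coefficient and root mean square error, can be computed as in this minimal sketch. The toy vectors are invented, and the abstract does not say whether the reported values were computed per gene, per sample, or over all values, so no such aggregation is implied here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rmse(x, y):
    """Root mean square error between predicted and measured values."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical predicted and measured expression values.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.5, 2.5, 3.5, 4.5]
r = pearson(predicted, measured)
error = rmse(predicted, measured)
```

The toy vectors differ by a constant offset, so the correlation is essentially 1 while the RMSE is 0.5, which shows why the two metrics are reported together: correlation alone would hide a systematic shift.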

6.

DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules.

Kropiwnicki, Eryk; Lachmann, Alexander; Clarke, Daniel J B; Xie, Zhuorui; Jagodnik, Kathleen M; Ma'ayan, Avi.

BMC Bioinformatics ; 23(1): 76, 2022 Feb 19.

Article in English | MEDLINE | ID: mdl-35183110

ABSTRACT

BACKGROUND: PubMed contains millions of abstracts that co-mention terms that describe drugs with other biomedical terms such as genes or diseases. Unique opportunities exist for leveraging these co-mentions by integrating them with other drug-drug similarity resources such as the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 signatures to develop novel hypotheses. RESULTS: DrugShot is a web-based server application and an Appyter that enables users to enter any biomedical search term into a simple input form to receive ranked lists of drugs and other small molecules based on their relevance to the search term. To produce ranked lists of small molecules, DrugShot cross-references returned PubMed identifiers (PMIDs) with DrugRIF or AutoRIF, which are curated resources of drug-PMID associations, to produce an associated small molecule list where each small molecule is ranked according to total co-mentions with the search term from shared PubMed IDs. Additionally, using two types of drug-drug similarity matrices, lists of small molecules are predicted to be associated with the search term. Such predictions are based on literature co-mentions and signature similarity from LINCS L1000 drug-induced gene expression profiles. CONCLUSIONS: DrugShot prioritizes drugs and small molecules associated with biomedical search terms. In addition to listing known associations, DrugShot predicts additional drugs and small molecules related to any search term. Hence, DrugShot can be used to prioritize drugs and preclinical compounds for drug repurposing and suggest indications and adverse events for preclinical compounds. DrugShot is freely and openly available at: https://maayanlab.cloud/drugshot and https://appyters.maayanlab.cloud/#/DrugShot .

Subject(s)

Drug Repositioning; Software; Gene Library; Transcriptome
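
The co-mention ranking step described in the DrugShot abstract above (cross-referencing the PMIDs returned for a search term with drug-PMID associations) can be sketched as a counting problem. The PMIDs, drug names, and function name below are hypothetical, and the real DrugRIF and AutoRIF resources are not queried.

```python
from collections import Counter

def rank_by_comentions(search_pmids, drug_pmid_pairs):
    """Count, per drug, how many of its PMIDs were returned for the search term."""
    returned = set(search_pmids)
    # De-duplicate pairs so a repeated association is counted once.
    counts = Counter(drug for drug, pmid in set(drug_pmid_pairs) if pmid in returned)
    return counts.most_common()

# Hypothetical PMIDs and drug names.
search_hits = [101, 102, 103, 104]
associations = [
    ("toy_drug_a", 101), ("toy_drug_a", 103), ("toy_drug_a", 999),
    ("toy_drug_b", 104), ("toy_drug_c", 555),
]
ranked = rank_by_comentions(search_hits, associations)
```

Drugs with no shared PMIDs (here `toy_drug_c`) drop out of the list entirely; the prediction step based on drug-drug similarity matrices mentioned in the abstract is not covered by this sketch.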

7.

LINCS Data Portal 2.0: next generation access point for perturbation-response signatures.

Stathias, Vasileios; Turner, John; Koleti, Amar; Vidovic, Dusica; Cooper, Daniel; Fazel-Najafabadi, Mehdi; Pilarczyk, Marcin; Terryn, Raymond; Chung, Caty; Umeano, Afoma; Clarke, Daniel J B; Lachmann, Alexander; Evangelista, John Erol; Ma'ayan, Avi; Medvedovic, Mario; Schürer, Stephan C.

Nucleic Acids Res ; 48(D1): D431-D439, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31701147

ABSTRACT

The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program with the goal of generating a large-scale and comprehensive catalogue of perturbation-response signatures by utilizing a diverse collection of perturbations across many model systems and assay types. The LINCS Data Portal (LDP) has been the primary access point for the compendium of LINCS data and has been widely utilized. Here, we report the first major update of LDP (http://lincsportal.ccs.miami.edu/signatures) with substantial changes in the data architecture and APIs, a completely redesigned user interface, and enhanced curated metadata annotations to support more advanced, intuitive and deeper querying, exploration and analysis capabilities. The cornerstone of this update has been the decision to reprocess all high-level LINCS datasets and make them accessible at the data point level enabling users to directly access and download any subset of signatures across the entire library independent from the originating source, project or assay. Access to the individual signatures also enables the newly implemented signature search functionality, which utilizes the iLINCS platform to identify conditions that mimic or reverse gene set queries. A newly designed query interface enables global metadata search with autosuggest across all annotations associated with perturbations, model systems, and signatures.

Subject(s)

Cell Biology; Databases, Factual; Clinical Trials as Topic; Computational Biology; Data Curation; Humans; Information Storage and Retrieval; Metadata; National Institutes of Health (U.S.); United States; User-Computer Interface

8.

EnrichrBot: Twitter bot tracking tweets about human genes.

Bartal, Alon; Lachmann, Alexander; Clarke, Daniel J B; Seiden, Allison H; Jagodnik, Kathleen M; Ma'ayan, Avi.

Bioinformatics ; 36(12): 3932-3934, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32277816

ABSTRACT

MOTIVATION: Micro-blogging with Twitter to communicate new results, discuss ideas and share techniques is becoming central. While most Twitter users are real people, the Twitter API provides the opportunity to develop Twitter bots and to analyze global trends in tweets. RESULTS: EnrichrBot is a bot that tracks and tweets information about human genes implementing six principal functions: (i) tweeting information about under-studied genes including non-coding lncRNAs, (ii) replying to requests for information about genes, (iii) responding to GWASbot, another bot that tweets Manhattan plots from genome-wide association study analysis of the UK Biobank, (iv) tweeting randomly selected gene sets from the Enrichr database for analysis with Enrichr, (v) responding to mentions of human genes in tweets with additional information about these genes and (vi) tweeting a weekly report about the most trending genes on Twitter. AVAILABILITY AND IMPLEMENTATION: https://twitter.com/botenrichr; source code: https://github.com/MaayanLab/EnrichrBot. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Social Media; Blogging; Genome-Wide Association Study; Humans

9.

eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks.

Clarke, Daniel J B; Kuleshov, Maxim V; Schilder, Brian M; Torre, Denis; Duffy, Mary E; Keenan, Alexandra B; Lachmann, Alexander; Feldmann, Axel S; Gundersen, Gregory W; Silverstein, Moshe C; Wang, Zichen; Ma'ayan, Avi.

Nucleic Acids Res ; 46(W1): W171-W179, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29800326

ABSTRACT

While gene expression data at the mRNA level can be globally and accurately measured, profiling the activity of cell signaling pathways is currently much more difficult. eXpression2Kinases (X2K) computationally predicts involvement of upstream cell signaling pathways, given a signature of differentially expressed genes. X2K first computes enrichment for transcription factors likely to regulate the expression of the differentially expressed genes. The next step of X2K connects these enriched transcription factors through known protein-protein interactions (PPIs) to construct a subnetwork. The final step performs kinase enrichment analysis on the members of the subnetwork. X2K Web is a new implementation of the original eXpression2Kinases algorithm with important enhancements. X2K Web includes many new transcription factor and kinase libraries, and PPI networks. For demonstration, thousands of gene expression signatures induced by kinase inhibitors, applied to six breast cancer cell lines, are provided for fetching directly into X2K Web. The results are displayed as interactive downloadable vector graphic network images and bar graphs. Benchmarking various settings via random permutations enabled the identification of an optimal set of parameters to be used as the default settings in X2K Web. X2K Web is freely available from http://X2K.cloud.

Subject(s)

Gene Expression; Protein Kinases/metabolism; Signal Transduction; Software; Animals; Cell Line, Tumor; Gene Expression/drug effects; Humans; Internet; Mice; Protein Interaction Mapping; Protein Kinase Inhibitors/pharmacology; Signal Transduction/genetics; Transcription Factors/metabolism
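
The three steps listed in the X2K abstract above (transcription factor enrichment, subnetwork construction through protein-protein interactions, kinase enrichment) can be outlined as below. This is a heavily simplified sketch: enrichment is reduced to raw overlap counts, the subnetwork keeps only direct interactors of the selected transcription factors, and every name and data value is hypothetical rather than taken from X2K Web.

```python
def top_by_overlap(query, library, top_n):
    """Rank library terms by the number of members shared with the query."""
    query = set(query)
    scored = [(len(query & set(members)), name) for name, members in library.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_n]]

def x2k_sketch(de_genes, tf_targets, ppi_edges, kinase_substrates, top_n=2):
    # Step 1: transcription factors whose targets overlap the signature.
    tfs = top_by_overlap(de_genes, tf_targets, top_n)
    # Step 2: subnetwork of the TFs plus proteins that interact with them.
    subnetwork = set(tfs)
    for a, b in ppi_edges:
        if a in tfs or b in tfs:
            subnetwork.update((a, b))
    # Step 3: kinases whose substrates overlap the subnetwork.
    kinases = top_by_overlap(subnetwork, kinase_substrates, top_n)
    return tfs, subnetwork, kinases

# Hypothetical toy libraries and interactions.
tf_targets = {"TF1": ["g1", "g2", "g3"], "TF2": ["g3", "g4"], "TF3": ["g9"]}
ppi_edges = [("TF1", "P1"), ("P1", "TF2"), ("P2", "P3")]
kinase_substrates = {"K1": ["TF1", "P1"], "K2": ["P3"], "K3": ["TF2"]}
tfs, subnetwork, kinases = x2k_sketch(
    ["g1", "g2", "g3", "g4"], tf_targets, ppi_edges, kinase_substrates
)
```

Each step consumes the output of the previous one, which is the property that lets a gene expression signature be traced back to candidate upstream kinases.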

10.

RummaGEO: Automatic Mining of Human and Mouse Gene Sets from GEO.

Marino, Giacomo B; Clarke, Daniel J B; Deng, Eden Z; Ma'ayan, Avi.

bioRxiv ; 2024 Apr 13.

Article in English | MEDLINE | ID: mdl-38645198

ABSTRACT

The Gene Expression Omnibus (GEO) is a major open biomedical research repository for transcriptomics and other omics datasets. It currently contains millions of gene expression samples from tens of thousands of studies collected by many biomedical research laboratories from around the world. While users of the GEO repository can search the metadata describing studies for locating relevant datasets, there are currently no methods or resources that facilitate global search of GEO at the data level. To address this shortcoming, we developed RummaGEO, a webserver application that enables gene expression signature search of a large collection of human and mouse RNA-seq studies deposited into GEO. To develop the search engine, we performed offline automatic identification of sample conditions from the uniformly aligned GEO studies available from ARCHS4. We then computed differential expression signatures to extract gene sets from these studies. In total, RummaGEO currently contains 135,264 human and 158,062 mouse gene sets extracted from 23,395 GEO studies. Next, we analyzed the contents of the RummaGEO database to identify statistical patterns and perform various global analyses. The contents of the RummaGEO database are provided as a web-server search engine with signature search, PubMed search, and metadata search functionalities. Overall, RummaGEO provides an unprecedented resource for the biomedical research community enabling hypothesis generation for many future studies. The RummaGEO search engine is available from: https://rummageo.com/.

11.

Multiomics2Targets identifies targets from cancer cohorts profiled with transcriptomics, proteomics, and phosphoproteomics.

Deng, Eden Z; Marino, Giacomo B; Clarke, Daniel J B; Diamant, Ido; Resnick, Adam C; Ma, Weiping; Wang, Pei; Ma'ayan, Avi.

Cell Rep Methods ; 4(8): 100839, 2024 Aug 19.

Article in English | MEDLINE | ID: mdl-39127042

ABSTRACT

The availability of data from profiling of cancer patients with multiomics is rapidly increasing. However, integrative analysis of such data for personalized target identification is not trivial. Multiomics2Targets is a platform that enables users to upload transcriptomics, proteomics, and phosphoproteomics data matrices collected from the same cohort of cancer patients. After uploading the data, Multiomics2Targets produces a report that resembles a research publication. The uploaded matrices are processed, analyzed, and visualized using the tools Enrichr, KEA3, ChEA3, Expression2Kinases, and TargetRanger to identify and prioritize proteins, genes, and transcripts as potential targets. Figures and tables, as well as descriptions of the methods and results, are automatically generated. Reports include an abstract, introduction, methods, results, discussion, conclusions, and references and are exportable as citable PDFs and Jupyter Notebooks. Multiomics2Targets is applied to analyze version 3 of the Clinical Proteomic Tumor Analysis Consortium (CPTAC3) pan-cancer cohort, identifying potential targets for each CPTAC3 cancer subtype. Multiomics2Targets is available from https://multiomics2targets.maayanlab.cloud/.

Subject(s)

Neoplasms; Phosphoproteins; Proteomics; Transcriptome; Humans; Proteomics/methods; Neoplasms/genetics; Neoplasms/metabolism; Phosphoproteins/metabolism; Phosphoproteins/genetics; Cohort Studies; Gene Expression Profiling/methods; Software; Computational Biology/methods

12.

Rummagene: massive mining of gene sets from supporting materials of biomedical research publications.

Clarke, Daniel J B; Marino, Giacomo B; Deng, Eden Z; Xie, Zhuorui; Evangelista, John Erol; Ma'ayan, Avi.

Commun Biol ; 7(1): 482, 2024 Apr 20.

Article in English | MEDLINE | ID: mdl-38643247

ABSTRACT

Many biomedical research publications contain gene sets in their supporting tables, and these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. So far, we scanned 5,448,589 articles to find 121,237 articles that contain 642,389 gene sets. These sets are served for enrichment analysis, free text, and table title search. Investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses, and for gene function predictions. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene brings to surface the ability to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at https://rummagene.com .

Subject(s)

Biomedical Research; Data Mining; Animals; Software; Databases, Factual; Gene Expression Regulation; Mammals
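
The Rummagene abstract above mentions combining gene set similarity with abstract similarity to surface surprising relationships. One way to picture this is to look for pairs of publications whose gene sets overlap strongly while their abstracts share little vocabulary. In the sketch below, Jaccard similarity over word tokens stands in for whatever text similarity the authors use, and the thresholds, function names, and records are assumptions.

```python
def jaccard(a, b):
    """Jaccard index of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def surprising_pairs(records, min_gene_sim=0.5, max_text_sim=0.2):
    """Pairs with similar gene sets but dissimilar abstracts."""
    names = sorted(records)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            gene_sim = jaccard(records[first]["genes"], records[second]["genes"])
            text_sim = jaccard(records[first]["abstract"].lower().split(),
                               records[second]["abstract"].lower().split())
            if gene_sim >= min_gene_sim and text_sim <= max_text_sim:
                pairs.append((first, second))
    return pairs

# Hypothetical records, not Rummagene content.
records = {
    "paper_1": {"genes": ["A", "B", "C"], "abstract": "kidney fibrosis in mice"},
    "paper_2": {"genes": ["A", "B", "C", "D"], "abstract": "neuronal aging and memory"},
    "paper_3": {"genes": ["A", "B", "C"], "abstract": "kidney fibrosis in rats"},
}
pairs = surprising_pairs(records)
```

Here `paper_1` and `paper_3` share identical gene sets but are not reported, because their abstracts are also similar; only the pairs involving the topically unrelated `paper_2` are kept.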

13.

PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices.

Lachmann, Alexander; Rizzo, Kaeli A; Bartal, Alon; Jeon, Minji; Clarke, Daniel J B; Ma'ayan, Avi.

PeerJ ; 11: e14927, 2023.

Article in English | MEDLINE | ID: mdl-36874981

ABSTRACT

Background: Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein-protein interactions. However, the performance of the predictions varies depending on whether the gene annotations and interactions are cell type and tissue specific or agnostic. Tissue and cell type-specific gene-gene co-expression data can be useful for making more accurate predictions because many genes perform their functions in unique ways in different cellular contexts. However, identifying the optimal tissues and cell types to partition the global gene-gene co-expression matrix is challenging. Results: Here we introduce and validate an approach called PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP) for improved gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations including pathway membership, Gene Ontology terms, as well as human and mouse phenotypes. Predictions made with PrismEXP outperform predictions made with the global cross-tissue co-expression correlation matrix approach on all tested domains, and training using one annotation domain can be used to predict annotations in other domains. Conclusions: By demonstrating the utility of PrismEXP predictions in multiple use cases we show how PrismEXP can be used to enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter. Availability: The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from: https://maayanlab.cloud/prismexp; PrismEXP is also available as an Appyter: https://appyters.maayanlab.cloud/PrismEXP/; and as a Python package: https://github.com/maayanlab/prismexp.

Subject(s)

Mammals; Humans; Animals; Mice; Molecular Sequence Annotation; Gene Ontology; Phenotype
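
The idea of stratified co-expression in the PrismEXP abstract above can be illustrated by computing, for one gene and one gene set, the average correlation within each context-specific matrix. The nested-dictionary matrices, the stratum names, and the use of `max` to combine features are assumptions made for this sketch; according to the abstract, PrismEXP itself relies on a trained model, which is not reproduced here.

```python
def mean_correlation(gene, members, corr):
    """Average correlation between `gene` and the other members of a gene set."""
    row = corr.get(gene, {})
    values = [row[m] for m in members if m != gene and m in row]
    return sum(values) / len(values) if values else 0.0

def stratified_features(gene, members, strata):
    """One feature per context-specific co-expression matrix."""
    return {name: mean_correlation(gene, members, corr) for name, corr in strata.items()}

# Hypothetical correlation values for two strata.
strata = {
    "toy_liver": {"G1": {"G2": 0.9, "G3": 0.7}},
    "toy_brain": {"G1": {"G2": 0.1, "G3": -0.1}},
}
features = stratified_features("G1", ["G2", "G3"], strata)
# Placeholder for a trained model: keep the strongest context-specific signal.
score = max(features.values())
```

In the toy data the gene correlates with the set only in one stratum, a signal that a single global matrix averaging both contexts would dilute.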

14.

lncHUB2: aggregated and inferred knowledge about human and mouse lncRNAs.

Marino, Giacomo B; Wojciechowicz, Megan L; Clarke, Daniel J B; Kuleshov, Maxim V; Xie, Zhuorui; Jeon, Minji; Lachmann, Alexander; Ma'ayan, Avi.

Database (Oxford) ; 2023, 2023 03 04.

Article in English | MEDLINE | ID: mdl-36869839

ABSTRACT

Long non-coding ribonucleic acids (lncRNAs) account for the largest group of non-coding RNAs. However, knowledge about their function and regulation is limited. lncHUB2 is a web server database that provides known and inferred knowledge about the function of 18,705 human and 11,274 mouse lncRNAs. lncHUB2 produces reports that contain the secondary structure fold of the lncRNA, related publications, the most correlated coding genes, the most correlated lncRNAs, a network that visualizes the most correlated genes, predicted mouse phenotypes, predicted membership in biological processes and pathways, predicted upstream transcription factor regulators, and predicted disease associations. In addition, the reports include subcellular localization information; expression across tissues, cell types, and cell lines; and predicted small molecules and CRISPR knockout (CRISPR-KO) genes prioritized based on their likelihood to up- or downregulate the expression of the lncRNA. Overall, lncHUB2 is a database with rich information about human and mouse lncRNAs and as such it can facilitate hypothesis generation for many future studies. The lncHUB2 database is available at https://maayanlab.cloud/lncHUB2. Database URL: https://maayanlab.cloud/lncHUB2.

Subject(s)

RNA, Long Noncoding; Humans; Animals; Mice; Cell Line; Clustered Regularly Interspaced Short Palindromic Repeats; Databases, Factual; Knowledge

15.

Computational screen to identify potential targets for immunotherapeutic identification and removal of senescence cells.

Deng, Eden Z; Fleishman, Reid H; Xie, Zhuorui; Marino, Giacomo B; Clarke, Daniel J B; Ma'ayan, Avi.

Aging Cell ; 22(6): e13809, 2023 06.

Article in English | MEDLINE | ID: mdl-37082798

ABSTRACT

To prioritize gene and protein candidates that may enable the selective identification and removal of senescent cells, we compared gene expression signatures from replicative senescent cells to transcriptomics and proteomics atlases of normal human tissues and cell types. RNA-seq samples from in vitro senescent cells (6 studies, 13 conditions) were analyzed for identifying targets at the gene and transcript levels that are highly expressed in senescent cells compared to their expression in normal human tissues and cell types. A gene set made of 301 genes called SenoRanger was established based on consensus analysis across studies and backgrounds. Of the identified senescence-associated targets, 29% of the genes in SenoRanger are also highly differentially expressed in aged tissues from GTEx. The SenoRanger gene set includes previously known as well as novel senescence-associated genes. Pathway analysis that connected the SenoRanger genes to their functional annotations confirms their potential role in several aging and senescence-related processes. Overall, SenoRanger provides solid hypotheses about potentially useful targets for identifying and removing senescence cells.

Subject(s)

Aging; Cellular Senescence; Humans; Aged; Cellular Senescence/genetics; Aging/genetics; Gene Expression Profiling; Cell Line; Immunotherapy

16.

D2H2: diabetes data and hypothesis hub.

Marino, Giacomo B; Ahmed, Nasheath; Xie, Zhuorui; Jagodnik, Kathleen M; Han, Jason; Clarke, Daniel J B; Lachmann, Alexander; Keller, Mark P; Attie, Alan D; Ma'ayan, Avi.

Bioinform Adv ; 3(1): vbad178, 2023.

Article in English | MEDLINE | ID: mdl-38107655

ABSTRACT

Motivation: There is a rapid growth in the production of omics datasets collected by the diabetes research community. However, such published data are underutilized for knowledge discovery. To make bioinformatics tools and published omics datasets from the diabetes field more accessible to biomedical researchers, we developed the Diabetes Data and Hypothesis Hub (D2H2). Results: D2H2 contains hundreds of high-quality curated transcriptomics datasets relevant to diabetes, accessible via a user-friendly web-based portal. The collected and processed datasets are curated from the Gene Expression Omnibus (GEO). Each curated study has a dedicated page that provides data visualization, differential gene expression analysis, and single-gene queries. To enable the investigation of these curated datasets and to provide easy access to bioinformatics tools that serve gene and gene set-related knowledge, we developed the D2H2 chatbot. Utilizing GPT, we prompt users to enter free text about their data analysis needs. Parsing the user prompt, together with specifying information about all D2H2 available tools and workflows, we answer user queries by invoking the most relevant tools via the tools' API. D2H2 also has a hypotheses generation module where gene sets are randomly selected from the bulk RNA-seq precomputed signatures. We then find highly overlapping gene sets extracted from publications listed in PubMed Central with abstract dissimilarity. With the help of GPT, we speculate about a possible explanation of the high overlap between the gene sets. Overall, D2H2 is a platform that provides a suite of bioinformatics tools and curated transcriptomics datasets for hypothesis generation. Availability and implementation: D2H2 is available at: https://d2h2.maayanlab.cloud/ and the source code is available from GitHub at https://github.com/MaayanLab/D2H2-site under the CC BY-NC 4.0 license.

17.

Toxicology knowledge graph for structural birth defects.

Evangelista, John Erol; Clarke, Daniel J B; Xie, Zhuorui; Marino, Giacomo B; Utti, Vivian; Jenkins, Sherry L; Ahooyi, Taha Mohseni; Bologa, Cristian G; Yang, Jeremy J; Binder, Jessica L; Kumar, Praveen; Lambert, Christophe G; Grethe, Jeffrey S; Wenger, Eric; Taylor, Deanne; Oprea, Tudor I; de Bono, Bernard; Ma'ayan, Avi.

Commun Med (Lond) ; 3(1): 98, 2023 Jul 17.

Article in English | MEDLINE | ID: mdl-37460679

ABSTRACT

BACKGROUND: Birth defects are functional and structural abnormalities that impact about 1 in 33 births in the United States. They have been attributed to genetic and other factors such as drugs, cosmetics, food, and environmental pollutants during pregnancy, but for most birth defects there are no known causes. METHODS: To further characterize associations between small molecule compounds and their potential to induce specific birth abnormalities, we gathered knowledge from multiple sources to construct a reproductive toxicity Knowledge Graph (ReproTox-KG) with a focus on associations between birth defects, drugs, and genes. Specifically, we gathered drug/birth-defect associations from co-mentions in published abstracts, gene/birth-defect associations from genetic studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecules. RESULTS: Using ReproTox-KG and semi-supervised learning (SSL), we scored >30,000 preclinical small molecules for their potential to cross the placenta and induce birth defects, and identified >500 birth-defect/gene/drug cliques that can be used to explain molecular mechanisms for drug-induced birth defects. The ReproTox-KG can be accessed via a web-based user interface available at https://maayanlab.cloud/reprotox-kg. This site enables users to explore the associations between birth defects, approved and preclinical drugs, and all human genes. CONCLUSIONS: ReproTox-KG provides a resource for exploring knowledge about the molecular mechanisms of birth defects, with the potential to predict the likelihood of genes and preclinical small molecules to induce birth defects.

While birth defects are common, for most birth defects there are no known causes. During pregnancy, developing babies are exposed to drugs, cosmetics, food, and environmental pollutants that may cause birth defects. However, exactly how these environmental factors are involved in producing birth defects is difficult to discern. Also, birth defects can be a consequence of the genes inherited from the parents. We combined general data about human genes and drugs with specific data previously implicating genes and drugs in inducing birth defects to create a knowledge graph representation that connects genes, drugs, and birth defects. This knowledge graph can be used to explore new links that may explain why birth defects occur, particularly those that result from a combination of inherited and environmental influences.
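
The birth-defect/gene/drug cliques mentioned in the abstract above can be pictured as triangles in a graph with three node types. The sketch below is one simplified reading of that idea and not the ReproTox-KG implementation: the function name, the untyped edge list, and all node labels are hypothetical placeholders.

```python
from itertools import product


def find_triangles(edges, defects, genes, drugs):
    """Return (defect, gene, drug) triples in which every pair is linked."""
    # Store edges as unordered pairs so direction does not matter.
    linked = {frozenset(edge) for edge in edges}
    return [
        (d, g, c)
        for d, g, c in product(sorted(defects), sorted(genes), sorted(drugs))
        if {frozenset((d, g)), frozenset((d, c)), frozenset((g, c))} <= linked
    ]


# Hypothetical toy graph; the labels carry no biological meaning.
edges = [
    ("defect_1", "gene_A"),
    ("defect_1", "drug_X"),
    ("gene_A", "drug_X"),
    ("defect_1", "gene_B"),
    ("gene_B", "drug_Y"),
]

triangles = find_triangles(
    edges,
    defects={"defect_1"},
    genes={"gene_A", "gene_B"},
    drugs={"drug_X", "drug_Y"},
)
print(triangles)
```

Only the triple in which all three pairwise links exist is reported; a triple with one missing edge, such as defect_1, gene_B, and drug_Y, is skipped.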

18.

Gene set predictor for post-treatment Lyme disease.

Clarke, Daniel J B; Rebman, Alison W; Fan, Jinshui; Soloski, Mark J; Aucott, John N; Ma'ayan, Avi.

Cell Rep Med ; 3(11): 100816, 2022 Nov 15.

Article in English | MEDLINE | ID: mdl-36384094

ABSTRACT

Lyme disease (LD) is a tick-borne disease whose post-treatment sequelae are not well understood. For this study, we enrolled 152 individuals with symptoms of post-treatment LD (PTLD) to profile their peripheral blood mononuclear cells (PBMCs) with RNA sequencing (RNA-seq). Combined with RNA-seq data from 72 individuals with acute LD and 44 uninfected controls, we investigated differences in gene expression. We observe that most individuals with PTLD have an inflammatory signature that is distinct from that of the acute LD group. By distilling gene sets from this study with gene sets from other sources, we identify a subset of genes that are highly expressed in the cohorts but are not already established as biomarkers for inflammatory response or other viral or bacterial infections. We further reduce this gene set by feature importance to establish an mRNA biomarker set capable of distinguishing healthy individuals from those with acute LD or PTLD as a candidate for translation into an LD diagnostic.

Subject(s)

Lyme Disease; Post-Lyme Disease Syndrome; Humans; Post-Lyme Disease Syndrome/metabolism; Leukocytes, Mononuclear/metabolism; Sequence Analysis, RNA; Lyme Disease/diagnosis; RNA; Biomarkers

19.

Gene and drug landing page aggregator.

Clarke, Daniel J B; Kuleshov, Maxim V; Xie, Zhuorui; Evangelista, John E; Meyers, Marilyn R; Kropiwnicki, Eryk; Jenkins, Sherry L; Ma'ayan, Avi.

Bioinform Adv ; 2(1): vbac013, 2022.

Article in English | MEDLINE | ID: mdl-35368424

ABSTRACT

Motivation: Many biological and biomedical researchers commonly search online resources for information about genes and drugs. For the most part, such information is served as landing pages in disparate data repositories and web portals. Results: The Gene and Drug Landing Page Aggregator (GDLPA) provides users with access to 50 gene-centric and 19 drug-centric repositories, enabling them to retrieve landing pages corresponding to their gene and drug queries. Bringing these resources together into one dashboard that directs users to the landing pages across many resources can help centralize gene- and drug-centric knowledge, as well as raise awareness of available resources that may be missed when using standard search engines. To demonstrate the utility of GDLPA, case studies for the gene klotho and the drug remdesivir were developed. The first case study highlights the potential role of klotho as a drug target for aging and kidney disease, while the second study gathers knowledge regarding approval, usage, and safety for remdesivir, the first approved coronavirus disease 2019 therapeutic. Finally, based on our experience, we provide guidelines for developing effective landing pages for genes and drugs. Availability and implementation: GDLPA is open source and is available from: https://cfde-gene-pages.cloud/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

20.

Getting Started with the IDG KMC Datasets and Tools.

Kropiwnicki, Eryk; Binder, Jessica L; Yang, Jeremy J; Holmes, Jayme; Lachmann, Alexander; Clarke, Daniel J B; Sheils, Timothy; Kelleher, Keith J; Metzger, Vincent T; Bologa, Cristian G; Oprea, Tudor I; Ma'ayan, Avi.

Curr Protoc ; 2(1): e355, 2022 Jan.

Article in English | MEDLINE | ID: mdl-35085427

ABSTRACT

The Illuminating the Druggable Genome (IDG) consortium is a National Institutes of Health (NIH) Common Fund program designed to enhance our knowledge of under-studied proteins, more specifically, proteins unannotated within the three most commonly drug-targeted protein families: G-protein coupled receptors, ion channels, and protein kinases. Since 2014, the IDG Knowledge Management Center (IDG-KMC) has generated several open-access datasets and resources that jointly serve as a highly translational machine-learning-ready knowledgebase focused on human protein-coding genes and their products. The goal of the IDG-KMC is to develop comprehensive integrated knowledge for the druggable genome to illuminate the uncharacterized or poorly annotated portion of the druggable genome. The tools derived from the IDG-KMC provide either user-friendly visualizations or ways to impute the knowledge about potential targets using machine learning strategies. In the following protocols, we describe how to use each web-based tool to accelerate illumination in under-studied proteins. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Interacting with the Pharos user interface
Basic Protocol 2: Accessing the data in Harmonizome
Basic Protocol 3: The ARCHS4 resource
Basic Protocol 4: Making predictions about gene function with PrismExp
Basic Protocol 5: Using Geneshot to illuminate knowledge about under-studied targets
Basic Protocol 6: Exploring under-studied targets with TIN-X
Basic Protocol 7: Interacting with the DrugCentral user interface
Basic Protocol 8: Estimating Anti-SARS-CoV-2 activities with DrugCentral REDIAL-2020
Basic Protocol 9: Drug Set Enrichment Analysis using Drugmonizome
Basic Protocol 10: The Drugmonizome-ML Appyter
Basic Protocol 11: The Harmonizome-ML Appyter
Basic Protocol 12: GWAS target illumination with TIGA
Basic Protocol 13: Prioritizing kinases for lists of proteins and phosphoproteins with KEA3
Basic Protocol 14: Converting PubMed searches to drug sets with the DrugShot Appyter.

Subject(s)

Databases, Genetic; Genome; COVID-19; Humans; Machine Learning; Proteins; SARS-CoV-2
