1.

Making Common Fund data more findable: catalyzing a data ecosystem.

Charbonneau, Amanda L; Brady, Arthur; Czajkowski, Karl; Aluvathingal, Jain; Canchi, Saranya; Carter, Robert; Chard, Kyle; Clarke, Daniel J B; Crabtree, Jonathan; Creasy, Heather H; D'Arcy, Mike; Felix, Victor; Giglio, Michelle; Gingrich, Alicia; Harris, Rayna M; Hodges, Theresa K; Ifeonu, Olukemi; Jeon, Minji; Kropiwnicki, Eryk; Lim, Marisa C W; Liming, R Lee; Lumian, Jessica; Mahurkar, Anup A; Mandal, Meisha; Munro, James B; Nadendla, Suvarna; Richter, Rudyard; Romano, Cia; Rocca-Serra, Philippe; Schor, Michael; Schuler, Robert E; Tangmunarunkit, Hongsuda; Waldrop, Alex; Williams, Cris; Word, Karen; Sansone, Susanna-Assunta; Ma'ayan, Avi; Wagner, Rick; Foster, Ian; Kesselman, Carl; Brown, C Titus; White, Owen.

Gigascience ; 11, 2022 11 21.

Article En | MEDLINE | ID: mdl-36409836

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.

Ecosystem , Financial Management , Metadata

2.

Getting Started with LINCS Datasets and Tools.

Xie, Zhuorui; Kropiwnicki, Eryk; Wojciechowicz, Megan L; Jagodnik, Kathleen M; Shu, Ingrid; Bailey, Allison; Clarke, Daniel J B; Jeon, Minji; Evangelista, John Erol; V Kuleshov, Maxim; Lachmann, Alexander; Parigi, Abhijna A; Sanchez, Jose M; Jenkins, Sherry L; Ma'ayan, Avi.

Curr Protoc ; 2(7): e487, 2022 Jul.

Article En | MEDLINE | ID: mdl-35876555

The Library of Integrated Network-based Cellular Signatures (LINCS) was an NIH Common Fund program that aimed to expand our knowledge about human cellular responses to chemical, genetic, and microenvironment perturbations. Responses to perturbations were measured by transcriptomics, proteomics, cellular imaging, and other high content assays. The second phase of the LINCS program, which lasted 7 years, involved the engagement of six data and signature generation centers (DSGCs) and one data coordination and integration center (DCIC). The DSGCs and the DCIC developed several digital resources, including tools, databases, and workflows that aim to facilitate the use of the LINCS data and integrate this data with other publicly available data. The digital resources developed by the DSGCs and the DCIC can be used to gain new biological and pharmacological insights that can lead to the development of novel therapeutics. This protocol provides step-by-step instructions for processing the LINCS data into signatures, and utilizing the digital resources developed by the LINCS consortia for hypothesis generation and knowledge discovery. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Navigating L1000 tools and data in CLUE.io
Basic Protocol 2: Computing signatures from the L1000 data with the CD method
Basic Protocol 3: Analyzing lists of differentially expressed genes and querying them against the L1000 data with BioJupies and the Bulk RNA-seq Appyter
Basic Protocol 4: Utilizing the L1000FWD resource for drug discovery
Basic Protocol 5: KINOMEscan and the KINOMEscan Appyter
Basic Protocol 6: LINCS P100 and GCP Proteomics Assays
Basic Protocol 7: The LINCS Joint Project (LJP)
Basic Protocol 8: The LINCS Data Portals and SigCom LINCS
Basic Protocol 9: Creating and analyzing signatures with iLINCS.

Drug Discovery , Proteomics , Databases, Factual , Drug Discovery/methods , Gene Library , Humans , Transcriptome

3.

Gene and drug landing page aggregator.

Clarke, Daniel J B; Kuleshov, Maxim V; Xie, Zhuorui; Evangelista, John E; Meyers, Marilyn R; Kropiwnicki, Eryk; Jenkins, Sherry L; Ma'ayan, Avi.

Bioinform Adv ; 2(1): vbac013, 2022.

Article En | MEDLINE | ID: mdl-35368424

Motivation: Many biological and biomedical researchers commonly search for information about genes and drugs to gather knowledge from these resources. For the most part, such information is served as landing pages in disparate data repositories and web portals. Results: The Gene and Drug Landing Page Aggregator (GDLPA) provides users with access to 50 gene-centric and 19 drug-centric repositories, enabling them to retrieve landing pages corresponding to their gene and drug queries. Bringing these resources together into one dashboard that directs users to the landing pages across many resources can help centralize gene- and drug-centric knowledge, as well as raise awareness of available resources that may be missed when using standard search engines. To demonstrate the utility of GDLPA, case studies for the gene klotho and the drug remdesivir were developed. The first case study highlights the potential role of klotho as a drug target for aging and kidney disease, while the second study gathers knowledge regarding approval, usage, and safety for remdesivir, the first approved coronavirus disease 2019 therapeutic. Finally, based on our experience, we provide guidelines for developing effective landing pages for genes and drugs. Availability and implementation: GDLPA is open source and is available from: https://cfde-gene-pages.cloud/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

4.

DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules.

Kropiwnicki, Eryk; Lachmann, Alexander; Clarke, Daniel J B; Xie, Zhuorui; Jagodnik, Kathleen M; Ma'ayan, Avi.

BMC Bioinformatics ; 23(1): 76, 2022 Feb 19.

Article En | MEDLINE | ID: mdl-35183110

BACKGROUND: PubMed contains millions of abstracts that co-mention terms that describe drugs with other biomedical terms such as genes or diseases. Unique opportunities exist for leveraging these co-mentions by integrating them with other drug-drug similarity resources such as the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 signatures to develop novel hypotheses. RESULTS: DrugShot is a web-based server application and an Appyter that enables users to enter any biomedical search term into a simple input form to receive ranked lists of drugs and other small molecules based on their relevance to the search term. To produce ranked lists of small molecules, DrugShot cross-references returned PubMed identifiers (PMIDs) with DrugRIF or AutoRIF, which are curated resources of drug-PMID associations, to produce an associated small molecule list where each small molecule is ranked according to total co-mentions with the search term from shared PubMed IDs. Additionally, using two types of drug-drug similarity matrices, lists of small molecules are predicted to be associated with the search term. Such predictions are based on literature co-mentions and signature similarity from LINCS L1000 drug-induced gene expression profiles. CONCLUSIONS: DrugShot prioritizes drugs and small molecules associated with biomedical search terms. In addition to listing known associations, DrugShot predicts additional drugs and small molecules related to any search term. Hence, DrugShot can be used to prioritize drugs and preclinical compounds for drug repurposing and suggest indications and adverse events for preclinical compounds. DrugShot is freely and openly available at: https://maayanlab.cloud/drugshot and https://appyters.maayanlab.cloud/#/DrugShot .

Drug Repositioning , Software , Gene Library , Transcriptome
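
The co-mention ranking described in the DrugShot abstract can be illustrated with a minimal sketch. It assumes a plain dictionary mapping each small molecule to its associated PMIDs (standing in for a DrugRIF-style resource) and a list of PMIDs returned for a search term. The function name, the data layout, and the toy values are all hypothetical, and this is not DrugShot's actual implementation.

```python
from collections import Counter


def rank_by_comentions(search_pmids, drug_to_pmids):
    """Rank drugs by the number of PMIDs they share with the search results.

    search_pmids: iterable of PMIDs returned for a search term.
    drug_to_pmids: dict mapping a drug name to an iterable of PMIDs
                   (hypothetical stand-in for a drug-PMID association table).
    """
    hits = set(search_pmids)
    counts = Counter()
    for drug, pmids in drug_to_pmids.items():
        shared = len(hits & set(pmids))
        if shared:
            counts[drug] = shared
    # Highest co-mention count first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy identifiers invented purely for illustration.
toy_search_pmids = [101, 102, 103, 104]
toy_drug_to_pmids = {
    "drug_a": [101, 102, 999],
    "drug_b": [103],
    "drug_c": [555],
}

ranking = rank_by_comentions(toy_search_pmids, toy_drug_to_pmids)
print(ranking)
```

Drugs with no shared PMIDs are dropped rather than ranked last, which is one possible design choice; the similarity-based predictions mentioned in the abstract are not covered by this sketch.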

5.

Getting Started with the IDG KMC Datasets and Tools.

Kropiwnicki, Eryk; Binder, Jessica L; Yang, Jeremy J; Holmes, Jayme; Lachmann, Alexander; Clarke, Daniel J B; Sheils, Timothy; Kelleher, Keith J; Metzger, Vincent T; Bologa, Cristian G; Oprea, Tudor I; Ma'ayan, Avi.

Curr Protoc ; 2(1): e355, 2022 Jan.

Article En | MEDLINE | ID: mdl-35085427

The Illuminating the Druggable Genome (IDG) consortium is a National Institutes of Health (NIH) Common Fund program designed to enhance our knowledge of under-studied proteins, more specifically, proteins unannotated within the three most commonly drug-targeted protein families: G-protein coupled receptors, ion channels, and protein kinases. Since 2014, the IDG Knowledge Management Center (IDG-KMC) has generated several open-access datasets and resources that jointly serve as a highly translational machine-learning-ready knowledgebase focused on human protein-coding genes and their products. The goal of the IDG-KMC is to develop comprehensive integrated knowledge for the druggable genome to illuminate the uncharacterized or poorly annotated portion of the druggable genome. The tools derived from the IDG-KMC provide either user-friendly visualizations or ways to impute the knowledge about potential targets using machine learning strategies. In the following protocols, we describe how to use each web-based tool to accelerate illumination in under-studied proteins. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Interacting with the Pharos user interface
Basic Protocol 2: Accessing the data in Harmonizome
Basic Protocol 3: The ARCHS4 resource
Basic Protocol 4: Making predictions about gene function with PrismExp
Basic Protocol 5: Using Geneshot to illuminate knowledge about under-studied targets
Basic Protocol 6: Exploring under-studied targets with TIN-X
Basic Protocol 7: Interacting with the DrugCentral user interface
Basic Protocol 8: Estimating Anti-SARS-CoV-2 activities with DrugCentral REDIAL-2020
Basic Protocol 9: Drug Set Enrichment Analysis using Drugmonizome
Basic Protocol 10: The Drugmonizome-ML Appyter
Basic Protocol 11: The Harmonizome-ML Appyter
Basic Protocol 12: GWAS target illumination with TIGA
Basic Protocol 13: Prioritizing kinases for lists of proteins and phosphoproteins with KEA3
Basic Protocol 14: Converting PubMed searches to drug sets with the DrugShot Appyter.

Databases, Genetic , Genome , COVID-19 , Humans , Machine Learning , Proteins , SARS-CoV-2

6.

Appyters: Turning Jupyter Notebooks into data-driven web apps.

Clarke, Daniel J B; Jeon, Minji; Stein, Daniel J; Moiseyev, Nicole; Kropiwnicki, Eryk; Dai, Charles; Xie, Zhuorui; Wojciechowicz, Megan L; Litz, Skylar; Hom, Jason; Evangelista, John Erol; Goldman, Lucas; Zhang, Serena; Yoon, Christine; Ahamed, Tahmid; Bhuiyan, Samantha; Cheng, Minxuan; Karam, Julie; Jagodnik, Kathleen M; Shu, Ingrid; Lachmann, Alexander; Ayling, Sam; Jenkins, Sherry L; Ma'ayan, Avi.

Patterns (N Y) ; 2(3): 100213, 2021 Mar 12.

Article En | MEDLINE | ID: mdl-33748796

Jupyter Notebooks have transformed the communication of data analysis pipelines by facilitating a modular structure that brings together code, markdown text, and interactive visualizations. Here, we extended Jupyter Notebooks to broaden their accessibility with Appyters. Appyters turn Jupyter Notebooks into fully functional standalone web-based bioinformatics applications. Appyters present to users an entry form enabling them to upload their data and set various parameters for a multitude of data analysis workflows. Once the form is filled, the Appyter executes the corresponding notebook in the cloud, producing the output without requiring the user to interact directly with the code. Appyters were used to create many bioinformatics web-based reusable workflows, including applications to build customized machine learning pipelines, analyze omics data, and produce publishable figures. These Appyters are served in the Appyters Catalog at https://appyters.maayanlab.cloud. In summary, Appyters enable the rapid development of interactive web-based bioinformatics applications.

7.

Drugmonizome and Drugmonizome-ML: integration and abstraction of small molecule attributes for drug enrichment analysis and machine learning.

Kropiwnicki, Eryk; Evangelista, John E; Stein, Daniel J; Clarke, Daniel J B; Lachmann, Alexander; Kuleshov, Maxim V; Jeon, Minji; Jagodnik, Kathleen M; Ma'ayan, Avi.

Database (Oxford) ; 2021, 2021 03 31.

Article En | MEDLINE | ID: mdl-33787872

Understanding the underlying molecular and structural similarities between seemingly heterogeneous sets of drugs can aid in identifying drug repurposing opportunities and assist in the discovery of novel properties of preclinical small molecules. A wealth of information about drug and small molecule structure, targets, indications and side effects; induced gene expression signatures; and other attributes are publicly available through web-based tools, databases and repositories. By processing, abstracting and aggregating information from these resources into drug set libraries, knowledge about novel properties of drugs and small molecules can be systematically imputed with machine learning. In addition, drug set libraries can be used as the underlying database for drug set enrichment analysis. Here, we present Drugmonizome, a database with a search engine for querying annotated sets of drugs and small molecules for performing drug set enrichment analysis. Utilizing the data within Drugmonizome, we also developed Drugmonizome-ML. Drugmonizome-ML enables users to construct customized machine learning pipelines using the drug set libraries from Drugmonizome. To demonstrate the utility of Drugmonizome, drug sets from 12 independent SARS-CoV-2 in vitro screens were subjected to consensus enrichment analysis. Despite the low overlap among these 12 independent in vitro screens, we identified common biological processes critical for blocking viral replication. To demonstrate Drugmonizome-ML, we constructed a machine learning pipeline to predict whether approved and preclinical drugs may induce peripheral neuropathy as a potential side effect. Overall, the Drugmonizome and Drugmonizome-ML resources provide rich and diverse knowledge about drugs and small molecules for direct systems pharmacology applications. Database URL: https://maayanlab.cloud/drugmonizome/.

COVID-19 Drug Treatment , Databases, Pharmaceutical , SARS-CoV-2/drug effects , Antiviral Agents/chemistry , Antiviral Agents/pharmacology , COVID-19/virology , Drug Discovery , Drug Evaluation, Preclinical , Drug Repositioning , Drug-Related Side Effects and Adverse Reactions , Humans , In Vitro Techniques , Machine Learning , Peripheral Nervous System Diseases/chemically induced , SARS-CoV-2/physiology , Small Molecule Libraries , User-Computer Interface , Virus Replication/drug effects

8.

Gene Set Knowledge Discovery with Enrichr.

Xie, Zhuorui; Bailey, Allison; Kuleshov, Maxim V; Clarke, Daniel J B; Evangelista, John E; Jenkins, Sherry L; Lachmann, Alexander; Wojciechowicz, Megan L; Kropiwnicki, Eryk; Jagodnik, Kathleen M; Jeon, Minji; Ma'ayan, Avi.

Curr Protoc ; 1(3): e90, 2021 Mar.

Article En | MEDLINE | ID: mdl-33780170

Profiling samples from patients, tissues, and cells with genomics, transcriptomics, epigenomics, proteomics, and metabolomics ultimately produces lists of genes and proteins that need to be further analyzed and integrated in the context of known biology. Enrichr (Chen et al., 2013; Kuleshov et al., 2016) is a gene set search engine that enables the querying of hundreds of thousands of annotated gene sets. Enrichr uniquely integrates knowledge from many high-profile projects to provide synthesized information about mammalian genes and gene sets. The platform provides various methods to compute gene set enrichment, and the results are visualized in several interactive ways. This protocol provides a summary of the key features of Enrichr, which include using Enrichr programmatically and embedding an Enrichr button on any website. © 2021 Wiley Periodicals LLC.
Basic Protocol 1: Analyzing lists of differentially expressed genes from transcriptomics, proteomics and phosphoproteomics, GWAS studies, or other experimental studies
Basic Protocol 2: Searching Enrichr by a single gene or key search term
Basic Protocol 3: Preparing raw or processed RNA-seq data through BioJupies in preparation for Enrichr analysis
Basic Protocol 4: Analyzing gene sets for model organisms using modEnrichr
Basic Protocol 5: Using Enrichr in Geneshot
Basic Protocol 6: Using Enrichr in ARCHS4
Basic Protocol 7: Using the enrichment analysis visualization Appyter to visualize Enrichr results
Basic Protocol 8: Using the Enrichr API
Basic Protocol 9: Adding an Enrichr button to a website.

Knowledge Discovery , Software , Animals , Computational Biology , Genomics , Humans , RNA-Seq

9.

Prioritizing Pain-Associated Targets with Machine Learning.

Jeon, Minji; Jagodnik, Kathleen M; Kropiwnicki, Eryk; Stein, Daniel J; Ma'ayan, Avi.

Biochemistry ; 60(18): 1430-1446, 2021 05 11.

Article En | MEDLINE | ID: mdl-33606503

While hundreds of genes have been associated with pain, many of the molecular mechanisms of pain remain unknown. As a result, current analgesics are limited to a few clinically validated targets. Here, we trained a machine learning (ML) ensemble model to predict new targets for 17 categories of pain. The model utilizes features from transcriptomics, proteomics, and gene ontology to prioritize targets for modulating pain. We focused on identifying novel G-protein-coupled receptors (GPCRs), ion channels, and protein kinases because these proteins represent the most successful drug target families. The performance of the model to predict novel pain targets is 0.839 on average based on AUROC, while the predictions for arthritis had the highest accuracy (AUROC = 0.929). The model predicts hundreds of novel targets for pain; for example, GPR132 and GPR109B are highly ranked GPCRs for rheumatoid arthritis. Overall, gene-pain association predictions cluster into three groups that are enriched for cytokine, calcium, and GABA-related cell signaling pathways. These predictions can serve as a foundation for future experimental exploration to advance the development of safer and more effective analgesics.

Analgesics/chemistry , Analgesics/pharmacology , Drug Delivery Systems , Machine Learning , Pain/drug therapy , Drug Design , Drug Discovery , Humans , Models, Biological

10.

The COVID-19 Drug and Gene Set Library.

Kuleshov, Maxim V; Stein, Daniel J; Clarke, Daniel J B; Kropiwnicki, Eryk; Jagodnik, Kathleen M; Bartal, Alon; Evangelista, John E; Hom, Jason; Cheng, Minxuan; Bailey, Allison; Zhou, Abigail; Ferguson, Laura B; Lachmann, Alexander; Ma'ayan, Avi.

Patterns (N Y) ; 1(6): 100090, 2020 Sep 11.

Article En | MEDLINE | ID: mdl-32838343

In a short period, many research publications that report sets of experimentally validated drugs as potential COVID-19 therapies have emerged. To organize this accumulating knowledge, we developed the COVID-19 Drug and Gene Set Library (https://amp.pharm.mssm.edu/covid19/), a collection of drug and gene sets related to COVID-19 research from multiple sources. The platform enables users to view, download, analyze, visualize, and contribute drug and gene sets related to COVID-19 research. To evaluate the content of the library, we compared the results from six in vitro drug screens for COVID-19 repurposing candidates. Surprisingly, we observe low overlap across screens while highlighting overlapping candidates that should receive more attention as potential therapeutics for COVID-19. Overall, the COVID-19 Drug and Gene Set Library can be used to identify community consensus, make researchers and clinicians aware of new potential therapies, enable machine-learning applications, and facilitate the research community to work together toward a cure.
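
The screen comparison described in this abstract amounts to counting how many independent hit lists contain each drug. The following is a minimal sketch of that idea, assuming each screen is available as a list of drug names. The function names, the `min_screens` threshold, and the toy hit lists are hypothetical and do not come from the library itself.

```python
from collections import Counter


def screen_overlap(screens):
    """Count, for every hit, how many screens report it.

    screens: dict mapping a screen label to a list of hit names.
    Duplicates within one screen are counted once.
    """
    counts = Counter()
    for hits in screens.values():
        counts.update(set(hits))
    return counts


def consensus_hits(screens, min_screens=2):
    """Return the hits reported by at least `min_screens` screens, sorted by name."""
    counts = screen_overlap(screens)
    return sorted(drug for drug, n in counts.items() if n >= min_screens)


# Toy hit lists invented for illustration; not real screening data.
toy_screens = {
    "screen_1": ["drug_a", "drug_b"],
    "screen_2": ["drug_b", "drug_c"],
    "screen_3": ["drug_b", "drug_d", "drug_a"],
}

overlap = screen_overlap(toy_screens)
consensus = consensus_hits(toy_screens)
print(overlap.most_common(), consensus)
```

Raising `min_screens` narrows the result to candidates with broader support, which is the sense in which overlapping hits "should receive more attention".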

11.

The COVID-19 Gene and Drug Set Library.

Kuleshov, Maxim V; Clarke, Daniel J B; Kropiwnicki, Eryk; Jagodnik, Kathleen M; Bartal, Alon; Evangelista, John E; Zhou, Abigail; Ferguson, Laura B; Lachmann, Alexander; Ma'ayan, Avi.

Res Sq ; 2020 May 13.

Article En | MEDLINE | ID: mdl-32702729

The coronavirus (CoV) severe acute respiratory syndrome (SARS)-CoV-2 (COVID-19) pandemic has received rapid response by the research community to offer suggestions for repurposing of approved drugs as well as to improve our understanding of the COVID-19 viral life cycle molecular mechanisms. In a short period, tens of thousands of research preprints and other publications have emerged including those that report lists of experimentally validated drugs and compounds as potential COVID-19 therapies. In addition, gene sets from interacting COVID-19 virus-host proteins and differentially expressed genes when comparing infected to uninfected cells are being published at a fast rate. To organize this rapidly accumulating knowledge, we developed the COVID-19 Gene and Drug Set Library (https://amp.pharm.mssm.edu/covid19/), a collection of gene and drug sets related to COVID-19 research from multiple sources. The COVID-19 Gene and Drug Set Library is delivered as a web-based interface that enables users to view, download, analyze, visualize, and contribute gene and drug sets related to COVID-19 research. To evaluate the content of the library, we performed several analyses including comparing the results from 6 in vitro drug screens for COVID-19 repurposing candidates. Surprisingly, we observe little overlap across these initial screens. The most common and unique hit across these screens is mefloquine, a malaria drug that should receive more attention as a potential therapeutic for COVID-19. Overall, the library of gene and drug sets can be used to identify community consensus, make researchers and clinicians aware of the development of new potential therapies, as well as allow the research community to work together towards a cure for COVID-19.

12.

ChEA3: transcription factor enrichment analysis by orthogonal omics integration.

Keenan, Alexandra B; Torre, Denis; Lachmann, Alexander; Leong, Ariel K; Wojciechowicz, Megan L; Utti, Vivian; Jagodnik, Kathleen M; Kropiwnicki, Eryk; Wang, Zichen; Ma'ayan, Avi.

Nucleic Acids Res ; 47(W1): W212-W224, 2019 07 02.

Article En | MEDLINE | ID: mdl-31114921

Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF-gene co-expression from RNA-seq studies, TF-target associations from ChIP-seq experiments, and TF-gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves the prediction of the correct upstream TF compared to ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties such as whether the TF behaves as an activator or a repressor. The ChEA3 web-server is available from https://amp.pharm.mssm.edu/ChEA3.

Computational Biology/methods , Databases, Genetic , Gene Library , Transcription Factors/genetics , Chromatin Immunoprecipitation Sequencing/methods , Datasets as Topic , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Humans
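
The abstract states that enrichment results from distinct libraries are integrated into a composite rank but does not say how. One simple possibility, shown here purely as an assumption, is to average each TF's rank across libraries. The function name, library labels, and TF names below are hypothetical placeholders.

```python
from statistics import mean


def mean_rank(library_rankings):
    """Combine per-library TF rankings by averaging 1-based ranks.

    library_rankings: dict mapping a library label to a list of TF names
                      ordered from most to least enriched.
    A TF missing from a library is averaged only over the libraries
    in which it appears (an assumption of this sketch).
    """
    ranks = {}
    for ordered_tfs in library_rankings.values():
        for position, tf in enumerate(ordered_tfs, start=1):
            ranks.setdefault(tf, []).append(position)
    scored = {tf: mean(positions) for tf, positions in ranks.items()}
    # Lowest mean rank first; ties broken alphabetically.
    return sorted(scored.items(), key=lambda kv: (kv[1], kv[0]))


# Toy rankings invented for illustration only.
toy_rankings = {
    "coexpression": ["TF1", "TF2", "TF3"],
    "chip_seq": ["TF2", "TF1", "TF3"],
    "cooccurrence": ["TF2", "TF3", "TF1"],
}

composite = mean_rank(toy_rankings)
print([tf for tf, _ in composite])
```

Averaging rewards TFs that rank consistently well across evidence types, so a TF that tops a single library can still be outranked by one that is near the top in all of them.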