Results 1 - 20 of 23
1.
Bioinformatics ; 37(18): 2889-2895, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33824954

ABSTRACT

MOTIVATION: Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The task consists of separating the expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. RESULTS: We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional Gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. AVAILABILITY AND IMPLEMENTATION: The data is freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
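To make the deconvolution task concrete, here is a minimal Python sketch of a Gaussian-mixture approach, one of the method families named above; it is not the competition code. It assumes, hypothetically, that each paired measurement yields a one-dimensional set of bead intensities forming two peaks, and that the two genes are mixed in unequal proportions, so the heavier mixture component can be assigned to the more abundant gene. The names `fit_two_gaussians` and `deconvolve_pair` are illustrative only.

```python
import math
from statistics import mean, pstdev


def _normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, n_iter=200, min_sigma=1e-3):
    """Fit a 2-component 1-D Gaussian mixture with expectation-maximization."""
    if len(values) < 4:
        raise ValueError("need at least four measurements")
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the two components from the lower and upper halves of the data.
    mus = [mean(xs[:half]), mean(xs[half:])]
    sigmas = [max(pstdev(xs[:half]), min_sigma), max(pstdev(xs[half:]), min_sigma)]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * _normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(xs)
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)
    return weights, mus, sigmas


def deconvolve_pair(values):
    """Return (estimate for the major gene, estimate for the minor gene).

    Hypothetical assumption: the gene with the larger share of beads owns
    the mixture component with the larger weight.
    """
    weights, mus, _ = fit_two_gaussians(values)
    major = 0 if weights[0] >= weights[1] else 1
    return mus[major], mus[1 - major]
```

A random-forest or deep-learning solution would instead learn the mapping from raw measurements to per-gene values using the ground-truth data; the mixture model needs no training data, which is one reason such methods can be faster.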


Subject(s)
Algorithms , Software , Reproducibility of Results , Random Forest , Biology
2.
Nat Cancer ; 1(2): 235-248, 2020 02.
Article in English | MEDLINE | ID: mdl-32613204

ABSTRACT

Anti-cancer uses of non-oncology drugs have occasionally been found, but such discoveries have been serendipitous. We sought to create a public resource containing the growth inhibitory activity of 4,518 drugs tested across 578 human cancer cell lines. We used PRISM, a molecular barcoding method, to screen drugs against cell lines in pools. An unexpectedly large number of non-oncology drugs selectively inhibited subsets of cancer cell lines in a manner predictable from the cell lines' molecular features. Our findings include compounds that killed by inducing PDE3A-SLFN12 complex formation; vanadium-containing compounds whose killing depended on the sulfate transporter SLC26A2; the alcohol dependence drug disulfiram, which killed cells with low expression of metallothioneins; and the anti-inflammatory drug tepoxalin, which killed via the multi-drug resistance protein ABCB1. The PRISM drug repurposing resource (https://depmap.org/repurposing) is a starting point for developing new oncology therapeutics and, more rarely, for potential direct clinical translation.
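The pooled barcoding idea can be illustrated with a simplified Python sketch that turns barcode read counts into a per-cell-line log2 fold change relative to a vehicle control: a cell line whose barcode loses share of the reads after treatment was presumably inhibited. The depth normalization, the pseudocount and the name `barcode_log2_fold_change` are assumptions for illustration, not the PRISM pipeline.

```python
import math


def barcode_log2_fold_change(treated_counts, control_counts, pseudocount=1.0):
    """Log2 fold change of each barcode's share of reads, treated vs. vehicle.

    treated_counts / control_counts: dict barcode (one per cell line) -> read count.
    Each count is divided by its sample's total reads so that differences in
    sequencing depth do not masquerade as growth effects.
    """
    treated_total = sum(treated_counts.values())
    control_total = sum(control_counts.values())
    lfc = {}
    for barcode, control in control_counts.items():
        treated = treated_counts.get(barcode, 0)
        treated_frac = (treated + pseudocount) / treated_total
        control_frac = (control + pseudocount) / control_total
        # Negative values: the cell line shrank relative to the rest of the pool.
        lfc[barcode] = math.log2(treated_frac / control_frac)
    return lfc
```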


Subject(s)
Neoplasms , Cell Line , Disulfiram , Drug Repositioning , Humans , Neoplasms/drug therapy
3.
PLoS One ; 14(9): e0222165, 2019.
Article in English | MEDLINE | ID: mdl-31560691

ABSTRACT

Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research in which the use of competitions has yielded significant performance gains over established algorithms. These include algorithms for antibody clustering, imputing gene expression data, and querying the Connectivity Map (CMap). Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. The solutions produced through these competitions are then examined with respect to their utility and the prospects for implementation in the field. We present the decision process and competition design considerations that led to these successful outcomes as a model for researchers who want to use competitions and non-domain crowds as collaborators to further their research.


Subject(s)
Computational Biology/trends , Algorithms , Antibodies/classification , Antibodies/genetics , Cluster Analysis , Crowdsourcing/trends , Gene Expression Profiling/statistics & numerical data , Humans , Inventions/trends
4.
Nat Methods ; 16(9): 843-852, 2019 09.
Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.
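The challenge's actual GWAS-based scoring is not described in this abstract. As a hedged stand-in, the Python sketch below tests whether one predicted module overlaps a hypothetical set of trait-associated genes more than chance would predict, using a one-sided hypergeometric test; this is a common but different technique, and the function names are illustrative.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), top + 1)
    )
    return hits / comb(population, draws)


def module_enrichment(module, trait_genes, background):
    """Overlap of a module with trait genes and its one-sided p-value."""
    universe = set(background)
    module_set = set(module) & universe  # ignore genes outside the network
    trait_set = set(trait_genes) & universe
    overlap = len(module_set & trait_set)
    p_value = hypergeom_sf(overlap, len(universe), len(trait_set), len(module_set))
    return overlap, p_value
```

Testing many modules against many traits would additionally require multiple-testing correction.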


Subject(s)
Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Models, Biological , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Algorithms , Gene Expression Profiling , Humans , Phenotype , Protein Interaction Maps
5.
Bioinformatics ; 35(8): 1427-1429, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30203022

ABSTRACT

MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but the unprecedented size of the data generated and its complex associated metadata have also created data storage and integration challenges. RESULTS: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format's generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development. AVAILABILITY AND IMPLEMENTATION: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
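GCTx is an HDF5-based binary format, so reading it properly requires an HDF5 library or the packages listed above. As a simpler illustration of the same idea (a dense matrix plus row and column metadata), the sketch below parses the related plain-text GCT v1.3 layout as we understand it: a `#1.3` line, a dimensions line (data rows, data columns, row-metadata fields, column-metadata fields), a header line, the column-metadata lines, then the data rows. `read_gct13` is a hypothetical helper, not part of any cmap package.

```python
import io


def read_gct13(handle):
    """Parse GCT v1.3 text into (row_ids, col_ids, row_meta, col_meta, matrix)."""
    version = handle.readline().strip()
    if version != "#1.3":
        raise ValueError(f"unsupported GCT version: {version!r}")
    n_rows, n_cols, n_rmeta, n_cmeta = map(int, handle.readline().split())
    header = handle.readline().rstrip("\n").split("\t")
    rmeta_fields = header[1:1 + n_rmeta]   # names of row-metadata columns
    col_ids = header[1 + n_rmeta:]         # sample (column) identifiers
    col_meta = {}
    for _ in range(n_cmeta):
        fields = handle.readline().rstrip("\n").split("\t")
        # First token names the field; row-metadata slots hold placeholders.
        col_meta[fields[0]] = fields[1 + n_rmeta:]
    row_ids, row_meta, matrix = [], [], []
    for _ in range(n_rows):
        fields = handle.readline().rstrip("\n").split("\t")
        row_ids.append(fields[0])
        row_meta.append(dict(zip(rmeta_fields, fields[1:1 + n_rmeta])))
        matrix.append([float(v) for v in fields[1 + n_rmeta:]])
    if len(col_ids) != n_cols or any(len(r) != n_cols for r in matrix):
        raise ValueError("dimension line does not match the data")
    return row_ids, col_ids, row_meta, col_meta, matrix


# Usage on a tiny in-memory example (2 genes x 2 samples).
SAMPLE = (
    "#1.3\n"
    "2\t2\t1\t1\n"
    "id\tsymbol\tS1\tS2\n"
    "pert\tna\tDMSO\tBRD\n"
    "g1\tTP53\t1.0\t2.0\n"
    "g2\tEGFR\t3.5\t-1.0\n"
)
rows, cols, row_meta, col_meta, values = read_gct13(io.StringIO(SAMPLE))
```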


Subject(s)
Metadata , Software , Algorithms , Information Storage and Retrieval
6.
Nat Methods ; 15(7): 543-546, 2018 07.
Article in English | MEDLINE | ID: mdl-29915188

ABSTRACT

Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.


Subject(s)
Genomics/methods , Internet , Machine Learning , DNA/genetics , Databases, Nucleic Acid , Nucleic Acid Amplification Techniques , RNA/genetics , Software
7.
PLoS Biol ; 15(11): e2003213, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29190685

ABSTRACT

The application of RNA interference (RNAi) to mammalian cells has provided the means to perform phenotypic screens to determine the functions of genes. Although RNAi has revolutionized loss-of-function genetic experiments, it has been difficult to systematically assess the prevalence and consequences of off-target effects. The Connectivity Map (CMAP) represents an unprecedented resource to study the gene expression consequences of expressing short hairpin RNAs (shRNAs). Analysis of signatures for over 13,000 shRNAs applied in 9 cell lines revealed that microRNA (miRNA)-like off-target effects of RNAi are far stronger and more pervasive than generally appreciated. We show that mitigating off-target effects is feasible in these datasets via computational methodologies to produce a consensus gene signature (CGS). In addition, we compared RNAi technology to clustered regularly interspaced short palindromic repeat (CRISPR)-based knockout by analysis of 373 single guide RNAs (sgRNAs) in 6 cell lines and show that the on-target efficacies are comparable, but CRISPR technology is far less susceptible to systematic off-target effects. These results will help guide the proper use and analysis of loss-of-function reagents for the determination of gene function.
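The consensus-signature idea can be sketched as follows: signatures from several shRNAs that share an intended target are combined so that seed-driven off-target effects, which differ between hairpins, tend to cancel while the shared on-target signal remains. This Python sketch uses a plain per-gene mean of z-scores as a simplifying assumption; the paper's actual weighting is not given here, and `consensus_signatures` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean


def consensus_signatures(shrna_signatures, shrna_to_gene, min_shrnas=2):
    """Average per-shRNA z-score vectors into one signature per target gene.

    shrna_signatures: dict shRNA id -> list of z-scores (same gene order for all)
    shrna_to_gene: dict shRNA id -> intended target gene
    """
    by_gene = defaultdict(list)
    for shrna, signature in shrna_signatures.items():
        by_gene[shrna_to_gene[shrna]].append(signature)
    consensus = {}
    for gene, signatures in by_gene.items():
        if len(signatures) < min_shrnas:
            continue  # a single hairpin cannot average out its own seed effects
        # Element-wise mean across all hairpins targeting this gene.
        consensus[gene] = [mean(values) for values in zip(*signatures)]
    return consensus
```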


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Gene Expression Profiling , Gene Regulatory Networks/genetics , Genomics/methods , RNA Interference/physiology , Cells, Cultured , Gene Expression Regulation, Neoplastic , Genomics/standards , HT29 Cells , Hep G2 Cells , Humans , MCF-7 Cells , RNA, Small Interfering/genetics , Transcriptome
8.
Cell ; 171(6): 1437-1452.e17, 2017 Nov 30.
Article in English | MEDLINE | ID: mdl-29195078

ABSTRACT

We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover the mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.
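Inference of non-measured transcripts from measured landmark genes can be illustrated with ordinary least squares, the linear-regression approach that the D-GEX entry in this list attributes to the LINCS program. The sketch fits one target gene from a few landmark values in pure Python via the normal equations; the toy data and the names `fit_ols` and `predict` are hypothetical, and a real pipeline would use a numerical library and roughly a thousand landmarks.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Build X^T X (p x p) and X^T y (length p).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Solve with Gauss-Jordan elimination and partial pivoting.
    aug = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def predict(coef, landmarks):
    """Predicted target-gene expression for one sample's landmark values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], landmarks))
```

One regression of this kind is fitted per inferred gene, using training profiles in which both landmarks and targets were measured.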


Subject(s)
Gene Expression Profiling/methods , Cell Line, Tumor , Drug Resistance, Neoplasm , Gene Expression Profiling/economics , Humans , Neoplasms/drug therapy , Organ Specificity , Pharmaceutical Preparations/metabolism , Sequence Analysis, RNA/economics , Sequence Analysis, RNA/methods , Small Molecule Libraries
11.
Cancer Cell ; 30(2): 214-228, 2016 08 08.
Article in English | MEDLINE | ID: mdl-27478040

ABSTRACT

Recent genome sequencing efforts have identified millions of somatic mutations in cancer. However, the functional impact of most variants is poorly understood. Here we characterize 194 somatic mutations identified in primary lung adenocarcinomas. We present an expression-based variant-impact phenotyping (eVIP) method that uses gene expression changes to distinguish impactful from neutral somatic mutations. eVIP identified 69% of mutations analyzed as impactful and 31% as functionally neutral. A subset of the impactful mutations induces xenograft tumor formation in mice and/or confers resistance to cellular EGFR inhibition. Among these impactful variants are rare somatic, clinically actionable variants including EGFR S645C, ARAF S214C and S214F, ERBB2 S418T, and multiple BRAF variants, demonstrating that rare mutations can be functionally important in cancer.


Subject(s)
Adenocarcinoma/genetics , High-Throughput Nucleotide Sequencing/methods , Lung Neoplasms/genetics , Mutation , Adenocarcinoma of Lung , Animals , Cell Line, Tumor , Gene Expression Profiling , Heterografts , Humans , Mice , Oncogenes , Phenotype
12.
Bioinformatics ; 32(12): 1832-9, 2016 06 15.
Article in English | MEDLINE | ID: mdl-26873929

ABSTRACT

MOTIVATION: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiles across thousands of samples is still very expensive. Recognizing that the expression levels of genes are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between the expression levels of genes. RESULTS: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. AVAILABILITY AND IMPLEMENTATION: D-GEX is available at https://github.com/uci-cbcl/D-GEX. CONTACT: xhx@ics.uci.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
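The reported comparisons (mean absolute error averaged across genes, relative improvement, and the share of genes with lower error) can be computed as below. The sketch assumes relative improvement means (MAE_LR - MAE_new) / MAE_LR; that definition, the data layout (gene -> values across samples) and the function names are assumptions rather than the paper's code.

```python
from statistics import mean


def per_gene_mae(truth, pred):
    """truth / pred: dict gene -> list of values across samples."""
    return {g: mean(abs(t - p) for t, p in zip(truth[g], pred[g])) for g in truth}


def compare_models(truth, pred_baseline, pred_new):
    """Return (avg MAE baseline, avg MAE new, relative improvement, share of genes improved)."""
    mae_base = per_gene_mae(truth, pred_baseline)
    mae_new = per_gene_mae(truth, pred_new)
    avg_base = mean(mae_base.values())
    avg_new = mean(mae_new.values())
    relative_improvement = (avg_base - avg_new) / avg_base
    # A gene counts as improved only if its error is strictly lower.
    frac_better = mean(1.0 if mae_new[g] < mae_base[g] else 0.0 for g in truth)
    return avg_base, avg_new, relative_improvement, frac_better
```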


Subject(s)
Gene Expression , Gene Expression Profiling , Linear Models , Machine Learning , RNA
13.
Proc Natl Acad Sci U S A ; 111(30): 10911-6, 2014 Jul 29.
Article in English | MEDLINE | ID: mdl-25024206

ABSTRACT

High-throughput screening has become a mainstay of small-molecule probe and early drug discovery. The question of how to build and evolve efficient screening collections systematically for cell-based and biochemical screening is still unresolved. It is often assumed that chemical structure diversity leads to diverse biological performance of a library. Here, we confirm earlier results showing that this inference is not always valid and suggest instead using biological measurement diversity derived from multiplexed profiling in the construction of libraries with diverse assay performance patterns for cell-based screens. Rather than using results from tens or hundreds of completed assays, which is resource intensive and not easily extensible, we use high-dimensional image-based cell morphology and gene expression profiles. We piloted this approach using over 30,000 compounds. We show that small-molecule profiling can be used to select compound sets with high rates of activity and diverse biological performance.
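The abstract does not say how compound sets were selected from the profiles. One standard way to pick a diverse subset from high-dimensional profiles is greedy max-min (farthest-point) selection, sketched below in Python as an illustration only; the Euclidean distance, the choice of starting compound and the function names are assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(profiles, k, start=None):
    """Greedily pick k compound ids, each time adding the one farthest from the set."""
    ids = list(profiles)
    if k <= 0 or not ids:
        return []
    first = start if start is not None else ids[0]
    chosen = [first]
    # Distance from each compound to its nearest already-chosen compound.
    nearest = {c: euclidean(profiles[c], profiles[first]) for c in ids}
    while len(chosen) < min(k, len(ids)):
        candidate = max((c for c in ids if c not in chosen), key=lambda c: nearest[c])
        chosen.append(candidate)
        for c in ids:
            nearest[c] = min(nearest[c], euclidean(profiles[c], profiles[candidate]))
    return chosen
```

Compounds with nearly identical morphology or expression profiles (like `a` and `b` in a toy example) end up represented by only one member of the pair.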


Subject(s)
Drug Evaluation, Preclinical/methods , Gene Expression Profiling , Gene Expression Regulation/drug effects , Cell Line, Tumor , Humans
14.
Nature ; 504(7478): 138-42, 2013 Dec 05.
Article in English | MEDLINE | ID: mdl-24185007

ABSTRACT

Malignant melanomas harbouring point mutations (Val600Glu) in the serine/threonine-protein kinase BRAF (BRAF(V600E)) depend on RAF-MEK-ERK signalling for tumour cell growth. RAF and MEK inhibitors show remarkable clinical efficacy in BRAF(V600E) melanoma; however, resistance to these agents remains a formidable challenge. Global characterization of resistance mechanisms may inform the development of more effective therapeutic combinations. Here we carried out systematic gain-of-function resistance studies by expressing more than 15,500 genes individually in a BRAF(V600E) melanoma cell line treated with RAF, MEK, ERK or combined RAF-MEK inhibitors. These studies revealed a cyclic-AMP-dependent melanocytic signalling network not previously associated with drug resistance, including G-protein-coupled receptors, adenyl cyclase, protein kinase A and cAMP response element binding protein (CREB). Preliminary analysis of biopsies from BRAF(V600E) melanoma patients revealed that phosphorylated (active) CREB was suppressed by RAF-MEK inhibition but restored in relapsing tumours. Expression of transcription factors activated downstream of MAP kinase and cAMP pathways also conferred resistance, including c-FOS, NR4A1, NR4A2 and MITF. Combined treatment with MAPK-pathway and histone-deacetylase inhibitors suppressed MITF expression and cAMP-mediated resistance. Collectively, these data suggest that oncogenic dysregulation of a melanocyte lineage dependency can cause resistance to RAF-MEK-ERK inhibition, which may be overcome by combining signalling- and chromatin-directed therapeutics.


Subject(s)
Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm/genetics , Melanocytes/drug effects , Mitogen-Activated Protein Kinases/metabolism , Protein Kinase Inhibitors/pharmacology , CREB-Binding Protein/metabolism , Cell Line, Tumor , Cell Lineage , Cyclic AMP/metabolism , Gene Expression Regulation, Neoplastic , HEK293 Cells , Humans , Melanocytes/cytology , Melanocytes/enzymology , Melanoma/enzymology , Melanoma/physiopathology , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism
15.
Nat Chem Biol ; 9(12): 840-848, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24161946

ABSTRACT

Efforts to develop more effective therapies for acute leukemia may benefit from high-throughput screening systems that reflect the complex physiology of the disease, including leukemia stem cells (LSCs) and supportive interactions with the bone marrow microenvironment. The therapeutic targeting of LSCs is challenging because LSCs are highly similar to normal hematopoietic stem and progenitor cells (HSPCs) and are protected by stromal cells in vivo. We screened 14,718 compounds in a leukemia-stroma co-culture system for inhibition of cobblestone formation, a cellular behavior associated with stem-cell function. Among those compounds that inhibited malignant cells but spared HSPCs was the cholesterol-lowering drug lovastatin. Lovastatin showed anti-LSC activity in vitro and in an in vivo bone marrow transplantation model. Mechanistic studies demonstrated that the effect was on target, via inhibition of HMG-CoA reductase. These results illustrate the power of merging physiologically relevant models with high-throughput screening.
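Screens of this size typically report assay quality and per-compound activity against plate controls. The Z'-factor and percent inhibition are standard measures, but the abstract does not say they were used here; the Python sketch below only shows the arithmetic, assuming negative controls are untreated wells (full signal) and positive controls give full inhibition.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    mu_p, mu_n = mean(positive_controls), mean(negative_controls)
    sd_p, sd_n = stdev(positive_controls), stdev(negative_controls)
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)


def percent_inhibition(signal, positive_controls, negative_controls):
    """0% at the negative-control mean, 100% at the positive-control mean."""
    mu_p, mu_n = mean(positive_controls), mean(negative_controls)
    return 100.0 * (mu_n - signal) / (mu_n - mu_p)
```

Values of Z' close to 1 indicate well-separated controls; a compound's percent inhibition is only interpretable on plates where that separation holds.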


Subject(s)
Antineoplastic Agents/pharmacology , Drug Screening Assays, Antitumor/methods , Leukemia , Neoplastic Stem Cells/drug effects , Cell Line, Tumor , Hematopoietic Stem Cells , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Lovastatin/pharmacology , Neoplastic Stem Cells/cytology , Neoplastic Stem Cells/physiology
17.
Science ; 341(6143): 1238303, 2013 Jul 19.
Article in English | MEDLINE | ID: mdl-23869022

ABSTRACT

The ribosome is centrally situated to sense metabolic states, but whether its activity, in turn, coherently rewires transcriptional responses is unknown. Here, through integrated chemical-genetic analyses, we found that a dominant transcriptional effect of blocking protein translation in cancer cells was inactivation of heat shock factor 1 (HSF1), a multifaceted transcriptional regulator of the heat-shock response and many other cellular processes essential for anabolic metabolism, cellular proliferation, and tumorigenesis. These analyses linked translational flux to the regulation of HSF1 transcriptional activity and to the modulation of energy metabolism. Targeting this link with translation initiation inhibitors such as rocaglates deprived cancer cells of their energy and chaperone armamentarium and selectively impaired the proliferation of both malignant and premalignant cells with early-stage oncogenic lesions.


Subject(s)
DNA-Binding Proteins/biosynthesis , Neoplasms/metabolism , Neoplasms/pathology , Protein Biosynthesis/physiology , Ribosomes/metabolism , Transcription Factors/biosynthesis , Animals , Antineoplastic Agents/chemistry , Antineoplastic Agents/isolation & purification , Antineoplastic Agents/pharmacology , Benzofurans/pharmacology , Cell Line, Tumor , Cell Proliferation , Cell Transformation, Neoplastic/drug effects , Cell Transformation, Neoplastic/metabolism , Cell Transformation, Neoplastic/pathology , DNA-Binding Proteins/antagonists & inhibitors , Energy Metabolism/drug effects , Gene Expression Regulation, Neoplastic , Heat Shock Transcription Factors , High-Throughput Screening Assays , Humans , Mice , NIH 3T3 Cells , Neoplasm Transplantation , Neoplasms/genetics , Protein Biosynthesis/drug effects , Protein Biosynthesis/genetics , Ribosomes/drug effects , Transcription Factors/antagonists & inhibitors
18.
J Comp Psychol ; 123(4): 357-67, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19929104

ABSTRACT

Budgerigars and zebra finches were tested, using operant conditioning techniques, on their ability to identify a zebra finch song in the presence of a background masker emitted from either the same or a different location as the signal. Identification thresholds were obtained for three masker types differing in their spectrotemporal characteristics (noise, modulated noise, and a song chorus). Both bird species exhibited similar amounts of spatial unmasking across the three masker types. The amount of unmasking was greater when the masker was played continuously compared to when the target and masker were presented simultaneously. These results suggest that spatial factors are important for birds in the identification of natural signals in noisy environments.


Subject(s)
Auditory Perception , Finches , Melopsittacus , Orientation , Sound Localization , Vocalization, Animal , Animals , Attention , Conditioning, Operant , Female , Male , Perceptual Masking , Pitch Discrimination , Sound Spectrography , Species Specificity
19.
J Neurosci ; 28(25): 6304-8, 2008 Jun 18.
Article in English | MEDLINE | ID: mdl-18562600

ABSTRACT

Intensity variation poses a fundamental problem for sensory discrimination because changes in the response of sensory neurons as a result of stimulus identity, e.g., a change in the identity of the speaker uttering a word, can potentially be confused with changes resulting from stimulus intensity, for example, the loudness of the utterance. Here we report on the responses of neurons in field L, the primary auditory cortex homolog in songbirds, which allow for accurate discrimination of birdsongs that is invariant to intensity changes over a large range. Such neurons comprise a subset of a population that is highly diverse, in terms of both discrimination accuracy and intensity sensitivity. We find that the neurons with a high degree of invariance also display a high discrimination performance, and that the degree of invariance is significantly correlated with the reproducibility of spike timing on a short time scale and the temporal sparseness of spiking activity. Our results indicate that a temporally sparse spike timing-based code at a primary cortical stage can provide a substrate for intensity-invariant discrimination of natural sounds.
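Temporal sparseness can be quantified in several ways, and the abstract does not specify the measure used. The Python sketch below uses one common index (the form popularized by Vinje and Gallant), which is 0 when firing is spread evenly over time bins and 1 when all spikes fall in a single bin. The binned-rate input and the function name are assumptions.

```python
def temporal_sparseness(rates):
    """Sparseness of a binned firing-rate vector, from 0 (uniform) to 1 (one bin)."""
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two time bins")
    mean_r = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0  # silent neuron: define sparseness as 0
    return (1.0 - mean_r ** 2 / mean_sq) / (1.0 - 1.0 / n)
```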


Subject(s)
Acoustic Stimulation/methods , Auditory Pathways/physiology , Pitch Discrimination/physiology , Sound , Vocalization, Animal/physiology , Animals , Auditory Perception/physiology , Finches , Male
20.
Nat Neurosci ; 10(12): 1601-7, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17994016

ABSTRACT

Humans and animals must often discriminate between complex natural sounds in the presence of competing sounds (maskers). Although the auditory cortex is thought to be important in this task, the impact of maskers on cortical discrimination remains poorly understood. We examined neural responses in zebra finch (Taeniopygia guttata) field L (homologous to primary auditory cortex) to target birdsongs that were embedded in three different maskers (broadband noise, modulated noise and birdsong chorus). We found two distinct forms of interference in the neural responses: the addition of spurious spikes occurring primarily during the silent gaps between song syllables and the suppression of informative spikes occurring primarily during the syllables. Both effects systematically degraded neural discrimination as the target intensity decreased relative to that of the masker. The behavioral performance of songbirds degraded in a parallel manner. Our results identify neural interference that could explain the perceptual interference at the heart of the cocktail party problem.
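Neural discrimination of songs is often measured with a template-matching classifier over a spike-train distance. The abstract does not name the metric, so the sketch below assumes the van Rossum distance (each spike convolved with a causal exponential kernel of time constant `tau`, computed here in closed form) and assigns a response to the nearest song template; the function names and data are hypothetical.

```python
import math


def _kernel_sum(a, b, tau):
    return sum(math.exp(-abs(x - y) / tau) for x in a for y in b)


def van_rossum_distance(train_a, train_b, tau):
    """van Rossum distance between spike trains (spike times in the units of tau)."""
    d2 = 0.5 * (_kernel_sum(train_a, train_a, tau)
                + _kernel_sum(train_b, train_b, tau)
                - 2.0 * _kernel_sum(train_a, train_b, tau))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative round-off


def classify(response, templates, tau):
    """Assign a response to the song whose template spike train is closest."""
    return min(templates, key=lambda song: van_rossum_distance(response, templates[song], tau))
```

Under this scheme, spurious spikes in the gaps between syllables and suppressed spikes during syllables both increase the distance to the correct template, which is one way masker interference could degrade discrimination.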


Subject(s)
Auditory Cortex/cytology , Discrimination, Psychological/physiology , Neurons/physiology , Perceptual Masking/physiology , Sound , Vocalization, Animal/physiology , Acoustic Stimulation/methods , Action Potentials/physiology , Analysis of Variance , Animals , Behavior, Animal , Conditioning, Operant , Dose-Response Relationship, Radiation , Finches , Male , Pattern Recognition, Physiological/physiology , Psychometrics