ABSTRACT
We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover the mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.
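A rank-based enrichment statistic makes the idea of "connecting" perturbations through shared signatures concrete. The sketch below follows the Kolmogorov-Smirnov-style score as we understand it from the original pilot CMap. The expanded resource may use a different statistic (for example, a weighted one), and the function names are hypothetical.

```python
def ks_score(ranked_genes, query_set):
    """Signed Kolmogorov-Smirnov-style enrichment of query_set in a ranked list.

    ranked_genes: gene ids ordered from most up-regulated to most down-regulated.
    Query genes missing from ranked_genes are ignored.
    Positive scores mean the query genes sit near the top of the list.
    """
    n = len(ranked_genes)
    position = {g: i + 1 for i, g in enumerate(ranked_genes)}  # 1-based ranks
    v = sorted(position[g] for g in query_set if g in position)
    t = len(v)
    if t == 0:
        raise ValueError("no query genes found in the ranked list")
    # a: largest lead of the query's empirical CDF over a uniform spread of ranks
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    # b: largest lag of the query's empirical CDF behind a uniform spread
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity(ranked_genes, up_genes, down_genes):
    """Unnormalized connectivity of a query signature (up/down gene sets) to one profile.

    Returns 0 when the up and down sets score with the same sign (no coherent match);
    otherwise ks_up - ks_down, which is positive when the profile mimics the query
    and negative when it reverses it.
    """
    ks_up = ks_score(ranked_genes, up_genes)
    ks_down = ks_score(ranked_genes, down_genes)
    if ks_up * ks_down >= 0:
        return 0.0
    return ks_up - ks_down
```

Suppose a query's up genes sit at the top of a profile's ranking and its down genes at the bottom. The score is then strongly positive, and swapping the two sets flips its sign.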
Subjects
Gene Expression Profiling/methods, Cell Line, Tumor, Drug Resistance, Neoplasm, Gene Expression Profiling/economics, Humans, Neoplasms/drug therapy, Organ Specificity, Pharmaceutical Preparations/metabolism, Sequence Analysis, RNA/economics, Sequence Analysis, RNA/methods, Small Molecule Libraries
ABSTRACT
Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.
Subjects
Computational Biology/methods, Disease/genetics, Gene Regulatory Networks, Genome-Wide Association Study, Models, Biological, Polymorphism, Single Nucleotide, Quantitative Trait Loci, Algorithms, Gene Expression Profiling, Humans, Phenotype, Protein Interaction Maps
ABSTRACT
MOTIVATION: Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The task consists of separating the expression levels of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. RESULTS: We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional Gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate that researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. AVAILABILITY AND IMPLEMENTATION: The data are freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
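The Gaussian-mixture approach mentioned above can be illustrated on a single bead type. This minimal sketch assumes that two genes share each bead color in unequal proportions (commonly described for L1000 as roughly 2:1). Under that assumption the intensity distribution is bimodal, and the heavier peak belongs to the more abundant gene. It is not any competitor's code, and all function names are hypothetical.

```python
import math


def _log_normal(v, mu, var):
    """Log density of a 1-D normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def fit_two_gaussians(values, iters=200):
    """Fit a 1-D two-component Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances), each a two-element list.
    """
    xs = sorted(values)
    n = len(xs)
    # Initialize the means at the lower and upper quartiles of the data.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean = sum(xs) / n
    var = [max(sum((v - mean) ** 2 for v in xs) / n, 1e-6)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for v in xs:
            lp = [math.log(w[k]) + _log_normal(v, mu[k], var[k]) for k in range(2)]
            top = max(lp)
            e = [math.exp(x - top) for x in lp]
            resp.append([e[0] / (e[0] + e[1]), e[1] / (e[0] + e[1])])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, xs)) / nk, 1e-6)
    return w, mu, var


def deconvolve_bead(values):
    """Split one bead type's intensities into (major-gene, minor-gene) estimates.

    Assumes the two genes sharing the bead were mixed in unequal amounts, so the
    heavier mixture component is attributed to the more abundant gene.
    """
    w, mu, _ = fit_two_gaussians(values)
    major = 0 if w[0] >= w[1] else 1
    return mu[major], mu[1 - major]
```

If the two peaks overlap heavily or the mixing ratio is close to 1:1, this assignment rule becomes unreliable. That is one reason learned approaches such as the random-forest winner can do better.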
Subjects
Algorithms, Software, Reproducibility of Results, Random Forest, Biology
ABSTRACT
Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.
Subjects
Genomics/methods, Internet, Machine Learning, DNA/genetics, Databases, Nucleic Acid, Nucleic Acid Amplification Techniques, RNA/genetics, Software
ABSTRACT
MOTIVATION: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but the unprecedented size of the data generated and its complex associated metadata have also created data storage and integration challenges. RESULTS: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format's generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development. AVAILABILITY AND IMPLEMENTATION: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
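The sketch below is a minimal in-memory model of what a GCTx file holds: a dense matrix whose rows and columns carry ids and metadata. Subsets, for example all profiles of one perturbagen, can be selected by metadata and then sliced by id. The sketch ignores the on-disk layout (GCTx builds on HDF5). The class and method names are hypothetical, not the API of the released packages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedMatrix:
    """Dense 2-D matrix with row/column ids and per-id metadata (hypothetical model)."""
    data: List[List[float]]
    row_ids: List[str]
    col_ids: List[str]
    row_meta: Dict[str, Dict[str, str]] = field(default_factory=dict)
    col_meta: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self):
        # Keep the matrix shape consistent with its id lists.
        if len(self.data) != len(self.row_ids):
            raise ValueError("row count does not match row_ids")
        if any(len(row) != len(self.col_ids) for row in self.data):
            raise ValueError("column count does not match col_ids")

    def select_cols(self, key, value):
        """Column ids whose metadata field `key` equals `value`."""
        return [c for c in self.col_ids if self.col_meta.get(c, {}).get(key) == value]

    def subset(self, rids=None, cids=None):
        """New matrix restricted to the given ids, in the order given (KeyError if unknown)."""
        rids = list(self.row_ids) if rids is None else list(rids)
        cids = list(self.col_ids) if cids is None else list(cids)
        r_idx = {r: i for i, r in enumerate(self.row_ids)}
        c_idx = {c: j for j, c in enumerate(self.col_ids)}
        data = [[self.data[r_idx[r]][c_idx[c]] for c in cids] for r in rids]
        return AnnotatedMatrix(
            data,
            rids,
            cids,
            {r: self.row_meta.get(r, {}) for r in rids},
            {c: self.col_meta.get(c, {}) for c in cids},
        )
```

In a real file the benefit comes from reading only the selected rows or columns from disk. Here everything is already in memory.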
Subjects
Metadata, Software, Algorithms, Information Storage and Retrieval
ABSTRACT
The application of RNA interference (RNAi) to mammalian cells has provided the means to perform phenotypic screens to determine the functions of genes. Although RNAi has revolutionized loss-of-function genetic experiments, it has been difficult to systematically assess the prevalence and consequences of off-target effects. The Connectivity Map (CMAP) represents an unprecedented resource to study the gene expression consequences of expressing short hairpin RNAs (shRNAs). Analysis of signatures for over 13,000 shRNAs applied in 9 cell lines revealed that microRNA (miRNA)-like off-target effects of RNAi are far stronger and more pervasive than generally appreciated. We show that mitigating off-target effects is feasible in these datasets via computational methodologies to produce a consensus gene signature (CGS). In addition, we compared RNAi technology to clustered regularly interspaced short palindromic repeat (CRISPR)-based knockout by analysis of 373 single guide RNAs (sgRNAs) in 6 cell lines and show that the on-target efficacies are comparable, but CRISPR technology is far less susceptible to systematic off-target effects. These results will help guide the proper use and analysis of loss-of-function reagents for the determination of gene function.
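A consensus gene signature can be pictured as a weighted average of the signatures of several shRNAs against the same gene. Each reagent is weighted by how well it agrees with the others, so the shared on-target effect reinforces while reagent-specific (for example, seed-driven) effects are down-weighted. The weighting rule below is an assumption for illustration, not necessarily the procedure used in the study. It takes each reagent's mean Pearson correlation with the other reagents, floored at a small positive value.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def consensus_signature(signatures, min_weight=0.01):
    """Correlation-weighted average of per-reagent signatures for one target gene.

    signatures: list of equal-length expression-change vectors, one per reagent.
    A reagent that disagrees with the others (e.g. dominated by an off-target
    pattern) gets a weight near min_weight and contributes little.
    """
    k = len(signatures)
    if k == 1:
        return list(signatures[0])
    weights = []
    for i in range(k):
        cors = [pearson(signatures[i], signatures[j]) for j in range(k) if j != i]
        weights.append(max(sum(cors) / len(cors), min_weight))
    total = sum(weights)
    n_genes = len(signatures[0])
    return [sum(w * s[g] for w, s in zip(weights, signatures)) / total for g in range(n_genes)]
```

A plain (unweighted) mean would let a single off-target reagent distort the result. The correlation weighting is one simple way to suppress that.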
Subjects
Clustered Regularly Interspaced Short Palindromic Repeats, Gene Expression Profiling, Gene Regulatory Networks/genetics, Genomics/methods, RNA Interference/physiology, Cells, Cultured, Gene Expression Regulation, Neoplastic, Genomics/standards, HT29 Cells, Hep G2 Cells, Humans, MCF-7 Cells, RNA, Small Interfering/genetics, Transcriptome
ABSTRACT
Malignant melanomas harbouring point mutations (Val600Glu) in the serine/threonine-protein kinase BRAF (BRAF(V600E)) depend on RAF-MEK-ERK signalling for tumour cell growth. RAF and MEK inhibitors show remarkable clinical efficacy in BRAF(V600E) melanoma; however, resistance to these agents remains a formidable challenge. Global characterization of resistance mechanisms may inform the development of more effective therapeutic combinations. Here we carried out systematic gain-of-function resistance studies by expressing more than 15,500 genes individually in a BRAF(V600E) melanoma cell line treated with RAF, MEK, ERK or combined RAF-MEK inhibitors. These studies revealed a cyclic-AMP-dependent melanocytic signalling network not previously associated with drug resistance, including G-protein-coupled receptors, adenyl cyclase, protein kinase A and cAMP response element binding protein (CREB). Preliminary analysis of biopsies from BRAF(V600E) melanoma patients revealed that phosphorylated (active) CREB was suppressed by RAF-MEK inhibition but restored in relapsing tumours. Expression of transcription factors activated downstream of MAP kinase and cAMP pathways also conferred resistance, including c-FOS, NR4A1, NR4A2 and MITF. Combined treatment with MAPK-pathway and histone-deacetylase inhibitors suppressed MITF expression and cAMP-mediated resistance. Collectively, these data suggest that oncogenic dysregulation of a melanocyte lineage dependency can cause resistance to RAF-MEK-ERK inhibition, which may be overcome by combining signalling- and chromatin-directed therapeutics.
Subjects
Antineoplastic Agents/pharmacology, Drug Resistance, Neoplasm/genetics, Melanocytes/drug effects, Mitogen-Activated Protein Kinases/metabolism, Protein Kinase Inhibitors/pharmacology, CREB-Binding Protein/metabolism, Cell Line, Tumor, Cell Lineage, Cyclic AMP/metabolism, Gene Expression Regulation, Neoplastic, HEK293 Cells, Humans, Melanocytes/cytology, Melanocytes/enzymology, Melanoma/enzymology, Melanoma/physiopathology, Signal Transduction, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
MOTIVATION: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiles over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ~1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between the expression levels of genes. RESULTS: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. AVAILABILITY AND IMPLEMENTATION: D-GEX is available at https://github.com/uci-cbcl/D-GEX. CONTACT: xhx@ics.uci.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
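The linear-regression baseline that D-GEX is compared against can be sketched as one ordinary-least-squares model per target gene, with the landmark genes as predictors, scored by mean absolute error (MAE). The relative-improvement helper assumes the reported percentages are (MAE_LR - MAE_DL) / MAE_LR, since the abstract does not spell out the formula. The function names are hypothetical.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: list of samples, each a list of landmark-gene values; y: target-gene values.
    Returns coefficients [intercept, b1, ..., bp].
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented system (X^T X | X^T y).
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted target-gene value for one sample of landmark values."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def mean_absolute_error(pred, truth):
    return sum(abs(a - b) for a, b in zip(pred, truth)) / len(truth)


def relative_improvement(mae_baseline, mae_new):
    """Fractional error reduction, e.g. 0.1533 for a 15.33% improvement."""
    return (mae_baseline - mae_new) / mae_baseline
```

D-GEX replaces `fit_linear` with a deep neural network trained on the same landmark-to-target mapping. The evaluation side (per-gene MAE, relative improvement) stays the same.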
Subjects
Gene Expression, Gene Expression Profiling, Linear Models, Machine Learning, RNA
ABSTRACT
High-throughput screening has become a mainstay of small-molecule probe and early drug discovery. The question of how to build and evolve efficient screening collections systematically for cell-based and biochemical screening is still unresolved. It is often assumed that chemical structure diversity leads to diverse biological performance of a library. Here, we confirm earlier results showing that this inference is not always valid and suggest instead using biological measurement diversity derived from multiplexed profiling in the construction of libraries with diverse assay performance patterns for cell-based screens. Rather than using results from tens or hundreds of completed assays, which is resource intensive and not easily extensible, we use high-dimensional image-based cell morphology and gene expression profiles. We piloted this approach using over 30,000 compounds. We show that small-molecule profiling can be used to select compound sets with high rates of activity and diverse biological performance.
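The abstract does not say how compounds were picked from the profiles. A standard way to turn "diverse biological performance" into a selection rule is greedy max-min (farthest-point) selection over profile vectors, after dropping compounds whose profiles are indistinguishable from controls. The sketch below implements that generic technique under stated assumptions: Euclidean distance, and the vector norm as an activity proxy. It is not the paper's procedure, and the names are hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(profiles, k, min_activity=0.0):
    """Greedy max-min selection of up to k compounds.

    profiles: dict mapping compound id -> profile vector (assumed already
    normalized against controls, so the zero vector means "no effect").
    Compounds whose profile norm is <= min_activity are treated as inactive.
    """
    active = {c: v for c, v in profiles.items() if _norm(v) > min_activity}
    if not active or k <= 0:
        return []
    # Seed with the most strongly perturbing compound.
    first = max(active, key=lambda c: _norm(active[c]))
    chosen = [first]
    # For each remaining candidate, distance to its nearest chosen compound.
    nearest = {c: _dist(v, active[first]) for c, v in active.items() if c != first}
    while nearest and len(chosen) < k:
        nxt = max(nearest, key=nearest.get)  # farthest from everything chosen so far
        chosen.append(nxt)
        del nearest[nxt]
        for c in nearest:
            nearest[c] = min(nearest[c], _dist(active[c], active[nxt]))
    return chosen
```

The activity filter addresses the "high rates of activity" goal, and the max-min step addresses diversity. Near-duplicate profiles are picked only after every more distinct option is exhausted.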
Subjects
Drug Evaluation, Preclinical/methods, Gene Expression Profiling, Gene Expression Regulation/drug effects, Cell Line, Tumor, Humans
ABSTRACT
Efforts to develop more effective therapies for acute leukemia may benefit from high-throughput screening systems that reflect the complex physiology of the disease, including leukemia stem cells (LSCs) and supportive interactions with the bone marrow microenvironment. The therapeutic targeting of LSCs is challenging because LSCs are highly similar to normal hematopoietic stem and progenitor cells (HSPCs) and are protected by stromal cells in vivo. We screened 14,718 compounds in a leukemia-stroma co-culture system for inhibition of cobblestone formation, a cellular behavior associated with stem-cell function. Among those compounds that inhibited malignant cells but spared HSPCs was the cholesterol-lowering drug lovastatin. Lovastatin showed anti-LSC activity in vitro and in an in vivo bone marrow transplantation model. Mechanistic studies demonstrated that the effect was on target, via inhibition of HMG-CoA reductase. These results illustrate the power of merging physiologically relevant models with high-throughput screening.
Subjects
Antineoplastic Agents/pharmacology, Drug Screening Assays, Antitumor/methods, Leukemia, Neoplastic Stem Cells/drug effects, Cell Line, Tumor, Hematopoietic Stem Cells, Humans, Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology, Lovastatin/pharmacology, Neoplastic Stem Cells/cytology, Neoplastic Stem Cells/physiology
ABSTRACT
Humans and animals must often discriminate between complex natural sounds in the presence of competing sounds (maskers). Although the auditory cortex is thought to be important in this task, the impact of maskers on cortical discrimination remains poorly understood. We examined neural responses in zebra finch (Taeniopygia guttata) field L (homologous to primary auditory cortex) to target birdsongs that were embedded in three different maskers (broadband noise, modulated noise and birdsong chorus). We found two distinct forms of interference in the neural responses: the addition of spurious spikes occurring primarily during the silent gaps between song syllables and the suppression of informative spikes occurring primarily during the syllables. Both effects systematically degraded neural discrimination as the target intensity decreased relative to that of the masker. The behavioral performance of songbirds degraded in a parallel manner. Our results identify neural interference that could explain the perceptual interference at the heart of the cocktail party problem.
Subjects
Auditory Cortex/cytology, Discrimination (Psychology)/physiology, Neurons/physiology, Perceptual Masking/physiology, Sound, Vocalization, Animal/physiology, Acoustic Stimulation/methods, Action Potentials/physiology, Analysis of Variance, Animals, Behavior, Animal, Conditioning, Operant, Dose-Response Relationship, Radiation, Finches, Male, Pattern Recognition, Physiological/physiology, Psychometrics
ABSTRACT
Anti-cancer uses of non-oncology drugs have occasionally been found, but such discoveries have been serendipitous. We sought to create a public resource containing the growth inhibitory activity of 4,518 drugs tested across 578 human cancer cell lines. We used PRISM, a molecular barcoding method, to screen drugs against cell lines in pools. An unexpectedly large number of non-oncology drugs selectively inhibited subsets of cancer cell lines in a manner predictable from the cell lines' molecular features. Our findings include compounds that killed by inducing PDE3A-SLFN12 complex formation; vanadium-containing compounds whose killing depended on the sulfate transporter SLC26A2; the alcohol dependence drug disulfiram, which killed cells with low expression of metallothioneins; and the anti-inflammatory drug tepoxalin, which killed via the multi-drug resistance protein ABCB1. The PRISM drug repurposing resource (https://depmap.org/repurposing) is a starting point for developing new oncology therapeutics and, more rarely, for potential direct clinical translation.
Subjects
Neoplasms, Cell Line, Disulfiram, Drug Repositioning, Humans, Neoplasms/drug therapy
ABSTRACT
Intensity variation poses a fundamental problem for sensory discrimination because changes in the response of sensory neurons as a result of stimulus identity, e.g., a change in the identity of the speaker uttering a word, can potentially be confused with changes resulting from stimulus intensity, for example, the loudness of the utterance. Here we report on the responses of neurons in field L, the primary auditory cortex homolog in songbirds, which allow for accurate discrimination of birdsongs that is invariant to intensity changes over a large range. Such neurons comprise a subset of a population that is highly diverse, in terms of both discrimination accuracy and intensity sensitivity. We find that the neurons with a high degree of invariance also display a high discrimination performance, and that the degree of invariance is significantly correlated with the reproducibility of spike timing on a short time scale and the temporal sparseness of spiking activity. Our results indicate that a temporally sparse spike timing-based code at a primary cortical stage can provide a substrate for intensity-invariant discrimination of natural sounds.
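Temporal sparseness can be quantified in several ways. One common choice is a rescaled Treves-Rolls index computed over the time bins of a post-stimulus time histogram. It is 0 when spikes are spread evenly across bins and reaches 1 when all activity falls in a single bin. Whether the study used this exact measure is an assumption, and the function name is hypothetical.

```python
def temporal_sparseness(rates):
    """Sparseness of a binned firing-rate profile: 0 = uniform, 1 = one active bin.

    rates: non-negative spike counts or rates per time bin (at least two bins).
    Computes (1 - mean(r)^2 / mean(r^2)) / (1 - 1/n), a rescaled Treves-Rolls index.
    """
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two time bins")
    mean = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0  # silent response: treat as not sparse
    return (1 - mean * mean / mean_sq) / (1 - 1 / n)
```

A neuron that fires in brief, precisely timed bursts scores near 1. A neuron with a sustained, tonic response scores near 0.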
Subjects
Acoustic Stimulation/methods, Auditory Pathways/physiology, Pitch Discrimination/physiology, Sound, Vocalization, Animal/physiology, Animals, Auditory Perception/physiology, Finches, Male
ABSTRACT
Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. We highlight three examples in computational biology and bioinformatics research in which the use of competitions has yielded significant performance gains over established algorithms. These include algorithms for antibody clustering, imputing gene expression data, and querying the Connectivity Map (CMap). Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. The solutions produced through these competitions are then examined with respect to their utility and the prospects for implementation in the field. We present the decision process and competition design considerations that led to these successful outcomes as a model for researchers who want to use competitions and non-domain crowds as collaborators to further their research.
Subjects
Computational Biology/trends, Algorithms, Antibodies/classification, Antibodies/genetics, Cluster Analysis, Crowdsourcing/trends, Gene Expression Profiling/statistics & numerical data, Humans, Inventions/trends
ABSTRACT
A central finding in many cortical areas is that single neurons can match behavioral performance in the discrimination of sensory stimuli. However, whether this is true for natural behaviors involving complex natural stimuli remains unknown. Here we use the model system of songbirds to address this problem. Specifically, we investigate whether neurons in field L, the homolog of primary auditory cortex, can match behavioral performance in the discrimination of conspecific songs. We use a classification framework based on the (dis)similarity between single spike trains to quantify neural discrimination. We use this framework to investigate the discriminability of single spike trains in field L in response to conspecific songs, testing different candidate neural codes underlying discrimination. We find that performance based on spike timing is significantly higher than performance based on spike rate and interspike intervals. We then assess the impact of temporal correlations in spike trains on discrimination. In contrast to widely discussed effects of correlations in limiting the accuracy of a population code, temporal correlations appear to improve the performance of single neurons in the majority of cases. Finally, we compare neural performance with behavioral performance. We find a diverse range of performance levels in field L, with neural performance matching behavioral accuracy only for the best neurons using a spike-timing-based code.
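The spike-train classification framework can be sketched with the van Rossum distance, a common (dis)similarity measure, and a nearest-template rule: each trial is assigned to the song whose template spike train is closest. Using this particular metric, the discretization step `dt` and the function names are assumptions. The time constant `tau` sets what the comparison is sensitive to. A small `tau` emphasizes precise spike timing, while a large `tau` makes the distance behave more like a spike-count comparison.

```python
import math


def _filtered(spikes, tau, t_grid):
    """Convolve a spike train (spike times in s) with a causal kernel exp(-t/tau)."""
    return [sum(math.exp(-(t - s) / tau) for s in spikes if s <= t) for t in t_grid]


def van_rossum(a, b, tau, duration, dt=0.001):
    """Discretized van Rossum distance between spike trains a and b on [0, duration]."""
    t_grid = [i * dt for i in range(int(duration / dt) + 1)]
    fa, fb = _filtered(a, tau, t_grid), _filtered(b, tau, t_grid)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) * dt / tau)


def classify(trial, templates, tau, duration):
    """Label of the template spike train closest to `trial`.

    templates: dict mapping song label -> template spike train.
    """
    return min(templates, key=lambda label: van_rossum(trial, templates[label], tau, duration))
```

Repeating `classify` over many held-out trials and counting correct labels gives a percent-correct score. Sweeping `tau` from small to large values then contrasts timing-based and rate-based readouts.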
Subjects
Acoustic Stimulation/methods, Auditory Cortex/physiology, Behavior, Animal/physiology, Discrimination (Psychology)/physiology, Neurons/physiology, Action Potentials/physiology, Animals, Auditory Cortex/cytology, Auditory Perception/physiology, Finches, Neurons/cytology, Vocalization, Animal/physiology
ABSTRACT
Recent genome sequencing efforts have identified millions of somatic mutations in cancer. However, the functional impact of most variants is poorly understood. Here we characterize 194 somatic mutations identified in primary lung adenocarcinomas. We present an expression-based variant-impact phenotyping (eVIP) method that uses gene expression changes to distinguish impactful from neutral somatic mutations. eVIP identified 69% of mutations analyzed as impactful and 31% as functionally neutral. A subset of the impactful mutations induces xenograft tumor formation in mice and/or confers resistance to cellular EGFR inhibition. Among these impactful variants are rare somatic, clinically actionable variants including EGFR S645C, ARAF S214C and S214F, ERBB2 S418T, and multiple BRAF variants, demonstrating that rare mutations can be functionally important in cancer.
Subjects
Adenocarcinoma/genetics, High-Throughput Nucleotide Sequencing/methods, Lung Neoplasms/genetics, Mutation, Adenocarcinoma of Lung, Animals, Cell Line, Tumor, Gene Expression Profiling, Heterografts, Humans, Mice, Oncogenes, Phenotype
ABSTRACT
The ribosome is centrally situated to sense metabolic states, but whether its activity, in turn, coherently rewires transcriptional responses is unknown. Here, through integrated chemical-genetic analyses, we found that a dominant transcriptional effect of blocking protein translation in cancer cells was inactivation of heat shock factor 1 (HSF1), a multifaceted transcriptional regulator of the heat-shock response and many other cellular processes essential for anabolic metabolism, cellular proliferation, and tumorigenesis. These analyses linked translational flux to the regulation of HSF1 transcriptional activity and to the modulation of energy metabolism. Targeting this link with translation initiation inhibitors such as rocaglates deprived cancer cells of their energy and chaperone armamentarium and selectively impaired the proliferation of both malignant and premalignant cells with early-stage oncogenic lesions.
Subjects
DNA-Binding Proteins/biosynthesis, Neoplasms/metabolism, Neoplasms/pathology, Protein Biosynthesis/physiology, Ribosomes/metabolism, Transcription Factors/biosynthesis, Animals, Antineoplastic Agents/chemistry, Antineoplastic Agents/isolation & purification, Antineoplastic Agents/pharmacology, Benzofurans/pharmacology, Cell Line, Tumor, Cell Proliferation, Cell Transformation, Neoplastic/drug effects, Cell Transformation, Neoplastic/metabolism, Cell Transformation, Neoplastic/pathology, DNA-Binding Proteins/antagonists & inhibitors, Energy Metabolism/drug effects, Gene Expression Regulation, Neoplastic, Heat Shock Transcription Factors, High-Throughput Screening Assays, Humans, Mice, NIH 3T3 Cells, Neoplasm Transplantation, Neoplasms/genetics, Protein Biosynthesis/drug effects, Protein Biosynthesis/genetics, Ribosomes/drug effects, Transcription Factors/antagonists & inhibitors
ABSTRACT
Budgerigars and zebra finches were tested, using operant conditioning techniques, on their ability to identify a zebra finch song in the presence of a background masker emitted from either the same or a different location as the signal. Identification thresholds were obtained for three masker types differing in their spectrotemporal characteristics (noise, modulated noise, and a song chorus). Both bird species exhibited similar amounts of spatial unmasking across the three masker types. The amount of unmasking was greater when the masker was played continuously compared to when the target and masker were presented simultaneously. These results suggest that spatial factors are important for birds in the identification of natural signals in noisy environments.