ABSTRACT
We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover the mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.
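The connectivity idea can be made concrete with a small sketch. It assumes a query signature split into up- and down-regulated gene sets and scores a ranked reference profile by combining two running-sum enrichment scores: it returns (ES_up - ES_down)/2 when the two scores have opposite signs and 0 otherwise. The unweighted running sum, the function names, and the gene names are illustrative assumptions, not the CMap/clue.io implementation, which uses its own weighting and normalization.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted (Kolmogorov-Smirnov-style) running-sum enrichment score.

    ranked_genes runs from most up-regulated to most down-regulated in a
    reference profile. The score is the running-sum value farthest from zero,
    so it is positive when the set concentrates at the top of the ranking and
    negative when it concentrates at the bottom.
    """
    n = len(ranked_genes)
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        return 0.0  # no contrast between hits and misses
    step_hit = 1.0 / n_hit
    step_miss = 1.0 / (n - n_hit)
    running, extreme = 0.0, 0.0
    for is_hit in hits:
        running += step_hit if is_hit else -step_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def connectivity_score(ranked_genes, up_set, down_set) -> float:
    """Combine up- and down-set scores into one connectivity value.

    Positive: the reference profile resembles the query signature.
    Negative: it opposes the query. Zero when both sets move the same way.
    """
    es_up = enrichment_score(ranked_genes, up_set)
    es_down = enrichment_score(ranked_genes, down_set)
    if es_up * es_down >= 0:
        return 0.0
    return (es_up - es_down) / 2.0


# Toy example with hypothetical gene names.
reference = ["G1", "G2", "G3", "G4", "G5", "G6"]
print(connectivity_score(reference, {"G1", "G2"}, {"G5", "G6"}))  # 1.0
```

Reversing the reference ranking flips the result to -1.0, which is the sense in which a perturbation can "oppose" a disease signature.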
Subjects
Gene Expression Profiling/methods , Cell Line, Tumor , Drug Resistance, Neoplasm , Gene Expression Profiling/economics , Humans , Neoplasms/drug therapy , Organ Specificity , Pharmaceutical Preparations/metabolism , Sequence Analysis, RNA/economics , Sequence Analysis, RNA/methods , Small Molecule Libraries
ABSTRACT
Gene-expression profiling has become a mainstay in immunology, but subtle changes in gene networks related to biological processes are hard to discern when comparing various datasets. For instance, conservation of the transcriptional response to sepsis in mouse models and human disease remains controversial. To improve transcriptional analysis in immunology, we created ImmuneSigDB: a manually annotated compendium of ∼5,000 gene-sets from diverse cell states, experimental manipulations, and genetic perturbations in immunology. Analysis using ImmuneSigDB identified signatures induced in activated myeloid cells and differentiating lymphocytes that were highly conserved between humans and mice. Sepsis triggered conserved patterns of gene expression in humans and mouse models. However, we also identified species-specific biological processes in the sepsis transcriptional response: although both species upregulated phagocytosis-related genes, a mitosis signature was specific to humans. ImmuneSigDB enables granular analysis of transcriptomic data to improve biological understanding of immune processes of the human and mouse immune systems.
Subjects
Databases, Genetic , Inflammation/immunology , Transcriptome , Animals , Humans , Mice , Species Specificity
ABSTRACT
Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.
Subjects
Genomics/methods , Internet , Machine Learning , DNA/genetics , Databases, Nucleic Acid , Nucleic Acid Amplification Techniques , RNA/genetics , Software
ABSTRACT
Studying plants using high-throughput genomics technologies is becoming routine, but interpretation of genome-wide expression data in terms of biological pathways remains a challenge, partly due to the lack of pathway databases. To create a knowledgebase for plant pathway analysis, we collected 1683 lists of differentially expressed genes from 397 gene-expression studies, which constitute a molecular signature database of various genetic and environmental perturbations of Arabidopsis. In addition, we extracted 1909 gene sets from various sources such as Gene Ontology, KEGG, AraCyc, Plant Ontology, predicted target genes of microRNAs and transcription factors, and computational gene clusters defined by meta-analysis. With this knowledgebase, we applied Gene Set Enrichment Analysis to an expression profile of cold acclimation and identified expected functional categories and pathways. Our results suggest that the AraPath database can be used to generate specific, testable hypotheses regarding plant molecular pathways from gene expression data. AVAILABILITY: http://bioinformatics.sdstate.edu/arapath/.
Subjects
Arabidopsis/genetics , Databases, Genetic , Knowledge Bases , Gene Expression , Gene Expression Profiling/methods , Genome, Plant , Genomics/methods , Multigene Family
ABSTRACT
BACKGROUND & AIMS: In approximately 70% of patients with hepatocellular carcinoma (HCC) treated by resection or ablation, disease recurs within 5 years. Although gene expression signatures have been associated with outcome, there is no method to predict recurrence based on combined clinical, pathology, and genomic data (from tumor and cirrhotic tissue). We evaluated gene expression signatures associated with outcome in a large cohort of patients with early stage (Barcelona-Clinic Liver Cancer 0/A), single-nodule HCC, as well as the heterogeneity of signatures within tumor tissues. METHODS: We assessed 287 HCC patients undergoing resection and tested genome-wide expression platforms using tumor (n = 287) and adjacent nontumor, cirrhotic tissue (n = 226). We evaluated gene expression signatures with reported prognostic ability generated from tumor or cirrhotic tissue in 18 and 4 reports, respectively. In 15 additional patients, we profiled samples from the center and periphery of the tumor to determine the stability of signatures. Data analysis included Cox modeling and random survival forests to identify independent predictors of tumor recurrence. RESULTS: Gene expression signatures associated with aggressive HCC clustered together, as did signatures associated with tumors of progenitor cell origin and those derived from adjacent nontumor, cirrhotic tissues. On multivariate analysis, the tumor-associated signature G3-proliferation (hazard ratio [HR], 1.75; P = .003) and an adjacent poor-survival signature (HR, 1.74; P = .004) were independent predictors of HCC recurrence, along with satellites (HR, 1.66; P = .04). Samples from different sites in the same tumor nodule were reproducibly classified. CONCLUSIONS: We developed a composite prognostic model for HCC recurrence, based on gene expression patterns in tumor and adjacent tissues. These signatures predict early and overall recurrence in patients with HCC, and complement findings from clinical and pathology analyses.
Subjects
Biomarkers, Tumor/genetics , Carcinoma, Hepatocellular/genetics , DNA, Neoplasm/genetics , Gene Expression Regulation, Neoplastic , Liver Neoplasms/genetics , Neoplasm Recurrence, Local/diagnosis , Biomarkers, Tumor/biosynthesis , Carcinoma, Hepatocellular/pathology , Carcinoma, Hepatocellular/surgery , Female , Genotype , Hepatectomy , Humans , Incidence , Italy/epidemiology , Japan/epidemiology , Liver Neoplasms/pathology , Liver Neoplasms/surgery , Male , Middle Aged , Neoplasm Recurrence, Local/epidemiology , Prognosis , Spain/epidemiology , Survival Rate , United States/epidemiology
ABSTRACT
MOTIVATION: Well-annotated gene sets representing the universe of biological processes are critical for meaningful and insightful interpretation of large-scale genomic data. The Molecular Signatures Database (MSigDB) is one of the most widely used repositories of such sets. RESULTS: We report the availability of a new version of the database, MSigDB 3.0, with over 6700 gene sets, a complete revision of the collection of canonical pathways and experimental signatures from publications, enhanced annotations, and upgrades to the web site. AVAILABILITY AND IMPLEMENTATION: MSigDB is freely available for non-commercial use at http://www.broadinstitute.org/msigdb.
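MSigDB gene sets are commonly downloaded as plain-text GMT files, in which each line holds one set: a set name, a description field, and then the member genes, all tab-separated. A minimal reader sketch follows; the function name and the two example sets are hypothetical placeholders, not actual MSigDB collections.

```python
import io
from typing import Dict, List, TextIO


def read_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Parse GMT text: one gene set per line, tab-separated as
    <set name> <description> <gene 1> <gene 2> ...
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines (no genes listed)
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]  # drop empty trailing fields
    return gene_sets


# Hypothetical two-set file held in memory for illustration.
example = "SET_A\tdescription\tTP53\tMDM2\nSET_B\tdescription\tKRAS\t\n"
sets = read_gmt(io.StringIO(example))
print(sets)  # {'SET_A': ['TP53', 'MDM2'], 'SET_B': ['KRAS']}
```

For a file on disk, the same function can be called with `open(path)` in place of the in-memory buffer.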
Subjects
Databases, Genetic , Genomics , Internet , Molecular Sequence Annotation
ABSTRACT
Cystic fibrosis (CF) is a lethal autosomal recessive disease caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The common ΔF508-CFTR mutation results in protein misfolding and proteasomal degradation. If ΔF508-CFTR traffics to the cell surface, its anion channel function may be partially restored. Several in vitro strategies can partially correct ΔF508-CFTR trafficking and function, including low temperature, small molecules, overexpression of miR-138, or knockdown of SIN3A. The challenge remains to translate such interventions into therapies and to understand their mechanisms. One approach that has previously succeeded in connecting such interventions to small-molecule therapies, for CF and other diseases, is mRNA expression profiling followed by iterative searches for small molecules with similar expression signatures. Here, we query the Library of Integrated Network-based Cellular Signatures using transcriptomic signatures from previously generated CF expression data, including RNAi- and low temperature-based rescue signatures. This LINCS in silico screen prioritized 135 small molecules that mimicked our rescue interventions based on their genome-wide transcriptional perturbations. Functional screens of these small molecules identified eight compounds that partially restored ΔF508-CFTR function, as assessed by cAMP-activated chloride conductance. Of these, XL147 rescued ΔF508-CFTR function in primary CF airway epithelia, while also showing cooperativity when administered with C18. Improved CF corrector therapies are needed, and this integrative drug prioritization approach offers a novel method both to identify small molecules that may rescue ΔF508-CFTR function and to identify gene networks underlying such rescue.
Subjects
Cystic Fibrosis , MicroRNAs , Cell Line , Cystic Fibrosis/drug therapy , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/metabolism , Drug Discovery , Humans , MicroRNAs/genetics , Mutation
ABSTRACT
Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug treatment, and gene knockouts, and yet all display the clinical hallmarks of hyperglycemia and insulin resistance in peripheral tissue. Recent advances in gene-expression microarray technologies present an unprecedented opportunity to study type 2 diabetes mellitus at a genome-wide scale and across different models. To date, a key challenge has been to identify the biological processes or signaling pathways that play significant roles in the disorder. Here, using a network-based analysis methodology, we identified two sets of genes, associated with insulin signaling and a network of nuclear receptors, which are recurrent in a statistically significant number of diabetes and insulin resistance models and transcriptionally altered across diverse tissue types. We additionally identified a network of protein-protein interactions between members of the two gene sets that may facilitate signaling between them. Taken together, the results illustrate the benefits of integrating high-throughput microarray studies with protein-protein interaction networks to elucidate the biological processes underlying a complex disorder.
Subjects
Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Models, Biological , Systems Biology , Animals , Diabetes Mellitus, Type 2/physiopathology , Disease Models, Animal , Gene Expression Profiling , Gene Expression Regulation/physiology , Humans , Insulin/physiology , Signal Transduction/physiology
ABSTRACT
The systematic sequencing of the cancer genome has led to the identification of numerous genetic alterations in cancer. However, a deeper understanding of the functional consequences of these alterations is necessary to guide appropriate therapeutic strategies. Here, we describe Onco-GPS (OncoGenic Positioning System), a data-driven analysis framework that organizes individual tumor samples with shared oncogenic alterations onto a reference map defined by their underlying cellular states. We applied the methodology to the RAS pathway and identified nine distinct components that reflect transcriptional activities downstream of RAS. We also defined several functional states, characterized by patterns of transcriptional component activation, that associate with genomic hallmarks and with response to genetic and pharmacological perturbations. These results show that the Onco-GPS is an effective approach to explore the complex landscape of oncogenic cellular states across cancers, and an analytic framework to summarize knowledge, establish relationships, and generate more effective disease models for research or as part of individualized precision medicine paradigms.
Subjects
Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Biomarkers, Tumor/metabolism , Cell Line, Tumor , Gene Expression Profiling/methods , Genes, ras/genetics , Genome , Humans , MAP Kinase Signaling System , Neoplasms/pathology , Precision Medicine
ABSTRACT
PDX1 is a homeodomain transcription factor essential for pancreatic development and mature beta cell function. Homeodomain proteins typically recognize short TAAT DNA motifs in vitro: this binding displays paradoxically low specificity and affinity, given the extremely high specificity of action of these proteins in vivo. To better understand how PDX1 selects target genes in vivo, we have examined the interaction of PDX1 with natural and artificial binding sites. Comparison of PDX1 binding sites in several target promoters revealed an evolutionarily conserved pattern of nucleotides flanking the TAAT core. Using competitive in vitro DNA binding assays, we defined three groups of binding sites displaying high, intermediate and low affinity. Transfection experiments revealed a striking correlation between the ability of each sequence to activate transcription in cultured beta cells, and its ability to bind PDX1 in vitro. Site selection from a pool of oligonucleotides (sequence NNNTAATNNN) revealed a non-random preference for particular nucleotides at the flanking locations, resembling natural PDX1 binding sites. Taken together, the data indicate that the intrinsic DNA binding specificity of PDX1, in particular the bases adjacent to TAAT, plays an important role in determining the spectrum of target genes.
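The site-selection result (a non-random preference at the positions flanking TAAT) is naturally summarized as a per-position base-frequency table. The sketch below builds one from a list of selected NNNTAATNNN oligonucleotides; the function name and the four example sequences are hypothetical and are not data from the study. Frequencies far from the uniform 0.25 at a flanking position would indicate the kind of preference described.

```python
from collections import Counter
from typing import Dict, List

CORE = "TAAT"
FLANK = 3  # three randomized positions on each side, as in NNNTAATNNN


def position_frequencies(selected: List[str]) -> List[Dict[str, float]]:
    """Return per-position base frequencies for selected NNNTAATNNN sites."""
    expected_length = 2 * FLANK + len(CORE)
    for seq in selected:
        if len(seq) != expected_length or seq[FLANK:FLANK + len(CORE)] != CORE:
            raise ValueError(f"not an NNNTAATNNN site: {seq}")
    columns = []
    for pos in range(expected_length):
        counts = Counter(seq[pos] for seq in selected)
        # Counter returns 0 for bases never observed at this position.
        columns.append({base: counts[base] / len(selected) for base in "ACGT"})
    return columns


# Hypothetical selected sites (not data from the study).
sites = ["CCATAATTAG", "CCATAATTAA", "GCATAATTAG", "CCATAATGAG"]
freqs = position_frequencies(sites)
print(freqs[0]["C"], freqs[FLANK]["T"])  # 0.75 1.0
```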
Subjects
DNA/genetics , DNA/metabolism , Homeodomain Proteins , Response Elements/genetics , Trans-Activators/metabolism , Animals , Base Sequence , Binding, Competitive , Cell Line, Tumor , Cricetinae , Insulin/genetics , Mice , Promoter Regions, Genetic/genetics , Protein Binding , Rats , Sequence Alignment , Substrate Specificity
ABSTRACT
Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, Irizarry et al. (2009) proposed, as a serious contender, a simplified approach that uses a one-sample t-test score to assess enrichment and ignores gene-gene correlations. Their argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and that they should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
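Why ignoring correlation matters can be seen with the textbook equicorrelated model. A simplified set score of the form sqrt(N) times the mean per-gene t-statistic has unit null variance only if the N statistics are independent; if every pair shares correlation rho, its variance becomes 1 + (N - 1)·rho. The sketch below assumes unit-variance statistics and a single average correlation; it illustrates variance inflation in general and is not the analysis performed in the paper.

```python
import math
from statistics import mean
from typing import Sequence


def simple_set_score(gene_t_stats: Sequence[float]) -> float:
    """sqrt(N) times the mean per-gene t-statistic of an N-gene set.

    Treated as a standard normal under the null only if the N statistics
    are independent.
    """
    return math.sqrt(len(gene_t_stats)) * mean(gene_t_stats)


def variance_inflation(n_genes: int, mean_pairwise_corr: float) -> float:
    """Variance of sqrt(N) * mean(t) when each t has unit variance and
    every pair of genes shares correlation rho: 1 + (N - 1) * rho."""
    return 1.0 + (n_genes - 1) * mean_pairwise_corr


def correlation_adjusted_score(gene_t_stats: Sequence[float], mean_pairwise_corr: float) -> float:
    """Rescale the simple score so it has unit variance under that model."""
    vif = variance_inflation(len(gene_t_stats), mean_pairwise_corr)
    return simple_set_score(gene_t_stats) / math.sqrt(vif)


stats = [1.0] * 100
print(simple_set_score(stats))  # 10.0
print(round(variance_inflation(100, 0.1), 2))  # 10.9
```

With 100 genes and a modest average correlation of 0.1, the null variance is about 10.9 times larger than assumed, so a score of 10 shrinks to roughly 3.0 after rescaling.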
Subjects
Genome, Human , Knowledge Bases , Biostatistics , Databases, Genetic/statistics & numerical data , Epistasis, Genetic , Gene Expression Profiling/statistics & numerical data , Humans , Models, Statistical , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Statistics, Nonparametric
ABSTRACT
Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment. We used REVEALER to uncover complementary genomic alterations associated with the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER successfully identified both known and new associations, demonstrating the power of combining functional profiles with extensive characterization of genomic alterations in cancer genomes.
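The search for complementary, mutually exclusive alterations can be illustrated with a greedy sketch: start from a seed alteration, repeatedly add the alteration whose logical OR with the current summary best tracks the functional phenotype, and stop when no addition helps. REVEALER itself scores candidates with an information-theoretic metric conditioned on the current summary; this sketch substitutes plain Pearson correlation, and the feature names, values, and `max_steps` limit are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_complementary(
    target: Sequence[float],
    features: Dict[str, Sequence[float]],
    seed: str,
    max_steps: int = 3,
) -> Tuple[List[str], float]:
    """Greedily OR-combine binary alterations so their union tracks the target.

    Alterations that are mutually exclusive with the current summary but hit
    the remaining phenotype-positive samples raise the correlation the most.
    """
    selected = [seed]
    summary = list(features[seed])
    best = pearson(summary, target)
    for _ in range(max_steps):
        best_name, best_score = None, best
        for name, feature in features.items():
            if name in selected:
                continue
            candidate = [max(a, b) for a, b in zip(summary, feature)]  # logical OR
            score = pearson(candidate, target)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            break  # no remaining alteration improves the association
        selected.append(best_name)
        summary = [max(a, b) for a, b in zip(summary, features[best_name])]
        best = best_score
    return selected, best


# Hypothetical phenotype (e.g. pathway activation) and alteration calls.
target = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
features = {
    "ALT_1": [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "ALT_2": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "ALT_3": [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
}
print(greedy_complementary(target, features, seed="ALT_1"))
# (['ALT_1', 'ALT_2'], 1.0)
```

ALT_2 is chosen because it is mutually exclusive with ALT_1 yet covers the remaining phenotype-positive samples; ALT_3 hits only phenotype-negative samples and is never added.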
Subjects
Biomarkers, Tumor/genetics , Chromosome Mapping/methods , Genome-Wide Association Study/methods , Neoplasm Proteins/genetics , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Drug Resistance, Neoplasm/genetics , Genes, Neoplasm/genetics , Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Neoplasms/diagnosis , Signal Transduction/genetics
ABSTRACT
The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
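Two ingredients of the refinement idea can be sketched in isolation: measuring redundancy between founder sets, here with Jaccard overlap, and keeping genes supported by several founders. The hallmark derivation also relied on expression coherence and expert curation, which this sketch does not model; the support threshold, function names, and gene labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two gene sets, used here as a redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus_genes(founder_sets: Iterable[Set[str]], min_support: int) -> List[str]:
    """Keep genes that appear in at least min_support founder sets."""
    counts = Counter(gene for founder in founder_sets for gene in set(founder))
    return sorted(gene for gene, n in counts.items() if n >= min_support)


founders = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "E"}]
print(consensus_genes(founders, min_support=2))  # ['B', 'C']
print(round(jaccard(founders[0], founders[1]), 2))  # 0.5
```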
ABSTRACT
Annotated lists of genes help researchers to prioritize their own lists of candidate genes and to plan follow-up studies. The Molecular Signatures Database (MSigDB) is one of the most widely used knowledge base repositories of annotated sets of genes involved in biochemical pathways, signaling cascades, expression profiles from research publications, and other biological concepts. Here we provide an overview of MSigDB and its online analytical tools.
Subjects
Databases, Genetic , Internet , Data Mining , User-Computer Interface
ABSTRACT
Different types of mature B-cell lymphocytes are overall highly similar. Nevertheless, some B cells proliferate intensively, while others rarely do. Here, we demonstrate that a simple binary classification of gene expression in proliferating vs. resting B cells can identify, with remarkable selectivity, global in vivo regulators of the mammalian cell cycle, many of which are also post-translationally regulated by the APC/C E3 ligase. Consequently, we discover a novel regulatory network between the APC/C and the E2F transcription factors and discuss its potential impact on the G1-S transition of the cell cycle. In addition, by focusing on genes whose expression inversely correlates with proliferation, we demonstrate the inherent ability of our approach to also identify in vivo regulators of cell differentiation, cell survival, and other antiproliferative processes. Because it relies on datasets from wild-type (wt), non-transgenic animals, our approach can be applied to other cell lineages and to human datasets.
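A binary proliferating-vs-resting comparison can be sketched as a ranking of genes by log2 fold change of mean expression: the top of the list holds candidate cell-cycle regulators, and the bottom holds genes whose expression inversely tracks proliferation. The fold-change statistic, the pseudocount, and the toy genes and values are assumptions for illustration; the authors' actual classification procedure may differ.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_proliferation(
    proliferating: Dict[str, List[float]],
    resting: Dict[str, List[float]],
    pseudocount: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank genes by log2 fold change of mean expression, proliferating vs resting.

    Top of the list: candidate proliferation-associated genes.
    Bottom of the list: genes whose expression inversely tracks proliferation.
    """
    scores = {}
    for gene in proliferating.keys() & resting.keys():  # genes measured in both groups
        up = mean(proliferating[gene]) + pseudocount
        down = mean(resting[gene]) + pseudocount
        scores[gene] = math.log2(up / down)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy expression values for hypothetical genes.
prolif = {"GENE_A": [15.0, 15.0], "GENE_B": [3.0, 3.0], "GENE_C": [1.0, 1.0]}
rest = {"GENE_A": [1.0, 1.0], "GENE_B": [3.0, 3.0], "GENE_C": [7.0, 7.0]}
print(rank_by_proliferation(prolif, rest))
# [('GENE_A', 3.0), ('GENE_B', 0.0), ('GENE_C', -2.0)]
```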
Subjects
Gene Regulatory Networks , Transcriptome , Adenomatous Polyposis Coli Protein/metabolism , B-Lymphocytes/cytology , B-Lymphocytes/metabolism , Cell Differentiation , Cell Lineage , Cell Proliferation , E2F Transcription Factors/metabolism , E2F1 Transcription Factor/metabolism , G1 Phase , HEK293 Cells , HeLa Cells , Humans , S Phase , Ubiquitin-Protein Ligase Complexes/metabolism , Ubiquitin-Protein Ligases/metabolism
ABSTRACT
The abundance of genomic data now available in biomedical research has stimulated the development of sophisticated statistical methods for interpreting the data, and of special visualization tools for displaying the results in a concise and meaningful manner. However, biologists often find these methods and tools difficult to understand and use correctly. GenePattern is a freely available software package that addresses this issue by providing more than 100 analysis and visualization tools for genomic research in a comprehensive user-friendly environment for users at all levels of computational experience and sophistication. This unit demonstrates how to prepare and analyze microarray data in GenePattern.