ABSTRACT
Many patients with advanced cancers achieve dramatic responses to a panoply of therapeutics yet retain minimal residual disease (MRD), which ultimately results in relapse. To gain insights into the biology of MRD, we applied single-cell RNA sequencing to malignant cells isolated from BRAF mutant patient-derived xenograft melanoma cohorts exposed to concurrent RAF/MEK-inhibition. We identified distinct drug-tolerant transcriptional states, varying combinations of which co-occurred within MRDs from PDXs and biopsies of patients on treatment. One of these exhibited a neural crest stem cell (NCSC) transcriptional program largely driven by the nuclear receptor RXRG. An RXR antagonist mitigated accumulation of NCSCs in MRD and delayed the development of resistance. These data identify NCSCs as key drivers of resistance and illustrate the therapeutic potential of MRD-directed therapy. They also highlight how gene regulatory network architecture reprogramming may be therapeutically exploited to limit cellular heterogeneity, a key driver of disease progression and therapy resistance.
Subjects
Gene Expression Regulation, Neoplastic/drug effects; Melanoma/drug therapy; Neoplasm, Residual/drug therapy; Neoplastic Stem Cells/drug effects; Neural Stem Cells/drug effects; Protein Kinase Inhibitors/pharmacology; Retinoid X Receptor gamma/antagonists & inhibitors; Animals; Biomarkers, Tumor; Drug Resistance, Neoplasm/drug effects; Female; Humans; MAP Kinase Kinase 1/antagonists & inhibitors; MAP Kinase Kinase 1/genetics; Male; Melanoma/metabolism; Melanoma/pathology; Mice, SCID; Mutation; Neoplasm, Residual/metabolism; Neoplasm, Residual/pathology; Neoplastic Stem Cells/metabolism; Neoplastic Stem Cells/pathology; Neural Stem Cells/metabolism; Neural Stem Cells/pathology; Proto-Oncogene Proteins B-raf/antagonists & inhibitors; Proto-Oncogene Proteins B-raf/genetics; Tumor Cells, Cultured; Xenograft Model Antitumor Assays
ABSTRACT
The diversity of cell types and regulatory states in the brain, and how these change during aging, remains largely unknown. We present a single-cell transcriptome atlas of the entire adult Drosophila melanogaster brain sampled across its lifespan. Cell clustering identified 87 initial cell clusters that are further subclustered and validated by targeted cell-sorting. Our data show high granularity and identify a wide range of cell types. Gene network analyses using SCENIC revealed regulatory heterogeneity linked to energy consumption. During aging, RNA content declines exponentially without affecting neuronal identity in old brains. This single-cell brain atlas covers nearly all cells in the normal brain and provides the tools to study cellular diversity alongside other Drosophila and mammalian single-cell datasets in our unique single-cell analysis platform: SCope (http://scope.aertslab.org). These results, together with SCope, allow comprehensive exploration of all transcriptional states of an entire aging brain.
Subjects
Aging; Brain/metabolism; Drosophila Proteins/genetics; Drosophila melanogaster/genetics; Gene Regulatory Networks; Single-Cell Analysis/methods; Transcriptome; Animals; Drosophila melanogaster/physiology; Female; Gene Expression Profiling; Male
ABSTRACT
The Drosophila brain is a frequently used model in neuroscience. Single-cell transcriptome analysis [1-6], three-dimensional morphological classification [7] and electron microscopy mapping of the connectome [8,9] have revealed an immense diversity of neuronal and glial cell types that underlie an array of functional and behavioural traits in the fly. The identities of these cell types are controlled by gene regulatory networks (GRNs), involving combinations of transcription factors that bind to genomic enhancers to regulate their target genes. Here, to characterize GRNs at the cell-type level in the fly brain, we profiled the chromatin accessibility of 240,919 single cells spanning 9 developmental timepoints and integrated these data with single-cell transcriptomes. We identify more than 95,000 regulatory regions that are used in different neuronal cell types, of which 70,000 are linked to developmental trajectories involving neurogenesis, reprogramming and maturation. For 40 cell types, uniquely accessible regions were associated with their expressed transcription factors and downstream target genes through a combination of motif discovery, network inference and deep learning, creating enhancer GRNs. The enhancer architectures revealed by DeepFlyBrain lead to a better understanding of neuronal regulatory diversity and can be used to design genetic driver lines for cell types at specific timepoints, facilitating their characterization and manipulation.
Subjects
Drosophila; Gene Expression Regulation; Animals; Brain/metabolism; Drosophila/genetics; Gene Expression Regulation, Developmental; Gene Regulatory Networks/genetics; Transcription Factors/metabolism
ABSTRACT
Joint profiling of chromatin accessibility and gene expression in individual cells provides an opportunity to decipher enhancer-driven gene regulatory networks (GRNs). Here we present a method for the inference of enhancer-driven GRNs, called SCENIC+. SCENIC+ predicts genomic enhancers along with candidate upstream transcription factors (TFs) and links these enhancers to candidate target genes. To improve both recall and precision of TF identification, we curated and clustered a motif collection with more than 30,000 motifs. We benchmarked SCENIC+ on diverse datasets from different species, including human peripheral blood mononuclear cells, ENCODE cell lines, melanoma cell states and Drosophila retinal development. Next, we exploit SCENIC+ predictions to study conserved TFs, enhancers and GRNs between human and mouse cell types in the cerebral cortex. Finally, we use SCENIC+ to study the dynamics of gene regulation along differentiation trajectories and the effect of TF perturbations on cell state. SCENIC+ is available at scenicplus.readthedocs.io.
Subjects
Gene Regulatory Networks; Multiomics; Animals; Humans; Mice; Leukocytes, Mononuclear; Gene Expression Regulation; Chromatin/genetics; Drosophila/genetics; Enhancer Elements, Genetic
ABSTRACT
We present cisTopic, a probabilistic framework used to simultaneously discover coaccessible enhancers and stable cell states from sparse single-cell epigenomics data (http://github.com/aertslab/cistopic). Using a compendium of single-cell ATAC-seq datasets from differentiating hematopoietic cells, brain and transcription factor perturbations, we demonstrate that topic modeling can be exploited for robust identification of cell types, enhancers and relevant transcription factors. cisTopic provides insight into the mechanisms underlying regulatory heterogeneity in cell populations.
Subjects
Epigenomics/methods; Gene Expression Profiling/methods; Models, Theoretical; Single-Cell Analysis/methods; Animals; Blood Cells/metabolism; Brain/metabolism; Cells, Cultured; Cluster Analysis; Gene Regulatory Networks/genetics; Humans; Mice; Regulatory Sequences, Nucleic Acid/genetics; Sequence Analysis, RNA; Workflow
ABSTRACT
Single-cell technologies allow measuring chromatin accessibility and gene expression in each cell, but jointly utilizing both layers to map bona fide gene regulatory networks and enhancers remains challenging. Here, we generate independent single-cell RNA-seq and single-cell ATAC-seq atlases of the Drosophila eye-antennal disc and spatially integrate the data into a virtual latent space that mimics the organization of the 2D tissue using ScoMAP (Single-Cell Omics Mapping into spatial Axes using Pseudotime ordering). To validate spatially predicted enhancers, we use a large collection of enhancer-reporter lines and identify ~85% of enhancers in which chromatin accessibility and enhancer activity are coupled. Next, we infer enhancer-to-gene relationships in the virtual space, finding that genes are mostly regulated by multiple, often redundant, enhancers. Exploiting cell type-specific enhancers, we deconvolute cell type-specific effects of bulk-derived chromatin accessibility QTLs. Finally, we discover that Prospero drives neuronal differentiation through the binding of a GGG motif. In summary, we provide a comprehensive spatial characterization of gene regulation in a 2D tissue.
Subjects
Chromatin/metabolism; Drosophila/genetics; Enhancer Elements, Genetic; Gene Expression Profiling/methods; Gene Expression Regulation/genetics; Single-Cell Analysis/methods; Animals; Animals, Genetically Modified; Arthropod Antennae/metabolism; Cell Differentiation/genetics; Chromatin/genetics; Chromatin Immunoprecipitation Sequencing; Databases, Genetic; Drosophila/metabolism; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Epigenomics; Eye/growth & development; Eye/metabolism; Gene Ontology; Gene Regulatory Networks; Genomics; Immunohistochemistry; Larva/genetics; Larva/growth & development; Larva/metabolism; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Photoreceptor Cells/metabolism; Promoter Regions, Genetic; Quantitative Trait Loci; Spatio-Temporal Analysis; Transcription Factors/genetics; Transcription Factors/metabolism; Transcriptome/genetics
ABSTRACT
We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.
Subjects
Gene Regulatory Networks; Single-Cell Analysis; Algorithms; Animals; Brain/metabolism; Cluster Analysis; Gene Expression Profiling; Humans; Mice
ABSTRACT
Drosophila eye development is a complex process that involves many transcription factors (TFs) and interactions with their cofactors and targets. The TF Sine oculis (So) and its cofactor Eyes absent (Eya) are highly conserved and are both necessary and sufficient for eye development. Despite their many important roles during development, the direct targets of So are still largely unknown. Therefore, the So-dependent regulatory network governing eye determination and differentiation is poorly understood. In this study, we intersected gene expression profiles of so or eya mutant eye tissue prepared from three different developmental stages and identified 1731 differentially expressed genes across the Drosophila genome. A combination of co-expression analyses and motif discovery identified a set of twelve putative direct So targets, including three known and nine novel targets. We also used our previous So ChIP-seq data to assess motif predictions for So and identified a canonical So binding motif. Finally, we performed in vivo enhancer reporter assays to test predicted enhancers from six candidate target genes and found that at least one enhancer from each gene is expressed in the developing eye disc and that their expression patterns overlap with that of So. We furthermore confirmed that the expression levels of predicted direct So targets for which antibodies are available are reduced in so or eya post-mitotic knockout eye discs. In summary, we expand the set of putative So targets and show for the first time that the combined use of expression profiling of so with its cofactor eya is an effective method to identify novel So targets. Moreover, since So is highly conserved throughout the metazoa, our results provide the basis for future functional studies in a wide variety of organisms.
Subjects
Compound Eye, Arthropod/growth & development; Drosophila Proteins/physiology; Drosophila melanogaster/genetics; Eye Proteins/physiology; Gene Expression Regulation, Developmental; Genome-Wide Association Study; Homeodomain Proteins/physiology; Amino Acid Motifs; Amino Acid Sequence; Animals; Chromatin Immunoprecipitation; Compound Eye, Arthropod/ultrastructure; Consensus Sequence; Drosophila melanogaster/growth & development; Gene Ontology; Genetic Association Studies; Imaginal Discs/metabolism; Larva; Pupa; RNA, Messenger/genetics; Transcription, Genetic; Transcriptome
ABSTRACT
BACKGROUND: In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. The stages usually correspond to different subtypes or classes of the disease, and the difficulty of identifying them often comes from patient heterogeneity and sample variability that can hide the biomedically relevant changes that characterize each stage, making standard differential analysis inadequate or inefficient. RESULTS: We propose a methodology to study diseases or disease stages ordered in a sequential manner (e.g. from early stages with good prognosis to more acute or serious stages associated with poor prognosis). The methodology is applied to diseases that have been studied by obtaining genome-wide expression profiles of cohorts of patients at different stages. The approach allows searching for consistent expression patterns along the progression of the disease through two major steps: (i) identifying genes with increasing or decreasing trends in the progression of the disease; (ii) clustering the increasing/decreasing gene expression patterns using an unsupervised approach to reveal whether there are consistent patterns and find genes altered at specific disease stages. The first step is carried out using Gamma rank correlation to identify genes whose expression correlates with a categorical variable that represents the stages of the disease. The second step is done using a Self-Organizing Map (SOM) to cluster the genes according to their progressive profiles and identify specific patterns. Both steps are done after normalization of the genomic data to allow the integration of multiple independent datasets.
In order to validate the results and evaluate their consistency and biological relevance, the methodology is applied to datasets of three different diseases: myelodysplastic syndrome, colorectal cancer and Alzheimer's disease. A software script written in R, named genediseasePatterns, is provided to allow the use and application of the methodology. CONCLUSION: The method presented allows the analysis of the progression of complex and heterogeneous diseases that can be divided into pathological stages. It identifies gene groups whose expression patterns change along the advance of the disease, and it can be applied to different types of genomic data studying cohorts of patients in different states.
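The first step described above, rank correlation between expression and an ordered stage variable, can be illustrated with a minimal sketch. This is not the authors' genediseasePatterns script (which is written in R). It is a plain-Python implementation of the Goodman-Kruskal gamma statistic, and the gene names, expression values and the 0.8 cutoff are invented for illustration. The SOM clustering step is not shown.

```python
from itertools import combinations


def goodman_kruskal_gamma(stages, values):
    """Gamma = (C - D) / (C + D), where C and D count concordant and
    discordant sample pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (s1, v1), (s2, v2) in combinations(zip(stages, values), 2):
        product = (s1 - s2) * (v1 - v2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


# Hypothetical data: six samples, two per disease stage (0 = early, 2 = late).
stages = [0, 0, 1, 1, 2, 2]
expression = {
    "GENE_UP": [1.0, 1.2, 2.1, 2.4, 3.3, 3.0],
    "GENE_DOWN": [5.0, 4.8, 3.1, 3.5, 1.2, 1.0],
    "GENE_FLAT": [2.0, 2.1, 2.0, 2.1, 2.0, 2.1],
}

gammas = {gene: goodman_kruskal_gamma(stages, vals)
          for gene, vals in expression.items()}

# Assumed cutoff: keep genes whose |gamma| is at least 0.8.
trending = {gene: ("increasing" if g > 0 else "decreasing")
            for gene, g in gammas.items() if abs(g) >= 0.8}
```

Because gamma discards tied pairs, it suits a stage variable with only a few ordered levels, where many samples share the same stage.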
Subjects
Gene Expression Profiling/methods; Transcriptome; Algorithms; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Alzheimer Disease/pathology; Cluster Analysis; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; Databases, Genetic; Disease Progression; Humans; Myelodysplastic Syndromes/genetics; Myelodysplastic Syndromes/metabolism; Myelodysplastic Syndromes/pathology; Neoplasm Staging; Sequence Analysis, RNA; Severity of Illness Index
ABSTRACT
Functional Gene Networks (FGNet) is an R/Bioconductor package that generates gene networks derived from the results of functional enrichment analysis (FEA) and annotation clustering. The sets of genes enriched with specific biological terms (obtained from a FEA platform) are transformed into a network by establishing links between genes based on common functional annotations and common clusters. The network provides a new view of FEA results, revealing gene modules with similar functions and genes that are related to multiple functions. In addition to building the functional network, FGNet analyses the similarity between the groups of genes and provides a distance heatmap and a bipartite network of functionally overlapping genes. The application includes an interface to directly perform FEA queries using different external tools (DAVID, GeneTerm Linker, TopGO or GAGE) and a graphical interface to facilitate its use.
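The idea of turning enrichment results into a gene network can be sketched in a few lines. The following is an illustrative Python toy, not FGNet's R API: it assumes a hypothetical mapping from enriched terms to gene sets, links two genes whenever they share a term, and flags genes that appear under more than one term. The term and gene names are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_functional_network(term_to_genes):
    """Return {(gene_a, gene_b): set_of_shared_terms} for every gene pair
    that is annotated together under at least one enriched term."""
    edges = defaultdict(set)
    for term, genes in term_to_genes.items():
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)].add(term)
    return dict(edges)


def multifunctional_genes(term_to_genes):
    """Return the genes that belong to more than one enriched term."""
    membership = defaultdict(set)
    for term, genes in term_to_genes.items():
        for gene in genes:
            membership[gene].add(term)
    return {gene for gene, terms in membership.items() if len(terms) > 1}


# Hypothetical enrichment output: two terms sharing one gene.
enriched = {
    "term_A": ["G1", "G2", "G3"],
    "term_B": ["G3", "G4"],
}

network = build_functional_network(enriched)
bridges = multifunctional_genes(enriched)
```

In this toy input, G3 is the only gene connecting the two modules, which corresponds to the "genes that are related to multiple functions" mentioned in the abstract.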
Subjects
Gene Regulatory Networks; Software; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Cluster Analysis; Entorhinal Cortex/metabolism; Gene Expression Profiling/methods; Humans; Neurons/metabolism
ABSTRACT
BACKGROUND: Despite the large increase in transcriptomic studies that look for gene signatures of diseases, there is still a need for integrative approaches that separate multiple pathological states, providing robust selection of gene markers for each disease subtype and information about the possible links or relations between those genes. RESULTS: We present a network-oriented and data-driven bioinformatic approach that searches for associations between genes and diseases based on the analysis of genome-wide expression data derived from microarrays or RNA-Seq studies. The approach aims to (i) identify gene sets associated with different pathological states analysed together; (ii) identify a minimum subset within these genes that unequivocally differentiates and classifies the compared disease subtypes; (iii) provide a measurement of the discriminant power of these genes and (iv) identify links between the genes that characterise each of the disease subtypes. This bioinformatic approach is implemented in an R package, named geNetClassifier, available as an open access tool in Bioconductor. To illustrate the performance of the tool, we applied it to two independent datasets: 250 samples from patients with four major leukemia subtypes analysed using expression arrays; another leukemia dataset analysed with RNA-Seq that includes a subtype also present in the previous set. The results show the selection of key deregulated genes recently reported in the literature and assigned to the leukemia subtypes studied. We also show, using these independent datasets, the selection of similar genes in a network built for the same disease subtype. CONCLUSIONS: The construction of gene networks related to specific disease subtypes that include parameters such as gene-to-gene association, gene disease specificity and gene discriminant power can be very useful to draw gene-disease maps and to unravel the molecular features that characterize specific pathological states.
The application of the bioinformatic tool presented here shows a neat way to achieve such molecular characterization of the diseases using genome-wide expression data.
Subjects
Biomarkers, Tumor/genetics; Computational Biology/methods; Gene Expression Profiling/methods; Genetic Markers/genetics; Leukemia/genetics; Base Sequence; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Genetic Predisposition to Disease; Humans; Leukemia/classification; Oligonucleotide Array Sequence Analysis; Sequence Analysis, RNA
ABSTRACT
BACKGROUND: Analyses of DNA copy number alterations and gene expression changes in human samples have been used to find potential target genes in complex diseases. Recent studies have combined these two types of data using different strategies, but have focused on finding gene-based relationships. However, it has been proposed that these data can be used to identify key genomic regions, which may enclose causal genes under the assumption that disease-associated gene expression changes are caused by genomic alterations.
Subjects
Algorithms; Gene Dosage/genetics; Genome, Human/genetics; Genomics/methods; Glioblastoma/genetics; Models, Genetic; Transcriptome; Humans
ABSTRACT
This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix depicting the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a regression per-target approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
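The third stage, scoring regulon activity per cell from a gene ranking, can be illustrated with a simplified recovery-curve sketch. This is not pySCENIC's AUCell implementation. It is a toy in the same spirit: it ranks one cell's genes by expression, walks the top of the ranking, and measures how early the regulon's target genes are recovered. The function name, the `top_n` cutoff and the example values are assumptions made for illustration.

```python
def recovery_auc(expression, gene_set, top_n):
    """Area under the recovery curve of `gene_set` within the `top_n`
    highest-expressed genes of one cell, scaled to the range [0, 1]."""
    # Rank genes from highest to lowest expression (ties broken by name).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))[:top_n]
    targets = set(gene_set) & set(expression)
    if not targets:
        return 0.0
    hits = 0
    area = 0
    for gene in ranked:
        if gene in targets:
            hits += 1
        area += hits  # curve height at this rank
    # Best case: all recoverable targets occupy the first ranks.
    k = min(len(targets), top_n)
    max_area = k * (k + 1) // 2 + k * (top_n - k)
    return area / max_area


# Hypothetical single cell with six genes.
cell = {"A": 9, "B": 7, "C": 5, "D": 3, "E": 1, "F": 0}

active = recovery_auc(cell, {"A", "B"}, top_n=4)    # targets at the top
inactive = recovery_auc(cell, {"E", "F"}, top_n=4)  # targets outside top 4
partial = recovery_auc(cell, {"B", "D"}, top_n=4)   # targets spread out
```

Using ranks rather than raw counts makes the score independent of each cell's total expression, which is why a ranking-based score can be compared across cells without prior normalization.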
Subjects
Gene Regulatory Networks; Single-Cell Analysis/methods; Workflow; Animals; Cell Line, Tumor; Humans; Mice
ABSTRACT
Single-cell techniques are advancing rapidly and are yielding unprecedented insight into cellular heterogeneity. Mapping the gene regulatory networks (GRNs) underlying cell states provides attractive opportunities to mechanistically understand this heterogeneity. In this review, we discuss recently emerging methods to map GRNs from single-cell transcriptomics data, tackling the challenge of increased noise levels and data sparsity compared with bulk data, alongside increasing data volumes. Next, we discuss how new techniques for single-cell epigenomics, such as single-cell ATAC-seq and single-cell DNA methylation profiling, can be used to decipher gene regulatory programmes. We finally look forward to the application of single-cell multi-omics and perturbation techniques that will likely play important roles for GRN inference in the future.
Subjects
Gene Expression Profiling/methods; Gene Regulatory Networks; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Epigenomics/methods
ABSTRACT
Transcriptional enhancers function as docking platforms for combinations of transcription factors (TFs) to control gene expression. How enhancer sequences determine nucleosome occupancy, TF recruitment and transcriptional activation in vivo remains unclear. Using ATAC-seq across a panel of Drosophila inbred strains, we found that SNPs affecting binding sites of the TF Grainy head (Grh) causally determine the accessibility of epithelial enhancers. We show that deletion and ectopic expression of Grh cause loss and gain of DNA accessibility, respectively. However, although Grh binding is necessary for enhancer accessibility, it is insufficient to activate enhancers. Finally, we show that human Grh homologs (GRHL1, GRHL2 and GRHL3) function similarly. We conclude that Grh binding is necessary and sufficient for the opening of epithelial enhancers but not for their activation. Our data support a model positing that complex spatiotemporal expression patterns are controlled by regulatory hierarchies in which pioneer factors, such as Grh, establish tissue-specific accessible chromatin landscapes upon which other factors can act.
Subjects
DNA-Binding Proteins/genetics; Drosophila Proteins/genetics; Nucleosomes/genetics; Transcription Factors/genetics; Animals; Animals, Genetically Modified; Binding Sites; Cell Line, Tumor; Chromatin/genetics; Drosophila melanogaster/genetics; Enhancer Elements, Genetic; Epithelial Cells; Gene Expression Regulation, Developmental; Humans; MCF-7 Cells; Polymorphism, Single Nucleotide; Transcriptional Activation
ABSTRACT
Cancer cells are embedded in the tumor microenvironment (TME), a complex ecosystem of stromal cells. Here, we present a 52,698-cell catalog of the TME transcriptome in human lung tumors at single-cell resolution, validated in independent samples where 40,250 additional cells were sequenced. By comparing with matching non-malignant lung samples, we reveal a highly complex TME that profoundly molds stromal cells. We identify 52 stromal cell subtypes, including novel subpopulations in cell types hitherto considered to be homogeneous, as well as transcription factors underlying their heterogeneity. For instance, we discover fibroblasts expressing different collagen sets, endothelial cells downregulating immune cell homing and genes coregulated with established immune checkpoint transcripts and correlating with T-cell activity. By assessing marker genes for these cell subtypes in bulk RNA-sequencing data from 1,572 patients, we illustrate how these correlate with survival, while immunohistochemistry for selected markers validates them as separate cellular entities in an independent series of lung tumors. Hence, in providing a comprehensive catalog of stromal cell types and by characterizing their phenotype and co-optive behavior, this resource provides deeper insights into lung cancer biology that will be helpful in advancing lung cancer diagnosis and therapy.
Subjects
Lung Neoplasms/pathology; Tumor Microenvironment; B-Lymphocytes/pathology; Biomarkers, Tumor/metabolism; Down-Regulation; Endothelial Cells/pathology; Fibroblasts/pathology; Humans; Lung/pathology; Myeloid Cells/pathology; Neoplasms/immunology; Neoplasms/pathology; Phenotype; Sequence Analysis, RNA; Single-Cell Analysis; Stromal Cells/pathology; Survival Analysis; T-Lymphocytes/pathology
ABSTRACT
Identification and functional validation of oncogenic drivers are essential steps toward advancing cancer precision medicine. Here, we have presented a comprehensive analysis of the somatic genomic landscape of the widely used BRAFV600E- and NRASQ61K-driven mouse models of melanoma. By integrating the data with publicly available genomic, epigenomic, and transcriptomic information from human clinical samples, we confirmed the importance of several genes and pathways previously implicated in human melanoma, including the tumor-suppressor genes phosphatase and tensin homolog (PTEN), cyclin dependent kinase inhibitor 2A (CDKN2A), LKB1, and others. Importantly, this approach also identified additional putative melanoma drivers with prognostic and therapeutic relevance. Surprisingly, one of these genes encodes the tyrosine kinase FES. Whereas FES is highly expressed in normal human melanocytes, FES expression is strongly decreased in over 30% of human melanomas. This downregulation correlates with poor overall survival. Correspondingly, engineered deletion of Fes accelerated tumor progression in a BRAFV600E-driven mouse model of melanoma. Together, these data implicate FES as a driver of melanoma progression and demonstrate the potential of cross-species oncogenomic approaches combined with mouse modeling to uncover impactful mutations and oncogenic driver alleles with clinical importance in the treatment of human cancer.
Subjects
Melanoma/genetics; Proto-Oncogene Proteins c-fes/genetics; Skin Neoplasms/genetics; Animals; Cell Line, Tumor; Cell Proliferation; DNA Copy Number Variations; Genes, Tumor Suppressor; Genomics; Humans; Melanoma/metabolism; Mice, Inbred C57BL; Mice, Nude; Mice, Transgenic; Neoplasm Transplantation; Oncogenes; Proto-Oncogene Proteins c-fes/metabolism; Skin Neoplasms/metabolism; Wnt Signaling Pathway
ABSTRACT
Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.