ABSTRACT
The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well characterized, and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium aggregated whole-genome sequencing data from 2658 cancers across 38 tumor types. Here, as part of that consortium and motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the PCAWG project. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes have variable proportions of coding and non-coding mutations: chromatin remodeling and proliferation pathways are altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.
Subject(s)
Gene Expression Regulation, Neoplastic ; Mutation ; Neoplasms/genetics ; RNA Splicing ; Chromatin Assembly and Disassembly ; Computational Biology/methods ; Databases, Genetic ; Genome, Human ; Humans ; Metabolic Networks and Pathways/genetics ; Neoplasms/metabolism ; Promoter Regions, Genetic
ABSTRACT
Because of its clonal evolution, a tumor rarely contains multiple genomic alterations in the same pathway, as disrupting the pathway through one gene is often sufficient to confer the complete fitness advantage. As a result, many cancer driver genes display mutual exclusivity across tumors. However, searching for mutually exclusive gene sets requires analyzing all possible combinations of genes, leading to a problem that is typically too computationally complex to be solved without stringent a priori filtering that restricts the mutations included in the analysis. To overcome this problem, we present SSA-ME, a network-based method to detect cancer driver genes based on independently scoring small subnetworks for mutual exclusivity using a reinforced learning approach. Because of the method's algorithmic efficiency, no stringent upfront filtering is required. Analysis of TCGA cancer datasets illustrates the added value of SSA-ME: both well-known recurrently mutated drivers and rarely mutated drivers are prioritized. We show that using mutual exclusivity to detect cancer driver genes is complementary to state-of-the-art approaches. This framework, in which a large number of small subnetworks are analyzed in order to solve a computationally complex problem (SSA), can be generically applied to any problem in which local neighborhoods in a network hold useful information.
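The idea of scoring small subnetworks for mutual exclusivity can be illustrated with a minimal sketch. It is not the SSA-ME scoring function or its learning scheme, which the abstract does not specify. Instead it assumes a simple, commonly used "coverage minus overlap" score and exhaustively enumerates small connected gene sets. The function names, the score, and the input layout (a gene-to-samples dictionary and an edge list) are all assumptions made for illustration.

```python
def exclusivity_score(mutations, genes):
    """Coverage-minus-overlap score for a gene set (illustrative assumption).

    mutations: dict mapping gene -> set of sample IDs in which it is mutated.
    Returns 2 * |samples covered| - (sum of per-gene mutation counts), which
    equals the coverage when no sample is mutated in more than one gene.
    """
    covered = set().union(*(mutations.get(g, set()) for g in genes))
    total = sum(len(mutations.get(g, set())) for g in genes)
    return 2 * len(covered) - total


def best_small_subnetworks(mutations, edges, size=3, top=5):
    """Score every connected gene set of `size` genes and return the best ones."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Grow connected gene sets one gene at a time along network edges.
    frontier = {frozenset([g]) for g in neighbors}
    for _ in range(size - 1):
        grown = set()
        for sub in frontier:
            for g in sub:
                for n in neighbors[g]:
                    if n not in sub:
                        grown.add(sub | {n})
        frontier = grown

    scored = [(exclusivity_score(mutations, sub), sorted(sub)) for sub in frontier]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]
```

Exhaustive enumeration is only feasible for very small subnetworks, which is the point of restricting the search to local neighborhoods; a practical method would sample or prioritize subnetworks rather than list them all.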
Subject(s)
Computational Biology/methods ; Genetic Predisposition to Disease/genetics ; Mutation ; Neoplasms/genetics ; Proto-Oncogenes/genetics ; Algorithms ; Clonal Evolution/genetics ; Databases, Genetic ; Gene Expression Regulation, Neoplastic ; Gene Regulatory Networks ; Humans ; Signal Transduction/genetics
ABSTRACT
Identification of genomic regions associated with a phenotype of interest is a fundamental step toward solving questions in biology and improving industrial research. Bulk segregant analysis (BSA) combined with high-throughput sequencing is a technique to efficiently identify these genomic regions. However, distinguishing true from spuriously linked genomic regions and accurately delineating the genomic positions of the truly linked regions requires complex statistical models, which are currently implemented in software tools that are generally difficult for non-expert users to operate. To facilitate the exploration and analysis of BSA data, we present EXPLoRA-web, a web service wrapped around our previously published algorithm EXPLoRA, which exploits linkage disequilibrium to increase the power and accuracy of quantitative trait loci identification in BSA. EXPLoRA-web provides a user-friendly interface that enables easy data upload and parallel processing of different parameter configurations. Results are provided graphically and as a BED file and/or a text file, and input is expected in widely used formats, enabling straightforward BSA data analysis. The web server is available at http://bioinformatics.intec.ugent.be/explora-web/.
Subject(s)
Algorithms ; Linkage Disequilibrium ; Quantitative Trait Loci ; Quantitative Trait, Heritable ; Software ; Alleles ; Animals ; Bacteria/genetics ; Bacteria/metabolism ; Computer Graphics ; Gene Frequency ; High-Throughput Nucleotide Sequencing ; Information Storage and Retrieval ; Internet ; Phenotype
ABSTRACT
Clonal populations accumulate mutations over time, resulting in different haplotypes. Deep sequencing of such a population in principle provides the information needed to reconstruct these haplotypes and the frequencies at which they occur. However, this reconstruction is technically non-trivial, especially in clonal systems with a relatively low mutation frequency. The low number of segregating sites in those systems adds ambiguity to the haplotype phasing and thus hampers the reconstruction of genome-wide haplotypes based on sequence overlap information. Therefore, we present EVORhA, a haplotype reconstruction method that complements phasing information in the non-empty read overlap with the frequency estimations of inferred local haplotypes. As shown with simulated data, as soon as read lengths and/or mutation rates become restrictive for state-of-the-art methods, the use of this additional frequency information allows EVORhA to still reliably reconstruct genome-wide haplotypes. On real data, we show the applicability of the method in reconstructing the population composition of evolved bacterial populations and in decomposing mixed bacterial infections from clinical samples.
Subject(s)
Genome, Bacterial ; Haplotypes ; High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, DNA/methods ; Bacterial Infections/microbiology ; Coinfection/microbiology ; Escherichia coli/genetics ; Evolution, Molecular ; Humans ; Polymorphism, Genetic
ABSTRACT
BACKGROUND: Bulk segregant analysis (BSA) coupled to high-throughput sequencing is a powerful method to map genomic regions associated with phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, and their DNA is extracted and sequenced. Genomic regions linked to the trait of interest are identified by searching the pool for overrepresented alleles that normally originate from the superior parent. BSA data analysis is non-trivial due to sequencing, alignment and screening errors. RESULTS: To increase the power of the BSA technology and obtain a better distinction between spuriously and truly linked regions, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM). Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed us to reliably identify QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method. CONCLUSIONS: EXPLoRA performs at least as well as the state of the art and is robust even at low signal-to-noise ratios, i.e. when the true linkage signal is diluted by sampling or screening errors or when few segregants are available.
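To make the idea of modeling dependency between neighboring markers with an HMM more concrete, the following is a minimal sketch and not the EXPLoRA model itself, whose states, emissions and parameters are not given in the abstract. It assumes a two-state chain (unlinked or linked), binomial emissions for the number of reads carrying the superior-parent allele, and Viterbi decoding. The function names and the values of `p_unlinked`, `p_linked` and `stay` are hypothetical choices for illustration.

```python
import math


def log_binom(k, n, p):
    """Log-probability of k superior-parent reads out of n under Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def viterbi_linked(counts, p_unlinked=0.5, p_linked=0.9, stay=0.95):
    """Label each marker 0 (unlinked) or 1 (linked) with a two-state HMM.

    counts: list of (superior_reads, total_reads) per marker, in genome order.
    All parameter values are illustrative assumptions, not fitted estimates.
    """
    probs = (p_unlinked, p_linked)
    log_stay, log_switch = math.log(stay), math.log(1 - stay)

    # Uniform prior over the two states at the first marker.
    score = [math.log(0.5) + log_binom(*counts[0], probs[s]) for s in (0, 1)]
    back = []
    for k, n in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best previous state for arriving in state s at this marker.
            cands = [score[t] + (log_stay if t == s else log_switch) for t in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_binom(k, n, probs[s]))
        score = new_score
        back.append(pointers)

    # Trace the most likely state path back from the last marker.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In this sketch the high `stay` probability is what encodes the dependency between neighbors: a single marker with a weaker allele imbalance inside a strongly linked stretch is still labeled as linked, whereas scoring each marker independently would split the region.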