ABSTRACT
Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.
Subject(s)
Neoplasms/pathology; Algorithms; B7-H1 Antigen/genetics; Computational Biology; Databases, Genetic; Entropy; Humans; Microsatellite Instability; Mutation; Neoplasms/genetics; Neoplasms/immunology; Principal Component Analysis; Programmed Cell Death 1 Receptor/genetics
ABSTRACT
Mutational processes constantly shape the somatic genome, leading to immunity, aging, cancer, and other diseases. When cancer is the outcome, we are afforded a glimpse into these processes by the clonal expansion of the malignant cell. Here, we characterize a less explored layer of the mutational landscape of cancer: mutational asymmetries between the two DNA strands. Analyzing whole-genome sequences of 590 tumors from 14 different cancer types, we reveal widespread asymmetries across mutagenic processes, with transcriptional ("T-class") asymmetry dominating UV-, smoking-, and liver-cancer-associated mutations and replicative ("R-class") asymmetry dominating POLE-, APOBEC-, and MSI-associated mutations. We report a striking phenomenon of transcription-coupled damage (TCD) on the non-transcribed DNA strand and provide evidence that APOBEC mutagenesis occurs on the lagging-strand template during DNA replication. As more genomes are sequenced, studying and classifying their asymmetries will illuminate the underlying biological mechanisms of DNA damage and repair.
Subject(s)
DNA Damage; DNA Mutational Analysis; DNA Repair; Neoplasms/genetics; DNA Replication; Genome, Human; Genome-Wide Association Study; Humans; Mutation; Neoplasms/pathology; Transcription, Genetic
ABSTRACT
The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
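To make "combining significance levels from multiple methods" concrete, the sketch below uses Fisher's method, a classical way to merge p-values. This is an illustrative stand-in and not necessarily the strategy used in the study: the abstract does not name its combination rule, and Fisher's method assumes the p-values are independent, which is unlikely to hold exactly when several driver-discovery methods are run on the same mutations. The function names and the example p-values are hypothetical.

```python
import math


def chi2_sf_even_dof(x, k):
    """Upper tail of a chi-square variable with 2*k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumes independence)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Under the null, -2 * sum(log p_i) follows chi-square with 2k dof.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_dof(statistic, len(p_values))


# Hypothetical p-values reported by three methods for one genomic element.
combined = fisher_combine([0.01, 0.02, 0.03])
```

When the inputs are correlated, as is likely here, a combination rule that corrects for dependence would be the safer choice; plain Fisher combination would then tend to overstate significance.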
Subject(s)
Genome, Human/genetics; Mutation/genetics; Neoplasms/genetics; DNA Breaks; Databases, Genetic; Gene Expression Regulation, Neoplastic; Genome-Wide Association Study; Humans; INDEL Mutation
ABSTRACT
Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR-Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines.
Subject(s)
Cell Line, Tumor; Neoplasms/genetics; Neoplasms/pathology; Antineoplastic Agents/pharmacology; Biomarkers, Tumor; DNA Methylation; Drug Resistance, Neoplasm; Ethnicity/genetics; Gene Editing; Histones/metabolism; Humans; MicroRNAs/genetics; Molecular Targeted Therapy; Neoplasms/metabolism; Protein Array Analysis; RNA Splicing
ABSTRACT
Which genetic alterations drive tumorigenesis and how they evolve over the course of disease and therapy are central questions in cancer biology. Here we identify 44 recurrently mutated genes and 11 recurrent somatic copy number variations through whole-exome sequencing of 538 chronic lymphocytic leukaemia (CLL) and matched germline DNA samples, 278 of which were collected in a prospective clinical trial. These include previously unrecognized putative cancer drivers (RPS15, IKZF3), and collectively identify RNA processing and export, MYC activity, and MAPK signalling as central pathways involved in CLL. Clonality analysis of this large data set further enabled reconstruction of temporal relationships between driver events. Direct comparison between matched pre-treatment and relapse samples from 59 patients demonstrated highly frequent clonal evolution. Thus, large sequencing data sets of clinically informative samples enable the discovery of novel genes associated with cancer, the network of relationships between the driver events, and their impact on disease relapse and clinical outcome.
Subject(s)
Disease Progression; Evolution, Molecular; Leukemia, Lymphocytic, Chronic, B-Cell/genetics; Mutation/genetics; Neoplasm Recurrence, Local/genetics; Cell Transformation, Neoplastic/genetics; Clone Cells/metabolism; Clone Cells/pathology; DNA Copy Number Variations/genetics; Exome/genetics; Genes, myc/genetics; Humans; Ikaros Transcription Factor/genetics; Leukemia, Lymphocytic, Chronic, B-Cell/diagnosis; Leukemia, Lymphocytic, Chronic, B-Cell/pathology; Leukemia, Lymphocytic, Chronic, B-Cell/therapy; MAP Kinase Signaling System/genetics; Prognosis; RNA Processing, Post-Transcriptional/genetics; RNA Transport/genetics; Ribosomal Proteins/genetics; Treatment Outcome
ABSTRACT
Expression of mRNA is often regulated by the binding of a small RNA (miRNA, snoRNA, siRNA). While the pairing contribution to the net free energy is well parameterized and can be computed in O(N) time, the cost of removing pre-existing mRNA secondary structure has not received sufficient attention. Conventional methods for computing the unfolding free energy of a target mRNA are costly, scaling as the cube of the number of target bases, O(N^3). Here we introduce a model to describe the unfolding costs of the binding site, which features surprisingly large differences in the free energy parameters for the four bases. The model is implemented in our O(N) algorithm, BindOligoNet. Donor splice site prediction is more accurate when using our calculation of spliceosomal U1-snRNA to mRNA net binding free energy. Our base-dependent free energies also correlate with efficient ribosome docking near the start codon.
Subject(s)
Peptide Chain Initiation, Translational; RNA Splicing; RNA, Messenger; Algorithms; Binding Sites; Nucleic Acid Conformation; Nucleotides; RNA, Messenger/biosynthesis; RNA, Messenger/chemistry; RNA, Small Nuclear/chemistry; Spliceosomes/chemistry; Thermodynamics
ABSTRACT
Current statistical models for assessing hotspot significance do not properly account for variation in site-specific mutability, thereby yielding many false-positives. We thus (i) detail a Log-normal-Poisson (LNP) background model that accounts for this variability in a manner consistent with models of mutagenesis; (ii) use it to show that passenger hotspots arise from all common mutational processes; and (iii) apply it to a ~10,000-patient cohort to nominate driver hotspots with far fewer false-positives compared with conventional methods. Overall, we show that many cancer hotspot mutations recurring at the same genomic site across multiple tumors are actually passenger events, recurring at inherently mutable genomic sites under no positive selection.
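The idea behind a log-normal-Poisson background can be sketched numerically: the mutation count at a site is Poisson, but the site's rate is itself log-normally distributed, which fattens the upper tail and so makes recurrent passenger mutations less surprising. The code below is a generic illustration, not the published model: the parameterization (a mean rate and a log-scale spread `sigma`), the function names, and all numeric values are assumptions, and the actual LNP model may include covariates and fitting steps that the abstract does not describe.

```python
import math


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    total, i = 0.0, k
    while True:
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Past the mode the terms shrink monotonically; stop once negligible.
        if i > lam and term <= total * 1e-17:
            break
        i += 1
    return min(total, 1.0)


def lnp_sf(k, mean_rate, sigma, grid=801, z_max=8.0):
    """P(X >= k) when X ~ Poisson(rate) and log(rate) is normally distributed.

    rate = mean_rate * exp(sigma * z - sigma**2 / 2) with z ~ N(0, 1),
    so the expected rate equals mean_rate for every sigma.
    The integral over z is approximated with the trapezoid rule.
    """
    step = 2.0 * z_max / (grid - 1)
    total = 0.0
    for j in range(grid):
        z = -z_max + j * step
        weight = 0.5 if j in (0, grid - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        rate = mean_rate * math.exp(sigma * z - 0.5 * sigma * sigma)
        total += weight * density * poisson_sf(k, rate)
    return total * step


# Hypothetical site: 0.1 expected mutations across the cohort, 5 observed.
p_poisson = poisson_sf(5, 0.1)   # uniform-mutability background
p_lnp = lnp_sf(5, 0.1, 1.0)      # background with site-to-site variability
```

With the same mean rate, the LNP tail probability is larger than the plain Poisson one, which is the mechanism by which accounting for site-specific mutability reduces false-positive hotspots.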
Subject(s)
Carcinogenesis/genetics; Genomics/methods; Models, Genetic; Mutagenesis; Neoplasms/genetics; DNA Mutational Analysis; Datasets as Topic; Genes, Tumor Suppressor; Humans; Poisson Distribution; ROC Curve; Selection, Genetic; Exome Sequencing
ABSTRACT
How somatic mutations accumulate in normal cells is poorly understood. A comprehensive analysis of RNA sequencing data from ~6700 samples across 29 normal tissues revealed multiple somatic variants, demonstrating that macroscopic clones can be found in many normal tissues. We found that sun-exposed skin, esophagus, and lung have a higher mutation burden than other tested tissues, which suggests that environmental factors can promote somatic mosaicism. Mutation burden was associated with both age and tissue-specific cell proliferation rate, highlighting that mutations accumulate over both time and number of cell divisions. Finally, normal tissues were found to harbor mutations in known cancer genes and hotspots. This study provides a broad view of macroscopic clonal expansion in human tissues, thus serving as a foundation for associating clonal expansion with environmental factors, aging, and risk of disease.
Subject(s)
DNA Mutational Analysis/methods; Neoplasms/genetics; Sequence Analysis, RNA/methods; Clone Cells; Female; Humans; Male; Organ Specificity/genetics
ABSTRACT
Hürthle cell carcinoma of the thyroid (HCC) is a form of thyroid cancer recalcitrant to radioiodine therapy that exhibits an accumulation of mitochondria. We performed whole-exome sequencing on a cohort of primary, recurrent, and metastatic tumors, and identified recurrent mutations in DAXX, TP53, NRAS, NF1, CDKN1A, ARHGAP35, and the TERT promoter. Parallel analysis of mtDNA revealed recurrent homoplasmic mutations in subunits of complex I of the electron transport chain. Analysis of DNA copy-number alterations uncovered widespread loss of chromosomes culminating in near-haploid chromosomal content in a large fraction of HCC, which was maintained during metastatic spread. This work uncovers a distinct molecular origin of HCC compared with other thyroid malignancies.
Subject(s)
Chromosome Aberrations; DNA, Mitochondrial/genetics; Mutation; Thyroid Neoplasms/genetics; DNA Copy Number Variations; Haploidy; Humans; Neoplasm Metastasis; Telomerase/genetics; Thyroid Neoplasms/pathology; Exome Sequencing
ABSTRACT
Diffuse large B cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is a clinically and genetically heterogeneous disease that is further classified into transcriptionally defined activated B cell (ABC) and germinal center B cell (GCB) subtypes. We carried out a comprehensive genetic analysis of 304 primary DLBCLs and identified low-frequency alterations, captured recurrent mutations, somatic copy number alterations, and structural variants, and defined coordinate signatures in patients with available outcome data. We integrated these genetic drivers using consensus clustering and identified five robust DLBCL subsets, including a previously unrecognized group of low-risk ABC-DLBCLs of extrafollicular/marginal zone origin; two distinct subsets of GCB-DLBCLs with different outcomes and targetable alterations; and an ABC/GCB-independent group with biallelic inactivation of TP53, CDKN2A loss, and associated genomic instability. The genetic features of the newly characterized subsets, their mutational signatures, and the temporal ordering of identified alterations provide new insights into DLBCL pathogenesis. The coordinate genetic signatures also predict outcome independent of the clinical International Prognostic Index and suggest new combination treatment strategies. More broadly, our results provide a roadmap for an actionable DLBCL classification.
Subject(s)
Lymphoma, Large B-Cell, Diffuse/genetics; Lymphoma, Large B-Cell, Diffuse/pathology; DNA Copy Number Variations/genetics; Gene Rearrangement/genetics; Genes, Neoplasm; Genetic Heterogeneity; Humans; Mutation/genetics; Mutation Rate; Treatment Outcome
ABSTRACT
In the version of this article originally published, an asterisk was omitted from Fig. 1a. The asterisk has been added to the figure. Additionally, a "NOTCH2" label was erroneously included in Fig. 4a. The label has been removed. The errors have been corrected in the PDF and HTML versions of this article.
ABSTRACT
In the version of this article originally published, some text above the "Tri-nucleotide sequence motifs" label in Fig. 2a appeared incorrectly. The text was garbled and should have appeared as nucleotide codes. Additionally, the labels on the bars in Fig. 2c were not italicized in the original publication. These are gene symbols, and they should have been italicized. The colored labels above the graphs in Fig. 4b were also erroneously not italicized. These labels represent gene names and loci, and they should have been italicized.
ABSTRACT
There is a striking and unexplained male predominance across many cancer types. A subset of X-chromosome genes can escape X-inactivation, which would protect females from complete functional loss by a single mutation. To identify putative 'escape from X-inactivation tumor-suppressor' (EXITS) genes, we examined somatic alterations from >4,100 cancers across 21 tumor types for sex bias. Six of 783 non-pseudoautosomal region (PAR) X-chromosome genes (ATRX, CNKSR2, DDX3X, KDM5C, KDM6A, and MAGEC3) harbored loss-of-function mutations more frequently in males (based on a false discovery rate < 0.1), in comparison to zero of 18,055 autosomal and PAR genes (Fisher's exact P < 0.0001). Male-biased mutations in genes that escape X-inactivation were observed in combined analysis across many cancers and in several individual tumor types, suggesting a generalized phenomenon. We conclude that biallelic expression of EXITS genes in females explains a portion of the reduced cancer incidence in females as compared to males across a variety of tumor types.
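The enrichment reported above (6 of 783 non-PAR X-chromosome genes versus 0 of 18,055 autosomal and PAR genes) can be checked with a Fisher's exact test on the corresponding 2x2 table. The sketch below computes the one-sided tail exactly using only the standard library. The choice of a one-sided test and the function name are assumptions; the abstract reports only that P < 0.0001.

```python
from fractions import Fraction
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric: the number of
    'hits' falling in the first row when the row and column totals are fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    tail = sum(
        Fraction(comb(row1, k) * comb(total - row1, col1 - k), denom)
        for k in range(a, upper + 1)
    )
    return float(tail)


# Rows: non-PAR X genes, autosomal + PAR genes.
# Columns: male-biased loss-of-function genes, all other genes.
p_value = fisher_exact_greater(6, 783 - 6, 0, 18055)
```

Because all six significant genes fall in the X-chromosome row, the observed table is the most extreme one possible for these margins, so the one-sided and two-sided p-values coincide here.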
Subject(s)
Chromosomes, Human, X/genetics; Genes, Tumor Suppressor; Genes, X-Linked/genetics; Mutation/genetics; Neoplasms/genetics; Sexism/statistics & numerical data; X Chromosome Inactivation/genetics; Female; Humans; Male
ABSTRACT
Microsatellites (MSs) are tracts of variable-length repeats of short DNA motifs that exhibit high rates of mutation in the form of insertions or deletions (indels) of the repeated motif. Despite their prevalence, the contribution of somatic MS indels to cancer has been largely unexplored, owing to difficulties in detecting them in short-read sequencing data. Here we present two tools: MSMuTect, for accurate detection of somatic MS indels, and MSMutSig, for identification of genes containing MS indels at a higher frequency than expected by chance. Applying MSMuTect to whole-exome data from 6,747 human tumors representing 20 tumor types, we identified >1,000 previously undescribed MS indels in cancer genes. Additionally, we demonstrate that the number and pattern of MS indels can accurately distinguish microsatellite-stable tumors from tumors with microsatellite instability, thus potentially improving classification of clinically relevant subgroups. Finally, we identified seven MS indel driver hotspots: four in known cancer genes (ACVR2A, RNF43, JAK1, and MSH3) and three in genes not previously implicated as cancer drivers (ESRP1, PRDM2, and DOCK3).
Subject(s)
INDEL Mutation/genetics; Microsatellite Repeats/genetics; Neoplasms/genetics; Exome/genetics; Genes, Neoplasm; High-Throughput Nucleotide Sequencing; Humans; Microsatellite Instability; Mutation/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism
ABSTRACT
Comprehensive multiplatform analysis of 80 uveal melanomas (UM) identifies four molecularly distinct, clinically relevant subtypes: two associated with poor-prognosis monosomy 3 (M3) and two with better-prognosis disomy 3 (D3). We show that BAP1 loss follows M3 occurrence and correlates with a global DNA methylation state that is distinct from D3-UM. Poor-prognosis M3-UM divide into subsets with divergent genomic aberrations, transcriptional features, and clinical outcomes. We report change-of-function SRSF2 mutations. Within D3-UM, EIF1AX- and SRSF2/SF3B1-mutant tumors have distinct somatic copy number alterations and DNA methylation profiles, providing insight into the biology of these low- versus intermediate-risk clinical mutation subtypes.
Subject(s)
Biomarkers, Tumor/genetics; DNA Methylation; Gene Expression Regulation, Neoplastic; Melanoma/genetics; Mutation; Uveal Neoplasms/genetics; DNA Copy Number Variations; Eukaryotic Initiation Factor-1/genetics; Humans; Melanoma/classification; Monosomy; Phosphoproteins/genetics; Prognosis; RNA Splicing Factors/genetics; Serine-Arginine Splicing Factors/genetics; Tumor Suppressor Proteins/genetics; Ubiquitin Thiolesterase/genetics; Uveal Neoplasms/classification
ABSTRACT
Cholangiocarcinoma (CCA) is an aggressive malignancy of the bile ducts, with poor prognosis and limited treatment options. Here, we describe the integrated analysis of somatic mutations, RNA expression, copy number, and DNA methylation by The Cancer Genome Atlas of a set of predominantly intrahepatic CCA cases and propose a molecular classification scheme. We identified an IDH mutant-enriched subtype with distinct molecular features including low expression of chromatin modifiers, elevated expression of mitochondrial genes, and increased mitochondrial DNA copy number. Leveraging the multi-platform data, we observed that ARID1A exhibited DNA hypermethylation and decreased expression in the IDH mutant subtype. More broadly, we found that IDH mutations are associated with an expanded histological spectrum of liver tumors with molecular features that stratify with CCA. Our studies reveal insights into the molecular pathogenesis and heterogeneity of cholangiocarcinoma and provide classification information of potential therapeutic significance.