ABSTRACT
International differences in the incidence of many cancer types indicate that carcinogen exposures not yet identified by conventional epidemiology make a substantial contribution to cancer burden [1]. In clear cell renal cell carcinoma, obesity, hypertension and tobacco smoking are risk factors, but they do not explain the geographical variation in its incidence [2]. Underlying causes can be inferred by sequencing the genomes of cancers from populations with different incidence rates and detecting differences in patterns of somatic mutations. Here we sequenced 962 clear cell renal cell carcinomas from 11 countries with varying incidence. The somatic mutation profiles differed between countries. In Romania, Serbia and Thailand, mutational signatures characteristic of aristolochic acid compounds were present in most cases, but these were rare elsewhere. In Japan, a mutational signature of unknown cause was found in more than 70% of cases but in less than 2% elsewhere. A further mutational signature of unknown cause was ubiquitous but exhibited higher mutation loads in countries with higher incidence rates of kidney cancer. Known signatures of tobacco smoking correlated with tobacco consumption, but no signature was associated with obesity or hypertension, suggesting that non-mutagenic mechanisms of action underlie these risk factors. The results of this study indicate the existence of multiple, geographically variable, mutagenic exposures that potentially affect tens of millions of people and illustrate the opportunities for new insights into cancer causation through large-scale global cancer genomics.
Subject(s)
Renal Cell Carcinoma , Environmental Exposure , Geography , Kidney Neoplasms , Mutagens , Mutation , Female , Humans , Male , Aristolochic Acids/adverse effects , Renal Cell Carcinoma/genetics , Renal Cell Carcinoma/epidemiology , Renal Cell Carcinoma/chemically induced , Environmental Exposure/adverse effects , Environmental Exposure/analysis , Human Genome/genetics , Genomics , Hypertension/epidemiology , Incidence , Japan/epidemiology , Kidney Neoplasms/genetics , Kidney Neoplasms/epidemiology , Kidney Neoplasms/chemically induced , Mutagens/adverse effects , Obesity/epidemiology , Risk Factors , Romania/epidemiology , Serbia/epidemiology , Thailand/epidemiology , Tobacco Smoking/adverse effects , Tobacco Smoking/genetics
ABSTRACT
MOTIVATION: Analysis of mutational signatures is a powerful approach for understanding the mutagenic processes that have shaped the evolution of a cancer genome. To evaluate the mutational signatures operative in a cancer genome, one first needs to quantify their activities by estimating the number of mutations imprinted by each signature. RESULTS: Here we present SigProfilerAssignment, a desktop and an online computational framework for assigning all types of mutational signatures to individual samples. SigProfilerAssignment is the first tool that allows both analysis of copy-number signatures and probabilistic assignment of signatures to individual somatic mutations. As its computational engine, the tool uses a custom implementation of the forward stagewise algorithm for sparse regression and nonnegative least squares for numerical optimization. Analysis of 2700 synthetic cancer genomes with and without noise demonstrates that SigProfilerAssignment outperforms four commonly used approaches for assigning mutational signatures. AVAILABILITY AND IMPLEMENTATION: SigProfilerAssignment is available under the BSD 2-clause license at https://github.com/AlexandrovLab/SigProfilerAssignment with a web implementation at https://cancer.sanger.ac.uk/signatures/assignment/.
Subject(s)
Neoplasms , Humans , Mutation , Neoplasms/genetics , Algorithms , Genome
ABSTRACT
BACKGROUND: All cancers harbor somatic mutations in their genomes. Mutations affecting between one and fifty base pairs are generally classified as small mutational events. Conversely, large mutational events affect more than fifty base pairs and, in most cases, encompass copy-number and structural variants affecting many thousands of base pairs. Prior studies have demonstrated that examining patterns of somatic mutations can provide both biological and clinical insights, which has resulted in an extensive repertoire of tools for evaluating small mutational events. Recently, classification schemas for examining large-scale mutational events have emerged and shown their utility across the spectrum of human cancers. However, there has been no computationally efficient bioinformatics tool that allows visualizing and exploring these large-scale mutational events. RESULTS: Here, we present a new version of SigProfilerMatrixGenerator that now delivers integrated capabilities for examining large mutational events. The tool provides support for examining copy-number variants and structural variants under two previously developed classification schemas, and it supports data from numerous algorithms and data modalities. SigProfilerMatrixGenerator is written in Python, with an R wrapper package provided for users who prefer working in an R environment. CONCLUSIONS: The new version of SigProfilerMatrixGenerator provides the first standardized bioinformatics tool for optimized exploration and visualization of two previously developed classification schemas for copy-number and structural variants. The tool is freely available at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator with extensive documentation at https://osf.io/s93d5/wiki/home/ .
Subject(s)
Algorithms , Computational Biology , Humans , Mutation
ABSTRACT
Lung cancer in never smokers (LCINS) accounts for up to 25% of all lung cancers and has been associated with exposure to secondhand tobacco smoke and air pollution in observational studies. Here, we evaluate the mutagenic exposures in LCINS by examining deep whole-genome sequencing data from a large international cohort of 871 treatment-naïve LCINS recruited from 28 geographical locations within the Sherlock-Lung study. KRAS mutations were 3.8-fold more common in adenocarcinomas of never smokers from North America and Europe, while a 1.6-fold higher prevalence of EGFR and TP53 mutations was observed in adenocarcinomas from East Asia. Signature SBS40a, with unknown cause, was found in most samples and accounted for the largest proportion of single base substitutions in adenocarcinomas, being enriched in EGFR-mutated cases. Conversely, the aristolochic acid signature SBS22a was almost exclusively observed in patients from Taipei. Even though LCINS exposed to secondhand smoke had an 8.3% higher mutational burden and 5.4% shorter telomeres, passive smoking was not associated with mutations in cancer driver genes or with the activities of individual mutational signatures. In contrast, patients from regions with high levels of air pollution were more likely to have TP53 mutations while exhibiting shorter telomeres and an increase in most types of somatic mutations, including a 3.9-fold elevation of signature SBS4 (q-value = 3.1 × 10⁻⁵), previously linked mainly to tobacco smoking, and a 76% increase of clock-like signature SBS5 (q-value = 5.0 × 10⁻⁵). A positive dose-response effect was observed with air pollution levels, which correlated with both a decrease in telomere length and an elevation in somatic mutations, notably attributed to signatures SBS4 and SBS5. Our results elucidate the diversity of mutational processes shaping the genomic landscape of lung cancer in never smokers.
ABSTRACT
APOBEC enzymes are part of innate immunity and are responsible for restricting viruses and retroelements by deaminating cytosine residues [1,2]. Most solid tumors harbor different levels of somatic mutations attributed to the off-target activities of APOBEC3A (A3A) and/or APOBEC3B (A3B) [3-6]. However, how APOBEC3A/B enzymes shape tumor evolution in the presence of exogenous mutagenic processes is largely unknown. Here, by combining deep whole-genome sequencing with multi-omics profiling of 309 lung cancers from smokers with detailed tobacco smoking information, we identify two subtypes defined by low (LAS) and high (HAS) APOBEC mutagenesis. LAS are enriched for A3B-like mutagenesis and KRAS mutations, whereas HAS are enriched for A3A-like mutagenesis and TP53 mutations. Unlike APOBEC3A, APOBEC3B expression is strongly associated with an upregulation of the base excision repair pathway. Hypermutation by unrepaired A3A and tobacco smoking mutagenesis combined with TP53-induced genomic instability can trigger senescence [7], apoptosis [8], and cell regeneration [9], as indicated by high expression of the pulmonary healing signaling pathway, stemness markers and distal cell-of-origin in HAS. The expected associations of tobacco smoking variables (e.g., time to first cigarette) with genomic/epigenomic changes are not observed in HAS, a plausible consequence of frequent cell senescence or apoptosis. HAS have more neoantigens, slower clonal expansion, and older age at onset compared to LAS, particularly in heavy smokers, consistent with high proportions of newly generated, unmutated cells and frequent immuno-editing. These findings show how heterogeneity in mutational burden across co-occurring mutational processes and cell types contributes to tumor development, with important clinical implications.
ABSTRACT
Tobacco smoke, alone or combined with alcohol, is the predominant cause of head and neck cancer (HNC). Here, we further explore how tobacco exposure contributes to cancer development by mutational signature analysis of 265 whole-genome sequenced HNC from eight countries. Six tobacco-associated mutational signatures were detected, including some not previously reported. Differences in HNC incidence between countries corresponded with differences in mutation burdens of tobacco-associated signatures, consistent with the dominant role of tobacco in HNC causation. Differences were found in the burden of tobacco-associated signatures between anatomical subsites, suggesting that tissue-specific factors modulate mutagenesis. We identified an association between tobacco smoking and three additional alcohol-related signatures indicating synergism between the two exposures. Tobacco smoking was associated with differences in the mutational spectra and repertoire of driver mutations in cancer genes, and in patterns of copy number change. Together, the results demonstrate the multiple pathways by which tobacco smoke can influence the evolution of cancer cell clones.
ABSTRACT
Background: All cancers harbor somatic mutations in their genomes. Mutations affecting between one and fifty base pairs are generally classified as small mutational events. Conversely, large mutational events affect more than fifty base pairs and, in most cases, encompass copy-number and structural variants affecting many thousands of base pairs. Prior studies have demonstrated that examining patterns of somatic mutations can provide both biological and clinical insights, which has resulted in an extensive repertoire of tools for evaluating small mutational events. Recently, classification schemas for examining large-scale mutational events have emerged and shown their utility across the spectrum of human cancers. However, there has been no standard bioinformatics tool that allows visualizing and exploring these large-scale mutational events. Results: Here, we present a new version of SigProfilerMatrixGenerator that now delivers integrated capabilities for examining large mutational events. The tool provides support for examining copy-number variants and structural variants under two previously developed classification schemas, and it supports data from numerous algorithms and data modalities. SigProfilerMatrixGenerator is written in Python, with an R wrapper package provided for users who prefer working in an R environment. Conclusions: The new version of SigProfilerMatrixGenerator provides the first standardized bioinformatics tool for optimized exploration and visualization of two previously developed classification schemas for copy-number and structural variants. The tool is freely available at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator with extensive documentation at https://osf.io/s93d5/wiki/home/ .
ABSTRACT
Analysis of mutational signatures is a powerful approach for understanding the mutagenic processes that have shaped the evolution of a cancer genome. Here we present SigProfilerAssignment, a desktop and an online computational framework for assigning all types of mutational signatures to individual samples. SigProfilerAssignment is the first tool that allows both analysis of copy-number signatures and probabilistic assignment of signatures to individual somatic mutations. As its computational engine, the tool uses a custom implementation of the forward stagewise algorithm for sparse regression and nonnegative least squares for numerical optimization. Analysis of 2,700 synthetic cancer genomes with and without noise demonstrates that SigProfilerAssignment outperforms four commonly used approaches for assigning mutational signatures. SigProfilerAssignment is freely available at https://github.com/AlexandrovLab/SigProfilerAssignment with a web implementation at https://cancer.sanger.ac.uk/signatures/assignment/.
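To make the idea of assigning signatures to a sample concrete, the following is a minimal sketch and not SigProfilerAssignment's implementation. It assumes a toy catalogue of three made-up signatures over six mutation channels (real catalogues use, for example, 96 channels), solves the nonnegative least squares step with a simple projected coordinate descent, and replaces the forward stagewise procedure with a plain greedy forward-selection loop. All names (`nnls_coordinate_descent`, `assign_signatures`, `min_gain`, the `Sig*` vectors) are hypothetical.

```python
def nnls_coordinate_descent(signatures, counts, sweeps=500):
    """Approximately minimise ||counts - sum_i x_i * signatures[i]|| with x >= 0.

    signatures: list of signature vectors (one list of weights per signature)
    counts:     observed mutation counts per channel
    """
    k = len(signatures)
    channels = range(len(counts))
    # Precompute S^T m and S^T S, which stay fixed during the sweeps.
    stm = [sum(s[c] * counts[c] for c in channels) for s in signatures]
    sts = [[sum(a[c] * b[c] for c in channels) for b in signatures]
           for a in signatures]
    x = [0.0] * k
    for _ in range(sweeps):
        for i in range(k):
            grad = sum(sts[i][j] * x[j] for j in range(k)) - stm[i]
            # Exact one-dimensional minimiser, clipped at zero.
            x[i] = max(0.0, x[i] - grad / sts[i][i])
    return x


def residual_norm(signatures, x, counts):
    """Euclidean distance between the observed and reconstructed profile."""
    fitted = [sum(x[i] * signatures[i][c] for i in range(len(signatures)))
              for c in range(len(counts))]
    return sum((f - m) ** 2 for f, m in zip(fitted, counts)) ** 0.5


def assign_signatures(catalogue, counts, min_gain=0.01):
    """Greedy forward selection (a simplified stand-in for forward stagewise).

    Adds, one at a time, the signature that most reduces the reconstruction
    error, and stops when the relative improvement falls below min_gain.
    Returns a dict mapping signature name to estimated mutation count.
    """
    total = sum(m * m for m in counts) ** 0.5
    if total == 0:
        return {}
    chosen, exposures, best_err = [], {}, total  # empty model error = ||m||
    while len(chosen) < len(catalogue):
        trials = []
        for name in catalogue:
            if name in chosen:
                continue
            subset = chosen + [name]
            vectors = [catalogue[n] for n in subset]
            x = nnls_coordinate_descent(vectors, counts)
            trials.append((residual_norm(vectors, x, counts), subset, x))
        err, subset, x = min(trials, key=lambda t: t[0])
        if (best_err - err) / total < min_gain:
            break  # the extra signature does not explain enough to keep
        chosen, best_err = subset, err
        exposures = dict(zip(subset, x))
    return exposures


# Toy data (assumed for illustration): a sample built from 100 SigA
# mutations and 50 SigB mutations, with SigC absent.
toy_signatures = {
    "SigA": [0.5, 0.3, 0.1, 0.1, 0.0, 0.0],
    "SigB": [0.0, 0.1, 0.1, 0.3, 0.5, 0.0],
    "SigC": [0.1, 0.1, 0.3, 0.3, 0.1, 0.1],
}
toy_counts = [100 * a + 50 * b
              for a, b in zip(toy_signatures["SigA"], toy_signatures["SigB"])]

print(assign_signatures(toy_signatures, toy_counts))
```

On this noise-free toy sample the loop should select SigA and SigB with exposures close to 100 and 50 and leave SigC out. In practice a library routine such as `scipy.optimize.nnls` would normally replace the hand-written solver.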
ABSTRACT
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
Subject(s)
Algorithms , Proteins , Cluster Analysis , Protein Conformation , Protein Folding , Proteins/chemistry , Proteins/genetics
ABSTRACT
Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for de novo extraction of mutational signatures, and benchmark it against 13 other bioinformatics tools by using 34 scenarios encompassing 2,500 simulated signatures found in 60,000 synthetic genomes and 20,000 synthetic exomes. For simulations with 5% noise, reflecting high-quality datasets, SigProfilerExtractor outperforms other approaches by elucidating between 20% and 50% more true-positive signatures while yielding fivefold fewer false-positive signatures. Applying SigProfilerExtractor to 4,643 whole-genome- and 19,184 whole-exome-sequenced cancers reveals four novel signatures. Two of the signatures are confirmed in independent cohorts, and one of these signatures is associated with tobacco smoking. In summary, this report provides a reference tool for analysis of mutational signatures, a comprehensive benchmarking of bioinformatics tools for extracting signatures, and several novel mutational signatures, including one putatively attributed to direct tobacco smoking mutagenesis in bladder tissues.
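As a rough illustration of what de novo extraction involves, the sketch below factorises a small mutation-count matrix into candidate signatures and per-sample exposures. It assumes nonnegative matrix factorization with multiplicative updates as the factorization, which the abstract above does not itself specify. It is not SigProfilerExtractor's pipeline, which adds steps such as choosing the number of signatures that are not shown here. The function names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factorise V (channels x samples) as W (channels x rank) times H (rank x samples).

    Uses multiplicative updates for the squared-error objective, which keep
    every entry of W and H nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


def normalize(w, h):
    """Rescale so each signature (column of W) sums to 1; H absorbs the scale."""
    rank = len(h)
    sums = [sum(row[a] for row in w) for a in range(rank)]
    w = [[row[a] / sums[a] for a in range(rank)] for row in w]
    h = [[value * sums[a] for value in h[a]] for a in range(rank)]
    return w, h


# Toy data (assumed for illustration): six samples mixing two made-up
# signatures over five mutation channels.
sig_a = [0.5, 0.3, 0.1, 0.1, 0.0]
sig_b = [0.0, 0.1, 0.1, 0.3, 0.5]
toy_exposures = [(100, 0), (80, 20), (50, 50), (20, 80), (0, 100), (60, 40)]
toy_v = [[sig_a[i] * ea + sig_b[i] * eb for ea, eb in toy_exposures]
         for i in range(5)]

signatures, activities = normalize(*nmf(toy_v, rank=2))
print("reconstruction error:", frobenius_error(toy_v, signatures, activities))
```

Each column of `signatures` is a candidate signature that sums to one, and each row of `activities` gives the estimated number of mutations that signature contributes to each sample. The factorization is not unique, so the recovered columns need not match `sig_a` and `sig_b` exactly.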