ABSTRACT
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.
Subject(s)
Chromosome Inversion , Genomic Segmental Duplications , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Human Genome , Genomics , Humans
ABSTRACT
Multiple signatures of somatic mutations have been identified in cancer genomes. Exome sequences of 1,001 human cancer cell lines and 577 xenografts revealed most common mutational signatures, indicating past activity of the underlying processes, usually in appropriate cancer types. To investigate ongoing patterns of mutational-signature generation, cell lines were cultured for extended periods and subsequently DNA sequenced. Signatures of discontinued exposures, including tobacco smoke and ultraviolet light, were not generated in vitro. Signatures of normal and defective DNA repair and replication continued to be generated at roughly stable mutation rates. Signatures of APOBEC cytidine deaminase DNA-editing exhibited substantial fluctuations in mutation rate over time with episodic bursts of mutations. The initiating factors for the bursts are unclear, although retrotransposon mobilization may contribute. The examined cell lines constitute a resource of live experimental models of mutational processes, which potentially retain patterns of activity and regulation operative in primary human cancers.
Subject(s)
APOBEC Deaminases/genetics , Neoplasms/genetics , APOBEC Deaminases/metabolism , Cell Line , Tumor Cell Line , DNA/metabolism , DNA Mutational Analysis/methods , Genetic Databases , Exome , Human Genome/genetics , Heterografts , Humans , Mutagenesis , Mutation/genetics , Mutation Rate , Retroelements , Exome Sequencing/methods
ABSTRACT
Somatic rearrangements resulting in genomic structural variation drive malignant phenotypes by altering the expression or function of cancer genes. Pan-cancer studies have revealed that structural variants (SVs) are the predominant class of driver mutation in most cancer types, but because they are difficult to discover, they remain understudied when compared with point mutations. This review provides an overview of the current knowledge of somatic SVs, discussing their primary roles, prevalence in different contexts, and mutational mechanisms. SVs arise throughout the life history of cancer, and 55% of driver mutations uncovered by the Pan-Cancer Analysis of Whole Genomes project represent SVs. Leveraging the convergence of cell biology and genomics, we propose a mechanistic classification of somatic SVs, from simple to highly complex DNA rearrangement classes. The actions of DNA repair and DNA replication processes together with mitotic errors result in a rich spectrum of SV formation processes, with cascading effects mediating extensive structural diversity after an initiating DNA lesion has formed. Thanks to new sequencing technologies, including the sequencing of single-cell genomes, open questions about the molecular triggers and the biomolecules involved in SV formation as well as their mutational rates can now be addressed.
Subject(s)
Genomic Structural Variation , Neoplasms , Human Genome , Genomics , Humans , Mutation , Neoplasms/epidemiology , Neoplasms/genetics , Neoplasms/pathology , Prevalence
ABSTRACT
Nodal peripheral T-cell lymphoma not otherwise specified (PTCL-NOS) remains a diagnosis encompassing a heterogeneous group of PTCL cases not fitting criteria for more homogeneous subtypes. They are characterized by a poor clinical outcome when treated with anthracycline-containing regimens. A better understanding of their biology could improve prognostic stratification and foster the development of novel therapeutic approaches. Recent targeted and whole exome sequencing studies have shown recurrent copy number abnormalities (CNAs) with prognostic significance. Here, investigating 5 formalin-fixed, paraffin-embedded cases of PTCL-NOS by whole genome sequencing (WGS), we found a high prevalence of structural variants and complex events, such as chromothripsis, likely responsible for the observed CNAs. Among them, CDKN2A and PTEN deletions emerged as the most frequent aberrations, as confirmed in a final cohort of 143 patients with nodal PTCL. The incidence of CDKN2A and PTEN deletions among PTCL-NOS was 46% and 26%, respectively. Furthermore, we found that co-occurrence of CDKN2A and PTEN deletions is an event associated with PTCL-NOS with absolute specificity. In contrast, these deletions were rare and never co-occurred in angioimmunoblastic and anaplastic lymphomas. CDKN2A deletion was associated with shorter overall survival in multivariate analysis corrected by age, IPI, transplant eligibility and GATA3 expression (adjusted HR = 2.53; 95% CI 1.006-6.3; p = 0.048). These data suggest that CDKN2A deletions may be relevant for refining the prognosis of PTCL-NOS and their significance should be evaluated in prospective trials.
Subject(s)
Cyclin-Dependent Kinase Inhibitor p16/genetics , Peripheral T-Cell Lymphoma , Anthracyclines , Cohort Studies , Gene Deletion , Humans , Peripheral T-Cell Lymphoma/diagnosis , Peripheral T-Cell Lymphoma/genetics , PTEN Phosphohydrolase , Prognosis , Prospective Studies
ABSTRACT
BACKGROUND: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome, and can be explained by various biological mechanisms such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, the outputs of different programs on the same dataset can be widely inconsistent, and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, result inconsistencies between simulated and real datasets, and gene rather than junction level assessment. RESULTS: Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points to validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on this data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high confidence chimeras connecting the protein coding sequence of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which were recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, three of which have the potential to encode a chimeric protein for which we hypothesized a new role. Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved. CONCLUSIONS: ChimPipe combines discordant paired-end reads and split-reads to detect any kind of chimeras, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.
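The junction-level assessment mentioned in this abstract can be illustrated with a minimal sketch. It is not ChimPipe code and does not reproduce the benchmark in the paper; it only shows how sensitivity and precision might be computed when predictions are compared with validated cases at exact junction coordinates. The function name `junction_metrics`, the tuple layout `(chrom_a, pos_a, chrom_b, pos_b)`, and all coordinates are hypothetical.

```python
def junction_metrics(predicted, validated):
    """Return (sensitivity, precision) for exact-match junction calls.

    Each junction is a hashable tuple, here assumed to be
    (chrom_a, pos_a, chrom_b, pos_b). This layout is an illustrative
    assumption, not a format defined by ChimPipe.
    """
    predicted = set(predicted)
    validated = set(validated)
    true_positives = len(predicted & validated)
    # Sensitivity: fraction of validated junctions that were recovered.
    sensitivity = true_positives / len(validated) if validated else 0.0
    # Precision: fraction of predicted junctions that are validated.
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Made-up coordinates, for illustration only.
validated = {
    ("chr1", 1000, "chr2", 5000),
    ("chr3", 200, "chr3", 9000),
    ("chr5", 42, "chr8", 77),
    ("chrX", 10, "chr7", 20),
}
predicted = {
    ("chr1", 1000, "chr2", 5000),
    ("chr3", 200, "chr3", 9000),
    ("chr9", 1, "chr9", 2),
}
sens, prec = junction_metrics(predicted, validated)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Exact matching is the strictest criterion. A gene-level assessment would instead compare pairs of gene names and would typically score the same predictions more leniently.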
Subject(s)
Oncogene Fusion Proteins , Genetic Recombination , Software , Genetic Transcription , Animals , Computational Biology/methods , Computer Simulation , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Mice , Reproducibility of Results , RNA Sequence Analysis
ABSTRACT
Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest that a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.
ABSTRACT
Treatment-eradicated cancer subclones have been reported in leukemia and have recently been detected in solid tumors. Here we introduce Differential Subclone Eradication and Resistance (DSER) analysis, a method developed to identify molecular targets for improved therapy by direct comparison of genomic features of eradicated and resistant subclones in pre- and posttreatment samples from a patient with BRCA2-deficient metastatic prostate cancer. FANCI and EYA4 were identified as candidate DNA repair-related targets for converting subclones from resistant to eradicable, and RNAi-mediated depletion of FANCI confirmed it as a potential target. The EYA4 alteration was associated with adjacent L1 transposon insertion during cancer evolution upon treatment, raising questions surrounding the role of therapy in L1 activation. Both carboplatin and enzalutamide turned on L1 transposon machinery in LNCaP and VCaP but not in PC3 and 22Rv1 prostate cancer cell lines. L1 activation in LNCaP and VCaP was inhibited by the antiretroviral drug azidothymidine. L1 activation was also detected postcastration in LuCaP 77 and LuCaP 105 xenograft models and postchemotherapy in previously published time-series transcriptomic data from SCC25 head and neck cancer cells. In conclusion, DSER provides an informative intermediate step toward effective precision cancer medicine and should be tested in future studies, especially those including dramatic but temporary metastatic tumor regression. L1 transposon activation may be a modifiable source of cancer genomic heterogeneity, suggesting the potential of leveraging newly discovered triggers and blockers of L1 activity to overcome therapy resistance. SIGNIFICANCE: Differential analysis of eradicated and resistant subclones following cancer treatment identifies that L1 activity associated with resistance is induced by current therapies and blocked by the antiretroviral drug azidothymidine.
Subject(s)
Tumor Biomarkers , Clonal Evolution/genetics , Genetic Heterogeneity , Long Interspersed Nucleotide Elements , Neoplasms/genetics , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Autopsy , Biopsy , Tumor Cell Line , CpG Islands , DNA Methylation , Disease Management , Disease Susceptibility , Neoplasm Drug Resistance/genetics , Genetic Epigenesis , Gene Silencing , Genomics/methods , Humans , Long Interspersed Nucleotide Elements/drug effects , Molecular Targeted Therapy/methods , Neoplasms/diagnosis , Neoplasms/mortality , Small Interfering RNA/genetics , Retroelements , Treatment Outcome
ABSTRACT
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
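The contiguity figure quoted in this abstract ("minimum contig length needed to cover 50% of the genome") is the statistic usually called N50. The following minimal sketch shows one way it might be computed. The function name, the optional `genome_size` parameter, and the example lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths, genome_size=None):
    """Return the length of the contig at which the cumulative length,
    taken from longest to shortest, first reaches half of the target size.

    By default the target is the total assembled length (N50). Passing an
    assumed genome_size measures against that size instead (often called
    NG50). Returns None if the contigs never reach half of the target.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    target = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues when target is odd.
        if 2 * running >= target:
            return length
    return None


# Made-up contig lengths, for illustration only.
example = [80, 70, 50, 40, 30, 20, 10]
print(n50(example))                   # measured against the assembly total
print(n50(example, genome_size=400))  # measured against an assumed genome size
```

Whether the denominator is the assembly length or an external genome size changes the result, so it helps to state which one a reported value uses.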
Subject(s)
Genetic Variation , Human Genome , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , DNA Sequence Analysis , Sequence Inversion , Whole Genome Sequencing
ABSTRACT
Most cancers are characterized by the somatic acquisition of genomic rearrangements during tumour evolution that eventually drive the oncogenesis. Here, using multiplatform sequencing technologies, we identify and characterize a remarkable mutational mechanism in human hepatocellular carcinoma caused by Hepatitis B virus, by which DNA molecules from the virus are inserted into the tumour genome causing dramatic changes in its configuration, including non-homologous chromosomal fusions, dicentric chromosomes and megabase-size telomeric deletions. This aberrant mutational mechanism, present in at least 8% of all HCC tumours, can provide the driver rearrangements that a cancer clone requires to survive and grow, including loss of relevant tumour suppressor genes. Most of these events are clonal and occur early during liver cancer evolution. Real-time timing estimation reveals some HBV-mediated rearrangements occur as early as two decades before cancer diagnosis. Overall, these data underscore the importance of characterising liver cancer genomes for patterns of HBV integration.
Subject(s)
Hepatocellular Carcinoma/genetics , Viral DNA , Human Genome , Hepatitis B virus/genetics , Liver Neoplasms/genetics , Hepatocellular Carcinoma/virology , Neoplastic Gene Expression Regulation , Humans , Virus Integration , Whole Genome Sequencing
ABSTRACT
About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.
Subject(s)
Carcinogenesis/genetics , Gene Rearrangement/genetics , Human Genome/genetics , Long Interspersed Nucleotide Elements/genetics , Neoplasms/genetics , Retroelements/genetics , Humans , Neoplasms/pathology
ABSTRACT
The multiple myeloma (MM) genome is heterogeneous and evolves through preclinical and post-diagnosis phases. Here we report a catalog and hierarchy of driver lesions using sequences from 67 MM genomes serially collected from 30 patients together with public exome datasets. Bayesian clustering defines at least 7 genomic subgroups with distinct sets of co-operating events. Focusing on whole genome sequencing data, complex structural events emerge as major drivers, including chromothripsis and a novel replication-based mechanism of templated insertions, which typically occur early. Hyperdiploidy also occurs early, with individual trisomies often acquired in different chronological windows during evolution, and with a preferred order of acquisition. Conversely, positively selected point mutations, whole genome duplication and chromoplexy events occur in later disease phases. Thus, initiating driver events, drawn from a limited repertoire of structural and numerical chromosomal changes, shape preferred trajectories of evolution that are biologically relevant but heterogeneous across patients.