ABSTRACT
The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (~12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.
Subjects
Genome, Human/genetics; Genomics/methods; Mutation/genetics; Neoplasms/genetics; DNA Mutational Analysis/methods; Disease Progression; Humans; Neoplasms/pathology; Whole Genome Sequencing
ABSTRACT
Spina bifida (SB) is a debilitating birth defect caused by multiple gene and environment interactions. Though SB shows non-Mendelian inheritance, genetic factors contribute to an estimated 70% of cases. Nevertheless, identifying human mutations conferring SB risk is challenging due to its relative rarity, genetic heterogeneity, incomplete penetrance, and environmental influences, all of which hamper genome-wide association study approaches to untargeted discovery. Thus, SB genetic studies may suffer from population substructure and/or selection bias introduced by typical candidate gene searches. We report a population-based, ancestry-matched whole-genome sequence analysis of SB genetic predisposition using a systems biology strategy to interrogate 298 case-control subject genomes (149 pairs). Genes that were enriched in likely gene-disrupting (LGD), rare protein-coding variants were subjected to machine learning analysis to identify genes in which LGD variants occur with a different frequency in cases versus controls and so discriminate between these groups. Those genes with high discriminatory potential for SB significantly enriched pathways pertaining to carbon metabolism, inflammation, innate immunity, cytoskeletal regulation, and essential transcriptional regulation, consistent with their having an impact on the pathogenesis of human SB. Additionally, an interrogation of conserved noncoding sequences identified robust variant enrichment in regulatory regions of several transcription factors critical to embryonic development. This genome-wide perspective offers an effective approach to the interrogation of coding and noncoding sequence variant contributions to rare complex genetic disorders.
Subjects
Genome, Human; Spinal Dysraphism/genetics; Case-Control Studies; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Systems Biology; Transcription Factors/genetics
ABSTRACT
Most mutations in cancer genomes occur in the non-coding regions with unknown impact on tumor development. Although the increase in the number of cancer whole-genome sequences has revealed numerous putative non-coding cancer drivers, their information is dispersed across multiple studies, making it difficult to understand their roles in tumorigenesis of different cancer types. We have developed CNCDatabase, the Cornell Non-coding Cancer driver Database (https://cncdatabase.med.cornell.edu/), which contains detailed information about predicted non-coding drivers at gene promoters, 5' and 3' UTRs (untranslated regions), enhancers, CTCF insulators and non-coding RNAs. CNCDatabase documents 1111 protein-coding genes and 90 non-coding RNAs with reported drivers in their non-coding regions from 32 cancer types by computational predictions of positive selection using whole-genome sequences; differential gene expression in samples with and without mutations; or another set of experimental validations including luciferase reporter assays and genome editing. The database can be easily modified and scaled as lists of non-coding drivers are revised in the community with larger whole-genome sequencing studies, CRISPR screens and further experimental validations. Overall, CNCDatabase provides a helpful resource for researchers to explore the pathological role of non-coding alterations in human cancers.
Subjects
Carcinogenesis/genetics; Databases, Genetic; Gene Expression Regulation, Neoplastic; Genome, Human; Neoplasms/genetics; 3' Untranslated Regions; 5' Untranslated Regions; Carcinogenesis/metabolism; Carcinogenesis/pathology; Clustered Regularly Interspaced Short Palindromic Repeats; Enhancer Elements, Genetic; Genes, Reporter; Humans; Insulator Elements; Luciferases/genetics; Luciferases/metabolism; Mutation; Neoplasms/metabolism; Neoplasms/pathology; Open Reading Frames; Promoter Regions, Genetic; RNA, Untranslated/classification; RNA, Untranslated/genetics; RNA, Untranslated/metabolism; Untranslated Regions; Whole Genome Sequencing
ABSTRACT
The growing catalogue of structural variants in humans often overlooks inversions as one of the most difficult types of variation to study, even though they affect phenotypic traits in diverse organisms. Here, we have analysed in detail 90 inversions predicted from the comparison of two independently assembled human genomes: the reference genome (NCBI36/HG18) and HuRef. Surprisingly, we found that two thirds of these predictions (62) represent errors either in assembly comparison or in one of the assemblies, including 27 misassembled regions in HG18. Next, we validated 22 of the remaining 28 potential polymorphic inversions using different PCR techniques and characterized their breakpoints and ancestral state. In addition, we determined experimentally the derived allele frequency in Europeans for 17 inversions (DAF = 0.01-0.80), as well as the distribution in 14 worldwide populations for 12 of them based on the 1000 Genomes Project data. Among the validated inversions, nine have inverted repeats (IRs) at their breakpoints, and two show nucleotide variation patterns consistent with a recurrent origin. Conversely, inversions without IRs have a unique origin and almost all of them show deletions or insertions at the breakpoints in the derived allele mediated by microhomology sequences, which highlights the importance of mechanisms like FoSTeS/MMBIR in the generation of complex rearrangements in the human genome. Finally, we found several inversions located within genes and at least one candidate to be positively selected in Africa. Thus, our study emphasizes the importance of careful analysis and validation of large-scale genomic predictions to extract reliable biological conclusions.
Subjects
Chromosome Inversion/genetics; Genome, Human/genetics; Molecular Sequence Annotation; Sequence Inversion/genetics; Evolution, Molecular; Humans; Polymorphism, Genetic; Selection, Genetic/genetics; Sequence Analysis, DNA
ABSTRACT
In recent years, different types of structural variants (SVs) have been discovered in the human genome and their functional impact has become increasingly clear. Inversions, however, are poorly characterized and more difficult to study, especially those mediated by inverted repeats or segmental duplications. Here, we describe the results of a simple and fast inverse PCR (iPCR) protocol for high-throughput genotyping of a wide variety of inversions using a small amount of DNA. In particular, we analyzed 22 inversions predicted in humans ranging from 5.1 kb to 226 kb and mediated by inverted repeat sequences of 1.6-24 kb. First, we validated 17 of the 22 inversions in a panel of nine HapMap individuals from different populations, and we genotyped them in 68 additional individuals of European origin, with correct genetic transmission in ~12 mother-father-child trios. Global inversion minor allele frequency varied between 1% and 49% and inversion genotypes were consistent with Hardy-Weinberg equilibrium. By analyzing the nucleotide variation and the haplotypes in these regions, we found that only four inversions have linked tag-SNPs and that in many cases there are multiple shared SNPs between standard and inverted chromosomes, suggesting an unexpectedly high degree of inversion recurrence during human evolution. iPCR was also used to check 16 of these inversions in four chimpanzees and two gorillas, and 10 showed both orientations either within or between species, providing additional support for their multiple origin. Finally, we have identified several inversions that include genes in the inverted or breakpoint regions, and at least one disrupts a potential coding gene. Thus, these results represent a significant advance in our understanding of inversion polymorphism in human populations and challenge the common view of a single origin of inversions, with important implications for inversion analysis in SNP-based studies.
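Editorial note: the abstract above reports that inversion genotypes were consistent with Hardy-Weinberg equilibrium. As a rough illustration of what such a check involves, the following minimal sketch computes a one-degree-of-freedom chi-square test for a biallelic inversion. The function name, argument names, and genotype counts are hypothetical and chosen only for illustration; they are not taken from the study, and the authors may have used a different test (for example, an exact test).

```python
import math


def hwe_chi_square(n_std_std, n_std_inv, n_inv_inv):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic inversion.

    Arguments are observed genotype counts (standard/standard,
    standard/inverted, inverted/inverted). Returns (chi2, p_value).
    """
    n = n_std_std + n_std_inv + n_inv_inv
    if n == 0:
        raise ValueError("at least one genotyped individual is required")

    # Frequency of the standard orientation, estimated from the genotype counts.
    p = (2 * n_std_std + n_std_inv) / (2 * n)
    q = 1.0 - p

    observed = (n_std_std, n_std_inv, n_inv_inv)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    # Skip classes with zero expectation (monomorphic case).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only, not data from the abstract.
    chi2, p_value = hwe_chi_square(25, 50, 25)
    print(f"chi2={chi2:.3f}, p={p_value:.3f}")
```

A large p-value here means the observed genotype counts do not depart detectably from Hardy-Weinberg proportions; with small samples an exact test is usually preferred over the chi-square approximation.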
Subjects
Chromosome Inversion/genetics; Evolution, Molecular; Inverted Repeat Sequences/genetics; Segmental Duplications, Genomic/genetics; Animals; Chromosome Mapping; Genome, Human; HapMap Project; Humans; Pan troglodytes/genetics; Polymorphism, Genetic
ABSTRACT
The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of change and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.
Subjects
Databases, Nucleic Acid; Genome, Human; Sequence Inversion; Chromosome Breakpoints; Chromosome Inversion; Humans; Internet; Polymorphism, Genetic; Segmental Duplications, Genomic; Systems Integration
ABSTRACT
Structural variations (SVs) in cancer cells often impact large genomic regions with functional consequences. However, identification of SVs under positive selection is a challenging task because little is known about the genomic features related to the background breakpoint distribution in different cancers. We report a method that uses a generalized additive model to investigate the breakpoint proximity curves from 2,382 whole genomes of 32 cancer types. We find that a multivariate model, which includes linear and nonlinear partial contributions of various tissue-specific features and their interaction terms, can explain up to 57% of the observed deviance of breakpoint proximity. In particular, three-dimensional genomic features such as topologically associating domains (TADs), TAD boundaries and their interaction with other features show significant contributions. The model was validated by the identification of known cancer genes and revealed putative drivers in cancers different from those with previous evidence of positive selection.
Subjects
Chromatin; Neoplasms; Genome; Genomics; Humans; Neoplasms/genetics
ABSTRACT
Radical cystectomy with pelvic lymphadenectomy and urinary diversion is the standard treatment for patients diagnosed with localized muscle-invasive bladder cancer. Enhanced recovery after surgery (ERAS) is a multimodal perioperative care pathway comprising recommendations on different items with variable evidence that are aimed at improving outcomes. This review provides an overview of the application of specific elements of the ERAS guidelines. Forty-eight series were identified through our literature search. The studies reported a median of 16 out of the 22 ERAS steps (72.7%). The elements were applied in 79.3% of cases (interquartile range 61.1-85%) if mentioned in the studies, decreasing to 73.5% in the postoperative period. PATIENT SUMMARY: Guidelines on enhanced recovery after surgery recommend steps to follow and cover all areas of the patient's journey through the surgical process. We looked at the application of the elements for patients with bladder cancer. We found inconsistent reporting and use.
Subjects
Enhanced Recovery After Surgery; Urinary Bladder Neoplasms; Humans; Urinary Bladder Neoplasms/surgery
ABSTRACT
In castration-resistant prostate cancer (CRPC), the loss of androgen receptor (AR) dependence leads to clinically aggressive tumors with few therapeutic options. We used ATAC-seq (assay for transposase-accessible chromatin sequencing), RNA-seq, and DNA sequencing to investigate 22 organoids, six patient-derived xenografts, and 12 cell lines. We identified the well-characterized AR-dependent and neuroendocrine subtypes, as well as two AR-negative/low groups: a Wnt-dependent subtype, and a stem cell-like (SCL) subtype driven by activator protein-1 (AP-1) transcription factors. We used transcriptomic signatures to classify 366 patients, which showed that SCL is the second most common subtype of CRPC after AR-dependent. Our data suggest that AP-1 interacts with the YAP/TAZ and TEAD proteins to maintain subtype-specific chromatin accessibility and transcriptomic landscapes in this group. Together, this molecular classification reveals drug targets and can potentially guide therapeutic decisions.
Subjects
Chromatin; Molecular Targeted Therapy; Prostatic Neoplasms, Castration-Resistant; Cell Line, Tumor; Chromatin/genetics; Gene Expression Profiling; Humans; Male; Neoplastic Stem Cells/classification; Neoplastic Stem Cells/metabolism; Organoids/metabolism; Organoids/pathology; Prostatic Neoplasms, Castration-Resistant/classification; Prostatic Neoplasms, Castration-Resistant/drug therapy; Prostatic Neoplasms, Castration-Resistant/genetics; Receptors, Androgen/genetics; Receptors, Androgen/metabolism; Transcription Factor AP-1/genetics; Transcription Factor AP-1/metabolism
ABSTRACT
Non-coding variants have been shown to be related to disease by alteration of 3D genome structures. We propose a deep learning method, DeepMILO, to predict the effects of variants on CTCF/cohesin-mediated insulator loops. Application of DeepMILO to variants from whole-genome sequences of 1834 patients of twelve cancer types revealed 672 insulator loops disrupted in at least 10% of patients. Our results show that mutations at loop anchors are associated with upregulation of the cancer driver genes BCL2 and MYC in malignant lymphoma, thus pointing to a possible new mechanism for their dysregulation via alteration of insulator loops.
Subjects
Chromatin/chemistry; Deep Learning; Insulator Elements; Neoplasms/genetics; CCCTC-Binding Factor/metabolism; Cell Cycle Proteins/metabolism; Cell Line, Tumor; Chromosomal Proteins, Non-Histone/metabolism; Humans; Mutation; Whole Genome Sequencing; Cohesins
ABSTRACT
Recent studies have shown that mutations at non-coding elements, such as promoters and enhancers, can act as cancer drivers. However, an important class of non-coding elements, namely CTCF insulators, has been overlooked in the previous driver analyses. We used insulator annotations from CTCF and cohesin ChIA-PET and analyzed somatic mutations in 1,962 whole genomes from 21 cancer types. Using the heterogeneous patterns of transcription-factor-motif disruption, functional impact, and recurrence of mutations, we developed a computational method that revealed 21 insulators showing signals of positive selection. In particular, mutations in an insulator in multiple cancer types, including 16% of melanoma samples, are associated with TGFB1 up-regulation. Using CRISPR-Cas9, we find that alterations at two of the most frequently mutated regions in this insulator increase cell growth by 40%-50%, supporting the role of this boundary element as a cancer driver. Thus, our study reveals several CTCF insulators as putative cancer drivers.
Subjects
CCCTC-Binding Factor/genetics; CCCTC-Binding Factor/metabolism; Animals; Cell Cycle Proteins/genetics; Chromosomal Proteins, Non-Histone/genetics; DNA-Binding Proteins/genetics; Gene Expression Regulation/genetics; Gene Expression Regulation, Neoplastic/genetics; Genome, Human; Humans; Mutation; Neoplasms/genetics; Neoplasms/metabolism; Promoter Regions, Genetic/genetics; Repressor Proteins/genetics; Cohesins
ABSTRACT
We report a novel computational method, RegNetDriver, to identify tumorigenic drivers using the combined effects of coding and non-coding single nucleotide variants, structural variants, and DNA methylation changes in the DNase I hypersensitivity based regulatory network. Integration of multi-omics data from 521 prostate tumor samples indicated a stronger regulatory impact of structural variants, as they affect more transcription factor hubs in the tissue-specific network. Moreover, crosstalk between transcription factor hub expression modulated by structural variants and methylation levels likely leads to the differential expression of target genes. We report known prostate tumor regulatory drivers and nominate novel transcription factors (ERF, CREB3L1, and POU2F2), which are supported by functional validation.
Subjects
Algorithms; Carcinogenesis/genetics; Cyclic AMP Response Element-Binding Protein/genetics; Gene Expression Regulation, Neoplastic; Nerve Tissue Proteins/genetics; Octamer Transcription Factor-2/genetics; Prostatic Neoplasms/genetics; Repressor Proteins/genetics; Binding Sites; Carcinogenesis/metabolism; Carcinogenesis/pathology; Chromosome Mapping; Cyclic AMP Response Element-Binding Protein/metabolism; DNA Methylation; Deoxyribonuclease I; Epigenesis, Genetic; Gene Regulatory Networks; Humans; Male; Nerve Tissue Proteins/metabolism; Octamer Transcription Factor-2/metabolism; Organ Specificity; Polymorphism, Single Nucleotide; Promoter Regions, Genetic; Prostatic Neoplasms/metabolism; Prostatic Neoplasms/pathology; Protein Binding; Protein Interaction Mapping; Repressor Proteins/metabolism
ABSTRACT
Next-generation sequencing technologies have expedited research to develop efficient computational tools for the identification of structural variants (SVs) and their use to study human diseases. As deeper data are obtained, the existence of higher-complexity SVs in some genomes becomes more evident, but the detection and definition of most of these complex rearrangements is still in its infancy. The full characterization of SVs is a key aspect for discovering their biological implications. Here we present a pipeline (PeSV-Fisher) for the detection of deletions, gains, intra- and inter-chromosomal translocations, and inversions, at very reasonable computational costs. We further provide comprehensive information on co-localization of SVs in the genome, a crucial aspect for studying their biological consequences. The algorithm uses a combination of methods based on paired-reads and read-depth strategies. PeSV-Fisher has been designed to facilitate identification of somatic variation, and, as such, it is capable of analysing two or more samples simultaneously, producing a list of non-shared variants between samples. We tested PeSV-Fisher on available sequencing data, and compared its behaviour to that of frequently deployed tools (BreakDancer and VariationHunter). We have also tested this algorithm on our own sequencing data, obtained from a tumour and a normal blood sample of a patient with chronic lymphocytic leukaemia, on which we have also validated the results by targeted re-sequencing of different kinds of predictions. This allowed us to determine confidence parameters that influence the reliability of breakpoint predictions. AVAILABILITY: PeSV-Fisher is available at http://gd.crg.eu/tools.