Results 1 - 20 of 55
1.
Cell ; 180(5): 915-927.e16, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32084333

ABSTRACT

The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we also found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and different signatures encode for mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (∼12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.


Subject(s)
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
2.
Cell ; 173(2): 371-385.e18, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625053

ABSTRACT

Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.


Subject(s)
Neoplasms/pathology , Algorithms , B7-H1 Antigen/genetics , Computational Biology , Databases, Genetic , Entropy , Humans , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Principal Component Analysis , Programmed Cell Death 1 Receptor/genetics
4.
Nature ; 578(7793): 112-121, 2020 02.
Article in English | MEDLINE | ID: mdl-32025012

ABSTRACT

A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes [1-7]. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types [8]. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions, as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2-7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications, and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.


Subject(s)
Genetic Variation , Genome, Human/genetics , Neoplasms/genetics , Gene Rearrangement/genetics , Genomics , Humans , Mutagenesis, Insertional , Telomerase/genetics
6.
Proc Natl Acad Sci U S A ; 118(51)2021 12 21.
Article in English | MEDLINE | ID: mdl-34916285

ABSTRACT

Spina bifida (SB) is a debilitating birth defect caused by multiple gene and environment interactions. Though SB shows non-Mendelian inheritance, genetic factors contribute to an estimated 70% of cases. Nevertheless, identifying human mutations conferring SB risk is challenging due to its relative rarity, genetic heterogeneity, incomplete penetrance, and environmental influences that hamper genome-wide association study approaches to untargeted discovery. Thus, SB genetic studies may suffer from population substructure and/or selection bias introduced by typical candidate gene searches. We report a population-based, ancestry-matched whole-genome sequence analysis of SB genetic predisposition using a systems biology strategy to interrogate 298 case-control subject genomes (149 pairs). Genes that were enriched in likely gene-disrupting (LGD), rare protein-coding variants were subjected to machine learning analysis to identify genes in which LGD variants occur with a different frequency in cases versus controls and so discriminate between these groups. Those genes with high discriminatory potential for SB significantly enriched pathways pertaining to carbon metabolism, inflammation, innate immunity, cytoskeletal regulation, and essential transcriptional regulation, consistent with their having an impact on the pathogenesis of human SB. Additionally, an interrogation of conserved noncoding sequences identified robust variant enrichment in regulatory regions of several transcription factors critical to embryonic development. This genome-wide perspective offers an effective approach to the interrogation of coding and noncoding sequence variant contributions to rare complex genetic disorders.


Subject(s)
Genome, Human , Spinal Dysraphism/genetics , Case-Control Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Systems Biology , Transcription Factors/genetics
7.
Nucleic Acids Res ; 49(D1): D1094-D1101, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33095860

ABSTRACT

Most mutations in cancer genomes occur in the non-coding regions with unknown impact on tumor development. Although the increase in the number of cancer whole-genome sequences has revealed numerous putative non-coding cancer drivers, their information is dispersed across multiple studies making it difficult to understand their roles in tumorigenesis of different cancer types. We have developed CNCDatabase, Cornell Non-coding Cancer driver Database (https://cncdatabase.med.cornell.edu/) that contains detailed information about predicted non-coding drivers at gene promoters, 5' and 3' UTRs (untranslated regions), enhancers, CTCF insulators and non-coding RNAs. CNCDatabase documents 1111 protein-coding genes and 90 non-coding RNAs with reported drivers in their non-coding regions from 32 cancer types by computational predictions of positive selection using whole-genome sequences; differential gene expression in samples with and without mutations; or another set of experimental validations including luciferase reporter assays and genome editing. The database can be easily modified and scaled as lists of non-coding drivers are revised in the community with larger whole-genome sequencing studies, CRISPR screens and further experimental validations. Overall, CNCDatabase provides a helpful resource for researchers to explore the pathological role of non-coding alterations in human cancers.


Subject(s)
Carcinogenesis/genetics , Databases, Genetic , Gene Expression Regulation, Neoplastic , Genome, Human , Neoplasms/genetics , 3' Untranslated Regions , 5' Untranslated Regions , Carcinogenesis/metabolism , Carcinogenesis/pathology , Clustered Regularly Interspaced Short Palindromic Repeats , Enhancer Elements, Genetic , Genes, Reporter , Humans , Insulator Elements , Luciferases/genetics , Luciferases/metabolism , Mutation , Neoplasms/metabolism , Neoplasms/pathology , Open Reading Frames , Promoter Regions, Genetic , RNA, Untranslated/classification , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Untranslated Regions , Whole Genome Sequencing
8.
PLoS Genet ; 16(4): e1008663, 2020 04.
Article in English | MEDLINE | ID: mdl-32243438

ABSTRACT

Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that individual human genomes possess at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers tend to be more tissue-specific and regulate fewer and more dispensable genes relative to other enhancers. They are enriched in immune-related cells while enhancers with low LoF-tolerance are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of all enhancers, which achieved an area under the receiver operating characteristics curve (AUROC) of 98%. We predict 3,519 more enhancers would be likely tolerant to LoF and 129 enhancers that would have low LoF-tolerance. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.


Subject(s)
Enhancer Elements, Genetic/genetics , Genome, Human/genetics , Loss of Function Mutation , Conserved Sequence , Disease/genetics , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Organ Specificity/genetics , ROC Curve , Reproducibility of Results , Supervised Machine Learning
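
Note on the AUROC figure quoted in the abstract above: the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The short Python sketch below illustrates that definition only. The function name `auroc` and the toy scores are hypothetical and are not taken from the paper, whose classifier and data are not reproduced here.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: predicted scores, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two positives and two negatives.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of examples, which is acceptable for an illustration; a rank-based computation would be the usual choice for large datasets.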
9.
Nat Rev Genet ; 17(2): 93-108, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26781813

ABSTRACT

Patients with cancer carry somatic sequence variants in their tumour in addition to the germline variants in their inherited genome. Although variants in protein-coding regions have received the most attention, numerous studies have noted the importance of non-coding variants in cancer. Moreover, the overwhelming majority of variants, both somatic and germline, occur in non-coding portions of the genome. We review the current understanding of non-coding variants in cancer, including the great diversity of the mutation types (from single nucleotide variants to large genomic rearrangements) and the wide range of mechanisms by which they affect gene expression to promote tumorigenesis, such as disrupting transcription factor-binding sites or functions of non-coding RNAs. We highlight specific case studies of somatic and germline variants, and discuss how non-coding variants can be interpreted on a large scale through computational and experimental methods.


Subject(s)
Genetic Variation , Neoplasms/genetics , RNA, Untranslated , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , Genomic Instability , Germ-Line Mutation , Humans
10.
Am J Hum Genet ; 102(5): 920-942, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727691

ABSTRACT

We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).


Subject(s)
Algorithms , DNA, Intergenic/genetics , Genetic Variation , Models, Genetic , Organ Specificity/genetics , Genome-Wide Association Study , Humans , Linkage Disequilibrium/genetics , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability , Quantitative Trait Loci/genetics , Reproducibility of Results , Twins/genetics
11.
J Pathol ; 244(2): 143-150, 2018 02.
Article in English | MEDLINE | ID: mdl-29149504

ABSTRACT

Breast adenoid cystic carcinoma (AdCC), a rare type of triple-negative breast cancer, has been shown to be driven by MYB pathway activation, most often underpinned by the MYB-NFIB fusion gene. Alternative genetic mechanisms, such as MYBL1 rearrangements, have been reported in MYB-NFIB-negative salivary gland AdCCs. Here we report on the molecular characterization by massively parallel sequencing of four breast AdCCs lacking the MYB-NFIB fusion gene. In two cases, we identified MYBL1 rearrangements (MYBL1-ACTN1 and MYBL1-NFIB), which were associated with MYBL1 overexpression. A third AdCC harboured a high-level MYB amplification, which resulted in MYB overexpression at the mRNA and protein levels. RNA-sequencing and whole-genome sequencing revealed no definite alternative driver in the fourth AdCC studied, despite high levels of MYB expression and the activation of pathways similar to those activated in MYB-NFIB-positive AdCCs. In this case, a deletion encompassing the last intron and part of exon 15 of MYB, including the binding site of ERG-1, a transcription factor that may downregulate MYB, and the exon 15 splice site, was detected. In conclusion, we demonstrate that MYBL1 rearrangements and MYB amplification probably constitute alternative genetic drivers of breast AdCCs, functioning through MYBL1 or MYB overexpression. These observations emphasize that breast AdCCs probably constitute a convergent phenotype, whereby activation of MYB and MYBL1 and their downstream targets can be driven by the MYB-NFIB fusion gene, MYBL1 rearrangements, MYB amplification, or other yet to be identified mechanisms. Copyright © 2017 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Adenoid Cystic/genetics , Gene Amplification , Gene Fusion , Gene Rearrangement , Oncogene Proteins, Fusion/genetics , Proto-Oncogene Proteins c-myb/genetics , Proto-Oncogene Proteins/genetics , Trans-Activators/genetics , Triple Negative Breast Neoplasms/genetics , Biomarkers, Tumor/analysis , Carcinoma, Adenoid Cystic/chemistry , Carcinoma, Adenoid Cystic/pathology , Female , Genetic Predisposition to Disease , Humans , Middle Aged , Phenotype , Proto-Oncogene Proteins c-myb/analysis , Triple Negative Breast Neoplasms/chemistry , Triple Negative Breast Neoplasms/pathology
12.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
14.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.


Subject(s)
DNA Copy Number Variations/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Gene Duplication/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Mutagenesis, Insertional/genetics , Reproducibility of Results , Sequence Analysis, DNA , Sequence Deletion/genetics
15.
Nucleic Acids Res ; 43(17): 8123-34, 2015 Sep 30.
Article in English | MEDLINE | ID: mdl-26304545

ABSTRACT

In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotation. Also, great mutation heterogeneity and potential correlations between neighboring sites give rise to substantial overdispersion in mutation count, resulting in problematic background rate estimation. Here, we address these issues with a new computational framework called LARVA. It integrates variants with a comprehensive set of noncoding functional elements, modeling the mutation counts of the elements with a β-binomial distribution to handle overdispersion. LARVA, moreover, uses regional genomic features such as replication timing to better estimate local mutation rates and mutational hotspots. We demonstrate LARVA's effectiveness on 760 whole-genome tumor sequences, showing that it identifies well-known noncoding drivers, such as mutations in the TERT promoter. Furthermore, LARVA highlights several novel highly mutated regulatory sites that could potentially be noncoding drivers. We make LARVA available as a software tool and release our highly mutated annotations as an online resource (larva.gersteinlab.org).


Subject(s)
Genomics/methods , Mutation , Neoplasms/genetics , Regulatory Sequences, Nucleic Acid , Software , Genome , Humans , Molecular Sequence Annotation , Mutation Rate
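
Note on the β-binomial model mentioned in the abstract above: the Python sketch below illustrates why overdispersion matters when testing whether an element is mutated more often than expected. It compares the upper-tail probability of the same observed count under a plain binomial model and under a β-binomial model with the same mean. It is not LARVA's implementation; the function names and all parameter values (100 samples, mean rate 0.02, shape parameters 0.2 and 9.8, 8 mutated samples) are assumptions chosen for illustration.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return exp(log_comb(n, k) + k * log(p) + (n - k) * log(1 - p))


def betabinom_pmf(k, n, a, b):
    """P(X = k) for X ~ BetaBinomial(n, a, b), i.e. p ~ Beta(a, b)."""
    return exp(log_comb(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def upper_tail(pmf, k, n, *params):
    """P(X >= k) under the given probability mass function."""
    return sum(pmf(i, n, *params) for i in range(k, n + 1))


# Hypothetical element: mutated in 8 of 100 samples, background mean rate 0.02.
n, observed = 100, 8
p_binom = upper_tail(binom_pmf, observed, n, 0.02)
# Beta(0.2, 9.8) has mean 0.2 / (0.2 + 9.8) = 0.02 but allows rate heterogeneity.
p_betabinom = upper_tail(betabinom_pmf, observed, n, 0.2, 9.8)
```

Under these assumed values the binomial tail probability is far smaller than the β-binomial one, so a binomial background would flag the element as significant much more readily. This is the false-positive risk that an overdispersed background model is meant to reduce.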
16.
Genome Res ; 21(2): 276-85, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21177971

ABSTRACT

We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid level. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.


Subject(s)
Caenorhabditis elegans/genetics , High-Throughput Nucleotide Sequencing , Oligonucleotide Array Sequence Analysis , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Algorithms , Animals , Binding Sites/genetics , DNA, Intergenic/genetics , Gene Expression Profiling , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA Polymerase II/metabolism , Transcription Factors/metabolism
17.
PLoS Comput Biol ; 9(3): e1002886, 2013.
Article in English | MEDLINE | ID: mdl-23505346

ABSTRACT

The decreasing cost of sequencing is leading to a growing repertoire of personal genomes. However, we are lagging behind in understanding the functional consequences of the millions of variants obtained from sequencing. Global system-wide effects of variants in coding genes are particularly poorly understood. It is known that while variants in some genes can lead to diseases, complete disruption of other genes, called 'loss-of-function tolerant', is possible with no obvious effect. Here, we build a systems-based classifier to quantitatively estimate the global perturbation caused by deleterious mutations in each gene. We first survey the degree to which gene centrality in various individual networks and a unified 'Multinet' correlates with the tolerance to loss-of-function mutations and evolutionary conservation. We find that functionally significant and highly conserved genes tend to be more central in physical protein-protein and regulatory networks. However, this is not the case for metabolic pathways, where the highly central genes have more duplicated copies and are more tolerant to loss-of-function mutations. Integration of three-dimensional protein structures reveals that the correlation with centrality in the protein-protein interaction network is also seen in terms of the number of interaction interfaces used. Finally, combining all the network and evolutionary properties allows us to build a classifier distinguishing functionally essential and loss-of-function tolerant genes with higher accuracy (AUC = 0.91) than any individual property. Application of the classifier to the whole genome shows its strong potential for interpretation of variants involved in mendelian diseases and in complex disorders probed by genome-wide association studies.


Subject(s)
Gene Regulatory Networks , Genomics/methods , Models, Genetic , Mutation , Protein Interaction Maps , Animals , Humans , Logistic Models , Metabolic Networks and Pathways , Pan troglodytes , Phosphorylation , Reproducibility of Results , Sequence Analysis, DNA/methods , Signal Transduction , Statistics, Nonparametric
18.
Cell Syst ; 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39236711

ABSTRACT

Most cancer types lack targeted therapeutic options, and when first-line targeted therapies are available, treatment resistance is a huge challenge. Recent technological advances enable the use of assay for transposase-accessible chromatin with sequencing (ATAC-seq) and RNA sequencing (RNA-seq) on patient tissue in a high-throughput manner. Here, we present a computational approach that leverages these datasets to identify drug targets based on tumor lineage. We constructed gene regulatory networks for 371 patients of 22 cancer types using machine learning approaches trained with three-dimensional genomic data for enhancer-to-promoter contacts. Next, we identified the key transcription factors (TFs) in these networks, which are used to find therapeutic vulnerabilities, by direct targeting of either TFs or the proteins that they interact with. We validated four candidates identified for neuroendocrine, liver, and renal cancers, which have a dismal prognosis with current therapeutic options.

19.
bioRxiv ; 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38895201

ABSTRACT

Transposable elements (TEs) are abundant in the human genome, and they provide sources of genetic and functional diversity. The regulation of TE expression and its functional consequences in physiological conditions and cancer development remain to be fully elucidated. Previous studies suggested TEs are repressed by DNA methylation and chromatin modifications. The effect of 3D chromatin topology on TE regulation remains elusive. Here, by integrating transcriptome and 3D genome architecture studies, we showed that haploinsufficient loss of NIPBL selectively activates alternative promoters at the long terminal repeats (LTRs) of the TE subclasses. This activation occurs through the reorganization of topologically associating domain (TAD) hierarchical structures and recruitment of proximal enhancers. These observations indicate that TAD hierarchy restricts transcriptional activation of LTRs that already possess open chromatin features. In cancer, perturbation of the hierarchical chromatin topology can lead to co-option of LTRs as functional alternative promoters in a context-dependent manner and drive aberrant transcriptional activation of novel oncogenes and other divergent transcripts. These data uncovered a new layer of regulatory mechanism of TE expression beyond DNA and chromatin modification in the human genome. They also posit TAD hierarchy dysregulation as a novel mechanism for alternative promoter-mediated oncogene activation and transcriptional diversity in cancer, which may be exploited therapeutically.

20.
Science ; 385(6713): eadk9217, 2024 09 06.
Article in English | MEDLINE | ID: mdl-39236169

ABSTRACT

To identify cancer-associated gene regulatory changes, we generated single-cell chromatin accessibility landscapes across eight tumor types as part of The Cancer Genome Atlas. Tumor chromatin accessibility is strongly influenced by copy number alterations that can be used to identify subclones, yet underlying cis-regulatory landscapes retain cancer type-specific features. Using organ-matched healthy tissues, we identified the "nearest healthy" cell types in diverse cancers, demonstrating that the chromatin signature of basal-like-subtype breast cancer is most similar to secretory-type luminal epithelial cells. Neural network models trained to learn regulatory programs in cancer revealed enrichment of model-prioritized somatic noncoding mutations near cancer-associated genes, suggesting that dispersed, nonrecurrent, noncoding mutations in cancer are functional. Overall, these data and interpretable gene regulatory models for cancer and healthy tissue provide a framework for understanding cancer-specific gene regulation.


Subject(s)
Chromatin , Gene Expression Regulation, Neoplastic , Neoplasms , Single-Cell Analysis , Humans , Chromatin/metabolism , Chromatin/genetics , Neoplasms/genetics , Neural Networks, Computer , Mutation , DNA Copy Number Variations , Breast Neoplasms/genetics , Breast Neoplasms/pathology