ABSTRACT
MOTIVATION: Sparse survival models are statistical models that select a subset of predictor variables while modeling the time until an event occurs, which can subsequently help interpretability and transportability. The subset of important features is often obtained with regularized models, such as the Cox Proportional Hazards model with Lasso regularization, which limit the number of non-zero coefficients. However, such models can be sensitive to the choice of regularization hyperparameter. RESULTS: In this work, we develop a software package and demonstrate how knowledge distillation, a powerful technique in machine learning that aims to transfer knowledge from a complex teacher model to a simpler student model, can be leveraged to learn sparse survival models while mitigating this challenge. For this purpose, we present sparsesurv, a Python package that contains a set of teacher-student model pairs, including the semi-parametric accelerated failure time and the extended hazards models as teachers, which currently do not have Python implementations. It also contains in-house survival function estimators, removing the need for external packages. Sparsesurv is validated against R-based Elastic Net regularized linear Cox proportional hazards models as implemented in the commonly used glmnet package. Our results reveal that knowledge distillation-based approaches achieve competitive discriminative performance relative to glmnet across the regularization path while making the choice of the regularization hyperparameter significantly easier. All of these features, combined with a sklearn-like API, make sparsesurv an easy-to-use Python package that enables survival analysis for high-dimensional datasets through fitting sparse survival models via knowledge distillation. 
AVAILABILITY AND IMPLEMENTATION: sparsesurv is freely available under a BSD 3 license on GitHub (https://github.com/BoevaLab/sparsesurv) and the Python Package Index (PyPI) (https://pypi.org/project/sparsesurv/).
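The distillation idea in this abstract can be illustrated with a minimal sketch. It assumes the student is an L1-penalized linear regression fitted to the teacher's risk scores; this is an assumption about the general technique, not the sparsesurv API. The teacher here is a hypothetical stand-in with fixed dense coefficients rather than a fitted survival model, and all names (`soft_threshold`, `lasso_coordinate_descent`, `teacher_coef`) are illustrative.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j (assumed non-zero)
            for i in range(n):
                fitted = sum(X[i][k] * coef[k] for k in range(p))
                # Add feature j's own contribution back into the residual.
                partial_residual = y[i] - fitted + X[i][j] * coef[j]
                rho += X[i][j] * partial_residual
                norm += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return coef


# Hypothetical toy design with mutually orthogonal features.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]

# Stand-in teacher: a dense linear risk score (assumed, not a fitted survival model).
teacher_coef = [2.0, 0.5, 0.1]
teacher_scores = [sum(x * b for x, b in zip(row, teacher_coef)) for row in X]

# Student: a sparse linear model trained to reproduce the teacher's scores.
student_coef = lasso_coordinate_descent(X, teacher_scores, penalty=0.3)
```

Because the student regresses on the teacher's scores rather than on censored event times, the sparsity step reduces to an ordinary Lasso problem. In this toy design the weakest teacher coefficient is shrunk to exactly zero while the two stronger ones survive.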
Subjects
Machine Learning , Software , Proportional Hazards Models , Algorithms
ABSTRACT
Numerous cancer types have been shown to present hypermethylation of CpG islands, also known as a CpG island methylator phenotype (CIMP), often associated with survival variation. Despite extensive research on CIMP, the etiology of this variability remains elusive, possibly due to a lack of consistency in defining CIMP. In this work, we utilize a pan-cancer approach to further explore CIMP, focusing on 26 cancer types profiled in the Cancer Genome Atlas (TCGA). We defined CIMP systematically and agnostically, discarding any effects associated with age, gender or tumor purity. We then clustered samples based on their most variable DNA methylation values and analyzed the resulting patient groups. Our results confirmed the existence of CIMP in 19 cancers, including gliomas and colorectal cancer. We further showed that CIMP was associated with survival differences in eight cancer types and, in five, represented a prognostic biomarker independent of clinical factors. By analyzing genetic and transcriptomic data, we further uncovered potential drivers of CIMP and classified them in four categories: mutations in genes directly involved in DNA demethylation; mutations in histone methyltransferases; mutations in genes not involved in methylation turnover, such as KRAS and BRAF; and microsatellite instability. Among the 19 CIMP-positive cancers, very few shared potential driver events, and those drivers were only IDH1 and SETD2 mutations. Finally, we found that CIMP was strongly correlated with tumor microenvironment characteristics, such as lymphocyte infiltration. Overall, our results indicate that CIMP does not exhibit a pan-cancer manifestation; rather, general dysregulation of CpG DNA methylation is caused by heterogeneous mechanisms.
Subjects
Colorectal Neoplasms , Colorectal Neoplasms/genetics , CpG Islands , DNA Methylation , Humans , Microsatellite Instability , Mutation , Phenotype , Tumor Microenvironment
ABSTRACT
Although originally described as a transcriptional activator, SPI1/PU.1, a major player in haematopoiesis whose alterations are associated with haematological malignancies, has the ability to repress transcription. Here, we investigated the mechanisms underlying gene repression in the erythroid lineage, in which SPI1 exerts an oncogenic function by blocking differentiation. We show that SPI1 represses genes by binding active enhancers that are located in intergenic or gene body regions. HDAC1 acts as a cooperative mediator of SPI1-induced transcriptional repression by deacetylating SPI1-bound enhancers in a subset of genes, including those involved in erythroid differentiation. Enhancer deacetylation impacts on promoter acetylation, chromatin accessibility and RNA pol II occupancy. In addition to the activities of HDAC1, polycomb repressive complex 2 (PRC2) reinforces gene repression by depositing H3K27me3 at promoter sequences when SPI1 is located at enhancer sequences. Moreover, our study identified a synergistic relationship between PRC2 and HDAC1 complexes in mediating the transcriptional repression activity of SPI1, ultimately inducing synergistic adverse effects on leukaemic cell survival. Our results highlight the importance of the mechanism underlying transcriptional repression in leukaemic cells, involving complex functional connections between SPI1 and the epigenetic regulators PRC2 and HDAC1.
Subjects
Histone Deacetylase 1 , Acute Erythroblastic Leukemia , Polycomb Repressive Complex 2 , Proto-Oncogene Proteins , Trans-Activators , Acetylation , Animals , Chromatin/genetics , Histone Deacetylase 1/genetics , Acute Erythroblastic Leukemia/genetics , Mice , Polycomb Repressive Complex 2/genetics , Polycomb Repressive Complex 2/metabolism , Genetic Promoter Regions , Proto-Oncogene Proteins/genetics , Trans-Activators/genetics
ABSTRACT
During X chromosome inactivation (XCI), the Polycomb Repressive Complex 2 (PRC2) is thought to participate in the early maintenance of the inactive state. Although Xist RNA is essential for the recruitment of PRC2 to the X chromosome, the precise mechanism remains unclear. Here, we demonstrate that the PRC2 cofactor Jarid2 is an important mediator of Xist-induced PRC2 targeting. The region containing the conserved B and F repeats of Xist is critical for Jarid2 recruitment via its unique N-terminal domain. Xist-induced Jarid2 recruitment occurs chromosome-wide independently of a functional PRC2 complex, unlike at other parts of the genome, such as CG-rich regions, where Jarid2 and PRC2 binding are interdependent. Conversely, we show that Jarid2 loss prevents efficient PRC2 and H3K27me3 enrichment to Xist-coated chromatin. Jarid2 thus represents an important intermediate between PRC2 and Xist RNA for the initial targeting of the PRC2 complex to the X chromosome during onset of XCI.
Subjects
Polycomb Repressive Complex 2/metabolism , Long Noncoding RNA/physiology , X Chromosome Inactivation , X Chromosome/metabolism , Animals , Genetic Dosage Compensation , Humans , Mice , Polycomb Repressive Complex 2/genetics , Polycomb Repressive Complex 2/physiology , Long Noncoding RNA/metabolism
ABSTRACT
BACKGROUND: Multiple studies rely on ChIP-seq experiments to assess the effect of gene modulation and drug treatments on protein binding and chromatin structure. However, most methods commonly used for the normalization of ChIP-seq binding intensity signals across conditions, e.g., the normalization to the same number of reads, either assume a constant signal-to-noise ratio across conditions or base the estimates of correction factors on genomic regions with intrinsically different signals between conditions. Inaccurate normalization of ChIP-seq signal may, in turn, lead to erroneous biological conclusions. RESULTS: We developed a new R package, CHIPIN, that allows normalizing ChIP-seq signals across different conditions/samples when spike-in information is not available, but gene expression data are at hand. Our normalization technique is based on the assumption that, on average, no differences in ChIP-seq signals should be observed in the regulatory regions of genes whose expression levels are constant across samples/conditions. In addition to normalizing ChIP-seq signals, CHIPIN provides as output a number of graphs and calculates statistics allowing the user to assess the efficiency of the normalization and qualify the specificity of the antibody used. In addition to ChIP-seq, CHIPIN can be used without restriction on open chromatin ATAC-seq or DNase hypersensitivity data. We validated the CHIPIN method on several ChIP-seq data sets and documented its superior performance in comparison to several commonly used normalization techniques. CONCLUSIONS: The CHIPIN method provides a new way for ChIP-seq signal normalization across conditions when spike-in experiments are not available. The method is implemented in a user-friendly R package available on GitHub: https://github.com/BoevaLab/CHIPIN.
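The normalization assumption stated in this abstract can be made concrete with a toy sketch. It is a simplified illustration in Python, not the CHIPIN R package or its actual procedure: it assumes a single global scaling factor, a hypothetical fold-change cutoff of 1.2 for calling a gene "constant", and invented gene names and values.

```python
from statistics import mean


def constant_genes(expr_a, expr_b, max_fold_change=1.2):
    """Return genes whose expression ratio between conditions stays within the bound."""
    selected = []
    for gene in expr_a:
        low, high = sorted((expr_a[gene], expr_b[gene]))
        if low > 0 and high / low <= max_fold_change:
            selected.append(gene)
    return selected


def scaling_factor(signal_a, signal_b, genes):
    """Factor that makes the mean signal of condition B match condition A over the given genes."""
    return mean(signal_a[g] for g in genes) / mean(signal_b[g] for g in genes)


# Hypothetical toy data: expression levels and ChIP-seq signal around each gene's regulatory region.
expr_a = {"g1": 10.0, "g2": 20.0, "g3": 5.0, "g4": 50.0}
expr_b = {"g1": 10.5, "g2": 19.0, "g3": 25.0, "g4": 10.0}
signal_a = {"g1": 4.0, "g2": 6.0, "g3": 2.0, "g4": 9.0}
signal_b = {"g1": 8.0, "g2": 12.0, "g3": 15.0, "g4": 3.0}

# Only g1 and g2 have stable expression, so only they anchor the normalization.
reference = constant_genes(expr_a, expr_b)
factor = scaling_factor(signal_a, signal_b, reference)
normalized_b = {gene: value * factor for gene, value in signal_b.items()}
```

After rescaling, the two stable genes carry identical signal in both conditions, while the differences at g3 and g4 remain and can be read as candidate biological changes rather than technical bias.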
Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin , Chromatin Immunoprecipitation , Protein Binding , DNA Sequence Analysis
ABSTRACT
We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).
Subjects
Algorithms , Intergenic DNA/genetics , Genetic Variation , Genetic Models , Organ Specificity/genetics , Genome-Wide Association Study , Humans , Linkage Disequilibrium/genetics , Molecular Sequence Annotation , Single Nucleotide Polymorphism/genetics , Probability , Quantitative Trait Loci/genetics , Reproducibility of Results , Twins/genetics
ABSTRACT
Adrenal cortex steroids are essential for body homeostasis, and adrenal insufficiency is a life-threatening condition. Adrenal endocrine activity is maintained through recruitment of subcapsular progenitor cells that follow a unidirectional differentiation path from zona glomerulosa to zona fasciculata (zF). Here, we show that this unidirectionality is ensured by the histone methyltransferase EZH2. Indeed, we demonstrate that EZH2 maintains adrenal steroidogenic cell differentiation by preventing expression of GATA4 and WT1 that cause abnormal dedifferentiation to a progenitor-like state in Ezh2 KO adrenals. EZH2 further ensures normal cortical differentiation by programming cells for optimal response to adrenocorticotrophic hormone (ACTH)/PKA signaling. This is achieved by repression of phosphodiesterases PDE1B, 3A, and 7A and of PRKAR1B. Consequently, EZH2 ablation results in blunted zF differentiation and primary glucocorticoid insufficiency. These data demonstrate an all-encompassing role for EZH2 in programming steroidogenic cells for optimal response to differentiation signals and in maintaining their differentiated state.
Subjects
Adrenal Cortex/enzymology , Cyclic AMP-Dependent Protein Kinase RIbeta Subunit/metabolism , Enhancer of Zeste Homolog 2 Protein/metabolism , Signal Transduction , Adrenal Cortex/metabolism , Animals , Cell Differentiation , Cyclic AMP-Dependent Protein Kinase RIbeta Subunit/genetics , Type 1 Cyclic Nucleotide Phosphodiesterases/genetics , Type 1 Cyclic Nucleotide Phosphodiesterases/metabolism , Type 3 Cyclic Nucleotide Phosphodiesterases/genetics , Type 3 Cyclic Nucleotide Phosphodiesterases/metabolism , Type 7 Cyclic Nucleotide Phosphodiesterases/genetics , Type 7 Cyclic Nucleotide Phosphodiesterases/metabolism , Enhancer of Zeste Homolog 2 Protein/genetics , Female , Male , Inbred C57BL Mice , Knockout Mice , Steroids/metabolism , Zona Fasciculata/cytology , Zona Fasciculata/enzymology , Zona Fasciculata/metabolism , Zona Glomerulosa/cytology , Zona Glomerulosa/enzymology , Zona Glomerulosa/metabolism
ABSTRACT
Super-enhancers (SEs) are key transcriptional drivers of cellular, developmental, and disease states in mammals, yet the conservational and regulatory features of these enhancer elements in nonmammalian vertebrates are unknown. To define SEs in zebrafish and enable sequence and functional comparisons to mouse and human SEs, we used genome-wide histone H3 lysine 27 acetylation (H3K27ac) occupancy as a primary SE delineator. Our study determined the set of SEs in pluripotent state cells and adult zebrafish tissues and revealed both similarities and differences between zebrafish and mammalian SEs. Although the total number of SEs was proportional to the genome size, the genomic distribution of zebrafish SEs differed from that of the mammalian SEs. Despite the evolutionary distance separating zebrafish and mammals and the low overall SE sequence conservation, ~42% of zebrafish SEs were located in close proximity to orthologs that also were associated with SEs in mouse and human. Compared to their nonassociated counterparts, higher sequence conservation was revealed for those SEs that have maintained orthologous gene associations. Functional dissection of two of these SEs identified conserved sequence elements and tissue-specific expression patterns, while chromatin accessibility analyses predicted transcription factors governing the function of pluripotent state zebrafish SEs. Our zebrafish annotations and comparative studies show the extent of SE usage and their conservation across vertebrates, permitting future gene regulatory studies in several tissues.
Subjects
Chromatin/genetics , Conserved Sequence/genetics , Genetic Enhancer Elements , Zebrafish/genetics , Acetylation , Animals , Embryonic Development/genetics , Developmental Gene Expression Regulation , Genomics , Histones/genetics , Humans , Mice , Transcription Factors/genetics
ABSTRACT
Motivation: In cancer, clonal evolution is assessed based on information coming from single nucleotide variants and copy number alterations. Nonetheless, existing methods often fail to accurately combine information from both sources to truthfully reconstruct clonal populations in a given tumor sample or in a set of tumor samples coming from the same patient. Moreover, previously published methods detect clones from a single set of variants. As a result, compromises have to be made between stringent variant filtering [reducing dispersion in variant allele frequency estimates (VAFs)] and using all biologically relevant variants. Results: We present a framework for defining cancer clones using the most reliable variants of high depth of coverage and assigning functional mutations to the detected clones. The key element of our framework is QuantumClone, a method for variant clustering into clones based on VAFs, genotypes of corresponding regions and information about tumor purity. We validated QuantumClone and our framework on simulated data. We then applied our framework to whole genome sequencing data for 19 neuroblastoma trios each including constitutional, diagnosis and relapse samples. We confirmed an enrichment of damaging variants within such pathways as MAPK (mitogen-activated protein kinases), neuritogenesis, epithelial-mesenchymal transition, cell survival and DNA repair. Most pathways had more damaging variants in the expanding clones compared to shrinking ones, which can be explained by the increased total number of variants between these two populations. Functional mutational rate varied for ancestral clones and clones shrinking or expanding upon treatment, suggesting changes in clone selection mechanisms at different time points of tumor evolution. Availability and implementation: Source code and binaries of the QuantumClone R package are freely available for download at https://CRAN.R-project.org/package=QuantumClone.
Contact: gudrun.schleiermacher@curie.fr or valentina.boeva@inserm.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
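The link between VAF, local genotype and tumor purity mentioned in this abstract can be sketched with the standard mixture relation below. This is a generic illustration under stated assumptions, not QuantumClone's actual model or code: "cellularity" is taken to mean the fraction of tumor cells carrying the variant, normal cells are assumed diploid, and the function names are hypothetical.

```python
def expected_vaf(cellularity, purity, mutated_copies, tumor_copy_number, normal_copy_number=2):
    """Expected variant allele frequency for a variant present in a fraction of tumor cells.

    cellularity: fraction of tumor cells carrying the variant (0..1)
    purity: fraction of tumor cells in the sample (0..1)
    mutated_copies: number of chromosome copies carrying the variant in mutated cells
    tumor_copy_number: total copy number of the locus in tumor cells
    """
    mutant_alleles = purity * cellularity * mutated_copies
    total_alleles = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mutant_alleles / total_alleles


def cellularity_from_vaf(vaf, purity, mutated_copies, tumor_copy_number, normal_copy_number=2):
    """Invert expected_vaf: estimate the fraction of tumor cells carrying the variant."""
    total_alleles = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return vaf * total_alleles / (purity * mutated_copies)


# A clonal heterozygous variant in a pure diploid tumor is expected at VAF 0.5.
clonal_vaf = expected_vaf(cellularity=1.0, purity=1.0, mutated_copies=1, tumor_copy_number=2)

# The same variant in a three-copy region of an 80% pure sample appears at a lower VAF.
gained_region_vaf = expected_vaf(cellularity=1.0, purity=0.8, mutated_copies=1, tumor_copy_number=3)
```

Converting observed VAFs to cellularities in this way puts variants from regions with different copy numbers on a common scale, which is what makes clustering them into clones meaningful.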
Subjects
Clonal Evolution , DNA Copy Number Variations , Molecular Typing/methods , Neoplasms/genetics , Software , Whole Genome Sequencing/methods , Cluster Analysis , DNA Mutational Analysis/methods , Gene Frequency , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/diagnosis
ABSTRACT
Comparing histone modification profiles between cancer and normal states, or across different tumor samples, can provide insights into understanding cancer initiation, progression and response to therapy. ChIP-seq histone modification data of cancer samples are distorted by copy number variation innate to any cancer cell. We present HMCan-diff, the first method designed to analyze ChIP-seq data to detect changes in histone modifications between two cancer samples of different genetic backgrounds, or between a cancer sample and a normal control. HMCan-diff explicitly corrects for copy number bias, and for other biases in the ChIP-seq data, which significantly improves prediction accuracy compared to methods that do not consider such corrections. On in silico simulated ChIP-seq data generated using genomes with differences in copy number profiles, HMCan-diff shows a much better performance compared to other methods that have no correction for copy number bias. Additionally, we benchmarked HMCan-diff on four experimental datasets, characterizing two histone marks in two different scenarios. We correlated changes in histone modifications between a cancer and a normal control sample with changes in gene expression. On all experimental datasets, HMCan-diff demonstrated better performance compared to the other methods.
Subjects
Neoplastic Gene Expression Regulation , Histone Code , Histones/genetics , Neoplasms/genetics , Software , Algorithms , Chromatin Immunoprecipitation , Datasets as Topic , Disease Progression , Gene Dosage , Histones/metabolism , Humans , Markov Chains , Neoplasms/metabolism , Neoplasms/pathology
ABSTRACT
MOTIVATION: Read simulators combined with alignment evaluation tools provide the most straightforward way to evaluate and compare mappers. Simulation of reads is accompanied by information about their positions in the source genome. This information is then used to evaluate alignments produced by the mapper. Finally, reports containing statistics of successful read alignments are created. In the absence of standards for encoding read origins, every evaluation tool has to be made explicitly compatible with the simulator used to generate reads. RESULTS: To overcome this obstacle, we have created a generic format, Read Naming Format (Rnf), for assigning read names with encoded information about original positions. Furthermore, we have developed an associated software package, RnfTools, containing two principal components. MIShmash applies one of several popular read simulation tools (among DwgSim, Art, Mason, CuReSim, etc.) and transforms the generated reads into Rnf format. LAVEnder then evaluates a given read mapper using simulated reads in Rnf format. Special attention is paid to mapping qualities that serve for parametrization of ROC curves, and to evaluation of the effect of read sample contamination. AVAILABILITY AND IMPLEMENTATION: RnfTools: http://karel-brinda.github.io/rnftools Spec. of Rnf: http://karel-brinda.github.io/rnf-spec CONTACT: karel.brinda@univ-mlv.fr.
Subjects
High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genome , Humans
ABSTRACT
MOTIVATION: Whole genome sequencing of paired-end reads can be applied to characterize the landscape of large somatic rearrangements of cancer genomes. Several methods for detecting structural variants with whole genome sequencing data have been developed. So far, none of these methods has combined information about abnormally mapped read pairs connecting rearranged regions and associated global copy number changes automatically inferred from the same sequencing data file. Our aim was to create a computational method that could use both types of information, i.e. normal and abnormal reads, and demonstrate that by doing so we can highly improve both sensitivity and specificity rates of structural variant prediction. RESULTS: We developed a computational method, SV-Bay, to detect structural variants from whole genome sequencing mate-pair or paired-end data using a probabilistic Bayesian approach. This approach takes into account depth of coverage by normal reads and abnormalities in read pair mappings. To estimate the model likelihood, SV-Bay considers GC-content and read mappability of the genome, thus making important corrections to the expected read count. For the detection of somatic variants, SV-Bay makes use of a matched normal sample when it is available. We validated SV-Bay on simulated datasets and an experimental mate-pair dataset for the CLB-GA neuroblastoma cell line. The comparison of SV-Bay with several other methods for structural variant detection demonstrated that SV-Bay has better prediction accuracy both in terms of sensitivity and false-positive detection rate. AVAILABILITY AND IMPLEMENTATION: https://github.com/InstitutCurie/SV-Bay CONTACT: valentina.boeva@inserm.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Bayes Theorem , Genome-Wide Association Study , Genomic Structural Variation , Neoplasms/genetics , Base Composition , Genome , High-Throughput Nucleotide Sequencing , Humans , Metagenomics
ABSTRACT
BACKGROUND: Methylation of high-density CpG regions known as CpG Islands (CGIs) has been widely described as a mechanism associated with gene expression regulation. Aberrant promoter methylation is considered a hallmark of cancer involved in silencing of tumor suppressor genes and activation of oncogenes. However, recent studies have also challenged the simple model of gene expression control by promoter methylation in cancer, and the precise mechanism of and role played by changes in DNA methylation in carcinogenesis remain elusive. RESULTS: Using a large dataset of 672 matched cancerous and healthy methylomes, gene expression, and copy number profiles across 3 types of tissues from The Cancer Genome Atlas (TCGA), we perform a detailed meta-analysis to clarify the interplay between promoter methylation and gene expression in normal and cancer samples. On the one hand, we recover the existence of a CpG island methylator phenotype (CIMP) with prognostic value in a subset of breast, colon and lung cancer samples, where a common subset of promoter CGIs hypomethylated in normal samples become hypermethylated. However, this hypermethylation is not accompanied by a decrease in expression of the corresponding genes, which are already lowly expressed in the normal samples. On the other hand, we identify tissue-specific sets of genes, different between normal and cancer samples, whose inter-individual variation in expression is significantly correlated with the variation in methylation of the 3' flanking regions of the promoter CGIs. These subsets of genes are not the same in the different tissues, nor between normal and cancerous samples, but transcription factors are over-represented in all subsets. CONCLUSION: Our results suggest that epigenetic reprogramming in cancer does not contribute to cancer development via direct inhibition of gene expression through promoter hypermethylation.
It may instead modify how the expression of a few specific genes, particularly transcription factors, is associated with DNA methylation variations in a tissue-dependent manner.
Subjects
DNA Methylation/genetics , Neoplastic Gene Expression Regulation , Neoplasms/genetics , Genetic Promoter Regions/genetics , Humans
ABSTRACT
MOTIVATION: Because of its low cost, amplicon sequencing, also known as ultra-deep targeted sequencing, is now becoming widely used in oncology for detection of actionable mutations, i.e. mutations influencing cell sensitivity to targeted therapies. Amplicon sequencing is based on the polymerase chain reaction amplification of the regions of interest, a process that considerably distorts the information on copy numbers initially present in the tumor DNA. Therefore, additional experiments such as single nucleotide polymorphism (SNP) or comparative genomic hybridization (CGH) arrays often complement amplicon sequencing in clinics to identify copy number status of genes whose amplification or deletion has direct consequences on the efficacy of a particular cancer treatment. So far, there has been no proven method to extract the information on gene copy number aberrations based solely on amplicon sequencing. RESULTS: Here we present ONCOCNV, a method that includes a multifactor normalization and annotation technique enabling the detection of large copy number changes from amplicon sequencing data. We validated our approach on high and low amplicon density datasets and demonstrated that ONCOCNV can achieve a precision comparable with that of array CGH techniques in detecting copy number aberrations. Thus, ONCOCNV applied on amplicon sequencing data would make the use of additional array CGH or SNP array experiments unnecessary.
Subjects
Gene Dosage , Neoplasm Genes , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Comparative Genomic Hybridization , Neoplasm DNA/chemistry , Exome , Female , Humans , Male , Polymerase Chain Reaction , Single Nucleotide Polymorphism
ABSTRACT
MOTIVATION: DNA copy number profiles characterize regions of chromosome gains, losses and breakpoints in tumor genomes. Although many models have been proposed to detect these alterations, it is not clear which model is appropriate before visual inspection of the signal, noise and models for a particular profile. RESULTS: We propose SegAnnDB, a Web-based computer vision system for genomic segmentation: first, the user visually inspects the profiles and manually annotates altered regions; then SegAnnDB determines the precise alteration locations using a mathematical model of the data and annotations. SegAnnDB facilitates collaboration between biologists and bioinformaticians, and uses the University of California, Santa Cruz genome browser to visualize copy number alterations alongside known genes. AVAILABILITY AND IMPLEMENTATION: The breakpoints project on INRIA GForge hosts the source code, an Amazon Machine Image can be launched and a demonstration Web site is http://bioviz.rocq.inria.fr.
Subjects
DNA Copy Number Variations , Software , Algorithms , Chromosome Breakpoints , Genomics/methods , Internet
ABSTRACT
MOTIVATION: Cancer cells are often characterized by epigenetic changes, which include aberrant histone modifications. In particular, local or regional epigenetic silencing is a common mechanism in cancer for silencing expression of tumor suppressor genes. Though several tools have been created to enable detection of histone marks in ChIP-seq data from normal samples, it is unclear whether these tools can be efficiently applied to ChIP-seq data generated from cancer samples. Indeed, cancer genomes are often characterized by frequent copy number alterations: gains and losses of large regions of chromosomal material. Copy number alterations may create a substantial statistical bias in the evaluation of histone mark signal enrichment and result in underdetection of the signal in the regions of loss and overdetection of the signal in the regions of gain. RESULTS: We present HMCan (Histone modifications in cancer), a tool specially designed to analyze histone modification ChIP-seq data produced from cancer genomes. HMCan corrects for the GC-content and copy number bias and then applies Hidden Markov Models to detect the signal from the corrected data. On simulated data, HMCan outperformed several commonly used tools developed to analyze histone modification data produced from genomes without copy number alterations. HMCan also showed superior results on a ChIP-seq dataset generated for the repressive histone mark H3K27me3 in a bladder cancer cell line. HMCan predictions matched well with experimental data (qPCR validated regions) and included, for example, the previously detected H3K27me3 mark in the promoter of the DLEC1 gene, missed by other tools we tested.
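The copy number bias described in this abstract can be illustrated with a small sketch. It assumes the simplest possible correction, rescaling each genomic bin by the ratio of a reference ploidy to the local copy number; this is an illustration of the bias only, not HMCan's actual normalization, which also accounts for GC content and uses Hidden Markov Models. The function name and values are hypothetical.

```python
def correct_for_copy_number(signal, copy_number, reference_ploidy=2):
    """Rescale per-bin ChIP-seq signal to what it would be at the reference ploidy."""
    corrected = []
    for value, copies in zip(signal, copy_number):
        if copies == 0:
            # Homozygous deletion: no DNA is present, so no signal can be recovered.
            corrected.append(0.0)
        else:
            corrected.append(value * reference_ploidy / copies)
    return corrected


# Hypothetical bins with the same true enrichment per DNA copy (10 units per copy)
# but different copy numbers: a one-copy loss, a normal region and a four-copy gain.
raw_signal = [10.0, 20.0, 40.0]
copy_number = [1, 2, 4]

corrected_signal = correct_for_copy_number(raw_signal, copy_number)
```

Without the correction, the gained region looks four times more enriched than the lost region, although the histone mark density per copy is identical; after rescaling, all three bins agree.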
Subjects
Chromatin Assembly and Disassembly/genetics , Chromatin Immunoprecipitation/methods , Genetic Epigenesis , Histones/genetics , Post-Translational Protein Processing , Software , Urinary Bladder Neoplasms/genetics , Base Composition , Computer Simulation , DNA Copy Number Variations/genetics , Human Genome , Histones/metabolism , Humans , Markov Chains , Oligonucleotide Array Sequence Analysis , Genetic Promoter Regions/genetics , Urinary Bladder Neoplasms/diagnosis
ABSTRACT
Acute leukemias are characterized by deregulation of transcriptional networks that control the lineage specificity of gene expression. The aberrant overexpression of the Spi-1/PU.1 transcription factor leads to erythroleukemia. To determine how Spi-1 mechanistically influences the transcriptional program, we combined a ChIP-seq analysis with transcriptional profiling in cells from an erythroleukemic mouse model. We show that Spi-1 displays selective DNA binding that does not often cause transcriptional modulation. We report that Spi-1 controls transcriptional activation and repression partially through distinct Spi-1 recruitment to chromatin. We revealed several parameters impacting on Spi-1-mediated transcriptional activation. Gene activation is facilitated by Spi-1 occupancy close to the transcription start site of genes devoid of CGIs. Moreover, in those regions Spi-1 acts by binding to multiple motifs that are tightly clustered and similarly oriented. Finally, in contrast to the myeloid and lymphoid B cells in which Spi-1 exerts a physiological activity, in the erythroleukemic cells, lineage-specific cooperating factors do not play a prevalent role in Spi-1-mediated transcriptional activation. Thus, our work describes a new mechanism of gene activation through clustered site occupancy of Spi-1, particularly relevant in regard to the strong expression of Spi-1 in the erythroleukemic cells.
Subjects
Leukemia, Erythroblastic, Acute/genetics , Proto-Oncogene Proteins/metabolism , Regulatory Elements, Transcriptional , Trans-Activators/metabolism , Transcriptional Activation , Animals , Binding Sites , Cell Line, Tumor , Chromatin Immunoprecipitation , CpG Islands , DNA/chemistry , DNA/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genome , Leukemia, Erythroblastic, Acute/metabolism , Mice , Mice, Transgenic , Nucleotide Motifs , Sequence Analysis, DNA , Transcription Initiation Site
ABSTRACT
BACKGROUND: The phenomenon of field cancerization reflects the transition of normal cells into those predisposed to cancer. Assessing the scope and intensity of this process in the colon may support risk prediction and colorectal cancer prevention. METHODS: The Swiss Epigenetic Colorectal Cancer Study (SWEPIC), encompassing 1111 participants for DNA methylation analysis and a subset of 84 for RNA sequencing, was employed to detect field cancerization in individuals with adenomatous polyps (AP). Methylation variations were evaluated for their discriminative capability, including in external cohorts, genomic localization, clinical correlations, and associated RNA expression patterns. RESULTS: Normal cecal tissue of individuals harboring an AP in the proximal colon manifested dysregulated DNA methylation compared to tissue from healthy individuals at 558 unique loci. Leveraging these adenoma-related differentially variable and methylated CpGs (aDVMCs), our classifier discerned between healthy and AP-adjacent tissues across SWEPIC datasets (cross-validated area under the receiver operating characteristic curve [ROC AUC] = 0.63-0.81), including within age-stratified cohorts. This discriminative capacity was validated in 3 external sets, differentiating healthy from cancer-adjacent tissue (ROC AUC = 0.82-0.88). Notably, aDVMC dysregulation correlated with polyp multiplicity. More than 50% of aDVMCs were significantly associated with age. These aDVMCs were enriched in active regions of the genome (P < .001), and associated genes exhibited altered expression in AP-adjacent tissues. CONCLUSIONS: Our findings underscore the early onset of field cancerization in the right colon during the neoplastic transformation process. A more extensive validation of aDVMC dysregulation as a stratification tool could pave the way for enhanced surveillance approaches, especially given its linkage to adenoma emergence.
Subjects
Adenomatous Polyps , DNA Methylation , Humans , Adenomatous Polyps/genetics , Adenomatous Polyps/pathology , Female , Male , Middle Aged , Aged , Biomarkers, Tumor/genetics , Intestinal Mucosa/pathology , Intestinal Mucosa/metabolism , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Gene Expression Regulation, Neoplastic , Cell Transformation, Neoplastic/genetics , CpG Islands/genetics , Epigenesis, Genetic
ABSTRACT
Esophageal adenocarcinoma (EAC) is a highly lethal cancer of the upper gastrointestinal tract with rising incidence in western populations. To decipher EAC disease progression and therapeutic response, we performed multiomic analyses of a cohort of primary and metastatic EAC tumors, incorporating single-nuclei transcriptomic and chromatin accessibility sequencing, along with spatial profiling. We identified tumor microenvironmental features previously described to associate with therapy response. We identified five malignant cell programs, including undifferentiated, intermediate, differentiated, epithelial-to-mesenchymal transition, and cycling programs, which were associated with differential epigenetic plasticity and clinical outcomes, and for which we inferred candidate transcription factor regulons. Furthermore, we revealed diverse spatial localizations of malignant cells expressing their associated transcriptional programs and predicted their significant interactions with microenvironmental cell types. We validated our findings in three external single-cell RNA-seq and three bulk RNA-seq studies. Altogether, our findings advance the understanding of EAC heterogeneity, disease progression, and therapeutic response.
ABSTRACT
Adrenocortical carcinoma (ACC) is a rare and highly heterogeneous disease with a notably poor prognosis due to significant challenges in diagnosis and treatment. Given the importance of precision medicine, there is an increasing need for comprehensive genomic resources alongside well-developed experimental models to devise personalized therapeutic strategies. We present ACC_CellMinerCDB, a substantive genomic and drug sensitivity database (available at https://discover.nci.nih.gov/acc_cellminercdb) comprising ACC cell lines, patient-derived xenografts, surgical samples, and responses to more than 2,400 drugs examined by the NCI and National Center for Advancing Translational Sciences. This database exposes shared genomic pathways among ACC cell lines and surgical samples, thus authenticating the cell lines as research models. It also allows exploration of pertinent treatment markers such as MDR-1, SOAT1, MGMT, MMR, and SLFN11 and introduces the potential to repurpose agents like temozolomide for ACC therapy. ACC_CellMinerCDB provides the foundation for exploring larger preclinical ACC models. SIGNIFICANCE: ACC_CellMinerCDB, a comprehensive database of cell lines, patient-derived xenografts, surgical samples, and drug responses, reveals shared genomic pathways and treatment-relevant markers in ACC. This resource offers insights into potential therapeutic targets and the opportunity to repurpose existing drugs for ACC therapy.