ABSTRACT
Metastasis is the leading cause of cancer-related deaths, and greater knowledge of the metastatic microenvironment is necessary to effectively target this process. Microenvironmental changes occur at distant sites prior to clinically detectable metastatic disease; however, the key niche regulatory signals during metastatic progression remain poorly characterized. Here, we identify a core immune suppression gene signature in pre-metastatic niche formation that is expressed predominantly by myeloid cells. We target this immune suppression program by utilizing genetically engineered myeloid cells (GEMys) to deliver IL-12 to modulate the metastatic microenvironment. Our data demonstrate that IL12-GEMy treatment reverses immune suppression in the pre-metastatic niche by activating antigen presentation and T cell activation, resulting in reduced metastatic and primary tumor burden and improved survival of tumor-bearing mice. We demonstrate that IL12-GEMys can functionally modulate the core program of immune suppression in the pre-metastatic niche to successfully rebalance the dysregulated metastatic microenvironment in cancer.
Subject(s)
Immunosuppression Therapy , Myeloid Cells/metabolism , Adaptive Immunity , Animals , Cell Line, Tumor , Genetic Engineering , Humans , Interleukin-12/genetics , Interleukin-12/metabolism , Lung/metabolism , Lung Neoplasms/immunology , Lung Neoplasms/mortality , Lung Neoplasms/pathology , Lymphocyte Activation , Male , Mice , Mice, Inbred C57BL , Mice, Transgenic , Myeloid Cells/cytology , Myeloid Cells/immunology , Neoplasm Metastasis , Rhabdomyosarcoma/metabolism , Rhabdomyosarcoma/pathology , Survival Rate , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Tumor Microenvironment
ABSTRACT
Precision oncology has made significant advances, mainly by targeting actionable mutations in cancer driver genes. Aiming to expand treatment opportunities, recent studies have begun to explore the utility of tumor transcriptome to guide patient treatment. Here, we introduce SELECT (synthetic lethality and rescue-mediated precision oncology via the transcriptome), a precision oncology framework harnessing genetic interactions to predict patient response to cancer therapy from the tumor transcriptome. SELECT is tested on a broad collection of 35 published targeted and immunotherapy clinical trials from 10 different cancer types. It is predictive of patients' response in 80% of these clinical trials and in the recent multi-arm WINTHER trial. The predictive signatures and the code are made publicly available for academic use, laying a basis for future prospective clinical studies.
Subject(s)
Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic/drug effects , Molecular Targeted Therapy , Neoplasms/drug therapy , Precision Medicine , Synthetic Lethal Mutations , Transcriptome/drug effects , Aged , Biomarkers, Tumor/antagonists & inhibitors , Biomarkers, Tumor/immunology , Clinical Trials as Topic , Female , Follow-Up Studies , Humans , Immunotherapy , Male , Neoplasms/genetics , Neoplasms/pathology , Prognosis , Prospective Studies , Retrospective Studies , Survival Rate
ABSTRACT
The urea cycle (UC) is the main pathway by which mammals dispose of waste nitrogen. We find that specific alterations in the expression of most UC enzymes occur in many tumors, leading to a general metabolic hallmark termed "UC dysregulation" (UCD). UCD elicits nitrogen diversion toward carbamoyl-phosphate synthetase 2, aspartate transcarbamylase, and dihydroorotase (CAD) activation and enhances pyrimidine synthesis, resulting in detectable changes in nitrogen metabolites in both patient tumors and their bio-fluids. The accompanying excess of pyrimidine versus purine nucleotides results in a genomic signature consisting of transversion mutations at the DNA, RNA, and protein levels. This mutational bias is associated with increased numbers of hydrophobic tumor antigens and a better response to immune checkpoint inhibitors independent of mutational load. Taken together, our findings demonstrate that UCD is a common feature of tumors that profoundly affects carcinogenesis, mutagenesis, and immunotherapy response.
Subject(s)
Genomics , Metabolomics , Neoplasms/pathology , Urea/metabolism , Amino Acid Transport Systems, Basic/metabolism , Animals , Aspartate Carbamoyltransferase/genetics , Aspartate Carbamoyltransferase/metabolism , Carbamoyl-Phosphate Synthase (Glutamine-Hydrolyzing)/genetics , Carbamoyl-Phosphate Synthase (Glutamine-Hydrolyzing)/metabolism , Cell Line, Tumor , Dihydroorotase/genetics , Dihydroorotase/metabolism , Female , Humans , Mice , Mice, Inbred C57BL , Mice, SCID , Mitochondrial Membrane Transport Proteins , Neoplasms/metabolism , Ornithine Carbamoyltransferase/antagonists & inhibitors , Ornithine Carbamoyltransferase/genetics , Ornithine Carbamoyltransferase/metabolism , Phosphorylation/drug effects , Pyrimidines/biosynthesis , Pyrimidines/chemistry , RNA Interference , RNA, Small Interfering/metabolism , Sirolimus/pharmacology , TOR Serine-Threonine Kinases/antagonists & inhibitors , TOR Serine-Threonine Kinases/metabolism
ABSTRACT
αβ lineage T cells, most of which are CD4+ or CD8+ and recognize MHC I- or MHC II-presented antigens, are essential for immune responses and develop from CD4+CD8+ thymocytes. The absence of in vitro models and the heterogeneity of αβ thymocytes have hampered analyses of their intrathymic differentiation. Here, combining single-cell RNA and ATAC (chromatin accessibility) sequencing, we identified mouse and human αβ thymocyte developmental trajectories. We demonstrated asymmetric emergence of CD4+ and CD8+ lineages, matched differentiation programs of agonist-signaled cells to their MHC specificity, and identified correspondences between mouse and human transcriptomic and epigenomic patterns. Through computational analysis of single-cell data and binding sites for the CD4+-lineage transcription factor Thpok, we inferred transcriptional networks associated with CD4+- or CD8+-lineage differentiation, and with expression of Thpok or of the CD8+-lineage factor Runx3. Our findings provide insight into the mechanisms of CD4+ and CD8+ T cell differentiation and a foundation for mechanistic investigations of αβ T cell development.
Subject(s)
Cell Differentiation/immunology , Cell Lineage/immunology , T-Lymphocyte Subsets/immunology , Thymocytes/immunology , Animals , Antigen Presentation/immunology , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/metabolism , Cell Differentiation/genetics , Cell Lineage/genetics , Epigenome , Gene Expression Regulation , Gene Regulatory Networks , Histocompatibility Antigens/genetics , Histocompatibility Antigens/immunology , Histocompatibility Antigens/metabolism , Humans , Mice , T-Lymphocyte Subsets/metabolism , Thymocytes/metabolism , Thymus Gland/immunology , Transcription Factors/genetics , Transcription Factors/metabolism , Transcriptome
ABSTRACT
Complex age-associated phenotypes are caused, in part, by an interaction between an individual's genotype and age. The mechanisms governing such interactions are, however, not entirely understood. Here, we provide a novel transcriptional mechanism-based framework, SNiPage, to investigate such interactions, whereby a transcription factor (TF) whose expression changes with age (age-associated TF) binds to a polymorphic regulatory element in an allele-dependent fashion, rendering the target gene's expression dependent on both age and genotype. Applying SNiPage to GTEx, we detected ~637 significant TF-SNP-Gene triplets on average across 25 tissues, where the TF binds to a regulatory SNP in the gene's promoter or putative enhancer and potentially regulates its expression in an age- and allele-dependent fashion. The detected SNPs are enriched for epigenomic marks indicative of regulatory activity, and exhibit allele-specific chromatin accessibility and spatial proximity to their putative gene targets. Furthermore, the TF-SNP interaction-dependent target genes have established links to aging and to age-associated diseases. In six hypertension-implicated tissues, detected interactions significantly inform the hypertension state of an individual. Lastly, the age-interacting SNPs exhibit a greater proximity to the reported phenotype/disease-associated SNPs than eSNPs identified in an interaction-independent fashion. Overall, we present a novel mechanism-based model, and a novel framework, SNiPage, to identify functionally relevant SNP-age interactions in transcriptional control and illustrate their potential utility in understanding complex age-associated phenotypes.
Subject(s)
Aging/genetics , Gene Expression Regulation , Models, Biological , Phenotype , Polymorphism, Single Nucleotide , Transcription, Genetic , Algorithms , Alleles , Humans , Transcription Factors/metabolism
ABSTRACT
The RB1 tumor suppressor is recurrently mutated in a variety of cancers including retinoblastomas, small cell lung cancers, triple-negative breast cancers, prostate cancers, and osteosarcomas. Finding new synthetic lethal (SL) interactions with RB1 could lead to new approaches to treating cancers with inactivated RB1. We identified 95 SL partners of RB1 based on a Drosophila screen for genetic modifiers of the eye phenotype caused by defects in the RB1 ortholog, Rbf1. We validated 38 mammalian orthologs of Rbf1 modifiers as RB1 SL partners in human cancer cell lines with defective RB1 alleles. We further show that for many of the RB1 SL genes validated in human cancer cell lines, low activity of the SL gene in human tumors, when concurrent with low levels of RB1, was associated with improved patient survival. We investigated higher order combinatorial gene interactions by creating a novel Drosophila cancer model with co-occurring Rbf1, Pten and Ras mutations, and found that targeting RB1 SL genes in this background suppressed the dramatic tumor growth and rescued fly survival whilst having minimal effects on wild-type cells. Finally, we found that drugs targeting the identified RB1 interacting genes/pathways, such as UNC3230, PYR-41, TAK-243, isoginkgetin, madrasin, and celastrol, also elicit SL in human cancer cell lines. In summary, we identified several high confidence, evolutionarily conserved, novel targets for RB1-deficient cells that may be further adapted for the treatment of human cancer.
Subject(s)
Neoplasms/genetics , Phosphotransferases (Alcohol Group Acceptor)/genetics , RNA Splicing , Retinoblastoma Protein/genetics , Signal Transduction , Ubiquitin/metabolism , Animals , Animals, Genetically Modified , Cell Line, Tumor , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Eye Abnormalities/genetics , Eye Abnormalities/metabolism , Humans , Neoplasms/metabolism , Neoplasms/pathology , PTEN Phosphohydrolase/genetics , PTEN Phosphohydrolase/metabolism , Phosphotransferases (Alcohol Group Acceptor)/metabolism , RNA Interference , Retinoblastoma Protein/deficiency , Retinoblastoma Protein/metabolism , Species Specificity , Survival Analysis , Synthetic Lethal Mutations/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , ras Proteins/genetics , ras Proteins/metabolism
ABSTRACT
BACKGROUND: Previous investigations of transcriptomic signatures of cancer patient survival and post-therapy relapse have focused on tumor tissue. In contrast, here we show that in colorectal cancer (CRC) transcriptomes derived from normal tissues adjacent to tumors (NATs) are better predictors of relapse. RESULTS: Using the transcriptomes of paired tumor and NAT specimens from 80 Korean CRC patients retrospectively determined to be in recurrence or nonrecurrence states, we found that, when comparing recurrent with nonrecurrent samples, NATs exhibit a greater number of differentially expressed genes (DEGs) than tumors. Training two prognostic elastic net-based machine learning models, one NAT-based and one tumor-based, in our Samsung Medical Center (SMC) cohort, we found that the NAT-based model performed better in predicting survival when the model was applied to the tumor-derived transcriptomes of an independent cohort of 450 COAD patients in TCGA. Furthermore, compositions of tumor-infiltrating immune cells in NATs were found to have better prognostic capability than in tumors. We also confirmed through Cox regression analysis that in both the SMC-CRC and TCGA-COAD cohorts, a greater proportion of genes exhibited a significant hazard ratio when the NAT-derived transcriptome was used than when the tumor-derived transcriptome was used. CONCLUSIONS: Taken together, our results strongly suggest that NAT-derived transcriptomes and immune cell composition of CRC are better predictors of patient survival and tumor recurrence than the primary tumor.
Subject(s)
Colorectal Neoplasms , Transcriptome , Humans , Transcriptome/genetics , Retrospective Studies , Colorectal Neoplasms/pathology , Neoplasm Recurrence, Local/genetics , Gene Expression Profiling , Prognosis
ABSTRACT
Unliganded estrogen receptor alpha (ERα) has been implicated in ligand-dependent gene regulation. Upon ligand exposure, ERα binds to several EREs relatively proximal to the pre-marked, unliganded ERα-bound sites and affects transient but robust gene expression. However, the underlying mechanisms are not fully understood. Here we demonstrate that upon ligand stimulation, persistent sites interact extensively, via chromatin looping, with the proximal transiently ERα-bound sites, forming a Ligand Dependent ERα Enhancer Cluster in 3D (LDEC). The E2-target genes are regulated by these clustered enhancers but not by the H3K27Ac super-enhancers. Further, CRISPR-based deletion of the TFF1 persistent site disrupts the formation of its LDEC, resulting in the loss of E2-dependent expression of TFF1 and its neighboring genes within the same TAD. The LDECs overlap with nuclear ERα condensates that coalesce in a ligand- and persistent-site-dependent manner. Furthermore, formation of clustered enhancers, as well as condensates, coincides with the active phase of signaling, and their later disappearance results in the loss of gene expression even though persistent sites remain bound by ERα. Our results establish, at the TFF1 and NRIP1 loci, a direct link between ERα condensates, ERα enhancer clusters, and transient, but robust, gene expression in a ligand-dependent fashion.
Subject(s)
Chromatin Assembly and Disassembly , Enhancer Elements, Genetic , Estrogen Receptor alpha/metabolism , Estrogen Receptor alpha/genetics , Gene Deletion , Histones/metabolism , Humans , Ligands , MCF-7 Cells , Trefoil Factor-1/genetics
ABSTRACT
MOTIVATION: Transcriptomes are routinely used to prioritize genes underlying specific phenotypes. Current approaches largely focus on differentially expressed genes (DEGs), despite the recognition that phenotypes emerge via a network of interactions between genes and proteins, many of which may not be differentially expressed. Furthermore, many practical applications lack sufficient samples or an appropriate control to robustly identify statistically significant DEGs. RESULTS: We provide a computational tool, PathExt, which, in contrast to differential genes, identifies differentially active paths when a control is available, and most active paths otherwise, in an omics-integrated biological network. The sub-network comprising such paths, referred to as the TopNet, captures the most relevant genes and processes underlying the specific biological context. The TopNet forms a well-connected graph, reflecting the tight orchestration in biological systems. Two key advantages of PathExt are (i) it can extract characteristic genes and pathways even when only a single sample is available, and (ii) it can be used to study a system even in the absence of an appropriate control. We demonstrate the utility of PathExt via two diverse sets of case studies, to characterize (i) Mycobacterium tuberculosis response upon exposure to 18 antibacterial drugs where only one transcriptomic sample is available for each exposure; and (ii) tissue-relevant genes and processes using transcriptomic data for 39 human tissues. Overall, PathExt is a general tool for prioritizing context-relevant genes in any omics-integrated biological network for any condition(s) of interest, even with a single sample or in the absence of appropriate controls. AVAILABILITY AND IMPLEMENTATION: The source code for PathExt is available at https://github.com/NarmadaSambaturu/PathExt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology , Software , Humans , Proteins , Transcriptome
ABSTRACT
Knowledge of genes that are critical to a tissue's function remains difficult to ascertain and presents a major bottleneck toward a mechanistic understanding of genotype-phenotype links. Here, we present the first machine learning model, FUGUE, combining transcriptional and network features, to predict tissue-relevant genes across 30 human tissues. FUGUE achieves an average cross-validation auROC of 0.86 and auPRC of 0.50 (expected 0.09). In independent datasets, FUGUE accurately distinguishes tissue or cell type-specific genes, significantly outperforming the conventional metric based on tissue-specific expression alone. Comparison of tissue-relevant transcription factors across tissues recapitulates their developmental relationships. Interestingly, the tissue-relevant genes cluster on the genome within topologically associated domains and, furthermore, are highly enriched for differentially expressed genes in the corresponding cancer type. We provide the prioritized gene lists in 30 human tissues and open-source software to prioritize genes in a novel context given multi-sample transcriptomic data.
Subject(s)
Genetic Association Studies , Machine Learning , Models, Genetic , Computational Biology , Female , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Genome, Human , Genome-Wide Association Study/statistics & numerical data , Humans , Male , Multigene Family , Neoplasms/genetics , Protein Interaction Maps/genetics , Software , Tissue Distribution , Transcription Factors/genetics , Transcription Factors/metabolism , Transcriptome
ABSTRACT
DNA methylation at the promoter of a gene is presumed to render it silent, yet a sizable fraction of genes with methylated proximal promoters exhibit elevated expression. Here, we show, through extensive analysis of the methylome and transcriptome in 34 tissues, that in many such cases, transcription is initiated by a distal upstream CpG island (CGI) located several kilobases away that functions as an alternative promoter. Specifically, such genes are expressed precisely when the neighboring CGI is unmethylated but remain silenced otherwise. Based on CAGE and Pol II localization data, we found strong evidence of transcription initiation at the upstream CGI and a lack thereof at the methylated proximal promoter itself. Consistent with their alternative promoter activity, CGI-initiated transcripts are associated with signals of stable elongation and splicing that extend into the gene body, as evidenced by tissue-specific RNA-seq and other DNA-encoded splice signals. Furthermore, based on both inter- and intra-species analyses, such CGIs were found to be under greater purifying selection relative to CGIs upstream of silenced genes. Overall, our study describes a hitherto unreported conserved mechanism of transcription of genes with methylated proximal promoters in a tissue-specific fashion. Importantly, this phenomenon explains the aberrant expression patterns of some cancer driver genes, potentially due to aberrant hypomethylation of distal CGIs, despite methylation at proximal promoters.
Subject(s)
CpG Islands , Gene Silencing , Promoter Regions, Genetic , Transcription Initiation, Genetic , Cell Line , DNA Methylation , Humans , Transcriptome
ABSTRACT
Most patients with advanced cancer eventually acquire resistance to targeted therapies, spurring extensive efforts to identify molecular events mediating therapy resistance. Many of these events involve synthetic rescue (SR) interactions, where the reduction in cancer cell viability caused by targeted gene inactivation is rescued by an adaptive alteration of another gene (the rescuer). Here, we perform a genome-wide in silico prediction of SR rescuer genes by analyzing tumor transcriptomics and survival data of 10,000 TCGA cancer patients. Predicted SR interactions are validated in new experimental screens. We show that SR interactions can successfully predict cancer patients' response and emerging resistance. Inhibiting predicted rescuer genes sensitizes resistant cancer cells to therapies synergistically, providing initial leads for developing combinatorial approaches to overcome resistance proactively. Finally, we show that the SR analysis of melanoma patients successfully identifies known mediators of resistance to immunotherapy and predicts novel rescuers.
Subject(s)
Computational Biology , Drug Resistance, Neoplasm/genetics , Drug Synergism , Melanoma/genetics , Female , Gene Expression Profiling , Humans , Immunotherapy , Male , Melanoma/drug therapy , Molecular Targeted Therapy , Synthetic Lethal Mutations
ABSTRACT
ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate.
Subject(s)
Chromatin Immunoprecipitation/methods , Computational Biology/methods , Sequence Analysis, DNA/methods , Software , Transcription Factors , Binding Sites/genetics , Computer Simulation , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Escherichia coli/genetics , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , High-Throughput Nucleotide Sequencing , Protein Binding/genetics , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type-specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type-specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.
Subject(s)
DNA-Binding Proteins/genetics , Genetic Heterogeneity , Protein Binding/genetics , Transcription Factors/genetics , Binding Sites/genetics , Cell Lineage/genetics , Computational Biology , Gene Expression Regulation , Genomics , Humans , Sensitivity and Specificity
ABSTRACT
BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
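The compositional bias described in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' empirical Bayes method; it only shows, under made-up abundances and hypothetical feature names (`taxon1` to `taxon3`), why dividing by library size recovers proportions rather than absolute abundances, so that a change in one feature distorts the apparent fold change of every other feature.

```python
# Toy illustration with made-up abundances (not data from the study).
# Only taxon1 changes in absolute terms between conditions A and B.
absolute_a = {"taxon1": 100, "taxon2": 100, "taxon3": 100}
absolute_b = {"taxon1": 400, "taxon2": 100, "taxon3": 100}


def proportions(abundances):
    """Library-size scaling: divide each feature by the sample total."""
    total = sum(abundances.values())
    return {name: value / total for name, value in abundances.items()}


prop_a = proportions(absolute_a)
prop_b = proportions(absolute_b)

# Fold change in absolute abundance versus fold change seen after scaling.
true_fold = {name: absolute_b[name] / absolute_a[name] for name in absolute_a}
observed_fold = {name: prop_b[name] / prop_a[name] for name in absolute_a}

for name in absolute_a:
    print(f"{name}: true fold change {true_fold[name]:.2f}, "
          f"observed after scaling {observed_fold[name]:.2f}")
```

In this toy case every observed fold change is off from the true one by the same multiplicative factor (the ratio of the two sample totals), which is the kind of sample-level scale that a compositional correction would have to estimate.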
Subject(s)
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Microbiota , RNA, Ribosomal, 16S/genetics , Bayes Theorem
ABSTRACT
Guilt-by-association codifies the empirical observation that a gene's function is informed by its neighborhood in a biological network. This would imply that when a gene's network context is altered, for instance in disease condition, so could be the gene's function. Although context-specific changes in biological networks have been explored, the potential changes they may induce on the functional roles of genes are yet to be characterized. Here we analyze, for the first time, the network-induced potential functional changes in breast cancer. Using transcriptomic samples for 1047 breast tumors and 110 healthy breast tissues from TCGA, we derive sample-specific protein interaction networks and assign sample-specific functions to genes via a diffusion strategy. Testing for significant changes in the inferred functions between normal and cancer samples, we find several functions to have significantly gained or lost genes in cancer, not due to differential expression of genes known to perform the function, but rather due to changes in the network topology. Our predicted functional changes are supported by mutational and copy number profiles in breast cancers. Our diffusion-based functional assignment provides a novel characterization of a tumor that is complementary to the standard approach based on functional annotation alone. Importantly, this characterization is effective in predicting patient survival, as well as in predicting several known histopathological subtypes of breast cancer.
Subject(s)
Breast Neoplasms/genetics , Computational Biology/methods , Protein Interaction Maps/genetics , Transcriptome/genetics , Algorithms , Breast/metabolism , Breast Neoplasms/metabolism , Cluster Analysis , Diffusion , Female , Gene Expression Profiling , Humans , Mutation , Protein Interaction Maps/physiology
ABSTRACT
A coregulated module of genes ("regulon") can have evolutionarily conserved expression patterns and yet have diverged upstream regulators across species. For instance, the ribosomal genes regulon is regulated by the transcription factor (TF) TBF1 in Candida albicans, while in Saccharomyces cerevisiae it is regulated by RAP1. Only a handful of such rewiring events have been established, and the prevalence or conditions conducive to such events are not well known. Here, we develop a novel probabilistic scoring method to comprehensively screen for regulatory rewiring within regulons across 23 yeast species. Investigation of 1,713 regulons and 176 TFs yielded 5,353 significant rewiring events at 5% false discovery rate (FDR). Besides successfully recapitulating known rewiring events, our analyses also suggest TF candidates for certain processes reported to be under distinct regulatory controls in S. cerevisiae and C. albicans, for which the implied regulators are not known: 1) oxidative stress response (Sc-MSN2 to Ca-FKH2) and 2) nutrient modulation (Sc-RTG1 to Ca-GCN4/Ca-UME6). Furthermore, a stringent screen to detect TF rewiring at individual genes identified 1,446 events at 10% FDR. Overall, these events are supported by strong coexpression between the predicted regulator and its target gene(s) in a species-specific fashion (>50-fold). Independent functional analyses of rewiring TF pairs revealed greater functional interactions and shared biological processes between them (P = 1 × 10⁻³). Our study represents the first comprehensive assessment of regulatory rewiring, with a novel approach that has generated a unique high-confidence resource of several specific events, suggesting that evolutionary rewiring is relatively frequent and may be a significant mechanism of regulatory innovation.
Subject(s)
Gene Expression Regulation, Fungal , High-Throughput Screening Assays/methods , Yeasts/genetics , Evolution, Molecular , Fungal Proteins/genetics , Gene Regulatory Networks , Transcription Factors/genetics
ABSTRACT
BACKGROUND: Large megabase-pair genomic regions show robust alterations in DNA methylation levels in multiple cancers. A vast majority of these regions are hypomethylated in cancers. These regions are generally enriched for CpG islands, Lamin Associated Domains and Large organized chromatin lysine modification domains, and are associated with stochastic variability in gene expression. Given the size and consistency of hypomethylated blocks (HMB) across cancer types, we hypothesized that the immediate causes of methylation instability are likely to be encoded in the genomic region near HMB boundaries, in terms of specific genomic or epigenomic signatures. However, a detailed characterization of the HMB boundaries has not been reported. METHOD: Here, we focused on ~13k HMBs, encompassing approximately half of the genome, identified in colon cancer. We modeled the genomic features of HMB boundaries by Random Forest to identify their salient features, in terms of transcription factor (TF) binding motifs. Additionally, we analyzed various epigenomic marks and chromatin structural features of HMB boundaries relative to the non-HMB genomic regions. RESULT: We found that the classical promoter epigenomic mark, H3K4me3, is highly enriched at HMB boundaries, as are CTCF bound sites. HMB boundaries harbor distinct combinations of TF motifs. Our Random Forest model based on TF motifs can accurately distinguish boundaries not only from regions inside and outside HMBs, but surprisingly, from active promoters as well. Interestingly, the distinguishing TFs and their interacting proteins are involved in chromatin modification. Finally, HMB boundaries significantly coincide with the boundaries of Topologically Associating Domains of the chromatin. CONCLUSION: Our analyses suggest that the overall architecture of HMBs is guided by pre-existing chromatin architecture, and is associated with aberrant activity of promoter-like sequences at the boundary.
Subject(s)
Colonic Neoplasms/genetics , DNA Methylation/genetics , Epigenomics , Genome, Human , Cell Line, Tumor , Chromatin/genetics , Colonic Neoplasms/pathology , CpG Islands/genetics , Histones/genetics , Humans , Promoter Regions, Genetic
ABSTRACT
CRISPRs offer adaptive immunity in prokaryotes by acquiring genomic fragments from infecting phage and subsequently exploiting them for phage restriction via an RNAi-like mechanism. Here, we develop and analyze a dynamical model of CRISPR-mediated prokaryote-phage coevolution that incorporates classical CRISPR kinetics along with the recently discovered infection-induced activation and autoimmunity side effects. Our analyses reveal two striking characteristics of the CRISPR defense strategy: that both restriction and abortive infections operate during coevolution with phages, driving phages to much lower densities than possible with restriction alone, and that CRISPR maintenance is determined by a key dimensionless combination of parameters, which upper bounds the activation level of CRISPRs in uninfected populations. We contrast these qualitative observations with experimental data on CRISPR kinetics, which offer insight into the spacer deletion mechanism and the observed low CRISPR prevalence in clinical isolates. More generally, we exploit numerical simulations to delineate four regimes of CRISPR dynamics in terms of its host, kinetic, and regulatory parameters.