ABSTRACT
For neurodevelopmental disorders (NDDs), a molecular diagnosis is key for management, predicting outcome, and counseling. Often, routine DNA-based tests fail to establish a genetic diagnosis in NDDs. Transcriptome analysis (RNA sequencing [RNA-seq]) promises to improve the diagnostic yield but has not been applied to NDDs in routine diagnostics. Here, we explored the diagnostic potential of RNA-seq in 96 individuals including 67 undiagnosed subjects with NDDs. We performed RNA-seq on single individuals' cultured skin fibroblasts, with and without cycloheximide treatment, and used modified OUTRIDER Z scores to detect gene expression outliers and mis-splicing by exonic and intronic outliers. Analysis was performed by a user-friendly web application, and candidate pathogenic transcriptional events were confirmed by secondary assays. We identified intragenic deletions, monoallelic expression, and pseudoexonic insertions but also synonymous and non-synonymous variants with deleterious effects on transcription, increasing the diagnostic yield for NDDs by 13%. We found that cycloheximide treatment and exonic/intronic Z score analysis increased detection and resolution of aberrant splicing. Importantly, in one individual mis-splicing was found in a candidate gene nearly matching the individual's specific phenotype. However, pathogenic splicing occurred in another neuronal-expressed gene and provided a molecular diagnosis, stressing the need to customize RNA-seq. Lastly, our web browser application allowed custom analysis settings that facilitate diagnostic application and ranked pathogenic transcripts as top candidates. Our results demonstrate that RNA-seq is a complementary method in the genomic diagnosis of NDDs and, by providing accessible analysis with improved sensitivity, our transcriptome analysis approach facilitates wider implementation of RNA-seq in routine genome diagnostics.
Subject(s)
Gene Expression Profiling , Neurodevelopmental Disorders , Humans , RNA-Seq , Cycloheximide , Sequence Analysis, RNA/methods , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics
ABSTRACT
Despite increasing knowledge of disease-causing genes in human genetics, approximately half of the individuals affected by neurodevelopmental disorders remain genetically undiagnosed. Part of this missing heritability might be caused by genetic variants outside of protein-coding genes, which are not routinely diagnostically investigated. A recent preprint identified de novo variants in the non-coding spliceosomal snRNA gene RNU4-2 as a cause of a frequent novel syndromic neurodevelopmental disorder. Here we mined 164 whole genome sequencing (WGS) trios from individuals with neurodevelopmental or multiple congenital anomaly disorders that received diagnostic genomic investigations at our clinic. We identify a recurrent de novo RNU4-2 variant (NR_003137.2(RNU4-2):n.64_65insT) in a 5-year-old girl with severe global developmental delay, hypotonia, microcephaly, and seizures that likely explains her phenotype, given that extensive previous genetic investigations failed to identify an alternative cause. We present detailed phenotyping of the individual obtained during a 5-year follow-up. This includes photographs showing recognizable facial features for this novel disorder, which might allow prioritizing other currently unexplained affected individuals sharing similar facial features for targeted investigations of RNU4-2. This case illustrates the power of re-analysis to solve previously unexplained cases even when a diagnostic genome remains negative.
Subject(s)
Neurodevelopmental Disorders , Whole Genome Sequencing , Humans , Female , Child, Preschool , Neurodevelopmental Disorders/genetics , Neurodevelopmental Disorders/diagnosis , Phenotype , RNA, Small Nuclear/genetics , Mutation/genetics , Genetic Predisposition to Disease
ABSTRACT
Detailed genomic contact maps have revealed that chromosomes are structurally organized in megabase-sized topologically associated domains (TADs) that encompass smaller subTADs. These domains segregate in the nuclear space to form active and inactive nuclear compartments, but cause and consequence of compartmentalization are largely unknown. Here, we combined lacO/lacR binding platforms with allele-specific 4C technologies to track their precise position in the three-dimensional genome upon recruitment of NANOG, SUV39H1, or EZH2. We observed locked genomic loci resistant to spatial repositioning and unlocked loci that could be repositioned to different nuclear subcompartments with distinct chromatin signatures. Focal protein recruitment caused the entire subTAD, but not surrounding regions, to engage in new genomic contacts. Compartment switching was found uncoupled from transcription changes, and the enzymatic modification of histones per se was insufficient for repositioning. Collectively, this suggests that trans-associated factors influence three-dimensional compartmentalization independent of their cis effect on local chromatin composition and activity.
Subject(s)
Cell Nucleus/metabolism , Chromosome Segregation , Embryonic Stem Cells/metabolism , Genetic Loci , Lac Operon , Lac Repressors/metabolism , Animals , Cells, Cultured , Chromatin/metabolism , Chromatin Assembly and Disassembly , Enhancer of Zeste Homolog 2 Protein , Gene Expression Regulation , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Lac Repressors/genetics , Methyltransferases/genetics , Methyltransferases/metabolism , Mice, 129 Strain , Mice, Inbred C57BL , Nanog Homeobox Protein , Polycomb Repressive Complex 2/genetics , Polycomb Repressive Complex 2/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism , Transfection
ABSTRACT
OBJECTIVE: To evaluate which cytogenetic characteristics of confined placental mosaicism (CPM) detected in the first trimester chorionic villi and/or placentas in terms of chromosome aberration, cell lineage involved and trisomy origin will lead to fetal growth restriction and low birthweight. METHODS: Cohort study using routinely collected perinatal data and cytogenetic data of non-invasive prenatal testing, the first trimester chorionic villi sampling and postnatal placentas. RESULTS: 215 CPM cases were found. Fetal growth restriction (FGR) and low birthweight below the 10th percentile (BW < p10) were seen in 34.0% and 23.1%, respectively. Excluding cases of trisomy 16, 29.1% showed FGR and 17.9% had a BW < p10. The highest rate of FGR and BW < p10 was found in CPM type 3, but differences with type 1 and 2 were not significant. FGR and BW < p10 were significantly more often observed in cases with meiotic trisomies. CONCLUSION: There is an association between CPM and FGR and BW < p10. This association is not restricted to trisomy 16, to CPM type 3, or to CPM involving a meiotic trisomy. Pregnancies with all CPM types and origins should be considered to be at increased risk of FGR and low BW < p10. Close prenatal fetal monitoring is indicated in all cases of CPM.
Subject(s)
Placenta , Trisomy , Pregnancy , Female , Humans , Placenta/metabolism , Trisomy/diagnosis , Trisomy/genetics , Mosaicism , Fetal Growth Retardation/diagnosis , Fetal Growth Retardation/genetics , Cohort Studies , Birth Weight , Retrospective Studies , Chromosomes, Human, Pair 16
ABSTRACT
Hereditary spastic paraplegias (HSP) are rare, inherited neurodegenerative or neurodevelopmental disorders that mainly present with lower limb spasticity and muscle weakness due to motor neuron dysfunction. Whole genome sequencing identified bi-allelic truncating variants in AMFR, encoding a RING-H2 finger E3 ubiquitin ligase anchored at the membrane of the endoplasmic reticulum (ER), in two previously genetically unexplained HSP-affected siblings. Subsequently, international collaboration recognized additional HSP-affected individuals with similar bi-allelic truncating AMFR variants, resulting in a cohort of 20 individuals from 8 unrelated, consanguineous families. Variants segregated with a phenotype of mainly pure but also complex HSP consisting of global developmental delay, mild intellectual disability, motor dysfunction, and progressive spasticity. Patient-derived fibroblasts, neural stem cells (NSCs), and in vivo zebrafish modeling were used to investigate pathomechanisms, including initial preclinical therapy assessment. The absence of AMFR disturbs lipid homeostasis, causing lipid droplet accumulation in NSCs and patient-derived fibroblasts which is rescued upon AMFR re-expression. Electron microscopy indicates ER morphology alterations in the absence of AMFR. Similar findings are seen in amfra-/- zebrafish larvae, in addition to altered touch-evoked escape response and defects in motor neuron branching, phenocopying the HSP observed in patients. Interestingly, administration of FDA-approved statins improves touch-evoked escape response and motor neuron branching defects in amfra-/- zebrafish larvae, suggesting potential therapeutic implications. Our genetic and functional studies identify bi-allelic truncating variants in AMFR as a cause of a novel autosomal recessive HSP by altering lipid metabolism, which may potentially be therapeutically modulated using precision medicine with statins.
Subject(s)
Hydroxymethylglutaryl-CoA Reductase Inhibitors , Spastic Paraplegia, Hereditary , Animals , Humans , Spastic Paraplegia, Hereditary/drug therapy , Spastic Paraplegia, Hereditary/genetics , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Zebrafish , Mutation , Motor Neurons , Receptors, Autocrine Motility Factor/genetics
ABSTRACT
The Hippo/YAP signaling pathway is a crucial regulator of tissue growth, stem cell activity, and tumorigenesis. However, the mechanism by which YAP controls transcription remains to be fully elucidated. Here, we utilize global chromatin occupancy analyses to demonstrate that robust YAP binding is restricted to a relatively small number of distal regulatory elements in the genome. YAP occupancy defines a subset of enhancers and superenhancers with the highest transcriptional outputs. YAP modulates transcription from these elements predominantly by regulating promoter-proximal polymerase II (Pol II) pause release. Mechanistically, YAP interacts with and recruits the Mediator complex to enhancers, allowing the recruitment of the CDK9 elongating kinase. Genetic and chemical perturbation experiments demonstrate the requirement for Mediator and CDK9 in YAP-driven phenotypes of overgrowth and tumorigenesis. Our results uncover the molecular mechanisms employed by YAP to exert its growth and oncogenic functions, and suggest strategies for intervention.
Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Bile Duct Neoplasms/genetics , Cholangiocarcinoma/genetics , Gene Expression Regulation, Neoplastic , Intracellular Signaling Peptides and Proteins/genetics , Mediator Complex/genetics , Phosphoproteins/genetics , Adaptor Proteins, Signal Transducing/metabolism , Animals , Antineoplastic Agents/pharmacology , Bile Duct Neoplasms/drug therapy , Bile Duct Neoplasms/metabolism , Bile Duct Neoplasms/pathology , Carcinogenesis/drug effects , Carcinogenesis/genetics , Carcinogenesis/metabolism , Carcinogenesis/pathology , Cell Line, Tumor , Cholangiocarcinoma/drug therapy , Cholangiocarcinoma/metabolism , Cholangiocarcinoma/pathology , Chromatin/chemistry , Chromatin/metabolism , Cyclin-Dependent Kinase 9/genetics , Cyclin-Dependent Kinase 9/metabolism , DNA Polymerase II/genetics , DNA Polymerase II/metabolism , Enhancer Elements, Genetic , Flavonoids/pharmacology , Humans , Intracellular Signaling Peptides and Proteins/metabolism , Mediator Complex/metabolism , Mice , Mice, Transgenic , Phosphoproteins/metabolism , Piperidines/pharmacology , Protein Binding , Signal Transduction , Trans-Activators , Transcription Factors , Transcription, Genetic , Transcriptional Coactivator with PDZ-Binding Motif Proteins , Xenograft Model Antitumor Assays , YAP-Signaling Proteins
ABSTRACT
Chromosome conformation capture (3C) methods measure DNA contact frequencies based on nuclear proximity ligation, to uncover in vivo genomic folding patterns. 4C-seq is a derivative 3C method, designed to search the genome for sequences contacting a selected genomic site of interest. 4C-seq employs inverse PCR and next generation sequencing to amplify, identify and quantify its proximity ligated DNA fragments. It generates high-resolution contact profiles for selected genomic sites based on limited amounts of sequencing reads. 4C-seq can be used to study multiple aspects of genome organization. It primarily serves to identify specific long-range DNA contacts between individual regulatory DNA modules, forming for example regulatory chromatin loops between enhancers and promoters, or architectural chromatin loops between cohesin- and CTCF-associated domain boundaries. Additionally, 4C-seq contact profiles can reveal the contours of contact domains and can identify the structural domains that co-occupy the same nuclear compartment. Here, we present an improved step-by-step protocol for sample preparation and the generation of 4C-seq sequencing libraries, including an optimized PCR and 4C template purification strategy. In addition, a data processing pipeline is provided which processes multiplexed 4C-seq reads directly from FASTQ files and generates files compatible with standard genome browsers for visualization and further statistical analysis of the data such as peak calling using peakC. The protocols and the pipeline presented should readily allow anyone to generate, visualize and interpret their own high resolution 4C contact datasets.
Subject(s)
Chromatin/genetics , Data Analysis , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin/chemistry , Datasets as Topic , Gene Library , Nucleic Acid Conformation , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Software
ABSTRACT
BACKGROUND: Observed levels of gene expression strongly depend on both activity of DNA binding transcription factors (TFs) and chromatin state through different histone modifications (HMs). In order to recover the functional relationship between local chromatin state, TF binding and observed levels of gene expression, regression methods have proven to be useful tools. They have been successfully applied to predict mRNA levels from genome-wide experimental data and they provide insight into context-dependent gene regulatory mechanisms. However, heterogeneity arising from gene-set specific regulatory interactions is often overlooked. RESULTS: We show that regression models that predict gene expression by using experimentally derived ChIP-seq profiles of TFs can be significantly improved by mixture modelling. In order to find biologically relevant gene clusters, we employ a Bayesian allocation procedure which allows us to integrate additional biological information such as three-dimensional nuclear organization of chromosomes and gene function. The data integration procedure involves transforming the additional data into gene similarity values. We propose a generic similarity measure that is especially suitable for situations where the additional data are of both continuous and discrete type, and compare its performance with similar measures in the context of mixture modelling. CONCLUSIONS: We applied the proposed method to data from mouse embryonic stem cells (ESCs). We find that including additional data results in mixture components that exhibit biologically meaningful gene clusters, and provides valuable insight into the heterogeneity of the regulatory interactions.
Subject(s)
Embryonic Stem Cells/metabolism , Gene Expression Regulation , Pluripotent Stem Cells/metabolism , Animals , Bayes Theorem , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation , Genome , Mice , Regression Analysis , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
During pregnancy, cell-free DNA (cfDNA) in maternal blood encompasses a small percentage of cell-free fetal DNA (cffDNA), an easily accessible source for determination of fetal disease status in risk families through non-invasive procedures. In case of monogenic heritable disease, background maternal cfDNA prohibits direct observation of the maternally inherited allele. Non-invasive prenatal diagnostics (NIPD) of monogenic diseases therefore relies on parental haplotyping and statistical assessment of inherited alleles from cffDNA, techniques currently unavailable for routine clinical practice. Here, we present monogenic NIPD (MG-NIPD), which requires a blood sample from both parents, for targeted locus amplification (TLA)-based phasing of heterozygous variants selectively at a gene of interest. Capture probes-based targeted sequencing of cfDNA from the pregnant mother and a tailored statistical analysis enables predicting fetal gene inheritance. MG-NIPD was validated for 18 pregnancies, focusing on CFTR, CYP21A2, and HBB. In all cases we could predict the inherited alleles with >98% confidence, even at relatively early stages (8 weeks) of pregnancy. This prediction and the accuracy of parental haplotyping was confirmed by sequencing of fetal material obtained by parallel invasive procedures. MG-NIPD is a robust method that requires standard instrumentation and can be implemented in any clinic to provide families carrying a severe monogenic disease with a prenatal diagnostic test based on a simple blood draw.
Subject(s)
Adrenal Hyperplasia, Congenital/diagnosis , Biomarkers/blood , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Cystic Fibrosis/diagnosis , Polymorphism, Single Nucleotide , Prenatal Diagnosis/methods , Steroid 21-Hydroxylase/genetics , Adrenal Hyperplasia, Congenital/blood , Adrenal Hyperplasia, Congenital/genetics , Cells, Cultured , Cystic Fibrosis/blood , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/blood , DNA/blood , DNA/genetics , Female , Haplotypes , Humans , Pregnancy , Steroid 21-Hydroxylase/blood
ABSTRACT
It is becoming increasingly clear that chromosome organization plays an important role in gene regulation. High-resolution methods such as 4C, Capture-C and promoter capture Hi-C (PCHiC) enable the study of chromatin loops such as those formed between promoters and enhancers or CTCF/cohesin binding sites. An important aspect of 4C/Capture-C/PCHiC analyses is the reliable identification of chromatin loops, preferably not based on visual inspection of a DNA contact profile, but on reproducible statistical analysis that robustly scores interaction peaks in the non-uniform contact background. Here, we present peakC, an R package for the analysis of 4C/Capture-C/PCHiC data. We generated 4C data for 13 viewpoints in two tissues in at least triplicate to test our methods. We developed a non-parametric peak caller based on rank-products. Sampling analysis shows that not read depth but template quality is the most important determinant of success in 4C experiments. By performing peak calling on single experiments we show that the peak calling results are similar to the replicate experiments, but that false positive rates are significantly reduced by performing replicates. Our software is user-friendly and enables robust peak calling for one-vs-all chromosome capture experiments. peakC is available at: https://github.com/deWitLab/peakC.
Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA/methods , Software , Animals , Binding Sites/genetics , CCCTC-Binding Factor/metabolism , Chromatin/genetics , Chromatin/metabolism , Chromosomes, Mammalian/genetics , Chromosomes, Mammalian/metabolism , Embryonic Stem Cells/metabolism , Liver/embryology , Liver/metabolism , Mice , Reproducibility of Results
ABSTRACT
Despite recent progress in genome topology knowledge, the role of repeats, which make up the majority of mammalian genomes, remains elusive. Satellite repeats are highly abundant sequences that cluster around centromeres, attract pericentromeric heterochromatin, and aggregate into nuclear chromocenters. These nuclear landmark structures are assumed to form a repressive compartment in the nucleus to which genes are recruited for silencing. We have designed a strategy for genome-wide identification of pericentromere-associated domains (PADs) in different mouse cell types. The ~1000 PADs and non-PADs have similar chromatin states in embryonic stem cells, but during lineage commitment, chromocenters progressively associate with constitutively inactive genomic regions at the nuclear periphery. This suggests that PADs are not actively recruited to chromocenters, but that chromocenters are themselves attracted to inactive chromatin compartments. However, we also found that experimentally induced proximity of an active locus to chromocenters was sufficient to cause gene repression. Collectively, our data suggest that rather than driving nuclear organization, pericentromeric satellite repeats mostly co-segregate with inactive genomic regions into nuclear compartments where they can contribute to stable maintenance of the repressed status of proximal chromosomal regions.
Subject(s)
Centromere/genetics , Genomics , Minisatellite Repeats , Animals , Euchromatin , Gene Expression Regulation , Genomics/methods , Heterochromatin , Mice , Transcriptional Activation
ABSTRACT
BACKGROUND: Prenatal hCMV infections can lead to severe embryopathy and neurological sequelae in neonates. Screening during pregnancy is not recommended by global societies, as there is no effective therapy. Recently, several groups showed that maternal-fetal hCMV transmission can be strongly reduced by administering anti-viral agents early in pregnancy. This calls for a screening method to identify at risk pregnancies at an appropriate gestational age, with the possibility for large-scale enrolment. Non-Invasive Prenatal Testing (NIPT) for fetal aneuploidy screening early in pregnancy is already implemented in many countries and performed on a large-scale basis. We investigated the use of whole genome cell-free DNA (cfDNA) sequencing data, generated for the purpose of NIPT, as (pre-)screening tool to identify women with active hCMV-infections, eligible for therapy. METHODS: Coded raw sequencing NIPT data from 204,818 pregnant women from three testing laboratories were analyzed for the presence of hCMV-cfDNA. Samples were stratified by cfDNA-hCMV load. For validation and interpretation, diagnostic hCMV-qPCR and serology testing were performed on a subset of cfDNA-hCMV-positive (n = 112) and -negative (n = 127) samples. FINDINGS: In 1930 samples (0.94%) hCMV fragments were detected. Validation by hCMV-qPCR showed that samples with high cfDNA-hCMV load tested positive and cfDNA-hCMV-negative samples tested negative. In 32/112 cfDNA-hCMV-positive samples (28.6%) the serological profile suggested a recent primary infection: this was more likely in samples with high cfDNA-hCMV load (78.6%) than in samples with low cfDNA-hCMV load (11.0%). In none of the cfDNA-hCMV-negative samples serology was indicative of a recent primary infection. 
INTERPRETATION: Our study shows that large-scale (pre-)screening for both genetic fetal aberrations and active maternal hCMV infections during pregnancy can be combined in one cfDNA sequencing test, performed on a single blood sample, drawn in the first trimester of pregnancy. FUNDING: This work was partly funded by the Prenatal Screening Foundation Nijmegen, the Netherlands.
Subject(s)
Cell-Free Nucleic Acids , Cytomegalovirus , Infant, Newborn , Humans , Female , Pregnancy , Cytomegalovirus/genetics , Pregnant Women , Aneuploidy , Prenatal Diagnosis/methods
ABSTRACT
Purpose: Uveal melanoma (UM) has a high propensity to metastasize. Prognosis is associated with specific driver mutations and copy number variations (CNVs), but limited primary tumor tissue is available for molecular characterization due to eye-sparing irradiation treatment. This study aimed to assess the rise in circulating tumor DNA (ctDNA) levels in UM and evaluate its efficacy for CNV-profiling of patients with UM. Methods: In a pilot study, we assessed ctDNA levels in the blood of patients with UM (n = 18) at various time points, including the time of diagnosis (n = 13), during fractionated stereotactic radiotherapy (fSRT) treatment (n = 6), and upon detection of metastatic disease (n = 13). Shallow whole-genome sequencing (sWGS) combined with in silico size-selection was used to identify prognostically relevant CNVs in patients with UM (n = 26) from peripheral blood retrieved at the time of diagnosis (n = 9), during fSRT (n = 5), during post-treatment follow-up (n = 4), metastasis detection (n = 6), and metastasis follow-up (n = 4). Results: A total of 34 patients had blood analyzed for ctDNA detection (n = 18) and/or CNV analysis (n = 26) at various time points. At the time of diagnosis, 5 of 13 patients (38%) had detectable ctDNA (median = 0 copies/mL). Upon detection of metastatic disease, ctDNA was detected in 10 of 13 patients (77%) and showed increased ctDNA levels (median = 24 copies/mL, P < 0.01). Among the six patients analyzed during fSRT, three (50%) patients had detectable ctDNA at baseline and three of six (50%) patients had undetectable levels of ctDNA. During the fSRT regimen, ctDNA levels remained unchanged (P > 0.05). The ctDNA fractions were undetectable to low in localized disease, and sWGS did not elucidate chromosome 3 status from blood samples. However, in 7 of 10 (70%) patients with metastases, the detection of chromosome 3 loss corresponded to the high metastatic-risk class. 
Conclusions: The rise in ctDNA levels observed in patients with UM harboring metastases suggests its potential utility for CNV profiling. These findings highlight the potential of using ctDNA for metastasis detection and patient inclusion in therapeutic studies targeting metastatic UM.
Subject(s)
Circulating Tumor DNA , Melanoma , Uveal Neoplasms , Humans , Circulating Tumor DNA/genetics , DNA Copy Number Variations , Pilot Projects , Biomarkers
ABSTRACT
MOTIVATION: Gene regulatory networks, in which edges between nodes describe interactions between transcriptional regulators and their target genes, determine the coordinated spatiotemporal expression of genes. Especially in higher organisms, context-specific combinatorial regulation by transcription factors (TFs) is believed to determine cellular states and fates. TF-target gene interactions can be studied using high-throughput techniques such as ChIP-chip or ChIP-Seq. These experiments are time- and cost-intensive, and further limited by, for instance, availability of high affinity TF antibodies. Hence, there is a practical need for methods that can predict TF-TF and TF-target gene interactions in silico, i.e. from gene expression and DNA sequence data alone. We propose GEMULA, a novel approach based on linear models to predict TF-gene expression associations and TF-TF interactions from experimental data. GEMULA is fast and considers a wide range of biologically plausible models that describe gene expression data as a function of predicted TF binding to gene promoters. RESULTS: We show that models inferred with GEMULA are able to explain roughly 70% of the observed variation in gene expression in the yeast heat shock response. The functional relevance of the inferred TF-TF interactions in these models is validated by different sources of independent experimental evidence. We have also applied GEMULA to an in vitro model of neuronal outgrowth. Our findings confirm existing knowledge on gene regulatory interactions underlying neuronal outgrowth, but importantly also generate new insights into the temporal dynamics of this gene regulatory network that can now be addressed experimentally. AVAILABILITY: The GEMULA R-package is available from http://www.few.vu.nl/~degunst/gemula_1.0.tar.gz.
Subject(s)
Gene Regulatory Networks , Models, Genetic , Software , Animals , Gene Expression Regulation , Humans , Linear Models , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism
ABSTRACT
Gene regulatory networks, in which edges between nodes describe interactions between transcription factors (TFs) and their target genes, model regulatory interactions that determine the cell-type and condition-specific expression of genes. Regression methods can be used to identify TF-target gene interactions from gene expression and DNA sequence data. The response variable, i.e. observed gene expression, is modeled as a function of many predictor variables simultaneously. In practice, it is generally not possible to select a single model that clearly achieves the best fit to the observed experimental data and the selected models typically contain overlapping sets of predictor variables. Moreover, parameters that represent the marginal effect of the individual predictors are not always present. In this paper, we use the statistical framework of estimation of variable importance to define variable importance as a parameter of interest and study two different estimators of this parameter in the context of gene regulatory networks. On yeast data we show that the resulting parameter has a biologically appealing interpretation. We apply the proposed methodology to mammalian gene expression data to gain insight into the temporal activity of TFs that underlie gene expression changes in F11 cells in response to Forskolin stimulation.
Subject(s)
Gene Regulatory Networks , Likelihood Functions , Gene Expression Profiling/statistics & numerical data , Models, Genetic , Probability , Regression Analysis , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
All cellular processes are regulated by condition-specific and time-dependent interactions between transcription factors and their target genes. While in simple organisms, e.g. bacteria and yeast, a large amount of experimental data is available to support functional transcription regulatory interactions, in mammalian systems reconstruction of gene regulatory networks still heavily depends on the accurate prediction of transcription factor binding sites. Here, we present a new method, log-linear modeling of 3D contingency tables (LLM3D), to predict functional transcription factor binding sites. LLM3D combines gene expression data, gene ontology annotation and computationally predicted transcription factor binding sites in a single statistical analysis, and offers a methodological improvement over existing enrichment-based methods. We show that LLM3D successfully identifies novel transcriptional regulators of the yeast metabolic cycle, and correctly predicts key regulators of mouse embryonic stem cell self-renewal more accurately than existing enrichment-based methods. Moreover, in a clinically relevant in vivo injury model of mammalian neurons, LLM3D identified peroxisome proliferator-activated receptor γ (PPARγ) as a neuron-intrinsic transcriptional regulator of regenerative axon growth. In conclusion, LLM3D provides a significant improvement over existing methods in predicting functional transcription regulatory interactions in the absence of experimental transcription factor binding data.
Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Transcription Factors/metabolism , Animals , Binding Sites , Cell Line , Embryonic Stem Cells/metabolism , Genome , Linear Models , Mice , Nerve Regeneration/genetics , Neurons/metabolism , PPAR gamma/metabolism , Rats , Wistar Rats , Yeasts/genetics , Yeasts/metabolism
ABSTRACT
Upon antigen-specific T cell receptor (TCR) engagement, human CD4+ T cells proliferate and differentiate, a process associated with rapid transcriptional changes and metabolic reprogramming. Here, we show that the generation of extramitochondrial pyruvate is an important step for acetyl-CoA production and subsequent H3K27ac-mediated remodeling of histone acetylation. Histone modification, transcriptomic, and carbon tracing analyses of pyruvate dehydrogenase (PDH)-deficient T cells show PDH-dependent acetyl-CoA generation as a rate-limiting step during T cell activation. Furthermore, T cell activation results in the nuclear translocation of PDH and its association with both the p300 acetyltransferase and histone H3K27ac. These data support the tight integration of metabolic and histone-modifying enzymes, allowing metabolic reprogramming to fuel CD4+ T cell activation. Targeting this pathway may provide a therapeutic approach to specifically regulate antigen-driven T cell activation.
Subject(s)
Chromatin Assembly and Disassembly , Histones , Humans , Histones/metabolism , Acetyl Coenzyme A/metabolism , CD4-Positive T-Lymphocytes/metabolism
ABSTRACT
Speciation is associated with substantial rewiring of the regulatory circuitry underlying the expression of genes. Determining which changes are relevant and underlie the emergence of the human brain or its unique susceptibility to neural disease has been challenging. Here we annotate changes to gene regulatory elements (GREs) at cell type resolution in the brains of multiple primate species spanning most of primate evolution. We identify a unique set of regulatory elements that emerged in hominins prior to the separation of humans and chimpanzees. We demonstrate that these hominin gains preferentially affect oligodendrocyte function postnatally and are preferentially affected in the brains of autism patients. This preference is also observed for human-specific GREs, suggesting this system is under continued selective pressure. Our data provide a roadmap of regulatory rewiring across primate evolution, providing insight into the genomic changes that underlie the emergence of the brain and its susceptibility to neural disease.
Subject(s)
Autistic Disorder/metabolism , Brain/metabolism , Hominidae/metabolism , Oligodendroglia/metabolism , Nucleic Acid Regulatory Sequences/physiology , Animals , Autistic Disorder/genetics , Callithrix , Chromatin , Chromatin Immunoprecipitation , Chromosomes/chemistry , Disease Susceptibility , Molecular Evolution , Female , Gene Expression Regulation , Genomics , Hominidae/genetics , Humans , Macaca mulatta , Pan troglodytes
ABSTRACT
BACKGROUND: Atrial fibrillation (AF) often arises from structural abnormalities in the left atrium (LA). Annotation of the noncoding genome in human LA is limited, as is knowledge of its effects on gene expression and chromatin architecture. Many AF-associated genetic variants reside in noncoding regions; this knowledge gap impairs efforts to understand the molecular mechanisms of AF and cardiac conduction phenotypes. METHODS: We generated a model of the LA noncoding genome by profiling 7 histone post-translational modifications (active: H3K4me3, H3K4me2, H3K4me1, H3K27ac, H3K36me3; repressive: H3K27me3, H3K9me3), CTCF binding, and gene expression in samples from 5 individuals without structural heart disease or AF. We used MACS2 to identify peak regions (P<0.01), applied a Markov model to classify regulatory elements, and annotated this model with matched gene expression data. We intersected chromatin states with expression quantitative trait locus, DNA methylation, and Hi-C chromatin interaction data from LA and left ventricle. Finally, we integrated genome-wide association data for AF and electrocardiographic traits to link disease-related variants to genes. RESULTS: Our model identified 21 epigenetic states, encompassing regulatory motifs such as promoters, enhancers, and repressed regions. Genes were regulated by proximal chromatin states; repressive states were associated with a significant reduction in gene expression (P<2×10⁻¹⁶). Chromatin states were differentially methylated; promoters were less methylated than repressed regions (P<2×10⁻¹⁶). We identified over 15 000 LA-specific enhancers, defined by homeobox family motifs, and annotated several cardiovascular disease susceptibility loci. Intersecting AF and PR genome-wide association study loci with long-range chromatin conformation data identified a gene interaction network dominated by NKX2-5, TBX3, ZFHX3, and SYNPO2L.
CONCLUSIONS: Profiling the noncoding genome provides new insights into the gene expression and chromatin regulation in human LA tissue. These findings enabled identification of a gene network underlying AF; our experimental and analytic approach can be extended to identify molecular mechanisms for other cardiac diseases and traits.
Subject(s)
Atrial Fibrillation/genetics , Genetic Epigenesis , Gene Regulatory Networks , Heart Atria/pathology , Amino Acid Motifs/genetics , Base Sequence , Chromatin/metabolism , DNA Methylation/genetics , Genetic Enhancer Elements/genetics , Female , Humans , Male , Middle Aged , Genetic Models , Tissue Donors , Genetic Transcription
ABSTRACT
Mutations in non-coding regulatory DNA such as enhancers underlie a wide variety of diseases including developmental disorders and cancer. As enhancers rapidly evolve, understanding their function and configuration in non-human disease models can have important clinical applications. Here, we analyze enhancer configurations in tissues isolated from the common marmoset, a widely used primate model for human disease. Integrating these data with human and mouse data, we find that enhancers containing trait-associated variants are preferentially conserved. In contrast, most human-specific enhancers are highly variable between individuals, with a subset failing to contact promoters. These are located farther away from genes and more often reside in inactive B-compartments. Our data show that enhancers typically emerge as unstable elements with minimal biological impact prior to their integration in a transcriptional program. Furthermore, our data provide insight into which trait variations in enhancers can be faithfully modeled using the common marmoset.