ABSTRACT
Many Mendelian developmental disorders caused by coding variants in epigenetic regulators have now been discovered. Epigenetic regulators are broadly expressed, and each of these disorders typically shows phenotypic manifestations from many different organ systems. An open question is whether the chromatin disruption, the root of the pathogenesis, is similar in the different disease-relevant cell types. This is possible in principle, because all these cell types are subject to effects from the same causative gene, which has the same kind of function (e.g., methylates histones) and is disrupted by the same germline variant. We focus on mouse models for Kabuki syndrome types 1 and 2 and find that the chromatin accessibility changes in neurons are mostly distinct from changes in B or T cells. This is not because the neuronal accessibility changes occur at regulatory elements that are only active in neurons. Neurons, but not B or T cells, show preferential chromatin disruption at CpG islands and at regulatory elements linked to aging. A sensitive analysis reveals that regulatory elements disrupted in B/T cells do show chromatin accessibility changes in neurons, but these are very subtle and of uncertain functional significance. Finally, we are able to identify a small set of regulatory elements disrupted in all three cell types. Our findings reveal the cellular-context-specific effect of variants in epigenetic regulators and suggest that blood-derived episignatures, although useful diagnostically, may not be well suited for understanding the mechanistic basis of neurodevelopment in Mendelian disorders of the epigenetic machinery.
Subjects
Multiple Abnormalities , Aging , Chromatin , CpG Islands , Face , Hematologic Diseases , Neurons , Vestibular Diseases , Animals , Hematologic Diseases/genetics , Hematologic Diseases/metabolism , Mice , Face/abnormalities , Chromatin/metabolism , Chromatin/genetics , Vestibular Diseases/genetics , Neurons/metabolism , Aging/genetics , Multiple Abnormalities/genetics , Animal Disease Models , Genetic Epigenesis , T-Lymphocytes/metabolism , B-Lymphocytes/metabolism
ABSTRACT
Growth deficiency is a characteristic feature of both Kabuki syndrome 1 (KS1) and Kabuki syndrome 2 (KS2), Mendelian disorders of the epigenetic machinery with similar phenotypes but distinct genetic etiologies. We previously described skeletal growth deficiency in a mouse model of KS1 and further established that a Kmt2d-/- chondrocyte model of KS1 exhibits precocious differentiation. Here we characterized growth deficiency in a mouse model of KS2, Kdm6a^tm1d/+. We show that Kdm6a^tm1d/+ mice have decreased femur and tibia length compared to controls and exhibit abnormalities in cortical and trabecular bone structure. Kdm6a^tm1d/+ growth plates are also shorter, due to decreases in hypertrophic chondrocyte size and hypertrophic zone height. Given these disturbances in the growth plate, we generated Kdm6a-/- chondrogenic cell lines. Similar to our prior in vitro model of KS1, we found that Kdm6a-/- cells undergo premature, enhanced differentiation towards chondrocytes compared to Kdm6a+/+ controls. RNA-seq showed that Kdm6a-/- cells have a distinct transcriptomic profile that indicates dysregulation of cartilage development. Finally, we performed RNA-seq simultaneously on Kmt2d-/-, Kdm6a-/-, and control lines at Days 7 and 14 of differentiation. This revealed surprising resemblance in gene expression between Kmt2d-/- and Kdm6a-/- at both time points and indicates that the similarity in phenotype between KS1 and KS2 also exists at the transcriptional level.
Subjects
Multiple Abnormalities , Chondrocytes , Animal Disease Models , Face , Hematologic Diseases , Histone Demethylases , Vestibular Diseases , Animals , Vestibular Diseases/genetics , Vestibular Diseases/pathology , Mice , Face/abnormalities , Histone Demethylases/genetics , Histone Demethylases/metabolism , Hematologic Diseases/genetics , Hematologic Diseases/pathology , Chondrocytes/metabolism , Multiple Abnormalities/genetics , Multiple Abnormalities/pathology , Cell Differentiation/genetics , Chondrogenesis/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/deficiency , Humans , Knockout Mice , Phenotype , Histone-Lysine N-Methyltransferase , Myeloid-Lymphoid Leukemia Protein
ABSTRACT
Hi-C data are commonly normalized using single sample processing methods, with focus on comparisons between regions within a given contact map. Here, we aim to compare contact maps across different samples. We demonstrate that unwanted variation, of likely technical origin, is present in Hi-C data with replicates from different individuals, and that properties of this unwanted variation change across the contact map. We present band-wise normalization and batch correction, a method for normalization and batch correction of Hi-C data and show that it substantially improves comparisons across samples, including in a quantitative trait loci analysis as well as differential enrichment across cell types.
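The band-wise idea can be illustrated with a short sketch: entries of each contact map are grouped by their distance from the diagonal (a "band"), and each band is quantile normalized across samples. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`get_band`, `quantile_normalize`, `bandwise_normalize`) are hypothetical, contact maps are assumed to be symmetric square matrices stored as nested lists, ties are broken by position, and the batch-correction step is omitted.

```python
from statistics import mean


def get_band(matrix, k):
    """Return the k-th off-diagonal band of a square matrix as a list."""
    n = len(matrix)
    return [matrix[i][i + k] for i in range(n - k)]


def quantile_normalize(columns):
    """Quantile normalize equal-length columns (one column per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean([col[r] for col in sorted_cols]) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def bandwise_normalize(matrices, max_band):
    """Quantile normalize bands 0..max_band across samples.

    Assumes symmetric matrices of identical size; returns new matrices
    and leaves the inputs unchanged.
    """
    n = len(matrices[0])
    result = [[row[:] for row in m] for m in matrices]
    for k in range(min(max_band, n - 1) + 1):
        bands = [get_band(m, k) for m in matrices]
        for s, band in enumerate(quantile_normalize(bands)):
            for i, value in enumerate(band):
                result[s][i][i + k] = value
                result[s][i + k][i] = value  # mirror to keep symmetry
    return result
```

Normalizing each band separately reflects the abstract's observation that the properties of the unwanted variation change across the contact map.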
Subjects
Quantitative Trait Loci , Humans , Computational Biology
ABSTRACT
Diet-related metabolic syndrome is the largest contributor to adverse health in the United States. However, the study of gene-environment interactions and their epigenomic and transcriptomic integration is complicated by the lack of environmental and genetic control in humans that is possible in mouse models. Here we exposed three mouse strains, C57BL/6J (BL6), A/J, and NOD/ShiLtJ (NOD), to a high-fat, high-carbohydrate diet, leading to varying degrees of metabolic syndrome. We then performed transcriptomic and genome-wide DNA methylation analyses for each strain and found overlapping but also highly divergent changes in gene expression and methylation upstream of the discordant metabolic phenotypes. Strain-specific pathway analysis of dietary effects revealed a dysregulation of cholesterol biosynthesis common to all three strains but distinct regulatory networks driving this dysregulation. This suggests a strategy for strain-specific targeted pharmacologic intervention of these upstream regulators informed by epigenetic and transcriptional regulation. As a pilot study, we administered the drug GW4064 to target one of these genotype-dependent networks, the farnesoid X receptor pathway, and found that GW4064 exerts strain-specific protection against dietary effects in BL6, as predicted by our transcriptomic analysis. Furthermore, GW4064 treatment induced inflammatory-related gene expression changes in NOD, indicating a strain-specific effect in its associated toxicities as well as its therapeutic efficacy. This pilot study demonstrates the potential efficacy of precision therapeutics for genotype-informed dietary metabolic intervention and a mouse platform for guiding this approach.
Subjects
Metabolic Syndrome , Humans , Mice , Animals , Metabolic Syndrome/drug therapy , Metabolic Syndrome/genetics , Metabolic Syndrome/metabolism , Epigenomics , Pilot Projects , Liver/metabolism , Inbred C57BL Mice , Inbred NOD Mice , High-Fat Diet/adverse effects , Genetic Epigenesis
ABSTRACT
In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. Accurate modeling of these data in downstream analyses depends on their distributional properties: previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the choices made in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and we investigate potential causes for the bias.
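One simple way to check for zero inflation is to compare the observed fraction of zeros for a gene with the fraction expected under a fitted negative binomial. The sketch below is illustrative only and is not the evaluation procedure used in the study: the function names are hypothetical, the negative binomial is fitted by the method of moments, and a Poisson fallback is assumed when the data show no overdispersion.

```python
import math
from statistics import mean, pvariance


def expected_zero_fraction(counts):
    """Expected P(count == 0) under a moment-matched negative binomial.

    Falls back to a Poisson model when the variance does not exceed
    the mean (no overdispersion to estimate).
    """
    mu = mean(counts)
    if mu == 0:
        return 1.0
    var = pvariance(counts)
    if var <= mu:
        return math.exp(-mu)
    size = mu * mu / (var - mu)  # NB dispersion ("size") parameter
    return (size / (size + mu)) ** size


def observed_zero_fraction(counts):
    """Fraction of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)
```

An observed zero fraction well above the expected one, across many genes in a homogeneous set of nuclei, would point towards zero inflation; similar values are consistent with a negative binomial without an extra zero component.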
ABSTRACT
The aggregation and joint analysis of large numbers of exome sequences has recently made it possible to derive estimates of intolerance to loss-of-function (LoF) variation for human genes. Here, we demonstrate strong and widespread coupling between genic LoF intolerance and promoter CpG density across the human genome. Genes downstream of the most CpG-rich promoters (top 10% CpG density) have a 67.2% probability of being highly LoF intolerant, using the LOEUF metric from gnomAD. This is in contrast to 7.4% of genes downstream of the most CpG-poor (bottom 10% CpG density) promoters. Combining promoter CpG density with exonic and promoter conservation explains 33.4% of the variation in LOEUF, and the contribution of CpG density exceeds the individual contributions of exonic and promoter conservation. We leverage this to train a simple and easily interpretable predictive model that outperforms other existing predictors and allows us to classify 1,760 genes, which are currently unascertained in gnomAD, as highly LoF intolerant or not. These predictions have the potential to aid in the interpretation of novel variants in the clinical setting. Moreover, our results reveal that high CpG density is not merely a generic feature of human promoters but is preferentially encountered at the promoters of the most selectively constrained genes, calling into question the prevailing view that CpG islands are not subject to selection.
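The abstract does not state how promoter CpG density is computed. A common measure is the observed-to-expected CpG ratio, and the sketch below assumes that definition purely for illustration; the function name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(sequence):
    """Observed-to-expected CpG ratio of a DNA sequence.

    Computed as (number of CG dinucleotides * sequence length) divided by
    (number of C * number of G). Returns 0.0 if the sequence has no C or no G.
    """
    seq = sequence.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself
    return cpg_count * len(seq) / (c_count * g_count)
```

Promoters could then be ranked by this ratio to form deciles such as the top and bottom 10% referred to in the abstract.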
Subjects
CpG Islands/genetics , Human Genome/genetics , Loss of Function Mutation/genetics , Genetic Promoter Regions/genetics , DNA Methylation/genetics , Exons/genetics , Humans , RNA Polymerase II/genetics , Transcription Initiation Site
ABSTRACT
Probing epigenetic features on DNA has tremendous potential to advance our understanding of the phased epigenome. In this study, we use nanopore sequencing to evaluate CpG methylation and chromatin accessibility simultaneously on long strands of DNA by applying GpC methyltransferase to exogenously label open chromatin. We performed nanopore sequencing of nucleosome occupancy and methylome (nanoNOMe) on four human cell lines (GM12878, MCF-10A, MCF-7 and MDA-MB-231). The single-molecule resolution allows footprinting of protein and nucleosome binding, and determination of the combinatorial promoter epigenetic signature on individual molecules. Long-read sequencing makes it possible to robustly assign reads to haplotypes, allowing us to generate a fully phased human epigenome, consisting of chromosome-level allele-specific profiles of CpG methylation and chromatin accessibility. We further apply this to a breast cancer model to evaluate differential methylation and accessibility between cancerous and noncancerous cells.
Subjects
Breast Neoplasms/genetics , Chromatin/genetics , DNA Methylation/genetics , Nanopore Sequencing/methods , Tumor Cell Line , CpG Islands/genetics , DNA/metabolism , Epigenome/genetics , Female , Human Genome/genetics , Humans , MCF-7 Cells , Methyltransferases/metabolism , Genetic Promoter Regions/genetics , DNA Sequence Analysis
ABSTRACT
Estimates of correlation between pairs of genes in co-expression analysis are commonly used to construct networks among genes using gene expression data. As previously noted, the distribution of such correlations depends on the observed expression level of the involved genes, which we refer to as a mean-correlation relationship in RNA-seq data, both bulk and single-cell. This dependence introduces an unwanted technical bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it is not reflecting biology. Ignoring this bias can lead to missing potentially biologically relevant pairs of genes that are lowly expressed, such as transcription factors. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that spatial quantile normalization removes the mean-correlation relationship and corrects the expression bias in network reconstruction.
Subjects
Gene Expression Profiling , Transcription Factors , RNA Sequence Analysis/methods , Transcription Factors/genetics , Exome Sequencing
ABSTRACT
Uremic symptoms are common in patients with advanced chronic kidney disease, but the toxins that cause these symptoms are unknown. To evaluate this, we performed a cross-sectional study of the 12-month post-randomization follow-up visit of Modification of Diet in Renal Disease (MDRD) participants reporting uremic symptoms who also had available stored serum. We quantified 1,163 metabolites by liquid chromatography-tandem mass spectrometry. For each uremic symptom, we calculated a score as the severity multiplied by the number of days the symptom was experienced. We analyzed the associations of the individual symptom scores with metabolites using linear models with empirical Bayesian inference, adjusted for multiple comparisons. Among 695 participants, the mean measured glomerular filtration rate (mGFR) was 28 mL/min/1.73 m². Uremic symptoms were more common in the subgroup of 214 patients with an mGFR under 20 mL/min/1.73 m² (mGFR under 20 subgroup) than in the full group. For all metabolites with significant associations, the direction of the association was concordant in the full group and the subgroup. For gastrointestinal symptoms (bad taste, loss of appetite, nausea, and vomiting), eleven metabolites were associated with symptoms. For neurologic symptoms (decreased alertness, falling asleep during the day, forgetfulness, lack of pep and energy, and tiring easily/weakness), seven metabolites were associated with symptoms. Associations were consistent across sensitivity analyses. Thus, our proof-of-principle study demonstrates the potential for metabolomics to understand metabolic pathways associated with uremic symptoms. Larger, prospective studies with external validation are needed.
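The symptom score described above is a simple product, sketched below. The function name is hypothetical, and the abstract does not give the severity scale or the recall window, so the numbers in the usage note are placeholders rather than study data.

```python
def symptom_score(severity, days_experienced):
    """Symptom score: severity multiplied by the number of days experienced."""
    if severity < 0 or days_experienced < 0:
        raise ValueError("severity and days_experienced must be non-negative")
    return severity * days_experienced
```

For example, a symptom rated at severity 3 and experienced on 10 days would score 30, while a symptom that was never experienced scores 0 regardless of its rated severity.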
Subjects
Chronic Renal Insufficiency , Bayes Theorem , Cross-Sectional Studies , Glomerular Filtration Rate , Humans , Metabolomics , Prospective Studies , Chronic Renal Insufficiency/complications , Chronic Renal Insufficiency/diagnosis
ABSTRACT
Coding variants in epigenetic regulators are emerging as causes of neurological dysfunction and cancer. However, a comprehensive effort to identify disease candidates within the human epigenetic machinery (EM) has not been performed; it is unclear whether features exist that distinguish between variation-intolerant and variation-tolerant EM genes, and between EM genes associated with neurological dysfunction versus cancer. Here, we rigorously define 295 genes with a direct role in epigenetic regulation (writers, erasers, remodelers, readers). Systematic exploration of these genes reveals that although individual enzymatic functions are always mutually exclusive, readers often also exhibit enzymatic activity (dual-function EM genes). We find that the majority of EM genes are very intolerant to loss-of-function variation, even when compared to the dosage-sensitive transcription factors, and we identify 102 novel EM disease candidates. We show that this variation intolerance is driven by the protein domains encoding the epigenetic function, suggesting that disease is caused by a perturbed chromatin state. We then describe a large subset of EM genes that are coexpressed within multiple tissues. This subset is almost exclusively populated by extremely variation-intolerant genes and shows enrichment for dual-function EM genes. It is also highly enriched for genes associated with neurological dysfunction, even when accounting for dosage sensitivity, but not for cancer-associated EM genes. Finally, we show that regulatory regions near epigenetic regulators are genetically important for common neurological traits. These findings prioritize novel disease candidate EM genes and suggest that this coexpression plays a functional role in normal neurological homeostasis.
Subjects
Genetic Epigenesis , Nervous System Diseases/genetics , Genetic Polymorphism , Chromatin Assembly and Disassembly , Humans , Loss of Function Mutation , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
In the study of DNA methylation, genetic variation between species, strains or individuals can result in CpG sites that are exclusive to a subset of samples, and insertions and deletions can rearrange the spatial distribution of CpGs. How to account for this variation in an analysis of the interplay between sequence variation and DNA methylation is not well understood, especially when the number of CpG differences between samples is large. Here, we use whole-genome bisulfite sequencing data on two highly divergent mouse strains to study this problem. We show that alignment to personal genomes is necessary for valid methylation quantification. We introduce a method for including strain-specific CpGs in differential analysis, and show that this increases power. We apply our method to a human normal-cancer dataset, and show this improves accuracy and power, illustrating the broad applicability of our approach. Our method uses smoothing to impute methylation levels at strain-specific sites, thereby allowing strain-specific CpGs to contribute to the analysis, while accounting for differences in the spatial occurrences of CpGs. Our results have implications for joint analysis of genetic variation and DNA methylation using bisulfite-converted DNA, and unlock the use of personal genomes for addressing this question.
Subjects
Genetic Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Animals , CpG Islands/genetics , DNA Methylation/genetics , Genetic Epigenesis , Human Genome/genetics , Genotype , Humans , Mice , DNA Sequence Analysis
ABSTRACT
Kabuki syndrome is a Mendelian intellectual disability syndrome caused by mutations in either of two genes (KMT2D and KDM6A) involved in chromatin accessibility. We previously showed that an agent that promotes chromatin opening, the histone deacetylase inhibitor (HDACi) AR-42, ameliorates the deficiency of adult neurogenesis in the granule cell layer of the dentate gyrus and rescues hippocampal memory defects in a mouse model of Kabuki syndrome (Kmt2d+/βGeo). Unlike a drug, a dietary intervention could be quickly transitioned to the clinic. Therefore, we have explored whether treatment with a ketogenic diet could lead to a similar rescue through increased amounts of beta-hydroxybutyrate, an endogenous HDACi. Here, we report that a ketogenic diet in Kmt2d+/βGeo mice modulates H3ac and H3K4me3 in the granule cell layer, with concomitant rescue of both the neurogenesis defect and hippocampal memory abnormalities seen in Kmt2d+/βGeo mice; similar effects on neurogenesis were observed on exogenous administration of beta-hydroxybutyrate. These data suggest that dietary modulation of epigenetic modifications through elevation of beta-hydroxybutyrate may provide a feasible strategy to treat the intellectual disability seen in Kabuki syndrome and related disorders.
Subjects
Multiple Abnormalities/diet therapy , Ketogenic Diet/methods , Face/abnormalities , Hematologic Diseases/diet therapy , Hippocampus/metabolism , Histones/biosynthesis , Intellectual Disability/diet therapy , Neurogenesis/physiology , Vestibular Diseases/diet therapy , 3-Hydroxybutyric Acid/metabolism , Multiple Abnormalities/genetics , Animals , Animal Disease Models , Hematologic Diseases/genetics , Hippocampus/cytology , Histone Demethylases/genetics , Histone-Lysine N-Methyltransferase/genetics , Intellectual Disability/genetics , Mice , Inbred C57BL Mice , Transgenic Mice , Myeloid-Lymphoid Leukemia Protein/genetics , Neurogenesis/genetics , Vestibular Diseases/genetics
ABSTRACT
Recent genome-wide association studies (GWAS) identified numerous schizophrenia (SZ) and Alzheimer's disease (AD) associated loci, most outside protein-coding regions and hypothesized to affect gene transcription. We used a massively parallel reporter assay to screen 1,049 SZ and 30 AD variants in 64 and nine loci, respectively, for allele differences in driving reporter gene expression. A library of synthetic oligonucleotides assaying each allele five times was transfected into K562 chronic myelogenous leukemia lymphoblasts and SK-SY5Y human neuroblastoma cells. One hundred forty-eight variants showed allelic differences in K562 and 53 in SK-SY5Y cells, on average 2.6 variants per locus. Nine showed significant differences in both lines, a modest overlap reflecting different regulatory landscapes of these lines that also differ significantly in chromatin marks. Eight of nine were in the same direction. We observe no preference for risk alleles to increase or decrease expression. We find a positive correlation between the number of SNPs in linkage disequilibrium and the proportion of functional SNPs, supporting combinatorial effects that may lead to haplotype selection. Our results prioritize future functional follow-up of disease-associated SNPs to determine the driver GWAS variant(s) at each locus and enhance our understanding of gene regulation dynamics.
Subjects
Alzheimer Disease/genetics , Gene Expression Regulation/genetics , Schizophrenia/genetics , Alleles , Tumor Cell Line , Gene Expression/genetics , Gene Frequency/genetics , Genetic Predisposition to Disease , Genetic Variation/genetics , Genome-Wide Association Study/methods , Haplotypes , Humans , K562 Cells , Linkage Disequilibrium , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci
ABSTRACT
BACKGROUND: Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad-hoc methods. There has been little attention to comparing performance of methods across datasets. RESULTS: We present the mpralm method which we show is calibrated and powerful, by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has smaller impact; we recommend always using at least 4 replicates. An R package is available from the Bioconductor project. CONCLUSIONS: Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve design and analysis of MPRA experiments.
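To make the barcode summarization question concrete, the sketch below contrasts two generic ways of collapsing per-barcode RNA and DNA counts into one activity value for an element: summing counts over barcodes before taking a log ratio, or averaging per-barcode log ratios. It is a hedged illustration, not the mpralm package API: the function names and the pseudocount of 1 are assumptions, and the sketch is in Python although the package itself is in R.

```python
import math


def aggregate_log_ratio(rna_counts, dna_counts, pseudocount=1.0):
    """Sum counts over barcodes first, then take one log2 RNA/DNA ratio."""
    rna_total = sum(rna_counts) + pseudocount
    dna_total = sum(dna_counts) + pseudocount
    return math.log2(rna_total / dna_total)


def average_log_ratio(rna_counts, dna_counts, pseudocount=1.0):
    """Take a log2 RNA/DNA ratio per barcode, then average the ratios."""
    ratios = [
        math.log2((rna + pseudocount) / (dna + pseudocount))
        for rna, dna in zip(rna_counts, dna_counts)
    ]
    return sum(ratios) / len(ratios)
```

The two summaries generally differ for the same element: with RNA counts `[3, 7]` and DNA counts `[1, 3]`, the averaged value is exactly 1.0 while the aggregated value is log2(11/5). Differences of this kind are one reason the choice of summarization method can affect downstream results.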
Subjects
Human Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Linear Models , DNA Sequence Analysis/methods , RNA Sequence Analysis , Software
ABSTRACT
DNA methylation at the 5-position of cytosine (5mC) is an epigenetic modification that regulates gene expression and cellular plasticity in development and disease. The ten-eleven translocation (TET) gene family oxidizes 5mC to 5-hydroxymethylcytosine (5hmC), providing an active mechanism for DNA demethylation, and it may also provide its own regulatory function. Here we applied oxidative bisulfite sequencing to generate whole-genome DNA methylation and hydroxymethylation maps at single-base resolution in human normal liver and lung as well as paired tumor tissues. We found that 5hmC is significantly enriched in CpG island (CGI) shores while depleted in CGIs themselves, especially in active genes, which exhibit a bimodal distribution of 5hmC around CGI that corresponds to H3K4me1 modifications. Hydroxymethylation on promoters, gene bodies, and transcription termination regions (TTRs) showed strong positive correlation with gene expression within and across tissues, suggesting that 5hmC is a marker of active genes and could play a role in gene expression mediated by DNA demethylation. Comparative analysis of methylomes and hydroxymethylomes revealed that 5hmC is significantly enriched in both tissue-specific DMRs (t-DMRs) and cancer-specific DMRs (c-DMRs), and 5hmC is negatively correlated with methylation changes, especially in non-CGI-associated DMRs. These findings revealed novel reciprocity between epigenetic markers at CGI shores corresponding to differential gene expression in normal tissues and matching tumors. Overall, our study provided a comprehensive analysis of the interplay between the methylome, hydroxymethylome, and histone modifications during tumorigenesis.
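Oxidative bisulfite sequencing is usually read out by subtraction: standard bisulfite sequencing reports 5mC and 5hmC together, the oxidative variant reports 5mC alone, and 5hmC at a site is estimated as the difference. The sketch below shows that arithmetic for a single cytosine. The function name and the truncation of negative differences to zero are assumptions for illustration, not the analysis pipeline used in the study.

```python
def estimate_5mc_5hmc(bs_methylated, bs_total, oxbs_methylated, oxbs_total):
    """Estimate 5mC and 5hmC levels at one cytosine from read counts.

    bs_*   : bisulfite reads (unconverted calls reflect 5mC + 5hmC)
    oxbs_* : oxidative bisulfite reads (unconverted calls reflect 5mC only)
    Returns (level_5mc, level_5hmc) as fractions between 0 and 1.
    """
    if bs_total <= 0 or oxbs_total <= 0:
        raise ValueError("read totals must be positive")
    bs_level = bs_methylated / bs_total
    oxbs_level = oxbs_methylated / oxbs_total
    # Sampling noise can make the difference negative; clip at zero here.
    level_5hmc = max(bs_level - oxbs_level, 0.0)
    return oxbs_level, level_5hmc
```

With 8 of 10 unconverted reads in the bisulfite library and 5 of 10 in the oxidative library, the estimates are 0.5 for 5mC and approximately 0.3 for 5hmC. At low coverage the difference is noisy, so per-site estimates are best interpreted alongside read depth.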
Subjects
5-Methylcytosine/analogs & derivatives , DNA Methylation , Liver/chemistry , Lung/chemistry , Neoplasms/genetics , Whole Genome Sequencing/methods , 5-Methylcytosine/analysis , CpG Islands , DNA/chemistry , Genetic Epigenesis , Gene Expression Regulation , Humans , Liver/pathology , Lung/pathology , Organ Specificity , Genetic Promoter Regions
ABSTRACT
DNA methylation is an epigenetic mark thought to be robust to environmental perturbations on a short time scale. Here, we challenge that view by demonstrating that the infection of human dendritic cells (DCs) with a live pathogenic bacterium is associated with rapid and active demethylation at thousands of loci, independent of cell division. We performed an integrated analysis of data on genome-wide DNA methylation, histone mark patterns, chromatin accessibility, and gene expression, before and after infection. We found that infection-induced demethylation rarely occurs at promoter regions and instead localizes to distal enhancer elements, including those that regulate the activation of key immune transcription factors. Active demethylation is associated with extensive epigenetic remodeling, including the gain of histone activation marks and increased chromatin accessibility, and is strongly predictive of changes in the expression levels of nearby genes. Collectively, our observations show that active, rapid changes in DNA methylation in enhancers play a previously unappreciated role in regulating the transcriptional response to infection, even in nonproliferating cells.
Subjects
Bacterial Infections/genetics , DNA Methylation , Dendritic Cells/metabolism , Dendritic Cells/microbiology , Host-Pathogen Interactions/genetics , 5-Methylcytosine/analogs & derivatives , Bacterial Infections/immunology , Bacterial Infections/metabolism , CpG Islands , Cytosine/analogs & derivatives , Cytosine/metabolism , Dendritic Cells/immunology , Genetic Epigenesis , Epigenomics/methods , Gene Expression Regulation , Host-Pathogen Interactions/immunology , Humans , Mycobacterium tuberculosis/immunology , Transcription Factors/metabolism , Tuberculosis/genetics , Tuberculosis/immunology , Tuberculosis/metabolism , Tuberculosis/microbiology
ABSTRACT
Whole-genome bisulfite sequencing (WGBS) allows genome-wide DNA methylation profiling, but the associated high sequencing costs continue to limit its widespread application. We used several high-coverage reference data sets to experimentally determine minimal sequencing requirements. We present data-derived recommendations for minimum sequencing depth for WGBS libraries, highlight what is gained with increasing coverage and discuss the trade-off between sequencing depth and number of assayed replicates.
Subjects
CpG Islands , DNA Methylation , DNA Sequence Analysis/methods , Brain/physiology , CD4-Positive T-Lymphocytes/physiology , CD8-Positive T-Lymphocytes/physiology , Statistical Data Interpretation , Genetic Databases , Embryonic Stem Cells/physiology , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Sensitivity and Specificity , Sulfites
ABSTRACT
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
Subjects
Computational Biology , Gene Expression Profiling , Genomics/methods , High-Throughput Screening Assays/methods , Software , Programming Languages , User-Computer Interface
ABSTRACT
Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays has the potential to provide a systems-level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics, regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods.
Subjects
Genomics , Computational Biology , MicroRNAs , DNA Sequence Analysis
ABSTRACT
Summary: The minfi package is widely used for analyzing Illumina DNA methylation array data. Here we describe modifications to the minfi package required to support the HumanMethylationEPIC ('EPIC') array from Illumina. We discuss methods for the joint analysis and normalization of data from the HumanMethylation450 ('450k') and EPIC platforms. We introduce the single-sample Noob (ssNoob) method, a normalization procedure suitable for incremental preprocessing of individual methylation arrays, and conclude that this method should be used when integrating data from multiple generations of Infinium methylation arrays. We show how to use reference 450k datasets to estimate cell type composition of samples on EPIC arrays. The cumulative effect of these updates is to ensure that minfi provides the tools to best integrate existing and forthcoming Illumina methylation array data. Availability and Implementation: The minfi package version 1.19.12 or higher is available for all platforms from the Bioconductor project. Contact: khansen@jhsph.edu. Supplementary information: Supplementary data are available at Bioinformatics online.