ABSTRACT
The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.
Subjects
Software , Humans , Computational Biology/methods , Leukocytes, Mononuclear/metabolism , Leukocytes, Mononuclear/cytology , Genomics/methods , Data Analysis
ABSTRACT
The development of cancer is intimately associated with genetic abnormalities that target proteins with intrinsically disordered regions (IDRs). In human haematological malignancies, recurrent chromosomal translocation of nucleoporin (NUP98 or NUP214) generates an aberrant chimera that invariably retains the nucleoporin IDR (tandemly dispersed repeats of phenylalanine and glycine residues) [1,2]. However, how unstructured IDRs contribute to oncogenesis remains unclear. Here we show that IDRs contained within NUP98-HOXA9, a homeodomain-containing transcription factor chimera recurrently detected in leukaemias [1,2], are essential for establishing liquid-liquid phase separation (LLPS) puncta of the chimera and for inducing leukaemic transformation. Notably, LLPS of NUP98-HOXA9 not only promotes chromatin occupancy of chimera transcription factors, but also is required for the formation of a broad 'super-enhancer'-like binding pattern typically seen at leukaemogenic genes, which potentiates transcriptional activation. An artificial HOX chimera, created by replacing the phenylalanine and glycine repeats of NUP98 with an unrelated LLPS-forming IDR of the FUS protein [3,4], had similar enhancing effects on the genome-wide binding and target gene activation of the chimera. Deeply sequenced Hi-C revealed that phase-separated NUP98-HOXA9 induces CTCF-independent chromatin loops that are enriched at proto-oncogenes. Together, this report describes a proof-of-principle example in which cancer acquires a mutation to establish oncogenic transcription factor condensates via phase separation, which simultaneously enhances their genomic targeting and induces organization of aberrant three-dimensional chromatin structure during tumourous transformation. As LLPS-competent molecules are frequently implicated in diseases [1,2,4-7], this mechanism can potentially be generalized to many malignant and pathological settings.
Subjects
Chromatin/genetics , Homeodomain Proteins/genetics , Intrinsically Disordered Proteins/genetics , Neoplasms/pathology , Nuclear Pore Complex Proteins/genetics , Translocation, Genetic , Animals , Carcinogenesis , Female , HEK293 Cells , HeLa Cells , Humans , Mice , Mice, Inbred BALB C , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Transcription Factors/genetics , Transcriptional Activation
ABSTRACT
Three-dimensional (3D) chromatin structure has been shown to play a role in regulating gene transcription during biological transitions. Although our understanding of loop formation and maintenance is rapidly improving, much less is known about the mechanisms driving changes in looping and the impact of differential looping on gene transcription. One limitation has been a lack of well-powered differential looping data sets. To address this, we conducted a deeply sequenced Hi-C time course of megakaryocyte development comprising four biological replicates and 6 billion reads per time point. Statistical analysis revealed 1503 differential loops. Gained loop anchors were enriched for AP-1 occupancy and were characterized by large increases in histone H3K27ac (over 11-fold) but relatively small increases in CTCF and RAD21 binding (1.26- and 1.23-fold, respectively). Linear modeling revealed that changes in histone H3K27ac, chromatin accessibility, and JUN binding were better correlated with changes in looping than RAD21 and almost as well correlated as CTCF. Changes to epigenetic features between, rather than at, boundaries were highly predictive of changes in looping. Together these data suggest that although CTCF and RAD21 may be the core machinery dictating where loops form, other features (both at the anchors and within the loop boundaries) may play a larger role than previously anticipated in determining the relative loop strength across cell types and conditions.
Subjects
Chromatin , Histones , Histones/metabolism , CCCTC-Binding Factor/genetics , CCCTC-Binding Factor/metabolism , Chromatin/genetics , Chromosomes/metabolism , Cell Differentiation/genetics
ABSTRACT
MOTIVATION: 3D chromatin structure plays an important role in regulating gene expression, and alterations to this structure can result in developmental abnormalities and disease. While genomic approaches like Hi-C and Micro-C can provide valuable insights into 3D chromatin architecture, the resulting datasets are extremely large and difficult to manipulate. RESULTS: Here, we present mariner, a rapid and memory-efficient tool to extract, aggregate, and plot data from Hi-C matrices within the R/Bioconductor environment. Mariner simplifies the process of querying and extracting contacts from multiple Hi-C files using a parallel and block-processing approach. Modular functions allow complete workflow customization for advanced users, yet all-in-one functions are available for running the most common types of analyses. Finally, tight integration with existing Bioconductor infrastructure enables complete analysis and visualization of Hi-C data in R. AVAILABILITY AND IMPLEMENTATION: Available on GitHub at https://github.com/EricSDavis/mariner and on Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/mariner.html.
Subjects
Chromatin , Software , Chromatin/metabolism , Chromatin/chemistry , Genomics/methods , Humans , Computational Biology/methods
ABSTRACT
MOTIVATION: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow the selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets making them difficult to integrate into genomic workflows. RESULTS: To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework. AVAILABILITY AND IMPLEMENTATION: Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.
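The matching step described above can be illustrated with a minimal Python sketch. It assumes that a propensity score has already been estimated for every focal and background item (for example by logistic regression on the covariates) and then performs greedy nearest-neighbour matching without replacement. The function name `nearest_match` and the toy scores are hypothetical; this is not the matchRanges implementation or its API, which operates on Bioconductor range objects.

```python
def nearest_match(focal_scores, pool_scores):
    """Greedy nearest-neighbour matching without replacement.

    For each focal score, pick the index of the closest unused pool score.
    Scores are assumed to be precomputed propensity scores (hypothetical input).
    """
    if len(pool_scores) < len(focal_scores):
        raise ValueError("pool must be at least as large as the focal set")
    available = set(range(len(pool_scores)))
    matched = []
    for score in focal_scores:
        # Ties are broken by the lower pool index so the result is deterministic
        best = min(available, key=lambda i: (abs(pool_scores[i] - score), i))
        matched.append(best)
        available.remove(best)
    return matched


focal = [0.2, 0.8]
pool = [0.1, 0.25, 0.5, 0.79, 0.9]
matched_idx = nearest_match(focal, pool)
print(matched_idx)
```

Greedy matching depends on the order of the focal items, so it is only one of several possible matching strategies.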
Subjects
Genomics , Software , Genomics/methods , Genome , Regulatory Sequences, Nucleic Acid , Research Design
ABSTRACT
MOTIVATION: Enrichment analysis is a widely utilized technique in genomic analysis that aims to determine if there is a statistically significant association between two sets of genomic features. To conduct this type of hypothesis testing, an appropriate null model is typically required. However, the null distribution that is commonly used can be overly simplistic and may result in inaccurate conclusions. RESULTS: bootRanges provides fast functions for generation of block bootstrapped genomic ranges representing the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow test statistic null distributions and over-estimation of statistical significance, while creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. It can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g. log fold change (logFC) from differential analysis. AVAILABILITY AND IMPLEMENTATION: bootRanges is freely available in the R/Bioconductor package nullranges hosted at https://bioconductor.org/packages/nullranges.
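To make the block bootstrap idea concrete, here is a minimal Python sketch for a single chromosome. It tiles the chromosome with fixed-length blocks and fills each tile with the ranges copied from a randomly chosen source block, so ranges that were close together stay close together. The function name `block_bootstrap`, the `(start, end)` tuple representation, and the choice to keep only ranges lying fully inside a source block are simplifying assumptions; this is not the bootRanges implementation.

```python
import random


def block_bootstrap(ranges, chrom_length, block_length, seed=0):
    """Resample (start, end) ranges on one chromosome by block bootstrap."""
    if block_length > chrom_length:
        raise ValueError("block_length must not exceed chrom_length")
    rng = random.Random(seed)
    n_tiles = -(-chrom_length // block_length)  # ceiling division
    resampled = []
    for tile in range(n_tiles):
        tile_start = tile * block_length
        # Draw a source block (with replacement) anywhere on the chromosome
        src_start = rng.randrange(0, chrom_length - block_length + 1)
        shift = tile_start - src_start
        for start, end in ranges:
            # Keep ranges fully contained in the source block (simplification)
            if src_start <= start and end <= src_start + block_length:
                new_start, new_end = start + shift, end + shift
                if new_end <= chrom_length:
                    resampled.append((new_start, new_end))
    return sorted(resampled)


features = [(100, 150), (160, 200), (5000, 5050)]
null_set = block_bootstrap(features, chrom_length=10_000, block_length=1_000, seed=1)
print(null_set)
```

Repeating the call with different seeds would give a collection of null range sets from which a test statistic's null distribution could be estimated; by contrast, shuffling each range independently would discard the clustering of the first two features.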
Subjects
Genome , Genomics , Genomics/methods , Software
ABSTRACT
SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For the human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.
Subjects
Genome, Human , Software , Animals , Humans , Mice , Uncertainty
ABSTRACT
MOTIVATION: The R programming language is one of the most widely used programming languages for transforming raw genomic datasets into meaningful biological conclusions through analysis and visualization, which has been largely facilitated by infrastructure and tools developed by the Bioconductor project. However, existing plotting packages rely on relative positioning and sizing of plots, which is often sufficient for exploratory analysis but is poorly suited for the creation of publication-quality multi-panel images inherent to scientific manuscript preparation. RESULTS: We present plotgardener, a coordinate-based genomic data visualization package that offers a new paradigm for multi-plot figure generation in R. Plotgardener allows precise, programmatic control over the placement, esthetics and arrangements of plots while maximizing user experience through fast and memory-efficient data access, support for a wide variety of data and file types, and tight integration with the Bioconductor environment. Plotgardener also allows precise placement and sizing of ggplot2 plots, making it an invaluable tool for R users and data scientists from virtually any discipline. AVAILABILITY AND IMPLEMENTATION: Package: https://bioconductor.org/packages/plotgardener, Code: https://github.com/PhanstielLab/plotgardener, Documentation: https://phanstiellab.github.io/plotgardener/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Programming Languages , Software , Genomics , Genome , Data Visualization
ABSTRACT
INTRODUCTION: Alveolar macrophages (AMs) are lung-resident immune cells that phagocytose inhaled particles and pathogens, and help coordinate the lung's immune response to infection. Little is known about the impact of chronic e-cigarette use (ie, vaping) on this important pulmonary cell type. Thus, we determined the effect of vaping on AM phenotype and gene expression. AIMS AND METHODS: We recruited never-smokers, smokers, and e-cigarette users (vapers) and performed research bronchoscopies to isolate AMs from bronchoalveolar lavage fluid samples and epithelial cells from bronchial brushings. We then performed morphological analyses and used the Nanostring platform to look for changes in gene expression. RESULTS: AMs obtained from smokers and vapers were phenotypically distinct from those obtained from nonsmokers, and from each other. Immunocytochemistry revealed that vapers' AMs had significantly elevated inducible nitric oxide synthase (M1) expression and significantly reduced CD301a (M2) expression compared with nonsmokers or smokers. Vapers' AMs and bronchial epithelia exhibited unique changes in gene expression compared with nonsmokers or smokers. Moreover, vapers' AMs were the most affected of all groups and had 124 genes uniquely downregulated. Gene ontology analysis revealed that vapers and smokers had opposing changes in biological processes. CONCLUSIONS: These data indicate that vaping causes unique changes to AMs and bronchial epithelia compared with nonsmokers and smokers which may impact pulmonary host defense. IMPLICATIONS: These data indicate that normal "healthy" vapers have altered AMs and may be at risk of developing abnormal immune responses to inflammatory stimuli.
Subjects
Electronic Nicotine Delivery Systems , Tobacco Products , Vaping , Gene Expression , Humans , Macrophages, Alveolar , Vaping/adverse effects
ABSTRACT
The e-liquids used in electronic cigarettes (E-cigs) consist of propylene glycol (PG), vegetable glycerin (VG), nicotine, and chemical additives for flavoring. There are currently over 7,700 e-liquid flavors available, and while some have been tested for toxicity in the laboratory, most have not. Here, we developed a 3-phase, 384-well, plate-based, high-throughput screening (HTS) assay to rapidly triage and validate the toxicity of multiple e-liquids. Our data demonstrated that the PG/VG vehicle adversely affected cell viability and that a large number of e-liquids were more toxic than PG/VG. We also performed gas chromatography-mass spectrometry (GC-MS) analysis on all tested e-liquids. Subsequent nonmetric multidimensional scaling (NMDS) analysis revealed that e-liquids are an extremely heterogeneous group. Furthermore, these data indicated that (i) the more chemicals contained in an e-liquid, the more toxic it was likely to be and (ii) the presence of vanillin was associated with higher toxicity values. Further analysis of common constituents by electron ionization revealed that the concentration of cinnamaldehyde and vanillin, but not triacetin, correlated with toxicity. We have also developed a publicly available searchable website (www.eliquidinfo.org). Given the large numbers of available e-liquids, this website will serve as a resource to facilitate dissemination of this information. Our data suggest that an HTS approach to evaluate the toxicity of multiple e-liquids is feasible. Such an approach may serve as a roadmap to enable bodies such as the Food and Drug Administration (FDA) to better regulate e-liquid composition.
Subjects
Electronic Nicotine Delivery Systems , Flavoring Agents/toxicity , Glycerol/toxicity , Nicotine/toxicity , Propylene Glycol/toxicity , Cell Survival/drug effects , Cells, Cultured , Computational Biology , Epithelial Cells/drug effects , Flavoring Agents/chemistry , Gas Chromatography-Mass Spectrometry , HEK293 Cells , Humans , Toxicity Tests
ABSTRACT
"Pod-based" e-cigarettes such as JUUL are currently the most prevalent electronic nicotine delivery systems (ENDS) in the United States. JUUL-type ENDS utilize nicotine salts protonated with benzoic acid rather than freebase nicotine. However, limited information is available on the cellular effects of these products. Cytoplasmic Ca2+ is a universal second messenger that controls many cellular functions including cell growth and cell death. Of note, dysregulation of cell Ca2+ homeostasis has been linked with several disease processes including autoimmune disease and several types of cancer. We exposed HEK293T cells and THP-1 macrophage-like cells to different JUUL e-liquids. We evaluated their effects on cellular viability and Ca2+ signaling by measuring fluorescence from calcein-AM/propidium iodide and Fluo-4, respectively. E-liquid autofluorescence was used to look for e-liquid permeation into cells. To identify the mechanisms behind the Ca2+ responses, different inhibitors of Ca2+ channels and phospholipase C signaling were used. JUUL e-liquids caused significant cytotoxic effects, with "Mint" flavor being the most cytotoxic. The Mint flavored e-liquid also caused a significant elevation in cytoplasmic Ca2+. Using autofluorescence, the permeation of JUUL e-liquids into live cells was confirmed, indicating that intracellular organelles are directly exposed to e-liquids. Further studies identified the endoplasmic reticulum as being the source of e-liquid-induced changes in cytoplasmic Ca2+. Nicotine salt-based e-liquids cause cytotoxicity and elevate cytoplasmic Ca2+, indicating that they can exert biological effects beyond what would be expected with nicotine alone. These effects are flavor-dependent, and we propose that flavored e-liquids be reassessed for potential lung toxicity.
Subjects
Cell Death/drug effects , Cell Survival/drug effects , Cells, Cultured/drug effects , E-Cigarette Vapor/toxicity , Electronic Nicotine Delivery Systems , Flavoring Agents/toxicity , Nicotine/toxicity , Humans , United States
ABSTRACT
A piezoelectric-based method for information storage is presented. It involves engineering the polarization profiles of multiple piezoelectric wafers to enhance/suppress specific electromechanical resonances. These enhanced/suppressed resonances can be used to represent multiple frequency-dependent bits, thus enabling multi-level information storage. This multi-level information storage is demonstrated by achieving three information states for a ternary encoding. Using the three information states, we present an approach to encode and decode information from a 2-by-3 array of piezoelectric wafers that we refer to as a concept Piezoelectric Quick Response (PQR) code. The scaling relation between the number of wafers used and the cumulative number of information states that can be achieved with the proposed methodology is briefly discussed. Potential applications of this methodology include tamper-evident devices, embedded product tags in manufacturing/inventory tracking, and additional layers of security with existing information storage technologies.
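The ternary encoding can be illustrated with a short Python sketch. It assumes, purely for illustration, that each wafer independently holds one of three states labeled 0, 1 and 2, so that a 2-by-3 array of six wafers could represent 3^6 = 729 distinct values; the abstract does not state its scaling relation, and the function names and the row-major grid layout are hypothetical.

```python
def encode_ternary(value, n_wafers=6):
    """Convert an integer to a fixed-length list of base-3 digits (most significant first)."""
    if not 0 <= value < 3 ** n_wafers:
        raise ValueError("value does not fit in the given number of wafers")
    digits = []
    for _ in range(n_wafers):
        value, digit = divmod(value, 3)
        digits.append(digit)
    return digits[::-1]


def decode_ternary(digits):
    """Convert a list of base-3 digits (most significant first) back to an integer."""
    value = 0
    for digit in digits:
        value = value * 3 + digit
    return value


def as_grid(digits, rows=2, cols=3):
    """Lay the digits out row by row, mimicking a 2-by-3 wafer array (assumed layout)."""
    if len(digits) != rows * cols:
        raise ValueError("digit count must equal rows * cols")
    return [digits[r * cols:(r + 1) * cols] for r in range(rows)]


states = encode_ternary(100)
print(as_grid(states))         # [[0, 1, 0], [2, 0, 1]]
print(decode_ternary(states))  # 100
```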
ABSTRACT
RATIONALE: E-cigarettes vaporize propylene glycol/vegetable glycerin (PG/VG), nicotine, and flavorings. However, the long-term health effects of exposing lungs to vaped e-liquids are unknown. OBJECTIVES: To determine the effects of chronic vaping on pulmonary epithelia. METHODS: We performed research bronchoscopies on healthy nonsmokers, cigarette smokers, and e-cigarette users (vapers) and obtained bronchial brush biopsies and lavage samples from these subjects for proteomic investigation. We further employed in vitro and murine exposure models to support our human findings. MEASUREMENTS AND MAIN RESULTS: Visual inspection by bronchoscopy revealed that vaper airways appeared friable and erythematous. Epithelial cells from biopsy samples revealed approximately 300 proteins that were differentially expressed in smoker and vaper airways, with only 78 proteins being commonly altered in both groups and 113 uniquely altered in vapers. For example, CYP1B1 (cytochrome P450 family 1 subfamily B member 1), MUC5AC (mucin 5 AC), and MUC4 levels were increased in vapers. Aerosolized PG/VG alone significantly increased MUC5AC protein in human airway epithelial cultures and in murine nasal epithelia in vivo. We also found that e-liquids rapidly entered cells and that PG/VG reduced membrane fluidity and impaired protein diffusion. CONCLUSIONS: We conclude that chronic vaping exerts marked biological effects on the lung and that these effects may in part be mediated by the PG/VG base. These changes are likely not harmless and may have clinical implications for the development of chronic lung disease. Further studies will be required to determine the full extent of vaping on the lung.
Subjects
Bronchi/drug effects , Electronic Nicotine Delivery Systems , Epithelial Cells/drug effects , Lung/drug effects , Nicotine/adverse effects , Proteome/drug effects , Smokers , Adult , Female , Humans , Male , Middle Aged
ABSTRACT
Implantation is a complex event demanding contributions from both embryo and endometrium. Despite advances in assisted reproduction, endometrial receptivity defects persist as a barrier to successful implantation in women with infertility. We previously demonstrated that maternal haploinsufficiency for the endocrine peptide adrenomedullin (AM) in mice confers a subfertility phenotype characterized by defective uterine receptivity and sparse epithelial pinopode coverage. The strong link between AM and implantation suggested the compelling hypothesis that administration of AM prior to implantation may improve fertility, protect against pregnancy complications, and ultimately lead to better maternal and fetal outcomes. Here, we demonstrate that intrauterine delivery of AM prior to blastocyst transfer improves the embryo implantation rate and spacing within the uterus. We then use genetic decrease-of-function and pharmacologic gain-of-function mouse models to identify potential mechanisms by which AM confers enhanced implantation success. In epithelium, we find that AM accelerates the kinetics of pinopode formation and water transport and that, in stroma, AM promotes connexin 43 expression, gap junction communication, and barrier integrity of the primary decidual zone. Ultimately, our findings advance our understanding of the contributions of AM to uterine receptivity and suggest potential broad use for AM as therapy to encourage healthy embryo implantation, for example, in combination with in vitro fertilization.
Subjects
Adrenomedullin/pharmacology , Endometrium/cytology , Endometrium/drug effects , Fertility Agents, Female/pharmacology , Fertility/drug effects , Intercellular Junctions/drug effects , Uterus/cytology , Uterus/drug effects , Animals , Cell Communication/drug effects , Connexin 43/biosynthesis , Decidua/cytology , Decidua/drug effects , Embryo Implantation/drug effects , Embryo Transfer , Female , Gap Junctions/drug effects , Humans , Mice , Mice, Knockout , Water/metabolism
ABSTRACT
Triple-negative breast cancer (TNBC) is the most therapeutically recalcitrant form of breast cancer, which is due in part to the paucity of targeted therapies. A systematic analysis of regulatory elements that extend beyond protein-coding genes could uncover avenues for therapeutic intervention. To this end, we analyzed the regulatory mechanisms of TNBC-specific transcriptional enhancers together with their noncoding enhancer RNA (eRNA) transcripts. The functions of the top 30 eRNA-producing super-enhancers were systematically probed using high-throughput CRISPR-interference assays coupled to RNA sequencing that enabled unbiased detection of target genes genome-wide. Generation of high-resolution Hi-C chromatin interaction maps enabled annotation of the direct target genes for each super-enhancer, which highlighted their proclivity for genes that portend worse clinical outcomes in patients with TNBC. Illustrating the utility of this dataset, deletion of an identified super-enhancer controlling the nearby PODXL gene or specific degradation of its eRNAs led to profound inhibitory effects on target gene expression, cell proliferation, and migration. Furthermore, loss of this super-enhancer suppressed tumor growth and metastasis in TNBC mouse xenograft models. Single-cell RNA sequencing and assay for transposase-accessible chromatin with high-throughput sequencing analyses demonstrated the enhanced activity of this super-enhancer within the malignant cells of TNBC tumor specimens compared with nonmalignant cell types. Collectively, this work examines several fundamental questions about how regulatory information encoded into eRNA-producing super-enhancers drives gene expression networks that underlie the biology of TNBC. 
Significance: Integrative analysis of eRNA-producing super-enhancers defines molecular mechanisms controlling global patterns of gene expression that regulate clinical outcomes in breast cancer, highlighting the potential of enhancers as biomarkers and therapeutic targets.
Subjects
Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Triple Negative Breast Neoplasms , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Humans , Animals , Female , Mice , Disease Progression , Cell Proliferation/genetics , Cell Line, Tumor , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , CRISPR-Cas Systems
ABSTRACT
The growth of omic data presents evolving challenges in data manipulation, analysis, and integration. Addressing these challenges, Bioconductor [1] provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming [2] offers a revolutionary standard for data organisation and manipulation. Here, we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning, and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analysing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas [3], spanning six data frameworks and ten analysis tools.
ABSTRACT
Motivation: Three-dimensional chromatin structure plays an important role in gene regulation by connecting regulatory regions and gene promoters. The ability to detect the formation and loss of these loops in various cell types and conditions provides valuable information on the mechanisms driving these cell states and is critical for understanding long-range gene regulation. Hi-C is a powerful technique for characterizing 3D chromatin structure; however, Hi-C can quickly become costly and labor-intensive, and proper planning is required to ensure efficient use of time and resources while maintaining experimental rigor and well-powered results. Results: To facilitate better planning and interpretation of human Hi-C experiments, we conducted a detailed evaluation of statistical power using publicly available Hi-C datasets, paying particular attention to the impact of loop size on Hi-C contacts and fold change compression. In addition, we have developed Hi-C Poweraid, a publicly hosted web application to investigate these findings. For experiments involving well-replicated cell lines, we recommend a total sequencing depth of at least 6 billion contacts per condition, split between at least two replicates to achieve the power to detect differences in the majority of loops. For experiments with higher variation, more replicates and deeper sequencing depths are required. Values for specific cases can be determined by using Hi-C Poweraid. This tool simplifies Hi-C power calculations, allowing for more efficient use of time and resources and more accurate interpretation of experimental results. Availability and implementation: Hi-C Poweraid is available as an R Shiny application deployed at http://phanstiel-lab.med.unc.edu/poweraid/, with code available at https://github.com/sarmapar/poweraid.
ABSTRACT
3D chromatin structure plays an important role in gene regulation by connecting regulatory regions and gene promoters. The ability to detect the formation and loss of these loops in various cell types and conditions provides valuable information on the mechanisms driving these cell states and is critical for understanding how long-range gene regulation works. Hi-C is a powerful technique used to characterize three-dimensional chromatin structure; however, Hi-C can quickly become a costly and labor-intensive endeavor, and proper planning is required to determine how to best use time and resources while maintaining experimental rigor and well-powered results. To facilitate better planning and interpretation of Hi-C experiments, we have conducted a detailed evaluation of statistical power using publicly available Hi-C datasets, paying particular attention to the impact of loop size on Hi-C contacts and fold change compression. In addition, we have developed Hi-C Poweraid, a publicly hosted web application to investigate these findings (http://phanstiel-lab.med.unc.edu/poweraid/). For experiments involving well-replicated cell lines, we recommend a total sequencing depth of at least 6 billion contacts per condition, split between at least 2 replicates, in order to achieve the power to detect the majority of differential loops. For experiments with higher variation, more replicates and deeper sequencing depths are required. Exact values and recommendations for specific cases can be determined through the use of Hi-C Poweraid. This tool simplifies the complexities behind calculating power for Hi-C data and will provide useful information on the number of well-powered loops an experiment will be able to detect given a specific set of experimental parameters, such as sequencing depth, replicates, and the sizes of the loops of interest. This will allow for more efficient use of time and resources and more accurate interpretation of experimental results.
ABSTRACT
During mouse embryogenesis, expression of the long non-coding RNA (lncRNA) Airn leads to gene repression and recruitment of Polycomb repressive complexes (PRCs) to varying extents over a 15-Mb domain. The mechanisms remain unclear. Using high-resolution approaches, we show in mouse trophoblast stem cells that Airn expression induces long-range changes to chromatin architecture that coincide with PRC-directed modifications and center around CpG island promoters that contact the Airn locus even in the absence of Airn expression. Intensity of contact between the Airn lncRNA and chromatin correlated with underlying intensity of PRC recruitment and PRC-directed modifications. Deletion of CpG islands that contact the Airn locus altered long-distance repression and PRC activity in a manner that correlated with changes in chromatin architecture. Our data imply that the extent to which Airn expression recruits PRCs to chromatin is controlled by DNA regulatory elements that modulate proximity of the Airn lncRNA product to its target DNA.
Subjects
RNA, Long Noncoding , Animals , Mice , Chromatin , Embryonic Development , Polycomb-Group Proteins/genetics , Polycomb-Group Proteins/metabolism , Promoter Regions, Genetic/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
ABSTRACT
Nuclear compartments are prominent features of 3D chromatin organization, but sequencing depth limitations have impeded investigation at ultra fine-scale. CTCF loops are generally studied at a finer scale, but the impact of looping on proximal interactions remains enigmatic. Here, we critically examine nuclear compartments and CTCF loop-proximal interactions using a combination of in situ Hi-C at unparalleled depth, algorithm development, and biophysical modeling. Producing a large Hi-C map with 33 billion contacts in conjunction with an algorithm for performing principal component analysis on sparse, super massive matrices (POSSUMM), we resolve compartments to 500 bp. Our results demonstrate that essentially all active promoters and distal enhancers localize in the A compartment, even when flanking sequences do not. Furthermore, we find that the TSS and TTS of paused genes are often segregated into separate compartments. We then identify diffuse interactions that radiate from CTCF loop anchors, which correlate with strong enhancer-promoter interactions and proximal transcription. We also find that these diffuse interactions depend on CTCF's RNA binding domains. In this work, we demonstrate features of fine-scale chromatin organization consistent with a revised model in which compartments are more precise than commonly thought while CTCF loops are more protracted.