Results 1 - 20 of 28
1.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38837370

ABSTRACT

MOTIVATION: The BigWig and BigBed file formats were originally designed for the visualization of next-generation sequencing data through a genome browser. Due to their versatility, these formats have long since become ubiquitous for the storage of processed sequencing data and regularly serve as the basis for downstream data analysis. As the number and size of sequencing experiments continues to accelerate, there is an increasing demand to efficiently generate and query BigWig and BigBed files in a scalable and robust manner, and to efficiently integrate these functionalities into data analysis environments and third-party applications. RESULTS: Here, we present Bigtools, a feature-complete, high-performance, and integrable software library for generating and querying both BigWig and BigBed files. Bigtools is written in the Rust programming language and includes a flexible suite of command line tools as well as bindings to Python. AVAILABILITY AND IMPLEMENTATION: Bigtools is cross-platform and released under the MIT license. It is distributed on Crates.io, Bioconda, and the Python Package Index, and the source code is available at https://github.com/jackh726/bigtools.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Programming Languages
2.
PLoS Comput Biol ; 20(5): e1012067, 2024 May.
Article in English | MEDLINE | ID: mdl-38709825

ABSTRACT

Chromosome conformation capture (3C) technologies reveal the incredible complexity of genome organization. Maps of increasing size, depth, and resolution are now used to probe genome architecture across cell states, types, and organisms. Larger datasets add challenges at each step of computational analysis, from storage and memory constraints to researchers' time; however, analysis tools that meet these increased resource demands have not kept pace. Furthermore, existing tools offer limited support for customizing analysis for specific use cases or new biology. Here we introduce cooltools (https://github.com/open2c/cooltools), a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely-adopted cooler format which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface (CLI) and Python application programming interface (API), which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments. In short, cooltools enables the effective use of the latest and largest genome folding datasets.


Subject(s)
Computational Biology , Software , Computational Biology/methods , Programming Languages , Genomics/methods , Genome/genetics , Chromosome Mapping/methods , Humans
3.
PLoS Comput Biol ; 20(5): e1012164, 2024 May.
Article in English | MEDLINE | ID: mdl-38809952

ABSTRACT

The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly-expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present pairtools, a flexible suite of tools for contact extraction from sequencing data. Pairtools provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. The core operations provided by pairtools are parsing of .sam alignments into Hi-C pairs, sorting, and removal of PCR duplicates. In addition, pairtools provides auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking pairtools against popular 3C+ data pipelines shows advantages of pairtools for high-performance and flexible 3C+ analysis. Finally, pairtools provides protocol-specific tools for restriction-based protocols, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes pairtools a versatile foundation for a broad range of 3C+ pipelines.


Subject(s)
Chromosomes , Computational Biology , Software , Chromosomes/genetics , Chromosomes/chemistry , Computational Biology/methods , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromosome Mapping/methods
4.
Bioinformatics ; 40(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38402507

ABSTRACT

MOTIVATION: Genomic intervals are one of the most prevalent data structures in computational genome biology, and used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. RESULTS: Bioframe is a library to enable flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases for computational genome biology by building directly on top of two of the most commonly-used Python libraries, NumPy and Pandas. The bioframe API enables flexible name and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. AVAILABILITY AND IMPLEMENTATION: Bioframe is open-source under MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.
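To make the idea of "operations on genomic intervals" concrete, the following is a minimal sketch of an overlap query in plain Python. It is not Bioframe's API: the `Interval` type and the `overlap` function are hypothetical names introduced only for illustration, and the half-open, BED-style coordinate convention is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical record type; coordinates are assumed half-open (BED-style)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlap(a, b):
    """Return every pair (x, y), x from a and y from b, sharing at least one base."""
    # Group the second set by chromosome so only same-chromosome pairs are compared.
    by_chrom = defaultdict(list)
    for y in b:
        by_chrom[y.chrom].append(y)

    pairs = []
    for x in a:
        for y in by_chrom.get(x.chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if x.start < y.end and y.start < x.end:
                pairs.append((x, y))
    return pairs


# Illustrative data only: two "genes" and three "binding sites".
genes = [Interval("chr1", 100, 200), Interval("chr2", 50, 80)]
sites = [
    Interval("chr1", 150, 160),
    Interval("chr1", 200, 210),  # touches chr1:100-200 but shares no base
    Interval("chr2", 10, 60),
]
hits = overlap(genes, sites)
```

This nested loop is quadratic per chromosome and is meant only to show the semantics of the query. A dataframe-based library would typically express the same question as a single vectorized call.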


Subject(s)
Computational Biology , Genomics , Gene Library , Binding Sites , Data Science
5.
J Cell Biol ; 223(4)2024 04 01.
Article in English | MEDLINE | ID: mdl-38376465

ABSTRACT

DNA methylation (DNAme) is a key epigenetic mark that regulates critical biological processes maintaining overall genome stability. Given its pleiotropic function, studies of DNAme dynamics are crucial, but currently available tools to interfere with DNAme have limitations and major cytotoxic side effects. Here, we present cell models that allow inducible and reversible DNAme modulation through DNMT1 depletion. By dynamically assessing whole genome and locus-specific effects of induced passive demethylation through cell divisions, we reveal a cooperative activity between DNMT1 and DNMT3B, but not of DNMT3A, to maintain and control DNAme. We show that gradual loss of DNAme is accompanied by progressive and reversible changes in heterochromatin, compartmentalization, and peripheral localization. DNA methylation loss coincides with a gradual reduction of cell fitness due to G1 arrest, with minor levels of mitotic failure. Altogether, this system allows DNMTs and DNA methylation studies with fine temporal resolution, which may help to reveal the etiologic link between DNAme dysfunction and human disease.


Subject(s)
DNA (Cytosine-5-)-Methyltransferase 1 , DNA Methylation , DNA Methyltransferase 3A , Epigenomics , Humans , Cell Division , Heterochromatin/genetics , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA Methyltransferase 3A/genetics , Cell Line
6.
bioRxiv ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38370777

ABSTRACT

The BigWig and BigBed file formats were originally designed for the visualization of next-generation sequencing data through a genome browser. Due to their versatility, these formats have long since become ubiquitous for the storage of processed sequencing data and regularly serve as the basis for downstream data analysis. As the number and size of sequencing experiments continues to accelerate, there is an increasing demand to efficiently generate and query BigWig and BigBed files in a scalable and robust manner, and to efficiently integrate these functionalities into data analysis environments and third-party applications. Here, we present Bigtools, a feature-complete, high-performance, and integrable software library for generating and querying both BigWig and BigBed files. Bigtools is written in the Rust programming language and includes a flexible suite of command line tools as well as bindings to Python. Bigtools is cross-platform and released under the MIT license. It is distributed on Crates.io and the Python Package Index, and the source code is available at https://github.com/jackh726/bigtools.

7.
Nat Metab ; 5(5): 861-879, 2023 05.
Article in English | MEDLINE | ID: mdl-37253881

ABSTRACT

Recent large-scale genomic association studies found evidence for a genetic link between increased risk of type 2 diabetes and decreased risk for adiposity-related traits, reminiscent of metabolically obese normal weight (MONW) association signatures. However, the target genes and cellular mechanisms driving such MONW associations remain to be identified. Here, we systematically identify the cellular programmes of one of the top-scoring MONW risk loci, the 2q24.3 risk locus, in subcutaneous adipocytes. We identify a causal genetic variant, rs6712203, an intronic single-nucleotide polymorphism in the COBLL1 gene, which changes the conserved transcription factor motif of POU domain, class 2, transcription factor 2, and leads to differential COBLL1 gene expression by altering the enhancer activity at the locus in subcutaneous adipocytes. We then establish the cellular programme under the genetic control of the 2q24.3 MONW risk locus and the effector gene COBLL1, which is characterized by impaired actin cytoskeleton remodelling in differentiating subcutaneous adipocytes and subsequent failure of these cells to accumulate lipids and develop into metabolically active and insulin-sensitive adipocytes. Finally, we show that perturbations of the effector gene Cobll1 in a mouse model result in organismal phenotypes matching the MONW association signature, including decreased subcutaneous body fat mass and body weight along with impaired glucose tolerance. Taken together, our results provide a mechanistic link between the genetic risk for insulin resistance and low adiposity, providing a potential therapeutic hypothesis and a framework for future identification of causal relationships between genome associations and cellular programmes in other disorders.


Subject(s)
Actins , Adipocytes , Obesity, Metabolically Benign , Humans , Adipocytes/metabolism , Actins/metabolism , Obesity, Metabolically Benign/genetics , Transcription Factors/genetics , Subcutaneous Fat/metabolism , Cells, Cultured , Haplotypes , Mice, Knockout , Male , Female , Mice , Animals
8.
bioRxiv ; 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36824968

ABSTRACT

The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly-expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present pairtools - a flexible suite of tools for contact extraction from sequencing data. Pairtools provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. Pairtools provides both crucial core tools as well as auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking pairtools against popular 3C+ data pipelines shows advantages of pairtools for high-performance and flexible 3C+ analysis. Finally, pairtools provides protocol-specific tools for multi-way contacts, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes pairtools a versatile foundation for a broad range of 3C+ pipelines.

9.
Nat Struct Mol Biol ; 30(1): 38-51, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36550219

ABSTRACT

The relationships between chromosomal compartmentalization, chromatin state and function are poorly understood. Here by profiling long-range contact frequencies in HCT116 colon cancer cells, we distinguish three silent chromatin states, comprising two types of heterochromatin and a state enriched for H3K9me2 and H2A.Z that exhibits neutral three-dimensional interaction preferences and which, to our knowledge, has not previously been characterized. We find that heterochromatin marked by H3K9me3, HP1α and HP1β correlates with strong compartmentalization. We demonstrate that disruption of DNA methyltransferase activity greatly remodels genome compartmentalization whereby domains lose H3K9me3-HP1α/β binding and acquire the neutrally interacting state while retaining late replication timing. Furthermore, we show that H3K9me3-HP1α/β heterochromatin is permissive to loop extrusion by cohesin but refractory to CTCF binding. Together, our work reveals a dynamic structural and organizational diversity of the silent portion of the genome and establishes connections between the regulation of chromatin state and chromosome organization, including an interplay between DNA methylation, compartmentalization and loop extrusion.


Subject(s)
Chromatin , Heterochromatin , Methylation , Histones/metabolism , Chromobox Protein Homolog 5 , Transcription Factors/metabolism
11.
Nat Commun ; 13(1): 2365, 2022 05 02.
Article in English | MEDLINE | ID: mdl-35501320

ABSTRACT

The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal ( https://data.4dnucleome.org/ ), a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. All together, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.


Subject(s)
Chromosomes , Software , Cell Nucleus/genetics , Chromosomes/genetics , Genome
13.
Bioinformatics ; 37(14): 2053-2054, 2021 08 04.
Article in English | MEDLINE | ID: mdl-33135074

ABSTRACT

MOTIVATION: Single-cell Hi-C research currently lacks an efficient, easy to use and shareable data storage format. Recent studies have used a variety of sub-optimal solutions: publishing raw data only, text-based interaction matrices, or reusing established Hi-C storage formats for single interaction matrices. These approaches are storage and pre-processing intensive, require long labour time and are often error-prone. RESULTS: The single-cell cooler file format (scool) provides an efficient, user-friendly and storage-saving approach for single-cell Hi-C data. It is a flavour of the established cooler format and guarantees stable API support. AVAILABILITY AND IMPLEMENTATION: The single-cell cooler format is part of the cooler file format as of API version 0.8.9. It is available via pip, conda and github: https://github.com/mirnylab/cooler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Information Storage and Retrieval , Software
14.
Mol Cell ; 78(3): 554-565.e7, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32213324

ABSTRACT

Over the past decade, 3C-related methods have provided remarkable insights into chromosome folding in vivo. To overcome the limited resolution of prior studies, we extend a recently developed Hi-C variant, Micro-C, to map chromosome architecture at nucleosome resolution in human ESCs and fibroblasts. Micro-C robustly captures known features of chromosome folding including compartment organization, topologically associating domains, and interactions between CTCF binding sites. In addition, Micro-C provides a detailed map of nucleosome positions and localizes contact domain boundaries with nucleosomal precision. Compared to Hi-C, Micro-C exhibits an order of magnitude greater dynamic range, allowing the identification of ∼20,000 additional loops in each cell type. Many newly identified peaks are localized along extrusion stripes and form transitive grids, consistent with their anchors being pause sites impeding cohesin-dependent loop extrusion. Our analyses comprise the highest-resolution maps of chromosome folding in human cells to date, providing a valuable resource for studies of chromosome organization.


Subject(s)
Chromosomes, Human/ultrastructure , Animals , CCCTC-Binding Factor/metabolism , Cells, Cultured , Chromatin/chemistry , Chromosomes, Mammalian/ultrastructure , Embryonic Stem Cells/cytology , Fibroblasts/cytology , Humans , Male , Mammals/genetics , Nucleosomes/metabolism , Nucleosomes/ultrastructure , Signal-To-Noise Ratio
15.
Bioinformatics ; 36(1): 311-316, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31290943

ABSTRACT

MOTIVATION: Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. RESULTS: We developed a file format called cooler, based on a sparse data model, that can support genomically labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. AVAILABILITY AND IMPLEMENTATION: Cooler is cross-platform, BSD-licensed and can be installed from the Python package index or the bioconda repository. The source code is maintained on Github at https://github.com/mirnylab/cooler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
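As a rough illustration of the sparse data model described above, the sketch below bins a genome at a fixed resolution and stores only the nonzero entries of the contact matrix as (bin1, bin2) -> count. It is a pure-Python toy, not the cooler library or its HDF5 layout. The function names `make_bins` and `aggregate`, the input tuples, and the choice to keep only the upper triangle of a symmetric matrix are assumptions made for the example.

```python
from collections import Counter


def make_bins(chromsizes, binsize):
    """Split each chromosome into fixed-size bins.

    Returns the bin table and, per chromosome, the index of its first bin.
    """
    bins, offsets = [], {}
    for chrom, length in chromsizes.items():
        offsets[chrom] = len(bins)
        for start in range(0, length, binsize):
            bins.append((chrom, start, min(start + binsize, length)))
    return bins, offsets


def aggregate(contacts, offsets, binsize):
    """Count contacts per bin pair, storing only nonzero upper-triangle pixels."""
    pixels = Counter()
    for chrom1, pos1, chrom2, pos2 in contacts:
        i = offsets[chrom1] + pos1 // binsize
        j = offsets[chrom2] + pos2 // binsize
        # The matrix is symmetric, so (i, j) and (j, i) map to the same pixel.
        pixels[(min(i, j), max(i, j))] += 1
    return pixels


# Illustrative data only: a tiny two-chromosome genome at 1 kb resolution.
chromsizes = {"chr1": 2500, "chr2": 1000}
bins, offsets = make_bins(chromsizes, 1000)
contacts = [
    ("chr1", 10, "chr1", 2400),
    ("chr1", 2300, "chr1", 500),
    ("chr2", 5, "chr1", 1500),
]
pixels = aggregate(contacts, offsets, 1000)
```

A dense matrix for these 4 bins would hold 16 cells, while the sparse form holds 2 pixels. The gap widens quickly at finer resolutions, which is the storage argument the abstract makes.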


Subject(s)
Genomics , Information Storage and Retrieval , Software , Algorithms , Genomics/methods , Metadata
16.
Nat Commun ; 10(1): 4486, 2019 10 03.
Article in English | MEDLINE | ID: mdl-31582744

ABSTRACT

Genome organization involves cis and trans chromosomal interactions, both implicated in gene regulation, development, and disease. Here, we focus on trans interactions in Drosophila, where homologous chromosomes are paired in somatic cells from embryogenesis through adulthood. We first address long-standing questions regarding the structure of embryonic homolog pairing and, to this end, develop a haplotype-resolved Hi-C approach to minimize homolog misassignment and thus robustly distinguish trans-homolog from cis contacts. This computational approach, which we call Ohm, reveals pairing to be surprisingly structured genome-wide, with trans-homolog domains, compartments, and interaction peaks, many coinciding with analogous cis features. We also find a significant genome-wide correlation between pairing, transcription during zygotic genome activation, and binding of the pioneer factor Zelda. Our findings reveal a complex, highly structured organization underlying homolog pairing, first discovered a century ago in Drosophila. Finally, we demonstrate the versatility of our haplotype-resolved approach by applying it to mammalian embryos.


Subject(s)
Chromosome Pairing , Chromosomes, Insect/genetics , Drosophila melanogaster/genetics , Genome, Insect , Animals , Cell Culture Techniques , Cell Line , Chromatin/metabolism , Computational Biology , Datasets as Topic , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Embryo, Mammalian , Embryo, Nonmammalian , Female , Genomics/methods , High-Throughput Nucleotide Sequencing , Male , Mice , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , RNA, Small Interfering/metabolism , Sequence Homology, Nucleic Acid , Transcription, Genetic , Zygote
17.
Curr Opin Cell Biol ; 58: 142-152, 2019 06.
Article in English | MEDLINE | ID: mdl-31228682

ABSTRACT

The spatial organization of chromosomes has long been connected to their polymeric nature and is believed to be important for their biological functions, including the control of interactions between genomic elements, the maintenance of genetic information, and the compaction and safe transfer of chromosomes to cellular progeny. Chromosome conformation capture techniques, particularly Hi-C, have provided a comprehensive picture of spatial chromosome organization and revealed new features and elements of chromosome folding. Furthermore, recent advances in microscopy have made it possible to obtain distance maps for extensive regions of chromosomes (Bintu et al., 2018; Nir et al., 2018 [2••,3]), providing information complementary to, and in excellent agreement with, Hi-C maps. Not only has the resolution of both techniques advanced significantly, but new perturbation data generated in the last two years have led to the identification of molecular mechanisms behind large-scale genome organization. Two major mechanisms that have been proposed to govern chromosome organization are (i) the active (ATP-dependent) process of loop extrusion by Structural Maintenance of Chromosomes (SMC) complexes, and (ii) the spatial compartmentalization of the genome, which is likely mediated by affinity interactions between heterochromatic regions (Falk et al., 2019 [76••]) rather than by ATP-dependent processes. Here, we review existing evidence that these two processes operate together to fold chromosomes in interphase and that loop extrusion alone drives mitotic compaction. We discuss possible implications of these mechanisms for chromosome function.


Subject(s)
Chromosomes/chemistry , Animals , CCCTC-Binding Factor/metabolism , Cell Cycle , Chromosome Structures , Chromosomes/metabolism , Gene Expression Regulation , Genome , Humans , Interphase
18.
Genome Biol ; 19(1): 125, 2018 08 24.
Article in English | MEDLINE | ID: mdl-30143029

ABSTRACT

We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.


Subject(s)
Chromosome Mapping , Genome , Internet , User-Computer Interface
19.
Proc Natl Acad Sci U S A ; 115(29): E6697-E6706, 2018 07 17.
Article in English | MEDLINE | ID: mdl-29967174

ABSTRACT

Mammalian chromatin is spatially organized at many scales showing two prominent features in interphase: (i) alternating regions (1-10 Mb) of active and inactive chromatin that spatially segregate into different compartments, and (ii) domains (<1 Mb), that is, regions that preferentially interact internally [topologically associating domains (TADs)] and are central to gene regulation. There is growing evidence that TADs are formed by active extrusion of chromatin loops by cohesin, whereas compartmentalization is established according to local chromatin states. Here, we use polymer simulations to examine how loop extrusion and compartmental segregation work collectively and potentially interfere in shaping global chromosome organization. A model with differential attraction between euchromatin and heterochromatin leads to phase separation and reproduces compartmentalization as observed in Hi-C. Loop extrusion, essential for TAD formation, in turn, interferes with compartmentalization. Our integrated model faithfully reproduces Hi-C data from puzzling experimental observations where altering loop extrusion also led to changes in compartmentalization. Specifically, depletion of chromatin-associated cohesin reduced TADs and revealed finer compartments, while increased processivity of cohesin strengthened large TADs and reduced compartmentalization; and depletion of the TAD boundary protein CTCF weakened TADs while leaving compartments unaffected. We reveal that these experimental perturbations are special cases of a general polymer phenomenon of active mixing by loop extrusion. Our results suggest that chromatin organization on the megabase scale emerges from competition of nonequilibrium active loop extrusion and epigenetically defined compartment structure.


Subject(s)
Chromatin Assembly and Disassembly/physiology , Chromatin/metabolism , Chromosomes, Mammalian/metabolism , Models, Biological , Animals , Cell Cycle Proteins/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Cohesins
20.
Nature ; 551(7678): 51-56, 2017 11 02.
Article in English | MEDLINE | ID: mdl-29094699

ABSTRACT

Imaging and chromosome conformation capture studies have revealed several layers of chromosome organization, including segregation into megabase-sized active and inactive compartments, and partitioning into sub-megabase domains (TADs). It remains unclear, however, how these layers of organization form, interact with one another and influence genome function. Here we show that deletion of the cohesin-loading factor Nipbl in mouse liver leads to a marked reorganization of chromosomal folding. TADs and associated Hi-C peaks vanish globally, even in the absence of transcriptional changes. By contrast, compartmental segregation is preserved and even reinforced. Strikingly, the disappearance of TADs unmasks a finer compartment structure that accurately reflects the underlying epigenetic landscape. These observations demonstrate that the three-dimensional organization of the genome results from the interplay of two independent mechanisms: cohesin-independent segregation of the genome into fine-scale compartments, defined by chromatin state; and cohesin-dependent formation of TADs, possibly by loop extrusion, which helps to guide distant enhancers to their target genes.


Subject(s)
Cell Cycle Proteins/metabolism , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Chromosome Positioning , Animals , Chromatin/chemistry , Chromatin/genetics , Enhancer Elements, Genetic/genetics , Epigenesis, Genetic , Liver/metabolism , Mice , Transcription Factors/deficiency , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic , Cohesins