1.
PLoS Comput Biol ; 20(5): e1012164, 2024 May.
Article En | MEDLINE | ID: mdl-38809952

The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly-expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present pairtools, a flexible suite of tools for contact extraction from sequencing data. Pairtools provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. The core operations provided by pairtools are parsing of .sam alignments into Hi-C pairs, sorting and removal of PCR duplicates. In addition, pairtools provides auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking pairtools against popular 3C+ data pipelines shows advantages of pairtools for high-performance and flexible 3C+ analysis. Finally, pairtools provides protocol-specific tools for restriction-based protocols, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes pairtools a versatile foundation for a broad range of 3C+ pipelines.


Chromosomes , Computational Biology , Software , Chromosomes/genetics , Chromosomes/chemistry , Computational Biology/methods , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromosome Mapping/methods
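
The core operations named in the abstract above (sorting contact pairs and removing PCR duplicates) can be illustrated with a minimal sketch. This is not pairtools' implementation or API: the tuple layout, the function names `flip` and `sort_and_dedup`, and the exact-match duplicate criterion are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical pair layout: chrom1, pos1, strand1, chrom2, pos2, strand2
Pair = Tuple[str, int, str, str, int, str]


def flip(pair: Pair) -> Pair:
    """Order the two sides so that (chrom1, pos1) <= (chrom2, pos2)."""
    c1, p1, s1, c2, p2, s2 = pair
    if (c1, p1) > (c2, p2):
        return (c2, p2, s2, c1, p1, s1)
    return pair


def sort_and_dedup(pairs: List[Pair]) -> List[Pair]:
    """Sort flipped pairs and keep one copy of each exact duplicate."""
    ordered = sorted(flip(p) for p in pairs)
    unique: List[Pair] = []
    for p in ordered:
        # After sorting, identical pairs are adjacent, so one comparison suffices.
        if not unique or unique[-1] != p:
            unique.append(p)
    return unique


pairs = [
    ("chr2", 500, "+", "chr1", 100, "-"),
    ("chr1", 100, "-", "chr2", 500, "+"),  # same contact, sides swapped
    ("chr1", 100, "+", "chr1", 900, "-"),
]
result = sort_and_dedup(pairs)
```

Flipping before sorting is what lets the two swapped records collapse into one. A production tool would likely also tolerate small positional mismatches between duplicates, which this sketch does not attempt.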
2.
PLoS Comput Biol ; 20(5): e1012067, 2024 May.
Article En | MEDLINE | ID: mdl-38709825

Chromosome conformation capture (3C) technologies reveal the incredible complexity of genome organization. Maps of increasing size, depth, and resolution are now used to probe genome architecture across cell states, types, and organisms. Larger datasets add challenges at each step of computational analysis, from storage and memory constraints to researchers' time; however, analysis tools that meet these increased resource demands have not kept pace. Furthermore, existing tools offer limited support for customizing analysis for specific use cases or new biology. Here we introduce cooltools (https://github.com/open2c/cooltools), a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely-adopted cooler format which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface (CLI) and Python application programming interface (API), which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments. In short, cooltools enables the effective use of the latest and largest genome folding datasets.


Computational Biology , Software , Computational Biology/methods , Programming Languages , Genomics/methods , Genome/genetics , Chromosome Mapping/methods , Humans
3.
Bioinformatics ; 40(2)2024 Feb 01.
Article En | MEDLINE | ID: mdl-38402507

MOTIVATION: Genomic intervals are one of the most prevalent data structures in computational genome biology, and used to represent features ranging from genes, to DNA binding sites, to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose computational and visualization environments. RESULTS: Bioframe is a library to enable flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases for computational genome biology by building directly on top of two of the most commonly-used Python libraries, NumPy and Pandas. The bioframe API enables flexible name and column orders, and decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features. AVAILABILITY AND IMPLEMENTATION: Bioframe is open-source under MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.


Computational Biology , Genomics , Gene Library , Binding Sites , Data Science
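
To make the idea of "operations on genomic intervals" in the abstract above concrete, here is a minimal pure-Python sketch of an overlap query. It deliberately does not use bioframe's API; the tuple layout, the half-open coordinate convention, and the function names `overlaps` and `overlap_pairs` are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical interval layout: chrom, start, end (half-open: start <= x < end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_pairs(
    left: List[Interval], right: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """All (left, right) pairs that overlap; quadratic, for clarity only."""
    return [(a, b) for a in left for b in right if overlaps(a, b)]


genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
sites = [("chr1", 150, 160), ("chr1", 200, 210), ("chr2", 50, 101)]
hits = overlap_pairs(genes, sites)
```

With half-open coordinates, the site starting at 200 does not overlap the gene ending at 200. The all-against-all loop is only suitable for small inputs; a dataframe-based library would typically sort or index the intervals instead.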
4.
bioRxiv ; 2023 Feb 15.
Article En | MEDLINE | ID: mdl-36824968

The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly-expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present pairtools - a flexible suite of tools for contact extraction from sequencing data. Pairtools provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. Pairtools provides both crucial core tools as well as auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking pairtools against popular 3C+ data pipelines shows advantages of pairtools for high-performance and flexible 3C+ analysis. Finally, pairtools provides protocol-specific tools for multi-way contacts, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes pairtools a versatile foundation for a broad range of 3C+ pipelines.

5.
Nat Struct Mol Biol ; 30(1): 38-51, 2023 Jan.
Article En | MEDLINE | ID: mdl-36550219

The relationships between chromosomal compartmentalization, chromatin state and function are poorly understood. Here by profiling long-range contact frequencies in HCT116 colon cancer cells, we distinguish three silent chromatin states, comprising two types of heterochromatin and a state enriched for H3K9me2 and H2A.Z that exhibits neutral three-dimensional interaction preferences and which, to our knowledge, has not previously been characterized. We find that heterochromatin marked by H3K9me3, HP1α and HP1β correlates with strong compartmentalization. We demonstrate that disruption of DNA methyltransferase activity greatly remodels genome compartmentalization whereby domains lose H3K9me3-HP1α/β binding and acquire the neutrally interacting state while retaining late replication timing. Furthermore, we show that H3K9me3-HP1α/β heterochromatin is permissive to loop extrusion by cohesin but refractory to CTCF binding. Together, our work reveals a dynamic structural and organizational diversity of the silent portion of the genome and establishes connections between the regulation of chromatin state and chromosome organization, including an interplay between DNA methylation, compartmentalization and loop extrusion.


Chromatin , Heterochromatin , Methylation , Histones/metabolism , Chromobox Protein Homolog 5 , Transcription Factors/metabolism
6.
Phys Rev X ; 13(4)2023.
Article En | MEDLINE | ID: mdl-38774252

Chromosomes are exceedingly long topologically-constrained polymers compacted in a cell nucleus. We recently suggested that chromosomes are organized into loops by an active process of loop extrusion. Yet loops remain elusive to direct observations in living cells; detection and characterization of myriads of such loops is a major challenge. The lack of a tractable physical model of a polymer folded into loops limits our ability to interpret experimental data and detect loops. Here, we introduce a new physical model, a polymer folded into a sequence of loops, and solve it analytically. Our model and a simple geometrical argument show how loops affect statistics of contacts in a polymer across different scales, explaining universally observed shapes of the contact probability. Moreover, we reveal that folding into loops reduces the density of topological entanglements, a novel phenomenon we refer to as "the dilution of entanglements". Supported by simulations, this finding suggests that up to ~1-2 Mb chromosomes with loops are not topologically constrained, yet become crumpled at larger scales. Our theoretical framework allows inference of loop characteristics, draws a new picture of chromosome organization, and shows how folding into loops affects topological properties of crumpled polymers.

8.
ACS ES T Water ; 2(11): 1899-1909, 2022 Nov 11.
Article En | MEDLINE | ID: mdl-36380771

Wastewater-based epidemiology has emerged as a promising technology for population-level surveillance of COVID-19. In this study, we present results of a large nationwide SARS-CoV-2 wastewater monitoring system in the United States. We profile 55 locations with at least six months of sampling from April 2020 to May 2021. These locations represent more than 12 million individuals across 19 states. Samples were collected approximately weekly by wastewater treatment utilities as part of a regular wastewater surveillance service and analyzed for SARS-CoV-2 RNA concentrations. SARS-CoV-2 RNA concentrations were normalized to pepper mild mottle virus, an indicator of fecal matter in wastewater. We show that wastewater data reflect temporal and geographic trends in clinical COVID-19 cases and investigate the impact of normalization on correlations with case data within and across locations. We also provide key lessons learned from our broad-scale implementation of wastewater-based epidemiology, which can be used to inform wastewater-based epidemiology approaches for future emerging diseases. This work demonstrates that wastewater surveillance is a feasible approach for nationwide population-level monitoring of COVID-19 disease. With an evolving epidemic and effective vaccines against SARS-CoV-2, wastewater-based epidemiology can serve as a passive surveillance approach for detecting changing dynamics or resurgences of the virus.

9.
Genome Biol ; 23(1): 236, 2022 11 08.
Article En | MEDLINE | ID: mdl-36348471

Effectively monitoring the spread of SARS-CoV-2 mutants is essential to efforts to counter the ongoing pandemic. Predicting lineage abundance from wastewater, however, is technically challenging. We show that by sequencing SARS-CoV-2 RNA in wastewater and applying algorithms initially used for transcriptome quantification, we can estimate lineage abundance in wastewater samples. We find high variability in signal among individual samples, but the overall trends match those observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in mutant prevalence in situations where clinical sequencing is unavailable.


COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Wastewater , RNA, Viral/genetics , Transcriptome
10.
Nat Commun ; 13(1): 2365, 2022 05 02.
Article En | MEDLINE | ID: mdl-35501320

The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal ( https://data.4dnucleome.org/ ), a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. All together, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.


Chromosomes , Software , Cell Nucleus/genetics , Chromosomes/genetics , Genome
11.
Water Res ; 212: 118070, 2022 Apr 01.
Article En | MEDLINE | ID: mdl-35101695

Wastewater surveillance has emerged as a useful tool in the public health response to the COVID-19 pandemic. While wastewater surveillance has been applied at various scales to monitor population-level COVID-19 dynamics, there is a need for quantitative metrics to interpret wastewater data in the context of public health trends. 24-hour composite wastewater samples were collected from March 2020 through May 2021 from a Massachusetts wastewater treatment plant and SARS-CoV-2 RNA concentrations were measured using RT-qPCR. The relationship between wastewater copy numbers of SARS-CoV-2 gene fragments and COVID-19 clinical cases and deaths varies over time. We demonstrate the utility of three new metrics to monitor changes in COVID-19 epidemiology: (1) the ratio between wastewater copy numbers of SARS-CoV-2 gene fragments and clinical cases (WC ratio), (2) the time lag between wastewater and clinical reporting, and (3) a transfer function between the wastewater and clinical case curves. The WC ratio increases after key events, providing insight into the balance between disease spread and public health response. Time lag and transfer function analysis showed that wastewater data preceded clinically reported cases in the first wave of the pandemic but did not serve as a leading indicator in the second wave, likely due to increased testing capacity, which allows for more timely case detection and reporting. These three metrics could help further integrate wastewater surveillance into the public health response to the COVID-19 pandemic and future pandemics.


COVID-19 , Pandemics , Benchmarking , Humans , RNA, Viral , SARS-CoV-2 , Wastewater , Wastewater-Based Epidemiological Monitoring
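
Two of the metrics named in the abstract above, the WC ratio and the time lag between wastewater and clinical curves, can be sketched in a few lines. The abstract does not say how the lag was estimated, so the lagged Pearson correlation used here is an assumption, as are the function names `wc_ratio`, `pearson`, and `best_lag` and the toy data.

```python
from math import sqrt
from typing import List, Sequence


def wc_ratio(wastewater: Sequence[float], cases: Sequence[float]) -> List[float]:
    """Per-sample ratio of wastewater signal to clinical cases (NaN if no cases)."""
    return [w / c if c else float("nan") for w, c in zip(wastewater, cases)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def best_lag(wastewater: Sequence[float], cases: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which cases trail wastewater with the highest correlation."""
    scores = {}
    for lag in range(max_lag + 1):
        w = wastewater[: len(wastewater) - lag]  # earlier wastewater samples
        c = cases[lag:]                          # later case counts
        scores[lag] = pearson(w, c)
    return max(scores, key=scores.get)


# Toy data: cases follow wastewater two samples later, scaled by 10.
wastewater = [1, 3, 7, 4, 2, 1, 1, 1]
cases = [10, 10, 10, 30, 70, 40, 20, 10]
lag = best_lag(wastewater, cases, 3)
ratios = wc_ratio([20, 30], [10, 15])
```

On the toy series, the correlation peaks when cases are shifted back by two samples, which is how a leading-indicator relationship would show up under this assumed method. Real data would need smoothing and a check that the correlation peak is meaningfully higher than at neighbouring lags.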
12.
medRxiv ; 2021 Sep 02.
Article En | MEDLINE | ID: mdl-34494031

Effectively monitoring the spread of SARS-CoV-2 variants is essential to efforts to counter the ongoing pandemic. Wastewater monitoring of SARS-CoV-2 RNA has proven an effective and efficient technique to approximate COVID-19 case rates in the population. Predicting variant abundances from wastewater, however, is technically challenging. Here we show that by sequencing SARS-CoV-2 RNA in wastewater and applying computational techniques initially used for RNA-Seq quantification, we can estimate the abundance of variants in wastewater samples. We show by sequencing samples from wastewater and clinical isolates in Connecticut, U.S.A., between January and April 2021 that the temporal dynamics of variant strains broadly correspond. We further show that this technique can be used with other wastewater sequencing techniques by expanding to samples taken across the United States in a similar timeframe. We find high variability in signal among individual samples, and limited ability to detect the presence of variants with clinical frequencies <10%; nevertheless, the overall trends match what we observed from sequencing clinical samples. Thus, while clinical sequencing remains a more sensitive technique for population surveillance, wastewater sequencing can be used to monitor trends in variant prevalence in situations where clinical sequencing is unavailable or impractical.

13.
Water Res ; 202: 117433, 2021 Sep 01.
Article En | MEDLINE | ID: mdl-34304074

Individuals infected with SARS-CoV-2, the virus that causes COVID-19, may shed the virus in stool before developing symptoms, suggesting that measurements of SARS-CoV-2 concentrations in wastewater could be a "leading indicator" of COVID-19 prevalence. Multiple studies have corroborated the leading indicator concept by showing that the correlation between wastewater measurements and COVID-19 case counts is maximized when case counts are lagged. However, the meaning of "leading indicator" will depend on the specific application of wastewater-based epidemiology, and the correlation analysis is not relevant for all applications. In fact, the quantification of a leading indicator will depend on epidemiological, biological, and health systems factors. Thus, there is no single "lead time" for wastewater-based COVID-19 monitoring. To illustrate this complexity, we enumerate three different applications of wastewater-based epidemiology for COVID-19: a qualitative "early warning" system; an independent, quantitative estimate of disease prevalence; and a quantitative alert of bursts of disease incidence. The leading indicator concept has different definitions and utility in each application.


COVID-19 , Wastewater-Based Epidemiological Monitoring , Humans , Lead , SARS-CoV-2 , Wastewater
14.
medRxiv ; 2021 Jun 16.
Article En | MEDLINE | ID: mdl-34159339

Wastewater surveillance has emerged as a useful tool in the public health response to the COVID-19 pandemic. While wastewater surveillance has been applied at various scales to monitor population-level COVID-19 dynamics, there is a need for quantitative metrics to interpret wastewater data in the context of public health trends. We collected 24-hour composite wastewater samples from March 2020 through May 2021 from a Massachusetts wastewater treatment plant and measured SARS-CoV-2 RNA concentrations using RT-qPCR. We show that the relationship between wastewater viral titers and COVID-19 clinical cases and deaths varies over time. We demonstrate the utility of three new metrics to monitor changes in COVID-19 epidemiology: (1) the ratio between wastewater viral titers and clinical cases (WC ratio), (2) the time lag between wastewater and clinical reporting, and (3) a transfer function between the wastewater and clinical case curves. We find that the WC ratio increases after key events, providing insight into the balance between disease spread and public health response. We also find that wastewater data preceded clinically reported cases in the first wave of the pandemic but did not serve as a leading indicator in the second wave, likely due to increased testing capacity. These three metrics could complement a framework for integrating wastewater surveillance into the public health response to the COVID-19 pandemic and future pandemics.

15.
Am J Hum Genet ; 107(1): 46-59, 2020 07 02.
Article En | MEDLINE | ID: mdl-32470373

In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.


Multifactorial Inheritance/genetics , Aged , Cohort Studies , Diabetes Mellitus, Type 2/genetics , Female , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium/genetics , Male , Middle Aged , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
16.
Nat Commun ; 10(1): 4486, 2019 10 03.
Article En | MEDLINE | ID: mdl-31582744

Genome organization involves cis and trans chromosomal interactions, both implicated in gene regulation, development, and disease. Here, we focus on trans interactions in Drosophila, where homologous chromosomes are paired in somatic cells from embryogenesis through adulthood. We first address long-standing questions regarding the structure of embryonic homolog pairing and, to this end, develop a haplotype-resolved Hi-C approach to minimize homolog misassignment and thus robustly distinguish trans-homolog from cis contacts. This computational approach, which we call Ohm, reveals pairing to be surprisingly structured genome-wide, with trans-homolog domains, compartments, and interaction peaks, many coinciding with analogous cis features. We also find a significant genome-wide correlation between pairing, transcription during zygotic genome activation, and binding of the pioneer factor Zelda. Our findings reveal a complex, highly structured organization underlying homolog pairing, first discovered a century ago in Drosophila. Finally, we demonstrate the versatility of our haplotype-resolved approach by applying it to mammalian embryos.


Chromosome Pairing , Chromosomes, Insect/genetics , Drosophila melanogaster/genetics , Genome, Insect , Animals , Cell Culture Techniques , Cell Line , Chromatin/metabolism , Computational Biology , Datasets as Topic , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Embryo, Mammalian , Embryo, Nonmammalian , Female , Genomics/methods , High-Throughput Nucleotide Sequencing , Male , Mice , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , RNA, Small Interfering/metabolism , Sequence Homology, Nucleic Acid , Transcription, Genetic , Zygote
17.
Nature ; 572(7771): E22, 2019 Aug.
Article En | MEDLINE | ID: mdl-31375785

An Amendment to this paper has been published and can be accessed via a link at the top of the paper.

18.
Nature ; 570(7761): 395-399, 2019 06.
Article En | MEDLINE | ID: mdl-31168090

The nucleus of mammalian cells displays a distinct spatial segregation of active euchromatic and inactive heterochromatic regions of the genome [1,2]. In conventional nuclei, microscopy shows that euchromatin is localized in the nuclear interior and heterochromatin at the nuclear periphery [1,2]. Genome-wide chromosome conformation capture (Hi-C) analyses show this segregation as a plaid pattern of contact enrichment within euchromatin and heterochromatin compartments [3], and depletion between them. Many mechanisms for the formation of compartments have been proposed, such as attraction of heterochromatin to the nuclear lamina [2,4], preferential attraction of similar chromatin to each other [1,4-12], higher levels of chromatin mobility in active chromatin [13-15] and transcription-related clustering of euchromatin [16,17]. However, these hypotheses have remained inconclusive, owing to the difficulty of disentangling intra-chromatin and chromatin-lamina interactions in conventional nuclei [18]. The marked reorganization of interphase chromosomes in the inverted nuclei of rods in nocturnal mammals [19,20] provides an opportunity to elucidate the mechanisms that underlie spatial compartmentalization. Here we combine Hi-C analysis of inverted rod nuclei with microscopy and polymer simulations. We find that attractions between heterochromatic regions are crucial for establishing both compartmentalization and the concentric shells of pericentromeric heterochromatin, facultative heterochromatin and euchromatin in the inverted nucleus. When interactions between heterochromatin and the lamina are added, the same model recreates the conventional nuclear organization. In addition, our models allow us to rule out mechanisms of compartmentalization that involve strong euchromatin interactions. Together, our experiments and modelling suggest that attractions between heterochromatic regions are essential for the phase separation of the active and inactive genome in inverted and conventional nuclei, whereas interactions of the chromatin with the lamina are necessary to build the conventional architecture from these segregated phases.


Cell Compartmentation , Cell Nucleus/metabolism , Heterochromatin/metabolism , Animals , Cell Compartmentation/genetics , Cell Nucleus/genetics , Euchromatin/genetics , Euchromatin/metabolism , Heterochromatin/genetics , Mice , Models, Biological , Nuclear Lamina/genetics , Nuclear Lamina/metabolism , Time Factors
19.
Curr Opin Cell Biol ; 58: 142-152, 2019 06.
Article En | MEDLINE | ID: mdl-31228682

The spatial organization of chromosomes has long been connected to their polymeric nature and is believed to be important for their biological functions, including the control of interactions between genomic elements, the maintenance of genetic information, and the compaction and safe transfer of chromosomes to cellular progeny. Chromosome conformation capture techniques, particularly Hi-C, have provided a comprehensive picture of spatial chromosome organization and revealed new features and elements of chromosome folding. Furthermore, recent advances in microscopy have made it possible to obtain distance maps for extensive regions of chromosomes (Bintu et al., 2018; Nir et al., 2018 [2••,3]), providing information complementary to, and in excellent agreement with, Hi-C maps. Not only has the resolution of both techniques advanced significantly, but new perturbation data generated in the last two years have led to the identification of molecular mechanisms behind large-scale genome organization. Two major mechanisms that have been proposed to govern chromosome organization are (i) the active (ATP-dependent) process of loop extrusion by Structural Maintenance of Chromosomes (SMC) complexes, and (ii) the spatial compartmentalization of the genome, which is likely mediated by affinity interactions between heterochromatic regions (Falk et al., 2019 [76••]) rather than by ATP-dependent processes. Here, we review existing evidence that these two processes operate together to fold chromosomes in interphase and that loop extrusion alone drives mitotic compaction. We discuss possible implications of these mechanisms for chromosome function.


Chromosomes/chemistry , Animals , CCCTC-Binding Factor/metabolism , Cell Cycle , Chromosome Structures , Chromosomes/metabolism , Gene Expression Regulation , Genome , Humans , Interphase
20.
Nat Genet ; 51(2): 364, 2019 02.
Article En | MEDLINE | ID: mdl-30647470

In the version of this article initially published, '+' and '-' labels were missing from the graph keys at the bottom of Fig. 8d. The error has been corrected in the HTML and PDF versions of the article.

...