Results 1 - 20 of 111
1.
Bioinformatics ; 40(Supplement_1): i471-i480, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940142

ABSTRACT

MOTIVATION: High-resolution Hi-C contact matrices reveal the detailed three-dimensional architecture of the genome, but high-coverage experimental Hi-C data are expensive to generate. Simultaneously, chromatin structure analyses struggle with extremely sparse contact matrices. To address this problem, computational methods to enhance low-coverage contact matrices have been developed, but existing methods are largely based on resolution enhancement methods for natural images and hence often employ models that do not distinguish between biologically meaningful contacts, such as loops, and other stochastic contacts. RESULTS: We present Capricorn, a machine learning model for Hi-C resolution enhancement that incorporates small-scale chromatin features as additional views of the input Hi-C contact matrix and leverages a diffusion probability model backbone to generate a high-coverage matrix. We show that Capricorn outperforms the state of the art in a cross-cell-line setting, improving on existing methods by 17% in mean squared error and 26% in F1 score for chromatin loop identification from the generated high-coverage data. We also demonstrate that Capricorn performs well in the cross-chromosome setting and cross-chromosome, cross-cell-line setting, improving the downstream loop F1 score by 14% relative to existing methods. We further show that our multiview idea can also be used to improve several existing methods, HiCARN and HiCNN, indicating the wide applicability of this approach. Finally, we use DNA sequence to validate discovered loops and find that the fraction of CTCF-supported loops from Capricorn is similar to those identified from the high-coverage data. Capricorn is a powerful Hi-C resolution enhancement method that enables scientists to find chromatin features that cannot be identified in the low-coverage contact matrix.
AVAILABILITY AND IMPLEMENTATION: Implementation of Capricorn and source code for reproducing all figures in this paper are available at https://github.com/CHNFTQ/Capricorn.
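
The loop F1 score reported in this abstract can be illustrated with a small sketch. It assumes loops are represented as pairs of bin indices and that a predicted loop counts as correct only when it matches a reference loop exactly; the abstract does not state the matching criterion, and the function name `loop_f1` is hypothetical rather than part of the Capricorn code base.

```python
def loop_f1(predicted, reference):
    """F1 score between two collections of loops.

    Each loop is a (bin_i, bin_j) pair. Exact matching is an assumption
    made for this sketch; real evaluations may allow a distance tolerance.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, loops called on the enhanced matrix would be passed as `predicted` and loops called on the high-coverage matrix as `reference`.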


Subject(s)
Chromatin , Machine Learning , Chromatin/chemistry , Chromatin/metabolism , Humans , Computational Biology/methods , Algorithms , Software
2.
bioRxiv ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38895431

ABSTRACT

A pressing statistical challenge in the field of mass spectrometry proteomics is how to assess whether a given software tool provides accurate error control. Each software tool for searching such data uses its own internally implemented methodology for reporting and controlling the error. Many of these software tools are closed source, with incompletely documented methodology, and the strategies for validating the error are inconsistent across tools. In this work, we identify three different methods for validating false discovery rate (FDR) control in use in the field, one of which is invalid, one of which can only provide a lower bound rather than an upper bound, and one of which is valid but under-powered. The result is that the field has a very poor understanding of how well we are doing with respect to FDR control, particularly for the analysis of data-independent acquisition (DIA) data. We therefore propose a new, more powerful method for evaluating FDR control in this setting, and we then employ that method, along with an existing lower bounding technique, to characterize a variety of popular search tools. We find that the search tools for analysis of data-dependent acquisition (DDA) data generally seem to control the FDR at the peptide level, whereas none of the DIA search tools consistently controls the FDR at the peptide level across all the datasets we investigated. Furthermore, this problem becomes much worse when the latter tools are evaluated at the protein level. These results may have significant implications for various downstream analyses, since proper FDR control has the potential to reduce noise in discovery lists and thereby boost statistical power.

4.
J Proteome Res ; 23(6): 1894-1906, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38652578

ABSTRACT

Searching tandem mass spectrometry proteomics data against a database is a well-established method for assigning peptide sequences to observed spectra but typically cannot identify peptides harboring unexpected post-translational modifications (PTMs). Open modification searching aims to address this problem by allowing a spectrum to match a peptide even if the spectrum's precursor mass differs from the peptide mass. However, expanding the search space in this way can lead to a loss of statistical power to detect peptides. We therefore developed a method, called CONGA (combining open and narrow searches with group-wise analysis), that takes into account results from both types of searches, a traditional "narrow window" search and an open modification search, while carrying out rigorous false discovery rate control. The result is an algorithm that provides the best of both worlds: the ability to detect unexpected PTMs without a concomitant loss of power to detect unmodified peptides.


Subject(s)
Algorithms , Databases, Protein , Protein Processing, Post-Translational , Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Peptides/analysis , Peptides/chemistry , Humans , Software , Amino Acid Sequence
5.
bioRxiv ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38617345

ABSTRACT

Membrane-bound particles in plasma are composed of exosomes, microvesicles, and apoptotic bodies and represent ~1-2% of the total protein composition. Proteomic interrogation of this subset of plasma proteins augments the representation of tissue-specific proteins, representing a "liquid biopsy," while enabling the detection of proteins that would otherwise be beyond the dynamic range of liquid chromatography-tandem mass spectrometry of unfractionated plasma. We have developed an enrichment strategy (Mag-Net) using hyper-porous strong-anion exchange magnetic microparticles to sieve membrane-bound particles from plasma. The Mag-Net method is robust, reproducible, inexpensive, and requires <100 µL plasma input. Coupled to a quantitative data-independent mass spectrometry analytical strategy, we demonstrate that we can collect results for >37,000 peptides from >4,000 plasma proteins with high precision. Using this analytical pipeline on a small cohort of patients with neurodegenerative disease and healthy age-matched controls, we discovered 204 proteins that differentiate (q-value < 0.05) patients with Alzheimer's disease dementia (ADD) from those without ADD. Our method also discovered 310 proteins that were different between Parkinson's disease and those with either ADD or healthy cognitively normal individuals. Using machine learning we were able to distinguish between ADD and not ADD with a mean ROC AUC = 0.98 ± 0.06.

6.
Nat Commun ; 15(1): 1027, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38310092

ABSTRACT

Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate > 100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges. Consequently, many open questions remain about the organization and function of highly repetitive sequences. Here, we introduce Tigerfish, a software tool for the genome-scale design of oligo probes against repetitive DNA intervals. We showcase Tigerfish by designing a panel of 24 interval-specific repeat probes specific to each of the 24 human chromosomes and imaging this panel on metaphase spreads and in interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to highly repetitive DNA.


Subject(s)
DNA , Repetitive Sequences, Nucleic Acid , Humans , In Situ Hybridization, Fluorescence/methods , DNA/genetics , Repetitive Sequences, Nucleic Acid/genetics , Oligonucleotide Probes/genetics , DNA Probes/genetics , Oligonucleotides/genetics
7.
J Proteome Res ; 22(11): 3427-3438, 2023 11 03.
Article in English | MEDLINE | ID: mdl-37861703

ABSTRACT

Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
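
The held-out mean-squared-error benchmark that this abstract contrasts with its downstream-centric criteria can be sketched as follows. This is a minimal illustration, assuming a small peptide-by-sample matrix stored as nested lists with `None` for missing values and a column-mean baseline imputer; it is not MissForest, and all names and numbers are hypothetical.

```python
def column_mean_impute(matrix):
    """Fill None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = col_mean
    return filled


def held_out_mse(truth, imputed, held_out):
    """Mean squared error over the (row, column) positions that were hidden."""
    errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in held_out]
    return sum(errors) / len(errors)
```

A typical use would hide a few known entries, impute them, and score only those positions, which is exactly the kind of simple error measure the abstract argues is insufficient on its own.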


Subject(s)
Algorithms , Proteomics , Proteomics/methods , Reproducibility of Results , Tandem Mass Spectrometry , Peptides/analysis
8.
Mol Cell ; 83(15): 2624-2640, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37419111

ABSTRACT

The four-dimensional nucleome (4DN) consortium studies the architecture of the genome and the nucleus in space and time. We summarize progress by the consortium and highlight the development of technologies for (1) mapping genome folding and identifying roles of nuclear components and bodies, proteins, and RNA, (2) characterizing nuclear organization with time or single-cell resolution, and (3) imaging of nuclear organization. With these tools, the consortium has provided over 2,000 public datasets. Integrative computational models based on these data are starting to reveal connections between genome structure and function. We then present a forward-looking perspective and outline current aims to (1) delineate dynamics of nuclear architecture at different timescales, from minutes to weeks as cells differentiate, in populations and in single cells, (2) characterize cis-determinants and trans-modulators of genome organization, (3) test functional consequences of changes in cis- and trans-regulators, and (4) develop predictive models of genome structure and function.


Subject(s)
Cell Nucleus , Genome , Genome/genetics , Cell Nucleus/genetics , Cell Nucleus/metabolism , Chromatin/metabolism
9.
Genome Biol ; 24(1): 134, 2023 06 06.
Article in English | MEDLINE | ID: mdl-37280678

ABSTRACT

Recent deep learning models that predict the Hi-C contact map from DNA sequence achieve promising accuracy but cannot generalize to new cell types or even capture differences among training cell types. We propose Epiphany, a neural network to predict cell-type-specific Hi-C contact maps from widely available epigenomic tracks. Epiphany uses bidirectional long short-term memory layers to capture long-range dependencies and optionally a generative adversarial network architecture to encourage contact map realism. Epiphany shows excellent generalization to held-out chromosomes within and across cell types, yields accurate TAD and interaction calls, and predicts structural changes caused by perturbations of epigenomic signals.


Subject(s)
Chromosomes , Epigenomics , Neural Networks, Computer , Chromatin
10.
J Proteome Res ; 22(7): 2172-2178, 2023 07 07.
Article in English | MEDLINE | ID: mdl-37261867

ABSTRACT

Controlling the false discovery rate (FDR) among discoveries from a tandem mass spectrometry proteomics experiment using target decoy competition (TDC) controls only the proportion of false discoveries in an average sense. Thus, for any particular analysis, even with a valid FDR control procedure, the proportion of false discoveries (the FDP) may be higher than the specified FDR threshold. We demonstrate this phenomenon using real data and describe two recently developed methods that help bridge the gap between controlling the expected or average rate of false discoveries and the empirical rate (FDP). The FDP Stepdown method controls the FDP at any desired confidence level, and the TDC Uniform Band provides a confidence, or upper prediction bound, on the FDP in TDC's list of discoveries.
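
The averaged nature of the TDC guarantee is easier to see with the estimate written out. The following is a minimal sketch, assuming the commonly used estimator (decoys + 1) / targets applied to matches that have already won target-decoy competition; the function name and input layout are hypothetical and are not taken from the paper, and the sketch does not implement FDP Stepdown or the TDC Uniform Band.

```python
def tdc_discoveries(psms, fdr_threshold):
    """Return the target scores accepted at the given FDR threshold.

    psms: list of (score, is_decoy) pairs, one per spectrum, after each
    spectrum's best target and best decoy match have competed.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked matches to keep
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among targets at or above this rank (assumed estimator).
        estimated_fdr = (decoys + 1) / max(targets, 1)
        if estimated_fdr <= fdr_threshold:
            cutoff = rank
    return [score for score, is_decoy in ranked[:cutoff] if not is_decoy]
```

The threshold bounds only the expected proportion of false discoveries, so the accepted list from any single run of such a procedure can still contain a higher proportion, which is the gap the two methods described above address.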


Subject(s)
Algorithms , Proteomics , Databases, Protein , Proteomics/methods , Tandem Mass Spectrometry
11.
bioRxiv ; 2023 May 04.
Article in English | MEDLINE | ID: mdl-37205597

ABSTRACT

Background: The number and escape levels of genes that escape X chromosome inactivation (XCI) in female somatic cells vary among tissues and cell types, potentially contributing to specific sex differences. Here we investigate the role of CTCF, a master chromatin conformation regulator, in regulating escape from XCI. CTCF binding profiles and epigenetic features were systematically examined at constitutive and facultative escape genes using mouse allelic systems to distinguish the inactive X (Xi) and active X (Xa) chromosomes. Results: We found that escape genes are located inside domains flanked by convergent arrays of CTCF binding sites, consistent with the formation of loops. In addition, strong and divergent CTCF binding sites often located at the boundaries between escape genes and adjacent neighbors subject to XCI would help insulate domains. Facultative escapees show clear differences in CTCF binding dependent on their XCI status in specific cell types/tissues. Concordantly, deletion but not inversion of a CTCF binding site at the boundary between the facultative escape gene Car5b and its silent neighbor Siah1b resulted in loss of Car5b escape. Reduced CTCF binding and enrichment of a repressive mark over Car5b in cells with a boundary deletion indicated loss of looping and insulation. In mutant lines in which either the Xi-specific compact structure or its H3K27me3 enrichment was disrupted, escape genes showed an increase in gene expression and associated active marks, supporting the roles of the 3D Xi structure and heterochromatic marks in constraining levels of escape. Conclusion: Our findings indicate that escape from XCI is modulated both by looping and insulation of chromatin via convergent arrays of CTCF binding sites and by compaction and epigenetic features of the surrounding heterochromatin.

12.
bioRxiv ; 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36945528

ABSTRACT

Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate >100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges. Consequently, many open questions remain about the organization and function of highly repetitive sequences. Here, we introduce Tigerfish, a software tool for the genome-scale design of oligo probes against repetitive DNA intervals. We showcase Tigerfish by designing a panel of 24 interval-specific repeat probes specific to each of the 24 human chromosomes and imaging this panel on metaphase spreads and in interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to highly repetitive DNA.

13.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36594573

ABSTRACT

MOTIVATION: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. RESULTS: We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. AVAILABILITY AND IMPLEMENTATION: A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
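
The mean-variance constraint discussed in this abstract can be made concrete. This is a sketch under one common negative binomial parametrization, in which the variance equals mean + dispersion × mean²; the exact parametrization used by Pastis-NB may differ, and the function names are hypothetical.

```python
from statistics import mean, variance


def poisson_variance(mu):
    """Under a Poisson model the variance is forced to equal the mean."""
    return mu


def negative_binomial_variance(mu, dispersion):
    """Variance under the assumed parametrization: mu + dispersion * mu**2.

    A dispersion of 0 recovers the Poisson case.
    """
    return mu + dispersion * mu ** 2


def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean; values above 1 suggest overdispersion."""
    return variance(counts) / mean(counts)
```

A ratio well above 1 for contact counts at a fixed genomic distance would be the kind of signal that motivates replacing the Poisson model with one carrying a separate dispersion parameter.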


Subject(s)
Algorithms , Genome , Likelihood Functions
14.
Stem Cell Reports ; 18(1): 159-174, 2023 01 10.
Article in English | MEDLINE | ID: mdl-36493778

ABSTRACT

Vascular endothelial cells are a mesoderm-derived lineage with many essential functions, including angiogenesis and coagulation. The gene-regulatory mechanisms underpinning endothelial specialization are largely unknown, as are the roles of chromatin organization in regulating endothelial cell transcription. To investigate the relationships between chromatin organization and gene expression, we induced endothelial cell differentiation from human pluripotent stem cells and performed Hi-C and RNA-sequencing assays at specific time points. Long-range intrachromosomal contacts increase over the course of differentiation, accompanied by widespread heteroeuchromatic compartment transitions that are tightly associated with transcription. Dynamic topologically associating domain boundaries strengthen and converge on an endothelial cell state, and function to regulate gene expression. Chromatin pairwise point interactions (DNA loops) increase in frequency during differentiation and are linked to the expression of genes essential to vascular biology. Chromatin dynamics guide transcription in endothelial cell development and promote the divergence of endothelial cells from cardiomyocytes.


Subject(s)
Chromatin , Endothelial Cells , Humans , Cell Differentiation/genetics , Gene Expression Regulation
15.
Am J Clin Pathol ; 157(5): 748-757, 2022 05 04.
Article in English | MEDLINE | ID: mdl-35512256

ABSTRACT

OBJECTIVES: Standard implementations of amyloid typing by liquid chromatography-tandem mass spectrometry use capabilities unavailable to most clinical laboratories. To improve accessibility of this testing, we explored easier approaches to tissue sampling and data processing. METHODS: We validated a typing method using manual sampling in place of laser microdissection, pairing the technique with a semiquantitative measure of sampling adequacy. In addition, we created an open-source data processing workflow (Crux Pipeline) for clinical users. RESULTS: Cases of amyloidosis spanning the major types were distinguishable with 100% specificity using measurements of individual amyloidogenic proteins or in combination with the ratio of λ and κ constant regions. Crux Pipeline allowed for rapid, batched data processing, integrating the steps of peptide identification, statistical confidence estimation, and label-free protein quantification. CONCLUSIONS: Accurate mass spectrometry-based amyloid typing is possible without laser microdissection. To facilitate entry into solid tissue proteomics, newcomers can leverage manual sampling approaches in combination with Crux Pipeline and related tools.


Subject(s)
Amyloidosis , Tandem Mass Spectrometry , Amyloid/analysis , Amyloidogenic Proteins , Amyloidosis/diagnosis , Humans , Microdissection , Tandem Mass Spectrometry/methods
16.
J Proteome Res ; 21(6): 1382-1391, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35549345

ABSTRACT

Advances in library-based methods for peptide detection from data-independent acquisition (DIA) mass spectrometry have made it possible to detect and quantify tens of thousands of peptides in a single mass spectrometry run. However, many of these methods rely on a comprehensive, high-quality spectral library containing information about the expected retention time and fragmentation patterns of peptides in the sample. Empirical spectral libraries are often generated through data-dependent acquisition and may suffer from biases as a result. Spectral libraries can be generated in silico, but these models are not trained to handle all possible post-translational modifications. Here, we propose a false discovery rate-controlled spectrum-centric search workflow to generate spectral libraries directly from gas-phase fractionated DIA tandem mass spectrometry data. We demonstrate that this strategy is able to detect phosphorylated peptides and can be used to generate a spectral library for accurate peptide detection and quantitation in wide-window DIA data. We compare the results of this search workflow to other library-free approaches and demonstrate that our search is competitive in terms of accuracy and sensitivity. These results demonstrate that the proposed workflow has the capacity to generate spectral libraries while avoiding the limitations of other methods.


Subject(s)
Peptides , Tandem Mass Spectrometry , Peptide Library , Peptides/analysis , Protein Processing, Post-Translational , Proteome/analysis , Tandem Mass Spectrometry/methods , Workflow
18.
Nat Rev Genet ; 23(3): 169-181, 2022 03.
Article in English | MEDLINE | ID: mdl-34837041

ABSTRACT

The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.


Subject(s)
Genomics/methods , Machine Learning , Animals , Genomics/standards , Genomics/trends , Humans , Machine Learning/standards , Models, Statistical , Software
19.
Elife ; 10, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34734806

ABSTRACT

A longstanding hypothesis is that chromatin fiber folding mediated by interactions between nearby nucleosomes represses transcription. However, it has been difficult to determine the relationship between local chromatin fiber compaction and transcription in cells. Further, global changes in fiber diameters have not been observed, even between interphase and mitotic chromosomes. We show that an increase in the range of local inter-nucleosomal contacts in quiescent yeast drives the compaction of chromatin fibers genome-wide. Unlike actively dividing cells, inter-nucleosomal interactions in quiescent cells require a basic patch in the histone H4 tail. This quiescence-specific fiber folding globally represses transcription and inhibits chromatin loop extrusion by condensin. These results reveal that global changes in chromatin fiber compaction can occur during cell state transitions, and establish physiological roles for local chromatin fiber folding in regulating transcription and chromatin domain formation.


Subject(s)
Chromatin Assembly and Disassembly , Chromatin/genetics , Saccharomyces cerevisiae/genetics , Adenosine Triphosphatases , Chromatin/metabolism , DNA-Binding Proteins , Histones/chemistry , Histones/metabolism , Multiprotein Complexes , Nucleosomes/metabolism , Protein Folding , Saccharomyces cerevisiae/growth & development , Transcription, Genetic
20.
Genome Biol ; 22(1): 279, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34579774

ABSTRACT

BACKGROUND: Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. RESULTS: Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a "bookmark" mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. CONCLUSIONS: Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. 
Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.
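
A contact decay profile of the kind mentioned in this abstract summarizes how contact frequency falls off with genomic distance. The sketch below assumes a single square, symmetric contact matrix stored as nested lists and averages each diagonal; how the study computes allele-specific profiles from single-cell data is not described in the abstract, so this is only an illustration with a hypothetical function name.

```python
def contact_decay_profile(matrix):
    """Mean contact count at each bin separation d = 0, 1, ..., n - 1.

    matrix: square list of lists where matrix[i][j] is the contact count
    between bins i and j.
    """
    n = len(matrix)
    profile = []
    for d in range(n):
        # Entries on the d-th diagonal are pairs of bins separated by d bins.
        values = [matrix[i][i + d] for i in range(n - d)]
        profile.append(sum(values) / len(values))
    return profile
```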


Subject(s)
Cell Differentiation/genetics , Gene Expression , Mouse Embryonic Stem Cells/metabolism , X Chromosome Inactivation , Alleles , Animals , Cell Cycle , Cell Line , Cell Nucleus/genetics , Female , Genome , Male , Mice , RNA-Seq , Single-Cell Analysis , X Chromosome/chemistry