Results 1 - 20 of 207
1.
Methods Mol Biol ; 2846: 1-16, 2024.
Article in English | MEDLINE | ID: mdl-39141226

ABSTRACT

For the genome-wide mapping of histone modifications, chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing remains the benchmark method. While crosslinked ChIP can be used for all kinds of targets, native ChIP is predominantly used for strong and direct DNA interactors like histones and their modifications. Here we describe a native ChIP protocol that can be used for cells and tissue material.


Subject(s)
Chromatin Immunoprecipitation , Histones , Chromatin Immunoprecipitation/methods , Histones/metabolism , Histones/genetics , Humans , Histone Code , High-Throughput Nucleotide Sequencing/methods , Protein Processing, Post-Translational , Animals , Chromatin/metabolism , Chromatin/genetics , DNA/genetics , DNA/metabolism , Chromatin Immunoprecipitation Sequencing/methods
2.
Methods Mol Biol ; 2846: 47-62, 2024.
Article in English | MEDLINE | ID: mdl-39141229

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (-seq) has been the most common genomics method for studying DNA-protein interactions in the last decade. ChIP-seq technology became standard both experimentally and computationally. This chapter presents a core workflow that covers data processing and initial analytical steps of ChIP-seq data. We provide a step-by-step protocol of the commands as well as a fully assembled Snakemake workflow. Along the protocol, we discuss key tool parameters, quality control, output reports, and preliminary results.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Computational Biology , Software , Workflow , Chromatin Immunoprecipitation Sequencing/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Data Analysis , Chromatin Immunoprecipitation/methods , Humans
3.
Methods Mol Biol ; 2846: 63-89, 2024.
Article in English | MEDLINE | ID: mdl-39141230

ABSTRACT

Chromatin immunoprecipitation in combination with next-generation sequencing (ChIP-Seq) allows probing of protein-DNA binding in a rapid and genome-wide fashion. Herein we describe the required steps to preprocess ChIP-Seq data and to analyze the differential binding of proteins to DNA for perturbation experiments. In these experiments, different conditions are compared to find the underlying biological mechanisms caused by the stimulus or treatment. In addition, we provide a sample analysis using the steps outlined in the chapter.
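The comparison of conditions described above can be illustrated with a minimal sketch. It assumes read counts per peak are already available for one control and one treated sample, normalizes them to counts per million (CPM), and reports a log2 fold change per peak. The function names and the pseudocount are illustrative assumptions, not the chapter's workflow; a real differential binding analysis would use replicates and a statistical model rather than a bare fold change.

```python
import math


def cpm(counts):
    """Scale raw per-peak read counts to counts per million."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-peak log2(treated / control) on CPM values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for peaks with no reads in one condition.
    """
    c_norm, t_norm = cpm(control), cpm(treated)
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for c, t in zip(c_norm, t_norm)
    ]


# Hypothetical counts for two peaks in two conditions.
lfc = log2_fold_changes(control=[500, 500], treated=[250, 750])
```

Here the first peak receives a negative value (reduced binding after treatment) and the second a positive one.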


Subject(s)
Chromatin Immunoprecipitation Sequencing , DNA-Binding Proteins , DNA , High-Throughput Nucleotide Sequencing , Protein Binding , Chromatin Immunoprecipitation Sequencing/methods , DNA/metabolism , DNA/genetics , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Binding Sites , Chromatin Immunoprecipitation/methods , Computational Biology/methods , Sequence Analysis, DNA/methods , Software
4.
Methods Mol Biol ; 2846: 109-121, 2024.
Article in English | MEDLINE | ID: mdl-39141232

ABSTRACT

ChIP-Seq has been used extensively to profile genome-wide transcription factor binding and post-translational histone modifications. A sequential ChIP assay determines the in vivo co-localization of two proteins to the same genomic locus. In this chapter, we combine the two protocols in Sequential ChIP-Seq, a method for identifying genome-wide sites of in vivo protein co-occupancy.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin Immunoprecipitation Sequencing/methods , Humans , Histones/metabolism , Histones/genetics , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Protein Binding , Chromatin Immunoprecipitation/methods , Animals , High-Throughput Nucleotide Sequencing/methods
5.
Methods Mol Biol ; 2846: 133-150, 2024.
Article in English | MEDLINE | ID: mdl-39141234

ABSTRACT

Gonadal steroid hormones, namely, testosterone, progesterone, and estrogens, influence the physiological state of an organism through the regulation of gene transcription. Steroid hormones activate nuclear hormone receptors (HRs), transcription factors (TFs) that bind DNA in a tissue- and cell type-specific manner to influence cellular function. Identifying the genomic binding sites of HRs is essential to understanding mechanisms of hormone signaling across tissues and disease contexts. Traditionally, chromatin immunoprecipitation followed by sequencing (ChIP-seq) has been used to map the genomic binding of HRs in cancer cell lines and large tissues. However, ChIP-seq lacks the sensitivity to detect TF binding in small numbers of cells, such as genetically defined neuronal subtypes in the brain. Cleavage Under Targets & Release Using Nuclease (CUT&RUN) resolves most of the technical limitations of ChIP-seq, enabling the detection of protein-DNA interactions with as few as 100-1000 cells. In this chapter, we provide a stepwise CUT&RUN protocol for detecting and analyzing the genome-wide binding of estrogen receptor α (ERα) in mouse brain tissue. The steps described here can be used to identify the genomic binding sites of most TFs in the brain.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin , Animals , Chromatin/metabolism , Chromatin/genetics , Mice , Chromatin Immunoprecipitation Sequencing/methods , Binding Sites , Chromatin Immunoprecipitation/methods , Brain/metabolism , Estrogen Receptor alpha/metabolism , Estrogen Receptor alpha/genetics , Receptors, Estrogen/metabolism , Receptors, Estrogen/genetics , Protein Binding , Humans , Transcription Factors/metabolism , Transcription Factors/genetics
6.
Methods Mol Biol ; 2846: 169-179, 2024.
Article in English | MEDLINE | ID: mdl-39141236

ABSTRACT

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) allows for the identification of genomic targeting of DNA-binding proteins. Cleavage Under Targets and Release Using Nuclease (CUT&RUN) modifies this process by including a nuclease to digest DNA around a protein of interest. The result is a higher signal-to-noise ratio and decreased required starting material. This allows for high-fidelity sequence identification from as few as 500 cells, enabling chromatin profiling of precious tissue samples or primary cell types, as well as less abundant chromatin-binding proteins: all at significantly increased throughput.


Subject(s)
Epigenesis, Genetic , Humans , Chromatin Immunoprecipitation/methods , Chromatin Immunoprecipitation Sequencing/methods , DNA/metabolism , DNA/genetics , Chromatin/metabolism , Chromatin/genetics , Animals , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics
7.
Methods Mol Biol ; 2846: 191-213, 2024.
Article in English | MEDLINE | ID: mdl-39141238

ABSTRACT

Cleavage Under Targets and Tagmentation (CUT&Tag) is a recent methodology used for robust epigenomic profiling that, unlike conventional chromatin immunoprecipitation sequencing (ChIP-Seq), requires only a limited number of cells as starting material. RNA sequencing (RNA-Seq) reveals the presence and quantity of RNA in a biological sample, describing the continuously changing cellular transcriptome. The integrated analysis of transcriptional activity, histone modifications, and chromatin accessibility via CUT&Tag is still in its infancy compared to the well-established ChIP-Seq. This chapter describes a robust bioinformatics methodology and workflow to perform an integrative CUT&Tag/RNA-Seq analysis.


Subject(s)
Computational Biology , Workflow , Computational Biology/methods , Humans , Epigenomics/methods , RNA-Seq/methods , Software , Chromatin/genetics , Chromatin/metabolism , Sequence Analysis, RNA/methods , Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Transcriptome
8.
Nat Commun ; 15(1): 6828, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39122670

ABSTRACT

Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is being increasingly used to study gene regulation. However, major analytical gaps limit its utility in studying gene regulatory programs in complex diseases. In response, MOCHA (Model-based single cell Open CHromatin Analysis) presents major advances over existing analysis tools, including: 1) improving identification of sample-specific open chromatin, 2) statistical modeling of technical drop-out with zero-inflated methods, 3) mitigation of false positives in single cell analysis, 4) identification of alternative transcription-starting-site regulation, and 5) modules for inferring temporal gene regulatory networks from longitudinal data. These advances, in addition to open chromatin analyses, provide a robust framework after quality control and cell labeling to study gene regulatory programs in human disease. We benchmark MOCHA with four state-of-the-art tools to demonstrate its advances. We also construct cross-sectional and longitudinal gene regulatory networks, identifying potential mechanisms of COVID-19 response. MOCHA provides researchers with a robust analytical tool for functional genomic inference from scATAC-seq data.


Subject(s)
COVID-19 , Chromatin , Gene Regulatory Networks , Genomics , Models, Statistical , Single-Cell Analysis , Humans , COVID-19/genetics , COVID-19/virology , Single-Cell Analysis/methods , Genomics/methods , Chromatin/genetics , Chromatin/metabolism , SARS-CoV-2/genetics , Transposases/metabolism , Transposases/genetics , Chromatin Immunoprecipitation Sequencing/methods , Cohort Studies , Gene Expression Regulation
9.
Methods Mol Biol ; 2818: 23-43, 2024.
Article in English | MEDLINE | ID: mdl-39126465

ABSTRACT

Meiotic recombination is a key process facilitating the formation of crossovers and the exchange of genetic material between homologous chromosomes in early meiosis. This involves controlled double-strand break (DSB) formation catalyzed by Spo11. DSBs occur preferentially in specific genomic regions referred to as hotspots, and their variability is tied to varying Spo11 activity levels. We have refined a ChIP-Seq technique, called SPO-Seq, to map Spo11-specific DSB formation in Saccharomyces cerevisiae. The chapter describes our streamlined approach and the bioinformatic tools developed for processing data and comparing them with existing DSB hotspot maps. Through this combined experimental and computational approach, we aim to enhance our understanding of meiotic recombination and genetic exchange processes in budding yeast, with the potential to extend this methodology to other organisms with a few modifications.


Subject(s)
DNA Breaks, Double-Stranded , Endodeoxyribonucleases , Meiosis , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Meiosis/genetics , Endodeoxyribonucleases/genetics , Endodeoxyribonucleases/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Computational Biology/methods
10.
Methods Mol Biol ; 2818: 65-80, 2024.
Article in English | MEDLINE | ID: mdl-39126467

ABSTRACT

Chromatin undergoes extensive remodeling during meiosis, leading to specific patterns of gene expression and chromosome organization, which ultimately control fundamental meiotic processes such as recombination and homologous chromosome associations. Recent game-changing advances have been made by genome-wide analysis of chromatin-binding sites of meiosis-specific proteins in mouse spermatocytes. However, further progress is still highly dependent on the reliable isolation of sufficient quantities of spermatocytes at specific stages of prophase I. Here, we describe a combination of methodologies we adapted for rapid and reliable isolation of synchronized fixed mouse spermatocytes. We show that chromatin isolated from these cells can be used to study chromatin-binding sites by ChIP-seq. High-quality data obtained from INO80 ChIP-seq in zygotene cells were used for functional analysis of chromatin-binding sites.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin , Spermatocytes , Animals , Spermatocytes/metabolism , Spermatocytes/cytology , Mice , Male , Chromatin Immunoprecipitation Sequencing/methods , Chromatin/genetics , Chromatin/metabolism , Meiosis/genetics , Chromatin Immunoprecipitation/methods , Binding Sites
11.
Nat Commun ; 15(1): 6852, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39127768

ABSTRACT

Cis-regulatory elements (CREs) are pivotal in orchestrating gene expression throughout diverse biological systems. Accurate identification and in-depth characterization of functional CREs are crucial for decoding gene regulation networks during cellular processes. In this study, we develop Kethoxal-Assisted Single-stranded DNA Assay for Transposase-Accessible Chromatin with Sequencing (KAS-ATAC-seq) to quantitatively analyze the transcriptional activity of CREs. A main advantage of KAS-ATAC-seq lies in its precise measurement of ssDNA levels within both proximal and distal ATAC-seq peaks, enabling the identification of transcriptional regulatory sequences. This feature is particularly suited to defining Single-Stranded Transcribing Enhancers (SSTEs). SSTEs are highly enriched with nascent RNAs and with binding sites for specific transcription factors (TFs) that define cellular identity. Moreover, KAS-ATAC-seq provides a detailed characterization of various SSTE subtypes and their functional implications. Our analysis of CREs during mouse neural differentiation demonstrates that KAS-ATAC-seq can effectively identify immediate-early activated CREs in response to retinoic acid (RA) treatment. Our findings indicate that KAS-ATAC-seq provides more precise annotation of functional CREs in transcription. Future applications of KAS-ATAC-seq would help elucidate the intricate dynamics of gene regulation in diverse biological processes.


Subject(s)
Transcription Factors , Animals , Mice , Transcription Factors/metabolism , Transcription Factors/genetics , Transcription, Genetic , Enhancer Elements, Genetic/genetics , Chromatin/metabolism , Chromatin/genetics , Binding Sites , Humans , DNA, Single-Stranded/genetics , DNA, Single-Stranded/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Transposases/metabolism , Transposases/genetics , Regulatory Elements, Transcriptional , Tretinoin/pharmacology , Tretinoin/metabolism , Gene Expression Regulation , Cell Differentiation/genetics , Sequence Analysis, DNA/methods , Regulatory Sequences, Nucleic Acid/genetics
12.
PLoS Comput Biol ; 20(8): e1011854, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39093856

ABSTRACT

Single-cell ATAC-seq (scATAC-seq) data have been widely used to investigate chromatin accessibility at the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq, such as excessive zeros and large variability of chromatin accessibility across cells, impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real-data exploration showing that distribution differences exist among cell types, we introduce a novel composite statistical test named "scaDA", which is based on a zero-inflated negative binomial (ZINB) model, for performing differential distribution analysis of chromatin accessibility by jointly testing abundance, prevalence, and dispersion. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.
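To make the ZINB model mentioned above concrete, the following is a minimal sketch of its probability mass function and log-likelihood. It is not scaDA's implementation: the parameterization (mean `mu`, dispersion parameter `size`, zero-inflation probability `pi`) and all function names are assumptions chosen for illustration. Loosely, `mu` relates to abundance, `pi` to prevalence, and `size` to dispersion.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log-pmf of a negative binomial with mean mu and
    variance mu + mu**2 / size."""
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def zinb_pmf(k, mu, size, pi):
    """P(X = k) when a structural zero occurs with probability pi
    and the count otherwise follows the negative binomial."""
    nb = math.exp(nb_log_pmf(k, mu, size))
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_log_likelihood(counts, mu, size, pi):
    """Log-likelihood of independent counts under one ZINB distribution."""
    return sum(math.log(zinb_pmf(k, mu, size, pi)) for k in counts)
```

A test of distributional difference between two cell groups could then compare the log-likelihood under shared parameters with the log-likelihood under group-specific parameters; the fitting procedure itself is outside the scope of this sketch.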


Subject(s)
Chromatin , Single-Cell Analysis , Chromatin/genetics , Chromatin/metabolism , Chromatin/chemistry , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , Computational Biology/methods , Alzheimer Disease/genetics , Models, Statistical , Chromatin Immunoprecipitation Sequencing/methods , Computer Simulation , Animals , Sequence Analysis, DNA/methods , Algorithms
13.
Article in English | MEDLINE | ID: mdl-39049508

ABSTRACT

Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, which helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers as if in single-cell RNA-seq (scRNA-seq). However, there are few GSS tools specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly promote the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance is more dependent on GSS tools or datasets. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.
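As an illustration of what a gene set score is, the sketch below computes one of the simplest possible variants: each gene's activity is z-scored across cells, and a cell's score is the mean z-score over the genes in the set. This is an assumed, generic scheme for illustration only and does not correspond to any specific tool in the benchmark; the function name and the input layout (a dict from gene to per-cell values) are hypothetical.

```python
from statistics import mean, pstdev


def gene_set_scores(activity, gene_set):
    """Return one score per cell: the mean z-score over the set's genes.

    activity: dict mapping gene name -> list of per-cell activity values
              (all lists must have the same length).
    gene_set: iterable of gene names; genes absent from activity are skipped.
    """
    genes = [g for g in gene_set if g in activity]
    if not genes:
        raise ValueError("no gene of the set is present in the activity data")
    n_cells = len(activity[genes[0]])
    z_rows = []
    for gene in genes:
        values = activity[gene]
        mu, sd = mean(values), pstdev(values)
        # Genes with constant activity carry no information; score them as 0.
        z_rows.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(row[c] for row in z_rows) for c in range(n_cells)]


# Hypothetical gene-activity values for four cells.
activity = {"A": [0, 0, 4, 4], "B": [1, 1, 3, 3], "C": [5, 5, 5, 5]}
scores = gene_set_scores(activity, ["A", "B"])
```

In this toy input the last two cells have higher activity for both genes of the set, so they receive the higher scores.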


Subject(s)
Algorithms , Benchmarking , Chromatin Immunoprecipitation Sequencing , Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/standards , Humans , Chromatin Immunoprecipitation Sequencing/methods , RNA-Seq/methods , RNA-Seq/standards , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Chromatin/genetics , Chromatin/metabolism
14.
Life Sci Alliance ; 7(9)2024 Sep.
Article in English | MEDLINE | ID: mdl-38969365

ABSTRACT

Zn2+ is an essential metal required by approximately 850 human transcription factors. How these proteins acquire their essential Zn2+ cofactor and whether they are sensitive to changes in the labile Zn2+ pool in cells remain open questions. Using ATAC-seq to profile regions of accessible chromatin coupled with transcription factor enrichment analysis, we examined how increases and decreases in the labile zinc pool affect chromatin accessibility and transcription factor enrichment. We found 685 transcription factor motifs were differentially enriched, corresponding to 507 unique transcription factors. The pattern of perturbation and the types of transcription factors were notably different at promoters versus intergenic regions, with zinc-finger transcription factors strongly enriched in intergenic regions in elevated Zn2+. To test whether ATAC-seq and transcription factor enrichment analysis predictions correlate with changes in transcription factor binding, we used ChIP-qPCR to profile six p53 binding sites. We found that for five of the six targets, p53 binding correlates with the local accessibility determined by ATAC-seq. These results demonstrate that changes in labile zinc alter chromatin accessibility and transcription factor binding to DNA.


Subject(s)
Chromatin , DNA , Protein Binding , Transcription Factors , Tumor Suppressor Protein p53 , Zinc , Humans , Tumor Suppressor Protein p53/metabolism , Tumor Suppressor Protein p53/genetics , Chromatin/metabolism , Chromatin/genetics , Zinc/metabolism , DNA/metabolism , DNA/genetics , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Promoter Regions, Genetic/genetics , Chromatin Immunoprecipitation Sequencing/methods
15.
Brief Bioinform ; 25(Supplement_1)2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041910

ABSTRACT

Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) generates genome-wide chromatin accessibility profiles, providing valuable insights into epigenetic gene regulation at both pooled-cell and single-cell population levels. Comprehensive analysis of ATAC-seq data involves the use of various interdependent programs. Learning the correct sequence of steps needed to process the data can represent a major hurdle. Selecting appropriate parameters at each stage, including pre-analysis, core analysis, and advanced downstream analysis, is important to ensure accurate analysis and interpretation of ATAC-seq data. Additionally, obtaining and working within a limited computational environment presents a significant challenge to non-bioinformatic researchers. Therefore, we present Cloud ATAC, an open-source, cloud-based interactive framework with a scalable, flexible, and streamlined analysis framework based on the best practices approach for pooled-cell and single-cell ATAC-seq data. These frameworks use on-demand computational power and memory, scalability, and a secure and compliant environment provided by the Google Cloud. Additionally, we leverage Jupyter Notebook's interactive computing platform that combines live code, tutorials, narrative text, flashcards, quizzes, and custom visualizations to enhance learning and analysis. Further, leveraging GPU instances has significantly improved the run-time of the single-cell framework. The source codes and data are publicly available through NIH Cloud lab https://github.com/NIGMS/ATAC-Seq-and-Single-Cell-ATAC-Seq-Analysis. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.


Subject(s)
Cloud Computing , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Humans , Computational Biology/methods , Chromatin Immunoprecipitation Sequencing/methods , Single-Cell Analysis/methods , Chromatin/genetics , Chromatin/metabolism
16.
Methods Mol Biol ; 2826: 55-63, 2024.
Article in English | MEDLINE | ID: mdl-39017885

ABSTRACT

The Assay for Transposase Accessible Chromatin (ATAC)-seq protocol is optimized to generate global maps of accessible chromatin using limited cell inputs. The Tn5 transposase tagmentation reaction simultaneously fragments and tags the accessible DNA with Illumina Nextera sequencing adapters. Fragmented and adapter tagged DNA is then purified and PCR amplified with dual indexing primers to generate a size-specific sequencing library. The One-Step workflow below outlines the Tn5 nuclei transposition from a range of cell inputs followed by PCR amplification to generate a sequencing library.


Subject(s)
B-Lymphocytes , Chromatin , High-Throughput Nucleotide Sequencing , Transposases , Chromatin/genetics , Chromatin/metabolism , Transposases/metabolism , Transposases/genetics , B-Lymphocytes/metabolism , High-Throughput Nucleotide Sequencing/methods , Humans , Gene Library , Sequence Analysis, DNA/methods , Polymerase Chain Reaction/methods , Animals , DNA/genetics , Chromatin Immunoprecipitation Sequencing/methods
17.
Methods Mol Biol ; 2842: 419-447, 2024.
Article in English | MEDLINE | ID: mdl-39012609

ABSTRACT

Chromatin immunoprecipitation (ChIP) is an invaluable method to characterize interactions between proteins and genomic DNA, such as the genomic localization of transcription factors and post-translational modification of histones. DNA and proteins are reversibly and covalently crosslinked using formaldehyde. Then the cells are lysed to release the chromatin. The chromatin is fragmented into smaller sizes either by micrococcal nuclease (MN) or sonication and then purified from other cellular components. The protein-DNA complexes are enriched by immunoprecipitation (IP) with antibodies that target the epitope of interest. The DNA is released from the proteins by heat and protease treatment, followed by degradation of contaminating RNAs with RNase. The resulting DNA is analyzed using various methods, including polymerase chain reaction (PCR), quantitative PCR (qPCR), or sequencing. This protocol outlines each of these steps for both yeast and human cells. This chapter includes a contextual discussion of the combination of ChIP with DNA analysis methods such as ChIP-on-Chip, ChIP-qPCR, and ChIP-Seq, recent updates on ChIP-Seq data analysis pipelines, complementary methods for identification of binding sites of DNA binding proteins, and additional protocol information about ChIP-qPCR and ChIP-Seq.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Humans , Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation/methods , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , Binding Sites , Chromatin/genetics , Chromatin/metabolism , High-Throughput Nucleotide Sequencing/methods
18.
Genome Res ; 34(6): 937-951, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38986578

ABSTRACT

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multimapped" reads that align equally well to multiple genomic locations. Because multimapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multimapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multimapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multimapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq data sets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions.
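The idea of probabilistically allocating a multimapped read can be sketched as follows. This is a deliberately simplified illustration, not Allo's algorithm: it weights each candidate location only by the number of uniquely mapped reads nearby and omits the convolutional neural network entirely. The function names, the pseudocount, and the input layout are assumptions.

```python
import random


def allocation_probabilities(candidates, unique_counts, pseudocount=1.0):
    """Probability of assigning a multimapped read to each candidate locus,
    proportional to the uniquely mapped read count near that locus.

    candidates: list of locus identifiers the read aligns to equally well.
    unique_counts: dict mapping locus -> uniquely mapped reads nearby.
    pseudocount: keeps loci without unique reads from getting probability 0.
    """
    weights = {
        locus: unique_counts.get(locus, 0) + pseudocount
        for locus in candidates
    }
    total = sum(weights.values())
    return {locus: w / total for locus, w in weights.items()}


def assign_read(candidates, unique_counts, rng, pseudocount=1.0):
    """Draw a single locus for the read according to those probabilities."""
    probs = allocation_probabilities(candidates, unique_counts, pseudocount)
    loci = list(probs)
    return rng.choices(loci, weights=[probs[l] for l in loci], k=1)[0]


# Hypothetical read aligning equally well to two repeat copies.
rng = random.Random(0)  # seeded for reproducibility
chosen = assign_read(["copy_1", "copy_2"], {"copy_1": 9}, rng)
```

Writing the chosen locus back into the alignment record is what would make such an approach compatible with downstream peak callers, in the spirit of the read-level output the abstract describes.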


Subject(s)
Chromatin Immunoprecipitation Sequencing , Humans , Chromatin Immunoprecipitation Sequencing/methods , Regulatory Sequences, Nucleic Acid , Repetitive Sequences, Nucleic Acid , Genomics/methods , Binding Sites , CCCTC-Binding Factor/metabolism , CCCTC-Binding Factor/genetics , Regulatory Elements, Transcriptional , DNA Transposable Elements , Sequence Analysis, DNA/methods , Neural Networks, Computer
19.
Genes (Basel) ; 15(7)2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39062661

ABSTRACT

In recent years, there has been a growing interest in profiling multiomic modalities within individual cells simultaneously. One such example is integrating combined single-cell RNA sequencing (scRNA-seq) data and single-cell transposase-accessible chromatin sequencing (scATAC-seq) data. Integrated analysis of diverse modalities has helped researchers make more accurate predictions and gain a more comprehensive understanding than with single-modality analysis. However, generating such multimodal data is technically challenging and expensive, leading to limited availability of single-cell co-assay data. Here, we propose a model for cross-modal prediction between the transcriptome and chromatin profiles in single cells. Our model is based on a deep neural network architecture that learns the latent representations from the source modality and then predicts the target modality. It demonstrates reliable performance in accurately translating between these modalities across multiple paired human scATAC-seq and scRNA-seq datasets. Additionally, we developed CrossMP, a web-based portal allowing researchers to upload their single-cell modality data through an interactive web interface and predict the other type of modality data, using high-performance computing resources plugged at the backend.


Subject(s)
Chromatin Immunoprecipitation Sequencing , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Chromatin Immunoprecipitation Sequencing/methods , Software , Internet , Transcriptome/genetics , Sequence Analysis, RNA/methods , Chromatin/genetics , Chromatin/metabolism , Single-Cell Gene Expression Analysis
20.
Methods Mol Biol ; 2819: 39-53, 2024.
Article in English | MEDLINE | ID: mdl-39028501

ABSTRACT

Nucleotide sequences recognized and bound by DNA-binding proteins (DBPs) are critical to controlling and maintaining gene expression, replication, chromosome segregation, cell division, and nucleoid structure in bacterial cells. Therefore, determination of the binding sequences of DBPs is important not only to study DBP recognition mechanisms but also to understand the fundamentals of cell homeostasis. While ChIP-seq analysis appears to be an effective way to determine DBP binding sites on the genome, the resolution is sometimes not sufficient to identify the sites precisely. Here we introduce a simple and effective method named Genome Footprinting with high-throughput sequencing (GeF-seq) to determine binding sites of DBPs with single base-pair resolution. GeF-seq detects binding sites of DBPs as sharp peaks and thus makes it possible to identify the recognition sequence in each "binding peak" more easily and accurately compared to the common ChIP-seq.


Subject(s)
Chromatin Immunoprecipitation Sequencing , DNA-Binding Proteins , High-Throughput Nucleotide Sequencing , Chromatin Immunoprecipitation Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Binding Sites , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , Base Pairing , Protein Binding , DNA Footprinting/methods