1 - 20 of 886
1.
Nat Commun ; 15(1): 3606, 2024 May 02.
Article En | MEDLINE | ID: mdl-38697975

Amyotrophic Lateral Sclerosis (ALS), like many other neurodegenerative diseases, is highly heritable, but with only a small fraction of cases explained by monogenic disease alleles. To better understand sporadic ALS, we report epigenomic profiles, as measured by ATAC-seq, of motor neuron cultures derived from a diverse group of 380 ALS patients and 80 healthy controls. We find that chromatin accessibility is heavily influenced by sex, the iPSC cell type of origin, ancestry, and the inherent variance arising from sequencing. Once these covariates are corrected for, we are able to identify ALS-specific signals in the data. Additionally, we find that the ATAC-seq data is able to predict ALS disease progression rates with similar accuracy to methods based on biomarkers and clinical status. These results suggest that iPSC-derived motor neurons recapitulate important disease-relevant epigenomic changes.


Amyotrophic Lateral Sclerosis , Induced Pluripotent Stem Cells , Motor Neurons , Humans , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/pathology , Amyotrophic Lateral Sclerosis/metabolism , Induced Pluripotent Stem Cells/metabolism , Motor Neurons/metabolism , Motor Neurons/pathology , Male , Female , Middle Aged , Case-Control Studies , Chromatin/metabolism , Chromatin/genetics , Aged , Epigenomics/methods , Chromatin Immunoprecipitation Sequencing/methods , Disease Progression , Epigenesis, Genetic
2.
Int J Mol Sci ; 25(9), 2024 May 03.
Article En | MEDLINE | ID: mdl-38732207

Prediction of binding sites for transcription factors is important to understand how the latter regulate gene expression and how this regulation can be modulated for therapeutic purposes. A considerable number of studies address this issue with different approaches, Machine Learning being one of the most successful. Nevertheless, we note that many such approaches fail to propose a robust and meaningful method to embed the genetic data under analysis. We try to overcome this problem by proposing a bidirectional transformer-based encoder, empowered by bidirectional long short-term memory layers and with a capsule layer responsible for the final prediction. To evaluate the efficiency of the proposed approach, we use benchmark ChIP-seq datasets of five cell lines available in the ENCODE repository (A549, GM12878, Hep-G2, H1-hESC, and HeLa). The results show that the proposed method can predict transcription factor binding sites (TFBS) within the five different cell lines very well; moreover, cross-cell predictions provide satisfactory results as well. Experiments conducted across cell lines are reinforced by the analysis of five additional lines used only to test the model trained using the others. The results confirm that prediction performance across cell lines remains very high, allowing an extensive cross-transcription factor analysis to be performed from which several indications of interest for molecular biology may be drawn.


Deep Learning , Transcription Factors , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Computational Biology/methods , HeLa Cells , Protein Binding , Chromatin Immunoprecipitation Sequencing/methods , Cell Line
3.
Genomics ; 116(3): 110851, 2024 May.
Article En | MEDLINE | ID: mdl-38692440

Skeletal muscle satellite cells (SMSCs) play an important role in regulating muscle growth and regeneration. Chromatin accessibility allows physical interactions that synergistically regulate gene expression through enhancers, promoters, insulators, and chromatin binding factors. However, the chromatin accessibility atlas and its regulatory role in ovine myoblast differentiation are still unclear. Therefore, ATAC-seq and RNA-seq analyses were performed on ovine SMSCs at the proliferation stage (SCG) and differentiation stage (SCD). A total of 17,460 DARs (differentially accessible regions) and 3732 DEGs (differentially expressed genes) were identified. Based on joint analysis of ATAC-seq and RNA-seq, we revealed that PI3K-Akt, TGF-β and other signaling pathways regulated SMSC differentiation. We identified two novel candidate genes, FZD5 and MAP2K6, which may affect the proliferation and differentiation of SMSCs. Our data identify potential cis-regulatory elements of ovine SMSCs. This study can provide a reference for exploring the mechanisms of the differentiation and regeneration of SMSCs in the future.


Cell Differentiation , Muscle Development , Satellite Cells, Skeletal Muscle , Animals , Satellite Cells, Skeletal Muscle/metabolism , Satellite Cells, Skeletal Muscle/cytology , Sheep/genetics , Muscle Development/genetics , Frizzled Receptors/genetics , Frizzled Receptors/metabolism , RNA-Seq , Signal Transduction , Cells, Cultured , Chromatin Immunoprecipitation Sequencing , Transforming Growth Factor beta/metabolism , Transforming Growth Factor beta/genetics , Phosphatidylinositol 3-Kinases/metabolism , Phosphatidylinositol 3-Kinases/genetics , Proto-Oncogene Proteins c-akt/metabolism , Proto-Oncogene Proteins c-akt/genetics , Cell Proliferation
4.
J Proteome Res ; 23(6): 1937-1947, 2024 Jun 07.
Article En | MEDLINE | ID: mdl-38776154

Lactylation is a novel post-translational modification of proteins. Although histone lactylation has been reported to be involved in glucose metabolism, its role and molecular pathways in gestational diabetes mellitus (GDM) are still unclear. This study aims to elucidate the histone lactylation modification landscapes of GDM patients and explore lactylation-modification-related genes involved in GDM. We employed a combination of RNA-seq analysis and chromatin immunoprecipitation sequencing (ChIP-seq) analysis to identify upregulated differentially expressed genes (DEGs) with hyperhistone lactylation modification in GDM. We demonstrated that the levels of lactate and histone lactylation were significantly elevated in GDM patients. DEGs were involved in diabetes-related pathways, such as the PI3K-Akt signaling pathway, Jak-STAT signaling pathway, and mTOR signaling pathway. ChIP-seq analysis indicated that histone lactylation modification in the promoter regions of the GDM group was significantly changed. By integrating the results of RNA-seq and ChIP-seq analysis, we found that CACNA2D1 is a key gene for histone lactylation modification and is involved in the progression of GDM by promoting cell vitality and proliferation. In conclusion, we identified the key gene CACNA2D1, which was upregulated and exhibited hypermodification of histone lactylation in GDM. These findings establish a theoretical groundwork for the targeted therapy of GDM.


Chromatin Immunoprecipitation Sequencing , Diabetes, Gestational , Histones , Protein Processing, Post-Translational , Diabetes, Gestational/genetics , Diabetes, Gestational/metabolism , Humans , Female , Pregnancy , Histones/metabolism , Histones/genetics , Signal Transduction/genetics , RNA-Seq , Adult
5.
BMC Bioinformatics ; 25(1): 158, 2024 Apr 20.
Article En | MEDLINE | ID: mdl-38643066

BACKGROUND: Motif finding in Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data is essential to reveal the intricacies of transcription factor binding sites (TFBSs) and their pivotal roles in gene regulation. Deep learning technologies, including convolutional neural networks (CNNs) and graph neural networks (GNNs), have achieved success in finding ATAC-seq motifs. However, CNN-based methods are limited by the fixed width of the convolutional kernel, which makes it difficult to find multiple transcription factor binding sites with different lengths. GNN-based methods have the limitation of using the edge weight information directly, which makes it difficult to aggregate the neighboring nodes' information efficiently when representing node embeddings. RESULTS: To address this challenge, we developed a novel graph attention network framework named MMGAT, which employs an attention mechanism to adjust the attention coefficients among different nodes. MMGAT then finds multiple ATAC-seq motifs based on the attention coefficients of sequence nodes and k-mer nodes as well as the coexisting probability of k-mers. Our approach achieved better performance on the human ATAC-seq datasets compared to existing tools, as evidenced by the highest scores on the precision, recall, F1_score, ACC, AUC, and PRC metrics, as well as by finding 389 higher-quality motifs. To validate the performance of MMGAT in predicting TFBSs and finding motifs on more datasets, we enlarged the number of the human ATAC-seq datasets to 180 and newly integrated 80 mouse ATAC-seq datasets for multi-species experimental validation. Specifically on the mouse ATAC-seq dataset, MMGAT also achieved the highest scores on six metrics and found 356 higher-quality motifs. To facilitate researchers in utilizing MMGAT, we have also developed a user-friendly web server named MMGAT-S that hosts the MMGAT method and ATAC-seq motif finding results.
CONCLUSIONS: The advanced methodology MMGAT provides a robust tool for finding ATAC-seq motifs, and the comprehensive server MMGAT-S makes a significant contribution to genomics research. The open-source code of MMGAT can be found at https://github.com/xiaotianr/MMGAT , and MMGAT-S is freely available at https://www.mmgraphws.com/MMGAT-S/ .


Chromatin Immunoprecipitation Sequencing , Genomics , Humans , Animals , Mice , Binding Sites , Protein Binding , Genomics/methods , Chromatin/genetics , Transcription Factors/metabolism
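
The MMGAT abstract above reports precision, recall, F1_score and ACC without defining them. As a reminder of what these four metrics measure, the following is a minimal sketch using their standard definitions from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper; AUC and PRC are omitted because they need ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a TFBS classifier (illustrative only).
metrics = classification_metrics(tp=80, fp=20, tn=90, fn=10)
print(metrics)
```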
6.
Sci Rep ; 14(1): 9275, 2024 Apr 23.
Article En | MEDLINE | ID: mdl-38654130

Transcription factors (TFs) are crucial epigenetic regulators, which enable cells to dynamically adjust gene expression in response to environmental signals. Computational procedures like digital genomic footprinting on chromatin accessibility assays such as ATAC-seq can be used to identify bound TFs on a genome-wide scale. This method utilizes short regions of low accessibility signals due to steric hindrance of DNA-bound proteins, called footprints (FPs), which are combined with motif databases for TF identification. However, while over 1600 TFs have been described in the human genome, only ~700 of these have a known binding motif. Thus, a substantial number of FPs without overlap to a known DNA motif are normally discarded from FP analysis. In addition, the FP method is restricted to organisms with a substantial number of known TF motifs. Here we present DENIS (DE Novo motIf diScovery), a framework to generate and systematically investigate the potential of de novo TF motif discovery from FPs. DENIS includes functionality (1) to isolate FPs without binding motifs, (2) to perform de novo motif generation and (3) to characterize novel motifs. Here, we show that the framework rediscovers artificially removed TF motifs, quantifies de novo motif usage in an example dataset of early embryonic development, and is able to analyze and uncover TF activity in organisms lacking canonical motifs. The latter task is exemplified by an investigation of a scATAC-seq dataset in zebrafish which covers different cell types during hematopoiesis.


Chromatin Immunoprecipitation Sequencing , Nucleotide Motifs , Transcription Factors , Zebrafish , Transcription Factors/metabolism , Transcription Factors/genetics , Animals , Zebrafish/genetics , Zebrafish/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Humans , Binding Sites , Protein Binding , DNA Footprinting/methods , Computational Biology/methods , Chromatin/metabolism , Chromatin/genetics
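
Step (1) of the DENIS abstract above, isolating footprints that do not overlap a known motif, can be illustrated with a simple interval filter. This is a sketch under stated assumptions and not the DENIS implementation: the function names are hypothetical, coordinates are assumed to be half-open (BED-style), and the example intervals are invented.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def footprints_without_motif(footprints, motif_sites):
    """Keep footprints that overlap no known motif site on the same chromosome.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    sites_by_chrom = {}
    for chrom, start, end in motif_sites:
        sites_by_chrom.setdefault(chrom, []).append((start, end))

    unexplained = []
    for chrom, start, end in footprints:
        sites = sites_by_chrom.get(chrom, [])
        if not any(overlaps(start, end, s, e) for s, e in sites):
            unexplained.append((chrom, start, end))
    return unexplained


# Invented example data: the first footprint contains a known motif site,
# the other two do not (the chr2 intervals only touch at position 70).
footprints = [("chr1", 100, 120), ("chr1", 300, 325), ("chr2", 50, 70)]
motif_sites = [("chr1", 110, 118), ("chr2", 70, 80)]
unexplained = footprints_without_motif(footprints, motif_sites)
print(unexplained)
```

The linear scan per footprint is chosen for readability; for genome-scale inputs a sorted sweep or an interval tree would be the usual choice.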
7.
Nat Comput Sci ; 4(4): 285-298, 2024 Apr.
Article En | MEDLINE | ID: mdl-38600256

The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.


Chromatin , Single-Cell Analysis , Humans , Algorithms , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Computational Biology/methods , Genome/genetics , Genomics/methods , Neoplasms/genetics , Single-Cell Analysis/methods , Transposases/genetics , Transposases/metabolism
8.
Methods ; 226: 151-160, 2024 Jun.
Article En | MEDLINE | ID: mdl-38670416

Chromatin loops are of crucial importance for the regulation of gene transcription. Cohesin is a type of chromatin-associated protein that mediates chromatin interactions through loop extrusion. Cohesin-mediated chromatin interactions have strong cell-type specificity, posing a challenge for predicting chromatin loops. Existing computational methods perform poorly in predicting cell-type-specific chromatin loops. To address this issue, we propose a random forest model to predict cell-type-specific cohesin-mediated chromatin loops based on chromatin states identified by ChromHMM and the occupancy of related factors. Our results show that chromatin state is responsible for cell-type-specificity of loops. Using only chromatin states as features, the model achieved high accuracy in predicting cell-type-specific loops between two cell types and can be applied to different cell types. Furthermore, when chromatin states are combined with the occurrence frequency of CTCF, RAD21, YY1, and H3K27ac ChIP-seq peaks, more accurate prediction can be achieved. Our feature extraction method provides novel insights into predicting cell-type-specific chromatin loops and reveals the relationship between chromatin state and chromatin loop formation.


CCCTC-Binding Factor , Cell Cycle Proteins , Chromatin , Chromosomal Proteins, Non-Histone , Cohesins , Chromosomal Proteins, Non-Histone/metabolism , Chromosomal Proteins, Non-Histone/genetics , Cell Cycle Proteins/metabolism , Cell Cycle Proteins/genetics , Chromatin/metabolism , Chromatin/genetics , Humans , CCCTC-Binding Factor/metabolism , CCCTC-Binding Factor/genetics , YY1 Transcription Factor/metabolism , YY1 Transcription Factor/genetics , Nuclear Proteins/metabolism , Nuclear Proteins/genetics , Computational Biology/methods , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , Histones/metabolism , Histones/genetics , Phosphoproteins/metabolism , Phosphoproteins/genetics , Chromatin Immunoprecipitation Sequencing/methods
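
The chromatin-loop abstract above describes features built from ChromHMM chromatin states and from the occurrence frequency of CTCF, RAD21, YY1 and H3K27ac peaks. The abstract does not give the actual encoding, so the following is only one plausible sketch: the state labels, the per-anchor state fractions, and the function name are all assumptions. The resulting vector is the kind of input a random forest classifier could be trained on; the classifier itself is not shown.

```python
from collections import Counter

# Hypothetical ChromHMM state labels; a real model typically has more states.
STATES = ["E1", "E2", "E3"]
FACTORS = ["CTCF", "RAD21", "YY1", "H3K27ac"]


def state_fractions(bin_states):
    """Fraction of an anchor's bins assigned to each chromatin state."""
    counts = Counter(bin_states)
    n_bins = len(bin_states) or 1  # avoid division by zero for empty anchors
    return [counts[state] / n_bins for state in STATES]


def loop_features(anchor_a_states, anchor_b_states, peak_counts):
    """Concatenate state fractions of both anchors with per-factor peak counts."""
    return (
        state_fractions(anchor_a_states)
        + state_fractions(anchor_b_states)
        + [peak_counts.get(factor, 0) for factor in FACTORS]
    )


# Invented example: one candidate loop with two anchors.
features = loop_features(
    anchor_a_states=["E1", "E1", "E2", "E3"],
    anchor_b_states=["E2", "E2"],
    peak_counts={"CTCF": 2, "RAD21": 1},
)
print(features)
```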
9.
Nucleic Acids Res ; 52(8): 4137-4150, 2024 May 08.
Article En | MEDLINE | ID: mdl-38572749

DNA motifs are crucial patterns in gene regulation. DNA-binding proteins (DBPs), including transcription factors, can bind to specific DNA motifs to regulate gene expression and other cellular activities. Past studies suggest that DNA shape features could be subtly involved in DNA-DBP interactions. Therefore, shape motif annotations based on intrinsic DNA topology can deepen the understanding of DNA-DBP binding. Nevertheless, high-throughput tools for DNA shape motif discovery that incorporate multiple features together remain insufficient. To address this gap, we propose a series of methods to discover non-redundant DNA shape motifs, generalized to multiple motifs in multiple shape features. Specifically, an existing Gibbs sampling method is generalized to multiple DNA motif discovery with multiple shape features. Meanwhile, an expectation-maximization (EM) method and a hybrid method coupling EM with Gibbs sampling are proposed and developed with promising performance, convergence capability, and efficiency. The discovered DNA shape motif instances reveal insights into low-signal ChIP-seq peak summits, complementing the existing sequence motif discovery works. Additionally, our modelling captures the potential interplays across multiple DNA shape features. We provide a valuable platform of tools for DNA shape motif discovery. An R package is built for open accessibility and long-lasting impact: https://zenodo.org/doi/10.5281/zenodo.10558980.


DNA , Nucleotide Motifs , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Algorithms , Nucleic Acid Conformation , Chromatin Immunoprecipitation Sequencing/methods , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Transcription Factors/chemistry , Humans , Protein Binding
10.
Genome Biol ; 25(1): 90, 2024 Apr 08.
Article En | MEDLINE | ID: mdl-38589969

Single-cell ATAC-seq has emerged as a powerful approach for revealing candidate cis-regulatory elements genome-wide at cell-type resolution. However, current single-cell methods suffer from limited throughput and high costs. Here, we present a novel technique called scifi-ATAC-seq, single-cell combinatorial fluidic indexing ATAC-sequencing, which combines a barcoded Tn5 pre-indexing step with droplet-based single-cell ATAC-seq using the 10X Genomics platform. With scifi-ATAC-seq, up to 200,000 nuclei across multiple samples can be indexed in a single emulsion reaction, representing an approximately 20-fold increase in throughput compared to the standard 10X Genomics workflow.


Chromatin Immunoprecipitation Sequencing , Chromatin , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Cell Nucleus
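
The throughput figures in the scifi-ATAC-seq abstract above (up to 200,000 nuclei per reaction, an approximately 20-fold increase) imply a baseline of roughly 10,000 nuclei per standard reaction. That baseline is derived here from the two stated numbers and is not quoted in the abstract. The sketch below works through the arithmetic; the target of one million nuclei is a hypothetical example.

```python
def reactions_needed(total_nuclei, nuclei_per_reaction):
    """Smallest whole number of reactions that covers total_nuclei."""
    return -(-total_nuclei // nuclei_per_reaction)  # integer ceiling division


SCIFI_NUCLEI_PER_REACTION = 200_000  # stated in the abstract ("up to")
FOLD_INCREASE = 20                   # stated in the abstract ("approximately")

# Derived, not stated: implied capacity of the standard workflow.
implied_standard_capacity = SCIFI_NUCLEI_PER_REACTION // FOLD_INCREASE

target = 1_000_000  # hypothetical experiment size
scifi_reactions = reactions_needed(target, SCIFI_NUCLEI_PER_REACTION)
standard_reactions = reactions_needed(target, implied_standard_capacity)
print(implied_standard_capacity, scifi_reactions, standard_reactions)
```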
11.
Nucleic Acids Res ; 52(9): 5179-5194, 2024 May 22.
Article En | MEDLINE | ID: mdl-38647081

Transcription factor RBPJ is the central component in Notch signal transduction and directly forms a coactivator complex together with the Notch intracellular domain (NICD). While RBPJ protein levels remain constant in most tissues, dynamic expression of Notch target genes varies depending on the given cell type and the Notch activity state. To elucidate dynamic RBPJ binding genome-wide, we investigated RBPJ occupancy by ChIP-Seq. Surprisingly, only a small set of the total RBPJ sites show a dynamic binding behavior in response to Notch signaling. Compared to static RBPJ sites, dynamic sites differ in regard to their chromatin state, binding strength and enhancer positioning. Dynamic RBPJ sites are predominantly located distal to transcriptional start sites (TSSs), while most static sites are found in promoter-proximal regions. Importantly, gene responsiveness is preferentially associated with dynamic RBPJ binding sites and this static and dynamic binding behavior is repeatedly observed across different cell types and species. Based on the above findings we used a machine-learning algorithm to predict Notch responsiveness with high confidence in different cellular contexts. Our results strongly support the notion that the combination of binding strength and enhancer positioning is indicative of Notch responsiveness.


Immunoglobulin J Recombination Signal Sequence-Binding Protein , Receptors, Notch , Immunoglobulin J Recombination Signal Sequence-Binding Protein/metabolism , Immunoglobulin J Recombination Signal Sequence-Binding Protein/genetics , Receptors, Notch/metabolism , Receptors, Notch/genetics , Binding Sites , Humans , Mice , Enhancer Elements, Genetic , Animals , Signal Transduction/genetics , Protein Binding , Promoter Regions, Genetic , Genomics/methods , Chromatin/metabolism , Chromatin/genetics , Transcription Initiation Site , Chromatin Immunoprecipitation Sequencing , Machine Learning , Gene Expression Regulation
12.
Nucleic Acids Res ; 52(9): e46, 2024 May 22.
Article En | MEDLINE | ID: mdl-38647069

SifiNet is a robust and accurate computational pipeline for identifying distinct gene sets, extracting and annotating cellular subpopulations, and elucidating intrinsic relationships among these subpopulations. Uniquely, SifiNet bypasses the cell clustering stage, commonly integrated into other cellular annotation pipelines, thereby circumventing potential inaccuracies in clustering that may compromise subsequent analyses. Consequently, SifiNet has demonstrated superior performance in multiple experimental datasets compared with other state-of-the-art methods. SifiNet can analyze both single-cell RNA and ATAC sequencing data, thereby rendering comprehensive multi-omic cellular profiles. It is conveniently available as an open-source R package.


Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Molecular Sequence Annotation , Algorithms , Computational Biology/methods , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Chromatin Immunoprecipitation Sequencing/methods , Cluster Analysis
13.
Genome Biol ; 25(1): 78, 2024 Mar 22.
Article En | MEDLINE | ID: mdl-38519979

We develop a large-scale single-cell ATAC-seq method by combining Tn5-based pre-indexing with 10× Genomics barcoding, enabling the indexing of up to 200,000 nuclei across multiple samples in a single reaction. We profile 449,953 nuclei across diverse tissues, including the human cortex, mouse brain, human lung, mouse lung, mouse liver, and lung tissue from a club cell secretory protein knockout (CC16-/-) model. Our study of CC16-/- nuclei uncovers previously underappreciated technical artifacts derived from remnant 129 mouse strain genetic material, which cause profound cell-type-specific changes in regulatory elements near many genes, thereby confounding the interpretation of this commonly referenced mouse model.


Chromatin Immunoprecipitation Sequencing , Chromatin , Animals , Mice , Humans , Chromatin/metabolism , Cell Nucleus/genetics , Regulatory Sequences, Nucleic Acid
14.
BMC Bioinformatics ; 25(1): 123, 2024 Mar 21.
Article En | MEDLINE | ID: mdl-38515011

BACKGROUND: The chromosome is one of the most fundamental parts of cell biology, where DNA holds the hierarchical information. DNA compacts its size by forming loops, and these regions house various protein particles, including CTCF, SMC3, and H3 histone. Numerous sequencing methods, such as Hi-C, ChIP-seq, and Micro-C, have been developed to investigate these properties. Utilizing these data, scientists have developed a variety of loop prediction techniques that have greatly improved their methods for characterizing loop prediction and related aspects. RESULTS: In this study, we categorized 22 loop calling methods and conducted a comprehensive study of 11 of them. Additionally, we have provided detailed insights into the methodologies underlying these algorithms for loop detection, categorizing them into five distinct groups based on their fundamental approaches. Furthermore, we have included critical information such as resolution, input and output formats, and parameters. For this analysis, we utilized the GM12878 Hi-C datasets at 5 kb, 10 kb, 100 kb and 250 kb resolutions. Our evaluation criteria encompassed various factors, including memory usage, running time, sequencing depth, and recovery of protein-specific sites such as CTCF, H3K27ac, and RNAPII. CONCLUSION: This analysis offers insights into the loop detection processes of each method, along with the strengths and weaknesses of each, enabling readers to effectively choose suitable methods for their datasets. We evaluate the capabilities of these tools and introduce a novel Biological, Consistency, and Computational robustness score (BCC score) to measure their overall robustness, ensuring a comprehensive evaluation of their performance.


Chromatin , Chromosomes , Chromatin/genetics , DNA , Chromatin Immunoprecipitation Sequencing , Algorithms
15.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article En | MEDLINE | ID: mdl-38493346

Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data provided new insights into the understanding of epigenetic heterogeneity and transcriptional regulation. With the increasing abundance of dataset resources, there is an urgent need to extract more useful information through high-quality data analysis methods specifically designed for scATAC-seq. However, analyzing scATAC-seq data poses challenges due to its near binarization, high sparsity and ultra-high dimensionality properties. Here, we proposed a novel network diffusion-based computational method to comprehensively analyze scATAC-seq data, named Single-Cell ATAC-seq Analysis via Network Refinement with Peaks Location Information (SCARP). SCARP formulates the Network Refinement diffusion method under the graph theory framework to aggregate information from different network orders, effectively compensating for missing signals in the scATAC-seq data. By incorporating distance information between adjacent peaks on the genome, SCARP also contributes to depicting the co-accessibility of peaks. These two innovations empower SCARP to obtain lower-dimensional representations for both cells and peaks more effectively. We have demonstrated through sufficient experiments that SCARP facilitated superior analyses of scATAC-seq data. Specifically, SCARP exhibited outstanding cell clustering performance, enabling better elucidation of cell heterogeneity and the discovery of new biologically significant cell subpopulations. Additionally, SCARP was also instrumental in portraying co-accessibility relationships of accessible regions and providing new insight into transcriptional regulation. Consequently, SCARP identified genes that were involved in key Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to diseases and predicted reliable cis-regulatory interactions. To sum up, our studies suggested that SCARP is a promising tool to comprehensively analyze the scATAC-seq data.


Chromatin Immunoprecipitation Sequencing , Chromatin , Chromatin Immunoprecipitation Sequencing/methods , Chromatin/genetics , Genome , Epigenomics , Data Analysis
16.
Nucleic Acids Res ; 52(7): e40, 2024 Apr 24.
Article En | MEDLINE | ID: mdl-38499482

Genome-wide binding assays aspire to map the complete binding pattern of gene regulators. Common practice relies on replication (duplicates or triplicates) and high-stringency statistics to favor false negatives over false positives. Here we show that duplicates and triplicates of CUT&RUN are not sufficient to discover the entire activity of transcriptional regulators. We introduce ICEBERG (Increased Capture of Enrichment By Exhaustive Replicate aGgregation), a pipeline that harnesses large numbers of CUT&RUN replicates to discover the full set of binding events and chart the line between false positives and false negatives. We employed ICEBERG to map the full set of H3K4me3-marked regions, the targets of the co-factor β-catenin, and those of the transcription factor TBX3, in human colorectal cancer cells. The ICEBERG datasets allow benchmarking of individual replicates and comparison of the performance of peak calling and replication approaches, and they expose the arbitrary nature of strategies to identify reproducible peaks. Instead of a static view of genomic targets, ICEBERG establishes a spectrum of detection probabilities across the genome for a given factor, underlining the intrinsic dynamicity of its mechanism of action and making it possible to distinguish frequent from rare regulation events. Finally, ICEBERG discovered instances, undetectable with other approaches, that underlie novel mechanisms of colorectal cancer progression.


Software , Transcription, Genetic , Humans , beta Catenin/metabolism , beta Catenin/genetics , Binding Sites , Cell Line, Tumor , Chromatin Immunoprecipitation Sequencing , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , Genome, Human , Histones/metabolism , Histones/genetics , Protein Binding , T-Box Domain Proteins/genetics , T-Box Domain Proteins/metabolism , Transcription Factors/metabolism , Transcription Factors/genetics
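
The ICEBERG abstract above speaks of "a spectrum of detection probabilities across the genome". One simple way to picture this idea is the fraction of replicates in which each region is called as a peak. The sketch below is an illustration of that idea only and is not the ICEBERG pipeline: it assumes that peaks have already been mapped to shared region identifiers, and the function name and replicate data are invented.

```python
from collections import Counter


def detection_frequency(replicate_peak_sets):
    """Fraction of replicates in which each region was called as a peak."""
    n_replicates = len(replicate_peak_sets)
    if n_replicates == 0:
        return {}
    counts = Counter()
    for peaks in replicate_peak_sets:
        counts.update(set(peaks))  # count each region at most once per replicate
    return {region: count / n_replicates for region, count in counts.items()}


# Invented example: four replicates, three candidate regions.
replicates = [
    {"regionA", "regionB"},
    {"regionA"},
    {"regionA", "regionC"},
    {"regionA", "regionB"},
]
frequency = detection_frequency(replicates)
print(sorted(frequency.items()))
```

In this toy example, a region such as `regionC` that appears in one of four replicates would be missed by most duplicate or triplicate designs that require agreement between replicates, which is the kind of rare event the abstract says large replicate numbers can expose.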
17.
Int J Mol Sci ; 25(5), 2024 Feb 28.
Article En | MEDLINE | ID: mdl-38474039

Ascidian larvae undergo tail elongation and notochord lumenogenesis, making them an ideal model for investigating tissue morphogenesis in embryogenesis. The cellular and mechanical mechanisms of these processes have been studied; however, the underlying molecular regulatory mechanism remains to be elucidated. In this study, assays for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA sequencing (RNA-seq) were applied to investigate potential regulators of the development of ascidian Ciona savignyi larvae. Our results revealed 351 and 138 differentially accessible region genes through comparisons of ATAC-seq data between stages 21 and 24 and between stages 24 and 25, respectively. A joint analysis of RNA-seq and ATAC-seq data revealed a correlation between chromatin accessibility and gene transcription. We further verified the tissue expression patterns of 12 different genes. Among them, Cs-matrix metalloproteinase 24 (MMP24) and Cs-krüppel-like factor 5 (KLF5) were highly expressed in notochord cells. Functional assay results demonstrated that both genes are necessary for notochord lumen formation and expansion. Finally, we performed motif enrichment analysis of the differentially accessible regions in different tailbud stages and summarized the potential roles of these motif-bearing transcription factors in larval development. Overall, our study found a correlation between gene expression and chromatin accessibility and provided a vital resource for understanding the mechanisms of the development of ascidian embryos.


Ciona , Urochordata , Animals , Chromatin , Urochordata/genetics , Chromatin Immunoprecipitation Sequencing , Morphogenesis , Transcription Factors/genetics
18.
Mol Plant Pathol ; 25(3): e13446, 2024 Mar.
Article En | MEDLINE | ID: mdl-38502176

Animal studies have shown that virus infection causes changes in host chromatin accessibility, but little is known about changes in chromatin accessibility of plants infected by viruses and its potential impact. Here, rice infected by rice stripe virus (RSV) was used to investigate virus-induced changes in chromatin accessibility. Our analysis identified a total of 6462 open- and 3587 closed-differentially accessible chromatin regions (DACRs) in rice under RSV infection by ATAC-seq. Additionally, by integrating ATAC-seq and RNA-seq, 349 up-regulated genes in open-DACRs and 126 down-regulated genes in closed-DACRs were identified, of which 34 transcription factors (TFs) were further identified by search of upstream motifs. Transcription levels of eight of these TFs were validated by reverse transcription-PCR. Importantly, four of these TFs (OsWRKY77, OsWRKY28, OsZFP12 and OsERF91) interacted with RSV proteins and are therefore predicted to play important roles in RSV infection. This is the first application of ATAC-seq and RNA-seq techniques to analyse changes in rice chromatin accessibility caused by RSV infection. Integrating ATAC-seq and RNA-seq provides a new approach to select candidate TFs in response to virus infection.


Oryza , Respiratory Syncytial Virus Infections , Tenuivirus , Animals , Transcription Factors/genetics , Oryza/genetics , Tenuivirus/genetics , Chromatin Immunoprecipitation Sequencing , RNA-Seq , Chromatin , Data Analysis
19.
BMC Genomics ; 25(1): 300, 2024 Mar 21.
Article En | MEDLINE | ID: mdl-38515040

BACKGROUND: The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) utilizes the transposase Tn5 to probe open chromatin, which simultaneously reveals multiple transcription factor binding sites (TFBSs) compared to traditional technologies. Deep learning (DL) technology, including convolutional neural networks (CNNs), has successfully found motifs from ATAC-seq data. Because the width of convolutional kernels is limited, the existing models find only motifs of fixed length. A graph neural network (GNN) can work on non-Euclidean data and therefore has the potential to find ATAC-seq motifs of different lengths. However, the existing GNN models ignore the relationships among ATAC-seq sequences, and their parameter settings should be improved. RESULTS: In this study, we proposed a novel GNN model named GNNMF to find ATAC-seq motifs via GNN and background coexisting probability. Our experiments, conducted on 200 human datasets and 80 mouse datasets, demonstrated that GNNMF improved the area of the eight-metric radar score by 4.92% and 6.81%, respectively, and found more motifs than the existing models. CONCLUSIONS: In this study, we developed a novel model named GNNMF for finding multiple ATAC-seq motifs. GNNMF built a multi-view heterogeneous graph from ATAC-seq sequences and utilized background coexisting probability and the iterloss to find ATAC-seq motifs of different lengths and to optimize the parameter sets. Compared to existing models, GNNMF achieved the best performance on TFBS prediction and ATAC-seq motif finding, which demonstrates that our improvement is effective for ATAC-seq motif finding.


Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Humans , Animals , Mice , Sequence Analysis, DNA , Chromatin/genetics , Neural Networks, Computer
20.
mSystems ; 9(4): e0095123, 2024 Apr 16.
Article En | MEDLINE | ID: mdl-38470037

The regulation of Bordetella pertussis virulence is mediated by the two-component system BvgA/S, which activates the transcription of virulence-activated genes (vags). In the avirulent phase, the vags are not expressed, but instead, virulence-repressed genes (vrgs) are expressed, under the control of another two-component system, RisA/K. Here, we combined transcriptomic and chromatin immunoprecipitation sequencing (ChIPseq) data to examine the RisA/K regulon. We performed RNAseq analyses of RisA-deficient and RisA-phosphoablative B. pertussis mutants cultivated in virulent and avirulent conditions. We confirmed that the expression of most vrgs is regulated by phosphorylated RisA. However, the expression of some, including those involved in flagellum biosynthesis and chemotaxis, requires RisA independently of phosphorylation. Many RisA-regulated genes encode proteins with regulatory functions, suggesting multiple RisA regulation cascades. By ChIPseq analyses, we identified 430 RisA-binding sites, 208 within promoter regions, 201 within open reading frames, and 21 in non-coding regions. RisA binding was demonstrated in the promoter regions of most vrgs and, surprisingly, of some vags, as well as for other genes not identified as vags or vrgs. Unexpectedly, many genes, including some vags, like prn, brpL, bipA, and cyaA, contain a BvgA-binding site and a RisA-binding site, which increases the complexity of the RisAK/BvgAS network in B. pertussis virulence regulation. IMPORTANCE: The expression of virulence-activated genes (vags) of Bordetella pertussis, the etiological agent of whooping cough, is under the transcriptional control of the two-component system BvgA/S, which allows the bacterium to switch between virulent and avirulent phases. In addition, the more recently identified two-component system RisA/K is required for the expression of B. pertussis genes, collectively named vrgs, that are repressed during the virulent phase but activated during the avirulent phase. We have characterized the RisA/K regulon by combined transcriptomic and chromatin immunoprecipitation sequencing analyses. We identified more than 400 RisA-binding sites. Many of them are localized in promoter regions, especially those of vrgs, but some were found within open reading frames and in non-coding regions. Surprisingly, RisA-binding sites were also found in promoter regions of some vags, illustrating the previously underappreciated complexity of virulence regulation in B. pertussis.


Bordetella pertussis , Whooping Cough , Humans , Bordetella pertussis/genetics , Regulon/genetics , Transcription Factors/genetics , Whooping Cough/genetics , Bacterial Proteins/genetics , Chromatin Immunoprecipitation Sequencing , Gene Expression Profiling
...