1.
Brief Bioinform ; 25(Supplement_1)2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041910

ABSTRACT

Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) generates genome-wide chromatin accessibility profiles, providing valuable insights into epigenetic gene regulation at both pooled-cell and single-cell population levels. Comprehensive analysis of ATAC-seq data involves the use of various interdependent programs. Learning the correct sequence of steps needed to process the data can represent a major hurdle. Selecting appropriate parameters at each stage, including pre-analysis, core analysis, and advanced downstream analysis, is important to ensure accurate analysis and interpretation of ATAC-seq data. Additionally, obtaining and working within a limited computational environment presents a significant challenge to non-bioinformatic researchers. Therefore, we present Cloud ATAC, an open-source, cloud-based interactive framework with a scalable, flexible, and streamlined analysis framework based on the best practices approach for pooled-cell and single-cell ATAC-seq data. These frameworks use on-demand computational power and memory, scalability, and a secure and compliant environment provided by the Google Cloud. Additionally, we leverage Jupyter Notebook's interactive computing platform that combines live code, tutorials, narrative text, flashcards, quizzes, and custom visualizations to enhance learning and analysis. Further, leveraging GPU instances has significantly improved the run-time of the single-cell framework. The source codes and data are publicly available through NIH Cloud lab https://github.com/NIGMS/ATAC-Seq-and-Single-Cell-ATAC-Seq-Analysis. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.


Subject(s)
Cloud Computing , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Humans , Computational Biology/methods , Chromatin Immunoprecipitation Sequencing/methods , Single-Cell Analysis/methods , Chromatin/genetics , Chromatin/metabolism
2.
Methods Mol Biol ; 2826: 55-63, 2024.
Article in English | MEDLINE | ID: mdl-39017885

ABSTRACT

The Assay for Transposase Accessible Chromatin (ATAC)-seq protocol is optimized to generate global maps of accessible chromatin using limited cell inputs. The Tn5 transposase tagmentation reaction simultaneously fragments and tags the accessible DNA with Illumina Nextera sequencing adapters. Fragmented and adapter tagged DNA is then purified and PCR amplified with dual indexing primers to generate a size-specific sequencing library. The One-Step workflow below outlines the Tn5 nuclei transposition from a range of cell inputs followed by PCR amplification to generate a sequencing library.


Subject(s)
B-Lymphocytes , Chromatin , High-Throughput Nucleotide Sequencing , Transposases , Chromatin/genetics , Chromatin/metabolism , Transposases/metabolism , Transposases/genetics , B-Lymphocytes/metabolism , High-Throughput Nucleotide Sequencing/methods , Humans , Gene Library , Sequence Analysis, DNA/methods , Polymerase Chain Reaction/methods , Animals , DNA/genetics , Chromatin Immunoprecipitation Sequencing/methods
3.
Life Sci Alliance ; 7(9)2024 Sep.
Article in English | MEDLINE | ID: mdl-38969365

ABSTRACT

Zn2+ is an essential metal required by approximately 850 human transcription factors. How these proteins acquire their essential Zn2+ cofactor and whether they are sensitive to changes in the labile Zn2+ pool in cells remain open questions. Using ATAC-seq to profile regions of accessible chromatin coupled with transcription factor enrichment analysis, we examined how increases and decreases in the labile zinc pool affect chromatin accessibility and transcription factor enrichment. We found 685 transcription factor motifs were differentially enriched, corresponding to 507 unique transcription factors. The pattern of perturbation and the types of transcription factors were notably different at promoters versus intergenic regions, with zinc-finger transcription factors strongly enriched in intergenic regions in elevated Zn2+. To test whether ATAC-seq and transcription factor enrichment analysis predictions correlate with changes in transcription factor binding, we used ChIP-qPCR to profile six p53 binding sites. We found that for five of the six targets, p53 binding correlates with the local accessibility determined by ATAC-seq. These results demonstrate that changes in labile zinc alter chromatin accessibility and transcription factor binding to DNA.


Subject(s)
Chromatin , DNA , Protein Binding , Transcription Factors , Tumor Suppressor Protein p53 , Zinc , Humans , Tumor Suppressor Protein p53/metabolism , Tumor Suppressor Protein p53/genetics , Chromatin/metabolism , Chromatin/genetics , Zinc/metabolism , DNA/metabolism , DNA/genetics , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Promoter Regions, Genetic/genetics , Chromatin Immunoprecipitation Sequencing/methods
4.
Methods Mol Biol ; 2819: 39-53, 2024.
Article in English | MEDLINE | ID: mdl-39028501

ABSTRACT

Nucleotide sequences recognized and bound by DNA-binding proteins (DBPs) are critical to controlling and maintaining gene expression, replication, chromosome segregation, cell division, and nucleoid structure in bacterial cells. Therefore, determination of the binding sequences of DBPs is important not only to study DBP recognition mechanisms but also to understand the fundamentals of cell homeostasis. While ChIP-seq analysis appears to be an effective way to determine DBP binding sites on the genome, the resolution is sometimes not sufficient to identify the sites precisely. Here we introduce a simple and effective method named Genome Footprinting with high-throughput sequencing (GeF-seq) to determine binding sites of DBPs with single base-pair resolution. GeF-seq detects binding sites of DBPs as sharp peaks and thus makes it possible to identify the recognition sequence in each "binding peak" more easily and accurately compared to the common ChIP-seq.


Subject(s)
Chromatin Immunoprecipitation Sequencing , DNA-Binding Proteins , High-Throughput Nucleotide Sequencing , Chromatin Immunoprecipitation Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Binding Sites , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , Base Pairing , Protein Binding , DNA Footprinting/methods
5.
Methods Mol Biol ; 2842: 419-447, 2024.
Article in English | MEDLINE | ID: mdl-39012609

ABSTRACT

Chromatin immunoprecipitation (ChIP) is an invaluable method to characterize interactions between proteins and genomic DNA, such as the genomic localization of transcription factors and post-translational modification of histones. DNA and proteins are reversibly and covalently crosslinked using formaldehyde. Then the cells are lysed to release the chromatin. The chromatin is fragmented into smaller sizes either by micrococcal nuclease (MN) or sonication and then purified from other cellular components. The protein-DNA complexes are enriched by immunoprecipitation (IP) with antibodies that target the epitope of interest. The DNA is released from the proteins by heat and protease treatment, followed by degradation of contaminating RNAs with RNase. The resulting DNA is analyzed using various methods, including polymerase chain reaction (PCR), quantitative PCR (qPCR), or sequencing. This protocol outlines each of these steps for both yeast and human cells. This chapter includes a contextual discussion of the combination of ChIP with DNA analysis methods such as ChIP-on-Chip, ChIP-qPCR, and ChIP-Seq, recent updates on ChIP-Seq data analysis pipelines, complementary methods for identification of binding sites of DNA binding proteins, and additional protocol information about ChIP-qPCR and ChIP-Seq.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Humans , Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation/methods , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics , Binding Sites , Chromatin/genetics , Chromatin/metabolism , High-Throughput Nucleotide Sequencing/methods
6.
Article in English | MEDLINE | ID: mdl-39049508

ABSTRACT

Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, which helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers as if in single-cell RNA-seq (scRNA-seq). However, there are few GSS tools specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly promote the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance is more dependent on GSS tools or datasets. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.
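To make the idea of a gene set score concrete, the following is a minimal sketch of one very simple scoring rule: the mean activity of the genes in a set minus the mean activity of all genes in the same cell. It is not one of the ten benchmarked tools, and the function name, the input layout (a per-cell dictionary of gene activity values), and the scoring rule itself are assumptions chosen only for illustration.

```python
from statistics import mean


def gene_set_score(activity, gene_set):
    """Score one cell: mean activity of set genes minus mean activity of all genes.

    activity: dict mapping gene name -> gene activity value (hypothetical input).
    gene_set: iterable of gene names; genes missing from `activity` are ignored.
    """
    in_set = [activity[g] for g in gene_set if g in activity]
    if not in_set:
        raise ValueError("no gene from the set is present in this cell")
    # A positive score means the set is more active than the cell's average gene.
    return mean(in_set) - mean(list(activity.values()))
```

A positive value suggests the gene set is more accessible than the cell's background; real tools typically add normalization and control gene sampling that this sketch omits.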


Subject(s)
Algorithms , Benchmarking , Chromatin Immunoprecipitation Sequencing , Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/standards , Humans , Chromatin Immunoprecipitation Sequencing/methods , RNA-Seq/methods , RNA-Seq/standards , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Chromatin/genetics , Chromatin/metabolism
7.
Genome Res ; 34(6): 937-951, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38986578

ABSTRACT

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multimapped" reads that align equally well to multiple genomic locations. Because multimapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multimapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multimapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multimapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq data sets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions.
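The probabilistic part of multimapped read allocation can be illustrated with a small sketch. It assumes, purely for illustration, that each candidate location is weighted by the number of uniquely mapped reads nearby plus a pseudocount; the abstract does not specify Allo's weighting, and the convolutional neural network component is not represented here. All function and parameter names are hypothetical.

```python
import random


def allocation_probabilities(unique_counts, pseudocount=1.0):
    """Turn per-location unique-read counts into assignment probabilities.

    The pseudocount keeps locations with zero unique reads assignable
    (an assumption of this sketch, not a documented Allo parameter).
    """
    weights = [c + pseudocount for c in unique_counts]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def allocate_read(candidates, unique_counts, pseudocount=1.0, rng=None):
    """Pick one candidate location for a multimapped read at random,
    with probability proportional to its weight."""
    rng = rng or random.Random()
    probs = allocation_probabilities(unique_counts, pseudocount)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Assigning each read to exactly one location is what allows the result to be written back as a corrected alignment file, as the abstract describes.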


Subject(s)
Chromatin Immunoprecipitation Sequencing , Humans , Chromatin Immunoprecipitation Sequencing/methods , Regulatory Sequences, Nucleic Acid , Repetitive Sequences, Nucleic Acid , Genomics/methods , Binding Sites , CCCTC-Binding Factor/metabolism , CCCTC-Binding Factor/genetics , Regulatory Elements, Transcriptional , DNA Transposable Elements , Sequence Analysis, DNA/methods , Neural Networks, Computer
8.
Genes (Basel) ; 15(7)2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39062661

ABSTRACT

In recent years, there has been a growing interest in profiling multiomic modalities within individual cells simultaneously. One such example is integrating combined single-cell RNA sequencing (scRNA-seq) data and single-cell transposase-accessible chromatin sequencing (scATAC-seq) data. Integrated analysis of diverse modalities has helped researchers make more accurate predictions and gain a more comprehensive understanding than with single-modality analysis. However, generating such multimodal data is technically challenging and expensive, leading to limited availability of single-cell co-assay data. Here, we propose a model for cross-modal prediction between the transcriptome and chromatin profiles in single cells. Our model is based on a deep neural network architecture that learns the latent representations from the source modality and then predicts the target modality. It demonstrates reliable performance in accurately translating between these modalities across multiple paired human scATAC-seq and scRNA-seq datasets. Additionally, we developed CrossMP, a web-based portal allowing researchers to upload their single-cell modality data through an interactive web interface and predict the other type of modality data, using high-performance computing resources plugged at the backend.


Subject(s)
Chromatin Immunoprecipitation Sequencing , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Chromatin Immunoprecipitation Sequencing/methods , Software , Internet , Transcriptome/genetics , Sequence Analysis, RNA/methods , Chromatin/genetics , Chromatin/metabolism , Single-Cell Gene Expression Analysis
9.
Methods Mol Biol ; 2810: 211-233, 2024.
Article in English | MEDLINE | ID: mdl-38926282

ABSTRACT

In traditional cell line design pipelines, cost- and time-intensive long-term stability studies must be performed because the transgene integrates randomly into the genome. Integration into epigenetically silenced regions can then lead to silencing of the recombinant promoter over time. Site-specific integration into regions with active chromatin structure can overcome this problem and lead to strong and stable gene expression. Here, we describe a detailed protocol to identify integration sites with epigenetically preferable properties by chromatin immunoprecipitation sequencing and use them for stable and strong gene expression by applying CRISPR/Cas9. Furthermore, the examination of the integration sites, with a focus on Cas9-targeted nanopore sequencing, is described.


Subject(s)
CRISPR-Cas Systems , Humans , Histone Code/genetics , Gene Editing/methods , Cell Line , Epigenesis, Genetic , Chromatin Immunoprecipitation Sequencing/methods , Histones/metabolism , Histones/genetics , Chromatin/genetics , Chromatin/metabolism
10.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 41(3): 552-559, 2024 Jun 25.
Article in Chinese | MEDLINE | ID: mdl-38932542

ABSTRACT

The rapid development of high-throughput chromatin conformation capture (Hi-C) technology provides rich genomic interaction data between chromosomal loci for chromatin structure analysis. However, existing methods for identifying topologically associated domains (TADs) based on Hi-C data suffer from low accuracy and sensitivity to parameters. In this context, a TAD identification method based on spatial density clustering was designed and implemented in this paper. The method preprocessed the raw Hi-C data to obtain a normalized Hi-C contact matrix. Then, it computed the distance matrix between loci, generated a reachability graph based on the core distance and reachability distance of loci, and extracted clusters. Finally, it extracted TAD boundaries based on the clustering results. This method could identify TAD structures with higher coherence, and TAD boundaries were enriched with more ChIP-seq factors. Experimental results demonstrate that our method has advantages such as higher accuracy and practical significance in TAD identification.
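The terms core distance and reachability distance match those used in OPTICS-style density clustering, although the abstract does not name the algorithm. The sketch below shows the two quantities under that assumption, computed from a precomputed distance matrix between loci. Function names are hypothetical, no neighborhood radius is applied, and the core distance is taken as the distance to the `min_pts`-th nearest other locus, which is one common convention.

```python
def core_distance(dist, p, min_pts):
    """Distance from locus p to its min_pts-th nearest other locus.

    dist: square matrix (list of lists) of pairwise distances between loci.
    Returns infinity when p has fewer than min_pts other loci.
    """
    others = sorted(dist[p][q] for q in range(len(dist)) if q != p)
    if len(others) < min_pts:
        return float("inf")
    return others[min_pts - 1]


def reachability_distance(dist, p, o, min_pts):
    """Reachability of locus o from locus p: the larger of p's core
    distance and the direct distance between p and o."""
    return max(core_distance(dist, p, min_pts), dist[p][o])
```

Plotting reachability distances in visiting order gives the reachability graph, in which valleys correspond to dense clusters; in this setting those would be candidate TADs.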


Subject(s)
Chromatin , Chromatin/genetics , Chromatin/chemistry , Cluster Analysis , Algorithms , Humans , Chromatin Immunoprecipitation Sequencing/methods
11.
Genet Sel Evol ; 56(1): 50, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937662

ABSTRACT

BACKGROUND: Genome sequence variants affecting complex traits (quantitative trait loci, QTL) are enriched in functional regions of the genome, such as those marked by certain histone modifications. These variants are believed to influence gene expression. However, due to the linkage disequilibrium among nearby variants, pinpointing the precise location of QTL is challenging. We aimed to identify allele-specific binding (ASB) QTL (asbQTL) that cause variation in the level of histone modification, as measured by the height of peaks assayed by ChIP-seq (chromatin immunoprecipitation sequencing). We identified DNA sequences that predict the difference between alleles in ChIP-seq peak height in H3K4me3 and H3K27ac histone modifications in the mammary glands of cows. RESULTS: We used a gapped k-mer support vector machine, a novel best linear unbiased prediction model, and a multiple linear regression model that combines the other two approaches to predict variant impacts on peak height. For each method, a subset of 1000 sites with the highest magnitude of predicted ASB was considered as candidate asbQTL. The accuracy of this prediction was measured by the proportion where the predicted direction matched the observed direction. Prediction accuracy ranged between 0.59 and 0.74, suggesting that these 1000 sites are enriched for asbQTL. Using independent data, we investigated functional enrichment in the candidate asbQTL set and three control groups, including non-causal ASB sites, non-ASB variants under a peak, and SNPs (single nucleotide polymorphisms) not under a peak. For H3K4me3, a higher proportion of the candidate asbQTL were confirmed as ASB when compared to the non-causal ASB sites (P < 0.01). However, these candidate asbQTL did not enrich for the other annotations, including expression QTL (eQTL), allele-specific expression QTL (aseQTL) and sites conserved across mammals (P > 0.05). 
CONCLUSIONS: We identified putatively causal sites for asbQTL using the DNA sequence surrounding these sites. Our results suggest that many sites influencing histone modifications may not directly affect gene expression. However, it is important to acknowledge that distinguishing between putative causal ASB sites and other non-causal ASB sites in high linkage disequilibrium with the causal sites regarding their impact on gene expression may be challenging due to limitations in statistical power.
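The accuracy measure described in the RESULTS, the proportion of sites where the predicted direction of allele-specific binding matches the observed direction, can be sketched as follows. The function name and the input layout (paired signed effect values) are assumptions, as is the choice to count a zero effect as a mismatch.

```python
def direction_accuracy(predicted, observed):
    """Fraction of sites whose predicted and observed effects share a sign.

    predicted, observed: equal-length sequences of signed effect values
    (hypothetical inputs). A zero on either side counts as a mismatch.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one site is required")
    # Two non-zero numbers share a sign exactly when their product is positive.
    matches = sum(1 for p, o in zip(predicted, observed) if p * o > 0)
    return matches / len(predicted)
```

Under this measure, random guessing of direction gives about 0.5, which is the baseline against which the reported range of 0.59 to 0.74 would be read.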


Subject(s)
Alleles , Chromatin Immunoprecipitation Sequencing , Histones , Quantitative Trait Loci , Animals , Cattle/genetics , Histones/genetics , Histones/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Polymorphism, Single Nucleotide , Histone Code , Linkage Disequilibrium , Molecular Sequence Annotation , Female
12.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38870532

ABSTRACT

MOTIVATION: Understanding the rules that govern enhancer-driven transcription remains a central unsolved problem in genomics. Now that multiple massively parallel enhancer perturbation assays have been published, there are enough data to learn to predict enhancer-promoter (EP) relationships in a data-driven manner. RESULTS: We applied machine learning to one of the largest enhancer perturbation studies integrated with transcription factor (TF) and histone modification ChIP-seq. The results uncovered a discrepancy in the prediction of genome-wide data compared to data from targeted experiments. Relative strength of contact was important for prediction, confirming the basic principle of EP regulation. Novel features such as the density of the enhancers/promoters in the genomic region were found to be important, highlighting our lack of understanding of how other elements in the region contribute to the regulation. Several TF peaks were identified that improved the prediction by identifying the negatives and reducing false positives. In summary, integrating genomic assays with enhancer perturbation studies increased the accuracy of the model, and provided novel insights into the understanding of enhancer-driven transcription. AVAILABILITY AND IMPLEMENTATION: The trained models, data, and the source code are available at http://doi.org/10.5281/zenodo.11290386 and https://github.com/HanLabUNLV/sleps.


Subject(s)
Enhancer Elements, Genetic , Promoter Regions, Genetic , Supervised Machine Learning , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Genomics/methods , Chromatin Immunoprecipitation Sequencing/methods
13.
Int J Mol Sci ; 25(9)2024 May 03.
Article in English | MEDLINE | ID: mdl-38732207

ABSTRACT

Prediction of binding sites for transcription factors is important to understand how the latter regulate gene expression and how this regulation can be modulated for therapeutic purposes. A considerable number of studies address this issue with different approaches, Machine Learning being one of the most successful. Nevertheless, we note that many such approaches fail to propose a robust and meaningful method to embed the genetic data under analysis. We try to overcome this problem by proposing a bidirectional transformer-based encoder, empowered by bidirectional long-short term memory layers and with a capsule layer responsible for the final prediction. To evaluate the efficiency of the proposed approach, we use benchmark ChIP-seq datasets of five cell lines available in the ENCODE repository (A549, GM12878, Hep-G2, H1-hESC, and HeLa). The results show that the proposed method can predict TFBS within the five different cell lines very well; moreover, cross-cell predictions provide satisfactory results as well. Experiments conducted across cell lines are reinforced by the analysis of five additional lines used only to test the model trained using the others. The results confirm that prediction performance across cell lines remains very high, allowing an extensive cross-transcription factor analysis to be performed from which several indications of interest for molecular biology may be drawn.


Subject(s)
Deep Learning , Transcription Factors , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Computational Biology/methods , HeLa Cells , Protein Binding , Chromatin Immunoprecipitation Sequencing/methods , Cell Line
14.
Nat Methods ; 21(6): 1014-1022, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38724693

ABSTRACT

Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.
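The sequence information that peak-based representations discard can be pictured as the set of DNA k-mers found in each accessible region. The sketch below only extracts overlapping k-mers from a sequence; it does not implement CellSpace's embedding, and the function name and the upper-casing step are assumptions made for illustration.

```python
def kmers(sequence, k):
    """Return all overlapping k-mers of a DNA sequence, left to right.

    The sequence is upper-cased first (an assumption of this sketch);
    a sequence shorter than k yields an empty list.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

An embedding method of the kind described would then learn vectors for these k-mers and for cells in a shared space, which is the step this sketch leaves out.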


Subject(s)
Algorithms , Chromatin Immunoprecipitation Sequencing , Single-Cell Analysis , Single-Cell Analysis/methods , Animals , Chromatin Immunoprecipitation Sequencing/methods , Humans , Mice , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Transcription Factors/metabolism
15.
Nucleic Acids Res ; 52(W1): W45-W53, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38749504

ABSTRACT

ChIP-Atlas (https://chip-atlas.org/) presents a suite of data-mining tools for analyzing epigenomic landscapes, powered by the comprehensive integration of over 376 000 public ChIP-seq, ATAC-seq, DNase-seq and Bisulfite-seq experiments from six representative model organisms. To unravel the intricacies of chromatin architecture that mediates the regulome-initiated generation of transcriptional and phenotypic diversity within cells, we report ChIP-Atlas 3.0 that enhances clarity by incorporating additional tracks for genomic and epigenomic features within a newly consolidated 'annotation track' section. The tracks include chromosomal conformation (Hi-C and eQTL datasets), transcriptional regulatory elements (ChromHMM and FANTOM5 enhancers), and genomic variants associated with diseases and phenotypes (GWAS SNPs and ClinVar variants). These annotation tracks are easily accessible alongside other experimental tracks, facilitating better elucidation of chromatin architecture underlying the diversification of transcriptional and phenotypic traits. Furthermore, 'Diff Analysis,' a new online tool, compares the query epigenome data to identify differentially bound, accessible, and methylated regions using ChIP-seq, ATAC-seq and DNase-seq, and Bisulfite-seq datasets, respectively. The integration of annotation tracks and the Diff Analysis tool, coupled with continuous data expansion, renders ChIP-Atlas 3.0 a robust resource for mining the landscape of transcriptional regulatory mechanisms, thereby offering valuable perspectives, particularly for genetic disease research and drug discovery.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Data Mining , Software , Humans , Data Mining/methods , Chromatin Immunoprecipitation Sequencing/methods , Animals , Chromatin/genetics , Chromatin/metabolism , Chromosomes/genetics , Epigenomics/methods , Polymorphism, Single Nucleotide , Mice , Quantitative Trait Loci , Molecular Sequence Annotation , Regulatory Elements, Transcriptional/genetics , Genomics/methods
16.
Genomics ; 116(4): 110858, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38735595

ABSTRACT

The ever decreasing cost of Next-Generation Sequencing coupled with the emergence of efficient and reproducible analysis pipelines has rendered genomic methods more accessible. However, downstream analyses are basic or missing in most workflows, creating a significant barrier for non-bioinformaticians. To help close this gap, we developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Its Nextflow-, container-, and virtual environment-based architecture ensures efficient and reproducible analyses. Cactus preprocesses raw reads, conducts differential analyses between conditions, and performs enrichment analyses in various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies. We demonstrate the utility of Cactus in a multi-modal and multi-species case study as well as by showcasing its unique capabilities as compared to other ATAC-Seq pipelines. In conclusion, Cactus can assist researchers in gaining comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.


Subject(s)
Software , Humans , Animals , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin Immunoprecipitation Sequencing/methods , Chromatin/genetics , Chromatin/metabolism , RNA-Seq/methods
17.
Nat Commun ; 15(1): 3606, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38697975

ABSTRACT

Amyotrophic Lateral Sclerosis (ALS), like many other neurodegenerative diseases, is highly heritable, but with only a small fraction of cases explained by monogenic disease alleles. To better understand sporadic ALS, we report epigenomic profiles, as measured by ATAC-seq, of motor neuron cultures derived from a diverse group of 380 ALS patients and 80 healthy controls. We find that chromatin accessibility is heavily influenced by sex, the iPSC cell type of origin, ancestry, and the inherent variance arising from sequencing. Once these covariates are corrected for, we are able to identify ALS-specific signals in the data. Additionally, we find that the ATAC-seq data is able to predict ALS disease progression rates with similar accuracy to methods based on biomarkers and clinical status. These results suggest that iPSC-derived motor neurons recapitulate important disease-relevant epigenomic changes.


Subject(s)
Amyotrophic Lateral Sclerosis , Induced Pluripotent Stem Cells , Motor Neurons , Humans , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/pathology , Amyotrophic Lateral Sclerosis/metabolism , Induced Pluripotent Stem Cells/metabolism , Motor Neurons/metabolism , Motor Neurons/pathology , Male , Female , Middle Aged , Case-Control Studies , Chromatin/metabolism , Chromatin/genetics , Aged , Epigenomics/methods , Chromatin Immunoprecipitation Sequencing/methods , Disease Progression , Epigenesis, Genetic
18.
Nucleic Acids Res ; 52(9): e46, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38647069

ABSTRACT

SifiNet is a robust and accurate computational pipeline for identifying distinct gene sets, extracting and annotating cellular subpopulations, and elucidating intrinsic relationships among these subpopulations. Uniquely, SifiNet bypasses the cell clustering stage, commonly integrated into other cellular annotation pipelines, thereby circumventing potential inaccuracies in clustering that may compromise subsequent analyses. Consequently, SifiNet has demonstrated superior performance in multiple experimental datasets compared with other state-of-the-art methods. SifiNet can analyze both single-cell RNA and ATAC sequencing data, thereby rendering comprehensive multi-omic cellular profiles. It is conveniently available as an open-source R package.


Subject(s)
Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Molecular Sequence Annotation , Algorithms , Computational Biology/methods , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Chromatin Immunoprecipitation Sequencing/methods , Cluster Analysis
19.
Nucleic Acids Res ; 52(8): 4137-4150, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38572749

ABSTRACT

DNA motifs are crucial patterns in gene regulation. DNA-binding proteins (DBPs), including transcription factors, can bind to specific DNA motifs to regulate gene expression and other cellular activities. Past studies suggest that DNA shape features could be subtly involved in DNA-DBP interactions. Therefore, the shape motif annotations based on intrinsic DNA topology can deepen the understanding of DNA-DBP binding. Nevertheless, high-throughput tools for DNA shape motif discovery that incorporate multiple features altogether remain insufficient. To address it, we propose a series of methods to discover non-redundant DNA shape motifs with the generalization to multiple motifs in multiple shape features. Specifically, an existing Gibbs sampling method is generalized to multiple DNA motif discovery with multiple shape features. Meanwhile, an expectation-maximization (EM) method and a hybrid method coupling EM with Gibbs sampling are proposed and developed with promising performance, convergence capability, and efficiency. The discovered DNA shape motif instances reveal insights into low-signal ChIP-seq peak summits, complementing the existing sequence motif discovery works. Additionally, our modelling captures the potential interplays across multiple DNA shape features. We provide a valuable platform of tools for DNA shape motif discovery. An R package is built for open accessibility and long-lasting impact: https://zenodo.org/doi/10.5281/zenodo.10558980.


Subject(s)
DNA , Nucleotide Motifs , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Algorithms , Nucleic Acid Conformation , Chromatin Immunoprecipitation Sequencing/methods , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Transcription Factors/chemistry , Humans , Protein Binding
20.
Nat Comput Sci ; 4(4): 285-298, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38600256

ABSTRACT

The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.


Subject(s)
Chromatin , Single-Cell Analysis , Humans , Algorithms , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Computational Biology/methods , Genome/genetics , Genomics/methods , Neoplasms/genetics , Single-Cell Analysis/methods , Transposases/genetics , Transposases/metabolism