Results 1 - 20 of 25
1.
Nat Biotechnol ; 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38609714

ABSTRACT

Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.

2.
NAR Genom Bioinform ; 6(1): lqae017, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38486887

ABSTRACT

Recent advances in high-throughput single-cell genome (scDNA) and transcriptome (scRNA) sequencing technologies have enabled cell-resolved investigation of tissue clones. However, it remains challenging to cluster and couple single cells for heterogeneous scRNA and scDNA data generated from the same specimen. In this study, we present a computational framework called CCNMF, which employs a novel Coupled-Clone Non-negative Matrix Factorization technique to jointly infer clonal structure for matched scDNA and scRNA data. CCNMF couples multi-omics single cells by linking copy number and gene expression profiles through their general concordance. It successfully resolved the underlying coexisting clones with high correlations between the clonal genome and transcriptome from the same specimen. We validated that CCNMF can achieve high accuracy and robustness using both simulated benchmarks and real-world applications, including a mixture of ovarian cancer cell lines, a gastric cancer cell line, and a primary gastric cancer. In summary, CCNMF provides a powerful tool for integrating multi-omics single-cell data, enabling simultaneous resolution of genomic and transcriptomic clonal architecture. This computational framework facilitates the understanding of how cellular gene expression changes in conjunction with clonal genome alterations, shedding light on the genomic differences between subclones that contribute to tumor evolution.
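CCNMF's coupling of copy number and expression profiles is not spelled out in this abstract. As a hedged illustration of the building block only, the sketch below factorizes a single non-negative matrix V into W and H with the standard Lee and Seung multiplicative updates for the Frobenius loss. It is plain NMF, not the coupled-clone variant, and the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as lists of rows."""
    return [list(row) for row in zip(*a)]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) as W (m x rank) times H (rank x n)
    with Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5
```

For example, a matrix built as the product of two known non-negative factors should be reconstructed with a small relative error after enough iterations. The multiplicative form keeps every entry of W and H non-negative without a separate projection step, which is why it is a common starting point for NMF-based clustering.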

3.
bioRxiv ; 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37577525

ABSTRACT

Accurate context-specific Gene Regulatory Networks (GRNs) inference from genomics data is a crucial task in computational biology. However, existing methods face limitations, such as reliance on gene expression data alone, lower resolution from bulk data, and data scarcity for specific cellular systems. Despite recent technological advancements, including single-cell sequencing and the integration of ATAC-seq and RNA-seq data, learning such complex mechanisms from limited independent data points still presents a daunting challenge, impeding GRN inference accuracy. To overcome this challenge, we present LINGER (LIfelong neural Network for GEne Regulation), a novel deep learning-based method to infer GRNs from single-cell multiome data with paired gene expression and chromatin accessibility data from the same cell. LINGER incorporates both 1) atlas-scale external bulk data across diverse cellular contexts and 2) the knowledge of transcription factor (TF) motif matching to cis-regulatory elements as a manifold regularization to address the challenge of limited data and extensive parameter space in GRN inference. Our results demonstrate that LINGER achieves 2-3 fold higher accuracy over existing methods. LINGER reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Additionally, following the GRN inference from a reference sc-multiome data, LINGER allows for the estimation of TF activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies. Overall, LINGER provides a comprehensive tool for robust gene regulation inference from genomics data, empowering deeper insights into cellular mechanisms.

4.
Elife ; 11, 2022 12 16.
Article in English | MEDLINE | ID: mdl-36525361

ABSTRACT

Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.


Subject(s)
Gene Regulatory Networks , Genome-Wide Association Study , Phenotype , Gene Expression Regulation , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide
6.
Phenomics ; 2(6): 389-403, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35990388

ABSTRACT

Human genetic variants can influence the severity of symptoms in people infected with SARS-CoV-2. Several genome-wide association studies have identified human genomic risk single nucleotide polymorphisms (SNPs) associated with coronavirus disease 2019 (COVID-19) severity. However, the causal tissues or cell types underlying COVID-19 severity are uncertain. In addition, candidate genes associated with these risk SNPs were investigated based on genomic proximity instead of their functional cellular contexts. Here, we compiled regulatory networks of 77 human contexts, revealed the cellular contexts in which these risk SNPs are enriched, and associated the risk SNPs with transcription factors, regulatory elements, and target genes. Twenty-one human contexts were identified and grouped into two categories: immune cells and epithelial cells. We further aggregated the regulatory networks of immune cells and of epithelial cells, and investigated these two aggregated networks to reveal their association with the regulation of the risk SNPs. Two genomic clusters, the chemokine receptor cluster and the oligoadenylate synthetase (OAS) cluster, showed the strongest association with COVID-19 severity, and they had different regulatory programs in immune and epithelial contexts. Our findings were supported by analysis of both SNP array-based and whole genome sequencing-based genome-wide association study (GWAS) summary statistics. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-022-00066-x.

7.
Genome Biol ; 23(1): 160, 2022 07 19.
Article in English | MEDLINE | ID: mdl-35854350

ABSTRACT

Despite recent developments, it is hard to profile all multi-omics single-cell data modalities on the same cell. Thus, huge amounts of single-cell genomics data of unpaired observations on different cells are generated. We propose a method named UnpairReg for the regression analysis on unpaired observations to integrate single-cell multi-omics data. On real and simulated data, UnpairReg provides an accurate estimation of cell gene expression where only chromatin accessibility data is available. The cis-regulatory network inferred from UnpairReg is highly consistent with eQTL mapping. UnpairReg improves cell type identification accuracy by joint analysis of single-cell gene expression and chromatin accessibility data.


Subject(s)
Chromatin , Genomics , Chromatin/genetics , Regression Analysis , Single-Cell Analysis
8.
Genome Biol ; 23(1): 114, 2022 05 16.
Article in English | MEDLINE | ID: mdl-35578363

ABSTRACT

Technological development has enabled the profiling of gene expression and chromatin accessibility from the same cell. We develop scREG, a dimension reduction methodology for single-cell multiome data based on the concept of cis-regulatory potential. This concept is further used to construct subpopulation-specific cis-regulatory networks. The capability of inferring useful regulatory networks is demonstrated by a two-fold increase in network inference accuracy compared to the Pearson correlation-based method and by a 27-fold enrichment of GWAS variants for inflammatory bowel disease in the cis-regulatory elements. The R package scREG provides comprehensive functions for single-cell multiome data analysis.


Subject(s)
Chromatin , Regulatory Sequences, Nucleic Acid , Chromatin/genetics , Gene Expression , Gene Regulatory Networks , Single-Cell Analysis
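The scREG abstract benchmarks against a Pearson correlation-based method. The sketch below shows what such a baseline might look like, under the assumption that each peak-gene pair is scored by the Pearson correlation of accessibility and expression across the same cells. It is not scREG's cis-regulatory potential, it omits the genomic-distance window a real analysis would apply, and the function names and the 0.5 cutoff are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def link_peaks_to_genes(accessibility, expression, threshold=0.5):
    """Score every peak-gene pair by Pearson correlation across cells.

    accessibility: {peak_id: [value per cell]}, expression: {gene_id: [value per cell]},
    with cells in the same order in both (paired multiome measurements).
    Returns (peak, gene, r) tuples with r >= threshold, strongest first.
    """
    links = []
    for peak, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= threshold:
                links.append((peak, gene, r))
    return sorted(links, key=lambda t: t[2], reverse=True)
```

Because it treats every cell as an independent observation of one peak-gene pair, a baseline like this ignores sparsity and shared regulatory structure, which is the kind of limitation a model-based score aims to address.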
9.
Nat Commun ; 12(1): 4763, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34362918

ABSTRACT

The comparison of gene regulatory networks between diseased versus healthy individuals or between two different treatments is an important scientific problem. Here, we propose sc-compReg as a method for the comparative analysis of gene expression regulatory networks between two conditions using single cell gene expression (scRNA-seq) and single cell chromatin accessibility data (scATAC-seq). Our software, sc-compReg, can be used as a stand-alone package that provides joint clustering and embedding of the cells from both scRNA-seq and scATAC-seq, and the construction of differential regulatory networks across two conditions. We apply the method to compare the gene regulatory networks of an individual with chronic lymphocytic leukemia (CLL) versus a healthy control. The analysis reveals a tumor-specific B cell subpopulation in the CLL patient and identifies TOX2 as a potential regulator of this subpopulation.


Subject(s)
Gene Regulatory Networks , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Single-Cell Analysis/methods , B-Lymphocytes , Chromatin , Gene Expression Regulation, Neoplastic , HMGB Proteins , Humans , RNA, Small Cytoplasmic , Software
10.
Proc Natl Acad Sci U S A ; 118(30), 2021 07 27.
Article in English | MEDLINE | ID: mdl-34285077

ABSTRACT

Dysfunction in T cells limits the efficacy of cancer immunotherapy. We profiled the epigenome, transcriptome, and enhancer connectome of exhaustion-prone GD2-targeting HA-28z chimeric antigen receptor (CAR) T cells and control CD19-targeting CAR T cells, which present less exhaustion-inducing tonic signaling, at multiple points during their ex vivo expansion. We found widespread, dynamic changes in chromatin accessibility and three-dimensional (3D) chromosome conformation preceding changes in gene expression, notably at loci proximal to exhaustion-associated genes such as PDCD1, CTLA4, and HAVCR2, and increased DNA motif access for AP-1 family transcription factors, which are known to promote exhaustion. Although T cell exhaustion has been studied in detail in mice, we find that the regulatory networks of T cell exhaustion differ between species and involve distinct loci of accessible chromatin and cis-regulated target genes in human CAR T cell exhaustion. Deletion of exhaustion-specific candidate enhancers of PDCD1 suppresses the expression of PD-1 in an in vitro model of T cell dysfunction and in HA-28z CAR T cells, suggesting enhancer editing as a path forward in improving cancer immunotherapy.


Subject(s)
Chromatin/metabolism , Neoplasms/therapy , Programmed Cell Death 1 Receptor/metabolism , Receptors, Chimeric Antigen , T-Lymphocytes/physiology , Animals , Antigens, CD19 , Cell Line , Chromatin/genetics , Gene Expression Regulation, Neoplastic , Humans , Mice , Programmed Cell Death 1 Receptor/genetics
11.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34180954

ABSTRACT

Multi-omics data allow us to select a small set of informative markers for the discrimination of specific cell types and the study of cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from the high-dimensional molecular profiles for a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types and simultaneously maximizes the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, with 29 diverse tissues, for 43 surface markers (SMs) and 1345 transcription factors (TFs). MIMIC could select biologically meaningful markers and is robust for different accuracy criteria. It shows advantages over the standard single gene-based approaches and widely used dimensional reduction methods, such as multidimensional scaling and t-SNE, both in accuracy and in biological interpretation. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveal the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.


Subject(s)
Biomarkers , Computational Biology , Flow Cytometry/methods , Gene Expression Profiling/methods , Organ Specificity , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Regulation , Humans , Organ Specificity/genetics , Reproducibility of Results
12.
Nat Commun ; 12(1): 2851, 2021 05 14.
Article in English | MEDLINE | ID: mdl-33990562

ABSTRACT

Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and previously undescribed trait-associated genes revealing biological and therapeutic insights.


Subject(s)
Gene Regulatory Networks , Genome-Wide Association Study/methods , Models, Genetic , Multifactorial Inheritance/genetics , Algorithms , Bayes Theorem , Computer Simulation , Data Mining , Genome, Human , Genome-Wide Association Study/statistics & numerical data , Humans , Polymorphism, Single Nucleotide , Transcription Factors/genetics
13.
Commun Biol ; 4(1): 442, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33824393

ABSTRACT

Cranial Neural Crest Cells (CNCC) originate at the cephalic region from forebrain, midbrain and hindbrain, migrate into the developing craniofacial region, and subsequently differentiate into multiple cell types. The entire specification, delamination, migration, and differentiation process is highly regulated and abnormalities during this craniofacial development cause birth defects. To better understand the molecular networks underlying CNCC, we integrate paired gene expression & chromatin accessibility data and reconstruct the genome-wide human Regulatory network of CNCC (hReg-CNCC). Consensus optimization predicts high-quality regulations and reveals the architecture of upstream, core, and downstream transcription factors that are associated with functions of neural plate border, specification, and migration. hReg-CNCC allows us to annotate genetic variants of human facial GWAS and disease traits with associated cis-regulatory modules, transcription factors, and target genes. For example, we reveal the distal and combinatorial regulation of multiple SNPs to core TF ALX1 and associations to facial distances and cranial rare disease. In addition, hReg-CNCC connects the DNA sequence differences in evolution, such as ultra-conserved elements and human accelerated regions, with gene expression and phenotype. hReg-CNCC provides a valuable resource to interpret genetic variants as early as gastrulation during embryonic development. The network resources are available at https://github.com/AMSSwanglab/hReg-CNCC .


Subject(s)
Cell Differentiation , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Neural Crest/embryology , Humans
14.
Brief Bioinform ; 22(4), 2021 07 20.
Article in English | MEDLINE | ID: mdl-33048117

ABSTRACT

The DNA methyltransferases (DNMTs) (DNMT3A, DNMT3B and DNMT3L) are primarily responsible for the establishment of genomic locus-specific DNA methylation patterns, which play an important role in gene regulation and animal development. However, the binding mechanism of this important protein family, i.e. how and where the DNMTs bind to the genome, remains unknown in most tissues and cell lines. This motivated us to explore the cooperation between DNMTs and TFs and to develop a network-regularized logistic regression model, GuidingNet, to predict DNMTs' genome-wide binding by integrating gene expression, chromatin accessibility, sequence and protein-protein interaction data. GuidingNet accurately predicted DNMT binding validated by experimental methylation data, outperformed single-data-source and sparsity-regularized methods, and performed well in both within-tissue and cross-tissue prediction for several DNMTs in human and mouse. Importantly, GuidingNet can reveal transcription cofactors that assist DNMTs in establishing methylation. This provides biological insight into the DNMTs' binding specificity in different tissues and demonstrates the advantage of network regularization. In addition to DNMTs, GuidingNet achieves good performance for the binding of other chromatin regulators. GuidingNet is freely available at https://github.com/AMSSwanglab/GuidingNet.


Subject(s)
DNA (Cytosine-5-)-Methyltransferases , DNA Methylation/genetics , Gene Expression Regulation, Enzymologic , Genome, Human , Models, Biological , Protein Interaction Maps , Transcription Factors , Animals , Chromatin/genetics , Chromatin/metabolism , DNA (Cytosine-5-)-Methyltransferases/biosynthesis , DNA (Cytosine-5-)-Methyltransferases/genetics , Databases, Genetic , Humans , Mice , Transcription Factors/genetics , Transcription Factors/metabolism
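The GuidingNet abstract describes a network-regularized logistic regression but does not give the penalty. The sketch below assumes a common graph-Laplacian form, lam * sum over edges (i, j) of (beta_i - beta_j)^2, which pulls the coefficients of linked features (for example, interacting TFs in a protein-protein interaction network) toward each other, and fits it by plain batch gradient descent. The function `fit_network_logreg`, its parameters, and the choice of optimizer are hypothetical and are not GuidingNet's code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_network_logreg(X, y, edges, lam=1.0, lr=0.1, iters=2000):
    """Minimize mean logistic loss + lam * sum over (i, j) in edges of (beta_i - beta_j)^2
    by batch gradient descent.

    X: list of feature rows, y: list of 0/1 labels,
    edges: (i, j) index pairs of linked features (e.g. interacting TFs).
    Returns (intercept, coefficient list); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += err
            for k in range(p):
                grad[k] += err * xi[k]
        g0 /= n
        grad = [g / n for g in grad]
        # Penalty gradient: +2*lam*(beta_i - beta_j) for i and the negative for j.
        for i, j in edges:
            d = 2.0 * lam * (beta[i] - beta[j])
            grad[i] += d
            grad[j] -= d
        b0 -= lr * g0
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return b0, beta
```

With no edges, a feature that carries no label information keeps a coefficient near zero; linking it to an informative feature pulls the two coefficients together. This shared-strength effect is the usual motivation for network regularization over a plain sparsity penalty.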
15.
Nat Commun ; 11(1): 4928, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33004791

ABSTRACT

High-altitude adaptation of Tibetans represents a remarkable case of natural selection during recent human evolution. Previous genome-wide scans found many non-coding variants under selection, suggesting a pressing need to understand the functional role of non-coding regulatory elements (REs). Here, we generate time courses of paired ATAC-seq and RNA-seq data on cultured HUVECs under hypoxic and normoxic conditions. We further develop a variant interpretation methodology (vPECA) to identify active selected REs (ASREs) and associated regulatory network. We discover three causal SNPs of EPAS1, the key adaptive gene for Tibetans. These SNPs decrease the accessibility of ASREs with weakened binding strength of relevant TFs, and cooperatively down-regulate EPAS1 expression. We further construct the downstream network of EPAS1, elucidating its roles in hypoxic response and angiogenesis. Collectively, we provide a systematic approach to interpret phenotype-associated noncoding variants in proper cell types and relevant dynamic conditions, to model their impact on gene regulation.


Subject(s)
Acclimatization/genetics , Chromatin/metabolism , Ethnicity/genetics , Gene Regulatory Networks , Models, Genetic , Altitude , Altitude Sickness/ethnology , Altitude Sickness/genetics , Altitude Sickness/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Hypoxia/genetics , Cells, Cultured , Chromatin/genetics , Chromatin Immunoprecipitation Sequencing , Disease Resistance/genetics , Female , Gene Expression Regulation , Human Umbilical Vein Endothelial Cells , Humans , Hypoxia/genetics , Hypoxia/metabolism , Oxygen/metabolism , Polymorphism, Single Nucleotide , Pregnancy , Primary Cell Culture , RNA-Seq , Regulatory Elements, Transcriptional/genetics , Selection, Genetic , Tibet/ethnology , Transcription Factors/metabolism , Whole Genome Sequencing
16.
Proc Natl Acad Sci U S A ; 117(35): 21364-21372, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817564

ABSTRACT

A person's genome typically contains millions of variants which represent the differences between this personal genome and the reference human genome. The interpretation of these variants, i.e., the assessment of their potential impact on a person's phenotype, is currently of great interest in human genetics and medicine. We have developed a prioritization tool called OpenCausal which takes as inputs 1) a personal genome and 2) a reference context-specific TF expression profile and returns a list of noncoding variants prioritized according to their impact on chromatin accessibility for any given genomic region of interest. We applied OpenCausal to 6,430 samples across 18 tissues derived from the GTEx project and found that the variants prioritized by OpenCausal are highly enriched for eQTLs and caQTLs. We further propose a strategy to integrate the predicted open scores with genome-wide association studies (GWAS) data to prioritize putative causal variants and regulatory elements for a given risk locus (i.e., fine-mapping analysis). As an initial example, we applied this method to a GWAS dataset of human height and found that the prioritized putative variants and elements are correlated with the phenotype (i.e., heights of individuals) better than others.


Subject(s)
Genetic Techniques , Genetic Variation , Genome, Human , Models, Genetic , Regulatory Elements, Transcriptional , Body Height/genetics , Gene Expression Profiling , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Software , Transcription Factors/metabolism
17.
Genome Res ; 30(4): 622-634, 2020 04.
Article in English | MEDLINE | ID: mdl-32188700

ABSTRACT

A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility data from a time course. TimeReg can be used to prioritize regulatory elements, to extract core regulatory modules at each time point, to identify key regulators driving changes of the cellular state, and to causally connect the modules across different time points. We applied the method to analyze paired chromatin accessibility and gene expression data from a retinoic acid (RA)-induced mouse embryonic stem cell (mESC) differentiation experiment. The analysis identified 57,048 novel regulatory elements regulating cerebellar development, synapse assembly, and hindbrain morphogenesis, which substantially extended our knowledge of cis-regulatory elements during differentiation. Using single-cell RNA-seq data, we showed that the core regulatory modules can reflect the properties of different subpopulations of cells. Finally, the driver regulators are shown to be important in clarifying the relations between modules across adjacent time points. As a second example, applying our method to time course data of Ascl1-induced direct reprogramming from fibroblasts to neurons identified Id1/2 as driver regulators of the early stage of reprogramming.


Subject(s)
Chromatin Assembly and Disassembly , Chromatin/genetics , Gene Expression Regulation , Mouse Embryonic Stem Cells/metabolism , Algorithms , Animals , Cell Differentiation/drug effects , Cell Differentiation/genetics , Cell Lineage , Cellular Reprogramming/genetics , Cellular Reprogramming Techniques , Chromatin/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Mice , Mouse Embryonic Stem Cells/drug effects , Transcription Factors/metabolism , Transcriptome , Tretinoin/pharmacology
18.
Proc Natl Acad Sci U S A ; 117(9): 4864-4873, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32071206

ABSTRACT

In both Turner syndrome (TS) and Klinefelter syndrome (KS), copy number aberrations of the X chromosome lead to various developmental symptoms. We report a comparative analysis of TS vs. KS regarding differences at the genomic network level measured in primary samples by analyzing gene expression, DNA methylation, and chromatin conformation. X-chromosome inactivation (XCI) silences transcription from one X chromosome in female mammals, on which most genes are inactive, while some genes escape from XCI. In TS, almost all differentially expressed escape genes are down-regulated but most differentially expressed inactive genes are up-regulated. In KS, differentially expressed escape genes are up-regulated while the majority of inactive genes appear unchanged. Interestingly, 94 differentially expressed genes (DEGs) overlapped between the TS-versus-female and KS-versus-male comparisons, and these almost uniformly display expression changes in opposite directions. DEGs on the X chromosome and the autosomes are coexpressed in both syndromes, indicating that there are molecular ripple effects of the changes in X chromosome dosage. Six potential candidate genes (RPS4X, SEPT6, NKRF, CXorf57, NAA10, and FLNA) for KS are identified on Xq, as well as candidate central genes on Xp for TS. Only promoters of inactive genes are differentially methylated in both syndromes while escape gene promoters remain unchanged. The intrachromosomal contact map of the X chromosome in TS exhibits the structure of an active X chromosome. The discovery of shared DEGs indicates the existence of common molecular mechanisms for gene regulation in TS and KS that transmit the gene dosage changes to the transcriptome.


Subject(s)
Gene Dosage , Gene Expression Regulation , Genomics , Klinefelter Syndrome/genetics , Turner Syndrome/genetics , X Chromosome , Animals , Chromatin/chemistry , Chromosomes, Human, X , DNA Methylation , Female , Filamins , Humans , Karyotype , Male , Mammals/genetics , N-Terminal Acetyltransferase A , N-Terminal Acetyltransferase E , Protein Serine-Threonine Kinases/genetics , Receptor, PAR-2 , Repressor Proteins/genetics , Septins , Transcriptome/genetics , X Chromosome Inactivation
19.
Nat Commun ; 10(1): 4613, 2019 10 10.
Article in English | MEDLINE | ID: mdl-31601804

ABSTRACT

Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics. Single-cell assays offer an opportunity to resolve cellular level heterogeneity, e.g., scRNA-seq enables single-cell expression profiling, and scATAC-seq identifies active regulatory elements. Furthermore, while scHi-C can measure the chromatin contacts (i.e., loops) between active regulatory elements and target genes in single cells, bulk HiChIP can measure such contacts at higher resolution. In this work, we introduce DC3 (De-Convolution and Coupled-Clustering) as a method for the joint analysis of various bulk and single-cell data such as HiChIP, RNA-seq and ATAC-seq from the same heterogeneous cell population. DC3 can simultaneously identify distinct subpopulations, assign single cells to the subpopulations (i.e., clustering) and de-convolve the bulk data into subpopulation-specific data. The subpopulation-specific profiles of gene expression, chromatin accessibility and enhancer-promoter contact obtained by DC3 provide a comprehensive characterization of the gene regulatory system in each subpopulation.


Subject(s)
Algorithms , Cluster Analysis , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Animals , Cell Line , Chromatin , Chromatin Immunoprecipitation/statistics & numerical data , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Mice , Promoter Regions, Genetic , Single-Cell Analysis/methods
20.
NPJ Syst Biol Appl ; 5: 28, 2019.
Article in English | MEDLINE | ID: mdl-31428455

ABSTRACT

To study systems-level properties of the cell, it is necessary to go beyond individual regulators and target genes to study the regulatory network among transcription factors (TFs). However, it is difficult to dissect the TF-mediated genome-wide gene regulatory network (GRN) directly by experiment. Here, we proposed a hierarchical graphical model to estimate TF activity from mRNA expression by building TF complexes with protein cofactors and inferring each TF's downstream regulatory network simultaneously. We then applied our model to flower development and circadian rhythm processes in Arabidopsis thaliana. The computational results show that the sequence-specific bHLH family TF HFR1 recruits the chromatin regulator HAC1 to the flower development master regulator TF AG and further activates AG's expression by histone acetylation. Both independent data and experimental results supported this discovery. We also found a flower tissue-specific H3K27ac ChIP-seq peak at the AG gene body and an HFR1 motif in the center of this H3K27ac peak. Furthermore, we verified that HFR1 physically interacts with HAC1 by a yeast two-hybrid experiment. This HFR1-HAC1-AG triplet relationship may imply that flower development and circadian rhythm are bridged by epigenetic regulation, and it may enrich the classical ABC model of flower development. In addition, our TF activity network can serve as a general method to elucidate molecular mechanisms of other complex biological regulatory processes.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Computational Biology/methods , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , AGAMOUS Protein, Arabidopsis/genetics , AGAMOUS Protein, Arabidopsis/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Arsenate Reductases/metabolism , Circadian Rhythm/genetics , Circadian Rhythm/physiology , Epigenesis, Genetic/genetics , Flowers/genetics , Gene Expression Regulation, Plant/genetics , Gene Regulatory Networks , Genome , Nuclear Proteins/genetics , Transcription Factors/genetics , Transcription Factors/metabolism