Search | VHL Regional Portal

1.

Data integration and inference of gene regulation using single-cell temporal multimodal data with scTIE.

Lin, Yingxin; Wu, Tung-Yu; Chen, Xi; Wan, Sheng; Chao, Brian; Xin, Jingxue; Yang, Jean Y H; Wong, Wing H; Wang, Y X Rachel.

Genome Res ; 34(1): 119-133, 2024 02 07.

Article in English | MEDLINE | ID: mdl-38190633

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.

Subject(s)

Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Gene Expression Regulation

2.

scTIE: data integration and inference of gene regulation using single-cell temporal multimodal data.

Lin, Yingxin; Wu, Tung-Yu; Chen, Xi; Wan, Sheng; Chao, Brian; Xin, Jingxue; Yang, Jean Y H; Wong, Wing H; Wang, Y X Rachel.

bioRxiv ; 2023 May 22.

Article in English | MEDLINE | ID: mdl-37292801

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal datasets, we demonstrate scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome dataset we generated from differentiating mouse embryonic stem cells over time, we demonstrate scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.

3.

Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification.

Feng, Zhanying; Duren, Zhana; Xin, Jingxue; Yuan, Qiuyue; He, Yaoxi; Su, Bing; Wong, Wing Hung; Wang, Yong.

Elife ; 11, 2022 12 16.

Article in English | MEDLINE | ID: mdl-36525361

ABSTRACT

Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.

Subject(s)

Gene Regulatory Networks , Genome-Wide Association Study , Phenotype , Gene Expression Regulation , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide

4.

Author Correction: Regulatory analysis of single cell multiome gene expression and chromatin accessibility data with scREG.

Duren, Zhana; Chang, Fengge; Naqing, Fnu; Xin, Jingxue; Liu, Qiao; Wong, Wing Hung.

Genome Biol ; 23(1): 213, 2022 Oct 13.

Article in English | MEDLINE | ID: mdl-36229829

5.

Genetic adaptation of skin pigmentation in highland Tibetans.

Yang, Zhaohui; Bai, Caijuan; Pu, Youwei; Kong, Qinghong; Guo, Yongbo; Liu, Xuyang; Zhao, Qi; Qiu, Zhichao; Zheng, Wangshan; He, Yaoxi; Lin, Yihan; Deng, Lian; Zhang, Chao; Xu, Shuhua; Peng, Yi; Xiang, Kun; Zhang, Xiaoming; Cui, Chaoying; Pan, Yongyue; Xin, Jingxue; Wang, Yong; Liu, Shiming; Wang, Liangbang; Guo, Hengliang; Feng, Zhenzhen; Wang, Shaobo; Shi, Hong; Jiang, Binghua; Wu, Tianyi; Qi, Xuebin; Su, Bing.

Proc Natl Acad Sci U S A ; 119(40): e2200421119, 2022 10 04.

Article in English | MEDLINE | ID: mdl-36161951

ABSTRACT

Strong ultraviolet (UV) radiation at high altitude imposes a serious selective pressure, which may induce skin pigmentation adaptation of indigenous populations. We conducted skin pigmentation phenotyping and genome-wide analysis of Tibetans in order to understand the underlying mechanism of adaptation to UV radiation. We observe that Tibetans have darker baseline skin color compared with lowland Han Chinese, as well as an improved tanning ability, suggesting a two-level adaptation to boost their melanin production. A genome-wide search for the responsible genes identifies GNPAT showing strong signals of positive selection in Tibetans. An enhancer mutation (rs75356281) located in GNPAT intron 2 is enriched in Tibetans (58%) but rare in other world populations (0 to 18%). The adaptive allele of rs75356281 is associated with darker skin in Tibetans and, under UVB treatment, it displays higher enhancer activities compared with the wild-type allele in in vitro luciferase assays. Transcriptome analyses of gene-edited cells clearly show that with UVB treatment, the adaptive variant of GNPAT promotes melanin synthesis, likely through the interactions of CAT and ACAA1 in peroxisomes with other pigmentation genes, and they act synergistically, leading to an improved tanning ability in Tibetans for UV protection.

Subject(s)

Adaptation, Physiological , Altitude , Skin Pigmentation , Acyltransferases/genetics , Adaptation, Physiological/genetics , Ethnicity , Humans , Melanins/genetics , Phenotype , Skin Pigmentation/genetics , Tibet , Transcriptome , Ultraviolet Rays

6.

Regulatory analysis of single cell multiome gene expression and chromatin accessibility data with scREG.

Duren, Zhana; Chang, Fengge; Naqing, Fnu; Xin, Jingxue; Liu, Qiao; Wong, Wing Hung.

Genome Biol ; 23(1): 114, 2022 05 16.

Article in English | MEDLINE | ID: mdl-35578363

ABSTRACT

Technological development has enabled the profiling of gene expression and chromatin accessibility from the same cell. We develop scREG, a dimension reduction methodology based on the concept of cis-regulatory potential, for single cell multiome data. This concept is further used for the construction of subpopulation-specific cis-regulatory networks. The capability of inferring useful regulatory networks is demonstrated by the two-fold increase in network inference accuracy compared to the Pearson correlation-based method and the 27-fold enrichment of GWAS variants for inflammatory bowel disease in the cis-regulatory elements. The R package scREG provides comprehensive functions for single cell multiome data analysis.

Subject(s)

Chromatin , Regulatory Sequences, Nucleic Acid , Chromatin/genetics , Gene Expression , Gene Regulatory Networks , Single-Cell Analysis

7.

Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data.

Duren, Zhana; Lu, Wenhui Sophia; Arthur, Joseph G; Shah, Preyas; Xin, Jingxue; Meschi, Francesca; Li, Miranda Lin; Nemec, Corey M; Yin, Yifeng; Wong, Wing Hung.

Nat Commun ; 12(1): 4763, 2021 08 06.

Article in English | MEDLINE | ID: mdl-34362918

ABSTRACT

The comparison of gene regulatory networks between diseased versus healthy individuals or between two different treatments is an important scientific problem. Here, we propose sc-compReg as a method for the comparative analysis of gene expression regulatory networks between two conditions using single cell gene expression (scRNA-seq) and single cell chromatin accessibility data (scATAC-seq). Our software, sc-compReg, can be used as a stand-alone package that provides joint clustering and embedding of the cells from both scRNA-seq and scATAC-seq, and the construction of differential regulatory networks across two conditions. We apply the method to compare the gene regulatory networks of an individual with chronic lymphocytic leukemia (CLL) versus a healthy control. The analysis reveals a tumor-specific B cell subpopulation in the CLL patient and identifies TOX2 as a potential regulator of this subpopulation.

Subject(s)

Gene Regulatory Networks , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Single-Cell Analysis/methods , B-Lymphocytes , Chromatin , Gene Expression Regulation, Neoplastic , HMGB Proteins , Humans , RNA, Small Cytoplasmic , Software

8.

Chromatin accessibility landscape and regulatory network of high-altitude hypoxia adaptation.

Xin, Jingxue; Zhang, Hui; He, Yaoxi; Duren, Zhana; Bai, Caijuan; Chen, Lang; Luo, Xin; Yan, Dong-Sheng; Zhang, Chaoyu; Zhu, Xiang; Yuan, Qiuyue; Feng, Zhanying; Cui, Chaoying; Qi, Xuebin; Wong, Wing Hung; Wang, Yong; Su, Bing.

Nat Commun ; 11(1): 4928, 2020 10 01.

Article in English | MEDLINE | ID: mdl-33004791

ABSTRACT

High-altitude adaptation of Tibetans represents a remarkable case of natural selection during recent human evolution. Previous genome-wide scans found many non-coding variants under selection, suggesting a pressing need to understand the functional role of non-coding regulatory elements (REs). Here, we generate time courses of paired ATAC-seq and RNA-seq data on cultured HUVECs under hypoxic and normoxic conditions. We further develop a variant interpretation methodology (vPECA) to identify active selected REs (ASREs) and associated regulatory network. We discover three causal SNPs of EPAS1, the key adaptive gene for Tibetans. These SNPs decrease the accessibility of ASREs with weakened binding strength of relevant TFs, and cooperatively down-regulate EPAS1 expression. We further construct the downstream network of EPAS1, elucidating its roles in hypoxic response and angiogenesis. Collectively, we provide a systematic approach to interpret phenotype-associated noncoding variants in proper cell types and relevant dynamic conditions, to model their impact on gene regulation.

Subject(s)

Acclimatization/genetics , Chromatin/metabolism , Ethnicity/genetics , Gene Regulatory Networks , Models, Genetic , Altitude , Altitude Sickness/ethnology , Altitude Sickness/genetics , Altitude Sickness/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Hypoxia/genetics , Cells, Cultured , Chromatin/genetics , Chromatin Immunoprecipitation Sequencing , Disease Resistance/genetics , Female , Gene Expression Regulation , Human Umbilical Vein Endothelial Cells , Humans , Hypoxia/genetics , Hypoxia/metabolism , Oxygen/metabolism , Polymorphism, Single Nucleotide , Pregnancy , Primary Cell Culture , RNA-Seq , Regulatory Elements, Transcriptional/genetics , Selection, Genetic , Tibet/ethnology , Transcription Factors/metabolism , Whole Genome Sequencing

9.

Time course regulatory analysis based on paired expression and chromatin accessibility data.

Duren, Zhana; Chen, Xi; Xin, Jingxue; Wang, Yong; Wong, Wing Hung.

Genome Res ; 30(4): 622-634, 2020 04.

Article in English | MEDLINE | ID: mdl-32188700

ABSTRACT

A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility data from a time course. TimeReg can be used to prioritize regulatory elements, to extract core regulatory modules at each time point, to identify key regulators driving changes of the cellular state, and to causally connect the modules across different time points. We applied the method to analyze paired chromatin accessibility and gene expression data from a retinoic acid (RA)-induced mouse embryonic stem cells (mESCs) differentiation experiment. The analysis identified 57,048 novel regulatory elements regulating cerebellar development, synapse assembly, and hindbrain morphogenesis, which substantially extended our knowledge of cis-regulatory elements during differentiation. Using single-cell RNA-seq data, we showed that the core regulatory modules can reflect the properties of different subpopulations of cells. Finally, the driver regulators are shown to be important in clarifying the relations between modules across adjacent time points. As a second example, our method on Ascl1-induced direct reprogramming from fibroblast to neuron time course data identified Id1/2 as driver regulators of early stage of reprogramming.

Subject(s)

Chromatin Assembly and Disassembly , Chromatin/genetics , Gene Expression Regulation , Mouse Embryonic Stem Cells/metabolism , Algorithms , Animals , Cell Differentiation/drug effects , Cell Differentiation/genetics , Cell Lineage , Cellular Reprogramming/genetics , Cellular Reprogramming Techniques , Chromatin/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Mice , Mouse Embryonic Stem Cells/drug effects , Transcription Factors/metabolism , Transcriptome , Tretinoin/pharmacology

10.

TFAP2C- and p63-Dependent Networks Sequentially Rearrange Chromatin Landscapes to Drive Human Epidermal Lineage Commitment.

Li, Lingjie; Wang, Yong; Torkelson, Jessica L; Shankar, Gautam; Pattison, Jillian M; Zhen, Hanson H; Fang, Fengqin; Duren, Zhana; Xin, Jingxue; Gaddam, Sadhana; Melo, Sandra P; Piekos, Samantha N; Li, Jiang; Liaw, Eric J; Chen, Lang; Li, Rui; Wernig, Marius; Wong, Wing H; Chang, Howard Y; Oro, Anthony E.

Cell Stem Cell ; 24(2): 271-284.e8, 2019 02 07.

Article in English | MEDLINE | ID: mdl-30686763

ABSTRACT

Tissue development results from lineage-specific transcription factors (TFs) programming a dynamic chromatin landscape through progressive cell fate transitions. Here, we define epigenomic landscape during epidermal differentiation of human pluripotent stem cells (PSCs) and create inference networks that integrate gene expression, chromatin accessibility, and TF binding to define regulatory mechanisms during keratinocyte specification. We found two critical chromatin networks during surface ectoderm initiation and keratinocyte maturation, which are driven by TFAP2C and p63, respectively. Consistently, TFAP2C, but not p63, is sufficient to initiate surface ectoderm differentiation, and TFAP2C-initiated progenitor cells are capable of maturing into functional keratinocytes. Mechanistically, TFAP2C primes the surface ectoderm chromatin landscape and induces p63 expression and binding sites, thus allowing maturation factor p63 to positively autoregulate its own expression and close a subset of the TFAP2C-initiated surface ectoderm program. Our work provides a general framework to infer TF networks controlling chromatin transitions that will facilitate future regenerative medicine advances.

Subject(s)

Cell Lineage , Chromatin/metabolism , Epidermis/metabolism , Gene Regulatory Networks , Transcription Factor AP-2/metabolism , Transcription Factors/metabolism , Tumor Suppressor Proteins/metabolism , Cell Differentiation , Ectoderm/cytology , Epigenesis, Genetic , Feedback, Physiological , Humans , Keratinocytes/cytology , Transcriptome/genetics

11.

Lipid-gene regulatory network reveals coregulations of triacylglycerol with phosphatidylinositol/lysophosphatidylinositol and with hexosyl-ceramide.

Wang, Wei; Xin, Jingxue; Yang, Xiao; Lam, Sin Man; Shui, Guanghou; Wang, Yong; Huang, Xun.

Biochim Biophys Acta Mol Cell Biol Lipids ; 1864(2): 168-180, 2019 02.

Article in English | MEDLINE | ID: mdl-30521938

ABSTRACT

Lipid homeostasis is important for executing normal cellular functions and maintaining physiological conditions. The biophysical properties and intricate metabolic network of lipids underlie the coordinated regulation of different lipid species in lipid homeostasis. To reveal the homeostatic response among different lipids, we systematically knocked down 40 lipid metabolism genes in Drosophila S2 cells by RNAi and profiled the lipidomic changes. Clustering analyses of lipids reveal that many pairs of genes acting in a sequential fashion or sharing the same substrate are tightly clustered. Through a lipid-gene regulatory network analysis, we further found that a reduction of triacylglycerol (TAG) is associated with an increase of phosphatidylinositol (PI) and lysophosphatidylinositol (LPI) or a reduction of hexosyl-ceramide (HexCer) and hydroxylated hexosyl-ceramide (OH-HexCer). Importantly, negative coregulation between TAG and LPI/PI, and positive coregulation between TAG and HexCer, were also found in human HeLa cells. Together, our results reveal coregulations of TAG with PI/LPI and with HexCer in lipid homeostasis.

Subject(s)

Lipids/genetics , Phosphatidylinositols/metabolism , Triglycerides/metabolism , Animals , Cell Line , Ceramides/metabolism , Ceramides/physiology , Drosophila , Gene Regulatory Networks/genetics , HeLa Cells , Homeostasis , Humans , Lipid Metabolism/genetics , Lipids/physiology , Lysophospholipids/metabolism , Signal Transduction , Triglycerides/genetics

12.

A Semisupervised Classification Approach for Multidomain Networks With Domain Selection.

Chen, Chuan; Xin, Jingxue; Wang, Yong; Chen, Luonan; Ng, Michael K.

IEEE Trans Neural Netw Learn Syst ; 30(1): 269-283, 2019 01.

Article in English | MEDLINE | ID: mdl-29994273

ABSTRACT

Multidomain network classification has attracted significant attention in data integration and machine learning, as it can enhance network classification or prediction performance by integrating information from different sources. Despite the previous success, existing multidomain network learning methods usually assume that different views are available for the same set of instances, and thus, they seek a consistent classification result for all domains. However, in many real-world problems, each domain has its specific instance set, and one instance in one domain may correspond to multiple instances in another domain. Moreover, due to the rapid growth of data sources, different domains may not be relevant to each other, which calls for selecting domains relevant to the target/focused domain. A key challenge under this setting is how to achieve accurate prediction by integrating different data representations without losing data information. In this paper, we propose a semisupervised classification approach for a multidomain network based on label propagation, i.e., multidomain classification with domain selection (MCS), which can deal with the cross-domain information and different instance sets in domains. In particular, with sparse weight properties, the proposed MCS can automatically identify those domains relevant to our target domain by assigning them higher weights than the other irrelevant domains. This not only significantly improves classification accuracy but also helps to obtain an optimal network partition for the target domain. From the theoretical viewpoint, we equivalently decompose MCS into two simpler subproblems with analytical solutions, which can be efficiently solved by their computational procedures. Extensive experimental results on both synthetic and real-world data sets empirically demonstrate the advantages of the proposed approach in terms of both prediction performance and domain selection ability.

13.

Drosophila TRF2 and TAF9 regulate lipid droplet size and phospholipid fatty acid composition.

Fan, Wei; Lam, Sin Man; Xin, Jingxue; Yang, Xiao; Liu, Zhonghua; Liu, Yuan; Wang, Yong; Shui, Guanghou; Huang, Xun.

PLoS Genet ; 13(3): e1006664, 2017 03.

Article in English | MEDLINE | ID: mdl-28273089

ABSTRACT

The general transcription factor TBP (TATA-box binding protein) and its associated factors (TAFs) together form the TFIID complex, which directs transcription initiation. Through RNAi and mutant analysis, we identified a specific TBP family protein, TRF2, and a set of TAFs that regulate lipid droplet (LD) size in the Drosophila larval fat body. Among the three Drosophila TBP genes, trf2, tbp and trf1, only loss of function of trf2 results in increased LD size. Moreover, TRF2 and TAF9 regulate fatty acid composition of several classes of phospholipids. Through RNA profiling, we found that TRF2 and TAF9 affect the transcription of a common set of genes, including peroxisomal fatty acid β-oxidation-related genes that affect phospholipid fatty acid composition. We also found that knockdown of several TRF2 and TAF9 target genes results in large LDs, a phenotype similar to that of trf2 mutants. Together, these findings provide new insights into the specific role of the general transcription machinery in lipid homeostasis.

Subject(s)

Drosophila Proteins/metabolism , Drosophila/genetics , Fatty Acids/chemistry , Lipids/chemistry , TATA-Binding Protein Associated Factors/metabolism , Telomeric Repeat Binding Protein 2/metabolism , Transcription Factor TFIID/metabolism , Alleles , Amino Acid Motifs , Animals , Drosophila/metabolism , Homeostasis , Mutation , Oxygen/chemistry , Peroxisomes/chemistry , Phenotype , Phospholipids/chemistry , RNA Interference , Sequence Analysis, RNA , Transcription Factor TFIID/chemistry

14.

Identifying network biomarkers based on protein-protein interactions and expression data.

Xin, Jingxue; Ren, Xianwen; Chen, Luonan; Wang, Yong.

BMC Med Genomics ; 8 Suppl 2: S11, 2015.

Article in English | MEDLINE | ID: mdl-26044366

ABSTRACT

Identifying effective biomarkers to battle complex diseases is an important but challenging task in biomedical research today. Molecular data on complex diseases are increasingly abundant due to the rapid advance of high-throughput technologies. However, a great gap remains in linking the massive molecular data to phenotypic changes, in particular at a network level; that is, a novel method for identifying network biomarkers is urgently needed to accurately classify and diagnose diseases from molecular data and to shed light on the mechanisms of disease pathogenesis. Rather than seeking differential genes at an individual-molecule level, here we propose a novel method for identifying network biomarkers based on protein-protein interaction affinity (PPIA), which identifies the differential interactions at a network level. Specifically, we first define PPIAs by estimating the concentrations of protein complexes based on the law of mass action applied to gene expression data. Then we select a small and non-redundant group of protein-protein interactions and single proteins according to the PPIAs that maximizes the ability to discern cases from controls. This method is mathematically formulated as a linear programming problem, which can be solved efficiently and guarantees a globally optimal solution. Extensive results on experimental data in breast cancer demonstrate the effectiveness and efficiency of the proposed method for identifying network biomarkers, which not only accurately distinguishes the phenotypes but also provides significant biological insights at a network or pathway level. In addition, our method provides a new way to integrate static protein-protein interaction information with dynamic gene expression data.

Subject(s)

Biomarkers, Tumor/metabolism , Databases, Protein , Protein Interaction Maps , Statistics as Topic , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Software
