Search | BVS Integralidade em Saúde

1.

INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis.

Zhao, Kai; Huang, Sen; Lin, Cuichan; Sham, Pak Chung; So, Hon-Cheong; Lin, Zhixiang.

PLoS Genet ; 20(3): e1011189, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38484017

ABSTRACT

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few or no methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously, while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. Particularly, it introduces the elastic net penalty to induce sparsity while considering the grouping effects of genes. It can achieve DR of high-dimensional data (of ≥ 3 dimensions), as opposed to conventional methods (e.g., PCA/NMF) which generally only handle 2D data (e.g., sample × expression). Besides, it enables computing 'adjusted' expression profiles for specific biological variables while controlling variation from other variables. INSIDER is computationally efficient and accommodates missing data. INSIDER also performed similarly to or outperformed a close competing method, SDA, as shown in simulations and can handle complex missing data in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing neuro-development trajectory using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions.

Subjects

Algorithms , Transcriptome , Transcriptome/genetics , RNA Sequence Analysis , Data Analysis , RNA/genetics , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Cluster Analysis

2.

RefTM: reference-guided topic modeling of single-cell chromatin accessibility data.

Zhang, Zheng; Chen, Shengquan; Lin, Zhixiang.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36513377

ABSTRACT

Single-cell analysis is a valuable approach for dissecting cellular heterogeneity, and single-cell chromatin accessibility sequencing (scCAS) can profile the epigenetic landscapes for thousands of individual cells. It is challenging to analyze scCAS data, because of its high dimensionality and a higher degree of sparsity compared with scRNA-seq data. Topic modeling in single-cell data analysis can lead to robust identification of the cell types and it can provide insight into the regulatory mechanisms. A reference-guided approach may facilitate the analysis of scCAS data by utilizing the information in existing datasets. We present RefTM (Reference-guided Topic Modeling of single-cell chromatin accessibility data), which not only utilizes the information in existing bulk chromatin accessibility and annotated scCAS data, but also takes advantage of topic models for single-cell data analysis. RefTM simultaneously models: (1) the shared biological variation among reference data and the target scCAS data; (2) the unique biological variation in scCAS data; (3) other variations from known covariates in scCAS data.

Subjects

Chromatin , Chromatin/genetics

3.

SGCAST: symmetric graph convolutional auto-encoder for scalable and accurate study of spatial transcriptomics.

Li, Jinzhao; Wang, Jiong; Lin, Zhixiang.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38171928

ABSTRACT

Recent advances in spatial transcriptomics (ST) have enabled comprehensive profiling of gene expression with spatial information in the context of the tissue microenvironment. However, with the improvements in the resolution and scale of ST data, deciphering spatial domains precisely while ensuring efficiency and scalability is still challenging. Here, we develop SGCAST, an efficient auto-encoder framework to identify spatial domains. SGCAST adopts a symmetric graph convolutional auto-encoder to learn aggregated latent embeddings via integrating the gene expression similarity and the proximity of the spatial spots. This framework in SGCAST enables a mini-batch training strategy, which makes SGCAST memory-efficient and scalable to high-resolution spatial transcriptomic data with a large number of spots. SGCAST improves the overall accuracy of spatial domain identification on benchmarking data. We also validated the performance of SGCAST on ST datasets at various scales across multiple platforms. Our study illustrates the superior capacity of SGCAST on analyzing spatial transcriptomic data.

Subjects

Gene Expression Profiling , Transcriptome , Benchmarking , Learning

4.

Mendelian randomization for causal inference accounting for pleiotropy and sample structure using genome-wide summary statistics.

Hu, Xianghong; Zhao, Jia; Lin, Zhixiang; Wang, Yang; Peng, Heng; Zhao, Hongyu; Wan, Xiang; Yang, Can.

Proc Natl Acad Sci U S A ; 119(28): e2106858119, 2022 07 12.

Article in English | MEDLINE | ID: mdl-35787050

ABSTRACT

Mendelian randomization (MR) is a valuable tool for inferring causal relationships among a wide range of traits using summary statistics from genome-wide association studies (GWASs). Existing summary-level MR methods often rely on strong assumptions, resulting in many false-positive findings. To relax MR assumptions, ongoing research has been primarily focused on accounting for confounding due to pleiotropy. Here, we show that sample structure is another major confounding factor, including population stratification, cryptic relatedness, and sample overlap. We propose a unified MR approach, MR-APSS, which 1) accounts for pleiotropy and sample structure simultaneously by leveraging genome-wide information; and 2) allows the inclusion of more genetic variants with moderate effects as instrument variables (IVs) to improve statistical power without inflating type I errors. We first evaluated MR-APSS using comprehensive simulations and negative controls and then applied MR-APSS to study the causal relationships among a collection of diverse complex traits. The results suggest that MR-APSS can better identify plausible causal relationships with high reliability. In particular, MR-APSS can perform well for highly polygenic traits, where the IV strengths tend to be relatively weak and existing summary-level MR methods for causal inference are vulnerable to confounding effects.

Subjects

Genetic Pleiotropy , Genome-Wide Association Study , Mendelian Randomization Analysis , Causality , Mendelian Randomization Analysis/methods , Phenotype , Reproducibility of Results

5.

FIRM: Flexible integration of single-cell RNA-sequencing data for large-scale multi-tissue cell atlas datasets.

Ming, Jingsi; Lin, Zhixiang; Zhao, Jia; Wan, Xiang; Yang, Can; Wu, Angela Ruohao.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35561293

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) is being used extensively to measure the mRNA expression of individual cells from deconstructed tissues, organs and even entire organisms to generate cell atlas references, leading to discoveries of novel cell types and deeper insight into biological trajectories. These massive datasets are usually collected from many samples using different scRNA-seq technology platforms, including the popular SMART-Seq2 (SS2) and 10X platforms. Inherent heterogeneities between platforms, tissues and other batch effects make scRNA-seq data difficult to compare and integrate, especially in large-scale cell atlas efforts; yet, accurate integration is essential for gaining deeper insights into cell biology. We present FIRM, a re-scaling algorithm which accounts for the effects of cell type compositions, and achieves accurate integration of scRNA-seq datasets across multiple tissue types, platforms and experimental batches. Compared with existing state-of-the-art integration methods, FIRM provides accurate mixing of shared cell type identities and superior preservation of original structure without overcorrection, generating robust integrated datasets for downstream exploration and analysis. FIRM is also a facile way to transfer cell type labels and annotations from one dataset to another, making it a reliable and versatile tool for scRNA-seq analysis, especially for cell atlas data integration.

Subjects

Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , RNA , Messenger RNA , RNA Sequence Analysis/methods , Single-Cell Analysis/methods

6.

JSNMF enables effective and accurate integrative analysis of single-cell multiomics data.

Ma, Yuanyuan; Sun, Zexuan; Zeng, Pengcheng; Zhang, Wenyu; Lin, Zhixiang.

Brief Bioinform ; 23(3)2022 05 13.

Article in English | MEDLINE | ID: mdl-35380624

ABSTRACT

The single-cell multiomics technologies provide an unprecedented opportunity to study the cellular heterogeneity from different layers of transcriptional regulation. However, the datasets generated from these technologies tend to have high levels of noise, making data analysis challenging. Here, we propose jointly semi-orthogonal nonnegative matrix factorization (JSNMF), which is a versatile toolkit for the integrative analysis of transcriptomic and epigenomic data profiled from the same cell. JSNMF enables data visualization and clustering of the cells and also facilitates downstream analysis, including the characterization of markers and functional pathway enrichment analysis. The core of JSNMF is an unsupervised method based on JSNMF, where it assumes different latent variables for the two molecular modalities, and integrates the information of transcriptomic and epigenomic data with consensus graph fusion, which better tackles the distinct characteristics and levels of noise across different molecular modalities in single-cell multiomics data. We applied JSNMF to single-cell multiomics datasets from different tissues and different technologies. The results demonstrate the superior performance of JSNMF in clustering and data visualization of the cells. JSNMF also allows joint analysis of multiple single-cell multiomics experiments and single-cell multiomics data with more than two modalities profiled on the same cell. JSNMF also provides rich biological insight on the markers, cell-type-specific region-gene associations and the functions of the identified cell subpopulation.

Subjects

Genomics , Single-Cell Analysis , Algorithms , Cluster Analysis , Genomics/methods , Single-Cell Analysis/methods , Transcriptome

7.

scAWMV: an adaptively weighted multi-view learning framework for the integrative analysis of parallel scRNA-seq and scATAC-seq data.

Zeng, Pengcheng; Ma, Yuanyuan; Lin, Zhixiang.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36383176

ABSTRACT

MOTIVATION: Technological advances have enabled us to profile single-cell multi-omics data from the same cells, providing us with an unprecedented opportunity to understand the cellular phenotype and links to its genotype. The available protocols and multi-omics datasets [including parallel single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data profiled from the same cell] are growing in number. However, such data are highly sparse and tend to have high levels of noise, making data analysis challenging. Methods that integrate the multi-omics data can potentially improve the capacity to reveal cellular heterogeneity. RESULTS: We propose an adaptively weighted multi-view learning (scAWMV) method for the integrative analysis of parallel scRNA-seq and scATAC-seq data profiled from the same cell. scAWMV considers both the difference in importance across different modalities in multi-omics data and the biological connection of the features in the scRNA-seq and scATAC-seq data. It generates biologically meaningful low-dimensional representations for the transcriptomic and epigenomic profiles via unsupervised learning. Application to four real datasets demonstrates that our framework scAWMV is an efficient method to dissect cellular heterogeneity for single-cell multi-omics data. AVAILABILITY AND IMPLEMENTATION: The software and datasets are available at https://github.com/pengchengzeng/scAWMV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Single-Cell Analysis , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software , Transcriptome , RNA Sequence Analysis

8.

stVAE deconvolves cell-type composition in large-scale cellular resolution spatial transcriptomics.

Li, Chen; Chan, Ting-Fung; Yang, Can; Lin, Zhixiang.

Bioinformatics ; 39(10)2023 10 03.

Article in English | MEDLINE | ID: mdl-37862237

ABSTRACT

MOTIVATION: Recent rapid developments in spatial transcriptomic techniques at cellular resolution have gained increasing attention. However, the unique characteristics of large-scale cellular resolution spatial transcriptomic datasets, such as the limited number of transcripts captured per spot and the vast number of spots, pose significant challenges to current cell-type deconvolution methods. RESULTS: In this study, we introduce stVAE, a method based on the variational autoencoder framework to deconvolve the cell-type composition of cellular resolution spatial transcriptomic datasets. To assess the performance of stVAE, we apply it to five datasets across three different biological tissues. In the Stereo-seq and Slide-seqV2 datasets of the mouse brain, stVAE accurately reconstructs the laminar structure of the pyramidal cell layers in the cortex, which are mainly organized by the subtypes of telencephalon projecting excitatory neurons. In the Stereo-seq dataset of the E12.5 mouse embryo, stVAE resolves the complex spatial patterns of osteoblast subtypes, which are supported by their marker genes. In Stereo-seq and Pixel-seq datasets of the mouse olfactory bulb, stVAE accurately delineates the spatial distributions of known cell types. In summary, stVAE can accurately identify spatial patterns of cell types and their relative proportions across spots for cellular resolution spatial transcriptomic data. It is instrumental in understanding the heterogeneity of cell populations and their interactions within tissues. AVAILABILITY AND IMPLEMENTATION: stVAE is available in GitHub (https://github.com/lichen2018/stVAE) and Figshare (https://figshare.com/articles/software/stVAE/23254538).

Subjects

Algorithms , Transcriptome , Animals , Mice , Software , RNA Sequence Analysis/methods , Single-Cell Analysis , Gene Expression Profiling/methods

9.

Coupled co-clustering-based unsupervised transfer learning for the integrative analysis of single-cell genomic data.

Zeng, Pengcheng; Wangwu, Jiaxuan; Lin, Zhixiang.

Brief Bioinform ; 22(4)2021 07 20.

Article in English | MEDLINE | ID: mdl-33279962

ABSTRACT

Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. The integrative analysis of multimodal single-cell genomic data sets leverages the power in multiple data sets and can deepen the biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information theoretic co-clustering framework. In co-clustering, both the cells and the genomic features are simultaneously clustered. Clustering similar genomic features reduces the noise in single-cell data and facilitates transfer of knowledge across single-cell datasets. We applied coupleCoC for the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. Our method coupleCoC is also computationally efficient and can scale up to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.

Subjects

Nucleic Acid Databases , RNA-Seq , Single-Cell Analysis , Software , Unsupervised Machine Learning , Animals , Humans , Mice

10.

Therapeutic Potential of Adipose-Derived Stem Cell-Conditioned Medium and Extracellular Vesicles in an In Vitro Radiation-Induced Skin Injury Model.

Lin, Zhixiang; Shibuya, Yoichiro; Imai, Yukiko; Oshima, Junya; Sasaki, Masahiro; Sasaki, Kaoru; Aihara, Yukiko; Khanh, Vuong Cat; Sekido, Mitsuru.

Int J Mol Sci ; 24(24)2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38139042

ABSTRACT

Radiotherapy (RT) is one of three major treatments for malignant tumors, and one of its most common side effects is skin and soft tissue injury. However, the treatment of these remains challenging. Several studies have shown that mesenchymal stem cell (MSC) treatment enhances skin wound healing. In this study, we extracted human dermal fibroblasts (HDFs) and adipose-derived stem cells (ADSCs) from patients and generated an in vitro radiation-induced skin injury model with HDFs to verify the effect of conditioned medium derived from adipose-derived stem cells (ADSC-CM) and extracellular vesicles derived from adipose-derived stem cells (ADSC-EVs) on the healing of radiation-induced skin injury. The results showed that collagen synthesis was significantly increased in wounds treated with ADSC-CM or ADSC-EVs compared with the control group, which promoted the expression of collagen-related genes and suppressed the expression of inflammation-related genes. These findings indicated that treatment with ADSC-CM or ADSC-EVs suppressed inflammation and promoted extracellular matrix deposition; treatment with ADSC-EVs also promoted fibroblast proliferation. In conclusion, these results demonstrate the effectiveness of ADSC-CM and ADSC-EVs in the healing of radiation-induced skin injury.

Subjects

Extracellular Vesicles , Radiation Injuries , Humans , Conditioned Culture Media/pharmacology , Conditioned Culture Media/metabolism , Adipose Tissue/metabolism , Stem Cells/metabolism , Radiation Injuries/metabolism , Inflammation/metabolism , Collagen/metabolism

11.

scAMACE: model-based approach to the joint analysis of single-cell data on chromatin accessibility, gene expression and methylation.

Wangwu, Jiaxuan; Sun, Zexuan; Lin, Zhixiang.

Bioinformatics ; 37(21): 3874-3880, 2021 11 05.

Article in English | MEDLINE | ID: mdl-34086847

ABSTRACT

MOTIVATION: The advancement in technologies and the growth of available single-cell datasets motivate integrative analysis of multiple single-cell genomic datasets. Integrative analysis of multimodal single-cell datasets combines complementary information offered by single-omic datasets and can offer deeper insights into complex biological processes. Clustering methods that identify the unknown cell types are among the first few steps in the analysis of single-cell datasets, and they are important for downstream analysis built upon the identified cell types. RESULTS: We propose scAMACE for the integrative analysis and clustering of single-cell data on chromatin accessibility, gene expression and methylation. We demonstrate that cell types are better identified and characterized through analyzing the three data types jointly. We develop an efficient Expectation-Maximization algorithm to perform statistical inference, and evaluate our methods in both a simulation study and real data applications. We also provide the GPU implementation of scAMACE, making it scalable to large datasets. AVAILABILITY AND IMPLEMENTATION: The software and datasets are available at https://github.com/cuhklinlab/scAMACE_py (python implementation) and https://github.com/cuhklinlab/scAMACE (R implementation). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Chromatin , Single-Cell Analysis , Methylation , Single-Cell Analysis/methods , Software , Gene Expression

12.

coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data.

Zeng, Pengcheng; Lin, Zhixiang.

PLoS Comput Biol ; 17(6): e1009064, 2021 06.

Article in English | MEDLINE | ID: mdl-34077420

ABSTRACT

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying the unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to a better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cells scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ has fast convergence and it is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.

Subjects

Genomics/statistics & numerical data , Machine Learning , Software , Animals , Cerebral Cortex/metabolism , Cluster Analysis , Computational Biology , Nucleic Acid Databases/statistics & numerical data , Dendritic Cells/metabolism , Humans , Information Theory , Mice , Small Cytoplasmic RNA/genetics , RNA-Seq , Single-Cell Analysis/statistics & numerical data

13.

Divergence in a master variator generates distinct phenotypes and transcriptional responses.

Gallagher, Jennifer E G; Zheng, Wei; Rong, Xiaoqing; Miranda, Noraliz; Lin, Zhixiang; Dunn, Barbara; Zhao, Hongyu; Snyder, Michael P.

Genes Dev ; 28(4): 409-21, 2014 Feb 15.

Article in English | MEDLINE | ID: mdl-24532717

ABSTRACT

The genetic basis of phenotypic differences in individuals is an important area in biology and personalized medicine. Analysis of divergent Saccharomyces cerevisiae strains grown under different conditions revealed extensive variation in response to both drugs (e.g., 4-nitroquinoline 1-oxide [4NQO]) and different carbon sources. Differences in 4NQO resistance were due to amino acid variation in the transcription factor Yrr1. Yrr1(YJM789) conferred 4NQO resistance but caused slower growth on glycerol, and vice versa with Yrr1(S96), indicating that alleles of Yrr1 confer distinct phenotypes. The binding targets of Yrr1 alleles from diverse yeast strains varied considerably among different strains grown under the same conditions as well as for the same strain under different conditions, indicating that distinct molecular programs are conferred by the different Yrr1 alleles. Our results demonstrate that genetic variations in one important control gene (YRR1) lead to distinct regulatory programs and phenotypes in individuals. We term these polymorphic control genes "master variators."

Subjects

Fungal Gene Expression Regulation/genetics , Genetic Variation , Phenotype , Saccharomyces cerevisiae/physiology , 4-Nitroquinoline-1-oxide/pharmacology , Alleles , Fungal Drug Resistance/genetics , Glycerol/metabolism , Mutagens/pharmacology , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Thioredoxin-Disulfide Reductase/genetics , Thioredoxin-Disulfide Reductase/metabolism

14.

Predicting transcription factor binding motifs from DNA-binding domains, chromatin accessibility and gene expression data.

Zamanighomi, Mahdi; Lin, Zhixiang; Wang, Yong; Jiang, Rui; Wong, Wing Hung.

Nucleic Acids Res ; 45(10): 5666-5677, 2017 Jun 02.

Article in English | MEDLINE | ID: mdl-28472398

ABSTRACT

Transcription factors (TFs) play crucial roles in regulating gene expression through interactions with specific DNA sequences. Recently, the sequence motifs of almost 400 human TFs have been identified using high-throughput SELEX sequencing. However, there remain a large number of TFs (∼800) with no high-throughput-derived binding motifs. Computational methods capable of associating known motifs to such TFs will avoid tremendous experimental efforts and enable deeper understanding of transcriptional regulatory functions. We present a method to associate known motifs to TFs (MATLAB code is available in Supplementary Materials). Our method is based on a probabilistic framework that not only exploits DNA-binding domains and specificities, but also integrates open chromatin, gene expression and genomic data to accurately infer monomeric and homodimeric binding motifs. Our analysis resulted in the assignment of motifs to 200 TFs with no SELEX-derived motifs, roughly a 50% increase compared to the existing coverage.

Subjects

Algorithms , Chromatin/chemistry , DNA/chemistry , Gene Expression Regulation , Statistical Models , Transcription Factors/genetics , Binding Sites , Chromatin/metabolism , DNA/genetics , DNA/metabolism , Human Genome , Humans , Nucleotide Motifs , Protein Binding , SELEX Aptamer Technique , Transcription Factors/metabolism

15.

Simultaneous dimension reduction and adjustment for confounding variation.

Lin, Zhixiang; Yang, Can; Zhu, Ying; Duchi, John; Fu, Yao; Wang, Yong; Jiang, Bai; Zamanighomi, Mahdi; Xu, Xuming; Li, Mingfeng; Sestan, Nenad; Zhao, Hongyu; Wong, Wing Hung.

Proc Natl Acad Sci U S A ; 113(51): 14662-14667, 2016 12 20.

Article in English | MEDLINE | ID: mdl-27930330

ABSTRACT

Dimension reduction methods are commonly applied to high-throughput biological datasets. However, the results can be hindered by confounding factors, either biological or technical in origin. In this study, we extend principal component analysis (PCA) to propose AC-PCA for simultaneous dimension reduction and adjustment for confounding (AC) variation. We show that AC-PCA can adjust for (i) variations across individual donors present in a human brain exon array dataset and (ii) variations of different species in a model organism ENCODE RNA sequencing dataset. Our approach is able to recover the anatomical structure of neocortical regions and to capture the shared variation among species during embryonic development. For gene selection purposes, we extend AC-PCA with sparsity constraints and propose and implement an efficient algorithm. The methods developed in this paper can also be applied to more general settings. The R package and MATLAB source code are available at https://github.com/linzx06/AC-PCA.

Subjects

Brain/metabolism , High-Throughput Nucleotide Sequencing , Principal Component Analysis , RNA Sequence Analysis , Algorithms , Brain Mapping , Computer Simulation , Statistical Data Interpretation , Exons , Humans , Statistical Models , Software , Transcriptome

16.

On joint estimation of Gaussian graphical models for spatial and temporal data.

Lin, Zhixiang; Wang, Tao; Yang, Can; Zhao, Hongyu.

Biometrics ; 73(3): 769-779, 2017 09.

Article in English | MEDLINE | ID: mdl-28099997

ABSTRACT

In this article, we first propose a Bayesian neighborhood selection method to estimate Gaussian Graphical Models (GGMs). We show the graph selection consistency of this method in the sense that the posterior probability of the true model converges to one. When there are multiple groups of data available, instead of estimating the networks independently for each group, joint estimation of the networks may utilize the shared information among groups and lead to improved estimation for each individual network. Our method is extended to jointly estimate GGMs in multiple groups of data with complex structures, including spatial data, temporal data, and data with both spatial and temporal structures. Markov random field (MRF) models are used to efficiently incorporate the complex data structures. We develop and implement an efficient algorithm for statistical inference that enables parallel computing. Simulation studies suggest that our approach achieves better accuracy in network estimation compared with methods not incorporating spatial and temporal dependencies when there are shared structures among the networks, and that it performs comparably well otherwise. Finally, we illustrate our method using the human brain gene expression microarray dataset, where the expression levels of genes are measured in different brain regions across multiple time periods.

Subjects

Statistical Models , Algorithms , Bayes Theorem , Computer Simulation , Normal Distribution

17.

A Markov random field-based approach for joint estimation of differentially expressed genes in mouse transcriptome data.

Lin, Zhixiang; Li, Mingfeng; Sestan, Nenad; Zhao, Hongyu.

Stat Appl Genet Mol Biol ; 15(2): 139-50, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26926866

ABSTRACT

The statistical methodology developed in this study was motivated by our interest in studying neurodevelopment using the mouse brain RNA-Seq data set, where gene expression levels were measured in multiple layers in the somatosensory cortex across time in both female and male samples. We aim to identify differentially expressed genes between adjacent time points, which may provide insights on the dynamics of brain development. Because of the extremely small sample size (one male and one female at each time point), simple marginal analysis may be underpowered. We propose a Markov random field (MRF)-based approach that capitalizes on the between-layer similarity, temporal dependency and the similarity between sexes. The model parameters are estimated by an efficient EM algorithm with mean field-like approximation. Simulation results and real data analysis suggest that the proposed model improves the power to detect differentially expressed genes compared with simple marginal analysis. Our method also reveals biologically interesting results in the mouse brain RNA-Seq data set.

Subjects

High-Throughput Nucleotide Sequencing/statistics & numerical data , Statistical Models , RNA Sequence Analysis/statistics & numerical data , Transcriptome/genetics , Animals , Computer Simulation , Female , Gene Expression Profiling/statistics & numerical data , Male , Markov Chains , Mice , Regression Analysis , RNA Sequence Analysis/methods

18.

scICML: Information-Theoretic Co-Clustering-Based Multi-View Learning for the Integrative Analysis of Single-Cell Multi-Omics Data.

Zeng, Pengcheng; Lin, Zhixiang.

IEEE/ACM Trans Comput Biol Bioinform ; 21(1): 200-207, 2024.

Article in English | MEDLINE | ID: mdl-37590102

ABSTRACT

Modern high-throughput sequencing technologies have enabled us to profile multiple molecular modalities from the same single cell, providing unprecedented opportunities to assay cellular heterogeneity across multiple biological layers. However, the datasets generated by these technologies tend to have a high level of noise and are highly sparse, which poses challenges for data analysis. In this paper, we develop a novel information-theoretic co-clustering-based multi-view learning (scICML) method for multi-omics single-cell data integration. scICML uses co-clustering to aggregate similar features within each view of the data and to uncover the common clustering pattern of the cells. In addition, scICML automatically matches the clusters of linked features across data types, thereby accounting for the biological dependency structure across different types of genomic features. Our experiments on four real-world datasets demonstrate that scICML improves the overall clustering performance and provides biological insights into the data analysis of peripheral blood mononuclear cells.

Subjects

Leukocytes, Mononuclear , Multiomics , Genomics/methods , Cluster Analysis , High-Throughput Nucleotide Sequencing/methods

19.

ABCD1 as a Novel Diagnostic Marker for Solid Pseudopapillary Neoplasm of the Pancreas.

Liu, Ying-Ao; Liu, Yuanhao; Tu, Jiajuan; Shi, Yihong; Pang, Junyi; Huang, Qi; Wang, Xun; Lin, Zhixiang; Zhao, Yupei; Wang, Wenze; Peng, Junya; Wu, Wenming.

Am J Surg Pathol ; 48(5): 511-520, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38567813

ABSTRACT

The diagnosis of solid pseudopapillary neoplasm of the pancreas (SPN) can be challenging due to potential confusion with other pancreatic neoplasms, particularly pancreatic neuroendocrine tumors (NETs), using current pathological diagnostic markers. We conducted a comprehensive analysis of bulk RNA sequencing data from SPNs, NETs, and normal pancreas, followed by experimental validation. This analysis revealed an increased accumulation of peroxisomes in SPNs. Moreover, we observed significant upregulation of the peroxisome marker ABCD1 in both primary and metastatic SPN samples compared with normal pancreas and NETs. To further investigate the potential utility of ABCD1 as a diagnostic marker for SPN via immunohistochemistry staining, we conducted verification in a large-scale patient cohort with pancreatic tumors, including 127 SPN (111 primary, 16 metastatic samples), 108 NET (98 nonfunctional pancreatic neuroendocrine tumor, NF-NET, and 10 functional pancreatic neuroendocrine tumor, F-NET), 9 acinar cell carcinoma (ACC), 3 pancreatoblastoma (PB), 54 pancreatic ductal adenocarcinoma (PDAC), 20 pancreatic serous cystadenoma (SCA), 19 pancreatic mucinous cystadenoma (MCA), 12 pancreatic ductal intraepithelial neoplasia (PanIN) and 5 intraductal papillary mucinous neoplasm (IPMN) samples. Our results indicate that ABCD1 holds promise as an easily applicable diagnostic marker with exceptional efficacy (AUC=0.999, sensitivity=99.10%, specificity=100%) for differentiating SPN from NET and other pancreatic neoplasms through immunohistochemical staining.

Subjects

Carcinoma, Pancreatic Ductal , Neuroendocrine Tumors , Pancreatic Neoplasms , Humans , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology , Pancreas/pathology , Carcinoma, Pancreatic Ductal/pathology , Neuroendocrine Tumors/diagnosis , Neuroendocrine Tumors/genetics , Neuroendocrine Tumors/pathology , Pancreatic Ducts/chemistry , Biomarkers, Tumor/genetics , Biomarkers, Tumor/analysis , ATP Binding Cassette Transporter, Subfamily D, Member 1

20.

Arachidonic acid in aging: New roles for old players.

Qian, Chen; Wang, Qing; Qiao, Yusen; Xu, Ze; Zhang, Linlin; Xiao, Haixiang; Lin, Zhixiang; Wu, Mingzhou; Xia, Wenyu; Yang, Huilin; Bai, Jiaxiang; Geng, Dechun.

J Adv Res ; 2024 May 04.

Article in English | MEDLINE | ID: mdl-38710468

ABSTRACT

BACKGROUND: Arachidonic acid (AA), one of the most ubiquitous polyunsaturated fatty acids (PUFAs), provides fluidity to mammalian cell membranes. It is derived from linoleic acid (LA) and can be transformed into various bioactive metabolites, including prostaglandins (PGs), thromboxanes (TXs), lipoxins (LXs), hydroxy-eicosatetraenoic acids (HETEs), leukotrienes (LTs), and epoxyeicosatrienoic acids (EETs), by different pathways. All these processes are involved in AA metabolism. Currently, in the context of an increasingly visible aging world population, several scholars have revealed the essential role of AA metabolism in osteoporosis, chronic obstructive pulmonary disease, and many other aging diseases. AIM OF REVIEW: Although some reviews describe the role of AA in specific diseases, there seems to be little or no information on the role of AA metabolism in aging tissues or organs. This review scrutinizes and highlights the role of AA metabolism in aging and provides a new idea for strategies for treating aging-related diseases. KEY SCIENTIFIC CONCEPTS OF REVIEW: As a component of lipid metabolism, AA metabolism regulates important lipids that affect aging in several ways. We present a comprehensive review of the role of AA metabolism in aging, with the aim of relieving the extreme suffering of families and the heavy economic burden on society caused by age-related diseases. We also collected and summarized data on anti-aging therapies associated with AA metabolism, with the expectation of identifying a novel and efficient way to protect against aging.
