Search | BVS (Virtual Health Library) Search Portal

1.

Learning discriminative and structural samples for rare cell types with deep generative model.

Wang, Haiyue; Ma, Xiaoke.

Brief Bioinform ; 23(5), 2022 Sep 20.

Article in English | MEDLINE | ID: mdl-35914950

ABSTRACT

Cell types (subpopulations) serve as biomarkers for the diagnosis and therapy of complex diseases, and single-cell RNA-sequencing (scRNA-seq) measures gene expression at the cell level, paving the way for the identification of cell types. Although great efforts have been devoted to this issue, it remains challenging to identify rare cell types in scRNA-seq data because of the few-shot problem, the lack of interpretability, and the separation of sample generation from cell clustering. To address these issues, a novel deep generative model for leveraging the small samples of cells (aka scLDS2) is proposed, which precisely estimates the distribution of different cells and discriminates rare from non-rare cell types with adversarial learning. Specifically, to enhance the interpretability of samples, scLDS2 generates sparse fake samples of cells with the $\ell_1$-norm, where the relations among cells are learned, facilitating the identification of cell types. Furthermore, scLDS2 directly obtains cell types from the generated samples by learning the block structure with the nuclear norm, such that cells belonging to the same type are similar to each other. scLDS2 joins the generation of samples, the classification of generated and true samples of cells, and feature extraction into a unified generative framework, which transforms rare cell type detection into a classification problem that is solved with joint learning. Experimental results on 20 datasets demonstrate that scLDS2 significantly outperforms 17 state-of-the-art methods on various measurements, with a 25.12% average improvement in adjusted Rand index, providing an effective strategy for scRNA-seq data with rare cell types. (The software is coded in Python and is freely available for academic use at https://github.com/xkmaxidian/scLDS2).

Subjects

Single-Cell Analysis; Software; Cluster Analysis; Gene Expression Profiling/methods; RNA/genetics; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Exome Sequencing
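
Record 1 reports its gain as an improvement in adjusted Rand index (ARI), the usual score for comparing a predicted clustering of cells against reference cell-type labels. The sketch below shows how ARI is computed from pair counts. It is a generic illustration, not code from the scLDS2 repository, and the function name and the handling of the degenerate single-cluster case are assumptions made here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    # Contingency counts: cells falling in each (true cluster, predicted cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, in the reference, and in the prediction.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    expected = sum_true * sum_pred / total if total else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (assumption): e.g. both partitions use a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

Because ARI only looks at which cells are grouped together, two partitions that differ only in how clusters are named score 1.0, and the score can go below zero when agreement is worse than chance.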

2.

Learning deep features and topological structure of cells for clustering of scRNA-sequencing data.

Wang, Haiyue; Ma, Xiaoke.

Brief Bioinform ; 23(3), 2022 May 13.

Article in English | MEDLINE | ID: mdl-35302164

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) measures the transcriptome at the cell level, paving the way for the identification of cell subpopulations. Although deep learning has been successfully applied to scRNA-seq data, these algorithms are criticized for their undesirable performance and poor interpretability of patterns because of the noise, high dimensionality and extraordinary sparsity of scRNA-seq data. To address these issues, a novel deep learning subspace clustering algorithm (aka scGDC) for cell types in scRNA-seq data is proposed, which simultaneously learns the deep features and topological structure of cells. Specifically, scGDC extends the auto-encoder by introducing a self-representation layer to extract deep features of cells, and learns an affinity graph of cells, which provides a better and more comprehensive strategy to characterize the structure of cell types. To address the heterogeneity of scRNA-seq data, scGDC projects cells of various types onto different subspaces, where types, particularly rare cell types, are well discriminated by utilizing generative adversarial learning. Furthermore, scGDC joins deep feature extraction, structural learning and cell type discovery, where features of cells are extracted under the guidance of cell types, thereby improving performance. A total of 15 scRNA-seq datasets from various tissues and organisms, with the number of cells ranging from 56 to 63 103, are adopted to validate performance, and experimental results demonstrate that scGDC significantly outperforms 14 state-of-the-art methods on various measurements (a 25.51% improvement on average), where (rare) cell types are significantly associated with the topology of the affinity graph of cells. The proposed model and algorithm provide an effective strategy for the analysis of scRNA-seq data. (The software is coded in Python and is freely available for academic use at https://github.com/xkmaxidian/scGDC).

Subjects

RNA, Small Cytoplasmic; Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods

3.

Network-based integrative analysis of single-cell transcriptomic and epigenomic data for cell types.

Wu, Wenming; Zhang, Wensheng; Ma, Xiaoke.

Brief Bioinform ; 23(2), 2022 Mar 10.

Article in English | MEDLINE | ID: mdl-35043143

ABSTRACT

Advances in single-cell biotechnologies simultaneously generate transcriptomic and epigenomic profiles at the cell level, providing an opportunity for investigating cell fates. Although great efforts have been devoted to each of them separately, the integrative analysis of single-cell multi-omics data remains limited because of the heterogeneity, noise and sparsity of single-cell profiles. In this study, a network-based integrative clustering algorithm (aka NIC) is presented for the identification of cell types by fusing parallel single-cell transcriptomic (scRNA-seq) and epigenomic (scATAC-seq or DNA methylation) profiles. To avoid the heterogeneity of multi-omics data, NIC automatically learns cell-cell similarity graphs, which transforms the fusion of multi-omics data into the analysis of multiple networks. Then, NIC employs joint non-negative matrix factorization to learn the shared features of cells by exploiting the structure of the learned cell-cell similarity networks, providing a better way to characterize the features of cells. The graph learning and integrative analysis procedures are jointly formulated as an optimization problem, and then the update rules are derived. Thirteen single-cell multi-omics datasets from various tissues and organisms are adopted to validate the performance of NIC, and the experimental results demonstrate that the proposed algorithm significantly outperforms the state-of-the-art methods on various measurements. The proposed algorithm provides an effective strategy for the integrative analysis of single-cell multi-omics data. (The software is coded in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/NIC).

Subjects

Single-Cell Analysis; Transcriptome; Algorithms; Cluster Analysis; Epigenomics; Single-Cell Analysis/methods; Software

4.

T_FH cells depend on Tcf1-intrinsic HDAC activity to suppress CTLA4 and guard B-cell help function.

Li, Fengyin; Zhao, Xin; Zhang, Yali; Shao, Peng; Ma, Xiaoke; Paradee, William J; Liu, Chengyu; Wang, Jianmin; Xue, Hai-Hui.

Proc Natl Acad Sci U S A ; 118(2), 2021 Jan 12.

Article in English | MEDLINE | ID: mdl-33372138

ABSTRACT

Precise regulation of coinhibitory receptors is essential for maintaining immune tolerance without interfering with protective immunity, yet the mechanism underlying such a balancing act remains poorly understood. In response to protein immunization, T follicular helper (TFH) cells lacking the Tcf1 and Lef1 transcription factors were phenotypically normal but failed to promote germinal center formation and antibody production. Transcriptomic profiling revealed that Tcf1/Lef1-deficient TFH cells aberrantly up-regulated CTLA4 and LAG3 expression, and treatment with anti-CTLA4 alone or combined with anti-LAG3 substantially rectified B-cell help defects by Tcf1/Lef1-deficient TFH cells. Mechanistically, Tcf1 and Lef1 restrain chromatin accessibility at the Ctla4 and Lag3 loci. Groucho/Tle corepressors, which are known to cooperate with Tcf/Lef factors, were essential for TFH cell expansion but dispensable for repressing coinhibitory receptors. In contrast, mutating key amino acids in the histone deacetylase (HDAC) domain of Tcf1 resulted in CTLA4 derepression in TFH cells. These findings demonstrate that Tcf1-intrinsic HDAC activity is necessary for preventing excessive CTLA4 induction in protein immunization-elicited TFH cells and hence guarding their B-cell help function.

Subjects

Hepatocyte Nuclear Factor 1-alpha/metabolism; Lymphoid Enhancer-Binding Factor 1/metabolism; T Follicular Helper Cells/immunology; Animals; Antigens, CD; B-Lymphocytes/metabolism; CD8-Positive T-Lymphocytes/immunology; CTLA-4 Antigen/metabolism; Cell Differentiation/immunology; Female; Germinal Center/immunology; Hepatocyte Nuclear Factor 1-alpha/immunology; Immune Tolerance; Lymphoid Enhancer-Binding Factor 1/immunology; Male; Mice; Mice, Inbred C57BL; Proto-Oncogene Proteins c-bcl-6; T Follicular Helper Cells/metabolism; T-Lymphocytes, Helper-Inducer/immunology; Lymphocyte Activation Gene 3 Protein

5.

jSRC: a flexible and accurate joint learning algorithm for clustering of single-cell RNA-sequencing data.

Wu, Wenming; Liu, Zaiyi; Ma, Xiaoke.

Brief Bioinform ; 22(5), 2021 Sep 02.

Article in English | MEDLINE | ID: mdl-33535230

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) explores the transcriptome at the cell level, which sheds light on the heterogeneity and dynamics of cell populations. Advances in biotechnologies make it possible to generate scRNA-seq profiles for large numbers of cells, requiring effective and efficient clustering algorithms to identify cell types and informative genes. Although great efforts have been devoted to clustering of scRNA-seq data, the accuracy, scalability and interpretability of available algorithms are not desirable. In this study, we solve these problems by developing a joint learning algorithm [a.k.a. joint sparse representation and clustering (jSRC)], where dimension reduction (DR) and clustering are integrated. Specifically, DR is employed for scalability, and joint learning improves accuracy. To increase the interpretability of patterns, we assume that cells within the same type have similar expression patterns, where the sparse representation is imposed on features. We transform clustering of scRNA-seq data into an optimization problem and then derive the update rules to optimize the objective of jSRC. Fifteen scRNA-seq datasets from various tissues and organisms are adopted to validate the performance of jSRC, where the number of single cells varies from 49 to 110 824. The experimental results demonstrate that jSRC significantly outperforms 12 state-of-the-art methods on various measurements (a 20.29% improvement on average) with less running time. Furthermore, jSRC is efficient and robust across different scRNA-seq datasets from various tissues. Finally, jSRC also accurately identifies dynamic cell types associated with the progression of COVID-19. The proposed model and methods provide an effective strategy to analyze scRNA-seq data (the software is coded in MATLAB and is free for academic purposes; https://github.com/xkmaxidian/jSRC).

Subjects

Algorithms; Machine Learning; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Cluster Analysis

6.

Predicting combinations of drugs by exploiting graph embedding of heterogeneous networks.

Song, Fei; Tan, Shiyin; Dou, Zengfa; Liu, Xiaogang; Ma, Xiaoke.

BMC Bioinformatics ; 23(Suppl 1): 34, 2022 Jan 11.

Article in English | MEDLINE | ID: mdl-35016602

ABSTRACT

BACKGROUND: Drug combination, offering an insight into increased therapeutic efficacy and reduced toxicity, plays an essential role in the therapy of many complex diseases. Although significant efforts have been devoted to the identification of drugs, the identification of drug combinations is still a challenge. Current algorithms assume the independence of the feature selection and drug prediction procedures, which may result in undesirable performance. RESULTS: To address this issue, we develop a novel Semi-supervised Heterogeneous Network Embedding algorithm (called SeHNE) to predict the combination patterns of drugs by exploiting graph embedding. Specifically, the ATC similarity of drugs, drug-target, and protein-protein interaction networks are integrated to construct the heterogeneous networks. Then, SeHNE jointly learns drug features by exploiting the topological structure of the heterogeneous networks and predicting drug combinations. One distinct advantage of SeHNE is that features of drugs are extracted under the guidance of classification, which improves the quality of features, thereby enhancing the performance of drug prediction. Experimental results demonstrate that the proposed algorithm is more accurate than state-of-the-art methods on various data, implying that joint learning is promising for the identification of drug combinations. CONCLUSIONS: The proposed model and algorithm provide an effective strategy for the prediction of combinatorial patterns of drugs, implying that graph-based drug prediction is promising for the discovery of drugs.

Subjects

Algorithms; Protein Interaction Maps; Drug Combinations; Learning

7.

TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain.

Wang, Yan; Xia, Zuheng; Deng, Jingjing; Xie, Xianghua; Gong, Maoguo; Ma, Xiaoke.

BMC Bioinformatics ; 22(Suppl 9): 274, 2021 Aug 25.

Article in English | MEDLINE | ID: mdl-34433414

ABSTRACT

BACKGROUND: Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to biomarkers or drug targets. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of disease-causing genes is limited, these methods are largely inapplicable because of their low accuracy. In practice, the number of known disease-causing genes for cancers, particularly for rare cancers, is very limited. Therefore, there is a critical need to design effective and efficient algorithms for gene ranking with limited prior disease-causing genes. RESULTS: In this study, we propose a transfer learning based algorithm for gene prioritization (called TLGP) in a cancer (target domain) without known disease-causing genes by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domains by calculating the affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes and genomic data of source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than the gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in terms of accuracy, with an improvement of at least 5%. CONCLUSION: The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.

Subjects

Computational Biology; Neoplasms; Algorithms; Gene Regulatory Networks; Humans; Machine Learning; Neoplasms/genetics

8.

Joint learning dimension reduction and clustering of single-cell RNA-sequencing data.

Wu, Wenming; Ma, Xiaoke.

Bioinformatics ; 36(12): 3825-3832, 2020 Jun 01.

Article in English | MEDLINE | ID: mdl-32246821

ABSTRACT

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) profiles the transcriptome of individual cells, which enables the discovery of cell types or subtypes by using unsupervised clustering. Current algorithms perform dimension reduction before cell clustering because of the noise, high dimensionality and linear inseparability of scRNA-seq data. However, treating dimension reduction and clustering independently fails to fully characterize patterns in the data, resulting in undesirable performance. RESULTS: In this study, we propose a flexible and accurate algorithm for scRNA-seq data by jointly learning dimension reduction and cell clustering (aka DRjCC), where dimension reduction is performed by projected matrix decomposition and cell type clustering by non-negative matrix factorization. We first formulate joint learning of dimension reduction and cell clustering as a constrained optimization problem and then derive the optimization rules. The advantage of DRjCC is that feature selection in dimension reduction is guided by cell clustering, significantly improving the performance of cell type discovery. Eleven scRNA-seq datasets are adopted to validate the performance of algorithms, where the number of single cells varies from 49 to 68 579 with the number of cell types ranging from 3 to 14. The experimental results demonstrate that DRjCC significantly outperforms 13 state-of-the-art methods on various measurements of cell type clustering (a 17.44% improvement on average). Furthermore, DRjCC is efficient and robust across different scRNA-seq datasets from various tissues. The proposed model and methods provide an effective strategy to analyze scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The software is coded in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/DRjCC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

RNA; Single-Cell Analysis; Cluster Analysis; Gene Expression Profiling; Sequence Analysis, RNA
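
Record 8 assigns cell types by non-negative matrix factorization and optimizes with derived update rules. As a rough illustration of that building block only, the sketch below runs plain NMF with the standard Lee-Seung multiplicative updates on a small genes-by-cells matrix. It is not the DRjCC objective, which also includes a projected dimension-reduction term not reproduced here. The function names, the random initialization, the `eps` safeguard and the argmax rule for reading labels off `H` are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, wh) for xv, rv in zip(xr, rr))


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative X (genes x cells) as W (genes x rank) times H (rank x cells)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    # Strictly positive random start (assumption; other initializations are common).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def cluster_labels(h):
    """Assign each column (cell) to the factor with the largest coefficient."""
    return [max(range(len(col)), key=col.__getitem__) for col in zip(*h)]
```

Calling `w, h = nmf(x, rank=2)` followed by `cluster_labels(h)` gives one label per cell. The multiplicative form keeps every entry of `W` and `H` non-negative without an explicit projection step, which is why this style of update rule is common for NMF-based clustering.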

9.

Regularized Multi-View Subspace Clustering for Common Modules Across Cancer Stages.

Zhang, Enli; Ma, Xiaoke.

Molecules ; 23(5), 2018 Apr 26.

Article in English | MEDLINE | ID: mdl-29701681

ABSTRACT

Discovering the common modules that are co-expressed across various stages can lead to an improved understanding of the underlying molecular mechanisms of cancers. There is a shortage of efficient tools for integrative analysis of gene expression and protein interaction networks for discovering common modules associated with cancer progression. To address this issue, we propose a novel regularized multi-view subspace clustering (rMV-spc) algorithm to obtain a representation matrix for each stage and a joint representation matrix that balances the agreement across various stages. To avoid the heterogeneity of data, the protein interaction network is incorporated into the objective of rMV-spc via regularization. Based on the interior point algorithm, we solve the optimization problem to obtain the common modules. By using artificial networks, we demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy. Furthermore, the rMV-spc discovers common modules in breast cancer networks based on the breast data, and these modules serve as biomarkers to predict stages of breast cancer. The proposed model and algorithm effectively integrate heterogeneous data for dynamic modules.

Subjects

Breast Neoplasms/pathology; Gene Regulatory Networks; Protein Interaction Maps; Algorithms; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Cluster Analysis; Female; Humans; Models, Theoretical; Neoplasm Staging

10.

Multiple network algorithm for epigenetic modules via the integration of genome-wide DNA methylation and gene expression data.

Ma, Xiaoke; Liu, Zaiyi; Zhang, Zhongyuan; Huang, Xiaotai; Tang, Wanxin.

BMC Bioinformatics ; 18(1): 72, 2017 Jan 31.

Article in English | MEDLINE | ID: mdl-28137264

ABSTRACT

BACKGROUND: With the increase in the amount of DNA methylation and gene expression data, the epigenetic mechanisms of cancers can be extensively investigated. Available methods integrate the DNA methylation and gene expression data into a network by specifying the anti-correlation between them. However, the correlation between methylation and expression is usually unknown and difficult to determine. RESULTS: To address this issue, we present a novel multiple network framework for epigenetic modules, namely, the Epigenetic Module based on Differential Networks (EMDN) algorithm, which simultaneously analyzes DNA methylation and gene expression data. The EMDN algorithm avoids the need to specify the correlation between methylation and expression. The EMDN algorithm is more accurate than modern approaches. On the basis of The Cancer Genome Atlas (TCGA) breast cancer data, we observe that the EMDN algorithm can recognize positively and negatively correlated modules and that these modules are significantly more enriched in the known pathways than those obtained by other algorithms. These modules can serve as biomarkers to predict breast cancer subtypes by using methylation profiles, where positively and negatively correlated modules are of equal importance in the classification of cancer subtypes. Epigenetic modules also estimate the survival time of patients, and this factor is critical for cancer therapy. CONCLUSIONS: The proposed model and algorithm provide an effective method for the integrative analysis of DNA methylation and gene expression. The algorithm is freely available as an R package at https://github.com/william0701/EMDN.

Subjects

Algorithms; Breast Neoplasms/genetics; DNA Methylation; Epigenesis, Genetic; Transcriptome; Breast Neoplasms/metabolism; Female; Gene Expression Profiling; Gene Regulatory Networks; Genomics; Humans

11.

Multi-Objective Optimization Algorithm to Discover Condition-Specific Modules in Multiple Networks.

Ma, Xiaoke; Sun, Penggang; Zhao, Jianbang.

Molecules ; 22(12), 2017 Dec 14.

Article in English | MEDLINE | ID: mdl-29240706

ABSTRACT

The advances in biological technologies make it possible to generate data for multiple conditions simultaneously. Discovering the condition-specific modules in multiple networks has great merit in understanding the underlying molecular mechanisms of cells. The available algorithms transform the multiple networks into a single objective optimization problem, which is criticized for its low accuracy. To address this issue, a multi-objective genetic algorithm for condition-specific modules in multiple networks (MOGA-CSM) is developed to discover the condition-specific modules. By using the artificial networks, we demonstrate that the MOGA-CSM outperforms state-of-the-art methods in terms of accuracy. Furthermore, MOGA-CSM discovers stage-specific modules in breast cancer networks based on The Cancer Genome Atlas (TCGA) data, and these modules serve as biomarkers to predict stages of breast cancer. The proposed model and algorithm provide an effective way to analyze multiple networks.

Subjects

Algorithms; Computational Biology/methods; Gene Regulatory Networks; Models, Biological; Neural Networks, Computer; Biomarkers/metabolism; Breast Neoplasms/diagnosis; Breast Neoplasms/genetics; Cell Line; Gene Expression Profiling; Gene Expression Regulation; Humans; Neoplasm Staging; Signal Transduction
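
Record 11 contrasts a single-objective formulation with a multi-objective genetic algorithm. The core idea that separates the two is Pareto dominance: instead of collapsing several scores into one number, a multi-objective method keeps every candidate that no other candidate beats on all objectives at once. The sketch below shows that selection step only. The abstract does not state MOGA-CSM's objectives or operators, so the score tuples (read here as "higher is better" on each objective) and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the indices of score tuples that no other tuple dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```

For candidate modules scored on two hypothetical objectives, `pareto_front([(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)])` keeps the first three and drops the last, which `(0.5, 0.5)` beats on both objectives.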

12.

Revealing Pathway Dynamics in Heart Diseases by Analyzing Multiple Differential Networks.

Ma, Xiaoke; Gao, Long; Karamanlidis, Georgios; Gao, Peng; Lee, Chi Fung; Garcia-Menendez, Lorena; Tian, Rong; Tan, Kai.

PLoS Comput Biol ; 11(6): e1004332, 2015 Jun.

Article in English | MEDLINE | ID: mdl-26083688

ABSTRACT

Development of heart diseases is driven by dynamic changes in both the activity and connectivity of gene pathways. Understanding these dynamic events is critical for understanding pathogenic mechanisms and development of effective treatment. Currently, there is a lack of computational methods that enable analysis of multiple gene networks, each of which exhibits differential activity compared to the network of the baseline/healthy condition. We describe the iMDM algorithm to identify both unique and shared gene modules across multiple differential co-expression networks, termed M-DMs (multiple differential modules). We applied iMDM to a time-course RNA-Seq dataset generated using a murine heart failure model generated on two genotypes. We showed that iMDM achieves higher accuracy in inferring gene modules compared to using single or multiple co-expression networks. We found that condition-specific M-DMs exhibit differential activities, mediate different biological processes, and are enriched for genes with known cardiovascular phenotypes. By analyzing M-DMs that are present in multiple conditions, we revealed dynamic changes in pathway activity and connectivity across heart failure conditions. We further showed that module dynamics were correlated with the dynamics of disease phenotypes during the development of heart failure. Thus, pathway dynamics is a powerful measure for understanding pathogenesis. iMDM provides a principled way to dissect the dynamics of gene pathways and its relationship to the dynamics of disease phenotype. With the exponential growth of omics data, our method can aid in generating systems-level insights into disease progression.

Subjects

Algorithms; Computational Biology/methods; Gene Regulatory Networks/genetics; Heart Failure/genetics; Animals; Gene Expression Profiling/methods; Heart Failure/metabolism; Mice; Mice, Transgenic; Systems Biology; Transcriptome/genetics
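
Record 12 works on differential co-expression networks, each comparing a disease condition against the baseline/healthy network. The abstract does not say how iMDM builds or weights these networks, so the sketch below uses one simple, assumed definition: compute Pearson correlations between gene pairs in each condition and keep the pairs whose correlation changes by more than a threshold. The function names, the `threshold` parameter and the treatment of constant genes are all illustrative choices.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant gene: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def coexpression(expr):
    """expr maps gene name -> list of expression values; returns {(g1, g2): r}."""
    genes = sorted(expr)
    return {
        (g1, g2): pearson(expr[g1], expr[g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
    }


def differential_network(baseline, condition, threshold=0.5):
    """Keep gene pairs whose correlation changes by more than `threshold`."""
    base, cond = coexpression(baseline), coexpression(condition)
    return {
        pair: cond[pair] - base[pair]
        for pair in base.keys() & cond.keys()
        if abs(cond[pair] - base[pair]) > threshold
    }
```

Running `differential_network` once per disease condition against the same baseline yields one edge set per condition, which is the kind of input a multi-network module search would then operate on.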

13.

Modeling disease progression using dynamics of pathway connectivity.

Ma, Xiaoke; Gao, Long; Tan, Kai.

Bioinformatics ; 30(16): 2343-50, 2014 Aug 15.

Article in English | MEDLINE | ID: mdl-24771518

ABSTRACT

MOTIVATION: Disease progression is driven by dynamic changes in both the activity and connectivity of molecular pathways. Understanding these dynamic events is critical for disease prognosis and effective treatment. Compared with activity dynamics, connectivity dynamics is poorly explored. RESULTS: We describe the M-module algorithm to identify gene modules with common members but varied connectivity across multiple gene co-expression networks (aka M-modules). We introduce a novel metric to capture the connectivity dynamics of an entire M-module. We find that M-modules with dynamic connectivity have distinct topological and biochemical properties compared with static M-modules and hub genes. We demonstrate that incorporation of module connectivity dynamics significantly improves disease stage prediction. We identify different sets of M-modules that are important for specific disease stage transitions and offer new insights into the molecular events underlying disease progression. Besides modeling disease progression, the algorithm and metric introduced here are broadly applicable to modeling dynamics of molecular pathways. AVAILABILITY AND IMPLEMENTATION: M-module is implemented in R. The source code is freely available at http://www.healthcare.uiowa.edu/labs/tan/M-module.zip.

Subjects

Algorithms; Disease Progression; Gene Regulatory Networks; Breast Neoplasms/classification; Breast Neoplasms/genetics; Female; Gene Expression Profiling; Humans; Models, Genetic; Support Vector Machine

14.

Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks.

Guo, Xingli; Gao, Lin; Liao, Qi; Xiao, Hui; Ma, Xiaoke; Yang, Xiaofei; Luo, Haitao; Zhao, Guoguang; Bu, Dechao; Jiao, Fei; Shao, Qixiang; Chen, RunSheng; Zhao, Yi.

Nucleic Acids Res ; 41(2): e35, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23132350

ABSTRACT

Increasing evidence demonstrates that long non-coding RNAs (lncRNAs) play key roles in diverse biological processes. There is a critical need to annotate the functions of the increasing number of available lncRNAs. In this article, we apply a global network-based strategy to tackle this issue for the first time. We develop a bi-colored network-based global function predictor, the long non-coding RNA global function predictor ('lnc-GFP'), to predict probable functions for lncRNAs at large scale by integrating gene expression data and protein interaction data. The performance of lnc-GFP is evaluated on protein-coding and lncRNA genes. Cross-validation tests on protein-coding genes with known function annotations indicate that our method can achieve a precision of up to 95% with a suitable parameter setting. Among the 1713 lncRNAs in the bi-colored network, the 1625 (94.9%) lncRNAs in the maximum connected component are all functionally characterized. For the lncRNAs expressed in mouse embryonic stem cells and neuronal cells, the putative functions inferred by our method closely match those in the known literature.

Subjects

Molecular Sequence Annotation/methods; RNA, Long Noncoding/physiology; Algorithms; Animals; Brain/metabolism; Embryonic Stem Cells/metabolism; Gene Expression; Humans; Mice; Neurons/metabolism; Protein Interaction Maps; RNA, Long Noncoding/metabolism

15.

Contrastive and adversarial regularized multi-level representation learning for incomplete multi-view clustering.

Wang, Haiyue; Zhang, Wensheng; Ma, Xiaoke.

Neural Netw ; 172: 106102, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38219677

ABSTRACT

Incomplete multi-view clustering is a significant task in machine learning, given that complex systems in nature and society cannot be fully observed; it provides an opportunity to exploit the structure and functions of underlying systems. Current algorithms are criticized for failing either to balance data restoration and clustering or to capture the consistency of the representation of various views. To address these problems, a novel Multi-level Representation Learning Contrastive and Adversarial Learning (aka MRL_CAL) algorithm for incomplete multi-view clustering is proposed, in which data restoration, consistent representation, and clustering are jointly learned by exploiting features in various subspaces. Specifically, MRL_CAL employs v auto-encoders to obtain a low-level view-specific representation of instances, which restores data by estimating the distribution of the original incomplete data with adversarial learning. Then, MRL_CAL extracts a high-level representation of instances, in which the consistency of various views and labels of clusters is incorporated with contrastive learning. In this way, MRL_CAL simultaneously learns multi-level features of instances in various subspaces, which not only overcomes the conflict between representations but also improves the quality of features. Finally, MRL_CAL formulates incomplete multi-view clustering as an overall objective, where features are learned under the guidance of clustering. Extensive experimental results indicate that MRL_CAL outperforms state-of-the-art algorithms on various measurements, implying that the proposed method is promising for incomplete multi-view clustering.

Subjects

Algorithms; Machine Learning; Cluster Analysis

16.

Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data.

Wang, Haiyue; Liu, Zaiyi; Ma, Xiaoke.

IEEE J Biomed Health Inform ; 28(5): 3134-3145, 2024 May.

Article in English | MEDLINE | ID: mdl-38709615

ABSTRACT

Advances in single-cell technologies simultaneously generate epigenomic and transcriptomic profiles at the cell level, providing opportunities to explore the underlying biological mechanisms. Even though significant efforts have been dedicated to them, the integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling and limited interpretability of the data. To handle these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for the integration of single-cell epigenomic (DNA methylation or scATAC-seq) and transcriptomic (scRNA-seq) profiles, in which the consistent and specific features of cells are explicitly extracted, facilitating cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes the heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of the data are obtained under the guidance of cell clustering. Extensive experimental results on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms state-of-the-art algorithms on various measurements.

Subjects

Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , Epigenomics/methods , Machine Learning , Computational Biology/methods , DNA Methylation/genetics , Gene Expression Profiling/methods , Transcriptome/genetics , Animals , Multiomics
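
The abstract above builds on the self-representation of cells, in which each cell is expressed as a combination of the other cells. The sketch below shows only that basic idea, using a ridge (Frobenius-norm) penalty with a closed-form solution as a stand-in. It does not implement the low-rank and exclusivity constraints or the shared/specific decomposition of sLMIC, and the function names, the value of `lam`, and the toy data are assumptions made for this example.

```python
import numpy as np

def self_representation(X, lam=0.1):
    """Ridge self-representation (illustrative stand-in, not sLMIC).

    Solves min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2, where the columns of X
    are cells. The closed form is Z = (X^T X + lam I)^{-1} X^T X.
    """
    gram = X.T @ X
    n_cells = gram.shape[0]
    return np.linalg.solve(gram + lam * np.eye(n_cells), gram)

def affinity(Z):
    """Symmetric cell-cell affinity derived from the coefficient matrix."""
    return np.abs(Z) + np.abs(Z.T)

# Toy data (assumption): two groups of 5 cells lying in orthogonal
# 2-dimensional subspaces of a 4-dimensional feature space.
rng = np.random.default_rng(0)
basis_a = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)
basis_b = np.array([[0, 0], [0, 0], [1, 0], [0, 1]], dtype=float)
X = np.hstack([basis_a @ rng.normal(size=(2, 5)),
               basis_b @ rng.normal(size=(2, 5))])

Z = self_representation(X)
A = affinity(Z)
cross_group = A[:5, 5:].max()    # affinity between the two groups
within_group = A[:5, :5].mean()  # affinity inside the first group
```

On this toy input the affinity matrix is block-structured: cells are reconstructed from cells of their own group, so the cross-group entries vanish. Block structure of this kind is what a clustering step would then read off the affinity.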

17.

MNMST: topology of cell networks leverages identification of spatial domains from spatial transcriptomics data.

Wang, Yu; Liu, Zaiyi; Ma, Xiaoke.

Genome Biol ; 25(1): 133, 2024 05 23.

Article in English | MEDLINE | ID: mdl-38783355

Abstract

Advances in spatial transcriptomics provide an unprecedented opportunity to reveal the structure and function of biological systems. However, current algorithms fail to address the heterogeneity and interpretability of spatial transcriptomics data. Here, we present a multi-layer network model for identifying spatial domains in spatial transcriptomics data with joint learning. We demonstrate that spatial domains can be precisely characterized and discriminated by the topological structure of cell networks, which facilitates the identification and interpretability of spatial domains and outperforms state-of-the-art baselines. Furthermore, we prove that the network model offers an effective and efficient strategy for integrative analysis of spatial transcriptomics data from various platforms.

Subjects

Transcriptome , Algorithms , Gene Expression Profiling/methods , Humans , Gene Regulatory Networks

18.

Learning deep representation and discriminative features for clustering of multi-layer networks.

Wu, Wenming; Ma, Xiaoke; Wang, Quan; Gong, Maoguo; Gao, Quanxue.

Neural Netw ; 170: 405-416, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38029721

Abstract

The multi-layer network consists of the interactions between different layers, where each layer of the network is depicted as a graph, providing a comprehensive way to model the underlying complex systems. The layer-specific modules of multi-layer networks are critical to understanding the structure and function of the system. However, existing methods fail to characterize and balance the connectivity and specificity of layer-specific modules in networks because of the complicated inter- and intra-coupling of various layers. To address the above issues, a joint learning graph clustering algorithm (DRDF) for detecting layer-specific modules in multi-layer networks is proposed, which simultaneously learns the deep representation and discriminative features. Specifically, DRDF learns the deep representation with deep nonnegative matrix factorization, where the high-order topology of the multi-layer network is gradually and precisely characterized. Moreover, it addresses the specificity of modules with discriminative feature learning, where the intra-class compactness and inter-class separation of pseudo-labels of clusters are explored as self-supervised information, thereby providing a more accurate method to explicitly model the specificity of the multi-layer network. Finally, DRDF balances the connectivity and specificity of layer-specific modules with joint learning, where the overall objective of the graph clustering algorithm and optimization rules are derived. The experiments on ten multi-layer networks showed that DRDF not only outperforms eight baselines on graph clustering but also enhances the robustness of algorithms.

Subjects

Discrimination Learning , Learning , Algorithms , Cluster Analysis , Information Management

19.

Network-Based Structural Learning Nonnegative Matrix Factorization Algorithm for Clustering of scRNA-Seq Data.

Wu, Wenming; Ma, Xiaoke.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 566-575, 2023.

Article in English | MEDLINE | ID: mdl-35316190

Abstract

Single-cell RNA sequencing (scRNA-seq) measures expression profiles at the single-cell level, which sheds light on the heterogeneity and functional diversity among cell populations. The vast majority of current algorithms identify cell types by directly clustering transcriptional profiles, which ignores indirect relations among cells and results in undesirable performance on cell type discovery and trajectory inference. Therefore, there is a critical need for inferring cell types and trajectories by exploiting the interactions among cells. In this study, we propose a network-based structural learning nonnegative matrix factorization algorithm (aka SLNMF) for the identification of cell types in scRNA-seq data, which is formulated as a constrained optimization problem. SLNMF first constructs the similarity network for cells and then extracts latent features of the cells by exploiting the topological structure of the cell-cell network. To improve the clustering performance, a structural constraint is imposed on the model so that the latent features of cells preserve the structural information of the networks. Finally, we track the trajectory of cells by exploring the relationships among cell types. Fourteen scRNA-seq datasets, with the number of single cells varying from 49 to 26,484, are adopted to validate the performance of the algorithm. The experimental results demonstrate that SLNMF significantly outperforms fifteen state-of-the-art methods with a 15.32% improvement in terms of accuracy, and it accurately identifies the trajectories of cells. The proposed model and methods provide an effective strategy to analyze scRNA-seq data. (The software is coded in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/SLNMF).

Subjects

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
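
The abstract above describes two steps: building a cell-cell similarity network, then factorizing the expression matrix under a constraint that preserves the network structure. The sketch below illustrates that general pattern with standard graph-regularized NMF and multiplicative updates. This is a swapped-in, well-known technique, not the SLNMF formulation, and it is written in Python rather than the MATLAB of the released software. The cosine k-nearest-neighbour graph, all function names, the parameter values, and the toy count matrix are assumptions made for this example.

```python
import numpy as np

def cosine_knn_graph(X, k=5):
    """Symmetric k-nearest-neighbour graph over cells (rows of X)."""
    unit = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    idx = np.argsort(-sim, axis=1)[:, :k]     # k most similar cells per row
    W = np.zeros_like(sim)
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                 # symmetrise

def objective(X, H, B, W, lam):
    """||X - H B||_F^2 + lam * tr(H^T L H), with L the graph Laplacian."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.linalg.norm(X - H @ B) ** 2 + lam * np.trace(H.T @ L @ H))

def graph_regularized_nmf(X, W, rank=2, lam=1.0, n_iter=200, seed=0):
    """Graph-regularized NMF via multiplicative updates (not SLNMF itself)."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    H = rng.random((n_cells, rank)) + 0.1     # latent features of cells
    B = rng.random((rank, n_genes)) + 0.1     # nonnegative basis over genes
    D = np.diag(W.sum(axis=1))
    eps = 1e-10                               # guards against division by zero
    history = [objective(X, H, B, W, lam)]
    for _ in range(n_iter):
        B *= (H.T @ X) / (H.T @ H @ B + eps)
        H *= (X @ B.T + lam * W @ H) / (H @ B @ B.T + lam * D @ H + eps)
        history.append(objective(X, H, B, W, lam))
    return H, B, history

# Toy count matrix (assumption): 40 cells x 20 genes drawn from two profiles.
rng = np.random.default_rng(1)
rates = np.vstack([np.r_[np.full(10, 8.0), np.full(10, 1.0)],
                   np.r_[np.full(10, 1.0), np.full(10, 8.0)]])
X = rng.poisson(np.repeat(rates, 20, axis=0)).astype(float)

W = cosine_knn_graph(X, k=5)
H, B, history = graph_regularized_nmf(X, W)
labels = H.argmax(axis=1)  # one simple way to read clusters off the features
```

The Laplacian term penalizes neighbouring cells whose latent features differ, which is one common way to make a factorization respect network topology. Reading cluster labels from the largest latent coordinate is a convenience for the toy example, not a claim about how SLNMF assigns cell types.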

20.

Clustering of Multilayer Networks Using Joint Learning Algorithm With Orthogonality and Specificity of Features.

Wu, Wenming; Gong, Maoguo; Ma, Xiaoke.

IEEE Trans Cybern ; 53(8): 4972-4985, 2023 Aug.

Article in English | MEDLINE | ID: mdl-35286272

Abstract

Complex systems in nature and society consist of various types of interactions, where each type of interaction belongs to a layer, resulting in the so-called multilayer networks. Identifying specific modules for each layer is of great significance for revealing the structure-function relations in multilayer networks. However, the available approaches are criticized as undesirable because they fail to explicitly characterize the specificity of modules and to balance the specificity and connectivity of modules. To overcome these drawbacks, we propose an accurate and flexible algorithm that jointly learns matrix factorization and sparse representation (jMFSR) for specific modules in multilayer networks, where matrix factorization extracts features of vertices and sparse representation discovers specific modules. To exploit the discriminative latent features of vertices in multilayer networks, jMFSR incorporates linear discriminant analysis (LDA) into non-negative matrix factorization (NMF) to learn features of vertices that distinguish the categories. To explicitly measure the specificity of features, jMFSR decomposes features of vertices into common and specific parts, thereby enhancing the quality of features. Then, jMFSR jointly learns feature extraction, common-specific feature factorization, and clustering of multilayer networks. The experiments on 11 datasets indicate that jMFSR significantly outperforms state-of-the-art baselines in terms of various measurements.
