1.
BioData Min ; 17(1): 30, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39232802

ABSTRACT

BACKGROUND: Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare changes in biomolecules between normal and disease samples, or detect important vertices in a single static biomolecular network, and so often overlook the dynamic changes that occur between disease stages. However, investigating temporal changes in biomolecular networks and identifying critical genes are essential for understanding the occurrence and development of diseases. METHODS: This study proposes a novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD). It first constructs a time series network by integrating both intra- and inter-temporal network information, preserving connections between networks at adjacent stages according to their local similarities. A tensor describes the connections of this time series network, and a 3-order tensor decomposition method captures both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also a learning-free and efficient method that can be applied to datasets with a small number of samples. RESULTS: The effectiveness of QIGTD was evaluated on lung adenocarcinoma (LUAD) datasets, with three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) as benchmarks. Numerical experiments demonstrate that QIGTD outperforms these methods in both precision and mAP. Notably, of the top 50 genes, 29 have been verified to be highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation. CONCLUSION: QIGTD effectively captures the temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.
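
The idea of scoring genes from a 3-order tensor of network snapshots can be illustrated with a small sketch. This is not the QIGTD algorithm, whose decomposition the abstract does not specify. It is a plain alternating power iteration for a rank-1 fit, and it omits the inter-temporal links between adjacent stages. The function names (`snapshot`, `rank_one_factors`) and the toy 4-gene network are assumptions made only for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def snapshot(n_genes, edges):
    """Symmetric 0/1 adjacency matrix for one disease stage (hypothetical helper)."""
    adj = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return adj


def rank_one_factors(tensor, sweeps=100):
    """Alternating power iteration for a rank-1 fit of tensor[t][i][j].

    Returns (node_scores, time_scores); a larger node score means the gene
    carries more weight in the dominant pattern shared across snapshots.
    """
    n_time, n = len(tensor), len(tensor[0])
    a = normalize([1.0] * n)       # row-node factor
    b = normalize([1.0] * n)       # column-node factor
    c = normalize([1.0] * n_time)  # time factor
    for _ in range(sweeps):
        a = normalize([
            sum(tensor[t][i][j] * b[j] * c[t]
                for t in range(n_time) for j in range(n))
            for i in range(n)
        ])
        b = normalize([
            sum(tensor[t][i][j] * a[i] * c[t]
                for t in range(n_time) for i in range(n))
            for j in range(n)
        ])
        c = normalize([
            sum(tensor[t][i][j] * a[i] * b[j]
                for i in range(n) for j in range(n))
            for t in range(n_time)
        ])
    return a, c


# Toy data (made up): gene 0 is a hub in every stage; later stages gain edges.
stages = [
    snapshot(4, [(0, 1), (0, 2), (0, 3)]),
    snapshot(4, [(0, 1), (0, 2), (0, 3), (1, 2)]),
    snapshot(4, [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]),
]
node_scores, time_scores = rank_one_factors(stages)
ranking = sorted(range(4), key=lambda g: -node_scores[g])
print(ranking)
```

Stacking the snapshots into one tensor lets a single factorization produce both a per-gene score and a per-stage weight, which a centrality measure computed on one static network cannot do.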

2.
Neural Netw ; 179: 106558, 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39089147

ABSTRACT

Fine-grained visual categorization in zero-shot setting is a challenging problem in the computer vision community. It requires algorithms to accurately identify fine-grained categories that do not appear during the training phase and have high visual similarity to each other. Existing methods usually address this problem by using attribute information as intermediate knowledge, which provides sufficient fine-grained characteristics of categories and can be transferred from seen categories to unseen categories. However, the learning of attribute visual features is not trivial due to the following two reasons: (i) The visual information about attributes of different types may interfere with the visual feature learning of each other. (ii) The visual characteristics of the same attribute may vary in different categories. To solve these issues, we propose a Multi-Group Multi-Stream attribute Attention network (MGMSA), which not only separates the feature learning of attributes of different types, but also isolates the learning of attribute visual features for categories with big differences in attribute appearance. This avoids the interference between uncorrelated attributes and helps to learn category-specific attribute-related visual features. This is beneficial for distinguishing fine-grained categories with subtle visual differences. Extensive experiments on benchmark datasets show that MGMSA achieves state-of-the-art performance on attribute prediction and fine-grained zero-shot learning.

3.
Genome Biol ; 25(1): 207, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103856

ABSTRACT

Cell type identification is an indispensable analytical step in single-cell data analyses. To address the high noise stemming from gene expression data, existing computational methods often overlook the biologically meaningful relationships between genes, opting to reduce all genes to a unified data space. We assume that such relationships can aid in characterizing cell type features and improving cell type recognition accuracy. To this end, we introduce scPriorGraph, a dual-channel graph neural network that integrates multi-level gene biosemantics. Experimental results demonstrate that scPriorGraph effectively aggregates feature values of similar cells using high-quality graphs, achieving state-of-the-art performance in cell type identification.


Subjects
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Neural Networks, Computer; RNA-Seq/methods; Computational Biology/methods; Algorithms; Software; Single-Cell Gene Expression Analysis
4.
Fundam Res ; 4(4): 752-760, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39156563

ABSTRACT

The potential to identify individuals at high disease risk solely from genotype data has garnered significant interest. Although widely applied, traditional polygenic risk score (PRS) methods fall short because they are built on additive models that fail to capture the intricate associations among single nucleotide polymorphisms (SNPs). This is a limitation, as genetic diseases often arise from complex interactions between multiple SNPs. To address this challenge, we developed DeepRisk, a biological knowledge-driven deep learning method that models these complex, nonlinear associations among SNPs to score the risk of common diseases more effectively from genome-wide genotype data. Evaluations demonstrated that DeepRisk outperforms existing PRS-based methods in identifying individuals at high risk for four common diseases: Alzheimer's disease, inflammatory bowel disease, type 2 diabetes, and breast cancer.
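
For context, the additive model that traditional polygenic risk scoring relies on can be written in a few lines. The sketch below shows only that baseline, not DeepRisk. The SNP identifiers, effect sizes, and genotype are made-up values, and `additive_prs` is a hypothetical helper name.

```python
def additive_prs(effect_sizes, dosages):
    """Additive polygenic risk score: sum of effect size times allele dosage.

    effect_sizes maps SNP id -> per-allele weight; dosages maps SNP id -> 0, 1 or 2.
    SNPs missing from the genotype are treated as dosage 0 (an assumption).
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in effect_sizes.items())


# Made-up weights and genotype, for illustration only.
effect_sizes = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = additive_prs(effect_sizes, person)
print(round(score, 3))
```

Because each SNP contributes independently, no term in this sum can express an effect that appears only when two variants occur together, which is the limitation the abstract describes.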

5.
Brain Sci ; 14(8), 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39199460

ABSTRACT

Classifying a pre-processed fMRI dataset using functional connectivity (FC)-based features is considered challenging because the FC features are high-dimensional and the dataset is small. To address both problems, we propose a conditional Generative Adversarial Network (cGAN)-based dataset augmenter: the cGAN is first trained on connectivity features computed from the NYU dataset, and the trained cGAN is then used to generate synthetic connectivity features per category. After obtaining a sufficient number of connectivity features per category, a Multi-Head attention mechanism is used as a head for the classification. We name our proposed approach "ASD-GANNet"; it is end-to-end and does not require hand-crafted features, as the Multi-Head attention mechanism focuses on the more relevant features. Moreover, we compare our results with six available state-of-the-art techniques from the literature. The results obtained using the "NYU" site as the training set for generating a cGAN-based synthetic dataset are promising. We achieve an overall 10-fold cross-validation accuracy of 82%, sensitivity of 82%, and specificity of 81%, outperforming available state-of-the-art approaches. In a site-wise comparison, our approach also outperforms the available state of the art, with better results at 10 of the 17 sites.
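
The three reported metrics follow from the counts in a binary confusion matrix. The sketch below shows how accuracy, sensitivity, and specificity are computed. The labels are made-up toy values, not data from the study, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (made up): 1 = patient, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters on small or imbalanced datasets, where accuracy alone can hide poor performance on one class.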

6.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38975895

ABSTRACT

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.


Subjects
Gene Expression Profiling; Transcriptome; Gene Expression Profiling/methods; Computational Biology/methods; Algorithms; Humans; Animals; Software; Machine Learning
7.
Front Genet ; 15: 1407072, 2024.
Article in English | MEDLINE | ID: mdl-38846963

ABSTRACT

Background and Objective: Accurate identification of cancer stages is challenging due to the complexity and heterogeneity of the disease. Current clinical diagnosis methods primarily rely on phenotypic observations, which may not capture early molecular-level changes accurately. Methods: In this study, a novel biomarker recognition method tailored for cancer stages was proposed, which considers changes in gene expression relationships. Using sample-specific information and protein-protein interaction networks, group-specific networks were constructed to address the limited specificity of potential biomarkers. Then, a specific feature recognition method was proposed based on these group-specific networks, which employed the random forest algorithm for initial screening followed by a recursive feature elimination process to identify the optimal biomarker subset. While searching for optimal results, a strategy termed the Cost-Benefit Ratio was devised to facilitate the identification of stage-specific biomarkers. Results: Comparative experiments were conducted on lung adenocarcinoma and breast cancer datasets to validate the method's efficacy and generalizability. The results showed that the identified biomarkers were highly stage-specific, and the F1 scores for predicting cancer stages were significantly improved. The F1 score reached 97.68% on the lung adenocarcinoma dataset and 96.87% on the breast cancer dataset, significantly surpassing those of three conventional methods. Moreover, from the perspective of biological function, the biomarkers were shown to play an important role in cancer stage evolution. Conclusion: The proposed method demonstrated its effectiveness in identifying stage-related biomarkers. By using these biomarkers as features, accurate prediction of cancer stages was achieved. Furthermore, the method exhibited potential for biomarker identification in subtype analyses, offering novel perspectives for cancer prognosis.

8.
Int J Mol Sci ; 25(8), 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38673997

ABSTRACT

The pathogenesis of carcinoma is believed to arise from the combined effect of polygenic variation, and the initiation and progression of malignant tumors are closely related to the dysregulation of biological pathways. Quantifying the alteration in pathway activation and identifying coordinated patterns of pathway dysfunction are essential for understanding the malignancy process and for distinguishing different tumor stages or clinical outcomes of individual patients. In this study, we conducted an in silico pathway activation analysis using the Riemannian manifold (RiePath) toward pan-cancer personalized characterization. This is the first attempt to apply Riemannian manifold theory to measure the extent of pathway dysregulation in individual patients on the tangent space of the Riemannian manifold. RiePath effectively integrates pathway and gene expression information, not only generating a relatively low-dimensional and biologically relevant representation, but also identifying a robust panel of biologically meaningful pathway signatures as biomarkers. The pan-cancer analysis across 16 cancer types reveals the capability of RiePath to evaluate pathway activation accurately and identify clinical outcome-related pathways. We believe that RiePath has the potential to provide new prospects in understanding the molecular mechanisms of complex diseases and may find broader applications in predicting biomarkers for other intricate diseases.


Subjects
Neoplasms; Precision Medicine; Humans; Neoplasms/genetics; Neoplasms/metabolism; Precision Medicine/methods; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Gene Expression Regulation, Neoplastic; Signal Transduction; Gene Expression Profiling/methods; Algorithms; Computational Biology/methods; Gene Regulatory Networks; Computer Simulation
9.
iScience ; 27(4): 109387, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38510118

ABSTRACT

Identifying cancer genes is vital for cancer diagnosis and treatment. However, because of the complexity of cancer occurrence and the limited knowledge of cancer genes, it is hard to identify cancer genes accurately using only a few omics data types, and the overall performance of existing methods needs further improvement. Here, we introduce GLIMS, a two-stage gradual-learning strategy that predicts cancer genes using integrative features from multi-omics data. First, it uses a semi-supervised hierarchical graph neural network to predict the initial candidate cancer genes by integrating multi-omics data and a protein-protein interaction (PPI) network. Then, it uses an unsupervised approach to further optimize the initial prediction by integrating the co-splicing network in post-transcriptional regulation, which plays an important role in cancer development. Systematic experiments on multi-omics cancer data demonstrated that GLIMS outperforms state-of-the-art methods for the identification of cancer genes and could be a useful tool to help advance cancer analysis.

10.
Comput Biol Med ; 171: 108108, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38359659

ABSTRACT

While genome-wide association studies (GWAS) have unequivocally identified a vast number of disease susceptibility variants, a majority of them are situated in non-coding regions and are in high linkage disequilibrium (LD). To pave the way for translating GWAS signals into clinical drug targets, it is essential to identify the underlying causal variants and, in turn, the causal genes. To this end, a myriad of post-GWAS methods have been devised, each grounded in distinct principles including fine-mapping, colocalization, and transcriptome-wide association study (TWAS) techniques. Yet no platform currently exists that seamlessly integrates these diverse post-GWAS methodologies. In this work, we present a user-friendly web server for post-GWAS analysis that integrates 9 distinct methods with 12 models, categorized by fine-mapping, colocalization, and TWAS. The server mainly helps users decipher the causality obscured by complex GWAS signals, including causal variants and causal genes, without the burden of computational skills and complex environment configuration. It provides a convenient platform for post-GWAS analysis and result visualization, facilitating the understanding and interpretation of genome-wide association studies. The postGWAS server is available at http://g2g.biographml.com/.


Subjects
Genome-Wide Association Study; Quantitative Trait Loci; Humans; Genome-Wide Association Study/methods; Linkage Disequilibrium/genetics; Transcriptome; Polymorphism, Single Nucleotide/genetics; Genetic Predisposition to Disease/genetics
11.
Front Microbiol ; 15: 1345794, 2024.
Article in English | MEDLINE | ID: mdl-38314434

ABSTRACT

Introduction: Seasonal influenza A H3N2 viruses are constantly changing, reducing the effectiveness of existing vaccines. As a result, the World Health Organization (WHO) needs to frequently update the vaccine strains to match the antigenicity of emerged H3N2 variants. Traditional assessments of antigenicity rely on serological methods, which are both labor-intensive and time-consuming. Although numerous computational models aim to simplify antigenicity determination, they either lack a robust quantitative linkage between antigenicity and viral sequences or focus restrictively on selected features. Methods: Here, we propose a novel computational method to predict antigenic distances using multiple features, including not only viral sequence attributes but also integrating four distinct categories of features that significantly affect viral antigenicity in sequences. Results: This method exhibits low error in virus antigenicity prediction and achieves superior accuracy in discerning antigenic drift. Utilizing this method, we investigated the evolution process of the H3N2 influenza viruses and identified a total of 21 major antigenic clusters from 1968 to 2022. Discussion: Interestingly, our predicted antigenic map aligns closely with the antigenic map generated with serological data. Thus, our method is a promising tool for detecting antigenic variants and guiding the selection of vaccine candidates.

12.
iScience ; 27(1): 108592, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38205240

ABSTRACT

A key regulatory mechanism involves circular RNA (circRNA) acting as a sponge to modulate microRNA (miRNA), and thus studying their interaction has significant medical implications. In this field, two pressing issues remain unresolved. First, because verified interactions are scarce, models must be trainable on a minimal number of samples. Second, current models lack interpretability. Therefore, we propose SPBCMI, a method that combines sequence features extracted using the Bidirectional Encoder Representations from Transformers (BERT) model and structural features of biological molecule networks extracted through graph embedding to train a gradient-boosted decision tree (GBDT) classifier for prediction. Our method yielded an AUC of 0.9143, which is currently the best for this problem. Furthermore, in the case study, SPBCMI accurately predicted 7 out of 10 circRNA-miRNA interactions. These results show that our method provides an innovative and high-performing approach to understanding the interaction between circRNA and miRNA.

13.
Bioinformatics ; 40(2), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38290765

ABSTRACT

SUMMARY: Single-cell multi-omics technologies provide a unique platform for characterizing cell states and reconstructing developmental process by simultaneously quantifying and integrating molecular signatures across various modalities, including genome, transcriptome, epigenome, and other omics layers. However, there is still an urgent unmet need for novel computational tools in this nascent field, which are critical for both effective and efficient interrogation of functionality across different omics modalities. Scbean represents a user-friendly Python library, designed to seamlessly incorporate a diverse array of models for the examination of single-cell data, encompassing both paired and unpaired multi-omics data. The library offers uniform and straightforward interfaces for tasks, such as dimensionality reduction, batch effect elimination, cell label transfer from well-annotated scRNA-seq data to scATAC-seq data, and the identification of spatially variable genes. Moreover, Scbean's models are engineered to harness the computational power of GPU acceleration through Tensorflow, rendering them capable of effortlessly handling datasets comprising millions of cells. AVAILABILITY AND IMPLEMENTATION: Scbean is released on the Python Package Index (PyPI) (https://pypi.org/project/scbean/) and GitHub (https://github.com/jhu99/scbean) under the MIT license. The documentation and example code can be found at https://scbean.readthedocs.io/en/latest/.


Subjects
Multiomics; Software; Genome; Transcriptome; Single-Cell Analysis; Data Analysis
14.
PLoS One ; 19(1): e0291741, 2024.
Article in English | MEDLINE | ID: mdl-38181020

ABSTRACT

Although various methods have been developed to detect structural variations (SVs) in genomic sequences, few are used to validate these results. Several commonly used SV callers produce many false-positive SVs, and existing validation methods are not accurate enough. Therefore, a highly efficient and accurate validation method is essential. In response, we propose SVvalidation, a new method that uses long-read sequencing data to validate SVs with higher accuracy and efficiency. Compared to existing methods, SVvalidation performs better in validating SVs in repeat regions and can determine the homozygosity or heterozygosity of an SV. Additionally, SVvalidation offers the highest recall, precision, and F1-score (improving by 7-16%) across all datasets. Moreover, SVvalidation is suitable for different types of SVs. The program is available at https://github.com/nwpuzhengyan/SVvalidation.


Subjects
Chromosome Aberrations; Genomic Structural Variation; Humans; Research Design; Genomics; Heterozygote
15.
Brief Funct Genomics ; 23(2): 118-127, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-36752035

ABSTRACT

Analysis of cell-cell communication (CCC) in the tumor micro-environment helps decipher the underlying mechanism of cancer progression and drug tolerance. Currently, single-cell RNA-Seq data are available on a large scale, providing an unprecedented opportunity to predict cellular communications. There have been many achievements and applications in inferring cell-cell communication based on the known interactions between molecules, such as ligands, receptors and extracellular matrix. However, the prior information is not quite adequate and only involves a fraction of cellular communications, producing many false-positive or false-negative results. To this end, we propose an improved hierarchical variational autoencoder (HiVAE) based model to fully use single-cell RNA-seq data for automatically estimating CCC. Specifically, the HiVAE model is used to learn the potential representation of cells on known ligand-receptor genes and all genes in single-cell RNA-seq data, respectively, which are then utilized for cascade integration. Subsequently, transfer entropy is employed to measure the transmission of information flow between two cells based on the learned representations, which are regarded as directed communication relationships. Experiments are conducted on single-cell RNA-seq data of the human skin disease dataset and the melanoma dataset, respectively. Results show that the HiVAE model is effective in learning cell representations, and transfer entropy could be used to estimate the communication scores between cell types.
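
Transfer entropy, used above to score directed information flow, can be illustrated on discrete sequences. The sketch below is a plug-in (count-based) estimator with a history length of one step. It is not the paper's implementation, which works on learned cell representations. The function name and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of transfer entropy source -> target (history 1).

    TE = sum over (y_next, y_now, x_now) of
         p(y_next, y_now, x_now) * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    """
    triples = Counter()  # counts of (target_next, target_now, source_now)
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    total = sum(triples.values())

    now_src = Counter()   # counts of (target_now, source_now)
    next_now = Counter()  # counts of (target_next, target_now)
    now_only = Counter()  # counts of target_now
    for (nxt, now, src), count in triples.items():
        now_src[(now, src)] += count
        next_now[(nxt, now)] += count
        now_only[now] += count

    te = 0.0
    for (nxt, now, src), count in triples.items():
        # Ratio of the two conditional probabilities, written with raw counts.
        ratio = (count * now_only[now]) / (now_src[(now, src)] * next_now[(nxt, now)])
        te += (count / total) * log2(ratio)
    return te


# Toy signals (made up): y repeats x with a one-step delay, so x drives y.
x = [0, 0, 0, 1, 0, 1, 1, 1] * 50
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(te_xy > te_yx)
```

The asymmetry between the two directions is what makes transfer entropy suitable for directed relationships, unlike a symmetric measure such as correlation.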


Subjects
Neoplasms; Single-Cell Gene Expression Analysis; Humans; Single-Cell Analysis/methods; Cell Communication; Exome Sequencing; Sequence Analysis, RNA/methods; Gene Expression Profiling/methods; Tumor Microenvironment
16.
Neural Netw ; 169: 475-484, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37939536

ABSTRACT

The assessment of Enterprise Credit Risk (ECR) is a critical technique for investment decisions and financial regulation. Previous methods usually construct enterprise representations from credit-related indicators, such as liquidity and staff quality. However, the indicators of many enterprises are not accessible, especially for small- and medium-sized enterprises. To alleviate this indicator deficiency, graph learning-based methods have been proposed to enhance enterprise representation learning through the neighbor structure of enterprise graphs. However, existing methods usually focus only on pairwise relationships and overlook the ubiquitous high-order relationships among enterprises, e.g., a supply chain connecting multiple enterprises. To resolve this issue, we propose a Multi-Structure Cascaded Graph Neural Network framework (MS-CGNN) for ECR assessment. It enhances enterprise representation learning based on enterprise graph structures of different granularity, including knowledge graphs of pairwise relationships and homogeneous and heterogeneous hypergraphs of high-order relationships. To distinguish the influences of different types of hyperedges, MS-CGNN defines new type-dependent hyperedge weight matrices for heterogeneous hypergraph convolutions. Experimental results show that MS-CGNN achieves state-of-the-art performance on real-world ECR datasets.


Subjects
Investments; Learning; Humans; Knowledge; Neural Networks, Computer; Risk Assessment
17.
Article in English | MEDLINE | ID: mdl-38055356

ABSTRACT

Acquiring big-size datasets to raise the performance of deep models has become one of the most critical problems in representation learning (RL) techniques, which is the core potential of the emerging paradigm of federated learning (FL). However, most current FL models concentrate on seeking an identical model for isolated clients and thus fail to make full use of the data specificity between clients. To enhance the classification performance of each client, this study introduces the FDRL, a federated discriminative RL model, by partitioning the data features of each client into a global subspace and a local subspace. More specifically, FDRL learns the global representation for federated communication between those isolated clients, which is to capture common features from all protected datasets via model sharing, and local representations for personalization in each client, which is to preserve specific features of clients via model differentiating. Toward this goal, FDRL in each client trains a shared submodel for federated communication and, meanwhile, a not-shared submodel for locality preservation, in which the two models partition client-feature space by maximizing their differences, followed by a linear model fed with combined features for image classification. The proposed model is implemented with neural networks and optimized in an iterative manner between the server of computing the global model and the clients of learning the local classifiers. Thanks to the powerful capability of local feature preservation, FDRL leads to more discriminative data representations than the compared FL models. Experimental results on public datasets demonstrate that our FDRL benefits from the subspace partition and achieves better performance on federated image classification than the state-of-the-art FL models.

18.
Sci Rep ; 13(1): 21407, 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38049546

ABSTRACT

A scientific and rational evaluation of teaching is essential for personalized learning. In the current teaching assessment model, which relies solely on Grade Point Average (GPA), learners with different learning abilities may be classified as the same type of student. It is challenging to uncover the underlying logic behind different learning patterns when GPA scores are the same. To address the limitations of pure GPA evaluation, we propose a data-driven assessment strategy as a supplement to the current methodology. First, we integrate self-paced learning and graph memory neural networks to develop a learning performance prediction model called the self-paced graph memory network. Second, inspired by outliers in linear regression, we use a t-test approach to identify student samples whose loss values differ significantly from those of normal samples, indicating that these students have different inherent learning patterns or logic compared to the majority. We find that these learners' GPAs are distributed across different levels. By analyzing the learning process data of learners with the same GPA level, we find that our data-driven strategy effectively addresses the shortcomings of the GPA evaluation model. Furthermore, we validate our method for student data modeling through protein classification experiments and student performance prediction experiments, confirming its rationality and effectiveness.

19.
Commun Biol ; 6(1): 1268, 2023 Dec 14.
Article in English | MEDLINE | ID: mdl-38097699

ABSTRACT

Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data for supervised learning to annotate cell types in new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partly because of difficulties in dealing with technical differences between datasets, and partly because they do not consider the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which utilizes various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and detection of unknown cell types from different species, platforms, and tissues.


Subjects
Medicine; Single-Cell Gene Expression Analysis; Signal Transduction; Technology
20.
Brief Bioinform ; 24(6), 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37903416

ABSTRACT

The emergence of single-cell RNA sequencing (scRNA-seq) technology has revolutionized the identification of cell types and the study of cellular states at a single-cell level. Despite its significant potential, scRNA-seq data analysis is plagued by the issue of missing values. Many existing imputation methods rely on simplistic data distribution assumptions while ignoring the intrinsic gene expression distribution specific to cells. This work presents a novel deep-learning model, named scMultiGAN, for scRNA-seq imputation, which utilizes multiple collaborative generative adversarial networks (GAN). Unlike traditional GAN-based imputation methods that generate missing values based on random noises, scMultiGAN employs a two-stage training process and utilizes multiple GANs to achieve cell-specific imputation. Experimental results show the efficacy of scMultiGAN in imputation accuracy, cell clustering, differential gene expression analysis and trajectory analysis, significantly outperforming existing state-of-the-art techniques. Additionally, scMultiGAN is scalable to large scRNA-seq datasets and consistently performs well across sequencing platforms. The scMultiGAN code is freely available at https://github.com/Galaxy8172/scMultiGAN.


Subjects
Single-Cell Analysis; Transcriptome; Single-Cell Analysis/methods; Cluster Analysis; Exome Sequencing; Data Analysis; Sequence Analysis, RNA; Gene Expression Profiling
...