Results 1 - 20 of 48
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426327

ABSTRACT

Cluster assignment is vital to analyzing single-cell RNA sequencing (scRNA-seq) data to understand high-level biological processes. Deep learning-based clustering methods have recently been widely used in scRNA-seq data analysis. However, existing deep models often overlook the interconnections and interactions among network layers, leading to the loss of structural information within the network layers. Herein, we develop a new self-supervised clustering method based on an adaptive multi-scale autoencoder, called scAMAC. The self-supervised clustering network utilizes the Multi-Scale Attention mechanism to fuse the feature information from the encoder, hidden and decoder layers of the multi-scale autoencoder, which enables the exploration of cellular correlations within the same scale and captures deep features across different scales. The self-supervised clustering network calculates the membership matrix using the fused latent features and optimizes the clustering network based on the membership matrix. scAMAC employs an adaptive feedback mechanism to supervise the parameter updates of the multi-scale autoencoder, obtaining a more effective representation of cell features. scAMAC not only enables cell clustering but also performs data reconstruction through the decoding layer. Through extensive experiments, we demonstrate that scAMAC is superior to several advanced clustering and imputation methods in both data clustering and reconstruction. In addition, scAMAC is beneficial for downstream analysis, such as cell trajectory inference. Our scAMAC model codes are freely available at https://github.com/yancy2024/scAMAC.


Subjects
Data Analysis , Single-Cell Gene Expression Analysis , Cluster Analysis , Sequence Analysis, RNA , Gene Expression Profiling , Algorithms
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935070

ABSTRACT

Inferring gene regulatory networks (GRNs) is an important challenge in systems biology. Many strong computational methods have been proposed; however, challenges remain, especially on real datasets. In this study, we propose a Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better represent and process the directed graph structure of a GRN, a directed graph convolutional neural network is constructed that retains the structural information of the directed graph while making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features with a Bi-GRU and calculating statistical physicochemical characteristics of the gene sequence. During training, a dynamic update strategy converts the obtained edge prediction scores into edge weights that guide the subsequent training of the model. Results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.


Subjects
Computational Biology , Gene Regulatory Networks , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , Escherichia coli/genetics
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.


Subjects
Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods
4.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36592058

ABSTRACT

Progress in single-cell RNA sequencing (scRNA-seq) has produced a large amount of scRNA-seq data, which are widely used in biomedical research. Noise in the raw data and tens of thousands of genes make it challenging to capture the real structure and the useful information in scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data follows a Gaussian distribution or lies in a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE). It assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions, and it integrates Bayesian variational inference and adversarial training to give an interpretable latent representation of complex data and to discover the statistical distribution of different cell types. scGMAAE offers good controllability, interpretability and scalability. It can therefore process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several respects, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to generate its best results.


Subjects
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Normal Distribution , Bayes Theorem , Single-Cell Analysis/methods , Cluster Analysis
5.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noises and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results of 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Subjects
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
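The scDCCA abstract above mentions a Zero-Inflated Negative Binomial (ZINB) model-based auto-encoder. As a rough illustration of what a ZINB likelihood looks like, the sketch below evaluates the log-probability of a single count. It is not taken from the scDCCA code: the function names and the mean/dispersion parameterization (`mu`, `theta`, dropout probability `pi`) are assumptions chosen for illustration.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.
    With probability pi (0 <= pi < 1) the count is a structural zero;
    otherwise it is drawn from the negative binomial."""
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)

# Example: a zero count is more likely once dropout is modeled.
zero_with_dropout = zinb_log_pmf(0, mu=2.0, theta=1.5, pi=0.3)
zero_without_dropout = nb_log_pmf(0, mu=2.0, theta=1.5)
```

In an auto-encoder setting, the negative of this log-probability, summed over genes and cells, would typically serve as the reconstruction loss, with `mu`, `theta` and `pi` produced by the decoder.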
6.
Brief Bioinform ; 24(2)2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36715275

ABSTRACT

Many studies have used single-cell RNA sequencing (scRNA-seq) to study the diversity and biological functions of cells at the single-cell level. Clustering identifies unknown cell types, which is essential for downstream analysis of scRNA-seq samples. However, the high dimensionality, high noise and pervasive dropout of scRNA-seq samples pose a significant challenge to cluster analysis. Herein, we propose a new adaptive fuzzy clustering model based on a denoising autoencoder and a self-attention mechanism, called scDASFK. It uses comparative learning to integrate cell similarity information into the clustering method and uses a deep denoising network module to denoise the data. scDASFK includes a self-attention mechanism for further denoising, in which an adaptive clustering optimization function for iterative clustering is implemented. To make the denoised latent features better reflect the cell structure, we introduce a new adaptive feedback mechanism that supervises the denoising process through the clustering results. Experiments on 16 real scRNA-seq datasets show that scDASFK performs well in terms of clustering accuracy, scalability and stability. Overall, scDASFK is an effective clustering model with great potential for the analysis of scRNA-seq samples. Our scDASFK model codes are freely available at https://github.com/LRX2022/scDASFK.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Algorithms
7.
Brief Bioinform ; 25(1)2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38145949

ABSTRACT

Prediction of drug-target interactions (DTIs) is essential in medicine, since it helps identify molecular structures that potentially interact with drugs and facilitates drug discovery and repositioning. Recently, network representation learning has attracted much attention as a way to learn rich information from heterogeneous data. Although network representation learning algorithms have achieved success in predicting DTIs, reliance on a few manually designed meta-graphs limits their capability to extract complex semantic information. To address this problem, we introduce an adaptive meta-graph-based method, termed AMGDTI, for DTI prediction. In AMGDTI, semantic information is automatically aggregated from a heterogeneous network by training an adaptive meta-graph, thereby achieving efficient information integration without requiring domain knowledge. The effectiveness of AMGDTI is verified on two benchmark datasets. Experimental results demonstrate that AMGDTI overall outperforms eight state-of-the-art methods in predicting DTIs and accurately identifies novel DTIs. The results also show that the adaptive meta-graph is flexible and effectively captures complex fine-grained semantic information, enabling the learning of intricate heterogeneous network topology and the inference of potential drug-target relationships.


Subjects
Algorithms , Medicine , Benchmarking , Drug Delivery Systems , Semantics
8.
Bioinformatics ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041594

ABSTRACT

MOTIVATION: In the drug development process, a significant portion of the budget and research time is dedicated to lead compound optimization in order to identify potential drugs. This procedure focuses on enhancing the pharmacological and bioactive properties of compounds by optimizing their local substructures. However, because the chemical structure space is vast and discrete and the element combinations within it are unpredictable, the optimization process is inherently complex. Various structure enumeration-based combinatorial optimization methods have shown certain advantages, but they still have limitations: they fail to consider the differences between molecules and struggle to explore the unknown outer search space. RESULTS: In this study, we propose an adaptive space search-based molecular evolution optimization algorithm (ASSMOEA). It consists of three key modules: construction of a molecule-specific search space, molecular evolutionary optimization, and adaptive expansion of the molecule-specific search space. Specifically, we design a fragment similarity tree in the molecule-specific search space and apply a dynamic mutation strategy in this space to guide molecular optimization. We then use an encoder-encoder structure to adaptively expand the space. These three modules are run iteratively to optimize molecules. Our experiments demonstrate that ASSMOEA outperforms existing methods in molecular optimization. It not only improves the efficiency of the molecular optimization process but also shows a robust ability to find correct solutions. AVAILABILITY AND IMPLEMENTATION: The code is freely available on the web at https://github.com/bbbbb-b/MEOAFST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
Methods ; 224: 71-78, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38395182

ABSTRACT

Molecular optimization, which aims to improve molecular properties by modifying complex molecular structures, is a crucial and challenging task in drug discovery. In recent years, translation models provide a promising way to transform low-property molecules to high-property molecules, which enables molecular optimization to achieve remarkable progress. However, most existing models require matched molecular pairs, which are prone to be limited by the datasets. Although some models do not require matched molecular pairs, their performance is usually sacrificed due to the lack of useful supervising information. To address this issue, a domain-label-guided translation model is proposed in this paper, namely DLTM. In the model, the domain label information of molecules is exploited as a control condition to obtain different embedding representations, enabling the model to generate diverse molecules. Besides, the model adopts a classifier network to identify the property categories of transformed molecules, guiding the model to generate molecules with desired properties. The performance of DLTM is verified on two optimization tasks, namely the quantitative estimation of drug-likeness and penalized logP. Experimental results show that the proposed DLTM is superior to the compared baseline models.


Subjects
Drug Discovery
10.
Antimicrob Agents Chemother ; 68(3): e0120223, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38349157

ABSTRACT

Cystic echinococcosis (CE) is a zoonotic parasitic disease caused by larvae of the Echinococcus granulosus sensu lato (s.l.) cluster. There is an urgent need to develop new drug targets and drug molecules to treat CE. Adenosine monophosphate (AMP)-activated protein kinase (AMPK), a serine/threonine protein kinase consisting of α, β, and γ subunits, plays a key role in the regulation of energy metabolism. However, the role of AMPK in regulating glucose metabolism in E. granulosus s.l. and its effects on parasite viability are unknown. In this study, we found that targeted knockdown of EgAMPKα or a small-molecule AMPK inhibitor inhibited the viability of E. granulosus sensu stricto (s.s.) and disrupted the ultrastructure. The results of in vivo experiments showed that the AMPK inhibitor had a significant therapeutic effect on E. granulosus s.s.-infected mice and resulted in the loss of cellular structures of the germinal layer. In addition, the inhibition of the EgAMPK/EgGLUT1 pathway limited glucose uptake and glucose metabolism functions in E. granulosus s.s. Overall, our results suggest that EgAMPK can be a potential drug target for CE and that inhibition of EgAMPK activation is an effective strategy for the treatment of the disease.


Subjects
Echinococcosis , Echinococcus granulosus , Parasites , Animals , Mice , AMP-Activated Protein Kinases , Echinococcosis/drug therapy , Echinococcosis/parasitology , Zoonoses/parasitology , Glucose , Genotype
11.
Brief Bioinform ; 23(2)2022 Mar 10.
Article in English | MEDLINE | ID: mdl-35136924

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has allowed researchers to explore biological phenomena at the cellular scale. Clustering is a crucial and helpful step for studying the heterogeneity of cells. Although many clustering methods have been proposed, massive dropout events and the curse of dimensionality still make scRNA-seq data difficult to analyze, because they reduce the accuracy of clustering methods and lead to misidentification of cell types. In this work, we propose scHFC, a hybrid fuzzy clustering method optimized by natural computation and based on the Fuzzy C-Means (FCM) and Gath-Geva (GG) algorithms. Specifically, principal component analysis is used to reduce the dimensions of the scRNA-seq data after preprocessing. Then, the FCM algorithm, optimized by simulated annealing and a genetic algorithm, clusters the data and outputs a membership matrix, which represents the initial clustering result and is taken as the input to the GG algorithm to obtain the final clustering results. We also develop a cluster number estimation method called multi-index comprehensive estimation, which estimates the number of clusters well by combining four clustering effectiveness indexes. The performance of scHFC is evaluated on 17 scRNA-seq datasets and compared with six state-of-the-art methods. Experimental results show that scHFC performs better in terms of clustering accuracy and algorithm stability. In short, scHFC is an effective method for clustering cells in scRNA-seq data, and it shows great potential for downstream analysis of scRNA-seq data. The source code is available at https://github.com/WJ319/scHFC.


Subjects
Single-Cell Analysis , Software , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
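The scHFC abstract above describes FCM producing a membership matrix that is then passed on to the GG algorithm. For readers unfamiliar with fuzzy clustering, the sketch below shows the standard Fuzzy C-Means membership update for fixed cluster centers. It is a minimal illustration, not the scHFC implementation: the function name, the Euclidean distance and the default fuzzifier `m=2.0` are assumptions, and the simulated annealing, genetic algorithm and GG steps are not shown.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return the membership matrix U for fixed centers, where U[i][k] is the
    degree to which point i belongs to cluster k. Each row sums to 1.
    m > 1 is the fuzzifier; larger values give softer assignments."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share full
            # membership among those centers to avoid division by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        # Standard FCM update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        memberships.append(row)
    return memberships

# Example: three 2-D points (e.g. cells after PCA) and two cluster centers.
U = fcm_memberships([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)], [(0.0, 0.0), (4.0, 0.0)])
```

A full FCM run alternates this update with recomputing each center as the membership-weighted mean of the points until the memberships stop changing.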
12.
Brief Bioinform ; 23(1)2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34571535

ABSTRACT

In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.


Subjects
Artificial Intelligence , Deep Learning
13.
Brief Bioinform ; 23(6)2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36125190

ABSTRACT

The rapid development of biomedicine has produced a large body of biomedical literature. This unstructured text makes it seriously challenging for biomedical researchers to find information. Biomedical named entity recognition (BioNER) and biomedical relation extraction (BioRE) are the two most fundamental tasks of biomedical text mining, and identifying entities and extracting relations accurately and efficiently has become very important. Methods that perform the two tasks separately are called pipeline models, and they have shortcomings such as insufficient interaction between the tasks, low extraction quality and a tendency toward redundancy. To overcome these shortcomings, many deep learning-based joint named entity recognition and relation extraction models have been proposed, and they have achieved advanced performance. This paper comprehensively summarizes deep learning models for joint named entity recognition and relation extraction in biomedicine. The joint BioNER and BioRE models are discussed in light of the challenges in the BioNER and BioRE tasks. Five joint BioNER and BioRE models and one pipeline model are selected for comparative experiments on four public biomedical datasets, and the experimental results are analyzed. Finally, we discuss opportunities for the future development of deep learning-based joint BioNER and BioRE models.


Subjects
Deep Learning , Data Mining/methods
14.
J Chem Inf Model ; 64(13): 5161-5174, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38870455

ABSTRACT

Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.


Subjects
Drug Discovery , Drug Discovery/methods , Drug Design , Algorithms
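The MOMO abstract above refers to a Pareto-based multiproperty evaluation strategy. The details of that strategy are not given in the abstract, so the sketch below only illustrates the general idea of Pareto dominance on which such strategies rest: a candidate is kept if no other candidate is at least as good on every property and strictly better on one. The function names, the molecule labels and the score values are hypothetical, and all properties are assumed to be maximized.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every property
    and strictly better on at least one (all properties are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.
    `candidates` maps a name to a tuple of property scores."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical molecules scored on two properties (higher is better).
molecules = {
    "mol_a": (0.9, 0.2),
    "mol_b": (0.5, 0.5),
    "mol_c": (0.4, 0.4),  # worse than mol_b on both properties
    "mol_d": (0.1, 0.9),
}
front = pareto_front(molecules)
```

Here `mol_c` is dropped because `mol_b` beats it on both properties, while the other three represent different trade-offs and all stay on the front.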
15.
Methods ; 213: 42-49, 2023 May.
Article in English | MEDLINE | ID: mdl-37001685

ABSTRACT

A large body of evidence shows that biomarkers are discriminant features related to disease development. Thus, the identification of disease biomarkers has become a basic problem in the analysis of complex diseases in medicine, for tasks such as disease staging, diagnosis and treatment. Network-based research has become one of the most popular approaches. Several network-based algorithms have been proposed to identify biomarkers; however, their networks of genes or molecules ignore the similarities and associations among samples. It is essential to understand how to construct and optimize the networks so that the identified biomarkers are more accurate. On this basis, more effective strategies can be developed to improve the performance of biomarker identification. In this study, a multi-objective evolutionary algorithm based on sample similarity networks is proposed for disease biomarker identification. Specifically, we design sample similarity networks to extract structural information among samples, which is used to calculate the influence of each sample on each class. In addition, based on the networks and the group of biomarkers chosen in each iteration, samples are divided into classes according to their importance for each class. Then, during the population iterations of the evolutionary algorithm, we develop an elite guidance strategy and a fusion selection strategy to select the biomarkers that make the sample classification more accurate. Experimental results on five gene expression datasets suggest that the proposed algorithm is superior to some state-of-the-art disease biomarker identification methods.


Subjects
Algorithms , Biomarkers
16.
Brief Bioinform ; 22(6)2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34378011

ABSTRACT

In silico reuse of old drugs (also known as drug repositioning) to treat common and rare diseases is increasingly becoming an attractive proposition because it involves the use of de-risked drugs, with potentially lower overall development costs and shorter development timelines. Therefore, there is a pressing need for computational drug repurposing methodologies to facilitate drug discovery. In this study, we propose a new method, called DRHGCN (Drug Repositioning based on the Heterogeneous information fusion Graph Convolutional Network), to discover potential drugs for a certain disease. To make full use of different topology information in different domains (i.e. drug-drug similarity, disease-disease similarity and drug-disease association networks), we first design inter- and intra-domain feature extraction modules by applying graph convolution operations to the networks to learn the embedding of drugs and diseases, instead of simply integrating the three networks into a heterogeneous network. Afterwards, we parallelly fuse the inter- and intra-domain embeddings to obtain the more representative embeddings of drug and disease. Lastly, we introduce a layer attention mechanism to combine embeddings from multiple graph convolution layers for further improving the prediction performance. We find that DRHGCN achieves high performance (the average AUROC is 0.934 and the average AUPR is 0.539) in four benchmark datasets, outperforming the current approaches. Importantly, we conducted molecular docking experiments on DRHGCN-predicted candidate drugs, providing several novel approved drugs for Alzheimer's disease (e.g. benzatropine) and Parkinson's disease (e.g. trihexyphenidyl and haloperidol).


Subjects
Drug Development/methods , Drug Discovery/methods , Drug Repositioning , Models, Molecular , Algorithms , Biomarkers , Databases, Pharmaceutical , Humans , ROC Curve , Reproducibility of Results , Structure-Activity Relationship
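The DRHGCN abstract above reports performance as AUROC and AUPR. As a reminder of what these two numbers measure, the sketch below computes both from a list of binary labels and predicted scores. The function names and the toy data are illustrative assumptions. AUROC is computed here as the fraction of positive/negative pairs ranked correctly, and AUPR is approximated by average precision, which is one common estimator and may differ slightly from the interpolation used in the paper.

```python
def auroc(labels, scores):
    """Probability that a randomly chosen positive is scored higher than a
    randomly chosen negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive,
    with items ranked by decreasing score. Tied scores are ranked arbitrarily."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)

# Toy example: 1 marks a known drug-disease association, 0 an unknown pair.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

Because known associations are rare in drug repositioning benchmarks, AUPR is usually much lower than AUROC on the same predictions, which is consistent with the gap between the two averages quoted in the abstract.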
17.
Bioinformatics ; 38(15): 3703-3709, 2022 Aug 02.
Article in English | MEDLINE | ID: mdl-35699473

ABSTRACT

MOTIVATION: A large number of studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning without the prior exploitation of any domain knowledge, which does not utilize available gold-standard labels. When confronted by the high dimensionality and general dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicate cell type assignment. RESULTS: In this article, we propose a semi-supervised clustering method based on a capsule network named scCNC that integrates domain knowledge into the clustering step. Significantly, we also propose a Semi-supervised Greedy Iterative Training method used to train the whole network. Experiments on some real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses. AVAILABILITY AND IMPLEMENTATION: The source code of scCNC is freely available at https://github.com/WHY-17/scCNC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , Software
18.
PLoS Comput Biol ; 18(12): e1010772, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36534702

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables researchers to characterize transcriptomic profiles at single-cell resolution with increasingly high throughput. Clustering is a crucial step in single-cell analysis. Clustering analysis of transcriptomes profiled by scRNA-seq can reveal the heterogeneity and diversity of cells. However, single-cell studies still face great challenges because of the high noise and dimensionality of the data. Subspace clustering aims to discover the intrinsic structure of data in an unsupervised fashion. In this paper, we propose scDSSC, a deep sparse subspace clustering method that combines noise reduction and dimensionality reduction for scRNA-seq data and simultaneously learns feature representation and clustering via explicit modelling of scRNA-seq data generation. Experiments on a variety of scRNA-seq datasets ranging from thousands to tens of thousands of cells show that scDSSC can significantly improve clustering performance and facilitate the interpretability of clustering and downstream analysis. Compared with some popular scRNA-seq analysis methods, scDSSC outperformed state-of-the-art methods under various clustering performance metrics.


Subjects
Single-Cell Gene Expression Analysis , Transcriptome , Sequence Analysis, RNA/methods , Cluster Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Algorithms
19.
Methods ; 208: 66-74, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36377123

ABSTRACT

BACKGROUND: Single-cell sequencing is a technology for high-throughput sequencing analysis of the genome, transcriptome and epigenome at the single-cell level. It can overcome shortcomings of traditional methods, reveal the gene structure and gene expression state of a single cell, and reflect the heterogeneity between cells. Clustering analysis of single-cell RNA data is a very important step, but it faces two difficulties: dropout events and the curse of dimensionality. At present, many methods are driven only by data and do not make full use of existing biological information. RESULTS: In this work, we propose scSSA, a clustering model based on a semi-supervised autoencoder, fast independent component analysis (FastICA) and Gaussian mixture clustering. First, the semi-supervised autoencoder imputes and denoises the scRNA-seq data and produces a low-dimensional latent representation. Second, the dimension of this representation is further reduced by FastICA, and the result is clustered by a Gaussian mixture model. Finally, scSSA is compared with Seurat, CIDR and other methods on 10 public scRNA-seq datasets. CONCLUSION: The results show that scSSA has superior performance in cell clustering on 10 public datasets. In conclusion, scSSA can accurately identify cell types and is generally applicable to all kinds of single-cell datasets. scSSA has great application potential in the field of scRNA-seq data analysis. The code has been uploaded to https://github.com/houtongshuai123/scSSA/.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , RNA-Seq , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , RNA
20.
Methods ; 207: 74-80, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36108992

ABSTRACT

Non-coding RNAs (ncRNAs) play a considerable role in current biological science, for example in gene transcription and gene expression. Exploring ncRNA-protein interactions (NPIs) is of great significance, but some experimental techniques are very expensive in terms of time and labor. This has prompted the development of computational algorithms based on traditional statistics and artificial intelligence. However, these algorithms usually require a sequence or structural feature vector of the molecule. Although graph neural networks (GNNs) have been widely used in recent academic and industrial research, their potential remains unexplored in the field of NPI detection. Hence, we present a novel GNN-based model to detect NPIs, in which the NPI detection problem is transformed into a graph link prediction problem. Specifically, the proposed method uses two groups of labels to distinguish the two types of nodes, ncRNA and protein, which alleviates the problem of over-coupling in the graph network. Subsequently, the ncRNA and protein embeddings are initially optimized based on the cluster ownership relationship of nodes in the graph. Moreover, the model applies a self-attention mechanism to preserve the graph topology and reduce information loss during pooling. The experimental results indicate that the proposed model has superior performance.


Subjects
Artificial Intelligence , Neural Networks, Computer , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Algorithms , Proteins