Results 1 - 20 of 48
1.
Comput Biol Med ; 179: 108835, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38996550

ABSTRACT

Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Constructing the GRN of epithelioma papulosum cyprini (EPC) cells of cyprinid fish following spring viremia of carp virus (SVCV) infection helps in understanding the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and the network structure information of known GRNs, and introduces a matrix enhancement method to address the sparsity of known GRNs, extracting richer network structure information. To exploit the strengths of convolutional neural networks (CNNs) in image processing, gene expression and enhanced GRN data were each transformed into histogram images for every gene pair. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
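
The transformation of a gene pair into a histogram image, as described in the abstract above, can be illustrated with a minimal sketch. This is not the MEFFGRN implementation: the function name `pair_histogram`, the `log1p` transform, the bin count, and the toy expression values are all assumptions made for illustration only.

```python
import math

def pair_histogram(expr_a, expr_b, bins=8):
    """Return a bins x bins matrix of joint counts for two expression vectors.

    Hypothetical helper: each cell counts the samples whose (gene A, gene B)
    expression values fall into that pair of bins.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("expression vectors must have equal length")
    # log1p compresses the heavy right tail of expression values (an assumption).
    xs = [math.log1p(v) for v in expr_a]
    ys = [math.log1p(v) for v in expr_b]

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0  # constant vector: everything lands in the first bin
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)  # the maximum value falls in the last bin

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    hist = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        hist[bin_index(x, x_lo, x_hi)][bin_index(y, y_lo, y_hi)] += 1
    return hist

# Toy data: two anti-correlated genes measured in six samples.
gene_a = [0.0, 1.0, 3.0, 7.0, 15.0, 31.0]
gene_b = [31.0, 15.0, 7.0, 3.0, 1.0, 0.0]
image = pair_histogram(gene_a, gene_b, bins=4)
```

In a pipeline of the kind the abstract describes, a matrix like `image` would be one input channel to a CNN; an anti-correlated pair such as this one concentrates its counts along the anti-diagonal.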

2.
Bioinformatics ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041594

ABSTRACT

MOTIVATION: In the drug development process, a significant portion of the budget and research time is dedicated to lead compound optimization in order to identify potential drugs. This procedure focuses on enhancing the pharmacological and bioactive properties of compounds by optimizing their local substructures. However, due to the vast and discrete chemical structure space and the unpredictable element combinations within this space, the optimization process is inherently complex. Various structure enumeration-based combinatorial optimization methods have shown certain advantages, but they still have limitations: they fail to consider the differences between molecules and struggle to explore the unknown outer search space. RESULTS: In this study, we propose an adaptive space search-based molecular evolution optimization algorithm (ASSMOEA). It consists of three key modules: construction of a molecule-specific search space, molecular evolutionary optimization, and adaptive expansion of the molecule-specific search space. Specifically, we design a fragment similarity tree in the molecule-specific search space and apply a dynamic mutation strategy in this space to guide molecular optimization. Then we utilize an encoder-encoder structure to adaptively expand the space. These three modules are cycled iteratively to optimize molecules. Our experiments demonstrate that ASSMOEA outperforms existing methods in terms of molecular optimization. It not only enhances the efficiency of the molecular optimization process, but also exhibits a robust ability to search for correct solutions. AVAILABILITY AND IMPLEMENTATION: The code is freely available on the web at https://github.com/bbbbb-b/MEOAFST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.
Article in English | MEDLINE | ID: mdl-38954565

ABSTRACT

Synergistic drug combination prediction based on computational models has been widely studied and applied in the cancer field. However, most models only consider the interactions between drug pairs and specific cell lines, without taking into account the multiple biological relationships of drug-drug and cell line-cell line that also largely affect synergistic mechanisms. To this end, here we propose a multi-modal deep learning framework, termed MDNNSyn, which adequately applies multi-source information and trains multi-modal features to infer potential synergistic drug combinations. MDNNSyn extracts topology modality features by implementing a multi-layer hypergraph neural network on the drug synergy hypergraph and constructs semantic modality features through a similarity strategy. A multi-modal fusion network layer with a gated neural network is then employed for synergy score prediction. MDNNSyn is compared to five classic and state-of-the-art prediction methods on the DrugCombDB and Oncology-Screen datasets. The model achieves area under the curve (AUC) scores of 0.8682 and 0.9013 on the two datasets, an improvement of 3.70% and 2.71%, respectively, over the second-best model. A case study indicates that MDNNSyn is capable of detecting potential synergistic drug combinations.

4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935070

ABSTRACT

Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however, some challenges remain, especially for real datasets. In this study, we propose a Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRNs, a directed graph convolutional neural network is constructed, which retains the structural information of the directed graph while also making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to solve the problem of poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of the gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, the case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.


Subject(s)
Computational Biology , Gene Regulatory Networks , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , Escherichia coli/genetics
5.
J Chem Inf Model ; 64(13): 5161-5174, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38870455

ABSTRACT

Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
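
The Pareto-based multiproperty evaluation mentioned above rests on the standard notion of Pareto dominance, which the following minimal sketch illustrates. It is not MOMO's evaluation strategy: the function names, the molecule labels, and the two-property scores are hypothetical, and both properties are assumed to be maximized.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every property
    and strictly better on at least one (all properties are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, score in candidates.items():
        dominated = any(
            dominates(other_score, score)
            for other_name, other_score in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (property 1, property 2) scores for four candidate molecules.
scores = {
    "mol_A": (0.9, 0.2),
    "mol_B": (0.5, 0.5),
    "mol_C": (0.4, 0.4),  # worse than mol_B on both properties
    "mol_D": (0.1, 0.8),
}
front = pareto_front(scores)
```

Here `mol_C` is excluded because `mol_B` is better on both properties, while the remaining three represent different trade-offs and stay on the front. An evolutionary search of the kind described would typically favor such non-dominated candidates when selecting the next generation.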


Subject(s)
Drug Discovery , Drug Discovery/methods , Drug Design , Algorithms
6.
Article in English | MEDLINE | ID: mdl-38809722

ABSTRACT

Recent methods often introduce attention mechanisms into the skip connections of U-shaped networks to capture features. However, these methods usually overlook spatial information extraction in skip connections and exhibit inefficiency in capturing spatial and channel information. This issue prompts us to reevaluate the design of the skip-connection mechanism and propose a new deep-learning network called the Fusing Spatial and Channel Attention Network, abbreviated as FSCA-Net. FSCA-Net is a novel U-shaped network architecture that utilizes the Parallel Attention Transformer (PAT) to enhance the extraction of spatial and channel features in the skip-connection mechanism, further compensating for downsampling losses. We design the Cross-Attention Bridge Layer (CAB) to mitigate excessive feature and resolution loss when downsampling to the lowest level, ensuring meaningful information fusion during upsampling at the lowest level. Finally, we construct the Dual-Path Channel Attention (DPCA) module to guide channel and spatial information filtering for Transformer features, eliminating ambiguities with decoder features and better concatenating features with semantic inconsistencies between the Transformer and the U-Net decoder. FSCA-Net is designed explicitly for fine-grained segmentation tasks of multiple organs and regions. Our approach achieves over 48% reduction in FLOPs and over 32% reduction in parameters compared to the state-of-the-art method. Moreover, FSCA-Net outperforms existing segmentation methods on seven public datasets, demonstrating exceptional performance. The code has been made available on GitHub: https://github.com/Henry991115/FSCA-Net.

7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.


Subject(s)
Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods
8.
Interdiscip Sci ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38581626

ABSTRACT

Exploration of the intricate connections between long noncoding RNA (lncRNA) and diseases, referred to as lncRNA-disease associations (LDAs), plays a pivotal and indispensable role in unraveling the underlying molecular mechanisms of diseases and devising practical treatment approaches. It is imperative to employ computational methods for predicting lncRNA-disease associations to circumvent the need for superfluous experimental endeavors. Graph-based learning models have gained substantial popularity in predicting these associations, primarily because of their capacity to leverage node attributes and relationships within the network. Nevertheless, there remains much room for enhancing the performance of these techniques by incorporating and harmonizing the node attributes more effectively. In this context, we introduce a novel model, i.e., Adaptive Message Passing and Feature Fusion (AMPFLDAP), for forecasting lncRNA-disease associations within a heterogeneous network. Firstly, we constructed a heterogeneous network involving lncRNA, microRNA (miRNA), and diseases based on established associations and employing Gaussian interaction profile kernel similarity as a measure. Then, an adaptive topological message passing mechanism is suggested to address the information aggregation for heterogeneous networks. The topological features of nodes in the heterogeneous network were extracted based on the adaptive topological message passing mechanism. Moreover, an attention mechanism is applied to integrate both topological and semantic information to achieve the multimodal features of biomolecules, which are further used to predict potential LDAs. The experimental results demonstrated that the performance of the proposed AMPFLDAP is superior to seven state-of-the-art methods. Furthermore, to validate its efficacy in practical scenarios, we conducted detailed case studies involving three distinct diseases, which conclusively demonstrated AMPFLDAP's effectiveness in the prediction of LDAs.
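
The Gaussian interaction profile (GIP) kernel similarity named in the abstract above is commonly computed as K(i, j) = exp(-gamma * ||IP_i - IP_j||^2), with gamma set to a bandwidth constant divided by the mean squared norm of the interaction profiles. The sketch below follows that common definition; whether AMPFLDAP uses exactly this normalization is an assumption, and the function name, the default `gamma_prime=1.0`, and the toy association matrix are illustrative only.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    profiles[i] is the interaction profile of entity i (for example, one
    lncRNA's known associations with each disease).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalized by profile size
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Toy data: three lncRNAs (rows) against three diseases (columns).
lnc_profiles = [
    [1, 0, 1],
    [1, 0, 1],  # same associations as the first lncRNA
    [0, 1, 0],  # no associations shared with the first two
]
sim = gip_kernel(lnc_profiles)
```

Entities with identical profiles get similarity 1, and the similarity decays as the profiles diverge, so `sim` can serve as edge weights between same-type nodes in a heterogeneous network.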

9.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426327

ABSTRACT

Cluster assignment is vital to analyzing single-cell RNA sequencing (scRNA-seq) data to understand high-level biological processes. Deep learning-based clustering methods have recently been widely used in scRNA-seq data analysis. However, existing deep models often overlook the interconnections and interactions among network layers, leading to the loss of structural information within the network layers. Herein, we develop a new self-supervised clustering method based on an adaptive multi-scale autoencoder, called scAMAC. The self-supervised clustering network utilizes the Multi-Scale Attention mechanism to fuse the feature information from the encoder, hidden and decoder layers of the multi-scale autoencoder, which enables the exploration of cellular correlations within the same scale and captures deep features across different scales. The self-supervised clustering network calculates the membership matrix using the fused latent features and optimizes the clustering network based on the membership matrix. scAMAC employs an adaptive feedback mechanism to supervise the parameter updates of the multi-scale autoencoder, obtaining a more effective representation of cell features. scAMAC not only enables cell clustering but also performs data reconstruction through the decoding layer. Through extensive experiments, we demonstrate that scAMAC is superior to several advanced clustering and imputation methods in both data clustering and reconstruction. In addition, scAMAC is beneficial for downstream analysis, such as cell trajectory inference. Our scAMAC model codes are freely available at https://github.com/yancy2024/scAMAC.


Subject(s)
Data Analysis , Single-Cell Gene Expression Analysis , Cluster Analysis , Sequence Analysis, RNA , Gene Expression Profiling , Algorithms
10.
Antimicrob Agents Chemother ; 68(3): e0120223, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38349157

ABSTRACT

Cystic echinococcosis (CE) is a zoonotic parasitic disease caused by larvae of the Echinococcus granulosus sensu lato (s.l.) cluster. There is an urgent need to develop new drug targets and drug molecules to treat CE. Adenosine monophosphate (AMP)-activated protein kinase (AMPK), a serine/threonine protein kinase consisting of α, β, and γ subunits, plays a key role in the regulation of energy metabolism. However, the role of AMPK in regulating glucose metabolism in E. granulosus s.l. and its effects on parasite viability are unknown. In this study, we found that targeted knockdown of EgAMPKα or a small-molecule AMPK inhibitor inhibited the viability of E. granulosus sensu stricto (s.s.) and disrupted the ultrastructure. The results of in vivo experiments showed that the AMPK inhibitor had a significant therapeutic effect on E. granulosus s.s.-infected mice and resulted in the loss of cellular structures of the germinal layer. In addition, the inhibition of the EgAMPK/EgGLUT1 pathway limited glucose uptake and glucose metabolism functions in E. granulosus s.s. Overall, our results suggest that EgAMPK can be a potential drug target for CE and that inhibition of EgAMPK activation is an effective strategy for the treatment of the disease.


Subject(s)
Echinococcosis , Echinococcus granulosus , Parasites , Animals , Mice , AMP-Activated Protein Kinases , Echinococcosis/drug therapy , Echinococcosis/parasitology , Zoonoses/parasitology , Glucose , Genotype
11.
Methods ; 224: 71-78, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38395182

ABSTRACT

Molecular optimization, which aims to improve molecular properties by modifying complex molecular structures, is a crucial and challenging task in drug discovery. In recent years, translation models provide a promising way to transform low-property molecules to high-property molecules, which enables molecular optimization to achieve remarkable progress. However, most existing models require matched molecular pairs, which are prone to be limited by the datasets. Although some models do not require matched molecular pairs, their performance is usually sacrificed due to the lack of useful supervising information. To address this issue, a domain-label-guided translation model is proposed in this paper, namely DLTM. In the model, the domain label information of molecules is exploited as a control condition to obtain different embedding representations, enabling the model to generate diverse molecules. Besides, the model adopts a classifier network to identify the property categories of transformed molecules, guiding the model to generate molecules with desired properties. The performance of DLTM is verified on two optimization tasks, namely the quantitative estimation of drug-likeness and penalized logP. Experimental results show that the proposed DLTM is superior to the compared baseline models.


Subject(s)
Drug Discovery
12.
IEEE Trans Cybern ; 54(5): 2798-2810, 2024 May.
Article in English | MEDLINE | ID: mdl-37279140

ABSTRACT

This study focuses on building an intelligent decision-making attention mechanism in which the channel relationship and conduct feature maps among specific deep Dense ConvNet blocks are connected to each other. To this end, we develop a novel freezing network with a pyramid spatial channel attention mechanism (FPSC-Net) in deep modeling. This model studies how specific design choices in the large-scale data-driven optimization and creation process affect the balance between the accuracy and effectiveness of the designed deep intelligent model. This study presents a novel architecture unit, termed the "Activate-and-Freeze" block, on popular and highly competitive datasets. In order to extract informative features by fusing spatial and channel-wise information together within local receptive fields and to boost the representation power, this study constructs a Dense-attention module (pyramid spatial channel (PSC) attention) to perform feature recalibration and, through the PSC attention, to model the interdependence among convolution feature channels. We join the PSC attention module with the activating and back-freezing strategy to search for one of the most important parts of the network for extraction and optimization. Experiments on various large-scale datasets demonstrate that the proposed method can achieve substantially better performance in improving the representation power of ConvNets than other state-of-the-art deep models.

13.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38145949

ABSTRACT

Prediction of drug-target interactions (DTIs) is essential in the field of medicine, since it benefits the identification of molecular structures potentially interacting with drugs and facilitates the discovery and repositioning of drugs. Recently, network representation learning has attracted much attention as a way to learn rich information from heterogeneous data. Although network representation learning algorithms have achieved success in predicting DTIs, manually designed meta-graphs limit their capability to extract complex semantic information. To address this problem, we introduce an adaptive meta-graph-based method, termed AMGDTI, for DTI prediction. In the proposed AMGDTI, the semantic information is automatically aggregated from a heterogeneous network by training an adaptive meta-graph, thereby achieving efficient information integration without requiring domain knowledge. The effectiveness of the proposed AMGDTI is verified on two benchmark datasets. Experimental results demonstrate that the AMGDTI method overall outperforms eight state-of-the-art methods in predicting DTIs and achieves accurate identification of novel DTIs. It is also verified that the adaptive meta-graph exhibits flexibility and effectively captures complex fine-grained semantic information, enabling the learning of intricate heterogeneous network topology and the inference of potential drug-target relationships.


Subject(s)
Algorithms , Medicine , Benchmarking , Drug Delivery Systems , Semantics
14.
Brief Funct Genomics ; 2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37642213

ABSTRACT

The precise identification of drug-protein interactions (DPIs) can significantly speed up the drug discovery process. Bioassay methods are time-consuming and expensive when screening every drug-protein pair. Machine-learning-based methods cannot accurately predict a large number of DPIs. Compared with traditional computing methods, deep learning methods need less domain knowledge and have strong data learning ability. In this study, we construct a DPI prediction model based on dual-channel neural networks with an efficient path attention mechanism, called DCA-DPI. The drug molecular graph and protein sequence are used as the data input of the model, and a residual graph neural network and a residual convolution network are used to learn the feature representations of the drug and protein, respectively, yielding the feature vector of the drug and the hidden vectors of the protein. To get a more accurate protein feature vector, a weighted sum of the hidden vectors of the protein is computed using the neural attention mechanism. In the end, the drug and protein vectors are concatenated and input into the fully connected layer for classification. To evaluate the performance of DCA-DPI, three widely used public datasets, Human, C. elegans and DUD-E, are used in the experiment. The values of the evaluation metrics in the experiment are superior to those of other relevant methods. Experiments show that our model is efficient for DPI prediction.
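
The attention step described above, in which the protein's hidden vectors are combined by a weighted sum, can be sketched as follows. This is an illustration, not the DCA-DPI code: scoring each hidden vector by its dot product with the drug vector is an assumption (the abstract does not specify the scoring function), and the function name and toy vectors are hypothetical.

```python
import math

def attention_pool(hidden, query):
    """Softmax-weighted sum of hidden vectors, scored against a query vector."""
    # Dot-product scoring is an assumed choice, used here for simplicity.
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden))
              for d in range(dim)]
    return pooled, weights

# Toy data: three hidden vectors for a protein and one drug feature vector.
hidden_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
drug_vector = [1.0, 0.0]
protein_vector, weights = attention_pool(hidden_vectors, drug_vector)

# Concatenated input for a downstream classifier, as the abstract describes.
classifier_input = drug_vector + protein_vector
```

The weights sum to 1, so `protein_vector` is a convex combination of the hidden vectors in which positions that align with the query contribute more.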

15.
Methods ; 213: 42-49, 2023 05.
Article in English | MEDLINE | ID: mdl-37001685

ABSTRACT

A large amount of evidence shows that biomarkers are discriminant features related to disease development. Thus, the identification of disease biomarkers has become a basic problem in the analysis of complex diseases in medical fields such as disease stage judgment, disease diagnosis and treatment. Network-based research has become one of the most popular approaches. Several network-based algorithms have been proposed to identify biomarkers; however, their networks of genes or molecules ignore the similarities and associations among the samples. It is essential to further understand how to construct and optimize the networks to make the identified biomarkers more accurate. On this basis, more effective strategies can be developed to improve the performance of biomarker identification. In this study, a multi-objective evolution algorithm based on sample similarity networks is proposed for disease biomarker identification. Specifically, we design the sample similarity networks to extract the structural characteristic information among samples, which is used to calculate the influence of each sample on each class. In addition, based on the networks and the group of biomarkers chosen in every iteration, we divide samples into different classes by their importance for each class. Then, during the population iteration of the evolution algorithm, we develop an elite guidance strategy and a fusion selection strategy to select the biomarkers that make the sample classification more accurate. The experimental results on five gene expression datasets suggest that the proposed algorithm is superior to some state-of-the-art disease biomarker identification methods.


Subject(s)
Algorithms , Biomarkers
16.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592058

ABSTRACT

The progress of single-cell RNA sequencing (scRNA-seq) has led to a large amount of scRNA-seq data, which are widely used in biomedical research. The noise in the raw data and the tens of thousands of genes make it challenging to capture the real structure and effective information of scRNA-seq data. Most of the existing single-cell analysis methods assume that the low-dimensional embedding of the raw data belongs to a Gaussian distribution or a low-dimensional nonlinear space without any prior information, which limits the flexibility and controllability of the model to a great extent. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE), assuming that the low-dimensional embeddings of different types of cells follow different Gaussian distributions and integrating Bayesian variational inference and adversarial training, so as to give an interpretable latent representation of complex data and discover the statistical distribution of different types of cells. scGMAAE offers good controllability, interpretability and scalability. Therefore, it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several ways, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to generate the best results.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Normal Distribution , Bayes Theorem , Single-Cell Analysis/methods , Cluster Analysis
17.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noises and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results of 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
18.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36715275

ABSTRACT

A large number of studies have used single-cell RNA sequencing (scRNA-seq) to study the diversity and biological functions of cells at the single-cell level. Clustering identifies unknown cell types, which is essential for downstream analysis of scRNA-seq samples. However, the high dimensionality, high noise and pervasive dropout rate of scRNA-seq samples pose a significant challenge to the cluster analysis of scRNA-seq samples. Herein, we propose a new adaptive fuzzy clustering model based on the denoising autoencoder and self-attention mechanism, called scDASFK. It uses comparative learning to integrate cell similarity information into the clustering method and uses a deep denoising network module to denoise the data. scDASFK includes a self-attention mechanism for further denoising, in which an adaptive clustering optimization function for iterative clustering is implemented. In order to make the denoised latent features better reflect the cell structure, we introduce a new adaptive feedback mechanism to supervise the denoising process through the clustering results. Experiments on 16 real scRNA-seq datasets show that scDASFK performs well in terms of clustering accuracy, scalability and stability. Overall, scDASFK is an effective clustering model with great potential for the analysis of scRNA-seq samples. Our scDASFK model codes are freely available at https://github.com/LRX2022/scDASFK.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Algorithms
19.
PLoS Comput Biol ; 18(12): e1010772, 2022 12.
Article in English | MEDLINE | ID: mdl-36534702

ABSTRACT

Single cell RNA sequencing (scRNA-seq) enables researchers to characterize transcriptomic profiles at single-cell resolution with increasingly high throughput. Clustering is a crucial step in single cell analysis. Clustering analysis of transcriptomes profiled by scRNA-seq can reveal the heterogeneity and diversity of cells. However, single cell studies still face great challenges due to high noise and high dimensionality. Subspace clustering aims at discovering the intrinsic structure of data in an unsupervised fashion. In this paper, we propose a deep sparse subspace clustering method, scDSSC, combining noise reduction and dimensionality reduction for scRNA-seq data, which simultaneously learns feature representation and clustering via explicit modelling of scRNA-seq data generation. Experiments on a variety of scRNA-seq datasets, ranging from thousands to tens of thousands of cells, have shown that scDSSC can significantly improve clustering performance and facilitate the interpretability of clustering and downstream analysis. Compared to some popular scRNA-seq analysis methods, scDSSC outperformed state-of-the-art methods under various clustering performance metrics.


Subject(s)
Single-Cell Gene Expression Analysis , Transcriptome , Sequence Analysis, RNA/methods , Cluster Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Algorithms
20.
Methods ; 208: 66-74, 2022 12.
Article in English | MEDLINE | ID: mdl-36377123

ABSTRACT

BACKGROUND: Single cell sequencing is a technology for high-throughput sequencing analysis of the genome, transcriptome and epigenome at the single cell level. It can overcome the shortcomings of traditional methods, reveal the gene structure and gene expression state of a single cell, and reflect the heterogeneity between cells. Clustering analysis of single-cell RNA data is a very important step, but it faces two difficulties: dropout events and the curse of dimensionality. At present, many methods are driven only by data and do not make full use of existing biological information. RESULTS: In this work, we propose scSSA, a clustering model based on a semi-supervised autoencoder, fast independent component analysis (FastICA) and Gaussian mixture clustering. Firstly, the semi-supervised autoencoder imputes and denoises the scRNA-seq data and then obtains the low-dimensional latent representation. Secondly, the low-dimensional representation is further reduced in dimension by FastICA and clustered by a Gaussian mixture model. Finally, scSSA is compared with Seurat, CIDR and other methods on 10 public scRNA-seq datasets. CONCLUSION: The results show that scSSA has superior performance in cell clustering on 10 public datasets. In conclusion, scSSA can accurately identify cell types and is generally applicable to all kinds of single cell datasets. scSSA has great application potential in the field of scRNA-seq data analysis. The code has been uploaded to https://github.com/houtongshuai123/scSSA/.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , RNA-Seq , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , RNA