ABSTRACT
The gene regulatory network (GRN) plays a vital role in understanding the structure and dynamics of cellular systems, revealing complex regulatory relationships, and exploring disease mechanisms. Recently, deep learning (DL)-based methods have been proposed to infer GRNs from single-cell transcriptomic data and have achieved impressive performance. However, these methods do not fully utilize graph topological information or high-order neighbor information from multiple receptive fields. To overcome these limitations, we propose scMGATGRN, a novel model based on a multiview graph attention network, to infer GRNs. scMGATGRN consists mainly of a graph attention network (GAT), a multiview module, and a view-level attention mechanism. The GAT extracts essential features of the gene regulatory network. The multiview module simultaneously uses local feature information and high-order neighbor feature information of nodes in the network. The view-level attention mechanism dynamically adjusts the relative importance of the node embeddings from the two views and efficiently aggregates them. To verify the effectiveness of scMGATGRN, we compared it with 10 methods (five shallow learning algorithms and five state-of-the-art DL-based methods) on seven benchmark single-cell RNA sequencing (scRNA-seq) datasets from five cell lines (two human and three mouse) with four different kinds of ground-truth networks. The experimental results show that scMGATGRN outperforms the competing methods and demonstrate its potential for inferring GRNs. The code and data of scMGATGRN are freely available on GitHub (https://github.com/nathanyl/scMGATGRN).
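The abstract does not give the view-level attention formula. A minimal sketch, assuming a semantic-level attention of the kind used in heterogeneous graph attention networks: each view is scored by averaging q · tanh(W z) over its node embeddings, the scores are softmax-normalized into view weights, and the views are summed with those weights. `W`, `q`, and all function names are hypothetical; in a trained model `W` and `q` would be learned rather than fixed.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def view_attention_fuse(views, W, q):
    """Fuse per-node embeddings from several views with view-level attention.

    views: list of views; each view is a list of node embeddings (lists of floats)
    W, q:  hypothetical projection matrix and attention vector (fixed here)
    Returns (fused embeddings, attention weight of each view).
    """
    scores = []
    for Z in views:
        # Score of a view = mean over its nodes of q . tanh(W z_i)
        s = sum(
            sum(qk * math.tanh(hk) for qk, hk in zip(q, matvec(W, z)))
            for z in Z
        ) / len(Z)
        scores.append(s)
    # Softmax over views gives their relative importance
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    n_nodes, dim = len(views[0]), len(views[0][0])
    fused = [
        [sum(beta[v] * views[v][i][d] for v in range(len(views))) for d in range(dim)]
        for i in range(n_nodes)
    ]
    return fused, beta

# Two views (e.g. local and higher-order embeddings) for three nodes
local_view = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
high_order_view = [[0.2, 0.1], [0.1, 0.2], [0.3, 0.3]]
W = [[1.0, 0.0], [0.0, 1.0]]
q = [1.0, 1.0]
fused, beta = view_attention_fuse([local_view, high_order_view], W, q)
```

With these toy inputs the local view receives the larger weight, because its projected embeddings produce larger tanh responses.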
Subject(s)
Gene Regulatory Networks , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Algorithms , Deep Learning , Gene Expression Profiling/methods , Mice
ABSTRACT
The advent of single-cell RNA sequencing (scRNA-seq) technology makes it possible to conduct biological research at the cellular level. Cell type identification based on unsupervised clustering is one of the fundamental tasks of scRNA-seq data analysis. Although many single-cell clustering methods have been developed recently, few can fully exploit the deep latent relationships between cells, which results in suboptimal clustering. In this paper, we propose scGAMF, a graph autoencoder-based multi-level kernel subspace fusion framework for scRNA-seq data analysis. Based on multiple top feature sets, scGAMF unifies deep feature embedding and kernel space analysis in a single framework to learn an accurate clustering affinity matrix. First, we construct multiple top feature sets to avoid the high variability caused by learning from a single feature set. Second, scGAMF uses graph autoencoders (GAEs) to extract the deep information embedded in the data and to learn embeddings that capture both gene expression patterns and cell-cell relationships. Third, to fully explore the deep latent relationships between cells, we design a multi-level kernel space fusion strategy. This strategy uses a kernel expression model with adaptive similarity preservation to learn a self-expression matrix shared by all embedding spaces of a given feature set, and a consensus affinity matrix across the multiple top feature sets. Finally, the consensus affinity matrix is used for spectral clustering, visualization, and identification of marker genes. Extensive validation on real datasets shows that scGAMF achieves higher clustering accuracy than many popular single-cell analysis methods.
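To illustrate the idea of a consensus affinity across multiple top feature sets, here is a simplified sketch in plain Python. It assumes a Gaussian (RBF) kernel with a fixed bandwidth and simply averages the per-feature-set affinities; scGAMF itself learns the affinity through its kernel expression model with adaptive similarity preservation, which this sketch does not attempt. The feature sets, `sigma`, and function names are illustrative assumptions.

```python
import math

def gaussian_affinity(X, sigma=1.0):
    """Gaussian (RBF) kernel affinity between rows of X (cells x genes)."""
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            A[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return A

def consensus_affinity(X, feature_sets, sigma=1.0):
    """Average the affinity matrices computed on several gene subsets."""
    n = len(X)
    C = [[0.0] * n for _ in range(n)]
    for genes in feature_sets:
        sub = [[row[g] for g in genes] for row in X]
        A = gaussian_affinity(sub, sigma)
        for i in range(n):
            for j in range(n):
                C[i][j] += A[i][j] / len(feature_sets)
    return C

# 4 cells x 4 genes; two hypothetical "top feature sets" of gene indices
X = [[5.0, 0.1, 3.0, 0.0],
     [4.8, 0.2, 2.9, 0.1],
     [0.1, 4.0, 0.0, 3.5],
     [0.2, 4.2, 0.1, 3.6]]
C = consensus_affinity(X, feature_sets=[[0, 1], [2, 3]])
```

Averaging over several gene subsets is the simplest way to damp the variability of any single feature set; the resulting matrix could then be passed to spectral clustering.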
ABSTRACT
Recent advances in spatial transcriptomics (ST) technologies have enabled comprehensive measurement of gene expression profiles while preserving the spatial information of cells. Combining gene expression profiles with spatial information is the most common way to identify spatial functional domains and genes. However, most existing spatial domain identification methods focus on spatially neighboring structures and fail to balance the self-characteristics of spots against their dependence on spatial structure. We therefore propose a novel model, SpaGCAC, which recognizes spatial domains with an adaptive feature-spatial balanced graph convolutional network named AFSBGCN. AFSBGCN dynamically learns the relationship between local spatial topology and the self-characteristics of spots by adaptively increasing or decreasing the weight on the self-characteristics during message aggregation. Moreover, to better capture the local structures of spots, SpaGCAC adopts a local topology structure contrastive learning strategy. SpaGCAC also uses a probability distribution contrastive learning strategy to increase the similarity of the probability distributions of spots belonging to the same category. We validate SpaGCAC for spatial domain identification on four spatial transcriptomic datasets. Compared with seven spatial domain recognition methods, SpaGCAC achieved the highest median NMI (0.683) and the second-highest median ARI (0.559) on the multi-slice DLPFC dataset, and it achieved the best results on all three remaining single-slice datasets. These results show that SpaGCAC outperforms most existing methods and provides enhanced insights into tissue heterogeneity.
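The adaptive balance between a spot's own features and its spatial neighborhood can be pictured as a per-spot convex combination. The sketch below assumes a fixed self-weight `alpha` for each spot and mean aggregation over neighbors; AFSBGCN learns how this weight rises or falls, and its exact aggregation rule is not given in the abstract. All names are hypothetical.

```python
def balanced_aggregate(H, neighbors, alpha):
    """Mix each spot's own features with the mean of its spatial neighbors.

    H:         list of feature vectors, one per spot
    neighbors: dict spot index -> list of neighboring spot indices
    alpha:     per-spot self-weight in [0, 1] (learned in a real model; fixed here)
    """
    out = []
    for i, h in enumerate(H):
        nbrs = neighbors.get(i, [])
        if nbrs:
            mean_nb = [sum(H[j][d] for j in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            mean_nb = list(h)  # isolated spot: fall back to its own features
        out.append([alpha[i] * h[d] + (1 - alpha[i]) * mean_nb[d] for d in range(len(h))])
    return out

H = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# Spot 0 leans on itself, spot 2 leans on its neighborhood
out = balanced_aggregate(H, neighbors, alpha=[0.8, 0.5, 0.2])
```

A high `alpha` preserves a spot's distinctive profile (useful at domain boundaries), while a low `alpha` smooths it toward its neighbors.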
ABSTRACT
Circular RNAs (circRNAs) play vital roles in transcription and translation. Identifying circRNA-RBP (RNA-binding protein) interaction sites has become a fundamental step in molecular and cell biology. Deep learning (DL)-based methods have been proposed to predict circRNA-RBP interaction sites and have achieved impressive performance. However, these methods cannot effectively capture long-distance dependencies or effectively use the interaction information among multiple features. To overcome these limitations, we propose iCRBP-LKHA, a DL-based model that uses deep hybrid networks to identify circRNA-RBP interaction sites. iCRBP-LKHA adopts five encoding schemes. Its neural network architecture, which consists of a large kernel convolutional neural network (LKCNN), a convolutional block attention module with one-dimensional convolution (CBAM-1D), and a bidirectional gated recurrent unit (BiGRU), automatically explores local information, global context information, and interaction information among multiple features. To verify the effectiveness of iCRBP-LKHA, we compared it with shallow learning algorithms on 37 circRNA datasets and 37 stringent circRNA datasets, and with state-of-the-art DL-based methods on 37 circRNA datasets, 37 stringent circRNA datasets, and 31 linear RNA datasets. The experimental results show that iCRBP-LKHA outperforms the competing methods and demonstrate its potential for identifying other RNA-RBP interaction sites.
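The abstract mentions five encoding schemes without naming them. One-hot encoding of nucleotides is a common input for convolutional sequence models, so the sketch below assumes it as an example of what one scheme might look like; whether iCRBP-LKHA uses exactly this encoding is not stated.

```python
NUCLEOTIDES = "ACGU"

def one_hot(seq):
    """One-hot encode an RNA sequence into a (len(seq) x 4) list of lists.

    Column order is A, C, G, U. T is treated as U, and any other
    character (e.g. 'N') is encoded as an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == nt else 0 for nt in NUCLEOTIDES] for base in seq]

encoded = one_hot("ACGUN")
```

The resulting matrix has one row per position and can be fed directly to a 1D convolution.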
Subject(s)
Algorithms , Computational Biology , Deep Learning , Neural Networks, Computer , RNA, Circular , RNA-Binding Proteins , RNA, Circular/genetics , RNA, Circular/metabolism , Computational Biology/methods , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , Humans , Binding Sites/genetics
ABSTRACT
Since the field of genomics was established, the exploration of genes has been a central focus of research. Single-cell RNA sequencing (scRNA-seq) technology makes it possible to explore gene expression at the single-cell level. Owing to the limitations of sequencing technology, the data contain a lot of noise; they are also high-dimensional and sparse. Clustering is a common method for analyzing scRNA-seq data. This paper proposes a novel single-cell clustering method called Robust Manifold Nonnegative Low-Rank Representation with Adaptive Total-Variation Regularization (MLRR-ATV). Adaptive Total-Variation (ATV) regularization is introduced into the Low-Rank Representation (LRR) model to reduce the influence of noise through gradient learning. Then, the linear and nonlinear manifold structures in the data are learned through Euclidean distance and cosine similarity, so that more valuable information is retained. Because the model is non-convex, we use the Alternating Direction Method of Multipliers (ADMM) to optimize it. We tested MLRR-ATV on eight real scRNA-seq datasets against nine state-of-the-art comparison methods, and the experimental results show that MLRR-ATV outperforms all nine.
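Manifold terms in LRR-style models are commonly built from a similarity graph and its Laplacian, which enter the objective through a trace term such as tr(Z L Z^T). A minimal sketch, assuming a fully connected cosine-similarity graph (no k-nearest-neighbor sparsification and no Euclidean-distance graph, both of which MLRR-ATV may use) and the unnormalized Laplacian L = D - S:

```python
import math

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between rows of X (cells x genes).

    For nonnegative expression data all entries lie in [0, 1].
    """
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in X]
    n = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]

def laplacian(S):
    """Unnormalized graph Laplacian L = D - W, where W is S with its diagonal zeroed."""
    n = len(S)
    W = [[0.0 if i == j else S[i][j] for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

X = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
L = laplacian(cosine_similarity_matrix(X))
```

Every row of L sums to zero, which is why the trace term penalizes differences between the representations of strongly similar cells rather than their overall magnitude.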
ABSTRACT
Gene regulatory networks (GRNs) are crucial for understanding the molecular mechanisms and processes of organisms. Constructing the GRN of epithelioma papulosum cyprini (EPC) cells of cyprinid fish infected with spring viremia of carp virus (SVCV) helps to clarify the immune regulatory mechanisms that enhance the survival of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells after SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features and neglect the valuable network structure information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells after SVCV infection. MEFFGRN considers both gene expression data and the network structure of known GRNs, and introduces a matrix enhancement method that addresses the sparsity of known GRNs and extracts richer network structure information. To exploit the strengths of convolutional neural networks (CNNs) in image processing, the gene expression data and the enhanced GRN data were each transformed into histogram images for every gene pair. These histograms were then fed separately into CNNs for training to obtain the corresponding gene expression and network structure features. Furthermore, a feature fusion mechanism was introduced to integrate the gene expression and network structure features comprehensively. This integration considers both the specificity of each feature and their interaction, yielding a more comprehensive and precise feature representation. Experimental results on both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
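Turning a gene pair into an image can be sketched as a normalized 2D histogram of the two genes' (log-transformed) values across cells. The bin count, the log10(x + 1) transform, and the binning range below are assumptions for illustration; the abstract does not specify how MEFFGRN builds its histograms or at what resolution.

```python
import math

def pair_histogram(x, y, bins=4):
    """Build a normalized 2D histogram ("image") for a gene pair.

    x, y: expression values of two genes across the same cells.
    Values are log10(v + 1)-transformed, then binned on [0, max] per gene.
    """
    lx = [math.log10(v + 1) for v in x]
    ly = [math.log10(v + 1) for v in y]
    mx, my = max(lx) or 1.0, max(ly) or 1.0  # avoid dividing by zero for all-zero genes
    H = [[0.0] * bins for _ in range(bins)]
    for a, b in zip(lx, ly):
        i = min(int(a / mx * bins), bins - 1)  # clamp the maximum into the last bin
        j = min(int(b / my * bins), bins - 1)
        H[i][j] += 1
    n = len(x)
    return [[c / n for c in row] for row in H]

# Two perfectly co-varying genes fill the diagonal of the image
img = pair_histogram([0, 9, 99, 999], [0, 9, 99, 999])
```

A CNN can then read co-expression patterns (diagonal, anti-diagonal, or scattered mass) from such images.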
Subject(s)
Fish Diseases , Gene Regulatory Networks , Rhabdoviridae Infections , Rhabdoviridae , Animals , Rhabdoviridae/genetics , Fish Diseases/genetics , Fish Diseases/virology , Rhabdoviridae Infections/genetics , Rhabdoviridae Infections/virology , Carps/genetics , Carps/virology , Computational Biology/methods , Neural Networks, Computer , Cyprinidae/genetics
ABSTRACT
Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology. Many outstanding computational methods have been proposed; however, challenges remain, especially on real datasets. In this study, we propose DGCGRN, a Directed Graph Convolutional neural network-based method for GRN inference. To better understand and process the directed graph structure of GRNs, we construct a directed graph convolutional neural network that retains the structural information of the directed graph while making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features with a Bi-GRU and by calculating statistical physicochemical characteristics of the gene sequences. During training, a dynamic update strategy converts the predicted edge scores into edge weights that guide subsequent training. Results on synthetic benchmark datasets and real datasets show that DGCGRN predicts significantly better than existing models. Furthermore, case studies on bladder urothelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
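A directed GRN distinguishes a gene's regulators (in-neighbors) from its targets (out-neighbors). The sketch below shows one propagation step that aggregates the two directions through separate channels, assuming scalar node features, mean aggregation, and fixed channel weights. It illustrates the directed-neighborhood idea only and is not DGCGRN's actual convolution operator.

```python
def directed_aggregate(x, edges, w_self=1.0, w_in=1.0, w_out=1.0):
    """One propagation step on a directed graph with separate in/out channels.

    x:     dict node -> scalar feature
    edges: list of (regulator, target) pairs
    The in-channel averages a node's regulators; the out-channel averages its targets.
    """
    preds = {v: [] for v in x}
    succs = {v: [] for v in x}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    def mean(vals):
        return sum(vals) / len(vals) if vals else 0.0

    return {
        v: w_self * x[v]
        + w_in * mean([x[u] for u in preds[v]])
        + w_out * mean([x[t] for t in succs[v]])
        for v in x
    }

x = {"A": 1.0, "B": 2.0, "C": 4.0}
edges = [("A", "B"), ("A", "C"), ("B", "C")]
# Only the regulator (in-neighbor) channel is active here
out = directed_aggregate(x, edges, w_self=1.0, w_in=0.5, w_out=0.0)
```

Because the two channels are kept apart, reversing an edge changes the result, which an undirected aggregation would not detect.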
Subject(s)
Computational Biology , Gene Regulatory Networks , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , Escherichia coli/genetics
ABSTRACT
BACKGROUND AND OBJECTIVE: Interleukin-6 (IL-6) is a critical factor for early warning, monitoring, and prognosis of the inflammatory storm in COVID-19 cases. IL-6 inducing peptides, which can induce production of the cytokine IL-6, are very important for the development of diagnostics and immunotherapy. Although existing methods have had some success in predicting IL-6 inducing peptides, the performance of these models in practical applications leaves room for improvement. METHODS: In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences to obtain a 636-dimensional feature vector, and we processed the data with the NearMiss3 undersampling method and the StandardScaler normalization method. Then, a 40-dimensional optimal feature vector was obtained with the Boruta feature selection method. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final model, UsIL-6. RESULTS: On the independent test dataset, UsIL-6 achieved an AUC of 0.87 and a BACC of 0.808, indicating that it performs better than existing methods at recognizing IL-6 inducing peptides. CONCLUSIONS: The performance comparison on the independent test dataset confirmed that UsIL-6 achieves the highest performance, the best robustness, and the strongest generalization ability. We hope that UsIL-6 will become a valuable method for identifying, annotating, and characterizing new IL-6 inducing peptides.
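Two computations named in the abstract are easy to spell out: column-wise standardization (the convention of scikit-learn's `StandardScaler`: subtract the mean and divide by the population standard deviation) and balanced accuracy (BACC), the mean of sensitivity and specificity. The plain-Python versions below are illustrative sketches, not the UsIL-6 pipeline; NearMiss3, Boruta, and the classifier are omitted.

```python
import math

def standardize(X):
    """Column-wise z-scoring with the population standard deviation (ddof=0).

    Constant columns are centered but left unscaled (scale treated as 1).
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(m)]
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in X]

def balanced_accuracy(y_true, y_pred):
    """BACC = (sensitivity + specificity) / 2 for binary labels {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

Z = standardize([[1.0, 5.0], [3.0, 5.0]])
# Sensitivity 1/2, specificity 3/4
bacc = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

BACC is preferred over plain accuracy when the positive and negative classes are imbalanced, which is also the motivation for undersampling.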
Subject(s)
Computational Biology , Interleukin-6 , Peptides , Humans , Peptides/chemistry , Computational Biology/methods , COVID-19 , Algorithms , Machine Learning , SARS-CoV-2
ABSTRACT
The pathogenesis of Alzheimer's disease (AD) is extremely intricate, which makes AD almost incurable. Recent studies have demonstrated that analyzing multi-modal data can offer a comprehensive perspective on the different stages of AD progression, which is beneficial for early diagnosis. In this paper, we propose a deep self-reconstruction fusion similarity hashing (DS-FSH) method that effectively captures AD-related biomarkers from multi-modal data and leverages them to diagnose AD. Because most existing methods ignore the topological structure of the data, we design a deep self-reconstruction model based on random walk graph regularization to reconstruct the multi-modal data and thereby learn the nonlinear relationships between samples. Additionally, we propose a fused similarity hash based on an anchor graph to generate discriminative binary hash codes for the reconstructed multi-modal data. This allows the fused similarity of samples to be modeled effectively by an anchor graph-based fusion similarity matrix, while modal correlation is approximated by the Hamming distance. Features extracted from the multi-modal data are then classified with a deep sparse autoencoder classifier. Finally, experiments conducted on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database show that DS-FSH outperforms comparable methods for AD classification. In conclusion, DS-FSH identifies multi-modal features closely associated with AD, which are expected to contribute significantly to understanding the pathogenesis of AD.
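Binary hash codes and Hamming distance can be illustrated with a simple sign-of-projection hash: each bit records which side of a hyperplane a feature vector falls on, so similar samples tend to share bits. The projection matrix here is fixed and hypothetical; DS-FSH learns its codes from an anchor-graph fusion similarity, which this sketch does not model.

```python
def sign_hash(x, projections):
    """Map a real feature vector to a binary code: bit k = 1 if p_k . x >= 0."""
    return [1 if sum(w * v for w, v in zip(p, x)) >= 0 else 0 for p in projections]

def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical fixed projections; a hashing model would learn these
P = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
code_a = sign_hash([2.0, 1.0], P)
code_b = sign_hash([1.5, 0.5], P)    # close to a in feature space
code_c = sign_hash([-1.0, -2.0], P)  # far from a in feature space
```

Comparing short binary codes with Hamming distance is much cheaper than comparing the original high-dimensional features.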
Subject(s)
Alzheimer Disease , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/diagnosis , Humans , Algorithms , Deep Learning , Magnetic Resonance Imaging/methods , Image Interpretation, Computer-Assisted/methods , Neuroimaging/methods , Brain/diagnostic imaging , Multimodal Imaging/methods
ABSTRACT
The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types, owing to the lack of benchmark datasets or to limitations of the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are typically large, directed, and highly sparse graphs, which impedes the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding of the GRN. Specifically, source and target generators are designed to learn low-dimensional dense embeddings of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlated, a correlation analysis module is designed. This module not only fully extracts gene expression features but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN is competitive for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer, and coronavirus disease 2019 are identified based on the candidate FGRNs, providing an opportunity to advance our knowledge of disease treatments.
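As a baseline for the correlation between regulators and targets, the Pearson coefficient of two expression profiles can be computed directly. As a rough heuristic (an assumption, not a rule stated in the abstract), a strongly positive value is consistent with activation and a strongly negative one with repression; DeepFGRN's correlation analysis module is a learned neural component, not this formula.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant profiles: no correlation

# Hypothetical profiles across four samples
regulator = [1.0, 2.0, 3.0, 4.0]
activated_target = [2.1, 3.9, 6.2, 7.8]
repressed_target = [8.0, 6.0, 4.0, 2.0]
r_act = pearson(regulator, activated_target)
r_rep = pearson(regulator, repressed_target)
```

Correlation alone cannot give direction (it is symmetric in its arguments), which is why a directed GRN model needs additional structure such as source and target embeddings.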
Subject(s)
Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods
ABSTRACT
Accumulating evidence indicates that microRNAs (miRNAs) can control and coordinate various biological processes. Consequently, abnormal expression of miRNAs has been linked to various complex diseases. Identifying miRNA-disease associations (MDAs) will contribute to the diagnosis and treatment of human diseases. However, traditional experimental verification of MDAs is laborious and limited in scale. It is therefore necessary to develop reliable and effective computational methods to predict novel MDAs. In this work, a multi-kernel graph attention deep autoencoder (MGADAE) method is proposed to predict potential MDAs. MGADAE first employs a multiple kernel learning (MKL) algorithm to construct integrated miRNA and disease similarities, providing more biological information for subsequent feature learning. Second, MGADAE combines the known MDAs, disease similarity, and miRNA similarity into a heterogeneous network and learns representations of miRNAs and diseases through graph convolution. Third, an attention mechanism integrates the representations from multiple graph convolutional network (GCN) layers. Finally, the integrated representations of miRNAs and diseases are fed into a bilinear decoder to obtain the predicted association scores. Experiments show that the proposed method outperforms existing advanced approaches in MDA prediction, and case studies on two human cancers further confirm the reliability of MGADAE in practice.
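A bilinear decoder scores a miRNA-disease pair as sigmoid(m^T W d), where m and d are the learned embeddings and W is a trainable interaction matrix. A minimal sketch, assuming 2-dimensional embeddings and an identity W purely for illustration (in MGADAE both the embeddings and W come from training):

```python
import math

def bilinear_score(m, d, W):
    """Association score sigmoid(m^T W d) for miRNA embedding m and disease embedding d."""
    Wd = [sum(W[i][j] * d[j] for j in range(len(d))) for i in range(len(W))]
    logit = sum(mi * wdi for mi, wdi in zip(m, Wd))
    return 1.0 / (1.0 + math.exp(-logit))

W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical interaction matrix (identity here)
s_related = bilinear_score([2.0, 0.0], [1.5, 0.0], W)    # aligned embeddings
s_unrelated = bilinear_score([2.0, 0.0], [0.0, 1.5], W)  # orthogonal embeddings
```

A non-identity W lets the decoder learn which embedding dimensions of a miRNA should interact with which dimensions of a disease.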
Subject(s)
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , Reproducibility of Results , Computational Biology/methods , Neoplasms/genetics , Algorithms
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) has rapidly emerged as a powerful technique for analyzing cellular heterogeneity at the level of individual cells. In scRNA-seq data analysis, cell clustering is a critical step for downstream analysis, because it enables the identification of cell types and the discovery of novel cell subtypes. However, characteristics of scRNA-seq data such as high dimensionality, sparsity, dropout events, and batch effects pose significant computational challenges for clustering. In this study, we propose scGCC, a novel graph self-supervised contrastive learning model, to address these challenges. scGCC comprises two main components: a representation learning module and a clustering module. The scRNA-seq data are first used to train the representation learning module, and the learned representations are then grouped by the clustering module. scGCC learns low-dimensional, denoised embeddings, which benefit the clustering task. We introduce Graph Attention Networks (GAT) for cell representation learning, which improves feature extraction and clustering accuracy. Additionally, we propose five data augmentation methods that improve clustering performance by increasing data diversity and reducing overfitting, which makes the clustering results more robust. Experiments on 14 real-world datasets demonstrate that our model achieves high accuracy and robustness. We also perform downstream tasks, including batch effect removal, trajectory inference, and marker gene analysis, to verify the biological effectiveness of our model.
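Graph contrastive learning typically pulls together the embeddings of the same cell under two augmentations and pushes apart those of different cells. The sketch below assumes random gene masking as one augmentation (the abstract does not list its five) and a simplified InfoNCE loss that draws negatives only from the other view; the embeddings are hand-picked rather than produced by a GAT encoder, and all names are hypothetical.

```python
import math
import random

def mask_genes(x, rate, rng):
    """Augmentation: zero out each gene independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def info_nce(z1, z2, tau=0.5):
    """Mean contrastive loss; z1[i] and z2[i] are two views of cell i (positive pair),
    and z2[j] for j != i serve as negatives for cell i."""
    n = len(z1)
    loss = 0.0
    for i in range(n):
        sims = [math.exp(cosine(z1[i], z2[j]) / tau) for j in range(n)]
        loss += -math.log(sims[i] / sum(sims))
    return loss / n

rng = random.Random(0)
augmented = mask_genes([1.0, 2.0, 3.0, 4.0], rate=0.25, rng=rng)

view1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each cell's second view stays close to its first
swapped = [[0.1, 0.9], [0.9, 0.1]]  # second views matched to the wrong cells
loss_aligned = info_nce(view1, aligned)
loss_swapped = info_nce(view1, swapped)
```

Minimizing this loss rewards an encoder whose embeddings are stable under masking, which is the property that makes them robust to dropout noise.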
Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Humans , Single-Cell Analysis/methods , Cluster Analysis , Data Analysis , Gene Expression Profiling/methods , Algorithms
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) technology has emerged as a powerful tool for investigating cellular heterogeneity within tissues, organs, and organisms. A fundamental question in single-cell gene expression analysis is the identification of cell types, a critical step in the data processing workflow. However, existing methods that identify cell types by learning low-dimensional latent embeddings often overlook the structural relationships between cells. In this paper, we present a novel non-negative low-rank similarity correction model (NLRSIM) that leverages subspace clustering to preserve the global structure among cells. The model introduces a new manifold learning process to address imbalanced neighbourhood spatial density among cells, thereby preserving local geometric structure. This procedure uses a locality-sensitive hashing algorithm to construct the graph structure of the data. The experimental results demonstrate that NLRSIM surpasses other advanced models in clustering and visualization experiments. Relevant biological analyses further confirm the usefulness of the gene expression information after correction by NLRSIM. NLRSIM offers new insights into gene expression, states, and structures at the individual cellular level, contributing novel perspectives to the field.
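The hashing step can be illustrated with random-hyperplane locality-sensitive hashing: cells whose signatures match land in the same bucket and become candidate graph neighbors, which avoids comparing every pair of cells. The hash family, number of hyperplanes, and function names are assumptions; the abstract does not specify which hashing scheme NLRSIM uses.

```python
import random
from collections import defaultdict

def lsh_buckets(X, n_planes=4, seed=0):
    """Bucket rows of X by random-hyperplane signatures (one bit per hyperplane)."""
    rng = random.Random(seed)
    dim = len(X[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for idx, x in enumerate(X):
        # Bit = 1 if the cell lies on the non-negative side of the hyperplane
        sig = tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0) for plane in planes)
        buckets[sig].append(idx)
    return buckets

def candidate_edges(buckets):
    """Every pair (i, j), i < j, of cells that share a bucket."""
    edges = set()
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                edges.add((members[a], members[b]))
    return edges

# Cells 0 and 1 point in the same direction; cell 2 points the opposite way
X_cells = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
edges = candidate_edges(lsh_buckets(X_cells, n_planes=6, seed=1))
```

More hyperplanes make buckets smaller and the candidate graph sparser; in practice several independent hash tables are often combined to recover neighbors that one table splits.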
Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
ABSTRACT
Accumulating evidence suggests that circRNAs play crucial roles in human diseases. Predicting circRNA-disease associations is extremely helpful for understanding pathogenesis, diagnosis, and prevention, and for identifying relevant biomarkers. In the past few years, many deep learning (DL)-based methods have been proposed for predicting circRNA-disease associations and have achieved impressive performance. However, these methods have two main drawbacks. First, they underutilize the biological information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel similarity, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features with accelerated attributed network embedding (AANE) and a dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms the competing methods. Moreover, 16 of the 20 circRNA-disease pairs with the highest prediction scores were validated by the relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.
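Two of the similarity measures named in the abstract have standard closed forms. The Gaussian interaction profile (GIP) kernel compares rows of a binary association matrix as K(i, j) = exp(-gamma * ||y_i - y_j||^2), with gamma equal to a base bandwidth gamma' divided by the mean squared row norm, and Jaccard similarity is |A and B| / |A or B|. The sketch assumes gamma' = 1, a common default; iCircDA-NEAE's exact parameter settings are not given, and the toy matrix is invented.

```python
import math

def gip_kernel(A, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary association matrix A."""
    n = len(A)
    mean_sq_norm = sum(sum(v * v for v in row) for row in A) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else gamma_prime
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(A[i], A[j]))
            K[i][j] = math.exp(-gamma * d2)
    return K

def jaccard(a, b):
    """Jaccard similarity of two binary profiles: |a AND b| / |a OR b|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

# Rows: circRNAs; columns: diseases (1 = known association)
A = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]
K = gip_kernel(A)
```

Disease-side similarities follow by applying the same functions to the columns (the rows of the transposed matrix).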
Subject(s)
Algorithms , RNA, Circular , Humans , RNA, Circular/genetics , Semantics
ABSTRACT
The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of these technologies is the use of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence of heterogeneity. Despite its promise, the inherent characteristics of scRNA-seq data, such as high noise levels and low coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method for identifying cell types. The fundamental low-rank representation (LRR) model captures the global structure of data. To address the limited ability of LRR to capture local structure, we introduce adjusted random walk graph regularization into its framework, so that ARGLRR captures both local and global structures in scRNA-seq data. Additionally, imposing similarity constraints on the LRR framework further improves the model's ability to estimate cell-to-cell similarity and capture global structural relationships between cells. ARGLRR outperforms other advanced comparison approaches on nine known scRNA-seq datasets. In the clustering experiments, ARGLRR exceeds the best-performing comparison method by 6.99% in normalized mutual information (NMI) and by 5.85% in Adjusted Rand Index (ARI). In addition, we visualize the results using Uniform Manifold Approximation and Projection (UMAP), which shows that ARGLRR improves the separation of different cell types within the similarity matrix.
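The Adjusted Rand Index reported in this abstract compares a predicted clustering with reference labels while correcting for chance agreement. A self-contained sketch from the contingency-table formula (NMI is omitted for brevity):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    expected = sum_rows * sum_cols / total  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

ARI ignores label names, so a clustering that matches the reference up to relabeling scores 1.0, while chance-level agreement scores about 0 and can even be negative.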
Subject(s)
Algorithms , RNA , Cluster Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA , Gene Expression Profiling
ABSTRACT
Biclustering algorithms are essential for processing gene expression data. However, most biclustering algorithms require the data matrix to be preprocessed into a binary matrix. This type of preprocessing may introduce noise into, or lose information from, the binary matrix, which reduces the algorithm's ability to find the optimal biclusters. In this paper, we propose a new preprocessing method named Mean-Standard Deviation (MSD) to address this problem. Additionally, we introduce a new biclustering algorithm called Weight Adjacency Difference Matrix Binary Biclustering (W-AMBB) to effectively process datasets containing overlapping biclusters. The basic idea is to create a weighted adjacency difference matrix by applying weights to a binary matrix derived from the data matrix. This makes it possible to identify genes with significant associations in the sample data by efficiently finding similar genes that respond to specific conditions. The performance of W-AMBB was tested on both synthetic and real datasets and compared with that of other classical biclustering methods. The experimental results demonstrate that W-AMBB is significantly more robust than the compared methods on the synthetic dataset. Additionally, GO enrichment analysis shows that the biclusters found by W-AMBB on real datasets are biologically meaningful.
Subject(s)
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Cluster Analysis , Gene Expression
ABSTRACT
Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is still far from ideal, so more effective methods are urgently needed. Moreover, most methods consider only gene expression data and ignore the network structure information contained in GRNs. In this study, we propose a supervised model named CNNGRN, which infers GRNs from bulk time-series expression data with a convolutional neural network (CNN) and more informative features. Bulk time-series gene expression data reflect the intricate regulatory associations between genes, and the network structure of the ground-truth GRN contains rich neighbor information. CNNGRN therefore integrates these two kinds of features as model inputs. A CNN is then used to extract intricate gene features and infer potential associations between regulators and target genes. Feature importance visualization experiments are also performed to identify the key features. Experimental results show that CNNGRN achieves competitive performance on benchmark datasets compared with state-of-the-art computational methods. Finally, the literature confirms that hub genes identified by CNNGRN are involved in biological processes.
Subject(s)
Algorithms , Gene Regulatory Networks , Gene Regulatory Networks/genetics , Time Factors , Neural Networks, Computer , Systems Biology , Computational Biology/methods
ABSTRACT
Circular RNAs (circRNAs) are a class of noncoding RNAs that are abundant in eukaryotes. They have recently been found to play a crucial role in tumor growth. It is therefore important to explore the associations between circRNAs and diseases. This paper proposes a new method based on DeepWalk and nonnegative matrix factorization (DWNMF) to predict circRNA-disease associations. Based on the known circRNA-disease associations, we calculate the topological similarity of circRNAs and of diseases with a DeepWalk-based method that learns node features on the association network. Next, the functional similarity of circRNAs and the semantic similarity of diseases are fused with their respective topological similarities at different scales. Then, we use an improved weighted K-nearest neighbor (IWKNN) method to preprocess the circRNA-disease association network and correct nonnegative associations, setting different parameters K1 and K2 for the circRNA and disease matrices. Finally, an L2,1-norm, a dual-graph regularization term, and a Frobenius norm regularization term are introduced into the nonnegative matrix factorization model to predict circRNA-disease associations. We perform cross-validation on circR2Disease, circRNADisease, and MNDR. The numerical results show that DWNMF is an efficient tool for predicting potential circRNA-disease associations and outperforms other state-of-the-art approaches in predictive performance.
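The core of DWNMF is nonnegative matrix factorization of the association matrix. The sketch below shows only plain NMF with Lee-Seung multiplicative updates for the Frobenius loss; the L2,1-norm, dual-graph, and Frobenius regularization terms, the DeepWalk similarities, and the IWKNN preprocessing are all omitted, and the toy matrix is invented for illustration.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(Y, W, H):
    """Squared Frobenius norm of Y - W H."""
    WH = matmul(W, H)
    return sum((Y[i][j] - WH[i][j]) ** 2 for i in range(len(Y)) for j in range(len(Y[0])))

def nmf(Y, rank, iters=200, seed=0, eps=1e-9):
    """Plain NMF, Y ~ W H, via multiplicative updates (no regularization terms)."""
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T Y) / (W^T W H)
        WtY = matmul(transpose(W), Y)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtY[i][j] / (WtWH[i][j] + eps) for j in range(n)] for i in range(rank)]
        # W <- W * (Y H^T) / (W H H^T)
        YHt = matmul(Y, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * YHt[i][j] / (WHHt[i][j] + eps) for j in range(rank)] for i in range(m)]
    return W, H

# Rows: circRNAs; columns: diseases (1 = known association)
Y = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 0],
     [0, 1, 1]]
W0, H0 = nmf(Y, rank=2, iters=0)   # random initialization only
W, H = nmf(Y, rank=2, iters=200)
scores = matmul(W, H)              # reconstructed scores, including unobserved pairs
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative; high reconstructed scores at zero entries of Y are the candidate new associations.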
Subject(s)
MicroRNAs , Neoplasms , Humans , RNA, Circular/genetics , Algorithms , Neoplasms/genetics , Cluster Analysis , Computational Biology/methods
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) technology provides expression profiles of individual cells and has opened a new chapter in biological research. Clustering individual cells based on their transcriptomes is a critical objective of scRNA-seq data analysis. However, the high-dimensional, sparse, and noisy nature of scRNA-seq data poses a challenge to single-cell clustering, so clustering methods tailored to these characteristics are urgently needed. Owing to its powerful subspace learning capability and robustness to noise, subspace segmentation based on low-rank representation (LRR) is widely used in clustering research and achieves satisfactory results. In view of this, we propose a personalized low-rank subspace clustering method, PLRLS, that learns more accurate subspace structures from both global and local perspectives. Specifically, we first introduce a local structure constraint that captures the local structure information of the data and helps our method obtain better inter-cluster separability and intra-cluster compactness. Then, to retain important similarity information that the LRR model ignores, we use the fractional function to extract similarity information between cells and introduce it into the LRR framework as a similarity constraint. The fractional function is an efficient similarity measure designed for scRNA-seq data, with both theoretical and practical implications. Finally, based on the LRR matrix learned by PLRLS, we perform downstream analyses on real scRNA-seq datasets, including spectral clustering, visualization, and marker gene identification. Comparative experiments show that the proposed method achieves superior clustering accuracy and robustness.
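For orientation, the generic LRR problem that PLRLS builds on can be written as follows. Here X is the genes-by-cells expression matrix, Z is the low-rank representation (coefficient) matrix, E is a column-sparse error term, and λ is a trade-off parameter. This is not PLRLS's full objective. The abstract does not give the exact forms of its local structure constraint or its fractional-function similarity constraint, which would enter as additional terms. The symmetrized affinity W is a common choice, assumed here, for turning Z into the input to spectral clustering.

```latex
\min_{Z,\,E}\; \|Z\|_{*} + \lambda \|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E,
\qquad
W = \tfrac{1}{2}\left(|Z| + |Z|^{\top}\right)
```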
Subject(s)
Algorithms , Single-Cell Gene Expression Analysis , Humans , Transcriptome , Cluster Analysis , Data Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods
ABSTRACT
Studies have revealed that microbes have an important effect on numerous physiological processes, so further research on the links between diseases and microbes is significant. Because laboratory methods are expensive and inefficient, computational models are increasingly used to discover disease-related microbes. Here, a new neighbor-based approach built on a two-tier Bi-Random Walk, named NTBiRW, is proposed to predict potential disease-related microbes. The method first constructs multiple microbe similarities and disease similarities. Then, three kinds of microbe/disease similarity are integrated with different weights through the two-tier Bi-Random Walk to obtain the final integrated microbe/disease similarity network. Finally, Weighted K Nearest Known Neighbors (WKNKN) is used for prediction based on the final similarity network. Leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV) are applied to evaluate NTBiRW, and multiple evaluation metrics are used to assess its performance from different perspectives. NTBiRW achieves better values than the compared methods on most of these metrics. Moreover, in case studies on atopic dermatitis and psoriasis, most of the top 10 candidates in the final result could be confirmed, demonstrating the capability of NTBiRW to discover new associations. Therefore, this method can contribute to the discovery of disease-related microbes and offer new insights into the pathogenesis of diseases.
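NTBiRW applies WKNKN to the final integrated similarity networks. The abstract does not spell out which WKNKN variant is used, so this sketch assumes the widely used formulation from drug–target interaction prediction. Each microbe row, and separately each disease column, of the association matrix is re-estimated from its K most similar other rows or columns, with weights that decay as η^r by neighbor rank r. The two estimates are averaged, and an element-wise maximum with the original matrix keeps every known association. The functions `knn_profiles` and `wknkn` and the parameters `k` and `eta` are hypothetical. `sim_m` and `sim_d` stand in for NTBiRW's integrated similarities, and the Bi-Random Walk step itself is not shown.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def knn_profiles(y: Matrix, sim: Matrix, k: int, eta: float) -> Matrix:
    """Re-estimate each row of y from its k most similar *other* rows.

    The neighbour at rank r (0-based) gets weight eta**r * sim[p][q]; the
    weighted sum is normalised by the total similarity of the k neighbours.
    """
    n = len(y)
    out = []
    for p in range(n):
        others = [q for q in range(n) if q != p]
        neighbours = sorted(others, key=lambda q: sim[p][q], reverse=True)[:k]
        z = sum(sim[p][q] for q in neighbours)
        row = [0.0] * len(y[0])
        for r, q in enumerate(neighbours):
            weight = (eta ** r) * sim[p][q]
            for j, value in enumerate(y[q]):
                row[j] += weight * value
        out.append([v / z if z > 0 else 0.0 for v in row])
    return out


def wknkn(y: Matrix, sim_m: Matrix, sim_d: Matrix,
          k: int = 3, eta: float = 0.7) -> Matrix:
    """Score unknown microbe-disease pairs (rows = microbes, columns =
    diseases); known associations are kept via the element-wise maximum."""
    from_microbes = knn_profiles(y, sim_m, k, eta)
    from_diseases = transpose(knn_profiles(transpose(y), sim_d, k, eta))
    return [
        [max(y[i][j], (from_microbes[i][j] + from_diseases[i][j]) / 2)
         for j in range(len(y[0]))]
        for i in range(len(y))
    ]
```

For example, with three microbes and two diseases, a microbe that is highly similar to a microbe known to be linked with disease 0 receives a positive score for disease 0, while the original 1 entries are left unchanged.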