1.
Article in English | MEDLINE | ID: mdl-38568771

ABSTRACT

The pathogenesis of Alzheimer's disease (AD) is extremely intricate, which makes AD almost incurable. Recent studies have demonstrated that analyzing multi-modal data can offer a comprehensive perspective on the different stages of AD progression, which is beneficial for early diagnosis of AD. In this paper, we propose a deep self-reconstruction fusion similarity hashing (DS-FSH) method to effectively capture AD-related biomarkers from multi-modal data and leverage them to diagnose AD. Given that most existing methods ignore the topological structure of the data, a deep self-reconstruction model based on random walk graph regularization is designed to reconstruct the multi-modal data, thereby learning the nonlinear relationships between samples. Additionally, a fused similarity hash based on an anchor graph is proposed to generate discriminative binary hash codes for the reconstructed multi-modal data. This allows the fused similarity between samples to be modeled effectively by an anchor-graph-based fusion similarity matrix, while modal correlation can be approximated by Hamming distance. In particular, the features extracted from the multi-modal data are classified using a deep sparse autoencoder classifier. Finally, experiments conducted on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database show that DS-FSH outperforms comparable methods of AD classification. To conclude, DS-FSH identifies multi-modal features closely associated with AD, which are expected to contribute significantly to the understanding of the pathogenesis of AD.
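
The abstract above states that correlation can be approximated by the Hamming distance between binary hash codes. As a minimal sketch of that one ingredient only (not the DS-FSH method itself), the following computes the Hamming distance between codes. The function name, the 8-bit code length, and the sample values are illustrative assumptions.

```python
def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


# Hypothetical 8-bit hash codes for three samples (illustrative values only).
codes = {
    "sample_1": [1, 0, 1, 1, 0, 0, 1, 0],
    "sample_2": [1, 0, 1, 0, 0, 0, 1, 0],
    "sample_3": [0, 1, 0, 0, 1, 1, 0, 1],
}

# A small distance suggests similar samples; a large one suggests dissimilar samples.
d12 = hamming_distance(codes["sample_1"], codes["sample_2"])
d13 = hamming_distance(codes["sample_1"], codes["sample_3"])
```

In this toy data, sample_1 and sample_2 differ in a single bit, while sample_3 is the bitwise complement of sample_1.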

2.
IEEE J Biomed Health Inform ; 28(5): 3029-3041, 2024 May.
Article in English | MEDLINE | ID: mdl-38427553

ABSTRACT

The roles of brain region activities and genotypic functions in the pathogenesis of Alzheimer's disease (AD) remain unclear. Meanwhile, current imaging genetics methods struggle to identify potential pathogenetic markers through correlation analysis between brain networks and genetic variation. To discover the disease-related brain connectome at the level of specific brain structures and at the fine-grained level, a functional brain network is first constructed for each subject based on the Automated Anatomical Labeling (AAL) and human Brainnetome atlases. Specifically, the upper-triangle elements of the functional connectivity matrix are extracted as connectivity features. The clustering coefficient and the average weighted node degree are developed to assess the significance of every brain area. Since the constructed brain network and genetic data are characterized by non-linearity, high dimensionality, and few subjects, a deep subspace clustering algorithm is proposed to reconstruct the original data. Our multilayer neural network helps capture the non-linear manifolds, and subspace clustering learns pairwise affinities between samples. Moreover, most approaches in neuroimaging genetics rely on unsupervised learning and neglect the diagnostic information related to diseases. We present a label constraint with diagnostic status to guide the imaging genetics correlation analysis. To this end, a diagnosis-guided deep subspace clustering association (DDSCA) method is developed to discover brain connectome and risk genetic factors by integrating genotypes with functional network phenotypes. Extensive experiments show that DDSCA achieves superior performance to most association methods and effectively selects disease-relevant genetic markers and brain connectome at the coarse-grained and fine-grained levels.


Subjects
Alzheimer Disease , Brain , Magnetic Resonance Imaging , Humans , Alzheimer Disease/genetics , Alzheimer Disease/diagnostic imaging , Cluster Analysis , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Connectome/methods , Algorithms , Aged , Biomarkers , Female , Male , Atlases as Topic , Neuroimaging/methods
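
The abstract for this entry describes extracting the upper-triangle elements of a functional connectivity matrix as features and scoring regions by weighted node degree. A minimal sketch of those two steps might look as follows. The 4-region matrix is made up, and the definition of "average weighted node degree" used here (mean absolute weight of a region's connections) is an assumption, since the abstract does not give a formula.

```python
def upper_triangle_features(conn):
    """Flatten the strict upper triangle of a symmetric matrix, row by row."""
    n = len(conn)
    return [conn[i][j] for i in range(n) for j in range(i + 1, n)]


def average_weighted_degree(conn):
    """Mean absolute connection weight per region, ignoring the diagonal.

    Assumed definition: sum of |w_ij| over j != i, divided by (n - 1).
    """
    n = len(conn)
    return [
        sum(abs(conn[i][j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical symmetric connectivity matrix for four brain regions.
conn = [
    [1.0, 0.5, 0.2, 0.0],
    [0.5, 1.0, 0.4, 0.1],
    [0.2, 0.4, 1.0, 0.3],
    [0.0, 0.1, 0.3, 1.0],
]

features = upper_triangle_features(conn)   # n * (n - 1) / 2 values
degrees = average_weighted_degree(conn)    # one score per region
```

Because the matrix is symmetric, the strict upper triangle holds every distinct region pair exactly once, so a matrix over n regions yields n(n - 1)/2 features.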
3.
IEEE J Biomed Health Inform ; 28(5): 3178-3185, 2024 May.
Article in English | MEDLINE | ID: mdl-38408006

ABSTRACT

CircRNA has been shown to play an important role in disease diagnosis and treatment. Because wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of the dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict potential associations between circRNAs and diseases. First, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and the disease similarity kernels. Then the vectors are concatenated to form the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and obtain the prediction score. The experimental results show that, compared with existing methods, KFDAE achieves the best performance. In addition, the results of case studies demonstrate the effectiveness and practical significance of KFDAE, indicating that KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab validation.


Subjects
Algorithms , Computational Biology , Neural Networks, Computer , RNA, Circular , Humans , RNA, Circular/genetics , Computational Biology/methods , Deep Learning
4.
Article in English | MEDLINE | ID: mdl-36912759

ABSTRACT

The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has some defects when used in mining the nonlinear manifold structure of the data and usually suffers from over-density of principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information in the process of mapping high-dimensional data to low-dimensional space, to more accurately obtain sparse principal components and to then identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser, thus increasing their interpretability. Then, we evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.


Subjects
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Principal Component Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
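
The abstract for this entry introduces the L2,p-norm to make principal components sparser. A minimal sketch of the norm itself, under the common definition (the sum of the p-th powers of the row-wise Euclidean norms, raised to 1/p), is given below. This is only the penalty term, not the RWPPCA optimization, and the two example matrices are made up.

```python
import math


def l2p_norm(matrix, p):
    """L2,p-norm: (sum over rows of ||row||_2 ** p) ** (1 / p), for p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    row_norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    return sum(r ** p for r in row_norms) ** (1.0 / p)


# Two hypothetical loading matrices with the same L2,1 value.
dense = [[3.0, 4.0], [3.0, 4.0]]        # row norms 5 and 5
row_sparse = [[6.0, 8.0], [0.0, 0.0]]   # row norms 10 and 0

same_at_p1 = (l2p_norm(dense, 1), l2p_norm(row_sparse, 1))
differ_at_half = (l2p_norm(dense, 0.5), l2p_norm(row_sparse, 0.5))
```

With p = 1 both matrices score the same, while with p = 0.5 the row-sparse matrix scores lower than the dense one. This illustrates why penalizing with p < 1 tends to favor solutions in which whole rows are zero.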
5.
IEEE J Biomed Health Inform ; 28(2): 1110-1121, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38055359

ABSTRACT

Accumulating evidence indicates that microRNAs (miRNAs) can control and coordinate various biological processes. Consequently, abnormal expression of miRNAs has been linked to various complex diseases. Identifying miRNA-disease associations (MDAs) will contribute to the diagnosis and treatment of human diseases. Nevertheless, traditional experimental verification of MDAs is laborious and limited to a small scale. Therefore, it is necessary to develop reliable and effective computational methods to predict novel MDAs. In this work, a multi-kernel graph attention deep autoencoder (MGADAE) method is proposed to predict potential MDAs. In detail, MGADAE first employs the multiple kernel learning (MKL) algorithm to construct an integrated miRNA similarity and an integrated disease similarity, providing more biological information for further feature learning. Second, MGADAE combines the known MDAs, disease similarity, and miRNA similarity into a heterogeneous network, then learns the representations of miRNAs and diseases through graph convolution operations. After that, an attention mechanism is introduced into MGADAE to integrate the representations from multiple graph convolutional network (GCN) layers. Lastly, the integrated representations of miRNAs and diseases are input into a bilinear decoder to obtain the final predicted association scores. Experiments show that the proposed method outperforms existing advanced approaches in MDA prediction. Furthermore, case studies related to two human cancers provide further confirmation of the reliability of MGADAE in practice.


Subjects
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , Reproducibility of Results , Computational Biology/methods , Neoplasms/genetics , Algorithms
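
The abstract for this entry ends with a bilinear decoder that turns miRNA and disease representations into association scores. A generic bilinear decoder can be sketched as below. The sigmoid squashing, the 2-dimensional embeddings, and the identity weight matrix are assumptions for illustration; the abstract does not specify these details, and in practice the weight matrix would be learned.

```python
import math


def bilinear_score(z_mirna, weight, z_disease):
    """Score one pair as sigmoid(z_mirna^T W z_disease)."""
    # W z_disease
    projected = [
        sum(weight[i][j] * z_disease[j] for j in range(len(z_disease)))
        for i in range(len(z_mirna))
    ]
    # z_mirna^T (W z_disease)
    raw = sum(z_mirna[i] * projected[i] for i in range(len(z_mirna)))
    return 1.0 / (1.0 + math.exp(-raw))


# Hypothetical embeddings and weight matrix (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
mirna = [1.0, 0.0]
aligned_disease = [1.0, 0.0]
orthogonal_disease = [0.0, 1.0]

score_aligned = bilinear_score(mirna, identity, aligned_disease)
score_orthogonal = bilinear_score(mirna, identity, orthogonal_disease)
```

With an identity weight matrix the decoder reduces to a sigmoid of the inner product, so aligned embeddings score above 0.5 and orthogonal ones score exactly 0.5.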
6.
Front Genet ; 14: 1249171, 2023.
Article in English | MEDLINE | ID: mdl-37614816

ABSTRACT

Identification of disease-associated long non-coding RNAs (lncRNAs) is crucial for unveiling the underlying genetic mechanisms of complex diseases. Multiple types of similarity networks of lncRNAs (or diseases) can characterize their similarities in a complementary and comprehensive way. Hence, in this study, we present a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential lncRNA-disease associations (LDAs). Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information about lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then Gaussian interaction profile (GIP) kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are built from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network, and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on different lncRNA-disease association feature sets to identify potential LDAs. iLncDA-RSN is evaluated by five-fold cross-validation to analyse its prediction performance, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of iLncDA-RSN in identifying potential LDAs.
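
The abstract above applies a random walk with restart (RWR) to similarity networks. A minimal sketch of standard RWR on a single small network is shown below. It iterates p = (1 - r) W p + r p0 with a column-normalized W until convergence. The restart probability of 0.5, the tolerance, and the 3-node path graph are illustrative assumptions, not values taken from the paper.

```python
def column_normalize(adj):
    """Scale each column to sum to 1 (columns of zeros are left as zeros)."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p = (1 - restart) * W p + restart * p0 until the L1 change < tol."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 3-node path graph: 0 - 1 - 2, with the walk seeded at node 0.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
proximity = random_walk_with_restart(adjacency, seed=0)
```

The resulting vector gives each node's steady-state visiting probability relative to the seed. Nodes closer to the seed receive higher values, which is what makes RWR usable as a smoothed similarity measure.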

7.
IEEE J Biomed Health Inform ; 27(10): 5187-5198, 2023 10.
Article in English | MEDLINE | ID: mdl-37498764

ABSTRACT

Advances in omics technology have enriched the understanding of the biological mechanisms of diseases, which has provided a new approach for cancer research. Multi-omics data contain different levels of cancer information, and comprehensive analysis of them has attracted wide attention. However, limited by the dimensionality of matrix models, traditional methods cannot fully use the key high-dimensional global structure of multi-omics data. Moreover, besides global information, local features within each omics are also critical. It is necessary to consider the potential local information together with the high-dimensional global information, ensuring that the shared and complementary features of the omics data are comprehensively observed. In view of the above, this article proposes a new tensor integrative framework called the strong complementarity tensor decomposition model (BioSTD) for cancer multi-omics data. It is used to identify cancer subtype specific genes and cluster subtype samples. Different from the matrix framework, BioSTD utilizes multi-view tensors to coordinate each omics to maximize high-dimensional spatial relationships, which jointly considers the different characteristics of different omics data. Meanwhile, we propose the concept of strong complementarity constraint applicable to omics data and introduce it into BioSTD. Strong complementarity is used to explore the potential local information, which can enhance the separability of different subtypes, allowing consistency and complementarity in the omics data to be fully represented. Experimental results on real cancer datasets show that our model outperforms other advanced models, which confirms its validity.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Multiomics
8.
BMC Genomics ; 24(1): 426, 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37516822

ABSTRACT

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore combining the functional and structural information of genes in protein interaction networks to identify driver genes. Therefore, we propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both functional and structural information about the genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas (TCGA). We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective on feature selection for identifying novel driver genes.


Subjects
Algorithms , Gene Regulatory Networks , Genetic Association Studies , Machine Learning , Protein Interaction Mapping
9.
IEEE J Biomed Health Inform ; 27(10): 5199-5209, 2023 10.
Article in English | MEDLINE | ID: mdl-37506010

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) technology has opened up a new perspective for studying disease mechanisms at the single-cell level. Cell clustering reveals the natural grouping of cells and is a vital step in scRNA-seq data analysis. However, the high noise and dropout of single-cell data pose numerous challenges to cell clustering. In this study, we propose a novel matrix factorization method named NLRRC for single-cell type identification. NLRRC combines non-negative low-rank representation (LRR) with random walk graph regularized NMF (RWNMFC) to accurately reveal the natural grouping of cells. Specifically, we find the lowest-rank representation of single-cell samples by non-negative LRR to reduce the difficulty of analyzing high-dimensional samples and capture the global information of the samples. Meanwhile, by using random walk graph regularization (RWGR) and NMF, RWNMFC captures manifold structure and cluster information before generating a cluster assignment matrix. The cluster assignment matrix contains cluster labels, which can be used directly to obtain the clustering results. The performance of NLRRC is validated on simulated and real single-cell datasets. The experimental results illustrate that NLRRC has a significant advantage in single-cell type identification.


Subjects
Algorithms , Single-Cell Analysis , Humans , Cluster Analysis , Gene Expression Profiling/methods
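
The abstract for this entry notes that cluster labels can be read directly from the cluster assignment matrix. One common way to do this with a non-negative factor matrix is to assign each cell to the cluster with the largest coefficient, sketched below. The orientation (rows are clusters, columns are cells) and the example values are assumptions; the abstract does not state how NLRRC lays out or decodes its matrix.

```python
def labels_from_assignment(assignment):
    """Assign each cell (column) to the cluster (row) with the largest weight.

    assignment[k][j] is the non-negative weight of cell j in cluster k.
    Ties resolve to the lowest cluster index.
    """
    n_clusters = len(assignment)
    n_cells = len(assignment[0])
    return [
        max(range(n_clusters), key=lambda k: assignment[k][j])
        for j in range(n_cells)
    ]


# Hypothetical 2-cluster assignment matrix for five cells.
assignment = [
    [0.9, 0.8, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.7, 0.9, 0.3],
]
labels = labels_from_assignment(assignment)
```

Reading labels this way avoids running a separate clustering step such as K-means on the factor matrix.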
10.
J Comput Biol ; 30(8): 937-947, 2023 08.
Article in English | MEDLINE | ID: mdl-37486669

ABSTRACT

Determining the association between drugs and diseases is important in drug development. However, existing approaches to predicting drug-disease associations (DDAs) are too homogeneous in terms of feature extraction. Here, a novel graph representation approach based on light gradient boosting machine (GRLGB) is proposed for the prediction of DDAs. After proteins were introduced into a heterogeneous network, node features were extracted from two perspectives: network topology and biological knowledge. Finally, the GRLGB classifier was applied to predict potential DDAs. GRLGB achieved satisfactory results on Bdataset and Fdataset under 10-fold cross-validation. To further demonstrate the reliability of GRLGB, case studies involving anxiety disorders and clozapine were conducted. The results suggest that GRLGB can identify novel DDAs.


Subjects
Computational Biology , Proteins , Reproducibility of Results , Computational Biology/methods , Algorithms
11.
Article in English | MEDLINE | ID: mdl-37459265

ABSTRACT

An increasing number of microRNAs (miRNAs) have been confirmed to be inextricably linked to various diseases, and the discovery of these associations has become a routine approach in the treatment of diseases. To overcome the time-consuming and laborious nature of traditional experiments for verifying miRNA-disease associations (MDAs), a variety of computational methods have emerged. However, these methods still have many shortcomings in terms of predictive performance and accuracy. In this study, a model based on multiple graph convolutional networks and random forest (MGCNRF) was proposed for the prediction of MDAs. Specifically, MGCNRF first mapped miRNA functional similarity and sequence similarity, disease semantic similarity and target similarity, and the known MDAs into four different two-layer heterogeneous networks. Second, MGCNRF fed the four heterogeneous networks into four different layered attention graph convolutional networks (GCNs), respectively, to extract MDA embeddings. Finally, MGCNRF integrated the embeddings of every MDA into the features of the miRNA-disease pair and predicted potential MDAs through the random forest (RF). Fivefold cross-validation was applied to verify the prediction performance of MGCNRF, which outperformed seven other state-of-the-art methods in terms of area under the curve. Furthermore, the accuracy and the case studies of different diseases further demonstrate the scientific rationale of MGCNRF. In conclusion, MGCNRF can serve as a scientific tool for predicting potential MDAs.

12.
J Comput Biol ; 30(8): 926-936, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37466461

ABSTRACT

Clinical trials indicate that the dysregulation of microRNAs (miRNAs) is closely associated with the development of diseases. Therefore, predicting miRNA-disease associations is significant for studying the pathogenesis of diseases. Since traditional wet-lab methods are resource-intensive, cost-saving computational models can be an effective complementary tool in biological experiments. In this work, a method based on locality-constrained linear coding, named ILLCEL, is proposed to predict these associations. ILLCEL adopts miRNA sequence similarity, miRNA functional similarity, disease semantic similarity, and interaction profile similarity obtained by locality-constrained linear coding (LLC) as prior information. Next, features and similarities extracted from multiple perspectives are input to the ensemble learning framework to improve the comprehensiveness of the prediction. Notably, the introduction of hypergraph regularization terms improves the accuracy of prediction by describing complex associations between samples. The results under fivefold cross-validation indicate that ILLCEL achieves superior prediction performance. In case studies, known associations are accurately predicted and novel associations are verified in HMDD v3.2, miRCancer, and existing literature. It is concluded that ILLCEL can serve as a powerful tool for inferring potential associations.

13.
J Comput Biol ; 30(8): 848-860, 2023 08.
Article in English | MEDLINE | ID: mdl-37471220

ABSTRACT

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies involves the employment of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence for revealing heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose a method called Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of data. To address the limited ability of the LRR method to capture local structure, we introduced adjusted random walk graph regularization in its framework. ARGLRR allows for the capture of both local and global structures in scRNA-seq data. Additionally, the imposition of similarity constraints into the LRR framework further improves the ability of the proposed model to estimate cell-to-cell similarity and capture global structural relationships between cells. ARGLRR surpasses other advanced comparison approaches on nine known scRNA-seq data sets judging by the results. In the normalized mutual information and Adjusted Rand Index metrics on the scRNA-seq data sets clustering experiments, ARGLRR outperforms the best-performing comparative method by 6.99% and 5.85%, respectively. In addition, we visualize the result using Uniform Manifold Approximation and Projection. Visualization results show that the usage of ARGLRR enhances the separation of different cell types within the similarity matrix.


Subjects
Algorithms , RNA , Cluster Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA , Gene Expression Profiling
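
The abstract for this entry reports clustering quality with the Adjusted Rand Index (ARI). For readers unfamiliar with the metric, a minimal pure-Python sketch of the standard ARI formula follows. It is independent of the ARGLRR method, and the label lists are made-up examples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI: pair-counting agreement corrected for chance."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Pairs grouped together in both partitions, in the truth, and in the prediction.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case: both partitions are trivial (one cluster or all singletons).
        return 1.0
    return (sum_both - expected) / (max_index - expected)


# Hypothetical cell-type labels: a relabeled perfect match and a poor match.
truth = [0, 0, 1, 1]
perfect = adjusted_rand_index(truth, [1, 1, 0, 0])
poor = adjusted_rand_index(truth, [0, 1, 0, 1])
```

ARI is invariant to how clusters are named, so the relabeled partition still scores 1.0, while a partition that splits every true cluster scores below zero, meaning worse than chance.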
14.
J Comput Biol ; 30(8): 889-899, 2023 08.
Article in English | MEDLINE | ID: mdl-37471239

ABSTRACT

The analysis of cancer data from multi-omics can effectively promote cancer research. The main focus of this article is to cluster cancer samples and identify feature genes to reveal the correlation between cancers and genes, with the primary approach being the analysis of multi-view cancer omics data. Our proposed solution, the Multi-View Enhanced Tensor Nuclear Norm and Local Constraint (MVET-LC) model, aims to utilize the consistency and complementarity of omics data to support biological research. The model is designed to maximize the utilization of multi-view data and incorporates a nuclear norm and local constraint to achieve this goal. The first step involves introducing the concept of enhanced partial sum of tensor nuclear norm, which significantly enhances the flexibility of the tensor nuclear norm. After that, we incorporate total variation regularization into the MVET-LC model to further augment its performance. It enables MVET-LC to make use of the relationship between tensor data structures and sparse data while paying attention to the feature details of the tensor data. To tackle the iterative optimization problem of MVET-LC, the alternating direction method of multipliers is utilized. Through experimental validation, it is demonstrated that our proposed model outperforms other comparison models.


Subjects
Algorithms , Neoplasms , Humans , Neoplasms/genetics , Cluster Analysis
15.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2802-2809, 2023.
Article in English | MEDLINE | ID: mdl-37285246

ABSTRACT

Biclustering algorithms are essential for processing gene expression data. However, to process the dataset, most biclustering algorithms require preprocessing the data matrix into a binary matrix. Regrettably, this type of preprocessing may introduce noise or cause information loss in the binary matrix, which would reduce the biclustering algorithm's ability to effectively obtain the optimal biclusters. In this paper, we propose a new preprocessing method named Mean-Standard Deviation (MSD) to resolve the problem. Additionally, we introduce a new biclustering algorithm called Weight Adjacency Difference Matrix Binary Biclustering (W-AMBB) to effectively process datasets containing overlapping biclusters. The basic idea is to create a weighted adjacency difference matrix by applying weights to a binary matrix that is derived from the data matrix. This allows us to identify genes with significant associations in sample data by efficiently identifying similar genes that respond to specific conditions. Furthermore, the performance of the W-AMBB algorithm was tested on both synthetic and real datasets and compared with other classical biclustering methods. The experiment results demonstrate that the W-AMBB algorithm is significantly more robust than the compared biclustering methods on the synthetic dataset. Additionally, the results of the GO enrichment analysis show that the W-AMBB method possesses biological significance on real datasets.


Subjects
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Cluster Analysis , Gene Expression
16.
IEEE J Biomed Health Inform ; 27(7): 3686-3694, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37163398

ABSTRACT

Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDA prediction. Most current DDA prediction methods use the original DDA matrix directly. However, the original DDA matrix is sparse, which greatly affects the prediction results. Hence, a prediction method based on a multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDA prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form a new drug similarity and a new disease similarity, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDA matrix is reconstructed by weighted K nearest neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, the graph convolutional autoencoder with an attention mechanism is used to predict DDAs. Compared with existing methods, MSGCA shows superior results on three datasets. Furthermore, case studies further demonstrate the reliability of MSGCA.


Subjects
Algorithms , Humans , Reproducibility of Results
17.
BMC Genomics ; 24(1): 279, 2023 May 25.
Article in English | MEDLINE | ID: mdl-37226081

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. Identifying potential associations between piRNAs and diseases is of great significance for complex diseases. Because traditional wet-lab experiments are time-consuming and expensive, predicting piRNA-disease associations by computational methods is of great value. METHODS: In this paper, a method based on the embedding transformation graph convolution network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed based on the similarity information of piRNAs and diseases, as well as the known piRNA-disease associations, and is used to extract low-dimensional embeddings of piRNAs and diseases with a graph convolutional network that has an attention mechanism. Furthermore, an embedding transformation module is developed to address the problem of embedding space inconsistency; it is more lightweight and has stronger learning ability and higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS: Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than the other five selected computational models. Case studies on head and neck squamous cell carcinoma and Alzheimer's disease further demonstrate the superior performance of ETGPDA. CONCLUSIONS: Hence, ETGPDA is an effective method for predicting hidden piRNA-disease associations.


Subjects
Alzheimer Disease , Head and Neck Neoplasms , Humans , Piwi-Interacting RNA , Alzheimer Disease/genetics , Learning , Research Design
18.
IEEE J Biomed Health Inform ; 27(5): 2575-2584, 2023 05.
Article in English | MEDLINE | ID: mdl-37027680

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology can provide the expression profiles of single cells, which propels biological research into a new chapter. Clustering individual cells based on their transcriptomes is a critical objective of scRNA-seq data analysis. However, the high-dimensional, sparse, and noisy nature of scRNA-seq data poses a challenge to single-cell clustering. Therefore, there is an urgent need for a clustering method that targets the characteristics of scRNA-seq data. Owing to its powerful subspace learning capability and robustness to noise, the subspace segmentation method based on low-rank representation (LRR) is widely used in clustering research and achieves satisfactory results. In view of this, we propose a personalized low-rank subspace clustering method, namely PLRLS, to learn more accurate subspace structures from both global and local perspectives. Specifically, we first introduce a local structure constraint to capture the local structure information of the data, which also helps our method obtain better inter-cluster separability and intra-cluster compactness. Then, in order to retain the important similarity information that is ignored by the LRR model, we use the fractional function to extract similarity information between cells and introduce this information into the LRR framework as a similarity constraint. The fractional function is an efficient similarity measure designed for scRNA-seq data, and it has both theoretical and practical implications. Finally, based on the LRR matrix learned by PLRLS, we perform downstream analyses on real scRNA-seq datasets, including spectral clustering, visualization, and marker gene identification. Comparative experiments show that the proposed method achieves superior clustering accuracy and robustness.


Subjects
Algorithms , Single-Cell Gene Expression Analysis , Humans , Transcriptome , Cluster Analysis , Data Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods
19.
Comput Biol Chem ; 104: 107862, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37031647

ABSTRACT

Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types. Most clustering algorithms perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroid is updated as the cell dimension decreases to better form new clusters and improve the quality of clustering information. In addition, the low-rank representation model ignores local geometric information, so a graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate the effectiveness of the new method. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.


Subjects
Algorithms , Cluster Analysis
20.
IEEE J Biomed Health Inform ; 27(6): 2968-2979, 2023 06.
Article in English | MEDLINE | ID: mdl-37030856

ABSTRACT

In this study, we proposed a novel method called the graph capsule convolutional network (GCCN) to predict the progression from mild cognitive impairment to dementia and identify its pathogenesis. First, we proposed a novel risk gene discovery component to indirectly target genes with higher interactions with others. These risk genes and brain regions were collected as nodes to construct heterogeneous pathogenic information association graphs. Second, graph capsules were established by projecting heterogeneous pathogenic information into a set of disentangled latent components. The orientation and length of the capsules represent the format and intensity of the pathogenic information. Third, the graph capsule convolutional network was used to model the information flows among pathogenic factors and to describe the convergence of primary capsules into advanced capsules. The advanced capsule is a concept that organizes pathogenic information based on its consistency, and the synergistic effects of advanced capsules direct the development of the disease. Finally, discriminative pathogenic information flows were captured by a straightforward built-in interpretation mechanism, i.e., the dynamic routing mechanism, and applied to the identification of pathogenesis. GCCN has been shown experimentally to achieve significantly improved performance on public datasets. Further experiments have shown that the pathogenic factors identified by GCCN are well supported by evidence and closely related to progressive mild cognitive impairment.


Subjects
Cognitive Dysfunction , Humans , Capsules , Cognitive Dysfunction/diagnostic imaging , Cognitive Dysfunction/genetics , Diagnostic Imaging