Results 1 - 20 of 59
1.
IEEE J Biomed Health Inform ; 28(5): 3029-3041, 2024 May.
Article in English | MEDLINE | ID: mdl-38427553

ABSTRACT

The roles of brain region activities and genotypic functions in the pathogenesis of Alzheimer's disease (AD) remain unclear. Meanwhile, current imaging genetics methods struggle to identify potential pathogenetic markers through correlation analysis between brain networks and genetic variation. To discover the disease-related brain connectome at both the specific brain structure level and the fine-grained level, the functional brain network is first constructed for each subject based on the Automated Anatomical Labeling (AAL) and human Brainnetome atlases. Specifically, the upper triangle elements of the functional connectivity matrix are extracted as connectivity features. The clustering coefficient and the average weighted node degree are developed to assess the significance of every brain area. Since the constructed brain network and genetic data are characterized by non-linearity, high dimensionality, and few subjects, a deep subspace clustering algorithm is proposed to reconstruct the original data. Our multilayer neural network helps capture the non-linear manifolds, and subspace clustering learns pairwise affinities between samples. Moreover, most approaches in neuroimaging genetics are unsupervised, neglecting the diagnostic information related to diseases. We present a label constraint with diagnostic status to guide the imaging genetics correlation analysis. To this end, a diagnosis-guided deep subspace clustering association (DDSCA) method is developed to discover brain connectome and risk genetic factors by integrating genotypes with functional network phenotypes. Extensive experiments show that DDSCA achieves superior performance to most association methods and effectively selects disease-relevant genetic markers and brain connectome at the coarse-grained and fine-grained levels.


Subject(s)
Alzheimer Disease , Brain , Magnetic Resonance Imaging , Humans , Alzheimer Disease/genetics , Alzheimer Disease/diagnostic imaging , Cluster Analysis , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Connectome/methods , Algorithms , Aged , Biomarkers , Female , Male , Atlases as Topic , Neuroimaging/methods
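The feature-extraction step named in the abstract above (upper triangle of a functional connectivity matrix, plus a per-region weighted degree) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy 4-region matrix, and the reading of "average weighted node degree" as the mean absolute weight to all other regions are assumptions.

```python
def upper_triangle_features(fc):
    """Flatten the strictly upper-triangular entries of a square matrix, row by row."""
    n = len(fc)
    return [fc[i][j] for i in range(n) for j in range(i + 1, n)]


def average_weighted_degree(fc):
    """Mean absolute connection weight from each node to every other node.

    Assumption: this is one plausible reading of 'average weighted node degree';
    the abstract does not give the exact formula.
    """
    n = len(fc)
    return [sum(abs(fc[i][j]) for j in range(n) if j != i) / (n - 1) for i in range(n)]


# Toy symmetric connectivity matrix for 4 regions (values are made up).
fc = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.5, 0.2],
    [0.1, 0.5, 1.0, 0.6],
    [0.3, 0.2, 0.6, 1.0],
]

features = upper_triangle_features(fc)  # n * (n - 1) / 2 = 6 values
degrees = average_weighted_degree(fc)   # one value per region
```

For a symmetric matrix with n regions, the upper triangle holds n(n-1)/2 unique connections, which is why only that half is kept as the feature vector.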
2.
IEEE J Biomed Health Inform ; 28(5): 3178-3185, 2024 May.
Article in English | MEDLINE | ID: mdl-38408006

ABSTRACT

CircRNA has been shown to play an important role in disease diagnosis and treatment. Because wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict potential associations between circRNAs and diseases. First, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and disease similarity kernels. Then the vectors are connected to build the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and produce the prediction score. The experimental results show that, compared with existing methods, KFDAE achieves the best performance. In addition, the results of case studies support the effectiveness and practical significance of KFDAE, which suggests that KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab validation.


Subject(s)
Algorithms , Computational Biology , Circular RNA , Humans , Circular RNA/genetics , Computational Biology/methods , Deep Learning
3.
IEEE J Biomed Health Inform ; 28(2): 1110-1121, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38055359

ABSTRACT

Accumulating evidence indicates that microRNAs (miRNAs) can control and coordinate various biological processes. Consequently, abnormal expression of miRNAs has been linked to various complex diseases. Identification of miRNA-disease associations (MDAs) will contribute to the diagnosis and treatment of human diseases. Nevertheless, traditional experimental verification of MDAs is laborious and limited to a small scale. Therefore, it is necessary to develop reliable and effective computational methods to predict novel MDAs. In this work, a multi-kernel graph attention deep autoencoder (MGADAE) method is proposed to predict potential MDAs. In detail, MGADAE first employs the multiple kernel learning (MKL) algorithm to construct an integrated miRNA similarity and disease similarity, providing more biological information for further feature learning. Second, MGADAE combines the known MDAs, disease similarity, and miRNA similarity into a heterogeneous network, then learns the representations of miRNAs and diseases through graph convolution operations. After that, an attention mechanism is introduced into MGADAE to integrate the representations from multiple graph convolutional network (GCN) layers. Lastly, the integrated representations of miRNAs and diseases are input into the bilinear decoder to obtain the final predicted association scores. Experiments show that the proposed method outperforms existing advanced approaches in MDA prediction. Furthermore, case studies related to two human cancers provide further confirmation of the reliability of MGADAE in practice.


Subject(s)
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , Reproducibility of Results , Computational Biology/methods , Neoplasms/genetics , Algorithms
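The final step in the abstract above feeds miRNA and disease representations into a bilinear decoder. A common form of such a decoder scores a pair as sigmoid(z_m^T W z_d). The sketch below assumes that form; the names `bilinear_score`, `z_m`, `z_d`, and `W`, and the toy vectors, are hypothetical and not taken from the MGADAE paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_score(z_m, W, z_d):
    """Association score sigmoid(z_m^T W z_d) for one (miRNA, disease) pair.

    z_m: miRNA embedding, z_d: disease embedding, W: weight matrix of shape
    len(z_m) x len(z_d). In a trained model W would be learned.
    """
    Wz = [sum(W[r][c] * z_d[c] for c in range(len(z_d))) for r in range(len(W))]
    return sigmoid(sum(z_m[r] * Wz[r] for r in range(len(z_m))))


# Toy 2-dimensional embeddings with an identity weight matrix.
W = [[1.0, 0.0], [0.0, 1.0]]
aligned = bilinear_score([1.0, 0.0], W, [1.0, 0.0])     # embeddings point the same way
orthogonal = bilinear_score([1.0, 0.0], W, [0.0, 1.0])  # no shared direction
```

With an identity `W` the decoder reduces to a dot product, so aligned embeddings score above 0.5 and orthogonal ones score exactly 0.5.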
4.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3737-3747, 2023.
Article in English | MEDLINE | ID: mdl-37751340

ABSTRACT

Single-cell RNA sequencing (scRNA-Seq) technology has emerged as a powerful tool to investigate cellular heterogeneity within tissues, organs, and organisms. One fundamental question in single-cell gene expression data analysis is the identification of cell types, which constitutes a critical step within the data processing workflow. However, existing methods for cell type identification through learning low-dimensional latent embeddings often overlook the intercellular structural relationships. In this paper, we present a novel non-negative low-rank similarity correction model (NLRSIM) that leverages subspace clustering to preserve the global structure among cells. This model introduces a novel manifold learning process to address the issue of imbalanced neighbourhood spatial density in cells, thereby effectively preserving local geometric structures. This procedure utilizes a position-sensitive hashing algorithm to construct the graph structure of the data. The experimental results demonstrate that NLRSIM surpasses other advanced models in clustering and visualization experiments. The effectiveness of the gene expression information calibrated by NLRSIM has also been validated in relevant biological studies. The NLRSIM model offers unprecedented insights into gene expression, states, and structures at the individual cellular level, thereby contributing novel perspectives to the field.


Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , RNA Sequence Analysis/methods , Gene Expression Profiling/methods
5.
J Comput Biol ; 30(8): 926-936, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37466461

ABSTRACT

Clinical trials indicate that the dysregulation of microRNAs (miRNAs) is closely associated with the development of diseases. Therefore, predicting miRNA-disease associations is significant for studying the pathogenesis of diseases. Since traditional wet-lab methods are resource-intensive, cost-saving computational models can be an effective complementary tool in biological experiments. In this work, a method based on locality-constrained linear coding (ILLCEL) is proposed to predict associations. ILLCEL adopts miRNA sequence similarity, miRNA functional similarity, disease semantic similarity, and interaction profile similarity obtained by locality-constrained linear coding (LLC) as prior information. Next, features and similarities extracted from multiple perspectives are input to the ensemble learning framework to improve the comprehensiveness of the prediction. Notably, the introduction of hypergraph regularization terms improves the accuracy of prediction by describing complex associations between samples. The results under fivefold cross-validation indicate that ILLCEL achieves superior prediction performance. In case studies, known associations are accurately predicted and novel associations are verified in HMDD v3.2, miRCancer, and existing literature. It is concluded that ILLCEL can serve as a powerful tool for inferring potential associations.

6.
J Comput Biol ; 30(8): 937-947, 2023 08.
Article in English | MEDLINE | ID: mdl-37486669

ABSTRACT

Determining the association between drug and disease is important in drug development. However, existing approaches for drug-disease association (DDA) prediction are too homogeneous in terms of feature extraction. Here, a novel graph representation approach based on light gradient boosting machine (GRLGB) is proposed for the prediction of DDAs. After the introduction of proteins into a heterogeneous network, node features were extracted from two perspectives: network topology and biological knowledge. Finally, the GRLGB classifier was applied to predict potential DDAs. GRLGB achieved satisfactory results on Bdataset and Fdataset under 10-fold cross-validation. To further assess the reliability of GRLGB, case studies involving anxiety disorders and clozapine were conducted. The results suggest that GRLGB can identify novel DDAs.


Subject(s)
Computational Biology , Proteins , Reproducibility of Results , Computational Biology/methods , Algorithms
7.
IEEE J Biomed Health Inform ; 27(10): 5187-5198, 2023 10.
Article in English | MEDLINE | ID: mdl-37498764

ABSTRACT

Advances in omics technology have enriched the understanding of the biological mechanisms of diseases, which has provided a new approach for cancer research. Multi-omics data contain different levels of cancer information, and comprehensive analysis of them has attracted wide attention. However, limited by the dimensionality of matrix models, traditional methods cannot fully use the key high-dimensional global structure of multi-omics data. Moreover, besides global information, local features within each omics are also critical. It is necessary to consider the potential local information together with the high-dimensional global information, ensuring that the shared and complementary features of the omics data are comprehensively observed. In view of the above, this article proposes a new tensor integrative framework called the strong complementarity tensor decomposition model (BioSTD) for cancer multi-omics data. It is used to identify cancer subtype specific genes and cluster subtype samples. Different from the matrix framework, BioSTD utilizes multi-view tensors to coordinate each omics to maximize high-dimensional spatial relationships, which jointly considers the different characteristics of different omics data. Meanwhile, we propose the concept of strong complementarity constraint applicable to omics data and introduce it into BioSTD. Strong complementarity is used to explore the potential local information, which can enhance the separability of different subtypes, allowing consistency and complementarity in the omics data to be fully represented. Experimental results on real cancer datasets show that our model outperforms other advanced models, which confirms its validity.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Multiomics
8.
IEEE J Biomed Health Inform ; 27(7): 3686-3694, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37163398

ABSTRACT

Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDA prediction. Most current DDA prediction methods utilize the original DDA matrix directly. However, the original DDA matrix is sparse, which greatly affects the prediction results. Hence, a prediction method based on multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDA prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form a new drug similarity and a new disease similarity, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDA matrix is reconstructed by weighted K nearest neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, the graph convolutional autoencoder with attention mechanism is utilized to predict DDAs. Compared with extant methods, MSGCA shows superior results on three datasets. Case studies further demonstrate the reliability of MSGCA.


Subject(s)
Algorithms , Humans , Reproducibility of Results
9.
Article in English | MEDLINE | ID: mdl-37022835

ABSTRACT

Studies have revealed that microbes have an important effect on numerous physiological processes, and further research on the links between diseases and microbes is significant. Given that laboratory methods are expensive and not optimized, computational models are increasingly used for discovering disease-related microbes. Here, a new neighbor approach based on two-tier Bi-Random Walk, known as NTBiRW, is proposed for predicting potential disease-related microbes. In this method, the first step is to construct multiple microbe similarities and disease similarities. Then, three kinds of microbe/disease similarity are integrated through two-tier Bi-Random Walk to obtain the final integrated microbe/disease similarity network with different weights. Finally, Weighted K Nearest Known Neighbors (WKNKN) is used for prediction based on the final similarity network. In addition, leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV) are applied to evaluate the performance of NTBiRW. Multiple evaluation metrics are used to show the performance from multiple perspectives, and most of the metric values of NTBiRW are better than those of the compared methods. Moreover, in case studies on atopic dermatitis and psoriasis, most of the first 10 candidates in the final result can be confirmed. This also demonstrates the capability of NTBiRW for discovering new associations. Therefore, this method can contribute to the discovery of disease-related microbes and thus offer new insights for further understanding the pathogenesis of diseases.
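The WKNKN step named in the abstract above fills sparse association profiles from similar neighbors. The sketch below is a simplified, row-side-only illustration: each row is re-estimated as a decayed, similarity-weighted average of its K most similar rows, and the larger of the original and estimated values is kept. The function name `wknkn_rows`, the defaults `K=2` and `eta=0.5`, and the toy data are assumptions; published descriptions of WKNKN typically also apply the same step on the column side and combine both estimates, which is omitted here.

```python
def wknkn_rows(Y, S, K=2, eta=0.5):
    """Row-side weighted K-nearest-neighbor profile completion (simplified sketch).

    Y: association matrix (rows = microbes, columns = diseases), entries in [0, 1].
    S: row-by-row similarity matrix.
    K: number of neighbors; eta: decay applied to the r-th nearest neighbor (eta ** r).
    """
    n, m = len(Y), len(Y[0])
    completed = []
    for q in range(n):
        # Other rows sorted by decreasing similarity to row q, truncated to K.
        neighbors = sorted((j for j in range(n) if j != q),
                           key=lambda j: S[q][j], reverse=True)[:K]
        norm = sum(S[q][j] for j in neighbors)
        if norm == 0:
            profile = [0.0] * m
        else:
            profile = [
                sum((eta ** r) * S[q][j] * Y[j][c] for r, j in enumerate(neighbors)) / norm
                for c in range(m)
            ]
        # Known associations are never lowered: keep the element-wise maximum.
        completed.append([max(Y[q][c], profile[c]) for c in range(m)])
    return completed


# Toy data: row 2 has no known associations but is similar to row 0.
Y = [[1, 0], [0, 1], [0, 0]]
S = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.2], [0.8, 0.2, 1.0]]
Y_new = wknkn_rows(Y, S)
```

In the toy data the empty third row receives a non-zero profile dominated by its nearest neighbor, which is the effect that makes a sparse matrix easier to learn from.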

10.
Comput Biol Chem ; 103: 107833, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36812824

ABSTRACT

Many experiments have shown that long non-coding RNAs (lncRNAs) in humans are implicated in disease development. The prediction of lncRNA-disease associations is essential in promoting disease treatment and drug development. It is time-consuming and laborious to explore the relationship between lncRNAs and diseases in the laboratory. The computation-based approach has clear advantages and has become a promising research direction. This paper proposes a new lncRNA-disease association prediction algorithm, BRWMC. First, BRWMC constructs several lncRNA (disease) similarity networks based on different measurement angles and fuses them into an integrated similarity network by similarity network fusion (SNF). In addition, the random walk method is used to preprocess the known lncRNA-disease association matrix and calculate the estimated scores of potential lncRNA-disease associations. Finally, the matrix completion method accurately predicts the potential lncRNA-disease associations. Under the framework of leave-one-out cross-validation and 5-fold cross-validation, the AUC values obtained by BRWMC are 0.9610 and 0.9739, respectively. In addition, case studies of three common diseases show that BRWMC is a reliable method for prediction.


Subject(s)
Long Noncoding RNA , Humans , Long Noncoding RNA/genetics , Computational Biology/methods , Algorithms
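The abstract above uses a random walk over a similarity network to estimate association scores but does not say which variant. As an illustration only, the sketch below swaps in random walk with restart (RWR), a common choice: p <- (1 - r) W p + r p0, with W the column-normalized similarity matrix. The function names, the restart probability of 0.3, and the toy path graph are assumptions, not details of BRWMC.

```python
def column_normalize(S):
    """Scale each column of S to sum to 1 (all-zero columns are left as zeros)."""
    n = len(S)
    col_sums = [sum(S[i][j] for i in range(n)) for j in range(n)]
    return [[S[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def random_walk_with_restart(S, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`."""
    n = len(S)
    W = column_normalize(S)
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2, walker restarting at node 0.
S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
p = random_walk_with_restart(S, seed=0)
```

Nodes closer to the seed end up with higher probability, so the resulting vector can be read as a proximity-based score for each node relative to the seed.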
11.
Article in English | MEDLINE | ID: mdl-35085090

ABSTRACT

An increase in microbial activity has been shown to be intimately connected with the pathogenesis of diseases. Considering the expense of traditional verification methods, researchers are working to develop high-efficiency methods for detecting potential disease-related microbes. In this article, a new prediction method, MSF-LRR, is established, which uses Low-Rank Representation (LRR) to perform multi-similarity information fusion to predict disease-related microbes. Considering that most existing methods only use one class of similarity, three classes of microbe and disease similarity are added. Then, LRR is used to obtain low-rank structural similarity information. Additionally, the method adaptively extracts the local low-rank structure of the data from a global perspective, to make the information used for the prediction more effective. Finally, a neighbor-based prediction method that utilizes the concept of collaborative filtering is applied to predict unknown microbe-disease pairs. As a result, the AUC value of MSF-LRR is superior to that of other existing algorithms under 5-fold cross-validation. Furthermore, in case studies, excluding originally known associations, 16 and 19 of the top 20 microbes associated with Bacterial Vaginosis and Irritable Bowel Syndrome, respectively, have been confirmed by the recent literature. In summary, MSF-LRR is a good predictor of potential microbe-disease associations and can contribute to drug discovery and biological research.


Subject(s)
Algorithms , Bacteria , Disease , Host Microbial Interactions , Bacteria/pathogenicity
12.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1774-1782, 2023.
Article in English | MEDLINE | ID: mdl-36251902

ABSTRACT

With the development of bioinformatics, the important role played by lncRNAs in various intractable diseases has aroused the interest of many experts. In recent studies, researchers have found that several human diseases are related to lncRNAs. Moreover, it is very difficult and expensive to explore the unknown lncRNA-disease associations (LDAs), so only a few associations have been confirmed. It is vital to find a more accurate and effective method to identify potential LDAs. In this study, a method of collaborative matrix factorization based on correntropy (LDCMFC) is proposed for the identification of potential LDAs. To improve the robustness of the algorithm, the traditional minimization of the Euclidean distance is replaced with the maximization of correntropy. In addition, the weighted K nearest known neighbor (WKNKN) method is used to rebuild the adjacency matrix. Finally, the performance of LDCMFC is tested by 5-fold cross-validation. Compared with other traditional methods, LDCMFC obtains a higher AUC of 0.8628. In case studies of three important cancers, most of the potentially relevant lncRNAs derived from the experiments have been validated in the databases. The final result shows that LDCMFC is a feasible method to predict LDAs.


Subject(s)
Long Noncoding RNA , Humans , Long Noncoding RNA/genetics , Algorithms , Computational Biology/methods , Factual Databases , Cluster Analysis
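The abstract above replaces a Euclidean loss with correntropy to gain robustness. The sketch below shows why that can help, using the usual Gaussian-kernel sample estimate of correntropy, mean(exp(-(x_i - y_i)^2 / (2 sigma^2))). The kernel width `sigma=1.0` and the toy vectors are assumptions; how LDCMFC embeds this measure in its factorization objective is not shown.

```python
import math


def squared_error(x, y):
    """Sum of squared differences (squared Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def correntropy(x, y, sigma=1.0):
    """Gaussian-kernel sample correntropy; 1.0 means the vectors match exactly."""
    return sum(math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
               for a, b in zip(x, y)) / len(x)


x = [1.0, 2.0, 3.0, 4.0]
y_outlier = [1.0, 2.0, 3.0, 100.0]  # one corrupted entry

se = squared_error(x, y_outlier)  # dominated by the single outlier
c = correntropy(x, y_outlier)     # outlier contributes almost nothing
```

A single corrupted entry inflates the squared error without bound, while its kernel term saturates near zero, so the three clean entries still determine most of the correntropy value.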
13.
Interdiscip Sci ; 15(1): 88-99, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36335274

ABSTRACT

With the development of bioinformatics technology, miRNA-disease associations (MDAs) are gradually being uncovered. At present, convenient and efficient prediction methods that address the resource consumption of traditional wet-lab experiments still need to be developed. In this study, a space projection model based on block matrix is presented for predicting MDAs (BMPMDA). Specifically, two block matrices are first composed of the known association matrix and similarity to increase comprehensiveness. For the integrity of information in the heterogeneous network, matrix completion (MC) is utilized to mine potential MDAs. Considering the neighborhood information of data points, linear neighborhood similarity (LNS) is regarded as a measure of similarity. Next, LNS is projected onto the corresponding completed association matrix to derive the projection score. Finally, the AUC and AUPR values for BMPMDA reach 0.9691 and 0.6231, respectively. Additionally, the majority of novel MDAs in three disease cases are identified in existing databases and literature. This suggests that BMPMDA can serve as a reliable prediction model for biological research.


Subject(s)
MicroRNAs , Humans , Algorithms , Computational Biology/methods , Forecasting , Factual Databases , Genetic Predisposition to Disease
14.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36305457

ABSTRACT

With the development of research on the complex aetiology of many diseases, computational drug repositioning methodology has proven to be a shortcut compared with costly and inefficient traditional methods. Therefore, developing more promising computational methods is indispensable for finding new candidate diseases to treat with existing drugs. In this paper, a model called GLGMPNN, which integrates a new variant of message passing neural network and a novel gated fusion mechanism, is proposed for drug-disease association prediction. First, a light-gated message passing neural network (LGMPNN), including message passing, aggregation and updating, is proposed to separately extract multiple pieces of information from the similarity networks and the association network. Then, a gated fusion mechanism consisting of a forget gate and an output gate is applied to integrate the multiple pieces of information to some extent. The forget gate calculated by the multiple embeddings is built to integrate the association information into the similarity information. Furthermore, the final node representations are controlled by the output gate, which fuses the topology information of the networks and the initial similarity information. Finally, a bilinear decoder is adopted to reconstruct an adjacency matrix for drug-disease associations. Evaluated by 10-fold cross-validation, GLGMPNN achieves excellent performance compared with the current models. The subsequent case studies show that our model can effectively discover novel drug-disease associations.


Subject(s)
Computational Biology , Computational Biology/methods , Drug Repositioning/methods , Algorithms
15.
Article in English | MEDLINE | ID: mdl-35857730

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology is famous for providing a microscopic view to help capture cellular heterogeneity. This characteristic has advanced the field of genomics by enabling the delicate differentiation of cell types. However, the properties of single-cell datasets, such as high dropout events, noise, and high dimensionality, are still a research challenge in the single-cell field. To utilize single-cell data more efficiently and to better explore the heterogeneity among cells, a new graph autoencoder (GAE)-based consensus-guided model (scGAC) is proposed in this article. The data are preprocessed into multiple top-level feature datasets. Then, feature learning is performed by using GAEs to generate new feature matrices, followed by similarity learning based on distance fusion methods. The learned similarity matrices are fed back to the GAEs to guide their feature learning process. Finally, the abovementioned steps are iterated continuously to integrate the final consistent similarity matrix and perform other related downstream analyses. The scGAC model can accurately identify critical features and effectively preserve the internal structure of the data. This can further improve the accuracy of cell type identification.

16.
J Bioinform Comput Biol ; 20(2): 2250002, 2022 04.
Article in English | MEDLINE | ID: mdl-35191362

ABSTRACT

Tensor Robust Principal Component Analysis (TRPCA) has achieved promising results in the analysis of genomics data. However, the TRPCA model under the existing tensor singular value decomposition ([Formula: see text]-SVD) framework insufficiently extracts the potential low-rank structure of the data, resulting in suboptimal restored components. In addition, the tensor nuclear norm (TNN) defined based on [Formula: see text]-SVD uses the same standard to handle various singular values. TNN ignores the differences between singular values, so the main information that needs to be preserved is not preserved well. To preserve the heterogeneous structure in the low-rank information, we propose a novel TNN and extend it to the TRPCA model. Potential low-rank space may contain important information. We learn the low-rank structural information from the core tensor. The singular value space contains the association information between genes and cancers. The [Formula: see text]-shrinkage generalized threshold function is utilized to preserve the low-rank properties of larger singular values. The optimization problem is solved by the alternating direction method of multipliers (ADMM) algorithm. Clustering and feature selection experiments are performed on the TCGA data set. The experimental results show that the proposed model is more promising than other state-of-the-art tensor decomposition methods.


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Genomics , Humans , Neoplasms/genetics , Principal Component Analysis
17.
IEEE Trans Cybern ; 52(6): 5079-5087, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33119529

ABSTRACT

A growing number of clinical studies have provided substantial evidence of a close relationship between microbes and disease. Thus, it is necessary to infer potential microbe-disease associations. However, traditional approaches validate these associations through experiments, which often consume considerable materials and time. Hence, more reliable computational methods are expected to be applied to predict disease-associated microbes. In this article, an innovative means of predicting microbe-disease associations is proposed, which is based on network consistency projection and label propagation (NCPLP). Given that most existing algorithms use the Gaussian interaction profile (GIP) kernel similarity as the similarity criterion between microbe pairs and disease pairs, in this model, Medical Subject Headings descriptors are considered to calculate disease semantic similarity. In addition, 16S rRNA gene sequences are used for the calculation of microbe functional similarity. In view of the gene-based sequence information, we use two conventional methods (BLAST+ and MEGA7) to assess the similarity between each pair of microbes from different perspectives. In particular, network consistency projection is added to obtain network projection scores from the microbe space and the disease space. Ultimately, label propagation is utilized to reliably predict microbes related to diseases. NCPLP achieves better performance in various evaluation indicators and discovers a greater number of potential associations between microbes and diseases. Also, case studies further confirm the reliable prediction performance of NCPLP. To conclude, our algorithm NCPLP has the ability to discover these underlying microbe-disease associations and can provide help for biological study.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , 16S Ribosomal RNA
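The abstract above refers to the Gaussian interaction profile (GIP) kernel as the similarity most existing methods rely on. In its usual form, K(i, j) = exp(-gamma * ||Y_i - Y_j||^2), with gamma set to a base bandwidth divided by the mean squared norm of the interaction profiles. The sketch below assumes that form; the function name, the default `gamma_prime=1.0`, and the toy matrix are illustrative choices, not values from the NCPLP paper.

```python
import math


def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of Y.

    Y: binary association matrix; each row is one entity's interaction profile.
    gamma_prime: base bandwidth, normalized by the mean squared profile norm.
    """
    n = len(Y)
    mean_sq_norm = sum(sum(v * v for v in row) for row in Y) / n
    if mean_sq_norm == 0:
        raise ValueError("Y contains no known associations")
    gamma = gamma_prime / mean_sq_norm
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
            K[i][j] = math.exp(-gamma * dist_sq)
    return K


# Toy matrix: 3 microbes x 3 diseases; rows 0 and 1 share the same profile.
Y = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]
K = gip_kernel(Y)
```

Rows with identical profiles get similarity 1, and similarity decays as profiles diverge. To compare columns (diseases) instead of rows, the same function can be applied to the transpose of `Y`.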
18.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1154-1164, 2022.
Article in English | MEDLINE | ID: mdl-33026977

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology reveals the gene expression status and gene structure of individual cells, reflecting the heterogeneity and diversity of cells. Traditional methods of scRNA-seq data analysis treat the data as lying in a single subspace, which hides structural information in other subspaces. In this paper, we propose a low-rank subspace ensemble clustering framework (LRSEC) to analyze scRNA-seq data. Assuming that the scRNA-seq data exist in multiple subspaces, the low-rank model is used to find the lowest rank representation of the data in the subspace. It is worth noting that the penalty factor of the low-rank kernel function is uncertain, and different penalty factors correspond to different low-rank structures. Moreover, a single clustering model struggles to find the cellular structure of all datasets. To strengthen the correlation between model solutions, we construct a new ensemble clustering framework, LRSEC, by using the low-rank model as the basic learner. The LRSEC framework captures the global structure of data through low-rank subspaces and has better clustering performance than a single clustering model. We validate the performance of the LRSEC framework on seven small datasets and one large dataset and obtain satisfactory results.


Subject(s)
Algorithms , Single-Cell Analysis , Cluster Analysis , RNA Sequence Analysis , Single-Cell Analysis/methods
19.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2420-2430, 2022.
Article in English | MEDLINE | ID: mdl-33690124

ABSTRACT

Extracting genes involved in cancer lesions from gene expression data is critical for cancer research and drug development. The method of feature selection has attracted much attention in the field of bioinformatics. Principal Component Analysis (PCA) is a widely used method for learning low-dimensional representations. Some variants of PCA have been proposed to improve the robustness and sparsity of the algorithm. However, the existing methods ignore the high-order relationships between data. In this paper, a new model named Robust Principal Component Analysis via Hypergraph Regularization (HRPCA) is proposed. In detail, HRPCA utilizes the L2,1-norm to reduce the effect of outliers and make the data sufficiently row-sparse. Hypergraph regularization is introduced to account for the complex relationships among data. Important information hidden in the data is mined, and this method ensures the accuracy of the resulting data relationship information. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.


Subject(s)
Computational Biology , Neoplasms , Algorithms , Cluster Analysis , Computational Biology/methods , Humans , Neoplasms/genetics , Neoplasms/metabolism , Principal Component Analysis
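The L2,1-norm mentioned in the abstract above is commonly defined as the sum of the Euclidean norms of a matrix's rows, which is why penalizing it pushes whole rows toward zero. The sketch below assumes that row-wise definition (some papers define it over columns instead); the function names and the toy matrix are illustrative.

```python
import math


def l21_norm(M):
    """Sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def frobenius_norm(M):
    """Square root of the sum of all squared entries, shown for comparison."""
    return math.sqrt(sum(v * v for row in M for v in row))


# Toy matrix with one all-zero row (a 'row-sparse' pattern).
M = [[3.0, 4.0],
     [0.0, 0.0],
     [5.0, 12.0]]

l21 = l21_norm(M)        # 5 + 0 + 13
fro = frobenius_norm(M)  # sqrt(9 + 16 + 25 + 144)
```

Because each row contributes its unsquared length, a row with a large residual is penalized linearly rather than quadratically, which is the usual argument for the norm's robustness to outliers.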
20.
IEEE J Biomed Health Inform ; 26(1): 458-467, 2022 01.
Article in English | MEDLINE | ID: mdl-34156956

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) technology has made it possible to measure gene expression levels at the resolution of a single cell, which further reveals the complex growth processes of cells such as mutation and differentiation. Recognizing cell heterogeneity is one of the most critical tasks in scRNA-seq research. To address it, we propose a non-negative matrix factorization framework based on multi-subspace cell similarity learning for unsupervised scRNA-seq data analysis (MscNMF). MscNMF includes three parts: data decomposition, similarity learning, and similarity fusion. The three work together to complete the data similarity learning task. MscNMF can learn the gene features and cell features of different subspaces, and the correlation and heterogeneity between cells are more prominent in multiple subspaces. The redundant information and noise in each low-dimensional feature space are eliminated, and its gene weight information can be further analyzed to calculate the optimal number of subpopulations. The final cell similarity learning is more satisfactory due to the fusion of cell similarity information in different subspaces. The advantage of MscNMF is that it can reasonably calculate the number of cell types and the rank of non-negative matrix factorization (NMF). Experiments on eight real scRNA-seq datasets show that MscNMF can effectively perform clustering tasks and extract useful genetic markers. To verify its clustering performance, the framework is compared with other recent clustering algorithms, and satisfactory results are obtained. The code of MscNMF is freely available for academic use (https://github.com/wangchuanyuan1/project-MscNMF).


Subject(s)
Algorithms , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , Genetic Markers , Humans , RNA Sequence Analysis/methods , Single-Cell Analysis/methods
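The abstract above builds on non-negative matrix factorization (NMF). As background only, the sketch below shows plain NMF with the standard multiplicative update rules for the squared-error objective, factorizing V into non-negative W and H. It is not the MscNMF algorithm from the linked repository: the function names, rank, iteration count, and toy matrix are assumptions.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - W H||_F^2."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative V into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    initial_error = frobenius_error(V, W, H)
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H, initial_error


# Toy expression-like matrix with two block-structured groups (values are made up).
V = [[5.0, 4.0, 0.0, 0.0],
     [4.0, 5.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 4.0],
     [0.0, 0.0, 4.0, 3.0]]
W, H, initial_error = nmf(V, rank=2)
final_error = frobenius_error(V, W, H)
```

The multiplicative form keeps every entry of `W` and `H` non-negative without an explicit projection step, which is the property that lets the factors be read as additive parts such as gene programs and cell loadings.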