Results 1 - 20 of 128
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935070

ABSTRACT

Inferring gene regulatory networks (GRNs) is an important challenge in systems biology, and many computational methods have been proposed; however, challenges remain, especially on real datasets. In this study, we propose a directed graph convolutional neural network-based method for GRN inference (DGCGRN). To better process the directed graph structure of GRNs, a directed graph convolutional neural network is constructed that retains the structural information of the directed graph while making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of the gene sequence. At the training stage, a dynamic update strategy converts the obtained edge prediction scores into edge weights to guide the subsequent training of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.


Subject(s)
Computational Biology , Gene Regulatory Networks , Humans , Computational Biology/methods , Algorithms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , Escherichia coli/genetics
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.


Subject(s)
Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods
3.
IEEE J Biomed Health Inform ; 28(6): 3513-3522, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38568771

ABSTRACT

The pathogenesis of Alzheimer's disease (AD) is extremely intricate, which makes AD almost incurable. Recent studies have demonstrated that analyzing multi-modal data can offer a comprehensive perspective on the different stages of AD progression, which is beneficial for early diagnosis of AD. In this paper, we propose a deep self-reconstruction fusion similarity hashing (DS-FSH) method to effectively capture AD-related biomarkers from multi-modal data and leverage them to diagnose AD. Given that most existing methods ignore the topological structure of the data, a deep self-reconstruction model based on random walk graph regularization is designed to reconstruct the multi-modal data, thereby learning the nonlinear relationships between samples. Additionally, a fused similarity hash based on an anchor graph is proposed to generate discriminative binary hash codes for the reconstructed multi-modal data. This allows the fused similarity between samples to be effectively modeled by a fusion similarity matrix based on the anchor graph, while modal correlation can be approximated by Hamming distance. In particular, features extracted from the multi-modal data are classified using a deep sparse autoencoder classifier. Finally, experiments conducted on the AD Neuroimaging Initiative database show that DS-FSH outperforms comparable methods for AD classification. To conclude, DS-FSH identifies multi-modal features closely associated with AD, which are expected to contribute significantly to the understanding of the pathogenesis of AD.


Subject(s)
Alzheimer Disease , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/diagnosis , Humans , Algorithms , Deep Learning , Magnetic Resonance Imaging/methods , Image Interpretation, Computer-Assisted/methods , Neuroimaging/methods , Brain/diagnostic imaging , Multimodal Imaging/methods
4.
Comput Methods Programs Biomed ; 250: 108176, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38677081

ABSTRACT

BACKGROUND AND OBJECTIVE: Interleukin-6 (IL-6) is a critical factor for early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce production of the cytokine IL-6, are very important for the development of diagnosis and immunotherapy. Although existing methods have had some success in predicting IL-6 inducing peptides, there is still room to improve the performance of these models in practical application. METHODS: In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences and obtained a 636-dimensional feature vector. We also employed the NearMiss3 undersampling method and the StandardScaler normalization method to process the data. Then, a 40-dimensional optimal feature vector was obtained by the Boruta feature selection method. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final model, UsIL-6. RESULTS: The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, which indicated that UsIL-6 performed better than existing methods in IL-6 inducing peptide recognition. CONCLUSIONS: The performance comparison on the independent test dataset confirmed that UsIL-6 achieved the highest performance, best robustness, and best generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate and characterize new IL-6 inducing peptides.


Subject(s)
Computational Biology , Interleukin-6 , Peptides , Humans , Peptides/chemistry , Computational Biology/methods , COVID-19 , Algorithms , Machine Learning , SARS-CoV-2
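The abstract above reports a balanced accuracy (BACC) for UsIL-6. As a reminder of what that metric measures, here is a minimal sketch of the standard binary definition, the mean of sensitivity and specificity. The function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative).

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical toy labels, only to show the call.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(y_true, y_pred))
```

Because each class contributes equally, the metric is less flattering than plain accuracy on imbalanced data, which is presumably why it is reported alongside AUC.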
5.
IEEE J Biomed Health Inform ; 28(2): 1110-1121, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38055359

ABSTRACT

Accumulating evidence indicates that microRNAs (miRNAs) can control and coordinate various biological processes. Consequently, abnormal expression of miRNAs has been linked to various complex diseases. Identification of miRNA-disease associations (MDAs) will contribute to the diagnosis and treatment of human diseases. Nevertheless, traditional experimental verification of MDAs is laborious and limited to small scale. Therefore, it is necessary to develop reliable and effective computational methods to predict novel MDAs. In this work, a multi-kernel graph attention deep autoencoder (MGADAE) method is proposed to predict potential MDAs. In detail, MGADAE first employs the multiple kernel learning (MKL) algorithm to construct an integrated miRNA similarity and an integrated disease similarity, providing more biological information for further feature learning. Second, MGADAE combines the known MDAs, disease similarity, and miRNA similarity into a heterogeneous network, then learns the representations of miRNAs and diseases through graph convolution operations. After that, an attention mechanism is introduced into MGADAE to integrate the representations from multiple graph convolutional network (GCN) layers. Lastly, the integrated representations of miRNAs and diseases are input into a bilinear decoder to obtain the final predicted association scores. Experiments show that the proposed method outperforms existing advanced approaches in MDA prediction. Furthermore, case studies on two human cancers provide further confirmation of the reliability of MGADAE in practice.


Subject(s)
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , Reproducibility of Results , Computational Biology/methods , Neoplasms/genetics , Algorithms
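The last step of the abstract above feeds miRNA and disease representations into a bilinear decoder. The abstract does not give the exact form, so the following is only a sketch of a common bilinear decoder, score = sigmoid(z_m^T W z_d). The sigmoid, the function name, and the toy vectors are assumptions made for illustration.

```python
import math


def bilinear_score(z_mirna, weight, z_disease):
    """Score one miRNA-disease pair as sigmoid(z_mirna^T * weight * z_disease).

    z_mirna and z_disease are embedding vectors (lists of floats);
    weight is a learnable matrix given as a list of rows.
    """
    # weight @ z_disease
    projected = [sum(w * d for w, d in zip(row, z_disease)) for row in weight]
    # z_mirna . (weight @ z_disease)
    logit = sum(m * p for m, p in zip(z_mirna, projected))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-dimensional embeddings and weight matrix.
z_m = [1.0, 0.0]
z_d = [0.0, 1.0]
W = [[0.0, 2.0],
     [0.0, 0.0]]
print(bilinear_score(z_m, W, z_d))  # logit is 2.0 for these toy values
```

The weight matrix lets the decoder score interactions between different embedding dimensions, which a plain dot product cannot do.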
6.
IEEE J Biomed Health Inform ; 27(12): 6133-6143, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751336

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has rapidly emerged as a powerful technique for analyzing cellular heterogeneity at the individual cell level. In the analysis of scRNA-seq data, cell clustering is a critical step in downstream analysis, as it enables the identification of cell types and the discovery of novel cell subtypes. However, the characteristics of scRNA-seq data, such as high dimensionality and sparsity, dropout events and batch effects, present significant computational challenges for clustering analysis. In this study, we propose scGCC, a novel graph self-supervised contrastive learning model, to address the challenges faced in scRNA-seq data analysis. scGCC comprises two main components: a representation learning module and a clustering module. The scRNA-seq data is first fed into a representation learning module for training, which is then used for data classification through a clustering module. scGCC can learn low-dimensional denoised embeddings, which is advantageous for our clustering task. We introduce Graph Attention Networks (GAT) for cell representation learning, which enables better feature extraction and improved clustering accuracy. Additionally, we propose five data augmentation methods to improve clustering performance by increasing data diversity and reducing overfitting. These methods enhance the robustness of clustering results. Our experimental study on 14 real-world datasets has demonstrated that our model achieves extraordinary accuracy and robustness. We also perform downstream tasks, including batch effect removal, trajectory inference, and marker genes analysis, to verify the biological effectiveness of our model.


Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Humans , Single-Cell Analysis/methods , Cluster Analysis , Data Analysis , Gene Expression Profiling/methods , Algorithms
7.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3737-3747, 2023.
Article in English | MEDLINE | ID: mdl-37751340

ABSTRACT

Single-cell RNA sequencing (scRNA-Seq) technology has emerged as a powerful tool to investigate cellular heterogeneity within tissues, organs, and organisms. One fundamental question pertaining to single-cell gene expression data analysis revolves around the identification of cell types, which constitutes a critical step within the data processing workflow. However, existing methods for cell type identification through learning low-dimensional latent embeddings often overlook the intercellular structural relationships. In this paper, we present a novel non-negative low-rank similarity correction model (NLRSIM) that leverages subspace clustering to preserve the global structure among cells. This model introduces a novel manifold learning process to address the issue of imbalanced neighbourhood spatial density in cells, thereby effectively preserving local geometric structures. This procedure utilizes a position-sensitive hashing algorithm to construct the graph structure of the data. The experimental results demonstrate that the NLRSIM surpasses other advanced models in terms of clustering effects and visualization experiments. The validated effectiveness of gene expression information after calibration by the NLRSIM model has been duly ascertained in the realm of relevant biological studies. The NLRSIM model offers unprecedented insights into gene expression, states, and structures at the individual cellular level, thereby contributing novel perspectives to the field.


Subject(s)
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
8.
PLoS Comput Biol ; 19(8): e1011344, 2023 08.
Article in English | MEDLINE | ID: mdl-37651321

ABSTRACT

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms other competing methods. In addition, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.


Subject(s)
Algorithms , RNA, Circular , Humans , RNA, Circular/genetics , Semantics
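Two of the similarity measures named in the abstract above, the Gaussian interaction profile (GIP) kernel and Jaccard similarity, can both be computed directly from a binary association matrix. The sketch below uses the widely used GIP formulation, K(i, j) = exp(-gamma * ||a_i - a_j||^2), with gamma normalized by the mean squared norm of the profiles. Whether iCircDA-NEAE uses exactly this normalization is not stated in the abstract, so treat it, the function names, and the toy matrix as assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    profiles: list of rows, one interaction profile per entity (e.g. per circRNA).
    Assumes at least one known association, so the mean squared norm is non-zero.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalization
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def jaccard_similarity(profile_a, profile_b):
    """Shared associations divided by the union of associations (0.0 if both are empty)."""
    intersection = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    union = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return intersection / union if union else 0.0


# Hypothetical association matrix: 3 circRNAs (rows) x 3 diseases (columns).
associations = [[1, 1, 0],
                [1, 0, 0],
                [0, 0, 1]]
K = gip_kernel(associations)
print(K[0][1], jaccard_similarity(associations[0], associations[1]))
```

Both measures depend only on known associations, so they are usually combined with independent sources such as semantic or expression similarity, as the abstract describes.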
9.
J Comput Biol ; 30(8): 848-860, 2023 08.
Article in English | MEDLINE | ID: mdl-37471220

ABSTRACT

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies is the use of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence for revealing heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose a method called Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of data. To address the limited ability of the LRR method to capture local structure, we introduce adjusted random walk graph regularization into its framework. ARGLRR thus captures both local and global structures in scRNA-seq data. Additionally, imposing similarity constraints on the LRR framework further improves the ability of the proposed model to estimate cell-to-cell similarity and capture global structural relationships between cells. ARGLRR surpasses other advanced approaches on nine known scRNA-seq data sets. In clustering experiments on these data sets, ARGLRR outperforms the best-performing comparative method by 6.99% in normalized mutual information and 5.85% in Adjusted Rand Index. In addition, we visualize the results using Uniform Manifold Approximation and Projection. The visualization shows that ARGLRR enhances the separation of different cell types within the similarity matrix.


Subject(s)
Algorithms , RNA , Cluster Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA , Gene Expression Profiling
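The abstract above evaluates clustering with the Adjusted Rand Index (ARI). For readers unfamiliar with the metric, this is a minimal sketch of the standard pair-counting formula computed from the contingency table of two labelings. The function name and the example labels are illustrative and are not part of the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items (needs >= 2 items)."""
    n = len(labels_true)
    cell_counts = Counter(zip(labels_true, labels_pred))  # contingency table cells
    true_counts = Counter(labels_true)                    # row sums
    pred_counts = Counter(labels_pred)                    # column sums

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (maximum - expected)


# Cluster ids are arbitrary, so a pure relabeling still counts as perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Unlike raw accuracy, ARI does not require cluster ids to match known cell-type labels, and it is corrected so that random assignments score around zero.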
10.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2802-2809, 2023.
Article in English | MEDLINE | ID: mdl-37285246

ABSTRACT

Biclustering algorithms are essential for processing gene expression data. However, to process the dataset, most biclustering algorithms require preprocessing the data matrix into a binary matrix. Regrettably, this type of preprocessing may introduce noise or cause information loss in the binary matrix, which would reduce the biclustering algorithm's ability to effectively obtain the optimal biclusters. In this paper, we propose a new preprocessing method named Mean-Standard Deviation (MSD) to resolve the problem. Additionally, we introduce a new biclustering algorithm called Weight Adjacency Difference Matrix Binary Biclustering (W-AMBB) to effectively process datasets containing overlapping biclusters. The basic idea is to create a weighted adjacency difference matrix by applying weights to a binary matrix that is derived from the data matrix. This allows us to identify genes with significant associations in sample data by efficiently identifying similar genes that respond to specific conditions. Furthermore, the performance of the W-AMBB algorithm was tested on both synthetic and real datasets and compared with other classical biclustering methods. The experiment results demonstrate that the W-AMBB algorithm is significantly more robust than the compared biclustering methods on the synthetic dataset. Additionally, the results of the GO enrichment analysis show that the W-AMBB method possesses biological significance on real datasets.


Subject(s)
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Cluster Analysis , Gene Expression
11.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2853-2861, 2023.
Article in English | MEDLINE | ID: mdl-37267145

ABSTRACT

Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from the ideal standard, so it is urgent to design a more effective method to reconstruct GRN. Moreover, most methods only consider the gene expression data, ignoring the network structure information contained in GRN. In this study, we propose a supervised model named CNNGRN, which infers GRN from bulk time-series expression data via convolutional neural network (CNN) model, with a more informative feature. Bulk time series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates the above two features as model inputs. In addition, CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to seek the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to the state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed to be involved in biological processes through literature.


Subject(s)
Algorithms , Gene Regulatory Networks , Gene Regulatory Networks/genetics , Time Factors , Systems Biology , Computational Biology/methods
12.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3154-3162, 2023.
Article in English | MEDLINE | ID: mdl-37018084

ABSTRACT

Circular RNAs (circRNAs) are a category of noncoding RNAs that exist in great numbers in eukaryotes. They have recently been discovered to be crucial in the growth of tumors. Therefore, it is important to explore the association of circRNAs with disease. This paper proposes a new method based on DeepWalk and nonnegative matrix factorization (DWNMF) to predict circRNA-disease association. Based on the known circRNA-disease association, we calculate the topological similarity of circRNA and disease via the DeepWalk-based method to learn the node features on the association network. Next, the functional similarity of the circRNAs and the semantic similarity of the diseases are fused with their respective topological similarities at different scales. Then, we use the improved weighted K-nearest neighbor (IWKNN) method to preprocess the circRNA-disease association network and correct nonnegative associations by setting different parameters K1 and K2 in the circRNA and disease matrices. Finally, the L2,1-norm, dual-graph regularization term and Frobenius norm regularization term are introduced into the nonnegative matrix factorization model to predict the circRNA-disease correlation. We perform cross-validation on circR2Disease, circRNADisease, and MNDR. The numerical results show that DWNMF is an efficient tool for forecasting potential circRNA-disease relationships, outperforming other state-of-the-art approaches in terms of predictive performance.


Subject(s)
MicroRNAs , Neoplasms , Humans , RNA, Circular/genetics , Algorithms , Neoplasms/genetics , Cluster Analysis , Computational Biology/methods
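The abstract above adds an L2,1-norm term and a Frobenius-norm regularizer to the factorization objective. As a quick reference, the sketch below computes both norms for a small matrix. It uses the row-wise convention for the L2,1-norm (sum of the Euclidean norms of the rows); some papers sum over columns instead, and the abstract does not say which convention DWNMF follows, so that choice is an assumption.

```python
import math


def l21_norm(matrix):
    """L2,1-norm, row-wise convention: sum over rows of each row's Euclidean norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_norm(matrix):
    """Frobenius norm: square root of the sum of all squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


# Hypothetical residual matrix with one all-zero row.
residual = [[3.0, 4.0],
            [0.0, 0.0],
            [0.0, 5.0]]
print(l21_norm(residual), frobenius_norm(residual))
```

Because the L2,1-norm grows linearly rather than quadratically with the size of each row, it penalizes a few large rows less harshly than the Frobenius norm, which is why it is often used as a loss that is robust to outlying samples.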
13.
IEEE J Biomed Health Inform ; 27(5): 2575-2584, 2023 05.
Article in English | MEDLINE | ID: mdl-37027680

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology can provide expression profiles of single cells, which propels biological research into a new chapter. Clustering individual cells based on their transcriptomes is a critical objective of scRNA-seq data analysis. However, the high-dimensional, sparse and noisy nature of scRNA-seq data poses a challenge to single-cell clustering. Therefore, it is urgent to develop a clustering method that targets the characteristics of scRNA-seq data. Owing to its powerful subspace learning capability and robustness to noise, the subspace segmentation method based on low-rank representation (LRR) is broadly used in clustering research and achieves satisfactory results. In view of this, we propose a personalized low-rank subspace clustering method, namely PLRLS, to learn more accurate subspace structures from both global and local perspectives. Specifically, we first introduce a local structure constraint to capture the local structure information of the data, which helps our method obtain better inter-cluster separability and intra-cluster compactness. Then, to retain the important similarity information that is ignored by the LRR model, we use the fractional function to extract similarity information between cells and introduce this information as a similarity constraint into the LRR framework. The fractional function is an efficient similarity measure designed for scRNA-seq data, which has theoretical and practical implications. Finally, based on the LRR matrix learned by PLRLS, we perform downstream analyses on real scRNA-seq datasets, including spectral clustering, visualization and marker gene identification. Comparative experiments show that the proposed method achieves superior clustering accuracy and robustness.


Subject(s)
Algorithms , Single-Cell Gene Expression Analysis , Humans , Transcriptome , Cluster Analysis , Data Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods
14.
Article in English | MEDLINE | ID: mdl-37022835

ABSTRACT

Studies have revealed that microbes have an important effect on numerous physiological processes, and further research on the links between diseases and microbes is significant. Given that laboratory methods are expensive and not optimized, computational models are increasingly used for discovering disease-related microbes. Here, a new neighbor approach based on two-tier Bi-Random Walk, known as NTBiRW, is proposed for predicting potential disease-related microbes. In this method, the first step is to construct multiple microbe similarities and disease similarities. Then, three kinds of microbe/disease similarity are integrated through two-tier Bi-Random Walk to obtain the final integrated microbe/disease similarity network with different weights. Finally, Weighted K Nearest Known Neighbors (WKNKN) is used for prediction based on the final similarity network. In addition, leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV) are applied to evaluate the performance of NTBiRW. Multiple evaluation indicators are used to show the performance from multiple perspectives, and most of the evaluation index values of NTBiRW are better than those of the compared methods. Moreover, in case studies on atopic dermatitis and psoriasis, most of the top 10 candidates in the final result can be confirmed. This also demonstrates the capability of NTBiRW for discovering new associations. Therefore, this method can contribute to the discovery of disease-related microbes and thus offer new ideas for further understanding the pathogenesis of diseases.
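The prediction step named in the abstract above, Weighted K Nearest Known Neighbors (WKNKN), can be sketched as follows. This follows one commonly described formulation: each row's profile is re-estimated from its K most similar neighbors with decaying weights, the same is done for columns, and the averaged estimate replaces an entry only where it is larger. The abstract does not give NTBiRW's exact variant, so the neighbor selection, the parameters `k` and `eta`, and the toy matrices are all assumptions.

```python
def _neighbor_estimate(assoc, sim, k, eta):
    """Re-estimate every row of assoc from its k most similar other rows."""
    n, n_cols = len(assoc), len(assoc[0])
    estimate = []
    for i in range(n):
        # k nearest neighbors of row i, most similar first (row i itself excluded)
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: sim[i][j], reverse=True)[:k]
        norm = sum(sim[i][j] for j in neighbors)
        row = [0.0] * n_cols
        if norm > 0:
            for rank, j in enumerate(neighbors):
                weight = (eta ** rank) * sim[i][j]  # decay with neighbor rank
                for c in range(n_cols):
                    row[c] += weight * assoc[j][c] / norm
        estimate.append(row)
    return estimate


def wknkn(assoc, row_sim, col_sim, k=2, eta=0.5):
    """Fill in likely-missing associations; known entries are never lowered."""
    by_rows = _neighbor_estimate(assoc, row_sim, k, eta)
    transposed = [list(col) for col in zip(*assoc)]
    by_cols = _neighbor_estimate(transposed, col_sim, k, eta)  # indexed [col][row]
    return [[max(assoc[i][j], (by_rows[i][j] + by_cols[j][i]) / 2)
             for j in range(len(assoc[0]))]
            for i in range(len(assoc))]


# Hypothetical data: 3 microbes (rows) x 2 diseases (columns); microbe 2 has no known links.
assoc = [[1, 0], [0, 1], [0, 0]]
microbe_sim = [[1.0, 0.2, 0.8], [0.2, 1.0, 0.6], [0.8, 0.6, 1.0]]
disease_sim = [[1.0, 0.5], [0.5, 1.0]]
print(wknkn(assoc, microbe_sim, disease_sim)[2])
```

In this toy case the microbe with no known associations receives non-zero scores borrowed from its most similar neighbors, which is the behavior that makes the method usable for new microbes or diseases.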

15.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592058

ABSTRACT

The progress of single-cell RNA sequencing (scRNA-seq) has led to a large amount of scRNA-seq data, which are widely used in biomedical research. The noise in the raw data and the tens of thousands of genes make it challenging to capture the real structure and effective information of scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data belongs to a Gaussian distribution or a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods incur high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoders (scGMAAE). It assumes that the low-dimensional embeddings of different types of cells follow different Gaussian distributions, and it integrates Bayesian variational inference and adversarial training so as to give an interpretable latent representation of complex data and discover the statistical distribution of different types of cells. scGMAAE offers good controllability, interpretability and scalability. Therefore, it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several respects, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to generate the best results.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Normal Distribution , Bayes Theorem , Single-Cell Analysis/methods , Cluster Analysis
16.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36611253

ABSTRACT

Although previous studies have revealed that synonymous mutations contribute to various human diseases, distinguishing deleterious synonymous mutations from benign ones is still a challenge in medical genomics. Recently, computational tools have been introduced to predict the harmfulness of synonymous mutations. However, most of these computational tools rely on balanced training sets without considering abundant negative samples that could result in deficient performance. In this study, we propose a computational model that uses a selective ensemble to predict deleterious synonymous mutations (seDSM). We construct several candidate base classifiers for the ensemble using balanced training subsets randomly sampled from the imbalanced benchmark training sets. The diversity measures of the base classifiers are calculated by the pairwise diversity metrics, and the classifiers with the highest diversities are selected for integration using soft voting for synonymous mutation prediction. We also design two strategies for filling in missing values in the imbalanced dataset and constructing models using different pairwise diversity metrics. The experimental results show that a selective ensemble based on double fault with the ensemble strategy EKNNI for filling in missing values is the most effective scheme. Finally, using 40-dimensional biology features, we propose a novel model based on a selective ensemble for predicting deleterious synonymous mutations (seDSM). seDSM outperformed other state-of-the-art methods on the independent test sets according to multiple evaluation indicators, indicating that it has an outstanding predictive performance for deleterious synonymous mutations. We hope that seDSM will be useful for studying deleterious synonymous mutations and advancing our understanding of synonymous mutations. The source code of seDSM is freely accessible at https://github.com/xialab-ahu/seDSM.git.


Subject(s)
Genomics , Silent Mutation , Humans , Genomics/methods , Software , Algorithms
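The selective ensemble in the abstract above relies on two ingredients that are easy to state in code: a pairwise diversity measure (double fault is the one reported as most effective) and soft voting. The sketch below shows the textbook definitions of both. It is not the seDSM implementation, which is available at the repository cited in the abstract; the function names and toy values here are hypothetical.

```python
def double_fault(y_true, pred_a, pred_b):
    """Fraction of samples that both classifiers get wrong (lower means more diverse)."""
    both_wrong = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
                     if a != t and b != t)
    return both_wrong / len(y_true)


def soft_vote(probabilities):
    """Average the positive-class probabilities of several classifiers per sample.

    probabilities: one list per classifier, each holding one probability per sample.
    """
    n_classifiers = len(probabilities)
    return [sum(sample) / n_classifiers for sample in zip(*probabilities)]


# Hypothetical outputs of three base classifiers on three samples.
probs = [[0.9, 0.4, 0.2],
         [0.7, 0.6, 0.1],
         [0.8, 0.2, 0.6]]
averaged = soft_vote(probs)
labels = [1 if p >= 0.5 else 0 for p in averaged]  # 0.5 threshold is an assumption
print(averaged, labels)
```

Selecting base classifiers with a low double-fault value favors members whose errors do not coincide, so averaging their probabilities has a better chance of correcting individual mistakes.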
17.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

Advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noise and significant sparsity of scRNA-seq data make it a major challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationships among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results on 14 real datasets show that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
18.
IEEE Trans Neural Netw Learn Syst ; 34(9): 5570-5579, 2023 09.
Article in English | MEDLINE | ID: mdl-34860656

ABSTRACT

Determining microRNA (miRNA)-disease associations (MDAs) is integral to the prevention, diagnosis, and treatment of complex diseases. However, wet-lab experiments to discern MDAs are inefficient and expensive. Hence, developing reliable and efficient integrative data models for predicting MDAs is of great importance. In the present work, a novel deep learning method for predicting MDAs, deep autoencoder with multiple kernel learning (DAEMKL), is presented. First, DAEMKL applies multiple kernel learning (MKL) in miRNA space and disease space to construct a miRNA similarity network and a disease similarity network, respectively. Then, for each disease or miRNA, its feature representation is learned from the miRNA similarity network and the disease similarity network via a regression model. After that, the integrated miRNA and disease feature representations are input into a deep autoencoder (DAE). Novel MDAs are then predicted through the reconstruction error. The AUC results show that DAEMKL achieves outstanding performance. In addition, case studies of three complex diseases further show that DAEMKL has excellent predictive performance and can discover a large number of potential MDAs. Overall, DAEMKL is an effective method for identifying MDAs.
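As a rough illustration of the kernel-combination idea behind multiple kernel learning, the sketch below merges several similarity matrices into one by a normalized weighted sum. It is a minimal sketch under stated assumptions, not the DAEMKL procedure: the function name `combine_kernels` is hypothetical, and the weights are supplied by hand here, whereas an MKL method would learn them from data.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of same-sized square similarity matrices.

    Assumption: weights are given by the caller and are normalized here so
    they sum to 1; learning the weights is outside this sketch.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    total = float(sum(weights))
    size = len(kernels[0])
    combined = [[0.0] * size for _ in range(size)]
    for kernel, weight in zip(kernels, weights):
        share = weight / total
        for r in range(size):
            for c in range(size):
                combined[r][c] += share * kernel[r][c]
    return combined
```

For example, two miRNA similarity matrices built from different data sources could be passed as `kernels`, giving a single similarity network of the kind the abstract describes.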


Subject(s)
MicroRNAs , MicroRNAs/genetics , Algorithms , Computational Biology/methods
19.
Article in English | MEDLINE | ID: mdl-34951853

ABSTRACT

CircRNAs have a stable structure, which gives them a higher tolerance to nucleases. Therefore, the properties of circular RNAs are beneficial in disease diagnosis. However, there are few known associations between circRNAs and diseases. Identifying new associations through biological experiments is time-consuming and costly. As a result, there is a need to build efficient and feasible computational models to predict potential circRNA-disease associations. In this paper, we design a novel convolutional neural network framework (DMFCNNCD) that learns features from deep matrix factorization to predict circRNA-disease associations. First, we decompose the circRNA-disease association matrix to obtain the original features of the diseases and circRNAs, and use a mapping module to extract potential nonlinear features. Then, we integrate these features with the similarity information to form a training set. Finally, we apply convolutional neural networks to predict unknown associations between circRNAs and diseases. Five-fold cross-validation across various experiments shows that our method can predict circRNA-disease associations and outperforms state-of-the-art methods.
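The first step in this abstract, decomposing the association matrix into circRNA and disease features, can be sketched with a plain low-rank factorization trained by stochastic gradient descent. This is a simplified stand-in under stated assumptions and not the deep matrix factorization of DMFCNNCD: the function names, rank, learning rate, regularization strength and epoch count are hypothetical, and the nonlinear mapping module and the convolutional network are not shown.

```python
import random


def factorize(matrix, rank=2, epochs=1000, lr=0.05, reg=0.01, seed=0):
    """Factor a binary association matrix into row and column features.

    Rows stand for circRNAs and columns for diseases (an assumption of this
    sketch). Returns (row_features, col_features) such that the dot product
    of row_features[i] and col_features[j] approximates matrix[i][j].
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_f = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    col_f = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                pred = sum(row_f[i][k] * col_f[j][k] for k in range(rank))
                err = matrix[i][j] - pred
                for k in range(rank):
                    r, c = row_f[i][k], col_f[j][k]
                    # Gradient step on squared error with L2 regularization.
                    row_f[i][k] += lr * (err * c - reg * r)
                    col_f[j][k] += lr * (err * r - reg * c)
    return row_f, col_f


def reconstruct(row_f, col_f):
    """Rebuild the score matrix from the learned features."""
    return [[sum(a * b for a, b in zip(r, c)) for c in col_f] for r in row_f]
```

The learned row and column vectors play the role of the "original features" mentioned in the abstract; in the described pipeline they would then be combined with similarity information before classification.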


Subject(s)
Circular RNA , Circular RNA/genetics , Computational Biology/methods
20.
Article in English | MEDLINE | ID: mdl-35420988

ABSTRACT

With the discovery of causality between synonymous mutations and diseases, it has become increasingly important to identify deleterious synonymous mutations to better understand their functional mechanisms. Although several machine learning methods have been proposed for this task, developing an effective feature representation method that exploits the intrinsic differences and relationships between deleterious and benign synonymous mutations remains challenging, given the vast number of synonymous mutations in the human genome. In this work, we developed a robust and accurate predictor called frDSM for deleterious synonymous mutation prediction using logistic regression. More specifically, we introduced an effective feature representation learning method that exploits multiple feature descriptors from different perspectives, including functional scores obtained from previous computational methods, evolutionary conservation, splicing and sequence feature descriptors. These feature descriptors were input into 76 XGBoost classifiers to obtain predicted probability values. These probabilities were concatenated to generate a new 76-dimensional feature vector, and a feature selection method was used to remove redundant and irrelevant features. Experimental results show that frDSM, with 31 optimal features, achieves more robust and accurate prediction than competing prediction methods, which demonstrates the effectiveness of the feature representation learning method. frDSM is freely available at http://frdsm.xialab.info.
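The representation step described above (each base classifier contributes one predicted probability, the probabilities are concatenated, and a logistic regression makes the final call) can be sketched as follows. This is a toy illustration under stated assumptions, not the frDSM implementation: the two base models are hand-written stand-ins for trained XGBoost classifiers, the descriptor names and sample values are made up, and the feature selection step is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability_features(sample, base_models):
    """Concatenate each base model's predicted probability into one vector."""
    return [model(sample) for model in base_models]


def train_logistic_regression(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            error = label - pred
            weights = [w + lr * error * x for w, x in zip(weights, row)]
            bias += lr * error
    return weights, bias


def predict(row, weights, bias):
    """Probability that the mutation is deleterious under the fitted model."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Hypothetical stand-ins for trained base classifiers (one per descriptor).
base_models = [
    lambda s: sigmoid(4.0 * (s["conservation"] - 0.5)),
    lambda s: sigmoid(4.0 * (s["splicing"] - 0.5)),
]

# Made-up toy samples: label 1 = deleterious, label 0 = benign.
samples = [
    {"conservation": 0.9, "splicing": 0.8},
    {"conservation": 0.8, "splicing": 0.7},
    {"conservation": 0.2, "splicing": 0.1},
    {"conservation": 0.1, "splicing": 0.3},
]
labels = [1, 1, 0, 0]

rows = [probability_features(s, base_models) for s in samples]
weights, bias = train_logistic_regression(rows, labels)
```

With 76 base classifiers instead of two, `probability_features` would return the 76-dimensional vector mentioned in the abstract, which would then be pruned by feature selection before the logistic regression is fitted.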


Subject(s)
Human Genome , Silent Mutation , Humans , Human Genome/genetics , Machine Learning , Algorithms
...