Results 1 - 20 of 67
1.
BMC Bioinformatics ; 24(1): 13, 2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36624376

ABSTRACT

BACKGROUND: Constructing molecular interaction networks from microarray data and then identifying disease module biomarkers can provide insight into the underlying pathogenic mechanisms of non-small cell lung cancer. A promising approach for identifying disease modules in the network is community detection. RESULTS: In order to identify disease modules from gene co-expression networks, a community detection method is proposed based on multi-objective optimization genetic algorithm with decomposition. The method is named DM-MOGA and possesses two highlights. First, the boundary correction strategy is designed for the modules obtained in the process of local module detection and pre-simplification. Second, during the evolution, we introduce Davies-Bouldin index and clustering coefficient as fitness functions which are improved and migrated to weighted networks. In order to identify modules that are more relevant to diseases, the above strategies are designed to consider the network topology of genes and the strength of connections with other genes at the same time. Experimental results of different gene expression datasets of non-small cell lung cancer demonstrate that the core modules obtained by DM-MOGA are more effective than those obtained by several other advanced module identification methods. CONCLUSIONS: The proposed method identifies disease-relevant modules by optimizing two novel fitness functions to simultaneously consider the local topology of each gene and its connection strength with other genes. The association of the identified core modules with lung cancer has been confirmed by pathway and gene ontology enrichment analysis.


Subjects
Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , Non-Small-Cell Lung Carcinoma/genetics , Lung Neoplasms/genetics , Gene Regulatory Networks , Microarray Analysis , Algorithms , Gene Expression Profiling/methods
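
The abstract above uses the Davies-Bouldin index as one fitness function. As a rough illustration only, the sketch below computes the classical Davies-Bouldin index for points in Euclidean space. It is not the improved, weighted-network variant that DM-MOGA introduces; the function name `davies_bouldin` and the toy coordinates are assumptions made for this example.

```python
import math

def davies_bouldin(points, labels):
    """Classical Davies-Bouldin index for Euclidean points (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid and mean distance-to-centroid (scatter) of every cluster.
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        dim = len(members[0])
        c = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        centroids[lab] = c
        scatter[lab] = sum(math.dist(m, c) for m in members) / len(members)
    # For each cluster, keep the worst (largest) scatter-to-separation ratio.
    # Assumes no two centroids coincide (that would divide by zero).
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)

toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
well_separated = davies_bouldin(toy_points, [0, 0, 1, 1])
badly_mixed = davies_bouldin(toy_points, [0, 1, 0, 1])
```

A compact, well-separated partition yields a small value and a partition that mixes the two groups yields a large one, which is why the index can serve as a quantity to minimise.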
2.
BMC Genomics ; 24(1): 426, 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37516822

ABSTRACT

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore how the functional and structural information of genes in protein interaction networks can be combined to identify driver genes. Therefore, we propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through their structural similarities, and it performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective on feature selection for identifying novel driver genes.


Subjects
Algorithms , Gene Regulatory Networks , Genetic Association Studies , Machine Learning , Protein Interaction Mapping
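
The abstract above mentions a network propagation algorithm without naming a specific variant. One common choice is a random walk with restart, sketched below purely as an assumption; the function name `propagate`, the restart probability, and the four-gene path network are all hypothetical.

```python
def propagate(adj, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart: p <- (1 - r) * W p + r * p0.

    adj is a symmetric dict-of-dicts of edge weights; W spreads each node's
    score to its neighbours in proportion to edge weight.
    """
    nodes = sorted(adj)
    degree = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: float(seed_scores.get(u, 0.0)) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new_p = {}
        for u in nodes:
            # Score flowing into u from each neighbour v, split by v's degree.
            inflow = sum(p[v] * w / degree[v] for v, w in adj[u].items() if degree[v] > 0)
            new_p[u] = (1 - restart) * inflow + restart * p0[u]
        if max(abs(new_p[u] - p[u]) for u in nodes) < tol:
            return new_p
        p = new_p
    return p

# Hypothetical path network A - B - C - D with a mutation signal on A only.
toy_net = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
smoothed = propagate(toy_net, {"A": 1.0})
```

In this toy case the mutation signal on `A` is smoothed over the network, so genes closer to `A` receive larger scores while the total score is preserved.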
3.
BMC Genomics ; 24(1): 279, 2023 May 25.
Article in English | MEDLINE | ID: mdl-37226081

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) have been shown to be closely associated with human diseases. Identifying potential associations between piRNAs and diseases is of great significance for the study of complex diseases. Because traditional wet-lab experiments are time-consuming and expensive, predicting piRNA-disease associations by computational methods is valuable. METHODS: In this paper, a method based on an embedding transformation graph convolution network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed from the similarity information of piRNAs and diseases together with the known piRNA-disease associations, and low-dimensional embeddings of piRNAs and diseases are extracted from it using a graph convolutional network with an attention mechanism. Furthermore, an embedding transformation module is developed to address the problem of embedding space inconsistency; this module is more lightweight and offers stronger learning ability and higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS: Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than that of the five other selected computational models. Case studies on head and neck squamous cell carcinoma and Alzheimer's disease further support the performance of ETGPDA. CONCLUSIONS: ETGPDA is an effective method for predicting hidden piRNA-disease associations.


Subjects
Alzheimer Disease , Head and Neck Neoplasms , Humans , Piwi-Interacting RNA , Alzheimer Disease/genetics , Learning , Research Design
4.
BMC Genomics ; 23(1): 686, 2022 Oct 05.
Article in English | MEDLINE | ID: mdl-36199016

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) have been confirmed to be inextricably linked to the emergence of human complex diseases. The identification of the disease-related miRNAs has gradually become a routine way to unveil the genetic mechanisms of examined disorders. METHODS: In this study, a method BLNIMDA based on a weighted bi-level network was proposed for predicting hidden associations between miRNAs and diseases. For this purpose, the known associations between miRNAs and diseases as well as integrated similarities between miRNAs and diseases are mapped into a bi-level network. Based on the developed bi-level network, the miRNA-disease associations (MDAs) are defined as strong associations, potential associations and no associations. Then, each miRNA-disease pair (MDP) is assigned two information properties according to the bidirectional information distribution strategy, i.e., associations of miRNA towards disease and vice-versa. Finally, two affinity weights for each MDP obtained from the information properties and the association type are then averaged as the final association score of the MDP. Highlights of the BLNIMDA lie in the definition of MDA types, and the introduction of affinity weights evaluation from the bidirectional information distribution strategy and defined association types, which ensure the comprehensiveness and accuracy of the final prediction score of MDAs. RESULTS: Five-fold cross-validation and leave-one-out cross-validation are used to evaluate the performance of the BLNIMDA. The results of the Area Under Curve show that the BLNIMDA has many advantages over the other seven selected computational methods. Furthermore, the case studies based on four common diseases and miRNAs prove that the BLNIMDA has good predictive performance. CONCLUSIONS: Therefore, the BLNIMDA is an effective method for predicting hidden MDAs.


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics
5.
BMC Genomics ; 23(1): 851, 2022 Dec 23.
Article in English | MEDLINE | ID: mdl-36564711

ABSTRACT

In the analysis of single-cell RNA-sequencing (scRNA-seq) data, how to effectively and accurately identify cell clusters from a large number of cell mixtures is still a challenge. Low-rank representation (LRR) method has achieved excellent results in subspace clustering. But in previous studies, most LRR-based methods usually choose the original data matrix as the dictionary. In addition, the methods based on LRR usually use spectral clustering algorithm to complete cell clustering. Therefore, there is a matching problem between the spectral clustering method and the affinity matrix, which is difficult to ensure the optimal effect of clustering. Considering the above two points, we propose the DLNLRR method to better identify the cell type. First, DLNLRR can update the dictionary during the optimization process instead of using the predefined fixed dictionary, so it can realize dictionary learning and LRR learning at the same time. Second, DLNLRR can realize subspace clustering without relying on spectral clustering algorithm, that is, we can perform clustering directly based on the low-rank matrix. Finally, we carry out a large number of experiments on real single-cell datasets and experimental results show that DLNLRR is superior to other scRNA-seq data analysis algorithms in cell type identification.


Subjects
Algorithms , Learning , Cluster Analysis , Data Analysis , RNA/genetics , Single-Cell Analysis , RNA Sequence Analysis
6.
Bioinformatics ; 37(18): 2920-2929, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-33730153

ABSTRACT

MOTIVATION: For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. RESULTS: In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments of the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. AVAILABILITY AND IMPLEMENTATION: The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Glioblastoma , Software , Humans
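
Conditional mutual information is one of the baseline measures that the abstract above compares MPMI against. The sketch below is a simple plug-in estimate of I(X; Y | Z) for discrete data, intended only to illustrate what quantifying a direct association means. It does not implement PMI, NPA, or MPMI, and the function name and toy sequences are assumptions.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from three aligned discrete sequences."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with raw counts.
        cmi += (c / n) * math.log2(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return cmi

# Direct link: Y copies X and Z is unrelated, so conditioning on Z changes nothing.
direct = conditional_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1])
# Indirect link: X and Y are related only through Z, so the conditional value vanishes.
indirect = conditional_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1])
```

The first case keeps a full bit of association after conditioning, while the second drops to zero because the third variable explains the whole relationship.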
7.
Int J Mol Sci ; 23(17)2022 Aug 30.
Article in English | MEDLINE | ID: mdl-36077236

ABSTRACT

Compared to single-drug therapy, drug combinations have shown great potential in cancer treatment. Most of the current methods employ genomic data and chemical information to construct drug-cancer cell line features, but there is still a need to explore methods to combine topological information in the protein interaction network (PPI). Therefore, we propose a network-embedding-based prediction model, NEXGB, which integrates the corresponding protein modules of drug-cancer cell lines with PPI network information. NEXGB extracts the topological features of each protein node in a PPI network by struc2vec. Then, we combine the topological features with the target protein information of drug-cancer cell lines, to generate drug features and cancer cell line features, and utilize extreme gradient boosting (XGBoost) to predict the synergistic relationship between drug combinations and cancer cell lines. We apply our model on two recently developed datasets, the Oncology-Screen dataset (Oncology-Screen) and the large drug combination dataset (DrugCombDB). The experimental results show that NEXGB outperforms five current methods, and it effectively improves the predictive power in discovering relationships between drug combinations and cancer cell lines. This further demonstrates that the network information is valid for detecting combination therapies for cancer and other complex diseases.


Subjects
Antineoplastic Combined Chemotherapy Protocols , Neoplasms , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Drug Combinations , Genomics , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Protein Interaction Maps , Proteins/therapeutic use
8.
Molecules ; 27(14)2022 Jul 07.
Article in English | MEDLINE | ID: mdl-35889243

ABSTRACT

Many microRNAs (miRNAs) have been confirmed to be associated with the generation of human diseases. Capturing miRNA-disease associations (M-DAs) provides an effective way to understand the etiology of diseases. Many models for predicting M-DAs have been constructed; nevertheless, there are still several limitations, such as generally considering direct information between miRNAs and diseases, usually ignoring potential knowledge hidden in isolated miRNAs or diseases. To overcome these limitations, in this study a novel method for predicting M-DAs was developed named TLNPMD, highlights of which are the introduction of drug heuristic information and a bipartite network reconstruction strategy. Specifically, three bipartite networks, including drug-miRNA, drug-disease, and miRNA-disease, were reconstructed as weighted ones using such reconstruction strategy. Based on these weighted bipartite networks, as well as three corresponding similarity networks of drugs, miRNAs and diseases, the miRNA-drug-disease three-layer heterogeneous network was constructed. Then, this heterogeneous network was converted into three two-layer heterogeneous networks, for each of which the network path computational model was employed to predict association scores. Finally, both direct and indirect miRNA-disease paths were used to predict M-DAs. Comparative experiments of TLNPMD and other four models were performed and evaluated by five-fold and global leave-one-out cross validations, results of which show that TLNPMD has the highest AUC values among those of compared methods. In addition, case studies of two common diseases were carried out to validate the effectiveness of the TLNPMD. These experiments demonstrate that the TLNPMD may serve as a promising alternative to existing methods for predicting M-DAs.


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Humans , MicroRNAs/genetics
9.
BMC Bioinformatics ; 21(1): 445, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-33028187

ABSTRACT

BACKGROUND: As a machine learning method with high performance and excellent generalization ability, extreme learning machine (ELM) is gaining popularity in various studies. Various ELM-based methods for different fields have been proposed. However, the robustness to noise and outliers is always the main problem affecting the performance of ELM. RESULTS: In this paper, an integrated method named correntropy induced loss based sparse robust graph regularized extreme learning machine (CSRGELM) is proposed. The introduction of correntropy induced loss improves the robustness of ELM and weakens the negative effects of noise and outliers. By using the L2,1-norm to constrain the output weight matrix, we tend to obtain a sparse output weight matrix to construct a simpler single hidden layer feedforward neural network model. By introducing the graph regularization to preserve the local structural information of the data, the classification performance of the new method is further improved. Besides, we design an iterative optimization method based on the idea of half quadratic optimization to solve the non-convex problem of CSRGELM. CONCLUSIONS: The classification results on the benchmark dataset show that CSRGELM can obtain better classification results compared with other methods. More importantly, we also apply the new method to the classification problems of cancer samples and get a good classification effect.


Subjects
Machine Learning , Neoplasms/classification , Benchmarking , Computational Biology/methods , Factual Databases , Humans , Neoplasms/pathology
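
Two ingredients named in the abstract above are easy to state in code: the L2,1-norm and a correntropy-induced loss. The sketch below uses their usual textbook definitions; the exact forms and kernel width used in CSRGELM are not given in the abstract, so the function names and the parameter `sigma` are assumptions.

```python
import math

def l21_norm(matrix):
    """L2,1-norm: the sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)

def correntropy_loss(error, sigma=1.0):
    """Correntropy-induced loss 1 - exp(-e^2 / (2 sigma^2)); never exceeds 1."""
    return 1.0 - math.exp(-(error ** 2) / (2.0 * sigma ** 2))

# Rows with norms 5, 0 and 13; an all-zero row contributes nothing, which is
# why penalising this norm tends to switch off whole rows (row sparsity).
weights = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
penalty = l21_norm(weights)

small_residual = correntropy_loss(1.0)
outlier_residual = correntropy_loss(100.0)  # saturates instead of growing like 100**2
```

Because the loss saturates for large residuals, a single outlier cannot dominate the objective the way it would under a squared-error loss.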
10.
BMC Bioinformatics ; 21(1): 339, 2020 Jul 31.
Article in English | MEDLINE | ID: mdl-32736513

ABSTRACT

BACKGROUND: It has been widely accepted that long non-coding RNAs (lncRNAs) play important roles in the development and progression of human diseases. Many association prediction models have been proposed for predicting lncRNA functions and identifying potential lncRNA-disease associations. Nevertheless, little effort has been devoted to measuring lncRNA functional similarity, which is an essential part of association prediction models. RESULTS: In this study, we present an lncRNA functional similarity calculation model, IDSSIM for short, based on an improved disease semantic similarity method. Its highlight is the introduction of an information content contribution factor into the semantic value calculation, which takes into account both the hierarchical structures of disease directed acyclic graphs and the disease specificities. IDSSIM and three state-of-the-art models, i.e., LNCSIM1, LNCSIM2, and ILNCSIM, were evaluated by feeding their disease semantic similarity matrices and lncRNA functional similarity matrices, together with the corresponding matrices of human lncRNA-disease associations from either the lncRNADisease database or the MNDR database, into the association prediction method WKNKN for lncRNA-disease association prediction. In addition, case studies of breast cancer and adenocarcinoma were performed to validate the effectiveness of IDSSIM. CONCLUSIONS: In terms of ROC curves and AUC values, IDSSIM is superior to the compared models and effectively improves the accuracy of disease semantic similarity, which increases the association prediction ability of the IDSSIM-WKNKN model. In the case studies, most of the potential disease-associated lncRNAs predicted by IDSSIM are confirmed by databases and the literature, implying that IDSSIM can serve as a promising tool for predicting lncRNA functions, identifying potential lncRNA-disease associations, and pre-screening candidate lncRNAs for biological experiments. The IDSSIM code, all experimental data and prediction results are available online at https://github.com/CDMB-lab/IDSSIM.


Subjects
Algorithms , Computational Biology/methods , Disease/genetics , Genetic Models , Long Noncoding RNA/genetics , Semantics , Adenocarcinoma/genetics , Area Under Curve , Breast Neoplasms/genetics , Genetic Databases , Female , Humans , ROC Curve
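
To make the idea of disease semantic similarity on a directed acyclic graph concrete, the sketch below implements a common fixed-decay formulation: a disease contributes 1 to itself and each step up to an ancestor multiplies the contribution by a constant. This is a baseline illustration only, not IDSSIM's information content contribution factor, which the abstract does not spell out. The function names, the decay `delta=0.5`, and the three-level toy hierarchy are assumptions.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of `disease` and each of its ancestors to its semantic value."""
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            value = delta * contrib[term]
            # Keep the largest contribution when an ancestor is reachable by several paths.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)
    return contrib

def semantic_similarity(a, b, parents, delta=0.5):
    """Shared-ancestor contributions divided by the two total semantic values."""
    da = semantic_contributions(a, parents, delta)
    db = semantic_contributions(b, parents, delta)
    shared = da.keys() & db.keys()
    return sum(da[t] + db[t] for t in shared) / (sum(da.values()) + sum(db.values()))

# Hypothetical toy hierarchy: child -> list of parents.
toy_parents = {
    "adenocarcinoma": ["carcinoma"],
    "carcinoma": ["neoplasm"],
    "sarcoma": ["neoplasm"],
}
close_pair = semantic_similarity("adenocarcinoma", "carcinoma", toy_parents)
distant_pair = semantic_similarity("adenocarcinoma", "sarcoma", toy_parents)
```

Diseases that share a large part of their ancestry score higher than diseases that meet only at the root, which is the behaviour any refinement of the semantic value is meant to sharpen.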
11.
Hum Genomics ; 13(Suppl 1): 46, 2019 Oct 22.
Article in English | MEDLINE | ID: mdl-31639067

ABSTRACT

BACKGROUND: As one of the most popular data representation methods, non-negative matrix factorization (NMF) has received wide attention in clustering and feature selection tasks. However, most previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. RESULTS: To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures only the relationship between pairs of sample points, hypergraph regularization captures high-order relationships among more than two sample points. Moreover, the robustness of RHNMF is enhanced by using the L2,1-norm constraint when estimating the residual, because the L2,1-norm is insensitive to noise and outliers. CONCLUSIONS: Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets show that the proposed model outperforms other state-of-the-art methods.


Subjects
Algorithms , Genetic Databases , Neoplastic Gene Expression Regulation , Cluster Analysis , Humans , Neoplasms/genetics
12.
Hum Hered ; 84(1): 9-20, 2019.
Article in English | MEDLINE | ID: mdl-31412348

ABSTRACT

Cancer subtyping is of great importance for the prediction, diagnosis, and precise treatment of cancer patients. Many clustering methods have been proposed for cancer subtyping. In 2014, a clustering algorithm named Clustering by Fast Search and Find of Density Peaks (CFDP) was published in Science; it has since been applied to cancer subtyping with attractive results. However, CFDP requires two key parameters (cluster centers and cutoff distance) to be set manually, and their optimal values are difficult to determine. To overcome this limitation, an automatic clustering method named PSO-CFDP is proposed in this paper, in which cluster centers and cutoff distance are determined automatically by running an improved particle swarm optimization (PSO) algorithm multiple times. Experiments using PSO-CFDP, as well as LR-CFDP, STClu, CH-CCFDAC, and CFDP, were performed on four benchmark datasets and two real cancer gene expression datasets. The results show that PSO-CFDP can determine cluster centers and cutoff distance automatically within a controllable time and cost and can therefore improve the accuracy of cancer subtyping.


Subjects
Algorithms , Cluster Analysis , Neoplasms/classification , Gene Expression , Humans , Neoplasms/genetics
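
The two quantities at the heart of density-peak clustering can be sketched briefly: a local density for each point and its distance to the nearest denser point. The code below is a minimal illustration of that idea with a manually chosen cutoff, which is exactly the parameter the abstract above says PSO-CFDP tunes automatically. The PSO step is not reproduced, and the function name, cutoff value, and toy points are assumptions.

```python
import math

def density_peaks(points, cutoff):
    """Return (rho, delta) for every point.

    rho[i]   = number of other points strictly closer than `cutoff`
    delta[i] = distance to the nearest point with a higher rho
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance to any point.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta

# Two hypothetical groups, each with a central point surrounded by corner points.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
              (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
rho, delta = density_peaks(toy_points, cutoff=0.8)
# Candidate centres are points where both rho and delta are large.
ranked = sorted(range(len(toy_points)), key=lambda i: rho[i] * delta[i], reverse=True)
```

Ranking by the product of the two quantities is one common heuristic for picking centres; here the two central points (indices 4 and 8) come out on top.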
13.
BMC Bioinformatics ; 20(Suppl 22): 716, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888433

ABSTRACT

BACKGROUND: In recent years, identification of differentially expressed genes and sample clustering have become hot topics in bioinformatics. Principal Component Analysis (PCA) is a widely used method for gene expression data. However, it has two limitations. First, the geometric structure hidden in the data, e.g., the pair-wise distance between data points, is not explored, although this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret, whereas only a few genes are related to a given cancer. Identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer. RESULTS: In this study, a new method, gLSPCA, is proposed to integrate both the graph Laplacian and a sparse constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other hand, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs. CONCLUSIONS: Experiments with gLSPCA and comparisons with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt, and gLPCA, are performed on real datasets of both pancreatic cancer (PAAD) and head & neck squamous carcinoma (HNSC). The results demonstrate that gLSPCA is effective in identifying differentially expressed genes and in sample clustering. In addition, the applications of gLSPCA to these datasets provide several new clues for the exploration of causative factors of PAAD and HNSC.


Subjects
Algorithms , Genetic Databases , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Principal Component Analysis , Cluster Analysis , Gene Expression , Humans , Neoplasms/genetics , Protein Interaction Maps
14.
BMC Bioinformatics ; 20(1): 5, 2019 Jan 05.
Article in English | MEDLINE | ID: mdl-30611214

ABSTRACT

BACKGROUND: Predicting drug-disease interactions (DDIs) is time-consuming and expensive. Improving the accuracy of prediction results is necessary, and it is crucial to develop a novel computing technology to predict new DDIs. The existing methods mostly use the construction of heterogeneous networks to predict new DDIs. However, the number of known interacting drug-disease pairs is small, so there will be many errors in this heterogeneous network that will interfere with the final results. RESULTS: A novel method, known as the dual-network L2,1-collaborative matrix factorization, is proposed to predict novel DDIs. The Gaussian interaction profile kernels and L2,1-norm are introduced in our method to achieve better results than other advanced methods. The network similarities of drugs and diseases with their chemical and semantic similarities are combined in this method. CONCLUSIONS: Cross validation is used to evaluate our method, and simulation experiments are used to predict new interactions using two different datasets. Finally, our prediction accuracy is better than other existing methods. This proves that our method is feasible and effective.


Subjects
Algorithms , Computational Biology/methods , Disease , Drug Interactions , Area Under Curve , Databases as Topic , Humans , Reproducibility of Results , Semantics
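
The Gaussian interaction profile kernel mentioned in the abstract above turns each row of a known interaction matrix into a similarity with every other row. The sketch below follows the usual definition, with the bandwidth normalised by the average squared profile norm; the abstract does not state the settings used in the paper, so the function name, the `bandwidth` default, and the toy drug-by-disease matrix are assumptions.

```python
import math

def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of an interaction matrix."""
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / len(profiles)
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = bandwidth / mean_sq  # normalise by the average squared profile norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(a, b)) for b in profiles] for a in profiles]

# Hypothetical drug-by-disease matrix: drugs 0 and 1 share the same known diseases.
toy_interactions = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
drug_similarity = gip_kernel(toy_interactions)
```

Drugs with identical interaction profiles get similarity 1, and the similarity decays as the profiles diverge. Applying the same function to the transposed matrix would give a disease-by-disease kernel.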
15.
BMC Bioinformatics ; 17(1): 214, 2016 May 17.
Article in English | MEDLINE | ID: mdl-27184783

ABSTRACT

BACKGROUND: Detecting and visualizing nonlinear interaction effects of single nucleotide polymorphisms (SNPs), or epistatic interactions, are important topics in bioinformatics because they play an important role in unraveling the mystery of "missing heritability". However, related studies are mostly limited to pairwise epistatic interactions because of methodological and computational challenges. RESULTS: We develop CINOEDV (Co-Information based N-Order Epistasis Detector and Visualizer) for the detection and visualization of epistatic interactions of orders from 1 to n (n ≥ 2). CINOEDV is composed of two stages, namely a detecting stage and a visualizing stage. In the detecting stage, co-information based measures are employed to quantify the association effects of n-order SNP combinations on the phenotype, and two types of search strategies are introduced to identify n-order epistatic interactions: an exhaustive search and a particle swarm optimization based search. In the visualizing stage, all detected n-order epistatic interactions are used to construct a hypergraph, where a real vertex represents the main effect of a SNP and a virtual vertex denotes the interaction effect of an n-order epistatic interaction. Analyzing the constructed hypergraph in depth can reveal hidden clues for better understanding the underlying genetic architecture of complex diseases. CONCLUSIONS: Experiments with CINOEDV and comparisons with existing state-of-the-art methods are performed on both simulation data sets and a real data set of age-related macular degeneration. Results demonstrate that CINOEDV is promising in detecting and visualizing n-order epistatic interactions. CINOEDV is implemented in R and is freely available from R CRAN (http://cran.r-project.org) and from SourceForge (https://sourceforge.net/projects/cinoedv/files/).


Subjects
Algorithms , Computational Biology/methods , Genetic Epistasis , Single Nucleotide Polymorphism , Genome-Wide Association Study , Humans , Macular Degeneration/genetics
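
For the two-SNP case, an information-theoretic interaction measure can be sketched as the part of the joint information about the phenotype that neither SNP carries alone. The exact co-information measures and sign conventions used by CINOEDV are not stated in the abstract, so the code below is an assumption-laden illustration, and the names `entropy`, `mutual_information`, and `interaction_gain` are hypothetical.

```python
import math
from collections import Counter

def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more aligned discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(xs, ys)

def interaction_gain(snp_a, snp_b, phenotype):
    """I(A,B; Y) - I(A; Y) - I(B; Y): information about Y visible only jointly."""
    pair = list(zip(snp_a, snp_b))
    return (mutual_information(pair, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))

# Toy XOR-style pattern: neither SNP alone predicts the phenotype, the pair does.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [0, 1, 1, 0]
main_effect_a = mutual_information(snp_a, phenotype)
pair_gain = interaction_gain(snp_a, snp_b, phenotype)
```

In this toy pattern each SNP has zero main effect while the pair carries a full bit of information, which is the kind of pure interaction that a single-SNP scan would miss.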
16.
Noncoding RNA ; 10(1)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38392964

ABSTRACT

Biological research has demonstrated the significance of identifying miRNA-disease associations for disease prevention, diagnosis, and treatment. However, inferring these associations through biological experiments is both costly and inefficient. Consequently, there is a pressing need for novel approaches that offer better accuracy and effectiveness. At present, the predominant methods for predicting disease associations rely on Graph Convolutional Network (GCN) techniques. However, the GCN algorithm aggregates locally: at each layer it incorporates information only from the immediate neighbors of a given node. Consequently, GCN cannot simultaneously aggregate information from multiple nodes, and this constraint significantly affects the predictive performance of the model. To tackle this problem, we propose a novel approach based on HyperGCN and the Sørensen-Dice loss (HGSMDA) for predicting associations between miRNAs and diseases. First, we build multiple networks to represent the similarity between miRNAs and diseases and employ GCNs to extract information from diverse perspectives. Next, we introduce HyperGCN to construct a miRNA-disease heteromorphic hypergraph using hypernodes and train a GCN on the graph to aggregate information. Finally, we use the Sørensen-Dice loss function to evaluate the degree of similarity between the predicted outcomes and the ground truth values, thereby enabling the prediction of associations between miRNAs and diseases. To assess the soundness of our methodology, an extensive series of experiments was conducted using the Human MicroRNA Disease Database (HMDD v3.2) as the dataset. The experimental results indicate that HGSMDA performs well compared with alternative methods. Furthermore, the predictive capacity of HGSMDA was corroborated through a case study focused on colon cancer. These findings suggest that HGSMDA is a dependable and valid framework that offers a novel avenue for investigating the association between miRNAs and diseases.
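
The Sørensen-Dice loss named in the abstract above measures the overlap between predicted association scores and known labels. The sketch below shows a common soft form with a smoothing constant; whether HGSMDA uses this exact form is not stated in the abstract, so the function name and the `smooth` parameter are assumptions.

```python
def dice_loss(predicted, target, smooth=1.0):
    """Soft Sørensen-Dice loss: 1 - (2 * overlap + s) / (sum(pred) + sum(target) + s)."""
    overlap = sum(p * t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    return 1.0 - (2.0 * overlap + smooth) / (total + smooth)

# Hypothetical association labels for four miRNA-disease pairs.
truth = [1.0, 0.0, 1.0, 0.0]
perfect = dice_loss([1.0, 0.0, 1.0, 0.0], truth)
inverted = dice_loss([0.0, 1.0, 0.0, 1.0], truth)
```

Because the loss depends on the overlap with the positive labels rather than on every entry equally, it is often preferred when known associations are much rarer than unknown pairs.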

17.
Article in English | MEDLINE | ID: mdl-38319777

ABSTRACT

Advances in high-throughput single-cell RNA sequencing (scRNA-seq) technology have provided more comprehensive biological information on cell expression. Clustering analysis is a critical step in scRNA-seq research and provides clear knowledge of the cell identity. Unfortunately, the characteristics of scRNA-seq data and the limitations of existing technologies make clustering encounter a considerable challenge. Meanwhile, some existing methods treat different features equally and ignore differences in feature contributions, which leads to a loss of information. To overcome limitations, we introduce a weighted distance constraint into the construction of the similarity graph and combine the similarity constraint. We propose the Joint Automatic Weighting Similarity Graph and Low-rank Representation (JAGLRR) clustering method. Evaluating the contributions of each feature and assigning various weight values can increase the significance of valuable features while decreasing the interference of redundant features. The similarity constraint allows the model to generate a more symmetric affinity matrix. Benefitting from that affinity matrix, JAGLRR recovers the original linear relationship of the data more accurately and obtains more discriminative information. The results on simulated datasets and 8 real datasets show that JAGLRR outperforms 11 existing comparison methods in clustering experiments, with higher clustering accuracy and stability.


Subjects
Algorithms , Computational Biology , RNA-Seq , Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Computational Biology/methods , RNA-Seq/methods , Humans , Animals , RNA Sequence Analysis/methods , Mice , Single-Cell Gene Expression Analysis
18.
Article in English | MEDLINE | ID: mdl-36912759

ABSTRACT

The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has some defects when used in mining the nonlinear manifold structure of the data and usually suffers from over-density of principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information in the process of mapping high-dimensional data to low-dimensional space, to more accurately obtain sparse principal components and to then identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser, thus increasing their interpretability. Then, we evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.


Subjects
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Principal Component Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
19.
Int J Neural Syst ; : 2450050, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38973024

ABSTRACT

Although the density peak clustering (DPC) algorithm can effectively assign samples and quickly identify noise points, it lacks adaptability and does not account for local data structure. In addition, clustering algorithms generally suffer from high time complexity. Prior research suggests that clustering algorithms grounded in P systems can mitigate this problem. Within the realm of membrane systems (P systems), spiking neural P systems (SN P systems), inspired by biological nervous systems, are third-generation neural networks that possess intricate structures and offer substantial parallelism. This study therefore first improved DPC by introducing the maximum nearest neighbor distance and K-nearest neighbors (KNN). Moreover, a method based on delayed spiking neural P systems (DSN P systems) was proposed to improve the performance of the algorithm. Subsequently, the DSNP-ANDPC algorithm was proposed. The effectiveness of DSNP-ANDPC was evaluated on four synthetic datasets and 10 real-world datasets. The proposed method outperformed the other comparison methods in most cases.
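
As background, the baseline DPC procedure that the abstract sets out to improve can be sketched as follows. This is a minimal version of classic density peak clustering, not DSNP-ANDPC: it uses a fixed cutoff `d_c` and a Gaussian kernel density, with none of the KNN adaptation or P-system parallelism described above. The function name, the parameters, and the two-blob toy data are assumptions for illustration.

```python
import numpy as np

def density_peak_clustering(X, d_c, n_clusters):
    """Classic DPC sketch: local density rho, distance delta to the nearest
    denser point, centers chosen by largest rho * delta."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0   # subtract the self term
    order = np.argsort(-rho)                          # densest point first

    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                         # points at least as dense
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j

    gamma = rho * delta
    gamma[order[0]] = np.inf     # sketch choice: the densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:              # descending density, so the parent is already labeled
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers

# Hypothetical toy data: two well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)),
               rng.normal(5.0, 0.3, size=(30, 2))])
labels, centers = density_peak_clustering(X, d_c=0.5, n_clusters=2)
```

The fixed `d_c` and the all-pairs distance matrix are where the adaptability and time-complexity concerns raised in the abstract show up: the result depends on a hand-picked cutoff, and the cost grows quadratically with the number of samples.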

20.
Article in English | MEDLINE | ID: mdl-38833405

ABSTRACT

Feature selection is a critical component of data mining and has garnered significant attention in recent years. However, feature selection methods based on information entropy often introduce complex mutual information forms to measure features, leading to increased redundancy and potential errors. To address this issue, we propose FSCME, a feature selection method that combines Copula correlation (Ccor) and the maximum information coefficient (MIC) using entropy weights. FSCME takes into consideration the relevance between features and labels, as well as the redundancy among candidate features and selected features. Accordingly, FSCME uses Ccor to measure the redundancy between features, while also estimating the relevance between features and labels. Meanwhile, FSCME employs MIC to enhance the credibility of the correlation between features and labels. Moreover, this study employs the Entropy Weight Method (EWM) to evaluate and assign weights to Ccor and MIC. The experimental results demonstrate that FSCME yields a more effective feature subset for subsequent clustering processes, significantly improving the classification performance compared with six other feature selection methods. The source code of FSCME is available online at https://github.com/CDMBlab/FSCME.
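
The Entropy Weight Method mentioned in the abstract has a standard textbook form, sketched below. This shows only the generic EWM weighting step; how FSCME computes Ccor and MIC, and how it combines the weighted measures, is not stated in the abstract and is not reproduced here. The function name and the toy score matrix (rows are candidate features, columns are two hypothetical measures) are assumptions for illustration.

```python
import numpy as np

def entropy_weights(scores):
    """Textbook EWM: criteria whose values vary more across items (lower
    normalized entropy) receive larger weights. Expects nonnegative scores."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    p = scores / scores.sum(axis=0, keepdims=True)    # column-wise proportions
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)   # treat 0 * log(0) as 0
    entropy = -plogp.sum(axis=0) / np.log(n)          # normalized to [0, 1]
    divergence = 1.0 - entropy
    return divergence / divergence.sum()

# Hypothetical scores for three candidate features under two measures.
toy_scores = np.array([[0.9, 0.5],
                       [0.1, 0.4],
                       [0.5, 0.6]])
weights = entropy_weights(toy_scores)
```

In this toy matrix the first column spreads its values more unevenly across the features than the second, so it receives the larger weight.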
