Results 1 - 20 of 143
1.
Brief Bioinform ; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36305457

ABSTRACT

With the development of research on the complex aetiology of many diseases, computational drug repositioning has proven to be a shortcut compared with costly and inefficient traditional methods. Therefore, developing more promising computational methods is indispensable for finding new candidate diseases to treat with existing drugs. In this paper, a model called GLGMPNN, which integrates a new variant of the message passing neural network with a novel gated fusion mechanism, is proposed for drug-disease association prediction. First, a light-gated message passing neural network (LGMPNN), including message passing, aggregation and updating, is proposed to separately extract multiple pieces of information from the similarity networks and the association network. Then, a gated fusion mechanism consisting of a forget gate and an output gate is applied to integrate the multiple pieces of information. The forget gate, calculated from the multiple embeddings, is built to integrate the association information into the similarity information. Furthermore, the final node representations are controlled by the output gate, which fuses the topology information of the networks and the initial similarity information. Finally, a bilinear decoder is adopted to reconstruct an adjacency matrix for drug-disease associations. Evaluated by 10-fold cross-validation, GLGMPNN achieves excellent performance compared with current models. Further studies show that the model can effectively discover novel drug-disease associations.
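
The abstract does not give the exact form of the decoder. As a minimal sketch, assuming the common bilinear form score(i, j) = sigmoid(h_i^T W h_j), the reconstruction step might look as follows. The function name `bilinear_decode`, the toy embeddings and the weight matrix are hypothetical and are not taken from GLGMPNN; the sigmoid output is also an assumption.

```python
import math

def bilinear_decode(drug_emb, disease_emb, weight):
    """Return S with S[i][j] = sigmoid(drug_i^T W disease_j).

    Assumed form of a bilinear decoder; not the published GLGMPNN code.
    """
    scores = []
    for d in drug_emb:
        # Project the drug embedding through W, giving the row vector d^T W.
        dw = [sum(d[k] * weight[k][m] for k in range(len(d)))
              for m in range(len(weight[0]))]
        row = []
        for s in disease_emb:
            logit = sum(dw[m] * s[m] for m in range(len(s)))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        scores.append(row)
    return scores

# Toy data (hypothetical): 2 drugs, 3 diseases, 2-dimensional embeddings.
drug_emb = [[1.0, 0.0], [0.0, 1.0]]
disease_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[2.0, 0.0], [0.0, -2.0]]

scores = bilinear_decode(drug_emb, disease_emb, weight)
for row in scores:
    print([round(v, 3) for v in row])
```

Each entry of `scores` can be read as a predicted association probability for one drug-disease pair; in a trained model the embeddings and `weight` would be learned rather than fixed by hand.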


Subjects
Computational Biology , Neural Networks, Computer , Computational Biology/methods , Drug Repositioning/methods , Algorithms
2.
BMC Bioinformatics ; 24(1): 13, 2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36624376

ABSTRACT

BACKGROUND: Constructing molecular interaction networks from microarray data and then identifying disease module biomarkers can provide insight into the underlying pathogenic mechanisms of non-small cell lung cancer. A promising approach for identifying disease modules in the network is community detection. RESULTS: In order to identify disease modules from gene co-expression networks, a community detection method is proposed based on multi-objective optimization genetic algorithm with decomposition. The method is named DM-MOGA and possesses two highlights. First, the boundary correction strategy is designed for the modules obtained in the process of local module detection and pre-simplification. Second, during the evolution, we introduce Davies-Bouldin index and clustering coefficient as fitness functions which are improved and migrated to weighted networks. In order to identify modules that are more relevant to diseases, the above strategies are designed to consider the network topology of genes and the strength of connections with other genes at the same time. Experimental results of different gene expression datasets of non-small cell lung cancer demonstrate that the core modules obtained by DM-MOGA are more effective than those obtained by several other advanced module identification methods. CONCLUSIONS: The proposed method identifies disease-relevant modules by optimizing two novel fitness functions to simultaneously consider the local topology of each gene and its connection strength with other genes. The association of the identified core modules with lung cancer has been confirmed by pathway and gene ontology enrichment analysis.


Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Gene Regulatory Networks , Microarray Analysis , Algorithms , Gene Expression Profiling/methods
3.
BMC Genomics ; 24(1): 426, 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37516822

ABSTRACT

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore combining the functional and structural information of genes in protein interaction networks to identify driver genes. Therefore, we propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and the structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective for feature selection to identify novel driver genes.
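
The abstract does not say which network propagation algorithm is used. One widely used choice is random walk with restart, which iterates F(t+1) = alpha * W * F(t) + (1 - alpha) * F(0), where W is the column-normalised adjacency matrix and F(0) holds the initial mutation scores. The sketch below assumes that variant; the function name `network_propagation`, the parameter values and the toy graph are hypothetical.

```python
def network_propagation(adj, seed_scores, alpha=0.5, max_iter=200, tol=1e-10):
    """Random walk with restart on an undirected graph (assumed variant).

    adj         -- square adjacency matrix as nested lists
    seed_scores -- initial per-gene scores, e.g. mutation frequencies
    alpha       -- fraction of score spread to neighbours at each step
    """
    n = len(adj)
    col_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    scores = list(seed_scores)
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Score received from neighbours, each sharing its score by degree.
            spread = sum(adj[i][j] / col_deg[j] * scores[j]
                         for j in range(n) if col_deg[j] > 0)
            new_scores.append(alpha * spread + (1 - alpha) * seed_scores[i])
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores

# Toy path graph gene0 - gene1 - gene2, with a mutation signal on gene0 only.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
propagated = network_propagation(adj, [1.0, 0.0, 0.0])
print([round(v, 4) for v in propagated])
```

After propagation, genes close to a mutated gene carry part of its signal, so the smoothed scores reflect both the mutation data and the network neighbourhood.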


Subjects
Algorithms , Gene Regulatory Networks , Genetic Association Studies , Machine Learning , Protein Interaction Mapping
4.
BMC Genomics ; 24(1): 279, 2023 May 25.
Article in English | MEDLINE | ID: mdl-37226081

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. Identifying potential associations between piRNAs and diseases is of great significance for understanding complex diseases. Because traditional wet-lab experiments are time-consuming and expensive, predicting piRNA-disease associations by computational methods is of great value. METHODS: In this paper, a method based on an embedding transformation graph convolution network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed based on the similarity information of piRNAs and diseases, as well as the known piRNA-disease associations, and is used to extract low-dimensional embeddings of piRNAs and diseases with a graph convolutional network with an attention mechanism. Furthermore, an embedding transformation module is developed to address the problem of embedding space inconsistency; it is more lightweight and offers stronger learning ability and higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS: Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than the other five selected computational models. Case studies on head and neck squamous cell carcinoma and Alzheimer's disease further demonstrate the superior performance of ETGPDA. CONCLUSIONS: Hence, ETGPDA is an effective method for predicting hidden piRNA-disease associations.


Subjects
Alzheimer Disease , Head and Neck Neoplasms , Humans , Piwi-Interacting RNA , Alzheimer Disease/genetics , Learning , Research Design
5.
BMC Bioinformatics ; 23(1): 381, 2022 Sep 19.
Article in English | MEDLINE | ID: mdl-36123637

ABSTRACT

Biclustering algorithms are effective tools for processing gene expression datasets. Two kinds of data matrices, binary and non-binary, are processed by biclustering methods. A binary matrix is usually converted from pre-processed gene expression data, which can effectively reduce the interference from noise and abnormal data, and is then processed using a biclustering algorithm. However, biclustering algorithms for binary data strike a poor balance between running time and performance. In this paper, we propose a new biclustering algorithm for binary data, called the Adjacency Difference Matrix Binary Biclustering algorithm (AMBB), to address this drawback. The AMBB algorithm constructs the adjacency matrix based on the adjacency difference values, and the submatrix obtained by continuously updating the adjacency difference matrix is called a bicluster. The adjacency matrix allows genes that undergo similar reactions under different conditions to be grouped into clusters, which is important for subsequent gene analysis. Meanwhile, experiments on synthetic and real datasets visually demonstrate that the AMBB algorithm has high practicability.


Subjects
Data Analysis , Gene Expression Profiling , Algorithms , Gene Expression , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods
6.
BMC Genomics ; 23(1): 686, 2022 Oct 05.
Article in English | MEDLINE | ID: mdl-36199016

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) have been confirmed to be inextricably linked to the emergence of human complex diseases. The identification of the disease-related miRNAs has gradually become a routine way to unveil the genetic mechanisms of examined disorders. METHODS: In this study, a method BLNIMDA based on a weighted bi-level network was proposed for predicting hidden associations between miRNAs and diseases. For this purpose, the known associations between miRNAs and diseases as well as integrated similarities between miRNAs and diseases are mapped into a bi-level network. Based on the developed bi-level network, the miRNA-disease associations (MDAs) are defined as strong associations, potential associations and no associations. Then, each miRNA-disease pair (MDP) is assigned two information properties according to the bidirectional information distribution strategy, i.e., associations of miRNA towards disease and vice-versa. Finally, two affinity weights for each MDP obtained from the information properties and the association type are then averaged as the final association score of the MDP. Highlights of the BLNIMDA lie in the definition of MDA types, and the introduction of affinity weights evaluation from the bidirectional information distribution strategy and defined association types, which ensure the comprehensiveness and accuracy of the final prediction score of MDAs. RESULTS: Five-fold cross-validation and leave-one-out cross-validation are used to evaluate the performance of the BLNIMDA. The results of the Area Under Curve show that the BLNIMDA has many advantages over the other seven selected computational methods. Furthermore, the case studies based on four common diseases and miRNAs prove that the BLNIMDA has good predictive performance. CONCLUSIONS: Therefore, the BLNIMDA is an effective method for predicting hidden MDAs.


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics
7.
Bioinformatics ; 37(18): 2920-2929, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-33730153

ABSTRACT

MOTIVATION: For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. RESULTS: In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments of the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. AVAILABILITY AND IMPLEMENTATION: The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Glioblastoma , Software , Humans
8.
BMC Med Inform Decis Mak ; 22(1): 69, 2022 Mar 19.
Article in English | MEDLINE | ID: mdl-35305630

ABSTRACT

BACKGROUND: MiRNAs are a class of non-coding single-stranded RNA molecules, approximately 22 nucleotides in length and encoded by endogenous genes, which can regulate the expression of other genes. Therefore, it is very important to predict the associations between miRNAs and diseases. Earlier work developed a prediction method for drug-disease associations that achieved good results. METHODS: In this paper, we introduce the LAGCN method to identify potential miRNA-disease associations. First, we integrate three kinds of information into a heterogeneous network: the known miRNA-disease associations, miRNA-miRNA similarities and disease-disease similarities. Next, we apply a graph convolution network to learn the embeddings of miRNAs and diseases. We use an attention mechanism to combine the embeddings from multiple convolution layers. Unobserved miRNA-disease associations are scored based on the integrated embedding. RESULTS: In fivefold cross-validation, the AUC reaches 0.9091, which is higher than that of other prediction methods and baseline methods. CONCLUSIONS: In this paper, we introduce the LAGCN method to identify potential miRNA-disease associations. LAGCN achieves good performance in predicting miRNA-disease associations and is superior to other association prediction methods and baseline methods.
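
The two building blocks named in the METHODS section can be illustrated with a small sketch. It assumes the standard graph convolution rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) and a softmax-weighted sum over layer embeddings. The attention logits are fixed by hand here, whereas a trained model would learn them; all names and toy values are hypothetical and are not taken from the LAGCN implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, with self-loops added before normalising."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(norm_adj, features, weight):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    z = matmul(matmul(norm_adj, features), weight)
    return [[max(0.0, v) for v in row] for row in z]

def combine_layers(layer_embeddings, attention_logits):
    """Softmax-weighted sum of per-layer embeddings (logits assumed given)."""
    top = max(attention_logits)
    exps = [math.exp(a - top) for a in attention_logits]
    weights = [e / sum(exps) for e in exps]
    n, d = len(layer_embeddings[0]), len(layer_embeddings[0][0])
    return [[sum(w * emb[i][j] for w, emb in zip(weights, layer_embeddings))
             for j in range(d)] for i in range(n)]

# Toy heterogeneous network with two connected nodes (hypothetical values).
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]

norm_adj = normalize_adjacency(adj)
h1 = gcn_layer(norm_adj, features, weight)
h2 = gcn_layer(norm_adj, h1, weight)
combined = combine_layers([h1, h2], [0.0, 0.0])
print(combined)
```

Combining the layers lets the final embedding draw on both near and more distant neighbourhoods instead of using only the last layer.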


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Humans , MicroRNAs/genetics
9.
Int J Mol Sci ; 23(17), 2022 Aug 30.
Article in English | MEDLINE | ID: mdl-36077236

ABSTRACT

Compared to single-drug therapy, drug combinations have shown great potential in cancer treatment. Most of the current methods employ genomic data and chemical information to construct drug-cancer cell line features, but there is still a need to explore methods to combine topological information in the protein interaction network (PPI). Therefore, we propose a network-embedding-based prediction model, NEXGB, which integrates the corresponding protein modules of drug-cancer cell lines with PPI network information. NEXGB extracts the topological features of each protein node in a PPI network by struc2vec. Then, we combine the topological features with the target protein information of drug-cancer cell lines, to generate drug features and cancer cell line features, and utilize extreme gradient boosting (XGBoost) to predict the synergistic relationship between drug combinations and cancer cell lines. We apply our model on two recently developed datasets, the Oncology-Screen dataset (Oncology-Screen) and the large drug combination dataset (DrugCombDB). The experimental results show that NEXGB outperforms five current methods, and it effectively improves the predictive power in discovering relationships between drug combinations and cancer cell lines. This further demonstrates that the network information is valid for detecting combination therapies for cancer and other complex diseases.


Subjects
Antineoplastic Combined Chemotherapy Protocols , Neoplasms , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Drug Combinations , Genomics , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Protein Interaction Maps , Proteins/therapeutic use
10.
Molecules ; 27(14), 2022 Jul 07.
Article in English | MEDLINE | ID: mdl-35889243

ABSTRACT

Many microRNAs (miRNAs) have been confirmed to be associated with the generation of human diseases. Capturing miRNA-disease associations (M-DAs) provides an effective way to understand the etiology of diseases. Many models for predicting M-DAs have been constructed; nevertheless, there are still several limitations, such as generally considering direct information between miRNAs and diseases, usually ignoring potential knowledge hidden in isolated miRNAs or diseases. To overcome these limitations, in this study a novel method for predicting M-DAs was developed named TLNPMD, highlights of which are the introduction of drug heuristic information and a bipartite network reconstruction strategy. Specifically, three bipartite networks, including drug-miRNA, drug-disease, and miRNA-disease, were reconstructed as weighted ones using such reconstruction strategy. Based on these weighted bipartite networks, as well as three corresponding similarity networks of drugs, miRNAs and diseases, the miRNA-drug-disease three-layer heterogeneous network was constructed. Then, this heterogeneous network was converted into three two-layer heterogeneous networks, for each of which the network path computational model was employed to predict association scores. Finally, both direct and indirect miRNA-disease paths were used to predict M-DAs. Comparative experiments of TLNPMD and other four models were performed and evaluated by five-fold and global leave-one-out cross validations, results of which show that TLNPMD has the highest AUC values among those of compared methods. In addition, case studies of two common diseases were carried out to validate the effectiveness of the TLNPMD. These experiments demonstrate that the TLNPMD may serve as a promising alternative to existing methods for predicting M-DAs.


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Humans , MicroRNAs/genetics
11.
BMC Bioinformatics ; 22(1): 175, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33794766

ABSTRACT

BACKGROUND: Identifying lncRNA-disease associations not only helps to better comprehend the underlying mechanisms of various human diseases at the lncRNA level but also speeds up the identification of potential biomarkers for disease diagnoses, treatments, prognoses, and drug response predictions. However, as the amount of archived biological data continues to grow, it has become increasingly difficult to detect potential human lncRNA-disease associations from these enormous biological datasets using traditional biological experimental methods. Consequently, developing new and effective computational methods to predict potential human lncRNA-disease associations is essential. RESULTS: Using a combination of incremental principal component analysis (IPCA) and random forest (RF) algorithms and by integrating multiple similarity matrices, we propose a new algorithm (IPCARF) based on integrated machine learning technology for predicting lncRNA-disease associations. First, we used two different models to compute a semantic similarity matrix of diseases from a directed acyclic graph of diseases. Second, a characteristic vector for each lncRNA-disease pair is obtained by integrating disease similarity, lncRNA similarity, and Gaussian kernel similarity. Then, the best feature subspace is obtained by applying IPCA to decrease the dimension of the original feature set. Finally, we train an RF model to predict potential lncRNA-disease associations. The experimental results show that the IPCARF algorithm effectively improves the AUC metric when predicting potential lncRNA-disease associations. Before the parameter optimization procedure, the AUC value predicted by the IPCARF algorithm under 10-fold cross-validation reached 0.8529; after selecting the optimal parameters using the grid search algorithm, the predicted AUC of the IPCARF algorithm reached 0.8611. CONCLUSIONS: We compared IPCARF with the existing LRLSLDA, LRLSLDA-LNCSIM, TPGLDA, NPCMF, and ncPred prediction methods, which have shown excellent performance in predicting lncRNA-disease associations. The comparison results under 10-fold cross-validation show that the predictions of the IPCARF method are better than those of the other compared methods.


Subjects
Computational Biology , Machine Learning , RNA, Long Noncoding , Algorithms , Humans , Principal Component Analysis , RNA, Long Noncoding/genetics
12.
BMC Bioinformatics ; 22(Suppl 3): 241, 2021 May 12.
Article in English | MEDLINE | ID: mdl-33980147

ABSTRACT

BACKGROUND: With the development of science and technology, there is increasing evidence of associations between lncRNAs and human diseases. Finding these associations will therefore have a huge impact on the treatment and prevention of some diseases. However, finding them is very difficult and requires a lot of time and effort. It is therefore particularly important to find good methods for predicting lncRNA-disease associations (LDAs). RESULTS: In this paper, we propose a method based on dual sparse collaborative matrix factorization (DSCMF) to predict LDAs. The DSCMF method improves on the traditional collaborative matrix factorization method. To increase the sparsity, the L2,1-norm is added to our method. At the same time, the Gaussian interaction profile kernel is added to our method, which increases the network similarity between lncRNAs and diseases. Finally, the AUC value, obtained by ten-fold cross-validation, is used to evaluate the quality of our method. CONCLUSIONS: The AUC value obtained by the DSCMF method is 0.8523. At the end of the paper, a simulation experiment is carried out, and the experimental results for prostate cancer, breast cancer, ovarian cancer and colorectal cancer are analyzed in detail. The DSCMF method is expected to be helpful for research on lncRNA-disease associations. The code is available at https://github.com/Ming-0113/DSCMF.
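
The Gaussian interaction profile (GIP) kernel mentioned in the RESULTS section is usually defined as K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is row i of the binary association matrix and gamma is a bandwidth divided by the mean squared norm of the profiles. The sketch below follows that common definition; the function name `gip_kernel`, the default bandwidth of 1.0 and the toy matrix are assumptions and are not taken from the DSCMF repository.

```python
import math

def gip_kernel(assoc, bandwidth=1.0):
    """Gaussian interaction profile kernel over the rows of `assoc`.

    assoc -- binary association matrix (rows: lncRNAs, columns: diseases)
    """
    n = len(assoc)
    mean_sq_norm = sum(v * v for row in assoc for v in row) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = bandwidth / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Toy association matrix (hypothetical): 3 lncRNAs x 3 diseases.
assoc = [[1, 0, 1],
         [1, 0, 1],
         [0, 1, 0]]
lnc_kernel = gip_kernel(assoc)
for row in lnc_kernel:
    print([round(v, 4) for v in row])
```

The first two lncRNAs share the same disease profile and therefore get similarity 1, while the third has a disjoint profile and a much lower similarity. Passing the transposed matrix gives the corresponding disease-disease kernel.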


Subjects
Breast Neoplasms , Prostatic Neoplasms , RNA, Long Noncoding , Algorithms , Computer Simulation , Humans , Male , Prostatic Neoplasms/genetics , RNA, Long Noncoding/genetics
13.
BMC Bioinformatics ; 22(1): 573, 2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34837953

ABSTRACT

BACKGROUND: With the rapid development of various advanced biotechnologies, researchers in related fields have realized that microRNAs (miRNAs) play critical roles in many serious human diseases. However, experimental identification of new miRNA-disease associations (MDAs) is expensive and time-consuming. Practitioners have shown growing interest in methods for predicting potential MDAs. In recent years, an increasing number of computational methods for predicting novel MDAs have been developed, making a huge contribution to the research of human diseases and saving considerable time. In this paper, we proposed an efficient computational method, named bipartite graph-based collaborative matrix factorization (BGCMF), which is highly advantageous for predicting novel MDAs. RESULTS: By combining two improved recommendation methods, a new model for predicting MDAs is generated. Based on the idea that some new miRNAs and diseases do not have any associations, we adopt the bipartite graph based on the collaborative matrix factorization method to complete the prediction. The BGCMF achieves a desirable result, with AUC of up to 0.9514 ± (0.0007) in the five-fold cross-validation experiments. CONCLUSIONS: Five-fold cross-validation is used to evaluate the capabilities of our method. Simulation experiments are implemented to predict new MDAs. More importantly, the AUC value of our method is higher than those of some state-of-the-art methods. Finally, many associations between new miRNAs and new diseases are successfully predicted by performing simulation experiments, indicating that BGCMF is a useful method to predict more potential miRNAs with roles in various diseases.


Subjects
MicroRNAs , Algorithms , Computational Biology , Computer Simulation , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics
14.
BMC Bioinformatics ; 21(1): 445, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-33028187

ABSTRACT

BACKGROUND: As a machine learning method with high performance and excellent generalization ability, extreme learning machine (ELM) is gaining popularity in various studies. Various ELM-based methods for different fields have been proposed. However, the robustness to noise and outliers is always the main problem affecting the performance of ELM. RESULTS: In this paper, an integrated method named correntropy induced loss based sparse robust graph regularized extreme learning machine (CSRGELM) is proposed. The introduction of correntropy induced loss improves the robustness of ELM and weakens the negative effects of noise and outliers. By using the L2,1-norm to constrain the output weight matrix, we tend to obtain a sparse output weight matrix to construct a simpler single hidden layer feedforward neural network model. By introducing the graph regularization to preserve the local structural information of the data, the classification performance of the new method is further improved. Besides, we design an iterative optimization method based on the idea of half quadratic optimization to solve the non-convex problem of CSRGELM. CONCLUSIONS: The classification results on the benchmark dataset show that CSRGELM can obtain better classification results compared with other methods. More importantly, we also apply the new method to the classification problems of cancer samples and get a good classification effect.
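
To illustrate why a correntropy induced loss weakens the effect of outliers, the sketch below compares it with the squared loss. It assumes the commonly used form l(e) = 1 - exp(-e^2 / (2 * sigma^2)) without any normalising constant; the exact loss used in CSRGELM may differ. The function names and the value sigma = 1.0 are hypothetical.

```python
import math

def correntropy_loss(residual, sigma=1.0):
    """Correntropy induced loss (assumed form); bounded above by 1."""
    return 1.0 - math.exp(-(residual ** 2) / (2.0 * sigma ** 2))

def squared_loss(residual):
    """Ordinary squared loss, which grows without bound."""
    return residual ** 2

# A small, a moderate and an outlier-sized residual.
residuals = [0.1, 1.0, 10.0]
closs = [correntropy_loss(r) for r in residuals]
sq = [squared_loss(r) for r in residuals]

for r, c, s in zip(residuals, closs, sq):
    print(f"residual={r:5.1f}  correntropy={c:.4f}  squared={s:.2f}")
```

The outlier contributes at most 1 to the correntropy induced loss but 100 to the squared loss, so a single noisy sample has far less influence on the fitted output weights.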


Subjects
Machine Learning , Neoplasms/classification , Benchmarking , Computational Biology/methods , Databases, Factual , Humans , Neoplasms/pathology
15.
BMC Bioinformatics ; 21(1): 339, 2020 Jul 31.
Article in English | MEDLINE | ID: mdl-32736513

ABSTRACT

BACKGROUND: It has been widely accepted that long non-coding RNAs (lncRNAs) play important roles in the development and progression of human diseases. Many association prediction models have been proposed for predicting lncRNA functions and identifying potential lncRNA-disease associations. Nevertheless, among them, little effort has been attempted to measure lncRNA functional similarity, which is an essential part of association prediction models. RESULTS: In this study, we presented an lncRNA functional similarity calculation model, IDSSIM for short, based on an improved disease semantic similarity method, highlight of which is the introduction of information content contribution factor into the semantic value calculation to take into account both the hierarchical structures of disease directed acyclic graphs and the disease specificities. IDSSIM and three state-of-the-art models, i.e., LNCSIM1, LNCSIM2, and ILNCSIM, were evaluated by applying their disease semantic similarity matrices and the lncRNA functional similarity matrices, as well as corresponding matrices of human lncRNA-disease associations coming from either lncRNADisease database or MNDR database, into an association prediction method WKNKN for lncRNA-disease association prediction. In addition, case studies of breast cancer and adenocarcinoma were also performed to validate the effectiveness of IDSSIM. 
CONCLUSIONS: Results demonstrated that, in terms of ROC curves and AUC values, IDSSIM is superior to the compared models and can effectively improve the accuracy of disease semantic similarity, thereby increasing the association prediction ability of the IDSSIM-WKNKN model. In terms of case studies, most of the potential disease-associated lncRNAs predicted by IDSSIM can be confirmed by databases and the literature, implying that IDSSIM can serve as a promising tool for predicting lncRNA functions, identifying potential lncRNA-disease associations, and pre-screening candidate lncRNAs for biological experiments. The IDSSIM code, all experimental data and prediction results are available online at https://github.com/CDMB-lab/IDSSIM.


Subjects
Algorithms , Computational Biology/methods , Disease/genetics , Models, Genetic , RNA, Long Noncoding/genetics , Semantics , Adenocarcinoma/genetics , Area Under Curve , Breast Neoplasms/genetics , Databases, Genetic , Female , Humans , ROC Curve
16.
BMC Bioinformatics ; 21(1): 454, 2020 Oct 14.
Article in English | MEDLINE | ID: mdl-33054708

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Among the methods to explore the relationship between the miRNA and the disease, traditional methods are time-consuming and the accuracy needs to be improved. In view of the shortcoming of previous models, a method, collaborative matrix factorization based on matrix completion (MCCMF) is proposed to predict the unknown miRNA-disease associations. RESULTS: The complete matrix of the miRNA and the disease is obtained by matrix completion. Moreover, Gaussian Interaction Profile kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix. Then the Weight K Nearest Known Neighbors method is used to pretreat the association matrix, so the model is close to the reality. Finally, collaborative matrix factorization method is applied to obtain the prediction results. Therefore, the MCCMF obtains a satisfactory result in the fivefold cross-validation, with an AUC of 0.9569 (0.0005). CONCLUSIONS: The AUC value of MCCMF is higher than other advanced methods in the fivefold cross validation experiment. In order to comprehensively evaluate the performance of MCCMF, accuracy, precision, recall and f-measure are also added. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by researching three specific diseases.
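
The additional evaluation measures listed in the CONCLUSIONS (accuracy, precision, recall and f-measure) follow directly from the confusion counts. A minimal sketch, assuming binary labels where 1 marks an association and using the F1 form of the f-measure, is given below. The function name `classification_metrics` and the toy labels are hypothetical and do not reproduce any result from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = association)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f1,
    }

# Toy labels (hypothetical): 3 true associations among 8 candidate pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because known associations are usually far rarer than unknown pairs, precision and recall complement AUC by showing how the predictions behave on the positive class specifically.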


Subjects
Algorithms , Genetic Predisposition to Disease , MicroRNAs/genetics , Area Under Curve , Gene Regulatory Networks , Hepatoblastoma/genetics , Humans , ROC Curve , Reproducibility of Results , Retinoblastoma/genetics , Risk Factors
17.
Hum Genomics ; 13(Suppl 1): 46, 2019 Oct 22.
Article in English | MEDLINE | ID: mdl-31639067

ABSTRACT

BACKGROUND: As one of the most popular data representation methods, non-negative matrix factorization (NMF) has received wide attention in clustering and feature selection tasks. However, most of the previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. RESULTS: To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, the hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures the relationship between pairwise sample points, it captures the high-order relationship among more sample points. Moreover, the robustness of RHNMF is enhanced by using the L2,1-norm constraint when estimating the residual, because the L2,1-norm is insensitive to noise and outliers. CONCLUSIONS: Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.


Subjects
Algorithms , Databases, Genetic , Gene Expression Regulation, Neoplastic , Cluster Analysis , Humans , Neoplasms/genetics
18.
Hum Hered ; 84(1): 21-33, 2019.
Article in English | MEDLINE | ID: mdl-31466058

ABSTRACT

Selecting differentially expressed genes has become a popular and difficult topic in molecular biology. Low-rank representation (LRR) combined with graph Laplacian regularization has achieved good results in this field. However, graph regularization cannot capture the co-expression information in the data well. Therefore, a novel low-rank representation method regularized by a dual-hypergraph Laplacian, called dual-hypergraph Laplacian regularized LRR (DHLRR), is proposed to reveal the intrinsic geometrical structures hidden in the sample and gene directions simultaneously. DHLRR recovers a low-rank matrix and a sparse perturbation matrix from genomic data. Because differentially expressed genes are sparse, the sparse perturbation matrix can be used to extract them. In our experiments, two gene analysis tools are used to discuss the experimental results. The results on two real genomic datasets and an integrated dataset show that DHLRR is efficient and effective in finding differentially expressed genes.


Subjects
Gene Expression Regulation, Neoplastic , Genomics/methods , Pancreatic Neoplasms/genetics , Squamous Cell Carcinoma of Head and Neck/genetics , Humans
19.
Hum Hered ; 84(1): 47-58, 2019.
Article in English | MEDLINE | ID: mdl-31466072

ABSTRACT

Principal component analysis (PCA) is a widely used method for evaluating low-dimensional data. Several variants of PCA have been proposed to improve the interpretation of the principal components (PCs). One of the most common is sparse PCA, which seeks a sparse basis that is easier to interpret than the dense basis of PCA. However, the performance of these improved methods is still far from satisfactory because the data still contain redundant PCs. In this paper, a novel method called PCA based on graph Laplacian and double sparse constraints (GDSPCA) is proposed to improve the interpretation of the PCs and to take the internal geometry of the data into account. Specifically, GDSPCA uses L2,1-norm and L1-norm regularization terms simultaneously to make the matrix sparse by filtering out redundant and irrelevant PCs: the L2,1-norm term produces row sparsity, while the L1-norm term enforces element sparsity. In this way, the new PCs in the low-dimensional subspace can be interpreted more easily. GDSPCA also integrates the graph Laplacian into PCA to explore the geometric structure hidden in the data. A simple and effective optimization solution is provided. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.
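
The difference between row sparsity and element sparsity can be illustrated with the standard proximal (shrinkage) operators of the two norms. This is a generic sketch, not the optimization procedure of GDSPCA, which the abstract does not describe; the function names, the threshold `t` and the toy matrix are assumptions made for the example.

```python
import math

def shrink_l1(matrix, t):
    """Element-wise soft-thresholding: proximal operator of t * L1-norm."""
    return [[math.copysign(max(abs(v) - t, 0.0), v) for v in row]
            for row in matrix]

def shrink_l21(matrix, t):
    """Row-wise group soft-thresholding: proximal operator of t * L2,1-norm,
    where the L2,1-norm is taken as the sum of the Euclidean norms of the rows."""
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(1.0 - t / norm, 0.0) if norm > 0 else 0.0
        result.append([scale * v for v in row])
    return result

# Hypothetical toy matrix: row 1 is weak overall, row 2 has one weak entry.
Q = [
    [3.0, 4.0],
    [0.3, -0.2],
    [0.1, 2.0],
]
by_element = shrink_l1(Q, 0.5)
by_row = shrink_l21(Q, 0.5)
```

With this input, the L2,1 operator removes row 1 as a whole and keeps both entries of row 2, whereas the L1 operator zeroes individual small entries, including the 0.1 in row 2.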


Subjects
Algorithms , Principal Component Analysis , Cluster Analysis , Gene Expression Regulation, Neoplastic , Humans , Neoplasms/genetics
20.
Hum Hered ; 84(1): 9-20, 2019.
Article in English | MEDLINE | ID: mdl-31412348

ABSTRACT

Cancer subtyping is of great importance for the prediction, diagnosis, and precise treatment of cancer. Many clustering methods have been proposed for cancer subtyping. In 2014, a clustering algorithm named Clustering by Fast Search and Find of Density Peaks (CFDP) was published in Science; it has since been applied to cancer subtyping with attractive results. However, CFDP requires two key parameters (cluster centers and cutoff distance) to be set manually, and their optimal values are difficult to determine. To overcome this limitation, an automatic clustering method named PSO-CFDP is proposed in this paper, in which cluster centers and cutoff distance are determined automatically by running an improved particle swarm optimization (PSO) algorithm multiple times. Experiments using PSO-CFDP, as well as LR-CFDP, STClu, CH-CCFDAC, and CFDP, were performed on four benchmark datasets and two real cancer gene expression datasets. The results show that PSO-CFDP can determine cluster centers and cutoff distance automatically within a controllable time cost and therefore improve the accuracy of cancer subtyping.
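
The role of the two parameters is easier to see from the quantities that density-peaks clustering computes for each point: a local density rho (the number of neighbours closer than the cutoff distance) and a distance delta to the nearest point of higher density. The sketch below covers only this base step of CFDP, under the cutoff-kernel definition of density; it does not implement the PSO search of PSO-CFDP. The function name, the toy points and the use of rho * delta to rank candidate centers are illustrative assumptions.

```python
import math

def density_peaks_scores(points, cutoff):
    """Return (rho, delta) for every point.

    rho[i]   = number of other points closer than `cutoff` to point i.
    delta[i] = distance to the nearest point with a strictly higher rho;
               a point with no denser neighbour gets its largest distance.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (10, 10), (10, 11), (11, 10)]
rho, delta = density_peaks_scores(points, cutoff=1.2)

# Candidate centers combine high density with a large delta.
ranking = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)
centers = sorted(ranking[:2])
```

Both the number of centers taken from the ranking (two here) and the value of `cutoff` are chosen by hand in this sketch, which is the manual step that the abstract says PSO-CFDP automates.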


Subjects
Algorithms , Cluster Analysis , Neoplasms/classification , Gene Expression , Humans , Neoplasms/genetics