Search | BVS Integralidade em Saúde

1.

Predicting microRNA-disease associations from lncRNA-microRNA interactions via Multiview Multitask Learning.

Huang, Yu-An; Chan, Keith C C; You, Zhu-Hong; Hu, Pengwei; Wang, Lei; Huang, Zhi-An.

Brief Bioinform ; 22(3)2021 05 20.

Article in English | MEDLINE | ID: mdl-32633319

ABSTRACT

MOTIVATION: Identifying microRNAs that are associated with different diseases as biomarkers is a problem of great medical significance. Existing computational methods for uncovering such microRNA-disease associations (MDAs) are mostly developed under the assumption that similar microRNAs tend to associate with similar diseases. Since this assumption is not always valid, these methods may not be applicable to all kinds of MDAs. Given that the relationships between long noncoding RNAs (lncRNAs) and different diseases, and the co-regulation between the biological functions of lncRNAs and microRNAs, have been established, we propose a multiview multitask method that uses known lncRNA-microRNA interactions to predict MDAs on a large scale. The investigation is performed without complete information about microRNAs and without any microRNA similarity measure; to the best of our knowledge, this work is the first attempt to discover MDAs based on lncRNA-microRNA interactions. RESULTS: In this paper, we develop a deep learning model, MVMTMDA, that creates a multiview representation of microRNAs. The model is trained end to end in a multitask manner so that missing data in the side information can be inferred automatically. Experimental results show that the proposed model yields an average area under the ROC curve (AUC) of 0.8410 ± 0.018, 0.8512 ± 0.012 and 0.8521 ± 0.008 under k-fold cross-validation with k set to 2, 5 and 10, respectively. In addition, we propose a statistical approach for predicting lncRNA-disease associations based on lncRNA-microRNA interactions and the MDAs discovered using MVMTMDA. AVAILABILITY: Python code and the datasets used in our studies are made available at https://github.com/yahuang1991polyu/MVMTMDA/.
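The cross-validated AUC figures in this abstract summarize how well a model ranks true associations above non-associations. As a minimal sketch, assuming binary labels and real-valued scores, the snippet below computes AUC with the Mann-Whitney pairwise formulation, deals indices into k folds, and reports mean and standard deviation across folds. The helper names (`roc_auc`, `kfold_indices`, `summarize_folds`) are hypothetical and are not taken from the MVMTMDA code; the abstract does not say whether a sample or population standard deviation is reported, so the population form used here is an assumption.

```python
import random
from statistics import mean, pstdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive scores higher than a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def summarize_folds(fold_aucs):
    """Mean and population standard deviation of per-fold AUCs (assumed convention)."""
    return mean(fold_aucs), pstdev(fold_aucs)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.2, 0.5, 0.1]
    print(roc_auc(labels, scores))  # 0.75: 3 of 4 positive/negative pairs ranked correctly
    print(kfold_indices(10, 5))
```

The pairwise loop is quadratic; for large evaluation sets a rank-based computation gives the same value more efficiently.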

Subjects

Disease/genetics , Machine Learning , MicroRNAs , Models, Genetic , RNA, Long Noncoding , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Predictive Value of Tests , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism

2.

Graph convolution for predicting associations between miRNA and drug resistance.

Huang, Yu-An; Hu, Pengwei; Chan, Keith C C; You, Zhu-Hong.

Bioinformatics ; 36(3): 851-858, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31397851

ABSTRACT

MOTIVATION: MicroRNA (miRNA) therapeutics is becoming increasingly important. However, aberrant expression of miRNAs is known to cause drug resistance and can become an obstacle to miRNA-based therapeutics. At present, little is known about associations between miRNAs and drug resistance, and no computational tool is available for predicting such associations. Since miRNAs are known to regulate genes encoding proteins that are key to drug efficacy, we propose a computational approach, called GCMDR, that finds a three-layer latent factor model for predicting miRNA-drug resistance associations. RESULTS: In this paper, we discuss how the problem of predicting such associations can be formulated as a link prediction problem on a bipartite attributed graph. GCMDR uses graph convolution to build a latent factor model that can effectively exploit the high-dimensional attributes of miRNAs and drugs in an end-to-end learning scheme. In addition, GCMDR learns graph embedding features for miRNAs and drugs. We leveraged data from multiple databases storing miRNA expression profiles, drug substructure fingerprints, gene ontology and disease ontology. Performance tests show that GCMDR achieves AUCs of 0.9301 ± 0.0005, 0.9359 ± 0.0006 and 0.9369 ± 0.0003 under 2-fold, 5-fold and 10-fold cross-validation, respectively. Using this model, we show that associations between miRNAs and drug resistance can be reliably predicted by introducing useful side information such as miRNA expression profiles and drug structure fingerprints. AVAILABILITY AND IMPLEMENTATION: Python code and datasets are available at https://github.com/yahuang1991polyu/GCMDR/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

MicroRNAs , Algorithms , Area Under Curve , Computational Biology , Drug Resistance

3.

An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network.

Wang, Lei; You, Zhu-Hong; Huang, Yu-An; Huang, De-Shuang; Chan, Keith C C.

Bioinformatics ; 36(13): 4038-4046, 2020 07 01.

Article in English | MEDLINE | ID: mdl-31793982

ABSTRACT

MOTIVATION: Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNAs as biomarkers opens a new perspective on diagnosing diseases and understanding disease pathogenesis. However, detecting circRNA-disease associations by biological experiments alone is often untargeted, small in scale, costly and time-consuming. There is therefore an urgent need for reliable computational methods that rapidly infer potential circRNA-disease associations on a large scale and provide the most promising candidates for biological experiments. RESULTS: In this article, we propose an efficient computational method that combines multi-source information with a deep convolutional neural network (CNN) to predict circRNA-disease associations. The method first fuses multi-source information, including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity; it then extracts hidden deep features with the CNN and finally feeds them to an extreme learning machine classifier for prediction. In 5-fold cross-validation on the CIRCR2Disease dataset, the proposed method achieves 87.21% accuracy, 88.50% sensitivity and an area under the curve of 86.67%. Compared with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. We also searched the published literature for experimental support of the predictions: 7 of the 15 highest-scoring circRNA-disease pairs were confirmed by the literature. These results demonstrate that the proposed model is suitable for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
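The Gaussian interaction profile (GIP) kernel similarity named in this abstract is commonly computed from a binary association matrix: each row's interaction profile IP_i is compared with every other row through K(i, j) = exp(-γ‖IP_i − IP_j‖²), where γ = γ′ / ((1/n) Σ_k ‖IP_k‖²) normalizes the bandwidth by the average squared profile norm. The sketch below assumes that common formulation with γ′ = 1 (the abstract does not give the setting) and uses a hypothetical function name, `gip_kernel`; it is not taken from the authors' repository. Passing the transposed matrix gives disease-side similarities.

```python
import numpy as np


def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a 0/1 association matrix
    (e.g. rows = circRNAs, columns = diseases)."""
    A = np.asarray(assoc, dtype=float)
    sq_norms = np.sum(A * A, axis=1)             # ||IP_i||^2 for every row
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq                # bandwidth normalized by average profile norm
    # pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    d2 = np.maximum(d2, 0.0)                     # clip tiny negatives caused by round-off
    return np.exp(-gamma * d2)


if __name__ == "__main__":
    assoc = [[1, 0, 1],
             [1, 0, 1],
             [0, 1, 0]]
    print(gip_kernel(assoc))      # row-side (e.g. circRNA) similarities
    print(gip_kernel(np.transpose(assoc)))  # column-side (e.g. disease) similarities
```

Identical profiles receive similarity 1, and similarity decays with the number of differing associations.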

Subjects

Neural Networks, Computer , RNA, Circular , Algorithms , Humans

4.

Constructing prediction models from expression profiles for large scale lncRNA-miRNA interaction profiling.

Huang, Yu-An; Chan, Keith C C; You, Zhu-Hong.

Bioinformatics ; 34(5): 812-819, 2018 03 01.

Article in English | MEDLINE | ID: mdl-29069317

ABSTRACT

Motivation: The interaction between miRNAs and lncRNAs is known to be important for gene regulation. However, few computational approaches have been developed to analyze known interactions and predict unknown ones. Given growing evidence that lncRNA-miRNA interactions are closely related to their relative expression levels through a titration mechanism, we analyzed the patterns in large-scale expression profiles of known lncRNA-miRNA interactions. From these patterns, we noticed that lncRNAs tend to interact collaboratively with miRNAs of similar expression profiles, and vice versa. Results: By representing known interactions between lncRNAs and miRNAs as a bipartite graph, we propose a technique, called EPLMI, to construct a prediction model from such a graph. EPLMI is based on the assumption that lncRNAs that are highly similar to each other tend to have similar interaction or non-interaction patterns with miRNAs, and vice versa. The effectiveness of the resulting prediction model has been evaluated using the latest dataset of lncRNA-miRNA interactions. The model achieves AUCs of 0.8522 and 0.8447 ± 0.0017 under leave-one-out cross-validation and 5-fold cross-validation, respectively. Using this model, we show that lncRNA-miRNA interactions can be reliably predicted. We also show that it can be used to select the lncRNA targets most likely to interact with specific miRNAs. We believe that the prediction models built by EPLMI can provide valuable insights for further research on ceRNA regulatory networks. To the best of our knowledge, EPLMI is the first technique developed for large-scale lncRNA-miRNA interaction profiling. Availability and implementation: MATLAB code and datasets are available at https://github.com/yahuang1991polyu/EPLMI/. Contact: yu-an.huang@connect.polyu.hk or zhuhongyou@ms.xjb.ac.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
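EPLMI's guiding assumption, that lncRNAs with similar expression profiles tend to share interaction partners, can be illustrated with a generic similarity-weighted neighbor score on the lncRNA-miRNA bipartite graph. This is not EPLMI itself, since the abstract does not give its update rules. The sketch assumes Pearson correlation of expression profiles as the similarity, discards negative correlations, and scores each lncRNA-miRNA pair by the similarity-weighted fraction of the lncRNA's neighbors that are known to interact with that miRNA. The function name `neighbor_scores` is hypothetical.

```python
import numpy as np


def neighbor_scores(expr, assoc):
    """Score lncRNA-miRNA pairs from the known interactions of similar lncRNAs.

    expr:  (n_lncrna, n_samples) expression matrix
    assoc: (n_lncrna, n_mirna) 0/1 matrix of known interactions (bipartite graph)
    """
    S = np.corrcoef(np.asarray(expr, dtype=float))  # Pearson similarity of expression profiles
    S = np.nan_to_num(S, nan=0.0)                   # constant profiles yield NaN; treat as unrelated
    S = np.clip(S, 0.0, None)                       # ignore anti-correlated lncRNAs
    np.fill_diagonal(S, 0.0)                        # a lncRNA does not vote for itself
    A = np.asarray(assoc, dtype=float)
    weights = S.sum(axis=1, keepdims=True)
    weights[weights == 0] = 1.0                     # isolated lncRNAs get all-zero scores
    return (S @ A) / weights


if __name__ == "__main__":
    expr = [[1, 2, 3, 4],    # lncRNA 0
            [2, 4, 6, 8],    # lncRNA 1, co-expressed with lncRNA 0
            [4, 3, 2, 1]]    # lncRNA 2, anti-correlated with both
    assoc = [[1, 0],         # lncRNA 0 is known to interact with miRNA 0
             [0, 0],
             [0, 1]]
    print(neighbor_scores(expr, assoc))
```

In the usage example, lncRNA 1 receives a high score for miRNA 0 because its expression profile tracks that of lncRNA 0, which is known to interact with miRNA 0.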

Subjects

Gene Expression Profiling/methods , Gene Expression Regulation , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism , Sequence Analysis, RNA/methods , Algorithms , Area Under Curve , Humans , Sensitivity and Specificity

5.

Detecting gene-gene interactions for complex quantitative traits using generalized fuzzy classification.

Zhou, Xiangdong; Chan, Keith C C.

BMC Bioinformatics ; 19(1): 329, 2018 Sep 18.

Article in English | MEDLINE | ID: mdl-30227829

ABSTRACT

BACKGROUND: Quantitative traits or continuous outcomes related to complex diseases can provide more information, and therefore support more accurate analysis, for identifying gene-gene and gene-environment interactions associated with complex diseases. Multifactor Dimensionality Reduction (MDR) was originally proposed to identify gene-gene and gene-environment interactions associated with the binary status of complex diseases. Some efforts have been made to extend it to quantitative traits (QTs) and ordinal traits. However, these and other methods are still not computationally efficient or effective. RESULTS: In this paper, we propose Generalized Fuzzy Quantitative trait MDR (GFQMDR) to strengthen the identification of gene-gene interactions associated with a quantitative trait. GFQMDR first transforms the trait into an ordinal trait and then, through generalized fuzzy classification using extended membership functions, selects the best sets of genetic markers, mainly single nucleotide polymorphisms (SNPs) or simple sequence length polymorphic markers (SSLPs), that are strongly associated with the trait. Experimental results on simulated and real datasets show that our algorithm achieves a better success rate, classification accuracy and consistency in identifying gene-gene interactions associated with QTs. CONCLUSION: The proposed algorithm provides a more effective way to identify gene-gene interactions associated with quantitative traits.
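GFQMDR generalizes the classic MDR procedure, whose core step pools samples into multilocus genotype cells and labels each cell high- or low-risk. For orientation only, the sketch below shows that classic binary-trait step for two SNPs coded 0/1/2, using the overall case:control ratio as the threshold. It does not implement GFQMDR's ordinal transformation or its fuzzy membership functions, and the function names `mdr_high_risk_cells` and `mdr_accuracy` are hypothetical.

```python
from collections import Counter


def mdr_high_risk_cells(geno_a, geno_b, status):
    """Classic two-locus MDR step: return the genotype cells labelled high-risk.

    geno_a, geno_b: genotypes of two SNPs per sample (e.g. coded 0/1/2)
    status:         1 for cases, 0 for controls
    """
    cases, controls = Counter(), Counter()
    for ga, gb, s in zip(geno_a, geno_b, status):
        (cases if s == 1 else controls)[(ga, gb)] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need both cases and controls")
    high = set()
    for cell in set(cases) | set(controls):
        # cases/controls > n_case/n_ctrl, cross-multiplied to avoid dividing by zero
        if cases[cell] * n_ctrl > controls[cell] * n_case:
            high.add(cell)
    return high


def mdr_accuracy(geno_a, geno_b, status, high):
    """Fraction of samples whose status matches the risk label of their cell."""
    correct = sum(((ga, gb) in high) == (s == 1)
                  for ga, gb, s in zip(geno_a, geno_b, status))
    return correct / len(status)


if __name__ == "__main__":
    snp_a = [0, 0, 1, 1, 2, 2]
    snp_b = [0, 0, 0, 0, 0, 0]
    status = [1, 1, 0, 0, 1, 0]
    high = mdr_high_risk_cells(snp_a, snp_b, status)
    print(high, mdr_accuracy(snp_a, snp_b, status, high))
```

In full MDR this labelling is repeated for every candidate SNP combination, and the combination with the best cross-validated accuracy is retained.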

Subjects

Computational Biology/methods , Epistasis, Genetic , Fuzzy Logic , Phenotype , Animals , Female , Genetic Markers/genetics , Humans , Mice , Models, Genetic , Polymorphism, Single Nucleotide

6.

A density-based clustering approach for identifying overlapping protein complexes with functional preferences.

Hu, Lun; Chan, Keith C C.

BMC Bioinformatics ; 16: 174, 2015 May 27.

Article in English | MEDLINE | ID: mdl-26013799

ABSTRACT

BACKGROUND: Identifying protein complexes is an essential task for understanding the mechanisms of proteins in cells. Many computational approaches have therefore been developed to identify protein complexes in protein-protein interaction (PPI) networks. In addition to the graph topology of PPI networks, the functional information of proteins has recently become a popular source of information for such approaches. These approaches rely on the idea that proteins in the same protein complex may be associated with similar functional information. However, our previous research indicates that, for most protein complexes, the member proteins are similar only in specific subsets of functional categories rather than in the entire set. Hence, if the preference for each functional category is also taken into account when identifying protein complexes, accuracy will improve. RESULTS: To implement this idea, we first introduce a preference vector for each protein to quantify its preference for each functional category when deciding which protein complex it belongs to. Integrating the functional preferences of proteins with the graph topology of the PPI network, we formulate the identification of protein complexes as a constrained optimization problem and propose the approach DCAFP to solve it. For performance evaluation, we conducted extensive experiments on several PPI networks from Saccharomyces cerevisiae and human, and compared DCAFP with state-of-the-art approaches for identifying protein complexes.
The experimental results show that integrating functional preferences and dense structures improves the identification of protein complexes: DCAFP outperformed the other approaches on most PPI networks according to the independent measures of F-measure, Accuracy and Maximum Matching Rate. Furthermore, function enrichment experiments indicated that DCAFP identified more functionally significant protein complexes than approaches, such as PCIA, that also use functional information. CONCLUSIONS: The promising performance of DCAFP shows that integrating functional preferences and dense structures makes it possible to identify protein complexes more accurately and with greater functional significance.

Subjects

Computational Biology/methods , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Proteomics/methods , Saccharomyces cerevisiae/metabolism , Cluster Analysis , Humans

7.

EGCN++: A New Fusion Strategy for Ensemble Learning in Skeleton-Based Rehabilitation Exercise Assessment.

Yu, Bruce X B; Liu, Yan; Chan, Keith C C; Chen, Chang Wen.

IEEE Trans Pattern Anal Mach Intell ; 46(9): 6471-6485, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38502632

ABSTRACT

Skeleton-based exercise assessment focuses on evaluating the correctness or quality of an exercise performed by a subject. Skeleton data provide two groups of features (i.e., position and orientation), which existing methods have not fully harnessed. We previously proposed an ensemble-based graph convolutional network (EGCN) that considers both position and orientation features to construct a model-based approach. Integrating these types of features achieved better performance than available methods. However, EGCN lacked a fusion strategy across the data, feature, decision, and model levels. In this paper, we present an advanced framework, EGCN++, for rehabilitation exercise assessment. Based on EGCN, a new fusion strategy called MLE-PO is proposed for EGCN++; this technique considers fusion at the data and model levels. We conduct extensive cross-validation experiments and investigate the consistency between machine and human evaluations on three datasets: UI-PRMD, KIMORE, and EHE. Results demonstrate that MLE-PO outperforms other EGCN ensemble strategies and representative baselines. Furthermore, MLE-PO's model evaluation scores are more quantitatively consistent with clinical evaluations than those of other ensemble strategies.

Subjects

Neural Networks, Computer , Humans , Algorithms , Image Processing, Computer-Assisted/methods , Machine Learning , Databases, Factual

8.

MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos.

Yu, Bruce X B; Liu, Yan; Zhang, Xiang; Zhong, Sheng-Hua; Chan, Keith C C.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3522-3538, 2023 03.

Article in English | MEDLINE | ID: mdl-35617191

ABSTRACT

Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based and RGB video-based) have achieved substantial improvements with increasingly large datasets. However, multimodal methods, particularly those with model-level fusion, have seldom been investigated. In this article, we propose a model-based multimodal network (MMNet) that fuses skeleton and RGB modalities via a model-based approach. The objective of our method is to improve ensemble recognition accuracy by effectively exploiting mutually complementary information from different data modalities. For the model-based fusion scheme, we use a spatiotemporal graph convolution network for the skeleton modality to learn attention weights that are transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. When the results of multiple modalities are aggregated, our method outperforms state-of-the-art approaches on six evaluation protocols of the five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR. We also tested MMNet on Kinetics 400, an RGB video dataset that contains more outdoor actions, and the results are consistent with those on the RGB-D video datasets.

Subjects

Algorithms , Pattern Recognition, Automated , Humans , Benchmarking , Human Activities , Learning

9.

Discovering patterns in drug-protein interactions based on their fingerprints.

Luo, Weimin; Chan, Keith C C.

BMC Bioinformatics ; 13 Suppl 9: S4, 2012 Jun 11.

Article in English | MEDLINE | ID: mdl-22901089

ABSTRACT

BACKGROUND: Discovering interesting patterns in drug-protein interaction data at the molecular level can reveal hidden relationships between drugs and proteins and can therefore be of paramount importance for applications such as drug design. To discover such patterns, we propose a computational approach to analyze the molecular data of drugs and proteins that are known to interact with each other. Specifically, we propose a data mining technique called Drug-Protein Interaction Analysis (D-PIA) to determine whether there are any commonalities in the fingerprints of the substructures of interacting drug and protein molecules and, if so, whether any patterns can be generalized from them. METHOD: Given a database of drug-protein interactions, D-PIA performs its tasks in several steps. First, for each drug in the database, the fingerprints of its molecular substructures are obtained. Second, for each protein in the database, the fingerprints of its protein domains are obtained. Third, based on known interactions between drugs and proteins, an interdependency measure between the fingerprint of each drug substructure and each protein domain is computed. Fourth, based on this interdependency measure, drug substructures and protein domains that are significantly interdependent are identified. Fifth, whether a previously unseen drug-protein pair interacts is predicted based on their constituent substructures that are significantly interdependent. RESULTS: To evaluate the effectiveness of D-PIA, we tested it with real drug-protein interaction data involving enzymes, ion channels and G-protein-coupled receptors. Experimental results show that there are indeed patterns to be discovered in the interdependency relationship between the drug substructures and protein domains of interacting drugs and proteins.
Based on these relationships, a test set of drug-protein pairs was used to check whether D-PIA can correctly predict the existence of interactions. The results are promising: the area under the ROC curve (AUC) reaches up to 75%, which indicates the effectiveness of this classifier. CONCLUSIONS: D-PIA performs its tasks effectively using only the fingerprints of drug and protein molecules, without requiring any 3D structural information, and is therefore very fast to compute. Experimental results on real drug-protein interaction data show that it can be very useful for predicting previously unknown drug-protein as well as protein-ligand interactions. It can also be used to tackle problems such as ligand specificity, which is related directly and indirectly to drug design and discovery.
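The abstract does not specify D-PIA's interdependency measure. One common choice for testing whether two binary features co-occur more often than chance is the adjusted residual of a 2×2 contingency table, compared against a normal critical value such as 1.96. The sketch below assumes that measure: over known interacting drug-protein pairs, it counts whether the drug contains a given substructure and whether the protein contains a given domain, then tests the (present, present) cell. The names `adjusted_residual` and `is_interdependent` are hypothetical and not part of D-PIA.

```python
import math


def adjusted_residual(pairs):
    """Adjusted residual of the (present, present) cell of a 2x2 table.

    pairs: iterable of (has_substructure, has_domain) flags, one per known
    interacting drug-protein pair.
    """
    table = [[0, 0], [0, 0]]  # table[substructure present][domain present]
    for has_sub, has_dom in pairs:
        table[int(bool(has_sub))][int(bool(has_dom))] += 1
    n = table[0][0] + table[0][1] + table[1][0] + table[1][1]
    row = table[1][0] + table[1][1]   # pairs whose drug has the substructure
    col = table[0][1] + table[1][1]   # pairs whose protein has the domain
    if n == 0 or row in (0, n) or col in (0, n):
        return 0.0                    # a constant feature carries no interdependency
    expected = row * col / n
    variance = expected * (1 - row / n) * (1 - col / n)
    return (table[1][1] - expected) / math.sqrt(variance)


def is_interdependent(pairs, critical=1.96):
    """Flag co-occurrence above chance beyond the given normal critical value."""
    return adjusted_residual(pairs) > critical


if __name__ == "__main__":
    strong = [(1, 1)] * 10 + [(0, 0)] * 10   # substructure and domain always co-occur
    independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(adjusted_residual(strong), is_interdependent(strong))
    print(adjusted_residual(independent), is_interdependent(independent))
```

Only positive deviations are flagged here, on the reasoning that a substructure-domain pair that co-occurs less often than expected is not evidence for an interaction.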

Subjects

Data Mining , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Databases, Factual , Ligands , Protein Structure, Tertiary , ROC Curve

10.

Unsupervised fuzzy pattern discovery in gene expression data.

Wu, Gene P K; Chan, Keith C C; Wong, Andrew K C.

BMC Bioinformatics ; 12 Suppl 5: S5, 2011.

Article in English | MEDLINE | ID: mdl-21989090

ABSTRACT

BACKGROUND: When the tissue classes of the samples are given, discovering patterns from gene expression levels is regarded as a classification problem and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals that maximize the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. METHODS: For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with the others in the group, to drive the discretization of the expression levels of the other genes. Treating intervals as discrete events, association patterns of events can then be discovered. If the gene groups obtained are crisp gene clusters, significant patterns overlapping different gene clusters cannot be found. This paper presents a new method of "fuzzifying" the crisp gene clusters to overcome this problem. RESULTS: To evaluate the effectiveness of our approach, we first apply the procedure described above to a synthetic data set and then to a gene expression data set with known class labels. The class labels are not used in either analysis; they are used only afterwards as the ground truth in a classification problem for assessing the algorithm's effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method. The existence of correlation among continuous-valued gene expression levels suggests that certain genes in the gene groups have high interdependence with other genes in the group. Fuzzification of a crisp gene cluster allows the cluster to take in genes from other clusters, so that overlapping relationships among gene clusters can be uncovered. Hence, previously unknown patterns residing in overlapping gene clusters are discovered.
The high-order patterns discovered in the experiments reveal multiple gene interaction patterns in cancerous tissues that are not found in normal tissues. In the colon cancer experiment, 70% of the top patterns and most of the patterns discriminating between cancerous and normal tissues span different crisp gene clusters. CONCLUSIONS: We show that the proposed method for analyzing error-prone microarray data is effective even without tissue class information. A unified framework is presented, allowing fast and accurate pattern discovery in gene expression data. For a large gene set, gene clustering, gene expression discretization and gene cluster fuzzification are all necessary to discover a comprehensive set of patterns.

Subjects

Algorithms , Colonic Neoplasms/genetics , Gene Expression Profiling/methods , Cluster Analysis , Humans , Oligonucleotide Array Sequence Analysis/methods

11.

Learning Representation of Molecules in Association Network for Predicting Intermolecular Associations.

Yi, Hai-Cheng; You, Zhu-Hong; Guo, Zhen-Hao; Huang, De-Shuang; Chan, Keith C C.

IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2546-2554, 2021.

Article in English | MEDLINE | ID: mdl-32070992

ABSTRACT

A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and the interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or a very limited number of types of molecules. In this study, we propose MAN-SDNE, a computational framework based on network representation learning, to predict intermolecular associations of any type. More specifically, we constructed a large-scale molecular association network of multiple human biomolecules by integrating associations among long non-coding RNAs, microRNAs, proteins, drugs and diseases; the network contains 6,528 molecular nodes and 105,546 associations of 9 kinds. Each node is then represented by its network proximity and attribute features, and these features are used to train a random forest classifier to predict intermolecular associations. MAN-SDNE achieves an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To demonstrate its ability to predict specific types of interactions, we also present a case study in which MAN-SDNE is used to predict lncRNA-protein interactions. The experimental results demonstrate that this work offers systematic insight into the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.

Subjects

Models, Biological , Systems Biology/methods , Computer Simulation , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Pharmaceutical Preparations/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism

12.

Determining dependency and redundancy for identifying gene-gene interaction associated with complex disease.

Zhou, Xiangdong; Chan, Keith C C; Huang, Zhihua; Wang, Jingbin.

J Bioinform Comput Biol ; 18(5): 2050035, 2020 10.

Article in English | MEDLINE | ID: mdl-33064052

ABSTRACT

As interactions among genetic variants in different genes can be an important factor for predicting complex diseases, many computational methods have been proposed to detect whether a particular set of genes has an interaction associated with a particular complex disease. However, even though many such methods have been shown to be useful, they could be made more effective if the properties of gene-gene interactions were better understood. Towards this goal, we uncovered patterns in gene-gene interactions that reveal an interesting property, which can be expressed as an inequality describing the relationship between two genotype variables and a disease-status variable. In this paper, we show that this inequality can be generalized to [Formula: see text] genotype variables. Based on this inequality, we establish a conditional independence and redundancy (CIR)-based definition of gene-gene interaction and the concept of an interaction group. From these new definitions, a novel measure of gene-gene interaction is then derived. We discuss the properties of these concepts and explain how they can be used in a novel algorithm to detect high-order gene-gene interactions. Experimental results using both simulated and real datasets show that the proposed method is promising.

Subjects

Algorithms , Epistasis, Genetic , Case-Control Studies , Computational Biology/methods , Gene Frequency , Genotype , Hemoglobinopathies/genetics , Humans , Linkage Disequilibrium , Malaria, Falciparum/genetics , Polymorphism, Single Nucleotide , alpha-Globins/genetics

13.

Contextual Correlation Preserving Multiview Featured Graph Clustering.

He, Tiantian; Liu, Yang; Ko, Tobey H; Chan, Keith C C; Ong, Yew-Soon.

IEEE Trans Cybern ; 50(10): 4318-4331, 2020 Oct.

Article in English | MEDLINE | ID: mdl-31329151

ABSTRACT

Graph clustering, which aims at discovering sets of related vertices in graph-structured data, plays a crucial role in various applications, such as social community detection and biological module discovery. With the huge increase in the volume of data in recent years, graph clustering is used in an increasing number of real-life scenarios. However, classical and state-of-the-art methods consider only single-view features, or a single vector concatenating features from different views, and neglect the contextual correlation between pairwise features. They are therefore insufficient for the task, because the features that characterize vertices in a graph usually come from multiple views, and the contextual correlation between pairwise features may influence the cluster preference of vertices. To address this challenging problem, we introduce a novel graph clustering model, contextual correlation preserving multiview featured graph clustering (CCPMVFGC), for discovering clusters in graphs with multiview vertex features. Unlike most of the aforementioned approaches, CCPMVFGC learns a shared latent space from multiview features as the cluster preference of each vertex and uses this latent space to model the inter-relationship between pairwise vertices. CCPMVFGC uses an effective method to compute the degree of contextual correlation between pairwise vertex features and uses a view-wise latent space representing the feature-cluster preference to model the computed correlation. Thus, the cluster preference learned by CCPMVFGC is jointly inferred from multiview features, view-wise correlations of pairwise features and the graph topology. Accordingly, we propose a unified objective function for CCPMVFGC and develop an iterative strategy to solve the formulated optimization problem. We also provide a theoretical analysis of the proposed model, including a convergence proof and computational complexity analysis.
In our experiments, we extensively compare the proposed CCPMVFGC with both classical and state-of-the-art graph clustering methods on eight standard graph datasets (six multiview and two single-view datasets). The results show that CCPMVFGC achieves competitive performance on all eight datasets, which validates the effectiveness of the proposed model.

14.

Learning Multimodal Networks From Heterogeneous Data for Prediction of lncRNA-miRNA Interactions.

Hu, Pengwei; Huang, Yu-An; Chan, Keith C C; You, Zhu-Hong.

IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1516-1524, 2020.

Article in English | MEDLINE | ID: mdl-31796414

ABSTRACT

Long noncoding RNAs (lncRNAs) are an important class of non-protein-coding RNAs. They have recently been found to potentially act as regulatory molecules in some important biological processes. MicroRNAs (miRNAs) have been confirmed to be closely related to the regulation of various human diseases. Recent studies have suggested that lncRNAs could interact with miRNAs to modulate their regulatory roles. Hence, predicting lncRNA-miRNA interactions is biologically significant, owing to their potential roles in determining the effectiveness of diagnostic biomarkers and therapeutic targets for various human diseases. To better understand the underlying mechanisms, it would be useful to develop computational approaches that allow for such investigations. As diverse heterogeneous datasets describing lncRNAs and miRNAs have become available, it has become more feasible to develop a model that describes potential interactions between lncRNAs and miRNAs. In this work, we present a novel computational approach called LMNLMI for this purpose. LMNLMI works in several phases. First, it learns patterns from expression, sequence and functional data. Based on these patterns, it then constructs several networks, including an expression-similarity network, a functional-similarity network and a sequence-similarity network. Based on a measure of similarities between these networks, LMNLMI computes an interaction score for each pair of lncRNA and miRNA in the database. The novelty of LMNLMI lies in the use of a network fusion technique to combine the patterns inherent in multiple similarity networks and a matrix completion technique to predict interaction relationships. Using a set of real data, we show that LMNLMI can be a very effective approach for the accurate prediction of lncRNA-miRNA interactions.

Subjects

Computational Biology/methods , MicroRNAs , RNA, Long Noncoding , Transcriptome/genetics , Databases, Genetic , Disease , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Models, Genetic , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
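
The abstract names two ingredients, network fusion and matrix completion, without giving their formulas. As a minimal sketch under that caveat, the Python below substitutes a plain element-wise weighted average for the fusion step (much simpler than dedicated network-fusion methods) and a low-rank factorization fitted by stochastic gradient descent for the completion step. The function names, parameters and toy matrices are hypothetical and are not taken from LMNLMI.

```python
import random

def fuse_similarities(matrices, weights=None):
    """Element-wise weighted average of equally sized similarity matrices.

    Hypothetical stand-in for network fusion; not LMNLMI's fusion method.
    """
    if weights is None:
        weights = [1.0 / len(matrices)] * len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(cols)] for i in range(rows)]

def complete_matrix(observed, rows, cols, rank=2, lr=0.05, reg=0.01,
                    epochs=500, seed=0):
    """Low-rank matrix completion fitted by SGD on the observed entries only.

    observed maps (lncRNA index, miRNA index) -> 1.0 (known interaction)
    or 0.0 (assumed non-interaction); every other cell is predicted.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(cols)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), y in entries:
            err = y - sum(U[i][f] * V[j][f] for f in range(rank))
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                # Gradient step on squared error with L2 regularization.
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return [[sum(U[i][f] * V[j][f] for f in range(rank)) for j in range(cols)]
            for i in range(rows)]

# Hypothetical lncRNA similarity matrices from two data views.
expr_sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
seq_sim = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.1], [0.3, 0.1, 1.0]]
fused = fuse_similarities([expr_sim, seq_sim])

# Hypothetical partially observed lncRNA x miRNA interaction matrix.
observed = {(0, 0): 1.0, (0, 1): 1.0, (0, 2): 0.0, (1, 0): 1.0,
            (1, 2): 0.0, (2, 0): 0.0, (2, 2): 1.0}
scores = complete_matrix(observed, rows=3, cols=3)
# scores[1][1] is the predicted score for the unobserved pair (1, 1).
```

How LMNLMI links its fused networks to the completion step is not described in the abstract, so the two helpers are shown independently here.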

15.

Learning Representations to Predict Intermolecular Interactions on Large-Scale Heterogeneous Molecular Association Network.

Yi, Hai-Cheng; You, Zhu-Hong; Huang, De-Shuang; Guo, Zhen-Hao; Chan, Keith C C; Li, Yangming.

iScience ; 23(7): 101261, 2020 Jul 24.

Article in English | MEDLINE | ID: mdl-32580123

ABSTRACT

Molecular components that are functionally interdependent in human cells constitute molecular association networks. Disease can be caused by the disturbance of multiple molecular interactions. New biomolecular regulatory mechanisms can be revealed by discovering new biomolecular interactions. To this end, a heterogeneous molecular association network is formed by systematically integrating comprehensive associations between miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose a machine learning method for predicting intermolecular interactions, named MMI-Pred. More specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. These discriminative features are then combined to train a random forest classifier to predict intermolecular interactions. MMI-Pred achieves 93.50% accuracy in predicting hybrid associations under 5-fold cross-validation. This work provides a systematic landscape and a machine learning method to model and infer complex associations between various biological components.
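
MMI-Pred combines network-embedding and attribute features for each candidate pair and evaluates a random forest under 5-fold cross-validation. The sketch below assumes that pair features are formed by simple concatenation (an assumption, not stated in the abstract) and shows only the feature assembly, the fold splitting and the accuracy metric; the classifier itself is left out and could be, for example, scikit-learn's `RandomForestClassifier`. All identifiers and numbers are hypothetical.

```python
import random

def pair_features(embedding, attributes, a, b):
    """Concatenate the embedding and attribute vectors of molecules a and b."""
    return embedding[a] + attributes[a] + embedding[b] + attributes[b]

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 2-dimensional embeddings and 1-dimensional attribute vectors.
embedding = {"mol_a": [0.1, 0.9], "mol_b": [0.2, 0.8]}
attributes = {"mol_a": [0.5], "mol_b": [0.7]}
x = pair_features(embedding, attributes, "mol_a", "mol_b")  # 6 features
```

Each test fold is disjoint from its training indices, and together the five test folds cover every sample exactly once, which is what makes the averaged accuracy a cross-validated estimate.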

16.

Network-Based Prediction of Major Adverse Cardiac Events in Acute Coronary Syndromes from Imbalanced EMR Data.

Hu, Pengwei; Xia, Eryu; Li, Shochun; Du, Xin; Ma, Changsheng; Dong, Jianzeng; Chan, Keith C C.

Stud Health Technol Inform ; 264: 1480-1481, 2019 Aug 21.

Article in English | MEDLINE | ID: mdl-31438191

ABSTRACT

The low proportion and rapid evolution of major adverse cardiac events (MACE) make it challenging to predict MACE with machine learning models. In this paper, we propose a method to predict MACE from large-scale imbalanced electronic medical record (EMR) data by using a network-based one-class classifier. It uses only the reliably known MACE samples to establish the hyperspherical model. Experiments show that our model outperforms the state-of-the-art models.

Subjects

Acute Coronary Syndrome , Humans
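
A one-class hyperspherical model can be illustrated with a center and a radius learned from positive samples only. This is a deliberately simplified sketch: it ignores the network-based component of the paper's classifier, uses the sample mean as the center and a distance quantile as the radius (both assumptions), and the feature vectors are hypothetical.

```python
import math

def fit_hypersphere(positives, quantile=0.95):
    """Center = mean of positive samples; radius covers `quantile` of them."""
    dim = len(positives[0])
    center = [sum(x[d] for x in positives) / len(positives) for d in range(dim)]
    dists = sorted(math.dist(x, center) for x in positives)
    k = min(len(dists), max(1, math.ceil(quantile * len(dists))))
    return center, dists[k - 1]

def is_mace(center, radius, x):
    """Flag a record as MACE-like if it falls inside the hypersphere."""
    return math.dist(x, center) <= radius

# Hypothetical two-feature records of known MACE cases (positives only).
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0]]
center, radius = fit_hypersphere(positives, quantile=1.0)
```

Because only positive samples are needed to fit the model, the scarcity of MACE cases relative to non-MACE cases does not distort a decision boundary the way it can for a two-class classifier.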

17.

UPSEC: an algorithm for classifying unaligned protein sequences into functional families.

Ma, Patrick C H; Chan, Keith C C.

J Comput Biol ; 15(4): 431-43, 2008 May.

Article in English | MEDLINE | ID: mdl-18435571

ABSTRACT

To classify proteins into functional families based on their primary sequences, popular algorithms such as k-NN-, HMM-, and SVM-based algorithms are often used. For many of these algorithms to perform their tasks, protein sequences need to be properly aligned first. Since the alignment process can be error-prone, protein classification may not be very accurate. To improve classification accuracy, we propose an algorithm, called the Unaligned Protein SEquence Classifier (UPSEC), which can perform its tasks without sequence alignment. UPSEC makes use of a probabilistic measure to identify residues that are useful for classification in both positive and negative training samples, and it can handle multi-class classification with a single classifier and a single pass through the training data. UPSEC has been tested with real protein data sets. Experimental results show that UPSEC can effectively classify unaligned protein sequences into their corresponding functional families, and that the patterns it discovers during training can be biologically meaningful.

Subjects

Algorithms , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein , Amino Acid Sequence , Mathematics , Proteins/genetics , Sequence Alignment
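
UPSEC's specific probabilistic measure is not given in the abstract. Purely to illustrate alignment-free classification, the sketch below scores each residue by its log-odds of occurring in positive versus negative training sequences (a naive-Bayes-style measure with pseudocounts, assumed here) and classifies a new sequence by summing those scores, so no alignment is ever computed. Unlike UPSEC, it handles only the two-class case, and the sequences are toy examples.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def residue_log_odds(positives, negatives, pseudocount=1.0):
    """Per-residue log-odds of occurring in positive vs. negative sequences."""
    pos = Counter("".join(positives))
    neg = Counter("".join(negatives))
    pos_total = sum(pos.values()) + pseudocount * len(AMINO_ACIDS)
    neg_total = sum(neg.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: math.log((pos[aa] + pseudocount) / pos_total)
                - math.log((neg[aa] + pseudocount) / neg_total)
            for aa in AMINO_ACIDS}

def score(sequence, log_odds):
    """Sum of residue log-odds; a positive total favors the positive family."""
    return sum(log_odds.get(aa, 0.0) for aa in sequence)

# Toy training sequences of different lengths (no alignment required).
positives = ["KKRKA", "RKKRK"]
negatives = ["DEDEA", "EDDEE"]
log_odds = residue_log_odds(positives, negatives)
```

Residues with large positive or negative log-odds are the ones that carry discriminative information, which loosely mirrors the idea of identifying residues useful for classification in both positive and negative samples.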

18.

Measuring Boundedness for Protein Complex Identification in PPI Networks.

He, Tiantian; Chan, Keith C C.

IEEE/ACM Trans Comput Biol Bioinform ; 2018 Apr 03.

Article in English | MEDLINE | ID: mdl-29993661

ABSTRACT

The problem of identifying protein complexes in protein-protein interaction (PPI) networks is usually formulated as the problem of identifying dense regions in such networks. In this paper, we present a novel approach, called TBPCI, that instead identifies protein complexes based on a measure of boundedness. This measure is defined as an objective function of two components: a Jaccard index-based connectedness measure, which takes into consideration how strongly two proteins within a network are connected to each other, and an association measure, which takes into consideration how strongly two connected proteins are associated based on their attributes in the Gene Ontology database. Based on these two measures, the objective function is derived to capture how strongly proteins can be considered bound together, and the objective value is therefore referred to as the aggregated degree of boundedness. To identify protein complexes, TBPCI computes the degree of boundedness for all possible protein pairs. TBPCI then uses a breadth-first search to determine whether a protein pair should be incorporated into the same complex. TBPCI has been tested with several real data sets, and the experimental results show that it is an effective approach for identifying protein complexes in PPI networks.
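
To make the two TBPCI ingredients concrete, the following sketch computes a Jaccard index on closed neighborhoods, mixes it with GO-term overlap through a hypothetical weight `alpha` (the paper's actual objective function is not given in the abstract), and then groups proteins by breadth-first search over edges whose score reaches a hypothetical `threshold`. The graph and GO labels are toy data.

```python
from collections import deque

def jaccard(graph, u, v):
    """Jaccard index of the closed neighborhoods of u and v."""
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / len(nu | nv)

def boundedness(graph, go_terms, u, v, alpha=0.5):
    """Hypothetical mix of topological and GO-attribute similarity."""
    gu, gv = go_terms.get(u, set()), go_terms.get(v, set())
    attr = len(gu & gv) / len(gu | gv) if gu | gv else 0.0
    return alpha * jaccard(graph, u, v) + (1 - alpha) * attr

def complexes(graph, go_terms, threshold=0.5):
    """Group proteins by BFS over edges whose boundedness meets the threshold."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), {start}
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in seen and boundedness(graph, go_terms, u, v) >= threshold:
                    seen.add(v)
                    group.add(v)
                    queue.append(v)
        if len(group) > 1:  # singletons are not reported as complexes
            result.append(group)
    return result

# Two triangles joined by a single bridge edge C-D (toy PPI network).
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
go_terms = {p: {"GO_term_1"} for p in "ABC"} | {p: {"GO_term_2"} for p in "DEF"}
found = complexes(graph, go_terms)
```

The bridge edge C-D has a low Jaccard index (1/3) and no shared GO terms, so the search does not cross it and the two triangles come out as separate complexes.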

19.

MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs.

He, Tiantian; Chan, Keith C C.

IEEE Trans Cybern ; 48(5): 1369-1382, 2018 May.

Article in English | MEDLINE | ID: mdl-28459699

ABSTRACT

An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in an attributed graph, is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in attributed graphs, we propose an algorithm called the mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA first uses a probabilistic measure to determine whether the association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, it then computes the degree of association for each pair of vertices using an information-theoretic measure. Based on the edge structure and the degree of association between each pair of vertices, MISAGA formulates the identification of interesting subgraphs as a constrained optimization problem and solves it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large real graphs and is found to be potentially very useful for various applications.
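
The abstract does not name MISAGA's probabilistic measure. One classic test of whether two attribute values co-occur more often than chance is Haberman's adjusted residual for a cell of a contingency table; the sketch below assumes that measure, treats only positive association as interesting, and uses hypothetical attribute values observed at the two endpoints of each edge. It requires each value to occur in some but not all observations; otherwise the denominator is zero.

```python
import math

def adjusted_residual(pairs, a, b):
    """Haberman adjusted residual for cell (a, b) of a contingency table.

    pairs: list of (value_x, value_y) observations, e.g. the attribute
    values found at the two endpoints of each edge.
    """
    n = len(pairs)
    n_a = sum(1 for x, _ in pairs if x == a)
    n_b = sum(1 for _, y in pairs if y == b)
    observed = sum(1 for x, y in pairs if x == a and y == b)
    expected = n_a * n_b / n
    variance = (1 - n_a / n) * (1 - n_b / n)
    return (observed - expected) / math.sqrt(expected * variance)

def is_interesting(pairs, a, b, z=1.96):
    """Positive association significant at roughly the 5% level."""
    return adjusted_residual(pairs, a, b) > z

# Hypothetical attribute-value pairs: "x" tends to co-occur with "p".
pairs = ([("x", "p")] * 8 + [("x", "q")] * 2
         + [("y", "p")] * 2 + [("y", "q")] * 8)
```

Here the pair ("x", "p") is observed 8 times against an expectation of 5, giving an adjusted residual of about 2.68, above the 1.96 cut-off, so it would be kept as an interesting pair of attribute values.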

20.

Evolutionary Graph Clustering for Protein Complex Identification.

He, Tiantian; Chan, Keith C C.

IEEE/ACM Trans Comput Biol Bioinform ; 15(3): 892-904, 2018.

Article in English | MEDLINE | ID: mdl-28029628

ABSTRACT

This paper presents a graph clustering algorithm, called EGCPI, to discover protein complexes in protein-protein interaction (PPI) networks. In performing its task, EGCPI takes into consideration both network topologies and attributes of interacting proteins, both of which have been shown to be important for protein complex discovery. EGCPI formulates the problem as an optimization problem and tackles it with evolutionary clustering. Given a PPI network, EGCPI first annotates each protein with corresponding attributes provided in the Gene Ontology database. It then adopts a similarity measure to evaluate how similar connected proteins are, taking the network topology into consideration. Given this measure, EGCPI discovers, based on an evolutionary strategy, a number of graph clusters within which proteins are densely connected. Finally, EGCPI identifies protein complexes in each discovered cluster based on the homogeneity of attributes exhibited by protein pairs. EGCPI has been tested with several real data sets, and the experimental results show that EGCPI is very effective for protein complex discovery and that evolutionary clustering helps identify protein complexes in PPI networks. The software of EGCPI can be downloaded via: https://github.com/hetiantian1985/EGCPI.

Subjects

Cluster Analysis , Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Proteins , Algorithms , Humans , Proteins/chemistry , Proteins/metabolism , Proteins/physiology , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/physiology
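
Evolutionary clustering can be illustrated with a (1+1) evolution strategy: mutate one vertex's cluster label and keep the child partition if its fitness does not drop. The sketch below assumes Newman modularity as the fitness and uses topology only; EGCPI's own similarity measure, evolutionary operators and attribute-based complex extraction are not specified in the abstract and are not reproduced here. Function names and the toy graph are hypothetical.

```python
import random

def modularity(graph, labels):
    """Newman modularity Q of a partition of an undirected graph."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    q = 0.0
    for u in graph:
        for v in graph:
            if labels[u] != labels[v]:
                continue
            a_uv = 1.0 if v in graph[u] else 0.0
            q += a_uv - len(graph[u]) * len(graph[v]) / (2 * m)
    return q / (2 * m)

def evolve_clusters(graph, generations=500, seed=0):
    """(1+1) evolution strategy: mutate one label, keep it if Q does not drop."""
    rng = random.Random(seed)
    nodes = list(graph)
    labels = {u: i for i, u in enumerate(nodes)}  # start from singletons
    best = modularity(graph, labels)
    for _ in range(generations):
        u = rng.choice(nodes)
        if not graph[u]:
            continue
        child = dict(labels)
        # Mutation: vertex u adopts the cluster label of a random neighbor.
        child[u] = labels[rng.choice(sorted(graph[u]))]
        q = modularity(graph, child)
        if q >= best:  # selection: the child replaces the parent if no worse
            labels, best = child, q
    return labels, best

# Toy PPI network: two triangles joined by the bridge edge C-D.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
labels, best = evolve_clusters(graph)
```

Accepting children of equal fitness lets the search drift across plateaus of the modularity landscape instead of stalling at the first partition it cannot strictly improve.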
