ABSTRACT
MOTIVATION: Identifying microRNAs that are associated with different diseases as biomarkers is a problem of great medical significance. Existing computational methods for uncovering such microRNA-disease associations (MDAs) are mostly developed under the assumption that similar microRNAs tend to associate with similar diseases. Since this assumption is not always valid, these methods may not be applicable to all kinds of MDAs. Considering that the relationships between long noncoding RNAs (lncRNAs) and different diseases, and the co-regulation relationships between the biological functions of lncRNAs and microRNAs, have been established, we propose here a multiview multitask method that makes use of known lncRNA-microRNA interactions to predict MDAs on a large scale. The investigation is performed in the absence of complete information about microRNAs and without any similarity measurement for them and, to the best of our knowledge, the work represents the first attempt to discover MDAs based on lncRNA-microRNA interactions. RESULTS: In this paper, we develop a deep learning model called MVMTMDA that can create a multiview representation of microRNAs. The model is trained with an end-to-end multitask learning approach so that missing data in the side information can be determined automatically. Experimental results show that the proposed model yields an average area under the ROC curve of 0.8410 ± 0.018, 0.8512 ± 0.012 and 0.8521 ± 0.008 when k is set to 2, 5 and 10, respectively. In addition, we propose a statistical approach to predicting lncRNA-disease associations based on these associations and the MDAs discovered using MVMTMDA. AVAILABILITY: Python code and the datasets used in our studies are available at https://github.com/yahuang1991polyu/MVMTMDA/.
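The abstract above reports an average area under the ROC curve with a spread for several values of k. As a minimal sketch of how such a summary could be computed, assuming that k denotes the number of cross-validation folds and that the spread is a sample standard deviation (neither is stated in the abstract), the AUC of each fold can be obtained from the rank-based (Mann-Whitney) formulation and then averaged. The function names `roc_auc` and `summarize_folds` are hypothetical and are not taken from the MVMTMDA repository.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count 1/2).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """Mean and sample standard deviation of per-fold AUCs.

    `fold_results` is a list of (labels, scores) pairs, one per held-out fold.
    """
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    spread = stdev(aucs) if len(aucs) > 1 else 0.0
    return mean(aucs), spread
```

The pairwise formulation is quadratic in the number of examples, which is acceptable for illustration; a sort-based implementation would be preferable for large association matrices.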
Subject(s)
Disease/genetics, Machine Learning, MicroRNAs, Genetic Models, Long Noncoding RNA, Humans, MicroRNAs/genetics, MicroRNAs/metabolism, Predictive Value of Tests, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism
ABSTRACT
MOTIVATION: MicroRNA (miRNA) therapeutics is becoming increasingly important. However, aberrant expression of miRNAs is known to cause drug resistance and can become an obstacle for miRNA-based therapeutics. At present, little is known about associations between miRNA and drug resistance and there is no computational tool available for predicting such association relationship. Since it is known that miRNAs can regulate genes that encode specific proteins that are keys for drug efficacy, we propose here a computational approach, called GCMDR, for finding a three-layer latent factor model that can be used to predict miRNA-drug resistance associations. RESULTS: In this paper, we discuss how the problem of predicting such associations can be formulated as a link prediction problem involving a bipartite attributed graph. GCMDR makes use of the technique of graph convolution to build a latent factor model, which can effectively utilize information of high-dimensional attributes of miRNA/drug in an end-to-end learning scheme. In addition, GCMDR also learns graph embedding features for miRNAs and drugs. We leveraged the data from multiple databases storing miRNA expression profile, drug substructure fingerprints, gene ontology and disease ontology. The test for performance shows that the GCMDR prediction model can achieve AUCs of 0.9301 ± 0.0005, 0.9359 ± 0.0006 and 0.9369 ± 0.0003 based on 2-fold, 5-fold and 10-fold cross validation, respectively. Using this model, we show that the associations between miRNA and drug resistance can be reliably predicted by properly introducing useful side information like miRNA expression profile and drug structure fingerprints. AVAILABILITY AND IMPLEMENTATION: Python codes and dataset are available at https://github.com/yahuang1991polyu/GCMDR/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
MicroRNAs, Algorithms, Area Under Curve, Computational Biology, Drug Resistance
ABSTRACT
MOTIVATION: Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNA as a biomarker opens a new perspective on diagnosing diseases and understanding disease pathogenesis. However, detecting circRNA-disease associations by biological experiments alone often proceeds blindly and is small in scale, costly and time-consuming. Therefore, there is an urgent need for reliable computational methods that rapidly infer potential circRNA-disease associations on a large scale and provide the most promising candidates for biological experiments. RESULTS: In this article, we propose an efficient computational method that combines multi-source information with a deep convolutional neural network (CNN) to predict circRNA-disease associations. The method first fuses multi-source information, including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity, then extracts hidden deep features through the CNN, and finally sends them to an extreme learning machine classifier for prediction. The 5-fold cross-validation results show that the proposed method achieves 87.21% prediction accuracy with 88.50% sensitivity and an area under the curve of 86.67% on the CIRCR2Disease dataset. In comparison with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. In addition, we obtained experimental support for the prediction results by searching the published literature: 7 of the top 15 circRNA-disease pairs with the highest scores were confirmed by the literature. These results demonstrate that the proposed model is a suitable method for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
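One of the similarity sources named in the abstract above is the Gaussian interaction profile (GIP) kernel. As a minimal sketch, assuming the commonly used definition in which the bandwidth is normalized by the mean squared norm of the interaction profiles (the abstract does not give the exact formula or bandwidth used), the kernel can be written as

K(i, j) = exp(-γ ‖IP(i) − IP(j)‖²), with γ = γ′ / ((1/n) Σ_k ‖IP(k)‖²).

The function name `gip_kernel` and the default `gamma_prime=1.0` are illustrative assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity.

    `profiles` is a list of binary interaction profiles, one row per entity
    (for example one row per circRNA, one column per disease).
    Returns a symmetric n x n similarity matrix as nested lists.
    """
    n = len(profiles)
    # Bandwidth normalization: mean squared Euclidean norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(a, b)) for b in profiles] for a in profiles]
```

Applying the same function to the transposed association matrix would give the corresponding similarity for the other entity type.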
Subject(s)
Computer Neural Networks, Circular RNA, Algorithms, Humans
ABSTRACT
Motivation: The interaction of miRNA and lncRNA is known to be important for gene regulation. However, few computational approaches have been developed to analyze known interactions and predict unknown ones. Given the growing evidence that lncRNA-miRNA interactions are closely related to the relative expression levels of the two molecules in the form of a titration mechanism, we analyzed the patterns in large-scale expression profiles of known lncRNA-miRNA interactions. From these patterns, we noticed that lncRNAs tend to interact collaboratively with miRNAs of similar expression profiles, and vice versa. Results: By representing known interactions between lncRNAs and miRNAs as a bipartite graph, we propose here a technique, called EPLMI, to construct a prediction model from such a graph. EPLMI performs its tasks based on the assumption that lncRNAs that are highly similar to each other tend to have similar interaction or non-interaction patterns with miRNAs, and vice versa. The effectiveness of the prediction model so constructed has been evaluated using the latest dataset of lncRNA-miRNA interactions. The results show that the prediction model can achieve AUCs of 0.8522 and 0.8447 ± 0.0017 based on leave-one-out cross-validation and 5-fold cross-validation, respectively. Using this model, we show that lncRNA-miRNA interactions can be reliably predicted. We also show that it can be used to select the most likely lncRNA targets that specific miRNAs would interact with. We believe that the prediction models discovered by EPLMI can yield great insights for further research on ceRNA regulation networks. To the best of our knowledge, EPLMI is the first technique developed for large-scale lncRNA-miRNA interaction profiling. Availability and implementation: Matlab codes and dataset are available at https://github.com/yahuang1991polyu/EPLMI/. Contact: yu-an.huang@connect.polyu.hk or zhuhongyou@ms.xjb.ac.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
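The assumption stated in the abstract above, that lncRNAs with similar expression profiles tend to share interaction patterns with miRNAs, can be illustrated with a simple neighbour-weighted score. This is only a sketch of that assumption, not the EPLMI algorithm itself: the use of Pearson correlation, the clipping of negative correlations to zero, and the names `pearson` and `neighbor_scores` are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def neighbor_scores(expr, interactions, target):
    """Score every miRNA for lncRNA `target`.

    expr[k]         : expression profile of lncRNA k
    interactions[k] : 0/1 row of known interactions of lncRNA k with each miRNA
    The score is the similarity-weighted average of the other lncRNAs' rows,
    using only positively correlated neighbours.
    """
    n_mirna = len(interactions[0])
    weights = [
        0.0 if k == target else max(0.0, pearson(expr[target], expr[k]))
        for k in range(len(expr))
    ]
    total = sum(weights)
    if total == 0:
        return [0.0] * n_mirna
    return [
        sum(w * interactions[k][m] for k, w in enumerate(weights)) / total
        for m in range(n_mirna)
    ]
```

The same scoring could be applied from the miRNA side by transposing the interaction matrix, which mirrors the "and vice versa" part of the assumption.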
Subject(s)
Gene Expression Profiling/methods, Gene Expression Regulation, MicroRNAs/metabolism, Long Noncoding RNA/metabolism, RNA Sequence Analysis/methods, Algorithms, Area Under Curve, Humans, Sensitivity and Specificity
ABSTRACT
BACKGROUND: Quantitative traits or continuous outcomes related to complex diseases can provide more information, and therefore more accurate analysis, for identifying the gene-gene and gene-environment interactions associated with complex diseases. Multifactor Dimensionality Reduction (MDR) was originally proposed to identify gene-gene and gene-environment interactions associated with the binary status of complex diseases. Some efforts have been made to extend it to quantitative traits (QTs) and ordinal traits. However, these and other methods are still not computationally efficient or effective. RESULTS: Generalized Fuzzy Quantitative trait MDR (GFQMDR) is proposed in this paper to strengthen the identification of gene-gene interactions associated with a quantitative trait. It first transforms the trait into an ordinal trait and then, through generalized fuzzy classification using extended membership functions, selects the best sets of genetic markers, mainly single nucleotide polymorphisms (SNPs) or simple sequence length polymorphic markers (SSLPs), that have a strong association with the trait. Experimental results on simulated and real datasets show that our algorithm has a better success rate, classification accuracy and consistency in identifying gene-gene interactions associated with QTs. CONCLUSION: The proposed algorithm provides a more effective way to identify gene-gene interactions associated with quantitative traits.
Subject(s)
Computational Biology/methods, Genetic Epistasis, Fuzzy Logic, Phenotype, Animals, Female, Genetic Markers/genetics, Humans, Mice, Genetic Models, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Identifying protein complexes is an essential task for understanding the mechanisms of proteins in cells. Many computational approaches have thus been developed to identify protein complexes in protein-protein interaction (PPI) networks. In addition to the graph topology of the PPI network, the functional information of proteins has recently become a popular source of information for such approaches. These approaches rely on the idea that proteins in the same protein complex may be associated with similar functional information. However, we note from our previous research that, for most protein complexes, the proteins are similar only in specific subsets of functional categories rather than in the entire set. Hence, if the preference for each functional category can also be taken into account when identifying protein complexes, the accuracy will be improved. RESULTS: To implement this idea, we first introduce a preference vector for each protein to quantitatively indicate the preference for each functional category when deciding which protein complex the protein belongs to. Integrating the functional preferences of proteins and the graph topology of the PPI network, we formulate the identification of protein complexes as a constrained optimization problem, and we propose the approach DCAFP to address it. For performance evaluation, we have conducted extensive experiments with several PPI networks from Saccharomyces cerevisiae and human, and have also compared DCAFP with state-of-the-art approaches to the identification of protein complexes.
The experimental results show that integrating functional preferences and dense structures improved the performance of identifying protein complexes, as DCAFP outperformed the other approaches on most of the PPI networks according to the independent measures of F-measure, Accuracy and Maximum Matching Rate. Furthermore, the function enrichment experiments indicated that DCAFP identified more protein complexes with functional significance than approaches, such as PCIA, that also utilize functional information. CONCLUSIONS: Given the promising performance of DCAFP, the integration of functional preferences and dense structures makes it possible to identify protein complexes more accurately and with greater functional significance.
Subject(s)
Computational Biology/methods, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Interaction Mapping/methods, Proteins/chemistry, Proteins/metabolism, Proteomics/methods, Saccharomyces cerevisiae/metabolism, Cluster Analysis, Humans
ABSTRACT
OBJECTIVE: This meta-analysis systematically compiles intervention research designed to increase medication adherence among underrepresented adults. METHOD: Comprehensive searching located published and unpublished studies with medication adherence behavior outcomes. Studies were included if samples were adults living in North America who had any of the following backgrounds or identities: African American, Native American, Latino, Latino American, Asian, Asian American, Pacific Islander, Native Alaskan, or Native Hawaiian. Random-effects analyses synthesized the data to calculate effect sizes, expressed as standardized mean differences, and variability measures. Exploratory moderator analyses examined the association between specific efforts to increase the cultural relevance of medication adherence studies and behavior outcomes. RESULTS: Data were synthesized across 5559 subjects in 55 eligible samples. Interventions significantly improved the medication adherence behavior of treatment subjects compared to control subjects (standardized mean difference = 0.211). Primary studies infrequently reported strategies to enhance cultural relevance. Exploratory moderator analyses found no evidence associating cultural relevance strategies with better medication adherence outcomes. CONCLUSION: The modest magnitude of the improvements in medication adherence behavior documents the need for further research with clear testing of cultural relevance features.
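The random-effects synthesis of standardized mean differences described in the abstract above can be sketched as follows. The abstract does not say which between-study variance estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is a common default, and the function name `random_effects_pool` is hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effect sizes under a random-effects model.

    effects   : per-study standardized mean differences
    variances : their within-study sampling variances
    Returns (pooled effect, standard error, tau^2), where tau^2 is the
    DerSimonian-Laird estimate of the between-study variance.
    """
    w = [1.0 / v for v in variances]          # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2
```

When the studies agree closely, tau^2 is truncated at zero and the result coincides with the fixed-effect estimate; larger heterogeneity widens the standard error and evens out the study weights.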
Subject(s)
Medication Adherence/ethnology, Medication Adherence/statistics & numerical data, Minority Groups/statistics & numerical data, Vulnerable Populations/statistics & numerical data, Adult, Aged, Ethnicity/statistics & numerical data, Female, Humans, North American Indians/statistics & numerical data, Male, Middle Aged, North America, Prescription Drugs
ABSTRACT
Skeleton-based exercise assessment focuses on evaluating the correctness or quality of an exercise performed by a subject. Skeleton data provide two groups of features (i.e., position and orientation), which existing methods have not fully harnessed. We previously proposed an ensemble-based graph convolutional network (EGCN) that considers both position and orientation features to construct a model-based approach. Integrating these types of features achieved better performance than available methods. However, EGCN lacked a fusion strategy across the data, feature, decision, and model levels. In this paper, we present an advanced framework, EGCN++, for rehabilitation exercise assessment. Based on EGCN, a new fusion strategy called MLE-PO is proposed for EGCN++; this technique considers fusion at the data and model levels. We conduct extensive cross-validation experiments and investigate the consistency between machine and human evaluations on three datasets: UI-PRMD, KIMORE, and EHE. Results demonstrate that MLE-PO outperforms other EGCN ensemble strategies and representative baselines. Furthermore, the MLE-PO's model evaluation scores are more quantitatively consistent with clinical evaluations than other ensemble strategies.
Subject(s)
Computer Neural Networks, Humans, Algorithms, Computer-Assisted Image Processing/methods, Machine Learning, Factual Databases
ABSTRACT
Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based and RGB video-based) have realized substantial improvements with increasingly larger datasets. However, multimodal methods specifically with model-level fusion have seldom been investigated. In this article, we propose a model-based multimodal network (MMNet) that fuses skeleton and RGB modalities via a model-based approach. The objective of our method is to improve ensemble recognition accuracy by effectively applying mutually complementary information from different data modalities. For the model-based fusion scheme, we use a spatiotemporal graph convolution network for the skeleton modality to learn attention weights that will be transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Upon aggregating the results of multiple modalities, our method is found to outperform state-of-the-art approaches on six evaluation protocols of the five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR. We also tested our MMNet on an RGB video dataset Kinetics 400 that contains more outdoor actions, which shows consistent results with those of RGB-D video datasets.
Subject(s)
Algorithms, Automated Pattern Recognition, Humans, Benchmarking, Human Activities, Learning
ABSTRACT
BACKGROUND: Discovering interesting patterns in drug-protein interaction data at the molecular level can reveal hidden relationships among drugs and proteins and can therefore be of paramount importance for applications such as drug design. To discover such patterns, we propose here a computational approach to analyzing the molecular data of drugs and proteins that are known to interact with each other. Specifically, we propose a data mining technique called Drug-Protein Interaction Analysis (D-PIA) to determine whether there are any commonalities in the fingerprints of the substructures of interacting drug and protein molecules and, if so, whether any patterns can be generalized from them. METHOD: Given a database of drug-protein interactions, D-PIA performs its tasks in several steps. First, for each drug in the database, the fingerprints of its molecular substructures are obtained. Second, for each protein in the database, the fingerprints of its protein domains are obtained. Third, based on known interactions between drugs and proteins, an interdependency measure between the fingerprint of each drug substructure and each protein domain is computed. Fourth, based on the interdependency measure, drug substructures and protein domains that are significantly interdependent are identified. Fifth, the existence of an interaction between a previously unknown drug-protein pair is predicted based on their constituent substructures that are significantly interdependent. RESULTS: To evaluate the effectiveness of D-PIA, we have tested it with real drug-protein interaction data including enzymes, ion channels, and protein-coupled receptors. Experimental results show that there are indeed patterns that one can discover in the interdependency relationship between the drug substructures and protein domains of interacting drugs and proteins.
Based on these relationships, a testing set of drug-protein data is used to see whether D-PIA can correctly predict the existence of interactions between drug-protein pairs. The results show that the prediction accuracy can be very high. The area under the ROC curve (AUC) reached as high as 75%, which indicates the effectiveness of this classifier. CONCLUSIONS: D-PIA has the advantage that it performs its tasks effectively based on the fingerprints of drug and protein molecules without requiring any 3D information about their structures, and it is therefore very fast to compute. D-PIA has been tested with real drug-protein interaction data, and experimental results show that it can be very useful for predicting previously unknown drug-protein as well as protein-ligand interactions. It can also be used to tackle problems such as ligand specificity, which is related directly and indirectly to drug design and discovery.
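The third and fourth steps of the method described above (computing an interdependency measure and keeping the significantly interdependent substructure-domain pairs) can be sketched as follows. The abstract does not define the measure, so the standardized residual used here, the 1.96 cut-off, and the name `interdependent_pairs` are all assumptions made for illustration rather than the D-PIA definition.

```python
import math
from collections import Counter


def interdependent_pairs(interactions, drug_subs, protein_doms, threshold=1.96):
    """Find (substructure, domain) pairs that co-occur more often than expected.

    interactions : list of (drug, protein) pairs known to interact
    drug_subs    : dict mapping drug -> set of substructure fingerprint bits
    protein_doms : dict mapping protein -> set of domain identifiers
    Returns {(substructure, domain): z} for pairs whose standardized residual
    z = (observed - expected) / sqrt(expected) exceeds `threshold`.
    """
    n = len(interactions)
    sub_count, dom_count, joint = Counter(), Counter(), Counter()
    for drug, protein in interactions:
        subs, doms = drug_subs[drug], protein_doms[protein]
        sub_count.update(subs)
        dom_count.update(doms)
        joint.update((s, d) for s in subs for d in doms)

    significant = {}
    for (s, d), observed in joint.items():
        # Expected co-occurrence if substructure and domain were independent.
        expected = sub_count[s] * dom_count[d] / n
        z = (observed - expected) / math.sqrt(expected)
        if z > threshold:
            significant[(s, d)] = z
    return significant
```

A candidate drug-protein pair could then be scored, for example, by summing the residuals of the significant pairs it contains; that scoring rule is likewise an assumption and is not taken from the source.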
Subject(s)
Data Mining, Pharmaceutical Preparations/chemistry, Proteins/chemistry, Factual Databases, Ligands, Tertiary Protein Structure, ROC Curve
ABSTRACT
BACKGROUND: Discovering patterns from gene expression levels is regarded as a classification problem when the tissue classes of the samples are given, and is solved as a discrete-data problem by discretizing the expression levels of each gene into intervals that maximize the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. METHODS: For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with the others in the group, to drive the discretization of the expression levels of the other genes. Treating intervals as discrete events, association patterns of events can be discovered. If the gene groups obtained are crisp gene clusters, significant patterns overlapping different gene clusters cannot be found. This paper presents a new method of "fuzzifying" the crisp gene clusters to overcome this problem. RESULTS: To evaluate the effectiveness of our approach, we first apply the procedure described above to a synthetic data set and then to a gene expression data set with known class labels. The class labels are not used in either analysis but are used later as the ground truth in a classification problem for assessing the algorithm's effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method. The existence of correlation among continuous-valued gene expression levels suggests that certain genes in a gene group have high interdependence with other genes in the group. Fuzzification of a crisp gene cluster allows the cluster to take in genes from other clusters so that overlapping relationships among gene clusters can be uncovered. Hence, previously unknown patterns residing in overlapping gene clusters are discovered.
From the experimental results, the high order patterns discovered reveal multiple gene interaction patterns in cancerous tissues not found in normal tissues. It was also found that for the colon cancer experiment, 70% of the top patterns and most of the discriminative patterns between cancerous and normal tissues are among those spanning across different crisp gene clusters. CONCLUSIONS: We show that the proposed method for analyzing the error-prone microarray is effective even without the presence of tissue class information. A unified framework is presented, allowing fast and accurate pattern discovery for gene expression data. For a large gene set, to discover a comprehensive set of patterns, gene clustering, gene expression discretization and gene cluster fuzzification are absolutely necessary.
Subject(s)
Algorithms, Colonic Neoplasms/genetics, Gene Expression Profiling/methods, Cluster Analysis, Humans, Oligonucleotide Array Sequence Analysis/methods
ABSTRACT
OBJECTIVE: Glucagon-like peptide 1 receptor agonists (GLP-1RAs) improved multiple proatherogenic risk factors and reduced cardiovascular events in recent clinical trials, suggesting that they may slow progression of atherosclerosis. We tested whether exenatide once weekly reduces carotid plaque progression in individuals with type 2 diabetes. RESEARCH DESIGN AND METHODS: In a double-blind, pragmatic trial, 163 participants were randomized (2:1) to exenatide (n = 109) or placebo (n = 54). Changes in carotid plaque volume and composition were measured at 9 and 18 months by multicontrast 3 Tesla MRI. Fasting and post-high-fat meal plasma glucose and lipids, and endothelial function responses, were measured at 3, 9, and 18 months. RESULTS: Exenatide reduced hemoglobin A1c (HbA1c) (estimated difference vs. placebo 0.55%, P = 0.0007) and fasting and postmeal plasma glucose (19 mg/dL, P = 0.0002, and 25 mg/dL, P < 0.0001, respectively). Mean (SD) change in plaque volume in the exenatide group (0.3% [2%]) was not different from that in the placebo group (-2.2% [8%]) (P = 0.4). The change in plaque volume in the exenatide group was associated with changes in HbA1c (r = 0.38, P = 0.0004), body weight, and overall plasma glucose (r = 0.29, P = 0.007 both). There were no differences in changes in plaque composition, body weight, blood pressure, fasting and postmeal plasma triglycerides, and endothelial function between the groups. CONCLUSIONS: Exenatide once weekly for up to 18 months improved fasting and postprandial glycemic control but did not modify change in carotid plaque volume or composition. This study raises the possibility that short-term antiatherosclerotic effects may not play a central role in the cardiovascular benefits of GLP-1RAs.
Subject(s)
Carotid Artery Diseases, Type 2 Diabetes Mellitus, Blood Glucose, Type 2 Diabetes Mellitus/complications, Type 2 Diabetes Mellitus/drug therapy, Exenatide/therapeutic use, Glucagon-Like Peptide-1 Receptor, Glycated Hemoglobin/analysis, Humans, Hypoglycemic Agents/therapeutic use, Peptides, Venoms/therapeutic use
ABSTRACT
A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and the interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or a very limited number of molecule types. In this study, we propose MAN-SDNE, a computational framework based on network representation learning, to predict any intermolecular association. More specifically, we constructed a large-scale molecular association network of multiple human biomolecules by integrating associations among long non-coding RNA, microRNA, protein, drug, and disease; the network contains 6,528 molecular nodes and 105,546 associations of 9 kinds. The feature of each node is then represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To demonstrate its ability to predict specific types of interactions, a case study on predicting lncRNA-protein interactions using MAN-SDNE is also carried out. Experimental results demonstrate that this work offers systematic insight into the synergistic associations between molecules and complex diseases and provides a network-based computational tool for systematically exploring intermolecular interactions.
Subject(s)
Biological Models, Systems Biology/methods, Computer Simulation, Humans, MicroRNAs/genetics, MicroRNAs/metabolism, Pharmaceutical Preparations/metabolism, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism
ABSTRACT
As interactions among genetic variants in different genes can be an important factor for predicting complex diseases, many computational methods have been proposed to detect if a particular set of genes has interaction with a particular complex disease. However, even though many such methods have been shown to be useful, they can be made more effective if the properties of gene-gene interactions can be better understood. Towards this goal, we have attempted to uncover patterns in gene-gene interactions and the patterns reveal an interesting property that can be reflected in an inequality that describes the relationship between two genotype variables and a disease-status variable. We show, in this paper, that this inequality can be generalized to [Formula: see text] genotype variables. Based on this inequality, we establish a conditional independence and redundancy (CIR)-based definition of gene-gene interaction and the concept of an interaction group. From these new definitions, a novel measure of gene-gene interaction is then derived. We discuss the properties of these concepts and explain how they can be used in a novel algorithm to detect high-order gene-gene interactions. Experimental results using both simulated and real datasets show that the proposed method can be very promising.
Subject(s)
Algorithms, Genetic Epistasis, Case-Control Studies, Computational Biology/methods, Gene Frequency, Genotype, Hemoglobinopathies/genetics, Humans, Linkage Disequilibrium, Falciparum Malaria/genetics, Single Nucleotide Polymorphism, alpha-Globins/genetics
ABSTRACT
Graph clustering, which aims at discovering sets of related vertices in graph-structured data, plays a crucial role in various applications, such as social community detection and biological module discovery. With the huge increase in the volume of data in recent years, graph clustering is used in an increasing number of real-life scenarios. However, the classical and state-of-the-art methods, which consider only single-view features or a single vector concatenating features from different views and neglect the contextual correlation between pairwise features, are insufficient for the task, as features that characterize vertices in a graph are usually from multiple views and the contextual correlation between pairwise features may influence the cluster preference for vertices. To address this challenging problem, we introduce in this paper, a novel graph clustering model, dubbed contextual correlation preserving multiview featured graph clustering (CCPMVFGC) for discovering clusters in graphs with multiview vertex features. Unlike most of the aforementioned approaches, CCPMVFGC is capable of learning a shared latent space from multiview features as the cluster preference for each vertex and making use of this latent space to model the inter-relationship between pairwise vertices. CCPMVFGC uses an effective method to compute the degree of contextual correlation between pairwise vertex features and utilizes view-wise latent space representing the feature-cluster preference to model the computed correlation. Thus, the cluster preference learned by CCPMVFGC is jointly inferred by multiview features, view-wise correlations of pairwise features, and the graph topology. Accordingly, we propose a unified objective function for CCPMVFGC and develop an iterative strategy to solve the formulated optimization problem. We also provide the theoretical analysis of the proposed model, including convergence proof and computational complexity analysis. 
In our experiments, we extensively compare the proposed CCPMVFGC with both classical and state-of-the-art graph clustering methods on eight standard graph datasets (six multiview and two single-view datasets). The results show that CCPMVFGC achieves competitive performance on all eight datasets, which validates the effectiveness of the proposed model.
ABSTRACT
Long noncoding RNAs (lncRNAs) are an important class of non-protein-coding RNAs. They have recently been found to act potentially as regulatory molecules in some important biological processes. MicroRNAs (miRNAs) have been confirmed to be closely related to the regulation of various human diseases. Recent studies have suggested that lncRNAs could interact with miRNAs to modulate their regulatory roles. Hence, predicting lncRNA-miRNA interactions is biologically significant because of their potential roles in determining the effectiveness of diagnostic biomarkers and therapeutic targets for various human diseases. To better understand the details of these mechanisms, it would be useful to develop computational approaches that allow for such investigations. As diverse heterogeneous datasets describing lncRNAs and miRNAs have become available, it is now more feasible to develop a model that describes potential interactions between lncRNAs and miRNAs. In this work, we present a novel computational approach called LMNLMI for this purpose. LMNLMI works in several phases. First, it learns patterns from expression, sequence and functional data. Based on these patterns, it then constructs several networks, including an expression-similarity network, a functional-similarity network, and a sequence-similarity network. Based on a measure of similarities between these networks, LMNLMI computes an interaction score for each pair of lncRNA and miRNA in the database. The novelty of LMNLMI lies in the use of a network fusion technique to combine the patterns inherent in multiple similarity networks and of a matrix completion technique to predict interaction relationships. Using a set of real data, we show that LMNLMI can be a very effective approach for the accurate prediction of lncRNA-miRNA interactions.
Subject(s)
Computational Biology/methods , MicroRNAs , Long Noncoding RNA , Transcriptome/genetics , Genetic Databases , Disease , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Genetic Models , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism
ABSTRACT
Molecular components that are functionally interdependent in human cells constitute molecular association networks. Diseases can be caused by disturbances in multiple molecular interactions, and discovering new biomolecular interactions can reveal new biomolecular regulatory mechanisms. To this end, a heterogeneous molecular association network is formed by systematically integrating comprehensive associations between miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose a machine learning method for predicting intermolecular interactions, named MMI-Pred. More specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. These discriminative features are then combined to train a random forest classifier that predicts intermolecular interactions. MMI-Pred achieves 93.50% accuracy in predicting hybrid associations under 5-fold cross-validation. This work provides a systematic landscape and a machine learning method for modeling and inferring complex associations between various biological components.
ABSTRACT
The low proportion and rapid evolution of major adverse cardiac events (MACE) present challenges for predicting MACE with machine learning models. In this paper, we propose a method to predict MACE from large-scale imbalanced electronic medical record (EMR) data using a network-based one-class classifier. It uses only the reliably known MACE samples to establish the hyperspherical model. Experiments show that our model outperforms state-of-the-art models.
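A hyperspherical one-class model can be sketched in a few lines. The version below is a deliberately simplified stand-in, not the network-based classifier from the abstract: the centre is the mean of the known positive samples and the radius is the distance to the farthest one. The function names and the two-dimensional toy features are assumptions for illustration.

```python
import math


def fit_hypersphere(positives):
    """Fit a sphere around known positive samples only.

    Assumption: centre = feature-wise mean, radius = largest distance
    from the centre to any training sample.
    """
    dim = len(positives[0])
    center = tuple(sum(p[i] for p in positives) / len(positives) for i in range(dim))
    radius = max(math.dist(p, center) for p in positives)
    return center, radius


def is_inside(sample, center, radius):
    """Return True when the sample falls within the fitted sphere."""
    return math.dist(tuple(sample), center) <= radius


# Hypothetical feature vectors for reliably known positive cases.
known_positives = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
center, radius = fit_hypersphere(known_positives)

near_case = is_inside((2.0, 2.5), center, radius)
far_case = is_inside((6.0, 6.0), center, radius)
```

Because only positive samples are used for fitting, the class imbalance of the unlabeled majority does not affect the boundary; everything outside the sphere is treated as not matching the known class.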
Subject(s)
Acute Coronary Syndrome , Humans
ABSTRACT
To classify proteins into functional families based on their primary sequences, popular algorithms such as the k-NN-, HMM-, and SVM-based algorithms are often used. For many of these algorithms to perform their tasks, protein sequences need to be properly aligned first. Since the alignment process can be error-prone, protein classification may not be performed very accurately. To improve classification accuracy, we propose an algorithm, called the Unaligned Protein SEquence Classifier (UPSEC), which can perform its tasks without sequence alignment. UPSEC makes use of a probabilistic measure to identify residues that are useful for classification in both positive and negative training samples, and can handle multi-class classification with a single classifier and a single pass through the training data. UPSEC has been tested with real protein data sets. Experimental results show that UPSEC can effectively classify unaligned protein sequences into their corresponding functional families, and the patterns it discovers during the training process can be biologically meaningful.
Subject(s)
Algorithms , Proteins/chemistry , Proteins/classification , Protein Sequence Analysis , Amino Acid Sequence , Mathematics , Proteins/genetics , Sequence Alignment
ABSTRACT
The problem of identifying protein complexes in protein-protein interaction (PPI) networks is usually formulated as the problem of identifying dense regions in such networks. In this paper, we present a novel approach, called TBPCI, that identifies protein complexes based instead on a measure of boundedness. This measure is defined as an objective function of two components: a Jaccard index-based connectedness measure, which captures how strongly two proteins within a network are connected to each other, and an association measure, which captures how strongly two connected proteins are associated based on their attributes in the Gene Ontology database. The objective function derived from these two measures captures how strongly proteins can be considered bound together, and its value is therefore referred to as the aggregated degree of boundedness. To identify protein complexes, TBPCI computes the degree of boundedness for all possible protein pairs. Then, TBPCI uses a breadth-first search to determine whether a protein pair should be incorporated into the same complex. TBPCI has been tested with several real data sets, and the experimental results show that it is an effective approach for identifying protein complexes in PPI networks.
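The combination of a Jaccard index-based connectedness score with a breadth-first search can be sketched as follows. This is a minimal illustration under stated assumptions, not the TBPCI implementation: the Gene Ontology association measure is omitted, the Jaccard index is taken over closed neighbourhoods, and the threshold of 0.5, the function names, and the toy network are all hypothetical.

```python
from collections import deque


def jaccard_connectedness(graph, u, v):
    """Jaccard index of the closed neighbourhoods of proteins u and v."""
    neighbours_u = graph[u] | {u}
    neighbours_v = graph[v] | {v}
    return len(neighbours_u & neighbours_v) / len(neighbours_u | neighbours_v)


def group_by_connectedness(graph, threshold):
    """Breadth-first search that only follows edges scoring >= threshold."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(graph[node]):
                score = jaccard_connectedness(graph, node, neighbour)
                if neighbour not in seen and score >= threshold:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical toy PPI network: two triangles joined by one bridge edge (C-D).
toy_ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}
complexes = group_by_connectedness(toy_ppi, threshold=0.5)
```

In this toy network the bridge edge C-D scores 2/6, below the threshold, so the search does not cross it and the two triangles are reported as separate groups.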