ABSTRACT
Many tasks involve learning representations from matrices, and Non-negative Matrix Factorization (NMF) has been widely used because of its interpretability. Through factorization, sample vectors are reconstructed as additive combinations of latent factors, which are represented as non-negative distributions over the raw input features. NMF models are significantly affected by the distribution characteristics of the latent factors and by the correlations among them, and they face the challenge of learning robust latent factors. To this end, we propose to learn representations with an awareness of semantic quality, evaluated from both intra-factor and inter-factor aspects. On the one hand, a Maximum Entropy-based function is devised for the intra-factor semantic quality. On the other hand, semantic uniqueness is evaluated via inter-factor correlation, which reinforces the aim of semantic compactness. Moreover, we present a novel non-linear NMF framework. The learning algorithm is presented, and its convergence is theoretically analyzed and proved. Extensive experimental results on multiple datasets demonstrate that our method can be successfully applied to representative NMF models and boosts performance over state-of-the-art models.
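The quantities named in this abstract can be illustrated with a minimal sketch. It assumes plain NMF with the standard multiplicative updates for a Frobenius loss, followed by two simple diagnostics: the Shannon entropy of each normalised factor (an intra-factor measure) and the mean cosine similarity between factors (an inter-factor measure). The function names `nmf`, `factor_entropy` and `inter_factor_correlation` are hypothetical, and the diagnostics are stand-ins for illustration, not the objective proposed by the authors.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, X ~ W @ H, via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps  # sample-by-factor coefficients
    H = rng.random((k, m)) + eps  # factor-by-feature weights
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def factor_entropy(H, eps=1e-12):
    """Shannon entropy of each factor after normalising its row to sum to 1."""
    P = H / (H.sum(axis=1, keepdims=True) + eps)
    return -(P * np.log(P + eps)).sum(axis=1)

def inter_factor_correlation(H, eps=1e-12):
    """Mean off-diagonal cosine similarity between factors (rows of H)."""
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
    S = Hn @ Hn.T
    k = S.shape[0]
    return (S.sum() - np.trace(S)) / (k * (k - 1))

if __name__ == "__main__":
    X = np.random.default_rng(1).random((20, 10))
    W, H = nmf(X, k=3)
    print("entropy per factor:", factor_entropy(H))
    print("mean inter-factor similarity:", inter_factor_correlation(H))
```

A low entropy indicates a factor concentrated on few features, and a low inter-factor similarity indicates that the factors overlap little. A regularised model would add terms of this kind to the reconstruction loss rather than compute them after the fact.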
Subject(s)
Machine Learning, Entropy, Semantics
ABSTRACT
Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and to facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as the relationships among phenotypes and among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing approaches are limited in prediction accuracy and in their ability to detect potential gene-phenotype associations. In this paper, we propose a novel method that builds on the advantages of Non-negative Matrix Factorization (NMF) by exploiting a weighted graph constraint learned from the hierarchical structure of phenotype data together with group prior information among genes. We call it Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC[Formula: see text]NMF). Specifically, we first introduce the depth of the parent-child relationship between two adjacent phenotypes in the hierarchical phenotype data as a weighted graph constraint, for better phenotype understanding. Second, we use the intra-group correlation among genes in a gene group as a group constraint, for gene understanding. Such information supports the intuition that genes in a group probably result in similar phenotypes. The model not only achieves high prediction performance, but also learns interpretable representations of genes and phenotypes simultaneously, which facilitates future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC[Formula: see text]NMF obtains superior prediction accuracy and better interpretability for biological explanation than other state-of-the-art methods.
Subject(s)
Algorithms, Genetic Association Studies, Animals, Humans, Mice
ABSTRACT
BACKGROUND: Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can be applied to reveal the inherited basis of human diseases and to facilitate drug development. Our work is motivated by the label propagation algorithm and by the false-positive protein-protein interactions (PPIs) that exist in the dataset. To the best of our knowledge, false-positive protein-protein interactions have not been considered before in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods. These methods use basic label propagation, i.e. random walk, on networks to prioritize disease genes in different ways. However, none of them can deal with the situation in which many false-positive protein-protein interactions exist in the dataset, because they use the PPI network as a fixed input. This important characteristic of the data source may cause a large deviation in the results. RESULTS: A novel network-based framework, IDLP, is proposed to prioritize candidate disease genes. IDLP effectively propagates labels throughout the PPI network and the phenotype similarity network, which prevents the method from failing when few disease genes are known. Meanwhile, IDLP models the bias caused by false-positive protein interactions and other potential factors by treating the PPI network matrix and the phenotype similarity matrix as matrices to be learnt. By correcting noise in the training matrices, it improves performance significantly. We conduct extensive experiments on OMIM datasets, in which IDLP demonstrates its effectiveness compared with eight state-of-the-art approaches. The robustness of IDLP is also validated by experiments with a perturbed PPI network. Furthermore, we search the literature to verify that the new genes predicted by IDLP are associated with the given diseases; the high prediction accuracy shows that IDLP can be a powerful tool to help biologists discover new disease genes. CONCLUSIONS: IDLP is an effective method for disease gene prioritization, particularly for query phenotypes without known associated genes, which would be greatly helpful for identifying disease genes for less studied phenotypes. AVAILABILITY: https://github.com/nkiip/IDLP.
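The basic label propagation that the BACKGROUND attributes to earlier network-based methods can be sketched as follows. This is a minimal illustration on a fixed adjacency matrix, assuming symmetric degree normalisation and a restart weight `alpha`; the function name `propagate_labels` and its parameters are hypothetical. It is not the IDLP algorithm, which additionally treats the network matrices as quantities to be learnt.

```python
import numpy as np

def propagate_labels(A, y0, alpha=0.8, n_iter=100, tol=1e-8):
    """Iterate y <- alpha * S @ y + (1 - alpha) * y0 on a fixed network.

    A  : symmetric non-negative adjacency matrix (e.g. a PPI network)
    y0 : initial labels (1 for known disease genes, 0 otherwise)
    S  : D^{-1/2} A D^{-1/2}, the symmetrically normalised adjacency
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d[d == 0] = 1.0  # isolated nodes: avoid division by zero
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    y0 = np.asarray(y0, dtype=float)
    y = y0.copy()
    for _ in range(n_iter):
        y_next = alpha * (S @ y) + (1 - alpha) * y0
        converged = np.abs(y_next - y).sum() < tol
        y = y_next
        if converged:
            break
    return y

if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with a single seed gene at node 0.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
    scores = propagate_labels(A, [1, 0, 0, 0])
    print(scores)  # scores decrease with distance from the seed
```

Because `A` is a fixed input here, a spurious edge changes `S` and therefore every score, which is the sensitivity to false-positive interactions that the abstract describes.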
Subject(s)
Algorithms, Computational Biology/methods, Disease/genetics, Area Under Curve, Humans, Parkinson Disease/genetics, Phenotype, Protein Interaction Maps/genetics, Statistics as Topic
ABSTRACT
Discovering gene-phenotype associations is important for understanding disease mechanisms. Nonnegative matrix factorization (NMF) has been widely used in computational biology for its good performance and interpretability. In this paper, we propose a novel metrical consistency NMF (MCNMF) method for candidate gene prioritization. The MCNMF method assumes that phenotype similarities calculated in various independent ways should be consistent when the associations between genes and phenotypes are completely known. Experimental results show that our method recovers gene-phenotype associations effectively and outperforms the compared methods.
Subject(s)
Algorithms, Genetic Association Studies, Humans, Phenotype, Protein Interaction Mapping
ABSTRACT
Non-invasive fluorescent imaging of preclinical animal models in vivo is a rapidly developing field with emerging technologies and techniques. Quantum dot (QD) fluorescent probes with longer emission wavelengths, in the red and near-infrared (NIR) ranges, are more amenable to deep-tissue imaging because both scattering and autofluorescence are reduced as the wavelength increases. We have designed and synthesized red CdTe and NIR CdHgTe QDs for fluorescent imaging, and we demonstrated their use as fluorescent probes both in vitro and in vivo. Both CdTe and CdHgTe QDs provided sensitive detection over background autofluorescence in tissue biopsies and live mice, making them attractive probes for in vivo imaging extending into deep tissues or whole animals. The studies suggest a basis for using QD-antibody conjugates to detect membrane antigens.
Subject(s)
Cadmium Compounds/chemistry, Mercury Compounds/chemistry, Quantum Dots, Tellurium/chemistry, Animals, Fluorescent Dyes/chemistry, Infrared Rays, Mice, Nude Mice, Whole Body Imaging/methods
ABSTRACT
We report the use of novel multicolored CdTe quantum dots (QDs) as fluorophores for biological fluorescence imaging. The CdTe QDs were prepared to exhibit emission wavelengths in the green, yellow, and red range by using trifluoroacetic acid (TFA), L-cysteine and thioglycolic acid (TGA) as surface stabilizers, respectively. The particles have good water solubility and photostability. Fluorescence imaging potential was evaluated in vitro and in vivo using a multispectral Maestro CRI Fluorescence Imaging system. The results show that different colored CdTe QDs allow sensitive detection simultaneously or separately both in vitro and in vivo against background fluorescence. The studies indicate that CdTe QDs can provide alternative fluorescent probes for biological imaging.