Search | BVS Search Portal

Fast optimization of non-negative matrix tri-factorization.

Copar, Andrej; Zupan, Blaz; Zitnik, Marinka.

PLoS One ; 14(6): e0217994, 2019.

Article in English | MEDLINE | ID: mdl-31185054

ABSTRACT

Non-negative matrix tri-factorization (NMTF) is a popular technique for learning a low-dimensional feature representation of relational data. Currently, NMTF learns a representation of a dataset through an optimization procedure that typically uses multiplicative update rules. This procedure has had limited success, and its failure cases have not been well understood. Here, we perform an empirical study involving six large datasets, comparing multiplicative update rules with three alternative optimization methods: alternating least squares, projected gradients, and coordinate descent. We find that methods based on projected gradients and coordinate descent converge up to twenty-four times faster than multiplicative update rules. Furthermore, the alternating least squares method can quickly train NMTF models on sparse datasets but often fails on dense datasets. Coordinate descent-based NMTF converges up to sixteen times faster than well-established methods.
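For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of multiplicative update rules for NMTF, assuming the common unconstrained formulation that minimizes the squared Frobenius error of X ≈ U S Vᵀ with all factors non-negative. The function name `nmtf_multiplicative`, its parameters, and the random initialization are illustrative assumptions and are not taken from the paper; the authors' implementation may differ (for example, by adding orthogonality constraints).

```python
import numpy as np


def nmtf_multiplicative(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Sketch of NMTF (X ~ U S V^T) with multiplicative updates.

    Hypothetical helper for illustration only; not the authors' code.
    X must be non-negative with shape (n, m).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative initialization (an assumption of this sketch).
    U = rng.random((n, k1))
    S = rng.random((k1, k2))
    V = rng.random((m, k2))
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled element-wise by a ratio of non-negative
        # terms, so it stays non-negative; eps guards against division by 0.
        U *= (X @ V @ S.T) / (U @ S @ (V.T @ V) @ S.T + eps)
        V *= (X.T @ U @ S) / (V @ S.T @ (U.T @ U) @ S + eps)
        S *= (U.T @ X @ V) / ((U.T @ U) @ S @ (V.T @ V) + eps)
        # Squared Frobenius reconstruction error after this iteration.
        errors.append(np.linalg.norm(X - U @ S @ V.T) ** 2)
    return U, S, V, errors


if __name__ == "__main__":
    X = np.random.default_rng(1).random((30, 20))
    U, S, V, errors = nmtf_multiplicative(X, k1=4, k2=3)
    print(U.shape, S.shape, V.shape)
```

The recorded `errors` list makes it easy to plot convergence per iteration, which is the kind of comparison the study reports across optimization methods.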

Subjects

Algorithms; Models, Theoretical; Databases, Factual

Democratized image analytics by visual programming through integration of deep models and small-scale machine learning.

Godec, Primoz; Pancur, Matjaz; Ilenic, Nejc; Copar, Andrej; Strazar, Martin; Erjavec, Ales; Pretnar, Ajda; Demsar, Janez; Staric, Anze; Toplak, Marko; Zagar, Lan; Hartman, Jan; Wang, Hamilton; Bellazzi, Riccardo; Petrovic, Uros; Garagna, Silvia; Zuccotti, Maurizio; Park, Dongsu; Shaulsky, Gad; Zupan, Blaz.

Nat Commun ; 10(1): 4551, 2019 10 07.

Article in English | MEDLINE | ID: mdl-31591416

ABSTRACT

Analysis of biomedical images requires computational expertise that is uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange ( http://orange.biolab.si ) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novice and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.

Subjects

Computational Biology/methods; Image Processing, Computer-Assisted/methods; Machine Learning; Neural Networks, Computer; Animals; Dictyostelium/cytology; Dictyostelium/growth & development; Dictyostelium/metabolism; Green Fluorescent Proteins/genetics; Green Fluorescent Proteins/metabolism; Internet; Life Cycle Stages; Mice, Transgenic; Oocytes/metabolism; Reproducibility of Results; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism

Scalable non-negative matrix tri-factorization.

Copar, Andrej; Zitnik, Marinka; Zupan, Blaz.

BioData Min ; 10: 41, 2017.

Article in English | MEDLINE | ID: mdl-29299064

ABSTRACT

BACKGROUND: Matrix factorization is a well-established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space, enabling generalization, noise removal, and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets. Our focus in this paper is matrix tri-factorization, a popular method that is not limited by the assumption of standard matrix factorization that data reside in one latent space. Matrix tri-factorization avoids this limitation by inferring a separate latent space for each dimension of a data matrix, and a latent mapping of interactions between the inferred spaces, making the approach particularly suitable for biomedical data mining. RESULTS: We developed a block-wise approach for latent factor learning in matrix tri-factorization. The approach partitions a data matrix into disjoint submatrices that are treated independently and fed into a parallel factorization system. An appealing property of the proposed approach is its mathematical equivalence with serial matrix tri-factorization. In a study on large biomedical datasets we show that our approach scales well on multi-processor and multi-GPU architectures. On a four-GPU system we demonstrate that our approach can be more than 100 times faster than its single-processor counterpart. CONCLUSIONS: A general approach for scaling non-negative matrix tri-factorization is proposed. The approach is especially useful for parallel matrix factorization implemented in a multi-GPU environment. We expect the new approach will be useful in emerging procedures for latent factor analysis, notably for data integration, where many large data matrices need to be collectively factorized.
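The claim that a block-wise computation can be mathematically equivalent to the serial one can be illustrated with a small sketch. It assumes the standard multiplicative update for the row factor U in X ≈ U S Vᵀ, namely U ← U ∘ (X V Sᵀ) / (U S VᵀV Sᵀ), and shows that the same result is obtained when X is split into disjoint submatrices whose partial products are summed. The function names, the even index partitioning, and the serial loop over blocks are assumptions made for illustration; this is not the authors' parallel or multi-GPU implementation.

```python
import numpy as np


def update_U_full(X, U, S, V, eps=1e-9):
    """One multiplicative update of U computed on the whole matrix."""
    numer = X @ V @ S.T
    denom = U @ (S @ (V.T @ V) @ S.T)
    return U * numer / (denom + eps)


def update_U_blockwise(X, U, S, V, n_row_blocks, n_col_blocks, eps=1e-9):
    """The same update, assembled from disjoint submatrices of X.

    Hypothetical sketch: blocks are processed in a plain loop here, but
    each block product is independent and could run on its own worker.
    """
    row_blocks = np.array_split(np.arange(X.shape[0]), n_row_blocks)
    col_blocks = np.array_split(np.arange(X.shape[1]), n_col_blocks)
    # V^T V is a sum of small Gram matrices, one per column block.
    VtV = sum(V[c].T @ V[c] for c in col_blocks)
    K = S @ VtV @ S.T
    U_new = np.empty_like(U)
    for r in row_blocks:
        # Numerator for this row block: partial products X[r, c] V[c]
        # summed over all column blocks.
        XV = sum(X[np.ix_(r, c)] @ V[c] for c in col_blocks)
        U_new[r] = U[r] * (XV @ S.T) / (U[r] @ K + eps)
    return U_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((12, 10))
    U, S, V = rng.random((12, 3)), rng.random((3, 2)), rng.random((10, 2))
    full = update_U_full(X, U, S, V)
    blocked = update_U_blockwise(X, U, S, V, n_row_blocks=3, n_col_blocks=2)
    print(np.allclose(full, blocked))
```

Only the small Gram matrix and the per-block partial sums need to be exchanged between workers in such a scheme, which is one plausible reason a partitioned formulation maps well onto multiple processors or GPUs.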
