Local conformal autoencoder for standardized data coordinates.

Peterfreund, Erez; Lindenbaum, Ofir; Dietrich, Felix; Bertalan, Tom; Gavish, Matan; Kevrekidis, Ioannis G; Coifman, Ronald R.

Proc Natl Acad Sci U S A ; 117(49): 30918-30927, 2020 12 08.

Article in English | MEDLINE | ID: mdl-33229581

ABSTRACT

We propose a local conformal autoencoder (LOCA) for standardized data coordinates. LOCA is a deep learning-based method for obtaining standardized data coordinates from scientific measurements. Data observations are modeled as samples from an unknown, nonlinear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized, latent variables. We assume a repeated measurement sampling strategy, common in scientific measurements, and present a method for learning an embedding in [Formula: see text] that is isometric to the latent variables of the manifold. The coordinates recovered by our method are invariant to diffeomorphisms of the manifold, making it possible to match between different instrumental observations of the same phenomenon. Our embedding is obtained using LOCA, which is an algorithm that learns to rectify deformations by using a local z-scoring procedure, while preserving relevant geometric information. We demonstrate the isometric embedding properties of LOCA in various model settings and observe that it exhibits promising interpolation and extrapolation capabilities, superior to the current state of the art. Finally, we demonstrate LOCA's efficacy in single-site Wi-Fi localization data and for the reconstruction of three-dimensional curved surfaces from two-dimensional projections.
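The "local z-scoring" idea can be illustrated with a small sketch. It assumes, as one reading of the repeated-measurement strategy, that each observation comes with a short "burst" of samples whose latent covariance is sigma^2 times the identity; an embedding is then scored by how far each embedded burst's covariance is from that target. The function name `whitening_loss`, the value of `sigma`, and the stretching deformation are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def whitening_loss(embedded_bursts, sigma):
    """Mean squared Frobenius distance between cov/sigma^2 and the identity.

    embedded_bursts has shape (n_bursts, burst_size, d). This is a
    hypothetical helper for illustration, not the authors' implementation.
    """
    n_bursts, burst_size, d = embedded_bursts.shape
    identity = np.eye(d)
    total = 0.0
    for burst in embedded_bursts:
        centered = burst - burst.mean(axis=0)
        cov = centered.T @ centered / (burst_size - 1)
        total += np.sum((cov / sigma**2 - identity) ** 2)
    return total / n_bursts


rng = np.random.default_rng(0)
sigma = 0.1  # assumed burst scale

# 50 burst centers in a 2-D latent square, 200 samples per burst.
centers = rng.uniform(0.0, 1.0, size=(50, 1, 2))
latent_bursts = centers + sigma * rng.standard_normal((50, 200, 2))

# Assumed deformation for the demo: stretch the first coordinate by 3.
stretched_bursts = latent_bursts * np.array([3.0, 1.0])

loss_isometric = whitening_loss(latent_bursts, sigma)
loss_stretched = whitening_loss(stretched_bursts, sigma)
print(loss_isometric, loss_stretched)
```

Under these assumptions, the undeformed bursts give a loss close to zero, while the stretched ones are penalized heavily. A trained encoder would aim to drive this quantity down on the deformed observations.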

Subject(s)

Algorithms, Data Analysis, Reference Standards

The Spectral Underpinning of word2vec.

Jaffe, Ariel; Kluger, Yuval; Lindenbaum, Ofir; Patsenker, Jonathan; Peterfreund, Erez; Steinerberger, Stefan.

Front Appl Math Stat ; 6, 2020 Dec.

Article in English | MEDLINE | ID: mdl-34504892

ABSTRACT

Word2vec introduced by Mikolov et al. is a word embedding method that is widely used in natural language processing. Despite its success and frequent use, a strong theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
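To make "spectral method" concrete, here is a minimal sketch of a generic spectral word embedding: build a symmetric co-occurrence matrix from a toy corpus and keep its leading eigenvectors. The abstract does not say which matrix the analysis uses, so the raw co-occurrence counts, the window size, the toy corpus, and the embedding dimension are all assumptions made for illustration.

```python
import numpy as np

# Toy corpus and hyperparameters (assumed for illustration only).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
]
window = 2
dim = 3

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

# Count co-occurrences within a symmetric window around each word.
cooc = np.zeros((len(vocab), len(vocab)))
for sentence in tokens:
    for i, word in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[index[word], index[sentence[j]]] += 1.0

# The matrix is symmetric, so eigh applies; keep the eigenpairs with
# the largest absolute eigenvalues and scale by their square roots.
eigvals, eigvecs = np.linalg.eigh(cooc)
order = np.argsort(-np.abs(eigvals))[:dim]
embedding = eigvecs[:, order] * np.sqrt(np.abs(eigvals[order]))

print(embedding.shape)
```

Each row of `embedding` is a vector for one vocabulary word. Comparing such vectors with trained word2vec vectors is one way to probe how much of the learned structure a spectral method of this kind captures.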
