Deep network embedding with dimension selection.
Neural Netw; 179: 106512, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39032394
ABSTRACT
Network embedding is a general-purpose machine learning technique that converts network data from a non-Euclidean space to a Euclidean space, facilitating downstream analyses of the networks. However, existing embedding methods are often optimization-based, with the embedding dimension determined in a heuristic or ad hoc way, which can bias downstream statistical inference. Additionally, existing deep embedding methods can suffer from a nonidentifiability issue owing to the universal approximation power of deep neural networks. We address these issues within a rigorous statistical framework. We treat the embedding vectors as missing data, reconstruct the network features using a sparse decoder, and simultaneously impute the embedding vectors and train the sparse decoder using an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm. Under mild conditions, we show that the sparse decoder provides a parsimonious mapping from the embedding space to the network features, enabling effective selection of the embedding dimension and overcoming the nonidentifiability issue encountered by existing deep embedding methods. Furthermore, we show that the embedding vectors converge weakly to the desired posterior distribution in the 2-Wasserstein distance, addressing the potential bias issue experienced by existing embedding methods. This work lays the first theoretical foundation for network embedding within the framework of missing-data imputation.
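The imputation scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it replaces the paper's sparse deep decoder with a toy bilinear decoder sigmoid(z_i' W z_j), and the adaptive stochastic gradient MCMC with plain stochastic gradient Langevin dynamics (SGLD) at a constant step size. It only shows the core idea of jointly imputing the embedding vectors Z (treated as missing data) and training the decoder parameters W against the observed adjacency matrix; all variable names and settings below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Clip logits so np.exp never overflows.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

# Toy network: two communities, dense links within each (assumed data).
n, d = 20, 2
groups = np.repeat([0, 1], n // 2)
A = (groups[:, None] == groups[None, :]).astype(float)
np.fill_diagonal(A, 0.0)

Z = rng.normal(scale=0.1, size=(n, d))  # embeddings, treated as missing data
W = rng.normal(scale=0.1, size=(d, d))  # toy decoder parameters

eps = 0.01  # constant SGLD step size (the paper uses an adaptive schedule)
for step in range(3000):
    P = sigmoid(Z @ W @ Z.T)   # decoded link probabilities
    R = A - P                  # gradient of the Bernoulli log-likelihood wrt logits
    # Gradients of the log-posterior, with N(0, I) priors on Z and W.
    grad_Z = R @ Z @ W.T + R.T @ Z @ W - Z
    grad_W = Z.T @ R @ Z - W
    # Langevin update: half-step along the gradient plus injected Gaussian
    # noise, jointly imputing Z and training the decoder W.
    Z += 0.5 * eps * grad_Z + np.sqrt(eps) * rng.normal(size=Z.shape)
    W += 0.5 * eps * grad_W + np.sqrt(eps) * rng.normal(size=W.shape)
```

After the loop, within-community link probabilities under the decoder should exceed between-community ones; the latent dimension d here stands in for the embedding dimension that the paper selects automatically through the sparse decoder.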
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Algorithms
/
Markov Chains
Limits:
Humans
Language:
En
Journal:
Neural Netw
Journal subject:
Neurology
Year:
2024
Document type:
Article
Affiliation country:
United States of America
Publication country:
United States of America