Reconstructing controllable faces from brain activity with hierarchical multiview representations.

Ren, Ziqi; Li, Jie; Xue, Xuetong; Li, Xin; Yang, Fan; Jiao, Zhicheng; Gao, Xinbo

Affiliation

Ren Z; School of Electronic Engineering, Xidian University, Xi'an 710071, China.
Li J; School of Electronic Engineering, Xidian University, Xi'an 710071, China.
Xue X; School of Electronic Engineering, Xidian University, Xi'an 710071, China.
Li X; Group 42 (G42), Abu Dhabi, United Arab Emirates.
Yang F; Group 42 (G42), Abu Dhabi, United Arab Emirates.
Jiao Z; The Warren Alpert Medical School, Brown University, RI, USA; Department of Diagnostic Imaging, Rhode Island Hospital, RI, USA.
Gao X; School of Electronic Engineering, Xidian University, Xi'an 710071, China. Electronic address: xbgao@mail.xidian.edu.cn.

Neural Netw; 166: 487-500, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37574622

ABSTRACT

Reconstructing visual experience from brain responses measured by functional magnetic resonance imaging (fMRI) is a challenging yet important research topic in brain decoding, and decoding visually similar stimuli such as faces has proved especially difficult. Although facial attributes are known to be key to face recognition, most existing methods ignore how to decode them precisely when reconstructing perceived faces, which often yields reconstructed faces that are hard to tell apart. To address this problem, we propose a novel neural decoding framework called VSPnet (voxel2style2pixel), which builds hierarchical encoding and decoding networks that use disentangled latent representations as an intermediary, so as to recover visual stimuli in finer detail. We also design a hierarchical visual encoder (HVE) that pre-extracts features containing both high-level semantic knowledge and low-level visual details from the stimuli; the HVE is composed of a standard feature pyramid over a ResNet backbone. VSPnet itself consists of two networks: a multi-branch cognitive encoder and a style-based image generator. The encoder is built from multiple linear regression branches that map brain signals into the latent space defined by the pre-extracted visual features, yielding representations whose hierarchical information is consistent with the corresponding stimuli. The generator, inspired by StyleGAN, untangles the complexity of the fMRI representations and generates images. Extensive experiments on the latest public datasets demonstrate that our method outperforms state-of-the-art approaches in reconstruction accuracy and greatly improves the identifiability of different reconstructed faces. In particular, we achieve editing of several facial attributes in the fMRI domain based on the multiview (i.e., visual stimuli and evoked fMRI) latent representations.
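The encoder side of such a pipeline can be illustrated with a minimal sketch. It assumes that every training stimulus has already been mapped to a stack of per-level latent vectors (standing in for the HVE features in a StyleGAN-like style space) and fits one independent linear map per level, mirroring the "multiple linear regression branches" described above. The ridge penalty `lam`, the mean-centering, the helper names `fit_branches`, `predict_latents` and `edit_latent`, and the idea of editing an attribute by shifting selected levels along a direction vector are illustrative assumptions, not the authors' implementation. The style-based generator that would turn the predicted codes into face images is omitted, and all data are synthetic.

```python
import numpy as np

# Hypothetical helpers: the ridge penalty, centering and attribute
# direction are assumptions made for illustration only.


def fit_branches(X, W, lam=1.0):
    """Fit one closed-form ridge regression per latent level (branch).

    X:   (n_samples, n_voxels) fMRI responses.
    W:   (n_samples, n_levels, latent_dim) target latent codes, assumed to be
         pre-extracted from the stimuli (a stand-in for HVE features).
    lam: ridge penalty (assumed; the abstract only says "linear regression").
    Returns a list of (x_mean, w_mean, B) tuples, one per level.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    # The regularized Gram matrix depends only on the voxels,
    # so every branch can share it.
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    branches = []
    for k in range(W.shape[1]):
        Wk = W[:, k, :]
        w_mean = Wk.mean(axis=0)
        # B_k = (Xc^T Xc + lam I)^{-1} Xc^T (W_k - mean(W_k))
        B = np.linalg.solve(gram, Xc.T @ (Wk - w_mean))
        branches.append((x_mean, w_mean, B))
    return branches


def predict_latents(branches, X):
    """Map fMRI samples to stacked latent codes of shape (n, n_levels, dim)."""
    return np.stack([(X - xm) @ B + wm for xm, wm, B in branches], axis=1)


def edit_latent(w, direction, alpha, levels):
    """Shift the chosen latent levels along a (hypothetical) attribute direction."""
    edited = w.copy()
    unit = direction / np.linalg.norm(direction)
    edited[..., levels, :] += alpha * unit
    return edited


# Synthetic demo: 3 latent levels of size 8 decoded from 50 "voxels".
rng = np.random.default_rng(0)
n, v, k, dim = 200, 50, 3, 8
X = rng.standard_normal((n, v))
true_B = rng.standard_normal((k, v, dim))
W = np.einsum("nv,kvd->nkd", X, true_B)  # noiseless linear targets

branches = fit_branches(X, W, lam=1e-3)
W_hat = predict_latents(branches, X)

direction = rng.standard_normal(dim)
W_edit = edit_latent(W_hat, direction, alpha=2.0, levels=[0])  # first level only
```

Fitting each level separately keeps the latent levels decoupled, so an edit can target a single level (index 0 here) while leaving the others untouched; whether VSPnet's attribute editing works this way is not stated in the abstract.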

Subjects

Brain; Recognition, Psychology; Brain/diagnostic imaging; Brain/physiology; Brain Mapping/methods; Magnetic Resonance Imaging/methods; Multivariate Analysis

Keywords

Face reconstruction; Feature disentanglement; Hierarchical multiview representations; Neural decoding; StyleGAN; fMRI

Database: MEDLINE | Main subject: Brain / Recognition, Psychology | Study type: Prognostic_studies | Language: English | Year of publication: 2023 | Document type: Article