Results 1-8 of 8
1.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4273-4285, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34591772

ABSTRACT

Organizing the implicit topology of a document as a graph, and then performing feature extraction with a graph convolutional network (GCN), has proven effective in document analysis. However, existing document graphs are often restricted to expressing single-level relations, which are predefined and independent of downstream learning. To address this, a set of learnable hierarchical graphs is built to explore multilevel sentence relations, assisted by a hierarchical probabilistic topic model. Based on these graphs, multiple parallel GCNs extract multilevel semantic features, which are aggregated by an attention mechanism for different document-comprehension tasks. Equipped with variational inference, the graph construction and the GCNs are learned jointly, allowing the graphs to evolve dynamically to better match the downstream task. The effectiveness and efficiency of the proposed multilevel sentence relation graph convolutional network (MuserGCN) are demonstrated via experiments on document classification, abstractive summarization, and matching.
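
A minimal PyTorch sketch of the core mechanism, under stated assumptions: sentence embeddings `h` for a single document, edges scored by a learned bilinear form, and attention pooling over levels. The topic-model-assisted graph construction and the variational training of the actual MuserGCN are omitted, and the class names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableGraphGCN(nn.Module):
    """One level: a learnable sentence graph followed by one GCN layer."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, dim, bias=False)  # bilinear edge scoring
        self.gcn = nn.Linear(dim, dim)                # GCN weight matrix W

    def forward(self, h):                             # h: (num_sentences, dim)
        adj = torch.softmax(self.score(h) @ h.t(), dim=-1)  # learned adjacency A
        return F.relu(self.gcn(adj @ h))              # graph convolution: relu(A H W)

class MuserGCNSketch(nn.Module):
    """Parallel GCNs over multilevel graphs, fused by attention over levels."""
    def __init__(self, dim, num_levels=3):
        super().__init__()
        self.levels = nn.ModuleList(LearnableGraphGCN(dim) for _ in range(num_levels))
        self.attn = nn.Linear(dim, 1)

    def forward(self, h):                                     # h: (N, dim)
        feats = torch.stack([lvl(h) for lvl in self.levels])  # (L, N, dim)
        pooled = feats.mean(dim=1)                            # (L, dim) per-level summary
        w = torch.softmax(self.attn(pooled), dim=0)           # (L, 1) level attention
        return (w * pooled).sum(dim=0)                        # (dim,) document feature
```

Because the adjacency is produced by learnable parameters inside the forward pass, gradients from the downstream task reshape the graphs, which is the sense in which they "evolve dynamically."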

2.
IEEE Trans Neural Netw Learn Syst ; 34(9): 6003-6014, 2023 Sep.
Article in English | MEDLINE | ID: mdl-34919523

ABSTRACT

Zero-shot image recognition aims to classify data from unseen classes by exploring the association between visual features and the semantic representation of each class. Most existing approaches learn a shared single-scale embedding space (often at the output layer of the network) for both visual and semantic features, ignoring the fact that visual features at different scales exhibit different semantics. In this article, we propose a multi-scale visual-attribute co-attention (mVACA) model that considers both visual-semantic alignment and visual discrimination at multiple scales. At each scale, a hybrid visual attention is realized by attribute-related attention and visual self-attention. The attribute-related attention is guided by a pseudo attribute vector inferred via mutual information regularization (MIR), and the visual self-attentive features in turn influence the attribute attention to emphasize visually associated attributes. Leveraging multi-scale visual discrimination, mVACA unifies standard zero-shot learning (ZSL) and generalized ZSL in one framework, achieving state-of-the-art or competitive performance on several commonly used benchmarks under both setups. To better understand the interaction between images and attributes in mVACA, we also provide visualized analysis.
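
A hedged sketch of one scale of the hybrid attention in PyTorch, assuming region features `vis` and a per-image attribute vector `attr`; the MIR-inferred pseudo attribute vector and the paper's multi-scale fusion are not reproduced, and `AttributeCoAttention` is a hypothetical name.

```python
import torch
import torch.nn as nn

class AttributeCoAttention(nn.Module):
    """One scale of hybrid attention: visual self-attention followed by
    attribute-guided attention over spatial regions (simplified)."""
    def __init__(self, vis_dim, attr_dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(vis_dim, heads, batch_first=True)
        self.attr_proj = nn.Linear(attr_dim, vis_dim)  # attributes -> visual space

    def forward(self, vis, attr):
        # vis: (B, regions, vis_dim) features at one scale; attr: (B, attr_dim)
        vis, _ = self.self_attn(vis, vis, vis)              # visual self-attention
        q = self.attr_proj(attr).unsqueeze(1)               # (B, 1, vis_dim) query
        w = torch.softmax(q @ vis.transpose(1, 2), dim=-1)  # attribute-related attention
        return (w @ vis).squeeze(1)                         # (B, vis_dim) attended feature
```

Zero-shot classification would then score the attended feature against each candidate class's projected attribute vector, for seen or unseen classes alike.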

3.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2264-2281, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35324434

ABSTRACT

Conventional high-speed and spectral imaging systems are expensive, and they usually consume significant memory and bandwidth to store and transmit the high-dimensional data. By contrast, snapshot compressive imaging (SCI), where multiple sequential frames are coded by different masks and then summed into a single measurement, is a promising way to capture 3-dimensional scenes with a 2-dimensional camera. In this paper, we consider the reconstruction problem in SCI, i.e., recovering a series of scenes from a compressed measurement. Specifically, the measurement and the modulation masks are fed into our proposed network, dubbed BIdirectional Recurrent Neural networks with Adversarial Training (BIRNAT), to reconstruct the desired frames. BIRNAT employs a deep convolutional neural network with residual blocks and self-attention to reconstruct the first frame, based on which a bidirectional recurrent neural network sequentially reconstructs the following frames. Moreover, we build an extended BIRNAT-color algorithm for color videos, aiming at joint reconstruction and demosaicing. Extensive results on both video and spectral data, simulated and real, from three SCI cameras demonstrate the superior performance of BIRNAT.
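
The SCI measurement model itself is simple to state in code. Below is a small PyTorch sketch of the forward model plus a common mask-normalized initialization; it is not BIRNAT's learned reconstruction, just the setting it operates in.

```python
import torch

def sci_measure(frames, masks):
    """SCI forward model: each frame is modulated by its own mask,
    then all coded frames are summed into one 2-D snapshot."""
    # frames, masks: (T, H, W) -> measurement: (H, W)
    return (frames * masks).sum(dim=0)

def naive_init(measurement, masks, eps=1e-6):
    """Coarse per-frame initialization (not BIRNAT itself): normalize the
    snapshot by the mask sum, then re-modulate with each frame's mask."""
    norm = measurement / (masks.sum(dim=0) + eps)  # (H, W)
    return masks * norm                            # (T, H, W) rough estimates

# Example: 8 binary-coded 64x64 frames compressed into one measurement.
frames = torch.rand(8, 64, 64)
masks = (torch.rand(8, 64, 64) > 0.5).float()
y = sci_measure(frames, masks)
x0 = naive_init(y, masks)
```

BIRNAT replaces `naive_init` with a CNN for the first frame and a bidirectional RNN for the rest, but any reconstruction network consumes exactly the pair `(y, masks)` shown here.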

4.
IEEE Trans Neural Netw Learn Syst ; 33(2): 721-735, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33136546

ABSTRACT

Due to hardware limitations, it is challenging for sensors to acquire images of high resolution in both the spatial and spectral domains, which has motivated a line of work that fuses a low-resolution hyperspectral image (LR-HSI) and a high-resolution multispectral image (HR-MSI) into an HR-HSI in an unsupervised manner. Since most existing methods are restricted to linear spectral unmixing, we propose a nonlinear variational probabilistic generative model (NVPGM) for the unsupervised fusion task based on nonlinear unmixing. We model the joint full likelihood of the observed pixels in an LR-HSI and an HR-MSI, both of which are assumed to be generated from the corresponding latent representations, i.e., the abundance vectors. The sufficient statistics of the generative conditional distributions are nonlinear functions of the latent variable, realized by neural networks, which yields a nonlinear spectral mixture model. For scalability and efficiency, we construct two recognition models, also parameterized by neural networks, to infer the latent representations. The latent representations are inferred and the parameters optimized simultaneously using stochastic gradient variational inference, after which the target HR-HSI is retrieved via a feedforward mapping. Although no supervised information about the HR-HSI is available, NVPGM can still be pretrained unsupervisedly on extra LR-HSI and HR-MSI data sets and then process images in real time at the test phase. Experiments on three commonly used data sets illustrate the effectiveness and efficiency of NVPGM, showing that it outperforms existing methods on the unsupervised LR-HSI and HR-MSI fusion task.
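
A minimal sketch of the two ingredients in PyTorch, assuming Gaussian latent abundances: a recognition network for inference and a neural decoder as the nonlinear mixing function. All layer sizes and names here are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Recognition(nn.Module):
    """Recognition model: infers Gaussian parameters of a pixel's
    latent abundance vector from the observed spectrum."""
    def __init__(self, n_bands, n_abund, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_bands, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n_abund)
        self.logvar = nn.Linear(hidden, n_abund)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

class NonlinearMixing(nn.Module):
    """Generative model: a neural net maps abundances to a spectrum,
    replacing the linear mixture 'abundances @ endmembers'."""
    def __init__(self, n_abund, n_bands, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_abund, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bands), nn.Softplus())  # nonnegative radiance

    def forward(self, z):
        return self.net(z)

# Reparameterized sampling, as in stochastic gradient variational inference:
enc, dec = Recognition(n_bands=31, n_abund=8), NonlinearMixing(8, 31)
x = torch.rand(16, 31)                                # a batch of HSI pixels
mu, logvar = enc(x)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # differentiable sample
recon = dec(z)
```

Because inference is a feedforward pass through the recognition network, test-time processing is real-time, as the abstract notes.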

5.
IEEE Trans Cybern ; 52(5): 3936-3946, 2022 May.
Article in English | MEDLINE | ID: mdl-32991299

ABSTRACT

In this article, considering supervised dimensionality reduction, we first propose a model called infinite Bayesian max-margin linear discriminant projection (iMMLDP), which assembles a set of local regions and uses Bayesian nonparametric priors to handle model selection, for example, the underlying number of local regions. In each local region, our model jointly learns a discriminative subspace and the corresponding classifier. Under this framework, iMMLDP combines dimensionality reduction, clustering, and classification in a principled way. Moreover, to deal with more complex data, for example, locally nonlinearly separable structure, we extend the linear projection to the nonlinear case via the kernel trick and develop an infinite kernel max-margin discriminant projection (iKMMDP) model. Thanks to conjugacy, the parameters of both models can be inferred efficiently via a Gibbs sampler. Finally, we apply our models to synthetic and real-world data, including multimodally distributed data sets and measured radar image data, to validate their efficiency and effectiveness.
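
As a rough illustration only: one local region of such a model pairs a projection with a max-margin classifier. The sketch below fixes a single region and trains by gradient descent on a multiclass hinge loss, whereas iMMLDP itself infers the number of regions nonparametrically and samples all parameters via Gibbs sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalMaxMarginProjection(nn.Module):
    """One local region: a discriminant projection plus a max-margin
    classifier, learned jointly (a single-region, gradient-trained stand-in)."""
    def __init__(self, in_dim, proj_dim, n_classes):
        super().__init__()
        self.proj = nn.Linear(in_dim, proj_dim, bias=False)  # discriminative subspace
        self.clf = nn.Linear(proj_dim, n_classes)            # region-local classifier

    def forward(self, x):
        return self.clf(self.proj(x))

def multiclass_hinge(scores, y, margin=1.0):
    """Crammer-Singer style max-margin loss."""
    true = scores.gather(1, y.unsqueeze(1))          # (B, 1) true-class scores
    margins = (scores - true + margin).clamp(min=0)  # (B, C) margin violations
    mask = F.one_hot(y, scores.size(1)).bool()
    return margins.masked_fill(mask, 0).max(dim=1).values.mean()
```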

6.
IEEE Trans Cybern ; 52(10): 11156-11171, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33909580

ABSTRACT

For multimodal representation learning, traditional black-box approaches often fall short of extracting interpretable multilayer hidden structures, which help visualize the connections between different modalities at multiple semantic levels. To extract interpretable multimodal latent representations and visualize the hierarchical semantic relationships between modalities, we develop, based on deep topic models, a novel multimodal Poisson gamma belief network (mPGBN) that tightly couples the observations of different modalities by imposing sparse connections between their modality-specific hidden layers. To avoid the time-consuming Gibbs sampler that traditional topic models rely on at test time, we construct a Weibull-based variational inference network (encoder) that directly maps observations to their latent representations, and combine it with the mPGBN (decoder), resulting in a novel multimodal Weibull variational autoencoder (MWVAE), which is fast in out-of-sample prediction and can handle large-scale multimodal datasets. Qualitative evaluations on bimodal data consisting of image-text pairs show that MWVAE extracts expressive multimodal latent representations for downstream tasks such as missing-modality imputation and multimodal retrieval. Extensive quantitative results further demonstrate that both MWVAE and its supervised extension sMWVAE achieve state-of-the-art performance on various multimodal benchmarks.
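
The Weibull choice is what makes the encoder reparameterizable while keeping latent representations nonnegative, as topic-model latents must be. A minimal sketch of that piece, using the standard inverse-CDF reparameterization; the mPGBN decoder and the multimodal coupling are omitted, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weibull_rsample(k, lam):
    """Reparameterized Weibull draw via the inverse CDF,
    lam * (-log(1 - u))**(1/k), so sampling stays differentiable
    and the latent representation stays nonnegative."""
    u = torch.rand_like(k)
    return lam * (-torch.log1p(-u)) ** (1.0 / k)

class WeibullEncoder(nn.Module):
    """Inference network: maps an observation to the Weibull shape and
    scale of its latent representation, then draws a sample."""
    def __init__(self, in_dim, latent_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.shape = nn.Linear(hidden, latent_dim)
        self.scale = nn.Linear(hidden, latent_dim)

    def forward(self, x):
        h = self.body(x)
        k = F.softplus(self.shape(h)) + 1e-3     # positive shape parameter
        lam = F.softplus(self.scale(h)) + 1e-3   # positive scale parameter
        return weibull_rsample(k, lam)
```

A single forward pass through such an encoder replaces the per-document Gibbs sweeps at test time, which is where the speedup in out-of-sample prediction comes from.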


Subjects
Machine Learning, Theoretical Models, Learning
7.
Article in English | MEDLINE | ID: mdl-36256717

ABSTRACT

Text generation is a key component of many natural language tasks. Motivated by the success of generative adversarial networks (GANs) for image generation, many text-specific GANs have been proposed. However, due to the discrete nature of text, these text GANs often use reinforcement learning (RL) or continuous relaxations to calculate gradients during learning, leading to high-variance or biased gradient estimates. Furthermore, existing text GANs often suffer from mode collapse (i.e., they have limited generative diversity). To tackle these problems, we propose a new text GAN model named text feature GAN (TFGAN), where adversarial learning is performed in a continuous text-feature space. In the adversarial game, GPT2 provides the "true" features, while the generator of TFGAN learns from them. TFGAN is trained by maximum likelihood estimation in text space and adversarial learning in text-feature space, effectively combining the two into a single objective while alleviating mode collapse. TFGAN achieves appealing performance in text generation tasks, and it can also be used as a flexible framework for learning text representations.
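
A hedged sketch of the feature-space adversarial game in PyTorch: because both "real" and generated samples are continuous vectors, ordinary backpropagation applies and no RL or relaxation is needed. Here `real_feats` stand in for features from a pretrained LM (GPT2 in the paper), `disc` is any module mapping features to a logit, and the maximum likelihood term on text space is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGenerator(nn.Module):
    """Maps noise into the continuous text-feature space where the
    adversarial game is played (no discrete tokens involved)."""
    def __init__(self, noise_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))

    def forward(self, z):
        return self.net(z)

def adversarial_step(gen, disc, real_feats, noise_dim):
    """One GAN step in feature space: discriminator separates real from
    generated features; generator tries to fool it."""
    z = torch.randn(real_feats.size(0), noise_dim)
    fake = gen(z)
    ones, zeros = torch.ones(len(fake), 1), torch.zeros(len(fake), 1)
    d_loss = (F.binary_cross_entropy_with_logits(disc(real_feats), ones)
              + F.binary_cross_entropy_with_logits(disc(fake.detach()), zeros))
    g_loss = F.binary_cross_entropy_with_logits(disc(fake), ones)
    return d_loss, g_loss

# Illustrative usage with hypothetical dimensions:
gen, disc = FeatureGenerator(64, 768), nn.Linear(768, 1)
d_loss, g_loss = adversarial_step(gen, disc, torch.randn(32, 768), noise_dim=64)
```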

8.
IEEE Trans Cybern ; 49(7): 2454-2466, 2019 Jul.
Article in English | MEDLINE | ID: mdl-29993594

ABSTRACT

In this paper, a unified Bayesian max-margin discriminant projection framework is proposed that jointly learns the discriminant feature space and the max-margin classifier under different relationships between the latent representations and the observations. We assume that the latent representation follows a normal distribution whose sufficient statistics are functions of the observations. These functions can be flexibly realized through either shallow or deep structures. The shallow structures include linear and nonlinear kernel-based functions, and even a convolutional projection, which can be further trained layer-wise to build a multilayered convolutional feature-learning model. To take advantage of deep neural networks, especially their high expressiveness and efficient parameter learning, we integrate Bayesian modeling with popular neural networks, for example, the multilayer perceptron and the convolutional neural network, to build an end-to-end Bayesian deep discriminant projection under the proposed framework, which degenerates to the existing shallow linear or convolutional projection in the single-layer case. Moreover, efficient scalable inference for the realizations with different functions is derived to handle large-scale data via stochastic gradient Markov chain Monte Carlo. Finally, we demonstrate the effectiveness and efficiency of the proposed models through experiments on real-world data, including four image benchmarks (MNIST, CIFAR-10, STL-10, and SVHN) and a measured radar high-resolution range profile dataset, with detailed analysis of the parameters and computational complexity.
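
Stochastic gradient Langevin dynamics is the simplest instance of SG-MCMC, and a single update is short enough to sketch. This is a generic illustration of the inference machinery, not the paper's exact sampler.

```python
import torch

def sgld_step(params, lr):
    """One stochastic gradient Langevin dynamics update: ascend the minibatch
    log-posterior gradient and inject Gaussian noise with variance equal to
    the step size. Assumes p.grad holds the log-posterior gradient (flip the
    sign if it holds a loss gradient instead)."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                noise = torch.randn_like(p) * lr ** 0.5
                p.add_(0.5 * lr * p.grad + noise)
```

The injected noise is what turns an otherwise ordinary SGD trajectory into approximate posterior sampling, letting the same code scale Bayesian inference to minibatched large-scale data.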
