Results 1 - 19 of 19
1.
IEEE Trans Cybern ; 54(3): 1353-1365, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37262118

ABSTRACT

Continually capturing novel concepts without forgetting is one of the most critical functions sought in artificial intelligence systems. However, even the most advanced deep learning networks are prone to quickly forgetting previously learned knowledge after training with new data. The proposed lifelong dual generative adversarial networks (LD-GANs) consist of two generative adversarial networks (GANs), namely a Teacher and an Assistant, teaching each other in tandem while successively learning a series of tasks. A single discriminator decides the realism of the images generated by the dual GANs. A new training algorithm, called lifelong self-knowledge distillation (LSKD), is proposed for training the LD-GAN while learning each new task during lifelong learning (LLL). LSKD enables the transfer of knowledge from the more knowledgeable player to the other, jointly with learning the information from a newly given dataset, within an adversarial game setting. In contrast to other LLL models, LD-GANs are memory efficient and do not require freezing any parameters after learning each given task. Furthermore, we extend the LD-GANs to serve as the Teacher module in a Teacher-Student network for assimilating data representations across several domains during LLL. Experimental results indicate a better performance for the proposed framework in unsupervised lifelong representation learning when compared to other methods.
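The mutual distillation between the two players can be illustrated with a minimal sketch (pure Python; the temperature and logit values are hypothetical, and the actual LSKD objective operates within the GANs' adversarial game rather than on toy class logits like these):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The more knowledgeable player provides the target distribution;
    the other player is trained to match it.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical players incur zero loss; diverging players a positive loss.
same = distillation_loss([1.0, 2.0], [1.0, 2.0])
diff = distillation_loss([1.0, 2.0], [2.0, 1.0])
```

In LSKD the roles of teacher and student alternate between the two players as tasks arrive, so each side of this loss is applied in both directions over training.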

2.
Article in English | MEDLINE | ID: mdl-37410645

ABSTRACT

Lifelong learning describes an ability that enables humans to continually acquire and learn new information without forgetting. This capability, common to humans and animals, has lately been identified as an essential function for an artificial intelligence system aiming to learn continuously from a stream of data during a certain period of time. However, modern neural networks suffer from degraded performance when learning multiple domains sequentially and fail to recognize past learned tasks after being retrained. This corresponds to catastrophic forgetting and is ultimately induced by replacing the parameters associated with previously learned tasks with new values. One approach in lifelong learning is the generative replay mechanism (GRM), which trains a powerful generator as the generative replay network, implemented by a variational autoencoder (VAE) or a generative adversarial network (GAN). In this article, we study the forgetting behavior of GRM-based learning systems by developing a new theoretical framework in which the forgetting process is expressed as an increase in the model's risk during training. Although many recent attempts have provided high-quality generative replay samples by using GANs, they are limited mainly to downstream tasks because they lack an inference mechanism. Inspired by the theoretical analysis, and aiming to address the drawbacks of existing approaches, we propose the lifelong generative adversarial autoencoder (LGAA). LGAA consists of a generative replay network and three inference models, each addressing the inference of a different type of latent variable. The experimental results show that LGAA learns novel visual concepts without forgetting and can be applied to a wide range of downstream tasks.
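The core of generative replay, interleaving generator samples with new-task data so that previously learned knowledge keeps being rehearsed, can be sketched as follows. Here `generator` is a stand-in for the trained replay network, and the 50/50 replay fraction is an illustrative choice rather than anything prescribed by the paper:

```python
import random

def make_replay_batch(new_data, generator, batch_size, replay_fraction=0.5):
    """Assemble a training batch mixing fresh samples with generated replays.

    `generator` stands in for the trained generative replay network: each
    call returns one pseudo-sample resembling previously learned tasks.
    """
    n_replay = int(batch_size * replay_fraction)
    replayed = [generator() for _ in range(n_replay)]
    fresh = random.sample(new_data, batch_size - n_replay)
    batch = replayed + fresh
    random.shuffle(batch)
    return batch

random.seed(0)
# A dummy generator that always emits the marker "replay".
batch = make_replay_batch(list(range(100)), lambda: "replay", batch_size=8)
```

Training on such mixed batches is what keeps the model's risk on earlier tasks from growing, which is exactly the quantity the paper's theoretical framework tracks.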

3.
Neural Netw ; 164: 245-263, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37163844

ABSTRACT

Content-based image retrieval (CBIR) aims to provide the images most similar to a given query. Feature extraction plays an essential role in retrieval performance within a CBIR pipeline. Current CBIR studies either uniformly extract feature information from the input image and use it directly, or employ a trainable spatial weighting module which is then used for similarity comparison between pairs of query and candidate matching images. These spatial weighting modules are normally not query-sensitive and rely only on the knowledge learned during the training stage. They may focus on incorrect regions, especially when the target image is not salient or is surrounded by distractors. This paper proposes an efficient query-sensitive co-attention mechanism for large-scale CBIR tasks. To reduce the extra computation cost that query sensitivity adds to the co-attention mechanism, the proposed method employs clustering of the selected local features. Experimental results indicate that the co-attention maps can provide the best retrieval results on benchmark datasets under challenging situations, such as completely different image acquisition conditions between the query and its matching image.


Subjects
Information Storage and Retrieval; Pattern Recognition, Automated; Pattern Recognition, Automated/methods; Diagnostic Imaging; Learning; Cluster Analysis
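The clustering step that keeps query sensitivity affordable can be illustrated with a toy k-means over local descriptors: comparing the query against a few centroids instead of every local feature. The deterministic initialization and 2-D features below are simplifications for illustration, not the paper's actual pipeline:

```python
def kmeans(points, k, iters=20):
    """Plain k-means on 2-D descriptors; the centroids stand in for the
    clustered local features that the co-attention then compares against."""
    centroids = points[:k]  # deterministic init from the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: (p[0] - centroids[i][0]) ** 2
                                + (p[1] - centroids[i][1]) ** 2)
            clusters[j].append(p)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Two well-separated groups of local features collapse to two centroids,
# so similarity is computed against 2 vectors instead of 6.
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers = kmeans(feats, k=2)
```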
4.
Cognition ; 231: 105319, 2023 02.
Article in English | MEDLINE | ID: mdl-36399902

ABSTRACT

Humans can effortlessly assess the complexity of the visual stimuli they encounter. However, our understanding of how we do this, and of the factors that shape our perception of scene complexity, remains unclear; especially for the natural scenes in which we are constantly immersed. We introduce several new datasets to further our understanding of human perception of scene complexity. Our first dataset (VISC-C) contains 800 scenes and 800 corresponding two-dimensional complexity annotations gathered from human observers, allowing exploration of how complexity perception varies across a scene. Our second dataset (VISC-CI) consists of inverted scenes (reflection on the horizontal axis) with corresponding complexity maps collected from human observers. Inverting images in this fashion is associated with destruction of semantic scene characteristics when viewed by humans, and hence allows analysis of the impact of semantics on perceptual complexity. We analysed perceptual complexity from both a single-score and a two-dimensional perspective, by evaluating a set of calculable and observable perceptual features grounded in psychological research (clutter, symmetry, entropy and openness). We considered these factors' relationship to complexity via hierarchical regression analyses, tested the efficacy of various neural models against our datasets, and validated our perceptual features against a large and varied complexity dataset consisting of nearly 5000 images. Our results indicate that both global image properties and semantic features are important for complexity perception. We further verified this by combining the identified perceptual features with the output of a neural network predictor capable of extracting semantics, and found that we could explain more of the human variance in complexity than with low-level measures alone. Finally, we dissect our best-performing prediction network, determining that artificial neurons learn to extract both global image properties and semantic details from scenes for complexity prediction. Based on our experimental results, we propose the "dual information" framework of complexity perception, hypothesising that humans rely on both low-level image features and high-level semantic content to evaluate the complexity of images.


Subjects
Learning; Visual Perception; Humans; Visual Perception/physiology; Semantics
5.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5731-5748, 2023 May.
Article in English | MEDLINE | ID: mdl-36355745

ABSTRACT

Lifelong learning (LLL) represents the ability of an artificial intelligence system to successively learn a sequence of different databases. In this paper we introduce the Dynamic Self-Supervised Teacher-Student Network (D-TS), representing a more general LLL framework, where the Teacher is implemented as a dynamically expanding mixture model which automatically increases its capacity to deal with a growing number of tasks. We propose the Knowledge Discrepancy Score (KDS) criterion for measuring the relevance of the incoming information characterizing a new task when compared to the existing knowledge accumulated by the Teacher module from its previous training. The KDS ensures a light Teacher architecture while also enabling the reuse of learned knowledge whenever appropriate, accelerating the learning of the given tasks. The Student module is implemented as a lightweight probabilistic generative model. We introduce a novel self-supervised learning procedure for the Student that allows it to capture cross-domain latent representations from the entire knowledge accumulated by the Teacher as well as from novel data. We perform several experiments which show that D-TS achieves state-of-the-art results in LLL while requiring fewer parameters than other methods.
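The expand-or-reuse decision described above can be sketched minimally, with the Teacher's accumulated knowledge reduced to 1-D Gaussian components and a hypothetical log-likelihood-based score standing in for the actual KDS (the threshold value is also illustrative):

```python
import math

def gaussian_loglik(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def knowledge_discrepancy(new_samples, components):
    """Hypothetical stand-in for the KDS: average negative log-likelihood of
    new-task samples under the best existing Teacher component. A high value
    means the accumulated knowledge explains the new task poorly."""
    total = 0.0
    for x in new_samples:
        best = max(gaussian_loglik(x, m, s) for m, s in components)
        total += -best
    return total / len(new_samples)

def maybe_expand(components, new_samples, threshold=5.0):
    """Add a new component only when the discrepancy exceeds the threshold,
    keeping the Teacher architecture light and reusing knowledge otherwise."""
    if knowledge_discrepancy(new_samples, components) > threshold:
        mean = sum(new_samples) / len(new_samples)
        components = components + [(mean, 1.0)]
    return components

teacher = [(0.0, 1.0)]                               # one learned component
teacher = maybe_expand(teacher, [0.1, -0.2, 0.05])   # similar task: reuse
teacher = maybe_expand(teacher, [10.0, 10.2, 9.9])   # novel task: expand
```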

6.
IEEE Trans Neural Netw Learn Syst ; 34(1): 461-474, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34370670

ABSTRACT

In this article, we propose an end-to-end lifelong learning mixture of experts. Each expert is implemented by a variational autoencoder (VAE). The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds (MELBO) on the log-likelihood of the given training samples. The mixing coefficients in the mixture model control the contributions of each expert in the global representation. These are sampled from a Dirichlet distribution whose parameters are determined through nonparametric estimation during lifelong learning. The model can learn new tasks quickly when they are similar to those previously learned. The proposed lifelong mixture of VAEs (L-MVAE) expands its architecture with new components when learning a completely new task. After training, our model can automatically determine the relevant expert to be used when fed with new data samples. This mechanism benefits both memory efficiency and computational cost, as only one expert is used during inference. The L-MVAE inference model is able to perform interpolations in the joint latent space across the data domains associated with different tasks and is shown to be efficient for disentangled representation learning.
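Expert selection at inference time can be sketched as routing each sample to the component that explains it best. Here each VAE expert is reduced to a 1-D Gaussian and its ELBO to a plain log-likelihood, purely for illustration:

```python
import math

def expert_loglik(x, mean, std):
    """Per-expert score; a stand-in for each VAE expert's ELBO."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def select_expert(x, experts):
    """Route a sample to the single most relevant expert, so only one
    expert needs to run at inference time."""
    return max(range(len(experts)),
               key=lambda i: expert_loglik(x, *experts[i]))

experts = [(0.0, 1.0), (8.0, 1.0)]   # two experts, one per learned task
idx_a = select_expert(0.3, experts)  # routed to the first expert
idx_b = select_expert(7.5, experts)  # routed to the second expert
```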

7.
IEEE Trans Image Process ; 31: 4966-4979, 2022.
Article in English | MEDLINE | ID: mdl-35853053

ABSTRACT

A rich video data representation can be realized by means of spatio-temporal frequency analysis. In this research study we show that a video can be disentangled, following the learning of video characteristics according to their spatio-temporal properties, into two complementary information components, dubbed Busy and Quiet. The Busy information characterizes the boundaries of moving regions, moving objects, or regions of change in movement. Meanwhile, the Quiet information encodes global smooth spatio-temporal structures defined by substantial redundancy. We design a trainable Motion Band-Pass Module (MBPM) for separating the Busy and Quiet information in raw video data. We model a Busy-Quiet Net (BQN) by embedding the MBPM into a two-pathway CNN architecture. The efficiency of BQN is achieved by avoiding redundancy in the feature spaces defined by the two pathways: while one pathway processes the Busy features, the other processes the Quiet features at lower spatio-temporal resolutions, reducing both memory and computational costs. Through experiments we show that the proposed MBPM can be used as a plug-in module in various CNN backbone architectures, significantly boosting their performance. The proposed BQN is shown to outperform many recent video models on the Something-Something V1, Kinetics400, UCF101 and HMDB51 datasets.
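The Busy/Quiet decomposition can be illustrated on a single pixel's temporal signal: a low-pass filter yields the Quiet component and the residual is the Busy component, so the two reconstruct the input exactly. This is a hand-rolled fixed filter for intuition only, not the trainable MBPM:

```python
def moving_average(signal, radius=1):
    """Temporal low-pass: average each frame value with its neighbours."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def busy_quiet_split(signal):
    """Split a per-pixel temporal signal into Quiet (smooth, low-pass)
    and Busy (residual change) components; summed, they reconstruct
    the input exactly."""
    quiet = moving_average(signal)
    busy = [x - q for x, q in zip(signal, quiet)]
    return busy, quiet

frames = [0.0, 0.0, 1.0, 1.0, 0.0]   # a brief change at one pixel
busy, quiet = busy_quiet_split(frames)
recon = [b + q for b, q in zip(busy, quiet)]
```

Because the split is lossless, the Quiet pathway can safely operate at reduced resolution while the Busy residual preserves the motion boundaries.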

8.
Sci Rep ; 12(1): 1583, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35091559

ABSTRACT

Visual memory schemas (VMS) are two-dimensional memorability maps that capture the most memorable regions of a given scene, predicting human observers' memory for the same images with a high degree of consistency. These maps are hypothesized to correlate with a mental framework of knowledge employed by humans to encode visual memories. In this study, we develop a generative model we term 'MEMGAN', constrained by extracted visual memory schemas, that generates completely new complex scene images varying in their degree of predicted memorability. The generated populations of high- and low-memorability images are then evaluated for their memorability in a human observer experiment. We gather VMS maps for these generated images from participants in the memory experiment and compare them with the intended target VMS maps. Following the evaluation of observers' memory performance through both VMS-defined memorability and hit rate, we find significantly superior memory performance by human observers for the highly memorable generated images compared to the poorly memorable ones. Implementing and testing a construct from cognitive science allows us to generate images whose memorability we can manipulate at will, as well as providing a tool for further study of mental schemas in humans.


Subjects
Photic Stimulation
9.
IEEE Trans Neural Netw Learn Syst ; 33(10): 5789-5803, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33872161

ABSTRACT

Variational autoencoders (VAEs) are one of the most popular unsupervised generative models that rely on learning latent representations of data. In this article, we extend the classical concept of Gaussian mixtures into the deep variational framework by proposing a mixture of VAEs (MVAE). Each component in the MVAE model is implemented by a variational encoder and has an associated subdecoder. The separation between the latent spaces modeled by different encoders is enforced using the d-variable Hilbert-Schmidt independence criterion (dHSIC), so that each component captures different variational features of the data. We also propose a mechanism for finding the appropriate number of VAE components for a given task, leading to an optimal architecture. The differentiable categorical Gumbel-softmax distribution is used in order to generate dropout masking parameters within the end-to-end backpropagation training framework. Extensive experiments show that the proposed MVAE model can learn a rich latent data representation and is able to discover additional underlying data representation factors.
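The dHSIC generalizes the two-variable HSIC to d variables; the empirical two-variable case with Gaussian kernels can be sketched as follows (the kernel width `sigma` and the biased trace estimator are illustrative choices):

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Empirical HSIC with Gaussian kernels: near zero for independent
    samples, larger when x and y are statistically dependent."""
    n = len(x)
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    Y = np.asarray(y, dtype=float).reshape(-1, 1)
    K = np.exp(-(X - X.T) ** 2 / (2 * sigma ** 2))   # kernel matrix on x
    L = np.exp(-(Y - Y.T) ** 2 / (2 * sigma ** 2))   # kernel matrix on y
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
a = rng.normal(size=200)
dependent = hsic(a, a)                     # y is a copy of x
independent = hsic(a, rng.normal(size=200))
```

Minimizing such a dependence measure between the latent codes of different encoders is what pushes their latent spaces apart.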

10.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6280-6296, 2022 10.
Article in English | MEDLINE | ID: mdl-34170822

ABSTRACT

A unique cognitive capability of humans consists in their ability to acquire new knowledge and skills from a sequence of experiences. Meanwhile, artificial intelligence systems are good at learning only the last given task, without being able to remember the databases learnt in the past. We propose a novel lifelong learning methodology employing a Teacher-Student network framework. While the Student module is trained with a new given database, the Teacher module reminds the Student of the information learnt in the past. The Teacher, implemented by a Generative Adversarial Network (GAN), is trained to preserve and replay past knowledge corresponding to the probabilistic representations of previously learnt databases. Meanwhile, the Student module is implemented by a Variational Autoencoder (VAE) which infers its latent variable representation from both the output of the Teacher module and the newly available database. Moreover, the Student module is trained to capture both continuous and discrete underlying data representations across different domains. The proposed lifelong learning framework is applied in supervised, semi-supervised and unsupervised training.


Subjects
Algorithms; Artificial Intelligence; Education, Continuing; Humans; Learning; Students
11.
IEEE Trans Pattern Anal Mach Intell ; 42(9): 2165-2178, 2020 09.
Article in English | MEDLINE | ID: mdl-31056491

ABSTRACT

Memorability of an image is a characteristic determined by human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independently of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS), referring to an organization of image components that human observers share when encoding and recognizing images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for both correctly and incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.


Subjects
Deep Learning; Image Processing, Computer-Assisted/methods; Memory/physiology; Models, Neurological; Visual Perception/physiology; Adult; Aged; Aged, 80 and over; Algorithms; Artificial Intelligence; Fixation, Ocular/physiology; Humans; Middle Aged; Young Adult
12.
IEEE Trans Cybern ; 50(5): 1989-2001, 2020 May.
Article in English | MEDLINE | ID: mdl-30571650

ABSTRACT

While 3-D steganography and digital watermarking represent methods for embedding information into 3-D objects, 3-D steganalysis aims to find the hidden information. Previous research studies have shown that by estimating the parameters modeling the statistics of 3-D features and feeding them into a classifier, we can identify whether a 3-D object carries secret information. For training the steganalyzer, such features are extracted from cover and stego pairs, representing the original 3-D objects and those carrying hidden information. However, in practical applications, the steganalyzer would have to distinguish stego-objects from cover-objects that most likely were not used during training. This poses a significant challenge for existing steganalyzers, known as the cover source mismatch (CSM) problem, which stems from their limited generalization ability. This paper proposes a novel feature selection algorithm taking into account both feature robustness and relevance in order to mitigate the CSM problem in 3-D steganalysis. In the context of the proposed methodology, new shapes are generated by distorting those used in the training. Then a subset of features is selected from a larger given set by assessing their effectiveness in separating cover-objects from stego-objects among the generated sets of objects. Two different measures are used for selecting the appropriate features: 1) the Pearson correlation coefficient and 2) the mutual information criterion.
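The Pearson-correlation half of the selection criterion can be sketched directly: score each feature by its absolute correlation with the cover/stego label and keep the top k. The robustness assessment on distorted shapes is omitted here, and the toy feature columns are illustrative:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between a feature and the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def select_features(feature_columns, labels, k):
    """Keep the k features most correlated (in absolute value) with the
    cover/stego label."""
    scores = [abs(pearson(col, labels)) for col in feature_columns]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

labels = [0, 0, 0, 1, 1, 1]                   # cover vs. stego
informative = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8]  # tracks the label
noisy = [0.5, 0.4, 0.6, 0.5, 0.4, 0.6]        # uncorrelated with it
picked = select_features([informative, noisy], labels, k=1)
```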

13.
IEEE Trans Image Process ; 22(5): 1822-35, 2013 May.
Article in English | MEDLINE | ID: mdl-23288337

ABSTRACT

This paper proposes a new approach to 3D watermarking that ensures the optimal preservation of mesh surfaces. A new 3D surface preservation function metric is defined, combining the distance from a vertex displaced by watermarking to the original surface, its distance to the watermarked object surface, and the actual vertex displacement. The proposed method is statistical, blind, and robust. Minimal surface distortion according to the proposed function metric is enforced during the statistical watermark embedding stage using the Levenberg-Marquardt optimization method. A study of the crypto-security of the watermark code is provided for the proposed methodology. According to the experimental results, the proposed methodology has high robustness against common mesh attacks while preserving the original object surface during watermarking.

14.
IEEE Trans Image Process ; 20(10): 2813-26, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21507773

ABSTRACT

This paper describes a new statistical approach for watermarking mesh representations of 3-D graphical objects. A robust digital watermarking method has to balance the requirements of watermark invisibility, robustness, embedding capacity and key security. The proposed method employs a mesh propagation distance metric procedure called the fast marching method (FMM), which defines regions of equal geodesic distance width calculated with respect to a reference location on the mesh. Each of these regions is used for embedding a single bit. The embedding is performed by changing the normalized distribution of local geodesic distances within each region. Two different embedding methods are used, changing either the mean or the variance of the geodesic distance distributions. Geodesic distances are slightly modified statistically by displacing the vertices within their existing triangle planes. The vertex displacements, performed according to the FMM, ensure a minimal surface distortion while embedding the watermark code. Robustness to a variety of attacks is shown in the experimental results.
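The mean-shifting variant of the embedding can be illustrated on one region's normalized geodesic distances: nudge the region mean up for a 1 bit and down for a 0 bit, then detect by comparing the mean against a reference. The `strength` value and the hard comparison are illustrative simplifications of the statistical embedding:

```python
def embed_bit(distances, bit, strength=0.05):
    """Embed one bit in a region by shifting its normalized geodesic
    distances: mean up for 1, mean down for 0. A toy stand-in for the
    per-region statistical embedding; `strength` is illustrative."""
    delta = strength if bit == 1 else -strength
    return [d + delta for d in distances]

def detect_bit(distances, reference_mean):
    """Recover the bit by comparing the region mean to its reference."""
    mean = sum(distances) / len(distances)
    return 1 if mean > reference_mean else 0

region = [0.40, 0.50, 0.60, 0.45, 0.55]   # normalized distances, mean 0.5
marked = embed_bit(region, bit=1)
recovered = detect_bit(marked, reference_mean=0.5)
```

In the actual method the distance shifts are realized by in-plane vertex displacements, which is what keeps the surface distortion minimal.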

15.
IEEE Trans Image Process ; 19(9): 2332-44, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20409993

ABSTRACT

This paper proposes a new approach, coupling physical models and image estimation techniques, for modelling the movement of fluids. Fluid flow is characterized by turbulent movement and dynamically changing patterns, which pose challenges to existing optical flow estimation methods. The proposed methodology, which relies on the Navier-Stokes equations, processes fluid optical flow through a succession of stages such as advection, diffusion and mass conservation. A robust diffusion step, jointly considering the local data geometry and its statistics, is embedded in the proposed framework. The diffusion kernel is Gaussian, with the covariance matrix defined by the local second derivatives. Such an anisotropic kernel is able to implicitly detect changes in the vector field orientation and to diffuse accordingly. A new approach is developed for detecting fluid flow structures such as vortices. The proposed methodology is applied to artificially generated vector fields as well as to various image sequences.

16.
IEEE Trans Syst Man Cybern B Cybern ; 39(6): 1543-55, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19546043

ABSTRACT

Kernel density estimation is a nonparametric procedure for probability density modeling, which has found several applications in various fields. The smoothness and modeling ability of the functional approximation are controlled by the kernel bandwidth. In this paper, we describe a Bayesian estimation method for finding the bandwidth from a given data set. The proposed bandwidth estimation method is applied in three different computational-intelligence methods that rely on kernel density estimation: 1) scale space; 2) mean shift; and 3) quantum clustering. The third method is a novel approach that relies on the principles of quantum mechanics. This method is based on the analogy between data samples and quantum particles and uses the Schrödinger potential as a cost function. The proposed methodology is used for blind-source separation of modulated signals and for terrain segmentation based on topography information.
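How the bandwidth controls the estimate can be sketched with a plain Gaussian KDE. Silverman's rule of thumb is used below as a simple stand-in for the paper's Bayesian bandwidth estimation, purely so the sketch is self-contained:

```python
import math

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth; a simple stand-in for the Bayesian
    bandwidth estimation described in the paper."""
    n = len(data)
    mean = sum(data) / n
    std = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
    return 1.06 * std * n ** (-1 / 5)

def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    return sum(math.exp(-(x - d) ** 2 / (2 * h ** 2)) for d in data) / (
        len(data) * h * math.sqrt(2 * math.pi))

data = [-1.2, -1.0, -0.8, 0.8, 1.0, 1.2]   # two clusters of samples
h = silverman_bandwidth(data)
peak = kde(1.0, data, h)    # density on one cluster
valley = kde(0.0, data, h)  # density between the clusters
```

Scale space, mean shift and quantum clustering all operate on such a density surface, which is why a well-chosen bandwidth matters for all three.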

17.
IEEE Trans Syst Man Cybern B Cybern ; 36(4): 849-62, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16903369

ABSTRACT

This paper proposes a joint maximum likelihood and Bayesian methodology for estimating Gaussian mixture models. In Bayesian inference, the distributions of parameters are modeled, characterized by hyperparameters. In the case of Gaussian mixtures, the distributions of parameters are considered as Gaussian for the mean, Wishart for the covariance, and Dirichlet for the mixing probability. The learning task consists of estimating the hyperparameters characterizing these distributions. The integration in the parameter space is decoupled using an unsupervised variational methodology entitled variational expectation-maximization (VEM). This paper introduces a hyperparameter initialization procedure for the training algorithm. In the first stage, distributions of parameters resulting from successive runs of the expectation-maximization algorithm are formed. Afterward, maximum-likelihood estimators are applied to find appropriate initial values for the hyperparameters. The proposed initialization provides faster convergence, more accurate hyperparameter estimates, and better generalization for the VEM training algorithm. The proposed methodology is applied in blind signal detection and in color image segmentation.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Models, Statistical; Pattern Recognition, Automated/methods; Computer Simulation; Normal Distribution
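The E/M alternation underlying the method can be sketched for a two-component 1-D Gaussian mixture. The variational treatment of hyperparameter distributions and the proposed initialization procedure are not reproduced here; this only shows the base expectation-maximization that VEM builds on:

```python
import math

def em_gmm_1d(data, means, stds=(1.0, 1.0), weights=(0.5, 0.5), iters=30):
    """Standard EM for a two-component 1-D Gaussian mixture."""
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * s ** 2))
                    / (s * math.sqrt(2 * math.pi))
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / len(data)
    return means, stds, weights

data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]   # two clear clusters
means, stds, weights = em_gmm_1d(data, means=(0.5, 4.5))
```

The proposed initialization fits distributions over parameters gathered from successive runs of exactly this kind of EM loop, then maximum-likelihood estimates of those distributions seed the hyperparameters.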
18.
IEEE Trans Image Process ; 15(3): 687-701, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16519355

ABSTRACT

A new methodology for fingerprinting and watermarking three-dimensional (3-D) graphical objects is proposed in this paper. The 3-D graphical objects are described by means of polygonal meshes. The information to be embedded is provided as a binary code. A watermarking methodology has two stages: embedding and detecting the information that has been embedded in the given media. The information is embedded by means of local geometrical perturbations while maintaining the local connectivity. A neighborhood-localized measure is used for selecting appropriate vertices for watermarking. A study is undertaken in order to verify the suitability of this measure for selecting vertices from regions where geometrical perturbations are less perceptible. Two different watermarking algorithms, which do not require the original 3-D graphical object in the detection stage, are proposed. The two algorithms differ with respect to the type of constraint embedded in the local structure: parallel planes and bounding ellipsoids, respectively. The information capacity of various 3-D meshes is analyzed when using the proposed 3-D watermarking algorithms. The robustness of the 3-D watermarking algorithms is tested against noise perturbation and object cropping.


Subjects
Algorithms; Data Compression/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Product Labeling/methods; Signal Processing, Computer-Assisted; Computer Graphics; Patents as Topic; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity
19.
IEEE Trans Med Imaging ; 21(2): 100-8, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11929098

ABSTRACT

In this paper, we propose an interpolation algorithm using a mathematical morphology morphing approach. The aim of this algorithm is to reconstruct an n-dimensional object from a group of (n - 1)-dimensional sets representing sections of that object. The morphing transformation modifies pairs of consecutive sets such that they converge in shape and size. The interpolated set is achieved when the two consecutive sets are made idempotent by the morphing transformation. We prove the convergence of the morphological morphing. The entire object is modeled by successively interpolating a certain number of intermediary sets between each two consecutive given sets. We apply the interpolation algorithm to three-dimensional tooth reconstruction.


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Models, Statistical; Tooth/anatomy & histology; Computer Simulation; Humans; Pattern Recognition, Automated; Radiography; Reproducibility of Results; Sensitivity and Specificity; Tooth/diagnostic imaging
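The idea of producing an in-between set from two consecutive sections can be illustrated with a signed-distance averaging scheme on a 1-D grid. This is a simpler alternative shown for intuition only, not the paper's iterative morphological morphing with its idempotence stopping criterion:

```python
def signed_distance(cell, shape, grid):
    """Negative inside the shape, positive outside (integer grid cells)."""
    if cell in shape:
        outside = [c for c in grid if c not in shape]
        return -min(abs(cell - c) for c in outside)
    return min(abs(cell - c) for c in shape)

def interpolate_sets(a, b, grid):
    """Interpolate between two 1-D discrete sets by averaging their signed
    distance functions and keeping the cells where the average is <= 0."""
    mid = set()
    for cell in grid:
        if signed_distance(cell, a, grid) + signed_distance(cell, b, grid) <= 0:
            mid.add(cell)
    return mid

grid = list(range(12))
slice_a = {1, 2, 3, 4}   # section below the missing slice
slice_b = {5, 6, 7, 8}   # section above the missing slice
middle = interpolate_sets(slice_a, slice_b, grid)
```

As in the paper, stacking a number of such intermediary sets between each pair of given sections rebuilds the full n-dimensional object.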