Results 1 - 20 of 49
1.
Neural Netw ; 177: 106392, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38788290

ABSTRACT

Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI that is faithful to models and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, namely FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending the current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods when applied to object detection models generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how to best combine explanatory information from the models to enhance explanation plausibility by using trainable activation functions and smoothing kernels to maximize the similarity between XAI saliency map and human attention map. The proposed XAI methods were evaluated on widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, and it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods for object detection models.


Subjects
Artificial Intelligence , Attention , Humans , Attention/physiology , Neural Networks, Computer
2.
Article in English | MEDLINE | ID: mdl-38809736

ABSTRACT

Graph neural networks (GNNs) are widely used for analyzing graph-structural data and solving graph-related tasks due to their powerful expressiveness. However, existing off-the-shelf GNN-based models usually consist of no more than three layers. Deeper GNNs usually suffer from severe performance degradation due to several issues including the infamous "over-smoothing" issue, which restricts the further development of GNNs. In this article, we investigate the over-smoothing issue in deep GNNs. We discover that over-smoothing not only results in indistinguishable embeddings of graph nodes, but also alters and even corrupts their semantic structures, dubbed semantic over-smoothing. Existing techniques, e.g., graph normalization, aim at handling the former concern, but neglect the importance of preserving the semantic structures in the spatial domain, which hinders the further improvement of model performance. To alleviate the concern, we propose a cluster-keeping sparse aggregation strategy to preserve the semantic structure of embeddings in deep GNNs (especially for spatial GNNs). Particularly, our strategy heuristically redistributes the extent of aggregation for all nodes across layers, instead of aggregating them equally, so that deep layers aggregate concise yet meaningful information. Without any bells and whistles, it can be easily implemented as a plug-and-play structure of GNNs via weighted residual connections. Last, we analyze the over-smoothing issue on GNNs with weighted residual structures and conduct experiments to demonstrate performance comparable to the state of the art.

3.
Article in English | MEDLINE | ID: mdl-38517727

ABSTRACT

We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous works on classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state-of-the-art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: 1) object specification: what is the important region for the prediction? 2) object discrimination: which object is detected? Aiming at these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations, as measured through human eye gaze, and whether this agreement is related to user trust. Finally, we also propose two applications, ODAM-KD and ODAM-NMS, based on these two abilities of ODAM. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and instruct the knowledge distillation of object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish the duplicate detected objects. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and to help with ODAM-NMS. The code of ODAM is available: https://github.com/Cyang-Zhao/ODAM.

4.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 2882-2899, 2024 May.
Article in English | MEDLINE | ID: mdl-37995158

ABSTRACT

Typical approaches that learn crowd density maps are limited to extracting the supervisory information from the loosely organized spatial information in the crowd dot/density maps. This paper tackles this challenge by performing the supervision in the frequency domain. More specifically, we devise a new loss function for crowd analysis called generalized characteristic function loss (GCFL). This loss carries out two steps: 1) transforming the spatial information in density or dot maps to the frequency domain; 2) calculating a loss value between their frequency contents. For step 1, we establish a series of theoretical foundations by extending the definition of the characteristic function for probability distributions to density maps, as well as proving some vital properties of the extended characteristic function. After taking the characteristic function of the density map, its information in the frequency domain is well-organized and hierarchically distributed, while in the spatial domain it is loosely organized and dispersed everywhere. In step 2, we design a loss function that can fit the information organization in the frequency domain, allowing the exploitation of the well-organized frequency information for the supervision of crowd analysis tasks. The loss function can be adapted to various crowd analysis tasks through the specification of its window functions. In this paper, we demonstrate its power in three tasks: Crowd Counting, Crowd Localization and Noisy Crowd Counting. We show the advantages of our GCFL compared to other SOTA losses and its competitiveness to other SOTA methods by theoretical analysis and empirical results on benchmark datasets. Our codes are available at https://github.com/wbshu/Crowd_Counting_in_the_Frequency_Domain.
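
The two steps named in the abstract above can be illustrated with a minimal sketch. It assumes the characteristic function of a density map D is phi(t) = sum over pixels x of D(x) * exp(i * <t, x>), by analogy with the definition for probability distributions, and it compares two maps with a plain mean absolute difference over a handful of frequencies. The function names, the frequency grid, and the unweighted distance are illustrative assumptions; this is not the authors' GCFL, which additionally uses window functions.

```python
import numpy as np

def characteristic_function(density, freqs):
    """Evaluate phi(t) = sum_x density[x] * exp(i * <t, x>) for each row t of freqs.

    density: (H, W) array; freqs: (F, 2) array of (t_y, t_x) frequencies.
    """
    h, w = density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Pixel coordinates in the same row-major order as density.ravel()
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (H*W, 2)
    phase = freqs @ coords.T                                           # (F, H*W)
    return np.exp(1j * phase) @ density.ravel()                        # (F,)

def frequency_loss(pred, target, freqs):
    """Hypothetical loss: mean absolute difference between the two characteristic functions."""
    diff = characteristic_function(pred, freqs) - characteristic_function(target, freqs)
    return float(np.mean(np.abs(diff)))
```

At the zero frequency the characteristic function equals the sum of the map, that is, the crowd count, so a count error already shows up in the lowest-frequency term of such a loss.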

5.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15065-15080, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37506001

ABSTRACT

Point-wise supervision is widely adopted in computer vision tasks such as crowd counting and human pose estimation. In practice, the noise in point annotations may significantly affect the performance and robustness of algorithms. In this paper, we investigate the effect of annotation noise in point-wise supervision and propose a series of robust loss functions for different tasks. In particular, the point annotation noise includes spatial-shift noise, missing-point noise, and duplicate-point noise. The spatial-shift noise is the most common one, and exists in crowd counting, pose estimation, visual tracking, etc., while the missing-point and duplicate-point noises usually appear in dense annotations, such as crowd counting. In this paper, we first consider the shift noise by modeling the real locations as random variables and the annotated points as noisy observations. The probability density function of the intermediate representation (a smooth heat map generated from dot annotations) is derived and the negative log likelihood is used as the loss function to naturally model the shift uncertainty in the intermediate representation. The missing and duplicate noise are further modeled empirically, under the assumption that the noise appears in high-density regions with high probability. We apply the method to crowd counting, human pose estimation and visual tracking, propose robust loss functions for those tasks, and achieve superior performance and robustness on widely used datasets.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10519-10534, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027650

ABSTRACT

Nested dropout is a variant of the dropout operation that is able to order network parameters or features based on a pre-defined importance during training. It has been explored for: I. Constructing nested nets (Cui et al. 2020; Cui et al. 2021): the nested nets are neural networks whose architectures can be adjusted instantly during testing time, e.g., based on computational constraints. The nested dropout implicitly ranks the network parameters, generating a set of sub-networks such that any smaller sub-network forms the basis of a larger one. II. Learning ordered representations (Rippel et al. 2014): the nested dropout applied to the latent representation of a generative model (e.g., auto-encoder) ranks the features, enforcing an explicit order of the dense representation over dimensions. However, the dropout rate is fixed as a hyper-parameter during the whole training process. For nested nets, when network parameters are removed, the performance decays in a human-specified trajectory rather than in a trajectory learned from data. For generative models, the importance of features is specified as a constant vector, restraining the flexibility of representation learning. To address the problem, we focus on the probabilistic counterpart of the nested dropout. We propose a variational nested dropout (VND) operation that draws samples of multi-dimensional ordered masks at a low cost, providing useful gradients to the parameters of nested dropout. Based on this approach, we design a Bayesian nested neural network that learns the order knowledge of the parameter distributions. We further exploit the VND under different generative models for learning ordered latent distributions. In experiments, we show that the proposed approach outperforms the nested network in terms of accuracy, calibration, and out-of-domain detection in classification tasks. It also outperforms the related generative models on data generation tasks.
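
For readers unfamiliar with the baseline operation, here is a minimal sketch of an ordered mask in the style of standard nested dropout: a cut-off index is drawn from a geometric distribution and every unit after it is dropped, so earlier units are kept more often than later ones. The function name and the rate `p` are illustrative assumptions. The rate is fixed here, which is the limitation the abstract describes; this sketch does not implement the proposed VND.

```python
import numpy as np

def nested_dropout_mask(dim, p, rng):
    """Return a 0/1 mask that keeps the first b units, with b ~ Geometric(p).

    dim: number of units; p: fixed success probability of the geometric
    distribution (hypothetical hyper-parameter); rng: numpy Generator.
    """
    # Generator.geometric returns integers >= 1, so at least one unit is kept
    b = min(int(rng.geometric(p)), dim)
    mask = np.zeros(dim)
    mask[:b] = 1.0
    return mask
```

Multiplying a feature vector by such a mask during training forces the leading dimensions to carry the most information, since they survive most often.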


Subjects
Algorithms , Neural Networks, Computer , Humans , Bayes Theorem , Learning
7.
Br J Psychol ; 114 Suppl 1: 17-20, 2023 May.
Article in English | MEDLINE | ID: mdl-36951761

ABSTRACT

Multiple factors have been proposed to contribute to the other-race effect in face recognition, including perceptual expertise and social-cognitive accounts. Here, we propose to understand the effect and its contributing factors from the perspectives of learning mechanisms that involve joint learning of visual attention strategies and internal representations for faces, which can be modulated by quality of contact with other-race individuals including emotional and motivational factors. Computational simulations of this process will enhance our understanding of interactions among factors and help resolve inconsistent results in the literature. In particular, since learning is driven by task demands, visual attention effects observed in different face-processing tasks, such as passive viewing or recognition, are likely to be task specific (although may be associated) and should be examined and compared separately. When examining visual attention strategies, the use of more data-driven and comprehensive eye movement measures, taking both spatial-temporal pattern and consistency of eye movements into account, can lead to novel discoveries in other-race face processing. The proposed framework and analysis methods may be applied to other tasks of real-life significance such as face emotion recognition, further enhancing our understanding of the relationship between learning and visual cognition.


Subjects
Pattern Recognition, Visual , Racial Groups , Humans , Racial Groups/psychology , Learning , Recognition, Psychology , Eye Movements
9.
IEEE Trans Neural Netw Learn Syst ; 34(3): 1537-1551, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34464269

ABSTRACT

The hidden Markov model (HMM) is a broadly applied generative model for representing time-series data, and clustering HMMs attracts increased interest from machine learning researchers. However, the number of clusters (K) and the number of hidden states (S) for cluster centers are still difficult to determine. In this article, we propose a novel HMM-based clustering algorithm, the variational Bayesian hierarchical EM algorithm, which clusters HMMs through their densities and priors and simultaneously learns posteriors for the novel HMM cluster centers that compactly represent the structure of each cluster. The numbers K and S are automatically determined in two ways. First, we place a prior on the pair (K, S) and approximate their posterior probabilities, from which the values with the maximum posterior are selected. Second, some clusters and states are pruned out implicitly when no data samples are assigned to them, thereby leading to automatic selection of the model complexity. Experiments on synthetic and real data demonstrate that our algorithm performs better than using model selection techniques with maximum likelihood estimation.

10.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2088-2103, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35294345

ABSTRACT

Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human annotation could result in using common words and phrases, which lacks distinctiveness, i.e., many similar images have the same caption. In this paper, we aim to improve the distinctiveness of image captions via comparing and reweighting with a set of similar images. First, we propose a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent based on distinctiveness; however, previous works normally treat the human annotations equally during training, which could be a reason for generating less distinctive captions. In contrast, we reweight each ground-truth caption according to its distinctiveness during training. We further integrate a long-tailed weight strategy to highlight the rare words that contain more information, and captions from the similar image set are sampled as negative examples to encourage the generated sentence to be unique. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study.

11.
IEEE Trans Neural Netw Learn Syst ; 34(12): 10653-10667, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35576413

ABSTRACT

Multicamera surveillance has been an active research topic for understanding and modeling scenes. Compared to a single camera, multicameras provide larger field-of-view and more object cues, and the related applications are multiview counting, multiview tracking, 3-D pose estimation or 3-D reconstruction, and so on. It is usually assumed that the cameras are all temporally synchronized when designing models for these multicamera-based tasks. However, this assumption is not always valid, especially for multicamera systems with network transmission delay and low frame rates due to limited network bandwidth, resulting in desynchronization of the captured frames across cameras. To handle the issue of unsynchronized multicameras, in this article, we propose a synchronization model that works in conjunction with existing deep neural network (DNN)-based multiview models, thus avoiding the redesign of the whole model. We consider two variants of the model, based on where in the pipeline the synchronization occurs, scene-level synchronization and camera-level synchronization. The view synchronization step and the task-specific view fusion and prediction step are unified in the same framework and trained in an end-to-end fashion. Our view synchronization models are applied to different DNNs-based multicamera vision tasks under the unsynchronized setting, including multiview counting and 3-D pose estimation, and achieve good performance compared to baselines.

12.
Dev Psychol ; 59(2): 353-363, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36342437

ABSTRACT

Early attention bias to threat-related negative emotions may lead children to overestimate dangers in social situations. This study examined its emergence and how it might develop in tandem with temperamental shyness, a known predictor of toddlers' fear of strangers, in 168 Chinese toddlers. Measurable individual differences in such attention bias to fearful faces were found and remained stable from age 12 to 18 months. When shown photos of paired happy versus fearful or happy versus angry faces, toddlers initially gazed more and had longer initial fixation and total fixation at fearful faces compared with happy faces consistently. However, they initially gazed more at happy faces compared with angry faces consistently and had a longer total fixation at angry faces only at 18 months. Stranger anxiety at 12 months predicted attention bias to fearful faces at 18 months. Temperamentally shyer 12-month-olds went on to show stronger attention bias to fearful faces at 18 months, and their fear of strangers also increased more from 12 to 18 months. Together with prior research suggesting that attention bias to angry or fearful faces foretells social anxiety, the present findings point to likely positive feedback loops among attention bias to fearful faces, temperamental shyness, and stranger anxiety in early childhood. (PsycInfo Database Record (c) 2023 APA, all rights reserved).


Subjects
Facial Expression , Fear , Humans , Child, Preschool , Infant , Fear/psychology , Anxiety , Anger , Happiness , Emotions
13.
NPJ Sci Learn ; 7(1): 28, 2022 Oct 25.
Article in English | MEDLINE | ID: mdl-36284113

ABSTRACT

Greater eyes-focused eye movement pattern during face recognition is associated with better performance in adults but not in children. We test the hypothesis that higher eye movement consistency across trials, instead of a greater eyes-focused pattern, predicts better performance in children since it reflects capacity in developing visual routines. We first simulated visual routine development through combining deep neural network and hidden Markov model that jointly learn perceptual representations and eye movement strategies for face recognition. The model accounted for the advantage of eyes-focused pattern in adults, and predicted that in children (partially trained models) consistency but not pattern of eye movements predicted recognition performance. This result was then verified with data from typically developing children. In addition, lower eye movement consistency in children was associated with autism diagnosis, particularly autistic traits in social skills. Thus, children's face recognition involves visual routine development through social exposure, indexed by eye movement consistency.

14.
Sci Rep ; 12(1): 7462, 2022 May 6.
Article in English | MEDLINE | ID: mdl-35523808

ABSTRACT

No previous studies have investigated eye-movement patterns to show children's information processing while viewing clinical images. Therefore, this study aimed to explore children's and their educators' perception of a midline diastema by applying eye-movement analysis using hidden Markov models (EMHMM). A total of 155 children between 2.5 and 5.5 years of age and their educators (n = 34) viewed pictures with and without a midline diastema while a Tobii Pro Nano eye-tracker followed their eye movements. Fixation data were analysed using data-driven and fixed regions of interest (ROIs) approaches with EMHMM. Two different eye-movement patterns were identified: explorative pattern (76%), where the children's ROIs were predominantly around the nose and mouth, and focused pattern (26%), where children's ROIs were precise, located on the teeth with and without a diastema, and fixations transitioned among the ROIs with similar frequencies. Females had a significantly higher eye-movement preference for the image without a diastema than males. Comparisons between the different age groups showed a statistically significant difference for overall entropies. The 3.6-4.5y age groups exhibited higher entropies, indicating lower eye-movement consistency. In addition, children and their educators exhibited two specific eye-movement patterns. Children in the explorative pattern saw the midline diastema more often while their educators focussed on the image without diastema. Thus, EMHMMs are valuable in analysing eye-movement patterns in children and adults.


Subjects
Diastema , Eye Movements , Adult , Attention , Child , Face , Female , Humans , Male , Mouth
15.
Dent Traumatol ; 38(5): 410-416, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35460595

ABSTRACT

BACKGROUND/AIM: Traumatic dental injuries (TDIs) in the primary dentition may result in tooth discolouration and fractures. The aim of this child-centred study was to explore the differences between preschool children's eye movement patterns and visual attention to typical outcomes following TDIs to primary teeth. MATERIALS AND METHODS: An eye-tracker recorded 155 healthy preschool children's eye movements when they viewed clinical images of healthy teeth, tooth fractures and discolourations. The visual search pattern was analysed using the eye movement analysis with the Hidden Markov Models (EMHMM) approach and preference for the various regions of interest (ROIs). RESULTS: Two different eye movement patterns (distributed and selective) were identified (p < .05). Children with the distributed pattern shifted their fixations between the presented images, while those with the selective pattern remained focused on the same image they first saw. CONCLUSIONS: Preschool children noticed teeth. However, most of them did not have an attentional bias, implying that they did not interpret these TDI outcomes negatively. Only a few children avoided looking at images with TDIs indicating a potential negative impact. The EMHMM approach is appropriate for assessing inter-individual differences in children's visual attention to TDI outcomes.


Subjects
Tooth Fractures , Tooth Injuries , Child, Preschool , Eye-Tracking Technology , Humans , Tooth, Deciduous
16.
Caries Res ; 56(2): 129-137, 2022.
Article in English | MEDLINE | ID: mdl-35398845

ABSTRACT

Visual attention is a significant gateway to a child's mind, and looking is one of the first behaviors young children develop. Untreated caries and the resulting poor dental aesthetics can have adverse emotional and social impacts on children's oral health-related quality of life due to its detrimental effects on self-esteem and self-concept. Therefore, we explored preschool children's eye movement patterns and visual attention to images with and without dental caries via eye movement analysis using hidden Markov models (EMHMM). We calibrated a convenience sample of 157 preschool children to the eye-tracker (Tobii Pro Nano) to ensure standardization. Consequently, each participant viewed the same standardized pictures with and without dental caries while an eye-tracking device tracked their eye movements. Subsequently, based on the sequence of viewed regions of interest (ROIs), a transition matrix was developed where the participants' previously viewed ROI informed their subsequently considered ROI. Hence, an individual's HMM was estimated from their eye movement data using a variational Bayesian approach to determine the optimal number of ROIs automatically. Consequently, this data-driven approach generated the visual task participants' most representative eye movement patterns. Preschool children exhibited two different eye movement patterns, distributed (78%) and selective (21%), which was statistically significant. Children switched between images with more similar probabilities in the distributed pattern, while children remained looking at the same ROI rather than switching to the other ROI in the selective pattern. Nevertheless, all children exhibited an equal starting fixation on the right or left image and noticed teeth. The study findings reveal that most preschool children did not have an attentional bias to images with and without dental caries. Furthermore, only a few children selectively fixated on images with dental caries. Therefore, selective eye-movement patterns may strongly predict preschool children's sustained visual attention to dental caries. Nevertheless, future studies are essential to fully understand the developmental origins of differences in visual attention to common oral health presentations in children. Finally, EMHMM is appropriate for assessing inter-individual differences in children's visual attention.


Subjects
Dental Caries , Bayes Theorem , Child, Preschool , Dental Caries/diagnostic imaging , Eye-Tracking Technology , Humans , Oral Health , Quality of Life
17.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1357-1370, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32903177

ABSTRACT

Crowd counting is an essential topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). The density map based counting methods that incorporate density map as the intermediate representation have improved counting performance dramatically. However, in the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. We also show that the proposed framework can be applied to general dense object counting tasks. Extensive experiments are conducted on 10 datasets for 3 applications: crowd counting, vehicle counting, and general object counting. The experiment results on these datasets confirm the effectiveness of the proposed learnable density map representations.
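
The hand-crafted first step mentioned in the abstract above (turning a dot map into a density map with a Gaussian kernel) can be sketched as follows. This is the fixed baseline that the paper proposes to replace with a learned generator, not the adaptive generator itself. The function name and the default `sigma` are illustrative assumptions, and each Gaussian is renormalised on the image grid so that the map still sums to the head count near the borders, which a plain convolution would not guarantee.

```python
import numpy as np

def density_map_from_dots(points, shape, sigma=2.0):
    """Place one unit-mass Gaussian at each annotated (row, col) point.

    points: iterable of (y, x) annotations; shape: (H, W) of the image;
    sigma: kernel bandwidth in pixels (hypothetical default).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=float)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        # Renormalise so every annotated person contributes exactly 1 to the sum
        density += g / g.sum()
    return density
```

Summing the returned map recovers the number of annotated points, which is why counting networks can be trained to regress the map and then integrate it.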

18.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 1035-1049, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32749960

ABSTRACT

Diversity is one of the most important properties in image captioning, as it reflects various expressions of important concepts presented in an image. However, the most popular metrics cannot adequately evaluate the diversity of multiple captions. In this paper, we first propose a metric to measure the diversity of a set of captions, which is derived from latent semantic analysis (LSA), and then kernelize LSA using CIDEr (R. Vedantam et al., 2015) similarity. Compared with mBLEU (R. Shetty et al., 2017), our proposed diversity metrics show a relatively strong correlation to human evaluation. We conduct extensive experiments, finding there is a large gap between the performance of the current state-of-the-art models and human annotations considering both diversity and accuracy; the models that aim to generate captions with higher CIDEr scores normally obtain lower diversity scores, as they generally learn to describe images using common words. To bridge this "diversity" gap, we consider several methods for training caption models to generate diverse captions. First, we show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions. Second, we develop approaches that directly optimize our diversity metric and CIDEr score using reinforcement learning. These proposed approaches using reinforcement learning (RL) can be unified into a self-critical (S. J. Rennie et al., 2017) framework with new RL baselines. Third, we combine accuracy and diversity into a single measure using an ensemble matrix, and then maximize the determinant of the ensemble matrix via reinforcement learning to boost diversity and accuracy, which outperforms its counterparts on the oracle test. Finally, inspired by determinantal point processes (DPP), we develop a DPP selection algorithm to select a subset of captions from a large number of candidate captions. The experimental results show that maximizing the determinant of the ensemble matrix outperforms other methods, considerably improving diversity and accuracy.


Subjects
Algorithms , Benchmarking , Humans , Learning , Semantics
19.
IEEE Trans Neural Netw Learn Syst ; 33(4): 1492-1506, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33361002

ABSTRACT

Estimating the predictive uncertainty of a Bayesian learning model is critical in various decision-making problems, e.g., reinforcement learning, adversarial attack detection, and self-driving cars. As the model posterior is almost always intractable, most efforts have been devoted to finding an accurate approximation to the true posterior. Even when a decent estimate of the model posterior is obtained, another approximation is required to compute the predictive distribution over the desired output. A common accurate solution is to use Monte Carlo (MC) integration. However, it requires maintaining a large number of samples, evaluating the model repeatedly, and averaging multiple model outputs. In many real-world cases, this is computationally prohibitive. In this work, assuming that the exact posterior or a decent approximation is obtained, we propose a generic framework to approximate the output probability distribution induced by the model posterior with a parameterized model and in an amortized fashion. The aim is to approximate the predictive uncertainty of a specific Bayesian model, meanwhile alleviating the heavy workload of MC integration at testing time. The proposed method is universally applicable to Bayesian classification models that allow for posterior sampling. Theoretically, we show that the idea of amortization incurs no additional costs on approximation performance. Empirical results validate the strong practical performance of our approach.
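
The Monte Carlo integration that the abstract describes as costly can be made concrete with a minimal sketch. It assumes a linear softmax classifier and a list of weight matrices already drawn from some posterior; both the model and the names are illustrative assumptions. It shows the baseline whose test-time cost the paper aims to reduce, not the proposed amortized approximation.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predictive(x, weight_samples):
    """Approximate p(y | x) by averaging class probabilities over posterior samples.

    x: (N, D) inputs; weight_samples: list of (D, C) weight matrices, each
    assumed to be one draw from the model posterior.
    """
    # One full forward pass per posterior sample: this loop is the test-time cost
    probs = [softmax(x @ w) for w in weight_samples]
    return np.mean(probs, axis=0)
```

The cost grows linearly with the number of samples, because every sample needs its own forward pass and must be kept in memory.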

20.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 3197-3211, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33385310

ABSTRACT

We propose a ParametRIc MAnifold Learning (PRIMAL) algorithm for Gaussian mixture models (GMMs), assuming that GMMs lie on or near a manifold of probability distributions that is generated from a low-dimensional hierarchical latent space through parametric mappings. Inspired by principal component analysis (PCA), the generative processes for priors, means and covariance matrices are modeled by their respective latent space and parametric mapping. Then, the dependencies between latent spaces are captured by a hierarchical latent space through a linear or kernelized mapping. The function parameters and hierarchical latent space are learned by minimizing the reconstruction error between ground-truth GMMs and manifold-generated GMMs, measured by Kullback-Leibler Divergence (KLD). Variational approximation is employed to handle the intractable KLD between GMMs and a variational EM algorithm is derived to optimize the objective function. Experiments on synthetic data, flow cytometry analysis, eye-fixation analysis and topic models show that PRIMAL learns a continuous and interpretable manifold of GMM distributions and achieves a minimum reconstruction error.
