1.
Article in English | MEDLINE | ID: mdl-38743536

ABSTRACT

Deep neural networks (DNNs) provide state-of-the-art accuracy for vision tasks, but they require significant resources for training. They are therefore trained on cloud servers far from the edge devices that acquire the data, which increases communication cost, runtime, and privacy concerns. In this study, a novel hierarchical training method for DNNs is proposed that uses early exits in an architecture split between edge and cloud workers to reduce communication cost, training runtime, and privacy concerns. The method introduces a new use case for early exits: separating the backward pass of neural networks between the edge and the cloud during training. We address the shortcoming of most available methods, which, due to the sequential nature of the training phase, cannot train the levels of the hierarchy simultaneously, or do so at the cost of compromising privacy. In contrast, our method can use edge and cloud workers simultaneously, does not share raw input data with the cloud, and requires no communication during the backward pass. Several simulations and on-device experiments with different neural network architectures demonstrate the effectiveness of this method. The proposed method reduces training runtime for the VGG-16 and ResNet-18 architectures by 29% and 61% on CIFAR-10 classification and by 25% and 81% on Tiny ImageNet classification, respectively, when communication with the cloud takes place over a low-bit-rate channel. This runtime gain is achieved with a negligible drop in accuracy. The method is advantageous for online learning of high-accuracy DNNs on sensor-holding, low-resource devices such as mobile phones or robots in an edge-cloud system, making them more flexible in facing new tasks and classes of data.
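The split backward pass can be illustrated with a toy numpy sketch (illustrative only, not the paper's architecture): an edge worker trains a small backbone through a local early-exit loss, while the cloud trains its own head on the received activations, so no gradients travel back over the channel and the raw inputs never leave the device. The data, layer sizes, and learning rate are all invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: the edge device holds the raw data and labels.
x = rng.normal(size=(64, 8))
t = x @ rng.normal(size=(8, 3))

W1 = rng.normal(size=(8, 16)) * 0.1      # edge backbone
We = rng.normal(size=(16, 3)) * 0.1      # edge early-exit head
Wc = rng.normal(size=(16, 3)) * 0.1      # cloud head

lr, edge_mse, cloud_mse = 0.05, [], []
for step in range(300):
    h = x @ W1                           # edge forward pass
    # Edge worker: the early exit provides a local loss, so the whole
    # edge backward pass stays on-device.
    err_e = h @ We - t
    g_We = h.T @ err_e / len(x)
    g_W1 = x.T @ (err_e @ We.T) / len(x)
    We -= lr * g_We
    W1 -= lr * g_W1
    # Cloud worker: receives only the activations h (never the raw x)
    # and sends nothing back -- no communication in the backward pass.
    err_c = h @ Wc - t
    Wc -= lr * h.T @ err_c / len(x)
    edge_mse.append(float(np.mean(err_e ** 2)))
    cloud_mse.append(float(np.mean(err_c ** 2)))

print(edge_mse[0], '->', edge_mse[-1])
```

Both losses fall without any gradient ever crossing the channel; in the actual method the cloud branch continues the network past the exit rather than fitting the same targets as here.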

2.
PLoS One ; 18(2): e0279419, 2023.
Article in English | MEDLINE | ID: mdl-36735652

ABSTRACT

Blood pressure (BP) is a crucial biomarker that gives valuable information about cardiovascular diseases, but it requires accurate continuous monitoring to maximize its value. In the effort to develop non-invasive, non-occlusive and continuous BP monitoring devices, photoplethysmography (PPG) has recently gained interest. Researchers have attempted to estimate BP from the analysis of PPG waveform morphology with promising results, yet these are often validated on a small number of subjects with moderate BP variations. This work presents an accurate BP estimator based on PPG morphology features. The method first uses a clinically validated algorithm (oBPM®) to perform signal preprocessing and extraction of physiological features. A subset of features that best reflects BP changes is automatically identified by Lasso regression, and a feature relevance analysis is conducted. Three machine learning (ML) methods are then investigated to translate this subset of features into systolic BP (SBP) and diastolic BP (DBP) estimates, namely Lasso regression, support vector regression and Gaussian process regression. The accuracy of absolute BP estimates and the trending ability are evaluated. This approach considerably improves SBP estimation over the previous oBPM® technology, reducing the standard deviation of the error by over 20%. Furthermore, rapid BP changes assessed by the PPG-based approach show a concordance rate of over 99% with the invasive reference. Altogether, the results confirm that PPG morphology features can be combined with ML methods to accurately track BP variations generated during anesthesia induction. They also reinforce the importance of adding a calibration measure to obtain an absolute BP estimate.


Subjects
Blood Pressure Determination, Photoplethysmography, Humans, Blood Pressure/physiology, Photoplethysmography/methods, Blood Pressure Determination/methods, Machine Learning, Anesthesia, General
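As a rough illustration of the feature-selection step, here is a minimal Lasso solved by ISTA (proximal gradient) on synthetic data; the features, dimensions and regularization weight are invented for the example, since the oBPM®-derived features themselves are not public.

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, n_iter=500):
    """Lasso via ISTA: argmin_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)                  # gradient of the data term
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                 # stand-ins for PPG morphology features
true_w = np.zeros(10)
true_w[[0, 3]] = [2.0, -1.5]                   # only two features carry BP information
y = X @ true_w + 0.1 * rng.normal(size=200)    # stand-in for the SBP target

w = lasso_ista(X, y, lam=20.0)
selected = np.flatnonzero(np.abs(w) > 1e-3)
print(selected)                                # sparse subset fed to the downstream regressor
```

The L1 penalty drives the uninformative coefficients to exactly zero, which is what makes Lasso usable as an automatic feature selector before a separate SBP/DBP regressor.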
3.
Open Heart ; 10(1)2023 01.
Article in English | MEDLINE | ID: mdl-36596624

ABSTRACT

BACKGROUND: Angiographic parameters can facilitate the risk stratification of coronary lesions but remain insufficient for predicting future myocardial infarction (MI). AIMS: We compared the ability of humans, angiographic parameters and deep learning (DL) to predict the lesion that would be responsible for a future MI in a population of patients with non-significant coronary artery disease (CAD) at baseline. METHODS: We retrospectively included patients who underwent invasive coronary angiography (ICA) for MI and in whom a previous angiogram had been performed within 5 years. The abilities of human visual assessment, diameter stenosis, area stenosis, quantitative flow ratio (QFR) and DL to predict the future culprit lesion (FCL) were compared. RESULTS: In total, 746 cropped ICA images of FCLs and non-culprit lesions (NCLs) were analysed. Predictive models for each modality were developed in a training set before validation in a test set. DL exhibited the best predictive performance, with an area under the curve of 0.81, compared with diameter stenosis (0.62, p=0.04), area stenosis (0.58, p=0.05) and QFR (0.67, p=0.13). DL exhibited a significant net reclassification improvement (NRI) compared with area stenosis (0.75, p=0.03) and QFR (0.95, p=0.01), and a positive but nonsignificant NRI when compared with diameter stenosis. Among all models, DL demonstrated the highest accuracy (0.78), followed by QFR (0.70) and area stenosis (0.68). Predictions based on human visual assessment and diameter stenosis had the lowest accuracy (0.58). CONCLUSION: In this feasibility study, DL outperformed human visual assessment and established angiographic parameters in the prediction of FCLs. Larger studies are now required to confirm this finding.


Subjects
Coronary Stenosis, Deep Learning, Fractional Flow Reserve, Myocardial, Myocardial Infarction, Humans, Coronary Stenosis/diagnostic imaging, Coronary Angiography/methods, Constriction, Pathologic, Feasibility Studies, Retrospective Studies, Coronary Vessels, Myocardial Infarction/diagnostic imaging
4.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6372-6385, 2023 May.
Article in English | MEDLINE | ID: mdl-36112555

ABSTRACT

Stereo confidence estimation aims to estimate the reliability of the disparity produced by stereo matching. Unlike previous methods that exploit a limited input modality, we present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input, including the matching cost, the disparity, and the color image, through deep networks. The proposed network, termed the Locally Adaptive Fusion Network (LAF-Net), learns locally varying attention and scale maps to fuse the tri-modal confidence features. Moreover, we propose a knowledge distillation framework to learn more compact confidence estimation networks as student networks. By transferring the knowledge from LAF-Net as the teacher network, student networks that take only a disparity as input can achieve comparable performance. To transfer more informative knowledge, we also propose a module that learns the locally varying temperature in a softmax function. We further extend this framework to a multiview scenario. Experimental results show that LAF-Net and its variations outperform the state-of-the-art stereo confidence methods on various benchmarks.
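The temperature-scaled distillation idea can be sketched as follows. This is a generic numpy illustration with invented logit shapes and a fixed temperature map, whereas the paper's module learns a locally varying one.

```python
import numpy as np

def softmax(logits, T):
    """Softmax over the last axis with a (broadcastable) temperature T."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T_map):
    """KL(teacher || student) with a per-location temperature map.

    teacher_logits, student_logits: (H, W, C); T_map: (H, W, 1).
    A higher local temperature softens the teacher's confidence there,
    exposing more of its "dark knowledge" to the student."""
    p = softmax(teacher_logits, T_map)
    q = softmax(student_logits, T_map)
    return np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1))

rng = np.random.default_rng(2)
t_logits = rng.normal(size=(4, 4, 2))           # teacher confidence logits
s_logits = rng.normal(size=(4, 4, 2))           # student confidence logits
T_map = np.full((4, 4, 1), 2.0)                 # learned per-pixel in the paper
print(distill_loss(t_logits, s_logits, T_map))
```

The loss is zero exactly when the student matches the teacher, and the spatial temperature map controls, per location, how sharply that match is enforced.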

5.
IEEE Trans Image Process ; 31: 5813-5827, 2022.
Article in English | MEDLINE | ID: mdl-36054397

ABSTRACT

State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose a new convolution operation on the sphere that keeps the high expressiveness and the low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. These results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.

6.
Neuroimage ; 252: 119037, 2022 05 15.
Article in English | MEDLINE | ID: mdl-35219859

ABSTRACT

Understanding the organizational principles of human brain activity at the systems level remains a major challenge in network neuroscience. Here, we introduce a fully data-driven approach based on graph learning to extract meaningful repeating network patterns from regionally averaged timecourses. We use the Graph Laplacian Mixture Model (GLMM), a generative model that treats functional data as a collection of signals expressed on multiple underlying graphs. By exploiting the covariance between the activity of brain regions, these graphs can be learned without resorting to structural information. To validate the proposed technique, we first apply it to task fMRI with a known experimental paradigm. The probability of each graph occurring at each time point is found to be consistent with the task timing, while the spatial patterns associated with each epoch of the task are in line with activation patterns previously established using classical regression analysis. We then apply the technique to resting-state data, where the extracted graphs correspond to well-known functional brain activation patterns. The GLMM allows graphs to be learned entirely from the functional activity; in practice, these turn out to reveal a high degree of similarity to the structural connectome. The Default Mode Network (DMN) is always captured by the algorithm across the different tasks and resting-state data, so we compare the states corresponding to this network with each other and with structure. Overall, this method allows us to infer relevant functional brain networks without the need for structural connectome information. Moreover, we overcome the limitations of windowing the time sequences by feeding the GLMM the whole functional signal, rather than focusing on sub-portions of it.


Subjects
Connectome, Algorithms, Brain/diagnostic imaging, Brain/physiology, Connectome/methods, Humans, Magnetic Resonance Imaging/methods, Nerve Net/diagnostic imaging, Nerve Net/physiology
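A minimal sketch of the soft assignment at the heart of such a mixture model, under the simplifying assumption of zero-mean Gaussians whose precision matrices are ridge-regularized graph Laplacians (a signal that is smooth on graph k has a small quadratic form x'L_k x, hence a high likelihood under component k). The graphs, signals, and eps value are invented for the demo.

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def responsibilities(X, laplacians, weights, eps=1e-2):
    """Soft-assign each time point (row of X) to one of K graphs.

    Component k is a zero-mean Gaussian with precision L_k + eps*I.
    Returns a (T, K) matrix of posterior probabilities."""
    T, K = X.shape[0], len(laplacians)
    logp = np.zeros((T, K))
    for k, L in enumerate(laplacians):
        P = L + eps * np.eye(L.shape[0])        # regularized precision
        _, logdet = np.linalg.slogdet(P)
        logp[:, k] = np.log(weights[k]) + 0.5 * logdet \
                     - 0.5 * np.einsum('ti,ij,tj->t', X, P, X)
    logp -= logp.max(axis=1, keepdims=True)     # stable normalization
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Demo on 4 nodes: a path graph vs. a graph of two disjoint edges.
W_path = np.zeros((4, 4))
for i in range(3):
    W_path[i, i + 1] = W_path[i + 1, i] = 1.0
W_pairs = np.zeros((4, 4))
W_pairs[0, 2] = W_pairs[2, 0] = W_pairs[1, 3] = W_pairs[3, 1] = 1.0
Ls = [laplacian(W_path), laplacian(W_pairs)]

X = np.array([[2.0, 2.0, -2.0, -2.0],    # smooth on the path graph
              [2.0, -2.0, 2.0, -2.0]])   # smooth on the pairs graph
R = responsibilities(X, Ls, weights=[0.5, 0.5])
print(R.argmax(axis=1))                  # each time point picks "its" graph
```

In the full GLMM the graphs themselves are also re-estimated from these responsibilities; here they are fixed to show only the assignment step.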
7.
IEEE Trans Neural Netw Learn Syst ; 33(9): 5032-5044, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33788695

ABSTRACT

With the advent of data science, the analysis of network or graph data has become a very timely research problem. A variety of recent works have proposed to generalize neural networks to graphs, either from a spectral graph theory or a spatial perspective. The majority of these works, however, focus on adapting the convolution operator to graph representations. The pooling operator also plays an important role in distilling multiscale and hierarchical representations, but it has been mostly overlooked so far. In this article, we propose a parameter-free pooling operator, called iPool, that retains the most informative features in arbitrary graphs. Arguing that informative nodes dominantly characterize graph signals, we propose a criterion that evaluates the amount of information of each node given its neighbors, and theoretically demonstrate its relationship to neighborhood conditional entropy. This criterion determines how nodes are selected and how coarsened graphs are constructed in the pooling layer. The resulting hierarchical structure yields an effective isomorphism-invariant representation of networked data on arbitrary topologies. The proposed strategy achieves superior or competitive performance in graph classification on a collection of public graph benchmark data sets and superpixel-induced image graph data sets.
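A toy version of a parameter-free, information-based pooling criterion, as a simplification: each node is scored by how poorly its feature is predicted by its neighbors' average, and only the top-scoring (most informative) nodes survive pooling. This is not the paper's exact entropy-based criterion, just an illustration of the selection mechanism.

```python
import numpy as np

def ipool_scores(X, A):
    """Score nodes by the residual between their feature and the
    mean feature of their neighbors: high residual = informative."""
    deg = A.sum(axis=1, keepdims=True)
    neigh_mean = (A @ X) / np.maximum(deg, 1.0)
    return np.linalg.norm(X - neigh_mean, axis=1)

def pool(X, A, ratio=0.5):
    """Keep the top `ratio` fraction of nodes and the induced subgraph."""
    k = max(1, int(len(X) * ratio))
    keep = np.argsort(-ipool_scores(X, A))[:k]
    keep.sort()
    return X[keep], A[np.ix_(keep, keep)], keep

# Demo: a 6-node ring of near-identical features plus one outlier node.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
X = np.ones((n, 2))
X[4] = [5.0, -3.0]                 # node 4 carries the information
Xp, Ap, keep = pool(X, A, ratio=0.5)
print(keep)                        # the informative node survives pooling
```

Nodes whose features are fully predictable from their neighborhood score zero and are coarsened away first, which is the intuition behind selecting nodes by neighborhood conditional information.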

8.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 463-466, 2021 11.
Article in English | MEDLINE | ID: mdl-34891333

ABSTRACT

Blood pressure (BP) is an important indicator for the prevention and management of cardiovascular diseases. Alongside improvements in sensors and wearables, photoplethysmography (PPG) appears to be a promising technology for continuous, non-invasive and cuffless BP monitoring. Previous attempts mainly focused on features extracted from the pulse morphology. In this paper, we propose to remove the feature engineering step and automatically generate features from an ensemble-average (EA) PPG pulse and its derivatives, using a convolutional neural network and a calibration measurement. We used the large VitalDB dataset to accurately evaluate the generalization capability of the proposed model. The model achieved mean errors of -0.24 ± 11.56 mmHg for SBP and -0.5 ± 6.52 mmHg for DBP. We observed a considerable reduction in the error standard deviation of over 40% compared to the control case, which assumes no BP variation. Altogether, these results highlight the capability to model the dependency between PPG and BP.


Subjects
Photoplethysmography, Pulse Wave Analysis, Blood Pressure, Blood Pressure Determination, Neural Networks, Computer
9.
Article in English | MEDLINE | ID: mdl-31403414

ABSTRACT

In this paper, we propose a new graph-based transform and illustrate its potential application to signal compression. Our approach relies on the careful design of a graph that optimizes the overall rate-distortion performance through an effective graph-based transform. We introduce a novel graph estimation algorithm, which uncovers the connectivities between the graph signal values by taking into consideration the coding of both the signal and the graph topology in rate-distortion terms. In particular, we introduce a novel coding solution for the graph by treating the edge weights as another graph signal that lies on the dual graph. The cost of the graph description is then introduced in the optimization problem by minimizing the sparsity of the coefficients of its graph Fourier transform (GFT) on the dual graph. In this way, we obtain a convex optimization problem whose solution defines an efficient transform coding strategy. The proposed technique is a general framework that can be applied to different types of signals, and we show two possible application fields, namely natural image coding and piecewise smooth image coding. The experimental results show that the proposed graph-based transform outperforms classical fixed transforms such as the DCT for both natural and piecewise smooth images. In the case of depth map coding, the obtained results are even comparable to state-of-the-art graph-based coding methods that are specifically designed for depth map images.
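The energy-compaction property that makes such transforms attractive for compression can be illustrated with a plain graph Fourier transform on a path graph; this is a generic demo, unrelated to the paper's dual-graph construction.

```python
import numpy as np

def gft(W, x):
    """Graph Fourier transform: project x onto the eigenvectors of
    the combinatorial Laplacian L = D - W (eigenvalues = frequencies)."""
    L = np.diag(W.sum(axis=1)) - W
    evals, U = np.linalg.eigh(L)          # ascending eigenvalues
    return evals, U, U.T @ x

# Path graph of 8 nodes: its GFT basis coincides with a DCT-like basis.
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

x = np.linspace(0.0, 1.0, n)              # a smooth "pixel row"
evals, U, xhat = gft(W, x)
low = np.sum(xhat[:2] ** 2) / np.sum(xhat ** 2)
print(low)                                # energy share of the 2 lowest frequencies
```

For a signal that is smooth with respect to the graph, almost all the energy lands in a few low-frequency coefficients, so few coefficients need to be coded; designing the graph (and paying for its description, as the paper does) is what makes this compaction hold for the actual signal.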

10.
IEEE Trans Image Process ; 27(9): 4207-4218, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870342

ABSTRACT

Light field cameras capture the 3D information in a scene with a single exposure. This special feature makes light field cameras very appealing for a variety of applications, from post-capture refocus to depth estimation and image-based rendering. However, light field cameras suffer by design from strong limitations in their spatial resolution. Off-the-shelf super-resolution algorithms are not ideal for light field data, as they do not consider its structure. On the other hand, the few super-resolution algorithms explicitly tailored to light field data exhibit significant limitations, such as the need to carry out a costly disparity estimation procedure with sub-pixel precision. We propose a new light field super-resolution algorithm designed to address these limitations. We use the complementary information in the different light field views to augment the spatial resolution of the whole light field at once. In particular, we show that coupling the multi-view approach with a graph-based regularizer, which enforces the light field geometric structure, avoids the need for a precise and costly disparity estimation step. Extensive experiments show that the new algorithm compares favorably to state-of-the-art methods for light field super-resolution, both in terms of visual quality and reconstruction error.
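The graph-regularized reconstruction idea can be sketched in one dimension: recover a high-resolution signal from subsampled measurements with a Laplacian smoothness prior. The sampling operator, graph, and weight `lam` are invented for the demo; the paper applies the analogous regularizer across light field views.

```python
import numpy as np

def sr_graph_reg(y, S, L, lam=0.5):
    """Super-resolve y = S x by regularized least squares:
       argmin_x ||S x - y||^2 + lam * x' L x
    whose closed-form normal equation is (S'S + lam*L) x = S'y."""
    A = S.T @ S + lam * L
    return np.linalg.solve(A, S.T @ y)

# High-res domain: 8 "pixels" on a path graph (neighbors are coupled).
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Sampling operator: the low-res observation keeps only even samples.
S = np.zeros((n // 2, n))
S[np.arange(n // 2), np.arange(0, n, 2)] = 1.0

x_true = np.linspace(0.0, 1.0, n)         # smooth ground-truth signal
x_hat = sr_graph_reg(S @ x_true, S, L, lam=0.5)
print(np.abs(x_hat - x_true).max())       # unobserved samples are filled in
```

The regularizer propagates information from observed to unobserved positions along graph edges; in the light field setting those edges encode the geometric structure between views, which is how an explicit sub-pixel disparity step is avoided.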

11.
IEEE Trans Image Process ; 26(11): 5477-5490, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28783631

ABSTRACT

We consider the synthesis of intermediate views of an object captured by two widely spaced and calibrated cameras. This problem is challenging because foreshortening effects and occlusions induce significant differences between the reference images when the cameras are far apart, making the association, disappearance, or appearance of their pixels difficult to estimate. Our main contribution lies in disambiguating this ill-posed problem by making the interpolated views consistent with a plausible transformation of the object silhouette between the reference views. This plausible transformation is derived from an object-specific prior that consists of a nonlinear shape manifold learned from multiple previous observations of the object by the two reference cameras. The prior is used to estimate the evolution of the epipolar silhouette segments between the reference views. This information directly supports the definition of epipolar silhouette segments in the intermediate views, as well as the synthesis of textures in those segments. It permits the reconstruction of the epipolar plane images (EPIs) and the continuum of views associated with the EPI volume, obtained by aggregating the EPIs. Experiments on synthetic and natural images show that our method preserves the object topology in intermediate views and deals effectively with the self-occluded regions and the severe foreshortening effect associated with wide-baseline camera configurations.

12.
IEEE Trans Image Process ; 25(4): 1808-19, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26890866

ABSTRACT

Augmented reality, interactive navigation in 3D scenes, multiview video, and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared with traditional video services. The significant increase in the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide a high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views, which are used to estimate other images when there is high correlation in the data set. In such coding schemes, two questions become fundamental: 1) how many reference views should be chosen to keep a good reconstruction quality under coding cost constraints? And 2) where should these key views be placed in the multiview data set? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between views, we formulate an optimization problem for the positioning of the reference views, such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.
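A toy dynamic program conveys the flavor of shortest-path reference selection: both the number and the positions of references fall out of a single optimization. The prediction-cost model `pred_cost` and the rate constant are invented stand-ins for the paper's similarity metric and coding costs.

```python
# Hypothetical prediction cost: distortion of predicting view v from
# reference r (a stand-in for the paper's geometry-aware similarity).
def pred_cost(r, v):
    return abs(r - v) ** 2

def select_references(n_views, ref_rate=4.0):
    """Toy DP over view positions: minimize (rate of coded references)
    + (distortion of predicting the remaining views from them).
    dist[j] = best cost of covering views 0..j with j the last reference."""
    INF = float('inf')
    dist = [INF] * n_views
    prev = [-1] * n_views
    for j in range(n_views):
        # Option: j is the first reference, predicting all earlier views.
        dist[j] = ref_rate + sum(pred_cost(j, v) for v in range(j))
        for i in range(j):
            # Views strictly between references i and j use the cheaper one.
            span = sum(min(pred_cost(i, v), pred_cost(j, v))
                       for v in range(i + 1, j))
            cost = dist[i] + ref_rate + span
            if cost < dist[j]:
                dist[j], prev[j] = cost, i
    # Close the path: trailing views are predicted from the last reference.
    best_j = min(range(n_views),
                 key=lambda j: dist[j] + sum(pred_cost(j, v)
                                             for v in range(j + 1, n_views)))
    refs, j = [], best_j
    while j != -1:
        refs.append(j)
        j = prev[j]
    return sorted(refs)

refs = select_references(9)
print(refs)
```

Raising `ref_rate` makes references more expensive and thins them out; lowering it adds references, which is exactly the rate-distortion trade-off the shortest path resolves.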

13.
IEEE Trans Image Process ; 25(4): 1765-78, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26891486

ABSTRACT

This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames share some similarities, motion estimation is key to effective compression of these sequences. It, however, remains a challenging problem as the point cloud frames have varying numbers of points without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as signals on the vertices of the graphs. We then cast motion estimation as a feature-matching problem between successive graphs. The motion is estimated on a sparse set of representative vertices using new spectral graph wavelet descriptors. A dense motion field is eventually interpolated by solving a graph-based regularization problem. The estimated motion is finally used for removing the temporal redundancy in the predictive coding of the 3D positions and the color characteristics of the point cloud sequences. Experimental results demonstrate that our method is able to accurately estimate the motion between consecutive frames. Moreover, motion estimation is shown to bring a significant improvement in terms of the overall compression performance of the sequence. To the best of our knowledge, this is the first paper that exploits both the spatial correlation inside each frame (through the graph) and the temporal correlation between the frames (through the motion estimation) to compress the color and the geometry of 3D point cloud sequences in an efficient way.
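The descriptor-matching step can be sketched with simple heat-kernel spectral descriptors standing in for spectral graph wavelets (a simplification; the graph, vertex signals, and scales are invented for the demo).

```python
import numpy as np

def heat_descriptors(W, X, scales=(0.5, 1.0, 2.0)):
    """Multi-scale spectral descriptor per vertex: the vertex signals X
    filtered by heat kernels exp(-s*L) at several scales s. This is a
    simplified stand-in for spectral graph wavelet descriptors."""
    L = np.diag(W.sum(axis=1)) - W
    evals, U = np.linalg.eigh(L)
    feats = [U @ (np.exp(-s * evals)[:, None] * (U.T @ X)) for s in scales]
    return np.concatenate(feats, axis=1)

def match(desc_a, desc_b):
    """Nearest-neighbour matching of vertices between two frames."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(3)
n = 6
W = np.zeros((n, n))                      # tiny chain graph over the points
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
X = rng.normal(size=(n, 3))               # 3D positions as vertex signals

# "Next frame": the same point cloud, but the vertices are reindexed,
# mimicking the lack of explicit correspondence between frames.
order = rng.permutation(n)
W2 = W[np.ix_(order, order)]
X2 = X[order]
matches = match(heat_descriptors(W, X), heat_descriptors(W2, X2))
print(order[matches])                     # recovered correspondence
```

Because the heat filter is a function of the Laplacian, the descriptors are permutation-equivariant, so matching them recovers the vertex correspondence; the paper then interpolates a dense motion field from such sparse matches.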

14.
IEEE Trans Image Process ; 25(1): 134-49, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26561432

ABSTRACT

In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by color and depth images of multiple nearby viewpoints. In such a representation, there exists data redundancy across multiple dimensions: 1) a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy); 2) a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy); and 3) pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction toward effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally and the decoder is left to independently recover the remaining missing data via inpainting, resulting in lower coding overhead. In particular, after pixels in a reference view are projected to a target view via depth-image-based rendering at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are sent only for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform or a sparsification procedure using the discrete cosine transform, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes the missing pixels via template-based inpainting. We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and we show experimentally that our scheme achieves up to 3 dB gain in peak signal-to-noise ratio in reconstructed image quality over a comparable 3D High Efficiency Video Coding implementation using a fixed 16×16 block size.

15.
IEEE Trans Image Process ; 24(5): 1573-86, 2015 May.
Article in English | MEDLINE | ID: mdl-25675455

ABSTRACT

In this paper, we propose a new geometry representation method for multiview image sets. Our approach relies on graphs to describe the multiview geometry information in a compact and controllable way. The links of the graph connect pixels in different images and describe the proximity between pixels in 3D space. These connections are dependent on the geometry of the scene and provide the right amount of information that is necessary for coding and reconstructing multiple views. Our multiview image representation is very compact and adapts the transmitted geometry information as a function of the complexity of the prediction performed at the decoder side. To achieve this, our graph-based representation (GBR) carefully selects the amount of geometry information needed before coding. This is in contrast with depth coding, which lossily compresses the original geometry signal, making it difficult to quantify the impact of coding errors on geometry-based interpolation. We present the principles of this GBR and build an efficient coding algorithm to represent it. We compare our GBR approach to classical depth compression methods and compare their respective view synthesis qualities as a function of the compactness of the geometry description. We show that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality. Experimental results demonstrate the potential of this new representation.

16.
IEEE Trans Image Process ; 22(9): 3513-6, 2013 Sep.
Article in English | MEDLINE | ID: mdl-24052142

ABSTRACT

A new set of three-dimensional (3D) data formats and associated compression technologies is emerging with the aim of achieving more flexible representation and higher compression of 3D and multiview video content. These new tools will facilitate the generation of multiview output (e.g., as needed for multiview auto-stereoscopic displays), provide richer immersive multimedia experiences, and allow new interactive applications. This special section includes a timely set of papers covering the most recent technical developments in this area, with topics spanning the different aspects of 3D systems, from representation and compression algorithms to rendering techniques and quality assessment. The section strikes a good balance of topics that are of interest to the academic, industrial, and standardization communities. We believe that this collection of papers represents the most recent advances in the representation, compression, rendering, and quality assessment of 3D scenes.

17.
IEEE Trans Image Process ; 22(9): 3459-72, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23797262

ABSTRACT

Enabling users to interactively navigate through different viewpoints of a static scene is an interesting new functionality in 3D streaming systems. While it opens exciting perspectives toward rich multimedia applications, it requires the design of novel representations and coding techniques to solve the new challenges imposed by interactive navigation. In particular, the encoder must prepare a priori a compressed media stream that is flexible enough to enable the free selection of multiview navigation paths by different streaming media clients. Interactivity clearly brings new design constraints: the encoder is unaware of the exact decoding process, while the decoder has to reconstruct information from incomplete subsets of data, since the server generally cannot transmit images for all possible viewpoints due to resource constraints. In this paper, we propose a novel multiview data representation that permits us to satisfy bandwidth and storage constraints in an interactive multiview streaming system. In particular, we partition the multiview navigation domain into segments, each of which is described by a reference image (color and depth data) and some auxiliary information. The auxiliary information enables the client to recreate any viewpoint in the navigation segment via view synthesis. The decoder is then able to navigate freely in the segment without further data requests to the server; it requests additional data only when it moves to a different segment. We discuss the benefits of this novel representation in interactive navigation systems and further propose a method to optimize the partitioning of the navigation domain into independent segments, under bandwidth and storage constraints. Experimental results confirm the potential of the proposed representation; namely, our system leads to similar compression performance as classical inter-view coding, while providing the high level of flexibility that is required for interactive streaming. Because of these unique properties, our new framework represents a promising solution for 3D data representation in novel interactive multimedia services.

18.
IEEE Trans Image Process ; 22(5): 1969-81, 2013 May.
Article in English | MEDLINE | ID: mdl-23335670

ABSTRACT

Distributed representation of correlated multiview images is an important problem that arises in vision sensor networks. This paper concentrates on the joint reconstruction problem, where the distributively compressed images are decoded together in order to benefit from the image correlation. We consider a scenario where the images captured at different viewpoints are encoded independently using common coding solutions (e.g., JPEG) with a balanced rate distribution among the different cameras. A central decoder first estimates the inter-view image correlation from the independently compressed data. The joint reconstruction is then cast as a constrained convex optimization problem that reconstructs total-variation (TV) smooth images complying with the estimated correlation model. At the same time, we add constraints that force the reconstructed images to be as close as possible to their compressed versions. We show through experiments that the proposed joint reconstruction scheme outperforms independent reconstruction in terms of image quality, for a given target bit rate. In addition, the decoding performance of our algorithm compares advantageously to state-of-the-art distributed coding schemes based on motion learning and on the DISCOVER algorithm.

19.
IEEE Trans Image Process ; 22(4): 1311-25, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23193457

ABSTRACT

Manifold models provide low-dimensional representations that are useful for processing and analyzing data in a transformation-invariant way. In this paper, we study the problem of learning smooth pattern transformation manifolds from image sets that represent observations of geometrically transformed signals. To construct a manifold, we build a representative pattern whose transformations accurately fit the various input images. We examine two objectives of the manifold-building problem, namely approximation and classification. For the approximation problem, we propose a greedy method that constructs a representative pattern by selecting analytic atoms from a continuous dictionary manifold. We present a DC (difference-of-convex) optimization scheme that is applicable to a wide range of transformation and dictionary models, and demonstrate its application to the transformation manifolds generated by the rotation, translation, and anisotropic scaling of a reference pattern. We then generalize this approach to a setting with multiple transformation manifolds, where each manifold represents a different class of signals. We present an iterative multiple-manifold-building algorithm in which the learning of the representative patterns promotes classification accuracy. The experimental results suggest that the proposed methods yield high accuracy in the approximation and classification of data compared with some reference methods, while invariance to geometric transformations is achieved thanks to the transformation manifold model.

20.
IEEE Trans Image Process ; 21(7): 3206-19, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22345540

ABSTRACT

This paper addresses the problem of distributed coding of images whose correlation is driven by the motion of objects or the camera positioning. It concentrates on the problem where images are encoded with compressed linear measurements. We propose a geometry-based correlation model that describes the common information in pairs of images. We assume that the constitutive components of natural images can be captured by visual features that undergo local transformations (e.g., translation) in different images. We first identify prominent visual features by computing a sparse approximation of a reference image with a dictionary of geometric basis functions. We then pose a regularized optimization problem in order to estimate the corresponding features in correlated images that are given by quantized linear measurements. The correlation model is thus given by the relative geometric transformations between corresponding features. We then propose an efficient joint decoding algorithm that reconstructs the compressed images such that they are consistent with both the quantized measurements and the correlation model. Experimental results show that the proposed algorithm effectively estimates the correlation between images in multiview data sets. In addition, the proposed algorithm provides effective decoding performance that compares advantageously to independent coding solutions and state-of-the-art distributed coding schemes based on disparity learning.
