Results 1 - 20 of 29
1.
Neuroimage ; 252: 119037, 2022 05 15.
Article in English | MEDLINE | ID: mdl-35219859

ABSTRACT

Understanding the organizational principles of human brain activity at the systems level remains a major challenge in network neuroscience. Here, we introduce a fully data-driven approach based on graph learning to extract meaningful repeating network patterns from regionally-averaged timecourses. We use the Graph Laplacian Mixture Model (GLMM), a generative model that treats functional data as a collection of signals expressed on multiple underlying graphs. By exploiting covariance between the activity of brain regions, these graphs can be learned without resorting to structural information. To validate the proposed technique, we first apply it to task fMRI with a known experimental paradigm. The probability of each graph occurring at each time point is found to be consistent with the task timing, while the spatial patterns associated with each epoch of the task are in line with activation patterns previously established using classical regression analysis. We then apply the technique to resting-state data, where the extracted graphs correspond to well-known functional brain activation patterns. The GLMM allows graphs to be learned entirely from functional activity; in practice, these graphs turn out to be highly similar to the structural connectome. The Default Mode Network (DMN) is captured by the algorithm in all tasks and in resting-state data, so we compare the states corresponding to this network with one another and with structure. Overall, this method allows us to infer relevant functional brain networks without the need for structural connectome information. Moreover, we avoid the limitations of windowing the time sequences by feeding the GLMM the whole functional signal rather than sub-portions of it.
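The idea of learning a graph from functional covariance alone can be sketched in a few lines. This is not the authors' GLMM (which fits a mixture of graphs with soft state assignments); it is a minimal single-graph stand-in that reads edge weights off the regularized precision matrix of synthetic timecourses, with the threshold `thresh` chosen arbitrarily:

```python
import numpy as np

def graph_from_timecourses(X, thresh=0.05):
    """Estimate a graph over brain regions from timecourses X (T x N).

    Minimal sketch: invert a regularized covariance of the regional signals
    and read the off-diagonal entries of the precision matrix as edge
    weights (a stand-in for the full Graph Laplacian Mixture Model).
    """
    T, N = X.shape
    C = np.cov(X, rowvar=False) + 1e-3 * np.eye(N)  # regularized covariance
    P = np.linalg.inv(C)                            # precision matrix
    W = -P.copy()                                   # partial correlations ~ edges
    np.fill_diagonal(W, 0.0)
    W[np.abs(W) < thresh] = 0.0                     # sparsify weak links
    W = np.maximum(W, 0.0)                          # keep positive weights only
    return 0.5 * (W + W.T)                          # enforce symmetry

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))                  # 200 time points, 10 regions
W = graph_from_timecourses(X)
```

By construction the result is a symmetric, non-negative adjacency matrix with an empty diagonal, i.e. a valid undirected weighted graph.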


Subject(s)
Connectome, Algorithms, Brain/diagnostic imaging, Brain/physiology, Connectome/methods, Humans, Magnetic Resonance Imaging/methods, Nerve Net/diagnostic imaging, Nerve Net/physiology
2.
Article in English | MEDLINE | ID: mdl-38743536

ABSTRACT

Deep neural networks (DNNs) provide state-of-the-art accuracy for vision tasks, but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This increases communication cost, runtime, and privacy concerns. In this study, a novel hierarchical training method for DNNs is proposed that uses early exits in an architecture divided between edge and cloud workers to reduce communication cost, training runtime, and privacy concerns. The method introduces a brand-new use case for early exits: separating the backward pass of neural networks between the edge and the cloud during the training phase. We address the shortcoming of most available methods that, due to the sequential nature of the training phase, cannot train the levels of the hierarchy simultaneously, or do so at the cost of compromising privacy. In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud, and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime for the VGG-16 and ResNet-18 architectures by 29% and 61% in CIFAR-10 classification and by 25% and 81% in Tiny ImageNet classification, respectively, when the communication with the cloud is done over a low-bit-rate channel. This runtime gain is achieved with a negligible drop in accuracy. The method is advantageous for online learning of high-accuracy DNNs on sensor-holding, low-resource devices such as mobile phones or robots in an edge-cloud system, making them more flexible in facing new tasks and classes of data.
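A toy illustration of the split-training idea, not the paper's implementation: a two-layer numpy network where the edge worker updates its layer and early-exit head from a purely local loss, while the cloud trains only its own head on the received activations, so no gradients cross the channel. All sizes, data, and learning rates below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xent(P, Y):
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

# Toy data: 2 classes, 8 features
X = rng.standard_normal((64, 8))
y = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[y]

# Edge holds the first layer W1 and the early-exit head We;
# the cloud holds its own head Wc on the transmitted activations.
W1 = 0.1 * rng.standard_normal((8, 16))
We = 0.1 * rng.standard_normal((16, 2))
Wc = 0.1 * rng.standard_normal((16, 2))
lr = 0.1

initial_loss = xent(softmax(np.maximum(X @ W1, 0.0) @ We), Y)
for _ in range(50):
    H = np.maximum(X @ W1, 0.0)                # edge forward (ReLU)
    # Edge: the local early-exit loss drives W1 and We; no cloud gradients.
    Pe = softmax(H @ We)
    Ge = (Pe - Y) / len(X)
    dH = (Ge @ We.T) * (H > 0)
    We -= lr * (H.T @ Ge)
    W1 -= lr * (X.T @ dH)
    # Cloud: trains only its own head on the received activations.
    Pc = softmax(H @ Wc)
    Wc -= lr * (H.T @ ((Pc - Y) / len(X)))

final_loss = xent(softmax(np.maximum(X @ W1, 0.0) @ We), Y)
```

The point of the sketch is structural: the edge's backward pass stops at the early exit, so the uplink only ever carries activations, never raw inputs or gradients.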

3.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6372-6385, 2023 May.
Article in English | MEDLINE | ID: mdl-36112555

ABSTRACT

Stereo confidence estimation aims to estimate the reliability of the disparity estimated by stereo matching. Unlike previous methods that exploit a limited input modality, we present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input (matching cost, disparity, and color image) through deep networks. The proposed network, termed Locally Adaptive Fusion Network (LAF-Net), learns locally-varying attention and scale maps to fuse the tri-modal confidence features. Moreover, we propose a knowledge distillation framework to learn more compact confidence estimation networks as student networks. By transferring knowledge from LAF-Net as the teacher network, student networks that take only a disparity as input can achieve comparable performance. To transfer more informative knowledge, we also propose a module that learns a locally-varying temperature in the softmax function. We further extend this framework to a multiview scenario. Experimental results show that LAF-Net and its variations outperform state-of-the-art stereo confidence methods on various benchmarks.
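Temperature-scaled distillation, the building block behind the learned-temperature module, can be sketched as follows. This is generic distillation math, not LAF-Net itself, and the locally-varying temperature is reduced to a single scalar `T` here:

```python
import numpy as np

def softmax_T(logits, T):
    """Softmax with temperature T (higher T gives a softer distribution)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T):
    """KL(teacher || student), both softened by the same temperature T."""
    p = softmax_T(teacher_logits, T)
    q = softmax_T(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

teacher = np.array([[4.0, 1.0, 0.5]])   # hypothetical teacher logits
student = np.array([[3.0, 1.5, 0.2]])   # hypothetical student logits
sharp = softmax_T(teacher, 1.0)
soft = softmax_T(teacher, 4.0)          # softer targets carry more "dark knowledge"
```

Raising T flattens the teacher distribution, which is what lets the student learn from the relative ordering of the non-maximal classes; learning T per location, as the paper proposes, adapts this softness spatially.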

4.
PLoS One ; 18(2): e0279419, 2023.
Article in English | MEDLINE | ID: mdl-36735652

ABSTRACT

Blood pressure (BP) is a crucial biomarker that provides valuable information on cardiovascular diseases but requires accurate continuous monitoring to maximize its value. In the effort to develop non-invasive, non-occlusive and continuous BP monitoring devices, photoplethysmography (PPG) has recently gained interest. Researchers have attempted to estimate BP from the analysis of PPG waveform morphology, with promising results, yet often validated on a small number of subjects with moderate BP variations. This work presents an accurate BP estimator based on PPG morphology features. The method first uses a clinically-validated algorithm (oBPM®) to perform signal preprocessing and extraction of physiological features. A subset of features that best reflects BP changes is automatically identified by Lasso regression, and a feature relevance analysis is conducted. Three machine learning (ML) methods are then investigated to translate this subset of features into systolic BP (SBP) and diastolic BP (DBP) estimates, namely Lasso regression, support vector regression and Gaussian process regression. The accuracy of absolute BP estimates and the trending ability are evaluated. Such an approach considerably improves the performance of SBP estimation over the previous oBPM® technology, with a reduction of over 20% in the standard deviation of the error. Furthermore, rapid BP changes assessed by the PPG-based approach demonstrate a concordance rate of over 99% with the invasive reference. Altogether, the results confirm that PPG morphology features can be combined with ML methods to accurately track BP variations generated during anesthesia induction. They also reinforce the importance of adding a calibration measure to obtain an absolute BP estimate.
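The Lasso-based feature selection step can be illustrated with a small coordinate-descent implementation on synthetic data, where two of ten standardized features truly drive the target (a stand-in for SBP). This is generic Lasso, not the oBPM® pipeline, and the regularization strength is arbitrary:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (columns of X standardized)."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]       # partial residual
            rho = X[:, j] @ r / n
            # Soft-thresholding gives the coordinate-wise minimizer.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)
    return w

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)                   # standardize features
# Only features 0 and 3 truly drive the target.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n)
w = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(np.abs(w) > 1e-8)
```

With sufficient regularization the spurious features are shrunk exactly to zero, which is the property the paper exploits to pick the PPG morphology features that best reflect BP changes.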


Asunto(s)
Determinación de la Presión Sanguínea , Fotopletismografía , Humanos , Presión Sanguínea/fisiología , Fotopletismografía/métodos , Determinación de la Presión Sanguínea/métodos , Aprendizaje Automático , Anestesia General
5.
Open Heart ; 10(1)2023 01.
Article in English | MEDLINE | ID: mdl-36596624

ABSTRACT

BACKGROUND: Angiographic parameters can facilitate the risk stratification of coronary lesions but remain insufficient in the prediction of future myocardial infarction (MI). AIMS: We compared the ability of humans, angiographic parameters and deep learning (DL) to predict the lesion that would be responsible for a future MI in a population of patients with non-significant CAD at baseline. METHODS: We retrospectively included patients who underwent invasive coronary angiography (ICA) for MI, in whom a previous angiogram had been performed within 5 years. The ability of human visual assessment, diameter stenosis, area stenosis, quantitative flow ratio (QFR) and DL to predict the future culprit lesion (FCL) was compared. RESULTS: In total, 746 cropped ICA images of FCL and non-culprit lesions (NCL) were analysed. Predictive models for each modality were developed in a training set before validation in a test set. DL exhibited the best predictive performance with an area under the curve of 0.81, compared with diameter stenosis (0.62, p=0.04), area stenosis (0.58, p=0.05) and QFR (0.67, p=0.13). DL exhibited a significant net reclassification improvement (NRI) compared with area stenosis (0.75, p=0.03) and QFR (0.95, p=0.01), and a positive nonsignificant NRI when compared with diameter stenosis. Among all models, DL demonstrated the highest accuracy (0.78) followed by QFR (0.70) and area stenosis (0.68). Predictions based on human visual assessment and diameter stenosis had the lowest accuracy (0.58). CONCLUSION: In this feasibility study, DL outperformed human visual assessment and established angiographic parameters in the prediction of FCLs. Larger studies are now required to confirm this finding.


Asunto(s)
Estenosis Coronaria , Aprendizaje Profundo , Reserva del Flujo Fraccional Miocárdico , Infarto del Miocardio , Humanos , Estenosis Coronaria/diagnóstico por imagen , Angiografía Coronaria/métodos , Constricción Patológica , Estudios de Factibilidad , Estudios Retrospectivos , Vasos Coronarios , Infarto del Miocardio/diagnóstico por imagen
6.
IEEE Trans Neural Netw Learn Syst ; 33(9): 5032-5044, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33788695

ABSTRACT

With the advent of data science, the analysis of network or graph data has become a very timely research problem. A variety of recent works have proposed to generalize neural networks to graphs, either from a spectral graph theory or a spatial perspective. The majority of these works, however, focus on adapting the convolution operator to graph representations. At the same time, the pooling operator also plays an important role in distilling multiscale and hierarchical representations, but it has been mostly overlooked so far. In this article, we propose a parameter-free pooling operator, called iPool, that retains the most informative features in arbitrary graphs. On the premise that informative nodes dominantly characterize graph signals, we propose a criterion to evaluate the amount of information of each node given its neighbors, and theoretically demonstrate its relationship to neighborhood conditional entropy. This new criterion determines how nodes are selected and how coarsened graphs are constructed in the pooling layer. The resulting hierarchical structure yields an effective isomorphism-invariant representation of networked data on arbitrary topologies. The proposed strategy achieves superior or competitive performance in graph classification on a collection of public graph benchmark data sets and superpixel-induced image graph data sets.
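A rough sketch of informativeness-based pooling in the spirit of iPool, not the paper's exact criterion: score each node by how poorly its neighbors' average predicts its signal, keep the top-k nodes, and induce the coarsened graph on the kept nodes:

```python
import numpy as np

def ipool_like(W, x, k):
    """Parameter-free pooling sketch: keep the k nodes whose signal deviates
    most from their neighborhood average (an informativeness proxy), then
    induce the coarsened graph on the kept nodes."""
    d = W.sum(axis=1)
    A = W / np.maximum(d, 1e-12)[:, None]     # row-normalized adjacency
    score = np.abs(x - A @ x)                 # deviation from neighbor average
    keep = np.sort(np.argsort(-score)[:k])    # top-k, in original node order
    return keep, W[np.ix_(keep, keep)], x[keep]

# Ring graph of 6 nodes; node 2 carries an outlier value.
W = np.zeros((6, 6))
for i in range(6):
    W[i, (i + 1) % 6] = W[(i + 1) % 6, i] = 1.0
x = np.array([1.0, 1.0, 5.0, 1.0, 1.0, 1.0])
keep, Wp, xp = ipool_like(W, x, k=3)
```

The outlier node and its immediate neighbors score highest, so the pooled graph concentrates on the region where the signal is least predictable, which is the intuition behind the paper's entropy-based criterion.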

7.
IEEE Trans Image Process ; 31: 5813-5827, 2022.
Article in English | MEDLINE | ID: mdl-36054397

ABSTRACT

State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and the low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to a better compression gain, saving 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. These results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.

8.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 463-466, 2021 11.
Article in English | MEDLINE | ID: mdl-34891333

ABSTRACT

Blood pressure (BP) is an important indicator for the prevention and management of cardiovascular diseases. Alongside improvements in sensors and wearables, photoplethysmography (PPG) appears to be a promising technology for continuous, non-invasive and cuffless BP monitoring. Previous attempts mainly focused on features extracted from the pulse morphology. In this paper, we propose to remove the feature engineering step and automatically generate features from an ensemble-averaged (EA) PPG pulse and its derivatives, using a convolutional neural network and a calibration measurement. We used the large VitalDB dataset to accurately evaluate the generalization capability of the proposed model. The model achieved mean errors of -0.24 ± 11.56 mmHg for SBP and -0.5 ± 6.52 mmHg for DBP. We observed a considerable reduction in error standard deviation of over 40% compared to the control case, which assumes no BP variation. Altogether, these results highlight the capability to model the dependency between PPG and BP.


Asunto(s)
Fotopletismografía , Análisis de la Onda del Pulso , Presión Sanguínea , Determinación de la Presión Sanguínea , Redes Neurales de la Computación
9.
IEEE Trans Pattern Anal Mach Intell ; 31(7): 1225-38, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19443921

ABSTRACT

Transformation invariance is an important property in pattern recognition, where different observations of the same object typically receive the same label. This paper focuses on a transformation-invariant distance measure that represents the minimum distance between the transformation manifolds spanned by patterns of interest. Since these manifolds are typically nonlinear, the computation of the manifold distance (MD) becomes a nonconvex optimization problem. We propose representing a pattern of interest as a linear combination of a few geometric functions extracted from a structured and redundant basis. Transforming the pattern results in the transformation of its constituent parts. We show that, when the transformation is restricted to a synthesis of translations, rotations, and isotropic scalings, such a pattern representation results in a closed-form expression of the manifold equation with respect to the transformation parameters. The MD computation can then be formulated as a minimization problem whose objective function is expressed as the difference of convex functions (DC). This interesting property permits optimally solving the optimization problem with DC programming solvers that are globally convergent. We present experimental evidence which shows that our method is able to find the globally optimal solution, outperforming existing methods that yield suboptimal solutions.
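The DC-programming machinery can be illustrated on a one-dimensional toy objective rather than the manifold-distance problem itself: f(x) = x⁴ - 2x² decomposes as g - h with g(x) = x⁴ and h(x) = 2x² both convex, and each DCA step linearizes h at the current iterate and minimizes the resulting convex surrogate in closed form:

```python
import numpy as np

def dca_double_well(x0, n_iter=30):
    """DCA on f(x) = x^4 - 2x^2 = g(x) - h(x), g(x) = x^4, h(x) = 2x^2.

    Each iteration minimizes g(x) - h'(x_k) * x, i.e. x^4 - 4*x_k*x, whose
    stationarity condition 4x^3 = 4x_k gives x_{k+1} = cbrt(x_k)."""
    x = x0
    for _ in range(n_iter):
        x = np.cbrt(x)
    return x

x_star = dca_double_well(0.5)     # converges to the global minimizer x = 1
```

The iterates converge monotonically to a minimizer of the double well (x = 1 from a positive start, x = -1 from a negative one), illustrating why DC solvers are attractive for the nonconvex manifold-distance objective.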


Asunto(s)
Algoritmos , Inteligencia Artificial , Interpretación de Imagen Asistida por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Técnica de Sustracción , Simulación por Computador , Aumento de la Imagen/métodos , Modelos Estadísticos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad
10.
Article in English | MEDLINE | ID: mdl-31403414

ABSTRACT

In this paper, we propose a new graph-based transform and illustrate its potential application to signal compression. Our approach relies on the careful design of a graph that optimizes the overall rate-distortion performance through an effective graph-based transform. We introduce a novel graph estimation algorithm, which uncovers the connectivities between the graph signal values by taking into consideration the coding of both the signal and the graph topology in rate-distortion terms. In particular, we introduce a novel coding solution for the graph by treating the edge weights as another graph signal that lies on the dual graph. The cost of the graph description is then introduced into the optimization problem by minimizing the sparsity of the coefficients of its graph Fourier transform (GFT) on the dual graph. In this way, we obtain a convex optimization problem whose solution defines an efficient transform coding strategy. The proposed technique is a general framework that can be applied to different types of signals, and we show two possible application fields, namely natural image coding and piecewise smooth image coding. The experimental results show that the proposed graph-based transform outperforms classical fixed transforms such as the DCT for both natural and piecewise smooth images. In the case of depth map coding, the obtained results are even comparable to those of state-of-the-art graph-based coding methods that are specifically designed for depth map images.
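The graph Fourier transform at the heart of the method is simply projection on the Laplacian eigenbasis. A minimal sketch on a path graph shows the energy compaction that makes such transforms attractive for coding (illustrative only, not the paper's rate-distortion-optimized graph):

```python
import numpy as np

def gft(W, x):
    """Graph Fourier transform: project x on the eigenvectors of the
    combinatorial Laplacian L = D - W (eigenvalues sorted ascending)."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    return U, U.T @ x

# Path graph of 8 vertices and a signal that is smooth along the path.
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = np.linspace(0.0, 1.0, n)
U, xhat = gft(W, x)
x_rec = U @ xhat                                  # inverse GFT
low_energy = np.sum(xhat[:2] ** 2) / np.sum(xhat ** 2)
```

For a graph-smooth signal, almost all the energy lands in the first few GFT coefficients, which is exactly what a transform coder wants to exploit.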

11.
IEEE Trans Image Process ; 17(7): 1033-46, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18586613

ABSTRACT

This paper addresses the problem of efficient representation of scenes captured by distributed omnidirectional vision sensors. We propose a novel geometric model to describe the correlation between different views of a 3-D scene. We first approximate the camera images by sparse expansions over a dictionary of geometric atoms. Since the most important visual features are likely to be equivalently dominant in images from multiple cameras, we model the correlation between corresponding features in different views by local geometric transforms. For the particular case of omnidirectional images, we define the multiview transforms between corresponding features based on shape and epipolar geometry constraints. We apply this geometric framework in the design of a distributed coding scheme with side information, which builds an efficient representation of the scene without communication between cameras. The Wyner-Ziv encoder partitions the dictionary into cosets of dissimilar atoms with respect to shape and position in the image. The joint decoder then determines pairwise correspondences between atoms in the reference image and atoms in the cosets of the Wyner-Ziv image in order to identify the most likely atoms to decode under epipolar geometry constraints. Experiments demonstrate that the proposed method leads to reliable estimation of the geometric transforms between views. In particular, the distributed coding scheme offers similar rate-distortion performance as joint encoding at low bit rate and outperforms methods based on independent decoding of the different images.


Asunto(s)
Algoritmos , Inteligencia Artificial , Aumento de la Imagen/métodos , Interpretación de Imagen Asistida por Computador/métodos , Imagenología Tridimensional/métodos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad , Transductores
12.
IEEE Trans Image Process ; 27(9): 4207-4218, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870342

ABSTRACT

Light field cameras capture the 3D information in a scene with a single exposure. This special feature makes light field cameras very appealing for a variety of applications: from post-capture refocus to depth estimation and image-based rendering. However, light field cameras suffer by design from strong limitations in their spatial resolution. Off-the-shelf super-resolution algorithms are not ideal for light field data, as they do not consider its structure. On the other hand, the few super-resolution algorithms explicitly tailored for light field data exhibit significant limitations, such as the need to carry out a costly disparity estimation procedure with sub-pixel precision. We propose a new light field super-resolution algorithm meant to address these limitations. We use the complementary information in the different light field views to augment the spatial resolution of the whole light field at once. In particular, we show that coupling the multi-view approach with a graph-based regularizer, which enforces the light field geometric structure, avoids the need for a precise and costly disparity estimation step. Extensive experiments show that the new algorithm compares favorably to state-of-the-art methods for light field super-resolution, both in terms of visual quality and in terms of reconstruction error.

13.
IEEE Trans Image Process ; 26(11): 5477-5490, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28783631

ABSTRACT

We consider the synthesis of intermediate views of an object captured by two widely spaced and calibrated cameras. This problem is challenging because foreshortening effects and occlusions induce significant differences between the reference images when the cameras are far apart, which makes the association and the disappearance/appearance of their pixels difficult to estimate. Our main contribution lies in disambiguating this ill-posed problem by making the interpolated views consistent with a plausible transformation of the object silhouette between the reference views. This plausible transformation is derived from an object-specific prior that consists of a nonlinear shape manifold learned from multiple previous observations of this object by the two reference cameras. The prior is used to estimate the evolution of the epipolar silhouette segments between the reference views. This information directly supports the definition of epipolar silhouette segments in the intermediate views, as well as the synthesis of textures in those segments. It permits the reconstruction of the epipolar plane images (EPIs) and the continuum of views associated with the EPI volume, obtained by aggregating the EPIs. Experiments on synthetic and natural images show that our method preserves the object topology in intermediate views and deals effectively with the self-occluded regions and the severe foreshortening effect associated with wide-baseline camera configurations.

14.
IEEE Trans Image Process ; 15(3): 726-39, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16519358

ABSTRACT

New breakthroughs in image coding possibly lie in signal decomposition through nonseparable basis functions that can efficiently capture the edge characteristics present in natural images. The work proposed in this paper provides an adaptive way of representing images as a sum of two-dimensional features. It presents a low bit-rate image coding method based on a matching pursuit (MP) expansion over a dictionary built on anisotropic refinement and rotation of contour-like atoms. At low bit rates, this method is shown to provide results comparable to the state of the art in image compression, represented here by JPEG2000 and SPIHT, with generally better visual quality in the MP scheme. The coding artifacts are less annoying than the ringing introduced by wavelets at very low bit rates, thanks to the smoothing performed by the basis functions used in the MP algorithm. In addition to good compression performance at low bit rates, the new coder has the advantage of producing highly flexible streams. They can easily be decoded at any spatial resolution, different from the original image, and the bitstream can be truncated at any point to match diverse bandwidth requirements. The spatial adaptivity is shown to be more flexible and less complex than the transcoding operations generally applied to state-of-the-art codec bitstreams. Thanks both to its ability to capture the most important parts of multidimensional signals and to its flexible stream structure, the image coder proposed in this paper represents an interesting solution for low to medium rate image coding in visual communication applications.
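The matching pursuit expansion itself is easy to sketch. The following uses a random unit-norm dictionary instead of the paper's anisotropically refined contour atoms:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: repeatedly pick the unit-norm atom most correlated with
    the residual and subtract its contribution."""
    residual = signal.copy()
    coeffs = []
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        j = int(np.argmax(np.abs(corr)))
        c = float(corr[j])
        residual = residual - c * dictionary[:, j]
        coeffs.append((j, c))
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 3.0 * D[:, 5] - 2.0 * D[:, 40]      # a 2-sparse signal in the dictionary
coeffs, r = matching_pursuit(x, D, n_atoms=10)
```

Each iteration strictly reduces the residual energy, and truncating the list of (atom, coefficient) pairs at any point yields a coarser but valid approximation, which is the property behind the coder's truncatable bitstream.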


Asunto(s)
Algoritmos , Gráficos por Computador , Compresión de Datos/métodos , Aumento de la Imagen/métodos , Interpretación de Imagen Asistida por Computador/métodos , Procesamiento de Señales Asistido por Computador , Redes de Comunicación de Computadores
15.
IEEE Trans Image Process ; 25(1): 134-49, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26561432

ABSTRACT

In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by color and depth images of multiple nearby viewpoints. In such representation, there exists data redundancy across multiple dimensions: 1) a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy); 2) a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy); and 3) pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction toward effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover remaining missing data via inpainting, resulting in lower coding overhead. In particular, after pixels in a reference view are projected to a target view via depth-image-based rendering at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are only sent for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform or a sparsification procedure using discrete cosine transform, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting. 
We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in peak signal-to-noise ratio in reconstructed image quality over a comparable 3D-High Efficiency Video Coding implementation using a fixed 16 × 16 block size.

16.
IEEE Trans Image Process ; 25(4): 1765-78, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26891486

ABSTRACT

This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames share some similarities, motion estimation is key to effective compression of these sequences. It, however, remains a challenging problem as the point cloud frames have varying numbers of points without explicit correspondence information. We represent the time-varying geometry of these sequences with a set of graphs, and consider 3D positions and color attributes of the point clouds as signals on the vertices of the graphs. We then cast motion estimation as a feature-matching problem between successive graphs. The motion is estimated on a sparse set of representative vertices using new spectral graph wavelet descriptors. A dense motion field is eventually interpolated by solving a graph-based regularization problem. The estimated motion is finally used for removing the temporal redundancy in the predictive coding of the 3D positions and the color characteristics of the point cloud sequences. Experimental results demonstrate that our method is able to accurately estimate the motion between consecutive frames. Moreover, motion estimation is shown to bring a significant improvement in terms of the overall compression performance of the sequence. To the best of our knowledge, this is the first paper that exploits both the spatial correlation inside each frame (through the graph) and the temporal correlation between the frames (through the motion estimation) to compress the color and the geometry of 3D point cloud sequences in an efficient way.
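The dense-motion interpolation step, solving a graph-based regularization problem from sparse estimates, can be sketched as a Laplacian-regularized least-squares solve. This is generic graph Tikhonov interpolation, not the paper's exact formulation, with an arbitrary regularization weight:

```python
import numpy as np

def interpolate_on_graph(W, known_idx, known_val, lam=0.1):
    """Densify a signal (e.g., one motion component) known only on a sparse
    vertex set, by minimizing
        ||m[known] - known_val||^2 + lam * m^T L m
    which reduces to one linear solve."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    M = np.zeros(n)
    M[known_idx] = 1.0                  # selection of observed vertices
    b = np.zeros(n)
    b[known_idx] = known_val
    return np.linalg.solve(np.diag(M) + lam * L, b)

# Path graph of 6 vertices; motion known only at the two end vertices.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
m = interpolate_on_graph(W, [0, 5], [0.0, 5.0])
```

The solution honors the sparse observations while spreading a smooth motion field over the remaining vertices, the same role the regularizer plays between the paper's representative vertices.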

17.
IEEE Trans Image Process ; 25(4): 1808-19, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26890866

ABSTRACT

Augmented reality, interactive navigation in 3D scenes, multiview video, and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared with traditional video services. The significant increase in the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views, which are used to estimate other images when there is high correlation in the data set. In such coding schemes, two questions become fundamental: 1) how many reference views have to be chosen to keep good reconstruction quality under coding cost constraints? And 2) where should these key views be placed in the multiview data set? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views, such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.
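The shortest-path formulation can be sketched on a 1D camera arrangement with a toy distortion model. Here `ref_cost` and `dist_fn` are invented stand-ins for the paper's rate and similarity-based distortion terms, and the first and last views are forced to be references (an assumed simplification):

```python
def select_reference_views(n_views, ref_cost, dist_fn):
    """Choose reference views by a shortest path over the ordered view set.
    An edge (i, j) means 'i and j are consecutive references'; its cost is
    the coding cost of reference j plus the synthesis distortion of the
    views in between. Views 0 and n_views-1 are forced references."""
    INF = float("inf")
    best = [INF] * n_views
    prev = [-1] * n_views
    best[0] = ref_cost
    for j in range(1, n_views):
        for i in range(j):
            c = best[i] + ref_cost + sum(dist_fn(k, i, j) for k in range(i + 1, j))
            if c < best[j]:
                best[j], prev[j] = c, i
    refs = [n_views - 1]
    while refs[-1] != 0:
        refs.append(prev[refs[-1]])
    return refs[::-1], best[n_views - 1]

# Toy model: synthesis distortion grows with distance to the nearest reference.
dist = lambda k, i, j: min(k - i, j - k) ** 2
refs, cost = select_reference_views(7, ref_cost=3.0, dist_fn=dist)
```

The path through the view set simultaneously fixes how many references are used and where they sit: with seven views and these toy costs, adding a single middle reference beats both extremes (references everywhere, or only at the ends).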

18.
IEEE Trans Image Process ; 24(5): 1573-86, 2015 May.
Article in English | MEDLINE | ID: mdl-25675455

ABSTRACT

In this paper, we propose a new geometry representation method for multiview image sets. Our approach relies on graphs to describe the multiview geometry information in a compact and controllable way. The links of the graph connect pixels in different images and describe the proximity between pixels in 3D space. These connections depend on the geometry of the scene and provide exactly the amount of information that is necessary for coding and reconstructing multiple views. Our multiview image representation is very compact and adapts the transmitted geometry information as a function of the complexity of the prediction performed at the decoder side. To achieve this, our graph-based representation (GBR) carefully selects the amount of geometry information needed before coding. This is in contrast with depth coding, which directly applies lossy compression to the original geometry signal, thus making it difficult to quantify the impact of coding errors on geometry-based interpolation. We present the principles of this GBR and build an efficient coding algorithm to represent it. We compare our GBR approach to classical depth compression methods and compare their respective view synthesis qualities as a function of the compactness of the geometry description. We show that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality. Experimental results demonstrate the potential of this new representation.

19.
IEEE Trans Image Process ; 22(4): 1311-25, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23193457

ABSTRACT

Manifold models provide low-dimensional representations that are useful for processing and analyzing data in a transformation-invariant way. In this paper, we study the problem of learning smooth pattern transformation manifolds from image sets that represent observations of geometrically transformed signals. To construct a manifold, we build a representative pattern whose transformations accurately fit various input images. We examine two objectives of the manifold-building problem, namely approximation and classification. For the approximation problem, we propose a greedy method that constructs a representative pattern by selecting analytic atoms from a continuous dictionary manifold. We present a DC (difference-of-convex) optimization scheme that is applicable to a wide range of transformation and dictionary models, and demonstrate its application to the transformation manifolds generated by the rotation, translation, and anisotropic scaling of a reference pattern. Then, we generalize this approach to a setting with multiple transformation manifolds, where each manifold represents a different class of signals. We present an iterative multiple-manifold-building algorithm such that classification accuracy is promoted in the learning of the representative patterns. The experimental results suggest that the proposed methods yield high accuracy in the approximation and classification of data compared with some reference methods, while invariance to geometric transformations is achieved thanks to the transformation manifold model.

20.
IEEE Trans Image Process ; 22(5): 1969-81, 2013 May.
Article in English | MEDLINE | ID: mdl-23335670

ABSTRACT

Distributed representation of correlated multiview images is an important problem that arises in vision sensor networks. This paper concentrates on the joint reconstruction problem, where the distributively compressed images are decoded together in order to take advantage of the image correlation. We consider a scenario where the images captured at different viewpoints are encoded independently using common coding solutions (e.g., JPEG) with a balanced rate distribution among the different cameras. A central decoder first estimates the inter-view image correlation from the independently compressed data. The joint reconstruction is then cast as a constrained convex optimization problem that reconstructs total-variation (TV) smooth images, which comply with the estimated correlation model. At the same time, we add constraints that force the reconstructed images to be as close as possible to their compressed versions. We show through experiments that the proposed joint reconstruction scheme outperforms independent reconstruction in terms of image quality, for a given target bit rate. In addition, the decoding performance of our algorithm compares advantageously to state-of-the-art distributed coding schemes based on motion learning and on the DISCOVER algorithm.
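TV-smooth reconstruction under a data-fidelity term can be sketched with gradient descent on a smoothed (Charbonnier-type) TV penalty. This is a generic single-image TV denoiser, not the paper's constrained multiview formulation, with rough boundary handling and arbitrary parameters:

```python
import numpy as np

def tv_denoise(data, lam=0.1, eps=1e-2, lr=0.05, n_iter=300):
    """Gradient descent on 0.5*||x - data||^2 + lam * sum sqrt(|grad x|^2 + eps),
    a smoothed total-variation objective (boundaries handled approximately)."""
    x = data.copy()
    for _ in range(n_iter):
        dx = np.diff(x, axis=1, append=x[:, -1:])   # forward differences
        dy = np.diff(x, axis=0, append=x[-1:, :])
        mag = np.sqrt(dx**2 + dy**2 + eps)
        px, py = dx / mag, dy / mag                 # normalized gradient field
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        x -= lr * ((x - data) - lam * div)          # fidelity + TV gradient step
    return x

def tv_value(x):
    return np.abs(np.diff(x, axis=1)).sum() + np.abs(np.diff(x, axis=0)).sum()

rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 1.0                                  # piecewise-constant image
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
out = tv_denoise(noisy)
```

The output stays close to the observed data while its total variation drops, which is the same trade-off the joint reconstruction enforces across compressed views.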
