Búsqueda | BVS Bolivia

1.

Language-Aware Vision Transformer for Referring Segmentation.

Yang, Zhao; Wang, Jiaqi; Ye, Xubing; Tang, Yansong; Chen, Kai; Zhao, Hengshuang; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Sep 25.

Artículo en Inglés | MEDLINE | ID: mdl-39321010

RESUMEN

Referring segmentation is a fundamental vision-language task that aims to segment out an object from an image or video in accordance with a natural language description. One of the key challenges behind this task is leveraging the referring expression for highlighting relevant positions in the image or video frames. A paradigm for tackling this problem in both the image and the video domains is to leverage a powerful vision-language ("cross-modal") decoder to fuse features independently extracted from a vision encoder and a language encoder. Recent methods have made remarkable advances in this paradigm by exploiting Transformers as cross-modal decoders, concurrent to the Transformer's overwhelming success in many other vision-language tasks. Adopting a different approach in this work, we show that significantly better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in intermediate layers of a vision Transformer encoder network. Based on the idea of conducting cross-modal feature fusion in the visual feature encoding stage, we propose a unified framework named Language-Aware Vision Transformer (LAVT), which leverages the well-proven correlation modeling power of a Transformer encoder for excavating helpful multi-modal context. This way, accurate segmentation results can be harvested with a light-weight mask predictor. One of the key components in the proposed system is a dense attention mechanism for collecting pixel-specific linguistic cues. When dealing with video inputs, we present the video LAVT framework and design a 3D version of this component by introducing multi-scale convolutional operators arranged in a parallel fashion, which can exploit spatio-temporal dependencies at different granularity levels. We further introduce unified LAVT as a unified framework capable of handling both image and video inputs, with enhanced segmentation capabilities for the unified referring segmentation task. Our methods surpass previous state-of-the-art methods on seven benchmarks for referring image segmentation and referring video segmentation. The code to reproduce our experiments is available at LAVT-RS.

2.

Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning.

Xue, Nan; Wu, Tianfu; Bai, Song; Wang, Fu-Dong; Xia, Gui-Song; Zhang, Liangpei; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 14727-14744, 2023 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-37676811

RESUMEN

This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions. HAWP utilizes a parsimonious Holistic Attraction (HAT) field representation that encodes line segments using a closed-form 4D geometric vector field. The proposed HAWP consists of three sequential components empowered by end-to-end and HAT-driven designs: 1) generating a dense set of line segments from HAT fields and endpoint proposals from heatmaps, 2) binding the dense line segments to sparse endpoint proposals to produce initial wireframes, and 3) filtering false positive proposals through a novel endpoint-decoupled line-of-interest aligning (EPD LOIAlign) module that captures the co-occurrence between endpoint proposals and HAT fields for better verification. Thanks to our novel designs, HAWPv2 shows strong performance in fully supervised learning, while HAWPv3 excels in self-supervised learning, achieving superior repeatability scores and efficient training (24 GPU hours on a single GPU). Furthermore, HAWPv3 exhibits a promising potential for wireframe parsing in out-of-distribution images without providing ground truth labels of wireframes.

3.

SiamMask: A Framework for Fast Online Object Tracking and Segmentation.

Hu, Weiming; Wang, Qiang; Zhang, Li; Bertinetto, Luca; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3072-3089, 2023 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-37022470

RESUMEN

In this article, we introduce SiamMask, a framework to perform both visual object tracking and video object segmentation, in real-time, with the same simple method. We improve the offline training procedure of popular fully-convolutional Siamese approaches by augmenting their losses with a binary segmentation task. Once the offline training is completed, SiamMask only requires a single bounding box for initialization and can simultaneously carry out visual object tracking and segmentation at high frame-rates. Moreover, we show that it is possible to extend the framework to handle multiple object tracking and segmentation by simply re-using the multi-task model in a cascaded fashion. Experimental results show that our approach has high processing efficiency, at around 55 frames per second. It yields real-time state-of-the art results on visual-object tracking benchmarks, while at the same time demonstrating competitive performance at a high speed for video object segmentation benchmarks.

4.

Patch-Based Separable Transformer for Visual Recognition.

Sun, Shuyang; Yue, Xiaoyu; Zhao, Hengshuang; Torr, Philip H S; Bai, Song.

IEEE Trans Pattern Anal Mach Intell ; 45(7): 9241-9247, 2023 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-37015401

RESUMEN

The computational complexity of transformers limits it to be widely deployed onto frameworks for visual recognition. Recent work Dosovitskiy et al. 2021 significantly accelerates the network processing speed by reducing the resolution at the beginning of the network, however, it is still hard to be directly generalized onto other downstream tasks e.g.object detection and segmentation like CNN. In this paper, we present a transformer-based architecture retaining both the local and global interactions within the network, and can be transferable to other downstream tasks. The proposed architecture reforms the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. Such factorization saves the computational cost while retaining the information of different granularities, which helps generate multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms the previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks including image classification, object detection and instance segmentation.1.

5.

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation.

Tang, Hao; Torr, Philip H S; Sebe, Nicu.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 6055-6071, 2023 May.

Artículo en Inglés | MEDLINE | ID: mdl-36215369

RESUMEN

We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation, where we translate an input image into another while respecting an external semantic guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance information and consists of two stages. In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results by using the proposed multi-scale spatial pooling & channel selection module and the multi-channel attention selection module. Moreover, uncertainty maps automatically learned from attention maps are used to guide the pixel loss for better network optimization. Exhaustive experiments on four challenging guided image-to-image translation tasks (face, hand, body, and street view) demonstrate that our SelectionGAN is able to generate significantly better results than the state-of-the-art methods. Meanwhile, the proposed framework and modules are unified solutions and can be applied to solve other generation tasks such as semantic image synthesis. The code is available at https://github.com/Ha0Tang/SelectionGAN.

6.

Local and Global GANs With Semantic-Aware Upsampling for Image Generation.

Tang, Hao; Shao, Ling; Torr, Philip H S; Sebe, Nicu.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 768-784, 2023 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-35263249

RESUMEN

In this paper, we address the task of semantic-guided image generation. One challenge common to most existing image-level generation methods is the difficulty in generating small objects and detailed local textures. To address this, in this work we consider generating images using local context. As such, we design a local class-specific generative network using semantic maps as guidance, which separately constructs and learns subgenerators for different classes, enabling it to capture finer details. To learn more discriminative class-specific feature representations for the local generation, we also propose a novel classification module. To combine the advantages of both global image-level and local class-specific generation, a joint generation network is designed with an attention fusion module and a dual-discriminator structure embedded. Lastly, we propose a novel semantic-aware upsampling method, which has a larger receptive field and can take far-away pixels that are semantically related for feature upsampling, enabling it to better preserve semantic consistency for instances with the same semantic labels. Extensive experiments on two image generation tasks show the superior performance of the proposed method. State-of-the-art results are established by large margins on both tasks and on nine challenging public benchmarks. The source code and trained models are available at https://github.com/Ha0Tang/LGGAN.

7.

Dynamic Graph Message Passing Networks.

Zhang, Li; Chen, Mohan; Arnab, Anurag; Xue, Xiangyang; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 5712-5730, 2023 May.

Artículo en Inglés | MEDLINE | ID: mdl-36121952

RESUMEN

Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although convolution neural networks (CNNs) have excelled in many vision tasks, they are still limited in capturing long-range structured relationships as they typically consist of layers of local kernels. A fully-connected graph, such as the self-attention operation in Transformers, is beneficial for such modelling, however, its computational overhead is prohibitive. In this paper, we propose a dynamic graph message passing network, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. This formulation allows us to design a self-attention module, and more importantly a new Transformer-based backbone network, that we use for both image classification pretraining, and for addressing various downstream tasks (e.g. object detection, instance and semantic segmentation). Using this model, we show significant improvements with respect to strong, state-of-the-art baselines on four different tasks. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations and parameters. Code and models will be made publicly available at https://github.com/fudan-zvg/DGMN2.

8.

AttentionGAN: Unpaired Image-to-Image Translation Using Attention-Guided Generative Adversarial Networks.

Tang, Hao; Liu, Hong; Xu, Dan; Torr, Philip H S; Sebe, Nicu.

IEEE Trans Neural Netw Learn Syst ; 34(4): 1972-1987, 2023 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-34473628

RESUMEN

State-of-the-art methods in the image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data. Though the existing methods have achieved promising results, they still produce visual artifacts, being able to translate low-level information but not high-level semantics of input images. One possible reason is that generators do not have the ability to perceive the most discriminative parts between the source and target domains, thus making the generated images low quality. In this article, we propose a new Attention-Guided Generative Adversarial Networks (AttentionGAN) for the unpaired image-to-image translation task. AttentionGAN can identify the most discriminative foreground objects and minimize the change of the background. The attention-guided generators in AttentionGAN are able to produce attention masks, and then fuse the generation output with the attention masks to obtain high-quality target images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks with eight public datasets, demonstrating that the proposed method is effective to generate sharper and more realistic images compared with existing competitive models. The code is available at https://github.com/Ha0Tang/AttentionGAN.

9.

GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation.

Qi, Xiaojuan; Liu, Zhengzhe; Liao, Renjie; Torr, Philip H S; Urtasun, Raquel; Jia, Jiaya.

IEEE Trans Pattern Anal Mach Intell ; 44(2): 969-984, 2022 02.

Artículo en Inglés | MEDLINE | ID: mdl-32870785

RESUMEN

In this paper, we propose a geometric neural network with edge-aware refinement (GeoNet++) to jointly predict both depth and surface normal maps from a single image. Building on top of two-stream CNNs, GeoNet++ captures the geometric relationships between depth and surface normals with the proposed depth-to-normal and normal-to-depth modules. In particular, the "depth-to-normal" module exploits the least square solution of estimating surface normals from depth to improve their quality, while the "normal-to-depth" module refines the depth map based on the constraints on surface normals through kernel regression. Boundary information is exploited via an edge-aware refinement module. GeoNet++ effectively predicts depth and surface normals with high 3D consistency and sharp boundaries resulting in better reconstructed 3D scenes. Note that GeoNet++ is generic and can be used in other depth/normal prediction frameworks to improve 3D reconstruction quality and pixel-wise accuracy of depth and surface normals. Furthermore, we propose a new 3D geometric metric (3DGM) for evaluating depth prediction in 3D. In contrast to current metrics that focus on evaluating pixel-wise error/accuracy, 3DGM measures whether the predicted depth can reconstruct high quality 3D surface normals. This is a more natural metric for many 3D application domains. Our experiments on NYUD-V2 [1] and KITTI [2] datasets verify that GeoNet++ produces fine boundary details and the predicted depth can be used to reconstruct high quality 3D surfaces.

Asunto(s)

Algoritmos , Redes Neurales de la Computación , Análisis de los Mínimos Cuadrados

10.

Deep learning for predicting COVID-19 malignant progression.

Fang, Cong; Bai, Song; Chen, Qianlan; Zhou, Yu; Xia, Liming; Qin, Lixin; Gong, Shi; Xie, Xudong; Zhou, Chunhua; Tu, Dandan; Zhang, Changzheng; Liu, Xiaowu; Chen, Weiwei; Bai, Xiang; Torr, Philip H S.

Med Image Anal ; 72: 102096, 2021 08.

Artículo en Inglés | MEDLINE | ID: mdl-34051438

RESUMEN

As COVID-19 is highly infectious, many patients can simultaneously flood into hospitals for diagnosis and treatment, which has greatly challenged public medical systems. Treatment priority is often determined by the symptom severity based on first assessment. However, clinical observation suggests that some patients with mild symptoms may quickly deteriorate. Hence, it is crucial to identify patient early deterioration to optimize treatment strategy. To this end, we develop an early-warning system with deep learning techniques to predict COVID-19 malignant progression. Our method leverages CT scans and the clinical data of outpatients and achieves an AUC of 0.920 in the single-center study. We also propose a domain adaptation approach to improve the generalization of our model and achieve an average AUC of 0.874 in the multicenter study. Moreover, our model automatically identifies crucial indicators that contribute to the malignant progression, including Troponin, Brain natriuretic peptide, White cell count, Aspartate aminotransferase, Creatinine, and Hypersensitive C-reactive protein.

Asunto(s)

COVID-19 , Aprendizaje Profundo , Humanos , SARS-CoV-2 , Tomografía Computarizada por Rayos X

11.

Learning Regional Attraction for Line Segment Detection.

Xue, Nan; Bai, Song; Wang, Fu-Dong; Xia, Gui-Song; Wu, Tianfu; Zhang, Liangpei; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 43(6): 1998-2013, 2021 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-31831408

RESUMEN

This paper presents regional attraction of line segment maps, and hereby poses the problem of line segment detection (LSD) as a problem of region coloring. Given a line segment map, the proposed regional attraction first establishes the relationship between line segments and regions in the image lattice. Based on this, the line segment map is equivalently transformed to an attraction field map (AFM), which can be remapped to a set of line segments without loss of information. Accordingly, we develop an end-to-end framework to learn attraction field maps for raw input images, followed by a squeeze module to detect line segments. Apart from existing works, the proposed detector properly handles the local ambiguity and does not rely on the accurate identification of edge pixels. Comprehensive experiments on the Wireframe dataset and the YorkUrban dataset demonstrate the superiority of our method. In particular, we achieve an F-measure of 0.831 on the Wireframe dataset, advancing the state-of-the-art performance by 10.3 percent.

12.

Adversarial Metric Attack and Defense for Person Re-Identification.

Bai, Song; Li, Yingwei; Zhou, Yuyin; Li, Qizhu; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 43(6): 2119-2126, 2021 06.

Artículo en Inglés | MEDLINE | ID: mdl-33064650

RESUMEN

Person re-identification (re-ID) has attracted much attention recently due to its great importance in video surveillance. In general, distance metrics used to identify two person images are expected to be robust under various appearance changes. However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images. Hence, the security danger is dramatically increased when deploying commercial re-ID systems in video surveillance. Although adversarial examples have been extensively applied for classification analysis, it is rarely studied in metric analysis like person re-identification. The most likely reason is the natural gap between the training and testing of re-ID networks, that is, the predictions of a re-ID network cannot be directly used during testing without an effective metric. In this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel methodology to adversarial classification attacks. Comprehensive experiments clearly reveal the adversarial effects in re-ID systems. Meanwhile, we also present an early attempt of training a metric-preserving network, thereby defending the metric against adversarial attacks. At last, by benchmarking various adversarial settings, we expect that our work can facilitate the development of adversarial attack and defense in metric-based applications.

13.

Lessons from reinforcement learning for biological representations of space.

Muryy, Alex; Siddharth, N; Nardelli, Nantas; Glennerster, Andrew; Torr, Philip H S.

Vision Res ; 174: 79-93, 2020 09.

Artículo en Inglés | MEDLINE | ID: mdl-32683096

RESUMEN

Neuroscientists postulate 3D representations in the brain in a variety of different coordinate frames (e.g. 'head-centred', 'hand-centred' and 'world-based'). Recent advances in reinforcement learning demonstrate a quite different approach that may provide a more promising model for biological representations underlying spatial perception and navigation. In this paper, we focus on reinforcement learning methods that reward an agent for arriving at a target image without any attempt to build up a 3D 'map'. We test the ability of this type of representation to support geometrically consistent spatial tasks such as interpolating between learned locations using decoding of feature vectors. We introduce a hand-crafted representation that has, by design, a high degree of geometric consistency and demonstrate that, in this case, information about the persistence of features as the camera translates (e.g. distant features persist) can improve performance on the geometric tasks. These examples avoid Cartesian (in this case, 2D) representations of space. Non-Cartesian, learned representations provide an important stimulus in neuroscience to the search for alternatives to a 'cognitive map'.

Asunto(s)

Aprendizaje , Refuerzo en Psicología , Encéfalo , Humanos , Recompensa , Percepción Espacial

14.

On the Robustness of Semantic Segmentation Models to Adversarial Attacks.

Arnab, Anurag; Miksik, Ondrej; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 42(12): 3040-3053, 2020 12.

Artículo en Inglés | MEDLINE | ID: mdl-31150338

RESUMEN

Deep Neural Networks (DNNs) have demonstrated exceptional performance on most recognition tasks such as image classification and segmentation. However, they have also been shown to be vulnerable to adversarial examples. This phenomenon has recently attracted a lot of attention but it has not been extensively studied on multiple, large-scale datasets and structured prediction tasks such as semantic segmentation which often require more specialised networks with additional components such as CRFs, dilated convolutions, skip-connections and multiscale processing. In this paper, we present what to our knowledge is the first rigorous evaluation of adversarial attacks on modern semantic segmentation models, using two large-scale datasets. We analyse the effect of different network architectures, model capacity and multiscale processing, and show that many observations made on the task of classification do not always transfer to this more complex task. Furthermore, we show how mean-field inference in deep structured models, multiscale processing (and more generally, input transformations) naturally implement recently proposed adversarial defenses. Our observations will aid future efforts in understanding and defending against adversarial examples. Moreover, in the shorter term, we show how to effectively benchmark robustness and show which segmentation models should currently be preferred in safety-critical applications due to their inherent robustness.

15.

ROAM: A Rich Object Appearance Model with Application to Rotoscoping.

Perez-Rua, Juan-Manuel; Miksik, Ondrej; Crivelli, Tomas; Bouthemy, Patrick; Torr, Philip H S; Perez, Patrick.

IEEE Trans Pattern Anal Mach Intell ; 42(8): 1996-2010, 2020 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-30872223

RESUMEN

Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help for this task, professional rotoscoping tools rely on parametric curves that offer the artists a much better interactive control on the definition, editing and manipulation of the segments of interest. Sticking to this prevalent rotoscoping paradigm, we propose a novel framework to capture and track the visual aspect of an arbitrary object in a scene, given an initial closed outline of this object. This model combines a collection of local foreground/background appearance models spread along the outline, a global appearance model of the enclosed object and a set of distinctive foreground landmarks. The structure of this rich appearance model allows simple initialization, efficient iterative optimization with exact minimization at each step, and on-line adaptation in videos. We further extend this model by so-called trimaps which serve as an input to alpha-matting algorithms to allow truly seamless compositing. To this end, we leverage local classifiers attached to the roto-curves to define a confidence measure that is well-suited to define trimaps with adaptive band-widths. The resulting trimaps are parametric, temporally consistent and remain fully editable by the artist. We demonstrate qualitatively and quantitatively the merit of this framework through comparisons with tools based on either dynamic segmentation with a closed curve or pixel-wise binary labelling.

16.

Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade.

Cavallari, Tommaso; Golodetz, Stuart; Lord, Nicholas A; Valentin, Julien; Prisacariu, Victor A; Stefano, Luigi Di; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 42(10): 2465-2477, 2020 10.

Artículo en Inglés | MEDLINE | ID: mdl-31059430

RESUMEN

Camera pose estimation is an important problem in computer vision, with applications as diverse as simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques match the current image against keyframes with known poses coming from a tracker, directly regress the pose, or establish correspondences between keypoints in the current image and points in the scene in order to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but have traditionally needed to be trained offline on the target scene, preventing relocalisation in new environments. Recently, we showed how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. The adapted forests achieved relocalisation performance that was on par with that of offline forests, and our approach was able to estimate the camera pose in close to real time, which made it desirable for systems that require online relocalisation. In this paper, we present an extension of this work that achieves significantly better relocalisation performance whilst running fully in real time. To achieve this, we make several changes to the original approach: (i) instead of simply accepting the camera pose hypothesis produced by RANSAC without question, we make it possible to score the final few hypotheses it considers using a geometric approach and select the most promising one; (ii) we chain several instantiations of our relocaliser (with different parameter settings) together in a cascade, allowing us to try faster but less accurate relocalisation first, only falling back to slower, more accurate relocalisation as necessary; and (iii) we tune the parameters of our cascade, and the individual relocalisers it contains, to achieve effective overall performance. Taken together, these changes allow us to significantly improve upon the performance our original state-of-the-art method was able to achieve on the well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional contributions, we present a novel way of visualising the internal behaviour of our forests, and use the insights gleaned from this to show how to entirely circumvent the need to pre-train a forest on a generic scene.

17.

Deeply Supervised Salient Object Detection with Short Connections.

Hou, Qibin; Cheng, Ming-Ming; Hu, Xiaowei; Borji, Ali; Tu, Zhuowen; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 41(4): 815-828, 2019 04.

Artículo en Inglés | MEDLINE | ID: mdl-29993862

RESUMEN

Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still a large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data on performance. We provide a training set for future research and fair comparisons.

18.

Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation.

Golodetz, Stuart; Cavallari, Tommaso; Lord, Nicholas A; Prisacariu, Victor A; Murray, David W; Torr, Philip H S.

IEEE Trans Vis Comput Graph ; 24(11): 2895-2905, 2018 11.

Artículo en Inglés | MEDLINE | ID: mdl-30334761

RESUMEN

Reconstructing dense, volumetric models of real-world 3D scenes is important for many tasks, but capturing large scenes can take significant time, and the risk of transient changes to the scene goes up as the capture time increases. These are good reasons to want instead to capture several smaller sub-scenes that can be joined to make the whole scene. Achieving this has traditionally been difficult: joining sub-scenes that may never have been viewed from the same angle requires a high-quality camera relocaliser that can cope with novel poses, and tracking drift in each sub-scene can prevent them from being joined to make a consistent overall scene. Recent advances, however, have significantly improved our ability to capture medium-sized sub-scenes with little to no tracking drift: real-time globally consistent reconstruction systems can close loops and re-integrate the scene surface on the fly, whilst new visual-inertial odometry approaches can significantly reduce tracking drift during live reconstruction. Moreover, high-quality regression forest-based relocalisers have recently been made more practical by the introduction of a method to allow them to be trained and used online. In this paper, we leverage these advances to present what to our knowledge is the first system to allow multiple users to collaborate interactively to reconstruct dense, voxel-based models of whole buildings using only consumer-grade hardware, a task that has traditionally been both time-consuming and dependent on the availability of specialised hardware. Using our system, an entire house or lab can be reconstructed in under half an hour and at a far lower cost than was previously possible.

19.

CODE: Coherence Based Decision Boundaries for Feature Correspondence.

Lin, Wen-Yan; Wang, Fan; Cheng, Ming-Ming; Yeung, Sai-Kit; Torr, Philip H S; Do, Minh N; Lu, Jiangbo.

IEEE Trans Pattern Anal Mach Intell ; 40(1): 34-47, 2018 01.

Artículo en Inglés | MEDLINE | ID: mdl-28092524

RESUMEN

A key challenge in feature correspondence is the difficulty in differentiating true and false matches at a local descriptor level. This forces adoption of strict similarity thresholds that discard many true matches. However, if analyzed at a global level, false matches are usually randomly scattered while true matches tend to be coherent (clustered around a few dominant motions), thus creating a coherence based separability constraint. This paper proposes a non-linear regression technique that can discover such a coherence based separability constraint from highly noisy matches and embed it into a correspondence likelihood model. Once computed, the model can filter the entire set of nearest neighbor matches (which typically contains over 90 percent false matches) for true matches. We integrate our technique into a full feature correspondence system which reliably generates large numbers of good quality correspondences over wide baselines where previous techniques provide few or no matches.

20.

Sequential Optimization for Efficient High-Quality Object Proposal Generation.

Zhang, Ziming; Liu, Yun; Chen, Xi; Zhu, Yanjun; Cheng, Ming-Ming; Saligrama, Venkatesh; Torr, Philip H S.

IEEE Trans Pattern Anal Mach Intell ; 40(5): 1209-1223, 2018 05.

Artículo en Inglés | MEDLINE | ID: mdl-28541893

RESUMEN

We are motivated by the need for a generic object proposal generation algorithm which achieves good balance between object detection recall, proposal localization quality and computational efficiency. We propose a novel object proposal algorithm, BING++, which inherits the virtue of good computational efficiency of BING [1] but significantly improves its proposal localization quality. At high level we formulate the problem of object proposal generation from a novel probabilistic perspective, based on which our BING++ manages to improve the localization quality by employing edges and segments to estimate object boundaries and update the proposals sequentially. We propose learning the parameters efficiently by searching for approximate solutions in a quantized parameter space for complexity reduction. We demonstrate the generalization of BING++ with the same fixed parameters across different object classes and datasets. Empirically our BING++ can run at half speed of BING on CPU, but significantly improve the localization quality by 18.5 and 16.7 percent on both VOC2007 and Microhsoft COCO datasets, respectively. Compared with other state-of-the-art approaches, BING++ can achieve comparable performance, but run significantly faster.

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA