Búsqueda | Portal Regional de la BVS

1.

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.

Li, Hongyang; Sima, Chonghao; Dai, Jifeng; Wang, Wenhai; Lu, Lewei; Wang, Huijie; Zeng, Jia; Li, Zhiqi; Yang, Jiazhi; Deng, Hanming; Tian, Hao; Xie, Enze; Xie, Jiangwei; Chen, Li; Li, Tianyu; Li, Yang; Gao, Yulu; Jia, Xiaosong; Liu, Si; Shi, Jianping; Lin, Dahua; Qiao, Yu.

IEEE Trans Pattern Anal Mach Intell ; 46(4): 2151-2170, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-37976193

RESUMEN

Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance. BEV perception inherits several advantages, as representing surrounding scenes in BEV is intuitive and fusion-friendly; and representing objects in BEV is most desirable for subsequent modules as in planning and/or control. The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review the most recent works on BEV perception and provide an in-depth analysis of different solutions. Moreover, several systematic designs of BEV approach from the industry are depicted as well. Furthermore, we introduce a full suite of practical guidebook to improve the performance of BEV perception tasks, including camera, LiDAR and fusion inputs. At last, we point out the future research directions in this area. We hope this report will shed some light on the community and encourage more research effort on BEV perception.

2.

SPTS v2: Single-Point Scene Text Spotting.

Liu, Yuliang; Zhang, Jiaxin; Peng, Dezhi; Huang, Mingxin; Wang, Xinyu; Tang, Jingqun; Huang, Can; Lin, Dahua; Shen, Chunhua; Bai, Xiang; Jin, Lianwen.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 15665-15679, 2023 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-37669204

RESUMEN

End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than using single-point. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 reserves the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) through sequentially predicting the center points of all text instances inside the same predicting sequence, while with a Parallel Recognition Decoder (PRD) for text recognition in parallel, which significantly reduces the requirement of the length of the sequence. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate the SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms.

3.

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception.

Zhu, Xinge; Zhou, Hui; Wang, Tai; Hong, Fangzhou; Li, Wei; Ma, Yuexin; Li, Hongsheng; Yang, Ruigang; Lin, Dahua.

IEEE Trans Pattern Anal Mach Intell ; 44(10): 6807-6822, 2022 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-34310286

RESUMEN

State-of-the-art methods for driving-scene LiDAR-based perception (including point cloud semantic segmentation, panoptic segmentation and 3D detection, etc.) often project the point clouds to 2D space and then process them via 2D convolution. Although this cooperation shows the competitiveness in the point cloud, it inevitably alters and abandons the 3D topology and geometric relations. A natural remedy is to utilize the 3D voxelization and 3D convolution network. However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited. An important reason is the property of the outdoor point cloud, namely sparsity and varying density. Motivated by this investigation, we propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern while maintaining these inherent properties. The proposed model acts as a backbone and the learned features from this model can be used for downstream tasks such as point cloud semantic and panoptic segmentation or 3D detection. In this paper, we benchmark our model on these three tasks. For semantic segmentation, we evaluate the proposed model on several large-scale datasets, i.e., SemanticKITTI, nuScenes and A2D2. Our method achieves the state-of-the-art on the leaderboard of SemanticKITTI (both single-scan and multi-scan challenge), and significantly outperforms existing methods on nuScenes and A2D2 dataset. Furthermore, the proposed 3D framework also shows strong performance and good generalization on LiDAR panoptic segmentation and LiDAR 3D detection.

4.

CARAFE++: Unified Content-Aware ReAssembly of FEatures.

Wang, Jiaqi; Chen, Kai; Xu, Rui; Liu, Ziwei; Loy, Chen Change; Lin, Dahua.

IEEE Trans Pattern Anal Mach Intell ; 44(9): 4674-4687, 2022 Sep.

Artículo en Inglés | MEDLINE | ID: mdl-33881989

RESUMEN

Feature reassembly, i.e. feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight, and highly effective operator to fulfill this goal. CARAFE++ has several appealing properties: (1) Unlike conventional methods such as pooling and interpolation that only exploit sub-pixel neighborhood, CARAFE++ aggregates contextual information within a large receptive field. (2) Instead of using a fixed kernel for all samples (e.g. convolution and deconvolution), CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling. (3) CARAFE++ introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation, and image inpainting. CARAFE++ shows consistent and substantial gains on mainstream methods across all the tasks with negligible computational overhead. It shows great potential to serve as a strong building block for modern deep networks.

5.

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation.

Pan, Xingang; Zhan, Xiaohang; Dai, Bo; Lin, Dahua; Loy, Chen Change; Luo, Ping.

IEEE Trans Pattern Anal Mach Intell ; 44(11): 7474-7489, 2022 11.

Artículo en Inglés | MEDLINE | ID: mdl-34559638

RESUMEN

Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN. We show that these easy-to-implement and practical changes help preserve the reconstruction to remain in the manifold of nature images, and thus lead to more precise and faithful reconstruction for real images. Code is available at https://github.com/XingangPan/deep-generative-prior.

Asunto(s)

Algoritmos , Procesamiento de Imagen Asistido por Computador , Procesamiento de Imagen Asistido por Computador/métodos

6.

Temporal Segment Networks for Action Recognition in Videos.

Wang, Limin; Xiong, Yuanjun; Wang, Zhe; Qiao, Yu; Lin, Dahua; Tang, Xiaoou; Van Gool, Luc.

IEEE Trans Pattern Anal Mach Intell ; 41(11): 2740-2755, 2019 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-30183621

RESUMEN

We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models by using the whole video. The learned models could be easily deployed for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the implementation of the TSN framework given limited training samples. Our approach obtains the state-the-of-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.

7.

Molecular recognition using corona phase complexes made of synthetic polymers adsorbed on carbon nanotubes.

Zhang, Jingqing; Landry, Markita P; Barone, Paul W; Kim, Jong-Ho; Lin, Shangchao; Ulissi, Zachary W; Lin, Dahua; Mu, Bin; Boghossian, Ardemis A; Hilmer, Andrew J; Rwei, Alina; Hinckley, Allison C; Kruss, Sebastian; Shandell, Mia A; Nair, Nitish; Blake, Steven; Sen, Fatih; Sen, Selda; Croy, Robert G; Li, Deyu; Yum, Kyungsuk; Ahn, Jin-Ho; Jin, Hong; Heller, Daniel A; Essigmann, John M; Blankschtein, Daniel; Strano, Michael S.

Nat Nanotechnol ; 8(12): 959-68, 2013 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-24270641

RESUMEN

Understanding molecular recognition is of fundamental importance in applications such as therapeutics, chemical catalysis and sensor design. The most common recognition motifs involve biological macromolecules such as antibodies and aptamers. The key to biorecognition consists of a unique three-dimensional structure formed by a folded and constrained bioheteropolymer that creates a binding pocket, or an interface, able to recognize a specific molecule. Here, we show that synthetic heteropolymers, once constrained onto a single-walled carbon nanotube by chemical adsorption, also form a new corona phase that exhibits highly selective recognition for specific molecules. To prove the generality of this phenomenon, we report three examples of heteropolymer-nanotube recognition complexes for riboflavin, L-thyroxine and oestradiol. In each case, the recognition was predicted using a two-dimensional thermodynamic model of surface interactions in which the dissociation constants can be tuned by perturbing the chemical structure of the heteropolymer. Moreover, these complexes can be used as new types of spatiotemporal sensors based on modulation of the carbon nanotube photoemission in the near-infrared, as we show by tracking riboflavin diffusion in murine macrophages.

Asunto(s)

Nanotubos de Carbono/química , Polímeros/química , Adsorción , Animales , Estradiol/química , Estradiol/aislamiento & purificación , Ratones , Nanotubos de Carbono/ultraestructura , Riboflavina/química , Riboflavina/aislamiento & purificación , Tiroxina/química , Tiroxina/aislamiento & purificación

8.

Single molecule detection of nitric oxide enabled by d(AT)15 DNA adsorbed to near infrared fluorescent single-walled carbon nanotubes.

Zhang, Jingqing; Boghossian, Ardemis A; Barone, Paul W; Rwei, Alina; Kim, Jong-Ho; Lin, Dahua; Heller, Daniel A; Hilmer, Andrew J; Nair, Nitish; Reuel, Nigel F; Strano, Michael S.

J Am Chem Soc ; 133(3): 567-81, 2011 Jan 26.

Artículo en Inglés | MEDLINE | ID: mdl-21142158

RESUMEN

We report the selective detection of single nitric oxide (NO) molecules using a specific DNA sequence of d(AT)(15) oligonucleotides, adsorbed to an array of near-infrared fluorescent semiconducting single-walled carbon nanotubes (AT(15)-SWNT). While SWNT suspended with eight other variant DNA sequences show fluorescence quenching or enhancement from analytes such as dopamine, NADH, L-ascorbic acid, and riboflavin, d(AT)(15) imparts SWNT with a distinct selectivity toward NO. In contrast, the electrostatically neutral polyvinyl alcohol enables no response to nitric oxide, but exhibits fluorescent enhancement to other molecules in the tested library. For AT(15)-SWNT, a stepwise fluorescence decrease is observed when the nanotubes are exposed to NO, reporting the dynamics of single-molecule NO adsorption via SWNT exciton quenching. We describe these quenching traces using a birth-and-death Markov model, and the maximum likelihood estimator of adsorption and desorption rates of NO is derived. Applying the method to simulated traces indicates that the resulting error in the estimated rate constants is less than 5% under our experimental conditions, allowing for calibration using a series of NO concentrations. As expected, the adsorption rate is found to be linearly proportional to NO concentration, and the intrinsic single-site NO adsorption rate constant is 0.001 s(-1) µM NO(-1). The ability to detect nitric oxide quantitatively at the single-molecule level may find applications in new cellular assays for the study of nitric oxide carcinogenesis and chemical signaling, as well as medical diagnostics for inflammation.

Asunto(s)

ADN/química , Nanotubos de Carbono , Óxido Nítrico/química , Adsorción , Fluorescencia , Microscopía de Fuerza Atómica , Espectroscopía Infrarroja Corta

9.

Nonparametric discriminant analysis for face recognition.

Li, Zhifeng; Lin, Dahua; Tang, Xiaoou.

IEEE Trans Pattern Anal Mach Intell ; 31(4): 755-61, 2009 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-19229090

RESUMEN

In this paper, we develop a new framework for face recognition based on nonparametric discriminant analysis (NDA) and multi-classifier integration. Traditional LDA-based methods suffer a fundamental limitation originating from the parametric nature of scatter matrices, which are based on the Gaussian distribution assumption. The performance of these methods notably degrades when the actual distribution is Non-Gaussian. To address this problem, we propose a new formulation of scatter matrices to extend the two-class nonparametric discriminant analysis to multi-class cases. Then, we develop two more improved multi-class NDA-based algorithms (NSA and NFA) with each one having two complementary methods based on the principal space and the null space of the intra-class scatter matrix respectively. Comparing to the NSA, the NFA is more effective in the utilization of the classification boundary information. In order to exploit the complementary nature of the two kinds of NFA (PNFA and NNFA), we finally develop a dual NFA-based multi-classifier fusion framework by employing the over complete Gabor representation to boost the recognition performance. We show the improvements of the developed new algorithms over the traditional subspace methods through comparative experiments on two challenging face databases, Purdue AR database and XM2VTS database.

Asunto(s)

Cara , Reconocimiento de Normas Patrones Automatizadas/estadística & datos numéricos , Análisis Discriminante , Humanos , Estadísticas no Paramétricas

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA