1.
Opt Express ; 32(11): 18527-18538, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38859006

ABSTRACT

Dynamic range (DR) is a pivotal characteristic of imaging systems. Current frame-based cameras struggle to achieve high dynamic range imaging due to the conflict between globally uniform exposure and spatially variant scene illumination. In this paper, we propose AsynHDR, a pixel-asynchronous HDR imaging system, based on key insights into the challenges of HDR imaging and the unique event-generating mechanism of dynamic vision sensors (DVS). Our proposed AsynHDR system integrates the DVS with a set of LCD panels. The LCD panels modulate the irradiance incident upon the DVS by altering their transparency, thereby triggering pixel-independent event streams. The HDR image is subsequently decoded from the event streams through our temporal-weighted algorithm. Experiments on a standard test platform and in several challenging scenes verify the feasibility of the system for HDR imaging tasks.
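The decoding idea can be illustrated with a toy sketch: as the LCD panel ramps its transparency, brighter pixels reach the DVS event threshold earlier, so the timestamp of a pixel's first event encodes its irradiance. The function below is a hypothetical simplification under that assumption, not the paper's actual temporal-weighted algorithm; the name `decode_hdr`, the monotone weighting, and the `gamma` parameter are all illustrative.

```python
import numpy as np

def decode_hdr(event_times, t_start, t_end, gamma=1.0):
    """Toy temporal-weighted decode: pixels whose first event fires
    earlier during the LCD transparency ramp are assigned higher
    irradiance.  event_times: HxW array of first-event timestamps,
    with np.inf for pixels that never fired."""
    # Normalize timestamps into [0, 1] over the modulation window.
    t = np.clip((event_times - t_start) / (t_end - t_start), 0.0, 1.0)
    # Earlier event -> brighter pixel (monotone temporal weighting).
    irradiance = (1.0 - t) ** gamma
    # Pixels with no event are treated as below the sensor threshold.
    irradiance[~np.isfinite(event_times)] = 0.0
    return irradiance
```

A pixel firing at the very start of the ramp decodes to full brightness, one firing at the end (or never) decodes to zero, and intermediate timestamps interpolate between them.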

2.
Article in English | MEDLINE | ID: mdl-39288038

ABSTRACT

Automatic lip-reading (ALR) is the task of recognizing words based on visual information obtained from the speaker's lip movements. In this study, we introduce event cameras, a novel type of sensing device, for ALR. Event cameras offer both technical and application advantages over conventional cameras for ALR due to their higher temporal resolution, less redundant visual information, and lower power consumption. To recognize words from the event data, we propose a novel multigrained spatiotemporal features learning framework, which is capable of perceiving fine-grained spatiotemporal features from microsecond time-resolved event data. Specifically, we first convert the event data into event frames of multiple temporal resolutions to avoid losing too much visual information at the event representation stage. These frames are then fed into a multibranch subnetwork, where the branch operating on low-rate frames perceives spatially complete but temporally coarse features, while the branch operating on high-rate frames perceives spatially coarse but temporally fine features. Thus, fine-grained spatial and temporal features can be learned simultaneously by integrating the features perceived by the different branches. Furthermore, to model the temporal relationships in the event stream, we design a temporal aggregation subnetwork to aggregate the features perceived by the multibranch subnetwork. In addition, we collect two event-based lip-reading datasets (DVS-Lip and DVS-LRW100) for the study of the event-based lip-reading task. Experimental results demonstrate the superiority of the proposed model over state-of-the-art event-based action recognition models and video-based lip-reading models.
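The multi-temporal-resolution representation step can be sketched as follows. This is a minimal illustration of converting an event stream into count frames at two different rates, assuming a simple per-bin event-count accumulation; the function name `events_to_frames` and the specific bin counts are illustrative, not the paper's exact representation.

```python
import numpy as np

def events_to_frames(xs, ys, ts, H, W, t0, t1, num_frames):
    """Accumulate events (x, y, t) into `num_frames` per-pixel
    count frames covering the time window [t0, t1)."""
    frames = np.zeros((num_frames, H, W), dtype=np.float32)
    # Map each event timestamp to a temporal bin index.
    idx = ((ts - t0) / (t1 - t0) * num_frames).astype(int)
    idx = np.clip(idx, 0, num_frames - 1)
    # Unbuffered scatter-add so repeated (bin, y, x) hits all count.
    np.add.at(frames, (idx, ys, xs), 1.0)
    return frames

# Synthetic events on a 2x2 sensor over [0, 1) s.
xs = np.array([0, 1, 1]); ys = np.array([0, 0, 1]); ts = np.array([0.0, 0.5, 0.9])
low = events_to_frames(xs, ys, ts, 2, 2, 0.0, 1.0, 2)   # coarse-time branch input
high = events_to_frames(xs, ys, ts, 2, 2, 0.0, 1.0, 8)  # fine-time branch input
```

The low-rate stack aggregates many events per frame (spatially complete, temporally coarse), while the high-rate stack spreads the same events across more bins (temporally fine, spatially sparse), matching the trade-off the two branches exploit.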

3.
IEEE Trans Image Process ; 33: 5327-5339, 2024.
Article in English | MEDLINE | ID: mdl-39058603

ABSTRACT

Event cameras respond to temporal dynamics, helping to resolve ambiguities in spatio-temporal changes for optical flow estimation. However, the unique spatio-temporal event distribution challenges feature extraction, and directly constructing motion representations from the orthogonal view is less than ideal due to the entanglement of appearance and motion. This paper proposes to transform the orthogonal view into a motion-dependent one to enhance event-based motion representation, and presents a Motion View-based Network (MV-Net) for practical optical flow estimation. Specifically, this motion-dependent view transformation is achieved through the Event View Transformation Module, which captures the relationship between the steepest temporal changes and motion direction, incorporating these temporal cues into the view transformation process for feature gathering. This module comprises two phases: extracting temporal evolution clues by a central difference operation in the extraction phase, and capturing the motion pattern by evolution-guided deformable convolution in the perception phase. In addition, MV-Net employs an eccentric downsampling process to avoid response weakening caused by the sparsity of events at the downsampling stage. The whole network is trained end-to-end in a self-supervised manner, and evaluations on four challenging datasets reveal the superior performance of the proposed model compared to state-of-the-art (SOTA) methods.
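The extraction phase's central difference operation can be illustrated on a stack of event frames: a temporal central difference approximates the rate of change at each pixel, and its magnitude highlights where the steepest temporal changes occur. This is a generic sketch of a central difference along the time axis, assuming a (T, H, W) frame stack; it is not MV-Net's exact module, and the function name is illustrative.

```python
import numpy as np

def temporal_central_difference(frames):
    """Central difference along time for a (T, H, W) event-frame
    stack: diff[t] = (frames[t+1] - frames[t-1]) / 2, with the two
    boundary frames left at zero where no two-sided neighbor exists."""
    diff = np.zeros_like(frames)
    diff[1:-1] = 0.5 * (frames[2:] - frames[:-2])
    return diff
```

Pixels with large |diff| mark the steepest temporal evolution; in the paper these clues then guide the offsets of a deformable convolution in the perception phase.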
