
PiGLET: Pixel-Level Grounding of Language Expressions With Transformers.

Gonzalez, Cristina; Ayobi, Nicolas; Hernandez, Isabela; Pont-Tuset, Jordi; Arbelaez, Pablo.

IEEE Trans Pattern Anal Mach Intell ; 45(10): 12206-12221, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37339036

ABSTRACT

This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem. We establish an experimental framework for the study of this new task, including new ground truth and metrics. We propose PiGLET, a novel multi-modal Transformer architecture to tackle the Panoptic Narrative Grounding task, and to serve as a stepping stone for future work. We exploit the intrinsic semantic richness in an image by including panoptic categories, and we approach visual grounding at a fine-grained level using segmentations. In terms of ground truth, we propose an algorithm to automatically transfer Localized Narratives annotations to specific regions in the panoptic segmentations of the MS COCO dataset. PiGLET achieves a performance of 63.2 absolute Average Recall points. By leveraging the rich language information on the Panoptic Narrative Grounding benchmark on MS COCO, PiGLET obtains an improvement of 0.4 Panoptic Quality points over its base method on the panoptic segmentation task. Finally, we demonstrate the generalizability of our method to other natural language visual grounding problems such as Referring Expression Segmentation. PiGLET is competitive with previous state-of-the-art in RefCOCO, RefCOCO+ and RefCOCOg.
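The Panoptic Quality figure quoted in the abstract refers to the standard panoptic segmentation metric. As a rough illustration only, and not the authors' evaluation code, the metric can be sketched as below. The function name `panoptic_quality` and its arguments are hypothetical, and the sketch assumes predicted and ground-truth segments have already been matched (a pair counts as a match when its IoU exceeds 0.5).

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Illustrative Panoptic Quality for one category.

    matched_ious: IoU values of matched (prediction, ground truth) segment
        pairs; each is assumed to exceed 0.5, the usual matching rule.
    num_false_positives: predicted segments left without a match.
    num_false_negatives: ground-truth segments left without a match.
    """
    true_positives = len(matched_ious)
    denominator = (
        true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments at all for this category: report 0.0 by convention here.
        return 0.0
    # Sum of IoUs over matches, penalized by unmatched segments on both sides.
    return sum(matched_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed ground truth.
    print(panoptic_quality([0.8, 0.6], 1, 1))
```

In practice the score is computed per category and then averaged across categories; the sketch covers a single category only.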

STRIDE: Street View-based Environmental Feature Detection and Pedestrian Collision Prediction.

González, Cristina; Ayobi, Nicolás; Escallón, Felipe; Baldovino-Chiquillo, Laura; Wilches-Mogollón, Maria; Pasos, Donny; Ramírez, Nicole; Pinzón, Jose; Sarmiento, Olga; Quistberg, D Alex; Arbeláez, Pablo.

IEEE Int Conf Comput Vis Workshops ; 2023: 3222-3234, 2023 Oct.

Article in English | MEDLINE | ID: mdl-39104779

ABSTRACT

This paper introduces a novel benchmark to study the impact and relationship of built environment elements on pedestrian collision prediction, intending to enhance environmental awareness in autonomous driving systems to prevent pedestrian injuries actively. We introduce a built environment detection task in large-scale panoramic images and a detection-based pedestrian collision frequency prediction task. We propose a baseline method that incorporates a collision prediction module into a state-of-the-art detection model to tackle both tasks simultaneously. Our experiments demonstrate a significant correlation between object detection of built environment elements and pedestrian collision frequency prediction. Our results are a stepping stone towards understanding the interdependencies between built environment conditions and pedestrian safety.
