Search | BVS CLAP/SMR-OPS/OMS

Learning to Overcome Noise in Weak Caption Supervision for Object Detection.

Unal, Mesut Erhan; Ye, Keren; Zhang, Mingda; Thomas, Christopher; Kovashka, Adriana; Li, Wei; Qin, Danfeng; Berent, Jesse.

IEEE Trans Pattern Anal Mach Intell ; 45(4): 4897-4914, 2023 Apr.

Article in English | MEDLINE | ID: mdl-35771793

ABSTRACT

We propose the first mechanism to train object detection models from weak supervision in the form of captions at the image level. Language-based supervision for detection is appealing and inexpensive: many blogs with images and descriptive text written by human users exist. However, there is significant noise in this supervision: captions do not mention all objects that are shown, and may mention extraneous concepts. We first propose a technique to determine which image-caption pairs provide suitable signal for supervision. We further propose several complementary mechanisms to extract image-level pseudo labels for training from the caption. Finally, we train an iterative weakly-supervised object detection model from these image-level pseudo labels. We use captions from four datasets (COCO, Flickr30K, MIRFlickr1M, and Conceptual Captions) whose level of noise varies. We evaluate our approach on two object detection datasets. Weighting the labels extracted from different captions provides a boost over treating all captions equally. Further, our primary proposed technique for inferring pseudo labels for training at the image level outperforms alternative techniques under a wide variety of settings. Both techniques generalize to datasets beyond the one they were trained on.

Interpreting the Rhetoric of Visual Advertisements.

Ye, Keren; Nazari, Narges Honarvar; Hahn, James; Hussain, Zaeem; Zhang, Mingda; Kovashka, Adriana.

IEEE Trans Pattern Anal Mach Intell ; 43(4): 1308-1323, 2021 Apr.

Article in English | MEDLINE | ID: mdl-31634123

ABSTRACT

Visual media have important persuasive power, but prior computer vision approaches have predominantly ignored the persuasive aspects of images. In this work, we propose a suite of data and techniques that enable progress on understanding the messages that visual advertisements convey. We make available a dataset of 64,832 image ads and 3,477 video ads, annotated with ten types of information: the topic and sentiment of the ad; whether it is funny, exciting, or effective; what action it prompts the viewer to take, and what arguments it provides for why this action should be taken; symbolic associations that the ad relies on; the metaphorical object transformations on which especially creative ads rely; and the climax in video ads. We develop methods that use multimodal cues, i.e., both visuals and slogans, for both the image and video domains. Our methods rely on finding poignant content spatially and temporally. We also examine the creative story construction in ads: for videos, we learn to predict when the climax occurs (if there is one), and how effective the story is; for images, we analyze how object transformations in ads metaphorically depict product properties.
