Results 1 - 7 of 7
1.
Article in English | MEDLINE | ID: mdl-29993628

ABSTRACT

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from any specific downstream task to serve as a general test of machine intelligence, while being sufficiently grounded in vision to allow objective evaluation of individual responses and benchmarking of progress. We develop a novel two-person real-time chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and consists of dialog question-answer pairs from 10-round, human-human dialogs grounded in images from the COCO dataset.
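The setup above can be made concrete with a small sketch of what one VisDial-style example might look like: an image, a caption, a history of question-answer rounds, and a current question the agent must answer. The field names here are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical VisDial-style example; field names are illustrative only.
example = {
    "image_id": 42,                      # COCO image the dialog is grounded in
    "caption": "a man riding a horse on a beach",
    "dialog": [                          # prior human-human QA rounds (up to 10)
        {"question": "is the photo in color?", "answer": "yes"},
        {"question": "is it daytime?", "answer": "looks like it"},
    ],
    "question": "how many horses are there?",  # round the agent must answer
}

def dialog_history(ex):
    """Flatten the caption and prior rounds into the history the agent conditions on."""
    turns = [ex["caption"]]
    for rnd in ex["dialog"]:
        turns.append(rnd["question"] + " " + rnd["answer"])
    return turns

history = dialog_history(example)
```

An agent would then ground `example["question"]` in the image while using `history` to resolve references such as "it" or "there".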

2.
IEEE Trans Pattern Anal Mach Intell ; 38(4): 627-38, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26959669

ABSTRACT

Relating visual information to its linguistic semantic meaning remains an open and challenging area of research. The semantic meaning of images depends on the presence of objects, their attributes, and their relations to other objects. But precisely characterizing this dependence requires extracting complex visual information from an image, which is in general a difficult and as yet unsolved problem. In this paper, we propose studying semantic information in abstract images created from collections of clip art. Abstract images provide several advantages over real images. They allow for the direct study of how to infer high-level semantic information, since they remove the reliance on noisy low-level object, attribute, and relation detectors, or the tedious hand-labeling of real images. Importantly, abstract images also make it possible to generate sets of semantically similar scenes. Finding analogous sets of real images that are semantically similar would be nearly impossible. We create 1,002 sets of 10 semantically similar abstract images with corresponding written descriptions. We thoroughly analyze this dataset to discover semantically important features, the relations of words to visual features, and methods for measuring semantic similarity. Finally, we study the relation between the saliency and memorability of objects and their semantic importance.
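Since abstract scenes are built from known clip-art pieces, one simple way to score semantic similarity between two scenes is set overlap over (object, attribute) tuples. This Jaccard-style measure is only an illustration of the idea, not the paper's actual similarity measure.

```python
# Illustrative scene similarity for clip-art scenes, each represented as a
# set of (object, attribute) tuples. This Jaccard overlap is a stand-in,
# not the paper's measure.
def scene_similarity(a, b):
    """Fraction of (object, attribute) tuples shared between two scenes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

scene1 = {("boy", "smiling"), ("dog", "sitting"), ("tree", None)}
scene2 = {("boy", "smiling"), ("dog", "running"), ("tree", None)}
sim = scene_similarity(scene1, scene2)  # 2 shared of 4 distinct -> 0.5
```

Because the scene composition is known exactly, such measures can be computed directly, with no noisy detectors in the loop — the advantage the abstract-image setting is designed to exploit.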

3.
IEEE Trans Pattern Anal Mach Intell ; 38(1): 74-87, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26656579

ABSTRACT

Recent trends in image understanding have pushed for scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance-based classifiers. In this work, we are interested in understanding the roles these different tasks play in improved scene understanding, in particular semantic segmentation, object detection, and scene recognition. Towards this goal, we "plug in" human subjects for each of the various components in a conditional random field model. Comparisons among various hybrid human-machine CRFs give us indications of how much "head room" there is to improve scene understanding by focusing research efforts on various individual tasks.
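The "plug-in" evaluation can be pictured schematically: a pipeline of named components, each of which can be either a machine module or a human oracle, so the headroom contributed by each component can be measured in isolation. The component names and placeholder outputs below are illustrative, not the paper's actual CRF.

```python
# Schematic of a hybrid human-machine pipeline; outputs are placeholders.
def run_pipeline(components, image):
    """Run every named component on the image and collect its output."""
    return {name: fn(image) for name, fn in components.items()}

machine = lambda img: "machine output"
human_oracle = lambda img: "human-provided output"

hybrid = {
    "segmentation": human_oracle,   # swap exactly one component for a human
    "detection": machine,
    "scene": machine,
}
result = run_pipeline(hybrid, image=None)
```

Comparing end-task accuracy as each component is swapped for its human counterpart indicates which machine component is the current bottleneck.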


Subjects
Artificial Intelligence/statistics & numerical data, Brain-Computer Interfaces/statistics & numerical data, Algorithms, Computer Simulation, Databases, Factual, Humans, Pattern Recognition, Automated/statistics & numerical data, Pattern Recognition, Visual
4.
IEEE Trans Pattern Anal Mach Intell ; 36(7): 1469-82, 2014 Jul.
Article in English | MEDLINE | ID: mdl-26353315

ABSTRACT

When glancing at a magazine, or browsing the Internet, we are continuously exposed to photographs. Despite this overflow of visual information, humans are extremely good at remembering thousands of pictures along with some of their visual details. But not all images are equal in memory. Some stick in our minds while others are quickly forgotten. In this paper, we focus on the problem of predicting how memorable an image will be. We show that memorability is an intrinsic and stable property of an image that is shared across different viewers, and remains stable across delays. We introduce a database for which we have measured the probability that each picture will be recognized after a single view. We analyze a collection of image features, labels, and attributes that contribute to making an image memorable, and we train a predictor based on global image descriptors. We find that predicting image memorability is a task that can be addressed with current computer vision techniques. While making memorable images is a challenging task in visualization, photography, and education, this work is a first attempt to quantify this useful property of images.
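The core setup can be sketched in a few lines: each training image has a measured probability of being recognized after a single view, and a predictor maps a global feature vector to that score. The two-dimensional features and the nearest-neighbour predictor below are stand-ins, not the paper's actual global descriptors or trained regressor.

```python
# Toy memorability predictor: (feature vector, measured recognition
# probability) pairs, scored by 1-nearest-neighbour lookup. Features and
# scores are invented for illustration.
import math

train = [
    ([0.9, 0.1], 0.82),   # e.g. a close-up of a person: highly memorable
    ([0.2, 0.8], 0.45),   # e.g. an open landscape: easily forgotten
    ([0.5, 0.5], 0.60),
]

def predict_memorability(feat):
    """Return the memorability score of the nearest training image."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, score = min(((dist(feat, f), s) for f, s in train), key=lambda t: t[0])
    return score
```

The paper's finding that memorability is stable across viewers is what makes a single scalar target per image meaningful in the first place.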


Subjects
Cues, Mental Recall/physiology, Models, Biological, Photography/methods, Recognition (Psychology)/physiology, Task Performance and Analysis, Visual Perception/physiology, Computer Simulation, Humans
5.
IEEE Trans Pattern Anal Mach Intell ; 34(10): 1978-91, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22201066

ABSTRACT

Typically, object recognition is performed based solely on the appearance of the object. However, relevant information also exists in the scene surrounding the object. In this paper, we explore the roles that appearance and contextual information play in object recognition. Through machine experiments and human studies, we show that the importance of contextual information varies with the quality of the appearance information, such as an image's resolution. Our machine experiments explicitly model context between object categories through the use of relative location and relative scale, in addition to co-occurrence. With the use of our context model, our algorithm achieves state-of-the-art performance on the MSRC and Corel data sets. We perform recognition tests for machines and human subjects on low- and high-resolution images, which vary significantly in the amount of appearance information present, using just the object appearance information, the combination of appearance and context, and just context without object appearance information (blind recognition). We also explore the impact of the different sources of context (co-occurrence, relative location, and relative scale). We find that the importance of different types of contextual information varies significantly across data sets such as MSRC and PASCAL.
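The intuition that context matters most when appearance is weak can be illustrated with a toy co-occurrence model: appearance scores for an ambiguous low-resolution blob are rescored by how compatible each label is with the surrounding context. The scores and the log-linear combination are invented for illustration; the paper's actual model also uses relative location and relative scale.

```python
# Toy combination of appearance and co-occurrence context scores.
import math

# appearance-only scores for an ambiguous low-resolution blob
appearance = {"cow": 0.40, "car": 0.35, "sheep": 0.25}

# co-occurrence compatibility of each label with a detected "road" context
cooccur_with_road = {"cow": 0.1, "car": 0.8, "sheep": 0.1}

def rescore(appearance, context, weight=1.0):
    """Combine appearance and context log-scores, then renormalize."""
    logp = {c: math.log(appearance[c]) + weight * math.log(context[c])
            for c in appearance}
    z = sum(math.exp(v) for v in logp.values())
    return {c: math.exp(v) / z for c, v in logp.items()}

posterior = rescore(appearance, cooccur_with_road)
```

Here "cow" narrowly wins on appearance alone, but the road context flips the decision to "car" — the kind of correction that matters little at high resolution, where appearance already dominates.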


Subjects
Artificial Intelligence, Image Processing, Computer-Assisted/methods, Pattern Recognition, Automated/methods, Pattern Recognition, Visual/physiology, Algorithms, Humans
6.
IEEE Trans Syst Man Cybern B Cybern ; 37(2): 437-50, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17416170

ABSTRACT

This paper introduces Learn++, an ensemble-of-classifiers-based algorithm originally developed for incremental learning and now adapted for information/data fusion applications. Recognizing the conceptual similarity between incremental learning and data fusion, Learn++ takes an alternative approach to data fusion: sequentially generating an ensemble of classifiers that specifically seek the most discriminating information from each data set. It was observed that Learn++-based data fusion consistently outperforms a similarly configured ensemble classifier trained on any of the individual data sources across several applications. Furthermore, even if the classifiers trained on individual data sources are fine-tuned for the given problem, Learn++ can still achieve a statistically significant improvement by combining them, provided the additional data sets carry complementary information. The algorithm can also identify, albeit indirectly, those data sets that do not carry such additional information. Finally, it was shown that the algorithm can consecutively learn both the supplementary novel information coming from additional data of the same source and the complementary information coming from new data sources, without requiring access to any of the previously seen data.
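The combination step in such an ensemble can be sketched as weighted majority voting across classifiers trained on different sources. The stub classifiers and weights below are invented, and the real Learn++ algorithm additionally drives training-subset selection with an AdaBoost-style distribution over examples, which is omitted here.

```python
# Hedged sketch of a weighted-majority-voting combiner; classifiers and
# weights are stand-ins for illustration.
def weighted_majority_vote(classifiers, weights, x):
    """Each classifier votes for a label; votes are summed with weights
    (e.g. derived from each classifier's normalized training error)."""
    votes = {}
    for clf, w in zip(classifiers, weights):
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

# three stub classifiers, e.g. trained on three different data sources
clfs = [lambda x: "A", lambda x: "B", lambda x: "A"]
ws = [0.5, 0.9, 0.7]   # hypothetical per-classifier weights
decision = weighted_majority_vote(clfs, ws, x=None)
```

Note that the single most confident classifier (weight 0.9) is outvoted by two weaker but agreeing ones — the mechanism by which complementary sources can beat any individual source.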


Subjects
Algorithms, Artificial Intelligence, Cluster Analysis, Database Management Systems, Databases, Factual, Information Storage and Retrieval/methods, Pattern Recognition, Automated/methods, Software
7.
Conf Proc IEEE Eng Med Biol Soc ; 2005: 2479-82, 2005.
Article in English | MEDLINE | ID: mdl-17282740

ABSTRACT

We describe an ensemble-of-classifiers-based data fusion approach to combine information from two sources, believed to contain complementary information, for early diagnosis of Alzheimer's disease. Specifically, we use the event-related potentials recorded from the Pz and Cz electrodes of the EEG, which are further analyzed using multiresolution wavelet analysis. The proposed data fusion approach includes generating multiple classifiers trained with strategically selected subsets of the training data from each source, which are then combined through weighted majority voting. Several factors set this study apart from similar prior efforts: we use a larger cohort, specifically target early diagnosis of the disease, use an ensemble-based approach rather than a single classifier, and, most importantly, we combine information from multiple sources rather than using a single modality. We present promising results obtained from the first 35 (of 80) patients whose data have been analyzed thus far.
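The multiresolution wavelet analysis of the ERP signals can be illustrated with a one-level Haar decomposition, which splits a signal into a coarse approximation and detail coefficients; repeating the step on the approximation yields coarser resolutions. The Haar family and the toy samples below are illustrative — the study's actual wavelet family and decomposition depth are not stated in the abstract.

```python
# Illustrative one-level Haar decomposition of an ERP-like signal.
def haar_step(signal):
    """Split a signal (even length) into approximation and detail halves:
    pairwise averages capture the coarse shape, pairwise half-differences
    capture the fine detail."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

erp = [2.0, 4.0, 6.0, 8.0]         # toy samples from, say, the Pz electrode
approx, detail = haar_step(erp)    # -> [3.0, 7.0], [-1.0, -1.0]
```

Coefficients from the different resolution levels then serve as the feature vectors on which the ensemble's classifiers are trained.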
