Results 1 - 10 of 10
1.
IEEE Trans Image Process ; 30: 8130-8143, 2021.
Article in English | MEDLINE | ID: mdl-34559649

ABSTRACT

Video hyperlinking is the task of linking two video fragments or clips based on their multi-modal content. Specifically, given an anchor video as a query, machine techniques automatically generate links between the anchor and target videos by modeling and comparing their content aboutness. The term "aboutness" refers to contextually relevant multimedia content, i.e., what a fragment is on or of. Since video content is multi-modal (e.g., audio and vision), the content aboutness may be reflected across different modalities. Existing approaches regard hyperlinking as a retrieval task, embedding multi-modal video content into one or more common representation spaces for cross-modal comparison. The aboutness between videos is then scored by a vector-distance-based similarity in the learnt common feature space. However, these methods suffer from two main limitations: (1) the video modality descriptors/features are treated equally in representation learning, which hinders effective modeling of their respective capabilities in linking; and (2) directly using vector-distance-based similarity to measure aboutness risks returning more duplicates. This paper addresses these two problems. First, we build attentional neural networks to learn a compact fragment-level representation, assigning different importance weights to different descriptor/feature contents through an attention mechanism, on the premise that potentially interesting content should be highlighted in the representation. Second, instead of directly computing the similarity of two representation embeddings, we build a holographic composition network to model the aboutness for link establishment, with circular correlation at its core. The two networks are chained to form the final hyperlinking matching system, and the entire model is trained end to end. We examine its effectiveness by creating four train/validate/test partitioning schemes on the Blip10000 dataset and employing two video fragmentation methods.
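
The abstract names circular correlation as the core operation of the holographic composition network but does not spell it out. As a minimal sketch, the standard definition composes two equal-length vectors into a third of the same length. The function name, the toy vectors, and the direct O(n^2) loop are illustrative assumptions, not the authors' implementation, which operates on learnt embeddings.

```python
def circular_correlation(a, b):
    """Standard circular correlation: out[k] = sum_i a[i] * b[(i + k) % n]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]


# Toy "anchor" and "target" embeddings (hypothetical values).
anchor = [1.0, 0.0, 2.0]
target = [0.5, 1.0, 0.0]
composed = circular_correlation(anchor, target)
print(composed)  # [0.5, 2.0, 2.0]
```

The first component equals the plain dot product of the two vectors, while the remaining components capture interactions between shifted positions, which is one reason the composition carries more information than a single similarity score.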

2.
IEEE Trans Image Process ; 30: 1514-1526, 2021.
Article in English | MEDLINE | ID: mdl-33360994

ABSTRACT

Food recognition has attracted considerable research attention because of its importance for health-related applications. Existing approaches mostly focus on categorizing food by dish name while ignoring the underlying ingredient composition. In reality, two dishes with the same name do not necessarily share the exact list of ingredients, so dishes in the same food category are not necessarily equal in nutrition content. Nevertheless, because few datasets with ingredient labels are available, the problem of ingredient recognition is often overlooked. Furthermore, as the number of ingredients is expected to be much smaller than the number of food categories, ingredient recognition is more tractable in real-world scenarios. This paper provides an analysis of three issues in ingredient recognition: recognition at either the image level or the region level, pooling at either a single or multiple image scales, and learning in either a single-task or multi-task manner. The analysis is conducted on a large food dataset, Vireo Food-251, contributed by this paper. The dataset is composed of 169,673 images covering 251 popular Chinese food categories and 406 ingredients. It poses adequate challenges in scale and complexity to reveal the limits of current approaches to ingredient recognition.


Subjects
Deep Learning; Food Ingredients/classification; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; China; Cooking; Humans
3.
IEEE Trans Image Process ; 18(2): 412-23, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19144595

ABSTRACT

Near-duplicate (ND) detection has recently emerged as a timely issue and is regarded as a powerful tool for various emerging applications. In the Web 2.0 environment in particular, identifying near-duplicates enables tasks such as copyright enforcement, news topic tracking, and image and video search. In this paper, we describe an algorithm, Scale-Rotation invariant Pattern Entropy (SR-PE), for detecting near-duplicates in a large-scale video corpus. SR-PE is a novel pattern evaluation technique capable of measuring the spatial regularity of matching patterns formed by local keypoints. More importantly, the coherency of patterns and the perception of visual similarity, in the scenario where there may be multiple ND regions that have undergone arbitrary transformations, are carefully addressed through an entropy measure. To demonstrate our work on a large-scale dataset, we also propose a practical framework for the rapid detection of near-duplicates, composed of three components: bag-of-words representation, local keypoint matching, and SR-PE evaluation.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Video Recording/methods; Reproducibility of Results; Rotation; Sensitivity and Specificity
4.
IEEE Trans Med Imaging ; 35(7): 1741-52, 2016 Jul.
Article in English | MEDLINE | ID: mdl-26886971

ABSTRACT

Wireless capsule endoscopy (WCE) has become a widely used diagnostic technique for examining inflammatory bowel diseases and disorders. Hookworm, one of the most common human helminths, is a small tubular structure with a grayish-white or pinkish semi-transparent body, and it infects 600 million people around the world. Automatic hookworm detection is a challenging task due to the poor quality of images, the presence of extraneous matter, the complex structure of the gastrointestinal tract, and diverse appearances in terms of color and texture. This is among the first works to comprehensively explore automatic hookworm detection for WCE images. To capture the properties of hookworms, a multi-scale dual matched filter is first applied to locate tubular structures. A piecewise parallel region detection method is then proposed to identify potential regions containing hookworm bodies. To discriminate the unique visual features of different components of the gastrointestinal tract, a histogram of average intensity is proposed to represent their properties. To deal with the problem of imbalanced data, RUSBoost is deployed to classify WCE images. Experiments on a diverse, large-scale dataset of 440K WCE images demonstrate that the proposed approach achieves promising performance and outperforms state-of-the-art methods. Moreover, the high sensitivity in detecting hookworms indicates the potential of our approach for future clinical application.


Subjects
Ancylostomatoidea; Capsule Endoscopy; Animals; Color; Humans
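
The abstract above relies on RUSBoost to cope with imbalanced data. RUSBoost combines random undersampling of the majority class with boosting; the sketch below illustrates only the undersampling step that would be repeated before training each weak learner. The function name, the `ratio` and `seed` parameters, and the toy labels are hypothetical, and the boosting loop itself is deliberately not shown.

```python
import random


def random_undersample(samples, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority class.

    ratio is the desired majority/minority count after sampling
    (a hypothetical parameter chosen for this sketch).
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep = min(len(majority), int(round(ratio * len(minority))))
    chosen = sorted(minority + rng.sample(majority, keep))
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Ten toy frames, two of which are positive (label 1).
frames = list(range(10))
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
sub_frames, sub_labels = random_undersample(frames, labels, majority_label=0)
print(sub_labels.count(1), sub_labels.count(0))  # 2 2
```

Drawing a fresh subset in every boosting round lets the ensemble see most of the majority class over time while each weak learner trains on a balanced set.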
5.
IEEE Trans Image Process ; 24(11): 3781-95, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26186774

ABSTRACT

Human action recognition in unconstrained videos is a challenging problem with many applications. Most state-of-the-art approaches adopt the well-known bag-of-features representations, generated from isolated local patches or patch trajectories, in which motion patterns such as object-object and object-background relationships are mostly discarded. In this paper, we propose a simple representation aimed at modeling these motion relationships. We adopt global and local reference points to explicitly characterize motion information, so that the final representation is more robust to camera movements, which are common in unconstrained videos. Our approach operates on top of visual codewords generated on dense local patch trajectories and therefore does not require foreground-background separation, which is normally a critical and difficult step in modeling object relationships. Through an extensive set of experimental evaluations, we show that the proposed representation achieves very competitive performance on several challenging benchmark datasets. Further combining it with the standard bag-of-features or Fisher vector representations can lead to substantial improvements.


Subjects
Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Video Recording/methods; Algorithms; Humans; Movement
6.
IEEE Trans Image Process ; 12(3): 341-55, 2003.
Article in English | MEDLINE | ID: mdl-18237913

ABSTRACT

This paper presents new approaches to characterizing and segmenting the content of video. These approaches are based on the pattern analysis of spatio-temporal slices. While traditional approaches to motion sequence analysis tend to formulate computational methodologies on two or three adjacent frames, spatio-temporal slices provide rich visual patterns along a larger temporal scale. We first describe a motion computation method based on a structure tensor formulation. This method encodes the visual patterns of spatio-temporal slices in a tensor histogram that, on the one hand, characterizes the temporal changes of motion over time and, on the other hand, describes the motion trajectories of different moving objects. By analyzing the tensor histogram of an image sequence, we can temporally segment the sequence into several motion-coherent subunits and spatially segment it into various motion layers. The temporal segmentation of image sequences facilitates the motion annotation and content representation of a video, while the spatial decomposition of image sequences leads to a way of reconstructing background panoramic images and computing foreground objects.
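
To make the structure tensor idea concrete, the sketch below estimates the dominant gradient orientation of a small spatio-temporal slice using the textbook 2-D structure tensor. The indexing convention `slice_xt[t][x]`, the central-difference gradients, and the single global window are assumptions made for illustration; the paper builds a histogram of such orientations rather than a single value.

```python
import math


def slice_orientation(slice_xt):
    """Dominant gradient orientation (radians) of a slice indexed as slice_xt[t][x]."""
    jxx = jxt = jtt = 0.0
    for t in range(1, len(slice_xt) - 1):
        for x in range(1, len(slice_xt[t]) - 1):
            # Central differences along the spatial (x) and temporal (t) axes.
            ix = (slice_xt[t][x + 1] - slice_xt[t][x - 1]) / 2.0
            it = (slice_xt[t + 1][x] - slice_xt[t - 1][x]) / 2.0
            jxx += ix * ix
            jxt += ix * it
            jtt += it * it
    # Orientation of the principal eigenvector of [[jxx, jxt], [jxt, jtt]].
    return 0.5 * math.atan2(2.0 * jxt, jxx - jtt)


# A static ramp (no motion) and the same ramp shifting one pixel per frame.
static = [[float(x) for x in range(6)] for t in range(6)]
moving = [[float(x - t) for x in range(6)] for t in range(6)]
theta_static = slice_orientation(static)
theta_moving = slice_orientation(moving)
print(theta_static, theta_moving)
```

In this toy case the static pattern yields an orientation of zero, while the pattern moving at one pixel per frame yields -pi/4, so the angle tracks the apparent speed along the slice.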

7.
IEEE Trans Image Process ; 23(2): 527-40, 2014 Feb.
Article in English | MEDLINE | ID: mdl-26270906

ABSTRACT

Near-duplicate retrieval (NDR) in merchandise images is of great importance to many online applications on e-commerce websites. In applications where response time is critical, however, the conventional techniques developed for general-purpose NDR are limited, because expensive post-processing such as spatial verification or hashing is usually employed to compensate for the quantization errors among the visual words used for the images. In this paper, we argue that most of the errors are introduced by a quantization process in which visual words are considered individually, ignoring the contextual relations among words. We propose a "spelling or phrase correction"-like process for NDR, which extends the concept of collocations to the visual domain for modeling the contextual relations. Binary quadratic programming is used to enforce the contextual consistency of the words selected for an image, so that the errors (typos) are eliminated and the quality of the quantization process is improved. The experimental results show that the proposed method can improve the efficiency of NDR by reducing vocabulary size by 1000% times, and that, in the scenario of merchandise image NDR, the expensive local interest point feature used in conventional approaches can be replaced by a color-moment feature, which reduces the time cost by 9202% while maintaining comparable performance to the state-of-the-art methods.
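
The abstract does not say which color moments are used. A common choice, assumed here, is the first three moments per channel: mean, standard deviation, and the signed cube root of the third central moment, giving a 9-value descriptor. The function name and the toy pixels are hypothetical.

```python
import math


def color_moments(pixels):
    """Return [mean, std, skew] for each of three channels (9 values).

    pixels is a list of (c1, c2, c3) tuples in any three-channel color space.
    """
    n = len(pixels)
    feature = []
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the skew in the same units as the channel.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        feature.extend([mean, math.sqrt(var), skew])
    return feature


# Three toy pixels; real use would pass every pixel of an image or region.
feature = color_moments([(0, 10, 5), (2, 10, 5), (4, 10, 8)])
print(len(feature))  # 9
```

A descriptor this small is cheap to compute and compare, which is consistent with the time savings the abstract attributes to replacing local interest point features.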

8.
IEEE Trans Image Process ; 22(4): 1644-55, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23288334

ABSTRACT

Search reranking is regarded as a common way to boost retrieval precision. The problem is nevertheless not trivial, especially when there are multiple features or modalities to be considered for search, as often happens in image and video retrieval. This paper proposes a new reranking algorithm, named circular reranking, that reinforces the mutual exchange of information across multiple modalities to improve search performance, following the philosophy that a strongly performing modality can learn from weaker ones, while a weak modality benefits from interacting with stronger ones. Technically, circular reranking conducts multiple runs of random walks, exchanging the ranking scores among different features in a cyclic manner. Unlike existing techniques, the reranking procedure encourages interaction among modalities to seek a consensus that is useful for reranking. In this paper, we study several properties of circular reranking, including how, and in which order, information propagation should be configured to fully exploit the potential of the modalities for reranking. Encouraging results are reported for image retrieval on the Microsoft Research Asia Multimedia image dataset and for video retrieval on the TREC Video Retrieval Evaluation 2007-2008 datasets.

9.
IEEE Trans Image Process ; 22(3): 980-91, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23144031

ABSTRACT

The scale-invariant feature transform (SIFT) has been widely accepted as an effective local keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoints, or symmetric patterns of objects. This paper proposes a new descriptor, named flip-invariant SIFT (F-SIFT), that preserves the original properties of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and object detection. In copy detection, we propose a framework that indexes the flip properties of F-SIFT for rapid filtering and weak geometric checking. F-SIFT not only significantly improves the detection accuracy of SIFT, but also leads to savings of more than 50% in computational cost. In object recognition, we demonstrate the superiority of F-SIFT in dealing with flip transformations by comparing it to seven other descriptors. In object detection, we further show the ability of F-SIFT to describe symmetric objects. Consistent improvement of F-SIFT over the original SIFT is observed across different kinds of keypoint detectors.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity
10.
IEEE Trans Image Process ; 21(6): 3080-91, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22345543

ABSTRACT

Exploring context information for visual recognition has recently received significant research attention. This paper proposes a novel and highly efficient approach, named semantic diffusion, that utilizes semantic context for large-scale image and video annotation. Starting from the initial annotation of a large number of semantic concepts (categories), obtained by either machine learning or manual tagging, the proposed approach refines the results using a graph diffusion technique, which recovers the consistency and smoothness of the annotations over a semantic graph. Unlike existing graph-based learning methods that model relations among data samples, the semantic graph captures context by treating the concepts as nodes and the concept affinities as the weights of edges. In particular, our approach is capable of simultaneously improving annotation accuracy and adapting the concept affinities to new test data. The adaptation provides a means to handle domain change between training and test data, which often occurs in practice. Extensive experiments are conducted to improve concept annotation results using Flickr images and TV program videos. Results show a consistent and significant performance gain (10+% on both image and video datasets). Source code for the proposed algorithms is available online.
