Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 7 de 7
Filtrar
Mais filtros








Base de dados
Intervalo de ano de publicação
1.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2739-2751, 2021 08.
Artigo em Inglês | MEDLINE | ID: mdl-32086197

RESUMO

We introduce a detection framework for dense crowd counting and eliminate the need for the prevalent density regression paradigm. Typical counting models predict crowd density for an image as opposed to detecting every person. These regression methods, in general, fail to localize persons accurate enough for most applications other than counting. Hence, we adopt an architecture that locates every person in the crowd, sizes the spotted heads with bounding box and then counts them. Compared to normal object or face detectors, there exist certain unique challenges in designing such a detection system. Some of them are direct consequences of the huge diversity in dense crowds along with the need to predict boxes contiguously. We solve these issues and develop our LSC-CNN model, which can reliably detect heads of people across sparse to dense crowds. LSC-CNN employs a multi-column architecture with top-down feature modulation to better resolve persons and produce refined predictions at multiple resolutions. Interestingly, the proposed training regime requires only point head annotation, but can estimate approximate size information of heads. We show that LSC-CNN not only has superior localization than existing density regressors, but outperforms in counting as well. The code for our approach is available at https://github.com/val-iisc/lsc-cnn.


Assuntos
Algoritmos , Aglomeração , Cabeça , Humanos , Análise de Regressão
2.
IEEE Trans Pattern Anal Mach Intell ; 42(1): 221-231, 2020 01.
Artigo em Inglês | MEDLINE | ID: mdl-30369439

RESUMO

The ability of intelligent agents to play games in human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. In our work, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, a guessing task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Sketch-QA involves asking a fixed question ("What object is being drawn?") and gathering open-ended guess-words from human guessers. We analyze the resulting dataset and present many interesting findings therein. To mimic Pictionary-style guessing, we propose a deep neural model which generates guess-words in response to temporally evolving human-drawn object sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via Sketch-QA task and compare with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.

3.
Biomed Opt Express ; 10(5): 2227-2243, 2019 May 01.
Artigo em Inglês | MEDLINE | ID: mdl-31149371

RESUMO

The methods available for solving the inverse problem of photoacoustic tomography promote only one feature-either being smooth or sharp-in the resultant image. The fusion of photoacoustic images reconstructed from distinct methods improves the individually reconstructed images, with the guided filter based approach being state-of-the-art, which requires that implicit regularization parameters are chosen. In this work, a deep fusion method based on convolutional neural networks has been proposed as an alternative to the guided filter based approach. It has the combined benefit of using less data for training without the need for the careful choice of any parameters and is a fully data-driven approach. The proposed deep fusion approach outperformed the contemporary fusion method, which was proved using experimental, numerical phantoms and in-vivo studies. The improvement obtained in the reconstructed images was as high as 95.49% in root mean square error and 7.77 dB in signal to noise ratio (SNR) in comparison to the guided filter approach. Also, it was demonstrated that the proposed deep fuse approach, trained on only blood vessel type images at measurement data SNR being 40 dB, was able to provide a generalization that can work across various noise levels in the measurement data, experimental set-ups as well as imaging objects.

4.
IEEE Trans Pattern Anal Mach Intell ; 41(10): 2452-2465, 2019 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-30072314

RESUMO

Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable and data-free approach for crafting universal adversarial perturbations. Independent of the underlying task, our objective achieves fooling via corrupting the extracted features at multiple layers. Therefore, the proposed objective is generalizable to craft image-agnostic perturbations across multiple vision tasks such as object recognition, semantic segmentation, and depth estimation. In the practical setting of black-box attack scenario (when the attacker does not have access to the target model and it's training data), we show that our objective outperforms the data dependent objectives to fool the learned models. Further, via exploiting simple priors related to the data distribution, our objective remarkably boosts the fooling ability of the crafted perturbations. Significant fooling rates achieved by our objective emphasize that the current deep learning models are now at an increased risk, since our objective generalizes across multiple tasks without the requirement of training data for crafting the perturbations. To encourage reproducible research, we have released the codes for our proposed algorithm.1.

5.
IEEE Trans Image Process ; 26(9): 4446-4456, 2017 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-28692956

RESUMO

Understanding and predicting the human visual attention mechanism is an active area of research in the fields of neuroscience and computer vision. In this paper, we propose DeepFix, a fully convolutional neural network, which models the bottom-up mechanism of visual attention via saliency prediction. Unlike classical works, which characterize the saliency map using various hand-crafted features, our model automatically learns features in a hierarchical fashion and predicts the saliency map in an end-to-end manner. DeepFix is designed to capture semantics at multiple scales while taking global context into account, by using network layers with very large receptive fields. Generally, fully convolutional nets are spatially invariant-this prevents them from modeling location-dependent patterns (e.g., centre-bias). Our network handles this by incorporating a novel location-biased convolutional layer. We evaluate our model on multiple challenging saliency data sets and show that it achieves the state-of-the-art results.


Assuntos
Fixação Ocular/fisiologia , Processamento de Imagem Assistida por Computador/métodos , Aprendizado de Máquina , Modelos Estatísticos , Redes Neurais de Computação , Algoritmos , Atenção/fisiologia , Humanos
6.
IEEE Trans Image Process ; 23(5): 2193-205, 2014 May.
Artigo em Inglês | MEDLINE | ID: mdl-24818242

RESUMO

In this paper, we propose FeatureMatch, a generalised approximate nearest-neighbour field (ANNF) computation framework, between a source and target image. The proposed algorithm can estimate ANNF maps between any image pairs, not necessarily related. This generalisation is achieved through appropriate spatial-range transforms. To compute ANNF maps, global colour adaptation is applied as a range transform on the source image. Image patches from the pair of images are approximated using low-dimensional features, which are used along with KD-tree to estimate the ANNF map. This ANNF map is further improved based on image coherency and spatial transforms. The proposed generalisation, enables us to handle a wider range of vision applications, which have not been tackled using the ANNF framework. We illustrate two such applications namely: 1) optic disk detection and 2) super resolution. The first application deals with medical imaging, where we locate optic disks in retinal images using a healthy optic disk image as common target image. The second application deals with super resolution of synthetic images using a common source image as dictionary. We make use of ANNF mappings in both these applications and show experimentally that our proposed approaches are faster and accurate, compared with the state-ofthe-art techniques.

7.
Comput Med Imaging Graph ; 38(1): 49-56, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24290957

RESUMO

Approximate Nearest Neighbour Field maps are commonly used by computer vision and graphics community to deal with problems like image completion, retargetting, denoising, etc. In this paper, we extend the scope of usage of ANNF maps to medical image analysis, more specifically to optic disk detection in retinal images. In the analysis of retinal images, optic disk detection plays an important role since it simplifies the segmentation of optic disk and other retinal structures. The proposed approach uses FeatureMatch, an ANNF algorithm, to find the correspondence between a chosen optic disk reference image and any given query image. This correspondence provides a distribution of patches in the query image that are closest to patches in the reference image. The likelihood map obtained from the distribution of patches in query image is used for optic disk detection. The proposed approach is evaluated on five publicly available DIARETDB0, DIARETDB1, DRIVE, STARE and MESSIDOR databases, with total of 1540 images. We show, experimentally, that our proposed approach achieves an average detection accuracy of 99% and an average computation time of 0.2 s per image.


Assuntos
Algoritmos , Inteligência Artificial , Interpretação de Imagem Assistida por Computador/métodos , Disco Óptico/anatomia & histologia , Reconhecimento Automatizado de Padrão/métodos , Retinoscopia/métodos , Humanos , Aumento da Imagem/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA