Search | Portal Regional da BVS

Inverse-Based Approach to Explaining and Visualizing Convolutional Neural Networks.

Kwon, Hyuk Jin; Koo, Hyung Il; Soh, Jae Woong; Cho, Nam Ik.

IEEE Trans Neural Netw Learn Syst ; 33(12): 7318-7329, 2022 Dec.

Article in English | MEDLINE | ID: mdl-34138716

ABSTRACT

This article presents a new method for understanding and visualizing convolutional neural networks (CNNs). Most existing approaches to this problem focus on a global score and evaluate the pixelwise contribution of inputs to the score. The analysis of CNNs for multilabeled outputs or regression has not yet been considered in the literature, despite their success on image classification tasks with well-defined global scores. To address this problem, we propose a new inverse-based approach that computes the inverse of a feedforward pass to identify activations of interest in lower layers. We developed a layerwise inverse procedure based on two observations: 1) inverse results should have consistent internal activations to the original forward pass and 2) a small amount of activation in inverse results is desirable for human interpretability. Experimental results show that the proposed method allows us to analyze CNNs for classification and regression in the same framework. We demonstrated that our method successfully finds attributions in the inputs for image classification with comparable performance to state-of-the-art methods. To visualize the tradeoff between various methods, we developed a novel plot that shows the tradeoff between the amount of activations and the rate of class reidentification. In the case of regression, our method showed that conventional CNNs for single image super-resolution overlook a portion of frequency bands that may result in performance degradation.

Subjects

Neural Networks, Computer; Humans
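
The two observations in the abstract above (the inverse should stay consistent with the forward pass, and it should use few activations) can be illustrated with a generic L1-regularized inversion of a single linear layer. This is only a minimal sketch under stated assumptions, not the authors' layerwise procedure: the function name `sparse_inverse`, the toy weight matrix `W`, the penalty `lam`, and the use of projected iterative soft-thresholding are all chosen here for illustration.

```python
def sparse_inverse(W, y, lam=0.1, iterations=200):
    """Find a nonnegative, sparse x with W x close to y (illustrative only).

    Minimizes 0.5 * ||W x - y||^2 + lam * ||x||_1 subject to x >= 0
    by proximal gradient descent (soft-thresholding plus clipping at zero).
    """
    rows, cols = len(W), len(W[0])
    # 1 / ||W||_F^2 never exceeds 1 / L, where L is the largest
    # eigenvalue of W^T W, so this step size is safe.
    step = 1.0 / sum(w * w for row in W for w in row)
    x = [0.0] * cols
    for _ in range(iterations):
        # Consistency term: residual of the forward pass W x against y.
        residual = [sum(W[i][j] * x[j] for j in range(cols)) - y[i]
                    for i in range(rows)]
        grad = [sum(W[i][j] * residual[i] for i in range(rows))
                for j in range(cols)]
        # Sparsity term: shrink every entry by step * lam, clip at zero.
        x = [max(0.0, x[j] - step * grad[j] - step * lam)
             for j in range(cols)]
    return x


# Toy layer (hypothetical): the output y = (1, 1) can be explained by
# inputs 0 and 1 together, or by input 2 alone.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 1.0]
x_hat = sparse_inverse(W, y)
```

In this toy case the sparsity penalty selects the single-input explanation: the first two entries of `x_hat` are driven to zero and the third settles slightly below 1, which is the usual shrinkage caused by the L1 term.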

Scene text detection via connected component clustering and nontext filtering.

Koo, Hyung Il; Kim, Duck Hoon.

IEEE Trans Image Process ; 22(6): 2296-305, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23475363

ABSTRACT

In this paper, we present a new scene text detection algorithm based on two machine learning classifiers: one allows us to generate candidate word regions and the other filters out nontext ones. To be precise, we extract connected components (CCs) in images by using the maximally stable extremal region algorithm. These extracted CCs are partitioned into clusters so that we can generate candidate regions. Unlike conventional methods relying on heuristic rules in clustering, we train an AdaBoost classifier that determines the adjacency relationship and cluster CCs by using their pairwise relations. Then we normalize candidate word regions and determine whether each region contains text or not. Since the scale, skew, and color of each candidate can be estimated from CCs, we develop a text/nontext classifier for normalized images. This classifier is based on multilayer perceptrons and we can control recall and precision rates with a single free parameter. Finally, we extend our approach to exploit multichannel information. Experimental results on ICDAR 2005 and 2011 robust reading competition datasets show that our method yields the state-of-the-art performance both in speed and accuracy.

Subjects

Artificial Intelligence; Cluster Analysis; Image Processing, Computer-Assisted/methods; Algorithms; Databases, Factual; Pattern Recognition, Automated
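
The clustering step described in the abstract above, where a pairwise adjacency decision is used to group connected components into candidate word regions, can be sketched with a union-find structure. This is a hedged illustration only: `toy_adjacency` is a hypothetical hand-written rule standing in for the trained AdaBoost classifier, and the `(x, y, height)` box format is an assumption made for the example.

```python
from itertools import combinations


def cluster_components(components, is_adjacent):
    """Group components so that any chain of pairwise-adjacent
    components ends up in the same cluster (union-find)."""
    parent = list(range(len(components)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Query the pairwise decision for every pair and merge on "adjacent".
    for i, j in combinations(range(len(components)), 2):
        if is_adjacent(components[i], components[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(components)):
        clusters.setdefault(find(i), []).append(components[i])
    return sorted(clusters.values())


def toy_adjacency(a, b):
    # Hypothetical stand-in for a learned pairwise classifier:
    # components are (x, y, height) and count as adjacent when they
    # are horizontally close and roughly on the same baseline.
    ax, ay, ah = a
    bx, by, bh = b
    return (abs(ax - bx) <= 1.5 * max(ah, bh)
            and abs(ay - by) <= 0.5 * min(ah, bh))


boxes = [(0, 0, 10), (12, 1, 10), (24, 0, 11), (100, 50, 10), (111, 51, 9)]
clusters = cluster_components(boxes, toy_adjacency)
```

Here the first three boxes form one candidate region even though the first and third are not directly adjacent, because the pairwise decisions are chained through the middle box.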

Text-line extraction in handwritten Chinese documents based on an energy minimization framework.

Koo, Hyung Il; Cho, Nam Ik.

IEEE Trans Image Process ; 21(3): 1169-75, 2012 Mar.

Article in English | MEDLINE | ID: mdl-21896387

ABSTRACT

Text-line extraction in unconstrained handwritten documents remains a challenging problem due to nonuniform character scale, spatially varying text orientation, and the interference between text lines. In order to address these problems, we propose a new cost function that considers the interactions between text lines and the curvilinearity of each text line. Precisely, we achieve this goal by introducing normalized measures for them, which are based on an estimated line spacing. We also present an optimization method that exploits the properties of our cost function. Experimental results on a database consisting of 853 handwritten Chinese document images have shown that our method achieves a detection rate of 99.52% and an error rate of 0.32%, which outperforms conventional methods.

Design of interchannel MRF model for probabilistic multichannel image processing.

Koo, Hyung Il; Cho, Nam Ik.

IEEE Trans Image Process ; 20(3): 601-11, 2011 Mar.

Article in English | MEDLINE | ID: mdl-20875973

ABSTRACT

In this paper, we present a novel framework that exploits an informative reference channel in the processing of another channel. We formulate the problem as a maximum a posteriori estimation problem considering a reference channel and develop a probabilistic model encoding the interchannel correlations based on Markov random fields. Interestingly, the proposed formulation results in an image-specific and region-specific linear filter for each site. The strength of filter response can also be controlled in order to transfer the structural information of a channel to the others. Experimental results on satellite image fusion and chrominance image interpolation with denoising show that our method provides improved subjective and objective performance compared with conventional approaches.

Subjects

Image Processing, Computer-Assisted/methods; Models, Statistical; Algorithms; Markov Chains
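
To see why a MAP formulation with a Markov random field prior can lead to a site-specific linear filter, as the abstract above states, consider a generic Gaussian MRF. The energy below is an assumption made for illustration, not the paper's model: $y$ is the observed channel, $x$ the estimate, $r$ the reference channel, $\mathcal{N}(i)$ the neighbors of site $i$, and the symmetric weights $w_{ij}$, the scale $\sigma$, and the strength $\lambda$ are hypothetical.

```latex
% Illustrative energy: data term plus reference-weighted smoothness term,
% with the second sum taken over unordered neighboring pairs {i, j}.
E(x) = \sum_i (x_i - y_i)^2
     + \lambda \sum_{\{i,j\}} w_{ij}\,(x_i - x_j)^2,
\qquad
w_{ij} = \exp\!\left(-\frac{(r_i - r_j)^2}{2\sigma^2}\right)

% Setting the derivative with respect to x_i to zero:
\frac{\partial E}{\partial x_i}
  = 2\,(x_i - y_i) + 2\lambda \sum_{j \in \mathcal{N}(i)} w_{ij}\,(x_i - x_j) = 0

% which gives a weighted average whose weights depend on the site i:
x_i = \frac{y_i + \lambda \sum_{j \in \mathcal{N}(i)} w_{ij}\, x_j}
           {1 + \lambda \sum_{j \in \mathcal{N}(i)} w_{ij}}
```

Under these assumptions the weights follow the reference channel, so the filter differs from site to site and from image to image, and $\lambda$ controls how strongly the structure of the reference channel is transferred.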

Composition of a dewarped and enhanced document image from two view images.

Koo, Hyung Il; Kim, Jinho; Cho, Nam Ik.

IEEE Trans Image Process ; 18(7): 1551-62, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19447710

ABSTRACT

In this paper, we propose an algorithm to compose a geometrically dewarped and visually enhanced image from two document images taken by a digital camera at different angles. Unlike conventional approaches that require special equipment, assumptions on the contents of books, or complicated image acquisition steps, we estimate the unfolded book or document surface from the corresponding points between two images. For this purpose, the surface and camera matrices are estimated using structure reconstruction, 3-D projection analysis, and random sample consensus-based curve fitting with the cylindrical surface model. Because we do not need any assumption on the contents of books, the proposed method can be applied not only to optical character recognition (OCR), but also to the high-quality digitization of pictures in documents. In addition to the dewarping for a structurally better image, image mosaicking is also performed to further improve the visual quality. By finding the better parts of the images (with less out-of-focus blur and/or without specular reflections) from either view, we compose a better image by stitching and blending them. These processes are formulated as energy minimization problems that can be solved using a graph cut method. Experiments on many kinds of book or document images show that the proposed algorithm works robustly and yields visually pleasing results. Also, the OCR rate of the resulting image is comparable to that of document images from a flatbed scanner.
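
The abstract above mentions random sample consensus (RANSAC)-based curve fitting. The general idea can be sketched on a much simpler problem: fitting a quadratic curve to 2-D points contaminated by gross outliers. This is an assumption-laden toy, not the paper's cylindrical surface estimation; the function names, tolerance, iteration count, and synthetic data are all hypothetical.

```python
import random


def fit_quadratic(p0, p1, p2):
    """Exact quadratic y = a*x^2 + b*x + c through three points with
    distinct x values, using Newton divided differences."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 * x0 - b * x0
    return a, b, c


def ransac_quadratic(points, tol=0.05, iterations=200, seed=0):
    """Keep the minimal-sample fit that agrees with the most points."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(points, 3)
        if len({x for x, _ in sample}) < 3:
            continue  # repeated x value: no unique quadratic
        a, b, c = fit_quadratic(*sample)
        inliers = [(x, y) for x, y in points
                   if abs(a * x * x + b * x + c - y) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers


# Synthetic data: 20 points on y = 0.5*x^2 - x + 2 plus 5 gross outliers.
clean = [(k / 2, 0.5 * (k / 2) ** 2 - (k / 2) + 2.0) for k in range(20)]
outliers = [(1.0, 9.0), (3.0, -4.0), (5.0, 20.0), (7.0, -6.0), (8.5, 40.0)]
model, inliers = ransac_quadratic(clean + outliers)
```

Because each candidate is fitted from only three points, a single all-inlier sample is enough to recover the curve, and the outliers are rejected by the consensus count rather than averaged into the fit.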
