ABSTRACT
A fundamental task in computer vision is the differentiation and identification of objects or entities in a visual scene using semantic segmentation methods. Transformer networks have surpassed traditional convolutional neural network (CNN) architectures in terms of segmentation performance. However, the continuous pursuit of optimal results on popular evaluation metrics has led to very large architectures that require significant computational power, making them prohibitive for real-time applications such as autonomous driving. In this paper, we propose a model that couples a visual transformer encoder with a twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections working in parallel. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets while maintaining a low-complexity network that can be used in real-time applications.
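A minimal PyTorch sketch of how such a fuser/scaler merge could look; the module structure, channel sizes, and softmax-based per-pixel weighting are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class TwinDecoderMerge(nn.Module):
    """Hypothetical merge of a transformer-decoder and a CNN-decoder feature map."""
    def __init__(self, channels: int):
        super().__init__()
        # "fuser": combines the two (weighted) decoder outputs into one feature map
        self.fuser = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # "scaler": predicts a per-pixel weight for each decoder's contribution
        self.scaler = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, f_transformer, f_cnn):
        stacked = torch.cat([f_transformer, f_cnn], dim=1)
        w = torch.softmax(self.scaler(stacked), dim=1)   # (B, 2, H, W)
        scaled = torch.cat([w[:, :1] * f_transformer,    # weight each decoder's
                            w[:, 1:] * f_cnn], dim=1)    # contribution per pixel
        return self.fuser(scaled)                        # combine into one map
```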
ABSTRACT
Offline handwritten text recognition (HTR) for historical documents aims for effective transcription by addressing challenges that originate from the low quality of the manuscripts under study as well as from several particularities related to the historical period of writing. In this paper, we focus on the transcription of Greek historical manuscripts, which exhibit several such particularities. To this end, a convolutional recurrent neural network architecture is proposed that combines octave convolutions with recurrent units that use effective gated mechanisms. The proposed architecture has been evaluated on three newly created collections of Greek historical handwritten documents, which will be made publicly available for research purposes, as well as on standard datasets such as IAM and RIMES. A concise evaluation study shows that, compared to state-of-the-art architectures, the proposed one deals effectively with the challenging Greek historical manuscripts.
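The octave convolution at the core of such an architecture factorizes features into a full-resolution high-frequency path and a half-resolution low-frequency path with cross-path mixing. A minimal sketch (the kernel size and the alpha split ratio are assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Octave convolution: features are split into a high-frequency path at full
    resolution and a low-frequency path at half resolution, with cross-path mixing."""
    def __init__(self, in_ch, out_ch, alpha=0.5):
        super().__init__()
        lo_in, lo_out = int(alpha * in_ch), int(alpha * out_ch)
        hi_in, hi_out = in_ch - lo_in, out_ch - lo_out
        self.hh = nn.Conv2d(hi_in, hi_out, 3, padding=1)  # high -> high
        self.hl = nn.Conv2d(hi_in, lo_out, 3, padding=1)  # high -> low
        self.lh = nn.Conv2d(lo_in, hi_out, 3, padding=1)  # low  -> high
        self.ll = nn.Conv2d(lo_in, lo_out, 3, padding=1)  # low  -> low

    def forward(self, x_hi, x_lo):
        # x_lo is at half the spatial resolution of x_hi
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo),
                                             scale_factor=2, mode="nearest")
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```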
ABSTRACT
Word spotting strategies employed in historical handwritten documents face many challenges due to variation in writing style and intense degradation. In this paper, a new method for efficient and effective word spotting in handwritten documents is presented. It relies on document-oriented local features that take into account information around representative keypoints, and on a matching process that incorporates spatial context in a local proximity search, without using any training data. Built on fast document-oriented keypoint detection, feature extraction, and feature matching, the pipeline is light enough to be deployed in the cloud, so that word spotting can be realised as a service for modern mobile devices. The effectiveness and efficiency of the proposed method, in terms of matching accuracy and retrieval time, respectively, are demonstrated through a consistent evaluation on several historical handwritten datasets.
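A rough sketch of the training-free matching idea, pairing nearest-neighbour descriptor matching with a spatial-context check; the ratio test, radius threshold, and brute-force search are illustrative stand-ins for the paper's actual matching process:

```python
import numpy as np

def match_with_spatial_context(query_desc, query_pts, page_desc, page_pts,
                               ratio=0.8, radius=20.0):
    """Hypothetical matching step: descriptor matching followed by a
    spatial-context check that keeps only matches whose neighbours in the
    query also map to nearby points on the page (thresholds are assumptions)."""
    # brute-force descriptor distances (query x page)
    d = np.linalg.norm(query_desc[:, None] - page_desc[None, :], axis=2)
    nn = d.argmin(axis=1)
    # Lowe-style ratio test against the second-best candidate
    part = np.partition(d, 1, axis=1)
    keep = part[:, 0] < ratio * part[:, 1]
    matches = [(i, nn[i]) for i in np.where(keep)[0]]
    # spatial context: a match survives if another kept match lies close by
    # both in the query word and in the candidate page region
    consistent = []
    for i, j in matches:
        for k, l in matches:
            if k != i and \
               np.linalg.norm(query_pts[i] - query_pts[k]) < radius and \
               np.linalg.norm(page_pts[j] - page_pts[l]) < radius:
                consistent.append((i, j))
                break
    return consistent
```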
ABSTRACT
Background and Objectives: Segmentation of mammographic lesions has proven to be a valuable source of information, as it can assist both in extracting shape-related features and in providing accurate localization of the lesion. In this work, a methodology is proposed for integrating mammographic mass segmentation information into a convolutional neural network (CNN), aiming to improve the diagnosis of breast cancer in mammograms. Methods: The proposed methodology involves modifying each convolutional layer of a CNN so that information from not only the input image but also the corresponding segmentation map is considered. Furthermore, a new loss function is introduced, which adds an extra term to the standard cross-entropy, aiming to steer the attention of the network to the mass region by penalizing strong feature activations based on their location. The segmentation maps are acquired either from the provided ground truth or from an automatic segmentation stage. Results: Diagnostic performance is evaluated on two mammographic mass datasets, DDSM-400 and CBIS-DDSM, which differ in the quality of their ground-truth segmentation maps. The proposed method achieves an AUC of 0.898 and 0.862 when using ground-truth segmentation maps, and a maximum of 0.880 and 0.860 when a U-Net-based automatic segmentation stage is employed, for DDSM-400 and CBIS-DDSM, respectively. Conclusions: The experimental results demonstrate that integrating segmentation information into a CNN leads to improved performance in breast cancer diagnosis of mammographic masses.
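A sketch of what such a segmentation-aware loss could look like in PyTorch; the penalty form and the weight lam are assumptions, not the paper's exact formulation:

```python
import torch.nn.functional as F

def mass_attention_loss(logits, target, feature_map, mask, lam=0.1):
    """Cross-entropy plus an extra term that penalizes strong feature
    activations outside the mass segmentation mask. Expects mask of shape
    (B, 1, H, W); lam and the penalty form are illustrative assumptions."""
    ce = F.cross_entropy(logits, target)
    # resize the binary mass mask to the feature map's spatial size
    mask = F.interpolate(mask, size=feature_map.shape[-2:], mode="nearest")
    # penalize activation magnitude located outside the mass region
    penalty = (feature_map.abs() * (1.0 - mask)).mean()
    return ce + lam * penalty
```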
ABSTRACT
Deep convolutional neural networks (CNNs) are investigated in the context of computer-aided diagnosis (CADx) of breast cancer. State-of-the-art CNNs are trained and evaluated on two mammographic datasets consisting of ROIs depicting benign or malignant mass lesions. The performance of each examined network is evaluated under two training scenarios: in the first, the network is initialized with pretrained weights, while in the second, the network is initialized randomly. Extensive experimental results show the superior performance achieved by fine-tuning a pretrained network compared to training from scratch.
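The two training scenarios can be sketched with a standard torchvision backbone (assuming a recent torchvision; the paper's exact CNN architectures may differ):

```python
import torch.nn as nn
from torchvision import models

def build_cnn(num_classes=2, pretrained=True):
    """Scenario 1: fine-tune ImageNet-pretrained weights.
    Scenario 2: train from a random initialization."""
    weights = models.ResNet50_Weights.IMAGENET1K_V1 if pretrained else None
    net = models.resnet50(weights=weights)
    # replace the classifier head for benign-vs-malignant mass diagnosis
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

finetuned = build_cnn(pretrained=True)   # scenario 1
scratch = build_cnn(pretrained=False)    # scenario 2
```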
ABSTRACT
Word spotting strategies employed in historical handwritten documents face many challenges due to variation in writing style and intense degradation. In this paper, a new method that permits effective word spotting in handwritten documents is presented. It relies on document-oriented local features, which take into account information around representative keypoints, as well as a matching process that incorporates spatial context in a local proximity search, without using any training data. Experimental results on four historical handwritten datasets for two different scenarios (segmentation-based and segmentation-free), using standard evaluation measures, show the improved performance achieved by the proposed methodology.
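A hypothetical sketch of the document-oriented keypoint-plus-descriptor stage; Harris corners and the patch histogram below are placeholders for the paper's own detector and local features:

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks

def document_keypoints(binary_img, min_distance=5):
    """Stand-in for document-oriented keypoints: corner-like points detected
    on the binarized text strokes (illustration only)."""
    response = corner_harris(binary_img.astype(float))
    return corner_peaks(response, min_distance=min_distance)  # (row, col) pairs

def local_descriptor(img, pt, size=16):
    """Describe the neighbourhood around a keypoint with a simple intensity
    histogram (a placeholder for the paper's local features); img in [0, 1]."""
    r, c = pt
    patch = img[max(0, r - size // 2): r + size // 2,
                max(0, c - size // 2): c + size // 2]
    hist, _ = np.histogram(patch, bins=16, range=(0.0, 1.0))
    return hist / max(1, hist.sum())
```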
ABSTRACT
A fully automatic mesh segmentation scheme using heterogeneous graphs is presented. We introduce a spectral framework where local geometry affinities are coupled with surface patch affinities. A heterogeneous graph is constructed combining two distinct graphs: a weighted graph based on adjacency of patches of an initial over-segmentation, and the weighted dual mesh graph. The partitioning relies on processing each eigenvector of the heterogeneous graph Laplacian individually, taking into account the nodal set and nodal domain theory. Experiments on standard datasets show that the proposed unsupervised approach outperforms the state-of-the-art unsupervised methodologies and is comparable to the best supervised approaches.
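A sketch of the spectral step, assuming both affinity matrices are expressed over the same node set (mesh faces) and coupled by a weight beta (an assumption); the paper then analyses the nodal set and nodal domains of each eigenvector individually rather than clustering them jointly:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def heterogeneous_laplacian_eigs(W_dual, W_patch, beta=0.5, k=8):
    """Couple the weighted dual-mesh graph with the patch-adjacency graph
    into one affinity matrix, then take the first eigenvectors of its
    normalized Laplacian (beta and the node-set alignment are assumptions)."""
    W = beta * W_dual + (1.0 - beta) * W_patch           # combined affinities
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = sp.identity(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = eigsh(L, k=k, which="SM")               # smallest eigenpairs
    return vals, vecs
```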
ABSTRACT
PURPOSE: The primary goal of this study is to select optimal registration schemes in the framework of interstitial lung disease (ILD) follow-up analysis in CT. METHODS: A set of 128 multiresolution schemes, composed of multiresolution nonrigid and combinations of rigid and nonrigid registration schemes, is evaluated, utilizing ten artificially warped ILD follow-up volumes originating from ten clinical volumetric CT scans of ILD-affected patients, to select candidate optimal schemes. Specifically, all combinations of four transformation models (three rigid: rigid, similarity, affine; one nonrigid: third-order B-spline), four cost functions (sum of squared distances, normalized correlation coefficient, mutual information, and normalized mutual information), four gradient descent optimizers (standard, regular step, adaptive stochastic, and finite difference), and two types of pyramids (recursive and Gaussian smoothing) were considered. The selection process involves two stages. The first stage identifies schemes with deformation field singularities according to the determinant of the Jacobian matrix. In the second stage, the evaluation methodology is based on the distance between corresponding landmark points in both normal lung parenchyma (NLP) and ILD-affected regions. Statistical analysis was performed in order to select near-optimal registration schemes per evaluation metric. The performance of the candidate registration schemes was verified on a case sample of ten clinical follow-up CT scans to obtain the selected registration schemes. RESULTS: By considering near-optimal schemes common to all ranking lists, 16 out of 128 registration schemes were initially selected. These schemes obtained submillimeter registration accuracies in terms of average distance error (0.18 ± 0.01 mm for NLP and 0.20 ± 0.01 mm for ILD) on the artificially generated follow-up data. Registration accuracy in terms of average distance error on clinical follow-up data was in the range of 1.985-2.156 mm and 1.966-2.234 mm for NLP and ILD-affected regions, respectively; excluding schemes with statistically significantly lower performance (Wilcoxon signed-ranks test, p < 0.05) resulted in 13 finally selected registration schemes. CONCLUSIONS: The selected registration schemes for ILD CT follow-up analysis indicate the significance of the adaptive stochastic gradient descent optimizer, as well as the importance of combined rigid and nonrigid schemes, providing high accuracy and time efficiency. The selected optimal deformable registration schemes are equivalent in terms of accuracy and thus compatible in terms of clinical outcome.
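The first-stage singularity check can be illustrated with a short NumPy computation of the Jacobian determinant of a deformation field; voxels with a non-positive determinant indicate folding (a non-invertible deformation):

```python
import numpy as np

def jacobian_determinant(disp):
    """det(J) of the deformation phi(x) = x + u(x), for a displacement
    field u of shape (3, Z, Y, X)."""
    grads = [np.gradient(disp[i], axis=(0, 1, 2)) for i in range(3)]
    J = np.stack([np.stack(g, axis=0) for g in grads], axis=0)  # (3, 3, Z, Y, X)
    J += np.eye(3)[:, :, None, None, None]      # d(phi)/dx = I + du/dx
    J = np.moveaxis(J, (0, 1), (-2, -1))        # -> (Z, Y, X, 3, 3)
    return np.linalg.det(J)

# a scheme would be flagged as singular if (jacobian_determinant(u) <= 0).any()
```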
ABSTRACT
Document image binarization is of great importance in the document image analysis and recognition pipeline, since it affects further stages of the recognition process. Evaluating a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by providing qualitative and quantitative indications of its performance. This paper presents a pixel-based binarization evaluation methodology for historical handwritten and machine-printed document images. In the proposed evaluation scheme, the recall and precision measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
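A schematic version of the weighted measures (the paper derives its weights from ground-truth text properties to diminish evaluation bias; the generic per-pixel weighting here is only an illustration):

```python
import numpy as np

def weighted_pr(result, gt, weights):
    """Pixel-based recall and precision with per-pixel weights, for boolean
    binarization result and ground-truth arrays of the same shape."""
    tp = (result & gt).astype(float)
    recall = (weights * tp).sum() / max((weights * gt).sum(), 1e-12)
    precision = (weights * tp).sum() / max((weights * result).sum(), 1e-12)
    return recall, precision
```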
ABSTRACT
Document digitization with either flatbed scanners or camera-based systems results in document images that often suffer from warping and perspective distortions, which deteriorate the performance of current OCR approaches. In this paper, we present a goal-oriented rectification methodology to compensate for undesirable document image distortions with the aim of improving the OCR result. Our approach relies upon a coarse-to-fine strategy. First, a coarse rectification is accomplished with the aid of a computationally low-cost transformation that addresses the projection of a curved surface to a 2-D rectangular area. The projection of the curved surface on the plane is guided only by the appearance of the textual content in the document image and incorporates a transformation that does not depend on specific model primitives or camera setup parameters. Second, pose normalization is applied at the word level, aiming to restore all the local distortions of the document image. Experimental results on various document images with a variety of distortions demonstrate the robustness and effectiveness of the proposed rectification methodology, using a consistent evaluation methodology that takes into account OCR accuracy as well as a newly introduced measure based on a semi-automatic procedure.
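A very rough sketch of content-driven coarse dewarping in the spirit described (polynomial baseline fit, per-column vertical shift); the paper's curved-surface-to-plane transformation is considerably more elaborate:

```python
import numpy as np

def coarse_dewarp(img, baseline_pts, degree=2):
    """Fit a polynomial to detected text-baseline points (x, y) and shift each
    pixel column vertically so the fitted curve becomes a straight line.
    Illustrates model-free, content-guided dewarping only."""
    xs, ys = baseline_pts[:, 0], baseline_pts[:, 1]
    coeffs = np.polyfit(xs, ys, degree)
    curve = np.polyval(coeffs, np.arange(img.shape[1]))
    shift = (curve - curve.mean()).round().astype(int)
    out = np.empty_like(img)
    for x in range(img.shape[1]):
        out[:, x] = np.roll(img[:, x], -shift[x])  # crude wrap-around shift
    return out
```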
ABSTRACT
We present a new framework for the hierarchical segmentation of color images. The proposed scheme comprises a nonlinear scale-space with vector-valued gradient watersheds. Our aim is to produce a meaningful hierarchy among the objects in the image using three image components of distinct perceptual significance for a human observer, namely strong edges, smooth segments, and detailed segments. The scale-space is based on a vector-valued diffusion that uses the Additive Operator Splitting numerical scheme. Furthermore, we introduce the principle of the dynamics of contours in scale-space, which combines scale and contrast information. The performance of the proposed segmentation scheme is demonstrated through experimental results obtained on a wide range of images, including natural and artificial scenes.
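The Additive Operator Splitting (AOS) scheme solves one tridiagonal system per axis and averages the results. A minimal sketch for scalar linear diffusion (the paper's diffusion is vector-valued and nonlinear; constant diffusivity keeps the sketch short):

```python
import numpy as np
from scipy.linalg import solve_banded

def aos_step(u, tau):
    """One AOS step: u_new = 1/2 * sum over axes of (I - 2*tau*A_axis)^{-1} u,
    where each A_axis is the 1-D second-difference operator with Neumann
    boundary conditions."""
    def implicit_1d(v):                        # solve (I - 2*tau*A) x = v per column
        n = v.shape[0]
        ab = np.zeros((3, n))
        ab[0, 1:] = -2 * tau                   # superdiagonal
        ab[2, :-1] = -2 * tau                  # subdiagonal
        ab[1, :] = 1 + 4 * tau                 # main diagonal (interior)
        ab[1, 0] = ab[1, -1] = 1 + 2 * tau     # Neumann boundaries
        return solve_banded((1, 1), ab, v)
    ux = implicit_1d(u)                        # implicit solve along axis 0
    uy = implicit_1d(u.T).T                    # implicit solve along axis 1
    return 0.5 * (ux + uy)
```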