Results 1 - 10 of 10
1.
Sensors (Basel); 22(21), 2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36366000

ABSTRACT

As one of the pioneering data representations, the point cloud has shown a straightforward capacity to depict fine geometry in many applications, including computer graphics, molecular structure analysis, modern sensing-signal processing, and more. However, unlike computer graphics obtained through auxiliary regularization techniques or synthesis, raw sensor/scanner (metric) data often contain natural random noise caused by multiple extrinsic factors, especially in high-speed imaging scenarios. On the other hand, grid-like imaging techniques (e.g., RGB images or video frames) tend to entangle the aspects of interest with environmental variations such as pose and illumination through their Euclidean sampling/processing pipelines. As one such typical problem, 3D Facial Expression Recognition (3D FER) has advanced to a new stage, with remaining difficulties in implementing efficient feature-abstraction methods for high-dimensional observations and stabilization methods that provide adequate robustness against random exterior variations. In this paper, a localized and smoothed overlapping kernel is proposed to extract discriminative inherent geometric features. By associating the induced deformation stability with certain types of exterior perturbations through a manifold scattering transform, we provide a novel framework that directly consumes point-cloud coordinates for FER while requiring no predefined meshes or other features/signals. As a result, our compact framework achieves 78.33% accuracy on the Bosphorus dataset for the expression-recognition challenge and 77.55% on 3D-BUFE.
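The localized, smoothed kernel idea can be illustrated in miniature. The sketch below is a generic Gaussian-weighted neighborhood average in NumPy, not the paper's actual kernel or scattering transform; the `sigma` value and the random data are illustrative assumptions. It shows how a smoothing kernel tempers random noise in raw point coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((50, 3))                 # toy point cloud
noisy = points + 0.05 * rng.standard_normal(points.shape)

def gaussian_smooth(pts, sigma=0.5):
    """Replace each point by a Gaussian-weighted average of all points,
    a localized, smoothed kernel in its simplest possible form."""
    diff = pts[:, None, :] - pts[None, :, :]          # pairwise offsets
    w = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)                 # row-stochastic weights
    return w @ pts

smoothed = gaussian_smooth(noisy)
```

Because each output point is a convex combination of the inputs, the smoothed cloud stays inside the original bounding box; detail is traded for exactly the kind of stability against random perturbations the abstract is after.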


Subjects
Facial Recognition; Imaging, Three-Dimensional/methods
2.
Comput Med Imaging Graph; 101: 102110, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36057184

ABSTRACT

Medical image segmentation is a critical step in pathology assessment and monitoring. Most existing methods apply a deep convolutional neural network to various medical segmentation tasks, such as polyp segmentation and skin-lesion segmentation. However, owing to the inherent difficulty of medical images and their tremendous variation, these methods usually perform poorly on intractable cases. In this paper, we propose an input-specific network called the conditional-synergistic convolution and lesion decoupling network (CCLDNet) to address these issues. First, in contrast to existing CNN-based methods with stationary convolutions, we propose the conditional synergistic convolution (CSConv), which generates a specialist convolution kernel for each lesion. CSConv supports dynamic modeling and can serve as a basic building block for other networks across a broad range of vision tasks. Second, we devise a lesion decoupling strategy (LDS) that decouples the original lesion segmentation map into two soft labels, i.e., a lesion-center label and a lesion-boundary label, to reduce the segmentation difficulty. In addition, we use a transformer network as the backbone, further removing the fixed structure of the standard CNN and endowing the whole framework with dynamic modeling capability. CCLDNet outperforms state-of-the-art approaches by a large margin on a variety of benchmarks, including polyp segmentation (89.22% Dice score on EndoScene) and skin-lesion segmentation (91.15% Dice score on ISIC2018). Our code is available at https://github.com/QianChen98/CCLD-Net.
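The Dice scores quoted above are the standard overlap metric for segmentation masks. A minimal NumPy version of the usual definition (not the paper's evaluation code) is:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1 = lesion, 0 = background)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 4x4 masks: the prediction covers 2 of the 3 target lesion pixels.
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[0, 0] = pred[0, 1] = 1
target[0, 0] = target[0, 1] = target[0, 2] = 1
print(round(float(dice_score(pred, target)), 2))  # 2*2/(2+3) = 0.8
```

The `eps` term is the common guard against empty masks; a score of 1.0 means perfect overlap.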


Subjects
Image Processing, Computer-Assisted; Skin Diseases; Algorithms; Humans; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
3.
Article in English | MEDLINE | ID: mdl-36099219

ABSTRACT

RGB-depth (RGB-D) salient object detection (SOD) has recently attracted increasing research interest, and many deep learning methods based on encoder-decoder architectures have emerged. However, most existing RGB-D SOD models conduct explicit, controllable cross-modal feature fusion in either the encoder or the decoder stage alone, which hardly guarantees sufficient cross-modal fusion ability. To this end, we make the first attempt at addressing RGB-D SOD with 3-D convolutional neural networks. The proposed model performs prefusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote full integration of the RGB and depth streams. Specifically, it first conducts prefusion across the RGB and depth modalities through a 3-D encoder obtained by inflating a 2-D ResNet, and later provides in-depth feature fusion through a 3-D decoder equipped with rich back-projection paths (RBPPs) that leverage the extensive aggregation ability of 3-D convolutions. Toward an improved model, we propose to disentangle the conventional 3-D convolution into successive spatial and temporal convolutions and, meanwhile, discard unnecessary zero padding. This eventually yields a 2-D convolutional equivalent that facilitates optimization and reduces parameters and computation costs. Thanks to such a progressive-fusion strategy involving both the encoder and the decoder, effective and thorough interactions between the two modalities can be exploited to boost detection accuracy. As an additional boost, we introduce channel-modality attention and a variant of it after each path of the RBPP to attend to important features. Extensive experiments on seven widely used benchmark datasets demonstrate that the proposed models perform favorably against 14 state-of-the-art RGB-D SOD approaches on five key evaluation metrics. Our code will be made publicly available at https://github.com/PPOLYpubki/RD3D.
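Disentangling a 3-D convolution into successive spatial and temporal convolutions saves parameters, which a quick count makes concrete. The sketch below uses one common factorization (a 1 x kh x kw spatial kernel followed by a kt x 1 x 1 temporal kernel); the kernel sizes and channel choices are illustrative assumptions, not the paper's exact design.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a full 3-D convolution (bias terms ignored)."""
    return c_in * c_out * kt * kh * kw

def factorized_params(c_in, c_out, kt, kh, kw):
    """Spatial (1 x kh x kw) conv followed by temporal (kt x 1 x 1) conv."""
    spatial = c_in * c_out * kh * kw
    temporal = c_out * c_out * kt
    return spatial + temporal

full = conv3d_params(64, 64, 3, 3, 3)      # 64*64*27 = 110592
fact = factorized_params(64, 64, 3, 3, 3)  # 36864 + 12288 = 49152
print(full, fact)  # 110592 49152
```

Here the factorized form needs well under half the weights of the full 3x3x3 convolution, which is the optimization-and-cost benefit the abstract describes.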

4.
Article in English | MEDLINE | ID: mdl-33861691

ABSTRACT

Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-rely on an elaborately designed training process. Inspired by the observation that the RGB and depth modalities actually share certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as a Siamese architecture. In this paper, we propose two effective components: joint learning (JL) and densely cooperative fusion (DCF). The JL module provides robust saliency-feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the state of the art by an average of ~2.0% (F-measure) across seven challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T SOD and video SOD, achieving comparable or better performance.
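The shared-backbone ("Siamese") idea is simply that one set of weights processes both modalities. A toy NumPy sketch, with a single shared linear layer and element-wise fusion that are purely illustrative and unrelated to JL-DCF's real backbone:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))   # ONE weight matrix, shared by both modalities

def backbone(x):
    """Shared 'Siamese' feature extractor: the same W for every input."""
    return np.maximum(W @ x, 0.0)  # linear map + ReLU

rgb_pixel = np.array([0.8, 0.2, 0.1])    # toy RGB values
depth_pixel = np.array([0.5, 0.5, 0.5])  # toy depth, replicated to 3 channels

rgb_feat = backbone(rgb_pixel)
depth_feat = backbone(depth_pixel)
fused = rgb_feat + depth_feat            # toy "cooperative fusion": sum
```

Because both modalities pass through the same `backbone`, cross-modal commonality is learned once rather than duplicated across two networks, which is what keeps the parameter count down under limited training data.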

5.
Sensors (Basel); 20(17), 2020 Aug 28.
Article in English | MEDLINE | ID: mdl-32872196

ABSTRACT

In recent years, illumination processing of facial images with Generative Adversarial Networks (GANs) has achieved favorable results. However, some GAN-based illumination-processing methods attend only to image quality and neglect recognition accuracy, whereas others crop only a partial face area and ignore the challenge of synthesizing a photorealistic face, background, and hair when the original image is captured under extreme illumination (conditions in which texture and structure are barely visible and most pixel values tend toward 0 or 255). Moreover, recognition accuracy is low for faces under extreme illumination. For these reasons, we present an elaborately designed architecture based on a convolutional neural network and GANs for processing the illumination of facial images. We use ResBlocks at the down-sampling stage of our encoder and adopt skip connections in our generator. This design, together with our loss, enhances the ability to preserve identity and generate high-quality images. Moreover, we use different convolutional layers of a pre-trained feature network to extract feature maps of various sizes, and then use these feature maps to compute what we call the multi-stage feature maps (MSFM) loss. To evaluate our method fairly against state-of-the-art models, we use four metrics to estimate the performance of illumination-processing algorithms. Qualitative and quantitative experiments on two datasets indicate that our scheme clearly surpasses state-of-the-art algorithms in both image quality and identification accuracy under various illumination challenges.
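An MSFM-style loss sums distances between feature maps taken at several stages. The sketch below substitutes average pooling at multiple scales for the pre-trained feature network, so the multi-stage structure is visible without the real model; the function names, scales, and squared-error distance are all assumptions for illustration.

```python
import numpy as np

def toy_features(img, stages=(1, 2, 4)):
    """Stand-in for multi-layer features: average-pool the image at several
    scales (a real MSFM loss would use pre-trained CNN layer activations)."""
    feats = []
    for s in stages:
        h, w = img.shape[0] // s, img.shape[1] // s
        pooled = img[:h * s, :w * s].reshape(h, s, w, s).mean(axis=(1, 3))
        feats.append(pooled)
    return feats

def msfm_loss(generated, reference):
    """Sum of mean-squared differences across all feature stages."""
    return sum(np.mean((g - r) ** 2)
               for g, r in zip(toy_features(generated), toy_features(reference)))

a = np.ones((8, 8))
print(msfm_loss(a, a))  # identical images -> 0.0
```

Summing over stages penalizes mismatches at both coarse and fine scales, which is the motivation for comparing feature maps of various sizes rather than pixels alone.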


Subjects
Facial Recognition; Image Processing, Computer-Assisted; Algorithms; Humans; Lighting; Neural Networks, Computer
6.
Sensors (Basel); 19(19), 2019 Sep 23.
Article in English | MEDLINE | ID: mdl-31548515

ABSTRACT

Face recognition using depth data has attracted increasing attention from both academia and industry over the past five years. Previous works show a huge performance gap between high-quality and low-quality depth data. Owing to the lack of databases and of reasonable evaluations of data quality, very few researchers have focused on boosting depth-based face recognition by enhancing data quality or feature representation. In this paper, we carefully collect a new database comprising high-quality 3D shapes, low-quality depth images, and the corresponding color images of the faces of 902 subjects, resources that have long been missing in the area. With this database, we establish a standard evaluation protocol and propose three strategies for training low-quality depth-based face recognition models with the help of high-quality depth data. Our training strategies can serve as baselines for future research, and their ability to boost low-quality depth-based face recognition is validated by extensive experiments.


Subjects
Pattern Recognition, Automated/methods; Algorithms; Databases, Factual; Facial Recognition/physiology; Humans
7.
IEEE Trans Image Process; 24(12): 5671-83, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26441448

ABSTRACT

Existing salient object detection models favor over-segmented regions upon which saliency is computed. Such local regions are less effective at representing objects holistically and weaken the emphasis on entire salient objects. As a result, existing methods often fail to highlight an entire object against a complex background. Toward better grouping of objects and background, in this paper we consider graph cuts, more specifically the normalized graph cut (Ncut), for saliency detection. Since the Ncut partitions a graph by minimizing a normalized energy, the resulting eigenvectors of the Ncut contain cluster information that can group visual contents. Motivated by this, we directly induce saliency maps from the Ncut eigenvectors, contributing to accurate saliency estimation of visual clusters. We implement the Ncut on a graph derived from a moderate number of superpixels; this graph captures both the intrinsic color and the edge information of the image data. Starting from the superpixels, an adaptive multi-level region-merging scheme is employed to extract cluster information from the Ncut eigenvectors. With saliency measures developed for each merged region, encouraging performance is obtained after across-level integration. Experiments comparing against 13 existing methods on four benchmark datasets, including MSRA-1000, SOD, SED, and CSSD, show that the proposed method, Ncut saliency, yields uniform object enhancement and achieves performance comparable to or better than the state of the art.
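The core Ncut fact used above, that eigenvectors of the normalized Laplacian encode cluster structure, can be checked on a toy graph. This is plain spectral partitioning in NumPy, not the paper's superpixel pipeline; the graph weights are made up for illustration.

```python
import numpy as np

# Toy graph: two 3-node cliques joined by one weak edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1  # weak bridge between the two clusters

d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(6) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian

# Eigenvector of the second-smallest eigenvalue (the Fiedler direction):
vals, vecs = np.linalg.eigh(L_sym)  # eigh returns eigenvalues in ascending order
fiedler = vecs[:, 1]
labels = (fiedler > 0).astype(int)  # sign split; groups {0,1,2} vs {3,4,5}
print(labels)
```

The sign pattern of this single eigenvector recovers the two cliques, which is exactly the cluster information the region-merging scheme mines from Ncut eigenvectors at larger scale.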

8.
IEEE Trans Neural Netw Learn Syst; 26(10): 2261-74, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25608310

ABSTRACT

The graph Laplacian has been widely exploited in traditional graph-based semisupervised learning (SSL) algorithms to encourage the labels of examples to vary smoothly over the graph. Although it achieves promising performance in both transductive and inductive learning, it is not effective at handling ambiguous examples (shown in Fig. 1). This paper introduces the deformed graph Laplacian (DGL) and presents label prediction via DGL (LPDGL) for SSL. The local smoothness term used in LPDGL, which regularizes examples and their neighbors locally, improves classification accuracy by properly dealing with ambiguous examples. Theoretical studies reveal that LPDGL obtains the globally optimal decision function and that its free parameters are easy to tune. A generalization bound is derived from a robustness analysis. Experiments on a variety of real-world data sets demonstrate that LPDGL achieves top-level performance in both transductive and inductive settings compared with popular SSL algorithms, such as harmonic functions, AnchorGraph regularization, linear neighborhood propagation, Laplacian regularized least squares, and the Laplacian support vector machine.
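For the standard (undeformed) graph Laplacian, the smoothness-regularized label prediction has the closed form f = (I + lambda*L)^(-1) y; the deformed DGL variant is not reproduced here. A chain-graph sketch, with the graph and lambda chosen for illustration:

```python
import numpy as np

# Chain graph of 5 nodes; only the endpoints are labeled (+1 and -1).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian

y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])  # 0 marks unlabeled nodes
lam = 1.0
f = np.linalg.solve(np.eye(5) + lam * L, y)  # f = (I + lam*L)^(-1) y
print(np.round(f, 2))  # endpoint labels diffuse smoothly toward the middle
```

The Laplacian penalty lam * f.T @ L @ f forces neighboring predictions together, so the unlabeled interior nodes inherit graded values between the two endpoint labels.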

9.
IEEE Trans Neural Netw Learn Syst; 26(9): 2148-62, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25532192

ABSTRACT

How to propagate label information from labeled examples to unlabeled examples is a critical problem in graph-based semisupervised learning. Many label propagation algorithms have been developed in recent years and have achieved promising performance across various applications. However, the eigenvalues of the iteration matrices in these algorithms are usually distributed irregularly, which slows down the convergence rate and impairs learning performance. This paper proposes a novel label propagation method called Fick's law assisted propagation (FLAP). Unlike existing algorithms derived directly from statistical learning, FLAP is deduced from Fick's First Law of Diffusion, widely known as a fundamental theory of fluid diffusion. We prove that FLAP converges at a linear rate and show that it makes the eigenvalues of the iteration matrix regularly distributed. Comprehensive experimental evaluations on synthetic and practical datasets reveal that FLAP obtains encouraging results in terms of both accuracy and efficiency.
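The abstract does not give FLAP's update rule, so the sketch below uses the classic diffusion-style propagation F <- alpha*S@F + (1-alpha)*Y instead. Its iteration matrix alpha*S has spectral radius below 1, so the iteration is a contraction that converges linearly, the convergence property FLAP is designed to improve on. The graph and alpha are illustrative assumptions.

```python
import numpy as np

# Small undirected graph; nodes 0 and 3 carry labels +1 and -1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
S = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)  # symmetric normalization

Y = np.array([1.0, 0.0, 0.0, -1.0])
alpha = 0.5
F = np.zeros(4)
for _ in range(100):
    F = alpha * S @ F + (1 - alpha) * Y  # contraction: linear convergence

print(np.round(F, 3))  # labels have diffused from the two labeled nodes
```

After enough iterations F is (numerically) the fixed point of the update, and every unlabeled node holds a soft label determined by its graph proximity to the labeled ones.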

10.
IEEE Trans Cybern; 44(6): 882-93, 2014 Jun.
Article in English | MEDLINE | ID: mdl-23963263

ABSTRACT

Video object tracking is widely used in many real-world applications and has been extensively studied for over two decades. However, tracking robustness remains an issue in most existing methods, owing to the difficulty of adapting to environmental or target changes. To improve adaptability, this paper formulates the tracking process as a ranking problem and applies the PageRank algorithm, the well-known webpage-ranking algorithm used by Google. Labeled and unlabeled samples in the tracking application are analogous to query webpages and the webpages to be ranked, respectively. Determining the target is therefore equivalent to finding the unlabeled sample most associated with the existing labeled set. We modify the conventional PageRank algorithm in three respects for the tracking application: graph construction, PageRank-vector acquisition, and target filtering. Our simulations on various challenging public-domain video sequences reveal that the proposed PageRank tracker outperforms the mean-shift tracker, co-tracker, and semiboosting and beyond-semiboosting trackers in terms of accuracy, robustness, and stability.
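The unmodified PageRank the tracker builds on is a damped power iteration over a link matrix. A toy three-page example of the standard algorithm (not the tracker's modified version; the link structure is made up):

```python
import numpy as np

# Column-stochastic link matrix: page 0 links to 1 and 2,
# page 1 links to 2, and page 2 links back to 0.
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

damping = 0.85
n = 3
r = np.full(n, 1.0 / n)  # start from a uniform rank vector
for _ in range(100):
    r = damping * M @ r + (1 - damping) / n  # damped power iteration

print(np.round(r, 3))  # page 2 collects links from both 0 and 1 -> ranks highest
```

In the tracker's analogy, high-rank "pages" are the unlabeled samples most strongly connected to the labeled set, so the top-ranked sample is declared the target.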
