Results 1 - 20 of 32
1.
IEEE J Biomed Health Inform ; 28(7): 4170-4183, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38954557

ABSTRACT

Efficient medical image segmentation aims to provide accurate pixel-wise predictions within a lightweight implementation framework. However, existing lightweight networks generally overlook the generalizability required by cross-domain medical segmentation tasks. In this paper, we propose Generalizable Knowledge Distillation (GKD), a novel framework that enhances the performance of lightweight networks on cross-domain medical segmentation by distilling generalizable knowledge from powerful teacher networks. Considering the domain gaps between different medical datasets, we propose Model-Specific Alignment Networks (MSAN) to obtain domain-invariant representations, together with a customized Alignment Consistency Training (ACT) strategy to promote MSAN training. Based on the domain-invariant vectors in MSAN, we propose two generalizable distillation schemes: Dual Contrastive Graph Distillation (DCGD) and Domain-Invariant Cross Distillation (DICD). In DCGD, two implicit contrastive graphs are designed to model the intra-coupling and inter-coupling semantic correlations. In DICD, the domain-invariant semantic vectors of the two networks (i.e., teacher and student) are reconstructed in a crossover manner to achieve hierarchical generalization of the lightweight network. Moreover, a metric named Fréchet Semantic Distance (FSD) is tailored to verify the effectiveness of the regularized domain-invariant features. Extensive experiments on the Liver, Retinal Vessel, and Colonoscopy segmentation datasets demonstrate the superiority of our method in terms of both performance and generalization ability on lightweight networks.
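
Editor's note: the abstract does not give the exact formula for FSD, but Fréchet-style distances between feature distributions are conventionally computed as in FID: fit a Gaussian to each set of semantic vectors and compare means and covariances. A minimal sketch under that assumption (the function name is illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between two sets of feature vectors (N, D),
    each modeled as a multivariate Gaussian."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```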


Subject(s)
Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Algorithms; Neural Networks, Computer; Databases, Factual; Deep Learning
2.
Article in English | MEDLINE | ID: mdl-38885108

ABSTRACT

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance. However, the process of collecting and labeling such data can be expensive and time-consuming. Self-supervised learning (SSL), a subset of unsupervised learning, aims to learn discriminative features from unlabeled data without relying on human-annotated labels. SSL has garnered significant attention recently, leading to the development of numerous related algorithms. However, there is a dearth of comprehensive studies that elucidate the connections and evolution of different SSL variants. This paper presents a review of diverse SSL methods, encompassing algorithmic aspects, application domains, three key trends, and open research questions. Firstly, we provide a detailed introduction to the motivations behind most SSL algorithms and compare their commonalities and differences. Secondly, we explore representative applications of SSL in domains such as image processing, computer vision, and natural language processing. Lastly, we discuss the three primary trends observed in SSL research and highlight the open questions that remain. A curated collection of valuable resources can be accessed at https://github.com/guijiejie/SSL.
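
Editor's note: as a concrete illustration of the contrastive family of SSL methods such surveys cover, here is a minimal InfoNCE loss in the SimCLR style. This is the generic objective, not an algorithm specific to this paper:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)         # positives lie on the diagonal
```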

3.
Article in English | MEDLINE | ID: mdl-38776190

ABSTRACT

Although face swapping has attracted much attention in recent years, it remains a challenging problem. Existing methods leverage a large number of data samples to explore the intrinsic properties of face swapping without considering the semantic information of face images. Moreover, the representation of identity information tends to be fixed, leading to suboptimal face swapping. In this paper, we present a simple yet efficient method named FaceSwapper for one-shot face swapping based on Generative Adversarial Networks. Our method consists of a disentangled representation module and a semantic-guided fusion module. The disentangled representation module comprises an attribute encoder and an identity encoder, which together disentangle identity information from attribute information. Compared with its competitors, the identity encoder is more flexible and the attribute encoder preserves finer attribute details. Benefiting from the disentangled representation, FaceSwapper can swap face images progressively. In addition, semantic information is introduced into the semantic-guided fusion module to control the swapped region and to model pose and expression more accurately. Experimental results show that our method achieves state-of-the-art results on benchmark datasets with fewer training samples. Our code is publicly available at https://github.com/liqi-casia/FaceSwapper.
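
Editor's note: the encode-swap-decode idea behind disentangled face swapping can be sketched as below. All three submodules are placeholders; the real FaceSwapper encoders and its semantic-guided fusion are far more elaborate:

```python
import torch
import torch.nn as nn

class SwapSketch(nn.Module):
    """Illustrative skeleton: identity from the source, attributes from the target."""
    def __init__(self, id_enc: nn.Module, attr_enc: nn.Module, generator: nn.Module):
        super().__init__()
        self.id_enc, self.attr_enc, self.generator = id_enc, attr_enc, generator

    def forward(self, source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        id_code = self.id_enc(source)      # identity information from the source face
        attr_code = self.attr_enc(target)  # pose/expression/background from the target
        return self.generator(id_code, attr_code)
```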

4.
Article in English | MEDLINE | ID: mdl-38170659

ABSTRACT

Human faces contain rich semantic information that could hardly be described without a large vocabulary and complex sentence patterns. However, most existing text-to-image synthesis methods can only generate meaningful results from limited sentence templates whose words appear in the training set, which heavily impairs their generalization ability. In this paper, we define a novel 'free-style' text-to-face generation and manipulation problem and propose an effective solution, named AnyFace++, which is applicable to a much wider range of open-world scenarios. AnyFace++ incorporates the CLIP model to learn an aligned language-vision feature space, which also expands the range of acceptable vocabulary since CLIP is trained on a large-scale dataset. To further improve the granularity of semantic alignment between text and images, a memory module is incorporated to convert descriptions of arbitrary length, format, and modality into regularized latent embeddings representing discriminative attributes of the target face. Moreover, the diversity and semantic consistency of the generated results are improved by a novel semi-supervised training scheme and a series of newly proposed objective functions. Compared to state-of-the-art methods, AnyFace++ is capable of synthesizing and manipulating face images based on more flexible descriptions and produces realistic images with higher diversity.
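
Editor's note: the sketch below shows only how a normalized CLIP text embedding is obtained with the public openai/CLIP package; it stands in for the aligned language-vision space, not for AnyFace++'s memory module or generator:

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    tokens = clip.tokenize(["a young woman with curly red hair and glasses"]).to(device)
    text_emb = model.encode_text(tokens)                      # (1, 512) text embedding
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True) # unit-normalized, as CLIP expects
```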

5.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14590-14610, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37494159

ABSTRACT

Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, and has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics. In the last decade, with the remarkable success of Generative Adversarial Networks (GANs) in synthesizing realistic images, numerous GAN-based models have been proposed to solve FAM with various problem formulations and guiding information representations. This paper presents a comprehensive survey of GAN-based FAM methods, focusing on their principal motivations and technical details. The main contents of this survey include: (i) an introduction to the research background and basic concepts related to FAM, (ii) a systematic review of GAN-based FAM methods in three main categories, and (iii) an in-depth discussion of important properties of FAM methods, open issues, and future research directions. This survey not only provides a solid starting point for researchers new to this field but also serves as a reference for the vision community.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15120-15136, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37490385

ABSTRACT

Occlusion is a common problem for biometric recognition in the wild. The generalization ability of CNNs decreases greatly under the adverse effects of various occlusions. To this end, we propose multiscale dynamic graph representation (MS-DGR), a novel unified framework integrating the merits of both CNNs and graph models to overcome occlusion problems in biometric recognition. More specifically, a group of deep features reflecting certain subregions is recrafted into a feature graph (FG). Each node inside the FG characterizes a specific local region of the input sample, and the edges imply the co-occurrence of non-occluded regions. By analyzing the similarities of the node representations and measuring the topological structures stored in the adjacency matrix, the proposed framework leverages dynamic graph matching to judiciously discard the nodes corresponding to occluded parts. A multiscale strategy is further incorporated to attain more diverse nodes representing regions of various sizes. Furthermore, the proposed framework offers more interpretable inference by exposing the matched node pairs. Extensive experiments demonstrate the superiority of the proposed framework, which boosts accuracy in both natural and occlusion-simulated cases by a large margin compared with baseline methods.
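
Editor's note: one simple way to recraft a deep feature map into a feature graph is to pool subregions into node vectors and use their cosine similarities as the adjacency matrix. A minimal sketch under that assumption (the grid partition and similarity choice are illustrative, not the paper's exact construction):

```python
import torch
import torch.nn.functional as F

def build_feature_graph(feat_map: torch.Tensor, grid: int = 4):
    """feat_map: (C, H, W) deep features of one sample.
    Returns node features (grid*grid, C) and a cosine-similarity adjacency matrix."""
    pooled = F.adaptive_avg_pool2d(feat_map.unsqueeze(0), grid)  # (1, C, grid, grid)
    nodes = pooled.squeeze(0).flatten(1).t()                     # one node per subregion
    normed = F.normalize(nodes, dim=1)
    adjacency = normed @ normed.t()                              # (grid*grid, grid*grid)
    return nodes, adjacency
```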

7.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12287-12303, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37126625

ABSTRACT

We present PyMAF-X, a regression-based approach to recovering a parametric full-body model from a single image. This task is very challenging since minor parametric deviations may lead to noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations into the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery, and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence is extracted from finer-resolution features and fed back for parameter rectification. To enhance alignment perception, auxiliary dense supervision is employed to provide mesh-image correspondence guidance, while spatial alignment attention is introduced to make the network aware of global contexts. When extending PyMAF for full-body mesh recovery, an adaptive integration strategy is proposed in PyMAF-X to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body, hand, face, and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results. The project page with code and video results can be found at https://www.liuyebin.com/pymaf-x.
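
Editor's note: the feedback loop's key step, sampling mesh-aligned evidence, amounts to projecting the current mesh vertices onto a feature map and reading features at those locations. A minimal sketch of that sampling step (normalization to [-1, 1] is assumed done upstream; the regressor that consumes these features is omitted):

```python
import torch
import torch.nn.functional as F

def mesh_aligned_features(feat: torch.Tensor, verts_2d: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) pyramid-level features; verts_2d: (B, V, 2) projected mesh
    vertices already normalized to [-1, 1]. Returns one feature vector per vertex."""
    grid = verts_2d.unsqueeze(2)                              # (B, V, 1, 2)
    sampled = F.grid_sample(feat, grid, align_corners=False)  # (B, C, V, 1)
    return sampled.squeeze(-1).transpose(1, 2)                # (B, V, C) mesh-aligned evidence
```

In each loop iteration, these per-vertex features would be fed to a small regressor that outputs a parameter update, refining the mesh-image alignment at progressively finer resolutions.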

8.
Article in English | MEDLINE | ID: mdl-37018302

ABSTRACT

Clinical management and accurate disease diagnosis are evolving from the qualitative stage to the quantitative stage, particularly at the cellular level. However, the manual process of histopathological analysis is labor-intensive and time-consuming, and its accuracy is limited by the experience of the pathologist. Therefore, deep learning-empowered computer-aided diagnosis (CAD) is emerging as an important topic in digital pathology for streamlining the standard process of automatic tissue analysis. Automated, accurate nucleus segmentation not only helps pathologists make more accurate diagnoses while saving time and labor, but also yields consistent and efficient diagnostic results. However, nucleus segmentation is susceptible to staining variation, uneven nucleus intensity, background noise, and nucleus-tissue differences in biopsy specimens. To solve these problems, we propose Deep Attention Integrated Networks (DAINets), built mainly on a self-attention-based spatial attention module and a channel attention module. In addition, we introduce a feature fusion branch to fuse high-level representations with low-level features for multi-scale perception, and employ the marker-based watershed algorithm to refine the predicted segmentation maps. Furthermore, in the testing phase, we design Individual Color Normalization (ICN) to address the staining variation problem in specimens. Quantitative evaluations on the multi-organ nucleus dataset demonstrate the superiority of our automated nucleus segmentation framework.
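
Editor's note: a generic squeeze-and-excitation-style channel attention module, sketched below, is one standard realization of the channel attention the abstract mentions; the paper's actual module may differ in detail:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight channels by globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling
        return x * weights.view(b, c, 1, 1)     # excite: per-channel reweighting
```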

9.
IEEE Trans Med Imaging ; 42(4): 1159-1171, 2023 04.
Article in English | MEDLINE | ID: mdl-36423314

ABSTRACT

With the development of deep convolutional neural networks, medical image segmentation has achieved a series of breakthroughs in recent years. However, high-performance convolutional neural networks invariably entail numerous parameters and high computation costs, which hinders their application in resource-limited medical scenarios. Meanwhile, the scarcity of large-scale annotated medical image datasets further impedes the deployment of high-performance networks. To tackle these problems, we propose Graph Flow, a comprehensive knowledge distillation framework for both network-efficient and annotation-efficient medical image segmentation. Specifically, Graph Flow Distillation transfers the essence of cross-layer variations from a well-trained, cumbersome teacher network to an untrained, compact student network. In addition, an unsupervised Paraphraser Module is integrated to purify the knowledge of the teacher, which also stabilizes training. Furthermore, we build a unified distillation framework by integrating adversarial distillation and vanilla logits distillation, which further refines the final predictions of the compact network. With different teacher networks (a traditional convolutional architecture or a prevalent transformer architecture) and student networks, we conduct extensive experiments on four medical image datasets with different modalities (Gastric Cancer, Synapse, BUSI, and CVC-ClinicDB). We demonstrate the prominent ability of our method on these datasets, achieving competitive performance. Moreover, we demonstrate the effectiveness of Graph Flow through a novel semi-supervised paradigm for dually efficient medical image segmentation. Our code will be available at Graph Flow.
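
Editor's note: distilling "cross-layer variations" can be illustrated with FSP-style flow matrices, i.e., Gram-like products between two layers' feature maps, matched between teacher and student. This is a simplified stand-in for the paper's graph-based formulation; it assumes matched channel and spatial dimensions (handled by projection layers in practice):

```python
import torch
import torch.nn.functional as F

def flow_matrix(f_in: torch.Tensor, f_out: torch.Tensor) -> torch.Tensor:
    """Flow between two layers' maps (B, C1, H, W) and (B, C2, H, W)
    with the same spatial size, in the spirit of FSP matrices."""
    b, c1, h, w = f_in.shape
    c2 = f_out.shape[1]
    a = f_in.reshape(b, c1, h * w)
    bm = f_out.reshape(b, c2, h * w)
    return a @ bm.transpose(1, 2) / (h * w)   # (B, C1, C2)

def flow_distill_loss(t_in, t_out, s_in, s_out):
    # Match the student's cross-layer flow to the teacher's.
    return F.mse_loss(flow_matrix(s_in, s_out), flow_matrix(t_in, t_out))
```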


Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer
10.
IEEE Trans Image Process ; 31: 4651-4662, 2022.
Article in English | MEDLINE | ID: mdl-35786554

ABSTRACT

One major issue that challenges person re-identification (Re-ID) is the ubiquitous occlusion of captured persons. The occluded person Re-ID problem poses two main challenges: the interference of noise during feature matching and the loss of pedestrian information caused by occlusions. In this paper, we propose a new approach called Feature Recovery Transformer (FRT) to address both challenges simultaneously; it consists mainly of visibility graph matching and a feature recovery transformer. To reduce the interference of noise during feature matching, we focus on visible regions that appear in both images and develop a visibility graph to calculate the similarity. For the second challenge, based on the developed graph similarity, we propose a recovery transformer that, for each query image, exploits the feature sets of its k-nearest neighbors in the gallery to recover the complete features. Extensive experiments across different person Re-ID datasets, including occluded, partial, and holistic datasets, demonstrate the effectiveness of FRT. Specifically, FRT significantly outperforms state-of-the-art methods by at least 6.2% in Rank-1 accuracy and 7.2% in mAP on the challenging Occluded-Duke dataset.
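
Editor's note: the recovery idea can be reduced to a similarity-weighted blend of the k nearest gallery features. The sketch below is a deliberate simplification of FRT's transformer-based recovery, useful only to fix the data flow in mind:

```python
import torch
import torch.nn.functional as F

def recover_features(query: torch.Tensor, gallery: torch.Tensor, k: int = 5) -> torch.Tensor:
    """query: (D,) feature of an occluded image; gallery: (N, D) gallery features.
    Returns a completed feature blended from the k most similar gallery entries."""
    sims = F.cosine_similarity(gallery, query.unsqueeze(0), dim=1)  # (N,)
    topk = sims.topk(k)
    weights = torch.softmax(topk.values, dim=0)                     # (k,)
    return (weights.unsqueeze(1) * gallery[topk.indices]).sum(dim=0)
```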


Subject(s)
Biometric Identification; Pedestrians; Biometric Identification/methods; Humans; Image Processing, Computer-Assisted/methods; Machine Learning
11.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2610-2627, 2022 05.
Article in English | MEDLINE | ID: mdl-33270560

ABSTRACT

Reconstructing 3D human shape and pose from monocular images is challenging despite the promising results achieved by the most recent learning-based methods. The commonly observed misalignment stems from the facts that the mapping from images to the model space is highly non-linear and that the rotation-based pose representation of the body model is prone to drift in joint positions. In this work, we investigate learning 3D human shape and pose from dense correspondences of body parts and propose a Decompose-and-aggregate Network (DaNet) to address these issues. DaNet adopts dense correspondence maps, which densely bridge 2D pixels and 3D vertices, as intermediate representations to facilitate the learning of the 2D-to-3D mapping. The prediction modules of DaNet are decomposed into one global stream and multiple local streams to enable global and fine-grained perception for the shape and pose predictions, respectively. Messages from the local streams are further aggregated to enhance the robust prediction of rotation-based poses, where a position-aided rotation feature refinement strategy is proposed to exploit the spatial relationships between body joints. Moreover, a Part-based Dropout (PartDrop) strategy is introduced to drop out dense information from the intermediate representations during training, encouraging the network to focus on more complementary body parts as well as neighboring position features. The efficacy of the proposed method is validated on both indoor and real-world datasets, including Human3.6M, UP3D, COCO, and 3DPW, showing that our method significantly improves reconstruction performance in comparison with previous state-of-the-art methods. Our code is publicly available at https://hongwenzhang.github.io/dense2mesh.
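
Editor's note: PartDrop can be pictured as zeroing whole body parts in the dense intermediate representation, given a part-index map such as a DensePose-style IUV segmentation. A minimal sketch; the exact sampling scheme is an assumption, not the paper's specification:

```python
import torch

def part_drop(part_map: torch.Tensor, feats: torch.Tensor, drop_ratio: float = 0.3) -> torch.Tensor:
    """part_map: (B, H, W) integer map (0 = background, 1..P = body parts);
    feats: (B, C, H, W) dense intermediate features. Randomly zeroes whole parts."""
    num_parts = int(part_map.max().item())
    drop = torch.rand(num_parts, device=feats.device) < drop_ratio  # which parts to drop
    mask = torch.ones_like(part_map, dtype=feats.dtype)
    for p in range(1, num_parts + 1):
        if drop[p - 1]:
            mask[part_map == p] = 0.0
    return feats * mask.unsqueeze(1)  # broadcast over channels
```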


Subject(s)
Human Body; Imaging, Three-Dimensional; Algorithms; Humans; Imaging, Three-Dimensional/methods
12.
Article in English | MEDLINE | ID: mdl-37015555

ABSTRACT

Recent studies of video action recognition can be classified into two categories: appearance-based methods and pose-based methods. Appearance-based methods generally cannot model the temporal dynamics of large motions well by virtue of optical flow estimation, while pose-based methods ignore visual context information such as typical scenes and objects, which are also important cues for action understanding. In this paper, we tackle these problems by proposing a Pose-Appearance Relational Network (PARNet), which models the correlation between human pose and image appearance and combines the benefits of the two modalities to improve robustness on unconstrained real-world videos. There are three network streams in our model: a pose stream, an appearance stream, and a relation stream. For the pose stream, a Temporal Multi-Pose RNN module obtains dynamic representations through temporal modeling of 2D poses. For the appearance stream, a Spatial Appearance CNN module extracts the global appearance representation of the video sequence. For the relation stream, a Pose-Aware RNN module connects the pose and appearance streams by modeling action-sensitive visual context information. By jointly optimizing the three modules, PARNet achieves superior performance compared with state-of-the-art methods on both pose-complete datasets (KTH, Penn-Action, UCF11) and challenging pose-incomplete datasets (UCF101, HMDB51, JHMDB), demonstrating its robustness to complex environments and noisy skeletons. Its effectiveness on the NTU-RGBD dataset is also validated, even in comparison with 3D skeleton-based methods. Furthermore, an appearance-enhanced PARNet equipped with an RGB-based I3D stream outperforms Kinetics pre-trained competitors on UCF101 and HMDB51. These experimental results verify the potential of our framework in integrating various modules.

13.
IEEE Trans Pattern Anal Mach Intell ; 42(5): 1025-1037, 2020 05.
Article in English | MEDLINE | ID: mdl-31880541

ABSTRACT

Near infrared-visible (NIR-VIS) heterogeneous face recognition refers to the process of matching NIR to VIS face images. Current heterogeneous methods try to extend VIS face recognition methods to the NIR spectrum by synthesizing VIS images from NIR images. However, due to self-occlusion and the sensing gap, NIR face images lack some visible-lighting content and are therefore always incomplete compared to VIS face images. This paper models high-resolution heterogeneous face synthesis as the complementary combination of two components: a texture inpainting component and a pose correction component. The inpainting component synthesizes and inpaints VIS image textures from NIR image textures. The correction component maps any pose in an NIR image to a frontal pose in a VIS image, resulting in paired NIR and VIS textures. A warping procedure is developed to integrate the two components into an end-to-end deep network. A fine-grained discriminator and a wavelet-based discriminator are designed to improve visual quality. A novel 3D-based pose correction loss, two adversarial losses, and a pixel loss are imposed to ensure high-quality synthesis results. We demonstrate that by attaching the correction component, we can simplify heterogeneous face synthesis from one-to-many unpaired image translation to one-to-one paired image translation, and minimize the spectral and pose discrepancies during heterogeneous recognition. Extensive experimental results show that our network not only generates high-resolution VIS face images but also improves the accuracy of heterogeneous face recognition.


Subject(s)
Automated Facial Recognition/methods; Spectroscopy, Near-Infrared/methods; Databases, Factual; Face/anatomy & histology; Face/diagnostic imaging; Humans; Supervised Machine Learning
14.
Article in English | MEDLINE | ID: mdl-31567089

ABSTRACT

Binocular stereo vision (SV) has been widely used to reconstruct depth information, but it is quite vulnerable to scenes with strong occlusions. As an emerging computational photography technology, light-field (LF) imaging offers a novel solution to passive depth perception by recording multiple angular views in a single exposure. In this paper, we combine binocular SV and LF imaging to form a binocular-LF imaging system. An imaging theory is derived by modeling the imaging process and analyzing disparity properties based on geometrical optics. We then propose an accurate, occlusion-robust depth estimation algorithm that exploits multibaseline stereo matching cues and defocus cues. The occlusions caused by binocular SV and LF imaging are detected and handled to eliminate matching ambiguities and outliers. Finally, we develop a binocular-LF database and capture real-world scenes with our binocular-LF system to test accuracy and robustness. The experimental results demonstrate that the proposed algorithm recovers high-quality depth maps with smooth surfaces and precise geometric shapes, tackling the drawbacks of binocular SV and LF imaging simultaneously.
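
Editor's note: the disparity analysis in any such imaging theory ultimately rests on the standard pinhole triangulation relation Z = f·B/d. A one-line worked example:

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Standard stereo relation Z = f * B / d: depth (m) from disparity (px),
    focal length (px), and baseline (m)."""
    return focal_px * baseline_m / disparity_px

# e.g. a 12.5 px disparity with f = 1000 px and B = 0.1 m gives Z = 8 m
print(depth_from_disparity(12.5, 1000.0, 0.1))
```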

15.
Article in English | MEDLINE | ID: mdl-31021767

ABSTRACT

Regression-based methods have revolutionized 2D landmark localization through deep neural networks and massive annotated in-the-wild datasets. However, 3D landmark localization remains challenging due to the lack of annotated datasets and the ambiguous nature of landmarks from a 3D perspective. This paper revisits regression-based methods and proposes an adversarial voxel and coordinate regression framework for 2D and 3D facial landmark localization in real-world scenarios. First, a semantic volumetric representation is introduced to encode the per-voxel likelihood of each position being a 3D landmark. Then, an end-to-end pipeline is designed to jointly regress the proposed volumetric representation and the coordinate vector. Such a pipeline not only enhances the robustness and accuracy of the predictions but also unifies 2D and 3D landmark localization so that 2D and 3D datasets can be utilized simultaneously. Furthermore, an adversarial learning strategy is exploited to distill 3D structure learned from synthetic datasets into real-world datasets under weakly supervised settings, where an auxiliary regression discriminator encourages the network to produce plausible predictions for both synthetic and real-world images. The effectiveness of our method is validated on the benchmark datasets 3DFAW and AFLW2000-3D for both 2D and 3D facial landmark localization. Experimental results show that the proposed method achieves significant improvements over previous state-of-the-art methods.
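
Editor's note: a differentiable soft-argmax readout is one standard way to bridge a per-voxel likelihood volume and a coordinate vector; it is shown here as a plausible connecting step, not necessarily the paper's exact pipeline:

```python
import torch

def soft_argmax_3d(volume: torch.Tensor) -> torch.Tensor:
    """volume: (B, K, D, H, W) per-voxel scores for K landmarks.
    Returns (B, K, 3) expected (z, y, x) coordinates."""
    b, k, d, h, w = volume.shape
    probs = torch.softmax(volume.reshape(b, k, -1), dim=2).reshape(b, k, d, h, w)
    zs = torch.arange(d, dtype=probs.dtype, device=probs.device)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    z = (probs.sum(dim=(3, 4)) * zs).sum(dim=2)  # expectation over depth
    y = (probs.sum(dim=(2, 4)) * ys).sum(dim=2)  # expectation over height
    x = (probs.sum(dim=(2, 3)) * xs).sum(dim=2)  # expectation over width
    return torch.stack([z, y, x], dim=2)
```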

16.
IEEE Trans Pattern Anal Mach Intell ; 41(5): 1027-1042, 2019 05.
Article in English | MEDLINE | ID: mdl-29993436

ABSTRACT

Unsupervised domain adaptation aims to leverage labeled source data to learn from unlabeled target data. Previous transductive methods tackle it by iteratively seeking a low-dimensional projection to extract invariant features and obtaining pseudo target labels via a classifier built on the source data. However, they merely concentrate on minimizing the cross-domain distribution divergence while ignoring intra-domain structure, especially in the target domain. Even after projection, risk factors such as imbalanced data distributions may still hinder target label inference. In this paper, we propose a simple yet effective domain-invariant projection ensemble approach that tackles these two issues together. Specifically, we seek the optimal projection via a novel relaxed, domain-irrelevant clustering-promoting term that jointly bridges the cross-domain semantic gap and increases intra-class compactness in both domains. To further enhance target label inference, we develop a 'sampling-and-fusion' framework, under which multiple projections are independently learned from various randomized coupled domain subsets. Aggregation schemes such as majority voting are then used to combine the multiple projections and classify the unlabeled target data. Extensive experimental results on six visual benchmarks, including object, face, and digit images, demonstrate that the proposed methods gain remarkable margins over state-of-the-art unsupervised domain adaptation methods.
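
Editor's note: the fusion step described above, majority voting over the labels produced by independently learned projections, is simple enough to show directly:

```python
import numpy as np

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (M, N) labels from M independently learned projections
    for N target samples. Returns the per-sample majority label."""
    num_classes = predictions.max() + 1
    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=num_classes)
    return votes.argmax(axis=0)  # (N,) fused labels

# e.g. three projections voting over four target samples
preds = np.array([[0, 1, 2, 1],
                  [0, 1, 1, 1],
                  [2, 1, 2, 0]])
print(majority_vote(preds))  # -> [0 1 2 1]
```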

17.
IEEE Trans Pattern Anal Mach Intell ; 41(7): 1761-1773, 2019 07.
Article in English | MEDLINE | ID: mdl-29993534

ABSTRACT

Heterogeneous face recognition (HFR) aims at matching facial images acquired from different sensing modalities, with mission-critical applications in the forensic, security, and commercial sectors. However, HFR presents more challenging issues than traditional face recognition because of the large intra-class variation among heterogeneous face images and the limited availability of training samples of cross-modality face image pairs. This paper proposes the novel Wasserstein convolutional neural network (WCNN) approach for learning invariant features between near-infrared (NIR) and visual (VIS) face images (i.e., NIR-VIS face recognition). The low-level layers of the WCNN are trained with widely available face images in the VIS spectrum, and the high-level layer is divided into three parts: the NIR layer, the VIS layer, and the NIR-VIS shared layer. The first two layers aim at learning modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning is performed to minimize the Wasserstein distance between the NIR distribution and the VIS distribution, yielding invariant deep feature representations of heterogeneous face images. To avoid over-fitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected WCNN layers to reduce the size of the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the superiority of the WCNN method over state-of-the-art methods.


Subject(s)
Biometric Identification/methods; Face/anatomy & histology; Neural Networks, Computer; Spectroscopy, Near-Infrared/methods; Algorithms; Databases, Factual; Facial Expression; Humans; Image Processing, Computer-Assisted/methods
18.
Article in English | MEDLINE | ID: mdl-30582539

ABSTRACT

Hashing has attracted increasing attention due to its tremendous potential for efficient image retrieval and data storage. Compared with conventional hashing methods based on handcrafted features, emerging deep hashing approaches employ deep neural networks to learn both feature representations and hash functions, which have proven more powerful and robust in real-world applications. Currently, most existing deep hashing methods construct pairwise or triplet-wise constraints to obtain similar binary codes for similar data pairs or relatively similar binary codes within a triplet. However, some critical local structures of the data remain under-exploited, so the effectiveness of hash learning is not fully realized. To address this limitation, we propose a novel deep hashing method named local semantic-aware deep hashing with Hamming-isometric quantization (LSDH), in which local similarity of the data is intentionally integrated into hash learning. Specifically, in the Hamming space, we exploit the potential semantic relations of the data to robustly preserve their local similarity. In addition to reducing the error introduced by binary quantization, we further develop a Hamming-isometric objective that maximizes the consistency between the pairwise similarities of the binary-like features and those of their binary codes, which is shown to enhance the quality of the binary codes. Extensive experimental results on several benchmark datasets, including three single-label datasets (i.e., CIFAR-10, CIFAR-20, and SUN397) and one multi-label dataset (NUS-WIDE), demonstrate that the proposed LSDH achieves superior performance over the latest state-of-the-art hashing methods.
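
Editor's note: one plausible reading of the Hamming-isometric objective is a loss that pulls the pairwise similarity matrix of the continuous binary-like codes toward that of their signed quantization. A minimal sketch under that assumption; the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def hamming_isometric_loss(u: torch.Tensor) -> torch.Tensor:
    """u: (N, L) binary-like codes in (-1, 1), e.g. tanh outputs. Encourages
    pairwise similarities of u to match those of b = sign(u)."""
    b = torch.sign(u)
    un = F.normalize(u, dim=1)
    sim_u = un @ un.t()                 # cosine similarities of continuous codes
    sim_b = b @ b.t() / u.size(1)       # code inner products, scaled to [-1, 1]
    return F.mse_loss(sim_u, sim_b)
```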

19.
Article in English | MEDLINE | ID: mdl-30235130

ABSTRACT

Partial face recognition (PFR) in unconstrained environments is a very important task, especially in scenarios where partial face images are likely to be captured due to occlusion, out-of-view cropping, and large viewing angles, e.g., in video surveillance and on mobile devices. However, little attention has been paid to PFR so far, and the problem of recognizing an arbitrary patch of a face image thus remains largely unsolved. This study proposes a novel partial face recognition approach, called Dynamic Feature Matching (DFM), which combines Fully Convolutional Networks (FCNs) and Sparse Representation Classification (SRC) to address partial face recognition regardless of face size. DFM does not require prior position information of partial faces relative to a holistic face. By sharing computation, the feature maps are calculated from the entire input image once, which yields a significant speedup. Experimental results demonstrate the effectiveness and advantages of DFM in comparison with state-of-the-art PFR methods on several partial face databases, including the CASIA-NIR-Distance, CASIA-NIR-Mobile, and LFW databases. The performance of DFM is also impressive for partial person re-identification on the Partial RE-ID and iLIDS databases. The source code of DFM can be found at https://github.com/lingxiao-he/dfm new.

20.
IEEE Trans Image Process ; 27(9): 4274-4286, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870347

ABSTRACT

The low spatial resolution of light-field images poses significant difficulties in exploiting their advantages. To mitigate the dependency on accurate depth or disparity information as priors for light-field image super-resolution, we propose an implicit multi-scale fusion scheme that accumulates contextual information from multiple scales for super-resolution reconstruction. The implicit multi-scale fusion scheme is then incorporated into a bidirectional recurrent convolutional neural network, which iteratively models the spatial relations between horizontally or vertically adjacent sub-aperture images of the light-field data. Within the network, the recurrent convolutions are modified to be more effective and flexible in modeling the spatial correlations between neighboring views. A horizontal sub-network and a vertical sub-network of the same structure are ensembled for the final output via stacked generalization. Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art methods by a large margin in peak signal-to-noise ratio and gray-scale structural similarity indexes, and also achieves superior quality for human visual perception. Furthermore, the proposed method can enhance the performance of light-field applications such as depth estimation.
