Results 1 - 20 of 149
1.
Article in English | MEDLINE | ID: mdl-33852407

ABSTRACT

Learning to hash has been widely applied to image retrieval due to its low storage cost and high retrieval efficiency. Existing hashing methods assume that the distributions of the retrieval pool (i.e., the data sets being retrieved) and the query data are similar, which, however, does not reflect real-world conditions because of unconstrained visual cues, such as illumination, pose, and background. Due to the large distribution gap between the retrieval pool and the query set, the performance of traditional hashing methods degrades seriously. Therefore, we propose a new, efficient, and transferable hashing model for unconstrained cross-domain visual retrieval, in which the retrieval pool and the query sample are drawn from different but semantically relevant domains. Specifically, we propose a simple yet effective unsupervised hashing method, domain adaptation preconceived hashing (DAPH), for learning a domain-invariant hashing representation. DAPH has three merits: 1) to the best of our knowledge, we are the first to address unconstrained visual retrieval by introducing domain adaptation (DA) into hashing to learn transferable hashing codes; 2) a domain-invariant feature transformation with marginal discrepancy distance minimization and a feature reconstruction constraint is learned, such that the hashing code is not only domain-adaptive but also content-preserving; and 3) a DA preconceived quantization loss is proposed, which further guarantees the discriminative power of the learned hashing code for sample retrieval. Extensive experiments on various benchmark data sets verify that DAPH outperforms many state-of-the-art hashing methods in unconstrained (unrestricted) instance retrieval in both single- and cross-domain scenarios.
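The sign-based quantization at the heart of hashing methods like the one above can be illustrated with a minimal sketch (illustrative only, not the DAPH model; the function names and the plain mean-squared quantization loss are assumptions):

```python
import numpy as np

def binarize(features):
    """Quantize continuous features to {-1, +1} hash codes via the sign function."""
    return np.where(np.asarray(features, dtype=float) >= 0, 1.0, -1.0)

def quantization_loss(features):
    """Mean squared gap between continuous features and their binary codes.

    A small loss means the real-valued embedding is already close to binary,
    so thresholding discards little information.
    """
    features = np.asarray(features, dtype=float)
    return float(np.mean((features - binarize(features)) ** 2))

def hamming_distance(code_a, code_b):
    """Hamming distance between two {-1, +1} hash codes."""
    return int(np.sum(code_a != code_b))
```

A query is then matched against the retrieval pool by Hamming distance between binary codes, which is what makes storage and lookup cheap.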

2.
Article in English | MEDLINE | ID: mdl-33861711

ABSTRACT

Heterogeneous faces are acquired with different sensors, which is closer to real-world scenarios and plays an important role in the biometric security field. However, heterogeneous face analysis is still a challenging problem due to the large discrepancy between modalities. Recent works either focus on designing a novel loss function or network architecture to directly extract modality-invariant features, or synthesize faces of the same modality first to decrease the modality gap. Yet the former often lacks explicit interpretability, and the latter strategy inherently introduces synthesis bias. In this article, we explore learning a plain, interpretable representation for complex heterogeneous faces while simultaneously performing face recognition and synthesis tasks. We propose the heterogeneous face interpretable disentangled representation (HFIDR), which explicitly interprets the dimensions of the face representation rather than performing a simple mapping. Benefiting from the interpretable structure, we can further extract latent identity information for cross-modality recognition and convert the modality factor to synthesize cross-modality faces. Moreover, we propose a multimodality heterogeneous face interpretable disentangled representation (M-HFIDR) to extend the basic approach to multimodality face recognition and synthesis. To evaluate generalization ability, we construct a novel large-scale face sketch data set. Experimental results on multiple heterogeneous face databases demonstrate the effectiveness of the proposed method.

3.
Indian J Ophthalmol ; 69(4): 865-870, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33727449

ABSTRACT

Purpose: Obtaining a better understanding of the pathogenesis of primary angle-closure disease (PACD) still requires studies that provide measurements of anterior and posterior biometric characteristics together and that assess the relationship between them. Methods: In total, 201 eyes were enrolled in this cross-sectional study: 50 normal controls, 49 primary angle-closure suspect (PACS), 38 primary angle closure (PAC), and 64 primary angle-closure glaucoma (PACG) eyes. The anterior and posterior structural features were measured by anterior segment optical coherence tomography and swept-source optical coherence tomography. Results: All PACD groups had smaller anterior chamber depth (ACD), anterior chamber area (ACA), anterior chamber volume (ACV), angle opening distance at 750 µm from the scleral spur (AOD750), trabecular-iris space area at 750 µm from the scleral spur (TISA750), and angle recess area (ARA), as well as a larger lens vault (LV), than controls (all P < 0.001). The PACS and PAC groups had thicker iris thickness at 750 µm from the scleral spur (IT750) than controls (P = 0.017 and P = 0.002, respectively). Choroidal thickness (CT) was not statistically different among normal, PACS, PAC, and PACG eyes. Univariate and multivariate linear regression analysis revealed a significant association between thinner IT750 and increased CT in PACD eyes (P = 0.031, univariate analysis; P = 0.008, multivariate analysis). Conclusion: Thinner iris thickness was associated with increased CT in PACD eyes; however, the underlying mechanism needs further investigation.

4.
Article in English | MEDLINE | ID: mdl-33651699

ABSTRACT

Person re-identification (Re-ID) aims to retrieve images of the same person across disjoint camera views. Most Re-ID studies focus on pedestrian images captured by visible cameras, without considering the infrared images obtained in dark scenarios. Person retrieval between the visible and infrared modalities is of great significance to public security. Current methods usually train a model to extract global feature descriptors and obtain discriminative representations for visible-infrared person Re-ID (VI-REID). Nevertheless, they ignore the detailed information in heterogeneous pedestrian images, which hurts Re-ID performance. In this article, we propose a flexible body partition (FBP) model-based adversarial learning method (FBP-AL) for VI-REID. To learn more fine-grained information, the FBP model is exploited to automatically distinguish part representations according to the feature maps of pedestrian images. Specifically, we design a modality classifier and introduce adversarial learning, which attempts to discriminate features between the visible and infrared modalities. Adaptive weighting-based representation learning and threefold triplet loss-based metric learning compete with modality classification to obtain more effective modality-sharable features, thus shrinking the cross-modality gap and enhancing feature discriminability. Extensive experimental results on two cross-modality person Re-ID data sets, i.e., SYSU-MM01 and RegDB, exhibit the superiority of the proposed method over state-of-the-art solutions.
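The metric-learning ingredient mentioned above is built on the triplet loss; a generic single-triplet sketch (not the paper's threefold variant; the margin value is an arbitrary assumption):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: pull the positive closer to the anchor than the
    negative by at least `margin`, using squared Euclidean distances."""
    anchor = np.asarray(anchor, dtype=float)
    d_pos = np.sum((anchor - np.asarray(positive, dtype=float)) ** 2)
    d_neg = np.sum((anchor - np.asarray(negative, dtype=float)) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))
```

When the negative is already farther than the positive by more than the margin, the loss is zero and the triplet contributes no gradient.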

5.
IEEE Trans Cybern ; PP2021 Feb 26.
Article in English | MEDLINE | ID: mdl-33635814

ABSTRACT

Despite promising preliminary results, tensor singular value decomposition (t-SVD)-based multiview subspace clustering is incapable of dealing with real problems, such as noise and illumination changes. The major reason is that the tensor nuclear norm minimization (TNNM) used in t-SVD regularizes each singular value equally, which does not make sense in matrix completion and coefficient matrix learning. In these settings, the singular values represent different perspectives and should be treated differently. To exploit the significant differences between singular values, we study the weighted tensor Schatten p-norm based on t-SVD and develop an efficient algorithm to solve the weighted tensor Schatten p-norm minimization (WTSNM) problem. After that, applying WTSNM to learn the coefficient matrix in multiview subspace clustering, we present a novel multiview clustering method that integrates coefficient matrix learning and spectral clustering into a unified framework. The learned coefficient matrix exploits both the cluster structure and the high-order information embedded in the multiple views. Extensive experiments demonstrate the effectiveness of our method on six metrics.
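For a single matrix, the weighted Schatten p-norm that the abstract builds on is just a weighted power mean of the singular values; a minimal matrix-case sketch (the full method operates on the t-SVD of a tensor, which is omitted here):

```python
import numpy as np

def weighted_schatten_p(M, weights, p):
    """Weighted Schatten p-norm of a matrix: (sum_i w_i * sigma_i**p) ** (1/p).

    numpy returns singular values in descending order, so weights[i] applies
    to the i-th largest singular value; small weights on large singular
    values shrink them less, preserving the dominant structure.
    """
    sigma = np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False)
    return float(np.sum(np.asarray(weights, dtype=float) * sigma ** p) ** (1.0 / p))
```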

6.
IEEE Trans Cybern ; PP2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33606648

ABSTRACT

Multiview clustering has aroused increasing attention in recent years since real-world data are always comprised of multiple features or views. Despite the existing clustering methods having achieved promising performance, there still remain some challenges to be solved: 1) most existing methods are unscalable to large-scale datasets due to the high computational burden of eigendecomposition or graph construction and 2) most methods learn latent representations and cluster structures separately. Such a two-step learning scheme neglects the correlation between the two learning stages and may obtain a suboptimal clustering result. To address these challenges, a pseudo-label guided collective matrix factorization (PLCMF) method that jointly learns latent representations and cluster structures is proposed in this article. The proposed PLCMF first performs clustering on each view separately to obtain pseudo-labels that reflect the intraview similarities of each view. Then, it adds a pseudo-label constraint on collective matrix factorization to learn unified latent representations, which preserve the intraview and interview similarities simultaneously. Finally, it intuitively incorporates latent representation learning and cluster structure learning into a joint framework to directly obtain clustering results. Besides, the weight of each view is learned adaptively according to data distribution in the joint framework. In particular, the joint learning problem can be solved with an efficient iterative updating method with linear complexity. Extensive experiments on six benchmark datasets indicate the superiority of the proposed method over state-of-the-art multiview clustering methods in both clustering accuracy and computational efficiency.
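The shared-representation idea behind collective matrix factorization can be sketched as follows (a toy alternating-least-squares version; the pseudo-label constraint and adaptive view weights of PLCMF are deliberately omitted, and all names are illustrative):

```python
import numpy as np

def collective_mf(views, k, iters=200, seed=0):
    """Toy collective matrix factorization: factor every view X_v (n x d_v)
    as X_v ~= H @ W_v with one shared latent representation H (n x k).

    Alternating least squares with a tiny ridge term for numerical stability.
    Only the shared-representation mechanism is illustrated here.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    H = rng.standard_normal((n, k))
    eye = 1e-6 * np.eye(k)
    for _ in range(iters):
        # Update each view-specific basis W_v given the shared H.
        Ws = [np.linalg.solve(H.T @ H + eye, H.T @ X) for X in views]
        # Update the shared representation H given all W_v.
        A = sum(W @ W.T for W in Ws) + eye
        B = sum(X @ W.T for X, W in zip(views, Ws))
        H = np.linalg.solve(A, B.T).T
    return H, Ws
```

On exactly low-rank multi-view data, the reconstruction error of every view drops to near zero, since all views share one consistent latent factor.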

7.
Neuroimage ; 228: 117602, 2021 03.
Article in English | MEDLINE | ID: mdl-33395572

ABSTRACT

Reconstructing a perceived stimulus (image) from human brain activity measured with functional magnetic resonance imaging (fMRI) is a significant task in brain decoding. However, the inconsistent distribution and representation between fMRI signals and visual images create a large 'domain gap'. Moreover, fMRI data instances are limited and generally suffer from low signal-to-noise ratio (SNR), extremely high dimensionality, and limited spatial resolution. Existing methods are often affected by these issues, so a satisfactory reconstruction remains an open problem. In this paper, we show that a promising solution can be obtained by learning visually guided latent cognitive representations from the fMRI signals and inversely decoding them to the image stimuli. The resulting framework, called the Dual-Variational Autoencoder/Generative Adversarial Network (D-VAE/GAN), combines the advantages of adversarial representation learning with knowledge distillation. In addition, we introduce a novel three-stage learning strategy that enables the (cognitive) encoder to gradually distill useful knowledge from the paired (visual) encoder during the learning process. Extensive experimental results on both artificial and natural images demonstrate that our method achieves surprisingly good results and outperforms the available alternatives.


Subjects
Brain/physiology; Cognition/physiology; Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods
8.
Article in English | MEDLINE | ID: mdl-33428577

ABSTRACT

Mitigating label noise is a crucial problem in classification. Noise filtering is an effective method of dealing with label noise which does not need to estimate the noise rate or rely on any loss function. However, most filtering methods focus mainly on binary classification, leaving the more difficult counterpart problem of multiclass classification relatively unexplored. To remedy this deficit, we present a definition for label noise in a multiclass setting and propose a general framework for a novel label noise filtering learning method for multiclass classification. Two examples of noise filtering methods for multiclass classification, multiclass complete random forest (mCRF) and multiclass relative density, are derived from their binary counterparts using our proposed framework. In addition, to optimize the NI_threshold hyperparameter in mCRF, we propose two new optimization methods: a new voting cross-validation method and an adaptive method that employs a 2-means clustering algorithm. Furthermore, we incorporate SMOTE into our label noise filtering learning framework to handle the ubiquitous problem of imbalanced data in multiclass classification. We report experiments on both synthetic data sets and UCI benchmarks to demonstrate our proposed methods are highly robust to label noise in comparison with state-of-the-art baselines. All code and data results are available at https://github.com/syxiaa/Multiclass-Label-Noise-Filtering-Learning.
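The intuition behind neighborhood-based noise filtering can be sketched with a simple kNN disagreement rule (a simplified stand-in, not the paper's mCRF or relative-density methods; the function name and the majority threshold are assumptions):

```python
import numpy as np

def knn_disagreement_filter(X, y, k=3):
    """Flag a sample as suspected label noise when the majority of its k
    nearest neighbors (excluding itself) carry a different label.

    Like relative-density filtering, this relies on a sample's local
    neighborhood disagreeing with its assigned class; it works for any
    number of classes, not just binary problems.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    flags = np.zeros(n, dtype=bool)
    # Pairwise Euclidean distances (fine for small n; use a KD-tree at scale).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(n):
        order = np.argsort(d[i])
        neighbors = [j for j in order if j != i][:k]
        disagree = sum(y[j] != y[i] for j in neighbors)
        flags[i] = disagree > k / 2
    return flags
```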

9.
Article in English | MEDLINE | ID: mdl-33481708

ABSTRACT

Benefiting from the strong capabilities of deep CNNs for feature representation and nonlinear mapping, deep-learning-based methods have achieved excellent performance in single image super-resolution. However, most existing SR methods depend on the high capacity of networks initially designed for visual recognition, and rarely consider the original intention of super-resolution: detail fidelity. To pursue this intention, two challenging issues must be solved: (1) learning appropriate operators that adapt to the diverse characteristics of smooth regions and details; and (2) improving the ability of the model to preserve low-frequency smooth regions and reconstruct high-frequency details. To solve these problems, we propose a purposeful and interpretable detail-fidelity attention network that progressively processes smooth regions and details in a divide-and-conquer manner, a novel perspective on image super-resolution aimed specifically at improving detail fidelity. The proposed method moves beyond blindly designing or reusing deep CNN architectures solely for feature representation in local receptive fields. In particular, we propose Hessian filtering for interpretable high-profile feature representation for detail inference, along with a dilated encoder-decoder and a distribution alignment cell that improve the inferred Hessian features in morphological and statistical manners, respectively. Extensive experiments demonstrate that the proposed method achieves superior performance compared with state-of-the-art methods, both quantitatively and qualitatively. The code is available at https://github.com/YuanfeiHuang/DeFiAN.
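Hessian responses as a detail detector can be sketched with plain finite differences (a hand-crafted toy, not the learned Hessian filtering of DeFiAN; the function name is illustrative):

```python
import numpy as np

def hessian_detail_map(img):
    """Per-pixel Frobenius norm of the 2x2 Hessian, estimated with finite
    differences via np.gradient. High responses mark detail (edges, texture);
    near-zero responses mark smooth regions, so the map separates the two
    regimes that the network processes differently.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)        # first derivatives (rows, cols)
    gyy, gyx = np.gradient(gy)       # second derivatives of gy
    gxy, gxx = np.gradient(gx)       # second derivatives of gx
    return np.sqrt(gxx ** 2 + gyy ** 2 + gxy ** 2 + gyx ** 2)
```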

10.
IEEE Trans Image Process ; 30: 2016-2028, 2021.
Article in English | MEDLINE | ID: mdl-33439841

ABSTRACT

Facial expression recognition is of significant importance in criminal investigation and digital entertainment. Under unconstrained conditions, existing expression datasets are highly class-imbalanced, and the similarity between expressions is high. Previous methods tend to improve the performance of facial expression recognition through deeper or wider network structures, resulting in increased storage and computing costs. In this paper, we propose a new adaptive supervised objective named AdaReg loss, re-weighting category importance coefficients to address this class imbalance and increasing the discrimination power of expression representations. Inspired by human beings' cognitive mode, an innovative coarse-fine (C-F) labels strategy is designed to guide the model from easy to difficult to classify highly similar representations. On this basis, we propose a novel training framework named the emotional education mechanism (EEM) to transfer knowledge, composed of a knowledgeable teacher network (KTN) and a self-taught student network (STSN). Specifically, KTN integrates the outputs of coarse and fine streams, learning expression representations from easy to difficult. Under the supervision of the pre-trained KTN and existing learning experience, STSN can maximize the potential performance and compress the original KTN. Extensive experiments on public benchmarks demonstrate that the proposed method achieves superior performance compared to current state-of-the-art frameworks with 88.07% on RAF-DB, 63.97% on AffectNet and 90.49% on FERPlus.
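Re-weighting category importance in the loss, as AdaReg does, can be illustrated with a weighted cross-entropy sketch (the coefficients are learned in the paper; here they are fixed inputs, and all names are illustrative):

```python
import numpy as np

def reweighted_ce(probs, labels, class_weights):
    """Cross-entropy with per-class importance weights: rare or hard classes
    can be up-weighted to counter class imbalance.

    `probs` holds predicted class probabilities per sample (rows sum to 1);
    `class_weights[c]` scales the loss of every sample labeled c.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(class_weights, dtype=float)[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.sum(w * nll) / np.sum(w))
```

Up-weighting a poorly predicted class raises the overall loss, pushing the optimizer to spend more capacity on it.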

11.
Neural Netw ; 133: 57-68, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33125918

ABSTRACT

As an effective convex relaxation of the rank minimization model, the tensor nuclear norm minimization based multi-view clustering methods have been attracting more and more interest in recent years. However, most existing clustering methods regularize each singular value equally, restricting their capability and flexibility in tackling many practical problems, where the singular values should be treated differently. To address this problem, we propose a novel weighted tensor nuclear norm minimization (WTNNM) based method for multi-view spectral clustering. Specifically, we firstly calculate a set of transition probability matrices from different views, and construct a 3-order tensor whose lateral slices are composed of probability matrices. Secondly, we learn a latent high-order transition probability matrix by using our proposed weighted tensor nuclear norm, which directly considers the prior knowledge of singular values. Finally, clustering is performed on the learned transition probability matrix, which well characterizes both the complementary information and high-order information embedded in multi-view data. An efficient optimization algorithm is designed to solve the optimal solution. Extensive experiments on five benchmarks demonstrate that our method outperforms the state-of-the-art methods.


Subjects
Algorithms; Probability Learning; Benchmarking/methods; Cluster Analysis
12.
Article in English | MEDLINE | ID: mdl-33108295

ABSTRACT

Although remarkable progress has been made on single-image super-resolution (SISR), deep learning methods cannot be easily applied to real-world applications due to their heavy computational requirements, especially on mobile devices. Focusing on an SISR approach with fewer parameters and faster inference, we propose an efficient and time-saving wavelet transform-based network architecture, in which the image super-resolution (SR) processing is carried out in the wavelet domain. Different from existing methods that directly infer the high-resolution (HR) image from the input low-resolution (LR) image, our approach first decomposes the LR image into a series of wavelet coefficients (WCs); the network learns to predict the corresponding series of HR WCs and then reconstructs the HR image. In particular, to further strengthen the relationship between WCs and deep image features, we propose two novel modules [a wavelet feature mapping block (WFMB) and a wavelet coefficients reconstruction block (WCRB)] and a dual recursive framework for a joint learning strategy, thus forming a WC prediction model that realizes efficient and accurate reconstruction of HR WCs. Experimental results show that the proposed method outperforms state-of-the-art methods with more than a 2x reduction in model parameters and computational complexity.
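The wavelet decomposition/reconstruction round trip that this architecture relies on can be sketched with a one-level Haar transform (a minimal illustration; the paper's network predicts the HR coefficient bands rather than computing them analytically):

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar wavelet transform: split an even-sized image into
    approximation (LL) and detail (LH, HL, HH) coefficient bands."""
    img = np.asarray(img, dtype=float)
    a, b = img[0::2, :], img[1::2, :]        # adjacent row pairs
    lo, hi = (a + b) / 2.0, (a - b) / 2.0    # vertical low/high pass
    c, d = lo[:, 0::2], lo[:, 1::2]          # adjacent column pairs
    e, f = hi[:, 0::2], hi[:, 1::2]
    ll, lh = (c + d) / 2.0, (c - d) / 2.0
    hl, hh = (e + f) / 2.0, (e - f) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of `haar_dwt2` (perfect reconstruction)."""
    lo = np.empty((ll.shape[0], ll.shape[1] * 2))
    hi = np.empty_like(lo)
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    img = np.empty((lo.shape[0] * 2, lo.shape[1]))
    img[0::2, :], img[1::2, :] = lo + hi, lo - hi
    return img
```

Because the transform is invertible, predicting the four HR coefficient bands is equivalent to predicting the HR image itself, while each band is smaller and more structured than the full image.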

13.
Article in English | MEDLINE | ID: mdl-33108298

ABSTRACT

Significant progress has been made with face photo-sketch synthesis in recent years due to the development of deep convolutional neural networks, particularly generative adversarial networks (GANs). However, the performance of existing methods is still limited by the lack of training data (photo-sketch pairs). To address this challenge, we investigate the effect of knowledge distillation (KD) on training neural networks for the face photo-sketch synthesis task and propose an effective KD model to improve the quality of synthetic images. In particular, we utilize a teacher network trained on a large amount of data from a related task to separately learn knowledge of the face photo and knowledge of the face sketch, and simultaneously transfer this knowledge to two student networks designed for the face photo-sketch synthesis task. In addition to assimilating the knowledge from the teacher network, the two student networks can mutually transfer their own knowledge to further enhance their learning. To further enhance the perceptual quality of the synthetic image, we propose a KD+ model that combines GANs with KD. The generator can produce images with more realistic textures and less noise under the guidance of the distilled knowledge. Extensive experiments and a user study demonstrate the superiority of our models over state-of-the-art methods.

14.
IEEE Trans Cybern ; PP2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33055048

ABSTRACT

Recently, the visual quality evaluation of screen content images (SCIs) has become an important and timely emerging research theme. This article presents an effective and novel blind quality evaluation metric for SCIs by using stacked autoencoders (SAE) based on pictorial and textual regions. Since the SCI consists of not only the pictorial area but also the textual area, the human visual system (HVS) is not equally sensitive to their different distortion types. First, the textual and pictorial regions can be obtained by dividing an input SCI via an SCI segmentation metric. Next, we extract quality-aware features from the textual region and pictorial region, respectively. Then, two different SAEs are trained via an unsupervised approach for quality-aware features that are extracted from these two regions. After the training procedure of the SAEs, the quality-aware features can evolve into more discriminative and meaningful features. Subsequently, the evolved features and their corresponding subjective scores are input into two regressors for training. Each regressor can obtain one output predictive score. Finally, the final perceptual quality score of a test SCI is computed by these two predicted scores via a weighted model. Experimental results on two public SCI-oriented databases have revealed that the proposed scheme can compare favorably with the existing blind image quality assessment metrics.

15.
Neural Netw ; 132: 245-252, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32927427

ABSTRACT

Due to their efficiency in exploiting the relationships and complex structures hidden in multi-view data, graph-oriented clustering methods have achieved remarkable progress in recent years. However, most existing graph-based spectral methods still have the following drawbacks: (1) they regularize each view equally, which does not make sense in real applications; and (2) by employing different norms, most existing methods calculate the error feature by feature, thereby neglecting spatial structure information and complementary information. To tackle these drawbacks, we propose an enhanced multi-view spectral clustering model. Our model characterizes the consistency among indicator matrices by minimizing our proposed weighted tensor nuclear norm, which explicitly exploits the salient differences between the singular values of the matrix. Moreover, our model adaptively assigns a reasonable weight to each view, which helps improve the robustness of the algorithm. Finally, the proposed tensor nuclear norm exploits both high-order and complementary information well, which helps mine the consistency between indicator matrices. Extensive experiments demonstrate the effectiveness of our method.

16.
Article in English | MEDLINE | ID: mdl-32809937

ABSTRACT

Despite its promising results, tensor robust principal component analysis (TRPCA), which aims to recover the underlying low-rank structure of clean tensor data corrupted with noise/outliers by shrinking all singular values equally, cannot preserve the salient content of the image well. The major reason is that, in real applications, there are salient differences between the singular values of a tensor image, and the larger singular values are generally associated with salient parts of the image. Thus, the singular values should be treated differently. Inspired by this observation, we investigate whether there is a better alternative when using tensor rank minimization. In this paper, we develop an enhanced TRPCA (ETRPCA) that explicitly considers the salient differences between the singular values of tensor data via weighted tensor Schatten p-norm minimization, and we then propose an efficient algorithm, with good convergence behavior, to solve ETRPCA. Extensive experimental results reveal that ETRPCA is superior in performance to several state-of-the-art RPCA variants.
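The weighted shrinkage of singular values that distinguishes ETRPCA from plain TRPCA can be sketched for a single matrix (the tensor case applies this slice-wise in the transform domain, which is omitted here):

```python
import numpy as np

def weighted_svt(M, tau, weights):
    """Weighted singular value thresholding: shrink the i-th singular value
    by tau * weights[i]. Small weights on the large, salient singular values
    preserve them, while noise-dominated small singular values are suppressed
    or zeroed out entirely.
    """
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    s_shrunk = np.maximum(s - tau * np.asarray(weights, dtype=float), 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```

With all weights equal to 1 this reduces to the standard singular value thresholding used by nuclear norm minimization, which treats every singular value the same.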

17.
Int Ophthalmol ; 2020 Aug 19.
Article in English | MEDLINE | ID: mdl-32813193

ABSTRACT

PURPOSE: This study aimed to determine the susceptibility and the changes of bacterial agents of chronic dacryocystitis and to determine the risk factors for bacterial prevalence and drug sensitivity, to provide a reference for the clinical selection of antibiotics. METHODS: A case-control study was conducted using 112 patients with chronic dacryocystitis and 112 patients with non-infectious ophthalmopathy between August 2017 and April 2018. Lacrimal and conjunctival sac secretions were cultured for aerobic and anaerobic bacteria. Forty-five patients with chronic dacryocystitis between November 2014 and November 2015 were also included. RESULTS: Positive bacterial cultures were obtained from 61.9% and 50.9% of chronic dacryocystitis and non-infectious ophthalmopathy patients, respectively, but the detection rates for pathogenic bacteria were 18.3% and 2.7%, respectively (P < 0.001). Gram-negative and anaerobic bacteria were significantly more prevalent in the patient group than in the control group (P = 0.001 and 0.005, respectively). Bacteria were detected at a significantly higher rate in patients with irritant symptoms (itch or foreign-body sensation) than in those without (OR = 9.333, P = 0.002), particularly Staphylococcus (OR = 9.783, P = 0.002). Of the isolates, 11.6% (10/86) and 55.8% (48/86) showed resistance to levofloxacin and tobramycin, respectively. Compared with three years earlier, the detection rate for Gram-positive cocci decreased from 51.1% to 27.8% (χ2 = 8.054, P = 0.005). CONCLUSIONS: Gram-positive cocci, Gram-negative bacilli, and anaerobic bacteria were the predominant pathogens. The prevalence of Gram-positive bacteria in cases of chronic dacryocystitis is decreasing.

18.
Article in English | MEDLINE | ID: mdl-32813659

ABSTRACT

Face photo-sketch style transfer aims to convert a representation of a face from the photo (or sketch) domain to the sketch (respectively, photo) domain while preserving the character of the subject. It has wide-ranging applications in law enforcement, forensic investigation, and digital entertainment. However, conventional face photo-sketch synthesis methods usually require training images from both the source domain and the target domain, and they cannot be applied under universal conditions, where collecting source-domain training images that match the style of the test image is impractical. This problem entails two major challenges: 1) designing an effective and robust domain translation model for the universal situation in which images of the source domain needed for training are unavailable; and 2) preserving the facial character while transferring to the style of an entire image collection in the target domain. To this end, we present a novel universal face photo-sketch style transfer method that does not need any image from the source domain for training. The regression relationship between an input test image and the entire training image collection in the target domain is inferred via a deep domain translation framework, in which a domain-wise adaption term and a local consistency adaption term are developed. To improve the robustness of the style transfer process, we propose a multiview domain translation method that flexibly leverages a convolutional neural network representation together with hand-crafted features in an optimal way. Qualitative and quantitative comparisons under universal unconstrained conditions, where training images from the source domain are unavailable, demonstrate the effectiveness and superiority of our method for universal face photo-sketch style transfer.

19.
IEEE Trans Cybern ; PP2020 Jun 17.
Article in English | MEDLINE | ID: mdl-32554335

ABSTRACT

Convolutional neural networks (CNNs)-based video quality enhancement generally employs optical flow for pixelwise motion estimation and compensation, followed by utilizing motion-compensated frames and jointly exploring the spatiotemporal correlation across frames to facilitate the enhancement. This method, called the optical-flow-based method (OPT), usually achieves high accuracy at the expense of high computational complexity. In this article, we develop a new framework, referred to as biprediction-based multiframe video enhancement (PMVE), to achieve a one-pass enhancement procedure. PMVE designs two networks, that is, the prediction network (Pred-net) and the frame-fusion network (FF-net), to implement the two steps of synthesization and fusion, respectively. Specifically, the Pred-net leverages frame pairs to synthesize the so-called virtual frames (VFs) for those low-quality frames (LFs) through biprediction. Afterward, the slowly fused FF-net takes the VFs as the input to extract the correlation across the VFs and the related LFs, to obtain an enhanced version of those LFs. Such a framework allows PMVE to leverage the cross-correlation between successive frames for enhancement, hence capable of achieving high accuracy performance. Meanwhile, PMVE effectively avoids the explicit operations of motion estimation and compensation, hence greatly reducing the complexity compared to OPT. The experimental results demonstrate that the peak signal-to-noise ratio (PSNR) performance of PMVE is fully on par with that of OPT while its computational complexity is only 1% of OPT. Compared with other state-of-the-art methods in the literature, PMVE is also confirmed to achieve superior performance in both objective quality and visual quality at a reasonable complexity level. For instance, PMVE can surpass its best counterpart method by up to 0.42 dB in PSNR.
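PSNR, the quality measure used in the comparison above, is a direct function of the mean squared error between the reference and processed frames:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak**2 / MSE).

    `peak` is the maximum possible pixel value (255 for 8-bit frames).
    Identical inputs have zero MSE, so PSNR is infinite by convention.
    """
    mse = np.mean((np.asarray(ref, dtype=float) - np.asarray(test, dtype=float)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse)) if mse > 0 else float("inf")
```

A gain of 0.42 dB, as reported above, corresponds to a roughly 9% reduction in mean squared error at the same peak value.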

20.
Neural Netw ; 129: 123-137, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32512319

ABSTRACT

Image style transfer renders the content of an image in different styles. Current methods have made decent progress in transferring the style of a single image; however, visual statistics from one image cannot reflect the full scope of an artist's work. Moreover, previous work did not prioritize content preservation, which can result in poor structural integrity and deteriorate the comprehensibility of the generated image. These two problems limit the visual quality of style transfer results. Targeting the style resemblance and content preservation problems, we propose a style transfer system composed of a collection representation space and semantic-guided reconstruction. We train an encoder-decoder network on art collections to construct a representation space that reflects the style of the artist. Then, we use semantic information as guidance to reconstruct the target representation of the input image for better content preservation. We conduct both quantitative analysis and qualitative evaluation to assess the proposed method. Experimental results demonstrate that our approach strikes a good balance between capturing artistic characteristics and preserving content information in style transfer tasks.


Subjects
Neural Networks, Computer; Pattern Recognition, Automated/methods; Semantics