Pesquisa | Biblioteca Virtual em Saúde

1.

Predicting Alzheimer's disease progression using deep recurrent neural networks.

Nguyen, Minh; He, Tong; An, Lijun; Alexander, Daniel C; Feng, Jiashi; Yeo, B T Thomas.

Neuroimage ; 222: 117203, 2020 11 15.

Artigo em Inglês | MEDLINE | ID: mdl-32763427

RESUMO

Early identification of individuals at risk of developing Alzheimer's disease (AD) dementia is important for developing disease-modifying therapies. In this study, given multimodal AD markers and clinical diagnosis of an individual from one or more timepoints, we seek to predict the clinical diagnosis, cognition and ventricular volume of the individual for every month (indefinitely) into the future. We proposed and applied a minimal recurrent neural network (minimalRNN) model to data from The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge, comprising longitudinal data of 1677 participants (Marinescu et al., 2018) from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We compared the performance of the minimalRNN model and four baseline algorithms up to 6 years into the future. Most previous work on predicting AD progression ignore the issue of missing data, which is a prevalent issue in longitudinal data. Here, we explored three different strategies to handle missing data. Two of the strategies treated the missing data as a "preprocessing" issue, by imputing the missing data using the previous timepoint ("forward filling") or linear interpolation ("linear filling). The third strategy utilized the minimalRNN model itself to fill in the missing data both during training and testing ("model filling"). Our analyses suggest that the minimalRNN with "model filling" compared favorably with baseline algorithms, including support vector machine/regression, linear state space (LSS) model, and long short-term memory (LSTM) model. Importantly, although the training procedure utilized longitudinal data, we found that the trained minimalRNN model exhibited similar performance, when using only 1 input timepoint or 4 input timepoints, suggesting that our approach might work well with just cross-sectional data. An earlier version of our approach was ranked 5th (out of 53 entries) in the TADPOLE challenge in 2019. The current approach is ranked 2nd out of 63 entries as of June 3rd, 2020.

Assuntos

Doença de Alzheimer/diagnóstico por imagem , Aprendizado Profundo , Progressão da Doença , Imageamento por Ressonância Magnética/métodos , Modelos Teóricos , Neuroimagem/métodos , Idoso , Idoso de 80 Anos ou mais , Doença de Alzheimer/líquido cefalorraquidiano , Doença de Alzheimer/patologia , Doença de Alzheimer/fisiopatologia , Feminino , Humanos , Estudos Longitudinais , Masculino , Pessoa de Meia-Idade , Prognóstico

2.

Deep neural networks and kernel regression achieve comparable accuracies for functional connectivity prediction of behavior and demographics.

He, Tong; Kong, Ru; Holmes, Avram J; Nguyen, Minh; Sabuncu, Mert R; Eickhoff, Simon B; Bzdok, Danilo; Feng, Jiashi; Yeo, B T Thomas.

Neuroimage ; 206: 116276, 2020 02 01.

Artigo em Inglês | MEDLINE | ID: mdl-31610298

RESUMO

There is significant interest in the development and application of deep neural networks (DNNs) to neuroimaging data. A growing literature suggests that DNNs outperform their classical counterparts in a variety of neuroimaging applications, yet there are few direct comparisons of relative utility. Here, we compared the performance of three DNN architectures and a classical machine learning algorithm (kernel regression) in predicting individual phenotypes from whole-brain resting-state functional connectivity (RSFC) patterns. One of the DNNs was a generic fully-connected feedforward neural network, while the other two DNNs were recently published approaches specifically designed to exploit the structure of connectome data. By using a combined sample of almost 10,000 participants from the Human Connectome Project (HCP) and UK Biobank, we showed that the three DNNs and kernel regression achieved similar performance across a wide range of behavioral and demographic measures. Furthermore, the generic feedforward neural network exhibited similar performance to the two state-of-the-art connectome-specific DNNs. When predicting fluid intelligence in the UK Biobank, performance of all algorithms dramatically improved when sample size increased from 100 to 1000 subjects. Improvement was smaller, but still significant, when sample size increased from 1000 to 5000 subjects. Importantly, kernel regression was competitive across all sample sizes. Overall, our study suggests that kernel regression is as effective as DNNs for RSFC-based behavioral prediction, while incurring significantly lower computational costs. Therefore, kernel regression might serve as a useful baseline algorithm for future studies.

Assuntos

Encéfalo/fisiologia , Conectoma/métodos , Interpretação de Imagem Assistida por Computador/métodos , Inteligência/fisiologia , Aprendizado de Máquina , Imageamento por Ressonância Magnética/métodos , Redes Neurais de Computação , Desempenho Psicomotor/fisiologia , Adulto , Fatores Etários , Idoso , Bancos de Espécimes Biológicos , Encéfalo/diagnóstico por imagem , Conjuntos de Dados como Assunto , Aprendizado Profundo , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Fatores Sexuais , Adulto Jovem

3.

Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition.

Hou, Qibin; Lu, Cheng-Ze; Cheng, Ming-Ming; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; PP2024 May 15.

Artigo em Inglês | MEDLINE | ID: mdl-38748521

RESUMO

Vision Transformers have been the most popular network architecture in visual recognition recently due to the strong ability of encode global information. However, its high computational cost when processing high-resolution images limits the applications in downstream tasks. In this paper, we take a deep look at the internal structure of self-attention and present a simple Transformer style convolutional neural network (ConvNet) for visual recognition. By comparing the design principles of the recent ConvNets and Vision Transformers, we propose to simplify the self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels ( ≥ 7×7) nested in convolutional layers and we observe a consistent performance improvement when gradually increasing the kernel size from 5×5 to 21×21. We build a family of hierarchical ConvNets using the proposed convolutional modulation, termed Conv2Former. Our network is simple and easy to follow. Experiments show that our Conv2Former outperforms existent popular ConvNets and vision Transformers, like Swin Transformer and ConvNeXt in all ImageNet classification, COCO object detection and ADE20k semantic segmentation. Our code is available at https://github.com/HVision-NKU/Conv2Former.

4.

Contrastive Masked Autoencoders are Stronger Vision Learners.

Huang, Zhicheng; Jin, Xiaojie; Lu, Chengze; Hou, Qibin; Cheng, Ming-Ming; Fu, Dongmei; Shen, Xiaohui; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 46(4): 2506-2517, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38015699

RESUMO

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of learned representation manifests there is still plenty to go for making a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By elaboratively unifying contrastive learning (CL) and masked image model (MIM) through novel designs, CMAE leverages their respective advantages and learns representations with both strong instance discriminability and local perceptibility. Specifically, CMAE consists of two branches where the online branch is an asymmetric encoder-decoder and the momentum branch is a momentum updated encoder. During training, the online encoder reconstructs original images from latent representations of masked images to learn holistic features. The momentum encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart. To make CL compatible with MIM, CMAE introduces two new components, i.e., pixel shifting for generating plausible positive views and feature decoder for complementing features of contrastive pairs. Thanks to these novel designs, CMAE effectively improves the representation quality and transfer performance over its MIM counterpart. CMAE achieves the state-of-the-art performance on highly competitive benchmarks of image classification, semantic segmentation and object detection. Notably, CMAE-Base achieves 85.3% top-1 accuracy on ImageNet and 52.5% mIoU on ADE20k, surpassing previous best results by 0.7% and 1.8% respectively.

5.

Learning to Augment Poses for 3D Human Pose Estimation in Images and Videos.

Zhang, Jianfeng; Gong, Kehong; Wang, Xinchao; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 10012-10026, 2023 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-37027609

RESUMO

Existing 3D human pose estimation methods often suffer inferior generalization performance to new datasets, largely due to the limited diversity of 2D-3D pose pairs in the training data. To address this problem, we present PoseAug, a novel auto-augmentation framework that learns to augment the available training poses towards greater diversity and thus enhances the generalization power of the trained 2D-to-3D pose estimator. Specifically, PoseAug introduces a novel pose augmentor that learns to adjust various geometry factors of a pose through differentiable operations. With such differentiable capacity, the augmentor can be jointly optimized with the 3D pose estimator and take the estimation error as feedback to generate more diverse and harder poses in an online manner. PoseAug is generic and handy to be applied to various 3D pose estimation models. It is also extendable to aid pose estimation from video frames. To demonstrate this, we introduce PoseAug-V, a simple yet effective method that decomposes video pose augmentation into end pose augmentation and conditioned intermediate pose generation. Extensive experiments demonstrate that PoseAug and its extension PoseAug-V bring clear improvements for frame-based and video-based 3D pose estimation on several out-of-domain 3D human pose benchmarks.

Assuntos

Algoritmos , Imageamento Tridimensional , Humanos , Imageamento Tridimensional/métodos

6.

Token Selection is a Simple Booster for Vision Transformers.

Zhou, Daquan; Hou, Qibin; Yang, Linjie; Jin, Xiaojie; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 45(11): 12738-12746, 2023 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-36155475

RESUMO

Vision transformers have recently attained state-of-the-art results in visual recognition tasks. Their success is largely attributed to the self-attention component, which models the global dependencies among the image patches (tokens) and aggregates them into higher-level features. However, self-attention brings significant training difficulties to ViTs. Many recent works thus develop various new self-attention components to alleviate this issue. In this article, instead of developing complicated self-attention mechanism, we aim to explore simple approaches to fully release the potential of the vanilla self-attention. We first study the token selection behavior of self-attention and find that it suffers from a low diversity due to attention over-smoothing, which severely limits its effectiveness in learning discriminative token features. We then develop simple approaches to enhance selectivity and diversity for self-attention in token selection. The resulted token selector module can server as a drop-in module for various ViT backbones and consistently boost their performance. Significantly, they enable ViTs to achieve 84.6% top-1 classification accuracy on ImageNet with only 25M parameters. When scaled up to 81M parameters, the result can be further improved to 86.1%. In addition, we also present comprehensive experiments to demonstrate the token selector can be applied to a variety of transformer-based models to boost their performance for image classification, semantic segmentation and NLP tasks. Code is available at https://github.com/zhoudaquan/dvit_repo.

7.

VOLO: Vision Outlooker for Visual Recognition.

Yuan, Li; Hou, Qibin; Jiang, Zihang; Feng, Jiashi; Yan, Shuicheng.

IEEE Trans Pattern Anal Mach Intell ; 45(5): 6575-6586, 2023 May.

Artigo em Inglês | MEDLINE | ID: mdl-36094970

RESUMO

Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the state-of-the-art CNNs when trained from scratch on a midsize dataset like ImageNet. Through experimental analysis, we find it is because of two reasons: 1) the simple tokenization of input images fails to model the important local structure such as edges and lines, leading to low training sample efficiency; 2) the redundant attention backbone design of ViTs leads to limited feature richness for fixed computation budgets and limited training samples. To overcome such limitations, we present a new simple and generic architecture, termed Vision Outlooker (VOLO), which implements a novel outlook attention operation that dynamically conduct the local feature aggregation mechanism in a sliding window manner across the input image. Unlike self-attention that focuses on modeling global dependencies of local features at a coarse level, our outlook attention targets at encoding finer-level features, which is critical for recognition but ignored by self-attention. Outlook attention breaks the bottleneck of self-attention whose computation cost scales quadratically with the input spatial dimension, and thus is much more memory efficient. Compared to our Tokens-To-Token Vision Transformer (T2T-ViT), VOLO can more efficiently encode fine-level features that are essential for high-performance visual recognition. Experiments show that with only 26.6 M learnable parameters, VOLO achieves 84.2% top-1 accuracy on ImageNet-1 K without using extra training data, 2.7% better than T2T-ViT with a comparable number of parameters. When the model size is scaled up to 296 M parameters, its performance can be further improved to 87.1%, setting a new record for ImageNet-1 K classification. In addition, we also take the proposed VOLO as pretrained models and report superior performance on downstream tasks, such as semantic segmentation. Code is available at https://github.com/sail-sg/volo.

8.

Deep Long-Tailed Learning: A Survey.

Zhang, Yifan; Kang, Bingyi; Hooi, Bryan; Yan, Shuicheng; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 45(9): 10795-10816, 2023 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-37074896

RESUMO

Deep long-tailed learning, one of the most challenging problems in visual recognition, aims to train well-performing deep models from a large number of images that follow a long-tailed class distribution. In the last decade, deep learning has emerged as a powerful recognition model for learning high-quality image representations and has led to remarkable breakthroughs in generic visual recognition. However, long-tailed class imbalance, a common problem in practical visual recognition tasks, often limits the practicality of deep network based recognition models in real-world applications, since they can be easily biased towards dominant classes and perform poorly on tail classes. To address this problem, a large number of studies have been conducted in recent years, making promising progress in the field of deep long-tailed learning. Considering the rapid evolution of this field, this article aims to provide a comprehensive survey on recent advances in deep long-tailed learning. To be specific, we group existing deep long-tailed learning studies into three main categories (i.e., class re-balancing, information augmentation and module improvement), and review these methods following this taxonomy in detail. Afterward, we empirically analyze several state-of-the-art methods by evaluating to what extent they address the issue of class imbalance via a newly proposed evaluation metric, i.e., relative accuracy. We conclude the survey by highlighting important applications of deep long-tailed learning and identifying several promising directions for future research.

9.

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition.

Hou, Qibin; Jiang, Zihang; Yuan, Li; Cheng, Ming-Ming; Yan, Shuicheng; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 1328-1334, 2023 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-35077359

RESUMO

In this paper, we present Vision Permutator, a conceptually simple and data efficient MLP-like architecture for visual recognition. By realizing the importance of the positional information carried by 2D feature representations, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections. This allows Vision Permutator to capture long-range dependencies and meanwhile avoid the attention building process in transformers. The outputs are then aggregated in a mutually complementing manner to form expressive representations. We show that our Vision Permutators are formidable competitors to convolutional neural networks (CNNs) and vision transformers. Without the dependence on spatial convolutions or attention mechanisms, Vision Permutator achieves 81.5% top-1 accuracy on ImageNet without extra large-scale training data (e.g., ImageNet-22k) using only 25M learnable parameters, which is much better than most CNNs and vision transformers under the same model size constraint. When scaling up to 88M, it attains 83.2% top-1 accuracy, greatly improving the performance of recent state-of-the-art MLP-like networks for visual recognition. We hope this work could encourage research on rethinking the way of encoding spatial information and facilitate the development of MLP-like models. PyTorch/MindSpore/Jittor code is available at https://github.com/Andrew-Qibin/VisionPermutator.

10.

MetaFormer Baselines for Vision.

Yu, Weihao; Si, Chenyang; Zhou, Pan; Luo, Mi; Zhou, Yichen; Feng, Jiashi; Yan, Shuicheng; Wang, Xinchao.

IEEE Trans Pattern Anal Mach Intell ; PP2023 Nov 01.

Artigo em Inglês | MEDLINE | ID: mdl-37910405

RESUMO

MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of MetaFormer, again, by migrating our focus away from the token mixer design: we introduce several baseline models under MetaFormer using the most basic or common mixers, and demonstrate their gratifying performance. We summarize our observations as follows: (1) MetaFormer ensures solid lower bound of performance. By merely adopting identity mapping as the token mixer, the MetaFormer model, termed IdentityFormer, achieves [Formula: see text]80% accuracy on ImageNet-1 K. (2) MetaFormer works well with arbitrary token mixers. When specifying the token mixer as even a random matrix to mix tokens, the resulting model RandFormer yields an accuracy of [Formula: see text]81%, outperforming IdentityFormer. Rest assured of MetaFormer's results when new token mixers are adopted. (3) MetaFormer effortlessly offers state-of-the-art results. With just conventional token mixers dated back five years ago, the models instantiated from MetaFormer already beat state of the art. (a) ConvFormer outperforms ConvNeXt. Taking the common depthwise separable convolutions as the token mixer, the model termed ConvFormer, which can be regarded as pure CNNs, outperforms the strong CNN model ConvNeXt. (b) CAFormer sets new record on ImageNet-1 K. By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1 K: it achieves an accuracy of 85.5% at 224 ×224 resolution, under normal supervised training without external data or distillation. In our expedition to probe MetaFormer, we also find that a new activation, StarReLU, reduces 71% FLOPs of activation compared with commonly-used GELU yet achieves better performance. Specifically, StarReLU is a variant of Squared ReLU dedicated to alleviating distribution shift. We expect StarReLU to find great potential in MetaFormer- like models alongside other neural networks. Code and models are available at https://github.com/sail-sg/metaformer.

11.

Learnable Central Similarity Quantization for Efficient Image and Video Retrieval.

Yuan, Li; Wang, Tao; Zhang, Xiaopeng; Tay, Francis Eng Hock; Jie, Zequn; Tian, Yonghong; Liu, Wei; Feng, Jiashi.

IEEE Trans Neural Netw Learn Syst ; PP2023 Dec 12.

Artigo em Inglês | MEDLINE | ID: mdl-38090871

RESUMO

Data-dependent hashing methods aim to learn hash functions from the pairwise or triplet relationships among the data, which often lead to low efficiency and low collision rate by only capturing the local distribution of the data. To solve the limitation, we propose central similarity, in which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers. As a new global similarity metric, central similarity can improve the efficiency and retrieval accuracy of hash learning. By introducing a new concept, hash centers, we principally formulate the computation of the proposed central similarity metric, in which the hash centers refer to a set of points scattered in the Hamming space with a sufficient mutual distance between each other. To construct well-separated hash centers, we provide two efficient methods: 1) leveraging the Hadamard matrix and Bernoulli distributions to generate data-independent hash centers and 2) learning data-dependent hash centers from data representations. Based on the proposed similarity metric and hash centers, we propose central similarity quantization (CSQ) that optimizes the central similarity between data points with respect to their hash centers instead of optimizing the local similarity to generate a high-quality deep hash function. We also further improve the CSQ with data-dependent hash centers, dubbed as CSQ with learnable center (CSQ [Formula: see text] ). The proposed CSQ and CSQ [Formula: see text] are generic and applicable to image and video hashing scenarios. We conduct extensive experiments on large-scale image and video retrieval tasks, and the proposed CSQ yields noticeably boosted retrieval performance, i.e., 3%-20% in mean average precision (mAP) over the previous state-of-the-art methods, which also demonstrates that our methods can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs.

12.

Towards Age-Invariant Face Recognition.

Zhao, Jian; Yan, Shuicheng; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 44(1): 474-487, 2022 01.

Artigo em Inglês | MEDLINE | ID: mdl-32750831

RESUMO

Despite the remarkable progress in face recognition related technologies, reliably recognizing faces across ages remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intra-class variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features for recognition, or first synthesize a face that matches target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, we propose a deep Age-Invariant Model (AIM) for face recognition in the wild with three distinct novelties. First, AIM presents a novel unified deep architecture jointly performing cross-age face synthesis and recognition in a mutual boosting way. Second, AIM achieves continuous face rejuvenation/aging with remarkable photorealistic and identity-preserving properties, avoiding the requirement of paired data and the true age of testing samples. Third, effective and novel training strategies are developed for end-to-end learning of the whole deep architecture, which generates powerful age-invariant face representations explicitly disentangled from the age variation. Moreover, we construct a new large-scale Cross-Age Face Recognition (CAFR) benchmark dataset to facilitate existing efforts and push the frontiers of age-invariant face recognition research. Extensive experiments on both our CAFR dataset and several other cross-age datasets (MORPH, CACD, and FG-NET) demonstrate the superiority of the proposed AIM model over the state-of-the-arts. Benchmarking our model on the popular unconstrained face recognition datasets YTF and IJB-C additionally verifies its promising generalization ability in recognizing faces in the wild.

Assuntos

Reconhecimento Facial , Envelhecimento , Algoritmos , Face , Humanos , Aprendizagem

13.

Source Data-Absent Unsupervised Domain Adaptation Through Hypothesis Transfer and Labeling Transfer.

Liang, Jian; Hu, Dapeng; Wang, Yunbo; He, Ran; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 44(11): 8602-8617, 2022 11.

Artigo em Inglês | MEDLINE | ID: mdl-34383644

RESUMO

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related but different well-labeled source domain to a new unlabeled target domain. Most existing UDA methods require access to the source data, and thus are not applicable when the data are confidential and not shareable due to privacy concerns. This paper aims to tackle a realistic setting with only a classification model available trained over, instead of accessing to, the source data. To effectively utilize the source model for adaptation, we propose a novel approach called Source HypOthesis Transfer (SHOT), which learns the feature extraction module for the target domain by fitting the target data features to the frozen source classification module (representing classification hypothesis). Specifically, SHOT exploits both information maximization and self-supervised learning for the feature extraction module learning to ensure the target features are implicitly aligned with the features of unseen source data via the same hypothesis. Furthermore, we propose a new labeling transfer strategy, which separates the target data into two splits based on the confidence of predictions (labeling information), and then employ semi-supervised learning to improve the accuracy of less-confident predictions in the target domain. We denote labeling transfer as SHOT++ if the predictions are obtained by SHOT. Extensive experiments on both digit classification and object recognition tasks show that SHOT and SHOT++ achieve results surpassing or comparable to the state-of-the-arts, demonstrating the effectiveness of our approaches for various visual domain adaptation problems. Code will be available at https://github.com/tim-learn/SHOT-plus.

Assuntos

Algoritmos , Redes Neurais de Computação

14.

Recurrent Multi-Frame Deraining: Combining Physics Guidance and Adversarial Learning.

Yang, Wenhan; Tan, Robby T; Feng, Jiashi; Wang, Shiqi; Cheng, Bin; Liu, Jiaying.

IEEE Trans Pattern Anal Mach Intell ; 44(11): 8569-8586, 2022 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-34029186

RESUMO

Existing video rain removal methods mainly focus on rain streak removal and are solely trained based on the synthetic data, which neglect more complex degradation factors, e.g., rain accumulation, and the prior knowledge in real rain data. Thus, in this paper, we build a more comprehensive rain model with several degradation factors and construct a novel two-stage video rain removal method that combines the power of synthetic videos and real data. Specifically, a novel two-stage progressive network is proposed: recovery guided by a physics model, and further restoration by adversarial learning. The first stage performs an inverse recovery process guided by our proposed rain model. An initially estimated background frame is obtained based on the input rain frame. The second stage employs adversarial learning to refine the result, i.e., recovering the overall color and illumination distributions of the frame, the background details that are failed to be recovered in the first stage, and removing the artifacts generated in the first stage. Furthermore, we also introduce a more comprehensive rain model that includes degradation factors, e.g., occlusion and rain accumulation, which appear in real scenes yet ignored by existing methods. This model, which generates more realistic rain images, will train and evaluate our models better. Extensive evaluations on synthetic and real videos show the effectiveness of our method in comparisons to the state-of-the-art methods. Our datasets, results and code are available at: https://github.com/flyywh/Recurrent-Multi-Frame-Deraining.

15.

PSGAN++: Robust Detail-Preserving Makeup Transfer and Removal.

Liu, Si; Jiang, Wentao; Gao, Chen; He, Ran; Feng, Jiashi; Li, Bo; Yan, Shuicheng.

IEEE Trans Pattern Anal Mach Intell ; 44(11): 8538-8551, 2022 11.

Artigo em Inglês | MEDLINE | ID: mdl-34033534

RESUMO

In this paper, we address the makeup transfer and removal tasks simultaneously, which aim to transfer the makeup from a reference image to a source image and remove the makeup from the with-makeup image respectively. Existing methods have achieved much advancement in constrained scenarios, but it is still very challenging for them to transfer makeup between images with large pose and expression differences, or handle makeup details like blush on cheeks or highlight on the nose. In addition, they are hardly able to control the degree of makeup during transferring or to transfer a specified part in the input face. These defects limit the application of previous makeup transfer methods to real-world scenarios. In this work, we propose a Pose and expression robust Spatial-aware GAN (abbreviated as PSGAN++). PSGAN++ is capable of performing both detail-preserving makeup transfer and effective makeup removal. For makeup transfer, PSGAN++ uses a Makeup Distill Network (MDNet) to extract makeup information, which is embedded into spatial-aware makeup matrices. We also devise an Attentive Makeup Morphing (AMM) module that specifies how the makeup in the source image is morphed from the reference image, and a makeup detail loss to supervise the model within the selected makeup detail area. On the other hand, for makeup removal, PSGAN++ applies an Identity Distill Network (IDNet) to embed the identity information from with-makeup images into identity matrices. Finally, the obtained makeup/identity matrices are fed to a Style Transfer Network (STNet) that is able to edit the feature maps to achieve makeup transfer or removal. To evaluate the effectiveness of our PSGAN++, we collect a Makeup Transfer In the Wild (MT-Wild) dataset that contains images with diverse poses and expressions and a Makeup Transfer High-Resolution (MT-HR) dataset that contains high-resolution images. Experiments demonstrate that PSGAN++ not only achieves state-of-the-art results with fine makeup details even in cases of large pose/expression differences but also can perform partial or degree-controllable makeup transfer. Both the code and the newly collected datasets will be released at https://github.com/wtjiang98/PSGAN.

Assuntos

Algoritmos

16.

Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.

He, Tong; An, Lijun; Chen, Pansheng; Chen, Jianzhong; Feng, Jiashi; Bzdok, Danilo; Holmes, Avram J; Eickhoff, Simon B; Yeo, B T Thomas.

Nat Neurosci ; 25(6): 795-804, 2022 06.

Artigo em Inglês | MEDLINE | ID: mdl-35578132

RESUMO

We propose a simple framework-meta-matching-to translate predictive models from large-scale datasets to new unseen non-brain-imaging phenotypes in small-scale studies. The key consideration is that a unique phenotype from a boutique study likely correlates with (but is not the same as) related phenotypes in some large-scale dataset. Meta-matching exploits these correlations to boost prediction in the boutique study. We apply meta-matching to predict non-brain-imaging phenotypes from resting-state functional connectivity. Using the UK Biobank (N = 36,848) and Human Connectome Project (HCP) (N = 1,019) datasets, we demonstrate that meta-matching can greatly boost the prediction of new phenotypes in small independent datasets in many scenarios. For example, translating a UK Biobank model to 100 HCP participants yields an eight-fold improvement in variance explained with an average absolute gain of 4.0% (minimum = -0.2%, maximum = 16.0%) across 35 phenotypes. With a growing number of large-scale datasets collecting increasingly diverse phenotypes, our results represent a lower bound on the potential of meta-matching.

Assuntos

Encéfalo , Conectoma , Encéfalo/diagnóstico por imagem , Conectoma/métodos , Humanos , Imageamento por Ressonância Magnética/métodos , Fenótipo

17.

PVRED: A Position-Velocity Recurrent Encoder-Decoder for Human Motion Prediction.

Wang, Hongsong; Dong, Jian; Cheng, Bin; Feng, Jiashi.

IEEE Trans Image Process ; 30: 6096-6106, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34185641

RESUMO

Human motion prediction, which aims to predict future human poses given past poses, has recently seen increased interest. Many recent approaches are based on Recurrent Neural Networks (RNN) which model human poses with exponential maps. These approaches neglect the pose velocity as well as temporal relation of different poses, and tend to converge to the mean pose or fail to generate natural-looking poses. We therefore propose a novel Position-Velocity Recurrent Encoder-Decoder (PVRED) for human motion prediction, which makes full use of pose velocities and temporal positional information. A temporal position embedding method is presented and a Position-Velocity RNN (PVRNN) is proposed. We also emphasize the benefits of quaternion parameterization of poses and design a novel trainable Quaternion Transformation (QT) layer, which is combined with a robust loss function during training. We provide quantitative results for both short-term prediction in the future 0.5 seconds and long-term prediction in the future 0.5 to 1 seconds. Experiments on several benchmarks show that our approach considerably outperforms the state-of-the-art methods. In addition, qualitative visualizations in the future 4 seconds show that our approach could predict future human-like and meaningful poses in very long time horizons. Code is publicly available on GitHub: https://github.com/hongsong-wang/PVRNN.

Assuntos

Processamento de Imagem Assistida por Computador/métodos , Movimento/fisiologia , Redes Neurais de Computação , Algoritmos , Humanos , Gravação em Vídeo

18.

Tensor Low-Rank Representation for Data Recovery and Clustering.

Zhou, Pan; Lu, Canyi; Feng, Jiashi; Lin, Zhouchen; Yan, Shuicheng.

IEEE Trans Pattern Anal Mach Intell ; 43(5): 1718-1732, 2021 May.

Artigo em Inglês | MEDLINE | ID: mdl-31751228

RESUMO

Multi-way or tensor data analysis has attracted increasing attention recently, with many important applications in practice. This article develops a tensor low-rank representation (TLRR) method, which is the first approach that can exactly recover the clean data of intrinsic low-rank structure and accurately cluster them as well, with provable performance guarantees. In particular, for tensor data with arbitrary sparse corruptions, TLRR can exactly recover the clean data under mild conditions; meanwhile TLRR can exactly verify their true origin tensor subspaces and hence cluster them accurately. TLRR objective function can be optimized via efficient convex programing with convergence guarantees. Besides, we provide two simple yet effective dictionary construction methods, the simple TLRR (S-TLRR) and robust TLRR (R-TLRR), to handle slightly and severely corrupted data respectively. Experimental results on two computer vision data analysis tasks, image/video recovery and face clustering, clearly demonstrate the superior performance, efficiency and robustness of our developed method over state-of-the-arts including the popular LRR and SSC methods.

19.

Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds.

Zhou, Pan; Yuan, Xiao-Tong; Yan, Shuicheng; Feng, Jiashi.

IEEE Trans Pattern Anal Mach Intell ; 43(2): 459-472, 2021 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-31398110

RESUMO

First-order non-convex Riemannian optimization algorithms have gained recent popularity in structured machine learning problems including principal component analysis and low-rank matrix completion. The current paper presents an efficient Riemannian Stochastic Path Integrated Differential EstimatoR (R-SPIDER) algorithm to solve the finite-sum and online Riemannian non-convex minimization problems. At the core of R-SPIDER is a recursive semi-stochastic gradient estimator that can accurately estimate Riemannian gradient under not only exponential mapping and parallel transport, but also general retraction and vector transport operations. Compared with prior Riemannian algorithms, such a recursive gradient estimation mechanism endows R-SPIDER with lower computational cost in first-order oracle complexity. Specifically, for finite-sum problems with n components, R-SPIDER is proved to converge to an Ïµ-approximate stationary point within [Formula: see text] stochastic gradient evaluations, beating the best-known complexity [Formula: see text]; for online optimization, R-SPIDER is shown to converge with [Formula: see text] complexity which is, to the best of our knowledge, the first non-asymptotic result for online Riemannian optimization. For the special case of gradient dominated functions, we further develop a variant of R-SPIDER with improved linear rate of convergence. Extensive experimental results demonstrate the advantage of the proposed algorithms over the state-of-the-art Riemannian non-convex optimization methods.

20.

Anytime Recognition with Routing Convolutional Networks.

Jie, Zequn; Sun, Peng; Li, Xin; Feng, Jiashi; Liu, Wei.

IEEE Trans Pattern Anal Mach Intell ; 43(6): 1875-1886, 2021 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-31869778

RESUMO

Achieving an automatic trade-off between accuracy and efficiency for a single deep neural network is highly desired in time-sensitive computer vision applications. To achieve anytime prediction, existing methods only embed fixed exits to neural networks and make the predictions with the fixed exits for all the samples (refer to the "latest-all" strategy). However, it is observed that the latest exit within a time budget does not always provide a more accurate prediction than the earlier exits for testing samples of various difficulties, making the "latest-all" strategy a sub-optimal solution. Motivated by this, we propose to improve the anytime prediction accuracy by allowing each sample to adaptively select its own optimal exit within a specific time budget. Specifically, we propose a new Routing Convolutional Network (RCN). For any given time budget, it adaptively selects the optimal layer as exit for a specific testing sample. To learn an optimal policy for sample routing, a Q-network is embedded into the RCN at each exit, considering both potential information gain and time-cost. To further boost the anytime prediction accuracy, the exits and the Q-networks are optimized alternately to mutually boost each other under the cost-sensitive environment. Apart from applying to whole image classification, RCN can also be adapted to dense prediction tasks, e.g., scene parsing, to achieve the pixel-level anytime prediction. Extensive experimental results on CIFAR-10, CIFAR-100, and ImageNet classification benchmarks, and Cityscapes scene parsing benchmark demonstrate the efficacy of the proposed RCN for anytime recognition.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA