Results 1 - 20 of 32
1.
Article in English | MEDLINE | ID: mdl-38848236

ABSTRACT

3D neural rendering enables photo-realistic reconstruction of a specific scene by encoding discontinuous inputs into a neural representation. Despite the remarkable rendering results, storing the network parameters is neither transmission-friendly nor extendable to metaverse applications. In this paper, we propose an invertible neural rendering approach that generates an interactive 3D model from a single image (i.e., a 3D Snapshot). Our idea is to distill a pre-trained neural rendering model (e.g., NeRF) into a visualizable image form that can then be easily inverted back to a neural network. To this end, we first present a neural image distillation method that optimizes three neural planes to represent the original neural rendering model. However, this representation is noisy and visually meaningless. We thus propose a dynamic invertible neural network to embed this noisy representation into a plausible image representation of the scene. We demonstrate promising reconstruction quality quantitatively and qualitatively by comparing against the original neural rendering model as well as video-based invertible methods. Moreover, our method can store dozens of NeRFs with a compact restoration network (5 MB), and embedding each 3D scene takes up only 160 KB of storage. More importantly, our approach is the first solution that allows embedding a neural rendering model into image representations, which enables applications like creating an interactive 3D model from a printed image in the metaverse.
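
To make the tri-plane distillation step above concrete, here is a minimal PyTorch sketch of optimizing three learnable 2D feature planes plus a tiny MLP against a pre-trained teacher model; the teacher NeRF, plane resolution, and decoder below are placeholder assumptions, not the paper's actual components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: three learnable feature planes (xy, xz, yz) and a small MLP
# are fitted so that, at sampled 3D points, they reproduce the (sigma, rgb) output
# of a pre-trained NeRF. A fixed random function stands in for the real teacher.
R, C = 128, 16                                            # plane resolution, channels per plane
planes = nn.Parameter(torch.randn(3, C, R, R) * 0.01)
decoder = nn.Sequential(nn.Linear(3 * C, 64), nn.ReLU(), nn.Linear(64, 4))
W_teacher = torch.randn(6, 4)                             # stands in for the pre-trained NeRF

def sample_planes(pts):                                   # pts: (N, 3) in [-1, 1]
    feats = []
    for p, dims in enumerate(((0, 1), (0, 2), (1, 2))):   # project points onto each plane
        grid = pts[:, dims].view(1, -1, 1, 2)
        f = F.grid_sample(planes[p:p + 1], grid, align_corners=True)  # (1, C, N, 1)
        feats.append(f.squeeze(0).squeeze(-1).t())        # (N, C)
    return torch.cat(feats, dim=-1)                       # (N, 3C)

def teacher_nerf(pts):                                    # placeholder teacher: (N, 3) -> (N, 4)
    return torch.sin(pts.repeat(1, 2)) @ W_teacher

opt = torch.optim.Adam([planes, *decoder.parameters()], lr=1e-3)
for step in range(200):
    pts = torch.rand(1024, 3) * 2 - 1                     # random query points in the volume
    with torch.no_grad():
        target = teacher_nerf(pts)
    loss = F.mse_loss(decoder(sample_planes(pts)), target)
    opt.zero_grad(); loss.backward(); opt.step()
```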

2.
Article in English | MEDLINE | ID: mdl-38713571

ABSTRACT

Text-to-image generation models have significantly broadened the horizons of creative expression through the power of natural language. However, navigating these models to generate unique concepts, alter their appearance, or reimagine them in unfamiliar roles presents an intricate challenge. For instance, how can we exploit language-guided models to transpose an anime character into a different art style, or envision a beloved character in a radically different setting or role? This paper unveils a novel approach named DreamAnime, designed to provide this level of creative freedom. Using a minimal set of 2-3 images of a user-specified concept such as an anime character or an art style, we teach our model to encapsulate its essence through novel "words" in the embedding space of a pre-existing text-to-image model. Crucially, we disentangle the concepts of style and identity into two separate "words", thus providing the ability to manipulate them independently. These distinct "words" can then be pieced together into natural language sentences, promoting an intuitive and personalized creative process. Empirical results suggest that this disentanglement into separate word embeddings successfully captures a broad range of unique and complex concepts, with each word focusing on style or identity as appropriate. Comparisons with existing methods illustrate DreamAnime's superior capacity to accurately interpret and recreate the desired concepts across various applications and tasks. Code is available at https://github.com/chnshx/DreamAnime.
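
As a rough illustration of teaching two new "words" in a frozen model's embedding space, the toy sketch below optimizes one identity token and one style token against a handful of reference images; the frozen generator, embedding dimension, and loss are stand-in assumptions, not DreamAnime's actual text-to-image backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy two-token inversion: only the <identity> and <style> embeddings are trainable,
# everything else is frozen. A linear layer stands in for the frozen generator.
dim = 768
id_token = nn.Parameter(torch.randn(dim) * 0.01)          # "<identity>" pseudo-word
style_token = nn.Parameter(torch.randn(dim) * 0.01)       # "<style>" pseudo-word

prompt_embeds = torch.randn(4, dim)                       # frozen embeddings of the prompt words
frozen_generator = nn.Linear(6 * dim, 3 * 64 * 64)        # stand-in for the frozen generator
for p in frozen_generator.parameters():
    p.requires_grad_(False)

reference_images = torch.rand(3, 3 * 64 * 64)             # the 2-3 images of the target concept

opt = torch.optim.Adam([id_token, style_token], lr=5e-3)
for step in range(300):
    # Prompt = frozen words + the two learnable pseudo-words appended at the end.
    cond = torch.cat([prompt_embeds.flatten(), id_token, style_token])
    rec = frozen_generator(cond)
    loss = F.mse_loss(rec.unsqueeze(0).expand_as(reference_images), reference_images)
    opt.zero_grad(); loss.backward(); opt.step()
```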

3.
Article in English | MEDLINE | ID: mdl-38498754

ABSTRACT

HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited by expensive sensors and time-consuming computation. Camera-based methods usually need to perform road segmentation and view transformation separately, which often causes distortion and missing content. To push the limits of the technology, we present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view, given only a front-view monocular image. We propose a front-to-top view projection (FTVP) module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding. In addition, we apply multi-scale FTVP modules to propagate the rich spatial information of low-level features and mitigate spatial deviation of the predicted object locations. Experiments on public benchmarks show that our method handles road layout estimation, vehicle occupancy estimation, and multi-class semantic estimation at a performance level comparable to the state of the art, while maintaining superior efficiency.
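
As a minimal illustration of the cycle-consistency constraint between the front view and the bird's-eye view, the sketch below trains a toy front-to-top projection and its inverse; the 1x1-convolution "projections", feature sizes, and loss weights are placeholders rather than the actual FTVP module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Cycle consistency between views: the top-to-front projection of the predicted
# bird's-eye-view features should reproduce the original front-view features.
front_to_top = nn.Conv2d(64, 64, kernel_size=1)           # placeholder projection
top_to_front = nn.Conv2d(64, 64, kernel_size=1)           # placeholder inverse projection
opt = torch.optim.Adam(list(front_to_top.parameters()) + list(top_to_front.parameters()), lr=1e-3)

front_feat = torch.randn(2, 64, 32, 32)                    # features from the monocular image
bev_target = torch.randn(2, 64, 32, 32)                    # bird's-eye-view supervision

for step in range(100):
    bev_pred = front_to_top(front_feat)
    front_rec = top_to_front(bev_pred)
    loss = F.l1_loss(bev_pred, bev_target) + 0.1 * F.l1_loss(front_rec, front_feat)  # task + cycle terms
    opt.zero_grad(); loss.backward(); opt.step()
```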

4.
Article in English | MEDLINE | ID: mdl-38335081

ABSTRACT

Throughout history, static paintings have captivated viewers within display frames, yet the possibility of making these masterpieces vividly interactive remains intriguing. This research paper introduces 3DArtmator, a novel approach that aims to represent artforms in a highly interpretable stylized space, enabling 3D-aware animatable reconstruction and editing. Our rationale is to transfer the interpretability and 3D controllability of the latent space in a 3D-aware GAN to a stylized sub-space of a customized GAN, revitalizing the original artforms. To this end, the proposed two-stage optimization framework of 3DArtmator begins with discovering an anchor in the original latent space that accurately mimics the pose and content of a given art painting. This anchor serves as a reliable indicator of the original latent space local structure, therefore sharing the same editable predefined expression vectors. In the second stage, we train a customized 3D-aware GAN specific to the input artform, while enforcing the preservation of the original latent local structure through a meticulous style-directional difference loss. This approach ensures the creation of a stylized sub-space that remains interpretable and retains 3D control. The effectiveness and versatility of 3DArtmator are validated through extensive experiments across a diverse range of art styles. With the ability to generate 3D reconstruction and editing for artforms while maintaining interpretability, 3DArtmator opens up new possibilities for artistic exploration and engagement.

5.
Plant Biotechnol J ; 22(4): 1017-1032, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38012865

ABSTRACT

Maize is one of the most important crops for food, cattle feed and energy production. However, maize is frequently attacked by various pathogens and pests, which pose a significant threat to maize yield and quality. Identification of quantitative trait loci and genes for resistance to pests will provide the basis for resistance breeding in maize. Here, a β-glucosidase, ZmBGLU17, was identified as a resistance gene against Pythium aphanidermatum, one of the causal agents of corn stalk rot, by genome-wide association analysis. Genetic analysis showed that both structural variations at the promoter and a single nucleotide polymorphism at the fifth intron distinguish the two ZmBGLU17 alleles. The causative polymorphism near the GT-AG splice site activates cryptic alternative splicing and intron retention of ZmBGLU17 mRNA, leading to the downregulation of functional ZmBGLU17 transcripts. ZmBGLU17 localizes to both the extracellular matrix and the vacuole and contributes to the accumulation of two defence metabolites, lignin and DIMBOA. Silencing of ZmBGLU17 reduces maize resistance against P. aphanidermatum, while overexpression significantly enhances resistance against both the oomycete pathogen P. aphanidermatum and the Asian corn borer Ostrinia furnacalis. Notably, ZmBGLU17 overexpression lines exhibited normal growth and yield phenotypes in the field. Taken together, our findings reveal that the apoplastic and vacuolar localized ZmBGLU17 confers resistance to both pathogens and insect pests in maize without a yield penalty, by fine-tuning the accumulation of lignin and DIMBOA.


Subject(s)
Zea mays , beta-Glucosidase , Animals , Cattle , Zea mays/genetics , Zea mays/chemistry , beta-Glucosidase/genetics , Genome-Wide Association Study , Lignin , Plant Breeding , Insects
6.
Cell Host Microbe ; 31(11): 1792-1803.e7, 2023 11 08.
Article in English | MEDLINE | ID: mdl-37944492

ABSTRACT

Plants deploy intracellular receptors to counteract pathogen effectors that suppress cell-surface-receptor-mediated immunity. To what extent pathogens manipulate intracellular receptor-mediated immunity, and how plants tackle such manipulation, remain unknown. Arabidopsis thaliana encodes three similar ADR1 class helper nucleotide-binding domain leucine-rich repeat receptors (ADR1, ADR1-L1, and ADR1-L2), which are crucial in plant immunity initiated by intracellular receptors. Here, we report that the Pseudomonas syringae effector AvrPtoB suppresses ADR1-L1- and ADR1-L2-mediated cell death. ADR1, however, evades such suppression by diversifying into two ubiquitination sites targeted by AvrPtoB. The intracellular sensor SNC1 interacts with and guards the CCR domains of ADR1-L1/L2. Removal of ADR1-L1/L2 or delivery of AvrPtoB activates SNC1, which then signals through ADR1 to trigger immunity. Our work elucidates the long-sought-after function of SNC1 in defense, and also shows how plants can use two strategies, sequence diversification and a multi-layered guard-guardee system, to counteract pathogens' attacks on core immunity functions.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis Proteins/metabolism , Plant Immunity , Ubiquitination , Carrier Proteins/metabolism , Plant Diseases
7.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15081-15097, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37624715

ABSTRACT

Traditional monocular depth estimation assumes that all objects are reliably visible in the RGB color domain. However, this is not always the case, as more and more buildings are decorated with transparent glass walls. This problem has not been explored due to the difficulty of annotating the depth levels of glass walls, since commercial depth sensors cannot provide correct feedback on transparent objects. Furthermore, estimating depth from transparent glass walls requires the aid of surrounding context, which has not been considered in prior works. To cope with this problem, we introduce the first Glass Walls Depth Dataset (GW-Depth dataset). We annotate the depth levels of transparent glass walls by propagating the context depth values within neighboring flat areas, and the glass segmentation mask and instance-level line segments of glass edges are also provided. In addition, a tailored monocular depth estimation method is proposed to fully activate the glass wall contextual understanding. First, we propose to exploit the glass structure context by incorporating the structural prior knowledge embedded in glass boundary line segment detections. Furthermore, to make our method adaptive to scenes without structure context, where the glass boundary is either absent in the image or too narrow to be recognized, we derive a reflection context by utilizing depth-reliable points sampled according to the variance between two depth estimates from different resolutions. High-resolution depth is then estimated by a weighted summation of the depths at those reliable points. Extensive experiments are conducted to evaluate the effectiveness of the proposed dual-context design. The superior performance of our method is also demonstrated through comparisons with state-of-the-art methods. We present the first feasible solution for monocular depth estimation in the presence of glass walls, which can be widely adopted in autonomous navigation.
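
A much simplified sketch of the reliable-point idea is given below: two depth predictions at different resolutions are compared, the most consistent pixels are kept as anchors, and the final depth is a distance-weighted sum over those anchors; the top-k selection and inverse-distance weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Simplified reliable-point fusion: keep the pixels where two depth estimates agree
# most, then reconstruct every pixel's depth as an inverse-distance weighted sum of
# the depths at those reliable anchor points.
def fuse_with_reliable_points(d_low, d_high, topk=256):
    d_low = F.interpolate(d_low[None, None], size=d_high.shape, mode="bilinear",
                          align_corners=False)[0, 0]
    var = (d_low - d_high) ** 2                            # disagreement between the two estimates
    H, W = d_high.shape
    idx = var.flatten().topk(topk, largest=False).indices  # most consistent pixels = reliable points
    ys, xs = idx // W, idx % W
    anchor_depth = d_high.flatten()[idx]

    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = ((yy.flatten()[:, None] - ys) ** 2 +
            (xx.flatten()[:, None] - xs) ** 2).float().sqrt() + 1e-3
    w = 1.0 / dist                                         # inverse-distance weights
    return ((w * anchor_depth).sum(dim=1) / w.sum(dim=1)).view(H, W)

d_high = torch.rand(64, 64)                                # depth predicted at full resolution
d_low = torch.rand(32, 32)                                 # depth predicted at lower resolution
fused = fuse_with_reliable_points(d_low, d_high)
```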

8.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 9248-9255, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015627

ABSTRACT

Labeling is onerous for crowd counting as it should annotate each individual in crowd images. Recently, several methods have been proposed for semi-supervised crowd counting to reduce the labeling efforts. Given a limited labeling budget, they typically select a few crowd images and densely label all individuals in each of them. Despite the promising results, we argue the None-or-All labeling strategy is suboptimal as the densely labeled individuals in each crowd image usually appear similar while the massive unlabeled crowd images may contain entirely diverse individuals. To this end, we propose to break the labeling chain of previous methods and make the first attempt to reduce spatial labeling redundancy for semi-supervised crowd counting. First, instead of annotating all the regions in each crowd image, we propose to annotate the representative ones only. We analyze the region representativeness from both vertical and horizontal directions of initially estimated density maps, and formulate them as cluster centers of Gaussian Mixture Models. Additionally, to leverage the rich unlabeled regions, we exploit the similarities among individuals in each crowd image to directly supervise the unlabeled regions via feature propagation instead of the error-prone label propagation employed in the previous methods. In this way, we can transfer the original spatial labeling redundancy caused by individual similarities to effective supervision signals on the unlabeled regions. Extensive experiments on the widely-used benchmarks demonstrate that our method can outperform previous best approaches by a large margin.
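
To illustrate one way representative regions could be derived from an initial density estimate, the sketch below samples positions in proportion to the density along each axis and fits a Gaussian Mixture Model whose component means act as region centres; the component counts, sampling scheme, and use of scikit-learn are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Representative-region selection sketch: sample coordinates proportionally to the
# estimated density along the vertical and horizontal directions, fit a 1-D GMM per
# axis, and use the component means as centres of the regions chosen for annotation.
rng = np.random.default_rng(0)
density = rng.random((96, 128))                      # initially estimated density map (H, W)

def axis_centers(profile, n_regions=3, n_samples=2000):
    profile = profile / profile.sum()
    positions = rng.choice(len(profile), size=n_samples, p=profile)   # coords ~ density
    gmm = GaussianMixture(n_components=n_regions, random_state=0).fit(positions.reshape(-1, 1))
    return np.sort(gmm.means_.ravel()).astype(int)

row_centers = axis_centers(density.sum(axis=1))       # vertical direction
col_centers = axis_centers(density.sum(axis=0))       # horizontal direction
regions = [(r, c) for r in row_centers for c in col_centers]   # candidate region centres to label
print(regions)
```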

9.
Article in English | MEDLINE | ID: mdl-36343000

ABSTRACT

Photorealistic multiview face synthesis from a single image is a challenging problem. Existing works mainly learn a texture mapping model from the source to the target faces. However, they rarely consider the geometric constraints on the internal deformation arising from pose variations, which causes a high level of uncertainty in face pose modeling and hence produces inferior results for large pose variations. Moreover, current methods typically suffer from undesired loss of facial details due to the adoption of the de facto standard encoder-decoder architecture without any skip connections (SCs). In this article, we directly learn and exploit geometric constraints and propose a fully deformable network to simultaneously model the deformations of both landmarks and faces for face synthesis. Specifically, our model consists of two parts: a deformable landmark learning network (DLLN) and a gated deformable face synthesis network (GDFSN). The DLLN converts an initial reference landmark to an individual-specific target landmark as delicate pose guidance for face rotation. The GDFSN adopts a dual-stream structure, with one stream estimating the deformation between the two views in the form of convolution offsets according to the source pose and the converted target pose, and the other leveraging the predicted deformation offsets to create the target face. In this way, individual-aware pose changes are explicitly modeled in the face generator to cope with geometric transformations, by adaptively focusing on pertinent regions of the source face. To compensate for offset estimation errors, we introduce a soft-gating mechanism for adaptive fusion between deformable features and primitive features. Additionally, a pose-aligned SC (PASC) is tailored to propagate low-level input features to the appropriate positions in the output features, further enhancing facial details and identity preservation. Extensive experiments on six benchmarks show that our approach performs favorably against the state of the art, especially with large pose changes. Code is available at https://github.com/cschengxu/FDFace.
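
The soft-gating fusion mentioned above can be pictured with the short PyTorch sketch below, where a gate map predicted from both feature sets blends deformable and primitive features per pixel; channel sizes and the gate network are placeholder choices.

```python
import torch
import torch.nn as nn

# Soft-gating fusion sketch: a gate in (0, 1) predicted from both inputs decides,
# per position, how much to trust the deformable features versus the primitive ones.
class SoftGateFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, deformable_feat, primitive_feat):
        g = self.gate(torch.cat([deformable_feat, primitive_feat], dim=1))
        return g * deformable_feat + (1.0 - g) * primitive_feat

fusion = SoftGateFusion()
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```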

10.
IEEE Trans Image Process ; 31: 5332-5342, 2022.
Article in English | MEDLINE | ID: mdl-35921348

ABSTRACT

We resolve the ill-posed alpha matting problem from a completely different perspective. Given an input portrait image, instead of estimating the corresponding alpha matte, we focus on the other end, to subtly enhance this input so that the alpha matte can be easily estimated by any existing matting models. This is accomplished by exploring the latent space of GAN models. It is demonstrated that interpretable directions can be found in the latent space and they correspond to semantic image transformations. We further explore this property in alpha matting. Particularly, we invert an input portrait into the latent code of StyleGAN, and our aim is to discover whether there is an enhanced version in the latent space which is more compatible with a reference matting model. We optimize multi-scale latent vectors in the latent spaces under four tailored losses, ensuring matting-specificity and subtle modifications on the portrait. We demonstrate that the proposed method can refine real portrait images for arbitrary matting models, boosting the performance of automatic alpha matting by a large margin. In addition, we leverage the generative property of StyleGAN, and propose to generate enhanced portrait data which can be treated as the pseudo GT. It addresses the problem of expensive alpha matte annotation, further augmenting the matting performance of existing models.
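
The latent-space search can be sketched as a small optimization loop like the one below, where a perturbation of the inverted latent code is tuned so that a frozen reference matting model agrees with a trusted alpha while the portrait changes only subtly; the generator, matting model, and loss weights here are stand-ins, not StyleGAN or the paper's four tailored losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Matting-friendly latent search sketch: optimize a small latent perturbation so that
# (i) a frozen matting model matches a trusted alpha and (ii) the portrait stays close
# to the original. Both networks are frozen toy stand-ins.
generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Sigmoid())
matting_model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())
for m in (generator, matting_model):
    for p in m.parameters():
        p.requires_grad_(False)

w0 = torch.randn(1, 512)                    # inverted latent code of the input portrait
portrait = generator(w0).view(1, 3, 64, 64).detach()
alpha_ref = torch.rand(1, 1, 64, 64)        # trusted alpha estimate (e.g., from a trimap-based method)

delta = torch.zeros_like(w0, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
for step in range(200):
    img = generator(w0 + delta).view(1, 3, 64, 64)
    loss = F.l1_loss(matting_model(img), alpha_ref) + 0.1 * F.l1_loss(img, portrait)  # matting + fidelity
    opt.zero_grad(); loss.backward(); opt.step()
enhanced_portrait = generator(w0 + delta).view(1, 3, 64, 64)
```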

11.
Med Image Anal ; 78: 102381, 2022 05.
Article in English | MEDLINE | ID: mdl-35231849

ABSTRACT

Reliable nasopharyngeal carcinoma (NPC) segmentation plays an important role in radiotherapy planning. However, recent deep learning methods fail to achieve satisfactory NPC segmentation in magnetic resonance (MR) images, since NPC is infiltrative and typically has a small or even tiny volume with indistinguishable border, making it indiscernible from tightly connected surrounding tissues from immense and complex backgrounds. To address such background dominance problems, this paper proposes a sequential method (SeqSeg) to achieve accurate NPC segmentation. Specifically, the proposed SeqSeg is devoted to solving the problem at two scales: the instance level and feature level. At the instance level, SeqSeg is forced to focus attention on the tumor and its surrounding tissue through the deep Q-learning (DQL)-based NPC detection model by prelocating the tumor and reducing the scale of the segmentation background. Next, at the feature level, SeqSeg uses high-level semantic features in deeper layers to guide feature learning in shallower layers, thus directing the channel-wise and region-wise attention to mine tumor-related features to perform accurate segmentation. The performance of our proposed method is evaluated by extensive experiments on the large NPC dataset containing 1101 patients. The experimental results demonstrated that the proposed SeqSeg not only outperforms several state-of-the-art methods but also achieves better performance in multi-device and multi-center datasets.


Subject(s)
Magnetic Resonance Imaging , Nasopharyngeal Neoplasms , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging/methods , Nasopharyngeal Carcinoma/diagnostic imaging , Nasopharyngeal Carcinoma/pathology , Nasopharyngeal Neoplasms/diagnostic imaging
12.
IEEE Trans Image Process ; 31: 1230-1242, 2022.
Article in English | MEDLINE | ID: mdl-35015636

ABSTRACT

The state-of-the-art photo upsampling method, PULSE, demonstrates that a sharp, high-resolution (HR) version of a given low-resolution (LR) input can be obtained by exploring the latent space of generative models. However, mapping an extreme LR input (16×16) directly to an HR image (1024×1024) is too ambiguous to preserve faithful local facial semantics. In this paper, we propose an enhanced upsampling approach, Pro-PULSE, that addresses the issues of semantic inconsistency and optimization complexity. Our idea is to learn an encoder that progressively constructs the HR latent codes in the extended W+ latent space of StyleGAN. This design divides the complex 64× upsampling problem into several steps, so that small-scale facial semantics can be inherited from one end to the other. In particular, we train two encoders: the base encoder maps latent vectors in the W space and serves as the foundation of the HR latent vector, while the second, scale-specific encoder, operating in the W+ space, gradually replaces the vectors produced by the base encoder at each scale. This process produces intermediate side-outputs, which inject deep supervision into the training of the encoder. Extensive experiments demonstrate superiority over the latest latent space exploration methods in terms of efficiency, quantitative quality metrics, and qualitative visual results.
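
A toy version of the progressive W+ construction is sketched below: a base encoder provides the W-space foundation and scale-specific encoders overwrite their own layers, producing intermediate side-outputs; the layer counts, 16x16 input size, and toy encoders are assumptions rather than the actual Pro-PULSE architecture.

```python
import torch
import torch.nn as nn

# Progressive W+ sketch: broadcast a base W vector over all style layers, then let
# scale-specific encoders replace the entries for their own scales, keeping each
# intermediate W+ tensor as a side-output for deep supervision.
num_layers, dim = 18, 512
base_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, dim))
scale_encoders = nn.ModuleList(
    nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 4 * dim)) for _ in range(4)
)   # each refines 4 consecutive layers of W+

lr_face = torch.randn(1, 3, 16, 16)
w = base_encoder(lr_face)                         # (1, 512), the W-space foundation
w_plus = w.unsqueeze(1).repeat(1, num_layers, 1)  # broadcast to W+ (1, 18, 512)

side_outputs = []
for i, enc in enumerate(scale_encoders):
    start = i * 4
    w_plus = w_plus.clone()
    w_plus[:, start:start + 4] = enc(lr_face).view(1, 4, dim)   # replace this scale's vectors
    side_outputs.append(w_plus.clone())            # intermediate output for deep supervision
```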

13.
Article in English | MEDLINE | ID: mdl-37015410

ABSTRACT

Converting a human portrait to anime style is a desirable but challenging problem. Existing methods fail to resolve this problem due to the large inherent gap between the two domains, which cannot be overcome by a simple direct mapping. For this reason, these methods struggle to preserve the appearance features of the original photo. In this paper, we discover an intermediate domain, the coser portrait (portraits of humans costumed as anime characters), that helps bridge this gap. It alleviates the learning ambiguity and eases the mapping difficulty in a progressive manner. Specifically, we start from learning the mapping between coser and anime portraits, and present a proxy-guided domain adaptation learning scheme with three progressive adaptation stages to shift the initial model to the human portrait domain. In this way, our model can generate visually pleasant anime portraits with well-preserved appearances given a human portrait. Our model adopts a disentangled design, breaking the translation problem down into two specific subtasks of face deformation and portrait stylization. This further elevates the generation quality. Extensive experimental results show that our model achieves visually compelling translation with better appearance preservation and performs favorably against existing methods both qualitatively and quantitatively. Our code and datasets are available at https://github.com/NeverGiveU/PDA-Translation.

14.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3791-3806, 2022 07.
Article in English | MEDLINE | ID: mdl-33566757

ABSTRACT

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, the spatial location and dominant color of the largest color diversity along the temporal axis, etc. Then a neural network is built and trained to yield the statistical summaries given the video frames as inputs. In order to alleviate the learning difficulty, we employ several spatial partitioning patterns to encode rough spatial locations instead of exact spatial Cartesian coordinates. Our approach is inspired by the observation that human visual system is sensitive to rapidly changing contents in the visual field, and only needs impressions about rough spatial locations to understand the visual contents. To validate the effectiveness of the proposed approach, we conduct extensive experiments with four 3D backbone networks, i.e., C3D, 3D-ResNet, R(2+1)D and S3D-G. The results show that our approach outperforms the existing approaches across these backbone networks on four downstream video analysis tasks including action recognition, video retrieval, dynamic scene recognition, and action similarity labeling. The source code is publicly available at: https://github.com/laura-wang/video_repres_sts.
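
One of the statistical pretext labels can be approximated as in the sketch below, which locates the coarse block with the largest accumulated motion and quantizes a crude motion direction from frame differences; the grid size and direction estimate are simplified stand-ins for the paper's exact spatio-temporal summaries.

```python
import numpy as np

# Pretext-label sketch: accumulate frame differences, find the coarse spatial block
# with the largest motion, and quantize a rough overall motion direction into 8 bins.
def motion_pretext_label(clip, grid=4):            # clip: (T, H, W) grayscale frames
    diff = np.abs(np.diff(clip, axis=0)).sum(axis=0)          # accumulated motion magnitude
    H, W = diff.shape
    bh, bw = H // grid, W // grid
    blocks = diff[:grid * bh, :grid * bw].reshape(grid, bh, grid, bw).sum(axis=(1, 3))
    loc = int(np.argmax(blocks))                               # index of the largest-motion block

    gy, gx = np.gradient(clip[-1] - clip[0])                   # crude overall displacement proxy
    angle = np.arctan2(gy.mean(), gx.mean())
    direction = int((angle + np.pi) / (2 * np.pi) * 8) % 8     # quantize into 8 direction bins
    return loc, direction

clip = np.random.rand(16, 112, 112)
print(motion_pretext_label(clip))
```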


Subject(s)
Algorithms , Neural Networks, Computer , Humans , Motion (Physics) , Software
15.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 2856-2871, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33290212

ABSTRACT

In this paper, we introduce a novel yet challenging research problem, interactive crowd video generation, committed to producing diverse and continuous crowd video, and relieving the difficulty of insufficient annotated real-world datasets in crowd analysis. Our goal is to recursively generate realistic future crowd video frames given few context frames, under the user-specified guidance, namely individual positions of the crowd. To this end, we propose a deep network architecture specifically designed for crowd video generation that is composed of two complementary modules, each of which combats the problems of crowd dynamic synthesis and appearance preservation respectively. Particularly, a spatio-temporal transfer module is proposed to infer the crowd position and structure from guidance and temporal information, and a point-aware flow prediction module is presented to preserve appearance consistency by flow-based warping. Then, the outputs of the two modules are integrated by a self-selective fusion unit to produce an identity-preserved and continuous video. Unlike previous works, we generate continuous crowd behaviors beyond identity annotations or matching. Extensive experiments show that our method is effective for crowd video generation. More importantly, we demonstrate the generated video can produce diverse crowd behaviors and be used for augmenting different crowd analysis tasks, i.e., crowd counting, anomaly detection, crowd video prediction. Code is available at https://github.com/Icep2020/CrowdGAN.

16.
IEEE Trans Cybern ; 52(11): 11734-11746, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34191743

ABSTRACT

Multiview clustering seeks to partition objects by leveraging cross-view relations to provide a comprehensive description of the same objects. Most existing methods assume that different views are linearly transformable or merely sampled from a common latent space. Such rigid assumptions are at odds with reality and thus lead to unsatisfactory performance. To tackle the issue, we propose to learn both common and specific sampling spaces for each view to fully exploit their collaborative representations. The common space corresponds to the universal self-representation basis for all views, while the specific spaces are the corresponding view-specific bases. An iterative self-supervision scheme is conducted to strengthen the learned affinity matrix. The clustering is modeled as a convex optimization. We first solve its linear formulation with a standard optimization scheme. Then, we employ a deep autoencoder structure to exploit its deep nonlinear formulation. Extensive experimental results on six real-world datasets demonstrate that the proposed model achieves uniform superiority over the benchmark methods.
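
A small sketch of the common/specific self-representation idea follows: each view is reconstructed from its own samples through a shared coefficient matrix plus a view-specific one, and the shared matrix yields the clustering affinity; the regularization weights and gradient-based solver are illustrative assumptions, not the paper's convex formulation.

```python
import torch

# Each view X_v (rows = samples) is expressed as X_v ~ (C + E_v) X_v, with a common
# coefficient matrix C shared across views and view-specific matrices E_v.
# |C| + |C|^T then serves as the affinity matrix for clustering.
torch.manual_seed(0)
n, views = 100, 3
X = [torch.randn(n, 20 + 5 * v) for v in range(views)]        # three views with different dims

C = torch.zeros(n, n, requires_grad=True)                      # common self-representation
E = [torch.zeros(n, n, requires_grad=True) for _ in range(views)]  # view-specific parts
opt = torch.optim.Adam([C, *E], lr=1e-2)

for step in range(300):
    loss = 0.0
    for v in range(views):
        rec = (C + E[v]) @ X[v]                                # every sample rebuilt from all samples
        loss = loss + ((X[v] - rec) ** 2).sum()
    loss = loss + 0.1 * C.abs().sum() + 0.01 * sum(e.pow(2).sum() for e in E)
    opt.zero_grad(); loss.backward(); opt.step()

affinity = (C.abs() + C.abs().t()).detach()                    # symmetric affinity for spectral clustering
```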


Subject(s)
Algorithms , Learning , Cluster Analysis
17.
IEEE Trans Image Process ; 30: 6700, 2021.
Article in English | MEDLINE | ID: mdl-34339368

ABSTRACT

In the above article [1], Fig. 5 was unfortunately not displayed correctly and contained many empty images. The correct version is provided here.

18.
IEEE Trans Image Process ; 30: 6024-6035, 2021.
Article in English | MEDLINE | ID: mdl-34181543

ABSTRACT

Existing GAN-based multi-view face synthesis methods rely heavily on "creating" faces, and thus they struggle in reproducing the faithful facial texture and fail to preserve identity when undergoing a large angle rotation. In this paper, we combat this problem by dividing the challenging large-angle face synthesis into a series of easy small-angle rotations, and each of them is guided by a face flow to maintain faithful facial details. In particular, we propose a Face Flow-guided Generative Adversarial Network (FFlowGAN) that is specifically trained for small-angle synthesis. The proposed network consists of two modules, a face flow module that aims to compute a dense correspondence between the input and target faces. It provides strong guidance to the second module, face synthesis module, for emphasizing salient facial texture. We apply FFlowGAN multiple times to progressively synthesize different views, and therefore facial features can be propagated to the target view from the very beginning. All these multiple executions are cascaded and trained end-to-end with a unified back-propagation, and thus we ensure each intermediate step contributes to the final result. Extensive experiments demonstrate the proposed divide-and-conquer strategy is effective, and our method outperforms the state-of-the-art on four benchmark datasets qualitatively and quantitatively.

19.
IEEE/ACM Trans Comput Biol Bioinform ; 18(5): 1914-1923, 2021.
Article in English | MEDLINE | ID: mdl-31841420

ABSTRACT

Tumor metastasis detection is of great importance for the treatment of breast cancer patients. Various CNN (convolutional neural network)-based methods achieve excellent performance in object detection/segmentation. However, the detection of metastases in hematoxylin and eosin (H&E) stained whole-slide images (WSI) is still challenging, mainly for two reasons: (1) the resolution of the images is extremely large, and (2) labeled training data are scarce. Whole-slide images are generally stored in a multi-resolution structure with multiple downsampled tiles, and it is difficult to feed the whole image into memory without compression. Moreover, labeling images is time-consuming and expensive for pathologists. In this paper, we study the problem of detecting breast cancer metastases in pathological images at the patch level. To address these challenges, we propose a few-shot learning method to classify whether an image patch contains tumor cells. Specifically, we propose a patch-level unsupervised cell ranking approach, which relies only on images with limited labels. The main idea of the proposed method is that when a patch A is cropped from the WSI and a sub-patch B is further cropped from A, the cell count of A is always at least that of B. Based on this observation, we make use of the unlabeled images to learn the ranking information of cell counting and extract abstract features. Experimental results show that our method effectively improves the patch-level classification accuracy compared to the traditional supervised method. The source code is publicly available at https://github.com/fewshot-camelyon.
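
The patch/sub-patch ranking constraint can be written down as a margin ranking loss, as in the minimal sketch below; the tiny scoring CNN, crop sizes, and random data are placeholders for the paper's actual network and WSI patches.

```python
import torch
import torch.nn as nn

# Unsupervised cell-ranking sketch: a scoring network is trained so that the score of
# a patch A is at least the score of any sub-patch B cropped from inside A, which
# needs no count annotations at all.
scorer = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, 1),
)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
rank_loss = nn.MarginRankingLoss(margin=0.0)

for step in range(200):
    patch_a = torch.rand(8, 3, 96, 96)                           # unlabeled patches A
    top = torch.randint(0, 96 - 48, (1,)).item()
    left = torch.randint(0, 96 - 48, (1,)).item()
    patch_b = patch_a[:, :, top:top + 48, left:left + 48]         # sub-patches B inside A
    s_a, s_b = scorer(patch_a), scorer(patch_b)
    loss = rank_loss(s_a, s_b, torch.ones_like(s_a))              # enforce score(A) >= score(B)
    opt.zero_grad(); loss.backward(); opt.step()
```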


Subject(s)
Breast Neoplasms , Cell Count/methods , Image Interpretation, Computer-Assisted/methods , Neoplasm Metastasis , Unsupervised Machine Learning , Algorithms , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Female , Histocytochemistry , Humans , Lymph Nodes/diagnostic imaging , Lymph Nodes/pathology , Neoplasm Metastasis/diagnostic imaging , Neoplasm Metastasis/pathology , Neural Networks, Computer
20.
IEEE Trans Neural Netw Learn Syst ; 32(8): 3761-3769, 2021 Aug.
Article in English | MEDLINE | ID: mdl-32822308

ABSTRACT

With the explosive growth of action categories, zero-shot action recognition aims to extend a well-trained model to novel/unseen classes. To bridge the large knowledge gap between seen and unseen classes, in this brief, we visually associate unseen actions with seen categories in a visually connected graph, and the knowledge is then transferred from the visual feature space to the semantic space via grouped attention graph convolutional networks (GAGCNs). In particular, we extract visual features for all the actions, and a visually connected graph is built to attach seen actions to visually similar unseen categories. Moreover, the proposed grouped attention mechanism exploits the hierarchical knowledge in the graph so that the GAGCN can propagate the visual-semantic connections from seen actions to unseen ones. We extensively evaluate the proposed method on three data sets: HMDB51, UCF101, and NTU RGB+D. Experimental results show that the GAGCN outperforms state-of-the-art methods.
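
A simplified version of the visually connected graph (without the grouped attention) is sketched below: a k-NN graph over class-level visual prototypes, a two-layer graph convolution mapping them to semantic embeddings, and supervision on seen classes only; the dimensions, k-NN rule, and plain GCN are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Visually connected graph sketch: nodes are action classes with visual prototypes,
# edges link visually similar classes, and a two-layer GCN regresses semantic
# embeddings; only seen classes are supervised, unseen ones receive propagated embeddings.
torch.manual_seed(0)
n_seen, n_unseen, d_vis, d_sem = 40, 11, 256, 300
vis = torch.randn(n_seen + n_unseen, d_vis)                 # visual prototypes of all classes
sem_seen = torch.randn(n_seen, d_sem)                       # semantic embeddings of seen classes

sim = F.normalize(vis, dim=1) @ F.normalize(vis, dim=1).t()
adj = (sim >= sim.topk(5, dim=1).values[:, -1:]).float()    # k-NN visual graph
adj = ((adj + adj.t()) > 0).float() + torch.eye(len(vis))   # symmetrize and add self loops
a_hat = adj / adj.sum(dim=1, keepdim=True)                  # row-normalized adjacency

w1, w2 = nn.Linear(d_vis, 512), nn.Linear(512, d_sem)
opt = torch.optim.Adam(list(w1.parameters()) + list(w2.parameters()), lr=1e-3)
for step in range(300):
    h = torch.relu(a_hat @ w1(vis))
    out = a_hat @ w2(h)
    loss = F.mse_loss(out[:n_seen], sem_seen)               # supervise seen classes only
    opt.zero_grad(); loss.backward(); opt.step()
unseen_embeddings = out[n_seen:].detach()                   # propagated embeddings for unseen actions
```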
