Results 1 - 5 of 5
1.
Neural Netw; 176: 106350, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38723309

ABSTRACT

In recent years, self-supervised learning has emerged as a powerful approach to learning visual representations without extensive manual annotation. One popular technique uses rotation transformations of images, which provide a clear visual signal for learning semantic representations. In this work, however, we revisit the pretext task of predicting image rotation in self-supervised learning and find that it tends to marginalise the perception of features located near the centre of an image. To address this limitation, we propose a new self-supervised learning method, FullRot, which spotlights underrated regions by resizing randomly selected and cropped image regions. FullRot also increases the complexity of the rotation pretext task by applying degree-free rotation to regions cropped into a circle. To encourage models to learn from different general parts of an image, we introduce a new data-mixture technique, WRMix, which merges two random intra-image patches. By combining these crop and rotation methods with the data-mixture scheme, our approach, FullRot + WRMix, surpasses state-of-the-art self-supervised methods in classification, segmentation, and object detection on ten benchmark datasets, with improvements of up to +13.98% accuracy on STL-10, +8.56% accuracy on CIFAR-10, +10.20% accuracy on Sports-100, +15.86% accuracy on Mammals-45, +15.15% accuracy on PAD-UFES-20, +32.44% mIoU on VOC 2012, +7.62% mIoU on ISIC 2018, +9.70% mIoU on FloodArea, +25.16% AP50 on VOC 2007, and +58.69% AP50 on UTDAC 2020. The code is available at https://github.com/anthonyweidai/FullRot_WRMix.
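
For intuition, the two ingredients can be sketched in a few lines of Python. This is a hedged reading of the abstract, not the authors' implementation (see the repository above); the crop size, mixing ratio, and function names are our assumptions.

# Hypothetical sketch: degree-free rotation of a circularly cropped region,
# and WRMix-style intra-image patch mixing. Assumes the image is larger
# than the crop size.
import random

import numpy as np
from PIL import Image

def circular_rotation_sample(img: Image.Image, crop: int = 96):
    """Crop a random region, mask it into a circle, rotate by a free angle."""
    w, h = img.size
    x, y = random.randint(0, w - crop), random.randint(0, h - crop)
    region = img.crop((x, y, x + crop, y + crop))
    angle = random.uniform(0.0, 360.0)  # degree-free regression target
    rotated = region.rotate(angle, resample=Image.BILINEAR)
    # Circular mask, so the rotation leaves no tell-tale corner artefacts.
    yy, xx = np.ogrid[:crop, :crop]
    c = (crop - 1) / 2.0
    mask = (xx - c) ** 2 + (yy - c) ** 2 <= c ** 2
    arr = np.asarray(rotated).copy()
    arr[~mask] = 0
    return Image.fromarray(arr), angle

def wrmix(img: Image.Image, lam: float = 0.5):
    """Blend one random intra-image patch into another (our reading of WRMix)."""
    arr = np.asarray(img).copy()
    h, w = arr.shape[:2]
    ph, pw = h // 4, w // 4  # illustrative patch size
    y1, x1 = random.randint(0, h - ph), random.randint(0, w - pw)
    y2, x2 = random.randint(0, h - ph), random.randint(0, w - pw)
    mixed = lam * arr[y1:y1 + ph, x1:x1 + pw] + (1 - lam) * arr[y2:y2 + ph, x2:x2 + pw]
    arr[y1:y1 + ph, x1:x1 + pw] = mixed.astype(arr.dtype)
    return Image.fromarray(arr)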


Subject(s)
Supervised Machine Learning, Rotation, Humans, Neural Networks, Computer, Image Processing, Computer-Assisted/methods, Algorithms
2.
IEEE J Biomed Health Inform; 28(2): 719-729, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37624725

ABSTRACT

Accurate and unbiased examination of skin lesions is critical for the early diagnosis and treatment of skin diseases. Visual features of skin lesions vary significantly because images are collected from patients with different lesion colours and morphologies using dissimilar imaging equipment. Recent studies have reported that ensembled convolutional neural networks (CNNs) are effective at classifying such images for the early diagnosis of skin disorders. However, the practical use of these ensembled CNNs is limited because the networks are heavyweight and inadequate for processing contextual information. Although lightweight networks (e.g., MobileNetV3 and EfficientNet) were developed to reduce parameters so that deep neural networks can run on mobile devices, their insufficient depth of feature representation restricts performance. To address these limitations, we develop a new lightweight and effective neural network, HierAttn. HierAttn applies a novel deep-supervision strategy to learn local and global features through multi-stage and multi-branch attention mechanisms with only one training loss. The efficacy of HierAttn was evaluated on the dermoscopy image dataset ISIC2019 and the smartphone photo dataset PAD-UFES-20 (PAD2020). The experimental results show that HierAttn achieves the best accuracy and area under the curve (AUC) among state-of-the-art lightweight networks.
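
As a rough illustration of deep supervision with multi-stage, multi-branch attention heads trained under a single loss, consider the following PyTorch sketch. The channel widths, the attention-pooling head, and the logit fusion are illustrative assumptions, not HierAttn's actual design.

# Minimal sketch: one attention-pooled classifier branch per stage, with
# branch logits summed so a single criterion supervises every stage.
import torch
import torch.nn as nn

class AttnHead(nn.Module):
    """Spatial-attention pooling followed by a linear classifier."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):                                 # x: (B, C, H, W)
        w = torch.softmax(self.score(x).flatten(2), -1)   # (B, 1, H*W)
        pooled = (x.flatten(2) * w).sum(-1)               # (B, C)
        return self.fc(pooled)

class MultiStageNet(nn.Module):
    def __init__(self, num_classes: int = 8, widths=(32, 64, 128)):
        super().__init__()
        chans = [3, *widths]
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(i, o, 3, stride=2, padding=1),
                          nn.BatchNorm2d(o), nn.ReLU())
            for i, o in zip(chans, chans[1:]))
        self.heads = nn.ModuleList(AttnHead(c, num_classes) for c in widths)

    def forward(self, x):
        logits = 0
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            logits = logits + head(x)  # every stage's branch feeds one output
        return logits

model = MultiStageNet()
loss = nn.CrossEntropyLoss()(model(torch.randn(2, 3, 64, 64)),
                             torch.tensor([0, 1]))  # the single training loss
loss.backward()  # gradients reach all stages and all branches

Summing branch logits before a single criterion lets one loss supervise every stage, which is the property the abstract highlights.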


Subject(s)
Dermoscopy, Skin Diseases, Humans, Dermoscopy/methods, Skin Diseases/diagnostic imaging, Neural Networks, Computer, Diagnosis, Computer-Assisted/methods
3.
IEEE Trans Image Process; 33: 479-492, 2024.
Article in English | MEDLINE | ID: mdl-38153821

ABSTRACT

Early action prediction (EAP) aims to recognize human actions from partially executed actions in ongoing videos, an important task for many practical applications. Most prior works treat partial or full videos as a whole, ignoring the rich action knowledge hidden in them, i.e., the semantic consistencies among different partial videos. In contrast, we partition the original partial or full videos into a new series of partial videos and mine the Action-Semantic Consistent Knowledge (ASCK) among these new partial videos evolving at arbitrary progress levels. Moreover, we propose a novel Rich Action-semantic Consistent Knowledge network (RACK) under the teacher-student framework for EAP. First, we use a two-stream pre-trained model to extract video features. Second, we treat the RGB or flow features of the partial videos as nodes and their action-semantic consistencies as edges. Next, we build a bi-directional semantic graph for the teacher network and a single-directional semantic graph for the student network to model rich ASCK among partial videos. MSE and MMD losses are incorporated as our distillation loss to transfer the ASCK of partial videos from the teacher to the student network. Finally, we obtain the final prediction by summing the logits of the different subnetworks and applying a softmax layer. Extensive experiments and ablation studies demonstrate the effectiveness of modeling rich ASCK for EAP. With the proposed RACK, we achieve state-of-the-art performance on three benchmarks. The code is available at https://github.com/lily2lab/RACK.git.
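
The distillation objective named above (MSE plus MMD) can be sketched as follows; the kernel bandwidth and the weighting between the two terms are assumptions, not values from the paper.

# Hedged sketch of an MSE + MMD distillation loss between teacher and
# student features of the partial videos; sigma and alpha are assumed.
import torch
import torch.nn.functional as F

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0):
    """Squared maximum mean discrepancy with an RBF kernel; x, y: (N, D)."""
    k = lambda a, b: torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def distill_loss(student: torch.Tensor, teacher: torch.Tensor,
                 alpha: float = 0.5):
    """Pull student node features towards the (frozen) teacher's."""
    t = teacher.detach()  # teacher is not updated by this loss
    return F.mse_loss(student, t) + alpha * rbf_mmd(student, t)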

4.
Article in English | MEDLINE | ID: mdl-37256808

ABSTRACT

Human motion prediction is challenging because it requires modeling complex spatiotemporal features. Among existing methods, graph convolutional networks (GCNs) are extensively used because of their strength in explicit connection modeling. Within a GCN, the graph correlation adjacency matrix drives feature aggregation and is therefore the key to extracting predictive motion features. State-of-the-art methods decompose the spatiotemporal correlation into spatial correlations for each frame and temporal correlations for each joint. Directly parameterizing these correlations introduces redundant parameters to represent common relations shared by all frames and all joints. Moreover, the spatiotemporal graph adjacency matrix is the same for different motion samples and thus cannot reflect sample-wise correspondence variances. To overcome these two bottlenecks, we propose the dynamic spatiotemporal decomposed GC (DSTD-GC), which uses only 28.6% of the parameters of the state-of-the-art GC. The key to DSTD-GC is constrained dynamic correlation modeling, which explicitly parameterizes the common static constraints as a spatial/temporal vanilla adjacency matrix shared by all frames/joints and dynamically extracts correspondence variances for each frame/joint with an adjustment modeling function. For each sample, the common constrained adjacency matrices are fixed to represent generic motion patterns, while the extracted variances complete the matrices with sample-specific adjustments. Meanwhile, we mathematically reformulate GCs on spatiotemporal graphs into a unified form and find that DSTD-GC relaxes certain constraints of other GCs, which contributes to better representation capability. Moreover, by combining DSTD-GC with prior knowledge such as body connectivity and temporal context, we propose a powerful spatiotemporal GCN called DSTD-GCN. On the Human3.6M, Carnegie Mellon University (CMU) Mocap, and 3D Poses in the Wild (3DPW) datasets, DSTD-GCN outperforms state-of-the-art methods by 3.9%-8.7% in prediction accuracy with 55.0%-96.9% fewer parameters. Code is available at https://github.com/Jaakk0F/DSTD-GCN.
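
A minimal sketch of constrained dynamic correlation modeling, as we read it: a static adjacency shared by all samples plus a sample-wise adjustment predicted from the input. Tensor shapes, the tanh adjustment, and all names are illustrative assumptions, not DSTD-GC itself.

# Hypothetical graph convolution with a shared static adjacency and a
# per-sample dynamic correction.
import torch
import torch.nn as nn

class DynamicGC(nn.Module):
    def __init__(self, num_joints: int, in_dim: int, out_dim: int):
        super().__init__()
        self.static_adj = nn.Parameter(torch.eye(num_joints))  # shared constraint
        self.adjust = nn.Linear(in_dim, num_joints)  # per-sample correction
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                   # x: (B, J, in_dim)
        delta = torch.tanh(self.adjust(x))  # (B, J, J): sample-wise variance
        adj = self.static_adj + delta       # generic pattern + adjustment
        return self.proj(adj @ x)           # aggregate neighbours, then project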

5.
IEEE Trans Image Process; 31: 3973-3986, 2022.
Article in English | MEDLINE | ID: mdl-35648878

ABSTRACT

Video-based human pose estimation (VHPE) is a vital yet challenging task. While deep learning algorithms have made tremendous progress on VHPE, many of these approaches model the long-range interaction between joints only implicitly, by expanding the receptive field of the convolution or designing a graph manually. Unlike prior methods, we design a lightweight, plug-and-play joint relation extractor (JRE) to explicitly and automatically model the associative relationships between joints. The JRE takes pseudo heatmaps of joints as input and calculates their similarity. In this way, the JRE can flexibly learn the correlation between any two joints, allowing it to capture the rich spatial configuration of human poses. Furthermore, the JRE can infer invisible joints from the correlations between joints, which helps locate occluded joints. Then, combined with temporal semantic continuity modeling, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) for video-based human pose estimation. Specifically, to capture the temporal dynamics of poses, the pose semantic information of the current frame is transferred to the next with a joint-relation-guided pose semantics propagator (JRPSP), which can transfer pose semantic features from a non-occluded frame to an occluded one. The proposed RPSTN achieves state-of-the-art or competitive results on the video-based Penn Action, Sub-JHMDB, PoseTrack2018, and HiEve datasets. Moreover, the proposed JRE improves the performance of backbones on the image-based COCO2017 dataset. Code is available at https://github.com/YHDang/pose-estimation.
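
As a rough sketch of the idea behind the JRE, pairwise similarity between flattened pseudo heatmaps yields a joint relation matrix; the refinement step below is our assumption about how occluded joints could borrow evidence from correlated ones, not the released code.

# Hedged sketch: cosine similarity between joints' pseudo heatmaps as the
# relation matrix, plus a hypothetical relation-weighted refinement.
import torch
import torch.nn.functional as F

def joint_relations(heatmaps: torch.Tensor) -> torch.Tensor:
    """heatmaps: (B, J, H, W) -> relation matrix (B, J, J)."""
    flat = F.normalize(heatmaps.flatten(2), dim=-1)  # unit norm per joint
    return flat @ flat.transpose(1, 2)               # cosine similarity

def refine(heatmaps: torch.Tensor) -> torch.Tensor:
    """Mix each joint's map with correlated joints' maps (hypothetical)."""
    b, j, h, w = heatmaps.shape
    rel = torch.softmax(joint_relations(heatmaps), dim=-1)
    return (rel @ heatmaps.flatten(2)).view(b, j, h, w)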


Subject(s)
Algorithms, Semantics, Humans