Results 1 - 20 of 51
1.
Neuroimage ; 183: 650-665, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30125711

ABSTRACT

White matter hyperintensities (WMH) are commonly found in the brains of healthy elderly individuals and have been associated with various neurological and geriatric disorders. In this paper, we present a study using a deep fully convolutional network and ensemble models to automatically detect such WMH using fluid-attenuated inversion recovery (FLAIR) and T1 magnetic resonance (MR) scans. The algorithm was evaluated and ranked 1st in the WMH Segmentation Challenge at MICCAI 2017. In the evaluation stage, the implementation of the algorithm was submitted to the challenge organizers, who then independently tested it on a hidden set of 110 cases from 5 scanners. The average Dice score, precision and robust Hausdorff distance obtained on the held-out test datasets were 80%, 84% and 6.30 mm respectively. These were the highest achieved in the challenge, suggesting the proposed method is state-of-the-art. Detailed descriptions and quantitative analysis of key components of the system are provided. Furthermore, a cross-scanner evaluation is presented to discuss how the combination of modalities affects the generalization capability of the system. The adaptability of the system to different scanners and protocols is also investigated. A further quantitative study shows the effect of ensemble size and the effectiveness of the ensemble model. Additionally, the software and models of our method are made publicly available. The effectiveness and generalization capability of the proposed system show its potential for real-world clinical practice.
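The challenge metrics above include the Dice score; as a minimal sketch, Dice overlap on binary segmentation masks can be computed in plain Python (flat 0/1 lists standing in for voxel masks):

```python
def dice_score(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:  # both masks empty: define perfect agreement
        return 1.0
    return 2.0 * intersection / total

# Toy example: 4-voxel masks agreeing on one lesion voxel
print(dice_score([1, 1, 0, 0], [1, 0, 0, 1]))  # 2*1/(2+2) = 0.5
```

A score of 1.0 means perfect overlap; the challenge's 80% average corresponds to 0.80 under this definition.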


Subjects
Algorithms, Brain/diagnostic imaging, Computer-Assisted Image Interpretation/methods, Magnetic Resonance Imaging/methods, Neuroimaging/methods, White Matter/diagnostic imaging, Datasets as Topic, Humans
2.
IEEE Trans Image Process ; 33: 1600-1613, 2024.
Article in English | MEDLINE | ID: mdl-38373124

ABSTRACT

Action quality assessment (AQA) is the task of assessing how well an action is performed. Previous works model AQA using only visual information, ignoring audio. We argue that although AQA is highly dependent on visual information, audio is useful complementary information for improving score regression accuracy, especially for sports with background music, such as figure skating and rhythmic gymnastics. To leverage multimodal information for AQA, i.e., RGB, optical flow and audio information, we propose a Progressive Adaptive Multimodal Fusion Network (PAMFN) that separately models modality-specific information and mixed-modality information. Our model consists of three modality-specific branches that independently explore modality-specific information and a mixed-modality branch that progressively aggregates the modality-specific information from these branches. To build a bridge between the modality-specific branches and the mixed-modality branch, three novel modules are proposed. First, a Modality-specific Feature Decoder module is designed to selectively transfer modality-specific information to the mixed-modality branch. Second, when exploring the interaction between modality-specific information, we argue that an invariant multimodal fusion policy may lead to suboptimal results, since it fails to take into consideration the potential diversity in different parts of an action. Therefore, an Adaptive Fusion Module is proposed to learn adaptive multimodal fusion policies in different parts of an action. This module consists of several FusionNets for exploring different multimodal fusion strategies and a PolicyNet for deciding which FusionNets are enabled. Third, a module called Cross-modal Feature Decoder is designed to transfer the cross-modal features generated by the Adaptive Fusion Module to the mixed-modality branch.
Our extensive experiments validate the efficacy of the proposed method, and our method achieves state-of-the-art performance on two public datasets. Code is available at https://github.com/qinghuannn/PAMFN.


Subjects
Computer-Assisted Image Interpretation, Machine Learning
3.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 2692-2708, 2024 May.
Article in English | MEDLINE | ID: mdl-37922161

ABSTRACT

Person re-identification (Re-ID) is a fundamental task in visual surveillance. Given a query image of the target person, conventional Re-ID focuses on the pairwise similarities between the candidate images and the query. However, conventional Re-ID does not evaluate whether the retrieval results are consistent, i.e., whether the top-ranked images in every camera view contain the same person. This is risky in some applications; for example, missing a place where a patient passed would hinder an epidemiological investigation. In this work, we investigate a more challenging task: consistently and successfully retrieving the target person in all camera views. We define this task as continuous person Re-ID and propose a corresponding evaluation metric termed overall Rank-K accuracy. Unlike conventional Re-ID, any incorrect retrieval under an individual camera view that raises an inconsistency fails the continuous Re-ID. Consequently, defective cameras, whose images are hard to automatically associate with images from other views, strongly degrade the performance of continuous person Re-ID. Since camera deployment is crucial for continuous tracking across camera views, we rethink person Re-ID from the perspective of camera deployment and assess the quality of a camera network by performing continuous Re-ID. Moreover, we propose to automatically detect the defective cameras that greatly hamper continuous Re-ID. Because brute-force search is costly when the camera network becomes complicated, we explicitly model the visual relations as well as the spatial relations among cameras and develop a relational deep Q-network to select the properly deployed cameras; the unselected cameras are regarded as defective. Since most existing datasets do not provide topology information about the camera network, they are unsuitable for investigating the importance of spatial relations in camera selection.
Thus, we collect a new dataset comprising 20 cameras with topology information. Compared with randomly removing cameras, the experimental results show that our method can effectively detect the defective cameras, so that further action can be taken on them in practice (https://www.isee-ai.cn/~yixing/MCCPD.html).
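The overall Rank-K metric described above can be sketched in a few lines of plain Python (a toy illustration, not the authors' implementation; a query counts as a success only when the true match ranks within the top K in every camera view):

```python
def overall_rank_k(rankings, k=1):
    """rankings: per-query list of per-camera ranks (1 = best) of the true match.
    A query succeeds only if the true match is within the top-k in EVERY view."""
    successes = sum(1 for ranks in rankings if all(r <= k for r in ranks))
    return successes / len(rankings)

# Query 1 succeeds at rank-1 in all 3 views; query 2 ranks 3rd in one view
queries = [[1, 1, 1], [1, 3, 1]]
print(overall_rank_k(queries, k=1))  # 0.5
print(overall_rank_k(queries, k=3))  # 1.0
```

This is strictly harder than conventional per-view Rank-K: a single inconsistent camera view fails the whole query.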

4.
Article in English | MEDLINE | ID: mdl-38683711

ABSTRACT

Person re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, numerous variants of ReID models have been developed to solve particular challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, each typically functions in isolation and cannot be applied to other challenges. To the best of our knowledge, there is no versatile ReID model that can handle various ReID challenges at the same time. This work contributes the first attempt at learning such a versatile ReID model. Our main idea is a two-stage prompt-based twin modeling framework called VersReID. VersReID first leverages scene labels to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts are used to encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank for adaptively solving the ReID of different scenes, eliminating the demand for scene labels during the inference stage. To facilitate training VersReID, we further introduce multi-scene properties into self-supervised learning of ReID via a multi-scene priors data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate the success of learning an effective and versatile ReID model for handling ReID tasks under multi-scene conditions without manual assignment of scene labels in the inference stage, including general, low-resolution, clothing change, occlusion, and cross-modality scenes. Codes and models will be made publicly available.

5.
Neural Netw ; 177: 106382, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38761416

ABSTRACT

Occluded person re-identification (Re-ID) is a challenging task, as pedestrians are often obstructed by various occlusions, such as non-pedestrian objects or non-target pedestrians. Previous methods have relied heavily on auxiliary models, such as human pose estimation, to obtain information in unoccluded regions. However, these auxiliary models fail to account for pedestrian occlusions, leading to potential misrepresentations. In addition, some previous works learned feature representations from single images, ignoring the potential relations among samples. To address these issues, this paper introduces a Multi-Level Relation-Aware Transformer (MLRAT) model for occluded person Re-ID. This model mainly encompasses two novel modules: Patch-Level Relation-Aware (PLRA) and Sample-Level Relation-Aware (SLRA). PLRA learns fine-grained local features by modeling the structural relations between key patches, bypassing the dependency on auxiliary models. It adopts a model-free method to select key patches that have high semantic correlation with the final pedestrian representation. In particular, to alleviate the interference of occlusion, PLRA captures the structural relations among key patches via a two-layer Graph Convolution Network (GCN), effectively guiding local feature fusion and learning. SLRA is designed to help the model learn discriminative features by modeling the relations among samples. Specifically, to mitigate the noisy relations of irrelevant samples, we present a Relation-Aware Transformer (RAT) block to capture the relations among neighbors. Furthermore, to bridge the gap between the training and testing phases, a self-distillation method is employed to transfer the sample-level relations captured by SLRA to the backbone. Extensive experiments are conducted on four occluded datasets, two partial datasets and two holistic datasets. The results show that the proposed MLRAT model significantly outperforms existing baselines on the four occluded datasets, while maintaining top performance on the two partial and two holistic datasets.
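The model-free key-patch selection in PLRA can be illustrated with a toy sketch (hypothetical 2-D patch embeddings; ranking by cosine similarity to the final representation stands in for the paper's semantic-correlation measure):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_key_patches(patches, final_repr, k=2):
    """Keep the indices of the k patch embeddings most similar to the final
    pedestrian representation - no auxiliary pose/parsing model needed."""
    ranked = sorted(range(len(patches)),
                    key=lambda i: cosine(patches[i], final_repr),
                    reverse=True)
    return sorted(ranked[:k])

patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(select_key_patches(patches, [1.0, 0.0], k=2))  # patches 0 and 2 align best
```

Occluded patches, whose embeddings correlate poorly with the overall representation, are naturally filtered out by such a ranking.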


Subjects
Neural Networks (Computer), Pedestrians, Humans, Algorithms
6.
Int J Biol Macromol ; 268(Pt 1): 131729, 2024 May.
Article in English | MEDLINE | ID: mdl-38653429

ABSTRACT

In this work, various characterization techniques were employed to probe the dissociation mechanism of cellulose in the N,N-dimethylacetamide/lithium chloride (DMAc/LiCl) system. The results indicate that coordination of DMAc ligands to the Li+-Cl- ion pair leads to the formation of a series of Lix(DMAc)yClz (x = 1, 2; y = 1, 2, 3, 4; z = 1, 2) complexes. Analysis of the interaction between the DMAc ligand and the Li center indicates that the Li bond plays a major role in the formation of these Lix(DMAc)yClz complexes, and that the saturation and directionality of the Li bond give these complexes a tetrahedral structure. The hydrogen bonds between two cellulose chains can be broken at the non-reducing end of the cellulose molecule through the combined effects of the basicity of the Cl- ion and the steric hindrance of the [Li(DMAc)4]+ unit. The unique character of the Li bond in the Lix(DMAc)yClz complexes is a key factor in determining the dissociation mechanism.


Subjects
Acetamides, Cellulose, Lithium Chloride, Cellulose/chemistry, Acetamides/chemistry, Lithium Chloride/chemistry, Lithium/chemistry, Hydrogen Bonding
7.
IEEE Trans Pattern Anal Mach Intell ; 46(6): 4188-4205, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38227419

ABSTRACT

Existing studies on knowledge distillation typically focus on teacher-centered methods, in which the teacher network is trained according to its own standards before transferring the learned knowledge to a student network. However, due to differences in network structure between the teacher and the student, the knowledge learned by the former may not be desired by the latter. Inspired by human educational wisdom, this paper proposes a Student-Centered Distillation (SCD) method that enables the teacher network to adjust its knowledge transfer according to the student network's needs. We implemented SCD based on various elements of human educational wisdom; e.g., the teacher network identifies and learns the knowledge desired by the student network on the validation set, and then transfers it to the latter through the training set. To address the problems of current knowledge deficiency, hard-sample learning and knowledge forgetting faced by a student network in the learning process, we introduce and improve Proportional-Integral-Derivative (PID) algorithms from the automation field to make them effective in identifying the knowledge currently required by the student network. Furthermore, we propose a curriculum-learning-based fuzzy strategy and apply it to the proposed PID control algorithm, such that the student network in SCD can actively attend to challenging samples after acquiring certain knowledge. The overall performance of SCD is verified in multiple tasks by comparing it with state-of-the-art methods. Experimental results show that our student-centered distillation method outperforms existing teacher-centered ones.
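A minimal discrete PID term of the kind the paper adapts for tracking the knowledge the student currently needs might look as follows (the gains and the error signal here are hypothetical; the fuzzy curriculum scheduling is omitted):

```python
class PIDWeight:
    """Discrete PID term driven by a running error signal, e.g. the student's
    per-sample loss gap. Gains kp/ki/kd are illustrative, not from the paper."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # accumulates past errors (I term)
        self.prev_err = None     # last error, for the derivative (D term)

    def update(self, err):
        self.integral += err
        deriv = 0.0 if self.prev_err is None else err - self.prev_err
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

pid = PIDWeight()
print(pid.update(1.0))  # P=1.0, I=0.1, D=0.0  -> 1.1
print(pid.update(0.5))  # P=0.5, I=0.15, D=-0.25 -> 0.4
```

The integral term captures persistent knowledge gaps while the derivative term dampens the weight as soon as the student starts improving.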


Subjects
Algorithms, Students, Humans, Machine Learning, Fuzzy Logic, Knowledge
8.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15512-15529, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37410652

ABSTRACT

Semi-supervised person re-identification (Re-ID) is an important approach for alleviating annotation costs when learning to match person images across camera views. Most existing works assume that the training data contain abundant identities crossing camera views. However, this assumption does not hold in many real-world applications, especially when images are captured in non-adjacent scenes for Re-ID over wider areas, where identities rarely cross camera views. In this work, we operate semi-supervised Re-ID under the relaxed assumption of identities rarely crossing camera views, which is still largely ignored in existing methods. When identities rarely cross camera views, the underlying sample relations across views become much more uncertain, which aggravates the noise accumulation problem in many advanced Re-ID methods that apply pseudo-labeling to associate visually similar samples. To quantify such uncertainty, we parameterize the probabilistic relations between samples in a relation discovery objective for pseudo-label training. Then, we introduce a reward, quantified by identification performance on a few labeled data, to guide the learning of dynamic relations between samples for reducing uncertainty. Our strategy is called Rewarded Relation Discovery (R2D), whose rewarded learning paradigm is under-explored in existing pseudo-labeling methods. To further reduce the uncertainty in sample relations, we learn multiple relation discovery objectives that discover probabilistic relations based on different prior knowledge of intra-camera affinity and cross-camera style variation, and fuse the complementary knowledge of these probabilistic relations by similarity distillation. To better evaluate semi-supervised Re-ID on identities rarely crossing camera views, we collect a new real-world dataset called REID-CBD, and perform simulation on benchmark datasets.
Experimental results show that our method outperforms a wide range of semi-supervised and unsupervised learning methods.

9.
IEEE Trans Image Process ; 32: 3806-3820, 2023.
Article in English | MEDLINE | ID: mdl-37418403

ABSTRACT

We are concerned with retrieving a query person from multiple videos captured by a non-overlapping camera network. Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network. To address this issue, we propose a pedestrian retrieval framework based on cross-camera trajectory generation that integrates both temporal and spatial information. To obtain pedestrian trajectories, we propose a novel cross-camera spatio-temporal model that integrates pedestrians' walking habits and the path layout between cameras to form a joint probability distribution. Such a cross-camera spatio-temporal model can be specified using sparsely sampled pedestrian data. Based on the spatio-temporal model, cross-camera trajectories can be extracted by the conditional random field model and further optimised by restricted non-negative matrix factorization. Finally, a trajectory re-ranking technique is proposed to improve the pedestrian retrieval results. To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset, the Person Trajectory Dataset, in real surveillance scenarios. Extensive experiments verify the effectiveness and robustness of the proposed method.
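The joint spatio-temporal model above combines appearance with camera-to-camera transit statistics; a minimal sketch, assuming a single Gaussian transit-time distribution per camera pair (all names and parameter values are hypothetical stand-ins for the paper's walking-habit/path-layout model):

```python
import math

def transit_likelihood(dt, mean, std):
    """Gaussian likelihood of a camera-to-camera transit time dt (seconds)."""
    z = (dt - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def joint_score(appearance_sim, dt, mean=60.0, std=15.0):
    # Joint probability factorised into appearance and spatio-temporal terms
    return appearance_sim * transit_likelihood(dt, mean, std)

# Same appearance similarity, but a plausible transit time scores higher
print(joint_score(0.8, 60.0) > joint_score(0.8, 300.0))  # True
```

Candidate matches with visually similar appearance but implausible transit times are thereby down-weighted before trajectory extraction.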

10.
Article in English | MEDLINE | ID: mdl-37030682

ABSTRACT

In this work, we investigate online multi-view learning according to the multi-view complementarity and consistency principles, so as to memorably process online multi-view data when fusing across views. Diverse online features, produced by different deep feature extractors under different views, are used as input to an online learning method that privately and memorably optimizes each view to discover and memorize view-specific information. More specifically, according to the multi-view complementarity principle, a softmax-weighted reducible (SWR) loss is proposed to selectively retain credible views and neglect unreliable ones in the online model's cross-view complementarity fusion. According to the multi-view consistency principle, we design a cross-view embedding consistency (CVEC) loss and a cross-view Kullback-Leibler (CVKL) divergence loss to maintain the cross-view consistency of the online model. Since the online multi-view learning setup must avoid repeatedly accessing online data to handle knowledge forgetting in each view, we propose a knowledge registration unit (KRU) based on dictionary learning to incrementally register the new view-specific knowledge of online unlabeled data into a learnable and adjustable dictionary. Finally, using the above strategies, we propose an online multi-view KRU approach and evaluate it with comprehensive experiments, showing its superiority in online multi-view learning.
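The softmax-weighted idea behind the SWR loss can be sketched as weighting per-view losses by softmax-normalized credibility scores (a toy illustration only; how the paper estimates credibility and reduces the loss is not reproduced here):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def swr_loss(view_losses, credibility):
    """Fuse per-view losses so that credible views dominate the objective."""
    weights = softmax(credibility)
    return sum(w * l for w, l in zip(weights, view_losses))

# View 0 is far more credible, so the fused loss stays close to its loss value
print(swr_loss([0.2, 1.0], credibility=[5.0, 0.0]))
```

With a credibility gap of 5, the weight on view 0 is about 0.993, so the fused loss lands near 0.205 rather than the 0.6 of a flat average.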

11.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 11120-11135, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37027255

ABSTRACT

Vision Transformer (ViT) has shown great potential for various visual tasks due to its ability to model long-range dependency. However, ViT requires a large amount of computing resources to compute the global self-attention. In this work, we propose a ladder self-attention block with multiple branches and a progressive shift mechanism to develop a light-weight transformer backbone that requires fewer computing resources (e.g., a relatively small number of parameters and FLOPs), termed Progressive Shift Ladder Transformer (PSLT). First, the ladder self-attention block reduces the computational cost by modelling local self-attention in each branch. Meanwhile, the progressive shift mechanism enlarges the receptive field in the ladder self-attention block by modelling diverse local self-attention for each branch and interacting among these branches. Second, the input feature of the ladder self-attention block is split equally along the channel dimension for each branch, which considerably reduces the computational cost of the ladder self-attention block (with nearly [Formula: see text] the amount of parameters and FLOPs), and the outputs of these branches are then combined by a pixel-adaptive fusion. Therefore, the ladder self-attention block with a relatively small number of parameters and FLOPs is capable of modelling long-range interactions. Based on the ladder self-attention block, PSLT performs well on several vision tasks, including image classification, object detection and person re-identification. On the ImageNet-1k dataset, PSLT achieves a top-1 accuracy of 79.9% with 9.2 M parameters and 1.9 G FLOPs, which is comparable to several existing models with more than 20 M parameters and 4 G FLOPs. Code is available at https://isee-ai.cn/wugaojie/PSLT.html.
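The channel-split saving can be checked with back-of-the-envelope arithmetic (projection layers only; the widths and the 4-branch count are hypothetical): each of B branches projects its own d/B-wide slice, so the Q/K/V projections cost B·3·(d/B)² = 3d²/B parameters, i.e. 1/B of the single-branch cost.

```python
def qkv_params(dim):
    """Parameters of the three dense Q, K, V projections at width `dim`
    (weights only, biases ignored)."""
    return 3 * dim * dim

def ladder_qkv_params(dim, branches):
    """Split channels equally across branches; each branch projects only its
    own dim/branches-wide slice, cutting projection parameters to 1/branches."""
    per_branch = dim // branches
    return branches * qkv_params(per_branch)

print(qkv_params(64))            # 12288
print(ladder_qkv_params(64, 4))  # 3072, i.e. 1/4 of the single-branch cost
```

The same 1/B factor applies to the projection FLOPs, which is one source of the reduction the abstract alludes to.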

12.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 489-507, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35130146

ABSTRACT

Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention during recent years because of their growing use in many popular applications, including life logging, health monitoring and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. Relation modeling is important for this task, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons for egocentric action recognition, and they require prior knowledge or auxiliary data to localize the interacting persons. In this work, we consider modeling the relations in a weakly supervised manner, i.e., without using annotations or prior knowledge about the interacting persons or objects, for egocentric action recognition. We form a weakly supervised framework by unifying automatic interactor localization and explicit relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a series of keypoints directly from video data to localize the action-relevant regions, using only action labels and some constraints on these keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections to model the complex relations in egocentric videos, such as the temporal, interactive, and contextual relations.
In particular, to reduce human efforts and manual interventions needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections by employing a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. We conduct extensive experiments on egocentric video datasets to illustrate the effectiveness of our method.


Subjects
Algorithms, Virtual Reality, Humans, Learning
13.
Article in English | MEDLINE | ID: mdl-37022228

ABSTRACT

Imbalanced training data in medical image diagnosis is a significant challenge for diagnosing rare diseases. To address it, we propose a novel two-stage Progressive Class-Center Triplet (PCCT) framework to overcome the class imbalance issue. In the first stage, PCCT designs a class-balanced triplet loss to coarsely separate the distributions of different classes. Triplets are sampled equally for each class at each training iteration, which alleviates the imbalanced-data issue and lays a solid foundation for the successive stage. In the second stage, PCCT further designs a class-center-involved triplet strategy to enable a more compact distribution for each class. The positive and negative samples in each triplet are replaced by their corresponding class centers, which promotes compact class representations and benefits training stability. The idea of the class-center-involved loss can be extended to the pair-wise ranking loss and the quadruplet loss, which demonstrates the generality of the proposed framework. Extensive experiments confirm that the PCCT framework works effectively for medical image classification with imbalanced training images. On four challenging class-imbalanced datasets (the skin datasets Skin7 and Skin198, the chest X-ray dataset ChestXray-COVID, and the eye dataset Kaggle EyePACs), the proposed approach obtains mean F1 scores of 86.20, 65.20, 91.32, and 87.18 over all classes and 81.40, 63.87, 82.62, and 79.09 for rare classes, achieving state-of-the-art performance and outperforming widely used methods for the class imbalance issue.
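The class-center-involved triplet strategy can be sketched as follows (toy 2-D embeddings and a hypothetical margin; the anchor is compared against class centers rather than individual samples):

```python
def center_triplet_loss(anchor, pos_center, neg_center, margin=1.0):
    """Hinge triplet loss with the positive/negative samples replaced by
    their class centers (squared Euclidean distances)."""
    d = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, d(anchor, pos_center) - d(anchor, neg_center) + margin)

anchor = [0.0, 0.0]
print(center_triplet_loss(anchor, [0.1, 0.0], [2.0, 0.0]))  # far negative: 0.0
print(center_triplet_loss(anchor, [1.0, 0.0], [1.0, 0.0]))  # tie: loss = margin
```

Using centers instead of individual samples makes each update pull the anchor toward a stable class prototype, which is what yields the compact per-class distributions described above.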

14.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7001-7018, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33079658

ABSTRACT

Learning to re-identify or retrieve a group of people across non-overlapped camera systems has important applications in video surveillance. However, most existing methods focus on (single) person re-identification (re-id), ignoring the fact that people often walk in groups in real scenarios. In this work, we take a step further and consider employing context information for identifying groups of people, i.e., group re-id. On the one hand, group re-id is more challenging than single person re-id, since it requires both a robust modeling of local individual person appearance (with different illumination conditions, pose/viewpoint variations, and occlusions), as well as full awareness of global group structures (with group layout and group member variations). On the other hand, we believe that person re-id can be greatly enhanced by incorporating additional visual context from neighboring group members, a task which we formulate as group-aware (single) person re-id. In this paper, we propose a novel unified framework based on graph neural networks to simultaneously address the above two group-based re-id tasks, i.e., group re-id and group-aware person re-id. Specifically, we construct a context graph with group members as its nodes to exploit dependencies among different people. A multi-level attention mechanism is developed to formulate both intra-group and inter-group context, with an additional self-attention module for robust graph-level representations by attentively aggregating node-level features. The proposed model can be directly generalized to tackle group-aware person re-id using node-level representations. Meanwhile, to facilitate the deployment of deep learning models on these tasks, we build a new group re-id dataset which contains more than 3.8K images with 1.5K annotated groups, an order of magnitude larger than existing group re-id datasets. 
Extensive experiments on the novel dataset as well as three existing datasets clearly demonstrate the effectiveness of the proposed framework for both group-based re-id tasks.

15.
Biomed Pharmacother ; 159: 114099, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36641923

ABSTRACT

Intervertebral disc degeneration (IVDD), a common cartilage-degenerative disease, is considered the main cause of low back pain (LBP). Owing to the complex aetiology and pathophysiology of IVDD, its molecular mechanisms remain unclear and no definitive treatment exists. As an evolutionarily and functionally conserved signalling pathway, Hippo-YAP/TAZ signalling plays a crucial role in IVDD progression. In this review, we discuss the regulation of Hippo-YAP/TAZ signalling and summarise recent research progress on its role in cartilage homeostasis and IVDD. We also discuss current applications and future prospects of IVDD treatments based on Hippo-YAP/TAZ signalling.


Subjects
Intervertebral Disc Degeneration, Intervertebral Disc, Humans, Hippo Signaling Pathway, Signal Transduction, Transcriptional Coactivator with PDZ-Binding Motif Proteins
16.
IEEE Trans Neural Netw Learn Syst ; 33(2): 774-788, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33493120

ABSTRACT

While feature learning by deep neural networks is currently in wide use, it remains very challenging when only a very limited quantity of labeled data is available. To solve this problem, we propose to unite subspace clustering with deep semisupervised feature learning in a unified framework that pursues feature learning through subspace clustering. More specifically, we develop a deep entropy-sparsity subspace clustering (deep ESSC) model, which forces a deep neural network to learn features through subspace clustering constrained by our designed entropy-sparsity scheme. The model inherently harmonizes deep semisupervised feature learning and subspace clustering through the proposed self-similarity preserving strategy. To optimize the deep ESSC model, we introduce two unconstrained variables that eliminate the two constraints via softmax functions, and provide a general algebraic-treatment scheme for solving the model. Extensive experiments with comprehensive analysis substantiate that our deep ESSC model is more effective than the related methods.

17.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 8779-8795, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34752383

ABSTRACT

Action assessment, the process of evaluating how well an action is performed, is an important task in human action analysis. Action assessment has seen considerable development based on visual cues; however, existing methods neglect to adaptively learn different architectures for different types of actions and are therefore limited in achieving high-performance assessment for each type of action. In fact, every type of action has specific evaluation criteria, and human experts train for years to correctly evaluate a single type of action. It is therefore difficult for a single assessment architecture to achieve high performance for all types of actions, yet manually designing an assessment architecture for each specific type of action is very difficult and impracticable. This work addresses this problem by adaptively designing different assessment architectures for different types of actions; the proposed approach is therefore called adaptive action assessment. To facilitate our adaptive action assessment by exploiting the specific joint interactions of each type of action, a set of graph-based joint relations is learned per action type by means of trainable joint relation graphs built according to the human skeleton structure, and the learned joint relation graphs can visually interpret the assessment process. In addition, we introduce a normalized mean squared error (N-MSE) loss and a Pearson loss, which perform automatic score normalization, for adaptive assessment training. Experiments on four action assessment benchmarks demonstrate the effectiveness and feasibility of the proposed method. We also demonstrate the visual interpretability of our model by visualizing the details of the assessment process.
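A Pearson loss of the kind mentioned above is commonly implemented as one minus the Pearson correlation between predicted and ground-truth scores, which makes it invariant to linear rescaling of the predictions (a generic sketch, not necessarily the authors' exact formulation):

```python
import math

def pearson_loss(pred, truth):
    """1 - Pearson correlation; zero when predictions are a positive
    linear transform of the ground-truth scores."""
    n = len(pred)
    mp = sum(pred) / n
    mt = sum(truth) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    return 1.0 - cov / (sp * st)

# A linear rescaling of the truth is still perfectly correlated
print(pearson_loss([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # ~0.0
```

This scale-invariance is what lets the loss act as an automatic score normalizer across action types with different score ranges.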


Subjects
Algorithms, Learning, Humans
18.
IEEE Trans Image Process ; 31: 3081-3094, 2022.
Article in English | MEDLINE | ID: mdl-35389866

ABSTRACT

Humans have the inherent ability to understand action intention, but it is an enormous challenge to train a machine to localize unintentional action in videos because reliable annotations for stable training are lacking. Annotations of unintentional action are unreliable because different annotators are affected by subjective appraisal and intrinsic ambiguity, which creates serious difficulties for training. To address this issue, we propose a probabilistic framework for unintentional action localization that models the uncertainty of annotations. Our framework consists of two main components: Temporal Label Aggregation (TLA) and Dense Probabilistic Localization (DPL). We first formulate each annotated failure moment as a temporal label distribution. Then the TLA component aggregates the temporal label distributions of different failure moments in an online manner and generates dense probabilistic supervision. Based on TLA, we further develop a DPL component to jointly train three heads (i.e., probabilistic dense classification, probabilistic temporal detection, and probabilistic regression) with different supervision granularities and make them highly collaborative. We evaluate our approach on OOPS, the largest unintentional action dataset, and demonstrate that it achieves significant improvement over the baseline and state-of-the-art methods.
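Formulating each annotated failure moment as a temporal label distribution and aggregating them can be sketched offline as an average of Gaussians (the paper's TLA performs this aggregation online; the sigma and annotation values here are hypothetical):

```python
import math

def gaussian(t, mu, sigma):
    """Density of a Gaussian temporal label distribution centred at mu."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def aggregate_labels(t, annotations, sigma=0.5):
    """Dense probabilistic supervision at time t: each annotator's failure
    moment becomes a Gaussian, and the distributions are averaged."""
    return sum(gaussian(t, mu, sigma) for mu in annotations) / len(annotations)

# Three annotators disagree slightly about when the failure starts
anns = [3.0, 3.2, 3.5]
print(aggregate_labels(3.2, anns) > aggregate_labels(5.0, anns))  # True
```

Rather than forcing a single hard failure frame, the aggregated density spreads supervision over the window where annotators plausibly disagree.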


Subjects
Statistical Models, Humans
19.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6074-6093, 2022 10.
Article in English | MEDLINE | ID: mdl-34048336

ABSTRACT

In conventional person re-identification (re-id), the images used for model training in the training probe set and training gallery set are all assumed to be instance-level samples, manually labeled from raw surveillance video (likely with the assistance of detection) in a frame-by-frame manner. Such labeling across multiple non-overlapping camera views is expensive and time-consuming. To overcome these issues, we consider a weakly supervised person re-id model that aims to find the raw video clips in which a given target person appears. In our weakly supervised setting, given a sample of a person captured in one camera view during training, our approach trains a re-id model without further instance-level labeling of this person in another camera view. The weak setting refers to matching a target person against an untrimmed gallery video where we know only that the identity appears somewhere in the video, without annotating the identity in any frame during training. Weakly supervised person re-id is challenging because it not only suffers from the difficulties of conventional person re-id (e.g., visual ambiguity and appearance variations caused by occlusions, pose variations, background clutter, etc.) but, more importantly, must also cope with weak supervision: the instance-level labels and the ground-truth locations of person instances (i.e., their bounding boxes) are absent. To solve this problem, we develop deep graph metric learning (DGML). On the one hand, DGML measures the consistency between intra-video spatial graphs of consecutive frames, where a spatial graph captures the neighborhood relationships among the detected person instances in each frame. On the other hand, DGML distinguishes the inter-video spatial graphs captured from different camera views at different sites simultaneously. To further embed weak supervision explicitly into DGML, we introduce weakly supervised regularization (WSR), which utilizes multiple weak video-level labels to learn discriminative features by means of a weak identity loss and a cross-video alignment loss. We conduct extensive experiments to demonstrate the feasibility of the weakly supervised person re-id approach and its special cases (e.g., its bag-to-bag extension) and show that the proposed DGML is effective.
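The intra-video spatial-graph consistency idea can be sketched minimally as below. This is an assumption-laden illustration, not DGML itself: here a graph is built directly from detection-box centers with a Gaussian kernel (the scale `tau` is invented), whereas the paper's graphs are learned on deep features, and consistency is reduced to a simple Frobenius distance between adjacency matrices of consecutive frames.

```python
import numpy as np

def spatial_graph(centers, tau=1.0):
    # Adjacency over detected person instances in one frame:
    # nearer detections are more strongly connected.
    c = np.asarray(centers, dtype=float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    a = np.exp(-d / tau)
    np.fill_diagonal(a, 0.0)
    return a

def graph_consistency(g1, g2):
    # Frobenius distance between the spatial graphs of two consecutive
    # frames; small values indicate temporally consistent structure.
    return float(np.linalg.norm(g1 - g2))
```

In this simplified view, penalizing `graph_consistency` between adjacent frames encourages the model to keep the relative layout of co-occurring people stable over time, which is the structural cue the intra-video graphs exploit.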


Assuntos
Identificação Biométrica , Algoritmos , Identificação Biométrica/métodos , Humanos
20.
Article in English | MEDLINE | ID: mdl-37015433

ABSTRACT

Occluded person re-identification (ReID) is a challenging task due to increased background noise and incomplete foreground information. Although existing human parsing-based ReID methods can tackle this problem with semantic alignment at the finest pixel level, their performance is heavily dependent on the human parsing model. Most supervised methods train an extra human parsing model alongside the ReID model using cross-domain human-part annotations, and thus suffer from expensive annotation costs and a domain gap; unsupervised methods integrate a feature clustering-based human parsing process into the ReID model, but the lack of supervision signals yields less satisfactory segmentation results. In this paper, we argue that information already present in the ReID training dataset can be used directly as supervision to train the human parsing model without any extra annotation. By integrating a weakly supervised human co-parsing network into the ReID network, we propose a novel framework, called Human Co-parsing Guided Alignment (HCGA), that exploits information shared across different images of the same pedestrian. Specifically, the human co-parsing network is weakly supervised by three consistency criteria: global semantics, local space, and background. By feeding the semantic information and deep features from the person ReID network into the guided alignment module, features of the foreground and of human parts can be obtained for effective occluded person ReID. Experimental results on two occluded and two holistic datasets demonstrate the superiority of our method. In particular, on Occluded-DukeMTMC it achieves 70.2% Rank-1 accuracy and 57.5% mAP.
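How part masks from a parsing network can guide feature alignment is sketched below. This is a generic illustration, not the HCGA module: the function name, tensor shapes, and the choice of mask-weighted average pooling are all assumptions added for exposition.

```python
import numpy as np

def part_aligned_features(feat, masks, eps=1e-8):
    # feat:  (C, H, W) feature map from the ReID backbone.
    # masks: (K, H, W) soft part masks produced by a parsing network.
    # Returns K part descriptors via mask-weighted average pooling, so
    # only foreground/part pixels contribute to each descriptor and
    # occluded or background regions are suppressed.
    C = feat.shape[0]
    K = masks.shape[0]
    out = np.zeros((K, C))
    for k in range(K):
        w = masks[k]
        out[k] = (feat * w).reshape(C, -1).sum(axis=1) / (w.sum() + eps)
    return out
```

Matching two pedestrians part-by-part on such descriptors, rather than on one global vector, is what makes pixel-level parsing useful under occlusion: a part whose mask is empty in one image can simply be skipped during distance computation.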
