Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 103
Filtrar
1.
Zhongguo Dang Dai Er Ke Za Zhi ; 25(11): 1107-1112, 2023 Nov 15.
Artigo em Zh | MEDLINE | ID: mdl-37990453

RESUMO

OBJECTIVES: To study the efficacy and safety of Xiyanping injection through intramuscular injection for the treatment of acute bronchitis in children. METHODS: A prospective study was conducted from December 2021 to October 2022, including 78 children with acute bronchitis from three hospitals using a multicenter, randomized, parallel-controlled design. The participants were divided into a test group (conventional treatment plus Xiyanping injection; n=36) and a control group (conventional treatment alone; n=37) in a 1:1 ratio. Xiyanping injection was administered at a dose of 0.3 mL/(kg·d) (total daily dose ≤8 mL), twice daily via intramuscular injection, with a treatment duration of ≤4 days and a follow-up period of 7 days. The treatment efficacy and safety were compared between the two groups. RESULTS: The total effective rate on the 3rd day after treatment in the test group was significantly higher than that in the control group (P<0.05), while there was no significant difference in the total effective rate on the 5th day between the two groups (P>0.05). The rates of fever relief, cough relief, and lung rale relief in the test group on the 3rd day after treatment were higher than those in the control group (P<0.05). The cough relief rate on the 5th day after treatment in the test group was higher than that in the control group (P<0.05), while there was no significant difference in the fever relief rate and lung rale relief rate between the two groups (P>0.05). The cough relief time, daily cough relief time, and nocturnal cough relief time in the test group were significantly shorter than those in the control group (P<0.05), while there were no significant differences in the fever duration and lung rale relief time between the two groups (P>0.05). There was no significant difference in the incidence of adverse events between the two groups (P>0.05). CONCLUSIONS: The overall efficacy of combined routine treatment with intramuscular injection of Xiyanping injection in the treatment of acute bronchitis in children is superior to that of routine treatment alone, without an increase in the incidence of adverse reactions.


Assuntos
Bronquite , Tosse , Humanos , Criança , Injeções Intramusculares , Tosse/tratamento farmacológico , Estudos Prospectivos , Sons Respiratórios , Bronquite/tratamento farmacológico , Resultado do Tratamento
2.
Int J Comput Vis ; 130(6): 1494-1525, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35465628

RESUMO

The goal of Facial Kinship Verification (FKV) is to automatically determine whether two individuals have a kin relationship or not from their given facial images or videos. It is an emerging and challenging problem that has attracted increasing attention due to its practical applications. Over the past decade, significant progress has been achieved in this new field. Handcrafted features and deep learning techniques have been widely studied in FKV. The goal of this paper is to conduct a comprehensive review of the problem of FKV. We cover different aspects of the research, including problem definition, challenges, applications, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. In retrospect of what has been achieved so far, we identify gaps in current research and discuss potential future research directions.

3.
Sensors (Basel) ; 18(7)2018 Jul 23.
Artigo em Inglês | MEDLINE | ID: mdl-30041441

RESUMO

In this paper, a novel imperceptible, fragile and blind watermark scheme is proposed for speech tampering detection and self-recovery. The embedded watermark data for content recovery is calculated from the original discrete cosine transform (DCT) coefficients of host speech. The watermark information is shared in a frames-group instead of stored in one frame. The scheme trades off between the data waste problem and the tampering coincidence problem. When a part of a watermarked speech signal is tampered with, one can accurately localize the tampered area, the watermark data in the area without any modification still can be extracted. Then, a compressive sensing technique is employed to retrieve the coefficients by exploiting the sparseness in the DCT domain. The smaller the tampered the area, the better quality of the recovered signal is. Experimental results show that the watermarked signal is imperceptible, and the recovered signal is intelligible for high tampering rates of up to 47.6%. A deep learning-based enhancement method is also proposed and implemented to increase the SNR of recovered speech signal.

4.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1093-1108, 2024 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-37930909

RESUMO

Image restoration aims to reconstruct the latent sharp image from its corrupted counterpart. Besides dealing with this long-standing task in the spatial domain, a few approaches seek solutions in the frequency domain by considering the large discrepancy between spectra of sharp/degraded image pairs. However, these algorithms commonly utilize transformation tools, e.g., wavelet transform, to split features into several frequency parts, which is not flexible enough to select the most informative frequency component to recover. In this paper, we exploit a multi-branch and content-aware module to decompose features into separate frequency subbands dynamically and locally, and then accentuate the useful ones via channel-wise attention weights. In addition, to handle large-scale degradation blurs, we propose an extremely simple decoupling and modulation module to enlarge the receptive field via global and window-based average pooling. Furthermore, we merge the paradigm of multi-stage networks into a single U-shaped network to pursue multi-scale receptive fields and improve efficiency. Finally, integrating the above designs into a convolutional backbone, the proposed Frequency Selection Network (FSNet) performs favorably against state-of-the-art algorithms on 20 different benchmark datasets for 6 representative image restoration tasks, including single-image defocus deblurring, image dehazing, image motion deblurring, image desnowing, image deraining, and image denoising.

5.
Artigo em Inglês | MEDLINE | ID: mdl-38917284

RESUMO

Image restoration aims to reconstruct a high-quality image from its corrupted version, playing essential roles in many scenarios. Recent years have witnessed a paradigm shift in image restoration from convolutional neural networks (CNNs) to Transformerbased models due to their powerful ability to model long-range pixel interactions. In this paper, we explore the potential of CNNs for image restoration and show that the proposed simple convolutional network architecture, termed ConvIR, can perform on par with or better than the Transformer counterparts. By re-examing the characteristics of advanced image restoration algorithms, we discover several key factors leading to the performance improvement of restoration models. This motivates us to develop a novel network for image restoration based on cheap convolution operators. Comprehensive experiments demonstrate that our ConvIR delivers state-ofthe- art performance with low computation complexity among 20 benchmark datasets on five representative image restoration tasks, including image dehazing, image motion/defocus deblurring, image deraining, and image desnowing.

6.
IEEE Trans Image Process ; 33: 5921-5935, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-39405143

RESUMO

Scene text editing aims to replace the source text with the target text while preserving the original background. Its practical applications span various domains, such as data generation and privacy protection, highlighting its increasing importance in recent years. In this study, we propose a novel Scene Text Editing network with Explicitly-decoupled text transfer and Minimized background reconstruction, called STEEM. Unlike existing methods that usually fuse text style, text content, and background, our approach focuses on decoupling text style and content from the background and utilizes the minimized background reconstruction to reduce the impact of text replacement on the background. Specifically, the text-background separation module predicts the text mask of the scene text image, separating the source text from the background. Subsequently, the style-guided text transfer decoding module transfers the geometric and stylistic attributes of the source text to the content text, resulting in the target text. Next, the background and target text are combined to determine the minimal reconstruction area. Finally, the context-focused background reconstruction module is applied to the reconstruction area, producing the editing result. Furthermore, to ensure stable joint optimization of the four modules, a task-adaptive training optimization strategy has been devised. Experimental evaluations conducted on two popular datasets demonstrate the effectiveness of our approach. STEEM outperforms state-of-the-art methods, as evidenced by a reduction in the FID index from 29.48 to 24.67 and an increase in text recognition accuracy from 76.8% to 78.8%.

7.
IEEE Trans Image Process ; 33: 5936-5948, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-39405142

RESUMO

Camouflaged object detection (COD) aims to identify the objects that seamlessly blend into the surrounding backgrounds. Due to the intrinsic similarity between the camouflaged objects and the background region, it is extremely challenging to precisely distinguish the camouflaged objects by existing approaches. In this paper, we propose a hierarchical graph interaction network termed HGINet for camouflaged object detection, which is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features. Specifically, we first design a region-aware token focusing attention (RTFA) with dynamic token clustering to excavate the potentially distinguishable tokens in the local region. Afterwards, a hierarchical graph interaction transformer (HGIT) is proposed to construct bi-directional aligned communication between hierarchical features in the latent interaction space for visual semantics enhancement. Furthermore, we propose a decoder network with confidence aggregated feature fusion (CAFF) modules, which progressively fuses the hierarchical interacted features to refine the local detail in ambiguous regions. Extensive experiments conducted on the prevalent datasets, i.e. COD10K, CAMO, NC4K and CHAMELEON demonstrate the superior performance of HGINet compared to existing state-of-the-art methods. Our code is available at https://github.com/Garyson1204/HGINet.

8.
IEEE Trans Image Process ; 33: 6045-6056, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-39356599

RESUMO

Inertial measurement units (IMU) in the capturing device can record the motion information of the device, with gyroscopes measuring angular velocity and accelerometers measuring acceleration. However, conventional deblurring methods seldom incorporate IMU data, and existing approaches that utilize IMU information often face challenges in fully leveraging this valuable data, resulting in noise issues from the sensors. To address these issues, in this paper, we propose a multi-stage deblurring network named INformer, which combines inertial information with the Transformer architecture. Specifically, we design an IMU-image Attention Fusion (IAF) block to merge motion information derived from inertial measurements with blurry image features at the attention level. Furthermore, we introduce an Inertial-Guided Deformable Attention (IGDA) block for utilizing the motion information features as guidance to adaptively adjust the receptive field, which can further refine the corresponding blur kernel for pixels. Extensive experiments on comprehensive benchmarks demonstrate that our proposed method performs favorably against state-of-the-art deblurring approaches.

9.
IEEE Trans Image Process ; 33: 5837-5848, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-39383089

RESUMO

Object detection methods have achieved remarkable performances when the training and testing data satisfy the assumption of i.i.d. However, the training and testing data may be collected from different domains, and the gap between the domains can significantly degrade the detectors. Test Time Adaptive Object Detection (TTA-OD) is a novel online approach that aims to adapt detectors quickly and make predictions during the testing procedure. TTA-OD is more realistic than the existing unsupervised domain adaptation and source-free unsupervised domain adaptation approaches. For example, self-driving cars need to improve their perception of new environments in the TTA-OD paradigm during driving. To address this, we propose a multi-level feature alignment (MLFA) method for TTA-OD, which is able to adapt the model online based on the steaming target domain data. For a more straightforward adaptation, we select informative foreground and background features from image feature maps and capture their distributions using probabilistic models. Our approach includes: i) global-level feature alignment to align all informative feature distributions, thereby encouraging detectors to extract domain-invariant features, and ii) cluster-level feature alignment to match feature distributions for each category cluster across different domains. Through the multi-level alignment, we can prompt detectors to extract domain-invariant features, as well as align the category-specific components of image features from distinct domains. We conduct extensive experiments to verify the effectiveness of our proposed method. Our code is accessible at https://github.com/yaboliudotug/MLFA.

10.
IEEE Trans Pattern Anal Mach Intell ; 46(10): 6713-6730, 2024 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-38598383

RESUMO

A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm considers 1) improving the representation robustness by adding noisy samples in training phase (i.e., data augmentation) or 2) pre-processing the noisy image by learning to solve the inverse problem (i.e., image denoising). However, such methods generally exhibit inefficient process and unstable result, limiting their practical applications. In this paper, we explore a non-learning paradigm that aims to derive robust representation directly from noisy images, without the denoising as pre-processing. Here, the noise-robust representation is designed as Fractional-order Moments in Radon space (FMR), with also beneficial properties of orthogonality and rotation invariance. Unlike earlier integer-order methods, our work is a more generic design taking such classical methods as special cases, and the introduced fractional-order parameter offers time-frequency analysis capability that is not available in classical methods. Formally, both implicit and explicit paths for constructing the FMR are discussed in detail. Extensive simulation experiments and robust visual applications are provided to demonstrate the uniqueness and usefulness of our FMR, especially for noise robustness, rotation invariance, and time-frequency discriminability.

11.
Artigo em Inglês | MEDLINE | ID: mdl-38861429

RESUMO

Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), with the hope of considering the commonly ignored minority interest of the user. The key idea behind DPCML is to introduce a set of multiple representations for each user in the system where users' preference toward an item is aggregated by taking the minimum item-user distance among their embedding set. Specifically, we instantiate two effective assignment strategies to explore a proper quantity of vectors for each user. Meanwhile, a Diversity Control Regularization Scheme (DCRS) is developed to accommodate the multi-vector representation strategy better. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we notice that CML-based approaches usually require negative sampling to reduce the heavy computational burden caused by the pairwise objective therein. In this paper, we reveal the fundamental limitation of the widely adopted hard-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML.

12.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5541-5555, 2024 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-38412089

RESUMO

Optical aberration is a ubiquitous degeneration in realistic lens-based imaging systems. Optical aberrations are caused by the differences in the optical path length when light travels through different regions of the camera lens with different incident angles. The blur and chromatic aberrations manifest significant discrepancies when the optical system changes. This work designs a transferable and effective image simulation system of simple lenses via multi-wavelength, depth-aware, spatially-variant four-dimensional point spread functions (4D-PSFs) estimation by changing a small amount of lens-dependent parameters. The image simulation system can alleviate the overhead of dataset collecting and exploiting the principle of computational imaging for effective optical aberration correction. With the guidance of domain knowledge about the image formation model provided by the 4D-PSFs, we establish a multi-scale optical aberration correction network for degraded image reconstruction, which consists of a scene depth estimation branch and an image restoration branch. Specifically, we propose to predict adaptive filters with the depth-aware PSFs and carry out dynamic convolutions, which facilitate the model's generalization in various scenes. We also employ convolution and self-attention mechanisms for global and local feature extraction and realize a spatially-variant restoration. The multi-scale feature extraction complements the features across different scales and provides fine details and contextual features. Extensive experiments demonstrate that our proposed algorithm performs favorably against state-of-the-art restoration methods.

13.
Artigo em Inglês | MEDLINE | ID: mdl-38896521

RESUMO

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.

14.
IEEE Trans Pattern Anal Mach Intell ; 46(9): 6367-6383, 2024 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-38530739

RESUMO

Fast adversarial training (FAT) is an efficient method to improve robustness in white-box attack scenarios. However, the original FAT suffers from catastrophic overfitting, which dramatically and suddenly reduces robustness after a few training epochs. Although various FAT variants have been proposed to prevent overfitting, they require high training time. In this paper, we investigate the relationship between adversarial example quality and catastrophic overfitting by comparing the training processes of standard adversarial training and FAT. We find that catastrophic overfitting occurs when the attack success rate of adversarial examples becomes worse. Based on this observation, we propose a positive prior-guided adversarial initialization to prevent overfitting by improving adversarial example quality without extra training time. This initialization is generated by using high-quality adversarial perturbations from the historical training process. We provide theoretical analysis for the proposed initialization and propose a prior-guided regularization method that boosts the smoothness of the loss function. Additionally, we design a prior-guided ensemble FAT method that averages the different model weights of historical models using different decay rates. Our proposed method, called FGSM-PGK, assembles the prior-guided knowledge, i.e., the prior-guided initialization and model weights, acquired during the historical training process. The proposed method can effectively improve the model's adversarial robustness in white-box attack scenarios. Evaluations of four datasets demonstrate the superiority of the proposed method.

15.
Artigo em Inglês | MEDLINE | ID: mdl-37022900

RESUMO

Most multi-exposure image fusion (MEF) methods perform unidirectional alignment within limited and local regions, which ignore the effects of augmented locations and preserve deficient global features. In this work, we propose a multi-scale bidirectional alignment network via deformable self-attention to perform adaptive image fusion. The proposed network exploits differently exposed images and aligns them to the normal exposure in varying degrees. Specifically, we design a novel deformable self-attention module that considers variant long-distance attention and interaction and implements the bidirectional alignment for image fusion. To realize adaptive feature alignment, we employ a learnable weighted summation of different inputs and predict the offsets in the deformable self-attention module, which facilitates that the model generalizes well in various scenes. In addition, the multi-scale feature extraction strategy makes the features across different scales complementary and provides fine details and contextual features. Extensive experiments demonstrate that our proposed algorithm performs favorably against state-of-the-art MEF methods.

16.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7668-7685, 2023 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-37819793

RESUMO

Nowadays, machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. The popularity of these methods also makes them widely exposed to malicious attacks, which may cause severe security concerns. To understand the security properties of the ML/DL methods, researchers have recently started to turn their focus to adversarial attack algorithms that could successfully corrupt the model or clean data owned by the victim with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, where the attacker expects to corrupt an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior art along this direction adopts combinatorial optimization problems, leading to limited scalability toward deep learning models. To this end, we propose a novel minimax problem which provides an efficient reformulation of the sample selection process in LFA. In the new optimization problem, the sample selection operation could be implemented with a single thresholding parameter. This leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, we can apply Sample Thresholding to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisonous attack setting, we have to employ surrogate models to simulate the true model employed by the victim model. Seeing the problem, we provide a theoretical analysis of such a surrogate paradigm. Specifically, we show that the performance gap between the true model employed by the victim and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from the annotators are vulnerable to adversarial attacks. Finally, experimental analyses on three real-world datasets speak to the efficacy of our method.

17.
IEEE Trans Image Process ; 32: 5465-5477, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-37773909

RESUMO

Context modeling or multi-level feature fusion methods have been proved to be effective in improving semantic segmentation performance. However, they are not specialized to deal with the problems of pixel-context mismatch and spatial feature misalignment, and the high computational complexity hinders their widespread application in real-time scenarios. In this work, we propose a lightweight Context and Spatial Feature Calibration Network (CSFCN) to address the above issues with pooling-based and sampling-based attention mechanisms. CSFCN contains two core modules: Context Feature Calibration (CFC) module and Spatial Feature Calibration (SFC) module. CFC adopts a cascaded pyramid pooling module to efficiently capture nested contexts, and then aggregates private contexts for each pixel based on pixel-context similarity to realize context feature calibration. SFC splits features into multiple groups of sub-features along the channel dimension and propagates sub-features therein by the learnable sampling to achieve spatial feature calibration. Extensive experiments on the Cityscapes and CamVid datasets illustrate that our method achieves a state-of-the-art trade-off between speed and accuracy. Concretely, our method achieves 78.7% mIoU with 70.0 FPS and 77.8% mIoU with 179.2 FPS on the Cityscapes and CamVid test sets, respectively. The code is available at https://nave.vr3i.com/ and https://github.com/kaigelee/CSFCN.

18.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4090-4108, 2023 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-35834468

RESUMO

Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to numerous research work on the computational and statistical characteristics. Driven by huge profit, the potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of the rank aggregation methods is not well studied in the literature. To fully understand the possible risks, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data in this paper. From the perspective of the dynamical system, the attack behavior with a target ranking list is a fixed point belonging to the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators while Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary masters the complete information. It is noteworthy that the proposed methods allow the adversary only to hold incomplete information or imperfect feedback and perform the purposeful attack. The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods could achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the designated one by the adversary.

19.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 5053-5069, 2023 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-35981065

RESUMO

Top- k error has become a popular metric for large-scale classification benchmarks due to the inevitable semantic ambiguity among classes. Existing literature on top- k optimization generally focuses on the optimization method of the top- k objective, while ignoring the limitations of the metric itself. In this paper, we point out that the top- k objective lacks enough discrimination such that the induced predictions may give a totally irrelevant label a top rank. To fix this issue, we develop a novel metric named partial Area Under the top- k Curve (AUTKC). Theoretical analysis shows that AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top- K ranking with respect to the conditional probability. This shows that AUTKC does not allow irrelevant labels to appear in the top list. Furthermore, we present an empirical surrogate risk minimization framework to optimize the proposed metric. Theoretically, we present (1) a sufficient condition for Fisher consistency of the Bayes optimal score function; (2) a generalization upper bound which is insensitive to the number of classes under a simple hyperparameter setting. Finally, the experimental results on four benchmark datasets validate the effectiveness of our proposed framework.

20.
IEEE Trans Image Process ; 32: 2468-2480, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-37115831

RESUMO

Human-object relationship detection reveals the fine-grained relationship between humans and objects, helping the comprehensive understanding of videos. Previous human-object relationship detection approaches are mainly developed with object features and relation features without exploring the specific information of humans. In this paper, we propose a novel Relation-Pose Transformer (RPT) for human-object relationship detection. Inspired by the coordination of eye-head-body movements in cognitive science, we employ the head pose to find those crucial objects that humans focus on and use the body pose with skeleton information to represent multiple actions. Then, we utilize the spatial encoder to capture spatial contextualized information of the relation pair, which integrates the relation features and pose features. Next, the temporal decoder aims to model the temporal dependency of the relationship. Finally, we adopt multiple classifiers to predict different types of relationships. Extensive experiments on the benchmark Action Genome validate the effectiveness of our proposed method and show the state-of-the-art performance compared with related methods.


Assuntos
Cognição , Apego ao Objeto , Humanos , Benchmarking , Movimentos da Cabeça , Esqueleto
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA