Results 1 - 20 of 60
1.
J Sports Sci ; 41(19): 1779-1786, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38155177

ABSTRACT

This study examined how reliably expert tennis coaches/biomechanists can qualitatively assess selected features of the serve with the aid of two-dimensional (2D) video replays. Two expert high-performance coaches rated the serves of 150 male and 150 female players across three different age groups from two different camera viewing angles. Serve performance was rated across 13 variables that represented commonly investigated and coached (serve) mechanics using a 1-7 Likert rating scale. A total of 7800 ratings were performed. The reliability of the experts' ratings was assessed using Krippendorff's alpha. Strong agreement was shown across all age groups and genders when the experts rated the overall serve score (0.727-0.924), power or speed of the serve (0.720-0.907), rhythm (0.744-0.944), quality of the trunk action (0.775-1.000), leg drive (0.731-0.959) and the likelihood of back injury (0.703-0.934). They encountered greater difficulty in consistently rating shoulder internal rotation speed (0.688-0.717). In high-performance settings, the desire for highly precise measurement and large data sets powered by new technologies is commonplace, but this study revealed that tennis experts, through the use of 2D video, can reliably rate important mechanical features of the game's most important shot, the serve.


Subjects
Tennis , Humans , Male , Female , Biomechanical Phenomena , Reproducibility of Results , Upper Extremity , Shoulder
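
The agreement statistic named in the abstract above, Krippendorff's alpha, can be illustrated with a minimal sketch. It assumes exactly two raters, no missing ratings, and the interval (squared-difference) distance; the abstract does not say which distance metric or software the study used, and the ratings below are made up for illustration.

```python
from itertools import permutations


def krippendorff_alpha_interval(rater_a, rater_b):
    """Krippendorff's alpha for two raters, complete data, interval metric.

    alpha = 1 - D_o / D_e, where D_o is the mean squared disagreement
    within units and D_e is the mean squared difference between all
    ordered pairs of pooled ratings.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need two equally long rating lists with >= 2 units")

    # Observed disagreement: squared difference within each rated serve.
    d_o = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b)) / len(rater_a)

    # Expected disagreement: squared difference over all ordered pairs of
    # pooled ratings, regardless of which serve they came from.
    pooled = list(rater_a) + list(rater_b)
    n = len(pooled)
    d_e = sum((u - v) ** 2 for u, v in permutations(pooled, 2)) / (n * (n - 1))

    if d_e == 0:
        raise ValueError("all ratings identical; alpha is undefined")
    return 1.0 - d_o / d_e


# Hypothetical 1-7 Likert ratings of four serves by two coaches.
coach_1 = [1, 2, 3, 4]
coach_2 = [2, 1, 4, 3]
alpha = krippendorff_alpha_interval(coach_1, coach_2)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; Likert ratings are often treated as ordinal, which would call for a different distance function than the one assumed here.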
2.
Sensors (Basel) ; 23(10)2023 May 14.
Article in English | MEDLINE | ID: mdl-37430660

ABSTRACT

Smart metering systems (SMSs) have been widely used by industrial users and residential customers for purposes such as real-time tracking, outage notification, quality monitoring, load forecasting, etc. However, the consumption data they generate can violate customers' privacy through absence detection or behavior recognition. Homomorphic encryption (HE) has emerged as one of the most promising methods to protect data privacy based on its security guarantees and computability over encrypted data. However, SMSs have various application scenarios in practice. Consequently, we used the concept of trust boundaries to help design HE solutions for privacy protection under these different scenarios of SMSs. This paper proposes a privacy-preserving framework as a systematic privacy protection solution for SMSs by implementing HE with trust boundaries for various SMS scenarios. To show the feasibility of the proposed HE framework, we evaluated its performance on two computation metrics, summation and variance, which are often used for billing, usage predictions, and other related tasks. The security parameter set was chosen to provide a security level of 128 bits. In terms of performance, the aforementioned metrics could be computed in 58,235 ms for summation and 127,423 ms for variance, given a sample size of 100 households. These results indicate that the proposed HE framework can protect customer privacy under varying trust boundary scenarios in SMS. The computational overhead is acceptable from a cost-benefit perspective while ensuring data privacy.
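
To make the idea of computing a sum and a variance over encrypted readings concrete, here is a toy sketch. The abstract does not name the HE scheme, so this swaps in textbook Paillier encryption (additively homomorphic) with tiny, insecure primes. Having each household also encrypt its squared reading is an assumption made for illustration, not the paper's protocol, and all function names and readings are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair; these primes are far too small to be secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer 0 <= m < n as (1 + n)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n, ciphertexts):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    total = 1
    for c in ciphertexts:
        total = total * c % (n * n)
    return total


# Hypothetical integer meter readings from three households.
readings = [3, 5, 7]
n, private_key = keygen()

# Each household encrypts its reading and its squared reading.
enc_x = [encrypt(n, x) for x in readings]
enc_x2 = [encrypt(n, x * x) for x in readings]

# The aggregator combines ciphertexts without seeing any reading.
enc_sum = add_encrypted(n, enc_x)
enc_sum_sq = add_encrypted(n, enc_x2)

# Only the key holder decrypts the two aggregates.
total = decrypt(n, private_key, enc_sum)
total_sq = decrypt(n, private_key, enc_sum_sq)
mean = total / len(readings)
variance = total_sq / len(readings) - mean ** 2
```

The key holder learns only the two aggregates, from which the population variance follows as E[x²] − E[x]². Where the aggregator and key holder sit relative to the customer is the kind of question the trust-boundary analysis in the abstract addresses.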

3.
Am J Orthod Dentofacial Orthop ; 163(3): 357-367.e3, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36503861

ABSTRACT

INTRODUCTION: Recent 3-dimensional technology advancements have resulted in new techniques to improve the accuracy of intraoperative transfer. This study aimed to validate the accuracy of computer-aided design and manufacturing (CAD-CAM) customized surgical cutting guides and fixation plates on mandibular repositioning surgery performed in isolation or combined with simultaneous maxillary repositioning surgery. METHODS: Sixty patients who underwent mandibular advancement surgery by the same surgeon were retrospectively evaluated by 3-dimensional surface-based superimposition. A 3-point coordinate system (x, y, z) was used to identify the linear and angular discrepancies between the planned movements and actual outcomes. The Wilcoxon rank sum test was used to compare the outcomes between the mandible-only and the bimaxillary surgery groups, with significance at P <0.05. The Pearson correlation coefficient was used to compare the planned mandibular advancement with the actual outcome. The centroid, which represents the mandible as a single unit, was computed from 3 landmarks, and the discrepancies were evaluated by the root mean square error (RMSE), with clinical significance set at 2 mm for linear discrepancies and 4° for angular discrepancies. RESULTS: There was no statistically significant difference between the planned and actual position of the mandible in either group when considering absolute values of the differences. When considering raw directional data, a statistically significant difference was identified in the y-axis, suggesting a tendency for under-advancement of the mandible in the bimaxillary group. The largest translational RMSE for the centroid was 0.77 mm in the sagittal dimension for the bimaxillary surgery group. The largest rotational RMSE for the centroid was 1.25° in the transverse dimension for the bimaxillary surgery group. Our results show that the precision and clinical feasibility of CAD-CAM customized surgical cutting guides and fixation plates on mandibular repositioning surgery are well within clinically acceptable parameters. CONCLUSION: Mandibular repositioning surgery can be performed predictably and accurately with the aid of CAD-CAM customized surgical cutting guides and fixation plates with or without maxillary surgery.


Subjects
Orthognathic Surgical Procedures , Surgery, Computer-Assisted , Humans , Retrospective Studies , Surgery, Computer-Assisted/methods , Imaging, Three-Dimensional , Orthognathic Surgical Procedures/methods , Computer-Aided Design
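
The centroid-based RMSE check described in the abstract above might look roughly like the following sketch. The landmark coordinates are invented, the helper names are hypothetical, and only the 2 mm linear threshold is taken from the abstract.

```python
import math


def centroid(landmarks):
    """Mean (x, y, z) of a list of 3-D landmark coordinates."""
    count = len(landmarks)
    return tuple(sum(p[i] for p in landmarks) / count for i in range(3))


def rmse(errors):
    """Root mean square of a list of signed discrepancies."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical planned vs. achieved landmark positions (mm) for two patients.
patients = [
    {"planned": [(0, 0, 0), (10, 0, 0), (5, 8, 0)],
     "actual":  [(0, 1, 0), (10, 1, 0), (5, 9, 0)]},
    {"planned": [(0, 0, 0), (10, 0, 0), (5, 8, 0)],
     "actual":  [(0, -1, 0), (10, -1, 0), (5, 7, 0)]},
]

LINEAR_THRESHOLD_MM = 2.0  # clinical significance threshold from the abstract

# Signed y-axis discrepancy of the centroid for each patient.
y_errors = []
for patient in patients:
    planned_c = centroid(patient["planned"])
    actual_c = centroid(patient["actual"])
    y_errors.append(actual_c[1] - planned_c[1])

y_rmse = rmse(y_errors)
within_threshold = y_rmse < LINEAR_THRESHOLD_MM
```

In this made-up data the signed errors (+1 mm and −1 mm) average to zero while the RMSE is 1 mm, which is one reason signed (directional) and magnitude-based summaries of the same discrepancies can lead to different conclusions.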
4.
Proc Biol Sci ; 289(1971): 20220143, 2022 03 30.
Article in English | MEDLINE | ID: mdl-35317674

ABSTRACT

The broad autism phenotype commonly refers to sub-clinical levels of autistic-like behaviour and cognition presented in biological relatives of autistic people. In a recent study, we reported findings suggesting that the broad autism phenotype may also be expressed in facial morphology, specifically increased facial masculinity. Increased facial masculinity has been reported among autistic children, as well as their non-autistic siblings. The present study builds on our previous findings by investigating the presence of increased facial masculinity among non-autistic parents of autistic children. Using a previously established method, a 'facial masculinity score' and several facial distances were calculated for each three-dimensional facial image of 192 parents of autistic children (58 males, 134 females) and 163 age-matched parents of non-autistic children (50 males, 113 females). While controlling for facial area and age, significantly higher masculinity scores and larger (more masculine) facial distances were observed in parents of autistic children relative to the comparison group, with effect sizes ranging from small to medium (0.16 ≤ d ≤ 0.41), regardless of sex. These findings add to an accumulating evidence base that the broad autism phenotype is expressed in physical characteristics and suggest that both maternal and paternal pathways are implicated in masculinized facial morphology.


Subjects
Autistic Disorder , Face/anatomy & histology , Fathers , Female , Humans , Male , Masculinity , Phenotype
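
The effect sizes quoted in the abstract above are Cohen's d values. As a rough sketch, d can be computed from two groups of scores with the pooled sample standard deviation. The scores below are invented, and the covariate adjustment for facial area and age used in the study is not reproduced here.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical masculinity scores for two small groups of parents.
parents_of_autistic_children = [2, 4, 6]
comparison_parents = [1, 3, 5]
d = cohens_d(parents_of_autistic_children, comparison_parents)
```

By the usual convention, d near 0.2 is read as small and d near 0.5 as medium, which matches the abstract's description of its 0.16 to 0.41 range.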
5.
Sensors (Basel) ; 21(13)2021 Jul 01.
Article in English | MEDLINE | ID: mdl-34283080

ABSTRACT

The application of artificial intelligence techniques to wearable sensor data may facilitate accurate analysis outside of controlled laboratory settings, the holy grail for gait clinicians and sports scientists looking to bridge the lab-to-field divide. Using these techniques, parameters that are difficult to measure directly in the wild may be predicted using surrogate lower-resolution inputs. One example is the prediction of joint kinematics and kinetics based on inputs from inertial measurement unit (IMU) sensors. Despite increased research, there is a paucity of information examining the most suitable artificial neural network (ANN) for predicting gait kinematics and kinetics from IMUs. This paper compares the performance of three commonly employed ANNs used to predict gait kinematics and kinetics: multilayer perceptron (MLP); long short-term memory (LSTM); and convolutional neural networks (CNN). Overall, high correlations between ground truth and predicted kinematic and kinetic data were found across all investigated ANNs. However, the optimal ANN should be chosen based on the prediction task and the intended use-case application. For the prediction of joint angles, CNNs appear favourable; however, these ANNs do not show an advantage over an MLP network for the prediction of joint moments. If real-time joint angle and joint moment prediction is desirable, an LSTM network should be utilised.


Subjects
Artificial Intelligence , Neural Networks, Computer , Biomechanical Phenomena , Gait , Kinetics
6.
Sensors (Basel) ; 21(7)2021 Mar 26.
Article in English | MEDLINE | ID: mdl-33810604

ABSTRACT

Conventional methods of uniformly spraying fields to combat weeds require large herbicide inputs at significant cost, with impacts on the environment. More focused weed control methods such as site-specific weed management (SSWM) have become popular but require methods to identify weed locations. Advances in technology allow automated methods, such as drones but also ground-based sensors, to be used for detecting and mapping weeds. In this study, the capability of Light Detection and Ranging (LiDAR) sensors to detect and locate weeds was assessed. For this purpose, two trials were performed using artificial targets (representing weeds) at different heights and diameters to understand the detection limits of a LiDAR. The results showed that the detectability of the target at different scanning distances from the LiDAR was directly influenced by the size of the target and its orientation toward the LiDAR. A third trial was performed in a wheat plot where the LiDAR was used to scan different weed species at various heights above the crop canopy, to verify the capacity of the stationary LiDAR to detect weeds in a field situation. The results showed that 100% of weeds in the wheat plot were detected by the LiDAR, based on their height differences with the crop canopy.

7.
Sensors (Basel) ; 20(22)2020 Nov 20.
Article in English | MEDLINE | ID: mdl-33233568

ABSTRACT

Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods have resorted to adding Gaussian blur to focused images to simulate defocus, thereby generating data (with ground truth) for supervised learning. Moreover, they classify pixels as 'focused' or 'defocused', and use the classified results to construct the fusion weight maps. This then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The suggested approach uses a CNN architecture trained to perform fusion, without the need for ground-truth fused images. The CNN exploits the image structural similarity (SSIM) to calculate the loss, a metric that is widely accepted for fused image quality evaluation. In addition, when designing the loss function, we use the standard deviation of a local window of the image to automatically estimate the importance of the source images in the final fused image. Our network can accept images of variable sizes and hence we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
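
As a rough illustration of an SSIM-based fusion loss weighted by standard deviation, consider the sketch below. It is heavily simplified: the abstract describes local windows inside a CNN training loop, whereas this treats each patch as a single window, and the normalized weighting rule is an assumption rather than the paper's formula. Patches are flat lists of intensities in [0, 1].

```python
from statistics import fmean, pstdev


def ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM of two equally sized patches, given as flat lists in [0, 1]."""
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def fusion_loss(fused, source_a, source_b):
    """1 - weighted SSIM; the source with more local contrast weighs more.

    The normalized standard-deviation weights are an illustrative assumption.
    """
    std_a, std_b = pstdev(source_a), pstdev(source_b)
    total = std_a + std_b
    weight_a = std_a / total if total > 0 else 0.5
    weight_b = 1.0 - weight_a
    return 1.0 - (weight_a * ssim(fused, source_a) +
                  weight_b * ssim(fused, source_b))


# Hypothetical patches: one source is in focus here, the other is defocused.
sharp = [0.0, 1.0, 0.0, 1.0]
blurred = [0.5, 0.5, 0.5, 0.5]

loss_if_output_is_sharp = fusion_loss(sharp, sharp, blurred)
loss_if_output_is_blurred = fusion_loss(blurred, sharp, blurred)
```

Because the defocused patch has no contrast, all the weight goes to the sharp source, so a fused output that copies the sharp patch gets a loss near zero. No ground-truth fused image is needed, which is the point the abstract makes.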

8.
Sensors (Basel) ; 20(23)2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33291759

ABSTRACT

Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between the labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method.

9.
Am J Orthod Dentofacial Orthop ; 158(1): 134-146, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32414548

ABSTRACT

INTRODUCTION: It is considered normal for facial structures to exhibit mild asymmetry between left and right sides. An automated, landmark-independent method was developed to accurately assess and quantify facial asymmetry in 3 planes of space, describe the midline deviation of each subject, and ultimately establish thresholds of significance. METHODS: The subjects were 279 healthy young Western Australian white adults (134 females and 145 males) with a mean age of 22.17 ± 0.63 years (range: 20.58-24.42 years) and without craniofacial anomalies. They were randomly selected from participants in the Raine Study-Generation 2. Surface facial images were obtained using a 3dMDface scanning system (3dMD Inc, Atlanta, Ga). Images were standardized using the dense correspondence technique. An automated landmark detection method was applied, and measurements were performed on color deviation maps to quantitatively assess facial asymmetry. RESULTS: Based on asymmetrical projections over the total facial surface area, the proportion of females and males with moderate asymmetry (2-5 mm) was 52.3% and 58.4%, respectively, and with severe asymmetry (>5 mm) was 7.1% and 7.7%, respectively. Most asymmetry occurred in the coronal plane (x-axis), followed by the transverse plane (z-axis), with the least asymmetry in the sagittal plane (y-axis). Males were statistically more asymmetrical (P <0.05) in the coronal and transverse planes (males: coronal 36.5%, transverse 15.2%; females: coronal 31.8%, transverse 12.3%). The midline was deviated to the right in all females and in all but 1 male subject. CONCLUSIONS: This study presents an automated, rapid and accurate method of assessing 3-dimensional facial asymmetry (using symmetry and midline analyses). Analyses revealed that >50% of the faces of young adults are >2 mm asymmetrical, based on total facial surface area.


Subjects
Facial Asymmetry , Imaging, Three-Dimensional , Adult , Australia , Cephalometry , Female , Humans , Male , Young Adult
10.
Sensors (Basel) ; 19(14)2019 Jul 20.
Article in English | MEDLINE | ID: mdl-31330773

ABSTRACT

Forged documents and counterfeit currency can be better detected with multispectral imaging in multiple color channels instead of the usual red, green and blue. However, multispectral cameras/scanners are expensive. We propose the construction of a low-cost scanner designed to capture multispectral images of documents. A standard sheet-feed scanner was modified by disconnecting its internal light source and connecting an external multispectral light source comprising narrow-band light-emitting diodes (LEDs). A document was scanned by illuminating the scanner light guide successively with different LEDs and capturing a scan of the document under each. The system costs less than a hundred dollars and is portable. It can potentially be used for applications in verification of questioned documents, checks, receipts and bank notes.

12.
Proc Biol Sci ; 282(1816): 20151351, 2015 10 07.
Article in English | MEDLINE | ID: mdl-26400740

ABSTRACT

Prenatal testosterone may have a powerful masculinizing effect on postnatal physical characteristics. However, no study has directly tested this hypothesis. Here, we report a 20-year follow-up study that measured testosterone concentrations from the umbilical cord blood of 97 male and 86 female newborns, and procured three-dimensional facial images on these participants in adulthood (range: 21-24 years). Twenty-three Euclidean and geodesic distances were measured from the facial images and an algorithm identified a set of six distances that most effectively distinguished adult males from females. From these distances, a 'gender score' was calculated for each face, indicating the degree of masculinity or femininity. Higher cord testosterone levels were associated with masculinized facial features when males and females were analysed together (n = 183; r = -0.59), as well as when males (n = 86; r = -0.55) and females (n = 97; r = -0.48) were examined separately (p-values < 0.001). The relationships remained significant and substantial after adjusting for potentially confounding variables. Adult circulating testosterone concentrations were available for males but showed no statistically significant relationship with gendered facial morphology (n = 85, r = 0.01, p = 0.93). This study provides the first direct evidence of a link between prenatal testosterone exposure and human facial structure.


Subjects
Face/anatomy & histology , Fetal Blood/chemistry , Testosterone/metabolism , Female , Follow-Up Studies , Humans , Male , Pregnancy , Sex Characteristics , Western Australia , Young Adult
13.
Opt Express ; 23(12): 15160-73, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26193499

ABSTRACT

Over a decade ago, Pan et al. [IEEE TPAMI 25, 1552 (2003)] performed face recognition using only the spectral reflectance of the face at six points and reported around 95% recognition rate. Since their database is private, no one has been able to replicate these results. Moreover, due to the unavailability of public datasets, there has been no detailed study in the literature on the viability of facial spectral reflectance for person identification. In this study, we introduce a new public database of facial spectral reflectance profiles measured with a high-precision spectrometer. For each of the 40 subjects, spectral reflectance was measured at the same six points as Pan et al. [IEEE TPAMI 25, 1552 (2003)] in multiple sessions and with time lapse. Furthermore, we sampled the facial spectral reflectance from two public hyperspectral face image datasets and analyzed the data using state-of-the-art face classification techniques. The best performing classifier achieved a maximum rank-1 identification rate of 53.8%. We conclude that facial spectral reflectance alone is not a reliable biometric for unconstrained face recognition.
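
The rank-1 identification rate reported in the abstract above is the fraction of probe samples whose best-matching gallery entry belongs to the same person. A minimal sketch follows, using a plain nearest-neighbour match on Euclidean distance as a stand-in for the classifiers the paper evaluated; the reflectance vectors and subject labels are invented.

```python
import math


def rank1_rate(gallery, probes):
    """Fraction of probes whose nearest gallery profile has the same identity."""
    hits = 0
    for probe_id, probe_vec in probes:
        best_id, _ = min(gallery,
                         key=lambda item: math.dist(item[1], probe_vec))
        hits += best_id == probe_id
    return hits / len(probes)


# Hypothetical reflectance profiles (three wavelengths) from an enrolment session.
gallery = [
    ("s1", [0.30, 0.35, 0.40]),
    ("s2", [0.50, 0.55, 0.60]),
]

# Hypothetical profiles from a later session; s2's skin reflectance has drifted.
probes = [
    ("s1", [0.31, 0.36, 0.41]),
    ("s2", [0.33, 0.37, 0.42]),
]

rate = rank1_rate(gallery, probes)
```

In this toy data the second probe is closer to the wrong subject, so only one of two probes is identified correctly.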

14.
IEEE Trans Image Process ; 33: 2639-2651, 2024.
Article in English | MEDLINE | ID: mdl-38551827

ABSTRACT

Current semi-supervised video object segmentation (VOS) methods often employ the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we introduce a Region Aware Video Object Segmentation (RAVOS) approach, which predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS includes a fast object motion tracker to predict object ROIs in the next frame. For efficient segmentation, object features are extracted based on the ROIs, and an object decoder is designed for object-level segmentation. For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects. In addition to RAVOS, we also propose a large-scale occluded VOS dataset, dubbed OVOS, to benchmark the performance of VOS models under occlusions. Evaluation on the DAVIS and YouTube-VOS benchmarks and our new OVOS dataset shows that our method achieves state-of-the-art performance with significantly faster inference time, e.g., 86.1 J & F at 42 FPS on DAVIS and 84.4 J & F at 23 FPS on YouTube-VOS. Project page: ravos.netlify.app.

15.
Article in English | MEDLINE | ID: mdl-38875095

ABSTRACT

Point cloud processing methods exploit local point features and global context through aggregation, which does not explicitly model the internal correlations between local and global features. To address this problem, we propose full point encoding, which is applicable to convolution and transformer architectures. Specifically, we propose full point convolution (FuPConv) and full point transformer (FPTransformer) architectures. The key idea is to adaptively learn the weights from local and global geometric connections, where the connections are established through local and global correlation functions, respectively. FuPConv and FPTransformer simultaneously model the local and global geometric relationships as well as their internal correlations, demonstrating strong generalization ability and high performance. FuPConv is incorporated in classical hierarchical network architectures to achieve local and global shape-aware learning. In FPTransformer, we introduce full point position encoding in self-attention that hierarchically encodes each point position in the global and local receptive field. We also propose a shape-aware downsampling block that takes into account the local shape and the global context. Experimental comparison to existing methods on benchmark datasets shows the efficacy of FuPConv and FPTransformer for semantic segmentation, object detection, classification, and normal estimation tasks. In particular, we achieve state-of-the-art semantic segmentation results of 76.8% mIoU on S3DIS sixfold and 73.1% on S3DIS Area 5. Our code is available at https://github.com/hnuhyuwa/FullPointTransformer.

16.
Article in English | MEDLINE | ID: mdl-38356214

ABSTRACT

Six-degree-of-freedom (6DoF) object pose estimation is a crucial task for virtual reality and accurate robotic manipulation. Category-level 6DoF pose estimation has recently become popular as it improves generalization to a complete category of objects. However, current methods focus on data-driven differential learning, which makes them highly dependent on the quality of the real-world labeled data and limits their ability to generalize to unseen objects. To address this problem, we propose multi-hypothesis (MH) consistency learning (MH6D) for category-level 6-D object pose estimation without using real-world training data. MH6D uses a parallel consistency learning structure, alleviating the uncertainty problem of single-shot feature extraction and promoting self-adaptation of domain to reduce the synthetic-to-real domain gap. Specifically, three randomly sampled pose transformations are first performed in parallel on the input point cloud. An attention-guided category-level 6-D pose estimation network with channel attention (CA) and global feature cross-attention (GFCA) modules is then proposed to estimate the three hypothesized 6-D object poses by extracting and fusing the global and local features effectively. Finally, we propose a novel loss function that considers both the process and the final result information allowing MH6D to perform robust consistency learning. We conduct experiments under two different training data settings (i.e., only synthetic data and synthetic and real-world data) to verify the generalization ability of MH6D. Extensive experiments on benchmark datasets demonstrate that MH6D achieves state-of-the-art (SOTA) performance, outperforming most data-driven methods even without using any real-world data. The code is available at https://github.com/CNJianLiu/MH6D.

17.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9528-9535, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35230955

ABSTRACT

Convolutional neural network (CNN) architectures are generally heavy on memory and computational requirements, which makes them infeasible for embedded systems with limited hardware resources. We propose dual convolutional kernels (DualConv) for constructing lightweight deep neural networks. DualConv combines 3×3 and 1×1 convolutional kernels to process the same input feature map channels simultaneously and exploits the group convolution technique to efficiently arrange convolutional filters. DualConv can be employed in any CNN model such as VGG-16 and ResNet-50 for image classification, you only look once (YOLO) and R-CNN for object detection, or fully convolutional network (FCN) for semantic segmentation. In this work, we extensively test DualConv for classification since these network architectures form the backbone for many other tasks. We also test DualConv for image detection on YOLO-V3. Experimental results show that, combined with our structural innovations, DualConv significantly reduces the computational cost and number of parameters of deep neural networks while surprisingly achieving slightly higher accuracy than the original models in some cases. We use DualConv to further reduce the number of parameters of the lightweight MobileNetV2 by 54% with only a 0.68% drop in accuracy on the CIFAR-100 dataset. When the number of parameters is not an issue, DualConv increases the accuracy of MobileNetV1 by 4.11% on the same dataset. Furthermore, DualConv significantly improves the YOLO-V3 object detection speed and improves its accuracy by 4.4% on the PASCAL visual object classes (VOC) dataset.
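
To see where the parameter savings could come from, the sketch below counts weights under one plausible reading of the abstract: a grouped 3×3 convolution and a full 1×1 convolution applied to the same input, with their outputs summed. That layer layout, the function names, and the channel sizes are assumptions for illustration. Biases are ignored.

```python
def standard_conv_params(in_channels, out_channels, kernel=3):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * in_channels * out_channels


def dualconv_params(in_channels, out_channels, groups, kernel=3):
    """Weights in a grouped kernel x kernel conv plus a full 1x1 conv."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output filter sees only in_channels / groups inputs with the 3x3 kernel.
    grouped = kernel * kernel * (in_channels // groups) * out_channels
    # The 1x1 kernels still connect every input channel to every output.
    pointwise = in_channels * out_channels
    return grouped + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 4 groups.
standard = standard_conv_params(64, 128)
dual = dualconv_params(64, 128, groups=4)
ratio = dual / standard
```

Under these assumptions the ratio works out to 1/G + 1/9 for a 3×3 kernel with G groups, so larger group counts push the cost toward that of the 1×1 convolution alone.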

18.
Article in English | MEDLINE | ID: mdl-37310827

ABSTRACT

Geometric feature learning for 3-D surfaces is critical for many applications in computer graphics and 3-D vision. However, deep learning currently lags in hierarchical modeling of 3-D surfaces due to the lack of required operations and/or their efficient implementations. In this article, we propose a series of modular operations for effective geometric feature learning from 3-D triangle meshes. These operations include novel mesh convolutions, efficient mesh decimation, and associated mesh (un)poolings. Our mesh convolutions exploit spherical harmonics as orthonormal bases to create continuous convolutional filters. The mesh decimation module is graphics processing unit (GPU)-accelerated and able to process batched meshes on-the-fly, while the (un)pooling operations compute features for upsampled/downsampled meshes. We provide an open-source implementation of these operations, collectively termed Picasso. Picasso supports heterogeneous mesh batching and processing. Leveraging its modular operations, we further contribute a novel hierarchical neural network for perceptual parsing of 3-D surfaces, named PicassoNet++. It achieves highly competitive performance for shape analysis and scene segmentation on prominent 3-D benchmarks. The code, data, and trained models are available at https://github.com/EnyaHermite/Picasso.

19.
Opt Express ; 20(10): 10658-73, 2012 May 07.
Article in English | MEDLINE | ID: mdl-22565691

ABSTRACT

Hyperspectral video acquisition is a trade-off between spectral and temporal resolution. We present an algorithm for recovering dense hyperspectral video of dynamic scenes from a few measured multispectral bands per frame using optical flow and sparse coding. A different set of bands is measured in each video frame and optical flow is used to register them. Optical flow errors are corrected by exploiting sparsity in the spectra and the spatial correlation between images of a scene at different wavelengths. A redundant dictionary of atoms is learned that can sparsely approximate training spectra. The restoration of correct spectra is formulated as an ℓ1 convex optimization problem that minimizes a Mahalanobis-like weighted distance between the restored and corrupt signals as well as the restored signal and the median of the eight connected neighbours of the corrupt signal such that the restored signal is a sparse linear combination of the dictionary atoms. Spectral restoration is followed by spatial restoration using a guided dictionary approach where one dictionary is learned for measured bands and another for a band that is to be spatially restored. By constraining the sparse coding coefficients of both dictionaries to be the same, the restoration of a corrupt band is guided by the more reliable measured bands. Experiments on real data and comparison with an existing volumetric image denoising technique show the superiority of our algorithm.

20.
IEEE Trans Neural Netw Learn Syst ; 33(4): 1609-1622, 2022 04.
Article in English | MEDLINE | ID: mdl-33351768

ABSTRACT

Deep learning models achieve impressive performance for skeleton-based human action recognition. Graph convolutional networks (GCNs) are particularly suitable for this task due to the graph-structured nature of skeleton data. However, the robustness of these models to adversarial attacks remains largely unexplored due to their complex spatiotemporal nature that must represent sparse and discrete skeleton joints. This work presents the first adversarial attack on skeleton-based action recognition with GCNs. The proposed targeted attack, termed constrained iterative attack for skeleton actions (CIASA), perturbs joint locations in an action sequence such that the resulting adversarial sequence preserves the temporal coherence, spatial integrity, and the anthropomorphic plausibility of the skeletons. CIASA achieves this feat by satisfying multiple physical constraints and employing spatial skeleton realignments for the perturbed skeletons along with regularization of the adversarial skeletons with generative networks. We also explore the possibility of semantically imperceptible localized attacks with CIASA and succeed in fooling the state-of-the-art skeleton action recognition models with high confidence. CIASA perturbations show high transferability in black-box settings. We also show that the perturbed skeleton sequences are able to induce adversarial behavior in the RGB videos created with computer graphics. A comprehensive evaluation with NTU and Kinetics data sets ascertains the effectiveness of CIASA for graph-based skeleton action recognition and reveals the imminent threat to the spatiotemporal deep learning tasks in general.


Subjects
Neural Networks, Computer , Pattern Recognition, Automated , Humans , Recognition, Psychology , Skeleton