Results 1-10 of 10
1.
Sensors (Basel) ; 24(14)2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39066043

ABSTRACT

Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, which rely primarily on a single data source, are limited in their ability to capture the full spectrum of human activities. This study introduces a comprehensive approach to HAR that integrates two complementary modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The two modalities are integrated through fusion algorithms that significantly improve recognition accuracy. Extensive experiments on the UTD multimodal human action dataset (UTD-MHAD) demonstrate that the proposed approach outperforms existing state-of-the-art algorithms. This study sets a new benchmark for HAR systems and highlights the importance of feature engineering, and of integrating optimal features, in capturing the complexity of human movement. Our findings pave the way for more sophisticated, reliable, and practical HAR systems in real-world scenarios.
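For orientation, a minimal sketch of the two-stream late-fusion idea described above, assuming flattened 17-keypoint 2D poses and pre-extracted 2048-dimensional RGB clip features; layer sizes are illustrative, not the paper's configuration (UTD-MHAD has 27 action classes):

```python
import torch
import torch.nn as nn

class TwoStreamHAR(nn.Module):
    def __init__(self, num_classes=27, pose_dim=17 * 2, rgb_dim=2048):
        super().__init__()
        # Skeleton stream: encodes flattened pose keypoints.
        self.pose_stream = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # RGB stream: encodes appearance features (e.g., from a CNN backbone).
        self.rgb_stream = nn.Sequential(
            nn.Linear(rgb_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Late fusion by concatenation, then classification.
        self.classifier = nn.Linear(128 + 128, num_classes)

    def forward(self, pose, rgb):
        fused = torch.cat([self.pose_stream(pose), self.rgb_stream(rgb)], dim=1)
        return self.classifier(fused)

model = TwoStreamHAR()
logits = model(torch.randn(4, 34), torch.randn(4, 2048))  # batch of 4 clips
print(logits.shape)  # torch.Size([4, 27])
```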


Subjects
Algorithms; Human Activities; Humans; Image Processing, Computer-Assisted/methods; Movement/physiology; Posture/physiology; Pattern Recognition, Automated/methods
2.
Sensors (Basel) ; 23(12)2023 Jun 20.
Article in English | MEDLINE | ID: mdl-37420932

ABSTRACT

Defect inspection is important to ensure consistent quality and efficiency in industrial manufacturing. Recently, machine vision systems integrating artificial intelligence (AI)-based inspection algorithms have exhibited promising performance in various applications, but practically, they often suffer from data imbalance. This paper proposes a defect inspection method using a one-class classification (OCC) model to deal with imbalanced datasets. A two-stream network architecture consisting of global and local feature extractor networks is presented, which can alleviate the representation collapse problem of OCC. By combining an object-oriented invariant feature vector with a training-data-oriented local feature vector, the proposed two-stream network model prevents the decision boundary from collapsing to the training dataset and obtains an appropriate decision boundary. The performance of the proposed model is demonstrated in the practical application of automotive-airbag bracket-welding defect inspection. The effects of the classification layer and two-stream network architecture on the overall inspection accuracy were clarified by using image samples collected in a controlled laboratory environment and from a production site. The results are compared with those of a previous classification model, demonstrating that the proposed model can improve the accuracy, precision, and F1 score by up to 8.19%, 10.74%, and 4.02%, respectively.
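A hedged sketch of the two-stream one-class idea: two feature extractors whose concatenated embedding is scored by a Deep-SVDD-style distance to a center. The paper's actual classification layer may differ; this only illustrates how two feature vectors jointly shape one decision boundary:

```python
import torch
import torch.nn as nn

def make_stream():
    # Small CNN producing a 32-dim feature vector per image.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class TwoStreamOCC(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.global_stream = make_stream()  # object-oriented invariant features
        self.local_stream = make_stream()   # training-data-oriented local features
        self.head = nn.Linear(32 + 32, embed_dim)

    def forward(self, x):
        z = torch.cat([self.global_stream(x), self.local_stream(x)], dim=1)
        return self.head(z)

model = TwoStreamOCC()
center = torch.zeros(64)                      # hypersphere center (illustrative)
z = model(torch.randn(8, 3, 128, 128))        # normal-only training batch
loss = ((z - center) ** 2).sum(dim=1).mean()  # pull normal samples to the center
```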


Subjects
Artificial Intelligence; Rivers; Algorithms
3.
Sensors (Basel) ; 22(2)2022 Jan 13.
Article in English | MEDLINE | ID: mdl-35062558

ABSTRACT

In the field of video action classification, existing network frameworks often use only video frames as input. When the object involved in an action does not appear prominently in the frame, such networks cannot classify the action accurately. We introduce a new neural network structure that uses sound to assist with such tasks. The original sound wave is converted into a sound texture that serves as the network's input. Furthermore, to exploit the rich multimodal information (images and sound) in video, we designed a two-stream framework. In this work, we hypothesize that sound data can help solve action recognition tasks. To demonstrate this, we designed a neural network based on sound texture for video action classification, then fused it with a deep neural network that consumes consecutive video frames, yielding a two-stream network called A-IN. Finally, on the Kinetics dataset, we compare the proposed A-IN with an image-only network. The experimental results show that the recognition accuracy of the two-stream model that incorporates sound features is 7.6% higher than that of the network using video frames alone, demonstrating that making full use of the rich information in video can improve classification performance.
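A small sketch of the audio branch only: converting a raw waveform into a log-mel "sound texture" image that a 2D CNN can consume. The parameter values are illustrative, and the paper's exact sound-texture transform is not verified here:

```python
import torch
import torchaudio

wave = torch.randn(1, 16000)  # 1 s of mono audio at 16 kHz (dummy data)
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=256, n_mels=64
)(wave)
texture = torch.log(mel + 1e-6)  # log compression, as is usual for spectrograms
texture = texture.unsqueeze(0)   # (batch, channel, mels, time) for a 2D CNN
print(texture.shape)             # torch.Size([1, 1, 64, 63])
```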


Subjects
Neural Networks, Computer; Pattern Recognition, Automated; Sound
4.
Sensors (Basel) ; 21(3)2021 Jan 29.
Article in English | MEDLINE | ID: mdl-33572928

ABSTRACT

In recent years, human detection in indoor scenes has been widely applied in smart buildings and smart security, but many related challenges can still be difficult to address, such as frequent occlusion, low illumination and multiple poses. This paper proposes an asymmetric adaptive fusion two-stream network (AAFTS-net) for RGB-D human detection. This network can fully extract person-specific depth features and RGB features while reducing the typical complexity of a two-stream network. A depth feature pyramid is constructed by combining contextual information, with the motivation of combining multiscale depth features to improve the adaptability for targets of different sizes. An adaptive channel weighting (ACW) module weights the RGB-D feature channels to achieve efficient feature selection and information complementation. This paper also introduces a novel RGB-D dataset for human detection called RGBD-human, on which we verify the performance of the proposed algorithm. The experimental results show that AAFTS-net outperforms existing state-of-the-art methods and can maintain stable performance under conditions of frequent occlusion, low illumination and multiple poses.
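As a rough illustration of the ACW idea, the sketch below re-weights fused RGB-D feature channels with a standard squeeze-and-excitation block; the module's exact design in AAFTS-net may differ:

```python
import torch
import torch.nn as nn

class ChannelWeighting(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))     # squeeze: global average pool
        return x * w[:, :, None, None]      # excite: per-channel weights

rgb_feat = torch.randn(2, 64, 32, 32)
depth_feat = torch.randn(2, 64, 32, 32)
# Concatenate RGB and depth channels, then let the module select among them.
fused = ChannelWeighting(128)(torch.cat([rgb_feat, depth_feat], dim=1))
```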


Subjects
Algorithms; Human Activities; Humans; Neural Networks, Computer
5.
Front Neurosci ; 17: 1212049, 2023.
Article in English | MEDLINE | ID: mdl-37397450

ABSTRACT

Introduction: The human brain processes shape and texture information separately, through different neurons in the visual system. In intelligent computer-aided imaging diagnosis, pre-trained feature extractors are commonly used in medical image recognition methods; however, common pre-training datasets such as ImageNet tend to improve a model's texture representation while causing it to ignore many shape features. Weak shape representation is a disadvantage for medical image analysis tasks that depend on shape. Methods: Inspired by the function of neurons in the human brain, we propose a shape-and-texture-biased two-stream network to enhance shape feature representation in knowledge-guided medical image analysis. First, a shape-biased stream and a texture-biased stream are constructed through joint multi-task learning of classification and segmentation. Second, we propose pyramid-grouped convolution to enhance texture feature representation and introduce deformable convolution to enhance shape feature extraction. Third, we use a channel-attention-based feature selection module when fusing shape and texture features, to focus on key features and eliminate the information redundancy caused by fusion. Finally, to address the optimization difficulty caused by the imbalance between benign and malignant samples in medical images, an asymmetric loss function is introduced to improve model robustness. Results and conclusion: We applied our method to melanoma recognition on the ISIC-2019 and XJTU-MM datasets, which concern both the texture and the shape of lesions. The experimental results on dermoscopic and pathological image recognition show that the proposed method outperforms the compared algorithms, demonstrating its effectiveness.
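For the asymmetric loss mentioned in the Methods, a hedged sketch of one common formulation: a focal-style binary cross-entropy that down-weights easy negatives more aggressively than easy positives. Whether this matches the paper's exact loss is an assumption, and the gamma values are illustrative:

```python
import torch

def asymmetric_bce(logits, targets, gamma_pos=1.0, gamma_neg=4.0):
    # Focal-style asymmetric binary cross-entropy (illustrative, not the
    # paper's verified formulation): easy negatives are suppressed harder
    # (gamma_neg > gamma_pos), which helps when benign samples dominate.
    p = torch.sigmoid(logits)
    pos = targets * (1 - p) ** gamma_pos * torch.log(p.clamp(min=1e-8))
    neg = (1 - targets) * p ** gamma_neg * torch.log((1 - p).clamp(min=1e-8))
    return -(pos + neg).mean()

loss = asymmetric_bce(torch.randn(8), torch.randint(0, 2, (8,)).float())
```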

6.
Comput Biol Med ; 147: 105803, 2022 08.
Article in English | MEDLINE | ID: mdl-35809411

ABSTRACT

At present, the assessment of mental retardation is based mainly on clinical interviews, which require the participation of experienced psychiatrists and are laborious. Studies have shown correlations between mental retardation and abnormal behaviors (such as hyperkinesis, tics, and stereotypies). On this basis, a two-stream non-local CNN-LSTM network is proposed to learn features of patients' upper-body behavior and facial expression, enabling preliminary screening for mental retardation. Specifically, RGB frames and optical flow are extracted separately from interview videos, and a two-stream network based on a contribution mechanism is designed to fuse the two kinds of information effectively; the network is updated by an alternating iterative training scheme to find the optimal model. In addition, by introducing a non-local mechanism into the network, global features can be sensed more effectively, reducing background interference within short video clips. Experiments on a clinical video dataset show that the proposed model outperforms other prevalent deep learning methods for behavioral feature learning: accuracy reaches 89.15% in the basic experiment and improves to 89.52% in the supplementary experiment. The results also suggest considerable room for further improvement. Overall, our work indicates that the proposed model has potential value for the clinical diagnosis and screening of mental retardation.
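A minimal sketch of a non-local (self-attention) block over spatial positions, in the spirit of the mechanism the abstract introduces; its exact placement in the CNN-LSTM pipeline is not shown:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        inner = channels // 2
        self.theta = nn.Conv2d(channels, inner, 1)  # query projection
        self.phi = nn.Conv2d(channels, inner, 1)    # key projection
        self.g = nn.Conv2d(channels, inner, 1)      # value projection
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)       # (B, HW, C')
        k = self.phi(x).flatten(2)                         # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)           # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)                # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                             # residual connection

x = torch.randn(2, 64, 14, 14)
print(NonLocalBlock(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```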


Subjects
Intellectual Disability; Neural Networks, Computer; Humans; Intellectual Disability/diagnostic imaging
7.
Heliyon ; 8(11): e11401, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36387431

ABSTRACT

To address the low modeling efficiency and feature loss of temporal modeling in human action recognition, we propose a recognition method based on a Motion Excitation and Temporal Aggregation module (META). The method captures multi-state, multi-scale temporal information to achieve effective motion excitation. First, temporal relational sampling is performed on the video frames. Second, META is applied to capture multi-state and multi-scale temporal information; it comprises a Multi-scale Motion Excitation module (MME) and a Squeeze-and-Excitation Temporal Aggregation module (SETA). MME captures feature-level temporal differences by transforming features into the temporal channel, directly establishing the relationship between features and the temporal channel and addressing the low modeling efficiency. SETA transforms a local convolution into a set of sub-convolutions arranged hierarchically, which extract features together and share the results of the preceding convolutional layer; this enlarges the final temporal receptive field and addresses the feature loss. Moreover, optical flow features are extracted through cross-modality pre-training to improve the utilization of temporal information. Finally, the recognition result is obtained by combining the spatial and temporal stream features. Experimental results show that the method reaches accuracies of 96.0% on UCF101 and 71.2% on HMDB-51, higher than contemporaneous studies.
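A hedged sketch of feature-level motion excitation: temporal differences between adjacent frame features gate the original features so that motion-salient channels are amplified. This follows the general idea; the paper's MME and SETA details are simplified away:

```python
import torch
import torch.nn as nn

class MotionExcitation(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):                          # x: (B, T, C, H, W)
        diff = x[:, 1:] - x[:, :-1]                # temporal feature difference
        diff = torch.cat([diff, diff[:, -1:]], 1)  # pad to keep T steps
        b, t, c, h, w = x.shape
        w_motion = self.gate(diff.reshape(b * t, c, h, w))
        return (x.reshape(b * t, c, h, w) * w_motion).reshape(b, t, c, h, w)

feats = torch.randn(2, 8, 32, 14, 14)              # 8-frame clip features
print(MotionExcitation(32)(feats).shape)           # unchanged shape, gated values
```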

8.
Animals (Basel) ; 11(2)2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33673162

ABSTRACT

Behavior analysis of wild felines is significant for the protection of grassland ecological environments. Compared with human action recognition, feline behavior analysis has attracted far fewer researchers. This paper proposes a novel two-stream architecture that incorporates spatial and temporal networks for wild feline action recognition. The spatial stream outlines the object region extracted by a Mask region-based convolutional neural network (Mask R-CNN) and builds a Tiny Visual Geometry Group (VGG) network for static action recognition. Compared with VGG16, the Tiny VGG network reduces the number of network parameters and avoids overfitting. The temporal stream presents a novel skeleton-based action recognition model based on the fluctuation amplitude of the bending angle of the knee joints across a video clip. Owing to these temporal features, the model can effectively distinguish between different upright actions, such as standing, ambling, and galloping, particularly when the felines are occluded by objects such as plants or fallen trees. The experimental results show that the proposed two-stream network model can effectively outline wild feline targets in captured images and significantly improves wild feline action recognition through its combined spatial and temporal features.
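The temporal cue is straightforward to compute: the sketch below derives the knee bending angle from three 2D keypoints (hip, knee, ankle) and its fluctuation amplitude over a clip. The keypoint names are assumptions for illustration:

```python
import numpy as np

def joint_angle(hip, knee, ankle):
    """Interior angle (degrees) at the knee, from 2D keypoints."""
    v1, v2 = np.asarray(hip) - knee, np.asarray(ankle) - knee
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Fluctuation amplitude over a clip: max minus min of the per-frame angles.
angles = [joint_angle((0, 0), (1, -1), (2, -2 + 0.1 * t)) for t in range(10)]
amplitude = max(angles) - min(angles)
```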

9.
Animals (Basel) ; 11(6)2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34206077

ABSTRACT

Automated recognition of human facial expressions of pain and emotions is to a certain degree a solved problem, using approaches based on computer vision and machine learning. However, the application of such methods to horses has proven difficult. Major barriers are the lack of sufficiently large, annotated databases for horses and difficulties in obtaining correct classifications of pain because horses are non-verbal. This review describes our work to overcome these barriers, using two different approaches. One involves the use of a manual, but relatively objective, classification system for facial activity (Facial Action Coding System), where data are analyzed for pain expressions after coding using machine learning principles. We have devised tools that can aid manual labeling by identifying the faces and facial keypoints of horses. This approach provides promising results in the automated recognition of facial action units from images. The second approach, recurrent neural network end-to-end learning, requires less extraction of features and representations from the video but instead depends on large volumes of video data with ground truth. Our preliminary results suggest clearly that dynamics are important for pain recognition and show that combinations of recurrent neural networks can classify experimental pain in a small number of horses better than human raters.
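For the second approach, a hedged sketch of a recurrent classifier over per-frame feature vectors (e.g., facial keypoints), trained end-to-end for pain versus no pain; layer sizes and the two-class setup are assumptions:

```python
import torch
import torch.nn as nn

class PainRNN(nn.Module):
    def __init__(self, feat_dim=68, hidden=64, num_classes=2):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, seq):              # seq: (batch, time, feat_dim)
        _, (h, _) = self.rnn(seq)        # final hidden state summarizes the clip
        return self.head(h[-1])

logits = PainRNN()(torch.randn(4, 30, 68))  # 4 clips of 30 frames each
```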

10.
Neural Netw ; 109: 31-42, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30390521

ABSTRACT

In this paper, we propose a novel fully convolutional two-stream fusion network (FCTSFN) for interactive image segmentation. The proposed network includes two sub-networks: a two-stream late fusion network (TSLFN) that predicts the foreground at reduced resolution, and a multi-scale refining network (MSRN) that refines the foreground at full resolution. The TSLFN consists of two distinct deep streams followed by a fusion network. The intuition is that, since user interactions convey more direct information about foreground and background than the image itself, the two-stream structure reduces the number of layers between the pure user-interaction features and the network output, allowing the interactions to have a more direct impact on the segmentation result. The MSRN fuses features from different layers of the TSLFN at different scales, seeking local-to-global information on the foreground to refine the segmentation at full resolution. We conduct comprehensive experiments on four benchmark datasets. The results show that the proposed network achieves competitive performance compared with current state-of-the-art interactive image segmentation methods.
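A sketch of how user clicks are commonly encoded for an interaction stream: Euclidean distance maps from positive and negative clicks, stacked as extra input channels. This is a standard encoding in interactive segmentation; FCTSFN's exact transform is not verified here:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def click_map(clicks, shape, truncate=255.0):
    # Distance from every pixel to the nearest click, truncated for stability.
    mask = np.ones(shape, dtype=bool)
    for y, x in clicks:
        mask[y, x] = False                       # zero distance at each click
    return np.minimum(distance_transform_edt(mask), truncate)

h, w = 256, 256
pos = click_map([(100, 120)], (h, w))            # foreground clicks
neg = click_map([(10, 10), (250, 200)], (h, w))  # background clicks
x = np.stack([pos, neg])                         # 2 extra channels for the net
```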


Subjects
Neural Networks, Computer; Pattern Recognition, Automated/methods; Databases, Factual; Pattern Recognition, Visual