Results 1 - 7 of 7
1.
Sensors (Basel) ; 24(8)2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38676137

ABSTRACT

Human action recognition (HAR) is a growing area of machine learning with a wide range of applications. One challenging aspect of HAR is recognizing human actions during music performance, which is further complicated by the need to recognize the musical notes being played. This paper proposes a deep learning-based method for simultaneous HAR and musical note recognition in music performances. We conducted experiments on performances of the Morin khuur, a traditional Mongolian instrument. The proposed method consists of two stages. First, we created a new dataset of Morin khuur performances, using motion capture systems and depth sensors to collect data that includes hand keypoints, instrument segmentation information, and detailed movement information. We then analyzed RGB images, depth images, and motion data to determine which type of data provides the most valuable features for recognizing actions and notes in music performances. The second stage uses a Spatial Temporal Attention Graph Convolutional Network (STA-GCN) to recognize musical notes as continuous gestures. The STA-GCN model is designed to learn the relationships between hand keypoints and instrument segmentation information, which are crucial for accurate recognition. Evaluation on our dataset demonstrates that our model outperforms the traditional ST-GCN model, achieving an accuracy of 81.4%.
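
Note: the abstract does not spell out the STA-GCN layers, so the following is only a minimal sketch of the general idea of attention-weighted aggregation over a keypoint graph (the spatial part only). The function name, the dot-product attention, and the toy keypoints are assumptions made for illustration; learned weight matrices and the temporal convolution are omitted.

```python
import math

def attention_graph_conv(features, edges):
    """One spatial aggregation step over a keypoint graph.

    features: list of per-node feature vectors (e.g. keypoint coordinates).
    edges: undirected (i, j) index pairs; self-loops are added automatically.
    Each node's output is a softmax-weighted average of its neighbours,
    with weights from the dot product of node and neighbour features.
    """
    n = len(features)
    neighbours = [{i} for i in range(n)]  # self-loop for every node
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    out = []
    for i in range(n):
        nbrs = sorted(neighbours[i])
        scores = [sum(a * b for a, b in zip(features[i], features[j])) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        dim = len(features[i])
        out.append([sum(w * features[j][k] for w, j in zip(attn, nbrs)) for k in range(dim)])
    return out

# Hypothetical toy graph: three hand keypoints in a chain (wrist - knuckle - fingertip).
keypoints = [[0.0, 0.0], [0.5, 0.2], [1.0, 0.4]]
bones = [(0, 1), (1, 2)]
aggregated = attention_graph_conv(keypoints, bones)
```

In a full model, an instrument-segmentation node could be added to the same graph so that attention can relate hand keypoints to the instrument; that wiring is not shown here.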


Subjects
Deep Learning , Music , Humans , Neural Networks, Computer , Human Activities , Pattern Recognition, Automated/methods , Gestures , Algorithms , Movement/physiology
2.
Int J Adv Manuf Technol ; 126(3-4): 1093-1107, 2023.
Article in English | MEDLINE | ID: mdl-37073280

ABSTRACT

Surface defects are a common issue that affects product quality in industrial manufacturing. Many companies invest heavily in developing automated inspection systems to handle this issue. In this work, we propose a novel deep learning-based surface defect inspection system called the forceful steel defect detector (FDD), designed specifically for steel surface defect detection. Our model adopts the state-of-the-art cascade R-CNN as the baseline architecture and improves it with deformable convolution and deformable RoI pooling to adapt to the geometric shape of defects. In addition, our model adopts guided anchoring region proposal to generate more accurate bounding boxes. Moreover, to enrich the point of view of the input images, we propose the random scaling and ultimate scaling techniques for the training and inference processes, respectively. Experimental studies on the Severstal steel dataset, the NEU steel dataset, and the DAGM dataset demonstrate that our proposed model improves detection accuracy in terms of average recall (AR) and mean average precision (mAP) compared with state-of-the-art defect detection methods. We expect our innovation to accelerate the automation of industrial manufacturing processes by increasing productivity and sustaining high product quality.
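
Note: the abstract names random scaling for training but gives no parameters. The sketch below shows one common reading of the idea, resizing each training image by a randomly drawn factor and scaling its bounding boxes to match. The function name, the scale range, and the example sizes are assumptions; the inference-time "ultimate scaling" is not defined in the abstract and is not sketched.

```python
import random

def random_scale(width, height, boxes, scale_range=(0.5, 1.5), rng=random):
    """Draw one scale factor and apply it to the image size and its boxes.

    boxes: list of (x1, y1, x2, y2) in pixels of the original image.
    Returns the new (width, height), the rescaled boxes, and the factor used.
    The scale range is an illustrative assumption, not a value from the paper.
    """
    s = rng.uniform(*scale_range)
    new_size = (round(width * s), round(height * s))
    new_boxes = [(x1 * s, y1 * s, x2 * s, y2 * s) for x1, y1, x2, y2 in boxes]
    return new_size, new_boxes, s

# Hypothetical training sample: an 800 x 600 image with one defect box.
rng = random.Random(0)  # seeded so that the example is repeatable
size, scaled_boxes, factor = random_scale(800, 600, [(100, 50, 300, 120)], rng=rng)
```

The actual pixel resampling would be done by whatever image library the training pipeline uses; only the geometry is shown here.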

3.
Sensors (Basel) ; 22(17)2022 Aug 26.
Article in English | MEDLINE | ID: mdl-36080911

ABSTRACT

Given video streams, we aim to correctly detect unsegmented signs for continuous sign language recognition (CSLR). Despite the growing number of deep learning methods proposed in this area, most focus on using only an RGB feature, either the full-frame image or details of the hands and face. The scarcity of information for the CSLR training process heavily constrains the capability to learn multiple features from the input video frames. Moreover, exploiting all frames in a video for the CSLR task could lead to suboptimal performance, since each frame contains a different level of information, from main features to noise that affects inference. Therefore, we propose a novel spatio-temporal continuous sign language recognition method that uses an attentive multi-feature network to enhance CSLR by providing extra keypoint features. In addition, we exploit the attention layer in the spatial and temporal modules to simultaneously emphasize multiple important features. Experimental results on both CSLR datasets demonstrate that the proposed method achieves superior performance in comparison with current state-of-the-art methods by 0.76 and 20.56 for the WER score on the CSL and PHOENIX datasets, respectively.
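
Note: word error rate (WER), the metric quoted above, is conventionally the word-level edit distance between the recognized sequence and the reference, divided by the reference length. A minimal sketch follows; the function name and the gloss sequences are made up for illustration, and this is not the paper's evaluation script.

```python
def word_error_rate(reference, hypothesis):
    """Return WER as a percentage.

    WER = (substitutions + deletions + insertions) / len(reference) * 100,
    where the error counts come from the minimum edit distance between the
    two token sequences.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the processed reference prefix
    # and hypothesis[:j]; only two rows of the table are kept in memory.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(reference)

# Hypothetical gloss sequences: one substitution and one deletion.
reference = ["I", "GO", "SCHOOL", "TODAY"]
hypothesis = ["I", "GO", "HOME"]
wer = word_error_rate(reference, hypothesis)  # 2 errors over 4 reference glosses
```

Lower is better, and because insertions are counted, WER can exceed 100%.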


Subjects
Recognition, Psychology , Sign Language , Attention , Humans
4.
IEEE Trans Cybern ; 52(4): 2453-2466, 2022 Apr.
Article in English | MEDLINE | ID: mdl-32667885

ABSTRACT

Due to the great advances in mobility techniques, an increasing number of point-of-interest (POI)-related services have emerged, which can help users navigate or predict POIs that may interest them. Predicting POIs is a challenging task, mainly because of the complicated sequential transition regularities and the heterogeneity and sparsity of the collected trajectory data. Most prior studies on successive POI recommendation focused on modeling the correlation among POIs based on users' check-in data. However, given a user's check-in sequence, the relationship between two consecutive POIs is usually subtle in both time and distance. In this article, we propose a novel POI recommendation system that captures and learns the complicated sequential transitions by incorporating time and distance irregularity. In addition, we propose a feasible way to dynamically weight the decay values in the model learning process. The learned awareness weights offer an easy-to-interpret way to show how much each context is emphasized in the prediction process. Performance evaluations are conducted on real mobility datasets to demonstrate the effectiveness and practicability of the POI recommendations. The experimental results show that the proposed methods significantly outperform the state-of-the-art models in all metrics.
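
Note: the abstract does not give the decay function. As an illustration of how time and distance irregularity could be turned into weights, the sketch below uses a plain exponential decay with two rate parameters that a model could learn. The function names, the rates, and the check-in gaps are all hypothetical.

```python
import math

def decay_weight(time_gap_h, distance_km, time_rate=0.1, dist_rate=0.5):
    """Exponential decay: transitions far apart in time or space count less.

    time_rate and dist_rate stand in for parameters a model would learn;
    the default values here are arbitrary.
    """
    return math.exp(-time_rate * time_gap_h - dist_rate * distance_km)

def context_weights(gaps, time_rate=0.1, dist_rate=0.5):
    """Normalize the decay weights of past check-ins so that they sum to 1."""
    raw = [decay_weight(t, d, time_rate, dist_rate) for t, d in gaps]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical history: (hours since check-in, km from the current location).
history = [(1.0, 0.3), (12.0, 2.0), (48.0, 15.0)]
weights = context_weights(history)
```

Because the normalized weights sum to 1, each value can be read directly as the share of emphasis a past check-in receives, which is the kind of interpretability the abstract describes.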


Subjects
Learning
5.
Sensors (Basel) ; 20(10)2020 May 21.
Article in English | MEDLINE | ID: mdl-32455537

ABSTRACT

Semantic segmentation of street view images is an important step in scene understanding for autonomous vehicle systems. Recent works have made significant progress in pixel-level labeling using the Fully Convolutional Network (FCN) framework and local multi-scale context information. Rich global context information is also essential in the segmentation process. However, a systematic way to use both global and local contextual information in a single network has not been fully investigated. In this paper, we propose a global-and-local network architecture (GLNet) that incorporates global spatial information and dense local multi-scale context information to model the relationship between objects in a scene, thus reducing segmentation errors. A channel attention module is designed to further refine the segmentation results using low-level features from the feature map. Experimental results demonstrate that our proposed GLNet achieves 80.8% test accuracy on the Cityscapes test dataset, comparing favorably with existing state-of-the-art methods.
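
Note: the abstract does not describe the internals of the channel attention module. A common form of channel attention pools each channel to a single value, passes the pooled vector through a small gating layer, and rescales the channels by the resulting gates. The sketch below follows that generic pattern in plain Python; the function name, the single linear gating layer, and the toy feature map are assumptions, not the GLNet design.

```python
import math

def channel_attention(feature_map, gate_weights):
    """Rescale each channel of a feature map by a learned gate in (0, 1).

    feature_map: list of C channels, each an H x W list of lists.
    gate_weights: C x C matrix standing in for a learned gating layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excite: one linear layer followed by a sigmoid gives a gate per channel.
    gates = []
    for w_row in gate_weights:
        z = sum(w * p for w, p in zip(w_row, pooled))
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Rescale: multiply every value in a channel by that channel's gate.
    refined = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_map)]
    return refined, gates

# Toy input: two 2 x 2 channels and an identity gating matrix.
feature_map = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]
refined, gates = channel_attention(feature_map, gate_weights)
```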

6.
Sensors (Basel) ; 19(24)2019 Dec 09.
Article in English | MEDLINE | ID: mdl-31835404

ABSTRACT

With the recent growth of Smart TV technology, the demand for unique and beneficial applications motivates the study of a gesture-based system for a smart TV-like environment. We propose combining movie recommendation, a social media platform, a call-a-friend application, weather updates, a chat app, and a tourism platform into a single system regulated by a natural gesture controller, for ease of use and natural interaction. The gesture recognition problem was designed around 24 gestures, 13 static and 11 dynamic, that suit the environment. A dataset of RGB and depth image sequences was collected, preprocessed, and used to train the proposed deep learning architecture. A three-dimensional Convolutional Neural Network (3DCNN) followed by a Long Short-Term Memory (LSTM) model was used to extract the spatio-temporal features. After classification, a Finite State Machine (FSM) communicates with the model to control the class decision results based on the application context. The results suggest that combining depth and RGB data reaches an accuracy rate of 97.8% on eight selected gestures, while the FSM improved the recognition rate from 89% to 91% in real-time performance.
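
Note: one way an FSM can control class decisions by application context is to keep only the gestures that are valid in the current application state and pick the best-scoring one among them. The sketch below illustrates that idea; the states, gesture names, transitions, and scores are all hypothetical and are not taken from the paper.

```python
# Gestures that make sense in each application state (hypothetical).
ALLOWED = {
    "home": {"swipe_left", "swipe_right", "select"},
    "movie": {"play", "pause", "back"},
}

# State changes triggered by an accepted gesture (hypothetical).
TRANSITIONS = {
    ("home", "select"): "movie",
    ("movie", "back"): "home",
}

class GestureFSM:
    """Filters classifier scores by the current application context."""

    def __init__(self, state="home"):
        self.state = state

    def decide(self, scores):
        """Return the best-scoring gesture allowed in the current state.

        scores: dict mapping gesture name to classifier confidence.
        Returns None when no allowed gesture was scored.
        """
        allowed = {g: s for g, s in scores.items() if g in ALLOWED[self.state]}
        if not allowed:
            return None
        gesture = max(allowed, key=allowed.get)
        # Move to the next state if this gesture triggers a transition.
        self.state = TRANSITIONS.get((self.state, gesture), self.state)
        return gesture

fsm = GestureFSM()
# "pause" scores highest but is meaningless on the home screen, so it is ignored.
first = fsm.decide({"pause": 0.6, "select": 0.3, "swipe_left": 0.1})
second = fsm.decide({"pause": 0.6, "select": 0.3})
```

Masking out-of-context classes in this way is one plausible source of the kind of recognition-rate gain the abstract attributes to the FSM.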

7.
Int J Data Min Bioinform ; 10(2): 121-45, 2014.
Article in English | MEDLINE | ID: mdl-25796734

ABSTRACT

Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim to identify regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values, so missing value imputation is needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach that first imputes missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After the missing values are imputed, we perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate than existing missing value imputation methods on real microarray data sets. Furthermore, our approach also discovers more regulatory gene pairs that are known in the literature than other methods do.
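
Note: the abstract does not define the DTW-GO distance, so the sketch below shows only the standard Dynamic Time Warping component, which measures how well two expression time series align when one is allowed to stretch in time. The function name and the example profiles are illustrative; the GO term and the KNN imputation step are not shown.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] is the cheapest cumulative alignment cost of a[:i] and b[:j],
    using the absolute difference as the local cost.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: step in a only, step in b only, or both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical expression profiles: gene_b follows gene_a with a one-step delay.
gene_a = [1.0, 2.0, 3.0]
gene_b = [1.0, 2.0, 2.0, 3.0]
delayed = dtw_distance(gene_a, gene_b)
shifted = dtw_distance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In a KNN imputation scheme, a distance like this would typically be used to rank candidate neighbour genes, whose observed values then fill the gap; how the paper combines it with GO is not stated in the abstract.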


Subjects
Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Gene Ontology , Genes, Regulator/genetics , Oligonucleotide Array Sequence Analysis/methods , Protein Interaction Mapping/methods , Data Mining/methods , Gene Expression Regulation/genetics