Results 1 - 20 of 26
1.
Sensors (Basel) ; 23(3)2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36772705

ABSTRACT

Content-based image retrieval (CBIR) based on the bag-of-visual-words (BoVW) model has recently become one of the most promising and active research areas. In this paper, we propose a new CBIR framework based on the visual-word fusion of multiple feature descriptors to achieve improved retrieval performance, where interest points are separately extracted from an image using features from accelerated segment test (FAST) and speeded-up robust features (SURF). The extracted keypoints are then fused into a single keypoint feature vector, and the improved RootSIFT algorithm is applied to describe the region surrounding each keypoint. Afterward, the FeatureWiz algorithm is employed to reduce the features and select the best ones for the BoVW learning model. To create the codebook, K-means clustering is applied to quantize the visual features into a smaller set of visual words. Finally, the feature vectors extracted from the BoVW model are fed into a support vector machine (SVM) classifier for image retrieval. An inverted-index technique based on the cosine distance metric is applied to rank the retrieved images by their similarity to the query image. Experiments on three benchmark datasets (Corel-1000, Caltech-10 and Oxford Flower-17) show that the presented CBIR technique delivers results comparable to other state-of-the-art techniques, achieving average accuracies of 92.94%, 98.40% and 84.94% on these datasets, respectively.
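As a rough illustration of the pipeline described above (not the authors' code), the sketch below detects keypoints with FAST and describes them with RootSIFT via OpenCV and scikit-learn; SURF is patent-encumbered and lives in opencv-contrib, so it is omitted here, and the codebook size is an arbitrary assumption.

```python
# Hedged sketch of a FAST + RootSIFT + BoVW pipeline, assuming opencv-python
# >= 4.4 (SIFT in the main module) and scikit-learn.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def rootsift(descriptors, eps=1e-7):
    # RootSIFT: L1-normalize SIFT descriptors, then take the element-wise sqrt.
    descriptors /= (descriptors.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(descriptors)

def describe(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    keypoints = cv2.FastFeatureDetector_create().detect(gray, None)
    _, desc = cv2.SIFT_create().compute(gray, keypoints)
    return rootsift(desc.astype(np.float32))

def bovw_histogram(desc, kmeans):
    # Quantize descriptors into visual words and build a normalized histogram.
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float32)
    return hist / (hist.sum() + 1e-7)

# Codebook from all training descriptors (train_paths is a hypothetical list):
# all_desc = np.vstack([describe(p) for p in train_paths])
# kmeans = KMeans(n_clusters=500, n_init=10).fit(all_desc)
# X = np.array([bovw_histogram(describe(p), kmeans) for p in train_paths])
```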

2.
Sensors (Basel) ; 23(11)2023 May 24.
Article in English | MEDLINE | ID: mdl-37299770

ABSTRACT

Multimodal user interfaces promise natural and intuitive human-machine interactions. However, is the extra effort for developing a complex multisensor system justified, or can users also be satisfied with only one input modality? This study investigates interactions at an industrial weld inspection workstation. Three unimodal interfaces, including spatial interaction with buttons augmented on a workpiece or a worktable, and speech commands, were tested individually and in a multimodal combination. Within the unimodal conditions, users preferred the augmented worktable, but overall, the multimodal condition, in which individuals could freely use all input technologies, was ranked best. Our findings indicate that implementing and using multiple input modalities is valuable and that it is difficult to predict the usability of individual input modalities for complex systems.


Subjects
Technology, User-Computer Interface, Humans, Speech
3.
Sensors (Basel) ; 22(13)2022 Jul 01.
Article in English | MEDLINE | ID: mdl-35808487

ABSTRACT

Pain is a reliable indicator of health issues; it affects patients' quality of life when not well managed. Current methods in clinical practice are subject to biases and errors; moreover, they do not facilitate continuous pain monitoring. For this purpose, recent methodologies in automatic pain assessment have been introduced, demonstrating that pain can be measured and monitored objectively and robustly using behavioral cues and physiological signals. This paper focuses on introducing a reliable automatic system for continuous monitoring of pain intensity by analyzing behavioral cues, such as facial expressions and audio, and physiological signals, such as electrocardiogram (ECG), electromyogram (EMG), and electrodermal activity (EDA), from the X-ITE Pain Dataset. Several classification and regression experiments were conducted with 11 datasets derived from the database to reduce the impact of the imbalanced-database problem. In each single-modality (uni-modality) experiment, we used a Random Forest (RF) baseline method, a Long Short-Term Memory (LSTM) method, and an LSTM with a sample-weighting method (called LSTM-SW). Further, LSTM and LSTM-SW were used in fused-modality experiments (two modalities = bi-modality; all modalities = multi-modality). Sample weighting downweights misclassified samples during training to improve performance. The results confirmed that regression handles imbalanced datasets better than classification, that EDA is the best single modality, and that fused modalities improved performance significantly over the single modality in 10 out of 11 datasets.
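To make the LSTM-SW idea concrete, here is a minimal sketch assuming TensorFlow/Keras: an LSTM regressor is first fit normally, then samples with large residuals (a proxy for "misclassified") are downweighted and the model is fine-tuned with per-sample weights. The shapes and the weighting rule are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
import tensorflow as tf

n_samples, timesteps, n_features = 1000, 128, 40  # assumed dimensions
X = np.random.rand(n_samples, timesteps, n_features).astype("float32")
y = np.random.rand(n_samples).astype("float32")   # continuous pain intensity

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# First pass: fit, then downweight samples with large residuals and fine-tune.
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
residual = np.abs(model.predict(X, verbose=0).ravel() - y)
weights = np.where(residual > np.percentile(residual, 75), 0.5, 1.0)
model.fit(X, y, sample_weight=weights, epochs=3, batch_size=32, verbose=0)
```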


Subjects
Computer Neural Networks, Quality of Life, Electrocardiography, Humans, Pain, Pain Measurement/methods
4.
Sensors (Basel) ; 22(3)2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35161671

ABSTRACT

This paper presents an implementation of RoSA, a Robot System Assistant, for safe and intuitive human-machine interaction. The interaction modalities were chosen based on a previous Wizard of Oz study, which revealed a strong preference for speech and pointing gestures. Based on these findings, we design and implement a new multi-modal system for contactless human-machine interaction based on speech, facial, and gesture recognition. We evaluate the proposed system in an extensive study with multiple subjects to examine the user experience and interaction efficiency. The study shows that our method achieves usability scores similar to those of the entirely human-operated, remote-controlled robot interaction in our Wizard of Oz study. Furthermore, our framework is implemented on the Robot Operating System (ROS), providing modularity and extensibility for our multi-device, multi-user method.
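A minimal sketch of the ROS-style modularity mentioned above, assuming rospy and purely hypothetical topic names: each modality (speech, face, gesture) would run as its own node publishing to a topic, and a fusion node subscribes to all of them. This is an illustration of the pattern, not the RoSA codebase.

```python
import rospy
from std_msgs.msg import String

def on_command(msg):
    # A real fusion node would arbitrate between modalities here.
    rospy.loginfo("fused command: %s", msg.data)

rospy.init_node("rosa_fusion")
rospy.Subscriber("/rosa/speech", String, on_command)   # hypothetical topics
rospy.Subscriber("/rosa/gesture", String, on_command)
rospy.Subscriber("/rosa/face", String, on_command)
rospy.spin()
```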


Subjects
Robotics, Rosa, Gestures, Humans, Software, Speech
5.
Sensors (Basel) ; 21(11)2021 May 28.
Article in English | MEDLINE | ID: mdl-34071704

ABSTRACT

Vision-based 3D human pose estimation approaches are typically evaluated on datasets that are limited in diversity regarding many factors, e.g., subjects, poses, cameras, and lighting. However, for real-life applications, it would be desirable to create systems that work under arbitrary conditions ("in-the-wild"). To advance towards this goal, we investigated the commonly used datasets HumanEva-I, Human3.6M, and Panoptic Studio, discussed their biases (that is, their limitations in diversity), and illustrated them in cross-database experiments (which serve as a surrogate for roughly estimating in-the-wild performance). For this purpose, we first harmonized the differing skeleton joint definitions of the datasets, reducing the biases and systematic test errors in cross-database experiments. We further proposed a scale normalization method that significantly improved generalization across camera viewpoints, subjects, and datasets. In additional experiments, we investigated the effect of using more or fewer cameras, training with multiple datasets, applying a proposed anatomy-based pose validation step, and using OpenPose as the basis for the 3D pose estimation. The experimental results showed the usefulness of the joint harmonization, the scale normalization, and the augmentation with virtual cameras for significantly improving cross-database and in-database generalization. At the same time, the experiments showed that some dataset biases could not be compensated for, which calls for new datasets covering more diversity. We discussed our results and promising directions for future work.
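As an illustration of what pose scale normalization can look like (the paper's exact scheme may differ), the sketch below rescales each 3D skeleton so that the mean length of an assumed bone list equals 1 after centering on the root joint.

```python
import numpy as np

BONES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)]  # hypothetical edges

def normalize_scale(pose):
    """pose: (n_joints, 3) array of 3D joint positions."""
    lengths = [np.linalg.norm(pose[a] - pose[b]) for a, b in BONES]
    scale = np.mean(lengths)
    root = pose[0]                     # center on the root joint
    return (pose - root) / scale

pose = np.random.rand(7, 3)
print(normalize_scale(pose))           # skeleton with unit mean bone length
```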


Subjects
Three-Dimensional Imaging, Lighting, Factual Databases, Humans
6.
Sensors (Basel) ; 21(17)2021 Sep 02.
Article in English | MEDLINE | ID: mdl-34502809

ABSTRACT

Face and person detection are important tasks in computer vision, as they represent the first component in many recognition systems, such as face recognition, facial expression analysis, body pose estimation, face attribute detection, or human action recognition. Their detection rate and runtime are therefore crucial for the performance of the overall system. In this paper, we combine face and person detection in one framework with the goal of reaching a detection performance that is competitive with the state of the art of lightweight object-specific networks while maintaining real-time processing speed for both detection tasks together. To combine face and person detection in one network, we applied multi-task learning. The difficulty lies in the fact that no datasets are available that contain both face and person annotations. Since manual annotation is very time-consuming and automatic ground-truth generation yields annotations of poor quality, we solve this issue algorithmically by applying a special training procedure and network architecture, without the need to create new labels. Our newly developed method, called Simultaneous Face and Person Detection (SFPD), is able to detect persons and faces at 40 frames per second. Because of this good trade-off between detection performance and inference time, SFPD represents a useful and valuable real-time framework, especially for a multitude of real-world applications such as human-robot interaction.
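One common way to train a multi-task detector when no image carries both label types is loss masking: batches from the face-only dataset contribute only to the face loss, and vice versa. The numpy sketch below illustrates that idea under stated assumptions; it is not the SFPD training procedure itself.

```python
import numpy as np

def masked_multitask_loss(face_loss, person_loss, has_face, has_person):
    """Per-sample losses (arrays) and boolean masks for available labels."""
    face_term = np.where(has_face, face_loss, 0.0)
    person_term = np.where(has_person, person_loss, 0.0)
    # Normalize by the number of samples that actually carry each label type.
    n_face = max(has_face.sum(), 1)
    n_person = max(has_person.sum(), 1)
    return face_term.sum() / n_face + person_term.sum() / n_person

face_loss = np.random.rand(8)
person_loss = np.random.rand(8)
has_face = np.array([True] * 4 + [False] * 4)   # first half: face dataset
has_person = ~has_face                           # second half: person dataset
print(masked_multitask_loss(face_loss, person_loss, has_face, has_person))
```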


Subjects
Facial Recognition, Robotics, Facial Expression, Humans, Computer-Assisted Image Processing
7.
Sensors (Basel) ; 21(9)2021 May 10.
Article in English | MEDLINE | ID: mdl-34068462

ABSTRACT

Prior work on automated methods demonstrated that it is possible to recognize pain intensity from frontal faces in videos, while there is an assumption that humans are very adept at this task compared to machines. In this paper, we investigate whether this assumption is correct by comparing the results achieved by two human observers with the results achieved by a Random Forest classifier (RFc) baseline model (called RFc-BL) and by three proposed automated models. The first proposed model is a Random Forest classifying descriptors of Action Unit (AU) time series; the second is a modified MobileNetV2 CNN classifying face images that combine three points in time; and the third is a custom deep network combining two CNN branches using the same input as MobileNetV2 plus knowledge of the RFc. We conduct experiments with the X-ITE phasic pain database, which comprises videotaped responses to heat and electrical pain stimuli, each at three intensities. Distinguishing these six stimulation types plus no stimulation was the main 7-class classification task for the human observers and the automated approaches. Further, we conducted reduced 5-class and 3-class classification experiments and applied multi-task learning and a newly suggested sample weighting method. Experimental results show that the pain assessments of the human observers are significantly better than guessing and exceed the automatic baseline approach (RFc-BL) by about 1%; however, the human performance is quite poor, because pain that may ethically be induced in experimental studies often does not show up in facial reactions. We discovered that downweighting those samples during training improves the performance for all samples. The proposed RFc and two-CNNs models (using the proposed sample weighting) significantly outperformed the human observers by about 6% and 7%, respectively.
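A plausible shape for the first model is sketched below: simple statistical descriptors (mean, std, min, max, linear slope) of each AU time series feed a Random Forest. The descriptor choice and dimensions are assumptions for illustration, not the paper's exact features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def au_descriptor(series):
    """series: (timesteps, n_AUs) AU intensities for one video clip."""
    t = np.arange(series.shape[0])
    slope = np.polyfit(t, series, 1)[0]          # per-AU linear trend
    return np.concatenate([series.mean(0), series.std(0),
                           series.min(0), series.max(0), slope])

rng = np.random.default_rng(0)
X = np.array([au_descriptor(rng.random((120, 17))) for _ in range(200)])
y = rng.integers(0, 7, size=200)                 # 7-class stimulation task
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```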


Subjects
Facial Expression, Computer Neural Networks, Factual Databases, Humans, Pain, Pain Measurement
8.
Sensors (Basel) ; 19(12)2019 Jun 21.
Article in English | MEDLINE | ID: mdl-31234293

ABSTRACT

Experimental economic laboratories run many studies to test theoretical predictions against actual human behaviour, including public goods games. In this experiment, participants in a group have the option to invest money in a public account or to keep it. All the invested money is multiplied and then evenly distributed. This structure incentivizes free riding, resulting in contributions to the public good declining over time. Face-to-face communication (FFC) diminishes free riding and thus positively affects contribution behaviour, but how it does so has remained mostly unknown. In this paper, we investigate two communication channels, aiming to explain what promotes cooperation and discourages free riding. First, the facial expressions of the group in the 3-minute FFC videos are automatically analysed to predict the group behaviour towards the end of the game. The proposed automatic facial expression analysis approach uses a new group activity descriptor and random forest classification. Second, the contents of the FFC are investigated by categorising strategy-relevant topics and using meta-data. The results show that it is possible to predict whether the group will fully contribute until the end of the game based on facial expression data from three minutes of FFC, but deeper understanding requires a larger dataset. Facial expression analysis and content analysis found that FFC and talking until the very end had a significant, positive effect on the contributions.
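One simple way to build a group activity descriptor (an assumption here, not necessarily the paper's construction) is to pool per-member expression features across the group, e.g., by mean and variance, and classify the pooled vector with a Random Forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def group_descriptor(member_feats):
    """member_feats: (n_members, n_features) per-person expression statistics."""
    return np.concatenate([member_feats.mean(0), member_feats.var(0)])

rng = np.random.default_rng(0)
groups = [rng.random((4, 20)) for _ in range(60)]   # 60 groups of 4 players
X = np.array([group_descriptor(g) for g in groups])
y = rng.integers(0, 2, size=60)                     # fully contributes or not
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```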


Subjects
Communication, Facial Expression, Interpersonal Relations, Cooperative Behavior, Game Theory, Humans, Social Behavior
9.
Sensors (Basel) ; 18(9)2018 Aug 24.
Article in English | MEDLINE | ID: mdl-30149549

ABSTRACT

The majority of handwritten word recognition strategies are constructed on learning-based generative frameworks from letter or word training samples. Theoretically, constructing recognition models through discriminative learning should be the more effective alternative. The primary goal of this research is to compare the performance of discriminative and generative recognition strategies, represented by generatively trained hidden Markov models (HMMs), discriminatively trained conditional random fields (CRFs) and discriminatively trained hidden-state CRFs (HCRFs). With learning samples obtained from two dissimilar databases, we initially trained and applied an HMM classification scheme. To enable the HMM classifiers to effectively reject incorrect and out-of-vocabulary segmentations, we enhanced the models with adaptive threshold schemes. Aside from proposing such schemes for HMM classifiers, this research introduces CRF and HCRF classifiers for the recognition of offline Arabic handwritten words. Furthermore, the efficiency of all three strategies is fully assessed using the two databases. Recognition outcomes for both words and letters are presented, with the pros and cons of each strategy emphasized.
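A hedged sketch of generative HMM word classification with rejection, using hmmlearn: one GaussianHMM is trained per word class, and a test sequence is rejected as out-of-vocabulary when its best per-frame log-likelihood falls below a class-specific threshold estimated from training scores. The quantile-based threshold is an illustrative stand-in for the paper's adaptive scheme.

```python
import numpy as np
from hmmlearn import hmm

def train_word_model(sequences, n_states=5):
    """sequences: list of (timesteps, n_features) feature arrays for one word."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states).fit(X, lengths)
    # Adaptive threshold: a low quantile of per-frame training log-likelihoods.
    scores = [model.score(s) / len(s) for s in sequences]
    return model, np.quantile(scores, 0.05)

def classify(seq, models):
    """models: dict word -> (hmm, threshold); returns word or None (reject)."""
    scored = {w: m.score(seq) / len(seq) for w, (m, _) in models.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] >= models[best][1] else None
```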

10.
Sensors (Basel) ; 16(3)2016 Mar 11.
Article in English | MEDLINE | ID: mdl-26978368

ABSTRACT

Document analysis tasks such as pattern recognition, word spotting or segmentation require comprehensive databases for training and validation. Not only variations in writing style but also the word list used are important if training samples are to reflect the input of a specific area of application. However, the generation of training samples is expensive in terms of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, one of the most widely used scripts. Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods, each of which requires particular ground truth or samples to enable optimal training and validation; these are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates the corresponding detailed ground truth. We use these syntheses to validate a new segmentation-based system that recognizes handwritten Arabic words. We found that a modification of the Active Shape Model based character classifiers that we proposed earlier improves word recognition accuracy. Further improvements are achieved by using a vocabulary of the 50,000 most common Arabic words for error correction.
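To illustrate vocabulary-based error correction in the simplest possible form (an assumption, not the paper's method), the sketch below snaps a recognized word to its closest vocabulary entry using difflib's string similarity, leaving the word unchanged when nothing is similar enough; the tiny word list stands in for the 50,000-word vocabulary.

```python
import difflib

VOCAB = ["كتاب", "مكتبة", "كاتب", "مدرسة"]  # tiny stand-in for the 50k list

def correct(word, vocab=VOCAB, cutoff=0.7):
    # Return the closest vocabulary word, or the input unchanged if nothing
    # is similar enough (so valid out-of-vocabulary output survives).
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct("كتب"))  # a plausible misrecognition snaps back to "كتاب"
```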

11.
Sensors (Basel) ; 16(3): 283, 2016 Feb 24.
Article in English | MEDLINE | ID: mdl-26927106

ABSTRACT

We propose a novel method for registration of partly overlapping three-dimensional surface measurements for stereo-based optical sensors using fringe projection. Based on two-dimensional texture matching, it allows global registration of surfaces with poor and ambiguous three-dimensional features, which are common to surface inspection applications. No prior information about relative sensor position is necessary, which makes our approach suitable for semi-automatic and manual measurement. The algorithm is robust and works with challenging measurements, including uneven illumination, surfaces with specular reflection as well as sparsely textured surfaces. We show that precisions of 1 mm and below can be achieved along the surfaces, which is necessary for further local 3D registration.
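As a conceptual illustration of texture-guided global registration (substituting ORB matching for whatever matcher the authors actually use), the sketch below estimates a robust 2D similarity transform between the two texture images; such a transform can then seed local 3D alignment. All parameters are assumptions.

```python
import cv2
import numpy as np

def match_textures(tex_a, tex_b):
    """tex_a, tex_b: grayscale texture images from two overlapping measurements."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(tex_a, None)
    kp_b, des_b = orb.detectAndCompute(tex_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    src = np.float32([kp_a[m.queryIdx].pt for m in matches])
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches])
    # Robustly estimate a 2D similarity transform from the matched points.
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M
```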

12.
ScientificWorldJournal ; 2015: 323575, 2015.
Article in English | MEDLINE | ID: mdl-26295059

ABSTRACT

Document analysis tasks such as text recognition, word spotting, or segmentation are highly dependent on comprehensive and suitable databases for training and validation. However, the generation of such databases is expensive in terms of labor and time. As a matter of fact, there is a lack of them, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28,046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever insufficient ground-truthed natural data is available.
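The B-spline smoothing step can be illustrated with SciPy, assuming a character skeleton given as a 2D point sequence; splprep fits a smoothing spline to the pen path and splev samples it densely. The point data and smoothing factor are made up for the example.

```python
import numpy as np
from scipy.interpolate import splprep, splev

points = np.array([[0, 0], [1, 2], [3, 3], [5, 2], [6, 0]], dtype=float)
tck, _ = splprep([points[:, 0], points[:, 1]], s=0.5)   # fit smoothing spline
u = np.linspace(0, 1, 100)
x_smooth, y_smooth = splev(u, tck)                      # dense, smooth stroke
```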

13.
Sensors (Basel) ; 15(9): 20945-66, 2015 Aug 26.
Article in English | MEDLINE | ID: mdl-26343651

ABSTRACT

Head pose estimation is a crucial initial task for human face analysis and is employed in several computer vision systems, such as facial expression recognition, head gesture recognition, and yawn detection. In this work, we propose a frame-based approach to estimate the head pose on top of the Viola and Jones (VJ) Haar-like face detector. Several appearance- and depth-based feature types are employed for the pose estimation, and comparisons between them in terms of accuracy and speed are presented. This work clearly shows that using depth data improves the accuracy of head pose estimation. Additionally, we can spot positive detections (faces in profile view detected by the frontal model) that are wrongly cropped due to background disturbances. We introduce a new depth-based feature descriptor that provides competitive estimation results at a lower computation time. Evaluation on a benchmark Kinect database shows that the histogram of oriented gradients and the developed depth-based features are the most distinctive for head pose estimation, comparing favorably to the current state-of-the-art approaches. Using a concatenation of the aforementioned feature types, we achieved head pose estimation with average errors not exceeding 5.1°, 4.6°, and 4.2° for pitch, yaw, and roll angles, respectively.
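As an illustration of the HOG branch of such a pipeline (patch size, HOG parameters, and the choice of regressor are all assumptions), the sketch below extracts HOG descriptors from cropped face patches with scikit-image and fits one regressor per pose angle:

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
patches = rng.random((50, 64, 64))            # stand-in face crops (depth or gray)
angles = rng.uniform(-60, 60, size=(50, 3))   # pitch, yaw, roll labels

X = np.array([hog(p, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2)) for p in patches])
models = [RandomForestRegressor(n_estimators=50).fit(X, angles[:, i])
          for i in range(3)]                  # one regressor per angle
```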


Subjects
Biometry/methods, Face/anatomy & histology, Head/physiology, Computer-Assisted Image Processing/methods, Automated Pattern Recognition/methods, Posture/physiology, Algorithms, Artificial Intelligence, Humans
14.
IEEE Trans Image Process ; 33: 2377-2387, 2024.
Article in English | MEDLINE | ID: mdl-38512742

ABSTRACT

Estimating the head pose of a person is a crucial problem for numerous applications, yet it is mainly addressed as a subtask of frontal pose prediction. We present a novel method for unconstrained end-to-end head pose estimation to tackle the challenging task of full-range-of-orientation head pose prediction. We address the issue of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data and propose a continuous 6D rotation matrix representation for efficient and robust direct regression. This allows the network to efficiently learn the appearance of full rotations and to overcome the limitations of the current state of the art. Together with newly accumulated training data that provides full head pose rotation coverage and a geodesic loss approach for stable learning, we design an advanced model that is able to predict an extended range of head orientations. An extensive evaluation on public datasets demonstrates that our method significantly outperforms other state-of-the-art methods in an efficient and robust manner, while its extended prediction range opens up new application areas. We open-source our training and testing code along with our trained models: https://github.com/thohemp/6DRepNet360.
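The continuous 6D representation and the geodesic loss are standard constructions that can be shown concretely: the first two columns of the rotation matrix are predicted freely and re-orthonormalized by Gram-Schmidt, and the geodesic distance is the rotation angle between predicted and ground-truth matrices. The numpy version below is for clarity; training code would use autograd tensors.

```python
import numpy as np

def six_d_to_rotation(r6):
    a1, a2 = r6[:3], r6[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1       # remove the component along b1
    b2 /= np.linalg.norm(b2)
    b3 = np.cross(b1, b2)               # complete the right-handed basis
    return np.stack([b1, b2, b3], axis=1)

def geodesic_distance(R1, R2):
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))   # rotation angle between them

R = six_d_to_rotation(np.array([1.0, 0.1, 0.0, 0.0, 1.0, 0.2]))
print(geodesic_distance(R, np.eye(3)))
```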

15.
Front Robot AI ; 11: 1347985, 2024.
Article in English | MEDLINE | ID: mdl-38686339

ABSTRACT

Visual simultaneous localization and mapping (V-SLAM) plays a crucial role in the field of robotic systems, especially for interactive and collaborative mobile robots. The growing reliance on robotics has increased the complexity of task execution in real-world applications. Consequently, several types of V-SLAM methods have been developed to facilitate and streamline the functions of robots. This work aims to showcase the latest V-SLAM methodologies, offering clear selection criteria for researchers and developers to choose the right approach for their robotic applications. It chronologically presents the evolution of SLAM methods, highlighting key principles and providing comparative analyses between them. The paper focuses on the integration of the robotic ecosystem with the Robot Operating System (ROS) as middleware, explores essential V-SLAM benchmark datasets, and presents demonstrative figures for each method's workflow.

16.
Life (Basel) ; 13(9)2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37763232

ABSTRACT

This study focuses on improving healthcare quality by introducing an automated system that continuously monitors patient pain intensity. The system analyzes the electrodermal activity (EDA) sensor modality, compares the results obtained from the EDA and facial expression modalities, and performs late fusion of the two. This work extends our previous studies of pain intensity monitoring via an expanded analysis of the two informative methods. The EDA sensor modality and facial expression analysis play a prominent role in pain recognition; the extracted features reflect the patient's responses to different pain levels. Three different approaches were applied: Random Forest (RF) baseline methods, a Long Short-Term Memory network (LSTM), and an LSTM with a sample-weighting method (LSTM-SW). Evaluation metrics included the micro-average F1-score for classification, and the mean squared error (MSE) and the intraclass correlation coefficient (ICC [3, 1]) for both classification and regression. The results highlight the effectiveness of late fusion of EDA and facial expressions, particularly on almost balanced datasets (micro-average F1-score around 61%, ICC about 0.35). EDA regression models, particularly LSTM and LSTM-SW, showed superiority on imbalanced datasets and outperformed guessing (where the majority vote indicates no pain) and the baseline methods (Random Forest classifier (RFc) and Random Forest regression (RFr)). In conclusion, integrating both modalities, or utilizing EDA alone, can provide medical centers with reliable and valuable insights into patients' pain experiences and responses.
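Late fusion in its simplest form means combining the independently produced predictions of the two modality models at decision time. The sketch below averages the two score streams with a tunable weight; the weighting is an illustrative assumption, not the paper's fusion rule.

```python
import numpy as np

def late_fusion(eda_scores, face_scores, w_eda=0.5):
    """Per-sample continuous pain-intensity predictions from each modality."""
    return w_eda * eda_scores + (1.0 - w_eda) * face_scores

eda_scores = np.array([0.2, 0.7, 0.4])   # hypothetical EDA-model outputs
face_scores = np.array([0.3, 0.6, 0.8])  # hypothetical face-model outputs
print(late_fusion(eda_scores, face_scores))
```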

17.
Front Psychiatry ; 14: 1227426, 2023.
Article in English | MEDLINE | ID: mdl-38188049

ABSTRACT

The sudden appearance and devastating effects of the COVID-19 pandemic resulted in the need for multiple adaptive changes in societies, business operations and healthcare systems across the world. This review describes the development and increased use of digital technologies such as chatbots, electronic diaries, online questionnaires and even video gameplay to maintain effective treatment standards for individuals with mental health conditions such as depression, anxiety and post-traumatic stress disorder. We describe how these approaches have been applied to help meet the challenges of the pandemic in delivering mental healthcare solutions. The main focus of this narrative review is on describing how these digital platforms have been used in diagnostics, patient monitoring and as a treatment option for the general public, as well as for frontline medical staff suffering from mental health issues.

18.
Heliyon ; 8(11): e11397, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36387580

ABSTRACT

Vehicular accident prediction and detection has recently attracted considerable attention in machine learning and related areas, owing to its application potential in the development of Intelligent Transportation Systems (ITS), which play a pivotal role in the success of emerging smart cities. In this paper, we present a new vision-based framework for real-time vehicular accident prediction and detection, based on motion temporal templates and fuzzy time-slicing. The presented framework proceeds in a stepwise fashion, starting with automatically detecting moving objects (i.e., on-road vehicles or roadside pedestrians), followed by dynamically tracking the detected moving objects based on temporal templates, clustering and supervised learning. Then, an extensive set of local features is extracted from the temporal templates of the moving objects. Finally, an effective deep neural network (DNN) model is trained on the extracted features to detect abnormal vehicle behavioral patterns and thus predict an accident just before it occurs. Experiments on real-world vehicular accident videos demonstrate that the framework yields promising results, achieving a hit rate of 98.5% with a false alarm rate of 4.2%, which compares very favorably to existing approaches while still delivering delay guarantees for real-time traffic monitoring and surveillance applications.
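A motion temporal template (motion history image) can be sketched in a few lines of numpy: each new frame's motion mask is stamped at full intensity while older motion decays, encoding where and how recently movement happened. The decay rate and motion threshold below are assumptions for illustration.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, decay=0.9, motion_thresh=25):
    """All images are 2D grayscale arrays of equal shape."""
    motion = np.abs(frame.astype(float) - prev_frame.astype(float)) > motion_thresh
    mhi = mhi * decay                 # fade out old motion
    mhi[motion] = 1.0                 # stamp current motion at full strength
    return mhi

frames = [np.random.randint(0, 255, (120, 160)) for _ in range(10)]
mhi = np.zeros((120, 160))
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)
```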

19.
Comput Biol Med ; 137: 104781, 2021 10.
Article in English | MEDLINE | ID: mdl-34455303

ABSTRACT

Recently, automatic computer-aided detection (CAD) of COVID-19 using radiological images has received a great deal of attention from researchers and medical practitioners, and consequently several CAD frameworks and methods have been presented in the literature to assist radiologists in performing diagnostic COVID-19 tests quickly, reliably and accurately. This paper presents an innovative framework for the automatic detection of COVID-19 from chest X-ray (CXR) images, in which a rich and effective representation of lung tissue patterns is generated from textural features based on the gray-level co-occurrence matrix (GLCM). The input CXR image is first preprocessed by spatial filtering along with median filtering and contrast-limited adaptive histogram equalization to improve image quality and reduce noise. Automatic thresholding with an optimized formulation of Otsu's method is applied to find a proper threshold value to best segment the lung regions of interest (ROIs) from the CXR images. Then, a concise set of GLCM-based texture features is extracted to accurately represent the segmented lung ROIs of each CXR image. Finally, the normalized features are fed into a trained discriminative latent-dynamic conditional random fields (LDCRF) model for fine-grained classification into two categories: COVID-19 and non-COVID-19. The presented method has been experimentally tested and validated on a relatively large dataset of frontal CXR images, achieving an average accuracy, precision, recall, and F1-score of 95.88%, 96.17%, 94.45%, and 95.79%, respectively, which compare favorably with and occasionally exceed those previously reported in similar studies.
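The GLCM texture step can be illustrated with scikit-image, where plain Otsu thresholding stands in for the paper's optimized variant and the chosen GLCM properties are common defaults rather than the paper's exact feature set:

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.feature import graycomatrix, graycoprops

cxr = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-in CXR
roi_mask = cxr < threshold_otsu(cxr)                          # dark lung regions
roi = np.where(roi_mask, cxr, 0).astype(np.uint8)

glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = np.hstack([graycoprops(glcm, prop).ravel()
                      for prop in ("contrast", "homogeneity",
                                   "energy", "correlation")])
```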


Assuntos
COVID-19 , Humanos , SARS-CoV-2
20.
Brain Sci ; 11(2)2021 Feb 14.
Article in English | MEDLINE | ID: mdl-33672978

ABSTRACT

Due to their high distinctiveness, robustness to illumination changes and simple computation, Histogram of Oriented Gradients (HOG) features have attracted much attention and achieved remarkable success in many computer vision tasks. In this paper, an innovative framework for driver drowsiness detection is proposed, in which an adaptive descriptor that combines distinctiveness, robustness and compactness is formed from an improved version of HOG features based on binarized histograms of shifted orientations. The final descriptor generated from the binarized HOG features is fed to a trained Naïve Bayes (NB) classifier to make the final driver drowsiness determination. Experimental results on the publicly available NTHU-DDD dataset verify that the proposed framework is a strong contender against several state-of-the-art baselines, achieving a competitive detection accuracy of 85.62% without loss of efficiency or stability.
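The binarization idea can be sketched as follows: standard HOG histograms are thresholded to bits (here, 1 where a bin exceeds the descriptor mean — an assumed rule, not the paper's shifted-orientation scheme) and fed to a Naive Bayes classifier suited to binary features:

```python
import numpy as np
from skimage.feature import hog
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
eyes = rng.random((100, 48, 48))           # stand-in eye/face patches
drowsy = rng.integers(0, 2, size=100)      # 0 = alert, 1 = drowsy labels

def binarized_hog(patch):
    h = hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(1, 1))
    return (h > h.mean()).astype(np.uint8)  # compact binary descriptor

X = np.array([binarized_hog(p) for p in eyes])
clf = BernoulliNB().fit(X, drowsy)
```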
