Results 1 - 13 of 13
1.
Sensors (Basel) ; 22(12)2022 Jun 16.
Article in English | MEDLINE | ID: mdl-35746341

ABSTRACT

Sign language is the main channel through which hearing-impaired people communicate with others. It is a visual language that conveys highly structured manual and non-manual components, which makes it effortful for hearing people to master. Sign language recognition aims to reduce this difficulty and bridge the communication gap between hearing-impaired people and others. This study presents an efficient architecture for sign language recognition based on a graph convolutional network (GCN). The presented architecture consists of a few separable 3DGCN layers, which are enhanced by a spatial attention mechanism. The limited number of layers enables the architecture to avoid the over-smoothing problem common in deep graph neural networks, while the attention mechanism enhances the spatial context representation of the gestures. The proposed architecture is evaluated on several datasets and shows outstanding results.
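The core operation described above, one spatial graph-convolution step over skeleton joints followed by a spatial attention rescaling, can be sketched as follows. The 3-joint graph, feature values, and weights are illustrative assumptions, not the paper's actual configuration.

```python
import math

# Adjacency with self-loops for a toy 3-joint chain: 0-1-2
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# Symmetric normalization: A_hat[i][j] = A[i][j] / sqrt(deg_i * deg_j)
deg = [sum(row) for row in A]
A_hat = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(3)] for i in range(3)]

def gcn_layer(X, W):
    """One graph convolution: propagate features over A_hat, then project by W."""
    AX = [[sum(A_hat[i][k] * X[k][c] for k in range(3)) for c in range(len(X[0]))]
          for i in range(3)]
    return [[sum(AX[i][c] * W[c][o] for c in range(len(W))) for o in range(len(W[0]))]
            for i in range(3)]

def spatial_attention(X):
    """Softmax over joints of mean absolute feature value; rescales each joint."""
    scores = [sum(abs(v) for v in row) / len(row) for row in X]
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [[e[i] / z * v for v in X[i]] for i in range(3)]

X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # 3 joints x 2 channels
W = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for clarity
out = spatial_attention(gcn_layer(X, W))
assert len(out) == 3 and len(out[0]) == 2
```

The separable 3D variant in the paper factorizes this spatial step from the temporal one; the sketch shows only the spatial half.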


Subject(s)
Neural Networks, Computer; Sign Language; Gestures; Humans; Language; Recognition, Psychology
2.
Sensors (Basel) ; 21(6)2021 Mar 18.
Article in English | MEDLINE | ID: mdl-33803891

ABSTRACT

Human activity recognition (HAR) remains a challenging yet crucial problem in computer vision. HAR is primarily intended to be used with other technologies, such as the Internet of Things, to assist in healthcare and eldercare. With the development of deep learning, automatic high-level feature extraction has become possible and has been used to optimize HAR performance, and deep-learning techniques have been applied to sensor-based HAR in various fields. This study introduces a new methodology using convolutional neural networks (CNNs) with varying kernel dimensions, along with bi-directional long short-term memory (BiLSTM), to capture features at various resolutions. The novelty of this research lies in the effective selection of the optimal input representation and in the effective extraction of spatial and temporal features from sensor data using a traditional CNN and BiLSTM. The wireless sensor data mining (WISDM) and UCI datasets are used for the proposed methodology; their data are collected through diverse means, including accelerometers and gyroscopes. The results indicate that the proposed scheme is efficient in improving HAR: unlike other available methods, the proposed method improved accuracy, attaining a higher score on the WISDM dataset than on the UCI dataset (98.53% vs. 97.05%).
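A model like the one above consumes fixed-length windows of raw sensor streams. This is a minimal sketch of the standard sliding-window segmentation step that precedes the CNN-BiLSTM; the window length and overlap are illustrative assumptions, not the values used in the paper.

```python
def sliding_windows(signal, win_len, step):
    """Split a 1-D sensor stream into overlapping fixed-length windows."""
    return [signal[i:i + win_len]
            for i in range(0, len(signal) - win_len + 1, step)]

# 10 s of one accelerometer axis at 20 Hz (WISDM samples at 20 Hz)
stream = [0.1 * t for t in range(200)]
windows = sliding_windows(stream, win_len=40, step=20)  # 2 s windows, 50% overlap

assert all(len(w) == 40 for w in windows)
assert len(windows) == 9
```

Each window (stacked across the three accelerometer axes) would then be one training example for the network.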


Subject(s)
Deep Learning; Data Mining; Human Activities; Humans; Memory, Long-Term; Neural Networks, Computer
3.
Sensors (Basel) ; 21(4)2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33572169

ABSTRACT

This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in the speech spectrogram. The proposed system, called AFD-Obj, recognizes and localizes sequences of multi-label AFs in the speech signal. AFD-Obj consists of two main stages. First, we formulate AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a spectral three-channel image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, from which it recognizes and localizes multi-label AFs, and we investigate using these AFs to detect the utterance's phonemes. The YOLOv3-tiny detector is selected because of its real-time performance and its support for multi-label detection. We test our AFD-Obj system on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (PD-Obj) to recognize and localize sequences of Arabic phonemes from whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that one-scale detection suffices for AF detection and phoneme recognition.
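One way to build the three-channel spectral image mentioned in the first stage, sketched here as an assumption rather than the paper's exact recipe, is to stack log-magnitude spectrograms computed with three different window lengths, giving the detector multiple time-frequency resolutions.

```python
import math

def log_spectrogram(x, win, hop):
    """Naive DFT-based log-magnitude spectrogram (one channel)."""
    frames = [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]
    spec = []
    for f in frames:
        row = []
        for k in range(win // 2):
            re = sum(f[n] * math.cos(2 * math.pi * k * n / win) for n in range(win))
            im = -sum(f[n] * math.sin(2 * math.pi * k * n / win) for n in range(win))
            row.append(math.log(re * re + im * im + 1e-10))
        spec.append(row)
    return spec

# Toy waveform; three window lengths -> three "image" channels
x = [math.sin(2 * math.pi * 0.1 * n) for n in range(256)]
channels = [log_spectrogram(x, win, hop=32) for win in (32, 64, 128)]
assert len(channels) == 3
```

In practice each channel would be resized to a common image shape before being fed to the detector.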

4.
J Med Syst ; 40(1): 20, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26531753

ABSTRACT

Voice disorders are associated with irregular vibrations of the vocal folds. Based on the source-filter theory of speech production, these irregular vibrations can be detected non-invasively by analyzing the speech signal. In this paper we present a multiband approach for the detection of voice disorders, motivated by the fact that the voice source generally interacts with the vocal tract in a non-linear way. In normal phonation, and assuming a sustained vowel, the lower frequencies of speech are heavily source-dependent due to the low-frequency glottal formant, while the higher frequencies are less dependent on the source signal. During abnormal phonation this model remains valid, but turbulent noise at the source, caused by the irregular vibration, also affects the higher frequencies. Motivated by this model, we propose a multiband approach based on a three-level discrete wavelet transform (DWT), in which the fractal dimension (FD) of the estimated power spectrum is computed in each band. The experiments suggest that the 1-1562 Hz band (the lowest frequencies after level 3) exhibits a significant difference between the spectra of normal and pathological subjects. With this band, a detection rate of 91.28% is obtained with a single feature, higher than that of any other frequency band. Moreover, an accuracy of 92.45% and an area under the receiver operating characteristic curve (AUC) of 95.06% are obtained when the FDs of all levels are fused. Likewise, when the FDs of all levels are combined with 22 Multi-Dimensional Voice Program (MDVP) parameters, improvements of 2.26% in accuracy and 1.45% in AUC are observed.
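The three-level DWT band split described above can be sketched with an unnormalized Haar wavelet (the paper's actual wavelet choice is not stated here). Assuming a 25 kHz sampling rate, the level-3 approximation band spans 0 to fs/16 = 1562.5 Hz, matching the 1-1562 Hz band in the abstract; the FD would then be computed on each band's power spectrum.

```python
def haar_step(x):
    """One unnormalized Haar DWT level: pairwise averages and differences."""
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return approx, detail

def dwt3(x):
    """Three-level decomposition: detail bands d1, d2, d3, then approximation a3."""
    bands, a = [], x
    for _ in range(3):
        a, d = haar_step(a)
        bands.append(d)
    bands.append(a)  # lowest band: 0 .. fs / 16
    return bands

fs = 25000                       # assumed sampling rate
x = [float(i % 7) for i in range(64)]
bands = dwt3(x)
assert [len(b) for b in bands] == [32, 16, 8, 8]
assert fs / 16 == 1562.5         # approximation band upper edge
```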


Subject(s)
Fractals; Voice Disorders/diagnosis; Voice Disorders/physiopathology; Wavelet Analysis; Algorithms; Humans; Voice/physiology
5.
Diagnostics (Basel) ; 12(4)2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35454043

ABSTRACT

Electroencephalography-based motor imagery (EEG-MI) classification is a critical component of brain-computer interfaces (BCIs), which enable people with physical limitations to communicate with the outside world via assistive technology. EEG decoding is challenging, however, because of the complexity, dynamic nature, and low signal-to-noise ratio of the EEG signal, and developing an end-to-end architecture capable of correctly extracting high-level features from EEG data remains difficult. This study introduces a new model for decoding MI, a Multi-Branch EEGNet with squeeze-and-excitation blocks (MBEEGSE). By explicitly modeling channel interdependencies, the multi-branch CNN model with attention blocks adaptively recalibrates channel-wise feature responses. Compared to existing state-of-the-art EEG motor imagery classification models, the proposed model achieves good accuracy with fewer parameters: 82.87% on the BCI-IV2a motor imagery dataset and 96.15% on the High Gamma dataset.
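A squeeze-and-excitation (SE) block of the kind attached to MBEEGSE's branches can be sketched as below: squeeze (global average pool over time), excite (two small fully connected layers with ReLU then sigmoid), and rescale each channel. The shapes, weights, and reduction ratio are illustrative assumptions.

```python
import math

def se_block(x, w1, w2):
    """x: channels x time. Returns x with each channel scaled by a learned gate."""
    squeezed = [sum(ch) / len(ch) for ch in x]                         # C values
    hidden = [max(0.0, sum(w1[j][c] * squeezed[c] for c in range(len(squeezed))))
              for j in range(len(w1))]                                 # C/r values
    gates = [1 / (1 + math.exp(-sum(w2[c][j] * hidden[j] for j in range(len(hidden)))))
             for c in range(len(x))]                                   # in (0, 1)
    return [[g * v for v in ch] for g, ch in zip(gates, x)]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 channels x 3 time samples
w1 = [[0.5, 0.5]]                       # reduce 2 -> 1
w2 = [[1.0], [-1.0]]                    # expand 1 -> 2
out = se_block(x, w1, w2)
assert len(out) == 2
```

Because the gates are sigmoids, each output channel is a damped copy of its input; channels the block deems informative are suppressed less.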

6.
Biomed Eng Online ; 10: 41, 2011 May 30.
Article in English | MEDLINE | ID: mdl-21624137

ABSTRACT

BACKGROUND AND OBJECTIVE: There has been growing interest in the objective assessment of speech in dysphonic patients for classifying the type and severity of voice pathologies using automatic speech recognition (ASR). The aim of this work was to study the accuracy of a conventional ASR system (with a Mel-frequency cepstral coefficient (MFCC) based front end and a hidden Markov model (HMM) based back end) in recognizing the speech of people with pathological voice. MATERIALS AND METHODS: The speech samples of 62 dysphonic patients with six different types of voice disorders and 50 normal subjects were analyzed. The Arabic spoken digits were taken as input. The distribution of the first four formants of the vowel /a/ was extracted to examine deviation of the formants from normal. RESULTS: A recognition accuracy of 100% was obtained for Arabic digits spoken by normal speakers. However, there was a significant loss of accuracy when the digits were spoken by voice-disordered subjects. Moreover, no significant improvement in ASR performance was achieved after assessing a subset of the individuals with disordered voices who had undergone treatment. CONCLUSION: The results of this study revealed that the current ASR technique is not a reliable tool for recognizing the speech of dysphonic patients.
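A core piece of the MFCC front end mentioned above is the triangular mel filterbank. This is a minimal sketch of building one; the filter count, FFT size, and sampling rate are illustrative assumptions, not the study's settings.

```python
import math

def hz_to_mel(f):
    return 2595 * math.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale from 0 to fs/2."""
    top = hz_to_mel(fs / 2)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / fs) for m in mel_pts]
    bank = []
    for i in range(1, n_filters + 1):
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[i - 1], bins[i]):        # rising edge
            filt[k] = (k - bins[i - 1]) / max(1, bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):        # falling edge
            filt[k] = (bins[i + 1] - k) / max(1, bins[i + 1] - bins[i])
        bank.append(filt)
    return bank

bank = mel_filterbank(n_filters=20, n_fft=512, fs=16000)
assert len(bank) == 20
```

MFCCs would then be the DCT of the log filterbank energies of each frame.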


Subject(s)
Dysphonia/diagnosis; Dysphonia/physiopathology; Phonetics; Speech Recognition Software; Speech; Adolescent; Adult; Automation; Female; Humans; Language; Male; Middle Aged; Young Adult
7.
Data Brief ; 26: 104514, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31667277

ABSTRACT

The date palm is one of the most valuable fruit trees in the world. Most methods used for date fruit inspection, harvesting, grading, and classification are manual, which makes them inefficient in terms of both time and cost. Research on automated date fruit harvesting is limited, as there is no public date fruit dataset to support it. In this work, we present a comprehensive date fruit dataset that the research community can use for multiple tasks, including automated harvesting, visual yield estimation, and classification. The dataset contains images of date fruit bunches of different varieties, captured at different pre-maturity and maturity stages. These images cover multiple sources of variation, such as multi-scale imaging, variable illumination, and different bagging states. We also marked date bunches for selected palms, measured the weights of the bunches, captured their images on graph paper, and recorded 360° video of the palms. This dataset can help advance research on automating date palm agricultural applications, including robotic harvesting, fruit detection and classification, maturity analysis, and weight/yield estimation. The dataset is freely and publicly available to the research community in the IEEE DataPort repository [1] (https://doi.org/10.21227/x46j-sk98).

8.
J Voice ; 31(1): 3-15, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26992554

ABSTRACT

OBJECTIVES AND BACKGROUND: Automatic voice pathology detection and classification systems contribute effectively to the assessment of voice disorders, helping clinicians to detect the existence of a voice pathology, and its type, in the early stages. This work concentrates on developing an accurate and robust feature extraction method for detecting and classifying voice pathologies by investigating different frequency bands using correlation functions. We extracted the maximum peak values and their corresponding lag values from each frame of a voiced signal by using correlation functions as features to detect and classify pathological samples. These features are investigated in different frequency bands to assess each band's contribution to the detection and classification processes. MATERIAL AND METHODS: Various samples of the sustained vowel /a/ from normal and pathological voices were extracted from three different databases: English, German, and Arabic. A support vector machine was used as the classifier. We also performed a t test to investigate significant differences between the means of the normal and pathological samples. RESULTS: The best achieved accuracies in both detection and classification varied depending on the band, the correlation function, and the database. The most contributive bands in both detection and classification were between 1000 and 8000 Hz. In detection, the highest accuracies obtained with cross-correlation were 99.809%, 90.979%, and 91.168% for the Massachusetts Eye and Ear Infirmary, Saarbruecken Voice Database, and Arabic Voice Pathology Database, respectively. In classification, the highest accuracies obtained with cross-correlation were 99.255%, 98.941%, and 95.188% for the same three databases.
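The per-frame feature described above, the maximum correlation peak and its lag, can be sketched with a plain normalized autocorrelation of one synthetic frame. The frame length, lag range, and the band-filtering step are illustrative assumptions.

```python
import math

def autocorr_peak(frame, min_lag, max_lag):
    """Return (peak value, lag) of the normalized autocorrelation, searching
    lags in [min_lag, max_lag] to skip the trivial lag-0 peak."""
    energy = sum(v * v for v in frame) or 1.0
    best_val, best_lag = -1.0, min_lag
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame))) / energy
        if r > best_val:
            best_val, best_lag = r, lag
    return best_val, best_lag

fs, f0 = 8000, 200  # a perfectly periodic 200 Hz frame
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
val, lag = autocorr_peak(frame, min_lag=20, max_lag=200)
assert lag == 40    # period = fs / f0 = 40 samples
```

For a pathological voice the peak value tends to drop and the lag becomes unstable across frames, which is what makes the pair discriminative.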


Subject(s)
Acoustics; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Production Measurement/methods; Speech-Language Pathology/methods; Voice Disorders/diagnosis; Voice Quality; Databases, Factual; Humans; Pattern Recognition, Automated; Voice Disorders/classification; Voice Disorders/physiopathology
9.
J Healthc Eng ; 2017: 8783751, 2017.
Article in English | MEDLINE | ID: mdl-29201333

ABSTRACT

A voice disorder database is an essential element in research on automatic voice disorder detection and classification. Ethnicity affects the voice characteristics of a person, so it is necessary to develop a database by collecting voice samples from the targeted ethnic group. Understanding the characteristics of a local group in this way improves the chances of arriving at a global solution for the accurate and reliable diagnosis of voice disorders. Motivated by this idea, an Arabic voice pathology database (AVPD) is designed and developed in this study by recording three vowels, running speech, and isolated words. For each recorded sample, the perceptual severity is also provided, which is a unique aspect of the AVPD. During the development of the AVPD, the shortcomings of other voice disorder databases were identified so that they could be avoided. In addition, the AVPD is evaluated using six different types of speech features and four types of machine learning algorithms. The detection and classification results obtained with the sustained vowel and the running speech are also compared with the results of an English-language disorder database, the Massachusetts Eye and Ear Infirmary (MEEI) database.


Subject(s)
Diagnosis, Computer-Assisted; Language; Pattern Recognition, Automated; Speech Acoustics; Speech Production Measurement/methods; Voice Disorders/diagnosis; Voice Quality; Acoustics; Adult; Algorithms; Databases, Factual; Female; Humans; Laryngoscopy; Machine Learning; Male; Middle Aged; Reproducibility of Results; Saudi Arabia; Signal Processing, Computer-Assisted; Video Recording; Voice; Young Adult
10.
J Voice ; 31(3): 386.e1-386.e8, 2017 May.
Article in English | MEDLINE | ID: mdl-27745756

ABSTRACT

A large population around the world has voice complications. Various approaches for subjective and objective evaluation have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of the clinician, and human error cannot be neglected; the objective or automatic approach, on the other hand, is noninvasive. Automatic systems can provide complementary information that may help a clinician in the early screening of a voice disorder, and they can be deployed in remote areas where a general practitioner can use them and refer the patient to a specialist, avoiding complications that may be life-threatening. Many automatic systems for disorder detection have been developed using conventional speech features such as linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCCs was developed and evaluated on three different voice disorder databases. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database: the detection rate ranges from 72% to 95% intra-database and from 47% to 82% inter-database. The results indicate that conventional speech features are not correlated with voice quality and hence are not reliable for pathology detection.


Subject(s)
Diagnosis, Computer-Assisted/methods; Language; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Production Measurement/methods; Voice Disorders/diagnosis; Voice Quality; Databases, Factual; Humans; Pattern Recognition, Automated; Predictive Value of Tests; Reproducibility of Results; Voice Disorders/physiopathology
11.
J Voice ; 31(1): 113.e9-113.e18, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27105857

ABSTRACT

BACKGROUND AND OBJECTIVE: Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify the voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes. MATERIALS AND METHODS: Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples. RESULTS: The experimental results demonstrate a clear difference in the performance of the MDVP parameters using these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively.
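Ranking features by the Fisher discrimination ratio, as used above to order the MDVP parameters, can be sketched as follows. The toy feature values are illustrative, not MDVP measurements.

```python
def fisher_ratio(a, b):
    """Fisher discrimination ratio for one feature over two classes:
    (mean_a - mean_b)^2 / (var_a + var_b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)

# Two toy features measured on normal vs. pathological samples
normal = {"feat_a": [0.2, 0.3, 0.25], "feat_b": [1.0, 1.2, 1.1]}
path = {"feat_a": [0.9, 1.0, 0.95], "feat_b": [1.1, 1.3, 1.2]}

ranked = sorted(normal, key=lambda f: fisher_ratio(normal[f], path[f]), reverse=True)
assert ranked[0] == "feat_a"  # the well-separated feature ranks first
```

The top-ranked features would then be fed to the classifier, as in the three-parameter results reported above.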


Subject(s)
Acoustics; Speech Acoustics; Speech Production Measurement/methods; Voice Disorders/diagnosis; Voice Quality; Area Under Curve; Automation; Databases, Factual; Humans; Pattern Recognition, Automated; Predictive Value of Tests; ROC Curve; Reproducibility of Results; Sound Spectrography; Voice Disorders/classification; Voice Disorders/physiopathology
12.
J Voice ; 30(6): 757.e7-757.e19, 2016 Nov.
Article in English | MEDLINE | ID: mdl-26522263

ABSTRACT

BACKGROUND AND OBJECTIVE: Automatic voice pathology detection using sustained vowels has been widely explored. Because of the stationary nature of the speech waveform, pathology detection with a sustained vowel is a comparatively easier task than detection using running speech. Some disorder detection systems with running speech have also been developed, although most of them are based on voice activity detection (VAD), which is itself a challenging task. Pathology detection with running speech needs more investigation, and systems with good accuracy (ACC) are required. Furthermore, pathology classification systems with running speech have not received any attention from the research community. In this article, automatic pathology detection and classification systems are developed using text-dependent running speech without adding a VAD module. METHOD: A set of three psychophysical conditions of hearing (critical-band spectral estimation, the equal-loudness hearing curve, and the intensity-loudness power law of hearing) is used to estimate the auditory spectrum. The auditory spectrum and an all-pole model of the auditory spectrum are computed, analyzed, and used in a Gaussian mixture model for an automatic decision. RESULTS: In the experiments using the Massachusetts Eye & Ear Infirmary database, an ACC of 99.56% is obtained for pathology detection, and an ACC of 93.33% is obtained for the pathology classification system. The results of the proposed systems outperform the existing running-speech-based systems. DISCUSSION: The developed system can effectively be used in voice pathology detection and classification systems, and the proposed features can visually differentiate between normal and pathological samples.
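Two of the psychophysical steps described above can be sketched minimally: grouping the power spectrum into critical-band-like bands, then applying the intensity-loudness power law (perceived loudness grows roughly as the cube root of intensity). The band edges and toy spectrum are illustrative assumptions, not the paper's exact critical-band analysis.

```python
def band_energies(power_spectrum, band_edges):
    """Sum power within each band; a crude stand-in for critical-band
    integration."""
    return [sum(power_spectrum[lo:hi]) for lo, hi in band_edges]

def loudness_compress(band_powers):
    """Intensity-loudness power law: cube-root compression."""
    return [p ** (1.0 / 3.0) for p in band_powers]

spectrum = [1.0, 7.0, 13.0, 14.0, 30.0, 34.0]  # toy power spectrum bins
edges = [(0, 2), (2, 4), (4, 6)]               # three toy "critical bands"
loudness = loudness_compress(band_energies(spectrum, edges))
# band powers 8, 27, 64 compress to roughly 2, 3, 4
```

The equal-loudness weighting (omitted here) would scale each band before compression; an all-pole fit to the resulting auditory spectrum yields the final features.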


Subject(s)
Acoustics; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Production Measurement/methods; Voice Disorders/diagnosis; Voice Quality; Algorithms; Area Under Curve; Databases, Factual; Fourier Analysis; Humans; Linear Models; Pattern Recognition, Automated; Predictive Value of Tests; ROC Curve; Reproducibility of Results; Sound Spectrography; Time Factors; Voice Disorders/classification; Voice Disorders/physiopathology
13.
J Voice ; 26(6): 817.e19-27, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23177748

ABSTRACT

BACKGROUND AND OBJECTIVE: Objective assessment of voice pathology has attracted growing interest in recent years. Automatic speech/speaker recognition (ASR) systems are commonly deployed in voice pathology detection. The aim of this work was to develop a novel feature extraction method for ASR that incorporates the distributions of voiced and unvoiced parts, and voice onset and offset characteristics, in a time-frequency domain to detect voice pathology. MATERIALS AND METHODS: The speech samples of 70 dysphonic patients with six different types of voice disorders and 50 normal subjects were analyzed. The Arabic spoken digits (1-10) were taken as input. The proposed feature extraction method was embedded into an ASR system with a Gaussian mixture model (GMM) classifier to detect voice disorders. RESULTS: An accuracy of 97.48% was obtained in the text-independent (all digits' training) case, and over 99% accuracy was obtained in the text-dependent (separate digit's training) case. The proposed method outperformed conventional Mel-frequency cepstral coefficient (MFCC) features. CONCLUSION: The results of this study revealed that incorporating voice onset and offset information leads to efficient automatic voice disorder detection.
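The features above rely on separating voiced from unvoiced regions. This is a minimal, classical sketch using frame energy and zero-crossing rate (ZCR); the thresholds and synthetic frames are illustrative assumptions, not the paper's actual voicing decision.

```python
import math
import random

def frame_features(frame):
    """Mean-square energy and zero-crossing rate of one frame."""
    energy = sum(v * v for v in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)
    return energy, zcr

def is_voiced(frame, energy_thr=0.01, zcr_thr=0.25):
    """Voiced frames: high energy, low ZCR. Thresholds are illustrative."""
    energy, zcr = frame_features(frame)
    return energy > energy_thr and zcr < zcr_thr

fs = 8000
voiced = [math.sin(2 * math.pi * 150 * n / fs) for n in range(240)]  # 150 Hz tone
rng = random.Random(0)
unvoiced = [rng.uniform(-0.05, 0.05) for _ in range(240)]            # low-level noise

assert is_voiced(voiced)
assert not is_voiced(unvoiced)
```

Onsets and offsets would then be located at the transitions of this voiced/unvoiced sequence, where the paper's time-frequency characteristics are measured.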


Subject(s)
Acoustics; Models, Statistical; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Production Measurement; Voice Disorders/diagnosis; Voice Quality; Adolescent; Adult; Algorithms; Automation; Case-Control Studies; Female; Humans; Linear Models; Male; Middle Aged; Pattern Recognition, Automated; Predictive Value of Tests; Sound Spectrography; Time Factors; Voice Disorders/physiopathology; Young Adult