Results 1 - 12 of 12
1.
Nat Commun ; 13(1): 5815, 2022 10 03.
Article in English | MEDLINE | ID: mdl-36192403

ABSTRACT

A wearable silent speech interface (SSI) is a promising platform that enables verbal communication without vocalization. The most widely studied methodology for SSI focuses on surface electromyography (sEMG). However, sEMG suffers from low scalability because of signal quality-related issues, including signal-to-noise ratio and interelectrode interference. Hence, here, we present a novel SSI by utilizing crystalline-silicon-based strain sensors combined with a 3D convolutional deep learning algorithm. Two perpendicularly placed strain gauges with minimized cell dimension (<0.1 mm²) could effectively capture the biaxial strain information with high reliability. We attached four strain sensors near the subjects' mouths and collected strain data for an unprecedentedly large word set (100 words), which our SSI can classify with high accuracy (87.53%). Several analyses verified the system's reliability, and a comparison with another SSI that used sEMG electrodes of the same dimensions showed that the latter reached a markedly lower accuracy (42.60%).


Subject(s)
Deep Learning; Speech; Algorithms; Electromyography/methods; Reproducibility of Results; Silicon
2.
Sensors (Basel) ; 19(9)2019 May 10.
Article in English | MEDLINE | ID: mdl-31083445

ABSTRACT

The bioelectrical impedance analysis (BIA) method is widely used to predict percent body fat (PBF). However, it requires four to eight electrodes, and it takes a few minutes to accurately obtain the measurement results. In this study, we propose a faster and more accurate method that utilizes a small dry electrode-based wearable device, which predicts whole-body impedance using only upper-body impedance values. Such a small electrode-based device typically needs a long measurement time due to increased parasitic resistance, and its accuracy varies with measurement posture. To minimize these variations, we designed a sensing system that only utilizes contact with the wrist and index fingers. The measurement time was also reduced to five seconds by an effective parameter calibration network. Finally, we implemented a deep neural network-based algorithm to predict the PBF value from the measured upper-body impedance, with lower-body anthropometric data as auxiliary input features. The experiments were performed with 163 amateur athletes who exercised regularly. The performance of the proposed system was compared with that of two commercial systems designed to measure body composition using either a whole-body or an upper-body impedance value. The results showed that the correlation coefficient (r²) value was improved by about 9%, and the standard error of estimate (SEE) was reduced by 28%.


Subject(s)
Anthropometry/methods; Body Composition/physiology; Electrodes; Electric Impedance; Humans; Wearable Electronic Devices
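
Note: the abstract above reports its results as r² and SEE. The sketch below shows one common way to compute both from paired reference and predicted values. The function name is hypothetical, and using n - 2 degrees of freedom for SEE is an assumption about the convention; the abstract does not say which definition the authors used.

```python
import math

def r_squared_and_see(reference, predicted):
    """Return (r^2, SEE) for predicted values against reference values.

    r^2 is the squared Pearson correlation. SEE is taken here as the
    residual standard deviation of the regression of reference on
    predicted, with n - 2 degrees of freedom (an assumed convention).
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((p - mean_pred) * (r - mean_ref)
              for p, r in zip(predicted, reference))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r2 = cov * cov / (var_ref * var_pred)
    # Residual sum of squares of the fitted line equals var_ref * (1 - r2).
    see = math.sqrt(var_ref * (1.0 - r2) / (n - 2))
    return r2, see
```

A higher r² and a lower SEE both indicate closer agreement with the reference method, which is the direction of improvement the abstract reports.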
3.
J Acoust Soc Am ; 143(1): EL37, 2018 01.
Article in English | MEDLINE | ID: mdl-29390776

ABSTRACT

In this letter, a generic search grid generation algorithm for far-field source localization (SL) is proposed. Since conventional uniform regular grid structures only consider the resolution of the distribution, it is difficult to control the number of grid points to be distributed. The proposed algorithm generates a search grid by distributing a desired number of points evenly, depending on the target criterion, in either direction of arrival or time difference of arrival domain. The experimental results show that the proposed algorithm provides optimally distributed grid points given the number of desired points and the corresponding domain for SL processing.
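
Note: the abstract does not describe the grid generation algorithm itself. As a rough illustration of the general idea (placing an exact, chosen number of points approximately evenly over the direction-of-arrival domain), the sketch below uses a Fibonacci lattice on the sphere. This is a swapped-in, well-known technique and not the letter's algorithm; the function name is hypothetical.

```python
import math

def fibonacci_doa_grid(num_points):
    """Return num_points (azimuth, elevation) pairs in radians.

    Points follow a Fibonacci lattice, which spreads them roughly
    evenly over the unit sphere for any requested count.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    grid = []
    for i in range(num_points):
        # Heights are equally spaced in z, strictly inside (-1, 1).
        z = 1.0 - (2.0 * i + 1.0) / num_points
        elevation = math.asin(z)
        azimuth = (i * golden_angle) % (2.0 * math.pi)
        grid.append((azimuth, elevation))
    return grid
```

Unlike a uniform azimuth-elevation grid, where the point count is a by-product of the chosen resolution, the count here is the input.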

4.
Article in English | MEDLINE | ID: mdl-26737691

ABSTRACT

This paper proposes a constrained two-layer compression technique for electrocardiogram (ECG) waves whose encoded parameters can be used directly for the diagnosis of arrhythmia. In the first layer, a single ECG beat is represented by one of the registered templates in the codebook. Since the only coding parameter required in this layer is the codebook index of the selected template, its compression ratio (CR) is very high. Note that the distribution of registered templates is also related to the characteristics of ECG waves, so it can be used as a metric to detect various types of arrhythmias. In the second layer, the residual error between the input and the selected template is encoded by wavelet-based transform coding. The number of wavelet coefficients is constrained by a pre-defined maximum allowable distortion. The MIT-BIH arrhythmia database is used to evaluate the performance of the proposed algorithm. The proposed algorithm achieves a CR of around 7.18 when the reference value of the percentage root mean square difference (PRD) is set to ten.


Subject(s)
Arrhythmias, Cardiac/diagnosis; Electrocardiography; Algorithms; Data Compression; Databases, Factual; Humans; Reference Values; Signal Processing, Computer-Assisted; Wavelet Analysis
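
Note: a minimal sketch of the first-layer template lookup and of the two figures of merit named in the abstract above. All function names are hypothetical. Choosing the template by squared Euclidean distance is an assumption, and the PRD shown is the plain form without mean or baseline removal; the abstract specifies neither.

```python
import math

def nearest_template(beat, codebook):
    """Layer 1 (assumed rule): pick the codebook template closest to the beat.

    Returns the template index and the residual left for layer 2.
    """
    def sq_dist(template):
        return sum((b - t) ** 2 for b, t in zip(beat, template))
    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    residual = [b - t for b, t in zip(beat, codebook[index])]
    return index, residual

def prd(original, reconstructed):
    """Percentage root mean square difference between two signals."""
    err = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(err / energy)

def compression_ratio(original_bits, compressed_bits):
    """Ratio of the uncompressed size to the compressed size."""
    return original_bits / compressed_bits
```

In a scheme of this kind, the second layer would keep adding residual coefficients until the PRD falls below the allowed maximum, trading CR against distortion.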
5.
J Acoust Soc Am ; 135(6): EL284-90, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24907835

ABSTRACT

This letter investigates the impact of spectral compression on the vector Taylor series-based model adaptation algorithm. Unlike mel-frequency cepstral coefficients obtained by the logarithmic compression, the fractional power compression is used for extracting features. Since the relationship between acoustic models for clean and noisy speech depends on nonlinearity of the spectrum, it is important to select an appropriate compressive operator in the model adaptation. In this letter, the dependency of spectral nonlinearity on the speech recognition system is analyzed in various noisy environments. Experimental results confirm that the replacement of the compressive operator improves the performance of the model adaptation.
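
Note: the two compressive operators contrasted in the abstract above can be sketched as follows, applied to a list of filter-bank energies. The function names, the floor value, and the default exponent are illustrative assumptions; the abstract does not state which exponent the authors used.

```python
import math

def log_compress(energies, floor=1e-10):
    """Logarithmic compression, as used for mel-frequency cepstral coefficients."""
    # The floor avoids log(0) on empty bands (assumed safeguard).
    return [math.log(max(e, floor)) for e in energies]

def power_compress(energies, exponent=0.1):
    """Fractional power compression: each energy raised to an exponent in (0, 1).

    The default exponent is a placeholder, not a value from the letter.
    """
    return [e ** exponent for e in energies]
```

Both operators reduce dynamic range, but the power law stays finite at zero energy and changes how additive noise maps into the feature domain, which is why the choice matters for model adaptation.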

6.
Biomed Res Int ; 2013: 758731, 2013.
Article in English | MEDLINE | ID: mdl-24288686

ABSTRACT

This paper investigates the effectiveness of measures related to vocal tract characteristics in classifying normal and pathological speech. Unlike conventional approaches that mainly focus on features related to the vocal source, vocal tract characteristics are examined to determine if interaction effects between vocal folds and the vocal tract can be used to detect pathological speech. In particular, this paper examines features related to formant frequencies to see if vocal tract characteristics are affected by the nature of the vocal fold-related pathology. To test this hypothesis, stationary fragments of the vowel /aa/ produced by 223 normal subjects, 472 vocal fold polyp subjects, and 195 unilateral vocal cord paralysis subjects are analyzed. Based on the acoustic-articulatory relationships, phonation for pathological subjects is found to be associated with measures correlated with a raised tongue body or an advanced tongue root. Vocal tract-related features are also found to be statistically significant, according to the Kruskal-Wallis test, in distinguishing normal and pathological speech. Classification results demonstrate that combining the formant measurements with vocal fold-related features improves performance in differentiating vocal pathologies, including vocal polyps and unilateral vocal cord paralysis, which suggests that measures related to vocal tract characteristics may provide additional information in diagnosing vocal disorders.


Subject(s)
Phonation; Speech Recognition Software; Vocal Cords/physiopathology; Voice Disorders; Voice; Adult; Female; Humans; Male; Middle Aged; Voice Disorders/diagnosis; Voice Disorders/physiopathology
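
Note: the Kruskal-Wallis test mentioned in the abstract above compares several groups by pooling and ranking their values. The sketch below computes the tie-corrected H statistic for any number of groups, for instance one formant measure across the three subject groups. The function name is hypothetical, and converting H to a p-value (a chi-squared tail with k - 1 degrees of freedom) is left out.

```python
def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted((value, g) for g, group in enumerate(groups)
                    for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n_total:
        # Find the run of equal values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction
```

Because the statistic depends only on ranks, it does not assume that the formant measures are normally distributed.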
7.
J Acoust Soc Am ; 134(5): EL438-44, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24181988

ABSTRACT

This letter proposes a degradation and cognition model to estimate the speech quality impairment caused by the packet loss concealment (PLC) algorithm implemented in the SILK speech codec. Because the quality degradation caused by packet loss is highly dependent on the PLC algorithm, the impact of quality degradation on various classes of previous and lost packets is analyzed. The PLC effects on the proposed class types are then measured by the class-conditional expectation of the degradation scores. Finally, the cognition module is derived to estimate the total quality degradation on a mean opinion score (MOS) scale. When assessed for correlation with subjective test results, the correlation coefficient of the encoder-based class model is 0.93, and that of the decoder-based model is 0.87.


Subject(s)
Internet; Models, Theoretical; Signal Processing, Computer-Assisted; Speech Acoustics; Speech Perception; Speech Production Measurement; Voice Quality; Algorithms; Audiometry, Speech; Female; Humans; Male; Speech Intelligibility
8.
J Acoust Soc Am ; 132(1): EL29-35, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22779569

ABSTRACT

This letter presents a single-channel speech dereverberation approach using a non-causal minimum variance distortionless response (MVDR) filter. The non-causal filter is adopted to utilize the additional information of the desired signal that lies in subsequent frames. Note that the desired signal output has minimal distortion due to the introduction of the MVDR criterion. The proposed system further suppresses the late reverberation by employing a statistical reverberant model. Experimental results demonstrate the superiority of the proposed algorithm to conventional approaches.


Subject(s)
Algorithms; Speech Perception/physiology; Fourier Analysis; Humans; Perceptual Masking/physiology; Speech Acoustics; Speech Intelligibility/physiology
9.
J Acoust Soc Am ; 131(2): 1536-46, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22352523

ABSTRACT

Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding performance using estimated Mel-frequency cepstral coefficients (MFCCs). Results also show acoustic-phonetic features may be combined with MFCCs for best performance, suggesting these features provide information complementary to MFCCs.


Subject(s)
Phonetics; Recognition, Psychology/physiology; Speech Acoustics; Speech Perception/physiology; Telephone; Algorithms; Cues; Humans; Sound Spectrography; Speech Recognition Software
10.
J Acoust Soc Am ; 126(3): EL100-6, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19739699

ABSTRACT

This paper proposes an efficient method to improve speaker recognition performance by dynamically controlling the ratio of phoneme class information. It utilizes the fact that each phoneme contains a different amount of speaker-discriminative information, which can be measured by mutual information. After classifying phonemes into five classes, the optimal ratio of each class in both the training and testing processes is adjusted using a non-linear optimization technique, namely the Nelder-Mead method. Speaker identification results verify that the proposed method achieves an 18% improvement in error rate compared to a baseline system.


Subject(s)
Models, Theoretical; Pattern Recognition, Automated; Pattern Recognition, Physiological; Phonetics; Speech; Algorithms; Animals; Information Theory; Nonlinear Dynamics
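
Note: the abstract above measures speaker-discriminative information by mutual information. The sketch below estimates mutual information between two discrete variables from paired samples, for example a quantized feature label and a speaker label for frames of one phoneme class. Quantizing the features is an assumption made to keep the example small, and the function name is hypothetical; the Nelder-Mead ratio optimization is not shown.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information, in bits, estimated from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

A phoneme class whose features yield a higher value carries more information about speaker identity, which is the property the method uses to weight the classes.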
11.
J Acoust Soc Am ; 122(3): EL88, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17927313

ABSTRACT

This letter describes the perceptual relevance of using the temporal envelope to model the 4-7 kHz frequency band (high band) of wideband speech signals. Based on theoretical work in psychoacoustics, we find that the temporal envelope can indeed be a perceptual cue for the high-band signal, i.e., a noiseless sound can be obtained if the temporal envelope is roughly preserved. Subjective listening tests verify that transparent quality can be obtained if the model is used for the 4.5-7 kHz band. The proposed model offers flexible scalability and reduces the quantization cost in coding applications.


Subject(s)
Auditory Perception/physiology; Hearing/physiology; Speech Perception/physiology; Speech/physiology; Humans; Models, Biological; Perceptual Masking; Psychoacoustics; Sound Spectrography; Speech Intelligibility
12.
J Acoust Soc Am ; 120(6): 3770-81, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17225404

ABSTRACT

A new method for overcoming signal cancellation problems due to correlated interferences which occur in a minimum variance distortionless response beamformer is proposed. Instead of decorrelating the correlated interferences, the coherently combining signal-to-interference plus noise ratio (CC-SINR) beamformer regards them as replicas of the desired signal and coherently combines them with the desired signal. This method uses an eigenvector constraint that suppresses only noise and uncorrelated interferences but retains the desired signal and correlated interferences. The CC-SINR beamformer does not require any preliminary information on correlated interferences. The signal-to-interference plus noise ratio (SINR) of the proposed beamformer output was compared to that of a conventional SINR beamformer when correlated interference, uncorrelated interference, and white noise exist. In addition, various key parameters that affect the performance of the beamformer, such as signal-to-noise ratio, uncorrelated interference-to-noise ratio, angular separation between signals, attenuation factor, phase delay of correlated interference, and the number of sensors were analyzed. All of the experimental results were in good agreement with the analytical results.
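
Note: for orientation, the conventional minimum variance distortionless response beamformer that the abstract above starts from is usually written as below. The symbols are assumptions that follow common array-processing notation, not the paper's: R is the array covariance matrix, a(θ₀) the steering vector toward the desired direction, and w the weight vector. The CC-SINR eigenvector constraint itself is not given in the abstract and is not reproduced here.

```latex
\mathbf{w}_{\mathrm{MVDR}}
  = \arg\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
  \quad \text{subject to} \quad \mathbf{w}^{H}\mathbf{a}(\theta_0) = 1,
\qquad
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{\mathbf{R}^{-1}\mathbf{a}(\theta_0)}
         {\mathbf{a}^{H}(\theta_0)\,\mathbf{R}^{-1}\mathbf{a}(\theta_0)}
```

When an interference is correlated with the desired signal, minimizing the output power can cancel part of the desired signal even though the distortionless constraint holds, which is the signal cancellation problem the abstract refers to.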

...