Search | BVS Regional Portal

1.

Ultrathin crystalline-silicon-based strain gauges with deep learning algorithms for silent speech interfaces.

Kim, Taemin; Shin, Yejee; Kang, Kyowon; Kim, Kiho; Kim, Gwanho; Byeon, Yunsu; Kim, Hwayeon; Gao, Yuyan; Lee, Jeong Ryong; Son, Geonhui; Kim, Taeseong; Jun, Yohan; Kim, Jihyun; Lee, Jinyoung; Um, Seyun; Kwon, Yoohwan; Son, Byung Gwan; Cho, Myeongki; Sang, Mingyu; Shin, Jongwoon; Kim, Kyubeen; Suh, Jungmin; Choi, Heekyeong; Hong, Seokjun; Cheng, Huanyu; Kang, Hong-Goo; Hwang, Dosik; Yu, Ki Jun.

Nat Commun ; 13(1): 5815, 2022 Oct 03.

Article in English | MEDLINE | ID: mdl-36192403

ABSTRACT

A wearable silent speech interface (SSI) is a promising platform that enables verbal communication without vocalization. The most widely studied methodology for SSI focuses on surface electromyography (sEMG). However, sEMG suffers from low scalability because of signal quality-related issues, including signal-to-noise ratio and interelectrode interference. Here, we therefore present a novel SSI that combines crystalline-silicon-based strain sensors with a 3D convolutional deep learning algorithm. Two perpendicularly placed strain gauges with a minimized cell dimension (<0.1 mm²) could effectively capture biaxial strain information with high reliability. We attached four strain sensors near the subjects' mouths and collected strain data for an unprecedentedly large word set (100 words), which our SSI can classify with high accuracy (87.53%). Several analysis methods were used to verify the system's reliability. Its performance was also compared with that of another SSI using sEMG electrodes of the same dimensions, which achieved a relatively low accuracy (42.60%).

Subjects

Deep Learning , Speech , Algorithms , Electromyography/methods , Reproducibility of Results , Silicon

2.

Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device.

Shin, Seung-Chul; Lee, Jinkyu; Choe, Soyeon; Yang, Hyuk In; Min, Jihee; Ahn, Ki-Yong; Jeon, Justin Y; Kang, Hong-Goo.

Sensors (Basel) ; 19(9), 2019 May 10.

Article in English | MEDLINE | ID: mdl-31083445

ABSTRACT

The bioelectrical impedance analysis (BIA) method is widely used to predict percent body fat (PBF). However, it requires four to eight electrodes, and it takes a few minutes to obtain accurate measurement results. In this study, we propose a faster and more accurate method that uses a small dry electrode-based wearable device, which predicts whole-body impedance using only upper-body impedance values. Such a small electrode-based device typically needs a long measurement time due to increased parasitic resistance, and its accuracy varies with measurement posture. To minimize these variations, we designed a sensing system that uses contact only with the wrist and index fingers. The measurement time was also reduced to five seconds by an effective parameter calibration network. Finally, we implemented a deep neural network-based algorithm that predicts the PBF value from the measured upper-body impedance, with lower-body anthropometric data as auxiliary input features. The experiments were performed with 163 amateur athletes who exercised regularly. The performance of the proposed system was compared with that of two commercial systems designed to measure body composition using either a whole-body or an upper-body impedance value. The results showed that the squared correlation coefficient (r²) improved by about 9% and the standard error of estimate (SEE) was reduced by 28%.

Subjects

Anthropometry/methods , Body Composition/physiology , Electrodes , Electric Impedance , Humans , Wearable Electronic Devices
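The abstract above reports its results as r² and SEE. The sketch below shows how those two metrics can be computed from paired reference and predicted PBF values. It assumes one common convention in which SEE is the square root of the residual sum of squares divided by (n − 2). The paper's exact definition is not given in the abstract, and the function names are hypothetical.

```python
import math

def r_squared(reference, predicted):
    """Squared Pearson correlation between reference and predicted PBF values."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    sxy = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    sxx = sum((r - mean_r) ** 2 for r in reference)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy * sxy / (sxx * syy)

def standard_error_of_estimate(reference, predicted):
    """SEE under the assumed convention sqrt(sum of squared residuals / (n - 2))."""
    n = len(reference)
    sse = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(sse / (n - 2))
```

For example, with reference values [20, 25, 30, 35] and predictions [21, 24, 31, 34], every residual has magnitude 1, so SEE is √(4 / 2) = √2.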

3.

Generic uniform search grid generation algorithm for far-field source localization.

Lee, JeeSok; Chung, Soo-Whan; Kang, Hong-Goo; Choi, Min-Seok.

J Acoust Soc Am ; 143(1): EL37, 2018 01.

Article in English | MEDLINE | ID: mdl-29390776

ABSTRACT

In this letter, a generic search grid generation algorithm for far-field source localization (SL) is proposed. Because conventional uniform regular grid structures consider only the resolution of the distribution, it is difficult to control the number of grid points to be distributed. The proposed algorithm generates a search grid by distributing a desired number of points evenly, according to the target criterion, in either the direction-of-arrival or the time-difference-of-arrival domain. The experimental results show that the proposed algorithm provides optimally distributed grid points for a given number of desired points and the corresponding SL processing domain.
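The key property described above is placing an exact, user-chosen number of roughly evenly spaced points in the direction-of-arrival domain. One well-known way to get that property on the sphere is a golden-angle (Fibonacci) spiral. It is swapped in here purely for illustration and is not the letter's algorithm. The function name is hypothetical.

```python
import math

def fibonacci_doa_grid(num_points):
    """Return exactly num_points (azimuth, elevation) pairs in radians,
    spread roughly uniformly over the unit sphere (golden-angle spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    grid = []
    for i in range(num_points):
        # Uniform steps in z = sin(elevation) give equal-area bands on the sphere.
        z = 1.0 - (2.0 * i + 1.0) / num_points
        elevation = math.asin(z)
        # Successive points rotate by the golden angle in azimuth.
        azimuth = (i * golden_angle) % (2.0 * math.pi)
        grid.append((azimuth, elevation))
    return grid
```

Unlike a regular azimuth/elevation grid defined by a resolution step, the number of points here is the input parameter. A time-difference-of-arrival grid would need a different mapping, which is not shown.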

4.

A constrained two-layer compression technique for ECG waves.

Byun, Kyungguen; Song, Eunwoo; Shim, Hwan; Lim, Hyungjoon; Kang, Hong-Goo.

Annu Int Conf IEEE Eng Med Biol Soc ; 2015: 6130-3, 2015.

Article in English | MEDLINE | ID: mdl-26737691

ABSTRACT

This paper proposes a constrained two-layer compression technique for electrocardiogram (ECG) waves whose encoded parameters can be used directly for the diagnosis of arrhythmia. In the first layer, a single ECG beat is represented by one of the registered templates in a codebook. Because the only coding parameter required in this layer is the codebook index of the selected template, its compression ratio (CR) is very high. Moreover, the distribution of registered templates is related to the characteristics of ECG waves, so it can be used as a metric to detect various types of arrhythmia. In the second layer, the residual error between the input and the selected template is encoded by wavelet-based transform coding. The number of wavelet coefficients is constrained by a predefined maximum allowable distortion. The MIT-BIH arrhythmia database is used to evaluate the performance of the proposed algorithm. The proposed algorithm achieves a CR of about 7.18 when the reference percentage root-mean-square difference (PRD) is set to 10.

Subjects

Arrhythmias, Cardiac/diagnosis , Electrocardiography , Algorithms , Data Compression , Databases, Factual , Humans , Reference Values , Signal Processing, Computer-Assisted , Wavelet Analysis
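The second-layer constraint above (use as few wavelet coefficients as possible while keeping the distortion under a PRD limit) can be sketched as follows. This is a minimal illustration under several assumptions:

- A multi-level orthonormal Haar transform stands in for the paper's wavelet.
- The input length is a power of two.
- The PRD constraint is applied to whichever signal is passed in, rather than specifically to the template residual.

All function names are hypothetical.

```python
import math

def haar_forward(x):
    """Multi-level orthonormal Haar transform (len(x) must be a power of two)."""
    coeffs, n = list(x), len(x)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = approx + detail
        n = half
    return coeffs

def haar_inverse(c):
    """Invert haar_forward level by level, from the coarsest scale upward."""
    coeffs, n = list(c), 1
    while n < len(coeffs):
        out = []
        for a, d in zip(coeffs[:n], coeffs[n:2 * n]):
            out += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        coeffs[:2 * n] = out
        n *= 2
    return coeffs

def prd(x, x_rec):
    """Percentage root-mean-square difference between x and its reconstruction."""
    num = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    den = sum(a * a for a in x)
    return 100.0 * math.sqrt(num / den)

def fewest_coefficients(x, target_prd):
    """Keep the largest-magnitude coefficients until PRD <= target_prd.
    With an orthonormal transform, the squared reconstruction error equals the
    energy of the discarded coefficients (Parseval), so PRD is tracked cheaply."""
    c = haar_forward(x)
    energy = sum(v * v for v in c)  # assumes a nonzero signal
    order = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
    kept, remaining = [], energy
    for i in order:
        if 100.0 * math.sqrt(max(remaining, 0.0) / energy) <= target_prd:
            break
        kept.append(i)
        remaining -= c[i] ** 2
    keep = set(kept)
    sparse = [v if i in keep else 0.0 for i, v in enumerate(c)]
    return kept, haar_inverse(sparse)
```

The compression ratio then depends on how the kept indices and values are quantized. That bit allocation is not described in the abstract and is not modeled here.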

5.

Selection of spectral compressive operator for vector Taylor series-based model adaptation in noisy environments.

Baek, Soonho; Kang, Hong-Goo.

J Acoust Soc Am ; 135(6): EL284-90, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24907835

ABSTRACT

This letter investigates the impact of spectral compression on the vector Taylor series (VTS)-based model adaptation algorithm. Unlike mel-frequency cepstral coefficients, which are obtained by logarithmic compression, the features here are extracted using fractional power compression. Because the relationship between acoustic models for clean and noisy speech depends on the nonlinearity of the spectrum, it is important to select an appropriate compressive operator for model adaptation. In this letter, the effect of spectral nonlinearity on the speech recognition system is analyzed in various noisy environments. Experimental results confirm that replacing the compressive operator improves the performance of model adaptation.
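VTS adaptation linearizes the nonlinear mismatch function that maps clean-speech and noise features to noisy-speech features, and the shape of that function depends on the compressive operator. The sketch below compares the mismatch function and its first-order derivative under two compressions: logarithmic, and fractional power with exponent alpha. It assumes purely additive noise in the power-spectral domain, with no channel term and no phase cross-term, which simplifies the letter's setting. The function names are hypothetical.

```python
import math

def noisy_log_feature(x_log, n_log):
    """Log compression: y = x + log(1 + exp(n - x)), where x = log|X|^2, n = log|N|^2."""
    return x_log + math.log1p(math.exp(n_log - x_log))

def log_jacobian(x_log, n_log):
    """dy/dx of the log-domain mismatch function (used in the first-order VTS step)."""
    return 1.0 / (1.0 + math.exp(n_log - x_log))

def noisy_power_feature(x_pow, n_pow, alpha):
    """Fractional power compression: x = |X|^(2*alpha), n = |N|^(2*alpha),
    so y = (x^(1/alpha) + n^(1/alpha))^alpha."""
    return (x_pow ** (1.0 / alpha) + n_pow ** (1.0 / alpha)) ** alpha

def power_jacobian(x_pow, n_pow, alpha):
    """dy/dx of the power-domain mismatch function."""
    u = x_pow ** (1.0 / alpha) + n_pow ** (1.0 / alpha)
    return u ** (alpha - 1.0) * x_pow ** (1.0 / alpha - 1.0)
```

For a clean power of 4 and a noise power of 1, both mismatch functions map back to a noisy power of 5: log(5) in the log domain and 5^alpha in the power domain. Their slopes, and therefore the VTS linearization error, differ.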

6.

Speech quality estimation of voice over internet protocol codec using a packet loss impairment model.

Lee, Min-Ki; Kang, Hong-Goo.

J Acoust Soc Am ; 134(5): EL438-44, 2013 Nov.

Article in English | MEDLINE | ID: mdl-24181988

ABSTRACT

This letter proposes a degradation and cognition model to estimate the speech quality impairment caused by the packet loss concealment (PLC) algorithm implemented in the SILK speech codec. Because the quality degradation caused by packet loss is closely related to the PLC algorithm, the quality degradation is analyzed for various classes of previous and lost packets. The PLC effects on the proposed class types are then measured by the class-conditional expectation of the degradation scores. Finally, a cognition module is derived to estimate the total quality degradation on a mean opinion score (MOS) scale. When assessed for correlation with subjective test results, the encoder-based class model achieves a correlation coefficient of 0.93 and the decoder-based model 0.87.

Subjects

Internet , Models, Theoretical , Signal Processing, Computer-Assisted , Speech Acoustics , Speech Perception , Speech Production Measurement , Voice Quality , Algorithms , Audiometry, Speech , Female , Humans , Male , Speech Intelligibility
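The sketch below illustrates the "class-conditional expectation of the degradation scores" step described above. Each loss event is keyed by the class of the previous packet and the class of the lost packet, and the degradation scores are averaged per key. A Pearson correlation helper is included for comparing model estimates with subjective scores. The class labels ("voiced", "unvoiced"), the record layout, and the function names are hypothetical. The letter's cognition module that maps degradation to MOS is not reproduced.

```python
import math
from collections import defaultdict

def class_conditional_means(records):
    """records: iterable of (previous_class, lost_class, degradation_score).
    Returns the mean degradation score for each (previous, lost) class pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for prev_cls, lost_cls, score in records:
        sums[(prev_cls, lost_cls)] += score
        counts[(prev_cls, lost_cls)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```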

7.

An investigation of vocal tract characteristics for acoustic discrimination of pathological voices.

Lee, Jung-Won; Kang, Hong-Goo; Choi, Jeung-Yoon; Son, Young-Ik.

Biomed Res Int ; 2013: 758731, 2013.

Article in English | MEDLINE | ID: mdl-24288686

ABSTRACT

This paper investigates the effectiveness of measures related to vocal tract characteristics in classifying normal and pathological speech. Unlike conventional approaches that focus mainly on features related to the vocal source, this study examines vocal tract characteristics to determine whether interaction effects between the vocal folds and the vocal tract can be used to detect pathological speech. In particular, features related to formant frequencies are examined to see whether vocal tract characteristics are affected by the nature of the vocal fold-related pathology. To test this hypothesis, stationary fragments of the vowel /aa/ produced by 223 normal subjects, 472 subjects with vocal fold polyps, and 195 subjects with unilateral vocal cord paralysis are analyzed. Based on acoustic-articulatory relationships, phonation in pathological subjects is found to be associated with measures correlated with a raised tongue body or an advanced tongue root. The Kruskal-Wallis test also shows that vocal tract-related features are statistically significant in distinguishing normal from pathological speech. Classification results demonstrate that combining the formant measurements with vocal fold-related features improves performance in differentiating vocal pathologies, including vocal polyps and unilateral vocal cord paralysis. This suggests that measures related to vocal tract characteristics may provide additional information for diagnosing vocal disorders.

Subjects

Phonation , Speech Recognition Software , Vocal Cords/physiopathology , Voice Disorders , Voice , Adult , Female , Humans , Male , Middle Aged , Voice Disorders/diagnosis , Voice Disorders/physiopathology
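The abstract above relies on the Kruskal-Wallis test to compare a feature across three groups (normal, polyp, and paralysis). A minimal pure-Python sketch of the H statistic follows, using average ranks for ties and the standard tie correction. For three groups the statistic is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(−H/2). The function names and the sample values are illustrative only.

```python
import math

def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic (not all values may be equal)."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1.0 - ties / (n_total ** 3 - n_total))

def p_value_three_groups(h):
    """Chi-square (2 degrees of freedom) survival function, valid for exactly three groups."""
    return math.exp(-h / 2.0)
```

For the fully separated groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], H = 7.2 and p = exp(−3.6) ≈ 0.027.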

8.

Single-channel dereverberation using a non-causal minimum variance distortionless response filter.

Song, Myung-Suk; Kang, Hong-Goo.

J Acoust Soc Am ; 132(1): EL29-35, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22779569

ABSTRACT

This letter presents a single-channel speech dereverberation approach using a non-causal minimum variance distortionless response (MVDR) filter. The non-causal filter is adopted to exploit additional information about the desired signal contained in subsequent frames. Because of the MVDR criterion, the desired signal passes through the filter with minimal distortion. The proposed system further suppresses late reverberation by employing a statistical reverberation model. Experimental results demonstrate the superiority of the proposed algorithm over conventional approaches.

Subjects

Algorithms , Speech Perception/physiology , Fourier Analysis , Humans , Perceptual Masking/physiology , Speech Acoustics , Speech Intelligibility/physiology
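The MVDR filter mentioned above has the standard closed form w = R⁻¹d / (dᴴR⁻¹d). This form minimizes the output power wᴴRw subject to the distortionless constraint wᴴd = 1. The sketch below applies it non-causally to one STFT frequency bin by stacking frames k − L … k + L, including future frames, and filtering each stack. It assumes the following, none of which comes from the letter:

- The covariance R is estimated from the stacked observations.
- The interframe correlation vector d of the desired signal is already known. The [0.3, 1.0, 0.3] used in the usage example is hypothetical.

The letter's estimator for d and its late-reverberation suppression stage are not shown. The function names are hypothetical.

```python
import numpy as np

def stack_frames(stft_bin, k, taps):
    """Frames k-taps .. k+taps of one frequency bin (non-causal: includes future frames)."""
    return stft_bin[k - taps:k + taps + 1]

def mvdr_weights(R, d):
    """w = R^{-1} d / (d^H R^{-1} d): minimum output power subject to w^H d = 1."""
    r_inv_d = np.linalg.solve(R, d)
    return r_inv_d / (np.conj(d) @ r_inv_d)

def mvdr_output(stft_bin, R, d, taps):
    """Filter every frame that has `taps` neighbours on both sides; edges stay zero."""
    w = mvdr_weights(R, d)
    out = np.zeros_like(stft_bin)
    for k in range(taps, len(stft_bin) - taps):
        out[k] = np.conj(w) @ stack_frames(stft_bin, k, taps)
    return out
```

Usage on synthetic data might look like `frames = np.stack([stack_frames(x, k, 1) for k in range(1, len(x) - 1)], axis=1)`, then `R = frames @ frames.conj().T / frames.shape[1]`, then `mvdr_output(x, R, d, 1)`.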

9.

Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.

Lee, Jung-Won; Choi, Jeung-Yoon; Kang, Hong-Goo.

J Acoust Soc Am ; 131(2): 1536-46, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22352523

ABSTRACT

Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. In channel-degraded telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced for telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding the performance obtained with estimated Mel-frequency cepstral coefficients (MFCCs). Results also show that acoustic-phonetic features may be combined with MFCCs for best performance, suggesting that these features provide information complementary to MFCCs.

Subjects

Phonetics , Recognition, Psychology/physiology , Speech Acoustics , Speech Perception/physiology , Telephone , Algorithms , Cues , Humans , Sound Spectrography , Speech Recognition Software
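GMM-based feature extrapolation, as mentioned above, is commonly done with the conditional mean of a joint Gaussian mixture. The estimate of a missing (degraded) feature, given an observed one, is the posterior-weighted sum of each component's conditional mean μ_m + (σ_mo / σ_oo)(x_o − μ_o). The sketch below shows the scalar case. The component parameters are hypothetical, and in practice they would be trained on clean speech. This is an assumption about the general technique, not the paper's exact estimator.

```python
import math
from typing import NamedTuple

class Component(NamedTuple):
    weight: float   # mixture weight
    mu_o: float     # mean of the observed feature
    mu_m: float     # mean of the missing feature
    var_o: float    # variance of the observed feature
    cov_mo: float   # covariance between missing and observed features

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def extrapolate_missing(x_obs, components):
    """E[x_missing | x_obs] under a joint GMM over (observed, missing) features."""
    likelihoods = [c.weight * gaussian_pdf(x_obs, c.mu_o, c.var_o) for c in components]
    total = sum(likelihoods)
    estimate = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total
        conditional_mean = c.mu_m + c.cov_mo / c.var_o * (x_obs - c.mu_o)
        estimate += posterior * conditional_mean
    return estimate
```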

10.

Phonetically optimized speaker modeling for robust speaker recognition.

Lee, Bong-Jin; Choi, Jeung-Yoon; Kang, Hong-Goo.

J Acoust Soc Am ; 126(3): EL100-6, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19739699

ABSTRACT

This paper proposes an efficient method to improve speaker recognition performance by dynamically controlling the ratio of phoneme class information. It exploits the fact that each phoneme contains a different amount of speaker-discriminative information, which can be measured by mutual information. After phonemes are classified into five classes, the optimal ratio of each class in both the training and testing processes is adjusted using a nonlinear optimization technique, namely the Nelder-Mead method. Speaker identification results verify that the proposed method achieves an 18% improvement in error rate compared with a baseline system.

Subjects

Models, Theoretical , Pattern Recognition, Automated , Pattern Recognition, Physiological , Phonetics , Speech , Algorithms , Animals , Information Theory , Nonlinear Dynamics
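The Nelder-Mead method named above is a derivative-free simplex search. It suits objectives like a recognition error rate, which can be evaluated but not differentiated. Below is a compact pure-Python version of the standard algorithm (reflection, expansion, inside contraction, shrink). In the usage example, a toy quadratic stands in for the real objective, which would require evaluating the speaker recognizer at each candidate set of class ratios. The function name and default coefficients are conventional choices, not taken from the paper.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-12, max_iter=2000,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f starting from x0; returns (best_point, best_value)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:          # accept reflection
            simplex[-1], values[-1] = reflected, f_r
            continue
        if f_r < values[0]:                        # try expanding further
            expanded = [c + gamma * (r - c) for c, r in zip(centroid, reflected)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
            continue
        contracted = [c + rho * (w - c) for c, w in zip(centroid, worst)]
        f_c = f(contracted)
        if f_c < values[-1]:                       # accept inside contraction
            simplex[-1], values[-1] = contracted, f_c
            continue
        best = simplex[0]                          # shrink toward the best vertex
        simplex = [best] + [[b + sigma * (v - b) for b, v in zip(best, vert)]
                            for vert in simplex[1:]]
        values = [values[0]] + [f(v) for v in simplex[1:]]
    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]
```

For example, `nelder_mead(lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2, [0.0, 0.0])` converges close to (1, 2).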

11.

Perceptual relevance of the temporal envelope to the speech signal in the 4-7 kHz band.

Kim, Kyung Tae; Choi, Jeung-Yoon; Kang, Hong Goo.

J Acoust Soc Am ; 122(3): EL88, 2007 Sep.

Article in English | MEDLINE | ID: mdl-17927313

ABSTRACT

This letter describes the perceptual relevance of using the temporal envelope to model the 4-7 kHz frequency band (highband) of a wideband speech signal. Based on theoretical work in psychoacoustics, we find that the temporal envelope can indeed serve as a perceptual cue for the highband signal; that is, a noiseless sound can be obtained if the temporal envelope is roughly preserved. Subjective listening tests verify that transparent quality can be obtained if the model is used for the 4.5-7 kHz band. The proposed model offers flexible scalability and reduces the quantization cost in coding applications.

Subjects

Auditory Perception/physiology , Hearing/physiology , Speech Perception/physiology , Speech/physiology , Humans , Models, Biological , Perceptual Masking , Psychoacoustics , Sound Spectrography , Speech Intelligibility
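The idea above, that a highband signal can be represented mainly by its temporal envelope, can be sketched as measuring a frame-wise RMS envelope and then imposing it on white noise. This assumes the highband has already been isolated by a band-pass filter, and that the synthesized noise would be filtered to the same band afterward. Neither filter is shown. The frame length is a hypothetical choice, and the letter's actual envelope model and quantization are not reproduced.

```python
import math
import random

def frame_rms(signal, frame_len):
    """Coarse temporal envelope: RMS of each non-overlapping frame."""
    envelope = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        envelope.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return envelope

def synthesize_from_envelope(envelope, frame_len, seed=0):
    """Gaussian white noise scaled frame by frame to match the target envelope."""
    rng = random.Random(seed)
    out = []
    for target in envelope:
        noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
        rms = math.sqrt(sum(v * v for v in noise) / frame_len)
        out.extend(v * target / rms for v in noise)
    return out
```

Only one gain per frame needs to be transmitted, which is the source of the reduced quantization cost described in the abstract.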

12.

Optimum beamformer in correlated source environments.

Kim, Seungil; Lee, Chungyong; Kang, Hong-Goo.

J Acoust Soc Am ; 120(6): 3770-81, 2006 Dec.

Article in English | MEDLINE | ID: mdl-17225404

ABSTRACT

This paper proposes a new method for overcoming the signal cancellation caused by correlated interferences in a minimum variance distortionless response (MVDR) beamformer. Instead of decorrelating the correlated interferences, the coherently combining signal-to-interference plus noise ratio (CC-SINR) beamformer treats them as replicas of the desired signal and coherently combines them with it. This method uses an eigenvector constraint that suppresses only noise and uncorrelated interferences while retaining the desired signal and correlated interferences. The CC-SINR beamformer does not require any prior information about the correlated interferences. The signal-to-interference plus noise ratio (SINR) of the proposed beamformer output was compared with that of a conventional SINR beamformer in the presence of correlated interference, uncorrelated interference, and white noise. Various key parameters that affect beamformer performance were also analyzed: the signal-to-noise ratio, the uncorrelated interference-to-noise ratio, the angular separation between signals, the attenuation factor, the phase delay of the correlated interference, and the number of sensors. All of the experimental results were in good agreement with the analytical results.
