Results 1 - 13 of 13
1.
J Voice ; 33(2): 204-213, 2019 Mar.
Article in English | MEDLINE | ID: mdl-29162356

ABSTRACT

BACKGROUND: The perception of pediatric voice quality has been investigated using clinical protocols developed for adult voices and acoustic analyses designed to identify important physical parameters associated with normal and dysphonic pediatric voices. Laboratory investigations of adult dysphonia have included sophisticated methods, including a psychoacoustic approach that involves a single-variable matching task (SVMT), characterized by high inter- and intra-listener reliability, and analyses that include bio-inspired models of auditory perception that have provided valuable information regarding adult voice quality. OBJECTIVES: To establish the utility of a psychoacoustic approach to the investigation of voice quality perception in the context of pediatric voices. METHODS: Six listeners judged the breathiness of 20 synthetic vowel stimuli using an SVMT. To support comparisons with previous data, stimuli were modeled after four pediatric speakers and synthesized using a Klatt synthesizer with five parameter settings that influence the perception of breathiness. The population average breathiness judgments were modeled with acoustic measures of loudness ratio, pitch strength, and cepstral peak. RESULTS: Listeners reliably judged the perceived breathiness of pediatric voices, as with previous investigations of breathiness in adult dysphonic voices. Breathiness judgments were accurately modeled by loudness ratio (r² = 0.93), pitch strength (r² = 0.91), and cepstral peak (r² = 0.82). Model accuracy was not affected significantly by including stimulus fundamental frequency and was slightly higher for pediatric than for adult voices. CONCLUSIONS: The SVMT proved robust for pediatric voices spanning a wide range of breathiness. The data indicate that this is a promising approach for future investigation of pediatric voice quality.


Subjects
Auditory Perception , Dysphonia/diagnosis , Speech Acoustics , Voice Quality , Age Factors , Preschool Child , Dysphonia/physiopathology , Female , Humans , Judgment , Loudness Perception , Male , Observer Variation , Pitch Perception , Psychoacoustics , Severity of Illness Index , Sound Spectrography , Speech Perception , Young Adult
2.
J Voice ; 33(4): 473-481, 2019 Jul.
Article in English | MEDLINE | ID: mdl-29804803

ABSTRACT

OBJECTIVES: This study aims to determine the sensitivity of perceptual and computational correlates of breathy and rough voice quality (VQ) across multiple vowel categories using single-variable matching tasks (SVMTs). METHODS: Sustained phonations of /a/, /i/, and /u/ from 20 dysphonic talkers (10 with primarily breathy voices and 10 with primarily rough voices) were selected from the University of Florida Dysphonic Voice Database. For primarily breathy voices, perceived breathiness was judged, and for primarily rough voices, perceived roughness was judged by the same group of 10 listeners using an SVMT with five replicates per condition. Measures of pitch strength, cepstral peak, and autocorrelation peak were applied to models of the perceptual data. RESULTS: Intra- and inter-rater reliability were high for both the breathiness and the roughness perceptual tasks. For breathiness judgments, the effect of vowel was small. Averaged over all talkers and listeners, breathiness judgments for /a/, /i/, and /u/ were -11.6, -11.2, and -12.2 dB noise-to-signal ratio, respectively. For roughness judgments, the effect of vowel was larger. The perceived roughness of /a/ was higher than /i/ or /u/ by 3 dB modulation depth. Pitch strength was the most accurate predictor of breathiness matching (r² = 0.84-0.94 across vowels), and log-transformed autocorrelation peak was the most accurate predictor of roughness matching (r² = 0.59-0.83 across vowels). CONCLUSIONS: Breathiness is more consistently represented across vowels for dysphonic voices than roughness. This work represents a critical step in advancing studies of voice quality perception from single vowels to running speech.


Subjects
Dysphonia/diagnosis , Judgment , Speech Acoustics , Speech Perception , Speech Production Measurement , Voice Quality , Dysphonia/physiopathology , Female , Humans , Male , Phonation , Pitch Perception , Predictive Value of Tests , Reproducibility of Results , Young Adult
3.
J Voice ; 33(5): 795-800, 2019 Sep.
Article in English | MEDLINE | ID: mdl-29773324

ABSTRACT

INTRODUCTION: The diagnoses of voice disorders, as well as treatment outcomes, are often tracked using visual (eg, stroboscopic images), auditory (eg, perceptual ratings), objective (eg, from acoustic or aerodynamic signals), and patient report (eg, Voice Handicap Index and Voice-Related Quality of Life) measures. However, many of these measures are known to have low to moderate sensitivity and specificity for detecting changes in vocal characteristics, including vocal quality. OBJECTIVE: The objective of this study was to compare changes in estimated pitch strength (PS) with other conventionally used acoustic measures based on the cepstral peak prominence (smoothed cepstral peak prominence, cepstral spectral index of dysphonia, and acoustic voice quality index), and clinical judgments of voice quality (GRBAS [grade, roughness, breathiness, asthenia, strain] scale) following laryngeal framework surgery. METHODS: This study involved post hoc analysis of recordings from 22 patients pretreatment and post treatment (thyroplasty and behavioral therapy). Sustained vowels and connected speech were analyzed using objective measures (PS, smoothed cepstral peak prominence, cepstral spectral index of dysphonia, and acoustic voice quality index), and these results were compared with mean auditory-perceptual ratings by expert clinicians using the GRBAS scale. RESULTS: All four acoustic measures changed significantly in the direction that usually indicates improved voice quality following treatment (P < 0.005). Grade and breathiness correlated the strongest with the acoustic measures (|r| ~ 0.7) with strain being the least correlated. CONCLUSIONS: Acoustic analysis on running speech highly correlates with judged ratings. PS is a robust, easily obtained acoustic measure of voice quality that could be useful in the clinical environment to follow treatment of voice disorders.


Subjects
Laryngoplasty , Speech Acoustics , Adolescent , Adult , Aged , Aged 80 and over , Female , Humans , Male , Middle Aged , Retrospective Studies , Young Adult
4.
J Voice ; 31(6): 691-696, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28318967

ABSTRACT

BACKGROUND: Measurement of treatment outcomes is critical for the spectrum of voice treatments (ie, surgical, behavioral, or pharmacological). Outcome measures typically include visual (eg, stroboscopic data), auditory (eg, Consensus Auditory-Perceptual Evaluation of Voice; Grade, Roughness, Breathiness, Asthenia, Strain), and objective correlates of vocal fold vibratory characteristics, such as acoustic signals (eg, harmonics-to-noise ratio, cepstral peak prominence) or patient self-reported questionnaires (eg, Voice Handicap Index, Voice-Related Quality of Life). Subjective measures often show high variability, whereas most acoustic measures of voice are only valid for signals where some degree of periodicity can be assumed. However, this assumption is often invalid for dysphonic voices where signal periodicity is suspect. Furthermore, many of these measures are not useful in isolation for diagnostic purposes. OBJECTIVE: We evaluated a recently developed algorithm (Auditory Sawtooth Waveform Inspired Pitch Estimator-Prime [Auditory-SWIPE']) for estimating pitch and pitch strength for dysphonic voices. Whereas fundamental frequency is a physical attribute of a signal, pitch is its psychophysical correlate. As such, the perception of pitch can extend to most signals irrespective of their periodicity. METHODS: Post hoc analyses were conducted for three groups of patients evaluated and treated for voice problems at a major voice center: (1) muscle tension dysphonia/functional dysphonia, (2) vocal fold mass(es), and (3) presbyphonia. All patients were recorded before and after surgical/behavioral treatment for voice disorders. Pitch and pitch strength for each speaker were computed with the Auditory-SWIPE' algorithm. RESULTS: Comparison of pre- and posttreatment data provides support for pitch strength as a measure of treatment outcomes for dysphonic voices.


Subjects
Acoustics , Dysphonia/therapy , Otorhinolaryngologic Surgical Procedures , Speech Acoustics , Speech Production Measurement/methods , Voice Quality , Voice Training , Adult , Aged , Algorithms , Dysphonia/diagnosis , Dysphonia/physiopathology , Female , Humans , Male , Middle Aged , Pitch Perception , Predictive Value of Tests , Recovery of Function , Retrospective Studies , Computer-Assisted Signal Processing , Sound Spectrography , Time Factors , Treatment Outcome
5.
J Voice ; 29(6): 670-81, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25944288

ABSTRACT

OBJECTIVE: The aim of this study was to develop a theoretic analysis of the cepstral peak (CP), to compare several CP software programs, and to propose methods for reducing variability in CP estimation. STUDY DESIGN: Descriptive, experimental study. METHODS: The theoretic CP value of a pulse train was derived and compared with estimates computed for pulse train WAV files using available CP software programs: (1) Hillenbrand's CP prominence (CPP) software (Western Michigan University, Kalamazoo, MI), (2) KayPENTAX (Montvale, NJ) Multi-Speech implementation of CPP, and (3) a MATLAB (The Mathworks, Natick, MA, version R2014a) implementation using cepstral interpolation. The CP variation was also investigated for synthetic breathy vowels. RESULTS: For pulse trains with period T samples, the theoretic CP is 1/2 + ε/T, with |ε| < 0.1 for all pulse trains (ε = 0 for integer T). For fundamental frequencies between 70 and 230 Hz, the CP mean ± standard deviation was 0.496 ± 0.002 using cepstral interpolation and 0.29 ± 0.03 using Hillenbrand's software, whereas CPP was 35.0 ± 3.8 dB using Hillenbrand's software and 20.5 ± 2.7 dB using KayPENTAX's software. The CP and CPP versus signal-to-noise ratio for synthetic breathy vowels were fit to a logistic model for the Hillenbrand (R² = 0.92) and KayPENTAX (R² = 0.82) estimators as well as an ideal estimator (R² = 0.98), which used a period-synchronous analysis. CONCLUSIONS: The findings indicate that several variables unrelated to the signal itself impact CP values, with some factors introducing large variability in CP values that would otherwise be attributed to the signal (eg, voice quality). Variability may be reduced by using a period-synchronous analysis with Hann windows.


Subjects
Speech Acoustics , Voice Quality , Humans , Software
6.
J Acoust Soc Am ; 134(2): EL127-32, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23927214

ABSTRACT

Spontaneous swallowing in dysphagic individuals has been shown to occur at a lower rate compared to healthy controls, and passive swallowing detection may function as a valid screening test to identify dysphagia in at-risk populations. To automate swallow identification, acoustic source and vocal tract features were extracted from two types of swallows and eight upper airway movements from nine healthy subjects. Swallow vs non-swallow classification accuracy was 96.3 ± 1.1%. The results provide useful methods for further development of automated tools for identification of patients with swallowing impairment.


Subjects
Acoustics , Deglutition , Larynx/physiology , Laboratory Automation , Biomechanical Phenomena , Healthy Volunteers , Humans , Observer Variation , Reproducibility of Results , Computer-Assisted Signal Processing , Transducers , Vocal Cords/physiology
7.
J Acoust Soc Am ; 125(1): 513-21, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19173436

ABSTRACT

Methods for detecting echolocation calls in field recordings of bats vary in performance and influence the effective range of a recording system. In experiments using synthetic calls from five species, human detection accuracy was 89.7 ± 0.6%, compared to 76.3 ± 0.8% for a model-based detector, 72.2 ± 0.8% for an energy-based detector, and 98.4 ± 0.2% for an optimal linear detector. The energy-based detector was 11 times faster than the model-based detector and 110 times faster than humans. Human accuracy was positively correlated with test duration (R² = 0.43, P < 0.05), meaning that higher accuracy was achieved at the expense of slower performance. Species was a significant factor determining accuracy for all detectors (P < 0.001) because of call bandwidth: Narrowband calls concentrated energy in a narrower frequency band and were easier to detect. For a hypothetical recording system, range at 90% human detection accuracy varied from 10 to 35 m among species, while range dropped by approximately 20% using the automated detectors. The optimal detector outperformed humans by 5 dB and the automated methods by 9 dB. The results quantify the tradeoff between detector speed and accuracy and are useful for designing field studies of bats.


Subjects
Echolocation , Electronic Data Processing , Psychological Signal Detection , Animal Vocalization , Animals , Auditory Perception , Chiroptera , Humans , Theoretical Models
8.
J Acoust Soc Am ; 124(1): 328-36, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18646980

ABSTRACT

The links detector combines a model-based spectral peak tracker with an echo filter to detect echolocation calls of bats. By processing calls in the spectrogram domain, the links detector separates calls that overlap in time, including call harmonics and echoes. The links detector was validated by using an artificial recording environment, including synthetic calls, atmospheric absorption, and echoes, which provided control of signal-to-noise ratio and an absolute ground truth. Maximum hit rate (2% false positive rate) for the links detector was 87% compared to 1.5% for a spectral peak detector. The difference in performance was due to the ability of the links detector to filter out echoes. Detection range varied across species from 13 to more than 20 m due to call bandwidth and frequency range. Global features of calls detected by the links detector were compared to those of synthetic calls. The error in all estimates increased as the range increased, and estimates of minimum frequency and frequency of most energy were more accurate compared to maximum frequency. The links detector combines local and global features to automatically detect calls within the machine learning paradigm and detects overlapping calls and call harmonics in a unified framework.


Subjects
Echolocation/physiology , Electronic Data Processing , Acoustics , Animals , Chiroptera , Computer Simulation , Environment , Biological Models
9.
J Acoust Soc Am ; 123(5): 2643-50, 2008 May.
Article in English | MEDLINE | ID: mdl-18529184

ABSTRACT

Detection of echolocation calls is fundamental to quantitative analysis of bat acoustic signals. Automated methods of detection reduce the subjectivity of hand labeling of calls and speed up the detection process in an accurate and repeatable manner. A model-based detector was initialized using a baseline energy threshold detector, removing the need for hand labels to train the model, and shown to be superior to the baseline detector using synthetic calls in two experiments: (1) an artificial environment and (2) a field playback setting. Synthetic calls using a piecewise exponential frequency modulation function from five hypothetical species were employed to control the signal-to-noise ratio (SNR) in each experiment and to provide an absolute ground truth to judge detector performance. The model-based detector outperformed the baseline detector by 2.5 dB SNR in the artificial environment and 1.5 dB SNR in the field playback setting. Atmospheric absorption was measured for the synthetic calls, and 1.5 dB increased the effective detection radius by between 1 and 7 m depending on species. The results demonstrate that hand labels are not necessary for training detection models and that model-based detectors significantly increase the range of detection for a recording system.


Subjects
Chiroptera , Echolocation/physiology , Biological Models , Acoustics , Animals , Automation , Judgment , Noise , Orientation , Oscillometry , Predatory Behavior , Rheology , Sensitivity and Specificity , Social Behavior
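
The baseline energy threshold detector named in the abstract above is not specified in detail there. As a rough illustration only, the sketch below marks a call wherever short-time frame energy exceeds a fixed threshold. The frame length, hop size, threshold value, and the function names `frame_energies_db` and `detect_calls` are all assumptions made for this example, not parameters from the study.

```python
import math

def frame_energies_db(signal, frame_len, hop):
    """Mean-square energy in dB for each frame of `signal`."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_square = sum(s * s for s in frame) / frame_len
        # Small offset avoids log10(0) on silent frames.
        energies.append(10.0 * math.log10(mean_square + 1e-12))
    return energies

def detect_calls(signal, frame_len=64, hop=32, threshold_db=-20.0):
    """Return (first_frame, last_frame) index pairs for runs of frames
    whose energy exceeds `threshold_db` (hypothetical default)."""
    energies = frame_energies_db(signal, frame_len, hop)
    events, start = [], None
    for i, e_db in enumerate(energies):
        if e_db > threshold_db and start is None:
            start = i                      # run of loud frames begins
        elif e_db <= threshold_db and start is not None:
            events.append((start, i - 1))  # run ended on previous frame
            start = None
    if start is not None:                  # run reaches end of signal
        events.append((start, len(energies) - 1))
    return events

# Synthetic check signal (illustrative only): silence, a tone burst, silence.
silence = [0.0] * 256
tone = [math.sin(2.0 * math.pi * 0.1 * n) for n in range(256)]
print(detect_calls(silence + tone + silence))
```

A detector of this kind needs no labeled data, which is consistent with the abstract's use of it to initialize the model-based detector. The fixed threshold is the obvious weakness, since it ties performance to the noise floor of the recording.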
10.
Neural Netw ; 20(3): 414-23, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17556115

ABSTRACT

We have combined an echo state network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that the model was significantly more robust to noise than a hidden Markov model in noisy speech classification experiments, by 8 ± 1 dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.


Subjects
Automated Pattern Recognition , Psychological Recognition/physiology , Speech Perception/physiology , Speech/physiology , Analysis of Variance , Artificial Intelligence , Humans , Markov Chains , Noise , Computer-Assisted Signal Processing , Speech Recognition Software
11.
IEEE Trans Biomed Eng ; 53(10): 1983-9, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17019862

ABSTRACT

We propose a method of predicting intrauterine pressure (IUP) from external electrohysterograms (EHG) using a causal FIR Wiener filter. IUP and 8-channel EHG data were collected simultaneously from 14 laboring patients at term, and prediction models were trained and tested using 10-min windows for each patient and channel. RMS prediction error varied between 5 and 14 mmHg across all patients. We performed a 4-way analysis of variance on the RMS error, which varied across patients, channels, time (test window) and model (train window). The patient-channel interaction was the most significant factor while channel alone was not significant, indicating that different channels produced significantly different RMS errors depending on the patient. The channel-time factor was significant due to single-channel bursty noise, while time was a significant factor due to multichannel bursty noise. The time-model interaction was not significant, supporting the assumption that the random process generating the IUP and EHG signals was stationary. The results demonstrate the capabilities of an optimal linear filter in predicting IUP from external EHG and offer insight into the factors that affect prediction error of IUP from multichannel EHG recordings.


Subjects
Computer-Assisted Diagnosis/methods , Electromyography/methods , Manometry/methods , Pregnancy/physiology , Uterine Contraction/physiology , Uterine Monitoring/methods , Uterus/physiology , Algorithms , Female , Humans , Linear Models , Muscle Contraction/physiology , Pressure , Reproducibility of Results , Sensitivity and Specificity , Computer-Assisted Signal Processing
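
The abstract above does not give the filter order or the estimation details of its causal FIR Wiener filter. As a minimal single-channel sketch under stated assumptions, the code below estimates FIR weights by solving the sample normal equations R w = p (autocorrelation matrix of the input against the cross-correlation with the target) and reports RMS error, the metric used in the abstract. The function names, the filter order, and the synthetic data are hypothetical. No EHG-specific preprocessing is modeled.

```python
def solve_linear(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        acc = m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = acc / m[r][r]
    return w

def fit_fir_wiener(x, d, order):
    """Least-squares causal FIR weights w with
    d[n] ~ sum_k w[k] * x[n - k], k = 0 .. order - 1."""
    rows = range(order - 1, len(x))  # samples with a full input history
    R = [[sum(x[n - i] * x[n - j] for n in rows) for j in range(order)]
         for i in range(order)]      # sample autocorrelation matrix
    p = [sum(d[n] * x[n - i] for n in rows) for i in range(order)]
    return solve_linear(R, p)

def predict(x, w):
    """Apply the causal FIR filter w to x (zero initial conditions)."""
    return [sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
            for n in range(len(x))]

def rms_error(d, d_hat):
    """Root-mean-square difference between target and prediction."""
    return (sum((a - b) ** 2 for a, b in zip(d, d_hat)) / len(d)) ** 0.5
```

In the setting the abstract describes, `x` would stand in for one EHG channel and `d` for the simultaneously recorded IUP, with weights fit on a training window and `rms_error` evaluated on a separate test window.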
12.
J Acoust Soc Am ; 119(3): 1817-33, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16583922

ABSTRACT

Current automatic acoustic detection and classification of microchiroptera utilize global features of individual calls (i.e., duration, bandwidth, frequency extrema), an approach that stems from expert knowledge of call sonograms. This approach parallels the acoustic phonetic paradigm of human automatic speech recognition (ASR), which relied on expert knowledge to account for variations in canonical linguistic units. ASR research eventually shifted from acoustic phonetics to machine learning, primarily because of the superior ability of machine learning to account for signal variation. To compare machine learning with conventional methods of detection and classification, nearly 3000 search-phase calls were hand labeled from recordings of five species: Pipistrellus bodenheimeri, Molossus molossus, Lasiurus borealis, L. cinereus semotus, and Tadarida brasiliensis. The hand labels were used to train two machine learning models: a Gaussian mixture model (GMM) for detection and classification and a hidden Markov model (HMM) for classification. The GMM detector produced 4% error compared to 32% error for a baseline broadband energy detector, while the GMM and HMM classifiers produced errors of 0.6 ± 0.2% compared to 16.9 ± 1.1% error for a baseline discriminant function analysis classifier. The experiments showed that machine learning algorithms produced errors an order of magnitude smaller than those for conventional methods.


Subjects
Acoustics/instrumentation , Chiroptera/classification , Chiroptera/physiology , Echolocation/physiology , Algorithms , Animals , Learning , Markov Chains , Biological Models , Normal Distribution , ROC Curve , Reproducibility of Results , Sound Spectrography
13.
J Acoust Soc Am ; 116(3): 1774-80, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15478444

ABSTRACT

Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.


Subjects
Phonetics , Speech Perception/physiology , Algorithms , Humans , Markov Chains , Biological Models , Noise , Psychoacoustics
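
The decoupling of filter bandwidth from filter spacing described in the abstract above can be illustrated with a small sketch. The abstract does not state the formulas, so the following are assumptions: centre frequencies are spaced evenly on the common mel scale, 2595 log10(1 + f/700), and bandwidth is taken from the Moore and Glasberg equivalent rectangular bandwidth (ERB) polynomial, then multiplied by a linear scale factor (called `e_factor` here, a hypothetical name). The centre placement is simplified and does not reproduce any published filter bank design.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale mapping (assumed, not taken from the abstract)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def erb_hz(fc_hz):
    """Moore-Glasberg ERB polynomial in Hz, centre frequency in Hz
    (assumed form of the critical-bandwidth relationship)."""
    f_khz = fc_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52

def hfcc_e_filters(n_filters, f_lo, f_hi, e_factor=1.0):
    """Return (centre_hz, bandwidth_hz) pairs.

    Centres depend only on the mel spacing; bandwidth depends only on
    the centre frequency and e_factor, so widening the filters leaves
    their spacing unchanged."""
    mel_lo, mel_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    filters = []
    for i in range(n_filters):
        centre = mel_to_hz(mel_lo + (i + 1) * step)
        filters.append((centre, e_factor * erb_hz(centre)))
    return filters

# Same centres, doubled bandwidths (illustrative values only).
for (fc, bw), (_, bw_wide) in zip(hfcc_e_filters(5, 0.0, 4000.0, 1.0),
                                  hfcc_e_filters(5, 0.0, 4000.0, 2.0)):
    print(round(fc, 1), round(bw, 1), round(bw_wide, 1))
```

Because only the filter bank coefficients change, the rest of a cepstral pipeline (log energies, discrete cosine transform) would run unmodified, which matches the abstract's point that wider filters add no computational cost.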