Search | VHL Regional Portal

1.

Validating a psychoacoustic model of voice quality.

Kreiman, Jody; Lee, Yoonjeong; Garellek, Marc; Samlan, Robin; Gerratt, Bruce R.

J Acoust Soc Am ; 149(1): 457, 2021 01.

Article in English | MEDLINE | ID: mdl-33514179

ABSTRACT

No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes validation of a psychoacoustic model designed to fill this gap. This model includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice, which together serve to completely quantify the integral sound of a target voice sample. In experiment 1, 200 voices with and without diagnosed vocal pathology were fit with the model using analysis-by-synthesis. The resulting synthetic voice samples were not distinguishable from the original voice tokens, suggesting that the model has all the parameters it needs to fully quantify voice quality. In experiment 2, parameters that model the harmonic voice source were removed one by one, and the voice tokens were re-synthesized with the reduced model. In every case the lower-dimensional models provided worse perceptual matches to the quality of the natural tokens than did the original set, indicating that the psychoacoustic model cannot be reduced in dimensionality without loss of fit to the data. Results confirm that this model can be validly applied to quantify voice quality in clinical and research applications.

Subject(s)

Psychoacoustics , Voice Disorders , Voice , Female , Humans , Male , Speech , Speech Acoustics , Voice Quality

2.

Vocal Fundamental Frequency and Sound Pressure Level in Charismatic Speech: A Cross-Gender and -Language Study.

Signorello, Rosario; Demolin, Didier; Henrich Bernardoni, Nathalie; Gerratt, Bruce R; Zhang, Zhaoyan; Kreiman, Jody.

J Voice ; 34(5): 808.e1-808.e13, 2020 Sep.

Article in English | MEDLINE | ID: mdl-31196689

ABSTRACT

OBJECTIVES/HYPOTHESES: Charismatic leaders use vocal behavior to persuade their audience, achieve goals, arouse emotional states, and convey personality traits and leadership status. This study investigates voice fundamental frequency (f0) and sound pressure level (SPL) in female and male French, Italian, Brazilian, and American politicians to determine which acoustic parameters are related to cross-gender and cross-cultural common vocal abilities, and which derive from culture-, gender-, and language-specific vocal strategies used to adapt vocal behavior to listeners' culture-related expectations. STUDY DESIGN: Speech corpora were collected for two formal communicative contexts (leaders address followers or other leaders) and one informal communicative context (dyadic interaction), based on the persuasive goals inherent in each context and on the relative status of the listeners and speakers. Leaders' acoustic voice profiles were created to show differences in f0 and SPL manipulation with respect to speakers' gender and language in each communicative context. RESULTS: Cross-gender and cross-language similarities in manipulation of average f0 and in f0 and SPL ranges occurred in all communicative contexts. Patterns of f0 manipulation were shared across genders and cultures, suggesting this dimension might be biologically based and is exploited by leaders to convey dominance. Ranges for f0 and SPL seemed to be affected by the communicative context, being wider or narrower depending on the persuasive goal. Results also showed language- and speaker-specific differences in the acoustic manipulation of f0 and SPL over time. CONCLUSIONS: These findings are consistent with the idea that specific charismatic leaders' vocal behaviors depend on a fine combination of vocal abilities that are shared across cultures and genders, combined with culturally- and linguistically-filtered vocal strategies.

Subject(s)

Speech , Voice , Brazil , Female , Humans , Language , Male , Sound , Speech Acoustics

3.

Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.

Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc.

J Speech Lang Hear Res ; 59(5): 994-1001, 2016 10 01.

Article in English | MEDLINE | ID: mdl-27626612

ABSTRACT

Purpose: The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Method: Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Results: Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Conclusions: Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment that use continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.

Subject(s)

Phonation , Speech , Voice Quality , Adult , Analysis of Variance , Female , Humans , Male , Sound Spectrography , Young Adult

4.

Modeling the voice source in terms of spectral slopes.

Garellek, Marc; Samlan, Robin; Gerratt, Bruce R; Kreiman, Jody.

J Acoust Soc Am ; 139(3): 1404-10, 2016 Mar.

Article in English | MEDLINE | ID: mdl-27036277

ABSTRACT

A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1-H2), the second and fourth harmonics (H2-H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4-2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz-5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1-H2, H2-H4, and H4-2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4-2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.

Subject(s)

Acoustics , Models, Theoretical , Speech Acoustics , Voice Quality , Adult , Female , Humans , Male , Psychoacoustics , Sound Spectrography , Speech Production Measurement , Young Adult
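Record 4 defines its source model by four harmonic-amplitude differences. A minimal Python sketch of that bookkeeping follows, assuming the input is already a list of source harmonic amplitudes in dB (index 0 is H1) at exact multiples of F0. The function names are hypothetical, and the sketch does not perform the harmonic estimation or formant correction that a real analysis would need.

```python
def nearest_harmonic(f0_hz: float, target_hz: float) -> int:
    """1-based index k of the harmonic whose frequency k*f0 is closest to target_hz."""
    return max(1, round(target_hz / f0_hz))


def spectral_slopes(f0_hz: float, harmonics_db: list[float]) -> dict[str, float]:
    """Compute the four slope parameters from harmonic amplitudes in dB.

    harmonics_db[k - 1] is assumed to hold the amplitude of harmonic k.
    """
    def amp(k: int) -> float:
        return harmonics_db[k - 1]

    k_2k = nearest_harmonic(f0_hz, 2000.0)  # harmonic nearest 2 kHz
    k_5k = nearest_harmonic(f0_hz, 5000.0)  # harmonic nearest 5 kHz
    if k_5k > len(harmonics_db):
        raise ValueError("need harmonic amplitudes up to about 5 kHz")

    return {
        "H1-H2": amp(1) - amp(2),
        "H2-H4": amp(2) - amp(4),
        "H4-2kHz": amp(4) - amp(k_2k),
        "2kHz-5kHz": amp(k_2k) - amp(k_5k),
    }
```

For a source rolling off at a uniform 12 dB per octave with F0 = 100 Hz, H1-H2 and H2-H4 both come out at 12 dB, while the two higher-frequency parameters span more octaves and so take larger values.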

5.

Perceptual evaluation of voice source models.

Kreiman, Jody; Garellek, Marc; Chen, Gang; Alwan, Abeer; Gerratt, Bruce R.

J Acoust Soc Am ; 138(1): 1-10, 2015 Jul.

Article in English | MEDLINE | ID: mdl-26233000

ABSTRACT

Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling. Neither fits to pulse shapes nor fits to landmark points on the pulses predicted observed differences in quality. Further, the source models fit the opening phase of the glottal pulses better than they fit the closing phase, but at the same time similarity in quality was better predicted by the timing and amplitude of the negative peak of the flow derivative (part of the closing phase) than by the timing and/or amplitude of peak glottal opening. Results indicate that simply knowing how (or how well) a particular source model fits or does not fit a target source pulse in the time domain provides little insight into what aspects of the voice source are important to listeners.

Subject(s)

Auditory Perception/physiology , Voice Quality/physiology , Acoustic Stimulation , Adolescent , Adult , Glottis/physiology , Humans , Middle Aged , Models, Biological , Sound Localization/physiology , Sound Spectrography , Young Adult

6.

Toward a unified theory of voice production and perception.

Kreiman, Jody; Gerratt, Bruce R; Garellek, Marc; Samlan, Robin; Zhang, Zhaoyan.

Loquens ; 1(1), 2014 Jan.

Article in English | MEDLINE | ID: mdl-27135054

ABSTRACT

At present, two important questions about voice remain unanswered: When voice quality changes, what physiological alteration caused this change, and if a change to the voice production system occurs, what change in perceived quality can be expected? We argue that these questions can only be answered by an integrated model of voice linking production and perception, and we describe steps towards the development of such a model. Preliminary evidence in support of this approach is also presented. We conclude that development of such a model should be a priority for scientists interested in voice, to explain what physical condition(s) might underlie a given voice quality, or what voice quality might result from a specific physical configuration.

7.

Development of a glottal area index that integrates glottal gap size and open quotient.

Chen, Gang; Kreiman, Jody; Gerratt, Bruce R; Neubauer, Juergen; Shue, Yen-Liang; Alwan, Abeer.

J Acoust Soc Am ; 133(3): 1656-66, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23464035

ABSTRACT

Because voice signals result from vocal fold vibration, perceptually meaningful vibratory measures should quantify those aspects of vibration that correspond to differences in voice quality. In this study, glottal area waveforms were extracted from high-speed videoendoscopy of the vocal folds. Principal component analysis was applied to these waveforms to investigate the factors that vary with voice quality. Results showed that the first principal component derived from tokens without glottal gaps was significantly (p < 0.01) associated with the open quotient (OQ). The alternating-current (AC) measure had a significant effect (p < 0.01) on the first principal component among tokens exhibiting glottal gaps. A measure AC/OQ, defined as the ratio of AC to OQ, was proposed to combine both amplitude and temporal characteristics of the glottal area waveform for both complete and incomplete glottal closures. Analyses of "glide" phonations in which quality varied continuously from breathy to pressed showed that the AC/OQ measure was able to characterize the corresponding continuum of glottal area waveform variation, regardless of the presence or absence of glottal gaps.

Subject(s)

Glottis/anatomy & histology , Glottis/physiology , Phonation , Speech Acoustics , Voice Quality , Biomechanical Phenomena , Female , Humans , Laryngoscopy , Linear Models , Male , Principal Component Analysis , Time Factors , Vibration , Video Recording , Vocal Cords/anatomy & histology , Vocal Cords/physiology
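The AC/OQ index in record 7 can be illustrated on a single cycle of a glottal area waveform. This sketch assumes AC is the peak-to-peak area over the cycle and OQ is the fraction of the cycle during which area exceeds the cycle minimum by more than a hypothetical relative threshold (`open_threshold`); the paper's exact definitions may differ. Measuring both quantities relative to the minimum area is what keeps a constant glottal gap from changing either value.

```python
def ac_over_oq(area: list[float], open_threshold: float = 0.05) -> tuple[float, float, float]:
    """Return (AC, OQ, AC/OQ) for one cycle of a glottal area waveform.

    Assumed definitions (hypothetical, for illustration):
      AC = max(area) - min(area)
      OQ = fraction of samples where area > min + open_threshold * AC
    """
    if not 0.0 <= open_threshold < 1.0:
        raise ValueError("open_threshold must be in [0, 1)")
    lo, hi = min(area), max(area)
    ac = hi - lo
    if ac <= 0:
        raise ValueError("area waveform has no AC component")
    cutoff = lo + open_threshold * ac
    open_samples = sum(1 for a in area if a > cutoff)
    oq = open_samples / len(area)  # > 0, since the peak always exceeds cutoff
    return ac, oq, ac / oq
```

Adding a constant offset to every sample (a persistent glottal gap) leaves AC, OQ, and therefore AC/OQ unchanged under these definitions.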

8.

Acoustic and perceptual effects of changes in body layer stiffness in symmetric and asymmetric vocal fold models.

Zhang, Zhaoyan; Kreiman, Jody; Gerratt, Bruce R; Garellek, Marc.

J Acoust Soc Am ; 133(1): 453-62, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23297917

ABSTRACT

At present, it is not well understood how changes in vocal fold biomechanics correspond to changes in voice quality. Understanding such cross-domain links from physiology to acoustics to perception in the "speech chain" is of both theoretical and clinical importance. This study investigates links between changes in body layer stiffness, which is regulated primarily by the thyroarytenoid muscle, and the consequent changes in acoustics and voice quality under left-right symmetric and asymmetric stiffness conditions. Voice samples were generated using three series of two-layer physical vocal fold models, which differed only in body stiffness. Differences in perceived voice quality in each series were then measured in a "sort and rate" listening experiment. The results showed that increasing body stiffness better maintained vocal fold adductory position, thereby exciting more high-order harmonics, differences that listeners readily perceived. Changes to the degree of left-right stiffness mismatch and the resulting left-right vibratory asymmetry did not produce perceptually significant differences in quality unless the stiffness mismatch was large enough to cause a change in vibratory mode. This suggests that a vibration pattern with left-right asymmetry does not necessarily result in a salient deviation in voice quality, and thus may not always be of clinical significance.

Subject(s)

Acoustics , Models, Anatomic , Phonation , Speech Acoustics , Speech Perception , Vocal Cords/physiology , Voice Quality , Biomechanical Phenomena , Elasticity , Female , Humans , Linear Models , Male , Pressure , Signal Processing, Computer-Assisted , Sound Spectrography , Vibration , Vocal Cords/anatomy & histology

9.

Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation.

Kreiman, Jody; Shue, Yen-Liang; Chen, Gang; Iseli, Markus; Gerratt, Bruce R; Neubauer, Juergen; Alwan, Abeer.

J Acoust Soc Am ; 132(4): 2625-32, 2012 Oct.

Article in English | MEDLINE | ID: mdl-23039455

ABSTRACT

Increases in open quotient are widely assumed to cause changes in the amplitude of the first harmonic relative to the second (H1*-H2*), which in turn correspond to increases in perceived vocal breathiness. Empirical support for these assumptions is rather limited, and reported relationships among these three descriptive levels have been variable. This study examined the empirical relationship among H1*-H2*, the glottal open quotient (OQ), and glottal area waveform skewness, measured synchronously from audio recordings and high-speed video images of the larynges of six phonetically knowledgeable, vocally healthy speakers who varied fundamental frequency and voice qualities quasi-orthogonally. Across speakers and voice qualities, OQ, the asymmetry coefficient, and fundamental frequency accounted for an average of 74% of the variance in H1*-H2*. However, analyses of individual speakers showed large differences in the strategies used to produce the same intended voice qualities. Thus, H1*-H2* can be predicted with good overall accuracy, but its relationship to phonatory characteristics appears to be speaker dependent.

Subject(s)

Glottis/physiology , Phonation , Phonetics , Speech Acoustics , Voice Quality , Biomechanical Phenomena , Female , Glottis/anatomy & histology , Humans , Laryngoscopy , Linear Models , Male , Speech Production Measurement , Time Factors , Video Recording

10.

Perceptual interaction of the harmonic source and noise in voice.

Kreiman, Jody; Gerratt, Bruce R.

J Acoust Soc Am ; 131(1): 492-500, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22280610

ABSTRACT

Although the amount of inharmonic energy (noise) present in a human voice is an important determinant of vocal quality, little is known about the perceptual interaction between harmonic and inharmonic aspects of the voice source. This paper reports three experiments investigating this issue. Results indicate that perception of the harmonic slope and of noise levels are both influenced by complex interactions between the spectral shape and relative levels of harmonic and noise energy in the voice source. Just-noticeable differences (JNDs) for the noise-to-harmonics ratio (NHR) varied significantly with the NHR and harmonic spectral slope, but NHR had no effect on JNDs for NHR when harmonic slopes were steepest, and harmonic slope had no effect when NHRs were highest. Perception of changes in the harmonic source slope depended on NHR and on the harmonic source slope: JNDs increased when spectra rolled off steeply, with this effect in turn depending on NHR. Finally, all effects were modulated by the shape of the noise spectrum. It thus appears that, beyond masking, understanding perception of individual parameters requires knowledge of the acoustic context in which they function, consistent with the view that voices are integral patterns that resist decomposition.

Subject(s)

Noise , Voice Quality/physiology , Acoustic Stimulation , Analysis of Variance , Auditory Perception/physiology , Computer Simulation , Differential Threshold/physiology , Female , Humans , Male , Phonetics , Pilot Projects , Signal-To-Noise Ratio , Sound Spectrography

11.

Comparing two methods for reducing variability in voice quality measurements.

Kreiman, Jody; Gerratt, Bruce R.

J Speech Lang Hear Res ; 54(3): 803-12, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21081673

ABSTRACT

PURPOSE: Interrater disagreements in ratings of quality plague the study of voice. This study compared 2 methods for handling this variability. METHOD: Listeners provided multiple breathiness ratings for 2 sets of pathological voices, one including 20 male and 20 female voices unselected for quality and one including 20 breathy female voices. Ratings for each listener were averaged together, mean ratings were z transformed, and the likelihood that 2 listeners would agree exactly in their ratings was calculated as a function of averaging and standardizing condition. Data were also multidimensionally scaled to examine similarities among listeners in perceptual strategy. Results were compared with parallel analyses of existing breathiness ratings of the same voices gathered using a method-of-adjustment task. RESULTS: Three-way interactions between the mean rating for a voice, standardization condition, and the number of voices averaged together were observed, but no main effect of averaging condition emerged. Multidimensional scaling revealed significant residual differences in perceptual strategy across listeners after averaging and standardizing. Ratings from the method-of-adjustment task showed both high agreement levels and consistent perceptual strategies across listeners, as theoretically predicted. CONCLUSION: Averaging multiple ratings and standardizing the mean are inadequate in addressing variations in voice quality perception.

Subject(s)

Dysphonia/diagnosis , Psychometrics , Speech Discrimination Tests , Speech Perception , Voice Quality , Adolescent , Adult , Female , Humans , Male , Observer Variation , Psychometrics/methods , Psychometrics/standards , Psychometrics/statistics & numerical data , Respiratory Mechanics , Speech Acoustics , Speech Discrimination Tests/methods , Speech Discrimination Tests/standards , Speech Discrimination Tests/statistics & numerical data , Young Adult
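The agreement analysis in record 11 can be sketched as follows: z-transform each listener's ratings across voices, then count how often two listeners' ratings for the same voice coincide. The `tol` parameter is an assumption added here, since z-scores are continuous and rarely match exactly; with `tol=0` and raw scale ratings, the function counts exact matches only.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(values: list[float]) -> list[float]:
    """Standardize one listener's ratings across voices (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pairwise_agreement(ratings: list[list[float]], tol: float = 0.0) -> float:
    """Proportion of (listener pair, voice) cases whose ratings agree within tol.

    ratings[listener][voice] holds one (possibly averaged) rating.
    """
    n_voices = len(ratings[0])
    hits = total = 0
    for a, b in combinations(ratings, 2):
        for v in range(n_voices):
            total += 1
            if abs(a[v] - b[v]) <= tol:
                hits += 1
    return hits / total
```

In a small example where one listener's ratings are a constant offset of another's (for instance [1, 2, 3] and [2, 3, 4]), standardizing makes that pair agree perfectly, yet overall agreement need not rise if a third listener uses the scale differently, which parallels the abstract's conclusion that standardizing alone is inadequate.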

12.

Integrated software for analysis and synthesis of voice quality.

Kreiman, Jody; Antoñanzas-Barroso, Norma; Gerratt, Bruce R.

Behav Res Methods ; 42(4): 1030-41, 2010 Nov.

Article in English | MEDLINE | ID: mdl-21139170

ABSTRACT

Voice quality is an important perceptual cue in many disciplines, but knowledge of its nature is limited by a poor understanding of the relevant psychoacoustics. This article (aimed at researchers studying voice, speech, and vocal behavior) describes the UCLA voice synthesizer, software for voice analysis and synthesis designed to test hypotheses about the relationship between acoustic parameters and voice quality perception. The synthesizer provides experimenters with a useful tool for creating and modeling voice signals. In particular, it offers an integrated approach to voice analysis and synthesis and allows easy, precise, spectral-domain manipulations of the harmonic voice source. The synthesizer operates in near real time, using a parsimonious set of acoustic parameters for the voice source and vocal tract that a user can modify to accurately copy the quality of most normal and pathological voices. The software, user's manual, and audio files may be downloaded from http://brm.psychonomic-journals.org/content/supplemental. Future updates may be downloaded from www.surgery.medsch.ucla.edu/glottalaffairs/.

Subject(s)

Sound Spectrography , Speech Recognition Software , Voice Quality , Cues , Speech

13.

Effects of native language on perception of voice quality.

Kreiman, Jody; Gerratt, Bruce R; Khan, Sameer Ud Dowla.

J Phon ; 38(4): 588-593, 2010 Oct 01.

Article in English | MEDLINE | ID: mdl-21152109

ABSTRACT

Little is known about how listeners judge phonemic versus allophonic (or freely varying) versus post-lexical variations in voice quality, or about which acoustic attributes serve as perceptual cues in specific contexts. To address this issue, native speakers of Gujarati, Thai, and English discriminated among pairs of voices that differed only in the relative amplitudes of the first versus second harmonics (H1-H2). Results indicate that speakers of Gujarati (which contrasts H1-H2 phonemically) were more sensitive to changes than were speakers of Thai or English. Further, sensitivity was not affected by the overall source spectral slope for Gujarati speakers, unlike Thai and English speakers, who were most sensitive when the spectrum fell away steeply. In combination with previous findings from Mandarin speakers, these results suggest a continuum of sensitivity to H1-H2. In Gujarati, the independence of sensitivity and spectral context is consistent with use of H1-H2 as a cue to the language's phonemic phonation contrast. Speakers of Mandarin, in which creaky phonation occurs in conjunction with the low dipping Tone 3, apparently also learn to hear these contrasts, but sensitivity is conditioned by spectral context. Finally, for Thai and English speakers, who vary phonation only post-lexically, sensitivity is both lower and contextually determined, reflecting the smaller role of H1-H2 in these languages.

14.

Perceptual sensitivity to first harmonic amplitude in the voice source.

Kreiman, Jody; Gerratt, Bruce R.

J Acoust Soc Am ; 128(4): 2085-9, 2010 Oct.

Article in English | MEDLINE | ID: mdl-20968379

ABSTRACT

Little is known about the perceptual importance of changes in the shape of the source spectrum, although many measures have been proposed and correlations with different vocal qualities (breathiness, roughness, nasality, strain...) have frequently been reported. This study investigated just-noticeable differences in the relative amplitudes of the first two harmonics (H1-H2) for speakers of Mandarin and English. Listeners heard pairs of vowels that differed only in the amplitude of the first harmonic and judged whether or not the voice tokens were identical in voice quality. Across voices and listeners, just-noticeable-differences averaged 3.18 dB. This value is small relative to the range of values across voices, indicating that H1-H2 is a perceptually valid acoustic measure of vocal quality. For both groups of listeners, differences in the amplitude of the first harmonic were easier to detect when the source spectral slope was steeply falling so that F0 dominated the spectrum. Mandarin speakers were significantly more sensitive (by about 1 dB) to differences in first harmonic amplitudes than were English speakers. Two explanations for these results are possible: Mandarin speakers may have learned to hear changes in harmonic amplitudes due to changes in voice quality that are correlated with the tones of Mandarin; or Mandarin speakers' experience with tonal contrasts may increase their sensitivity to small differences in the amplitude of F0 (which is also the first harmonic).

Subject(s)

Auditory Pathways/physiology , Phonetics , Pitch Perception , Signal Detection, Psychological , Speech Acoustics , Voice , Acoustic Stimulation , Adult , Audiometry , Female , Humans , Male , Middle Aged , Multilingualism , Perceptual Masking , Sound Spectrography , Voice Quality , Young Adult

15.

Improved tracheoesophageal prosthesis sizing in office-based tracheoesophageal puncture.

Sidell, Douglas; Shamouelian, David; Erman, Andrew; Gerratt, Bruce R; Chhetri, Dinesh.

Ann Otol Rhinol Laryngol ; 119(1): 37-41, 2010 Jan.

Article in English | MEDLINE | ID: mdl-20128185

ABSTRACT

OBJECTIVES: Tracheoesophageal puncture (TEP) for postlaryngectomy speech is increasingly being performed as an office-based procedure. We review our experience with office-based TEP and compare outcomes with those of operating room-based TEP. Our hypothesis was that office-based TEP results in improved prosthesis sizing, reducing the number of visits dedicated to prosthesis resizing. METHODS: A retrospective chart review was performed of all patients who underwent secondary TEP at our institution from 2001 to 2008. The primary dependent measure was the change in the length of the voice prosthesis. We also evaluated the number of visits made to the speech-language pathologist for resizing before a stable prosthesis length was achieved, and the number of days between voice prosthesis placement and the date a stable prosthesis length was observed. RESULTS: Thirty-one patients were included in this study. There was a significant difference in prosthesis length change between patients who had office-based TEP and patients who had operating room-based TEP (p < 0.001). In addition, the office-based cohort required fewer visits to the speech-language pathologist for TEP adjustments before a stable TEP length was achieved (p < 0.001). CONCLUSIONS: Voice prosthesis sizing was better in patients who had office-based TEP than in patients who had operating room-based TEP. This outcome is likely due to the lesser degree of swelling of the tracheoesophageal party wall in the office-based procedure.

Subject(s)

Prosthesis Implantation/methods , Aged , Aged, 80 and over , Ambulatory Care , Esophagus/surgery , Female , Humans , Larynx, Artificial , Male , Middle Aged , Prosthesis Fitting , Retrospective Studies , Trachea/surgery

16.

Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol.

Kempster, Gail B; Gerratt, Bruce R; Verdolini Abbott, Katherine; Barkmeier-Kraemer, Julie; Hillman, Robert E.

Am J Speech Lang Pathol ; 18(2): 124-32, 2009 May.

Article in English | MEDLINE | ID: mdl-18930908

ABSTRACT

PURPOSE: This article presents the development of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) following a consensus conference on perceptual voice quality measurement sponsored by the American Speech-Language-Hearing Association's Special Interest Division 3, Voice and Voice Disorders. The CAPE-V protocol and recording form were designed to promote a standardized approach to evaluating and documenting auditory-perceptual judgments of vocal quality. METHOD: A summary of the consensus conference proceedings and the factors considered by the authors in developing this instrument are included. CONCLUSION: The CAPE-V form and instructions, included as appendices to this article, enable clinicians to document perceived voice quality deviations following a standard (i.e., consistent and specified) protocol.

Subject(s)

Voice Disorders/diagnosis , American Speech-Language-Hearing Association , Auditory Perception , Clinical Protocols , Humans , Psychoacoustics , Speech Production Measurement , United States , Voice Quality

17.

When and why listeners disagree in voice quality assessment tasks.

Kreiman, Jody; Gerratt, Bruce R; Ito, Mika.

J Acoust Soc Am ; 122(4): 2354-64, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17902870

ABSTRACT

Modeling sources of listener variability in voice quality assessment is the first step in developing reliable, valid protocols for measuring quality, and provides insight into the reasons that listeners disagree in their quality assessments. This study examined the adequacy of one such model by quantifying the contributions of four factors to interrater variability: instability of listeners' internal standards for different qualities, difficulties isolating individual attributes in voice patterns, scale resolution, and the magnitude of the attribute being measured. One hundred twenty listeners in six experiments assessed vocal quality in tasks that differed in scale resolution, in the presence/absence of comparison stimuli, and in the extent to which the comparison stimuli (if present) matched the target voices. These factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their assessments. Providing listeners with comparison stimuli that matched the target voices doubled the likelihood that they would agree exactly. Listeners also agreed significantly better when assessing quality on continuous versus six-point scales. These results indicate that interrater variability is an issue of task design, not of listener unreliability.

Subject(s)

Judgment , Voice Disorders/diagnosis , Voice Quality , Adult , Communication Aids for Disabled , Discrimination Learning , Female , Humans , Likelihood Functions , Male , Middle Aged , Observer Variation , ROC Curve , Sound Spectrography , Voice Disorders/psychology

18.

Measures of the glottal source spectrum.

Kreiman, Jody; Gerratt, Bruce R; Antoñanzas-Barroso, Norma.

J Speech Lang Hear Res ; 50(3): 595-610, 2007 Jun.

Article in English | MEDLINE | ID: mdl-17538103

ABSTRACT

PURPOSE: Many researchers have studied the acoustics, physiology, and perceptual characteristics of the voice source, but despite significant attention, it remains unclear which aspects of the source should be quantified and how measurements should be made. In this study, the authors examined the relationships among a number of existing measures of the glottal source spectrum, along with the association of these measures to overall spectral shapes and to glottal pulse shapes, to determine which measures of the source best capture information about the shapes of glottal pulses and glottal source spectra. METHOD: Seventy-eight different measures of source spectral shapes were made on the voices of 70 speakers. Principal components analysis was applied to measurement data, and the resulting factors were compared with factors similarly derived from oral speech spectra and glottal pulses. RESULTS: Results revealed high levels of duplication and overlap among existing measures of source spectral slope. Further, existing measures were not well aligned with patterns of spectral variability. In particular, existing spectral measures do not appear to model the higher frequency parts of the source spectrum adequately. CONCLUSION: The failure of existing measures to adequately quantify spectral variability may explain why results of studies examining the perceptual importance of spectral slope have not produced consistent results. Because variability in the speech signal is often perceptually salient, these results suggest that most existing measures of source spectral slope are unlikely to be good predictors of voice quality.

Subject(s)

Glottis/physiology , Phonation/physiology , Speech/physiology , Voice Quality , Female , Humans , Male , Middle Aged , Speech Acoustics , Speech Perception , Speech Production Measurement , Time Factors

19.

Perception of aperiodicity in pathological voice.

Kreiman, Jody; Gerratt, Bruce R.

J Acoust Soc Am ; 117(4 Pt 1): 2201-11, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15898661

ABSTRACT

Although jitter, shimmer, and noise acoustically characterize all voice signals, their perceptual importance in naturally produced pathological voices has not been established psychoacoustically. To determine the role of these attributes in the perception of vocal quality, listeners were asked to adjust levels of jitter, shimmer, and the noise-to-signal ratio in a speech synthesizer, so that synthetic voices matched naturally produced tokens. Results showed that, although listeners agreed well in their judgments of the noise-to-signal ratio, they did not agree with one another in their chosen settings for jitter and shimmer. Noise-dependent differences in listeners' ability to detect changes in amounts of jitter and shimmer implicate both listener insensitivity and inability to isolate jitter and shimmer as separate dimensions in the overall pattern of aperiodicity in a voice as causes of this poor agreement. These results suggest that jitter and shimmer are not useful as independent indices of perceived vocal quality, apart from their acoustic contributions to the overall pattern of spectrally shaped noise in a voice.

Subject(s)

Sound Spectrography , Speech Perception , Voice Disorders/diagnosis , Voice Quality/physiology , Adult , Analysis of Variance , Communication Aids for Disabled , Female , Humans , Male , Middle Aged , Phonetics
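Jitter and shimmer are cycle-to-cycle variations in period and amplitude, respectively. The abstract above does not give formulas. One common choice is the "local" definition used by tools such as Praat: the mean absolute difference between consecutive cycles, divided by the mean. The sketch below uses that definition, with hypothetical function names and made-up cycle values. It does not reproduce the study's synthesizer, and it does not cover the noise-to-signal ratio.

```python
def _relative_mean_abs_diff(values):
    """Mean |x[i+1] - x[i]| divided by mean(x)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Cycle-to-cycle period perturbation, as a fraction of the mean period."""
    return _relative_mean_abs_diff(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle amplitude perturbation, as a fraction of the mean amplitude."""
    return _relative_mean_abs_diff(peak_amplitudes)


periods = [0.0100, 0.0110, 0.0100, 0.0110]  # glottal periods in seconds (made-up values)
amplitudes = [1.0, 0.9, 1.0, 0.9]           # per-cycle peak amplitudes (made-up values)
print(f"jitter  {local_jitter(periods):.1%}")      # 9.5%
print(f"shimmer {local_shimmer(amplitudes):.1%}")  # 10.5%
```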

20.

Perception of vocal tremor.

Kreiman, Jody; Gabelman, Brian; Gerratt, Bruce R.

J Speech Lang Hear Res ; 46(1): 203-14, 2003 Feb.

Article in English | MEDLINE | ID: mdl-12647899

ABSTRACT

Vocal tremors characterize many pathological voices, but acoustic-perceptual aspects of tremor are poorly understood. To investigate this relationship, 2 tremor models were implemented in a custom voice synthesizer. The first modulated fundamental frequency (F0) with a sine wave. The second provided irregular modulation. Control parameters in both models were the frequency and amplitude of the F0 modulating waveform. Thirty-two 1-s samples of /a/, produced by speakers with vocal pathology, were modeled in the synthesizer. Synthetic copies of each vowel were created by using tremor parameters derived from different features of F0 versus time plots of the natural stimuli or by using parameters chosen to match the original stimuli perceptually. Listeners compared synthetic and original stimuli in 3 experiments. Sine wave and irregular tremor models both provided excellent matches to subsets of the voices. The perceptual importance of the shape of the modulating waveform depended on the severity of the tremor, with the choice of tremor model increasing in importance as the tremor increased in severity. The average frequency deviation from the mean F0 proved a good predictor of the perceived amplitude of a tremor. Differences in tremor rate were easiest to hear when the tremor was sinusoidal and of small amplitude. Differences in tremor rate were difficult to judge for tremors of large amplitude or in the context of irregularities in the pattern of frequency modulation. These results suggest that difference limens are larger for modulation rates and amplitudes when the tremor pattern is complex. Further, tremor rate, regularity, and amplitude interact, so that the perceptual importance of any one dimension depends on values of the others.

Subject(s)

Vocal Cords/physiopathology , Voice Disorders/diagnosis , Voice Disorders/physiopathology , Adult , Algorithms , Female , Humans , Male , Middle Aged , Severity of Illness Index , Time Factors , Voice Quality
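The first (sine-wave) tremor model described in the abstract above has two control parameters: the frequency (rate) and amplitude (extent) of the F0 modulating waveform. A minimal sketch can express this as an F0 contour generator. It assumes F0 is represented as a sampled contour rather than as the synthesizer's actual glottal pulse train. The function names and the 1000 samples/s contour rate are hypothetical. The irregular-modulation model is not sketched because the abstract does not specify it. The second function computes the "average frequency deviation from the mean F0" that the abstract reports as a good predictor of perceived tremor amplitude. It is interpreted here, as an assumption, as the mean absolute deviation.

```python
import math


def sine_tremor_f0(mean_f0, rate_hz, extent_hz, duration_s, fs=1000):
    """F0 contour (Hz) whose deviation from mean_f0 is a sine wave.

    rate_hz   -- frequency of the modulating waveform (tremor rate)
    extent_hz -- peak frequency deviation from mean_f0 (tremor amplitude)
    fs        -- contour samples per second (hypothetical default)
    """
    n = int(round(duration_s * fs))
    return [mean_f0 + extent_hz * math.sin(2 * math.pi * rate_hz * i / fs) for i in range(n)]


def mean_abs_deviation(f0):
    """Average absolute deviation of an F0 contour from its own mean (Hz)."""
    m = sum(f0) / len(f0)
    return sum(abs(x - m) for x in f0) / len(f0)


# A 1-s, 5-Hz tremor of +/-6 Hz around 120 Hz.
contour = sine_tremor_f0(120.0, 5.0, 6.0, 1.0)
print(round(mean_abs_deviation(contour), 2))  # close to 2 * 6 / pi, about 3.82
```

For a pure sine-wave tremor, this deviation measure is proportional to the modulation extent (2/π times the peak deviation). An irregular modulating waveform with the same peak extent can give a different value.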
