1.
J Speech Lang Hear Res ; 58(2): 171-84, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25480760

ABSTRACT

PURPOSE: Exercises are described that were designed to provide practice in phonetic transcription for students taking an introductory phonetics course. The goal was to allow instructors to offload much of the drill that would otherwise need to be covered in class or handled with paper-and-pencil tasks using text rather than speech as input. METHOD: The exercises were developed using Alvin, a general-purpose software package for experiment design and control. The simplest exercises help students learn sound-symbol associations. For example, a vowel-transcription exercise presents listeners with consonant-vowel-consonant syllables on each trial; students are asked to choose among buttons labeled with phonetic symbols for 12 vowels. Several word-transcription exercises are included in which students hear a word and are asked to enter a phonetic transcription. Immediate feedback is provided for all of the exercises. An explanation of the methods that are used to create exercises is provided. RESULTS: Although no formal evaluation was conducted, comments on course evaluations suggest that most students found the exercises to be useful. CONCLUSIONS: Exercises were developed for use in an introductory phonetics course. The exercises can be used in their current form, they can be modified to suit individual needs, or new exercises can be developed.


Subject(s)
Phonetics , Software , Speech-Language Pathology/education , Teaching , Humans , Young Adult
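Below is a minimal sketch of the kind of forced-choice vowel-labeling drill this exercise set describes (hear a CVC syllable, pick one of 12 vowel symbols, get immediate feedback). It is not the Alvin package itself; the stimulus file names, response set, and scoring are assumptions made for illustration.

```python
# Hypothetical forced-choice vowel-labeling trial loop (NOT the Alvin software).
# Stimulus file names and the 12-vowel response set are assumptions.
import random

STIMULI = {  # hypothetical WAV file -> correct phonetic symbol
    "heed.wav": "i", "hid.wav": "ɪ", "head.wav": "ɛ", "had.wav": "æ",
    "hod.wav": "ɑ", "hawed.wav": "ɔ", "hood.wav": "ʊ", "whod.wav": "u",
    "hud.wav": "ʌ", "heard.wav": "ɝ", "hayed.wav": "e", "hoed.wav": "o",
}
CHOICES = sorted(set(STIMULI.values()))

def run_block(play_fn, get_response_fn):
    """Present each syllable once in random order and give immediate feedback."""
    items = list(STIMULI.items())
    random.shuffle(items)
    correct = 0
    for wav, answer in items:
        play_fn(wav)                         # play the CVC syllable
        response = get_response_fn(CHOICES)  # student picks one of 12 symbols
        if response == answer:
            correct += 1
            print("Correct.")
        else:
            print(f"Not quite: that vowel is /{answer}/.")
    print(f"Score: {correct}/{len(items)}")
```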
2.
J Voice ; 28(6): 783-8, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25179777

ABSTRACT

OBJECTIVE: The purpose of this study was to establish normative values for the smoothed cepstral peak prominence (CPPS) and its sensitivity and specificity as a measure of dysphonia. STUDY DESIGN: Prospective cohort study. METHODS: Voice samples of running speech were obtained from 835 patients and 50 volunteers. Eight laryngologists and four speech-language pathologists performed perceptual ratings of the voice samples on the degree of dysphonia/normality using an analog scale. The mean of their perceptual ratings was used as the gold standard for the detection of the presence or absence of dysphonia. CPPS was measured using the CPPS algorithm of Hillenbrand, and the cut-off value for positivity that had the highest sensitivity and specificity for discriminating between normal and severely dysphonic voices was determined based on ROC-curve analysis. RESULTS: The cut-off value for a normal CPPS was set at 4.0 or higher, which gave a sensitivity of 92.4%, a specificity of 79%, a positive predictive value of 82.5%, and a negative predictive value of 90.8%. The area under the receiver operating characteristic (ROC) curve was 0.937 (P < 0.05). CONCLUSIONS: CPPS is a good measure of dysphonia, with the normal value of CPPS (Hillenbrand algorithm) of a running speech sample being defined as a value of 4.0 or higher.


Subject(s)
Dysphonia/diagnosis , Speech Acoustics , Voice Quality , Algorithms , Area Under Curve , Case-Control Studies , Dysphonia/physiopathology , Female , Humans , Judgment , Male , Observer Variation , Predictive Value of Tests , Prospective Studies , ROC Curve , Reproducibility of Results , Severity of Illness Index , Signal Processing, Computer-Assisted , Speech Perception , Speech Production Measurement , United States
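For context, the sketch below shows the general cepstral peak prominence idea for a single analysis frame: the height of the cepstral peak in the pitch range above a regression line fit to the cepstrum. It is not Hillenbrand's exact CPPS implementation (which also smooths across frames and quefrency bins before peak picking), and the window length and pitch-search range are assumptions, so values from this sketch are not directly comparable to the 4.0 dB cut-off reported above.

```python
# Rough single-frame CPP sketch; not the study's CPPS implementation.
import numpy as np

def cpp_frame(frame, fs, f0_min=60.0, f0_max=300.0):
    """CPP (dB) of one frame: cepstral peak height above a regression line.
    The frame must be long enough to contain the longest pitch period searched."""
    x = frame * np.hamming(len(frame))
    power_db = 10.0 * np.log10(np.abs(np.fft.fft(x)) ** 2 + 1e-12)
    cepstrum_db = 10.0 * np.log10(np.abs(np.fft.fft(power_db)) ** 2 + 1e-12)
    quefrency = np.arange(len(cepstrum_db)) / fs          # seconds

    # Search for the peak between the quefrencies of f0_max and f0_min
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    peak_idx = lo + np.argmax(cepstrum_db[lo:hi])

    # Straight-line fit to the low half of the cepstrum (simple least squares)
    fit = slice(1, len(cepstrum_db) // 2)
    slope, intercept = np.polyfit(quefrency[fit], cepstrum_db[fit], 1)
    return cepstrum_db[peak_idx] - (slope * quefrency[peak_idx] + intercept)
```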
3.
J Acoust Soc Am ; 129(6): 3991-4000, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21682420

ABSTRACT

There is a significant body of research examining the intelligibility of sinusoidal replicas of natural speech. Discussion has followed about what the sinewave speech phenomenon might imply about the mechanisms underlying phonetic recognition. However, most of this work has been conducted using sentence material, making it unclear what the contributions are of listeners' use of linguistic constraints versus lower level phonetic mechanisms. This study was designed to measure vowel intelligibility using sinusoidal replicas of naturally spoken vowels. The sinusoidal signals were modeled after 300 /hVd/ syllables spoken by men, women, and children. Students enrolled in an introductory phonetics course served as listeners. Recognition rates for the sinusoidal vowels averaged 55%, which is much lower than the ∼95% intelligibility of the original signals. Attempts to improve performance using three different training methods met with modest success, with post-training recognition rates rising by ∼5-11 percentage points. Follow-up work showed that more extensive training produced further improvements, with performance leveling off at ∼73%-74%. Finally, modeling work showed that a fairly simple pattern-matching algorithm trained on naturally spoken vowels classified sinewave vowels with 78.3% accuracy, showing that the sinewave speech phenomenon does not necessarily rule out template matching as a mechanism underlying phonetic recognition.


Subject(s)
Speech Acoustics , Speech Intelligibility , Speech Perception , Acoustic Stimulation , Adult , Algorithms , Analysis of Variance , Audiometry, Pure-Tone , Audiometry, Speech , Child , Female , Humans , Male , Pattern Recognition, Physiological , Recognition, Psychology , Signal Processing, Computer-Assisted , Sound Spectrography
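As a minimal illustration of the stimuli described above, the sketch below builds a three-tone "sinewave vowel": one sinusoid per formant. The formant values and amplitudes are illustrative, not taken from the study's /hVd/ stimuli, and real sinewave speech uses time-varying formant tracks rather than the static tones shown here.

```python
# Illustrative static sinewave-vowel replica; not the study's stimuli.
import numpy as np

def sinewave_vowel(formants_hz, amps, dur=0.3, fs=16000):
    """Sum one sinusoid per formant; amps are linear gains."""
    t = np.arange(int(dur * fs)) / fs
    signal = sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(formants_hz, amps))
    return signal / np.max(np.abs(signal))   # normalize to +/-1

# e.g., an /i/-like replica: low F1, high F2 (values assumed for the example)
replica = sinewave_vowel([270.0, 2300.0, 3000.0], [1.0, 0.5, 0.25])
```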
4.
Atten Percept Psychophys ; 71(5): 1150-66, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19525544

ABSTRACT

The purpose of the present study was to determine the contributions of fundamental frequency (f0) and formants in cuing the distinction between men's and women's voices. A source-filter synthesizer was used to create four versions of 25 sentences spoken by men: (1) unmodified synthesis, (2) f0 only shifted up toward values typical of women, (3) formants only shifted up toward values typical of women, and (4) both f0 and formants shifted up. Identical methods were used to generate four corresponding versions of 25 sentences spoken by women, but with downward shifts. Listening tests showed that (1) shifting both f0 and formants was usually effective (~82%) in changing the perceived sex of the utterance, and (2) shifting either f0 or formants alone was usually ineffective in changing the perceived sex. Both f0 and formants are apparently needed to specify speaker sex, though even together these cues are not entirely effective. Results also suggested that f0 is somewhat more important than formants. A second experiment used the same methods, but isolated /hVd/ syllables were used as test signals. Results were broadly similar, with the important exception that, on average, the syllables were more likely to shift perceived talker sex with shifts in f0 and/or formants.


Subject(s)
Phonetics , Sex Characteristics , Sound Spectrography , Speech Acoustics , Speech Perception , Voice Quality , Female , Humans , Male
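The sketch below shows the same kind of manipulation (shifting f0 and formants independently) using the WORLD vocoder via the pyworld package as a stand-in; it is not the source-filter synthesizer used in the study, and the scale factors are only illustrative of a male-to-female shift.

```python
# Independent f0 and formant shifting with pyworld (a stand-in, not the study's synthesizer).
import numpy as np
import pyworld as pw

def shift_f0_and_formants(x, fs, f0_ratio=1.7, formant_ratio=1.2):
    x = x.astype(np.float64)
    f0, t = pw.harvest(x, fs)            # pitch track
    sp = pw.cheaptrick(x, f0, t, fs)     # spectral envelope (carries the formants)
    ap = pw.d4c(x, f0, t, fs)            # aperiodicity

    # Raise formants by warping the envelope's frequency axis upward
    n_bins = sp.shape[1]
    src = np.arange(n_bins)
    warped = np.array([np.interp(src / formant_ratio, src, frame) for frame in sp])

    return pw.synthesize(f0 * f0_ratio, warped, ap, fs)
```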
5.
J Acoust Soc Am ; 119(6): 4041-54, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16838546

ABSTRACT

This study was designed to measure the relative contributions to speech intelligibility of spectral envelope peaks (including, but not limited to, formants) versus the detailed shape of the spectral envelope. The problem was addressed by asking listeners to identify sentences and nonsense syllables that were generated by two structurally identical source-filter synthesizers, one of which constructs the filter function based on the detailed spectral envelope shape, while the other constructs the filter function using a purposely coarse estimate that is based entirely on the distribution of peaks in the envelope. Viewed in the broadest terms, the results showed that nearly as much speech information is conveyed by the peaks-only method as by the detail-preserving method. Just as clearly, however, every test showed some measurable advantage for spectral detail, although the differences were not large in absolute terms.


Subject(s)
Acoustic Stimulation/methods , Phonetics , Speech Acoustics , Speech Perception/physiology , Adult , Analysis of Variance , Child , Female , Humans , Linguistics , Male , Sound Spectrography , Speech Intelligibility
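The sketch below illustrates the contrast at issue: a detailed envelope versus a coarse envelope rebuilt only from its peaks. The peak-picking and interpolation choices are assumptions, not the settings of the study's two synthesizers.

```python
# Peaks-only envelope reconstruction (illustrative settings, not the study's).
import numpy as np
from scipy.signal import find_peaks

def peaks_only_envelope(env_db):
    """Rebuild an envelope by linear interpolation between its local peaks."""
    peaks, _ = find_peaks(env_db)
    if len(peaks) == 0:
        return env_db.copy()
    # Anchor the endpoints so interpolation covers the whole frequency range
    xs = np.concatenate(([0], peaks, [len(env_db) - 1]))
    ys = env_db[xs]
    return np.interp(np.arange(len(env_db)), xs, ys)
```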
6.
J Speech Lang Hear Res ; 48(1): 45-60, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15938059

ABSTRACT

The purpose of this paper is to describe a software package that can be used for performing such routine tasks as controlling listening experiments (e.g., simple labeling, discrimination, sentence intelligibility, and magnitude estimation), recording responses and response latencies, analyzing and plotting the results of those experiments, displaying instructions, and making scripted audio-recordings. The software runs under Windows and is controlled by creating text files that allow the experimenter to specify key features of the experiment such as the stimuli that are to be presented, the randomization scheme, interstimulus and intertrial intervals, the format of the output file, and the layout of response alternatives on the screen. Although the software was developed primarily with speech-perception and psychoacoustics research in mind, it has uses in other areas as well, such as written or auditory word recognition, written or auditory sentence processing, and visual perception.


Subject(s)
Software , Speech Perception , Humans , Recognition, Psychology , Visual Perception , Vocabulary
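As a generic illustration of a text-file-controlled listening experiment of the kind described above, the sketch below reads settings and a stimulus list from a plain-text file and randomizes the presentation order. The file format shown is invented for the example and is not the actual Alvin control-file syntax.

```python
# Generic text-file-driven experiment loader (hypothetical format, NOT Alvin's syntax).
import random

def load_experiment(path):
    """Read 'key = value' settings plus one stimulus file name per line."""
    settings, stimuli = {}, []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, value = (s.strip() for s in line.split("=", 1))
                settings[key] = value
            else:
                stimuli.append(line)
    if settings.get("randomize", "yes") == "yes":
        random.shuffle(stimuli)
    return settings, stimuli
```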
7.
Ann Otol Rhinol Laryngol ; 112(4): 324-33, 2003 Apr.
Article in English | MEDLINE | ID: mdl-12731627

ABSTRACT

Quantification of perceptual voice characteristics allows the assessment of voice changes. Acoustic measures of jitter, shimmer, and noise-to-harmonic ratio (NHR) are often unreliable. Measures of cepstral peak prominence (CPP) may be more reliable predictors of dysphonia. Trained listeners analyzed voice samples from 281 patients. The NHR, amplitude perturbation quotient, smoothed pitch perturbation quotient, percent jitter, and CPP were obtained from sustained vowel phonation, and the CPP was obtained from running speech. For the first time, normal and abnormal values of CPP were defined, and they were compared with other acoustic measures used to predict dysphonia. The CPP for running speech is a good predictor and a more reliable measure of dysphonia than are acoustic measures of jitter, shimmer, and NHR.


Subject(s)
Speech Acoustics , Voice Disorders/diagnosis , Aged , Aged, 80 and over , Child , Female , Humans , Male , Middle Aged , Observer Variation , ROC Curve , Reference Values , Sensitivity and Specificity , Severity of Illness Index , Time Factors , Voice Disorders/epidemiology , Voice Quality
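For context on the perturbation measures mentioned above, the sketch below shows the simplest "local" definitions of jitter and shimmer computed from cycle-by-cycle periods and peak amplitudes. The study itself used smoothed and quotient variants (sPPQ, APQ), which average over several periods, so these are illustrative rather than the study's exact measures.

```python
# Local jitter (%) and shimmer (dB) from cycle periods and peak amplitudes.
import numpy as np

def local_jitter_percent(periods_s):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    periods_s = np.asarray(periods_s, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods_s))) / np.mean(periods_s)

def local_shimmer_db(peak_amps):
    """Mean absolute dB difference between consecutive cycle amplitudes."""
    peak_amps = np.asarray(peak_amps, dtype=float)
    return np.mean(np.abs(20.0 * np.log10(peak_amps[1:] / peak_amps[:-1])))
```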
8.
J Acoust Soc Am ; 113(2): 1044-55, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12597197

ABSTRACT

The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a "flooring" procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /hVd/ context spoken by 45 men, 48 women, and 46 children. The narrow band model classified vowels in this database with a degree of accuracy (91.4%) approaching that of human listeners.


Subject(s)
Phonetics , Sound Spectrography , Speech Acoustics , Speech Perception , Adult , Child , Female , Fourier Analysis , Humans , Male , Reference Values , Sound Spectrography/statistics & numerical data
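The sketch below follows the pattern-matching step described above: level-equalize and "floor" a narrow band spectrum against a center-weighted running average, then pick the vowel whose template yields the smallest city-block distance summed over a sequence of spectral slices. The window size, normalization, and threshold details are simplified stand-ins for the model's actual procedures, and templates are assumed to have been built by averaging already-floored spectra from many talkers.

```python
# Simplified narrow band template matcher (details are stand-ins for the model's).
import numpy as np

def floor_spectrum(spec_db, win=31):
    """Level-equalize, then zero values below a center-weighted running average."""
    spec_db = spec_db - spec_db.mean()                 # crude level equalization
    kernel = np.hanning(win); kernel /= kernel.sum()   # center-weighted window
    threshold = np.convolve(spec_db, kernel, mode="same")
    return np.where(spec_db >= threshold, spec_db, 0.0)

def classify_vowel(slices_db, templates):
    """slices_db: list of narrow band spectra through the vowel;
    templates: {vowel: list of averaged, floored template spectra (same length)}."""
    best, best_dist = None, np.inf
    for vowel, tmpl_slices in templates.items():
        dist = sum(np.sum(np.abs(floor_spectrum(s) - t))   # city-block distance
                   for s, t in zip(slices_db, tmpl_slices))
        if dist < best_dist:
            best, best_dist = vowel, dist
    return best
```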
9.
J Commun Disord ; 35(6): 533-42, 2002.
Article in English | MEDLINE | ID: mdl-12443051

ABSTRACT

Spectral moments, which describe the distribution of frequencies in a spectrum, were used to investigate the preservation of acoustic cues to intelligibility of speech produced during simultaneous communication (SC) in relation to acoustic cues produced when speaking alone. The spectral moment data obtained from speech alone (SA) were comparable to those reported by Jongman, Wayland, and Wong (2000) and Nittrouer (1995). The spectral moments obtained from speech produced during SC were statistically indistinguishable from those obtained during SA, indicating no measurable degradation of obstruent spectral acoustic cues during SC. EDUCATIONAL OBJECTIVES: As a result of this activity, the participant will be able to (1) describe SC; (2) explain the role of SC in communication with children who are deaf; (3) describe the first, third, and fourth spectral moments of obstruent consonants; and (4) identify spectral moment patterns in speech produced during SC.


Subject(s)
Communication , Cues , Speech , Adult , Female , Humans , Male , Phonetics , Speech Acoustics , Speech Production Measurement , Time Factors
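The sketch below shows how the first four spectral moments (mean, variance, skewness, kurtosis) can be computed from a frame's power spectrum, treating the spectrum as a probability distribution over frequency. The windowing choice is an assumption, not the study's exact analysis settings.

```python
# First four spectral moments of one frame (windowing is an assumed detail).
import numpy as np

def spectral_moments(frame, fs):
    frame = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = power / power.sum()                      # spectrum as a distribution

    m1 = np.sum(freqs * p)                       # spectral mean (centroid)
    m2 = np.sum((freqs - m1) ** 2 * p)           # variance
    skew = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5
    kurt = np.sum((freqs - m1) ** 4 * p) / m2 ** 2 - 3.0   # excess kurtosis
    return m1, m2, skew, kurt
```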
10.
J Speech Lang Hear Res ; 45(4): 639-50, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12199395

ABSTRACT

A speech synthesizer was developed that operates by summing exponentially damped sinusoids at frequencies and amplitudes corresponding to peaks derived from the spectrum envelope of the speech signal. The spectrum analysis begins with the calculation of a smoothed Fourier spectrum. A masking threshold is then computed for each frame as the running average of spectral amplitudes over an 800-Hz window. In a rough simulation of lateral suppression, the running average is then subtracted from the smoothed spectrum (with negative spectral values set to zero), producing a masked spectrum. The signal is resynthesized by summing exponentially damped sinusoids at frequencies corresponding to peaks in the masked spectra. If a periodicity measure indicates that a given analysis frame is voiced, the damped sinusoids are pulsed at a rate corresponding to the measured fundamental period. For unvoiced speech, the damped sinusoids are pulsed on and off at random intervals. A perceptual evaluation of speech produced by the damped sinewave synthesizer showed excellent sentence intelligibility, excellent intelligibility for vowels in /hVd/ syllables, and fair intelligibility for consonants in CV nonsense syllables.


Subject(s)
Communication Aids for Disabled , Speech, Alaryngeal , Humans , Phonetics , Sound Spectrography , Speech Intelligibility , Speech Perception
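The sketch below illustrates the core resynthesis idea for a voiced frame: exponentially damped sinusoids at the masked-spectrum peak frequencies, restarted ("pulsed") once per fundamental period. The decay rate, frame length, and normalization are assumptions, not the paper's exact settings, and unvoiced frames (random pulsing) are omitted.

```python
# Voiced-frame damped-sinewave resynthesis (settings are illustrative assumptions).
import numpy as np

def damped_sine_frame(peak_freqs, peak_amps, f0, dur=0.04, fs=16000, decay=200.0):
    """One damped-sinusoid burst per pitch period, summed over all spectral peaks."""
    n = int(dur * fs)
    out = np.zeros(n)
    period = int(fs / f0)
    for start in range(0, n, period):            # pulse at each glottal cycle
        t = np.arange(n - start) / fs
        burst = np.exp(-decay * t) * sum(
            a * np.sin(2 * np.pi * f * t) for f, a in zip(peak_freqs, peak_amps))
        out[start:] += burst
    return out / np.max(np.abs(out))
```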