Results 1 - 20 of 31
1.
Phonetica ; 77(3): 186-208, 2020.
Article in English | MEDLINE | ID: mdl-31018217

ABSTRACT

BACKGROUND/AIMS: This work examines the perception of the stop voicing contrast in Spanish and English along four acoustic dimensions, comparing monolingual and bilingual listeners. Our primary goals are to test the extent to which cue-weighting strategies are language-specific in monolinguals, and whether this language specificity extends to bilingual listeners. METHODS: Participants categorized sounds varying in voice onset time (VOT, the primary cue to the contrast) and three secondary cues: fundamental frequency at vowel onset, first formant (F1) onset frequency, and stop closure duration. Listeners heard acoustically identical target stimuli, within language-specific carrier phrases, in English and Spanish modes. RESULTS: While all listener groups used all cues, monolingual English listeners relied more on F1, and less on closure duration, than monolingual Spanish listeners, indicating language specificity in cue use. Early bilingual listeners used the three secondary cues similarly in English and Spanish, despite showing language-specific VOT boundaries. CONCLUSION: While our findings reinforce previous work demonstrating language-specific phonetic representations in bilinguals in terms of VOT boundary, they suggest that this specificity may not extend straightforwardly to cue-weighting strategies.
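
A cue-weighting analysis of this kind is typically implemented as a logistic regression over the manipulated dimensions (the entry's Subject terms include Logistic Models). Below is a minimal sketch of that approach with invented data; the variable names, stimulus ranges, and toy response model are assumptions, not the study's.

```python
# Sketch: estimating perceptual cue weights with logistic regression.
# All data and coefficients below are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Hypothetical per-trial stimulus dimensions (units are for illustration):
vot = rng.uniform(-20, 60, n)        # voice onset time, ms
f0 = rng.uniform(90, 140, n)         # fundamental frequency at vowel onset, Hz
f1 = rng.uniform(200, 500, n)        # F1 onset frequency, Hz
closure = rng.uniform(40, 120, n)    # stop closure duration, ms

# A toy listener whose "voiceless" responses are driven mostly by VOT:
logit = (0.15 * (vot - 20) + 0.02 * (f0 - 115)
         + 0.004 * (f1 - 350) - 0.01 * (closure - 80))
resp = rng.random(n) < 1 / (1 + np.exp(-logit))

# Standardize so the fitted coefficients are comparable across dimensions;
# each coefficient then indexes that cue's relative weight.
X = StandardScaler().fit_transform(np.column_stack([vot, f0, f1, closure]))
weights = LogisticRegression().fit(X, resp).coef_[0]
for name, w in zip(["VOT", "f0 onset", "F1 onset", "closure"], weights):
    print(f"{name:9s} weight = {w:+.2f}")
```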


Subject(s)
Cues , Language , Multilingualism , Phonetics , Voice , Acoustic Stimulation , Adult , Humans , Logistic Models , Speech , Young Adult
2.
Cognition ; 182: 318-330, 2019 01.
Article in English | MEDLINE | ID: mdl-30415133

ABSTRACT

Bilinguals understand when the communication context calls for speaking a particular language and can switch from speaking one language to speaking the other based on such conceptual knowledge. There is disagreement regarding whether conceptually-based language selection is also possible in the listening modality. For example, can bilingual listeners perceptually adjust to changes in pronunciation across languages based on their conceptual understanding of which language they are currently hearing? We asked French- and Spanish-English bilinguals to identify nonsense monosyllables as beginning with /b/ or /p/, speech categories that French and Spanish speakers pronounce differently than English speakers. We conceptually cued each bilingual group to one or the other of their two languages by explicitly instructing them that the speech items were word onsets in that language, uttered by a native speaker thereof. Both groups adjusted their /b-p/ identification boundary as a function of this conceptual cue to the language context. These results support a bilingual model permitting conceptually-based language selection on both the speaking and listening ends of a communicative exchange.


Subject(s)
Multilingualism , Psycholinguistics , Speech Perception/physiology , Adult , Cues , Humans , Young Adult
3.
J Acoust Soc Am ; 137(1): EL65-70, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25618101

ABSTRACT

Speech perception studies generally focus on the acoustic information present in the frequency regions below 6 kHz. Recent evidence suggests that there is perceptually relevant information in the higher frequencies, including information affecting speech intelligibility. This experiment examined whether listeners are able to accurately identify a subset of vowels and consonants in CV-context when only high-frequency (above 5 kHz) acoustic information is available (through high-pass filtering and masking of lower frequency energy). The findings reveal that listeners are capable of extracting information from these higher frequency regions to accurately identify certain consonants and vowels.
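
A minimal sketch of the filtering-plus-masking manipulation the abstract describes, under assumed parameters (filter type, order, cutoff slope, and masker level are illustrative; the study's exact signal processing is not reproduced here):

```python
# Sketch: isolating energy above 5 kHz and masking the low-frequency region.
# Filter order, cutoff, and masker level are illustrative assumptions.
import numpy as np
from scipy import signal

fs = 44100
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)          # stand-in for a 1-s CV token

# High-pass the token at 5 kHz (zero-phase to avoid phase distortion).
sos_hp = signal.butter(10, 5000, btype="highpass", fs=fs, output="sos")
x_hp = signal.sosfiltfilt(sos_hp, x)

# Low-pass noise to mask any residual energy below the cutoff.
sos_lp = signal.butter(10, 5000, btype="lowpass", fs=fs, output="sos")
masker = signal.sosfiltfilt(sos_lp, rng.standard_normal(fs))

stimulus = x_hp + 0.1 * masker       # mixing level chosen arbitrarily here
```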


Subject(s)
Phonetics , Pitch Perception/physiology , Speech Intelligibility/physiology , Acoustic Stimulation , Adult , Female , Humans , Male , Perceptual Masking/physiology , Speech Perception , Young Adult
4.
J Voice ; 29(2): 140-7, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25532813

ABSTRACT

OBJECTIVES/HYPOTHESIS: Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations. STUDY DESIGN: Statistical categorization methods (discriminant function analysis) were used to discriminate signals from simulated vocal tremor with different sources using only acoustic measures derived from the amplitude envelopes. METHODS: Simulations of vocal tremor were created by modulating parameters of a vocal fold model corresponding to oscillations of respiratory driving pressure (respiratory tremor), degree of vocal fold adduction (adductory tremor), and fundamental frequency of vocal fold vibration (F0 tremor). The acoustic measures were based on spectral analyses of the amplitude envelope computed across the entire signal and within select frequency bands. RESULTS: The signals could be categorized (with accuracy well above chance) in terms of the simulated tremor source using only measures of the amplitude envelope spectrum even when multiple sources of tremor were included. CONCLUSIONS: These results supply initial support for an amplitude-envelope-based approach to identify the source of vocal tremor and provide further evidence for the rich information about talker characteristics present in the temporal structure of the amplitude envelope.
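
The core measurement described, a spectrum of the acoustic amplitude envelope, can be sketched as follows: the Hilbert transform extracts the envelope, and a tremor source then appears as a peak at its modulation rate. The signal, modulation rate, and depth below are hypothetical:

```python
# Sketch: spectrum of the amplitude envelope of a simulated "voice" signal.
# The 5-Hz amplitude modulation stands in for one tremor source.
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(2 * fs) / fs
carrier = np.sin(2 * np.pi * 120 * t)                  # toy glottal carrier
x = (1 + 0.3 * np.sin(2 * np.pi * 5 * t)) * carrier    # 5-Hz AM ~ tremor

envelope = np.abs(hilbert(x))      # amplitude envelope via Hilbert transform
envelope -= envelope.mean()        # remove DC so the FFT peak is the tremor

spec = np.abs(np.fft.rfft(envelope))
freqs = np.fft.rfftfreq(envelope.size, 1 / fs)

slow = freqs < 20                  # tremor rates sit well below 20 Hz
print(f"envelope spectrum peaks at {freqs[slow][np.argmax(spec[slow])]:.1f} Hz")
```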


Subject(s)
Speech Acoustics , Vocal Cords/physiopathology , Voice Disorders/physiopathology , Voice Quality/physiology , Humans , Speech Production Measurement/methods
5.
Front Psychol ; 5: 1239, 2014.
Article in English | MEDLINE | ID: mdl-25400613

ABSTRACT

Humans routinely produce acoustical energy at frequencies above 6 kHz during vocalization, but this frequency range is often not represented in communication devices and speech perception research. Recent advancements toward high-definition (HD) voice and extended bandwidth hearing aids have increased the interest in the high frequencies. The potential perceptual information provided by high-frequency energy (HFE) is not well characterized. We found that humans can accomplish tasks of gender discrimination and vocal production mode discrimination (speech vs. singing) when presented with acoustic stimuli containing only HFE at both amplified and normal levels. Performance in these tasks was robust in the presence of low-frequency masking noise. No substantial learning effect was observed. Listeners also were able to identify the sung and spoken text (excerpts from "The Star-Spangled Banner") with very few exposures. These results add to the increasing evidence that the high frequencies provide at least redundant information about the vocal signal, suggesting that its representation in communication devices (e.g., cell phones, hearing aids, and cochlear implants) and speech/voice synthesizers could improve these devices and benefit normal-hearing and hearing-impaired listeners.

6.
Front Psychol ; 5: 587, 2014.
Article in English | MEDLINE | ID: mdl-24982643

ABSTRACT

While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main reasons for the focus on low-frequency energy appear to be that the low-frequency portion of the speech spectrum was seen as sufficient from a perceptual standpoint, and that the difficulty of HFE research was too great to be justifiable from a technological standpoint. The advancement of technology continues to overcome concerns stemming from the latter reason, and advances in our understanding of the perceptual effects of HFE now cast doubt on the former. Emerging evidence indicates that HFE plays a more significant role than previously believed and should thus be considered in speech and voice perception research, especially research involving children and the hearing impaired.

7.
Front Psychol ; 5: 427, 2014.
Article in English | MEDLINE | ID: mdl-24917830
8.
J Speech Lang Hear Res ; 57(5): 1619-37, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24845730

ABSTRACT

PURPOSE: Computational modeling was used to examine the consequences of 5 different laryngeal asymmetries on acoustic and perceptual measures of vocal function. METHOD: A kinematic vocal fold model was used to impose 5 laryngeal asymmetries: adduction, edge bulging, nodal point ratio, amplitude of vibration, and starting phase. Thirty /a/ and /ɪ/ vowels were generated for each asymmetry and analyzed acoustically using cepstral peak prominence (CPP), harmonics-to-noise ratio (HNR), and 3 measures of spectral slope (H1*-H2*, B0-B1, and B0-B2). Twenty listeners rated voice quality for a subset of the productions. RESULTS: Increasingly asymmetric adduction, bulging, and nodal point ratio explained significant variance in perceptual rating (R2 = .05, p < .001). The same factors resulted in generally decreasing CPP, HNR, and B0-B2 and in increasing B0-B1. Of the acoustic measures, only CPP explained significant variance in perceived quality (R2 = .14, p < .001). Increasingly asymmetric amplitude of vibration or starting phase minimally altered vocal function or voice quality. CONCLUSION: Asymmetries of adduction, bulging, and nodal point ratio drove acoustic measures and perception in the current study, whereas asymmetric amplitude of vibration and starting phase demonstrated minimal influence on the acoustic signal or voice quality.
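
Of the acoustic measures listed, cepstral peak prominence (CPP) is the one most often reimplemented; the following is a bare-bones single-frame version under simplified conventions (real CPP implementations differ in windowing, frame averaging, and regression details):

```python
# Sketch: a bare-bones cepstral peak prominence (CPP) for one frame.
# Real implementations differ in windowing, averaging, and regression details.
import numpy as np

def cpp_db(frame, fs, f0_min=60.0, f0_max=300.0):
    """CPP (dB): cepstral peak height above a regression-line baseline."""
    windowed = frame * np.hanning(frame.size)
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    cep_db = 20 * np.log10(np.abs(np.fft.ifft(log_mag)) + 1e-12)
    quefrency = np.arange(frame.size) / fs            # seconds

    # Search only quefrencies corresponding to plausible f0 values.
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    peak = lo + np.argmax(cep_db[lo:hi])

    slope, intercept = np.polyfit(quefrency[lo:hi], cep_db[lo:hi], 1)
    baseline = slope * quefrency[peak] + intercept
    return cep_db[peak] - baseline

fs = 16000
t = np.arange(4096) / fs
voiced = np.sign(np.sin(2 * np.pi * 150 * t))         # strongly periodic toy frame
print(f"CPP ~ {cpp_db(voiced, fs):.1f} dB")           # large for periodic signals
```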


Subject(s)
Larynx/physiopathology , Speech Perception/physiology , Speech/physiology , Vocal Cord Paralysis/physiopathology , Adolescent , Adult , Aged , Computer Simulation , Female , Humans , Male , Middle Aged , Signal-To-Noise Ratio , Speech Acoustics , Vibration , Vocal Cords/physiopathology , Young Adult
9.
Behav Brain Sci ; 37(2): 204-5, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24775161

ABSTRACT

Speech is commonly claimed to relate to mirror neurons because of the alluring surface analogy of mirror neurons to the Motor Theory of speech perception, which posits that perception and production draw upon common motor-articulatory representations. We argue that the analogy fails and highlight examples of systems-level developmental approaches that have been more fruitful in revealing perception-production associations.


Subject(s)
Biological Evolution , Brain/physiology , Learning/physiology , Mirror Neurons/physiology , Social Perception , Animals , Humans
10.
J Acoust Soc Am ; 135(1): 400-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24437780

ABSTRACT

Previous work has shown that human listeners are sensitive to level differences in high-frequency energy (HFE) in isolated vowel sounds produced by male singers. Results indicated that sensitivity to HFE level changes increased with overall HFE level, suggesting that listeners would be more "tuned" to HFE in vocal production exhibiting higher levels of HFE. It follows that sensitivity to HFE level changes should be higher (1) for female vocal production than for male vocal production and (2) for singing than for speech. To test this hypothesis, difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs speech but not in female vs male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6-11.3 kHz) but 8-10 dB in the 16-kHz octave (11.3-22 kHz). These scores are lower (better) than those previously reported for isolated vowels and some musical instruments.
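
For reference, the octave-band levels being manipulated can be measured roughly as follows; the band edges are taken from the abstract, but the PSD settings and test signal are illustrative:

```python
# Sketch: measuring the level of the two octave bands discussed above.
# Band edges follow the abstract; PSD settings and signal are illustrative.
import numpy as np
from scipy.signal import welch

def band_level_db(x, fs, f_lo, f_hi):
    """Level (dB re arbitrary reference) of x within [f_lo, f_hi)."""
    freqs, psd = welch(x, fs=fs, nperseg=2048)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return 10 * np.log10(psd[band].sum() * (freqs[1] - freqs[0]) + 1e-20)

fs = 44100
x = np.random.default_rng(0).standard_normal(5 * fs)   # stand-in for a vowel
print(f"8-kHz octave (5.6-11.3 kHz) : {band_level_db(x, fs, 5600, 11300):.1f} dB")
print(f"16-kHz octave (11.3-22 kHz) : {band_level_db(x, fs, 11300, 22000):.1f} dB")
```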


Subject(s)
Pitch Discrimination , Singing , Speech Acoustics , Speech Perception , Voice Quality , Acoustic Stimulation , Adult , Audiometry, Speech , Female , Humans , Male , Psychoacoustics , Sex Factors , Sound Spectrography , Young Adult
11.
Psychol Sci ; 24(11): 2135-42, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24022652

ABSTRACT

Bilinguals perceptually accommodate speech variation across languages, but to what extent this flexibility depends on bilingual experience is uncertain. One account suggests that bilingual experience promotes language-specific processing modes, implying that bilinguals can switch as appropriate between the different phonetic systems of the languages they speak. Another account suggests that bilinguals rapidly recalibrate to the unique acoustic properties of each language following language-general processes common to monolinguals. Challenging this latter account, the present results show that Spanish-English bilinguals with exposure to both languages from early childhood, but not English monolinguals, shift perception as appropriate across acoustically controlled English and Spanish contexts. Early bilingual experience appears to promote language-specific phonetic systems.


Subject(s)
Multilingualism , Speech Perception/physiology , Adult , Humans , Phonetics , Psycholinguistics/methods , Random Allocation , Young Adult
12.
Front Psychol ; 4: 399, 2013.
Article in English | MEDLINE | ID: mdl-23847573

ABSTRACT

It is well-established that listeners will shift their categorization of a target vowel as a function of acoustic characteristics of a preceding carrier phrase (CP). These results have been interpreted as an example of perceptual normalization for variability resulting from differences in talker anatomy. The present study examined whether listeners would normalize for acoustic variability resulting from differences in speaking style within a single talker. Two vowel series were synthesized that varied between central and peripheral vowels (the vowels in "beat"-"bit" and "bod"-"bud"). Each member of the series was appended to one of four CPs that were spoken in either a "clear" or "reduced" speech style. Participants categorized vowels in these eight contexts. A reliable shift in categorization as a function of speaking style was obtained for three of four phrase sets. This demonstrates that phrase context effects can be obtained with a single talker. However, the directions of the obtained shifts are not reliably predicted on the basis of the speaking style of the talker. Instead, it appears that the effect is determined by an interaction of the average spectrum of the phrase with the target vowel.

13.
J Acoust Soc Am ; 132(3): 1754-64, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22978902

ABSTRACT

The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
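
A minimal sketch of a third-octave band analysis of a long-term average spectrum, as used here; the base-two band convention and all settings are assumptions:

```python
# Sketch: long-term average spectrum (LTAS) summed into third-octave bands.
# The base-two band convention and all settings here are assumptions.
import numpy as np
from scipy.signal import welch

fs = 44100
x = np.random.default_rng(0).standard_normal(10 * fs)  # stand-in for a recording

freqs, ltas = welch(x, fs=fs, nperseg=4096)            # long-term average spectrum
df = freqs[1] - freqs[0]

centers = 1000 * 2.0 ** (np.arange(-12, 13) / 3)       # ~63 Hz to 16 kHz
for fc in centers:
    lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)      # third-octave band edges
    band = (freqs >= lo) & (freqs < hi)
    if band.any():
        level = 10 * np.log10(ltas[band].sum() * df + 1e-20)
        print(f"{fc:8.0f} Hz band: {level:6.1f} dB")
```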


Subject(s)
Singing , Speech Acoustics , Voice , Adult , Aged , Analysis of Variance , Female , Humans , Male , Middle Aged , Sex Factors , Signal Processing, Computer-Assisted , Sound Spectrography , Speech Production Measurement , Voice Quality , Young Adult
14.
Front Psychol ; 3: 203, 2012.
Article in English | MEDLINE | ID: mdl-22737140

ABSTRACT

Voices have unique acoustic signatures, contributing to the acoustic variability listeners must contend with in perceiving speech, and it has long been proposed that listeners normalize speech perception to information extracted from a talker's speech. Initial attempts to explain talker normalization relied on extraction of articulatory referents, but recent studies of context-dependent auditory perception suggest that general auditory referents such as the long-term average spectrum (LTAS) of a talker's speech similarly affect speech perception. The present study aimed to differentiate the contributions of articulatory/linguistic versus auditory referents for context-driven talker normalization effects and, more specifically, to identify the specific constraints under which such contexts impact speech perception. Synthesized sentences manipulated to sound like different talkers influenced categorization of a subsequent speech target only when differences in the sentences' LTAS were in the frequency range of the acoustic cues relevant for the target phonemic contrast. This effect was true both for speech targets preceded by spoken sentence contexts and for targets preceded by non-speech tone sequences that were LTAS-matched to the spoken sentence contexts. Specific LTAS characteristics, rather than perceived talker, predicted the results suggesting that general auditory mechanisms play an important role in effects considered to be instances of perceptual talker normalization.

15.
J Acoust Soc Am ; 129(4): 2263-8, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21476681

ABSTRACT

The human voice spectrum above 5 kHz receives little attention. However, there are reasons to believe that this high-frequency energy (HFE) may play a role in perceived quality of voice in singing and speech. To fulfill this role, differences in HFE must first be detectable. To determine human ability to detect differences in HFE, the levels of the 8- and 16-kHz center-frequency octave bands were individually attenuated in sustained vowel sounds produced by singers and presented to listeners. Relatively small changes in HFE were in fact detectable, suggesting that this frequency range potentially contributes to the perception of especially the singing voice. Detection ability was greater in the 8-kHz octave than in the 16-kHz octave and varied with band energy level.
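
A rough sketch of the kind of octave-band attenuation described above, assuming a filter-subtract-and-reinsert approach; the filter design and attenuation amount are illustrative, not the study's:

```python
# Sketch: attenuating the 8-kHz octave band of a signal by a fixed amount.
# The filter-subtract-reinsert scheme and all parameters are assumptions.
import numpy as np
from scipy import signal

def attenuate_band(x, fs, f_lo, f_hi, atten_db):
    """Reduce the level of x inside [f_lo, f_hi] by roughly atten_db dB."""
    sos = signal.butter(8, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    band = signal.sosfiltfilt(sos, x)      # zero-phase, so subtraction is coherent
    gain = 10 ** (-atten_db / 20)
    return x - band + gain * band          # remove the band, reinsert it quieter

fs = 44100
vowel = np.random.default_rng(0).standard_normal(fs)   # stand-in for a sung vowel
modified = attenuate_band(vowel, fs, 5600, 11300, atten_db=6.0)
```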


Subject(s)
Auditory Perception/physiology , Music , Phonation/physiology , Voice/physiology , Adult , Auditory Threshold/physiology , Female , Humans , Male , Middle Aged , Phonetics , Pressure , Sound Spectrography , Voice Training , Young Adult
16.
Atten Percept Psychophys ; 72(5): 1218-27, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20601702

ABSTRACT

Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition.


Subject(s)
Comprehension , Phonetics , Semantics , Speech Perception , Adult , Cochlear Implants , Humans , Infant , Infant, Newborn , Language Development , Psycholinguistics , Sound Spectrography
17.
J Speech Lang Hear Res ; 53(5): 1246-55, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20643800

ABSTRACT

PURPOSE: Previous research demonstrated the ability of temporally based rhythm metrics to distinguish among dysarthrias with different prosodic deficit profiles (J. M. Liss et al., 2009). The authors examined whether comparable results could be obtained by an automated analysis of speech envelope modulation spectra (EMS), which quantifies the rhythmicity of speech within specified frequency bands. METHOD: EMS was conducted on sentences produced by 43 speakers with 1 of 4 types of dysarthria and healthy controls. The EMS consisted of the spectra of the slow-rate (up to 10 Hz) amplitude modulations of the full signal and 7 octave bands ranging in center frequency from 125 to 8000 Hz. Six variables were calculated for each band relating to peak frequency and amplitude and relative energy above, below, and in the region of 4 Hz. Discriminant function analyses (DFA) determined which sets of predictor variables best discriminated between and among groups. RESULTS: Each of 6 DFAs identified 2-6 of the 48 predictor variables. These variables achieved 84%-100% classification accuracy for group membership. CONCLUSIONS: Dysarthrias can be characterized by quantifiable temporal patterns in acoustic output. Because EMS analysis is automated and requires no editing or linguistic assumptions, it shows promise as a clinical and research tool.
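
A minimal sketch of an EMS-style analysis for a single octave band, loosely following the abstract's description (slow-rate envelope spectrum up to 10 Hz, energy above and below 4 Hz); filter design, smoothing, and the reduced feature set are assumptions:

```python
# Sketch: a minimal envelope modulation spectrum (EMS) for one octave band.
# Filter design, smoothing, and the reduced feature set are assumptions.
import numpy as np
from scipy import signal

def ems_features(x, fs, f_lo, f_hi):
    sos = signal.butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    env = np.abs(signal.hilbert(signal.sosfiltfilt(sos, x)))  # band envelope
    env -= env.mean()

    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(env.size, 1 / fs)
    slow = (freqs > 0) & (freqs <= 10)        # slow-rate modulations only

    peak = np.argmax(spec[slow])
    below = spec[slow][freqs[slow] < 4].sum() # energy below 4 Hz
    above = spec[slow][freqs[slow] >= 4].sum()
    return {"peak_hz": freqs[slow][peak],
            "peak_amp": spec[slow][peak],
            "below_vs_above_4hz": below / (above + 1e-20)}

fs = 16000
t = np.arange(3 * fs) / fs
noise = np.random.default_rng(0).standard_normal(t.size)
x = (1 + 0.5 * np.sin(2 * np.pi * 3 * t)) * noise          # 3-Hz "rhythm"
print(ems_features(x, fs, 125 / 2 ** 0.5, 125 * 2 ** 0.5)) # 125-Hz octave band
```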


Subject(s)
Dysarthria/classification , Speech Acoustics , Speech Articulation Tests , Speech Intelligibility , Amyotrophic Lateral Sclerosis/complications , Ataxia/complications , Case-Control Studies , Discriminant Analysis , Dysarthria/complications , Dysarthria/diagnosis , Female , Humans , Huntington Disease/complications , Male , Parkinson Disease/complications , Periodicity , Phonetics , Reference Values , Signal Processing, Computer-Assisted , Sound Spectrography/methods , Time Factors
18.
J Speech Lang Hear Res ; 52(5): 1334-52, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19717656

ABSTRACT

PURPOSE: In this study, the authors examined whether rhythm metrics capable of distinguishing languages with high and low temporal stress contrast also can distinguish among control and dysarthric speakers of American English with perceptually distinct rhythm patterns. METHODS: Acoustic measures of vocalic and consonantal segment durations were obtained for speech samples from 55 speakers across 5 groups (hypokinetic, hyperkinetic, flaccid-spastic, ataxic dysarthrias, and controls). Segment durations were used to calculate standard and new rhythm metrics. Discriminant function analyses (DFAs) were used to determine which sets of predictor variables (rhythm metrics) best discriminated between groups (control vs. dysarthrias; and among the 4 dysarthrias). A cross-validation method was used to test the robustness of each original DFA. RESULTS: The majority of classification functions were more than 80% successful in classifying speakers into their appropriate group. New metrics that combined successive vocalic and consonantal segments emerged as important predictor variables. DFAs pitting each dysarthria group against the combined others resulted in unique constellations of predictor variables that yielded high levels of classification accuracy. CONCLUSIONS: This study confirms the ability of rhythm metrics to distinguish control speech from dysarthrias and to discriminate dysarthria subtypes. Rhythm metrics show promise for use as a rational and objective clinical tool.
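
The standard rhythm metrics referred to here (e.g., %V, delta-V, delta-C, and the pairwise variability indices) follow well-known formulas and can be computed from segment durations as sketched below; the durations are invented:

```python
# Sketch: standard rhythm metrics from segment durations. The formulas for
# %V, delta-V/-C, and the PVIs are conventional; the durations are invented.
import numpy as np

def rhythm_metrics(voc, cons):
    voc, cons = np.asarray(voc, float), np.asarray(cons, float)
    pct_v = 100 * voc.sum() / (voc.sum() + cons.sum())   # %V: proportion vocalic
    delta_v, delta_c = voc.std(), cons.std()             # duration variability
    # Normalized pairwise variability index over successive vocalic intervals:
    npvi_v = 100 * np.mean(2 * np.abs(np.diff(voc)) / (voc[1:] + voc[:-1]))
    # Raw PVI over successive consonantal intervals:
    rpvi_c = np.mean(np.abs(np.diff(cons)))
    return {"%V": pct_v, "deltaV": delta_v, "deltaC": delta_c,
            "nPVI-V": npvi_v, "rPVI-C": rpvi_c}

voc = [0.12, 0.08, 0.15, 0.09, 0.11]    # vocalic interval durations, s
cons = [0.07, 0.10, 0.06, 0.09, 0.08]   # consonantal interval durations, s
print(rhythm_metrics(voc, cons))
```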


Subject(s)
Dysarthria/diagnosis , Dysarthria/physiopathology , Speech Articulation Tests , Speech/physiology , Analysis of Variance , Ataxia/diagnosis , Ataxia/physiopathology , Humans , Language , Predictive Value of Tests , Speech Acoustics , Time Factors
19.
Trends Cogn Sci ; 13(3): 110-4, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19223222

ABSTRACT

The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one-to-one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoretical and empirical reasons to temper enthusiasm about the explanatory role mirror neurons might have for speech perception. In fact, rather than providing support for MT, mirror neurons are actually inconsistent with the central tenets of MT.


Subject(s)
Imitative Behavior/physiology , Motor Activity/physiology , Neurons/physiology , Speech Perception/physiology , Visual Perception/physiology , Animals , Humans , Motor Cortex/cytology , Motor Cortex/physiology , Neurons/classification , Psychomotor Performance/physiology , Reaction Time/physiology
20.
J Acoust Soc Am ; 124(3): 1695-703, 2008 Sep.
Article in English | MEDLINE | ID: mdl-19045660

ABSTRACT

Williams [(1986). "Role of dynamic information in the perception of coarticulated vowels," Ph.D. thesis, University of Connecticut, Storrs, CT] demonstrated that nonspeech contexts had no influence on pitch judgments of nonspeech targets, whereas context effects were obtained when listeners were instructed to perceive the sounds as speech. On the other hand, Holt et al. [(2000). "Neighboring spectral content influences vowel identification," J. Acoust. Soc. Am. 108, 710-722] showed that nonspeech contexts were sufficient to elicit context effects in speech targets. The current study tested a hypothesis that could explain the varying effectiveness of nonspeech contexts: context effects are obtained only when there are well-established perceptual categories for the target stimuli. Experiment 1 examined context effects in speech and nonspeech signals using four series of stimuli: steady-state vowels that perceptually spanned from /ʊ/ to /ɪ/ in isolation and in the context of /w/ (with no steady-state portion), and two nonspeech sine-wave series that mimicked the acoustics of the speech series. In agreement with previous work, context effects were obtained for speech contexts and targets but not for nonspeech analogs. Experiment 2 tested predictions of the hypothesis by testing for nonspeech context effects after listeners had been trained to categorize the sounds. Following training, context-dependent categorization was obtained for nonspeech stimuli in the training group. These results are presented within a general perceptual-cognitive framework for speech perception research.


Subject(s)
Auditory Perception , Cues , Signal Detection, Psychological , Sound , Speech Acoustics , Speech Perception , Acoustic Stimulation , Adult , Auditory Threshold , Cognition , Humans , Pitch Perception , Sound Spectrography , Speech Discrimination Tests , Time Factors , Young Adult