Results 1 - 20 of 8,890
1.
Trends Hear ; 28: 23312165241273399, 2024.
Article in English | MEDLINE | ID: mdl-39246212

ABSTRACT

In everyday acoustic environments, reverberation alters the speech signal received at the ears. Normal-hearing listeners are robust to these distortions, quickly recalibrating to achieve accurate speech perception. Over the past two decades, multiple studies have investigated the various adaptation mechanisms that listeners use to mitigate the negative impacts of reverberation and improve speech intelligibility. Following the PRISMA guidelines, we performed a systematic review of these studies, with the aim to summarize existing research, identify open questions, and propose future directions. Two researchers independently assessed a total of 661 studies, ultimately including 23 in the review. Our results showed that adaptation to reverberant speech is robust across diverse environments, experimental setups, speech units, and tasks, in noise-masked or unmasked conditions. The time course of adaptation is rapid, sometimes occurring in less than 1 s, but this can vary depending on the reverberation and noise levels of the acoustic environment. Adaptation is stronger in moderately reverberant rooms and minimal in rooms with very intense reverberation. While the mechanisms underlying the recalibration are largely unknown, adaptation to the direct-to-reverberant ratio-related changes in amplitude modulation appears to be the predominant candidate. However, additional factors need to be explored to provide a unified theory for the effect and its applications.


Subject(s)
Adaptation, Physiological; Noise; Speech Intelligibility; Speech Perception; Humans; Acoustic Stimulation; Acoustics; Noise/adverse effects; Perceptual Masking; Speech Acoustics; Speech Perception/physiology
2.
J Acoust Soc Am ; 156(3): 1707-1719, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39269161

ABSTRACT

Speech sounds exist in a complex acoustic-phonetic space, and listeners vary in the extent to which they are sensitive to variability within the speech sound category ("gradience") and the degree to which they show stable, consistent responses to phonetic stimuli. Here, we investigate the hypothesis that individual differences in the perception of the sound categories of one's language may aid speech-in-noise performance across the adult lifespan. Declines in speech-in-noise performance are well documented in healthy aging, and are, unsurprisingly, associated with differences in hearing ability. Nonetheless, hearing status and age are incomplete predictors of speech-in-noise performance, and long-standing research suggests that this ability draws on more complex cognitive and perceptual factors. In this study, a group of adults ranging in age from 18 to 67 years completed online assessments designed to measure phonetic category sensitivity, questionnaires querying recent noise exposure history and demographic factors, and, crucially, a test of speech-in-noise perception. Results show that individual differences in the perception of two consonant contrasts significantly predict speech-in-noise performance, even after accounting for age and recent noise exposure history. This finding supports the hypothesis that individual differences in sensitivity to phonetic categories mediate speech perception in challenging listening situations.
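As a rough illustration of the kind of analysis described above (predicting speech-in-noise scores from phonetic category sensitivity while controlling for age and noise exposure), the sketch below fits an ordinary multiple regression on simulated data. The variable names, coefficients, and data are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative multiple regression: does phonetic category sensitivity predict
# speech-in-noise (SIN) performance after controlling for age and recent noise
# exposure? Simulated data and assumed variable names, not the study's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "age": rng.uniform(18, 67, n),
    "noise_exposure": rng.normal(0, 1, n),        # standardized exposure score
    "category_sensitivity": rng.normal(0, 1, n),  # e.g., identification-slope measure
})
df["sin_score"] = (0.4 * df["category_sensitivity"]
                   - 0.02 * df["age"]
                   - 0.1 * df["noise_exposure"]
                   + rng.normal(0, 0.5, n))

model = smf.ols("sin_score ~ category_sensitivity + age + noise_exposure", data=df).fit()
print(model.summary())
```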


Subject(s)
Individuality; Noise; Phonetics; Speech Perception; Humans; Speech Perception/physiology; Adult; Middle Aged; Male; Female; Young Adult; Aged; Adolescent; Perceptual Masking; Acoustic Stimulation; Speech Acoustics
3.
Am J Speech Lang Pathol ; 33(5): 2536-2555, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39240811

ABSTRACT

PURPOSE: The goal of this study was to determine the relationship between the perceptual measure of speech naturalness and objective measures of pitch, loudness, and rate control as a potential tool for assessment of ataxic dysarthria. METHOD: Twenty-seven participants with ataxia and 29 age- and sex-matched control participants completed the pitch glide and loudness step tasks drawn from the Frenchay Dysarthria Assessment-Second Edition (FDA-2) in addition to speech diadochokinetic (DDK) tasks. First, group differences were compared for pitch variability in the pitch glide task, loudness variability in the loudness step task, and syllable duration and speech rate in the DDK task. Then, these acoustic measures were compared with previously collected ratings of speech naturalness by speech-language pathology graduate students. RESULTS: Robust group differences were measured for pitch variability and both DDK syllable duration and speech rate, indicating that the ataxia group had greater pitch variability, longer DDK syllable duration, and slower DDK speech rate than the control group. No group differences were measured for loudness variability. There were robust relationships between speech naturalness and pitch variability, DDK syllable duration, and DDK speech rate, but not for loudness variability. CONCLUSIONS: Objective acoustic measures of pitch variability in the FDA-2 pitch glide task and syllable duration and speech rate in the DDK task can be used to validate perceptual measures of speech naturalness. Overall, speech-language pathologists can incorporate both perceptual measures of speech naturalness and acoustic measures of pitch variability and DDK performance for a comprehensive evaluation of ataxic dysarthria.
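The two objective measures discussed (pitch variability in the pitch glide task and syllable duration/rate in the DDK task) could be approximated as below. This is a minimal sketch assuming generic F0 tracking and onset detection in librosa; the file names, pitch range, and onset-based syllable counting are placeholders rather than the study's measurement protocol.

```python
# Hypothetical sketch of the two acoustic measures: pitch variability in a
# pitch glide and syllable duration/rate in a DDK task. File names, pitch
# range, and onset-based syllable counting are placeholders.
import numpy as np
import librosa

def pitch_variability(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[voiced]                                    # voiced frames only
    return np.std(12 * np.log2(f0 / np.median(f0)))    # variability in semitones

def ddk_measures(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    dur = len(y) / sr
    rate = len(onsets) / dur                           # syllables per second
    mean_syll_dur = dur / max(len(onsets), 1)          # seconds per syllable
    return rate, mean_syll_dur

print(pitch_variability("pitch_glide.wav"))            # placeholder file
print(ddk_measures("pataka_repetitions.wav"))          # placeholder file
```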


Subject(s)
Cerebellar Ataxia; Dysarthria; Speech Acoustics; Speech Production Measurement; Voice Quality; Humans; Female; Male; Middle Aged; Dysarthria/physiopathology; Dysarthria/diagnosis; Dysarthria/etiology; Adult; Cerebellar Ataxia/physiopathology; Aged; Pitch Perception; Case-Control Studies; Loudness Perception; Speech Perception
4.
J Acoust Soc Am ; 156(3): 1850-1861, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39287467

ABSTRACT

Research has shown that talkers reliably coordinate the timing of articulator movements across variation in production rate and syllable stress, and that this precision of inter-articulator timing instantiates phonetic structure in the resulting acoustic signal. We here tested the hypothesis that immediate auditory feedback helps regulate that consistent articulatory timing control. Talkers with normal hearing recorded 480 /tV#Cat/ utterances using electromagnetic articulography, with alternative V (/ɑ/-/ɛ/) and C (/t/-/d/), across variation in production rate (fast-normal) and stress (first syllable stressed-unstressed). Utterances were split between two listening conditions: unmasked and masked. To quantify the effect of immediate auditory feedback on the coordination between the jaw and tongue-tip, the timing of tongue-tip raising onset for C, relative to the jaw opening-closing cycle for V, was obtained in each listening condition. Across both listening conditions, any manipulation that shortened the jaw opening-closing cycle reduced the latency of tongue-tip movement onset, relative to the onset of jaw opening. Moreover, tongue-tip latencies were strongly affiliated with utterance type. During auditory masking, however, tongue-tip latencies were less strongly affiliated with utterance type, demonstrating that talkers use afferent auditory signals in real-time to regulate the precision of inter-articulator timing in service to phonetic structure.


Subject(s)
Feedback, Sensory; Phonetics; Speech Perception; Tongue; Humans; Tongue/physiology; Male; Female; Adult; Feedback, Sensory/physiology; Young Adult; Speech Perception/physiology; Jaw/physiology; Speech Acoustics; Speech Production Measurement/methods; Time Factors; Speech/physiology; Perceptual Masking
5.
J Acoust Soc Am ; 156(3): 1720-1733, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39283150

ABSTRACT

Previous research has shown that prosodic structure can regulate the relationship between co-speech gestures and speech itself. Most co-speech studies have focused on manual gestures, but head movements have also been observed to accompany speech events by Munhall, Jones, Callan, Kuratate, and Vatikiotis-Bateson [(2004). Psychol. Sci. 15(2), 133-137], and these co-verbal gestures may be linked to prosodic prominence, as shown by Esteve-Gibert, Borrás-Comes, Asor, Swerts, and Prieto [(2017). J. Acoust. Soc. Am. 141(6), 4727-4739], Hadar, Steiner, Grant, and Rose [(1984). Hum. Mov. Sci. 3, 237-245], and House, Beskow, and Granström [(2001). Lang. Speech 26(2), 117-129]. This study examines how the timing and magnitude of head nods may be related to degrees of prosodic prominence connected to different focus conditions. Using electromagnetic articulometry, a time-varying signal of vertical head movement for 12 native French speakers was generated to examine the relationship between head nod gestures and F0 peaks. The results suggest that speakers use two different alignment strategies, which integrate both temporal and magnitudinal aspects of the gesture. Some evidence of inter-speaker preferences in the use of the two strategies was observed, although the inter-speaker variability is not categorical. Importantly, prosodic prominence itself is not the cause of the difference between the two strategies, but instead magnifies their inherent differences. In this way, the use of co-speech head nod gestures under French focus conditions can be considered as a method of prosodic enhancement.


Subject(s)
Head Movements; Speech Acoustics; Humans; Male; Female; Young Adult; Adult; Speech Production Measurement/methods; Time Factors; Gestures; Voice Quality; France; Language
6.
Sci Justice ; 64(5): 485-497, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39277331

ABSTRACT

Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, usually, auditory and acoustic analyses are performed to carry out such a verification task considering a diversity of features, such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer if, when and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
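The evaluation metrics named in the abstract, log-likelihood-ratio cost (Cllr) and equal error rate (EER), have standard definitions that can be computed directly from same-speaker and different-speaker likelihood ratios. The sketch below uses those standard formulas on made-up LR values; it is not the authors' implementation or fusion code.

```python
# Standard Cllr and EER computed from same-speaker (ss) and different-speaker
# (ds) likelihood ratios. Example LR values are made up; this is not the
# authors' implementation or fusion code.
import numpy as np

def cllr(lr_ss, lr_ds):
    # Cllr = 0.5 * [mean(log2(1 + 1/LR_ss)) + mean(log2(1 + LR_ds))]
    return 0.5 * (np.mean(np.log2(1 + 1.0 / lr_ss)) + np.mean(np.log2(1 + lr_ds)))

def eer(lr_ss, lr_ds):
    # Sweep thresholds over observed scores; EER is where miss rate = false-alarm rate.
    thresholds = np.sort(np.concatenate([lr_ss, lr_ds]))
    miss = np.array([np.mean(lr_ss < t) for t in thresholds])
    fa = np.array([np.mean(lr_ds >= t) for t in thresholds])
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2

lr_ss = np.array([3.0, 12.0, 0.8, 25.0, 6.0])   # same-speaker comparisons
lr_ds = np.array([0.1, 0.4, 1.5, 0.05, 0.2])    # different-speaker comparisons
print("Cllr:", cllr(lr_ss, lr_ds), "EER:", eer(lr_ss, lr_ds))
```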


Subject(s)
Forensic Sciences; Humans; Forensic Sciences/methods; Likelihood Functions; Linguistics; Support Vector Machine; Speech Acoustics; Algorithms; Speech
7.
JASA Express Lett ; 4(9)2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39259019

ABSTRACT

Greek uses H*, L + H*, and H* + L, all followed by L-L% edge tones, as nuclear pitch accents in statements. A previous analysis demonstrated that these accents are distinguished by F0 scaling and contour shape. This study expands the earlier investigation by exploring additional cues, namely, voice quality, amplitude, and duration, in distinguishing the pitch accents, and investigating individual variability in the selection of both F0 and non-F0 cues. Bayesian multivariate analysis and hierarchical clustering demonstrate that the accents are distinguished not only by F0 but also by additional cues at the group level, with individual variability in cue selection.
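As a loose illustration of the hierarchical clustering step mentioned above, the sketch below clusters speakers by a matrix of standardized cue weights (e.g., F0 scaling, voice quality, amplitude, duration). The matrix, linkage method, and cluster count are assumptions, not part of the study's Bayesian analysis.

```python
# Loose sketch: hierarchical clustering of speakers by standardized cue weights
# (e.g., F0 scaling, voice quality, amplitude, duration). Random placeholder
# matrix; linkage method and cluster count are assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
cue_weights = rng.normal(size=(20, 4))           # 20 speakers x 4 cues (placeholder)

Z = linkage(cue_weights, method="ward")          # agglomerative clustering
groups = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two groups
print(groups)                                    # cluster label per speaker
```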


Subject(s)
Cues; Humans; Male; Female; Adult; Speech Acoustics; Voice Quality; Young Adult; Language; Bayes Theorem; Speech Perception/physiology; Pitch Perception/physiology
8.
Forensic Sci Int ; 363: 112199, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39182457

ABSTRACT

A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically-relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCCs-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.
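A minimal sketch of one ingredient named above, a long-term MFCC representation, is given below: per-recording means and standard deviations of frame-level MFCCs computed with librosa. The sampling rate, coefficient count, and file name are placeholders; this is not the validated system from the study.

```python
# Illustrative long-term MFCC summary per recording (mean and s.d. of frame-level
# MFCCs). A stand-in sketch, not the MFCC system validated in the study; the
# sampling rate, coefficient count, and file name are placeholders.
import numpy as np
import librosa

def long_term_mfcc(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

features = long_term_mfcc("speaker_001_call.wav")   # placeholder file name
print(features.shape)                               # 26-dimensional representation
```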


Subject(s)
Phonetics; Speech Acoustics; Voice Quality; Humans; Male; Sound Spectrography; Adult; Forensic Sciences/methods; Middle Aged; Young Adult; Signal Processing, Computer-Assisted
9.
JMIR Aging ; 7: e55126, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39173144

ABSTRACT

BACKGROUND: With the aging global population and the rising burden of Alzheimer disease and related dementias (ADRDs), there is a growing focus on identifying mild cognitive impairment (MCI) to enable timely interventions that could potentially slow down the onset of clinical dementia. The production of speech by an individual is a cognitively complex task that engages various cognitive domains. The ease of audio data collection highlights the potential cost-effectiveness and noninvasive nature of using human speech as a tool for cognitive assessment. OBJECTIVE: This study aimed to construct a machine learning pipeline that incorporates speaker diarization, feature extraction, feature selection, and classification to identify a set of acoustic features derived from voice recordings that exhibit strong MCI detection capability. METHODS: The study included 100 MCI cases and 100 cognitively normal controls matched for age, sex, and education from the Framingham Heart Study. Participants' spoken responses on neuropsychological tests were recorded, and the recorded audio was processed to identify segments of each participant's voice from recordings that included voices of both testers and participants. A comprehensive set of 6385 acoustic features was then extracted from these voice segments using OpenSMILE and Praat software. Subsequently, a random forest model was constructed to classify cognitive status using the features that exhibited significant differences between the MCI and cognitively normal groups. The MCI detection performance of various audio lengths was further examined. RESULTS: An optimal subset of 29 features was identified that resulted in an area under the receiver operating characteristic curve of 0.87, with a 95% CI of 0.81-0.94. The most important acoustic feature for MCI classification was the number of filled pauses (importance score=0.09, P=3.10E-08). There was no substantial difference in the performance of the model trained on the acoustic features derived from different lengths of voice recordings. CONCLUSIONS: This study showcases the potential of monitoring changes to nonsemantic and acoustic features of speech as a way of early ADRD detection and motivates future opportunities for using human speech as a measure of brain health.
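The classification stage described (a random forest over selected acoustic features, evaluated with the area under the ROC curve) can be sketched with scikit-learn as below. The data are simulated, and the 29-column feature matrix merely echoes the size of the reported feature subset; this is not the Framingham pipeline.

```python
# Sketch of the classification step only: a random forest over acoustic features
# with cross-validated ROC AUC. Simulated data; the 29 columns merely echo the
# reported feature-subset size and are not the Framingham features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 29))          # 200 participants x 29 selected features
y = np.repeat([0, 1], 100)              # 0 = cognitively normal, 1 = MCI
X[y == 1, 0] += 0.8                     # inject a weak group difference

clf = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("Mean cross-validated AUC:", auc.mean())
```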


Subject(s)
Cognitive Dysfunction; Humans; Cognitive Dysfunction/diagnosis; Cognitive Dysfunction/physiopathology; Female; Male; Aged; Voice/physiology; Machine Learning; Neuropsychological Tests; Middle Aged; Aged, 80 and over; Case-Control Studies; Speech Acoustics
10.
Trends Hear ; 28: 23312165241266316, 2024.
Article in English | MEDLINE | ID: mdl-39183533

ABSTRACT

During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may provide clinical use as an objective measure of stimulus encoding by the brain-for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet, interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (n = 38; 18-35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
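Speech-envelope decoders of the kind described are commonly implemented as lagged linear "backward" models fit with ridge regression, with significance assessed by permutation. The sketch below follows that generic recipe on simulated EEG; the channel count, sampling rate, lag range, and shift-based permutation scheme are assumptions, not the study's parameters.

```python
# Generic backward-model sketch: reconstruct the speech envelope from lagged
# EEG channels with ridge regression and assess significance by permutation.
# Simulated data; not the study's pipeline, parameters, or statistics.
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(eeg, max_lag):
    # Stack time-lagged copies of every channel: (T, channels * (max_lag + 1)).
    return np.hstack([np.roll(eeg, lag, axis=0) for lag in range(max_lag + 1)])

rng = np.random.default_rng(0)
T, C = 6000, 32                               # assumed: 32-channel EEG at 64 Hz
envelope = rng.normal(size=T)
eeg = rng.normal(size=(T, C)) + 0.3 * envelope[:, None]   # weak envelope tracking

X = lag_matrix(eeg, max_lag=16)               # lags up to 250 ms at 64 Hz
split = T // 2
model = Ridge(alpha=1.0).fit(X[:split], envelope[:split])
r = np.corrcoef(model.predict(X[split:]), envelope[split:])[0, 1]

# Null distribution: correlate predictions with circularly shifted envelopes.
null = [np.corrcoef(model.predict(X[split:]),
                    np.roll(envelope[split:], rng.integers(200, 2500)))[0, 1]
        for _ in range(200)]
print("r =", r, "p ~=", np.mean(np.array(null) >= r))
```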


Subject(s)
Acoustic Stimulation; Electroencephalography; Speech Intelligibility; Speech Perception; Humans; Speech Perception/physiology; Female; Male; Adolescent; Adult; Young Adult; Speech Acoustics; Brain/physiology
11.
eNeuro ; 11(8)2024 Aug.
Article in English | MEDLINE | ID: mdl-39095091

ABSTRACT

Adults heard recordings of two spatially separated speakers reading newspaper and magazine articles. They were asked to listen to one of them and ignore the other, and EEG was recorded to assess their neural processing. Machine learning extracted neural sources that tracked the target and distractor speakers at three levels: the acoustic envelope of speech (delta- and theta-band modulations), lexical frequency for individual words, and the contextual predictability of individual words estimated by GPT-4 and earlier lexical models. To provide a broader view of speech perception, half of the subjects completed a simultaneous visual task, and the listeners included both native and non-native English speakers. Distinct neural components were extracted for these levels of auditory and lexical processing, demonstrating that native English speakers had greater target-distractor separation compared with non-native English speakers on most measures, and that lexical processing was reduced by the visual task. Moreover, there was a novel interaction of lexical predictability and frequency with auditory processing; acoustic tracking was stronger for lexically harder words, suggesting that people listened harder to the acoustics when needed for lexical selection. This demonstrates that speech perception is not simply a feedforward process from acoustic processing to the lexicon. Rather, the adaptable context-sensitive processing long known to occur at a lexical level has broader consequences for perception, coupling with the acoustic tracking of individual speakers in noise.


Subject(s)
Electroencephalography; Noise; Speech Perception; Humans; Speech Perception/physiology; Female; Male; Adult; Young Adult; Electroencephalography/methods; Speech Acoustics; Language; Machine Learning
12.
J Acoust Soc Am ; 156(2): 1171-1182, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39158324

ABSTRACT

In this study, a computer-driven, phoneme-agnostic method was explored for assessing speech disorders (SDs) in children, bypassing traditional labor-intensive phonetic transcription. Using the SpeechMark® automatic syllabic cluster (SC) analysis, which detects sequences of acoustic features that characterize well-formed syllables, 1952 American English utterances of 60 preschoolers were analyzed [16 with speech disorder present (SD-P) and 44 with speech disorder not present (SD-NP)] from two dialectal areas. A four-factor regression analysis evaluated the robustness of seven automated measures produced by SpeechMark® and their interactions. SCs significantly predicted SD status (p < 0.001). A secondary analysis using a generalized linear model with a negative binomial distribution evaluated the number of SCs produced by the groups. Results highlighted that children with SD-P produced fewer well-formed clusters [incidence rate ratio (IRR) = 0.8116, p ≤ 0.0137]. The interaction between speech group and age indicated that the effect of age on syllable count was more pronounced in children with SD-P (IRR = 1.0451, p = 0.0251), suggesting that even small changes in age can have a significant effect on SCs. In conclusion, speech status significantly influences the degree to which preschool children produce acoustically well-formed SCs, suggesting the potential for SCs to be speech biomarkers for SD in preschoolers.
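The secondary analysis mentioned above, a negative binomial GLM whose coefficients are reported as incidence rate ratios (IRR = exp(coefficient)), might look roughly like the sketch below. The predictors and simulated counts are assumptions for illustration only.

```python
# Sketch of a negative binomial GLM for syllabic-cluster counts, with
# coefficients reported as incidence rate ratios (IRR = exp(coefficient)).
# Simulated data and assumed predictors; not the study's model.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "sd_present": rng.integers(0, 2, n),      # 1 = speech disorder present
    "age_months": rng.uniform(36, 72, n),
})
mu = np.exp(2.5 - 0.2 * df["sd_present"] + 0.01 * df["age_months"])
df["n_clusters"] = rng.poisson(mu)            # stand-in count outcome

model = smf.glm("n_clusters ~ sd_present * age_months", data=df,
                family=sm.families.NegativeBinomial()).fit()
print(np.exp(model.params))                   # incidence rate ratios
```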


Subject(s)
Phonetics; Speech Acoustics; Speech Disorders; Speech Production Measurement; Humans; Child, Preschool; Male; Female; Speech Production Measurement/methods; Speech Disorders/physiopathology; Speech Disorders/diagnosis; Child; Child Language; Age Factors
13.
J Acoust Soc Am ; 156(2): 1202-1213, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39158325

ABSTRACT

Band importance functions for speech-in-noise recognition, typically determined in the presence of steady background noise, indicate a negligible role for extended high frequencies (EHFs; 8-20 kHz). However, recent findings indicate that EHF cues support speech recognition in multi-talker environments, particularly when the masker has reduced EHF levels relative to the target. This scenario can occur in natural auditory scenes when the target talker is facing the listener, but the maskers are not. In this study, we measured the importance of five bands from 40 to 20 000 Hz for speech-in-speech recognition by notch-filtering the bands individually. Stimuli consisted of a female target talker recorded from 0° and a spatially co-located two-talker female masker recorded either from 0° or 56.25°, simulating a masker either facing the listener or facing away, respectively. Results indicated peak band importance in the 0.4-1.3 kHz band and a negligible effect of removing the EHF band in the facing-masker condition. However, in the non-facing condition, the peak was broader and EHF importance was higher and comparable to that of the 3.3-8.3 kHz band in the facing-masker condition. These findings suggest that EHFs contain important cues for speech recognition in listening conditions with mismatched talker head orientations.
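The band-removal manipulation described (notch-filtering one band at a time, e.g., the 8-20 kHz EHF band) can be approximated with a band-stop Butterworth filter, as in the sketch below. The cutoffs, filter order, and placeholder signal are assumptions rather than the study's stimulus-processing chain.

```python
# Illustrative band-stop ("notch") filtering of one frequency band from a
# signal, e.g., the extended-high-frequency band. Cutoffs, order, and the
# placeholder signal are assumptions, not the study's stimulus processing.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def notch_band(signal, fs, lo_hz, hi_hz, order=8):
    nyq = fs / 2
    hi_hz = min(hi_hz, nyq * 0.999)          # keep the upper edge below Nyquist
    sos = butter(order, [lo_hz / nyq, hi_hz / nyq], btype="bandstop", output="sos")
    return sosfiltfilt(sos, signal)

fs = 44100
speechlike = np.random.default_rng(0).normal(size=fs)   # placeholder 1 s signal
no_ehf = notch_band(speechlike, fs, 8000, 20000)         # remove the 8-20 kHz band
```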


Subject(s)
Acoustic Stimulation; Cues; Noise; Perceptual Masking; Recognition, Psychology; Speech Perception; Humans; Female; Speech Perception/physiology; Young Adult; Adult; Male; Audiometry, Speech; Speech Intelligibility; Auditory Threshold; Sound Localization; Speech Acoustics; Sound Spectrography
14.
JASA Express Lett ; 4(8)2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39185931

ABSTRACT

High vowels have higher f0 than low vowels, creating a context effect on the interpretation of f0. Since onset F0 is a cue to stop voicing, the vowel context is expected to influence voicing judgements. Listeners categorized syllables starting with high ("bee"-"pea") and low ("bye"-"pie") vowels varying orthogonally in VOT and onset F0. Listeners made use of both cues as expected. Furthermore, vowel height affected listeners' categorization. Syllables with the low vowel /a/ elicited more voiceless responses compared to syllables with the high vowel /i/. This suggests that listeners compensate for vowel intrinsic effects when making other phonemic judgements.


Subject(s)
Phonetics; Speech Perception; Humans; Speech Perception/physiology; Language; Female; Male; Speech Acoustics; Cues; Adult; Young Adult
15.
J Acoust Soc Am ; 156(2): 1391-1412, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39196103

ABSTRACT

Period-doubled phonation (henceforth, period doubling), characterized by voicing periods that alternate in amplitudes and/or frequencies, is often perceived as rough and as having an indeterminate pitch. Past studies have suggested a lower pitch percept when the degree of amplitude or frequency modulation increases. However, how listeners use period doubling when identifying linguistic tones remains unclear. The current study uses categorization tasks with training, followed by imitation of tones manipulated with period doubling (with amplitude and frequency modulation, both separately and jointly) in a novel language. Native Mandarin and English speakers with different levels of music experience were tested. I show that period doubling leads to a low-tone bias in perception and imitation, especially as the modulation degree, particularly that of frequency, increases. Interestingly, interactions with stimulus f0 and modulation type show that in amplitude-modulated tokens, higher f0 (300 Hz) drives more low-tone responses than lower f0 (200 Hz). Period doubling is also imitated with lowered f0 and creaky quality. Language and music experience do not affect perceptual and imitative responses, suggesting that the perception of period doubling is not language-specific or conditioned by tonal knowledge. Period doubling likely signals low tones, even when the original f0 is high.


Subject(s)
Phonation; Pitch Perception; Voice Quality; Humans; Male; Female; Young Adult; Adult; Speech Acoustics; Acoustic Stimulation; Speech Perception/physiology; Imitative Behavior; Music; Time Factors; Language
16.
J Acoust Soc Am ; 156(2): 1380-1390, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39196104

ABSTRACT

For most of his illustrious career, Ken Stevens focused on examining and documenting the rich detail about vocal tract changes that underlies the acoustic signal of speech and is available to listeners. Current approaches to speech inversion take advantage of this rich detail to recover information about articulatory movement. Our previous speech inversion work focused on movements of the tongue and lips, for which "ground truth" is readily available. In this study, we describe the acquisition and validation of ground-truth articulatory data about velopharyngeal port constriction, using both the well-established measure of nasometry and a novel technique, high-speed nasopharyngoscopy. Nasometry measures the acoustic output of the nasal and oral cavities to derive the measure nasalance. High-speed nasopharyngoscopy captures images of the nasopharyngeal region and can resolve velar motion during speech. By comparing simultaneously collected data from both acquisition modalities, we show that nasalance is a sufficiently sensitive measure to use as ground truth for our speech inversion system. Further, a speech inversion system trained on nasalance can recover known patterns of velopharyngeal port constriction shown by American English speakers. Our findings match well with Stevens' own studies of the acoustics of nasal consonants.
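Nasalance, as derived from nasometry, is conventionally defined as nasal acoustic energy divided by the sum of nasal and oral energy, usually expressed as a percentage. The sketch below computes a frame-wise nasalance contour from a two-channel (nasal/oral) recording; the signals and frame settings are placeholders, not the nasometry hardware pipeline used in the study.

```python
# Conventional nasalance: nasal RMS energy / (nasal + oral RMS energy), here
# computed frame by frame from a two-channel recording. Placeholder signals and
# frame settings; not the nasometry hardware pipeline used in the study.
import numpy as np

def frame_rms(x, frame, hop):
    n = 1 + (len(x) - frame) // hop
    return np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2)) for i in range(n)])

def nasalance(nasal, oral, fs, frame_ms=20, hop_ms=10):
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    n_rms, o_rms = frame_rms(nasal, frame, hop), frame_rms(oral, frame, hop)
    return 100 * n_rms / (n_rms + o_rms + 1e-12)    # percent, per frame

fs = 16000
rng = np.random.default_rng(0)
nasal_ch, oral_ch = rng.normal(size=fs), rng.normal(size=fs)   # placeholder channels
print(nasalance(nasal_ch, oral_ch, fs).mean())
```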


Subject(s)
Speech Acoustics; Speech Production Measurement; Humans; Male; Speech Production Measurement/methods; Adult; Female; Young Adult; Voice Quality; Constriction, Pathologic; Speech/physiology; Endoscopy/methods; Endoscopy/instrumentation
17.
J Acoust Soc Am ; 156(2): 1367-1379, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39189786

ABSTRACT

Predictions of gradient degree of lenition of voiceless and voiced stops in a corpus of Argentine Spanish are evaluated using three acoustic measures (minimum and maximum intensity velocity and duration) and two recurrent neural network (Phonet) measures (posterior probabilities of sonorant and continuant phonological features). While mixed and inconsistent predictions were obtained across the acoustic metrics, sonorant and continuant probability values were consistently in the direction predicted by known factors of a stop's lenition with respect to its voicing, place of articulation, and surrounding contexts. The results suggest the effectiveness of Phonet as an additional or alternative method of lenition measurement. Furthermore, this study has enhanced the accessibility of Phonet by releasing the trained Spanish Phonet model used in this study and a pipeline with step-by-step instructions for training and inferencing new models.
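Two of the acoustic lenition measures mentioned (minimum and maximum intensity velocity) are typically obtained by differentiating a smoothed intensity contour over the consonant interval. The sketch below illustrates that idea on a synthetic signal; the smoothing, frame settings, and consonant interval are assumptions, not the study's script.

```python
# Minimal sketch of intensity-velocity lenition measures: smooth an intensity
# (dB) contour, differentiate it, and take the minimum and maximum velocity
# within a consonant interval. Synthetic audio and a placeholder interval.
import numpy as np
import librosa

sr = 16000
t = np.arange(sr) / sr
y = np.random.default_rng(0).normal(size=sr) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

hop = int(0.005 * sr)                                   # 5 ms hop
rms = librosa.feature.rms(y=y, frame_length=int(0.025 * sr), hop_length=hop)[0]
intensity_db = 20 * np.log10(rms + 1e-10)

smooth = np.convolve(intensity_db, np.ones(5) / 5, mode="same")   # light smoothing
velocity = np.gradient(smooth) / (hop / sr)             # dB per second

c_start, c_end = 40, 80                 # placeholder consonant frame interval
seg = velocity[c_start:c_end]
print("min velocity:", seg.min(), "max velocity:", seg.max())
```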


Subject(s)
Neural Networks, Computer; Phonetics; Speech Acoustics; Humans; Speech Production Measurement/methods; Time Factors; Probability; Acoustics
18.
PLoS One ; 19(8): e0308655, 2024.
Article in English | MEDLINE | ID: mdl-39163326

ABSTRACT

While many studies focus on segmental variation in Parkinsonian speech, little is known about prosodic modulations reflecting the ability to adapt to communicative demands in people with Parkinson's disease (PwPD). This type of prosodic modulation is important for social interaction, and it involves modifications in speech melody (intonational level) and articulation of consonants and vowels (segmental level). The present study investigates phonetic cues of prosodic modulations with respect to different focus structures in mild dysarthric PwPD as a function of levodopa. Acoustic and kinematic speech parameters of 25 PwPD were assessed in two motor conditions. Speech production data from PwPD were collected before (medication-OFF) and after levodopa intake (medication-ON) by means of 3-D electromagnetic articulography. On the acoustic level, intensity, pitch, and syllable durations were analyzed. On the kinematic level, movement duration and amplitude were investigated. Spatio-temporal modulations of speech parameters were examined and compared across three different prosodic focus structures (out-of-focus, broad focus, contrastive focus) to display varying speech demands. Overall, levodopa had beneficial effects on motor performance, speech loudness, and pitch modulation. Acoustic syllable durations and kinematic movement durations did not change, revealing no systematic effects of motor status on the temporal domain. In contrast, there were spatial modulations of the oral articulators: tongue tip movements were smaller and lower lip movements were larger in amplitude under levodopa, reflecting a more agile and efficient articulatory movement under levodopa. Thus, respiratory-phonatory functions and consonant production improved, while syllable duration and tongue body kinematics did not change. Interestingly, prominence marking strategies were comparable between the medication conditions under investigation, and in fact, appear to be preserved in mild dysarthric PwPD.


Subject(s)
Levodopa; Parkinson Disease; Humans; Parkinson Disease/physiopathology; Parkinson Disease/drug therapy; Male; Female; Aged; Middle Aged; Levodopa/therapeutic use; Levodopa/administration & dosage; Levodopa/pharmacology; Speech/physiology; Speech Acoustics; Biomechanical Phenomena; Phonetics; Dysarthria/physiopathology; Dysarthria/etiology
19.
J Acoust Soc Am ; 156(2): 1221-1230, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39162416

ABSTRACT

Voice and speech production change with age, which can lead to communication challenges. This study explored the use of Landmark-based analysis of speech (LMBAS), a knowledge-based speech analysis algorithm based on Stevens' Landmark Theory, to describe age-related changes in adult speakers. The speech samples analyzed were sourced from the University of Florida Aging Voice Database, which included recordings of 16 sentences from the Speech Perception in Noise test of Bilger, Rzeczkowski, Nuetzel, and Rabinowitz [J. Acoust. Soc. Am. 65, S98-S98 (1979)] and Bilger, Nuetzel, Rabinowitz, and Rzeczkowski [J. Speech. Lang. Hear. Res. 27, 32-84 (1984)]. These sentences were read in quiet environments by 50 young, 50 middle-aged, and 50 older American English speakers, with an equal distribution of sexes. Acoustic landmarks (specifically, glottal, burst, and syllabicity landmarks) were extracted using the SpeechMark® MATLAB Toolbox, version 1.1.2. The results showed a significant age effect on glottal and burst landmarks. Furthermore, the sex effect was significant for burst and syllabicity landmarks. While the results of LMBAS suggest its potential in detecting age-related changes in speech, the increase in syllabicity landmarks with age was unexpected. This finding may suggest the need for further refinement and adjustment of this analytical approach.


Subject(s)
Aging; Speech Acoustics; Speech Production Measurement; Humans; Male; Female; Middle Aged; Aged; Adult; Young Adult; Aging/physiology; Speech Production Measurement/methods; Age Factors; Voice Quality; Algorithms; Aged, 80 and over; Speech Perception/physiology; Speech/physiology
20.
J Acoust Soc Am ; 156(2): 1440-1460, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39213460

ABSTRACT

This study investigates whether downstep in Japanese is directly triggered by accents. When the pitch height of a word X is lower after an accented word (A) than after an unaccented word (U), X is diagnosed as downstepped. However, this diagnosis involves two confounding factors: the already lowered F0 before X and phonological phrasing. To control these factors, this study contrasts genitive and nominative case markers and adjusts measurement points. Eight native speakers of Tokyo Japanese participated in a production experiment. The results show six key findings. First, a structure-dependent F0 downtrend was observed in UX. Second, higher F0 peaks with larger initial lowering were observed after accents with a nominative case marker compared to those with a genitive case marker, suggesting a boosting effect by boundaries. Third, larger initial lowering was observed in AX compared to UX, contradicting the notion that X is more compressed in AX due to downstep. Fourth, the paradigmatic difference in F0 height between AX and UX decreases when F0 of X is increased, supporting that boundaries trigger downstep. Fifth, downstep is not physiologically constrained but is phonologically controlled. Finally, the blocking of initial lowering in heavy syllables is not phonological but rather an articulatory phenomenon.


Subject(s)
Phonetics; Speech Acoustics; Speech Production Measurement; Humans; Male; Female; Young Adult; Speech Production Measurement/methods; Adult; Voice Quality; Speech Perception