Results 1 - 20 of 8,819
1.
J Acoust Soc Am ; 155(5): 3206-3212, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38738937

ABSTRACT

Modern humans and chimpanzees share a common ancestor on the phylogenetic tree, yet chimpanzees do not spontaneously produce speech or speech sounds. The lab exercise presented in this paper was developed for undergraduate students in a course entitled "What's Special About Human Speech?" The exercise is based on acoustic analyses of the words "cup" and "papa" as spoken by Viki, a home-raised, speech-trained chimpanzee, as well as the words spoken by a human. The analyses allow students to relate differences in articulation and vocal abilities between Viki and humans to the known anatomical differences in their vocal systems. Anatomical and articulation differences between humans and Viki include (1) potential tongue movements, (2) presence or absence of laryngeal air sacs, (3) presence or absence of vocal membranes, and (4) exhalation vs inhalation during production.


Subject(s)
Pan troglodytes; Speech Acoustics; Speech; Humans; Animals; Pan troglodytes/physiology; Speech/physiology; Tongue/physiology; Tongue/anatomy & histology; Vocalization, Animal/physiology; Species Specificity; Speech Production Measurement; Larynx/physiology; Larynx/anatomy & histology; Phonetics
2.
J Acoust Soc Am ; 155(5): 2934-2947, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717201

ABSTRACT

Spatial separation and fundamental frequency (F0) separation are effective cues for improving the intelligibility of target speech in multi-talker scenarios. Previous studies predominantly focused on spatial configurations within the frontal hemifield, overlooking the ipsilateral side and the entire median plane, where localization confusion often occurs. This study investigated the impact of spatial and F0 separation on intelligibility under the above-mentioned underexplored spatial configurations. The speech reception thresholds were measured through three experiments for scenarios involving two to four talkers, either in the ipsilateral horizontal plane or in the entire median plane, utilizing monotonized speech with varying F0s as stimuli. The results revealed that spatial separation in symmetrical positions (front-back symmetry in the ipsilateral horizontal plane or front-back, up-down symmetry in the median plane) contributes positively to intelligibility. Both target direction and relative target-masker separation influence the masking release attributed to spatial separation. As the number of talkers exceeds two, the masking release from spatial separation diminishes. Nevertheless, F0 separation remains as a remarkably effective cue and could even facilitate spatial separation in improving intelligibility. Further analysis indicated that current intelligibility models encounter difficulties in accurately predicting intelligibility in scenarios explored in this study.


Subject(s)
Cues; Perceptual Masking; Sound Localization; Speech Intelligibility; Speech Perception; Humans; Female; Male; Young Adult; Adult; Speech Perception/physiology; Acoustic Stimulation; Auditory Threshold; Speech Acoustics; Speech Reception Threshold Test; Noise
3.
J Acoust Soc Am ; 155(5): 3060-3070, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717210

ABSTRACT

Speakers tailor their speech to different types of interlocutors. For example, speech directed to voice technology has different acoustic-phonetic characteristics than speech directed to a human. The present study investigates the perceptual consequences of human- and device-directed registers in English. We compare two groups of speakers: participants whose first language is English (L1) and bilingual L1 Mandarin-L2 English talkers. Participants produced short sentences in several conditions: an initial production and a repeat production after a human or device guise indicated either understanding or misunderstanding. In experiment 1, a separate group of L1 English listeners heard these sentences and transcribed the target words. In experiment 2, the same productions were transcribed by an automatic speech recognition (ASR) system. Results show that transcription accuracy was highest for L1 talkers for both human and ASR transcribers. Furthermore, there were no overall differences in transcription accuracy between human- and device-directed speech. Finally, while human listeners showed an intelligibility benefit for coda repair productions, the ASR transcriber did not benefit from these enhancements. Findings are discussed in terms of models of register adaptation, phonetic variation, and human-computer interaction.


Subject(s)
Multilingualism; Speech Intelligibility; Speech Perception; Humans; Male; Female; Adult; Young Adult; Speech Acoustics; Phonetics; Speech Recognition Software
4.
J Acoust Soc Am ; 155(5): 2990-3004, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717206

ABSTRACT

Speakers can place prosodic prominence at any location within a sentence, generating focus prosody that lets listeners perceive new information. This study aimed to investigate age-related changes in the bottom-up processing of focus perception in Jianghuai Mandarin by clarifying the perceptual cues and the auditory processing abilities involved in the identification of focus locations. Young, middle-aged, and older speakers of Jianghuai Mandarin completed a focus identification task and an auditory perception task. The results showed that increasing age led to a decrease in listeners' accuracy rate in identifying focus locations, with all participants performing the worst when dynamic pitch cues were inaccessible. Auditory processing abilities did not predict focus perception performance in young and middle-aged listeners but accounted significantly for the variance in older adults' performance. These findings suggest that age-related deterioration in focus perception can be largely attributed to a decline in the auditory processing of perceptual cues. Poor ability to extract frequency modulation cues may be the most important underlying psychoacoustic factor for older adults' difficulties in perceiving focus prosody in Jianghuai Mandarin. The results contribute to our understanding of the bottom-up mechanisms involved in linguistic prosody processing in aging adults, particularly in tonal languages.


Subject(s)
Aging; Cues; Speech Perception; Humans; Middle Aged; Aged; Male; Female; Aging/psychology; Aging/physiology; Young Adult; Adult; Speech Perception/physiology; Age Factors; Speech Acoustics; Acoustic Stimulation; Pitch Perception; Language; Voice Quality; Psychoacoustics; Audiometry, Speech
5.
J Acoust Soc Am ; 155(5): 3090-3100, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717212

ABSTRACT

The perceived level of femininity and masculinity is a prominent property by which a speaker's voice is indexed, and a vocal expression incongruent with the speaker's gender identity can greatly contribute to gender dysphoria. Our understanding of the acoustic cues to the levels of masculinity and femininity perceived by listeners in voices is not well developed, and an increased understanding of them would benefit communication of therapy goals and evaluation in gender-affirming voice training. We developed a voice bank with 132 voices with a range of levels of femininity and masculinity expressed in the voice, as rated by 121 listeners in independent, individually randomized perceptual evaluations. Acoustic models were developed from measures identified as markers of femininity or masculinity in the literature using penalized regression and tenfold cross-validation procedures. The 223 most important acoustic cues explained 89% and 87% of the variance in the perceived level of femininity and masculinity in the evaluation set, respectively. The median fo was confirmed to provide the primary cue, but other acoustic properties must be considered in accurate models of femininity and masculinity perception. The developed models are proposed to afford communication and evaluation of gender-affirming voice training goals and improve voice synthesis efforts.


Subject(s)
Cues; Speech Acoustics; Speech Perception; Voice Quality; Humans; Female; Male; Adult; Young Adult; Masculinity; Middle Aged; Femininity; Adolescent; Gender Identity; Acoustics
6.
J Acoust Soc Am ; 155(5): 3071-3089, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717213

ABSTRACT

This study investigated how 40 Chinese learners of English as a foreign language (EFL learners) differed from 40 native English speakers in the production of four English tense-lax contrasts, /i-ɪ/, /u-ʊ/, /ɑ-ʌ/, and /æ-ε/, by examining the acoustic measurements of duration, the first three formant frequencies, and the slope of the first formant movement (F1 slope). The dynamic formant trajectory was modeled using discrete cosine transform coefficients to demonstrate the time-varying properties of formant trajectories. A discriminant analysis was employed to illustrate the extent to which Chinese EFL learners relied on different acoustic parameters. This study found that: (1) Chinese EFL learners overemphasized durational differences and weakened spectral differences for the /i-ɪ/, /u-ʊ/, and /ɑ-ʌ/ pairs, although they maintained sufficient spectral differences for /æ-ε/. In contrast, native English speakers predominantly used spectral differences across all four pairs; (2) in non-low tense-lax contrasts, unlike native English speakers, Chinese EFL learners failed to exhibit different F1 slope values, indicating a non-nativelike tongue-root placement during the articulatory process. The findings underscore the contribution of dynamic spectral patterns to the differentiation between English tense and lax vowels, and reveal the influence of precise articulatory gestures on the realization of the tense-lax contrast.
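As a rough illustration of how a formant trajectory can be summarized with discrete cosine transform coefficients, the sketch below computes the first few orthonormal DCT-II coefficients of a sampled formant track. The abstract does not state the number of coefficients, the normalization, or the sampling of the track, so the function name `dct_coefficients`, the choice of three coefficients, and the example tracks are all assumptions made for illustration.

```python
import math

def dct_coefficients(track, n_coeffs=3):
    """Return the first n_coeffs orthonormal DCT-II coefficients of a formant track (Hz)."""
    n = len(track)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                    for i, x in enumerate(track))
        # Orthonormal scaling: sqrt(1/n) for k = 0, sqrt(2/n) otherwise.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs

# Hypothetical F1 tracks sampled at 10 equally spaced points (values invented).
flat = [500.0] * 10                              # steady-state vowel
rising = [400.0 + 20.0 * i for i in range(10)]   # F1 rising through the vowel

flat_c = dct_coefficients(flat)
rising_c = dct_coefficients(rising)
print("flat:  ", [round(c, 2) for c in flat_c])
print("rising:", [round(c, 2) for c in rising_c])
```

With this convention the zeroth coefficient is proportional to the track mean, the first reflects the direction and size of the overall slope (negative for a rising track), and the second reflects curvature, so a steady-state vowel has only the zeroth coefficient away from zero.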


Subject(s)
Multilingualism; Phonetics; Speech Acoustics; Humans; Male; Female; Young Adult; Speech Production Measurement; Adult; Language; Acoustics; Learning; Voice Quality; Sound Spectrography; East Asian People
7.
Codas ; 36(4): e20230148, 2024.
Article in Portuguese, English | MEDLINE | ID: mdl-38775526

ABSTRACT

PURPOSE: To evaluate the immediate effect of an inspiratory exercise performed with a respiratory incentive device and exerciser on the voice of women without vocal complaints. METHODS: Twenty-five women without vocal complaints, aged 18 to 34 years, with a score of 1 on the Vocal Disorder Screening Index (ITDV), participated. Data were collected before and after the inspiratory exercise and consisted of recordings of the sustained vowel /a/, connected speech, and maximum phonation times (MPT) of vowels, fricative phonemes, and number counting. For the auditory-perceptual judgment, the Vocal Deviation Scale (VDS) was used to rate the overall degree of vocal deviation. Acoustic evaluation was performed in the PRAAT software, extracting fundamental frequency (f0), jitter, shimmer, harmonics-to-noise ratio (HNR), Cepstral Peak Prominence Smoothed (CPPS), Acoustic Voice Quality Index (AVQI), and Acoustic Breathiness Index (ABI). For the aerodynamic measures, the duration of each emission was extracted in the Audacity program. Data were analyzed statistically in the Statistica for Windows software, and normality was tested with the Shapiro-Wilk test. Results were compared using Student's t test for dependent samples for normally distributed variables and the Wilcoxon test for non-normally distributed variables, adopting a significance level of 5%. RESULTS: There were no significant differences in the auditory-perceptual judgment or the acoustic measures between the pre- and post-exercise moments. Among the aerodynamic measures, the MPT of /s/ increased significantly (p=0.008). CONCLUSION: Vocal quality did not change after the inspiratory exercise with the incentive device and respiratory exerciser, but the MPT of the phoneme /s/ increased after the exercise.


Subject(s)
Breathing Exercises; Voice Quality; Humans; Female; Adult; Young Adult; Adolescent; Breathing Exercises/methods; Speech Acoustics; Voice Disorders/physiopathology; Voice Disorders/diagnosis; Phonation/physiology
8.
J Acoust Soc Am ; 155(4): 2285-2301, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38557735

ABSTRACT

Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda.


Subject(s)
Phonetics; Speech Acoustics; United States; Acoustics; Language; South Carolina
9.
J Acoust Soc Am ; 155(4): R7-R8, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38558083

ABSTRACT

The Reflections series takes a look back on historical articles from The Journal of the Acoustical Society of America that have had a significant impact on the science and practice of acoustics.


Subject(s)
Speech Perception; Acoustics; Speech Acoustics; Cognition
10.
PLoS One ; 19(4): e0301514, 2024.
Article in English | MEDLINE | ID: mdl-38564597

ABSTRACT

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
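To make the "logistic curve plus JND" step concrete, here is a minimal sketch of one way such a threshold could be estimated from same/different counts. The abstract does not say how the curves were fitted or which point on the curve defined the JND, so everything here is an assumption: the coarse grid-search maximum-likelihood fit, the reading of the JND as the shift at which "different" responses reach 50%, the use of cents as the unit, and the response counts, which are invented.

```python
import math

def logistic(x, midpoint, slope):
    """Probability of a 'different' response for a formant shift x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def fit_logistic(shifts, n_different, n_trials):
    """Coarse grid-search maximum-likelihood fit; returns (midpoint, slope)."""
    best, best_ll = None, -math.inf
    for m in [i * 0.5 for i in range(1, 200)]:        # candidate midpoints
        for s in [j * 0.01 for j in range(1, 100)]:   # candidate slopes
            ll = 0.0
            for x, k, n in zip(shifts, n_different, n_trials):
                # Clamp to avoid log(0) at extreme parameter values.
                p = min(max(logistic(x, m, s), 1e-9), 1.0 - 1e-9)
                ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
            if ll > best_ll:
                best, best_ll = (m, s), ll
    return best

# Invented data: formant shift (assumed to be in cents), 'different' counts, trials.
shifts = [10, 20, 30, 40, 50, 60, 70]
n_different = [1, 2, 5, 10, 15, 18, 19]
n_trials = [20] * 7

jnd, slope = fit_logistic(shifts, n_different, n_trials)
print(f"assumed JND (50% point): {jnd:.1f} cents, slope: {slope:.2f}")
```

A larger fitted midpoint corresponds to poorer discrimination, which is the direction of the effect the abstract reports for the pre-speaking condition. A real analysis would more likely use a continuous optimizer and a threshold criterion taken from the paper itself.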


Subject(s)
Auditory Cortex; Speech Perception; Humans; Speech/physiology; Speech Perception/physiology; Acoustics; Movement; Phonetics; Speech Acoustics
11.
J Acoust Soc Am ; 155(4): 2612-2626, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38629882

ABSTRACT

This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14 vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two out of seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /ɛ/ and the central vowel /ə/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified.


Subject(s)
Phonetics; Speech Acoustics; Acoustics
12.
J Acoust Soc Am ; 155(4): 2698-2706, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38639561

ABSTRACT

The notion of the "perceptual center" or the "P-center" has been put forward to account for the repeated finding that acoustic and perceived syllable onsets do not necessarily coincide, at least in the perception of simple monosyllables or disyllables. The magnitude of the discrepancy between acoustics and perception (the location of the P-center in the speech signal) has proven difficult to estimate, though acoustic models of the effect do exist. The present study asks if the P-center effect can be documented in natural connected speech of English and Japanese and examines if an acoustic model that defines the P-center as the moment of the fastest energy change in a syllabic amplitude envelope adequately reflects the P-center in the two languages. A sensorimotor synchronization paradigm was deployed to address the research questions. The results provide evidence for the existence of the P-center effect in speech of both languages while the acoustic P-center model is found to be less applicable to Japanese. Sensorimotor synchronization patterns further suggest that the P-center may reflect perceptual anticipation of a vowel onset.
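The acoustic model mentioned above, which places the P-center at the fastest energy change in the amplitude envelope, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract gives no filtering or envelope-extraction details, so the frame-wise RMS envelope, the 10 ms frame length, the helper names, and the synthetic tone used as a stand-in for a syllable are all hypothetical.

```python
import math

def rms_envelope(samples, frame_len):
    """Root-mean-square amplitude of consecutive, non-overlapping frames."""
    n_frames = len(samples) // frame_len
    env = []
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        env.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return env

def fastest_rise_time(envelope, frame_len, sample_rate):
    """Time (s) of the frame boundary where the envelope rises most steeply."""
    rises = [envelope[k + 1] - envelope[k] for k in range(len(envelope) - 1)]
    k_max = max(range(len(rises)), key=lambda k: rises[k])
    return (k_max + 1) * frame_len / sample_rate

# Synthetic "syllable": a 100 Hz tone whose amplitude rises around 0.2 s.
SAMPLE_RATE = 1000
signal = []
for i in range(500):
    t = i / SAMPLE_RATE
    amplitude = 1.0 / (1.0 + math.exp(-(t - 0.2) / 0.01))
    signal.append(amplitude * math.sin(2 * math.pi * 100 * t))

envelope = rms_envelope(signal, frame_len=10)
p_center_estimate = fastest_rise_time(envelope, frame_len=10, sample_rate=SAMPLE_RATE)
print(f"fastest envelope rise at about {p_center_estimate:.3f} s")
```

On real speech the envelope would normally be band-limited and smoothed before differentiation, and the time resolution here is limited to one frame.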


Subject(s)
Speech Acoustics; Speech Perception; Humans; Phonetics; Speech; Language
13.
Int J Pediatr Otorhinolaryngol ; 180: 111962, 2024 May.
Article in English | MEDLINE | ID: mdl-38657429

ABSTRACT

PURPOSE: In this prospective study, we aimed to investigate the difference in voice acoustic parameters between girls with idiopathic central precocious puberty (ICPP) and prepubertal girls with normal development. MATERIALS AND METHODS: Our study recruited 54 girls diagnosed with ICPP and randomly sampled 51 healthy prepubertal girls as the control group. Tanner stages, circulating hormone levels, and bone ages of the girls with ICPP, and the age and body mass index (BMI) of all participants, were recorded. Acoustic analyses were performed using the PRAAT voice analysis software, and the mean pitch (F0), jitter, shimmer, noise-to-harmonic ratio (NHR), and harmonic-to-noise ratio (HNR) values were compared between the patient and control groups. RESULTS: The two groups did not differ significantly in age or BMI. F0 and jitter values were lower in the control group than in the patient group, but these differences were not statistically significant. The mean shimmer values of the patient group were significantly higher than those of the control group. In addition, a statistically significant difference was noted for the mean HNR and NHR values (P < 0.001). A moderate negative correlation was found between shimmer and hormone levels in the patient group. CONCLUSIONS: Voice acoustic parameters are one of the defining features of girls with ICPP. Changes in voice acoustic parameters could reflect hormonal changes during puberty. Clinicians should suspect ICPP when there is a change in the voice.
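For readers unfamiliar with the perturbation measures compared above, the sketch below shows the commonly used "local" definitions of jitter and shimmer: the mean absolute difference between consecutive cycle periods (or peak amplitudes) divided by the mean period (or amplitude). It assumes the glottal cycles have already been detected, which is the hard part that PRAAT handles; the function name and the cycle values are invented for illustration and do not come from the study.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Invented cycle-by-cycle measurements for a voice with F0 near 250 Hz.
periods_s = [0.0040, 0.0041, 0.0039, 0.0040, 0.0042]   # cycle durations (s)
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]            # cycle peak amplitudes

jitter_pct = local_perturbation(periods_s)     # cycle-to-cycle period variation
shimmer_pct = local_perturbation(amplitudes)   # cycle-to-cycle amplitude variation
print(f"jitter (local): {jitter_pct:.2f}%  shimmer (local): {shimmer_pct:.2f}%")
```

Higher values of either measure indicate less regular vocal fold vibration, which is why shimmer can serve as a group-level marker in comparisons like the one reported here.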


Subject(s)
Puberty, Precocious; Humans; Puberty, Precocious/blood; Female; Child; Prospective Studies; Voice Quality/physiology; Speech Acoustics; Case-Control Studies; Voice/physiology; Body Mass Index
14.
J Speech Lang Hear Res ; 67(5): 1360-1369, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38629972

ABSTRACT

PURPOSE: According to the interpersonal synergy model of spoken dialogue, interlocutors modify their communicative behaviors to meet the contextual demands of a given conversation. Although a growing body of research supports this postulation for linguistic behaviors (e.g., semantics, syntax), little is understood about how this model applies to speech behaviors (e.g., speech rate, pitch). The purpose of this study is to test the hypothesis that interlocutors adjust their speech behaviors across different conversational tasks with different conversational goals. METHOD: In this study, 28 participants each engaged in two different types of conversations (i.e., relational and informational) with two partners (i.e., Partner 1 and Partner 2), yielding a total of 112 conversations. We compared six acoustic measures of participant speech behavior across conversational task and partner. RESULTS: Linear mixed-effects models demonstrated significant differences between speech feature measures in informational and relational conversations. Furthermore, these findings were generally robust across conversations with different partners. CONCLUSIONS: Results suggest that contextual demands influence speech behaviors. These findings provide empirical support for the interpersonal synergy model and highlight important considerations for assessing speech behaviors in individuals with communication disorders.


Subject(s)
Interpersonal Relations; Speech; Humans; Male; Female; Young Adult; Adult; Speech Acoustics; Verbal Behavior; Communication
15.
Codas ; 36(3): e20230175, 2024.
Article in English | MEDLINE | ID: mdl-38629682

ABSTRACT

PURPOSE: To assess the influence of listener experience, measurement scales, and type of speech task on the auditory-perceptual evaluation of the overall severity (OS) of voice deviation and the predominant type of voice (rough, breathy, or strained). METHODS: Twenty-two listeners, divided into four groups, participated in the study: speech-language pathologists specialized in voice (SLP-V), SLPs not specialized in voice (SLP-NV), graduate students with auditory-perceptual analysis training (GS-T), and graduate students without auditory-perceptual analysis training (GS-U). The listeners rated the OS of voice deviation and the predominant type of voice of 44 voices using a visual analog scale (VAS) and a numerical scale (the "G" score from GRBAS), across six speech tasks: sustained vowels /a/ and /ɛ/, sentences, number counting, running speech, and all five previous tasks together. RESULTS: Sentences obtained the best interrater reliability in each group, using both VAS and GRBAS. The SLP-NV group demonstrated the best interrater reliability in OS judgment in different speech tasks using VAS or GRBAS. Sustained vowels (/a/ and /ɛ/) and running speech obtained the best interrater reliability among the groups of listeners in judging the predominant vocal quality. The GS-T group obtained the best interrater reliability in judging the predominant vocal quality. CONCLUSION: The length of experience in the auditory-perceptual judgment of the voice, the type of training the listeners received, and the type of speech task influence the reliability of the auditory-perceptual evaluation of vocal quality.


Subject(s)
Dysphonia; Speech Perception; Humans; Speech; Reproducibility of Results; Speech Production Measurement; Observer Variation; Voice Quality; Speech Acoustics
16.
J Acoust Soc Am ; 155(4): 2849-2859, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38682914

ABSTRACT

The context-based Extended Speech Transmission Index (cESTI) (van Schoonhoven et al., 2022, J. Acoust. Soc. Am. 151, 1404-1415) was successfully applied to predict the intelligibility of monosyllabic words with different degrees of context in interrupted noise. The current study aimed to use the same model for the prediction of sentence intelligibility in different types of non-stationary noise. The necessary context factors and transfer functions were based on values found in existing literature. The cESTI performed similarly to or better than the original ESTI when noise had speech-like characteristics. We hypothesize that the remaining inaccuracies in model predictions can be attributed to the limits of the modelling approach with regard to mechanisms such as modulation masking and informational masking.


Subject(s)
Noise; Perceptual Masking; Speech Intelligibility; Speech Perception; Humans; Perceptual Masking/physiology; Female; Speech Perception/physiology; Male; Adult; Young Adult; Speech Acoustics; Models, Theoretical; Acoustic Stimulation
17.
J Acoust Soc Am ; 155(4): 2836-2848, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38682915

ABSTRACT

This paper evaluates an innovative framework for spoken dialect density prediction on children's and adults' African American English. A speaker's dialect density is defined as the frequency with which dialect-specific language characteristics occur in their speech. Rather than treating the presence or absence of a target dialect in a user's speech as a binary decision, a classifier is trained to predict the level of dialect density to provide a higher degree of specificity in downstream tasks. For this, self-supervised learning representations from HuBERT, handcrafted grammar-based features extracted from ASR transcripts, prosodic features, and other feature sets are experimented with as the input to an XGBoost classifier. Then, the classifier is trained to assign dialect density labels to short recorded utterances. High dialect density level classification accuracy is achieved for child and adult speech, and performance is robust across age and regional varieties of dialect. Additionally, this work is used as a basis for analyzing which acoustic and grammatical cues affect machine perception of dialect.


Subject(s)
Black or African American; Speech Acoustics; Humans; Adult; Child; Male; Female; Speech Production Measurement/methods; Language; Child, Preschool; Young Adult; Speech Perception; Adolescent; Phonetics; Child Language
18.
JASA Express Lett ; 4(4)2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38687585

ABSTRACT

This paper examines the adaptations African American English speakers make when imagining talking to a voice assistant, compared to a close friend/family member and to a stranger. Results show that speakers slowed their rate and produced less pitch variation in voice-assistant-"directed speech" (DS), relative to human-DS. These adjustments were not mediated by how often participants reported experiencing errors with automatic speech recognition. Overall, this paper addresses a limitation in the types of language varieties explored when examining technology-DS registers and contributes to our understanding of the dynamics of human-computer interaction.


Subject(s)
Black or African American; Humans; Male; Female; Adult; Imagination; Speech; Language; Young Adult; Speech Acoustics
19.
J Speech Lang Hear Res ; 67(4): 1090-1106, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38498664

ABSTRACT

PURPOSE: This study examined speech changes induced by deep-brain stimulation (DBS) in speakers with Parkinson's disease (PD) using a set of auditory-perceptual and acoustic measures. METHOD: Speech recordings from nine speakers with PD and DBS were compared between DBS-On and DBS-Off conditions using auditory-perceptual and acoustic analyses. Auditory-perceptual ratings included voice quality, articulation precision, prosody, speech intelligibility, and listening effort obtained from 44 listeners. Acoustic measures were made for voicing proportion, second formant frequency slope, vowel dispersion, articulation rate, and range of fundamental frequency and intensity. RESULTS: No significant changes were found between DBS-On and DBS-Off for the five perceptual ratings. Four of six acoustic measures revealed significant differences between the two conditions. While articulation rate and acoustic vowel dispersion increased, voicing proportion and intensity range decreased from the DBS-Off to DBS-On condition. However, a visual examination of the data indicated that the statistical significance was mostly driven by a small number of participants, while the majority did not show a consistent pattern of such changes. CONCLUSIONS: Our data, in general, indicate no-to-minimal changes in speech production ensued from DBS stimulation. The findings are discussed with a focus on large interspeaker variability in PD in terms of their speech characteristics and the potential effects of DBS on speech.


Subject(s)
Deep Brain Stimulation; Parkinson Disease; Humans; Acoustics; Speech Intelligibility/physiology; Voice Quality; Parkinson Disease/complications; Parkinson Disease/therapy; Brain; Speech Acoustics
20.
Am J Speech Lang Pathol ; 33(3): 1113-1126, 2024 May.
Article in English | MEDLINE | ID: mdl-38501906

ABSTRACT

PURPOSE: The study of gender and speech has historically excluded studies of transmasculine individuals. Consequently, generalizations about speech and gender are based on cisgender individuals. This lack of representation hinders clinical training and clinical service delivery, particularly by speech-language pathologists providing gender-affirming communication services. This letter describes a new corpus of the speech of American English-speaking transmasculine men, transmasculine nonbinary people, and cisgender men that is open and available to clinicians and researchers. METHOD: Twenty masculine-presenting native English speakers from the Upper Midwestern United States (including cisgender men, transmasculine men, and transmasculine nonbinary people) were recorded, producing three sets of speech materials: Consensus Auditory-Perceptual Evaluation of Voice sentences, the Rainbow Passage, and a novel set of sentences developed for this project. Acoustic measures were made of vowels (overall formant frequency scaling, vowel-space dispersion, fundamental frequency, breathiness), consonants (voice onset time of word-initial voiceless stops, spectral moments of word-initial /s/), and the entire sentence (rate of speech). RESULTS: The acoustic measures reveal a wide range for all dependent measures and low correlations among the measures. Results show that many of the voices depart considerably from the norms for men's speech in published studies. CONCLUSION: This new corpus can be used to illustrate different ways of sounding masculine by speech-language pathologists performing gender-affirming communication services and by higher education teachers as examples of diverse ways of sounding masculine.


Subject(s)
Speech Acoustics; Speech Production Measurement; Transgender Persons; Voice Quality; Humans; Male; Transgender Persons/psychology; Adult; Young Adult; Speech-Language Pathology/methods; Female; Middle Aged; Phonetics
...