Results 1 - 10 of 10
1.
J Acoust Soc Am ; 153(2): 1108, 2023 02.
Article in English | MEDLINE | ID: mdl-36859141

ABSTRACT

Listeners parse the speech signal effortlessly into words and phrases, but many questions remain about how. One classic idea is that rhythm-related auditory principles play a role, in particular, that a psycho-acoustic "iambic-trochaic law" (ITL) ensures that alternating sounds varying in intensity are perceived as recurrent binary groups with initial prominence (trochees), while alternating sounds varying in duration are perceived as binary groups with final prominence (iambs). We test the hypothesis that the ITL is in fact an indirect consequence of the parsing of speech along two in-principle orthogonal dimensions: prominence and grouping. Results from several perception experiments show that the two dimensions, prominence and grouping, are each reliably cued by both intensity and duration, while foot type is not associated with consistent cues. The ITL emerges only when one manipulates either intensity or duration in an extreme way. Overall, the results suggest that foot perception is derivative of the cognitively more basic decisions of grouping and prominence, and the notions of trochee and iamb may not play any direct role in speech parsing. A task manipulation furthermore gives new insight into how these decisions mutually inform each other.


Subject(s)
Acoustics , Speech , Cues , Social Group , Sound
2.
J Acoust Soc Am ; 148(2): 793, 2020 08.
Article in English | MEDLINE | ID: mdl-32872992

ABSTRACT

A number of recent studies have observed that phonetic variability is constrained across speakers, where speakers exhibit limited variation in the signalling of phonological contrasts in spite of overall differences between speakers. This previous work focused predominantly on controlled laboratory speech and on contrasts in English and German, leaving unclear how such speaker variability is structured in spontaneous speech and in phonological contrasts that make substantial use of more than one acoustic cue. This study attempts to both address these empirical gaps and expand the empirical scope of research investigating structured variability by examining how speakers vary in the use of positive voice onset time and voicing during closure in marking the stop voicing contrast in Japanese spontaneous speech. Strong covarying relationships within each cue across speakers are observed, while between-cue relationships across speakers are much weaker, suggesting that structured variability is constrained by the language-specific phonetic implementation of linguistic contrasts.


Subject(s)
Speech Perception , Voice , Cues , Japan , Phonetics , Speech Acoustics
3.
Open Mind (Camb) ; 7: 350-391, 2023.
Article in English | MEDLINE | ID: mdl-37637302

ABSTRACT

Words that are more surprising given context take longer to process. However, no incremental parsing algorithm has been shown to directly predict this phenomenon. In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal, namely those that involve repeatedly sampling from the prior. Our first contribution is to show that simple examples of such algorithms predict runtime to increase superlinearly with surprisal, and also predict variance in runtime to increase. These two predictions stand in contrast with literature on surprisal theory (Hale, 2001; Levy, 2008a) which assumes that the expected processing cost increases linearly with surprisal, and makes no prediction about variance. In the second part of this paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal. We find that with better language models, reading time increases superlinearly in surprisal, and also that variance increases. These results are consistent with the predictions of sampling-based algorithms.
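
The scaling claim in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible sampling scheme, guess-and-check: draw from the prior until the target word (probability p) comes up. This scheme and the function names are assumptions made for illustration, not necessarily the algorithms analyzed in the paper. Under this assumption the number of draws is geometric, so its mean is 1/p = 2^surprisal and its variance is (1 - p)/p^2, both of which grow faster than linearly in surprisal.

```python
import random


def samples_until_match(p, rng):
    """Draw from the prior until the target (probability p) is hit.

    Returns the number of draws, which stands in for runtime here.
    """
    draws = 1
    while rng.random() >= p:  # each draw hits the target with probability p
        draws += 1
    return draws


def expected_draws(surprisal_bits):
    """Mean of the geometric draw count: 1 / p, with p = 2 ** -surprisal."""
    p = 2.0 ** (-surprisal_bits)
    return 1.0 / p


def variance_draws(surprisal_bits):
    """Variance of the geometric draw count: (1 - p) / p ** 2."""
    p = 2.0 ** (-surprisal_bits)
    return (1.0 - p) / (p * p)


if __name__ == "__main__":
    rng = random.Random(0)
    for bits in (1, 2, 4, 6):
        p = 2.0 ** (-bits)
        runs = [samples_until_match(p, rng) for _ in range(5000)]
        simulated_mean = sum(runs) / len(runs)
        print(bits, expected_draws(bits), variance_draws(bits), simulated_mean)
```

Doubling surprisal from 2 to 4 bits raises the expected draw count from 4 to 16 in this sketch, whereas a linear cost model would only double it.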

4.
J Acoust Soc Am ; 132(6): 3965-79, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23231126

ABSTRACT

A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.


Subject(s)
Algorithms , Phonetics , Computer-Assisted Signal Processing , Speech Acoustics , Speech Production Measurement/methods , Voice Quality , Artificial Intelligence , Automation , Discriminant Analysis , Humans , Linear Models , Automated Pattern Recognition , Periodicity , Reproducibility of Results , Sound Spectrography , Time Factors
5.
Front Artif Intell ; 3: 38, 2020.
Article in English | MEDLINE | ID: mdl-33733155

ABSTRACT

Recent advances in access to spoken-language corpora and development of speech processing tools have made possible the performance of "large-scale" phonetic and sociolinguistic research. This study illustrates the usefulness of such a large-scale approach, using data from multiple corpora across a range of English dialects, collected and analyzed within the SPADE project, to examine how the pre-consonantal Voicing Effect (longer vowels before voiced than voiceless obstruents, e.g., in bead vs. beat) is realized in spontaneous speech and varies across dialects and individual speakers. Compared with previous reports of controlled laboratory speech, the Voicing Effect was found to be substantially smaller in spontaneous speech, but still influenced by the expected range of phonetic factors. Dialects of English differed substantially from each other in the size of the Voicing Effect, whilst individual speakers varied little relative to their particular dialect. This study demonstrates the value of large-scale phonetic research as a means of developing our understanding of the structure of speech variability, and illustrates how large-scale studies, such as those carried out within SPADE, can be applied to other questions in phonetic and sociolinguistic research.

6.
Front Psychol ; 10: 821, 2019.
Article in English | MEDLINE | ID: mdl-31040809

ABSTRACT

A central question in the Japanese high vowel devoicing literature concerns whether vowels are devoiced through a categorical process or via gradient reduction. Examining how vowel height and consonantal voicing condition phrase-internal CV duration in a corpus of spontaneous Tokyo Japanese, it was found that CVs containing high vowels are substantially shorter before voiceless consonants, whilst non-high vowels do not exhibit comparable shortening. This quantitative difference between CV durations suggests a controlled temporal compression of the CV, consistent with views that Japanese vowel devoicing is produced through a categorical process targeting high vowels preceding voiceless consonants, and supports previous observations made of elicited productions.

7.
J Speech Lang Hear Res ; 61(10): 2487-2501, 2018 10 26.
Article in English | MEDLINE | ID: mdl-30458531

ABSTRACT

Purpose: Heterogeneous child speech was force-aligned to investigate whether (a) manipulating specific parameters could improve alignment accuracy and (b) forced alignment could be used to replicate published results on acoustic characteristics of /s/ production by children. Method: In Part 1, child speech from 2 corpora was force-aligned with a trainable aligner (Prosodylab-Aligner) under different conditions that systematically manipulated input training data and the type of transcription used. Alignment accuracy was determined by comparing hand and automatic alignments as to how often they overlapped (%-Match) and absolute differences in duration and boundary placements. Using mixed-effects regression, accuracy was modeled as a function of alignment conditions, as well as segment and child age. In Part 2, forced alignments derived from a subset of the alignment conditions in Part 1 were used to extract spectral center of gravity of /s/ productions from young children. These findings were compared to published results that used manual alignments of the same data. Results: Overall, the results of Part 1 demonstrated that using training data more similar to the data to be aligned, as well as phonetic transcription, led to improvements in alignment accuracy. Speech from older children was aligned more accurately than speech from younger children. In Part 2, /s/ center of gravity extracted from force-aligned segments was found to diverge in the speech of male and female children, replicating the pattern found in previous work using manually aligned segments. This was true even for the least accurate forced alignment method. Conclusions: Alignment accuracy of child speech can be improved by using more specific training and transcription. However, poor alignment accuracy was not found to impede acoustic analysis of /s/ produced by even very young children. Thus, forced alignment presents a useful tool for the analysis of child speech.
Supplemental Material: https://doi.org/10.23641/asha.7070105.
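
The accuracy measures named in the Method section above (%-Match, and absolute differences in boundary placement and duration) can be sketched as follows. This is a hedged illustration only: the abstract does not define %-Match precisely, so the sketch assumes a hand segment and its automatic counterpart "match" when their time intervals share any time at all. The function names and the (onset, offset) tuple representation are likewise hypothetical.

```python
def intervals_overlap(hand, auto):
    """True if two (onset, offset) intervals share any stretch of time.

    Assumed definition of a match; the study may use a stricter one.
    """
    return max(hand[0], auto[0]) < min(hand[1], auto[1])


def percent_match(hand_segments, auto_segments):
    """Percentage of paired segments whose intervals overlap."""
    matches = sum(
        1 for h, a in zip(hand_segments, auto_segments) if intervals_overlap(h, a)
    )
    return 100.0 * matches / len(hand_segments)


def mean_abs_boundary_diff(hand_segments, auto_segments):
    """Mean absolute difference over all onsets and offsets."""
    diffs = []
    for (h_on, h_off), (a_on, a_off) in zip(hand_segments, auto_segments):
        diffs.append(abs(h_on - a_on))
        diffs.append(abs(h_off - a_off))
    return sum(diffs) / len(diffs)


def mean_abs_duration_diff(hand_segments, auto_segments):
    """Mean absolute difference in segment duration."""
    diffs = [
        abs((h_off - h_on) - (a_off - a_on))
        for (h_on, h_off), (a_on, a_off) in zip(hand_segments, auto_segments)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Made-up times in milliseconds; each pair is the same segment aligned two ways.
    hand = [(0, 100), (100, 250), (250, 400)]
    auto = [(20, 120), (300, 350), (260, 380)]
    print(percent_match(hand, auto))
    print(mean_abs_boundary_diff(hand, auto))
    print(mean_abs_duration_diff(hand, auto))
```

Per-segment values like these could then serve as the response in a regression on alignment condition, segment, and child age, as the abstract describes.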


Subject(s)
Speech/physiology , Age Factors , Child , Child Language , Preschool Child , Female , Humans , Male , Statistical Models , Phonetics , Speech Acoustics , Speech Production Measurement/methods , Speech Recognition Software
8.
J Acoust Soc Am ; 122(3): 1735, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17927433

ABSTRACT

A model of acoustic coupling between the oral and subglottal cavities is developed and predicts attenuation of and discontinuities in vowel formant prominence near resonances of the subglottal system. One discontinuity occurs near the second subglottal resonance (SubF2), at 1300-1600 Hz, suggesting the hypothesis that this is a quantal effect [K. N. Stevens, J. Phonetics 17, 3-46 (1989)] dividing speakers' front and back vowels. Recordings of English vowels (in /hVd/ environments) for three male and three female speakers were made, while an accelerometer attached to the neck area was used to capture the subglottal waveform. Average speaker SubF2 values range from 1280 to 1620 Hz, in agreement with prior work. Attenuation of 5-12 dB of second formant prominence near SubF2 is found to occur in all back-front diphthongs analyzed, while discontinuities in the range of 50-300 Hz often occur, in good agreement with the resonator model. These coupling effects are found to be generally stronger for open-phase than for closed-phase measurements. The implications for a quantal relation between coupling effects near SubF2 and [back] are discussed.


Subject(s)
Auditory Perception/physiology , Glottis/physiology , Language , Speech Intelligibility/physiology , Voice Quality , Acoustics , Female , Humans , Male , Mouth , Phonation , Phonetics , Quantum Theory
9.
J Mem Lang ; 75: 159-180, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25089073

ABSTRACT

We explored how phonological network structure influences the age of words' first appearance in children's (14-50 months) speech, using a large, longitudinal corpus of spontaneous child-caregiver interactions. We represent the caregiver lexicon as a network in which each word is connected to all of its phonological neighbors, and consider both words' local neighborhood density (degree), and also their embeddedness among interconnected neighborhoods (clustering coefficient and coreness). The larger-scale structure reflected in the latter two measures is implicated in current theories of lexical development and processing, but its role in lexical development has not yet been explored. Multilevel discrete-time survival analysis revealed that children are more likely to produce new words whose network properties support lexical access for production: high degree, but low clustering coefficient and coreness. These effects appear to be strongest at earlier ages and largely absent from 30 months on. These results suggest that both a word's local connectivity in the lexicon and its position in the lexicon as a whole influence when it is learned, and they underscore how general lexical processing mechanisms contribute to productive vocabulary development.
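
The three network measures named in the abstract above (degree, clustering coefficient, coreness) can be sketched on a toy lexicon. Several assumptions are made for illustration and are not taken from the study: words are plain strings with one character per segment (orthography standing in for phonemic transcription), two words count as neighbors when they differ by a single substitution, insertion, or deletion, and all function names are hypothetical.

```python
from itertools import combinations


def is_neighbor(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = sorted((a, b), key=len)
    # Deleting one segment from the longer word must give the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def build_network(words):
    """Undirected neighbor graph as a dict mapping word -> set of neighbors."""
    graph = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if is_neighbor(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering(graph, word):
    """Fraction of a word's neighbor pairs that are themselves neighbors."""
    nbrs = graph[word]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def coreness(graph):
    """Core number of every word, by repeatedly peeling the lowest-degree node."""
    degree = {w: len(n) for w, n in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        w = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[w])
        core[w] = k
        remaining.remove(w)
        for n in graph[w]:
            if n in remaining:
                degree[n] -= 1
    return core


if __name__ == "__main__":
    lexicon = ["cat", "bat", "hat", "cab", "at", "dog"]
    g = build_network(lexicon)
    cores = coreness(g)
    for w in lexicon:
        print(w, len(g[w]), clustering(g, w), cores[w])
```

In this toy lexicon "cat" has the highest degree but only moderate clustering, because its neighbor "cab" is not linked to its other neighbors; this is the kind of distinction between local density and embeddedness that the abstract draws.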

10.
PLoS One ; 8(9): e74746, 2013.
Article in English | MEDLINE | ID: mdl-24098665

ABSTRACT

Numerous studies have documented the phenomenon of phonetic imitation: the process by which the production patterns of an individual become more similar on some phonetic or acoustic dimension to those of her interlocutor. Though social factors have been suggested as a motivator for imitation, few studies have established a tight connection between language-external factors and a speaker's likelihood to imitate. The present study investigated the phenomenon of phonetic imitation using a within-subject design embedded in an individual-differences framework. Participants were administered a phonetic imitation task, which included two speech production tasks separated by a perceptual learning task, and a battery of measures assessing traits associated with Autism-Spectrum Condition, working memory, and personality. To examine the effects of subjective attitude on phonetic imitation, participants were randomly assigned to four experimental conditions, where the perceived sexual orientation of the narrator (homosexual vs. heterosexual) and the outcome (positive vs. negative) of the story depicted in the exposure materials differed. The extent of phonetic imitation by an individual is significantly modulated by the story outcome, as well as by the participant's subjective attitude toward the model talker, the participant's personality trait of openness and the autistic-like trait associated with attention switching.


Subject(s)
Imitative Behavior/physiology , Psychological Models , Phonetics , Speech Perception/physiology , Acoustic Stimulation , Attention/physiology , Attitude , Male Homosexuality , Humans , Male , Short-Term Memory/physiology , Personality/physiology