ABSTRACT
Nonword pronunciation is a critical challenge for models of reading aloud, but little attention has been given to identifying the best method for assessing model predictions. The most typical approach involves comparing the model's pronunciations of nonwords to pronunciations of the same nonwords by human participants and deeming the model's output correct if it matches any transcription of the human pronunciations. The present paper introduces a new ratings-based method, in which participants are shown printed nonwords and asked to rate the plausibility of the provided pronunciations, generated here by a speech synthesiser. We demonstrate this method with reference to a previously published database of 915 disyllabic nonwords (Mousikou et al., 2017). We evaluated two well-known psychological models, RC00 and CDP++, as well as an additional grapheme-to-phoneme algorithm known as Sequitur, and compared our model assessment with the corpus-based method adopted by Mousikou et al. We find that the ratings method: a) is much easier to implement than a corpus-based method, b) has a high hit rate and low false-alarm rate in assessing nonword reading accuracy, and c) provides an outcome similar to that of the corpus-based method in its assessment of RC00 and CDP++. However, the two methods differed in their evaluation of Sequitur, which performed much better under the ratings method. Indeed, our evaluation of Sequitur revealed that the corpus-based method introduced a number of false positives and, more often, false negatives. Implications of these findings are discussed.
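As an illustration of the hit rate and false-alarm rate mentioned in the abstract, the following is a minimal sketch using the standard signal-detection definitions. The abstract does not say how the authors computed these rates or what they used as ground truth, so the function name `hit_and_false_alarm_rates`, the boolean encoding, and the example data are all hypothetical.

```python
def hit_and_false_alarm_rates(judged_acceptable, truly_acceptable):
    """Return (hit_rate, false_alarm_rate) for paired boolean decisions.

    judged_acceptable: assessment method's verdict per pronunciation (assumed encoding)
    truly_acceptable:  reference verdict per pronunciation (assumed ground truth)
    """
    if len(judged_acceptable) != len(truly_acceptable):
        raise ValueError("both sequences must have the same length")

    pairs = list(zip(judged_acceptable, truly_acceptable))
    hits = sum(1 for j, t in pairs if j and t)
    misses = sum(1 for j, t in pairs if not j and t)
    false_alarms = sum(1 for j, t in pairs if j and not t)
    correct_rejections = sum(1 for j, t in pairs if not j and not t)

    # Hit rate: share of truly acceptable items that were accepted.
    hit_rate = hits / (hits + misses) if hits + misses else float("nan")
    # False-alarm rate: share of truly unacceptable items that were accepted.
    negatives = false_alarms + correct_rejections
    false_alarm_rate = false_alarms / negatives if negatives else float("nan")
    return hit_rate, false_alarm_rate


# Hypothetical verdicts for six nonword pronunciations (not data from the study).
judged = [True, True, False, True, False, False]
truth = [True, True, True, False, False, False]
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(judged, truth)
```

A method with a high hit rate and a low false-alarm rate accepts most plausible pronunciations while rejecting most implausible ones.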
Subject(s)
Phonetics, Reading, Humans, Attention, Psychological Models, Algorithms
ABSTRACT
Velum position was analysed as a function of vowel height in German tense and lax vowels preceding a nasal or oral consonant. Findings from previous research suggest an interdependence between vowel height and the degree of velum lowering, with a higher velum during high vowels and a more lowered velum during low vowels. In the current study, data are presented from 33 native speakers of Standard German who were recorded using non-invasive, high-quality real-time magnetic resonance imaging. The focus was on exploring the spatiotemporal extent of velum lowering in tense and lax /a, i, o, ø/, which was done by analysing velum movement trajectories over the course of VN and VC sequences in CVNV and CVCV sequences by means of functional principal component analysis. Analyses focused on the impact of the vowel category and vowel tenseness. The data indicated that these factors affected not only the position of the velum but also the timing of velum closure. Moreover, it is argued that the effect of vowel height is better interpreted in terms of the physiological constriction location of vowels, i.e., the specific tongue position, rather than phonetic vowel height.
Subject(s)
Phonetics, Tongue, Humans, Tongue/physiology, Movement, Pathologic Constriction, Magnetic Resonance Imaging, Speech/physiology, Speech Acoustics
ABSTRACT
An acoustic analysis was made of the speech characteristics of individuals recorded before and during a prolonged stay in Antarctica. A computational model was used to predict the expected changes due to close contact and isolation, which were then compared with the actual recorded productions. The individuals were found to develop the first stages of a common accent in Antarctica whose phonetic characteristics were in some respects predicted by the computational model. These findings suggest that the phonetic attributes of a spoken accent in its initial stages emerge through interactions between individuals causing speech production to be incrementally updated.
ABSTRACT
This study is concerned with the aperture of the mid vowel /E/ in nonfinal syllables in Quebec French. The hypothesis tested is that in underived disyllabic words, the aperture of /E/ would be determined via harmony with the following vowel. Based on predictions from a classifier trained on acoustic properties of word-final vowels, nonfinal vowels were labeled as mid-close or mid-open. Although distant coarticulatory effects were observed, the harmony hypothesis was not supported. The results revealed a bias toward a mid-open quality and a reduced acoustic distinction, which warrant further investigation.
ABSTRACT
Second dialect acquisition (SDA) can be defined as the process through which geographically mobile individuals adapt to new dialect features of their first language. Two common methodological approaches in SDA studies could lead to underestimating the phonetic changes that mobile speakers may experience: only large phonetic differences between dialects are considered, and external sources are used to infer what should have been the speakers' original dialect. By contrast, in this study, we carry out a longitudinal analysis to empirically assess the speakers' baseline and shift away from it with no priors as to which features should change or not. Furthermore, we focus on Quebec French, a variety with a relatively crowded vowel space. Using Mahalanobis distances, we measure how acoustic characteristics of vowels produced by 15 mobile speakers change relative to those of a control group of 8 sedentary speakers, with the mobile participants recorded right after they moved to Quebec City, then a year later. Overall, the results show a reduction of Mahalanobis distances over time, indicating convergence toward the control system. Convergence also tends to be greater in denser areas of the vowel space. These results suggest that phonetic changes during SDA could be finer than previously thought. This study calls for the use of methodological approaches that can reveal such trends, and contributes to uncovering the extent of phonetic flexibility during adulthood.
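To make the distance measure concrete, here is a minimal sketch of a Mahalanobis distance between one vowel token and a control group's distribution. It assumes only two acoustic dimensions (F1 and F2 in Hz), which is a simplification chosen for illustration: the abstract does not state which acoustic characteristics were used. The function names and formant values are hypothetical.

```python
from math import sqrt


def mean_and_cov_2d(points):
    """Sample mean and covariance terms (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two reference points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, reference):
    """Mahalanobis distance of `point` from the distribution of `reference`."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov_2d(reference)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = diff^T * inverse(covariance) * diff, written out for the 2x2 case.
    d_squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(max(d_squared, 0.0))


# Hypothetical (F1, F2) values in Hz for one vowel category; not data from the study.
control = [(300, 2200), (320, 2300), (280, 2250), (310, 2150), (290, 2280)]
mobile_first_recording = (380, 2000)
mobile_one_year_later = (330, 2150)

d_first = mahalanobis_2d(mobile_first_recording, control)
d_later = mahalanobis_2d(mobile_one_year_later, control)
```

In this toy example `d_later` is smaller than `d_first`, which is the pattern the abstract describes as convergence toward the control system.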
Subject(s)
Language, Speech Acoustics, Humans, Adult, Quebec, Phonetics, Acoustics
ABSTRACT
Articulatory and acoustic reduction can manifest itself in the temporal and spectral domains. This study introduces a measure of spectral reduction, which is based on the speech decoding techniques commonly used in automatic speech recognizers. Using data for four frequent Dutch affixes from a large corpus of spontaneous face-to-face conversations, it builds on an earlier study examining the effects of lexical frequency on durational reduction in spoken Dutch [Pluymaekers, M. et al. (2005). J. Acoust. Soc. Am. 118, 2561-2569], and compares the proposed measure of spectral reduction with duration as a measure of reduction. The results suggest that the spectral reduction scores capture aspects of reduction that duration does not. While duration can, albeit to a moderate degree, be predicted by a number of linguistically motivated variables (such as word frequency, segmental context, and speech rate), the spectral reduction scores cannot. This suggests that the spectral reduction scores capture information that is not directly accounted for by the linguistically motivated variables. The results also show that the spectral reduction scores are able to predict a substantial amount of the variation in duration that the linguistically motivated variables do not account for.