ABSTRACT
Previous research has shown that when domain-general transitional probability (TP) cues to word segmentation are in conflict with language-specific stress cues, English-learning 5- and 7-month-olds rely on TP, whereas 9-month-olds rely on stress. In two artificial languages, we evaluated English-learning infants' sensitivity to TP cues to word segmentation vis-à-vis language-specific vowel phonotactic (VP) cues (English words do not end in lax vowels). These cues were either consistent or conflicting. When these cues were in conflict, 10-month-olds relied on the VP cues, whereas 5-month-olds relied on TP. These findings align with statistical bootstrapping accounts, where infants initially use domain-general distributional information for word segmentation, and subsequently discover language-specific patterns based on segmented words. RESEARCH HIGHLIGHTS: Research indicates that when transitional probability (TP) conflicts with stress cues for word segmentation, English-learning 9-month-olds rely on stress, whereas younger infants rely on TP. In two artificial languages, we evaluated English-learning infants' sensitivity to TP versus vowel phonotactic (VP) cues for word segmentation. When these cues conflicted, 10-month-olds relied on VPs, whereas 5-month-olds relied on TP. These findings align with statistical bootstrapping accounts, where infants first utilize domain-general distributional information for word segmentation, and then identify language-specific patterns from segmented words.
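The transitional probability cue mentioned in this abstract is commonly defined as TP(A→B) = frequency(AB) / frequency(A). The following is a minimal sketch of that calculation in Python. The syllable stream and the function name `transitional_probabilities` are hypothetical illustrations, not the stimuli or analysis code of the study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return TP(A -> B) = count(A followed by B) / count(A) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Only syllables that have a successor can start a pair, so drop the last one.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical stream built from three made-up "words": bidaku, padoti, golabu.
stream = "bi da ku pa do ti bi da ku go la bu pa do ti bi da ku".split()
tp = transitional_probabilities(stream)

# Within-word transitions (e.g. bi -> da) come out higher than
# transitions that span a word boundary (e.g. ku -> pa).
print(tp[("bi", "da")], tp[("ku", "pa")])
```

In a stream like this, dips in TP are the candidate word boundaries that a purely distributional learner could exploit.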
Subject(s)
Cues, Language Development, Phonetics, Speech Perception, Humans, Infant, Male, Female, Speech Perception/physiology, Language, Vocabulary
ABSTRACT
The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers (both purported challenges to Lieberman's theoretical work) and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless support Lieberman's principal position, namely that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.
Subject(s)
Phonetics, Primates, Animal Vocalization, Animals, Primates/physiology, Primates/anatomy & histology, Humans, 20th Century History, Speech/physiology, Biological Evolution
ABSTRACT
The emergence of multiethnolects, i.e. specific speaking styles or varieties associated with second and third generation speakers from immigrant backgrounds, has been observed and studied in several major cities in Europe and elsewhere in the world. The multiethnolect that is the focus of this study is one such variety of colloquial German. Most previous research on multiethnolectal German has focused on grammatical features. This paper reports on the first comprehensive study of the vowel system (vowel quality and global vowel space size) of multiethnolectal German, based on data from Stuttgart. The results show that the vowel space of multiethnolectal speakers is in general more centralized than that of a comparison group. A more detailed analysis reveals that the linguistic background plays an important role, as speakers with a Turkish or South Slavonic language background are responsible for this effect.
ABSTRACT
Vowel-initial glottalization constitutes a cue to prosodic prominence, realized on a strength continuum from creaky phonation to complete glottal stops. While there is considerable research on children's early utilization of acoustic cues for stress marking, less is understood about the specific implementation of vowel-initial glottalization in American English. Eight sequences of function + novel words were elicited from groups of 5-to-8-year-olds, 8-to-11-year-olds, and adults. Children exhibit a similar rate of prevocalic glottalization to adults but differ in its phonetic implementation, producing a higher proportion of glottal stops relative to creaky phonation than adults do.
ABSTRACT
We examined the neurophysiological underpinnings of lexical-tone and vowel-quality perception in learners of a non-tonal language. We tested 25 6-month-old and 25 9-month-old German-learning infants, as well as 24 German adults, and expected developmental differences for the two linguistic properties, as they are both carried by vowels but have a different status in German. In adults, both lexical-tone and vowel-quality contrasts elicited mismatch negativities, with a stronger response to the vowel-quality contrast. Six-month-olds showed positive mismatch responses for lexical-tone and vowel-quality contrasts, with an emerging negative mismatch response for vowel-quality only. The negative mismatch responses became more pronounced for the vowel-quality contrast at 9 months, while the lexical-tone contrast elicited mainly positive mismatch responses. Our data reveal differential developmental changes in the processing of vowel properties that differ in their lexical relevance in the ambient language.
ABSTRACT
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
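The single-tube reasoning in this abstract can be sketched as follows. A uniform tube closed at one end has resonances F_i = (2i - 1)c / (4L), so formants are equally spaced by ΔF = c / (2L). Regressing measured formants on (2i - 1)/2 gives ΔF, and the apparent vocal tract length is L = c / (2ΔF). The Python below is an illustrative sketch only, not the code of soundgen's estimateVTL. The regression through the origin, the default speed of sound, and all function names are assumptions made for this example.

```python
def formant_spacing(formants_hz):
    """Least-squares slope (through the origin) of F_i on (2i - 1) / 2."""
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    return sum(x * f for x, f in zip(xs, formants_hz)) / sum(x * x for x in xs)


def estimate_vtl_cm(formants_hz, speed_of_sound_cm_s=35400.0):
    """Apparent vocal tract length in cm under a single-tube model (assumed c)."""
    return speed_of_sound_cm_s / (2 * formant_spacing(formants_hz))


def schwa_deviations(formants_hz):
    """Relative deviation of each formant from its equally spaced (schwa) position."""
    spacing = formant_spacing(formants_hz)
    return [f / (spacing * (2 * i - 1) / 2) - 1
            for i, f in enumerate(formants_hz, start=1)]


# Hypothetical measurements: F1-F3 of a neutral vowel and of an [i]-like vowel.
neutral = [500.0, 1500.0, 2500.0]
i_like = [300.0, 2300.0, 3000.0]
print(estimate_vtl_cm(neutral), schwa_deviations(i_like))
```

The deviations are scale-free: multiplying every formant by the same factor changes the estimated length but leaves the deviations unchanged, which is the sense in which they describe vowel quality independently of speaker size.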
Subject(s)
Software, Humans, Phonetics, Speech/physiology, Speech Acoustics, Vocal Cords/physiology, Acoustics
ABSTRACT
Children with cochlear implants (CIs) communicate in noisy environments, such as classrooms, where multiple talkers and reverberation are present. Speakers compensate for noise via the 'Lombard effect'. The present study examined the Lombard effect on the intensity and duration of stressed vowels in the speech of children with CIs as compared to children with normal hearing (NH), focusing on the effects of speech-shaped noise (SSN) and speech-shaped noise with reverberation (SSN+Reverberation). The sample consisted of 7 children with CIs and 7 children with NH, aged 7-12 years. Regarding intensity, a) children with CIs produced stressed vowels with an overall greater intensity across acoustic conditions as compared to NH peers, b) both groups increased their stressed vowel intensity for all vowels from Quiet to both noise conditions, and c) children with NH further increased their intensity when reverberation was added to SSN, especially for the vowel /u/. Regarding duration, longer stressed vowels were produced by children with CIs as compared to NH in Quiet and SSN conditions, but the effect was retained only for the vowels /i/, /o/ and /u/ when reverberation was added to noise. The SSN+Reverberation condition induced systematic lengthening in stressed vowels for children with NH. Furthermore, although greater intensity and duration ratios of stressed/unstressed syllables were observed for children with NH as compared to CIs in the Quiet condition, they diminished with noise. The differences observed across groups have implications for speaking in classroom noise.
Subject(s)
Cochlear Implants, Noise, Speech Perception, Humans, Child, Male, Female, Phonetics, Speech Acoustics, Deafness/rehabilitation
ABSTRACT
OBJECTIVE: Word list-learning tasks are commonly used to evaluate auditory-verbal learning and memory. However, different frequencies of word usage, subtle meaning nuances, unique word phonology, and different preexisting associations among words make translation across languages difficult. We administered lists of consonant-vowel-consonant (CVC) nonword trigrams to independent American and Italian young adult samples. We evaluated whether an auditory list-learning task using CVC nonword trigrams instead of words could be applied cross-culturally to evaluate similar learning and associative memory processes. PARTICIPANTS AND METHODS: Seventy-five native English-speaking (USA) and 104 native Italian-speaking (Italy) university students were administered 15-item lists of CVC trigrams using the Rey Auditory Verbal Learning Test paradigm with five study-test trials, an interference trial, and short- and long-term delayed recall. Bayesian t tests and mixed-design ANOVAs contrasted the primary learning indexes across the two samples and biological sex. RESULTS: Performance was comparable between nationalities on all primary memory indices except the interference trial (List B), where the Italian group recalled approximately one item more than the American sample. For both nationalities, recall increased across the five learning trials and declined significantly on the postinterference trial, demonstrating susceptibility to retroactive interference. No effects of sex, age, vocabulary, or depressive symptoms were observed. CONCLUSIONS: Using lists of unfamiliar nonword CVC trigrams, Italian and American younger adults showed a similar performance pattern across immediate and delayed recall trials. Whereas word list-learning performance is typically affected by cultural, demographic, mood, and cognitive factors, this trigram list-learning task does not show such effects, demonstrating its utility for cross-cultural memory assessment.
Subject(s)
Cross-Cultural Comparison, Learning, Young Adult, Humans, Bayes Theorem, Memory, Verbal Learning, Mental Recall
ABSTRACT
Long-term rigorous musical training promotes various aspects of spoken language processing. However, it is unclear whether musical training provides an advantage in recognizing segmental and suprasegmental information of spoken language. We used vowel and tone violations in spoken unfamiliar seven-character quatrains and a rhyming judgment task to investigate the effects of musical training on tone and vowel processing by recording ERPs. Compared with non-musicians, musicians were more accurate and responded faster to incorrect than correct tones. Musicians showed larger P2 components in their ERPs than non-musicians during both tone and vowel processing, revealing increased focused attention on sounds. Both groups showed enhanced N400 and LPC for incorrect vowels (vs. correct vowels) but non-musicians showed an additional P2 effect for vowel violations. Moreover, both groups showed enhanced LPC for incorrect tones (vs. correct tones) but only non-musicians showed an additional N400 effect for tone violations. These results indicate that vowel/tone processing is less effortful for musicians (vs. non-musicians). Our study suggests that long-term musical training facilitates speech tone and vowel processing in a tonal language environment by increasing the attentional focus on speech and reducing demands for detecting incorrect vowels and integration costs for tone changes.
Subject(s)
Music, Speech Perception, Female, Humans, Male, Acoustic Stimulation, Electroencephalography, Evoked Potentials, Language, Poetry as Topic
ABSTRACT
Much previous research on spelling and reading development has focused on single-syllable words. Here we examined disyllables, asking how learners of English mark the distinction between short and long first-syllable vowels by use of vowel digraphs and double-consonant digraphs. In a behavioral study, we asked participants in Grade 2 (n = 32, mean age ~8 years), Grade 4 (n = 33, mean age ~10 years), Grade 6 (n = 32, mean age ~12 years), and university (n = 32; mean age ~20 years) to spell nonwords with short and long first-syllable vowels. We found an increase across grade levels in use of vowel digraphs to represent long vowels, and we also found increasing use of double-consonant digraphs after short vowels. Participants generally avoided using both a vowel digraph and a following consonant digraph. In a vocabulary analysis, we examined use of vowel and double-consonant digraphs in the words to which readers of different grade levels are exposed. Children used vowel digraphs less often than anticipated on the basis of the vocabulary statistics, but university students used them at similar rates. For double-consonant digraphs after short vowels, rates of digraph use were lower in the behavioral data than in the vocabulary data even for university students. These results point to the difficulty of spelling a phoneme with multiple letters when those letters simultaneously spell another sound in a word. We discuss the results in terms of the roles of statistical learning and explicit instruction in the development of spelling.
Subject(s)
Language, Phonetics, Child, Humans, Young Adult, Adult, Vocabulary, Learning, Reading
ABSTRACT
Perceiving and producing English phonemic vowel length contrasts is challenging for non-native speakers. According to multi-time resolution models, endogenous slow/fast rhythms contribute, respectively, in the right/left hemispheres, to long/short acoustic cue processing. This study introduced a perceptual training method implementing dichotic stimulation to improve /i:/-/ɪ/ processing by promoting hemispheric complementarity. Twenty non-dyslexic and 20 dyslexic French adults received 1 hr of training over 3 days. Productions were evaluated with pre-/post-tests. Training enhanced vowel duration contrast in word production by /i:/ lengthening and /ɪ/ shortening in both groups. Adults with dyslexia compensated /i:/ lengthening by /ɪ/ shortening to a lesser extent than did non-dyslexic adults. Transfer from perceptual training to production seems possible for foreign-language learning even in dyslexic adults. The extent to which dichotic presentation contributed to training effectiveness cannot be evaluated here, but the triggering of lengthening and shortening mechanisms suggests that lateralized complementary skills have been enhanced by dichotic stimulation.
Subject(s)
Dyslexia, Adult, Humans, Language, Learning
ABSTRACT
The vowel system of Catalan has been the focus of many studies, though work on the varieties spoken on the island of Eivissa (Ibiza) is scarce, with a single mention of the possible merger of the mid back vowels /o, ɔ/ (Torres Torres, Marià. 1983. Aspectes del vocalisme tònic eivissenc. Eivissa 14. 22-23). The present article provides the first acoustic analysis of the vowel inventory of 25 young native speakers of Eivissan Catalan, with a focus on the realisations of stressed /ɛ, ə/, and the back mid vowels /o, ɔ/. We employed Pillai scores (Hay, Jennifer, Paul Warren & Katie Drager. 2006. Factors influencing speech perception in the context of a merger-in-progress. Journal of Phonetics 34. 458-484) to compare the possibly merged pairs /ɛ, ə/ and /o, ɔ/ to the fully-contrasting neighbouring pairs /e, ɛ/ and /o, u/. Our results show that all participants had considerable overlap of stressed /ɛ/ and /ə/, and all but one had considerable overlap of the back mid vowels, while the fully contrastive pairs (/e, ɛ/ and /o, u/) showed almost no overlap.
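A Pillai score is the Pillai-Bartlett trace from a MANOVA. For two vowel categories measured on F1 and F2 it ranges from 0 (complete overlap) to 1 (complete separation). The sketch below computes the bare two-group statistic as trace(H (H + E)^-1), where H and E are the between-group and within-group sums of squares and cross-products. It assumes no covariates and uses hypothetical (F1, F2) values, so it may differ from the exact model fitted in the study.

```python
def _mean(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def _sscp(points, center):
    """Sums of squares and cross-products about `center`: (sxx, sxy, syy)."""
    sxx = sum((x - center[0]) ** 2 for x, _ in points)
    sxy = sum((x - center[0]) * (y - center[1]) for x, y in points)
    syy = sum((y - center[1]) ** 2 for _, y in points)
    return sxx, sxy, syy


def pillai_score(group_a, group_b):
    """Pillai trace for two groups of (F1, F2) points; 0 = overlap, 1 = separation."""
    pooled = list(group_a) + list(group_b)
    # Total SSCP matrix T equals H + E.
    txx, txy, tyy = _sscp(pooled, _mean(pooled))
    # Within-group SSCP matrix E, summed over both groups.
    exx = exy = eyy = 0.0
    for group in (group_a, group_b):
        sxx, sxy, syy = _sscp(group, _mean(group))
        exx, exy, eyy = exx + sxx, exy + sxy, eyy + syy
    # Between-group SSCP matrix H = T - E.
    hxx, hxy, hyy = txx - exx, txy - exy, tyy - eyy
    # trace(H @ inverse(T)) written out for symmetric 2x2 matrices.
    det = txx * tyy - txy ** 2
    return (hxx * tyy - 2 * hxy * txy + hyy * txx) / det


# Hypothetical (F1, F2) tokens in Hz for two clearly distinct vowel categories.
vowel_1 = [(400.0, 800.0), (420.0, 850.0), (410.0, 820.0)]
vowel_2 = [(600.0, 1000.0), (620.0, 1050.0), (610.0, 1020.0)]
print(pillai_score(vowel_1, vowel_2))
```

Scores near 0 for a candidate pair alongside scores near 1 for a known contrastive pair are what a merger analysis of this kind looks for.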
Subject(s)
Speech Acoustics, Speech Perception, Humans, Language, Acoustics, Phonetics
ABSTRACT
This study examines how variation in F0 and intensity impacts the perception of American English vowels. Both properties vary intrinsically as a function of vowel features in the speech production literature, raising the question of the perceptual impact of each. In addition to considering listeners' interpretation of either cue as an intrinsic property of the vowel, the possible prominence-marking function of each is considered. Two patterns of prominence strengthening in vowels, sonority expansion and hyperarticulation, are tested in light of recent findings that contextual prominence impacts vowel perception in line with these effects (i.e. a prominent vowel is expected by listeners to be realized as if it had undergone prominence strengthening). Across four vowel contrasts with different height and frontness features, listeners categorized phonetic continua with variation in formants, F0 and intensity. Results show that variation in level F0 height is interpreted as an intrinsic cue by listeners. Higher F0 cues a higher vowel, following intrinsic F0 effects in the production literature. In comparison, intensity is interpreted as a prominence-lending cue, for which effect directionality is dependent on vowel height. Higher intensity high vowels undergo perceptual re-calibration in line with (acoustic) hyperarticulation, whereas higher intensity non-high vowels undergo perceptual re-calibration in line with sonority expansion.
Subject(s)
Cues, Speech Perception, Humans, Language, Speech, Phonetics, Speech Acoustics
ABSTRACT
The diglossic situation in German-speaking Switzerland entails that both an Alemannic dialect and a Swiss standard variety of German are spoken. One phonological property of both Alemannic and Swiss Standard German (SSG) is contrastive quantity not only in vowels but also in consonants, namely lenis and fortis. This study aims to compare vowel and plosive closure durations as well as articulation rate (AR) between Alemannic and SSG in the varieties spoken in a rural area of the canton of Lucerne (LU) and an urban area of the canton of Zurich (ZH). In addition to the segment durations, a further measure, the ratio of vowel duration to vowel + consonant duration (V/(V + C)), is calculated in order to account for possible compensation between vowel and closure durations. Stimuli consisted of words containing different vowel-consonant (VC) combinations. The main differences found are longer segment durations in Alemannic compared to SSG, three phonetic vowel categories in Alemannic that differ between LU and ZH, three stable V/(V + C) ratio categories, and three phonetic consonant categories (lenis, fortis, and extrafortis) in both Alemannic and SSG. Most importantly, younger ZH speakers produced overall shorter closure durations, calling into question a possible reduction of consonant categories due to contact with German Standard German (GSG).
Subject(s)
Language, Phonetics, Humans, Switzerland, Time Factors, Medicago sativa
ABSTRACT
This study investigated attention control in L2 phonological processing from a cognitive individual differences perspective, to determine its role in predicting phonological acquisition in adult L2 learning. Participants were 21 L1-Spanish learners of English, and 19 L1-English learners of Spanish. Attention control was measured through a novel speech-based attention-switching task. Phonological processing was assessed through a speeded ABX categorization task (perception) and a delayed sentence repetition task (production). Correlational analyses indicated that learners with more efficient attention switching skill and faster speed in correctly identifying the target phonetic features in the speech dimension under focus could perceptually discriminate L2 vowels at higher processing speed, but not at higher accuracy rates. Thus, attentional flexibility provided a processing advantage for difficult L2 contrasts but did not predict the extent to which precise representations for the target L2 vowels had been established. However, attention control was related to L2 learners' ability to distinguish the contrasting L2 vowels in production. In addition, L2 learners' accuracy in perceptually distinguishing between two contrasting vowels was significantly related to how much of a quality distinction between them they could make in production.
Subject(s)
Multilingualism, Speech Perception, Adult, Humans, Individuality, Language, Phonetics, Attention
ABSTRACT
The aim of this study was to investigate the acoustic vowel space area in infant directed speech (IDS). The research question is whether the vowel space is expanded or remains constant in IDS. A corpus of spontaneous interactions of 9 dyads followed monthly from the age of 6 to 24 months was analyzed. The occurrences in the parents' speech of each word that the children eventually acquired were extracted. The surface of the vowel triangle and the convex hull of all vowels were computed. The main result is that the development of the vowel space in IDS follows an inverted U-shaped curve: the vowel space starts relatively small, gradually increases as the child's first word use approaches, and decreases again afterwards. These findings show that parents adapt their articulation to the evolving linguistic abilities of their child, and this adaptation can be detected at the level of individual lexical items.
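The two area measures named in this abstract, the surface of the vowel triangle and the convex hull of all vowels, can be computed with the shoelace formula and a standard convex hull routine. The sketch below uses Andrew's monotone chain algorithm and hypothetical (F1, F2) values in Hz. It illustrates the geometry only and is not the authors' pipeline, which may have used other units or vowel sets.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain).

    Points must be tuples so that duplicates can be removed with set().
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical mean (F1, F2) values in Hz for the corner vowels /i/, /a/, /u/.
corners = [(300.0, 2300.0), (750.0, 1300.0), (320.0, 800.0)]
# Hypothetical individual vowel tokens, including the corners.
tokens = corners + [(450.0, 1500.0), (500.0, 1900.0), (280.0, 2400.0)]

triangle_area = polygon_area(corners)
hull_area = polygon_area(convex_hull(tokens))
print(triangle_area, hull_area)
```

The triangle uses only the corner-vowel means, whereas the hull responds to every token, so the two measures can diverge when a few tokens are extreme.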
Subject(s)
Language Development, Speech Perception, Infant, Humans, Child, Preschool Child, Phonetics, Child Language, Speech, Parents, Speech Acoustics
ABSTRACT
Impaired speech sound production adds difficulties to social communication in children with Autism Spectrum Disorder (ASD), yet few attempts have been made to characterize speech sound production among Mandarin-speaking children with ASD. The current study conducted both auditory-perceptual scoring and quantitative acoustic analysis of speech sounds imitated by 27 Mandarin-speaking children with ASD (3.33-7.00 years) and 30 chronological-age-matched typically developing (TD) children. Auditory-perceptual scoring showed significantly lower scores for aspirated/unaspirated consonants and monophthongs in children with ASD. Moreover, the correlation between the developmental age of language and production accuracy in children with ASD emphasised the importance of language assessment. The quantitative acoustic analysis further indicated that the ASD group produced a much shorter voice onset time for aspirated consonants and showed a smaller vowel space than the TD group. Early interventions focusing on these production patterns should be introduced to improve speech sound production in Mandarin-speaking children with ASD.
ABSTRACT
OBJECTIVE: This study aimed to systematically review and critically appraise the literature describing the phonetic characteristics and accuracy of the consonants, vowels and tones produced by Mandarin-speaking children with cochlear implants (CIs). DESIGN: The protocol in this review was designed in conformity with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. EBSCOhost, PubMed, Scopus, PsycINFO, ProQuest Central databases were searched for relevant articles which met the inclusion criteria. STUDY SAMPLE: A total of 18 journal papers were included in this review. RESULTS: The results revealed that Mandarin-speaking children with CIs perform consistently more poorly in their production of consonants, in particular on fricatives, have a smaller and less well-defined vowel space, and exhibit greater difficulties in tone realisation, notably T2 and T3, when compared to their normal-hearing (NH) peers. The results from acoustic and accuracy analyses are negatively correlated with CI implantation age, but largely positively correlated with hearing age. CONCLUSIONS: Findings of this review highlight the factors that influence consonant, vowel and tone production in Mandarin-speaking children with CIs, thereby providing critical information for clinicians and researchers working with this population.
Subject(s)
Cochlear Implantation, Cochlear Implants, Deafness, Speech Perception, Child, Cochlear Implantation/methods, Deafness/surgery, Humans, Phonetics, Speech
ABSTRACT
BACKGROUND: Previous research has found that high-frequency energy of speech signals decreased while wearing face masks. However, no study has examined the specific spectral characteristics of fricative consonants and vowels and the perception of clarity of speech in mask wearing. AIMS: To investigate acoustic-phonetic characteristics of fricative consonants and vowels and auditory perceptual rating of clarity of speech produced with and without wearing a face mask. METHODS & PROCEDURES: A total of 16 healthcare workers read the Rainbow Passage using modal phonation in three conditions: without a face mask, with a standard surgical mask and with a KN95 mask (China GB2626-2006, a medical respirator with higher barrier level than the standard surgical mask). Speech samples were acoustically analysed for root mean square (RMS) amplitude (ARMS) and spectral moments of four fricatives /f/, /s/, /ʃ/ and /z/; and amplitude of the first three formants (A1, A2 and A3) measured from the reading passage and extracted vowels. Auditory perception of speech clarity was performed. Data were compared across mask and non-mask conditions using linear mixed models. OUTCOMES & RESULTS: The ARMS of all included fricatives was significantly lower in surgical mask and KN95 mask compared with non-mask condition. Centre of gravity of /f/ decreased in both surgical and KN95 mask while other spectral moments did not show systematic significant linear trends across mask conditions. None of the formant amplitude measures was statistically different across conditions. Speech clarity was significantly poorer in both surgical and KN95 mask conditions. CONCLUSIONS & IMPLICATIONS: Speech produced while wearing either a surgical mask or KN95 mask was associated with decreased fricative amplitude and poorer speech clarity.
WHAT THIS PAPER ADDS: What is already known on the subject: Previous studies have shown that the overall spectral levels in high frequency ranges and intelligibility are decreased for speech produced with a face mask. It is unclear how different types of speech signals (that is, fricatives and vowels) are presented in speech produced while wearing either a surgical or KN95 mask. It is also unclear whether ratings of speech clarity are similar for speech produced with these face masks. What this paper adds to existing knowledge: Speech data collected in real-world, clinical, non-laboratory-controlled settings showed differences in the amplitude of fricatives and speech clarity ratings between non-mask and mask-wearing conditions. Formant amplitude did not show significant differences in mask-wearing conditions compared with non-mask. What are the potential or actual clinical implications of this work? Wearing a surgical mask or a KN95 mask had different effects on consonants and vowels. It appeared from the findings in this study that these masks only affected fricative consonants and did not affect vowel production. The poorer speech clarity in these mask-wearing conditions has important implications for speech perception in communication between clinical staff, between medical officers and patients in clinics, and between people in everyday situations. The impact of these masks on speech perception may be more pronounced in people with hearing impairment and communication disorders. In voice evaluation and/or therapy sessions, the effects of wearing a medical mask can occur bidirectionally for both the clinician and the patient. The patient may find it more challenging to understand the speech conveyed by the clinician, while the clinician may not perceptually assess the patient's speech and voice accurately.
Given the significant correlation between clarity ratings and fricative amplitude, improving fricative signals would be useful to improve speech clarity while wearing these medical face masks.
Subject(s)
Speech Perception, Speech, Acoustics, Humans, Phonetics, Speech Acoustics, Speech Disorders
ABSTRACT
The current study extends traditional perceptual high-variability phonetic training (HVPT) in a foreign language learning context by implementing a comprehensive training paradigm that combines perception (discrimination and identification) and production (immediate repetition) training tasks and by exploring two potentially enhancing training conditions: the use of non-lexical training stimuli and the presence of masking noise during production training. We assessed training effects on L1-Spanish/Catalan bilingual EFL learners' production of a difficult English vowel contrast (/æ/-/ʌ/). The participants (N = 62) were randomly assigned to either non-lexical (N = 24) or lexical (N = 24) training and were further subdivided into two groups, one trained in noise (N = 12) and one in silence (N = 12). An untrained control group (N = 14) was also tested. Training gains, measured through spectral distance scores (Euclidean distances) with respect to native speakers' productions of /æ/ and /ʌ/, were assessed through delayed word and sentence repetition tasks. The results showed an advantage of non-lexical training over lexical training, detrimental effects of noise for participants trained with nonwords, but not for those trained with words, and less accurate production of vowels elicited in isolated words than in words embedded in sentences, where training gains were only observable for participants trained with nonwords.