Results 1 - 20 of 75
1.
Cognition ; 245: 105734, 2024 04.
Article in English | MEDLINE | ID: mdl-38335906

ABSTRACT

Infants learn their native language(s) at an amazing speed. Before they even talk, their perception adapts to the language(s) they hear. However, the mechanisms responsible for this perceptual attunement and the circumstances in which it takes place remain unclear. This paper presents the first attempt to study perceptual attunement using ecological child-centered audio data. We show that a simple prediction algorithm exhibits perceptual attunement when applied to unrealistic, clean audio-book data, but fails to do so when applied to ecologically valid child-centered data. In the latter scenario, perceptual attunement only emerges when the prediction mechanism is supplemented with inductive biases that force the algorithm to focus exclusively on speech segments while learning speaker-, pitch-, and room-invariant representations. We argue these biases are plausible given previous research on infants and non-human animals. More generally, we show that what our model learns, and how it develops through exposure to speech, depends exquisitely on the details of the input signal. In doing so, we illustrate the importance of considering ecologically valid input data when modeling language acquisition.


Subject(s)
Phonetics , Speech Perception , Humans , Infant , Language Development , Learning , Language
2.
J Child Lang ; 50(6): 1294-1317, 2023 11.
Article in English | MEDLINE | ID: mdl-37246513

ABSTRACT

There is a current 'theory crisis' in language acquisition research, resulting from fragmentation both at the level of theoretical approaches and at the level of the linguistic phenomena studied. We identify a need for integrative approaches that go beyond these limitations, and analyse the strengths and weaknesses of current theoretical approaches to language acquisition. In particular, we argue that language learning simulations, if they integrate realistic input and multiple levels of language, have the potential to contribute significantly to our understanding of language acquisition. We then review recent results obtained through such language learning simulations. Finally, we propose some guidelines for the community to build better simulations.


Subject(s)
Language Development , Learning , Humans , Language , Linguistics
3.
Behav Res Methods ; 55(8): 4489-4501, 2023 12.
Article in English | MEDLINE | ID: mdl-36750521

ABSTRACT

We introduce Shennong, a Python toolbox and command-line utility for extracting speech features from audio. It implements a wide range of well-established, state-of-the-art algorithms: spectro-temporal filters such as Mel-Frequency Cepstral Filterbank or Predictive Linear Filters, pre-trained neural networks, pitch estimators, speaker normalization methods, and post-processing algorithms. Shennong is an open-source, reliable and extensible framework built on top of the popular Kaldi speech processing library. The Python implementation makes it easy to use by non-technical users and integrates with third-party speech modeling and machine learning tools from the Python ecosystem. This paper describes the Shennong software architecture, its core components, and the implemented algorithms. Three applications then illustrate its use. We first present a benchmark of the speech feature extraction algorithms available in Shennong on a phone discrimination task. We then analyze the performance of a speaker normalization model as a function of the amount of speech used for training. We finally compare pitch estimation algorithms on speech under various noise conditions.
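To make the workflow concrete, here is a minimal extraction sketch in the spirit of the toolbox's documented API; the module paths and class names (shennong.audio.Audio, MfccProcessor) follow the published documentation but may differ across releases, so treat them as assumptions to check against the installed version.

```python
# Hedged usage sketch: names follow the Shennong documentation but should be
# verified against the installed release.
from shennong.audio import Audio
from shennong.processor.mfcc import MfccProcessor

# Load a wav file and extract frame-level MFCC features.
audio = Audio.load('speech.wav')
processor = MfccProcessor(sample_rate=audio.sample_rate)
features = processor.process(audio)

print(features.data.shape)  # (n_frames, n_coefficients)
print(features.times)       # center time of each analysis frame
```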


Subject(s)
Ecosystem , Speech , Humans , Algorithms , Software , Neural Networks, Computer
4.
Cortex ; 155: 150-161, 2022 10.
Article in English | MEDLINE | ID: mdl-35986957

ABSTRACT

Patients with Huntington's disease suffer from disturbances in the perception of emotions; they do not correctly read the bodily, vocal, and facial expressions of others. With regard to the expression of emotions, it has been shown that they are impaired in expressing emotions through the face, but until now little research had been conducted on their ability to express emotions through spoken language. To better understand emotion production in both voice and language in Huntington's disease (HD), we tested 115 individuals in a single-centre prospective observational follow-up study: 68 patients (HD), 22 participants carrying the mutant HD gene without any motor symptoms (pre-manifest HD), and 25 controls. Participants were recorded in interviews in which they were asked to recall sad, angry, happy, and neutral stories. Emotion expression through voice and language was investigated by comparing how identifiable the emotions expressed by controls, preHD participants and HD patients were in these interviews. To assess vocal and linguistic expression of emotions separately and in a blind design, we used machine learning models instead of a human jury performing a forced-choice recognition test. Results showed that patients with HD had difficulty expressing emotions through both voice and language compared to preHD participants and controls, who behaved similarly and above chance. In addition, we did not find any differences in the expression of emotions between preHD participants and healthy controls. We further validated our newly proposed methodology with a human jury on the speech produced by the controls. These results are consistent with the hypothesis that emotional deficits in HD are caused by impaired sensorimotor representations of emotions, in line with embodied cognition theories. This study also shows how machine learning models can be leveraged to assess emotion expression in a blind and reproducible way.


Subject(s)
Huntington Disease , Emotions , Facial Expression , Follow-Up Studies , Humans , Huntington Disease/psychology , Language
5.
J Neurol ; 269(9): 5008-5021, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35567614

ABSTRACT

OBJECTIVES: Using brief samples of speech recordings, we aimed to predict, through machine learning, clinical performance in Huntington's disease (HD), an inherited neurodegenerative disease (NDD). METHODS: We collected and analyzed 126 samples of audio recordings of both forward and backward counting from 103 Huntington's disease gene carriers [87 manifest and 16 premanifest; mean age 50.6 (SD 11.2), range 27-88 years] from three multicenter prospective studies in France and Belgium (MIG-HD (ClinicalTrials.gov NCT00190450), BIO-HD (ClinicalTrials.gov NCT00190450) and Repair-HD (ClinicalTrials.gov NCT00190450)). We pre-registered all of our methods before running any analyses, in order to avoid inflated results. We automatically extracted 60 speech features from blindly annotated samples and used machine learning models to combine multiple speech features into individual-level predictions of the clinical markers. We trained the models on 86% of the samples; the remaining 14% constituted the independent test set. We combined speech features with demographic variables (age, sex, CAG repeats, and burden score) to predict the cognitive, motor, and functional scores of the Unified Huntington's Disease Rating Scale. We also report correlations between speech variables and striatal volumes. RESULTS: Speech features combined with demographics allowed prediction of the individual cognitive, motor, and functional scores with a relative error of 12.7 to 20.0%, better than predictions based on demographic and genetic information alone. Both the mean and the standard deviation of pause durations during backward recitation, as well as the clinical scores, correlated with striatal atrophy (Spearman 0.6 and 0.5-0.6, respectively). INTERPRETATION: Brief, examiner-free speech recording and analysis may in the future become an efficient method for remote evaluation of individual condition in HD, and likely in other NDDs.
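As an illustration of the prediction setup described above (a held-out test set and a relative-error criterion), the sketch below uses placeholder data and an off-the-shelf regressor; it is not the pre-registered pipeline, and the feature and score variables are stand-ins.

```python
# Illustrative sketch only: held-out regression with a relative-error metric,
# mirroring the setup in the abstract. Features, targets, and the model choice
# are placeholders, not the pre-registered analysis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_speech_features = 126, 60

X_speech = rng.normal(size=(n_samples, n_speech_features))  # 60 speech features
X_demo = rng.normal(size=(n_samples, 4))                     # age, sex, CAG repeats, burden score
X = np.hstack([X_speech, X_demo])
y = rng.uniform(10, 100, size=n_samples)                     # stand-in clinical score

# Train on 86% of the samples; the remaining 14% form the independent test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.14, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Relative error of the predictions on the held-out samples.
rel_error = np.mean(np.abs(model.predict(X_te) - y_te) / np.abs(y_te))
print(f"relative error: {rel_error:.1%}")
```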


Subject(s)
Huntington Disease , Neurodegenerative Diseases , Corpus Striatum , Humans , Huntington Disease/diagnosis , Huntington Disease/genetics , Middle Aged , Prospective Studies , Speech
6.
Cognition ; 219: 104961, 2022 02.
Article in English | MEDLINE | ID: mdl-34856424

ABSTRACT

Infants come to learn several hundred word forms by two years of age, and it is possible this involves carving these forms out from continuous speech. It has been proposed that the task is facilitated by the presence of prosodic boundaries. We revisit this claim by running computational models of word segmentation, with and without prosodic information, on a corpus of infant-directed speech. We use five cognitively based algorithms, which vary in whether they employ a sub-lexical or a lexical segmentation strategy and in whether they are simple heuristics or embody an ideal learner. Results show that providing expert-annotated prosodic breaks does not uniformly help all segmentation models. The sub-lexical algorithms, which perform more poorly, benefit most, while the lexical ones show only a very small gain. Moreover, when prosodic information is derived automatically from the acoustic cues infants are known to be sensitive to, errors in boundary detection lead to smaller positive effects, and even negative ones for some algorithms. This shows that even though infants could potentially use prosodic breaks, it does not necessarily follow that they should incorporate prosody into their segmentation strategies when confronted with realistic signals.
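For readers unfamiliar with the sub-lexical strategies mentioned above, the toy sketch below segments syllable sequences at local dips in transitional probability; it is a deliberately simplified illustration of that family of heuristics, not one of the five algorithms evaluated in the study.

```python
# Toy illustration of a sub-lexical segmentation heuristic: compute transitional
# probabilities between adjacent syllables and posit a word boundary at every
# local minimum. A simplification for illustration, not the paper's models.
from collections import Counter

def tp_segment(utterances):
    """utterances: list of syllable lists, e.g. [['ba', 'by', 'wants', 'it'], ...]"""
    bigrams, unigrams = Counter(), Counter()
    for utt in utterances:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    tp = lambda a, b: bigrams[(a, b)] / unigrams[a]

    segmented = []
    for utt in utterances:
        probs = [tp(a, b) for a, b in zip(utt, utt[1:])]
        # A boundary before syllable i+1 whenever the TP at position i is a local minimum.
        boundaries = {i + 1 for i in range(1, len(probs) - 1)
                      if probs[i] < probs[i - 1] and probs[i] < probs[i + 1]}
        words, word = [], [utt[0]]
        for i, syll in enumerate(utt[1:], start=1):
            if i in boundaries:
                words.append(''.join(word))
                word = []
            word.append(syll)
        words.append(''.join(word))
        segmented.append(words)
    return segmented
```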


Subject(s)
Speech Perception , Speech , Computer Simulation , Cues , Humans , Infant , Learning , Speech Acoustics
7.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5016-5025, 2022 09.
Article in English | MEDLINE | ID: mdl-34038357

ABSTRACT

In order to reach human performance on complex visual tasks, artificial systems need to incorporate a significant amount of understanding of the world in terms of macroscopic objects, movements, forces, etc. Inspired by work on intuitive physics in infants, we propose an evaluation benchmark which diagnoses how much a given system understands about physics by testing whether it can tell apart well-matched videos of possible versus impossible events constructed with a game engine. The test requires systems to compute a physical plausibility score over an entire video. To prevent perceptual biases, the dataset is made of pixel-matched quadruplets of videos, forcing systems to focus on high-level temporal dependencies between frames rather than pixel-level details. We then describe two deep neural network systems aimed at learning intuitive physics in an unsupervised way, using only physically possible videos. The systems are trained with a future semantic mask prediction objective and tested on the possible versus impossible discrimination task. The analysis of their results compared to human data gives novel insights into the potential and limitations of next-frame prediction architectures.
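One way to read the evaluation protocol is as a relative comparison within each quadruplet: a system is credited when its plausibility scores rank both possible videos above both impossible ones. The sketch below implements that reading with a placeholder scoring function; the benchmark's official metrics may aggregate scores differently.

```python
# Sketch of a relative scoring rule over pixel-matched quadruplets: credit the
# system when every possible video gets a higher plausibility score than every
# impossible one. `plausibility` is a placeholder for the model under test.
from typing import Callable, Sequence

def relative_accuracy(quadruplets: Sequence[dict],
                      plausibility: Callable[[object], float]) -> float:
    correct = 0
    for quad in quadruplets:
        pos = [plausibility(v) for v in quad['possible']]    # two possible videos
        imp = [plausibility(v) for v in quad['impossible']]  # two impossible videos
        if min(pos) > max(imp):
            correct += 1
    return correct / len(quadruplets)
```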


Subject(s)
Algorithms , Benchmarking , Humans , Learning , Neural Networks, Computer , Physics
8.
J Acoust Soc Am ; 150(1): 353, 2021 07.
Article in English | MEDLINE | ID: mdl-34340514

ABSTRACT

Deep learning models have become potential candidates for auditory neuroscience research thanks to their recent successes on a variety of auditory tasks, yet these models often lack interpretability, making it hard to understand exactly what computations they perform. Here, we proposed a parametrized neural network layer that computes specific spectro-temporal modulations based on Gabor filters [learnable spectro-temporal receptive fields (STRFs)] and is fully interpretable. We evaluated this layer on speech activity detection, speaker verification, urban sound classification, and zebra finch call type classification. We found that models based on learnable STRFs are on par with the state of the art on all tasks and obtain the best performance for speech activity detection. As this layer remains a Gabor filter, it is fully interpretable, and we used quantitative measures to describe the distribution of the learned spectro-temporal modulations. The filters adapted to each task and focused mostly on low temporal and spectral modulations. The analyses show that the filters learned on human speech have spectro-temporal parameters similar to those measured directly in the human auditory cortex. Finally, we observed that the tasks organized themselves in a meaningful way: the human vocalization tasks lay close to each other, while bird vocalizations lay far from both human vocalizations and the urban sound task.
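Since the layer is built from two-dimensional Gabor functions over the time-frequency plane, its behaviour can be illustrated with a few lines of NumPy; the parameter values below are arbitrary illustrations, and in the learnable layer the modulation rates and envelope widths would be trained rather than fixed.

```python
# Sketch of a 2-D spectro-temporal Gabor filter: a Gaussian envelope times a
# plane wave. In a learnable STRF layer the modulation rates and widths are
# trained parameters; the values here are purely illustrative.
import numpy as np

def gabor_strf(temporal_mod, spectral_mod, sigma_t=0.05, sigma_f=2.0,
               n_frames=64, n_channels=64, frame_step=0.01):
    """Return an (n_channels, n_frames) real-valued Gabor filter.

    temporal_mod: temporal modulation rate in Hz
    spectral_mod: spectral modulation rate in cycles per channel
    """
    t = (np.arange(n_frames) - n_frames // 2) * frame_step   # seconds
    f = np.arange(n_channels) - n_channels // 2               # channel index
    T, F = np.meshgrid(t, f)
    envelope = np.exp(-(T ** 2) / (2 * sigma_t ** 2)
                      - (F ** 2) / (2 * sigma_f ** 2))
    carrier = np.cos(2 * np.pi * (temporal_mod * T + spectral_mod * F))
    return envelope * carrier

strf = gabor_strf(temporal_mod=4.0, spectral_mod=0.25)
print(strf.shape)  # (64, 64); convolved with a spectrogram in the actual layer
```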


Subject(s)
Auditory Cortex , Speech Perception , Acoustic Stimulation , Auditory Perception , Neural Networks, Computer
9.
Cognition ; 213: 104779, 2021 08.
Article in English | MEDLINE | ID: mdl-34092384

ABSTRACT

Theories and data on language acquisition suggest a range of cues are used, ranging from information on structure found in the linguistic signal itself, to information gleaned from the environmental context or through social interaction. We propose a blueprint for computational models of the early language learner (SCALa, for Socio-Computational Architecture of Language Acquisition) that makes explicit the connection between the kinds of information available to the social learner and the computational mechanisms required to extract language-relevant information and learn from it. SCALa integrates a range of views on language acquisition, further allowing us to make precise recommendations for future large-scale empirical research.


Subject(s)
Language Development , Language , Computer Simulation , Humans , Linguistics , Social Environment
10.
Cogn Sci ; 45(5): e12946, 2021 05.
Article in English | MEDLINE | ID: mdl-34018231

ABSTRACT

A prominent hypothesis holds that by speaking to infants in infant-directed speech (IDS) as opposed to adult-directed speech (ADS), parents help them learn phonetic categories. Specifically, two characteristics of IDS have been claimed to facilitate learning: hyperarticulation, which makes the categories more separable, and variability, which makes the generalization more robust. Here, we test the separability and robustness of vowel category learning on acoustic representations of speech uttered by Japanese adults in ADS, IDS (addressed to 18- to 24-month olds), or read speech (RS). Separability is determined by means of a distance measure computed between the five short vowel categories of Japanese, while robustness is assessed by testing the ability of six different machine learning algorithms trained to classify vowels to generalize on stimuli spoken by a novel speaker in ADS. Using two different speech representations, we find that hyperarticulated speech, in the case of RS, can yield better separability, and that increased between-speaker variability in ADS can yield, for some algorithms, more robust categories. However, these conclusions do not apply to IDS, which turned out to yield neither more separable nor more robust categories compared to ADS inputs. We discuss the usefulness of machine learning algorithms run on real data to test hypotheses about the functional role of IDS.
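The "robustness" test described above amounts to training a classifier on vowel tokens from one register and measuring how well it generalizes to a held-out speaker recorded in ADS; the sketch below shows that logic with random placeholder arrays rather than the actual acoustic representations or the six algorithms compared in the study.

```python
# Schematic of the robustness test: train on vowel tokens from one register
# (ADS, IDS, or RS) and test generalization to a novel speaker's ADS tokens.
# The arrays below are random placeholders, not real acoustic features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, n_dims = 500, 100, 13

X_train = rng.normal(size=(n_train, n_dims))   # tokens from the training register
y_train = rng.integers(0, 5, size=n_train)     # the five Japanese short vowels
X_test = rng.normal(size=(n_test, n_dims))     # novel speaker, ADS
y_test = rng.integers(0, 5, size=n_test)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("generalization accuracy:", clf.score(X_test, y_test))
```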


Subject(s)
Speech Perception , Speech , Adult , Humans , Infant , Machine Learning , Phonetics , Reading
11.
Proc Natl Acad Sci U S A ; 118(12)2021 03 23.
Article in English | MEDLINE | ID: mdl-33723064

ABSTRACT

Words categorize the semantic fields they refer to in ways that maximize communication accuracy while minimizing complexity. Focusing on the well-studied color domain, we show that artificial neural networks trained with deep-learning techniques to play a discrimination game develop communication systems whose distribution on the accuracy/complexity plane closely matches that of human languages. The observed variation among emergent color-naming systems is explained by different degrees of discriminative need, of the sort that might also characterize different human communities. Like human languages, emergent systems show a preference for relatively low-complexity solutions, even at the cost of imperfect communication. We demonstrate next that the nature of the emergent systems crucially depends on communication being discrete (as is human word usage). When continuous message passing is allowed, emergent systems become more complex and eventually less efficient. Our study suggests that efficient semantic categorization is a general property of discrete communication systems, not limited to human language. It suggests moreover that it is exactly the discrete nature of such systems that, acting as a bottleneck, pushes them toward low complexity and optimal efficiency.

12.
Proc Natl Acad Sci U S A ; 118(7)2021 02 09.
Article in English | MEDLINE | ID: mdl-33510040

ABSTRACT

Before they even speak, infants become attuned to the sounds of the language(s) they hear, processing native phonetic contrasts more easily than nonnative ones. For example, between 6 to 8 mo and 10 to 12 mo, infants learning American English get better at distinguishing English [ɹ] and [l], as in "rock" vs. "lock," relative to infants learning Japanese. Influential accounts of this early phonetic learning phenomenon initially proposed that infants group sounds into native vowel- and consonant-like phonetic categories (like [ɹ] and [l] in English) through a statistical clustering mechanism dubbed "distributional learning." The feasibility of this mechanism for learning phonetic categories has been challenged, however. Here, we demonstrate that a distributional learning algorithm operating on naturalistic speech can predict early phonetic learning, as observed in Japanese and American English infants, suggesting that infants might learn through distributional learning after all. We further show, however, that, contrary to the original distributional learning proposal, our model learns units too brief and too fine-grained acoustically to correspond to phonetic categories. This challenges the influential idea that what infants learn are phonetic categories. More broadly, our work introduces a mechanism-driven approach to the study of early phonetic learning, together with a quantitative modeling framework that can handle realistic input. This allows accounts of early phonetic learning to be linked to concrete, systematic predictions regarding infants' attunement.
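As a caricature of the distributional learning mechanism at issue, the sketch below clusters acoustic frames with a Dirichlet-process-style Gaussian mixture and reports how many units survive; the model in the study is a more elaborate nonparametric clustering system trained on naturalistic speech, and the data here are random stand-ins.

```python
# Toy sketch of distributional learning as unsupervised clustering of speech
# frames: a Dirichlet-process-style Gaussian mixture discovers "units" from the
# distribution of frames alone, with no labels. Random data stand in for MFCCs.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 13))            # stand-in for MFCC frames

dpgmm = BayesianGaussianMixture(
    n_components=50,                             # upper bound; unused components shrink
    weight_concentration_prior_type='dirichlet_process',
    max_iter=200,
    random_state=0,
).fit(frames)

units = dpgmm.predict(frames)                    # discovered unit label per frame
print("effective number of units:", int(np.sum(dpgmm.weights_ > 1e-3)))
```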


Subject(s)
Language Development , Models, Neurological , Natural Language Processing , Phonetics , Humans , Speech Perception , Speech Recognition Software
13.
JASA Express Lett ; 1(11): 115203, 2021 11.
Article in English | MEDLINE | ID: mdl-36154027

ABSTRACT

This study aims to quantify the effect of several information sources (acoustic, higher-level linguistic, and knowledge of the prosodic system of the language) on the perception of prosodic boundaries. An experiment investigating the identification of prosodic boundaries in Japanese was conducted with native and non-native participants. It revealed that non-native speakers, as well as native speakers with access only to acoustic information, can recognize boundaries better than chance. However, good boundary identification requires knowledge of both the prosodic system and higher-level linguistic information, each of which is of similar or greater importance than acoustic information.


Subject(s)
Language , Perception , Humans
14.
Open Mind (Camb) ; 5: 113-131, 2021.
Article in English | MEDLINE | ID: mdl-35024527

ABSTRACT

Early changes in infants' ability to perceive native and nonnative speech sound contrasts are typically attributed to their developing knowledge of phonetic categories. We critically examine this hypothesis and argue that there is little direct evidence of category knowledge in infancy. We then propose an alternative account in which infants' perception changes because they are learning a perceptual space that is appropriate to represent speech, without yet carving up that space into phonetic categories. If correct, this new account has substantial implications for understanding early language development.

15.
Behav Res Methods ; 52(1): 264-278, 2020 02.
Article in English | MEDLINE | ID: mdl-30937845

ABSTRACT

A basic task in first language acquisition likely involves discovering the boundaries between words or morphemes in input where these basic units are not overtly segmented. A number of unsupervised learning algorithms have been proposed over the last 20 years for this purpose; some have been implemented computationally, but their results remain difficult to compare across papers. We created a tool that is open source, enables reproducible results, and encourages cumulative science in this domain. WordSeg has a modular architecture: it combines a set of corpus description routines, multiple algorithms varying in complexity and cognitive assumptions (including several that were not publicly available, or were insufficiently documented), and a rich evaluation package. In this paper, we illustrate the use of the package by analyzing a corpus of child-directed speech in various ways, which further allows us to make recommendations for the experimental design of follow-up work. Supplementary materials allow readers to reproduce every result in this paper, and detailed online instructions further enable them to go beyond what we have done. Moreover, the system can be installed within container software that ensures a stable and reliable environment. Finally, by virtue of its modular architecture and transparency, WordSeg can work as an open-source platform to which other researchers can add their own segmentation algorithms.
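A minimal end-to-end run looks roughly like the sketch below; the module and function names (wordseg.prepare, wordseg.algos.tp, wordseg.evaluate) follow the online documentation but should be checked against the installed release, and the input lines are a made-up example of the tag format.

```python
# Hedged usage sketch of the WordSeg Python API; names follow the package
# documentation but should be verified against the installed release.
from wordseg.prepare import prepare, gold
from wordseg.algos import tp
from wordseg.evaluate import evaluate

# Phonologized input: one utterance per line, with syllable and word tags.
text = ['hh ih r ;esyll ;eword dh eh r ;esyll ;eword',
        'ih t ;esyll ;eword ih z ;esyll ;eword dh eh r ;esyll ;eword']

prepared = list(prepare(text))          # unsegmented unit sequences for the learner
gold_words = list(gold(text))           # gold segmentation for scoring
segmented = tp.segment(prepared)        # transitional-probability algorithm
print(evaluate(segmented, gold_words))  # precision / recall / F-score summary
```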


Subject(s)
Speech , Algorithms , Humans , Language Development , Software
16.
Open Mind (Camb) ; 3: 13-22, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-31149647

ABSTRACT

Previous computational modeling suggests it is much easier to segment words from child-directed speech (CDS) than adult-directed speech (ADS). However, this conclusion is based on data collected in the laboratory, with CDS from play sessions and ADS between a parent and an experimenter, which may not be representative of ecologically collected CDS and ADS. Fully naturalistic ADS and CDS collected with a nonintrusive recording device as the child went about her day were analyzed with a diverse set of algorithms. The difference between registers was small compared to differences between algorithms; it reduced when corpora were matched, and it even reversed under some conditions. These results highlight the interest of studying learnability using naturalistic corpora and diverse algorithmic definitions.

17.
Psychol Assess ; 31(5): 622-630, 2019 May.
Article in English | MEDLINE | ID: mdl-30628822

ABSTRACT

Aphasia is a devastating brain disorder, detrimental to medical care and social interaction. Early diagnosis of language disorders and accurate identification of patient-specific deficits are crucial for patient care, as aphasia rehabilitation is more effective when focused on patient-specific language deficits. We developed the Core Assessment of Language Processing (CALAP), a new scale combining screening and detailed evaluation to rapidly diagnose and identify patient-specific language deficits. The scale is based on a model of language processing distinguishing between the comprehension, production, and repetition modalities and their different components: phonology (the set of speech sounds), morphology (how sounds combine to form words), lexicon (words), syntax (how words combine to form sentences), and concept (semantic knowledge). The scale was validated on 189 participants who underwent the CALAP; patients not unequivocally classified as without aphasia by a speech-language pathologist underwent the Boston Diagnostic Aphasia Examination as the gold standard. The CALAP screening classified patients with and without aphasia with a sensitivity of 1 and a specificity of 0.72, in 3.14 ± 1.23 min. The CALAP detailed evaluation specifically assessed the language components in 8.25 ± 5.1 min. Psychometric properties, including concurrent validity, internal validity, internal consistency and interrater reliability, showed that the CALAP is a valid and reliable scale. The CALAP provides an aphasia diagnosis along with the identification of patient-specific impairments, making it possible to improve clinical follow-up and deficit-based rehabilitation. It is a short and easy-to-use scale that can be scored and interpreted by clinicians who are not language experts, in patients with fatigue and concentration deficits.


Subject(s)
Language Disorders/diagnosis , Nervous System Diseases/diagnosis , Neuropsychological Tests/standards , Aged , Aphasia/diagnosis , Comprehension , Female , Humans , Language Tests/standards , Male , Middle Aged , Reproducibility of Results
18.
Child Dev ; 90(3): 759-773, 2019 05.
Article in English | MEDLINE | ID: mdl-29094348

ABSTRACT

This article provides an estimation of how frequently, and from whom, children aged 0-11 years (Ns between 9 and 24) receive one-on-one verbal input among Tsimane forager-horticulturalists of lowland Bolivia. Analyses of systematic daytime behavioral observations reveal < 1 min per daylight hour is spent talking to children younger than 4 years of age, which is 4 times less than estimates for others present at the same time and place. Adults provide a majority of the input at 0-3 years of age but not afterward. When integrated with previous work, these results reveal large cross-cultural variation in the linguistic experiences provided to young children. Consideration of more diverse human populations is necessary to build generalizable theories of language acquisition.


Subject(s)
Farmers , Indians, South American/ethnology , Interpersonal Relations , Verbal Behavior , Adult , Bolivia/ethnology , Child , Child, Preschool , Female , Humans , Infant , Male
19.
Cogn Sci ; 2018 May 30.
Article in English | MEDLINE | ID: mdl-29851142

ABSTRACT

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics in a global discriminability score reveals that the greater separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability.

20.
J Acoust Soc Am ; 143(5): EL372, 2018 05.
Article in English | MEDLINE | ID: mdl-29857692

ABSTRACT

Theories of cross-linguistic phonetic category perception posit that listeners perceive foreign sounds by mapping them onto their native phonetic categories, but, until now, no way to effectively implement this mapping has been proposed. In this paper, Automatic Speech Recognition systems trained on continuous speech corpora are used to provide a fully specified mapping between foreign sounds and native categories. The authors show how the machine ABX evaluation method can be used to compare predictions from the resulting quantitative models with empirically attested effects in human cross-linguistic phonetic category perception.
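The machine ABX evaluation mentioned here has a simple core: given representations of tokens A and B from two different categories and a probe X drawn from A's category, the model is scored correct when X lies closer to A than to B. The sketch below shows that logic with a placeholder distance; speech ABX implementations typically use DTW-aligned frame-wise distances rather than a single-vector norm.

```python
# Minimal sketch of the ABX discrimination metric: for a triplet (A, B, X)
# with A and X from the same category and B from another, the model is correct
# when its representation puts X closer to A than to B. The distance function
# is a placeholder; real speech ABX typically uses DTW over frame sequences.
import numpy as np

def abx_error(triplets, encode, dist=lambda u, v: float(np.linalg.norm(u - v))):
    """triplets: iterable of (A, B, X) stimuli; encode maps a stimulus to a vector."""
    triplets = list(triplets)
    errors = sum(
        1 for a, b, x in triplets
        if dist(encode(x), encode(a)) >= dist(encode(x), encode(b))
    )
    return errors / len(triplets)
```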


Subject(s)
Language , Neural Networks, Computer , Phonetics , Speech Perception , Speech Recognition Software/classification , Humans , Speech Perception/physiology