Results 1 - 7 of 7
1.
J Voice; 2022 Aug 23.
Article in English | MEDLINE | ID: mdl-36028369

ABSTRACT

OBJECTIVES: The aim of the present study is to investigate the usefulness of features extracted from miniature accelerometers attached to the speaker's tracheal wall below the glottis for the classification of phonation type. The performance of the accelerometer features is evaluated relative to features obtained from inverse-filtered and radiated sound. While the former is a good proxy for the voice source, obtaining robust voice source features from the latter is considered difficult, since it also contains information about the vocal tract filter. By contrast, the accelerometer signal is largely unaffected by the vocal tract, and although it is shaped by subglottal resonances and the transfer properties of the neck tissue, these properties remain constant within a speaker. For this reason, we expect it to provide a better approximation of the voice source than the raw audio. We also investigate which aspects of the voice source are derivable from the accelerometer and microphone signals.

METHODS: Five trained singers (two females and three males) were recorded producing the syllable [pæ:] in three voice qualities (neutral, breathy, and pressed) and at three pitch levels determined by the participants' personal preference. Features extracted from the three signals were used to classify phonation type with a random forest classifier. In addition, the accelerometer and microphone features with the highest correlation with the voice source features were identified.

RESULTS: The three signals showed comparable classification error rates, with considerable differences across speakers both in overall performance and in the importance of individual features. The speaker-specific differences notwithstanding, variation of phonation type had consistent effects on the voice source, accelerometer, and audio signals. With regard to the voice source, AQ, NAQ, L1L2, and CQ all showed a monotonic variation along the breathy-neutral-pressed continuum. Several features were also found to vary systematically in the accelerometer and audio signals: HRF, L1L2, and CPPS (for both the accelerometer and the audio), as well as the sound level (for the audio). The random forest analysis revealed that all of these features were also among the most important for the classification of voice quality.

CONCLUSION: Both the accelerometer and the audio signals were found to discriminate between phonation types with an accuracy approaching that of the voice source. Thus, the accelerometer signal, which is largely uncontaminated by vocal tract resonances, offered no advantage over the signal collected with a normal microphone.
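The classification setup lends itself to a brief illustration. Below is a minimal sketch, not the authors' pipeline, of phonation-type classification with a random forest followed by an inspection of feature importances; the feature subset and all data are placeholders.

```python
# Minimal sketch (not the authors' pipeline): classify phonation type
# (breathy / neutral / pressed) from per-token feature vectors with a random
# forest and inspect feature importances. Features and data are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
feature_names = ["HRF", "L1L2", "CPPS", "sound_level"]  # illustrative subset

# Placeholder data: 90 tokens x 4 features, one label per token
X = rng.normal(size=(90, len(feature_names)))
y = rng.choice(["breathy", "neutral", "pressed"], size=90)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)
for name, imp in sorted(zip(feature_names, clf.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```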

2.
Cognition; 219: 104961, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34856424

ABSTRACT

Infants come to learn several hundred word forms by two years of age, and it is possible that this involves carving these forms out of continuous speech. It has been proposed that the task is facilitated by the presence of prosodic boundaries. We revisit this claim by running computational models of word segmentation, with and without prosodic information, on a corpus of infant-directed speech. We use five cognitively based algorithms, which vary in whether they employ a sub-lexical or a lexical segmentation strategy and in whether they are simple heuristics or embody an ideal learner. Results show that providing expert-annotated prosodic breaks does not uniformly help all segmentation models. The sub-lexical algorithms, which perform more poorly, benefit most, while the lexical ones show only a very small gain. Moreover, when prosodic information is derived automatically from the acoustic cues infants are known to be sensitive to, errors in the detection of the boundaries lead to smaller positive effects, and even to negative ones for some algorithms. This shows that even though infants could potentially use prosodic breaks, it does not necessarily follow that they should incorporate prosody into their segmentation strategies when confronted with realistic signals.
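As an illustration of the sub-lexical strategy, here is a minimal sketch assuming a transitional-probability heuristic (not necessarily one of the five algorithms evaluated in the paper): boundaries are posited at local dips in syllable-to-syllable transitional probability, optionally supplemented by prosodic breaks. The corpus and syllables are toy placeholders.

```python
# Minimal sketch of a sub-lexical segmentation heuristic (assumed, not one of
# the paper's five algorithms): insert a word boundary at local dips in
# syllable-to-syllable transitional probability, optionally forcing boundaries
# at (hypothetical) prosodic breaks.
from collections import Counter

def transitional_probabilities(utterances):
    """Estimate P(next | current) over adjacent syllable pairs."""
    pair_counts, unit_counts = Counter(), Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            unit_counts[a] += 1
    return {pair: c / unit_counts[pair[0]] for pair, c in pair_counts.items()}

def segment(utt, tp, prosodic_breaks=()):
    """Return boundary indices: local TP minima plus any given prosodic breaks."""
    boundaries = set(prosodic_breaks)
    scores = [tp.get((a, b), 0.0) for a, b in zip(utt, utt[1:])]
    for i in range(1, len(scores) - 1):
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]:
            boundaries.add(i + 1)  # boundary after syllable i
    return sorted(boundaries)

# Toy corpus of syllabified utterances
corpus = [["ba", "bi", "gu", "da", "ba", "bi"], ["gu", "da", "tu", "pi"]]
tp = transitional_probabilities(corpus)
print(segment(corpus[0], tp))  # -> [4], i.e. "ba bi gu da | ba bi"
```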


Subject(s)
Speech Perception, Speech, Computer Simulation, Cues (Psychology), Humans, Infant, Learning, Speech Acoustics
3.
Cogn Sci; 45(5): e12946, 2021 May.
Article in English | MEDLINE | ID: mdl-34018231

ABSTRACT

A prominent hypothesis holds that by speaking to infants in infant-directed speech (IDS) as opposed to adult-directed speech (ADS), parents help them learn phonetic categories. Specifically, two characteristics of IDS have been claimed to facilitate learning: hyperarticulation, which makes the categories more separable, and variability, which makes generalization more robust. Here, we test the separability and robustness of vowel category learning on acoustic representations of speech uttered by Japanese adults in ADS, IDS (addressed to 18- to 24-month-olds), or read speech (RS). Separability is determined by means of a distance measure computed between the five short vowel categories of Japanese, while robustness is assessed by testing the ability of six different machine learning algorithms, trained to classify vowels, to generalize to stimuli spoken by a novel speaker in ADS. Using two different speech representations, we find that hyperarticulated speech, in the case of RS, can yield better separability, and that increased between-speaker variability in ADS can yield, for some algorithms, more robust categories. However, these conclusions do not apply to IDS, which turned out to yield neither more separable nor more robust categories than ADS input. We discuss the usefulness of machine learning algorithms run on real data for testing hypotheses about the functional role of IDS.
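A minimal sketch of the two evaluation ideas, under assumed definitions rather than the paper's exact metrics: separability as mean between-category centroid distance normalized by within-category spread, and robustness as accuracy of a classifier evaluated on a held-out (novel-speaker) set. All representations and labels are placeholders.

```python
# Minimal sketch (assumed setup, not the paper's exact metrics).
import numpy as np
from sklearn.linear_model import LogisticRegression

def separability(X, labels):
    """Mean centroid-to-centroid distance / mean within-category spread."""
    cats = sorted(set(labels))
    centroids = {c: X[labels == c].mean(axis=0) for c in cats}
    between = np.mean([np.linalg.norm(centroids[a] - centroids[b])
                       for i, a in enumerate(cats) for b in cats[i + 1:]])
    within = np.mean([X[labels == c].std() for c in cats])
    return between / within

# Placeholder acoustic representations: 2-D "formant-like" features
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2)); y_train = rng.choice(list("aiueo"), 200)
X_test = rng.normal(size=(50, 2));   y_test = rng.choice(list("aiueo"), 50)

print("separability:", separability(X_train, y_train))
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("robustness (novel-speaker accuracy):", clf.score(X_test, y_test))
```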


Subject(s)
Speech Perception, Speech, Adult, Humans, Infant, Machine Learning, Phonetics, Reading
4.
JASA Express Lett; 1(11): 115203, 2021 Nov.
Article in English | MEDLINE | ID: mdl-36154027

ABSTRACT

This study aims to quantify the effect of several information sources (acoustic, higher-level linguistic, and knowledge of the prosodic system of the language) on the perception of prosodic boundaries. An experiment investigating the identification of prosodic boundaries in Japanese was conducted with native and non-native participants. It revealed that non-native speakers, as well as native speakers with access only to acoustic information, can recognize boundaries above chance level. However, knowledge of both the prosodic system and higher-level information is required for good boundary identification, each being of similar or greater importance than the acoustic information.
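A minimal sketch of one way the above-chance claim could be checked, assuming a two-alternative boundary/no-boundary judgment and placeholder trial counts; this is not necessarily the authors' analysis.

```python
# Minimal sketch (assumed analysis): binomial test of whether a participant's
# boundary-identification accuracy exceeds chance in a two-alternative task.
from scipy.stats import binomtest

n_trials, n_correct, chance = 80, 52, 0.5   # placeholder numbers
result = binomtest(n_correct, n_trials, p=chance, alternative="greater")
print(f"accuracy = {n_correct / n_trials:.2f}, p = {result.pvalue:.4f}")
```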


Subject(s)
Language, Perception, Humans
5.
Cogn Sci; 2018 May 30.
Article in English | MEDLINE | ID: mdl-29851142

ABSTRACT

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics into a global discriminability score reveals that the greater separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability.
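A minimal sketch of one possible discriminability measure, assuming an ABX-style score over a token distance rather than the paper's exact global score; the tokens, labels, and distance function are placeholders.

```python
# Minimal sketch (assumed metric): an ABX-style discriminability estimate.
# For tokens A, X of the same word and B of a different word, count how often
# d(A, X) < d(B, X); 1.0 means perfectly discriminable, 0.5 is chance.
import itertools

def abx_discriminability(tokens, labels, dist):
    wins = total = 0
    idx = range(len(tokens))
    for a, x in itertools.permutations(idx, 2):
        if labels[a] != labels[x]:
            continue
        for b in idx:
            if labels[b] == labels[x]:
                continue
            wins += dist(tokens[a], tokens[x]) < dist(tokens[b], tokens[x])
            total += 1
    return wins / total if total else float("nan")

# Toy "phonological" tokens (phoneme strings) and a crude string distance
tokens = ["wanwan", "wanwan", "inu", "inu", "neko"]
labels = ["dog", "dog", "dog", "dog", "cat"]
crude = lambda u, v: sum(c1 != c2 for c1, c2 in zip(u, v)) + abs(len(u) - len(v))
print("discriminability:", abx_discriminability(tokens, labels, crude))
```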

6.
J Acoust Soc Am; 140(2): 1239, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27586752

ABSTRACT

This study explores the long-standing hypothesis that the acoustic cues to prosodic boundaries in infant-directed speech (IDS) make those boundaries easier to learn than those in adult-directed speech (ADS). Three cues (pause duration, nucleus duration, and pitch change) were investigated by means of a systematic review of the literature, statistical analyses of a corpus of Japanese, and machine learning experiments. The review of previous work revealed that the effect of register on boundary cues is less well established than previously thought, and that results often vary across studies for certain cues. Statistical analyses run on a large database of mother-child and mother-interviewer interactions showed that the duration of the pause and the duration of the syllable nucleus preceding the boundary are two cues that are enhanced in IDS, while f0 change is actually degraded in IDS. Supervised and unsupervised machine learning techniques applied to these acoustic cues revealed that IDS boundaries were consistently better classified than ADS ones, regardless of the learning method used. The role of the cues examined in this study and the importance of these findings in the more general context of early linguistic structure acquisition are discussed.
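A minimal sketch, under assumed data and models rather than the paper's exact experiments, of boundary classification from the three cues with one supervised and one unsupervised learner; the synthetic cue distributions are placeholders.

```python
# Minimal sketch (assumed setup): classify boundary vs. non-boundary positions
# from three cues (pause duration, nucleus duration, f0 change).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 400
y = rng.integers(0, 2, size=n)                      # 1 = prosodic boundary
pause = rng.exponential(0.05, n) + 0.3 * y          # boundaries lengthen pauses
nucleus = rng.normal(0.10, 0.03, n) + 0.05 * y      # ...and nucleus durations
f0_change = rng.normal(0.0, 1.0, n) + 0.5 * y       # ...and pitch movement
X = np.column_stack([pause, nucleus, f0_change])

# Supervised: cross-validated logistic regression
print("supervised:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())

# Unsupervised: 2-cluster k-means, scored with the best cluster-to-label mapping
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
acc = max((clusters == y).mean(), ((1 - clusters) == y).mean())
print("unsupervised:", acc)
```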


Subject(s)
Child Language, Cues (Psychology), Age Factors, Female, Humans, Infant, Mothers, Speech, Speech Acoustics, Speech Perception, Supervised Machine Learning, Unsupervised Machine Learning
7.
J Acoust Soc Am; 140(1): EL1, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27475196

ABSTRACT

This study aims to quantify the role of prosodic boundaries in early language acquisition using a computational modeling approach. A spoken term discovery system that models early word learning was used, with and without a prosodic component, on speech corpora of English, Spanish, and Japanese. The results showed that prosodic information induces a consistent improvement both in the alignment of the discovered terms to actual word boundaries and in the phonemic homogeneity of the discovered clusters of terms. This benefit was also found when automatically discovered prosodic boundaries were used, even though these boundaries did not perfectly match the linguistically defined ones.
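A minimal sketch of the two evaluation ideas mentioned above, boundary alignment and phonemic homogeneity of the discovered clusters, under assumed definitions and placeholder data; this is not the spoken term discovery system itself.

```python
# Minimal sketch (assumed evaluation): score discovered terms by (1) how well
# their edges align with gold word boundaries and (2) the phonemic purity of
# the discovered clusters. All data are placeholders.
from collections import Counter

def boundary_alignment(discovered_edges, gold_boundaries, tolerance=0.02):
    """Fraction of discovered term edges within `tolerance` s of a gold boundary."""
    hits = sum(any(abs(e - g) <= tolerance for g in gold_boundaries)
               for e in discovered_edges)
    return hits / len(discovered_edges)

def cluster_purity(clusters):
    """clusters: list of lists of phonemic transcriptions of cluster members."""
    total = sum(len(c) for c in clusters)
    return sum(Counter(c).most_common(1)[0][1] for c in clusters) / total

edges = [0.41, 0.79, 1.22]            # discovered term edge times (s)
gold = [0.40, 0.80, 1.20, 1.65]       # gold word boundary times (s)
clusters = [["inu", "inu", "neko"], ["wanwan", "wanwan"]]
print("alignment:", boundary_alignment(edges, gold))
print("purity:", cluster_purity(clusters))
```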


Subject(s)
Child Language, Computer Simulation, Verbal Learning, Algorithms, Child, Humans, Language, Speech, Speech Perception