1.
Clin Linguist Phon ; : 1-22, 2024 Jun 09.
Article in English | MEDLINE | ID: mdl-38853471

ABSTRACT

Speech training apps are being developed that provide automatic feedback concerning children's production of known target words, as a score on a 1-5 scale. However, this 'goodness' scale is still poorly understood. We investigated listeners' ratings of 'how many stars the app should provide as feedback' on children's utterances, and whether listener agreement is affected by clinical experience and/or access to anchor stimuli. In addition, we explored the association between goodness ratings and clinical measures of speech accuracy: the Percentage of Consonants Correct (PCC) and the Percentage of Phonemes Correct (PPC). Twenty speech-language pathologists and 20 non-expert listeners participated; half of the listeners in each group had access to anchor stimuli. The listeners rated 120 words, collected from children with and without speech sound disorder. Concerning reliability, intra-rater agreement was generally high, whereas inter-rater agreement was moderate. Access to anchor stimuli was associated with higher agreement, but only for non-expert listeners. Concerning the association between goodness ratings and the PCC/PPC, correlations were moderate for both listener groups, under both conditions. The results indicate that the task of rating goodness is difficult, regardless of clinical experience, and that access to anchor stimuli is insufficient for achieving reliable ratings. This raises concerns regarding the 1-5 rating scale as the means of feedback in speech training apps. More specific listener instructions, particularly regarding the intended context for the app, are suggested for the collection of human ratings underlying the development of speech training apps. Until then, alternative means of feedback should be preferred.
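
As a rough illustration of the two accuracy measures named in the abstract above, the sketch below computes PCC and PPC for a single word. It assumes the target and produced phonemes have already been aligned one-to-one, with None marking an omitted segment. The vowel inventory, the function name percent_correct, and the example word are all hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical vowel inventory: any target segment not listed here
# is treated as a consonant.
VOWELS = frozenset({"a", "e", "i", "o", "u"})

# One (target, produced) pair per target segment; produced is None
# when the child omitted the segment.
Aligned = Sequence[Tuple[str, Optional[str]]]


def percent_correct(pairs: Aligned, consonants_only: bool) -> float:
    """Share of target segments produced correctly, as a percentage.

    With consonants_only=True this is the PCC; otherwise it is the PPC.
    """
    scored = [(t, p) for t, p in pairs
              if not (consonants_only and t in VOWELS)]
    if not scored:
        raise ValueError("no target segments to score")
    correct = sum(1 for t, p in scored if t == p)
    return 100.0 * correct / len(scored)


# Made-up example: target /k a t t/ produced as /t a t t/.
word = [("k", "t"), ("a", "a"), ("t", "t"), ("t", "t")]
pcc = percent_correct(word, consonants_only=True)   # 2 of 3 consonants
ppc = percent_correct(word, consonants_only=False)  # 3 of 4 phonemes
```

In practice the alignment step is the hard part (insertions, distortions, and dialectal variants all need scoring rules), which this sketch deliberately leaves out.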

2.
Lang Resour Eval ; : 1-26, 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-37360261

ABSTRACT

Public sources like parliament meeting recordings and transcripts provide ever-growing material for the training and evaluation of automatic speech recognition (ASR) systems. In this paper, we publish and analyse the Finnish Parliament ASR Corpus, the most extensive publicly available collection of manually transcribed speech data for Finnish with over 3000 h of speech and 449 speakers for which it provides rich demographic metadata. This corpus builds on earlier initial work, and as a result the corpus has a natural split into two training subsets from two periods of time. Similarly, there are two official, corrected test sets covering different times, setting an ASR task with longitudinal distribution-shift characteristics. An official development set is also provided. We developed a complete Kaldi-based data preparation pipeline and ASR recipes for hidden Markov models (HMM), hybrid deep neural networks (HMM-DNN), and attention-based encoder-decoders (AED). For HMM-DNN systems, we provide results with time-delay neural networks (TDNN) as well as state-of-the-art wav2vec 2.0 pretrained acoustic models. We set benchmarks on the official test sets and multiple other recently used test sets. Both temporal corpus subsets are already large, and we observe that beyond their scale, HMM-TDNN ASR performance on the official test sets has reached a plateau. In contrast, other domains and larger wav2vec 2.0 models benefit from added data. The HMM-DNN and AED approaches are compared in a carefully matched equal data setting, with the HMM-DNN system consistently performing better. Finally, the variation of the ASR accuracy is compared between the speaker categories available in the parliament metadata to detect potential biases based on factors such as gender, age, and education.

3.
Lang Resour Eval ; : 1-33, 2022 Aug 09.
Article in English | MEDLINE | ID: mdl-35965738

ABSTRACT

The Donate Speech campaign has so far succeeded in gathering approximately 3600 h of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the regions of Finland and from all age brackets. The primary goals of the collection were to create a representative, large-scale resource to study spontaneous spoken Finnish and to accelerate the development of language technology and speech-based services. In this paper, we present the collection process and the collected corpus, and showcase its versatility through multiple use cases. The evaluated use cases include: automatic speech recognition of spontaneous speech; detection of age, gender, dialect and topic; and metadata analysis. We provide benchmarks for the use cases, as well as downloadable, trained baseline systems with open-source code for reproducibility. One further use case is to verify the metadata and transcripts given in this corpus itself, and to suggest artificial metadata and transcripts for the part of the corpus where they are missing.

4.
Neuroimage ; 219: 116936, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-32474080

ABSTRACT

Natural speech builds on contextual relations that can prompt predictions of upcoming utterances. To study the neural underpinnings of such predictive processing we asked 10 healthy adults to listen to a 1-h-long audiobook while their magnetoencephalographic (MEG) brain activity was recorded. We correlated the MEG signals with acoustic speech envelope, as well as with estimates of Bayesian word probability with and without the contextual word sequence (N-gram and Unigram, respectively), with a focus on time-lags. The MEG signals of auditory and sensorimotor cortices were strongly coupled to the speech envelope at the rates of syllables (4-8 Hz) and of prosody and intonation (0.5-2 Hz). The probability structure of word sequences, independently of the acoustical features, affected the ≤ 2-Hz signals extensively in auditory and rolandic regions, in precuneus, occipital cortices, and lateral and medial frontal regions. Fine-grained temporal progression patterns occurred across brain regions 100-1000 ms after word onsets. Although the acoustic effects were observed in both hemispheres, the contextual influences were statistically significantly lateralized to the left hemisphere. These results serve as a brain signature of the predictability of word sequences in listened continuous speech, confirming and extending previous results to demonstrate that deeply-learned knowledge and recent contextual information are employed dynamically and in a left-hemisphere-dominant manner in predicting the forthcoming words in natural speech.
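
To make the distinction between word probability with and without context concrete, the sketch below estimates a unigram probability and a bigram (the simplest N-gram) probability from a toy token list, using add-one smoothing over the observed vocabulary. The corpus, the function names, and the choice of smoothing are illustrative assumptions. The abstract does not specify the N-gram order or the estimation method used in the study.

```python
from collections import Counter
import math


def train(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def unigram_prob(word, unigrams):
    """P(word) without context, add-one smoothed."""
    total = sum(unigrams.values())
    return (unigrams[word] + 1) / (total + len(unigrams))


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev), add-one smoothed over the observed vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def surprisal(p):
    """Information content in bits; lower means more predictable."""
    return -math.log2(p)


# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train(tokens)

p_uni = unigram_prob("cat", unigrams)               # context ignored
p_bi = bigram_prob("the", "cat", unigrams, bigrams)  # context: "the"
```

Here "cat" is more probable after "the" than in isolation, so its surprisal drops once context is taken into account. A per-word series of such values is the kind of regressor that could be correlated with a neural signal at different time lags.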


Subjects
Brain/physiology, Speech Perception/physiology, Acoustic Stimulation, Adult, Attention/physiology, Auditory Cortex/physiology, Brain Mapping, Female, Humans, Magnetoencephalography, Male, Middle Aged, Speech/physiology, Young Adult

5.
Mem Cognit ; 47(7): 1245-1269, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31102191

ABSTRACT

We studied how statistical models of morphology that are built on different kinds of representational units, i.e., models emphasizing either holistic units or decomposition, perform in predicting human word recognition. More specifically, we studied the predictive power of such models at early vs. late stages of word recognition by using eye-tracking during two tasks. The tasks included a standard lexical decision task and a word recognition task that assumedly places less emphasis on postlexical reanalysis and decision processes. The lexical decision results showed good performance of Morfessor models based on the Minimum Description Length optimization principle. Models which segment words at some morpheme boundaries and keep other boundaries unsegmented performed well both at early and late stages of word recognition, supporting dual- or multiple-route cognitive models of morphological processing. Statistical models based on full forms fared better in late than early measures. The results of the second, multi-word recognition task showed that early and late stages of processing often involve accessing morphological constituents, with the exception of short complex words. Late stages of word recognition additionally involve predicting upcoming morphemes on the basis of previous ones in multimorphemic words. The statistical models based fully on whole words did not fare well in this task. Thus, we assume that the good performance of such models in global measures such as gaze durations or reaction times in lexical decision largely stems from postlexical reanalysis or decision processes. This finding highlights the importance of considering task demands in the study of morphological processing.


Subjects
Eye Movements, Statistical Models, Reading, Psychological Recognition, Semantics, Adult, Decision Making, Female, Humans, Male, Mental Recall, Reaction Time, Young Adult

6.
Hum Brain Mapp ; 34(6): 1477-89, 2013 Jun.
Article in English | MEDLINE | ID: mdl-22344824

ABSTRACT

It is a challenge for current signal analysis approaches to identify the electrophysiological brain signatures of continuous natural speech that the subject is listening to. To relate magnetoencephalographic (MEG) brain responses to the physical properties of such speech stimuli, we applied canonical correlation analysis (CCA) and a Bayesian mixture of CCA analyzers to extract MEG features related to the speech envelope. Seven healthy adults listened to news for an hour while their brain signals were recorded with whole-scalp MEG. We found shared signal time series (canonical variates) between the MEG signals and speech envelopes at 0.5-12 Hz. By splitting the test signals into equal-length fragments from 2 to 65 s (corresponding to 703 down to 21 pieces per the total speech stimulus) we obtained better than chance-level identification for speech fragments longer than 2-3 s, not used in the model training. The applied analysis approach thus allowed identification of segments of natural speech by means of partial reconstruction of the continuous speech envelope (i.e., the intensity variations of the speech sounds) from MEG responses, provided means to empirically assess the time scales obtainable in speech decoding with the canonical variates, and it demonstrated accurate identification of the heard speech fragments from the MEG data.


Subjects
Brain/physiology, Magnetoencephalography/methods, Computer-Assisted Signal Processing, Speech Perception/physiology, Adult, Female, Humans, Male, Speech, Young Adult

7.
Front Hum Neurosci ; 17: 1122886, 2023.
Article in English | MEDLINE | ID: mdl-36968782

ABSTRACT

Children with dyslexia often face difficulties in learning foreign languages, which is reflected as weaker neural activation. However, digital language-learning applications could support learning-induced plastic changes in the brain. Here we aimed to investigate whether plastic changes occur in children with dyslexia more readily after targeted training with a digital language-learning game or similar training without game-like elements. We used auditory event-related potentials (ERPs), specifically, the mismatch negativity (MMN), to study learning-induced changes in the brain responses. Participants were 24 school-aged Finnish-speaking children with dyslexia and 24 age-matched typically reading control children. They trained English speech sounds and words with the "Say it again, kid!" (SIAK) language-learning game for 5 weeks between ERP measurements. During the game, the players explored game boards and produced English words aloud to score stars as feedback from an automatic speech recognizer. To compare the effectiveness of the training type (game vs. non-game), we embedded in the game some non-game levels stripped of all game-like elements. In the dyslexia group, the non-game training increased the MMN amplitude more than the game training, whereas in the control group the game training increased the MMN response more than the non-game training. In the dyslexia group, the MMN increase with the non-game training correlated with phonological awareness: the children with poorer phonological awareness showed a larger increase in the MMN response. Improved neural processing of foreign speech sounds as indicated by the MMN increase suggests that targeted training with a simple application could alleviate some spoken foreign-language learning difficulties that are related to phonological processing in children with dyslexia.

8.
Brain Lang ; 230: 105124, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35487084

ABSTRACT

Digital games may benefit children's learning, yet the factors that induce gaming benefits to cognition are not well known. In this study, we investigated the effectiveness of digital game-based learning in children by comparing the learning of foreign speech sounds and words in a digital game or a non-game digital application. To evaluate gaming-induced plastic changes in the brain, we used the mismatch negativity (MMN) brain response that reflects the access to long-term memory representations. We recorded auditory brain responses from 37 school-aged Finnish-speaking children before and after playing a computer-based language-learning game. The MMN amplitude increased between the pre- and post-measurement for the game condition but not for the non-game condition, suggesting that the gaming intervention enhanced learning more than the non-game intervention. The results indicate that digital games can be beneficial for children's speech-sound learning and that gaming elements per se, not just practice time, support learning.


Subjects
Plastics, Video Games, Brain/physiology, Child, Humans, Learning/physiology, Phonetics