Search | BVS Regional Portal

1.

Word Forms Reflect Trade-Offs Between Speaker Effort and Robust Listener Recognition.

Meylan, Stephan C; Griffiths, Thomas L.

Cogn Sci ; 48(7): e13478, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38980972

ABSTRACT

How do cognitive pressures shape the lexicons of natural languages? Here, we reframe George Kingsley Zipf's proposed "law of abbreviation" within a more general framework that relates it to cognitive pressures that affect speakers and listeners. In this new framework, speakers' drive to reduce effort (Zipf's proposal) is counteracted by the need for low-frequency words to have word forms that are sufficiently distinctive to allow for accurate recognition by listeners. To support this framework, we replicate and extend recent work using the prevalence of subword phonemic sequences (phonotactic probability) to measure speakers' production effort in place of Zipf's measure of length. Across languages and corpora, phonotactic probability is more strongly correlated with word frequency than word length is. We also show that this measure of ease of speech production (phonotactic probability) is strongly correlated with a measure of perceptual difficulty that indexes the degree of competition from alternative interpretations in word recognition. This is consistent with the claim that there must be trade-offs between these two factors, and is inconsistent with a recent proposal that phonotactic probability facilitates both perception and production. To our knowledge, this is the first work to offer an explanation of why long, phonotactically improbable word forms remain in the lexicons of natural languages.

Subjects

Language , Phonetics , Recognition, Psychology , Speech Perception , Humans , Speech
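The abstract measures production effort by phonotactic probability, the prevalence of subword phonemic sequences. As a rough illustration only, the sketch below scores a word by its average add-one-smoothed log bigram probability over a toy lexicon. The bigram model, the smoothing, the `#` boundary symbol, and the example words are all assumptions; this is not the authors' estimator.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol

def train_bigrams(lexicon):
    """Count phoneme bigrams (including word boundaries) over a list of words,
    where each word is a sequence of phoneme symbols."""
    bigrams = Counter()
    contexts = Counter()
    inventory = {BOUNDARY}
    for word in lexicon:
        symbols = [BOUNDARY, *word, BOUNDARY]
        inventory.update(word)
        for left, right in zip(symbols, symbols[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts, len(inventory)

def phonotactic_logprob(word, bigrams, contexts, inventory_size):
    """Average add-one-smoothed log P(next phoneme | previous phoneme)."""
    symbols = [BOUNDARY, *word, BOUNDARY]
    pairs = list(zip(symbols, symbols[1:]))
    total = 0.0
    for left, right in pairs:
        prob = (bigrams[(left, right)] + 1) / (contexts[left] + inventory_size)
        total += math.log(prob)
    return total / len(pairs)

# Hypothetical toy lexicon; each character stands in for one phoneme.
lexicon = ["kat", "bat", "mat", "tak"]
model = train_bigrams(lexicon)
print(phonotactic_logprob("bat", *model) > phonotactic_logprob("zgx", *model))  # True
```

Averaging rather than summing the per-bigram log probabilities is one way to keep the score from simply tracking word length, which matters because the study contrasts phonotactic probability with length as predictors of frequency.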

2.

Children and adults produce distinct technology- and human-directed speech.

Cohn, Michelle; Barreda, Santiago; Graf Estes, Katharine; Yu, Zhou; Zellou, Georgia.

Sci Rep ; 14(1): 15611, 2024 Jul 06.

Article in English | MEDLINE | ID: mdl-38971806

ABSTRACT

This study compares how English-speaking adults and children from the United States adapt their speech when talking to a real person and a smart speaker (Amazon Alexa) in a psycholinguistic experiment. Overall, participants produced more effortful speech when talking to a device (longer duration and higher pitch). These differences also varied by age: children produced even higher pitch in device-directed speech, suggesting a stronger expectation of being misunderstood by the system. In support of this, after a staged recognition error by the device, children increased their pitch even more. Furthermore, adults and children displayed the same degree of variation in their responses to whether "Alexa seems like a real person or not", further indicating that children's conceptualization of the system's competence, rather than an increased anthropomorphism response, shaped their register adjustments. This work speaks to models of the mechanisms underlying speech production and to human-computer interaction frameworks, providing support for routinized theories of spoken interaction with technology.

Subjects

Speech , Humans , Adult , Child , Male , Female , Speech/physiology , Young Adult , Adolescent , Psycholinguistics

3.

Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability.

Ahn, Youngdo; Han, Sangwook; Lee, Seonggyu; Shin, Jong Won.

Sensors (Basel) ; 24(13), 2024 Jun 25.

Article in English | MEDLINE | ID: mdl-39000889

ABSTRACT

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To construct an SER model that is robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that assigns higher gradients to the samples in a given minibatch whose emotion labels are more difficult to estimate. Since annotators may label an emotion based on emotional expression that resides in the conversational context or in another modality but is not apparent in the speech utterance itself, some emotion labels may be unreliable, and such unreliable labels may affect the proposed loss function more severely. We therefore apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that adopting the proposed loss function with label smoothing on the misclassified data improved SER performance on unseen corpora.

Subjects

Emotions , Speech , Humans , Emotions/physiology , Speech/physiology , Algorithms , Reproducibility of Results , Pattern Recognition, Automated/methods , Databases, Factual
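The reliability step described in the abstract, smoothing only the targets of training samples that a pre-trained SER model misclassifies, can be sketched as follows. The smoothing formula, (1 - epsilon) * one-hot + epsilon / K, is the standard one; `epsilon`, the class count, and the function names are hypothetical, and the Proxy-Anchor-inspired difficulty loss itself is not reproduced.

```python
import math

def smoothed_targets(labels, pretrained_preds, num_classes, epsilon=0.1):
    """Build soft target distributions.

    Samples the pre-trained model classifies correctly keep one-hot targets;
    misclassified samples (treated as possibly unreliable) get standard label
    smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    targets = []
    for label, pred in zip(labels, pretrained_preds):
        if pred == label:
            row = [0.0] * num_classes
            row[label] = 1.0
        else:
            row = [epsilon / num_classes] * num_classes
            row[label] += 1.0 - epsilon
        targets.append(row)
    return targets

def soft_cross_entropy(probs, target):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Hypothetical 3-class example (e.g. neutral, happy, sad).
labels = [0, 1]
pretrained_preds = [0, 2]  # the second sample is misclassified by the pre-trained model
targets = smoothed_targets(labels, pretrained_preds, num_classes=3, epsilon=0.3)
print(targets[0])  # [1.0, 0.0, 0.0]
```

Smoothing caps how confident the target for a suspect label can be, which limits how strongly such a label can pull on any difficulty-weighted loss.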

4.

The swallowing and speech after transoral robotic surgery-does the site impact the outcome?

Mettias, Bassem; Young, Kate; Sahota, Bindy; Mansuri, Shaji; Kumar, Anand; Nijim, Hazem; Laugharne, David; Mortimore, Sean.

J Robot Surg ; 18(1): 287, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-39026112

ABSTRACT

Transoral robotic surgery (TORS) has been introduced to head and neck surgery as a minimally invasive technique to improve the functional outcomes of patients. This study compares the functional outcomes for swallowing and speech across TORS sites within the head and neck. This was a retrospective cohort study of patients who underwent TORS within the head and neck unit. Patients were assessed at four time points (one day, one month, six months and twelve months) with bedside/office testing. Swallowing was assessed using the International Dysphagia Diet Standardization Initiative (IDDSI), and speech using the Understandability of Speech score (USS). Outcomes were compared with patient-specific pre-treatment baseline levels. Sixty-eight patients were included. Immediately after surgery, 75% of patients resumed normal fluid intake and 40% resumed a normal diet. 8.8% required a temporary feeding tube, and 1% required gastrostomy. There was a steep improvement in diet between 3 and 6 months. Fluid and diet consistency dropped significantly after most TORS procedures, with more noticeable changes in diet. Early deterioration in diet is temporary and manageable with a modified diet. Swallowing recovers rapidly within the first year. There is no long-term effect on speech.

Subjects

Deglutition Disorders , Deglutition , Robotic Surgical Procedures , Speech , Humans , Robotic Surgical Procedures/methods , Deglutition/physiology , Male , Female , Retrospective Studies , Speech/physiology , Middle Aged , Aged , Deglutition Disorders/etiology , Treatment Outcome , Mouth , Adult , Head and Neck Neoplasms/surgery , Aged, 80 and over

5.

Speech and music recruit frequency-specific distributed and overlapping cortical networks.

Te Rietmolen, Noémie; Mercier, Manuel R; Trébuchon, Agnès; Morillon, Benjamin; Schön, Daniele.

Elife ; 13, 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-39038076

ABSTRACT

To what extent do speech and music processing rely on domain-specific and domain-general neural networks? Using whole-brain intracranial EEG recordings in 18 epilepsy patients listening to natural, continuous speech or music, we investigated the presence of frequency-specific and network-level brain activity. We combined this with a statistical approach that draws a clear operational distinction between shared, preferred, and domain-selective neural responses. We show that the majority of focal and network-level neural activity is shared between speech and music processing. Our data also reveal an absence of anatomical regional selectivity. Instead, domain-selective neural responses are restricted to distributed and frequency-specific coherent oscillations, typical of spectral fingerprints. Our work highlights the importance of considering natural stimuli and brain dynamics in their full complexity to map cognitive and brain functions.

Subjects

Music , Humans , Male , Female , Adult , Nerve Net/physiology , Speech/physiology , Auditory Perception/physiology , Epilepsy/physiopathology , Young Adult , Electroencephalography , Cerebral Cortex/physiology , Electrocorticography , Speech Perception/physiology , Middle Aged , Brain Mapping

6.

Multisensory integration of speech and gestures in a naturalistic paradigm.

Matyjek, Magdalena; Kita, Sotaro; Torralba Cuello, Mireia; Soto Faraco, Salvador.

Hum Brain Mapp ; 45(11): e26797, 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-39041175

ABSTRACT

Speech comprehension is crucial for human social interaction and relies on the integration of auditory and visual cues across various levels of representation. While research has extensively studied multisensory integration (MSI) using idealised, well-controlled stimuli, there is a need to understand this process in response to the complex, naturalistic stimuli encountered in everyday life. This study investigated behavioural and neural MSI in neurotypical adults experiencing audio-visual speech within a naturalistic, social context. Our novel paradigm incorporated a broader social situational context, complete words, and speech-supporting iconic gestures, allowing for context-based pragmatics and semantic priors. We investigated MSI in the presence of unimodal (auditory or visual) or complementary, bimodal speech signals. During audio-visual speech trials, compared to unimodal trials, participants recognised spoken words more accurately and showed a more pronounced suppression of alpha power, an indicator of heightened integration load. Importantly, on the neural level, these effects surpassed the mere summation of unimodal responses, suggesting non-linear MSI mechanisms. Overall, our findings demonstrate that typically developing adults integrate audio-visual speech and gesture information to facilitate speech comprehension in noisy environments, highlighting the importance of studying MSI in ecologically valid contexts.

Subjects

Gestures , Speech Perception , Humans , Female , Male , Speech Perception/physiology , Young Adult , Adult , Visual Perception/physiology , Electroencephalography , Comprehension/physiology , Acoustic Stimulation , Speech/physiology , Brain/physiology , Photic Stimulation/methods

7.

A voice and speech corpus of patients who underwent upper airway surgery in pre- and post-operative states.

Hernández-García, Estefanía; Guerrero-López, Alejandro; Arias-Londoño, Julián D; Godino-Llorente, Juan I.

Sci Data ; 11(1): 746, 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Castilian Spanish speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as tonsillectomy, functional endoscopic sinus surgery, and septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per speaker. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and the testing of anti-spoofing methodologies for improved robustness.

Subjects

Speech , Voice , Humans , Postoperative Period , Tonsillectomy , Male , Female , Preoperative Period , Adult

8.

Speech's syllabic rhythm and articulatory features produced under different auditory feedback conditions identify Parkinsonism.

Piña Méndez, Ángeles; Taitz, Alan; Palacios Rodríguez, Oscar; Rodríguez Leyva, Ildefonso; Assaneo, M Florencia.

Sci Rep ; 14(1): 15787, 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38982177

ABSTRACT

Diagnostic tests for Parkinsonism based on speech samples have shown promising results. Although abnormal auditory feedback integration during speech production and impaired rhythmic organization of speech are known features of Parkinsonism, these aspects have not been incorporated into diagnostic tests. This study aimed to identify Parkinsonism using a novel speech behavioral test that involved rhythmically repeating syllables under different auditory feedback conditions. The study included 30 individuals with Parkinson's disease (PD) and 30 healthy subjects. Participants were asked to rhythmically repeat the PA-TA-KA syllable sequence, both whispering and speaking aloud, under various listening conditions. The results showed that, compared to controls, individuals with PD had difficulty whispering and articulating under altered auditory feedback conditions, exhibited delayed speech onset, and showed inconsistent rhythmic structure across trials. These parameters were then fed into a supervised machine-learning algorithm to differentiate between the two groups. The algorithm achieved an accuracy of 85.4%, a sensitivity of 86.5%, and a specificity of 84.3%. This pilot study highlights the potential of the proposed behavioral paradigm as an objective and accessible (in both cost and time) test for identifying individuals with Parkinson's disease.

Subjects

Feedback, Sensory , Parkinson Disease , Speech , Humans , Female , Male , Aged , Parkinson Disease/physiopathology , Parkinson Disease/diagnosis , Middle Aged , Speech/physiology , Feedback, Sensory/physiology , Pilot Projects , Parkinsonian Disorders/physiopathology , Case-Control Studies
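The three figures reported in the abstract follow the usual confusion-matrix definitions, sketched below with hypothetical counts (not the study's data) for 30 patients and 30 controls.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts: 30 patients (positives) and 30 controls (negatives).
acc, sens, spec = diagnostic_metrics(tp=27, fn=3, tn=24, fp=6)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # 0.85 0.9 0.8
```

With equal group sizes and all three metrics computed from the same pooled counts, accuracy equals the mean of sensitivity and specificity. The reported figures fit this relation, since (86.5% + 84.3%) / 2 = 85.4%, although the abstract does not state how the metrics were aggregated across validation runs.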

9.

Imagined speech event detection from electrocorticography and its transfer between speech modes and subjects.

de Borman, Aurélie; Wittevrongel, Benjamin; Dauwe, Ine; Carrette, Evelien; Meurs, Alfred; Van Roost, Dirk; Boon, Paul; Van Hulle, Marc M.

Commun Biol ; 7(1): 818, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38969758

ABSTRACT

Speech brain-computer interfaces aim to support communication-impaired patients by translating neural signals into speech. While impressive progress has been achieved in decoding performed, perceived and attempted speech, imagined speech remains elusive, mainly due to the absence of behavioral output. Nevertheless, imagined speech is advantageous since it does not depend on any articulator movements that might become impaired or even lost as a neurodegenerative disease progresses. In this study, we analyzed electrocorticography data recorded from 16 participants in response to 3 speech modes: performed, perceived (listening), and imagined speech. We used a linear model to detect speech events and examined the contributions of each frequency band, from delta to high gamma, given the speech mode and electrode location. For imagined speech detection, we observed a strong contribution of gamma bands in the motor cortex, whereas lower frequencies were more prominent in the temporal lobe, particularly in the left hemisphere. Based on the similarities in frequency patterns, we were able to transfer models between speech modes and between participants with similar electrode locations.

Subjects

Brain-Computer Interfaces , Electrocorticography , Imagination , Speech , Humans , Electrocorticography/methods , Speech/physiology , Male , Female , Adult , Imagination/physiology , Young Adult , Motor Cortex/physiology

10.

Perception and adaptation of receptive prosody in autistic adolescents.

Kurumada, Chigusa; Rivera, Rachel; Allen, Paul; Bennetto, Loisa.

Sci Rep ; 14(1): 16409, 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39013983

ABSTRACT

A fundamental aspect of language processing is inferring others' minds from subtle variations in speech. The same word or sentence can often convey different meanings depending on its tempo, timing, and intonation, features often referred to as prosody. Although autistic children and adults are known to experience difficulty in making such inferences, the science remains unclear as to why. We hypothesize that detail-oriented perception in autism may interfere with the inference process if it lacks the adaptivity required to cope with the variability ubiquitous in human speech. Using a novel prosodic continuum that shifts the sentence meaning gradiently from a statement (e.g., "It's raining") to a question (e.g., "It's raining?"), we investigated the perception and adaptation of receptive prosody in autistic adolescents and two groups of non-autistic controls. Autistic adolescents showed attenuated adaptivity in categorizing prosody, whereas they were equivalent to controls in discrimination accuracy. Combined with recent findings in segmental (e.g., phoneme) recognition, the current results provide the basis for an emerging research framework in which attenuated flexibility and reduced influence of contextual feedback are a possible source of deficits that hinder linguistic and social communication in autism.

Subjects

Autistic Disorder , Speech Perception , Humans , Adolescent , Male , Female , Speech Perception/physiology , Autistic Disorder/physiopathology , Autistic Disorder/psychology , Language , Child , Speech/physiology

11.

Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models.

Alhazmi, Ali; Mahmud, Rohana; Idris, Norisma; Mohamed Abo, Mohamed Elhag; Eke, Christopher Ifeanyi.

PLoS One ; 19(7): e0305657, 2024.

Article in English | MEDLINE | ID: mdl-39018339

ABSTRACT

Technological developments over the past few decades have changed the way people communicate, with platforms like social media and blogs becoming vital channels for international conversation. Even though hate speech is vigorously suppressed on social media, it remains a concern that must be continuously detected and monitored. Despite considerable efforts in this area for English-language social media content, the Arabic language poses particular difficulties for hate speech detection. Arabic calls for particular consideration because of its many dialects and linguistic nuances. Another layer of complexity is added by the widespread practice of "code-mixing," in which users seamlessly merge different languages. Recognizing this research gap, the study aims to close it by examining how well machine learning models using different features can detect hate speech, especially in Arabic tweets featuring code-mixing. The objective of this study is therefore to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic hate speech and code-mixed hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of the constructed models. The analysis revealed that TF-IDF features combined with the stochastic gradient descent (SGD) model attained the highest accuracy, 98.21%. These results were then compared with outcomes from three existing studies, and the proposed method outperformed them, underscoring its significance. Our study therefore carries practical implications and serves as a foundational exploration in the realm of automated hate speech detection in text.

Subjects

Language , Machine Learning , Social Media , Humans , Speech/physiology
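TF-IDF weighting, which the abstract reports as the best-performing feature, can be sketched in plain Python. The variant below (relative term frequency times log(N / df)) is one textbook form and an assumption here. scikit-learn's `TfidfVectorizer`, for instance, defaults to a smoothed idf, ln((1 + N) / (1 + df)) + 1, with L2-normalized rows. The resulting vectors would then be passed to a linear classifier trained by stochastic gradient descent, such as scikit-learn's `SGDClassifier`; that step is not shown. The tokenized tweets are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency),
    so a term occurring in every document gets weight 0.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical code-mixed tweets, already tokenized (Arabic and English tokens mixed).
tweets = [
    ["يا", "bro", "هذا", "bad"],
    ["يا", "bro", "شكرا"],
    ["شكرا", "thanks"],
]
for vector in tfidf(tweets):
    print(vector)
```

Because the vocabulary is built over raw tokens, Arabic-script and Latin-script words in a code-mixed tweet are simply treated as separate terms. No transliteration or language identification is assumed.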

12.

Structural and sequential regularities modulate phrase-rate neural tracking.

Zhao, Junyuan; Martin, Andrea E; Coopmans, Cas W.

Sci Rep ; 14(1): 16603, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-39025957

ABSTRACT

Electrophysiological brain activity has been shown to synchronize with the quasi-regular repetition of grammatical phrases in connected speech, so-called phrase-rate neural tracking. Current debate centers on whether this phenomenon is best explained in terms of the syntactic properties of phrases or in terms of syntax-external information, such as the sequential repetition of parts of speech. As these two factors were confounded in previous studies, much of the literature is compatible with both accounts. Here, we used electroencephalography (EEG) to determine if and when the brain is sensitive to both types of information. Twenty native speakers of Mandarin Chinese listened to isochronously presented streams of monosyllabic words, which contained either grammatical two-word phrases (e.g., catch fish, sell house) or non-grammatical word combinations (e.g., full lend, bread far). Within the grammatical conditions, we varied two structural factors: the position of the head of each phrase and the type of attachment. Within the non-grammatical conditions, we varied the consistency with which parts of speech were repeated. Tracking was quantified through evoked power and inter-trial phase coherence, both derived from the frequency-domain representation of EEG responses. As expected, neural tracking at the phrase rate was stronger in grammatical sequences than in non-grammatical sequences without syntactic structure. Moreover, it was modulated by both attachment type and head position, revealing the structure-sensitivity of phrase-rate tracking. We additionally found that the brain tracks the repetition of parts of speech in non-grammatical sequences. These data provide an integrative perspective on the current debate about neural tracking effects, revealing that the brain uses regularities computed over multiple levels of linguistic representation to guide rhythmic computation.

Subjects

Brain , Electroencephalography , Humans , Male , Female , Adult , Brain/physiology , Young Adult , Language , Speech Perception/physiology , Speech/physiology
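The abstract quantifies tracking via evoked power and inter-trial phase coherence (ITPC), both taken from the frequency-domain EEG response. A minimal single-bin sketch on synthetic epochs follows. The epoch length, bin index, and helper names are hypothetical, and a real pipeline would use an FFT library on per-electrode data. With isochronous monosyllables grouped into two-word phrases, the phrase rate is half the word rate, so the bin of interest would sit at half the word-rate bin.

```python
import cmath
import math

def dft_bin(signal, k):
    """Complex Fourier coefficient of `signal` at frequency bin k."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))

def inter_trial_phase_coherence(trials, k):
    """|mean of unit phasors| across trials: 1 = identical phase, ~0 = scattered phase."""
    phasors = [c / abs(c) for c in (dft_bin(trial, k) for trial in trials)]
    return abs(sum(phasors) / len(phasors))

def evoked_power(trials, k):
    """Power at bin k of the trial-averaged (evoked) response."""
    n = len(trials[0])
    average = [sum(trial[t] for trial in trials) / len(trials) for t in range(n)]
    return abs(dft_bin(average, k)) ** 2

# Hypothetical epochs: 64 samples each, with a phrase-rate component at bin 4.
N, K = 64, 4

def epoch(phase):
    return [math.cos(2 * math.pi * K * t / N + phase) for t in range(N)]

aligned = [epoch(0.0) for _ in range(4)]
scattered = [epoch(p) for p in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
print(round(inter_trial_phase_coherence(aligned, K), 3))    # 1.0
print(round(inter_trial_phase_coherence(scattered, K), 3))  # 0.0
```

Both measures drop when phases scatter across trials, but for different reasons. Evoked power shrinks because averaging cancels the response before the transform, whereas ITPC discards amplitude and measures phase consistency alone.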

13.

Focus-marking in a tonal language: Prosodic differences between Cantonese-speaking children with and without autism spectrum disorder.

Chen, Si; Zhang, Yixin; Zhou, Fang; Chan, Angel; Li, Bei; Li, Bin; Tang, Tempo; Chun, Eunjin; Chen, Zhuoming.

PLoS One ; 19(7): e0306272, 2024.

Article in English | MEDLINE | ID: mdl-39028710

ABSTRACT

Abnormal speech prosody has been widely reported in individuals with autism. Many studies on children and adults with autism spectrum disorder speaking a non-tonal language have shown deficits in using prosodic cues to mark focus. However, focus marking by autistic children speaking a tonal language has rarely been examined. Cantonese-speaking children may face additional difficulties because tonal languages require them to use prosodic cues to achieve multiple functions simultaneously, such as lexical contrast and focus marking. This study bridges this research gap by acoustically evaluating the use of Cantonese speech prosody to mark information structure by Cantonese-speaking children with and without autism spectrum disorder. We designed speech production tasks to elicit natural broad and narrow focus production among these children in sentences with different tone combinations. Acoustic correlates of prosodic focus marking, such as the f0, duration and intensity of each syllable, were analyzed to examine the effects of participant group, focus condition and lexical tone. Our results showed differences in focus marking patterns between Cantonese-speaking children with and without autism spectrum disorder. The autistic children not only showed insufficient on-focus expansion in f0 range and duration when marking focus, but also produced less distinctive tone shapes in general. There was no evidence that prosodic complexity (i.e., sentences with single tones or combinations of tones) significantly affected focus marking in these autistic children and their typically developing (TD) peers.

Subjects

Autism Spectrum Disorder , Language , Humans , Autism Spectrum Disorder/physiopathology , Autism Spectrum Disorder/psychology , Male , Female , Child , Speech Acoustics , Child, Preschool , Speech/physiology

14.

Comparison of speech changes caused by four different orthodontic retainers: a crossover randomized clinical trial.

Lorenzoni, Diego Coelho; Henriques, José Fernando Castanha; Silva, Letícia Korb da; Rosa, Raquel Rodrigues; Berretin-Felix, Giédre; Freitas, Karina Maria Salvatore; Janson, Guilherme.

Dental Press J Orthod ; 29(3): e2423277, 2024.

Article in English | MEDLINE | ID: mdl-38985077

ABSTRACT

OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech. MATERIAL AND METHODS: Eligibility criteria for sample selection were: subjects aged 20-40 years with acceptable occlusion who were native speakers of Portuguese. The volunteers (n=21) were divided into four groups randomized with a 1:1:1:1 allocation ratio. The four groups used, in random order, the four types of retainers full-time for 21 days each, with a washout period of 7 days. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with a mean age of 27.08 years (SD=4.65). Speech was evaluated on recordings of vocal excerpts made before, immediately after, and 21 days after the installation of each retainer, using auditory-perceptual analysis and acoustic analysis of the formant frequencies F1 and F2 of the vowels. Repeated measures ANOVA and Friedman with Tukey tests were used for statistical comparison. RESULTS: Speech changes increased immediately after conventional wraparound and thermoplastic retainer installation, and decreased after 21 days, but not to normal levels. However, this increase was statistically significant only for the wraparound with anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at the initial time point, and the changes persisted with the conventional, U-shaped and thermoplastic appliances after three weeks. CONCLUSIONS: The thermoplastic retainer was more harmful to speech than the wraparound appliances. The conventional and U-shaped retainers interfered less with speech. The three-week period was not sufficient for speech adaptation.

Subjects

Cross-Over Studies , Orthodontic Retainers , Humans , Female , Male , Adult , Orthodontic Appliance Design , Young Adult , Speech/physiology

15.

Speech Transforms into Text I "See": From the time I learned to read, I have experienced a form of mental closed-captioning called ticker-tape synesthesia.

Makowski, Emily.

Sci Am ; 331(1): 90, 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-39017518

Subjects

Synesthesia , Humans , Reading , Speech

16.

Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network.

Gul, Sania; Khan, Muhammad Salman; Ur-Rehman, Ata.

PLoS One ; 19(7): e0301692, 2024.

Article in English | MEDLINE | ID: mdl-39012881

ABSTRACT

Speech enhancement is crucial for both human and machine listening applications. Over the last decade, the use of deep learning for speech enhancement has resulted in tremendous improvement over classical signal processing and machine learning methods. However, training a deep neural network is not only time-consuming; it also requires extensive computational resources and a large training dataset. Transfer learning, i.e., using a pretrained network for a new task, helps by reducing training time, computational resources, and the required dataset size, but the network still needs to be fine-tuned for the new task. This paper presents a novel method of speech denoising and dereverberation (SD&D) on an end-to-end frozen binaural anechoic speech separation network. The frozen network requires neither architectural changes nor fine-tuning for the new task, as is usually required for transfer learning. The interaural cues of a source placed in noisy and echoic surroundings are given as input to this pretrained network to extract the target speech from noise and reverberation. Although the pretrained model used in this paper never saw noisy reverberant conditions during training, it performs satisfactorily in zero-shot testing (ZST) under these conditions. This is because the pretrained model was trained on the direct-path interaural cues of an active source, so it can recognize them even in the presence of echoes and noise. ZST on the same dataset on which the pretrained network was trained (homo-corpus), for the unseen class of interference, has shown considerable improvement over the weighted prediction error (WPE) algorithm in terms of four objective speech quality and intelligibility metrics. The proposed model also offers performance similar to that of a deep learning SD&D algorithm on this dataset under varying conditions of noise and reverberation. Similarly, ZST on a different dataset has provided improved intelligibility and almost equivalent quality compared with the WPE algorithm.

Subjects

Noise , Humans , Speech , Deep Learning , Signal-To-Noise Ratio , Neural Networks, Computer , Speech Perception/physiology , Algorithms , Signal Processing, Computer-Assisted

17.

Who is singing? Voice recognition from spoken versus sung speech.

Cooper, Angela; Eitel, Matthew; Fecher, Natalie; Johnson, Elizabeth; Cirelli, Laura K.

JASA Express Lett ; 4(6), 2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38888432

ABSTRACT

Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.

Subjects

Singing , Speech Perception , Humans , Male , Female , Adult , Speech Perception/physiology , Voice , Young Adult , Recognition, Psychology , Speech

18.

Chunk boundaries disrupt dependency processing in an AG: Reconciling incremental processing and discrete sampling.

Lo, Chia-Wen; Meyer, Lars.

PLoS One ; 19(6): e0305333, 2024.

Article in English | MEDLINE | ID: mdl-38889141

ABSTRACT

Language is rooted in our ability to compose: We link words together, fusing their meanings. Links are not limited to neighboring words but often span intervening words. The ability to process these non-adjacent dependencies (NADs) conflicts with the brain's sampling of speech: We consume speech in chunks that are limited in time and contain only a limited number of words. It is unknown how we link together words that belong to separate chunks. Here, we report that we cannot, at least not as well. In our electroencephalography (EEG) study, 37 human listeners learned chunks and dependencies from an artificial grammar (AG) composed of syllables. The multi-syllable chunks to be learned were equal-sized, allowing us to employ a frequency-tagging approach. On top of chunks, syllable streams contained NADs that were either confined to a single chunk or crossed a chunk boundary. Frequency analyses of the EEG revealed a spectral peak at the chunk rate, showing that participants learned the chunks. NADs that crossed boundaries were associated with smaller electrophysiological responses than within-chunk NADs. This shows that NADs are processed readily when they are confined to the same chunk, but not as well when they cross a chunk boundary. Our findings help to reconcile the classical notion that language is processed incrementally with recent evidence for discrete perceptual sampling of speech. This has implications for language acquisition and processing as well as for the general view of syntax in human language.

Subjects

Electroencephalography , Language , Humans , Female , Male , Adult , Young Adult , Speech Perception/physiology , Speech/physiology , Learning/physiology , Brain/physiology

19.

Posterior tongue tie: that is a thing?

Black, Kaelan.

Curr Opin Otolaryngol Head Neck Surg ; 32(4): 282-285, 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-38869616

Abstract

PURPOSE OF REVIEW: The purpose of this review is to examine current research on the posterior tongue tie and how it relates to breastfeeding, solid feeding, and speech. RECENT FINDINGS: Recent findings show that the posterior tongue tie may play a role in effective breastfeeding. SUMMARY: Ankyloglossia is the term for restricted tongue movement that impairs functions such as breastfeeding or bottle feeding, feeding with solids, and speech. Cadaver studies have shown that the tongue and oral tissues can be more restricted in some people than in others. In some breastfeeding studies, releasing the posterior tie has been shown to improve certain aspects of tongue movement. There is little evidence for or against posterior tongue ties contributing to other problems such as speech and solid feeding. This article reviews the current studies on posterior ankyloglossia in depth.

Subjects

Ankyloglossia , Breast Feeding , Tongue , Humans , Speech/physiology

20.

A data-efficient and easy-to-use lip language interface based on wearable motion capture and speech movement reconstruction.

Liu, Shiqiang; Fawden, Terry; Zhu, Rong; Malliaras, George G; Bance, Manohar.

Sci Adv ; 10(26): eado9576, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38924408

Abstract

Lip language recognition urgently needs wearable, easy-to-use interfaces for interference-free, high-fidelity lip-reading acquisition, as well as accompanying data-efficient decoder-modeling methods. Existing solutions suffer from unreliable lip reading, are data hungry, and exhibit poor generalization. Here, we propose a wearable lip language decoding technology that enables interference-free and high-fidelity acquisition of lip movements and data-efficient recognition of fluent lip language based on wearable motion capture and continuous lip speech movement reconstruction. The method allows us to artificially generate any desired continuous speech dataset from a very limited corpus of word samples from users. By using these artificial datasets to train the decoder, we achieve an average accuracy of 92.0% across individuals (n = 7) for actual continuous and fluent lip speech recognition of 93 English sentences, with no observed training burden on users because all training datasets are artificially generated. Our method greatly minimizes users' training/learning load and presents a data-efficient and easy-to-use paradigm for lip language recognition.

Subjects

Speech , Wearable Electronic Devices , Humans , Language , Lip/physiology , Movement , Male , Female , Adult , Lipreading , Motion Capture
