Results 1-20 of 28,239
1.
Commun Biol ; 7(1): 818, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969758

ABSTRACT

Speech brain-computer interfaces aim to support communication-impaired patients by translating neural signals into speech. While impressive progress has been achieved in decoding performed, perceived and attempted speech, imagined speech remains elusive, mainly due to the absence of behavioral output. Nevertheless, imagined speech is advantageous since it does not depend on any articulator movements that might become impaired or even lost throughout the stages of a neurodegenerative disease. In this study, we analyzed electrocorticography data recorded from 16 participants in response to 3 speech modes: performed, perceived (listening), and imagined speech. We used a linear model to detect speech events and examined the contributions of each frequency band, from delta to high gamma, given the speech mode and electrode location. For imagined speech detection, we observed a strong contribution of gamma bands in the motor cortex, whereas lower frequencies were more prominent in the temporal lobe, particularly in the left hemisphere. Based on the similarities in frequency patterns, we were able to transfer models between speech modes and participants with similar electrode locations.
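The linear detection approach this abstract describes can be illustrated with a minimal sketch: a weighted sum of per-band power features passed through a logistic link. All band names, weights, and feature values below are hypothetical illustrations, not values from the study.

```python
import math

# Frequency bands from delta to high gamma, as in the abstract.
BANDS = ["delta", "theta", "alpha", "beta", "gamma", "high_gamma"]

def detect_speech_event(band_power, weights, bias=0.0, threshold=0.5):
    """Score one analysis window with a linear model and return True
    if it is classified as a speech event."""
    score = bias + sum(weights[b] * band_power[b] for b in BANDS)
    prob = 1.0 / (1.0 + math.exp(-score))  # logistic link
    return prob >= threshold

# Imagined-speech-like weighting: gamma bands dominate (hypothetical values).
weights = {"delta": 0.1, "theta": 0.1, "alpha": -0.2,
           "beta": 0.3, "gamma": 0.9, "high_gamma": 1.2}
window = {"delta": 0.2, "theta": 0.1, "alpha": 0.3,
          "beta": 0.4, "gamma": 0.8, "high_gamma": 0.7}
print(detect_speech_event(window, weights))  # True
```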


Subjects
Brain-Computer Interfaces, Electrocorticography, Imagination, Speech, Humans, Electrocorticography/methods, Speech/physiology, Male, Female, Adult, Imagination/physiology, Young Adult, Motor Cortex/physiology
2.
Sci Rep ; 14(1): 15611, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38971806

ABSTRACT

This study compares how English-speaking adults and children from the United States adapt their speech when talking to a real person and a smart speaker (Amazon Alexa) in a psycholinguistic experiment. Overall, participants produced more effortful speech when talking to a device (longer duration and higher pitch). These differences also varied by age: children produced even higher pitch in device-directed speech, suggesting a stronger expectation to be misunderstood by the system. In support of this, we see that after a staged recognition error by the device, children increased pitch even more. Furthermore, both adults and children displayed the same degree of variation in their responses for whether "Alexa seems like a real person or not", further indicating that children's conceptualization of the system's competence shaped their register adjustments, rather than an increased anthropomorphism response. This work speaks to models on the mechanisms underlying speech production, and human-computer interaction frameworks, providing support for routinized theories of spoken interaction with technology.


Subjects
Speech, Humans, Adult, Child, Male, Female, Speech/physiology, Young Adult, Adolescent, Psycholinguistics
3.
Sensors (Basel) ; 24(13)2024 Jun 25.
Article in English | MEDLINE | ID: mdl-39000889

ABSTRACT

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora that contain different emotional factors from those expressed in the training databases. To construct an SER model robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that assigns higher gradients to the samples in a given minibatch whose emotion labels are more difficult to estimate. Since annotators may label the emotion based on emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance, some of the emotion labels may not be reliable, and these unreliable labels can affect the proposed loss function severely. We therefore propose to apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
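Label smoothing, the mechanism the abstract applies to misclassified samples, has a standard formulation that can be sketched as follows; the class count, epsilon, and class ordering are illustrative assumptions, not the paper's setup.

```python
def smooth_label(one_hot, epsilon=0.1):
    """Label smoothing: move a fraction epsilon of the probability mass
    from the annotated class to a uniform distribution over all classes,
    softening potentially unreliable labels."""
    k = len(one_hot)
    return [(1.0 - epsilon) * p + epsilon / k for p in one_hot]

# Apply smoothing only to samples a pre-trained SER model misclassified,
# as the abstract proposes (class order here is hypothetical).
label = [0.0, 1.0, 0.0, 0.0]  # annotated emotion: class 1
misclassified = True
target = smooth_label(label) if misclassified else label
print(target)  # [0.025, 0.925, 0.025, 0.025]
```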


Subjects
Emotions, Speech, Humans, Emotions/physiology, Speech/physiology, Algorithms, Reproducibility of Results, Pattern Recognition, Automated/methods, Databases, Factual
5.
Cogn Sci ; 48(7): e13478, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38980972

ABSTRACT

How do cognitive pressures shape the lexicons of natural languages? Here, we reframe George Kingsley Zipf's proposed "law of abbreviation" within a more general framework that relates it to cognitive pressures that affect speakers and listeners. In this new framework, speakers' drive to reduce effort (Zipf's proposal) is counteracted by the need for low-frequency words to have word forms that are sufficiently distinctive to allow for accurate recognition by listeners. To support this framework, we replicate and extend recent work using the prevalence of subword phonemic sequences (phonotactic probability) to measure speakers' production effort in place of Zipf's measure of length. Across languages and corpora, phonotactic probability is more strongly correlated with word frequency than word length. We also show this measure of ease of speech production (phonotactic probability) is strongly correlated with a measure of perceptual difficulty that indexes the degree of competition from alternative interpretations in word recognition. This is consistent with the claim that there must be trade-offs between these two factors, and is inconsistent with a recent proposal that phonotactic probability facilitates both perception and production. To our knowledge, this is the first work to offer an explanation why long, phonotactically improbable word forms remain in the lexicons of natural languages.
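Phonotactic probability, the production-effort measure discussed above, is commonly estimated from subword sequence statistics. A minimal sketch using smoothed bigram probabilities over a toy corpus follows; the corpus, smoothing scheme, and use of letters rather than phonemes are illustrative assumptions, not the paper's actual estimator.

```python
import math
from collections import Counter

def bigram_phonotactic_logprob(word, corpus):
    """Average log-probability of a word's symbol bigrams, estimated from
    a corpus with add-one smoothing. Higher values mean more probable
    sequences, i.e. (per the framework above) cheaper production."""
    bigrams, unigrams = Counter(), Counter()
    for w in corpus:
        padded = "#" + w + "#"  # '#' marks word boundaries
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    vocab = len(unigrams) + 1
    padded = "#" + word + "#"
    logps = [math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
             for a, b in zip(padded, padded[1:])]
    return sum(logps) / len(logps)

corpus = ["the", "then", "them", "that", "this", "sphinx"]
common = bigram_phonotactic_logprob("then", corpus)
rare = bigram_phonotactic_logprob("sphinx", corpus)
print(common > rare)  # True: common-looking forms score higher
```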


Subjects
Language, Phonetics, Recognition, Psychology, Speech Perception, Humans, Speech
6.
PLoS One ; 19(7): e0301692, 2024.
Article in English | MEDLINE | ID: mdl-39012881

ABSTRACT

Speech enhancement is crucial for both human and machine listening applications. Over the last decade, the use of deep learning for speech enhancement has resulted in tremendous improvement over classical signal processing and machine learning methods. However, training a deep neural network is not only time-consuming; it also requires extensive computational resources and a large training dataset. Transfer learning, i.e. using a pretrained network for a new task, comes to the rescue by reducing the required training time, computational resources, and dataset size, but the network still needs to be fine-tuned for the new task. This paper presents a novel method of speech denoising and dereverberation (SD&D) based on an end-to-end frozen binaural anechoic speech separation network. The frozen network requires neither architectural changes nor fine-tuning for the new task, as is usually required for transfer learning. The interaural cues of a source placed in noisy and echoic surroundings are given as input to this pretrained network to extract the target speech from noise and reverberation. Although the pretrained model used in this paper never saw noisy reverberant conditions during its training, it performs satisfactorily in zero-shot testing (ZST) under these conditions. This is because the pretrained model was trained on the direct-path interaural cues of an active source, so it can recognize them even in the presence of echoes and noise. ZST on the same dataset on which the pretrained network was trained (homo-corpus), for an unseen class of interference, showed considerable improvement over the weighted prediction error (WPE) algorithm in terms of four objective speech quality and intelligibility metrics. The proposed model also offers performance similar to that of a deep learning SD&D algorithm on this dataset under varying noise and reverberation conditions. Similarly, ZST on a different dataset provided an improvement in intelligibility and almost equivalent quality compared to the WPE algorithm.


Assuntos
Ruído , Humanos , Fala , Aprendizado Profundo , Razão Sinal-Ruído , Redes Neurais de Computação , Percepção da Fala/fisiologia , Algoritmos , Processamento de Sinais Assistido por Computador
7.
Sci Rep ; 14(1): 16409, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39013983

ABSTRACT

A fundamental aspect of language processing is inferring others' minds from subtle variations in speech. The same word or sentence can often convey different meanings depending on its tempo, timing, and intonation, features often referred to as prosody. Although autistic children and adults are known to experience difficulty in making such inferences, it remains unclear why. We hypothesize that detail-oriented perception in autism may interfere with the inference process if it lacks the adaptivity required to cope with the variability ubiquitous in human speech. Using a novel prosodic continuum that shifts the sentence meaning gradiently from a statement (e.g., "It's raining") to a question (e.g., "It's raining?"), we investigated the perception and adaptation of receptive prosody in autistic adolescents and two groups of non-autistic controls. Autistic adolescents showed attenuated adaptivity in categorizing prosody, whereas they were equivalent to controls in terms of discrimination accuracy. Combined with recent findings in segmental (e.g., phoneme) recognition, the current results provide the basis for an emerging research framework for attenuated flexibility and reduced influence of contextual feedback as a possible source of deficits that hinder linguistic and social communication in autism.


Assuntos
Transtorno Autístico , Percepção da Fala , Humanos , Adolescente , Masculino , Feminino , Percepção da Fala/fisiologia , Transtorno Autístico/fisiopatologia , Transtorno Autístico/psicologia , Idioma , Criança , Fala/fisiologia
8.
Sci Data ; 11(1): 746, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Spanish Castilian speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as Tonsillectomy, Functional Endoscopic Sinus Surgery, and Septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per patient. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and testing of anti-spoofing methodologies for improved robustness.


Subjects
Speech, Voice, Humans, Postoperative Period, Tonsillectomy, Male, Female, Preoperative Period, Adult
9.
Sci Rep ; 14(1): 15787, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982177

ABSTRACT

Diagnostic tests for Parkinsonism based on speech samples have shown promising results. Although abnormal auditory feedback integration during speech production and impaired rhythmic organization of speech are known in Parkinsonism, these aspects have not been incorporated into diagnostic tests. This study aimed to identify Parkinsonism using a novel speech behavioral test that involved rhythmically repeating syllables under different auditory feedback conditions. The study included 30 individuals with Parkinson's disease (PD) and 30 healthy subjects. Participants were asked to rhythmically repeat the PA-TA-KA syllable sequence, both whispering and speaking aloud under various listening conditions. The results showed that individuals with PD had difficulties in whispering and articulating under altered auditory feedback conditions, exhibited delayed speech onset, and demonstrated inconsistent rhythmic structure across trials compared to controls. These parameters were then fed into a supervised machine-learning algorithm to differentiate between the two groups. The algorithm achieved an accuracy of 85.4%, a sensitivity of 86.5%, and a specificity of 84.3%. This pilot study highlights the potential of the proposed behavioral paradigm as an objective and accessible (both in cost and time) test for identifying individuals with Parkinson's disease.
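The rhythmic-consistency cue described above can be quantified, for example, as the coefficient of variation of inter-onset intervals across a syllable sequence. The onset times below are made up for illustration; the study's actual feature set is not specified in the abstract.

```python
import statistics

def rhythm_features(onsets):
    """Simple rhythm descriptors from syllable onset times (seconds):
    mean inter-onset interval (IOI) and its coefficient of variation (CV).
    A higher CV indicates a less consistent rhythmic structure."""
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    mean_ioi = statistics.mean(intervals)
    cv = statistics.stdev(intervals) / mean_ioi
    return mean_ioi, cv

# Illustrative trials: a steady control-like rhythm vs. an irregular one.
control = [0.00, 0.40, 0.80, 1.21, 1.60, 2.01]
irregular = [0.00, 0.35, 0.90, 1.20, 1.85, 2.10]
_, cv_control = rhythm_features(control)
_, cv_irregular = rhythm_features(irregular)
print(cv_control < cv_irregular)  # True: control rhythm is more consistent
```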


Subjects
Feedback, Sensory, Parkinson Disease, Speech, Humans, Female, Male, Aged, Parkinson Disease/physiopathology, Parkinson Disease/diagnosis, Middle Aged, Speech/physiology, Feedback, Sensory/physiology, Pilot Projects, Parkinsonian Disorders/physiopathology, Case-Control Studies
10.
Dental Press J Orthod ; 29(3): e2423277, 2024.
Article in English | MEDLINE | ID: mdl-38985077

ABSTRACT

OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech. MATERIAL AND METHODS: Eligibility criteria for sample selection were: subjects aged 20-40 years with acceptable occlusion, native speakers of Portuguese. The volunteers (n=21) were divided into four groups randomized with a 1:1:1:1 allocation ratio. The four groups used, in random order, the four types of retainers full-time for 21 days each, with a 7-day washout period. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with a mean age of 27.08 years (SD=4.65). Speech evaluation was performed on recordings of vocal excerpts made before, immediately after, and 21 days after the installation of each retainer, with auditory-perceptual and acoustic analysis of the formant frequencies F1 and F2 of the vowels. Repeated measures ANOVA and Friedman with Tukey tests were used for statistical comparison. RESULTS: Speech changes increased immediately after installation of the conventional wraparound and thermoplastic retainers, and decreased after 21 days, but not to normal levels. However, this increase was statistically significant only for the wraparound with anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at the initial time point, and the changes remained for the conventional, U-shaped and thermoplastic appliances after three weeks. CONCLUSIONS: The thermoplastic retainer was more harmful to speech than the wraparound appliances. The conventional and U-shaped retainers interfered less with speech. The three-week period was not sufficient for speech adaptation.


Subjects
Cross-Over Studies, Orthodontic Retainers, Humans, Female, Male, Adult, Orthodontic Appliance Design, Young Adult, Speech/physiology
11.
Sci Rep ; 14(1): 16603, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39025957

ABSTRACT

Electrophysiological brain activity has been shown to synchronize with the quasi-regular repetition of grammatical phrases in connected speech-so-called phrase-rate neural tracking. Current debate centers around whether this phenomenon is best explained in terms of the syntactic properties of phrases or in terms of syntax-external information, such as the sequential repetition of parts of speech. As these two factors were confounded in previous studies, much of the literature is compatible with both accounts. Here, we used electroencephalography (EEG) to determine if and when the brain is sensitive to both types of information. Twenty native speakers of Mandarin Chinese listened to isochronously presented streams of monosyllabic words, which contained either grammatical two-word phrases (e.g., catch fish, sell house) or non-grammatical word combinations (e.g., full lend, bread far). Within the grammatical conditions, we varied two structural factors: the position of the head of each phrase and the type of attachment. Within the non-grammatical conditions, we varied the consistency with which parts of speech were repeated. Tracking was quantified through evoked power and inter-trial phase coherence, both derived from the frequency-domain representation of EEG responses. As expected, neural tracking at the phrase rate was stronger in grammatical sequences than in non-grammatical sequences without syntactic structure. Moreover, it was modulated by both attachment type and head position, revealing the structure-sensitivity of phrase-rate tracking. We additionally found that the brain tracks the repetition of parts of speech in non-grammatical sequences. These data provide an integrative perspective on the current debate about neural tracking effects, revealing that the brain utilizes regularities computed over multiple levels of linguistic representation in guiding rhythmic computation.
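Inter-trial phase coherence, one of the two tracking measures used in the study, has a standard definition: the magnitude of the trial-averaged unit phasor at a given frequency. A minimal sketch follows; the phase values are illustrative, not EEG data.

```python
import cmath
import math

def itpc(phases):
    """Inter-trial phase coherence at one frequency: the magnitude of the
    mean unit phasor across trials. 1.0 = identical phase on every trial,
    ~0 = uniformly random phase (no phase-locking)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

aligned = [0.1, 0.12, 0.09, 0.11]                           # strong locking
random_like = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # no locking
print(itpc(aligned) > 0.99, itpc(random_like) < 1e-6)  # True True
```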


Subjects
Brain, Electroencephalography, Humans, Male, Female, Adult, Brain/physiology, Young Adult, Language, Speech Perception/physiology, Speech/physiology
12.
J Robot Surg ; 18(1): 287, 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39026112

ABSTRACT

Transoral robotic surgery (TORS) has been introduced to head and neck surgery as a minimally invasive technique to improve patients' functional outcomes. This retrospective cohort study compared functional outcomes for swallowing and speech across TORS sites within the head and neck, in patients who underwent TORS in the head and neck unit. Patients were assessed at four time points (one day, one month, six months and twelve months) with bedside/office testing. Swallowing was assessed using the International Dysphagia Diet Standardization Initiative (IDDSI), and speech was assessed using the Understandability of Speech score (USS). Outcomes were compared to patient-specific pre-treatment baseline levels. 68 patients were included. 75% and 40% of the patients resumed normal fluid intake and a normal diet, respectively, immediately after surgery. 8.8% required a temporary feeding tube, and 1% required gastrostomy. There was a steep improvement in diet between 3 and 6 months. Fluid and diet consistency dropped significantly following the majority of transoral robotic surgeries, with more noticeable diet changes. Early deterioration in diet is temporary and manageable with a modified diet. Rapid recovery of swallowing is achieved before the first year. There is no long-term effect on speech.


Subjects
Deglutition Disorders, Deglutition, Robotic Surgical Procedures, Speech, Humans, Robotic Surgical Procedures/methods, Deglutition/physiology, Male, Female, Retrospective Studies, Speech/physiology, Middle Aged, Aged, Deglutition Disorders/etiology, Treatment Outcome, Mouth, Adult, Head and Neck Neoplasms/surgery, Aged, 80 and over
13.
PLoS One ; 19(7): e0305657, 2024.
Article in English | MEDLINE | ID: mdl-39018339

ABSTRACT

Technological developments over the past few decades have changed the way people communicate, with platforms like social media and blogs becoming vital channels for international conversation. Even though hate speech is vigorously suppressed on social media, it remains a concern that needs to be constantly recognized and monitored. The Arabic language poses particular difficulties for hate speech detection, despite the considerable efforts made in this area for English-language social media content. Arabic calls for particular consideration in hate speech detection because of its many dialects and linguistic nuances. Another degree of complication is added by the widespread practice of "code-mixing," in which users merge various languages seamlessly. Recognizing this research gap, the study aims to close it by examining how well machine learning models with different feature sets can detect hate speech, especially in Arabic tweets featuring code-mixing. The objective of this study is therefore to assess and compare the effectiveness of different features and machine learning models on Arabic hate speech and code-mixed hate speech datasets. The methodology includes data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of the constructed models. The analysis revealed that the TF-IDF feature, when employed with the SGD model, attained the highest accuracy, reaching 98.21%. These results were then contrasted with outcomes from three existing studies, and the proposed method outperformed them, underscoring its significance. Consequently, our study carries practical implications and serves as a foundational exploration in the realm of automated hate speech detection in text.
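The TF-IDF feature that performed best in this study has a standard formulation; a minimal pure-Python sketch follows. The token lists are placeholders, and the smoothed-idf convention shown here is one common choice rather than the paper's exact setup.

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF vectors over tokenized documents, with smoothed idf:
    idf(t) = log((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: one count per document
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors

# Toy tokenized tweets (placeholders, not real data).
docs = [["stop", "hate", "now"],
        ["love", "peace", "now"],
        ["hate", "hate", "talk"]]
vecs = tfidf(docs)
# "hate" is discounted for appearing in 2 of 3 documents; "love" is not.
print(vecs[1]["love"] > vecs[0]["hate"])  # True
```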


Subjects
Language, Machine Learning, Social Media, Humans, Speech/physiology
14.
Sensors (Basel) ; 24(12)2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38931629

ABSTRACT

Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built on the Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a nonlinear regularization decoding method for speech recognition. First, we introduce a nonlinear Transformer decoder, breaking away from traditional left-to-right or right-to-left decoding orders and enabling associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Second, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the challenge of overly large model parameters. The experimental results indicate that our model performs well. Compared to the baseline, our model achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, and Free ST Chinese Corpus datasets and the Uyghur Common Voice 16.1 dataset, respectively.


Subjects
Algorithms, Speech Recognition Software, Humans, Speech/physiology, Nonlinear Dynamics, Pattern Recognition, Automated/methods
15.
Sci Data ; 11(1): 700, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937483

ABSTRACT

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up and help beat coronavirus' digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% with linked influenza PCR test results.


Subjects
COVID-19, Humans, Cough, COVID-19/diagnosis, Exhalation, Machine Learning, Polymerase Chain Reaction, Speech, United Kingdom
16.
Curr Opin Otolaryngol Head Neck Surg ; 32(4): 282-285, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38869616

ABSTRACT

PURPOSE OF REVIEW: The purpose of this review is to examine current research on the posterior tongue tie and how it relates to breastfeeding, solid feeding, and speech. RECENT FINDINGS: Recent findings show that the posterior tongue tie may play a role in effective breastfeeding. SUMMARY: Ankyloglossia is the term used for restriction of the movement of the tongue that impairs certain functions such as breastfeeding or bottle feeding, feeding with solids, and speech. Cadaver studies have shown that there can be a restriction of the tongue and oral tissues in some people relative to others. In some breastfeeding studies, releasing the posterior tie has been shown to improve certain aspects of tongue movement. There is little evidence for or against posterior tongue ties contributing to other problems such as speech and solid feeding. This article goes into depth about the current studies on posterior ankyloglossia.


Subjects
Ankyloglossia, Breast Feeding, Tongue, Humans, Speech/physiology
17.
Neuropsychologia ; 201: 108944, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-38925511

ABSTRACT

The present study investigated how instructions for paying attention to auditory feedback may affect speech error detection and sensorimotor control. Electroencephalography (EEG) and speech signals were recorded from 21 neurologically intact adult subjects while they produced the speech vowel sound /a/ and received randomized ±100 cents pitch-shift alterations in their real-time auditory feedback. Subjects were instructed to pay attention to their auditory feedback and press a button to indicate whether they detected a pitch-shift stimulus during trials. Data for this group was compared with 22 matched subjects who completed the same speech task under altered auditory feedback condition without attentional instructions. Results revealed a significantly smaller magnitude of speech compensations in the attentional-instruction vs. no-instruction group and a positive linear association between the magnitude of compensations and P2 event-related potential (ERP) amplitudes. In addition, we found that the amplitude of P2 ERP component was significantly larger in the attentional-instruction vs. no-instruction group. Source localization analysis showed that this effect was accounted for by significantly stronger neural activities in the right hemisphere insula, precentral gyrus, postcentral gyrus, transverse temporal gyrus, and superior temporal gyrus in the attentional-instruction group. These findings suggest that attentional instructions may enhance speech auditory feedback error detection, and subsequently improve sensorimotor control via generating more stable speech outputs (i.e., smaller compensations) in response to pitch-shift alterations. Our data are informative for advancing theoretical models and motivating targeted interventions with a focus on the role of attentional instructions for improving treatment outcomes in patients with motor speech disorders.


Subjects
Attention, Electroencephalography, Feedback, Sensory, Speech, Humans, Male, Attention/physiology, Female, Adult, Young Adult, Feedback, Sensory/physiology, Speech/physiology, Speech Perception/physiology, Evoked Potentials/physiology, Acoustic Stimulation, Brain/physiology, Brain/diagnostic imaging, Brain Mapping
18.
Physiol Behav ; 283: 114615, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38880296

ABSTRACT

This study investigates the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of the attractiveness of female voices. In the production study, data were gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. While our results from the production study support earlier findings of a relationship between testosterone and mean fo in males (mediated by cortisol), they also highlight the relevance of the speech material: the effect of testosterone was strongest in sustained vowels, potentially because hormones exert a stronger effect on physiologically constrained tasks such as sustained vowels than on freer speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.


Subjects
Cues, Hydrocortisone, Saliva, Speech Perception, Speech, Testosterone, Voice, Humans, Testosterone/metabolism, Testosterone/pharmacology, Male, Adult, Young Adult, Saliva/metabolism, Saliva/chemistry, Hydrocortisone/metabolism, Speech Perception/physiology, Speech Perception/drug effects, Speech/physiology, Speech/drug effects, Voice/drug effects, Female, Beauty, Acoustic Stimulation
19.
Sci Rep ; 14(1): 14698, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38926416

ABSTRACT

Accommodating talker variability is a complex and multi-layered cognitive process. It involves shifting attention to the vocal characteristics of the talker as well as the linguistic content of their speech. Due to an interdependence between voice and phonological processing, multi-talker environments typically incur additional processing costs compared to single-talker environments. A failure or inability to efficiently distribute attention over multiple acoustic cues in the speech signal may have detrimental language learning consequences. Yet, no studies have examined effects of multi-talker processing in populations with atypical perceptual, social and language processing for communication, including autistic people. Employing a classic word-monitoring task, we investigated effects of talker variability in Australian English autistic (n = 24) and non-autistic (n = 28) adults. Listeners responded to target words (e.g., apple, duck, corn) in randomised sequences of words. Half of the sequences were spoken by a single talker and the other half by multiple talkers. Results revealed that autistic participants' sensitivity scores to accurately-spotted target words did not differ to those of non-autistic participants, regardless of whether they were spoken by a single or multiple talkers. As expected, the non-autistic group showed the well-established processing cost associated with talker variability (e.g., slower response times). Remarkably, autistic listeners' response times did not differ across single- or multi-talker conditions, indicating they did not show perceptual processing costs when accommodating talker variability. The present findings have implications for theories of autistic perception and speech and language processing.


Subjects
Autistic Disorder, Speech Perception, Humans, Male, Female, Adult, Speech Perception/physiology, Autistic Disorder/physiopathology, Autistic Disorder/psychology, Young Adult, Reaction Time/physiology, Speech/physiology, Attention/physiology, Middle Aged, Language
20.
J Alzheimers Dis ; 100(1): 1-27, 2024.
Article in English | MEDLINE | ID: mdl-38848181

ABSTRACT

Background: Dementia is a general term for several progressive neurodegenerative disorders including Alzheimer's disease. Timely and accurate detection is crucial for early intervention. Advancements in artificial intelligence present significant potential for using machine learning to aid in early detection. Objective: Summarize the state-of-the-art machine learning-based approaches for dementia prediction, focusing on non-invasive methods, as the burden on the patients is lower. Specifically, the analysis of gait and speech performance can offer insights into cognitive health through clinically cost-effective screening methods. Methods: A systematic literature review was conducted following the PRISMA protocol (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). The search was performed on three electronic databases (Scopus, Web of Science, and PubMed) to identify the relevant studies published between 2017 and 2022. A total of 40 papers were selected for review. Results: The most common machine learning methods employed were support vector machines, followed by deep learning. Studies suggested the use of multimodal approaches, as they can provide comprehensive and better prediction performance. Deep learning application in gait studies is still in the early stages, as few studies have applied it. Moreover, including features of whole-body movement contributes to better classification accuracy. Regarding speech studies, the combination of different parameters (acoustic, linguistic, cognitive testing) produced better results. Conclusions: The review highlights the potential of machine learning, particularly non-invasive approaches, in the early prediction of dementia. The comparable prediction accuracies of manual and automatic speech analysis indicate an imminent fully automated approach for dementia detection.


Subjects
Dementia, Machine Learning, Speech, Humans, Dementia/diagnosis, Speech/physiology, Gait Analysis/methods