Results 1 - 20 of 8,857
1.
Hum Brain Mapp ; 45(10): e26724, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39001584

ABSTRACT

Music is ubiquitous, in both its instrumental and vocal forms. While speech perception at birth has been at the core of an extensive corpus of research, the origins of the ability to discriminate instrumental from vocal melodies remain poorly investigated. In previous studies comparing vocal and musical perception, the vocal stimuli were mainly related to speaking, including language, rather than to the non-linguistic singing voice. In the present study, to better compare an instrumental melodic line with the voice, we used singing as the comparison stimulus, reducing the dissimilarities between the two stimuli as much as possible and separating language perception from vocal musical perception. We scanned 45 newborns (10 full-term infants and 35 preterm infants at term-equivalent age; mean gestational age at test = 40.17 weeks, SD = 0.44) using functional magnetic resonance imaging while they listened to five melodies played by a musical instrument (flute) or sung by a female voice. To examine dynamic task-based effective connectivity, we employed a psychophysiological interaction of co-activation patterns (PPI-CAPs) analysis, with the auditory cortices as the seed region, to investigate moment-to-moment changes in task-driven modulation of cortical activity during the fMRI task. Our findings reveal condition-specific, dynamically occurring patterns of co-activation (PPI-CAPs). During the vocal condition, the auditory cortex co-activates with the sensorimotor and salience networks, whereas during the instrumental condition it co-activates with the visual cortex and the superior frontal cortex. Our results show that the vocal stimulus elicits sensorimotor aspects of auditory perception and is processed as a more salient stimulus, while the instrumental condition activates higher-order cognitive and visuospatial networks. Common neural signatures for both auditory stimuli were found in the precuneus and posterior cingulate gyrus.
Finally, this study adds to knowledge of the dynamic brain connectivity underlying newborns' capability for early and specialized auditory processing, highlighting the relevance of dynamic approaches for studying brain function in newborn populations.


Subjects
Auditory Perception , Magnetic Resonance Imaging , Music , Humans , Female , Male , Auditory Perception/physiology , Infant, Newborn , Singing/physiology , Infant, Premature/physiology , Brain Mapping , Acoustic Stimulation , Brain/physiology , Brain/diagnostic imaging , Voice/physiology
2.
Sci Rep ; 14(1): 16162, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003348

ABSTRACT

The Web has become an essential resource but is not yet accessible to everyone. Assistive technologies and innovative, intelligent frameworks, for example those using conversational AI, help overcome some forms of exclusion. However, some users still experience barriers. This paper shows how a human-centered approach can shed light on technology limitations and gaps. It reports on a three-step process (focus group, co-design, and preliminary validation) that we adopted to investigate how people with speech impairments, e.g., dysarthria, browse the Web and how barriers can be reduced. The methodology helped us identify challenges and create new solutions, i.e., patterns for Web browsing, by combining voice-based conversational AI, customized for impaired speech, with techniques for the visual augmentation of web pages. While current trends in AI research focus on ever more powerful large models, participants remarked that current conversational systems do not meet their needs, and stressed how important it is to consider each user's specific needs for a technology to be called inclusive.


Subjects
Artificial Intelligence , Internet , Voice , Humans , Voice/physiology , Male , Female , Adult , Middle Aged , Communication , Focus Groups
3.
Sci Data ; 11(1): 746, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Spanish Castilian speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as tonsillectomy, functional endoscopic sinus surgery, and septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per speaker. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and the testing of anti-spoofing methodologies for improved robustness.


Subjects
Speech , Voice , Humans , Postoperative Period , Tonsillectomy , Male , Female , Preoperative Period , Adult
4.
Sci Data ; 11(1): 800, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030186

ABSTRACT

This paper describes a new publicly available database of VOiCe signals (VOC-ALS) acquired from Amyotrophic Lateral Sclerosis (ALS) patients and healthy controls performing different speech tasks. The dataset consists of 1224 voice signals recorded from 153 participants: 51 healthy controls (32 males and 19 females) and 102 ALS patients (65 males and 37 females) with different severities of dysarthria. Each subject's voice was recorded using a smartphone application (Vox4Health) while performing several vocal tasks, including sustained phonation of the vowels /a/, /e/, /i/, /o/, and /u/ and repetition of the syllables /pa/, /ta/, and /ka/. Basic derived speech metrics, such as the harmonics-to-noise ratio, the mean and standard deviation of fundamental frequency (F0), jitter, and shimmer, were calculated. The F0 standard deviation of vowels and syllables showed an excellent ability to identify people with ALS and to discriminate between different severities of dysarthria. These data represent the most comprehensive database of voice signals in ALS and form a solid basis for research on recognizing voice impairment in ALS patients for use in clinical applications.
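Jitter, shimmer, and F0 statistics all derive from cycle-by-cycle measurements of the voice signal, and each has several published variants; the abstract does not say which variants were computed. A minimal sketch of the widely used "local" definitions (as popularized by Praat) and of F0 mean/SD, assuming glottal periods (in seconds) and per-cycle peak amplitudes have already been extracted. The function names are hypothetical, and this is not the Vox4Health pipeline.

```python
import statistics


def local_jitter(periods):
    """Local jitter (%): mean absolute difference between consecutive
    glottal periods, divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * statistics.fmean(diffs) / statistics.fmean(periods)


def local_shimmer(amplitudes):
    """Local shimmer (%): the same formula applied to per-cycle peak amplitudes."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * statistics.fmean(diffs) / statistics.fmean(amplitudes)


def f0_stats(periods):
    """Mean and sample standard deviation of F0 (Hz), with F0 = 1 / period."""
    f0 = [1.0 / t for t in periods]
    return statistics.fmean(f0), statistics.stdev(f0)
```

A perfectly steady phonation gives zero jitter and zero F0 standard deviation; the F0 standard deviation is the metric the abstract highlights for detecting ALS.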


Subjects
Amyotrophic Lateral Sclerosis , Dysarthria , Humans , Amyotrophic Lateral Sclerosis/physiopathology , Amyotrophic Lateral Sclerosis/complications , Dysarthria/physiopathology , Male , Female , Voice , Databases, Factual , Middle Aged , Adult , Aged , Case-Control Studies
5.
Sci Rep ; 14(1): 16462, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39014043

ABSTRACT

The current study tested the hypothesis that the association between musical ability and vocal emotion recognition skills is mediated by accuracy in prosody perception. Furthermore, it was investigated whether this association is primarily related to musical expertise, operationalized by long-term engagement in musical activities, or musical aptitude, operationalized by a test of musical perceptual ability. To this end, we conducted three studies: In Study 1 (N = 85) and Study 2 (N = 93), we developed and validated a new instrument for the assessment of prosodic discrimination ability. In Study 3 (N = 136), we examined whether the association between musical ability and vocal emotion recognition was mediated by prosodic discrimination ability. We found evidence for a full mediation, though only in relation to musical aptitude and not in relation to musical expertise. Taken together, these findings suggest that individuals with high musical aptitude have superior prosody perception skills, which in turn contribute to their vocal emotion recognition skills. Importantly, our results suggest that these benefits are not unique to musicians, but extend to non-musicians with high musical aptitude.
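"Full mediation" means that the direct effect of musical aptitude (X) on vocal emotion recognition (Y) vanishes once prosodic discrimination (M) is controlled for, leaving only the indirect path X → M → Y. A minimal sketch of the single-mediator product-of-coefficients decomposition, assuming ordinary least squares on complete data. Variable and function names are hypothetical; the sketch omits the significance testing (for example, bootstrapped confidence intervals for a·b) that a real analysis needs, and it is not the authors' analysis code.

```python
from statistics import fmean


def _center(values):
    mu = fmean(values)
    return [v - mu for v in values]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simple_mediation(x, m, y):
    """Return (a, b, c_prime, c) for the model X -> M -> Y fitted by OLS.

    a:       slope of M regressed on X
    b:       slope of M when Y is regressed on X and M together
    c_prime: direct effect of X on Y, controlling for M
    c:       total effect (slope of Y regressed on X alone)
    The indirect effect is a * b, and c == c_prime + a * b for OLS.
    """
    xc, mc, yc = _center(x), _center(m), _center(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx
    c = sxy / sxx
    det = sxx * smm - sxm ** 2          # zero if M is collinear with X
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return a, b, c_prime, c
```

When Y depends on X only through M, `c_prime` comes out near zero while `a * b` carries the whole total effect `c`, which is the pattern the study reports for musical aptitude.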


Subjects
Aptitude , Emotions , Music , Humans , Music/psychology , Male , Female , Emotions/physiology , Aptitude/physiology , Adult , Young Adult , Speech Perception/physiology , Auditory Perception/physiology , Adolescent , Recognition, Psychology/physiology , Voice/physiology
6.
Cognition ; 250: 105866, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38971020

ABSTRACT

Language experience confers a benefit to voice learning, a concept described in the literature as the language familiarity effect (LFE). Which experiences are necessary for the LFE to be conferred is less clear. We contribute empirically and theoretically to this debate by examining within- and across-language voice learning with Cantonese-English bilingual voices in a talker-voice association paradigm. Listeners were trained in Cantonese or English and assessed at test on their ability to generalize voice learning to Cantonese and English utterances. By testing listeners from four language backgrounds (English Monolingual, Cantonese-English Multilingual, Tone Multilingual, and Non-tone Multilingual groups), we assess whether the LFE and group-level differences in voice learning are due to varying abilities (1) in accessing the relative acoustic-phonetic features that distinguish a voice, (2) in learning at a given rate, or (3) in generalizing learned talker-voice associations to novel same-language and different-language utterances. These four language background groups allow us to investigate the roles of language-specific familiarity, tone language experience, and generic multilingual experience in voice learning. Differences in performance across listener groups show evidence in support of the LFE and of two mechanisms for voice learning: the extraction and association of talker-specific, language-general information that is more robustly generalized across languages, and of talker-specific, language-specific information that may be more readily accessible and learnable but, due to its language-specific nature, is less able to be extended to another language.


Subjects
Learning , Multilingualism , Speech Perception , Voice , Humans , Voice/physiology , Speech Perception/physiology , Female , Male , Learning/physiology , Adult , Young Adult , Language , Recognition, Psychology/physiology , Phonetics
7.
J Acoust Soc Am ; 156(1): 278-283, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38980102

ABSTRACT

How we produce and perceive voice is constrained by laryngeal physiology and biomechanics. Such constraints may present themselves as principal dimensions in the voice outcome space that are shared among speakers. This study attempts to identify such principal dimensions in the voice outcome space and the underlying laryngeal control mechanisms in a three-dimensional computational model of voice production. A large-scale voice simulation was performed with parametric variations in vocal fold geometry and stiffness, glottal gap, vocal tract shape, and subglottal pressure. Principal component analysis was applied to data combining both the physiological control parameters and voice outcome measures. The results showed three dominant dimensions accounting for at least 50% of the total variance. The first two dimensions describe respiratory-laryngeal coordination in controlling the energy balance between low- and high-frequency harmonics in the produced voice, and the third dimension describes control of the fundamental frequency. The dominance of these three dimensions suggests that voice changes along these principal dimensions are likely to be more consistently produced and perceived by most speakers than other voice changes, and thus are more likely to have emerged during evolution and be used to convey important personal information, such as emotion and larynx size.
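The "share of total variance" attributed to each principal dimension comes from the eigenvalues of the data's covariance or correlation matrix. A dependency-free sketch, assuming the variables are standardized first (so physiological control parameters and acoustic outcome measures with different units are comparable; the abstract does not state the preprocessing) and using the cyclic Jacobi eigenvalue method. It is illustrative only and not the study's code; all function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of variables given as equal-length columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mu, sd = fmean(col), pstdev(col)
        z.append([(v - mu) / sd for v in col])  # standardize each variable
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    size = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(size) for j in range(size) if i != j)
        if off < tol:
            break
        for p in range(size - 1):
            for q in range(p + 1, size):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(size):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(size):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(size)]


def explained_variance_ratio(columns):
    """Share of total variance carried by each principal component, largest first."""
    eigenvalues = sorted(jacobi_eigenvalues(correlation_matrix(columns)), reverse=True)
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]
```

Summing the first three ratios gives the quantity the abstract reports as at least 50% of the total variance.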


Subjects
Larynx , Phonation , Principal Component Analysis , Humans , Biomechanical Phenomena , Larynx/physiology , Larynx/anatomy & histology , Voice/physiology , Vocal Cords/physiology , Vocal Cords/anatomy & histology , Computer Simulation , Voice Quality , Speech Acoustics , Pressure , Models, Biological , Models, Anatomic
8.
J Matern Fetal Neonatal Med ; 37(1): 2362933, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38910112

ABSTRACT

OBJECTIVE: To study the effects of playing mothers' recorded voices to preterm infants in the NICU on the mothers' mental health, as measured by the Depression, Anxiety and Stress Scale-21 (DASS-21) questionnaire. DESIGN/METHODS: This was a single-center, prospective, randomized controlled pilot trial conducted at a level IV NICU. The trial was registered at clinicaltrials.gov (NCT04559620). Inclusion criteria were mothers of preterm infants with gestational ages between 26 and 30 weeks. The DASS-21 questionnaire was administered to all enrolled mothers in the first week after birth, followed by recording of their voices by the music therapists. In the intervention group, the recorded maternal voice was played into the infant's incubator between 15 and 21 days of life. A second DASS-21 was administered between 21 and 23 days of life. The Wilcoxon rank-sum test was used to compare DASS-21 scores between the two groups, and the Wilcoxon signed-rank test was used to compare pre- and post-intervention DASS-21 scores. RESULTS: Forty eligible mothers were randomized: 20 to the intervention group and 20 to the control group. Baseline maternal and neonatal characteristics were similar between the two groups. There was no significant difference in DASS-21 scores between the two groups at baseline or after the intervention. In the intervention group, there was no difference between pre- and post-intervention DASS-21 scores or their individual components. In the control group, there was a significant decrease in the total DASS-21 score and its anxiety component between weeks 1 and 4. CONCLUSION: In this pilot randomized controlled study, recorded maternal voice played into the preterm infant's incubator had no effect on maternal mental health as measured by the DASS-21 questionnaire. The data obtained in this pilot study will be useful for future randomized controlled trials (RCTs) addressing this important issue.
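Both Wilcoxon tests named above work on ranks rather than raw DASS-21 scores. A minimal sketch of the two test statistics, assuming midranks for ties and dropping zero differences in the signed-rank test (one common convention). It computes only the statistics, not p-values, and the function names are hypothetical; a statistics package would normally be used in practice.

```python
def midranks(values):
    """1-based ranks, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for idx in order[start:end + 1]:
            ranks[idx] = average
        start = end + 1
    return ranks


def rank_sum_statistic(group_a, group_b):
    """Wilcoxon rank-sum W: sum of group_a's ranks in the pooled sample
    (between-group comparison, e.g. intervention vs control)."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)])


def signed_rank_statistics(before, after):
    """Wilcoxon signed-rank (W+, W-) for paired scores, e.g. pre vs post
    within one group; zero differences are dropped before ranking."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

Subtracting `n_a * (n_a + 1) / 2` from the rank-sum W gives the equivalent Mann-Whitney U statistic.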


Subjects
Anxiety , Depression , Infant, Premature , Stress, Psychological , Humans , Female , Pilot Projects , Infant, Newborn , Infant, Premature/psychology , Anxiety/therapy , Adult , Stress, Psychological/therapy , Depression/therapy , Mothers/psychology , Incubators, Infant , Prospective Studies , Music Therapy/methods , Voice/physiology
9.
Physiol Behav ; 283: 114615, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38880296

ABSTRACT

This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e., cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of the attractiveness of female voices. In the production study, data were gathered from 30 healthy adult males aged 19 to 27 years (mean age: 22.4, SD: 2.2), who recorded their voices and provided saliva samples at 9 am, 12 noon, and 3 pm on a single day. Speech material consisted of sustained vowels, counting, read speech, and a free description of pictures. Biological measures comprised speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps in mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, female stimuli with higher mean fo and formants were rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed increased sensitivity to vocal cues of female attractiveness.
While the results of our production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: the effect of testosterone was strongest in sustained vowels, potentially because hormones have a stronger effect on physiologically driven tasks such as sustained vowels than on freer speech tasks such as picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.


Subjects
Cues , Hydrocortisone , Saliva , Speech Perception , Speech , Testosterone , Voice , Humans , Testosterone/metabolism , Testosterone/pharmacology , Male , Adult , Young Adult , Saliva/metabolism , Saliva/chemistry , Hydrocortisone/metabolism , Speech Perception/physiology , Speech Perception/drug effects , Speech/physiology , Speech/drug effects , Voice/drug effects , Female , Beauty , Acoustic Stimulation
10.
J Speech Lang Hear Res ; 67(7): 2139-2158, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38875480

ABSTRACT

PURPOSE: This systematic review aimed to evaluate the effects of singing as an intervention for the aging voice. METHOD: We reviewed quantitative studies of singing-based training interventions for older adults with any medical condition, with outcomes measured in respiration, phonation, and posture, the physical functions related to the aging voice. Studies in English and Chinese published up to April 2024 were searched in 31 electronic databases, and seven studies were included. The included articles were assessed according to the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) rubric. RESULTS: Seven studies were included. These studies reported outcome measures related to respiratory functions only. Regarding the intervention effect, statistically significant improvements were observed in five of the included studies, three of which had large effect sizes. The overall level of evidence was not high, with three studies rated moderate and the rest lower. The interventions also included training activities other than singing, which may have introduced co-intervention bias into the results. CONCLUSIONS: This systematic review suggests that singing as an intervention for older adults with respiratory and cognitive problems could improve respiration and respiratory-phonatory control. However, none of the included studies covered the other two physical functions related to the aging voice (phonatory and postural functions), and the overall level of evidence was not high. More research is needed on singing-based interventions specifically for patients with an aging voice.


Subjects
Aging , Singing , Humans , Aged , Aging/physiology , Voice Disorders/therapy , Phonation/physiology , Voice Quality , Voice/physiology , Respiration , Posture/physiology , Aged, 80 and over
11.
Commun Biol ; 7(1): 711, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862808

ABSTRACT

Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants in accepting or rejecting person identities as recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating levels of both deception and resistance to deepfake identity spoofing. At the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decoded the vocal acoustic pattern and deepfake level (auditory cortex), as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity and object recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.


Subjects
Speech Perception , Humans , Male , Female , Adult , Young Adult , Speech Perception/physiology , Nerve Net/physiology , Auditory Cortex/physiology , Voice/physiology , Corpus Striatum/physiology
12.
Math Biosci Eng ; 21(5): 5947-5971, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38872565

ABSTRACT

The technology of robot-assisted prostate seed implantation has developed rapidly. However, some problems in the procedure remain to be solved, such as non-intuitive visualization and complicated robot control. To improve the intelligence and visualization of the procedure, a voice control technology for a prostate seed implantation robot in an augmented reality environment was proposed. First, the MRI image of the prostate was denoised and segmented, and a three-dimensional model of the prostate and its surrounding tissues was reconstructed using surface rendering. Combined with a holographic application, this model was used to build the augmented reality system for prostate seed implantation. An improved singular value decomposition (SVD) three-dimensional registration algorithm based on iterative closest point (ICP) was proposed, and three-dimensional registration experiments verified that the algorithm could effectively improve registration accuracy. A fusion algorithm based on spectral subtraction and a backpropagation (BP) neural network was also proposed. The experimental results showed that the average delay of the fusion algorithm was 1.314 s and the overall response time of the integrated system was 1.5 s. The fusion algorithm could effectively improve the reliability of the voice control system, and the integrated system could meet the responsiveness requirements of prostate seed implantation.
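The registration step builds on the classic SVD solution for the best rigid transform between paired point sets, which ICP alternates with nearest-neighbour matching. A minimal NumPy sketch of that textbook baseline (the Kabsch-style SVD step plus a plain ICP loop), not the authors' improved variant, whose modifications are not described in the abstract. The function names are hypothetical, and the brute-force matching is only suitable for small point clouds.

```python
import numpy as np


def rigid_register(source, target):
    """Least-squares rotation R and translation t such that
    R @ p + t ≈ q for paired rows p of `source` and q of `target` (Kabsch)."""
    src = np.asarray(source, dtype=float)
    dst = np.asarray(target, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, iterations=20):
    """Plain ICP: alternate nearest-neighbour matching with the SVD step."""
    current = np.asarray(source, dtype=float).copy()
    dst = np.asarray(target, dtype=float)
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        # Brute-force nearest target point for every current source point.
        d2 = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        R, t = rigid_register(current, matched)
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t  # compose transforms
    return R_total, t_total
```

With known correspondences, `rigid_register` alone recovers the transform exactly for noise-free data; ICP is needed when correspondences are unknown and converges reliably only from a reasonable initial alignment.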


Subjects
Algorithms , Augmented Reality , Magnetic Resonance Imaging , Neural Networks, Computer , Prostate , Prostatic Neoplasms , Robotics , Humans , Male , Robotics/instrumentation , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostate/diagnostic imaging , Imaging, Three-Dimensional , Voice , Robotic Surgical Procedures/instrumentation , Robotic Surgical Procedures/methods , Holography/methods , Holography/instrumentation , Brachytherapy/instrumentation , Reproducibility of Results
13.
Sci Rep ; 14(1): 13132, 2024 06 07.
Article in English | MEDLINE | ID: mdl-38849382

ABSTRACT

Voice production of humans and most mammals is governed by the MyoElastic-AeroDynamic (MEAD) principle, where an air stream is modulated by self-sustained vocal fold oscillation to generate audible air pressure fluctuations. An alternative mechanism is found in ultrasonic vocalizations of rodents, which are established by an aeroacoustic (AA) phenomenon without vibration of laryngeal tissue. Previously, some authors argued that high-pitched human vocalization is also produced by the AA principle. Here, we investigate the so-called "whistle register" voice production in nine professional female operatic sopranos singing a scale from C6 (≈ 1047 Hz) to G6 (≈ 1568 Hz). Super-high-speed videolaryngoscopy revealed vocal fold collision in all participants, with closed quotients from 30 to 73%. Computational modeling showed that the biomechanical requirements to produce such high-pitched voice would be an increased contraction of the cricothyroid muscle, vocal fold strain of about 50%, and high subglottal pressure. Our data suggest that high-pitched operatic soprano singing uses the MEAD mechanism. Consequently, the commonly used term "whistle register" does not reflect the physical principle of a whistle with regard to voice generation in high-pitched classical singing.
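The approximate frequencies quoted for C6 and G6 follow from twelve-tone equal temperament, f = 440 · 2^((n − 69)/12) Hz for MIDI note number n, assuming the usual concert pitch A4 = 440 Hz (the abstract does not state its reference pitch). A small sketch of that conversion from scientific pitch notation; `note_to_hz` is a hypothetical helper name.

```python
import re

# Semitone offsets of the natural notes from C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name, a4_hz=440.0):
    """Frequency of a note such as 'C6' or 'F#5' in 12-tone equal temperament."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    midi = 12 * (int(octave) + 1) + semitone   # MIDI numbering: A4 = 69
    return a4_hz * 2 ** ((midi - 69) / 12)
```

C6 is MIDI note 84 and G6 is MIDI note 91, giving about 1046.5 Hz and 1568.0 Hz, which round to the values in the abstract.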


Subjects
Singing , Vocal Cords , Humans , Female , Singing/physiology , Biomechanical Phenomena , Vocal Cords/physiology , Adult , Sound , Voice/physiology , Phonation/physiology
14.
JASA Express Lett ; 4(6)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38888432

ABSTRACT

Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.


Subjects
Singing , Speech Perception , Humans , Male , Female , Adult , Speech Perception/physiology , Voice , Young Adult , Recognition, Psychology , Speech
15.
Eur J Psychotraumatol ; 15(1): 2358681, 2024.
Article in English | MEDLINE | ID: mdl-38837122

ABSTRACT

Background: Research has shown that potential perpetrators and individuals high in psychopathic traits attend to body language cues to target a potential new victim. However, whether targeting also occurs through attention to vocal cues has not been examined. Thus, the role of voice in interpersonal violence merits investigation. Objective: In two studies, we examined whether perpetrators could differentiate female speakers with and without sexual and physical assault histories (presented as rating the degree of 'vulnerability' to victimization). Methods: Two samples of male listeners (sample one, N = 105; sample two, N = 109) participated. Each sample rated 18 voices (9 survivors and 9 controls). Listener sample one heard spontaneous speech, and listener sample two heard the second sentence of a standardized passage. Listeners' self-reported psychopathic traits and history of previous perpetration were measured. Results: Across both samples, history of perpetration (but not psychopathy) predicted accuracy in distinguishing survivors of assault. Conclusions: These findings highlight the potential role of voice in prevention and intervention. A better understanding of which voice cues are associated with accuracy in discerning survivors could also help determine whether specialized voice training has a role in self-defense practices.


We examined whether listeners with a history of perpetration could differentiate female speakers with and without assault histories (presented as rating the degree of 'vulnerability' to victimization). A more extensive history of perpetration among listeners was associated with higher accuracy in differentiating survivors of assault from non-survivors. These findings highlight that voice could have a crucial role in prevention and intervention.


Subjects
Survivors , Voice , Humans , Male , Female , Adult , Survivors/psychology , Cues , Crime Victims/psychology , Middle Aged
16.
Sci Rep ; 14(1): 12734, 2024 06 03.
Article in English | MEDLINE | ID: mdl-38830969

ABSTRACT

The early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of using voice data for depression detection has been demonstrated, the problem of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-trained model was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as a classification model to output depression classification results. The proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, in binary classification the model attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, in multi-class classification it achieved an accuracy of 0.9481 and an RMSE of 0.3810. This is the first use of the wav2vec 2.0 model for depression recognition, and it showed strong generalization ability. The method is simple and practical and can assist doctors in the early screening of depression.


Subjects
Depression , Voice , Humans , Depression/diagnosis , Male , Female , Artificial Intelligence , Adult
17.
Sci Rep ; 14(1): 13813, 2024 06 15.
Article in English | MEDLINE | ID: mdl-38877028

ABSTRACT

Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible. Timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN) and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI), comprises 195 voice recordings collected from 31 patients. To optimize model performance, we employ various strategies including Synthetic Minority Over-sampling Technique (SMOTE) for addressing class imbalance, Feature Selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KSVM models, trained on an 80-20 split of the dataset for training and testing respectively, yield the most promising results. The FNN model achieves an impressive overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% f1-score. Similarly, the KSVM model demonstrates strong performance with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an f1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential for these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.
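SMOTE balances classes by interpolating between existing minority-class samples rather than duplicating them. A simplified, dependency-free sketch of that idea, assuming numeric feature vectors and Euclidean distance; it picks base samples at random instead of following the original algorithm's per-sample oversampling rate, and `smote` is a hypothetical name. In practice a library implementation such as imbalanced-learn's `SMOTE` is typically used, and oversampling should be applied to the training split only so that synthetic points do not leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate synthetic minority-class samples SMOTE-style: pick a minority
    sample, pick one of its k nearest minority neighbours, and place a new
    point at a random position on the segment between them."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay within the region the minority class already occupies.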


Subjects
Machine Learning , Parkinson Disease , Parkinson Disease/diagnosis , Humans , Male , Female , Middle Aged , Aged , Neural Networks, Computer , Voice , Deep Learning
18.
Proc Natl Acad Sci U S A ; 121(25): e2405588121, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38861607

ABSTRACT

Many animals can extract useful information from the vocalizations of other species. Neuroimaging studies have evidenced areas sensitive to conspecific vocalizations in the cerebral cortex of primates, but how these areas process heterospecific vocalizations remains unclear. Using fMRI-guided electrophysiology, we recorded the spiking activity of individual neurons in the anterior temporal voice patches of two macaques while they listened to complex sounds including vocalizations from several species. In addition to cells selective for conspecific macaque vocalizations, we identified an unsuspected subpopulation of neurons with strong selectivity for human voice, not merely explained by spectral or temporal structure of the sounds. The auditory representational geometry implemented by these neurons was strongly related to that measured in the human voice areas with neuroimaging and only weakly to low-level acoustical structure. These findings provide new insights into the neural mechanisms involved in auditory expertise and the evolution of communication systems in primates.


Subjects
Auditory Perception , Magnetic Resonance Imaging , Neurons , Vocalization, Animal , Voice , Animals , Humans , Neurons/physiology , Voice/physiology , Magnetic Resonance Imaging/methods , Vocalization, Animal/physiology , Auditory Perception/physiology , Male , Macaca mulatta , Brain/physiology , Acoustic Stimulation , Brain Mapping/methods
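The abstract above does not state how the neuronal and neuroimaging representational geometries were compared. A common approach is representational similarity analysis (RSA), sketched below under that assumption: build a representational dissimilarity matrix (RDM) from the response patterns to every pair of stimuli, then correlate two RDMs. The use of 1 - Pearson r as the dissimilarity, Spearman correlation for the comparison, and all function names are illustrative assumptions, not the authors' analysis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def rdm(responses):
    """Condensed RDM: 1 - Pearson r for every pair of stimuli.

    responses[s] is the response pattern to stimulus s (e.g. firing rates
    across recorded units, or voxel values in a region of interest).
    """
    return [1 - pearson(responses[a], responses[b])
            for a, b in combinations(range(len(responses)), 2)]


def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between two condensed RDMs."""
    return pearson(ranks(rdm_a), ranks(rdm_b))
```

For example, `compare_rdms(rdm(neuron_responses), rdm(fmri_responses))` (both names hypothetical) would score how similarly two measurements arrange the same stimulus set, provided both lists index the stimuli in the same order. Rank correlation is often preferred here because it does not assume a linear relationship between the two dissimilarity scales.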
19.
Sci Rep ; 14(1): 14575, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38914752

ABSTRACT

People often interact with groups (i.e., ensembles) during social interactions. Given that group-level information is important in navigating social environments, we expect perceptual sensitivity to aspects of groups that are relevant for personal threat as well as social belonging. Most ensemble perception research has focused on visual ensembles, with little research looking at auditory or vocal ensembles. Across four studies, we present evidence that (i) perceivers accurately extract the sex composition of a group from voices alone, (ii) judgments of threat increase concomitantly with the number of men, and (iii) listeners' sense of belonging depends on the number of same-sex others in the group. This work advances our understanding of social cognition, interpersonal communication, and ensemble coding to include auditory information, and reveals people's ability to extract relevant social information from brief exposures to vocalizing groups.


Subjects
Voice , Humans , Male , Female , Adult , Sex Ratio , Social Perception , Young Adult , Auditory Perception/physiology , Interpersonal Relations , Social Interaction
20.
J Acoust Soc Am ; 155(6): 3822-3832, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38874464

ABSTRACT

This study proposes the use of vocal resonators to enhance cardiac auscultation signals and evaluates their performance for voice-noise suppression. Data were collected using two electronic stethoscopes while each subject was talking. One collected the auscultation signal from the chest, while the other collected the voice signal from one of three vocal resonators (cheek, back of the neck, or shoulder). The spectral subtraction method was applied to the signals. Both objective and subjective metrics were used to evaluate the quality of the enhanced signals and to determine which vocal resonator was most effective for noise suppression. Our preliminary findings showed a significant improvement after enhancement and demonstrated the efficacy of vocal resonators. A listening survey was conducted with thirteen physicians to evaluate the quality of the enhanced signals, which received significantly better sound-quality scores than the original signals. The shoulder resonator group demonstrated significantly better sound quality than the cheek group when reducing voice sound in cardiac auscultation signals. The suggested method has the potential to be used in the development of an electronic stethoscope with a robust noise-removal function. Significant clinical benefits are expected from an expedited preliminary diagnostic procedure.


Subjects
Heart Auscultation , Signal Processing, Computer-Assisted , Stethoscopes , Humans , Heart Auscultation/instrumentation , Heart Auscultation/methods , Heart Auscultation/standards , Male , Female , Adult , Heart Sounds/physiology , Sound Spectrography , Equipment Design , Voice/physiology , Middle Aged , Voice Quality , Vibration , Noise
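Spectral subtraction, as named in the abstract above, removes an estimate of the noise magnitude spectrum from the noisy signal's magnitude spectrum and reuses the noisy signal's phase. The single-frame sketch below assumes, hypothetically, that the voice-reference stethoscope's magnitude spectrum, scaled by an over-subtraction factor `alpha`, serves directly as the noise estimate; the abstract does not describe how the noise estimate was formed. The function names and parameters are illustrative, and a naive DFT keeps the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (readability over speed)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_subtract(chest_frame, voice_frame, alpha=1.0, floor=0.0):
    """Single-frame magnitude spectral subtraction (hypothetical sketch).

    chest_frame: auscultation samples (heart sound plus voice leakage)
    voice_frame: simultaneous samples from the voice-reference stethoscope
    alpha: over-subtraction factor applied to the reference magnitude
    floor: minimum magnitude kept in each frequency bin
    """
    if len(chest_frame) != len(voice_frame):
        raise ValueError("frames must have equal length")
    Y = dft(chest_frame)
    N = dft(voice_frame)
    cleaned = []
    for y, nz in zip(Y, N):
        # Subtract the reference magnitude, clamping at the floor
        mag = max(abs(y) - alpha * abs(nz), floor)
        # Rebuild the bin with the phase of the noisy chest signal
        cleaned.append(cmath.rect(mag, cmath.phase(y)))
    # Input is real, so only the real part of the inverse is kept
    return [v.real for v in idft(cleaned)]
```

A practical implementation would process overlapping windowed frames (a short-time Fourier transform with overlap-add), use an FFT, and account for gain and transfer-function differences between the two stethoscopes; the single frame here only shows the per-bin operation.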