Results 1 - 20 of 9,526
1.
Sci Data ; 11(1): 800, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030186

ABSTRACT

This paper describes a new publicly-available database of VOiCe signals acquired in Amyotrophic Lateral Sclerosis (ALS) patients (VOC-ALS) and healthy controls performing different speech tasks. This dataset consists of 1224 voice signals recorded from 153 participants: 51 healthy controls (32 males and 19 females) and 102 ALS patients (65 males and 37 females) with different severity of dysarthria. Each subject's voice was recorded using a smartphone application (Vox4Health) while performing several vocal tasks, including a sustained phonation of the vowels /a/, /e/, /i/, /o/, /u/ and /pa/, /ta/, /ka/ syllable repetition. Basic derived speech metrics such as harmonics-to-noise ratio, mean and standard deviation of fundamental frequency (F0), jitter and shimmer were calculated. The F0 standard deviation of vowels and syllables showed an excellent ability to identify people with ALS and to discriminate the different severity of dysarthria. These data represent the most comprehensive database of voice signals in ALS and form a solid basis for research on the recognition of voice impairment in ALS patients for use in clinical applications.
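The derived metrics named in this abstract can be illustrated with a minimal sketch. It assumes the glottal cycle lengths and per-cycle peak amplitudes have already been extracted from a sustained vowel, and it uses the common "local" definitions of jitter and shimmer; the abstract does not state which variants the authors computed. The function names and sample values are hypothetical.

```python
from statistics import mean, stdev

def local_jitter(periods):
    """Mean absolute difference between consecutive cycle lengths,
    divided by the mean cycle length (assumed 'local' jitter definition)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)

def local_shimmer(amplitudes):
    """Same ratio as local_jitter, applied to per-cycle peak amplitudes."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)

def f0_stats(periods):
    """Mean and sample standard deviation of F0 (Hz) from cycle lengths (s)."""
    f0 = [1.0 / p for p in periods]
    return mean(f0), stdev(f0)

# Hypothetical cycle lengths (seconds) and peak amplitudes, for illustration only
periods = [0.0100, 0.0102, 0.0098, 0.0101, 0.0099]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]
print(local_jitter(periods), local_shimmer(amplitudes), f0_stats(periods))
```

A perfectly periodic signal gives jitter and shimmer of zero, so larger values indicate more cycle-to-cycle irregularity.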


Subject(s)
Amyotrophic Lateral Sclerosis , Dysarthria , Humans , Amyotrophic Lateral Sclerosis/physiopathology , Amyotrophic Lateral Sclerosis/complications , Dysarthria/physiopathology , Male , Female , Voice , Databases, Factual , Middle Aged , Adult , Aged , Case-Control Studies
2.
Hum Brain Mapp ; 45(10): e26724, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39001584

ABSTRACT

Music is ubiquitous, both in its instrumental and vocal forms. While speech perception at birth has been at the core of an extensive corpus of research, the origins of the ability to discriminate instrumental or vocal melodies are still not well investigated. In previous studies comparing vocal and musical perception, the vocal stimuli were mainly related to speaking, including language, and not to the non-language singing voice. In the present study, to better compare a melodic instrumental line with the voice, we used singing as a comparison stimulus, to reduce the dissimilarities between the two stimuli as much as possible, separating language perception from vocal musical perception. In total, 45 newborns were scanned (10 full-term infants and 35 preterm infants at term-equivalent age; mean gestational age at test = 40.17 weeks, SD = 0.44) using functional magnetic resonance imaging while listening to five melodies played by a musical instrument (flute) or sung by a female voice. To examine the dynamic task-based effective connectivity, we employed a psychophysiological interaction of co-activation patterns (PPI-CAPs) analysis, using the auditory cortices as seed region, to investigate moment-to-moment changes in task-driven modulation of cortical activity during an fMRI task. Our findings reveal condition-specific, dynamically occurring patterns of co-activation (PPI-CAPs). During the vocal condition, the auditory cortex co-activates with the sensorimotor and salience networks, while during the instrumental condition, it co-activates with the visual cortex and the superior frontal cortex. Our results show that the vocal stimulus elicits sensorimotor aspects of auditory perception and is processed as a more salient stimulus, while the instrumental condition activates higher-order cognitive and visuo-spatial networks. Common neural signatures for both auditory stimuli were found in the precuneus and posterior cingulate gyrus. Finally, this study adds knowledge on the dynamic brain connectivity underlying newborns' capability of early and specialized auditory processing, highlighting the relevance of dynamic approaches to study brain function in newborn populations.


Subject(s)
Auditory Perception , Magnetic Resonance Imaging , Music , Humans , Female , Male , Auditory Perception/physiology , Infant, Newborn , Singing/physiology , Infant, Premature/physiology , Brain Mapping , Acoustic Stimulation , Brain/physiology , Brain/diagnostic imaging , Voice/physiology
3.
Cognition ; 250: 105866, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38971020

ABSTRACT

Language experience confers a benefit to voice learning, a concept described in the literature as the language familiarity effect (LFE). What experiences are necessary for the LFE to be conferred is less clear. We contribute empirically and theoretically to this debate by examining within- and across-language voice learning with Cantonese-English bilingual voices in a talker-voice association paradigm. Listeners were trained in Cantonese or English and assessed on their abilities to generalize voice learning at test on Cantonese and English utterances. By testing listeners from four language backgrounds - English Monolingual, Cantonese-English Multilingual, Tone Multilingual, and Non-tone Multilingual groups - we assess whether the LFE and group-level differences in voice learning are due to varying abilities (1) in accessing the relative acoustic-phonetic features that distinguish a voice, (2) learning at a given rate, or (3) generalizing learning of talker-voice associations to novel same-language and different-language utterances. The four specific language background groups allow us to investigate the roles of language-specific familiarity, tone language experience, and generic multilingual experience in voice learning. Differences in performance across listener groups show evidence in support of the LFE and the role of two mechanisms for voice learning: the extraction and association of talker-specific, language-general information that is more robustly generalized across languages, and talker-specific, language-specific information that may be more readily accessible and learnable, but due to its language-specific nature, is less able to be extended to another language.


Subject(s)
Learning , Multilingualism , Speech Perception , Voice , Humans , Voice/physiology , Speech Perception/physiology , Female , Male , Learning/physiology , Adult , Young Adult , Language , Recognition, Psychology/physiology , Phonetics
4.
J Acoust Soc Am ; 156(1): 278-283, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38980102

ABSTRACT

How we produce and perceive voice is constrained by laryngeal physiology and biomechanics. Such constraints may present themselves as principal dimensions in the voice outcome space that are shared among speakers. This study attempts to identify such principal dimensions in the voice outcome space and the underlying laryngeal control mechanisms in a three-dimensional computational model of voice production. A large-scale voice simulation was performed with parametric variations in vocal fold geometry and stiffness, glottal gap, vocal tract shape, and subglottal pressure. Principal component analysis was applied to data combining both the physiological control parameters and voice outcome measures. The results showed three dominant dimensions accounting for at least 50% of the total variance. The first two dimensions describe respiratory-laryngeal coordination in controlling the energy balance between low- and high-frequency harmonics in the produced voice, and the third dimension describes control of the fundamental frequency. The dominance of these three dimensions suggests that voice changes along these principal dimensions are likely to be more consistently produced and perceived by most speakers than other voice changes, and thus are more likely to have emerged during evolution and be used to convey important personal information, such as emotion and larynx size.


Subject(s)
Larynx , Phonation , Principal Component Analysis , Humans , Biomechanical Phenomena , Larynx/physiology , Larynx/anatomy & histology , Voice/physiology , Vocal Cords/physiology , Vocal Cords/anatomy & histology , Computer Simulation , Voice Quality , Speech Acoustics , Pressure , Models, Biological , Models, Anatomic
5.
Sci Data ; 11(1): 746, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Spanish Castilian speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as Tonsillectomy, Functional Endoscopic Sinus Surgery, and Septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per patient. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and testing of anti-spoofing methodologies for improved robustness.


Subject(s)
Speech , Voice , Humans , Postoperative Period , Tonsillectomy , Male , Female , Preoperative Period , Adult
6.
Sci Rep ; 14(1): 16778, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039258

ABSTRACT

The present study employed the dictator game and the ultimatum game to investigate the effects of facial attractiveness, vocal attractiveness, and expressed social interest (positive, "I like you", versus negative, "I don't like you") on decision making. Female participants played against male recipients in the dictator game and the ultimatum game, and against male proposers in the ultimatum game. Results showed that participants offered recipients with attractive faces more money than recipients with unattractive faces. Participants also offered recipients with attractive voices more money than recipients with unattractive voices, especially under the positive social interest condition. Moreover, participants allocated more money to recipients who expressed positive social interest than to those who expressed negative social interest, and they also expected proposers who expressed positive social interest to offer them more money than proposers who expressed negative social interest. Overall, the results demonstrate a beauty premium for faces and voices in opposite-sex economic bargaining. Social interest also affects decision outcomes. However, the beauty premium and the effect of social interest vary with participants' roles.


Subject(s)
Beauty , Decision Making , Face , Voice , Humans , Female , Male , Young Adult , Adult , Games, Experimental
7.
Sci Rep ; 14(1): 16162, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003348

ABSTRACT

The Web has become an essential resource but is not yet accessible to everyone. Assistive technologies and innovative, intelligent frameworks, for example, those using conversational AI, help overcome some exclusions. However, some users still experience barriers. This paper shows how a human-centered approach can shed light on technology limitations and gaps. It reports on a three-step process (focus group, co-design, and preliminary validation) that we adopted to investigate how people with speech impairments, e.g., dysarthria, browse the Web and how barriers can be reduced. The methodology helped us identify challenges and create new solutions, i.e., patterns for Web browsing, by combining voice-based conversational AI, customized for impaired speech, with techniques for the visual augmentation of web pages. While current trends in AI research focus on more and more powerful large models, participants remarked how current conversational systems do not meet their needs, and how it is important to consider each one's specificity for a technology to be called inclusive.


Subject(s)
Artificial Intelligence , Internet , Voice , Humans , Voice/physiology , Male , Female , Adult , Middle Aged , Communication , Focus Groups
8.
Sci Rep ; 14(1): 16462, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39014043

ABSTRACT

The current study tested the hypothesis that the association between musical ability and vocal emotion recognition skills is mediated by accuracy in prosody perception. Furthermore, it was investigated whether this association is primarily related to musical expertise, operationalized by long-term engagement in musical activities, or musical aptitude, operationalized by a test of musical perceptual ability. To this end, we conducted three studies: In Study 1 (N = 85) and Study 2 (N = 93), we developed and validated a new instrument for the assessment of prosodic discrimination ability. In Study 3 (N = 136), we examined whether the association between musical ability and vocal emotion recognition was mediated by prosodic discrimination ability. We found evidence for a full mediation, though only in relation to musical aptitude and not in relation to musical expertise. Taken together, these findings suggest that individuals with high musical aptitude have superior prosody perception skills, which in turn contribute to their vocal emotion recognition skills. Importantly, our results suggest that these benefits are not unique to musicians, but extend to non-musicians with high musical aptitude.


Subject(s)
Aptitude , Emotions , Music , Humans , Music/psychology , Male , Female , Emotions/physiology , Aptitude/physiology , Adult , Young Adult , Speech Perception/physiology , Auditory Perception/physiology , Adolescent , Recognition, Psychology/physiology , Voice/physiology
9.
Codas ; 36(5): e20240009, 2024.
Article in English | MEDLINE | ID: mdl-39046026

ABSTRACT

PURPOSE: The study aimed to identify (1) whether the age and gender of listeners and the length of vocal stimuli affect emotion discrimination accuracy in voice; and (2) whether the determined level of expression of perceived affective emotions is age and gender-dependent. METHODS: Thirty-two age-matched listeners listened to 270 semantically neutral voice samples produced in neutral, happy, and angry intonation by ten professional actors. The participants were required to categorize the auditory stimulus based on three options and judge the intensity of emotional expression in the sample using a customized tablet web interface. RESULTS: The discrimination accuracy of happy and angry emotions decreased with age, while accuracy in discriminating neutral emotions increased with age. Females rated the intensity level of perceived affective emotions higher than males across all linguistic units. These were: for angry emotions in words (z = -3.599, p < .001), phrases (z = -3.218, p = .001), and texts (z = -2.272, p = .023), for happy emotions in words (z = -5.799, p < .001), phrases (z = -4.706, p < .001), and texts (z = -2.699, p = .007). CONCLUSION: Accuracy in perceiving vocal expressions of emotions varies according to age and gender. Young adults are better at distinguishing happy and angry emotions than middle-aged adults, while middle-aged adults tend to categorize perceived affective emotions as neutral. Gender also plays a role, with females rating expressions of affective emotions in voices higher than males. Additionally, the length of voice stimuli impacts emotion discrimination accuracy.


Subject(s)
Emotions , Speech Perception , Voice , Humans , Female , Male , Adult , Emotions/physiology , Age Factors , Young Adult , Sex Factors , Middle Aged , Speech Perception/physiology , Voice/physiology , Adolescent , Aged
10.
J Med Syst ; 48(1): 70, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39073632

ABSTRACT

This is the second in a series of studies assessing the usability and reliability of a novel voice-based delivery system of mental health screening assessments. The previous study demonstrated the reliability and patient preference of a voice-based format of the Patient Health Questionnaire 9 (PHQ 9) for measuring major depression compared to a traditional paper format. Through this study, we further examined the Amazon Alexa tool in the administration of the General Anxiety Disorder 7 (GAD 7). With a replicated methodology to the first study, 40 newly administered patients completed the GAD 7 in one format at their first session and the alternate format at their follow-up. Results from the new in-clinic population replicated the findings observed in the first PHQ 9 study: GAD 7 assessment scores for the Alexa and paper version showed a high degree of reliability (α = 0.77), patients showed higher overall positive attitudes for the voice-based GAD 7 format, and subscales for attractiveness, stimulation, and novelty were significantly higher for the voice-based format. Results also demonstrated 42 (84%) of the 50 patients who completed the voice-based format responded as being willing to use the device from home. With new recommendations of universal screening of anxiety disorders for patients below the age of 65 and rapid changes in virtual mental healthcare, convenient screenings are more important than ever. We believe this novel clinical assessment tool has the potential to improve patient behavioral healthcare while mitigating the workload of healthcare professionals.
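The reliability coefficient quoted above (α = 0.77) can be illustrated with a short sketch, assuming α denotes Cronbach's alpha; the abstract does not say which coefficient was used or how the score matrix was arranged. The function name and the toy data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per respondent, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of row totals)
    """
    k = len(scores[0])
    items = list(zip(*scores))                      # transpose to per-item columns
    item_var = sum(variance(col) for col in items)  # sample variances (n - 1)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical data: three respondents, two measurements each
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Alpha approaches 1 when the columns rise and fall together across respondents, and drops as they diverge.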


Subject(s)
Anxiety Disorders , Humans , Female , Male , Reproducibility of Results , Middle Aged , Adult , Anxiety Disorders/diagnosis , Voice , Surveys and Questionnaires , Psychometrics , Aged
11.
JASA Express Lett ; 4(6), 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38888432

ABSTRACT

Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.


Subject(s)
Singing , Speech Perception , Humans , Male , Female , Adult , Speech Perception/physiology , Voice , Young Adult , Recognition, Psychology , Speech
12.
Math Biosci Eng ; 21(5): 5947-5971, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38872565

ABSTRACT

The technology of robot-assisted prostate seed implantation has developed rapidly. However, during the process, there are some problems to be solved, such as non-intuitive visualization effects and complicated robot control. To improve the intelligence and visualization of the operation process, a voice control technology of prostate seed implantation robot in augmented reality environment was proposed. Initially, the MRI image of the prostate was denoised and segmented. The three-dimensional model of prostate and its surrounding tissues was reconstructed by surface rendering technology. Combined with holographic application program, the augmented reality system of prostate seed implantation was built. An improved singular value decomposition three-dimensional registration algorithm based on iterative closest point was proposed, and the results of three-dimensional registration experiments verified that the algorithm could effectively improve the three-dimensional registration accuracy. A fusion algorithm based on spectral subtraction and BP neural network was proposed. The experimental results showed that the average delay of the fusion algorithm was 1.314 s, and the overall response time of the integrated system was 1.5 s. The fusion algorithm could effectively improve the reliability of the voice control system, and the integrated system could meet the responsiveness requirements of prostate seed implantation.


Subject(s)
Algorithms , Augmented Reality , Magnetic Resonance Imaging , Neural Networks, Computer , Prostate , Prostatic Neoplasms , Robotics , Humans , Male , Robotics/instrumentation , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostate/diagnostic imaging , Imaging, Three-Dimensional , Voice , Robotic Surgical Procedures/instrumentation , Robotic Surgical Procedures/methods , Holography/methods , Holography/instrumentation , Brachytherapy/instrumentation , Reproducibility of Results
13.
J Acoust Soc Am ; 155(6): 3822-3832, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38874464

ABSTRACT

This study proposes the use of vocal resonators to enhance cardiac auscultation signals and evaluates their performance for voice-noise suppression. Data were collected using two electronic stethoscopes while each study subject was talking. One collected the auscultation signal from the chest while the other collected voice signals from one of three vocal resonators (cheek, back of the neck, and shoulder). The spectral subtraction method was applied to the signals. Both objective and subjective metrics were used to evaluate the quality of enhanced signals and to investigate the most effective vocal resonator for noise suppression. Our preliminary findings showed a significant improvement after enhancement and demonstrated the efficacy of vocal resonators. A listening survey was conducted with thirteen physicians to evaluate the quality of the enhanced signals, which received significantly better sound-quality scores than the original signals. The shoulder resonator group demonstrated significantly better sound quality than the cheek group when reducing voice sound in cardiac auscultation signals. The suggested method has the potential to be used for the development of an electronic stethoscope with a robust noise removal function. Significant clinical benefits are expected from the expedited preliminary diagnostic procedure.
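Spectral subtraction, the enhancement method named in this abstract, can be sketched on a single frame as follows. This is an illustrative simplification, not the authors' implementation. It assumes the reference recording (here, the resonator signal) serves as the noise estimate, uses a naive DFT from the standard library, and omits the windowing, framing, and noise averaging that practical systems use. All function names and the `floor` parameter are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_subtract(noisy, noise, floor=0.0):
    """Subtract the noise magnitude spectrum from the noisy frame.

    The phase of the noisy frame is kept; magnitudes are clamped so they
    never fall below floor * |Y| (and never below zero when floor = 0).
    """
    Y = dft(noisy)
    N = dft(noise)
    cleaned = []
    for y, n in zip(Y, N):
        mag = max(abs(y) - abs(n), floor * abs(y))
        cleaned.append(cmath.rect(mag, cmath.phase(y)))
    return idft(cleaned)

# Toy frame: a "heart" tone in bin 2 contaminated by a "voice" tone in bin 5
n = 16
heart = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
voice = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [h + v for h, v in zip(heart, voice)]
enhanced = spectral_subtract(noisy, voice)
```

In this toy case the two tones occupy different frequency bins, so the subtraction recovers the clean tone almost exactly. Real heart and voice spectra overlap, which is why a spectral floor is commonly used to limit artifacts.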


Subject(s)
Heart Auscultation , Signal Processing, Computer-Assisted , Stethoscopes , Humans , Heart Auscultation/instrumentation , Heart Auscultation/methods , Heart Auscultation/standards , Male , Female , Adult , Heart Sounds/physiology , Sound Spectrography , Equipment Design , Voice/physiology , Middle Aged , Voice Quality , Vibration , Noise
14.
Eur J Psychotraumatol ; 15(1): 2358681, 2024.
Article in English | MEDLINE | ID: mdl-38837122

ABSTRACT

Background: Research has shown that potential perpetrators and individuals high in psychopathic traits attend to body language cues to target a potential new victim. However, whether targeting also occurs by attending to vocal cues has not been examined. Thus, the role of voice in interpersonal violence merits investigation. Objective: In two studies, we examined whether perpetrators could differentiate female speakers with and without sexual and physical assault histories (presented as rating the degree of 'vulnerability' to victimization). Methods: Two samples of male listeners (sample one, N = 105; sample two, N = 109) participated. Each sample rated 18 voices (9 survivors and 9 controls). Listener sample one heard spontaneous speech, and listener sample two heard the second sentence of a standardized passage. Listeners' self-reported psychopathic traits and history of previous perpetration were measured. Results: Across both samples, history of perpetration (but not psychopathy) predicted accuracy in distinguishing survivors of assault. Conclusions: These findings highlight the potential role of voice in prevention and intervention. Gaining a further understanding of what voice cues are associated with accuracy in discerning survivors can also help us understand whether or not specialized voice training could have a role in self-defense practices.


We examined whether listeners with a history of perpetration could differentiate female speakers with and without assault histories (presented as rating the degree of 'vulnerability' to victimization). Listeners' higher history of perpetration was associated with higher accuracy in differentiating survivors of assault from non-survivors. These findings highlight that voice could have a crucial role in prevention and intervention.


Subject(s)
Survivors , Voice , Humans , Male , Female , Adult , Survivors/psychology , Cues , Crime Victims/psychology , Middle Aged
15.
Sci Rep ; 14(1): 12734, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38830969

ABSTRACT

The early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of utilizing voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio. Additionally, a small fine-tuning network was used as a classification model to output depression classification results. Subsequently, the proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-classification, with an accuracy of 0.9481 and an RMSE of 0.3810. The wav2vec 2.0 model was first used for depression recognition and showed strong generalization ability. The method is simple, practical, and applicable, which can assist doctors in the early screening of depression.


Subject(s)
Depression , Voice , Humans , Depression/diagnosis , Male , Female , Artificial Intelligence , Adult
16.
J Matern Fetal Neonatal Med ; 37(1): 2362933, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38910112

ABSTRACT

OBJECTIVE: To study the effects of playing mother's recorded voice to preterm infants in the NICU on their mothers' mental health as measured by the Depression, Anxiety and Stress Scale-21 (DASS-21) questionnaire. DESIGN/METHODS: This was a pilot single-center prospective randomized controlled trial done at a level IV NICU. The trial was registered at clinicaltrials.gov (NCT04559620). Inclusion criteria were mothers of preterm infants with gestational ages between 26 and 30 weeks. The DASS-21 questionnaire was administered to all the enrolled mothers in the first week after birth, followed by recording of their voice by the music therapists. In the interventional group, recorded maternal voice was played into the infant incubator between 15 and 21 days of life. A second DASS-21 was administered between 21 and 23 days of life. The Wilcoxon rank-sum test was used to compare DASS-21 scores between the two groups and the Wilcoxon signed-rank test was used to compare the pre- and post-intervention DASS-21 scores. RESULTS: Forty eligible mothers were randomized: 20 to the intervention group and 20 to the control group. The baseline maternal and neonatal characteristics were similar between the two groups. There was no significant difference in the DASS-21 scores between the two groups at baseline or after the study intervention. There was no difference in the pre- and post-interventional DASS-21 scores or its individual components in the experimental group. There was a significant decrease in the total DASS-21 score and the anxiety component of DASS-21 between weeks 1 and 4 in the control group. CONCLUSION: In this pilot randomized controlled study, recorded maternal voice played into the preterm infant's incubator did not have any effect on maternal mental health as measured by the DASS-21 questionnaire. Data obtained in this pilot study are useful for future randomized controlled trials (RCTs) to address this important issue.
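The between-group comparison described above relies on the Wilcoxon rank-sum test. A minimal sketch of its statistic follows, assuming the large-sample normal approximation without a tie correction to the variance; the abstract does not say which variant or software the authors used. The function names and scores are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def rank_sum_z(x, y):
    """Return (W, z): the rank sum of group x and its normal-approximation z score."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    return w, (w - mu) / sigma

# Hypothetical questionnaire totals for two small groups
w, z = rank_sum_z([1, 2, 3], [4, 5, 6])
```

With groups as small as in this toy example, an exact test would be preferable to the normal approximation.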


Subject(s)
Anxiety , Depression , Infant, Premature , Stress, Psychological , Humans , Female , Pilot Projects , Infant, Newborn , Infant, Premature/psychology , Anxiety/therapy , Adult , Stress, Psychological/therapy , Depression/therapy , Mothers/psychology , Incubators, Infant , Prospective Studies , Music Therapy/methods , Voice/physiology
17.
Commun Biol ; 7(1): 711, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862808

ABSTRACT

Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants to accept or reject person identities as recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers by using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating levels of deception and resistance to deepfake identity spoofing. On the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decoded the vocal acoustic pattern and deepfake-level (auditory cortex), as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity and object recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.


Subject(s)
Speech Perception , Humans , Male , Female , Adult , Young Adult , Speech Perception/physiology , Nerve Net/physiology , Auditory Cortex/physiology , Voice/physiology , Corpus Striatum/physiology
18.
Sci Rep ; 14(1): 13813, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38877028

ABSTRACT

Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible. Timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN) and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI), comprises 195 voice recordings collected from 31 patients. To optimize model performance, we employ various strategies including Synthetic Minority Over-sampling Technique (SMOTE) for addressing class imbalance, Feature Selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KSVM models, trained on an 80-20 split of the dataset for training and testing respectively, yield the most promising results. The FNN model achieves an impressive overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% f1-score. Similarly, the KSVM model demonstrates strong performance with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an f1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential for these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.
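The evaluation metrics reported in this abstract follow standard definitions, sketched below from binary confusion-matrix counts. The abstract does not say whether its figures are per-class or averaged across classes, so this illustrates the definitions only and is not a reproduction of the reported numbers. The function name and counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a small held-out test set
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

On a small or imbalanced test set these four numbers can diverge considerably, which is why they are usually reported together.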


Subject(s)
Machine Learning , Parkinson Disease , Parkinson Disease/diagnosis , Humans , Male , Female , Middle Aged , Aged , Neural Networks, Computer , Voice , Deep Learning
19.
J Speech Lang Hear Res ; 67(7): 1997-2020, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38861454

ABSTRACT

PURPOSE: Although different factors and voice measures have been associated with phonotraumatic vocal hyperfunction (PVH), it is unclear what percentage of individuals with PVH exhibit such differences during their daily lives. This study used a machine learning approach to quantify the consistency with which PVH manifests according to ambulatory voice measures. Analyses included acoustic parameters of phonation as well as temporal aspects of phonation and rest, with the goal of determining optimally consistent signatures of PVH. METHOD: Ambulatory neck-surface acceleration signals were recorded over 1 week from 116 female participants diagnosed with PVH and age-, sex-, and occupation-matched vocally healthy controls. The consistency of the manifestation of PVH was defined as the percentage of participants in each group that exhibited an atypical signature based on a target voice measure. Evaluation of each machine learning model used nested 10-fold cross-validation to improve the generalizability of findings. In Experiment 1, we trained separate logistic regression models based on the distributional characteristics of 14 voice measures and durations of voicing and resting segments. In Experiments 2 and 3, features of voicing and resting duration augmented the existing distributional characteristics to examine whether more consistent signatures would result. RESULTS: Experiment 1 showed that the difference in the magnitude of the first two harmonics (H1-H2) exhibited the most consistent signature (69.4% of participants with PVH and 20.4% of controls had an atypical H1-H2 signature), followed by spectral tilt over eight harmonics (73.6% of participants with PVH and 32.1% of controls had an atypical spectral tilt signature) and estimated sound pressure level (SPL; 66.9% of participants with PVH and 27.6% of controls had an atypical SPL signature). Additionally, 77.6% of participants with PVH had atypical resting duration, with 68.9% exhibiting atypical voicing duration.
Experiments 2 and 3 showed that augmenting the best-performing voice measures with univariate features of voicing or resting durations yielded only incremental improvement in the classifier's performance. CONCLUSIONS: Females with PVH were more likely to use more abrupt vocal fold closure (lower H1-H2), phonate louder (higher SPL), and take shorter vocal rests. They were also less likely to use higher fundamental frequency during their daily activities. The difference in the voicing duration signature between participants with PVH and controls had a large effect size, providing strong empirical evidence regarding the role of voice use in the development of PVH.
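The nested 10-fold cross-validation mentioned in the METHOD section can be sketched as an index-splitting scheme: an outer loop holds out a test fold, and an inner loop, run only on the remaining data, is used for model selection. This is a generic illustration, not the study's code. The function names, the sample count of 100, and the choice of 10 inner folds are assumptions; the abstract specifies only "nested 10-fold".

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model selection.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer test fold is available for training.
        outer_train = [idx for j, fold in enumerate(outer_folds) if j != i for idx in fold]
        # Re-split the outer training data for hyperparameter selection.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds) if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


if __name__ == "__main__":
    # Hypothetical sample count, chosen only to make the fold sizes easy to read.
    for outer_train, outer_test, inner_splits in nested_cv_splits(100):
        print(len(outer_train), len(outer_test), len(inner_splits))
```

In practice, a classifier such as logistic regression would be tuned on the inner splits and scored once on each outer test fold, so the reported performance is never measured on data used for tuning.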


Subject(s)
Machine Learning , Phonation , Humans , Female , Adult , Middle Aged , Phonation/physiology , Voice Disorders/physiopathology , Voice Disorders/diagnosis , Young Adult , Voice Quality/physiology , Vocal Cords/physiopathology , Speech Acoustics , Voice/physiology , Aged , Case-Control Studies
20.
Physiol Behav ; 283: 114615, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38880296

ABSTRACT

This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e. cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of attractiveness of female voices. In the production study, data is gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. 
While our results of the production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: The effect of testosterone was strongest in sustained vowels, potentially due to a strengthened effect of hormones on physiologically strongly influenced tasks such as sustained vowels in contrast to more free speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.


Subject(s)
Cues , Hydrocortisone , Saliva , Speech Perception , Speech , Testosterone , Voice , Humans , Testosterone/metabolism , Testosterone/pharmacology , Male , Adult , Young Adult , Saliva/metabolism , Saliva/chemistry , Hydrocortisone/metabolism , Speech Perception/physiology , Speech Perception/drug effects , Speech/physiology , Speech/drug effects , Voice/drug effects , Female , Beauty , Acoustic Stimulation