Results 1 - 20 of 9,511
1.
J Acoust Soc Am ; 156(1): 278-283, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38980102

ABSTRACT

How we produce and perceive voice is constrained by laryngeal physiology and biomechanics. Such constraints may present themselves as principal dimensions in the voice outcome space that are shared among speakers. This study attempts to identify such principal dimensions in the voice outcome space and the underlying laryngeal control mechanisms in a three-dimensional computational model of voice production. A large-scale voice simulation was performed with parametric variations in vocal fold geometry and stiffness, glottal gap, vocal tract shape, and subglottal pressure. Principal component analysis was applied to data combining both the physiological control parameters and voice outcome measures. The results showed three dominant dimensions accounting for at least 50% of the total variance. The first two dimensions describe respiratory-laryngeal coordination in controlling the energy balance between low- and high-frequency harmonics in the produced voice, and the third dimension describes control of the fundamental frequency. The dominance of these three dimensions suggests that voice changes along these principal dimensions are likely to be more consistently produced and perceived by most speakers than other voice changes, and thus are more likely to have emerged during evolution and be used to convey important personal information, such as emotion and larynx size.
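
As an illustrative sketch (not the authors' code) of the PCA step described above, one might combine standardized control parameters and outcome measures and inspect the variance captured by the leading components; the array shapes and feature counts below are placeholders:

```python
# Hypothetical sketch of PCA over combined physiological controls and
# voice outcome measures; all data below are random placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
controls = rng.normal(size=(5000, 6))   # e.g., stiffness, glottal gap, subglottal pressure
outcomes = rng.normal(size=(5000, 8))   # e.g., F0, SPL, harmonic energy measures

X = StandardScaler().fit_transform(np.hstack([controls, outcomes]))
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"Variance explained by the first 3 components: {cumulative[2]:.1%}")
```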


Subject(s)
Larynx , Phonation , Principal Component Analysis , Humans , Biomechanical Phenomena , Larynx/physiology , Larynx/anatomy & histology , Voice/physiology , Vocal Cords/physiology , Vocal Cords/anatomy & histology , Computer Simulation , Voice Quality , Speech Acoustics , Pressure , Models, Biological , Models, Anatomic
2.
Sci Data ; 11(1): 746, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982093

ABSTRACT

Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Spanish Castilian speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as Tonsillectomy, Functional Endoscopic Sinus Surgery, and Septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per patient. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition and identification methods, and testing of anti-spoofing methodologies for improved robustness.


Subject(s)
Speech , Voice , Humans , Postoperative Period , Tonsillectomy , Male , Female , Preoperative Period , Adult
3.
Commun Biol ; 7(1): 711, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862808

ABSTRACT

Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test how sensitively 25 participants accept or reject person identities recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating partial deception by, and partial resistance to, deepfake identity spoofing. At the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decodes the vocal acoustic pattern and deepfake level (auditory cortex), as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity- and object-recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.


Subject(s)
Speech Perception , Humans , Male , Female , Adult , Young Adult , Speech Perception/physiology , Nerve Net/physiology , Auditory Cortex/physiology , Voice/physiology , Corpus Striatum/physiology
4.
Math Biosci Eng ; 21(5): 5947-5971, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38872565

ABSTRACT

The technology of robot-assisted prostate seed implantation has developed rapidly. However, the process still presents problems, such as non-intuitive visualization and complicated robot control. To improve the intelligence and visualization of the operation process, a voice control technology for a prostate seed implantation robot in an augmented reality environment was proposed. Initially, the MRI image of the prostate was denoised and segmented. A three-dimensional model of the prostate and its surrounding tissues was reconstructed by surface rendering. Combined with a holographic application program, the augmented reality system for prostate seed implantation was built. An improved singular-value-decomposition three-dimensional registration algorithm based on the iterative closest point method was proposed, and three-dimensional registration experiments verified that the algorithm effectively improved registration accuracy. A fusion algorithm based on spectral subtraction and a BP neural network was also proposed. The experimental results showed that the average delay of the fusion algorithm was 1.314 s, and the overall response time of the integrated system was 1.5 s. The fusion algorithm effectively improved the reliability of the voice control system, and the integrated system met the responsiveness requirements of prostate seed implantation.
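
The SVD step at the core of an ICP-style three-dimensional registration can be sketched as follows (a generic Kabsch alignment, not the paper's improved algorithm; correspondences are assumed known here, whereas ICP re-estimates them each iteration via nearest-neighbor matching):

```python
# Rigid alignment of corresponding 3D point sets via SVD (Kabsch method).
import numpy as np

def svd_rigid_transform(P, Q):
    """Rotation R and translation t minimizing ||R @ P + t - Q||; P, Q are 3xN."""
    cP, cQ = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cP) @ (Q - cQ).T                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```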


Subject(s)
Algorithms , Augmented Reality , Magnetic Resonance Imaging , Neural Networks, Computer , Prostate , Prostatic Neoplasms , Robotics , Humans , Male , Robotics/instrumentation , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostate/diagnostic imaging , Imaging, Three-Dimensional , Voice , Robotic Surgical Procedures/instrumentation , Robotic Surgical Procedures/methods , Holography/methods , Holography/instrumentation , Brachytherapy/instrumentation , Reproducibility of Results
5.
J Acoust Soc Am ; 155(6): 3822-3832, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38874464

ABSTRACT

This study proposes the use of vocal resonators to enhance cardiac auscultation signals and evaluates their performance for voice-noise suppression. Data were collected using two electronic stethoscopes while each study subject was talking: one collected the auscultation signal from the chest while the other collected the voice signal from one of three vocal resonator locations (cheek, back of the neck, and shoulder). The spectral subtraction method was applied to the signals. Both objective and subjective metrics were used to evaluate the quality of the enhanced signals and to identify the most effective vocal resonator for noise suppression. Our preliminary findings showed a significant improvement after enhancement and demonstrated the efficacy of vocal resonators. In a listening survey conducted with thirteen physicians, the enhanced signals received significantly better sound-quality scores than the original signals. The shoulder resonator group demonstrated significantly better sound quality than the cheek group when reducing voice sound in cardiac auscultation signals. The suggested method has the potential to support the development of an electronic stethoscope with a robust noise removal function. Significant clinical benefits are expected from the expedited preliminary diagnostic procedure.
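
A minimal spectral-subtraction sketch, assuming the resonator channel supplies the voice-noise estimate for the chest channel (the frame size and spectral floor below are assumptions, not the study's settings):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(chest, voice_ref, fs, nperseg=1024, floor=0.05):
    """Subtract the average voice spectrum from the chest signal's magnitude."""
    _, _, C = stft(chest, fs=fs, nperseg=nperseg)
    _, _, V = stft(voice_ref, fs=fs, nperseg=nperseg)
    noise_mag = np.abs(V).mean(axis=1, keepdims=True)           # mean noise spectrum
    mag = np.maximum(np.abs(C) - noise_mag, floor * np.abs(C))  # apply spectral floor
    _, enhanced = istft(mag * np.exp(1j * np.angle(C)), fs=fs, nperseg=nperseg)
    return enhanced
```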


Subject(s)
Heart Auscultation , Signal Processing, Computer-Assisted , Stethoscopes , Humans , Heart Auscultation/instrumentation , Heart Auscultation/methods , Heart Auscultation/standards , Male , Female , Adult , Heart Sounds/physiology , Sound Spectrography , Equipment Design , Voice/physiology , Middle Aged , Voice Quality , Vibration , Noise
6.
JASA Express Lett ; 4(6), 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38888432

ABSTRACT

Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.


Subject(s)
Singing , Speech Perception , Humans , Male , Female , Adult , Speech Perception/physiology , Voice , Young Adult , Recognition, Psychology , Speech
7.
Sci Rep ; 14(1): 13813, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38877028

ABSTRACT

Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible, and timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN), Kernel Support Vector Machine (KSVM), and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI) repository, comprises 195 voice recordings collected from 31 individuals. To optimize model performance, we employ several strategies: the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, feature selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KSVM models, trained and tested on an 80/20 split of the dataset, yield the most promising results. The FNN model achieves an overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% F1-score. Similarly, the KSVM model demonstrates strong performance, with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an F1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential of these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.
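
A hypothetical reconstruction of the tuning pipeline described above, assuming the imbalanced-learn library so that SMOTE is applied only inside training folds; the exact search space is an assumption:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()),
                 ("smote", SMOTE(random_state=42)),   # oversample minority class
                 ("knn", KNeighborsClassifier())])
param_dist = {"knn__n_neighbors": list(range(1, 20)),
              "knn__weights": ["uniform", "distance"]}
search = RandomizedSearchCV(pipe, param_dist, n_iter=20, cv=5,
                            scoring="f1", random_state=42)
# search.fit(X_train, y_train)  # X_train, y_train: features from the 195 recordings
```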


Subject(s)
Machine Learning , Parkinson Disease , Parkinson Disease/diagnosis , Humans , Male , Female , Middle Aged , Aged , Neural Networks, Computer , Voice , Deep Learning
8.
Sci Rep ; 14(1): 12734, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38830969

ABSTRACT

The early screening of depression is highly beneficial for patients, enabling better diagnosis and treatment. While the effectiveness of using voice data for depression detection has been demonstrated, the problem of insufficient dataset size remains unresolved. We therefore propose an artificial intelligence method to identify depression effectively. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as the classification model. The proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, it demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-class classification, with an accuracy of 0.9481 and an RMSE of 0.3810. This work is the first to apply the wav2vec 2.0 model to depression recognition, and the model showed strong generalization ability. The method is simple and practical and can assist doctors in the early screening of depression.
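
A minimal sketch of the described architecture, assuming the Hugging Face implementation of wav2vec 2.0 as the feature extractor with a small classification head; DAIC-WOZ loading and fine-tuning are omitted:

```python
import torch.nn as nn
from transformers import Wav2Vec2Model

class DepressionClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.head = nn.Sequential(nn.Linear(768, 64), nn.ReLU(),
                                  nn.Linear(64, n_classes))  # small fine-tuning net

    def forward(self, waveform):                 # (batch, samples) at 16 kHz
        hidden = self.encoder(waveform).last_hidden_state
        return self.head(hidden.mean(dim=1))     # mean-pool over time frames
```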


Subject(s)
Depression , Voice , Humans , Depression/diagnosis , Male , Female , Artificial Intelligence , Adult
9.
Sci Rep ; 14(1): 14575, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38914752

ABSTRACT

People often interact with groups (i.e., ensembles) during social interactions. Given that group-level information is important in navigating social environments, we expect perceptual sensitivity to aspects of groups that are relevant for personal threat as well as social belonging. Most ensemble perception research has focused on visual ensembles, with little research looking at auditory or vocal ensembles. Across four studies, we present evidence that (i) perceivers accurately extract the sex composition of a group from voices alone, (ii) judgments of threat increase concomitantly with the number of men, and (iii) listeners' sense of belonging depends on the number of same-sex others in the group. This work advances our understanding of social cognition, interpersonal communication, and ensemble coding to include auditory information, and reveals people's ability to extract relevant social information from brief exposures to vocalizing groups.


Subject(s)
Voice , Humans , Male , Female , Adult , Sex Ratio , Social Perception , Young Adult , Auditory Perception/physiology , Interpersonal Relations , Social Interaction
10.
Proc Natl Acad Sci U S A ; 121(25): e2405588121, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38861607

ABSTRACT

Many animals can extract useful information from the vocalizations of other species. Neuroimaging studies have evidenced areas sensitive to conspecific vocalizations in the cerebral cortex of primates, but how these areas process heterospecific vocalizations remains unclear. Using fMRI-guided electrophysiology, we recorded the spiking activity of individual neurons in the anterior temporal voice patches of two macaques while they listened to complex sounds including vocalizations from several species. In addition to cells selective for conspecific macaque vocalizations, we identified an unsuspected subpopulation of neurons with strong selectivity for human voice, not merely explained by spectral or temporal structure of the sounds. The auditory representational geometry implemented by these neurons was strongly related to that measured in the human voice areas with neuroimaging and only weakly to low-level acoustical structure. These findings provide new insights into the neural mechanisms involved in auditory expertise and the evolution of communication systems in primates.


Subject(s)
Auditory Perception , Magnetic Resonance Imaging , Neurons , Vocalization, Animal , Voice , Animals , Humans , Neurons/physiology , Voice/physiology , Magnetic Resonance Imaging/methods , Vocalization, Animal/physiology , Auditory Perception/physiology , Male , Macaca mulatta , Brain/physiology , Acoustic Stimulation , Brain Mapping/methods
11.
Eur J Psychotraumatol ; 15(1): 2358681, 2024.
Article in English | MEDLINE | ID: mdl-38837122

ABSTRACT

Background: Research has shown that potential perpetrators and individuals high in psychopathic traits attend to body language cues to target a potential new victim. However, whether targeting also occurs by attending to vocal cues has not been examined. Thus, the role of voice in interpersonal violence merits investigation. Objective: In two studies, we examined whether perpetrators could differentiate female speakers with and without sexual and physical assault histories (presented as rating the degree of 'vulnerability' to victimization). Methods: Two samples of male listeners (sample one, N = 105; sample two, N = 109) participated. Each sample rated 18 voices (9 survivors and 9 controls). Listener sample one heard spontaneous speech, and listener sample two heard the second sentence of a standardized passage. Listeners' self-reported psychopathic traits and history of previous perpetration were measured. Results: Across both samples, history of perpetration (but not psychopathy) predicted accuracy in distinguishing survivors of assault. Conclusions: These findings highlight the potential role of voice in prevention and intervention. A further understanding of which voice cues are associated with accuracy in discerning survivors can also help us understand whether specialized voice training could have a role in self-defense practices.


We examined whether listeners with a history of perpetration could differentiate female speakers with and without assault histories (presented as rating the degree of 'vulnerability' to victimization). Listeners' higher history of perpetration was associated with higher accuracy in differentiating survivors of assault from non-survivors. These findings highlight that voice could have a crucial role in prevention and intervention.


Subject(s)
Survivors , Voice , Humans , Male , Female , Adult , Survivors/psychology , Cues , Crime Victims/psychology , Middle Aged
12.
Sci Rep ; 14(1): 13132, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849382

ABSTRACT

Voice production of humans and most mammals is governed by the MyoElastic-AeroDynamic (MEAD) principle, where an air stream is modulated by self-sustained vocal fold oscillation to generate audible air pressure fluctuations. An alternative mechanism is found in ultrasonic vocalizations of rodents, which are established by an aeroacoustic (AA) phenomenon without vibration of laryngeal tissue. Previously, some authors argued that high-pitched human vocalization is also produced by the AA principle. Here, we investigate the so-called "whistle register" voice production in nine professional female operatic sopranos singing a scale from C6 (≈ 1047 Hz) to G6 (≈ 1568 Hz). Super-high-speed videolaryngoscopy revealed vocal fold collision in all participants, with closed quotients from 30 to 73%. Computational modeling showed that the biomechanical requirements to produce such high-pitched voice would be an increased contraction of the cricothyroid muscle, vocal fold strain of about 50%, and high subglottal pressure. Our data suggest that high-pitched operatic soprano singing uses the MEAD mechanism. Consequently, the commonly used term "whistle register" does not reflect the physical principle of a whistle with regard to voice generation in high pitched classical singing.


Subject(s)
Singing , Vocal Cords , Humans , Female , Singing/physiology , Biomechanical Phenomena , Vocal Cords/physiology , Adult , Sound , Voice/physiology , Phonation/physiology
13.
Proc Natl Acad Sci U S A ; 121(26): e2318361121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38889147

ABSTRACT

When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we can find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
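
An illustrative representational-similarity step under stated assumptions (placeholder data; the study's actual pipeline is richer): build representational dissimilarity matrices (RDMs) from EEG patterns and from acoustic features at one time point, then correlate their condensed forms:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """(n_stimuli, n_features) -> condensed dissimilarity vector (1 - correlation)."""
    return pdist(patterns, metric="correlation")

rng = np.random.default_rng(0)
eeg_at_t = rng.random((40, 64))    # 40 voices x 64 EEG channels (placeholder)
acoustics = rng.random((40, 12))   # 40 voices x 12 acoustic features (placeholder)
rho, _ = spearmanr(rdm(eeg_at_t), rdm(acoustics))
print(f"EEG-acoustics RDM correlation at this time point: {rho:.2f}")
```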


Subject(s)
Auditory Perception , Brain , Electroencephalography , Voice , Humans , Male , Female , Voice/physiology , Adult , Brain/physiology , Auditory Perception/physiology , Young Adult , Social Perception
14.
J Speech Lang Hear Res ; 67(7): 1997-2020, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38861454

ABSTRACT

PURPOSE: Although different factors and voice measures have been associated with phonotraumatic vocal hyperfunction (PVH), it is unclear what percentage of individuals with PVH exhibit such differences during their daily lives. This study used a machine learning approach to quantify the consistency with which PVH manifests according to ambulatory voice measures. Analyses included acoustic parameters of phonation as well as temporal aspects of phonation and rest, with the goal of determining optimally consistent signatures of PVH. METHOD: Ambulatory neck-surface acceleration signals were recorded over 1 week from 116 female participants diagnosed with PVH and age-, sex-, and occupation-matched vocally healthy controls. The consistency of the manifestation of PVH was defined as the percentage of participants in each group that exhibited an atypical signature based on a target voice measure. Evaluation of each machine learning model used nested 10-fold cross-validation to improve the generalizability of findings. In Experiment 1, we trained separate logistic regression models based on the distributional characteristics of 14 voice measures and the durations of voicing and resting segments. In Experiments 2 and 3, features of voicing and resting duration augmented the existing distributional characteristics to examine whether more consistent signatures would result. RESULTS: Experiment 1 showed that the difference in the magnitude of the first two harmonics (H1-H2) exhibited the most consistent signature (69.4% of participants with PVH and 20.4% of controls had an atypical H1-H2 signature), followed by spectral tilt over eight harmonics (73.6% of participants with PVH and 32.1% of controls had an atypical spectral tilt signature) and estimated sound pressure level (SPL; 66.9% of participants with PVH and 27.6% of controls had an atypical SPL signature). Additionally, 77.6% of participants with PVH had atypical resting duration, with 68.9% exhibiting atypical voicing duration. Experiments 2 and 3 showed that augmenting the best-performing voice measures with univariate features of voicing or resting durations yielded only incremental improvement in classifier performance. CONCLUSIONS: Females with PVH were more likely to use more abrupt vocal fold closure (lower H1-H2), phonate louder (higher SPL), and take shorter vocal rests, and were less likely to use a higher fundamental frequency during their daily activities. The difference in the voicing duration signature between participants with PVH and controls had a large effect size, providing strong empirical evidence regarding the role of voice use in the development of PVH.
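
A sketch of nested 10-fold cross-validation of the kind described, with an inner loop tuning a logistic-regression penalty and an outer loop estimating generalization; the features and data below are placeholders, not the study's measures:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((232, 5))                # e.g., H1-H2 distributional features (placeholder)
y = np.r_[np.ones(116), np.zeros(116)]  # 116 PVH vs 116 matched controls

inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.01, 0.1, 1.0, 10.0]},          # inner tuning loop
                     cv=KFold(10, shuffle=True, random_state=0))
outer = cross_val_score(inner, X, y,
                        cv=KFold(10, shuffle=True, random_state=1))
print(f"Nested-CV accuracy: {outer.mean():.2f} +/- {outer.std():.2f}")
```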


Subject(s)
Machine Learning , Phonation , Humans , Female , Adult , Middle Aged , Phonation/physiology , Voice Disorders/physiopathology , Voice Disorders/diagnosis , Young Adult , Voice Quality/physiology , Vocal Cords/physiopathology , Speech Acoustics , Voice/physiology , Aged , Case-Control Studies
15.
J Speech Lang Hear Res ; 67(7): 2139-2158, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38875480

ABSTRACT

PURPOSE: This systematic review aimed to evaluate the effects of singing as an intervention for the aging voice. METHOD: Quantitative studies of interventions for older adults with any medical condition that involve singing as training were reviewed, with outcomes measured by respiration, phonation, and posture, the physical functions related to the aging voice. English and Chinese studies published up to April 2024 were searched using 31 electronic databases. The included articles were assessed according to the Grading of Recommendations, Assessment, Development, and Evaluations rubric. RESULTS: Seven studies were included. These studies reported outcome measures related to respiratory function only. Statistically significant intervention effects were observed in five of the included studies, three of which had large effect sizes. The overall level of evidence of the included studies was not high: three studies had moderate levels and the rest had lower levels. The intervention activities included training components other than singing, which may have introduced co-intervention bias into the study results. CONCLUSIONS: This systematic review suggests that singing as an intervention for older adults with respiratory and cognitive problems could improve respiration and respiratory-phonatory control. However, none of the included studies covered the other two physical functions related to the aging voice (phonatory and postural functions), and the overall level of evidence was not high. More research evidence is needed on singing-based interventions specifically for patients with an aging voice.


Subject(s)
Aging , Singing , Humans , Aged , Aging/physiology , Voice Disorders/therapy , Phonation/physiology , Voice Quality , Voice/physiology , Respiration , Posture/physiology , Aged, 80 and over
16.
J Matern Fetal Neonatal Med ; 37(1): 2362933, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38910112

ABSTRACT

OBJECTIVE: To study the effects of playing the mother's recorded voice to preterm infants in the NICU on the mothers' mental health, as measured by the Depression, Anxiety and Stress Scale-21 (DASS-21) questionnaire. DESIGN/METHODS: This was a pilot single-center prospective randomized controlled trial conducted at a level IV NICU. The trial was registered at clinicaltrials.gov (NCT04559620). Inclusion criteria were mothers of preterm infants with gestational ages between 26 and 30 weeks. The DASS-21 questionnaire was administered to all enrolled mothers in the first week after birth, followed by recording of their voice by the music therapists. In the intervention group, the recorded maternal voice was played into the infant's incubator between 15 and 21 days of life. A second DASS-21 was administered between 21 and 23 days of life. The Wilcoxon rank-sum test was used to compare DASS-21 scores between the two groups, and the Wilcoxon signed-rank test was used to compare pre- and post-intervention DASS-21 scores. RESULTS: Forty eligible mothers were randomized: 20 to the intervention group and 20 to the control group. Baseline maternal and neonatal characteristics were similar between the two groups. There was no significant difference in DASS-21 scores between the two groups at baseline or after the study intervention, and no difference in pre- and post-intervention DASS-21 scores or their individual components in the experimental group. There was a significant decrease in the total DASS-21 score and the anxiety component between weeks 1 and 4 in the control group. CONCLUSION: In this pilot randomized controlled study, recorded maternal voice played into the preterm infant's incubator did not have any effect on maternal mental health as measured by the DASS-21 questionnaire. Data obtained in this pilot study will be useful for future randomized controlled trials (RCTs) addressing this important question.
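
The two statistical comparisons named above could be run with SciPy as sketched below; the scores are invented placeholders, not trial data:

```python
import numpy as np
from scipy.stats import ranksums, wilcoxon

rng = np.random.default_rng(1)
intervention = rng.integers(0, 22, size=20)  # DASS-21 scores, intervention arm (placeholder)
control = rng.integers(0, 22, size=20)       # DASS-21 scores, control arm (placeholder)
print(ranksums(intervention, control))       # between-group rank-sum test

pre = rng.integers(0, 22, size=20)           # week-1 scores, one group (placeholder)
post = rng.integers(0, 22, size=20)          # week-4 scores, same group (placeholder)
print(wilcoxon(pre, post))                   # paired signed-rank test
```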


Subject(s)
Anxiety , Depression , Infant, Premature , Stress, Psychological , Humans , Female , Pilot Projects , Infant, Newborn , Infant, Premature/psychology , Anxiety/therapy , Adult , Stress, Psychological/therapy , Depression/therapy , Mothers/psychology , Incubators, Infant , Prospective Studies , Music Therapy/methods , Voice/physiology
17.
Acta Psychol (Amst) ; 247: 104317, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38743984

ABSTRACT

Whether self-face and self-voice are processed more accurately than those of others remains inconclusive. Most previous studies asked participants to judge the presented stimulus as their own or another's and compared response accuracy to assess a self-advantage. However, participants may have responded correctly in the "other" trials not by identifying "other" but by rejecting "self." The present study employed an identity-irrelevant discrimination task, in which participants detected the odd stimulus among three sequentially presented stimuli. We measured discrimination thresholds for the self, friend, and stranger conditions. In Experiment 1 (face), the discrimination thresholds for self and friends' faces were lower than those for strangers' faces. This suggests that the self-face may not be perceived as special or unique, and that facial representations may become more accurate with increased familiarity through repeated exposure. In contrast, in Experiment 2 (voice), discrimination thresholds did not differ between the three conditions, suggesting that sensitivity to changes is the same regardless of identity. Overall, we found no evidence for a self-advantage in identification accuracy: we observed a familiarity advantage rather than a self-advantage in face processing and a null difference in voice processing.


Subject(s)
Discrimination, Psychological , Facial Recognition , Recognition, Psychology , Voice , Humans , Recognition, Psychology/physiology , Male , Female , Facial Recognition/physiology , Young Adult , Adult , Discrimination, Psychological/physiology , Auditory Perception/physiology , Social Perception
18.
Contemp Clin Trials ; 142: 107574, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38763307

ABSTRACT

BACKGROUND: Novel and scalable psychotherapies are urgently needed to address the depression and anxiety epidemic. Leveraging artificial intelligence (AI), a voice-based virtual coach named Lumen was developed to deliver problem-solving treatment (PST). The first pilot trial showed promising changes in cognitive control, measured by functional neuroimaging, and improvements in depression and anxiety symptoms. METHODS: To further validate Lumen in a 3-arm randomized clinical trial, 200 participants with mild-to-moderate depression and/or anxiety will be randomly assigned in a 2:1:1 ratio to receive Lumen-coached PST, human-coached PST as an active treatment comparison, or a waitlist control condition in which participants can receive Lumen after the trial period. Participants will be assessed at baseline and 18 weeks. The primary aim is to confirm neural target engagement by testing whether, compared with waitlist controls, Lumen participants show significantly greater improvements from baseline to 18 weeks in the a priori neural target for cognitive control: the right dorsolateral prefrontal cortex engaged by the go/no-go task (primary superiority hypothesis). A secondary hypothesis will test whether, compared with human-coached PST participants, Lumen participants show equivalent improvements (i.e., noninferiority) in the same neural target from baseline to 18 weeks. The second aim is to examine (1) treatment effects on depression and anxiety symptoms, psychosocial functioning, and quality-of-life outcomes, and (2) relationships of neural target engagement to these patient-reported outcomes. CONCLUSIONS: This study offers the potential to improve the reach and impact of psychotherapy, mitigating access, cost, and stigma barriers for people with depression and/or anxiety. ClinicalTrials.gov #: NCT05603923.


Subject(s)
Anxiety , Artificial Intelligence , Depression , Humans , Adult , Anxiety/therapy , Depression/therapy , Male , Female , Voice , Problem Solving , Psychological Distress , Quality of Life , Counseling/methods , Middle Aged , Prefrontal Cortex , Psychotherapy/methods , Functional Neuroimaging/methods
19.
PLoS One ; 19(5): e0299140, 2024.
Article in English | MEDLINE | ID: mdl-38809807

ABSTRACT

Non-random exploration of infant speech-like vocalizations (e.g., squeals, growls, and vowel-like sounds or "vocants") is pivotal in speech development. This type of vocal exploration, often noticed when infants produce particular vocal types in clusters, serves two crucial purposes: it establishes a foundation for speech, because speech requires the formation of new vocal categories, and it serves as a basis for vocal signaling of wellness and interaction with caregivers. Despite the significance of clustering, existing research has largely relied on subjective descriptions and anecdotal observations of early vocal category formation. In this study, we address this gap by presenting the first large-scale empirical evidence of vocal category exploration and clustering across the first year of life. We observed infant vocalizations longitudinally using all-day home recordings from 130 typically developing infants across the entire first year of life. To identify clustering patterns, we conducted Fisher's exact tests to compare the occurrence of squeals versus vocants, as well as growls versus vocants. Across the first year, infants demonstrated clear clustering patterns of squeals and growls, indicating that these categories were not randomly produced but rather that infants appeared to actively practice these specific categories. The findings support the view of infants as active vocal explorers engaged in category formation, a key foundation for vocal language.
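
An illustrative Fisher's exact test for such clustering, under the simplifying assumption of a 2x2 count table of vocal types within versus outside candidate cluster windows; the counts are invented:

```python
from scipy.stats import fisher_exact

#                  squeals  vocants
table = [[30,  70],    # within a candidate cluster window (invented counts)
         [10, 190]]    # outside the window (invented counts)
odds_ratio, p = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```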


Subject(s)
Speech , Humans , Infant , Male , Female , Speech/physiology , Language Development , Voice/physiology , Longitudinal Studies , Phonetics
20.
Forensic Sci Int ; 360: 112048, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38733653

ABSTRACT

Expert testimony is only admissible in common-law systems if it will potentially assist the trier of fact. In order for a forensic-voice-comparison expert's testimony to assist a trier of fact, the expert's forensic voice comparison should be more accurate than the trier of fact's speaker identification. "Speaker identification in courtroom contexts - Part I" addressed the question of whether speaker identification by an individual lay listener (such as a judge) would be more or less accurate than the output of a forensic-voice-comparison system that is based on state-of-the-art automatic-speaker-recognition technology. The present paper addresses the question of whether speaker identification by a group of collaborating lay listeners (such as a jury) would be more or less accurate than the output of such a forensic-voice-comparison system. As members of collaborating groups, participants listen to pairs of recordings reflecting the conditions of the questioned- and known-speaker recordings in an actual case, confer, and make a probabilistic consensus judgement on each pair of recordings. The present paper also compares group-consensus responses with "wisdom of the crowd" which uses the average of the responses from multiple independent individual listeners.
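
The "wisdom of the crowd" aggregation mentioned above reduces to averaging independent individual judgments, in contrast to a single conferred group consensus; a toy illustration with invented probabilities:

```python
import numpy as np

# Probabilistic same-speaker judgments for one recording pair (invented values)
individual = np.array([0.90, 0.70, 0.80, 0.60, 0.85])  # independent listeners
crowd = individual.mean()                              # wisdom-of-the-crowd score
consensus = 0.75                                       # group's conferred judgement
print(f"crowd average: {crowd:.2f}  group consensus: {consensus:.2f}")
```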


Subject(s)
Forensic Sciences , Voice , Humans , Forensic Sciences/methods , Expert Testimony , Male , Female , Adult , Speech Recognition Software , Cooperative Behavior , Biometric Identification/methods