Search | BVS - MINISTRY OF HEALTH

1.

Digital voice analysis as a biomarker of acromegaly.

Vouzouneraki, Konstantina; Nylén, Fredrik; Holmberg, Jenny; Olsson, Tommy; Berinder, Katarina; Höybye, Charlotte; Petersson, Maria; Bensing, Sophie; Åkerman, Anna-Karin; Borg, Henrik; Ekman, Bertil; Robért, Jonas; Engström, Britt Edén; Ragnarsson, Oskar; Burman, Pia; Dahlqvist, Per.

J Clin Endocrinol Metab ; 2024 Oct 04.

Article in English | MEDLINE | ID: mdl-39363748

ABSTRACT

CONTEXT: There is a considerable diagnostic delay in acromegaly contributing to increased morbidity. Voice changes due to orofacial and laryngeal changes are common in acromegaly. OBJECTIVE: Our aim was to explore the use of digital voice analysis as a biomarker for acromegaly using broad acoustic analysis and machine learning. METHODS: Voice recordings from patients with acromegaly and matched controls were collected using a mobile phone at Swedish university hospitals. Anthropometric and clinical data and the Voice Handicap Index (VHI) were assessed. Digital voice analysis of a sustained and stable vowel [a] resulted in 3274 parameters, which were used for training of machine learning models classifying the speaker as "acromegaly" or "control". The machine learning model was trained with 76% of the data and the remaining 24% was used to assess its performance. For comparison, voice recordings of 50 pairs of participants were assessed by 12 experienced endocrinologists. RESULTS: We included 151 Swedish patients with acromegaly (13% biochemically active and 10% newly diagnosed) and 139 matched controls. The machine learning model identified patients with acromegaly more accurately [area under the receiver operating curve (ROC AUC) 0.84] than experienced endocrinologists (ROC AUC 0.69). Self-reported voice problems were more pronounced in patients with acromegaly than matched controls (median VHI 6 vs 2, P < .01) with higher prevalence of clinically significant voice handicap (VHI ≥20: 22.5% vs 3.6%). CONCLUSION: Digital voice analysis can identify patients with acromegaly from short voice recordings with high accuracy. Patients with acromegaly experience more voice disorders than matched controls.
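
As a rough illustration of the evaluation design described in this abstract, the sketch below holds out 24% of 290 participants (151 + 139, as reported) and computes ROC AUC directly from its rank-based definition. The participant IDs, the seed, and the shuffling strategy are hypothetical; the abstract does not say how the authors split their data or which software they used.

```python
import random


def train_test_split(items, train_fraction=0.76, seed=0):
    """Shuffle items and split them into (train, test) lists.

    Assumption: a simple random split; the abstract does not say
    whether the original split was stratified.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    participant_ids = list(range(290))  # hypothetical IDs
    train, test = train_test_split(participant_ids)
    print(len(train), len(test))
    # Hypothetical labels (1 = acromegaly) and model scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.84 and 0.69 are compared.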

2.

Temperament and Voice Quality in Patients With Vocal Fold Nodules.

Metin, Emine; Uygur, Kemal; Okur, Erdogan; Metin, Bilge; Gündüz, Bülent.

J Voice ; 2024 Sep 16.

Article in English | MEDLINE | ID: mdl-39289086

ABSTRACT

BACKGROUND: Vocal fold nodules are most common in women and patients with vocal fold nodules represent the largest group in voice clinics. The prevalence of vocal fold nodules is particularly high in professions where the voice is used on a regular basis. The quality of the voice is influenced by a number of factors, including temperament, stress, and emotional state. These factors can influence the physiological conditions of phonation. The objective of this study was to assess the acoustic parameters of voice in patients with vocal nodules in comparison to healthy controls, and to determine whether voice quality is influenced by emotional state and coping with stress. METHODS: A total of 32 patients admitted to the ENT Department of the University Medical School with voice disorders between March and June 2007 constituted the study group. All patients were found to have a vocal nodule on physical and stroboscopic examination. The control group consisted of 30 healthy individuals who did not report any voice disorders. All subjects underwent voice recordings in the voice laboratory. Following the completion of the voice evaluation form, all subjects underwent further assessment: an aerodynamic assessment (a, s, and s/z-time), an index of vocal impairment, Rosenbaum's Learned Resourcefulness Scale, and the Temperament and Characteristics Inventory (Temperament Evaluation of Memphis, Pisa, Paris, San Diego Autoquestionnaire). Acoustic analysis was conducted using the CSL program in Multidimensional voice program analysis and the Vocal Assessment component of Dr. Speech. RESULTS: The decrease in maximum phonation time in the study group was statistically significant.
There were statistically significant differences in the parameters Mean Fundamental Frequency, Jitter, Relative Average Perturbation, Pitch Perturbation Quotient, Shimmer in dB, Shimmer, Amplitude Perturbation Quotient, Noise-to-Harmonic Ratio, Soft Phonation Index from the Multidimensional voice program analysis, Jitter, Shimmer% from the voice assessment, and the perceptual rating (H, R, and B) from Dr. Speech's voice assessment analysis. The differences in the dimensions of anxious temperament and the examination of stress problem-solving strategies were significant between the study group and the control subjects. Differences in aerodynamic and acoustic parameters were found between disordered and healthy groups, as well as between individuals with different personalities. Overall, those with nodules were less likely to manage stress well than those without nodules. CONCLUSIONS: The study group and the control subjects showed significant differences in anxious temperament dimensions and stress problem-solving strategies. There were also differences in aerodynamic and acoustic parameters between the disordered and healthy groups, as well as between the groups with and without personality temperament differences. Overall, those with nodules were less likely to manage stress well than those without nodules. This finding indicates that stress management options are not effectively utilized in patients with vocal fold nodules. So, it might be a good idea to look into some kind of therapeutic approach and patient education for stress management.
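
Several of the perturbation measures named in this abstract have simple textbook definitions. The sketch below computes local jitter, relative average perturbation (RAP), and shimmer in dB from lists of glottal cycle periods and peak amplitudes. It follows the commonly cited definitions and is not the CSL/MDVP or Dr. Speech implementation; the function names and sample values are hypothetical, and extracting periods and amplitudes from a recording is assumed to have been done elsewhere.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    cycle periods, relative to the mean period, in percent."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * _mean(diffs) / _mean(periods)


def rap_percent(periods):
    """Relative average perturbation: mean absolute deviation of each
    period from its three-point moving average, relative to the mean
    period, in percent."""
    if len(periods) < 3:
        raise ValueError("need at least three periods")
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * _mean(devs) / _mean(periods)


def shimmer_db(amplitudes):
    """Shimmer in dB: mean absolute 20*log10 ratio between the peak
    amplitudes of consecutive cycles."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    ratios = [abs(20.0 * math.log10(b / a))
              for a, b in zip(amplitudes, amplitudes[1:])]
    return _mean(ratios)


if __name__ == "__main__":
    periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]  # seconds, hypothetical
    amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]          # arbitrary units
    print(jitter_percent(periods), rap_percent(periods), shimmer_db(amplitudes))
```

A perfectly periodic signal yields zero for all three measures; larger values indicate more cycle-to-cycle irregularity.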

3.

HEAR set: A ligHtwEight acoustic paRameters set to assess mental health from voice analysis.

Verde, Laura; Marulli, Fiammetta; De Fazio, Roberta; Campanile, Lelio; Marrone, Stefano.

Comput Biol Med ; 182: 109021, 2024 Sep 04.

Article in English | MEDLINE | ID: mdl-39236660

ABSTRACT

BACKGROUND: Voice analysis has significant potential in aiding healthcare professionals with detecting, diagnosing, and personalising treatment. It represents an objective and non-intrusive tool for supporting the detection and monitoring of specific pathologies. By calculating various acoustic features, voice analysis extracts valuable information to assess voice quality. The choice of these parameters is crucial for an accurate assessment. METHOD: In this paper, we propose a lightweight acoustic parameter set, named HEAR, able to evaluate voice quality to assess mental health. In detail, this consists of jitter, spectral centroid, Mel-frequency cepstral coefficients, and their derivates. The choice of parameters for the proposed set was influenced by the explainable significance of each acoustic parameter in the voice production process. RESULTS: The reliability of the proposed acoustic set to detect the early symptoms of mental disorders was evaluated in an experimental phase. Voices of subjects suffering from different mental pathologies, selected from available databases, were analysed. The performance obtained from the HEAR features was compared with that obtained by analysing features selected from toolkits widely used in the literature, as with those obtained using learned procedures. The best performance in terms of MAE and RMSE was achieved for the detection of depression (5.32 and 6.24 respectively). For the detection of psychogenic dysphonia and anxiety, the highest accuracy rates were about 75 % and 97 %, respectively. CONCLUSIONS: The comparative evaluation was carried out to assess the performance of the proposed approach, demonstrating a reliable capability to highlight affective physiological alterations of voice quality due to the considered mental disorders.
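
The depression results in this abstract are reported as MAE and RMSE. For readers less familiar with these error measures, the sketch below computes both from paired true and predicted scores. The sample values are hypothetical and unrelated to the study's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more than MAE."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    true_scores = [10, 20, 30]       # hypothetical questionnaire scores
    predicted_scores = [12, 18, 33]  # hypothetical model outputs
    print(mae(true_scores, predicted_scores), rmse(true_scores, predicted_scores))
```

RMSE is never smaller than MAE on the same data, which is consistent with the ordering of the two reported values (5.32 and 6.24).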

4.

Is organizational intervention using Layered Voice Analysis effective in addressing operator mental health in call centers? A randomized controlled trial.

Tani, Naomichi; Takao, Yoshihiro; Noro, Sakihito; Fujihara, Hiroaki; Eguchi, Hisashi; Sakai, Kazuki; Ebara, Takeshi.

J Occup Health ; 66(1), 2024 Jan 04.

Article in English | MEDLINE | ID: mdl-39141838

ABSTRACT

OBJECTIVES: To verify the effects of organizational interventions on mental health using Layered Voice Analysis (LVA). METHODS: A 12-week single-blind randomized controlled trial was conducted with call center operators. Sixty-six participants were randomly assigned to either a control group (n = 26), an LVA intervention group (n = 20), or a one-on-one intervention group (n = 20). The control group received general self-care information about preventing mental health problems from the Ministry of Health, Labour, and Welfare, Japan website. The organizational LVA intervention involved group sessions using participants' voice calls with customers, whereas the one-on-one intervention consisted of meetings or consultations with participants and their supervisors to discuss preventing mental health issues at work. To verify the effectiveness of the intervention program, the Center for Epidemiologic Studies Depression Scale (CES-D) was administered 4 times (baseline, 4, 8, and 12 weeks) as the primary outcome, and the data were analyzed using a linear mixed model. The intervention of LVA was subdivided and analyzed into LVA ≥5 times and LVA ≤4 times out of the total 6 interventions. RESULTS: Compared with the control group, a significant CES-D reduction effect was observed at 8/12 weeks for the difference of coefficients (DOC; [β_int - β_ctrl]) for the intervention of LVA ≥5 times (DOC -1.86 and -2.36, respectively). Similarly, even intervention LVA ≤4 times also showed a significant decrease of CES-D scores at 8/12 weeks (DOC -2.20 and -2.38, respectively). CONCLUSIONS: An organizational intervention using LVA has the potential to reduce the risk of depression among call center operators.

Subjects

Call Centers , Humans , Male , Female , Adult , Single-Blind Method , Middle Aged , Japan , Mental Health , Depression/prevention & control , Occupational Health

5.

Analysis of Tracheoesophageal Voice after Total Laryngectomy: A Single Center Experience.

Migliorelli, Andrea; Natale, Erennio; Manuelli, Marianna; Ciorba, Andrea; Bianchini, Chiara; Pelucchi, Stefano; Stomeo, Francesco.

J Clin Med ; 13(15), 2024 Jul 27.

Article in English | MEDLINE | ID: mdl-39124659

ABSTRACT

Background/Objectives: Tracheoesophageal voice is the most commonly used voice rehabilitation technique after a total laryngectomy. The placement of the tracheoesophageal prosthesis can be performed at the same time as the total laryngectomy (primary placement) or in a second procedure after surgery (secondary placement). The purpose of this study is to analyze the substitution voice in patients with a tracheoesophageal prosthesis, considering the influence of radiotherapy and timing of prosthesis placement (primary or secondary) on voice quality. Methods: A retrospective analysis was conducted of all patients who received a tracheoesophageal phonatory prosthesis after a total laryngectomy was performed. We assessed whether patients received radiotherapy and whether they had a primary or secondary tracheoesophageal prosthesis. For the voice analysis, maximum phonation time (MPT), INFVo, SECEL, AVQI, CPPS, harmonic to noise ratio (HNR), unvoiced fraction (UVF), and number of voice breaks (NVB) were evaluated. Results: A total of 15 patients (14 males and 1 female) with a mean age of 71.8 years (SD ± 7.5) were enrolled. Eight had a primary prosthesis placement and five did not receive radiotherapy. INFVo parameters I and Vo were higher in patients with a primary placement of the phonatory prosthesis (p = 0.046 and p = 0.047). Patients who received the prosthesis secondarily had a higher mean CPPS and lower mean AVQI. Conclusions: A secondary placement of the prostheses seems to result in a minimal advantage in voice quality compared to a primary placement. Radiation therapy, on the other hand, has no effect on voice quality, according to these preliminary data.

6.

Acoustic features from speech as markers of depressive and manic symptoms in bipolar disorder: A prospective study.

Kaczmarek-Majer, Katarzyna; Dominiak, Monika; Antosik, Anna Z; Hryniewicz, Olgierd; Kaminska, Olga; Opara, Karol; Owsinski, Jan; Radziszewska, Weronika; Sochacka, Malgorzata; Swiecicki, Lukasz.

Acta Psychiatr Scand ; 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39118422

ABSTRACT

INTRODUCTION: Voice features could be a sensitive marker of affective state in bipolar disorder (BD). Smartphone apps offer an excellent opportunity to collect voice data in the natural setting and become a useful tool in phase prediction in BD. AIMS OF THE STUDY: We investigate the relations between the symptoms of BD, evaluated by psychiatrists, and patients' voice characteristics. A smartphone app extracted acoustic parameters from the daily phone calls of n = 51 patients. We show how the prosodic, spectral, and voice quality features correlate with clinically assessed affective states and explore their usefulness in predicting the BD phase. METHODS: A smartphone app (BDmon) was developed to collect the voice signal and extract its physical features. BD patients used the application on average for 208 days. Psychiatrists assessed the severity of BD symptoms using the Hamilton Depression Rating Scale-17 and the Young Mania Rating Scale. We analyze the relations between acoustic features of speech and patients' mental states using linear generalized mixed-effect models. RESULTS: The prosodic, spectral, and voice quality parameters are valid markers in assessing the severity of manic and depressive symptoms. The accuracy of the predictive generalized mixed-effect model is 70.9%-71.4%. Significant differences in the effect sizes and directions are observed between female and male subgroups. The greater the severity of mania in males, the louder (β = 1.6) and higher the tone of voice (β = 0.71), more clearly (β = 1.35), and more sharply they speak (β = 0.95), and their conversations are longer (β = 1.64). For females, the observations are either exactly the opposite (the greater the severity of mania, the quieter (β = -0.27) and lower the tone of voice (β = -0.21) and less clearly (β = -0.25) they speak) or no correlations are found (length of speech).
On the other hand, the greater the severity of bipolar depression in males, the quieter (β = -1.07) and less clearly they speak (β = -1.00). In females, no distinct correlations between the severity of depressive symptoms and the change in voice parameters are found. CONCLUSIONS: Speech analysis provides physiological markers of affective symptoms in BD and acoustic features extracted from speech are effective in predicting BD phases. This could personalize monitoring and care for BD patients, helping to decide whether a specialist should be consulted.

7.

Exploring explainable AI features in the vocal biomarkers of lung disease.

Chen, Zhao; Liang, Ning; Li, Haoyuan; Zhang, Haili; Li, Huizhen; Yan, Lijiao; Hu, Ziteng; Chen, Yaxin; Zhang, Yujing; Wang, Yanping; Ke, Dandan; Shi, Nannan.

Comput Biol Med ; 179: 108844, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38981214

ABSTRACT

This review delves into the burgeoning field of explainable artificial intelligence (XAI) in the detection and analysis of lung diseases through vocal biomarkers. Lung diseases, often elusive in their early stages, pose a significant public health challenge. Recent advancements in AI have ushered in innovative methods for early detection, yet the black-box nature of many AI models limits their clinical applicability. XAI emerges as a pivotal tool, enhancing transparency and interpretability in AI-driven diagnostics. This review synthesizes current research on the application of XAI in analyzing vocal biomarkers for lung diseases, highlighting how these techniques elucidate the connections between specific vocal features and lung pathology. We critically examine the methodologies employed, the types of lung diseases studied, and the performance of various XAI models. The potential for XAI to aid in early detection, monitor disease progression, and personalize treatment strategies in pulmonary medicine is emphasized. Furthermore, this review identifies current challenges, including data heterogeneity and model generalizability, and proposes future directions for research. By offering a comprehensive analysis of explainable AI features in the context of lung disease detection, this review aims to bridge the gap between advanced computational approaches and clinical practice, paving the way for more transparent, reliable, and effective diagnostic tools.

Subjects

Artificial Intelligence , Biomarkers , Lung Diseases , Humans , Lung Diseases/diagnosis , Biomarkers/metabolism

8.

Assessment of Changes in the Quality of Voice in Post-thyroidectomy Patients With Intact Recurrent and Superior Laryngeal Nerve Function.

Sahoo, Anjan K; Sahoo, Prasanta K; Gupta, Vikas; Behera, Ganakalyan; Sidam, Shaila; Mishra, Utkal P; Chavan, Aparna; Binu, Rashma; Gour, Shivam; Velayutham, Dhanoush Kumar; Chatterjee, Twisha; Pal, Debrup.

Cureus ; 16(5): e60873, 2024 May.

Article in English | MEDLINE | ID: mdl-38916010

ABSTRACT

Background Thyroidectomy is a routinely performed surgical procedure used to treat benign, malignant, and some hormonal disorders of the thyroid that are not responsive to medical therapy. Voice alterations following thyroid surgery are well-documented and often attributed to recurrent laryngeal nerve dysfunction. However, subtle changes in voice quality can persist despite anatomically intact laryngeal nerves. This study aimed to quantify post-thyroidectomy voice changes in patients with intact laryngeal nerves, focusing on fundamental frequency, first formant frequency, shimmer intensity, and maximum phonation duration. Methodology This cross-sectional study was conducted at a tertiary referral center in central India and focused on post-thyroidectomy patients with normal vocal cord function. Preoperative assessments included laryngeal endoscopy and voice recording using a computer program, with evaluations repeated at one and three months post-surgery. Patients with normal laryngeal endoscopic findings underwent voice analysis and provided feedback on subjective voice changes. The PRAAT version 6.2 software was utilized for voice analysis. Results The study included 41 patients with normal laryngoscopic findings after thyroid surgery, with the majority being female (85.4%) and the average age being 42.4 years. Hemithyroidectomy was performed in 41.4% of patients and total thyroidectomy in 58.6%, with eight patients undergoing central compartment neck dissection. Except for one patient, the majority reported no subjective change in voice following surgery. Objective voice analysis showed statistically significant changes in the one-month postoperative period compared to preoperative values, including a 5.87% decrease in fundamental frequency, a 1.37% decrease in shimmer intensity, and a 6.24% decrease in first formant frequency, along with a 4.35% decrease in maximum phonatory duration. 
These trends persisted at the three-month postoperative period, although values approached close to preoperative levels. Results revealed statistically significant alterations in voice parameters, particularly fundamental frequency and first formant frequency, with greater values observed in total thyroidectomy patients. Shimmer intensity also exhibited slight changes. Comparison between hemithyroidectomy and total thyroidectomy groups revealed no significant differences in fundamental frequency, first formant frequency, and shimmer. However, maximum phonation duration showed a significantly greater change in the hemithyroidectomy group at both one-month and three-month postoperative intervals. Conclusions This study on post-thyroidectomy patients with normal vocal cord movement revealed significant changes in voice parameters postoperatively, with most patients reporting no subjective voice changes. The findings highlight the importance of objective voice analysis in assessing post-thyroidectomy voice outcomes.

9.

Voice disorder discrimination using vowel acoustic measures in female speakers.

Nguyen, Duy Duong; Novakovic, Daniel; Madill, Catherine.

Int J Lang Commun Disord ; 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38884559

ABSTRACT

BACKGROUND: Sustained vowels are important vocal tasks that have been investigated in discriminating voice disorders using acoustic analysis. To date, no study has combined vowel acoustic measures only that evaluate major aspects of the pathological voice signals in voice disorder discrimination. AIMS: To investigate the value of vowel acoustic measures that quantify glottal noise, signal stability, signal periodicity, spectral slope and overall voice quality in discriminating female speakers with and without voice disorders. METHODS & PROCEDURES: Sustained vowel /ɑ/ samples were extracted from 133 voice-disordered female patients and 97 non-voice disordered female speakers and were signal typed prior to analysis. Praat software was used to measure harmonics-to-noise ratio (HNR), glottal-to-noise excitation ratio (GNE), the standard deviation of fundamental frequency (F0SD) and cepstral peak prominence (CPPp); and the Analysis of Dysphonia in Speech and Voice (ADSV) program was used to measure CPPadsv, low/high spectral ratio (LH) and the cepstral/spectral index of dysphonia (CSID). Outcome measures included sensitivity, specificity, and discrimination accuracy. OUTCOMES & RESULTS: As individual acoustic measures, only spectral-based measures showed good (CPPadsv) and acceptable (CSID) discrimination results. The HNR, GNE and CPPp measures had acceptable sensitivity but poor or non-acceptable specificity and discrimination accuracy. Logistic regression models with all Praat measures (F0SD, HNR, GNE, CPPp) plus ADSV measures (CPPadsv, LH or CSID) provided excellent sensitivity, good-to-excellent specificity and excellent discrimination accuracy. ROC analysis for all individual measures showed that CPPadsv, CSID, CPPp, GNE and F0SD had the highest area under the curve (AUC) values. CONCLUSIONS & IMPLICATIONS: A combination of acoustic measures that evaluate the major aspects of vocal dysfunction resulted in good to excellent voice discrimination outcomes.
Individual acoustic measures had lower discrimination ability than combined measures. The findings implied that acoustic measures extracted from a prolonged vowel were useful in voice disorder discrimination. WHAT THIS PAPER ADDS: What is already known on this subject Acoustic measures hold great value in discriminating voice disorders from normal voices. However, no study has evaluated discrimination values of a combination of sustained vowel acoustic measures that quantify additive noise, signal stability, signal periodicity, spectral slope and overall voice quality in single-gender cohorts. Previous studies have not used signal typing (the classification of the acoustic signals) for time-based measures, impacting the reliability of discrimination. What this study adds to the existing knowledge This study was the first to implement signal typing to include sustained vowel samples of Types 1 and 2 signals for discrimination statistics. We showed that a combination of vocal acoustic measures using time- and spectral-based extraction from the sustained /ɑ/ vowel evaluating additive noise, signal stability, signal periodicity, spectral slope and overall voice quality resulted in good to excellent sensitivity, specificity and discrimination accuracy. As individual measures, traditional time-based measures such as HNR had rather limited discrimination values whilst spectral-based measures provided higher discrimination values. Measures that are sensitive to signal types have low discrimination ability. What are the potential or actual clinical implications of this work? The sustained vowel /ɑ/ is a relevant, universal vocal task for clinical application using acoustic measures to discriminate female speakers with and without voice disorders if signal typing is implemented. Clinical voice assessment using vowels may not be effective if relying solely on time-based measurements.
Spectral-based measures perform better in voice disorder discrimination given their insensitivity to signal types. The most effective voice disorder discrimination could only be obtained using a combination of acoustic measures that quantify major phenomena in the signals of disordered voices. Using measures extracted from both programs, Praat and ADSV, is useful given that specific settings in a program may impact on discrimination accuracy.
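
The outcome measures named in this abstract (sensitivity, specificity, and discrimination accuracy) all derive from a 2x2 confusion matrix. The sketch below shows the arithmetic. The group sizes (133 voice-disordered, 97 non-voice-disordered) come from the abstract, but the individual cell counts are hypothetical and do not reproduce any result from the study.

```python
def discrimination_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: disordered speakers classified as disordered
    fn: disordered speakers classified as non-disordered
    tn: non-disordered speakers classified as non-disordered
    fp: non-disordered speakers classified as disordered
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each group needs at least one speaker")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical split of the 133 disordered and 97 non-disordered speakers.
    sens, spec, acc = discrimination_metrics(tp=120, fn=13, tn=80, fp=17)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because the two groups differ in size, accuracy is weighted toward the larger (disordered) group, which is one reason the abstract reports sensitivity and specificity alongside it.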

10.

Which Mask, N95 or Surgical Mask, Causes Hoarseness in Healthcare Workers?

Altan, Esma; Barmak, Elife; Tatar, Emel Çadalli; Saylam, Guleser; Korkmaz, Mehmet Hakan.

J Voice ; 2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38902143

ABSTRACT

OBJECTIVES: This study aimed to determine the impact of different types of masks on the voices of healthcare professionals who had to wear masks for an extended amount of time during the pandemic period and had a healthy voice. METHODS: Our research included 41 healthcare workers. The participants were separated into two groups: surgical (n = 21) and N95 mask users (n = 20). Healthcare workers evaluated masks before and after wearing them for at least 8 hours throughout the workday. All subjects had a videolaryngoscopic examination; the Voice Handicap Index-10 (VHI-10), GRBAS, acoustic voice analysis (F0, jitter%, shimmer%, noise/harmonic ratio, relative average perturbation [RAP]), aerodynamic measures (maximum phonation time, MPT), and blood oxygen saturation were evaluated. RESULTS: Although both groups' VHI-10 scores increased after using the mask, this rise was not statistically significant in our research. According to the GRBAS classification, voice quality deterioration was identified in 9.6% (mild-moderate) of the group using surgical masks and 15% (mild) of the group wearing N95. Only the jitter and RAP values of individuals wearing both surgical and N95 masks were determined to be statistically significant. There was no significant change in MPT following mask wear in either group. Both the surgical and N95 mask-using groups showed a substantial drop in blood oxygen saturation before and after mask usage. CONCLUSION: There was no change in voice quality between healthcare workers wearing surgical and N95 masks. It has been noticed that voice perception and quality are affected by the mask's barrier effect rather than the kind of mask.

11.

Harnessing Voice Analysis and Machine Learning for Early Diagnosis of Parkinson's Disease: A Comparative Study Across Three Datasets.

Neto, Osmar Pinto.

J Voice ; 2024 May 12.

Article in English | MEDLINE | ID: mdl-38740529

ABSTRACT

OBJECTIVE: This study evaluates the efficacy of voice analysis combined with machine learning (ML) techniques in enabling the diagnosis of Parkinson's disease (PD). METHODS: Voice data, phonation of the vowel "a," from three distinct datasets (two from the University of California Irvine ML Repository and one from figshare) for 432 participants (278 PD patients) were analyzed. We employed four ML models-Artificial Neural Networks, Random Forest, Gradient Boosting (GB), and Support Vector Machine (SVM)-alongside two ensemble methods (soft voting classifier-Ensemble Voting Classifier and stacking method-Ensemble Stacking Model (ESM)). The models underwent 50 iterations of evaluation, involving various data splits and 10-fold cross-validation. Comparative analysis was done using one-way Analysis of Variance followed by Bonferroni posthoc corrections. RESULTS: The ESM, SVM, and GB models emerged as the top performers, demonstrating superior performance across metrics, including accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver operating characteristic curve (ROC AUC). Despite data heterogeneity and variable selection limitations, the models showed high values for all metrics. CONCLUSIONS: ML integration with voice analysis, mainly through ESM, SVM, and GB, is promising for early PD diagnosis. Using multi-source data and a large sample size enhances our findings' validity, reliability, and generalizability. SIGNIFICANCE: Integrating advanced ML techniques with voice analysis demonstrates substantial potential for improving early PD detection, offering valuable tools for speech-language pathologists (SLPs). These findings provide clinically relevant insights that can be applied within the scope of SLP practice to refine diagnostic processes and facilitate early intervention.
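
Two of the techniques mentioned in this abstract, 10-fold cross-validation and a soft voting ensemble, can be sketched without any machine learning library. The code below generates fold indices for 432 participants (the reported total) and averages per-model probabilities into a single vote. The shuffling scheme, the 0.5 threshold, and the example probabilities are assumptions for illustration; the abstract does not describe the study's actual implementation.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds.

    Assumption: plain (non-stratified) folds of near-equal size.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def soft_vote(prob_lists, threshold=0.5):
    """Average each sample's predicted probability across models,
    then label it 1 (PD) if the average reaches the threshold.

    prob_lists: one list of probabilities per model, aligned by sample.
    """
    if not prob_lists:
        raise ValueError("need at least one model")
    n_models = len(prob_lists)
    averaged = [sum(column) / n_models for column in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    for train, test in k_fold_indices(432, k=10):
        print(len(train), len(test))
    # Hypothetical probabilities from three models for two samples.
    print(soft_vote([[0.9, 0.2], [0.6, 0.4], [0.3, 0.3]]))
```

Soft voting lets a confident model outweigh two lukewarm ones, unlike majority (hard) voting, which counts only the thresholded labels.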

12.

Prediction of dysphagia aspiration through machine learning-based analysis of patients' postprandial voices.

Kim, Jung-Min; Kim, Min-Seop; Choi, Sun-Young; Ryu, Ju Seok.

J Neuroeng Rehabil ; 21(1): 43, 2024 Mar 30.

Article in English | MEDLINE | ID: mdl-38555417

ABSTRACT

BACKGROUND: Conventional diagnostic methods for dysphagia have limitations such as long wait times, radiation risks, and restricted evaluation. Therefore, voice-based diagnostic and monitoring technologies are required to overcome these limitations. Based on our hypothesis regarding the impact of weakened muscle strength and the presence of aspiration on vocal characteristics, this single-center, prospective study aimed to develop a machine-learning algorithm for predicting dysphagia status (normal, and aspiration) by analyzing postprandial voice limiting intake to 3 cc. METHODS: Conducted from September 2021 to February 2023 at Seoul National University Bundang Hospital, this single center, prospective cohort study included 198 participants aged 40 or older, with 128 without suspected dysphagia and 70 with dysphagia-aspiration. Voice data from participants were collected and used to develop dysphagia prediction models using the Multi-Layer Perceptron (MLP) with MobileNet V3. Male-only, female-only, and combined models were constructed using 10-fold cross-validation. Through the inference process, we established a model capable of probabilistically categorizing a new patient's voice as either normal or indicating the possibility of aspiration. RESULTS: The pre-trained models (mn40_as and mn30_as) exhibited superior performance compared to the non-pre-trained models (mn4.0 and mn3.0). Overall, the best-performing model, mn30_as, which is a pre-trained model, demonstrated an average AUC across 10 folds as follows: combined model 0.8361 (95% CI 0.7667-0.9056; max 0.9541), male model 0.8010 (95% CI 0.6589-0.9432; max 1.000), and female model 0.7572 (95% CI 0.6578-0.8567; max 0.9779). However, for the female model, a slightly higher result was observed with the mn4.0, which scored 0.7679 (95% CI 0.6426-0.8931; max 0.9722). 
Additionally, the other models (pre-trained; mn40_as, non-pre-trained; mn4.0 and mn3.0) also achieved performance above 0.7 in most cases, and the highest fold-level performance for most models was around 0.9. The 'mn' in model names refers to MobileNet and the following number indicates the 'width_mult' parameter. CONCLUSIONS: In this study, we used mel-spectrogram analysis and a MobileNetV3 model for predicting dysphagia aspiration. Our research highlights voice analysis potential in dysphagia screening, diagnosis, and monitoring, aiming for non-invasive, safer, and more effective interventions. TRIAL REGISTRATION: This study was approved by the IRB (No. B-2109-707-303) and registered on clinicaltrials.gov (ID: NCT05149976).
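
This abstract summarises each model by its mean AUC across 10 folds with a 95% confidence interval. One common way to obtain such an interval is from the fold-level standard error and a Student t critical value, sketched below. This is an assumption: the abstract does not state how the intervals were computed, and the per-fold AUC values used here are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (10 folds). Assumption: a t-based interval, not necessarily what the
# authors used.
T_CRIT_DF9 = 2.262


def mean_with_ci(values, t_crit=T_CRIT_DF9):
    """Return (mean, lower, upper) for a t-based confidence interval.

    t_crit must match len(values) - 1 degrees of freedom.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    half_width = t_crit * std_err
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical per-fold AUCs, not taken from the study.
    fold_aucs = [0.7, 0.8, 0.9, 0.8, 0.7, 0.8, 0.9, 0.8, 0.7, 0.9]
    mean, lower, upper = mean_with_ci(fold_aucs)
    print(f"mean AUC {mean:.4f} (95% CI {lower:.4f}-{upper:.4f})")
```

With only 10 folds the interval is wide, which matches the broad intervals reported for the male-only and female-only models.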

Subjects

Deglutition Disorders , Female , Humans , Male , Algorithms , Deglutition Disorders/diagnosis , Deglutition Disorders/etiology , Machine Learning , Prospective Studies , Respiratory Aspiration/diagnosis , Respiratory Aspiration/etiology , Adult

13.

Reflux Symptom Index (RSI), Videolaryngostroboscopy and Voice Analysis: A Triad of Non-Invasive Tools to Study Treatment Outcomes of Laryngopharyngeal Reflux Disease (LPRD).

Suda, Anuja; Sikdar, Abhik; Nivsarkar, Sameer; Phatak, Shrikant; Agarwal, Richa.

Indian J Otolaryngol Head Neck Surg ; 76(1): 250-261, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38440605

ABSTRACT

To study post-treatment improvement of Laryngopharyngeal Reflux Disease (LPRD) using the non-invasive tools of Reflux Symptom Index (RSI), Reflux Finding Score (RFS) grading of videolaryngostroboscopy (VLS) and voice analysis. This study, conducted from December 2020 to April 2022, enrolled 100 adults with complaints suggestive of reflux and an RSI of more than 13. All patients underwent VLS along with voice analysis. VLS findings were graded using the RFS. Patients were advised lifestyle modifications and proton pump inhibitors for 8 weeks, after which post-treatment RSI, VLS and voice analyses were documented again. The age range was from 18 to 75 years. Males predominated. Lifestyle modification compliance was seen in 85% of the patients. We found a significant difference (P = 0.001) between pretreatment and posttreatment values for both RSI parameters and RFS parameters. Voice analysis pre and post treatment showed a significant difference (P = 0.001) for fundamental frequency, jitter, shimmer, harmonic-to-noise ratio and maximum phonation time. The gold standard for diagnosis of LPRD is 24 h pH monitoring, but it has many false negatives and false positives due to intermittent reflux and inaccurate probe placement. This costly, time-consuming and invasive procedure is not widely available within our speciality. Excellent visualisation with VLS allowed accurate RFS calculation. Voice analysis permitted early diagnosis of LPRD-induced hoarseness before it became clinically significant. It also documented the treatment outcome. We conclude that an 8-week proton pump inhibitor treatment combined with lifestyle modification resulted in a significant improvement in the parameters of the non-invasive tools of RSI, RFS and voice analysis.

14.

Correlation of acoustic voice analysis and Voice Handicap Index in patients with postoperative unilateral vocal cord paralysis after thyroid surgery

Bian, Yanrui; Wang, Jingmiao; Zhang, Haizhong; Yin, Xiaoyan; Zhang, Yubo.

Braz. j. med. biol. res ; 57: e13528, Feb. 2024. tables, graphs

Article in English | LILACS-Express | LILACS | ID: biblio-1564159

ABSTRACT

Unilateral vocal cord paralysis is frequently observed in patients who undergo thyroid surgery. This study explored the correlation between acoustic voice analysis (objective measure) and Voice Handicap Index (VHI, a self-assessment tool). One hundred and forty patients who had thyroid surgery with or without postoperative unilateral vocal cord paralysis (PVCP and NPVCP) were included. The patients were evaluated by the VHI and Dysphonia Severity Index (DSI) tools. VHI scores were significantly higher in PVCP patients than in NPVCP patients. Jitter (%) and shimmer (%) were significantly increased, whereas DSI was significantly decreased in PVCP patients. Receiver operating characteristic curve analysis revealed that VHI scores were associated with the diagnosis of PVCP, of which VHI total score yielded an area under the curve (AUC) of 0.81. Among acoustic parameters, DSI was highly associated with PVCP (AUC=0.82, 95%CI=0.75 to 0.89). Moreover, we found a correlation between VHI scores and voice acoustic parameters. Among them, DSI had a moderate correlation with functional and VHI scores, as suggested by R values of 0.41 and 0.49, respectively. VHI scores and acoustic parameters were associated with the diagnosis of PVCP.
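
The AUC values quoted above can be read as the probability that a randomly chosen PVCP patient receives a more "abnormal" score than a randomly chosen NPVCP patient. The sketch below illustrates that rank-based (Mann-Whitney) interpretation; it is not the authors' analysis code. The function `roc_auc` and all DSI values are hypothetical. Because DSI is lower in PVCP patients, the example negates it so that a higher score points toward paralysis.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), ties counted as half.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical DSI values, NOT data from the study.
dsi_pvcp = [-1.2, 0.4, 1.1, 2.0]   # patients with paralysis
dsi_npvcp = [0.9, 2.5, 3.1, 4.2]   # patients without paralysis

# Negate DSI so that "higher score = more likely PVCP".
auc = roc_auc([-v for v in dsi_pvcp], [-v for v in dsi_npvcp])
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for a cohort of 140 patients; a rank-sum formulation would be preferable for much larger datasets.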

15.

Critical swallowing functions contributing to dysphagia in patients with recurrent laryngeal nerve paralysis after esophagectomy.

Takatsu, Jun; Higaki, Eiji; Abe, Tetsuya; Fujieda, Hironori; Yoshida, Masahiro; Yamamoto, Masahiko; Shimizu, Yasuhiro.

Esophagus ; 21(2): 111-119, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38294588

ABSTRACT

BACKGROUND: Recurrent laryngeal nerve paralysis (RLNP) after esophagectomy can cause aspiration because of incomplete glottis closure, leading to pneumonia. However, patients with RLNP often have preserved swallowing function. This study investigated factors that determine swallowing function in patients with RLNP. METHODS: Patients with esophageal cancer who underwent esophagectomy and cervical esophagogastric anastomosis were enrolled between 2017 and 2020. Videofluoroscopic swallowing study (VFSS) and acoustic voice analysis were performed on patients with suspected dysphagia, including those with RLNP. Dysphagia in VFSS was defined as a score ≥ 3 on the 8-point penetration-aspiration scale. VFSS and acoustic analysis results related to dysphagia were compared between patients with and without RLNP. RESULTS: Among 312 patients who underwent esophagectomy, 74 developed RLNP. The incidence of late-onset pneumonia was significantly higher in the RLNP group than in the non-RLNP group (18.9 vs. 8.0%, P = .008). Detailed swallowing function was assessed by VFSS in 84 patients, and patients with RLNP and dysphagia showed significantly shorter maximum diagonal hyoid bone elevation (10.62 vs. 16.75 mm; P = .003), which was a specific finding not seen in patients without RLNP. For acoustic voice analysis, the degree of hoarseness was not closely related to dysphagia. The length of oral intake rehabilitation for patients with and without RLNP was comparable if they did not present with dysphagia (8.5 vs. 9.0 days). CONCLUSIONS: Impaired hyoid bone elevation is a specific dysphagia factor in patients with RLNP, suggesting that compensatory epiglottis inversion by hyoid bone elevation is important for incomplete glottis closure caused by RLNP.

Subjects

Deglutition Disorders , Pneumonia , Vocal Cord Paralysis , Humans , Deglutition Disorders/epidemiology , Deglutition Disorders/etiology , Deglutition/physiology , Esophagectomy/adverse effects , Recurrent Laryngeal Nerve , Vocal Cord Paralysis/epidemiology , Vocal Cord Paralysis/etiology , Respiratory Aspiration

16.

Voice acoustic characteristics of children with late-onset cochlear implantation: Correlation to auditory performance.

Mahrous, Mahmoud M; Abdelgoad, Ahmed A; Said, Nithreen M; Telmesani, Laila M; Alrusayyis, Danah F.

Cochlear Implants Int ; 25(1): 1-10, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38171933

ABSTRACT

OBJECTIVES: To study the voice acoustic parameters of congenitally deaf children with delayed access to sounds due to late-onset cochlear implantation and to correlate their voice characteristics with their auditory performance. METHODS: The study included 84 children: a control group consisting of 50 children with normal hearing and normal speech development; and a study group consisting of 34 paediatric cochlear implant (CI) recipients who had suffered profound hearing loss since birth. According to speech recognition scores and pure-tone thresholds, the study group was further subdivided into two subgroups: 24 children with excellent auditory performance and 10 children with fair auditory performance. The mean age at the time of implantation was 3.6 years for excellent auditory performance group and 3.2 years for fair auditory performance group. Voice acoustic analysis was conducted on all study participants. RESULTS: Analysis of voice acoustic parameters revealed a statistically significant delay in both study groups in comparison to the control group. However, there was no statistically significant difference between the two study groups. DISCUSSION: Interestingly, in both excellent and fair performance study groups, the gap in comparison to normal hearing children was still present. While late-implanted children performed better on segmental perception (e.g. word recognition), suprasegmental perception (e.g. as demonstrated by objective acoustic voice analysis) did not progress to the same extent. CONCLUSION: On the suprasegmental speech performance level, objective acoustic voice measurements demonstrated a significant delay in the suprasegmental speech performance of children with late-onset CI, even those with excellent auditory performance.

Subjects

Cochlear Implantation , Cochlear Implants , Deafness , Speech Perception , Humans , Male , Child, Preschool , Female , Deafness/surgery , Deafness/physiopathology , Child , Speech Perception/physiology , Speech Acoustics , Case-Control Studies , Voice/physiology , Voice Quality

17.

The impact of tamoxifen treatment on voice parameters in premenopausal women with breast cancer.

Ata, Serdar; Ekici, Nur Yücel; Büyüksimsek, Mahmut; Çil, Timuçin; Duman, Berna Bozkurt.

Eur Arch Otorhinolaryngol ; 281(2): 1025-1030, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37947817

ABSTRACT

OBJECTIVE: The study aims to investigate the estrogen-agonistic effects of tamoxifen on voice parameters in premenopausal women diagnosed with breast cancer. METHODS: A total of 108 premenopausal women were included, segmented into distinct treatment groups and a control group. Objective sound analysis was conducted using robust statistical methods, employing SPSS 25.0 for data analysis. RESULTS: The study identified a statistically significant reduction in Jitter values across all treatment groups compared to the control group. No significant changes were observed in other voice quality parameters such as F0, Shimmer, NHR, and HNR. CONCLUSIONS: The findings suggest that tamoxifen may have an estrogen-agonistic effect on voice quality, thereby potentially influencing future treatment protocols. This research fills a critical void in the existing literature and sets the stage for more comprehensive studies that consider the effects of hormonal therapies on voice.

Subjects

Breast Neoplasms , Voice , Humans , Female , Tamoxifen/therapeutic use , Breast Neoplasms/drug therapy , Voice Quality , Estrogens , Speech Acoustics , Acoustics

18.

PVGAN: A Pathological Voice Generation Model Incorporating a Progressive Nesting Strategy.

Pan, Xiaoying; Feng, Tong; Zhang, Nijuan.

J Voice ; 2023 Nov 06.

Article in English | MEDLINE | ID: mdl-37940422

ABSTRACT

The voice generation task aims to solve the problem of limited samples in voice datasets using computer technology. By increasing the number of samples, the accuracy of voice disorder diagnosis can be improved, which has a wide range of application value in medical diagnosis and other fields. At present, there are insufficient models for detailed features such as pitch, timbre, and different frequency components in pathological voice data. Therefore, this paper proposes a PVGAN network for learning different frequency information of audio to generate pathological voice data. The proposed network captures the multi-scale features and different periodic patterns of audio signals by designing multiscale perceptual residual blocks and periodic discriminators. At the same time, a progressive nesting strategy is proposed to combine the generator and the discriminator to improve the learning ability of different resolution information. In addition, a latent mapping network is designed to fuse the latent vector with the condition information to generate sound features related to specific diseases or pathological states. The loss function is optimized to further improve the model performance. On the Saarbruecken Voice Database (SVD), the average values of each index of the data generated after training with different pathological types as conditional information are similar to those of the original data. Finally, the generated data were used to expand the SVD dataset, and the accuracy of the two classification experiments was improved to a certain extent.

19.

Effects of deep brain stimulation of the subthalamic nucleus on patients with Parkinson's disease: a machine-learning voice analysis.

Suppa, Antonio; Asci, Francesco; Costantini, Giovanni; Bove, Francesco; Piano, Carla; Pistoia, Francesca; Cerroni, Rocco; Brusa, Livia; Cesarini, Valerio; Pietracupa, Sara; Modugno, Nicola; Zampogna, Alessandro; Sucapane, Patrizia; Pierantozzi, Mariangela; Tufo, Tommaso; Pisani, Antonio; Peppe, Antonella; Stefani, Alessandro; Calabresi, Paolo; Bentivoglio, Anna Rita; Saggio, Giovanni.

Front Neurol ; 14: 1267360, 2023.

Article in English | MEDLINE | ID: mdl-37928137

ABSTRACT

Introduction: Deep brain stimulation of the subthalamic nucleus (STN-DBS) can exert relevant effects on the voice of patients with Parkinson's disease (PD). In this study, we used artificial intelligence to objectively analyze the voices of PD patients with STN-DBS. Materials and methods: In a cross-sectional study, we enrolled 108 controls and 101 patients with PD. The cohort of PD was divided into two groups: the first group included 50 patients with STN-DBS, and the second group included 51 patients receiving the best medical treatment. The voices were clinically evaluated using the Unified Parkinson's Disease Rating Scale part-III subitem for voice (UPDRS-III-v). We recorded and then analyzed voices using specific machine-learning algorithms. The likelihood ratio (LR) was also calculated as an objective measure for clinical-instrumental correlations. Results: Clinically, voice impairment was greater in STN-DBS patients than in those who received oral treatment. Using machine learning, we objectively and accurately distinguished between the voices of STN-DBS patients and those under oral treatments. We also found significant clinical-instrumental correlations since the greater the LRs, the higher the UPDRS-III-v scores. Discussion: STN-DBS deteriorates speech in patients with PD, as objectively demonstrated by machine-learning voice analysis.

20.

High and Wide: An In Silico Investigation of Frequency, Intensity, and Vibrato Effects on Widely Applied Acoustic Voice Perturbation and Noise Measures.

Baker, Calvin Peter; Brockmann-Bauser, Meike; Purdy, Suzanne C; Rakena, Te Oti.

J Voice ; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37925330

ABSTRACT

OBJECTIVES: This in silico study explored the effects of a wide range of fundamental frequency (fo), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures. METHOD: Using 53 synthesized tones produced in Madde, the effects of stepwise increases in fo, intensity (modeled by decreasing SST), and VE on the PRAAT parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment was conducted to determine whether any fo effects on jitter, RAP, shimmer, APQ3, and HNR were stable. A total of 10 sinewaves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/, /i/, and /u/-like vowels, respectively. All effects were statistically assessed with Kendall's tau-b and partial correlation. RESULTS: Increasing fo resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values, respectively (P < 0.01). Oscillations of the data across the explored fo range were observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde fo condition remained and showed differences between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch and amplitude perturbation and HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase of all other measures (P < 0.05). CONCLUSION: These novel data offer a controlled demonstration for the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when varying fo, SST, and VE in synthesized tones. Since humans will vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics type effects into account when comparing intersubject or preintervention and postintervention data using these measures.
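
For readers unfamiliar with the perturbation measures named in the abstract above, the sketch below shows how jitter (local) %, RAP %, shimmer (local) % and APQ3 % are commonly defined from a sequence of cycle periods and cycle peak amplitudes. It assumes the definitions usually documented for Praat and does not reproduce the study's pipeline. The function names and all input values are hypothetical, and HNR is omitted because it requires an autocorrelation analysis of the waveform.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive cycles, as % of the mean.

    Applied to periods this gives jitter (local) %;
    applied to amplitudes it gives shimmer (local) %.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    return 100.0 * mean_diff / (sum(values) / len(values))


def three_point_perturbation(values):
    """Mean absolute deviation from a 3-cycle moving average, as % of the mean.

    Applied to periods this gives RAP %; applied to amplitudes, APQ3 %.
    """
    if len(values) < 3:
        raise ValueError("need at least three cycles")
    devs = [
        abs(values[i] - (values[i - 1] + values[i] + values[i + 1]) / 3.0)
        for i in range(1, len(values) - 1)
    ]
    mean_dev = sum(devs) / len(devs)
    return 100.0 * mean_dev / (sum(values) / len(values))


# Hypothetical cycle data for a tone near 200 Hz, NOT from the study.
periods_ms = [5.00, 5.02, 4.98, 5.01, 4.99]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]

print(f"jitter (local): {local_perturbation(periods_ms):.3f} %")
print(f"RAP:            {three_point_perturbation(periods_ms):.3f} %")
print(f"shimmer (local): {local_perturbation(amplitudes):.3f} %")
print(f"APQ3:           {three_point_perturbation(amplitudes):.3f} %")
```

All four measures depend on where cycle boundaries are detected, which is one plausible reason why changes in fo, spectral tilt, or vibrato could shift their values even in synthesized tones.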
