Results 1 - 20 of 76
1.
Sensors (Basel) ; 24(20)2024 Oct 14.
Article in English | MEDLINE | ID: mdl-39460094

ABSTRACT

The selection of a target when training deep neural networks for speech enhancement is an important consideration. Different masks have been shown to exhibit different performance characteristics depending on the application and the conditions. This paper presents a comprehensive comparison of several different masks for noise reduction in cochlear implants. The study incorporated three well-known masks, namely the Ideal Binary Mask (IBM), the Ideal Ratio Mask (IRM), and the Fast Fourier Transform Mask (FFTM), as well as two newly proposed masks based on existing masks, called the Quantized Mask (QM) and the Phase-Sensitive plus Ideal Ratio Mask (PSM+). These five masks are used to train networks to estimate masks for the purpose of separating speech from noisy mixtures. A vocoder was used to simulate the behavior of a cochlear implant. Short-time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) scores indicate that the two new masks proposed in this study (QM and PSM+) perform best for normal speech intelligibility and quality in the presence of stationary and non-stationary noise over a range of signal-to-noise ratios (SNRs). The Normalized Covariance Measure (NCM) and similarity scores indicate that they also perform best for intelligibility and for gauging the similarity of vocoded speech. The Quantized Mask performs better than the Ideal Binary Mask due to its better resolution, as it approximates the Wiener Gain Function. The PSM+ performs better than the three existing benchmark masks (IBM, IRM, and FFTM) because it incorporates both magnitude and phase information.
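For reference, the two standard benchmark masks can be written down in a few lines. The following numpy sketch, assuming magnitude spectrograms of the clean speech and noise as inputs, illustrates the textbook IBM and IRM definitions; it is not the paper's implementation, and the novel QM and PSM+ masks are not reproduced here.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds a criterion (lc_db), else 0."""
    local_snr_db = 20.0 * np.log10((speech_mag + 1e-10) / (noise_mag + 1e-10))
    return (local_snr_db > lc_db).astype(np.float32)

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM: soft mask in [0, 1]; beta=0.5 gives a square-root Wiener-like gain."""
    return (speech_mag**2 / (speech_mag**2 + noise_mag**2 + 1e-10)) ** beta
```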


Subjects
Cochlear Implants; Noise; Signal-To-Noise Ratio; Speech Intelligibility; Humans; Speech Intelligibility/physiology; Neural Networks, Computer; Speech Perception/physiology
2.
Front Psychiatry ; 15: 1422020, 2024.
Article in English | MEDLINE | ID: mdl-39355380

ABSTRACT

Background: Previous studies have classified major depression and healthy control groups based on vocal acoustic features, but the classification accuracy needs to be improved. Therefore, this study utilized deep learning methods to construct classification and prediction models for major depression and healthy control groups. Methods: 120 participants aged 16-25 took part in this study, including 64 in the major depressive disorder (MDD) group and 56 in the healthy control (HC) group. We used the Covarep open-source algorithm to extract a total of 1200 high-level statistical functions for each sample. In addition, we used Python for correlation analysis and a neural network to build models that distinguish whether participants experienced depression, predict the total depression score, and evaluate the effectiveness of the classification and prediction models. Results: The classification model built from the relevant, statistically significant vocal acoustic features achieved a performance of 0.90, and Receiver Operating Characteristic (ROC) curve analysis showed a classification accuracy of 84.16%, a sensitivity of 95.38%, and a specificity of 70.9%. The speech-based depression prediction model showed that the predicted score was closely related to the total score of the 17-item Hamilton Depression Scale (HAMD-17) (r=0.687, P<0.01), and the Mean Absolute Error (MAE) between the model's predicted score and the total HAMD-17 score was 4.51. Limitation: This study's results may have been influenced by anxiety comorbidities. Conclusion: Vocal acoustic features can not only effectively classify the major depression and healthy control groups, but also accurately predict the severity of depressive symptoms.
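As an illustration of the correlation-analysis step, the sketch below screens acoustic features by their Pearson correlation with the HAMD-17 total; the function and variable names are hypothetical, not from the paper.

```python
from scipy.stats import pearsonr

def screen_features(X, hamd_totals, alpha=0.05):
    """Return indices of feature columns significantly correlated with HAMD-17.

    X: (n_samples, n_features) matrix of acoustic features.
    hamd_totals: (n_samples,) HAMD-17 total scores.
    """
    keep = []
    for j in range(X.shape[1]):
        r, p = pearsonr(X[:, j], hamd_totals)
        if p < alpha:
            keep.append(j)
    return keep
```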

3.
Comput Biol Med ; 182: 109078, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39265476

ABSTRACT

This study advances the automation of Parkinson's disease (PD) diagnosis by analyzing speech characteristics, leveraging a comprehensive approach that integrates a voting-based machine learning model. Given the growing prevalence of PD, especially among the elderly population, continuous and efficient diagnosis is of paramount importance. Conventional monitoring methods suffer from limitations related to time, cost, and accessibility, underscoring the need for automated diagnostic tools. In this paper, we present a robust model for classifying speech patterns in Korean PD patients, addressing a significant research gap. Our model employs straightforward preprocessing techniques and a voting-based machine learning approach, demonstrating superior performance, particularly when training data is limited. Furthermore, we emphasize the effectiveness of the eGeMAPSv2 feature set in PD analysis and introduce new features that substantially enhance classification accuracy. The proposed model, achieving an accuracy of 84.73% and an area under the ROC curve (AUC) of 92.18% on a dataset comprising 100 Korean PD patients and 100 healthy controls, offers a practical solution for automated diagnosis applications, such as smartphone apps. Future research will concentrate on enhancing the model's performance and delving deeper into the relationship between high-importance features and PD.
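A minimal sketch of the two named ingredients, eGeMAPS functionals extracted with the opensmile Python package and a soft-voting ensemble; the constituent classifiers and their settings are illustrative assumptions, not the authors' configuration.

```python
import opensmile
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Extract the 88 eGeMAPSv02 functionals for one recording.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech_sample.wav")  # pandas DataFrame, one row

# Soft-voting ensemble over three base classifiers.
voter = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
# voter.fit(X_train, y_train); voter.predict_proba(X_test)
```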

4.
Artif Intell Med ; 156: 102953, 2024 10.
Article in English | MEDLINE | ID: mdl-39222579

ABSTRACT

BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a severe condition affecting millions worldwide and leading to numerous deaths annually. The absence of significant symptoms in its early stages contributes to high underdiagnosis rates. Besides pulmonary function failure, COPD also has harmful systemic effects, e.g., heart failure or voice distortion. These systemic effects might provide valuable information for early detection; in other words, symptoms caused by systemic effects could help detect the condition in its early stages. OBJECTIVE: The proposed study aims to explore whether voice features extracted from the vowel "a" utterance carry information predictive of COPD by employing Machine Learning (ML) on a newly collected voice dataset. METHODS: Forty-eight participants were recruited from the pool of research clinic visitors at Blekinge Institute of Technology (BTH) in Sweden between January 2022 and May 2023, yielding a dataset of 1246 recordings. The collection of voice recordings containing the vowel "a" utterance commenced following an information and consent meeting with each participant, using the VoiceDiagnostic application. The collected voice data were subjected to silence-segment removal and extraction of baseline acoustic features and Mel Frequency Cepstrum Coefficients (MFCC). Sociodemographic data were also collected from the participants. Three ML models were investigated for the binary classification of COPD and healthy controls: Random Forest (RF), Support Vector Machine (SVM), and CatBoost (CB). A nested k-fold cross-validation approach was employed, and the hyperparameters of each ML model were optimized using grid search. Performance was assessed using accuracy, F1-score, precision, and recall. The best classifier was then examined further using the Area Under the Curve (AUC), Average Precision (AP), and SHapley Additive exPlanations (SHAP) feature-importance measures. RESULTS: The RF, SVM, and CB classifiers achieved maximum accuracies of 77%, 69%, and 78% on the test set and 93%, 78%, and 97% on the validation set, respectively. The CB classifier outperformed RF and SVM; on further examination, it produced an AUC of 82% and an AP of 76%. In addition to age and gender, the mean values of the baseline acoustic and MFCC features showed high importance for classification performance in both test and validation sets, though in varied order. CONCLUSION: This study concludes that recordings of the vowel "a" utterance contain information that can be captured by the CatBoost classifier with high accuracy for the classification of COPD. Additionally, baseline acoustic and MFCC features, in conjunction with age and gender information, can be employed for classification purposes and support clinical decision-making in COPD diagnosis. CLINICAL TRIAL REGISTRATION NUMBER: NCT05897944.
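The nested cross-validation described above can be sketched in a few lines of scikit-learn; the fold counts, parameter grid, and synthetic stand-in data below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 24)), rng.integers(0, 2, 100)  # synthetic stand-in

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: hyperparameter search refit on each outer training split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="f1",
    cv=inner_cv,
)
# Outer loop: unbiased performance estimate.
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="f1")
print(f"nested CV F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```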


Subjects
Machine Learning; Pulmonary Disease, Chronic Obstructive; Pulmonary Disease, Chronic Obstructive/classification; Pulmonary Disease, Chronic Obstructive/physiopathology; Pulmonary Disease, Chronic Obstructive/diagnosis; Humans; Male; Female; Aged; Middle Aged; Voice/physiology; Support Vector Machine
5.
Comput Biol Med ; 182: 109021, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39236660

ABSTRACT

BACKGROUND: Voice analysis has significant potential to aid healthcare professionals in detecting, diagnosing, and personalising treatment. It represents an objective and non-intrusive tool for supporting the detection and monitoring of specific pathologies. By calculating various acoustic features, voice analysis extracts valuable information to assess voice quality, and the choice of these parameters is crucial for an accurate assessment. METHOD: In this paper, we propose a lightweight acoustic parameter set, named HEAR, able to evaluate voice quality to assess mental health. In detail, it consists of jitter, spectral centroid, Mel-frequency cepstral coefficients, and their derivatives. The choice of parameters for the proposed set was guided by the explainable significance of each acoustic parameter in the voice production process. RESULTS: The reliability of the proposed acoustic set for detecting early symptoms of mental disorders was evaluated experimentally. Voices of subjects suffering from different mental pathologies, selected from available databases, were analysed. The performance obtained with the HEAR features was compared with that obtained using features selected from toolkits widely used in the literature, as well as with features obtained using learned procedures. The best performance in terms of MAE and RMSE was achieved for the detection of depression (5.32 and 6.24, respectively). For the detection of psychogenic dysphonia and anxiety, the highest accuracy rates were about 75% and 97%, respectively. CONCLUSIONS: The comparative evaluation demonstrates a reliable capability of the proposed approach to highlight physiological alterations of voice quality due to the considered mental disorders.
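A sketch of extracting HEAR-style parameters; the librosa and Praat-via-parselmouth calls are assumed tooling for illustration, not the authors' pipeline, and the pitch floor/ceiling values are conventional defaults.

```python
import librosa
import parselmouth
from parselmouth.praat import call

# Spectral centroid, MFCCs, and their time derivatives with librosa.
y, sr = librosa.load("voice.wav", sr=None)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mfcc_delta = librosa.feature.delta(mfcc)  # first derivative over time

# Local jitter via Praat's point process (75-500 Hz pitch range assumed).
snd = parselmouth.Sound("voice.wav")
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 1e-4, 0.02, 1.3)
```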

6.
Circ Rep ; 6(8): 303-312, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39132330

ABSTRACT

Background: This study aimed to systematically evaluate voice symptoms during heart failure (HF) treatment and to exploratorily extract HF-related vocal biomarkers. Methods and Results: This single-center, prospective study longitudinally acquired 839 audio files from 59 patients with acute decompensated HF. Patients' voices were analyzed along with conventional HF indicators (New York Heart Association [NYHA] class, presence of pulmonary congestion and pleural effusion on chest X-ray, and B-type natriuretic peptide [BNP]) and GOKAN scores based on a cardiologist's assessment. Machine-learning (ML) models to estimate HF conditions were created using a Light Gradient Boosting Machine. Voice analysis identified 27 acoustic features that correlated with the conventional HF indicators and GOKAN scores. When ML models were created based on the acoustic features, there was a significant correlation between actual and ML-derived BNP levels (r=0.49; P<0.001). The ML models also achieved good diagnostic accuracy in determining HF conditions characterized by NYHA class ≥2, BNP ≥300 pg/mL, pulmonary congestion on chest X-ray, pleural effusion on chest X-ray, and decompensated HF (defined as NYHA class ≥2 and BNP ≥300 pg/mL): 75.1%, 69.1%, 68.7%, 66.4%, and 80.4%, respectively. Conclusions: The present study successfully extracted HF-related acoustic features that correlated with conventional HF indicators. Although the data are preliminary, ML models based on acoustic features (vocal biomarkers) have the potential to infer various HF conditions, which warrants future study.
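As a sketch of the modelling step, the snippet below fits a LightGBM regressor to estimate BNP from acoustic features and checks the correlation with measured values; the log transform, hyperparameters, and synthetic stand-in data are assumptions, not the study's configuration.

```python
import numpy as np
from lightgbm import LGBMRegressor
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 27))                    # stand-in acoustic features
y = rng.lognormal(mean=5.0, sigma=1.0, size=200)  # stand-in BNP values (pg/mL)

# Train on log-BNP to tame the skewed target, then map predictions back.
X_train, X_test, y_train, y_test = train_test_split(
    X, np.log1p(y), test_size=0.25, random_state=0
)
model = LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)

# Correlation between actual and model-derived BNP (reported r = 0.49).
r, p = pearsonr(np.expm1(model.predict(X_test)), np.expm1(y_test))
print(f"r = {r:.2f}, p = {p:.3g}")
```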

7.
Comput Biol Med ; 181: 109020, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39173487

ABSTRACT

Obstructive sleep apnea (OSA) is a chronic breathing disorder during sleep that affects 10-30% of adults in North America. The gold standard for diagnosing OSA is polysomnography (PSG). However, PSG has several drawbacks: it is a cumbersome and expensive procedure that can be quite inconvenient for patients, and patients often endure long waitlists before they can undergo it. As a result, other alternatives for screening OSA have gained attention. Speech, as an accessible modality, is generated by variations in the pharyngeal airway, vocal tract, and soft tissues of the pharynx, which share the anatomical structures that contribute to OSA. Consequently, this study provides a comprehensive review of the existing research on the use of speech for estimating the severity of OSA. A total of 851 papers were initially identified from the PubMed database using a set of keywords defined by population, intervention, comparison, and outcome (PICO) criteria, along with a graph of the 5 most-cited papers in the field extracted from the ConnectedPapers platform. Following a rigorous filtering process based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, 32 papers were ultimately included in this review. Among these, 28 papers primarily focused on developing methodology, while the remaining 4 delved into the clinical perspective of the association between OSA and speech. We then investigate the physiological similarities between OSA and speech, and highlight the features extracted from speech, the employed feature-selection techniques, and the details of the models developed to predict OSA severity. By thoroughly discussing the current findings and limitations of studies in the field, we provide valuable insights into the gaps to be addressed in future research.


Subjects
Sleep Apnea, Obstructive; Humans; Sleep Apnea, Obstructive/physiopathology; Wakefulness/physiology; Speech/physiology; Severity of Illness Index; Polysomnography
8.
J Affect Disord ; 362: 859-868, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39009320

ABSTRACT

BACKGROUND: Traditional methodologies for diagnosing post-traumatic stress disorder (PTSD) rely primarily on interviews, incurring considerable costs and lacking objective indices. Integrating biomarkers and machine learning techniques into this diagnostic process has the potential to facilitate accurate PTSD assessment by clinicians. METHODS: We assembled a dataset of recordings from 76 individuals diagnosed with PTSD and 60 healthy controls. Leveraging the openSMILE framework, we extracted acoustic features from these recordings and employed a random forest algorithm for feature selection. The selected features were then used as inputs for six distinct classification models and a regression model. RESULTS: Classification models employing an 18-element feature set yielded robust binary predictions of PTSD. Notably, the RF model achieved a peak accuracy of 0.975 with the highest AUC of 1.0. The regression model exhibited significant predictive capability for PCL-5 scores (MSE = 0.90, MAE = 0.76, R2 = 0.10, p < 0.001), with a noteworthy correlation coefficient of 0.33 (p < 0.01) between predicted and actual values. LIMITATIONS: First, the feature-selection process may compromise model stability, potentially leading to overestimated results. Second, it is difficult to elucidate the biological mechanisms underlying the differences between PTSD patients and healthy individuals. Last, the regression model has limited predictive power for PTSD severity. CONCLUSIONS: Distinct speech patterns differentiate PTSD patients from controls. Classification models accurately discern the two groups, and the regression model gauges PTSD severity, but further validation on larger datasets is needed.
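The feature-selection step can be sketched with scikit-learn: rank openSMILE features by random-forest importance and keep the top 18, mirroring the 18-element set above. The selection mechanics and stand-in data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X, y = rng.normal(size=(136, 88)), rng.integers(0, 2, 136)  # synthetic stand-in

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# threshold=-inf disables the importance cutoff so exactly 18 features remain.
selector = SelectFromModel(rf, max_features=18, threshold=-np.inf, prefit=True)
X_18 = selector.transform(X)  # (n_speakers, 18), input to the six classifiers
```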


Subjects
Machine Learning; Stress Disorders, Post-Traumatic; Humans; Stress Disorders, Post-Traumatic/diagnosis; Male; Female; Adult; Middle Aged; Severity of Illness Index; Speech Recognition Software; Case-Control Studies
9.
J Voice ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38890016

ABSTRACT

PURPOSE: This research aims to identify acoustic features that can distinguish patients with Parkinson's disease (PD patients) from healthy speakers. METHODS: Thirty PD patients and 30 healthy speakers were recruited, and their speech was collected, including three vowels (/i/, /a/, and /u/) and nine consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /l/, /m/, and /n/). Acoustic features such as fundamental frequency (F0), jitter, shimmer, harmonics-to-noise ratio (HNR), first formant (F1), second formant (F2), third formant (F3), first bandwidth (B1), second bandwidth (B2), third bandwidth (B3), voice onset, and voice onset time were analyzed. A two-sample independent t test or the nonparametric Mann-Whitney U (MWU) test, as appropriate, was used to compare the acoustic measures between the PD patients and healthy speakers. After identifying the acoustic features effective for distinguishing the two groups, we adopted two methods to detect PD patients: (1) classifiers built directly on the effective acoustic features, and (2) support vector machine classifiers trained on the effective acoustic features. RESULTS: Significant differences were found between the male PD group and the male healthy controls in vowel /i/ (jitter and shimmer) and /a/ (shimmer and HNR). Among female subjects, significant differences were observed between the two groups in the F0 standard deviation (F0 SD) of /u/. Significant differences between the PD group and healthy controls were also found in the F3 of /i/ and /n/, whereas other acoustic features showed no significant differences between the two groups. The HNR of vowel /a/ yielded the best classification accuracy among the seven effective acoustic features for distinguishing PD patients from healthy speakers. CONCLUSIONS: PD can cause changes in articulation and phonation, wherein some acoustic features increase or decrease. Therefore, using acoustic features to detect PD is expected to be a low-cost and large-scale diagnostic method.
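A sketch of the group-comparison logic, assuming the t test is used when both groups pass a normality check and the MWU test otherwise (the exact decision rule is an assumption, not stated in the abstract):

```python
from scipy.stats import mannwhitneyu, shapiro, ttest_ind

def compare_feature(pd_values, hc_values, alpha=0.05):
    """Compare one acoustic feature between PD and healthy-control groups."""
    normal = (shapiro(pd_values).pvalue > alpha
              and shapiro(hc_values).pvalue > alpha)
    if normal:
        return "t-test", ttest_ind(pd_values, hc_values).pvalue
    return "MWU", mannwhitneyu(pd_values, hc_values).pvalue
```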

10.
Sensors (Basel) ; 24(10)2024 May 11.
Article in English | MEDLINE | ID: mdl-38793909

ABSTRACT

Constipation is a common gastrointestinal disorder that impairs quality of life. Evaluating bowel motility via traditional methods, such as MRI and radiography, is expensive and inconvenient. Bowel sound (BS) analysis has been proposed as an alternative, with BS time-domain acoustic features (BSTDAFs) shown to be effective for evaluating bowel motility in several food- and drink-consumption tests. However, the effect of BSTDAFs before drink consumption on those after drink consumption had yet to be investigated. This study used BS-based stimulus-response plots (BSSRPs) to investigate this effect in 20 participants who underwent drinking tests. A strong negative correlation was observed between the number of BSs per minute before carbonated water consumption and the ratio of that number before and after consumption. However, a similar trend was not observed when the participants drank cold water. These findings suggest that when carbonated water is drunk, bowel motility before ingestion affects the motor response to ingestion. This study provides a non-invasive BS-based approach for evaluating motor response to food and drink, opening a new research window for investigators in this field.
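A small sketch of the reported analysis, assuming per-participant bowel-sound counts per minute before and after ingestion; the variable and function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

def response_ratio(bs_per_min_before, bs_per_min_after):
    """Post/pre-ingestion BS-rate ratio, one value per participant."""
    return np.asarray(bs_per_min_after) / np.asarray(bs_per_min_before)

# Correlate the pre-ingestion rate with the response ratio; a strongly
# negative r corresponds to the carbonated-water finding reported above.
# r, p = pearsonr(bs_before, response_ratio(bs_before, bs_after))
```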


Subjects
Drinking; Gastrointestinal Motility; Humans; Drinking/physiology; Male; Gastrointestinal Motility/physiology; Female; Adult; Young Adult; Constipation/physiopathology; Healthy Volunteers; Carbonated Water
11.
Poult Sci ; 103(6): 103711, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38652956

ABSTRACT

Sex identification of ducklings is a critical step in the poultry farming industry, and accurate sex identification enables precise breeding and cost savings. In this study, a method for identifying the sex of ducklings based on acoustic signals was proposed. First, duckling vocalizations were collected, and an improved spectral subtraction method and high-pass filtering were applied to reduce the influence of noise. Then, duckling vocalizations were automatically detected using a double-threshold endpoint detection method with three parameters: short-time energy (STE), short-time zero-crossing rate (ZCR), and duration (D). Following the extraction of Mel-spectrogram features from the vocalizations, an improved Res2Net deep learning algorithm was used for sex classification. The algorithm introduces the Squeeze-and-Excitation (SE) attention mechanism and the Ghost module into the Res2Net bottleneck, improving model accuracy while reducing the number of parameters. Ablation experiments showed that the SE attention mechanism improved model accuracy by 2.01%, while the Ghost module reduced the number of model parameters by 7.26M and the FLOPs by 0.85G. Moreover, the algorithm was compared with 5 state-of-the-art (SOTA) algorithms, and the results showed that the proposed algorithm is the most cost-effective, with accuracy, recall, and specificity of 94.80%, 94.92%, and 94.69%, 18.91M parameters, and 3.46G FLOPs. Finally, the vocalization detection score and an average-confidence strategy were used to predict the sex of individual ducklings, and the accuracy of the proposed model reached 96.67%. In conclusion, the method proposed in this study can effectively detect the sex of ducklings and serve as a reference for automated duckling sex identification.
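The double-threshold endpoint detection step can be sketched as follows: frames passing a low energy threshold and a ZCR ceiling form candidate runs, and a run is kept only if it also crosses a high energy threshold and satisfies the minimum duration D. All threshold values here are illustrative assumptions.

```python
import numpy as np

def detect_vocalizations(y, sr, frame=512, hop=256,
                         e_hi=0.01, e_lo=0.002, z_max=0.3, min_dur=0.05):
    """Return (start_s, end_s) segments found by double-threshold detection."""
    n = 1 + (len(y) - frame) // hop
    frames = np.stack([y[i * hop : i * hop + frame] for i in range(n)])
    ste = (frames ** 2).mean(axis=1)                             # short-time energy
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)   # zero-crossing rate

    candidate = (ste > e_lo) & (zcr < z_max)
    segments, start = [], None
    for i, active in enumerate(candidate):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i)); start = None
    if start is not None:
        segments.append((start, n))

    # Keep runs that reach the high threshold and last at least min_dur (D).
    return [(s * hop / sr, e * hop / sr) for s, e in segments
            if ste[s:e].max() > e_hi and (e - s) * hop / sr >= min_dur]
```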


Subjects
Ducks; Vocalization, Animal; Animals; Ducks/physiology; Female; Male; Vocalization, Animal/physiology; Acoustics; Sex Determination Analysis/veterinary; Sex Determination Analysis/methods; Algorithms
12.
Sensors (Basel) ; 24(8)2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38676050

ABSTRACT

The use of drones has recently gained popularity in a diverse range of applications, such as aerial photography, agriculture, search and rescue operations, and the entertainment industry. However, misuse of drone technology can potentially lead to military threats, terrorist acts, and privacy and safety breaches. This emphasizes the need for effective and fast remote detection of potentially threatening drones. In this study, we propose a novel approach for automatic drone detection utilizing both radio frequency (RF) communication signals and acoustic signals derived from UAV rotor sounds. In particular, we propose the use of classical and deep machine-learning techniques and the fusion of RF and acoustic features for efficient and accurate drone classification. Distinct types of ML-based classifiers were examined, including CNN- and RNN-based networks and the classical SVM method. The proposed approach was evaluated with both frequency and audio features on common drone datasets, demonstrating better accuracy than existing state-of-the-art methods, especially in low-SNR scenarios. The results presented in this paper show a classification accuracy of approximately 91% at an SNR of -10 dB using the LSTM network and fused features.
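A minimal PyTorch sketch of a fused-feature LSTM classifier, assuming per-frame radio-frequency and acoustic feature vectors are concatenated before the recurrent layer; the dimensions are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FusionLSTM(nn.Module):
    def __init__(self, rf_dim=64, acoustic_dim=40, hidden=128, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(rf_dim + acoustic_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, rf_feats, acoustic_feats):
        # Early fusion: concatenate the two feature streams per time step.
        x = torch.cat([rf_feats, acoustic_feats], dim=-1)  # (B, T, rf+ac)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last time step

model = FusionLSTM()
logits = model(torch.randn(8, 100, 64), torch.randn(8, 100, 40))  # (8, 2)
```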

13.
Appl Psychophysiol Biofeedback ; 49(1): 71-83, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38165498

ABSTRACT

Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state, but there are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions, and their use can improve the objectivity of psychiatric assessments. Therefore, biofeedback evaluated with subjective symptom scales together with objective speech and physiological features provides a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features that differed between the biofeedback and wait-list groups, an artificial neural network (ANN) was used to predict the therapeutic effect and treatment response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in the college students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared with the wait-list group (n = 52) and were related to the change in symptoms. The energy parameters and Mel-Frequency Cepstral Coefficients (MFCC) of the speech features can predict whether the biofeedback intervention effectively improves anxiety and insomnia symptoms, as well as treatment response. The accuracy of the ANN classification model for treatment response versus non-response was approximately 60%. The results of this study provide valuable information about biofeedback for improving the mental health of college students. The study identified speech features, such as the energy parameters and MFCC, as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial registration: Chinese Clinical Trial Registry ChiCTR2100045542.
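An illustrative sketch of a response/non-response classifier on energy and MFCC functionals using a small feed-forward network; the architecture and scaling step are assumptions, not the study's configuration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_students, n_speech_features); y: 1 = responder, 0 = non-responder.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
# clf.fit(X_train, y_train); clf.score(X_test, y_test)  # ~0.60 reported above
```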


Subjects
Sleep Initiation and Maintenance Disorders; Speech; Humans; Biofeedback, Psychology/methods; Students/psychology; Biomarkers; Machine Learning
14.
Infant Behav Dev ; 74: 101908, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37992456

ABSTRACT

The quality of infant-directed speech (IDS) and infant-directed singing (IDSi) is considered vital to children, but empirical studies on the protomusical qualities of IDSi that influence infant development are rare. The current prospective study examines how IDSi acoustic features, such as pitch variability, shape, and movement, and vocal amplitude vibration, timbre, and resonance, are associated with infant sensorimotor, language, and socioemotional development at six and 18 months. The sample consists of 236 Palestinian mothers from the Gaza Strip singing a song of their own choice to their six-month-olds. Maternal IDSi was recorded and analyzed with the OpenSMILE tool to depict the main acoustic features of pitch frequencies, variations, and contours, vocal intensity, resonance formants, and power. The results are based on the 219 completed maternal IDSi recordings. Mothers reported their infants' sensorimotor, language-vocalization, and socioemotional skills at six months, and psychologists tested these skills with the Bayley Scales of Infant Development at 18 months. Results show that maternal IDSi characterized by wide pitch variability and rich, high vocal amplitude and vibration was associated with optimal infant sensorimotor, language-vocalization, and socioemotional skills at six months, and rich, high vocal amplitude and vibration predicted these optimal developmental skills at 18 months as well. High resonance and rhythmicity formants were associated with optimal language and vocalization skills at six months. To conclude, IDSi is considered important in enhancing the wellbeing of newborns and at-risk infants, and the current findings argue that favorable acoustic singing qualities are crucial for optimal multidomain development across infancy.


Subjects
Singing; Female; Infant; Child; Infant, Newborn; Humans; Prospective Studies; Speech; Language; Acoustics; Language Development
15.
Sensors (Basel) ; 23(19)2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37836929

ABSTRACT

Birds play a vital role in the study of ecosystems and biodiversity. Accurate bird identification helps monitor biodiversity, understand the functions of ecosystems, and develop effective conservation strategies. However, previous bird sound recognition methods often relied on single features and overlooked the spatial information associated with them, leading to low accuracy. Recognizing this gap, the present study proposes a bird sound recognition method that employs multiple convolutional neural networks and a transformer encoder to provide a reliable solution for identifying and classifying birds based on their unique sounds. We manually extracted various acoustic features as model inputs and applied feature fusion to obtain the final set of feature vectors. Feature fusion combines the deep features extracted by the various networks, resulting in a more comprehensive feature set and thereby improving recognition accuracy. The multiple integrated acoustic features, such as Mel-frequency cepstral coefficients (MFCC), chroma features (Chroma), and Tonnetz features, were encoded by a transformer encoder, which effectively extracted the positional relationships between bird sound features and further enhanced recognition accuracy. The experimental results demonstrated the exceptional performance of our method, with an accuracy of 97.99%, a recall of 96.14%, an F1 score of 96.88%, and a precision of 97.97% on the Birdsdata dataset. Furthermore, our method achieved an accuracy of 93.18%, a recall of 92.43%, an F1 score of 93.14%, and a precision of 93.25% on the Cornell Bird Challenge 2020 (CBC) dataset.
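The hand-crafted inputs named above can be assembled with librosa as in the sketch below; stacking the three feature matrices frame-wise is an assumption about how the fusion inputs are laid out, and the file name is a placeholder.

```python
import librosa
import numpy as np

y, sr = librosa.load("bird_call.wav", sr=22050)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)           # (20, n_frames)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)             # (12, n_frames)
tonnetz = librosa.feature.tonnetz(                           # (6, n_frames)
    y=librosa.effects.harmonic(y), sr=sr
)

# Frame-wise stack of the three feature families for downstream encoding.
features = np.concatenate([mfcc, chroma, tonnetz], axis=0)   # (38, n_frames)
```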


Subjects
Ecosystem; Recognition, Psychology; Animals; Sound; Acoustics; Birds
16.
Sensors (Basel) ; 23(17)2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37688009

ABSTRACT

Although cochlear implants work well for people with hearing impairment in quiet conditions, it is well-known that they are not as effective in noisy environments. Noise reduction algorithms based on machine learning allied with appropriate speech features can be used to address this problem. The purpose of this study is to investigate the importance of acoustic features in such algorithms. Acoustic features are extracted from speech and noise mixtures and used in conjunction with the ideal binary mask to train a deep neural network to estimate masks for speech synthesis to produce enhanced speech. The intelligibility of this speech is objectively measured using metrics such as Short-time Objective Intelligibility (STOI), Hit Rate minus False Alarm Rate (HIT-FA), and Normalized Covariance Measure (NCM) for both simulated normal-hearing and hearing-impaired scenarios. A wide range of existing features is experimentally evaluated, including features that have not traditionally been applied in this context. The results demonstrate that frequency-domain features perform best. In particular, Gammatone features performed best for normal hearing over a range of signal-to-noise ratios and noise types (STOI = 0.7826). Mel spectrogram features exhibited the best overall performance for hearing impairment (NCM = 0.7314). There is a stronger correlation between STOI and NCM than between HIT-FA and NCM, suggesting that the former is a better predictor of intelligibility for hearing-impaired listeners. The results of this study may be useful in the design of adaptive intelligibility enhancement systems for cochlear implants based on both the noise level and the nature of the noise (stationary or non-stationary).
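A hedged sketch of the objective scoring step, assuming the pystoi package and clean/enhanced file pairs as placeholders:

```python
import soundfile as sf
from pystoi import stoi

clean, fs = sf.read("clean.wav")
enhanced, _ = sf.read("enhanced.wav")  # output of the mask-based DNN

# Short-time Objective Intelligibility between reference and enhanced speech;
# the best normal-hearing result reported above is STOI = 0.7826 (Gammatone).
score = stoi(clean, enhanced, fs, extended=False)
print(f"STOI = {score:.4f}")
```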


Subjects
Cochlear Implantation; Cochlear Implants; Humans; Acoustics; Algorithms; Benchmarking
18.
Alzheimers Dement ; 19(10): 4675-4687, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37578167

ABSTRACT

Recent advancements in the artificial intelligence (AI) domain have revolutionized the early detection of cognitive impairments associated with dementia. This has motivated clinicians to use AI-powered dementia detection systems, particularly systems developed from individuals' speech and language, for quick and accurate identification of patients with dementia. This paper reviews articles on developing assessment tools using machine learning and deep learning algorithms trained on vocal and textual datasets.

19.
Front Psychiatry ; 14: 1195276, 2023.
Article in English | MEDLINE | ID: mdl-37415683

ABSTRACT

Background: Depression is a widespread mental disorder that affects a significant portion of the population. However, the assessment of depression is often subjective, relying on standard questions or interviews. Acoustic features have been suggested as a reliable and objective alternative for depression assessment. Therefore, in this study, we aim to identify and explore voice acoustic features that can effectively and rapidly predict the severity of depression, and to investigate the potential correlation between specific treatment options and voice acoustic features. Methods: We utilized voice acoustic features correlated with depression scores to train a prediction model based on an artificial neural network. Leave-one-out cross-validation was performed to evaluate the performance of the model. We also conducted a longitudinal study to analyze the correlation between improvement in depression and changes in voice acoustic features after an Internet-based cognitive-behavioral therapy (ICBT) program consisting of 12 sessions. Results: Our study showed that the neural network model, trained on the 30 voice acoustic features significantly correlated with Hamilton Depression Rating Scale (HAMD) scores, can accurately predict the severity of depression with a mean absolute error of 3.137 and a correlation coefficient of 0.684. Furthermore, four of the 30 features significantly decreased after ICBT, indicating their potential correlation with specific treatment options and significant improvement in depression (p < 0.05). Conclusion: Voice acoustic features can effectively and rapidly predict the severity of depression, providing a low-cost and efficient method for large-scale screening of patients with depression. Our study also identified potential acoustic features that may be significantly related to specific treatment options for depression.
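The leave-one-out evaluation can be sketched with scikit-learn as below; the network size and the synthetic stand-in data are assumptions, with X and y standing for the 30-feature matrix and HAMD scores.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))    # stand-in for the 30 acoustic features
y = rng.uniform(0, 30, size=50)  # stand-in HAMD totals

reg = MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000, random_state=0)

# One prediction per held-out subject.
pred = cross_val_predict(reg, X, y, cv=LeaveOneOut())
mae = np.abs(pred - y).mean()   # reported above: 3.137
r, _ = pearsonr(pred, y)        # reported above: 0.684
```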

20.
J Voice ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37479635

ABSTRACT

Communication is imperative for living beings to exchange information, but for newborns the only way of communicating with the world is through crying, and it is the only medium through which caregivers can learn about the needs of their children. Addressing a baby's cries promptly is important so that the child is comforted as early as possible, which has been a challenge, especially for new parents. The literature says newborn babies use the Dunstan Baby Language to communicate. According to this language, there are five words for understanding a baby's needs: "Neh" (hungry), "Eh" (burp is needed), "Owh/Oah" (fatigue), "Eair/Eargghh" (cramps), and "Heh" (feeling hot or wet, i.e., physical discomfort). This research aims to develop a model for recognizing baby cries and distinguishing between different kinds of cries; here we focus more broadly on whether the infant is in pain due to hunger or discomfort. The study proposes a comparative approach using four classification models: random forest, support vector machine, logistic regression, and decision tree. These algorithms learn from spectral features extracted from the infant cry: chroma_stft, spectral_centroid, bandwidth, spectral_rolloff, mel-frequency cepstral coefficients, linear predictive coding, res, and zero_crossing_rate. The support vector machine model outperformed the other classifiers in correctly classifying infant cries.
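A sketch of the four-model comparison on the extracted spectral features; default hyperparameters, 5-fold scoring, and the synthetic stand-in data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 40)), rng.integers(0, 2, 120)  # synthetic stand-in

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```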
