Results 1 - 20 of 38
1.
Cureus ; 16(7): e65394, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39184734

ABSTRACT

Auscultation with a stethoscope is unsuitable for continuous monitoring. Therefore, we developed a novel acoustic monitoring system that continuously, objectively, and visually evaluates respiratory sounds. In this report, we assess the usefulness of our revised system in a ventilated extremely low birth weight infant (ELBWI) for the diagnosis of pulmonary atelectasis and evaluation of treatment by lung lavage. A female infant was born at 24 weeks of gestation with a birth weight of 636 g after emergency cesarean section. The patient received invasive mechanical ventilation immediately after birth in our neonatal intensive care unit (NICU). After obtaining informed consent, we monitored her respiratory status with the respiratory-sound monitoring system by attaching a sound collection sensor to the right anterior chest wall. On day 26, lung-sound spectrograms showed that breath sounds attenuated as hypoxemia progressed. Chest radiography confirmed the diagnosis of pulmonary atelectasis. To relieve the atelectasis, surfactant lavage was performed, after which the lung-sound spectrograms returned to normal. Hypoxemia and chest radiographic findings improved significantly. On day 138, the patient was discharged from the NICU without complications. The continuous respiratory-sound monitoring system enabled visual, quantitative, and noninvasive detection of acute regional lung abnormalities at the bedside. We therefore believe that this system can resolve several problems associated with neonatal respiratory management and save lives.

2.
Talanta ; 278: 126426, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38908135

ABSTRACT

BACKGROUND: Ankylosing spondylitis (AS), osteoarthritis (OA), and Sjögren's syndrome (SS) are three prevalent autoimmune diseases. If left untreated, they can lead to severe joint damage and greatly limit mobility. As the disease worsens, patients face the risk of long-term disability and, in severe cases, even life-threatening consequences. RESULTS: In this study, Raman spectral data from AS, OA, and SS were analyzed to aid disease diagnosis. For the first time, the Euclidean distance (ED) upscaling technique was used to convert one-dimensional (1D) disease spectral data into two-dimensional (2D) spectral images. A dual-attention mechanism network was then constructed to analyze these 2D spectral maps for disease diagnosis. The results demonstrate that the dual-attention mechanism network achieves a diagnostic accuracy of 100 % when analyzing 2D ED spectrograms. Furthermore, a comparison with the S-transform (ST), short-time Fourier transform (STFT), recurrence plot (RP), Markov transition field (MTF), and Gramian angular field (GAF) highlights the significant advantage of the proposed method, as it substantially shortens the conversion time while supporting disease-assisted diagnosis. Mutual information (MI) was utilized for the first time to validate the generated 2D Raman spectrograms, including the ED, ST, STFT, RP, MTF, and GAF spectrograms, allowing the similarity between the original 1D spectral data and the generated 2D spectrograms to be evaluated. SIGNIFICANCE: The results indicate that utilizing ED to transform 1D spectral data into 2D images, coupled with a convolutional neural network (CNN) for analyzing the 2D ED Raman spectrograms, holds great promise as a tool for assisting disease diagnosis. The research demonstrated that the 2D spectrogram created with ED closely resembles the original 1D spectral data, indicating that ED effectively captures the key features and important information of the original data and provides a strong description of it.
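
The ED upscaling step is only summarized above; as a minimal sketch of one plausible reading, the 2D image can be taken as the pairwise distance matrix of the 1D spectrum's intensity values (the function and the synthetic spectrum below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def euclidean_distance_image(spectrum):
    """Map a 1D spectrum of length N to an N x N image.

    Assumed interpretation: pixel (i, j) holds |x_i - x_j|, the distance
    between the intensities at spectral positions i and j, rescaled to [0, 1].
    """
    x = np.asarray(spectrum, dtype=float)
    d = np.abs(x[:, None] - x[None, :])                  # pairwise distances
    return (d - d.min()) / (d.max() - d.min() + 1e-12)   # normalize for a CNN

# Toy usage: a synthetic Raman-like spectrum with two peaks.
wavenumbers = np.linspace(400, 1800, 512)
spectrum = (np.exp(-((wavenumbers - 1003) / 8) ** 2)
            + 0.6 * np.exp(-((wavenumbers - 1450) / 12) ** 2))
image = euclidean_distance_image(spectrum)               # 512 x 512 input image
print(image.shape)
```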


Subjects
Spectrum Analysis, Raman; Spondylitis, Ankylosing; Humans; Spectrum Analysis, Raman/methods; Spondylitis, Ankylosing/diagnosis; Sjogren's Syndrome/diagnosis; Osteoarthritis/diagnosis; Neural Networks, Computer
3.
Sci Rep ; 14(1): 6589, 2024 03 19.
Article in English | MEDLINE | ID: mdl-38504098

ABSTRACT

Identifying and recognizing food on the basis of its eating sounds is a challenging task, and it plays an important role in avoiding allergenic foods, supporting dietary preferences for people restricted to a particular diet, showcasing cultural significance, and more. In this research paper, the aim is to design a novel methodology that helps identify food items by analyzing their eating sounds with various deep learning models. To achieve this objective, a system has been proposed that extracts meaningful features from food-eating sounds with the help of signal processing techniques and deep learning models, classifying them into their respective food classes. Initially, 1200 labeled audio files for 20 food items were collected and visualized to find relationships between the sound files of different food items. Next, to extract meaningful features, techniques such as spectrograms, spectral rolloff, spectral bandwidth, and mel-frequency cepstral coefficients were used to clean the audio files and capture the unique characteristics of different food items. In the next phase, various deep learning models, including GRU, LSTM, InceptionResNetV2, and a customized CNN model, were trained to learn spectral and temporal patterns in the audio signals. The models were also hybridized, i.e., Bidirectional LSTM + GRU, RNN + Bidirectional LSTM, and RNN + Bidirectional GRU, to analyze their performance on the same labeled data and associate particular sound patterns with their corresponding food classes. During evaluation, the highest accuracy was obtained by GRU with 99.28%, the highest precision and F1 score by Bidirectional LSTM + GRU with 97.7% and 97.3%, respectively, and the highest recall by RNN + Bidirectional LSTM with 97.45%. The results of this study demonstrate that deep learning models have the potential to precisely identify foods on the basis of their sound.
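
As a minimal sketch of the kind of per-clip feature extraction described above, using librosa (the file name, sampling rate, and mean-pooling step are assumptions, not the paper's exact pipeline):

```python
import numpy as np
import librosa

def extract_features(path, n_mfcc=13):
    """Load one eating-sound clip and return a fixed-size feature vector."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # (n_mfcc, frames)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)      # (1, frames)
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # (1, frames)
    # Average each feature over time so every clip yields the same-size vector.
    return np.concatenate([mfcc.mean(axis=1),
                           rolloff.mean(axis=1),
                           bandwidth.mean(axis=1)])

# Hypothetical usage on one labeled recording:
# features = extract_features("chips_bite_001.wav")   # shape: (15,)
```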


Subjects
Deep Learning; Humans; Recognition, Psychology; Food; Mental Recall; Records
4.
Sensors (Basel) ; 24(5)2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38474965

ABSTRACT

Deep learning has driven breakthroughs in emotion recognition in many fields, especially speech emotion recognition (SER). As an important part of SER, extraction of the most relevant acoustic features has always attracted researchers' attention. To address the problem that the emotional information in speech signals is dispersed and that existing models cannot comprehensively integrate local and global information, this paper presents a network model based on a gated recurrent unit (GRU) and multi-head attention. We evaluate our proposed emotion model on the IEMOCAP and Emo-DB corpora. The experimental results show that the network model based on Bi-GRU and multi-head attention is significantly better than traditional network models on multiple evaluation metrics. We also apply the model to a speech sentiment analysis task; on the CH-SIMS and MOSI datasets, the model shows excellent generalization performance.
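
A minimal Keras sketch of the Bi-GRU plus multi-head-attention idea described above; the input shape, layer sizes, and number of classes are assumptions rather than the authors' configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_bigru_attention(num_frames=300, num_features=40, num_classes=4):
    inp = layers.Input(shape=(num_frames, num_features))       # e.g. log-Mel frames
    x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(inp)
    # Self-attention over the Bi-GRU outputs mixes local and global context.
    attn = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)
    x = layers.LayerNormalization()(x + attn)                   # residual connection
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inp, out)

model = build_bigru_attention()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```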


Subjects
Perception; Speech; Acoustics; Emotions; Recognition, Psychology
5.
Bioengineering (Basel) ; 11(1)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38247932

ABSTRACT

Cough-based diagnosis for respiratory diseases (RDs) using artificial intelligence (AI) has attracted considerable attention, yet many existing studies overlook confounding variables in their predictive models. These variables can distort the relationship between cough recordings (input data) and RD status (output variable), leading to biased associations and unrealistic model performance. To address this gap, we propose the Bias-Free Network (RBF-Net), an end-to-end solution that effectively mitigates the impact of confounders in the training data distribution. RBF-Net yields accurate and unbiased RD diagnosis; its relevance is emphasized by incorporating a COVID-19 dataset in this study. This approach aims to enhance the reliability of AI-based RD diagnosis models by navigating the challenges posed by confounding variables. A hybrid of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks is proposed for the feature encoder module of RBF-Net. An additional bias predictor is incorporated into the classification scheme to formulate a conditional Generative Adversarial Network (c-GAN) that helps decorrelate the impact of confounding variables from RD prediction. The merit of RBF-Net is demonstrated by comparing classification performance with a State-of-The-Art (SoTA) Deep Learning (DL) model (CNN-LSTM) after training on different unbalanced COVID-19 data sets, created by using a large-scale proprietary cough data set. RBF-Net proved its robustness against extremely biased training scenarios by achieving test set accuracies of 84.1%, 84.6%, and 80.5% for the confounding variables gender, age, and smoking status, respectively. RBF-Net outperforms the CNN-LSTM model's test set accuracies by 5.5%, 7.7%, and 8.2%, respectively.

6.
Comput Biol Med ; 170: 107908, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38217973

ABSTRACT

The electrocardiogram (ECG) is a physiological signal and a standard test for measuring the heart's electrical activity, depicting the movement of cardiac muscles. A review has been conducted of ECG signal analysis with artificial intelligence (AI) methods over the last ten years (2012-2022). Primarily, the methods of ECG analysis by software systems were divided into classical signal processing (e.g., spectrograms or filters), machine learning (ML), and deep learning (DL), including recursive models, transformers, and hybrids. Secondly, the data sources and benchmark datasets were described. The authors grouped resources by ECG acquisition method into hospital-based portable machines and wearable devices, and also covered new trends such as advanced pre-processing, data augmentation, simulations, and agent-based modeling. The study found that ECG analysis has improved each year through ML, DL, hybrid models, and transformers. Convolutional neural networks and hybrid models were the most targeted and proved efficient, and transformer models extended accuracy from 90% to 98%. The PhysioNet library helps acquire ECG signals and includes popular benchmark databases such as MIT-BIH, PTB, and other challenging datasets. Similarly, wearable devices have been established as an appropriate option for monitoring patient health without time and place limitations and are also helpful for AI model calibration, with accuracy of 82%-83% so far on a Samsung smartwatch. In pre-processing, spectrogram generation through Fourier and wavelet transformations has emerged as a leading approach, yielding average accuracies of 90%-95%. Likewise, data augmentation using geometric techniques is well established, whereas extraction- and concatenation-based methods need further attention. Because what-if analyses of cardiac issues can be performed with complex simulations, the study also reviews agent-based modeling and simulation approaches for cardiovascular risk event assessment.
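
As a concrete illustration of the spectrogram-generation step the review highlights, a minimal SciPy sketch on a synthetic ECG-like signal (the sampling rate, window length, and overlap are assumptions):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 360                                        # Hz, typical of MIT-BIH recordings
t = np.arange(0, 10, 1 / fs)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)  # stand-in signal

# Short-time Fourier analysis: 1-second windows with 50% overlap.
f, frames, Sxx = spectrogram(ecg_like, fs=fs, nperseg=fs, noverlap=fs // 2)
log_spec = 10 * np.log10(Sxx + 1e-12)           # dB scale, the usual 2D-CNN input
print(log_spec.shape)                           # (frequency bins, time frames)
```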


Subjects
Algorithms; Artificial Intelligence; Humans; Neural Networks, Computer; Software; Signal Processing, Computer-Assisted; Electrocardiography/methods
7.
Sensors (Basel) ; 23(22)2023 Nov 10.
Article in English | MEDLINE | ID: mdl-38005472

ABSTRACT

Recent successes in deep learning have inspired researchers to apply deep neural networks to Acoustic Event Classification (AEC). While deep learning methods can train effective AEC models, they are susceptible to overfitting due to the models' high complexity. In this paper, we introduce EnViTSA, an innovative approach that tackles key challenges in AEC. EnViTSA combines an ensemble of Vision Transformers with SpecAugment, a novel data augmentation technique, to significantly enhance AEC performance. Raw acoustic signals are transformed into Log Mel-spectrograms using Short-Time Fourier Transform, resulting in a fixed-size spectrogram representation. To address data scarcity and overfitting issues, we employ SpecAugment to generate additional training samples through time masking and frequency masking. The core of EnViTSA resides in its ensemble of pre-trained Vision Transformers, harnessing the unique strengths of the Vision Transformer architecture. This ensemble approach not only reduces inductive biases but also effectively mitigates overfitting. In this study, we evaluate the EnViTSA method on three benchmark datasets: ESC-10, ESC-50, and UrbanSound8K. The experimental results underscore the efficacy of our approach, achieving impressive accuracy scores of 93.50%, 85.85%, and 83.20% on ESC-10, ESC-50, and UrbanSound8K, respectively. EnViTSA represents a substantial advancement in AEC, demonstrating the potential of Vision Transformers and SpecAugment in the acoustic domain.
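
A minimal NumPy sketch of the time- and frequency-masking step named above (mask widths are fixed here for brevity; the actual SpecAugment recipe draws mask sizes at random per sample):

```python
import numpy as np

def spec_augment(log_mel, freq_mask=8, time_mask=20, rng=None):
    """Zero out one random band of Mel bins and one random span of frames."""
    rng = rng or np.random.default_rng()
    out = log_mel.copy()                        # shape: (n_mels, n_frames)
    n_mels, n_frames = out.shape
    f0 = rng.integers(0, max(1, n_mels - freq_mask))
    t0 = rng.integers(0, max(1, n_frames - time_mask))
    out[f0:f0 + freq_mask, :] = 0.0             # frequency mask
    out[:, t0:t0 + time_mask] = 0.0             # time mask
    return out

augmented = spec_augment(np.random.rand(64, 256))   # toy log-Mel spectrogram
```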

8.
Sensors (Basel) ; 23(16)2023 Aug 13.
Article in English | MEDLINE | ID: mdl-37631690

ABSTRACT

Hydraulic systems are used in all kinds of industries: mills, manufacturing, robotics, and ports all require hydraulic equipment. Many industries prefer hydraulic systems because of their numerous advantages over electrical and mechanical systems, so demand for them continues to grow. Given this vast variety of applications, faults in hydraulic systems can cause breakdowns. Using artificial-intelligence (AI)-based approaches, faults can be classified and predicted to avoid downtime and ensure sustainable operations. This research work proposes a novel approach for classifying the cooling behavior of a hydraulic test rig. Three fault conditions of the test rig's cooling system were used, and spectrograms were generated from the time series data for each condition. A CNN variant, the Residual Network, was used to classify the fault conditions. Performance measures including F-score, precision, accuracy, and recall were derived from a confusion matrix. The data contained 43,680 attributes and 2205 instances. After training, validation, and testing, the ResNet-18 architecture achieved an accuracy close to 95%.
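
A minimal PyTorch sketch of fine-tuning ResNet-18 for the three cooling-condition classes; the preprocessing, image size, and hyperparameters are assumptions, not the authors' setup:

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 3                                     # three cooling-fault conditions
model = models.resnet18(weights=None)               # pass ImageNet weights here to fine-tune
model.fc = nn.Linear(model.fc.in_features, num_classes)   # replace the classification head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of spectrogram "images".
spectrograms = torch.randn(8, 3, 224, 224)          # batch of 8 RGB-rendered spectrograms
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(spectrograms), labels)
loss.backward()
optimizer.step()
```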

9.
Bioengineering (Basel) ; 10(5)2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37237601

ABSTRACT

Parkinson's disease is a progressive neurodegenerative disorder caused by dopaminergic neuron degeneration. Parkinsonian speech impairment is one of the earliest presentations of the disease and, along with tremor, is suitable for pre-diagnosis. It is defined by hypokinetic dysarthria and accounts for respiratory, phonatory, articulatory, and prosodic manifestations. This article targets artificial-intelligence-based identification of Parkinson's disease from continuous speech recorded in a noisy environment. The novelty of this work is twofold. First, the proposed assessment workflow performed speech analysis on samples of continuous speech. Second, we analyzed and quantified the applicability of the Wiener filter for speech denoising in the context of Parkinsonian speech identification. We argue that the Parkinsonian features of loudness, intonation, phonation, prosody, and articulation are contained in the speech, speech energy, and Mel spectrograms. Thus, the proposed workflow follows a feature-based speech assessment to determine the feature variation ranges, followed by speech classification using convolutional neural networks. We report the best classification accuracies of 96% on speech energy, 93% on speech, and 92% on Mel spectrograms. We conclude that the Wiener filter improves both feature-based analysis and convolutional-neural-network-based classification performance.
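
A minimal sketch of Wiener-filter denoising ahead of spectrogram computation, using SciPy's adaptive Wiener filter as a stand-in (the window size and spectrogram settings are assumptions, and the paper's filter design may differ):

```python
import numpy as np
from scipy.signal import wiener, spectrogram

fs = 16000                                      # Hz, a common speech sampling rate
t = np.arange(0, 2, 1 / fs)
clean = np.sin(2 * np.pi * 220 * t)             # stand-in for a voiced speech segment
noisy = clean + 0.3 * np.random.randn(t.size)

denoised = wiener(noisy, mysize=29)             # adaptive Wiener smoothing
f, frames, Sxx = spectrogram(denoised, fs=fs, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(Sxx + 1e-12)           # spectrogram fed to the CNN classifier
print(log_spec.shape)
```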

10.
Comput Biol Med ; 161: 107027, 2023 07.
Article in English | MEDLINE | ID: mdl-37211003

ABSTRACT

The COVID-19 pandemic has highlighted a significant research gap in the field of molecular diagnostics. This has brought forth the need for AI-based edge solutions that can provide quick diagnostic results whilst maintaining data privacy, security, and high standards of sensitivity and specificity. This paper presents a novel proof-of-concept method to detect nucleic acid amplification using ISFET sensors and deep learning. This enables the detection of DNA and RNA on a low-cost and portable lab-on-chip platform for identifying infectious diseases and cancer biomarkers. We show that by using spectrograms to transform the signal to the time-frequency domain, image processing techniques can be applied to achieve reliable classification of the detected chemical signals. Transformation to spectrograms is beneficial as it makes the data compatible with 2D convolutional neural networks and yields a significant performance improvement over neural networks trained on the time-domain data. The trained network achieves an accuracy of 84% with a size of 30 kB, making it suitable for deployment on edge devices. This facilitates a new wave of lab-on-chip platforms that combine microfluidics, CMOS-based chemical sensing arrays, and AI-based edge solutions for more intelligent and rapid molecular diagnostics.


Subjects
COVID-19; Pandemics; Humans; COVID-19/diagnosis; Neural Networks, Computer; DNA; Nucleic Acid Amplification Techniques
11.
Food Chem X ; 17: 100386, 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-36974180

ABSTRACT

The present study performs a comparative analysis of vegetable oils and their two-component blends using infrared spectroscopy and refractometry. The study was conducted in Almaty (Kazakhstan) in 2020. Three samples of 44 vegetable oils and their two-component blends were examined. Refractometry and infrared spectroscopy were used to investigate the properties of the blended vegetable oils. To this end, the fatty acid fraction (in percent), iodine number, and index of refraction (IOR) were calculated, and the spectrograms obtained for the blends were then analyzed. It was found that the difference in intensity of the weak bands and the broadening of the band at 722 cm-1 become more pronounced. When low-intensity bands (1653 cm-1) become more distinct due to vibrations of double carbon (C=C) bonds, the level of unsaturated fatty acids in the blend increases as well.

12.
Sensors (Basel) ; 23(6)2023 Mar 08.
Article in English | MEDLINE | ID: mdl-36991659

ABSTRACT

Internet of things (IoT)-enabled wireless body area network (WBAN) is an emerging technology that combines medical devices, wireless devices, and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the healthcare domain and machine learning; it is a technique that can be used to automatically identify speakers' emotions from their speech. However, SER systems, especially in the healthcare domain, are confronted with several challenges, such as low prediction accuracy, high computational complexity, delays in real-time prediction, and the difficulty of identifying appropriate features from speech. Motivated by these research gaps, we propose an emotion-aware IoT-enabled WBAN system within the healthcare framework, where data processing and long-range data transmission are performed by an edge AI system for real-time prediction of patients' speech emotions and for capturing changes in emotion before and after treatment. Additionally, we investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model, i.e., a convolutional neural network (CNN) combined with bidirectional long short-term memory (BiLSTM), and a regularized CNN model. We combined the models with different optimization strategies and regularization techniques to improve prediction accuracy, reduce generalization error, and reduce the computational complexity of the neural networks in terms of computational time, power, and space. Different experiments were performed to check the efficiency and effectiveness of the proposed algorithms. The proposed models were compared with a related existing model for evaluation and validation, using standard performance metrics such as prediction accuracy, precision, recall, F1 score, the confusion matrix, and the differences between actual and predicted values. The experimental results showed that one of the proposed models outperformed the existing model with an accuracy of about 98%.


Subjects
Internet of Things; Speech; Humans; Neural Networks, Computer; Machine Learning; Emotions
13.
Sensors (Basel) ; 23(6)2023 Mar 13.
Article in English | MEDLINE | ID: mdl-36991794

ABSTRACT

In the industrial sector, tool health monitoring has taken on significant importance due to its ability to save labor costs, time, and waste. The approach used in this research applies spectrograms of airborne acoustic emission data and a convolutional neural network variant, the Residual Network, to monitor the tool health of an end-milling machine. The dataset was created using three different types of cutting tools: new, moderately used, and worn out. The acoustic emission signals generated by these tools were recorded for various cut depths ranging from 1 mm to 3 mm. Two distinct kinds of wood, hardwood (Pine) and softwood (Himalayan Spruce), were employed in the experiment. For each case, 28 samples totaling 10 s were captured. The trained model's prediction accuracy was evaluated using 710 samples, and the results showed an overall classification accuracy of 99.7%. The model's testing accuracy was 100% for classifying hardwood and 99.5% for classifying softwood.

14.
PeerJ Comput Sci ; 9: e1740, 2023.
Article in English | MEDLINE | ID: mdl-38192463

ABSTRACT

Nowadays, biometric authentication has gained relevance due to technological advances that have allowed its inclusion in many daily-use devices. However, this same advantage has also brought dangers, as spoofing attacks are now more common. This work addresses the vulnerabilities of automatic speaker verification systems, which are prone to attacks arising from new techniques for generating spoofed audio. In this article, we present a countermeasure for these attacks using an approach that includes easy-to-implement feature extractors, such as spectrograms and mel frequency cepstral coefficients, as well as a modular architecture based on deep neural networks. Finally, we evaluate our proposal on the well-known ASVspoof 2017 V2 database; the experiments show that the final architecture obtains the best performance, achieving an equal error rate of 6.66% on the evaluation set.

15.
Sensors (Basel) ; 22(24)2022 Dec 07.
Article in English | MEDLINE | ID: mdl-36559944

ABSTRACT

Non-invasive electrocardiogram (ECG) signals are useful in assessing heart condition and helpful in diagnosing cardiac diseases. However, traditional approaches, i.e., medical consultation, require effort, knowledge, and time to interpret ECG signals due to the large amount of data and its complexity. Neural networks have recently been shown to be efficient in interpreting biomedical signals, including ECG and EEG. The novelty of the proposed work lies in using spectrograms instead of raw signals. Spectrograms can easily be reduced by eliminating frequencies that carry no ECG information. Moreover, spectrogram calculation via the short-time Fourier transform (STFT) is time-efficient, which allowed the reduced data to be presented in a well-distinguishable form to a convolutional neural network (CNN). The data reduction was performed through frequency filtration with a specific cutoff value. These steps keep the architecture of the CNN model simple while achieving high accuracy. The proposed approach reduces memory usage and computational power by avoiding complex CNN models. The large, publicly available PTB-XL dataset was utilized, and two datasets were prepared for binary classification: spectrograms and raw signals. The proposed approach achieved the highest accuracy of 99.06%, which indicates that spectrograms are better than raw signals for ECG classification. Further, up- and down-sampling of the signals was also performed at various sampling rates, and the corresponding accuracies were obtained.
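
A minimal sketch of the data-reduction idea described above: compute an STFT spectrogram and keep only the frequency bins below a cutoff (the 40 Hz cutoff and window settings are assumptions, not the paper's values):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 500                                        # Hz, PTB-XL records are provided at 500 Hz
ecg = np.random.randn(10 * fs)                  # stand-in for one 10-second ECG lead

f, t, Sxx = spectrogram(ecg, fs=fs, nperseg=256, noverlap=128)
cutoff_hz = 40                                  # assumed cutoff; discard bins above it
keep = f <= cutoff_hz
reduced = 10 * np.log10(Sxx[keep, :] + 1e-12)   # smaller image passed to the CNN
print(Sxx.shape, "->", reduced.shape)
```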


Subjects
Heart Diseases; Neural Networks, Computer; Humans; Heart Rate; Electrocardiography; Filtration; Algorithms
16.
Biosensors (Basel) ; 12(11)2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36354433

ABSTRACT

Treating opioid use disorder (OUD) is a significant healthcare challenge in the United States. Remaining abstinent from opioids is difficult for individuals with OUD due to withdrawal symptoms that include restlessness. However, to our knowledge, studies of acute withdrawal have not quantified restlessness using involuntary movements. We hypothesized that wearable accelerometry placed mid-sternum could be used to detect withdrawal-related restlessness in patients with OUD. To study this, 23 patients with OUD undergoing active withdrawal participated in a protocol involving wearable accelerometry, opioid cues to elicit craving, and non-invasive vagal nerve stimulation (nVNS) to dampen withdrawal symptoms. Using the accelerometry signals, we analyzed how movements correlated with changes in acute withdrawal severity, measured by the Clinical Opioid Withdrawal Scale (COWS). Our results revealed that patients demonstrating sinusoidal, i.e., predominantly single-frequency, oscillation patterns in their motion almost exclusively showed an increase in the COWS, and there was a strong relationship between the maximum power spectral density and increased withdrawal over time, as measured by the COWS (R = 0.92, p = 0.029). Accelerometry may be used in an ambulatory setting to indicate increased intensity of a patient's withdrawal symptoms, providing an objective, readily measurable marker that can be captured ubiquitously.
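
A minimal sketch of the reported analysis step, assuming the maximum power spectral density comes from a Welch estimate of each accelerometry segment and is correlated with COWS scores (all signals and scores below are synthetic placeholders):

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import pearsonr

fs = 32                                         # Hz, an assumed accelerometer sampling rate

def max_psd(segment):
    """Peak of the Welch power spectral density for one recording segment."""
    f, pxx = welch(segment, fs=fs, nperseg=256)
    return float(pxx.max())

# Synthetic example: segments with increasingly strong 0.5 Hz oscillation,
# paired with made-up COWS scores, just to show the correlation step.
t = np.arange(0, 60, 1 / fs)
segments = [a * np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.randn(t.size)
            for a in (0.2, 0.4, 0.6, 0.8, 1.0)]
cows_scores = [4, 7, 9, 12, 15]

peaks = [max_psd(s) for s in segments]
r, p = pearsonr(peaks, cows_scores)
print(f"R = {r:.2f}, p = {p:.3f}")
```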


Subjects
Opioid-Related Disorders; Substance Withdrawal Syndrome; Humans; Analgesics, Opioid/therapeutic use; Prognosis; Psychomotor Agitation; Substance Withdrawal Syndrome/diagnosis; Substance Withdrawal Syndrome/drug therapy; Opioid-Related Disorders/diagnosis; Opioid-Related Disorders/drug therapy; Accelerometry
17.
BMC Med Inform Decis Mak ; 22(1): 226, 2022 08 29.
Article in English | MEDLINE | ID: mdl-36038901

ABSTRACT

BACKGROUND: The application of machine learning to cardiac auscultation has the potential to improve the accuracy and efficiency of both routine and point-of-care screenings. The use of convolutional neural networks (CNN) on heart sound spectrograms in particular has defined state-of-the-art performance. However, the relative paucity of patient data remains a significant barrier to creating models that can adapt to a wide range of potential variability. To that end, we examined a CNN model's performance on automated heart sound classification before and after various forms of data augmentation, and aimed to identify the most optimal augmentation methods for cardiac spectrogram analysis. RESULTS: We built a standard CNN model to classify cardiac sound recordings as either normal or abnormal. The baseline control model achieved a PR AUC of 0.763 ± 0.047. Among the single data augmentation techniques explored, horizontal flipping of the spectrogram image improved model performance the most, with a PR AUC of 0.819 ± 0.044. Principal component analysis (PCA) color augmentation and perturbations of saturation-value (SV) on the hue-saturation-value (HSV) color scale achieved PR AUCs of 0.779 ± 0.045 and 0.784 ± 0.037, respectively. Time and frequency masking resulted in a PR AUC of 0.772 ± 0.050. Pitch shifting, time stretching and compressing, noise injection, vertical flipping, and applying random color filters negatively impacted model performance. Concatenating the best-performing data augmentation technique (horizontal flip) with PCA and SV perturbations improved model performance. CONCLUSION: Data augmentation can improve classification accuracy by expanding and diversifying the dataset, which protects against overfitting to random variance. However, data augmentation is necessarily domain specific. For example, methods like noise injection have found success in other areas of automated sound classification, but in the context of cardiac sound analysis, noise injection can mimic the presence of murmurs and worsen model performance. Thus, care should be taken to ensure clinically appropriate forms of data augmentation to avoid negatively impacting model performance.
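
A minimal sketch of the best-performing augmentation named above, horizontal flipping of the spectrogram image, combined with a simple saturation-value jitter in HSV space (OpenCV is assumed for the color-space conversion, and the jitter range is illustrative):

```python
import numpy as np
import cv2

def augment_spectrogram(img_rgb, sv_jitter=0.1, rng=None):
    """Horizontally flip a spectrogram image and jitter its saturation/value channels."""
    rng = rng or np.random.default_rng()
    flipped = np.ascontiguousarray(np.fliplr(img_rgb))           # time axis reversed
    hsv = cv2.cvtColor(flipped, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 1] *= 1.0 + rng.uniform(-sv_jitter, sv_jitter)      # saturation perturbation
    hsv[..., 2] *= 1.0 + rng.uniform(-sv_jitter, sv_jitter)      # value perturbation
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

# Toy usage on a random 224 x 224 RGB-rendered spectrogram image.
augmented = augment_spectrogram(
    np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
```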


Subjects
Heart Sounds; Humans; Machine Learning; Neural Networks, Computer
18.
Multimed Tools Appl ; 81(21): 31107-31128, 2022.
Article in English | MEDLINE | ID: mdl-35431609

ABSTRACT

Stress and anger are two negative emotions that affect individuals both mentally and physically, and there is a need to tackle them as soon as possible. Automated systems are highly desirable for monitoring mental states and detecting early signs of emotional health issues. In the present work, a convolutional neural network is proposed for anger and stress detection using handcrafted features and deep learned features from the spectrogram. The objective of using a combined feature set is to gather information from two different representations of the speech signal, obtaining more prominent features and boosting recognition accuracy. The proposed method of emotion assessment is more computationally efficient than similar approaches. Preliminary results from experimental evaluation of the proposed approach on three datasets, the Toronto Emotional Speech Set (TESS), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Emotional Database (EMO-DB), indicate that categorical accuracy is boosted and cross-entropy loss is reduced to a considerable extent. The proposed convolutional neural network (CNN) obtains training (T) and validation (V) categorical accuracies of T = 93.7%, V = 95.6% for TESS, T = 97.5%, V = 95.6% for EMO-DB, and T = 96.7%, V = 96.7% for RAVDESS.

19.
Front Public Health ; 10: 819865, 2022.
Article in English | MEDLINE | ID: mdl-35400062

ABSTRACT

Understanding the reason for an infant's cry is one of the most difficult tasks for parents. There may be various reasons behind the baby's cry: hunger, pain, sleep, or diaper-related problems. The key to identifying the reason behind an infant's cry lies in the varying patterns of the crying audio. The audio comprises many features that are highly important for classification, and it is necessary to convert the audio signals into suitable spectrograms. In this article, we seek efficient solutions to the problem of predicting the reason behind an infant's cry. We used the Mel-frequency cepstral coefficients algorithm to generate the spectrograms and analyzed the varying feature vectors. We then developed two approaches to obtain the experimental results. In the first approach, we used Convolutional Neural Network (CNN) variants, VGG16 and YOLOv4, to classify the infant cry signals. In the second approach, a multistage heterogeneous stacking ensemble model was used for infant cry classification; its major advantage is the inclusion of various advanced boosting algorithms at multiple levels. The proposed multistage heterogeneous stacking ensemble model had the edge over the other neural network models, especially in terms of overall performance and computing power. Finally, after extensive comparison, the proposed model delivered superior performance with a mean classification accuracy of up to 93.7%.


Subjects
Crying; Neural Networks, Computer; Algorithms; Humans; Infant