ABSTRACT
INTRODUCTION: Multiple sclerosis (MS) is a leading cause of disability among young adults, but standard clinical scales may not accurately detect subtle changes in disability occurring between visits. This study aims to explore whether wearable device data provides more granular and objective measures of disability progression in MS. METHODS: Remote Assessment of Disease and Relapse in Central Nervous System Disorders (RADAR-CNS) is a longitudinal multicenter observational study in which 400 MS patients have been recruited since June 2018 and prospectively followed up for 24 months. Monitoring of patients included standard clinical visits with assessment of disability through use of the Expanded Disability Status Scale (EDSS), 6-minute walking test (6MWT) and timed 25-foot walk (T25FW), as well as remote monitoring through the use of a Fitbit. RESULTS: Among the 306 patients who completed the study (mean age, 45.6 years; females 67%), confirmed disability progression defined by the EDSS was observed in 74 patients, who had approximately 1392 fewer daily steps than patients without disability progression. However, the decrease in the number of steps experienced over time by patients with EDSS progression and stable patients was not significantly different. Similar results were obtained with disability progression defined by the 6MWT and the T25FW. CONCLUSION: The use of continuous activity monitoring holds great promise as a sensitive and ecologically valid measure of disability progression in MS.
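As a hedged illustration of how such step-count trajectories could be compared between progression groups, the sketch below fits a linear mixed-effects model with a time-by-progression interaction on synthetic data; it is not the study's actual analysis, and the cohort, column names, and effect sizes are all assumptions.

```python
# Illustrative sketch (not the study's analysis): does the decline in daily
# steps over time differ between patients with and without EDSS-confirmed
# progression? All data below are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for pid in range(60):                        # synthetic cohort of 60 patients
    progressed = int(pid < 15)               # ~25% with confirmed progression
    base = rng.normal(7000 - 1400 * progressed, 800)
    for month in range(0, 25, 3):            # visits across 24 months
        steps = base - 20 * month + rng.normal(0, 300)
        rows.append((pid, month, progressed, steps))
df = pd.DataFrame(rows, columns=["patient_id", "month", "progressed", "daily_steps"])

# Random intercept per patient; the month:progressed coefficient tests whether
# the slope of steps over time differs between the two groups.
fit = smf.mixedlm("daily_steps ~ month * progressed",
                  df, groups=df["patient_id"]).fit()
print(fit.summary())
```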
Subjects
Persons with Disabilities, Multiple Sclerosis, Wearable Electronic Devices, Female, Humans, Male, Middle Aged, Disability Evaluation, Multiple Sclerosis/diagnosis, Walk Test, Walking/physiology, Adult
ABSTRACT
From early in the coronavirus disease 2019 (COVID-19) pandemic, there was interest in using machine learning methods to predict COVID-19 infection status based on vocal audio signals, for example, cough recordings. However, early studies had limitations in terms of data collection and of how the performances of the proposed predictive models were assessed. This article describes how these limitations have been overcome in a study carried out by the Turing-RSS Health Data Laboratory and the UK Health Security Agency. As part of the study, the UK Health Security Agency collected a dataset of acoustic recordings, SARS-CoV-2 infection status and extensive study participant meta-data. This allowed us to rigorously assess state-of-the-art machine learning techniques to predict SARS-CoV-2 infection status based on vocal audio signals. The lessons learned from this project should inform future studies on statistical evaluation methods to assess the performance of machine learning techniques for public health tasks.
Subjects
COVID-19, Machine Learning, Humans, COVID-19/epidemiology, United Kingdom, Public Health, SARS-CoV-2
ABSTRACT
The coronavirus (COVID-19) pandemic has driven several research efforts, from collecting COVID-19 patients' data to screening them for virus detection. Some COVID-19 symptoms are related to the functioning of the respiratory system, which influences speech production; this motivates research on identifying markers of COVID-19 in speech and other human-generated audio signals. In this article, we give an overview of research on human audio signals that uses 'Artificial Intelligence' techniques to screen, diagnose, and monitor COVID-19 and to raise awareness about it. This overview will be useful for developing automated systems that can help in the context of COVID-19, using non-obtrusive and easy-to-use bio-signals conveyed in human speech and non-speech audio productions.
ABSTRACT
The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact and the regulations on wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of 71.8% Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers, which mainly used two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by adopting ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of 80.1%. Moreover, we present the results of fusing the approaches, leading to a UAR of 82.6%. Finally, we present a smartphone app that can be used as a proof-of-concept demonstration to detect in real time whether users are wearing a face mask; we also benchmark the run-time of the best models.
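For reference, the Unweighted Average Recall (UAR) quoted above is simply the mean of the per-class recalls, so each class counts equally regardless of its size. A minimal sketch with scikit-learn, on toy labels rather than challenge data:

```python
# UAR = recall averaged over classes = macro-averaged recall in scikit-learn.
from sklearn.metrics import recall_score

y_true = ["mask", "mask", "no_mask", "no_mask", "no_mask"]  # toy ground truth
y_pred = ["mask", "no_mask", "no_mask", "no_mask", "mask"]  # toy predictions

uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.3f}")  # mean of per-class recalls: (1/2 + 2/3) / 2 ≈ 0.583
```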
ABSTRACT
COVID-19 is a global health crisis that has been affecting our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous, with a severity continuum. Many symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the first time, the present study investigates voice acoustic correlates of a COVID-19 infection based on a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /u:/, /o:/, and /a:/ produced by 11 symptomatic COVID-19-positive and 11 COVID-19-negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels, indicating discontinuities in the pulmonic airstream during phonation in COVID-19-positive participants. Group differences in front vowels are additionally reflected in fundamental frequency variation and the harmonics-to-noise ratio; group differences in back vowels, in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Our findings represent an important proof-of-concept contribution for a potential voice-based identification of individuals infected with COVID-19.
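A minimal sketch of the statistical procedure described above, on synthetic values for a single acoustic feature; the rank-biserial correlation shown is one common effect size for the Mann-Whitney U test, though the abstract does not specify which effect size the study used.

```python
# Mann-Whitney U test on one feature between two groups, plus a rank-biserial
# correlation as an illustrative effect size. Values are synthetic.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
positive = rng.normal(0.30, 0.05, 11)  # e.g., mean voiced segment length, COVID-19-positive
negative = rng.normal(0.38, 0.05, 11)  # COVID-19-negative

u, p = mannwhitneyu(positive, negative, alternative="two-sided")
# Rank-biserial correlation: r = 1 - 2U / (n1 * n2), ranging over [-1, 1]
r = 1 - 2 * u / (len(positive) * len(negative))
print(f"U = {u:.1f}, p = {p:.4f}, rank-biserial r = {r:.2f}")
```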
Subjects
COVID-19, Voice, Acoustics, Humans, Phonation, SARS-CoV-2, Speech Acoustics, Voice Quality
ABSTRACT
Due to the complex and intricate nature of their production, the acoustic-prosodic properties of a speech signal are modulated by a range of health-related effects. There is an active and growing area of machine learning research in this speech and health domain, focusing on developing paradigms to objectively extract and measure such effects. Concurrently, deep learning is transforming intelligent signal analysis, such that machines are now reaching near-human capabilities in a range of recognition and analysis tasks. Herein, we review current state-of-the-art approaches to speech-based health detection, placing a particular focus on the impact of deep learning within this domain. Based on this overview, it is evident that, while deep learning-based solutions are becoming more present in the literature, they have not had the same overall dominating effect seen in other related fields. In this regard, we suggest some possible research directions aimed at fully leveraging the advantages that deep learning can offer speech-based health detection.
Subjects
Deep Learning/trends, Speech, Acoustics, Humans, Neural Networks, Computer
ABSTRACT
In their recent publication in Patterns, the authors proposed a methodology based on sample-free Bayesian neural networks and label smoothing to improve both predictive and calibration performance on animal call detection. Such approaches have the potential to foster trust in algorithmic decision making and enhance policy making in conservation applications that use recordings made by on-site passive acoustic monitoring equipment. This interview is a companion to these authors' recent paper, "Propagating Variational Model Uncertainty for Bioacoustic Call Label Smoothing".
ABSTRACT
Due to the objectivity of emotional expression in the central nervous system, EEG-based emotion recognition can effectively reflect humans' internal emotional states. In recent years, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have made significant strides in extracting local features and temporal dependencies from EEG signals. However, CNNs ignore the spatial distribution information of EEG electrodes; moreover, RNNs may encounter issues such as exploding/vanishing gradients and high time consumption. To address these limitations, we propose an attention-based temporal graph representation network (ATGRNet) for EEG-based emotion recognition. First, a hierarchical attention mechanism is introduced to integrate feature representations from both frequency bands and channels ordered by priority in EEG signals. Second, a graph convolutional neural network with a top-k operation is utilized to capture internal relationships between EEG electrodes under different emotion patterns. Next, a residual-based graph readout mechanism is applied to accumulate the node-level EEG feature representations into graph-level representations. Finally, the obtained graph-level representations are fed into a temporal convolutional network (TCN) to extract the temporal dependencies between EEG frames. We evaluated the proposed ATGRNet on the SEED, DEAP and FACED datasets. The experimental findings show that ATGRNet surpasses state-of-the-art graph-based methods for EEG-based emotion recognition.
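To make the final stage concrete, below is a minimal PyTorch sketch of one TCN building block of the kind described: a dilated causal 1-D convolution with a residual connection. The hyperparameters and tensor shapes are illustrative assumptions, not those of ATGRNet.

```python
# Minimal TCN-style block: dilated causal 1-D convolution + residual connection.
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        # Left-pad so the convolution is causal (no leakage from future frames).
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        out = nn.functional.pad(x, (self.pad, 0))  # pad on the left only
        out = self.relu(self.conv(out))
        return out + x  # residual connection; temporal length is preserved

x = torch.randn(8, 64, 32)            # 8 samples, 64 graph-level features, 32 EEG frames
block = TemporalBlock(64, dilation=2)
print(block(x).shape)                  # torch.Size([8, 64, 32])
```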
Subjects
Electroencephalography, Emotions, Neural Networks, Computer, Signal Processing, Computer-Assisted, Humans, Electroencephalography/methods, Emotions/physiology, Algorithms
ABSTRACT
Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware multimodal fusion approach that quantifies modality-wise aleatoric (data) uncertainty for emotion prediction. We propose a novel fusion framework, in which latent distributions over unimodal temporal context are learned by constraining their variance. These variance constraints, Calibration and Ordinal Ranking, are designed such that the variance estimated for a modality can represent how informative the temporal context of that modality is w.r.t. emotion recognition. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions are likely to differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across different modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. Our evaluation on AVEC 2019 CES, CMU-MOSEI, and IEMOCAP datasets shows that the proposed multimodal fusion method not only improves the generalisation performance of emotion recognition models and their predictive uncertainty estimates, but also makes the models robust to novel noise patterns encountered at test time.
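As a hedged sketch of the underlying idea, the snippet below shows how modality-wise variance estimates can down-weight less informative modalities during fusion. It illustrates generic inverse-variance weighting only; it is not the paper's calibration, ordinal-ranking, or softmax distributional matching losses.

```python
# Inverse-variance (precision-weighted) fusion across modalities: high-variance
# (uncertain) modalities contribute less to the fused estimate.
import torch

def fuse(means: torch.Tensor, log_vars: torch.Tensor) -> torch.Tensor:
    """means, log_vars: (n_modalities, batch, dims) -> fused (batch, dims)."""
    precision = torch.exp(-log_vars)            # inverse variance per modality
    weights = precision / precision.sum(dim=0)  # normalise across modalities
    return (weights * means).sum(dim=0)

audio_mean, audio_logvar = torch.randn(4, 2), torch.zeros(4, 2)        # confident modality
video_mean, video_logvar = torch.randn(4, 2), torch.full((4, 2), 2.0)  # uncertain modality

fused = fuse(torch.stack([audio_mean, video_mean]),
             torch.stack([audio_logvar, video_logvar]))
print(fused.shape)  # torch.Size([4, 2]) -- dominated by the audio estimate
```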
ABSTRACT
Ubiquitous sensing has been widely applied in smart healthcare, providing an opportunity for intelligent heart sound auscultation. However, smart devices contain sensitive information, raising user privacy concerns. To this end, federated learning (FL) has been adopted as an effective solution, enabling decentralised learning without data sharing and thus preserving data privacy in the Internet of Health Things (IoHT). Nevertheless, traditional FL requires the same architectural models to be trained across local clients and global servers, leading to a lack of model heterogeneity and client personalisation. For medical institutions with privacy-sensitive client data, this study proposes Fed-MStacking, a heterogeneous FL framework that incorporates a stacking ensemble learning strategy to support clients in building their own models. The secondary objective of this study is to address scenarios involving local clients whose data are inconsistently labelled. Specifically, a local client may contain only one case type, and the data cannot be shared within or outside the institution. To train a global multi-class classifier, we aggregate missing-class information from all clients at each institution and build meta-data, which then participate in FL training via a meta-learner. We apply the proposed framework to a multi-institutional heart sound database. The experiments utilise random forests (RFs), feedforward neural networks (FNNs), and convolutional neural networks (CNNs) as base classifiers. The results show that heterogeneous stacking of local models performs better than homogeneous stacking.
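The stacking idea at the core of Fed-MStacking can be sketched in centralised form as follows: heterogeneous base classifiers feed out-of-fold predictions to a meta-learner. This is an illustration only; the federated framework keeps base models on their clients and aggregates meta-data instead, and the CNN base classifier is omitted here.

```python
# Centralised sketch of stacking with heterogeneous base models (RF + FNN)
# and a logistic-regression meta-learner, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("fnn", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression(),  # meta-learner
    cv=5,  # out-of-fold base-model predictions feed the meta-learner
)
stack.fit(X, y)
print(f"train accuracy: {stack.score(X, y):.3f}")
```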
Subjects
Heart Sounds, Machine Learning, Signal Processing, Computer-Assisted, Humans, Heart Sounds/physiology, Algorithms, Heart Auscultation/methods, Adult
ABSTRACT
Cardiovascular diseases are a prominent cause of mortality, emphasizing the need for early prevention and diagnosis. Utilizing artificial intelligence (AI) models, heart sound analysis emerges as a noninvasive and universally applicable approach for assessing cardiovascular health conditions. However, real-world medical data are dispersed across medical institutions, forming "data islands" due to data-sharing limitations imposed for security reasons. To this end, federated learning (FL) has been extensively employed in the medical field, as it can effectively model across multiple institutions. Additionally, conventional supervised classification methods require fully labeled data classes; e.g., binary classification requires labeling of both positive and negative samples. Nevertheless, labeling healthcare data is time-consuming and labor-intensive, raising the possibility of mislabeling negative samples. In this study, we validate an FL framework with a naive positive-unlabeled (PU) learning strategy, in which a semisupervised FL model learns directly from a limited set of positive samples and an extensive pool of unlabeled samples. Our emphasis is on vertical FL to enhance collaboration across institutions with different medical record feature spaces. Additionally, our contribution extends to feature importance analysis, where we explore 6 methods and provide practical recommendations for detecting abnormal heart sounds. The study demonstrated an impressive accuracy of 84%, comparable to outcomes in supervised learning, thereby advancing the application of FL in abnormal heart sound detection.
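A minimal sketch of the naive PU strategy referred to above, on synthetic data: unlabeled samples are treated as negatives, and scores can optionally be corrected with the Elkan-Noto constant. The federated and vertical-FL machinery and the feature-importance analysis are omitted, and the correction step is an assumption rather than the study's documented procedure.

```python
# Naive PU learning sketch: train labeled-positive vs. unlabeled, then apply
# the Elkan-Noto correction p(y=1|x) ~= p(s=1|x) / c, with c estimated on positives.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, (100, 5))   # labeled positives (e.g., abnormal sounds)
X_unl = rng.normal(0.0, 1.0, (400, 5))   # unlabeled mix of both classes

X = np.vstack([X_pos, X_unl])
s = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]  # labeled-vs-unlabeled flag

clf = LogisticRegression().fit(X, s)      # naive step: unlabeled treated as negative
c = clf.predict_proba(X_pos)[:, 1].mean() # Elkan-Noto constant from positives
p_y = np.clip(clf.predict_proba(X_unl)[:, 1] / c, 0, 1)
print(f"estimated positive rate among unlabeled: {(p_y > 0.5).mean():.2f}")
```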
ABSTRACT
This scoping review redefines the Artificial Intelligence-based Internet of Things (AIoT)-driven Human Activity Recognition (HAR) field by systematically extrapolating from various application domains to deduce potential techniques and algorithms. We distill a general model with adaptive learning and optimization mechanisms by conducting a detailed analysis of human activity types and of contact and non-contact devices. The review presents various mathematical paradigms for system integration driven by multimodal data fusion, covering predictions of complex behaviors and redefining valuable methods, devices, and systems for HAR. Additionally, it establishes benchmarks for behavior recognition across different application requirements, from simple localized actions to group activities. It summarizes open research directions, including data diversity and volume, computational limitations, interoperability, real-time recognition, data security, and privacy concerns. Finally, we aim for this review to serve as a comprehensive and foundational resource for researchers delving into the complex and burgeoning realm of AIoT-enhanced HAR, providing insights and guidance for future innovations and developments.
ABSTRACT
OBJECTIVE: Early diagnosis of cardiovascular diseases is a crucial task in medical practice. With the application of computer audition in the healthcare field, artificial intelligence (AI) has been applied to clinical non-invasive intelligent auscultation of heart sounds to provide rapid and effective pre-screening. However, AI models generally require large amounts of data, which may cause privacy issues. Unfortunately, it is difficult to collect large amounts of healthcare data from a single centre. METHODS: In this study, we propose federated learning (FL) optimisation strategies for practical application in multi-centre institutional heart sound databases. Horizontal FL is mainly employed to tackle the privacy problem by aligning the feature spaces of the participating institutions without information leakage. In addition, techniques based on deep learning have poor interpretability due to their "black-box" property, which limits the feasibility of applying AI to real medical data. To this end, vertical FL is utilised to address the issues of model interpretability and data scarcity. CONCLUSION: Experimental results demonstrate that the proposed FL framework can achieve good performance for heart sound abnormality detection while taking personal privacy protection into account. Moreover, using the federated feature space is beneficial for balancing the interpretability of the vertical FL and the privacy of the data. SIGNIFICANCE: This work realises the potential of FL from research to clinical practice and is expected to find extensive application in federated smart medical systems.
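As a hedged illustration of the horizontal-FL component, the sketch below shows federated averaging (FedAvg)-style aggregation, where the server combines client weights in proportion to local data size without seeing the raw data; the paper's specific optimisation strategies are not reproduced here.

```python
# FedAvg-style aggregation: per-layer weighted average of client model weights.
import numpy as np

def fedavg(client_weights: list[list[np.ndarray]],
           client_sizes: list[int]) -> list[np.ndarray]:
    """Average per-layer weights across clients, weighted by local data size."""
    total = sum(client_sizes)
    return [sum(w[layer] * (n / total)
                for w, n in zip(client_weights, client_sizes))
            for layer in range(len(client_weights[0]))]

# Two hospitals with one-layer "models" and different amounts of local data:
w_a = [np.array([1.0, 2.0])]
w_b = [np.array([3.0, 4.0])]
print(fedavg([w_a, w_b], client_sizes=[100, 300]))  # -> [array([2.5, 3.5])]
```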
Subjects
Heart Sounds, Humans, Heart Sounds/physiology, Signal Processing, Computer-Assisted, Male, Databases, Factual, Deep Learning, Adult, Female, Algorithms, Middle Aged, Young Adult, Child
ABSTRACT
Objective: Millions of people in the UK have asthma, yet 70% do not access basic care, leading to the largest number of asthma-related deaths in Europe. Chatbots may extend the reach of asthma support and provide a bridge to traditional healthcare. This study evaluates 'Brisa', a chatbot designed to improve asthma patients' self-assessment and self-management. Methods: We recruited 150 adults with an asthma diagnosis to test our chatbot. Participants were recruited over three waves through social media and a research recruitment platform. Eligible participants had access to 'Brisa' via a WhatsApp or website version for 28 days and completed entry and exit questionnaires to evaluate user experience and asthma control. Weekly symptom tracking, user interaction metrics, satisfaction measures, and qualitative feedback were utilised to evaluate the chatbot's usability and potential effectiveness, focusing on changes in asthma control and self-reported behavioural improvements. Results: 74% of participants engaged with 'Brisa' at least once. High task completion rates were observed: asthma attack risk assessment (86%), voice recording submission (83%) and asthma control tracking (95.5%). Post-use, an 8% improvement in asthma control was reported. User satisfaction surveys indicated positive feedback on helpfulness (80%), privacy (87%), trustworthiness (80%) and functionality (84%) but highlighted a need for improved conversational depth and personalisation. Conclusions: The study indicates that chatbots are effective for asthma support, demonstrated by the high usage of features like risk assessment and control tracking, as well as a statistically significant improvement in asthma control. However, lower satisfaction in conversational flexibility highlights rising expectations for chatbot fluency, influenced by advanced models like ChatGPT. Future health-focused chatbots must balance conversational capability with accuracy and safety to maintain engagement and effectiveness.
ABSTRACT
Among the 17 Sustainable Development Goals (SDGs) proposed within the 2030 Agenda and adopted by all United Nations member states, the 13th SDG is a call for action to combat climate change. Moreover, SDGs 14 and 15 call for the protection and conservation of life below water and life on land, respectively. In this work, we provide a literature-founded overview of application areas in which computer audition - a powerful technology combining audio signal processing and machine intelligence that has so far been hardly considered in this context - is employed to monitor our ecosystem, with the potential to identify ecologically critical processes or states. We distinguish between applications related to organisms, such as species richness analysis and plant health monitoring, and applications related to the environment, such as melting ice monitoring or wildfire detection. This work positions computer audition in relation to alternative approaches by discussing methodological strengths and limitations, as well as ethical aspects. We conclude with an urgent call to action to the research community for greater involvement of audio intelligence methodology in future ecosystem monitoring approaches.
ABSTRACT
Along with propagating the input toward making a prediction, Bayesian neural networks also propagate uncertainty. This has the potential to guide the training process by rejecting predictions of low confidence, and recent variational Bayesian methods can do so without Monte Carlo sampling of weights. Here, we apply sample-free methods for wildlife call detection on recordings made via passive acoustic monitoring equipment in the animals' natural habitats. We further propose uncertainty-aware label smoothing, where the smoothing probability depends on the sample-free predictive uncertainty, in order to down-weight data samples that should contribute less to the loss value. We introduce a bioacoustic dataset recorded in Malaysian Borneo, containing overlapping calls from 30 species. On that dataset, our proposed method achieves an absolute percentage improvement of around 1.5 points in area under the receiver operating characteristic (AU-ROC), 13 points in F1, and 19.5 points in expected calibration error (ECE) compared to the point-estimate network baseline, averaged across all target classes.
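The proposed uncertainty-aware label smoothing can be sketched as follows: the per-sample smoothing amount grows with predictive uncertainty, so uncertain samples contribute softer targets to the loss. The linear mapping from uncertainty to the smoothing coefficient, and all shapes below, are illustrative assumptions rather than the paper's exact formulation.

```python
# Per-sample label smoothing whose strength scales with predictive uncertainty.
import torch
import torch.nn.functional as F

def uncertainty_smoothed_loss(logits: torch.Tensor,
                              targets: torch.Tensor,
                              uncertainty: torch.Tensor,
                              max_eps: float = 0.2) -> torch.Tensor:
    """logits: (B, K); targets: (B,) class ids; uncertainty: (B,) in [0, 1]."""
    n_classes = logits.size(1)
    eps = max_eps * uncertainty.unsqueeze(1)          # per-sample smoothing amount
    one_hot = F.one_hot(targets, n_classes).float()
    soft = (1 - eps) * one_hot + eps / n_classes      # smoothed target distribution
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.randn(4, 30)                        # e.g., 30 species classes
targets = torch.randint(0, 30, (4,))
uncertainty = torch.tensor([0.05, 0.9, 0.3, 0.6])  # e.g., sample-free predictive uncertainty
print(uncertainty_smoothed_loss(logits, targets, uncertainty))
```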
ABSTRACT
Leveraging the power of artificial intelligence to facilitate automatic analysis and monitoring of heart sounds has attracted tremendous effort in the past decade. Nevertheless, the lack of a standard open-access database made it difficult to sustain comparable research before the first release of the PhysioNet CinC Challenge Dataset. Even so, inconsistent standards for data collection, annotation, and partitioning still restrain fair and efficient comparison between different works. To this end, we previously introduced and benchmarked a first version of the Heart Sounds Shenzhen (HSS) corpus. Motivated and inspired by previous works based on HSS, we redefine the tasks and make a comprehensive investigation of shallow and deep models in this study. First, we segmented the heart sound recordings into shorter recordings (10 s), which makes the setting more similar to human auscultation. Second, we redefined the classification tasks: besides the 3-class categorisation (normal, moderate, and mild/severe) adopted in HSS, we added a binary classification task, i.e., normal vs. abnormal. We provide detailed benchmarks based on both classic machine learning and state-of-the-art deep learning technologies, which are reproducible using open-source toolkits. Last but not least, we analyse the feature contributions to the best performance achieved by the benchmark, to make the results more convincing and interpretable.
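The 10 s segmentation step can be sketched as below, assuming mono recordings loaded with librosa; the file name is hypothetical, and dropping the trailing remainder is one possible policy rather than necessarily the one used for HSS.

```python
# Slice a recording into non-overlapping 10 s windows at its native sample rate.
import librosa

def segment(path: str, window_s: float = 10.0):
    y, sr = librosa.load(path, sr=None, mono=True)  # keep the native sample rate
    hop = int(window_s * sr)
    # Trailing audio shorter than one window is dropped in this sketch.
    return [y[i:i + hop] for i in range(0, len(y) - hop + 1, hop)]

chunks = segment("heart_sound.wav")  # hypothetical file name
print(f"{len(chunks)} segments of 10 s each")
```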
ABSTRACT
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up and help beat coronavirus' digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% with linked influenza PCR test results.
Subjects
COVID-19, Humans, Cough, COVID-19/diagnosis, Exhalation, Machine Learning, Polymerase Chain Reaction, Speech, United Kingdom
ABSTRACT
BACKGROUND: Prior research has associated spoken language use with depression, yet studies often involve small or non-clinical samples and face challenges in the manual transcription of speech. This paper aimed to automatically identify depression-related topics in speech recordings collected from clinical samples. METHODS: The data included 3919 English free-response speech recordings collected via smartphones from 265 participants with a depression history. We transcribed speech recordings via automatic speech recognition (Whisper tool, OpenAI) and identified principal topics from the transcriptions using a deep learning topic model (BERTopic). To identify depression risk topics and understand their context, we compared participants' depression severity and behavioral (extracted from wearable devices) and linguistic (extracted from transcribed texts) characteristics across the identified topics. RESULTS: Of the 29 topics identified, 6 were depression risk topics: 'No Expectations', 'Sleep', 'Mental Therapy', 'Haircut', 'Studying', and 'Coursework'. Participants mentioning depression risk topics exhibited higher sleep variability, later sleep onset, and fewer daily steps, and used fewer words, more negative language, and fewer leisure-related words in their speech recordings. LIMITATIONS: Our findings were derived from a depressed cohort with a specific speech task, potentially limiting the generalizability to non-clinical populations or other speech tasks. Additionally, some topics had small sample sizes, necessitating further validation in larger datasets. CONCLUSION: This study demonstrates that specific speech topics can indicate depression severity. The employed data-driven workflow provides a practical approach for analyzing large-scale speech data collected from real-world settings.
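A condensed sketch of the transcription-plus-topic-modelling pipeline described in the methods, using the open-source Whisper and BERTopic libraries; the directory path, model size, and parameters are illustrative assumptions rather than the study's exact configuration.

```python
# Transcribe recordings with Whisper, then discover topics with BERTopic.
import glob
import whisper
from bertopic import BERTopic

asr = whisper.load_model("base")                 # model size is illustrative

# Transcribe each free-response recording to text.
paths = sorted(glob.glob("recordings/*.wav"))    # hypothetical directory
docs = [asr.transcribe(p)["text"] for p in paths]

# Fit a topic model over the transcriptions and inspect the discovered topics.
topic_model = BERTopic(min_topic_size=10)        # parameter is illustrative
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info())
```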
Subjects
Deep Learning, Speech, Humans, Smartphone, Depression/diagnosis, Speech Recognition Software
ABSTRACT
BACKGROUND: Based on evidence that mental health is more than an absence of mental disorders, there have been calls to find ways to promote flourishing at a population level, especially in young people, which requires effective and scalable interventions. Despite their potential for scalability, few mental wellbeing apps have been rigorously tested in high-powered trials, derived from models of healthy emotional functioning, or tailored to individual profiles. We aimed to test a personalised emotional competence self-help app versus a cognitive behavioural therapy (CBT) self-help app versus a self-monitoring app to promote mental wellbeing in healthy young people. METHODS: This international, multicentre, parallel, open-label, randomised controlled trial within a cohort multiple randomised trial (including a parallel trial of depression prevention) was done at four university trial sites in four countries (the UK, Germany, Spain, and Belgium). Participants were recruited from schools and universities and via social media from the four respective countries. Eligible participants were aged 16-22 years with well-adjusted emotional competence profiles and no current or past diagnosis of major depression. Participants were randomised (1:1:1) to usual practice plus either the emotional competence app, the CBT app, or the self-monitoring app, by an independent computerised system, minimised by country, age, and self-reported gender, and followed up for 12 months post-randomisation. The primary outcome was mental wellbeing (indexed by the Warwick-Edinburgh Mental Well-Being Scale [WEMWBS]) at 3-month follow-up, analysed in participants who completed the 3-month follow-up assessment. Outcome assessors were masked to group allocation. The study is registered with ClinicalTrials.gov, NCT04148508, and is closed. FINDINGS: Between Oct 15, 2020, and Aug 3, 2021, 2532 participants were enrolled, and 847 were randomly assigned to the emotional competence app, 841 to the CBT app, and 844 to the self-monitoring app. Mean age was 19·2 years (SD 1·8). Of 2532 participants self-reporting gender, 1896 (74·9%) were female, 613 (24·2%) were male, 16 (0·6%) were neither, and seven (0·3%) were both. 425 participants in the emotional competence app group, 443 in the CBT app group, and 447 in the self-monitoring app group completed the follow-up assessment at 3 months. There was no difference in mental wellbeing between the groups at 3 months (global p=0·47). The emotional competence app did not differ from the CBT app (mean difference in WEMWBS -0·21 [95% CI -1·08 to 0·66]) or the self-monitoring app (0·32 [-0·54 to 1·19]), and the CBT app did not differ from the self-monitoring app (0·53 [-0·33 to 1·39]). 14 of 1315 participants were admitted to or treated in hospital (or both) for mental health-related reasons, which were considered unrelated to the interventions (five participants in the emotional competence app group, eight in the CBT app group, and one in the self-monitoring app group). No deaths occurred. INTERPRETATION: The emotional competence app and the CBT app provided limited benefit in promoting mental wellbeing in healthy young people. This finding might reflect the low intensity of these interventions and the difficulty of improving mental wellbeing via universal digital interventions implemented in low-risk populations. FUNDING: European Commission.