ABSTRACT
INTRODUCTION: Multiple sclerosis (MS) is a leading cause of disability among young adults, but standard clinical scales may not accurately detect subtle changes in disability occurring between visits. This study aims to explore whether wearable device data provides more granular and objective measures of disability progression in MS. METHODS: Remote Assessment of Disease and Relapse in Central Nervous System Disorders (RADAR-CNS) is a longitudinal multicenter observational study in which 400 MS patients have been recruited since June 2018 and prospectively followed up for 24 months. Monitoring of patients included standard clinical visits with assessment of disability through the Expanded Disability Status Scale (EDSS), the 6-minute walking test (6MWT), and the timed 25-foot walk (T25FW), as well as remote monitoring through the use of a Fitbit. RESULTS: Among the 306 patients who completed the study (mean age, 45.6 years; 67% female), confirmed disability progression defined by the EDSS was observed in 74 patients, who took approximately 1392 fewer daily steps than patients without disability progression. However, the decrease in the number of steps over time did not differ significantly between patients with EDSS progression and stable patients. Similar results were obtained with disability progression defined by the 6MWT and the T25FW. CONCLUSION: The use of continuous activity monitoring holds great promise as a sensitive and ecologically valid measure of disability progression in MS.
Subjects
Disabled Persons, Multiple Sclerosis, Wearable Electronic Devices, Female, Humans, Male, Middle Aged, Disability Evaluation, Multiple Sclerosis/diagnosis, Walk Test, Walking/physiology, Adult
ABSTRACT
From early in the coronavirus disease 2019 (COVID-19) pandemic, there was interest in using machine learning methods to predict COVID-19 infection status based on vocal audio signals, for example, cough recordings. However, early studies had limitations in terms of data collection and of how the performance of the proposed predictive models was assessed. This article describes how these limitations have been overcome in a study carried out by the Turing-RSS Health Data Laboratory and the UK Health Security Agency. As part of the study, the UK Health Security Agency collected a dataset of acoustic recordings, SARS-CoV-2 infection status, and extensive study participant metadata. This allowed us to rigorously assess state-of-the-art machine learning techniques to predict SARS-CoV-2 infection status based on vocal audio signals. The lessons learned from this project should inform future studies on statistical evaluation methods to assess the performance of machine learning techniques for public health tasks.
Subjects
COVID-19, Machine Learning, Humans, COVID-19/epidemiology, United Kingdom, Public Health, SARS-CoV-2
ABSTRACT
Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, the intraclass correlation coefficient attributed to child identity [Child ICC] was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.
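As an illustration of the Child ICC described above (not the study's own analysis code), the following sketch estimates the share of variance in a long-form metric that is attributable to child identity using a random-intercept mixed model; the column names, the choice of metric, and the toy values are assumptions.

```python
# Sketch: share of variance in a long-form metric (e.g., AWC) attributable to
# child identity, via a random-intercept model. Column names are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

def child_icc(df: pd.DataFrame, metric: str = "awc", group: str = "child_id") -> float:
    """Intraclass correlation: between-child variance / total variance."""
    fit = smf.mixedlm(f"{metric} ~ 1", data=df, groups=df[group]).fit(reml=True)
    between = float(fit.cov_re.iloc[0, 0])  # variance of the child random intercept
    within = float(fit.scale)               # residual (recording-level) variance
    return between / (between + within)

# Toy example: two recordings per child.
df = pd.DataFrame({
    "child_id": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "awc": [1200, 1350, 800, 900, 1500, 1480, 650, 700],
})
print(f"Child ICC ~ {child_icc(df):.2f}")
```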
Subjects
Language Development, Software, Humans, Reproducibility of Results, Infant, Female, Speech/physiology, Male, Child Language, Language
ABSTRACT
As more land is altered by human activity and more species become at risk of extinction, it is essential that we understand the requirements for conserving threatened species across human-modified landscapes. Owing to their rarity and often sparse distributions, threatened species can be difficult to study, and efficient methods to sample them across wide temporal and spatial scales have been lacking. Passive acoustic monitoring (PAM) is increasingly recognized as an efficient method for collecting data on vocal species; however, the development of the automated species detectors required to analyse large amounts of acoustic data is not keeping pace. Here, we collected 35 805 h of acoustic data across 341 sites in a region of over 1000 km² to show that PAM, together with a newly developed automated detector, is able to successfully detect the endangered Geoffroy's spider monkey (Ateles geoffroyi). This allowed us to show that Geoffroy's spider monkey was absent below a threshold of 80% forest cover and within 1 km of primary paved roads, and occurred equally in old-growth and secondary forests. We discuss how this methodology circumvents many of the existing issues in traditional sampling methods and can be highly successful in the study of vocally rare or threatened species. Our results provide tools and knowledge for setting targets and developing conservation strategies for the protection of Geoffroy's spider monkey.
Subjects
Ateles geoffroyi, Animals, Humans, Forests, Endangered Species, Acoustics
ABSTRACT
The Coronavirus (COVID-19) pandemic impelled several research efforts, from collecting COVID-19 patients' data to screening them for virus detection. Some COVID-19 symptoms are related to the functioning of the respiratory system that influences speech production; this suggests research on identifying markers of COVID-19 in speech and other human-generated audio signals. In this article, we give an overview of research on human audio signals using 'Artificial Intelligence' techniques to screen, diagnose, monitor, and spread awareness about COVID-19. This overview will be useful for developing automated systems that can help in the context of COVID-19, using non-obtrusive and easy-to-use bio-signals conveyed in human non-speech and speech audio productions.
ABSTRACT
The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact, and the regulations of wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: Given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of 71.8% Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers that mainly used two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by adapting ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of 80.1%. Moreover, we present the results of fusing the approaches, leading to a UAR of 82.6%. Finally, we present a smartphone app that can be used as a proof of concept demonstration to detect in real-time whether users are wearing a face mask; we also benchmark the run-time of the best models.
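A minimal sketch of the second common pattern, pattern (ii) above: log-mel spectrograms fed to a small CNN for binary mask/no-mask classification. The 16 kHz sampling rate, 1-second chunks, and layer sizes are illustrative assumptions, not the MASC baseline configuration.

```python
# Sketch: log-mel spectrogram + small CNN for mask vs. no-mask classification.
import torch
import torch.nn as nn
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=400, hop_length=160, n_mels=64)

class MaskCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # mask vs. no mask

    def forward(self, waveform):              # (batch, samples)
        spec = mel(waveform).unsqueeze(1)     # (batch, 1, n_mels, frames)
        spec = torch.log(spec + 1e-6)         # log compression
        return self.classifier(self.features(spec).flatten(1))

logits = MaskCNN()(torch.randn(4, 16000))     # four 1-second audio chunks
print(logits.shape)                           # torch.Size([4, 2])
```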
ABSTRACT
This study proposes a contrastive convolutional auto-encoder (contrastive CAE), a combined architecture of an auto-encoder and contrastive loss, to identify individuals with suspected COVID-19 infection using heart-rate data from participants with multiple sclerosis (MS) in the ongoing RADAR-CNS mHealth research project. Heart-rate data was remotely collected using a Fitbit wristband. COVID-19 infection was either confirmed through a positive swab test, or inferred through a self-reported set of recognised symptoms of the virus. The contrastive CAE outperforms a conventional convolutional neural network (CNN), a long short-term memory (LSTM) model, and a convolutional auto-encoder without contrastive loss (CAE). On a test set of 19 participants with MS with reported symptoms of COVID-19, each one paired with a participant with MS with no COVID-19 symptoms, the contrastive CAE achieves an unweighted average recall of 95.3%, a sensitivity of 100%, a specificity of 90.6%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.944, indicating that all symptomatic participants were detected within the given heart-rate measurement period while keeping a low false alarm rate.
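A minimal sketch of the idea behind a contrastive convolutional auto-encoder on 1-D heart-rate sequences: a reconstruction loss plus a pairwise contrastive term on the latent codes. The sequence length, architecture sizes, margin, and loss weighting are illustrative assumptions, not the study's configuration.

```python
# Sketch: 1-D convolutional auto-encoder with an added contrastive loss on latents.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 36), nn.Unflatten(1, (32, 36)),
            nn.ConvTranspose1d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):            # x: (batch, 1, 144), e.g. hourly HR over 6 days
        z = self.encoder(x)
        return self.decoder(z), z

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull same-class pairs together, push different-class pairs apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_label * d.pow(2) + (1 - same_label) * F.relu(margin - d).pow(2)).mean()

model = ContrastiveCAE()
x1, x2 = torch.randn(8, 1, 144), torch.randn(8, 1, 144)
(rec1, z1), (rec2, z2) = model(x1), model(x2)
same = torch.randint(0, 2, (8,)).float()
loss = F.mse_loss(rec1, x1) + F.mse_loss(rec2, x2) + 0.5 * contrastive_loss(z1, z2, same)
loss.backward()
```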
ABSTRACT
COVID-19 is a global health crisis that has been affecting our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. Many symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the first time, the present study investigates voice acoustic correlates of a COVID-19 infection based on a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /u:/, /o:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels, indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in front vowels are additionally reflected in fundamental frequency variation and the harmonics-to-noise ratio, while group differences in back vowels are reflected in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Our findings represent an important proof-of-concept contribution for a potential voice-based identification of individuals infected with COVID-19.
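A minimal sketch of the group-comparison step described above: a Mann-Whitney U test per acoustic feature together with a rank-biserial effect size. The feature values below are toy numbers, not the study's measurements.

```python
# Sketch: Mann-Whitney U test and rank-biserial effect size for one acoustic feature.
import numpy as np
from scipy.stats import mannwhitneyu

def compare_feature(pos, neg):
    """Return U statistic, p-value, and rank-biserial effect size r."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    u, p = mannwhitneyu(pos, neg, alternative="two-sided")
    r = 1.0 - (2.0 * u) / (len(pos) * len(neg))  # rank-biserial correlation
    return u, p, r

# Toy example: mean voiced segment length (s) for 11 positive vs. 11 negative speakers.
covid_pos = [0.21, 0.18, 0.25, 0.20, 0.17, 0.22, 0.19, 0.16, 0.23, 0.18, 0.20]
covid_neg = [0.30, 0.28, 0.26, 0.33, 0.29, 0.31, 0.27, 0.35, 0.32, 0.30, 0.28]
u, p, r = compare_feature(covid_pos, covid_neg)
print(f"U={u:.1f}, p={p:.4f}, rank-biserial r={r:.2f}")
```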
Subjects
COVID-19, Voice, Acoustics, Humans, Phonation, SARS-CoV-2, Speech Acoustics, Voice Quality
ABSTRACT
Due to the complex and intricate nature of their production, the acoustic-prosodic properties of a speech signal are modulated by a range of health-related effects. There is an active and growing area of machine learning research in this speech and health domain, focusing on developing paradigms to objectively extract and measure such effects. Concurrently, deep learning is transforming intelligent signal analysis, such that machines are now reaching near-human capabilities in a range of recognition and analysis tasks. Herein, we review current state-of-the-art approaches to speech-based health detection, placing a particular focus on the impact of deep learning within this domain. Based on this overview, it is evident that, while deep learning based solutions have become more present in the literature, they have not had the same overall dominating effect seen in other related fields. In this regard, we suggest some possible research directions aimed at fully leveraging the advantages that deep learning can offer speech-based health detection.
Subjects
Deep Learning/trends, Speech, Acoustics, Humans, Neural Networks (Computer)
ABSTRACT
AIM: Emotional expressions are one of the most widely studied topics in neuroscience, from both clinical and non-clinical perspectives. Atypical emotional expressions are seen in various psychiatric conditions, including schizophrenia, depression, and autism spectrum conditions. Understanding the basics of emotional expressions and recognition can be crucial for diagnostic and therapeutic procedures. Emotions can be expressed in the face, gesture, posture, voice, and behavior and affect physiological parameters, such as the heart rate or body temperature. With modern technology, clinicians can use a variety of tools ranging from sophisticated laboratory equipment to smartphones and web cameras. The aim of this paper is to review the currently used tools using modern technology and discuss their usefulness as well as possible future directions in emotional expression research and treatment strategies. METHODS: The authors conducted a literature review in the PubMed, EBSCO, and SCOPUS databases, using the following key words: 'emotions,' 'emotional expression,' 'affective computing,' and 'autism.' The most relevant and up-to-date publications were identified and discussed. Search results were supplemented by the authors' own research in the field of emotional expression. RESULTS: We present a critical review of the currently available technical diagnostic and therapeutic methods. The most important studies are summarized in a table. CONCLUSION: Most of the currently available methods have not been adequately validated in clinical settings. They may be a great help in everyday practice; however, they need further testing. Future directions in this field include more virtual-reality-based and interactive interventions, as well as development and improvement of humanoid robots.
Subjects
Emotions/physiology, Facial Expression, Facial Muscles/physiology, Facial Recognition/physiology, Mental Disorders/physiopathology, Nonverbal Communication/physiology, Social Perception, Voice/physiology, Humans
ABSTRACT
PURPOSE OF REVIEW: Substantial research exists focusing on the various aspects and domains of early human development. However, there is a clear blind spot in early postnatal development when dealing with neurodevelopmental disorders, especially those that manifest themselves clinically only in late infancy or even in childhood. RECENT FINDINGS: This early developmental period may represent an important timeframe to study these disorders but has historically received far less research attention. We believe that only a comprehensive interdisciplinary approach will enable us to detect and delineate specific parameters for specific neurodevelopmental disorders at a very early age to improve early detection/diagnosis, enable prospective studies and eventually facilitate randomised trials of early intervention. In this article, we propose a dynamic framework for characterising neurofunctional biomarkers associated with specific disorders in the development of infants and children. We have named this automated detection approach the 'Fingerprint Model', suggesting one possible way to identify neurodevelopmental disorders accurately and at an early age.
Subjects
Biomarkers, Early Diagnosis, Neurodevelopmental Disorders/diagnosis, Humans
ABSTRACT
In recent years, research fields including ecology, bioacoustics, signal processing, and machine learning have made bird sound recognition a part of their focus. This has led to significant advancements within the field of ornithology, such as improved understanding of evolution, local biodiversity, mating rituals, and even the implications and realities associated with climate change. The volume of unlabeled bird sound data is now overwhelming, and comparatively little exploration is being made into methods for how best to handle it. In this study, two active learning (AL) methods are proposed: sparse-instance-based active learning (SI-AL) and least-confidence-score-based active learning (LCS-AL), both effectively reducing the need for expert human annotation. Into both of these AL paradigms, a kernel-based extreme learning machine (KELM) is then integrated, and a comparison is made with the conventional support vector machine (SVM). Experimental results demonstrate that, when the classifier capacity is improved from an unweighted average recall of 60% to 80%, KELM can outperform SVM even when only a limited proportion of human annotations is used from the pool of data, in both cases of SI-AL (minimum 34.5% vs minimum 59.0%) and LCS-AL (minimum 17.3% vs minimum 28.4%).
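A minimal sketch of a kernel-based extreme learning machine combined with a least-confidence query step in the spirit of LCS-AL; the RBF kernel width, regularisation constant, query budget, and toy data are assumptions, not the paper's configuration.

```python
# Sketch: kernel extreme learning machine (KELM) + least-confidence active learning.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

class KELM:
    def __init__(self, C=10.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_, self.classes_ = X, np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        K = rbf_kernel(X, X, gamma=self.gamma)
        self.alpha_ = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def decision(self, X):
        return rbf_kernel(X, self.X_, gamma=self.gamma) @ self.alpha_

    def predict(self, X):
        return self.classes_[self.decision(X).argmax(axis=1)]

def least_confidence_query(model, X_pool, n_query=5):
    """Pick the pool samples whose top output score is lowest (least confident)."""
    conf = model.decision(X_pool).max(axis=1)
    return np.argsort(conf)[:n_query]

# Toy AL round: small labelled seed, then query the least confident pool samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)); y = (X[:, 0] + X[:, 1] > 0).astype(int)
labelled, pool = np.arange(20), np.arange(20, 200)
model = KELM().fit(X[labelled], y[labelled])
to_label = pool[least_confidence_query(model, X[pool])]
print("next samples to annotate:", to_label)
```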
Subjects
Acoustics, Birds/classification, Birds/physiology, Automated Pattern Recognition/methods, Computer-Assisted Signal Processing, Supervised Machine Learning, Animal Vocalization/classification, Animals, Factual Databases, Support Vector Machine
ABSTRACT
In their recent publication in Patterns, the authors proposed a methodology based on sample-free Bayesian neural networks and label smoothing to improve both predictive and calibration performance on animal call detection. Such approaches have the potential to foster trust in algorithmic decision making and enhance policy making in conservation applications that use recordings made by on-site passive acoustic monitoring equipment. This interview is a companion to the authors' recent paper, "Propagating Variational Model Uncertainty for Bioacoustic Call Label Smoothing".
ABSTRACT
Due to the objectivity of emotional expression in the central nervous system, EEG-based emotion recognition can effectively reflect humans' internal emotional states. In recent years, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have made significant strides in extracting local features and temporal dependencies from EEG signals. However, CNNs ignore spatial distribution information from EEG electrodes; moreover, RNNs may encounter issues such as exploding/vanishing gradients and high time consumption. To address these limitations, we propose an attention-based temporal graph representation network (ATGRNet) for EEG-based emotion recognition. First, a hierarchical attention mechanism is introduced to integrate feature representations from both frequency bands and channels ordered by priority in EEG signals. Second, a graph convolutional neural network with a top-k operation is utilized to capture internal relationships between EEG electrodes under different emotion patterns. Next, a residual-based graph readout mechanism is applied to accumulate the EEG feature node-level representations into graph-level representations. Finally, the obtained graph-level representations are fed into a temporal convolutional network (TCN) to extract the temporal dependencies between EEG frames. We evaluated our proposed ATGRNet on the SEED, DEAP and FACED datasets. The experimental findings show that the proposed ATGRNet surpasses the state-of-the-art graph-based methods for EEG-based emotion recognition.
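A minimal sketch of the top-k graph step only (not the full ATGRNet): a learnable similarity graph over EEG electrodes is sparsified to the k strongest connections per node and used for one graph-convolution layer. The channel count, per-channel feature dimension, and k are illustrative assumptions.

```python
# Sketch: top-k sparsified graph convolution over EEG electrodes.
import torch
import torch.nn as nn

class TopKGraphConv(nn.Module):
    def __init__(self, n_channels=62, feat_dim=5, out_dim=16, k=8):
        super().__init__()
        self.k = k
        self.node_emb = nn.Parameter(torch.randn(n_channels, 16))  # learnable electrode embeddings
        self.lin = nn.Linear(feat_dim, out_dim)

    def forward(self, x):                                          # x: (batch, n_channels, feat_dim)
        sim = torch.relu(self.node_emb @ self.node_emb.T)          # (C, C) similarity
        topk = sim.topk(self.k, dim=-1)
        adj = torch.zeros_like(sim).scatter_(-1, topk.indices, topk.values)
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)  # row-normalise
        return torch.relu(adj @ self.lin(x))                       # aggregate neighbour features

layer = TopKGraphConv()
out = layer(torch.randn(32, 62, 5))   # e.g. 5 band-power features per electrode
print(out.shape)                      # torch.Size([32, 62, 16])
```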
Subjects
Electroencephalography, Emotions, Neural Networks (Computer), Computer-Assisted Signal Processing, Humans, Electroencephalography/methods, Emotions/physiology, Algorithms
ABSTRACT
Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware multimodal fusion approach that quantifies modality-wise aleatoric or data uncertainty towards emotion prediction. We propose a novel fusion framework, in which latent distributions over unimodal temporal context are learned by constraining their variance. These variance constraints, Calibration and Ordinal Ranking, are designed such that the variance estimated for a modality can represent how informative the temporal context of that modality is w.r.t. emotion recognition. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions are likely to differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across different modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. Our evaluation on AVEC 2019 CES, CMU-MOSEI, and IEMOCAP datasets shows that the proposed multimodal fusion method not only improves the generalisation performance of emotion recognition models and their predictive uncertainty estimates, but also makes the models robust to novel noise patterns encountered at test time.
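A generic sketch of modality-wise aleatoric uncertainty (not the paper's calibration and ordinal-ranking constraints): each modality head predicts a mean and a variance trained with a Gaussian negative log-likelihood, and fusion down-weights the less certain modality via inverse-variance weighting. All dimensions and names are illustrative.

```python
# Sketch: per-modality aleatoric uncertainty and inverse-variance weighted fusion.
import torch
import torch.nn as nn

class ModalityHead(nn.Module):
    def __init__(self, in_dim, out_dim=1):
        super().__init__()
        self.mu = nn.Linear(in_dim, out_dim)
        self.log_var = nn.Linear(in_dim, out_dim)   # aleatoric (data) uncertainty

    def forward(self, h):
        return self.mu(h), self.log_var(h)

def gaussian_nll(mu, log_var, y):
    return 0.5 * (log_var + (y - mu).pow(2) / log_var.exp()).mean()

def fuse(mu_a, var_a, mu_v, var_v):
    """Inverse-variance weighted fusion of audio and visual predictions."""
    w_a, w_v = 1.0 / var_a, 1.0 / var_v
    return (w_a * mu_a + w_v * mu_v) / (w_a + w_v)

audio_head, video_head = ModalityHead(128), ModalityHead(256)
h_a, h_v, y = torch.randn(16, 128), torch.randn(16, 256), torch.rand(16, 1)
(mu_a, lv_a), (mu_v, lv_v) = audio_head(h_a), video_head(h_v)
loss = gaussian_nll(mu_a, lv_a, y) + gaussian_nll(mu_v, lv_v, y)
fused = fuse(mu_a, lv_a.exp(), mu_v, lv_v.exp())
print(fused.shape)   # torch.Size([16, 1])
```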
ABSTRACT
Ubiquitous sensing has been widely applied in smart healthcare, providing an opportunity for intelligent heart sound auscultation. However, smart devices contain sensitive information, raising user privacy concerns. To this end, federated learning (FL) has been adopted as an effective solution, enabling decentralised learning without data sharing, thus preserving data privacy in the Internet of Health Things (IoHT). Nevertheless, traditional FL requires the same architectural models to be trained across local clients and global servers, leading to a lack of model heterogeneity and client personalisation. For medical institutions acting as clients with private data, this study proposes Fed-MStacking, a heterogeneous FL framework that incorporates a stacking ensemble learning strategy to support clients in building their own models. The secondary objective of this study is to address scenarios involving local clients whose data is characterised by inconsistent labelling. Specifically, a local client may contain only one case type, and the data cannot be shared within or outside the institution. To train a global multi-class classifier, we aggregate missing class information from all clients at each institution and build meta-data, which then participates in FL training via a meta-learner. We apply the proposed framework to a multi-institutional heart sound database. The experiments utilise random forests (RFs), feedforward neural networks (FNNs), and convolutional neural networks (CNNs) as base classifiers. The results show that the heterogeneous stacking of local models performs better compared to homogeneous stacking.
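A minimal, centralised sketch of the stacking idea above: heterogeneous base classifiers (standing in for per-client models) are combined by a meta-learner trained on their out-of-fold predicted probabilities. The synthetic data and the choice of base/meta learners are assumptions, and the federated aggregation and missing-label handling are not shown.

```python
# Sketch: heterogeneous stacking with a meta-learner over out-of-fold probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("fnn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # meta-learner over out-of-fold predictions
    stack_method="predict_proba",
    cv=5,
)
print("stacked accuracy:", stack.fit(X_tr, y_tr).score(X_te, y_te))
```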
Subjects
Heart Sounds, Machine Learning, Computer-Assisted Signal Processing, Humans, Heart Sounds/physiology, Algorithms, Heart Auscultation/methods, Adult
ABSTRACT
This scoping review paper redefines the Artificial Intelligence-based Internet of Things (AIoT) driven Human Activity Recognition (HAR) field by systematically extrapolating from various application domains to deduce potential techniques and algorithms. We distill a general model with adaptive learning and optimization mechanisms by conducting a detailed analysis of human activity types and utilizing contact or non-contact devices. The paper presents various system-integration mathematical paradigms driven by multimodal data fusion, covering predictions of complex behaviors and redefining valuable methods, devices, and systems for HAR. Additionally, it establishes benchmarks for behavior recognition across different application requirements, from simple localized actions to group activities. It summarizes open research directions, including data diversity and volume, computational limitations, interoperability, real-time recognition, data security, and privacy concerns. Finally, this review aims to serve as a comprehensive and foundational resource for researchers delving into the complex and burgeoning realm of AIoT-enhanced HAR, providing insights and guidance for future innovations and developments.
ABSTRACT
Cardiovascular diseases are a prominent cause of mortality, emphasizing the need for early prevention and diagnosis. Utilizing artificial intelligence (AI) models, heart sound analysis emerges as a noninvasive and universally applicable approach for assessing cardiovascular health conditions. However, real-world medical data are dispersed across medical institutions, forming "data islands" due to data sharing limitations for security reasons. To this end, federated learning (FL) has been extensively employed in the medical field, as it can effectively model across multiple institutions. Additionally, conventional supervised classification methods require fully labeled data classes; e.g., binary classification requires labeling of positive and negative samples. Nevertheless, the process of labeling healthcare data is time-consuming and labor-intensive, leading to the possibility of mislabeling negative samples. In this study, we validate an FL framework with a naive positive-unlabeled (PU) learning strategy. The semi-supervised FL model can directly learn from a limited set of positive samples and an extensive pool of unlabeled samples. Our emphasis is on vertical FL to enhance collaboration across institutions with different medical record feature spaces. Additionally, our contribution extends to feature importance analysis, where we explore 6 methods and provide practical recommendations for detecting abnormal heart sounds. The study demonstrated an impressive accuracy of 84%, comparable to outcomes in supervised learning, thereby advancing the application of FL in abnormal heart sound detection.
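A minimal, centralised sketch of the naive PU strategy described above: unlabelled records are treated as provisional negatives, a probabilistic classifier is trained, and samples are ranked by score. The synthetic data and logistic-regression classifier are assumptions, and the vertical-FL protocol is not shown.

```python
# Sketch: naive positive-unlabelled learning (unlabelled treated as negative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=1000, n_features=15, weights=[0.7], random_state=0)

# Only a fraction of true positives are actually labelled; the rest are "unlabelled".
pos_idx = np.where(y_true == 1)[0]
labelled_pos = rng.choice(pos_idx, size=len(pos_idx) // 5, replace=False)
y_pu = np.zeros_like(y_true)          # unlabelled -> provisional negative
y_pu[labelled_pos] = 1

clf = LogisticRegression(max_iter=1000).fit(X, y_pu)
scores = clf.predict_proba(X)[:, 1]   # higher score = more likely an unflagged positive
print("top suspected positives:", np.argsort(scores)[::-1][:10])
```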
ABSTRACT
OBJECTIVE: Early diagnosis of cardiovascular diseases is a crucial task in medical practice. With the application of computer audition in the healthcare field, artificial intelligence (AI) has been applied to clinical non-invasive intelligent auscultation of heart sounds to provide rapid and effective pre-screening. However, AI models generally require large amounts of data, which may cause privacy issues. Unfortunately, it is difficult to collect large amounts of healthcare data from a single centre. METHODS: In this study, we propose federated learning (FL) optimisation strategies for practical application in multi-centre institutional heart sound databases. Horizontal FL is mainly employed to tackle the privacy problem by aligning the feature spaces of FL participating institutions without information leakage. In addition, techniques based on deep learning have poor interpretability due to their "black-box" property, which limits the feasibility of AI in real medical data. To this end, vertical FL is utilised to address the issues of model interpretability and data scarcity. CONCLUSION: Experimental results demonstrate that the proposed FL framework can achieve good performance for heart sound abnormality detection while taking personal privacy protection into account. Moreover, using the federated feature space is beneficial for balancing the interpretability of the vertical FL and the privacy of the data. SIGNIFICANCE: This work realises the potential of FL from research to clinical practice, and is expected to have extensive application in the federated smart medical system.
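A minimal sketch of the horizontal-FL idea above, using FedAvg-style parameter averaging weighted by local sample counts. The linear stand-in model, feature dimensionality, and synthetic client data are assumptions, and the vertical-FL and interpretability components are not shown.

```python
# Sketch: horizontal FL with FedAvg-style aggregation of locally trained models.
import copy
import torch
import torch.nn as nn

def local_update(global_model, X, y, epochs=1, lr=0.01):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(model(X).squeeze(1), y)
        loss.backward()
        opt.step()
    return model.state_dict(), len(X)

def fed_avg(states_and_sizes):
    total = sum(n for _, n in states_and_sizes)
    return {k: sum(state[k] * (n / total) for state, n in states_and_sizes)
            for k in states_and_sizes[0][0]}

global_model = nn.Linear(20, 1)   # stand-in abnormality classifier over 20 features
clients = [(torch.randn(50, 20), torch.randint(0, 2, (50,)).float()) for _ in range(3)]
for _ in range(5):                # communication rounds
    updates = [local_update(global_model, X, y) for X, y in clients]
    global_model.load_state_dict(fed_avg(updates))
print("aggregated weight norm:", global_model.weight.norm().item())
```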
Subjects
Heart Sounds, Humans, Heart Sounds/physiology, Computer-Assisted Signal Processing, Male, Factual Databases, Deep Learning, Adult, Female, Algorithms, Middle Aged, Young Adult, Child
ABSTRACT
Objective: Millions of people in the UK have asthma, yet 70% do not access basic care, leading to the largest number of asthma-related deaths in Europe. Chatbots may extend the reach of asthma support and provide a bridge to traditional healthcare. This study evaluates 'Brisa', a chatbot designed to improve asthma patients' self-assessment and self-management. Methods: We recruited 150 adults with an asthma diagnosis to test our chatbot. Participants were recruited over three waves through social media and a research recruitment platform. Eligible participants had access to 'Brisa' via a WhatsApp or website version for 28 days and completed entry and exit questionnaires to evaluate user experience and asthma control. Weekly symptom tracking, user interaction metrics, satisfaction measures, and qualitative feedback were utilised to evaluate the chatbot's usability and potential effectiveness, focusing on changes in asthma control and self-reported behavioural improvements. Results: 74% of participants engaged with 'Brisa' at least once. High task completion rates were observed: asthma attack risk assessment (86%), voice recording submission (83%) and asthma control tracking (95.5%). Post use, an 8% improvement in asthma control was reported. User satisfaction surveys indicated positive feedback on helpfulness (80%), privacy (87%), trustworthiness (80%) and functionality (84%) but highlighted a need for improved conversational depth and personalisation. Conclusions: The study indicates that chatbots are effective for asthma support, demonstrated by the high usage of features like risk assessment and control tracking, as well as a statistically significant improvement in asthma control. However, lower satisfaction in conversational flexibility highlights rising expectations for chatbot fluency, influenced by advanced models like ChatGPT. Future health-focused chatbots must balance conversational capability with accuracy and safety to maintain engagement and effectiveness.