Results 1 - 15 of 15
1.
Pattern Recognit ; 122: 108361, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34629550

ABSTRACT

The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact and the regulations on wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of 71.8% Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers, which mainly followed two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of the audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by adopting ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of 80.1%. Moreover, we present the results of fusing the approaches, leading to a UAR of 82.6%. Finally, we present a smartphone app that can be used as a proof-of-concept demonstration to detect in real time whether users are wearing a face mask; we also benchmark the run-time of the best models.
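The challenge metric referenced throughout, Unweighted Average Recall (UAR), is simply the macro-average of the per-class recalls, so the 'mask' and 'no-mask' classes count equally regardless of class imbalance. A minimal sketch with made-up labels (scikit-learn assumed; not taken from the MASC data):

```python
# Minimal sketch: computing Unweighted Average Recall (UAR) from gold labels
# and predictions. Labels here are illustrative only.
import numpy as np
from sklearn.metrics import recall_score

def uar(y_true, y_pred):
    # UAR is the mean of the per-class recalls, i.e. macro-averaged recall,
    # so each class counts equally regardless of its frequency.
    return recall_score(y_true, y_pred, average="macro")

y_true = np.array(["mask", "mask", "no-mask", "no-mask", "no-mask"])
y_pred = np.array(["mask", "no-mask", "no-mask", "no-mask", "mask"])
print(f"UAR: {uar(y_true, y_pred):.3f}")  # (1/2 + 2/3) / 2 ≈ 0.583
```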

2.
Methods ; 151: 41-54, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30099083

ABSTRACT

Due to the complex and intricate nature of their production, the acoustic-prosodic properties of a speech signal are modulated by a range of health-related effects. There is an active and growing area of machine learning research in this speech and health domain, focusing on developing paradigms to objectively extract and measure such effects. Concurrently, deep learning is transforming intelligent signal analysis, such that machines are now reaching near-human capabilities in a range of recognition and analysis tasks. Herein, we review current state-of-the-art approaches to speech-based health detection, placing a particular focus on the impact of deep learning within this domain. Based on this overview, it is evident that, while deep-learning-based solutions are becoming more present in the literature, they have not had the same overall dominating effect seen in other related fields. In this regard, we suggest some possible research directions aimed at fully leveraging the advantages that deep learning can offer to speech-based health detection.


Subjects
Deep Learning/trends , Speech , Acoustics , Humans , Neural Networks, Computer
3.
Psychiatry Clin Neurosci ; 73(2): 50-62, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30565801

ABSTRACT

AIM: Emotional expressions are one of the most widely studied topics in neuroscience, from both clinical and non-clinical perspectives. Atypical emotional expressions are seen in various psychiatric conditions, including schizophrenia, depression, and autism spectrum conditions. Understanding the basics of emotional expressions and recognition can be crucial for diagnostic and therapeutic procedures. Emotions can be expressed in the face, gesture, posture, voice, and behavior and affect physiological parameters, such as the heart rate or body temperature. With modern technology, clinicians can use a variety of tools ranging from sophisticated laboratory equipment to smartphones and web cameras. The aim of this paper is to review the currently used tools using modern technology and discuss their usefulness as well as possible future directions in emotional expression research and treatment strategies. METHODS: The authors conducted a literature review in the PubMed, EBSCO, and SCOPUS databases, using the following key words: 'emotions,' 'emotional expression,' 'affective computing,' and 'autism.' The most relevant and up-to-date publications were identified and discussed. Search results were supplemented by the authors' own research in the field of emotional expression. RESULTS: We present a critical review of the currently available technical diagnostic and therapeutic methods. The most important studies are summarized in a table. CONCLUSION: Most of the currently available methods have not been adequately validated in clinical settings. They may be a great help in everyday practice; however, they need further testing. Future directions in this field include more virtual-reality-based and interactive interventions, as well as development and improvement of humanoid robots.


Subjects
Emotions/physiology , Facial Expression , Facial Muscles/physiology , Facial Recognition/physiology , Mental Disorders/physiopathology , Nonverbal Communication/physiology , Social Perception , Voice/physiology , Humans
4.
J Acoust Soc Am ; 142(4): 1796, 2017 10.
Article in English | MEDLINE | ID: mdl-29092546

ABSTRACT

In recent years, research fields including ecology, bioacoustics, signal processing, and machine learning have made bird sound recognition a part of their focus. This has led to significant advancements within the field of ornithology, such as improved understanding of evolution, local biodiversity, mating rituals, and even the implications and realities associated with climate change. The volume of unlabeled bird sound data is now overwhelming, and comparatively little exploration has been made into how best to handle it. In this study, two active learning (AL) methods are proposed: sparse-instance-based active learning (SI-AL) and least-confidence-score-based active learning (LCS-AL), both of which effectively reduce the need for expert human annotation. A kernel-based extreme learning machine (KELM) is then integrated into both of these AL paradigms and compared with the conventional support vector machine (SVM). Experimental results demonstrate that, when the classifier capacity is improved from an unweighted average recall of 60% to 80%, KELM can outperform SVM even when a limited proportion of human annotations from the data pool is used, in both the SI-AL (minimum 34.5% vs. minimum 59.0%) and LCS-AL (minimum 17.3% vs. minimum 28.4%) cases.
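The least-confidence idea behind LCS-AL can be illustrated with a generic sampling loop: the current model scores the unlabeled pool, and the items it is least sure about are sent to a human annotator. The sketch below is a hedged, generic version using an SVM stand-in and synthetic features; it is not the paper's exact LCS-AL or KELM formulation.

```python
# Hedged sketch of generic least-confidence active learning: query the
# unlabeled instances the current model is least sure about.
import numpy as np
from sklearn.svm import SVC

def least_confidence_query(model, X_pool, n_queries=10):
    proba = model.predict_proba(X_pool)          # class posteriors
    confidence = proba.max(axis=1)               # top-class probability
    return np.argsort(confidence)[:n_queries]    # least confident first

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(40, 8))             # small labeled seed set
y_labeled = rng.integers(0, 2, 40)
X_pool = rng.normal(size=(200, 8))               # "unlabeled" bird-call features

model = SVC(probability=True).fit(X_labeled, y_labeled)
query_idx = least_confidence_query(model, X_pool)
print("Ask the annotator to label pool items:", query_idx)
```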


Subjects
Acoustics , Birds/classification , Birds/physiology , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Supervised Machine Learning , Vocalization, Animal/classification , Animals , Databases, Factual , Support Vector Machine
5.
iScience ; 27(3): 109175, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38433918

ABSTRACT

Cross-cultural studies of the meaning of facial expressions have largely focused on judgments of small sets of stereotypical images by small numbers of people. Here, we used large-scale data collection and machine learning to map what facial expressions convey in six countries. Using a mimicry paradigm, 5,833 participants formed facial expressions found in 4,659 naturalistic images, resulting in 423,193 participant-generated facial expressions. In their own language, participants also rated each expression in terms of 48 emotions and mental states. A deep neural network tasked with predicting the culture-specific meanings people attributed to facial movements while ignoring physical appearance and context discovered 28 distinct dimensions of facial expression, with 21 dimensions showing strong evidence of universality and the remainder showing varying degrees of cultural specificity. These results capture the underlying dimensions of the meanings of facial expressions within and across cultures in unprecedented detail.

6.
Nat Hum Behav ; 7(2): 240-250, 2023 02.
Article in English | MEDLINE | ID: mdl-36577898

ABSTRACT

Human social life is rich with sighs, chuckles, shrieks and other emotional vocalizations, called 'vocal bursts'. Nevertheless, the meaning of vocal bursts across cultures is only beginning to be understood. Here, we combined large-scale experimental data collection with deep learning to reveal the shared and culture-specific meanings of vocal bursts. A total of n = 4,031 participants in China, India, South Africa, the USA and Venezuela mimicked vocal bursts drawn from 2,756 seed recordings. Participants also judged the emotional meaning of each vocal burst. A deep neural network tasked with predicting the culture-specific meanings people attributed to vocal bursts while disregarding context and speaker identity discovered 24 acoustic dimensions, or kinds, of vocal expression with distinct emotion-related meanings. The meanings attributed to these complex vocal modulations were 79% preserved across the five countries and three languages. These results reveal the underlying dimensions of human emotional vocalization in remarkable detail.


Subjects
Deep Learning , Voice , Humans , Emotions , Language , Acoustics
7.
Front Digit Health ; 5: 1058163, 2023.
Article in English | MEDLINE | ID: mdl-36969956

ABSTRACT

The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue explored in the machine learning field is the prospect of a digital mass test that can detect COVID-19 from the respiratory sounds of infected individuals. We present a summary of the results of the INTERSPEECH 2021 Computational Paralinguistics Challenge's COVID-19 Cough (CCS) and COVID-19 Speech (CSS) sub-challenges.

8.
Front Digit Health ; 5: 1196079, 2023.
Article in English | MEDLINE | ID: mdl-37767523

ABSTRACT

Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems into their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in the field of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine. Thus, we provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in striving for improved healthcare systems.

9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 2619-2622, 2022 07.
Article in English | MEDLINE | ID: mdl-36086183

ABSTRACT

Stress is a major threat to well-being that manifests in a variety of physiological and mental symptoms. Utilising speech samples collected while the subject is undergoing an induced stress episode has recently shown promising results for the automatic characterisation of individual stress responses. In this work, we introduce new findings that shed light on whether speech signals are suited to modelling physiological biomarkers, as obtained via cortisol measurements, or self-assessed appraisal and affect measurements. Our results show that different indicators impact acoustic features in diverse ways, but that their complementary information can nevertheless be effectively harnessed by a multi-tasking architecture to improve prediction performance for all of them.
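As a rough illustration of the multi-tasking idea, a shared acoustic encoder can feed separate heads for a physiological target and a self-report target, so that complementary information is shared across tasks. The following PyTorch sketch is hypothetical and is not the architecture used in the paper; the 88-dimensional input merely mirrors an eGeMAPS-sized feature vector.

```python
# Hypothetical multi-task sketch: a shared acoustic encoder with separate
# heads for a physiological target (e.g. cortisol) and a self-report target.
import torch
import torch.nn as nn

class MultiTaskSpeechModel(nn.Module):
    def __init__(self, n_features=88):          # illustrative input dimension
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.cortisol_head = nn.Linear(32, 1)    # physiological biomarker
        self.affect_head = nn.Linear(32, 1)      # self-assessed affect

    def forward(self, x):
        h = self.encoder(x)
        return self.cortisol_head(h), self.affect_head(h)

model = MultiTaskSpeechModel()
x = torch.randn(4, 88)                           # a batch of acoustic features
cortisol_pred, affect_pred = model(x)
# Joint optimisation: the summed loss trains both heads and the shared encoder.
loss = nn.functional.mse_loss(cortisol_pred, torch.zeros(4, 1)) \
     + nn.functional.mse_loss(affect_pred, torch.zeros(4, 1))
loss.backward()
```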


Subjects
Problem Solving , Stress, Psychological , Speech , Stress, Psychological/psychology
10.
BMJ Innov ; 7(2): 356-362, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34192022

ABSTRACT

BACKGROUND: Since the emergence of COVID-19 in December 2019, multidisciplinary research teams have wrestled with how best to control the pandemic in light of its considerable physical, psychological and economic damage. Mass testing has been advocated as a potential remedy; however, mass testing using physical tests is a costly and hard-to-scale solution. METHODS: This study demonstrates the feasibility of an alternative form of COVID-19 detection, harnessing digital technology through the use of audio biomarkers and deep learning. Specifically, we show that a deep-neural-network-based model can be trained to detect symptomatic and asymptomatic COVID-19 cases using breath and cough audio recordings. RESULTS: Our model, a custom convolutional neural network, demonstrates strong empirical performance on a data set consisting of 355 crowdsourced participants, achieving an area under the receiver operating characteristic curve of 0.846 on the task of COVID-19 classification. CONCLUSION: This study offers a proof of concept for diagnosing COVID-19 using cough and breath audio signals and motivates a comprehensive follow-up study on a wider data sample, given the evident advantages of a low-cost, highly scalable digital COVID-19 diagnostic tool.
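The reported evaluation metric, the area under the receiver operating characteristic curve, can be computed directly from per-recording model scores. The sketch below uses synthetic scores and labels (scikit-learn assumed), not the study's data.

```python
# Minimal sketch of ROC AUC computed from per-recording COVID-19 scores.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # 1 = COVID-19 positive
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7])  # model output scores
print(f"ROC AUC: {roc_auc_score(y_true, y_score):.3f}")
```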

11.
Trends Hear ; 25: 23312165211046135, 2021.
Article in English | MEDLINE | ID: mdl-34751066

ABSTRACT

Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities that more appropriately mimic human-like understanding. Within an audio scene, a human listener is quickly able to interpret layers of sound at a single time point, with each layer varying in characteristics such as location, state, and trait. Current integrated machine listening approaches, on the other hand, mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches that can be applied in computer audition to achieve a more holistic intelligent understanding system, as well as to identify the challenges in reaching this goal. We first summarise the state of the art in traditional signal-processing-based audio pre-processing and feature representation, as well as automated learning such as by deep neural networks. This concerns, in particular, audio interpretation, decomposition, understanding, and ontologisation. We then present an agent-based approach for integrating these concepts into a holistic audio understanding system. Based on this, we conclude by outlining avenues towards reaching the ambitious goal of 'holistic human-parity' machine listening abilities.


Subjects
Neural Networks, Computer , Signal Processing, Computer-Assisted , Humans , Intelligence , Learning , Sound
12.
Front Big Data ; 3: 25, 2020.
Article in English | MEDLINE | ID: mdl-33693398

ABSTRACT

Data shapes the development of Artificial Intelligence (AI) as we currently know it, and for many years centralized networking infrastructures have dominated both the sourcing and subsequent use of such data. Research suggests that centralized approaches result in poor representation, and as AI becomes more integrated into daily life, there is a need for efforts to improve on this. The AI research community has begun to explore managing data infrastructures more democratically, finding that decentralized networking allows for more transparency, which can alleviate core ethical concerns such as selection bias. With this in mind, we present herein a mini-survey framed around data representation and data infrastructures in AI. We outline four key considerations (auditing, benchmarking, confidence and trust, explainability and interpretability) as they pertain to data-driven AI, and propose that reflection on them, along with improved interdisciplinary discussion, may aid the mitigation of data-based AI ethical concerns and ultimately improve individual wellbeing when interacting with AI.

13.
Int J Speech Technol ; 23(1): 169-182, 2020.
Article in English | MEDLINE | ID: mdl-34867074

ABSTRACT

Most typically developed individuals have the ability to perceive emotions encoded in speech; yet, factors such as age or environmental conditions can restrict this inherent skill. Noise pollution and multimedia over-stimulation are common components of contemporary society and have been shown to particularly impair a child's interpersonal skills. Assessing the influence of such factors on the perception of emotion across different developmental stages will advance child-related research. The presented work evaluates how background noise and emotionally connoted visual stimuli affect a child's perception of emotional speech. A total of 109 subjects from Spain and Germany (4-14 years) evaluated 20 multi-modal instances of nonsense emotional speech under several environmental and visual conditions. A control group of 17 Spanish adults performed the same perception test. Results suggest that visual stimulation, gender, and the two sub-cultures with different language backgrounds do not influence a child's perception; yet, background noise does compromise children's ability to correctly identify emotion in speech, a phenomenon that seems to decrease with age.

14.
Front Robot AI ; 6: 116, 2019.
Article in English | MEDLINE | ID: mdl-33501131

ABSTRACT

During both positive and negative dyadic exchanges, individuals will often unconsciously imitate their partner. A substantial amount of research has been conducted on this phenomenon, and such studies have shown that synchronization between communication partners can improve interpersonal relationships. Automatic computational approaches for recognizing synchrony are still in their infancy. In this study, we extend previous work in which we applied a novel method utilizing hand-crafted low-level acoustic descriptors and autoencoders (AEs) to analyse synchrony in the speech domain. For this purpose, a database consisting of 394 in-the-wild speakers from six different cultures is used. For each speaker in a dyadic exchange, two AEs are implemented. After the training phase, the acoustic features of one speaker are tested using the AE trained on their dyadic partner. In the same way, we also explore the benefits that deep representations of audio may have, implementing the state-of-the-art Deep Spectrum toolkit. For all speakers, at varied time points during their interaction, the reconstruction error from the AE trained on their respective dyadic partner is calculated. The results obtained from this acoustic analysis are then compared with linguistic experiments based on word counts and word embeddings generated by our word2vec approach. The results demonstrate that there is a degree of synchrony during all interactions. We also find that this degree varies across the six cultures found in the investigated database. These findings are further substantiated through the use of 4,096-dimensional Deep Spectrum features.
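The reconstruction-error idea can be sketched as follows: an autoencoder fitted on one speaker's acoustic features is applied to the dyadic partner's features, and a lower reconstruction error is read as greater acoustic similarity, i.e. synchrony. This PyTorch sketch uses synthetic features and illustrative dimensions, not the paper's exact AE configuration.

```python
# Hedged sketch: fit an autoencoder on speaker A's features, then read the
# error it makes when reconstructing speaker B's features as an (inverse)
# proxy for acoustic synchrony.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 16), nn.ReLU())
        self.dec = nn.Linear(16, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

feats_a = torch.randn(500, 64)   # frame-level features, speaker A (synthetic)
feats_b = torch.randn(200, 64)   # frame-level features, speaker B (synthetic)

ae_a = AE()
opt = torch.optim.Adam(ae_a.parameters(), lr=1e-3)
for _ in range(200):             # train the AE on speaker A only
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae_a(feats_a), feats_a)
    loss.backward()
    opt.step()

with torch.no_grad():            # lower error on B's frames => higher synchrony
    sync_err = nn.functional.mse_loss(ae_a(feats_b), feats_b).item()
print(f"Reconstruction error of speaker B under speaker A's AE: {sync_err:.3f}")
```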

15.
Article in English | MEDLINE | ID: mdl-31765322

ABSTRACT

Auscultation of the heart is a widely studied technique that requires precise hearing from practitioners as a means of distinguishing subtle differences in heartbeat rhythm. The technique is popular due to its non-invasive nature and can be an early diagnostic aid for a range of cardiac conditions. Machine listening approaches can support this process, monitoring continuously and allowing for a representation of both mild and chronic heart conditions. Despite this potential, relevant databases and benchmark studies are scarce. In this paper, we introduce our publicly accessible database, the Heart Sounds Shenzhen Corpus (HSS), which was first released during the recent INTERSPEECH 2018 ComParE Heart Sound sub-challenge. Additionally, we provide a survey of machine learning work in the area of heart sound recognition, as well as a benchmark for HSS utilising standard acoustic features and machine learning models. At best, our support vector machine with log Mel features achieves 49.7% unweighted average recall on a three-category task (normal, mild, moderate/severe).
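In the spirit of the reported baseline (though not the exact HSS pipeline), a log-Mel + SVM benchmark can be sketched as follows: time-averaged log-Mel energies per recording are fed to an SVM and scored with UAR. Signals, sampling rate, and labels below are synthetic and purely illustrative (librosa and scikit-learn assumed).

```python
# Hedged sketch of a log-Mel + SVM baseline on synthetic "recordings".
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.metrics import recall_score

def log_mel_features(sig, sr=4000, n_mels=26):
    # Time-averaged log-Mel energies as a fixed-length recording descriptor.
    mel = librosa.feature.melspectrogram(y=sig, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).mean(axis=1)

rng = np.random.default_rng(0)
X = np.stack([log_mel_features(rng.normal(size=4000 * 3)) for _ in range(30)])
y = rng.integers(0, 3, 30)        # 0 = normal, 1 = mild, 2 = moderate/severe

clf = SVC().fit(X[:20], y[:20])   # train/test split for illustration only
uar = recall_score(y[20:], clf.predict(X[20:]), average="macro")
print(f"UAR on the held-out split: {uar:.3f}")
```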
