Results 1 - 20 of 73
1.
J Acoust Soc Am ; 144(5): EL410, 2018 11.
Article in English | MEDLINE | ID: mdl-30522292

ABSTRACT

Recent research has revealed substantial between-speaker variation in speech rhythm, which in effect refers to the coordination of consonants and vowels over time. The current proof-of-concept study investigated the hypothesis that these idiosyncrasies arise, in part, from differences in the tongue's movement amplitude. Speech rhythm was parameterized as the percentage of speech that is vocalic (%V) in the German pronoun "sie" [ziː]. The findings support the hypothesis: all else being equal, idiosyncratic %V values were proportional to a speaker's tongue movement area. This research underlines the importance of studying language-external factors, such as a speaker's individual tongue movement behavior, when investigating variation in temporal coordination.
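As a rough illustration of the metric, the sketch below (not the authors' code) computes %V from hand-labeled consonant/vowel segment durations; the labels and durations are invented for the example.

```python
# Minimal sketch (not the authors' code): computing %V from labeled segments.
# Each segment is (label_type, duration_in_seconds); the data are invented.

def percent_vocalic(segments):
    """Return %V: vocalic duration as a percentage of total speech duration."""
    vocalic = sum(d for kind, d in segments if kind == "V")
    total = sum(d for _, d in segments)
    return 100.0 * vocalic / total if total > 0 else 0.0

# Example: the word [zi:] segmented into a consonantal and a vocalic interval.
print(percent_vocalic([("C", 0.11), ("V", 0.19)]))  # -> 63.3...
```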


Subjects
Movement/physiology, Speech/physiology, Tongue/physiology, Adult, Algorithms, Electromagnetic Phenomena, Female, Germany/epidemiology, Humans, Language, Male, Phonetics, Speech/classification, Time Factors, Tongue/anatomy & histology
2.
Behav Brain Sci ; 40: e46, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26434499

ABSTRACT

How does sign language compare with gesture, on the one hand, and spoken language on the other? Sign was once viewed as nothing more than a system of pictorial gestures without linguistic structure. More recently, researchers have argued that sign is no different from spoken language, with all of the same linguistic structures. The pendulum is currently swinging back toward the view that sign is gestural, or at least has gestural components. The goal of this review is to elucidate the relationships among sign language, gesture, and spoken language. We do so by taking a close look not only at how sign has been studied over the past 50 years, but also at how the spontaneous gestures that accompany speech have been studied. We conclude that signers gesture just as speakers do. Both produce imagistic gestures along with more categorical signs or words. Because at present it is difficult to tell where sign stops and gesture begins, we suggest that sign should not be compared with speech alone but should be compared with speech-plus-gesture. Although it might be easier (and, in some cases, preferable) to blur the distinction between sign and gesture, we argue that distinguishing between sign (or speech) and gesture is essential to predict certain types of learning and allows us to understand the conditions under which gesture takes on properties of sign, and speech takes on properties of gesture. We end by calling for new technology that may help us better calibrate the borders between sign and gesture.


Subjects
Gestures, Sign Language, Speech/classification, Humans, Language Development, Learning/physiology, Speech/physiology
3.
Sensors (Basel) ; 16(1), 2015 Dec 25.
Article in English | MEDLINE | ID: mdl-26712757

ABSTRACT

In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented for speech emotion recognition. The approach improves on the bi-level multi-classifier system known as stacked generalization by integrating an estimation of distribution algorithm (EDA) into the first layer to select the optimal subset of the standard base classifiers. The good performance of the proposed paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using specific standard base classifiers and a total of 123 spectral, quality, and prosodic features computed with in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and to the standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of acoustic parameters (the extended Geneva Minimalistic Acoustic Parameter Set, eGeMAPS) and standard classifiers, employing the best meta-classifier from the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database, where single, standard stacking, and CSS stacking systems were compared using the same parameterization in the second phase. All classifications were performed at the categorical level, covering the six primary emotions plus neutral.
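To illustrate the general idea of selecting a subset of base classifiers for the first stacking layer, here is a minimal scikit-learn sketch. It replaces the paper's EDA search with an exhaustive search over subsets and uses a toy dataset in place of the speech-feature sets, so it shows the shape of the method rather than the published algorithm.

```python
# Hedged sketch of classifier-subset selection for stacking (not the paper's EDA).
# Exhaustive search over base-classifier subsets stands in for the EDA step.
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # placeholder for an emotional-speech feature set

base = [
    ("svc", SVC(probability=True)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

best_score, best_subset = -1.0, None
for r in range(1, len(base) + 1):
    for subset in combinations(base, r):
        stack = StackingClassifier(estimators=list(subset),
                                   final_estimator=LogisticRegression(max_iter=1000))
        score = cross_val_score(stack, X, y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, [name for name, _ in subset]

print(best_subset, round(best_score, 3))
```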


Subjects
Emotions/classification, Machine Learning, Pattern Recognition, Automated/methods, Speech/classification, Female, Humans, Male
4.
Sensors (Basel) ; 16(1), 2015 Dec 31.
Article in English | MEDLINE | ID: mdl-26729126

ABSTRACT

In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection.
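A hedged sketch of the decomposition-and-selection step is shown below. It uses the PyEMD package and scikit-learn's mutual information estimator on synthetic data, and it replaces the paper's mutual-information-entropy criterion with a simple keep-the-most-informative-IMFs threshold, so it illustrates the idea rather than the published algorithm.

```python
# Hedged sketch of EMD-based denoising with a mutual-information selection rule.
# This simplifies the paper's MIE criterion: it keeps the IMFs that share the
# most information with the observed signal and discards the rest as noise.
import numpy as np
from PyEMD import EMD                      # pip install EMD-signal
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000)
clean = np.sin(2 * np.pi * 180 * t) * np.hanning(t.size)   # stand-in for radar speech
noisy = clean + 0.4 * rng.standard_normal(t.size)

imfs = EMD().emd(noisy)                    # rows are intrinsic mode functions

# Mutual information between each IMF and the observed signal.
mi = np.array([mutual_info_regression(imf.reshape(-1, 1), noisy)[0] for imf in imfs])
keep = mi >= 0.1 * mi.max()                # assumed threshold, not from the paper
enhanced = imfs[keep].sum(axis=0)

print(f"kept {keep.sum()} of {len(imfs)} IMFs")
```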


Subjects
Algorithms, Signal Processing, Computer-Assisted, Speech/classification, Adult, Female, Humans, Male, Sound Spectrography, Young Adult
5.
J Acoust Soc Am ; 136(6): 3272, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25480073

ABSTRACT

The production and perception of Dutch whispered boundary tones, i.e., phrasal prosody, were investigated as a function of characteristics of the tone-bearing word, i.e., lexical prosody. More specifically, the disyllabic tone-bearing word also carried a pitch accent, either on the same syllable as the boundary tone (clash condition) or on the directly adjacent syllable (no-clash condition). In a statement/question classification task, listeners showed moderate but above-chance performance for both conditions in whisper, although performance was much worse and slower than in normal speech. The syllabic rhymes of speakers' productions were examined for acoustic correlates of boundary tones. Results showed mainly secondary cues to intonation, that is, cues that are present in whisper as in normal speech, but minimal compensatory cues that would reflect speakers' efforts to enhance their whispered speech signal. This suggests that multiple prosodic events in close proximity are challenging to perceive and produce in whispered speech. A moderate increase in classification performance was found when the one acoustic cue that whispering speakers did seem to employ compensatorily was enhanced: changing the spectral tilt of the utterance-final syllable improved perception especially for the poorer speakers and for intonation on stressed syllables.


Assuntos
Fonação , Fonética , Acústica da Fala , Percepção da Fala , Medida da Produção da Fala , Fala , Adulto , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Tempo de Reação , Fala/classificação , Medida da Produção da Fala/classificação , Adulto Jovem
6.
Behav Res Methods ; 45(3): 758-64, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23239073

ABSTRACT

The aim of this article is to describe a database of diphone positional frequencies in French. More specifically, we provide frequencies for word-initial, word-internal, and word-final diphones of all words extracted from a subtitle corpus of 50 million words that come from movie and TV series dialogue. We also provide intra- and intersyllable diphone frequencies, as well as interword diphone frequencies. To our knowledge, no other such tool is available to psycholinguists for the study of French sequential probabilities. This database and its new indicators should help researchers conducting new studies on speech segmentation.
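The sketch below illustrates, under toy assumptions, how positional diphone frequencies of the kind described here can be accumulated from a list of phonemized words with corpus frequencies; the transcriptions and counts are invented and do not come from the subtitle corpus.

```python
# Minimal sketch of counting positional diphone frequencies from a word list.
# Words are given as phoneme sequences with corpus frequencies (toy data here,
# not the 50-million-word subtitle corpus described in the article).
from collections import Counter

corpus = {("b", "o~", "z", "u", "R"): 120,   # "bonjour"
          ("s", "a", "l", "y"): 45,          # "salut"
          ("m", "E", "R", "s", "i"): 80}     # "merci"

initial, internal, final = Counter(), Counter(), Counter()
for phones, freq in corpus.items():
    diphones = list(zip(phones, phones[1:]))
    for i, d in enumerate(diphones):
        if i == 0:
            initial[d] += freq
        elif i == len(diphones) - 1:
            final[d] += freq
        else:
            internal[d] += freq

print(initial.most_common(3))
```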


Assuntos
Bases de Dados Factuais , Idioma , Fonética , Fala/classificação , Adulto , Humanos , Psicolinguística , Semântica
7.
Behav Res Methods ; 45(1): 191-202, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22718287

ABSTRACT

The temporal characteristics of speech can be captured by examining the distributions of the durations of measurable speech components, namely speech segment durations and pause durations. However, several barriers prevent the easy analysis of pause durations: The first problem is that natural speech is noisy, and although recording contrived speech minimizes this problem, it also discards diagnostic information about cognitive processes inherent in the longer pauses associated with natural speech. The second issue concerns setting the distribution threshold, and consists of the problem of appropriately classifying pause segments as either short pauses reflecting articulation or long pauses reflecting cognitive processing, while minimizing the overall classification error rate. This article describes a fully automated system for determining the locations of speech-pause transitions and estimating the temporal parameters of both speech and pause distributions in natural speech. We use the properties of Gaussian mixture models at several stages of the analysis, in order to identify theoretical components of the data distributions, to classify speech components, to compute durations, and to calculate the relevant statistics.
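The following sketch shows one plausible way to realize the threshold-setting step with a two-component Gaussian mixture over log pause durations; the data are synthetic and the details (component count, log transform, grid scan) are assumptions rather than the article's exact procedure.

```python
# Hedged sketch: classifying pauses as "articulatory" vs. "cognitive" with a
# two-component Gaussian mixture over log durations (assumed toy data).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
short = rng.lognormal(mean=np.log(0.15), sigma=0.3, size=300)   # ~150 ms pauses
long_ = rng.lognormal(mean=np.log(1.2), sigma=0.4, size=100)    # ~1.2 s pauses
pauses = np.concatenate([short, long_])

gmm = GaussianMixture(n_components=2, random_state=0).fit(np.log(pauses).reshape(-1, 1))

# Scan durations and report where the most likely component flips.
grid = np.log(np.linspace(0.05, 3.0, 2000)).reshape(-1, 1)
labels = gmm.predict(grid)
flip = np.flatnonzero(np.diff(labels))[0]
print(f"estimated short/long threshold ~ {np.exp(grid[flip, 0]):.2f} s")
```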


Assuntos
Algoritmos , Modelos Estatísticos , Reconhecimento Automatizado de Padrão/métodos , Acústica da Fala , Testes de Articulação da Fala , Medida da Produção da Fala/métodos , Fala/classificação , Teorema de Bayes , Cognição/fisiologia , Humanos , Distribuição Normal , Reprodutibilidade dos Testes , Mecânica Respiratória/fisiologia , Taxa Respiratória , Fala/fisiologia , Medida da Produção da Fala/instrumentação , Fatores de Tempo
8.
Psychother Res ; 22(3): 348-62, 2012.
Article in English | MEDLINE | ID: mdl-22417083

ABSTRACT

This study compared participants' speech acts in low-hostile versus moderate-hostile interpersonal episodes in time-limited psychodynamic psychotherapy. Sixty-two cases from the Vanderbilt II psychotherapy project were categorized as low or moderate in interpersonal hostility based on ratings of interpersonal process using Structural Analysis of Social Behavior (Benjamin, 1996). Representative episodes were coded using a taxonomy of speech acts (Stiles, 1992), and speech acts were compared across low- and moderate-hostile episodes. Therapists in moderate-hostility episodes used more interpretations and edifications, and fewer questions and reflections. Patients in moderate-hostility episodes used more disclosures and fewer edifications. Content coding showed that therapist interpretations with a self/intrapsychic self focus were more characteristic of moderate-hostility than low-hostility episodes, whereas the two types of episodes contained similar levels of interpretations focused on the patient's interpersonal relationships and the therapeutic relationship.


Subjects
Hostility, Interpersonal Relations, Psychotherapy, Speech/classification, Adult, Cohort Studies, Disclosure, Female, Humans, Male, Middle Aged, Young Adult
9.
PLoS One ; 16(10): e0258178, 2021.
Article in English | MEDLINE | ID: mdl-34597350

ABSTRACT

Measurements of the physical outputs of speech (vocal tract geometry and acoustic energy) are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can it be determined when and where in high-dimensional articulatory and acoustic signals there is information related to theoretical categories? For a variety of reasons, it is problematic to directly quantify mutual information between hypothesized categories and signals. To address this issue, a multi-scale analysis method is proposed for localizing category-related information in an ensemble of speech signals using machine learning algorithms. By analyzing how classification accuracy on unseen data varies as the temporal extent of training input is systematically restricted, inferences can be drawn regarding the temporal distribution of category-related information. The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper: phonemic/gestural categories and syntactic relative clause categories. Moreover, two different machine learning algorithms were examined: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech. The neural network algorithm was able to identify category-related information to a greater extent than the discriminant analyses.
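A minimal sketch of the windowing logic, under assumptions: a linear discriminant classifier is retrained on progressively longer temporal windows of synthetic trajectories, and the resulting accuracy profile indicates where category-related information lies. This is not the authors' pipeline, only the general idea.

```python
# Hedged sketch of the windowing idea: train a classifier on progressively
# longer stretches of a signal and watch where accuracy rises above chance.
# Synthetic 1-D "articulatory" trajectories stand in for real speech data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_class, T = 100, 60
t = np.linspace(0, 1, T)

# Category information is concentrated in the middle third of each trajectory.
bump = np.where((t > 0.33) & (t < 0.66), 1.0, 0.0)
X = np.vstack([rng.normal(0, 1, (n_per_class, T)) + bump,
               rng.normal(0, 1, (n_per_class, T)) - bump])
y = np.repeat([0, 1], n_per_class)

for end in range(10, T + 1, 10):           # widen the analysis window stepwise
    acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, :end], y, cv=5).mean()
    print(f"window [0, {end:2d}] frames: accuracy = {acc:.2f}")
```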


Assuntos
Acústica , Aprendizado de Máquina , Percepção da Fala/fisiologia , Fala/classificação , Algoritmos , Análise Discriminante , Gestos , Humanos , Redes Neurais de Computação , Língua/fisiologia
10.
Neural Netw ; 136: 87-96, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33453522

ABSTRACT

In this paper, we propose Stacked DeBERT, short for Stacked Denoising Bidirectional Encoder Representations from Transformers. This novel model improves robustness to incomplete data, compared to existing systems, through a novel encoding scheme in BERT, a powerful language representation model based solely on attention mechanisms. Incomplete data in natural language processing refers to text with missing or incorrect words, and its presence can hinder the performance of current models, which were not designed to withstand such noise yet must still perform well under it. This is because current approaches are built for and trained with clean and complete data, and are thus unable to extract features that adequately represent incomplete data. Our proposed approach obtains intermediate input representations by applying an embedding layer to the input tokens followed by vanilla transformers. These intermediate features are given as input to novel denoising transformers, which are responsible for obtaining richer input representations. The proposed approach takes advantage of stacks of multilayer perceptrons for the reconstruction of missing words' embeddings, extracting more abstract and meaningful hidden feature vectors, and of bidirectional transformers for improved embedding representation. We consider two datasets for training and evaluation: the Chatbot Natural Language Understanding Evaluation Corpus and Kaggle's Twitter Sentiment Corpus. Our model shows improved F1-scores and better robustness on the informal/incorrect texts present in tweets and on texts with speech-to-text errors in the sentiment and intent classification tasks.
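The sketch below is a schematic PyTorch rendering of the stacking idea (embedding, a first transformer stage, a denoising multilayer perceptron, a second transformer stage, a classifier). Layer sizes, the pooling step, and all names are illustrative assumptions; this is not the released Stacked DeBERT implementation.

```python
# Schematic sketch (assumptions throughout) of the stacking idea: intermediate
# token representations pass through a denoising multilayer perceptron and a
# further transformer stage before classification. Dimensions are illustrative.
import torch
import torch.nn as nn

class StackedDenoisingEncoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_classes=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.first_stage = nn.TransformerEncoder(layer, num_layers=2)
        # Denoising MLP: compress and reconstruct per-token representations.
        self.denoise = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(),
                                     nn.Linear(64, d_model))
        self.second_stage = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, tokens):
        h = self.first_stage(self.embed(tokens))
        h = self.second_stage(self.denoise(h))
        return self.classifier(h.mean(dim=1))    # pool over tokens, then classify

model = StackedDenoisingEncoder()
logits = model(torch.randint(0, 1000, (8, 16)))  # batch of 8 token sequences
print(logits.shape)                              # -> torch.Size([8, 7])
```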


Assuntos
Bases de Dados Factuais/classificação , Processamento de Linguagem Natural , Redes Neurais de Computação , Fala/classificação , Humanos , Idioma
11.
PLoS One ; 16(4): e0250173, 2021.
Article in English | MEDLINE | ID: mdl-33930026

ABSTRACT

SUBESCO is an audio-only emotional speech corpus for the Bangla language. The corpus totals more than 7 hours of audio across 7000 utterances and is the largest emotional speech corpus available for this language. Twenty native speakers participated in the gender-balanced set, each recording 10 sentences simulating seven target emotions. Fifty university students participated in the evaluation of this corpus. Each audio clip, except those for the Disgust emotion, was validated four times by male and female raters. Raw hit rates and unbiased rates were calculated, producing scores above the chance level of responses. The overall recognition rate was above 70% in human perception tests. Kappa statistics and intra-class correlation coefficient scores indicated a high level of inter-rater reliability and consistency in the corpus evaluation. SUBESCO is an Open Access database, licensed under Creative Commons Attribution 4.0 International, and can be downloaded free of charge from: https://doi.org/10.5281/zenodo.4526477.


Assuntos
Fala/classificação , Adulto , Bangladesh , Emoções , Feminino , Humanos , Índia , Idioma , Masculino , Reconhecimento Psicológico , Reprodutibilidade dos Testes , Percepção da Fala , Comportamento Verbal
12.
Curr Opin Neurol ; 23(6): 633-7, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20852419

ABSTRACT

PURPOSE OF REVIEW: The aim is to explore the evolution of the logopenic variant of primary progressive aphasia as a distinct clinical entity and to outline recent advances that have clarified its clinical characteristics, neural underpinnings, and potential genetic and pathological bases. This is particularly relevant as researchers attempt to identify clinico-pathological relationships in subtypes of primary progressive aphasia in hopes of utilizing language phenotype as a marker of underlying disease. RECENT FINDINGS: Recent work has served to refine and expand upon the clinical phenotype of the logopenic variant. Logopenic patients show a unique pattern of spared and impaired language processes that reliably distinguish this syndrome from other variants of progressive aphasia. Specifically, they exhibit deficits in naming and repetition in the context of spared semantic, syntactic, and motor speech abilities. Further, there is a growing body of evidence indicating a possible link between the logopenic phenotype and specific pathological and genetic correlates. SUMMARY: Findings indicate that the logopenic variant is a distinct subtype of progressive aphasia that may hold value as a predictor of underlying pathology. Additional research, however, is warranted in order to further clarify the cognitive-linguistic profile and to confirm its relation to certain pathological and genetic processes.


Assuntos
Afasia Primária Progressiva/genética , Afasia Primária Progressiva/patologia , Variação Genética , Fala/fisiologia , Comportamento Verbal/fisiologia , Afasia Primária Progressiva/classificação , Afasia Primária Progressiva/diagnóstico , Diagnóstico Diferencial , Progressão da Doença , Predisposição Genética para Doença/genética , Humanos , Fenótipo , Fala/classificação , Comportamento Verbal/classificação
13.
Cogn Behav Neurol ; 23(3): 165-77, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20829666

ABSTRACT

OBJECTIVE: To evaluate the use of a semiautomated computerized system for measuring speech and language characteristics in patients with frontotemporal lobar degeneration (FTLD). BACKGROUND: FTLD is a heterogeneous disorder comprising at least 3 variants. Computerized assessment of spontaneous verbal descriptions by patients with FTLD offers a detailed and reproducible view of the underlying cognitive deficits. METHODS: Audiorecorded speech samples of 38 patients from 3 participating medical centers were elicited using the Cookie Theft stimulus. Each patient underwent a battery of neuropsychologic tests. The audio was analyzed by the computerized system to measure 15 speech and language variables. Analysis of variance was used to identify characteristics with significant differences in means between FTLD variants. Factor analysis was used to examine the implicit relations between subsets of the variables. RESULTS: Semiautomated measurements of pause-to-word ratio and pronoun-to-noun ratio were able to discriminate between some of the FTLD variants. Principal component analysis of all 14 variables suggested 4 subjectively defined components (length, hesitancy, empty content, grammaticality) corresponding to the phenomenology of FTLD variants. CONCLUSION: Semiautomated language and speech analysis is a promising novel approach to neuropsychologic assessment that offers a valuable contribution to the toolbox of researchers in dementia and other neurodegenerative disorders.


Assuntos
Diagnóstico por Computador/métodos , Degeneração Lobar Frontotemporal/diagnóstico , Testes de Linguagem , Psicolinguística/métodos , Validação de Programas de Computador , Comportamento Verbal/classificação , Diagnóstico por Computador/instrumentação , Humanos , Testes Neuropsicológicos , Análise de Componente Principal , Psicolinguística/instrumentação , Fala/classificação , Medida da Produção da Fala/métodos , Interface para o Reconhecimento da Fala
14.
Pol Merkur Lekarski ; 28(166): 277-83, 2010 Apr.
Article in Polish | MEDLINE | ID: mdl-20491337

ABSTRACT

UNLABELLED: The aim of rehabilitation in laryngectomized patients is to restore phonatory communication, and it is important to choose the optimal method of rehabilitation. Most patients use oesophageal or pharyngeal speech as a result of natural rehabilitation with the vocal method. Another group of laryngectomized patients is rehabilitated surgically, which leads to shunt speech. THE AIM OF THE STUDY was to compare the quality of oesophageal and shunt speech with euphonic (physiological) voice in order to identify the optimal method of rehabilitation in laryngectomized patients. MATERIAL AND METHODS: The quality of vicarious phonation was examined in 30 patients with shunt speech and 20 patients with oesophageal speech. Results of the subjective, objective, and acoustic assessments were compared with values recorded for physiological (euphonic) speech, and the objective results were statistically analysed. RESULTS: Both shunt and oesophageal speech enabled effective verbal communication in laryngectomized patients. The parameters of the clinical subjective and objective assessment of shunt speech pointed to its high quality, which approaches that of physiological phonation. CONCLUSIONS: Acoustic analysis of voice confirmed the results of the subjective and objective assessment of shunt voice and speech quality in laryngectomized patients. Surgical voice rehabilitation after total laryngectomy gives patients a great opportunity for remarkable improvement in vicarious phonation.


Assuntos
Laringectomia/reabilitação , Voz Alaríngea/métodos , Fala/classificação , Adulto , Idoso , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Fonação , Qualidade da Voz
15.
Elife ; 9, 2020 03 30.
Article in English | MEDLINE | ID: mdl-32223894

ABSTRACT

Speech perception presumably arises from internal models of how specific sensory features are associated with speech sounds. These features change constantly (e.g., different speakers, articulation modes), and listeners need to recalibrate their internal models by appropriately weighing new versus old evidence. Models of speech recalibration classically ignore this volatility. The effect of volatility in tasks where sensory cues were associated with arbitrary experimenter-defined categories was well described by models that continuously adapt the learning rate while keeping a single representation of the category. Using neurocomputational modelling, we show that recalibration of natural speech sound categories is better described by representing those categories at different time scales. We illustrate our proposal by modeling fast recalibration of speech sounds after experiencing the McGurk effect. We propose that working representations of speech categories are driven both by their current environment and by their long-term memory representations.


People can distinguish words or syllables even though they may sound different with every speaker. This striking ability reflects the fact that our brain is continually modifying the way we recognise and interpret the spoken word based on what we have heard before, by comparing past experience with the most recent one to update expectations. This phenomenon also occurs in the McGurk effect: an auditory illusion in which someone hears one syllable but sees a person saying another syllable and ends up perceiving a third distinct sound. Abstract models, which provide a functional rather than a mechanistic description of what the brain does, can test how humans use expectations and prior knowledge to interpret the information delivered by the senses at any given moment. Olasagasti and Giraud have now built an abstract model of how brains recalibrate perception of natural speech sounds. Fitting the model to existing experimental data on the McGurk effect suggests that, rather than using a single sound representation that is adjusted with each sensory experience, the brain recalibrates sounds at two different timescales. Over and above slow "procedural" learning, the findings show that there is also rapid recalibration of how different sounds are interpreted. This working representation of speech enables adaptation to changing or noisy environments and illustrates that the process is far more dynamic and flexible than previously thought.
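A toy numpy sketch of the two-timescale idea follows: a fast working boundary tracks recent (McGurk-shifted) evidence while being pulled back toward a slowly updated long-term boundary. The learning rates and anchoring term are illustrative assumptions, not the fitted model.

```python
# Hedged numpy sketch of the two-timescale idea: a fast "working" category
# boundary tracks recent evidence, while a slow long-term one anchors it.
# Learning rates and the pull-back term are assumptions, not fitted values.
import numpy as np

rng = np.random.default_rng(0)
long_term, working = 0.0, 0.0            # category boundary along some cue axis
fast_rate, slow_rate, anchor = 0.5, 0.02, 0.1

for trial in range(50):
    evidence = 1.0 + 0.2 * rng.standard_normal()   # McGurk-like shifted percepts
    working += fast_rate * (evidence - working) + anchor * (long_term - working)
    long_term += slow_rate * (evidence - long_term)

print(f"working boundary {working:.2f}, long-term boundary {long_term:.2f}")
```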


Assuntos
Simulação por Computador , Fonética , Percepção da Fala , Fala/classificação , Estimulação Acústica , Percepção Auditiva , Humanos , Fala/fisiologia , Fatores de Tempo
16.
Dev Sci ; 12(3): 388-95, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19371361

ABSTRACT

When learning language, young children are faced with many seemingly formidable challenges, including discovering words embedded in a continuous stream of sounds and determining what role these words play in syntactic constructions. We suggest that knowledge of phoneme distributions may play a crucial part in helping children segment words and determine their lexical category, and we propose an integrated model of how children might go from unsegmented speech to lexical categories. We corroborated this theoretical model using a two-stage computational analysis of a large corpus of English child-directed speech. First, we used transition probabilities between phonemes to find words in unsegmented speech. Second, we used distributional information about word edges--the beginning and ending phonemes of words--to predict whether the segmented words from the first stage were nouns, verbs, or something else. The results indicate that discovering lexical units and their associated syntactic category in child-directed speech is possible by attending to the statistics of single phoneme transitions and word-initial and final phonemes. Thus, we suggest that a core computational principle in language acquisition is that the same source of information is used to learn about different aspects of linguistic structure.
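As a rough illustration of the first stage, the sketch below estimates phoneme-to-phoneme transition probabilities from unsegmented toy strings and posits word boundaries where the probability drops below a threshold; the corpus, the use of letters as phonemes, and the threshold are all invented for the example.

```python
# Hedged sketch of the first stage: estimate phoneme transition probabilities
# from unsegmented utterances and posit word boundaries at low-probability
# transitions. The utterances and threshold below are toy assumptions.
from collections import Counter

utterances = ["lookatthedoggy", "thedoggyisbig", "lookatthat"]  # letters stand in for phonemes

bigrams, unigrams = Counter(), Counter()
for u in utterances:
    unigrams.update(u[:-1])
    bigrams.update(zip(u, u[1:]))

def p(a, b):
    """Transition probability P(b | a) estimated from the toy corpus."""
    return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0

utt, threshold = "lookatthedoggy", 0.5
segmented, word = [], utt[0]
for a, b in zip(utt, utt[1:]):
    if p(a, b) < threshold:          # low transition probability -> boundary
        segmented.append(word)
        word = b
    else:
        word += b
segmented.append(word)
print(segmented)
```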


Assuntos
Desenvolvimento da Linguagem , Modelos Psicológicos , Acústica da Fala , Fala/classificação , Vocabulário , Humanos , Idioma , Probabilidade , Som , Distribuições Estatísticas
17.
J Speech Lang Hear Res ; 62(9): 3265-3275, 2019 09 20.
Article in English | MEDLINE | ID: mdl-31433709

ABSTRACT

Purpose: To better enable communication among researchers, clinicians, and caregivers, we aimed to assess how untrained listeners classify early infant vocalization types in comparison to terms currently used by researchers and clinicians. Method: Listeners were caregivers with no prior formal education in speech and language development. A first group of listeners reported on clinician/researcher-classified vowel, squeal, growl, raspberry, whisper, laugh, and cry vocalizations obtained from archived video/audio recordings of 10 infants from 4 through 12 months of age. A list of commonly used terms was generated based on listener responses and the standard research terminology. A second group of listeners was presented with the same vocalizations and asked to select terms from the list that they thought best described the sounds. Results: Classifications of the vocalizations by listeners largely overlapped with published categorical descriptors and yielded additional insight into alternate terms commonly used. The biggest discrepancies were found for the vowel category. Conclusion: Prior research has shown that caregivers are accurate in identifying canonical babbling, a major prelinguistic vocalization milestone occurring at about 6-7 months of age. This indicates that caregivers are also well attuned to even earlier emerging vocalization types. This supports the value of continuing basic and clinical research on the vocal types infants produce in the first months of life and on their potential diagnostic utility, and may also help improve communication between speech-language pathologists and families.


Assuntos
Linguagem Infantil , Fonação/fisiologia , Fala/classificação , Fala/fisiologia , Adulto , Feminino , Audição , Humanos , Lactente , Masculino , Adulto Jovem
18.
IEEE J Biomed Health Inform ; 23(6): 2294-2301, 2019 11.
Article in English | MEDLINE | ID: mdl-31034426

ABSTRACT

Childhood anxiety and depression often go undiagnosed. If left untreated, these conditions, collectively known as internalizing disorders, are associated with long-term negative outcomes including substance abuse and increased risk of suicide. This paper presents a new approach for identifying young children with internalizing disorders using a 3-min speech task. We show that machine learning analysis of audio data from the task can be used to identify children with an internalizing disorder with 80% accuracy (54% sensitivity, 93% specificity). The speech features most discriminative of internalizing disorder are analyzed in detail, showing that affected children exhibit especially low-pitched voices, with repeatable speech inflections and content, and a high-pitched response to surprising stimuli relative to controls. This new tool is shown to outperform clinical thresholds on parent-reported child symptoms, which identify children with an internalizing disorder with lower accuracy (67-77% versus 80%) and similar specificity (85-100% versus 93%) and sensitivity (0-58% versus 54%) in this sample. These results point toward the future use of this approach for screening children for internalizing disorders so that interventions can be deployed when they have the highest chance of long-term success.


Assuntos
Ansiedade/diagnóstico , Depressão/diagnóstico , Aprendizado de Máquina , Fala/classificação , Criança , Pré-Escolar , Feminino , Humanos , Masculino , Psicopatologia , Processamento de Sinais Assistido por Computador
19.
IEEE J Biomed Health Inform ; 23(6): 2265-2275, 2019 11.
Article in English | MEDLINE | ID: mdl-31478879

ABSTRACT

Depression has become a common mental disorder and one of the main causes of disability worldwide. Because depressive symptoms vary considerably across individuals, designing comprehensive and effective depression detection methods has become an urgent need. This study explored physiological and behavioral perspectives simultaneously, fusing pervasive electroencephalography (EEG) and vocal signals to make the detection of depression more objective, effective, and convenient. After extracting several effective features from these two types of signals, we trained six representative classifiers on each modality, captured the diversity and correlation of the classifiers' decisions in a co-decision tensor, and combined these decisions into the final classification with a multi-agent strategy. Experimental results on 170 subjects (81 depressed patients and 89 normal controls) showed that the proposed multi-modal depression detection strategy is superior to single-modal classifiers and other typical late fusion strategies in accuracy, F1-score, and sensitivity. This work indicates that late fusion of pervasive physiological and behavioral signals is promising for depression detection, and that the multi-agent strategy can effectively exploit the diversity and correlation of different classifiers to reach a better final decision.
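A hedged sketch of the late-fusion step is given below: per-classifier decisions from the two modalities are combined by accuracy-weighted voting. This stands in for, and greatly simplifies, the co-decision tensor and multi-agent strategy described in the abstract; all numbers are invented.

```python
# Hedged sketch of late fusion: per-classifier decisions from the EEG and
# speech modalities are combined by accuracy-weighted voting. This simplifies
# the paper's co-decision tensor and multi-agent negotiation.
import numpy as np

# Rows: classifiers (here 3 per modality); columns: test subjects.
# Entries: predicted label (1 = depressed, 0 = control). Toy values.
eeg_decisions   = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1]])
voice_decisions = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 1]])
decisions = np.vstack([eeg_decisions, voice_decisions])

# Validation accuracy of each classifier acts as its voting weight (assumed).
weights = np.array([0.72, 0.68, 0.64, 0.70, 0.66, 0.75])

scores = weights @ decisions / weights.sum()     # weighted vote per subject
fused = (scores >= 0.5).astype(int)
print(fused)                                     # final decision per subject
```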


Assuntos
Depressão/diagnóstico , Eletroencefalografia/métodos , Processamento de Sinais Assistido por Computador , Espectrografia do Som/métodos , Fala/classificação , Algoritmos , Feminino , Humanos , Masculino
20.
IEEE Trans Cybern ; 49(9): 3293-3306, 2019 Sep.
Article in English | MEDLINE | ID: mdl-29994138

ABSTRACT

It is challenging to recognize facial action units (AUs) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that, in current practice, the information is extracted from a single source, i.e., the visual channel. However, facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework that makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes, as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database demonstrate that the proposed framework significantly outperforms state-of-the-art visual-based methods in recognizing speech-related AUs, especially those AUs whose visual observations are impaired during speech. More importantly, it is also superior to audio-based methods and to feature-level fusion methods that employ low-level audio features, because it explicitly models and exploits the physiological relationships between AUs and phonemes.


Subjects
Face, Pattern Recognition, Automated/methods, Speech, Algorithms, Bayes Theorem, Face/anatomy & histology, Face/physiology, Facial Expression, Facial Muscles/physiology, Humans, Signal Processing, Computer-Assisted, Speech/classification, Speech/physiology