Search | BVS - MINISTÉRIO DA SAÚDE

Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task.

Shitov, Denis; Pirogova, Elena; Wysocki, Tadeusz A; Lech, Margaret.

Sensors (Basel) ; 23(7), 2023 Mar 24.

Article in English | MEDLINE | ID: mdl-37050496

ABSTRACT

Articulatory synthesis is one of the approaches used for modeling human speech production. In this study, we propose a model-based algorithm for learning the policy to control the vocal tract of the articulatory synthesizer in a vowel-to-vowel imitation task. Our method does not require external training data, since the policy is learned through interactions with the vocal tract model. To improve the sample efficiency of the learning, we trained the model of speech production dynamics simultaneously with the policy. The policy was trained in a supervised way using predictions of the model of speech production dynamics. To stabilize the training, early stopping was incorporated into the algorithm. Additionally, we extracted acoustic features using an acoustic word embedding (AWE) model. This model was trained to discriminate between different words and to enable compact encoding of acoustics while preserving contextual information of the input. Our preliminary experiments showed that introducing this AWE model was crucial to guide the policy toward a near-optimal solution. The acoustic embeddings, obtained using the proposed approach, were revealed to be useful when applied as inputs to the policy and the model of speech production dynamics.

Subjects

Phonetics; Speech Acoustics; Humans; Imitative Behavior; Speech; Learning; Speech Production Measurement

Simultaneous Sleep Stage and Sleep Disorder Detection from Multimodal Sensors Using Deep Learning.

Cheng, Yi-Hsuan; Lech, Margaret; Wilkinson, Richardt Howard.

Sensors (Basel) ; 23(7), 2023 Mar 26.

Article in English | MEDLINE | ID: mdl-37050528

ABSTRACT

Sleep scoring involves the inspection of multimodal recordings of sleep data to detect potential sleep disorders. Given that symptoms of sleep disorders may be correlated with specific sleep stages, the diagnosis is typically supported by the simultaneous identification of a sleep stage and a sleep disorder. This paper investigates the automatic recognition of sleep stages and disorders from multimodal sensory data (EEG, ECG, and EMG). We propose a new distributed multimodal and multilabel decision-making system (MML-DMS). It comprises several interconnected classifier modules, including deep convolutional neural networks (CNNs) and shallow perceptron neural networks (NNs). Each module works with a different data modality and data label. The flow of information between the MML-DMS modules provides the final identification of the sleep stage and sleep disorder. We show that the fused multilabel and multimodal method improves the diagnostic performance compared to single-label and single-modality approaches. We tested the proposed MML-DMS on the PhysioNet CAP Sleep Database, with VGG16 CNN structures, achieving an average classification accuracy of 94.34% and F1 score of 0.92 for sleep stage detection (six stages) and an average classification accuracy of 99.09% and F1 score of 0.99 for sleep disorder detection (eight disorders). A comparison with related studies indicates that the proposed approach significantly improves upon the existing state-of-the-art approaches.

Subjects

Deep Learning; Sleep Wake Disorders; Humans; Electroencephalography/methods; Sleep; Sleep Stages; Sleep Wake Disorders/diagnosis

A Complete Key Management Scheme for LoRaWAN v1.1.

Chen, Xingda; Lech, Margaret; Wang, Liuping.

Sensors (Basel) ; 21(9), 2021 Apr 23.

Article in English | MEDLINE | ID: mdl-33922603

ABSTRACT

Security is one of the major concerns of the Internet of Things (IoT) wireless technologies. LoRaWAN is one of the emerging Low Power Wide Area Networks being developed for IoT applications. The latest LoRaWAN release v.1.1 has provided a security framework that includes data confidentiality protection, data integrity check, device authentication and key management. However, its key management part is only ambiguously defined. In this paper, a complete key management scheme is proposed for LoRaWAN. The scheme addresses key updating, key generation, key backup, and key backward compatibility. The proposed scheme was shown not only to enhance the current LoRaWAN standard, but also to meet the primary design consideration of LoRaWAN, i.e., low power consumption.

Evaluating deep learning architectures for Speech Emotion Recognition.

Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence.

Neural Netw ; 92: 60-68, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28396068

ABSTRACT

Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances.

Subjects

Emotions; Machine Learning; Speech Recognition Software; Neural Networks, Computer

Automatic evaluation of hypernasality based on a cleft palate speech database.

He, Ling; Zhang, Jing; Liu, Qi; Yin, Heng; Lech, Margaret; Huang, Yunzhi.

J Med Syst ; 39(5): 61, 2015 May.

Article in English | MEDLINE | ID: mdl-25814462

ABSTRACT

Hypernasality is one of the most typical characteristics of cleft palate (CP) speech. The outcome of hypernasality grading determines whether follow-up surgery is needed. Currently, CP speech is evaluated by experienced speech therapists, so the result depends strongly on their clinical experience and subjective judgment. This work proposes an automatic evaluation system for hypernasality grading in CP speech. The database tested in this work was collected by the Hospital of Stomatology, Sichuan University, which has the largest number of CP patients in China. Based on the production process of hypernasality, source sound pulse and vocal tract filter features are presented. These features include pitch, the first and second energy-amplified frequency bands, cepstrum-based features, MFCC, and short-time energy in the sub-bands. Combined with a KNN classifier, these features are applied to automatically classify four grades of hypernasality: normal, mild, moderate and severe. The experimental results show that the proposed system achieves good performance: the classification rates for the four hypernasality grades reach up to 80.4%. The sensitivity of the proposed features to gender is also discussed.

Subjects

Cleft Palate/complications; Signal Processing, Computer-Assisted; Speech Disorders/etiology; Speech Disorders/physiopathology; Child; Child, Preschool; China; Female; Humans; Male
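
The abstract above names a KNN classifier applied to acoustic features to assign one of four hypernasality grades. The following is only a minimal sketch of k-nearest-neighbour voting, not the authors' system: the two-dimensional feature vectors, their values, and the function name `knn_predict` are invented for illustration, and the abstract does not state the value of k or the distance measure used.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    # Count the labels of the k closest points and return the most frequent.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, made-up feature vectors (not data from the study).
train_features = [(0.1, 0.2), (0.2, 0.1), (0.5, 0.5), (0.9, 0.8), (0.8, 0.9)]
train_labels = ["normal", "normal", "moderate", "severe", "severe"]

grade_low = knn_predict(train_features, train_labels, (0.15, 0.15))
grade_high = knn_predict(train_features, train_labels, (0.85, 0.85))
```

In practice each vector would hold the pitch, cepstral, MFCC, and sub-band energy features listed in the abstract, and k would be chosen by validation.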

Multichannel weighted speech classification system for prediction of major depression in adolescents.

Ooi, Kuan Ee Brian; Lech, Margaret; Allen, Nicholas B.

IEEE Trans Biomed Eng ; 60(2): 497-506, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23192475

ABSTRACT

Early identification of adolescents at high imminent risk for clinical depression could significantly reduce the burden of the disease. This study demonstrated that acoustic speech analysis and classification can be used to determine early signs of major depression in adolescents, up to two years before they meet clinical diagnostic criteria for the full-blown disorder. Individual contributions of four different types of acoustic parameters [prosodic, glottal, Teager's energy operator (TEO), and spectral] to depression-related changes of speech characteristics were examined. A new computational methodology for the early prediction of depression in adolescents was developed and tested. The novel aspect of this methodology is in the introduction of multichannel classification with a weighted decision procedure. It was observed that single-channel classification was effective in predicting depression with a desirable specificity-to-sensitivity ratio and accuracy higher than chance level only when using glottal or prosodic features. The best prediction performance was achieved with the new multichannel method, which used four features (prosodic, glottal, TEO, and spectral). In the case of the person-based approach with two sets of weights, the new multichannel method provided a high accuracy level of 73% and the sensitivity-to-specificity ratio of 79%/67% for predicting future depression.

Subjects

Depressive Disorder, Major/diagnosis; Signal Processing, Computer-Assisted; Speech Acoustics; Speech/classification; Adolescent; Analysis of Variance; Child; Databases, Factual; Female; Humans; Male
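
The abstract above describes multichannel classification with a weighted decision procedure and reports sensitivity and specificity. The abstract does not say how the weights are computed or combined, so the sketch below assumes a simple weighted vote over per-channel binary decisions. The weights, the 0.5 threshold, and the function names are hypothetical.

```python
def weighted_vote(channel_decisions, channel_weights, threshold=0.5):
    """Fuse per-channel binary decisions (1 = at risk, 0 = not at risk).

    The normalised weighted sum and the threshold are assumptions made
    for illustration, not the procedure from the paper.
    """
    total_weight = sum(channel_weights[name] for name in channel_decisions)
    score = sum(
        channel_weights[name] * decision
        for name, decision in channel_decisions.items()
    ) / total_weight
    return 1 if score >= threshold else 0


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical weights for the four feature channels named in the abstract.
weights = {"prosodic": 0.3, "glottal": 0.4, "teo": 0.2, "spectral": 0.1}
decisions = {"prosodic": 1, "glottal": 1, "teo": 0, "spectral": 0}
fused = weighted_vote(decisions, weights)

# Made-up labels, used only to show how the two rates are computed.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Under this assumed scheme, the prosodic and glottal channels outvote the other two because their combined weight exceeds the threshold.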

Detection of clinical depression in adolescents' speech during family interactions.

Low, Lu-Shih Alex; Maddage, Namunu C; Lech, Margaret; Sheeber, Lisa B; Allen, Nicholas B.

IEEE Trans Biomed Eng ; 58(3): 574-86, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21075715

ABSTRACT

The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients and the speech recordings were made during patients' clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracy ranging between 81%-87% for males and 72%-79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%-69% for males and 70%-75% for females). These findings indicate the importance of nonlinear mechanisms associated with the glottal flow formation as cues for clinical depression.

Subjects

Depression/diagnosis; Diagnosis, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Psychology, Adolescent; Speech Acoustics; Speech/physiology; Adolescent; Depression/physiopathology; Family Relations; Female; Humans; Male
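
The abstract above relies on features derived from the Teager energy operator (TEO). The abstract does not describe how those features are built, so the sketch below shows only the standard discrete operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], applied to a synthetic tone. The function name and the test signal are illustrative assumptions.

```python
import math


def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples are skipped because they lack a neighbour.
    """
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


# Synthetic test tone (not speech data): x[n] = A * cos(omega * n).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_energy(tone)
```

For a pure sinusoid the operator returns the constant A^2 * sin^2(omega), so its output tracks both amplitude and frequency. That sensitivity is why the operator is used to capture nonlinear energy changes in speech.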

Video-based detection of the clinical depression in adolescents.

Maddage, Namunu C; Senaratne, Rajinda; Low, Lu-Shih Alex; Lech, Margaret; Allen, Nicholas.

Annu Int Conf IEEE Eng Med Biol Soc ; 2009: 3723-6, 2009.

Article in English | MEDLINE | ID: mdl-19965012

ABSTRACT

We proposed a framework to detect video content of depressed and non-depressed subjects. First, we characterized the expressed emotions in the video stream using Gabor wavelet features extracted at facial landmarks, which were detected using a landmark model matching algorithm. Depressed and non-depressed class models were constructed using Gaussian mixture models. Using 8 hours of video recordings (one hour per subject), balanced for both gender and class, we examined the effectiveness of gender-based and gender-independent modeling approaches for depressed and non-depressed content classification. We found that the gender-based content modeling approach improved the classification accuracy by 6% compared to the gender-independent modeling approach, achieving 78.6% average accuracy.

Subjects

Depression/diagnosis; Depression/pathology; Face; Pattern Recognition, Automated/methods; Adolescent; Algorithms; Artificial Intelligence; Child; Female; Humans; Image Interpretation, Computer-Assisted/methods; Male; Normal Distribution; Reproducibility of Results; Sex Factors; Video Recording

Prostatic carcinoma causing urethral obstruction and obstipation in a cat.

LeRoy, Bruce E; Lech, Margaret E.

J Feline Med Surg ; 6(6): 397-400, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15546773

ABSTRACT

A 9-year-old intact male cat was presented for vomiting and straining to defecate. A large abdominal mass was palpated. The urinary bladder was full and non-expressible. Exploratory laparotomy revealed that the mass was compressing the colon and encircling the urethra caudal to the bladder. The mass was removed, the urethra transected, and the urinary bladder marsupialized to the ventral abdominal wall to allow urine drainage. Histopathologic examination of the mass revealed a prostatic carcinoma. The cat died approximately 6 weeks after removal of the mass. This is the first reported case of a prostatic carcinoma causing urethral obstruction and obstipation in a cat.

Subjects

Cat Diseases/diagnosis; Cat Diseases/surgery; Constipation/veterinary; Prostatic Neoplasms/veterinary; Urethral Obstruction/veterinary; Animals; Cats; Constipation/etiology; Constipation/surgery; Fatal Outcome; Male; Prostatic Neoplasms/complications; Prostatic Neoplasms/diagnosis; Prostatic Neoplasms/surgery; Time Factors; Urethral Obstruction/etiology; Urethral Obstruction/surgery
