A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech.

Ravi, Vijay; Wang, Jinhan; Flint, Jonathan; Alwan, Abeer

Affiliation

Ravi V; Department of Electrical and Computer Engineering, University of California Los Angeles, California, USA 90095.
Wang J; Department of Electrical and Computer Engineering, University of California Los Angeles, California, USA 90095.
Flint J; Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, California, USA 90095.
Alwan A; Department of Electrical and Computer Engineering, University of California Los Angeles, California, USA 90095.

CEUR Workshop Proc ; 3649: 57-63, 2024 Feb.

Article in En | MEDLINE | ID: mdl-38650610

ABSTRACT

The proposed method addresses speaker disentanglement for depression detection from speech signals. Previous approaches require patient/speaker labels, suffer from instability caused by loss maximization, and introduce unnecessary parameters for adversarial domain prediction. In contrast, the proposed unsupervised approach reduces the cosine similarity between the latent spaces of a depression detection model and a pre-trained speaker classification model. This method outperforms baseline models and matches or exceeds adversarial methods in performance, without relying on speaker labels or introducing additional model parameters, which reduces model complexity. The higher the speaker de-identification score (DeID), the better the depression detection system is at masking a patient's identity, thereby enhancing the privacy attributes of depression detection systems. On the DAIC-WOZ dataset with ComparE16 features and an LSTM-only model, our method achieves an F1-Score of 0.776 and a DeID score of 92.87%, outperforming its adversarial counterpart, which achieves an F1-Score of 0.762 and a DeID score of 68.37%. Furthermore, we demonstrate that speaker-disentanglement methods are complementary to text-based approaches: a score-level fusion with a Word2vec-based depression detection model further improves the overall performance to an F1-Score of 0.830.
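The two mechanisms named in the abstract, a cosine-similarity penalty between latent spaces and a score-level fusion, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the exact loss form, so the absolute value, the batch mean, the `weight` parameter, the equal embedding dimensions, and the fusion weight `audio_weight` are all assumptions made for illustration, as are the function names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def disentanglement_penalty(dep_batch, spk_batch) -> float:
    """Mean absolute cosine similarity between paired embeddings.

    dep_batch: latent vectors from the depression model (hypothetical).
    spk_batch: latent vectors from a frozen, pre-trained speaker model
               for the same utterances (hypothetical).
    Assumption: both embeddings have the same dimensionality.
    """
    sims = [abs(cosine_similarity(d, s)) for d, s in zip(dep_batch, spk_batch)]
    return sum(sims) / len(sims)


def total_loss(depression_loss: float, dep_batch, spk_batch,
               weight: float = 1.0) -> float:
    """Depression loss plus a weighted penalty (assumed combination).

    Both terms are minimized, so no speaker labels and no loss
    maximization are involved in this sketch.
    """
    return depression_loss + weight * disentanglement_penalty(dep_batch, spk_batch)


def fuse_scores(audio_score: float, text_score: float,
                audio_weight: float = 0.5) -> float:
    """Score-level fusion as a weighted average (assumed fusion rule)."""
    return audio_weight * audio_score + (1.0 - audio_weight) * text_score
```

In this sketch the speaker embeddings act only as fixed reference vectors, which is one way to read the abstract's claim that no extra trainable parameters are needed. The fusion rule and its weight would in practice be chosen on held-out data.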

Keywords

DAIC-WOZ; Depression detection; Healthcare AI; Privacy; Speaker disentanglement

Full text: 1 Database: MEDLINE Language: En Year of publication: 2024 Document type: Article
