Search | VHL Regional Portal

The Silent Treatment?: Changes in patient emotional expression after silence.

Soma, Christina S; Wampold, Bruce; Flemotomos, Nikolaos; Peri, Raghuveer; Narayanan, Shrikanth; Atkins, David C; Imel, Zac E.

Couns Psychother Res ; 23(2): 378-388, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37457038

ABSTRACT

Psychotherapy can be an emotionally laden conversation, where both verbal and non-verbal interventions may impact the therapeutic process. Prior research has reported mixed results on how clients react emotionally to a silence after the therapist has finished talking, potentially because it studied a limited range of silences with primarily qualitative and self-report methodologies. A quantitative exploration may yield new findings. Using research and automatic data processing methods from the field of linguistics, we analysed the full range of silence lengths (0.2 to 24.01 seconds) and two measures of emotional expression, vocally encoded arousal and emotional valence from the words spoken, in 84 audio recordings of Motivational Interviewing sessions. We hypothesized that both the level and the variance of client emotional expression would change as a function of silence length; however, due to the mixed results in the literature, the direction of emotional change was unclear. We conducted a multilevel linear regression to examine how the level of client emotional expression changed across silence length, and an ANOVA to examine the variability of client emotional expression across silence lengths. Results indicated in both analyses that as silence length increased, emotional expression largely remained the same. Broadly, we demonstrated a weak connection between silence length and emotional expression, indicating no persuasive evidence that silence leads to client emotional processing and expression.

Mel frequency spectral domain defenses against adversarial attacks on speech recognition systems.

Mehlman, Nicholas; Sreeram, Anirudh; Peri, Raghuveer; Narayanan, Shrikanth.

JASA Express Lett ; 3(3): 035208, 2023 03.

Article in English | MEDLINE | ID: mdl-37003705

ABSTRACT

Automatic speech recognition (ASR) systems are vulnerable to adversarial attacks due to their reliance on machine learning models. Many of the defenses explored for ASR systems simply adapt approaches developed for the image domain. This paper explores speech-specific defenses in the feature domain and introduces a defense method called mel domain noise flooding (MDNF). MDNF adds noise to the mel spectrogram representation of the speech before re-synthesizing the audio signal that is fed to the ASR system. The defense is evaluated against strong white-box threat models and shows competitive robustness.

Subject(s)

Speech Perception , Speech , Speech Recognition Software , Noise/adverse effects , Machine Learning
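
The noise-injection step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flood_mel_with_noise`, the choice of Gaussian noise, the SNR-based scaling, and the clipping at zero are all assumptions made for illustration, and the re-synthesis of audio from the noisy mel spectrogram is left out entirely.

```python
import random


def flood_mel_with_noise(mel, snr_db, seed=None):
    """Add Gaussian noise to a mel spectrogram at a target SNR (in dB).

    `mel` is a list of frames, each a list of non-negative mel-band values.
    Hypothetical helper: the noise type and scaling are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    values = [v for frame in mel for v in frame]
    # Average power of the spectrogram values, used to scale the noise.
    signal_power = sum(v * v for v in values) / len(values)
    noise_std = (signal_power / 10.0 ** (snr_db / 10.0)) ** 0.5
    # Clip at zero so the result stays a valid non-negative spectrogram.
    return [
        [max(0.0, v + rng.gauss(0.0, noise_std)) for v in frame]
        for frame in mel
    ]


# Toy spectrogram: 3 frames with 4 mel bands each (made-up values).
toy_mel = [[0.2, 0.5, 0.1, 0.9], [0.3, 0.4, 0.2, 0.8], [0.1, 0.6, 0.3, 0.7]]
noisy_mel = flood_mel_with_noise(toy_mel, snr_db=20.0, seed=0)
```

A lower `snr_db` floods the representation with more noise, which in a defense of this kind would trade recognition accuracy on clean speech for robustness to small adversarial perturbations.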

Automated evaluation of psychotherapy skills using speech and language technologies.

Flemotomos, Nikolaos; Martinez, Victor R; Chen, Zhuohao; Singla, Karan; Ardulov, Victor; Peri, Raghuveer; Caperton, Derek D; Gibson, James; Tanana, Michael J; Georgiou, Panayiotis; Van Epps, Jake; Lord, Sarah P; Hirsch, Tad; Imel, Zac E; Atkins, David C; Narayanan, Shrikanth.

Behav Res Methods ; 54(2): 690-711, 2022 04.

Article in English | MEDLINE | ID: mdl-34346043

ABSTRACT

With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is, however, a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called "motivational interviewing", our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.

Subject(s)

Professional-Patient Relations , Speech , Humans , Language , Psychotherapy/methods
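
One of the session-dynamics measures mentioned in the abstract above, therapist versus client talking time, can be sketched from diarization output. The segment format `(speaker, start_sec, end_sec)` and the function name `talk_time_share` are assumptions for illustration; the abstract does not describe the platform's actual data structures.

```python
from collections import defaultdict


def talk_time_share(segments):
    """Return each speaker's share of total talking time.

    `segments` is an iterable of (speaker, start_sec, end_sec) tuples,
    a hypothetical stand-in for "who spoke when" diarization output.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[speaker] += max(0.0, end - start)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {speaker: 0.0 for speaker in totals}
    return {speaker: t / grand_total for speaker, t in totals.items()}


# Made-up session: therapist speaks 60 s in total, client speaks 60 s.
shares = talk_time_share([
    ("therapist", 0.0, 30.0),
    ("client", 30.0, 90.0),
    ("therapist", 90.0, 120.0),
])
```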

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization.

Pal, Monisankha; Kumar, Manoj; Peri, Raghuveer; Park, Tae Jin; Kim, So Hyun; Lord, Catherine; Bishop, Somer; Narayanan, Shrikanth.

IEEE/ACM Trans Audio Speech Lang Process ; 29: 1204-1219, 2021.

Article in English | MEDLINE | ID: mdl-33997106

ABSTRACT

Most speaker diarization systems with x-vector embeddings are vulnerable to noisy environments and lack domain robustness. Earlier work on speaker diarization using a generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN network to improve diarization robustness and enable rapid generalization across various challenging domains. To this end, we take the pre-trained encoder from the ClusterGAN and fine-tune it using a prototypical loss (meta-ClusterGAN or MCGAN) under the meta-learning paradigm. Experiments are conducted on CALLHOME telephonic conversations, AMI meeting data, DIHARD-II (dev set), which includes a challenging multi-domain corpus, and two child-clinician interaction corpora (ADOS, BOSCC) related to the autism spectrum disorder domain. Extensive analyses of the experimental data are done to investigate the effectiveness of the proposed ClusterGAN and MCGAN embeddings over x-vectors. The results show that the proposed embeddings with a normalized maximum eigengap spectral clustering (NME-SC) back-end consistently outperform the Kaldi state-of-the-art x-vector diarization system. Finally, we employ embedding fusion with x-vectors to provide further improvement in diarization performance. We achieve a relative diarization error rate (DER) improvement of 6.67% to 53.93% on the aforementioned datasets using the proposed fused embeddings over x-vectors. In addition, the MCGAN embeddings perform better than x-vectors and ClusterGAN at estimating the number of speakers and at diarizing short speech segments on telephonic conversations.
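
For readers unfamiliar with the metric, a relative DER improvement such as the one reported above is conventionally the reduction in error expressed as a percentage of the baseline error. The sketch below assumes that conventional definition; the function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_der_improvement(baseline_der, proposed_der):
    """Relative DER improvement in percent, assuming the usual definition:
    100 * (baseline - proposed) / baseline.
    """
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return 100.0 * (baseline_der - proposed_der) / baseline_der


# Hypothetical values: a baseline DER of 10.0% reduced to 8.0%.
improvement = relative_der_improvement(10.0, 8.0)  # 20.0 (percent)
```

Under this definition, the same absolute reduction in DER counts as a larger relative improvement when the baseline error is already low.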
