Speaker-turn aware diarization for speech-based cognitive assessments.
Xu, Sean Shensheng; Ke, Xiaoquan; Mak, Man-Wai; Wong, Ka Ho; Meng, Helen; Kwok, Timothy C Y; Gu, Jason; Zhang, Jian; Tao, Wei; Chang, Chunqi.
Affiliation
  • Xu SS; School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China.
  • Ke X; Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China.
  • Mak MW; Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China.
  • Wong KH; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
  • Meng H; Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
  • Kwok TCY; Jockey Club Centre for Osteoporosis Care and Control, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
  • Gu J; Department of Electrical & Computer Engineering, Dalhousie University, Halifax, NS, Canada.
  • Zhang J; School of Pharmacy, Shenzhen University Medical School, Shenzhen University, Shenzhen, China.
  • Tao W; Department of Neurosurgery, South China Hospital of Shenzhen University, Shenzhen, China.
  • Chang C; School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China.
Front Neurosci; 17: 1351848, 2023.
Article in English | MEDLINE | ID: mdl-38292896
ABSTRACT

Introduction:

Speaker diarization is an essential preprocessing step for diagnosing cognitive impairments from speech-based Montreal Cognitive Assessments (MoCA).

Methods:

This paper proposes three enhancements to the conventional speaker diarization methods for such assessments. The enhancements tackle the challenges of diarizing MoCA recordings on two fronts. First, multi-scale channel interdependence speaker embedding is used as the front-end speaker representation for overcoming the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-dependent attention are added to Res2Net blocks for multi-scale feature aggregation. Second, a sequence comparison approach with a holistic view of the whole conversation is applied to measure the similarity of short speech segments in the conversation, which results in a speaker-turn aware scoring matrix for the subsequent clustering step. Third, to further enhance the diarization performance, we propose incorporating a pairwise similarity measure so that the speaker-turn aware scoring matrix contains both local and global information across the segments.
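
As an illustration of the squeeze-and-excitation (SE) unit mentioned above, the following is a minimal sketch of the generic SE operation, not the authors' implementation. It assumes a feature map stored as a list of channels, each holding one value per frame. The function name `squeeze_excite` and the weight arguments `w1`, `b1`, `w2`, `b2` are hypothetical and chosen only for this example.

```python
import math


def squeeze_excite(features, w1, b1, w2, b2):
    """Rescale each channel of `features` by a learned gate in (0, 1).

    features: list of C channels, each a list of T frame values.
    w1, b1:   bottleneck layer mapping C inputs to H hidden units (hypothetical names).
    w2, b2:   expansion layer mapping H hidden units back to C gates.
    """
    # Squeeze: summarize each channel by its mean over time.
    z = [sum(channel) / len(channel) for channel in features]

    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
         for row, b in zip(w1, b1)]

    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, h)) + b)))
             for row, b in zip(w2, b2)]

    # Scale: multiply every frame of a channel by that channel's gate.
    return [[g * v for v in channel] for g, channel in zip(gates, features)]


# Toy input: 2 channels, 3 frames, 1 hidden unit, all-zero weights.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w1, b1 = [[0.0, 0.0]], [0.0]
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]
scaled = squeeze_excite(features, w1, b1, w2, b2)
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each channel is simply halved. Trained weights would instead emphasize some channels and suppress others, which is how an SE unit models the dependence between channels.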

Results:

Evaluations on an interactive MoCA dataset show that the proposed enhancements lead to a diarization system that outperforms the conventional x-vector/PLDA systems under language-, age-, and microphone-mismatch scenarios.

Discussion:

The results also show that the proposed enhancements can help hypothesize the speaker-turn timestamps, making the diarization method amenable to datasets without timestamp information.
Full text: 1 Database: MEDLINE Language: En Journal: Front Neurosci Year: 2023 Document type: Article Country of affiliation: China