Comparison of performance of automatic recognizers for stutters in speech trained with event or interval markers.
Barrett, Liam; Tang, Kevin; Howell, Peter.
Affiliation
  • Barrett L; Department of Experimental Psychology, University College London, London, United Kingdom.
  • Tang K; Department of English Language and Linguistics, Institute of English and American Studies, Faculty of Arts and Humanities, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
  • Howell P; Department of Linguistics, University of Florida, Gainesville, FL, United States.
Front Psychol ; 15: 1155285, 2024.
Article in English | MEDLINE | ID: mdl-38476388
ABSTRACT

Introduction:

Automatic recognition of stutters (ARS) from speech recordings can facilitate objective assessment and intervention for people who stutter. However, the performance of ARS systems may depend on how the speech data are segmented and labelled for training and testing. This study compared two segmentation methods: event-based, which delimits speech segments by their fluency status, and interval-based, which uses fixed-length segments regardless of fluency.
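
The difference between the two segmentation methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the tuple layout, the 3-second window length, and the rule that a window counts as stuttered if any stuttered event overlaps it are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start_seconds, end_seconds, label).
Segment = Tuple[float, float, str]


def event_segments(events: List[Segment]) -> List[Segment]:
    """Event-based: one segment per annotated event, exact boundaries kept."""
    return list(events)


def interval_segments(events: List[Segment], interval_s: float = 3.0) -> List[Segment]:
    """Interval-based: fixed-length windows, regardless of where events fall.

    Assumed labelling rule: a window is 'stutter' if any non-fluent event
    overlaps it, otherwise 'fluent'. The stutter type is lost.
    """
    total = max(end for _, end, _ in events)
    segments: List[Segment] = []
    start = 0.0
    while start < total:
        end = min(start + interval_s, total)
        stuttered = any(
            label != "fluent" and ev_start < end and ev_end > start
            for ev_start, ev_end, label in events
        )
        segments.append((start, end, "stutter" if stuttered else "fluent"))
        start += interval_s
    return segments


# A short prolongation that straddles the 3-second window boundary.
events = [(0.0, 2.5, "fluent"), (2.5, 3.2, "prolongation"), (3.2, 6.0, "fluent")]
by_event = event_segments(events)
by_interval = interval_segments(events)
```

In this example the event-based segments keep the 0.7-second prolongation as its own labelled unit, while the interval-based windows both become "stutter" even though most of each window is fluent speech.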

Methods:

Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.

Results:

The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. The results also suggest that the segmentation method affects both the quality and the quantity of the data. The inclusion of linguistic features improved the detection of whole-word repetitions, but not other types of stutters.
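
For readers unfamiliar with the metric, the AUC of the receiver operating characteristic equals the probability that a randomly chosen stuttered segment receives a higher classifier score than a randomly chosen fluent one. The sketch below computes it from that definition; the function name, labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a stuttered segment, 0 for a fluent one (assumed coding).
    scores: classifier scores, higher meaning 'more likely stuttered'.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why it allows models trained on differently segmented data to be compared on a common scale.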

Discussion:

The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Front Psychol Year: 2024 Document type: Article Country of affiliation: United Kingdom Country of publication: Switzerland