Search | VHL Regional Portal

A Voice Disease Detection Method Based on MFCCs and Shallow CNN.

Xie, Xiaoping; Cai, Hao; Li, Can; Wu, Yu; Ding, Fei.

J Voice ; 2023 Oct 25.

Article in English | MEDLINE | ID: mdl-37891129

ABSTRACT

The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a technical development trend and has important practical value. Among voice diseases, common diseases that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodule, and vocal cord polyp. This paper presents a voice disease detection method that can be applied in a wide range of clinical. We cooperated with Xiangya Hospital of Central South University to collect voice samples from 352 different patients. The Mel Frequency Cepstrum Coefficient (MFCC) parameters are extracted as input features to describe the voice in the form of data. An innovative model combining MFCC parameters and single convolution layer CNN is proposed for fast calculation and classification. The highest accuracy we achieved was 92%, it is fully ahead of the original research results and internationally advanced. And we use advanced voice function assessment databases (AVFAD) to evaluate the generalization ability of the method we proposed, which achieved an accuracy rate of 98%. Experiments on clinical and standard datasets show that for the pathological detection of voice diseases, our method has greatly improved in accuracy and computational efficiency.

PVR-AFM: A Pathological Voice Repair System based on Non-linear Structure.

Zhang, Tao; Liu, Xiaonan; Liu, Ganjun; Shao, Yangyang.

J Voice ; 37(5): 648-662, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37717981

ABSTRACT

OBJECTIVE: Speech signal processing has become an important technique to ensure that the voice interaction system communicates accurately with the user by improving the clarity or intelligibility of speech signals. However, most existing works only focus on whether to process the voice of average human but ignore the communication needs of individuals suffering from voice disorder, including voice-related professionals, older people, and smokers. To solve this demand, it is essential to design a non-invasive repair system that processes pathological voices. METHODS: In this paper, we propose a repair system for multiple polyp vowels, such as /a/, /i/ and /u/. We utilize a non-linear model based on amplitude-modulation (AM) and a frequency-modulation (FM) structure to extract the pitch and formant of pathological voice. To solve the fracture and instability of pitch, we provide a pitch extraction algorithm, which ensures that pitch's stability and avoids the errors of double pitch caused by the instability of low-frequency signal. Furthermore, we design a formant reconstruction mechanism, which can effectively determine the frequency and bandwidth to accomplish formant repair. RESULTS: Finally, spectrum observation and objective indicators show that the system has better performance in improving the intelligibility of pathological speech.

Subject(s)

Voice Disorders , Voice , Humans , Aged , Speech , Voice Disorders/diagnosis , Algorithms , Cognition

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL