Search | BVS Regional Portal

Cumulant GAN.

Pantazis, Yannis; Paul, Dipjyoti; Fasoulakis, Michail; Stylianou, Yannis; Katsoulakis, Markos A.

IEEE Trans Neural Netw Learn Syst ; 34(11): 9439-9450, 2023 Nov.

Article in English | MEDLINE | ID: mdl-35385390

ABSTRACT

In this article, we propose a novel loss function for training generative adversarial networks (GANs) aiming toward deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions (CGFs) giving rise to Cumulant GAN. Relying on a recently derived variational formula, we show that the corresponding optimization problem is equivalent to Rényi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the Rényi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance, and χ²-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions, and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score (IS) and Fréchet inception distance (FID) when both weaker and stronger discriminators are considered.
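To make the role of the CGF concrete, here is a minimal sketch of a loss built from two empirical cumulant generating functions. The exact objective is not given in the abstract, so the form below (a difference of two scaled CGFs with hypothetical parameters `beta` and `gamma`) is an assumption, as are the function names. It only illustrates how setting both parameters to zero reduces each term to a plain mean, which gives the Wasserstein-style difference of discriminator averages.

```python
import math


def cgf(values, beta):
    """Empirical CGF: log of the sample mean of exp(beta * v)."""
    scaled = [beta * v for v in values]
    m = max(scaled)  # subtract the maximum for numerical stability (log-sum-exp)
    return m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))


def cumulant_loss(d_real, d_fake, beta, gamma):
    """Assumed form: -(1/beta) CGF_real(-beta) - (1/gamma) CGF_fake(gamma).

    d_real / d_fake are discriminator outputs on real and generated samples.
    A parameter equal to zero is treated as the limiting case (the sample mean).
    """
    if beta == 0:
        real_term = sum(d_real) / len(d_real)
    else:
        real_term = -cgf(d_real, -beta) / beta
    if gamma == 0:
        fake_term = sum(d_fake) / len(d_fake)
    else:
        fake_term = cgf(d_fake, gamma) / gamma
    return real_term - fake_term


if __name__ == "__main__":
    real, fake = [1.0, 2.0, 3.0], [0.0, 1.0, 2.0]
    print(cumulant_loss(real, fake, 0.0, 0.0))  # difference of means
    print(cumulant_loss(real, fake, 1.0, 1.0))  # never larger than the line above (Jensen)
```

Under this assumed form, other choices of the two parameters would correspond to the divergences listed in the abstract; the mapping itself is not reproduced here.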

A study of time-frequency features for CNN-based automatic heart sound classification for pathology detection.

Bozkurt, Baris; Germanakis, Ioannis; Stylianou, Yannis.

Comput Biol Med ; 100: 132-143, 2018 09 01.

Article in English | MEDLINE | ID: mdl-29990646

ABSTRACT

This study concerns the task of automatic structural heart abnormality risk detection from digital phonocardiogram (PCG) signals aiming at pediatric heart disease screening applications. Recently, various systems based on convolutional neural networks (CNNs) trained on time-frequency representations of segmental PCG frames have been presented that outperform systems using hand-crafted features. This study focuses on the segmentation and time-frequency representation components of the CNN-based designs. We consider the features most commonly used in state-of-the-art systems (MFCC and Mel-Spectrogram) and, as an alternative feature, a time-frequency representation influenced by domain knowledge, namely sub-band envelopes. Via tests carried out on two high-quality databases with a large set of possible settings, we show that sub-band envelopes are preferable to the most commonly used features and that period-synchronous windowing is preferable to asynchronous windowing.

Subject(s)

Databases, Factual , Heart Defects, Congenital , Heart Sounds , Neural Networks, Computer , Signal Processing, Computer-Assisted , Heart Defects, Congenital/diagnosis , Heart Defects, Congenital/physiopathology , Humans

Speech Processing to Improve the Perception of Speech in Background Noise for Children With Auditory Processing Disorder and Typically Developing Peers.

Flanagan, Sheila; Zorila, Tudor-Catalin; Stylianou, Yannis; Moore, Brian C J.

Trends Hear ; 22: 2331216518756533, 2018.

Article in English | MEDLINE | ID: mdl-29441834

ABSTRACT

Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.

Subject(s)

Auditory Perceptual Disorders , Noise , Speech Perception , Adolescent , Auditory Perceptual Disorders/physiopathology , Child , Female , Hearing Tests , Humans , Male , Speech

Evaluation of near-end speech enhancement under equal-loudness constraint for listeners with normal-hearing and mild-to-moderate hearing loss.

Zorila, Tudor-Catalin; Stylianou, Yannis; Flanagan, Sheila; Moore, Brian C J.

J Acoust Soc Am ; 141(1): 189, 2017 01.

Article in English | MEDLINE | ID: mdl-28147616

ABSTRACT

Four algorithms designed to enhance the intelligibility of speech when noise is added after processing were evaluated under the constraint that the speech should have the same loudness before and after processing, as determined using a loudness model. The algorithms applied spectral modifications and two of them included dynamic-range compression. On average, the methods with dynamic-range compression required the least level adjustment to equate loudness for the unprocessed and processed speech. Subjects with normal-hearing (experiment 1) and mild-to-moderate hearing loss (experiment 2) were tested using unmodified and enhanced speech presented in speech-shaped noise (SSN) and a competing speaker (CS). The results showed (a) the algorithms with dynamic-range compression yielded the largest intelligibility gains in both experiments and for both types of background; (b) the algorithms without dynamic-range compression either yielded benefit only with the SSN or yielded no consistent benefit; (c) speech reception thresholds for unprocessed speech were higher for hearing-impaired than for normal-hearing subjects, by about 2 dB for the SSN and 6 dB for the CS. It is concluded that the enhancement methods incorporating dynamic-range compression can improve intelligibility under the equal-loudness constraint for both normal-hearing and hearing-impaired subjects and for both steady and fluctuating backgrounds.

Effectiveness of a loudness model for time-varying sounds in equating the loudness of sentences subjected to different forms of signal processing.

Zorila, Tudor-Catalin; Stylianou, Yannis; Flanagan, Sheila; Moore, Brian C J.

J Acoust Soc Am ; 140(1): 402, 2016 07.

Article in English | MEDLINE | ID: mdl-27475164

ABSTRACT

A model for the loudness of time-varying sounds [Glasberg and Moore (2002). J. Audio. Eng. Soc. 50, 331-342] was assessed for its ability to predict the loudness of sentences that were processed to either decrease or increase their dynamic fluctuations. In a paired-comparison task, subjects compared the loudness of unprocessed and processed sentences that had been equalized in (1) root-mean-square (RMS) level; (2) the peak long-term loudness predicted by the model; (3) the mean long-term loudness predicted by the model. Method 2 was most effective in equating the loudness of the original and processed sentences.

Subject(s)

Audiometry, Speech , Loudness Perception/physiology , Speech Intelligibility , Adult , Aged , Female , Humans , Male , Middle Aged , Models, Biological , Sound , Speech Perception , Time Factors , Young Adult
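Of the three equalization methods compared in the abstract above, method 1 (matching RMS level) is simple enough to sketch directly. The snippet below is an illustrative sketch only: the function names are hypothetical, signals are plain lists of samples, and the loudness model used for methods 2 and 3 is not reproduced.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(processed, reference):
    """Scale `processed` so that its RMS level equals that of `reference`.

    Returns the scaled signal and the applied gain in dB.
    """
    level = rms(processed)
    if level == 0:
        raise ValueError("processed signal is silent; gain is undefined")
    gain = rms(reference) / level
    return [gain * x for x in processed], 20 * math.log10(gain)


if __name__ == "__main__":
    reference = [0.5, -0.5, 0.5, -0.5]
    processed = [1.0, -1.0, 1.0, -1.0]
    matched, gain_db = match_rms(processed, reference)
    print(rms(matched), gain_db)  # RMS now equals the reference; gain is about -6 dB
```

Methods 2 and 3 would replace `rms` with a loudness prediction and would typically need an iterative search for the gain, because predicted loudness is not a linear function of level.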

Evaluating the outcome of phonosurgery: comparing the role of VHI and VoiSS questionnaires in the Greek language.

Kiagiadaki, Devora E; Chimona, Theognosia S; Chlouverakis, Gregory I; Stylianou, Yannis; Proimos, Efklidis K; Papadakis, Chariton E; Bizakis, John G.

J Voice ; 26(3): 372-7, 2012 May.

Article in English | MEDLINE | ID: mdl-21839613

ABSTRACT

OBJECTIVES/HYPOTHESIS: The objective was to study the role of the Greek version of the Voice Handicap Index (VHI) in comparison with the Voice Symptom Scale (VoiSS) in terms of measuring voice surgery outcome in patients with benign laryngeal lesions. STUDY DESIGN: Nonrandomized prospective. METHODS: Forty-six patients operated on for benign laryngeal lesions were enrolled in the present study. All patients were assessed according to the European Laryngological Society guidelines. In terms of self-evaluation, patients answered the Greek versions of both VHI and VoiSS, preoperatively and 6 weeks postoperatively, and the results were statistically analyzed. RESULTS: The strongest correlation was observed between the functional subscale of VHI and the impairment subscale of VoiSS, as well as between the emotional subscales of both VHI and VoiSS, pre- and postoperatively. A statistically significant change in subscale and total scores was found. VHI and VoiSS subscales and total scores correlated with the stroboscopic and aerodynamic measurements in a variable manner. Perceptual measurements, as well as shimmer and harmonic-to-noise ratio, showed significant correlation with both VHI and VoiSS subscale and total scores postoperatively. CONCLUSION: VHI and VoiSS are considered useful tools in evaluating voice surgery outcome in the Greek language.

Subject(s)

Disability Evaluation , Language , Otorhinolaryngologic Surgical Procedures , Surveys and Questionnaires , Vocal Cords/surgery , Voice Disorders/surgery , Voice Quality , Chi-Square Distribution , Emotions , Female , Greece , Humans , Male , Middle Aged , Odds Ratio , Otorhinolaryngologic Surgical Procedures/adverse effects , Phonation , Predictive Value of Tests , Prospective Studies , Quality of Life , Recovery of Function , Speech Production Measurement , Stroboscopy , Time Factors , Treatment Outcome , Video Recording , Vocal Cords/physiopathology , Voice Disorders/diagnosis , Voice Disorders/physiopathology , Voice Disorders/psychology

On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices.

Arias-Londoño, Julián David; Godino-Llorente, Juan I; Markaki, Maria; Stylianou, Yannis.

Logoped Phoniatr Vocol ; 36(2): 60-9, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21073260

ABSTRACT

This work presents a novel approach for the automatic detection of pathological voices based on fusing the information extracted by means of mel-frequency cepstral coefficients (MFCC) and features derived from the modulation spectra (MS). The system proposed uses a two-stepped classification scheme. First, the MFCC and MS features were used to feed two different and independent classifiers; and then the outputs of each classifier were used in a second classification stage. In order to establish the best configuration which provides the highest accuracy in the detection, the fusion of information was carried out employing different classifier combination strategies. The experiments were carried out using two different databases: the one developed by The Massachusetts Eye and Ear Infirmary Voice Laboratory, and a database recorded by the Universidad Politécnica de Madrid. The results show that the combination of MFCC and MS features employing the proposed approach yields an improvement in the detection accuracy, demonstrating that both methods of parameterization are complementary.

Subject(s)

Signal Processing, Computer-Assisted , Speech Production Measurement , Voice Disorders/diagnosis , Voice Quality , Adolescent , Adult , Aged , Algorithms , Automation , Child , Databases as Topic , Female , Fourier Analysis , Humans , Male , Middle Aged , Pattern Recognition, Automated , Phonation , Predictive Value of Tests , Sound Spectrography , Speech Acoustics , Voice Disorders/physiopathology , Young Adult
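The two-stage scheme in the abstract above feeds the outputs of an MFCC-based classifier and an MS-based classifier into a second stage. The abstract does not name the combination strategies that were tested, so the sketch below uses three textbook fixed rules (mean, product, max) purely as an illustration; the function name and the assumption that each first-stage classifier outputs a probability of "pathological" are hypothetical.

```python
def fuse(p_mfcc, p_ms, rule="mean"):
    """Combine two first-stage probabilities of the 'pathological' class.

    p_mfcc, p_ms: probabilities in [0, 1] from the two independent classifiers.
    rule: 'mean', 'product' (renormalized over both classes) or 'max'.
    """
    if rule == "mean":
        return (p_mfcc + p_ms) / 2
    if rule == "product":
        pos = p_mfcc * p_ms
        neg = (1 - p_mfcc) * (1 - p_ms)
        if pos + neg == 0:  # the classifiers contradict each other with certainty
            return 0.5
        return pos / (pos + neg)
    if rule == "max":
        return max(p_mfcc, p_ms)
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    for rule in ("mean", "product", "max"):
        fused = fuse(0.8, 0.6, rule)
        print(rule, round(fused, 3), "pathological" if fused >= 0.5 else "normal")
```

A trained second-stage classifier, as described in the abstract, would learn the combination from data instead of applying a fixed rule.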

Using modulation spectra for voice pathology detection and classification.

Markaki, Maria; Stylianou, Yannis.

Annu Int Conf IEEE Eng Med Biol Soc ; 2009: 2514-7, 2009.

Article in English | MEDLINE | ID: mdl-19964970

ABSTRACT

In this paper, we consider the use of modulation spectra for voice pathology detection and classification. To reduce the high-dimensional space generated by modulation spectra, we suggest the use of higher-order singular value decomposition (HOSVD), and we propose a feature selection algorithm based on the mutual information between subjective voice quality and computed features. Using a support vector machine (SVM) with a radial basis function (RBF) kernel as classifier, we conducted experiments on a database of sustained vowel recordings from healthy and pathological voices. For voice pathology detection, the suggested approach achieved a detection rate of 94.1% and an area under the curve (AUC) score of 97.8%. For voice pathology classification, an average detection rate and AUC of 88.6% and 94.8%, respectively, were achieved in classifying polyp against keratosis leukoplakia, adductor spasmodic dysphonia and vocal nodules.

Subject(s)

Sound Spectrography/instrumentation , Speech Production Measurement/methods , Voice Disorders/physiopathology , Voice Quality , Adult , Algorithms , Area Under Curve , Female , Fourier Analysis , Humans , Male , Middle Aged , Models, Statistical , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Sound Spectrography/methods , Speech Acoustics , Voice Disorders/diagnosis
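The abstract above reports two figures of merit, a detection rate and an AUC. As a rough illustration of how they differ, the sketch below computes both from classifier scores. It assumes that "detection rate" means overall accuracy at a fixed threshold, which the abstract does not state; the function names and the example scores are invented for illustration and are not data from the study.

```python
def auc(pathological_scores, healthy_scores):
    """AUC as the probability that a pathological score exceeds a healthy one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pathological_scores:
        for h in healthy_scores:
            if p > h:
                wins += 1.0
            elif p == h:
                wins += 0.5
    return wins / (len(pathological_scores) * len(healthy_scores))


def detection_rate(pathological_scores, healthy_scores, threshold):
    """Fraction of all recordings classified correctly at one threshold."""
    correct = sum(s >= threshold for s in pathological_scores)
    correct += sum(s < threshold for s in healthy_scores)
    return correct / (len(pathological_scores) + len(healthy_scores))


if __name__ == "__main__":
    pathological = [0.9, 0.8, 0.4]  # made-up scores, higher = more pathological
    healthy = [0.5, 0.3, 0.1]
    print(auc(pathological, healthy))                   # 8 of 9 pairs ordered correctly
    print(detection_rate(pathological, healthy, 0.45))  # 4 of 6 recordings correct
```

AUC summarizes ranking quality over all thresholds, while the detection rate depends on the single threshold chosen, which is why the two numbers can differ for the same classifier.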

Voice pathology detection based on short-term jitter estimations in running speech.

Vasilakis, Miltiadis; Stylianou, Yannis.

Folia Phoniatr Logop ; 61(3): 153-70, 2009.

Article in English | MEDLINE | ID: mdl-19571550

ABSTRACT

In this paper, we investigate the use of jitter estimation over short time intervals (short-term jitter) for voice pathology detection in the case of running or continuous speech. Short-term jitter estimations are provided by the spectral jitter estimator (SJE), which is based on a mathematical description of the jitter phenomenon. The SJE has been shown to be robust against errors in pitch period estimations, which makes it a good candidate for measuring jitter in continuous speech. On two large databases of sustained vowel recordings from healthy and pathological voices, we suggest a threshold for the SJE for pathology detection based on cross-database validation. Applying that to a database of continuous speech (reading text) from normophonic and dysphonic speakers, a second threshold and new features are suggested for monitoring jitter in continuous speech. Detection performance of the suggested thresholds and features was evaluated using receiver operating characteristic curves and their discriminative efficiency between healthy and pathological voices was judged using the area under the curve index. In terms of area under the curve, the suggested features for reading text provide a discrimination score of about 95%, while the second threshold provides a classification rate of 87.8%. Furthermore, estimated short-term jitter values from reading text were found to confirm the studies showing a decrease of jitter with increasing fundamental frequencies, and the more frequent presence of high jitter values in the case of pathological voices as time increases.

Subject(s)

Speech Production Measurement/methods , Speech , Voice Disorders/diagnosis , Algorithms , Area Under Curve , Databases, Factual , Dysphonia/diagnosis , Humans , Phonetics , ROC Curve , Reading , Speech Acoustics , Time Factors
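For readers unfamiliar with jitter, the sketch below computes the classical local jitter measure (mean absolute difference between consecutive pitch periods, relative to the mean period) over short sliding windows. This is not the spectral jitter estimator (SJE) described in the abstract above, which works in the spectral domain and is reported to be robust to pitch-period errors; it is a simpler time-domain stand-in that assumes pitch periods have already been extracted. Function names, window sizes and the example values are hypothetical.

```python
def local_jitter(periods):
    """Mean absolute consecutive-period difference divided by the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def short_term_jitter(periods, window, hop):
    """Local jitter over sliding windows of `window` periods, advanced by `hop`."""
    return [
        local_jitter(periods[i:i + window])
        for i in range(0, len(periods) - window + 1, hop)
    ]


if __name__ == "__main__":
    # Made-up pitch periods in milliseconds: steady at first, then alternating.
    periods = [10.0, 10.0, 10.0, 10.0, 10.2, 10.0, 10.2, 10.0]
    for value in short_term_jitter(periods, window=4, hop=2):
        print(f"{100 * value:.2f}%")
```

A detection rule in the spirit of the abstract would compare such short-term values against a threshold; the thresholds proposed in the paper are not given in the abstract and are not reproduced here.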
