ABSTRACT
The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in objective clinical voice assessment because they can indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators assume that voice signals are nearly periodic (i.e., stationary over a short analysis window), although voice pathology can induce involuntary slow modulations that violate this assumption. This paper proposes a deterministically time-varying harmonic model to improve HNR measurements. To estimate the time-varying model while reducing model overfitting, a two-stage iterative least-squares algorithm is proposed. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm can produce consistent HNR measures as the extent and rate of tremor are varied.
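To illustrate why a time-varying harmonic model changes the HNR, here is a minimal sketch. It assumes a known, fixed f0 and lets each harmonic's cosine and sine amplitudes vary linearly in time. The model is fitted with a single-stage linear least-squares solve and HNR = 10·log10(harmonic energy / residual energy). This is not the two-stage iterative algorithm proposed in the paper, and the function names (`solve_linear`, `harmonic_hnr`) are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def harmonic_hnr(x, f0, fs, n_harm, time_varying=True):
    """HNR (dB) from a least-squares harmonic fit with a known, fixed f0 (assumption).

    If time_varying is True, each harmonic's cosine/sine amplitudes vary
    linearly in time (first-order polynomial); otherwise they are constant.
    """
    n_samp = len(x)
    t = [2.0 * n / (n_samp - 1) - 1.0 for n in range(n_samp)]  # time scaled to [-1, 1]
    cols = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs
        c = [math.cos(w * n) for n in range(n_samp)]
        s = [math.sin(w * n) for n in range(n_samp)]
        cols += [c, s]
        if time_varying:
            cols += [[ti * v for ti, v in zip(t, c)], [ti * v for ti, v in zip(t, s)]]
    # Normal equations: (A^T A) coef = A^T x
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    atx = [sum(p * q for p, q in zip(ci, x)) for ci in cols]
    coef = solve_linear(ata, atx)
    fit = [sum(cf * col[n] for cf, col in zip(coef, cols)) for n in range(n_samp)]
    e_harm = sum(v * v for v in fit)
    e_noise = sum((xv - fv) ** 2 for xv, fv in zip(x, fit))
    return 10 * math.log10(e_harm / e_noise)


# Usage: a 200-Hz voice-like test signal whose harmonic amplitudes drift within the window.
rng = random.Random(0)
fs, f0, n_samp = 8000, 200, 400
sig = []
for n in range(n_samp):
    tn = 2.0 * n / (n_samp - 1) - 1.0
    sig.append((1 + 0.5 * tn) * math.cos(2 * math.pi * f0 * n / fs)
               + 0.3 * (1 - 0.4 * tn) * math.cos(2 * math.pi * 2 * f0 * n / fs + 0.5)
               + rng.gauss(0.0, 0.01))
hnr_tv = harmonic_hnr(sig, f0, fs, 2, time_varying=True)
hnr_fixed = harmonic_hnr(sig, f0, fs, 2, time_varying=False)
```

With the stationary model, the slow amplitude drift ends up in the residual and is counted as noise, so `hnr_fixed` is much lower than `hnr_tv` even though the added noise is the same.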
Subject(s)
Tremor , Voice , Acoustics , Humans , Noise , Speech Acoustics
ABSTRACT
High-speed videoendoscopy (HSV) enables observation of the true vibratory behavior of the vocal folds. To quantify the vocal fold vibration captured by HSV, lateral movement features (e.g., glottal width and vocal fold edge displacements) have been extracted as functions of time. The most common analysis method extracts these features along a single lateral strip, which is used to form a digital kymogram. The weakness of this method is that it captures only the vibratory behavior local to the strip location. Although multi-line kymographic approaches have been used to capture spatial diversity, their observation points are either fixed or manually positioned. The behaviors of pathological vocal folds, especially those with lesions, are expected to vary both spatially and among speakers, making fixed observation points ineffective. This paper proposes a technique that synthesizes kymographic waveforms from full spatiotemporal HSV feature data to extract distinctive behaviors automatically. Each synthesized waveform represents a non-overlapping section of the glottis within which the vocal folds behave locally homogeneously. The efficacy of the algorithm is demonstrated with four HSV recordings (three pathological), and the method is discussed, including mitigation of its known drawbacks.
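The abstract does not give the section-forming rule. The sketch below is therefore a hypothetical stand-in that shows the general idea: adjacent lateral lines whose glottal-width waveforms behave alike are merged into one non-overlapping section, and each section's averaged waveform serves as its synthesized kymographic waveform. The greedy correlation-threshold rule, the 0.9 threshold, and all function names are assumptions, not the published algorithm.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    sa = math.sqrt(sum(v * v for v in da))
    sb = math.sqrt(sum(v * v for v in db))
    if sa == 0.0 or sb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(da, db)) / (sa * sb)


def mean_waveform(rows):
    """Sample-by-sample average of several equal-length waveforms."""
    return [sum(vals) / len(rows) for vals in zip(*rows)]


def synthesize_kymograms(width_rows, threshold=0.9):
    """Greedily merge adjacent lines whose waveforms correlate with the running section mean.

    width_rows[i] is the glottal-width waveform at lateral line i (ordered along the glottis).
    Returns (first_line, last_line, synthesized_waveform) for each non-overlapping section.
    """
    sections = []
    start, members = 0, [width_rows[0]]
    for i in range(1, len(width_rows)):
        if pearson(width_rows[i], mean_waveform(members)) >= threshold:
            members.append(width_rows[i])
        else:
            sections.append((start, i - 1, mean_waveform(members)))
            start, members = i, [width_rows[i]]
    sections.append((start, len(width_rows) - 1, mean_waveform(members)))
    return sections
```

For example, if lines 0 to 2 vibrate in phase and lines 3 to 5 vibrate a quarter cycle later, the function returns two sections, one per homogeneous region, instead of a single fixed-strip waveform.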
Subject(s)
Kymography , Glottis , Phonation , Vibration , Video Recording , Vocal Cords
ABSTRACT
PURPOSE: Vocal fold asymmetry creates irregular entrainments and modulations in voice, which may lead to a rough perceptual quality. Asymmetry can also cause mid-phonation bifurcations, in which a small change in the phonatory system causes a drastic change in vibration pattern, resulting in transitions into and out of rough voice. This study surveys sustained phonation recordings of speakers diagnosed with vocal fold polyp or unilateral vocal fold paralysis to investigate the resulting voice patterns. METHOD: This retrospective study examined 71 sustained phonation recordings from 48 patients. Segments with distinctive signal patterns were identified within each recording using narrowband spectrograms and computer-assisted analysis of spectral peaks. RESULTS: Phonation segmentation yielded 240 segments across all recordings. Five voice patterns were recognized: entrained (regularly or irregularly), modulated, uncoupled, unstable, and pulsed. Thirty-six patients (75%) exhibited irregular patterns. No irregular pattern lasted for the entire phonation; each was always accompanied by at least one mid-phonation bifurcation. Irregular segments were significantly shorter (M = 0.4 s) than segments with the regular pattern (M = 1.4 s). CONCLUSIONS: The results suggest that vocal fold pathology frequently introduces dynamic vibratory patterns that affect both the acoustic signals and their perception. Because of these abnormalities, it is important for clinical voice assessment protocols, both perceptual and acoustic, to account for these possible bifurcations, irregular signal patterns, and their tendencies.
Subject(s)
Vocal Cords , Voice , Humans , Retrospective Studies , Phonation , Acoustics , Vibration
ABSTRACT
OBJECTIVES: This paper reports the effectiveness of formant-aware spectral parameters in predicting perceptual breathiness ratings. A breathy voice has a steeper spectral slope and more turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over the lower formant regions is a known approach to capturing properties related to breathiness. This study examines this approach by testing contemporary spectral parameters and algorithms within this framework, alternative frequency-band designs, and vowel effects. METHODS: Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbruecken Voice Database were considered (n = 367). Recordings with signal irregularities (such as subharmonics) or with perceived roughness were excluded from the study. Four speech-language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their average ratings were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Four spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy ratio, GNE) were evaluated in each band to predict the perceptual breathiness ratings. Four HNR algorithms were tested. RESULTS: Multiple linear regression models of spectral parameters, led by the HNRs, explained up to 85% of the variance in perceptual breathiness ratings, exceeding the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variance in breathiness (78%), exceeding the smoothed cepstral peak prominence (74%). HNR performance was highly algorithm-dependent (10% spread). Some vowel effects were observed in the perceptual ratings (higher for /u/), predictability (5% lower for /u/), and model parameter selection.
CONCLUSIONS: Strong per-vowel acoustic models of breathiness were obtained by segmenting the spectrum to isolate the portion most affected by breathiness.
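As a rough illustration of band-wise spectral parameters, the sketch below takes harmonic and noise power spectra that have already been separated, together with formant-based band edges. It computes an intraband HNR per band and interband harmonic (HHR) and noise (NNR) ratios for each pair of bands. The exact definitions (energy sums within each band, lower band over higher band, ratios in dB) are assumptions rather than the paper's formulas, and GNE is omitted because it requires envelope analysis beyond this sketch.

```python
import math


def to_db(num, den):
    """Power ratio in decibels."""
    return 10.0 * math.log10(num / den)


def formant_band_parameters(harm_pow, noise_pow, edges):
    """Band-wise HNR plus interband harmonic (HHR) and noise (NNR) ratios (assumed definitions).

    harm_pow / noise_pow: harmonic and noise power spectra on the same frequency bins.
    edges: bin indices [b0, b1, ..., bB]; band i covers bins edges[i] .. edges[i+1]-1.
    Interband ratios compare band i (lower) with band j (higher), i < j.
    """
    bands = list(zip(edges[:-1], edges[1:]))
    e_h = [sum(harm_pow[lo:hi]) for lo, hi in bands]
    e_n = [sum(noise_pow[lo:hi]) for lo, hi in bands]
    hnr = [to_db(h, n) for h, n in zip(e_h, e_n)]
    pairs = [(i, j) for i in range(len(bands)) for j in range(i + 1, len(bands))]
    hhr = {(i, j): to_db(e_h[i], e_h[j]) for i, j in pairs}
    nnr = {(i, j): to_db(e_n[i], e_n[j]) for i, j in pairs}
    return hnr, hhr, nnr
```

In practice the band edges would follow each vowel's formant structure, so the same function yields per-vowel parameter sets.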
ABSTRACT
PURPOSE: This research note illustrates the effects of video data with nonsquare pixels on pixel-based measures obtained from videofluoroscopic swallow studies (VFSS). METHOD: Six pixel-based distance and area measures were obtained from two different videofluoroscopic units, both of which yield videos with nonsquare pixels but with different pixel aspect ratios (PARs). The swallowing measures were obtained from the original VFSS videos and from the same videos after their pixels were squared. RESULTS: The results demonstrated significant multivariate effects both of video type (original vs. squared) and of the interaction between video type and sample (two video recordings of different patients, with different PARs and opposing tilt angles of the external reference). The pixel-based measures varied widely between original and squared videos, with percent deviations ranging from 0.1% to 9.1% and a maximum effect size of 7.43. CONCLUSIONS: This research note demonstrates the effect of disregarding PAR on distance- and area-based pixel parameters. In addition, we present a multilevel roadmap to prevent possible measurement errors. At the planning stage, the PAR of the video source should be identified; at the analysis stage, video data should be prescaled before analysis with PAR-unaware software. No prior absolute or relative pixel-based study in the literature reports adjusting for PAR before measurement or identifies PAR as a possible source of variation. Addressing PAR will improve the precision and stability of pixel-based VFSS findings and improve comparability within and across clinical and research settings. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21957134.
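The arithmetic behind the PAR effect can be sketched as follows. It assumes the convention PAR = pixel width / pixel height and expresses results in units of one pixel height. Under that convention, horizontal offsets are scaled by PAR before computing Euclidean distance, and pixel-counted areas scale by PAR. This also shows why the error depends on the tilt of the measured segment. The value 0.9 in the usage note is hypothetical, not one of the study's PARs, and the function names are illustrative only.

```python
import math


def par_corrected_distance(p1, p2, par):
    """Distance between (x, y) points given in stored, nonsquare pixel coordinates.

    par = pixel width / pixel height (assumed convention). Horizontal offsets are
    stretched by par so the result is in units of one pixel height (squared-pixel grid).
    """
    dx = (p2[0] - p1[0]) * par
    dy = p2[1] - p1[1]
    return math.hypot(dx, dy)


def par_corrected_area(area_px, par):
    """Area counted in stored pixels, converted to squared-pixel units."""
    return area_px * par


def percent_deviation(uncorrected, corrected):
    """Percent error of a PAR-unaware measurement relative to the corrected one."""
    return 100.0 * abs(uncorrected - corrected) / corrected
```

With a hypothetical PAR of 0.9, a purely horizontal 100-pixel span corrects to 90 units (an 11.1% deviation), a purely vertical span is unchanged, and an oblique span falls in between. This is why the direction of the external reference matters.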
Subject(s)
Deglutition Disorders , Humans , Deglutition Disorders/diagnostic imaging , Deglutition , Video Recording/methods , Software , Fluoroscopy/methods
ABSTRACT
High-speed videoendoscopy (HSV) enables the observation and measurement of vibratory behaviors of the vocal folds by capturing laryngeal images at high frame rates. The frame rates of commercially available HSV systems, however, are still too low for sample-based time-domain objective analyses. To mitigate the resulting lack of temporal resolution, existing studies have employed sum-of-harmonics parametric models to evaluate temporal vocal-fold behaviors. This paper focuses on the other weakness of HSV: its inherent susceptibility to temporal aliasing. Aliasing occurs when substantial harmonics lie above the Nyquist frequency of the HSV camera, and video cameras offer very little means of filtering out these harmonics. Although aliasing in HSV data minimally affects many laryngeal objective parameter measurements, some parameters, such as the harmonics-to-noise ratio and derivative-based parameters, are sensitive to the aliased content. Using a parametric model with a carefully selected number of harmonics allows the aliased harmonics to be classified as part of the harmonic signal. Glottal area waveform examples are included to illustrate the modeling capability for normal and disordered vocal folds.
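Standard sampling theory shows where an aliased harmonic appears. A component at f Hz sampled at fs frames per second shows up at the folded frequency min(f mod fs, fs − f mod fs). The sketch below applies this formula to a harmonic series. The f0 of 220 Hz and the 4,000 frames per second in the usage note are illustrative values, and the function names are hypothetical.

```python
def aliased_frequency(f, fs):
    """Apparent frequency in [0, fs/2] of a sinusoid at f Hz sampled at fs samples/s."""
    r = f % fs
    return min(r, fs - r)


def harmonic_alias_table(f0, fs, n_harm):
    """(harmonic number, true frequency, apparent frequency) for harmonics 1..n_harm."""
    return [(k, k * f0, aliased_frequency(k * f0, fs)) for k in range(1, n_harm + 1)]
```

For f0 = 220 Hz at 4,000 frames per second, the 10th harmonic (2,200 Hz) folds to 1,800 Hz. That frequency lies between the 8th (1,760 Hz) and 9th (1,980 Hz) harmonics, so a harmonic-only analysis would count it as noise unless the model deliberately includes harmonics above the Nyquist frequency.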
Subject(s)
Computer Simulation , Laryngoscopy , Larynx/physiopathology , Models, Biological , Phonation , Video Recording , Vocal Cords/physiopathology , Biomechanical Phenomena , Equipment Design , Female , Humans , Image Processing, Computer-Assisted , Laryngeal Diseases/pathology , Laryngeal Diseases/physiopathology , Laryngoscopes , Laryngoscopy/instrumentation , Larynx/pathology , Time Factors , Vibration , Video Recording/instrumentation , Vocal Cords/pathology
ABSTRACT
Purpose: The purpose of this study was to identify the extent to which seven measures of glottal area timing and regularity differ between older adults with and without age-related dysphonia (ARD). Method: Laryngeal high-speed videoendoscopy was performed at 4,000 frames per second for 42 adults aged 70 years and older (ARD: 9 female, 5 male; control group: 15 female, 13 male). Relative glottal gap, open quotient, speed index, maximum area declination rate, harmonics-to-noise ratio, harmonic richness factor, and standard deviation of fundamental frequency were measured from a 0.5-s segment of the glottal area waveform. Eta squared (η²) was computed to estimate the group effect. Results: Small effect sizes (η² = .18–.35) were present for relative glottal gap, open quotient, maximum area declination rate, harmonic richness factor, and standard deviation of fundamental frequency. Speed index and glottal harmonics-to-noise ratio did not explain group membership (η² = .001 and .05, respectively). Conclusion: These findings provide evidence that vocal fold vibration in ARD differs from that in normal aging, whereas the overlap in values for every measure is consistent with the concept that normal aging and ARD exist on a continuum of health and disease.
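For reference, the one-way form of eta squared is the between-group sum of squares divided by the total sum of squares. It is the proportion of variance in a measure explained by group membership, so η² = .18 corresponds to 18%. The sketch below implements this textbook formula. The study's exact statistical model may differ, and the values in the usage note are made up.

```python
def eta_squared(groups):
    """One-way eta squared: between-group sum of squares divided by total sum of squares."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total
```

For the made-up groups [1, 2, 3] and [4, 5, 6], SS_between = 13.5 and SS_total = 17.5, giving η² ≈ 0.77. Two groups with identical means give η² = 0.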
Subject(s)
Aging , Dysphonia/physiopathology , Vocal Cords/physiopathology , Acoustics , Age Factors , Aged , Aged, 80 and over , Biomechanical Phenomena , Case-Control Studies , Dysphonia/diagnosis , Female , Humans , Laryngoscopy , Male , Speech Acoustics , Speech Perception , Speech Production Measurement , Stroboscopy , Time Factors , Vibration , Video Recording , Voice Quality
ABSTRACT
OBJECTIVES: This study investigated the effect of the menstrual cycle on vocal fold vibratory characteristics in young women using high-speed digital imaging. It examined the effect of menstrual phase on five objective high-speed imaging parameters and two self-rated perceptual parameters. The effects of oral birth control use were also investigated. METHODS: Thirteen subjects with no prior voice complaints were included in this study. Data were collected at three time points (premenses, postmenses, and ovulation) over the course of one menstrual cycle. For five of the 13 subjects, data were collected over two consecutive cycles. Six of the 13 subjects used oral birth control. From the high-speed imaging data, five objective parameters were computed: fundamental frequency, fundamental frequency deviation, harmonics-to-noise ratio, harmonic richness factor, and the ratio of the first and second harmonics. These were supplemented by two self-rated parameters: the Reflux Severity Index and a perceptual voice quality rating. The analysis used linear mixed models with repeated measures. RESULTS: There were no significant main effects of menstrual phase, cycle, or birth control use on mean fundamental frequency, fundamental frequency deviation, harmonics-to-noise ratio, harmonic richness factor, the ratio of the first and second harmonics, Reflux Severity Index, or perceptual voice quality rating. Additionally, there were no interaction effects. CONCLUSIONS: Hormone fluctuations across the menstrual cycle do not appear to have a direct effect on vocal fold vibratory characteristics in young women with no voice concerns. Birth control use, on the other hand, may influence the spectral richness of vocal fold vibration.
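Two of the spectral parameters above can be sketched from harmonic amplitudes. Each amplitude is estimated by projecting the waveform onto a cosine and sine at k·f0, which is exact when the analysis window spans a whole number of periods. The first/second harmonic ratio is then 20·log10(A1/A2). For the harmonic richness factor, the code assumes one common form: the power in harmonics 2 to K relative to the fundamental, in dB. The study may define it differently, and all names here are hypothetical.

```python
import math


def harmonic_amplitudes(x, f0, fs, n_harm):
    """Amplitude of harmonics 1..n_harm by projection (assumes x spans whole periods)."""
    n_samp = len(x)
    amps = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs
        c = sum(v * math.cos(w * n) for n, v in enumerate(x))
        s = sum(v * math.sin(w * n) for n, v in enumerate(x))
        amps.append(2.0 / n_samp * math.hypot(c, s))
    return amps


def harmonic_richness_factor(amps):
    """HRF in dB: power of harmonics 2..K relative to the fundamental (assumed definition)."""
    return 10 * math.log10(sum(a * a for a in amps[1:]) / amps[0] ** 2)


def h1_h2(amps):
    """Ratio of the first to the second harmonic amplitude, in dB."""
    return 20 * math.log10(amps[0] / amps[1])
```

A waveform with harmonic amplitudes 1, 0.5, and 0.25 gives H1-H2 ≈ 6.0 dB and HRF ≈ −5.1 dB. A richer spectrum (stronger upper harmonics) raises HRF.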
Subject(s)
Laryngoscopy/methods , Menstrual Cycle , Phonation , Video Recording , Vocal Cords/physiology , Voice Quality , Adult , Biomechanical Phenomena , Contraceptives, Oral, Hormonal/administration & dosage , Female , Humans , Image Interpretation, Computer-Assisted , Menstrual Cycle/drug effects , Phonation/drug effects , Predictive Value of Tests , Time Factors , Vibration , Vocal Cords/diagnostic imaging , Vocal Cords/drug effects , Voice Quality/drug effects , Young Adult
ABSTRACT
OBJECTIVES: This study aimed to investigate the effects of varying volume, pitch, and phonation type on the initiation and termination phases of vocal fold oscillation using high-speed digital videoendoscopy. Specifically, it addressed the effects of variation in volume, pitch, and phonation type (normal, pressed, and breathy) on the transient duration of the vibrating glottal length (length transient duration, Tlen), the transient duration of the glottal area waveform (area transient duration, Tarea), the time offset between the beginning (or end) of full-length vibration and of full-amplitude vibration (TΔ), and the variation of the fundamental frequency during the oscillation initiation and termination segments (pitch instability, %PI). METHODS: A female subject with no voice problems produced voices with varying pitch and loudness: comfortable pitch and comfortable loudness, normal pitch and loud, high pitch and comfortable loudness, and high pitch and loud. Breathy and pressed phonations were also recorded. Each of the six phonation types was recorded six times, yielding 72 transient segments (each recording included both initiation and termination phases). Mixed-model statistical analyses were applied to the five objective high-speed digital videoendoscopy parameters. RESULTS: Preliminary findings showed significant voice-type effects on the length and area transient durations for the oscillation initiation segment but not for the oscillation termination segment. CONCLUSIONS: This study demonstrates that voice type appears to influence vibration initiation patterns more than vibration termination patterns.
Subject(s)
Laryngoscopy , Phonation , Speech Acoustics , Video Recording , Vocal Cords/physiology , Voice Quality , Biomechanical Phenomena , Female , Humans , Image Interpretation, Computer-Assisted , Oscillometry , Pattern Recognition, Automated , Time Factors , Vibration , Vocal Cords/anatomy & histology
ABSTRACT
High-speed videoendoscopy excels at observing vocal-fold oscillatory patterns during voice initiation and termination. The initial and most critical step in analyzing these transient regions is to locate the transient periods, that is, to determine when vocal-fold oscillation is absent and when it has reached steady-state behavior. The latter is more challenging because the "steady" oscillation during sustained phonation is not truly steady and is expected to vary over time. This variation may cause unreliable identification of the transient periods, possibly resulting in less accurate or less reliable parameter measurements. An oscillation feature that is relatively consistent in the steady state is the glottal length, that is, the extent of the oscillation along the vocal folds. This paper proposes an autonomous algorithm to estimate the vocal-fold oscillation length and uses it to detect four transient events: oscillation onset and offset, and attainment and loss of full-length oscillation. The detected event markers are intended to improve transient parameter measurements. The algorithm processes the set of glottal width waveforms spatiotemporally to estimate the oscillation length. Examples with in vivo high-speed videoendoscopy recordings of both normal and pathological cases are included to show the efficacy of the proposed algorithm in identifying the transient markers.
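The sketch below shows how an oscillation-length estimate yields the four transient markers. It is a deliberately simplified stand-in, not the paper's spatiotemporal algorithm. A glottal line counts as oscillating in a sliding window when its peak-to-peak glottal width exceeds a threshold, and the oscillation length is the fraction of oscillating lines. Onset and offset are the first and last windows with any oscillation; attainment and loss of full-length oscillation are the first and last windows at or above an assumed 90% fraction. The window size, thresholds, and names are all hypothetical.

```python
def oscillation_length(width_rows, win, thresh):
    """Fraction of glottal lines oscillating in each sliding window of `win` frames.

    A line counts as oscillating when its peak-to-peak glottal width inside the
    window exceeds `thresh`. Returns one value per window start position.
    """
    n_rows, n_frames = len(width_rows), len(width_rows[0])
    lengths = []
    for start in range(n_frames - win + 1):
        active = 0
        for row in width_rows:
            seg = row[start:start + win]
            if max(seg) - min(seg) > thresh:
                active += 1
        lengths.append(active / n_rows)
    return lengths


def transient_markers(lengths, full_frac=0.9):
    """Window indices of oscillation onset, full-length attainment, full-length loss, and offset."""
    oscillating = [i for i, v in enumerate(lengths) if v > 0]
    full = [i for i, v in enumerate(lengths) if v >= full_frac]
    if not oscillating or not full:
        return None  # no oscillation, or full-length oscillation never reached
    return {"onset": oscillating[0], "full_on": full[0],
            "full_off": full[-1], "offset": oscillating[-1]}
```

In a typical onset, oscillation begins over a short mid-glottal region and spreads along the folds. The gap between "onset" and "full_on" then measures how long the oscillation takes to reach full length.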
Subject(s)
Image Interpretation, Computer-Assisted/methods , Laryngoscopy/methods , Phonation , Video Recording/methods , Vocal Cords/physiopathology , Voice Disorders/diagnosis , Voice Quality , Algorithms , Automation , Biomechanical Phenomena , Case-Control Studies , Humans , Oscillometry , Predictive Value of Tests , Time Factors , Vibration , Voice Disorders/physiopathology
ABSTRACT
PURPOSE: Model-based quantitative analysis of high-speed videoendoscopy (HSV) data at a low frame rate of 2,000 frames per second was assessed for its clinical adequacy. Stepwise regression was employed to evaluate HSV parameters computed with harmonic models and their relationships to the Voice Handicap Index (VHI). The model-based HSV parameters were also compared with those obtained using conventional analysis techniques. METHOD: Eight pairs of HSV recordings of vocal folds before and after surgery for benign lesions were investigated. Five glottal area waveform features (fundamental frequency, F0; open quotient, OQ; speed index, SI; relative glottal gap, RGG; and harmonics-to-noise ratio, HNR) were measured using model-based and conventional approaches. Statistical analyses were conducted on the mean (M) and standard deviation (SD) of the feature measurements over 1 s of sustained phonation. RESULTS: Two model-based HSV parameters, OQ M (ρ = .67) and HNR M (ρ = -.56), were selected and explained 55% of the VHI variation. The conventional techniques yielded a regression model with OQ SD (ρ = -.60) and F0 SD (ρ = .44), explaining 61% of the VHI variation. CONCLUSIONS: Although the selected model-based HSV parameters explained less variation in the VHI than the conventionally computed HSV parameters, the behaviors of the model-based parameters were more consistent with expectations and theory than those of the conventionally computed parameters.
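At 2,000 frames per second, a 200-Hz phonation provides only 10 frames per cycle, which quantizes frame-counted timing measures. The sketch below contrasts a frame-counted open quotient with one computed from a sum-of-harmonics model evaluated on a dense grid over one period. It assumes the model parameters are already known (e.g., from a least-squares fit) and defines OQ as the fraction of the cycle in which the area exceeds a threshold. Both are assumptions for illustration. The single-cosine "waveform" is a toy signal, not a realistic glottal area, and the function names are hypothetical.

```python
import math


def harmonic_model(dc, amps, phases, f0):
    """Return g(t) = dc + sum_k amps[k-1] * cos(2*pi*k*f0*t + phases[k-1]) as a callable."""
    def g(t):
        return dc + sum(a * math.cos(2 * math.pi * k * f0 * t + p)
                        for k, (a, p) in enumerate(zip(amps, phases), start=1))
    return g


def open_quotient(cycle_samples, open_thresh):
    """Fraction of one cycle's samples in which the glottal area exceeds open_thresh."""
    return sum(1 for v in cycle_samples if v > open_thresh) / len(cycle_samples)


def model_based_oq(g, f0, open_thresh, m=10000):
    """Open quotient from the fitted model evaluated on a dense grid over one period."""
    period = 1.0 / f0
    return open_quotient([g(i * period / m) for i in range(m)], open_thresh)


# Usage: f0 = 200 Hz leaves only 10 frames per cycle at 2,000 frames per second.
g = harmonic_model(dc=0.0, amps=[1.0], phases=[0.0], f0=200.0)
oq_frames = open_quotient([g(n / 2000.0) for n in range(10)], 0.5)  # 3 of 10 frames: 0.3
oq_model = model_based_oq(g, 200.0, 0.5)  # close to the true value of 1/3
```

The frame-counted estimate can only take values in steps of 0.1 at this frame rate, whereas the model-based estimate recovers the underlying value.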
Subject(s)
Image Processing, Computer-Assisted/methods , Larynx/physiology , Models, Biological , Vocal Cords/physiology , Vocal Cords/surgery , Voice/physiology , Cysts/physiopathology , Cysts/surgery , Endoscopy/methods , Female , Follow-Up Studies , Glottis/physiology , Humans , Male , Middle Aged , Polyps/physiopathology , Polyps/surgery , Postoperative Period , Preoperative Period , Retrospective Studies , Speech Acoustics , Vibration , Young Adult
ABSTRACT
One of the critical requirements for high-speed videoendoscopy (HSV) to become a clinically useful tool is to pair it with a technique that provides a quick overview of the vast amount of HSV data and rapidly identifies the best video segments for subjective and objective analyses. This article proposes intensity-based representations that are easily computed from HSV data and can be used to identify HSV features quickly. The first representation, termed the Quick Vibratory Profile (QVP), is an HSV-based one-dimensional waveform that captures vocal fold vibration as well as nonglottic activities. The QVP can be used in a wide range of experimental and clinical studies to select appropriate HSV recording segments quickly without extensive review of the actual video frames. Moreover, this article proposes a pair of spatial profiles to locate the vibrating vocal folds within the HSV frames. These profiles are useful for automating objective assessments, as demonstrated by their use, together with the QVP, in a proposed cyclewise three-dimensional glottal area segmentation. The article illustrates the usefulness of these representations with examples.
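The abstract does not give the formulas for the QVP or the spatial profiles. The sketch below therefore only illustrates the general idea of cheap intensity-based summaries, using definitions assumed for illustration. The temporal profile is the mean frame intensity, which drops when the dark glottis opens. The row and column profiles sum each pixel's temporal standard deviation, so they peak where the image changes over time, i.e., at the vibrating folds. These are not the published QVP or profile definitions, and the names are hypothetical.

```python
import math


def temporal_profile(frames):
    """Mean pixel intensity per frame: a crude 1-D vibratory profile (assumed definition)."""
    return [sum(map(sum, f)) / (len(f) * len(f[0])) for f in frames]


def spatial_profiles(frames):
    """Per-pixel temporal std of intensity, summed across each row and down each column."""
    n_t, n_r, n_c = len(frames), len(frames[0]), len(frames[0][0])
    std = [[0.0] * n_c for _ in range(n_r)]
    for r in range(n_r):
        for c in range(n_c):
            vals = [frames[t][r][c] for t in range(n_t)]
            mu = sum(vals) / n_t
            std[r][c] = math.sqrt(sum((v - mu) ** 2 for v in vals) / n_t)
    row_profile = [sum(row) for row in std]
    col_profile = [sum(std[r][c] for r in range(n_r)) for c in range(n_c)]
    return row_profile, col_profile
```

For a video in which only a small dark region flickers as the glottis opens and closes, the column profile peaks at that region's column and the row profile is nonzero only over its rows. Together they give a bounding box for later segmentation.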
Subject(s)
Image Interpretation, Computer-Assisted , Laryngoscopy/methods , Larynx/physiology , Phonation , Video Recording , Voice , Algorithms , Biomechanical Phenomena , Humans , Predictive Value of Tests , Sound Spectrography , Time Factors , Vibration , Vocal Cords/physiology
ABSTRACT
This article presents a novel approach to analyzing nonperiodic vocal fold behavior in high-speed videoendoscopy (HSV) data. Although HSV can capture the true vibrational motion of the vocal folds, its clinical advantage over videostroboscopy has not been widely accepted. One of the key advantages of HSV over videostroboscopy is its ability to capture nonperiodic vocal fold behavior, which is more prominent in pathological vocal folds. However, such nonperiodicity in HSV data has not been fully explored quantitatively beyond simple perturbation analysis. This article presents an advanced waveform modeling and decomposition technique for HSV-based waveforms. Waveforms are modeled as having three components: a harmonic signal, a deterministic nonharmonic signal, and a random nonharmonic signal. This decomposition is motivated by the fact that voice disorders introduce signal content that is nonharmonic but deterministic in quality, such as subharmonic or modulating content. The proposed model aims to isolate such disordered behaviors as the deterministic nonharmonic signal and to quantify them. In addition to the model, the article outlines model parameter estimation procedures and a family of harmonics-to-noise ratio (HNR) parameters. The proposed HNR parameters include the harmonics-to-deterministic-noise ratio (HDNR) and the harmonics-to-random-noise ratio. A preliminary study demonstrates the effectiveness of the extended model and its HNR parameters. Vocal folds with and without benign lesions (with lesions, n = 13; without, n = 20) were studied using HSV glottal area waveforms. All three HNR parameters significantly distinguished the disordered condition, and the HDNR showed the largest effect size (Cohen's d = 2.04).
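Once a waveform has been decomposed into its three components, the HNR family reduces to energy ratios. The sketch below assumes that definition. The overall HNR treats all nonharmonic energy (deterministic plus random) as noise, while HDNR and the harmonics-to-random-noise ratio each use only the corresponding component. The decomposition step itself, which is the substance of the article, is not reproduced. The function and dictionary key names are hypothetical.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def hnr_family(harmonic, det_nonharmonic, rand_nonharmonic):
    """HNR-type ratios (dB) from three already-decomposed components (assumed definitions)."""
    e_h = energy(harmonic)
    e_d = energy(det_nonharmonic)
    e_r = energy(rand_nonharmonic)
    return {
        "HNR": 10 * math.log10(e_h / (e_d + e_r)),  # all nonharmonic content as noise
        "HDNR": 10 * math.log10(e_h / e_d),  # deterministic (e.g., subharmonic) part only
        "harmonics_to_random_noise": 10 * math.log10(e_h / e_r),  # random part only
    }
```

Separating the two noise terms lets a strong subharmonic or modulation lower HDNR even when the random-noise ratio is unchanged. A single overall HNR would mix the two effects.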