Results 1 - 11 of 11
1.
Appl Sci (Basel) ; 14(1), 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38699704

ABSTRACT

INTRODUCTION: This study pursued two objectives: (1) to determine the potential association between listener (n = 51) judgments of 20 male tracheoesophageal (TE) speaker samples for two auditory-perceptual dimensions of voice, overall severity (OS) and listener comfort (LC); and (2) to assess the temporal and spectral acoustic correlates for these auditory-perceptual dimensions. METHODOLOGY: Three separate correlation analyses were performed to evaluate the association between OS and LC. First, scores of OS and LC from all listeners were pooled together, and then the correlation between OS and LC was computed. Second, scores of OS and LC were averaged over all listeners to derive a single estimate of OS and LC for each TE speaker sample; the correlation between the average OS and LC was then computed. Third, listener-to-listener variability in the association between OS and LC was evaluated by computing the correlation between OS and LC scores from each listener across all TE samples. Finally, two stepwise multiple regression models were created to relate the average LC score to spectral and temporal variation in the acoustic signal. RESULTS: While the pooled OS and LC scores had a moderate positive correlation (r = 0.66, p < 0.00001), the averaged OS and LC exhibited a near perfect positive correlation (r = 0.99, p < 0.00001). The significant differences between the pooled and averaged scores were explained by significant listener-to-listener variability in the association between OS and LC. OS and LC scores from 5 listeners had non-significant correlations, 10 listeners had moderate correlations (r < 0.7), 35 listeners had high correlations (0.7 < r < 0.9), and 1 listener had a very high correlation (0.9 < r < 1). Finally, the acoustic models created based on the spectral and temporal variations in the signal were able to account for 87.7% and 61.8% of variation in the average LC score. CONCLUSIONS: The strong correlations between OS and LC suggest that LC may, in fact, provide a more comprehensive auditory-perceptual surrogate for the voice quality of TE speakers. Although OS and LC are distinct conceptual dimensions, LC appears to have the advantage of assessing the social impact and potential communication disability that may exist in interactions between TE speakers and listeners.
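
The three correlation analyses described in this abstract (pooled, listener-averaged, and per-listener) can be sketched as follows. This is only an illustration: the ratings are made-up numbers, and the names `pearson`, `os_scores`, and `lc_scores` are hypothetical, not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ratings, indexed as scores[listener][sample].
os_scores = [[10, 40, 70, 90], [20, 30, 80, 60], [15, 55, 50, 95]]
lc_scores = [[15, 35, 75, 85], [40, 20, 60, 90], [10, 60, 45, 99]]

# 1) Pooled: every (listener, sample) pair is one observation.
pooled_os = [s for row in os_scores for s in row]
pooled_lc = [s for row in lc_scores for s in row]
r_pooled = pearson(pooled_os, pooled_lc)

# 2) Averaged: one mean OS and one mean LC per sample, across listeners.
n_listeners = len(os_scores)
mean_os = [sum(col) / n_listeners for col in zip(*os_scores)]
mean_lc = [sum(col) / n_listeners for col in zip(*lc_scores)]
r_averaged = pearson(mean_os, mean_lc)

# 3) Per listener: one correlation per listener, across all samples.
r_per_listener = [pearson(o, l) for o, l in zip(os_scores, lc_scores)]

print(r_pooled, r_averaged, r_per_listener)
```

Averaging over listeners removes listener-specific scatter before the correlation is computed, which is one way the averaged correlation can exceed the pooled one, as the abstract reports.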

2.
J Speech Lang Hear Res ; 67(3): 753-781, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38386017

ABSTRACT

PURPOSE: Many studies using machine learning (ML) in speech, language, and hearing sciences rely upon cross-validations with single data splitting. This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust data splitting method of nested k-fold cross-validation. The second purpose is to present methods and MATLAB code to perform power analysis for ML-based analysis during the design of a study. METHOD: First, the significant impact of different cross-validations on ML outcomes was demonstrated using real-world clinical data. Then, Monte Carlo simulations were used to quantify the interactions among the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, the dimensionality of the model, and the sample size. Four different cross-validation methods (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and confidence of the resulting ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (5% significance) with 80% power. Statistical confidence of the model was defined as the probability of correct features being selected for inclusion in the final model. RESULTS: ML models generated based on the single holdout method had very low statistical power and confidence, leading to overestimation of classification accuracy. Conversely, the nested 10-fold cross-validation method resulted in the highest statistical confidence and power while also providing an unbiased estimate of accuracy. The required sample size using the single holdout method could be 50% higher than what would be needed if nested k-fold cross-validation were used. Statistical confidence in the model based on nested k-fold cross-validation was as much as four times higher than the confidence obtained with the single holdout-based model. A computational model, MATLAB code, and lookup tables are provided to assist researchers with estimating the minimum sample size needed during study design. CONCLUSION: The adoption of nested k-fold cross-validation is critical for unbiased and robust ML studies in the speech, language, and hearing sciences. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25237045.


Subject(s)
Machine Learning, Speech, Humans, Sample Size, Language, Hearing
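
For readers unfamiliar with nested k-fold cross-validation, the idea in the abstract above can be sketched in plain Python. This is not the authors' MATLAB code. The one-feature threshold classifier, the synthetic data, the fold counts, and all function names are assumptions chosen to keep the example small. The inner loop selects a feature using only the outer-training data, and the outer loop scores the selected model on data that played no part in that selection.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle range(n) and split it into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(X, y, feat):
    """One-feature classifier: threshold at the midpoint of the class means."""
    v0 = [x[feat] for x, t in zip(X, y) if t == 0]
    v1 = [x[feat] for x, t in zip(X, y) if t == 1]
    mu0, mu1 = sum(v0) / len(v0), sum(v1) / len(v1)
    return feat, (mu0 + mu1) / 2, mu1 >= mu0

def predict(model, x):
    feat, thr, one_is_high = model
    return int((x[feat] > thr) == one_is_high)

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def split(X, y, test):
    """Return (X_train, y_train, X_test, y_test) for a list of test indices."""
    held = set(test)
    train = [i for i in range(len(y)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def cv_accuracy(X, y, feat, k, rng):
    """Plain k-fold accuracy of the classifier built on one candidate feature."""
    scores = []
    for test in k_fold_indices(len(y), k, rng):
        Xtr, ytr, Xte, yte = split(X, y, test)
        scores.append(accuracy(fit_threshold(Xtr, ytr, feat), Xte, yte))
    return sum(scores) / k

def nested_cv(X, y, n_features, outer_k=5, inner_k=4, seed=0):
    """Outer folds estimate accuracy; inner folds choose the feature."""
    rng = random.Random(seed)
    outer_scores, chosen = [], []
    for test in k_fold_indices(len(y), outer_k, rng):
        Xtr, ytr, Xte, yte = split(X, y, test)
        # Model selection sees the outer-training data only.
        best = max(range(n_features),
                   key=lambda f: cv_accuracy(Xtr, ytr, f, inner_k, rng))
        outer_scores.append(accuracy(fit_threshold(Xtr, ytr, best), Xte, yte))
        chosen.append(best)
    return sum(outer_scores) / outer_k, chosen

# Synthetic data (assumption): feature 0 is informative, features 1-2 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(2.0 * t, 1.0), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
     for t in y]
estimate, picked = nested_cv(X, y, n_features=3)
print(f"accuracy estimate: {estimate:.2f}; feature picked per outer fold: {picked}")
```

The list of features picked per outer fold loosely mirrors the abstract's notion of statistical confidence, that is, how often the correct features end up in the final model.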
3.
Laryngoscope ; 133(11): 3094-3099, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37194664

ABSTRACT

OBJECTIVE: The aim of this study was to gain quantitative insights into the role of daily voice use associated with mild phonotrauma via the Daily Phonotrauma Index (DPI), a measure derived from neck-surface acceleration magnitude (NSAM) and difference between the first two harmonic magnitudes (H1 - H2). METHODS: An ambulatory voice monitor recorded weeklong voice use for 151 female patients with phonotraumatic vocal hyperfunction (PVH) and 181 female vocally healthy controls. Three laryngologists rated phonotrauma severity from each patient's laryngoscopy. Mixed generalized linear models evaluated the accuracy, sensitivity, and specificity of the original DPI trained on all patients versus a mild DPI version trained on only patients rated with mild phonotrauma. Individual contribution of NSAM and H1 - H2 to each DPI model was also evaluated. RESULTS: Reliability across the laryngologists' phonotrauma ratings was moderate (Fleiss κ = 0.41). There were 70, 69, and 12 patients with mild, moderate, and severe phonotrauma, respectively. The mild DPI, compared to the original DPI, correctly classified more patients with mild phonotrauma (Cohen's d = 0.9) and fewer controls (d = -0.9) and did not change in overall accuracy. H1 - H2 contributed less to mild phonotrauma classification than NSAM for mild DPI. CONCLUSIONS: Compared with the original DPI, the mild DPI exhibited higher sensitivity to mild phonotrauma and lower specificity to controls, but the same overall classification accuracy. These results support the mild DPI as a promising detector of early phonotrauma and suggest that NSAM may be associated with early phonotrauma, whereas H1 - H2 may be a biomarker associated with vocal fold vibration in the presence of lesions. LEVEL OF EVIDENCE: Level 4, case-control study. Laryngoscope, 133:3094-3099, 2023.


Subject(s)
Voice Disorders, Voice, Humans, Female, Case-Control Studies, Reproducibility of Results, Vocal Cords/pathology
4.
Perspect ASHA Spec Interest Groups ; 8(6): 1363-1379, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38312372

ABSTRACT

Purpose: The teaching profession is a high-voice-use occupation at elevated risk for developing voice disorders. Continued research on teachers' vocal demands is necessary to advocate for and establish vocal health programs. This study quantified ambulatory vocal dose measures for teachers during both on- and off-work periods, comparing their occupational voice use to that in other studies that have reported percent phonation ranging from 17% to 30%. Method: Participants included 26 full-time, female school teachers between 23 and 55 years of age across multiple grades and subjects, including individuals with and without a voice disorder. Ambulatory voice data were collected from weeklong voice monitoring that recorded phonatory activity through anterior neck-surface vibration. Three vocal dose measures (time, cycle, and distance doses) were computed for each participant for three time periods: on-work weekdays, off-work weekdays, and off-work weekend days. Results: The teachers' average percent phonation was 16.2% on on-work weekdays, 8.4% on off-work weekdays, and 8.0% on off-work weekend days. No statistically significant differences for vocal dose measures were found between off-work weekdays and weekend days. Overall, all vocal dose measures were approximately 2 times higher during work relative to off-work time periods. Conclusions: This study provides values for vocal dose measures for school teachers using ambulatory voice-monitoring technology. The vocal demands of this particular teacher sample and voice activity detection algorithm are potential factors contributing to percent phonation values on the lower end of the range reported in the literature. Future work is needed to continue to understand occupational voice use and its associated risks related to voice health, with the ultimate goal of preventing and managing voice disorders in individuals engaged in high-risk occupations.
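
The abstract does not state how the dose measures were computed. The sketch below assumes the commonly used definitions, in which time dose is the accumulated phonation time and cycle dose is the accumulated number of vocal fold oscillation cycles. It also assumes a frame-based fundamental-frequency track in which unvoiced frames are marked 0. The function name `vocal_doses` and the example values are hypothetical. Distance dose is omitted because it additionally needs an estimate of vibration amplitude.

```python
def vocal_doses(f0_frames, frame_s):
    """Return (time dose in s, cycle dose in cycles, percent phonation).

    f0_frames: fundamental frequency per frame in Hz; 0 marks an unvoiced frame.
    frame_s:   frame duration in seconds.
    """
    voiced = [f for f in f0_frames if f > 0]
    time_dose_s = len(voiced) * frame_s              # total voiced time
    cycle_dose = sum(f * frame_s for f in voiced)    # cycles = f0 * duration
    percent_phonation = 100.0 * len(voiced) / len(f0_frames)
    return time_dose_s, cycle_dose, percent_phonation

# Hypothetical 0.5 s excerpt with 50 ms frames.
track = [0, 200, 200, 0, 0, 100, 0, 0, 0, 0]
time_dose, cycle_dose, pct = vocal_doses(track, frame_s=0.05)
print(time_dose, cycle_dose, pct)
```

Percent phonation computed this way depends directly on how the voice activity detector labels frames as voiced, a factor the abstract itself names as a likely contributor to its comparatively low values.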

5.
J Acoust Soc Am ; 152(1): 580, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35931551

ABSTRACT

Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing works on connected speech utilize methods originally designed for analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along horizontal and vertical axes corresponding to the temporal and spectral dynamics of speech were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As the first step of demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. Data obtained revealed that the proposed method has (at minimum) significant discriminatory power over the existing alternative approaches.


Subject(s)
Speech Perception, Speech, Acoustics, Speech Acoustics, Speech Production Measurement/methods
6.
J Voice ; 2022 Jan 31.
Article in English | MEDLINE | ID: mdl-35115223

ABSTRACT

OBJECTIVE: Evidence-based practice and precision medicine can significantly benefit from the ability to perform calibrated spatial measurements (eg, mm) from endoscopic images. However, calibrated measurements are not readily available from laryngeal images. Laser-projection endoscopes can provide the required information for performing calibrated spatial measurements, but their applications require a process known as calibration. During calibration, a set of benchtop recordings are used to determine the effect of confounding factors of spatial measurements, and also to learn their proper compensation strategies. Calibration benchtop recordings are acquired from flat surfaces and at a perpendicular imaging angle, which is significantly different from in-vivo situations, where a three-dimensional (3D) surface gets recorded at a semi-unknown imaging angle. The aim of this study was to quantify changes in calibrated vertical and horizontal measurement accuracies as we move from the controlled condition of calibration to more realistic and uncontrolled settings. METHOD: A flat surface was positioned in front of a calibrated laser-projection transnasal fiberoptic endoscope at different working distances and imaging angles. Calibrated vertical and horizontal measurement errors were computed from each condition. Multiple linear regression analyses were used to quantify the dependence of vertical and horizontal measurement errors on the imaging angle and working distance. Next, a 3D-printed surface was positioned in front of the laser-projection endoscope at different working distances. Calibrated vertical and horizontal measurement errors were computed from each condition and then compared to measurement errors from a flat surface positioned at comparable working distances. RESULTS: The outcome of analyses supported a significant effect of imaging angle on calibrated vertical measurement accuracy, while no significant effect of imaging angle on calibrated horizontal measurement accuracy was established. Additionally, the result of multiple linear regression analyses showed that the coefficient of imaging angle was two times larger than that of the working distance, which further highlights the significant effect of imaging angle on vertical measurement accuracy. Comparing the magnitude of calibrated vertical and horizontal measurement errors between the 3D surface and a flat surface suggested a significant effect of surface topology on calibrated measurement accuracies. CONCLUSIONS: The mean percent magnitudes of the vertical and horizontal measurement errors from the 3D surface were around 6% and 11%, respectively, at most working distances, which are acceptable for many applications. However, the significant effect of imaging angle and surface topology on measurement errors highlights the need for further research on these confounding factors. It also suggests that significant improvements in measurement accuracies may be achieved if these factors are properly accounted for during the calibration process. Finally, this study highlights the need for the evaluation of laser-projection endoscopes in uncontrolled and more realistic settings. Specifically, evaluations of laser-projection endoscopes in very controlled settings could significantly overestimate their accuracies and hence would not represent their actual performance during in-vivo data acquisitions.

7.
J Voice ; 2022 Jan 02.
Article in English | MEDLINE | ID: mdl-34986994

ABSTRACT

OBJECTIVE: Calibrated horizontal-plane measurements from laryngeal images could contribute significantly to refining evidence-based practice and developing patient-specific models and precision-medicine approaches. Laser-projection endoscopes can address the need for direct calibrated measures; however, these systems are not widely available. This study presents the framework for an alternative indirect horizontal-plane calibration approach. METHOD: A spatial attribute of a common object, a distinct characteristic that is maintained across images, may be used as a scale for the normalization of other spatial measurements. The outcome of this indirect approach could be used for absolute measurements (eg, in units of mm) or relative measurements (eg, percent change), depending on the information that is available from the common attribute. The required conditions of a common attribute for achieving a valid calibration outcome were studied. Three conditions were derived: registration accuracy of the common attribute, size consistency of the common attribute, and similarity in the vertical distance between the region of interest (ROI) (eg, vocal fold) and the common attribute. Any common attribute satisfying these three conditions was called proper and would result in a valid indirect calibration outcome. Three tests were presented for evaluating the properness of a common attribute. The first, a data-driven statistical method, evaluated the registration accuracy of a common attribute. The second test used variation in calibrated lengths of a common attribute under different phonatory configurations for evaluating the size consistency condition. Finally, the effect of differences between vertical distances of the ROI and the common attribute was mathematically tested and quantified. The application of the proposed framework for indirect calibration was demonstrated using a preexisting dataset with a vocal fold as the ROI and four different common attributes (vocal fold length, vocal fold width, blood vessel on the vocal fold, and blood vessel on nearby tissue). RESULTS: The proposed registration-accuracy test was able to detect and eliminate instances of common attributes with low accuracies. The analysis suggested that among the four studied common attributes, the vocal fold length had the highest (ie, best) registration accuracy; however, the vocal fold length exhibited the lowest (ie, worst) size consistency. The analysis also suggested that, among the studied attributes, the vocal fold width offered the best trade-off among the three conditions and, hence, was a proper common attribute for calibrating spatial aspects of the vocal folds (length, displacement of edges, velocity, etc). CONCLUSION: Indirect calibration is a feasible alternative for calibration of laryngeal endoscopic images, provided that a proper common attribute is selected. Future work is needed to systematically evaluate the effects of various phonatory conditions on the characteristics of common attributes.
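
The normalization idea behind the indirect calibration described above can be illustrated with a few lines of arithmetic. This is a minimal sketch that simply assumes the three conditions named in the abstract hold. The function names and pixel values are hypothetical, and none of the study's three properness tests are implemented here.

```python
def relative_size(roi_px, attribute_px):
    """ROI length expressed in units of the common attribute's length."""
    return roi_px / attribute_px

def absolute_size_mm(roi_px, attribute_px, attribute_mm):
    """ROI length in mm, usable only when the attribute's true size is known."""
    return roi_px / attribute_px * attribute_mm

def percent_change(before, after):
    """Relative change between two normalized measurements."""
    return 100.0 * (after - before) / before

# Two hypothetical images taken at different, unknown working distances.
# Pixel lengths differ between images, but the attribute rescales with the ROI.
size_a = relative_size(roi_px=120, attribute_px=60)   # 2.0 attribute units
size_b = relative_size(roi_px=90, attribute_px=40)    # 2.25 attribute units
print(percent_change(size_a, size_b))                 # 12.5 (% change)
```

Comparing the raw pixel lengths (120 px versus 90 px) would suggest the ROI shrank, whereas the normalized values indicate it grew relative to the common attribute. This is the kind of relative measurement (percent change) the abstract says is possible without knowing the attribute's size in mm.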

8.
J Voice ; 36(6): 755-769, 2022 Nov.
Article in English | MEDLINE | ID: mdl-32958427

ABSTRACT

Laryngeal images obtained via high-speed videoendoscopy are an invaluable source of information for the advancement of voice science because they can capture the true cycle-to-cycle vibratory characteristics of the vocal folds in addition to the transient behaviors of the phonatory mechanism, such as onset, offset, and breaks. This information is obtained through relating the spatial and temporal features from acquired images using objective measurements or subjective assessments. While these images are calibrated temporally, a great challenge is the lack of spatial calibration. Recently, a laser-projection system allowing for spatial calibration was developed. However, various sources of optical distortion cause the images to deviate from reality. The main purpose of this study was to evaluate the effect of the fiberoptic flexible endoscope distortions on the calibration of images acquired by the laser-projection system. Specifically, it is shown that two sources of nonlinear distortion could cause captured images to deviate from reality. The first distortion stems from the wide-angle lens used in flexible endoscopes. It is shown that endoscopic images have a significantly higher spatial resolution in the center of the field of view than in its periphery. The difference between the two could lead to as high as 26.4% error in calibrated horizontal measurements. The second distortion stems from variation in the imaging angle. It is shown that the disparity between spatial resolution in the center and periphery of endoscopic images increases as the imaging angle deviates from the perpendicular position. Furthermore, it is shown that when the imaging angle varies, the symmetry of the distortion is also affected significantly. The combined distortions could lead to calibrated horizontal measurement errors as high as 65.7%. The implications of the findings on objective measurements and subjective visual assessments are discussed. These findings can contribute to the refinement of the methods for clinical assessment of voice disorders. Considering that the studied phenomena are due to optical principles, the findings of this study, especially those related to the effects of the imaging angle, can provide further insights regarding other endoscopic instruments (eg, distal-chip and rigid endoscopes) and procedures (eg, gastroendoscopy and colonoscopy).


Subject(s)
Endoscopes, Larynx, Humans, Vocal Cords/diagnostic imaging, Phonation, Endoscopy/methods
9.
Appl Sci (Basel) ; 11(2), 2021 Jan 02.
Article in English | MEDLINE | ID: mdl-33628469

ABSTRACT

OBJECTIVE: Calibrated horizontal measurements (e.g., mm) from endoscopic procedures could be utilized for advancement of evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant non-linear distortion that would further complicate calibrated measurements. This study used a recently developed in-vivo laser-projection fiberoptic laryngoscope and proposes a method for calibrated spatial measurements. METHOD: A set of circular grids were recorded at multiple working distances. A statistical model was trained that would map from pixel length of the object, the working distance, and the spatial location of the target object into its mm length. RESULT: A detailed analysis of the performance of the proposed method is presented. The analyses have shown that the accuracy of the proposed method does not depend on the working distance and length of the target object. The estimated average magnitude of error was 0.27 mm, which is three times lower than the existing alternative. CONCLUSION: The presented method can achieve sub-millimeter accuracy in horizontal measurement. SIGNIFICANCE: Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.

10.
J Voice ; 35(1): 122-128, 2021 Jan.
Article in English | MEDLINE | ID: mdl-31383516

ABSTRACT

The design specifications and experimental characteristics of a newly developed laser-projection transnasal flexible endoscope coupled with a high-speed videoendoscopy system are provided. The hardware and software design of the proposed system benefits from the combination of structured green light projection and laser triangulation techniques, which provide the capability of calibrated absolute measurements of the laryngeal structures along the horizontal and vertical planes during phonation. Visual inspection of in vivo acquired images demonstrated sharp contrast between laser points and background, confirming successful design of the system. Objective analyses were carried out for assessing the irradiance of the system and the penetration of the green laser light into the red and blue channels in the recorded images. The analysis showed that the system has an irradiance of 372 W/m² at a working distance of 20 mm, which is well within the safety limits, indicating minimal risk of usage of the device on human subjects. Additionally, the color penetration analysis showed that, with a probability of 90%, the ratio of contamination of the red channel from the green laser light is less than 0.002. This indicates minimal effect of the laser projection on the measurements performed on the red data channel, making the system applicable for calibrated 3D spatial-temporal segmentation and data-driven subject-specific modeling, which is important for further advancing voice science and clinical voice assessment.


Subject(s)
Larynx, Vocal Cords, Humans, Laryngoscopy, Larynx/diagnostic imaging, Lasers, Phonation, Vibration, Video Recording, Vocal Cords/diagnostic imaging
11.
J Voice ; 34(6): 847-861, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31151853

ABSTRACT

The ability to provide absolute calibrated measurement of the laryngeal structures during phonation is of paramount importance to voice science and clinical practice. Calibrated three-dimensional measurement could provide essential information for modeling purposes, for studying the developmental aspects of vocal fold vibration, for refining functional voice assessment and treatment outcomes evaluation, and for more accurate staging and grading of laryngeal disease. Recently, a laser-calibrated transnasal fiberoptic endoscope compatible with high-speed videoendoscopy (HSV) and capable of providing three-dimensional measurements was developed. The optical principle employed is to project a grid of 7 × 7 green laser points across the field of view (FOV) at an angle relative to the imaging axis, such that (after calibration) the position of each laser point within the FOV encodes the vertical distance from the tip of the endoscope to the laryngeal tissues. The purpose of this study was to develop a precise method for vertical calibration of the endoscope. Investigating the position of the laser points showed that, besides the vertical distance, they also depend on the parameters of the lens coupler, including the FOV position within the image frame and the rotation angle of the endoscope. The presented automatic calibration method was developed to compensate for the effect of these parameters. Statistical image processing and pattern recognition were used to detect the FOV, the center of FOV, and the fiducial marker. This step normalizes the HSV frames to a standard coordinate system and removes the dependence of the laser-point positions on the parameters of the lens coupler. Then, using a statistical learning technique, a calibration protocol was developed to model the trajectories of all laser points as the working distance was varied. Finally, a set of experiments was conducted to measure the accuracy and reliability of every step of the procedure. The system was able to measure absolute vertical distance with mean percent error in the range of 1.7% to 4.7%, depending on the working distance.


Subject(s)
Phonation, Vocal Cords, Calibration, Humans, Lasers, Reproducibility of Results, Vocal Cords/diagnostic imaging