Search | VHL Regional Portal

1.

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition.

Ghorbani, Shahram; Hansen, John H L.

J Acoust Soc Am ; 155(6): 3848-3860, 2024 Jun 01.

Article in English | MEDLINE | ID: mdl-38884524

ABSTRACT

Accurately classifying accents and assessing accentedness in non-native speakers are challenging tasks, due primarily to the complexity and diversity of accent and dialect variations. In this study, embeddings from advanced pretrained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment. Findings demonstrate that employing pretrained LID and SID models effectively encodes accent/dialect information in speech. Furthermore, the LID and SID encoded accent information complement an end-to-end (E2E) accent identification (AID) model trained from scratch. By incorporating all three embeddings, the proposed multi-embedding AID system achieves superior accuracy in AID. Next, leveraging automatic speech recognition (ASR) and AID models is investigated to explore accentedness estimation. The ASR model is an E2E connectionist temporal classification model trained exclusively with American English (en-US) utterances. The ASR error rate and en-US output of the AID model are leveraged as objective accentedness scores. Evaluation results demonstrate a strong correlation between scores estimated by the two models. Additionally, a robust correlation between objective accentedness scores and subjective scores based on human perception is demonstrated, providing evidence for the reliability and validity of using AID-based and ASR-based systems for accentedness assessment in non-native speech. Such advanced systems would benefit accent assessment in language learning as well as speech and speaker assessment for intelligibility, quality, and speaker diarization and speech recognition advancements.

Subjects

Speech Perception, Speech Recognition Software, Humans, Speech Perception/physiology, Speech Acoustics, Phonetics, Language, Speech Production Measurement/methods, Female, Male
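
The abstract above uses the ASR error rate as an objective accentedness score. As a rough illustration only, the sketch below computes a word error rate from a reference transcript and an ASR hypothesis. It assumes whitespace tokenization and equal edit costs, and the function name `word_error_rate` is hypothetical; the abstract does not specify which error rate or scoring tool the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this reading, a higher error rate from an en-US-only recognizer would be interpreted as stronger perceived accent; that mapping is the study's premise, not something this sketch validates.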

2.

Child-adult speech diarization in naturalistic conditions of preschool classrooms using room-independent ResNet model and automatic speech recognition-based re-segmentation.

Kothalkar, Prasanna V; Hansen, John H L; Irvin, Dwight; Buzhardt, Jay.

J Acoust Soc Am ; 155(2): 1198-1215, 2024 02 01.

Article in English | MEDLINE | ID: mdl-38341746

ABSTRACT

Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic vs controlled lab recordings to measure both quality and quantity of such interactions. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to the diversity of acoustic events/conditions in such daylong audio streams, automated speaker diarization technology would need to be advanced to address this challenging domain for segmenting audio as well as information extraction. This study investigates alternate deep learning-based lightweight, knowledge-distilled, diarization solutions for segmenting classroom interactions of 3- to 5-year-old children with teachers. In this context, the focus is on speech-type diarization, which classifies speech segments as being either from adults or children partitioned across multiple classrooms. Our lightest CNN model achieves a best F1-score of ~76.0% on data from two classrooms, based on dev and test sets of each classroom. It is utilized with automatic speech recognition-based re-segmentation modules to perform child-adult diarization. Additionally, F1-scores are obtained for individual segments with corresponding speaker tags (e.g., adult vs child), which provide knowledge for educators on child engagement through naturalistic communications. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both security and privacy of all children and adults. The resulting child communication metrics have been used for broad-based feedback for teachers with the help of visualizations.

Subjects

Speech Perception, Speech, Adult, Humans, Preschool Child, Communication, Language, Language Development
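
The abstract above reports F1-scores for child vs adult segment labels. A minimal sketch of how such a score can be computed from per-segment tags follows. The label strings and the function name `f1_score` are assumptions for illustration, and the abstract does not say whether the authors scored per segment, per frame, or averaged across classes.

```python
def f1_score(true_labels, predicted_labels, positive="child"):
    """F1 for one class: harmonic mean of precision and recall."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Calling it once with `positive="child"` and once with `positive="adult"` gives the two class-wise scores, which can then be averaged if a single figure is wanted.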

3.

Bilateral Cochlear Implant Processing of Coding Strategies With CCi-MOBILE, an Open-Source Research Platform.

Ghosh, Ria; Hansen, John H L.

IEEE/ACM Trans Audio Speech Lang Process ; 31: 1839-1850, 2023.

Article in English | MEDLINE | ID: mdl-38046574

ABSTRACT

While speech understanding for cochlear implant (CI) users in quiet is relatively effective, listeners experience difficulty in identification of speaker and sound location. To support better residual hearing abilities and speech intelligibility, bilateral and bimodal forms of assisted hearing are becoming popular among CI users. Effective bilateral processing calls for testing precise algorithm synchronization and fitting between both left and right ear channels in order to capture interaural time and level difference cues (ITDs and ILDs). This work demonstrates bilateral implant algorithm processing using a custom-made CI research platform - CCi-MOBILE, which is capable of capturing precise source localization information and supports researchers in testing bilateral CI processing in real-time naturalistic environments. Simulation-based, objective, and subjective testing has been performed to validate the accuracy of the platform. The subjective test results produced an RMS error of ±8.66° for source localization, which is comparable to the performance of commercial CI processors.
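
The localization result above is reported as an RMS error in degrees. The sketch below shows one plain way to compute such a figure from target and response angles. It assumes angles in degrees on a frontal arc with no wrap-around handling, and the function name is hypothetical; the abstract does not describe the authors' exact scoring procedure.

```python
import math


def rms_error_degrees(target_angles, response_angles):
    """Root-mean-square difference between target and response angles."""
    if len(target_angles) != len(response_angles):
        raise ValueError("angle lists must have the same length")
    if not target_angles:
        raise ValueError("at least one trial is required")
    squared = [(r - t) ** 2 for t, r in zip(target_angles, response_angles)]
    return math.sqrt(sum(squared) / len(squared))
```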

4.

The effects of estimation accuracy, estimation approach, and number of selected channels using formant-priority channel selection for an "n-of-m" sound processing strategy for cochlear implants.

Saba, Juliana N; Ali, Hussnain; Hansen, John H L.

J Acoust Soc Am ; 153(5): 3100, 2023 05 01.

Article in English | MEDLINE | ID: mdl-37227411

ABSTRACT

Previously, selection of l channels was prioritized according to formant frequency locations in an l-of-n-of-m-based signal processing strategy to provide important voicing information independent of listening environments for cochlear implant (CI) users. In this study, ideal, or ground truth, formants were incorporated into the selection stage to determine the effect of accuracy on (1) subjective speech intelligibility, (2) objective channel selection patterns, and (3) objective stimulation patterns (current). An average +11% improvement (p < 0.05) was observed across six CI users in quiet, but not for noise or reverberation conditions. Analogous increases in channel selection and current for the upper range of F1 and a decrease across mid-frequencies with higher corresponding current, were both observed at the expense of noise-dominant channels. Objective channel selection patterns were analyzed a second time to determine the effects of estimation approach and number of selected channels (n). A significant effect of estimation approach was only observed in the noise and reverberation condition with minor differences in channel selection and significantly decreased stimulated current. Results suggest that estimation method, accuracy, and number of channels in the proposed strategy using ideal formants may improve intelligibility when corresponding stimulated current of formant channels are not masked by noise-dominant channels.

Subjects

Cochlear Implantation, Cochlear Implants, Speech Perception, Sound, Noise
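
The abstract above describes prioritizing channels at formant locations inside an "n-of-m" strategy. As a hedged sketch of that idea, the code below reserves slots for channels flagged as containing formants and fills the remaining slots with the highest-envelope channels, which is the usual maxima-selection rule for n-of-m. The function name, the input layout, and the tie-breaking are assumptions for illustration; this is not the authors' implementation.

```python
def select_channels(envelopes, formant_channels, n):
    """Pick n of m channels, taking formant channels before envelope maxima.

    envelopes: per-channel envelope values for one analysis frame (length m).
    formant_channels: indices of channels containing estimated formants.
    n: total number of channels to stimulate in this frame.
    """
    m = len(envelopes)
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= number of channels")
    selected = []
    # Step 1: formant channels get priority, up to n of them.
    for channel in formant_channels:
        if 0 <= channel < m and channel not in selected and len(selected) < n:
            selected.append(channel)
    # Step 2: fill any remaining slots with the largest envelopes.
    remaining = sorted(
        (ch for ch in range(m) if ch not in selected),
        key=lambda ch: envelopes[ch],
        reverse=True,
    )
    selected.extend(remaining[: n - len(selected)])
    return sorted(selected)
```

With an empty `formant_channels` list the function reduces to plain maxima selection, which makes it easy to compare the two rules on the same frame.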

5.

Familiar and unfamiliar speaker recognition assessment and system emulation for cochlear implant users.

Mamun, Nursadul; Ghosh, Ria; Hansen, John H L.

J Acoust Soc Am ; 153(2): 1293, 2023 02.

Article in English | MEDLINE | ID: mdl-36859118

ABSTRACT

In the area of speech processing, human speaker identification under naturalistic environments is a challenging task, especially for hearing-impaired individuals with cochlear implants (CIs) or hearing aids (HAs). Motivated by the fact that electrodograms reflect direct CI stimulation of input audio, this study proposes a speaker identification (ID) investigation using two-dimensional electrodograms constructed from the responses of a CI auditory system to emulate CI speaker ID capabilities. Features are extracted from electrodograms through an identity vector (i-vector) framework to train and generate identity models for each speaker using a Gaussian mixture model-universal background model followed by probabilistic linear discriminant analysis. To validate the proposed system, perceptual speaker ID for 20 normal hearing (NH) and seven CI listeners was evaluated with a total of 41 different speakers and compared with the scores from the proposed system. A one-way analysis of variance showed that the proposed system can reliably predict the speaker ID capability of CI (F[1,10] = 0.18, p = 0.68) and NH (F[1,20] = 0, p = 0.98) listeners in naturalistic environments. The impact of speaker familiarity is also addressed, and the results show a reduced performance for speaker recognition by CI subjects using their CI processor, highlighting limitations of current speech processing strategies used in CIs/HAs.

Subjects

Cochlear Implantation, Cochlear Implants, Hearing Aids, Persons with Hearing Impairments, Humans, Discriminant Analysis

6.

A convolutional neural network-based framework for analysis and assessment of non-linguistic sound classification and enhancement for normal hearing and cochlear implant listeners.

Shekar, Ram C M C; Hansen, John H L.

J Acoust Soc Am ; 152(5): 2720, 2022 11.

Article in English | MEDLINE | ID: mdl-36456299

ABSTRACT

Naturalistic sounds encode salient acoustic content that provides situational context or subject/system properties essential for acoustic awareness, autonomy, safety, and improved quality of life for individuals with sensorineural hearing loss. Cochlear implants (CIs) are an assistive hearing device that restores auditory function in hearing impaired individuals. Most CI research advancements have focused on improving speech recognition in noisy, reverberant, or time-varying diverse environments. Relatively few studies have explored non-linguistic sound (NLS) perception among CIs, and those that have carried out such studies generally reported poor perception, suggesting a clear deficit in current CI sound processing systems. In this study, a convolutional neural network (CNN)-based NLS classification model is used as a framework to compare unprocessed and CI-simulated NLS classification and evaluate NLS perception targeted algorithms among CI listeners. Additionally, a NLS enhancement algorithm that focuses on improving identifiability and perception among CI listeners is proposed. The proposed NLS enhancement algorithm is evaluated based on identifiability performance using the CI-simulated NLS classification model. The proposed NLS classification framework was able to achieve near human-level performance with no significant effect of classification modality (model vs human subject) and achieved mean classification scores of 85.86% for NH (p = 0.3758) and 65.25% for CI (p = 0.1725). Among the four different feature-based methods of the proposed NLS enhancement algorithm, the "harmonicity"-based one achieved highest mean classification accuracy of 63.75%, when compared to baseline, and demonstrated significant improvement in performance (p = 0.0403). 
The resulting proposed comparative NLS classification framework contributes toward (i) advancement of NLS recognition studies, (ii) mitigation of CI user recruitment constraints and listener evaluation with NH listeners, (iii) development of a community shared testbed for comparative NLS studies, and (iv) advancement of NLS enhancement studies (identifiability and perceptual factors) among CI listeners.

Subjects

Cochlear Implantation, Cochlear Implants, Humans, Quality of Life, Computer Neural Networks, Hearing

7.

Bimodal Cochlear Implant Processing based on Assisted Hearing algorithms with CCi-MOBILE: an open-source research platform.

Ghosh, Ria; Hansen, John H L.

Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 4265-4268, 2022 07.

Article in English | MEDLINE | ID: mdl-36086056

ABSTRACT

While speech understanding for cochlear implant (CI) and Hearing aid (HA) users in quiet is relatively effective, listeners experience difficulty in identification of speakers and sound location. Previous studies have reported improved localization and better speech perception when the unilateral CI is coupled with a HA in the contralateral ear. This is referred to as bimodal presentation of speech or electric and acoustic stimulation (EAS). Various CI research interfaces developed by either academic or industry sponsored research teams support proposed signal processing and psychoacoustic investigations but have limited ability to efficiently validate bimodal algorithms. Platforms that support bimodal testing (CI and HA) are either not portable or only provide limited features due to proprietary parameters/routines. CCi-MOBILE, an open-source, portable signal processing research device, developed by UT-Dallas, enables electric and acoustic stimulations simultaneously, thus providing researchers the opportunity to explore new technology and scientific paradigms for the hearing impaired. In the present work, we provide verification and implementation of synchronized bimodal (electric-acoustic) output in an authenticated and efficient manner to support sound and acoustic localization algorithms for experimental investigations.

Subjects

Cochlear Implantation, Cochlear Implants, Hearing Aids, Algorithms, Hearing

8.

The effects of Lombard perturbation on speech intelligibility in noise for normal hearing and cochlear implant listeners.

Saba, Juliana N; Hansen, John H L.

J Acoust Soc Am ; 151(2): 1007, 2022 02.

Article in English | MEDLINE | ID: mdl-35232065

ABSTRACT

Natural compensation of speech production in challenging listening environments is referred to as the Lombard effect (LE). The resulting acoustic differences between neutral and Lombard speech have been shown to provide intelligibility benefits for normal hearing (NH) and cochlear implant (CI) listeners alike. Motivated by this outcome, three LE perturbation approaches consisting of pitch, duration, formant, intensity, and spectral contour modifications were designed specifically for CI listeners to combat speech-in-noise performance deficits. Experiment 1 analyzed the effects of loudness, quality, and distortion of approaches on speech intelligibility with and without formant-shifting. Significant improvements of +9.4% were observed in CI listeners without the formant-shifting approach at +5 dB signal-to-noise ratio (SNR) large-crowd-noise (LCN) when loudness was controlled, however, performance was found to be significantly lower for NH listeners. Experiment 2 evaluated the non-formant-shifting approach with additional spectral contour and high pass filtering to reduce spectral smearing and decrease distortion observed in Experiment 1. This resulted in significant intelligibility benefits of +30.2% for NH and +21.2% for CI listeners at 0 and +5 dB SNR LCN, respectively. These results suggest that LE perturbation may be useful as front-end speech modification approaches to improve intelligibility for CI users in noise.

Subjects

Cochlear Implants, Speech Perception, Acoustic Stimulation/methods, Hearing, Speech Intelligibility

9.

CCi-MOBILE: A Portable Real Time Speech Processing Platform for Cochlear Implant and Hearing Research.

Ghosh, Ria; Ali, Hussnain; Hansen, John H L.

IEEE Trans Biomed Eng ; 69(3): 1251-1263, 2022 03.

Article in English | MEDLINE | ID: mdl-34705633

ABSTRACT

Experimental hardware-research interfaces form a crucial role during developmental stages of any medical, signal-monitoring system as it allows researchers to test and optimize output results before perfecting the design for the actual FDA approved medical device and large-scale production. These testing platforms, intake raw signals through which performance of novel algorithms can be analyzed and modified to generate the desired data points for an optimized output, allowing the advancement of the medical device. With cochlear implants (CIs) and hearing aids (HAs) becoming a more common solution for varying degrees of hearing impairment, having modern signal processing strategies tested for such speech sensitive systems is a necessity. But the rigid design requirements of commercial CI and HA processors make it difficult to explore novel algorithms for research investigations and conducting longitudinal studies. This study presents the design, development, clinical evaluation, and applications of CCi-MOBILE, a computationally powerful signal processing testing platform built for researchers in the hearing-impaired field. The custom-made, portable research platform allows researchers to design and perform complex speech processing algorithm assessment offline and in real-time. It can be operated through user-friendly, open-source software and is compatible with implants manufactured by Cochlear Corporation. The FPGA design and hardware processing pipeline for CI stimulation is discussed followed by results from an acute study with implant users' speech intelligibility in quiet and noisy conditions. The results show a consistent level of performance compared with CI users' clinical processor, thus confirming the viability of the platform in chronic CI based studies.

Subjects

Cochlear Implantation, Cochlear Implants, Speech Perception, Hearing, Speech Intelligibility

10.

An intrusive method for estimating speech intelligibility from noisy and distorted signals.

Mamun, Nursadul; Zilany, Muhammad S A; Hansen, John H L; Davies-Venn, Evelyn E.

J Acoust Soc Am ; 150(3): 1762, 2021 09.

Article in English | MEDLINE | ID: mdl-34598625

ABSTRACT

An objective metric that predicts speech intelligibility under different types of noise and distortion would be desirable in voice communication. To date, the majority of studies concerning speech intelligibility metrics have focused on predicting the effects of individual noise or distortion mechanisms. This study proposes an objective metric, the spectrogram orthogonal polynomial measure (SOPM), that attempts to predict speech intelligibility for people with normal hearing under adverse conditions. The SOPM metric is developed by extracting features from the spectrogram using Krawtchouk moments. The metric's performance is evaluated for several types of noise (steady-state and fluctuating noise), distortions (peak clipping, center clipping, and phase jitters), ideal time-frequency segregation, and reverberation conditions both in quiet and noisy environments. High correlation (0.97-0.996) is achieved with the proposed metric when evaluated with subjective scores by normal-hearing subjects under various conditions.

Subjects

Speech Intelligibility, Speech Perception, Hearing Tests, Humans, Noise/adverse effects
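
The abstract above evaluates the proposed metric by its correlation with subjective scores. The sketch below computes a Pearson correlation coefficient between a list of objective metric values and a list of listener scores. It assumes Pearson correlation is the intended measure, which the abstract does not state, and the function name is hypothetical.

```python
import math


def pearson_correlation(objective_scores, subjective_scores):
    """Pearson correlation between two equal-length score lists."""
    if len(objective_scores) != len(subjective_scores):
        raise ValueError("score lists must have the same length")
    count = len(objective_scores)
    if count < 2:
        raise ValueError("at least two conditions are required")
    mean_x = sum(objective_scores) / count
    mean_y = sum(subjective_scores) / count
    dev_x = [x - mean_x for x in objective_scores]
    dev_y = [y - mean_y for y in subjective_scores]
    covariance = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    spread_x = sum(dx * dx for dx in dev_x)
    spread_y = sum(dy * dy for dy in dev_y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / math.sqrt(spread_x * spread_y)
```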

11.

An evaluation framework for research platforms to advance cochlear implant/hearing aid technology: A case study with CCi-MOBILE.

Shekar, Ram C M C; Hansen, John H L.

J Acoust Soc Am ; 149(1): 229, 2021 01.

Article in English | MEDLINE | ID: mdl-33514127

ABSTRACT

Cochlear implants (CIs) and hearing aids (HAs) are advanced assistive hearing devices that perform sound processing to achieve acoustic to acoustic/electrical stimulation, thus enabling the prospects for hearing restoration and rehabilitation. Since commercial CIs/HAs are typically constrained by manufacturer design/production constraints, it is necessary for researchers to use research platforms (RPs) to advance algorithms and conduct investigational studies with CI/HA subjects. While previous CI/HA research platforms exist, no study has explored establishing a formal evaluation protocol for the operational safety and reliability of RPs. This study proposes a two-phase analysis and evaluation paradigm for RPs. In the acoustic phase 1 step, a signal processing acoustic space is explored in order to present a sampled set of audio input content to explore the safety of the resulting output electric/acoustic stimulation. In the parameter phase 2 step, the configurable space for realizable electrical stimulation pulses is determined, and overall stimulation reliability and safety are evaluated. The proposed protocol is applied and demonstrated using Costakis Cochlear Implant Mobile. Assessment protocol observations, results, and additional best practices for subsampling of the acoustic and parameter test spaces are discussed. The proposed analysis-evaluation protocol establishes a viable framework for assessing RP operational safety and reliability. Guidelines for adapting the proposed protocol to address variability in RP configuration due to experimental factors such as custom algorithms, stimulation techniques, and/or individualization are also considered.

Subjects

Cochlear Implantation, Cochlear Implants, Hearing Aids, Speech Perception, Acoustic Stimulation, Electric Stimulation, Humans, Reproducibility of Results

12.

Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.

Kelly, Finnian; Hansen, John H L.

IEEE/ACM Trans Audio Speech Lang Process ; 29: 927-942, 2021.

Article in English | MEDLINE | ID: mdl-35783572

ABSTRACT

Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly-occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance on global, per-corpus, and per-speaker levels is explored using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology towards detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.

13.

Estimating hearing aid fitting presets with machine learning-based clustering strategies.

Belitz, Chelzy; Ali, Hussnain; Hansen, John H L.

JASA Express Lett ; 1(11)2021 Nov.

Article in English | MEDLINE | ID: mdl-35784455

ABSTRACT

Although there exist nearly 35 × 10^6 hearing impaired people in the U.S., only an estimated 25% use hearing aids (HA), while others elect not to use prescribed HAs. Lack of HA acceptance can be attributed to several factors including (i) performance variability in diverse environments, (ii) time-to-convergence for best HA operating configuration, (iii) unrealistic expectations, and (iv) cost/insurance. This study examines a nationwide dataset of pure-tone audiograms and HA fitting configurations. An overview of data characteristics is presented, followed by use of machine learning clustering to suggest ways of obtaining effective starting configurations, thereby reducing time-to-convergence to improve HA retention.

14.

Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems.

Hansen, John H L; Stauffer, Allen; Xia, Wei.

Speech Commun ; 134: 20-31, 2021 Nov.

Article in English | MEDLINE | ID: mdl-35784517

ABSTRACT

Speech, speaker, and language systems have traditionally relied on carefully collected speech material for training acoustic models. There is an enormous amount of freely accessible audio content. A major challenge, however, is that such data is not professionally recorded, and therefore may contain a wide diversity of background noise, nonlinear distortions, or other unknown environmental or technology-based contamination or mismatch. There is a crucial need for automatic analysis to screen such unknown datasets before acoustic model development training, or to perform input audio purity screening prior to classification. In this study, we propose a waveform based clipping detection algorithm for naturalistic audio streams and examine the impact of clipping at different severities on speech quality measurements and automatic speaker recognition systems. We use the TIMIT and NIST SRE08 corpora as case studies. The results show, as expected, that clipping introduces a nonlinear distortion into clean speech data, which reduces speech quality and performance for speaker recognition. We also investigate what degree of clipping can be present to sustain effective speech system performance. The proposed detection system, which will be released, could contribute to massive new audio collections for speech and language technology development (e.g. Google Audioset (Gemmeke et al., 2017), CRSS-UTDallas Apollo Fearless-Steps (Yu et al., 2014) (19,000 h naturalistic audio from NASA Apollo missions)).
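
The abstract above proposes a waveform-based clipping detector but does not describe how it works. Purely as an illustration of the general idea, the sketch below uses a common heuristic: hard clipping piles many samples up at the peak amplitude, so the fraction of samples sitting at or near the peak is a simple indicator. The function name and the `tolerance` parameter are hypothetical, and this heuristic should not be read as the authors' algorithm.

```python
def clipped_fraction(samples, tolerance=1e-3):
    """Fraction of samples whose magnitude lies within tolerance of the peak.

    A clean waveform touches its peak only briefly, so this fraction stays
    small; a hard-clipped waveform holds the peak value for many samples.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 0.0  # silence cannot be clipped
    threshold = peak * (1.0 - tolerance)
    at_peak = sum(1 for s in samples if abs(s) >= threshold)
    return at_peak / len(samples)
```

A screening step could flag a recording when this fraction exceeds some chosen limit; any such limit would need tuning on real data.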

15.

Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.

Zhang, Chunlei; Hansen, John H L.

J Acoust Soc Am ; 148(5): 2912, 2020 11.

Article in English | MEDLINE | ID: mdl-33261416

ABSTRACT

Normalizing intrinsic variabilities (e.g., variability in speech production brought on by aging, physical or cognitive task stress, Lombard effect, etc.) in speech and speaker recognition models is essential for system robustness. This study focuses on analysis of speech under physical task stress with its application for speaker recognition and physical task stress detection. An innovative framework using deep neural networks (DNNs) for joint text-independent speaker recognition and physical task stress detection is proposed. Instead of processing consecutive feature frames like d-vectors (i.e., average of frame-level speaker specific features extracted with a speaker discriminative trained DNN), phonetic variability constrained feature vectors as inputs to train deep bottleneck neural networks is proposed. More specifically, a universal background model (UBM) with a small number of mixtures is employed to align the acoustic features. The innovative feature representation is then generated by selecting and concatenating frames according to the alignments from the UBM. The benefit of feature rearrangement is twofold. First, phonetic variability is largely constrained in the front-end feature vector. Second, by sampling a determined number of representations for each speaker/utterance, the issue of data imbalance and over-fitting is alleviated. Experiments for speaker recognition and physical stress detection are conducted on the UTScope-Physical Task Stress Corpus. Improved performance in terms of accuracy (for identification/detection task) and Equal Error Rate (for verification task) over a strong i-vector probabilistic linear discriminant analysis system confirms the effectiveness of this proposed method.

Subjects

Computer Neural Networks, Phonetics, Discriminant Analysis, Speech, Speech Recognition Software

16.

Portable Smart-Space Research Interface to Predetermine Environment Acoustics for Cochlear implant and Hearing aid users with CCi-MOBILE.

Ghosh, Ria; Chandra Shekar, Ram Charan; Hansen, John H L.

Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 4221-4224, 2020 07.

Article in English | MEDLINE | ID: mdl-33018928

ABSTRACT

Internet of things (IoT) in healthcare has efficiently accelerated medical monitoring and assessment through the real-time analysis of collected data. Hence, to support the hearing-impaired community with better calibrations to their clinical processors and hearing aids, a portable smart space interface - AURIS has been developed by the Cochlear Implant Processing Lab (CILab) at UT-Dallas. The proposed Auris interface periodically samples the acoustic space, and through a learn vs test phase, builds a Gaussian mixture model for each specific environmental location. An effective connection is established by the Auris interface with the CRSS CCi-Mobile research platform through an Android app to fine-tune the configuration settings for cochlear implant (CI) or hearing aid (HA) users entering the room/location. Baseline objective evaluations have been performed in diverse naturalistic locations using 12 hours of audio data. The performance metrics are determined by a verified wireless communication, along with estimated acoustic environment knowledge and room classification at greater than 90% accuracy.

Subjects

Cochlear Implants, Hearing Aids, Speech Perception, Acoustics, Space Research

17.

Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing.

Hansen, John H L; Bokshi, Marigona; Khorram, Soheil.

J Acoust Soc Am ; 148(2): 829, 2020 08.

Article in English | MEDLINE | ID: mdl-32873043

ABSTRACT

Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments to investigate both prosodic and spectral variations of untrained karaoke singers for three languages, American English, Hindi, and Farsi, are considered. A comprehensive comparison on common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels was performed. Collective changes in the corresponding overall acoustic spaces based on the Kullback-Leibler distance using Gaussian probability distribution models trained on spectral features were analyzed. Finally, these models were used in a Gaussian mixture model with universal background model SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).

Subjects

Singing, Speech, Acoustics, Humans, Language, Speech Acoustics
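
The abstract above compares speaking and singing through a Kullback-Leibler distance between Gaussian models. For orientation, the sketch below evaluates the standard closed form for two univariate Gaussians, plus a symmetrized sum. The study's models were trained on spectral features and are presumably multivariate, so this one-dimensional case is a simplification; the function names are hypothetical.

```python
import math


def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """KL(p || q) for univariate Gaussians p and q, in nats.

    Closed form: ln(std_q / std_p)
                 + (std_p^2 + (mean_p - mean_q)^2) / (2 * std_q^2) - 1/2
    """
    if std_p <= 0 or std_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(std_q / std_p)
        + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2 * std_q ** 2)
        - 0.5
    )


def symmetric_kl(mean_p, std_p, mean_q, std_q):
    """Sum of both KL directions, often used as a symmetric distance."""
    return kl_gaussian(mean_p, std_p, mean_q, std_q) + kl_gaussian(
        mean_q, std_q, mean_p, std_p
    )
```

Because plain KL divergence is not symmetric, a "distance" between a speaking model and a singing model is often taken as the symmetrized form; whether the authors did so is not stated in the abstract.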

18.

A speech perturbation strategy based on "Lombard effect" for enhanced intelligibility for cochlear implant listeners.

Hansen, John H L; Lee, Jaewook; Ali, Hussnain; Saba, Juliana N.

J Acoust Soc Am ; 147(3): 1418, 2020 03.

Article in English | MEDLINE | ID: mdl-32237802

ABSTRACT

The goal of this study is to determine potential intelligibility benefits from Lombard speech for cochlear implant (CI) listeners in speech-in-noise conditions. "Lombard effect" (LE) is the natural response of adjusting speech production via auditory feedback due to noise exposure within acoustic environments. To evaluate intelligibility performance of natural and artificially induced Lombard speech, a corpus was generated to create natural LE from large crowd noise (LCN) exposure at 70, 80, and 90 dB sound pressure level (SPL). Clean speech was mixed with LCN at 15 and 10 dB SNR and presented to five CI users. First, speech intelligibility was analyzed as a function of increasing LE and decreasing SNR. Results indicate significant improvements (p < 0.05) in Lombard speech intelligibility in noise conditions for 80 and 90 dB SPL. Next, an offline perturbation strategy was formulated to modify/perturb neutral speech so as to mimic LE through amplification of highly intelligible segments, uniform time stretching, and spectral mismatch filtering. This process effectively introduces aspects of LE into the neutral speech, with the hypothesis that this would benefit intelligibility for CI users. Significant (p < 0.01) intelligibility improvements of 13 and 16 percentage points were observed for the 15 and 10 dB SNR conditions, respectively, for CI users. The results indicate how LE and LE-inspired acoustic and frequency-based modifications can be leveraged within signal processing to improve intelligibility of speech for CI users.

Subjects

Cochlear Implantation , Cochlear Implants , Speech Perception , Acoustic Stimulation , Speech Intelligibility
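
Mixing clean speech with noise at a target SNR, as described in the abstract above, can be sketched as follows. This is an illustrative sketch, not the study's procedure: it assumes SNR is defined from the RMS levels of equal-length sample lists and that the noise is not silent. The function names and toy sample values are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise RMS ratio equals snr_db, then add."""
    # Gain that brings the noise RMS to speech RMS / 10^(SNR/20).
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain

# Toy signals (hypothetical values), mixed at 10 dB SNR.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed, gain = mix_at_snr(speech, noise, 10.0)

# Check the achieved SNR from the scaled noise.
achieved_snr_db = 20 * math.log10(rms(speech) / rms([gain * n for n in noise]))
```

Lowering `snr_db` from 15 to 10 raises the noise gain, which corresponds to the harder listening condition in the abstract.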

19.

CCi-MOBILE: Design and Evaluation of a Cochlear Implant and Hearing Aid Research Platform for Speech Scientists and Engineers.

Hansen, John H L; Ali, Hussnain; Saba, Juliana N; Ram, Charan M C; Mamun, Nursadul; Ghosh, Ria; Brueggeman, Avamarie.

IEEE EMBS Int Conf Biomed Health Inform ; 2019, 2019 May.

Article in English | MEDLINE | ID: mdl-31763625

ABSTRACT

Hearing loss is an increasingly prevalent condition resulting from damage to the inner ear which causes a reduction in speech intelligibility. The societal need for assistive hearing devices has increased exponentially over the past two decades; however, actual human performance with such devices has only seen modest gains relative to advancements in digital signal processing (DSP) technology. A major challenge with clinical hearing technologies is the limited ability to run complex signal processing algorithms requiring high computation power. The CCi-MOBILE platform, developed at UT-Dallas, provides the research community with an open-source, flexible, easy-to-use, software-mediated, powerful computing research interface to conduct a wide variety of listening experiments. The platform supports cochlear implants (CIs) and hearing aids (HAs) independently, as well as bimodal hearing (i.e., a CI in one ear and HA in the contralateral ear). The platform is ideally suited to address hearing research for: both quiet and naturalistic noisy conditions, sound localization, and lateralization. The platform uses commercially available smartphone/tablet devices as portable sound processors and can provide bilateral electric and acoustic stimulation. The hardware components, firmware, and software suite are presented to demonstrate safety to the speech scientist and CI/HA user, highlight user-specificity, and outline various applications of the platform for research.

20.

A Machine Learning Based Clustering Protocol for Determining Hearing Aid Initial Configurations from Pure-Tone Audiograms.

Belitz, Chelzy; Ali, Hussnain; Hansen, John H L.

Interspeech ; 2019: 2325-2329, 2019 Sep.

Article in English | MEDLINE | ID: mdl-34307641

ABSTRACT

Of the nearly 35 million people in the USA who are hearing impaired, only an estimated 25% use hearing aids (HAs). Many HAs are prescribed but not used, partly because of the time the audiologist and user need to converge on the best operating configuration. To improve HA retention, it is suggested that a machine learning (ML) protocol could be established which improves initial HA configurations given a user's pure-tone audiogram. This study examines an ML clustering method to predict the best initial HA fitting from a corpus of over 90,000 audiogram-fitting pairs collected from hearing centers throughout the USA. We first examine the final HA comfort targets to determine a limited number of preset configurations using several multi-dimensional clustering methods (Birch, Ward, and k-means). The goal is to reduce the number of adjustments between the centroid, selected as a fitting configuration to represent the cluster, and the final HA configurations. This may be used to reduce the adjustment cycles for HAs or as preset starting configurations for personal sound amplification products (PSAPs). Using various classification methods, audiograms are mapped to a limited number of potential preset configurations. Finally, the average adjustment between the preset fitting targets and the final fitting targets is examined.
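
The idea of deriving preset configurations from cluster centroids and then measuring the remaining adjustment can be illustrated with a small sketch. It is not the study's pipeline: it assumes plain k-means (Lloyd's algorithm) with a deterministic initialization, and the fitting vectors (gain in dB at three frequencies) are invented toy values. All function names are hypothetical.

```python
def mean_vec(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_vec(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def mean_abs_adjustment(points, centroids, labels):
    """Average absolute difference between each final fitting and its preset."""
    total = sum(abs(x - y)
                for p, lab in zip(points, labels)
                for x, y in zip(p, centroids[lab]))
    return total / (len(points) * len(points[0]))

# Toy final fitting targets (hypothetical gains in dB at three frequencies).
fittings = [[20, 30, 40], [60, 70, 80], [22, 32, 42], [58, 68, 78]]
presets, labels = kmeans(fittings, k=2)
adjustment = mean_abs_adjustment(fittings, presets, labels)
```

Here each centroid plays the role of a preset, and `adjustment` is the average change still needed to reach the final fitting. Mapping a new audiogram to a preset would need a separate classifier, which this sketch does not cover.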
