Results 1 - 20 of 55
1.
J Neurosci ; 42(3): 435-442, 2022 01 19.
Article in English | MEDLINE | ID: mdl-34815317

ABSTRACT

In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS.

SIGNIFICANCE STATEMENT: In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show that a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
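
As a point of reference for the connectivity analysis described above, a psychophysiological interaction (PPI) is usually estimated with a voxelwise regression in which the interaction between the seed region's time course and the task regressor is the effect of interest. The formulation below is a generic sketch, not necessarily the authors' exact design matrix:

y(t) = \beta_0 + \beta_1\, x_{\mathrm{seed}}(t) + \beta_2\, \psi(t) + \beta_3\, \big[ x_{\mathrm{seed}}(t) \cdot \psi(t) \big] + \varepsilon(t)

Here y(t) is the target time series, x_seed(t) the (deconvolved) seed time course from primary auditory or visual cortex, and \psi(t) the psychological regressor (e.g., audiovisual vs. unimodal conditions). A non-zero \beta_3 indicates that seed-target coupling changes with the task condition, which is the sense in which connectivity was "stronger in audiovisual conditions" above.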


Subjects
Auditory Cortex/physiology, Language, Lipreading, Nerve Net/physiology, Speech Perception/physiology, Visual Cortex/physiology, Visual Perception/physiology, Adult, Aged, Aged, 80 and over, Auditory Cortex/diagnostic imaging, Female, Humans, Magnetic Resonance Imaging, Male, Middle Aged, Nerve Net/diagnostic imaging, Visual Cortex/diagnostic imaging, Young Adult
2.
J Neurosci ; 42(31): 6108-6120, 2022 08 03.
Article in English | MEDLINE | ID: mdl-35760528

ABSTRACT

Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explored MEG phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) who reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information was present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception.

SIGNIFICANCE STATEMENT: Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.
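
For readers unfamiliar with the method, partial coherence quantifies frequency-domain coupling between two signals after removing the contribution of a third, correlated signal. A standard textbook formulation (not specific to this study's pipeline) is:

C_{xy}(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\, S_{yy}(f)}, \qquad S_{xy|z}(f) = S_{xy}(f) - \frac{S_{xz}(f)\, S_{zy}(f)}{S_{zz}(f)}, \qquad C_{xy|z}(f) = \frac{|S_{xy|z}(f)|^2}{S_{xx|z}(f)\, S_{yy|z}(f)}

with S denoting (cross-)spectral densities. Taking x as the MEG source signal, y as the auditory amplitude envelope, and z as the lip-aperture signal (or with the roles of y and z swapped) gives phase-locking to one speech signal after accounting for the other, which is how occipital responses to the auditory envelope during AV speech can be isolated from responses to the correlated lip movements.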


Subjects
Auditory Cortex, Speech Perception, Visual Cortex, Acoustic Stimulation, Auditory Cortex/physiology, Auditory Perception, Female, Humans, Lipreading, Male, Speech/physiology, Speech Perception/physiology, Visual Cortex/physiology, Visual Perception/physiology
3.
Neuroimage ; 282: 120391, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37757989

ABSTRACT

There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
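
The modelling approach described above belongs to the family of forward (encoding) models that regress EEG onto time-lagged stimulus features and ask how much predictive power is gained by adding viseme features on top of motion and lip-movement features. The numpy sketch below illustrates that logic only; the lag range, regularisation, and feature definitions are assumptions, not the authors' pipeline.

import numpy as np

def lagged_design(features, max_lag):
    """Stack time-lagged copies of stimulus features (n_times, n_feat)."""
    cols = [np.roll(features, lag, axis=0) for lag in range(max_lag)]
    X = np.concatenate(cols, axis=1)
    X[:max_lag] = 0.0  # discard wrap-around rows introduced by np.roll
    return X

def ridge_predict(X_train, y_train, X_test, lam=1e3):
    """Closed-form ridge regression; returns predicted EEG for the test set."""
    XtX = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    w = np.linalg.solve(XtX, X_train.T @ y_train)
    return X_test @ w

def viseme_gain(motion, visemes, eeg, max_lag=30):
    """Correlation gain from adding viseme features to a motion-only model."""
    half = eeg.shape[0] // 2
    def score(feats):
        X = lagged_design(feats, max_lag)
        pred = ridge_predict(X[:half], eeg[:half], X[half:])
        return np.corrcoef(pred.ravel(), eeg[half:].ravel())[0, 1]
    return score(np.hstack([motion, visemes])) - score(motion)

A larger gain for rehearsed than for novel videos, correlated with trial-by-trial lipreading ability, is the pattern reported in the abstract.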


Subjects
Auditory Cortex, Speech Perception, Humans, Lipreading, Speech Perception/physiology, Brain/physiology, Auditory Cortex/physiology, Phonetics, Visual Perception/physiology
4.
Small ; 19(17): e2205058, 2023 04.
Article in English | MEDLINE | ID: mdl-36703524

ABSTRACT

Lip-reading provides an effective speech communication interface for people with voice disorders and for intuitive human-machine interactions. Existing systems are generally challenged by bulkiness, obtrusiveness, and poor robustness against environmental interferences. The lack of a truly natural and unobtrusive system for converting lip movements to speech precludes the continuous use and wide-scale deployment of such devices. Here, the design of a hardware-software architecture to capture, analyze, and interpret lip movements associated with either normal or silent speech is presented. The system can recognize different and similar visemes. It is robust in a noisy or dark environment. Self-adhesive, skin-conformable, and semi-transparent dry electrodes are developed to track high-fidelity speech-relevant electromyogram signals without impeding daily activities. The resulting skin-like sensors can form seamless contact with the curvilinear and dynamic surfaces of the skin, which is crucial for a high signal-to-noise ratio and minimal interference. Machine learning algorithms are employed to decode electromyogram signals and convert them to spoken words. Finally, the applications of the developed lip-reading system in augmented reality and medical service are demonstrated, which illustrate the great potential in immersive interaction and healthcare applications.
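
As an illustration of the decoding stage described above (not the authors' implementation), windowed surface-EMG features can be mapped to word labels with an off-the-shelf classifier. The channel count, window length, RMS features, and choice of an SVM below are assumptions made for the sketch.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def window_features(emg, win=200):
    """Root-mean-square energy per channel over consecutive windows.
    emg: (n_samples, n_channels) array for a single utterance."""
    n_win = emg.shape[0] // win
    segs = emg[:n_win * win].reshape(n_win, win, emg.shape[1])
    return np.sqrt((segs ** 2).mean(axis=1)).ravel()

# hypothetical data: 200 utterances, 4 electrodes, 10 word classes
rng = np.random.default_rng(0)
X = np.stack([window_features(rng.standard_normal((2000, 4))) for _ in range(200)])
y = rng.integers(0, 10, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("word accuracy:", accuracy_score(y_te, clf.predict(X_te)))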


Assuntos
Movimento , Pele , Humanos , Eletromiografia/métodos , Eletrodos , Aprendizado de Máquina
5.
Int J Audiol ; 62(12): 1155-1165, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36129442

ABSTRACT

OBJECTIVE: To understand the communicational and psychosocial effects of COVID-19 protective measures in real-life everyday communication settings. DESIGN: An online survey consisting of closed-set and open-ended questions aimed to describe the communication difficulties experienced in different communication activities (in-person and telecommunication) during the COVID-19 pandemic. STUDY SAMPLE: 172 individuals with hearing loss and 130 who reported not having a hearing loss completed the study. They were recruited through social media, private audiology clinics, hospitals and monthly newsletters sent by the non-profit organisation "Audition Quebec." RESULTS: Face masks were the most problematic protective measure for communication in 75-90% of participants. For all in-person communication activities, participants with hearing loss reported significantly more impact on communication than participants with normal hearing. They also exhibited more activity limitations and negative emotions associated with communication difficulties. CONCLUSION: These results suggest that, in times of pandemic, individuals with hearing loss are more likely to exhibit communication breakdowns in their everyday activities. This may lead to social isolation and have a deleterious effect on their mental health. When interacting with individuals with hearing loss, communication strategies to optimise speech understanding should be used.


Subjects
COVID-19, Deafness, Hearing Loss, Humans, Pandemics, Hearing Loss/epidemiology, Hearing Loss/psychology, Hearing, Communication
6.
Sensors (Basel) ; 23(4), 2023 Feb 12.
Article in English | MEDLINE | ID: mdl-36850669

ABSTRACT

Endangered languages are generally low-resource and, as intangible cultural resources, cannot be renewed once lost. Automatic speech recognition (ASR) is an effective means of protecting such languages. However, for a low-resource language, native speakers are few and labeled corpora are insufficient, so ASR suffers from high speaker dependence and overfitting, which greatly harms recognition accuracy. To tackle these deficiencies, the paper puts forward an approach to audiovisual speech recognition (AVSR) based on an LSTM-Transformer. The approach introduces visual modality information, including lip movements, to reduce the dependence of acoustic models on speakers and on the quantity of data. Specifically, by fusing audio and visual information, the approach enriches the representation of the speakers' feature space, achieving a speaker adaptation that is difficult in a single modality. The approach also includes experiments on speaker dependence and evaluates to what extent audiovisual fusion depends on speakers. Experimental results show that the character error rate (CER) of AVSR is 16.9% lower than that of traditional models (in the optimal performance scenario) and 11.8% lower than that of lip reading alone. The accuracy of recognizing phonemes, especially finals, improves substantially. For initials, accuracy improves for affricates and fricatives, where lip movements are obvious, and deteriorates for stops, where lip movements are not obvious. In AVSR, generalization to different speakers is also better than in a single modality, and the CER can drop by as much as 17.2%. AVSR is therefore of great significance for the protection and preservation of endangered languages through AI.
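
The character error rate (CER) quoted above is an edit-distance metric: the minimum number of character insertions, deletions, and substitutions needed to turn the hypothesis into the reference transcript, divided by the reference length. A small generic implementation (not tied to the paper's toolkit) is shown below.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution
    return d[-1]

def cer(ref, hyp):
    """Character error rate of a hypothesis against a reference transcript."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer("audiovisual", "audio visual"))  # one insertion -> about 0.09

A statement such as "the CER of AVSR is 16.9% lower" then refers to the difference between the CERs of the two systems computed in this way.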


Subjects
Acclimatization, Speech, Acoustics, Electric Power Supplies, Language
7.
Sensors (Basel) ; 23(4), 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36850882

ABSTRACT

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recognition. Hand gestures are a form of non-verbal communication and can be used as a very important part of modern human-computer interaction systems. Currently, audio and video modalities are easily accessible by sensors of mobile devices. However, there is no out-of-the-box solution for automatic audio-visual speech and gesture recognition. This study introduces two deep neural network-based model architectures: one for AVSR and one for gesture recognition. The main novelty regarding audio-visual speech recognition lies in fine-tuning strategies for both visual and acoustic features and in the proposed end-to-end model, which considers three modality fusion approaches: prediction-level, feature-level, and model-level. The main novelty in gesture recognition lies in a unique set of spatio-temporal features, including those that consider lip articulation information. As there are no available datasets for the combined task, we evaluated our methods on two different large-scale corpora, LRW and AUTSL, and outperformed existing methods on both audio-visual speech recognition and gesture recognition tasks. We achieved AVSR accuracy for the LRW dataset equal to 98.76% and a gesture recognition rate for the AUTSL dataset equal to 98.56%. The results obtained demonstrate not only the high performance of the proposed methodology, but also the fundamental possibility of recognizing audio-visual speech and gestures by sensors of mobile devices.
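
Of the three fusion approaches mentioned above, feature-level fusion is the easiest to show compactly: per-modality encoders produce embeddings that are concatenated before the classification head. The PyTorch sketch below is a generic illustration with assumed dimensions, not the paper's architecture.

import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Concatenate audio and visual embeddings, then classify words."""
    def __init__(self, audio_dim=256, video_dim=256, n_classes=500):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, n_classes),
        )

    def forward(self, audio_emb, video_emb):
        # audio_emb: (batch, audio_dim), video_emb: (batch, video_dim)
        return self.head(torch.cat([audio_emb, video_emb], dim=-1))

# hypothetical embeddings from pretrained per-modality encoders
logits = FeatureLevelFusion()(torch.randn(8, 256), torch.randn(8, 256))
print(logits.shape)  # torch.Size([8, 500])

Prediction-level fusion would instead combine per-modality class posteriors, and model-level fusion would share intermediate layers between the modalities.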


Subjects
Gestures, Speech, Humans, Computers, Handheld, Acoustics, Computer Systems
8.
J Child Lang ; 50(1): 27-51, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36503546

ABSTRACT

This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing in young children in visual-looking tasks and the apparent difficulty of speech processing in older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.


Subjects
Lipreading, Speech Perception, Adult, Child, Humans, Child, Preschool, Speech, Language Development, Cues
9.
Sensors (Basel) ; 22(10), 2022 May 13.
Article in English | MEDLINE | ID: mdl-35632141

ABSTRACT

Lipreading is a technique for analyzing sequences of lip movements and recognizing the speech content of a speaker. Because the structure of our vocal organs limits the number of distinct pronunciations we can make, homophones cause ambiguity when speaking; on the other hand, different speakers produce different lip movements for the same word. To address these problems, this paper focuses on spatial-temporal feature extraction in word-level lipreading, and an efficient two-stream model is proposed to learn the relative dynamic information of lip motion. In this model, two CNN streams of different channel capacity are used to extract static features from single frames and dynamic information across multi-frame sequences, respectively. We explored a more effective convolution structure for each component of the front-end model and improved accuracy by about 8%. Then, according to the characteristics of word-level lipreading datasets, we further studied the impact of the two sampling methods on the fast and slow channels. Furthermore, we discussed the influence of the fusion methods of the front-end and back-end models under the two-stream network structure. Finally, we evaluated the proposed model on two large-scale lipreading datasets and achieved a new state of the art.
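
A minimal way to picture the two-stream front end described above is a higher-capacity stream fed with temporally subsampled frames (static lip appearance) alongside a lower-capacity stream fed with the full frame rate (lip motion), with the pooled outputs concatenated. The layer sizes, strides, and input shape below are illustrative assumptions only, not the published model.

import torch
import torch.nn as nn

class TwoStreamFrontEnd(nn.Module):
    """Slow stream: more channels, subsampled frames.
    Fast stream: fewer channels, full frame rate."""
    def __init__(self):
        super().__init__()
        self.slow = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1))
        self.fast = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(5, 5, 5), padding=(2, 2, 2)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1))

    def forward(self, frames):
        # frames: (batch, 1, time, height, width) grayscale lip crops
        z_slow = self.slow(frames[:, :, ::4]).flatten(1)  # subsampled in time
        z_fast = self.fast(frames).flatten(1)
        return torch.cat([z_slow, z_fast], dim=1)

feats = TwoStreamFrontEnd()(torch.randn(2, 1, 29, 88, 88))
print(feats.shape)  # torch.Size([2, 72])

A back-end sequence model would then operate on fused features of this kind.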


Assuntos
Algoritmos , Leitura Labial , Humanos , Aprendizagem , Movimento (Física) , Movimento
10.
Sensors (Basel) ; 22(20), 2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36298089

ABSTRACT

Speech is a commonly used interaction-recognition technique in edutainment systems and is a key technology for smooth educational learning and user-system interaction. However, its application in real environments is limited by the various noise disruptions found there. In this study, a multimodal interaction system based on audio and visual information is proposed that makes speech-driven interaction with virtual aquarium systems robust to ambient noise. For audio-based speech recognition, the list of words recognized by a speech API is expressed as word vectors using a pretrained model; vision-based speech recognition uses a composite end-to-end deep neural network. The vectors derived from the API and from vision are then concatenated and classified. The signal-to-noise ratio of the proposed system was determined using data from four types of noise environment, and the system was tested for accuracy and efficiency against existing single-mode strategies for visual feature extraction and audio speech recognition. Its average recognition rate was 91.42% when only speech was used and improved by 6.7 percentage points to 98.12% when audio and visual information were combined. This method can be helpful in various real-world settings where speech recognition is regularly used, such as cafés, museums, music halls, and kiosks.
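
The audio branch described above maps the word list returned by the speech API onto pretrained word vectors before combining it with the visual features. The numpy sketch below shows that lookup-and-concatenate step under assumed names and dimensions (the embedding table, 300-d vectors, and 512-d visual features are placeholders, not the system's actual components).

import numpy as np

# hypothetical pretrained embeddings: word -> 300-d vector
embeddings = {w: np.random.default_rng(len(w)).standard_normal(300)
              for w in ["feed", "clean", "light", "stop"]}

def api_word_vector(api_candidates):
    """Average the embeddings of the candidate words returned by the API."""
    vecs = [embeddings[w] for w in api_candidates if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300)

def fuse(api_candidates, visual_features):
    """Concatenate the audio-derived word vector with visual features;
    the downstream classifier is omitted here."""
    return np.concatenate([api_word_vector(api_candidates), visual_features])

fused = fuse(["feed", "feet"], np.zeros(512))
print(fused.shape)  # (812,)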


Subjects
Speech Perception, Speech, Speech Recognition Software, Noise, Signal-To-Noise Ratio
11.
Sensors (Basel) ; 22(8), 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35458932

ABSTRACT

Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech recognition application programming interfaces (OCSR APIs). Noise-robust ASRs for application in different environments are being developed. This study proposes noise-robust OCSR APIs based on an end-to-end lip-reading architecture for practical applications in various environments. Several OCSR APIs, including Google, Microsoft, Amazon, and Naver, were evaluated using the Google Voice Command Dataset v2 to obtain the optimum performance. Based on performance, the Microsoft API was integrated with Google's trained word2vec model to enhance the keywords with more complete semantic information. The extracted word vector was integrated with the proposed lip-reading architecture for audio-visual speech recognition. Three forms of convolutional neural networks (3D CNN, 3D dense connection CNN, and multilayer 3D CNN) were used in the proposed lip-reading architecture. Vectors extracted from API and vision were classified after concatenation. The proposed architecture enhanced the OCSR API average accuracy rate by 14.42% using standard ASR evaluation measures along with the signal-to-noise ratio. The proposed model exhibits improved performance in various noise settings, increasing the dependability of OCSR APIs for practical applications.


Subjects
Artificial Intelligence, Speech, Cloud Computing, Neural Networks, Computer, Speech Recognition Software
12.
Sensors (Basel) ; 22(9), 2022 May 09.
Article in English | MEDLINE | ID: mdl-35591284

ABSTRACT

Concomitant with recent advances in deep learning, automatic speech recognition and visual speech recognition (VSR) have received considerable attention. However, although VSR systems must identify speech from both frontal and profile faces in real-world scenarios, most VSR studies have focused solely on frontal face images. To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°). The encoder uses multiple convolutional neural networks with a spatial attention module to detect minor changes in the mouth patterns of similarly pronounced words, and the decoder uses cascaded local self-attention connectionist temporal classification to collect the details of local contextual information in the immediate vicinity, resulting in a substantial performance boost and speedy convergence. For evaluation, the OuluVS2 dataset was divided into the four perspectives; relative to the existing state of the art, the performance improvements were 3.31% (0°), 4.79% (30°), 5.51% (45°), and 6.18% (60°), with a mean improvement of 4.95%, and average performance improved by 9.1% over the baseline. Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications.
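
The decoder described above is trained with connectionist temporal classification (CTC), which aligns frame-level predictions to a character sequence without frame-by-frame labels. The snippet below shows the standard PyTorch CTC loss on dummy tensors; it illustrates the training objective only, not the cascaded local self-attention decoder itself, and the tensor sizes are arbitrary.

import torch
import torch.nn as nn

T, N, C = 75, 4, 30   # frames, batch size, characters (index 0 = blank)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=-1)
targets = torch.randint(1, C, (N, 20))              # character indices, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # in practice the gradient flows back into the encoder
print(float(loss))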


Subjects
Lipreading, Neural Networks, Computer, Attention, Humans, Language, Speech
13.
HNO ; 70(6): 456-465, 2022 Jun.
Article in German | MEDLINE | ID: mdl-35024877

ABSTRACT

BACKGROUND: Many people benefit from the additional visual information provided by a speaker's lip movements, but lip reading is very error prone. Artificial-intelligence lip-reading algorithms based on artificial neural networks significantly improve word recognition but are not available for the German language. MATERIALS AND METHODS: A total of 1806 video clips, each featuring only one German-speaking person, were selected, split into word segments, and assigned to word classes using speech-recognition software. In 38,391 video segments from 32 speakers, 18 polysyllabic, visually distinguishable words were used to train and validate a neural network. A 3D convolutional neural network, gated recurrent units, and a combination of both models (GRUConv) were compared, as were different image crops and color spaces of the videos. Accuracy was determined over 5000 training epochs. RESULTS: Comparison of the color spaces did not reveal any relevant differences in correct classification rates, which ranged from 69% to 72%. Cropping the video to the lips achieved a significantly higher accuracy (70%) than cropping to the speaker's entire face (34%). With the GRUConv model, the maximum accuracies were 87% with known speakers and 63% in validation with unknown speakers. CONCLUSION: This neural network for lip reading, the first developed for the German language, shows a very high level of accuracy, comparable to English-language algorithms. It also works with unknown speakers and can be generalized to more word classes.


Subjects
Deep Learning, Language, Algorithms, Artificial Intelligence, Humans, Lipreading
14.
Folia Phoniatr Logop ; 74(2): 131-140, 2022.
Article in English | MEDLINE | ID: mdl-34348290

ABSTRACT

INTRODUCTION: To the best of our knowledge, there is a lack of reliable, validated, and standardized (Dutch) measuring instruments to document visual speech perception in a structured way. This study aimed to: (1) evaluate the effects of age, gender, and the used word list on visual speech perception examined by a first version of the Dutch Test for (Audio-)Visual Speech Perception on word level (TAUVIS-words) and (2) assess the internal reliability of the TAUVIS-words. METHODS: Thirty-nine normal-hearing adults divided into the following 3 age categories were included: (1) younger adults, age 18-39 years; (2) middle-aged adults, age 40-59 years; and (3) older adults, age >60 years. The TAUVIS-words consist of 4 word lists, i.e., 2 monosyllabic word lists (MS 1 and MS 2) and 2 polysyllabic word lists (PS 1 and PS 2). A first exploration of the effects of age, gender, and test stimuli (i.e., the used word list) on visual speech perception was conducted using the TAUVIS-words. A mixed-design analysis of variance (ANOVA) was conducted to analyze the results statistically. Lastly, the internal reliability of the TAUVIS-words was assessed by calculating Cronbach's α. RESULTS: The results revealed a significant effect of the used list. More specifically, the score for MS 1 was significantly better compared to that for PS 2, and the score for PS 1 was significantly better compared to that for PS 2. Furthermore, a significant main effect of gender was found. Women scored significantly better compared to men. The effect of age was not significant. The TAUVIS-word lists were found to have good internal reliability. CONCLUSION: This study was a first exploration of the effects of age, gender, and test stimuli on visual speech perception using the TAUVIS-words. Further research is necessary to optimize and validate the TAUVIS-words, making use of a larger study sample.
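
For reference, the internal-consistency statistic reported above, Cronbach's α, is computed from the per-item variances and the variance of the summed score over the k items of a word list:

\alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}} \right)

where \sigma^{2}_{Y_i} is the variance of item i across participants and \sigma^{2}_{X} is the variance of the total score X = \sum_i Y_i. Values approaching 1 indicate that the items of a list measure the underlying ability consistently.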


Subjects
Speech Perception, Adolescent, Adult, Aged, Female, Hearing Tests, Humans, Language, Male, Middle Aged, Reproducibility of Results, Young Adult
15.
J Neurosci ; 40(5): 1053-1065, 2020 01 29.
Article in English | MEDLINE | ID: mdl-31889007

ABSTRACT

Lip-reading is crucial for understanding speech in challenging conditions. But how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ∼70 ms in the left hemisphere, compared with ∼20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing.

SIGNIFICANCE STATEMENT: Lip-reading consists of decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
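
The speech-to-brain delays quoted above (about 70 ms in video-only vs. about 20 ms in audio-only) are the kind of quantity one can estimate by finding the lag at which a stimulus envelope best predicts band-limited cortical activity. The numpy sketch below illustrates that generic lag-estimation idea with a simple lagged correlation; it is not the coherence-based procedure used in the study, and the sampling rate and lag range are assumptions.

import numpy as np

def best_lag_ms(envelope, brain, sfreq=1000.0, max_lag_ms=300):
    """Lag of the brain signal relative to the envelope (in ms) that
    maximizes the Pearson correlation between the two signals."""
    max_lag = int(max_lag_ms * sfreq / 1000)
    corrs = []
    for lag in range(max_lag + 1):
        x = envelope[: len(envelope) - lag]
        y = brain[lag: lag + len(x)]
        corrs.append(np.corrcoef(x, y)[0, 1])
    return int(np.argmax(corrs)) / sfreq * 1000.0

# toy example: "brain" is the envelope delayed by 70 ms plus noise
rng = np.random.default_rng(1)
env = rng.standard_normal(10_000)
brain = np.roll(env, 70) + 0.5 * rng.standard_normal(10_000)
print(best_lag_ms(env, brain))  # approximately 70.0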


Subjects
Auditory Cortex/physiology, Lipreading, Speech Perception/physiology, Acoustic Stimulation, Female, Humans, Magnetoencephalography, Male, Pattern Recognition, Visual/physiology, Sound Spectrography, Young Adult
16.
Audiol Neurootol ; 26(3): 149-156, 2021.
Article in English | MEDLINE | ID: mdl-33352550

ABSTRACT

INTRODUCTION: Patients with postlingual deafness usually depend on visual information for communication, and their lipreading ability could influence cochlear implantation (CI) outcomes. However, it is unclear whether preoperative visual dependency in postlingual deafness positively or negatively affects auditory rehabilitation after CI. Herein, we investigated the influence of preoperative audiovisual perception on CI outcomes. METHOD: In this retrospective case-comparison study, 118 patients with postlingual deafness who underwent unilateral CI were enrolled. Evaluation of speech perception was performed under both audiovisual (AV) and audio-only (AO) conditions before and after CI. Before CI, the speech perception test was performed under hearing aid (HA)-assisted conditions. After CI, the speech perception test was performed under the CI-only condition. Only patients with a preoperative AO speech perception score of 10% or less were included. RESULTS: Multivariable regression analysis showed that age, gender, residual hearing, operation side, education level, and HA usage were not correlated with either postoperative AV (pAV) or AO (pAO) speech perception. However, duration of deafness showed a significant negative correlation with both pAO (p = 0.003) and pAV (p = 0.015) speech perception. Notably, the preoperative AV speech perception score was not correlated with pAO speech perception (R² = 0.00134, p = 0.693) but was positively associated with pAV speech perception (R² = 0.0731, p = 0.003). CONCLUSION: Preoperative dependency on audiovisual information may positively influence pAV speech perception in patients with postlingual deafness.


Subjects
Cochlear Implantation, Cochlear Implants, Deafness/surgery, Hearing/physiology, Speech Perception/physiology, Adult, Aged, Case-Control Studies, Deafness/physiopathology, Female, Hearing Tests, Humans, Lipreading, Male, Middle Aged, Retrospective Studies, Therapeutics
17.
Int J Audiol ; 60(7): 495-506, 2021 07.
Article in English | MEDLINE | ID: mdl-33246380

ABSTRACT

OBJECTIVE: To understand the impact of face coverings on hearing and communication. DESIGN: An online survey consisting of closed-set and open-ended questions distributed within the UK to gain insights into experiences of interactions involving face coverings, and of the impact of face coverings on communication. SAMPLE: Four hundred and sixty members of the general public were recruited via snowball sampling. People with hearing loss were intentionally oversampled to more thoroughly assess the effect of face coverings in this group. RESULTS: With few exceptions, participants reported that face coverings negatively impacted hearing, understanding, engagement, and feelings of connection with the speaker. Impacts were greatest when communicating in medical situations. People with hearing loss were significantly more impacted than those without hearing loss. Face coverings impacted communication content, interpersonal connectedness, and willingness to engage in conversation; they increased anxiety and stress, and made communication fatiguing, frustrating and embarrassing - both as a speaker wearing a face covering, and when listening to someone else who is wearing one. CONCLUSIONS: Face coverings have far-reaching impacts on communication for everyone, but especially for people with hearing loss. These findings illustrate the need for communication-friendly face-coverings, and emphasise the need to be communication-aware when wearing a face covering.


Subjects
Auditory Perception, COVID-19/prevention & control, Communication Barriers, Hearing Disorders/psychology, Lipreading, Masks, Persons With Hearing Impairments/psychology, COVID-19/transmission, Cues, Facial Expression, Hearing, Hearing Disorders/diagnosis, Hearing Disorders/physiopathology, Humans, Social Behavior, Visual Perception
18.
Sensors (Basel) ; 22(1), 2021 Dec 23.
Article in English | MEDLINE | ID: mdl-35009612

ABSTRACT

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark datasets. However, several problems still exist when using VSR systems. A major challenge is the distinction of words with similar pronunciation, called homophones; these lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as "a", "an", "eight", and "bin" because their lengths are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs; a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN), which are followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. The results of the standard automatic speech recognition evaluation metrics show that the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, for the unseen-speaker dataset. Our proposed architecture exhibits improved performance even when visual ambiguity arises, thereby increasing VSR reliability for practical applications.


Subjects
Speech Perception, Speech, Humans, Lipreading, Neural Networks, Computer, Reproducibility of Results
19.
Neuroimage ; 178: 721-734, 2018 09.
Article in English | MEDLINE | ID: mdl-29772380

ABSTRACT

The cerebral cortex modulates early sensory processing via feed-back connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate body, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, LGN response increased when participants processed fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with the visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements and did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages has an important role for communication: Together with similar findings in the auditory modality the findings imply that task-dependent modulation of the sensory thalami is a general mechanism to optimize speech recognition.


Subjects
Brain Mapping/methods, Geniculate Bodies/physiology, Recognition, Psychology/physiology, Speech Perception/physiology, Visual Perception/physiology, Adult, Female, Humans, Magnetic Resonance Imaging/methods, Male, Young Adult
20.
Clin Linguist Phon ; 32(12): 1090-1102, 2018.
Article in English | MEDLINE | ID: mdl-30183411

ABSTRACT

The effect of hearing status on the ability to speechread is poorly understood, and current findings are inconclusive regarding differences in speechreading performance between children and adults with hearing impairment and those with normal hearing. In this study, we investigated the effect of hearing status on speechreading skills in Chinese adolescents. Thirty-seven severely deaf students with a mean pure-tone average of 93 dB hearing threshold level and 21 hearing controls aged 16 completed tasks measuring their speechreading of simplex finals (monophthongs), complex finals (diphthongs or vowel + nasal constellations) and initials (consonants) in Chinese. Both accuracy rate and response time data were collected. Results showed no significant difference in accuracy between groups. By contrast, deaf individuals were significantly faster at speechreading than their hearing controls. In addition, for both groups, performance on speechreading simplex finals was faster and more accurate than on complex finals, which in turn was better than on initial consonants. We conclude that speechreading skills in Chinese adolescents are influenced by hearing status, characteristics of the sounds to be identified, as well as the measures used.


Subjects
Asian People, Hearing/physiology, Lipreading, Persons With Hearing Impairments/psychology, Speech Perception/physiology, Adolescent, China, Female, Humans, Male, Visual Perception