Search | Biblioteca Virtual em Saúde (Virtual Health Library)

The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers.

Magnotti, John F; Lado, Anastasia; Beauchamp, Michael S.

Front Neurosci ; 18: 1421713, 2024.

Article in English | MEDLINE | ID: mdl-38988770

ABSTRACT

In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the strength of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.

Synthetic faces generated with the facial action coding system or deep neural networks improve speech-in-noise perception, but not as much as real faces.

Yu, Yingjia; Lado, Anastasia; Zhang, Yue; Magnotti, John F; Beauchamp, Michael S.

Front Neurosci ; 18: 1379988, 2024.

Article in English | MEDLINE | ID: mdl-38784097

ABSTRACT

The prevalence of synthetic talking faces in both commercial and academic environments is increasing as the technology to generate them grows more powerful and available. While it has long been known that seeing the face of the talker improves human perception of speech-in-noise, recent studies have shown that synthetic talking faces generated by deep neural networks (DNNs) are also able to improve human perception of speech-in-noise. However, in previous studies the benefit provided by DNN synthetic faces was only about half that of real human talkers. We sought to determine whether synthetic talking faces generated by an alternative method would provide a greater perceptual benefit. The facial action coding system (FACS) is a comprehensive system for measuring visually discernible facial movements. Because the action units that comprise FACS are linked to specific muscle groups, synthetic talking faces generated by FACS might have greater verisimilitude than DNN synthetic faces, which do not reference an explicit model of the facial musculature. We tested the ability of human observers to identify speech-in-noise accompanied by a blank screen; the real face of the talker; and synthetic talking faces generated either by DNN or FACS. We replicated previous findings of a large benefit for seeing the face of a real talker for speech-in-noise perception and a smaller benefit for DNN synthetic faces. FACS faces also improved perception, but only to the same degree as DNN faces. Analysis at the phoneme level showed that the performance of DNN and FACS faces was particularly poor for phonemes that involve interactions between the teeth and lips, such as /f/, /v/, and /th/. Inspection of single video frames revealed that the characteristic visual features for these phonemes were weak or absent in synthetic faces. Modeling the real vs. synthetic difference showed that increasing the realism of a few phonemes could substantially increase the overall perceptual benefit of synthetic faces.

Optimal feedback improves behavioral focus during self-regulated computer-based work.

Wirzberger, Maria; Lado, Anastasia; Prentice, Mike; Oreshnikov, Ivan; Passy, Jean-Claude; Stock, Adrian; Lieder, Falk.

Sci Rep ; 14(1): 3124, 2024 02 07.

Article in English | MEDLINE | ID: mdl-38326361

ABSTRACT

Distractions are omnipresent and can derail our attention, which is a precious and very limited resource. To achieve their goals in the face of distractions, people need to regulate their attention, thoughts, and behavior; this is known as self-regulation. How can self-regulation be supported or strengthened in ways that are relevant for everyday work and learning activities? To address this question, we introduce and evaluate a desktop application that helps people stay focused on their work and train self-regulation at the same time. Our application lets the user set a goal for what they want to do during a defined period of focused work at their computer, then gives negative feedback when they get distracted, and positive feedback when they reorient their attention towards their goal. After this so-called focus session, the user receives overall feedback on how well they focused on their goal relative to previous sessions. While existing approaches to attention training often use artificial tasks, our approach transforms real-life challenges into opportunities for building strong attention control skills. Our results indicate that optimal attentional feedback can generate large increases in behavioral focus, task motivation, and self-control, helping users achieve their long-term goals.

Subjects

Learning , Motivation , Humans , Feedback , Learning/physiology , Computers , Attention/physiology

The Effect on Speech-in-Noise Perception of Real Faces and Synthetic Faces Generated with either Deep Neural Networks or the Facial Action Coding System.

Yu, Yingjia; Lado, Anastasia; Zhang, Yue; Magnotti, John F; Beauchamp, Michael S.

bioRxiv ; 2024 Feb 06.

Article in English | MEDLINE | ID: mdl-38370798

ABSTRACT

The prevalence of synthetic talking faces in both commercial and academic environments is increasing as the technology to generate them grows more powerful and available. While it has long been known that seeing the face of the talker improves human perception of speech-in-noise, recent studies have shown that synthetic talking faces generated by deep neural networks (DNNs) are also able to improve human perception of speech-in-noise. However, in previous studies the benefit provided by DNN synthetic faces was only about half that of real human talkers. We sought to determine whether synthetic talking faces generated by an alternative method would provide a greater perceptual benefit. The facial action coding system (FACS) is a comprehensive system for measuring visually discernible facial movements. Because the action units that comprise FACS are linked to specific muscle groups, synthetic talking faces generated by FACS might have greater verisimilitude than DNN synthetic faces, which do not reference an explicit model of the facial musculature. We tested the ability of human observers to identify speech-in-noise accompanied by a blank screen; the real face of the talker; and synthetic talking faces generated either by DNN or FACS. We replicated previous findings of a large benefit for seeing the face of a real talker for speech-in-noise perception and a smaller benefit for DNN synthetic faces. FACS faces also improved perception, but only to the same degree as DNN faces. Analysis at the phoneme level showed that the performance of DNN and FACS faces was particularly poor for phonemes that involve interactions between the teeth and lips, such as /f/, /v/, and /th/. Inspection of single video frames revealed that the characteristic visual features for these phonemes were weak or absent in synthetic faces. Modeling the real vs. synthetic difference showed that increasing the realism of a few phonemes could substantially increase the overall perceptual benefit of synthetic faces, providing a roadmap for improving communication in this rapidly developing domain.
