Pesquisa | BVS IEC

1.

Enhancing learning outcomes through multisensory integration: A fMRI study of audio-visual training in virtual reality.

Alwashmi, Kholoud; Meyer, Georg; Rowe, Fiona; Ward, Ryan.

Neuroimage ; 285: 120483, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-38048921

RESUMO

The integration of information from different sensory modalities is a fundamental process that enhances perception and performance in real and virtual environments (VR). Understanding these mechanisms, especially during learning tasks that exploit novel multisensory cue combinations provides opportunities for the development of new rehabilitative interventions. This study aimed to investigate how functional brain changes support behavioural performance improvements during an audio-visual (AV) learning task. Twenty healthy participants underwent a 30 min daily VR training for four weeks. The task was an AV adaptation of a 'scanning training' paradigm that is commonly used in hemianopia rehabilitation. Functional magnetic resonance imaging (fMRI) and performance data were collected at baseline, after two and four weeks of training, and four weeks post-training. We show that behavioural performance, operationalised as mean reaction time reduction in VR, significantly improves. In separate tests in a controlled laboratory environment, we showed that the behavioural performance gains in the VR training environment transferred to a significant mean RT reduction for the trained AV voluntary task on a computer screen. Enhancements were observed in both the visual-only and AV conditions, with the latter demonstrating a faster response time supported by the presence of audio cues. The behavioural learning effect also transfers to two additional tasks that were tested: a visual search task and an involuntary visual task. Our fMRI results reveal an increase in functional activation (BOLD signal) in multisensory brain regions involved in early-stage AV processing: the thalamus, the caudal inferior parietal lobe and cerebellum. These functional changes were only observed for the trained, multisensory, task and not for unimodal visual stimulation. Functional activation changes in the thalamus were significantly correlated to behavioural performance improvements. This study demonstrates that incorporating spatial auditory cues to voluntary visual training in VR leads to augmented brain activation changes in multisensory integration, resulting in measurable performance gains across tasks. The findings highlight the potential of VR-based multisensory training as an effective method for enhancing cognitive function and as a potentially valuable tool in rehabilitative programmes.

Assuntos

Imageamento por Ressonância Magnética , Realidade Virtual , Humanos , Aprendizagem , Encéfalo/fisiologia , Percepção Visual , Cegueira , Percepção Auditiva

2.

Phantom perception as a Bayesian inference problem: a pilot study.

Yasoda-Mohan, Anusha; Chen, Feifan; Ó Sé, Colum; Allard, Remy; Ost, Jan; Vanneste, Sven.

J Neurophysiol ; 131(6): 1311-1327, 2024 06 01.

Artigo em Inglês | MEDLINE | ID: mdl-38718414

RESUMO

Tinnitus is the perception of a continuous sound in the absence of an external source. Although the role of the auditory system is well investigated, there is a gap in how multisensory signals are integrated to produce a single percept in tinnitus. Here, we train participants to learn a new sensory environment by associating a cue with a target signal that varies in perceptual threshold. In the test phase, we present only the cue to see whether the person perceives an illusion of the target signal. We perform two separate experiments to observe the behavioral and electrophysiological responses to the learning and test phases in 1) healthy young adults and 2) people with continuous subjective tinnitus and matched control subjects. We observed that in both parts of the study the percentage of false alarms was negatively correlated with the 75% detection threshold. Additionally, the perception of an illusion goes together with increased evoked response potential in frontal regions of the brain. Furthermore, in patients with tinnitus, we observe no significant difference in behavioral or evoked response in the auditory paradigm, whereas patients with tinnitus were more likely to report false alarms along with increased evoked activity during the learning and test phases in the visual paradigm. This emphasizes the importance of integrity of sensory pathways in multisensory integration and how this process may be disrupted in people with tinnitus. Furthermore, the present study also presents preliminary data supporting evidence that tinnitus patients may be building stronger perceptual models, which needs future studies with a larger population to provide concrete evidence on.NEW & NOTEWORTHY Tinnitus is the continuous phantom perception of a ringing in the ears. Recently, it has been suggested that tinnitus may be a maladaptive inference of the brain to auditory anomalies, whether they are detected or undetected by an audiogram. The present study presents empirical evidence for this hypothesis by inducing an illusion in a sensory domain that is damaged (auditory) and one that is intact (visual). It also presents novel information about how people with tinnitus process multisensory stimuli in the audio-visual domain.

Assuntos

Percepção Auditiva , Teorema de Bayes , Ilusões , Zumbido , Humanos , Zumbido/fisiopatologia , Projetos Piloto , Masculino , Feminino , Adulto , Percepção Auditiva/fisiologia , Ilusões/fisiologia , Percepção Visual/fisiologia , Adulto Jovem , Eletroencefalografia , Estimulação Acústica , Sinais (Psicologia)

3.

Macaque claustrum, pulvinar and putative dorsolateral amygdala support the cross-modal association of social audio-visual stimuli based on meaning.

Froesel, Mathilda; Gacoin, Maëva; Clavagnier, Simon; Hauser, Marc; Goudard, Quentin; Ben Hamed, Suliann.

Eur J Neurosci ; 59(12): 3203-3223, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38637993

RESUMO

Social communication draws on several cognitive functions such as perception, emotion recognition and attention. The association of audio-visual information is essential to the processing of species-specific communication signals. In this study, we use functional magnetic resonance imaging in order to identify the subcortical areas involved in the cross-modal association of visual and auditory information based on their common social meaning. We identified three subcortical regions involved in audio-visual processing of species-specific communicative signals: the dorsolateral amygdala, the claustrum and the pulvinar. These regions responded to visual, auditory congruent and audio-visual stimulations. However, none of them was significantly activated when the auditory stimuli were semantically incongruent with the visual context, thus showing an influence of visual context on auditory processing. For example, positive vocalization (coos) activated the three subcortical regions when presented in the context of positive facial expression (lipsmacks) but not when presented in the context of negative facial expression (aggressive faces). In addition, the medial pulvinar and the amygdala presented multisensory integration such that audiovisual stimuli resulted in activations that were significantly higher than those observed for the highest unimodal response. Last, the pulvinar responded in a task-dependent manner, along a specific spatial sensory gradient. We propose that the dorsolateral amygdala, the claustrum and the pulvinar belong to a multisensory network that modulates the perception of visual socioemotional information and vocalizations as a function of the relevance of the stimuli in the social context. SIGNIFICANCE STATEMENT: Understanding and correctly associating socioemotional information across sensory modalities, such that happy faces predict laughter and escape scenes predict screams, is essential when living in complex social groups. With the use of functional magnetic imaging in the awake macaque, we identify three subcortical structures-dorsolateral amygdala, claustrum and pulvinar-that only respond to auditory information that matches the ongoing visual socioemotional context, such as hearing positively valenced coo calls and seeing positively valenced mutual grooming monkeys. We additionally describe task-dependent activations in the pulvinar, organizing along a specific spatial sensory gradient, supporting its role as a network regulator.

Assuntos

Tonsila do Cerebelo , Percepção Auditiva , Claustrum , Imageamento por Ressonância Magnética , Pulvinar , Percepção Visual , Pulvinar/fisiologia , Tonsila do Cerebelo/fisiologia , Tonsila do Cerebelo/diagnóstico por imagem , Masculino , Animais , Percepção Auditiva/fisiologia , Claustrum/fisiologia , Percepção Visual/fisiologia , Feminino , Expressão Facial , Macaca , Estimulação Luminosa/métodos , Mapeamento Encefálico , Estimulação Acústica , Vocalização Animal/fisiologia , Percepção Social

4.

Multisensory integration of speech and gestures in a naturalistic paradigm.

Matyjek, Magdalena; Kita, Sotaro; Torralba Cuello, Mireia; Soto Faraco, Salvador.

Hum Brain Mapp ; 45(11): e26797, 2024 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-39041175

RESUMO

Speech comprehension is crucial for human social interaction, relying on the integration of auditory and visual cues across various levels of representation. While research has extensively studied multisensory integration (MSI) using idealised, well-controlled stimuli, there is a need to understand this process in response to complex, naturalistic stimuli encountered in everyday life. This study investigated behavioural and neural MSI in neurotypical adults experiencing audio-visual speech within a naturalistic, social context. Our novel paradigm incorporated a broader social situational context, complete words, and speech-supporting iconic gestures, allowing for context-based pragmatics and semantic priors. We investigated MSI in the presence of unimodal (auditory or visual) or complementary, bimodal speech signals. During audio-visual speech trials, compared to unimodal trials, participants more accurately recognised spoken words and showed a more pronounced suppression of alpha power-an indicator of heightened integration load. Importantly, on the neural level, these effects surpassed mere summation of unimodal responses, suggesting non-linear MSI mechanisms. Overall, our findings demonstrate that typically developing adults integrate audio-visual speech and gesture information to facilitate speech comprehension in noisy environments, highlighting the importance of studying MSI in ecologically valid contexts.

Assuntos

Gestos , Percepção da Fala , Humanos , Feminino , Masculino , Percepção da Fala/fisiologia , Adulto Jovem , Adulto , Percepção Visual/fisiologia , Eletroencefalografia , Compreensão/fisiologia , Estimulação Acústica , Fala/fisiologia , Encéfalo/fisiologia , Estimulação Luminosa/métodos

5.

Does the speaker's eye gaze facilitate infants' word segmentation from continuous speech? An ERP study.

Çetinçelik, Melis; Rowland, Caroline F; Snijders, Tineke M.

Dev Sci ; 27(2): e13436, 2024 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-37551932

RESUMO

The environment in which infants learn language is multimodal and rich with social cues. Yet, the effects of such cues, such as eye contact, on early speech perception have not been closely examined. This study assessed the role of ostensive speech, signalled through the speaker's eye gaze direction, on infants' word segmentation abilities. A familiarisation-then-test paradigm was used while electroencephalography (EEG) was recorded. Ten-month-old Dutch-learning infants were familiarised with audio-visual stories in which a speaker recited four sentences with one repeated target word. The speaker addressed them either with direct or with averted gaze while speaking. In the test phase following each story, infants heard familiar and novel words presented via audio-only. Infants' familiarity with the words was assessed using event-related potentials (ERPs). As predicted, infants showed a negative-going ERP familiarity effect to the isolated familiarised words relative to the novel words over the left-frontal region of interest during the test phase. While the word familiarity effect did not differ as a function of the speaker's gaze over the left-frontal region of interest, there was also a (not predicted) positive-going early ERP familiarity effect over right fronto-central and central electrodes in the direct gaze condition only. This study provides electrophysiological evidence that infants can segment words from audio-visual speech, regardless of the ostensiveness of the speaker's communication. However, the speaker's gaze direction seems to influence the processing of familiar words. RESEARCH HIGHLIGHTS: We examined 10-month-old infants' ERP word familiarity response using audio-visual stories, in which a speaker addressed infants with direct or averted gaze while speaking. Ten-month-old infants can segment and recognise familiar words from audio-visual speech, indicated by their negative-going ERP response to familiar, relative to novel, words. This negative-going ERP word familiarity effect was present for isolated words over left-frontal electrodes regardless of whether the speaker offered eye contact while speaking. An additional positivity in response to familiar words was observed for direct gaze only, over right fronto-central and central electrodes.

Assuntos

Percepção da Fala , Fala , Lactente , Humanos , Fala/fisiologia , Fixação Ocular , Idioma , Potenciais Evocados/fisiologia , Percepção da Fala/fisiologia

6.

The different brain areas occupied for integrating information of hierarchical linguistic units: a study based on EEG and TMS.

Pei, Changfu; Qiu, Yuan; Li, Fali; Huang, Xunan; Si, Yajing; Li, Yuqin; Zhang, Xiabing; Chen, Chunli; Liu, Qiang; Cao, Zehong; Ding, Nai; Gao, Shan; Alho, Kimmo; Yao, Dezhong; Xu, Peng.

Cereb Cortex ; 33(8): 4740-4751, 2023 04 04.

Artigo em Inglês | MEDLINE | ID: mdl-36178127

RESUMO

Human language units are hierarchical, and reading acquisition involves integrating multisensory information (typically from auditory and visual modalities) to access meaning. However, it is unclear how the brain processes and integrates language information at different linguistic units (words, phrases, and sentences) provided simultaneously in auditory and visual modalities. To address the issue, we presented participants with sequences of short Chinese sentences through auditory, visual, or combined audio-visual modalities while electroencephalographic responses were recorded. With a frequency tagging approach, we analyzed the neural representations of basic linguistic units (i.e. characters/monosyllabic words) and higher-level linguistic structures (i.e. phrases and sentences) across the 3 modalities separately. We found that audio-visual integration occurs in all linguistic units, and the brain areas involved in the integration varied across different linguistic levels. In particular, the integration of sentences activated the local left prefrontal area. Therefore, we used continuous theta-burst stimulation to verify that the left prefrontal cortex plays a vital role in the audio-visual integration of sentence information. Our findings suggest the advantage of bimodal language comprehension at hierarchical stages in language-related information processing and provide evidence for the causal role of the left prefrontal regions in processing information of audio-visual sentences.

Assuntos

Mapeamento Encefálico , Compreensão , Humanos , Compreensão/fisiologia , Encéfalo/fisiologia , Linguística , Eletroencefalografia

7.

Evaluation of parental anxiety following three methods of pre-anesthesia counseling: Video, brochure and verbal communication.

Rudravaram, Swetha; Gupta, Aikta; Kalra, Bhumika; Malhotra, Shahzadi; Gupta, Manoj Kumar; Kamal, Geeta; Agarwal, Shilpa; Parida, Raunak.

Paediatr Anaesth ; 34(7): 665-670, 2024 07.

Artigo em Inglês | MEDLINE | ID: mdl-38661287

RESUMO

BACKGROUND: The purpose of this study is to provide comprehensive and efficient pre-anesthesia counseling (PAC) utilizing audiovisual aids and to examine their effect on parental anxiety. METHODS: For this prospective, controlled study, 174 parents were recruited and randomized into three groups of 58 (Group A: video, Group B: brochure, and Group C: verbal). During pre-anesthesia counseling, the parent was provided with a detailed explanation of preoperative preparation, fasting instructions, transport to the operating room, induction, the emergence of anesthesia, and nursing in the post-anesthesia care unit based on their assigned group. We evaluated parental anxiety using Spielberger's State-Trait Anxiety Inventory before and after the pre-anesthesia counseling. RESULTS: The results of our study show a statistically significant difference in the final mean STAI scores among the three groups (Group A: 34.69 ± 5.31, Group B: 36.34 ± 8.59, and Group C: 43.59 ± 3.39; p < .001). When compared to the brochure and verbal groups, the parents in the video group have the greatest difference in mean baseline and final Spielberger's State-Trait Anxiety Inventory scores (12.207 ± 5.291, p .001). CONCLUSION: The results of our study suggest that pre-anesthesia counseling by video or a brochure before the day of surgery is associated with a higher reduction in parental anxiety when compared to verbal communication.

Assuntos

Ansiedade , Comunicação , Aconselhamento , Folhetos , Pais , Cuidados Pré-Operatórios , Humanos , Ansiedade/prevenção & controle , Ansiedade/psicologia , Pais/psicologia , Feminino , Cuidados Pré-Operatórios/métodos , Masculino , Estudos Prospectivos , Aconselhamento/métodos , Anestesia/métodos , Gravação em Vídeo , Recursos Audiovisuais , Adulto , Criança , Pré-Escolar

8.

Application of 3D printing surgical training models in the preoperative assessment of robot-assisted partial nephrectomy.

Wang, Zheng; Wang, Xin Yu; Yu, Xiao Fen.

BMC Surg ; 24(1): 167, 2024 May 28.

Artigo em Inglês | MEDLINE | ID: mdl-38807080

RESUMO

BACKGROUND: To explore the application effect of 3D printing surgical training models in the preoperative assessment of robot-assisted partial nephrectomy. METHODS: Eighty patients who underwent robot-assisted partial nephrectomy surgery between January 2022 and December 2023 were selected and divided into two groups according to the chronological order. The control group (n = 40) received preoperative assessment with verbal and video education from January 2022 to December 2022, while the observation group (n = 40) received preoperative assessment with 3D printing surgical training models combined with verbal and video education from January 2023 to December 2023. The preoperative anxiety, information demand score, and surgical awareness were compared between the two groups. The physiological stress indicators, including interleukin-6 (IL-6), angiotensin II (AT II), adrenocorticotropic hormone (ACTH), cortisol (Cor), mean arterial pressure (MAP), and heart rate (HR), were also measured at different time points before and after surgery.They were 6:00 am on the day before surgery (T0), 6:00 am on the day of the operation (T1), 6:00 am on the first day after the operation (T2), and 6:00 am on the third day after the operation (T3).The preparation rate before surgery was compared between the two groups. RESULTS: The anxiety and surgical information demand scores were lower in the observation group than in the control group before anesthesia induction, and the difference was statistically significant (P < 0.001). Both groups had lower scores before anesthesia induction than before preoperative assessment, and the difference was statistically significant (P < 0.05). The physiological stress indicators at T1 time points were lower in the observation group than in the control group, and the difference was statistically significant (P < 0.05). The overall means of the physiological stress indicators differed significantly between the two groups (P < 0.001). Compared with the T0 time point, the T1, T2, and T3 time points in both groups were significantly lower, and the difference was statistically significant (P < 0.05). The surgical awareness and preparation rate before surgery were higher in the observation group than in the control group, and the difference was statistically significant (P < 0.05). CONCLUSION: The preoperative assessment mode using 3D printing surgical training models combined with verbal and video education can effectively reduce the psychological and physiological stress responses of surgical patients, improve their surgical awareness, and enhance the preparation rate before surgery.

Assuntos

Nefrectomia , Impressão Tridimensional , Procedimentos Cirúrgicos Robóticos , Humanos , Nefrectomia/métodos , Nefrectomia/educação , Procedimentos Cirúrgicos Robóticos/educação , Feminino , Masculino , Pessoa de Meia-Idade , Cuidados Pré-Operatórios/métodos , Adulto , Idoso , Modelos Anatômicos

9.

Differential Auditory and Visual Phase-Locking Are Observed during Audio-Visual Benefit and Silent Lip-Reading for Speech Perception.

Aller, Máté; Økland, Heidi Solberg; MacGregor, Lucy J; Blank, Helen; Davis, Matthew H.

J Neurosci ; 42(31): 6108-6120, 2022 08 03.

Artigo em Inglês | MEDLINE | ID: mdl-35760528

RESUMO

Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explore MEG phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) that reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information is present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception.SIGNIFICANCE STATEMENT Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.

Assuntos

Córtex Auditivo , Percepção da Fala , Córtex Visual , Estimulação Acústica , Córtex Auditivo/fisiologia , Percepção Auditiva , Feminino , Humanos , Leitura Labial , Masculino , Fala/fisiologia , Percepção da Fala/fisiologia , Córtex Visual/fisiologia , Percepção Visual/fisiologia

10.

Left Motor Î´ Oscillations Reflect Asynchrony Detection in Multisensory Speech Perception.

Biau, Emmanuel; Schultz, Benjamin G; Gunter, Thomas C; Kotz, Sonja A.

J Neurosci ; 42(11): 2313-2326, 2022 03 16.

Artigo em Inglês | MEDLINE | ID: mdl-35086905

RESUMO

During multisensory speech perception, slow Î´ oscillations (â¼1-3 Hz) in the listener's brain synchronize with the speech signal, likely engaging in speech signal decomposition. Notable fluctuations in the speech amplitude envelope, resounding speaker prosody, temporally align with articulatory and body gestures and both provide complementary sensations that temporally structure speech. Further, Î´ oscillations in the left motor cortex seem to align with speech and musical beats, suggesting their possible role in the temporal structuring of (quasi)-rhythmic stimulation. We extended the role of Î´ oscillations to audiovisual asynchrony detection as a test case of the temporal analysis of multisensory prosody fluctuations in speech. We recorded Electroencephalograph (EEG) responses in an audiovisual asynchrony detection task while participants watched videos of a speaker. We filtered the speech signal to remove verbal content and examined how visual and auditory prosodic features temporally (mis-)align. Results confirm (1) that participants accurately detected audiovisual asynchrony, and (2) increased Î´ power in the left motor cortex in response to audiovisual asynchrony. The difference of Î´ power between asynchronous and synchronous conditions predicted behavioral performance, and (3) decreased Î´-ß coupling in the left motor cortex when listeners could not accurately map visual and auditory prosodies. Finally, both behavioral and neurophysiological evidence was altered when a speaker's face was degraded by a visual mask. Together, these findings suggest that motor Î´ oscillations support asynchrony detection of multisensory prosodic fluctuation in speech.SIGNIFICANCE STATEMENT Speech perception is facilitated by regular prosodic fluctuations that temporally structure the auditory signal. Auditory speech processing involves the left motor cortex and associated Î´ oscillations. However, visual prosody (i.e., a speaker's body movements) complements auditory prosody, and it is unclear how the brain temporally analyses different prosodic features in multisensory speech perception. We combined an audiovisual asynchrony detection task with electroencephalographic (EEG) recordings to investigate how Î´ oscillations support the temporal analysis of multisensory speech. Results confirmed that asynchrony detection of visual and auditory prosodies leads to increased Î´ power in left motor cortex and correlates with performance. We conclude that Î´ oscillations are invoked in an effort to resolve denoted temporal asynchrony in multisensory speech perception.

Assuntos

Percepção da Fala , Estimulação Acústica , Percepção Auditiva/fisiologia , Eletroencefalografia , Humanos , Estimulação Luminosa , Fala , Percepção da Fala/fisiologia , Percepção Visual/fisiologia

11.

Cross-modal interactions at the audiovisual cocktail-party revealed by behavior, ERPs, and neural oscillations.

Klatt, Laura-Isabelle; Begau, Alexandra; Schneider, Daniel; Wascher, Edmund; Getzmann, Stephan.

Neuroimage ; 271: 120022, 2023 05 01.

Artigo em Inglês | MEDLINE | ID: mdl-36918137

RESUMO

Theories of attention argue that objects are the units of attentional selection. In real-word environments such objects can contain visual and auditory features. To understand how mechanisms of selective attention operate in multisensory environments, in this pre-registered study, we created an audiovisual cocktail-party situation, in which two speakers (left and right of fixation) simultaneously articulated brief numerals. In three separate blocks, informative auditory speech was presented (a) alone or paired with (b) congruent or (c) uninformative visual speech. In all blocks, subjects localized a pre-defined numeral. While audiovisual-congruent and uninformative speech improved response times and speed of information uptake according to diffusion modeling, an ERP analysis revealed that this did not coincide with enhanced attentional engagement. Yet, consistent with object-based attentional selection, the deployment of auditory spatial attention (N2ac) was accompanied by visuo-spatial attentional orienting (N2pc) irrespective of the informational content of visual speech. Notably, an N2pc component was absent in the auditory-only condition, demonstrating that a sound-induced shift of visuo-spatial attention relies on the availability of audio-visual features evolving coherently in time. Additional exploratory analyses revealed cross-modal interactions in working memory and modulations of cognitive control.

Assuntos

Atenção , Potenciais Evocados , Humanos , Atenção/fisiologia , Tempo de Reação/fisiologia , Potenciais Evocados Auditivos/fisiologia , Memória de Curto Prazo , Estimulação Acústica , Percepção Visual/fisiologia , Percepção Auditiva/fisiologia , Estimulação Luminosa , Eletroencefalografia

12.

Psychophysiological Responses of College Students to Audio-Visual Forest Trail Landscapes.

Zeng, Chengcheng; Lin, Wei; Chen, Qibing.

J Urban Health ; 100(4): 711-724, 2023 08.

Artigo em Inglês | MEDLINE | ID: mdl-37495939

RESUMO

Forest trails provide urban residents with contact with nature that improves health and well-being. Vision and hearing are important forms of environmental perception, and visual and auditory stimuli should not be overlooked in forest trail landscapes. This study focused on the health benefits of the audio-visual perception of forest trail landscapes. Forest density (FD) and forest sounds (FS) in forest trail landscapes were examined as visual and auditory variables, respectively. FD was divided into three levels: high (Hd), medium (Md), and low density (Ld). FS were divided into four levels: quiet natural and anthropogenic sounds (QnQa), quiet natural and loud anthropogenic sounds (QnLa), loud natural and quiet anthropogenic sounds (LnQa), and loud natural and loud anthropogenic sounds (LnLa). The levels of these two variables were combined to create 12 conditions. A total of 360 college students were randomly assigned to 12 groups (mapping onto the 12 conditions; N=30 per group). All subjects performed the same 5-min high-pressure task indoors, followed by a 5-min recovery period of experiencing a simulated forest trail landscape (viewing pictures and listening to sounds). Brain waves, blood pressure, blood oxygen saturation (SpO2, measured with a finger monitor), the pulse rate, and mood indicators were collected to analyse the physiological and psychological responses to the audio-visual forest trail landscapes. The results indicated that higher FD and lower FS improved health benefits. The interaction between FD and FS revealed a pattern of combinations that facilitated stress reduction and positive mood recovery. These results are of theoretical value in that they indicate important audio-visual characteristics of forest trail landscapes. In terms of practical applications, these findings support the construction of urban forest trails to provide health benefits.

Assuntos

Florestas , Estudantes , Humanos , Estudantes/psicologia

13.

Perception of audio-visual synchrony in infants at elevated likelihood of developing autism spectrum disorder.

Suri, Kirin N; Whedon, Margaret; Lewis, Michael.

Eur J Pediatr ; 182(5): 2105-2117, 2023 May.

Artigo em Inglês | MEDLINE | ID: mdl-36820895

RESUMO

The inability to perceive audio-visual speech as a unified event may contribute to social impairments and language deficits in children with autism spectrum disorder (ASD). In this study, we examined and compared two groups of infants on their sensitivity to audio-visual asynchrony for a social (speaking face) and non-social event (bouncing ball) and assessed the relations between multisensory integration and language production. Infants at elevated likelihood of developing ASD were less sensitive to audio-visual synchrony for the social event than infants without elevated likelihood. Among infants without elevated likelihood, greater sensitivity to audio-visual synchrony for the social event was associated with a larger productive vocabulary. CONCLUSION: Findings suggest that early deficits in multisensory integration may impair language development among infants with elevated likelihood of developing ASD. WHAT IS KNOWN: â¢Perceptual integration of auditory and visual cues within speech is important for language development. â¢Prior work suggests that children with ASD are less sensitive to the temporal synchrony within audio-visual speech. WHAT IS NEW: â¢In this study, infants at elevated likelihood of developing ASD showed a larger temporal binding window for adynamic social event (Speaking Face) than TD infants, suggesting less efficient multisensory integration.

Assuntos

Transtorno do Espectro Autista , Criança , Humanos , Lactente , Transtorno do Espectro Autista/complicações , Percepção Auditiva , Percepção Visual , Fala , Idioma

14.

Long-term memory representations for audio-visual scenes.

Meyerhoff, Hauke S; Jaggy, Oliver; Papenmeier, Frank; Huff, Markus.

Mem Cognit ; 51(2): 349-370, 2023 02.

Artigo em Inglês | MEDLINE | ID: mdl-36100821

RESUMO

In this study, we investigated the nature of long-term memory representations for naturalistic audio-visual scenes. Whereas previous research has shown that audio-visual scenes are recognized more accurately than their unimodal counterparts, it remains unclear whether this benefit stems from audio-visually integrated long-term memory representations or a summation of independent retrieval cues. We tested two predictions for audio-visually integrated memory representations. First, we used a modeling approach to test whether recognition performance for audio-visual scenes is more accurate than would be expected from independent retrieval cues. This analysis shows that audio-visual integration is not necessary to explain the benefit of audio-visual scenes relative to purely auditory or purely visual scenes. Second, we report a series of experiments investigating the occurrence of study-test congruency effects for unimodal and audio-visual scenes. Most importantly, visually encoded information was immune to additional auditory information presented during testing, whereas auditory encoded information was susceptible to additional visual information presented during testing. This renders a true integration of visual and auditory information in long-term memory representations unlikely. In sum, our results instead provide evidence for visual dominance in long-term memory. Whereas associative auditory information is capable of enhancing memory performance, the long-term memory representations appear to be primarily visual.

Assuntos

Memória de Longo Prazo , Percepção Visual , Humanos , Cognição , Sinais (Psicologia) , Reconhecimento Psicológico

15.

Cooking through perceptual disfluencies: The effects of auditory and visual distortions on predicted and actual memory performance.

Ardiç, E Eylül; Besken, Miri.

Mem Cognit ; 51(4): 862-874, 2023 05.

Artigo em Inglês | MEDLINE | ID: mdl-36376621

RESUMO

The current study investigated the joint contribution of visual and auditory disfluencies, or distortions, to actual and predicted memory performance with naturalistic, multi-modal materials through three experiments. In Experiments 1 and 2, participants watched food recipe clips containing visual and auditory information that were either fully intact or else distorted in one or both of the two modalities. They were asked to remember these for a later memory test and made memory predictions after each clip. Participants produced lower memory predictions for distorted auditory and visual information than intact ones. However, these perceptual distortions revealed no actual memory differences across encoding conditions, expanding the metacognitive illusion of perceptual disfluency for static, single-word materials to naturalistic, dynamic, multi-modal materials. Experiment 3 provided naïve participants with a hypothetical scenario about the experimental paradigm used in Experiment 1, revealing lower memory predictions for distorted than intact information in both modalities. Theoretically, these results imply that both in-the-moment experiences and a priori beliefs may contribute to the perceptual disfluency illusion. From an applied perspective, the study suggests that when audio-visual distortions occur, individuals might use this information to predict their memory performance, even when it does not factor into actual memory performance.

Assuntos

Ilusões , Metacognição , Humanos , Rememoração Mental , Culinária , Percepção Visual

16.

A noninferiority trial on information-based video versus self-selected video distraction technique for preoperative anxiety reduction in school children: Prepare trial.

Acharya, Vanitha; Jain, Divya; Gandhi, Komal; Bhardwaj, Neerja; Mathew, Preethy.

Paediatr Anaesth ; 33(11): 955-961, 2023 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-37365954

RESUMO

BACKGROUND: Distraction techniques using smartphones to watch cartoon videos and play videogames have been successfully used to reduce preoperative anxiety in school children. However, the literature about the use of video-based preoperative information technique for anxiety reduction in that age group still remains understudied with conflicting results. We hypothesized that there would be no meaningful difference in anxiety score at induction period between the information-based video versus self-selected video distraction technique. METHODS: Eighty-two children between 6 and 12 years undergoing surgery were randomized to self-selected video (n = 41) and information-based video (n = 41) distraction group in this prospective, randomized, noninferiority trial. Children in self-selected video group were shown video of their choice using smart phones, while children in the information-based video group were shown video of operation theater (OT) set up and induction procedure. The children were taken inside operating room along with parents watching the respective videos. Modified Yale Preoperative Anxiety Scale (m-YPAS), just before induction of anesthesia was recorded as the primary outcome. Induction compliance checklist score, anxiety of the parents, and short-term postoperative outcomes in 15 days (telephonically) were recorded as secondary outcomes. RESULTS: The mean difference in the baseline mYPAS score (95% CI) between the two groups was -2.7 (-8.2 to 2.8, p = .33) and -6.39 (-12.74 to -0.44, p = .05) just before the induction period. The upper bound of the 95% CI did not cross the value of 8, which was the noninferiority margin decided prior to study commencement. 70.73% cases had perfect induction in the self-selected video distraction group, compared to 68.29% in the information-based video group. After 15 days of postoperative follow-up, participants in the self-selected video group had a larger proportion of negative outcomes (53.7%) compared to information-based video group (31.7%), p = .044. CONCLUSION: Information-based technique using smart phone is non inferior to self-selected video-based distraction-based technique in decreasing PA with an additional advantage of decreasing postoperative short-term negative outcomes. TRIAL REGISTRATION: CTRI identifier: CTRI/2020/03/023884.

17.

Video Recording Patients for Direct Care Purposes: Systematic Review and Narrative Synthesis of International Empirical Studies and UK Professional Guidance.

Lear, Rachael; Ellis, Sophia; Ollivierre-Harris, Tiffany; Long, Susannah; Mayer, Erik K.

J Med Internet Res ; 25: e46478, 2023 08 16.

Artigo em Inglês | MEDLINE | ID: mdl-37585249

RESUMO

BACKGROUND: Video recordings of patients may offer advantages to supplement patient assessment and clinical decision-making. However, little is known about the practice of video recording patients for direct care purposes. OBJECTIVE: We aimed to synthesize empirical studies published internationally to explore the extent to which video recording patients is acceptable and effective in supporting direct care and, for the United Kingdom, to summarize the relevant guidance of professional and regulatory bodies. METHODS: Five electronic databases (MEDLINE, Embase, APA PsycINFO, CENTRAL, and HMIC) were searched from 2012 to 2022. Eligible studies evaluated an intervention involving video recording of adult patients (≥18 years) to support diagnosis, care, or treatment. All study designs and countries of publication were included. Websites of UK professional and regulatory bodies were searched to identify relevant guidance. The acceptability of video recording patients was evaluated using study recruitment and retention rates and a framework synthesis of patients' and clinical staff's perspectives based on the Theoretical Framework of Acceptability by Sekhon. Clinically relevant measures of impact were extracted and tabulated according to the study design. The framework approach was used to synthesize the reported ethico-legal considerations, and recommendations of professional and regulatory bodies were extracted and tabulated. RESULTS: Of the 14,221 abstracts screened, 27 studies met the inclusion criteria. Overall, 13 guidance documents were retrieved, of which 7 were retained for review. The views of patients and clinical staff (16 studies) were predominantly positive, although concerns were expressed about privacy, technical considerations, and integrating video recording into clinical workflows; some patients were anxious about their physical appearance. The mean recruitment rate was 68.2% (SD 22.5%; range 34.2%-100%; 12 studies), and the mean retention rate was 73.3% (SD 28.6%; range 16.7%-100%; 17 studies). Regarding effectiveness (10 studies), patients and clinical staff considered video recordings to be valuable in supporting assessment, care, and treatment; in promoting patient engagement; and in enhancing communication and recall of information. Observational studies (n=5) favored video recording, but randomized controlled trials (n=5) did not demonstrate that video recording was superior to the controls. UK guidelines are consistent in their recommendations around consent, privacy, and storage of recordings but lack detailed guidance on how to operationalize these recommendations in clinical practice. CONCLUSIONS: Video recording patients for direct care purposes appears to be acceptable, despite concerns about privacy, technical considerations, and how to incorporate recording into clinical workflows. Methodological quality prevents firm conclusions from being drawn; therefore, pragmatic trials (particularly in older adult care and the movement disorders field) should evaluate the impact of video recording on diagnosis, treatment monitoring, patient-clinician communication, and patient safety. Professional and regulatory documents should signpost to practical guidance on the implementation of video recording in routine practice. TRIAL REGISTRATION: PROSPERO CRD42022331825: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=331825.

Assuntos

Comunicação , Participação do Paciente , Humanos , Idoso , Pesquisa Empírica , Narração , Tomada de Decisão Clínica

18.

A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model.

Li, Guizhu; Fu, Min; Sun, Mengnan; Liu, Xuefeng; Zheng, Bing.

Sensors (Basel) ; 23(21)2023 Oct 27.

Artigo em Inglês | MEDLINE | ID: mdl-37960477

RESUMO

The cocktail party problem can be more effectively addressed by leveraging the speaker's visual and audio information. This paper proposes a method to improve the audio's separation using two visual cues: facial features and lip movement. Firstly, residual connections are introduced in the audio separation module to extract detailed features. Secondly, considering the video stream contains information other than the face, which has a minimal correlation with the audio, an attention mechanism is employed in the face module to focus on crucial information. Then, the loss function considers the audio-visual similarity to take advantage of the relationship between audio and visual completely. Experimental results on the public VoxCeleb2 dataset show that the proposed model significantly enhanced SDR, PSEQ, and STOI, especially 4 dB improvements in SDR.

Assuntos

Percepção da Fala , Fala , Lábio , Movimento , Sinais (Psicologia)

19.

Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild.

He, Yibo; Seng, Kah Phooi; Ang, Li Minn.

Sensors (Basel) ; 23(4)2023 Feb 07.

Artigo em Inglês | MEDLINE | ID: mdl-36850432

RESUMO

This paper investigates multimodal sensor architectures with deep learning for audio-visual speech recognition, focusing on in-the-wild scenarios. The term "in the wild" is used to describe AVSR for unconstrained natural-language audio streams and video-stream modalities. Audio-visual speech recognition (AVSR) is a speech-recognition task that leverages both an audio input of a human voice and an aligned visual input of lip motions. However, since in-the-wild scenarios can include more noise, AVSR's performance is affected. Here, we propose new improvements for AVSR models by incorporating data-augmentation techniques to generate more data samples for building the classification models. For the data-augmentation techniques, we utilized a combination of conventional approaches (e.g., flips and rotations), as well as newer approaches, such as generative adversarial networks (GANs). To validate the approaches, we used augmented data from well-known datasets (LRS2-Lip Reading Sentences 2 and LRS3) in the training process and testing was performed using the original data. The study and experimental results indicated that the proposed AVSR model and framework, combined with the augmentation approach, enhanced the performance of the AVSR framework in the wild for noisy datasets. Furthermore, in this study, we discuss the domains of automatic speech recognition (ASR) architectures and audio-visual speech recognition (AVSR) architectures and give a concise summary of the AVSR models that have been proposed.

Assuntos

Aprendizado Profundo , Percepção da Fala , Humanos , Fala , Idioma

20.

Off-Screen Sound Separation Based on Audio-visual Pre-training Using Binaural Audio.

Yoshida, Masaki; Togo, Ren; Ogawa, Takahiro; Haseyama, Miki.

Sensors (Basel) ; 23(9)2023 May 07.

Artigo em Inglês | MEDLINE | ID: mdl-37177744

RESUMO

This study proposes a novel off-screen sound separation method based on audio-visual pre-training. In the field of audio-visual analysis, researchers have leveraged visual information for audio manipulation tasks, such as sound source separation. Although such audio manipulation tasks are based on correspondences between audio and video, these correspondences are not always established. Specifically, sounds coming from outside a screen have no audio-visual correspondences and thus interfere with conventional audio-visual learning. The proposed method separates such off-screen sounds based on their arrival directions using binaural audio, which provides us with three-dimensional sensation. Furthermore, we propose a new pre-training method that can consider the off-screen space and use the obtained representation to improve off-screen sound separation. Consequently, the proposed method can separate off-screen sounds irrespective of the direction from which they arrive. We conducted our evaluation using generated video data to circumvent the problem of difficulty in collecting ground truth for off-screen sounds. We confirmed the effectiveness of our methods through off-screen sound detection and separation tasks.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA