Results 1 - 20 of 901
1.
Front Psychol ; 15: 1399084, 2024.
Article in English | MEDLINE | ID: mdl-39380752

ABSTRACT

This review examines how visual information enhances speech perception in individuals with hearing loss, focusing on the impact of age, linguistic stimuli, and specific hearing loss factors on the effectiveness of audiovisual (AV) integration. While existing studies offer varied and sometimes conflicting findings regarding the use of visual cues, our analysis shows that these key factors can distinctly shape AV speech perception outcomes. For instance, younger individuals and those who receive early intervention tend to benefit more from visual cues, particularly when linguistic complexity is lower. Additionally, languages with dense phoneme spaces demonstrate a higher dependency on visual information, underscoring the importance of tailoring rehabilitation strategies to specific linguistic contexts. By considering these influences, we highlight areas where understanding is still developing and suggest how personalized rehabilitation strategies and supportive systems could be tailored to better meet individual needs. Furthermore, this review brings attention to important aspects that warrant further investigation, aiming to refine theoretical models and contribute to more effective, customized approaches to hearing rehabilitation.

2.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39356327

ABSTRACT

Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. Most existing clustering methods treat it as a typical multi-modal dataset without further processing, and the few methods that do perform recovery before clustering do not sufficiently exploit the underlying data structure, leading to suboptimal outcomes. Additionally, existing cross-modal information fusion strategies do not ensure consistency of representations across different modalities, potentially integrating conflicting information and degrading performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we develop a data augmentation technique based on neighborhood similarity that iteratively imposes rank constraints on the Laplacian matrix, thereby updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets demonstrate this method's superior effectiveness in single-cell clustering tasks.


Subjects
Single-Cell Analysis, Cluster Analysis, Single-Cell Analysis/methods, Humans, Algorithms, Tumor Microenvironment, Computational Biology/methods
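
The abstract above describes aligning modality-specific representations with a consistent (fused) one via contrastive learning. As a loose illustration only, not the authors' code, the sketch below shows an InfoNCE-style consistency loss between two single-cell modalities; all names (e.g., `z_rna`, `z_protein`), dimensions, and the temperature are assumptions.

```python
# Illustrative sketch: contrastive alignment of modality-specific embeddings
# with a fused "consistent" embedding. Shapes and hyperparameters are assumed.
import torch
import torch.nn.functional as F

def consistency_contrastive_loss(z_rna, z_protein, temperature=0.2):
    """Pull each cell's modality-specific embeddings toward the fused embedding
    of the same cell and push them away from other cells (InfoNCE-style)."""
    z_fused = F.normalize((z_rna + z_protein) / 2, dim=1)   # consensus representation
    loss = 0.0
    for z in (F.normalize(z_rna, dim=1), F.normalize(z_protein, dim=1)):
        logits = z @ z_fused.t() / temperature               # (N, N) similarity matrix
        targets = torch.arange(z.size(0))                    # positives on the diagonal
        loss = loss + F.cross_entropy(logits, targets)
    return loss / 2

# Toy usage with random embeddings for 128 cells
z_rna, z_protein = torch.randn(128, 32), torch.randn(128, 32)
print(consistency_contrastive_loss(z_rna, z_protein).item())
```
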
3.
Trends Cogn Sci ; 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39368906

ABSTRACT

Many magic tricks rely solely on vision, but there are few, if any, that rely on auditory perception alone. Here, we question why this is so and argue that research focusing on this issue could provide deeper theoretical insights into the similarities and differences between our senses.

4.
Front Neurosci ; 18: 1411058, 2024.
Article in English | MEDLINE | ID: mdl-39224575

ABSTRACT

Objective: The aim of this study is to explore changes in cross-modal reorganization within the auditory-visual cortex after cochlear implantation and to examine their influence on auditory and speech functions along with the underlying mechanisms. Methods: Twenty prelingually deaf children who received cochlear implantation and rehabilitation training at our hospital between February 2022 and February 2023 comprised the prelingually deaf group, and 20 healthy children served as the control group. The prelingually deaf group underwent assessment of cortical activity and evaluation of auditory-speech recovery pre-surgery, at postoperative weeks 1 and 2, and at months 1, 3, 6, 9, and 12. The control group underwent parallel assessments and evaluations. We analyzed the correlation between activity in the auditory-visual cortex of patients and their auditory-speech functional recovery. Results: The prelingually deaf group displayed elevated auditory and visual cortical electromagnetic intensity compared to the control group, both prior to surgery and at 9 months after surgery; however, by the 12-month mark post-surgery there was no discernible difference between the two groups. Following surgery, the prelingually deaf group exhibited progressive improvement in both Categories of Auditory Performance (CAP) and Speech Intelligibility Rating (SIR), initially lagging behind the control group. Notably, a negative correlation emerged between auditory and visual cortical electromagnetic intensity values and CAP/SIR scores at the 12-month post-surgery assessment. Conclusion: Cochlear implantation in prelingually deaf children results in elevated activity within the auditory and visual cortices, demonstrated by heightened electromagnetic intensity readings. Cross-modal reorganization is observed temporarily at 3 months post-surgery and resolves to baseline levels by 12 months post-surgery. This reversal correlates with the restoration of auditory and speech functions in these children.

5.
Food Res Int ; 194: 114889, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39232524

ABSTRACT

The influence of extrinsic hand-feel touch cues on consumer experiences in food and beverage consumption is well established. However, their impact on trigeminal perception, particularly the oral irritation caused by capsaicin or spicy foods, is less understood. This study aimed to determine whether cross-modal associations exist between hand-feel touch and capsaicin-induced oral irritation, and whether any such associations are driven by the sensory contributions of the hand-feel tactile materials (measured by instrumental physical parameters) or by affective responses (evaluated by consumers through hedonic scales and the self-reported EsSense Profile® emotion questionnaire). In our study, 96 participants tasted a capsaicin solution while engaging with nine hand-feel tactile materials: cardboard, linen, rattan, silicone, stainless steel, fine sandpaper, rough sandpaper, sponge, and towel. They subsequently rated their liking and emotional responses, the perceived intensity of oral irritation, and the congruency between the hand-feel tactile sensation and the oral irritation. Instrumental measurements characterized the surface texture of the hand-feel tactile materials and were correlated with the collected sensory data. The results revealed unique cross-modal associations between hand-feel touch and capsaicin-induced oral irritation: the sandpapers were highly congruent with the sensation of oral irritation, whereas stainless steel was least congruent. These associations were influenced both by the common emotional responses ("active," "aggressive," "daring," "energetic," "guilty," and "worried") evoked by the hand-feel tactile materials and the capsaicin, and by participants' liking for the hand-feel tactile materials and the characteristics of their surface textures. This study provides empirical evidence of cross-modality between hand-feel tactile sensations and capsaicin-induced oral irritation, opening new avenues for future research in this area.


Subjects
Capsaicin, Touch, Humans, Capsaicin/adverse effects, Female, Male, Adult, Young Adult, Hand, Taste, Adolescent, Emotions, Touch Perception, Middle Aged
6.
Neural Netw ; 180: 106718, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39293179

ABSTRACT

With the rapid advent and abundance of remote sensing data in different modalities, cross-modal retrieval tasks have gained importance in the research community. Cross-modal retrieval refers to the research paradigm in which the query is of one modality and the retrieved output is of another. In this paper, the remote sensing (RS) data modalities considered are earth observation optical data (aerial photos) and the corresponding hand-drawn sketches. The main challenge of cross-modal retrieval for optical remote sensing images and the corresponding sketches is the distribution gap in the shared embedding space of the modalities. Prior attempts to resolve this issue have not yielded satisfactory outcomes in accurately retrieving cross-modal sketch-image RS data. State-of-the-art architectures have used conventional convolutional backbones, which focus on local pixel-wise information about the modalities to be retrieved. This limits the interaction between the sketch texture and the corresponding image, making these models susceptible to overfitting datasets with particular scenarios. To circumvent this limitation, we propose establishing multi-modal correspondence using SPCA-Net, a novel architecture combining self- and cross-attention, to minimize the modality gap by employing attention mechanisms for the query and other modalities. Efficient cross-modal retrieval is achieved through the suggested attention architecture, which empirically emphasizes the global information of the relevant query modality and bridges the domain gap through a unique pairwise cross-attention network. In addition to the novel architecture, this paper introduces a unique loss function, a label-specific supervised contrastive loss, tailored to the intricacies of the task to enhance the discriminative power of the learned embeddings. Extensive evaluations are conducted on two sketch-image remote sensing datasets, Earth-on-Canvas and RSketch. Under the same experimental conditions, the proposed model beats state-of-the-art architectures by significant margins of 16.7%, 18.9%, 33.7%, and 40.9%, respectively.
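
The entry above mentions a label-specific supervised contrastive loss. The following sketch illustrates only the generic supervised contrastive form (samples sharing a class label are positives in a joint sketch/image embedding space); the paper's label-specific variant is not reproduced, and the temperature and shapes are assumptions.

```python
# Hedged sketch of a generic supervised contrastive loss over a shared
# sketch/image embedding space; not the SPCA-Net implementation.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                            # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    masked_log_prob = log_prob.masked_fill(~pos_mask, 0.0)   # keep positive pairs only
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -masked_log_prob.sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()                  # skip anchors with no positive

# Toy usage: 8 embeddings (sketches and images mixed) from 3 scene classes
emb = torch.randn(8, 64)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(emb, labels).item())
```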

7.
Data Brief ; 56: 110836, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39263230

ABSTRACT

Humans primarily understand the world around them through visual perception and touch; as a result, visual and tactile information play crucial roles in the interaction between humans and their environment. To establish a correlation between what is seen and what is felt on the same object, particularly for flexible objects (such as textile, leather, and skin) whose quality humans often assess by combining sight and touch, a new dataset that includes both visual and tactile information is needed. This has motivated us to create a dataset that combines visual images and corresponding tactile data to explore the potential of cross-modal data fusion. We have chosen leather as our object of focus due to its widespread use in everyday life. The proposed dataset consists of visual images depicting leather in various colours and displaying defects, alongside corresponding tactile data collected from the same region of the leather. Notably, the tactile data comprise components along the X, Y, and Z axes. To effectively demonstrate the relationship between visual and tactile data on the same object region, the tactile data are aligned with the visual data and visualized through interpolation. Considering potential applications in computer vision, we have manually labelled the defect regions in each visual-tactile sample. Ultimately, the dataset comprises a total of 687 records. Each record includes visual images, image representations of the tactile data (referred to as tactile images for simplicity), and segmentation images highlighting the defect regions, all at the same resolution.
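
The dataset description above mentions aligning tactile data with the visual image and visualizing it through interpolation. The sketch below shows one plausible way to rasterize scattered tactile readings onto the paired image's pixel grid; the array layouts, field names, and interpolation settings are assumptions, not the dataset's actual pipeline.

```python
# Illustrative sketch: interpolate sparse tactile samples (x, y positions with
# a per-axis reading) onto an image-sized grid to form a "tactile image".
import numpy as np
from scipy.interpolate import griddata

def tactile_to_image(points_xy, values, height, width):
    """Interpolate scattered tactile readings onto a (height, width) grid."""
    grid_y, grid_x = np.mgrid[0:height, 0:width]
    tactile_img = griddata(points_xy, values, (grid_x, grid_y),
                           method='linear', fill_value=0.0)
    return tactile_img.astype(np.float32)

# Toy usage: 500 random tactile samples mapped onto a 256x256 tactile image
rng = np.random.default_rng(0)
pts = rng.uniform(0, 256, size=(500, 2))    # (x, y) contact positions in pixels
vals = rng.normal(size=500)                 # e.g. the Z-axis tactile component
print(tactile_to_image(pts, vals, 256, 256).shape)   # (256, 256)
```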

8.
Sensors (Basel) ; 24(18)2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39338607

ABSTRACT

Multimodal emotion classification (MEC) involves analyzing and identifying human emotions by integrating data from multiple sources, such as audio, video, and text. This approach leverages the complementary strengths of each modality to enhance the accuracy and robustness of emotion recognition systems. However, one significant challenge is effectively integrating these diverse data sources, each with unique characteristics and levels of noise. Additionally, the scarcity of large, annotated multimodal datasets in Bangla limits the training and evaluation of models. In this work, we unveiled a pioneering multimodal Bangla dataset, MAViT-Bangla (Multimodal Audio Video Text Bangla dataset). This dataset, comprising 1002 samples across audio, video, and text modalities, is a unique resource for emotion recognition studies in the Bangla language. It features emotional categories such as anger, fear, joy, and sadness, providing a comprehensive platform for research. Additionally, we developed a framework for audio, video and textual emotion recognition (i.e., AVaTER) that employs a cross-modal attention mechanism among unimodal features. This mechanism fosters the interaction and fusion of features from different modalities, enhancing the model's ability to capture nuanced emotional cues. The effectiveness of this approach was demonstrated by achieving an F1-score of 0.64, a significant improvement over unimodal methods.


Subjects
Emotions, Emotions/physiology, Humans, Video Recording/methods, Attention/physiology
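
The AVaTER entry above describes a cross-modal attention mechanism among unimodal features. The following is a minimal sketch of that general idea, one modality's features attending to another's before fusion; layer choices, dimensions, and naming are assumptions rather than the published architecture.

```python
# Minimal sketch of a cross-modal attention block: query modality attends to
# a context modality, with a residual connection and layer norm.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, context_feats):
        # query_feats: (B, Tq, D), context_feats: (B, Tc, D)
        attended, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + attended)   # residual + norm

# Toy usage: text tokens attending to audio frames
text = torch.randn(2, 20, 256)     # (batch, text tokens, dim)
audio = torch.randn(2, 100, 256)   # (batch, audio frames, dim)
fused = CrossModalAttention()(text, audio)
print(fused.shape)                 # torch.Size([2, 20, 256])
```
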
9.
J Neurophysiol ; 132(4): 1183-1197, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39258775

ABSTRACT

Adaptation of reactive saccades (RS), made toward the sudden appearance of stimuli in our environment, is a plastic mechanism thought to occur at the motor level of saccade generation. As saccadic oculomotor commands integrate multisensory information in the parietal cortex and superior colliculus, adaptation of RS should occur not only toward visual but also tactile targets. In addition, saccadic adaptation in one modality (vision or touch) should transfer cross-modally. To test these predictions, we used the double-step target paradigm to adapt rightward saccades made at two different eccentricities toward the participants' index and middle fingers, identified either visually (experiment 1) or tactually (experiment 2). In each experiment, the rate of adaptation induced for the adapted modality and the rate of adaptation transfer to the nonadapted modality were compared with those measured in a control (no adaptation) session. Results revealed that touch-triggered RS can be adapted as well as visually triggered ones. Moreover, the transfer pattern was asymmetric: visual saccadic adaptation transferred fully to tactile saccades, whereas tactile saccadic adaptation, despite full generalization to nonadapted fingers, transferred only partially to visual saccades. These findings show that, in the case of tactile saccades, adaptation can be elicited in the absence of postsaccadic visual feedback. In addition, the asymmetric adaptation transfer across sensory modalities suggests that the adaptation locus for tactile saccades may lie in part upstream of the final motor pathway common to all saccades. These findings provide new insights into both the functional locus (or loci) and the error signals of RS adaptation. NEW & NOTEWORTHY The present study revealed that, as predicted from a large literature, adaptation of visual reactive saccades transfers to tactile saccades of the same and neighboring amplitudes. Furthermore, in a modified double-step target paradigm, tactile saccades exposed to repeated errors adapt with a similar rate and spatial generalization as visual saccades, but this adaptation transfers only slightly to visual saccades. These findings bring new information on saccadic adaptation processes.


Subjects
Physiological Adaptation, Saccades, Visual Perception, Saccades/physiology, Humans, Physiological Adaptation/physiology, Male, Female, Adult, Visual Perception/physiology, Young Adult, Touch Perception/physiology
10.
PeerJ Comput Sci ; 10: e2260, 2024.
Article in English | MEDLINE | ID: mdl-39314711

ABSTRACT

Point clouds are highly regarded in the field of 3D object detection for their superior geometric properties and versatility. However, object occlusion and defects in scanning equipment frequently result in sparse and missing data within point clouds, adversely affecting the final prediction. Recognizing the synergistic potential between the rich semantic information present in images and the geometric data in point clouds for scene representation, we introduce a two-stage fusion framework (TSFF) for 3D object detection. To address the corruption of geometric information in point clouds caused by object occlusion, we augment point features with image features, thereby enhancing the reference factor of the point cloud during the voting bias phase. Furthermore, we implement a constrained fusion module that selectively samples voting points using a 2D bounding box, integrating valuable image features while reducing the impact of background points in sparse scenes. Our methodology was evaluated on the SUN RGB-D dataset, where it achieved a 3.6-point mean average precision (mAP) improvement over the baseline under the mAP@0.25 evaluation criterion. Compared with other leading 3D object detection methods, our method also performed well on the detection of certain object categories.
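
The entry above describes augmenting point features with image features. As a rough, assumed illustration of that general step (not the TSFF implementation), the sketch below projects 3D points through a pinhole camera model, samples the image feature map at the resulting pixels, and concatenates the sampled features with the point features.

```python
# Hedged sketch: augment per-point features with image features sampled at the
# points' projected pixel locations. Camera model and shapes are assumptions.
import torch

def augment_points_with_image_feats(points, point_feats, img_feats, intrinsics):
    """points: (N, 3) camera-frame XYZ; point_feats: (N, Cp);
    img_feats: (Ci, H, W); intrinsics: (3, 3). Returns (N, Cp + Ci)."""
    uvw = points @ intrinsics.t()                     # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).round().long()
    v = (uvw[:, 1] / uvw[:, 2]).round().long()
    _, H, W = img_feats.shape
    u = u.clamp(0, W - 1)
    v = v.clamp(0, H - 1)
    sampled = img_feats[:, v, u].t()                  # (N, Ci) nearest-pixel lookup
    return torch.cat([point_feats, sampled], dim=1)

# Toy usage
pts = torch.rand(1024, 3) + torch.tensor([0.0, 0.0, 1.0])   # keep depth positive
pfeat = torch.randn(1024, 64)
ifeat = torch.randn(32, 120, 160)
K = torch.tensor([[100.0, 0.0, 80.0], [0.0, 100.0, 60.0], [0.0, 0.0, 1.0]])
print(augment_points_with_image_feats(pts, pfeat, ifeat, K).shape)  # (1024, 96)
```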

11.
Entropy (Basel) ; 26(9)2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39330065

ABSTRACT

Weakly supervised temporal language grounding (TLG) aims to locate events in untrimmed videos based on natural language queries without temporal annotations, necessitating a deep understanding of semantic context across both video and text modalities. Existing methods often focus on simple correlations between query phrases and isolated video segments, neglecting the event-oriented semantic coherence and consistency required for accurate temporal grounding. This can lead to misleading results due to partial frame correlations. To address these limitations, we propose the Event-oriented State Alignment Network (ESAN), which constructs "start-event-end" semantic state sets for both textual and video data. ESAN employs relative entropy for cross-modal alignment through knowledge distillation from pre-trained large models, thereby enhancing semantic coherence within each modality and ensuring consistency across modalities. Our approach leverages vision-language models to extract static frame semantics and large language models to capture dynamic semantic changes, facilitating a more comprehensive understanding of events. Experiments conducted on two benchmark datasets demonstrate that ESAN significantly outperforms existing methods. By reducing false high correlations and improving the overall performance, our method effectively addresses the challenges posed by previous approaches. These advancements highlight the potential of ESAN to improve the precision and reliability of temporal language grounding tasks.
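
The ESAN entry above mentions using relative entropy for cross-modal alignment through knowledge distillation from pre-trained large models. The sketch below shows only the generic relative-entropy (KL) distillation term for per-segment state distributions; the "start-event-end" state sets, temperature, and shapes are assumptions.

```python
# Rough sketch of a KL (relative entropy) distillation loss between a student's
# and a pre-trained teacher's per-segment state distributions.
import torch
import torch.nn.functional as F

def kd_alignment_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over per-segment semantic-state scores."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * temperature ** 2

# Toy usage: 16 video segments, 3 assumed semantic states (start, event, end)
student = torch.randn(16, 3)
teacher = torch.randn(16, 3)
print(kd_alignment_loss(student, teacher).item())
```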

12.
Neural Netw ; 180: 106751, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39332209

ABSTRACT

Though depth images can provide supplementary spatial structural cues for the salient object detection (SOD) task, inappropriate utilization of depth features may introduce noisy or misleading features, which can severely degrade SOD performance. To address this issue, we propose a depth mask guiding network (DMGNet) for RGB-D SOD. In this network, a depth mask guidance module (DMGM) is designed to pre-segment the salient objects from depth images and then create masks from the pre-segmented objects to guide the RGB subnetwork in extracting more discriminative features. Furthermore, a feature fusion pyramid module (FFPM) is employed to acquire more informative fused features using multi-branch convolutional channels with varying receptive fields, further enhancing the fusion of cross-modal features. Extensive experiments on nine benchmark datasets demonstrate the effectiveness of the proposed network.
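
The entry above uses a depth-derived mask to guide RGB feature extraction. The sketch below illustrates only the bare gating idea with a crude per-image depth threshold standing in for the paper's learned pre-segmentation module; all shapes and the keep ratio are assumptions.

```python
# Minimal illustration of mask guidance: a depth-derived saliency mask
# multiplicatively gates RGB feature maps (learned DMGM replaced by a threshold).
import torch
import torch.nn.functional as F

def depth_mask_gate(rgb_feats, depth, keep_ratio=0.3):
    """rgb_feats: (B, C, H, W); depth: (B, 1, H0, W0) normalized to [0, 1]."""
    d = F.interpolate(depth, size=rgb_feats.shape[-2:], mode='bilinear',
                      align_corners=False)
    thresh = torch.quantile(d.flatten(1), 1 - keep_ratio, dim=1)   # per-image cutoff
    mask = (d >= thresh.view(-1, 1, 1, 1)).float()                 # candidate salient regions
    return rgb_feats * (0.5 + 0.5 * mask)      # attenuate, rather than erase, background

# Toy usage
rgb = torch.randn(2, 64, 56, 56)
depth = torch.rand(2, 1, 224, 224)
print(depth_mask_gate(rgb, depth).shape)   # torch.Size([2, 64, 56, 56])
```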

13.
Brain Res ; 1844: 149137, 2024 Dec 01.
Article in English | MEDLINE | ID: mdl-39103069

ABSTRACT

Chronic neuropathic pain and chronic tinnitus have both been likened to phantom percepts, in which complete or partial sensory deafferentation results in a filling-in of the missing information derived from memory. One hundred and fifty participants (50 with tinnitus, 50 with chronic pain, and 50 healthy controls) underwent resting-state EEG. Source-localized current density was recorded from all the sensory cortices (olfactory, gustatory, somatosensory, auditory, vestibular, visual) as well as the parahippocampal area. Functional connectivity, by means of lagged phase synchronization, was also computed between these regions of interest. Pain and tinnitus were associated with gamma-band activity, reflecting prediction errors, in all sensory cortices except the olfactory and gustatory cortices. Functional connectivity analysis identified theta-frequency connectivity between each of the sensory cortices (except the chemical senses) and the parahippocampus, but not between the individual sensory cortices. When one sensory domain is deprived, the other senses may provide the parahippocampal 'contextual' area with the most likely sound or somatosensory sensation to fill in the gap, applying an abductive 'duck test' approach, i.e., one based on stored multisensory congruence. This novel concept paves the way for developing new treatments for pain and tinnitus using multisensory (i.e., visual, vestibular, somatosensory, auditory) modulation with or without associated parahippocampal targeting.


Subjects
Electroencephalography, Neuralgia, Tinnitus, Tinnitus/physiopathology, Humans, Neuralgia/physiopathology, Female, Male, Middle Aged, Electroencephalography/methods, Adult, Brain/physiopathology, Aged, Chronic Pain/physiopathology
14.
Neural Netw ; 179: 106587, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39111160

ABSTRACT

Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep learning based sign language recognition methods usually rely on large-scale training data and rich supervision. However, current sign language datasets are limited, and they are annotated only at the sentence level rather than the frame level. Inadequate supervision of sign language data poses a serious challenge for sign language recognition and may result in insufficient training of recognition models. To address the above problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition, which contains two teacher models and one student model. One teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The other teacher is the Text2Gloss translation teacher model, which translates a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, which is a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily, and QSL. The results show that the proposed cross-modal knowledge distillation method can effectively improve sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.


Subjects
Deep Learning, Sign Language, Humans, Neural Networks (Computer), Distillation/methods
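
The entry above trains one student against soft labels from two teacher models. The sketch below shows only the generic two-teacher soft-label distillation recipe; the loss weights, shapes, and the sentence-level cross-entropy stand-in (the real CSLR objective, e.g. CTC over frame sequences, is more involved) are assumptions.

```python
# Hedged sketch: student loss = supervised term + averaged KL terms against two
# teachers' soft labels. Not the paper's implementation.
import torch
import torch.nn.functional as F

def two_teacher_distillation_loss(student_logits, t1_logits, t2_logits,
                                  gloss_targets, alpha=0.5, temperature=2.0):
    # supervised term on gloss labels
    ce = F.cross_entropy(student_logits, gloss_targets)
    # soft-label terms from the two teachers (e.g. Sign2Text- and Text2Gloss-style)
    log_s = F.log_softmax(student_logits / temperature, dim=-1)
    kd = sum(F.kl_div(log_s, F.softmax(t / temperature, dim=-1),
                      reduction='batchmean')
             for t in (t1_logits, t2_logits)) * temperature ** 2 / 2
    return (1 - alpha) * ce + alpha * kd

# Toy usage: batch of 4, assumed gloss vocabulary of 1000
s = torch.randn(4, 1000)
t1, t2 = torch.randn(4, 1000), torch.randn(4, 1000)
y = torch.randint(0, 1000, (4,))
print(two_teacher_distillation_loss(s, t1, t2, y).item())
```
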
15.
Eur J Neurosci ; 60(7): 5621-5657, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39192569

ABSTRACT

The ventral posterolateral nucleus (VPL), being categorized as the first-order thalamic nucleus, is considered to be dedicated to uni-modal somatosensory processing. Cross-modal sensory interactions on thalamic reticular nucleus cells projecting to the VPL, on the other hand, suggest that VPL cells are subject to cross-modal sensory influences. To test this possibility, the effects of auditory or visual stimulation on VPL cell activities were examined in anaesthetized rats, using juxta-cellular recording and labelling techniques. Recordings were obtained from 70 VPL cells, including 65 cells responsive to cutaneous electrical stimulation of the hindpaw. Auditory or visual alone stimulation did not elicit cell activity except in three bi-modal cells and one auditory cell. Cross-modal alterations of somatosensory response by auditory and/or visual stimulation were recognized in 61 cells with regard to the response magnitude, latency (time and jitter) and/or burst spiking properties. Both early (onset) and late responses were either suppressed or facilitated, and de novo cell activity was also induced. Cross-modal alterations took place depending on the temporal interval between the preceding counterpart and somatosensory stimulations, the intensity and frequency of sound. Alterations were observed mostly at short intervals (< 200 ms) and up to 800 ms intervals. Sounds of higher intensities and lower frequencies were more effective for modulation. The susceptibility to cross-modal influences was related to cell location and/or morphology. These and previously reported similar findings in the auditory and visual thalamic nuclei suggest that cross-modal sensory interactions pervasively take place in the first-order sensory thalamic nuclei.


Subjects
Acoustic Stimulation, Photic Stimulation, Animals, Rats, Male, Photic Stimulation/methods, Electric Stimulation, Wistar Rats, Neurons/physiology, Auditory Perception/physiology, Ventral Thalamic Nuclei/physiology, Thalamic Nuclei/physiology, Action Potentials/physiology, Visual Perception/physiology
16.
Sensors (Basel) ; 24(15)2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39123907

ABSTRACT

Skeleton-based action recognition, renowned for its computational efficiency and robustness to lighting variations, has become a focal point in the realm of motion analysis. However, most current methods extract only global skeleton features, overlooking the potential semantic relationships among partial limb motions. For instance, the subtle differences between actions such as "brush teeth" and "brush hair" are mainly distinguished by specific partial limb motions. Although combining limb movements provides a more holistic representation of an action, relying solely on skeleton points proves inadequate for capturing these nuances. This motivates us to integrate fine-grained language descriptions into the learning of skeleton features to capture more discriminative skeleton behavior representations. To this end, we introduce a new Linguistic-Driven Partial Semantic Relevance Learning framework (LPSR). We use state-of-the-art large language models to generate linguistic descriptions of local limb motions and to further constrain the learning of local motions, and we also aggregate global skeleton point representations and textual representations (generated by an LLM) to obtain a more generalized cross-modal behavioral representation. On this basis, we propose a cyclic attentional interaction module to model the implicit correlations between partial limb motions. Extensive ablation experiments demonstrate the effectiveness of the proposed method, which also achieves state-of-the-art results.


Subjects
Semantics, Humans, Linguistics, Movement/physiology, Automated Pattern Recognition/methods, Algorithms, Learning/physiology
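
The LPSR entry above constrains local limb-motion features with LLM-generated descriptions. As a loose, assumed illustration of that constraint (not the LPSR implementation), the sketch below pulls each part-level skeleton feature toward the embedding of its generated description via cosine similarity.

```python
# Loose sketch: align part-level skeleton features with text embeddings of
# their generated descriptions. Encoders, dimensions, and pairing are assumed.
import torch
import torch.nn.functional as F

def part_text_alignment_loss(part_feats, part_text_embeds):
    """part_feats, part_text_embeds: (B, P, D), one row per limb part."""
    cos = F.cosine_similarity(part_feats, part_text_embeds, dim=-1)   # (B, P)
    return (1 - cos).mean()     # zero when every part matches its description

# Toy usage: 4 actions, 5 limb parts, shared 256-d embedding space
parts = torch.randn(4, 5, 256)
texts = torch.randn(4, 5, 256)
print(part_text_alignment_loss(parts, texts).item())
```
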
17.
Front Psychol ; 15: 1353490, 2024.
Article in English | MEDLINE | ID: mdl-39156805

ABSTRACT

People can use their sense of hearing to discern thermal properties, though they are for the most part unaware that they can do so. While people unequivocally claim that they cannot perceive the temperature of pouring water from the sound of it being poured, our research further strengthens the evidence that they can. This multimodal ability is implicitly acquired in humans, likely through perceptual learning over a lifetime of exposure to differences in the physical attributes of pouring water. In this study, we explore people's perception of this intriguing cross-modal correspondence and investigate the psychophysical foundations of this complex ecological mapping by employing machine learning. Our results show that not only can humans in practice classify the auditory properties of pouring water, but the physical characteristics underlying this phenomenon can also be classified by a pre-trained deep neural network.

18.
Cogn Sci ; 48(8): e13486, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39155515

ABSTRACT

Research shows that high- and low-pitch sounds can be associated with various meanings. For example, high-pitch sounds are associated with small concepts, whereas low-pitch sounds are associated with large concepts. This study presents three experiments revealing that high-pitch sounds are also associated with open concepts and opening hand actions, while low-pitch sounds are associated with closed concepts and closing hand actions. In Experiment 1, this sound-meaning correspondence effect was shown using the two-alternative forced-choice task, while Experiments 2 and 3 used reaction time tasks to show this interaction. In Experiment 2, high-pitch vocalizations were found to facilitate opening hand gestures, and low-pitch vocalizations were found to facilitate closing hand gestures, when performed simultaneously. In Experiment 3, high-pitched vocalizations were produced particularly rapidly when the visual target stimulus presented an open object, and low-pitched vocalizations were produced particularly rapidly when the target presented a closed object. These findings are discussed concerning the meaning of intonational cues. They are suggested to be based on cross-modally representing conceptual spatial knowledge in sensory, motor, and affective systems. Additionally, this pitch-opening effect might share cognitive processes with other pitch-meaning effects.


Subjects
Reaction Time, Humans, Male, Female, Young Adult, Adult, Pitch Perception/physiology, Space Perception/physiology, Gestures, Sound, Acoustic Stimulation, Cues (Psychology)
19.
Sensors (Basel) ; 24(16)2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39205068

ABSTRACT

Referring video object segmentation (R-VOS) is a fundamental vision-language task which aims to segment the target referred by language expression in all video frames. Existing query-based R-VOS methods have conducted in-depth exploration of the interaction and alignment between visual and linguistic features but fail to transfer the information of the two modalities to the query vector with balanced intensities. Furthermore, most of the traditional approaches suffer from severe information loss in the process of multi-scale feature fusion, resulting in inaccurate segmentation. In this paper, we propose DCT, an end-to-end decoupled cross-modal transformer for referring video object segmentation, to better utilize multi-modal and multi-scale information. Specifically, we first design a Language-Guided Visual Enhancement Module (LGVE) to transmit discriminative linguistic information to visual features of all levels, performing an initial filtering of irrelevant background regions. Then, we propose a decoupled transformer decoder, using a set of object queries to gather entity-related information from both visual and linguistic features independently, mitigating the attention bias caused by feature size differences. Finally, the Cross-layer Feature Pyramid Network (CFPN) is introduced to preserve more visual details by establishing direct cross-layer communication. Extensive experiments have been carried out on A2D-Sentences, JHMDB-Sentences and Ref-Youtube-VOS. The results show that DCT achieves competitive segmentation accuracy compared with the state-of-the-art methods.
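
The DCT entry above includes a Language-Guided Visual Enhancement (LGVE) step that transmits linguistic information to visual features to filter background regions. The sketch below loosely mirrors that idea with a sentence embedding predicting channel-wise gates; the layers, dimensions, and gating scheme are assumptions rather than the published module.

```python
# Illustrative sketch: a sentence embedding gates visual feature channels to
# emphasize regions relevant to the referring expression.
import torch
import torch.nn as nn

class LanguageGuidedGate(nn.Module):
    def __init__(self, text_dim=512, vis_channels=256):
        super().__init__()
        self.to_gate = nn.Sequential(nn.Linear(text_dim, vis_channels), nn.Sigmoid())

    def forward(self, vis_feats, sent_embed):
        # vis_feats: (B, C, H, W); sent_embed: (B, text_dim)
        gate = self.to_gate(sent_embed).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return vis_feats * gate    # emphasize channels relevant to the expression

# Toy usage
vis = torch.randn(2, 256, 32, 32)
sent = torch.randn(2, 512)
print(LanguageGuidedGate()(vis, sent).shape)   # torch.Size([2, 256, 32, 32])
```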

20.
Laryngoscope ; 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39140234

ABSTRACT

OBJECTIVES: The relationship between the middle temporal gyrus (MTG) and occipital cortex in post-lingually deaf (PLD) individuals is unclear. This study aimed to investigate changes in the excitability of the MTG and occipital cortex, and their effects on the occipital cortex, in individuals with PLD after receiving a cochlear implant (CI). METHODS: Twenty-six individuals with severe-to-profound binaural sensorineural PLD were assessed clinically. Nine individuals had received a unilateral cochlear implant more than 6 months earlier, while 17 had not. Brodmann area 19 (BA19, extra-striate occipital cortex) and the MTG (an auditory-related cortical area) were selected as regions of interest (ROIs). The excitability of the ROIs was observed and compared between the surgery and no-surgery groups using resting-state functional near-infrared spectroscopy (fNIRS), and correlations between the connectivity of the MTG and occipital cortex and the time elapsed since CI surgery were investigated. RESULTS: fNIRS revealed enhanced global cortical connectivity of BA19 and the MTG on the operative side (p < 0.05), and the connectivity between BA19 and the MTG also increased (p < 0.05). The connectivity between the MTG and BA19 was positively correlated with the duration of cochlear implantation, as was that between the MTG and BA18. CONCLUSION: There was evidence of remodeling of the cerebral cortex: increased excitability was observed in the MTG and BA19, and their connectivity was enhanced, indicating a synergistic effect. Moreover, the MTG may further stimulate the visual cortex by strengthening their connectivity after CI. LEVEL OF EVIDENCE: 3 Laryngoscope, 2024.
