1.
Vis Comput Ind Biomed Art ; 7(1): 9, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38647624

ABSTRACT

With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.

2.
Article in English | MEDLINE | ID: mdl-38319762

ABSTRACT

With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. Combining SNNs with deep reinforcement learning (DRL) provides a promising energy-efficient approach to realistic control tasks. In this article, we focus on tasks in which the agent needs to learn multidimensional deterministic policies for control, which are very common in real scenarios. Recently, the surrogate gradient method has been utilized for training multilayer SNNs, which allows SNNs to achieve performance comparable to that of the corresponding deep networks in this task. Most existing spike-based reinforcement learning (RL) methods take the firing rate as the output of SNNs and convert it to represent a continuous action space (i.e., the deterministic policy) through a fully connected (FC) layer. However, because the firing rate is a decimal value, it introduces floating-point matrix operations into the FC layer, so the whole SNN cannot be deployed directly on neuromorphic hardware. To develop a fully spiking actor network (SAN) without any floating-point matrix operations, we draw inspiration from the nonspiking interneurons found in insects and employ the membrane voltage of the nonspiking neurons to represent the action. Before the nonspiking neurons, multiple population neurons are introduced to decode different dimensions of actions. Since each population is used to decode a dimension of action, we argue that the neurons in each population should be connected in the time domain and the space domain. Hence, intralayer connections are used in output populations to enhance the representation capacity. This mechanism exists extensively in animals and has been shown to be effective. Finally, we propose a fully SAN with intralayer connections (ILC-SAN). Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods on continuous control tasks from OpenAI gym. Moreover, we estimate the theoretical energy consumption when deploying ILC-SAN on neuromorphic chips to illustrate its high energy efficiency.
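
The decoding idea in this abstract, one output population per action dimension feeding a nonspiking neuron whose membrane voltage is read out as the action, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the leaky-integration rule, the `decay` value, and the choice to read the voltage at the final timestep are all assumptions made for illustration, and intralayer connections are omitted.

```python
from typing import List, Sequence


def nonspiking_voltage(
    spike_train: Sequence[Sequence[int]],
    weights: Sequence[float],
    decay: float = 0.5,
) -> float:
    """Membrane voltage of a neuron that integrates input but never fires or resets.

    spike_train holds one list of 0/1 spikes per timestep (hypothetical layout).
    """
    voltage = 0.0
    for step in spike_train:
        # Binary spikes select which synaptic weights are accumulated.
        current = sum(w for w, s in zip(weights, step) if s)
        # Assumed leaky integration; there is no threshold and no reset.
        voltage = decay * voltage + current
    return voltage


def decode_action(
    population_spikes: Sequence[Sequence[Sequence[int]]],
    population_weights: Sequence[Sequence[float]],
    decay: float = 0.5,
) -> List[float]:
    """Decode one action dimension per output population."""
    return [
        nonspiking_voltage(spikes, weights, decay)
        for spikes, weights in zip(population_spikes, population_weights)
    ]
```

For example, `decode_action([[[1, 0], [0, 1], [1, 1]]], [[0.5, -0.25]])` decodes a single action dimension from a two-neuron population observed over three timesteps.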

3.
Article in English | MEDLINE | ID: mdl-36215386

ABSTRACT

Modeling the interactive relationships of agents is critical to improving the collaborative capability of a multiagent system. Some methods model these relationships with predefined rules. However, because of nonstationarity, the interactive relationships change over time and cannot be well captured by rules. Other methods adopt a simple mechanism, such as an attention network, to select the neighbors with which the current agent should collaborate. However, in large-scale multiagent systems, collaborative relationships are too complicated to be described by a simple attention network. We propose an adaptive and gated graph attention network (AGGAT), which models the interactive relationships between agents in a cascaded manner. In the AGGAT, we first propose a graph-based hard attention network that roughly filters out irrelevant agents. Then, normal soft attention is adopted to decide the importance of each neighbor. Finally, gated attention further refines the collaborative relationship of agents. By using cascaded attention, the collaborative relationship of agents is precisely learned in a coarse-to-fine style. Extensive experiments are conducted on a variety of cooperative tasks. The results indicate that our proposed method outperforms state-of-the-art baselines.
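
The three cascaded stages described above (hard filtering, soft weighting, gated refinement) can be sketched for a single agent as follows. This is only an illustration under stated assumptions, not the AGGAT architecture: the function name, the fixed `hard_threshold` standing in for the learned graph-based hard attention, and the per-neighbor `gate_logits` are all hypothetical.

```python
import math
from typing import List, Sequence


def cascaded_attention(
    scores: Sequence[float],
    gate_logits: Sequence[float],
    hard_threshold: float = 0.0,
) -> List[float]:
    """Return one collaboration weight per neighbor, computed coarse to fine."""
    # Stage 1, hard attention: a plain threshold (an assumption) drops neighbors.
    kept = [i for i, s in enumerate(scores) if s >= hard_threshold]
    weights = [0.0] * len(scores)
    if not kept:
        return weights
    # Stage 2, soft attention: softmax over the surviving neighbors only.
    top = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(exps.values())
    # Stage 3, gated attention: a sigmoid gate rescales each soft weight.
    for i in kept:
        gate = 1.0 / (1.0 + math.exp(-gate_logits[i]))
        weights[i] = gate * exps[i] / total
    return weights
```

With `scores=[2.0, 2.0, -5.0]` and all gate logits at zero, the third neighbor is filtered out in the first stage and the remaining two share the attention equally before gating.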

4.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8065-8081, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34428133

ABSTRACT

Open set recognition (OSR), aiming to simultaneously classify the seen classes and identify the unseen classes as 'unknown', is essential for reliable machine learning. The key challenge of OSR is how to reduce the empirical classification risk on the labeled known data and the open space risk on the potential unknown data simultaneously. To handle the challenge, we formulate the open space risk problem from the perspective of multi-class integration, and model the unexploited extra-class space with a novel concept, the Reciprocal Point. Following this, a novel learning framework, termed Adversarial Reciprocal Point Learning (ARPL), is proposed to minimize the overlap of the known distribution and unknown distributions without loss of known classification accuracy. Specifically, each reciprocal point is learned by the extra-class space with the corresponding known category, and the confrontation among multiple known categories is employed to reduce the empirical classification risk. Then, an adversarial margin constraint is proposed to reduce the open space risk by limiting the latent open space constructed by reciprocal points. To further estimate the unknown distribution from open space, an instantiated adversarial enhancement method is designed to generate diverse and confusing training samples, based on the adversarial mechanism between the reciprocal points and known classes. This can effectively enhance the model's ability to distinguish the unknown classes. Extensive experimental results on various benchmark datasets indicate that the proposed method is significantly superior to other existing approaches and achieves state-of-the-art performance. The code is released on github.com/iCGY96/ARPL.
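
To make the reciprocal-point idea concrete, the sketch below shows one plausible way an already-trained model could be used at inference time: a sample is assigned to the class whose reciprocal point it is farthest from, and it is rejected as unknown when even that largest distance is small. This is a simplified illustration, not the released ARPL code: the function name, the squared Euclidean distance, and the `reject_below` threshold are assumptions, and training (including the adversarial margin constraint) is not shown.

```python
from typing import List, Optional, Sequence


def classify_open_set(
    feature: Sequence[float],
    reciprocal_points: Sequence[Sequence[float]],
    reject_below: float,
) -> Optional[int]:
    """Return the predicted known-class index, or None for 'unknown'."""
    # A reciprocal point models the extra-class space of its class, so a
    # larger distance from it is treated as stronger evidence for that class.
    distances: List[float] = [
        sum((f - p) ** 2 for f, p in zip(feature, point))
        for point in reciprocal_points
    ]
    best = max(range(len(distances)), key=distances.__getitem__)
    # Assumed rejection rule: samples close to every reciprocal point lie in
    # the bounded open space and are labeled unknown.
    if distances[best] < reject_below:
        return None
    return best
```

For instance, with reciprocal points `[[0.0, 0.0], [4.0, 0.0]]` and `reject_below=5.0`, the feature `[4.0, 0.0]` is assigned to class 0, while `[2.0, 0.0]`, which sits between both reciprocal points, is rejected.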

5.
IEEE Trans Pattern Anal Mach Intell ; 40(7): 1625-1638, 2018 07.
Article in English | MEDLINE | ID: mdl-28692964

ABSTRACT

A number of vision problems such as zero-shot learning and person re-identification can be considered as cross-class transfer learning problems. As mid-level semantic properties shared across different object classes, attributes have been studied extensively for knowledge transfer across classes. Most previous attribute learning methods focus only on human-defined/nameable semantic attributes, whilst ignoring the fact that there also exist undefined/latent shareable visual properties, or latent attributes. These latent attributes can be either discriminative or non-discriminative parts depending on whether they can contribute to an object recognition task. In this work, we argue that learning the latent attributes jointly with user-defined semantic attributes not only leads to better representation but also helps semantic attribute prediction. A novel dictionary learning model is proposed which decomposes the dictionary space into three parts corresponding to semantic, latent discriminative and latent background attributes respectively. Such a joint attribute learning model is then extended by following a multi-task transfer learning framework to address a more challenging unsupervised domain adaptation problem, where annotations are only available on an auxiliary dataset and the target dataset is completely unlabelled. Extensive experiments show that the proposed models, though being linear and thus extremely efficient to compute, produce state-of-the-art results on both zero-shot learning and person re-identification.
