1.
IEEE Trans Image Process ; 33: 1782-1794, 2024.
Article in English | MEDLINE | ID: mdl-38442064

ABSTRACT

Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs object masks based on text descriptions. Many works have achieved considerable progress on RIS, including different fusion method designs. In this work, we explore an essential question: "What if the text description is wrong or misleading?" For example, the described objects may not be in the image. We term such a sentence a negative sentence. Existing solutions for RIS cannot handle such a setting. To this end, we propose a new formulation of RIS, named Robust Referring Image Segmentation (R-RIS). It considers negative sentence inputs in addition to the regular positive text inputs. To facilitate this new task, we create three R-RIS datasets by augmenting existing RIS datasets with negative sentences and propose new metrics to evaluate both types of inputs in a unified manner. Furthermore, we propose a new transformer-based model, called RefSegformer, with a token-based vision and language fusion module. Our design can be easily extended to our R-RIS setting by adding extra blank tokens. Our proposed RefSegformer achieves state-of-the-art results on both RIS and R-RIS datasets, establishing a solid baseline for both settings. Our project page is at https://github.com/jianzongwu/robust-ref-seg.

2.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5092-5113, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38315601

ABSTRACT

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective than weakly supervised and zero-shot settings. This paper thoroughly reviews open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by juxtaposing open vocabulary learning with analogous concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Subsequently, we examine several pertinent tasks within the realms of segmentation and detection, encompassing long-tail problems, few-shot, and zero-shot settings. As a foundation for our method survey, we first elucidate the fundamental principles of detection and segmentation in closed-set scenarios. Next, we examine various contexts where open vocabulary learning is employed, pinpointing recurring design elements and central themes. This is followed by a comparative analysis of recent detection and segmentation methodologies on commonly used datasets and benchmarks. Our review culminates with a synthesis of insights, challenges, and discourse on prospective research trajectories. To our knowledge, this constitutes the first exhaustive literature review on open vocabulary learning.

3.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8176-8192, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018677

ABSTRACT

Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention.
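The residual-plus-convolution idea described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see their repository for that): the single-head setting, the 3x3 kernel, the mixing weight `alpha`, and the function names `conv3x3_same` and `evolve_attention` are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def conv3x3_same(grid, kernel):
    """3x3 convolution with zero padding, so the map keeps its size."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        acc += kernel[di + 1][dj + 1] * grid[r][c]
            out[i][j] = acc
    return out


def evolve_attention(curr_logits, prev_logits, kernel, alpha=0.5):
    """Mix the current layer's logits with a convolved copy of the previous
    layer's logits (the residual path), then normalise each row.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    evolved = conv3x3_same(prev_logits, kernel)
    mixed = [
        [alpha * c + (1.0 - alpha) * e for c, e in zip(c_row, e_row)]
        for c_row, e_row in zip(curr_logits, evolved)
    ]
    return mixed, [softmax(row) for row in mixed]
```

Passing each layer's `mixed` logits as `prev_logits` to the next layer gives the chain across layers; a real model would learn the kernel and treat attention heads as convolution channels.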

4.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7853-7869, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417746

ABSTRACT

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, the first end-to-end video object detection system based on simple yet effective spatial-temporal Transformer architectures. The first goal of this paper is to streamline the pipeline of current VOD, effectively removing the need for many hand-crafted components for feature aggregation, e.g., optical flow models and relation networks. Moreover, benefiting from the object query design in DETR, our method does not need post-processing methods such as Seq-NMS. In particular, we present a temporal Transformer to aggregate both the spatial object queries and the feature memories of each frame. Our temporal Transformer consists of two components: a Temporal Query Encoder (TQE) to fuse object queries, and a Temporal Deformable Transformer Decoder (TDTD) to obtain current frame detection results. These designs boost the strong baseline Deformable DETR by a significant margin (3%-4% mAP) on the ImageNet VID dataset. TransVOD yields comparable performance on the ImageNet VID benchmark. We then present two improved versions of TransVOD: TransVOD++ and TransVOD Lite. The former fuses object-level information into the object query via dynamic convolution, while the latter models entire video clips as the output to speed up inference. We give a detailed analysis of all three models in the experiment part. In particular, our proposed TransVOD++ sets a new state-of-the-art record in terms of accuracy on ImageNet VID with 90.0% mAP. Our proposed TransVOD Lite also achieves the best speed and accuracy trade-off with 83.7% mAP while running at around 30 FPS on a single V100 GPU device. Code and models are available at https://github.com/SJTU-LuHe/TransVOD.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6594-6601, 2023 May.
Article in English | MEDLINE | ID: mdl-36194713

ABSTRACT

Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment, and track each instance in a video sequence. Existing approaches are mainly based on single-frame features or single-scale features of multiple frames, where either temporal information or multi-scale information is ignored. To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames. Specifically, TPR contains two novel components, Dynamic Aligned Cell Routing (DACR) and Cross Pyramid Routing (CPR), where DACR is designed for aligning and gating pyramid features across the temporal dimension, while CPR transfers temporally aggregated features across the scale dimension. Moreover, our approach is a lightweight, plug-and-play module and can be easily applied to existing instance segmentation methods. Extensive experiments on three datasets, including YouTube-VIS (2019, 2021) and Cityscapes-VPS, demonstrate the effectiveness and efficiency of the proposed approach on several state-of-the-art instance and panoptic segmentation methods. Code will be publicly available at https://github.com/lxtGH/TemporalPyramidRouting.

6.
IEEE Trans Image Process ; 30: 6829-6842, 2021.
Article in English | MEDLINE | ID: mdl-34343090

ABSTRACT

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation. However, convolutional neural networks (CNNs) are inherently limited in modelling such dependencies due to the naive structure of their building modules (e.g., the local convolution kernel). While recent global aggregation methods are beneficial for modelling long-range structure information, they tend to oversmooth and bring noise to regions that contain fine details (e.g., boundaries and small objects), which matter greatly in the semantic segmentation task. To alleviate this problem, we propose to explore the local context so that the aggregated long-range relationship is distributed more accurately in local regions. In particular, we design a novel local distribution module which adaptively models the affinity map between global and local relationships for each pixel. Integrating existing global aggregation modules, we show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks, giving rise to the GALD networks. Despite its simplicity and versatility, our approach allows us to set a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff. Code and trained models are released at https://github.com/lxtGH/GALD-DGCNet to foster further research.

7.
IEEE Trans Image Process ; 30: 7050-7063, 2021.
Article in English | MEDLINE | ID: mdl-34329163

ABSTRACT

Graph-based convolutional models such as the non-local block have been shown to be effective for strengthening the context modeling ability of convolutional neural networks (CNNs). However, their pixel-wise computational overhead is prohibitive, which renders them unsuitable for high-resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeeze Reasoning. Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector, where the computation cost can be significantly reduced. Specifically, we build the node graph in the vector, where each node represents an abstract semantic concept. The refined features within the same semantic category become consistent, which is beneficial for downstream tasks. We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks. Despite its simplicity and light weight, the proposed strategy allows us to achieve considerable results on different semantic segmentation datasets and shows significant improvements over strong baselines on various other scene understanding tasks, including object detection, instance segmentation and panoptic segmentation. Code is available at https://github.com/lxtGH/SFSegNets.

8.
PLoS One ; 11(11): e0167050, 2016.
Article in English | MEDLINE | ID: mdl-27893780

ABSTRACT

In the stock market, return reversal occurs when investors sell overbought stocks and buy oversold stocks, reversing the stocks' price trends. In this paper, we develop a new method to identify key drivers of return reversal by incorporating a comprehensive set of factors derived from different economic theories into one unified dynamical Bayesian factor graph. We then use the model to depict factor relationships and their dynamics, from which we make some interesting discoveries about the mechanism behind return reversals. Through extensive experiments on the US stock market, we conclude that among the various factors, the liquidity factors consistently emerge as key drivers of return reversal, which supports the theory of the liquidity effect. Specifically, we find that stocks with high turnover rates or high Amihud illiquidity measures have a greater probability of experiencing return reversals. Apart from the consistent drivers, we find other drivers of return reversal that generally change from year to year, and they serve as important characteristics for evaluating the trends of stock returns. We also identify some seldom discussed yet enlightening inter-factor relationships, one of which shows that stocks in the Finance and Insurance industry are more likely to have high Amihud illiquidity measures than those in other industries. These conclusions are robust for return reversals under different thresholds.
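For readers unfamiliar with the two liquidity factors named in this abstract, here is a short sketch of their standard textbook definitions. The Amihud measure is the average ratio of absolute daily return to daily dollar volume, and turnover is shares traded divided by shares outstanding. The exact variants, scaling, and sampling window used in the paper are not stated in the abstract, so the function names and the choice to skip zero-volume days are assumptions.

```python
def amihud_illiquidity(daily_returns, daily_dollar_volumes):
    """Average of |return| / dollar volume over the sample period.

    Days with zero dollar volume are skipped (an assumption; other
    conventions exist).
    """
    if len(daily_returns) != len(daily_dollar_volumes):
        raise ValueError("returns and volumes must have the same length")
    ratios = [
        abs(r) / v
        for r, v in zip(daily_returns, daily_dollar_volumes)
        if v > 0
    ]
    if not ratios:
        raise ValueError("no trading days with positive dollar volume")
    return sum(ratios) / len(ratios)


def turnover_rate(shares_traded, shares_outstanding):
    """Fraction of outstanding shares that changed hands in the period."""
    if shares_outstanding <= 0:
        raise ValueError("shares_outstanding must be positive")
    return shares_traded / shares_outstanding
```

A higher Amihud value means that a given amount of trading moves the price more, i.e. the stock is less liquid.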


Subjects
Bayes Theorem; Commerce/economics; Investments/economics; Investments/trends; Models, Economic; Humans; Investments/statistics & numerical data

9.
J Geriatr Cardiol ; 12(4): 448-56, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26345622

ABSTRACT

The electrocardiogram (ECG) has broad applications in clinical diagnosis and prognosis of cardiovascular disease. Many researchers have contributed to its progressive development. To commemorate those pioneers, and to better study and promote the use of ECG, we review and present here a systematic introduction to the history, hotspots, and trends of ECG. In the historical part, information including the invention, improvement, and extensive applications of ECG, such as in long QT syndrome (LQTS), angina, and myocardial infarction (MI), is chronologically presented. New technologies and applications from the 1990s are also introduced. In the second part, we use the bibliometric analysis method to analyze the hotspots in the field of ECG-related research. By using total citations and year-specific total citations as our main criteria, four key hotspots in ECG-related research were identified from 11 articles, including atrial fibrillation, LQTS, angina and MI, and heart rate variability. Recent studies in those four areas are also reported. In the final part, we discuss the future trends concerning ECG-related research. The authors believe that improvement of ECG instrumentation, big data mining for ECG, and the accuracy of diagnosis and application will be areas of continuous concern.
