Pesquisa | Portal Regional da BVS

1.

Latent Semantic Consensus for Deterministic Geometric Model Fitting.

Xiao, Guobao; Yu, Jun; Ma, Jiayi; Fan, Deng-Ping; Shao, Ling.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Mar 18.

Artigo em Inglês | MEDLINE | ID: mdl-38478435

RESUMO

Estimating reliable geometric model parameters from the data with severe outliers is a fundamental and important task in computer vision. This paper attempts to sample high-quality subsets and select model instances to estimate parameters in the multi-structural data. To address this, we propose an effective method called Latent Semantic Consensus (LSC). The principle of LSC is to preserve the latent semantic consensus in both data points and model hypotheses. Specifically, LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses, respectively. Then, LSC explores the distributions of points in the two latent semantic spaces, to remove outliers, generate high-quality model hypotheses, and effectively estimate model instances. Finally, LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting, due to its deterministic fitting nature and efficiency. Compared with several state-of-the-art model fitting methods, our LSC achieves significant superiority for the performance of both accuracy and speed on synthetic data and real images. The code will be available at https://github.com/guobaoxiao/LSC.

2.

Segment anything model for medical images?

Huang, Yuhao; Yang, Xin; Liu, Lian; Zhou, Han; Chang, Ao; Zhou, Xinrui; Chen, Rusi; Yu, Junxuan; Chen, Jiongquan; Chen, Chaoyu; Liu, Sijing; Chi, Haozhe; Hu, Xindi; Yue, Kejuan; Li, Lei; Grau, Vicente; Fan, Deng-Ping; Dong, Fajin; Ni, Dong.

Med Image Anal ; 92: 103061, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38086235

RESUMO

The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging because of the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales. To fully validate SAM's performance on medical data, we collected and sorted 53 open-source datasets and built a large medical segmentation dataset with 18 modalities, 84 objects, 125 object-modality paired targets, 1050K 2D images, and 6033K masks. We comprehensively analyzed different models and strategies on the so-called COSMOS 1050K dataset. Our findings mainly include the following: (1) SAM showed remarkable performance in some specific objects but was unstable, imperfect, or even totally failed in other situations. (2) SAM with the large ViT-H showed better overall performance than that with the small ViT-B. (3) SAM performed better with manual hints, especially box, than the Everything mode. (4) SAM could help human annotation with high labeling quality and less time. (5) SAM was sensitive to the randomness in the center point and tight box prompts, and may suffer from a serious performance drop. (6) SAM performed better than interactive methods with one or a few points, but will be outpaced as the number of points increases. (7) SAM's performance correlated to different factors, including boundary complexity, intensity differences, etc. (8) Finetuning the SAM on specific medical tasks could improve its average DICE performance by 4.39% and 6.68% for ViT-B and ViT-H, respectively. Codes and models are available at: https://github.com/yuhoo0302/Segment-Anything-Model-for-Medical-Images. We hope that this comprehensive report can help researchers explore the potential of SAM applications in MIS, and guide how to appropriately use and develop SAM.

Assuntos

Diagnóstico por Imagem , Processamento de Imagem Assistida por Computador , Humanos , Processamento de Imagem Assistida por Computador/métodos

3.

GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector.

Zheng, Peng; Fu, Huazhu; Fan, Deng-Ping; Fan, Qi; Qin, Jie; Tai, Yu-Wing; Tang, Chi-Keung; Van Gool, Luc.

IEEE Trans Pattern Anal Mach Intell ; 45(9): 10929-10946, 2023 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-37018107

RESUMO

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. The proposed GCoNet+ achieves the new state-of-the-art performance for co-salient object detection (CoSOD) through mining consensus representations based on the following two essential criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module (GAM); 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module (GCM) conditioning on the inconsistent consensus. To further improve the accuracy, we design a series of simple yet effective components as follows: i) a recurrent auxiliary classification module (RACM) promoting model learning at the semantic level; ii) a confidence enhancement module (CEM) assisting the model in improving the quality of the final predictions; and iii) a group-based symmetric triplet (GST) loss guiding the model to learn more discriminative features. Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and CoSal2015, demonstrate that our GCoNet+ outperforms the existing 12 cutting-edge models. Code has been released at https://github.com/ZhengPeng7/GCoNet_plus.

4.

IC9600: A Benchmark Dataset for Automatic Image Complexity Assessment.

Feng, Tinglei; Zhai, Yingjie; Yang, Jufeng; Liang, Jie; Fan, Deng-Ping; Zhang, Jing; Shao, Ling; Tao, Dacheng.

IEEE Trans Pattern Anal Mach Intell ; 45(7): 8577-8593, 2023 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-37015512

RESUMO

Image complexity (IC) is an essential visual perception for human beings to understand an image. However, explicitly evaluating the IC is challenging, and has long been overlooked since, on the one hand, the evaluation of IC is relatively subjective due to its dependence on human perception, and on the other hand, the IC is semantic-dependent while real-world images are diverse. To facilitate the research of IC assessment in this deep learning era, we built the first, to our best knowledge, large-scale IC dataset with 9,600 well-annotated images. The images are of diverse areas such as abstract, paintings and real-world scenes, each of which is elaborately annotated by 17 human contributors. Powered by this high-quality dataset, we further provide a base model to predict the IC scores and estimate the complexity density maps in a weakly supervised way. The model is verified to be effective, and correlates well with human perception (with the Pearson correlation coefficient being 0.949). Last but not the least, we have empirically validated that the exploration of IC can provide auxiliary information and boost the performance of a wide range of computer vision tasks. The dataset and source code can be found at https://github.com/tinglyfeng/IC9600.

5.

Salient Object Detection via Integrity Learning.

Zhuge, Mingchen; Fan, Deng-Ping; Liu, Nian; Zhang, Dingwen; Xu, Dong; Shao, Ling.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3738-3752, 2023 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-35666793

RESUMO

Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both a micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs to discover all salient objects in a given image. To facilitate integrity learning for SOD, we design a novel Integrity Cognition Network (ICON), which explores three important components for learning strong integrity features. 1) Unlike existing models, which focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase feature diversity. Such diversity is the foundation for mining the integral salient objects. 2) Based on the DFA features, we introduce an integrity channel enhancement (ICE) component with the goal of enhancing feature channels that highlight the integral salient objects, while suppressing the other distracting ones. 3) After extracting the enhanced features, the part-whole verification (PWV) method is employed to determine whether the part and whole object features have strong agreement. Such part-whole agreements can further improve the micro-level integrity for each salient object. To demonstrate the effectiveness of our ICON, comprehensive experiments are conducted on seven challenging benchmarks. Our ICON outperforms the baseline methods in terms of a wide range of metrics. Notably, our ICON achieves â¼ 10% relative improvement over the previous best model in terms of average false negative ratio (FNR), on six datasets. Codes and results are available at: https://github.com/mczhuge/ICON.

6.

Salient Objects in Clutter.

Fan, Deng-Ping; Zhang, Jing; Xu, Gang; Cheng, Ming-Ming; Shao, Ling.

IEEE Trans Pattern Anal Mach Intell ; 45(2): 2344-2366, 2023 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-35404809

RESUMO

In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.

7.

Fast and Low-GPU-memory abdomen CT organ segmentation: The FLARE challenge.

Ma, Jun; Zhang, Yao; Gu, Song; An, Xingle; Wang, Zhihe; Ge, Cheng; Wang, Congcong; Zhang, Fan; Wang, Yu; Xu, Yinan; Gou, Shuiping; Thaler, Franz; Payer, Christian; Stern, Darko; Henderson, Edward G A; McSweeney, Dónal M; Green, Andrew; Jackson, Price; McIntosh, Lachlan; Nguyen, Quoc-Cuong; Qayyum, Abdul; Conze, Pierre-Henri; Huang, Ziyan; Zhou, Ziqi; Fan, Deng-Ping; Xiong, Huan; Dong, Guoqiang; Zhu, Qiongjie; He, Jian; Yang, Xiaoping.

Med Image Anal ; 82: 102616, 2022 11.

Artigo em Inglês | MEDLINE | ID: mdl-36179380

RESUMO

Automatic segmentation of abdominal organs in CT scans plays an important role in clinical practice. However, most existing benchmarks and datasets only focus on segmentation accuracy, while the model efficiency and its accuracy on the testing cases from different medical centers have not been evaluated. To comprehensively benchmark abdominal organ segmentation methods, we organized the first Fast and Low GPU memory Abdominal oRgan sEgmentation (FLARE) challenge, where the segmentation methods were encouraged to achieve high accuracy on the testing cases from different medical centers, fast inference speed, and low GPU memory consumption, simultaneously. The winning method surpassed the existing state-of-the-art method, achieving a 19× faster inference speed and reducing the GPU memory consumption by 60% with comparable accuracy. We provide a summary of the top methods, make their code and Docker containers publicly available, and give practical suggestions on building accurate and efficient abdominal organ segmentation models. The FLARE challenge remains open for future submissions through a live platform for benchmarking further methodology developments at https://flare.grand-challenge.org/.

Assuntos

Algoritmos , Tomografia Computadorizada por Raios X , Humanos , Tomografia Computadorizada por Raios X/métodos , Abdome/diagnóstico por imagem , Benchmarking , Processamento de Imagem Assistida por Computador/métodos

8.

Concealed Object Detection.

Fan, Deng-Ping; Ji, Ge-Peng; Cheng, Ming-Ming; Shao, Ling.

IEEE Trans Pattern Anal Mach Intell ; 44(10): 6024-6042, 2022 10.

Artigo em Inglês | MEDLINE | ID: mdl-34061739

RESUMO

We present the first systematic study on concealed object detection (COD), which aims to identify objects that are visually embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, classification etc. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms twelve cutting-edge baselines on all datasets tested, making them robust, general architectures that could serve as catalysts for future research in COD. Finally, we provide some interesting findings, and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available at our project page: http://mmcheng.net/cod.

Assuntos

Algoritmos , Interpretação de Imagem Assistida por Computador , Animais , Interpretação de Imagem Assistida por Computador/métodos

9.

Uncertainty Inspired RGB-D Saliency Detection.

Zhang, Jing; Fan, Deng-Ping; Dai, Yuchao; Anwar, Saeed; Saleh, Fatemeh; Aliakbarian, Sadegh; Barnes, Nick.

IEEE Trans Pattern Anal Mach Intell ; 44(9): 5761-5779, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-33856982

RESUMO

We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem by predicting a single saliency map following a deterministic learning pipeline. We argue that, however, the deterministic solution is relatively ill-posed. Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection which utilizes a latent variable to model the labeling variations. Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which directly samples the latent variable from the true posterior distribution. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps. The source code is publicly available via our project page: https://github.com/JingZhang617/UCNet.

10.

Re-Thinking Co-Salient Object Detection.

Fan, Deng-Ping; Li, Tengpeng; Lin, Zheng; Ji, Ge-Peng; Zhang, Dingwen; Cheng, Ming-Ming; Fu, Huazhu; Shen, Jianbing.

IEEE Trans Pattern Anal Mach Intell ; 44(8): 4339-4354, 2022 08.

Artigo em Inglês | MEDLINE | ID: mdl-33600309

RESUMO

In this article, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearances. This bias can lead to the ideal settings and effectiveness of models trained on existing datasets, being impaired in real-life situations, where similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate the existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves the model scalability and stability. Third, we comprehensively summarize 40 cutting-edge algorithms, benchmarking 18 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and reporting more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future works of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community. The benchmark toolbox and results are available on our project page at https://dpfan.net/CoSOD3K.

Assuntos

Algoritmos , Interpretação de Imagem Assistida por Computador , Semântica

11.

Bifurcated Backbone Strategy for RGB-D Salient Object Detection.

Zhai, Yingjie; Fan, Deng-Ping; Yang, Jufeng; Borji, Ali; Shao, Ling; Han, Junwei; Wang, Liang.

IEEE Trans Image Process ; 30: 8727-8742, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34613915

RESUMO

Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment and classify objects at various scales. When multi-level features meet multi-modal cues, the optimal feature aggregation and multi-modal learning strategy become a hot potato. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel Bifurcated Backbone Strategy Network (BBS-Net). Our architecture, is simple, efficient, and backbone-independent. In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. Then, RGB and depth modalities are fused in a complementary way. Extensive experiments show that BBS-Net significantly outperforms 18 state-of-the-art (SOTA) models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach (~4% improvement in S-measure vs . the top-ranked model: DMRA). In addition, we provide a comprehensive analysis on the generalization ability of different RGB-D datasets and provide a powerful training set for future research. The complete algorithm, benchmark results, and post-processing toolbox are publicly available at https://github.com/zyjwuyan/BBS-Net.

12.

COVID-19 lung infection segmentation with a novel two-stage cross-domain transfer learning framework.

Liu, Jiannan; Dong, Bo; Wang, Shuai; Cui, Hui; Fan, Deng-Ping; Ma, Jiquan; Chen, Geng.

Med Image Anal ; 74: 102205, 2021 12.

Artigo em Inglês | MEDLINE | ID: mdl-34425317

RESUMO

With the global outbreak of COVID-19 in early 2020, rapid diagnosis of COVID-19 has become the urgent need to control the spread of the epidemic. In clinical settings, lung infection segmentation from computed tomography (CT) images can provide vital information for the quantification and diagnosis of COVID-19. However, accurate infection segmentation is a challenging task due to (i) the low boundary contrast between infections and the surroundings, (ii) large variations of infection regions, and, most importantly, (iii) the shortage of large-scale annotated data. To address these issues, we propose a novel two-stage cross-domain transfer learning framework for the accurate segmentation of COVID-19 lung infections from CT images. Our framework consists of two major technical innovations, including an effective infection segmentation deep learning model, called nCoVSegNet, and a novel two-stage transfer learning strategy. Specifically, our nCoVSegNet conducts effective infection segmentation by taking advantage of attention-aware feature fusion and large receptive fields, aiming to resolve the issues related to low boundary contrast and large infection variations. To alleviate the shortage of the data, the nCoVSegNet is pre-trained using a two-stage cross-domain transfer learning strategy, which makes full use of the knowledge from natural images (i.e., ImageNet) and medical images (i.e., LIDC-IDRI) to boost the final training on CT images with COVID-19 infections. Extensive experiments demonstrate that our framework achieves superior segmentation accuracy and outperforms the cutting-edge models, both quantitatively and qualitatively.

Assuntos

COVID-19 , Humanos , Pulmão/diagnóstico por imagem , Aprendizado de Máquina , SARS-CoV-2 , Tomografia Computadorizada por Raios X

13.

Siamese Network for RGB-D Salient Object Detection and Beyond.

Fu, Keren; Fan, Deng-Ping; Ji, Ge-Peng; Zhao, Qijun; Shen, Jianbing; Zhu, Ce.

IEEE Trans Pattern Anal Mach Intell ; PP2021 Apr 16.

Artigo em Inglês | MEDLINE | ID: mdl-33861691

RESUMO

Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL), and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using 5 popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the SOTAs by an average of ~2.0% (F-measure) across 7 challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T SOD and video SOD, achieving comparable or better performance.

14.

JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation.

Wu, Yu-Huan; Gao, Shang-Hua; Mei, Jie; Xu, Jun; Fan, Deng-Ping; Zhang, Rong-Guo; Cheng, Ming-Ming.

IEEE Trans Image Process ; 30: 3113-3126, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33600316

RESUMO

Recently, the coronavirus disease 2019 (COVID-19) has caused a pandemic disease in over 200 countries, influencing billions of humans. To control the infection, identifying and separating the infected people is the most crucial step. The main diagnostic tool is the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test. Still, the sensitivity of the RT-PCR test is not high enough to effectively prevent the pandemic. The chest CT scan test provides a valuable complementary tool to the RT-PCR test, and it can identify the patients in the early-stage with high sensitivity. However, the chest CT scan test is usually time-consuming, requiring about 21.5 minutes per case. This paper develops a novel Joint Classification and Segmentation (JCS) system to perform real-time and explainable COVID- 19 chest CT diagnosis. To train our JCS system, we construct a large scale COVID- 19 Classification and Segmentation (COVID-CS) dataset, with 144,167 chest CT images of 400 COVID- 19 patients and 350 uninfected cases. 3,855 chest CT images of 200 patients are annotated with fine-grained pixel-level labels of opacifications, which are increased attenuation of the lung parenchyma. We also have annotated lesion counts, opacification areas, and locations and thus benefit various diagnosis aspects. Extensive experiments demonstrate that the proposed JCS diagnosis system is very efficient for COVID-19 classification and segmentation. It obtains an average sensitivity of 95.0% and a specificity of 93.0% on the classification test set, and 78.5% Dice score on the segmentation test set of our COVID-CS dataset. The COVID-CS dataset and code are available at https://github.com/yuhuan-wu/JCS.

Assuntos

COVID-19/diagnóstico por imagem , Aprendizado Profundo , Pulmão/diagnóstico por imagem , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Bases de Dados Factuais , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , SARS-CoV-2 , Tomografia Computadorizada por Raios X , Adulto Jovem

15.

Bilateral Attention Network for RGB-D Salient Object Detection.

Zhang, Zhao; Lin, Zheng; Xu, Jun; Jin, Wen-Da; Lu, Shao-Ping; Fan, Deng-Ping.

IEEE Trans Image Process ; 30: 1949-1961, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33439842

RESUMO

RGB-D salient object detection (SOD) aims to segment the most attractive objects in a pair of cross-modal RGB and depth images. Currently, most existing RGB-D SOD methods focus on the foreground region when utilizing the depth images. However, the background also provides important information in traditional SOD methods for promising performance. To better explore salient information in both foreground and background regions, this paper proposes a Bilateral Attention Network (BiANet) for the RGB-D SOD task. Specifically, we introduce a Bilateral Attention Module (BAM) with a complementary attention mechanism: foreground-first (FF) attention and background-first (BF) attention. The FF attention focuses on the foreground region with a gradual refinement style, while the BF one recovers potentially useful salient information in the background region. Benefited from the proposed BAM module, our BiANet can capture more meaningful foreground and background cues, and shift more attention to refining the uncertain details between foreground and background regions. Additionally, we extend our BAM by leveraging the multi-scale techniques for better SOD performance. Extensive experiments on six benchmark datasets demonstrate that our BiANet outperforms other state-of-the-art RGB-D SOD methods in terms of objective metrics and subjective visual comparison. Our BiANet can run up to 80 fps on 224×224 RGB-D images, with an NVIDIA GeForce RTX 2080Ti GPU. Comprehensive ablation studies also validate our contributions.

16.

RGB-D salient object detection: A survey.

Zhou, Tao; Fan, Deng-Ping; Cheng, Ming-Ming; Shen, Jianbing; Shao, Ling.

Comput Vis Media (Beijing) ; 7(1): 37-69, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33432275

RESUMO

Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.

17.

Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks.

Fan, Deng-Ping; Lin, Zheng; Zhang, Zhao; Zhu, Menglong; Cheng, Ming-Ming.

IEEE Trans Neural Netw Learn Syst ; 32(5): 2075-2089, 2021 05.

Artigo em Inglês | MEDLINE | ID: mdl-32491986

RESUMO

The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1 K high-resolution images that cover diverse real-world scenes from various viewpoints, poses, occlusions, illuminations, and background s; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research, and we systematically summarize 32 popular models and evaluate 18 parts of 32 models on seven data sets containing a total of about 97k images; and 3) we propose a simple general architecture, called deep depth-depurator network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which performs low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of any prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can be used to efficiently extract salient object masks from real scenes, enabling effective background-changing application with a speed of 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.

Assuntos

Cor , Reconhecimento Automatizado de Padrão/métodos , Algoritmos , Benchmarking , Sistemas Computacionais , Humanos , Aprendizado de Máquina , Redes Neurais de Computação

18.

Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images.

Fan, Deng-Ping; Zhou, Tao; Ji, Ge-Peng; Zhou, Yi; Chen, Geng; Fu, Huazhu; Shen, Jianbing; Shao, Ling.

IEEE Trans Med Imaging ; 39(8): 2626-2637, 2020 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-32730213

RESUMO

Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.

Assuntos

Infecções por Coronavirus/diagnóstico por imagem , Pneumonia Viral/diagnóstico por imagem , Aprendizado de Máquina Supervisionado , Tomografia Computadorizada por Raios X/métodos , Algoritmos , Betacoronavirus , COVID-19 , Humanos , Pulmão/diagnóstico por imagem , Pandemias , SARS-CoV-2

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA