Results 1 - 20 of 65
1.
IEEE Trans Med Imaging ; PP, 2020 Nov 27.
Article in English | MEDLINE | ID: mdl-33245693

ABSTRACT

Clusters of viral pneumonia occurrences over a short period may be a harbinger of an outbreak or pandemic. Rapid and accurate detection of viral pneumonia using chest X-rays can be of significant value for large-scale screening and epidemic prevention, particularly when other more sophisticated imaging modalities are not readily accessible. However, the emergence of novel mutated viruses causes a substantial dataset shift, which can greatly limit the performance of classification-based approaches. In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls as a one-class classification-based anomaly detection problem. We therefore propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module. If the anomaly score produced by the anomaly detection module is large enough, or the confidence score estimated by the confidence prediction module is small enough, the input will be accepted as an anomaly case (i.e., viral pneumonia). The major advantage of our approach over binary classification is that we avoid modeling individual viral pneumonia classes explicitly and treat all known viral pneumonia cases as anomalies to improve the one-class model. The proposed model outperforms binary classification models on the clinical X-VIRAL dataset, which contains 5,977 viral pneumonia (no COVID-19) cases and 37,393 non-viral pneumonia or healthy cases. Moreover, when directly tested on the X-COVID dataset, which contains 106 COVID-19 cases and 107 normal controls, without any fine-tuning, our model achieves an AUC of 83.61% and a sensitivity of 71.70%, which is comparable to the performance of radiologists reported in the literature.
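
The accept-as-anomaly rule stated in this abstract can be written as a short sketch. The function name, the default thresholds, and the assumption that both scores lie in [0, 1] are illustrative only; the abstract does not give these details.

```python
def is_anomaly(anomaly_score: float,
               confidence: float,
               anomaly_threshold: float = 0.5,      # hypothetical value
               confidence_threshold: float = 0.5    # hypothetical value
               ) -> bool:
    """Return True if the input should be treated as an anomaly case.

    An input is flagged when EITHER the anomaly score is large enough
    OR the predicted confidence is small enough.
    """
    return anomaly_score > anomaly_threshold or confidence < confidence_threshold
```

With these assumed thresholds, an input with a low anomaly score is still flagged when the confidence module is unsure, which is the point of combining the two signals.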

2.
Article in English | MEDLINE | ID: mdl-33074804

ABSTRACT

In computer vision, object detection is one of the most important tasks, which underpins a few instance-level recognition tasks and many downstream applications. Recently, one-stage methods have gained much attention over two-stage approaches due to their simpler design and competitive performance. Here we propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to other dense prediction problems such as semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating the intersection over union (IoU) scores during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, to which the final detection performance is often sensitive. With non-maximum suppression (NMS) as the only post-processing step, we demonstrate a much simpler and more flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: git.io/AdelaiDet.
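
For readers unfamiliar with the two operations named in this abstract, the following is a minimal sketch of IoU and greedy NMS. It is a generic textbook version, not the FCOS code; the box layout `(x1, y1, x2, y2)`, the function names, and the default threshold are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The returned list holds indices into `boxes`, ordered by descending score.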

3.
IEEE Trans Med Imaging ; PP, 2020 Sep 21.
Article in English | MEDLINE | ID: mdl-32956049

ABSTRACT

Medical image segmentation is an essential task in computer-aided diagnosis. Despite their prevalence and success, deep convolutional neural networks (DCNNs) still need to be improved to produce accurate and robust enough segmentation results for clinical use. In this paper, we propose a novel and generic framework called Segmentation-Emendation-reSegmentation-Verification (SESV) to improve the accuracy of existing DCNNs in medical image segmentation, instead of designing a more accurate segmentation model. Our idea is to predict the segmentation errors produced by an existing model and then correct them. Since predicting segmentation errors is challenging, we design two ways to tolerate the mistakes in the error prediction. First, rather than using a predicted segmentation error map to correct the segmentation mask directly, we only treat the error map as the prior that indicates the locations where segmentation errors are prone to occur, and then concatenate the error map with the image and segmentation mask as the input of a re-segmentation network. Second, we introduce a verification network to determine whether to accept or reject the refined mask produced by the re-segmentation network on a region-by-region basis. The experimental results on the CRAG, ISIC, and IDRiD datasets suggest that using our SESV framework can improve the accuracy of DeepLabv3+ substantially and achieve advanced performance in the segmentation of gland cells, skin lesions, and retinal microaneurysms. Consistent conclusions can also be drawn when using PSPNet, U-Net, and FPN as the segmentation network, respectively. Therefore, our SESV framework is capable of improving the accuracy of different DCNNs on different medical image segmentation tasks.

4.
Article in English | MEDLINE | ID: mdl-32750784

ABSTRACT

In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pair-wise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. Code is available at: https://git.io/StructKD.
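
The idea of matching pair-wise similarities between teacher and student can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each graph node is represented by one feature vector, cosine similarity is assumed as the similarity measure, and a mean squared difference is assumed as the loss. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def pairwise_similarities(features: Sequence[Vector]) -> List[List[float]]:
    """Similarity between every pair of nodes (one feature vector per node)."""
    return [[cosine(u, v) for v in features] for u in features]


def pairwise_distillation_loss(student: Sequence[Vector],
                               teacher: Sequence[Vector]) -> float:
    """Mean squared difference between student and teacher similarity
    matrices. Both inputs must describe the same number of nodes; the
    feature dimensions may differ."""
    s = pairwise_similarities(student)
    t = pairwise_similarities(teacher)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

Because only similarities are compared, the student is not required to reproduce the teacher's features themselves, only the relations between locations.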

5.
Article in English | MEDLINE | ID: mdl-32750793

ABSTRACT

We show that existing upsampling operators can be unified using the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can often recover boundary details considerably better than other upsampling operators such as bilinear interpolation. By viewing the indices as a function of the feature map, we introduce the concept of 'learning to index', and present a novel index-guided encoder-decoder framework where indices are learned adaptively from data and are used to guide downsampling and upsampling stages, without extra training supervision. At the core of this framework is a new learnable module, termed Index Network (IndexNet), which dynamically generates indices conditioned on the feature map. IndexNet can be used as a plug-in applicable to almost all convolutional networks that have coupled downsampling and upsampling stages, enabling the networks to dynamically capture variations of local patterns. In particular, we instantiate and investigate five families of IndexNet, highlight their superiority in delivering spatial information over other upsampling operators with experiments on synthetic data, and demonstrate their effectiveness on four dense prediction tasks, including image matting, image denoising, semantic segmentation, and monocular depth estimation. Code is available at: https://git.io/IndexNet.
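
The indices-guided unpooling that motivates this work can be illustrated with classic max pooling, where the indices are fixed argmax positions. IndexNet, as described in the abstract, replaces such fixed indices with learned ones; that learning step is not shown here. The sketch assumes a single-channel 2D map with even height and width, and the function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Index = Tuple[int, int]


def max_pool_2x2(x: Grid) -> Tuple[Grid, List[List[Index]]]:
    """2x2 max pooling that also records where each maximum came from."""
    pooled: Grid = []
    indices: List[List[Index]] = []
    for i in range(0, len(x), 2):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), 2):
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in (0, 1) for dj in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_2x2(values: Grid, indices: List[List[Index]],
               height: int, width: int) -> Grid:
    """Upsample by writing each value back to its recorded position;
    all other positions are filled with zeros."""
    out = [[0.0] * width for _ in range(height)]
    for value_row, index_row in zip(values, indices):
        for value, (r, c) in zip(value_row, index_row):
            out[r][c] = value
    return out
```

Unlike bilinear interpolation, this upsampling puts each value back at the exact location recorded during downsampling, which is why it can preserve boundary positions.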

6.
Article in English | MEDLINE | ID: mdl-32750833

ABSTRACT

Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distributions. However, such prior distributions are often independent of real data and thus may lose semantic information of the data. In practice, the semantic information might be represented by some latent distribution learned from data. However, such a latent distribution may incur difficulties in data sampling for GAN methods. In this paper, rather than sampling from the predefined prior distribution, we propose a local coordinate coding GAN (LCCGAN-v1) to improve the performance of GANs. First, we propose a local coordinate coding (LCC)-based sampling method to sample points from the latent manifold. With the LCC sampling method, we can exploit the local information on the latent manifold and thus produce new data with promising quality. Second, we propose an advanced LCCGAN-v2 by introducing a higher-order term in the generator approximation. This term is able to achieve better approximation and thus further improve the performance. More critically, we derive the generalization bound for both LCCGAN-v1 and LCCGAN-v2 and prove that a small-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on four benchmark datasets demonstrate the superiority of the proposed method over existing GAN methods.

7.
Huan Jing Ke Xue ; 41(6): 2664-2670, 2020 Jun 08.
Article in Chinese | MEDLINE | ID: mdl-32608781

ABSTRACT

Underground rivers are an important source of groundwater in karst areas. Recently, nitrate pollution of underground rivers has become a serious issue. To identify the sources of nitrate in the Guancun underground river basin, a typical karst basin, stable isotope techniques (δ¹⁵N-NO₃⁻, δ¹⁸O-NO₃⁻, and δ¹⁸O-H₂O) were applied in this study. The contribution rates of different nitrate sources in groundwater were quantitatively identified based on the stable isotope analysis in R (SIAR) model, and the influence of land use type on nitrate distribution and source in watershed water was clarified. The results showed that ① nitrate mainly came from fertilizers, soil organic nitrogen, and manure/sewage, based on the nitrogen and oxygen isotopic composition of nitrate. It was revealed that non-point sources significantly contributed to nitrate in waters of the Guancun underground river basin. ② Nitrification dominated the formation process of nitrate in groundwater, and the initial values of nitrogen and oxygen isotopes were not affected by fractionation. ③ Based on SIAR, the contribution of different sources to nitrate in water in the basin varied seasonally, and the contributions of fertilizer, soil organic nitrogen, and manure/sewage to nitrate were 57.07%, 34.06%, and 8.87% in the wet season and 34.14%, 33.02%, and 32.84% in the dry season, respectively. Overall, the present study quantitatively evaluated the temporal variations of nitrate sources in a typical karst groundwater river basin and provided a theoretical foundation for prevention and control management of non-point source pollution and watershed management in karst areas.

8.
Article in English | MEDLINE | ID: mdl-32142419

ABSTRACT

Is a recurrent network really necessary for learning a good visual representation for video based person re-identification (VPRe-id)? In this paper, we first show that the common practice of employing recurrent neural networks (RNNs) to aggregate temporal-spatial features may not be optimal. Specifically, with a diagnostic analysis, we show that the recurrent structure may not learn temporal dependencies as effectively as expected and implicitly yields an orderless representation. Based on this observation, we then present a simple yet surprisingly powerful approach for VPRe-id, where we treat VPRe-id as an efficient orderless ensemble of image based person re-identification problems. More specifically, we divide videos into individual images and re-identify the person with an ensemble of image based rankers. Under the i.i.d. assumption, we provide an error bound that sheds light upon how we could improve VPRe-id. Our work also presents a promising way to bridge the gap between video and image based person re-identification. Comprehensive experimental evaluations demonstrate that the proposed solution achieves state-of-the-art performance on multiple widely used datasets (iLIDS-VID, PRID 2011, and MARS).
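
One way to picture an orderless ensemble of image based rankers is to score each pair of frames independently and average the scores, so that frame order plays no role. The sketch below makes several assumptions that the abstract does not state: each frame is already encoded as a feature vector, Euclidean distance is the per-image score, and the mean is the ensemble rule. The names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two frame features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def video_distance(query_frames: Sequence[Vector],
                   gallery_frames: Sequence[Vector]) -> float:
    """Mean distance over all frame pairs; independent of frame order."""
    total = sum(euclidean(q, g) for q in query_frames for g in gallery_frames)
    return total / (len(query_frames) * len(gallery_frames))


def rank_gallery(query_frames: Sequence[Vector],
                 gallery: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return gallery identities sorted from best to worst match."""
    return sorted(gallery,
                  key=lambda pid: video_distance(query_frames, gallery[pid]))
```

Shuffling the frames of either video leaves the score unchanged up to floating-point rounding, which is what "orderless" means in this sketch.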

9.
IEEE Trans Med Imaging ; 39(7): 2482-2493, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32070946

ABSTRACT

Automated skin lesion segmentation and classification are two essential and related tasks in the computer-aided diagnosis of skin cancer. Despite their prevalence, deep learning models are usually designed for only one task, ignoring the potential benefits of jointly performing both tasks. In this paper, we propose the mutual bootstrapping deep convolutional neural networks (MB-DCNN) model for simultaneous skin lesion segmentation and classification. This model consists of a coarse segmentation network (coarse-SN), a mask-guided classification network (mask-CN), and an enhanced segmentation network (enhanced-SN). On one hand, the coarse-SN generates coarse lesion masks that provide a prior bootstrapping for mask-CN to help it locate and classify skin lesions accurately. On the other hand, the lesion localization maps produced by mask-CN are then fed into enhanced-SN, aiming to transfer the localization information learned by mask-CN to enhanced-SN for accurate lesion segmentation. In this way, the segmentation and classification networks transfer knowledge to each other and facilitate each other in a bootstrapping way. Meanwhile, we also design a novel rank loss and jointly use it with the Dice loss in segmentation networks to address the issues caused by class imbalance and hard-easy pixel imbalance. We evaluate the proposed MB-DCNN model on the ISIC-2017 and PH2 datasets, and achieve a Jaccard index of 80.4% and 89.4% in skin lesion segmentation and an average AUC of 93.8% and 97.7% in skin lesion classification, which are superior to the performance of representative state-of-the-art skin lesion segmentation and classification methods. Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way.
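
The Dice loss and the Jaccard index named in this abstract have standard definitions, sketched below for flattened masks with values in [0, 1]. The paper's rank loss is not specified in the abstract and is therefore not shown. The function names and the convention of returning 1.0 for two empty masks are assumptions.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[float], target: Sequence[float]) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flattened masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * intersection / total if total > 0 else 1.0


def dice_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Loss that reaches 0 when prediction and target overlap perfectly."""
    return 1.0 - dice_coefficient(pred, target)


def jaccard_index(pred: Sequence[float], target: Sequence[float]) -> float:
    """Jaccard = |P ∩ T| / |P ∪ T| for flattened masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    return intersection / union if union > 0 else 1.0
```

Both measures depend only on the overlap with the foreground, so a large background region does not dominate them, which is why Dice is commonly used when lesions occupy a small part of the image.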

10.
IEEE Trans Neural Netw Learn Syst ; 31(12): 5468-5482, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32078566

ABSTRACT

As an integral component of blind image deblurring, non-blind deconvolution removes image blur with a given blur kernel, which is essential but difficult due to the ill-posed nature of the inverse problem. The predominant approach is based on optimization subject to regularization functions that are either manually designed or learned from examples. Existing learning-based methods have shown superior restoration quality but are not practical enough due to their restricted and static model design. They focus solely on learning a prior and require the noise level to be known for deconvolution. We address the gap between the optimization- and learning-based approaches by learning a universal gradient descent optimizer. We propose a recurrent gradient descent network (RGDN) by systematically incorporating deep neural networks into a fully parameterized gradient descent scheme. A hyperparameter-free update unit shared across steps is used to generate the updates from the current estimates based on a convolutional neural network. By training on diverse examples, the RGDN learns an implicit image prior and a universal update rule through recursive supervision. The learned optimizer can be repeatedly used to improve the quality of diverse degenerated observations. The proposed method possesses strong interpretability and high generalization. Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed deep optimization method is effective and robust in producing favorable results, as well as practical for real-world image deblurring applications.

11.
IEEE Trans Neural Netw Learn Syst ; 31(11): 4857-4868, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31902782

ABSTRACT

Most popular clustering methods map raw image data into a projection space in which the clustering assignment is obtained with the vanilla k-means approach. In this article, we discovered a novel prior, namely, there exists a common invariance when assigning an image sample to clusters using different metrics. In short, different distance metrics will lead to similar soft clustering assignments on the manifold. Based on such a novel prior, we propose a novel clustering method by minimizing the discrepancy between pairwise sample assignments for each data point. To the best of our knowledge, this could be the first work to reveal the sample-assignment invariance prior based on the idea of treating labels as ideal representations. Furthermore, the proposed method is one of the first end-to-end clustering approaches, which jointly learns clustering assignment and representation. Extensive experimental results show that the proposed method is remarkably superior to 16 state-of-the-art clustering methods on five image data sets in terms of four evaluation metrics.

12.
IEEE Trans Neural Netw Learn Syst ; 31(10): 4170-4184, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31899434

ABSTRACT

Low-rank representation-based approaches that assume low-rank tensors and exploit their low-rank structure with appropriate prior models have underpinned much of the recent progress in tensor completion. However, real tensor data only approximately comply with the low-rank requirement in most cases, viz., the tensor consists of low-rank (e.g., principal part) as well as non-low-rank (e.g., details) structures, which limits the completion accuracy of these approaches. To address this problem, we propose an adaptive low-rank representation model for tensor completion that represents low-rank and non-low-rank structures of a latent tensor separately in a Bayesian framework. Specifically, we reformulate the CANDECOMP/PARAFAC (CP) tensor rank and develop a sparsity-induced prior for the low-rank structure that can be used to determine tensor rank automatically. Then, the non-low-rank structure is modeled using a mixture of Gaussians prior that is shown to be sufficiently flexible and powerful to inform the completion process for a variety of real tensor data. With these two priors, we develop a Bayesian minimum mean-squared error estimate framework for inference. The developed framework can capture the important distinctions between low-rank and non-low-rank structures, thereby enabling more accurate modeling and, ultimately, completion. For various applications, compared with the state-of-the-art methods, the proposed model yields more accurate completion results.
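
As background for the CP rank mentioned in this abstract, a three-way tensor of CP rank R can be written as a sum of R rank-one terms built from three factor matrices. The sketch below only reconstructs a tensor from given factors; it does not implement the Bayesian model or the priors described above. The function name and the nested-list layout are assumptions.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cp_reconstruct(a: Matrix, b: Matrix, c: Matrix) -> List[List[List[float]]]:
    """Build an I x J x K tensor from CP factors.

    a is I x R, b is J x R, c is K x R, and
    tensor[i][j][k] = sum over r of a[i][r] * b[j][r] * c[k][r].
    """
    rank = len(a[0])
    return [[[sum(a[i][r] * b[j][r] * c[k][r] for r in range(rank))
              for k in range(len(c))]
             for j in range(len(b))]
            for i in range(len(a))]
```

The number of columns R shared by the three factors is the CP rank of the representation; the abstract's sparsity-induced prior is described as a way to determine that number automatically.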

13.
IEEE Trans Pattern Anal Mach Intell ; 42(5): 1228-1242, 2020 05.
Article in English | MEDLINE | ID: mdl-30668461

ABSTRACT

Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense prediction problems such as semantic segmentation and depth estimation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments on semantic segmentation which is a dense classification problem and achieve good performance on seven public datasets. We further apply our method for depth estimation and demonstrate the effectiveness of our method on dense regression problems.

14.
IEEE Trans Pattern Anal Mach Intell ; 42(7): 1654-1669, 2020 07.
Article in English | MEDLINE | ID: mdl-30835211

ABSTRACT

Landmark/pose estimation in single monocular images has received much effort in computer vision due to its important applications. It remains a challenging task when input images come with severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of landmark point inter-connectivity. To address the problem, by incorporating priors about the structure of pose components, we propose a novel structure-aware fully convolutional network to implicitly take such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish the real poses from the fake ones (such as biologically implausible ones). If the pose generator G generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D human pose estimation, 2D facial landmark estimation and 3D human pose estimation. The proposed approach significantly outperforms several state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicit learning of structures using GANs.

15.
Article in English | MEDLINE | ID: mdl-31796387

ABSTRACT

Visual Question Answering (VQA) has attracted extensive research focus recently. Along with the ever-increasing data scale and model complexity, the enormous training cost has become an emerging challenge for VQA. In this paper, we show such a massive training cost is indeed a plague. In contrast, a fine-grained design of the learning paradigm can be extremely beneficial in terms of both training efficiency and model accuracy. In particular, we argue that there exist two essential and unexplored issues in the existing VQA training paradigm that randomly samples data in each epoch, namely, the "difficulty diversity" and the "label redundancy". Concretely, "difficulty diversity" refers to the varying difficulty levels of different question types, while "label redundancy" refers to the redundant and noisy labels contained in individual question types. To tackle these two issues, in this paper we propose a fine-grained VQA learning paradigm with an actor-critic based learning agent, termed FG-A1C. Instead of using all training data from scratch, FG-A1C includes a learning agent that adaptively and intelligently schedules the most difficult question types in each training epoch. Subsequently, two curriculum learning based schemes are further designed to identify the most useful data to be learned within each individual question type. We conduct extensive experiments on the VQA2.0 and VQA-CP v2 datasets, which demonstrate the significant benefits of our approach. For instance, on VQA-CP v2, with less than 75% of the training data, our learning paradigms can help the model achieve better performance than using the whole dataset. Meanwhile, we also show the effectiveness of our method in guiding data labeling. Finally, the proposed paradigm can be seamlessly integrated with any cutting-edge VQA models, without modifying their structures.

16.
Plant Methods ; 15: 150, 2019.
Article in English | MEDLINE | ID: mdl-31857821

ABSTRACT

Background: Grain yield of wheat is greatly associated with the population of wheat spikes, i.e., spike number m⁻². To obtain this index in a reliable and efficient way, it is necessary to count wheat spikes accurately and automatically. Currently, computer vision technologies have shown great potential to automate this task effectively in a low-end manner. In particular, counting wheat spikes is a typical visual counting problem, which is substantially studied under the name of object counting in Computer Vision. TasselNet, which represents one of the state-of-the-art counting approaches, is a convolutional neural network-based local regression model, and currently holds the best record on counting maize tassels. However, when applying TasselNet to wheat spikes, it cannot predict accurate counts when spikes are only partially present. Results: In this paper, we make an important observation that the counting performance of local regression networks can be significantly improved via adding visual context to the local patches. Meanwhile, such context can be treated as part of the receptive field without increasing the model capacity. We thus propose a simple yet effective contextual extension of TasselNet, called TasselNetv2. If TasselNetv2 is implemented in a fully convolutional form, both training and inference can be greatly sped up by reducing redundant computations. In particular, we collected and labeled a large-scale wheat spikes counting (WSC) dataset, with 1764 high-resolution images and 675,322 manually-annotated instances. Extensive experiments show that TasselNetv2 not only achieves state-of-the-art performance on the WSC dataset (91.01% counting accuracy) but also is more than an order of magnitude faster than TasselNet (13.82 fps on 912 × 1216 images). The generality of TasselNetv2 is further demonstrated by advancing the state of the art on both the Maize Tassels Counting and ShanghaiTech Crowd Counting datasets.
Conclusions: This paper describes TasselNetv2 for counting wheat spikes, which simultaneously addresses two important use cases in plant counting: improving the counting accuracy without increasing model capacity, and improving efficiency without sacrificing accuracy. It is a promising candidate for deployment in a real-time system with high-throughput demand. In particular, TasselNetv2 can achieve sufficiently accurate results when training from scratch with small networks, and adopting larger pre-trained networks can further boost accuracy. In practice, one can trade off the performance and efficiency according to certain application scenarios. Code and models are made available at: https://tinyurl.com/TasselNetv2.

17.
IEEE Trans Image Process ; 28(12): 6116-6125, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31265400

ABSTRACT

Humans are capable of learning a new fine-grained concept with very little supervision, e.g., a few example images for a species of bird, yet our best deep learning systems need hundreds or thousands of labeled examples. In this paper, we try to reduce this gap by studying the fine-grained image recognition problem in a challenging few-shot learning setting, termed few-shot fine-grained recognition (FSFG). The task of FSFG requires the learning systems to build classifiers for the novel fine-grained categories from few examples (only one or fewer than five). To solve this problem, we propose an end-to-end trainable deep network, which is inspired by the state-of-the-art fine-grained recognition model and is tailored for the FSFG task. Specifically, our network consists of a bilinear feature learning module and a classifier mapping module: while the former encodes the discriminative information of an exemplar image into a feature vector, the latter maps the intermediate feature into the decision boundary of the novel category. The key novelty of our model is a "piecewise mappings" function in the classifier mapping module, which generates the decision boundary via learning a set of more attainable sub-classifiers in a more parameter-economic way. We learn the exemplar-to-classifier mapping based on an auxiliary dataset in a meta-learning fashion, which is expected to be able to generalize to novel categories. By conducting comprehensive experiments on three fine-grained datasets, we demonstrate that the proposed method achieves superior performance over the competing baselines.

18.
J BUON ; 24(2): 819-825, 2019.
Article in English | MEDLINE | ID: mdl-31128041

ABSTRACT

PURPOSE: Melanoma is a malignant skin tumor that can easily metastasize, while no effective treatment exists for this disease. This study explored the mechanism of microRNA-29c in inhibiting melanoma cell growth. METHODS: Bioinformatics analysis and polymerase chain reaction (PCR) experiments were performed to analyze the expression of microRNA-29c in various samples. The Cell Counting Kit-8 (CCK-8) experiment was used to detect cell viability. The mimic and inhibitor of microRNA-29c were transfected into melanoma cells to achieve microRNA-29c overexpression or knockdown so as to observe the biological effect on the melanoma cells. Flow cytometry was used to detect cell cycle, while the luciferase reporter gene assay was used for predicting microRNA-29c target genes. Western blot was performed to determine the cellular protein expression. RESULTS: microRNA-29c was highly expressed in melanoma cells. Overexpression of microRNA-29c inhibited cell viability and induced G1 cell cycle arrest. Conversely, cell proliferation and cycle progression were promoted by transfection of microRNA-29c inhibitor in melanoma cells. In addition, CDK6 served as a microRNA-29c target gene. G1 phase of melanoma cells was blocked by knockdown of CDK6. CONCLUSIONS: microRNA-29c can inhibit the growth of melanoma cells by targeting CDK6, which could trigger G1 arrest of melanoma cells.


Subjects
Cell Proliferation/genetics , Cyclin-Dependent Kinase 6/genetics , Melanoma/genetics , MicroRNAs/genetics , Apoptosis/genetics , Cell Line, Tumor , Cell Survival/genetics , Computational Biology , Female , G1 Phase Cell Cycle Checkpoints/genetics , Gene Expression Regulation, Neoplastic , Humans , Male , Melanoma/pathology , Middle Aged

19.
IEEE Trans Med Imaging ; 38(9): 2092-2103, 2019 09.
Article in English | MEDLINE | ID: mdl-30668469

ABSTRACT

Automated skin lesion classification in dermoscopy images is an essential way to improve the diagnostic performance and reduce melanoma deaths. Although deep convolutional neural networks (DCNNs) have made dramatic breakthroughs in many image classification tasks, accurate classification of skin lesions remains challenging due to the insufficiency of training data, inter-class similarity, intra-class variation, and the lack of the ability to focus on semantically meaningful lesion parts. To address these issues, we propose an attention residual learning convolutional neural network (ARL-CNN) model for skin lesion classification in dermoscopy images, which is composed of multiple ARL blocks, a global average pooling layer, and a classification layer. Each ARL block jointly uses residual learning and a novel attention learning mechanism to improve its ability for discriminative representation. Instead of using extra learnable layers, the proposed attention learning mechanism aims to exploit the intrinsic self-attention ability of DCNNs, i.e., using the feature maps learned by a high layer to generate the attention map for a low layer. We evaluated our ARL-CNN model on the ISIC-skin 2017 dataset. Our results indicate that the proposed ARL-CNN model can adaptively focus on the discriminative parts of skin lesions, and thus achieve state-of-the-art performance in skin lesion classification.


Subjects
Image Interpretation, Computer-Assisted/methods , Neural Networks, Computer , Skin Diseases/diagnostic imaging , Skin/diagnostic imaging , Databases, Factual , Dermoscopy/methods , Humans , Signal Processing, Computer-Assisted , Skin Diseases/classification

20.
Article in English | MEDLINE | ID: mdl-30668473

ABSTRACT

Salient object detection (SOD), which aims to identify and locate the most salient pixels or regions in images, has been attracting more and more interest due to its various real-world applications. However, this vision task is quite challenging, especially under complex image scenes. Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection. Specifically, we design a symmetrical fully convolutional network (SFCN) to effectively learn complementary saliency features under the guidance of lossless feature reflection. The location information of salient objects, together with contextual and semantic information, is jointly utilized to supervise the proposed network for more accurate saliency predictions. In addition, to overcome the blurry boundary problem, we propose a new weighted structural loss function to ensure clear object boundaries and spatially consistent saliency. The coarse prediction results are effectively refined by this structural information for performance improvements. Extensive experiments on seven saliency detection datasets demonstrate that our approach achieves consistently superior performance and outperforms the very recent state-of-the-art methods by a large margin.
