ABSTRACT
For small-object detection, vision patterns alone provide only limited support for feature learning. Most prior schemes depend mainly on a single vision pattern to learn object features and seldom consider latent motion patterns. In the real world, humans often perceive small objects efficiently through multipattern signals. Inspired by this observation, this article addresses small-object detection from a new perspective of latent pattern learning. To this end, it regards a real-world moving object as a spatiotemporal sequence of a static object in order to capture latent motion patterns. Accordingly, we propose a motion-inspired cross-pattern learning (MICPL) scheme to capture motion patterns in moving small-object scenarios. This scheme consists of two crucial parts: motion pattern mining (MPM) and motion-vision adaptation. The former is designed to effectively mine motion patterns from a time-dependent representation space. The latter is devised to correlate motion patterns with vision semantics. Meanwhile, we explore their cross-pattern interactions to guide MICPL in capturing motion patterns effectively. Comparative experiments verify that, assisted by motion patterns, even a simple detector can often refresh state-of-the-art (SOTA) results on moving small-object detection. Moreover, experiments on two small-object-related tasks further demonstrate the adaptability and advantages of our cross-pattern feature learning scheme. Our source code is available at https://github.com/UESTC-nnLab/MICPL.
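As a rough illustration of the cross-pattern idea only (not the authors' released implementation), the PyTorch sketch below approximates a motion pattern by temporal feature differences across a short clip and adapts it to the key-frame vision features via cross-attention; the module name CrossPatternFusion, the tensor shapes, and the difference-based motion cue are all assumptions.

import torch
import torch.nn as nn

class CrossPatternFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.motion_proj = nn.Linear(dim, dim)                   # project the temporal-difference cue
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_feats):
        # frame_feats: (B, T, N, C) -- T frames, N spatial tokens, C channels
        B, T, N, C = frame_feats.shape
        motion = frame_feats[:, 1:] - frame_feats[:, :-1]        # crude motion pattern: frame-to-frame differences
        motion = self.motion_proj(motion.mean(dim=1))            # (B, N, C) aggregated motion tokens
        vision = frame_feats[:, -1]                              # key-frame vision tokens
        fused, _ = self.attn(query=vision, key=motion, value=motion)  # motion-vision adaptation
        return vision + fused                                    # motion-adapted vision features

feats = torch.randn(2, 5, 196, 256)                              # a 5-frame clip of token features
print(CrossPatternFusion()(feats).shape)                         # torch.Size([2, 196, 256])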
ABSTRACT
BACKGROUND AND OBJECTIVE: Pathology image classification is one of the most essential auxiliary processes in cancer diagnosis. To overcome the problem of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) methods have attracted wide attention in pathology image classification. In this type of method, the pseudo-bag division scheme is usually a primary factor affecting classification performance. To improve on the existing random and clustering-based divisions of WSI pseudo-bags, this paper proposes a new Prototype-driven Division (ProDiv) scheme for the pseudo-bag-based MIL classification framework on pathology images. METHODS: The scheme first designs an attention-based method to generate a bag prototype for each slide. On this basis, it groups WSI patch instances into a series of instance clusters according to the feature similarities between the prototype and the patches. Finally, pseudo-bags are obtained by randomly combining non-overlapping patch instances from different instance clusters. Moreover, ProDiv is designed with practicality in mind: it can be smoothly assembled with almost all recent MIL-based WSI classification methods. RESULTS: Empirical results show that ProDiv, when integrated with several existing methods, can deliver classification AUC improvements of up to 7.3% and 10.3% on two public WSI datasets, respectively. CONCLUSIONS: ProDiv could almost always bring clear performance improvements to the compared MIL models on typical metrics, which suggests the effectiveness of our scheme. Visualization results also intuitively support the correctness of the proposed ProDiv.
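A minimal sketch of the division step under simplifying assumptions: attention logits are given, patches are ranked by cosine similarity to the attention-weighted prototype, similarity-ranked chunks stand in for instance clusters, and each cluster is spread at random across the pseudo-bags. The cluster and pseudo-bag counts are placeholders, not the paper's settings.

import torch
import torch.nn.functional as F

def prodiv_split(patch_feats, attn_scores, n_clusters=8, n_pseudo_bags=4):
    # patch_feats: (N, C) patch embeddings; attn_scores: (N,) attention logits
    w = torch.softmax(attn_scores, dim=0)                        # attention weights over patches
    prototype = (w.unsqueeze(1) * patch_feats).sum(dim=0)        # attention-based bag prototype (C,)
    sim = F.cosine_similarity(patch_feats, prototype.unsqueeze(0), dim=1)
    order = torch.argsort(sim, descending=True)                  # rank patches by prototype similarity
    clusters = torch.chunk(order, n_clusters)                    # similarity-level instance clusters
    pseudo_bags = [[] for _ in range(n_pseudo_bags)]
    for cluster in clusters:                                     # spread every cluster across all pseudo-bags
        perm = cluster[torch.randperm(len(cluster))]
        for i, part in enumerate(torch.chunk(perm, n_pseudo_bags)):
            pseudo_bags[i].append(part)
    return [torch.cat(b) for b in pseudo_bags]                   # disjoint patch index sets

feats, scores = torch.randn(1000, 512), torch.randn(1000)
print([len(b) for b in prodiv_split(feats, scores)])             # roughly equal-sized pseudo-bags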
Subject(s)
Benchmarking; Cluster Analysis
ABSTRACT
Given the special challenge of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks face two unavoidable problems in training: (i) insufficient WSI data and (ii) the sample memorization tendency inherent in neural networks. These problems may prevent MIL models from adequate and efficient training, limiting further performance gains of classification models on WSIs. Inspired by the basic idea of Mixup, this paper proposes a new Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models. The scheme generalizes the Mixup strategy from general images to WSIs via pseudo-bags so that it can be applied in MIL-based WSI classification. With the help of pseudo-bags, PseMix fulfills the critical size alignment and semantic alignment of the Mixup strategy. Moreover, it is designed as an efficient and decoupled method that neither involves time-consuming operations nor relies on MIL model predictions. Comparative experiments and ablation studies are specially designed to evaluate the effectiveness and advantages of PseMix. Experimental results show that PseMix could often assist state-of-the-art MIL networks in refreshing their classification performance on WSIs. It could also boost the generalization of MIL models in special test scenarios and improve their robustness to patch occlusion and label noise. Our source code is available at https://github.com/liupei101/PseMix.
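A minimal sketch of pseudo-bag-level Mixup, assuming random splits as pseudo-bags and a Beta-sampled mixing ratio; it only illustrates how pseudo-bags enable size and semantic alignment and is not the released implementation.

import torch

def psemix(bag_a, bag_b, label_a, label_b, n_pseudo=8, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item() # mixing ratio
    k = round(lam * n_pseudo)                                    # number of pseudo-bags taken from bag A
    split = lambda bag: list(torch.chunk(bag[torch.randperm(len(bag))], n_pseudo))
    pseudo_a, pseudo_b = split(bag_a), split(bag_b)
    mixed_bag = torch.cat(pseudo_a[:k] + pseudo_b[k:])           # size-aligned mixed bag
    mixed_label = (k / n_pseudo) * label_a + (1 - k / n_pseudo) * label_b  # semantic alignment of labels
    return mixed_bag, mixed_label

bag_a, bag_b = torch.randn(900, 512), torch.randn(1200, 512)     # two WSI bags of patch features
y_a, y_b = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])    # one-hot slide labels
mixed_bag, mixed_y = psemix(bag_a, bag_b, y_a, y_b)
print(mixed_bag.shape, mixed_y)

Note that the routine never queries a model prediction, which mirrors the decoupled design described above.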
Subject(s)
Algorithms; Image Interpretation, Computer-Assisted; Humans; Image Interpretation, Computer-Assisted/methods; Machine Learning; Neural Networks, Computer; Deep Learning
ABSTRACT
Survival analysis on histological whole-slide images (WSIs) is one of the most important means of estimating patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients with only a fully certain point estimate of time-to-event, and they can only learn from the currently small-scale labeled WSI data. To tackle these problems, we propose a novel adversarial multiple instance learning (AdvMIL) framework. This framework is based on adversarial time-to-event modeling and integrates multiple instance learning (MIL), which is essential for WSI representation learning. It is plug-and-play, so most existing MIL-based end-to-end methods can be easily upgraded by applying it, gaining the improved abilities of survival distribution estimation and semi-supervised learning. Our extensive experiments show that AdvMIL not only could often bring performance improvements to mainstream WSI survival analysis methods at a relatively low computational cost, but also enables these methods to effectively utilize unlabeled data via semi-supervised learning. Moreover, we observe that AdvMIL could help improve the robustness of models against patch occlusion and two representative types of image noise. The proposed AdvMIL framework could promote research on survival analysis in computational pathology with its novel adversarial MIL paradigm.
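The adversarial time-to-event idea could be schematized as below, with an attention-based MIL encoder feeding a generator that maps a bag embedding plus noise to a time estimate, and a discriminator that judges (bag embedding, time) pairs; all module names and dimensions are illustrative assumptions, and the training losses are omitted.

import torch
import torch.nn as nn

class MILEncoder(nn.Module):
    def __init__(self, dim=512, hid=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hid), nn.Tanh(), nn.Linear(hid, 1))
        self.embed = nn.Linear(dim, hid)

    def forward(self, bag):                                      # bag: (N, dim) patch features
        a = torch.softmax(self.score(bag), dim=0)                # attention over patches
        return (a * self.embed(bag)).sum(dim=0)                  # bag embedding (hid,)

class Generator(nn.Module):
    def __init__(self, hid=256, noise=32):
        super().__init__()
        self.enc = MILEncoder(hid=hid)
        self.head = nn.Sequential(nn.Linear(hid + noise, hid), nn.ReLU(), nn.Linear(hid, 1))

    def forward(self, bag, z):                                   # z: (noise,) random vector
        h = self.enc(bag)
        return h, self.head(torch.cat([h, z]))                   # a sampled time-to-event

discriminator = nn.Sequential(nn.Linear(256 + 1, 128), nn.ReLU(), nn.Linear(128, 1))
G = Generator()
bag, z = torch.randn(800, 512), torch.randn(32)
h, t = G(bag, z)
print(discriminator(torch.cat([h, t])).shape)                    # real/fake score for the (embedding, time) pair

Sampling different z for the same bag yields different times, which is how a distribution over time-to-event, rather than a single point estimate, can be read out.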
Subject(s)
Supervised Machine Learning; Humans; Survival Analysis
ABSTRACT
BACKGROUND AND OBJECTIVE: Predicting patient survival from gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To learn effective WSI representations for survival prediction, existing deep learning methods have explored graphs to describe the complex structure within WSIs, where each graph node corresponds to a WSI patch. However, these graphs are often densely connected or static, leading to redundant or missing patch correlations. Moreover, these methods cannot be directly scaled to very large WSIs with more than 10,000 patches. To address these issues, this paper proposes a scalable graph convolution network, GraphLSurv, which can efficiently learn adaptive and sparse structures to better characterize WSIs for survival prediction. METHODS: GraphLSurv has three methodological highlights: (1) it generates adaptive and sparse structures for patches so that latent patch correlations can be captured and adjusted dynamically according to the prediction task; (2) based on the generated structure and a given graph, GraphLSurv aggregates local microenvironmental cues into a non-local embedding using the proposed hybrid message passing network; (3) to make this network suitable for very large-scale graphs, it adopts an anchor-based technique to reduce the theoretical computational complexity. RESULTS: Experiments on 2268 WSIs show that GraphLSurv achieves a concordance index of 0.66132 on NLST and 0.68348 on TCGA-BRCA, improvements of 3.79% and 3.41% over existing methods, respectively. CONCLUSIONS: GraphLSurv could often perform better than previous methods, which suggests that it provides an important and effective means for WSI survival prediction. Moreover, this work empirically shows that adaptive and sparse structures can be more suitable than static or dense ones for modeling WSIs.
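The anchor-based idea could look roughly like the following sketch, in which every patch keeps only its top-k links to a small sampled anchor set and messages are passed patch -> anchor -> patch, so the dense N x N adjacency is never formed; the anchor sampling, the top-k rule, and the omission of the hybrid local branch are all simplifications.

import torch
import torch.nn.functional as F

def anchor_message_passing(x, n_anchors=64, k=8):
    # x: (N, C) patch features, N possibly larger than 10,000
    anchors = x[torch.randperm(len(x))[:n_anchors]]                      # sampled anchor features (M, C)
    sim = F.normalize(x, dim=1) @ F.normalize(anchors, dim=1).t()        # (N, M) patch-anchor similarities
    topk, idx = sim.topk(k, dim=1)                                       # keep k strongest anchor links per patch
    adj = torch.zeros_like(sim).scatter_(1, idx, torch.softmax(topk, dim=1))  # sparse, adaptively weighted structure
    anchor_msg = adj.t() @ x                                             # aggregate patches into anchors (M, C)
    return adj @ anchor_msg                                              # broadcast non-local context back to patches (N, C)

x = torch.randn(12000, 384)
print(anchor_message_passing(x).shape)                                   # torch.Size([12000, 384])

The cost scales with N x M instead of N x N, which is what makes graphs over tens of thousands of patches tractable.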
ABSTRACT
Unsupervised domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recently, multisource domain adaptation (MDA) has begun to attract attention; its performance should go beyond simply mixing all source domains together for knowledge transfer. In this article, we propose a novel prototype-based method for MDA. Specifically, to cope with the absence of labels in the target domain, we use prototypes to transfer semantic category information from the source domains to the target domain. First, a feature extraction network is applied to both source and target domains to obtain extracted features, from which domain-invariant features and domain-specific features are disentangled. Based on these two kinds of features, the so-called inherent class prototypes and domain prototypes are estimated, respectively. A prototype mapping to the extracted feature space is then learned through a feature reconstruction process, so that the class prototypes for all source and target domains can be constructed in the extracted feature space from the domain prototypes and inherent class prototypes. By forcing the extracted features to be close to their corresponding class prototypes for all domains, the feature extraction network is progressively adjusted. In the end, the inherent class prototypes are used as a classifier in the target domain. Our contribution is that, through the inherent class prototypes and domain prototypes, semantic category information from the source domains is transferred to the target domain by constructing the corresponding class prototypes. In our method, all source and target domains are aligned twice at the feature level, yielding better domain-invariant features and features closer to the class prototypes, respectively. Experiments on public datasets also demonstrate the effectiveness of our method.
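A simplified sketch of the prototype mechanism, assuming that a domain-specific class prototype is the sum of an inherent class prototype and a domain prototype, that alignment is a plain MSE pull toward the prototypes, and that target samples are labeled by the nearest inherent prototype; all shapes and the additive composition are assumptions for illustration.

import torch
import torch.nn.functional as F

n_classes, n_domains, dim = 10, 3, 256
inherent_proto = torch.randn(n_classes, dim, requires_grad=True)         # shared semantic (inherent class) prototypes
domain_proto = torch.randn(n_domains, dim, requires_grad=True)           # per-domain prototypes

def alignment_loss(feats, labels, domain_id):
    # feats: (B, dim) extracted features from one domain; labels: (B,)
    class_protos = inherent_proto + domain_proto[domain_id]              # domain-specific class prototypes
    return F.mse_loss(feats, class_protos[labels])                       # pull each feature to its class prototype

def classify_target(feats):
    dists = torch.cdist(feats, inherent_proto)                           # (B, n_classes) distances
    return dists.argmin(dim=1)                                           # inherent prototypes act as the target classifier

src_feats, src_labels = torch.randn(32, dim), torch.randint(0, n_classes, (32,))
print(alignment_loss(src_feats, src_labels, domain_id=0).item())
print(classify_target(torch.randn(5, dim)))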
ABSTRACT
This correspondence presents a coarse-to-fine binary-image thinning algorithm by proposing a template-based pulse-coupled neural network model. Under the control of coupled templates, the algorithm iteratively skeletonizes a binary image by changing the load signals of the pulse neurons. A direction-constraining scheme for avoiding fingerprint ridge spikes is also discussed. Experiments show that the algorithm is effective for thinning fingerprints as well as other common images. Moreover, it can be coupled with a fingerprint identification system to improve recognition performance.
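The PCNN formulation itself is not reproduced here; as a stand-in, the sketch below uses a classical Zhang-Suen-style template pass only to illustrate iterative, template-driven skeletonization of a binary image (1 = foreground), without the pulse-neuron load signals or the direction constraint.

import numpy as np

def zhang_suen_thinning(img):
    img = img.copy().astype(np.uint8)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):                                      # the two alternating sub-iterations
            to_delete = []
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    if img[y, x] != 1:
                        continue
                    p = [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                         img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]
                    b = sum(p)                                   # foreground neighbours
                    a = sum((p[i] == 0 and p[(i + 1) % 8] == 1) for i in range(8))  # 0->1 transitions
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))                 # pixel matches a deletion template
            for y, x in to_delete:
                img[y, x] = 0
            changed = changed or bool(to_delete)
    return img

stroke = np.zeros((12, 12), dtype=np.uint8)
stroke[4:8, 2:10] = 1                                            # a thick horizontal stroke
print(zhang_suen_thinning(stroke))                               # thinned toward a one-pixel-wide skeleton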