RESUMO
Foundation vision-language models are currently transforming computer vision, and are on the rise in medical imaging fueled by their very promising generalization capabilities. However, the initial attempts to transfer this new paradigm to medical imaging have shown less impressive performances than those observed in other domains, due to the significant domain shift and the complex, expert domain knowledge inherent to medical-imaging tasks. Motivated by the need for domain-expert foundation models, we present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. To this end, we compiled 38 open-access, mostly categorical fundus imaging datasets from various sources, with up to 101 different target conditions and 288,307 images. We integrate the expert's domain knowledge in the form of descriptive textual prompts, during both pre-training and zero-shot inference, enhancing the less-informative categorical supervision of the data. Such a textual expert's knowledge, which we compiled from the relevant clinical literature and community standards, describes the fine-grained features of the pathologies as well as the hierarchies and dependencies between them. We report comprehensive evaluations, which illustrate the benefit of integrating expert knowledge and the strong generalization capabilities of FLAIR under difficult scenarios with domain shifts or unseen categories. When adapted with a lightweight linear probe, FLAIR outperforms fully-trained, dataset-focused models, more so in the few-shot regimes. Interestingly, FLAIR outperforms by a wide margin larger-scale generalist image-language models and retina domain-specific self-supervised networks, which emphasizes the potential of embedding experts' domain knowledge and the limitations of generalist models in medical imaging. The pre-trained model is available at: https://github.com/jusiro/FLAIR.
RESUMO
BACKGROUND: Bladder cancer (BC) segmentation on MRI images is the first step to determining the presence of muscular invasion. This study aimed to assess the tumor segmentation performance of three deep learning (DL) models on multi-parametric MRI (mp-MRI) images. METHODS: We studied 53 patients with bladder cancer. Bladder tumors were segmented on each slice of T2-weighted (T2WI), diffusion-weighted imaging/apparent diffusion coefficient (DWI/ADC), and T1-weighted contrast-enhanced (T1WI) images acquired at a 3Tesla MRI scanner. We trained Unet, MAnet, and PSPnet using three loss functions: cross-entropy (CE), dice similarity coefficient loss (DSC), and focal loss (FL). We evaluated the model performances using DSC, Hausdorff distance (HD), and expected calibration error (ECE). RESULTS: The MAnet algorithm with the CE+DSC loss function gave the highest DSC values on the ADC, T2WI, and T1WI images. PSPnet with CE+DSC obtained the smallest HDs on the ADC, T2WI, and T1WI images. The segmentation accuracy overall was better on the ADC and T1WI than on the T2WI. The ECEs were the smallest for PSPnet with FL on the ADC images, while they were the smallest for MAnet with CE+DSC on the T2WI and T1WI. CONCLUSIONS: Compared to Unet, MAnet and PSPnet with a hybrid CE+DSC loss function displayed better performances in BC segmentation depending on the choice of the evaluation metric.
RESUMO
Most segmentation losses are arguably variants of the Cross-Entropy (CE) or Dice losses. On the surface, these two categories of losses (i.e., distribution based vs. geometry based) seem unrelated, and there is no clear consensus as to which category is a better choice, with varying performances for each across different benchmarks and applications. Furthermore, it is widely argued within the medical-imaging community that Dice and CE are complementary, which has motivated the use of compound CE-Dice losses. In this work, we provide a theoretical analysis, which shows that CE and Dice share a much deeper connection than previously thought. First, we show that, from a constrained-optimization perspective, they both decompose into two components, i.e., a similar ground-truth matching term, which pushes the predicted foreground regions towards the ground-truth, and a region-size penalty term imposing different biases on the size (or proportion) of the predicted regions. Then, we provide bound relationships and an information-theoretic analysis, which uncover hidden region-size biases: Dice has an intrinsic bias towards specific extremely imbalanced solutions, whereas CE implicitly encourages the ground-truth region proportions. Our theoretical results explain the wide experimental evidence in the medical-imaging literature, whereby Dice losses bring improvements for imbalanced segmentation. It also explains why CE dominates natural-image problems with diverse class proportions, in which case Dice might have difficulty adapting to different region-size distributions. Based on our theoretical analysis, we propose a principled and simple solution, which enables to control explicitly the region-size bias. The proposed method integrates CE with explicit terms based on L1 or the KL divergence, which encourage segmenting region proportions to match target class proportions, thereby mitigating class imbalance but without losing generality. Comprehensive experiments and ablation studies over different losses and applications validate our theoretical analysis, as well as the effectiveness of explicit and simple region-size terms. The code is available at https://github.com/by-liu/SegLossBias .
RESUMO
Semi-supervised learning relaxes the need of large pixel-wise labeled datasets for image segmentation by leveraging unlabeled data. A prominent way to exploit unlabeled data is to regularize model predictions. Since the predictions of unlabeled data can be unreliable, uncertainty-aware schemes are typically employed to gradually learn from meaningful and reliable predictions. Uncertainty estimation methods, however, rely on multiple inferences from the model predictions that must be computed for each training step, which is computationally expensive. Moreover, these uncertainty maps capture pixel-wise disparities and do not consider global information. This work proposes a novel method to estimate segmentation uncertainty by leveraging global information from the segmentation masks. More precisely, an anatomically-aware representation is first learnt to model the available segmentation masks. The learnt representation thereupon maps the prediction of a new segmentation into an anatomically-plausible segmentation. The deviation from the plausible segmentation aids in estimating the underlying pixel-level uncertainty in order to further guide the segmentation network. The proposed method consequently estimates the uncertainty using a single inference from our representation, thereby reducing the total computation. We evaluate our method on two publicly available segmentation datasets of left atria in cardiac MRIs and of multiple organs in abdominal CTs. Our anatomically-aware method improves the segmentation accuracy over the state-of-the-art semi-supervised methods in terms of two commonly used evaluation metrics.
Assuntos
Benchmarking , Átrios do Coração , Humanos , Incerteza , Aprendizado de Máquina Supervisionado , Processamento de Imagem Assistida por ComputadorRESUMO
Neonatal MRIs are used increasingly in preterm infants. However, it is not always feasible to analyze this data. Having a tool that assesses brain maturation during this period of extraordinary changes would be immensely helpful. Approaches based on deep learning approaches could solve this task since, once properly trained and validated, they can be used in practically any system and provide holistic quantitative information in a matter of minutes. However, one major deterrent for radiologists is that these tools are not easily interpretable. Indeed, it is important that structures driving the results be detailed and survive comparison to the available literature. To solve these challenges, we propose an interpretable pipeline based on deep learning to predict postmenstrual age at scan, a key measure for assessing neonatal brain development. For this purpose, we train a state-of-the-art deep neural network to segment the brain into 87 different regions using normal preterm and term infants from the dHCP study. We then extract informative features for brain age estimation using the segmented MRIs and predict the brain age at scan with a regression model. The proposed framework achieves a mean absolute error of 0.46 weeks to predict postmenstrual age at scan. While our model is based solely on structural T2-weighted images, the results are superior to recent, arguably more complex approaches. Furthermore, based on the extracted knowledge from the trained models, we found that frontal and parietal lobes are among the most important structures for neonatal brain age estimation.
Assuntos
Recém-Nascido Prematuro , Nascimento Prematuro , Feminino , Humanos , Recém-Nascido , Lactente , Encéfalo/diagnóstico por imagem , Imageamento por Ressonância Magnética/métodos , Redes Neurais de ComputaçãoRESUMO
Despite the undeniable progress in visual recognition tasks fueled by deep neural networks, there exists recent evidence showing that these models are poorly calibrated, resulting in over-confident predictions. The standard practices of minimizing the cross-entropy loss during training promote the predicted softmax probabilities to match the one-hot label assignments. Nevertheless, this yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations, which exacerbates the miscalibration problem. Recent observations from the classification literature suggest that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. Despite these findings, the impact of these losses in the relevant task of calibrating medical image segmentation networks remains unexplored. In this work, we provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian term) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent from reaching the best compromise between the discriminative performance and calibration of the model during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of public medical image segmentation benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, whereas the discriminative performance is also improved. The code is available at https://github.com/Bala93/MarginLoss.
Assuntos
Processamento de Imagem Assistida por Computador , Redes Neurais de Computação , Humanos , Processamento de Imagem Assistida por Computador/métodos , Calibragem , EntropiaRESUMO
Despite achieving promising results in a breadth of medical image segmentation tasks, deep neural networks (DNNs) require large training datasets with pixel-wise annotations. Obtaining these curated datasets is a cumbersome process which limits the applicability of DNNs in scenarios where annotated images are scarce. Mixed supervision is an appealing alternative for mitigating this obstacle. In this setting, only a small fraction of the data contains complete pixel-wise annotations and other images have a weaker form of supervision, e.g., only a handful of pixels are labeled. In this work, we propose a dual-branch architecture, where the upper branch (teacher) receives strong annotations, while the bottom one (student) is driven by limited supervision and guided by the upper branch. Combined with a standard cross-entropy loss over the labeled pixels, our novel formulation integrates two important terms: (i) a Shannon entropy loss defined over the less-supervised images, which encourages confident student predictions in the bottom branch; and (ii) a Kullback-Leibler (KL) divergence term, which transfers the knowledge (i.e., predictions) of the strongly supervised branch to the less-supervised branch and guides the entropy (student-confidence) term to avoid trivial solutions. We show that the synergy between the entropy and KL divergence yields substantial improvements in performance. We also discuss an interesting link between Shannon-entropy minimization and standard pseudo-mask generation, and argue that the former should be preferred over the latter for leveraging information from unlabeled pixels. We evaluate the effectiveness of the proposed formulation through a series of quantitative and qualitative experiments using two publicly available datasets. Results demonstrate that our method significantly outperforms other strategies for semantic segmentation within a mixed-supervision framework, as well as recent semi-supervised approaches. Moreover, in line with recent observations in classification, we show that the branch trained with reduced supervision and guided by the top branch largely outperforms the latter. Our code is publicly available: https://github.com/by-liu/ConfKD.
Assuntos
Redes Neurais de Computação , Semântica , Humanos , EntropiaRESUMO
Domain adaptation (DA) has drawn high interest for its capacity to adapt a model trained on labeled source data to perform well on unlabeled or weakly labeled target data from a different domain. Most common DA techniques require concurrent access to the input images of both the source and target domains. However, in practice, privacy concerns often impede the availability of source images in the adaptation phase. This is a very frequent DA scenario in medical imaging, where, for instance, the source and target images could come from different clinical sites. We introduce a source-free domain adaptation for image segmentation. Our formulation is based on minimizing a label-free entropy loss defined over target-domain data, which we further guide with weak labels of the target samples and a domain-invariant prior on the segmentation regions. Many priors can be derived from anatomical information. Here, a class-ratio prior is estimated from anatomical knowledge and integrated in the form of a Kullback-Leibler (KL) divergence in our overall loss function. Furthermore, we motivate our overall loss with an interesting link to maximizing the mutual information between the target images and their label predictions. We show the effectiveness of our prior-aware entropy minimization in a variety of domain-adaptation scenarios, with different modalities and applications, including spine, prostate and cardiac segmentation. Our method yields comparable results to several state-of-the-art adaptation techniques, despite having access to much less information, as the source images are entirely absent in our adaptation phase. Our straightforward adaptation strategy uses only one network, contrary to popular adversarial techniques, which are not applicable to a source-free DA setting. Our framework can be readily used in a breadth of segmentation problems, and our code is publicly available: https://github.com/mathilde-b/SFDA.
Assuntos
Próstata , Coluna Vertebral , Humanos , Masculino , Processamento de Imagem Assistida por Computador/métodosRESUMO
Current unsupervised anomaly localization approaches rely on generative models to learn the distribution of normal images, which is later used to identify potential anomalous regions derived from errors on the reconstructed images. To address the limitations of residual-based anomaly localization, very recent literature has focused on attention maps, by integrating supervision on them in the form of homogenization constraints. In this work, we propose a novel formulation that addresses the problem in a more principled manner, leveraging well-known knowledge in constrained optimization. In particular, the equality constraint on the attention maps in prior work is replaced by an inequality constraint, which allows more flexibility. In addition, to address the limitations of penalty-based functions we employ an extension of the popular log-barrier methods to handle the constraint. Last, we propose an alternative regularization term that maximizes the Shannon entropy of the attention maps, reducing the amount of hyperparameters of the proposed model. Comprehensive experiments on two publicly available datasets on brain lesion segmentation demonstrate that the proposed approach substantially outperforms relevant literature, establishing new state-of-the-art results for unsupervised lesion segmentation.
RESUMO
Learning similarity is a key aspect in medical image analysis, particularly in recommendation systems or in uncovering the interpretation of anatomical data in images. Most existing methods learn such similarities in the embedding space over image sets using a single metric learner. Images, however, have a variety of object attributes such as color, shape, or artifacts. Encoding such attributes using a single metric learner is inadequate and may fail to generalize. Instead, multiple learners could focus on separate aspects of these attributes in subspaces of an overarching embedding. This, however, implies the number of learners to be found empirically for each new dataset. This work, Dynamic Subspace Learners, proposes to dynamically exploit multiple learners by removing the need of knowing apriori the number of learners and aggregating new subspace learners during training. Furthermore, the visual interpretability of such subspace learning is enforced by integrating an attention module into our method. This integrated attention mechanism provides a visual insight of discriminative image features that contribute to the clustering of image sets and a visual explanation of the embedding features. The benefits of our attention-based dynamic subspace learners are evaluated in the application of image clustering, image retrieval, and weakly supervised segmentation. Our method achieves competitive results with the performances of multiple learners baselines and significantly outperforms the classification network in terms of clustering and retrieval scores on three different public benchmark datasets. Moreover, our method also provides an attention map generated directly during inference to illustrate the visual interpretability of the embedding features. These attention maps offer a proxy-labels, which improves the segmentation accuracy up to 15% in Dice scores when compared to state-of-the-art interpretation techniques.
Assuntos
Algoritmos , Inteligência Artificial , Artefatos , Análise por Conglomerados , HumanosRESUMO
The segmentation of retinal vasculature from eye fundus images is a fundamental task in retinal image analysis. Over recent years, increasingly complex approaches based on sophisticated Convolutional Neural Network architectures have been pushing performance on well-established benchmark datasets. In this paper, we take a step back and analyze the real need of such complexity. We first compile and review the performance of 20 different techniques on some popular databases, and we demonstrate that a minimalistic version of a standard U-Net with several orders of magnitude less parameters, carefully trained and rigorously evaluated, closely approximates the performance of current best techniques. We then show that a cascaded extension (W-Net) reaches outstanding performance on several popular datasets, still using orders of magnitude less learnable weights than any previously published work. Furthermore, we provide the most comprehensive cross-dataset performance analysis to date, involving up to 10 different databases. Our analysis demonstrates that the retinal vessel segmentation is far from solved when considering test images that differ substantially from the training data, and that this task represents an ideal scenario for the exploration of domain adaptation techniques. In this context, we experiment with a simple self-labeling strategy that enables moderate enhancement of cross-dataset performance, indicating that there is still much room for improvement in this area. Finally, we test our approach on Artery/Vein and vessel segmentation from OCTA imaging problems, where we again achieve results well-aligned with the state-of-the-art, at a fraction of the model complexity available in recent literature. Code to reproduce the results in this paper is released.
Assuntos
Redes Neurais de Computação , Vasos Retinianos , Fundo de Olho , Processamento de Imagem Assistida por Computador/métodos , Retina/diagnóstico por imagem , Vasos Retinianos/diagnóstico por imagemRESUMO
Weakly supervised learning has emerged as an appealing alternative to alleviate the need for large labeled datasets in semantic segmentation. Most current approaches exploit class activation maps (CAMs), which can be generated from image-level annotations. Nevertheless, resulting maps have been demonstrated to be highly discriminant, failing to serve as optimal proxy pixel-level labels. We present a novel learning strategy that leverages self-supervision in a multi-modal image scenario to significantly enhance original CAMs. In particular, the proposed method is based on two observations. First, the learning of fully-supervised segmentation networks implicitly imposes equivariance by means of data augmentation, whereas this implicit constraint disappears on CAMs generated with image tags. And second, the commonalities between image modalities can be employed as an efficient self-supervisory signal, correcting the inconsistency shown by CAMs obtained across multiple modalities. To effectively train our model, we integrate a novel loss function that includes a within-modality and a cross-modality equivariant term to explicitly impose these constraints during training. In addition, we add a KL-divergence on the class prediction distributions to facilitate the information exchange between modalities which, combined with the equivariant regularizers further improves the performance of our model. Exhaustive experiments on the popular multi-modal BraTS and prostate DECATHLON segmentation challenge datasets demonstrate that our approach outperforms relevant recent literature under the same learning conditions.
Assuntos
Processamento de Imagem Assistida por Computador , Redes Neurais de Computação , Humanos , Processamento de Imagem Assistida por Computador/métodos , Masculino , Próstata , Semântica , Aprendizado de Máquina SupervisionadoRESUMO
Weakly-supervised learning (WSL) has recently triggered substantial interest as it mitigates the lack of pixel-wise annotations. Given global image labels, WSL methods yield pixel-level predictions (segmentations), which enable to interpret class predictions. Despite their recent success, mostly with natural images, such methods can face important challenges when the foreground and background regions have similar visual cues, yielding high false-positive rates in segmentations, as is the case in challenging histology images. WSL training is commonly driven by standard classification losses, which implicitly maximize model confidence, and locate the discriminative regions linked to classification decisions. Therefore, they lack mechanisms for modeling explicitly non-discriminative regions and reducing false-positive rates. We propose novel regularization terms, which enable the model to seek both non-discriminative and discriminative regions, while discouraging unbalanced segmentations. We introduce high uncertainty as a criterion to localize non-discriminative regions that do not affect classifier decision, and describe it with original Kullback-Leibler (KL) divergence losses evaluating the deviation of posterior predictions from the uniform distribution. Our KL terms encourage high uncertainty of the model when the latter inputs the latent non-discriminative regions. Our loss integrates: (i) a cross-entropy seeking a foreground, where model confidence about class prediction is high; (ii) a KL regularizer seeking a background, where model uncertainty is high; and (iii) log-barrier terms discouraging unbalanced segmentations. Comprehensive experiments and ablation studies over the public GlaS colon cancer data and a Camelyon16 patch-based benchmark for breast cancer show substantial improvements over state-of-the-art WSL methods, and confirm the effect of our new regularizers (our code is publicly available at https://github.com/sbelharbi/deep-wsl-histo-min-max-uncertainty).
Assuntos
Neoplasias da Mama , Técnicas Histológicas , Neoplasias da Mama/diagnóstico por imagem , Neoplasias da Mama/patologia , Entropia , Feminino , Humanos , IncertezaRESUMO
Precise determination and assessment of bladder cancer (BC) extent of muscle invasion involvement guides proper risk stratification and personalized therapy selection. In this context, segmentation of both bladder walls and cancer are of pivotal importance, as it provides invaluable information to stage the primary tumor. Hence, multiregion segmentation on patients presenting with symptoms of bladder tumors using deep learning heralds a new level of staging accuracy and prediction of the biologic behavior of the tumor. Nevertheless, despite the success of these models in other medical problems, progress in multiregion bladder segmentation, particularly in MRI and CT modalities, is still at a nascent stage, with just a handful of works tackling a multiregion scenario. Furthermore, most existing approaches systematically follow prior literature in other clinical problems, without casting a doubt on the validity of these methods on bladder segmentation, which may present different challenges. Inspired by this, we provide an in-depth look at bladder cancer segmentation using deep learning models. The critical determinants for accurate differentiation of muscle invasive disease, current status of deep learning based bladder segmentation, lessons and limitations of prior work are highlighted.
Assuntos
Aprendizado Profundo , Humanos , Processamento de Imagem Assistida por Computador , Imageamento por Ressonância Magnética , Redes Neurais de Computação , Tomografia Computadorizada por Raios X , Bexiga Urinária/diagnóstico por imagemRESUMO
This paper presents a client/server privacy-preserving network in the context of multicentric medical image analysis. Our approach is based on adversarial learning which encodes images to obfuscate the patient identity while preserving enough information for a target task. Our novel architecture is composed of three components: 1) an encoder network which removes identity-specific features from input medical images, 2) a discriminator network that attempts to identify the subject from the encoded images, 3) a medical image analysis network which analyzes the content of the encoded images (segmentation in our case). By simultaneously fooling the discriminator and optimizing the medical analysis network, the encoder learns to remove privacy-specific features while keeping those essentials for the target task. Our approach is illustrated on the problem of segmenting brain MRI from the large-scale Parkinson Progression Marker Initiative (PPMI) dataset. Using longitudinal data from PPMI, we show that the discriminator learns to heavily distort input images while allowing for highly accurate segmentation results. Our results also demonstrate that an encoder trained on the PPMI dataset can be used for segmenting other datasets, without the need for retraining. The code is made available at: https://github.com/bachkimn/Privacy-Net-An-Adversarial-Approach-forIdentity-Obfuscated-Segmentation-of-MedicalImages.
Assuntos
Processamento de Imagem Assistida por Computador , Privacidade , Humanos , Imageamento por Ressonância MagnéticaRESUMO
Prostate cancer is one of the main diseases affecting men worldwide. The gold standard for diagnosis and prognosis is the Gleason grading system. In this process, pathologists manually analyze prostate histology slides under microscope, in a high time-consuming and subjective task. In the last years, computer-aided-diagnosis (CAD) systems have emerged as a promising tool that could support pathologists in the daily clinical practice. Nevertheless, these systems are usually trained using tedious and prone-to-error pixel-level annotations of Gleason grades in the tissue. To alleviate the need of manual pixel-wise labeling, just a handful of works have been presented in the literature. Furthermore, despite the promising results achieved on global scoring the location of cancerous patterns in the tissue is only qualitatively addressed. These heatmaps of tumor regions, however, are crucial to the reliability of CAD systems as they provide explainability to the system's output and give confidence to pathologists that the model is focusing on medical relevant features. Motivated by this, we propose a novel weakly-supervised deep-learning model, based on self-learning CNNs, that leverages only the global Gleason score of gigapixel whole slide images during training to accurately perform both, grading of patch-level patterns and biopsy-level scoring. To evaluate the performance of the proposed method, we perform extensive experiments on three different external datasets for the patch-level Gleason grading, and on two different test sets for global Grade Group prediction. We empirically demonstrate that our approach outperforms its supervised counterpart on patch-level Gleason grading by a large margin, as well as state-of-the-art methods on global biopsy-level scoring. Particularly, the proposed model brings an average improvement on the Cohen's quadratic kappa ( κ) score of nearly 18% compared to full-supervision for the patch-level Gleason grading task. This suggests that the absence of the annotator's bias in our approach and the capability of using large weakly labeled datasets during training leads to higher performing and more robust models. Furthermore, raw features obtained from the patch-level classifier showed to generalize better than previous approaches in the literature to the subjective global biopsy-level scoring.
Assuntos
Interpretação de Imagem Assistida por Computador , Neoplasias da Próstata , Humanos , Masculino , Gradação de Tumores , Reprodutibilidade dos TestesRESUMO
Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations associated with each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture, by capturing richer contextual dependencies based on the use of guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Further, the additional loss between different modules guides the attention mechanisms to neglect irrelevant information and focus on more discriminant regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures and brain tumors. A series of ablation experiments support the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach to generate precise and reliable automatic segmentations of medical images. Our code is made publicly available at: https://github.com/sinAshish/Multi-Scale-Attention.
Assuntos
Processamento de Imagem Assistida por Computador , Redes Neurais de Computação , Humanos , SemânticaRESUMO
Widely used loss functions for CNN segmentation, e.g., Dice or cross-entropy, are based on integrals over the segmentation regions. Unfortunately, for highly unbalanced segmentations, such regional summations have values that differ by several orders of magnitude across classes, which affects training performance and stability. We propose a boundary loss, which takes the form of a distance metric on the space of contours, not regions. This can mitigate the difficulties of highly unbalanced problems because it uses integrals over the interface between regions instead of unbalanced integrals over the regions. Furthermore, a boundary loss complements regional information. Inspired by graph-based optimization techniques for computing active-contour flows, we express a non-symmetric L2 distance on the space of contours as a regional integral, which avoids completely local differential computations involving contour points. This yields a boundary loss expressed with the regional softmax probability outputs of the network, which can be easily combined with standard regional losses and implemented with any existing deep network architecture for N-D segmentation. We report comprehensive evaluations and comparisons on different unbalanced problems, showing that our boundary loss can yield significant increases in performances while improving training stability. Our code is publicly available1.
Assuntos
Processamento de Imagem Assistida por Computador , HumanosRESUMO
Purpose: Introducing a new technique to improve deep learning (DL) models designed for automatic grading of diabetic retinopathy (DR) from retinal fundus images by enhancing predictions' consistency. Methods: A convolutional neural network (CNN) was optimized in three different manners to predict DR grade from eye fundus images. The optimization criteria were (1) the standard cross-entropy (CE) loss; (2) CE supplemented with label smoothing (LS), a regularization approach widely employed in computer vision tasks; and (3) our proposed non-uniform label smoothing (N-ULS), a modification of LS that models the underlying structure of expert annotations. Results: Performance was measured in terms of quadratic-weighted κ score (quad-κ) and average area under the receiver operating curve (AUROC), as well as with suitable metrics for analyzing diagnostic consistency, like weighted precision, recall, and F1 score, or Matthews correlation coefficient. While LS generally harmed the performance of the CNN, N-ULS statistically significantly improved performance with respect to CE in terms quad-κ score (73.17 vs. 77.69, P < 0.025), without any performance decrease in average AUROC. N-ULS achieved this while simultaneously increasing performance for all other analyzed metrics. Conclusions: For extending standard modeling approaches from DR detection to the more complex task of DR grading, it is essential to consider the underlying structure of expert annotations. The approach introduced in this article can be easily implemented in conjunction with deep neural networks to increase their consistency without sacrificing per-class performance. Translational Relevance: A straightforward modification of current standard training practices of CNNs can substantially improve consistency in DR grading, better modeling expert annotations and human variability.