ABSTRACT
Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means. Why does the need for interpretability in MLMI arise? What goals does one actually seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common in both medical image analysis and its intersection with machine learning, we identify five core elements of interpretability: localization, visual recognizability, physical attribution, model transparency, and actionability. From this, we arrive at a framework for interpretability in MLMI, which serves as a step-by-step guide to approaching interpretability in this context. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and suggest future directions of interpretability research.
ABSTRACT
We present open-source tools for three-dimensional (3D) analysis of photographs of dissected slices of human brains, which are routinely acquired in brain banks but seldom used for quantitative analysis. Our tools can: (1) 3D reconstruct a volume from the photographs and, optionally, a surface scan; and (2) produce a high-resolution 3D segmentation into 11 brain regions per hemisphere (22 in total), independently of the slice thickness. Our tools can be used as a substitute for ex vivo magnetic resonance imaging (MRI), which requires access to an MRI scanner, ex vivo scanning expertise, and considerable financial resources. We tested our tools on synthetic and real data from two NIH Alzheimer's Disease Research Centers. The results show that our methodology yields accurate 3D reconstructions, segmentations, and volumetric measurements that are highly correlated to those from MRI. Our method also detects expected differences between post mortem confirmed Alzheimer's disease cases and controls. The tools are available in our widespread neuroimaging suite 'FreeSurfer' (https://surfer.nmr.mgh.harvard.edu/fswiki/PhotoTools).
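The PhotoTools commands themselves are documented at the URL above. As a toy illustration of the first step only (stacking registered slice photographs into an anisotropic 3D volume), under assumed spacings and with nibabel for I/O, one might do something like the sketch below. This is not the PhotoTools reconstruction algorithm, which also handles registration and the optional surface scan.

```python
# Toy illustration only (not the FreeSurfer PhotoTools pipeline): stack
# already registered, equally sized slice photographs into a 3D volume whose
# through-plane spacing equals the cutting thickness, and save it as NIfTI.
# The 0.5 mm in-plane resolution and 4 mm thickness are made-up values.
import numpy as np
import nibabel as nib

def stack_photos(slices, inplane_mm=0.5, thickness_mm=4.0, out_path="recon.nii.gz"):
    vol = np.stack(slices, axis=-1).astype(np.float32)           # H x W x n_slices
    affine = np.diag([inplane_mm, inplane_mm, thickness_mm, 1.0])
    nib.save(nib.Nifti1Image(vol, affine), out_path)
    return vol

# example with random stand-in "photographs"
stack_photos([np.random.rand(256, 256) for _ in range(20)])
```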
Every year, thousands of human brains are donated to science. These brains are used to study normal aging, as well as neurological diseases like Alzheimer's or Parkinson's. Donated brains usually go to 'brain banks', institutions where the brains are dissected to extract tissues relevant to different diseases. During this process, it is routine to take photographs of brain slices for archiving purposes.

Often, studies of dead brains rely on qualitative observations, such as 'the hippocampus displays some atrophy', rather than concrete 'numerical' measurements. This is because the gold standard for taking three-dimensional measurements of the brain is magnetic resonance imaging (MRI), an expensive technique that requires a high level of expertise, especially with dead brains. The lack of quantitative data means it is not always straightforward to study certain conditions.

To bridge this gap, Gazula et al. have developed openly available software that can build three-dimensional reconstructions of dead brains based on photographs of brain slices. The software can also use machine learning methods to automatically extract different brain regions from the three-dimensional reconstructions and measure their size. These measurements provide precise quantitative data that can better describe how different conditions lead to changes in the brain, such as atrophy (reduced volume of one or more brain regions).

The researchers assessed the accuracy of the method in two ways. First, they digitally sliced MRI-scanned brains and used the software to compute the sizes of different structures based on these synthetic data, comparing the results to the known sizes. Second, they used brains for which both MRI data and dissection photographs existed and compared the measurements taken by the software to those obtained from the MRI images. Gazula et al. show that, as long as the photographs satisfy some basic conditions, they can provide good estimates of the sizes of many brain structures.

The tools developed by Gazula et al. are publicly available as part of FreeSurfer, a widespread neuroimaging package that can be used by any researcher working at a brain bank. This will allow brain banks to obtain accurate measurements of dead brains and to cheaply perform quantitative studies of brain structures, which could lead to new findings relating to neurodegenerative diseases.
Subject(s)
Alzheimer Disease; Brain; Imaging, Three-Dimensional; Machine Learning; Humans; Imaging, Three-Dimensional/methods; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/pathology; Brain/diagnostic imaging; Brain/pathology; Photography/methods; Dissection; Magnetic Resonance Imaging/methods; Neuropathology/methods; Neuroimaging/methods
ABSTRACT
Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is a problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to some given imaging problem, which can be extremely time-consuming. In this work, we propose "supervision by denoising" (SUD), a framework to supervise reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging, anatomical brain reconstruction (3D) and cortical parcellation (2D), to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines. Our code is available at https://github.com/seannz/sud.
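A minimal sketch of one training step under the scheme described above, with assumed names (sud_step, spatial_denoise, alpha, lam) and a simple box-blur denoiser rather than the authors' implementation (see the linked repository): predictions on unlabeled images are temporally averaged into a buffer, spatially denoised, and the result supervises the model alongside the usual supervised loss.

```python
# Minimal PyTorch-style sketch of one SUD-like step; names and the
# average-pooling denoiser are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def spatial_denoise(y, k=5):
    # placeholder spatial denoiser: a simple box blur of the soft labels
    return F.avg_pool2d(y, k, stride=1, padding=k // 2)

def sud_step(model, opt, x_lab, y_lab, x_unlab, y_bar, alpha=0.9, lam=0.5):
    # y_bar: running (temporally averaged) predictions for this unlabeled batch
    opt.zero_grad()
    sup_loss = F.mse_loss(model(x_lab), y_lab)                   # supervised term

    with torch.no_grad():
        y_bar.mul_(alpha).add_((1 - alpha) * model(x_unlab))     # stochastic (temporal) averaging
        target_u = spatial_denoise(y_bar)                        # spatial denoising of the average

    unsup_loss = F.mse_loss(model(x_unlab), target_u)            # denoised output supervises the model
    (sup_loss + lam * unsup_loss).backward()
    opt.step()                                                   # weight update; denoising and updates alternate
    return y_bar
```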
ABSTRACT
In recent years, learning-based image registration methods have gradually moved away from direct supervision with target warps to instead use self-supervision, with excellent results in several registration benchmarks. These approaches utilize a loss function that penalizes the intensity differences between the fixed and moving images, along with a suitable regularizer on the deformation. However, since images typically have large untextured regions, merely maximizing similarity between the two images is not sufficient to recover the true deformation. This problem is exacerbated by texture in other regions, which introduces severe non-convexity into the landscape of the training objective and ultimately leads to overfitting. In this paper, we argue that the relative failure of supervised registration approaches can in part be blamed on the use of regular U-Nets, which are jointly tasked with feature extraction, feature matching and deformation estimation. Here, we introduce a simple but crucial modification to the U-Net that disentangles feature extraction and matching from deformation prediction, allowing the U-Net to warp the features, across levels, as the deformation field is evolved. With this modification, direct supervision using target warps begins to outperform self-supervision approaches that require segmentations, presenting new directions for registration when images do not have segmentations. We hope that our findings in this preliminary workshop paper will re-ignite research interest in supervised image registration techniques. Our code is publicly available from http://github.com/balbasty/superwarp.
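A hedged sketch of the two ingredients the abstract highlights, under assumed shapes and names (this is not the SuperWarp code, which is at the linked repository): warping feature maps with a predicted displacement field via grid_sample, and supervising the predicted field directly with a target warp plus a smoothness penalty.

```python
# Illustrative sketch only: feature warping with a displacement field and a
# direct warp-supervision loss with a first-order smoothness regularizer.
import torch
import torch.nn.functional as F

def identity_grid(shape, device):
    # normalized sampling grid in [-1, 1]; shape = (N, H, W)
    n, h, w = shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=device),
                            torch.linspace(-1, 1, w, device=device), indexing="ij")
    return torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)

def warp(feat, disp):
    # feat: (N, C, H, W); disp: (N, H, W, 2) displacement in normalized units
    grid = identity_grid((feat.shape[0], feat.shape[2], feat.shape[3]), feat.device) + disp
    return F.grid_sample(feat, grid, align_corners=True)

def supervised_loss(pred_disp, target_disp, lam=0.1):
    data = F.mse_loss(pred_disp, target_disp)                    # supervision by the target warp
    dy = pred_disp[:, 1:, :, :] - pred_disp[:, :-1, :, :]
    dx = pred_disp[:, :, 1:, :] - pred_disp[:, :, :-1, :]
    smooth = dy.pow(2).mean() + dx.pow(2).mean()                 # simple smoothness penalty
    return data + lam * smooth
```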
ABSTRACT
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
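As a rough illustration of the overall pipeline, not the paper's End-to-end Learned Transform (ELT) or its rate-distortion-optimal allocation, the sketch below decorrelates a weight matrix with a KLT/PCA transform, allocates bit-depths from per-component variances, and uniformly quantizes in the transform domain.

```python
# Illustrative transform-quantization sketch; the KLT stands in for the ELT
# and the variance-based allocation for the optimal bit-depth allocation.
import numpy as np

def transform_quantize(W, avg_bits=2.0):
    # W: (out, in) weight matrix treated as rows of samples
    C = np.cov(W, rowvar=False)                          # joint second-order statistics
    _, U = np.linalg.eigh(C)                             # KLT basis (stand-in for ELT)
    T = W @ U                                            # decorrelated coefficients
    var = T.var(axis=0) + 1e-12
    bits = avg_bits + 0.5 * np.log2(var / np.exp(np.mean(np.log(var))))
    bits = np.clip(np.round(bits), 0, 8)                 # reverse-water-filling style allocation
    step = (T.max(axis=0) - T.min(axis=0)) / np.maximum(2.0 ** bits, 1.0)
    Tq = np.round(T / np.maximum(step, 1e-12)) * step    # uniform quantization per component
    return Tq @ U.T, bits                                # reconstructed weights, per-component bit-depths

W = np.random.randn(512, 64)
W_hat, bits = transform_quantize(W)
print(float(np.abs(W - W_hat).mean()), bits[:8])
```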
ABSTRACT
Recently, many fast implementations of the bilateral and the nonlocal filters were proposed based on lattice and vector quantization, e.g. clustering, in higher dimensions. However, these approaches can still be inefficient owing to the complexities in the resampling process or in filtering the high-dimensional resampled signal. In contrast, simply scalar resampling the high-dimensional signal after decorrelation presents the opportunity to filter signals using multi-rate signal processing techniques. This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately implementing the filter is important not only for image processing applications, but also for a number of recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions depends ultimately on an accurate filter implementation. We show that our Gaussian lifting approach filters images more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal means filtering are also explored.
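The Gaussian lifting scheme itself is beyond a short sketch; below is the brute-force bilateral filter that such fast approaches approximate, so the target of the approximation is concrete. Parameter names and values are illustrative.

```python
# Reference (brute-force) bilateral filter: O(window size) work per pixel.
# Included only to define the filter being accelerated, not the lifting scheme.
import numpy as np

def bilateral_filter(img, sigma_s=3.0, sigma_r=0.1, radius=6):
    # img: grayscale float image scaled to [0, 1]
    h, w = img.shape
    pad = np.pad(img, radius, mode="reflect")
    out = np.zeros((h, w), dtype=np.float64)
    weights = np.zeros((h, w), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            wgt = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2)
                         - (shifted - img) ** 2 / (2 * sigma_r ** 2))
            out += wgt * shifted                         # spatial x range weighted accumulation
            weights += wgt
    return out / weights
```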
ABSTRACT
We propose the fast optical flow extractor, a filtering method that recovers artifact-free optical flow fields from HEVC-compressed video. To extract accurate optical flow fields, we form a regularized optimization problem that considers the smoothness of the solution and the pixelwise confidence weights of an artifact-ridden HEVC motion field. Solving such an optimization problem is slow, so we first convert the problem into a confidence-weighted filtering task. By leveraging the already-available HEVC motion parameters, we achieve a 100-fold speed-up in the running times compared to similar methods, while producing subpixel-accurate flow estimates. The fast optical flow extractor is useful when video frames are already available in coded formats. Our method is not specific to a coder, and works with motion fields from video coders such as H.264/AVC and HEVC.
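For concreteness, a hedged form of the kind of confidence-weighted objective described above; the symbols are assumed notation, not taken from the paper.

```latex
% u is the recovered flow, m the HEVC motion field, c_i a per-pixel
% confidence weight, and \lambda the smoothness weight.
\begin{equation}
\hat{u} \;=\; \arg\min_{u} \; \sum_{i} c_i \,\lVert u_i - m_i \rVert_2^2
\;+\; \lambda \sum_{i} \lVert \nabla u_i \rVert_2^2 .
\end{equation}
```

Solving this system directly is slow; the point of the method is that it can instead be approximated by a single confidence-weighted filtering pass over the motion field m.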
ABSTRACT
This paper proposes graph Laplacian regularization for robust estimation of optical flow. First, we analyze the spectral properties of dense graph Laplacians and show that dense graphs achieve a better trade-off between preserving flow discontinuities and filtering noise, compared with the usual Laplacian. Using this analysis, we then propose a robust optical flow estimation method based on Gaussian graph Laplacians. We revisit the framework of iteratively reweighted least-squares from the perspective of graph edge reweighting, and employ the Welsch loss function to preserve flow discontinuities and handle occlusions. Our experiments using the Middlebury and MPI-Sintel optical flow datasets demonstrate the robustness and the efficiency of our proposed approach.
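For reference, the Welsch loss and the iteratively reweighted least-squares (IRLS) edge weight it induces; r and sigma are assumed notation, not taken from the paper.

```latex
% r_{ij} is the flow residual across graph edge (i, j) and \sigma sets the
% scale beyond which discontinuities stop being smoothed.
\begin{align}
\rho_\sigma(r) &= \frac{\sigma^2}{2}\left(1 - e^{-r^2/\sigma^2}\right), &
w_{ij} &= \rho_\sigma'(r_{ij})/r_{ij} = e^{-r_{ij}^2/\sigma^2}.
\end{align}
```

Each reweighted least-squares solve thus down-weights graph edges that straddle large flow residuals, which is how motion discontinuities and occlusions are preserved rather than smoothed over.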
ABSTRACT
We address the problem of decoding joint photographic experts group (JPEG)-encoded images with fewer visual artifacts. We view the decoding task as an ill-posed inverse problem and find a regularized solution using a convex, graph Laplacian-regularized model. Since the resulting problem is non-smooth and entails non-local regularization, we use fast high-dimensional Gaussian filtering techniques with the proximal gradient descent method to solve our convex problem efficiently. Our patch-based "coefficient graph" is better suited than the traditional pixel-based ones for regularizing smooth non-stationary signals such as natural images and relates directly to classic non-local means denoising of images. We also extend our graph along the temporal dimension to handle the decoding of M-JPEG-encoded video. Despite the minimalistic nature of our convex problem, it produces decoded images of similar quality to other, more complex state-of-the-art methods while being up to five times faster. We also expound on the relationship between our method and the classic ANCE method, reinterpreting ANCE from a graph-based regularization perspective.
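A hedged sketch of proximal gradient descent for a graph-Laplacian-regularized decoding problem of the kind described above: gradient steps on the smooth term alternate with projection of the DCT coefficients back into their quantization intervals (the non-smooth data-consistency term). A whole-image orthonormal DCT stands in for JPEG's 8x8 block DCT here, and L, lo, hi, and the step size are placeholders, not the paper's coefficient graph or its fast Gaussian filtering.

```python
# Hedged sketch: proximal gradient for  min_x  lam * x^T L x  subject to the
# DCT coefficients of x staying inside their quantization intervals [lo, hi].
import numpy as np
from scipy.fft import dctn, idctn

def decode(x0, L, lo, hi, lam=0.1, step=0.05, iters=100):
    # x0: initial blocky decoded image; L: sparse graph Laplacian on pixels
    shape = x0.shape
    x = x0.astype(np.float64).ravel()
    for _ in range(iters):
        x = x - step * lam * 2.0 * (L @ x)               # gradient step on the smooth regularizer
        c = dctn(x.reshape(shape), norm="ortho")         # to the transform domain
        c = np.clip(c, lo, hi)                           # prox: project onto the quantization set
        x = idctn(c, norm="ortho").ravel()               # back to the pixel domain
    return x.reshape(shape)
```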