ABSTRACT
Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations on compiling test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help pathologists and regulatory agencies verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
Subjects
Artificial Intelligence; Pathology; Humans; Forecasting; Datasets as Topic
ABSTRACT
Diffusion MRI (dMRI) has become an invaluable tool to assess the microstructural organization of brain tissue. Depending on the specific acquisition settings, the dMRI signal encodes specific properties of the underlying diffusion process. In the last two decades, several signal representations have been proposed to fit the dMRI signal and decode such properties. Most methods, however, are tested and developed on a limited amount of data, and their applicability to other acquisition schemes remains unknown. With this work, we aimed to shed light on the generalizability of existing dMRI signal representations to different diffusion encoding parameters and brain tissue types. To this end, we organized a community challenge, named MEMENTO, making available the same datasets for fair comparisons across algorithms and techniques. We considered two state-of-the-art diffusion datasets, including single-diffusion-encoding (SDE) spin-echo data from a human brain with over 3820 unique diffusion weightings (the MASSIVE dataset), and double (oscillating) diffusion encoding data (DDE/DODE) of a mouse brain including over 2520 unique data points. A subset of the data sampled in 5 different voxels was openly distributed, and the challenge participants were asked to predict the remaining part of the data. After one year, eight participant teams submitted a total of 80 signal fits. For each submission, we evaluated the mean squared error, the variance of the prediction error and the Bayesian information criterion. The received submissions predicted either multi-shell SDE data (37%) or DODE data (22%), followed by Cartesian SDE data (19%) and DDE (18%). Most submissions predicted the signals measured with SDE remarkably well, with the exception of low and very strong diffusion weightings. The prediction of DDE and DODE data seemed more challenging, likely because none of the submissions explicitly accounted for diffusion time and frequency.
Next to the choice of the model, decisions on fit procedure and hyperparameters play a major role in the prediction performance, highlighting the importance of optimizing and reporting such choices. This work is a community effort to highlight strengths and limitations of the field at representing dMRI acquired with trending encoding schemes, gaining insights into how different models generalize to different tissue types and fiber configurations over a large range of diffusion encodings.
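The three scores named in the abstract above (mean squared error, variance of the prediction error, and the Bayesian information criterion) can be illustrated with a minimal sketch. The abstract does not give the formulas used by the challenge, so the BIC form below, which assumes independent Gaussian errors, and the function name `prediction_scores` are assumptions made for illustration only.

```python
import math


def prediction_scores(measured, predicted, n_params):
    """Return (mse, error_variance, bic) for one predicted signal.

    Hypothetical helper: the BIC uses the common Gaussian-error form
    n * ln(RSS / n) + k * ln(n), which is an assumption, not the
    challenge's documented definition.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    mse = sum(e * e for e in errors) / n          # equals RSS / n
    mean_error = sum(errors) / n
    variance = sum((e - mean_error) ** 2 for e in errors) / n
    if mse == 0.0:
        bic = -math.inf                           # perfect fit: log-likelihood unbounded
    else:
        bic = n * math.log(mse) + n_params * math.log(n)
    return mse, variance, bic


# Toy example: four held-out measurements and a two-parameter fit.
mse, variance, bic = prediction_scores(
    measured=[1.0, 0.5, 0.25, 0.125],
    predicted=[1.1, 0.4, 0.25, 0.125],
    n_params=2,
)
```

Under this form, a lower BIC favors fits that reach a small residual with few free parameters, which is why it complements the raw error measures.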
Subjects
Brain/diagnostic imaging; Databases, Factual; Diffusion Magnetic Resonance Imaging/methods; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Animals; Brain/physiology; Humans; Mice
ABSTRACT
PURPOSE: Acquisition time is a major limitation in recovering brain white matter microstructure with diffusion magnetic resonance imaging. The aim of this paper is to bridge the gap between growing demands on spatiotemporal resolution of the diffusion signal and the real-world time limitations. The authors introduce an acquisition scheme that reduces the number of samples under adjustable quality loss. METHODS: Finding a sampling scheme that maximizes signal quality and satisfies given time constraints is NP-hard. Therefore, a heuristic method based on a genetic algorithm is proposed in order to find suboptimal solutions in acceptable time. The analyzed diffusion signal representation is defined in the qτ space, so that it captures both spatial and temporal phenomena. RESULTS: The experiments on synthetic data and in vivo diffusion images of the C57Bl6 wild-type mouse corpus callosum reveal the superiority of the proposed approach over random sampling and even distribution in the qτ space. CONCLUSIONS: Using a genetic algorithm makes it possible to find acquisition parameters that guarantee high signal reconstruction accuracy under given time constraints. In practice, the proposed approach helps to accelerate the acquisition for the use of qτ-dMRI signal representation.
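As a rough illustration of the kind of heuristic described above, the following sketch runs a small genetic algorithm that picks a fixed-size subset of candidate sampling points. It is not the authors' method: the time constraint is modeled simply as a sample budget, and the fitness function (spread of the chosen points on a toy grid), the operators, and all names are hypothetical choices for this example.

```python
import math
import random


def genetic_subset_search(candidates, budget, fitness, generations=60,
                          population_size=30, mutation_rate=0.2, seed=0):
    """Search for a `budget`-sized subset of `candidates` with high `fitness`."""
    rng = random.Random(seed)
    indices = list(range(len(candidates)))

    def score(individual):
        return fitness([candidates[i] for i in individual])

    def crossover(a, b):
        # The child draws its samples from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, budget)))

    def mutate(individual):
        # With probability `mutation_rate`, swap one chosen sample for an unused one.
        if rng.random() >= mutation_rate or budget == len(indices):
            return individual
        chosen = list(individual)
        unused = [i for i in indices if i not in individual]
        chosen[rng.randrange(budget)] = rng.choice(unused)
        return tuple(sorted(chosen))

    population = [tuple(sorted(rng.sample(indices, budget)))
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 2]      # truncation selection
        children = [mutate(crossover(*rng.sample(parents, 2)))
                    for _ in range(population_size - len(parents))]
        population = parents + children                   # parents survive (elitism)
    best = max(population, key=score)
    return [candidates[i] for i in best]


def min_pairwise_distance(points):
    """Toy fitness: larger when the chosen points are spread out."""
    return min(math.dist(p, q)
               for i, p in enumerate(points) for q in points[i + 1:])


# Toy 5 x 5 grid standing in for candidate (q, tau) sampling points.
grid = [(q, tau) for q in range(5) for tau in range(5)]
scheme = genetic_subset_search(grid, budget=4, fitness=min_pairwise_distance)
```

In a real setting the fitness would measure reconstruction quality of the signal representation rather than geometric spread, and the budget would follow from the available scan time.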
Subjects
Corpus Callosum/diagnostic imaging; Diffusion Magnetic Resonance Imaging; Image Interpretation, Computer-Assisted/methods; White Matter/diagnostic imaging; Algorithms; Animals; Computer Simulation; Diffusion; Fourier Analysis; Mice; Mice, Inbred C57BL; Models, Statistical; Probability; Reproducibility of Results; Signal-To-Noise Ratio; Stochastic Processes
ABSTRACT
A large number of mathematical models have been proposed to describe the measured signal in diffusion-weighted (DW) magnetic resonance imaging (MRI). However, model comparison to date focuses only on specific subclasses, e.g. compartment models or signal models, and little or no information is available in the literature on how performance varies among the different types of models. To address this deficiency, we organized the 'White Matter Modeling Challenge' during the International Symposium on Biomedical Imaging (ISBI) 2015 conference. This competition aimed to compare a range of different kinds of models in their ability to explain a large range of measurable in vivo DW human brain data. Specifically, we assessed the ability of models to predict the DW signal accurately for new diffusion gradients and b values. We did not evaluate the accuracy of estimated model parameters, as a ground truth is hard to obtain. We used the Connectome scanner at the Massachusetts General Hospital, using gradient strengths of up to 300 mT/m and a broad set of diffusion times. We focused on assessing the DW signal prediction in two regions: the genu in the corpus callosum, where the fibres are relatively straight and parallel, and the fornix, where the configuration of fibres is more complex. The challenge participants had access to three-quarters of the dataset and their models were ranked on their ability to predict the remaining unseen quarter of the data. The challenge provided a unique opportunity for a quantitative comparison of diverse methods from multiple groups worldwide. The comparison of the challenge entries reveals interesting trends that could potentially influence the next generation of diffusion-based quantitative MRI techniques. The first is that signal models do not necessarily outperform tissue models; in fact, of those tested, tissue models rank highest on average. 
The second is that assuming a non-Gaussian (rather than purely Gaussian) noise model provides little improvement in prediction of unseen data, although it is possible that this may still have a beneficial effect on estimated parameter values. The third is that preprocessing the training data, here by omitting signal outliers, and using signal-predicting strategies, such as bootstrapping or cross-validation, could benefit the model fitting. The analysis in this study provides a benchmark for other models and the data remain available to build up a more complete comparison in the future.
Subjects
Brain/physiology; Connectome; Diffusion Magnetic Resonance Imaging/methods; Models, Neurological; Corpus Callosum/physiology; Fornix, Brain/physiology; Humans
ABSTRACT
The recovery of microstructure-related features of the brain's white matter is a current challenge in diffusion MRI. To robustly estimate these important features from multi-shell diffusion MRI data, we propose to analytically regularize the coefficient estimation of the Mean Apparent Propagator (MAP)-MRI method using the norm of the Laplacian of the reconstructed signal. We first compare our approach, which we call MAPL, with competing, state-of-the-art functional basis approaches. We show that it outperforms the original MAP-MRI implementation and the recently proposed modified Spherical Polar Fourier (mSPF) basis with respect to signal fitting and reconstruction of the Ensemble Average Propagator (EAP) and Orientation Distribution Function (ODF) in noisy, sparsely sampled data of a physical phantom with reference gold standard data. Then, to reduce the variance of parameter estimation using multi-compartment tissue models, we propose to use MAPL's signal fitting and extrapolation as a preprocessing step. We study the effect of MAPL on the estimation of axon diameter using a simplified Axcaliber model and axonal dispersion using the Neurite Orientation Dispersion and Density Imaging (NODDI) model. We show the positive effect of using it as a preprocessing step in estimating and reducing the variances of these parameters in the Corpus Callosum of six different subjects of the MGH Human Connectome Project. Finally, we correlate the estimated axon diameter, dispersion and restricted volume fractions with Fractional Anisotropy (FA) and clearly show that changes in FA significantly correlate with changes in all estimated parameters. Overall, we illustrate the potential of using a well-regularized functional basis together with multi-compartment approaches to recover important microstructure tissue parameters with much less variability, thus contributing to the challenge of better understanding microstructure-related features of the brain's white matter.
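The Laplacian-regularized coefficient estimation described above can be written compactly. The notation here is an assumption made for illustration: y is the vector of measured signals, Q the design matrix of basis functions \phi_i evaluated at the sampled q-space points, c the coefficient vector, and \lambda the regularization weight.

```latex
\hat{c} = \arg\min_{c} \; \lVert y - Q c \rVert_2^2 + \lambda \, c^{\top} U c,
\qquad
U_{ij} = \int_{\mathbb{R}^3} \nabla^2 \phi_i(\mathbf{q}) \, \nabla^2 \phi_j(\mathbf{q}) \, d\mathbf{q},
\qquad
\hat{c} = \left( Q^{\top} Q + \lambda U \right)^{-1} Q^{\top} y
```

Because the reconstructed signal is a linear combination of the basis functions, the squared norm of its Laplacian reduces to the quadratic form c^T U c, so the minimizer has the closed form shown on the right.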
Subjects
Algorithms; Axons/ultrastructure; Corpus Callosum/diagnostic imaging; Corpus Callosum/ultrastructure; Diffusion Tensor Imaging/methods; Image Interpretation, Computer-Assisted/methods; Humans; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
Human Epidermal growth factor Receptor-2 (HER2) expression is an important prognostic biomarker for breast cancer treatment selection. However, HER2 scoring has notoriously high interobserver variability due to stain variations between centers and the need to visually estimate the staining intensity in specific percentages of tumor area. In this paper, focusing on the interpretability of HER2 scoring by a pathologist, we propose a semi-automatic, two-stage deep learning approach that directly evaluates the clinical HER2 guidelines defined by the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP). In the first stage, we segment the invasive tumor over the user-indicated Region of Interest (ROI). Then, in the second stage, we classify the tumor tissue into four HER2 classes. For the classification stage, we use weakly supervised, constrained optimization to find a model that classifies cancerous patches such that the tumor surface percentage meets the guideline specification of each HER2 class. We end the second stage by freezing the model and refining its output logits in a supervised way to all slide labels in the training set. To ensure the quality of our dataset's labels, we conducted a multi-pathologist HER2 scoring consensus. For the assessment of doubtful cases where no consensus was found, our model can help by interpreting its HER2 class percentages output. We achieve a performance of 0.78 in F1-score on the test set while keeping our model interpretable for the pathologist, hopefully contributing to interpretable AI models in digital pathology.
Subjects
Breast Neoplasms; Deep Learning; Humans; Female; In Situ Hybridization, Fluorescence/methods; Breast Neoplasms/pathology
ABSTRACT
The density of mitotic figures (MF) within tumor tissue is known to be highly correlated with tumor proliferation and thus is an important marker in tumor grading. Recognition of MF by pathologists is subject to a strong inter-rater bias, limiting its prognostic value. State-of-the-art deep learning methods can support experts but have been observed to strongly deteriorate when applied in a different clinical environment. The variability caused by using different whole slide scanners has been identified as one decisive component in the underlying domain shift. The goal of the MICCAI MIDOG 2021 challenge was the creation of scanner-agnostic MF detection algorithms. The challenge used a training set of 200 cases, split across four scanning systems. As a test set, an additional 100 cases split across four scanning systems, including two previously unseen scanners, were provided. In this paper, we evaluate and compare the approaches that were submitted to the challenge and identify methodological factors contributing to better performance. The winning algorithm yielded an F1 score of 0.748 (CI95: 0.704-0.781), exceeding the performance of six experts on the same task.
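For readers less familiar with the metric reported above, the F1 score of a detector can be computed from its counts of true positives, false positives, and false negatives. The sketch below shows only the standard formula; how the challenge matched detections to annotated mitotic figures is not stated in the abstract, and the function name and counts are made up for illustration.

```python
def f1_from_counts(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) from detection counts.

    F1 is the harmonic mean of precision and recall, which simplifies to
    2*TP / (2*TP + FP + FN). Empty denominators yield 0.0 by convention here.
    """
    tp, fp, fn = true_positives, false_positives, false_negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up counts: 8 correct detections, 2 spurious ones, 2 missed figures.
precision, recall, f1 = f1_from_counts(8, 2, 2)
```

Because true negatives do not enter the formula, F1 suits detection tasks such as this one, where almost all of the tissue contains no mitotic figure.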
Subjects
Algorithms; Mitosis; Humans; Neoplasm Grading; Prognosis
ABSTRACT
The French Society of Pathology (SFP) organized its first data challenge in 2020 with the help of the Health Data Hub (HDH). The organization of this event first consisted of recruiting nearly 5000 cervical biopsy slides obtained from 20 pathology centers. After ensuring that patients did not refuse to include their slides in the project, the slides were anonymized, digitized, and annotated by expert pathologists, and finally uploaded to a data challenge platform for competitors from around the world. Competing teams had to develop algorithms that could distinguish 4 diagnostic classes in cervical epithelial lesions. Among the many submissions from competitors, the best algorithms achieved an overall score close to 95%. The final part of the competition lasted only 6 weeks, and the goal of SFP and HDH is now to allow for the collection to be published in open access for the scientific community. In this report, we have performed a "post-competition analysis" of the results. We first described the algorithmic pipelines of 3 top competitors. We then analyzed several difficult cases that even the top competitors could not predict correctly. A medical committee of several expert pathologists looked for possible explanations for these erroneous results by reviewing the images, and we present their findings here targeted for a large audience of pathologists and data scientists in the field of digital pathology.
ABSTRACT
Cervical cancer is the fourth most common cancer in women worldwide. To determine early treatment for patients, it is critical to accurately classify the cervical intraepithelial lesion status based on a microscopic biopsy. Lesion classification is a 4-class problem, with biopsies being designated as benign or increasingly malignant as class 1-3, with 3 being invasive cancer. Unfortunately, traditional biopsy analysis by a pathologist is time-consuming and subject to intra- and inter-observer variability. For this reason, it is of interest to develop automatic analysis pipelines to classify lesion status directly from a digitized whole slide image (WSI). The recent TissueNet Challenge was organized to find the best automatic detection pipeline for this task, using a dataset of 1015 annotated WSI slides. In this work, we present our winning end-to-end solution for cervical slide classification composed of a two-step classification model: First, we classify individual slide patches using an ensemble CNN, followed by an SVM-based slide classification using statistical features of the aggregated patch-level predictions. Importantly, we present the key innovation of our approach, which is a novel partial label-based loss function that allows us to supplement the supervised WSI patch annotations with weakly supervised patches based on the WSI class. As a result, we did not require additional expert tissue annotation, while still reaching the winning score of 94.7%. Our approach is a step towards the clinical inclusion of automatic pipelines for cervical cancer treatment planning. Clinical relevance: The explanation of the winning TissueNet AI algorithm for automated cervical cancer classification, which may provide insights for the next generation of computer-assisted tools in digital pathology.
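The second step of the pipeline described above feeds statistical features of the patch-level predictions to a slide-level classifier. The abstract does not say which statistics were used, so the sketch below picks three plausible ones (per-class mean probability, per-class maximum probability, and the fraction of patches assigned to each class) purely as an assumption; the function name is hypothetical, and the SVM itself is left out.

```python
def slide_features(patch_probabilities, n_classes=4):
    """Aggregate per-patch class probabilities into one slide-level feature vector.

    `patch_probabilities` is a list with one probability vector per patch.
    Returns per-class means, per-class maxima, and per-class vote fractions,
    concatenated in that order (3 * n_classes values).
    """
    if not patch_probabilities:
        raise ValueError("a slide needs at least one patch")
    n = len(patch_probabilities)
    means = [sum(p[c] for p in patch_probabilities) / n for c in range(n_classes)]
    maxima = [max(p[c] for p in patch_probabilities) for c in range(n_classes)]
    votes = [0] * n_classes
    for p in patch_probabilities:
        predicted_class = max(range(n_classes), key=lambda c: p[c])
        votes[predicted_class] += 1
    fractions = [v / n for v in votes]
    return means + maxima + fractions


# Two made-up patches from one slide, four lesion classes.
features = slide_features([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
])
```

A fixed-length vector like this lets slides with very different patch counts share one classifier, which is the practical appeal of aggregating before the slide-level step.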
Subjects
Machine Learning; Uterine Cervical Neoplasms; Algorithms; Female; Humans; Papanicolaou Test; Uterine Cervical Neoplasms/diagnosis
ABSTRACT
Non-invasive estimation of brain microstructure features using diffusion MRI (dMRI), known as Microstructure Imaging, has become an increasingly diverse and complicated field over the last decades. Multi-compartment (MC)-models, representing the measured diffusion signal as a linear combination of signal models of distinct tissue types, have been developed in many forms to estimate these features. However, a generalized implementation of MC-modeling as a whole, providing deeper insights into its capabilities, remains missing. To address this gap, we present Diffusion Microstructure Imaging in Python (Dmipy), an open-source toolbox implementing PGSE-based MC-modeling in its most general form. Dmipy allows on-the-fly implementation, signal modeling, and optimization of any user-defined MC-model, for any PGSE acquisition scheme. Dmipy follows a "building block"-based philosophy to Microstructure Imaging, meaning MC-models are modularly constructed to include any number and type of tissue models, allowing simultaneous representation of a tissue's diffusivity, orientation, volume fractions, axon orientation dispersion, and axon diameter distribution. In particular, Dmipy is geared toward facilitating reproducible, reliable MC-modeling pipelines, often allowing the whole process from model construction to parameter map recovery in fewer than 10 lines of code. To demonstrate Dmipy's ease of use and potential, we implement a wide range of well-known MC-models, including IVIM, AxCaliber, NODDI(x), Bingham-NODDI, the spherical mean-based SMT and MC-MDI, and spherical convolution-based single- and multi-tissue CSD. By allowing parameter cascading between MC-models, Dmipy also facilitates implementation of advanced approaches like CSD with voxel-varying kernels and single-shell 3-tissue CSD.
By providing a well-tested, user-friendly toolbox that simplifies the interaction with the otherwise complicated field of dMRI-based Microstructure Imaging, Dmipy contributes to more reproducible, high-quality research.
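The idea of a multi-compartment model as a linear combination of compartment signals can be shown with a dependency-free sketch. This is not Dmipy's API: the functions `stick`, `ball`, and `multi_compartment` are hypothetical stand-ins, the example assumes unit-norm gradient and orientation vectors, and the parameter values are illustrative only.

```python
import math


def stick(orientation, diffusivity):
    """Zero-radius cylinder: diffusion only along `orientation` (unit vector)."""
    def signal(bvalue, gradient):
        cos_angle = sum(g * n for g, n in zip(gradient, orientation))
        return math.exp(-bvalue * diffusivity * cos_angle ** 2)
    return signal


def ball(diffusivity):
    """Isotropic Gaussian diffusion: the signal ignores the gradient direction."""
    def signal(bvalue, gradient):
        return math.exp(-bvalue * diffusivity)
    return signal


def multi_compartment(compartments):
    """Combine (volume_fraction, model) pairs into one normalized signal model."""
    fractions = [fraction for fraction, _ in compartments]
    if any(f < 0 for f in fractions) or not math.isclose(sum(fractions), 1.0):
        raise ValueError("volume fractions must be non-negative and sum to 1")

    def signal(bvalue, gradient):
        return sum(f * model(bvalue, gradient) for f, model in compartments)
    return signal


# Ball-and-stick example; b in s/mm^2, diffusivities in mm^2/s (illustrative values).
ball_and_stick = multi_compartment([
    (0.6, stick(orientation=(0.0, 0.0, 1.0), diffusivity=1.7e-3)),
    (0.4, ball(diffusivity=3.0e-3)),
])
attenuation = ball_and_stick(1000.0, (1.0, 0.0, 0.0))
```

Building the combined model from interchangeable compartment functions mirrors the "building block" idea in the abstract: swapping or adding a compartment changes one list entry rather than the whole signal equation.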
ABSTRACT
Effective representation of the four-dimensional diffusion MRI signal - varying over three-dimensional q-space and diffusion time τ - is a sought-after and still unsolved challenge in diffusion MRI (dMRI). We propose a functional basis approach that is specifically designed to represent the dMRI signal in this qτ-space. Following recent terminology, we refer to our qτ-functional basis as "qτ-dMRI". qτ-dMRI can be seen as a time-dependent realization of q-space imaging by Paul Callaghan and colleagues. We use GraphNet regularization - imposing both signal smoothness and sparsity - to drastically reduce the number of diffusion-weighted images (DWIs) that are needed to represent the dMRI signal in the qτ-space. As the main contribution, qτ-dMRI provides the framework to - without making biophysical assumptions - represent the qτ-space signal and estimate time-dependent q-space indices (qτ-indices), providing a new means for studying diffusion in nervous tissue. We validate our method on both in-silico generated data using Monte-Carlo simulations and an in-vivo test-retest study of two C57Bl6 wild-type mice, where we found good reproducibility of estimated qτ-index values and trends. In the hopes of opening up new τ-dependent avenues for studying nervous tissues, qτ-dMRI is the first of its kind in being specifically designed to provide open interpretation of the qτ-diffusion signal.
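A GraphNet-style objective combines a smoothness penalty with a sparsity penalty on the basis coefficients. One way to write it, with notation assumed here for illustration (y the measured qτ-signal, \Phi the basis design matrix, c the coefficients, U a Laplacian penalty matrix, and \lambda, \alpha the two regularization weights), is:

```latex
\hat{c} = \arg\min_{c} \;
\lVert y - \Phi c \rVert_2^2
+ \lambda \, c^{\top} U c
+ \alpha \, \lVert c \rVert_1
```

The quadratic term favors smooth signal reconstructions and the \ell_1 term drives many coefficients to zero. Because of the \ell_1 term there is no closed-form solution in general, so an iterative convex solver is typically needed.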
Subjects
Diffusion Magnetic Resonance Imaging/methods; Animals; Mice; Mice, Inbred C57BL; Monte Carlo Method; Reproducibility of Results
ABSTRACT
We propose a novel framework to simultaneously represent the diffusion-weighted MRI (dMRI) signal over diffusion times, gradient strengths and gradient directions. Current frameworks such as the 3D Simple Harmonic Oscillator Reconstruction and Estimation basis (3D-SHORE) only represent the signal over the spatial domain, leaving the temporal dependency as a fixed parameter. However, microstructure-focused techniques such as AxCaliber and ActiveAx provide evidence of the importance of sampling the dMRI space over diffusion time. Up to now there exists no generalized framework that simultaneously models the dependence of the dMRI signal in space and time. We use a functional basis to fit the 3D+t spatio-temporal dMRI signal, similarly to the 3D-SHORE basis in three-dimensional 'q-space'. The lowest order term in this expansion contains an isotropic diffusion tensor that characterizes the Gaussian displacement distribution, multiplied by a negative exponential. We regularize the signal fitting by minimizing the norm of the analytic Laplacian of the basis, and validate our technique on synthetic data generated using the theoretical model proposed by Callaghan et al. We show that our method is robust to noise, and can accurately describe the restricted spatio-temporal signal decay originating from tissue models such as cylindrical pores. From the fitting we can then estimate the axon radius distribution parameters along any direction using approaches similar to AxCaliber. We also apply our method on real data from an ActiveAx acquisition. Overall, our approach allows one to represent the complete 3D+t dMRI signal, which should prove helpful in understanding normal and pathologic nervous tissue.
Subjects
Corpus Callosum/ultrastructure; Diffusion Tensor Imaging/methods; Imaging, Three-Dimensional/methods; Nerve Fibers, Myelinated/ultrastructure; Pattern Recognition, Automated/methods; Subtraction Technique; Algorithms; Animals; Haplorhini; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Reproducibility of Results; Sensitivity and Specificity; Spatio-Temporal Analysis
ABSTRACT
Diffusion magnetic resonance imaging (dMRI) is the modality of choice for investigating in-vivo white matter connectivity and neural tissue architecture of the brain. The diffusion-weighted signal in dMRI reflects the diffusivity of water molecules in brain tissue and can be utilized to produce image-based biomarkers for clinical research. Due to the constraints on scanning time, only a limited number of measurements can be acquired within a clinically feasible scan time. In order to reconstruct the dMRI signal from a discrete set of measurements, a large number of algorithms have been proposed in recent years in conjunction with varying sampling schemes, i.e., with varying b-values and gradient directions. Thus, it is imperative to compare the performance of these reconstruction methods on a single data set to provide appropriate guidelines to neuroscientists on making an informed decision while designing their acquisition protocols. For this purpose, the SPArse Reconstruction Challenge (SPARC) was held along with the workshop on Computational Diffusion MRI (at MICCAI 2014) to validate the performance of multiple reconstruction methods using data acquired from a physical phantom. A total of 16 reconstruction algorithms (9 teams) participated in this community challenge. The goal was to reconstruct single b-value and/or multiple b-value data from a sparse set of measurements. In particular, the aim was to determine an appropriate acquisition protocol (in terms of the number of measurements, b-values) and the analysis method to use for a neuroimaging study. The challenge did not focus on the accuracy of these methods in estimating model-specific measures such as fractional anisotropy (FA) or mean diffusivity, but on the accuracy of these methods in fitting the data. This paper presents several quantitative results pertaining to each reconstruction algorithm.
The conclusions in this paper provide a valuable guideline for choosing a suitable algorithm and the corresponding data-sampling scheme for clinical neuroscience applications.