ABSTRACT
MOTIVATION: Advances in automation and imaging have made it possible to capture large image datasets that span multiple experimental batches. However, accurate biological comparison across batches is challenged by batch-to-batch variation (i.e., the batch effect) arising from uncontrollable experimental noise (e.g., varying stain intensity or cell density). Previous approaches to minimizing the batch effect have commonly focused on normalizing low-dimensional image measurements, such as an embedding generated by a neural network. However, normalizing the embedding can over-correct and alter true biological features (e.g., cell size), because our ability to interpret the effect of the normalization on the embedding space is limited. Although techniques like flat-field correction can be applied to normalize the image values directly, they are limited transformations that handle only simple batch-effect artifacts. RESULTS: We present a neural network-based batch equalization method that can transfer images from one batch to another while preserving the biological phenotype. The equalization method is trained as a generative adversarial network (GAN), using the StarGAN architecture, which has shown considerable ability in style transfer. After incorporating new objectives that disentangle the batch effect from biological features, we show that the equalized images carry less batch information while preserving the biological information. We also demonstrate that the same model-training parameters generalize to two dramatically different cell types, indicating that this approach could be broadly applicable. AVAILABILITY AND IMPLEMENTATION: https://github.com/tensorflow/gan/tree/master/tensorflow_gan/examples/stargan. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
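Below is a minimal sketch (PyTorch, as an illustrative stand-in for the TF-GAN implementation linked above) of the StarGAN-style objective: a generator maps an image plus a target-batch code to an equalized image, while the discriminator both scores realism and classifies the apparent batch, and a cycle-reconstruction term encourages the biology to survive the transfer. Network sizes, loss weights, and the batch count are assumptions, not the authors' configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_BATCHES = 4  # number of experimental batches (assumed)

    class Generator(nn.Module):
        def __init__(self):
            super().__init__()
            # input: 1 image channel + one-hot batch code broadcast as channels
            self.net = nn.Sequential(
                nn.Conv2d(1 + N_BATCHES, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1),
            )

        def forward(self, x, batch_code):
            code = batch_code[:, :, None, None].expand(-1, -1, x.shape[2], x.shape[3])
            return self.net(torch.cat([x, code], dim=1))

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.adv = nn.Linear(32, 1)          # real/fake score
            self.cls = nn.Linear(32, N_BATCHES)  # which batch does this image look like?

        def forward(self, x):
            h = self.features(x)
            return self.adv(h), self.cls(h)

    def generator_loss(G, D, x, src, tgt, lam_cls=1.0, lam_rec=10.0):
        """StarGAN-style loss: fool D, match the target batch, and reconstruct
        the source image (cycle consistency) so the phenotype is preserved."""
        fake = G(x, tgt)
        adv_score, batch_logits = D(fake)
        l_adv = -adv_score.mean()
        l_cls = F.cross_entropy(batch_logits, tgt.argmax(1))
        l_rec = F.l1_loss(G(fake, src), x)  # translate back to the source batch
        return l_adv + lam_cls * l_cls + lam_rec * l_rec

    G, D = Generator(), Discriminator()
    x = torch.rand(8, 1, 64, 64)  # placeholder image batch
    src = F.one_hot(torch.randint(0, N_BATCHES, (8,)), N_BATCHES).float()
    tgt = F.one_hot(torch.randint(0, N_BATCHES, (8,)), N_BATCHES).float()
    loss = generator_loss(G, D, x, src, tgt)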
Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer; Artifacts
ABSTRACT
PURPOSE: To understand the impact of deep learning diabetic retinopathy (DR) algorithms on physician readers in computer-assisted settings. DESIGN: Evaluation of diagnostic technology. PARTICIPANTS: One thousand seven hundred ninety-six retinal fundus images from 1612 diabetic patients. METHODS: Ten ophthalmologists (5 general ophthalmologists, 4 retina specialists, 1 retina fellow) read images for DR severity based on the International Clinical Diabetic Retinopathy disease severity scale in each of 3 conditions: unassisted, grades only, or grades plus heatmap. Grades-only assistance comprised a histogram of DR predictions (grades) from a trained deep-learning model. For grades plus heatmap, we additionally showed explanatory heatmaps. MAIN OUTCOME MEASURES: For each experiment arm, we computed sensitivity and specificity of each reader and of the algorithm for different levels of DR severity against an adjudicated reference standard. We also measured accuracy (exact 5-class level agreement and Cohen's quadratically weighted κ), reader-reported confidence (5-point Likert scale), and grading time. RESULTS: Readers graded more accurately with model assistance than without for the grades-only condition (P < 0.001). Grades plus heatmaps improved accuracy for patients with DR (P < 0.001) but reduced accuracy for patients without DR (P = 0.006). Both forms of assistance increased readers' sensitivity for moderate-or-worse DR (unassisted: mean, 79.4% [95% confidence interval (CI), 72.3%-86.5%]; grades only: mean, 87.5% [95% CI, 85.1%-89.9%]; grades plus heatmap: mean, 88.7% [95% CI, 84.9%-92.5%]) without a corresponding drop in specificity (unassisted: mean, 96.6% [95% CI, 95.9%-97.4%]; grades only: mean, 96.1% [95% CI, 95.5%-96.7%]; grades plus heatmap: mean, 95.5% [95% CI, 94.8%-96.1%]). Algorithmic assistance increased the accuracy of retina specialists above that of the unassisted reader or the model alone, and increased grading confidence and grading time across all readers. For most cases, grades plus heatmap was only as effective as grades only. Over the course of the experiment, grading time decreased across all conditions, although most sharply for grades plus heatmap. CONCLUSIONS: Deep learning algorithms can improve the accuracy of, and confidence in, DR diagnosis in an assisted-read setting. They may also increase grading time, although these effects may be ameliorated with experience.
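As a concrete illustration of the accuracy metrics quoted above, here is a minimal Python sketch computing exact 5-class agreement and Cohen's quadratically weighted κ between a reader's grades and an adjudicated reference; the grade vectors are synthetic placeholders, not study data.

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(0)
    reference = rng.integers(0, 5, size=300)   # adjudicated 5-class DR grades
    noise = rng.integers(-1, 2, size=300)      # reader off by at most one level
    reader = np.clip(reference + noise, 0, 4)

    kappa = cohen_kappa_score(reference, reader, weights="quadratic")
    exact = (reference == reader).mean()       # exact 5-class agreement
    print(f"quadratically weighted kappa = {kappa:.2f}, exact agreement = {exact:.2f}")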
Subject(s)
Algorithms; Deep Learning; Diabetic Retinopathy/classification; Diabetic Retinopathy/diagnosis; Diagnosis, Computer-Assisted/methods; Female; Humans; Male; Ophthalmologists/standards; Photography/methods; ROC Curve; Reference Standards; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
BACKGROUND: Large image datasets acquired on automated microscopes typically contain some fraction of low-quality, out-of-focus images, despite the use of hardware autofocus systems. Identifying these images with high accuracy by automated image analysis is important for obtaining a clean, unbiased image dataset. Complicating this task is the fact that image focus quality is only well defined in foreground regions of images; as a result, most previous approaches only enable computing the relative difference in quality between two or more images, rather than an absolute measure of quality. RESULTS: We present a deep neural network model capable of predicting an absolute measure of image focus on a single image in isolation, without any user-specified parameters. The model operates at the image-patch level and also outputs a measure of prediction certainty, enabling interpretable predictions. The model was trained on only 384 in-focus Hoechst (nuclei) stain images of U2OS cells, which were synthetically defocused to one of 11 absolute defocus levels during training. The trained model generalizes to previously unseen real Hoechst stain images, identifying the absolute image focus to within one defocus level (approximately a 3-pixel difference in blur diameter) with 95% accuracy. On a simpler binary in-/out-of-focus classification task, the trained model outperforms previous approaches on both Hoechst and Phalloidin (actin) stain images (F-scores of 0.89 and 0.86, respectively, versus 0.84 and 0.83), despite having been presented only with Hoechst stain images during training. Lastly, we observe qualitatively that the model generalizes to two additional stains, Hoechst and Tubulin, of an unseen cell type (human MCF-7) acquired on a different instrument. CONCLUSIONS: Our deep neural network enables classification of out-of-focus microscope images with both higher accuracy and greater precision than previous approaches via interpretable patch-level focus and certainty predictions. The use of synthetically defocused images precludes the need for a manually annotated training dataset. The model also generalizes to different image and cell types. The framework for model training and image prediction is available as a free software library, and the pre-trained model is available for immediate use in Fiji (ImageJ) and CellProfiler.
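A minimal sketch of the synthetic-defocus labeling strategy described above: each in-focus image is blurred to one of 11 discrete defocus levels, and the level index becomes the training label. The level-to-sigma mapping and the added shot noise are illustrative assumptions; the paper quantifies defocus in blur diameter rather than Gaussian sigma.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    N_LEVELS = 11

    def synthetic_defocus(in_focus, rng):
        """Return (defocused_patch, level) for one in-focus image patch."""
        level = int(rng.integers(N_LEVELS))           # 0 = in focus, 10 = most defocused
        sigma = 0.5 * level                           # assumed level -> blur mapping
        blurred = gaussian_filter(in_focus, sigma) if level else in_focus
        noisy = rng.poisson(np.clip(blurred, 0, None)).astype(np.float32)  # shot noise
        return noisy, level

    rng = np.random.default_rng(0)
    patch = rng.poisson(50.0, size=(84, 84)).astype(np.float32)  # stand-in nuclei patch
    defocused, label = synthetic_defocus(patch, rng)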
Subject(s)
Diagnostic Imaging/methods; Image Processing, Computer-Assisted/methods; Machine Learning; Microscopy/methods; Osteosarcoma/diagnosis; Software; Bone Neoplasms/diagnosis; Humans; Tumor Cells, Cultured
ABSTRACT
Importance: Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation. Objective: To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs. Design and Setting: A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128,175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency. Exposure: Deep learning-trained algorithm. Main Outcomes and Measures: The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity. Results: The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating characteristic curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%. Conclusions and Relevance: In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment.
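A minimal sketch of how two operating points like those above can be selected on a development set: sweep the ROC thresholds, then pick one threshold maximizing sensitivity subject to a specificity floor, and one maximizing specificity subject to a sensitivity floor. Labels and scores here are synthetic placeholders, and the floors are illustrative.

    import numpy as np
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    y_dev = rng.integers(0, 2, size=2000)                        # development-set labels
    scores = np.clip(y_dev * 0.6 + rng.normal(0.3, 0.2, 2000), 0, 1)

    fpr, tpr, thr = roc_curve(y_dev, scores)
    spec = 1 - fpr

    mask = spec >= 0.98                       # high-specificity operating point
    t_high_spec = thr[mask][tpr[mask].argmax()]
    mask = tpr >= 0.975                       # high-sensitivity operating point
    t_high_sens = thr[mask][spec[mask].argmax()]
    print(t_high_spec, t_high_sens)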
Subject(s)
Algorithms; Diabetic Retinopathy/diagnostic imaging; Fundus Oculi; Machine Learning; Macular Edema/diagnostic imaging; Neural Networks, Computer; Photography; Female; Humans; Male; Middle Aged; Observer Variation; Ophthalmologists; Sensitivity and Specificity
ABSTRACT
Systematic development of accurate density functionals has been a decades-long challenge for scientists. Despite emerging applications of machine learning (ML) in approximating functionals, the resulting ML functionals usually contain tens of thousands of parameters or more, leading to a huge gap between their formulation and conventional human-designed symbolic functionals. We propose a new framework, Symbolic Functional Evolutionary Search (SyFES), that automatically constructs accurate functionals in symbolic form, which is more explainable to humans, cheaper to evaluate, and easier to integrate into existing codes than other ML functionals. We first show that, without prior knowledge, SyFES reconstructed a known functional from scratch. We then demonstrate that, evolving from the existing functional ωB97M-V, SyFES found a new functional, GAS22 (Google Accelerated Science 22), that performs better for most of the molecular types in the test set of the Main Group Chemistry Database (MGCDB84). Our framework opens a new direction in leveraging computing power for the systematic development of symbolic density functionals.
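To make the evolutionary-search idea concrete, here is a toy Python sketch that evolves a symbolic expression to fit sampled data by mutating expression trees and keeping improvements. The primitive set, mutation scheme, and one-dimensional target are drastic simplifications and bear no relation to SyFES's actual functional form, training data, or fitness definition.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.1, 2.0, 64)
    target = x * np.exp(-x)                     # "unknown" form to recover

    UNARY = [np.exp, np.log, np.sqrt, np.negative]

    def random_expr(depth=0):
        r = rng.random()
        if depth > 2 or r < 0.3:
            return ("x",) if rng.random() < 0.7 else ("const", rng.normal())
        if r < 0.6:
            return ("un", int(rng.integers(len(UNARY))), random_expr(depth + 1))
        return (str(rng.choice(["+", "*"])), random_expr(depth + 1), random_expr(depth + 1))

    def mutate(e):
        if rng.random() < 0.3 or e[0] in ("x", "const"):
            return random_expr()                # replace a subtree wholesale
        if e[0] == "un":
            return ("un", e[1], mutate(e[2]))
        i = int(1 + rng.integers(2))
        kids = list(e)
        kids[i] = mutate(kids[i])
        return tuple(kids)

    def evaluate(e):
        with np.errstate(all="ignore"):
            if e[0] == "x": return x
            if e[0] == "const": return np.full_like(x, e[1])
            if e[0] == "un": return UNARY[e[1]](evaluate(e[2]))
            a, b = evaluate(e[1]), evaluate(e[2])
            return a + b if e[0] == "+" else a * b

    def fitness(e):
        err = np.mean((evaluate(e) - target) ** 2)
        return err if np.isfinite(err) else np.inf

    best = min((random_expr() for _ in range(500)), key=fitness)
    for _ in range(5000):                       # evolve: mutate and keep improvements
        cand = mutate(best)
        if fitness(cand) < fitness(best):
            best = cand
    print(best, fitness(best))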
ABSTRACT
Drug discovery for diseases such as Parkinson's disease is impeded by the lack of screenable cellular phenotypes. We present an unbiased phenotypic profiling platform that combines automated cell culture, high-content imaging, Cell Painting, and deep learning. We applied this platform to primary fibroblasts from 91 Parkinson's disease patients and matched healthy controls, creating the largest publicly available Cell Painting image dataset to date, at 48 terabytes. We use fixed weights from a convolutional deep neural network trained on ImageNet to generate deep embeddings from each image and train machine learning models to detect morphological disease phenotypes. Our platform's robustness and sensitivity allow the detection of individual-specific variation with high fidelity across batches and plate layouts. Lastly, our models confidently separate LRRK2 and sporadic Parkinson's disease lines from healthy controls (receiver operating characteristic area under the curve of 0.79 (standard deviation 0.08)), supporting the capacity of this platform for complex disease modeling and drug screening applications.
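A minimal sketch of the fixed-embedding pipeline described above: an ImageNet-pretrained CNN with frozen weights embeds each image, and a simple classifier is trained on the embeddings. ResNet-50 (torchvision >= 0.13 weights API), the input size, and logistic regression are illustrative stand-ins; the paper's network, preprocessing, and models differ.

    import torch
    import torchvision.models as models
    from sklearn.linear_model import LogisticRegression

    resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    resnet.fc = torch.nn.Identity()            # drop the ImageNet classification head
    resnet.eval()

    @torch.no_grad()
    def embed(images):                         # images: (N, 3, 224, 224) float tensor
        return resnet(images).numpy()

    # Placeholders: imaging tensors and labels (1 = Parkinson's line, 0 = control)
    X_img = torch.rand(16, 3, 224, 224)
    y = torch.randint(0, 2, (16,)).numpy()
    clf = LogisticRegression(max_iter=1000).fit(embed(X_img), y)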
Subject(s)
Deep Learning; Parkinson Disease; Fibroblasts; Humans; Machine Learning; Neural Networks, Computer
ABSTRACT
Center-involved diabetic macular edema (ci-DME) is a major cause of vision loss. Although the gold standard for diagnosis involves 3D imaging, 2D imaging by fundus photography is usually used in screening settings, resulting in high false-positive and false-negative calls. To address this, we train a deep learning model to predict ci-DME from fundus photographs, achieving an ROC-AUC of 0.89 (95% CI: 0.87-0.91), corresponding to 85% sensitivity at 80% specificity. In comparison, retinal specialists have similar sensitivities (82-85%) but only half the specificity (45-50%, p < 0.001). Our model can also detect the presence of intraretinal fluid (AUC: 0.81; 95% CI: 0.81-0.86) and subretinal fluid (AUC: 0.88; 95% CI: 0.85-0.91). Using deep learning to make predictions from simple 2D images, without sophisticated 3D-imaging equipment and with better-than-specialist performance, has broad relevance to many other applications in medical imaging.
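A minimal sketch of one standard way to obtain a 95% confidence interval for an ROC-AUC like those quoted above: nonparametric bootstrap resampling. The paper's exact interval procedure is not specified here, so this is an assumed, illustrative approach with placeholder data.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=1000)                   # placeholder labels
    scores = np.clip(y * 0.5 + rng.normal(0.25, 0.2, 1000), 0, 1)

    aucs = []
    for _ in range(2000):
        idx = rng.integers(0, len(y), size=len(y))      # resample with replacement
        if y[idx].min() == y[idx].max():                # need both classes present
            continue
        aucs.append(roc_auc_score(y[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    print(f"AUC {roc_auc_score(y, scores):.3f} (95% CI {lo:.3f}-{hi:.3f})")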
Subject(s)
Diabetic Retinopathy/diagnostic imaging; Macular Edema/diagnostic imaging; Aged; Deep Learning; Diabetic Retinopathy/genetics; Female; Humans; Imaging, Three-Dimensional; Macular Edema/genetics; Male; Middle Aged; Mutation; Photography; Retina/diagnostic imaging; Tomography, Optical Coherence
ABSTRACT
The etiological underpinnings of many CNS disorders are not well understood, likely because individual diseases aggregate numerous pathological subtypes, each associated with a complex landscape of genetic risk factors. To overcome these challenges, researchers are integrating novel data types from numerous patients, including imaging studies capturing broadly applicable features from patient-derived materials. These datasets, when combined with machine learning, potentially hold the power to elucidate the subtle patterns that stratify patients by shared pathology. In this study, we interrogated whether high-content imaging of primary skin fibroblasts, using the Cell Painting method, could reveal disease-relevant information among patients. First, we showed that technical features such as batch/plate type, plate, and location within a plate lead to detectable nuisance signals, as revealed by a pre-trained deep neural network and analysis with deep image embeddings. Using a plate design and image acquisition strategy that accounts for these variables, we performed a pilot study with 12 healthy controls and 12 subjects affected by the severe genetic neurological disorder spinal muscular atrophy (SMA), and evaluated whether a convolutional neural network (CNN) trained on a subset of the cells could distinguish disease states in cells from the remaining, unseen control-SMA pair. Our results indicate that these two populations could effectively be differentiated from one another and that model selectivity is insensitive to batch/plate type. One caveat is that the samples were also largely separated by source. These findings lay a foundation for conducting future studies exploring diseases with more complex genetic contributions and unknown subtypes.
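A minimal sketch of how the nuisance signals described above can be quantified: if a simple classifier predicts a technical covariate (e.g., plate) from the deep image embeddings at better-than-chance accuracy under cross-validation, that covariate leaks into the representation. Embeddings and plate labels are random placeholders here.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(600, 64))      # stand-in deep image embeddings
    plate = rng.integers(0, 6, size=600)         # stand-in plate labels

    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          embeddings, plate, cv=5).mean()
    print(f"plate decodable at {acc:.2f} accuracy (chance ~ {1/6:.2f})")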
Subject(s)
High-Throughput Screening Assays; Machine Learning; Molecular Imaging; Neural Networks, Computer; Deep Learning; Humans; Image Processing, Computer-Assisted
ABSTRACT
Brain structural complexity has confounded prior efforts to extract quantitative image-based measurements. We present a systematic 'divide and conquer' methodology for analyzing three-dimensional (3D) multi-parameter images of brain tissue to delineate and classify key structures, and compute quantitative associations among them. To demonstrate the method, thick (approximately 100 µm) slices of rat brain tissue were labeled using three to five fluorescent signals, and imaged using spectral confocal microscopy and unmixing algorithms. Automated 3D segmentation and tracing algorithms were used to delineate cell nuclei, vasculature, and cell processes. From these segmentations, a set of 23 intrinsic and 8 associative image-based measurements was computed for each cell. These features were used to classify astrocytes, microglia, neurons, and endothelial cells. Associations among cells and between cells and vasculature were computed and represented as graphical networks to enable further analysis. The automated results were validated using a graphical interface that permits investigator inspection and corrective editing of each cell in 3D. Nuclear counting accuracy was >89%, and cell classification accuracy ranged from 81 to 92% depending on cell type. We present a software system named FARSIGHT implementing our methodology. Its output is a detailed XML file containing measurements that may be used for diverse quantitative hypothesis-driven and exploratory studies of the central nervous system.
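A minimal sketch of the per-cell classification step described above: each segmented cell is represented by its 23 intrinsic plus 8 associative measurements and assigned to one of the four cell classes. The feature values, labels, and choice of a random-forest classifier are illustrative assumptions about how such a feature table could be used, not FARSIGHT's actual classifier.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    features = rng.normal(size=(1200, 31))       # 23 intrinsic + 8 associative measurements
    cell_type = rng.integers(0, 4, size=1200)    # astrocyte/microglia/neuron/endothelial

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(clf, features, cell_type, cv=5).mean())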
Subject(s)
Brain/anatomy & histology; Image Processing, Computer-Assisted/methods; Algorithms; Animals; Blood Vessels/anatomy & histology; Blood Vessels/chemistry; Brain/cytology; Brain Chemistry; Cerebral Cortex/anatomy & histology; Cerebral Cortex/cytology; Cerebral Cortex/physiology; Cerebrovascular Circulation/physiology; Coloring Agents; Glial Fibrillary Acidic Protein/metabolism; Hippocampus/anatomy & histology; Hippocampus/cytology; Hippocampus/physiology; Image Processing, Computer-Assisted/statistics & numerical data; Male; Nerve Net/cytology; Nerve Net/physiology; Neurons/classification; Rats; Rats, Sprague-Dawley; Reproducibility of Results; Software
ABSTRACT
The accuracy and reliability of automated neurite tracing systems is ultimately limited by image quality, as reflected in the signal-to-noise ratio, contrast, and image variability. This paper describes a novel combination of image processing methods that operate on images of neurites captured by confocal and widefield microscopy and produce synthetic images that are better suited to automated tracing. The algorithms are based on the curvelet transform (for denoising curvilinear structures and estimating local orientation), perceptual grouping by scalar voting (for eliminating non-tubular structures and improving neurite continuity while preserving branch points), adaptive focus detection, and depth estimation (for handling widefield images without deconvolution). The proposed methods are fast and capable of handling large images. Their ability to handle images of unlimited size derives from automated tiling of large images along the lateral dimension and processing of 3-D images one optical slice at a time. Their speed derives in part from the fact that the core computations are formulated in terms of the Fast Fourier Transform (FFT), and in part from parallel computation on multi-core computers. The methods are simple to apply to new images, since they require very few adjustable parameters, all of which are intuitive. Examples of pre-processed DIADEM Challenge images illustrate the improved automated tracing that results from our pre-processing methods.
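A minimal sketch of the lateral tiling strategy described above: a large 2-D image (or one optical slice) is processed one overlapping tile at a time so memory stays bounded, and the padded margins are discarded when the tiles are stitched back together. The per-tile operation here is a crude FFT low-pass filter standing in for the curvelet-based denoising; the tile size, padding, and filter are assumptions.

    import numpy as np

    def lowpass(tile, keep=0.25):
        # Crude FFT low-pass filter as a placeholder per-tile operation.
        F = np.fft.rfft2(tile)
        mask = np.zeros_like(F)
        h, w = int(F.shape[0] * keep), int(F.shape[1] * keep)
        mask[:h, :w] = 1
        mask[-h:, :w] = 1                       # keep negative vertical frequencies too
        return np.fft.irfft2(F * mask, s=tile.shape)

    def process_tiled(img, tile=256, pad=32):
        out = np.empty_like(img, dtype=float)
        for y in range(0, img.shape[0], tile):
            for x in range(0, img.shape[1], tile):
                y0, x0 = max(y - pad, 0), max(x - pad, 0)
                y1 = min(y + tile + pad, img.shape[0])
                x1 = min(x + tile + pad, img.shape[1])
                res = lowpass(img[y0:y1, x0:x1])        # filter tile plus margins
                ys = min(tile, img.shape[0] - y)
                xs = min(tile, img.shape[1] - x)
                out[y:y+ys, x:x+xs] = res[y-y0:y-y0+ys, x-x0:x-x0+xs]
        return out

    image = np.random.default_rng(0).random((1024, 1024))  # stand-in widefield slice
    denoised = process_tiled(image)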
Subject(s)
Algorithms; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Neuroanatomical Tract-Tracing Techniques/methods; Neurons/cytology; Animals; Humans; Image Processing, Computer-Assisted/trends; Imaging, Three-Dimensional/trends; Microscopy/methods; Microscopy/trends; Neuroanatomical Tract-Tracing Techniques/trends; Neurons/physiology
ABSTRACT
This paper presents a broadly applicable algorithm and a comprehensive open-source software implementation for automated tracing of neuronal structures in 3-D microscopy images. The core 3-D neuron tracing algorithm is based on a three-dimensional (3-D) open-curve active contour (snake). It is initiated from a set of automatically detected seed points. Its evolution is driven by a combination of deforming forces based on the Gradient Vector Flow (GVF), stretching forces based on estimation of the fiber orientations, and a set of control rules. In this tracing model, bifurcation points are detected implicitly as points where multiple snakes collide. A boundariness measure is employed to allow local radius estimation. A suite of pre-processing algorithms enables the system to accommodate diverse neuronal image datasets by reducing them to a common image format. The above algorithms form the basis for a comprehensive, scalable, and efficient software system developed for confocal or brightfield images. It provides multiple automated tracing modes. The user can optionally interact with the tracing system using multiple-view visualization, and exercise full control to ensure a high-quality reconstruction. We illustrate the utility of this tracing system by presenting results from a synthetic dataset, a brightfield dataset, and two confocal datasets from the DIADEM challenge.
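A minimal sketch of the Gradient Vector Flow field that supplies the deforming forces described above, following Xu and Prince's iterative diffusion scheme on a 2-D slice. The regularization weight, step size, and iteration count are assumed values, and the full tracer adds stretching forces and control rules on top of this field.

    import numpy as np
    from scipy.ndimage import laplace, gaussian_filter

    def gvf(edge_map, mu=0.2, iters=200, dt=0.5):
        # Diffuse the edge-map gradient into a smooth vector field (u, v).
        fx, fy = np.gradient(edge_map)
        mag2 = fx**2 + fy**2
        u, v = fx.copy(), fy.copy()
        for _ in range(iters):
            u += dt * (mu * laplace(u) - (u - fx) * mag2)
            v += dt * (mu * laplace(v) - (v - fy) * mag2)
        return u, v

    rng = np.random.default_rng(0)
    img = gaussian_filter(rng.random((128, 128)), 2)   # stand-in image slice
    gx, gy = np.gradient(img)
    u, v = gvf(np.hypot(gx, gy))                       # GVF of the edge magnitude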
Subject(s)
Algorithms; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Neurons/cytology; Software/standards; Animals; Image Processing, Computer-Assisted/trends; Imaging, Three-Dimensional/trends; Mice; Neuroanatomical Tract-Tracing Techniques/methods; Neuroanatomical Tract-Tracing Techniques/trends; Neurons/physiology; Rats; Software/trends
ABSTRACT
This paper presents robust 3-D algorithms to segment vasculature that is imaged by labeling laminae, rather than the lumenal volume. The signal is weak, sparse, noisy, nonuniform, and low-contrast, and exhibits gaps and spectral artifacts, so adaptive thresholding and Hessian filtering based methods are not effective. The structure deviates from a tubular geometry, so tracing algorithms are not effective. We propose a four-step approach. The first step detects candidate voxels using a robust hypothesis test based on a model that assumes Poisson noise and locally planar geometry. The second step performs an adaptive region growth to extract weakly labeled and fine vessels while rejecting spectral artifacts. The third step enables interactive visualization and estimation of features such as statistical confidence, local curvature, local thickness, and local normal by constructing an accurate mesh representation using marching tetrahedra, volume-preserving smoothing, and adaptive decimation algorithms. The final step enables topological analysis and efficient validation by estimating vessel centerlines using a ray-casting and vote-accumulation algorithm. Our algorithm lends itself to parallel processing and yielded an 8× speedup on a graphics processor (GPU). On synthetic data, our meshes had average error-per-face (EPF) values of 0.1-1.6 voxels for peak signal-to-noise ratios ranging from 110 down to 28 dB. Separately, after decimating the mesh to less than 1% of its original size, the EPF was less than 1 voxel per face. When validated on real datasets, the average recall and precision values were 94.66% and 94.84%, respectively.
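A minimal sketch of the first step described above: flag candidate voxels by testing, under a Poisson noise model, whether the summed counts in a small window exceed what the estimated background rate would produce. The background estimate, window size, and significance level are illustrative assumptions, and the paper's test additionally incorporates the locally planar geometric model.

    import numpy as np
    from scipy.ndimage import uniform_filter
    from scipy.stats import poisson

    def candidate_voxels(volume, window=3, alpha=1e-3):
        background = np.median(volume)             # assumed background rate per voxel
        n = window ** 3
        local_sum = uniform_filter(volume.astype(float), size=window) * n
        # P(sum >= observed) under Poisson(n * background); small p => foreground
        p = poisson.sf(local_sum - 1, mu=n * background)
        return p < alpha

    rng = np.random.default_rng(0)
    vol = rng.poisson(2.0, size=(64, 64, 64)).astype(float)
    vol[30:34, 20:44, 32] += 25                    # faint planar "lamina" signal
    mask = candidate_voxels(vol)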