ABSTRACT
Microscopic evaluation of glands in the colon is of utmost importance in the diagnosis of inflammatory bowel disease and cancer. When properly trained, deep learning pipelines can provide a systematic, reproducible, and quantitative assessment of disease-related changes in glandular tissue architecture. Training and testing deep learning models, however, require large amounts of manual annotations, which are difficult, time-consuming, and expensive to obtain. Here, we propose a method for automated generation of ground truth in digital hematoxylin and eosin (H&E)-stained slides using immunohistochemistry (IHC) labels. The image processing pipeline generates annotations of glands in H&E histopathology images from colon biopsy specimens by transferring gland masks from KRT8/18, CDX2, or EPCAM IHC. The IHC gland outlines are transferred to coregistered H&E images for training of deep learning models. We compared the performance of the deep learning models to that of manual annotations using an internal held-out set of biopsy specimens as well as 2 public data sets. Our results show that EPCAM IHC provides gland outlines that closely match manual gland annotations (Dice = 0.89) and are resilient to damage by inflammation. In addition, we propose a simple data sampling technique that allows models trained on data from several sources to be adapted to a new data source using just a few newly annotated samples. The best-performing models achieved average Dice scores of 0.902 and 0.89 on the Gland Segmentation (GlaS) and Colorectal Adenocarcinoma Gland (CRAG) public colon cancer data sets, respectively, when trained with only 10% of annotated cases from either public cohort. Altogether, the performance of our models indicates that automated annotations using cell type-specific IHC markers can safely replace manual annotations. Automated IHC labels from single-institution cohorts can be combined with small numbers of hand-annotated cases from multi-institutional cohorts to train models that generalize well to diverse data sources.
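As a concrete illustration of the evaluation metric and the data sampling technique described above, the following is a minimal Python sketch. It assumes boolean NumPy masks for the Dice computation, and the names ihc_cases, public_cases, and sample_mixed_training_set are illustrative stand-ins, not taken from the paper.

    import random

    def dice_score(pred_mask, gt_mask):
        """Dice coefficient between two boolean segmentation masks (NumPy arrays)."""
        intersection = (pred_mask & gt_mask).sum()
        total = pred_mask.sum() + gt_mask.sum()
        return 2.0 * intersection / total if total > 0 else 1.0

    def sample_mixed_training_set(ihc_cases, public_cases, frac_public=0.10, seed=0):
        """Combine automated IHC-derived annotations from a single-institution
        cohort with a small fraction (here 10%) of hand-annotated cases from a
        public cohort, adapting the model to the new source with few new labels."""
        rng = random.Random(seed)
        n_public = max(1, int(frac_public * len(public_cases)))
        return list(ihc_cases) + rng.sample(list(public_cases), n_public)

Under this scheme, retraining on the mixed set corresponds to what the abstract reports as using "only 10% of annotated cases from either public cohort."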
Subject(s)
Colonic Neoplasms; Deep Learning; Humans; Epithelial Cell Adhesion Molecule; Immunohistochemistry; Image Processing, Computer-Assisted

ABSTRACT
Statistical shape models (SSMs) are well established as a tool for identifying variations in the morphology of anatomy across an underlying population. Shape models use a consistent shape representation across all samples in a given cohort, which makes shapes comparable and reveals the variations that can detect pathology and help in formulating treatment plans. In medical imaging, computing these shape representations from CT/MRI scans requires time-intensive preprocessing operations, including but not limited to anatomy segmentation annotation, registration, and texture denoising. Deep learning models have demonstrated exceptional capabilities in learning shape representations directly from volumetric images, giving rise to highly effective and efficient Image-to-SSM networks. Nevertheless, these models are data-hungry, and given the limited availability of medical data, they tend to overfit. Offline data augmentation techniques that use kernel density estimation (KDE)-based methods to generate shape-augmented samples have helped Image-to-SSM networks achieve accuracy comparable to traditional SSM methods. However, these methods augment shape alone, whereas deep learning models exhibit an image-based texture bias that results in suboptimal models. This paper introduces a novel strategy for on-the-fly data augmentation in the Image-to-SSM framework that leverages data-dependent noise generation (texture augmentation). The proposed augmenter is trained as an adversary to the Image-to-SSM network, producing diverse and challenging noisy samples. Our approach achieves improved accuracy by encouraging the model to focus on the underlying geometry rather than relying solely on pixel values.
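As a rough illustration of the adversarial, on-the-fly texture augmentation described above, a minimal PyTorch sketch follows. The NoiseAdversary module, the MSE loss, and the two-step training routine are assumptions standing in for the paper's actual architecture and objective, not a reproduction of them.

    import torch.nn as nn
    import torch.nn.functional as F

    class NoiseAdversary(nn.Module):
        """Generates data-dependent, bounded additive noise (texture augmentation)."""
        def __init__(self, channels=1, eps=0.1):
            super().__init__()
            self.eps = eps  # bound on noise magnitude keeps augmented volumes plausible
            self.net = nn.Sequential(
                nn.Conv3d(channels, 16, 3, padding=1), nn.ReLU(),
                nn.Conv3d(16, channels, 3, padding=1), nn.Tanh(),
            )

        def forward(self, image):
            # The noise is a function of the input volume itself, hence "data-dependent".
            return image + self.eps * self.net(image)

    def train_step(image, shape_target, ssm_net, adversary, opt_ssm, opt_adv):
        """One adversarial round: the augmenter maximizes the Image-to-SSM loss
        on its noisy sample; the SSM network then minimizes it."""
        # 1) Adversary step: make the augmented volume as hard as possible.
        adv_loss = -F.mse_loss(ssm_net(adversary(image)), shape_target)
        opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

        # 2) Model step: fit shape parameters on the (detached) augmented volume.
        augmented = adversary(image).detach()
        loss = F.mse_loss(ssm_net(augmented), shape_target)
        opt_ssm.zero_grad(); loss.backward(); opt_ssm.step()
        return loss.item()

Training the augmenter to maximize the regression loss while the Image-to-SSM network minimizes it on the resulting samples is one standard way to realize the "trained as an adversary" scheme the abstract describes, pushing the model toward the underlying geometry rather than pixel texture.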