Artificial intelligence-based image analysis in clinical testing: lessons from cervical cancer screening.

Egemen, Didem; Perkins, Rebecca B; Cheung, Li C; Befano, Brian; Rodriguez, Ana Cecilia; Desai, Kanan; Lemay, Andreanne; Ahmed, Syed Rakin; Antani, Sameer; Jeronimo, Jose; Wentzensen, Nicolas; Kalpathy-Cramer, Jayashree; De Sanjose, Silvia; Schiffman, Mark.

J Natl Cancer Inst ; 116(1): 26-33, 2024 01 10.

Article in English | MEDLINE | ID: mdl-37758250

ABSTRACT

Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports claim outstanding accuracy but are followed by a disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). The second is designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case>indeterminate>control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. Linking the AI algorithm results to clinical risk estimation is the fourth principle. Absolute risk (not relative) is the critical metric for translating a test result into clinical use. Finally, generating risk-based guidelines for clinical use that match local resources and priorities is the last principle in our approach. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.
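As an illustration of the fourth principle, the following minimal Python sketch contrasts absolute and relative risk. The counts are hypothetical and are not taken from the article, and the function names are illustrative only. Two populations can share the same relative risk for a positive test result while their absolute risks, which are what drive clinical management, differ tenfold.

```python
def absolute_risk(n_cases, n_total):
    """Proportion of individuals in a test-result group who have the disease."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_cases / n_total


def relative_risk(risk_group, risk_reference):
    """Ratio of the risk in one group to the risk in a reference group."""
    if risk_reference <= 0:
        raise ValueError("risk_reference must be positive")
    return risk_group / risk_reference


# Hypothetical counts (cases, total) per test result, for illustration only.
high_prevalence = {"positive": (40, 100), "negative": (10, 100)}
low_prevalence = {"positive": (4, 100), "negative": (1, 100)}


def summarize(population):
    """Return (absolute risk if positive, relative risk of positive vs negative)."""
    risk_pos = absolute_risk(*population["positive"])
    risk_neg = absolute_risk(*population["negative"])
    return risk_pos, relative_risk(risk_pos, risk_neg)


if __name__ == "__main__":
    for name, pop in [("high", high_prevalence), ("low", low_prevalence)]:
        risk_pos, rr = summarize(pop)
        print(f"{name} prevalence: absolute risk={risk_pos:.2f}, relative risk={rr:.1f}")
```

In this toy setting the relative risk is identical in both populations, so it alone cannot indicate whether a positive result warrants treatment, referral, or repeat testing.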

Subject(s)

Artificial Intelligence , Uterine Cervical Neoplasms , Female , Humans , Early Detection of Cancer , Uterine Cervical Neoplasms/diagnosis , Algorithms , Image Processing, Computer-Assisted

Reproducible and clinically translatable deep neural networks for cervical screening.

Ahmed, Syed Rakin; Befano, Brian; Lemay, Andreanne; Egemen, Didem; Rodriguez, Ana Cecilia; Angara, Sandeep; Desai, Kanan; Jeronimo, Jose; Antani, Sameer; Campos, Nicole; Inturrisi, Federica; Perkins, Rebecca; Kreimer, Aimee; Wentzensen, Nicolas; Herrero, Rolando; Del Pino, Marta; Quint, Wim; de Sanjose, Silvia; Schiffman, Mark; Kalpathy-Cramer, Jayashree.

Sci Rep ; 13(1): 21772, 2023 12 08.

Article in English | MEDLINE | ID: mdl-38066031

ABSTRACT

Cervical cancer is a leading cause of cancer mortality, with approximately 90% of the 250,000 deaths per year occurring in low- and middle-income countries (LMIC). Secondary prevention with cervical screening involves detecting and treating precursor lesions; however, scaling screening efforts in LMIC has been hampered by infrastructure and cost constraints. Recent work has supported the development of an artificial intelligence (AI) pipeline on digital images of the cervix to achieve an accurate and reliable diagnosis of treatable precancerous lesions. In particular, WHO guidelines emphasize visual triage of women testing positive for human papillomavirus (HPV) as the primary screen, and AI could assist in this triage task. In this work, we implemented a comprehensive deep-learning model selection and optimization study on a large, collated, multi-geography, multi-institution, and multi-device dataset of 9,462 women (17,013 images). We evaluated relative portability, repeatability, and classification performance. The top performing model, when combined with HPV type, achieved an area under the Receiver Operating Characteristics (ROC) curve (AUC) of 0.89 within our study population of interest, and a limited total extreme misclassification rate of 3.4%, on held-aside test sets. Our model also produced reliable and consistent predictions, achieving a strong quadratic weighted kappa (QWK) of 0.86 and a minimal % 2-class disagreement (% 2-Cl. D.) of 0.69%, between image pairs across women. Our work is among the first efforts at designing a robust, repeatable, accurate and clinically translatable deep-learning model for cervical screening.
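The two repeatability metrics named in this abstract can be sketched in a few lines of Python. The quadratic weighted kappa below follows the standard definition. The reading of "% 2-class disagreement" as the share of paired images whose predicted classes are two ordinal steps apart (with classes coded 0, 1, 2) is an assumption, since the abstract does not define it. The function names and the toy ratings are hypothetical.

```python
from collections import Counter


def quadratic_weighted_kappa(ratings_a, ratings_b, n_classes):
    """Standard QWK between two equal-length lists of ordinal labels in 0..n_classes-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    n = len(ratings_a)
    observed = Counter(zip(ratings_a, ratings_b))  # confusion counts
    hist_a = Counter(ratings_a)
    hist_b = Counter(ratings_b)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count under independence of the two sets of ratings.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0.0:
        return 1.0  # every rating falls in one class; agreement is trivially perfect
    return 1.0 - weighted_observed / weighted_expected


def two_class_disagreement(ratings_a, ratings_b):
    """Fraction of pairs whose labels differ by exactly two ordinal steps (assumed definition)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    extreme = sum(1 for a, b in zip(ratings_a, ratings_b) if abs(a - b) == 2)
    return extreme / len(ratings_a)


# Toy predictions for two images of the same six women (hypothetical data).
first_image = [0, 0, 1, 1, 2, 2]
second_image = [0, 0, 1, 2, 2, 2]
qwk = quadratic_weighted_kappa(first_image, second_image, n_classes=3)
extreme_rate = two_class_disagreement(first_image, second_image)
```

Because the weights grow with the square of the class distance, a control-versus-case flip between paired images is penalized four times as heavily as a flip involving the indeterminate class.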

Subject(s)

Papillomavirus Infections , Uterine Cervical Neoplasms , Humans , Female , Cervix Uteri/pathology , Papillomavirus Infections/epidemiology , Artificial Intelligence , Early Detection of Cancer/methods , Mass Screening/methods , Neural Networks, Computer

Reproducible and clinically translatable deep neural networks for cancer screening.

Res Sq ; 2023 Mar 03.

Article in English | MEDLINE | ID: mdl-36909463

ABSTRACT

Cervical cancer is a leading cause of cancer mortality, with approximately 90% of the 250,000 deaths per year occurring in low- and middle-income countries (LMIC). Secondary prevention with cervical screening involves detecting and treating precursor lesions; however, scaling screening efforts in LMIC has been hampered by infrastructure and cost constraints. Recent work has supported the development of an artificial intelligence (AI) pipeline on digital images of the cervix to achieve an accurate and reliable diagnosis of treatable precancerous lesions. In particular, WHO guidelines emphasize visual triage of women testing positive for human papillomavirus (HPV) as the primary screen, and AI could assist in this triage task. Published AI reports have exhibited overfitting, lack of portability, and unrealistic, near-perfect performance estimates. To surmount recognized issues, we implemented a comprehensive deep-learning model selection and optimization study on a large, collated, multi-institutional dataset of 9,462 women (17,013 images). We evaluated relative portability, repeatability, and classification performance. The top performing model, when combined with HPV type, achieved an area under the Receiver Operating Characteristics (ROC) curve (AUC) of 0.89 within our study population of interest, and a limited total extreme misclassification rate of 3.4%, on held-aside test sets. Our work is among the first efforts at designing a robust, repeatable, accurate and clinically translatable deep-learning model for cervical screening.

Improving the repeatability of deep learning models with Monte Carlo dropout.

Lemay, Andreanne; Hoebel, Katharina; Bridge, Christopher P; Befano, Brian; De Sanjosé, Silvia; Egemen, Didem; Rodriguez, Ana Cecilia; Schiffman, Mark; Campbell, John Peter; Kalpathy-Cramer, Jayashree.

NPJ Digit Med ; 5(1): 174, 2022 Nov 18.

Article in English | MEDLINE | ID: mdl-36400939

ABSTRACT

The integration of artificial intelligence into clinical workflows requires reliable and robust models. Repeatability is a key attribute of model robustness. Ideal repeatable models output predictions without variation during independent tests carried out under similar conditions. However, slight variations, though not ideal, may be unavoidable and acceptable in practice. During model development and evaluation, much attention is given to classification performance while model repeatability is rarely assessed, leading to the development of models that are unusable in clinical practice. In this work, we evaluate the repeatability of four model types (binary classification, multi-class classification, ordinal classification, and regression) on images that were acquired from the same patient during the same visit. We study each model's performance on four medical image classification tasks from public and private datasets: knee osteoarthritis, cervical cancer screening, breast density estimation, and retinopathy of prematurity. Repeatability is measured and compared on ResNet and DenseNet architectures. Moreover, we assess the impact of sampling Monte Carlo dropout predictions at test time on classification performance and repeatability. Leveraging Monte Carlo predictions significantly increases repeatability, in particular at the class boundaries, for all tasks on the binary, multi-class, and ordinal models, leading to an average reduction of the 95% limits of agreement by 16 percentage points and of the class disagreement rate by 7 percentage points. The classification accuracy improves in most settings along with the repeatability. Our results suggest that beyond about 20 Monte Carlo iterations, there is no further gain in repeatability. In addition to the higher test-retest agreement, Monte Carlo predictions are better calibrated, which leads to output probabilities reflecting more accurately the true likelihood of being correctly classified.
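The idea of sampling Monte Carlo dropout predictions at test time can be sketched with a toy model. The snippet below is not the authors' implementation: it replaces the ResNet and DenseNet networks with a hypothetical logistic model whose inputs are randomly dropped, and all names, weights, and the dropout rate are assumptions. The default of 20 iterations reflects the plateau reported in the abstract.

```python
import math
import random


def forward_with_dropout(features, weights, bias, drop_p, rng):
    """One stochastic forward pass: inverted dropout on inputs, then a sigmoid."""
    keep_p = 1.0 - drop_p
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep_p:
            z += w * x / keep_p  # rescale kept units so the expected input is unchanged
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(features, weights, bias, drop_p=0.2, n_iter=20, seed=0):
    """Average n_iter stochastic passes; return (mean probability, variance)."""
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(features, weights, bias, drop_p, rng)
        for _ in range(n_iter)
    ]
    mean = sum(samples) / n_iter
    variance = sum((s - mean) ** 2 for s in samples) / n_iter
    return mean, variance


# Hypothetical feature vector and parameters, for illustration only.
toy_features = [0.5, -1.2, 0.8]
toy_weights = [1.0, 0.4, -0.7]
mean_prob, spread = mc_dropout_predict(toy_features, toy_weights, bias=0.1)
```

The averaged probability is what would be thresholded into a class, and the variance across passes gives a rough per-image uncertainty signal that is unavailable from a single deterministic pass.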

SoftSeg: Advantages of soft versus binary training for image segmentation.

Gros, Charley; Lemay, Andreanne; Cohen-Adad, Julien.

Med Image Anal ; 71: 102038, 2021 07.

Article in English | MEDLINE | ID: mdl-33784599

ABSTRACT

Most image segmentation algorithms are trained on binary masks formulated as a classification task per pixel. However, in applications such as medical imaging, this "black-and-white" approach is too constraining because the contrast between two tissues is often ill-defined, i.e., the voxels located on objects' edges contain a mixture of tissues (a partial volume effect). Consequently, assigning a single "hard" label can result in a detrimental approximation. Instead, a soft prediction containing non-binary values would overcome that limitation. In this study, we introduce SoftSeg, a deep learning training approach that takes advantage of soft ground truth labels, and is not bound to binary predictions. SoftSeg aims at solving a regression instead of a classification problem. This is achieved by using (i) no binarization after preprocessing and data augmentation, (ii) a normalized ReLU final activation layer (instead of sigmoid), and (iii) a regression loss function (instead of the traditional Dice loss). We assess the impact of these three features on three open-source MRI segmentation datasets from the spinal cord gray matter, the multiple sclerosis brain lesion, and the multimodal brain tumor segmentation challenges. Across multiple random dataset splittings, SoftSeg outperformed the conventional approach, leading to an increase in Dice score of 2.0% on the gray matter dataset (p=0.001), 3.3% for the brain lesions, and 6.5% for the brain tumors. SoftSeg produces consistent soft predictions at tissues' interfaces and shows an increased sensitivity for small objects (e.g., multiple sclerosis lesions). The richness of soft labels could represent the inter-expert variability, the partial volume effect, and complement the model uncertainty estimation, which is typically unclear with binary predictions. The developed training pipeline can easily be incorporated into most of the existing deep learning architectures. SoftSeg is implemented in the freely available deep learning toolbox ivadomed (https://ivadomed.org).
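Features (ii) and (iii) of the approach can be illustrated on a flattened row of voxels. This is a sketch under stated assumptions, not the ivadomed code: the abstract does not define the normalization, so dividing the rectified output by its maximum is assumed here, and mean squared error stands in for the unspecified regression loss. Function names and values are hypothetical.

```python
def normalized_relu(logits):
    """ReLU followed by division by the maximum, giving outputs in [0, 1] (assumed form)."""
    rectified = [max(0.0, value) for value in logits]
    peak = max(rectified)
    if peak == 0.0:
        return rectified  # all-zero prediction; nothing to normalize
    return [value / peak for value in rectified]


def mse_loss(prediction, target):
    """Mean squared error, used here as a stand-in regression loss."""
    if len(prediction) != len(target) or not prediction:
        raise ValueError("prediction and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


# Hypothetical raw network outputs for four voxels crossing a tissue boundary.
raw_outputs = [-1.0, 1.0, 2.0, 4.0]
soft_prediction = normalized_relu(raw_outputs)

# Soft ground truth keeps partial-volume fractions; the hard mask thresholds them at 0.5.
soft_target = [0.0, 0.25, 0.5, 1.0]
hard_target = [1.0 if value >= 0.5 else 0.0 for value in soft_target]

loss_against_soft = mse_loss(soft_prediction, soft_target)
loss_against_hard = mse_loss(soft_prediction, hard_target)
```

In this toy case the prediction matches the soft target exactly, yet it would still be penalized against the binarized mask, which is the kind of approximation error that training on soft labels avoids.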

Subject(s)

Brain Neoplasms , Multiple Sclerosis , Algorithms , Brain Neoplasms/diagnostic imaging , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging

Automatic multiclass intramedullary spinal cord tumor segmentation on MRI with deep learning.

Lemay, Andreanne; Gros, Charley; Zhuo, Zhizheng; Zhang, Jie; Duan, Yunyun; Cohen-Adad, Julien; Liu, Yaou.

Neuroimage Clin ; 31: 102766, 2021.

Article in English | MEDLINE | ID: mdl-34352654

ABSTRACT

Spinal cord tumors lead to neurological morbidity and mortality. Being able to obtain morphometric quantification (size, location, growth rate) of the tumor, edema, and cavity can result in improved monitoring and treatment planning. Such quantification requires the segmentation of these structures into three separate classes. However, manual segmentation of three-dimensional structures is time-consuming, tedious, and prone to intra- and inter-rater variability, motivating the development of automated methods. Here, we tailor a model adapted to the spinal cord tumor segmentation task. Data were obtained from 343 patients using gadolinium-enhanced T1-weighted and T2-weighted MRI scans with cervical, thoracic, and/or lumbar coverage. The dataset includes the three most common intramedullary spinal cord tumor types: astrocytomas, ependymomas, and hemangioblastomas. The proposed approach is a cascaded architecture with U-Net-based models that segments tumors in a two-stage process: locate and label. The model first finds the spinal cord and generates bounding box coordinates. The images are cropped according to this output, leading to a reduced field of view, which mitigates class imbalance. The tumor is then segmented. The segmentation of the tumor, cavity, and edema (as a single class) reached a Dice score of 76.7 ± 1.5%, and the segmentation of tumors alone reached a Dice score of 61.8 ± 4.0%. The true positive detection rate was above 87% for tumor, edema, and cavity. To the best of our knowledge, this is the first fully automatic deep learning model for spinal cord tumor segmentation. The multiclass segmentation pipeline is available in the Spinal Cord Toolbox (https://spinalcordtoolbox.com/). It can be run with custom data on a regular computer within seconds.
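The "locate and label" cropping step and the Dice score used for evaluation can be sketched on a small 2D example. This is an illustrative simplification, not the Spinal Cord Toolbox pipeline: the first-stage localization mask is given by hand instead of being predicted by a U-Net, the data are 2D lists and not 3D volumes, and all function names are hypothetical.

```python
def bounding_box(mask):
    """Return (row_start, row_stop, col_start, col_stop) around nonzero pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return min(rows), max(rows) + 1, min(cols), max(cols) + 1


def crop(image, box):
    """Crop a 2D list to the half-open bounding box."""
    row_start, row_stop, col_start, col_stop = box
    return [row[col_start:col_stop] for row in image[row_start:row_stop]]


def dice_score(prediction, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(prediction) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(prediction, truth) if p and t)
    total = sum(1 for p in prediction if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical image and first-stage localization mask.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
cord_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
box = bounding_box(cord_mask)
cropped = crop(image, box)

# Dice between a toy predicted tumor mask and a toy reference mask.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

Cropping shrinks the background relative to the structure of interest, which is how the reduced field of view mitigates class imbalance for the second-stage segmentation.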

Subject(s)

Brain Neoplasms , Deep Learning , Spinal Cord Neoplasms , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Spinal Cord/diagnostic imaging , Spinal Cord Neoplasms/diagnostic imaging
