ABSTRACT
The French Society of Pathology (SFP) organized its first data challenge in 2020 with the help of the Health Data Hub (HDH). Organizing the event began with the collection of nearly 5000 cervical biopsy slides from 20 pathology centers. After verifying that no patient had objected to the inclusion of their slides in the project, the slides were anonymized, digitized, annotated by expert pathologists, and finally uploaded to a data challenge platform open to competitors from around the world. Competing teams had to develop algorithms that could distinguish 4 diagnostic classes of cervical epithelial lesions. Among the many submissions, the best algorithms achieved an overall score close to 95%. The final phase of the competition lasted only 6 weeks, and the SFP and HDH now aim to publish the collection in open access for the scientific community. In this report, we present a post-competition analysis of the results. We first describe the algorithmic pipelines of 3 top competitors. We then analyze several difficult cases that even the top competitors could not predict correctly. A medical committee of expert pathologists reviewed the images to look for possible explanations for these erroneous results, and we present their findings here for a broad audience of pathologists and data scientists in the field of digital pathology.
ABSTRACT
Cervical cancer is the fourth most common cancer in women worldwide. To determine early treatment for patients, it is critical to accurately classify the cervical intraepithelial lesion status based on a microscopic biopsy. Lesion classification is a 4-class problem, with biopsies being designated as benign or as increasingly malignant classes 1-3, with class 3 being invasive cancer. Unfortunately, traditional biopsy analysis by a pathologist is time-consuming and subject to intra- and inter-observer variability. For this reason, it is of interest to develop automatic analysis pipelines that classify lesion status directly from a digitized whole slide image (WSI). The recent TissueNet Challenge was organized to find the best automatic detection pipeline for this task, using a dataset of 1015 annotated WSIs. In this work, we present our winning end-to-end solution for cervical slide classification, composed of a two-step classification model: first, we classify individual slide patches using an ensemble CNN; second, we classify the slide with an SVM that uses statistical features of the aggregated patch-level predictions. The key innovation of our approach is a novel partial label-based loss function that allows us to supplement the supervised WSI patch annotations with weakly supervised patches based on the WSI class. As a result, we did not require additional expert tissue annotation, while still reaching the winning score of 94.7%. Our approach is a step towards the clinical inclusion of automatic pipelines for cervical cancer treatment planning.
Clinical relevance: The explanation of the winning TissueNet AI algorithm for automated cervical cancer classification, which may provide insights for the next generation of computer-assisted tools in digital pathology.
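To make the two ideas in this abstract more concrete, the following is a minimal illustrative sketch in plain Python, not the authors' implementation. It shows one common form of a partial-label loss (the negative log of the probability mass assigned to a set of candidate classes) and one possible way to summarize patch-level predictions into slide-level features. The function names, the choice of features (per-class mean, maximum, and vote fraction), and in particular the assumption that a patch from a slide of class k may belong to any class from 0 to k are hypothetical and are not taken from the abstract.

```python
import math
from typing import List, Sequence

NUM_CLASSES = 4  # benign plus lesion classes 1-3, as described in the abstract


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one patch's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_label_loss(logits: Sequence[float], candidates: Sequence[int]) -> float:
    """Negative log of the total probability assigned to the candidate classes.

    With a single candidate this reduces to ordinary cross-entropy.
    """
    probs = softmax(logits)
    mass = sum(probs[c] for c in candidates)
    return -math.log(max(mass, 1e-12))  # clamp to avoid log(0)


def candidates_from_slide_label(slide_class: int) -> List[int]:
    """ASSUMPTION (not from the source): a patch taken from a slide of
    class k may show any class from 0 (benign) up to k."""
    return list(range(slide_class + 1))


def slide_features(patch_probs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical slide-level features: per-class mean and maximum patch
    probability, followed by the fraction of patches voting for each class."""
    n = len(patch_probs)
    feats: List[float] = []
    for c in range(NUM_CLASSES):
        column = [p[c] for p in patch_probs]
        feats.append(sum(column) / n)
        feats.append(max(column))
    votes = [max(range(NUM_CLASSES), key=lambda i: p[i]) for p in patch_probs]
    for c in range(NUM_CLASSES):
        feats.append(votes.count(c) / n)
    return feats


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, -3.0]
    print(partial_label_loss(logits, candidates_from_slide_label(2)))
    print(slide_features([softmax(logits), softmax([0.0, 0.0, 3.0, 0.0])]))
```

In a pipeline of the kind described, a feature vector such as the one returned by `slide_features` could be passed to an SVM classifier (for example `sklearn.svm.SVC`) to predict the slide class; the actual features and SVM settings used by the winning solution are not specified in the abstract.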