ABSTRACT
The biopsy Gleason score is an important prognostic marker for prostate cancer patients, but it is subject to substantial variability among pathologists. Artificial intelligence (AI)-based algorithms employing deep learning have been shown to match pathologists' performance in assigning Gleason scores and have the potential to enhance pathologists' grading accuracy. In research, the performance of Gleason AI algorithms is mostly reported on common benchmark data sets or within public challenges. In contrast, many commercial algorithms are evaluated in clinical studies for which the data are not publicly released. Because commercial AI vendors typically do not publish performance on public benchmarks, comparing research and commercial AI is difficult. This study aims to evaluate and compare the performance of top-ranked public and commercial algorithms using real-world data. Through crowdsourcing, we curated a diverse data set of whole-slide prostate biopsy images covering a range of Gleason scores and originating from diverse sources. Predictions were obtained from 5 top-ranked public algorithms from the Prostate cANcer graDe Assessment (PANDA) challenge and 2 commercial Gleason grading algorithms. Additionally, 10 pathologists (A.C., C.R., J.v.I., K.R.M.L., P.R., P.G.S., R.G., S.F.K.J., T.v.d.K., X.F.) evaluated the data set in a reader study. Overall, the pairwise quadratic weighted kappa among pathologists ranged from 0.777 to 0.916. Both public and commercial algorithms showed high agreement with pathologists, with quadratic weighted kappa ranging from 0.617 to 0.900. Commercial algorithms performed on par with or outperformed the top public algorithms.
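The agreement figures above are quadratic weighted kappa values computed between pairs of raters. As a minimal sketch of that statistic (not the study's own code), the Python below assumes that each grade is encoded as an integer category from 0 to n_classes - 1; the function names and this encoding are illustrative assumptions.

```python
from itertools import combinations


def quadratic_weighted_kappa(a, b, n_classes):
    """Quadratic weighted kappa between two raters.

    a, b: equal-length sequences of integer grades in [0, n_classes - 1]
    (an assumed encoding, chosen for illustration only).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("need at least two classes")
    n = len(a)

    # Observed confusion matrix between the two raters.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1

    # Marginal totals, used for the agreement expected by chance.
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic penalty
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n

    if den == 0:
        # Both raters used one identical class throughout: kappa is undefined.
        return float("nan")
    return 1.0 - num / den


def pairwise_kappas(ratings, n_classes):
    """Kappa for every pair of raters; ratings maps rater name -> grade list."""
    return {
        (r1, r2): quadratic_weighted_kappa(ratings[r1], ratings[r2], n_classes)
        for r1, r2 in combinations(sorted(ratings), 2)
    }
```

Taking the minimum and maximum of the values returned by `pairwise_kappas` would give a range of the kind reported for the reader study. The quadratic weights penalize disagreements by the squared distance between grades, so a two-step discrepancy costs four times as much as a one-step discrepancy.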
ABSTRACT
A crucial aspect of prostate cancer grading, especially in low- and intermediate-risk cancer, is the accurate identification of Gleason pattern 4 glands, which include ill-formed or fused glands. However, pathologists are notably inconsistent in recognizing these glands, especially when they are mixed with pattern 3 glands. This inconsistency has significant implications for patient management and treatment decisions. In contrast, the recognition of glomeruloid and cribriform architecture has shown higher reproducibility. Cribriform architecture, in particular, has been linked to the worst prognosis among pattern 4 subtypes. Intraductal carcinoma of the prostate (IDC-P) is also associated with high-grade cancer and poor prognosis. Accurate identification, classification, and tumor size evaluation by pathologists are vital for determining patient treatment. This review emphasizes the importance of prostate cancer grading, highlighting challenges such as distinguishing between pattern 3 and pattern 4, as well as the prognostic implications of cribriform architecture and intraductal proliferations. It also addresses the inherent limitations of grading that arise from interobserver variability and explores the potential of computational pathology to enhance pathologists' accuracy and consistency.