ABSTRACT
Prostate cancer (PCa) data is of public health importance in South Africa, but biopsy results are recorded as semi-structured narrative text that is not easily analysed. We report a pilot study that applied predictive analytics and text mining techniques to extract prognostic information that guides patient management. In particular, Gleason scores (GS) reported in a number of different formats were extracted successfully. The men diagnosed with PCa were predominantly older and presented with a high-risk GS (8-10). Where cell differentiation was reported, 64% of biopsies showed poor differentiation. The approaches demonstrated in this study should be extended to a larger dataset to assess whether they have the potential to scale up to the national level.
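As an illustration of what extracting a GS "reported in a number of formats" can involve, the following is a minimal sketch using regular expressions. The patterns, the function names `extract_gleason_score` and `risk_group`, and the example report phrasings are all hypothetical assumptions; they are not the rules used in the study. The sketch first looks for a primary + secondary grade pair (for example "Gleason 4+3=7") and falls back to a stated total (for example "GS 8/10"). It flags only the high-risk band (8-10) named above.

```python
import re
from typing import Optional

# Hypothetical patterns (assumption): not the study's actual extraction rules.
# Grade-pair formats such as "Gleason 4+3=7", "Gleason score 3 + 4", "GS (4+4)=8".
PATTERN_GRADES = re.compile(
    r"\b(?:gleason(?:\s+score)?|gs)\s*[:=]?\s*\(?\s*([1-5])\s*\+\s*([1-5])\s*\)?"
    r"(?:\s*=\s*(\d{1,2}))?",
    re.IGNORECASE,
)
# Total-only formats such as "Gleason score: 9", "Gleason score of 7", "GS 8/10".
PATTERN_TOTAL = re.compile(
    r"\b(?:gleason(?:\s+score)?|gs)\s*(?:of|:|=)?\s*(\d{1,2})(?:\s*/\s*10)?\b",
    re.IGNORECASE,
)


def extract_gleason_score(report: str) -> Optional[int]:
    """Return the total Gleason score found in a free-text report, or None."""
    match = PATTERN_GRADES.search(report)
    if match:
        primary, secondary, stated = match.groups()
        total = int(primary) + int(secondary)
        # Reject reports whose stated total disagrees with the sum of the grades.
        if stated is not None and int(stated) != total:
            return None
        return total
    match = PATTERN_TOTAL.search(report)
    if match:
        total = int(match.group(1))
        # Totals outside 2-10 are not valid Gleason scores.
        return total if 2 <= total <= 10 else None
    return None


def risk_group(score: int) -> str:
    """Flag the high-risk band (GS 8-10); all other scores are grouped together."""
    return "high" if 8 <= score <= 10 else "not high"


if __name__ == "__main__":
    for text in ["Adenocarcinoma, Gleason 4+4=8, poorly differentiated.",
                 "GS 7 (4+3)", "Gleason score: 9"]:
        gs = extract_gleason_score(text)
        print(text, "->", gs, risk_group(gs) if gs is not None else "unknown")
```

Trying the grade-pair pattern first matters because the total-only pattern would otherwise read "Gleason 4+3=7" as a score of 4. Any real pipeline would need its patterns checked against manually annotated biopsy reports.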