ABSTRACT
The search to understand immunotherapy response has sparked interest across oncology, with artificial intelligence (AI) and radiomics emerging as promising tools capable of processing large amounts of information to identify patients suitable for treatment. The application of AI in radiology has grown rapidly, driven by the hypothesis that radiological images capture tumor phenotypes and could therefore provide valuable insight into the likelihood of immunotherapy response. However, despite the rapid growth in the number of studies, no algorithm in the field has reached clinical implementation, mainly because of the lack of standardized methods, which hampers study comparison and reproducibility across datasets. In this review, we performed a comprehensive assessment of published data to identify sources of variability in radiomics study design that hinder comparison of model performance across studies and, consequently, clinical implementation. Subsequently, we conducted a use-case meta-analysis of methodologically homogeneous studies to assess the overall performance of radiomics in estimating programmed death-ligand 1 (PD-L1) expression. Our findings indicate that, despite numerous attempts to predict immunotherapy response, only a limited number of studies share comparable methodologies and report sufficient data on cohorts and methods to be suitable for meta-analysis. Nevertheless, the promising results of the studies that do meet these criteria underscore the importance of ongoing standardization and benchmarking efforts. This review highlights the importance of uniformity in study design and reporting: such standardization is crucial to enable meaningful comparisons, to demonstrate the validity of biomarkers across diverse populations, and to facilitate their incorporation into the immunotherapy patient selection process.