ABSTRACT
PURPOSE: $K^{\mathrm{trans}}$ has often been proposed as a quantitative imaging biomarker for diagnosis, prognosis, and treatment response assessment in various tumors. None of the many software tools for $K^{\mathrm{trans}}$ quantification are standardized. The ISMRM Open Science Initiative for Perfusion Imaging-Dynamic Contrast-Enhanced (OSIPI-DCE) challenge was designed to benchmark methods in support of efforts to standardize $K^{\mathrm{trans}}$ measurement.
METHODS: A framework was created to evaluate $K^{\mathrm{trans}}$ values produced by DCE-MRI analysis pipelines and so enable benchmarking. The perfusion MRI community was invited to apply their pipelines to $K^{\mathrm{trans}}$ quantification in glioblastoma, using data from clinical and synthetic patients. Submissions were required to include the entrants' $K^{\mathrm{trans}}$ values, the applied software, and a standard operating procedure. These were evaluated using the proposed $\mathrm{OSIPI}_{\mathrm{gold}}$ score, which is defined with accuracy, repeatability, and reproducibility components.
RESULTS: Across the 10 submissions received, the $\mathrm{OSIPI}_{\mathrm{gold}}$ score ranged from 28% to 78%, with a median of 59%. The accuracy, repeatability, and reproducibility scores ranged from 0.54 to 0.92, 0.64 to 0.86, and 0.65 to 1.00, respectively (0 = lowest, 1 = highest). Manual arterial input function selection markedly affected reproducibility and showed greater variability in $K^{\mathrm{trans}}$ analysis than automated methods. Furthermore, provision of a detailed standard operating procedure was critical for higher reproducibility.
CONCLUSIONS: This study reports results from the OSIPI-DCE challenge and highlights the high inter-software variability in $K^{\mathrm{trans}}$ estimation, providing a framework for ongoing benchmarking against the scores presented. The participating teams were ranked on the performance of their software tools in the particular setting of this challenge. In a real-world clinical setting, or under a different benchmarking methodology, many of these tools may perform differently.