SurvdigitizeR: an algorithm for automated survival curve digitization.
Zhang, Jasper Zhongyuan; Rios, Juan David; Pechlivanoglou, Tilemanchos; Yang, Alan; Zhang, Qiyue; Deris, Dimitrios; Cromwell, Ian; Pechlivanoglou, Petros.
Affiliation
  • Zhang JZ; Child Health Evaluative Sciences, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON, Canada.
  • Rios JD; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
  • Pechlivanoglou T; Child Health Evaluative Sciences, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON, Canada.
  • Yang A; Lassonde School of Engineering, York University, Toronto, ON, Canada.
  • Zhang Q; Child Health Evaluative Sciences, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON, Canada.
  • Deris D; Child Health Evaluative Sciences, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON, Canada.
  • Cromwell I; Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada.
  • Pechlivanoglou P; Canada's Drug Agency, Ottawa, ON, Canada.
BMC Med Res Methodol ; 24(1): 147, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003440
ABSTRACT

BACKGROUND:

Decision analytic models and meta-analyses often rely on survival probabilities that are digitized from published Kaplan-Meier (KM) curves. However, manually extracting these probabilities from KM curves is time-consuming, expensive, and error-prone. We developed an efficient and accurate algorithm that automates extraction of survival probabilities from KM curves.

METHODS:

The automated digitization algorithm processes images in JPG or PNG format, converts them to the hue, saturation, and lightness (HSL) color scale, and uses optical character recognition to detect axis locations and labels. It also uses a k-medoids clustering algorithm to separate multiple overlapping curves in the same figure. To validate performance, we generated survival plots from random time-to-event data with sample sizes of 25, 50, 150, 250, and 1000 individuals split into 1, 2, or 3 treatment arms. We assumed an exponential distribution and applied random censoring. We compared automated digitization with manual digitization performed by well-trained researchers. We calculated the root mean squared error (RMSE) at 100 time points for both methods. The algorithm's performance was also evaluated by Bland-Altman analysis of the agreement between automated and manual digitization on a real-world set of published KM curves.

RESULTS:

The automated digitizer accurately identified survival probabilities over time in the simulated KM curves. The average RMSE for automated digitization was 0.012, while manual digitization had an average RMSE of 0.014. The performance of the automated digitizer was negatively correlated with the number of curves in a figure and the presence of censoring markers. In real-world scenarios, automated digitization and manual digitization showed very close agreement.
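For readers unfamiliar with Bland-Altman analysis, the agreement between two digitization methods is usually summarized by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the paired differences. The sketch below shows that calculation; the function name and the paired values are hypothetical and chosen only for illustration, not taken from the study.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired survival probabilities, for illustration only.
automated = [0.95, 0.88, 0.76, 0.61, 0.47]
manual = [0.94, 0.89, 0.75, 0.62, 0.46]

bias, lower, upper = bland_altman(automated, manual)
print(f"bias={bias:.4f}, limits of agreement=({lower:.4f}, {upper:.4f})")
```

A bias near zero with narrow limits of agreement indicates that the two methods can be used interchangeably for practical purposes.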

CONCLUSIONS:

The algorithm streamlines the digitization process and requires minimal user input. It effectively digitized KM curves in simulated and real-world scenarios, demonstrating accuracy comparable to conventional manual digitization. The algorithm has been developed as an open-source R package and as a Shiny application, available on GitHub at https://github.com/Pechli-Lab/SurvdigitizeR and at https://pechlilab.shinyapps.io/SurvdigitizeR/ .

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms Limit: Humans Language: En Journal: BMC Med Res Methodol Publication year: 2024 Document type: Article
