A Machine Learning method for relabeling arbitrary DICOM structure sets to TG-263 defined labels.

Sleeman IV, William C; Nalluri, Joseph; Syed, Khajamoinuddin; Ghosh, Preetam; Krawczyk, Bartosz; Hagan, Michael; Palta, Jatinder; Kapoor, Rishabh

Affiliations

Sleeman IV WC; Virginia Commonwealth University, Department of Radiation Oncology, Richmond, VA, United States of America; Virginia Commonwealth University, Department of Computer Science, Richmond, VA, United States of America; National Radiation Oncology Program, Department of Veterans Affairs, Richmond, VA, United States of America.
Nalluri J; Virginia Commonwealth University, Department of Radiation Oncology, Richmond, VA, United States of America; National Radiation Oncology Program, Department of Veterans Affairs, Richmond, VA, United States of America.
Syed K; Virginia Commonwealth University, Department of Computer Science, Richmond, VA, United States of America.
Ghosh P; Virginia Commonwealth University, Department of Computer Science, Richmond, VA, United States of America.
Krawczyk B; Virginia Commonwealth University, Department of Computer Science, Richmond, VA, United States of America.
Hagan M; Virginia Commonwealth University, Department of Radiation Oncology, Richmond, VA, United States of America; National Radiation Oncology Program, Department of Veterans Affairs, Richmond, VA, United States of America.
Palta J; Virginia Commonwealth University, Department of Radiation Oncology, Richmond, VA, United States of America; National Radiation Oncology Program, Department of Veterans Affairs, Richmond, VA, United States of America.
Kapoor R; Virginia Commonwealth University, Department of Radiation Oncology, Richmond, VA, United States of America; National Radiation Oncology Program, Department of Veterans Affairs, Richmond, VA, United States of America.

J Biomed Inform; 109: 103527, 2020 Sep.

Article in English | MEDLINE | ID: mdl-32777484

ABSTRACT

PURPOSE: To present a Machine Learning pipeline that automatically relabels anatomical structure sets in the Digital Imaging and Communications in Medicine (DICOM) format to a standard nomenclature, enabling data abstraction for research and quality improvement.

METHODS: DICOM structure sets from approximately 1200 lung and prostate cancer patients across 40 treatment centers were used to build predictive models that automate the relabeling of clinically specified structure labels to standardized labels as defined by the American Association of Physicists in Medicine (AAPM) Task Group 263 (TG-263). Volumetric bitmaps were created from the delineated volumes and combined with associated bony anatomy data to build feature vectors. Feature reduction was performed with singular value decomposition, and the resulting vectors were used to predict the label of each structure with five different classifier algorithms on the Apache Spark platform, using 5-fold cross-validation. Undersampling methods were used to address the underlying class imbalance, which otherwise hindered classifier performance. Experiments were performed on both a curated version of the data, which included only annotated structures, and the non-curated data, which included all structures from the original treatment plans.

RESULTS: Random Forest provided the highest accuracy, with F1 scores of 98.77 for lung and 95.06 for prostate on the curated data sets. Scores were lower on the non-curated data sets, at 95.67 for lung and 90.22 for prostate, highlighting some of the challenges of classifying real clinical data. Including bony anatomy data and pooling information from all structures of the same patient each increased accuracy. In some cases, undersampling with k-Means clustering for class balancing improved classifier accuracy, but in all experiments it significantly reduced run time compared to random undersampling.

CONCLUSION: This work shows that, with curated data, structure sets can be relabeled using our approach with accuracies above 95% for many structure types. Although accuracy dropped on the full non-curated data sets, some structure types were still labeled correctly more than 90% of the time. Similar results were obtained on an external test data set, which suggests that the proposed models are likely to work on other clinical data sets.
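The abstract names k-Means undersampling as the class-balancing step but does not spell out the procedure. The sketch below assumes one common variant: every over-represented class is clustered into as many clusters as the smallest class has samples, and the real sample nearest each centroid is kept. It is written in plain Python rather than Apache Spark, and the function names (`kmeans`, `kmeans_undersample`) and the TG-263-style labels in the usage example are hypothetical illustrations, not the authors' implementation. In the paper's pipeline the input vectors would be the SVD-reduced bitmap and bony anatomy features.

```python
import random
from collections import defaultdict


def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_undersample(X, y, target=None, seed=0):
    """Hypothetical k-Means undersampling (assumed variant, see lead-in).

    Classes larger than `target` (default: size of the smallest class) are
    reduced to `target` real samples, one nearest each k-means centroid.
    """
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    if target is None:
        target = min(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label in sorted(by_class):
        pts = by_class[label]
        if len(pts) <= target:
            kept = pts  # minority classes are left untouched
        else:
            kept, used = [], set()
            for c in kmeans(pts, target, seed=seed):
                # Keep the closest not-yet-chosen real sample to this centroid.
                idx = min((i for i in range(len(pts)) if i not in used),
                          key=lambda i: _dist2(pts[i], c))
                used.add(idx)
                kept.append(pts[idx])
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


if __name__ == "__main__":
    # Toy 2-D "feature vectors": the majority class forms two tight groups.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
         [2.0, 8.0], [2.1, 8.0]]
    y = ["Heart"] * 6 + ["Lung_L"] * 2
    Xb, yb = kmeans_undersample(X, y)
    print(list(zip(yb, Xb)))
```

Keeping a real sample per centroid, rather than the synthetic centroid itself, means every retained vector still corresponds to an actual delineated structure, while the clustering tends to preserve the spread of the majority class better than random selection would.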

Subjects

Algorithms; Machine Learning; Cluster Analysis; Humans; Male

Keywords

Class imbalance; DICOM; Machine Learning; Radiation Oncology; Random Forest; TG-263

Database: MEDLINE | Main subject: Algorithms / Machine Learning | Study type: Prognostic studies | Limits: Humans / Male | Language: English | Journal: J Biomed Inform | Journal subject: Medical Informatics | Year of publication: 2020 | Document type: Article