Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data.
Doran, Simon J; Barfoot, Theo; Wedlake, Linda; Winfield, Jessica M; Petts, James; Glocker, Ben; Li, Xingfeng; Leach, Martin; Kaiser, Martin; Barwick, Tara D; Chaidos, Aristeidis; Satchwell, Laura; Soneji, Neil; Elgendy, Khalil; Sheeka, Alexander; Wallitt, Kathryn; Koh, Dow-Mu; Messiou, Christina; Rockall, Andrea.
Affiliations
  • Doran SJ; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK. simon.doran@icr.ac.uk.
  • Barfoot T; National Cancer Imaging Translational Accelerator, London, UK. simon.doran@icr.ac.uk.
  • Wedlake L; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.
  • Winfield JM; Department of Radiology, The Royal Marsden NHS Foundation Trust, London, UK.
  • Petts J; The Royal Marsden Clinical Trials Unit, London, UK.
  • Glocker B; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.
  • Li X; Joint Department of Physics, The Royal Marsden NHS Foundation Trust, London, UK.
  • Leach M; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.
  • Kaiser M; Department of Computing, Imperial College London, London, UK.
  • Barwick TD; Division of Cancer, Department of Surgery and Cancer, Imperial College London, London, UK.
  • Chaidos A; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.
  • Satchwell L; Joint Department of Physics, The Royal Marsden NHS Foundation Trust, London, UK.
  • Soneji N; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.
  • Elgendy K; Haemato-Oncology Unit, The Royal Marsden NHS Foundation Trust, London, UK.
  • Sheeka A; Division of Cancer, Department of Surgery and Cancer, Imperial College London, London, UK.
  • Wallitt K; Department of Radiology, Imperial College Healthcare NHS Trust, London, UK.
  • Koh DM; Department of Haematology, Imperial College Healthcare NHS Trust, London, UK.
  • Messiou C; Research and Development Statistics Unit, The Royal Marsden NHS Foundation Trust, London, UK.
  • Rockall A; Department of Radiology, Imperial College Healthcare NHS Trust, London, UK.
Insights Imaging; 15(1): 47, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38361108
ABSTRACT

OBJECTIVES:

MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining "real-world" and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation.

METHODS:

Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods.

RESULTS:

A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol partway through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for "composing" whole-body images from multiple imaging stations. Historic weaknesses in a digital versatile disc (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered.

CONCLUSIONS:

MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects.

CRITICAL RELEVANCE STATEMENT:

This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.

KEY POINTS:

  • Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
  • Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
  • Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple "image marts".

Full text: 1 | Database: MEDLINE | Study type: Guideline / Observational_studies / Prognostic_studies | Language: English | Year: 2024 | Document type: Article

Texto completo: 1 Banco de datos: MEDLINE Tipo de estudio: Guideline / Observational_studies / Prognostic_studies Idioma: En Año: 2024 Tipo del documento: Article