Determining the clinical applicability of machine learning models through assessment of reporting across skin phototypes and rarer skin cancer types: A systematic review.

Steele, Lloyd; Tan, Xiang Li; Olabi, Bayanne; Gao, Jing Mia; Tanaka, Reiko J; Williams, Hywel C

Affiliations

Steele L; Department of Dermatology, The Royal London Hospital, London, UK.
Tan XL; Centre for Cell Biology and Cutaneous Research, Blizard Institute, Queen Mary University of London, London, UK.
Olabi B; St George's University Hospitals NHS Foundation Trust, London, UK.
Gao JM; Biosciences Institute, Newcastle University, Newcastle, UK.
Tanaka RJ; Department of Dermatology, The Royal London Hospital, London, UK.
Williams HC; Department of Bioengineering, Imperial College London, London, UK.

J Eur Acad Dermatol Venereol; 37(4): 657-665, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36514990

ABSTRACT

Machine learning (ML) models for skin cancer recognition may have variable performance across different skin phototypes and skin cancer types. Overall performance metrics alone are insufficient to detect poor subgroup performance. We aimed (1) to assess whether studies of ML models reported results separately for different skin phototypes and rarer skin cancers, and (2) to graphically represent the skin cancer training datasets used by current ML models. In this systematic review, we searched PubMed, Embase and CENTRAL. We included all studies in medical journals assessing an ML technique for skin cancer diagnosis that used clinical or dermoscopic images from 1 January 2012 to 22 September 2021. No language restrictions were applied. We considered rarer skin cancers to be skin cancers other than pigmented melanoma, basal cell carcinoma and squamous cell carcinoma. We identified 114 studies for inclusion. Rarer skin cancers were included by 8/114 studies (7.0%), and results for a rarer skin cancer were reported separately in 1/114 studies (0.9%). Performance was reported across all skin phototypes in 1/114 studies (0.9%), but performance was uncertain in skin phototypes I and VI because of minimal representation of these phototypes in the test dataset (9/3756 and 1/3756, respectively). For training datasets, public datasets were the most frequently used, the most widely used being the International Skin Imaging Collaboration (ISIC) archive (65/114 studies, 57.0%), but the largest datasets were private. Our review identified that most ML models did not report performance separately for rarer skin cancers and different skin phototypes. A degree of variability in ML model performance across subgroups is expected, but the current lack of transparency is not justifiable and risks models being used inappropriately in populations in whom accuracy is low.
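The abstract's point that overall metrics can hide poor subgroup performance can be illustrated with a minimal Python sketch. It is not taken from the review: the helper names `percent` and `sensitivity_by_group` and the toy predictions are hypothetical, chosen only to show how a well-represented phototype can dominate an overall figure. The first part simply recomputes the proportions quoted in the abstract from their counts.

```python
from collections import defaultdict


def percent(numerator, denominator):
    """Return a proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts quoted in the abstract (out of 114 included studies).
reported = {
    "rarer skin cancers included": percent(8, 114),
    "rarer skin cancer reported separately": percent(1, 114),
    "ISIC archive used for training": percent(65, 114),
}


def sensitivity_by_group(records):
    """Compute sensitivity per subgroup.

    records: iterable of (group, y_true, y_pred), where 1 means malignant.
    Only truly malignant cases contribute to sensitivity.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            true_positives[group] += int(y_pred == 1)
    return {g: true_positives[g] / positives[g] for g in positives}


# Hypothetical toy test set (NOT data from any study): phototype II is well
# represented, phototype VI contributes only two malignant cases.
records = (
    [("II", 1, 1)] * 90 + [("II", 1, 0)] * 10
    + [("VI", 1, 1)] * 1 + [("VI", 1, 0)] * 1
)

malignant = [(t, p) for _, t, p in records if t == 1]
overall = sum(p == 1 for _, p in malignant) / len(malignant)
per_group = sensitivity_by_group(records)

print(reported)
print(f"overall sensitivity: {overall:.3f}")
print(per_group)
```

In this toy setting the overall sensitivity stays close to that of phototype II, while the phototype VI estimate is much lower and rests on only two cases, which is the kind of uncertainty the abstract describes for sparsely represented phototypes.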

Subjects

Carcinoma, Basal Cell; Carcinoma, Squamous Cell; Melanoma; Skin Neoplasms; Humans; Skin Neoplasms/pathology; Carcinoma, Basal Cell/pathology; Melanoma/diagnosis; Melanoma/pathology; Skin/pathology; Carcinoma, Squamous Cell/pathology


Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Skin Neoplasms / Carcinoma, Basal Cell / Carcinoma, Squamous Cell / Melanoma | Type of study: Prognostic_studies / Systematic_reviews | Limits: Humans | Language: En | Publication year: 2023 | Document type: Article
