The uncovered biases and errors in clinical determination of bone age by using deep learning models.
Bai, Mei; Gao, Liangxin; Ji, Min; Ge, Jianbang; Huang, Lingyun; Qiao, HaoChen; Xiao, Jing; Chen, Xiaotian; Yang, Bin; Sun, Yingqi; Zhang, Minjie; Zhang, Wenjie; Luo, Feihong; Yang, Haowei; Mei, Haibing; Qiao, Zhongwei.
Affiliation
  • Bai M; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China.
  • Gao L; Ping An Technology, Shenzhen, China.
  • Ji M; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China. ilovexray_349@163.com.
  • Ge J; Ping An Technology, Shenzhen, China.
  • Huang L; Ping An Technology, Shenzhen, China.
  • Qiao H; School of Public Health, Yale University, New Haven, USA.
  • Xiao J; Ping An Technology, Shenzhen, China.
  • Chen X; Department of Clinical Epidemiology, Children's Hospital of Fudan University, Shanghai, China.
  • Yang B; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China.
  • Sun Y; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China.
  • Zhang M; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China.
  • Zhang W; Information Technology Center, Children's Hospital of Fudan University, Shanghai, China.
  • Luo F; Department of Endocrinology, Children's Hospital of Fudan University, Shanghai, China.
  • Yang H; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China.
  • Mei H; Department of Radiology, Ningbo Women and Children's Hospital, Ningbo, China.
  • Qiao Z; Department of Radiology, Children's Hospital of Fudan University, No 399, Wan Yuan Road, Minhang District, Shanghai, 201102, China. zqiao@fudan.edu.cn.
Eur Radiol ; 33(5): 3544-3556, 2023 May.
Article in En | MEDLINE | ID: mdl-36538072
ABSTRACT

OBJECTIVES:

To evaluate AI biases and errors in estimating bone age (BA) by comparing AI and radiologists' clinical determinations of BA.

METHODS:

We established three deep learning models from a Chinese private dataset (CHNm), an American public dataset (USAm), and a joint dataset combining the above two datasets (JOIm). The test data CHNt (n = 1246) were labeled by ten senior pediatric radiologists. The effects of data site differences, interpretation bias, and interobserver variability on BA assessment were evaluated. The differences between the AI models' and radiologists' clinical determinations of BA (normal, advanced, and delayed BA groups by using the Brush data) were evaluated by the chi-square test and Kappa values. The heatmaps of CHNm-CHNt were generated by using Grad-CAM.
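The kappa statistic mentioned above measures agreement between two raters beyond what chance alone would produce. The following is a minimal sketch of Cohen's kappa for the three clinical categories (normal, advanced, delayed), not the authors' analysis code. The function name `cohen_kappa` and the label lists are hypothetical and made up purely for illustration; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two labels match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels, invented for illustration only.
ai_labels = ["normal", "normal", "advanced", "delayed",
             "normal", "advanced", "delayed", "normal"]
radiologist_labels = ["normal", "advanced", "advanced", "delayed",
                      "normal", "normal", "delayed", "normal"]

kappa = cohen_kappa(ai_labels, radiologist_labels)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on 6 of 8 cases (0.75), chance agreement is 0.375, and kappa is therefore 0.6. A library routine such as scikit-learn's `cohen_kappa_score` would normally be preferred in practice; the pure-Python version is used here only to keep the sketch self-contained.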

RESULTS:

We obtained an MAD value of 0.42 years on CHNm-CHNt. This indicates appropriate accuracy for the group as a whole but not an accurate estimation of individual BA: with a kappa value of 0.714, the AI and human clinical determinations of BA differed significantly. The features of the heatmaps were not fully consistent with human vision on the X-ray films. Variable performance in BA estimation by different AI models and the disagreement between AI and radiologists' clinical determinations of BA may be caused by data biases, including patients' sex and age, institutions, and radiologists.
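The contrast between a small group-level error and disagreement on individual clinical determinations can be illustrated with a short sketch. It assumes that MAD denotes the mean absolute difference between AI and radiologist bone ages, and it uses a hypothetical cutoff of ±1 year between bone age and chronological age to assign the normal, advanced, and delayed groups. The abstract does not state the actual thresholds derived from the Brush data, and all numbers below are invented for illustration.

```python
def mean_absolute_difference(predicted, reference):
    """Mean absolute difference between two equal-length lists of ages (years)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def classify(bone_age, chronological_age, threshold=1.0):
    """Assign a clinical group; the 1-year threshold is a hypothetical cutoff."""
    diff = bone_age - chronological_age
    if diff > threshold:
        return "advanced"
    if diff < -threshold:
        return "delayed"
    return "normal"


# Invented example values (years), not data from the study.
chronological = [8.0, 10.0, 12.0, 6.0]
radiologist_ba = [8.8, 11.2, 12.0, 4.9]
ai_ba = [9.2, 10.8, 12.3, 5.2]

mad = mean_absolute_difference(ai_ba, radiologist_ba)
ai_groups = [classify(b, c) for b, c in zip(ai_ba, chronological)]
radiologist_groups = [classify(b, c) for b, c in zip(radiologist_ba, chronological)]
disagreements = sum(a != r for a, r in zip(ai_groups, radiologist_groups))

print(f"MAD = {mad:.2f} years, {disagreements} of {len(ai_groups)} groups differ")
```

Here the AI and radiologist estimates differ by only 0.3 to 0.4 years per case, yet three of the four cases fall into different clinical groups because they sit near the cutoff. This is one way a low MAD can coexist with imperfect categorical agreement.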

CONCLUSIONS:

The deep learning models predicted BA more accurately on the internal and joint datasets than in external validation. However, the biases and errors in the models' clinical determinations of child development should be carefully considered.

KEY POINTS:

  • With a kappa value of 0.714, clinical determinations of bone age by using AI did not accord well with clinical determinations by radiologists.
  • Several biases, including patients' sex and age, institutions, and radiologists, may cause variable performance by AI bone age models and disagreement between AI and radiologists' clinical determinations of bone age.
  • AI heatmaps of bone age were not fully consistent with human vision on X-ray films.

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Computer Simulation / Age Determination by Skeleton / Deep Learning | Type of study: Prognostic_studies | Limits: Adolescent / Child / Child, preschool / Female / Humans / Male | Country/Region as subject: North America | Language: En | Journal: Eur Radiol | Publication year: 2023 | Document type: Article