Can a Deep-learning Model for the Automated Detection of Vertebral Fractures Approach the Performance Level of Human Subspecialists?

Li, Yi-Chu; Chen, Hung-Hsun; Horng-Shing Lu, Henry; Hondar Wu, Hung-Ta; Chang, Ming-Chau; Chou, Po-Hsin.

Clin Orthop Relat Res; 479(7): 1598-1612, 2021 Jul 01.

Article in English | MEDLINE | ID: mdl-33651768

ABSTRACT

BACKGROUND: Vertebral fractures are the most common osteoporotic fractures in older individuals. Recent studies suggest that the performance of artificial intelligence is equal to that of humans in detecting osteoporotic fractures, such as fractures of the hip, distal radius, and proximal humerus. However, whether artificial intelligence performs as well in the detection of vertebral fractures on plain lateral spine radiographs has not yet been reported.

QUESTIONS/PURPOSES: (1) What are the accuracy, sensitivity, specificity, and interobserver reliability (kappa value) of an artificial intelligence model in detecting vertebral fractures, based on Genant fracture grades, using plain lateral spine radiographs, compared with values obtained by human observers? (2) Do patients' clinical data, including the anatomic location of the fracture (thoracic or lumbar spine), T-score on dual-energy x-ray absorptiometry, or fracture grade severity, affect the performance of an artificial intelligence model? (3) How does the artificial intelligence model perform on external validation?

METHODS: Between 2016 and 2018, 1019 patients older than 60 years were treated for vertebral fractures in our institution. Seventy-eight patients were excluded because of missing CT or MRI scans (24% [19]), poor image quality in plain lateral radiographs of the spine (54% [42]), multiple myeloma (5% [4]), and prior spine instrumentation (17% [13]). The plain lateral radiographs of 941 patients (one radiograph per person), with a mean age of 76 ± 12 years and 1101 vertebral fractures between T7 and L5, were retrospectively evaluated for training (n = 565), validating (n = 188), and testing (n = 188) of an artificial intelligence deep-learning model. The gold standard for diagnosis (ground truth) of a vertebral fracture was the independent interpretation of the CT or MRI reports by a spine surgeon and a radiologist. If the human observers disagreed, they rechecked the corresponding CT or MRI images together to reach a consensus. For the Genant classification, the injured vertebral body height was measured in the anterior, middle, and posterior third. Fractures were classified by height loss as Grade 1 (< 25%), Grade 2 (26% to 40%), or Grade 3 (> 40%). The framework of the artificial intelligence deep-learning model included object detection, data preprocessing of radiographs, and classification to detect vertebral fractures. When applied clinically, the model needed approximately 90 seconds to complete the procedure and return results. The accuracy, sensitivity, specificity, interobserver reliability (kappa value), receiver operating characteristic curve, and area under the curve (AUC) were analyzed. The bootstrapping method was applied to our testing dataset and external validation dataset. The accuracy, sensitivity, and specificity were used to investigate whether fracture anatomic location or the T-score in the dual-energy x-ray absorptiometry report affected the performance of the artificial intelligence model. The receiver operating characteristic curve and AUC were used to investigate the relationship between the performance of the artificial intelligence model and fracture grade. External validation, using plain lateral radiographs of a population of similar age from another medical institute, was also performed to investigate the performance of the artificial intelligence model.

RESULTS: The artificial intelligence model with the ensemble method demonstrated excellent accuracy (93% [773 of 830] of vertebrae), sensitivity (91% [129 of 141]), and specificity (93% [644 of 689]) for detecting vertebral fractures of the lumbar spine. The interobserver reliability (kappa value) between the artificial intelligence model and human observers was 0.72 (95% CI 0.65 to 0.80; p < 0.001) for thoracic vertebrae and 0.77 (95% CI 0.72 to 0.83; p < 0.001) for lumbar vertebrae. The AUCs for Grades 1, 2, and 3 vertebral fractures were 0.919, 0.989, and 0.990, respectively. The artificial intelligence model with the ensemble method performed worse in discriminating normal osteoporotic lumbar vertebrae (specificity 91% [260 of 285]) than normal nonosteoporotic lumbar vertebrae (specificity 95% [222 of 234]). Sensitivity was higher for osteoporotic (dual-energy x-ray absorptiometry T-score ≤ -2.5) lumbar vertebral fractures (97% [60 of 62]) than for nonosteoporotic vertebral fractures (83% [39 of 47]), implying that osteoporotic fractures are easier to detect. On the external dataset, which used various radiographic techniques, the artificial intelligence model also detected lumbar vertebral fractures better than thoracic vertebral fractures. On the external validation dataset, the overall accuracy, sensitivity, and specificity with the bootstrapping method were 89%, 83%, and 95%, respectively.

CONCLUSION: The artificial intelligence model detected vertebral fractures on plain lateral radiographs with high accuracy, sensitivity, and specificity, especially for osteoporotic lumbar vertebral fractures (Genant Grades 2 and 3). The rapid reporting of results using this artificial intelligence model may improve the efficiency of diagnosing vertebral fractures. The testing model is available at http://140.113.114.104/vght_demo/corr/. One or multiple plain lateral radiographs of the spine in the Digital Imaging and Communications in Medicine format can be uploaded to see the performance of the artificial intelligence model.

LEVEL OF EVIDENCE: Level II, diagnostic study.
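The lumbar test-set percentages quoted in the abstract can be recomputed from the reported counts. The following is a minimal Python sketch, not the authors' code. It assumes the false-negative and false-positive counts are the complements of the reported fractions (141 - 129 = 12 and 689 - 644 = 45). The names `ConfusionCounts`, `from_reported`, and `bootstrap_ci` are hypothetical, and the percentile bootstrap shown is a generic resampling scheme that may differ from the one used in the study.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Per-vertebra confusion counts (hypothetical container)."""
    tp: int  # fractured vertebrae flagged as fractured
    fn: int  # fractured vertebrae missed
    tn: int  # normal vertebrae flagged as normal
    fp: int  # normal vertebrae flagged as fractured

    @classmethod
    def from_reported(cls, detected, fractures, correct_normals, normals):
        # Assumption: misses and false alarms are the complements
        # of the "x of y" fractions quoted in the abstract.
        return cls(tp=detected, fn=fractures - detected,
                   tn=correct_normals, fp=normals - correct_normals)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Counts quoted in RESULTS for lumbar vertebrae.
    lumbar = ConfusionCounts.from_reported(129, 141, 644, 689)
    print(f"accuracy    {lumbar.accuracy:.0%}")
    print(f"sensitivity {lumbar.sensitivity:.0%}")
    print(f"specificity {lumbar.specificity:.0%}")

    # Resample the 141 fractured vertebrae (1 = detected, 0 = missed).
    detected = [1] * lumbar.tp + [0] * lumbar.fn
    print("sensitivity 95% interval:", bootstrap_ci(detected))
```

The rounded values agree with the 93%, 91%, and 93% stated in the abstract. The bootstrap interval depends on the seed and on the resampling unit; resampling vertebrae rather than patients is a simplification made here for illustration.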

Subjects

Deep Learning/statistics & numerical data; Lumbar Vertebrae/injuries; Osteoporotic Fractures/diagnosis; Radiography/statistics & numerical data; Spinal Fractures/diagnosis; Thoracic Vertebrae/injuries; Absorptiometry, Photon/methods; Absorptiometry, Photon/statistics & numerical data; Aged; Aged, 80 and over; Female; Humans; Lumbar Vertebrae/diagnostic imaging; Male; Observer Variation; ROC Curve; Radiography/methods; Reproducibility of Results; Retrospective Studies; Sensitivity and Specificity; Thoracic Vertebrae/diagnostic imaging

Sufficient dimension reduction with additional information.

Hung, Hung; Liu, Chih-Yen; Horng-Shing Lu, Henry.

Biostatistics; 17(3): 405-21, 2016 Jul.

Article in English | MEDLINE | ID: mdl-26704765

ABSTRACT

Sufficient dimension reduction is widely applied to help model building between the response Y and covariate X. In some situations, we also collect an additional covariate W that predicts Y better than X does but is more costly to obtain. While constructing a predictive model for Y based on (X, W) is straightforward, this strategy is not applicable because W is not available for the future observations to which the constructed model is to be applied. As a result, the aim of the study is to build a predictive model for Y based on X only, where the available data are (Y, X, W). A naive method is to conduct the analysis using (Y, X) directly, but ignoring W can cause inefficiency. On the other hand, it is not trivial to utilize the information in W to infer (Y, X), either. In this article, we propose a two-stage dimension reduction method for (Y, X) that is able to utilize the information in W. In the breast cancer data, the risk score constructed from the two-stage method separates patients with different survival experiences well. In the Pima data, the two-stage method requires fewer components to infer the diabetes status, while achieving higher classification accuracy than the conventional method.
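For orientation, the naive analysis mentioned in the abstract, which uses only the response and the always-available covariate, can be sketched with sliced inverse regression (SIR), one standard sufficient dimension reduction estimator. This is not the two-stage method proposed in the article, and the abstract does not say which estimator the authors use. The function name `sliced_inverse_regression` and its parameters are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_components=1):
    """Estimate dimension-reduction directions with SIR.

    X: (n, p) covariate matrix; y: (n,) response.
    Returns a (p, n_components) matrix with unit-norm columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize X so that its sample covariance is the identity.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Slice the observations by the ordered response.
    order = np.argsort(y, kind="stable")
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        slice_mean = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)

    # Leading eigenvectors of M, mapped back to the original X scale.
    _, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_components]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


if __name__ == "__main__":
    # Synthetic illustration only; not the breast cancer or Pima data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
    y = X @ beta + 0.5 * rng.normal(size=2000)
    B = sliced_inverse_regression(X, y)
    print("absolute cosine with the true direction:", abs(B[:, 0] @ beta))
```

The estimated direction is identified only up to sign, which is why the usage example compares absolute cosines. A method that also exploits the costly covariate during training, as the article proposes, would replace this single-stage step.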

Subjects

Data Interpretation, Statistical; Models, Theoretical; Risk Assessment/methods; Arizona/ethnology; Breast Neoplasms/diagnosis; Breast Neoplasms/epidemiology; Diabetes Mellitus/diagnosis; Diabetes Mellitus/ethnology; Female; Humans; Indians, North American/ethnology
