1.
Radiology ; 310(3): e232255, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38470237

ABSTRACT

Background: Large language models (LLMs) hold substantial promise for medical imaging interpretation. However, there is a lack of studies on their feasibility in handling reasoning questions associated with medical diagnosis.

Purpose: To investigate the viability of leveraging three publicly available LLMs to enhance consistency and diagnostic accuracy in medical imaging based on standardized reporting, with pathology as the reference standard.

Materials and Methods: US images of thyroid nodules with pathologic results were retrospectively collected from a tertiary referral hospital between July 2022 and December 2022 and used to evaluate malignancy diagnoses generated by three LLMs: OpenAI's ChatGPT 3.5, ChatGPT 4.0, and Google's Bard. Inter- and intra-LLM agreement of diagnosis were evaluated. Then, diagnostic performance, including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), was evaluated and compared for the LLMs and three interactive approaches: human reader combined with LLMs, image-to-text model combined with LLMs, and an end-to-end convolutional neural network model.

Results: A total of 1161 US images of thyroid nodules (498 benign, 663 malignant) from 725 patients (mean age, 42.2 years ± 14.1 [SD]; 516 women) were evaluated. ChatGPT 4.0 and Bard displayed substantial to almost perfect intra-LLM agreement (κ range, 0.65-0.86 [95% CI: 0.64, 0.86]), while ChatGPT 3.5 showed fair to substantial agreement (κ range, 0.36-0.68 [95% CI: 0.36, 0.68]). ChatGPT 4.0 had an accuracy of 78%-86% (95% CI: 76%, 88%) and sensitivity of 86%-95% (95% CI: 83%, 96%), compared with 74%-86% (95% CI: 71%, 88%) and 74%-91% (95% CI: 71%, 93%), respectively, for Bard. Moreover, with ChatGPT 4.0, the image-to-text-LLM strategy exhibited an AUC (0.83 [95% CI: 0.80, 0.85]) and accuracy (84% [95% CI: 82%, 86%]) comparable to those of the human-LLM interaction strategy with two senior readers and one junior reader and exceeding those of the human-LLM interaction strategy with one junior reader.

Conclusion: LLMs, particularly integrated with image-to-text approaches, show potential in enhancing diagnostic medical imaging. ChatGPT 4.0 was optimal for consistency and diagnostic accuracy when compared with Bard and ChatGPT 3.5.

© RSNA, 2024. Supplemental material is available for this article.
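
For readers less familiar with the measures reported above, the following is a minimal sketch of how accuracy, sensitivity, specificity, and a two-run agreement statistic can be computed from binary labels. It is not the study's code. The abstract does not say which κ variant was used, so Cohen's κ is assumed here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity for binary labels (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def cohen_kappa(run_a, run_b):
    """Cohen's kappa between two label sequences (assumed agreement statistic)."""
    n = len(run_a)
    observed = sum(1 for a, b in zip(run_a, run_b) if a == b) / n
    count_a, count_b = Counter(run_a), Counter(run_b)
    # Chance agreement from the marginal label frequencies of each run.
    expected = sum(count_a[k] * count_b[k] for k in set(run_a) | set(run_b)) / (n * n)
    if expected == 1:
        return 1.0  # both runs used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data only; these labels are not from the study.
pathology = [1, 1, 1, 0, 0, 0, 1, 0]
llm_output = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = diagnostic_metrics(pathology, llm_output)
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Under the commonly used Landis and Koch bands, κ of 0.21-0.40 reads as fair, 0.61-0.80 as substantial, and 0.81-1.00 as almost perfect agreement, which matches the wording of the Results.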


Subjects
Thyroid Nodule; Humans; Female; Adult; Thyroid Nodule/diagnostic imaging; Retrospective Studies; Language; Neural Networks, Computer; ROC Curve
2.
Radiology ; 311(1): e231461, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38652028

ABSTRACT

Background: Noninvasive tests can be used to screen patients with chronic liver disease for advanced liver fibrosis; however, the use of single tests may not be adequate.

Purpose: To construct sequential clinical algorithms that include a US deep learning (DL) model and compare their ability to predict advanced liver fibrosis with that of other noninvasive tests.

Materials and Methods: This retrospective study included adult patients with a history of chronic liver disease or unexplained abnormal liver function test results who underwent B-mode US of the liver between January 2014 and September 2022 at three health care facilities. A US-based DL network (FIB-Net) was trained on US images to predict whether the shear-wave elastography (SWE) value was 8.7 kPa or higher, indicative of advanced fibrosis. In the internal and external test sets, a two-step algorithm (Two-step#1) using the Fibrosis-4 Index (FIB-4) followed by FIB-Net and a three-step algorithm (Three-step#1) using FIB-4 followed by FIB-Net and SWE were used to simulate screening scenarios in which liver stiffness measurements were unavailable or available, respectively. Measures of diagnostic accuracy were calculated using liver biopsy as the reference standard and compared between FIB-4, SWE, FIB-Net, and European Association for the Study of the Liver guidelines (ie, FIB-4 followed by SWE), along with sequential algorithms.

Results: The training, validation, and test data sets included 3067 (median age, 42 years [IQR, 33-53 years]; 2083 male), 1599 (median age, 41 years [IQR, 33-51 years]; 1124 male), and 1228 (median age, 44 years [IQR, 33-55 years]; 741 male) patients, respectively. FIB-Net obtained a noninferior specificity with a margin of 5% (P < .001) compared with SWE (80% vs 82%). The Two-step#1 algorithm showed higher specificity and positive predictive value (PPV) than FIB-4 (specificity, 79% vs 57%; PPV, 44% vs 32%) while reducing unnecessary referrals by 42%. The Three-step#1 algorithm had higher specificity and PPV compared with European Association for the Study of the Liver guidelines (specificity, 94% vs 88%; PPV, 73% vs 64%) while reducing unnecessary referrals by 35%.

Conclusion: A sequential algorithm combining FIB-4 and a US DL model showed higher diagnostic accuracy and improved referral management for all-cause advanced liver fibrosis compared with FIB-4 or the DL model alone.

© RSNA, 2024. Supplemental material is available for this article. See also the editorial by Ghosh in this issue.
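
The idea of a two-step sequence (FIB-4 first, then a DL model) can be illustrated with a rough sketch. The FIB-4 formula below is the standard published one; everything else is an assumption made for illustration. The abstract does not state the FIB-4 cutoff, the FIB-Net decision threshold, or the exact rule for passing patients between steps, so `fib4_rule_out`, `dl_threshold`, and `two_step_referral` are hypothetical, and `dl_probability` stands in for a model output that is not available here.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """Fibrosis-4 Index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def two_step_referral(fib4_score, dl_probability,
                      fib4_rule_out=1.3, dl_threshold=0.5):
    """Return True if the patient would be referred under this sketch.

    Both cutoffs are illustrative defaults, not values from the study.
    """
    # Step 1: a low FIB-4 rules the patient out without further testing.
    if fib4_score < fib4_rule_out:
        return False
    # Step 2: remaining patients are referred only if the DL model is positive.
    return dl_probability >= dl_threshold


# Toy values only; not patient data from the study.
score = fib4(age_years=50, ast_u_l=40, alt_u_l=25, platelets_10e9_l=200)
refer = two_step_referral(score, dl_probability=0.7)
```

Because the second test is applied only to patients who are not ruled out by the first, a sequence like this can only lower the number of referrals relative to the first test alone, which is consistent with the higher specificity and fewer unnecessary referrals reported for Two-step#1 versus FIB-4.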


Subjects
Algorithms; Elasticity Imaging Techniques; Liver Cirrhosis; Humans; Male; Liver Cirrhosis/diagnostic imaging; Middle Aged; Female; Retrospective Studies; Elasticity Imaging Techniques/methods; Adult; Deep Learning; Liver/diagnostic imaging; Liver/pathology; Aged; Ultrasonography/methods