Búsqueda | Portal Regional de la BVS

1.

Correction: Integrating Text and Image Analysis: Exploring GPT-4V's Capabilities in Advanced Radiological Applications Across Subspecialties.

Busch, Felix; Han, Tianyu; Makowski, Marcus R; Truhn, Daniel; Bressem, Keno K; Adams, Lisa.

J Med Internet Res ; 26: e64411, 2024 Jul 19.

Artículo en Inglés | MEDLINE | ID: mdl-39028990

RESUMEN

[This corrects the article DOI: 10.2196/54948.].

2.

Weakly Supervised Deep Learning in Radiology.

Misera, Leo; Müller-Franzes, Gustav; Truhn, Daniel; Kather, Jakob Nikolas.

Radiology ; 312(1): e232085, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-39041937

RESUMEN

Deep learning (DL) is currently the standard artificial intelligence tool for computer-based image analysis in radiology. Traditionally, DL models have been trained with strongly supervised learning methods. These methods depend on reference standard labels, typically applied manually by experts. In contrast, weakly supervised learning is more scalable. Weak supervision comprises situations in which only a portion of the data are labeled (incomplete supervision), labels refer to a whole region or case as opposed to a precisely delineated image region (inexact supervision), or labels contain errors (inaccurate supervision). In many applications, weak labels are sufficient to train useful models. Thus, weakly supervised learning can unlock a large amount of otherwise unusable data for training DL models. One example of this is using large language models to automatically extract weak labels from free-text radiology reports. Here, we outline the key concepts in weakly supervised learning and provide an overview of applications in radiologic image analysis. With more fundamental and clinical translational work, weakly supervised learning could facilitate the uptake of DL in radiology and research workflows by enabling large-scale image analysis and advancing the development of new DL-based biomarkers.

Asunto(s)

Aprendizaje Profundo , Radiología , Humanos , Radiología/educación , Aprendizaje Automático Supervisado , Interpretación de Imagen Asistida por Computador/métodos

3.

Intraindividual Comparison of Different Methods for Automated BPE Assessment at Breast MRI: A Call for Standardization.

Müller-Franzes, Gustav; Khader, Firas; Tayebi Arasteh, Soroosh; Huck, Luisa; Bode, Maike; Han, Tianyu; Lemainque, Teresa; Kather, Jakob Nikolas; Nebelung, Sven; Kuhl, Christiane; Truhn, Daniel.

Radiology ; 312(1): e232304, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-39012249

RESUMEN

Background The level of background parenchymal enhancement (BPE) at breast MRI provides predictive and prognostic information and can have diagnostic implications. However, there is a lack of standardization regarding BPE assessment. Purpose To investigate how well results of quantitative BPE assessment methods correlate among themselves and with assessments made by radiologists experienced in breast MRI. Materials and Methods In this pseudoprospective analysis of 5773 breast MRI examinations from 3207 patients (mean age, 60 years ± 10 [SD]), the level of BPE was prospectively categorized according to the Breast Imaging Reporting and Data System by radiologists experienced in breast MRI. For automated extraction of BPE, fibroglandular tissue (FGT) was segmented in an automated pipeline. Four different published methods for automated quantitative BPE extractions were used: two methods (A and B) based on enhancement intensity and two methods (C and D) based on the volume of enhanced FGT. The results from all methods were correlated, and agreement was investigated in comparison with the respective radiologist-based categorization. For surrogate validation of BPE assessment, how accurately the methods distinguished premenopausal women with (n = 50) versus without (n = 896) antihormonal treatment was determined. Results Intensity-based methods (A and B) exhibited a correlation with radiologist-based categorization of 0.56 ± 0.01 and 0.55 ± 0.01, respectively, and volume-based methods (C and D) had a correlation of 0.52 ± 0.01 and 0.50 ± 0.01 (P < .001). There were notable correlation differences (P < .001) between the BPE determined with the four methods. Among the four quantitation methods, method D offered the highest accuracy for distinguishing women with versus without antihormonal therapy (P = .01). Conclusion Results of different methods for quantitative BPE assessment agree only moderately among themselves or with visual categories reported by experienced radiologists; intensity-based methods correlate more closely with radiologists' ratings than volume-based methods. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Mann in this issue.

Asunto(s)

Neoplasias de la Mama , Mama , Imagen por Resonancia Magnética , Humanos , Femenino , Persona de Mediana Edad , Imagen por Resonancia Magnética/métodos , Neoplasias de la Mama/diagnóstico por imagen , Mama/diagnóstico por imagen , Interpretación de Imagen Asistida por Computador/métodos , Adulto , Estudios Prospectivos , Aumento de la Imagen/métodos , Anciano , Reproducibilidad de los Resultados , Estudios Retrospectivos

4.

Time-efficient combined morphologic and quantitative joint MRI: an in situ study of standardized knee cartilage defects in human cadaveric specimens.

Lemainque, Teresa; Pridöhl, Nicola; Zhang, Shuo; Huppertz, Marc; Post, Manuel; Yüksel, Can; Yoneyama, Masami; Prescher, Andreas; Kuhl, Christiane; Truhn, Daniel; Nebelung, Sven.

Eur Radiol Exp ; 8(1): 66, 2024 Jun 05.

Artículo en Inglés | MEDLINE | ID: mdl-38834751

RESUMEN

BACKGROUND: Quantitative techniques such as T2 and T1ρ mapping allow evaluating the cartilage and meniscus. We evaluated multi-interleaved X-prepared turbo-spin echo with intuitive relaxometry (MIXTURE) sequences with turbo spin-echo (TSE) contrast and additional parameter maps versus reference TSE sequences in an in situ model of human cartilage defects. METHODS: Standardized cartilage defects of 8, 5, and 3 mm in diameter were created in the lateral femora of ten human cadaveric knee specimens (81 ± 10 years old; nine males, one female). MIXTURE sequences providing proton density-weighted fat-saturated images and T2 maps or T1-weighted images and T1ρ maps as well as the corresponding two- and three-dimensional TSE reference sequences were acquired before and after defect creation (3-T scanner; knee coil). Defect delineability, bone texture, and cartilage relaxation times were quantified. Appropriate parametric or non-parametric tests were used. RESULTS: Overall, defect delineability and texture features were not significantly different between the MIXTURE and reference sequences (p ≤ 0.47). After defect creation, relaxation times significantly increased in the central femur (T2pre = 51 ± 4 ms [mean ± standard deviation] versus T2post = 56 ± 4 ms; p = 0.002) and all regions combined (T1ρpre = 40 ± 4 ms versus T1ρpost = 43 ± 4 ms; p = 0.004). CONCLUSIONS: MIXTURE permitted time-efficient simultaneous morphologic and quantitative joint assessment based on clinical image contrasts. While providing T2 or T1ρ maps in clinically feasible scan time, morphologic image features, i.e., cartilage defects and bone texture, were comparable between MIXTURE and reference sequences. RELEVANCE STATEMENT: Equally time-efficient and versatile, the MIXTURE sequence platform combines morphologic imaging using familiar contrasts, excellent image correspondence versus corresponding reference sequences and quantitative mapping information, thereby increasing the diagnostic value beyond mere morphology. KEY POINTS: â¢ Combined morphologic and quantitative MIXTURE sequences are based on three-dimensional TSE contrasts. â¢ MIXTURE sequences were studied in an in situ human cartilage defect model. â¢ Morphologic image features, i.e., defect delineabilty and bone texture, were investigated. â¢ Morphologic image features were similar between MIXTURE and reference sequences. â¢ MIXTURE allowed time-efficient simultaneous morphologic and quantitative knee joint assessment.

Asunto(s)

Cadáver , Cartílago Articular , Articulación de la Rodilla , Imagen por Resonancia Magnética , Humanos , Masculino , Imagen por Resonancia Magnética/métodos , Femenino , Cartílago Articular/diagnóstico por imagen , Articulación de la Rodilla/diagnóstico por imagen , Anciano de 80 o más Años , Anciano

5.

Insights into Predicting Tooth Extraction from Panoramic Dental Images: Artificial Intelligence vs. Dentists.

Motmaen, Ila; Xie, Kunpeng; Schönbrunn, Leon; Berens, Jeff; Grunert, Kim; Plum, Anna Maria; Raufeisen, Johannes; Ferreira, André; Hermans, Alexander; Egger, Jan; Hölzle, Frank; Truhn, Daniel; Puladi, Behrus.

Clin Oral Investig ; 28(7): 381, 2024 Jun 18.

Artículo en Inglés | MEDLINE | ID: mdl-38886242

RESUMEN

OBJECTIVES: Tooth extraction is one of the most frequently performed medical procedures. The indication is based on the combination of clinical and radiological examination and individual patient parameters and should be made with great care. However, determining whether a tooth should be extracted is not always a straightforward decision. Moreover, visual and cognitive pitfalls in the analysis of radiographs may lead to incorrect decisions. Artificial intelligence (AI) could be used as a decision support tool to provide a score of tooth extractability. MATERIAL AND METHODS: Using 26,956 single teeth images from 1,184 panoramic radiographs (PANs), we trained a ResNet50 network to classify teeth as either extraction-worthy or preservable. For this purpose, teeth were cropped with different margins from PANs and annotated. The usefulness of the AI-based classification as well that of dentists was evaluated on a test dataset. In addition, the explainability of the best AI model was visualized via a class activation mapping using CAMERAS. RESULTS: The ROC-AUC for the best AI model to discriminate teeth worthy of preservation was 0.901 with 2% margin on dental images. In contrast, the average ROC-AUC for dentists was only 0.797. With a 19.1% tooth extractions prevalence, the AI model's PR-AUC was 0.749, while the dentist evaluation only reached 0.589. CONCLUSION: AI models outperform dentists/specialists in predicting tooth extraction based solely on X-ray images, while the AI performance improves with increasing contextual information. CLINICAL RELEVANCE: AI could help monitor at-risk teeth and reduce errors in indications for extractions.

Asunto(s)

Inteligencia Artificial , Radiografía Panorámica , Extracción Dental , Humanos , Odontólogos , Femenino , Masculino , Adulto

6.

Integrating Text and Image Analysis: Exploring GPT-4V's Capabilities in Advanced Radiological Applications Across Subspecialties.

Busch, Felix; Han, Tianyu; Makowski, Marcus R; Truhn, Daniel; Bressem, Keno K; Adams, Lisa.

J Med Internet Res ; 26: e54948, 2024 May 01.

Artículo en Inglés | MEDLINE | ID: mdl-38691404

RESUMEN

This study demonstrates that GPT-4V outperforms GPT-4 across radiology subspecialties in analyzing 207 cases with 1312 images from the Radiological Society of North America Case Collection.

Asunto(s)

Radiología , Radiología/métodos , Radiología/estadística & datos numéricos , Humanos , Procesamiento de Imagen Asistido por Computador/métodos

7.

Two for One-Combined Morphologic and Quantitative Knee Joint MRI Using a Versatile Turbo Spin-Echo Platform.

Lemainque, Teresa; Pridöhl, Nicola; Huppertz, Marc; Post, Manuel; Yüksel, Can; Siepmann, Robert; Radke, Karl Ludger; Zhang, Shuo; Yoneyama, Masami; Prescher, Andreas; Kuhl, Christiane; Truhn, Daniel; Nebelung, Sven.

Diagnostics (Basel) ; 14(10)2024 May 08.

Artículo en Inglés | MEDLINE | ID: mdl-38786276

RESUMEN

Quantitative MRI techniques such as T2 and T1ρ mapping are beneficial in evaluating knee joint pathologies; however, long acquisition times limit their clinical adoption. MIXTURE (Multi-Interleaved X-prepared Turbo Spin-Echo with IntUitive RElaxometry) provides a versatile turbo spin-echo (TSE) platform for simultaneous morphologic and quantitative joint imaging. Two MIXTURE sequences were designed along clinical requirements: "MIX1", combining proton density (PD)-weighted fat-saturated (FS) images and T2 mapping (acquisition time: 4:59 min), and "MIX2", combining T1-weighted images and T1ρ mapping (6:38 min). MIXTURE sequences and their reference 2D and 3D TSE counterparts were acquired from ten human cadaveric knee joints at 3.0 T. Contrast, contrast-to-noise ratios, and coefficients of variation were comparatively evaluated using parametric tests. Clinical radiologists (n = 3) assessed diagnostic quality as a function of sequence and anatomic structure using five-point Likert scales and ordinal regression, with a significance level of α = 0.01. MIX1 and MIX2 had at least equal diagnostic quality compared to reference sequences of the same image weighting. Contrast, contrast-to-noise ratios, and coefficients of variation were largely similar for the PD-weighted FS and T1-weighted images. In clinically feasible scan times, MIXTURE sequences yield morphologic, TSE-based images of diagnostic quality and quantitative parameter maps with additional insights on soft tissue composition and ultrastructure.

8.

Preoperative three-dimensional lung volumetry predicts respiratory complications in patients undergoing major liver resection for colorectal metastases.

Elmaagacli, Suzan; Thiele, Christoph; Meister, Franziska; Menne, Philipp; Truhn, Daniel; Olde Damink, Steven W M; Bickenbach, Johannes; Neumann, Ulf; Lang, Sven Arke; Vondran, Florian; Amygdalos, Iakovos.

Sci Rep ; 14(1): 10594, 2024 05 08.

Artículo en Inglés | MEDLINE | ID: mdl-38719953

RESUMEN

Colorectal liver metastases (CRLM) are the predominant factor limiting survival in patients with colorectal cancer and liver resection with complete tumor removal is the best treatment option for these patients. This study examines the predictive ability of three-dimensional lung volumetry (3DLV) based on preoperative computerized tomography (CT), to predict postoperative pulmonary complications in patients undergoing major liver resection for CRLM. Patients undergoing major curative liver resection for CRLM between 2010 and 2021 with a preoperative CT scan of the thorax within 6 weeks of surgery, were included. Total lung volume (TLV) was calculated using volumetry software 3D-Slicer version 4.11.20210226 including Chest Imaging Platform extension ( http://www.slicer.org ). The area under the curve (AUC) of a receiver-operating characteristic analysis was used to define a cut-off value of TLV, for predicting the occurrence of postoperative respiratory complications. Differences between patients with TLV below and above the cut-off were examined with Chi-square or Fisher's exact test and Mann-Whitney U tests and logistic regression was used to determine independent risk factors for the development of respiratory complications. A total of 123 patients were included, of which 35 (29%) developed respiratory complications. A predictive ability of TLV regarding respiratory complications was shown (AUC 0.62, p = 0.036) and a cut-off value of 4500 cm3 was defined. Patients with TLV < 4500 cm3 were shown to suffer from significantly higher rates of respiratory complications (44% vs. 21%, p = 0.007) compared to the rest. Logistic regression analysis identified TLV < 4500 cm3 as an independent predictor for the occurrence of respiratory complications (odds ratio 3.777, 95% confidence intervals 1.488-9.588, p = 0.005). Preoperative 3DLV is a viable technique for prediction of postoperative pulmonary complications in patients undergoing major liver resection for CRLM. More studies in larger cohorts are necessary to further evaluate this technique.

Asunto(s)

Neoplasias Colorrectales , Hepatectomía , Neoplasias Hepáticas , Complicaciones Posoperatorias , Tomografía Computarizada por Rayos X , Humanos , Femenino , Masculino , Neoplasias Colorrectales/patología , Neoplasias Colorrectales/cirugía , Persona de Mediana Edad , Neoplasias Hepáticas/cirugía , Neoplasias Hepáticas/secundario , Anciano , Hepatectomía/efectos adversos , Hepatectomía/métodos , Complicaciones Posoperatorias/etiología , Pulmón/patología , Pulmón/diagnóstico por imagen , Pulmón/cirugía , Estudios Retrospectivos , Imagenología Tridimensional , Mediciones del Volumen Pulmonar , Factores de Riesgo , Periodo Preoperatorio

9.

Diffusion probabilistic versus generative adversarial models to reduce contrast agent dose in breast MRI.

Müller-Franzes, Gustav; Huck, Luisa; Bode, Maike; Nebelung, Sven; Kuhl, Christiane; Truhn, Daniel; Lemainque, Teresa.

Eur Radiol Exp ; 8(1): 53, 2024 May 01.

Artículo en Inglés | MEDLINE | ID: mdl-38689178

RESUMEN

BACKGROUND: To compare denoising diffusion probabilistic models (DDPM) and generative adversarial networks (GAN) for recovering contrast-enhanced breast magnetic resonance imaging (MRI) subtraction images from virtual low-dose subtraction images. METHODS: Retrospective, ethically approved study. DDPM- and GAN-reconstructed single-slice subtraction images of 50 breasts with enhancing lesions were compared to original ones at three dose levels (25%, 10%, 5%) using quantitative measures and radiologic evaluations. Two radiologists stated their preference based on the reconstruction quality and scored the lesion conspicuity as compared to the original, blinded to the model. Fifty lesion-free maximum intensity projections were evaluated for the presence of false-positives. Results were compared between models and dose levels, using generalized linear mixed models. RESULTS: At 5% dose, both radiologists preferred the GAN-generated images, whereas at 25% dose, both radiologists preferred the DDPM-generated images. Median lesion conspicuity scores did not differ between GAN and DDPM at 25% dose (5 versus 5, p = 1.000) and 10% dose (4 versus 4, p = 1.000). At 5% dose, both readers assigned higher conspicuity to the GAN than to the DDPM (3 versus 2, p = 0.007). In the lesion-free examinations, DDPM and GAN showed no differences in the false-positive rate at 5% (15% versus 22%), 10% (10% versus 6%), and 25% (6% versus 4%) (p = 1.000). CONCLUSIONS: Both GAN and DDPM yielded promising results in low-dose image reconstruction. However, neither of them showed superior results over the other model for all dose levels and evaluation metrics. Further development is needed to counteract false-positives. RELEVANCE STATEMENT: For MRI-based breast cancer screening, reducing the contrast agent dose is desirable. Diffusion probabilistic models and generative adversarial networks were capable of retrospectively enhancing the signal of low-dose images. Hence, they may supplement imaging with reduced doses in the future. KEY POINTS: â¢ Deep learning may help recover signal in low-dose contrast-enhanced breast MRI. â¢ Two models (DDPM and GAN) were trained at different dose levels. â¢ Radiologists preferred DDPM at 25%, and GAN images at 5% dose. â¢ Lesion conspicuity between DDPM and GAN was similar, except at 5% dose. â¢ GAN and DDPM yield promising results in low-dose image reconstruction.

Asunto(s)

Neoplasias de la Mama , Medios de Contraste , Imagen por Resonancia Magnética , Humanos , Femenino , Estudios Retrospectivos , Medios de Contraste/administración & dosificación , Neoplasias de la Mama/diagnóstico por imagen , Imagen por Resonancia Magnética/métodos , Persona de Mediana Edad , Modelos Estadísticos , Adulto , Anciano

10.

Using histopathology latent diffusion models as privacy-preserving dataset augmenters improves downstream classification performance.

Niehues, Jan M; Müller-Franzes, Gustav; Schirris, Yoni; Wagner, Sophia Janine; Jendrusch, Michael; Kloor, Matthias; Pearson, Alexander T; Muti, Hannah Sophie; Hewitt, Katherine J; Veldhuizen, Gregory P; Zigutyte, Laura; Truhn, Daniel; Kather, Jakob Nikolas.

Comput Biol Med ; 175: 108410, 2024 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-38678938

RESUMEN

Latent diffusion models (LDMs) have emerged as a state-of-the-art image generation method, outperforming previous Generative Adversarial Networks (GANs) in terms of training stability and image quality. In computational pathology, generative models are valuable for data sharing and data augmentation. However, the impact of LDM-generated images on histopathology tasks compared to traditional GANs has not been systematically studied. We trained three LDMs and a styleGAN2 model on histology tiles from nine colorectal cancer (CRC) tissue classes. The LDMs include 1) a fine-tuned version of stable diffusion v1.4, 2) a Kullback-Leibler (KL)-autoencoder (KLF8-DM), and 3) a vector quantized (VQ)-autoencoder deploying LDM (VQF8-DM). We assessed image quality through expert ratings, dimensional reduction methods, distribution similarity measures, and their impact on training a multiclass tissue classifier. Additionally, we investigated image memorization in the KLF8-DM and styleGAN2 models. All models provided a high image quality, with the KLF8-DM achieving the best Frechet Inception Distance (FID) and expert rating scores for complex tissue classes. For simpler classes, the VQF8-DM and styleGAN2 models performed better. Image memorization was negligible for both styleGAN2 and KLF8-DM models. Classifiers trained on a mix of KLF8-DM generated and real images achieved a 4% improvement in overall classification accuracy, highlighting the usefulness of these images for dataset augmentation. Our systematic study of generative methods showed that KLF8-DM produces the highest quality images with negligible image memorization. The higher classifier performance in the generatively augmented dataset suggests that this augmentation technique can be employed to enhance histopathology classifiers for various tasks.

Asunto(s)

Neoplasias Colorrectales , Humanos , Neoplasias Colorrectales/patología , Neoplasias Colorrectales/diagnóstico por imagen , Interpretación de Imagen Asistida por Computador/métodos , Procesamiento de Imagen Asistido por Computador/métodos , Algoritmos

11.

Artificial intelligence in liver cancer - new tools for research and patient management.

Calderaro, Julien; Zigutyte, Laura; Truhn, Daniel; Jaffe, Ariel; Kather, Jakob Nikolas.

Nat Rev Gastroenterol Hepatol ; 2024 Apr 16.

Artículo en Inglés | MEDLINE | ID: mdl-38627537

RESUMEN

Liver cancer has high incidence and mortality globally. Artificial intelligence (AI) has advanced rapidly, influencing cancer care. AI systems are already approved for clinical use in some tumour types (for example, colorectal cancer screening). Crucially, research demonstrates that AI can analyse histopathology, radiology and natural language in liver cancer, and can replace manual tasks and access hidden information in routinely available clinical data. However, for liver cancer, few of these applications have translated into large-scale clinical trials or clinically approved products. Here, we advocate for the incorporation of AI in all stages of liver cancer management. We present a taxonomy of AI approaches in liver cancer, highlighting areas with academic and commercial potential, and outline a policy for AI-based liver cancer management, including interdisciplinary training of researchers, clinicians and patients. The potential of AI in liver cancer is immense, but effort is required to ensure that AI can fulfil expectations.

12.

The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation.

Siepmann, Robert; Huppertz, Marc; Rastkhiz, Annika; Reen, Matthias; Corban, Eric; Schmidt, Christian; Wilke, Stephan; Schad, Philipp; Yüksel, Can; Kuhl, Christiane; Truhn, Daniel; Nebelung, Sven.

Eur Radiol ; 2024 Apr 16.

Artículo en Inglés | MEDLINE | ID: mdl-38627289

RESUMEN

OBJECTIVES: Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on the radiologists' diagnostic workflow. MATERIALS AND METHODS: In this retrospective study, six radiologists of different experience levels read 40 selected radiographic [n = 10], CT [n = 10], MRI [n = 10], and angiographic [n = 10] studies unassisted (session one) and assisted by GPT-4 (session two). Each imaging study was presented with demographic data, the chief complaint, and associated symptoms, and diagnoses were registered using an online survey tool. The impact of Artificial Intelligence (AI) on diagnostic accuracy, confidence, user experience, input prompts, and generated responses was assessed. False information was registered. Linear mixed-effect models were used to quantify the factors (fixed: experience, modality, AI assistance; random: radiologist) influencing diagnostic accuracy and confidence. RESULTS: When assessing if the correct diagnosis was among the top-3 differential diagnoses, diagnostic accuracy improved slightly from 181/240 (75.4%, unassisted) to 188/240 (78.3%, AI-assisted). Similar improvements were found when only the top differential diagnosis was considered. AI assistance was used in 77.5% of the readings. Three hundred nine prompts were generated, primarily involving differential diagnoses (59.1%) and imaging features of specific conditions (27.5%). Diagnostic confidence was significantly higher when readings were AI-assisted (p > 0.001). Twenty-three responses (7.4%) were classified as hallucinations, while two (0.6%) were misinterpretations. CONCLUSION: Integrating GPT-4 in the diagnostic process improved diagnostic accuracy slightly and diagnostic confidence significantly. Potentially harmful hallucinations and misinterpretations call for caution and highlight the need for further safeguarding measures. CLINICAL RELEVANCE STATEMENT: Using GPT-4 as a virtual assistant when reading images made six radiologists of different experience levels feel more confident and provide more accurate diagnoses; yet, GPT-4 gave factually incorrect and potentially harmful information in 7.4% of its responses.

13.

Reporting guidelines in medical artificial intelligence: a systematic review and meta-analysis.

Kolbinger, Fiona R; Veldhuizen, Gregory P; Zhu, Jiefu; Truhn, Daniel; Kather, Jakob Nikolas.

Commun Med (Lond) ; 4(1): 71, 2024 Apr 11.

Artículo en Inglés | MEDLINE | ID: mdl-38605106

RESUMEN

BACKGROUND: The field of Artificial Intelligence (AI) holds transformative potential in medicine. However, the lack of universal reporting guidelines poses challenges in ensuring the validity and reproducibility of published research studies in this field. METHODS: Based on a systematic review of academic publications and reporting standards demanded by both international consortia and regulatory stakeholders as well as leading journals in the fields of medicine and medical informatics, 26 reporting guidelines published between 2009 and 2023 were included in this analysis. Guidelines were stratified by breadth (general or specific to medical fields), underlying consensus quality, and target research phase (preclinical, translational, clinical) and subsequently analyzed regarding the overlap and variations in guideline items. RESULTS: AI reporting guidelines for medical research vary with respect to the quality of the underlying consensus process, breadth, and target research phase. Some guideline items such as reporting of study design and model performance recur across guidelines, whereas other items are specific to particular fields and research stages. CONCLUSIONS: Our analysis highlights the importance of reporting guidelines in clinical AI research and underscores the need for common standards that address the identified variations and gaps in current guidelines. Overall, this comprehensive overview could help researchers and public stakeholders reinforce quality standards for increased reliability, reproducibility, clinical validity, and public trust in AI research in healthcare. This could facilitate the safe, effective, and ethical translation of AI methods into clinical applications that will ultimately improve patient outcomes.

Artificial Intelligence (AI) refers to computer systems that can perform tasks that normally require human intelligence, like recognizing patterns or making decisions. AI has the potential to transform healthcare, but research on AI in medicine needs clear rules so caregivers and patients can trust it. This study reviews and compares 26 existing guidelines for reporting on AI in medicine. The key differences between these guidelines are their target areas (medicine in general or specific medical fields), the ways they were created, and the research stages they address. While some key items like describing the AI model recurred across guidelines, others were specific to the research area. The analysis shows gaps and variations in current guidelines. Overall, transparent reporting is important, so AI research is reliable, reproducible, trustworthy, and safe for patients. This systematic review of guidelines aims to increase the transparency of AI research, supporting an ethical and safe progression of AI from research into clinical practice.

14.

Comparative Analysis of Multimodal Large Language Model Performance on Clinical Vignette Questions.

Han, Tianyu; Adams, Lisa C; Bressem, Keno K; Busch, Felix; Nebelung, Sven; Truhn, Daniel.

JAMA ; 331(15): 1320-1321, 2024 04 16.

Artículo en Inglés | MEDLINE | ID: mdl-38497956

RESUMEN

This study compares 2 large language models and their performance vs that of competing open-source models.

Asunto(s)

Inteligencia Artificial , Diagnóstico por Imagen , Anamnesis , Lenguaje

15.

Author Correction: A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports.

Truhn, Daniel; Weber, Christian D; Braun, Benedikt J; Bressem, Keno; Kather, Jakob N; Kuhl, Christiane; Nebelung, Sven.

Sci Rep ; 14(1): 5431, 2024 Mar 05.

Artículo en Inglés | MEDLINE | ID: mdl-38443449

16.

Evaluation of Pulmonary Nodules by Radiologists vs. Radiomics in Stand-Alone and Complementary CT and MRI.

Tietz, Eric; Müller-Franzes, Gustav; Zimmermann, Markus; Kuhl, Christiane Katharina; Keil, Sebastian; Nebelung, Sven; Truhn, Daniel.

Diagnostics (Basel) ; 14(5)2024 Feb 23.

Artículo en Inglés | MEDLINE | ID: mdl-38472955

RESUMEN

Increased attention has been given to MRI in radiation-free screening for malignant nodules in recent years. Our objective was to compare the performance of human readers and radiomic feature analysis based on stand-alone and complementary CT and MRI imaging in classifying pulmonary nodules. This single-center study comprises patients with CT findings of pulmonary nodules who underwent additional lung MRI and whose nodules were classified as benign/malignant by resection. For radiomic features analysis, 2D segmentation was performed for each lung nodule on axial CT, T2-weighted (T2w), and diffusion (DWI) images. The 105 extracted features were reduced by iterative backward selection. The performance of radiomics and human readers was compared by calculating accuracy with Clopper-Pearson confidence intervals. Fifty patients (mean age 63 +/- 10 years) with 66 pulmonary nodules (40 malignant) were evaluated. ACC values for radiomic features analysis vs. radiologists based on CT alone (0.68; 95%CI: 0.56, 0.79 vs. 0.59; 95%CI: 0.46, 0.71), T2w alone (0.65; 95%CI: 0.52, 0.77 vs. 0.68; 95%CI: 0.54, 0.78), DWI alone (0.61; 95%CI:0.48, 0.72 vs. 0.73; 95%CI: 0.60, 0.83), combined T2w/DWI (0.73; 95%CI: 0.60, 0.83 vs. 0.70; 95%CI: 0.57, 0.80), and combined CT/T2w/DWI (0.83; 95%CI: 0.72, 0.91 vs. 0.64; 95%CI: 0.51, 0.75) were calculated. This study is the first to show that by combining quantitative image information from CT, T2w, and DWI datasets, pulmonary nodule assessment through radiomics analysis is superior to using one modality alone, even exceeding human readers' performance.

17.

Large language models and multimodal foundation models for precision oncology.

Truhn, Daniel; Eckardt, Jan-Niklas; Ferber, Dyke; Kather, Jakob Nikolas.

NPJ Precis Oncol ; 8(1): 72, 2024 Mar 22.

Artículo en Inglés | MEDLINE | ID: mdl-38519519

RESUMEN

The technological progress in artificial intelligence (AI) has massively accelerated since 2022, with far-reaching implications for oncology and cancer research. Large language models (LLMs) now perform at human-level competency in text processing. Notably, both text and image processing networks are increasingly based on transformer neural networks. This convergence enables the development of multimodal AI models that take diverse types of data as an input simultaneously, marking a qualitative shift from specialized niche models which were prevalent in the 2010s. This editorial summarizes these developments, which are expected to impact precision oncology in the coming years.

18.

Preserving fairness and diagnostic accuracy in private large-scale AI models for medical imaging.

Tayebi Arasteh, Soroosh; Ziller, Alexander; Kuhl, Christiane; Makowski, Marcus; Nebelung, Sven; Braren, Rickmer; Rueckert, Daniel; Truhn, Daniel; Kaissis, Georgios.

Commun Med (Lond) ; 4(1): 46, 2024 Mar 14.

Artículo en Inglés | MEDLINE | ID: mdl-38486100

RESUMEN

BACKGROUND: Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. METHODS: We used two datasets: (1) A large dataset (N = 193,311) of high quality clinical chest radiographs, and (2) a dataset (N = 1625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. RESULTS: We find that, while the privacy-preserving training yields lower accuracy, it largely does not amplify discrimination against age, sex or co-morbidity. However, we find an indication that difficult diagnoses and subgroups suffer stronger performance hits in private training. CONCLUSIONS: Our study shows that - under the challenging realistic circumstances of a real-life clinical dataset - the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.

Artificial intelligence (AI), in which computers can learn to do tasks that normally require human intelligence, is particularly useful in medical imaging. However, AI should be used in a way that preserves patient privacy. We explored the balance between maintaining patient data privacy and AI performance in medical imaging. We use an approach called differential privacy to protect the privacy of patients' images. We show that, although training AI with differential privacy leads to a slight decrease in accuracy, it does not substantially increase bias against different age groups, genders, or patients with multiple health conditions. However, we notice that AI faces more challenges in accurately diagnosing complex cases and specific subgroups when trained under these privacy constraints. These findings highlight the importance of designing AI systems that are both privacy-conscious and capable of reliable diagnoses across patient groups.

19.

Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images.

Tayebi Arasteh, Soroosh; Misera, Leo; Kather, Jakob Nikolas; Truhn, Daniel; Nebelung, Sven.

Eur Radiol Exp ; 8(1): 10, 2024 Feb 08.

Artículo en Inglés | MEDLINE | ID: mdl-38326501

RESUMEN

BACKGROUND: Pretraining labeled datasets, like ImageNet, have become a technical standard in advanced medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the intensive labeling process. In this study, we explored if SSL for pretraining on non-medical images can be applied to chest radiographs and how it compares to supervised pretraining on non-medical images and on medical images. METHODS: We utilized a vision transformer and initialized its weights based on the following: (i) SSL pretraining on non-medical images (DINOv2), (ii) supervised learning (SL) pretraining on non-medical images (ImageNet dataset), and (iii) SL pretraining on chest radiographs from the MIMIC-CXR database, the largest labeled public dataset of chest radiographs to date. We tested our approach on over 800,000 chest radiographs from 6 large global datasets, diagnosing more than 20 different imaging findings. Performance was quantified using the area under the receiver operating characteristic curve and evaluated for statistical significance using bootstrapping. RESULTS: SSL pretraining on non-medical images not only outperformed ImageNet-based pretraining (p < 0.001 for all datasets) but, in certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest that selecting the right pretraining strategy, especially with SSL, can be pivotal for improving diagnostic accuracy of artificial intelligence in medical imaging. CONCLUSIONS: By demonstrating the promise of SSL in chest radiograph analysis, we underline a transformative shift towards more efficient and accurate AI models in medical imaging. RELEVANCE STATEMENT: Self-supervised learning highlights a paradigm shift towards the enhancement of AI-driven accuracy and efficiency in medical imaging. Given its promise, the broader application of self-supervised learning in medical imaging calls for deeper exploration, particularly in contexts where comprehensive annotated datasets are limited.

Asunto(s)

Inteligencia Artificial , Aprendizaje Profundo , Bases de Datos Factuales

20.

Large language models streamline automated machine learning for clinical studies.

Tayebi Arasteh, Soroosh; Han, Tianyu; Lotfinia, Mahshad; Kuhl, Christiane; Kather, Jakob Nikolas; Truhn, Daniel; Nebelung, Sven.

Nat Commun ; 15(1): 1603, 2024 Feb 21.

Artículo en Inglés | MEDLINE | ID: mdl-38383555

RESUMEN

A knowledge gap persists between machine learning (ML) developers (e.g., data scientists) and practitioners (e.g., clinicians), hampering the full utilization of ML for clinical data analysis. We investigated the potential of the ChatGPT Advanced Data Analysis (ADA), an extension of GPT-4, to bridge this gap and perform ML analyses efficiently. Real-world clinical datasets and study details from large trials across various medical specialties were presented to ChatGPT ADA without specific guidance. ChatGPT ADA autonomously developed state-of-the-art ML models based on the original study's training data to predict clinical outcomes such as cancer development, cancer progression, disease complications, or biomarkers such as pathogenic gene sequences. Following the re-implementation and optimization of the published models, the head-to-head comparison of the ChatGPT ADA-crafted ML models and their respective manually crafted counterparts revealed no significant differences in traditional performance metrics (p ≥ 0.072). Strikingly, the ChatGPT ADA-crafted ML models often outperformed their counterparts. In conclusion, ChatGPT ADA offers a promising avenue to democratize ML in medicine by simplifying complex data analyses, yet should enhance, not replace, specialized training and resources, to promote broader applications in medical research and practice.

Asunto(s)

Algoritmos , Neoplasias , Humanos , Benchmarking , Lenguaje , Aprendizaje Automático

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA