ABSTRACT
PURPOSE: To investigate whether GPT-4 improves the accuracy, consistency, and trustworthiness of a context-aware chatbot that provides personalized imaging recommendations from American College of Radiology (ACR) appropriateness criteria documents using semantic similarity processing. In addition, we sought to enable auditability of the output by revealing the information source the decision relies on. MATERIAL AND METHODS: We refined an existing chatbot that incorporated specialized knowledge of the ACR guidelines by upgrading GPT-3.5-Turbo to its successor GPT-4 by OpenAI, using the latest version of LlamaIndex, and improving the prompting strategy. This chatbot was compared to the previous version, generic GPT-3.5-Turbo and GPT-4, and general radiologists regarding the performance in applying the ACR appropriateness guidelines. RESULTS: The refined context-aware chatbot outperformed the previous version using GPT-3.5-Turbo, the generic chatbots GPT-3.5-Turbo and GPT-4, and general radiologists in providing "usually or may be appropriate" recommendations according to the ACR guidelines (all p < 0.001). It also outperformed GPT-3.5-Turbo and general radiologists with respect to "usually appropriate" recommendations (both p < 0.001). Moreover, the consistency in correct answers was higher, with 78% consistent correct "usually appropriate" answers and 94% for "usually or may be appropriate" recommendations. In all cases, the same source documents were chosen, ensuring transparency. CONCLUSION: Our study demonstrates the significance of context awareness in ensuring the use of appropriate knowledge and proposes a strategy to enhance trust in chatbot-based outputs by providing transparency. The improvements in accuracy, consistency, and source transparency address trust issues and enhance the clinical decision support process.
ABBREVIATIONS: ACR, American College of Radiology; accGPT, appropriateness criteria context aware GPT; accGPT-4, appropriateness criteria context aware GPT using GPT-4; GPT, generative pre-trained transformer; LLM, Large Language Model.
ABSTRACT
BACKGROUND: We investigated the potential of an imaging-aware GPT-4-based chatbot in providing diagnoses based on imaging descriptions of abdominal pathologies. METHODS: Utilizing zero-shot learning via the LlamaIndex framework, GPT-4 was enhanced using the 96 documents from the Radiographics Top 10 Reading List on gastrointestinal imaging, creating a gastrointestinal imaging-aware chatbot (GIA-CB). To assess its diagnostic capability, 50 cases on a variety of abdominal pathologies were created, comprising radiological findings in fluoroscopy, MRI, and CT. We compared the GIA-CB to the generic GPT-4 chatbot (g-CB) in providing the primary and 2 additional differential diagnoses, using interpretations from senior-level radiologists as ground truth. The trustworthiness of the GIA-CB was evaluated by investigating the source documents as provided by the knowledge-retrieval mechanism. Mann-Whitney U test was employed. RESULTS: The GIA-CB demonstrated a high capability to identify the most appropriate differential diagnosis in 39/50 cases (78%), significantly surpassing the g-CB in 27/50 cases (54%) (p = 0.006). Notably, the GIA-CB offered the primary differential in the top 3 differential diagnoses in 45/50 cases (90%) versus g-CB with 37/50 cases (74%) (p = 0.022) and always with appropriate explanations. The median response time was 29.8 s for GIA-CB and 15.7 s for g-CB, and the mean cost per case was $0.15 and $0.02, respectively. CONCLUSIONS: The GIA-CB not only provided an accurate diagnosis for gastrointestinal pathologies, but also direct access to source documents, providing insight into the decision-making process, a step towards trustworthy and explainable AI. Integrating context-specific data into AI models can support evidence-based clinical decision-making. RELEVANCE STATEMENT: A context-aware GPT-4 chatbot demonstrates high accuracy in providing differential diagnoses based on imaging descriptions, surpassing the generic GPT-4. 
It provided formulated rationale and source excerpts supporting the diagnoses, thus enhancing trustworthy decision-support. KEY POINTS: • Knowledge retrieval enhances differential diagnoses in a gastrointestinal imaging-aware chatbot (GIA-CB). • GIA-CB outperformed the generic counterpart, providing formulated rationale and source excerpts. • GIA-CB has the potential to pave the way for AI-assisted decision support systems.
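The knowledge-retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the LlamaIndex API and not the authors' pipeline: the word-count "embedding", the function names, and the three-entry knowledge base are assumptions chosen only to show how a query is matched to source documents by cosine similarity and how the matching source identifiers can be returned for auditing.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a learned embedding: lowercase word counts (assumption).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k (source_id, score) pairs most similar to the query."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical mini knowledge base (not the Radiographics documents).
docs = {
    "doc_appendicitis": "dilated appendix with wall thickening and periappendiceal fat stranding",
    "doc_diverticulitis": "sigmoid colon wall thickening with inflamed diverticula and fat stranding",
    "doc_cholecystitis": "gallbladder wall thickening with pericholecystic fluid and gallstones",
}
hits = retrieve("ct shows dilated appendix and periappendiceal stranding", docs)
```

In a retrieval-augmented setup the top-ranked excerpts would be passed to the language model together with the question, and the returned source identifiers are what make the answer traceable.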
Subject(s)
Artificial Intelligence, Gastrointestinal Diseases, Proof of Concept Study, Humans, Differential Diagnosis, Gastrointestinal Diseases/diagnostic imaging
ABSTRACT
Osteoarthritis of the knee, a widespread cause of knee disability, is commonly treated in orthopedics due to its rising prevalence. Lower extremity misalignment, pivotal in knee injury etiology and management, necessitates comprehensive mechanical alignment evaluation via frequently requested weight-bearing long leg radiographs (LLR). Despite LLR's routine use, current analysis techniques are error-prone and time-consuming. To address this, we conducted a multicentric study to develop and validate a deep learning (DL) model for fully automated leg alignment assessment on anterior-posterior LLR, targeting enhanced reliability and efficiency. The DL model, developed using 594 patients' LLR and a 60%/10%/30% data split for training, validation, and testing, executed alignment analyses via a multi-step process, employing a detection network and nine specialized networks. It was designed to assess all vital anatomical and mechanical parameters for standard clinical leg deformity analysis and preoperative planning. Accuracy, reliability, and assessment duration were compared with three specialized orthopedic surgeons (OS) across two distinct institutional datasets (136 and 143 radiographs). The algorithm exhibited equivalent performance to the surgeons in terms of alignment accuracy (DL: 0.21 ± 0.18° to 1.06 ± 1.3° vs. OS: 0.21 ± 0.16° to 1.72 ± 1.96°), interrater reliability (ICC DL: 0.90 ± 0.05 to 1.0 ± 0.0 vs. ICC OS: 0.90 ± 0.03 to 1.0 ± 0.0), and clinically acceptable accuracy (DL: 53.9%-100% vs. OS: 30.8%-100%). Further, automated analysis significantly reduced analysis time compared to manual annotation (DL: 22 ± 0.6 s vs. OS: 101.7 ± 7 s, p ≤ 0.01). By demonstrating that our algorithm not only matches the precision of expert surgeons but also significantly outpaces them in both speed and consistency of measurements, our research underscores a pivotal advancement in harnessing AI to enhance clinical efficiency and decision-making in orthopaedics.
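As an illustration of what a mechanical alignment parameter looks like once landmarks are available, the sketch below computes the deviation of the hip-knee-ankle axis from a straight line. The abstract does not list the individual parameters, so the choice of this angle, the landmark coordinates, and the function names are assumptions rather than the authors' method.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees between 2D vectors u and v."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))


def hka_deviation(hip, knee, ankle):
    """Deviation (degrees) of the femoral and tibial mechanical axes from collinearity.

    hip, knee, ankle are (x, y) joint-center coordinates in the image plane.
    """
    femur_axis = (knee[0] - hip[0], knee[1] - hip[1])
    tibia_axis = (ankle[0] - knee[0], ankle[1] - knee[1])
    return angle_between(femur_axis, tibia_axis)


# Hypothetical landmark coordinates in millimetres.
straight = hka_deviation((0, 0), (0, 400), (0, 800))
angled = hka_deviation((0, 0), (10, 400), (0, 800))
```

For the second example the knee center lies 10 mm off the hip-ankle line, which gives a deviation of roughly 2.9°.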
Subject(s)
Deep Learning, Humans, Reproducibility of Results, Lower Extremity/diagnostic imaging, Lower Extremity/surgery, Knee Joint, Radiography, Retrospective Studies
ABSTRACT
BACKGROUND: The growing prevalence of musculoskeletal diseases increases radiologic workload, highlighting the need for optimized workflow management and automated metadata classification systems. We developed a large-scale, well-characterized dataset of musculoskeletal radiographs and trained deep learning neural networks to classify radiographic projection and body side. METHODS: In this IRB-approved retrospective single-center study, a dataset of musculoskeletal radiographs from 2011 to 2019 was retrieved and manually labeled for one of 45 possible radiographic projections and the depicted body side. Two classification networks were trained for the respective tasks using the Xception architecture with a custom network top and pretrained weights. Performance was evaluated on a hold-out test sample, and gradient-weighted class activation mapping (Grad-CAM) heatmaps were computed to visualize the influential image regions for network predictions. RESULTS: A total of 13,098 studies comprising 23,663 radiographs were included with a patient-level dataset split, resulting in 19,183 training, 2,145 validation, and 2,335 test images. Focusing on paired body regions, training for side detection included 16,319 radiographs (13,284 training, 1,443 validation, and 1,592 test images). The models achieved an overall accuracy of 0.975 for projection and 0.976 for body-side classification on the respective hold-out test sample. Errors were primarily observed in projections with seamless anatomical transitions or non-orthograde adjustment techniques. CONCLUSIONS: The deep learning neural networks demonstrated excellent performance in classifying radiographic projection and body side across a wide range of musculoskeletal radiographs. These networks have the potential to serve as presorting algorithms, optimizing radiologic workflow and enhancing patient care. 
RELEVANCE STATEMENT: The developed networks excel at classifying musculoskeletal radiographs, providing valuable tools for research data extraction, standardized image sorting, and minimizing misclassifications in artificial intelligence systems, ultimately enhancing radiology workflow efficiency and patient care. KEY POINTS: • A large-scale, well-characterized dataset was developed, covering a broad spectrum of musculoskeletal radiographs. • Deep learning neural networks achieved high accuracy in classifying radiographic projection and body side. • Grad-CAM heatmaps provided insight into network decisions, contributing to their interpretability and trustworthiness. • The trained models can help optimize radiologic workflow and manage large amounts of data.
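A patient-level dataset split, as mentioned in this abstract, keeps all radiographs of one patient in the same subset so that the test set contains no patient seen during training. The following is a minimal sketch under assumed inputs: the mapping from image to patient, the 80/10/10 fractions, and the function name are illustrative and not taken from the study.

```python
import random


def patient_level_split(image_to_patient, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/val/test, then distribute their images."""
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)  # reproducible shuffle of patient IDs
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    group = {}
    for i, patient in enumerate(patients):
        group[patient] = "train" if i < n_train else "val" if i < n_train + n_val else "test"
    split = {"train": [], "val": [], "test": []}
    for image, patient in image_to_patient.items():
        split[group[patient]].append(image)
    return split


# Hypothetical data: 10 patients with 2 radiographs each.
images = {f"img_{p}_{k}": f"patient_{p}" for p in range(10) for k in range(2)}
split = patient_level_split(images)
patients_per_set = {name: {images[i] for i in ids} for name, ids in split.items()}
```

Because the split is made on patients rather than images, the resulting image counts per subset only approximate the requested fractions when patients contribute different numbers of radiographs.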
Subject(s)
Deep Learning, Radiology, Humans, Artificial Intelligence, Retrospective Studies, Radiography
ABSTRACT
OBJECTIVES: Despite their life-saving capabilities, cerebrospinal fluid (CSF) shunts exhibit high failure rates, with a large fraction of failures attributed to the regulating valve. Due to a lack of methods for the detailed analysis of valve malfunctions, failure mechanisms are not well understood, and valves often have to be surgically explanted on the mere suspicion of malfunction. The presented pilot study aims to demonstrate radiological methods for comprehensive analysis of CSF shunt valves, considering both the potential for failure analysis in design optimization and for future clinical in-vivo application to reduce the number of required shunt revision surgeries. The proposed method could also be utilized to develop and support in situ repair methods (e.g. by lysis or ultrasound) of malfunctioning CSF shunt valves. MATERIALS AND METHODS: The primary methods described are contrast-enhanced radiographic time series of CSF shunt valves, taken in a favorable projection geometry at low radiation dose, and the machine-learning-based diagnosis of CSF shunt valve obstructions. Complementarily, we investigate CT-based methods capable of providing accurate ground truth for the training of such diagnostic tools. Using simulated test and training data, the performance of the machine-learning diagnostics in identifying and localizing obstructions within a shunt valve is evaluated regarding per-pixel sensitivity and specificity, the Dice similarity coefficient, and the false positive rate in the case of obstruction-free test samples. RESULTS: Contrast-enhanced subtraction radiography allows high-resolution, time-resolved, low-dose analysis of fluid transport in CSF shunt valves. Complementarily, photon-counting micro-CT allows detailed investigation of valve obstruction mechanisms and the generation of valid ground truth for machine-learning-based diagnostics.
Machine-learning-based detection of valve obstructions in simulated radiographs shows promising results, with a per-pixel sensitivity >70%, per-pixel specificity >90%, a median Dice coefficient >0.8 and <10% false positives at a detection threshold of 0.5. CONCLUSIONS: This ex-vivo study demonstrates obstruction detection in cerebrospinal fluid shunt valves, combining radiological methods with machine learning under conditions compatible with future in-vivo application. Results indicate that high-resolution contrast-enhanced subtraction radiography, possibly including time-series data, combined with machine-learning image analysis, has the potential to strongly improve the diagnostics of CSF shunt valve failures. The presented method is in principle suitable for in-vivo application, considering both measurement geometry and radiological dose. Further research is needed to validate these results on real-world data and to refine the employed methods. In combination, the presented methods enable comprehensive analysis of valve failure mechanisms, paving the way for improved product development and clinical diagnostics of CSF shunt valves.
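The per-pixel metrics named in this abstract follow from a confusion count between a thresholded probability map and a ground-truth mask. The sketch below shows one way to compute them; the nested-list image representation, the `>=` convention at the threshold, and the example values are assumptions for illustration.

```python
def pixel_metrics(prob_map, truth, threshold=0.5):
    """Per-pixel sensitivity, specificity, and Dice coefficient.

    prob_map: rows of predicted obstruction probabilities in [0, 1].
    truth:    rows of ground-truth labels (1 = obstruction, 0 = free).
    """
    tp = fp = tn = fn = 0
    for prob_row, truth_row in zip(prob_map, truth):
        for p, t in zip(prob_row, truth_row):
            predicted = p >= threshold
            if predicted and t:
                tp += 1
            elif predicted and not t:
                fp += 1
            elif not predicted and t:
                fn += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    # Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty.
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return sensitivity, specificity, dice


# Hypothetical 2x3 example.
probabilities = [[0.9, 0.6, 0.1], [0.2, 0.7, 0.4]]
ground_truth = [[1, 1, 0], [0, 0, 1]]
sens, spec, dice = pixel_metrics(probabilities, ground_truth)
```

Here two of three obstruction pixels are found and two of three free pixels are correctly left unmarked, so all three metrics equal 2/3.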
ABSTRACT
In magnetic resonance imaging (MRI), the perception of substandard image quality may prompt repetition of the respective image acquisition protocol. Subsequently selecting the preferred high-quality image data from a series of acquisitions can be challenging. An automated workflow may facilitate and improve this selection. We therefore aimed to investigate the applicability of an automated image quality assessment for the prediction of the subjectively preferred image acquisition. Our analysis included data from 11,347 participants with whole-body MRI examinations performed as part of the ongoing prospective multi-center German National Cohort (NAKO) study. Trained radiologic technologists repeated any of the twelve examination protocols due to induced setup errors and/or subjectively unsatisfactory image quality and chose a preferred acquisition from the resultant series. Up to 11 quantitative image quality parameters were automatically derived from all acquisitions. Regularized regression and standard estimates of diagnostic accuracy were calculated. Controlling for setup variations in 2342 series of two or more acquisitions, technologists preferred the repetition over the initial acquisition in 1116 of 1396 series in which the initial setup was retained (79.9%, range across protocols: 73-100%). Image quality parameters then commonly showed statistically significant differences between chosen and discarded acquisitions. In regularized regression across all protocols, 'structured noise maximum' was the strongest predictor for the technologists' choice, followed by 'N/2 ghosting average'. Combinations of the automatically derived parameters provided an area under the ROC curve between 0.51 and 0.74 for the prediction of the technologists' choice. 
It is concluded that automated image quality assessment can, despite considerable performance differences between protocols and anatomical regions, contribute substantially to identifying the subjective preference in a series of MRI acquisitions and thus provide effective decision support to readers.
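The area under the ROC curve reported here can be read as the probability that a randomly chosen preferred acquisition receives a higher score than a randomly chosen discarded one. A minimal rank-based sketch follows; the scores, labels, and function name are hypothetical, and the study's regularized regression model is not reproduced.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical quality scores; label 1 = acquisition chosen by the technologist.
scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

In this example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 corresponds to chance-level ranking.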
Subject(s)
Magnetic Resonance Imaging, Humans, Cohort Studies, Prospective Studies, Magnetic Resonance Imaging/methods, ROC Curve, Longitudinal Studies
ABSTRACT
While radiologists can describe a fracture's morphology and complexity with ease, the translation into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and chatbots aware of specific knowledge of the AO classification provided by a vector-index and compared it to human readers. In the 100 radiological reports we created based on random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s per case vs. 50 s per case, p < .001) though not reaching human performance (max. chatbot performance of 86% correct full AO codes vs. 95% in human readers). In general, chatbots based on GPT 4 outperformed the ones based on GPT 3.5-Turbo. Further, we found that providing specific knowledge substantially enhances the chatbot's performance and consistency, as the context-aware chatbot based on GPT 4 provided 71% consistent correct full AO codes compared to the 2% consistent correct full AO codes for the generic ChatGPT 4. This provides evidence that refining and providing specific context to ChatGPT will be the next essential step in harnessing its power.
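The abstract does not define "consistent correct" formally. One plausible reading, assumed in the sketch below, is that a case counts only if the chatbot returns the correct code in every repeated run. The case identifiers and code strings are placeholders, not real AO codes.

```python
def consistent_correct_rate(runs, truth):
    """Fraction of cases answered correctly in every repeated run.

    runs:  list of dicts mapping case_id -> answer, one dict per run.
    truth: dict mapping case_id -> correct answer.
    """
    consistent = sum(
        all(run[case] == answer for run in runs) for case, answer in truth.items()
    )
    return consistent / len(truth)


# Placeholder codes for three cases and three repeated runs.
truth = {"case1": "X1", "case2": "Y2", "case3": "Z3"}
runs = [
    {"case1": "X1", "case2": "Y2", "case3": "Z3"},
    {"case1": "X1", "case2": "Y9", "case3": "Z3"},  # case2 wrong in this run
    {"case1": "X1", "case2": "Y2", "case3": "Z3"},
]
rate = consistent_correct_rate(runs, truth)
```

Under this definition a single wrong answer in any run removes the case from the count, so the measure is stricter than the per-run accuracy.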
Subject(s)
Bone Fractures, Radiology, Humans, Awareness, Generic Drugs, Radiologists
ABSTRACT
Background Radiological imaging guidelines are crucial for accurate diagnosis and optimal patient care as they result in standardized decisions and thus reduce inappropriate imaging studies. Purpose In the present study, we investigated the potential to support clinical decision-making using an interactive chatbot designed to provide personalized imaging recommendations from American College of Radiology (ACR) appropriateness criteria documents using semantic similarity processing. Methods We utilized 209 ACR appropriateness criteria documents as a specialized knowledge base and employed LlamaIndex, a framework that allows connecting large language models with external data, and ChatGPT 3.5-Turbo to create an appropriateness criteria contexted chatbot (accGPT). Fifty clinical case files were used to compare the accGPT's performance against general radiologists at varying experience levels and against generic ChatGPT 3.5 and 4.0. Results All chatbots reached at least human performance level. For the 50 case files, the accGPT performed best in providing correct recommendations that were "usually appropriate" according to the ACR criteria and also provided the highest proportion of consistently correct answers in comparison with generic chatbots and radiologists. Further, the chatbots provided substantial time and cost savings, with an average decision time of 5 minutes and a cost of 0.19 for all cases, compared to 50 minutes and 29.99 for radiologists (both p < 0.01). Conclusion ChatGPT-based algorithms have the potential to substantially improve the decision-making for clinical imaging studies in accordance with ACR guidelines. Specifically, a context-based algorithm outperformed its generic counterpart, demonstrating the value of tailoring AI solutions to specific healthcare applications.
Subject(s)
Algorithms, Software, Humans, Clinical Decision-Making, Cost Savings, Radiologists
ABSTRACT
OBJECTIVES: The precise segmentation of atrophic structures remains challenging in neurodegenerative diseases. We determined the performance of a Deep Neural Patchwork (DNP) in comparison to established segmentation algorithms regarding the ability to delineate the putamen in multiple system atrophy (MSA), Parkinson's disease (PD), and healthy controls. METHODS: We retrospectively included patients with MSA and PD as well as healthy controls. A DNP was trained on manual segmentations of the putamen as ground truth. For this, the cohort was randomly split into a training (N = 131) and test set (N = 120). The DNP's performance was compared with putaminal segmentations as derived by Automatic Anatomic Labelling, Freesurfer and Fastsurfer. For validation, we assessed the diagnostic accuracy of the resulting segmentations in the delineation of MSA vs. PD and healthy controls. RESULTS: A total of 251 subjects (61 patients with MSA, 158 patients with PD, and 32 healthy controls; mean age of 61.5 ± 8.8 years) were included. Compared to the dice-coefficient of the DNP (0.96), we noted significantly weaker performance for AAL3 (0.72; p < .001), Freesurfer (0.82; p < .001), and Fastsurfer (0.84, p < .001). This was corroborated by the superior diagnostic performance of MSA vs. PD and HC of the DNP (AUC 0.93) versus the AUC of 0.88 for AAL3 (p = 0.02), 0.86 for Freesurfer (p = 0.048), and 0.85 for Fastsurfer (p = 0.04). CONCLUSION: By utilization of a DNP, accurate segmentations of the putamen can be obtained even if substantial atrophy is present. This allows for more precise extraction of imaging parameters or shape features from the putamen in relevant patient cohorts. CLINICAL RELEVANCE STATEMENT: Deep learning-based segmentation of the putamen was superior to currently available algorithms and is beneficial for the diagnosis of multiple system atrophy. 
KEY POINTS: • A Deep Neural Patchwork precisely delineates the putamen and performs equal to human labeling in multiple system atrophy, even when pronounced putaminal volume loss is present. • The Deep Neural Patchwork-based segmentation was more capable of differentiating between multiple system atrophy and Parkinson's disease than the AAL3 atlas, Freesurfer, or Fastsurfer.
Subject(s)
Deep Learning, Multiple System Atrophy, Parkinson Disease, Humans, Middle Aged, Aged, Multiple System Atrophy/diagnostic imaging, Parkinson Disease/diagnostic imaging, Putamen/diagnostic imaging, Retrospective Studies, Magnetic Resonance Imaging/methods
ABSTRACT
INTRODUCTION: Recent developments in the postoperative evaluation of deep brain stimulation surgery on the group level warrant the detection of achieved electrode positions based on postoperative imaging. Computed tomography (CT) is a frequently used imaging modality, but because of its idiosyncrasies (high spatial accuracy at low soft tissue resolution), it has not been sufficient for the parallel determination of electrode position and details of the surrounding brain anatomy (nuclei). The common solution is rigid fusion of CT images and magnetic resonance (MR) images, which have much better soft tissue contrast and allow accurate normalization into template spaces. Here, we explored a deep-learning approach to directly relate positions (usually the lead position) in postoperative CT images to the native anatomy of the midbrain and group space. MATERIALS AND METHODS: Deep learning is used to create derived tissue contrasts (white matter, gray matter, cerebrospinal fluid, brainstem nuclei) based on the CT image; that is, a convolutional neural network (CNN) takes solely the raw CT image as input and outputs several tissue probability maps. The ground truth is based on coregistrations with MR contrasts. The tissue probability maps are then used to either rigidly coregister or normalize the CT image in a deformable way to group space. The CNN was trained in 220 patients and tested in a set of 80 patients. RESULTS: Rigorous validation of such an approach is difficult because of the lack of ground truth. We examined the agreements between the classical and proposed approaches and considered the spread of implantation locations across a group of identically implanted subjects, which serves as an indicator of the accuracy of the lead localization procedure. The proposed procedure agrees well with current magnetic resonance imaging-based techniques, and the spread is comparable or even lower.
CONCLUSIONS: Postoperative CT imaging alone is sufficient for accurate localization of the midbrain nuclei and normalization to the group space. In the context of group analysis, it seems sufficient to have a single postoperative CT image of good quality for inclusion. The proposed approach will allow researchers and clinicians to include cases that were not previously suitable for analysis.
Subject(s)
Deep Brain Stimulation, Deep Learning, Humans, Computer-Assisted Image Processing/methods, Brain/diagnostic imaging, Brain/surgery, X-Ray Computed Tomography/methods, Magnetic Resonance Imaging/methods
ABSTRACT
BACKGROUND: This study evaluated the accuracy of computer-assisted surgery (CAS)-driven deep circumflex iliac artery (DCIA) flap mandibular reconstruction by traditional morphometric methods and geometric morphometric methods (GMM). METHODS: Reconstruction accuracy was evaluated by measuring distances and angles between bilateral anatomical landmarks. Additionally, the average length of displacement vectors between landmarks was computed to evaluate factors assumed to influence reconstruction accuracy. Principal component analysis (PCA) was applied to unveil main modes of dislocation. RESULTS: High reconstruction accuracy could be demonstrated for a sample consisting of 26 patients. The effects of the number of segments and length of defect on reconstruction accuracy were close to the commonly used significance threshold (p = 0.062/0.060). PCA demonstrated displacement to result mainly from sagittal and transversal shifts. CONCLUSIONS: CAS is a viable approach to achieve high accuracy in mandibular reconstruction and GMM can facilitate the evaluation of factors influencing reconstruction accuracy and unveil main modes of dislocation in this context.
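The average displacement vector length used in this abstract can be sketched as the mean Euclidean distance between corresponding landmarks. The sketch assumes the comparison is between planned and achieved landmark positions in a common coordinate frame; the landmark names and coordinates are hypothetical.

```python
import math


def mean_displacement(planned, achieved):
    """Mean length of the displacement vectors between matching 3D landmarks.

    planned, achieved: dicts mapping landmark name -> (x, y, z) in millimetres.
    """
    lengths = [math.dist(planned[name], achieved[name]) for name in planned]
    return sum(lengths) / len(lengths)


# Hypothetical landmarks (names and coordinates are placeholders).
planned = {"landmark_a": (0.0, 0.0, 0.0), "landmark_b": (10.0, 0.0, 0.0)}
achieved = {"landmark_a": (3.0, 4.0, 0.0), "landmark_b": (10.0, 0.0, 2.0)}
mean_len = mean_displacement(planned, achieved)
```

The two displacements have lengths of 5 mm and 2 mm, giving a mean of 3.5 mm. A principal component analysis of the same displacement vectors would additionally reveal the dominant directions of the shifts.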
Subject(s)
Free Tissue Flaps, Mandibular Reconstruction, Plastic Surgery Procedures, Computer-Assisted Surgery, Humans, Mandibular Reconstruction/methods, Iliac Artery/surgery, Surgical Flaps/blood supply, Computers, Plastic Surgery Procedures/methods, Free Tissue Flaps/surgery
ABSTRACT
OBJECTIVE: To evaluate the impact of reducing the radiographic field of view (FOV) on the trueness and precision of the alignment between cone beam computed tomography (CBCT) and intraoral scanning data for implant planning. MATERIALS AND METHODS: Fifteen participants presenting with one of three clinical scenarios: single tooth loss (ST, n = 5), multiple missing teeth (MT, n = 5) and presence of radiographic artifacts (AR, n = 5) were included. CBCT volumes covering the full arch (FA) were reduced to the quadrant (Q) or the adjacent tooth/teeth (A). Two operators, an expert (exp) in virtual implant planning and an inexperienced clinician, performed multiple superimpositions, with FA-exp serving as a reference. The deviations were calculated at the implant apex and shoulder levels. Thereafter, linear mixed models were adapted to investigate the influence of FOV on discrepancies. RESULTS: Evaluation of trueness compared to FA-exp resulted in the largest mean (AR-A: 0.10 ± 0.33 mm) and single maximum discrepancy (AR-Q: 1.44 mm) in the presence of artifacts. Furthermore, for the ST group, the largest mean error (-0.06 ± 0.2 mm, shoulder) was calculated with the FA-FOV, while for MT, with the intermediate volume (-0.07 ± 0.24 mm, Q). In terms of precision, the mean SD intervals were ≤0.25 mm (A-exp). Precision was influenced by FOV volume (FA < Q < A) but not by operator expertise. CONCLUSIONS: For single posterior missing teeth, an extended FOV does not improve registration accuracy. However, in the presence of artifacts or multiple missing posterior teeth, caution is recommended when reducing FOV.
Subject(s)
Dental Implants, Tooth, Cone-Beam Computed Tomography/methods, Humans, Three-Dimensional Imaging, Pilot Projects, Retrospective Studies
ABSTRACT
OBJECTIVES: To develop and validate machine learning models to distinguish between benign and malignant bone lesions and compare the performance to radiologists. METHODS: In 880 patients (age 33.1 ± 19.4 years, 395 women) diagnosed with malignant (n = 213, 24.2%) or benign (n = 667, 75.8%) primary bone tumors, preoperative radiographs were obtained, and the diagnosis was established using histopathology. Data was split 70%/15%/15% for training, validation, and internal testing. Additionally, 96 patients from another institution were obtained for external testing. Machine learning models were developed and validated using radiomic features and demographic information. The performance of each model was evaluated on the test sets for accuracy, area under the curve (AUC) from receiver operating characteristics, sensitivity, and specificity. For comparison, the external test set was evaluated by two radiology residents and two radiologists who specialized in musculoskeletal tumor imaging. RESULTS: The best machine learning model was based on an artificial neural network (ANN) combining both radiomic and demographic information achieving 80% and 75% accuracy at 75% and 90% sensitivity with 0.79 and 0.90 AUC on the internal and external test set, respectively. In comparison, the radiology residents achieved 71% and 65% accuracy at 61% and 35% sensitivity while the radiologists specialized in musculoskeletal tumor imaging achieved an 84% and 83% accuracy at 90% and 81% sensitivity, respectively. CONCLUSIONS: An ANN combining radiomic features and demographic information showed the best performance in distinguishing between benign and malignant bone lesions. The model showed lower accuracy compared to specialized radiologists, while accuracy was higher or similar compared to residents. KEY POINTS: • The developed machine learning model could differentiate benign from malignant bone tumors using radiography with an AUC of 0.90 on the external test set.
• Machine learning models that used radiomic features or demographic information alone performed worse than those that used both radiomic features and demographic information as input, highlighting the importance of building comprehensive machine learning models. • An artificial neural network that combined both radiomic and demographic information achieved the best performance and its performance was compared to radiology readers on an external test set.
Subject(s)
Bone Neoplasms, Machine Learning, Adolescent, Adult, Bone Neoplasms/diagnostic imaging, Female, Humans, Middle Aged, Radiography, Retrospective Studies, X-Ray Computed Tomography/methods, X-Rays, Young Adult
ABSTRACT
Background An artificial intelligence model that assesses primary bone tumors on radiographs may assist in the diagnostic workflow. Purpose To develop a multitask deep learning (DL) model for simultaneous bounding box placement, segmentation, and classification of primary bone tumors on radiographs. Materials and Methods This retrospective study analyzed bone tumors on radiographs acquired prior to treatment and obtained from patient data from January 2000 to June 2020. Benign or malignant bone tumors were diagnosed in all patients by using the histopathologic findings as the reference standard. By using split-sample validation, 70% of the patients were assigned to the training set, 15% were assigned to the validation set, and 15% were assigned to the test set. The final performance was evaluated on an external test set by using geographic validation, with accuracy, sensitivity, specificity, and 95% CIs being used for classification, the intersection over union (IoU) being used for bounding box placements, and the Dice score being used for segmentations. Results Radiographs from 934 patients (mean age, 33 years ± 19 [standard deviation]; 419 women) were evaluated in the internal data set, which included 667 benign bone tumors and 267 malignant bone tumors. Six hundred fifty-four patients were in the training set, 140 were in the validation set, and 140 were in the test set. One hundred eleven patients were in the external test set. The multitask DL model achieved 80.2% (89 of 111; 95% CI: 72.8, 87.6) accuracy, 62.9% (22 of 35; 95% CI: 47, 79) sensitivity, and 88.2% (67 of 76; CI: 81, 96) specificity in the classification of bone tumors as malignant or benign. The model achieved an IoU of 0.52 ± 0.34 for bounding box placements and a mean Dice score of 0.60 ± 0.37 for segmentations. 
The model accuracy was higher than that of two radiologic residents (71.2% and 64.9%; P = .002 and P < .001, respectively) and was comparable with that of two musculoskeletal fellowship-trained radiologists (83.8% and 82.9%; P = .13 and P = .25, respectively) in classifying a tumor as malignant or benign. Conclusion The developed multitask deep learning model allowed for accurate and simultaneous bounding box placement, segmentation, and classification of primary bone tumors on radiographs. © RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Carrino in this issue.
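The evaluation measures in this abstract can be made concrete with a short sketch. The box format `(x1, y1, x2, y2)` and the example boxes are assumptions; the classification counts are the ones stated in the abstract (22 of 35 malignant and 67 of 76 benign tumors classified correctly in the external test set of 111 patients).

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Hypothetical predicted and reference boxes.
iou = box_iou((0, 0, 10, 10), (5, 5, 15, 15))

# Counts reported in the abstract for the external test set.
true_pos, malignant = 22, 35
true_neg, benign = 67, 76
sensitivity = true_pos / malignant
specificity = true_neg / benign
accuracy = (true_pos + true_neg) / (malignant + benign)
```

The two example boxes overlap in a 5 × 5 region, so the IoU is 25/175. The reported percentages of 62.9%, 88.2%, and 80.2% follow directly from the stated counts.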
Subject(s)
Bone Neoplasms/diagnostic imaging, Deep Learning, Computer-Assisted Radiographic Image Interpretation/methods, Radiography/methods, Adult, Bones/diagnostic imaging, Female, Humans, Male, Retrospective Studies
ABSTRACT
OBJECTIVES: The impact of specific blood flow patterns within ascending aortic and/or aortic root aneurysms on aortic morphology is unknown. We investigated the interrelation of ascending aortic flow compression/peripheralization and aneurysm morphology with respect to sinotubular junction (STJ) definition. METHODS: Thirty-one patients (aortic root/ascending aortic aneurysm >45 mm) underwent flow-sensitive 4D magnetic resonance thoracic aortic flow measurement at 3 Tesla (Siemens, Germany) at two different institutions (Freiburg, Germany, and San Francisco, CA, USA). Time-resolved image data post-processing and visualization of mid-systolic, mid-ascending aortic flow were performed using local vector fields. The Flow Compression Index (FCI) was calculated individually as a fraction of the area of high-velocity mid-systolic flow over the complete cross-sectional ascending aortic area. According to aortic aneurysm morphology, patients were grouped as Group A (small root, eccentric ascending aortic aneurysm with a defined STJ) and Group B (enlarged aortic root, non-eccentric ascending aortic aneurysm with diffuse root and tubular enlargement). RESULTS: The mean FCI over all patients was 0.47 ± 0.5 (0.37-0.99). High levels of flow compression/peripheralization (FCI < 0.6) were linked to eccentric aneurysm morphology (Group A, n = 11), while low levels or absence of aortic flow compression/peripheralization (FCI > 0.8) occurred more often in Group B (n = 20). The FCI was 0.48 ± 0.05 in Group A and 0.78 ± 0.14 in Group B (P < 0.001). The distribution of bicuspid aortic valves (P = 0.6) and the type of valve dysfunction (P = 0.22 for aortic stenosis) did not differ between groups. CONCLUSIONS: Irrespective of aortic valve morphology and function, ascending aortic blood flow patterns are linked to distinct patterns of ascending aortic aneurysm morphology. Implementation of quantitative local blood flow analyses might help to improve aneurysm risk stratification in the future.
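The FCI described in this abstract is an area ratio, which the following Python sketch illustrates on a pixel grid. It is a hypothetical illustration only: the abstract does not say how "high-velocity" flow is delimited, so the `threshold` parameter, the flat pixel lists, and the equal-pixel-area assumption are all introduced here for the example.

```python
def flow_compression_index(velocities, lumen_mask, threshold):
    """Fraction of the lumen cross-section occupied by high-velocity flow.

    velocities: through-plane velocity per pixel (flat sequence)
    lumen_mask: 1 for pixels inside the aortic lumen, 0 otherwise
    threshold:  hypothetical cut-off defining "high-velocity" flow; the
                abstract does not specify one
    Assumes every pixel covers the same area, so the area ratio reduces
    to a ratio of pixel counts.
    """
    lumen = [v for v, inside in zip(velocities, lumen_mask) if inside]
    if not lumen:
        raise ValueError("lumen mask contains no pixels")
    high = sum(1 for v in lumen if v >= threshold)
    return high / len(lumen)


# Toy cross-section: five lumen pixels, two of them above the cut-off.
velocities = [0.2, 0.9, 1.1, 1.3, 0.1, 0.0]
lumen_mask = [1, 1, 1, 1, 1, 0]
fci = flow_compression_index(velocities, lumen_mask, threshold=1.0)
print(fci)  # 0.4
```

Under this definition a low FCI means the fast flow is confined to a small part of the lumen (strong compression), which is how the abstract uses the term.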
Subject(s)
Aorta/pathology , Aortic Aneurysm/diagnosis , Magnetic Resonance Imaging, Cine/methods , Radiographic Image Enhancement , Adult , Aged , Aorta/surgery , Aortic Aneurysm/surgery , Blood Flow Velocity/physiology , Cohort Studies , Compressive Strength , Confidence Intervals , Contrast Media , Female , Follow-Up Studies , Humans , Imaging, Three-Dimensional/methods , Male , Middle Aged , Preoperative Care/methods , Regional Blood Flow/physiology , Retrospective Studies , Risk Assessment , Severity of Illness Index , Statistics, Nonparametric , Treatment Outcome , Young Adult
ABSTRACT
OBJECTIVE: We sought to evaluate the feasibility of k-t parallel imaging for accelerated 4D flow MRI in the hepatic vascular system by investigating the impact of different acceleration factors. MATERIALS AND METHODS: k-t GRAPPA accelerated 4D flow MRI of the liver vasculature was evaluated in 16 healthy volunteers at 3T with acceleration factors R = 3, R = 5, and R = 8 (2.0 × 2.5 × 2.4 mm³, TR = 82 ms), and R = 5 (TR = 41 ms); GRAPPA R = 2 was used as the reference standard. Qualitative flow analysis included grading of 3D streamlines and time-resolved particle traces. Quantitative evaluation assessed velocities, net flow, and wall shear stress (WSS). RESULTS: Significant scan time savings (21-71 %) were realized for all acceleration factors compared to standard GRAPPA R = 2 (p < 0.001). Quantification of velocities and net flow offered similar results between k-t GRAPPA R = 3 and R = 5 compared to standard GRAPPA R = 2. Significantly increased leakage artifacts and noise were seen between standard GRAPPA R = 2 and k-t GRAPPA R = 8 (p < 0.001) with significant underestimation of peak velocities and WSS of up to 31 % in the hepatic arterial system (p < 0.05). WSS was significantly underestimated up to 13 % in all vessels of the portal venous system for k-t GRAPPA R = 5, while significantly higher values were observed for the same acceleration with higher temporal resolution in two veins (p < 0.05). CONCLUSION: k-t acceleration of 4D flow MRI is feasible for liver hemodynamic assessment with acceleration factors R = 3 and R = 5 resulting in a scan time reduction of at least 40 % with similar quantitation of liver hemodynamics compared with GRAPPA R = 2.
Subject(s)
Blood Flow Velocity/physiology , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Liver Circulation/physiology , Liver/physiology , Magnetic Resonance Angiography/methods , Adult , Feasibility Studies , Female , Humans , Image Enhancement/methods , Liver/anatomy & histology , Reproducibility of Results , Respiratory-Gated Imaging Techniques/methods , Sensitivity and Specificity , Shear Strength/physiology
ABSTRACT
PURPOSE: To evaluate the influence of variation in spatio-temporal resolution and scan-rescan reproducibility on three-dimensional (3D) visualization and quantification of arterial and portal venous (PV) liver hemodynamics at four-dimensional (4D) flow MRI. METHODS: Scan-rescan reproducibility of 3D hemodynamic analysis of the liver was evaluated in 10 healthy volunteers using 4D flow MRI at 3T with three different spatio-temporal resolutions (2.4 × 2.0 × 2.4 mm³, 61.2 ms; 2.5 × 2.0 × 2.4 mm³, 81.6 ms; 2.6 × 2.5 × 2.6 mm³, 80 ms) and thus different total scan times. Qualitative flow analysis used 3D streamlines and time-resolved particle traces. Quantitative evaluation was based on maximum and mean velocities, flow volume, and vessel lumen area in the hepatic arterial and PV systems. RESULTS: 4D flow MRI showed low interobserver variability for assessment of arterial and PV liver hemodynamics. 3D flow visualization revealed limitations for the left intrahepatic PV branch. Lower spatio-temporal resolution resulted in underestimation of arterial velocities (mean 15%, P < 0.05). For the PV system, hemodynamic analyses showed significant differences in the velocities for intrahepatic portal vein vessels (P < 0.05). Scan-rescan reproducibility was good except for flow volumes in the arterial system. CONCLUSION: 4D flow MRI for assessment of liver hemodynamics can be performed with low interobserver variability and good reproducibility. Higher spatio-temporal resolution is necessary for complete assessment of the hepatic blood flow required for clinical applications.
Subject(s)
Blood Flow Velocity/physiology , Cardiac-Gated Imaging Techniques/methods , Hepatic Artery/physiology , Hepatic Veins/physiology , Imaging, Three-Dimensional/methods , Liver Circulation/physiology , Magnetic Resonance Angiography/methods , Adult , Female , Humans , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Male , Reproducibility of Results , Sensitivity and Specificity , Spatio-Temporal Analysis , Young Adult
ABSTRACT
PURPOSE: To evaluate the feasibility of time-resolved flow-sensitive four-dimensional (4D) MRI for the visualization and quantification of splanchnic arterial and portal venous hemodynamics in patients with cirrhosis and in controls. MATERIALS AND METHODS: We applied flow-sensitive 4D MRI to evaluate arterial and portal venous three-dimensional blood flow in patients with advanced liver cirrhosis (n = 5) and in healthy controls (n = 10) using 3T MRI (spatial resolution = 1.7 × 2.1 × 2.4 mm³, temporal resolution = 62.4 ms). Flow was analyzed qualitatively using three-dimensional streamlines and time-resolved particle traces. Flow was quantified retrospectively in nine predefined anatomic regions by evaluating maximum and mean velocities, the flow volume, the vessel lumen area, pulsatility indices, and resistance indices. Doppler ultrasound (US) was our reference standard. RESULTS: Flow-sensitive 4D MRI visualized liver hemodynamics successfully in 91% of patients and 96% of volunteers, with limitations for the patients' extrahepatic vessels (one case of splenic and superior mesenteric veins each) and intrahepatic portal vein branches (in five vessels). In healthy controls, MRI showed lower velocities and larger vessel areas than Doppler US. We found no significant differences in the flow volume, pulsatility indices, and resistance indices on comparing MRI with US. Regional flow quantification within the splanchnic system of healthy volunteers and liver cirrhosis patients revealed an increase in the inflow (up to 65%), but a decrease in the patients' outflow (up to 37%). CONCLUSION: Flow-sensitive 4D MRI is feasible for in-depth evaluation of arterial and portal venous hemodynamics in liver cirrhosis patients, providing additional information on the pathophysiology of the altered splanchnic system.
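The pulsatility and resistance indices mentioned in this abstract are not defined there. A minimal Python sketch of the definitions commonly used in Doppler practice is given below: the Gosling pulsatility index, (peak systolic − end-diastolic) / mean velocity, and the Pourcelot resistance index, (peak systolic − end-diastolic) / peak systolic velocity. Whether the study used exactly these forms is an assumption, as is taking the maximum and minimum of a sampled velocity curve as the peak systolic and end-diastolic values.

```python
def pulsatility_index(velocities):
    """Gosling pulsatility index over one cardiac cycle.

    Assumption: max and min of the sampled curve stand in for peak
    systolic and end-diastolic velocity.
    """
    v_max, v_min = max(velocities), min(velocities)
    v_mean = sum(velocities) / len(velocities)
    return (v_max - v_min) / v_mean


def resistance_index(velocities):
    """Pourcelot resistance index over one cardiac cycle."""
    v_max, v_min = max(velocities), min(velocities)
    return (v_max - v_min) / v_max


# Toy velocity samples (cm/s) over one cycle; not data from the study.
cycle = [10.0, 20.0, 30.0]
pi_value = pulsatility_index(cycle)
ri_value = resistance_index(cycle)
print(pi_value, ri_value)
```

Both indices are ratios of velocities, so they are dimensionless; this may help explain why they can agree between modalities even when absolute velocities differ.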
Subject(s)
Liver Cirrhosis/physiopathology , Portal Vein/physiopathology , Splanchnic Circulation/physiology , Aged , Blood Flow Velocity/physiology , Case-Control Studies , Feasibility Studies , Female , Hemodynamics/physiology , Humans , Liver Cirrhosis/diagnostic imaging , Magnetic Resonance Angiography/methods , Male , Mesenteric Artery, Superior/diagnostic imaging , Mesenteric Artery, Superior/physiopathology , Mesenteric Veins/diagnostic imaging , Mesenteric Veins/physiopathology , Middle Aged , Portal Vein/diagnostic imaging , Splenic Vein/diagnostic imaging , Splenic Vein/physiopathology , Ultrasonography, Doppler/methods
ABSTRACT
OBJECTIVES: Conflicting results have been reported on late aortic growth and complication rates of the descending thoracic aorta in patients with Marfan syndrome (MFS) after proximal aortic surgery. METHODS: Of 198 Marfan patients followed up regularly, 121 (43% David-I, 7% David-II, 11% supracoronary replacement, 52% mechanical conduit, 8% arch replacement) were analysed retrospectively after proximal aortic surgery. Of these, 97% had MFS1 and 3% had MFS2 (Loeys-Dietz syndrome); 56% were male and the mean age was 35 ± 13 years. Overall, 65% were initially operated on for root/ascending aortic aneurysm and 35% for aortic dissections. Using automated computed tomography angiography and magnetic resonance angiography cross-sectional analyses, the mean diameters of the distal arch, mid-descending and distal supradiaphragmatic descending thoracic aorta were measured at early and late follow-up (mean 6.3 years for aneurysms and 4.7 years for dissections). The mean duration of clinical follow-up was 7.6 years and the cumulative clinical follow-up comprised 894 patient-years. RESULTS: At 20 years, overall freedom from distal aortic complications and/or reintervention was 76% (51-86%) for aneurysms and 52% (28-71%) for dissections (P = 0.03). In non-dissected aortas, distal aortic growth was significant, but minimal: arches grew from 25.2 ± 0.6 to 26.3 ± 0.8 mm (P = 0.01), mid-descending aortas from 22.2 ± 0.5 to 24.9 ± 1.2 mm (P = 0.05) and distal descending aortas from 22.1 ± 0.7 to 24.2 ± 1.4 mm (P = 0.02; 0.58 ± 0.5 mm/year). Dissected distal aortas increased by a mean of 0.3 ± 0.5 mm/year. Dissection (P < 0.001), urgent procedure (P = 0.02) and hypertension (P = 0.052) were associated with larger distal aortic diameters at late follow-up and more significant aortic growth over time. CONCLUSIONS: Late distal complication rates are low for patients initially presenting with aneurysms. 
The risk of late distal reoperation is dictated by the initial pathology and by the presence of an initial dissection and not by faster distal aortic growth. Strategies to completely restore a non-dissected anatomy might improve late surgical outcome in Marfan syndrome.