ABSTRACT
Polypharmacy remains an important challenge for patients with extensive medical complexity. Given the primary care shortage and the increasing aging population, effective polypharmacy management is crucial to manage the increasing burden of care. The capacity of large language model (LLM)-based artificial intelligence to aid in polypharmacy management has yet to be evaluated. Here, we evaluate ChatGPT's performance in polypharmacy management via its deprescribing decisions in standardized clinical vignettes. We inputted several clinical vignettes originally from a study of general practitioners' deprescribing decisions into ChatGPT 3.5, a publicly available LLM, and evaluated its capacity for yes/no binary deprescribing decisions as well as list-based prompts in which the model was prompted to choose which of several medications to deprescribe. We recorded ChatGPT responses to yes/no binary deprescribing prompts and the number and types of medications deprescribed. In yes/no binary deprescribing decisions, ChatGPT universally recommended deprescribing medications regardless of ADL status in patients with no overlying CVD history; in patients with CVD history, ChatGPT's answers varied by technical replicate. Total number of medications deprescribed ranged from 2.67 to 3.67 (out of 7) and did not vary with CVD status, but increased linearly with severity of ADL impairment. Among medication types, ChatGPT preferentially deprescribed pain medications. ChatGPT's deprescribing decisions vary along the axes of ADL status, CVD history, and medication type, indicating some concordance of internal logic between general practitioners and the model. These results indicate that specifically trained LLMs may provide useful clinical support in polypharmacy management for primary care physicians.
Subject(s)
Cardiovascular Diseases; Deprescriptions; General Practitioners; Humans; Aged; Polypharmacy; Artificial Intelligence
ABSTRACT
Radiologic tests often contain rich imaging data not relevant to the clinical indication. Opportunistic screening refers to the practice of systematically leveraging these incidental imaging findings. Although opportunistic screening can apply to imaging modalities such as conventional radiography, US, and MRI, most attention to date has focused on body CT by using artificial intelligence (AI)-assisted methods. Body CT represents an ideal high-volume modality whereby a quantitative assessment of tissue composition (eg, bone, muscle, fat, and vascular calcium) can provide valuable risk stratification and help detect unsuspected presymptomatic disease. The emergence of "explainable" AI algorithms that fully automate these measurements could eventually lead to their routine clinical use. Potential barriers to widespread implementation of opportunistic CT screening include the need for buy-in from radiologists, referring providers, and patients. Standardization of acquiring and reporting measures is needed, in addition to expanded normative data according to age, sex, and race and ethnicity. Regulatory and reimbursement hurdles are not insurmountable but pose substantial challenges to commercialization and clinical use. Through demonstration of improved population health outcomes and cost-effectiveness, these opportunistic CT-based measures should be attractive to both payers and health care systems as value-based reimbursement models mature. If highly successful, opportunistic screening could eventually justify a practice of standalone "intended" CT screening.
Subject(s)
Artificial Intelligence; Radiology; Humans; Algorithms; Radiologists; Mass Screening/methods; Radiology/methods
ABSTRACT
As the role of artificial intelligence (AI) in clinical practice evolves, governance structures oversee the implementation, maintenance, and monitoring of clinical AI algorithms to enhance quality, manage resources, and ensure patient safety. This article establishes a framework for the infrastructure required for clinical AI implementation and presents a road map for governance. The road map answers four key questions: Who decides which tools to implement? What factors should be considered when assessing an application for implementation? How should applications be implemented in clinical practice? Finally, how should tools be monitored and maintained after clinical implementation? Among the many challenges for the implementation of AI in clinical practice, devising flexible governance structures that can quickly adapt to a changing environment will be essential to ensure quality patient care and practice improvement objectives.
Subject(s)
Artificial Intelligence; Radiology; Humans; Radiography; Algorithms; Quality of Health Care
ABSTRACT
The purpose of this study was to assess if clinical indications, patient location, and imaging sites predict the viewing pattern of referring physicians for CT and MR of the head, chest, and abdomen. Our study included 166,953 CT/MR images of head/chest/abdomen in 2016-2017 in the outpatient (OP, n = 83,981 CT/MR), inpatient (IP, n = 51,052), and emergency (ED, n = 31,920) settings. There were 125,329 CT/MR performed in the hospital setting and 41,624 in one of the nine off-campus locations. We extracted information regarding body region (head/chest/abdomen), patient location, and imaging site from the electronic medical records (EPIC). We recorded clinical indications and the number of times referring physicians viewed CT/MR (defined as the number of separate views of imaging in the EPIC). Data were analyzed with the Microsoft SQL and SPSS statistical software. About 33% of IP CT and MR studies are viewed > 6 times compared to 7% for OP and 19% of ED studies (p < 0.001). Conversely, most OP studies (55%) were viewed 1-2 times only, compared to 21% for IP and 38% for ED studies (p < 0.001). In-hospital exams are viewed (≥ 6 views; 39% studies) more frequently than off-campus imaging (≥ 6 views; 17% studies) (p < 0.001). For head CT/MR, certain clinical indications (i.e., stroke) had higher viewing rates compared to other clinical indications such as malignancy, headache, and dizziness. Conversely, for chest CT, dyspnea-hypoxia had much higher viewing rates (> 6 times) in IP (55%) and ED (46%) than in OP settings (22%). Patient location and imaging site regardless of clinical indications have a profound effect on viewing patterns of referring physicians. Understanding viewing patterns of the referring physicians can help guide interpretation priorities and finding communication for imaging exams based on patient location, imaging site, and clinical indications. The information can help in the efficient delivery of patient care.
Subject(s)
Physicians; Tomography, X-Ray Computed; Abdomen; Communication; Electronic Health Records; Humans
ABSTRACT
Recent advances and future perspectives of machine learning techniques offer promising applications in medical imaging. Machine learning has the potential to improve different steps of the radiology workflow including order scheduling and triage, clinical decision support systems, detection and interpretation of findings, postprocessing and dose estimation, examination quality control, and radiology reporting. In this article, the authors review examples of current applications of machine learning and artificial intelligence techniques in diagnostic radiology. In addition, the future impact and natural extension of these techniques in radiology practice are discussed.
Subject(s)
Machine Learning; Radiology Information Systems; Radiology/methods; Radiology/trends; Humans
ABSTRACT
Artificial intelligence (AI), machine learning, and deep learning are terms now seen frequently, all of which refer to computer algorithms that change as they are exposed to more data. Many of these algorithms are surprisingly good at recognizing objects in images. The combination of large amounts of machine-consumable digital data, increased and cheaper computing power, and increasingly sophisticated statistical models enables machines to find patterns in data in ways that are not only cost-effective but also potentially beyond humans' abilities. Building an AI algorithm can be surprisingly easy. Understanding the associated data structures and statistics, on the other hand, is often difficult and obscure. Converting the algorithm into a sophisticated product that works consistently in broad, general clinical use is complex and incompletely understood. Showing how these AI products reduce costs and improve outcomes will require clinical translation and industrial-grade integration into routine workflow. Radiology has the chance to leverage AI to become a center of intelligently aggregated, quantitative, diagnostic information. Centaur radiologists, formed as a synergy of human plus computer, will provide interpretations using data extracted from images by humans and image-analysis computer algorithms, as well as the electronic health record, genomics, and other disparate sources. These interpretations will form the foundation of precision health care, or care customized to an individual patient. © RSNA, 2017.
Subject(s)
Decision Support Systems, Clinical/trends; Diagnostic Imaging/trends; Forecasting; Image Interpretation, Computer-Assisted/methods; Machine Learning/trends; Radiology/trends; Algorithms; Humans; Pattern Recognition, Automated/trends; Software
ABSTRACT
Purpose To quantify the effect of a comprehensive, long-term, provider-led utilization management (UM) program on high-cost imaging (computed tomography, magnetic resonance imaging, nuclear imaging, and positron emission tomography) performed on an outpatient basis. Materials and Methods This retrospective, 7-year cohort study included all patients regularly seen by primary care physicians (PCPs) at an urban academic medical center. The main outcome was the number of outpatient high-cost imaging examinations per patient per year ordered by the patient's PCP or by any specialist. The authors determined the probability of a patient undergoing any high-cost imaging procedure during a study year and the number of examinations per patient per year (intensity) in patients who underwent high-cost imaging. Risk-adjusted hierarchical models were used to directly quantify the physician component of variation in probability and intensity of high-cost imaging use, and clinicians were provided with regular comparative feedback on the basis of the results. Observed trends in high-cost imaging use and provider variation were compared with the same measures for outpatient laboratory studies because laboratory use was not subject to UM during this period. Finally, per-member per-year high-cost imaging use data were compared with statewide high-cost imaging use data from a major private payer on the basis of the same claim set. Results The patient cohort steadily increased in size from 88 959 in 2007 to 109 823 in 2013. Overall high-cost imaging utilization went from 0.43 examinations per year in 2007 to 0.34 examinations per year in 2013, a decrease of 21.33% (P < .0001). At the same time, similarly adjusted routine laboratory study utilization decreased by less than half that rate (9.4%, P < .0001). On the basis of unadjusted data, outpatient high-cost imaging utilization in this cohort decreased 28%, compared with a 20% decrease in statewide utilization (P = .0023). 
Conclusion Analysis of high-cost imaging utilization in a stable cohort of patients cared for by PCPs during a 7-year period showed that comprehensive UM can produce a significant and sustained reduction in risk-adjusted per-patient year outpatient high-cost imaging volume. © RSNA, 2017.
Subject(s)
Diagnostic Imaging; Outpatients/statistics & numerical data; Primary Health Care; Diagnostic Imaging/economics; Diagnostic Imaging/statistics & numerical data; Female; Humans; Male; Middle Aged; Physicians, Primary Care/statistics & numerical data; Primary Health Care/economics; Primary Health Care/statistics & numerical data; Retrospective Studies
ABSTRACT
PURPOSE: To determine the relevant physician- and practice-related factors that jointly affect the rate of low-utility imaging examinations (score of 1-3 out of 9) ordered by means of an order entry system that provides normative appropriateness feedback. MATERIALS AND METHODS: This HIPAA-compliant study was approved by the institutional review board under an expedited protocol for analyzing anonymous aggregated administrative data. This is a retrospective study of approximately 250 000 consecutive scheduled outpatient advanced imaging examinations (computed tomography, magnetic resonance imaging, nuclear medicine) ordered by 164 primary care and 379 medical specialty physicians from 2008 to 2012. A hierarchical logistic regression model was used to identify multiple predictors of the probability that an examination received a low utility score. Physician- and practice-specific random effects were estimated to articulate (odds ratio) and quantify (intraclass correlation) interphysician variation. RESULTS: Fixed effects found to be statistically significant predictors of low-utility imaging included examination type, whether the examination was cancelled, status of the person entering the order, and the total number of examinations ordered by the clinician. Neither patient age nor sex had any effect, and there were no secular trends (year of study). The remaining amount of interphysician variation was moderate (intraclass correlation, 22%), whereas the variation between medical specialties and primary care practices was low (intraclass correlation, 5%). The estimated physician-specific effects had reliability of 70%, which makes them just suitable for identifying outliers. CONCLUSION: The authors found that 22% of the variation in the rate of low-utility examinations is attributable to ordering providers and 5% to their specialty or clinic.
Subject(s)
Diagnostic Imaging/statistics & numerical data; Feedback; Medical Order Entry Systems/statistics & numerical data; Practice Patterns, Physicians'; Adult; Female; Humans; Male; Middle Aged; Retrospective Studies; Young Adult
ABSTRACT
OBJECTIVE: Informatics innovations of the past 30 years have improved radiology quality and efficiency immensely. Radiologists are groundbreaking leaders in clinical information technology (IT), and often radiologists and imaging informaticists created, specified, and implemented these technologies, while also carrying the ongoing burdens of training, maintenance, support, and operation of these IT solutions. Being pioneers of clinical IT had advantages of local radiology control and radiology-centric products and services. As health care businesses become more clinically IT savvy, however, they are standardizing IT products and procedures across the enterprise, resulting in the loss of radiologists' local control and flexibility. Although this inevitable consequence may provide new opportunities in the long run, several questions arise. CONCLUSION: What will happen to the informatics expertise within the radiology domain? Will radiology's current and future concerns be heard and their needs addressed? What should radiologists do to understand, obtain, and use informatics products to maximize efficiency and provide the most value and quality for patients and the greater health care community? This article will propose some insights and considerations as we rethink radiology informatics.
Subject(s)
Diagnostic Imaging/trends; Medical Informatics Applications; Diffusion of Innovation; Efficiency, Organizational; Forecasting; Humans; Radiology Department, Hospital/trends; Radiology Information Systems/trends
ABSTRACT
The goal of this work is to provide radiologists an update regarding changes to stage 1 of meaningful use in 2014. These changes were promulgated in the final rulemaking released by the Centers for Medicare and Medicaid Services and the Office of the National Coordinator for Health Information Technology in September 2012. Under the new rules, radiologists are exempt from meaningful use penalties provided that they are listed as radiologists under the Provider Enrollment, Chain and Ownership System (PECOS). A major caveat is that this exemption can be removed at any time. Additional concerns are discussed in the main text. Additional changes discussed include software editions independent of meaningful use stage (i.e., 2011 edition versus 2014 edition), changes to the definition of certified electronic health record technology (CEHRT), and changes to specific measures and exemptions to those measures. The new changes regarding stage 1 add complexity to an already complex program, but overall make achieving meaningful use a win-win situation for radiologists. There are no penalties for failure and incentive payments for success. The cost of upgrading to CEHRT may be much less than the incentive payments, adding a potential new source of revenue. Additional benefits may be realized if the radiology department can build upon a modern electronic health record to improve their practice and billing patterns. Meaningful use and electronic health records represent an important evolutionary step in US healthcare, and it is imperative that radiologists are active participants in the process.
Subject(s)
Electronic Health Records/economics; Meaningful Use/economics; Medical Informatics/economics; Radiology/economics; Diffusion of Innovation; Female; Humans; Male; Medicaid/economics; Medicare/economics; United States
ABSTRACT
PURPOSE: We compared the performance of generative artificial intelligence (AI) (Augmented Transformer Assisted Radiology Intelligence [ATARI, Microsoft Nuance, Microsoft Corporation, Redmond, Washington]) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images. METHODS: We used an NLP-based (mPower, Microsoft Nuance) tool to identify radiology reports flagged for laterality errors in its Quality Assurance Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1,124 radiology reports flagged by the NLP for laterality errors, we selected and evaluated 898 reports that encompassed radiography, CT, MRI, and ultrasound modalities to ensure comprehensive coverage. A radiologist reviewed each radiology report to assess if the flagged laterality errors were present (reporting error-true-positive) or absent (NLP error-false-positive). Next, we applied ATARI to 237 radiology reports and images with consecutive NLP true-positive (118 reports) and false-positive (119 reports) laterality errors. We estimated accuracy of NLP and generative AI tools to identify overall and modality-wise laterality errors. RESULTS: Among the 898 NLP-flagged laterality errors, 64% (574 of 898) had NLP errors and 36% (324 of 898) were reporting errors. The text query ATARI feature correctly identified the absence of laterality mismatch (NLP false-positives) with a 97.4% accuracy (115 of 118 reports; 95% confidence interval [CI] = 96.5%-98.3%). Combined vision and text query resulted in 98.3% accuracy (116 of 118 reports or images; 95% CI = 97.6%-99.0%), and query alone had a 98.3% accuracy (116 of 118 images; 95% CI = 97.6%-99.0%). CONCLUSION: The generative AI-empowered ATARI prototype outperformed the assessed NLP tool for determining true and false laterality errors in radiology reports while enabling an image-based laterality determination. 
Underlying errors in ATARI text query in complex radiology reports emphasize the need for further improvement in the technology.
Subject(s)
Artificial Intelligence; Natural Language Processing; Humans; Radiology Information Systems; Diagnostic Errors; Diagnostic Imaging
ABSTRACT
PURPOSE: We created an infrastructure for a no-code machine learning (NML) platform that lets non-programming physicians create NML models. We tested the platform by creating an NML model for classifying radiographs for the presence or absence of clavicle fractures. METHODS: Our IRB-approved retrospective study included 4135 clavicle radiographs from 2039 patients (mean age 52 ± 20 years, F:M 1022:1017) from 13 hospitals. Each patient had two-view clavicle radiographs with axial and anterior-posterior projections. The positive radiographs had either displaced or non-displaced clavicle fractures. We configured the NML platform to automatically retrieve the eligible exams using the series' unique identification from the hospital virtual network archive via web access to DICOM Objects. The platform trained a model until the validation loss plateaued. Once the testing was complete, the platform provided the receiver operating characteristics curve and confusion matrix for estimating sensitivity, specificity, and accuracy. RESULTS: The NML platform successfully retrieved 3917 radiographs (3917/4135, 94.7%) and parsed them for creating an ML classifier with 2151 radiographs in the training, 100 radiographs in the validation, and 1666 radiographs in the testing datasets (772 radiographs with clavicle fracture, 894 without clavicle fracture). The network identified clavicle fracture with 90% sensitivity, 87% specificity, and 88% accuracy with an AUC of 0.95 (confidence interval 0.94-0.96). CONCLUSION: An NML platform can help physicians create and test machine learning models from multicenter imaging datasets such as the one in our study for classifying radiographs based on the presence of clavicle fracture.
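As a minimal sketch of how sensitivity, specificity, and accuracy follow from a confusion matrix such as the one the platform reports, the Python below may help. The abstract gives only the test-set totals (772 radiographs with fracture, 894 without); the four cell counts used here are hypothetical values chosen to be roughly consistent with the reported rates, and the class and method names are illustrative rather than part of the platform.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # fracture present, classifier predicts fracture
    fn: int  # fracture present, classifier predicts no fracture
    tn: int  # fracture absent, classifier predicts no fracture
    fp: int  # fracture absent, classifier predicts fracture

    def sensitivity(self) -> float:
        # Share of fracture-positive radiographs that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of fracture-negative radiographs that were correctly cleared.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all radiographs that were classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical cell counts (assumption): only the row totals, 772 positive
# and 894 negative test radiographs, are taken from the abstract.
example = ConfusionMatrix(tp=695, fn=77, tn=778, fp=116)

print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
print(f"accuracy    = {example.accuracy():.2f}")
```

Accuracy here is a prevalence-weighted mix of sensitivity and specificity, so it falls between the two whenever both classes are present in the test set.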
Subject(s)
Clavicle; Fractures, Bone; Machine Learning; Humans; Clavicle/injuries; Clavicle/diagnostic imaging; Fractures, Bone/diagnostic imaging; Fractures, Bone/classification; Female; Middle Aged; Male; Retrospective Studies; Sensitivity and Specificity; Adult; Radiography/methods
ABSTRACT
BACKGROUND AND PURPOSE: Mass effect and vasogenic edema are critical findings on CT of the head. This study compared the accuracy of an artificial intelligence model (Annalise Enterprise CTB) with consensus neuroradiologists' interpretations in detecting mass effect and vasogenic edema. MATERIALS AND METHODS: A retrospective stand-alone performance assessment was conducted on data sets of noncontrast CT head cases acquired between 2016 and 2022 for each finding. The cases were obtained from patients 18 years of age or older from 5 hospitals in the United States. The positive cases were selected consecutively on the basis of the original clinical reports using natural language processing and manual confirmation. The negative cases were selected by taking the next negative case acquired from the same CT scanner after positive cases. Each case was interpreted independently by up-to-three neuroradiologists to establish consensus interpretations. Each case was then interpreted by the artificial intelligence model for the presence of the relevant finding. The neuroradiologists were provided with the entire CT study. The artificial intelligence model separately received thin (≤1.5 mm) and/or thick (>1.5 and ≤5 mm) axial series. RESULTS: The 2 cohorts included 818 cases for mass effect and 310 cases for vasogenic edema. The artificial intelligence model identified mass effect with a sensitivity of 96.6% (95% CI, 94.9%-98.2%) and a specificity of 89.8% (95% CI, 84.7%-94.2%) for the thin series, and 95.3% (95% CI, 93.5%-96.8%) and 93.1% (95% CI, 89.1%-96.6%) for the thick series. It identified vasogenic edema with a sensitivity of 90.2% (95% CI, 82.0%-96.7%) and a specificity of 93.5% (95% CI, 88.9%-97.2%) for the thin series, and 90.0% (95% CI, 84.0%-96.0%) and 95.5% (95% CI, 92.5%-98.0%) for the thick series. The corresponding areas under the curve were at least 0.980. 
CONCLUSIONS: The assessed artificial intelligence model accurately identified mass effect and vasogenic edema in this CT data set. It could assist the clinical workflow by prioritizing interpretation of cases with abnormal findings, possibly benefiting patients through earlier identification and subsequent treatment.
Subject(s)
Artificial Intelligence; Brain Edema; Tomography, X-Ray Computed; Humans; Brain Edema/diagnostic imaging; Retrospective Studies; Female; Tomography, X-Ray Computed/methods; Male; Middle Aged; Aged; Sensitivity and Specificity; Adult
ABSTRACT
PURPOSE: To assess the ability of the Annalise Enterprise CXR Triage Trauma (Annalise AI Pty Ltd, Sydney, NSW, Australia) artificial intelligence model to identify vertebral compression fractures on chest radiographs and its potential to address undiagnosed osteoporosis and its treatment. MATERIALS AND METHODS: This retrospective study used a consecutive cohort of 596 chest radiographs from four US hospitals between 2015 and 2021. Each radiograph included both frontal (anteroposterior or posteroanterior) and lateral projections. These radiographs were assessed for the presence of vertebral compression fracture in a consensus manner by up to three thoracic radiologists. The model then performed inference on the cases. A chart review was also performed for the presence of osteoporosis-related International Classification of Diseases, 10th revision diagnostic codes and medication use for the study period and an additional year of follow-up. RESULTS: The model successfully completed inference on 595 cases (99.8%); these cases included 272 positive cases and 323 negative cases. The model performed with an area under the receiver operating characteristic curve of 0.955 (95% confidence interval [CI]: 0.939-0.968), a sensitivity of 89.3% (95% CI: 85.7%-92.7%), and a specificity of 89.2% (95% CI: 85.4%-92.3%). Out of the 236 true-positive cases (ie, correctly identified vertebral compression fractures by the model) with available chart information, only 86 (36.4%) had a diagnosis of vertebral compression fracture and 140 (59.3%) had a diagnosis of either osteoporosis or osteopenia; only 78 (33.1%) were receiving a disease-modifying medication for osteoporosis. CONCLUSION: The model identified vertebral compression fracture accurately with a sensitivity of 89.3% (95% CI: 85.7%-92.7%) and a specificity of 89.2% (95% CI: 85.4%-92.3%). Its automated use could help identify patients who have undiagnosed osteoporosis and who may benefit from taking disease-modifying medications.
ABSTRACT
The opportunistic use of radiological examinations for disease detection can potentially enable timely management. We assessed if an index created by an AI software to quantify chest radiography (CXR) findings associated with heart failure (HF) could distinguish between patients who would develop HF or not within a year of the examination. Our multicenter retrospective study included patients who underwent CXR without an HF diagnosis. We included 1117 patients (age 67.6 ± 13 years; m:f 487:630) that underwent CXR. A total of 413 patients had the CXR image taken within one year of their HF diagnosis. The rest (n = 704) were patients without an HF diagnosis after the examination date. All CXR images were processed with the model (qXR-HF, Qure.AI) to obtain information on cardiac silhouette, pleural effusion, and the index. We calculated the accuracy, sensitivity, specificity, and area under the curve (AUC) of the index to distinguish patients who developed HF within a year of the CXR and those who did not. We report an AUC of 0.798 (95%CI 0.77-0.82), accuracy of 0.73, sensitivity of 0.81, and specificity of 0.68 for the overall AI performance. AI AUCs by lead time to diagnosis (<3 months: 0.85; 4-6 months: 0.82; 7-9 months: 0.75; 10-12 months: 0.71), accuracy (0.68-0.72), and specificity (0.68) remained stable. Our results support the ongoing investigation efforts for opportunistic screening in radiology.
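For readers less familiar with the AUC, the following Python sketch shows one standard way to compute it: as the probability that a randomly chosen patient who developed HF receives a higher index than a randomly chosen patient who did not, with ties counted as one half. This is the generic rank-based definition, not necessarily the software the authors used, and the function name and index values are hypothetical.

```python
from typing import Sequence


def auc_from_scores(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive case scores higher, 0.5 on a tie,
    and 0 otherwise.
    """
    if len(positives) == 0 or len(negatives) == 0:
        raise ValueError("both groups need at least one score")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical index values (assumption), not data from the study.
developed_hf = [0.9, 0.8, 0.4]
no_hf = [0.5, 0.3, 0.2]
print(f"AUC = {auc_from_scores(developed_hf, no_hf):.3f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a sort-based implementation would be preferable for large cohorts.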
ABSTRACT
PURPOSE: To assess the feasibility of automated segmentation and measurement of tracheal collapsibility for detecting tracheomalacia on inspiratory and expiratory chest CT images. METHODS: Our study included 123 patients (age 67 ± 11 years; female:male 69:54) who underwent clinically indicated chest CT examinations in both inspiration and expiration phases. A thoracic radiologist measured the anteroposterior length of the trachea on inspiration- and expiration-phase images at the level of maximum collapsibility or at the aortic arch (in the absence of luminal change). Separately, another investigator processed the inspiratory and expiratory DICOM CT images with the Airway Segmentation component of a commercial COPD software (IntelliSpace Portal, Philips Healthcare). Upon segmentation, the software automatically estimated average lumen diameter (in mm) and lumen area (sq.mm) both along the entire length of the trachea and at the level of the aortic arch. Data were analyzed with independent t-tests and area under the receiver operating characteristic curve (AUC). RESULTS: Of the 123 patients, 48 patients had tracheomalacia and 75 patients did not. Ratios of inspiration-phase to expiration-phase average lumen area and lumen diameter over the length of the trachea had the highest AUC of 0.93 (95% CI = 0.88-0.97) for differentiating presence and absence of tracheomalacia. A decrease of ≥25% in average lumen diameter had sensitivity of 82% and specificity of 87% for detecting tracheomalacia. A decrease of ≥40% in the average lumen area had sensitivity and specificity of 86% for detecting tracheomalacia. CONCLUSION: Automatic segmentation and measurement of tracheal dimension over the entire tracheal length is more accurate than a single-level measurement for detecting tracheomalacia.
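The two cutoffs reported above (a decrease of ≥25% in average lumen diameter, or ≥40% in average lumen area) can be sketched as a simple calculation. The Python below is an illustrative sketch only: it assumes the decrease is measured from inspiration to expiration relative to the inspiratory value, the function names and input measurements are hypothetical, and the two criteria are kept separate because the abstract reports them separately.

```python
def percent_decrease(inspiration: float, expiration: float) -> float:
    """Percent drop from the inspiratory to the expiratory measurement."""
    if inspiration <= 0:
        raise ValueError("inspiratory measurement must be positive")
    return 100.0 * (inspiration - expiration) / inspiration


def collapsibility_flags(
    insp_diameter_mm: float,
    exp_diameter_mm: float,
    insp_area_mm2: float,
    exp_area_mm2: float,
    diameter_cutoff_pct: float = 25.0,  # cutoff reported in the abstract
    area_cutoff_pct: float = 40.0,      # cutoff reported in the abstract
) -> dict:
    """Evaluate each reported cutoff on its own; they are not combined."""
    diameter_drop = percent_decrease(insp_diameter_mm, exp_diameter_mm)
    area_drop = percent_decrease(insp_area_mm2, exp_area_mm2)
    return {
        "diameter_drop_pct": diameter_drop,
        "area_drop_pct": area_drop,
        "meets_diameter_cutoff": diameter_drop >= diameter_cutoff_pct,
        "meets_area_cutoff": area_drop >= area_cutoff_pct,
    }


# Hypothetical average lumen measurements (assumption), not study data.
result = collapsibility_flags(20.0, 14.0, 300.0, 200.0)
print(result)
```

In this made-up case the diameter falls by 30% and the area by about 33%, so only the diameter criterion is met.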
Subject(s)
Tracheomalacia; Humans; Male; Female; Middle Aged; Aged; Tracheomalacia/diagnostic imaging; Trachea/diagnostic imaging; Tomography, X-Ray Computed/methods; Sensitivity and Specificity; ROC Curve
ABSTRACT
OBJECTIVE: Despite rising popularity and performance, studies evaluating the use of large language models for clinical decision support are lacking. Here, we evaluate ChatGPT (Generative Pre-trained Transformer)-3.5 and GPT-4's (OpenAI, San Francisco, California) capacity for clinical decision support in radiology via the identification of appropriate imaging services for two important clinical presentations: breast cancer screening and breast pain. METHODS: We compared ChatGPT's responses to the ACR Appropriateness Criteria for breast pain and breast cancer screening. Our prompt formats included an open-ended (OE) and a select all that apply (SATA) format. Scoring criteria evaluated whether proposed imaging modalities were in accordance with ACR guidelines. Three replicate entries were conducted for each prompt, and the average of these was used to determine final scores. RESULTS: Both ChatGPT-3.5 and ChatGPT-4 achieved an average OE score of 1.830 (out of 2) for breast cancer screening prompts. ChatGPT-3.5 achieved a SATA average percentage correct of 88.9%, compared with ChatGPT-4's average percentage correct of 98.4% for breast cancer screening prompts. For breast pain, ChatGPT-3.5 achieved an average OE score of 1.125 (out of 2) and a SATA average percentage correct of 58.3%, as compared with an average OE score of 1.666 (out of 2) and a SATA average percentage correct of 77.7% for ChatGPT-4. DISCUSSION: Our results demonstrate the eventual feasibility of using large language models like ChatGPT for radiologic decision making, with the potential to improve clinical workflow and responsible use of radiology services. More use cases and greater accuracy are necessary to evaluate and implement such tools.
Subject(s)
Breast Neoplasms; Mastodynia; Radiology; Humans; Female; Breast Neoplasms/diagnostic imaging; Decision Making
ABSTRACT
PURPOSE: Knowledge of kidney stone composition can help in patient management; urine composition analysis and dual-energy CT are frequently used to assess stone type. We assessed if threshold-based stone segmentation and radiomics can determine the composition of kidney stones from single-energy, non-contrast abdomen-pelvis CT. METHODS: With IRB approval, we identified 218 consecutive patients (mean age 64 ± 13 years; male:female 138:80) with the presence of kidney stones on non-contrast, abdomen-pelvis CT and surgical or biochemical proof of their stone composition. CT examinations were performed on one of the seven multidetector-row scanners from four vendors (GE, Philips, Siemens, Toshiba). Deidentified CT images were processed with a radiomics prototype (Frontier, Siemens Healthineers) to segment the entire kidney volumes with an AI-based organ segmentation tool. We applied a threshold of 130 HU to isolate stones in the segmented kidneys and to estimate radiomics over the segmented stone volume. A coinvestigator verified kidney stone segmentation and adjusted the volume of interest to include the entire stone volume when necessary. We applied multiple logistic regression tests with precision recall plots to obtain area under the curve (AUC) using a built-in R statistical program. RESULTS: The threshold-based stone segmentation successfully isolated kidney stones (uric acid: n = 102 patients, calcium oxalate/phosphate: n = 116 patients) in all patients. Radiomics differentiated between calcium and uric acid stones with an AUC of 0.78 (p < 0.01, 95% CI 0.73-0.83), 0.79 sensitivity, and 0.90 specificity regardless of CT vendors (GE CT: AUC = 0.82, p < 0.01, 95% CI 0.740-0.896; Siemens CT: AUC = 0.77, 95% CI 0.700-0.846, p < 0.01). CONCLUSION: Automated threshold-based stone segmentation and radiomics can differentiate between calcium oxalate/phosphate and urate stones from non-contrast, single-energy abdomen CT.
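The thresholding step described in the methods can be illustrated with a short Python sketch. It is not the prototype's implementation: it assumes a single CT slice given as nested lists of Hounsfield units, a binary kidney mask from a prior segmentation step, and a cutoff applied as "at or above 130 HU" (the abstract does not say whether the boundary is inclusive). All names and values are hypothetical, and a real pipeline would normally work on 3D arrays.

```python
from typing import List

STONE_THRESHOLD_HU = 130  # threshold stated in the abstract


def isolate_stone_voxels(
    hu_slice: List[List[float]],
    kidney_mask: List[List[int]],
    threshold_hu: float = STONE_THRESHOLD_HU,
) -> List[List[bool]]:
    """Mark voxels that lie inside the kidney mask and reach the threshold.

    Assumption: the cutoff is inclusive (>= threshold_hu).
    """
    return [
        [bool(in_kidney) and hu >= threshold_hu
         for hu, in_kidney in zip(hu_row, mask_row)]
        for hu_row, mask_row in zip(hu_slice, kidney_mask)
    ]


def count_stone_voxels(stone_mask: List[List[bool]]) -> int:
    """Number of voxels flagged as stone in the slice."""
    return sum(sum(row) for row in stone_mask)


# Hypothetical 2 x 3 slice (assumption): HU values and a kidney mask.
hu_slice = [[30, 150, 400],
            [900, 120, 135]]
kidney_mask = [[1, 1, 0],
               [0, 1, 1]]

stone_mask = isolate_stone_voxels(hu_slice, kidney_mask)
print(stone_mask, count_stone_voxels(stone_mask))
```

Restricting the threshold to the kidney mask is what keeps dense structures outside the kidney, such as bone, from being picked up as stone.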