ABSTRACT
BACKGROUND: As advancements in artificial intelligence (AI) continue, large language models (LLMs) have emerged as promising tools for generating medical information. Their rapid adaptation and potential benefits in health care require rigorous assessment in terms of the quality, accuracy, and safety of the generated information across diverse medical specialties. OBJECTIVE: This study aimed to evaluate the performance of 4 prominent LLMs, namely, Claude-instant-v1.0, GPT-3.5-Turbo, Command-xlarge-nightly, and Bloomz, in generating medical content spanning the clinical specialties of ophthalmology, orthopedics, and dermatology. METHODS: Three domain-specific physicians evaluated the AI-generated therapeutic recommendations for a diverse set of 60 diseases. The evaluation criteria involved the mDISCERN score, correctness, and potential harmfulness of the recommendations. ANOVA and pairwise t tests were used to explore discrepancies in content quality and safety across models and specialties. Additionally, using the capabilities of OpenAI's most advanced model, GPT-4, an automated evaluation of each model's responses to the diseases was performed using the same criteria and compared to the physicians' assessments through Pearson correlation analysis. RESULTS: Claude-instant-v1.0 emerged with the highest mean mDISCERN score (3.35, 95% CI 3.23-3.46). In contrast, Bloomz lagged with the lowest score (1.07, 95% CI 1.03-1.10). Our analysis revealed significant differences among the models in terms of quality (P<.001). Evaluating their reliability, the models displayed strong contrasts in their falseness ratings, with variations both across models (P<.001) and specialties (P<.001). Distinct error patterns emerged, such as confusing diagnoses; providing vague, ambiguous advice; or omitting critical treatments, such as antibiotics for infectious diseases. Regarding potential harm, GPT-3.5-Turbo was found to be the safest, with the lowest harmfulness rating. All models lagged in detailing the risks associated with treatment procedures, explaining the effects of therapies on quality of life, and offering additional sources of information. Pearson correlation analysis underscored a substantial alignment between physician assessments and GPT-4's evaluations across all established criteria (P<.01). CONCLUSIONS: This study, while comprehensive, was limited by the involvement of a select number of specialties and physician evaluators. The straightforward prompting strategy ("How to treat ") and the assessment benchmarks, initially conceptualized for human-authored content, might have potential gaps in capturing the nuances of AI-driven information. The LLMs evaluated showed a notable capability in generating valuable medical content; however, evident lapses in content quality and potential harm signal the need for further refinements. Given the dynamic landscape of LLMs, this study's findings emphasize the need for regular and methodical assessments, oversight, and fine-tuning of these AI tools to ensure they produce consistently trustworthy and clinically safe medical advice. Notably, the introduction of an auto-evaluation mechanism using GPT-4, as detailed in this study, provides a scalable, transferable method for domain-agnostic evaluations, extending beyond therapy recommendation assessments.
Subjects
Artificial Intelligence , Medicine , Humans , Quality of Life , Reproducibility of Results , Language
ABSTRACT
BACKGROUND: In response to students' poor ratings of emergency remote lectures in internal medicine, a team of undergraduate medical students initiated a series of voluntary peer-moderated clinical case discussions. This study aims to describe the student-led effort to develop peer-moderated clinical case discussions focused on training cognitive clinical skills for first- and second-year clinical students. METHODS: Following the Kern Cycle, a didactic concept is conceived by matching cognitive learning theory to the competence levels of the German Medical Training Framework. A 50-item survey is developed based on previous evaluation tools and administered after each tutorial. Educational environment, cognitive congruence, and learning outcomes are assessed using pre-post self-reports in a single-institution study. RESULTS: Over the course of two semesters, 19 tutors conducted 48 tutorials. There were 794 attendances in total (273 in the first semester and 521 in the second). The response rate was 32%. The didactic concept proved successful in attaining all learning objectives. Students rated the educational environment, cognitive congruence, and tutorials overall as "very good" and significantly better than the corresponding lecture. Students reported a 70% increase in positive feelings about being tutored by peers after the session. CONCLUSION: Peer-assisted learning can improve students' subjective satisfaction levels and successfully foster clinical reasoning skills. This highlights successful student contributions to the development of curricula.
Subjects
Medical Students , Teacher Training , Humans , Peer Group , Internal Medicine , Curriculum
ABSTRACT
INTRODUCTION: Remission is the ultimate goal in systemic lupus erythematosus (SLE). In this study, we applied four definitions of remission agreed on by an international collaboration (Definitions of Remission in SLE, DORIS) to a large clinical cohort to estimate rates and predictors of remission. METHODS: We applied the DORIS definitions of Clinical Remission, Complete Remission (requiring negative serologies), Clinical Remission on treatment (ROT) and Complete ROT. 2307 patients entered the cohort from 1987 to 2014 and were seen at least quarterly. Patients not in remission at cohort entry were followed prospectively. We used the Kaplan-Meier approach to estimate the time to remission and the time from remission to relapse. Cox regression was used to identify baseline factors associated with time to remission, adjusting for baseline disease activity and baseline treatment. RESULTS: The median time to remission was 8.7, 11.0, 1.8 and 3.1 years for Clinical Remission, Complete Remission, Clinical ROT and Complete ROT, respectively. High baseline treatment was the major predictor of a longer time to remission, followed by high baseline activity. The median duration of remission for all definitions was 3 months. African-American ethnicity, baseline low C3 and baseline haematological activity were associated with longer time to remission for all definitions. Baseline anti-dsDNA and baseline low C4 were associated with longer time to Complete Remission and Complete ROT. Baseline low C4 was also negatively associated with Clinical Remission. CONCLUSIONS: Our results provide further insights into the frequency and duration of remission in SLE and call attention to the major role of baseline activity and baseline treatment in predicting remission.
Subjects
Systemic Lupus Erythematosus/blood , Systemic Lupus Erythematosus/drug therapy , Adult , Anti-Inflammatory Agents/therapeutic use , Antinuclear Antibodies/blood , Complement C3/metabolism , Complement C4/metabolism , DNA/immunology , Disease-Free Survival , Female , Humans , Immunosuppressive Agents/therapeutic use , Kaplan-Meier Estimate , Systemic Lupus Erythematosus/ethnology , Male , Middle Aged , Predictive Value of Tests , Prednisone/therapeutic use , Proportional Hazards Models , Remission Induction , Time Factors
ABSTRACT
Plasmacytoid dendritic cells (pDCs) play a central role in the pathogenesis of systemic lupus erythematosus (SLE) as IFN-α producers and promoters of T-cell activation or tolerance. Here, we demonstrated by flow cytometry and confocal microscopy that Siglec-1, a molecule involved in the regulation of adaptive immune responses, is expressed in a subset of semi-mature, myeloid-like pDCs in human blood. These pDCs express lower BDCA-2 and CD123 and higher HLA-DR and CD11c than Siglec-1-negative pDCs and do not produce IFN-α via TLR7/TLR9 engagement. In vitro, Siglec-1 expression was induced in Siglec-1-negative pDCs by influenza virus. Proportions of Siglec-1-positive/Siglec-1-negative pDCs were higher in SLE than in healthy controls and correlated with disease activity. Healthy donors immunized with yellow fever vaccine YFV-17D displayed different kinetics of the two pDC subsets during protective immune response. pDCs can be subdivided into two subsets according to Siglec-1 expression. These subsets may play specific roles in (auto)immune responses.
Subjects
Dendritic Cells/immunology , Influenza Vaccines/pharmacology , Systemic Lupus Erythematosus/immunology , Sialic Acid Binding Ig-like Lectin 1/metabolism , Yellow Fever Vaccine/pharmacology , Adult , Autoimmune Diseases/immunology , Autoimmunity/immunology , Case-Control Studies , Dendritic Cells/drug effects , Dendritic Cells/metabolism , Female , Flow Cytometry , Humans , Immunophenotyping , In Vitro Techniques , Interferon-alpha/immunology , Confocal Microscopy , Fluorescence Microscopy , Middle Aged , Myeloid Cells/immunology , Toll-Like Receptor 9/immunology , Young Adult
ABSTRACT
This study evaluates multimodal AI models' accuracy and responsiveness in answering NEJM Image Challenge questions, juxtaposed with human collective intelligence, underscoring AI's potential and current limitations in clinical diagnostics. Anthropic's Claude 3 family demonstrated the highest accuracy among the evaluated AI models, surpassing the average human accuracy, while collective human decision-making outperformed all AI models. GPT-4 Vision Preview exhibited selectivity, responding more to easier questions with smaller images and longer questions.
ABSTRACT
BACKGROUND: Understanding the dynamics of conduction velocity (CV) and voltage amplitude (VA) is crucial in cardiac electrophysiology, particularly for substrate-based catheter ablations targeting slow conduction zones and low voltage areas. This study utilizes ultra-high-density mapping to investigate the impact of heart rate and pacing location on changes in the wavefront direction, CV, and VA of healthy pig hearts. METHODS: We conducted in vivo electrophysiological studies on four healthy juvenile pigs, involving various pacing locations and heart rates. High-resolution electroanatomic mapping was performed during intrinsic normal sinus rhythm (NSR) and electrical pacing. The study encompassed detailed analyses at three levels: entire heart cavities, subregions, and localized 5-mm-diameter circular areas. Linear mixed-effects models were used to analyze the influence of heart rate and pacing location on CV and VA in different regions. RESULTS: An increase in heart rate correlated with an increase in CV and a decrease in VA. Pacing also influenced CV and VA, with varying effects observed based on the pacing location within different heart cavities. Pacing from the right atrium (RA) decreased CV in all heart cavities. The overall CV and VA changes in the whole heart cavities were not uniformly reflected in all subregions, and subregional CV and VA changes were not always reflected in the overall analysis. Overall, there was a notable variability in absolute CV and VA changes attributed to pacing. CONCLUSIONS: Heart rate and pacing location influence CV and VA within healthy juvenile pig hearts. Subregion analysis suggests that specific regions of the heart cavities are more susceptible to pacing. High-resolution mapping aids in detecting regional changes, emphasizing the substantial physiological variations in CV and VA.
ABSTRACT
BACKGROUND: Ultra-high-density mapping systems allow more precise measurement of the heart chambers at corresponding conduction velocities (CVs) and voltage amplitudes (VAs). Our aim for this study was to define and compare a basic value set for unipolar CV and VA in all four heart chambers and their separate walls in healthy, juvenile porcine hearts using ultra-high-density mapping. METHODS: We used the Rhythmia Mapping System to create electroanatomical maps of four pig hearts in sinus rhythm. CVs and VAs were calculated for chambers and wall segments with overlapping circular areas (radius of 5 mm). RESULTS: We analysed 21 maps with a resolution of 1.4 points/mm². CVs were highest in the left atrium (LA), followed by the left ventricle (LV), right ventricle (RV), and right atrium (RA). As for VA, LV was highest, followed by RV, LA, and RA. The left chambers had a higher overall CV and VA than the right. Within the chambers, CV varied more in the right than in the left chambers, and VA varied in the ventricles but not in the atria. There was a slightly positive correlation between CVs and VAs at velocity values of <1.5 m/s. CONCLUSIONS: In healthy porcine hearts, the left chambers showed higher VAs and CVs than the right. CV differs mainly within the right chambers and VA differs only within the ventricles. A slightly positive linear correlation was found between slow CVs and low VAs.
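The linear correlation between slow CVs and low VAs reported in this abstract can be illustrated with a minimal Python sketch of the Pearson coefficient. The abstract does not state which correlation statistic or software the authors used, so Pearson's r is an assumption here. The function name `pearson_r`, the `pairs` list, and every number in it are hypothetical placeholders, not measurements from the study; only the <1.5 m/s cutoff comes from the abstract.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (CV in m/s, VA in mV) pairs, invented purely for illustration.
pairs = [(0.4, 1.1), (0.8, 1.9), (1.2, 2.4), (1.4, 2.2), (1.9, 2.0)]

# Keep only the slow-conduction points, mirroring the <1.5 m/s cutoff.
slow = [(cv, va) for cv, va in pairs if cv < 1.5]
r = pearson_r([cv for cv, _ in slow], [va for _, va in slow])
print(f"r over {len(slow)} slow-conduction points: {r:.2f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, while a "slightly positive" correlation corresponds to a small positive r.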
ABSTRACT
Type I interferons (IFN) are central players in the pathogenesis of systemic lupus erythematosus (SLE), and the up-regulation of interferon-stimulated genes (ISGs) in SLE patients is under increasing scrutiny for its use in diagnosis, stratification and monitoring of SLE patients. Determinants of this immunological phenomenon are yet to be fully charted. The purpose of this systematic review was to characterize expression of ISGs in the blood of SLE patients and to analyze whether it was associated with core demographic and clinical features of SLE. Twenty cross-sectional, case-control studies comprising 1033 SLE patients and 602 study controls were included. ISG fold-change expression values (SLE vs controls), demographic and clinical data were extracted from the published material and analyzed by hierarchical cluster analysis and generalized linear modelling. ISG expression varied substantially within each study, with IFI27, IFI44, IFI44L, IFIT4 and RSAD2 being the top-five upregulated ISGs. Analysis of inter-study variation showed that IFI27, IFI44, IFI44L, IFIT1, PRKR and RSAD2 expression clustered with the fraction of SLE cases having African ancestry or lupus nephritis. Generalized linear models adjusted for prevalence of lupus nephritis and usage of hydroxychloroquine confirmed the observed association between African ancestry and IFI27, IFI44L, IFIT1, PRKR and RSAD2, whereas disease activity was associated with expression of IFI27 and RNASE2. In conclusion, this systematic review revealed that expression of ISGs often used for deriving an IFN signature in SLE patients was influenced by African ancestry rather than disease activity. This underscores the necessity of taking ancestry into account when employing the IFN signature for clinical research in SLE.
Subjects
Gene Expression , Interferon Type I/metabolism , Systemic Lupus Erythematosus/genetics , Signal Transducing Adaptor Proteins/genetics , Black People/genetics , Cross-Sectional Studies , Eosinophil-Derived Neurotoxin/genetics , Humans , Interferon Type I/genetics , Intracellular Signaling Peptides and Proteins/genetics , Systemic Lupus Erythematosus/blood , Lupus Nephritis , Membrane Proteins/genetics , Oxidoreductases Acting on CH-CH Group Donors/genetics , RNA-Binding Proteins/genetics
ABSTRACT
Purpose The purpose of this study was to assess the diagnostic performance of dual-energy CT angiography (DE-CTA) in patients with symptomatic peripheral artery occlusive disease (PAOD) and to identify factors that impede its diagnostic accuracy. Materials and Methods Dual-source DE-CTA scans of the lower extremities of 94 patients were retrospectively compared to the diagnostic reference standard, digital subtraction angiography (DSA). Two independent observers assessed PAOD incidence, image quality, artifacts, and diagnostic accuracy of DE-CTA in 1014 arterial segments on axial, combined 80/140 kVp reconstructions and on 3D maximum intensity projections (MIP) after automated bone and plaque removal. The impact of calcifications, image quality, and image artifacts on the diagnostic accuracy was evaluated using Fisher's exact test. Furthermore, interobserver agreement was analyzed. Results Two observers achieved sensitivities of 98.0% and 93.9%, respectively, and specificities of 75.0% and 66.7%, respectively, for detecting stenoses of > 50% of the lower extremity arteries. Calcifications impeded specificity, e.g. from 81.2% to 46.2% for reader 1 (p < 0.001). Specificity increased with higher image quality, e.g. from 70.0% to 76.4% for reader 1 (p < 0.001). Artifacts decreased the specificity of reader 2 (p < 0.001). The overall interobserver agreement ranged between moderate and substantial for stenosis detection and calcified plaques. Conclusion DE-CTA is accurate in the detection of arterial stenoses of > 50% in symptomatic PAOD patients. Calcified atherosclerotic plaques, image quality, and artifacts may impede specificity. Key Points: · Sensitivities of DE-CTA were 98.0% and 93.9%, specificities 75.0% and 66.7%. · Interobserver agreement was moderate to substantial for stenosis and plaque detection. · Calcified atherosclerotic plaques, image quality, and artifacts may impede specificity. Citation Format · Klink T, Wilhelm T, Roth C et al. Dual-Energy CTA in Patients with Symptomatic Peripheral Arterial Occlusive Disease: Study of Diagnostic Accuracy and Impeding Factors. Fortschr Röntgenstr 2017; 189: 441-452.
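For readers less familiar with the sensitivity and specificity figures quoted in this abstract, the following minimal Python sketch shows how both are derived from a 2x2 comparison of an index test (here DE-CTA) against a reference standard (here DSA). The function name `sensitivity_specificity` and all counts are hypothetical and chosen only to make the arithmetic easy to follow. They are not the study's data, which the abstract does not report at this level.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion-matrix counts.

    tp: diseased segments the index test flags as diseased
    fn: diseased segments the index test misses
    tn: healthy segments the index test calls healthy
    fp: healthy segments the index test flags as diseased
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    sensitivity = tp / (tp + fn)  # share of true stenoses that are detected
    specificity = tn / (tn + fp)  # share of healthy segments correctly cleared
    return sensitivity, specificity


# Hypothetical counts, invented for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Under this definition, more false positives (for example from heavily calcified plaques) lower specificity while leaving sensitivity unchanged, which is the pattern the abstract describes.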