Results 1 - 4 of 4
1.
J Med Internet Res ; 25: e49324, 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37902826

ABSTRACT

BACKGROUND: As advancements in artificial intelligence (AI) continue, large language models (LLMs) have emerged as promising tools for generating medical information. Their rapid adoption and potential benefits in health care require rigorous assessment of the quality, accuracy, and safety of the generated information across diverse medical specialties. OBJECTIVE: This study aimed to evaluate the performance of 4 prominent LLMs, namely, Claude-instant-v1.0, GPT-3.5-Turbo, Command-xlarge-nightly, and Bloomz, in generating medical content spanning the clinical specialties of ophthalmology, orthopedics, and dermatology. METHODS: Three domain-specific physicians evaluated the AI-generated therapeutic recommendations for a diverse set of 60 diseases. The evaluation criteria were the mDISCERN score, correctness, and potential harmfulness of the recommendations. ANOVA and pairwise t tests were used to explore differences in content quality and safety across models and specialties. Additionally, OpenAI's most advanced model, GPT-4, was used to perform an automated evaluation of each model's responses using the same criteria, and these ratings were compared with the physicians' assessments through Pearson correlation analysis. RESULTS: Claude-instant-v1.0 achieved the highest mean mDISCERN score (3.35, 95% CI 3.23-3.46), whereas Bloomz had the lowest (1.07, 95% CI 1.03-1.10). Our analysis revealed significant differences in quality among the models (P<.001). In terms of reliability, the models' falseness ratings differed sharply, with variation both across models (P<.001) and across specialties (P<.001). Distinct error patterns emerged, such as confusing diagnoses; providing vague, ambiguous advice; or omitting critical treatments, such as antibiotics for infectious diseases. Regarding potential harm, GPT-3.5-Turbo was found to be the safest, with the lowest harmfulness rating. All models fell short in detailing the risks associated with treatment procedures, explaining the effects of therapies on quality of life, and offering additional sources of information. Pearson correlation analysis showed substantial agreement between physician assessments and GPT-4's evaluations across all established criteria (P<.01). CONCLUSIONS: This study, while comprehensive, was limited by the small number of specialties and physician evaluators involved. The straightforward prompting strategy ("How to treat…") and the assessment benchmarks, which were originally designed for human-authored content, may not fully capture the nuances of AI-generated information. The LLMs evaluated showed a notable capability to generate valuable medical content; however, evident lapses in content quality and potential harm signal the need for further refinement. Given the dynamic landscape of LLMs, these findings emphasize the need for regular, methodical assessment, oversight, and fine-tuning of these AI tools to ensure that they produce consistently trustworthy and clinically safe medical advice. Notably, the GPT-4-based auto-evaluation mechanism introduced in this study provides a scalable, transferable method for domain-agnostic evaluations that extends beyond therapy recommendation assessment.


Subjects
Artificial Intelligence, Medicine, Humans, Quality of Life, Reproducibility of Results, Language
2.
NPJ Digit Med ; 7(1): 205, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39112822

ABSTRACT

This study evaluates the accuracy and responsiveness of multimodal AI models in answering NEJM Image Challenge questions, compared with human collective intelligence, highlighting both the potential and the current limitations of AI in clinical diagnostics. Anthropic's Claude 3 family demonstrated the highest accuracy among the evaluated AI models, surpassing the average human accuracy, whereas collective human decision-making outperformed all AI models. GPT-4 Vision Preview was selective in which questions it answered, responding more often to easier questions, questions with smaller images, and longer questions.

3.
J Pers Med ; 14(5)2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38793055

ABSTRACT

BACKGROUND: Understanding the dynamics of conduction velocity (CV) and voltage amplitude (VA) is crucial in cardiac electrophysiology, particularly for substrate-based catheter ablations targeting slow conduction zones and low-voltage areas. This study uses ultra-high-density mapping to investigate the impact of heart rate and pacing location on changes in wavefront direction, CV, and VA in healthy pig hearts. METHODS: We conducted in vivo electrophysiological studies in four healthy juvenile pigs using various pacing locations and heart rates. High-resolution electroanatomic mapping was performed during intrinsic normal sinus rhythm (NSR) and during electrical pacing. Analyses were performed at three levels: entire heart cavities, subregions, and localized circular areas 5 mm in diameter. Linear mixed-effects models were used to analyze the influence of heart rate and pacing location on CV and VA in different regions. RESULTS: An increase in heart rate correlated with an increase in CV and a decrease in VA. Pacing also influenced CV and VA, with effects that varied by pacing location within the different heart cavities. Pacing from the right atrium (RA) decreased CV in all heart cavities. CV and VA changes in the whole heart cavities were not uniformly reflected in all subregions, and subregional CV and VA changes were not always reflected in the overall analysis. Overall, the absolute CV and VA changes attributed to pacing varied notably. CONCLUSIONS: Heart rate and pacing location influence CV and VA in healthy juvenile pig hearts. Subregional analysis suggests that specific regions of the heart cavities are more susceptible to pacing. High-resolution mapping aids in detecting regional changes and highlights the substantial physiological variation in CV and VA.

4.
J Clin Med ; 12(17)2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37685665

ABSTRACT

BACKGROUND: Ultra-high-density mapping systems allow more precise measurement of conduction velocities (CVs) and voltage amplitudes (VAs) in the heart chambers. Our aim in this study was to define and compare a baseline set of values for unipolar CV and VA in all four heart chambers and their separate walls in healthy, juvenile porcine hearts using ultra-high-density mapping. METHODS: We used the Rhythmia Mapping System to create electroanatomical maps of four pig hearts in sinus rhythm. CVs and VAs were calculated for chambers and wall segments using overlapping circular areas (radius of 5 mm). RESULTS: We analysed 21 maps with a resolution of 1.4 points/mm². CVs were highest in the left atrium (LA), followed by the left ventricle (LV), right ventricle (RV), and right atrium (RA). VA was highest in the LV, followed by the RV, LA, and RA. The left chambers had higher overall CV and VA than the right. Within chambers, CV varied more in the right chambers than in the left, and VA varied within the ventricles but not within the atria. There was a slight positive correlation between CVs and VAs at velocities <1.5 m/s. CONCLUSIONS: In healthy porcine hearts, the left chambers showed higher VAs and CVs than the right. CV differs mainly within the right chambers, and VA differs only within the ventricles. A slight positive linear correlation was found between slow CVs and low VAs.
