Results 1 - 7 of 7
1.
Radiology ; 310(3): e231593, 2024 03.
Article in English | MEDLINE | ID: mdl-38530171

ABSTRACT

Background: The complex medical terminology of radiology reports may cause confusion or anxiety for patients, especially given increased access to electronic health records. Large language models (LLMs) can potentially simplify radiology report readability. Purpose: To compare the performance of four publicly available LLMs (ChatGPT-3.5 and ChatGPT-4, Bard [now known as Gemini], and Bing) in producing simplified radiology report impressions. Materials and Methods: In this retrospective comparative analysis of the four LLMs (accessed July 23 to July 26, 2023), the Medical Information Mart for Intensive Care (MIMIC)-IV database was used to gather 750 anonymized radiology report impressions covering a range of imaging modalities (MRI, CT, US, radiography, mammography) and anatomic regions. Three distinct prompts were employed to assess the LLMs' ability to simplify report impressions. The first prompt (prompt 1) was "Simplify this radiology report." The second prompt (prompt 2) was "I am a patient. Simplify this radiology report." The last prompt (prompt 3) was "Simplify this radiology report at the 7th grade level." Each prompt was followed by the radiology report impression and was queried once. The primary outcome was simplification as assessed by readability score. Readability was assessed using the average of four established readability indexes. The nonparametric Wilcoxon signed-rank test was applied to compare reading grade levels across LLM output. Results: All four LLMs simplified radiology report impressions across all prompts tested (P < .001). Within prompts, differences were found between LLMs. Providing the context of being a patient or requesting simplification at the seventh-grade level reduced the reading grade level of output for all models and prompts (except prompt 1 to prompt 2 for ChatGPT-4) (P < .001). Conclusion: Although the success of each LLM varied depending on the specific prompt wording, all four models simplified radiology report impressions across all modalities and prompts tested. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Rahsepar in this issue.
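
The abstract names the Wilcoxon signed-rank test as the method for comparing reading grade levels. As a rough illustration only, the sketch below computes the signed-rank statistic and a two-sided p-value from the normal approximation in plain Python. The function name `wilcoxon_signed_rank` and the sample grade levels are hypothetical, the approximation omits tie and continuity corrections, and nothing here is taken from the authors' analysis.

```python
import math

def wilcoxon_signed_rank(before, after):
    """Return (W, z, p) for paired samples using the normal approximation.

    Assumption: no tie or continuity correction is applied, so p-values are
    only approximate, especially for small samples.
    """
    # Paired differences; zero differences are discarded, as is conventional.
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w, z, p

# Hypothetical reading grade levels: original impressions vs. simplified output.
original = [12.1, 13.4, 11.8, 14.0, 12.9, 13.3]
simplified = [8.0, 9.1, 8.5, 9.0, 8.2, 9.9]
w, z, p = wilcoxon_signed_rank(original, simplified)
print(f"W={w}, z={z:.2f}, p={p:.3f}")
```

With real data, an established statistics package would be a safer choice than this hand-rolled approximation.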


Subjects
Confusion, Radiology, Humans, Retrospective Studies, Factual Databases, Language
2.
Yale J Biol Med ; 97(1): 17-27, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38559461

ABSTRACT

Enhanced health literacy in children has been empirically linked to better health outcomes over the long term; however, few interventions have been shown to improve health literacy. In this context, we investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measurement was the reading grade level (RGL) of output as assessed by the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, output for basic prompts such as "Explain" and "What is (are)" was at or above the tenth-grade RGL. When prompts were specified to explain conditions from the first- to twelfth-grade level, we found that LLMs had varying abilities to tailor responses based on grade level. ChatGPT-3.5 provided responses that ranged from the seventh-grade to the college freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs face challenges in crafting outputs below a sixth-grade RGL. However, their capability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information regarding their health conditions, spanning from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.
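
The four indices named in the abstract have standard published formulas, so a reading grade level can be sketched in a few lines of Python. The version below is a minimal illustration, not the authors' tooling: the function names are hypothetical, the syllable counter is a crude vowel-group heuristic, the Automated Readability Index counts letters only rather than letters and digits, and taking the mean of the four scores is an assumption about how they might be combined.

```python
import re

def count_syllables(word):
    """Crude heuristic (assumption): count vowel groups, dropping a silent final 'e'."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def reading_grade_levels(text):
    """Return the four index scores plus their mean for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sentences, n_words = len(sentences), len(words)
    letters = sum(ch.isalpha() for w in words for ch in w)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # Gunning Fog "hard" words
    words_per_sentence = n_words / n_sentences

    scores = {
        "flesch_kincaid": 0.39 * words_per_sentence
        + 11.8 * sum(syllables) / n_words - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / n_words),
        "automated_readability": 4.71 * letters / n_words
        + 0.5 * words_per_sentence - 21.43,
        "coleman_liau": 0.0588 * (100 * letters / n_words)
        - 0.296 * (100 * n_sentences / n_words) - 15.8,
    }
    scores["mean_rgl"] = sum(scores.values()) / 4  # mean of the four indices above
    return scores

plain = "The scan is normal. You are fine."
technical = ("Computed tomography demonstrates multifocal parenchymal consolidation "
             "without pleural effusion. Recommend radiographic surveillance.")
for label, text in (("plain", plain), ("technical", technical)):
    print(label, round(reading_grade_levels(text)["mean_rgl"], 1))
```

Because the syllable heuristic is approximate, scores from this sketch will differ somewhat from those of dedicated readability software. Very short, simple texts can also yield negative grade levels.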


Subjects
Health Literacy, Adolescent, Child, Humans, Cross-Sectional Studies, Comprehension, Reading, Language
4.
Clin Imaging ; 109: 110113, 2024 May.
Article in English | MEDLINE | ID: mdl-38552383

ABSTRACT

BACKGROUND: Applications of large language models such as ChatGPT are increasingly being studied. Before these technologies become entrenched, it is crucial to analyze whether they perpetuate racial inequities. METHODS: We asked OpenAI's ChatGPT-3.5 and ChatGPT-4 to simplify 750 radiology reports with the prompt "I am a ___ patient. Simplify this radiology report:" while providing the context of the five major racial classifications on the U.S. census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander. To ensure an unbiased analysis, the readability scores of the outputs were calculated and compared. RESULTS: Statistically significant differences were found in both models based on the racial context. For ChatGPT-3.5, output for White and Asian was at a significantly higher reading grade level than both Black or African American and American Indian or Alaska Native, among other differences. For ChatGPT-4, output for Asian was at a significantly higher reading grade level than American Indian or Alaska Native and Native Hawaiian or other Pacific Islander, among other differences. CONCLUSION: Here, we tested an application where we would expect no differences in output based on racial classification. Hence, the differences found are alarming and demonstrate that the medical community must remain vigilant to ensure large language models do not provide biased or otherwise harmful outputs.
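
The prompt template and the five classifications quoted in the methods can be expanded mechanically. The sketch below shows one possible way to do that in Python. The function name `build_prompts` and the sample report text are hypothetical. How the report was attached to the prompt (here, a single space after the colon) and whether the article "a" was adjusted before "Asian" or "American" are not stated in the abstract, so both are assumptions.

```python
# The five classifications as listed in the abstract.
RACIAL_CLASSIFICATIONS = [
    "White",
    "Black or African American",
    "American Indian or Alaska Native",
    "Asian",
    "Native Hawaiian or other Pacific Islander",
]

# Template quoted in the abstract; the blank is replaced by the classification.
PROMPT_TEMPLATE = "I am a {context} patient. Simplify this radiology report:"

def build_prompts(report_text):
    """Return one prompt per classification for a single report.

    Assumption: the report follows the prompt after one space, and the
    template is filled literally with no a/an adjustment.
    """
    return {
        context: f"{PROMPT_TEMPLATE.format(context=context)} {report_text}"
        for context in RACIAL_CLASSIFICATIONS
    }

# Hypothetical report text, for illustration only.
for context, prompt in build_prompts("No acute cardiopulmonary process.").items():
    print(prompt)
```

Each resulting prompt would then be sent to a model and the output scored for readability, so that only the stated context varies between otherwise identical requests.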


Subjects
Language, Radiology, Humans, United States
5.
Asian Spine J ; 17(1): 96-108, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35989505

ABSTRACT

STUDY DESIGN: This was a retrospective cohort study. PURPOSE: This study investigated the influence of preoperative mental health on patient-reported outcome measures (PROMs) and minimal clinically important difference (MCID) among workers' compensation (WC) recipients undergoing minimally invasive transforaminal lumbar interbody fusion (MIS TLIF). OVERVIEW OF LITERATURE: No studies have evaluated the impact of preoperative mental functioning on outcomes following MIS TLIF among WC claimants. METHODS: WC recipients undergoing single-level MIS TLIF were identified. PROMs of Visual Analog Scale (VAS) for back and leg pain, Oswestry Disability Index (ODI), 12-item Short Form Physical and Mental Composite Scale (SF-12 PCS/MCS), and Patient-Reported Outcomes Measurement Information System Physical Function evaluated subjects preoperatively/postoperatively. Subjects were grouped according to preoperative SF-12 MCS: <41 vs. ≥41. Demographic/perioperative variables, PROMs, and MCID were compared using inferential statistics. Multiple regression was used to account for differences in spinal pathology. RESULTS: The SF-12 MCS <41 and SF-12 MCS ≥41 groups included 48 and 45 patients, respectively. Significant differences in ΔPROMs were observed at SF-12 MCS at all timepoints, except at 6 months (p≤0.041, all). The SF-12 MCS <41 group had worse preoperative to 6-months SF-12 MCS, 12-weeks/6-months VAS back, 12-week VAS leg, and preoperative to 6-months ODI (p≤0.029, all). The SF-12 MCS <41 group had greater MCID achievement for overall ODI and 6-weeks/1-year/overall SF-12 MCS (p≤0.043, all); the SF-12 MCS ≥41 group had greater attainment for 6-month VAS back (p=0.004). CONCLUSIONS: Poorer mental functioning adversely affected the baseline and intermediate postoperative quality-of-life outcomes pertaining to mental health, back pain, and disability among WC recipients undergoing lumbar fusion. However, outcomes did not differ 1-2 years after surgery. While MCID achievement for pain and physical function was largely unaffected by preoperative mental health score, WC recipients with poorer baseline mental health demonstrated higher rates of overall clinically meaningful improvements for disability and mental health.

6.
World Neurosurg ; 165: e337-e345, 2022 09.
Article in English | MEDLINE | ID: mdl-35718277

ABSTRACT

OBJECTIVE: To compare patient-reported outcome measure (PROM) scores and minimum clinically important difference (MCID) achievement rates among patients undergoing single-level anterior cervical discectomy and fusion (ACDF) with varying severity of preoperative visual analog scale (VAS) neck score. METHODS: Patients with ACDF were grouped by severity of preoperative VAS neck score: ≤8 or >8. Demographic/perioperative variables and PROMs (Patient-Reported Outcomes Measurement Information System Physical Function [PROMIS PF] score, 12-Item Short Form [SF-12] Mental Component Score [MCS], VAS neck/arm score, and Neck Disability Index [NDI]) were collected preoperatively/postoperatively. MCID attainment comparison by grouping was evaluated using χ² analysis. RESULTS: A total of 137 patients were included (103 VAS neck preoperative score ≤8; 34 VAS neck preoperative score >8). The VAS neck preoperative score ≤8 cohort did not improve: 6 weeks PROMIS-PF score, 6 weeks SF-12 Physical Component Score [PCS], 12 weeks/1 year/2 years SF-12 MCS, 2 years VAS neck score, and 1 year/2 years VAS arm score (P ≤ 0.015, all). VAS neck preoperative score >8 did not improve: 6 weeks/12 weeks/2 years PROMIS-PF score, all time points SF-12 PCS, 6 weeks/12 weeks/1 year/2 years SF-12 MCS, and 2 years VAS arm score (P ≤ 0.013, all). VAS neck preoperative score >8 had inferior PROMIS-PF scores at all time points except 1 year (P ≤ 0.036, all), lower SF-12 PCS at 6 weeks/6 months (P ≤ 0.043, both), inferior SF-12 MCS at preoperative to 6 months (P ≤ 0.006, all), higher VAS neck score from preoperative to 6 months (P ≤ 0.018), higher VAS arm score at preoperative/12 weeks/6 months (P ≤ 0.020, all), and higher NDI at preoperative/12 weeks/6 months (P ≤ 0.030, all). MCID attainment rates for VAS neck preoperative score >8 were greater for NDI 2 years (P = 0.040), lower for PROMIS-PF score 2 years, and overall (P = 0.018), lower for SF-12 MCS 12 weeks (P = 0.046), lower for VAS neck score 12 weeks to 1 year and overall (P ≤ 0.032, all), and lower for VAS arm score 6 weeks/1 year (P ≤ 0.030, both). CONCLUSIONS: Patients with single-level ACDF presenting with greater baseline neck pain showed poorer physical function/pain/disability/mental health at preoperative/intermediate postoperative time points, but had comparable long-term PROMs by 2 years. MCID attainment was lower among patients with greater preoperative neck pain; MCID among the VAS neck score >8 cohort were only significantly inferior for neck pain.
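
The methods compare MCID attainment between the two cohorts with a χ² analysis. The sketch below shows how such a comparison could be computed for a 2×2 table in plain Python. The cohort sizes (103 and 34) come from the abstract, but the attainment counts are invented for illustration. The function name `chi_square_2x2` is hypothetical, and the use of Pearson's statistic without Yates' continuity correction is an assumption, since the abstract does not say which variant was applied.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2, p). With 1 degree of freedom the upper-tail probability
    equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: VAS neck <=8 cohort (n=103) and >8 cohort (n=34), sizes from the abstract.
# Columns: achieved MCID / did not achieve MCID. These counts are hypothetical.
chi2, p = chi_square_2x2(60, 43, 12, 22)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```

For small expected cell counts, Fisher's exact test would usually be preferred over this approximation.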


Subjects
Neck Pain, Spinal Fusion, Cervical Vertebrae/surgery, Diskectomy, Humans, Neck Pain/surgery, Patient Reported Outcome Measures, Treatment Outcome
7.
Clin Imaging ; 111: 110173, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38735100