ABSTRACT
BACKGROUND: Frailty is an important predictor of health outcomes, characterized by increased vulnerability due to physiological decline. The Clinical Frailty Scale (CFS) is commonly used for frailty assessment but may be influenced by rater bias. Use of artificial intelligence (AI), particularly Large Language Models (LLMs), offers a promising method for efficient and reliable frailty scoring. METHODS: The study utilized seven standardized patient scenarios to evaluate the consistency and reliability of CFS scoring by OpenAI's GPT-3.5-turbo model. Two methods were tested: a basic prompt and an instruction-tuned prompt incorporating the CFS definition, a directive for accurate responses, and temperature control. The outputs of the two methods were compared using the Mann-Whitney U test, and Fleiss' Kappa was used to assess inter-rater reliability. The outputs were also compared with historic human scores of the same scenarios. RESULTS: The LLM's median scores were similar to those of human raters, with differences of no more than one point. Significant differences in score distributions were observed between the basic and instruction-tuned prompts in five out of seven scenarios. The instruction-tuned prompt showed high inter-rater reliability (Fleiss' Kappa of 0.887) and produced consistent responses in all scenarios. Difficulty in scoring was noted in scenarios with less explicit information on activities of daily living (ADLs). CONCLUSIONS: This study demonstrates the potential of LLMs in consistently scoring clinical frailty with high reliability. It demonstrates that prompt engineering via instruction-tuning can be a simple but effective approach for optimizing LLMs in healthcare applications. The LLM may overestimate frailty scores when less information about ADLs is provided, possibly because it is less subject to implicit assumptions and extrapolation than humans. Future research could explore the integration of LLMs in clinical research and frailty-related outcome prediction.
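For readers unfamiliar with the reliability statistic named in the abstract above, the following is a minimal sketch of how Fleiss' Kappa can be computed from repeated categorical scores. It is not the authors' code: the function name `fleiss_kappa`, the choice to treat each repeated model run as one "rater", and the toy scores are all assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' Kappa for a list of subjects, each scored by the same number of raters.

    ratings    -- list of lists; ratings[i] holds the category assigned by each rater
                  to subject i (here, hypothetically, each repeated model run)
    categories -- iterable of all possible category labels (e.g. CFS scores 1..9)
    """
    categories = list(categories)
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement per subject, then averaged over subjects.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Agreement expected by chance, from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 3 scenarios, 5 repeated scores each (not study data).
toy_scores = [[3, 3, 3, 3, 3], [5, 5, 5, 5, 5], [7, 7, 7, 7, 7]]
kappa = fleiss_kappa(toy_scores, categories=range(1, 10))
```

With identical scores within every scenario, as in the toy data, the statistic equals 1; values near 0 indicate agreement no better than chance.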
ABSTRACT
BACKGROUND: Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, underprioritized in comparison to direct clinical care, and are often tasked to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT have the ability to summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, providing time savings and consistency in quality. OBJECTIVE: The aim of this study was to assess the performance of GPT-4 in generating discharge letters written from urology specialist outpatient clinics to primary care providers and to compare their quality against letters written by junior clinicians. METHODS: Fictional electronic records were written by physicians simulating 5 common urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases with a specified target audience of primary care providers who would be continuing the patient's care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool. RESULTS: GPT-4 outperformed human counterparts in information provision (mean 4.32, SD 0.95 vs 3.70, SD 1.27; P=.03) and had no instances of hallucination. There were no statistically significant differences in the mean clarity (4.16, SD 0.95 vs 3.68, SD 1.24; P=.12), collegiality (4.36, SD 1.00 vs 3.84, SD 1.22; P=.05), conciseness (3.60, SD 1.12 vs 3.64, SD 1.27; P=.71), follow-up recommendations (4.16, SD 1.03 vs 3.72, SD 1.13; P=.08), and overall satisfaction (3.96, SD 1.14 vs 3.62, SD 1.34; P=.36) between the letters generated by GPT-4 and humans, respectively. CONCLUSIONS: Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study provides a proof of concept that large language models can be useful and safe tools in clinical documentation.
Subject(s)
Patient Discharge, Humans, Patient Discharge/standards, Electronic Health Records/standards, Single-Blind Method, Language
ABSTRACT
BACKGROUND AND AIM: Colonoscopy is commonly used in screening and surveillance for colorectal cancer. Multiple different guidelines provide recommendations on the interval between colonoscopies. This can be challenging for non-specialist healthcare providers to navigate. Large language models like ChatGPT are a potential tool for parsing patient histories and providing advice. However, the standard GPT model is not designed for medical use and can hallucinate. One way to overcome these challenges is to provide contextual information with medical guidelines to help the model respond accurately to queries. Our study compares the standard GPT4 against a contextualized model provided with relevant screening guidelines. We evaluated whether the models could provide correct advice for screening and surveillance intervals for colonoscopy. METHODS: Relevant guidelines pertaining to colorectal cancer screening and surveillance were formulated into a knowledge base for GPT. We tested 62 example case scenarios (three times each) on standard GPT4 and on a contextualized model with the knowledge base. RESULTS: The contextualized GPT4 model outperformed the standard GPT4 in all domains. No high-risk features were missed, and only two cases had hallucination of additional high-risk features. A correct interval to colonoscopy was provided in the majority of cases. Guidelines were appropriately cited in almost all cases. CONCLUSIONS: A contextualized GPT4 model could identify high-risk features and quote appropriate guidelines without significant hallucination. It gave a correct interval to the next colonoscopy in the majority of cases. This provides proof of concept that ChatGPT with appropriate refinement can serve as an accurate physician assistant.
Subject(s)
Colonoscopy, Colorectal Neoplasms, Humans, Colorectal Neoplasms/diagnosis, Colorectal Neoplasms/prevention & control, Colorectal Neoplasms/epidemiology, Risk Factors, Early Detection of Cancer, Hallucinations
ABSTRACT
BACKGROUND: Electrocardiography (ECG) may be performed as part of preparticipation sports screening. Recommendations on screening of athletes to identify individuals with previously unrecognized cardiac disease are robust; however, data guiding the preparticipation screening of unselected populations are scarce. T wave inversion (TWI) on ECG may suggest an undiagnosed cardiomyopathy. This study aims to describe the prevalence of abnormal TWI in an unselected young male cohort and the outcomes of an echocardiography-guided approach to investigating these individuals for structural heart diseases, focusing on the yield for cardiomyopathies. METHODS AND RESULTS: Consecutive young male individuals undergoing a national preparticipation cardiac screening program for 39 months were studied. All underwent resting supine 12-lead ECG. Those manifesting abnormal TWI, defined as negatively deflected T waves of at least 0.1 mV amplitude in any 2 contiguous leads, underwent echocardiography. A total of 69 714 male individuals with a mean age of 17.9±1.1 years were studied. Of the individuals, 562 (0.8%) displayed abnormal TWI. This was most frequently observed in the anterior territory and least so in the lateral territory. A total of 12 individuals (2.1%) were diagnosed with a cardiomyopathy. Cardiomyopathy diagnoses were significantly associated with deeper maximum TWI depth and the presence of abnormal TWI in the lateral territory, but not with abnormal TWI in the anterior and inferior territories. No individual presenting with TWI restricted to solely leads V1 to V2, 2 inferior leads or both was diagnosed with a cardiomyopathy. CONCLUSIONS: Cardiomyopathy diagnoses were more strongly associated with certain patterns of abnormal TWI. Our findings may support decisions to prioritize echocardiography in these individuals.
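The TWI definition in the abstract above (T waves inverted by at least 0.1 mV in any 2 contiguous leads) can be sketched as a simple rule. This is an illustration only, not the study's algorithm: the lead-to-territory grouping in `TERRITORIES`, the function name, and the simplification that "contiguous" means "two leads within the same territory" are assumptions of this sketch.

```python
# Assumed grouping of leads into the three territories named in the abstract.
TERRITORIES = {
    "anterior": ["V1", "V2", "V3", "V4"],
    "inferior": ["II", "III", "aVF"],
    "lateral": ["I", "aVL", "V5", "V6"],
}

# Threshold taken from the abstract: inversion of at least 0.1 mV.
THRESHOLD_MV = 0.1


def abnormal_twi_territories(t_wave_mv):
    """Return the territories in which at least 2 leads show T-wave inversion >= 0.1 mV.

    t_wave_mv -- dict mapping lead name to T-wave amplitude in mV
                 (negative values mean an inverted T wave); missing leads are ignored
    """
    flagged = []
    for territory, leads in TERRITORIES.items():
        inverted = [lead for lead in leads if t_wave_mv.get(lead, 0.0) <= -THRESHOLD_MV]
        if len(inverted) >= 2:
            flagged.append(territory)
    return flagged


# Hypothetical measurements (not study data): inversion in V1 and V2 only.
example = abnormal_twi_territories({"V1": -0.15, "V2": -0.12, "II": 0.20})
```

A stricter reading of "contiguous" would require anatomically adjacent leads (for example V4 and V5 across territories), which this simplified grouping does not capture.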
Subject(s)
Cardiomyopathies, Echocardiography, Heart Diseases, Adolescent, Adult, Humans, Male, Young Adult, Cardiac Arrhythmias/diagnosis, Cardiomyopathies/diagnosis, Electrocardiography/methods, Heart
ABSTRACT
BACKGROUND: The optimal management of patients with end-stage renal disease (ESRD) on dialysis with severe coronary artery disease (CAD) has not been determined. METHODS: Between 2013 and 2017, all patients with ESRD on dialysis who had left main (LM) disease, triple vessel disease (TVD) and/or severe CAD for consideration of coronary artery bypass graft (CABG) were included. Patients were divided into 3 groups based on final treatment modality: CABG, percutaneous coronary intervention (PCI), optimal medical therapy (OMT). Outcome measures include in-hospital, 180-day, 1-year and overall mortality and major adverse cardiac events (MACE). RESULTS: In total, 418 patients were included (CABG 11.0%, PCI 65.6%, OMT 23.4%). Overall, 1-year mortality and MACE rates were 27.5% and 55.0% respectively. Patients who underwent CABG were significantly younger, more likely to have LM disease and have no prior heart failure. In this non-randomized setting, treatment modality did not impact on 1-year mortality, although the CABG group had significantly lower 1-year MACE rates (CABG 32.6%, PCI 57.3%, OMT 59.2%; CABG vs. OMT p < 0.01, CABG vs. PCI p < 0.001). Independent predictors of overall mortality include STEMI presentation (HR 2.31, 95% CI 1.38-3.86), prior heart failure (HR 1.84, 95% CI 1.22-2.75), LM disease (HR 1.71, 95% CI 1.26-2.31), NSTE-ACS presentation (HR 1.40, 95% CI 1.03-1.91) and increased age (HR 1.02, 95% CI 1.01-1.04). CONCLUSION: Treatment decisions for patients with severe CAD with ESRD on dialysis are complex. Understanding independent predictors of mortality and MACE in specific treatment subgroups may provide valuable insights into the selection of optimal treatment options.
Subject(s)
Coronary Artery Disease, Heart Failure, Chronic Kidney Failure, Percutaneous Coronary Intervention, Humans, Coronary Artery Disease/complications, Coronary Artery Disease/diagnosis, Coronary Artery Disease/surgery, Renal Dialysis, Percutaneous Coronary Intervention/adverse effects, Treatment Outcome, Chronic Kidney Failure/epidemiology, Chronic Kidney Failure/therapy, Heart Failure/etiology
ABSTRACT
Introduction: Elevated low-density lipoprotein cholesterol (LDL-C) is an important risk factor for atherosclerotic cardiovascular disease (ASCVD). Direct LDL-C measurement is not widely performed. LDL-C is routinely calculated using the Friedewald equation (FLDL), which is inaccurate at high triglyceride (TG) or low LDL-C levels. We aimed to compare this routine method with other estimation methods in patients with type 2 diabetes mellitus (T2DM), who typically have elevated TG levels and ASCVD risk. Method: We performed a retrospective cohort study on T2DM patients from a multi-institutional diabetes registry in Singapore from 2013 to 2020. LDL-C values estimated by the FLDL, Martin/Hopkins (MLDL) and Sampson (SLDL) equations were compared using measures of agreement and correlation. Subgroup analysis comparing estimated LDL-C with directly measured LDL-C (DLDL) was conducted in patients from a single institution. Estimated LDL-C was considered discordant if LDL-C was <1.8 mmol/L for the index equation and ≥1.8 mmol/L for the comparator. Results: A total of 154,877 patients were included in the final analysis, and 11,475 patients in the subgroup analysis. All 3 equations demonstrated strong overall correlation and goodness-of-fit. Discordance was 4.21% for FLDL-SLDL and 6.55% for FLDL-MLDL. In the subgroup analysis, discordance was 21.57% for DLDL-FLDL, 17.31% for DLDL-SLDL and 14.44% for DLDL-MLDL. All discordance rates increased at TG levels >4.5 mmol/L. Conclusion: We demonstrated strong correlations between newer methods of LDL-C estimation, FLDL, and DLDL. At higher TG concentrations, no equation performed well. The Martin/Hopkins equation had the least discordance with DLDL, and may minimise misclassification compared with the FLDL and SLDL.
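As an illustration of the discordance definition in the abstract above, the sketch below applies the standard Friedewald equation in mmol/L (LDL-C = total cholesterol − HDL-C − TG/2.2) and flags a pair as discordant when the index estimate is below 1.8 mmol/L and the comparator is at or above it. The function names, the sample values, and the choice of all pairs as the denominator of the rate are assumptions of this sketch, not details taken from the study. The Martin/Hopkins and Sampson equations are not reproduced here.

```python
def friedewald_ldl(total_chol, hdl, tg):
    """Friedewald estimate of LDL-C, all inputs and output in mmol/L."""
    return total_chol - hdl - tg / 2.2


def is_discordant(index_ldl, comparator_ldl, threshold=1.8):
    """Discordant per the abstract: index < threshold while comparator >= threshold."""
    return index_ldl < threshold and comparator_ldl >= threshold


def discordance_rate(pairs, threshold=1.8):
    """Percentage of (index, comparator) pairs that are discordant.

    Assumption: the denominator is every pair supplied, which the abstract
    does not state explicitly.
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    flagged = sum(is_discordant(i, c, threshold) for i, c in pairs)
    return 100.0 * flagged / len(pairs)


# Hypothetical lipid panel (not study data): TC 4.0, HDL-C 1.0, TG 2.2 mmol/L.
estimate = friedewald_ldl(4.0, 1.0, 2.2)

# Hypothetical (index, comparator) pairs in mmol/L.
rate = discordance_rate([(1.7, 1.9), (2.0, 2.1), (1.5, 1.6), (1.9, 1.7)])
```

Note that the definition is directional: a pair in which the comparator falls below 1.8 mmol/L while the index does not is not counted.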
Subject(s)
LDL Cholesterol, Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/blood, LDL Cholesterol/blood, Retrospective Studies, Male, Female, Middle Aged, Singapore/epidemiology, Aged, Triglycerides/blood, Atherosclerosis/blood, Registries
ABSTRACT
BACKGROUND: Wolff-Parkinson-White (WPW) syndrome is a proarrhythmic condition that may require restriction from strenuous activities and is characterized by ECG signs, including delta waves. We observed cases of intermittent WPW patterns presenting as QRS alternans ('WPW alternans') in a large pre-participation ECG screening cohort of young men reporting for military conscription. OBJECTIVES: We aimed to determine the WPW alternans pattern, case characteristics, and the prevalence of other relevant differential diagnoses presenting as QRS alternans in a pre-participation setting. METHODS: One hundred twenty-five thousand one hundred fifty-eight prospective male military recruits were reviewed from January 2016 to December 2019. A review of electronic medical records identified cases of WPW alternans and WPW patterns or syndrome. Reviewing electronic medical records identified cases of relevant differential diagnoses that might cause QRS alternans. RESULTS: Four individuals (2.2%) had WPW alternans out of 184 individuals with a final diagnosis of WPW pattern or syndrome. Two of these individuals manifested symptoms or ECG findings consistent with supraventricular tachycardia. The overall prevalence of WPW alternans was 0.003%, and the prevalence of WPW was 0.147%. WPW alternans represented 8.7% of individuals presenting with QRS alternans, and QRS alternans had a prevalence of 0.037% in the entire population. CONCLUSIONS: WPW alternans is a variant of intermittent WPW, which comprised 2.2% of WPW cases in our pre-participation screening cohort. It does not necessarily indicate a low risk for supraventricular tachycardia. It must be recognized at ECG screening and distinguished from other pathologies that also present with QRS alternans.
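The percentages reported in the abstract above follow from the stated counts, and the arithmetic can be retraced as below. The cohort size, the 184 WPW cases, the 4 WPW alternans cases and the 8.7% share come from the abstract. The number of individuals with QRS alternans is not reported, so `implied_qrs_alternans` is a value back-calculated in this sketch and should be read as an assumption.

```python
COHORT = 125_158          # prospective male military recruits reviewed
WPW_CASES = 184           # final diagnosis of WPW pattern or syndrome
WPW_ALTERNANS = 4         # individuals with WPW alternans
ALTERNANS_SHARE = 0.087   # WPW alternans as a share of all QRS alternans (8.7%)


def percent(numerator, denominator):
    """Express a ratio as a percentage."""
    return 100.0 * numerator / denominator


share_of_wpw = percent(WPW_ALTERNANS, WPW_CASES)        # about 2.2%
prev_wpw_alternans = percent(WPW_ALTERNANS, COHORT)     # about 0.003%
prev_wpw = percent(WPW_CASES, COHORT)                   # about 0.147%

# Back-calculated, not reported in the abstract: 4 / 0.087 rounds to 46 individuals.
implied_qrs_alternans = round(WPW_ALTERNANS / ALTERNANS_SHARE)
prev_qrs_alternans = percent(implied_qrs_alternans, COHORT)  # about 0.037%
```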
Subject(s)
Supraventricular Tachycardia, Wolff-Parkinson-White Syndrome, Humans, Male, Differential Diagnosis, Electrocardiography, Prospective Studies, Wolff-Parkinson-White Syndrome/diagnosis
ABSTRACT
BACKGROUND: Classical electrocardiographic (ECG) criteria for left ventricular hypertrophy (LVH) are well studied in older populations and patients with hypertension. Their utility in young pre-participation cohorts is unclear. AIMS: We aimed to develop machine learning models for detection of echocardiogram-diagnosed LVH from ECG, and compare these models with classical criteria. METHODS: Between November 2009 and December 2014, pre-participation screening ECG and subsequent echocardiographic data were collected from 17 310 males aged 16 to 23, who reported for medical screening prior to military conscription. A final diagnosis of LVH was made during echocardiography, defined by a left ventricular mass index >115 g/m². The continuous and threshold forms of classical ECG criteria (Sokolow-Lyon, Romhilt-Estes, Modified Cornell, Cornell Product, and Cornell) were compared against machine learning models (Logistic Regression, GLMNet, Random Forests, Gradient Boosting Machines) using receiver operating characteristic curve analysis. We also compared the important variables identified by machine learning models with the input variables of classical criteria. RESULTS: Prevalence of echocardiographic LVH in this population was 0.82% (143/17310). Classical ECG criteria had poor performance in predicting LVH. Machine learning methods achieved superior performance: Logistic Regression (area under the curve [AUC], 0.811; 95% confidence interval [CI], 0.738-0.884), GLMNet (AUC, 0.873; 95% CI, 0.817-0.929), Random Forest (AUC, 0.824; 95% CI, 0.749-0.898), Gradient Boosting Machines (AUC, 0.800; 95% CI, 0.738-0.862). CONCLUSIONS: Machine learning methods are superior to classical ECG criteria in diagnosing echocardiographic LVH in the context of pre-participation screening.
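The AUC values quoted in the abstract above summarize how well a continuous score separates individuals with and without LVH. As a minimal sketch of what the statistic measures (not the authors' analysis code), the AUC can be computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration, and the confidence intervals reported in the abstract are not reproduced here.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores -- continuous predictions (e.g. a model probability or an ECG voltage sum)
    labels -- 1 for cases with the condition, 0 for cases without it
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores (not study data): 2 positives, 2 negatives.
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based formulation would be preferable for a cohort of this study's size.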
Subject(s)
Hypertension, Left Ventricular Hypertrophy, Aged, Echocardiography, Electrocardiography, Humans, Left Ventricular Hypertrophy/diagnostic imaging, Machine Learning, Male
Subject(s)
Brugada Syndrome/diagnosis, Adolescent, Brugada Syndrome/epidemiology, Brugada Syndrome/ethnology, Electrocardiography, Flecainide/adverse effects, Flecainide/therapeutic use, Humans, Male, Prevalence, ST Elevation Myocardial Infarction/physiopathology, Singapore/epidemiology, Sodium Channel Blockers/adverse effects, Sodium Channel Blockers/therapeutic use, Young Adult
ABSTRACT
INTRODUCTION: Empathy and burnout are two entities that are important in a physician's career. They are likely to relate to each other and can be heavily influenced by surrounding factors, such as medical education, local practices and cultural expectations. To our knowledge, empathy and burnout studies have not been performed in Singapore. This study was designed to evaluate empathy and burnout levels using the Jefferson Scale of Physician Empathy (JSPE) and Maslach Burnout Inventory (MBI) among residents in Singapore, and compare them with the United States (US) literature. METHODS: The JSPE, MBI and a self-designed questionnaire were completed by 446 trainees at a residency-sponsoring institution in Singapore. RESULTS: Residents in Singapore had lower empathy and higher rates of burnout compared to US literature. Physician empathy was associated with burnout: residents with higher empathy scores had higher personal accomplishment (p < 0.001, r = 0.477, r² = 0.200), and lower emotional exhaustion (p < 0.001, r = 0.187, r² = 0.035) and depersonalisation (p < 0.001, r = 0.321, r² = 0.103) scores. CONCLUSION: Residents in Singapore had lower empathy and higher burnout scores compared to the US literature. Further research into the underlying cause is imperative to guide intervention.