ABSTRACT
BACKGROUND: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. OBJECTIVE: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test-related questions asked by patients and to identify potential issues that can be mitigated using augmentation approaches. METHODS: We collected laboratory test result-related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and the ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics: Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation With Explicit Ordering (METEOR), and Bidirectional Encoder Representations from Transformers Score (BERTScore). We used an LLM-based evaluator to judge whether a target model had higher quality than the baseline model in terms of relevance, correctness, helpfulness, and safety. We also performed a manual evaluation with medical experts of all the responses to 7 selected questions on the same 4 aspects. RESULTS: With the GPT-4 output used as the reference answer, the responses from GPT-3.5 were the most similar among the other 4 LLMs, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca.
Human answers from the Yahoo data scored the lowest and were thus the least similar to GPT-4-generated answers. The results of the win rate and medical expert evaluations both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). However, LLM responses occasionally suffered from a lack of interpretation in the patient's medical context, incorrect statements, and a lack of references. CONCLUSIONS: By evaluating LLMs in generating responses to patients' laboratory test result-related questions, we found that, compared with the other 4 LLMs and human answers from a Q&A website, GPT-4's responses were more accurate, helpful, and relevant, and safer. There were, however, cases in which GPT-4 responses were inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation.
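The similarity metrics named above all score how much a candidate answer overlaps with a reference answer. As a rough illustration only, the sketch below computes a simplified ROUGE-1 F1 (clipped unigram overlap) in Python. It assumes lowercase whitespace tokenization with no stemming, and the example sentences are hypothetical, not drawn from the study's data; the study's scores would come from full metric implementations.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: lowercase whitespace tokenization, no stemming or
    stop-word handling (full ROUGE implementations offer both).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each token counts at most as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers; the reference plays the role of the GPT-4 output.
reference = "a high ldl level can raise your risk of heart disease"
candidate = "high ldl can raise the risk of heart disease"
score = rouge1_f1(candidate, reference)
print(round(score, 3))  # 0.8
```

Here 8 of the candidate's 9 tokens and 8 of the reference's 11 tokens overlap, so precision is 8/9, recall is 8/11, and F1 is 0.8.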
Subjects
Artificial Intelligence, Electronic Health Records, Humans, Language
ABSTRACT
INTRODUCTION: As patients, members of the public, and professional stakeholders engage in co-producing health-related research, an important issue to consider is trauma. Trauma is very common and associated with a wide range of physical and behavioural health conditions. Thus, it may benefit research partnerships to consider its impact on their stakeholders as well as its relevance to the health condition under study. The aims of this article are to describe the development and evaluation of a training programme that applied principles of trauma-informed care (TIC) to patient- and public-engaged research. METHODS: A research partnership focused on addressing trauma in primary care patients ('myPATH') explicitly incorporated TIC into its formation, governance document and collaborative processes, and developed and evaluated a free 3-credit continuing education online training. The training was presented by 11 partners (5 professionals, 6 patients) and included academic content and lived experiences. RESULTS: Training participants (N = 46) positively rated achievement of learning objectives and speakers' performance (ranging from 4.39 to 4.74 on a 5-point scale). The most salient themes from open-ended comments were that training was informative (n = 12) and that lived experiences shared by patient partners were impactful (n = 10). Suggestions were primarily technical or logistical. CONCLUSION: This preliminary evaluation indicates that it is possible to incorporate TIC principles into a research partnership's collaborative processes and training about these topics is well-received. Learning about trauma and TIC may benefit research partnerships that involve patients and public stakeholders studying a wide range of health conditions, potentially improving how stakeholders engage in co-producing research as well as producing research that addresses how trauma relates to their health condition under study. 
PATIENT OR PUBLIC CONTRIBUTION: The myPATH Partnership includes 22 individuals with professional and lived experiences related to trauma (https://www.usf.edu/cbcs/mhlp/centers/mypath/); nine partners were engaged due to personal experiences with trauma; other partners are community-based providers and researchers. All partners contributed ideas that led to trauma-informed research strategies and training. Eleven partners (5 professionals, 6 patients) presented the training, and 12 partners (8 professionals, 4 patients) contributed to this article and chose to be named as authors.
ABSTRACT
OBJECTIVES: In 2019, the Centers for Medicare & Medicaid Services began implementing the Patients Over Paperwork (POP) initiative in response to clinicians reporting burdensome documentation regulations. To date, no study has evaluated how these policy changes have influenced documentation burden. METHODS: Our data came from the electronic health records of an academic health system. Using quantile regression models, we assessed the association between the implementation of POP and clinical documentation word count using data from family medicine physicians in an academic health system from January 2017 to May 2021 inclusive. Studied quantiles included the 10th, 25th, 50th, 75th, and 90th quantiles. We controlled for patient-level (race/ethnicity, primary language, age, comorbidity burden), visit-level (primary payer, level of clinical decision making involved, whether a visit was done through telemedicine, whether a visit was for a new patient), and physician-level (sex) characteristics. RESULTS: We found that the POP initiative was associated with lower word counts across all of the quantiles. In addition, we found lower word counts among notes for private payers and telemedicine visits. Conversely, higher word counts were observed in notes that were written by female physicians, notes for new patient visits, and notes involving patients with greater comorbidity burden. CONCLUSIONS: Our initial evaluation suggests that documentation burden, as measured by word count, has declined over time, particularly following implementation of the POP in 2019. Additional research is needed to see whether the same occurs when examining other medical specialties, clinician types, and longer evaluation periods.
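Quantile regression estimates a chosen quantile of the outcome (here, note word count) rather than its mean, by minimizing the check (pinball) loss. The Python sketch below is intercept-only, so it omits the patient-, visit-, and physician-level covariates the study controlled for, and the word counts are hypothetical. It only illustrates that minimizing the pinball loss at level tau recovers the tau-th sample quantile.

```python
def pinball_loss(y: list[float], q: float, tau: float) -> float:
    """Mean check (pinball) loss of predicting the constant q at quantile level tau."""
    total = 0.0
    for yi in y:
        r = yi - q
        # Observations above q are weighted by tau, observations below q by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(y)


def constant_quantile_fit(y: list[float], tau: float) -> float:
    """Intercept-only quantile regression.

    The pinball loss is piecewise linear in q, so a minimizer lies at one of
    the observed values; searching them is enough for this toy case.
    """
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical note word counts (not the study's data).
word_counts = [120, 250, 310, 400, 980]
for tau in (0.1, 0.5, 0.9):
    print(tau, constant_quantile_fit(word_counts, tau))  # 120, 310, and 980 respectively
```

With covariates, the same loss is minimized over regression coefficients instead of a single constant, which is what quantile regression routines in statistical packages do.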
Subjects
Family Practice, Physicians, United States, Humans, Aged, Female, Medicare, Clinical Decision-Making, Documentation
ABSTRACT
Context: The COVID-19 pandemic required primary care practices to rapidly adapt cancer screening procedures to comply with changing guidelines and policies. Objective: This study sought to: 1) identify cancer screening barriers and facilitators during the COVID-19 pandemic; 2) describe cancer screening adaptations; and 3) provide recommendations. Study design: A qualitative study was conducted with primary care staff (n = 42). Individual interviews were conducted by videoconference from August 2020 to April 2021, recorded, transcribed, and analyzed for themes using NVivo 12 Plus. Setting: Primary care practices included federally qualified health centers, tribal health centers, rural health clinics, hospital- or health system-owned practices, and academic medical centers located across ten states, including urban (55%) and rural (45%) sites. Population studied: Primary care staff included physicians (n = 13), residents (n = 10), advanced practice providers (n = 9), and administrators (n = 10). Outcome measures: The interviews assessed perceptions of cancer screening barriers and facilitators, necessary adaptations, and future recommendations. Results: Barriers to cancer screening included delays in primary and specialty care, staff shortages, lack of personal protective equipment, patient hesitancy to receive in-person care, postal service delays for mail-home testing, COVID-19 travel restrictions (for Mexico-US border-crossing patients), and organizational policies (e.g., required COVID-19 testing before screening). Facilitators included better care coordination and collaboration due to the pandemic and more time during telehealth visits than during in-person visits to discuss cancer screening.
Adaptations included delayed screening, patient triage (e.g., prioritizing patients overdue for screening), telehealth visits to discuss cancer screening, mail-home testing, coordinated cancer screenings (e.g., providing fecal immunochemical test materials during cervical cancer screening), and same-day cancer screening. Recommendations included more public health education about the importance of cancer screening during COVID-19, more mail-home testing, and expanded health care access (e.g., weekend clinics) to address patient backlogs for cancer screening. Conclusions: Primary care staff developed innovative strategies to adapt cancer screening during the COVID-19 pandemic. Unresolved challenges (e.g., patient backlogs) will require additional implementation strategies.
Subjects
COVID-19, Uterine Cervical Neoplasms, Humans, Female, Early Detection of Cancer, COVID-19 Testing, Pandemics
ABSTRACT
BACKGROUND: Electronic visits (e-visits) involve asynchronous communication between clinicians and patients through a secure web-based platform, such as a patient portal, to elicit symptoms and determine a diagnosis and treatment plan. E-visits are now reimbursable through Medicare due to the COVID-19 pandemic. The state of evidence regarding e-visits, such as the impact on clinical outcomes and health care delivery, is unclear. OBJECTIVE: To address this gap, we examine how e-visits have impacted clinical outcomes and health care quality, access, utilization, and costs. METHODS: We conducted a systematic review; MEDLINE, Embase, and Web of Science were searched from January 2000 through October 2020 for peer-reviewed studies that assessed e-visits' impacts on clinical and health care delivery outcomes. RESULTS: Out of 1859 papers, 19 met the inclusion criteria. E-visit usage was associated with improved or comparable clinical outcomes, especially for chronic disease management (eg, diabetes care, blood pressure management). The impact on quality of care varied across conditions. Quality of care was equivalent or better for chronic conditions, but variable quality was observed in infection management (eg, appropriate antibiotic prescribing). Similarly, the impact on health care utilization varied across conditions (eg, lower utilization for dermatology but mixed impact in primary care). Health care costs were lower for e-visits than those for in-person visits for a wide range of conditions (eg, dermatology and acute visits). No studies examined the impact of e-visits on health care access. It is difficult to draw firm conclusions about effectiveness or impact on care delivery from the studies that were included because many used observational designs. CONCLUSIONS: Overall, the evidence suggests e-visits may provide clinical outcomes that are comparable to those provided by in-person care and reduce health care costs for certain health care conditions. 
At the same time, there is mixed evidence on health care quality, especially regarding infection management (eg, sinusitis, urinary tract infections, conjunctivitis). Further studies are needed to test implementation strategies that might improve delivery (eg, clinical decision support for antibiotic prescribing) and to assess which conditions can be managed via e-visits.
Subjects
COVID-19/diagnosis, Clinical Decision Support Systems, Delivery of Health Care/methods, Telemedicine/methods, Communication, Electronics, Humans, SARS-CoV-2/isolation & purification
ABSTRACT
OBJECTIVES: The reported prevalence of mentorship among medical students ranges from 26% to 77%. This broad range likely reflects the tendency of studies to focus on specific populations of medical students. There is little consensus about the characteristics of mentoring relationships among medical students. The primary goal of this study was to determine the reported prevalence of mentorship among medical students in the United States. The secondary goals were to assess, from a medical student perspective, the desired qualities of and barriers to successful mentoring. METHODS: A cross-sectional online survey was administered via Qualtrics to all medical students at participating accredited medical schools from July 2018 to March 2019. The questionnaire contained a subsection of questions that assessed the existence of mentoring, facilitators of and barriers to finding a mentor, and the desired qualities of a successful mentor. RESULTS: With a 94% completion rate, 369 (69%) of 532 medical students reported having a mentor. Adjusted analysis showed that fourth-year medical students were significantly more likely to have a mentor than first-year (odds ratio [OR] 2.65, 95% confidence interval [CI] 1.49-4.73, P = 0.001), second-year (OR 2.07, 95% CI 1.14-3.76, P = 0.016), and third-year medical students (OR 2.16, 95% CI 1.20-3.90, P = 0.011). Compassion (64%) was the most commonly reported quality in a successful mentoring relationship. Lack of time from the mentor (75%) was the most commonly reported barrier. CONCLUSIONS: This study may serve as a guide to fostering more supportive mentoring relationships. However, each mentoring relationship should be tailored to the needs of the mentee.
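Odds ratios like those above compare the odds of having a mentor between two groups. The study's ORs are adjusted estimates from a multivariable model; the Python sketch below computes only an unadjusted OR with a Woolf (log-scale) 95% CI from a 2x2 table of hypothetical counts, and then checks the reported overall prevalence of 369 of 532.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout (all cells must be positive):
        a = group 1 with mentor,  b = group 1 without mentor
        c = group 2 with mentor,  d = group 2 without mentor
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: fourth-year (90 with, 20 without) vs first-year (80 with, 50 without).
or_4v1, lo, hi = odds_ratio_ci(90, 20, 80, 50)
print(f"OR {or_4v1:.2f} (95% CI {lo:.2f}-{hi:.2f})")

# Overall prevalence reported in the abstract: 369 of 532 students.
prevalence = 369 / 532
print(f"{prevalence:.0%}")  # 69%
```

An adjusted OR would instead come from the exponentiated coefficient of a logistic regression that includes the other covariates.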
Subjects
Mentoring/standards, Medical Students/psychology, Adult, Cross-Sectional Studies, Female, Humans, Male, Mentoring/ethics, Prevalence, Surveys and Questionnaires, United States
ABSTRACT
OBJECTIVES: Acid suppression therapy (AST), which comprises proton pump inhibitors (PPIs), histamine-2 receptor blockers, and antacids, is one of the most commonly used medication groups in the United States. Long-term AST is concerning, however, because it is linked with an increased risk of community-acquired pneumonia, Clostridium difficile infections, bone fractures, and nutritional deficiencies. The potentially harmful biological and economic consequences of the improper use of acid suppression medications pose considerable risk to people in underserved communities. We sought to determine the prevalence of AST in an underserved population and the common diagnoses and symptoms associated with therapy. In addition, we studied the frequency of suboptimal PPI use in an indigent care population and potential factors related to high-risk behaviors. METHODS: This was a cross-sectional study using a survey distributed to participants during their regularly scheduled visits to a public sector health care provider for low-income patients. RESULTS: Of the 176 participants surveyed, 70 (40%) were using AST. Esophagitis and gastroesophageal reflux disease were the most prevalent diagnoses in our sample. PPIs were the most commonly used acid suppression medication. Of those using PPIs, 85% had never been instructed to stop. Of the 27 patients with PPI prescriptions, 26 used them suboptimally; of those without prescriptions, 7 used PPIs suboptimally. CONCLUSIONS: AST use is prevalent in low-income populations, and patients are not being managed appropriately to minimize their risk of complications from AST.
Subjects
Gastrointestinal Diseases/drug therapy, Histamine H2 Antagonists/therapeutic use, Vulnerable Populations/statistics & numerical data, Adult, Aged, Cross-Sectional Studies, Female, Florida, Gastrointestinal Diseases/economics, Histamine H2 Antagonists/economics, Humans, Male, Middle Aged, Surveys and Questionnaires
ABSTRACT
OBJECTIVES: To assess patients' knowledge of blood pressure (BP) and their comfort level with using technology, including a Bluetooth-enabled BP device and pharmacist telemonitoring. The secondary objective was to discover if pharmacist interventions improved BP readings. SETTING: The study took place in Pharmacy Plus and the Family Medicine Department at the University of South Florida in Tampa, FL. PRACTICE DESCRIPTION: The pharmacists within Pharmacy Plus and the Family Medicine Department are part of the interdisciplinary team providing care to patients and seeking to achieve optimal patient outcomes. Pharmacy Plus breaks away from the traditional behind-the-counter model using innovative technology to create a personalized experience for patients. PRACTICE INNOVATION: During this pilot study, the patients received a Bluetooth-enabled BP monitor and were asked to obtain their BP readings at least once daily for 6 weeks. The patients' electronic health records automatically captured the BP readings, which were reviewed by the study pharmacists. The patients had an appointment with the pharmacists once weekly via a telehealth platform through which they were counseled on their weekly average BP, BP goals, lifestyle modifications, and proper use of the devices. EVALUATION: The patients completed a prestudy survey assessing their baseline knowledge of BP, comfort level when using technology, and ease in working with pharmacists. Reliability and satisfaction in using the BP device and telehealth communication with pharmacists were also assessed poststudy. RESULTS: Twelve patients enrolled, with 9 completing the study. There was a statistically significant increase in patients' knowledge of BP and an improvement in the recommended lifestyle modifications. In addition, comfort level regarding communication with the pharmacist was statistically significantly improved. 
The patients responded positively to using the Bluetooth-enabled BP monitor and telehealth for receiving health care services. CONCLUSION: Using Bluetooth-enabled BP monitors that report results in real time into electronic health records, along with pharmacist interventions within a team-based care model, may result in improved BP control and patient outcomes.
Subjects
Pharmacists, Telemedicine, Blood Pressure, Florida, Humans, Pilot Projects, Reproducibility of Results, Technology
ABSTRACT
Reconstruction of full-thickness total or subtotal lower lip defects is a challenge for the reconstructive surgeon because of the difficulty of creating a functional and aesthetically pleasing lip. Many surgical techniques, ranging from local to free flaps, have been reported, each with its own advantages and disadvantages. In particular, free fasciocutaneous flaps are in most cases the first reconstructive option, despite several disadvantages such as procedural complexity, longer operative times, morbidity, longer hospitalization, and a conspicuous donor-site scar. To avoid these problems, especially in elderly patients and in the presence of low compliance and/or comorbidities, the authors propose a single-stage reconstruction with a double overlying cervical flap.
Subjects
Lip/surgery, Cicatrix, Surgical Flaps, Humans, Operative Time
ABSTRACT
BACKGROUND: The purpose of this study was to determine the clinicopathologic factors related to survival and recurrence of primary resected pelvic soft tissue sarcomas (STS). METHODS: Demographic and clinical variables were recorded. RESULTS: Thirty-five patients were identified. Median follow-up was 24.2 months. There were 23 (65.7%) high/intermediate-grade and 12 (34.3%) low-grade tumors included in the final analysis. Eight patients (22.9%) received neoadjuvant therapy. Margins were grossly negative in 27 patients (77.1%; 17 R0, 10 R1) and grossly positive (R2) in 8 (22.9%). Adjuvant therapy was used in 13 patients (37.1%). The 2- and 3-year recurrence-free survival (RFS) rates were 56.5% and 51.3%, with 14 patients recurring at a median of 16 months (6 local, 8 distant). All distant recurrences were in high-grade tumors. There were no differences in RFS by margin status (R0 vs. R1), neoadjuvant/adjuvant therapy, size (≥10 vs. <10 cm), or gender. High/intermediate-grade tumors had worse RFS (P < 0.008). The 2- and 3-year overall survival (OS) was 80.9%. OS was improved for R0/R1 resection (P < 0.001). Resection to an R0/R1 margin was a significant predictor of improved OS (P = 0.001). CONCLUSIONS: High/intermediate-grade lesions were associated with worse OS and RFS. Resection to grossly negative margins was the only independent predictor of OS. Adjuvant therapy may be reserved for high-grade lesions because of their increased metastatic potential.
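Recurrence-free survival at fixed time points, such as the 2- and 3-year RFS above, is typically read off a Kaplan-Meier curve (the record's subject terms list the Kaplan-Meier estimate). The Python sketch below implements the product-limit estimator on a small hypothetical cohort; it is neither the study's data nor its code.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of the event-free probability.

    times:  follow-up time in months for each patient
    events: 1 if recurrence was observed at that time, 0 if censored
    Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Pool all patients whose follow-up ends at this same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censorings both leave the risk set
    return curve


def survival_at(curve: list[tuple[float, float]], t: float) -> float:
    """Read the step function at time t (1.0 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical cohort of 10 patients (months, event flag); not the study's data.
months = [6, 10, 12, 16, 20, 24, 30, 36, 40, 48]
recurred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
curve = kaplan_meier(months, recurred)
print(f"2-year RFS: {survival_at(curve, 24):.1%}")
print(f"3-year RFS: {survival_at(curve, 36):.1%}")
```

Censored patients never lower the curve directly, but they shrink the risk set, so later recurrences weigh more heavily.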
Subjects
Local Neoplasm Recurrence/mortality, Local Neoplasm Recurrence/therapy, Pelvic Neoplasms/mortality, Pelvic Neoplasms/surgery, Retroperitoneal Neoplasms/mortality, Retroperitoneal Neoplasms/surgery, Sarcoma/mortality, Sarcoma/surgery, Adolescent, Adult, Aged, Aged 80 and over, Adjuvant Chemotherapy, Epidemiologic Confounding Factors, Female, Humans, Kaplan-Meier Estimate, Male, Middle Aged, Neoplasm Grading, Pelvic Neoplasms/pathology, Pelvic Neoplasms/therapy, Predictive Value of Tests, Adjuvant Radiotherapy, Retroperitoneal Neoplasms/pathology, Retroperitoneal Neoplasms/therapy, Retrospective Studies, Risk Factors, Sarcoma/pathology, Sarcoma/therapy
ABSTRACT
BACKGROUND AND OBJECTIVES: Artificial intelligence (AI) tools such as ChatGPT and Bard have gained popularity in medical education. The use of AI in family medicine has not yet been assessed. The objective of this study is to compare the performance of three large language models (LLMs; ChatGPT 3.5, ChatGPT 4.0, and Google Bard) on the family medicine in-training exam (ITE). METHODS: The 193 multiple-choice questions of the 2022 ITE, written by the American Board of Family Medicine, were entered into ChatGPT 3.5, ChatGPT 4.0, and Bard. The LLMs' performance was then scored and scaled. RESULTS: ChatGPT 4.0 scored 167/193 (86.5%), with a scaled score of 730 out of 800. According to the Bayesian score predictor, ChatGPT 4.0 has a 100% chance of passing the family medicine board exam. ChatGPT 3.5 scored 66.3%, translating to a scaled score of 400 and an 88% chance of passing the family medicine board exam. Bard scored 64.2%, with a scaled score of 380 and an 85% chance of passing the boards. Compared with the national average of postgraduate year 3 residents, only ChatGPT 4.0 surpassed the residents' mean of 68.4%. CONCLUSIONS: ChatGPT 4.0 was the only LLM that outperformed the national average of family medicine postgraduate year 3 residents on the 2022 ITE, providing robust explanations and demonstrating its potential for delivering background information on common medical concepts that appear on board exams.
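The reported percentages can be cross-checked with simple arithmetic. The Python sketch below recomputes ChatGPT 4.0's percentage from the reported 167 of 193 items and lists which models exceeded the 68.4% resident mean; the other two percentages are taken as reported, since the abstract does not give their raw counts.

```python
TOTAL_ITEMS = 193
RESIDENT_MEAN_PCT = 68.4  # national PGY-3 mean reported in the abstract

# ChatGPT 4.0's percentage is recomputed from 167/193; the others are as reported.
scores_pct = {
    "ChatGPT 4.0": round(100 * 167 / TOTAL_ITEMS, 1),
    "ChatGPT 3.5": 66.3,
    "Bard": 64.2,
}
above_mean = [model for model, pct in scores_pct.items() if pct > RESIDENT_MEAN_PCT]
print(scores_pct["ChatGPT 4.0"], above_mean)  # 86.5 ['ChatGPT 4.0']
```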
Subjects
Educational Measurement, Family Practice, Family Practice/education, Humans, Educational Measurement/methods, Artificial Intelligence, Internship and Residency, Language, Clinical Competence, Bayes Theorem
ABSTRACT
BACKGROUND: Access to dermatologists is limited in parts of the US, making primary care clinicians (PCCs) integral to the early detection of skin cancers. A handheld device using elastic scattering spectroscopy (ESS) was developed to aid PCCs in their clinical assessment of skin lesions. METHODS: In this prospective study, 3 PCCs evaluated skin lesions that patients reported as concerning and scanned each lesion with the handheld ESS device. The reference standard was pathology results or a 3-dermatologist panel examining high-resolution dermatoscopic and clinical images. PCCs reported their diagnosis, management decision, and confidence level for each lesion. Evaluation of results included sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and area under the curve (AUC). RESULTS: A total of 155 patients and 178 lesions were included in the final analysis. The most common patient-reported concerning feature was "new or changing lesion" (91.6%). Based on the biopsy result or dermatologist panel reference standard, device diagnostic sensitivity and specificity were 90.0% and 60.7%, respectively; by comparison, PCC sensitivity and specificity without the device were 40.0% and 84.8%, respectively. Device NPV was 98.9%, and device PPV was 13.6%. The device's recommendations for referral to dermatology were 88.2% concordant with those of the dermatologist panel. AUCs for the device and PCCs were 0.815 and 0.643, respectively. CONCLUSIONS: Use of the ESS device by PCCs can improve diagnostic and management sensitivity for select malignant skin lesions by correctly classifying most benign lesions of patient concern. This may increase skin cancer detection while improving access to specialist care.
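The four accuracy measures above all come from one 2x2 table of test result versus reference standard. The Python sketch below uses hypothetical counts (the abstract does not report the table) chosen to mimic a low-prevalence setting, where a high NPV can coexist with a low PPV, as in the device results.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy measures of an index test against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant lesions the test flags
        "specificity": tn / (tn + fp),  # benign lesions the test clears
        "ppv": tp / (tp + fp),          # flagged lesions that are malignant
        "npv": tn / (tn + fn),          # cleared lesions that are benign
    }


# Hypothetical counts: 170 lesions, 10 of them malignant (not the study's table).
m = diagnostic_metrics(tp=9, fp=60, fn=1, tn=100)
for name, value in m.items():
    print(f"{name}: {value:.1%}")  # sensitivity 90.0%, specificity 62.5%, ppv 13.0%, npv 99.0%
```

Because malignant lesions are rare, even a modest false-positive rate swamps the true positives, which pulls PPV down while NPV stays high.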
Subjects
Sensitivity and Specificity, Skin Neoplasms, Humans, Prospective Studies, Skin Neoplasms/diagnosis, Skin Neoplasms/pathology, Female, Male, Middle Aged, Aged, Spectrum Analysis/methods, Adult, Early Detection of Cancer/methods, Primary Health Care, Aged 80 and over, Dermoscopy/instrumentation, Predictive Value of Tests
ABSTRACT
About 1 in 9 adults aged 65 and older has Alzheimer's disease (AD), and many of them also have multiple other chronic conditions, such as hypertension and diabetes, that require careful monitoring through laboratory tests. Understanding the patterns of laboratory tests in this population aids our understanding and management of these chronic conditions along with AD. In this study, we used a unimodal cosinor model to assess the seasonality of lab tests using electronic health record (EHR) data from 34,303 AD patients in the OneFlorida+ Clinical Research Consortium. We observed significant seasonal fluctuations, with higher values in winter, in lab tests such as glucose, neutrophils per 100 white blood cells (WBC), and WBC. Notably, certain leukocyte types, such as eosinophils, lymphocytes, and monocytes, are elevated during summer, likely reflecting seasonal respiratory diseases and allergens. Seasonality is more pronounced in older patients and varies by gender. Our findings suggest that recognizing these patterns and adjusting reference intervals for seasonality would allow health care providers to enhance diagnostic precision, tailor care, and potentially improve patient outcomes.
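A unimodal cosinor model fits a single cosine wave with a fixed period (12 months for seasonality) by linear regression on cosine and sine terms, from which the mesor (mean level), amplitude, and acrophase (timing of the peak) are derived. The Python sketch below assumes one equally spaced value per month over whole years, in which case least squares reduces to closed-form sums; the glucose series is synthetic, and the study's patient-level model may be specified differently.

```python
import math


def fit_cosinor(y: list[float], period: int = 12) -> tuple[float, float, float]:
    """Fit y(t) = M + A*cos(2*pi*t/period - phi) for t = 0, 1, ..., n-1.

    Assumes equally spaced observations spanning whole cycles (n divisible by
    period, period > 2), so the cosine and sine regressors are orthogonal and
    the least-squares solution has the closed form below.
    """
    n = len(y)
    if period <= 2 or n == 0 or n % period != 0:
        raise ValueError("need whole cycles of equally spaced data")
    w = 2 * math.pi / period
    mesor = sum(y) / n
    beta = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))   # A*cos(phi)
    gamma = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))  # A*sin(phi)
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(gamma, beta)  # radians; the peak falls at t = acrophase / w
    return mesor, amplitude, acrophase


# Synthetic monthly means over two years (t = 0 is January), peaking in winter.
w = 2 * math.pi / 12
glucose = [110 + 4 * math.cos(w * t - 0.3) for t in range(24)]
mesor, amp, phase = fit_cosinor(glucose)
print(round(mesor, 2), round(amp, 2), round(phase, 2))  # 110.0 4.0 0.3
```

With irregular sampling or patient-level covariates, the same cosine and sine terms would instead go into a general regression fit.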
ABSTRACT
Background: Even though patients have easy access to their electronic health records and lab test result data through patient portals, lab results are often confusing and hard to understand. Many patients turn to online forums or question-and-answer (Q&A) sites to seek advice from their peers. However, the quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. Objective: We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with augmentation approaches. Methods: We first collected lab test result-related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. Using the LangChain framework and the ChatGPT web portal, we generated responses to the 53 questions from four LLMs: GPT-4, Meta LLaMA 2, MedAlpaca, and ORCA_mini. We first assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including ROUGE, BLEU, METEOR, and BERTScore. We also used an LLM-based evaluator to judge whether a target model has higher quality than the baseline model in terms of relevance, correctness, helpfulness, and safety. Finally, we performed a manual evaluation with medical experts of all the responses to seven selected questions on the same four aspects. Results: With GPT-4 output used as the reference answer, the responses from LLaMA 2 are the most similar, followed by those from ORCA_mini and MedAlpaca. Human answers from the Yahoo data scored lowest and were thus least similar to GPT-4-generated answers.
The results of the win rate and medical expert evaluations both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all four aspects (relevance, correctness, helpfulness, and safety). However, LLM responses occasionally also suffer from a lack of interpretation in the patient's medical context, incorrect statements, and a lack of references. Conclusions: By evaluating LLMs in generating responses to patients' lab test result-related questions, we find that, compared with the other three LLMs and human answers from the Q&A website, GPT-4's responses are more accurate, helpful, and relevant, and safer. However, there are cases in which GPT-4 responses are inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation.
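The win rate above summarizes many pairwise judgments from the LLM-based evaluator. A minimal Python sketch follows. It assumes each judgment is recorded as "win", "tie", or "loss" for the target model against the baseline and that ties count as half a win, a common convention that the abstract does not confirm; the verdicts are hypothetical.

```python
# Assumed scoring convention: ties are worth half a win.
POINTS = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def win_rate(judgments: list[str]) -> float:
    """Share of pairwise comparisons won by the target model."""
    if not judgments:
        raise ValueError("need at least one judgment")
    # An unexpected label raises KeyError instead of being silently ignored.
    return sum(POINTS[j] for j in judgments) / len(judgments)


# Hypothetical evaluator verdicts for one target model on five questions.
verdicts = ["win", "win", "tie", "loss", "win"]
print(win_rate(verdicts))  # 0.7
```

Computing the rate separately for relevance, correctness, helpfulness, and safety gives one score per aspect, matching the four aspects in the evaluation.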
ABSTRACT
Artificial Intelligence (AI) is poised to revolutionize family medicine, offering a transformative approach to achieving the Quintuple Aim. This article examines the imperative for family medicine to adapt to the rapidly evolving field of AI, with an emphasis on its integration in clinical practice. AI's recent advancements have the potential to significantly transform health care. We argue for the proactive engagement of family medicine in directing AI technologies toward enhancing the "Quintuple Aim." The article highlights potential benefits of AI, such as improved patient outcomes through enhanced diagnostic tools, clinician well-being through reduced administrative burdens, and the promotion of health equity by analyzing diverse data sets. However, we also acknowledge the risks associated with AI, including the potential for automation to diverge from patient-centered care and exacerbate health care disparities. Our recommendations stress the need for family medicine education to incorporate AI literacy, the development of a collaborative for AI integration, and the establishment of guidelines and standards through interdisciplinary cooperation. We conclude that although AI poses challenges, its responsible and ethical implementation can revolutionize family medicine, optimizing patient care and enhancing the role of clinicians in a technology-driven future.
Subjects
Artificial Intelligence, Family Practice, Humans, Family Practice/methods, Patient-Centered Care/organization & administration
ABSTRACT
Importance: Nudges have been increasingly studied as a tool for facilitating behavior change and may represent a novel way to modify the electronic health record (EHR) to encourage evidence-based care. Objective: To evaluate the association between EHR nudges and health care outcomes in primary care settings and describe implementation facilitators and barriers. Evidence Review: On June 9, 2023, an electronic search was performed in PubMed, Embase, PsycINFO, CINAHL, and Web of Science for all articles about clinician-facing EHR nudges. After reviewing titles, abstracts, and full texts, the present review was restricted to articles that used a randomized clinical trial (RCT) design, focused on primary care settings, and evaluated the association between EHR nudges and health care quality and patient outcome measures. Two reviewers abstracted the following elements: country, targeted clinician types, medical conditions studied, length of evaluation period, study design, sample size, intervention conditions, nudge mechanisms, implementation facilitators and barriers encountered, and major findings. The findings were qualitatively reported by type of health care quality and patient outcome and type of primary care condition targeted. The Risk of Bias 2.0 tool was adapted to evaluate the studies based on RCT design (cluster, parallel, crossover). Studies were scored from 0 to 5 points, with higher scores indicating lower risk of bias. Findings: Fifty-four studies met the inclusion criteria. Overall, most studies (79.6%) were assessed to have a moderate risk of bias. Most or all descriptive (eg, documentation patterns) (30 of 38) or patient-centeredness measures (4 of 4) had positive associations with EHR nudges. 
For other measures of health care quality and patient outcomes, fewer showed positive associations with EHR nudges: patient safety (4 of 12), effectiveness (19 of 48), efficiency (0 of 4), patient-reported outcomes (0 of 3), patient adherence (1 of 2), and clinical outcomes (1 of 7). Conclusions and Relevance: This systematic review found low- and moderate-quality evidence suggesting that EHR nudges were associated with improved descriptive measures (eg, documentation patterns). However, it was unclear whether EHR nudges were associated with improvements in other areas of health care quality, such as effectiveness and patient safety outcomes. Future research should use longer evaluation periods, cover a broader range of primary care conditions, and examine deimplementation contexts.
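The counts in the Findings above can be turned into proportions to show how sharply the descriptive and patient-centeredness measures differ from the other categories. The short Python sketch below only restates the abstract's counts; the figure of 43 studies with moderate risk of bias is our own back-calculation from 79.6% of 54, not a number the review reports.

```python
# Counts restated from the abstract: (measures with a positive association, total measures).
reported = {
    "descriptive": (30, 38),
    "patient-centeredness": (4, 4),
    "patient safety": (4, 12),
    "effectiveness": (19, 48),
    "efficiency": (0, 4),
    "patient-reported outcomes": (0, 3),
    "patient adherence": (1, 2),
    "clinical outcomes": (1, 7),
}


def share_positive(counts):
    """Return the fraction of measures with a positive association, per category."""
    return {name: pos / total for name, (pos, total) in counts.items()}


shares = share_positive(reported)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    pos, total = reported[name]
    print(f"{name:<26} {pos:>2}/{total:<2} {share:6.1%}")

# Our back-calculation (not reported by the review): 79.6% of 54 studies.
moderate_rob_studies = round(0.796 * 54)
print("studies with moderate risk of bias:", moderate_rob_studies)
```

These are vote counts of measures rather than pooled effect sizes, so a higher share does not imply a larger effect.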
Subjects
Electronic Health Records, Primary Health Care, Quality of Health Care, Primary Health Care/standards, Primary Health Care/statistics & numerical data, Humans, Electronic Health Records/statistics & numerical data, Electronic Health Records/standards, Quality of Health Care/standards, Quality of Health Care/statistics & numerical data, Outcome Assessment, Health Care/methods
ABSTRACT
Background Undergraduate medical education aims to prepare learners to become capable residents. New interns are expected to perform clinical tasks under distant supervision on the basis of having earned a medical degree. However, there are limited data comparing the entrustment that residency programs grant with what medical schools believe they have trained their graduates to perform. At our institution, we sought to foster an alliance between undergraduate medical education (UME) and graduate medical education (GME) centered on specialty-specific entrustable professional activities (SSEPAs). These SSEPAs create a bridge to residency and help students structure the final year of medical school while striving for entrustability on day one of residency. This paper describes the SSEPA curriculum development process and student self-assessment of competence. Methodology We piloted an SSEPA program with the departments of Family Medicine, Internal Medicine, Neurology, and Obstetrics & Gynecology. Using Kern's curriculum development framework, each specialty designed a longitudinal curriculum with a post-match capstone course. Students completed pre-course and post-course self-assessments using the Chen scale for each entrustable professional activity (EPA). Results A total of 42 students completed the SSEPA curriculum in these four specialties. Students' self-assessed competence levels rose from 2.61 to 3.65 in Internal Medicine, 3.23 to 4.12 in Obstetrics and Gynecology, 3.62 to 4.13 in Neurology, and 3.65 to 3.79 in Family Medicine. Students in all specialties also reported increased confidence, from 3.45 to 4.38 in Internal Medicine, 3.3 to 4.6 in Obstetrics and Gynecology, 3.25 to 4.25 in Neurology, and 4.33 to 4.67 in Family Medicine.
Conclusions A specialty-specific curriculum that uses a competency-based framework for learners transitioning from UME to GME in the final year of medical school improves learner confidence in their clinical abilities and may lead to an improved educational handoff between UME and GME.
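The pre-course and post-course values reported above can be differenced to compare gains across specialties. This minimal Python sketch only restates the abstract's numbers. The abstract gives no variances or per-specialty sample sizes, so these point differences say nothing about statistical significance.

```python
# Pre- and post-course scores restated from the abstract, per specialty.
competence = {
    "Internal Medicine": (2.61, 3.65),
    "Obstetrics and Gynecology": (3.23, 4.12),
    "Neurology": (3.62, 4.13),
    "Family Medicine": (3.65, 3.79),
}
confidence = {
    "Internal Medicine": (3.45, 4.38),
    "Obstetrics and Gynecology": (3.3, 4.6),
    "Neurology": (3.25, 4.25),
    "Family Medicine": (4.33, 4.67),
}


def gains(scores):
    """Post-course minus pre-course score, rounded to two decimals."""
    return {spec: round(post - pre, 2) for spec, (pre, post) in scores.items()}


competence_gain = gains(competence)
confidence_gain = gains(confidence)
for spec in competence:
    print(f"{spec:<26} competence +{competence_gain[spec]:.2f}  "
          f"confidence +{confidence_gain[spec]:.2f}")
```

Family Medicine had both the highest pre-course competence score (3.65) and the smallest competence gain (+0.14).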
ABSTRACT
The rising death rate from stroke over the past eight years has prompted clinicians to look for data-driven decision aids. Recently, deep-learning-based prediction models trained with fine-grained electronic health record (EHR) data have shown particular promise for health outcome prediction. However, the use of EHR-based deep learning models for hemorrhagic stroke outcome prediction has not been extensively explored. This paper proposes an ensemble deep learning framework to predict early mortality among ICU patients with hemorrhagic stroke. The proposed ensemble model achieved an accuracy of 83%, higher than that of the fusion model and other baseline models (logistic regression, decision tree, random forest, and XGBoost). Moreover, we used SHAP values to interpret the ensemble model and identify important features for the prediction. In addition, this paper follows the MINIMAR (MINimum Information for Medical AI Reporting) standard, an important step towards building trust between the AI system and clinicians.
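The abstract does not specify how the base learners are combined or how the SHAP values were computed. As a hedged illustration only, the Python sketch below assumes a soft-voting ensemble (averaging the predicted probabilities of several base models) and computes exact Shapley values by enumerating feature coalitions against a single baseline input. The three logistic base models in `BASE_MODELS`, their weights, the helper names, and the feature values are all hypothetical. The SHAP library itself relies on faster explainers (for example, KernelSHAP or TreeSHAP), because exhaustive enumeration grows exponentially with the number of features.

```python
import itertools
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical base models: (weights, bias) of logistic scorers over 3 features.
BASE_MODELS = [
    ([0.8, -0.5, 0.3], -0.2),
    ([0.6, -0.7, 0.5], 0.1),
    ([1.0, -0.3, 0.2], -0.4),
]


def ensemble_proba(x):
    """Soft voting: average the base models' predicted mortality probabilities."""
    probs = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in BASE_MODELS
    ]
    return sum(probs) / len(probs)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded probability matches the 0/1 label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def exact_shapley(f, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = [x[j] if (j in coalition or j == i) else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


patient = [1.2, -0.8, 0.5]   # hypothetical standardized feature values
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input (e.g., cohort mean)
phi = exact_shapley(ensemble_proba, patient, baseline)
print("risk:", round(ensemble_proba(patient), 3))
print("shapley:", [round(v, 3) for v in phi])
# Efficiency property: contributions sum to f(patient) - f(baseline).
print("sum check:", round(sum(phi), 6),
      round(ensemble_proba(patient) - ensemble_proba(baseline), 6))
```

The efficiency check is a useful sanity test for any Shapley implementation: the per-feature contributions must add up to the difference between the explained prediction and the baseline prediction.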
ABSTRACT
Stroke is a significant cause of mortality and morbidity, necessitating early predictive strategies to minimize risks. Traditional methods for evaluating patients, such as the Acute Physiology and Chronic Health Evaluation (APACHE II and IV) and the Simplified Acute Physiology Score III (SAPS III), have limited accuracy and interpretability. This paper proposes a novel approach: an interpretable, attention-based transformer model for early stroke mortality prediction. The model seeks to address the limitations of previous predictive models by offering both interpretability (clear, understandable explanations of the model) and fidelity (a truthful account of the model's dynamics from input to output). Furthermore, the study explores and compares fidelity and interpretability scores derived from Shapley values and from attention-based scores to improve model explainability. The research objectives include designing an interpretable attention-based transformer model, evaluating its performance against existing models, and deriving feature importance from the model.
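The abstract does not describe the transformer's architecture. As a hedged illustration of what "attention-based scores" can mean, the Python sketch below implements single-head scaled dot-product attention, softmax(q·k/√d), with one query attending over per-feature embeddings, and reads the resulting weights as a feature ranking. The feature names, embeddings, and query vector are hypothetical. Treating attention weights as importance is a common heuristic, but it does not by itself guarantee fidelity, which is why comparing attention-based scores with Shapley values is informative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Single-query attention: weights = softmax(q . k_j / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return output, weights


# Hypothetical feature tokens and 2-dimensional embeddings, kept small for readability.
features = ["age", "glasgow_coma_scale", "systolic_bp"]
keys = [[0.5, 1.0], [1.5, -0.5], [0.2, 0.3]]
values = keys          # reuse the embeddings as values for simplicity
query = [1.0, 0.5]     # hypothetical learned query (e.g., a summary token)

output, weights = scaled_dot_product_attention(query, keys, values)
ranking = sorted(zip(features, weights), key=lambda fw: fw[1], reverse=True)
for name, w in ranking:
    print(f"{name:<20} {w:.3f}")
```

In a real multi-layer, multi-head transformer the weights would need to be aggregated across heads and layers before being read as importance, and that aggregation choice itself affects fidelity.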
ABSTRACT
BACKGROUND: After-hours documentation by US clinicians is often uncompensated work and has been associated with burnout, leading health systems to identify its root causes and seek interventions to reduce it. A few studies have suggested that participation in quality programmes (e.g., the Merit-Based Incentive Payment System [MIPS]) was associated with a higher administrative burden. However, the association between MIPS participation and after-hours documentation has not been fully explored. Thus, this study aims to assess whether participation in the MIPS programme was independently associated with after-hours documentation burden. METHODS: We used 2021 data from the National Electronic Health Records Survey. We used a multivariable ordinal logistic regression model to assess whether MIPS participation was associated with the amount of after-hours documentation when controlling for other factors. We controlled for physician age, specialty, sex, number of practice locations, number of physicians, practice ownership, whether team support (e.g., scribes) is used for documentation tasks, and whether the practice accepts Medicaid patients. RESULTS: We included 1801 office-based US physician respondents with complete data for the variables of interest. After controlling for other factors, MIPS participation was associated with greater odds of spending more hours on after-hours documentation (odds ratio = 1.44, 95% confidence interval 1.06-1.95). CONCLUSIONS: MIPS participation may increase after-hours documentation burden among US office-based physicians, suggesting that physicians may require additional resources to report data more efficiently.
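One common form of ordinal logistic regression is the proportional odds model, in which a single coefficient shifts the log-odds of exceeding every category cutpoint by the same amount. The Python sketch below shows how the reported odds ratio of 1.44 maps onto such a model. It is a hedged illustration: the study may have used a different parameterization or software, the cutpoints in `THETAS` are hypothetical, the other covariates are held fixed and absorbed into those cutpoints, and the standard error is back-calculated assuming a symmetric Wald interval on the log scale.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


# Reported estimate: OR = 1.44 (95% CI 1.06-1.95) for MIPS participation.
ODDS_RATIO, CI_LOW, CI_HIGH = 1.44, 1.06, 1.95
beta = math.log(ODDS_RATIO)  # coefficient on the log-odds scale
# Approximate SE, assuming a symmetric Wald interval on the log scale.
se = (math.log(CI_HIGH) - math.log(CI_LOW)) / (2 * 1.96)

# Hypothetical cutpoints for 4 ordered categories of after-hours time;
# other covariates are held fixed and absorbed into these cutpoints.
THETAS = [-1.0, 0.5, 2.0]


def category_probs(mips, thetas=THETAS, coef=beta):
    """Proportional odds: P(Y > j | mips) = logistic(coef * mips - theta_j)."""
    exceed = [logistic(coef * mips - t) for t in thetas]
    bounds = [1.0] + exceed + [0.0]  # P(Y > j), padded for the end categories
    return [bounds[k] - bounds[k + 1] for k in range(len(thetas) + 1)]


def cumulative_odds_ratios(thetas=THETAS, coef=beta):
    """Odds of exceeding each cutpoint, MIPS vs non-MIPS (constant under the model)."""
    ratios = []
    for t in thetas:
        p1, p0 = logistic(coef - t), logistic(-t)
        ratios.append((p1 / (1 - p1)) / (p0 / (1 - p0)))
    return ratios


print("beta:", round(beta, 4), "se:", round(se, 4))
print("non-MIPS:", [round(p, 3) for p in category_probs(0)])
print("MIPS:    ", [round(p, 3) for p in category_probs(1)])
print("cumulative ORs:", [round(r, 4) for r in cumulative_odds_ratios()])
```

Because the odds ratio is the same at every cutpoint, a single figure such as 1.44 can summarize "greater odds of spending more hours" across all categories. That convenience holds only if the proportional odds assumption is reasonable for the data.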