1 - 20 of 100
1.
Cureus ; 16(4): e57611, 2024 Apr.
Article En | MEDLINE | ID: mdl-38707042

Purpose: The purpose of this study is to assess the accuracy of and bias in recommendations for oculoplastic surgeons from three artificial intelligence (AI) chatbot systems. Methods: ChatGPT, Microsoft Bing Balanced, and Google Bard were asked for recommendations for oculoplastic surgeons practicing in the 20 most populous cities in the United States. Three prompts were used: "can you help me find (an oculoplastic surgeon)/(a doctor who does eyelid lifts)/(an oculofacial plastic surgeon) in (city)." Results: A total of 672 suggestions were made across the three prompts (oculoplastic surgeon; doctor who does eyelid lifts; oculofacial plastic surgeon); 19.8% of suggestions were excluded, leaving 539 suggested physicians. Of these, 64.1% were oculoplastics specialists (of which 70.1% were American Society of Ophthalmic Plastic and Reconstructive Surgery (ASOPRS) members); 16.1% were general plastic surgery trained, 9.0% were ENT trained, 8.8% were ophthalmology but not oculoplastics trained, and 1.9% were trained in another specialty. Across all AI systems, 27.7% of recommended surgeons were female. Conclusions: Among the chatbot systems tested, there were high rates of inaccuracy: up to 38% of recommended surgeons were nonexistent or not practicing in the city requested, and 35.9% of those recommended as oculoplastic/oculofacial plastic surgeons were not oculoplastics specialists. Choice of prompt affected the result: requests for "a doctor who does eyelid lifts" yielded more plastic surgeons and ENTs and fewer oculoplastic surgeons. It is important to identify inaccuracies and biases in recommendations provided by AI systems as more patients may start using them to choose a surgeon.

2.
JMIR Med Inform ; 12: e51842, 2024 May 08.
Article En | MEDLINE | ID: mdl-38722209

Background: Numerous pressure injury prediction models have been developed using electronic health record data, yet hospital-acquired pressure injuries (HAPIs) are increasing, which demonstrates the critical challenge of implementing these models in routine care. Objective: To help bridge the gap between development and implementation, we sought to create a model that was feasible, broadly applicable, dynamic, actionable, and rigorously validated and then compare its performance to usual care (ie, the Braden scale). Methods: We extracted electronic health record data from 197,991 adult hospital admissions with 51 candidate features. For risk prediction and feature selection, we used logistic regression with a least absolute shrinkage and selection operator (LASSO) approach. To compare the model with usual care, we used the area under the receiver operating characteristic curve (AUC), Brier score, slope, intercept, and integrated calibration index. The model was validated using a temporally staggered cohort. Results: A total of 5458 HAPIs were identified between January 2018 and July 2022. We determined that 22 features were necessary to achieve a parsimonious and highly accurate model. The top 5 features included tracheostomy, edema, central line, first albumin measure, and age. Our model achieved higher discrimination than the Braden scale (AUC 0.897, 95% CI 0.893-0.901 vs AUC 0.798, 95% CI 0.791-0.803). Conclusions: We developed and validated an accurate prediction model for HAPIs that surpassed the standard-of-care risk assessment and fulfilled the necessary elements for implementation. Future work includes a pragmatic randomized trial to assess whether our model improves patient outcomes.
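As a companion to the modeling description above, here is a minimal sketch of LASSO-style feature selection and evaluation, assuming scikit-learn and fully synthetic data; the feature count and class imbalance loosely mirror the abstract, but no study settings are reproduced.

```python
# Sketch of the paper's approach: L1-penalized (LASSO) logistic regression for
# HAPI risk, evaluated with AUC and Brier score. Data and settings are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss

# Synthetic stand-in for the 51 candidate EHR features with a rare outcome.
X, y = make_classification(n_samples=20000, n_features=51, n_informative=22,
                           weights=[0.97], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# The L1 penalty drives uninformative coefficients to zero (feature selection).
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_train, y_train)

p = model.predict_proba(X_test)[:, 1]
print("selected features:", int((model.coef_ != 0).sum()))
print("AUC:", round(roc_auc_score(y_test, p), 3))
print("Brier:", round(brier_score_loss(y_test, p), 4))
```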

3.
Obstet Gynecol ; 2024 May 10.
Article En | MEDLINE | ID: mdl-38723260

OBJECTIVE: To develop and validate a predictive model for postpartum hemorrhage that can be deployed in clinical care using automated, real-time electronic health record (EHR) data and to compare performance of the model with a nationally published risk prediction tool. METHODS: A multivariable logistic regression model was developed from retrospective EHR data from 21,108 patients delivering at a quaternary medical center between January 1, 2018, and April 30, 2022. Deliveries were divided into derivation and validation sets based on an 80/20 split by date of delivery. Postpartum hemorrhage was defined as blood loss of 1,000 mL or more in addition to postpartum transfusion of 1 or more units of packed red blood cells. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC) and was compared with a postpartum hemorrhage risk assessment tool published by the CMQCC (California Maternal Quality Care Collaborative). The model was then programmed into the EHR and again validated with prospectively collected data from 928 patients between November 7, 2023, and January 31, 2024. RESULTS: Postpartum hemorrhage occurred in 235 of 16,862 patients (1.4%) in the derivation cohort. The predictive model included 21 risk factors and demonstrated an AUC of 0.81 (95% CI, 0.79-0.84) and calibration slope of 1.0 (Brier score 0.013). During external temporal validation, the model maintained discrimination (AUC 0.80, 95% CI 0.72-0.84) and calibration (calibration slope 0.95, Brier score 0.014). This was superior to the CMQCC tool (AUC 0.69 [95% CI, 0.67-0.70], P<.001). The model maintained performance in prospective, automated data collected with the predictive model in real time (AUC 0.82 [95% CI, 0.73-0.91]). CONCLUSION: We created and temporally validated a postpartum hemorrhage prediction model, demonstrated its superior performance over a commonly used risk prediction tool, successfully coded the model into the EHR, and prospectively validated the model using risk factor data collected in real time. Future work should evaluate external generalizability and effects on patient outcomes; to facilitate this work, we have included the model coefficients and examples of EHR integration in the article.
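The abstract's 80/20 derivation/validation split by delivery date can be sketched as follows; the synthetic columns (delivery_date, prior_pph, platelets) are hypothetical stand-ins, not the study's 21 risk factors.

```python
# Sketch of a temporal derivation/validation split by delivery date, as in the
# study design. All data and column names are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "delivery_date": pd.date_range("2018-01-01", periods=n, freq="6h"),
    "prior_pph": rng.integers(0, 2, n),
    "platelets": rng.normal(240, 60, n),
})
logit = -4.5 + 1.2 * df.prior_pph - 0.004 * (df.platelets - 240)
df["pph"] = rng.random(n) < 1 / (1 + np.exp(-logit))

df = df.sort_values("delivery_date")
cut = int(len(df) * 0.8)                     # 80/20 split by date, per the abstract
derive, validate = df.iloc[:cut], df.iloc[cut:]

features = ["prior_pph", "platelets"]
model = LogisticRegression().fit(derive[features], derive["pph"])
auc = roc_auc_score(validate["pph"], model.predict_proba(validate[features])[:, 1])
print(f"temporal validation AUC: {auc:.2f}")
```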

4.
Front Immunol ; 15: 1384229, 2024.
Article En | MEDLINE | ID: mdl-38571954

Objective: Positive antinuclear antibodies (ANAs) cause diagnostic dilemmas for clinicians. Currently, no tools exist to help clinicians interpret the significance of a positive ANA in individuals without diagnosed autoimmune diseases. We developed and validated a risk model to predict the risk of developing autoimmune disease in ANA-positive individuals. Methods: Using a de-identified electronic health record (EHR), we randomly selected and chart reviewed 2,000 ANA-positive individuals to determine whether a systemic autoimmune disease had been diagnosed by a rheumatologist. A priori, we considered demographics, billing codes for autoimmune disease-related symptoms, and laboratory values as variables for the risk model. We performed logistic regression and machine learning models using training and validation samples. Results: We assembled training (n = 1030) and validation (n = 449) sets. ANA-positive individuals who were younger, female, had a higher-titer ANA, a higher platelet count, disease-specific autoantibodies, and more billing codes related to symptoms of autoimmune diseases were all more likely to develop autoimmune diseases. The most important variables included having a disease-specific autoantibody, the number of billing codes for autoimmune disease-related symptoms, and platelet count. In the logistic regression model, the AUC was 0.83 (95% CI 0.79-0.86) in the training set and 0.75 (95% CI 0.68-0.81) in the validation set. Conclusion: We developed and validated a risk model that predicts risk for developing systemic autoimmune diseases and can be deployed easily within the EHR. The model can risk stratify ANA-positive individuals to ensure high-risk individuals receive urgent rheumatology referrals while reassuring low-risk individuals and reducing unnecessary referrals.


Autoimmune Diseases; Rheumatology; Female; Humans; Antibodies, Antinuclear; Autoantibodies; Autoimmune Diseases/diagnosis; Electronic Health Records; Male
5.
Article En | MEDLINE | ID: mdl-38497958

OBJECTIVE: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal. MATERIALS AND METHODS: Utilizing a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining the portal data with this dataset, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used 10 representative patient portal questions in primary care to generate responses. We asked primary care physicians to review the generated responses from our models and ChatGPT and rate them for empathy, responsiveness, accuracy, and usefulness. RESULTS: The dataset consisted of 499,794 pairs of patient messages and corresponding responses from the patient portal, along with 5,000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short exhibited the ability to generate concise responses similar to providers' responses. CLAIR-Long responses provided increased patient educational content compared to CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, and a neutral rating for usefulness. CONCLUSION: This subjective analysis suggests that leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and healthcare providers.
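A hedged sketch of the described response-reformatting step follows, assuming the current OpenAI Python client; the prompt wording and model name are assumptions, not the authors' configuration.

```python
# Sketch of the preprocessing step described above: using the OpenAI API to
# rewrite terse physician replies into longer, empathetic, educational responses
# for fine-tuning. Prompt wording and model name are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def expand_response(patient_msg: str, physician_reply: str) -> str:
    prompt = (
        "Rewrite the physician's reply as informative paragraphs that educate "
        "the patient while staying empathetic and professional.\n\n"
        f"Patient message: {patient_msg}\n"
        f"Physician reply: {physician_reply}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # assumed; the abstract says only "the OpenAI API"
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```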

6.
Article En | MEDLINE | ID: mdl-38452289

OBJECTIVES: To evaluate the capability of generative artificial intelligence (AI) in summarizing alert comments and to determine whether the AI-generated summaries could be used to improve clinical decision support (CDS) alerts. MATERIALS AND METHODS: We extracted user comments on alerts generated from September 1, 2022 to September 1, 2023 at Vanderbilt University Medical Center. For a subset of 8 alerts, comment summaries were generated independently by 2 physicians and then separately by GPT-4. We surveyed 5 CDS experts to rate the human-generated and AI-generated summaries on a scale from 1 (strongly disagree) to 5 (strongly agree) on 4 metrics: clarity, completeness, accuracy, and usefulness. RESULTS: Five CDS experts participated in the survey. A total of 16 human-generated summaries and 8 AI-generated summaries were assessed. Among the 8 top-rated summaries, 5 were generated by GPT-4. AI-generated summaries demonstrated high levels of clarity, accuracy, and usefulness, similar to the human-generated summaries. Moreover, AI-generated summaries exhibited significantly higher completeness and usefulness than the human-generated summaries (AI: 3.4 ± 1.2, human: 2.7 ± 1.2, P = .001). CONCLUSION: End-user comments provide clinicians' immediate feedback on CDS alerts and can serve as a direct and valuable data resource for improving CDS delivery. Traditionally, these comments may not be considered in the CDS review process due to their unstructured nature, large volume, and the presence of redundant or irrelevant content. Our study demonstrates that GPT-4 is capable of distilling these comments into summaries characterized by high clarity, accuracy, and completeness. AI-generated summaries are equivalent to, and potentially better than, human-generated summaries. These AI-generated summaries could provide CDS experts with a novel means of reviewing user comments to rapidly optimize CDS alerts both online and offline.
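A worked sketch of the rating comparison follows; the abstract does not name the statistical test, so a Mann-Whitney U test (a common choice for ordinal Likert ratings) is assumed, and the scores are fabricated placeholders.

```python
# Sketch of comparing Likert ratings of AI- vs human-generated summaries.
# Ratings are fabricated; the test choice is an assumption, not the study's.
import numpy as np
from scipy.stats import mannwhitneyu

ai_ratings = np.array([4, 5, 3, 4, 5, 4, 3, 4])      # hypothetical 1-5 scores
human_ratings = np.array([3, 3, 2, 4, 3, 2, 3, 3])

stat, p = mannwhitneyu(ai_ratings, human_ratings, alternative="two-sided")
print(f"AI: {ai_ratings.mean():.1f} ± {ai_ratings.std(ddof=1):.1f}, "
      f"human: {human_ratings.mean():.1f} ± {human_ratings.std(ddof=1):.1f}, "
      f"P = {p:.3f}")
```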

7.
JAMA Intern Med ; 184(5): 484-492, 2024 May 01.
Article En | MEDLINE | ID: mdl-38466302

Importance: Chronic kidney disease (CKD) affects 37 million adults in the United States, and for patients with CKD, hypertension is a key risk factor for adverse outcomes, such as kidney failure, cardiovascular events, and death. Objective: To evaluate a computerized clinical decision support (CDS) system for the management of uncontrolled hypertension in patients with CKD. Design, Setting, and Participants: This multiclinic, randomized clinical trial randomized primary care practitioners (PCPs) at a primary care network, including 15 hospital-based, ambulatory, and community health center-based clinics, through a stratified, matched-pair randomization approach from February 2021 to February 2022. All adult patients with a visit to a PCP in the previous 2 years were eligible, and those with evidence of CKD and hypertension were included. Intervention: The intervention consisted of a CDS system based on behavioral economic principles and human-centered design methods that delivered tailored, evidence-based recommendations, including initiation or titration of renin-angiotensin-aldosterone system inhibitors. The patients in the control group received usual care from PCPs with the CDS system operating in silent mode. Main Outcomes and Measures: The primary outcome was the change in mean systolic blood pressure (SBP) between baseline and 180 days compared between groups. The primary analysis was a repeated measures linear mixed model, using SBP at baseline, 90 days, and 180 days in an intention-to-treat repeated measures model to account for missing data. Secondary outcomes included blood pressure (BP) control and outcomes such as the percentage of patients who received an action that aligned with the CDS recommendations. Results: The study included 174 PCPs and 2026 patients (mean [SD] age, 75.3 [0.3] years; 1223 [60.4%] female; mean [SD] SBP at baseline, 154.0 [14.3] mm Hg), with 87 PCPs and 1029 patients randomized to the intervention and 87 PCPs and 997 patients randomized to usual care. Overall, 1714 patients (84.6%) were treated for hypertension at baseline. There were 1623 patients (80.1%) with an SBP measurement at 180 days. From the linear mixed model, there was a statistically significant difference in mean SBP change in the intervention group compared with the usual care group (change, -14.6 [95% CI, -16.0 to -13.1] mm Hg vs -11.7 [95% CI, -13.1 to -10.2] mm Hg; P = .005). There was no difference in the percentage of patients who achieved BP control in the intervention group compared with the control group (50.4% [95% CI, 46.5% to 54.3%] vs 47.1% [95% CI, 43.3% to 51.0%]). More patients received an action aligned with the CDS recommendations in the intervention group than in the usual care group (49.9% [95% CI, 45.1% to 54.8%] vs 34.6% [95% CI, 29.8% to 39.4%]; P < .001). Conclusions and Relevance: These findings suggest that implementing this computerized CDS system could lead to improved management of uncontrolled hypertension and potentially improved clinical outcomes at the population level for patients with CKD. Trial Registration: ClinicalTrials.gov Identifier: NCT03679247.
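The primary analysis described above (a repeated-measures linear mixed model of SBP over time by arm) might look roughly like the following statsmodels sketch; the formula, column names, and simulated effect sizes are assumptions, not the trial's analysis code.

```python
# Sketch of a repeated-measures linear mixed model of SBP over time by arm,
# with a random intercept per patient. Data are simulated; the effect sizes
# merely echo the trial's reported SBP changes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 600
patients = pd.DataFrame({"pid": range(n), "arm": rng.integers(0, 2, n)})
rows = []
for _, p in patients.iterrows():
    base = rng.normal(154, 14)                       # baseline SBP, mm Hg
    for t in (0, 90, 180):
        drop = (14.6 if p.arm else 11.7) * t / 180   # simulated arm effect
        rows.append({"pid": p.pid, "arm": p.arm, "day": t,
                     "sbp": base - drop + rng.normal(0, 8)})
df = pd.DataFrame(rows)

# Random intercept per patient; fixed effects for visit day, arm, interaction.
model = smf.mixedlm("sbp ~ C(day) * arm", df, groups=df["pid"]).fit()
print(model.summary())
```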


Antihypertensive Agents; Decision Support Systems, Clinical; Hypertension; Renal Insufficiency, Chronic; Humans; Female; Male; Hypertension/drug therapy; Hypertension/complications; Renal Insufficiency, Chronic/complications; Renal Insufficiency, Chronic/therapy; Antihypertensive Agents/therapeutic use; Aged; Middle Aged; Primary Health Care/methods
8.
J Am Med Inform Assoc ; 31(4): 968-974, 2024 Apr 03.
Article En | MEDLINE | ID: mdl-38383050

OBJECTIVE: To develop and evaluate a data-driven process to generate suggestions for improving alert criteria using explainable artificial intelligence (XAI) approaches. METHODS: We extracted data on alerts generated from January 1, 2019 to December 31, 2020, at Vanderbilt University Medical Center. We developed machine learning models to predict user responses to alerts. We applied XAI techniques to generate global explanations and local explanations. We evaluated the generated suggestions by comparing them with the alerts' historical change logs and through stakeholder interviews. Suggestions that either matched (or partially matched) changes already made to the alert or were considered clinically correct were classified as helpful. RESULTS: The final dataset included 2,991,823 firings with 2689 features. Among the 5 machine learning models, the LightGBM model achieved the highest area under the ROC curve: 0.919 [0.918, 0.920]. We identified 96 helpful suggestions. A total of 278,807 firings (9.3%) could have been eliminated. Some of the suggestions also revealed workflow and education issues. CONCLUSION: We developed a data-driven process to generate suggestions for improving alert criteria using XAI techniques. Our approach could identify improvements regarding clinical decision support (CDS) that might be overlooked or delayed in manual reviews. It also unveils a secondary purpose for XAI: to improve quality by discovering scenarios where CDS alerts are not accepted due to workflow, education, or staffing issues.
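A minimal sketch of the described XAI workflow, pairing LightGBM with SHAP tree explanations on synthetic data, is shown below; hyperparameters and feature semantics are illustrative assumptions.

```python
# Sketch of the XAI workflow: a LightGBM model of alert response, explained
# globally with SHAP. Data are synthetic; settings are not the study's.
import numpy as np
import shap
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
model = lgb.LGBMClassifier(n_estimators=200).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global explanation: mean |SHAP| per feature suggests which alert criteria
# drive user responses (candidates for criteria changes). Handle both the
# list (per-class) and array return shapes across shap versions.
vals = shap_values[1] if isinstance(shap_values, list) else shap_values
global_importance = np.abs(vals).mean(axis=0)
print("most influential feature index:", int(global_importance.argmax()))
```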


Artificial Intelligence; Decision Support Systems, Clinical; Humans; Machine Learning; Academic Medical Centers; Educational Status
9.
J Gen Intern Med ; 39(1): 27-35, 2024 Jan.
Article En | MEDLINE | ID: mdl-37528252

BACKGROUND: Early detection of clinical deterioration among hospitalized patients is a clinical priority for patient safety and quality of care. Current automated approaches for identifying these patients perform poorly at identifying imminent events. OBJECTIVE: Develop a machine learning algorithm using pager messages sent between clinical team members to predict imminent clinical deterioration. DESIGN: We conducted a large observational study using long short-term memory machine learning models on the content and frequency of clinical pages. PARTICIPANTS: We included all hospitalizations between January 1, 2018 and December 31, 2020 at Vanderbilt University Medical Center that included at least one page message to physicians. Exclusion criteria included patients receiving palliative care, hospitalizations with a planned intensive care stay, and hospitalizations in the top 2% longest length of stay. MAIN MEASURES: Model classification performance to identify in-hospital cardiac arrest, transfer to intensive care, or Rapid Response activation in the next 3, 6, and 12 hours. We compared model performance against three common early warning scores: Modified Early Warning Score, National Early Warning Score, and the Epic Deterioration Index. KEY RESULTS: There were 87,783 patients (mean [SD] age 54.0 [18.8] years; 45,835 [52.2%] women) who experienced 136,778 hospitalizations. A total of 6214 hospitalized patients experienced a deterioration event. The machine learning model accurately identified 62% of deterioration events within 3 hours prior to the event and 47% of events within 12 hours. Across each time horizon, the model surpassed the performance of the best early warning score, including area under the receiver operating characteristic curve at 6 hours (0.856 vs. 0.781), sensitivity at 6 hours (0.590 vs. 0.505), specificity at 6 hours (0.900 vs. 0.878), and F-score at 6 hours (0.291 vs. 0.220). CONCLUSIONS: Machine learning applied to the content and frequency of clinical pages improves prediction of imminent deterioration. Using clinical pages to monitor patient acuity supports improved detection of imminent deterioration without requiring changes to clinical workflow or nursing documentation.
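A minimal PyTorch sketch of an LSTM classifier over tokenized page text follows; the vocabulary size, dimensions, and sequence windowing are assumptions, not the study's architecture.

```python
# Minimal sketch of an LSTM classifier over tokenized pager messages, in the
# spirit of the model described above. All sizes are assumptions.
import torch
import torch.nn as nn

class PageDeteriorationLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)   # P(deterioration within horizon)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)                # final hidden state
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)

model = PageDeteriorationLSTM()
fake_batch = torch.randint(1, 10000, (4, 50))   # 4 page sequences, 50 tokens
print(model(fake_batch))                        # 4 risk scores in [0, 1]
```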


Clinical Deterioration; Humans; Female; Middle Aged; Male; Hospitalization; Critical Care; ROC Curve; Algorithms; Machine Learning; Retrospective Studies
10.
Cureus ; 15(9): e45911, 2023 Sep.
Article En | MEDLINE | ID: mdl-37885556

PURPOSE AND DESIGN: To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots, namely ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities. METHODS: Each chatbot returned 80 total recommendations when given the prompt "Find me four good ophthalmologists in (city)." Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson's chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and recommendation accuracy. RESULTS: The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). ChatGPT recommended fewer female (29.5%) than male ophthalmologists, a proportion not significantly different from the national average (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) gave high rates of inaccurate recommendations. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots. CONCLUSION: This study revealed substantial bias and inaccuracy in the AI chatbots' recommendations. They struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city. Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all chatbots favored recommending ophthalmologists in academic medicine.
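The one-proportion z-test described in the methods can be reproduced as a short worked example with statsmodels; the counts below are illustrative, not the study's tallies.

```python
# Worked sketch of the one-proportion z-test comparing a chatbot's share of
# recommended female ophthalmologists with the 27.2% national benchmark.
# Counts are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

recommended_female, total = 5, 62          # hypothetical per-chatbot counts
stat, p = proportions_ztest(count=recommended_female, nobs=total,
                            value=0.272, alternative="two-sided")
print(f"z = {stat:.2f}, p = {p:.4f}")
```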

11.
Appl Clin Inform ; 14(5): 833-842, 2023 Oct.
Article En | MEDLINE | ID: mdl-37541656

OBJECTIVES: Geocoding, the process of converting addresses into precise geographic coordinates, allows researchers and health systems to obtain neighborhood-level estimates of social determinants of health. This information supports opportunities to personalize care and interventions for individual patients based on the environments where they live. We developed an integrated offline geocoding pipeline to streamline the process of obtaining address-based variables, which can be integrated into existing data processing pipelines. METHODS: POINT is a web-based, containerized application for geocoding addresses that can be deployed offline and made available to multiple users across an organization. Our application supports use through both a graphical user interface and an application programming interface to query geographic variables, by census tract, without exposing sensitive patient data. We evaluated our application's performance using two datasets: one consisting of 1 million nationally representative addresses sampled from Open Addresses, and the other consisting of 3,096 previously geocoded patient addresses. RESULTS: A total of 99.4% and 99.8% of addresses in the Open Addresses and patient addresses datasets, respectively, were geocoded successfully. Census tract assignment was concordant with reference in greater than 90% of addresses for both datasets. Among successful geocodes, median (interquartile range) distances from reference coordinates were 52.5 (26.5-119.4) and 14.5 (10.9-24.6) m for the two datasets. CONCLUSION: POINT successfully geocodes more addresses and yields similar accuracy to existing solutions, including the U.S. Census Bureau's official geocoder. Addresses are considered protected health information and cannot be shared with common online geocoding services. POINT is an offline solution that enables scalability to multiple users and integrates downstream mapping to neighborhood-level variables with a pipeline that allows users to incorporate additional datasets as they become available. As health systems and researchers continue to explore and improve health equity, it is essential to quickly and accurately obtain neighborhood variables in a Health Insurance Portability and Accountability Act (HIPAA)-compliant way.
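A hypothetical sketch of querying a locally deployed POINT-style service over HTTP is shown below; the endpoint path and response fields are assumptions, since the abstract states only that a graphical interface and an application programming interface exist.

```python
# Hypothetical sketch of calling a locally deployed, offline geocoding service
# like POINT over HTTP. The endpoint path and JSON fields are assumptions.
import requests

def geocode(address: str, base_url: str = "http://localhost:8080"):
    resp = requests.get(f"{base_url}/geocode", params={"address": address},
                        timeout=10)
    resp.raise_for_status()
    data = resp.json()
    # Assumed fields: coordinates plus the census tract used to join
    # neighborhood-level social-determinants variables downstream.
    return data["lat"], data["lon"], data["census_tract"]

# Because the service runs inside the institution's network, addresses
# (protected health information) never leave the HIPAA boundary.
```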


Geographic Information Systems; Geographic Mapping; Humans; Residence Characteristics; Software
12.
J Am Med Inform Assoc ; 30(10): 1755, 2023 Sep 25.
Article En | MEDLINE | ID: mdl-37535834
13.
Int J Med Inform ; 177: 105136, 2023 09.
Article En | MEDLINE | ID: mdl-37392712

OBJECTIVE: To develop and validate an approach that identifies patients eligible for lung cancer screening (LCS) by combining structured and unstructured smoking data from the electronic health record (EHR). METHODS: We identified patients aged 50-80 years who had at least one encounter in a primary care clinic at Vanderbilt University Medical Center (VUMC) between 2019 and 2022. We fine-tuned an existing natural language processing (NLP) tool to extract quantitative smoking information using clinical notes collected from VUMC. Then, we developed an approach to identify patients who are eligible for LCS by combining smoking information from structured data and clinical narratives. We compared this method with two approaches that identify LCS eligibility using only smoking information from structured EHR data. We used 50 patients with a documented history of tobacco use for comparison and validation. RESULTS: A total of 102,475 patients were included. The NLP-based approach achieved an F1-score of 0.909 and an accuracy of 0.96. The baseline approach could identify 5,887 patients. Compared to the baseline approach, using all structured data identified 7,194 patients (a 22.2% increase) and the NLP-based algorithm identified 10,231 (a 73.8% increase). The NLP-based approach also identified 589 Black/African American patients, a significant increase of 119%. CONCLUSION: We present a feasible NLP-based approach to identify LCS-eligible patients. It provides a technical basis for the development of clinical decision support tools to potentially improve the utilization of LCS and diminish healthcare disparities.
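The eligibility-rule layer that consumes structured plus NLP-extracted smoking data might look like the following sketch; the thresholds follow the 2021 USPSTF criteria (age 50-80, at least 20 pack-years, current smoker or quit within 15 years), which is an assumption about the study's exact rule.

```python
# Sketch of an LCS eligibility rule combining structured smoking fields with
# NLP-extracted quantities. Thresholds assume the 2021 USPSTF criteria, not
# necessarily the study's exact logic.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SmokingRecord:
    age: int
    pack_years: Optional[float]      # best of structured field or NLP extraction
    current_smoker: bool
    years_since_quit: Optional[float]

def lcs_eligible(r: SmokingRecord) -> bool:
    if r.pack_years is None or not (50 <= r.age <= 80):
        return False
    recent = r.current_smoker or (
        r.years_since_quit is not None and r.years_since_quit <= 15)
    return r.pack_years >= 20 and recent

print(lcs_eligible(SmokingRecord(63, 30.0, False, 8.0)))   # True
```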


Lung Neoplasms; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/epidemiology; Early Detection of Cancer; Electronic Health Records; Natural Language Processing; Smoking/epidemiology
14.
Yearb Med Inform ; 32(1): 169-178, 2023 Aug.
Article En | MEDLINE | ID: mdl-37414030

OBJECTIVES: This literature review summarizes relevant studies from the last three years (2020-2022) related to clinical decision support (CDS) and CDS impact on health disparities and the digital divide. This survey identifies current trends and synthesizes evidence-based recommendations and considerations for future development and implementation of CDS tools. METHODS: We conducted a search in PubMed for literature published between 2020 and 2022. Our search strategy was constructed as a combination of the MEDLINE®/PubMed® Health Disparities and Minority Health Search Strategy and relevant CDS MeSH terms and phrases. We then extracted relevant data from the studies, including priority population when applicable, domain of influence on the disparity being addressed, and the type of CDS being used. We also made note of when a study discussed the digital divide in some capacity and organized the comments into general themes through group discussion. RESULTS: Our search yielded 520 studies, with 45 included at the conclusion of screening. The most frequent CDS type in this review was point-of-care alerts/reminders (33.3%). Health Care System was the most frequent domain of influence (71.1%), and Blacks/African Americans were the most frequently included priority population (42.2%). Throughout the literature, we found four general themes related to the technology divide: inaccessibility of technology, access to care, trust of technology, and technology literacy. CONCLUSIONS: This survey revealed the diversity of CDS being used to address health disparities and several barriers that may make CDS less effective or potentially harmful to certain populations. Regular examination of literature that features CDS and addresses health disparities can help to reveal new strategies and patterns for improving healthcare.


Decision Support Systems, Clinical; Digital Divide; Humans; Delivery of Health Care; Surveys and Questionnaires; Health Inequities
15.
medRxiv ; 2023 Jul 16.
Article En | MEDLINE | ID: mdl-37503263

Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal. Methods: Utilizing a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining the portal data with this dataset, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used ten representative patient portal questions in primary care to generate responses. We asked primary care physicians to review the generated responses from our models and ChatGPT and rate them for empathy, responsiveness, accuracy, and usefulness. Results: The dataset consisted of a total of 499,794 pairs of patient messages and corresponding responses from the patient portal, along with 5,000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short exhibited the ability to generate concise responses similar to providers' responses. CLAIR-Long responses provided increased patient educational content compared to CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, and a neutral rating for usefulness. Conclusion: Leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and primary care providers.

16.
Appl Clin Inform ; 14(3): 528-537, 2023 May.
Article En | MEDLINE | ID: mdl-37437601

BACKGROUND: Chronic kidney disease (CKD) is common and associated with adverse clinical outcomes. Most care for early CKD is provided in primary care, including hypertension (HTN) management. Computerized clinical decision support (CDS) can improve the quality of care for CKD but can also cause alert fatigue for primary care physicians (PCPs). Computable phenotypes (CPs) are algorithms to identify disease populations using, for example, specific laboratory data criteria. OBJECTIVES: Our objective was to determine the feasibility of implementation of CDS alerts by developing CPs and estimating the potential alert burden. METHODS: We utilized clinical guidelines to develop a set of five CPs for patients with stage 3 to 4 CKD, uncontrolled HTN, and indications for initiation or titration of guideline-recommended antihypertensive agents. We then conducted an iterative data analytic process consisting of database queries, data validation, and subject matter expert discussion to make iterative changes to the CPs. We estimated the potential alert burden to make final decisions about the scope of the CDS alerts. Specifically, the number of times that each alert could fire was limited to once per patient. RESULTS: In our primary care network, there were 239,339 encounters for 105,992 primary care patients between April 1, 2018 and April 1, 2019. Of these patients, 9,081 (8.6%) had stage 3 to 4 CKD. Almost half of the CKD patients, 4,191, also had uncontrolled HTN. The majority of CKD patients were female, elderly, white, and English-speaking. We estimated that 5,369 alerts would fire if alerts were triggered multiple times per patient, with a mean number of alerts shown to each PCP ranging from 0.07 to 0.17 alerts per week. CONCLUSION: Development of CPs and estimation of alert burden allows researchers to iteratively fine-tune CDS prior to implementation. This method of assessment can help organizations balance the tradeoff between standardization of care and alert fatigue.
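One of the computable phenotypes could be expressed as a simple dataframe filter like the sketch below; the eGFR and blood pressure thresholds are standard definitions and an assumption about the study's exact criteria.

```python
# Sketch of a computable phenotype as a dataframe filter: stage 3-4 CKD by
# eGFR (a single value is used here for brevity; clinically, two results over
# 90+ days would be required) plus uncontrolled hypertension (BP >= 140/90).
# Thresholds are standard definitions, assumed rather than taken from the study.
import pandas as pd

patients = pd.DataFrame({
    "pid": [1, 2, 3, 4],
    "egfr": [48, 75, 22, 55],    # mL/min/1.73 m2
    "sbp": [152, 138, 146, 128],
    "dbp": [88, 84, 94, 78],
})

ckd_stage_3_4 = patients.egfr.between(15, 59)
uncontrolled_htn = (patients.sbp >= 140) | (patients.dbp >= 90)
cohort = patients[ckd_stage_3_4 & uncontrolled_htn]
print(cohort.pid.tolist())       # patients who would trigger the CDS alert
```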


Decision Support Systems, Clinical; Female; Male; Animals; Feasibility Studies; Algorithms; Cognition; Phenotype
17.
J Gen Intern Med ; 38(11): 2546-2552, 2023 Aug.
Article En | MEDLINE | ID: mdl-37254011

BACKGROUND: Clinical trials indicate continuous glucose monitor (CGM) use may benefit adults with type 2 diabetes, but CGM rates and correlates in real-world care settings are unknown. OBJECTIVE: We sought to ascertain the prevalence and correlates of CGM use and to examine rates of new CGM prescriptions across clinic types and medication regimens. DESIGN: Retrospective cohort using electronic health records in a large academic medical center in the Southeastern US. PARTICIPANTS: Adults with type 2 diabetes and a primary care or endocrinology visit during 2021. MAIN MEASURES: Age, gender, race, ethnicity, insurance, clinic type, insulin regimen, hemoglobin A1c values, CGM prescriptions, and prescribing clinic type. KEY RESULTS: Among 30,585 adults with type 2 diabetes, 13% had used a CGM. Compared with non-users, CGM users were younger and more often had private health insurance (p < .05); 72% of CGM users had an intensive insulin regimen, but 12% were not taking insulin. CGM users had higher hemoglobin A1c values (both most recent and most proximal to the first CGM prescription) than non-users. CGM users were more likely to receive endocrinology care than non-users, but 23% had only primary care visits in 2021. For each month in 2021, a mean of 90.5 (SD 12.5) people started using CGM. From 2020 to 2021, monthly rates of CGM prescriptions to new users grew 36% overall, but 125% in primary care. Most patients starting CGM in endocrinology had an intensive insulin regimen (82% vs. 49% starting in primary care), whereas 28% starting CGM in primary care were not using insulin (vs. 5% in endocrinology). CONCLUSION: CGM uptake for type 2 diabetes is increasing rapidly, with most growth in primary care. These trends present opportunities for healthcare systems to adapt workflows that support CGM use in primary care as uptake grows.


Diabetes Mellitus, Type 1; Diabetes Mellitus, Type 2; Hypoglycemia; Adult; Humans; Diabetes Mellitus, Type 2/drug therapy; Diabetes Mellitus, Type 2/epidemiology; Glycated Hemoglobin; Diabetes Mellitus, Type 1/drug therapy; Hypoglycemia/epidemiology; Retrospective Studies; Blood Glucose Self-Monitoring; Blood Glucose; Insulin/therapeutic use; Primary Health Care; Hypoglycemic Agents/therapeutic use
18.
Urolithiasis ; 51(1): 73, 2023 Apr 17.
Article En | MEDLINE | ID: mdl-37067633

This study seeks to evaluate the Recurrence Of Kidney Stones (ROKS) nomogram for risk stratification of recurrence in a retrospective study. To do this, we analyzed the performance of the 2018 ROKS nomogram in a case-control study of 200 patients (100 with and 100 without subsequent recurrence). All patients underwent kidney stone surgery between 2013 and 2015 and had at least 5 years of follow-up. We evaluated ROKS performance for prediction of recurrence at 2 and 5 years via the area under the receiver operating characteristic curve (ROC-AUC). Specifically, we assessed the nomogram's potential for stratifying patients based on low or high risk of recurrence at: a) an optimized cutoff threshold (i.e., optimized for both sensitivity and specificity), and b) a sensitive cutoff threshold (i.e., high sensitivity (0.80) and low specificity). We found fair performance of the nomogram for recurrence prediction at 2 and 5 years (ROC-AUC of 0.67 and 0.63, respectively). At the optimized cutoff threshold, recurrence rates for the low- and high-risk groups were 20% and 45% at 2 years, and 50% and 70% at 5 years, respectively. At the sensitive cutoff threshold, the corresponding recurrence rates for the low- and high-risk groups were 16% and 38% at 2 years, and 42% and 66% at 5 years, respectively. Kaplan-Meier analysis revealed a recurrence-free survival advantage between the groups for both cutoff thresholds (p < 0.01). Therefore, we believe that the ROKS nomogram could facilitate risk stratification for stone recurrence and adherence to risk-based surveillance protocols.
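The two cutoff strategies described above can be derived from an ROC curve as in the following sketch, using synthetic scores; the Youden index is assumed for the "optimized" threshold, which the abstract implies but does not name.

```python
# Sketch of deriving the two operating points: a Youden-optimal cutoff
# (balancing sensitivity and specificity) and a fixed high-sensitivity (0.80)
# cutoff. Scores are synthetic, not nomogram outputs.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                        # recurrence yes/no
scores = y * rng.normal(0.6, 0.2, 200) + (1 - y) * rng.normal(0.45, 0.2, 200)

fpr, tpr, thresholds = roc_curve(y, scores)

youden_cut = thresholds[np.argmax(tpr - fpr)]      # optimized threshold
sens_cut = thresholds[np.argmax(tpr >= 0.80)]      # first cutoff with sens >= 0.80
print(f"optimized cutoff: {youden_cut:.2f}, sensitive cutoff: {sens_cut:.2f}")
```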


Kidney Calculi; Nomograms; Humans; Case-Control Studies; Feasibility Studies; Kidney Calculi/diagnosis; Kidney Calculi/surgery; Retrospective Studies; Risk Assessment; Recurrence
19.
J Am Med Inform Assoc ; 30(7): 1237-1245, 2023 Jun 20.
Article En | MEDLINE | ID: mdl-37087108

OBJECTIVE: To determine if ChatGPT can generate useful suggestions for improving clinical decision support (CDS) logic and to assess noninferiority compared to human-generated suggestions. METHODS: We supplied summaries of CDS logic to ChatGPT, an artificial intelligence (AI) tool for question answering that uses a large language model, and asked it to generate suggestions. We asked human clinician reviewers to review the AI-generated suggestions as well as human-generated suggestions for improving the same CDS alerts, and to rate the suggestions for their usefulness, acceptance, relevance, understanding, workflow, bias, inversion, and redundancy. RESULTS: Five clinicians analyzed 36 AI-generated suggestions and 29 human-generated suggestions for 7 alerts. Of the 20 suggestions that scored highest in the survey, 9 were generated by ChatGPT. The suggestions generated by AI were found to offer unique perspectives and were evaluated as highly understandable and relevant, with moderate usefulness and low acceptance, bias, inversion, and redundancy. CONCLUSION: AI-generated suggestions could be an important complementary part of optimizing CDS alerts, can identify potential improvements to alert logic and support their implementation, and may even be able to assist experts in formulating their own suggestions for CDS improvement. ChatGPT shows great potential for using large language models and reinforcement learning from human feedback to improve CDS alert logic and potentially other medical areas involving complex clinical logic, a key step in the development of an advanced learning health system.


Decision Support Systems, Clinical; Learning Health System; Humans; Artificial Intelligence; Language; Workflow
20.
medRxiv ; 2023 Feb 23.
Article En | MEDLINE | ID: mdl-36865144

Objective: To determine if ChatGPT can generate useful suggestions for improving clinical decision support (CDS) logic and to assess noninferiority compared to human-generated suggestions. Methods: We supplied summaries of CDS logic to ChatGPT, an artificial intelligence (AI) tool for question answering that uses a large language model, and asked it to generate suggestions. We asked human clinician reviewers to review the AI-generated suggestions as well as human-generated suggestions for improving the same CDS alerts, and to rate the suggestions for their usefulness, acceptance, relevance, understanding, workflow, bias, inversion, and redundancy. Results: Five clinicians analyzed 36 AI-generated suggestions and 29 human-generated suggestions for 7 alerts. Of the 20 suggestions that scored highest in the survey, 9 were generated by ChatGPT. The suggestions generated by AI were found to offer unique perspectives and were evaluated as highly understandable and relevant, with moderate usefulness and low acceptance, bias, inversion, and redundancy. Conclusion: AI-generated suggestions could be an important complementary part of optimizing CDS alerts, can identify potential improvements to alert logic and support their implementation, and may even be able to assist experts in formulating their own suggestions for CDS improvement. ChatGPT shows great potential for using large language models and reinforcement learning from human feedback to improve CDS alert logic and potentially other medical areas involving complex clinical logic, a key step in the development of an advanced learning health system.

...