Results 1 - 20 of 298
1.
J Natl Cancer Inst ; 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39348179

ABSTRACT

BACKGROUND: Advance care planning/serious illness conversations can help clinicians understand patients' values and preferences. There are limited data on how to increase these conversations, and their effect on care patterns. We hypothesized that using a machine learning survival model to select patients for serious illness conversations, along with trained care coaches to conduct the conversations, would increase uptake in cancer patients at high risk of short-term mortality. METHODS: We conducted a cluster-randomized stepped wedge study on the physician level. Oncologists entered the intervention condition in a random order over six months. Adult patients with metastatic cancer were included. Patients with <2 year computer-predicted survival and no prognosis documentation were classified as high-priority for serious illness conversations. In the intervention condition, providers received automated weekly emails highlighting high-priority patients and were asked to document prognosis for them. Care coaches reached out to these patients to conduct the remainder of the conversation. The primary endpoint was proportion of visits with prognosis documentation within 14 days. RESULTS: 6,372 visits in 1,825 patients were included in the primary analysis. The proportion of visits with prognosis documentation within 14 days was higher in the intervention condition than control condition: 2.9% vs 1.1% (adjusted odds ratio 4.3, p < .0001). The proportion of visits with advance care planning documentation was also higher in the intervention condition: 7.7% vs 1.8% (adjusted odds ratio 14.2, p < .0001). In high-priority visits, advance care planning documentation rate in intervention/control visits was 24.2% vs 4.0%. CONCLUSION: The intervention increased documented conversations, with contributions by both providers and care coaches.

4.
Nat Med ; 30(9): 2409-2410, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39060659
5.
Res Sq ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38978576

ABSTRACT

Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current shortage of both general and specialized radiologists, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies while simultaneously using the images to extract novel physiological insights. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs) that utilize both the image and the corresponding textual radiology reports. However, current medical VLMs are generally limited to 2D images and short reports. To overcome these shortcomings for abdominal CT interpretation, we introduce Merlin - a 3D VLM that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining without requiring additional manual annotations. We train Merlin using a high-quality clinical dataset of paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We comprehensively evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model-adapted tasks include 5-year chronic disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to show that Merlin performs favorably compared with existing task-specific baselines.
We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU. This computationally efficient design can help democratize foundation model training, especially for health systems with compute constraints. We plan to release our trained models, code, and dataset, pending manual removal of all protected health information.

6.
NPJ Digit Med ; 7(1): 171, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937550

ABSTRACT

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 M patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of locally training models from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. Continued pretraining on local data showed FMSM required fewer than 1% of training examples to match the fully trained GBM's performance, and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

7.
JMIR Med Inform ; 12: e51171, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38596848

ABSTRACT

Background: With the capability to render prediagnoses, consumer wearables have the potential to affect subsequent diagnoses and the level of care in the health care delivery setting. Despite this, postmarket surveillance of consumer wearables has been hindered by the lack of codified terms in electronic health records (EHRs) to capture wearable use. Objective: We sought to develop a weak supervision-based approach to demonstrate the feasibility and efficacy of EHR-based postmarket surveillance on consumer wearables that render atrial fibrillation (AF) prediagnoses. Methods: We applied data programming, where labeling heuristics are expressed as code-based labeling functions, to detect incidents of AF prediagnoses. A labeler model was then derived from the predictions of the labeling functions using the Snorkel framework. The labeler model was applied to clinical notes to probabilistically label them, and the labeled notes were then used as a training set to fine-tune a classifier called Clinical-Longformer. The resulting classifier identified patients with an AF prediagnosis. A retrospective cohort study was conducted, where the baseline characteristics and subsequent care patterns of patients identified by the classifier were compared against those who did not receive a prediagnosis. Results: The labeler model derived from the labeling functions showed high accuracy (0.92; F1-score=0.77) on the training set. The classifier trained on the probabilistically labeled notes accurately identified patients with an AF prediagnosis (0.95; F1-score=0.83). 
The cohort study conducted using the constructed system carried enough statistical power to verify the key findings of the Apple Heart Study, which enrolled a much larger number of participants, where patients who received a prediagnosis tended to be older, male, and White with higher CHA2DS2-VASc (congestive heart failure, hypertension, age ≥75 years, diabetes, stroke, vascular disease, age 65-74 years, sex category) scores (P<.001). We also made a novel discovery that patients with a prediagnosis were more likely to use anticoagulants (525/1037, 50.63% vs 5936/16,560, 35.85%) and have an eventual AF diagnosis (305/1037, 29.41% vs 262/16,560, 1.58%). At the index diagnosis, the existence of a prediagnosis did not distinguish patients based on clinical characteristics, but did correlate with anticoagulant prescription (P=.004 for apixaban and P=.01 for rivaroxaban). Conclusions: Our work establishes the feasibility and efficacy of an EHR-based surveillance system for consumer wearables that render AF prediagnoses. Further work is necessary to generalize these findings for patient populations at other sites.
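
The data programming step described in the abstract above (heuristics written as code-based labeling functions, whose votes are combined into probabilistic labels) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the keyword patterns, the function names, and the example notes are hypothetical, and the vote combination is a plain majority vote rather than the Snorkel labeler model the authors used, which learns how much to trust each labeling function.

```python
import re

# Label values; ABSTAIN means the heuristic has no opinion on this note.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical keyword patterns, not taken from the paper.
WEARABLE = re.compile(r"\b(?:apple watch|smart ?watch|wearable|fitbit)\b")
AF_TERM = re.compile(r"\b(?:afib|atrial fibrillation|irregular rhythm)\b")
NEGATED = re.compile(
    r"\b(?:denies|no)\b[^.]*\b(?:apple watch|smart ?watch|wearable|fitbit)\b"
)


def lf_wearable_af(note: str) -> int:
    """Vote POSITIVE when a wearable and an AF term are both mentioned."""
    text = note.lower()
    if WEARABLE.search(text) and AF_TERM.search(text):
        return POSITIVE
    return ABSTAIN


def lf_no_wearable(note: str) -> int:
    """Vote NEGATIVE when no wearable is mentioned at all."""
    return ABSTAIN if WEARABLE.search(note.lower()) else NEGATIVE


def lf_negated_wearable(note: str) -> int:
    """Vote NEGATIVE when wearable use is negated within one sentence."""
    return NEGATIVE if NEGATED.search(note.lower()) else ABSTAIN


LABELING_FUNCTIONS = [lf_wearable_af, lf_no_wearable, lf_negated_wearable]


def soft_label(note: str) -> float:
    """Return P(prediagnosis) as the share of POSITIVE votes among
    non-abstaining labeling functions (0.5 when every function abstains)."""
    votes = [lf(note) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return 0.5
    return sum(v == POSITIVE for v in votes) / len(votes)
```

In a pipeline like the one the abstract describes, soft labels of this kind would serve as the training targets for a downstream text classifier, so the classifier can generalize beyond the literal keywords the heuristics match.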

8.
Lancet Digit Health ; 6(6): e428-e432, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38658283

ABSTRACT

With the rapid growth of interest in and use of large language models (LLMs) across various industries, we are facing some crucial and profound ethical concerns, especially in the medical field. The unique technical architecture and purported emergent abilities of LLMs differentiate them substantially from other artificial intelligence (AI) models and natural language processing techniques used, necessitating a nuanced understanding of LLM ethics. In this Viewpoint, we highlight ethical concerns stemming from the perspectives of users, developers, and regulators, notably focusing on data privacy and rights of use, data provenance, intellectual property contamination, and broad applications and plasticity of LLMs. A comprehensive framework and mitigating strategies will be imperative for the responsible integration of LLMs into medical practice, ensuring alignment with ethical principles and safeguarding against potential societal risks.


Subject(s)
Artificial Intelligence , Natural Language Processing , Humans , Artificial Intelligence/ethics , Intellectual Property
9.
J Am Med Inform Assoc ; 31(6): 1441-1444, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38452298

ABSTRACT

OBJECTIVES: This article aims to examine how generative artificial intelligence (AI) can be adopted with the most value in health systems, in response to the Executive Order on AI. MATERIALS AND METHODS: We reviewed how technology has historically been deployed in healthcare, and evaluated recent examples of deployments of both traditional AI and generative AI (GenAI) with a lens on value. RESULTS: Traditional AI and GenAI are different technologies in terms of their capability and modes of current deployment, which have implications on value in health systems. DISCUSSION: Traditional AI when applied with a framework top-down can realize value in healthcare. GenAI in the short term when applied top-down has unclear value, but encouraging more bottom-up adoption has the potential to provide more benefit to health systems and patients. CONCLUSION: GenAI in healthcare can provide the most value for patients when health systems adapt culturally to grow with this new technology and its adoption patterns.


Subject(s)
Artificial Intelligence , Delivery of Health Care , Humans
11.
BMC Med Inform Decis Mak ; 24(1): 51, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355486

ABSTRACT

BACKGROUND: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. METHODS: This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test result and one diagnosis-based label. Proportion of admissions with a positive label were presented for each outcome stratified by cohort. Using lab-based labels as the gold standard, agreement using Cohen's Kappa, sensitivity and specificity were calculated for each lab-based severity level. RESULTS: The number of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratio (99.9% confidence interval) for abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar by institution. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds. 
CONCLUSIONS: Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.
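
The agreement metrics named in the abstract above (Cohen's Kappa, sensitivity, and specificity of a diagnosis-based label against a lab-based gold standard) can be computed from a 2x2 confusion table. The sketch below is illustrative only: the function names and the example labels are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(gold: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, with gold as the reference."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Share of gold-positive admissions that also carry the diagnosis label."""
    tp, _, fn, _ = confusion_counts(gold, pred)
    return tp / (tp + fn)


def specificity(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Share of gold-negative admissions that lack the diagnosis label."""
    _, fp, _, tn = confusion_counts(gold, pred)
    return tn / (tn + fp)


def cohens_kappa(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Agreement beyond chance: (p_observed - p_expected) / (1 - p_expected)."""
    tp, fp, fn, tn = confusion_counts(gold, pred)
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each rater.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical admissions: 1 = abnormal, 0 = normal.
lab_based = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
diagnosis_based = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

A low sensitivity in this setup means that many admissions with abnormal lab results have no matching diagnosis code, which is the kind of coding gap the abstract reports as differing between institutions.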


Subject(s)
Hyperkalemia , Neutropenia , Humans , Delivery of Health Care , Machine Learning , Sensitivity and Specificity
12.
J Palliat Med ; 27(1): 83-89, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37935036

ABSTRACT

Background: Patients with serious illness benefit from conversations to share prognosis and explore goals and values. To address this, we implemented Ariadne Labs' Serious Illness Care Program (SICP) at Stanford Health Care. Objective: Improve quantity, timing, and quality of serious illness conversations. Methods: Initial implementation followed Ariadne Labs' SICP framework. We later incorporated a team-based approach that included nonphysician care team members. Outcomes included number of patients with documented conversations according to clinician role and practice location. Machine learning algorithms were used in some settings to identify eligible patients. Results: Ambulatory oncology and hospital medicine were our largest implementation sites, engaging 4707 and 642 unique patients in conversations, respectively. Clinicians across eight disciplines engaged in these conversations. Identified barriers included leadership engagement, complex workflows, and patient identification. Conclusion: Several factors contributed to successful SICP implementation across clinical sites: innovative clinical workflows, machine learning-based predictive algorithms, and nonphysician care team member engagement.


Subject(s)
Critical Care , Critical Illness , Humans , Critical Illness/therapy , Communication , Physician-Patient Relations , Academic Medical Centers
13.
JAMA ; 331(1): 17-18, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38032634

ABSTRACT

This Viewpoint discusses a recent executive order by US President Joe Biden about the development and implementation of AI, including the role of government vs the private sector and how the order may affect health care.


Subject(s)
Artificial Intelligence , Delivery of Health Care , Delivery of Health Care/legislation & jurisprudence , Group Practice/legislation & jurisprudence , Organizations/legislation & jurisprudence , Politics , Federal Government , United States
14.
Pac Symp Biocomput ; 29: 8-23, 2024.
Article in English | MEDLINE | ID: mdl-38160266

ABSTRACT

The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.


Subject(s)
Computational Biology , Natural Language Processing , Humans , PubMed , Information Storage and Retrieval , Language
15.
JAMA ; 331(3): 245-249, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38117493

ABSTRACT

Importance: Given the importance of rigorous development and evaluation standards needed of artificial intelligence (AI) models used in health care, nationwide accepted procedures to provide assurance that the use of AI is fair, appropriate, valid, effective, and safe are urgently needed. Observations: While there are several efforts to develop standards and best practices to evaluate AI, there is a gap between having such guidance and the application of such guidance to both existing and new AI models being developed. As of now, there is no publicly available, nationwide mechanism that enables objective evaluation and ongoing assessment of the consequences of using health AI models in clinical care settings. Conclusion and Relevance: The need to create a public-private partnership to support a nationwide health AI assurance labs network is outlined here. In this network, community best practices could be applied for testing health AI models to produce reports on their performance that can be widely shared for managing the lifecycle of AI models over time and across populations and sites where these models are deployed.


Subject(s)
Artificial Intelligence , Delivery of Health Care , Laboratories , Quality Assurance, Health Care , Quality of Health Care , Artificial Intelligence/standards , Health Facilities/standards , Laboratories/standards , Public-Private Sector Partnerships , Quality Assurance, Health Care/standards , Delivery of Health Care/standards , Quality of Health Care/standards , United States
16.
JAMA Netw Open ; 6(12): e2348422, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38113040

ABSTRACT

Importance: Limited sharing of data sets that accurately represent disease and patient diversity limits the generalizability of artificial intelligence (AI) algorithms in health care. Objective: To explore the factors associated with organizational motivation to share health data for AI development. Design, Setting, and Participants: This qualitative study investigated organizational readiness for sharing health data across the academic, governmental, nonprofit, and private sectors. Using a multiple case studies approach, 27 semistructured interviews were conducted with leaders in data-sharing roles from August 29, 2022, to January 9, 2023. The interviews were conducted in the English language using a video conferencing platform. Using a purposive and nonprobabilistic sampling strategy, 78 individuals across 52 unique organizations were identified. Of these, 35 participants were enrolled. Participant recruitment concluded after 27 interviews, as theoretical saturation was reached and no additional themes emerged. Main Outcome and Measure: Concepts defining organizational readiness for data sharing and the association between data-sharing factors and organizational behavior were mapped through iterative qualitative analysis to establish a framework defining organizational readiness for sharing clinical data for AI development. Results: Interviews included 27 leaders from 18 organizations (academia: 10, government: 7, nonprofit: 8, and private: 2). Organizational readiness for data sharing centered around 2 main constructs: motivation and capabilities. Motivation related to the alignment of an organization's values with data-sharing priorities and was associated with its engagement in data-sharing efforts. However, organizational motivation could be modulated by extrinsic incentives for financial or reputational gains. Organizational capabilities comprised infrastructure, people, expertise, and access to data. 
Cross-sector collaboration was a key strategy to mitigate barriers to access health data. Conclusions and Relevance: This qualitative study identified sector-specific factors that may affect the data-sharing behaviors of health organizations. External incentives may bolster cross-sector collaborations by helping overcome barriers to accessing health data for AI development. The findings suggest that tailored incentives may boost organizational motivation and facilitate sustainable flow of health data for AI development.


Subject(s)
Artificial Intelligence , Delivery of Health Care , Humans , Private Sector , Information Dissemination , Motivation
17.
Discov Ment Health ; 3(1): 27, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036718

ABSTRACT

Schizophrenia is a debilitating condition necessitating more efficacious therapies. Previous studies suggested that schizophrenia development is associated with aberrant synaptic pruning by glial cells. We pursued an interdisciplinary approach to understand whether therapeutic reduction in glial cell (specifically astrocytic) phagocytosis might benefit neuropsychiatric patients. Using a high-throughput phenotypic screen of over 3200 compounds in primary human fetal astrocytes, we discovered that beta-2 adrenergic receptor (ADRB2) agonists reduced phagocytosis. We used protein interaction pathways analysis to associate ADRB2 with schizophrenia and endocytosis. Using a novel observational study in the electronic health record, we demonstrated that patients with a pediatric exposure to salmeterol, an ADRB2 agonist, had reduced in-patient psychiatry visits. We used a mouse model of inflammatory neurodegenerative disease and measured changes in proteins associated with endocytosis and vesicle-mediated transport after ADRB2 agonism. These results provide substantial rationale for clinical consideration of ADRB2 agonists as possible therapies for patients with schizophrenia.

19.
BMJ Med ; 2(1): e000651, 2023.
Article in English | MEDLINE | ID: mdl-37829182

ABSTRACT

Objective: To assess the uptake of second line antihyperglycaemic drugs among patients with type 2 diabetes mellitus who are receiving metformin. Design: Federated pharmacoepidemiological evaluation in LEGEND-T2DM. Setting: 10 US and seven non-US electronic health record and administrative claims databases in the Observational Health Data Sciences and Informatics network in eight countries from 2011 to the end of 2021. Participants: 4.8 million patients (≥18 years) across US and non-US based databases with type 2 diabetes mellitus who had received metformin monotherapy and had initiated second line treatments. Exposure: The exposure used to evaluate each database was calendar year trends, with the years in the study that were specific to each cohort. Main outcomes measures: The outcome was the incidence of second line antihyperglycaemic drug use (ie, glucagon-like peptide-1 receptor agonists, sodium-glucose cotransporter-2 inhibitors, dipeptidyl peptidase-4 inhibitors, and sulfonylureas) among individuals who were already receiving treatment with metformin. The relative drug class level uptake across cardiovascular risk groups was also evaluated. Results: 4.6 million patients were identified in US databases, 61 382 from Spain, 32 442 from Germany, 25 173 from the UK, 13 270 from France, 5580 from Scotland, 4614 from Hong Kong, and 2322 from Australia. During 2011-21, the combined proportional initiation of the cardioprotective antihyperglycaemic drugs (glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors) increased across all data sources, with the combined initiation of these drugs as second line drugs in 2021 ranging from 35.2% to 68.2% in the US databases, 15.4% in France, 34.7% in Spain, 50.1% in Germany, and 54.8% in Scotland. 
From 2016 to 2021, in some US and non-US databases, uptake of glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors increased more significantly among populations with no cardiovascular disease compared with patients with established cardiovascular disease. No data source provided evidence of a greater increase in the uptake of these two drug classes in populations with cardiovascular disease compared with no cardiovascular disease. Conclusions: Despite the increase in overall uptake of cardioprotective antihyperglycaemic drugs as second line treatments for type 2 diabetes mellitus, their uptake was lower in patients with cardiovascular disease than in people with no cardiovascular disease over the past decade. A strategy is needed to ensure that medication use is concordant with guideline recommendations to improve outcomes of patients with type 2 diabetes mellitus.

20.
JAMA Netw Open ; 6(9): e2333495, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37725377

ABSTRACT

Importance: Ranitidine, the most widely used histamine-2 receptor antagonist (H2RA), was withdrawn because of N-nitrosodimethylamine impurity in 2020. Given the worldwide exposure to this drug, the potential risk of cancer development associated with the intake of known carcinogens is an important epidemiological concern. Objective: To examine the comparative risk of cancer associated with the use of ranitidine vs other H2RAs. Design, Setting, and Participants: This new-user active comparator international network cohort study was conducted using 3 health claims and 9 electronic health record databases from the US, the United Kingdom, Germany, Spain, France, South Korea, and Taiwan. Large-scale propensity score (PS) matching was used to minimize confounding of the observed covariates with negative control outcomes. Empirical calibration was performed to account for unobserved confounding. All databases were mapped to a common data model. Database-specific estimates were combined using random-effects meta-analysis. Participants included individuals aged at least 20 years with no history of cancer who used H2RAs for more than 30 days from January 1986 to December 2020, with a 1-year washout period. Data were analyzed from April to September 2021. Exposure: The main exposure was use of ranitidine vs other H2RAs (famotidine, lafutidine, nizatidine, and roxatidine). Main Outcomes and Measures: The primary outcome was incidence of any cancer, except nonmelanoma skin cancer. Secondary outcomes included all cancer except thyroid cancer, 16 cancer subtypes, and all-cause mortality. Results: Among 1 183 999 individuals in 11 databases, 909 168 individuals (mean age, 56.1 years; 507 316 [55.8%] women) were identified as new users of ranitidine, and 274 831 individuals (mean age, 58.0 years; 145 935 [53.1%] women) were identified as new users of other H2RAs. 
Crude incidence rates of cancer were 14.30 events per 1000 person-years (PYs) in ranitidine users and 15.03 events per 1000 PYs among other H2RA users. After PS matching, cancer risk was similar in ranitidine compared with other H2RA users (incidence, 15.92 events per 1000 PYs vs 15.65 events per 1000 PYs; calibrated meta-analytic hazard ratio, 1.04; 95% CI, 0.97-1.12). No significant associations were found between ranitidine use and any secondary outcomes after calibration. Conclusions and Relevance: In this cohort study, ranitidine use was not associated with an increased risk of cancer compared with the use of other H2RAs. Further research is needed on the long-term association of ranitidine with cancer development.
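
The abstract above states that database-specific estimates were combined using random-effects meta-analysis but does not name the estimator. As a rough illustration, the sketch below pools log hazard ratios with the DerSimonian-Laird method, which is one common choice and is an assumption here; the function name and any input values are hypothetical, not figures from the study.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(
    log_hrs: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Pool per-database log hazard ratios with a random-effects model.

    Returns (pooled_hr, ci_low, ci_high, tau_squared), using a 95% normal
    confidence interval on the log scale.
    """
    k = len(log_hrs)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    # Cochran's Q measures heterogeneity across databases.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    # Between-database variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each within-database variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
        tau2,
    )
```

When the databases disagree more than sampling error alone would explain, tau_squared grows and the weights become more equal, so a single very large database dominates the pooled estimate less than it would under a fixed-effect model.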


Subject(s)
Skin Neoplasms , Thyroid Neoplasms , Female , Humans , Middle Aged , Male , Ranitidine/adverse effects , Cohort Studies , Histamine H2 Antagonists/adverse effects