Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 293
Filtrar
1.
NPJ Digit Med ; 7(1): 171, 2024 Jun 27.
Artículo en Inglés | MEDLINE | ID: mdl-38937550

RESUMEN

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 M patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of locally training models from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. Continued pretraining on local data showed FMSM required fewer than 1% of training examples to match the fully trained GBM's performance, and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

2.
Lancet Digit Health ; 6(6): e428-e432, 2024 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-38658283

RESUMEN

With the rapid growth of interest in and use of large language models (LLMs) across various industries, we are facing some crucial and profound ethical concerns, especially in the medical field. The unique technical architecture and purported emergent abilities of LLMs differentiate them substantially from other artificial intelligence (AI) models and natural language processing techniques used, necessitating a nuanced understanding of LLM ethics. In this Viewpoint, we highlight ethical concerns stemming from the perspectives of users, developers, and regulators, notably focusing on data privacy and rights of use, data provenance, intellectual property contamination, and broad applications and plasticity of LLMs. A comprehensive framework and mitigating strategies will be imperative for the responsible integration of LLMs into medical practice, ensuring alignment with ethical principles and safeguarding against potential societal risks.


Asunto(s)
Inteligencia Artificial , Procesamiento de Lenguaje Natural , Humanos , Inteligencia Artificial/ética , Propiedad Intelectual
3.
JMIR Med Inform ; 12: e51171, 2024 Apr 04.
Artículo en Inglés | MEDLINE | ID: mdl-38596848

RESUMEN

Background: With the capability to render prediagnoses, consumer wearables have the potential to affect subsequent diagnoses and the level of care in the health care delivery setting. Despite this, postmarket surveillance of consumer wearables has been hindered by the lack of codified terms in electronic health records (EHRs) to capture wearable use. Objective: We sought to develop a weak supervision-based approach to demonstrate the feasibility and efficacy of EHR-based postmarket surveillance on consumer wearables that render atrial fibrillation (AF) prediagnoses. Methods: We applied data programming, where labeling heuristics are expressed as code-based labeling functions, to detect incidents of AF prediagnoses. A labeler model was then derived from the predictions of the labeling functions using the Snorkel framework. The labeler model was applied to clinical notes to probabilistically label them, and the labeled notes were then used as a training set to fine-tune a classifier called Clinical-Longformer. The resulting classifier identified patients with an AF prediagnosis. A retrospective cohort study was conducted, where the baseline characteristics and subsequent care patterns of patients identified by the classifier were compared against those who did not receive a prediagnosis. Results: The labeler model derived from the labeling functions showed high accuracy (0.92; F1-score=0.77) on the training set. The classifier trained on the probabilistically labeled notes accurately identified patients with an AF prediagnosis (0.95; F1-score=0.83). The cohort study conducted using the constructed system carried enough statistical power to verify the key findings of the Apple Heart Study, which enrolled a much larger number of participants, where patients who received a prediagnosis tended to be older, male, and White with higher CHA2DS2-VASc (congestive heart failure, hypertension, age ≥75 years, diabetes, stroke, vascular disease, age 65-74 years, sex category) scores (P<.001). We also made a novel discovery that patients with a prediagnosis were more likely to use anticoagulants (525/1037, 50.63% vs 5936/16,560, 35.85%) and have an eventual AF diagnosis (305/1037, 29.41% vs 262/16,560, 1.58%). At the index diagnosis, the existence of a prediagnosis did not distinguish patients based on clinical characteristics, but did correlate with anticoagulant prescription (P=.004 for apixaban and P=.01 for rivaroxaban). Conclusions: Our work establishes the feasibility and efficacy of an EHR-based surveillance system for consumer wearables that render AF prediagnoses. Further work is necessary to generalize these findings for patient populations at other sites.

4.
J Am Med Inform Assoc ; 31(6): 1441-1444, 2024 May 20.
Artículo en Inglés | MEDLINE | ID: mdl-38452298

RESUMEN

OBJECTIVES: This article aims to examine how generative artificial intelligence (AI) can be adopted with the most value in health systems, in response to the Executive Order on AI. MATERIALS AND METHODS: We reviewed how technology has historically been deployed in healthcare, and evaluated recent examples of deployments of both traditional AI and generative AI (GenAI) with a lens on value. RESULTS: Traditional AI and GenAI are different technologies in terms of their capability and modes of current deployment, which have implications on value in health systems. DISCUSSION: Traditional AI when applied with a framework top-down can realize value in healthcare. GenAI in the short term when applied top-down has unclear value, but encouraging more bottom-up adoption has the potential to provide more benefit to health systems and patients. CONCLUSION: GenAI in healthcare can provide the most value for patients when health systems adapt culturally to grow with this new technology and its adoption patterns.


Asunto(s)
Inteligencia Artificial , Atención a la Salud , Humanos
5.
BMC Med Inform Decis Mak ; 24(1): 51, 2024 Feb 14.
Artículo en Inglés | MEDLINE | ID: mdl-38355486

RESUMEN

BACKGROUND: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. METHODS: This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test result and one diagnosis-based label. Proportion of admissions with a positive label were presented for each outcome stratified by cohort. Using lab-based labels as the gold standard, agreement using Cohen's Kappa, sensitivity and specificity were calculated for each lab-based severity level. RESULTS: The number of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratio (99.9% confidence interval) for abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar by institution. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds. CONCLUSIONS: Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.


Asunto(s)
Hiperpotasemia , Neutropenia , Humanos , Atención a la Salud , Aprendizaje Automático , Sensibilidad y Especificidad
7.
J Palliat Med ; 27(1): 83-89, 2024 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-37935036

RESUMEN

Background: Patients with serious illness benefit from conversations to share prognosis and explore goals and values. To address this, we implemented Ariadne Labs' Serious Illness Care Program (SICP) at Stanford Health Care. Objective: Improve quantity, timing, and quality of serious illness conversations. Methods: Initial implementation followed Ariadne Labs' SICP framework. We later incorporated a team-based approach that included nonphysician care team members. Outcomes included number of patients with documented conversations according to clinician role and practice location. Machine learning algorithms were used in some settings to identify eligible patients. Results: Ambulatory oncology and hospital medicine were our largest implementation sites, engaging 4707 and 642 unique patients in conversations, respectively. Clinicians across eight disciplines engaged in these conversations. Identified barriers that included leadership engagement, complex workflows, and patient identification. Conclusion: Several factors contributed to successful SICP implementation across clinical sites: innovative clinical workflows, machine learning based predictive algorithms, and nonphysician care team member engagement.


Asunto(s)
Cuidados Críticos , Enfermedad Crítica , Humanos , Enfermedad Crítica/terapia , Comunicación , Relaciones Médico-Paciente , Centros Médicos Académicos
8.
JAMA ; 331(1): 17-18, 2024 01 02.
Artículo en Inglés | MEDLINE | ID: mdl-38032634

RESUMEN

This Viewpoint discusses a recent executive order by US President Joe Biden about the development and implementation of AI, including the role of government vs the private sector and how the order may affect health care.


Asunto(s)
Inteligencia Artificial , Atención a la Salud , Atención a la Salud/legislación & jurisprudencia , Práctica de Grupo/legislación & jurisprudencia , Organizaciones/legislación & jurisprudencia , Política , Gobierno Federal , Estados Unidos
9.
JAMA ; 331(3): 245-249, 2024 01 16.
Artículo en Inglés | MEDLINE | ID: mdl-38117493

RESUMEN

Importance: Given the importance of rigorous development and evaluation standards needed of artificial intelligence (AI) models used in health care, nationwide accepted procedures to provide assurance that the use of AI is fair, appropriate, valid, effective, and safe are urgently needed. Observations: While there are several efforts to develop standards and best practices to evaluate AI, there is a gap between having such guidance and the application of such guidance to both existing and new AI models being developed. As of now, there is no publicly available, nationwide mechanism that enables objective evaluation and ongoing assessment of the consequences of using health AI models in clinical care settings. Conclusion and Relevance: The need to create a public-private partnership to support a nationwide health AI assurance labs network is outlined here. In this network, community best practices could be applied for testing health AI models to produce reports on their performance that can be widely shared for managing the lifecycle of AI models over time and across populations and sites where these models are deployed.


Asunto(s)
Inteligencia Artificial , Atención a la Salud , Laboratorios , Garantía de la Calidad de Atención de Salud , Calidad de la Atención de Salud , Inteligencia Artificial/normas , Instituciones de Salud/normas , Laboratorios/normas , Asociación entre el Sector Público-Privado , Garantía de la Calidad de Atención de Salud/normas , Atención a la Salud/normas , Calidad de la Atención de Salud/normas , Estados Unidos
10.
Pac Symp Biocomput ; 29: 8-23, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38160266

RESUMEN

The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.


Asunto(s)
Biología Computacional , Procesamiento de Lenguaje Natural , Humanos , PubMed , Almacenamiento y Recuperación de la Información , Lenguaje
11.
JAMA Netw Open ; 6(12): e2348422, 2023 Dec 01.
Artículo en Inglés | MEDLINE | ID: mdl-38113040

RESUMEN

Importance: Limited sharing of data sets that accurately represent disease and patient diversity limits the generalizability of artificial intelligence (AI) algorithms in health care. Objective: To explore the factors associated with organizational motivation to share health data for AI development. Design, Setting, and Participants: This qualitative study investigated organizational readiness for sharing health data across the academic, governmental, nonprofit, and private sectors. Using a multiple case studies approach, 27 semistructured interviews were conducted with leaders in data-sharing roles from August 29, 2022, to January 9, 2023. The interviews were conducted in the English language using a video conferencing platform. Using a purposive and nonprobabilistic sampling strategy, 78 individuals across 52 unique organizations were identified. Of these, 35 participants were enrolled. Participant recruitment concluded after 27 interviews, as theoretical saturation was reached and no additional themes emerged. Main Outcome and Measure: Concepts defining organizational readiness for data sharing and the association between data-sharing factors and organizational behavior were mapped through iterative qualitative analysis to establish a framework defining organizational readiness for sharing clinical data for AI development. Results: Interviews included 27 leaders from 18 organizations (academia: 10, government: 7, nonprofit: 8, and private: 2). Organizational readiness for data sharing centered around 2 main constructs: motivation and capabilities. Motivation related to the alignment of an organization's values with data-sharing priorities and was associated with its engagement in data-sharing efforts. However, organizational motivation could be modulated by extrinsic incentives for financial or reputational gains. Organizational capabilities comprised infrastructure, people, expertise, and access to data. Cross-sector collaboration was a key strategy to mitigate barriers to access health data. Conclusions and Relevance: This qualitative study identified sector-specific factors that may affect the data-sharing behaviors of health organizations. External incentives may bolster cross-sector collaborations by helping overcome barriers to accessing health data for AI development. The findings suggest that tailored incentives may boost organizational motivation and facilitate sustainable flow of health data for AI development.


Asunto(s)
Inteligencia Artificial , Atención a la Salud , Humanos , Sector Privado , Difusión de la Información , Motivación
12.
Discov Ment Health ; 3(1): 27, 2023 Nov 30.
Artículo en Inglés | MEDLINE | ID: mdl-38036718

RESUMEN

Schizophrenia is a debilitating condition necessitating more efficacious therapies. Previous studies suggested that schizophrenia development is associated with aberrant synaptic pruning by glial cells. We pursued an interdisciplinary approach to understand whether therapeutic reduction in glial cell-specifically astrocytic-phagocytosis might benefit neuropsychiatric patients. We discovered that beta-2 adrenergic receptor (ADRB2) agonists reduced phagocytosis using a high-throughput, phenotypic screen of over 3200 compounds in primary human fetal astrocytes. We used protein interaction pathways analysis to associate ADRB2, to schizophrenia and endocytosis. We demonstrated that patients with a pediatric exposure to salmeterol, an ADRB2 agonist, had reduced in-patient psychiatry visits using a novel observational study in the electronic health record. We used a mouse model of inflammatory neurodegenerative disease and measured changes in proteins associated with endocytosis and vesicle-mediated transport after ADRB2 agonism. These results provide substantial rationale for clinical consideration of ADRB2 agonists as possible therapies for patients with schizophrenia.

13.
BMJ Med ; 2(1): e000651, 2023.
Artículo en Inglés | MEDLINE | ID: mdl-37829182

RESUMEN

Objective: To assess the uptake of second line antihyperglycaemic drugs among patients with type 2 diabetes mellitus who are receiving metformin. Design: Federated pharmacoepidemiological evaluation in LEGEND-T2DM. Setting: 10 US and seven non-US electronic health record and administrative claims databases in the Observational Health Data Sciences and Informatics network in eight countries from 2011 to the end of 2021. Participants: 4.8 million patients (≥18 years) across US and non-US based databases with type 2 diabetes mellitus who had received metformin monotherapy and had initiated second line treatments. Exposure: The exposure used to evaluate each database was calendar year trends, with the years in the study that were specific to each cohort. Main outcomes measures: The outcome was the incidence of second line antihyperglycaemic drug use (ie, glucagon-like peptide-1 receptor agonists, sodium-glucose cotransporter-2 inhibitors, dipeptidyl peptidase-4 inhibitors, and sulfonylureas) among individuals who were already receiving treatment with metformin. The relative drug class level uptake across cardiovascular risk groups was also evaluated. Results: 4.6 million patients were identified in US databases, 61 382 from Spain, 32 442 from Germany, 25 173 from the UK, 13 270 from France, 5580 from Scotland, 4614 from Hong Kong, and 2322 from Australia. During 2011-21, the combined proportional initiation of the cardioprotective antihyperglycaemic drugs (glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors) increased across all data sources, with the combined initiation of these drugs as second line drugs in 2021 ranging from 35.2% to 68.2% in the US databases, 15.4% in France, 34.7% in Spain, 50.1% in Germany, and 54.8% in Scotland. From 2016 to 2021, in some US and non-US databases, uptake of glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors increased more significantly among populations with no cardiovascular disease compared with patients with established cardiovascular disease. No data source provided evidence of a greater increase in the uptake of these two drug classes in populations with cardiovascular disease compared with no cardiovascular disease. Conclusions: Despite the increase in overall uptake of cardioprotective antihyperglycaemic drugs as second line treatments for type 2 diabetes mellitus, their uptake was lower in patients with cardiovascular disease than in people with no cardiovascular disease over the past decade. A strategy is needed to ensure that medication use is concordant with guideline recommendations to improve outcomes of patients with type 2 diabetes mellitus.

15.
BMC Med Res Methodol ; 23(1): 204, 2023 09 09.
Artículo en Inglés | MEDLINE | ID: mdl-37689623

RESUMEN

BACKGROUND: Non-experimental studies (also known as observational studies) are valuable for estimating the effects of various medical interventions, but are notoriously difficult to evaluate because the methods used in non-experimental studies require untestable assumptions. This lack of intrinsic verifiability makes it difficult both to compare different non-experimental study methods and to trust the results of any particular non-experimental study. METHODS: We introduce TrialProbe, a data resource and statistical framework for the evaluation of non-experimental methods. We first collect a dataset of pseudo "ground truths" about the relative effects of drugs by using empirical Bayesian techniques to analyze adverse events recorded in public clinical trial reports. We then develop a framework for evaluating non-experimental methods against that ground truth by measuring concordance between the non-experimental effect estimates and the estimates derived from clinical trials. As a demonstration of our approach, we also perform an example methods evaluation between propensity score matching, inverse propensity score weighting, and an unadjusted approach on a large national insurance claims dataset. RESULTS: From the 33,701 clinical trial records in our version of the ClinicalTrials.gov dataset, we are able to extract 12,967 unique drug/drug adverse event comparisons to form a ground truth set. During our corresponding methods evaluation, we are able to use that reference set to demonstrate that both propensity score matching and inverse propensity score weighting can produce estimates that have high concordance with clinical trial results and substantially outperform an unadjusted baseline. CONCLUSIONS: We find that TrialProbe is an effective approach for probing non-experimental study methods, being able to generate large ground truth sets that are able to distinguish how well non-experimental methods perform in real world observational data.


Asunto(s)
Proyectos de Investigación , Humanos , Teorema de Bayes , Causalidad , Puntaje de Propensión
16.
JAMA Netw Open ; 6(9): e2333495, 2023 09 05.
Artículo en Inglés | MEDLINE | ID: mdl-37725377

RESUMEN

Importance: Ranitidine, the most widely used histamine-2 receptor antagonist (H2RA), was withdrawn because of N-nitrosodimethylamine impurity in 2020. Given the worldwide exposure to this drug, the potential risk of cancer development associated with the intake of known carcinogens is an important epidemiological concern. Objective: To examine the comparative risk of cancer associated with the use of ranitidine vs other H2RAs. Design, Setting, and Participants: This new-user active comparator international network cohort study was conducted using 3 health claims and 9 electronic health record databases from the US, the United Kingdom, Germany, Spain, France, South Korea, and Taiwan. Large-scale propensity score (PS) matching was used to minimize confounding of the observed covariates with negative control outcomes. Empirical calibration was performed to account for unobserved confounding. All databases were mapped to a common data model. Database-specific estimates were combined using random-effects meta-analysis. Participants included individuals aged at least 20 years with no history of cancer who used H2RAs for more than 30 days from January 1986 to December 2020, with a 1-year washout period. Data were analyzed from April to September 2021. Exposure: The main exposure was use of ranitidine vs other H2RAs (famotidine, lafutidine, nizatidine, and roxatidine). Main Outcomes and Measures: The primary outcome was incidence of any cancer, except nonmelanoma skin cancer. Secondary outcomes included all cancer except thyroid cancer, 16 cancer subtypes, and all-cause mortality. Results: Among 1 183 999 individuals in 11 databases, 909 168 individuals (mean age, 56.1 years; 507 316 [55.8%] women) were identified as new users of ranitidine, and 274 831 individuals (mean age, 58.0 years; 145 935 [53.1%] women) were identified as new users of other H2RAs. Crude incidence rates of cancer were 14.30 events per 1000 person-years (PYs) in ranitidine users and 15.03 events per 1000 PYs among other H2RA users. After PS matching, cancer risk was similar in ranitidine compared with other H2RA users (incidence, 15.92 events per 1000 PYs vs 15.65 events per 1000 PYs; calibrated meta-analytic hazard ratio, 1.04; 95% CI, 0.97-1.12). No significant associations were found between ranitidine use and any secondary outcomes after calibration. Conclusions and Relevance: In this cohort study, ranitidine use was not associated with an increased risk of cancer compared with the use of other H2RAs. Further research is needed on the long-term association of ranitidine with cancer development.


Asunto(s)
Neoplasias Cutáneas , Neoplasias de la Tiroides , Femenino , Humanos , Persona de Mediana Edad , Masculino , Ranitidina/efectos adversos , Estudios de Cohortes , Antagonistas de los Receptores H2 de la Histamina/efectos adversos
17.
J Am Med Inform Assoc ; 30(12): 2004-2011, 2023 11 17.
Artículo en Inglés | MEDLINE | ID: mdl-37639620

RESUMEN

OBJECTIVE: Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients, for pediatric inpatient clinical prediction tasks. MATERIALS AND METHODS: This retrospective cohort study used EHR data and included patients with at least one admission to an inpatient unit. One admission per patient was randomly selected. Adult inpatients were 18 years or older while pediatric inpatients were more than 28 days and less than 18 years. Admissions were temporally split into training (January 1, 2008 to December 31, 2019), validation (January 1, 2020 to December 31, 2020), and test (January 1, 2021 to August 1, 2022) sets. Primary comparison was a self-supervised model trained in adult inpatients versus count-based logistic regression models trained in pediatric inpatients. Primary outcome was mean area-under-the-receiver-operating-characteristic-curve (AUROC) for 11 distinct clinical outcomes. Models were evaluated in pediatric inpatients. RESULTS: When evaluated in pediatric inpatients, mean AUROC of self-supervised model trained in adult inpatients (0.902) was noninferior to count-based logistic regression models trained in pediatric inpatients (0.868) (mean difference = 0.034, 95% CI=0.014-0.057; P < .001 for noninferiority and P = .006 for superiority). CONCLUSIONS: Self-supervised learning in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients. This finding suggests transferability of self-supervised models trained in adult patients to pediatric patients, without requiring costly model retraining.


Asunto(s)
Pacientes Internos , Aprendizaje Automático , Humanos , Adulto , Niño , Estudios Retrospectivos , Aprendizaje Automático Supervisado , Registros Electrónicos de Salud
18.
JAMA ; 330(9): 866-869, 2023 09 05.
Artículo en Inglés | MEDLINE | ID: mdl-37548965

RESUMEN

Importance: There is increased interest in and potential benefits from using large language models (LLMs) in medicine. However, by simply wondering how the LLMs and the applications powered by them will reshape medicine instead of getting actively involved, the agency in shaping how these tools can be used in medicine is lost. Observations: Applications powered by LLMs are increasingly used to perform medical tasks without the underlying language model being trained on medical records and without verifying their purported benefit in performing those tasks. Conclusions and Relevance: The creation and use of LLMs in medicine need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.


Asunto(s)
Lenguaje , Aprendizaje Automático , Registros Médicos , Medicina , Registros Médicos/normas , Medicina/métodos , Medicina/normas , Simulación por Computador
19.
JAMIA Open ; 6(3): ooad054, 2023 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-37545984

RESUMEN

Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training. Results: The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion: Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion: Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.

20.
JAMIA Open ; 6(2): ooad043, 2023 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-37397506

RESUMEN

Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework. Results: We demonstrate that some algorithms have performance variations anywhere from 3% to 30% for different populations, even when not using race as an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they do affect some phenotypes and groups more disproportionately than others. Discussion: Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance between model features when compared with the phenotypes with little to no differences. Conclusion: We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms specifically in the context of ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are not widespread nor do they occur consistently. This highlights the great need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...