Results 1 - 20 of 47
1.
Nature ; 619(7969): 357-362, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37286606

ABSTRACT

Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment [1-3]. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing [4,5] to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7-94.9%, with an improvement of 5.36-14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.
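The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below illustrates that pairwise definition only; the function name `pairwise_auc` and the toy scores are assumptions made for illustration and are not data or code from the study.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (not from the paper).
readmitted = [0.9, 0.8, 0.4]
not_readmitted = [0.7, 0.3, 0.2]
print(f"AUC = {pairwise_auc(readmitted, not_readmitted):.3f}")
```

Eight of the nine positive/negative pairs in the toy data are ordered correctly, so the sketch reports an AUC of about 0.889.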


Subjects
Clinical Decision-Making; Electronic Health Records; Natural Language Processing; Physicians; Humans; Clinical Decision-Making/methods; Patient Readmission; Hospital Mortality; Comorbidity; Length of Stay; Insurance Coverage; Area Under Curve; Point-of-Care Systems/trends; Clinical Trials as Topic
2.
J Gen Intern Med ; 37(9): 2230-2238, 2022 07.
Article in English | MEDLINE | ID: mdl-35710676

ABSTRACT

BACKGROUND: Residents receive infrequent feedback on their clinical reasoning (CR) documentation. While machine learning (ML) and natural language processing (NLP) have been used to assess CR documentation in standardized cases, no studies have described similar use in the clinical environment. OBJECTIVE: The authors developed and validated using Kane's framework a ML model for automated assessment of CR documentation quality in residents' admission notes. DESIGN, PARTICIPANTS, MAIN MEASURES: Internal medicine residents' and subspecialty fellows' admission notes at one medical center from July 2014 to March 2020 were extracted from the electronic health record. Using a validated CR documentation rubric, the authors rated 414 notes for the ML development dataset. Notes were truncated to isolate the relevant portion; an NLP software (cTAKES) extracted disease/disorder named entities and human review generated CR terms. The final model had three input variables and classified notes as demonstrating low- or high-quality CR documentation. The ML model was applied to a retrospective dataset (9591 notes) for human validation and data analysis. Reliability between human and ML ratings was assessed on 205 of these notes with Cohen's kappa. CR documentation quality by post-graduate year (PGY) was evaluated by the Mantel-Haenszel test of trend. KEY RESULTS: The top-performing logistic regression model had an area under the receiver operating characteristic curve of 0.88, a positive predictive value of 0.68, and an accuracy of 0.79. Cohen's kappa was 0.67. Of the 9591 notes, 31.1% demonstrated high-quality CR documentation; quality increased from 27.0% (PGY1) to 31.0% (PGY2) to 39.0% (PGY3) (p < .001 for trend). Validity evidence was collected in each domain of Kane's framework (scoring, generalization, extrapolation, and implications). 
CONCLUSIONS: The authors developed and validated a high-performing ML model that classifies CR documentation quality in resident admission notes in the clinical environment, a novel application of ML and NLP with many potential use cases.
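Cohen's kappa, used above to compare human and ML ratings, corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that formula; the function name `cohens_kappa` and the 2x2 table of counts are hypothetical and are not the study's data.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] is the number of items that rater A put in category i
    and rater B put in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: share of items on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two raters' marginal proportions.
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_share, col_share))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts: rows = human rating (low, high), columns = ML rating.
toy_table = [[40, 10],
             [5, 45]]
print(f"kappa = {cohens_kappa(toy_table):.2f}")
```

In the toy table the raters agree on 85 of 100 notes while chance alone would give 50%, which yields a kappa of 0.70.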


Subjects
Clinical Reasoning; Documentation; Electronic Health Records; Humans; Machine Learning; Natural Language Processing; Reproducibility of Results; Retrospective Studies
3.
J Gen Intern Med ; 37(3): 507-512, 2022 02.
Article in English | MEDLINE | ID: mdl-33945113

ABSTRACT

BACKGROUND: Residents and fellows receive little feedback on their clinical reasoning documentation. Barriers include lack of a shared mental model and variability in the reliability and validity of existing assessment tools. Of the existing tools, the IDEA assessment tool includes a robust assessment of clinical reasoning documentation focusing on four elements (interpretive summary, differential diagnosis, explanation of reasoning for lead and alternative diagnoses) but lacks descriptive anchors threatening its reliability. OBJECTIVE: Our goal was to develop a valid and reliable assessment tool for clinical reasoning documentation building off the IDEA assessment tool. DESIGN, PARTICIPANTS, AND MAIN MEASURES: The Revised-IDEA assessment tool was developed by four clinician educators through iterative review of admission notes written by medicine residents and fellows and subsequently piloted with additional faculty to ensure response process validity. A random sample of 252 notes from July 2014 to June 2017 written by 30 trainees across several chief complaints was rated. Three raters rated 20% of the notes to demonstrate internal structure validity. A quality cut-off score was determined using Hofstee standard setting. KEY RESULTS: The Revised-IDEA assessment tool includes the same four domains as the IDEA assessment tool with more detailed descriptive prompts, new Likert scale anchors, and a score range of 0-10. Intraclass correlation was high for the notes rated by three raters, 0.84 (95% CI 0.74-0.90). Scores ≥6 were determined to demonstrate high-quality clinical reasoning documentation. Only 53% of notes (134/252) were high-quality. CONCLUSIONS: The Revised-IDEA assessment tool is reliable and easy to use for feedback on clinical reasoning documentation in resident and fellow admission notes with descriptive anchors that facilitate a shared mental model for feedback.


Subjects
Clinical Competence; Clinical Reasoning; Documentation; Feedback; Humans; Models, Psychological; Reproducibility of Results
4.
Arterioscler Thromb Vasc Biol ; 40(10): 2539-2547, 2020 10.
Article in English | MEDLINE | ID: mdl-32840379

ABSTRACT

OBJECTIVE: To determine the prevalence of D-dimer elevation in coronavirus disease 2019 (COVID-19) hospitalization, trajectory of D-dimer levels during hospitalization, and its association with clinical outcomes. Approach and Results: Consecutive adults admitted to a large New York City hospital system with a positive polymerase chain reaction test for SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) between March 1, 2020 and April 8, 2020 were identified. Elevated D-dimer was defined by the laboratory-specific upper limit of normal (>230 ng/mL). Outcomes included critical illness (intensive care, mechanical ventilation, discharge to hospice, or death), thrombotic events, acute kidney injury, and death during admission. Among 2377 adults hospitalized with COVID-19 and ≥1 D-dimer measurement, 1823 (76%) had elevated D-dimer at presentation. Patients with elevated presenting baseline D-dimer were more likely than those with normal D-dimer to have critical illness (43.9% versus 18.5%; adjusted odds ratio, 2.4 [95% CI, 1.9-3.1]; P<0.001), any thrombotic event (19.4% versus 10.2%; adjusted odds ratio, 1.9 [95% CI, 1.4-2.6]; P<0.001), acute kidney injury (42.4% versus 19.0%; adjusted odds ratio, 2.4 [95% CI, 1.9-3.1]; P<0.001), and death (29.9% versus 10.8%; adjusted odds ratio, 2.1 [95% CI, 1.6-2.9]; P<0.001). Rates of adverse events increased with the magnitude of D-dimer elevation; individuals with presenting D-dimer >2000 ng/mL had the highest risk of critical illness (66%), thrombotic event (37.8%), acute kidney injury (58.3%), and death (47%). CONCLUSIONS: Abnormal D-dimer was frequently observed at admission with COVID-19 and was associated with higher incidence of critical illness, thrombotic events, acute kidney injury, and death. The optimal management of patients with elevated D-dimer in COVID-19 requires further study.
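The odds ratios reported above are adjusted for covariates, so they cannot be reproduced from the percentages alone. As a rough illustration of what an odds ratio measures, the sketch below computes the crude (unadjusted) odds ratio from two event proportions; the function names are assumptions for illustration, and the result is not a figure reported by the authors.

```python
def odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)


def crude_odds_ratio(p_exposed, p_unexposed):
    """Unadjusted odds ratio between two groups' event proportions."""
    return odds(p_exposed) / odds(p_unexposed)


# Critical illness: 43.9% with elevated D-dimer versus 18.5% with normal D-dimer.
crude = crude_odds_ratio(0.439, 0.185)
print(f"crude OR = {crude:.2f}")
```

The crude value comes out larger than the adjusted odds ratio of 2.4 given in the abstract, which is consistent with part of the raw association being explained by the covariates in the authors' model.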


Subjects
Coronavirus Infections/blood; Coronavirus Infections/mortality; Critical Illness/epidemiology; Disease Progression; Fibrin Fibrinogen Degradation Products/metabolism; Hospital Mortality/trends; Pneumonia, Viral/blood; Pneumonia, Viral/mortality; Adult; Aged; Biomarkers/blood; COVID-19; Cause of Death; Cohort Studies; Coronavirus Infections/physiopathology; Databases, Factual; Female; Hospitals, Urban; Humans; Male; Middle Aged; New York City/epidemiology; Pandemics; Pneumonia, Viral/physiopathology; Prevalence; Retrospective Studies; Risk Assessment; Severe Acute Respiratory Syndrome/blood; Severe Acute Respiratory Syndrome/mortality; Severe Acute Respiratory Syndrome/physiopathology; Severity of Illness Index
5.
BMC Med Inform Decis Mak ; 20(1): 214, 2020 09 07.
Article in English | MEDLINE | ID: mdl-32894128

ABSTRACT

BACKGROUND: Automated systems that use machine learning to estimate a patient's risk of death are being developed to influence care. There remains sparse transparent reporting of model generalizability in different subpopulations especially for implemented systems. METHODS: A prognostic study included adult admissions at a multi-site, academic medical center between 2015 and 2017. A predictive model for all-cause mortality (including initiation of hospice care) within 60 days of admission was developed. Model generalizability is assessed in temporal validation in the context of potential demographic bias. A subsequent prospective cohort study was conducted at the same sites between October 2018 and June 2019. Model performance during prospective validation was quantified with areas under the receiver operating characteristic and precision recall curves stratified by site. Prospective results include timeliness, positive predictive value, and the number of actionable predictions. RESULTS: Three years of development data included 128,941 inpatient admissions (94,733 unique patients) across sites where patients are mostly white (61%) and female (60%) and 4.2% led to death within 60 days. A random forest model incorporating 9614 predictors produced areas under the receiver operating characteristic and precision recall curves of 87.2 (95% CI, 86.1-88.2) and 28.0 (95% CI, 25.0-31.0) in temporal validation. Performance marginally diverges within sites as the patient mix shifts from development to validation (patients of one site increases from 10 to 38%). Applied prospectively for nine months, 41,728 predictions were generated in real-time (median [IQR], 1.3 [0.9, 32] minutes). An operating criterion of 75% positive predictive value identified 104 predictions at very high risk (0.25%) where 65% (50 from 77 well-timed predictions) led to death within 60 days. CONCLUSION: Temporal validation demonstrates good model discrimination for 60-day mortality. 
Slight performance variations are observed across demographic subpopulations. The model was implemented prospectively and successfully produced meaningful estimates of risk within minutes of admission.


Subjects
Electronic Health Records; Hospitalization; Machine Learning; Patient Admission; Adolescent; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Mortality; Prognosis; Prospective Studies; Young Adult
6.
Gynecol Oncol ; 149(1): 22-27, 2018 04.
Article in English | MEDLINE | ID: mdl-29605045

ABSTRACT

OBJECTIVES: Black race has been associated with increased 30-day morbidity and mortality following surgery for endometrial cancer. Black women are also less likely to undergo laparoscopy when compared to white women. With the development of improved laparoscopic techniques and equipment, including the robotic platform, we sought to evaluate whether there has been a change in surgical approach for black women, and in turn, improvement in perioperative outcomes. METHODS: Using the American College of Surgeons' National Surgical Quality Improvement Project's database, patients who underwent hysterectomy for endometrial cancer from 2010 to 2015 were identified. Comparative analyses stratified by race and hysterectomy approach were performed to assess the relationship between race and perioperative outcomes. RESULTS: A total of 17,692 patients were identified: of these, 13,720 (77.5%) were white and 1553 (8.8%) were black. Black women were less likely to undergo laparoscopic hysterectomy compared to white women (49.3% vs 71.3%, p<0.0001). Rates of laparoscopy in both races increased over the 6-year period; however these consistently remained lower in black women each year. Black women had higher 30-day postoperative complication rates compared to white women (22.5% vs 13.6%, p<0.0001). When laparoscopic hysterectomies were isolated, there was no difference in postoperative complication rates between black and white women (9.2% vs 7.5%, p=0.1). CONCLUSIONS: Overall black women incur more postoperative complications compared to white women undergoing hysterectomy for endometrial cancer. However, laparoscopy may mitigate this disparity. Efforts should be made to maximize the utilization of minimally invasive surgery for the surgical management of endometrial cancer.


Subjects
Black People/statistics & numerical data; Endometrial Neoplasms/ethnology; Endometrial Neoplasms/surgery; Hysterectomy/statistics & numerical data; Laparoscopy/statistics & numerical data; White People/statistics & numerical data; Female; Healthcare Disparities/statistics & numerical data; Humans; Hysterectomy/adverse effects; Hysterectomy/methods; Laparoscopy/adverse effects; Laparoscopy/methods; Middle Aged; Postoperative Complications/epidemiology; Postoperative Complications/ethnology; Postoperative Complications/etiology; United States/epidemiology
8.
Semin Musculoskelet Radiol ; 21(1): 32-36, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28253531

ABSTRACT

This article reviews examples of big data analyses in health care with a focus on radiology. We review the defining characteristics of big data, the use of natural language processing, traditional and novel data sources, and large clinical data repositories available for research. This article aims to invoke novel research ideas through a combination of examples of analyses and domain knowledge.


Subjects
Data Interpretation, Statistical; Radiology/statistics & numerical data; Humans
9.
Am J Addict ; 26(6): 581-586, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28799677

ABSTRACT

BACKGROUND AND OBJECTIVES: Missed visits are common in office-based buprenorphine treatment (OBOT). The feasibility of text message (TM) appointment reminders among OBOT patients is unknown. METHODS: This 6-month prospective cohort study provided TM reminders to OBOT program patients (N = 93). A feasibility survey was completed following delivery of TM reminders and at 6 months. RESULTS: Respondents reported that the reminders should be provided to all OBOT patients (100%) and helped them to adhere to their scheduled appointment (97%). At 6 months, there were no reports of intrusion to their privacy or disruption of daily activities due to the TM reminders. Most participants reported that the TM reminders were helpful in adhering to scheduled appointments (95%), that the reminders should be offered to all clinic patients (95%), and favored receiving only TM reminders rather than telephone reminders (95%). Barriers to adhering to scheduled appointment times included transportation difficulties (34%), not being able to take time off from school or work (31%), long clinic wait-times (9%), being hospitalized or sick (8%), feeling sad or depressed (6%), and child care (6%). CONCLUSIONS: This study demonstrated the acceptability and feasibility of TM appointment reminders in OBOT. Older age and longer duration in buprenorphine treatment did not diminish interest in receiving the TM intervention. Although OBOT patients expressed concern regarding the privacy of TM content sent from their providers, privacy issues were uncommon among this cohort. Scientific Significance Findings from this study highlighted patient barriers to adherence to scheduled appointments. These barriers included transportation difficulties (34%), not being able to take time off from school or work (31%), long clinic lines (9%), and other factors that may confound the effect of future TM appointment reminder interventions. 
Further research is also required to assess 1) the level of system changes required to integrate TM appointment reminder tools with already existing electronic medical records and appointment records software; 2) acceptability among clinicians and administrators; and 3) financial and resource constraints to healthcare systems. (Am J Addict 2017;26:581-586).


Subjects
Buprenorphine/therapeutic use; Opiate Substitution Treatment; Opioid-Related Disorders/drug therapy; Reminder Systems; Text Messaging; Adult; Appointments and Schedules; Feasibility Studies; Female; Humans; Male; Narcotic Antagonists/therapeutic use; Opiate Substitution Treatment/methods; Opiate Substitution Treatment/psychology; Opiate Substitution Treatment/statistics & numerical data; Opioid-Related Disorders/epidemiology; Opioid-Related Disorders/psychology; Patient Acceptance of Health Care; Patient Compliance/psychology; Patient Compliance/statistics & numerical data; Prospective Studies; Reminder Systems/instrumentation; Reminder Systems/statistics & numerical data; United States
11.
J Transl Med ; 14(1): 235, 2016 08 05.
Article in English | MEDLINE | ID: mdl-27492440

ABSTRACT

BACKGROUND: Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications. METHODS: Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for use in training machine learning-based text classifiers to classify publications within these categories. The training sets combined T1/T2 and T3/T4 categories due to low frequency of these publication types compared to the frequency of T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the algorithm, we manually classified the articles with the top 100 scores from each classifier. RESULTS: The definitions and checklist facilitated classification and resulted in good inter-rater reliability for coding publications for the training set. 
Very good performance was achieved for the classifiers as represented by the area under the receiver operating curves (AUC), with an AUC of 0.94 for the T0 classifier, 0.84 for T1/T2, and 0.92 for T3/T4. CONCLUSIONS: The combination of definitions agreed upon by five CTSA hubs, a checklist that facilitates more uniform definition interpretation, and algorithms that perform well in classifying publications along the translational spectrum provide a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms allow publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing the categories of publications and their citations to assess knowledge transfer across the translational research spectrum.


Subjects
Machine Learning; Publications/classification; Translational Research, Biomedical; Algorithms; Area Under Curve; Documentation
12.
Gynecol Oncol ; 142(3): 508-13, 2016 09.
Article in English | MEDLINE | ID: mdl-27288543

ABSTRACT

OBJECTIVE: To determine factors influencing discharge patterns after laparoscopic hysterectomy for endometrial cancer and to evaluate the safety of same-day discharge during the 30-day postoperative period. METHODS: Using the American College of Surgeons' National Surgical Quality Improvement Project's database, patients who underwent hysterectomy for endometrial cancer from 2010 to 2014 were identified and categorized by their hospital length of stay. Statistical analyses were performed to assess the relationship between hospital stay and demographics, medical comorbidities, intraoperative surgical factors and postoperative outcomes. RESULTS: A total of 9020 patients had laparoscopic hysterectomies for endometrial cancer and of these, 729 patients (8.1%) were successfully discharged on the day of surgery. These patients were younger and had lower body mass indexes and fewer medical comorbidities than patients who were admitted after their procedure. The same-day discharge group underwent surgical procedures of less complexity than the hospital admission group based on shorter operative times and fewer relative value units (RVUs). There was a lower rate of surgical site infections in the same-day discharge group, and no difference in rates of other postoperative complications including hospital readmissions and reoperations. CONCLUSIONS: Rates of laparoscopic hysterectomy for endometrial cancer are gradually increasing but the rates of same-day discharge have increased at a much slower rate. Same-day discharge has been successful despite differences in preoperative demographics, medical comorbidities and intraoperative surgical complexity. Overall postoperative complication rates were equivalent despite length of hospital stay, demonstrating the safety and feasibility of same-day discharge after laparoscopic hysterectomy for endometrial cancer.


Subjects
Ambulatory Surgical Procedures/methods; Endometrial Neoplasms/surgery; Hysterectomy/methods; Ambulatory Surgical Procedures/adverse effects; Ambulatory Surgical Procedures/statistics & numerical data; Endometrial Neoplasms/epidemiology; Female; Humans; Hysterectomy/adverse effects; Hysterectomy/statistics & numerical data; Laparoscopy/adverse effects; Laparoscopy/methods; Laparoscopy/statistics & numerical data; Middle Aged; United States/epidemiology
13.
JAMA Netw Open ; 7(3): e240357, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38466307

ABSTRACT

Importance: By law, patients have immediate access to discharge notes in their medical records. Technical language and abbreviations make notes difficult to read and understand for a typical patient. Large language models (LLMs [eg, GPT-4]) have the potential to transform these notes into patient-friendly language and format. Objective: To determine whether an LLM can transform discharge summaries into a format that is more readable and understandable. Design, Setting, and Participants: This cross-sectional study evaluated a sample of the discharge summaries of adult patients discharged from the General Internal Medicine service at NYU (New York University) Langone Health from June 1 to 30, 2023. Patients discharged as deceased were excluded. All discharge summaries were processed by the LLM between July 26 and August 5, 2023. Interventions: A secure Health Insurance Portability and Accountability Act-compliant platform, Microsoft Azure OpenAI, was used to transform these discharge summaries into a patient-friendly format between July 26 and August 5, 2023. Main Outcomes and Measures: Outcomes included readability as measured by Flesch-Kincaid Grade Level and understandability using Patient Education Materials Assessment Tool (PEMAT) scores. Readability and understandability of the original discharge summaries were compared with the transformed, patient-friendly discharge summaries created through the LLM. As balancing metrics, accuracy and completeness of the patient-friendly version were measured. Results: Discharge summaries of 50 patients (31 female [62.0%] and 19 male [38.0%]) were included. The median patient age was 65.5 (IQR, 59.0-77.5) years. Mean (SD) Flesch-Kincaid Grade Level was significantly lower in the patient-friendly discharge summaries (6.2 [0.5] vs 11.0 [1.5]; P < .001). PEMAT understandability scores were significantly higher for patient-friendly discharge summaries (81% vs 13%; P < .001). 
Two physicians reviewed each patient-friendly discharge summary for accuracy on a 6-point scale, with 54 of 100 reviews (54.0%) giving the best possible rating of 6. Summaries were rated entirely complete in 56 reviews (56.0%). Eighteen reviews noted safety concerns, mostly involving omissions, but also several inaccurate statements (termed hallucinations). Conclusions and Relevance: The findings of this cross-sectional study of 50 discharge summaries suggest that LLMs can be used to translate discharge summaries into patient-friendly language and formats that are significantly more readable and understandable than discharge summaries as they appear in electronic health records. However, implementation will require improvements in accuracy, completeness, and safety. Given the safety concerns, initial implementation will require physician review.
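The Flesch-Kincaid Grade Level used as the readability outcome is a fixed formula over word, sentence, and syllable counts: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies the formula to counts supplied by the caller; the function name and the example counts are assumptions, and the study's own tooling (including how it counted syllables) is not described in the abstract.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for two versions of a summary (not from the study).
dense = flesch_kincaid_grade(words=100, sentences=5, syllables=170)
plain = flesch_kincaid_grade(words=100, sentences=10, syllables=150)
print(f"dense text: grade {dense:.1f}; plain text: grade {plain:.1f}")
```

Shorter sentences and fewer syllables per word both lower the grade, which is the direction of change the abstract reports for the patient-friendly summaries.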


Subjects
Artificial Intelligence; Inpatients; United States; Adult; Humans; Female; Male; Middle Aged; Aged; Cross-Sectional Studies; Patient Discharge; Electronic Health Records; Language
14.
JACC Clin Electrophysiol ; 10(5): 956-966, 2024 May.
Article in English | MEDLINE | ID: mdl-38703162

ABSTRACT

BACKGROUND: Prediction of drug-induced long QT syndrome (diLQTS) is of critical importance given its association with torsades de pointes. There is no reliable method for the outpatient prediction of diLQTS. OBJECTIVES: This study sought to evaluate the use of a convolutional neural network (CNN) applied to electrocardiograms (ECGs) to predict diLQTS in an outpatient population. METHODS: We identified all adult outpatients newly prescribed a QT-prolonging medication between January 1, 2003, and March 31, 2022, who had a 12-lead sinus ECG in the preceding 6 months. Using risk factor data and the ECG signal as inputs, the CNN QTNet was implemented in TensorFlow to predict diLQTS. RESULTS: Models were evaluated in a held-out test dataset of 44,386 patients (57% female) with a median age of 62 years. Compared with 3 other models relying on risk factors or ECG signal or baseline QTc alone, QTNet achieved the best (P < 0.001) performance with a mean area under the curve of 0.802 (95% CI: 0.786-0.818). In a survival analysis, QTNet also had the highest inverse probability of censorship-weighted area under the receiver-operating characteristic curve at day 2 (0.875; 95% CI: 0.848-0.904) and up to 6 months. In a subgroup analysis, QTNet performed best among males and patients ≤50 years or with baseline QTc <450 ms. In an external validation cohort of solely suburban outpatient practices, QTNet similarly maintained the highest predictive performance. CONCLUSIONS: An ECG-based CNN can accurately predict diLQTS in the outpatient setting while maintaining its predictive performance over time. In the outpatient setting, our model could identify higher-risk individuals who would benefit from closer monitoring.


Subjects
Artificial Intelligence; Electrocardiography; Long QT Syndrome; Neural Networks, Computer; Humans; Female; Male; Long QT Syndrome/chemically induced; Long QT Syndrome/diagnosis; Middle Aged; Aged; Adult; Risk Factors
15.
JAMA Netw Open ; 7(7): e2422399, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39012633

ABSTRACT

Importance: Virtual patient-physician communications have increased since 2020 and negatively impacted primary care physician (PCP) well-being. Generative artificial intelligence (GenAI) drafts of patient messages could potentially reduce health care professional (HCP) workload and improve communication quality, but only if the drafts are considered useful. Objectives: To assess PCPs' perceptions of GenAI drafts and to examine linguistic characteristics associated with equity and perceived empathy. Design, Setting, and Participants: This cross-sectional quality improvement study tested the hypothesis that PCPs' ratings of GenAI drafts (created using the electronic health record [EHR] standard prompts) would be equivalent to HCP-generated responses on 3 dimensions. The study was conducted at NYU Langone Health using private patient-HCP communications at 3 internal medicine practices piloting GenAI. Exposures: Randomly assigned patient messages coupled with either an HCP message or the draft GenAI response. Main Outcomes and Measures: PCPs rated responses' information content quality (eg, relevance), using a Likert scale, communication quality (eg, verbosity), using a Likert scale, and whether they would use the draft or start anew (usable vs unusable). Branching logic further probed for empathy, personalization, and professionalism of responses. Computational linguistics methods assessed content differences in HCP vs GenAI responses, focusing on equity and empathy. Results: A total of 16 PCPs (8 [50.0%] female) reviewed 344 messages (175 GenAI drafted; 169 HCP drafted). Both GenAI and HCP responses were rated favorably. GenAI responses were rated higher for communication style than HCP responses (mean [SD], 3.70 [1.15] vs 3.38 [1.20]; P = .01, U = 12 568.5) but were similar to HCPs on information content (mean [SD], 3.53 [1.26] vs 3.41 [1.27]; P = .37; U = 13 981.0) and usable draft proportion (mean [SD], 0.69 [0.48] vs 0.65 [0.47], P = .49, t = -0.6842). 
Usable GenAI responses were considered more empathetic than usable HCP responses (32 of 86 [37.2%] vs 13 of 79 [16.5%]; difference, 125.5%), possibly attributable to more subjective (mean [SD], 0.54 [0.16] vs 0.31 [0.23]; P < .001; difference, 74.2%) and positive (mean [SD] polarity, 0.21 [0.14] vs 0.13 [0.25]; P = .02; difference, 61.5%) language; they were also numerically longer (mean [SD] word count, 90.5 [32.0] vs 65.4 [62.6]; difference, 38.4%), but the difference was not statistically significant (P = .07) and more linguistically complex (mean [SD] score, 125.2 [47.8] vs 95.4 [58.8]; P = .002; difference, 31.2%). Conclusions: In this cross-sectional study of PCP perceptions of an EHR-integrated GenAI chatbot, GenAI was found to communicate information better and with more empathy than HCPs, highlighting its potential to enhance patient-HCP communication. However, GenAI drafts were less readable than HCPs', a significant concern for patients with low health or English literacy.
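The "difference" percentages in these results appear to be relative differences of the GenAI value over the HCP value, that is, (GenAI − HCP) / HCP × 100. That reading is an inference from the numbers, not something the abstract states. A minimal sketch, with a hypothetical helper name, that reproduces the quoted figures from the rounded values in the text:

```python
def relative_difference_pct(value, baseline):
    """Relative difference of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (GenAI, HCP) pairs as quoted in the abstract.
quoted = {
    "empathetic (%)": (37.2, 16.5),
    "subjectivity": (0.54, 0.31),
    "polarity": (0.21, 0.13),
    "word count": (90.5, 65.4),
    "linguistic complexity": (125.2, 95.4),
}
for label, (genai, hcp) in quoted.items():
    print(f"{label}: {relative_difference_pct(genai, hcp):.1f}%")
```

Under this assumption the five computed values round to the percentages given in the abstract.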


Subjects
Physician-Patient Relations , Humans , Cross-Sectional Studies , Female , Male , Adult , Middle Aged , Communication , Quality Improvement , Artificial Intelligence , Physicians, Primary Care/psychology , Electronic Health Records , Language , Empathy , Attitude of Health Personnel
16.
Article in English | MEDLINE | ID: mdl-38778578

ABSTRACT

OBJECTIVES: To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients. To assess appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings. MATERIALS AND METHODS: Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with high likelihood of requiring follow-up, further sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale. RESULTS: For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and 84.5% F-1. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and 85.3% F-1. No findings were "hallucinated" outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of True Positive AI-generated summaries required no or minor revision. CONCLUSION: GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.
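The F-1 values in the abstract above follow from the reported precision and recall, since F-1 is their harmonic mean. A minimal sketch of that arithmetic, using only the figures quoted in the abstract (the function name is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Primary outcome (DA or PA-CC) and secondary outcome (DA only).
primary = f1_score(precision=0.736, recall=0.993)
secondary = f1_score(precision=0.773, recall=0.952)

print(f"primary F-1:   {primary:.1%}")
print(f"secondary F-1: {secondary:.1%}")
```

The results round to 84.5% and 85.3%, matching the abstract. Because the harmonic mean is pulled toward the smaller input, the near-perfect recall of the primary outcome cannot offset its lower precision.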

17.
Eur Heart J Acute Cardiovasc Care ; 13(6): 472-480, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38518758

ABSTRACT

AIMS: Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the USA, with morbidity and mortality being highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock allows prompt implementation of treatment measures. Our objective is to develop a new dynamic risk score, called CShock, to improve early detection of cardiogenic shock in the cardiac intensive care unit (ICU). METHODS AND RESULTS: We developed and externally validated a deep learning-based risk stratification tool, called CShock, for patients admitted into the cardiac ICU with acute decompensated heart failure and/or myocardial infarction to predict the onset of cardiogenic shock. We prepared a cardiac ICU dataset using the Medical Information Mart for Intensive Care-III database by annotating with physician-adjudicated outcomes. This dataset, which consisted of 1500 patients with 204 having cardiogenic/mixed shock, was then used to train CShock. The features used to train the model for CShock included patient demographics, cardiac ICU admission diagnoses, routinely measured laboratory values and vital signs, and relevant features manually extracted from echocardiogram and left heart catheterization reports. We externally validated the risk model on the New York University (NYU) Langone Health cardiac ICU database, which was also annotated with physician-adjudicated outcomes. The external validation cohort consisted of 131 patients with 25 patients experiencing cardiogenic/mixed shock. CShock achieved an area under the receiver operating characteristic curve (AUROC) of 0.821 (95% CI 0.792-0.850). CShock was externally validated in the more contemporary NYU cohort and achieved an AUROC of 0.800 (95% CI 0.717-0.884), demonstrating its generalizability in other cardiac ICUs. Having an elevated heart rate is most predictive of cardiogenic shock development based on Shapley values. 
The other top 10 predictors are having an admission diagnosis of myocardial infarction with ST-segment elevation, having an admission diagnosis of acute decompensated heart failure, Braden Scale, Glasgow Coma Scale, blood urea nitrogen, systolic blood pressure, serum chloride, serum sodium, and arterial blood pH. CONCLUSION: The novel CShock score has the potential to provide automated detection and early warning for cardiogenic shock and improve the outcomes for millions of patients who suffer from myocardial infarction and heart failure.
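The AUROC reported above can be read as the probability that a randomly chosen patient who develops shock receives a higher risk score than a randomly chosen patient who does not. The sketch below illustrates that definition by direct pairwise comparison. The scores are hypothetical and are not CShock outputs, and the abstract does not say how the authors computed AUROC or its confidence intervals.

```python
from typing import Sequence


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (the Mann-Whitney U formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores, for illustration only.
shock = [0.9, 0.8, 0.4]          # patients who developed cardiogenic shock
no_shock = [0.7, 0.3, 0.2, 0.1]  # patients who did not

print(f"AUROC = {auroc(shock, no_shock):.3f}")
```

In this toy example 11 of the 12 pairs are ranked correctly, so the AUROC is 11/12. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.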


Subjects
Machine Learning , Cardiogenic Shock , Humans , Cardiogenic Shock/diagnosis , Male , Female , Risk Assessment/methods , Aged , Middle Aged , Coronary Care Units , Early Diagnosis , Retrospective Studies , Risk Factors , ROC Curve , Hospital Mortality/trends , Myocardial Infarction/diagnosis , Myocardial Infarction/complications , Intensive Care Units
18.
medRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405784

ABSTRACT

Importance: Large language models (LLMs) are crucial for medical tasks. Ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests like MMSE and CDR. Objective: Evaluate ChatGPT and LlaMA-2 performance in extracting MMSE and CDR scores, including their associated dates. Methods: Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 underwent ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation. Results: For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rates of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rates of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of wrong test reported instead of MMSE, and 19 cases of reporting a wrong date. 
Conclusions: In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy, with better performance compared to LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying eligible patients for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.

19.
JAMA Netw Open ; 6(7): e2321792, 2023 07 03.
Article in English | MEDLINE | ID: mdl-37405771

ABSTRACT

Importance: The marketing of health care devices enabled for use with artificial intelligence (AI) or machine learning (ML) is regulated in the US by the US Food and Drug Administration (FDA), which is responsible for approving and regulating medical devices. Currently, there are no uniform guidelines set by the FDA to regulate AI- or ML-enabled medical devices, and discrepancies between FDA-approved indications for use and device marketing require articulation. Objective: To explore any discrepancy between marketing and 510(k) clearance of AI- or ML-enabled medical devices. Evidence Review: This systematic review was a manually conducted survey of 510(k) approval summaries and accompanying marketing materials of devices approved between November 2021 and March 2022, conducted between March and November 2022, following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline. Analysis focused on the prevalence of discrepancies between marketing and certification material for AI/ML enabled medical devices. Findings: A total of 119 FDA 510(k) clearance summaries were analyzed in tandem with their respective marketing materials. The devices were taxonomized into 3 individual categories of adherent, contentious, and discrepant devices. A total of 15 devices (12.61%) were considered discrepant, 8 devices (6.72%) were considered contentious, and 96 devices (84.03%) were consistent between marketing and FDA 510(k) clearance summaries. Most devices were from the radiological approval committees (75 devices [82.35%]), with 62 of these devices (82.67%) adherent, 3 (4.00%) contentious, and 10 (13.33%) discrepant; followed by the cardiovascular device approval committee (23 devices [19.33%]), with 19 of these devices (82.61%) considered adherent, 2 contentious (8.70%) and 2 discrepant (8.70%). The difference between these 3 categories in cardiovascular and radiological devices was statistically significant (P < .001). 
Conclusions and Relevance: In this systematic review, low adherence rates within committees were observed most often in committees with few AI- or ML-enabled devices, and discrepancies between clearance documentation and marketing material were present in one-fifth of devices surveyed.


Subjects
Artificial Intelligence , Device Approval , United States , Humans , United States Food and Drug Administration , Machine Learning , Marketing , Software
20.
Neurosurgery ; 92(2): 431-438, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36399428

ABSTRACT

BACKGROUND: The development of accurate machine learning algorithms requires sufficient quantities of diverse data. This poses a challenge in health care because of the sensitive and siloed nature of biomedical information. Decentralized algorithms through federated learning (FL) avoid data aggregation by instead distributing algorithms to the data before centrally updating one global model. OBJECTIVE: To establish a multicenter collaboration and assess the feasibility of using FL to train machine learning models for intracranial hemorrhage (ICH) detection without sharing data between sites. METHODS: Five neurosurgery departments across the United States collaborated to establish a federated network and train a convolutional neural network to detect ICH on computed tomography scans. The global FL model was benchmarked against a standard, centrally trained model using a held-out data set and was compared against locally trained models using site data. RESULTS: A federated network of practicing neurosurgeon scientists was successfully initiated to train a model for predicting ICH. The FL model achieved an area under the ROC curve of 0.9487 (95% CI 0.9471-0.9503) when predicting all subtypes of ICH compared with a benchmark (non-FL) area under the ROC curve of 0.9753 (95% CI 0.9742-0.9764), although performance varied by subtype. The FL model consistently achieved top three performance when validated on any site's data, suggesting improved generalizability. A qualitative survey described the experience of participants in the federated network. CONCLUSION: This study demonstrates the feasibility of implementing a federated network for multi-institutional collaboration among clinicians and using FL to conduct machine learning research, thereby opening a new paradigm for neurosurgical collaboration.
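The federated learning loop described above (send the model to each site, train locally, then update one global model centrally) can be sketched as follows. This is a toy illustration under stated assumptions: it uses federated averaging (FedAvg) with a one-dimensional linear model, whereas the abstract names neither the aggregation rule nor the training details of the study's convolutional neural network. All function names and data are hypothetical.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave their site


def local_update(weights: List[float], data: Dataset, lr: float = 0.1) -> List[float]:
    """One gradient step of least-squares y ~ w*x + b on site-local data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(site_weights: List[List[float]], site_sizes: List[int]) -> List[float]:
    """Average site models, weighting each by its number of local samples."""
    total = sum(site_sizes)
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(len(site_weights[0]))
    ]


def train(sites: List[Dataset], rounds: int = 300) -> List[float]:
    """Each round, only model weights travel between sites and the server."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Two hypothetical sites whose data follow y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(sites)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Only the weight vectors cross site boundaries here, which is the property that lets a federated network avoid pooling sensitive data. Real deployments typically add secure aggregation and several local epochs per round, neither of which is shown.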


Subjects
Algorithms , Benchmarking , Humans , Intracranial Hemorrhages , Machine Learning , Neural Networks, Computer