Results 1 - 8 of 8
1.
J Emerg Med; 66(2): 184-191, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38369413

ABSTRACT

BACKGROUND: The adoption of point-of-care ultrasound (POCUS) has greatly improved the ability to rapidly evaluate unstable emergency department (ED) patients at the bedside. One major use of POCUS is to obtain echocardiograms to assess cardiac function. OBJECTIVES: We developed EchoNet-POCUS, a novel deep learning system, to aid emergency physicians (EPs) in interpreting POCUS echocardiograms and to reduce operator-to-operator variability. METHODS: We collected a new dataset of POCUS echocardiogram videos obtained in the ED by EPs and annotated the cardiac function and quality of each video. Using this dataset, we trained EchoNet-POCUS to evaluate both cardiac function and video quality in POCUS echocardiograms. RESULTS: EchoNet-POCUS achieves an area under the receiver operating characteristic curve (AUROC) of 0.92 (0.89-0.94) for predicting whether cardiac function is abnormal and an AUROC of 0.81 (0.78-0.85) for predicting video quality. CONCLUSIONS: EchoNet-POCUS can be applied to bedside echocardiogram videos in real time using commodity hardware, as we demonstrate in a prospective pilot study.
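
For illustration, a minimal sketch of the kind of headline evaluation reported here: an AUROC with a bootstrap confidence interval for a binary abnormal-function label. The labels, scores, and sample size are invented stand-ins, not EchoNet-POCUS data or code.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                    # hypothetical abnormal-function labels
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=500), 0, 1)  # hypothetical model scores

auroc = roc_auc_score(y_true, y_score)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    if len(np.unique(y_true[idx])) == 2:                  # AUROC needs both classes
        boot.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUROC {auroc:.2f} (95% CI {lo:.2f}-{hi:.2f})")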


Subject(s)
Echocardiography; Point-of-Care Systems; Humans; Prospective Studies; Pilot Projects; Ultrasonography; Emergency Service, Hospital
2.
Emerg Med J; 41(5): 298-303, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38233106

ABSTRACT

BACKGROUND: Tools to increase the turnaround speed and accuracy of imaging reports could positively influence ED logistics. The Caire ICH is an artificial intelligence (AI) software developed for ED physicians to recognise intracranial haemorrhages (ICHs) on non-contrast enhanced cranial CT scans to manage the clinical care of these patients in a timelier fashion. METHODS: A dataset of 532 non-contrast cranial CT scans was reviewed by five board-certified emergency physicians (EPs) with an average of 14.8 years of practice experience. The scans were labelled in random order for the presence or absence of an ICH. If an ICH was detected, the reader further labelled all subtypes present (ie, epidural, subdural, subarachnoid, intraparenchymal and/or intraventricular haemorrhage). After a washout period, the five EPs reviewed the scans again individually with the assistance of Caire ICH. The mean accuracy of the EP readings with AI assistance was compared with the mean accuracy of three general radiologists reading the films individually. The final diagnosis (ie, ground truth) was adjudicated by a consensus of the radiologists after their individual readings. RESULTS: Mean EP reader accuracy significantly increased by 6.20% (95% CI for the difference 5.10% to 7.29%; p=0.0092) when using Caire ICH to detect an ICH. The EP cohort using Caire ICH was more accurate at detecting an ICH than the radiologist cohort prior to discussion; this difference, however, was not statistically significant. CONCLUSION: The Caire ICH software significantly improved the accuracy and sensitivity of ICH detection by EPs to a level comparable to that of general radiologists. Further prospective research with larger numbers will be needed to understand the impact of Caire ICH on ED logistics and patient outcomes.
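
A minimal sketch of the paired reader-level comparison behind a result like the 6.20% accuracy gain, assuming hypothetical per-reader accuracies rather than the study's readings.

import numpy as np
from scipy import stats

# Per-reader accuracy over the same scans, unaided vs AI-assisted (illustrative values).
acc_unaided = np.array([0.87, 0.85, 0.88, 0.86, 0.84])
acc_aided = np.array([0.93, 0.91, 0.94, 0.92, 0.90])

diff = acc_aided - acc_unaided
t_stat, p_value = stats.ttest_rel(acc_aided, acc_unaided)   # paired t-test across readers
ci_lo, ci_hi = stats.t.interval(0.95, df=len(diff) - 1,
                                loc=diff.mean(), scale=stats.sem(diff))
print(f"mean gain {diff.mean():.2%} (95% CI {ci_lo:.2%} to {ci_hi:.2%}), p={p_value:.4f}")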

3.
JAMA; 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39405325

ABSTRACT

Importance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty. Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Study Selection: Studies evaluating 1 or more LLMs in health care. Data Extraction and Synthesis: Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Results: Of the 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented. Conclusions and Relevance: Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.
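
A minimal sketch of the keyword-based categorization step described in the Data Extraction section, with invented keyword lists standing in for the reviewers' actual rubric.

from collections import Counter

# Illustrative keyword lists only, not the review's coding scheme.
DIMENSIONS = {
    "accuracy": ("accuracy", "auroc", "f1 score"),
    "fairness, bias, and toxicity": ("fairness", "bias", "toxicity"),
    "deployment considerations": ("deployment", "latency", "workflow"),
    "calibration and uncertainty": ("calibration", "uncertainty"),
}

def tag_abstract(text):
    text = text.lower()
    return [dim for dim, keywords in DIMENSIONS.items()
            if any(kw in text for kw in keywords)]

abstracts = [
    "We report accuracy and AUROC on medical licensing examination questions.",
    "We audit fairness and bias in the model's discharge summaries.",
]
counts = Counter(dim for a in abstracts for dim in tag_abstract(a))
print(counts)   # tallies like these feed the per-dimension percentages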

4.
Am J Manag Care; 29(1): e1-e7, 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36716157

ABSTRACT

OBJECTIVES: To evaluate whether one summary metric of calculator performance sufficiently conveys equity across different demographic subgroups, and to evaluate how calculator predictive performance affects downstream health outcomes. STUDY DESIGN: We evaluated 3 commonly used clinical calculators - Model for End-Stage Liver Disease (MELD), CHA2DS2-VASc, and simplified Pulmonary Embolism Severity Index (sPESI) - on a cohort extracted from the Stanford Medicine Research Data Repository, following the cohort selection process described in each calculator's derivation paper. METHODS: We quantified the predictive performance of the 3 clinical calculators across sex and race. Then, using the clinical guidelines that direct care based on these calculators' output, we quantified potential disparities in subsequent health outcomes. RESULTS: Across the examined subgroups, the MELD calculator exhibited worse performance for the female and White populations, the CHA2DS2-VASc calculator for the male population, and sPESI for the Black population. The extent to which such performance differences translated into differential health outcomes depended on the distribution of the calculators' scores around the thresholds used to trigger a care action via the corresponding guidelines. In particular, under the old guideline for CHA2DS2-VASc, among those who would not have been offered anticoagulant therapy, the Hispanic subgroup exhibited the highest rate of stroke. CONCLUSIONS: Clinical calculators, even when they do not include variables such as sex and race as inputs, can have very different care consequences across those subgroups. These differences in health care outcomes across subgroups can be explained by examining the distribution of scores and their calibration around the thresholds encoded in the accompanying care guidelines.
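
A minimal sketch of the threshold analysis: a standard CHA2DS2-VASc scorer and a subgroup outcome check among patients scoring below a treatment threshold. The data, column names, and threshold handling are illustrative, not the Stanford cohort or the study's pipeline.

import pandas as pd

def cha2ds2_vasc(r):
    # Standard scoring: CHF, hypertension, diabetes, vascular disease, female sex = 1 point each;
    # prior stroke/TIA = 2; age >= 75 = 2; age 65-74 = 1.
    score = r.chf + r.htn + r.diabetes + r.vascular + int(r.sex == "F")
    score += 2 * r.prior_stroke
    score += 2 if r.age >= 75 else (1 if r.age >= 65 else 0)
    return int(score)

df = pd.DataFrame({
    "age": [72, 58, 80, 66], "sex": ["F", "M", "M", "F"],
    "chf": [0, 0, 1, 0], "htn": [1, 0, 1, 1], "diabetes": [0, 1, 0, 0],
    "vascular": [0, 0, 1, 0], "prior_stroke": [0, 0, 0, 1],
    "race_ethnicity": ["Hispanic", "White", "Black", "Hispanic"],
    "stroke_within_1y": [1, 0, 0, 1],
})
df["score"] = df.apply(cha2ds2_vasc, axis=1)
untreated = df[df.score < 2]          # hypothetical guideline threshold for anticoagulation
print(untreated.groupby("race_ethnicity").stroke_within_1y.mean())  # outcome rates below threshold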


Subject(s)
Atrial Fibrillation; End Stage Liver Disease; Stroke; Humans; Male; Female; Risk Assessment; Severity of Illness Index; Anticoagulants/therapeutic use; Bias; Risk Factors; Atrial Fibrillation/complications; Atrial Fibrillation/drug therapy
5.
JAMA Netw Open; 5(8): e2227779, 2022 Aug 01.
Article in English | MEDLINE | ID: mdl-35984654

ABSTRACT

Importance: Various model reporting guidelines have been proposed to ensure that clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives: To assess the information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review: MEDLINE was queried using the terms "machine learning model card" and "reporting machine learning" from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings: From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristic curve, internal validation, and intended clinical use. Several items related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data, were reported half the time or less. Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). Conclusions and Relevance: These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share the information necessary for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines; however, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.
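
A minimal sketch of the coverage analysis: scoring each model's documentation against a merged item checklist and summarizing as a median and IQR. The checklist items and documentation contents below are invented placeholders.

import numpy as np

checklist = [
    "outcome definition", "AUROC", "internal validation",
    "external validation", "uncertainty measures",
    "missing data strategy", "subgroup analysis",
]
# Which checklist items each model's documentation reports (hypothetical).
docs = {
    "model_a": {"outcome definition", "AUROC", "internal validation"},
    "model_b": {"outcome definition", "AUROC", "internal validation",
                "missing data strategy", "subgroup analysis"},
}

coverage = {name: len(reported & set(checklist)) / len(checklist)
            for name, reported in docs.items()}
values = np.array(sorted(coverage.values()))
print(coverage)
print(f"median {np.median(values):.0%}, "
      f"IQR {np.percentile(values, 25):.0%}-{np.percentile(values, 75):.0%}")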


Subject(s)
Models, Statistical; Research Report; Data Collection; Humans; Prognosis; Reproducibility of Results
6.
Cureus; 14(10): e30264, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36381767

ABSTRACT

BACKGROUND: Intracranial hemorrhage (ICH) requires emergent medical treatment for positive outcomes. While previous artificial intelligence (AI) solutions achieved rapid diagnostics, none were shown to improve the performance of radiologists in detecting ICHs. Here, we show that the Caire ICH artificial intelligence system enhances radiologists' ICH diagnosis performance. METHODS: Three radiologists labeled a dataset of non-contrast-enhanced axial cranial computed tomography (CT) scans (n=532) for the presence or absence of an ICH and, when an ICH was detected, identified its subtype. After a washout period, the same three radiologists reviewed the dataset with the assistance of the Caire ICH system. Performance was measured with respect to reader agreement, accuracy, sensitivity, and specificity when compared with the ground truth, defined as reader consensus. RESULTS: Caire ICH improved inter-reader agreement by an average of 5.76% in a dataset with an ICH prevalence of 74.3%. Further, radiologists using Caire ICH detected an average of 18 more ICHs and significantly increased their accuracy by 6.15%, their sensitivity by 4.6%, and their specificity by 10.62%. The Caire ICH system also improved the radiologists' ability to accurately identify the ICH subtypes present. CONCLUSION: The Caire ICH device significantly improves the performance of a cohort of radiologists. Such a device has the potential to improve patient outcomes and reduce misdiagnosis of ICH.
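
A minimal sketch of one way to compute inter-reader agreement — mean pairwise percent agreement across three readers — run on simulated labels at the stated prevalence, not the study data.

import numpy as np
from itertools import combinations

def mean_pairwise_agreement(labels):
    # labels: (n_readers, n_scans) array; 1 = ICH present, 0 = absent
    pairs = combinations(range(labels.shape[0]), 2)
    return float(np.mean([(labels[i] == labels[j]).mean() for i, j in pairs]))

rng = np.random.default_rng(2)
truth = rng.random(532) < 0.743            # simulated ~74.3% ICH prevalence

def simulate_readers(error_rate):
    # each reader flips the true label with probability error_rate
    return np.stack([np.where(rng.random(532) < error_rate, ~truth, truth)
                     for _ in range(3)]).astype(int)

print("unaided: ", mean_pairwise_agreement(simulate_readers(0.10)))
print("assisted:", mean_pairwise_agreement(simulate_readers(0.05)))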

7.
Appl Clin Inform; 13(1): 315-321, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35235994

ABSTRACT

BACKGROUND: One key aspect of a learning health system (LHS) is utilizing data generated during care delivery to inform clinical care. However, institutional guidelines that utilize observational data are rare and require months to create, making current processes impractical for more urgent scenarios such as those posed by the COVID-19 pandemic. There is a need to rapidly analyze institutional data to drive guideline creation where evidence from randomized controlled trials is unavailable. OBJECTIVES: This article provides background on the current state of observational data generation in institutional guideline creation and details our institution's experience in creating a novel workflow to (1) demonstrate the value of such a workflow, (2) present a real-world example, and (3) discuss difficulties encountered and future directions. METHODS: Utilizing a multidisciplinary team of database specialists, clinicians, and informaticists, we created a workflow for identifying and translating a clinical need into a queryable format in our clinical data warehouse, creating data summaries, and feeding this information back into clinical guideline creation. RESULTS: Clinical questions posed by the hospital medicine division were answered in a rapid time frame and informed the creation of institutional guidelines for the care of patients with COVID-19. Setting up the workflow, answering the questions, and producing data summaries required around 300 hours of effort and approximately $300,000. CONCLUSION: A key component of an LHS is the ability to learn from data generated during care delivery. Such examples are rare in the literature; we demonstrate one and offer thoughts on ideal multidisciplinary team formation and deployment.
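
A minimal sketch of one step of such a workflow: translating a clinical question ("how do outcomes of admitted COVID-19 patients vary by treatment?") into a warehouse query and returning a data summary. The table, columns, and data are hypothetical, with an in-memory SQLite database standing in for the clinical data warehouse.

import sqlite3
import pandas as pd

# Hypothetical table and column names; a real warehouse schema differs.
QUERY = """
SELECT treatment,
       COUNT(*)      AS n,
       AVG(los_days) AS mean_length_of_stay,
       AVG(died)     AS mortality_rate
FROM covid_admissions
GROUP BY treatment;
"""

conn = sqlite3.connect(":memory:")    # stand-in for the warehouse connection
pd.DataFrame({
    "treatment": ["remdesivir", "remdesivir", "supportive", "supportive"],
    "los_days": [7, 9, 11, 8],
    "died": [0, 0, 1, 0],
}).to_sql("covid_admissions", conn, index=False)

summary = pd.read_sql_query(QUERY, conn)   # data summary fed back to guideline authors
print(summary)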


Subject(s)
COVID-19; Learning Health System; COVID-19/epidemiology; Humans; Observational Studies as Topic; Pandemics; Practice Guidelines as Topic; Workflow