1.
CJC Open ; 6(6): 798-804, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39022171

ABSTRACT

Background: Inaccurate blood pressure (BP) classification results in inappropriate treatment. We tested whether machine learning (ML), using routine clinical data, can serve as a reliable alternative to ambulatory BP monitoring (ABPM) in classifying BP status. Methods: This study employed a multicentre approach involving 3 derivation cohorts from Glasgow, Gdansk, and Birmingham, and a fourth independent evaluation cohort. ML models were trained using office BP, ABPM, and clinical, laboratory, and demographic data, collected from patients referred for hypertension assessment. Seven ML algorithms were trained to classify patients into 5 groups, named as follows: Normal/Target; Hypertension-Masked; Normal/Target-White-Coat (WC); Hypertension-WC; and Hypertension. The 10-year cardiovascular outcomes and 27-year all-cause mortality risks were calculated for the ML-derived groups using the Cox proportional hazards model. Results: Overall, extreme gradient boosting (using XGBoost open source software) showed the highest area under the receiver operating characteristic curve of 0.85-0.88 across derivation cohorts, Glasgow (n = 923; 43% female; age 50.7 ± 16.3 years), Gdansk (n = 709; 46% female; age 54.4 ± 13 years), and Birmingham (n = 1222; 56% female; age 55.7 ± 14 years). But accuracy (0.57-0.72) and F1 (harmonic mean of precision and recall) scores (0.57-0.69) were low across the 3 patient cohorts. The evaluation cohort (n = 6213; 51% female; age 51.2 ± 10.8 years) indicated elevated 10-year risks of composite cardiovascular events in the Normal/Target-WC and the Hypertension-WC groups, with heightened 27-year all-cause mortality observed in all groups, except the Hypertension-Masked group, compared to the Normal/Target group. Conclusions: ML has limited potential in accurate BP classification when ABPM is unavailable. Larger studies including diverse patient groups and different resource settings are warranted.
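The abstract above defines F1 as the harmonic mean of precision and recall. A minimal sketch of that definition follows; the function name and the example inputs are illustrative assumptions, not values or code from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero, where the harmonic mean is
    otherwise undefined (a common convention, assumed here).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs chosen only to show the arithmetic.
example_f1 = f1_score(0.60, 0.80)
print(round(example_f1, 4))  # 0.6857
```

Because the harmonic mean is pulled toward the smaller of its two inputs, F1 penalizes a classifier that trades away recall for precision or the reverse.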


2.
Article in English | MEDLINE | ID: mdl-39001795

ABSTRACT

OBJECTIVES: Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, providing insight into the relationship between sleep and AD onset and progression. MATERIALS AND METHODS: A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from the adSLEEP, a corpus of 192 000 de-identified clinical notes of 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset. RESULTS: The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years, where females represented 64.1%, and a vast majority were non-Hispanic or Latino (94.6%). Rule-based NLP algorithm achieved the best performance of F1 across all sleep-related concepts. 
In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV scores for daytime sleepiness (1.00) and sleep duration (1.00), while the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with finetuning had the highest PPV for night wakings (0.93) and sleep problem (0.89). DISCUSSION: Although sleep information is infrequently documented in the clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches did not achieve good results, which is due to the small size of sleep information in the training data. CONCLUSION: The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD but could be extended to general sleep information extraction for other diseases.

3.
AMIA Jt Summits Transl Sci Proc ; 2024: 613-622, 2024.
Article in English | MEDLINE | ID: mdl-38827046

ABSTRACT

Monitoring cerebral neuronal activity via electroencephalography (EEG) during surgery can detect ischemia, a precursor to stroke. However, current neurophysiologist-based monitoring is prone to error. In this study, we evaluated machine learning (ML) for efficient and accurate ischemia detection. We trained supervised ML models on a dataset of 802 patients with intraoperative ischemia labels and evaluated them on an independent validation dataset of 30 patients with refined labels from five neurophysiologists. Our results show moderate-to-substantial agreement between neurophysiologists, with Cohen's kappa values between 0.59 and 0.74. Neurophysiologist performance ranged from 58-93% for sensitivity and 83-96% for specificity, while ML models demonstrated comparable ranges of 63-89% and 85-96%. Random Forest (RF), LightGBM (LGBM), and XGBoost RF achieved area under the receiver operating characteristic curve (AUROC) values of 0.92-0.93 and area under the precision-recall curve (AUPRC) values of 0.79-0.83. ML has the potential to improve intraoperative monitoring, enhancing patient safety and reducing costs.
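Agreement between neurophysiologists is reported above as Cohen's kappa. A minimal sketch of the standard two-rater kappa is given below, assuming each rater assigns one label per EEG segment; the label lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = ischemia, 0 = no ischemia.
rater_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))  # 0.583
```

Here the raters agree on 8 of 10 items, but chance agreement is 0.52, so kappa is well below the raw 80% agreement.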

4.
medRxiv ; 2024 May 27.
Article in English | MEDLINE | ID: mdl-38854094

ABSTRACT

Importance: Accurately predicting major bleeding events in non-valvular atrial fibrillation (AF) patients on direct oral anticoagulants (DOACs) is crucial for personalized treatment and improving patient outcomes, especially with emerging alternatives like left atrial appendage closure devices. The left atrial appendage closure devices reduce stroke risk comparably but with significantly fewer non-procedural bleeding events. Objective: To evaluate the performance of machine learning (ML) risk models in predicting clinically significant bleeding events requiring hospitalization and hemorrhagic stroke in non-valvular AF patients on DOACs compared to conventional bleeding risk scores (HAS-BLED, ORBIT, and ATRIA) at the index visit to a cardiologist for AF management. Design: Prognostic modeling with retrospective cohort study design using electronic health record (EHR) data, with clinical follow-up at one-, two-, and five-years. Setting: University of Pittsburgh Medical Center (UPMC) system. Participants: 24,468 non-valvular AF patients aged ≥18 years treated with DOACs, excluding those with prior history of significant bleeding, other indications for DOACs, on warfarin or contraindicated to DOACs. Exposures: DOAC therapy for non-valvular AF. Main Outcomes and Measures: The primary endpoint was clinically significant bleeding requiring hospitalization within one year of index visit. The models incorporated demographic, clinical, and laboratory variables available in the EHR at the index visit. Results: Among 24,468 patients, 553 (2.3%) had bleeding events within one year, 829 (3.5%) within two years, and 1,292 (5.8%) within five years of index visit. We evaluated multivariate logistic regression and ML models including random forest, classification trees, k-nearest neighbor, naive Bayes, and extreme gradient boosting (XGBoost) which modestly outperformed HAS-BLED, ATRIA, and ORBIT scores in predicting clinically significant bleeding at 1-year follow-up. 
The best performing model (random forest) showed area under the curve (AUC-ROC) 0.76 (0.70-0.81), G-Mean score of 0.67, net reclassification index 0.14 compared to 0.57 (0.50-0.63), G-Mean score of 0.57 for HASBLED score, p-value for difference <0.001. The ML models had improved performance compared to conventional risk across time-points of 2-year and 5-years and within the subgroup of hemorrhagic stroke. SHAP analysis identified novel risk factors including measures from body mass index, cholesterol profile, and insurance type beyond those used in conventional risk scores. Conclusions and Relevance: Our findings demonstrate the superior performance of ML models compared to conventional bleeding risk scores and identify novel risk factors highlighting the potential for personalized bleeding risk assessment in AF patients on DOACs.
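The G-Mean score reported above is conventionally the geometric mean of sensitivity and specificity, which is useful when events are rare (2.3% at one year in this cohort). The sketch below assumes that conventional definition; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 100 bleeding events, 1000 patients without an event.
score = g_mean(tp=60, fn=40, tn=750, fp=250)
print(round(score, 4))  # 0.6708
```

A model that predicts "no bleeding" for everyone would reach high accuracy on such data but a G-Mean of zero, since its sensitivity is zero.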

5.
AMIA Jt Summits Transl Sci Proc ; 2024: 488-497, 2024.
Article in English | MEDLINE | ID: mdl-38827048

ABSTRACT

Clinical predictive models that include race as a predictor have the potential to exacerbate disparities in healthcare. Such models can be respecified to exclude race or optimized to reduce racial bias. We investigated the impact of such respecifications in a predictive model - UTICalc - which was designed to reduce catheterizations in young children with suspected urinary tract infections. To reduce racial bias, race was removed from the UTICalc logistic regression model and replaced with two new features. We compared the two versions of UTICalc using fairness and predictive performance metrics to understand the effects on racial bias. In addition, we derived three new models for UTICalc to specifically improve racial fairness. Our results show that, as predicted by previously described impossibility results, fairness cannot be simultaneously improved on all fairness metrics, and model respecification may improve racial fairness but decrease overall predictive performance.

6.
IEEE Trans Med Imaging ; PP2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38900619

ABSTRACT

This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.

7.
Online J Public Health Inform ; 16: e53445, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700929

ABSTRACT

BACKGROUND: Post-COVID-19 condition (colloquially known as "long COVID-19") characterized as postacute sequelae of SARS-CoV-2 has no universal clinical case definition. Recent efforts have focused on understanding long COVID-19 symptoms, and electronic health record (EHR) data provide a unique resource for understanding this condition. The introduction of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.9 for "Post COVID-19 condition, unspecified" to identify patients with long COVID-19 has provided a method of evaluating this condition in EHRs; however, the accuracy of this code is unclear. OBJECTIVE: This study aimed to characterize the utility and accuracy of the U09.9 code across 3 health care systems-the Veterans Health Administration, the Beth Israel Deaconess Medical Center, and the University of Pittsburgh Medical Center-against patients identified with long COVID-19 via a chart review by operationalizing the World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC) definitions. METHODS: Patients who were COVID-19 positive with either a U07.1 ICD-10 code or positive polymerase chain reaction test within these health care systems were identified for chart review. Among this cohort, we sampled patients based on two approaches: (1) with a U09.9 code and (2) without a U09.9 code but with a new onset long COVID-19-related ICD-10 code, which allows us to assess the sensitivity of the U09.9 code. To operationalize the long COVID-19 definition based on health agency guidelines, symptoms were grouped into a "core" cluster of 11 commonly reported symptoms among patients with long COVID-19 and an extended cluster that captured all other symptoms by disease domain. Patients having ≥2 symptoms persisting for ≥60 days that were new onset after their COVID-19 infection, with ≥1 symptom in the core cluster, were labeled as having long COVID-19 per chart review. 
The code's performance was compared across 3 health care systems and across different time periods of the pandemic. RESULTS: Overall, 900 patient charts were reviewed across 3 health care systems. The prevalence of long COVID-19 among the cohort with the U09.9 ICD-10 code based on the operationalized WHO definition was between 23.2% and 62.4% across these health care systems. We also evaluated a less stringent version of the WHO definition and the CDC definition and observed an increase in the prevalence of long COVID-19 at all 3 health care systems. CONCLUSIONS: This is one of the first studies to evaluate the U09.9 code against a clinical case definition for long COVID-19, as well as the first to apply this definition to EHR data using a chart review approach on a nationwide cohort across multiple health care systems. This chart review approach can be implemented at other EHR systems to further evaluate the utility and performance of the U09.9 code.
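The chart-review rule in the Methods (at least 2 new-onset symptoms persisting for at least 60 days, with at least 1 in the core cluster) can be sketched as a simple predicate. This is one reading of the abstract, assuming the core symptom must itself be new onset and persistent; the `Symptom` type and the symptom names are hypothetical, since the abstract does not list the 11 core symptoms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    name: str            # hypothetical label, not from the study
    duration_days: int   # how long the symptom persisted
    new_onset: bool      # first documented after the COVID-19 infection
    core: bool           # member of the "core" symptom cluster


def meets_long_covid_definition(symptoms, min_symptoms=2, min_days=60, min_core=1):
    """Apply the operationalized rule to one patient's reviewed symptoms."""
    # Keep only symptoms that are both new onset and persistent.
    qualifying = [s for s in symptoms if s.new_onset and s.duration_days >= min_days]
    core_count = sum(1 for s in qualifying if s.core)
    return len(qualifying) >= min_symptoms and core_count >= min_core


chart = [
    Symptom("fatigue", 90, new_onset=True, core=True),
    Symptom("rash", 75, new_onset=True, core=False),
]
print(meets_long_covid_definition(chart))  # True
```

Exposing the thresholds as parameters makes it straightforward to express a less stringent variant of the definition, such as the one the Results mention.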

8.
medRxiv ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38699316

ABSTRACT

Scalable identification of patients with the post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms and the suboptimal accuracy, demographic biases, and underestimation of the PASC diagnosis code (ICD-10 U09.9). In a retrospective case-control study, we developed a precision phenotyping algorithm for identifying research cohorts of PASC patients, defined as a diagnosis of exclusion. We used longitudinal electronic health records (EHR) data from over 295 thousand patients from 14 hospitals and 20 community health centers in Massachusetts. The algorithm employs an attention mechanism to exclude sequelae that prior conditions can explain. We performed independent chart reviews to tune and validate our precision phenotyping algorithm. Our PASC phenotyping algorithm improves precision and prevalence estimation and reduces bias in identifying Long COVID patients compared to the U09.9 diagnosis code. Our algorithm identified a PASC research cohort of over 24 thousand patients (compared to about 6 thousand when using the U09.9 diagnosis code), with a 79.9 percent precision (compared to 77.8 percent from the U09.9 diagnosis code). Our estimated prevalence of PASC was 22.8 percent, which is close to the national estimates for the region. We also provide an in-depth analysis outlining the clinical attributes, encompassing identified lingering effects by organ, comorbidity profiles, and temporal differences in the risk of PASC. The PASC phenotyping method presented in this study boasts superior precision, accurately gauges the prevalence of PASC without underestimating it, and exhibits less bias in pinpointing Long COVID patients. The PASC cohort derived from our algorithm will serve as a springboard for delving into Long COVID's genetic, metabolomic, and clinical intricacies, surmounting the constraints of recent PASC cohort studies, which were hampered by their limited size and available outcome data.

9.
J Healthc Inform Res ; 8(2): 313-352, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38681755

ABSTRACT

Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future studies in this field. The main objective was to assess and analyze the existing literature on clinical IR, focusing on the methods, techniques, and tools employed for effective retrieval and analysis of medical information. Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted an extensive search across databases such as Ovid Embase, Ovid Medline, Scopus, ACM Digital Library, IEEE Xplore, and Web of Science, covering publications from January 1, 2010, to January 4, 2023. The rigorous screening process led to the inclusion of 184 papers in our review. Our findings provide a detailed analysis of the clinical IR research landscape, covering aspects like publication trends, data sources, methodologies, evaluation metrics, and applications. The review identifies key research gaps in clinical IR methods such as indexing, ranking, and query expansion, offering insights and opportunities for future studies in clinical IR, thus serving as a guiding framework for upcoming research efforts in this rapidly evolving field. The study also underscores an imperative for innovative research on advanced clinical IR systems capable of fast semantic vector search and adoption of neural IR techniques for effective retrieval of information from unstructured electronic health records (EHRs). Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-024-00159-4.

10.
PLOS Digit Health ; 3(4): e0000484, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38620037

ABSTRACT

Few studies examining the patient outcomes of concurrent neurological manifestations during acute COVID-19 leveraged multinational cohorts of adults and children or distinguished between central and peripheral nervous system (CNS vs. PNS) involvement. Using a federated multinational network in which local clinicians and informatics experts curated the electronic health records data, we evaluated the risk of prolonged hospitalization and mortality in hospitalized COVID-19 patients from 21 healthcare systems across 7 countries. For adults, we used a federated learning approach whereby we ran Cox proportional hazard models locally at each healthcare system and performed a meta-analysis on the aggregated results to estimate the overall risk of adverse outcomes across our geographically diverse populations. For children, we reported descriptive statistics separately due to their low frequency of neurological involvement and poor outcomes. Among the 106,229 hospitalized COVID-19 patients (104,031 patients ≥18 years; 2,198 patients <18 years, January 2020-October 2021), 15,101 (14%) had at least one CNS diagnosis, while 2,788 (3%) had at least one PNS diagnosis. After controlling for demographics and pre-existing conditions, adults with CNS involvement had a longer hospital stay (11 versus 6 days), a greater risk of death (hazard ratio = 1.78), and a faster time to death (12 versus 24 days) than patients with no neurological condition (NNC) during acute COVID-19 hospitalization. Adults with PNS involvement also had a longer hospital stay but a lower risk of mortality than the NNC group. Although children had a low frequency of neurological involvement during COVID-19 hospitalization, a substantially higher proportion of children with CNS involvement died compared to those with NNC (6% vs 1%). Overall, patients with concurrent CNS manifestation during acute COVID-19 hospitalization faced greater risks for adverse clinical outcomes than patients without any neurological diagnosis. 
Our global informatics framework using a federated approach (versus a centralized data collection approach) has utility for clinical discovery beyond COVID-19.
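The federated design described above fits Cox models locally and then meta-analyzes the site-level results. The abstract does not say which pooling method was used; the sketch below assumes a fixed-effect, inverse-variance pooling of log hazard ratios, and the site estimates are hypothetical.

```python
import math


def pool_hazard_ratios(site_estimates):
    """Fixed-effect inverse-variance pooling (an assumed method).

    site_estimates: list of (hazard_ratio, standard_error_of_log_hr) pairs,
    one per healthcare system. Returns (pooled_hr, ci_low, ci_high).
    """
    weights = [1.0 / se ** 2 for _, se in site_estimates]
    log_hrs = [math.log(hr) for hr, _ in site_estimates]
    total_weight = sum(weights)
    pooled_log = sum(w * x for w, x in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci_low = math.exp(pooled_log - 1.96 * pooled_se)
    ci_high = math.exp(pooled_log + 1.96 * pooled_se)
    return math.exp(pooled_log), ci_low, ci_high


# Hypothetical site-level results; only these summaries leave each site.
sites = [(1.5, 0.2), (2.0, 0.2)]
pooled_hr, low, high = pool_hazard_ratios(sites)
print(round(pooled_hr, 3))  # 1.732
```

Only the hazard ratio and its standard error cross site boundaries in this scheme, which is the practical appeal of a federated approach over centralized data collection.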

11.
JMIR Med Inform ; 12: e55318, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587879

ABSTRACT

BACKGROUND: Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. OBJECTIVE: The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types-heuristic and ensemble prompts, for zero-shot and few-shot clinical information extraction using pretrained language models. METHODS: This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches. RESULTS: The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. 
GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types. CONCLUSIONS: This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area.

12.
JMIR Med Inform ; 12: e52289, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38568736

ABSTRACT

BACKGROUND: The rehabilitation of a patient who had a stroke requires precise, personalized treatment plans. Natural language processing (NLP) offers the potential to extract valuable exercise information from clinical notes, aiding in the development of more effective rehabilitation strategies. OBJECTIVE: This study aims to develop and evaluate a variety of NLP algorithms to extract and categorize physical rehabilitation exercise information from the clinical notes of patients who had a stroke treated at the University of Pittsburgh Medical Center. METHODS: A cohort of 13,605 patients diagnosed with stroke was identified, and their clinical notes containing rehabilitation therapy notes were retrieved. A comprehensive clinical ontology was created to represent various aspects of physical rehabilitation exercises. State-of-the-art NLP algorithms were then developed and compared, including rule-based, machine learning-based algorithms (support vector machine, logistic regression, gradient boosting, and AdaBoost) and large language model (LLM)-based algorithms (ChatGPT [OpenAI]). The study focused on key performance metrics, particularly F1-scores, to evaluate algorithm effectiveness. RESULTS: The analysis was conducted on a data set comprising 23,724 notes with detailed demographic and clinical characteristics. The rule-based NLP algorithm demonstrated superior performance in most areas, particularly in detecting the "Right Side" location with an F1-score of 0.975, outperforming gradient boosting by 0.063. Gradient boosting excelled in "Lower Extremity" location detection (F1-score: 0.978), surpassing rule-based NLP by 0.023. It also showed notable performance in the "Passive Range of Motion" detection with an F1-score of 0.970, a 0.032 improvement over rule-based NLP. The rule-based algorithm efficiently handled "Duration," "Sets," and "Reps" with F1-scores up to 0.65. 
LLM-based NLP, particularly ChatGPT with few-shot prompts, achieved high recall but generally lower precision and F1-scores. However, it notably excelled in "Backward Plane" motion detection, achieving an F1-score of 0.846, surpassing the rule-based algorithm's 0.720. CONCLUSIONS: The study successfully developed and evaluated multiple NLP algorithms, revealing the strengths and weaknesses of each in extracting physical rehabilitation exercise information from clinical notes. The detailed ontology and the robust performance of the rule-based and gradient boosting algorithms demonstrate significant potential for enhancing precision rehabilitation. These findings contribute to the ongoing efforts to integrate advanced NLP techniques into health care, moving toward predictive models that can recommend personalized rehabilitation treatments for optimal patient outcomes.

13.
medRxiv ; 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38585746

ABSTRACT

Objective: Statistical and artificial intelligence algorithms are increasingly being developed for use in healthcare. These algorithms may reflect biases that magnify disparities in clinical care, and there is a growing need for understanding how algorithmic biases can be mitigated in pursuit of algorithmic fairness. Individual fairness in algorithms constrains algorithms to the notion that "similar individuals should be treated similarly." We conducted a scoping review on algorithmic individual fairness to understand the current state of research in the metrics and methods developed to achieve individual fairness and its applications in healthcare. Methods: We searched three databases, PubMed, ACM Digital Library, and IEEE Xplore, for algorithmic individual fairness metrics, algorithmic bias mitigation, and healthcare applications. Our search was restricted to articles published between January 2013 and September 2023. We identified 1,886 articles through database searches and manually identified one additional article; from these, we included 30 articles in the review. Data from the selected articles were extracted, and the findings were synthesized. Results: Based on the 30 articles in the review, we identified several themes, including philosophical underpinnings of fairness, individual fairness metrics, mitigation methods for achieving individual fairness, implications of achieving individual fairness on group fairness and vice versa, fairness metrics that combined individual fairness and group fairness, software for measuring and optimizing individual fairness, and applications of individual fairness in healthcare. Conclusion: While there has been significant work on algorithmic individual fairness in recent years, the definition, use, and study of individual fairness remain in their infancy, especially in healthcare. Future research is needed to apply and evaluate individual fairness in healthcare comprehensively.

14.
Stud Health Technol Inform ; 310: 274-278, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269808

ABSTRACT

Continuous intraoperative monitoring with electroencephalography (EEG) is commonly used to detect cerebral ischemia in high-risk surgical procedures such as carotid endarterectomy. Machine learning (ML) models that detect ischemia in real time can form the basis of automated intraoperative EEG monitoring. In this study, we describe and compare two time-series aware precision and recall metrics to the classical precision and recall metrics for evaluating the performance of ML models that detect ischemia. We trained six ML models to detect ischemia in intraoperative EEG and evaluated them with the area under the precision-recall curve (AUPRC) using time-series aware and classical approaches to compute precision and recall. The Support Vector Classification (SVC) model performed the best on the time-series aware metrics, while the Light Gradient Boosting Machine (LGBM) model performed the best on the classical metrics. Visual inspection of the probability outputs of the models alongside the actual ischemic periods revealed that the time-series aware AUPRC selected a model more likely to predict ischemia onset in a timely fashion than the model selected by classical AUPRC.


Subject(s)
Ischemia , Intraoperative Monitoring , Humans , Time Factors , Area Under the Curve , Electroencephalography
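
The abstract above contrasts classical (point-wise) precision and recall with time-series aware variants but does not define the two variants it used. The sketch below shows a deliberately simplified event-based variant, assuming that a contiguous run of positive labels counts as one event and that any overlap counts as a detection. It is not the study's metric; the function names and the toy sequences are hypothetical.

```python
from itertools import groupby


def runs(labels):
    """Return (start, end) pairs, end exclusive, for each contiguous run of 1s."""
    spans, index = [], 0
    for value, group in groupby(labels):
        length = len(list(group))
        if value == 1:
            spans.append((index, index + length))
        index += length
    return spans


def pointwise_precision_recall(y_true, y_pred):
    """Classical precision and recall, counting every time step separately."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def event_precision_recall(y_true, y_pred):
    """Event-based precision and recall: any overlap counts as a hit (assumption)."""
    true_events, pred_events = runs(y_true), runs(y_pred)

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    detected = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)
    correct = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)
    precision = correct / len(pred_events) if pred_events else 0.0
    recall = detected / len(true_events) if true_events else 0.0
    return precision, recall


# Hypothetical sequences: the model flags the onset of both ischemic periods
# but does not stay positive for their full duration.
y_true = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(pointwise_precision_recall(y_true, y_pred))
print(event_precision_recall(y_true, y_pred))
```

In this toy case the point-wise recall is one third while the event-based recall is 1.0, which illustrates how the two families of metrics can rank models differently when timely onset detection matters more than full coverage.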
15.
J Cardiothorac Vasc Anesth ; 38(2): 526-533, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37838509

ABSTRACT

OBJECTIVE: Postoperative delirium (POD) can occur in up to 50% of older patients undergoing cardiovascular surgery, resulting in hospitalization and significant morbidity and mortality. This study aimed to determine whether intraoperative neurophysiologic monitoring (IONM) modalities can be used to predict delirium in patients undergoing cardiovascular surgery. DESIGN: Adult patients undergoing cardiovascular surgery with IONM between 2019 and 2021 were reviewed retrospectively. Delirium was assessed multiple times using the Intensive Care Delirium Screening Checklist (ICDSC). Patients with an ICDSC score ≥4 were considered to have POD. Significant IONM changes were evaluated based on a visual review of electroencephalography (EEG) and somatosensory evoked potentials data and documentation of significant changes during surgery. SETTING: University of Pittsburgh Medical Center hospitals. PARTICIPANTS: Patients aged 18 years or older undergoing cardiovascular surgery with IONM. MEASUREMENTS AND MAIN RESULTS: Of the 578 patients undergoing cardiovascular surgery with IONM, 126 had POD (21.8%). Significant IONM changes were noted in 134 patients, of whom 49 had delirium (36.6%). In contrast, 444 patients had no IONM changes during surgery, of whom 77 (17.3%) had POD. Upon multivariate analysis, IONM changes were associated with POD (odds ratio 2.12; 95% CI 1.31-3.44; p < 0.001). Additionally, baseline EEG abnormalities were associated with POD (p = 0.002). CONCLUSION: Significant IONM changes are associated with an increased risk of POD in patients undergoing cardiovascular surgery. These findings offer a basis for future research and analysis of EEG and somatosensory evoked potential monitoring to predict, detect, and prevent POD.


Subject(s)
Emergence Delirium , Intraoperative Neurophysiological Monitoring , Adult , Humans , Adolescent , Retrospective Studies , Somatosensory Evoked Potentials/physiology , Intraoperative Neurophysiological Monitoring/methods , Electroencephalography , Postoperative Complications/diagnosis , Postoperative Complications/etiology , Postoperative Complications/prevention & control
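
The counts reported in the abstract above are enough to rebuild the 2x2 table and check the stated percentages. The sketch below does that and also computes an unadjusted odds ratio with a Woolf (log-based) 95% interval. The unadjusted figure is this sketch's own calculation, not a result reported by the study, and it is not expected to match the multivariate odds ratio of 2.12.

```python
import math

# Counts taken from the abstract: POD cases among patients with and without
# significant IONM changes.
pod_with_change, n_with_change = 49, 134
pod_without_change, n_without_change = 77, 444


def percent(cases, total):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * cases / total, 1)


overall = percent(pod_with_change + pod_without_change,
                  n_with_change + n_without_change)
with_change = percent(pod_with_change, n_with_change)
without_change = percent(pod_without_change, n_without_change)

# 2x2 table cells: a, b = POD yes/no with IONM change; c, d = POD yes/no without.
a, b = pod_with_change, n_with_change - pod_with_change
c, d = pod_without_change, n_without_change - pod_without_change
crude_or = (a * d) / (b * c)

# Woolf interval: normal approximation on the log odds ratio.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(crude_or) - 1.96 * se_log_or)
ci_high = math.exp(math.log(crude_or) + 1.96 * se_log_or)

print(f"POD overall: {overall}%, with change: {with_change}%, "
      f"without change: {without_change}%")
print(f"Unadjusted OR: {crude_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The three percentages reproduce the values in the abstract. The unadjusted odds ratio comes out near 2.75, higher than the adjusted estimate, which is the usual pattern when covariates in the multivariate model account for part of the association.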
16.
Am J Cardiol ; 213: 126-131, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38103769

ABSTRACT

Valvular heart diseases (VHDs) significantly impact morbidity and mortality rates worldwide. Early diagnosis improves patient outcomes. Artificial intelligence (AI) applied to electrocardiogram (ECG) interpretation presents a promising approach for early VHD detection. We conducted a meta-analysis on the efficacy of AI models in this context. We searched databases including PubMed, MEDLINE, Embase, Scopus, and Cochrane through August 20, 2023, focusing on AI for ECG-based VHD detection. The outcomes included pooled accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value. The pooled proportions were derived using a random-effects model with 95% confidence intervals (CIs). Study heterogeneity was evaluated with the I-squared statistic. Our analysis included 10 studies, involving ECG data from 713,537 patients. The AI algorithms mainly screened for aortic stenosis (n = 6), mitral regurgitation (n = 4), aortic regurgitation (n = 3), mitral stenosis (n = 1), mitral valve prolapse (n = 2), and tricuspid regurgitation (n = 1). A total of 9 studies used convolutional neural network models, whereas 1 study combined the strengths of support vector machine, logistic regression, and multilayer perceptron models for ECG interpretation. The collective AI models demonstrated a pooled accuracy of 81% (95% CI 73 to 89, I² = 92%), sensitivity of 83% (95% CI 77 to 88, I² = 86%), specificity of 72% (95% CI 68 to 75, I² = 52%), PPV of 13% (95% CI 7 to 19, I² = 90%), and negative predictive value of 99% (95% CI 97 to 99, I² = 50%). The subgroup analyses for aortic stenosis and mitral regurgitation detection yielded analogous outcomes. In conclusion, AI-driven ECG offers high accuracy in VHD screening. However, its low PPV indicates the need for a combined approach with clinical judgment, especially in primary care settings.


Subject(s)
Aortic Valve Stenosis , Heart Valve Diseases , Mitral Valve Insufficiency , Humans , Artificial Intelligence , Heart Valve Diseases/diagnosis , Aortic Valve Stenosis/diagnosis , Electrocardiography
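
The combination of a high negative predictive value and a low PPV in the abstract above is what Bayes' rule predicts when a condition is uncommon in the screened population. The sketch below plugs the pooled sensitivity (83%) and specificity (72%) into the standard formulas at several prevalence values. The prevalence values are assumptions chosen for illustration, and because each pooled estimate was derived separately, the output is indicative rather than a reconstruction of the meta-analysis.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Pooled estimates quoted in the abstract.
SENS, SPEC = 0.83, 0.72

# Hypothetical prevalence values, not taken from the included studies.
for prevalence in (0.01, 0.05, 0.20):
    print(f"prevalence {prevalence:.0%}: "
          f"PPV {ppv(SENS, SPEC, prevalence):.1%}, "
          f"NPV {npv(SENS, SPEC, prevalence):.1%}")
```

At an assumed prevalence of 5%, these inputs give a PPV of about 13.5% and an NPV above 98%, close to the pooled figures, which suggests that low prevalence alone could explain much of the low PPV.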
17.
medRxiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37790354

ABSTRACT

Clinical predictive models that include race as a predictor have the potential to exacerbate disparities in healthcare. Such models can be respecified to exclude race or optimized to reduce racial bias. We investigated the impact of such respecifications in UTICalc, a predictive model designed to reduce catheterizations in young children with suspected urinary tract infections. To reduce racial bias, race was removed from the UTICalc logistic regression model and replaced with two new features. We compared the two versions of UTICalc using fairness and predictive performance metrics to understand the effects on racial bias. In addition, we derived three new models for UTICalc to specifically improve racial fairness. Our results show that, as predicted by previously described impossibility results, fairness cannot be simultaneously improved on all fairness metrics, and model respecification may improve racial fairness but decrease overall predictive performance.
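
The abstract above does not list which fairness metrics were compared. As a rough illustration of why such metrics can pull in different directions, the sketch below computes four common per-group rates (selection rate, true positive rate, false positive rate, and PPV) on a tiny hypothetical dataset. The function name `group_rates`, the group labels, and the data are all assumptions and have no connection to UTICalc.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR, and PPV for binary predictions."""
    stats = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        fn = sum(1 for t, p in rows if t == 1 and p == 0)
        tn = sum(1 for t, p in rows if t == 0 and p == 0)
        stats[g] = {
            "selection_rate": (tp + fp) / len(rows),
            "tpr": tp / (tp + fn) if tp + fn else float("nan"),
            "fpr": fp / (fp + tn) if fp + tn else float("nan"),
            "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        }
    return stats


# Hypothetical data: two groups with different base rates of the outcome.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for group, rates in group_rates(y_true, y_pred, groups).items():
    print(group, rates)
```

Gaps between groups in selection rate correspond to demographic parity, gaps in TPR and FPR to equalized odds, and gaps in PPV to predictive parity. When base rates differ between groups, as they do here, closing one of these gaps generally widens another.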

18.
medRxiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37790390

ABSTRACT

Background: A scalable approach for the sharing and reuse of human-readable and computer-executable phenotype definitions can facilitate the reuse of electronic health records for cohort identification and research studies. Description: We developed a tool called Sharephe for the Informatics for Integrating Biology and the Bedside (i2b2) platform. Sharephe consists of a plugin for i2b2 and a cloud-based searchable repository of computable phenotypes, has the functionality to import to and export from the repository, and has the ability to link to supporting metadata. Discussion: The i2b2 platform enables researchers to create, evaluate, and implement phenotypes without knowing complex query languages. In an initial evaluation, two sites on the Evolve to Next-Gen ACT (ENACT) network used Sharephe to successfully create, share, and reuse phenotypes. Conclusion: The combination of a cloud-based computable repository and an i2b2 plugin for accessing the repository enables investigators to store and retrieve phenotypes from anywhere and at any time and to collaborate across sites in a research network.

19.
EClinicalMedicine ; 64: 102210, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37745021

ABSTRACT

Background: Characterizing Post-Acute Sequelae of COVID (SARS-CoV-2 Infection), or PASC, has been challenging due to the multitude of sub-phenotypes, temporal attributes, and definitions. Scalable characterization of PASC sub-phenotypes can enhance screening capacities, disease management, and treatment planning. Methods: We conducted a retrospective multi-centre observational cohort study, leveraging longitudinal electronic health record (EHR) data of 30,422 patients from three healthcare systems in the Consortium for the Clinical Characterization of COVID-19 by EHR (4CE). From the total cohort, we applied a deductive approach on 12,424 individuals with follow-up data and developed a distributed representation learning process for providing augmented definitions for PASC sub-phenotypes. Findings: Our framework characterized seven PASC sub-phenotypes. We estimated that on average 15.7% of the hospitalized COVID-19 patients were likely to suffer from at least one PASC symptom and 5.98%, on average, had multiple symptoms. Joint pain and dyspnea had the highest prevalence, with an average prevalence of 5.45% and 4.53%, respectively. Interpretation: We provided a scalable framework to every participating healthcare system for estimating PASC sub-phenotype prevalence and temporal attributes, thus developing a unified model that characterizes augmented sub-phenotypes across the different systems. Funding: Authors are supported by National Institute of Allergy and Infectious Diseases, National Institute on Aging, National Center for Advancing Translational Sciences, National Medical Research Council, National Institute of Neurological Disorders and Stroke, European Union, and National Institutes of Health.

20.
EClinicalMedicine ; 64: 102212, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37745025

ABSTRACT

Background: Multisystem inflammatory syndrome in children (MIS-C) is a severe complication of SARS-CoV-2 infection. It remains unclear how MIS-C phenotypes vary across SARS-CoV-2 variants. We aimed to investigate clinical characteristics and outcomes of MIS-C across SARS-CoV-2 eras. Methods: We performed a multicentre observational retrospective study including seven paediatric hospitals in four countries (France, Spain, U.K., and U.S.). All consecutive confirmed patients with MIS-C hospitalised between February 1st, 2020, and May 31st, 2022, were included. Electronic Health Records (EHR) data were used to calculate pooled risk differences (RD) and effect sizes (ES) at site level, using Alpha as reference. Meta-analysis was used to pool data across sites. Findings: Of 598 patients with MIS-C (61% male, 39% female; mean age 9.7 years [SD 4.5]), 383 (64%) were admitted in the Alpha era, 111 (19%) in the Delta era, and 104 (17%) in the Omicron era. Compared with patients admitted in the Alpha era, those admitted in the Delta era were younger (ES -1.18 years [95% CI -2.05, -0.32]), had fewer respiratory symptoms (RD -0.15 [95% CI -0.33, -0.04]), less frequent non-cardiogenic shock or systemic inflammatory response syndrome (SIRS) (RD -0.35 [95% CI -0.64, -0.07]), lower lymphocyte count (ES -0.16 × 10⁹/uL [95% CI -0.30, -0.01]), lower C-reactive protein (ES -28.5 mg/L [95% CI -46.3, -10.7]), and lower troponin (ES -0.14 ng/mL [95% CI -0.26, -0.03]). Patients admitted in the Omicron versus Alpha eras were younger (ES -1.6 years [95% CI -2.5, -0.8]), had less frequent SIRS (RD -0.18 [95% CI -0.30, -0.05]), lower lymphocyte count (ES -0.39 × 10⁹/uL [95% CI -0.52, -0.25]), lower troponin (ES -0.16 ng/mL [95% CI -0.30, -0.01]) and less frequently received anticoagulation therapy (RD -0.19 [95% CI -0.37, -0.04]). Length of hospitalization was shorter in the Delta versus Alpha eras (-1.3 days [95% CI -2.3, -0.4]). Interpretation: Our study suggested that MIS-C clinical phenotypes varied across SARS-CoV-2 eras, with patients in Delta and Omicron eras being younger and less sick. EHR data can be effectively leveraged to identify rare complications of pandemic diseases and their variation over time. Funding: None.
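
The abstract above says that site-level risk differences and effect sizes were pooled by meta-analysis but does not name the pooling model. The sketch below shows one common choice, DerSimonian-Laird random-effects pooling with inverse-variance weights, as an assumption rather than the study's method. The function name and the site-level inputs are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Cochran's Q measures between-site spread around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each site by its within-site plus between-site variance.
    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    se = math.sqrt(1.0 / sum(star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical site-level risk differences and their variances.
site_rd = [0.0, -0.2, -0.4]
site_var = [0.01, 0.01, 0.01]
result = dersimonian_laird(site_rd, site_var)
print(result)
```

With these invented inputs the sites disagree substantially, so the between-site variance is positive and the pooled interval is wider than a fixed-effect analysis would give.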
