Results 1 - 20 of 71
1.
Article in English | MEDLINE | ID: mdl-38926131

ABSTRACT

OBJECTIVES: Heart failure (HF) impacts millions of patients worldwide, yet the variability in treatment responses remains a major challenge for healthcare professionals. The current treatment strategies, largely derived from population-based evidence, often fail to consider the unique characteristics of individual patients, resulting in suboptimal outcomes. This study aims to develop computational models that are patient-specific in predicting treatment outcomes, by utilizing a large Electronic Health Records (EHR) database. The goal is to improve drug response predictions by identifying specific HF patient subgroups that are likely to benefit from existing HF medications. MATERIALS AND METHODS: A novel graph-based model capable of predicting treatment responses, combining a Graph Neural Network and a Transformer, was developed. This method differs from conventional approaches by transforming a patient's EHR data into a graph structure. By defining patient subgroups based on this representation via K-Means Clustering, we were able to enhance the performance of drug response predictions. RESULTS: Leveraging EHR data from 11,627 Mayo Clinic HF patients, our model significantly outperformed traditional models in predicting drug response using NT-proBNP as an HF biomarker across five medication categories (best RMSE of 0.0043). Four distinct patient subgroups were identified with differential characteristics and outcomes, demonstrating superior predictive capabilities over existing HF subtypes (best mean RMSE of 0.0032). DISCUSSION: These results highlight the power of graph-based modeling of EHR in improving HF treatment strategies. The stratification of patients sheds light on particular patient segments that could benefit more significantly from tailored response predictions. CONCLUSIONS: Longitudinal EHR data have the potential to enhance personalized prognostic predictions through the application of graph-based AI techniques.
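
The abstract above reports model quality as root-mean-square error (RMSE). As a reminder of what that metric measures, here is a minimal sketch of the standard RMSE formula. The function name and the sample values are illustrative assumptions; the abstract does not state how NT-proBNP values were scaled or normalized, so the numbers below are not comparable to the reported figures.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed vs. predicted values (not taken from the study).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error = rmse(observed, predicted)
```

Because the residuals are squared before averaging, a single large miss (the third value here) dominates the score, which is one reason RMSE is usually read alongside the scale of the target variable.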

2.
JMIR Med Inform ; 12: e50437, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38941140

ABSTRACT

Integrating machine learning (ML) models into clinical practice presents a challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of model monitoring solutions. This work details the development and use of a platform for monitoring the performance of a production-level ML model operating in Mayo Clinic. In this paper, we aimed to provide a series of considerations and guidelines necessary for integrating such a platform into a team's technical infrastructure and workflow. We have documented our experiences with this integration process, discussed the broader challenges encountered with real-world implementation and maintenance, and included the source code for the platform. Our monitoring platform was built as an R shiny application, developed and implemented over the course of 6 months. The platform has been used and maintained for 2 years and is still in use as of July 2023. The considerations necessary for the implementation of the monitoring platform center around 4 pillars: feasibility (what resources can be used for platform development?); design (through what statistics or models will the model be monitored, and how will these results be efficiently displayed to the end user?); implementation (how will this platform be built, and where will it exist within the IT ecosystem?); and policy (based on monitoring feedback, when and what actions will be taken to fix problems, and how will these problems be translated to clinical staff?). While much of the literature surrounding ML performance monitoring emphasizes methodological approaches for capturing changes in performance, there remains a battery of other challenges and considerations that must be addressed for successful real-world implementation.

3.
medRxiv ; 2024 May 22.
Article in English | MEDLINE | ID: mdl-38826441

ABSTRACT

The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried within clinical narrative text in electronic health records (EHRs), necessitating natural language processing (NLP) methods to automatically extract these details. Most current NLP efforts for SDoH extraction have been limited, investigating limited types of SDoH elements, deriving data from a single institution, and focusing on specific patient cohorts or note types, with reduced focus on generalizability. This study aims to address these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and developing and evaluating the generalizability of classification models, including novel large language models (LLMs), for detecting SDoH factors from diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only and level 2 with SDoH factors along with associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction-tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was noted in SDoH documentation practices and label distributions based on patient cohorts, note types, and hospitals. The LLM achieved top performance with micro-averaged F1 scores over 0.9 on level 1 annotated corpora and an F1 over 0.84 on level 2 annotated corpora. 
While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. To foster collaboration, access to partial annotated corpora and models trained by merging all annotated datasets will be made available on the PhysioNet repository.
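
The abstract above reports micro-averaged F1 across multiple SDoH labels. The sketch below shows how micro-averaging (pooling counts across labels) differs from macro-averaging (averaging per-label scores). The label names and counts are hypothetical and chosen only to make the contrast visible; they are not taken from the study.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_label_counts):
    """per_label_counts maps label -> (tp, fp, fn). Returns (micro_f1, macro_f1)."""
    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in per_label_counts.values())
    fp = sum(c[1] for c in per_label_counts.values())
    fn = sum(c[2] for c in per_label_counts.values())
    micro = f1_from_counts(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    per_label = [f1_from_counts(*c) for c in per_label_counts.values()]
    macro = sum(per_label) / len(per_label)
    return micro, macro


# Hypothetical counts: one frequent, well-detected label and one rare, poorly detected label.
counts = {"housing": (90, 5, 5), "employment": (5, 5, 5)}
micro_f1, macro_f1 = micro_macro_f1(counts)
```

With these made-up counts the micro average is about 0.90 while the macro average is about 0.72, which illustrates why a micro-averaged score can look strong even when rare labels are detected poorly.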

4.
Article in English | MEDLINE | ID: mdl-38934289

ABSTRACT

OBJECTIVES: The surge in patient portal messages (PPMs) with increasing needs and workloads for efficient PPM triage in healthcare settings has spurred the exploration of AI-driven solutions to streamline the healthcare workflow processes, ensuring timely responses to patients to satisfy their healthcare needs. However, there has been less focus on isolating and understanding patient primary concerns in PPMs, a practice which holds the potential to yield more nuanced insights and enhance the quality of healthcare delivery and patient-centered care. MATERIALS AND METHODS: We propose a fusion framework to leverage pretrained language models (LMs) with different language advantages via a Convolutional Neural Network for precise identification of patient primary concerns via multi-class classification. We examined 3 traditional machine learning models, 9 BERT-based language models, 6 fusion models, and 2 ensemble models. RESULTS: The outcomes of our experimentation underscore the superior performance achieved by BERT-based models in comparison to traditional machine learning models. Remarkably, our fusion model emerges as the top-performing solution, delivering a notably improved accuracy score of 77.67 ± 2.74% and an F1 score of 74.37 ± 3.70% in macro-average. DISCUSSION: This study highlights the feasibility and effectiveness of multi-class classification for patient primary concern detection and the proposed fusion framework for enhancing primary concern detection. CONCLUSIONS: The use of multi-class classification enhanced by a fusion of multiple pretrained LMs not only improves the accuracy and efficiency of patient primary concern identification in PPMs but also aids in managing the rising volume of PPMs in healthcare, ensuring critical patient communications are addressed promptly and accurately.

5.
J Am Med Inform Assoc ; 31(7): 1493-1502, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38742455

ABSTRACT

BACKGROUND: Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES: This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS: We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator. RESULTS: The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. 
CONCLUSION: The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.


Subjects
Electronic Health Records, Natural Language Processing, Electronic Health Records/classification, Humans, Classification/methods, Medical Errors/classification
6.
medRxiv ; 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38712199

ABSTRACT

Background: Postoperative ileus (POI) after colorectal surgery leads to increased morbidity, costs, and hospital stays. Identifying POI risk for early intervention is important for improving surgical outcomes, especially given the increasing trend towards early discharge after surgery. While existing studies have assessed POI risk with regression models, the role of deep learning remains unexplored. Methods: We assessed the performance and transferability (brute-force/instance/parameter transfer) of Gated Recurrent Unit with Decay (GRU-D), a longitudinal deep learning architecture, for real-time risk assessment of POI among 7,349 colorectal surgeries performed across three hospital sites operated by Mayo Clinic with two electronic health records (EHR) systems. The results were compared with atemporal models on a panel of benchmark metrics. Results: GRU-D exhibits robust transferability across different EHR systems and hospital sites, showing enhanced performance by integrating new measurements, even amid the extreme sparsity of real-world longitudinal data. On average, for labs, vitals, and assisted living status, 72.2%, 26.9%, and 49.3% respectively lack measurements within 24 hours after surgery. Over the follow-up period with 4-hour intervals, 98.7%, 84%, and 95.8% of data points are missing, respectively. A maximum of 5% decrease in AUROC was observed in brute-force transfer between different EHR systems with non-overlapping surgery date frames. Multi-source instance transfer witnessed the best performance, with a maximum of 2.6% improvement in AUROC over local learning. The significant benefit, however, lies in the reduction of variance (a maximum of 86% decrease). The GRU-D model's performance mainly depends on the prediction task's difficulty, especially the case prevalence rate, whereas the impact of training data and transfer strategy is less crucial, underscoring the challenge of effectively leveraging transfer learning for rare outcomes. 
While atemporal Logit models show notably superior performance at certain pre-surgical points, their performance fluctuates significantly and they generally underperform GRU-D in post-surgical hours. Conclusion: GRU-D demonstrated robust transferability across EHR systems and hospital sites with highly sparse real-world EHR data. Further research on built-in explainability for meaningful intervention would be highly valuable for its integration into clinical practice.
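
The abstract above relies on GRU-D to cope with very sparse measurements. As background, the sketch below illustrates the input-decay idea from the published GRU-D architecture: when a value is missing, the last observation is decayed toward the feature's empirical mean as the time since that observation grows. This is a scalar illustration only. In the real model the decay parameters are learned jointly with the network and a similar decay is applied to the hidden state; the parameter names `w` and `b` and all sample values here are assumptions, not details from this study.

```python
import math


def decay_factor(delta, w, b):
    """gamma = exp(-max(0, w * delta + b)); the result lies in (0, 1]."""
    return math.exp(-max(0.0, w * delta + b))


def decayed_input(x, observed, x_last, x_mean, delta, w, b):
    """Return x when observed; otherwise blend the last value with the mean.

    delta is the time elapsed since the feature was last observed.
    """
    if observed:
        return x
    gamma = decay_factor(delta, w, b)
    # gamma near 1 trusts the last observation; gamma near 0 falls back to the mean.
    return gamma * x_last + (1.0 - gamma) * x_mean


# Hypothetical lab value: last observed 10.0, cohort mean 4.0, decay weight 0.5.
recent_gap = decayed_input(None, False, 10.0, 4.0, delta=0.0, w=0.5, b=0.0)
long_gap = decayed_input(None, False, 10.0, 4.0, delta=100.0, w=0.5, b=0.0)
```

With no elapsed time the imputed value equals the last observation, and after a long gap it converges to the mean, which is the behavior that lets the model use irregularly sampled data without a separate imputation step.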

7.
J Phys Ther Educ ; 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38640081

ABSTRACT

INTRODUCTION: Letters of recommendation (LOR) are an integral component of physical therapy residency applications. Identifying the influence of applicant and writer gender in LOR will help identify whether potential implicit gender bias exists in physical therapy residency application processes. REVIEW OF LITERATURE: Several medical and surgical residency education programs have reported positive, neutral, or negative LOR female gender bias among applicants and writers. Little research exists on gender differences in LOR to physical therapy education programs or physical therapy residency programs. SUBJECTS: Seven hundred sixty-eight LOR were analyzed from 256 applications to 3 physical therapy residency programs (neurologic, orthopaedic, sports) at one institution from 2014 to 2020. METHODS: Thematic categories were developed to identify themes in a sample of LOR. Associations between writer and applicant gender were analyzed using summary statistics, word counts, thematic and psycholinguistic extraction, and rule-based and deep learning Natural Language Processing. RESULTS: No significant differences in LOR word counts were found based on writer or applicant gender. Increased word counts were seen in sports residency LOR compared with the orthopaedic residency. Thematic analysis showed LOR gender differences with male applicants receiving more positive generalized recommendations and female applicants receiving more comments regarding interpersonal relationship skills. No thematic or psycholinguistic gender differences were seen by LOR writer. Male applicants were 1.9 times more likely to select all male LOR writers, whereas female applicants were 2.1 times more likely to choose all female LOR writers. 
DISCUSSION AND CONCLUSION: Gender differences in LORs for physical therapy residencies were found using a comprehensive Natural Language Processing approach that identified both a positive recommendation male applicant gender bias and a positive interpersonal relationship skill female applicant gender bias. Applicants were neither harmed nor helped by selecting LOR writers of the opposite gender. Admissions committees and LOR writers should be mindful of potential implicit gender biases in LOR submitted to physical therapy residency programs.

8.
J Biomed Inform ; 152: 104623, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across four healthcare sites. 
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using a FedFSA framework performed higher than non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.


Subjects
Activities of Daily Living, Functional Status, Humans, Aged, Learning, Information Storage and Retrieval, Natural Language Processing
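
The entry above describes a federated setup in which sites keep their data local and only model updates are shared. The abstract does not state which aggregation strategy was used, so the sketch below shows one common rule, federated averaging, purely as an illustration: each client's weights are averaged in proportion to its number of training examples. The function name, the flat weight lists, and the client sizes are all hypothetical.

```python
def federated_average(client_updates):
    """Example-weighted average of client weight vectors.

    client_updates is a list of (num_examples, weights) pairs, where weights
    is a flat list of floats with the same length for every client.
    """
    total = sum(n for n, _ in client_updates)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, weights in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send weights of the same length")
        # Each client contributes in proportion to its share of the examples.
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical sites: the larger site pulls the average toward its weights.
global_weights = federated_average([(100, [1.0, 0.0]), (300, [0.0, 1.0])])
```

Only the weight vectors and example counts cross site boundaries in this scheme, which is what makes it attractive when clinical notes cannot leave the institution that holds them.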
9.
Aging Dis ; 2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38421836

ABSTRACT

Covert cerebrovascular disease (CCD) is frequently reported on neuroimaging and associates with increased dementia and stroke risk. We aimed to determine how incidentally-discovered CCD during clinical neuroimaging in a large population associates with mortality. We screened CT and MRI reports of adults aged ≥50 in the Kaiser Permanente Southern California health system who underwent neuroimaging for a non-stroke clinical indication from 2009-2019. Natural language processing identified incidental covert brain infarcts (CBI) and/or white matter hyperintensities (WMH), grading WMH as mild/moderate/severe. Models adjusted for age, sex, ethnicity, multimorbidity, vascular risks, depression, exercise, and imaging modality. Of n=241,028, the mean age was 64.9 (SD=10.4); mean follow-up 4.46 years; 178,554 (74.1%) had CT; 62,474 (25.9%) had MRI; 11,328 (4.7%) had CBI; and 69,927 (29.0%) had WMH. The mortality rate per 1,000 person-years with CBI was 59.0 (95%CI 57.0-61.1); with WMH=46.5 (45.7-47.2); with neither=17.4 (17.1-17.7). In adjusted models, mortality risk associated with CBI was modified by age, e.g. HR 1.34 [1.21-1.48] at age 56.1 years vs HR 1.22 [1.17-1.28] at age 72 years. Mortality associated with WMH was modified by both age and imaging modality e.g., WMH on MRI at age 56.1 HR = 1.26 [1.18-1.35]; WMH on MRI at age 72 HR 1.15 [1.09-1.21]; WMH on CT at age 56.1 HR 1.41 [1.33-1.50]; WMH on CT at age 72 HR 1.28 [1.24-1.32], vs. patients without CBI or without WMH, respectively. Increasing WMH severity associated with higher mortality, e.g. mild WMH on MRI had adjusted HR=1.13 [1.06-1.20] while severe WMH on CT had HR=1.45 [1.33-1.59]. Incidentally-detected CBI and WMH on population-based clinical neuroimaging can predict higher mortality rates. We need treatments and healthcare planning for individuals with CCD.

10.
Stud Health Technol Inform ; 310: 850-854, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269929

ABSTRACT

With an increasing number of people living with dementia, the problem of late diagnosis significantly impacts a person's quality of life, while early signs of dementia may provide useful insights to facilitate better treatment plans. With time, this progressive neurodegenerative syndrome could progress from mild cognitive impairment to dementia. A pattern of health conditions can be characterized in an unsupervised manner to help predict this progress. As a significant extension to our previous work with a streaming clustering model, we consider additional information for predicting dementia onset. With empirical observations, we discover the importance of examining sex and age to predict dementia onset. To this end, we propose a sex-specific model with an age constraint for predicting dementia onset and validate the effectiveness of our models using data from the Mayo Clinic Study of Aging (MCSA). The proposed sex-specific models for older adult populations (≥65 years of age) outperformed the previous models with F-scores of 77% and 78% for male-specific and female-specific models, respectively. Our experiments of sex-specific temporal clustering of features in older adults demonstrate the potential of more personalized models for early alerts of dementia.


Subjects
Cognitive Dysfunction, Dementia, Humans, Female, Male, Aged, Quality of Life, Aging, Cluster Analysis, Cognitive Dysfunction/diagnosis, Dementia/diagnosis
11.
World Neurosurg ; 183: e243-e249, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38103686

ABSTRACT

BACKGROUND: Many predictive models for estimating clinical outcomes after spine surgery have been reported in the literature. However, implementation of predictive scores in practice is limited by the time-intensive nature of manually abstracting relevant predictors. In this study, we designed natural language processing (NLP) algorithms to automate data abstraction for the thoracolumbar injury classification score (TLICS). METHODS: We retrieved the radiology reports of all Mayo Clinic patients with an International Classification of Diseases, 9th or 10th revision, code corresponding to a fracture of the thoracolumbar spine between January 2005 and October 2020. Annotated data were used to train an N-gram NLP model using machine learning methods, including random forest, stepwise linear discriminant analysis, k-nearest neighbors, and penalized logistic regression models. RESULTS: A total of 1085 spine radiology reports were included in our analysis. Our dataset included 483 compression, 401 burst, 103 translational/rotational, and 98 distraction fractures. A total of 103 reports had documented an injury of the posterior ligamentous complex. The overall accuracy of the random forest model for fracture morphology feature detection was 76.96% versus 65.90% in the stepwise linear discriminant analysis, 50.69% in the k-nearest neighbors, and 62.67% in the penalized logistic regression. The overall accuracy to detect posterior ligamentous complex integrity was highest in the random forest model at 83.41%. Our random forest model was implemented in the backend of a web application in which users can dictate reports and have TLICS features automatically extracted. CONCLUSIONS: We have developed a machine learning NLP model for extracting TLICS features from radiology reports, which we deployed in a web application that can be integrated into clinical practice.


Subjects
Bone Fractures, Radiology, Humans, Natural Language Processing, Voice Recognition, Lumbar Vertebrae/diagnostic imaging, Lumbar Vertebrae/injuries, Thoracic Vertebrae/diagnostic imaging, Thoracic Vertebrae/injuries
12.
Cerebrovasc Dis ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37935160

ABSTRACT

BACKGROUND: Covert cerebrovascular disease (CCD) includes white matter disease (WMD) and covert brain infarction (CBI). Incidentally-discovered CCD is associated with increased risk of subsequent symptomatic stroke. However, it is unknown whether the severity of WMD or the location of CBI predicts risk. OBJECTIVES: To examine the association of incidentally-discovered WMD severity and CBI location with risk of subsequent symptomatic stroke. METHOD: This retrospective cohort study includes patients ≥50 years old in the Kaiser Permanente Southern California health system who received neuroimaging for a non-stroke indication between 2009 and 2019. Incidental CBI and WMD were identified via natural language processing of the neuroimage report, and WMD severity was classified into grades. RESULTS: 261,960 patients received neuroimaging; 78,555 (30.0%) were identified to have incidental WMD, and 12,857 (4.9%) to have incidental CBI. Increasing WMD severity is associated with increased incidence rate of future stroke. However, the stroke incidence rate in CT-identified WMD is higher at each level of severity compared to rates in MRI-identified WMD. Patients with mild WMD via CT have a stroke incidence rate of 24.9 per 1,000 person-years, similar to that of patients with severe WMD via MRI. Among incidentally-discovered CBI patients with a determined CBI location, 97.9% are subcortical rather than cortical infarcts. CBI confers a similar risk of future stroke, whether cortical or subcortical, or whether MRI- or CT-detected. CONCLUSIONS: Increasing severity of incidental WMD is associated with an increased risk of future symptomatic stroke, dependent on the imaging modality. Subcortical and cortical CBI conferred similar risks.

13.
J Clin Transl Sci ; 7(1): e187, 2023.
Article in English | MEDLINE | ID: mdl-37745932

ABSTRACT

Introduction: We tested the ability of our natural language processing (NLP) algorithm to identify delirium episodes in a large-scale study using real-world clinical notes. Methods: We used the Rochester Epidemiology Project to identify persons ≥ 65 years who were hospitalized between 2011 and 2017. We identified all persons with an International Classification of Diseases code for delirium within ±14 days of a hospitalization. We independently applied our NLP algorithm to all clinical notes for this same population. We calculated rates using number of delirium episodes as the numerator and number of hospitalizations as the denominator. Rates were estimated overall, by demographic characteristics, and by year of episode, and differences were tested using Poisson regression. Results: In total, 14,255 persons had 37,554 hospitalizations between 2011 and 2017. The code-based delirium rate was 3.02 per 100 hospitalizations (95% CI: 2.85, 3.20). The NLP-based rate was 7.36 per 100 (95% CI: 7.09, 7.64). Rates increased with age (both p < 0.0001). Code-based rates were higher in men compared to women (p = 0.03), but NLP-based rates were similar by sex (p = 0.89). Code-based rates were similar by race and ethnicity, but NLP-based rates were higher in the White population compared to the Black and Asian populations (p = 0.001). Both types of rates increased significantly over time (both p values < 0.001). Conclusions: The NLP algorithm identified more delirium episodes compared to the ICD code method. However, NLP may still underestimate delirium cases because of limitations in real-world clinical notes, including incomplete documentation, practice changes over time, and missing clinical notes in some time periods.
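
The abstract above expresses delirium frequency as episodes per 100 hospitalizations with a 95% confidence interval. The sketch below shows one simple way such a rate and interval can be computed, using a normal approximation to the Poisson distribution for the event count. The abstract does not say how its intervals were derived, so this method is an assumption, and the counts used are hypothetical rather than the study's.

```python
import math


def rate_per_100(events, denominator, z=1.96):
    """Return (rate, lower, upper) per 100 units of the denominator.

    Assumes the event count is Poisson distributed, so its standard error is
    sqrt(events); the interval is a normal approximation.
    """
    if denominator <= 0 or events < 0:
        raise ValueError("denominator must be positive and events non-negative")
    rate = events / denominator * 100.0
    half_width = z * math.sqrt(events) / denominator * 100.0
    return rate, rate - half_width, rate + half_width


# Hypothetical counts: 30 episodes observed over 1,000 hospitalizations.
rate, lower, upper = rate_per_100(30, 1000)
```

The interval narrows as the event count grows, so with tens of thousands of hospitalizations a study can report rates with intervals only a few tenths of a point wide.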

14.
J Alzheimers Dis ; 95(3): 931-940, 2023.
Article in English | MEDLINE | ID: mdl-37638438

ABSTRACT

BACKGROUND: Multiple algorithms with variable performance have been developed to identify dementia using combinations of billing codes and medication data that are widely available from electronic health records (EHR). If the characteristics of misclassified patients are clearly identified, modifying existing algorithms to improve performance may be possible. OBJECTIVE: To examine the performance of a code-based algorithm to identify dementia cases in the population-based Mayo Clinic Study of Aging (MCSA) where dementia diagnosis (i.e., reference standard) is actively assessed through routine follow-up and describe the characteristics of persons incorrectly categorized. METHODS: There were 5,316 participants (age at baseline (mean (SD)): 73.3 (9.68) years; 50.7% male) without dementia at baseline and available EHR data. ICD-9/10 codes and prescription medications for dementia were extracted between baseline and one year after an MCSA dementia diagnosis or last follow-up. Fisher's exact or Kruskal-Wallis tests were used to compare characteristics between groups. RESULTS: Algorithm sensitivity and specificity were 0.70 (95% CI: 0.67, 0.74) and 0.95 (95% CI: 0.95, 0.96). False positives (i.e., participants falsely diagnosed with dementia by the algorithm) were older, with higher Charlson comorbidity index, more likely to have mild cognitive impairment (MCI), and longer follow-up (versus true negatives). False negatives (versus true positives) were older, more likely to have MCI, or have more functional limitations. CONCLUSIONS: We observed a moderate-high performance of the code-based diagnosis method against the population-based MCSA reference standard dementia diagnosis. Older participants and those with MCI at baseline were more likely to be misclassified.


Subjects
Alzheimer Disease, Cognitive Aging, Cognitive Dysfunction, Dementia, Humans, Male, Female, Dementia/diagnosis, Dementia/epidemiology, Alzheimer Disease/diagnosis, Disease Progression, Cognitive Dysfunction/diagnosis, Cognitive Dysfunction/epidemiology
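
The entry above evaluates a code-based dementia algorithm by its sensitivity and specificity against a reference-standard diagnosis. The sketch below shows how those two figures follow from a confusion matrix. The counts are hypothetical and were picked only so that the outputs are easy to check by hand; the abstract does not report the underlying cell counts.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cases the algorithm flags.
    specificity = TN / (TN + FP): share of non-cases the algorithm clears.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true case and one true non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts: 100 reference-standard cases and 100 non-cases.
sens, spec = sensitivity_specificity(tp=70, fp=5, tn=95, fn=30)
```

False negatives lower sensitivity and false positives lower specificity, which is why the abstract describes the characteristics of each misclassified group separately.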
15.
J Am Med Inform Assoc ; 30(12): 2036-2040, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37555837

ABSTRACT

Despite recent methodology advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty in developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we reported on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as highlight the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.


Subjects
COVID-19, Natural Language Processing, Humans, Electronic Health Records, Algorithms
16.
NPJ Digit Med ; 6(1): 132, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37479735

ABSTRACT

Clinical phenotyping is often a foundational requirement for obtaining datasets necessary for the development of digital health applications. Traditionally done via manual abstraction, this task is often a bottleneck in development due to time and cost requirements, therefore raising significant interest in accomplishing this task via in-silico means. Nevertheless, current in-silico phenotyping development tends to be focused on a single phenotyping task resulting in a dearth of reusable tools supporting cross-task generalizable in-silico phenotyping. In addition, in-silico phenotyping remains largely inaccessible for a substantial portion of potentially interested users. Here, we highlight the barriers to the usage of in-silico phenotyping and potential solutions in the form of a framework of several desiderata as observed during our implementation of such tasks. In addition, we introduce an example implementation of said framework as a software application, with a focus on ease of adoption, cross-task reusability, and facilitating the clinical phenotyping algorithm development process.

17.
JMIR Med Inform ; 11: e48072, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-37368483

ABSTRACT

BACKGROUND: A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records, and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE: In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS: We exploited a transformer-based method to construct an FH lexical resource leveraging a corpus consisting of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS: The resulting lexicon contains 33,603 lexicon entries normalized to 6,408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system can improve the recall of FH information evaluated using the BioCreative/N2C2 FH challenge data set, with F1 scores that varied but remained comparable. CONCLUSIONS: The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
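
As a rough illustration of the lexicon-based normalization and rule-based extraction this abstract describes, the Python sketch below maps surface variants to a concept identifier and pairs each concept with family-member mentions in the same sentence. The lexicon entries, the placeholder identifiers (`CONCEPT_DIABETES`, `CONCEPT_BREAST_CANCER`), the family-member list, and the function name `extract_family_history` are all hypothetical; they are not taken from the published lexicon or its rule set.

```python
import re

# Hypothetical lexicon: surface variant -> placeholder concept identifier.
# A real resource would carry UMLS CUIs or SNOMED CT codes; these are made up.
LEXICON = {
    "diabetes": "CONCEPT_DIABETES",
    "diabetes mellitus": "CONCEPT_DIABETES",
    "breast cancer": "CONCEPT_BREAST_CANCER",
    "breast carcinoma": "CONCEPT_BREAST_CANCER",
}

# Hypothetical family-member vocabulary (illustrative only).
FAMILY_MEMBERS = {"mother", "father", "sister", "brother",
                  "grandmother", "grandfather"}


def extract_family_history(sentence):
    """Return (family_member, concept_id) pairs found in one sentence."""
    text = sentence.lower()
    members = [m for m in sorted(FAMILY_MEMBERS)
               if re.search(rf"\b{m}\b", text)]

    # Match longer variants first so "diabetes mellitus" wins over "diabetes",
    # and blank out each match so shorter variants do not match it again.
    concepts = []
    remaining = text
    for variant in sorted(LEXICON, key=len, reverse=True):
        pattern = rf"\b{re.escape(variant)}\b"
        if re.search(pattern, remaining):
            concepts.append(LEXICON[variant])
            remaining = re.sub(pattern, " ", remaining)

    # Naive relation rule (an assumption): pair every family member with
    # every distinct concept mentioned in the same sentence.
    unique_concepts = list(dict.fromkeys(concepts))
    return [(m, c) for m in members for c in unique_concepts]
```

Several variants normalizing to one identifier is what the "variants per concept" figure in the abstract refers to; the pairing rule here is deliberately simplistic and would over-generate on sentences that mention several relatives and conditions.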

18.
AMIA Jt Summits Transl Sci Proc ; 2023: 196-205, 2023.
Article in English | MEDLINE | ID: mdl-37350914

ABSTRACT

Gender stereotyping is the practice of assigning or ascribing specific characteristics, differences, or identities to a person solely based on their gender. Biased conceptions of gender can create barriers to equality and need to be proactively identified and addressed. In biomedical education, letters of recommendation (LORs) are considered an important source for evaluating candidates' past performance. Because LORs are subjective and have no standard formatting requirements for the writer, language bias can be introduced. Natural language processing (NLP) offers a promising solution to detect language bias in LORs through automatic extraction of sensitive language and identification of letters with strong biases. In our study, we developed, evaluated, and deployed four different NLP methods (sublanguage analysis, dictionary-based approach, rule-based approach, and deep learning approach) for the extraction of psycholinguistic and thematic characteristics in LORs from three different physical therapy residency programs (Neurologic, Orthopaedic, and Sport) at Mayo Clinic. The evaluation statistics suggest that both the MedTaggerIE model and the Bidirectional Encoder Representations from Transformers model achieved moderate-to-high performance across eight different thematic categories. Through the pilot demonstration study, we learned that male writers were more likely to use the words 'intelligence', 'exceptional', and 'pursue', and male applicants were more likely to have the words 'strength', 'interpersonal skills', 'conversations', and 'pursue' in their letters of recommendation. Thematic analysis suggested that male and female writers differ significantly in expressing doubt, motivation, and recommendation. Findings derived from the study need to be carefully interpreted based on the context of the study setting, residency programs, and data. A follow-up demonstration study is needed to further evaluate and interpret the findings.

19.
J Am Med Inform Assoc ; 30(9): 1465-1473, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37301740

ABSTRACT

OBJECTIVE: Social determinants of health (SDoH) play critical roles in health outcomes and well-being. Understanding the interplay of SDoH and health outcomes is critical to reducing healthcare inequalities and transforming a "sick care" system into a "health-promoting" system. To address the SDoH terminology gap and better embed relevant elements in advanced biomedical informatics, we propose an SDoH ontology (SDoHO), which represents fundamental SDoH factors and their relationships in a standardized and measurable way. MATERIAL AND METHODS: Drawing on the content of existing ontologies relevant to certain aspects of SDoH, we used a top-down approach to formally model classes, relationships, and constraints based on multiple SDoH-related resources. Expert review and coverage evaluation, using a bottom-up approach employing clinical notes data and a national survey, were performed. RESULTS: We constructed the SDoHO with 708 classes, 106 object properties, and 20 data properties, with 1,561 logical axioms and 976 declaration axioms in the current version. Three experts achieved 0.967 agreement in the semantic evaluation of the ontology. A comparison between the coverage of the ontology and SDoH concepts in 2 sets of clinical notes and a national survey instrument also showed satisfactory results. DISCUSSION: SDoHO could potentially play an essential role in providing a foundation for a comprehensive understanding of the associations between SDoH and health outcomes and paving the way for health equity across populations. CONCLUSION: SDoHO has well-designed hierarchies, practical objective properties, and versatile functionalities, and the comprehensive semantic and coverage evaluation achieved promising performance compared to the existing ontologies relevant to SDoH.


Subjects
Health Equity , Social Determinants of Health , Humans , Semantics , Healthcare Disparities
20.
PLoS One ; 18(3): e0283800, 2023.
Article in English | MEDLINE | ID: mdl-37000801

ABSTRACT

BACKGROUND: The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms depends heavily on their associated context, especially the corresponding clinical section information. However, the heterogeneity across different Electronic Health Record (EHR) systems poses challenges in utilizing the section information. OBJECTIVES: Leveraging the eMERGE heart failure (HF) phenotyping algorithm, we assessed the heterogeneity quantitatively by comparing the performance of machine learning (ML) classifiers that map clinical sections containing HF-relevant terms across different EHR systems to standard sections in the Health Level 7 (HL7) Clinical Document Architecture (CDA). METHODS: We experimented with both random forest models with sentence-embedding features and bidirectional encoder representations from transformers models. We trained the ML models using an automatically labeled corpus from an EHR system that adopted the HL7 CDA standard. We assessed the performance using a blind test set (n = 300) from the same EHR system and a gold standard (n = 900) manually annotated from three other EHR systems. RESULTS: The F-measure of the ML models varied widely (0.00-0.91), indicating that ML models with a single set of tuning parameters were insufficient to capture sections across different EHR systems. The error analysis indicates that sections do not always comply with the corresponding standardized sections, leading to low performance. CONCLUSIONS: We presented the potential use of ML techniques to map the sections containing HF-relevant terms in multiple EHR systems to standard sections. However, the findings suggest that the quality and heterogeneity of section structure across different EHRs affect applications due to the poor adoption of documentation standards.


Subjects
Electronic Health Records , Heart Failure , Humans , Software , Algorithms , Machine Learning
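
For readers less familiar with the F-measure reported in the abstract above, the following Python sketch computes it for a single section label as the harmonic mean of precision and recall. The function name `f_measure` and the label values in the usage note are hypothetical and are not drawn from the study; the sketch only shows how the metric behaves, including how it drops to 0.00 when a section is never predicted correctly.

```python
def f_measure(gold, predicted, label):
    """F1 for one label: harmonic mean of precision and recall.

    `gold` and `predicted` are equal-length sequences of section labels.
    Returns 0.0 when the label is never predicted correctly.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with gold labels `["A", "A", "B", "B"]` and predictions `["A", "B", "B", "B"]`, label `"A"` has precision 1.0 and recall 0.5, while a label that never appears among the correct predictions scores 0.0.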
...