Results 1 - 20 of 50
1.
medRxiv ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39228725

ABSTRACT

Background: The Observational Medical Outcomes Partnership (OMOP) common data model (CDM), developed and maintained by the Observational Health Data Sciences and Informatics (OHDSI) community, supports large-scale cancer research by enabling distributed network analysis. As the number of studies using the OMOP CDM for cancer research increases, there is a growing need for an overview of the scope of cancer research that relies on the OMOP CDM ecosystem. Objectives: In this study, we present a comprehensive review of the adoption of the OMOP CDM for cancer research and offer insights into opportunities for leveraging the OMOP CDM ecosystem to advance cancer research. Materials and Methods: Published literature databases were searched to retrieve OMOP CDM and cancer-related English-language articles published between January 2010 and December 2023. A charting form was developed around two main themes in the cancer domain: clinically focused data analysis studies and infrastructure development studies. Results: In total, 50 unique articles were included: 30 in the data analysis theme and 23 in the infrastructure theme, with 3 articles belonging to both. The topics covered by the existing body of research are summarized. Conclusion: By depicting the current state of research efforts to improve or leverage the OMOP CDM ecosystem for cancer research, we identify challenges and opportunities surrounding data analysis and infrastructure, including data quality, adoption of advanced analytics methodology, inclusion of in-depth phenotypic data through NLP, and multisite evaluation.

2.
JMIR Med Inform ; 12: e49997, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39250782

ABSTRACT

BACKGROUND: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of NLP approaches exist, current algorithm development practices have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is currently the case for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). OBJECTIVE: This study aims to highlight the limitations of existing NLP algorithm development approaches that are exacerbated by tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. METHODS: We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to automatically generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure; 5 of these sites were used to conduct a federated evaluation of the algorithm. RESULTS: An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found across the 5 evaluation sites. CONCLUSIONS: The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit, with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while preserving the fine-tuning capabilities of expert involvement.
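The dictionary-based weak-labeling step can be illustrated with a minimal sketch: an expanded lexicon of normalized strings is matched against note text to produce provisional annotations for later expert review. The lexicon entries and note text below are hypothetical, and the matching logic is a deliberate simplification of what a full pipeline such as the Open Health Natural Language Processing Toolkit performs.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical expanded lexicon: surface string -> normalized concept ID.
# In the study, such a lexicon was generated from a seed concept list plus UMLS synonyms.
LEXICON: Dict[str, str] = {
    "shortness of breath": "C0013404",
    "dyspnea": "C0013404",
    "loss of taste": "C2364111",
    "fatigue": "C0015672",
}

def weak_annotate(note_text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, concept_id) spans for every lexicon hit."""
    annotations = []
    lowered = note_text.lower()
    for phrase, concept_id in lexicon.items():
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            annotations.append((match.start(), match.end(),
                                note_text[match.start():match.end()], concept_id))
    return sorted(annotations)

if __name__ == "__main__":
    note = "Patient reports fatigue and shortness of breath since testing positive."
    for start, end, text, cui in weak_annotate(note, LEXICON):
        print(f"{start:>3}-{end:<3} {text!r} -> {cui}")
```

The weak annotations produced this way are then corrected by a human expert before fine-tuning, which is what keeps the expert workload small.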

3.
JMIR AI ; 3: e56932, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39106099

ABSTRACT

BACKGROUND: Despite their growing use in health care, pretrained language models (PLMs) often lack clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowledge graphs like UMLS (Unified Medical Language System), SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms), and HPO (Human Phenotype Ontology), while comprehensive, fail to effectively connect general biomedical knowledge with physician insights. There is an equally important need for a model that integrates diverse knowledge in a way that is both unified and compartmentalized. This approach not only addresses the heterogeneous nature of domain knowledge but also recognizes the unique data and knowledge repositories of individual health care institutions, necessitating careful and respectful management of proprietary information. OBJECTIVE: This study aimed to enhance the clinical relevance and interpretability of PLMs by integrating external knowledge in a manner that respects the diversity and proprietary nature of health care data. We hypothesize that domain knowledge, when captured and distributed as stand-alone modules, can be effectively reintegrated into PLMs to significantly improve their adaptability and utility in clinical settings. METHODS: We demonstrate that through adapters, small and lightweight neural networks that enable the integration of extra information without full model fine-tuning, we can inject diverse sources of external domain knowledge into language models and improve overall performance with an increased level of interpretability. As a practical application of this methodology, we introduce a novel task, structured as a case study, that aims to capture physician knowledge in assigning cardiovascular diagnoses from clinical narratives: we extract diagnosis-comment pairs from electronic health records (EHRs) and cast the problem as text classification. RESULTS: The study demonstrates that integrating domain knowledge into PLMs significantly improves their performance. While improvements with ClinicalBERT are more modest, likely due to its pretraining on clinical texts, BERT (bidirectional encoder representations from transformers) equipped with knowledge adapters surprisingly matches or exceeds ClinicalBERT on several metrics. This underscores the effectiveness of knowledge adapters and highlights their potential in settings with strict data privacy constraints. The approach also increases the interpretability of these models in a clinical context, enhancing our ability to identify and apply the most relevant domain knowledge for specific tasks, thereby optimizing model performance and tailoring it to specific clinical needs. CONCLUSIONS: This research provides a basis for creating health knowledge graphs infused with physician knowledge, marking a significant step forward for PLMs in health care. Notably, the model balances integrating knowledge both comprehensively and selectively, addressing the heterogeneous nature of medical knowledge and the privacy needs of health care institutions.
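A bottleneck adapter of the kind described above can be sketched in a few lines of PyTorch: a small down-projection/up-projection module with a residual connection that is inserted into a frozen transformer layer. The hidden size, bottleneck width, and placement below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Only the adapter weights are trained; the surrounding PLM stays frozen,
        # so one adapter can be trained per external knowledge source and swapped in.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
dummy = torch.randn(2, 16, 768)   # (batch, sequence, hidden)
print(adapter(dummy).shape)       # torch.Size([2, 16, 768])
```

Because the adapter is a self-contained module, it can be distributed separately from the base model, which is what makes the "unified but compartmentalized" knowledge integration possible.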

4.
J Am Med Inform Assoc ; 31(8): 1671-1681, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38926131

ABSTRACT

OBJECTIVES: Heart failure (HF) affects millions of patients worldwide, yet variability in treatment responses remains a major challenge for healthcare professionals. Current treatment strategies, largely derived from population-based evidence, often fail to consider the unique characteristics of individual patients, resulting in suboptimal outcomes. This study aims to develop patient-specific computational models for predicting treatment outcomes, utilizing a large electronic health record (EHR) database. The goal is to improve drug response predictions by identifying specific HF patient subgroups that are likely to benefit from existing HF medications. MATERIALS AND METHODS: A novel graph-based model capable of predicting treatment responses, combining a Graph Neural Network and a Transformer, was developed. This method differs from conventional approaches by transforming a patient's EHR data into a graph structure. By defining patient subgroups based on this representation via K-means clustering, we were able to enhance the performance of drug response predictions. RESULTS: Leveraging EHR data from 11,627 Mayo Clinic HF patients, our model significantly outperformed traditional models in predicting drug response using NT-proBNP as an HF biomarker across five medication categories (best RMSE of 0.0043). Four distinct patient subgroups with differential characteristics and outcomes were identified, demonstrating superior predictive capabilities over existing HF subtypes (best mean RMSE of 0.0032). DISCUSSION: These results highlight the power of graph-based modeling of EHR data for improving HF treatment strategies. The stratification of patients sheds light on particular patient segments that could benefit more from tailored response predictions. CONCLUSIONS: Longitudinal EHR data have the potential to enhance personalized prognostic predictions through the application of graph-based AI techniques.
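The subgroup-definition step, clustering patient-level graph representations with K-means, can be sketched as follows. The embeddings here are random stand-ins for the graph-derived patient vectors described in the abstract, and the choice of four clusters simply mirrors the four subgroups reported.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for patient embeddings produced by the graph/Transformer encoder
# (one 128-dimensional vector per patient).
patient_embeddings = rng.normal(size=(1000, 128))

# Partition patients into four subgroups, as in the reported analysis.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
subgroup = kmeans.fit_predict(patient_embeddings)

# Subgroup sizes can then be inspected and per-subgroup response models trained.
print(np.bincount(subgroup))
```

In the study design, each subgroup would then receive its own drug response prediction, which is where the reported per-subgroup RMSE improvements come from.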


Subjects
Electronic Health Records, Heart Failure, Neural Networks (Computer), Humans, Heart Failure/drug therapy, Male, Female, Aged, Treatment Outcome, Middle Aged, Natriuretic Peptide, Brain/blood, Cardiovascular Agents/therapeutic use
5.
J Am Med Inform Assoc ; 31(7): 1493-1502, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38742455

ABSTRACT

BACKGROUND: Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, of the contextual and linguistic factors contributing to their occurrence, and of the underlying causes, in order to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Because of the high heterogeneity of electronic health record (EHR) settings across institutions, challenges arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES: This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS: We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats through the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several open-source annotation tools, including MAE, Brat, and MedTator. RESULTS: The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties: model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerging from community feedback included the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. CONCLUSION: The proposed taxonomy can accelerate and standardize the error analysis process in multisite settings, thereby improving the provenance, interpretability, and portability of NLP models. Future researchers could explore developing automated or semi-automated methods to assist in the classification and standardization of error analysis.
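One way to make such a taxonomy machine-usable inside an annotation workflow is to represent each error instance as a small structured record carrying its class, dimension, and properties. The classes and values below are purely illustrative and are not the released taxonomy, which is distributed in .dtd and .owl form.

```python
from dataclasses import dataclass
from enum import Enum

class ModelType(Enum):
    SYMBOLIC = "symbolic"
    STATISTICAL_ML = "statistical machine learning"

class EvaluationLevel(Enum):
    PATIENT = "patient"
    DOCUMENT = "document"
    SENTENCE = "sentence"
    CONCEPT = "concept"

@dataclass
class ErrorAnnotation:
    """A single error instance tagged against the taxonomy during error analysis."""
    error_class: str              # hypothetical class name, e.g. "negation_missed"
    dimension: str                # one of the six error dimensions
    model_type: ModelType
    evaluation_level: EvaluationLevel
    example_text: str

err = ErrorAnnotation(
    error_class="negation_missed",
    dimension="linguistic",
    model_type=ModelType.SYMBOLIC,
    evaluation_level=EvaluationLevel.SENTENCE,
    example_text="No evidence of pneumonia.",
)
print(err)
```

Recording errors in a structured form like this is what makes cross-site aggregation and comparison of error distributions straightforward.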


Subjects
Electronic Health Records, Natural Language Processing, Electronic Health Records/classification, Humans, Classification/methods, Medical Errors/classification
6.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living (ADLs), including basic ADLs (bADL) and more complex instrumental activities (iADL). Existing studies have found that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free-text formats. This indicates a pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduce FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranged from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. Performance for ADL extraction with impairment ranged from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL. For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-to-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using the FedFSA framework outperformed non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrates the potential of a federated learning framework for functional status extraction and impairment classification in EHRs, exemplifying the importance of large-scale, multi-institutional collaborative development efforts.
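The federated component of a framework like FedFSA can be sketched with the core federated averaging (FedAvg) update: each site fine-tunes its local BERT copy on private data, and only model weights are averaged centrally. The sketch below shows unweighted averaging for generic PyTorch modules and deliberately omits the Flower client/server plumbing used in the actual implementation.

```python
from typing import List
import torch
import torch.nn as nn

def federated_average(site_models: List[nn.Module], global_model: nn.Module) -> nn.Module:
    """Average per-site parameters into the global model (one unweighted FedAvg step)."""
    site_states = [m.state_dict() for m in site_models]
    averaged = {}
    for name in site_states[0]:
        averaged[name] = torch.stack([s[name].float() for s in site_states]).mean(dim=0)
    global_model.load_state_dict(averaged)
    return global_model

# Toy example with four "sites" holding copies of the same small classifier.
sites = [nn.Linear(10, 2) for _ in range(4)]
global_model = federated_average(sites, nn.Linear(10, 2))
print(global_model.weight.shape)  # torch.Size([2, 10])
```

Only parameter tensors cross institutional boundaries in this scheme, which is what allows impairment classification models to benefit from all four sites' annotations without sharing notes.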


Subjects
Activities of Daily Living, Functional Status, Humans, Aged, Learning, Information Storage and Retrieval, Natural Language Processing
7.
J Am Med Inform Assoc ; 30(12): 2036-2040, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37555837

ABSTRACT

Despite recent methodological advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) signs and symptoms extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.


Subjects
COVID-19, Natural Language Processing, Humans, Electronic Health Records, Algorithms
8.
NPJ Digit Med ; 6(1): 132, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37479735

ABSTRACT

Clinical phenotyping is often a foundational requirement for obtaining the datasets needed to develop digital health applications. Traditionally done via manual abstraction, this task is often a development bottleneck due to its time and cost requirements, raising significant interest in accomplishing it via in silico means. Nevertheless, current in silico phenotyping development tends to focus on a single phenotyping task, resulting in a dearth of reusable tools that support cross-task, generalizable in silico phenotyping. In addition, in silico phenotyping remains largely inaccessible to a substantial portion of potentially interested users. Here, we highlight the barriers to the use of in silico phenotyping and potential solutions in the form of a framework of several desiderata observed during our implementation of such tasks. In addition, we introduce an example implementation of this framework as a software application, with a focus on ease of adoption, cross-task reusability, and facilitating the clinical phenotyping algorithm development process.

9.
JMIR Med Inform ; 11: e48072, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-37368483

ABSTRACT

BACKGROUND: A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method for capturing FH information in electronic health records, and a substantial portion of FH information is embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE: In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS: We exploited a transformer-based method to construct an FH lexical resource leveraging a corpus of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS: The resulting lexicon contains 33,603 entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system improved the recall of FH information evaluated on the BioCreative/N2C2 FH challenge data set, with F1 scores that varied but remained comparable. CONCLUSIONS: The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
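A rule-based FH extractor of the kind such a lexicon supports pairs a family-member mention with a nearby condition mention. The vocabulary, sentence-level co-occurrence rule, and codes below are a simplified, hypothetical subset for illustration; they are not the released lexicon or the OHNLP rules.

```python
import re
from itertools import product

FAMILY_MEMBERS = ["mother", "father", "sister", "brother", "grandmother", "grandfather"]
# Illustrative condition strings mapped to illustrative codes.
CONDITIONS = {"breast cancer": "254837009", "diabetes": "73211009", "hypertension": "38341003"}

def extract_family_history(sentence: str):
    """Return (family_member, condition, code) triples co-occurring in one sentence."""
    text = sentence.lower()
    members = [m for m in FAMILY_MEMBERS if re.search(rf"\b{m}\b", text)]
    found = [(c, code) for c, code in CONDITIONS.items() if c in text]
    return [(m, c, code) for m, (c, code) in product(members, found)]

print(extract_family_history("Her mother was diagnosed with breast cancer at age 52."))
# [('mother', 'breast cancer', '254837009')]
```

A production system additionally handles negation, relation attributes such as side of family and age of onset, and normalization to UMLS and SNOMED CT via the lexicon.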

10.
medRxiv ; 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36747787

ABSTRACT

Heart failure management is challenging because of the complex and heterogeneous nature of its pathophysiology, which makes conventional "one-size-fits-all" treatments unsuitable. Coupling longitudinal medical data with novel deep learning and network-based analytics can help identify distinct patient phenotypic characteristics and thereby individualize treatment regimens through accurate prediction of physiological response. In this study, we develop a graph representation learning framework that integrates the heterogeneous clinical events in the electronic health record (EHR) as graph-format data, in which patient-specific patterns and features are naturally infused for personalized prediction of lab test response. The framework includes a novel Graph Transformer Network, equipped with a self-attention mechanism, to model the underlying spatial interdependencies among clinical events characterizing the cardiac physiological interactions in heart failure treatment, and a graph neural network (GNN) layer to incorporate the explicit temporality of each clinical event, which helps summarize the therapeutic effects induced on physiological variables, and subsequently on the patient's health status, as the heart failure condition progresses over time. We introduce a global attention mask, computed from event co-occurrences and aggregated across all patient records, to enhance the guidance of neighbor selection in graph representation learning. We test the feasibility of our model through detailed quantitative and qualitative evaluations on observational EHR data.
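The global attention mask described above, derived from event co-occurrence counts aggregated across patient records, can be sketched as follows. The event vocabulary, toy records, and count threshold are hypothetical.

```python
import numpy as np

def cooccurrence_attention_mask(patient_event_lists, vocab, min_count: int = 2) -> np.ndarray:
    """Build a binary event-by-event mask: 1 where two events co-occur in at least
    `min_count` patient records, 0 elsewhere; used to guide neighbor selection."""
    index = {event: i for i, event in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=int)
    for events in patient_event_lists:
        ids = sorted({index[e] for e in events if e in index})
        for i in ids:
            for j in ids:
                counts[i, j] += 1
    return (counts >= min_count).astype(float)

vocab = ["furosemide", "NT-proBNP", "creatinine", "metoprolol"]
records = [["furosemide", "NT-proBNP"],
           ["furosemide", "NT-proBNP", "creatinine"],
           ["metoprolol"]]
print(cooccurrence_attention_mask(records, vocab))
```

In the framework, a mask of this kind restricts which clinical events a node may attend to, so rarely co-occurring event pairs do not dominate the learned representation.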

11.
JCO Clin Cancer Inform ; 6: e2200006, 2022 07.
Article in English | MEDLINE | ID: mdl-35917480

ABSTRACT

PURPOSE: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHR-based cancer research and patient care through the lens of the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and to review existing NLP methodologies for extracting those data elements. METHODS: Published literature databases were searched to retrieve cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data into four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were ultimately selected and included in our analysis. We found that, as expected, cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility are insufficient in current NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to wide adoption of cancer NLP were identified and discussed.


Subjects
Natural Language Processing, Neoplasms, Electronic Health Records, Humans, Information Storage and Retrieval, Neoplasms/diagnosis, Neoplasms/therapy, Patient Care
12.
AMIA Jt Summits Transl Sci Proc ; 2022: 196-205, 2022.
Article in English | MEDLINE | ID: mdl-35854735

ABSTRACT

Translation of predictive modeling algorithms into routine clinical care workflows faces challenges from varying data quality issues caused by the heterogeneity of electronic health record (EHR) systems. To better understand these issues, we retrospectively assessed and compared the variability of data produced from two different EHR systems. We considered three dimensions of data quality in the context of EHR-based predictive modeling, one for each of three translational stages: model development (data completeness), model deployment (data variability), and model implementation (data timeliness). The case study was conducted on predicting post-surgical complications using both structured and unstructured data. Our study found a consistent level of data completeness, high syntactic variability, and moderate-to-high semantic variability across the two EHR systems, for which data quality is context-specific and closely related to the documentation workflow and the functionality of individual EHR systems.
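The data-completeness dimension in an assessment like this is often operationalized as the fraction of non-missing values per variable. A minimal sketch with hypothetical column names follows; it is not the study's actual measurement code.

```python
import pandas as pd

# Hypothetical extract of structured EHR fields relevant to a post-surgical
# complication model, one row per surgical encounter.
df = pd.DataFrame({
    "age": [64, 71, None, 58],
    "bmi": [29.1, None, None, 33.4],
    "preop_hemoglobin": [13.2, 12.8, 11.9, None],
})

# Completeness per variable: share of encounters with a recorded value.
completeness = df.notna().mean().rename("completeness")
print(completeness)
```

Comparing such per-variable completeness profiles across the two EHR systems is one concrete way the model-development stage can be audited before training.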

13.
NPJ Digit Med ; 5(1): 77, 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35701544

ABSTRACT

Computational drug repurposing methods adapt artificial intelligence (AI) algorithms to the discovery of new applications of approved or investigational drugs. Among heterogeneous datasets, electronic health record (EHR) datasets provide rich longitudinal and pathophysiological data that facilitate the generation and validation of drug repurposing hypotheses. Here, we present an appraisal of recently published research on computational drug repurposing utilizing EHRs. Thirty-three research articles, retrieved from Embase, Medline, Scopus, and Web of Science and published between January 2000 and January 2022, were included in the final review. Four themes are presented: (1) publication venue, (2) data types and sources, (3) methods for data processing and prediction, and (4) targeted disease, validation, and released tools. The review summarizes the contribution of EHRs to drug repurposing and reveals that their utilization is hindered by issues of validation, accessibility, and understanding of EHR data. These findings can support researchers in the utilization of medical data resources and the development of computational methods for drug repurposing.

14.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy for drug-target predictive models. Simple random shuffling in cross-validation, however, is not always ideal for handling large, diverse, and copious datasets, as it can introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance across a variety of use cases (e.g., permutations of different levels of connectivity and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities, and (ii) presenting evaluation strategies that reflect seven cases (i.e., general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets, and drug repurposing for specific diseases), comprising seven Tests (344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based, and network-based) were tested across all of the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods. The results highlight BETA as a benchmark for selecting computational strategies for drug repurposing and target discovery.


Subjects
Benchmarking, Drug Development, Algorithms, Drug Evaluation, Preclinical, Drug Repositioning/methods, Proteins/genetics
15.
Stud Health Technol Inform ; 290: 173-177, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672994

ABSTRACT

Reproducibility is an important quality criterion for the secondary use of electronic health records (EHRs). However, multiple barriers to reproducibility are embedded in the heterogeneous EHR environment, including complex processes for collecting and organizing EHR data and dynamic multi-level interactions occurring during information use (e.g., inter-personal, inter-system, and cross-institutional). To ensure reproducible use of EHRs, we investigated four information quality (IQ) dimensions and examined the implications for reproducibility in a real-world EHR study. The four IQ measurements suggested that barriers to reproducibility occur at all stages of secondary use of EHR data. We discuss our recommendations and emphasize the importance of promoting transparent, high-throughput, and accessible data infrastructures and implementation best practices (e.g., data quality assessment and reporting standards).


Subjects
Electronic Health Records, Reproducibility of Results
16.
Int J Med Inform ; 162: 104736, 2022 Mar 07.
Article in English | MEDLINE | ID: mdl-35316697

ABSTRACT

INTRODUCTION: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) that simultaneously addresses multiple issues, because the word "fall" is a typical homonym. METHODS: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance performance. The models were evaluated on real-world EHR data and compared with conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis. RESULTS: The hybrid model achieved the highest F1-score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model had a sensitivity of 0.954, specificity of 1.000, positive predictive value of 0.988, and negative predictive value of 0.999. The error analysis showed that machine learning-based approaches outperformed the rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained with Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories. CONCLUSION: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed custom fixes to the BERT outputs and further improved the performance of fall detection.
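The hybrid architecture, a transformer classifier whose sentence-level predictions are adjusted by post-hoc heuristic rules, can be sketched as follows. The single rule shown (suppressing sentences that only discuss fall risk or fall prevention) is an illustrative stand-in for the study's heuristics, and `bert_predict` is a placeholder for the fine-tuned model.

```python
import re
from typing import Callable

RISK_OR_PREVENTION = re.compile(r"\b(risk of fall(s)?|fall precautions?|fall prevention)\b", re.I)

def hybrid_fall_detector(sentence: str, bert_predict: Callable[[str], float],
                         threshold: float = 0.5) -> bool:
    """Combine a transformer probability with a post-hoc rule: demote sentences
    that only discuss fall risk/prevention rather than an actual fall event."""
    prob = bert_predict(sentence)
    if RISK_OR_PREVENTION.search(sentence):
        return False            # heuristic override of the model output
    return prob >= threshold

# Placeholder model for demonstration only.
fake_bert = lambda s: 0.9 if "fell" in s.lower() else 0.2
print(hybrid_fall_detector("Patient fell in the bathroom last night.", fake_bert))   # True
print(hybrid_fall_detector("Reviewed fall precautions with patient.", fake_bert))    # False
```

Layering rules after the model in this way is what allows targeted corrections of systematic BERT errors without retraining.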

17.
J Gerontol A Biol Sci Med Sci ; 77(3): 524-530, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35239951

ABSTRACT

BACKGROUND: Delirium is underdiagnosed in clinical practice and is not routinely coded for billing. Manual chart review can be used to identify the occurrence of delirium; however, it is labor-intensive and impractical for large-scale studies. Natural language processing (NLP) has the capability to process raw text in electronic health records (EHRs) and determine the meaning of the information. We developed and validated NLP algorithms to automatically identify the occurrence of delirium from EHRs. METHODS: This study used a randomly selected cohort from the population-based Mayo Clinic Biobank (N = 300, age ≥65). We adopted the standardized, evidence-based confusion assessment method (CAM) framework to develop and evaluate NLP algorithms that identify the occurrence of delirium using clinical notes in EHRs. Two NLP algorithms were developed based on CAM criteria: one based on the original CAM (NLP-CAM; delirium vs no delirium) and another based on our modified CAM (NLP-mCAM; definite, possible, and no delirium). Sensitivity, specificity, and accuracy were used to assess concordance in delirium status between the NLP algorithms and manual chart review as the gold standard. The prevalence of delirium cases was examined using the International Classification of Diseases, 9th Revision (ICD-9), NLP-CAM, and NLP-mCAM. RESULTS: NLP-CAM demonstrated a sensitivity, specificity, and accuracy of 0.919, 1.000, and 0.967, respectively. NLP-mCAM demonstrated a sensitivity, specificity, and accuracy of 0.827, 0.913, and 0.827, respectively. The prevalence analysis showed that the NLP-CAM algorithm identified 12,651 (9.4%) delirium patients, while the NLP-mCAM algorithm identified 20,611 (15.3%) definite and 10,762 (8.0%) possible delirium cases. CONCLUSIONS: NLP algorithms based on the standardized, evidence-based CAM framework demonstrated high performance in delineating delirium status in an expeditious and cost-effective manner.
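The original CAM decision logic that the NLP output feeds into is well established: delirium requires (1) acute onset or fluctuating course and (2) inattention, plus either (3) disorganized thinking or (4) altered level of consciousness. A minimal sketch of that final classification step, taking NLP-derived feature flags as input, follows; it reflects the original CAM only, not the modified mCAM used for definite/possible grading.

```python
def cam_delirium(acute_onset: bool, inattention: bool,
                 disorganized_thinking: bool, altered_consciousness: bool) -> bool:
    """Original CAM: features 1 and 2 are required, plus feature 3 or 4."""
    return acute_onset and inattention and (disorganized_thinking or altered_consciousness)

# Example with flags that an NLP pipeline might extract from clinical notes.
print(cam_delirium(acute_onset=True, inattention=True,
                   disorganized_thinking=False, altered_consciousness=True))  # True
```

The heavy lifting in such a system is extracting the four feature flags from free text; the final CAM classification itself reduces to this small boolean rule.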


Subjects
Delirium, Natural Language Processing, Aged, Algorithms, Delirium/diagnosis, Delirium/epidemiology, Electronic Health Records, Humans, International Classification of Diseases
18.
J Rural Health ; 38(4): 908-915, 2022 09.
Article in English | MEDLINE | ID: mdl-35261092

ABSTRACT

PURPOSE: Rural populations are disproportionately affected by the COVID-19 pandemic. We characterized urban-rural disparities in patient portal messaging utilization for COVID-19 among patients who used the portal during the early stage of the pandemic in the Midwest. METHODS: We collected over 1 million portal messages generated by midwestern Mayo Clinic patients from February to August 2020. We analyzed patient-generated messages (PGMs) on COVID-19 by urban-rural locality and incorporated patients' sociodemographic factors into the analysis. FINDINGS: The urban-rural ratios of portal users, message senders, and COVID-19 message senders were 1.18, 1.31, and 1.79, respectively, indicating greater use among urban patients. The urban-rural ratio of PGMs on COVID-19 (1.69) was higher than that of general PGMs (1.43). The urban-rural ratios of messaging were 1.72-1.85 for COVID-19-related care and 1.43-1.66 for other health care issues related to COVID-19. Compared with urban patients, rural patients sent fewer messages for COVID-19 diagnosis and treatment but more messages for other reasons related to COVID-19 health care (e.g., isolation and anxiety). Frequent senders of COVID-19-related messages among rural patients were 40 years or older, women, married, and White. CONCLUSIONS: In this midwestern health system, rural patients were less likely to use patient online services during the pandemic, and their reasons for use differed from those of urban patients. The results suggest opportunities for increasing equity in rural patient engagement with patient portals (particularly for minority populations) during COVID-19. Public health intervention strategies could target reasons why rural patients might seek health care in a pandemic, such as social isolation and anxiety.


Subjects
COVID-19, Adult, COVID-19/epidemiology, COVID-19 Testing, Female, Humans, Pandemics, Patient Participation, Rural Population
19.
JMIR Hum Factors ; 9(2): e35187, 2022 May 05.
Article in English | MEDLINE | ID: mdl-35171108

ABSTRACT

BACKGROUND: During the COVID-19 pandemic, patient portals and their message platforms allowed remote access to health care. Utilization patterns in patient messaging during the COVID-19 crisis have not been studied thoroughly. In this work, we propose characterizing patients and their use of asynchronous virtual care for COVID-19 via a retrospective analysis of patient portal messages. OBJECTIVE: This study aimed to perform a retrospective analysis of portal messages to probe asynchronous patient responses to the COVID-19 crisis. METHODS: We collected over 2 million patient-generated messages (PGMs) at Mayo Clinic during February 1 to August 31, 2020. We analyzed descriptive statistics on PGMs related to COVID-19 and incorporated patients' sociodemographic factors into the analysis. We analyzed the PGMs on COVID-19 in terms of COVID-19-related care (eg, COVID-19 symptom self-assessment and COVID-19 tests and results) and other health issues (eg, appointment cancellation, anxiety, and depression). RESULTS: The majority of PGMs on COVID-19 pertained to COVID-19 symptom self-assessment (42.50%) and COVID-19 tests and results (30.84%). The PGMs related to COVID-19 symptom self-assessment and COVID-19 test results had dynamic patterns and peaks similar to the newly confirmed cases in the United States and in Minnesota. The trend of PGMs related to COVID-19 care plans paralleled trends in newly hospitalized cases and deaths. After an initial peak in March, the PGMs on issues such as appointment cancellations and anxiety regarding COVID-19 displayed a declining trend. The majority of message senders were 30-64 years old, married, female, White, or urban residents. This majority was an even higher proportion among patients who sent portal messages on COVID-19. CONCLUSIONS: During the COVID-19 pandemic, patients increased portal messaging utilization to address health care issues about COVID-19 (in particular, symptom self-assessment and tests and results). Trends in message usage closely followed national trends in new cases and hospitalizations. There is a wide disparity for minority and rural populations in the use of PGMs for addressing the COVID-19 crisis.

20.
Bioinformatics ; 38(6): 1776-1778, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34983060

ABSTRACT

SUMMARY: Building a high-quality annotation corpus requires considerable time and expertise, particularly for biomedical and clinical research applications. Most existing annotation tools provide many advanced features to cover a variety of needs, but their installation, integration, and difficulty of use present a significant burden for actual annotation tasks. Here, we present MedTator, a serverless annotation tool that aims to provide an intuitive and interactive user interface focused on the core steps of corpus annotation, such as document annotation, corpus summarization, annotation export, and annotation adjudication. AVAILABILITY AND IMPLEMENTATION: MedTator and its tutorial are freely available from https://ohnlp.github.io/MedTator. MedTator source code is available under the Apache 2.0 license: https://github.com/OHNLP/MedTator. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Computational Biology