Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Automated Identification of Patients' Unmet Social Needs in Clinical Text Using Natural Language Processing.

Moon, Sungrim; Wu, Yuqi; Doughty, Jay B; Wieland, Mark L; Philpot, Lindsey M; Fan, Jungwei W; Njeru, Jane W.

Mayo Clin Proc Digit Health ; 2(3): 411-420, 2024 Sep.

Article in English | MEDLINE | ID: mdl-39324128

ABSTRACT

Objective: To develop natural language processing (NLP) solutions for identifying patients' unmet social needs to enable timely intervention. Patients and Methods: Design: A retrospective cohort study in which clinical notes were reviewed and annotated to identify unmet social needs, and the annotations were then used to develop and evaluate NLP solutions. Participants: A total of 1103 primary care patients seen at a large academic medical center from June 1, 2019, to May 31, 2021, and referred to a community health worker (CHW) program. Clinical notes and portal messages of 200 age- and sex-stratified patients were sampled for annotation of unmet social needs. Systems: Two NLP solutions were developed and compared. The first applied similarity-based classification to sentences represented as semantic embedding vectors. The second involved designing terms and patterns to identify each domain of unmet social needs in the clinical text. Measures: Precision, recall, and F1-score of the NLP solutions. Results: A total of 5675 clinical notes and 475 portal messages were annotated, with an inter-annotator agreement of 0.938. The best NLP solution achieved an F1-score of 0.95 and was applied to the entire CHW-referred cohort (n=1103), of whom >80% had at least 1 unmet social need within the 6 months before the first CHW referral. Financial strain and health literacy were the top 2 domains of unmet social needs across most sex and age strata. Conclusion: Clinical text contains rich information about patients' unmet social needs. NLP can achieve good performance in identifying those needs for CHW referral and can facilitate data-driven research on social determinants of health.
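The first solution above labels sentences by semantic similarity. The following is a minimal sketch of that idea. It assumes a nearest-centroid rule with cosine similarity and a hypothetical 0.8 threshold. The toy three-dimensional vectors stand in for real sentence embeddings, and the abstract does not give the study's actual embedding model, similarity rule, or threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(sentence_vec, domain_examples, threshold=0.8):
    """Return the domain whose example centroid is most similar to the
    sentence vector, or None if no similarity reaches the threshold.
    The threshold value is a hypothetical choice."""
    best_domain, best_score = None, threshold
    for domain, examples in domain_examples.items():
        score = cosine(sentence_vec, centroid(examples))
        if score >= best_score:
            best_domain, best_score = domain, score
    return best_domain

# Hypothetical toy embeddings; a real system would obtain these from a
# sentence-embedding model.
examples = {
    "financial_strain": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "health_literacy": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}
print(classify([0.85, 0.15, 0.05], examples))  # financial_strain
print(classify([0.0, 0.0, 1.0], examples))     # None
```

The threshold lets the classifier return no domain at all, which matters because most sentences in a clinical note describe no social need.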

2.

FedFSA: Hybrid and federated framework for functional status ascertainment across institutions.

Fu, Sunyang; Jia, Heling; Vassilaki, Maria; Keloth, Vipina K; Dang, Yifang; Zhou, Yujia; Garg, Muskan; Petersen, Ronald C; St Sauver, Jennifer; Moon, Sungrim; Wang, Liwei; Wen, Andrew; Li, Fang; Xu, Hua; Tao, Cui; Fan, Jungwei; Liu, Hongfang; Sohn, Sunghwan.

J Biomed Inform ; 152: 104623, 2024 04.

Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status reflects their independence in performing activities of daily living (ADLs), including basic ADLs (bADL) and more complex instrumental activities (iADL). Existing studies have found that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much functional status information is stored in electronic health records (EHRs) in semi-structured or free-text formats. This creates a pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranged from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. Performance for extraction of ADLs with impairment ranged from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across the four sites. For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-to-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using the FedFSA framework outperformed non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of federated learning for functional status extraction and impairment classification in EHRs, underscoring the importance of large-scale, multi-institutional collaborative development.
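The abstract names Flower and PyTorch for the federated BERT components but does not state the aggregation rule. As an assumption, the sketch below shows FedAvg-style weighted averaging, a common default in federated learning. Each site trains locally and shares only model parameters. The server then averages those parameters in proportion to each site's local data size. The site sizes and parameter vectors are hypothetical, and plain lists stand in for model tensors.

```python
def federated_average(client_updates):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_updates: list of (num_local_examples, parameters) pairs, where
    parameters is a flat list of floats of the same length for every client.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, params in client_updates:
        weight = n / total  # sites with more local data contribute more
        for i, p in enumerate(params):
            averaged[i] += weight * p
    return averaged

# Hypothetical parameters from four sites; the raw notes stay at each site,
# and only these parameter vectors are sent to the server.
updates = [
    (100, [1.0, 0.0]),
    (300, [0.0, 1.0]),
    (100, [1.0, 1.0]),
    (500, [0.0, 0.0]),
]
print(federated_average(updates))  # approximately [0.2, 0.4]
```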

Subjects

Activities of Daily Living, Functional Status, Humans, Aged, Learning, Information Storage and Retrieval, Natural Language Processing

3.

Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers-Assisted Sublanguage Analysis.

Wang, Liwei; He, Huan; Wen, Andrew; Moon, Sungrim; Fu, Sunyang; Peterson, Kevin J; Ai, Xuguang; Liu, Sijia; Kavuluru, Ramakanth; Liu, Hongfang.

JMIR Med Inform ; 11: e48072, 2023 Jun 27.

Article in English | MEDLINE | ID: mdl-37368483

ABSTRACT

BACKGROUND: A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records, and a substantial portion of FH information is embedded in clinical notes. This makes FH information difficult to use in downstream data analytics or clinical decision support applications. A natural language processing system capable of extracting and normalizing FH information can address this issue. OBJECTIVE: In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS: We used a transformer-based method to construct an FH lexical resource from a corpus of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS: The resulting lexicon contains 33,603 lexicon entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system can improve the recall of FH information, evaluated on the BioCreative/N2C2 FH challenge data set, with F1 scores that varied but remained comparable. CONCLUSIONS: The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.

4.

Assessing document section heterogeneity across multiple electronic health record systems for computational phenotyping: A case study of heart-failure phenotyping algorithm.

Moon, Sungrim; Liu, Sijia; Kshatriya, Bhavani Singh Agnikula; Fu, Sunyang; Moser, Ethan D; Bielinski, Suzette J; Fan, Jungwei; Liu, Hongfang.

PLoS One ; 18(3): e0283800, 2023.

Article in English | MEDLINE | ID: mdl-37000801

ABSTRACT

BACKGROUND: The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms depends heavily on their context, especially the clinical section in which they appear. However, heterogeneity across Electronic Health Record (EHR) systems poses challenges in using section information. OBJECTIVES: Leveraging the eMERGE heart failure (HF) phenotyping algorithm, we assessed this heterogeneity quantitatively. We did so by comparing the performance of machine learning (ML) classifiers that map clinical sections containing HF-relevant terms in different EHR systems to standard sections in the Health Level 7 (HL7) Clinical Document Architecture (CDA). METHODS: We experimented with both random forest models with sentence-embedding features and bidirectional encoder representations from transformers (BERT) models. We trained the ML models on an automatically labeled corpus from an EHR system that adopted the HL7 CDA standard. We assessed performance using a blind test set (n = 300) from the same EHR system and a gold standard (n = 900) manually annotated from three other EHR systems. RESULTS: The F-measure of the ML models varied widely (0.00-0.91), indicating that ML models with a single tuning parameter set were insufficient to capture sections across different EHR systems. Error analysis indicated that sections do not always comply with the corresponding standardized sections, leading to low performance. CONCLUSIONS: We presented the potential use of ML techniques to map sections containing HF-relevant terms in multiple EHR systems to standard sections. However, the findings suggest that the quality and heterogeneity of section structure across different EHRs affect such applications because of the poor adoption of documentation standards.

Subjects

Electronic Health Records, Heart Failure, Humans, Software, Algorithms, Machine Learning

5.

Recommended practices and ethical considerations for natural language processing-assisted observational research: A scoping review.

Fu, Sunyang; Wang, Liwei; Moon, Sungrim; Zong, Nansu; He, Huan; Pejaver, Vikas; Relevo, Rose; Walden, Anita; Haendel, Melissa; Chute, Christopher G; Liu, Hongfang.

Clin Transl Sci ; 16(3): 398-411, 2023 03.

Article in English | MEDLINE | ID: mdl-36478394

ABSTRACT

An increasing number of studies have reported using natural language processing (NLP) to assist observational research by extracting clinical information from electronic health records (EHRs). Currently, no standardized reporting guidelines for NLP-assisted observational studies exist. The absence of detailed reporting guidelines may create ambiguity in the use of NLP-derived content, knowledge gaps in current research reporting practices, and reproducibility challenges. To address these issues, we conducted a scoping review of NLP-assisted observational clinical studies and examined their reporting practices, focusing on NLP methodology and evaluation. We found high variation in reporting practices, such as inconsistent use of references for measurement studies, variation in reporting location (reference, appendix, and manuscript), and differing granularity of NLP methodology and evaluation details. To promote the wide adoption and use of NLP solutions in clinical research, we outline several perspectives that align with the six principles released by the World Health Organization (WHO) to guide the ethical use of artificial intelligence for health.

Subjects

Artificial Intelligence, Natural Language Processing, Humans, Electronic Health Records, Reproducibility of Results, Observational Studies as Topic

6.

Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study.

Pongdee, Thanai; Larson, Nicholas B; Divekar, Rohit; Bielinski, Suzette J; Liu, Hongfang; Moon, Sungrim.

JMIR AI ; 2: e44191, 2023 Jun 12.

Article in English | MEDLINE | ID: mdl-39105270

ABSTRACT

BACKGROUND: Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Although AERD has a classic constellation of symptoms, the diagnosis is often overlooked, with an average delay of more than 10 years between symptom onset and diagnosis. Without a diagnosis, individuals lack opportunities to receive effective treatments, such as aspirin desensitization or biologic medications. OBJECTIVE: Our aim was to develop a combined algorithm that integrates both natural language processing (NLP) and machine learning (ML) techniques to identify patients with AERD from an electronic health record (EHR). METHODS: A rule-based decision tree algorithm incorporating NLP-based features was developed using clinical documents from the EHR at Mayo Clinic. NLP techniques were used to extract 7 features from clinical notes: AERD, asthma, NSAID allergy, nasal polyps, chronic sinusitis, elevated urine leukotriene E4 level, and documented no-NSAID allergy. MedTagger extracted these 7 features from the unstructured clinical text using a set of keywords and patterns based on chart review by 2 allergy and immunology experts for AERD. The status of each extracted feature was quantified as its frequency of occurrence in each subject's clinical documents. We optimized the decision tree classifier's hyperparameters and cutoff threshold on the training set to determine the feature combination that best discriminates AERD. We then evaluated the resulting model on the test set. RESULTS: The AERD algorithm, which combines NLP and ML techniques, achieved an area under the receiver operating characteristic curve of 0.86 (95% CI 0.78-0.94), a sensitivity of 80.00% (95% CI 70.82%-87.33%), and a specificity of 88.00% (95% CI 79.98%-93.64%) on the test set. CONCLUSIONS: We developed a promising AERD algorithm that needs further refinement to improve AERD diagnosis. Continued development of NLP and ML technologies has the potential to reduce diagnostic delays for AERD and improve the health of our patients.
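The AERD method turns keyword and pattern matches into per-patient frequency features before classification. The following minimal sketch uses Python regular expressions rather than MedTagger. Its patterns are hypothetical and cover only three of the seven features. The conjunctive cutoff rule is also a hypothetical stand-in for the tuned decision tree, not the study's model.

```python
import re

# Hypothetical keyword patterns; the study's MedTagger rules were built from
# expert chart review and are not reproduced here.
FEATURE_PATTERNS = {
    "asthma": re.compile(r"\basthma\b", re.IGNORECASE),
    "nasal_polyps": re.compile(r"\bnasal polyp(?:s|osis)?\b", re.IGNORECASE),
    "nsaid_allergy": re.compile(r"\b(?:aspirin|nsaid)s? allergy\b", re.IGNORECASE),
}

def feature_frequencies(documents):
    """Count how often each feature pattern matches across a patient's notes."""
    return {
        name: sum(len(pattern.findall(doc)) for doc in documents)
        for name, pattern in FEATURE_PATTERNS.items()
    }

def flag_aerd(freqs, cutoffs):
    """Hypothetical rule: flag the patient when every listed feature
    meets its frequency cutoff (a stand-in for a learned decision tree)."""
    return all(freqs[name] >= cutoff for name, cutoff in cutoffs.items())

notes = [
    "History of asthma and nasal polyposis.",
    "Reports aspirin allergy; asthma well controlled.",
]
freqs = feature_frequencies(notes)
print(freqs)  # {'asthma': 2, 'nasal_polyps': 1, 'nsaid_allergy': 1}
print(flag_aerd(freqs, {"asthma": 1, "nasal_polyps": 1, "nsaid_allergy": 1}))  # True
```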

7.

Digital Solutions Observed in Clinical Trials: A Formative Feasibility Scoping Review.

Harrison, Taylor M; Moon, Sungrim; Wang, Liwei; Fu, Sunyang; Liu, Hongfang.

AMIA Annu Symp Proc ; 2023: 987-996, 2023.

Article in English | MEDLINE | ID: mdl-38222440

ABSTRACT

Growing digital access is accelerating the digital transformation of clinical trials, in which digital solutions (DSs) are increasingly leveraged to improve trial efficiency, effectiveness, and accessibility. Many factors affect DS success, including technology barriers, privacy concerns, and user engagement activities. It is unclear how these factors are considered or reported in the literature. Here, we performed a formative feasibility scoping review to identify gaps affecting DS quality and reproducibility in trials. Articles containing digital terms published in English from 2009 to 2022 were collected (n=4,167). A total of 130 articles published between 2016 and 2022 were randomly selected for full-text review. Eligible articles (n=100) were sorted into four identified categories: 16% Education, 59% Intervention, 8% Patient, and 17% Treatment. Initial findings about DS trends and reporting practices inform protocol development for a large-scale study. They underscore the need to generate fundamental knowledge on reporting standardization, best-practice guidelines, and evaluation methodologies for DSs in clinical trials.

Subjects

Feasibility Studies, Humans, Reproducibility of Results, Clinical Trials as Topic

8.

Contextual Variation of Clinical Notes induced by EHR Migration.

Miller, Kurt; Moon, Sungrim; Fu, Sunyang; Liu, Hongfang.

AMIA Annu Symp Proc ; 2023: 1155-1164, 2023.

Article in English | MEDLINE | ID: mdl-38222426

ABSTRACT

The structure and semantics of clinical notes vary considerably across Electronic Health Record (EHR) systems, sites, and institutions. Such heterogeneity hampers the portability of natural language processing (NLP) models that extract information from text for clinical research or practice. In this study, we evaluated the contextual variation of clinical notes by measuring the semantic and syntactic similarity of notes written by two sets of physicians, spanning four medical specialties, across EHR migrations at two Mayo Clinic sites. We found significant semantic and syntactic variation attributable to the EHR system context and to differences between medical specialties, whereas differences in spatial context across sites caused only minor variation. Our findings suggest that, to be generalizable, clinical language models need to account for process differences at the level of the specialty sublanguage.

Subjects

Electronic Health Records, Physicians, Humans, Semantics, Natural Language Processing, Ambulatory Care Facilities

9.

Extractive Clinical Question-Answering With Multianswer and Multifocus Questions: Data Set Development and Evaluation Study.

Moon, Sungrim; He, Huan; Jia, Heling; Liu, Hongfang; Fan, Jungwei Wilfred.

JMIR AI ; 2: e41818, 2023 Jun 20.

Article in English | MEDLINE | ID: mdl-38875580

ABSTRACT

BACKGROUND: Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can yield multiple answers to a single question and multiple focus points in 1 question, which are lacking in existing data sets for the development of artificial intelligence solutions. OBJECTIVE: This study aimed to create a data set for developing and evaluating clinical EQA systems that can handle natural multianswer and multifocus questions. METHODS: We leveraged the annotated relations from the 2018 National NLP Clinical Challenges corpus to generate an EQA data set. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multianswer and multifocus question-answering entries, which represent more complex and natural challenges in addition to the basic 1-drug-1-reason cases. A baseline solution was developed and tested on the data set. RESULTS: The derived RxWhyQA data set contains 96,939 QA entries. Among the answerable questions, 25% of them require multiple answers, and 2% of them ask about multiple drugs within 1 question. Frequent cues were observed around the answers in the text, and 90% of the drug and reason terms occurred within the same or an adjacent sentence. The baseline EQA solution achieved a best F1-score of 0.72 on the entire data set, and on specific subsets, it was 0.93 for the unanswerable questions, 0.48 for single-drug questions versus 0.60 for multidrug questions, and 0.54 for the single-answer questions versus 0.43 for multianswer questions. CONCLUSIONS: The RxWhyQA data set can be used to train and evaluate systems that need to handle multianswer and multifocus questions. Specifically, multianswer EQA appears to be challenging and therefore warrants more investment in research. 
We created and shared a clinical EQA data set with multianswer and multifocus questions that would channel future research efforts toward more realistic scenarios.
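Scoring multianswer questions requires comparing a set of predicted answers with a set of gold answers. The abstract does not state which F1 definition was used. The following is therefore an assumed exact-match, set-based F1 computed per question. Under this definition, an empty prediction for an unanswerable question counts as fully correct. The drug-reason answer strings are hypothetical.

```python
def answer_set_f1(predicted, gold):
    """Exact-match, set-based F1 between predicted and gold answers
    for a single question.

    An empty prediction for an unanswerable question (empty gold set)
    is treated as a perfect match.
    """
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical multianswer question ("Why was the drug given?") with two
# gold reasons; finding only one of them halves recall.
print(answer_set_f1(["pain"], ["pain", "fever"]))  # approximately 0.667
print(answer_set_f1([], []))                        # 1.0
```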

10.

Rethinking blood eosinophil counts: Epidemiology, associated chronic diseases, and increased risks of cardiovascular disease.

Pongdee, Thanai; Manemann, Sheila M; Decker, Paul A; Larson, Nicholas B; Moon, Sungrim; Killian, Jill M; Liu, Hongfang; Kita, Hirohito; Bielinski, Suzette J.

J Allergy Clin Immunol Glob ; 1(4): 233-240, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36466741

ABSTRACT

Background: The distribution and determinants of blood eosinophil counts in the general population are unclear. Furthermore, whether elevated blood eosinophil counts increase risk for cardiovascular disease (CVD) and other chronic diseases, other than atopic conditions, remains uncertain. Objective: We sought to describe the distribution of eosinophil counts in the general population and determine the association of eosinophil count with prevalent chronic disease and incident CVD. Methods: A population-based adult cohort was followed from January 1, 2006, to December 31, 2020. Electronic health record data regarding demographic characteristics, prevalent clinical characteristics, and incident CVD were extracted. Associations between blood eosinophil counts and demographic characteristics, chronic diseases, laboratory values, and risks of incident CVD were assessed using chi-square test, ANOVA, and Cox proportional hazards regression. Results: Blood eosinophil counts increased with age, body mass index, and reported smoking and tobacco use. The prevalence of chronic obstructive pulmonary disease, hypertension, cardiac arrhythmias, hyperlipidemia, diabetes mellitus, chronic kidney disease, and cancer increased as eosinophil counts increased. Eosinophil counts were significantly associated with coronary heart disease (hazard ratio [HR], 1.44; 95% CI, 1.12-1.84) and heart failure (HR, 1.62; 95% CI, 1.30-2.01) in fully adjusted models and with stroke/transient ischemic attack (HR, 1.37; 95% CI, 1.16-1.61) and CVD death (HR, 1.49; 95% CI, 1.10-2.00) in a model adjusting for age, sex, race, and ethnicity. Conclusions: Blood eosinophil counts differ by demographic and clinical characteristics as well as by prevalent chronic disease. Moreover, elevated eosinophil counts are associated with risk of CVD. Further prospective investigations are needed to determine the utility of eosinophil counts as a biomarker for CVD risk.

11.

Quality assessment of functional status documentation in EHRs across different healthcare institutions.

Fu, Sunyang; Vassilaki, Maria; Ibrahim, Omar A; Petersen, Ronald C; Pagali, Sandeep; St Sauver, Jennifer; Moon, Sungrim; Wang, Liwei; Fan, Jungwei W; Liu, Hongfang; Sohn, Sunghwan.

Front Digit Health ; 4: 958539, 2022.

Article in English | MEDLINE | ID: mdl-36238199

ABSTRACT

The secondary use of electronic health records (EHRs) is challenged by a variety of data quality issues. To address this, we retrospectively assessed the quality of functional status documentation in the EHRs of persons participating in the Mayo Clinic Study of Aging (MCSA). We used a convergent parallel design to collect quantitative and qualitative data and analyzed the findings independently. We discovered a heterogeneous documentation process, in which care practice teams, institutions, and EHR systems all play an important role in how text data are documented and organized. Four prevalent instrument-assisted documentation (iDoc) expressions were identified based on three distinct instruments: Epic smart forms, questionnaires, and occupational therapy and physical therapy templates. We found strong differences in usage, information quality (intrinsic and contextual), and naturalness of language among different types of iDoc expressions. These variations can be caused by different source instruments, information providers, practice settings, care events, and institutions. In addition, iDoc expressions are context specific and thus should not be viewed or processed uniformly. We recommend conducting a data quality assessment of unstructured EHR text before using the information.

12.

Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing.

Wang, Liwei; Fu, Sunyang; Wen, Andrew; Ruan, Xiaoyang; He, Huan; Liu, Sijia; Moon, Sungrim; Mai, Michelle; Riaz, Irbaz B; Wang, Nan; Yang, Ping; Xu, Hua; Warner, Jeremy L; Liu, Hongfang.

JCO Clin Cancer Inform ; 6: e2200006, 2022 07.

Article in English | MEDLINE | ID: mdl-35917480

ABSTRACT

PURPOSE: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and to review existing NLP methodologies for extracting those data elements. METHODS: Major literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles using EHRs as the data source were manually identified. A charting form was developed for analyzing the relevant studies and was used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were finally selected and included in our analysis. We found that, as expected, cancer research and patient care require some data elements beyond mCODE. NLP methods lack sufficient transparency and reproducibility, and NLP evaluation is inconsistent. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to the wide adoption of cancer NLP were identified and discussed.

Subjects

Natural Language Processing, Neoplasms, Electronic Health Records, Humans, Information Storage and Retrieval, Neoplasms/diagnosis, Neoplasms/therapy, Patient Care

13.

A scoping review of medical practice variation research within the informatics literature.

Sohn, Sunghwan; Moon, Sungrim; Prokop, Larry J; Montori, Victor M; Fan, J Wilfred.

Int J Med Inform ; 165: 104833, 2022 09.

Article in English | MEDLINE | ID: mdl-35868231

ABSTRACT

RATIONALE: We performed a scoping review of core informatics literature on medical practice variation (MPV) to provide an agile summary of the subject in our field. MATERIALS AND METHODS: The Ovid integrated database was searched for the period 1946 to 2022 to identify MPV studies published in major informatics journals and conference proceedings. Two reviewers performed relevance screening, with an independent third reviewer adjudicating disagreements. The included articles were then thematically analyzed and summarized through discussion among all three reviewers. RESULTS: A total of 43 articles were included in the thematic analysis. About half (n = 21) of the included articles were published in conference proceedings. Five articles reported the effect of MPV on patient outcomes. The variation of interest was most frequently in treatment decisions. In terms of the role informatics played (multiple roles allowed), 39 (90.7%) articles pertained to detection of MPV, 5 to prevention of MPV, and 4 to learning from MPV. DISCUSSION: MPV remains a critical issue in health care, yet most informatics research has focused on simple tasks such as automating the detection of MPV and assessing compliance with decision-support systems, and less on addressing the causes of variation or supporting learning from variation. CONCLUSION: Our scoping review found that informatics studies have focused on detecting MPV, especially variability in treatments and deviation from practice guidelines. Technological advances should promote more informatics research focused on explaining and learning from MPV.

Subjects

Biomedical Research, Medical Informatics, Delivery of Health Care, Humans

14.

Computational drug repurposing based on electronic health records: a scoping review.

Zong, Nansu; Wen, Andrew; Moon, Sungrim; Fu, Sunyang; Wang, Liwei; Zhao, Yiqing; Yu, Yue; Huang, Ming; Wang, Yanshan; Zheng, Gang; Mielke, Michelle M; Cerhan, James R; Liu, Hongfang.

NPJ Digit Med ; 5(1): 77, 2022 Jun 14.

Article in English | MEDLINE | ID: mdl-35701544

ABSTRACT

Computational drug repurposing methods adapt artificial intelligence (AI) algorithms to discover new applications of approved or investigational drugs. Among heterogeneous data sets, electronic health record (EHR) data sets provide rich longitudinal and pathophysiological data that facilitate the generation and validation of drug repurposing hypotheses. Here, we present an appraisal of recently published research on computational drug repurposing using the EHR. Thirty-three research articles, retrieved from Embase, Medline, Scopus, and Web of Science between January 2000 and January 2022, were included in the final review. Four themes are presented: (1) publication venue, (2) data types and sources, (3) methods for data processing and prediction, and (4) targeted diseases, validation, and released tools. The review summarized the contribution of EHRs to drug repurposing and revealed that their use is hindered by issues of validation, accessibility, and understanding of EHRs. These findings can support researchers in using medical data resources and developing computational methods for drug repurposing.

15.

Identifying Information Gaps in Electronic Health Records by Using Natural Language Processing: Gynecologic Surgery History Identification.

Moon, Sungrim; Carlson, Luke A; Moser, Ethan D; Agnikula Kshatriya, Bhavani Singh; Smith, Carin Y; Rocca, Walter A; Gazzuola Rocca, Liliana; Bielinski, Suzette J; Liu, Hongfang; Larson, Nicholas B.

J Med Internet Res ; 24(1): e29015, 2022 01 28.

Article in English | MEDLINE | ID: mdl-35089141

ABSTRACT

BACKGROUND: Electronic health records (EHRs) are a rich source of longitudinal patient data. However, information may be missing because clinical care predated the implementation of the EHR system(s) or occurred at other medical institutions, which impedes complete ascertainment of a patient's medical history. OBJECTIVE: This study aimed to investigate information discrepancies and to quantify information gaps. It compared gynecological surgical history extracted from a single institution's EHR using natural language processing (NLP) techniques with surgical history manually curated through chart review of records from multiple independent regional health care institutions. METHODS: To facilitate high-throughput evaluation, we developed a rule-based NLP algorithm to detect gynecological surgery history from the unstructured narrative of the Mayo Clinic EHR. These results were compared with a gold standard cohort of 3870 women whose gynecological surgery status was adjudicated using the Rochester Epidemiology Project medical records-linkage system. We quantified and characterized the observed information gaps that led to misclassification of surgical status. RESULTS: The NLP algorithm achieved a precision of 0.85, recall of 0.82, and F1-score of 0.83 in the test set (n=265) relative to outcomes abstracted from the Mayo EHR. This performance attenuated when directly compared with the gold standard (precision 0.79, recall 0.76, and F1-score 0.76), with most misclassifications being false negatives. We then applied the algorithm to the remaining patients (n=3340) and identified 2 types of information gaps through error analysis. First, 6% (199/3340) of women in this study had no or only partial surgery information recorded in the EHR.
Second, 4.3% (144/3340) of women had inconsistent or inaccurate information within the clinical narrative owing to misinterpreted information, erroneous "copy and paste," or incorrect information provided by patients. Additionally, the NLP algorithm misclassified the surgery status of 3.6% (121/3340) of women. CONCLUSIONS: Although NLP techniques were able to adequately recreate the gynecologic surgical status from the clinical narrative, missing or inaccurately reported and recorded information resulted in much of the misclassification observed. Therefore, alternative approaches to collect or curate surgical history are needed.
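F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a worked check, the test-set values reported above (precision 0.85, recall 0.82) give 2 × 0.85 × 0.82 / 1.67 ≈ 0.835, which rounds to the reported F1 of 0.83. The sketch below performs the same computation. Reported precision and recall are themselves rounded, so recomputed F1 values can differ slightly from published ones.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Test-set precision and recall reported in the abstract.
print(round(f1_score(0.85, 0.82), 2))  # 0.83
```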

Subjects

Electronic Health Records , Natural Language Processing , Algorithms , Cohort Studies , Female , Gynecologic Surgical Procedures , Humans

16.

Towards User-centered Corpus Development: Lessons Learnt from Designing and Developing MedTator.

He, Huan; Fu, Sunyang; Wang, Liwei; Wen, Andrew; Liu, Sijia; Moon, Sungrim; Miller, Kurt; Liu, Hongfang.

AMIA Annu Symp Proc ; 2022: 532-541, 2022.

Article in English | MEDLINE | ID: mdl-37128369

ABSTRACT

A gold standard annotated corpus is usually indispensable when developing natural language processing (NLP) systems. Building a high-quality annotated corpus for clinical NLP requires considerable time and domain expertise during the annotation process. Existing annotation tools may provide powerful features that cover the various needs of text annotation tasks, but their target end users tend to be trained annotators. Clinical research teams find it challenging to use those tools in their projects because of factors such as the complexity of advanced features and data security concerns. To address these challenges, we developed MedTator, a serverless web-based annotation tool with an intuitive, user-centered interface that aims to provide a lightweight solution for the core tasks of corpus development. We also present three lessons learned from designing and developing MedTator, which will contribute to the research community's knowledge for future open-source tool development.

Subjects

Natural Language Processing , Humans

17.

Bridging the Granularity Gap in Family History Information Extracted from Clinical Narratives.

Moon, Sungrim; Wang, Liwei; Chen, Xuan; Wang, Nan; Manemann, Sheila M; Larson, Nicholas B; Bielinski, Suzette J; Liu, Hongfang.

AMIA Annu Symp Proc ; 2022: 795-804, 2022.

Article in English | MEDLINE | ID: mdl-37128427

ABSTRACT

Family history (FH) is important for disease risk assessment and prevention. However, incorporating FH information derived from electronic health records (EHRs) into downstream analytics is challenging because of the lack of standardization. We aimed to automatically align FH concepts derived from a clinical corpus to widely used disease category resources, including the Clinical Classification System (CCS), Phecode, the Comparative Toxicogenomics Database (CTD), the Human Phenotype Ontology, and the Human Disease Ontology (HDO). Leveraging the Unified Medical Language System (UMLS), we achieved high mapping coverage of FH concepts in those resources by using the parent and broader/alike relations available in the UMLS. Among the five resources, CTD had the best coverage (93%) of FH concepts, HDO had the coarsest granularity of FH disease categories, and CCS had the finest. The study suggests that these ontologies and terminological resources can mitigate the challenge posed by the varying granularity of NLP-derived FH.
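The mapping step can be pictured as climbing a concept hierarchy until a category used by the target resource is reached. The sketch below assumes a toy in-memory graph. The concept names, the `broader` dictionary, and both target resources are hypothetical stand-ins, and the code does not read actual UMLS tables, where such edges would come from relations such as parent (PAR) and broader (RB). It also computes coverage, the share of FH concepts that reach at least one category, and shows how a coarser resource maps the same concept to a more general category.

```python
from collections import deque
from typing import Dict, List, Set


def map_to_resource(concept: str, broader: Dict[str, List[str]], targets: Set[str]) -> Set[str]:
    """Return the first categories in `targets` reached on each upward path from `concept`.

    Breadth-first search over parent/broader edges; a branch stops climbing
    as soon as it reaches a concept that the target resource contains.
    """
    found: Set[str] = set()
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if current in targets:
            found.add(current)
            continue  # do not climb past a matched category
        for parent in broader.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


def coverage(concepts: List[str], broader: Dict[str, List[str]], targets: Set[str]) -> float:
    """Fraction of concepts that map to at least one category in `targets`."""
    if not concepts:
        return 0.0
    mapped = sum(1 for c in concepts if map_to_resource(c, broader, targets))
    return mapped / len(concepts)


# Hypothetical toy hierarchy (child -> parent/broader concepts), not real UMLS data.
broader = {
    "breast carcinoma": ["malignant neoplasm of breast"],
    "malignant neoplasm of breast": ["neoplasm of breast"],
    "neoplasm of breast": ["neoplasm"],
    "type 2 diabetes": ["diabetes mellitus"],
    "diabetes mellitus": ["endocrine disease"],
}
fine_resource = {"malignant neoplasm of breast", "diabetes mellitus"}
coarse_resource = {"neoplasm", "endocrine disease"}
fh_concepts = ["breast carcinoma", "type 2 diabetes", "unlinked concept"]

print(map_to_resource("breast carcinoma", broader, fine_resource))    # {'malignant neoplasm of breast'}
print(map_to_resource("breast carcinoma", broader, coarse_resource))  # {'neoplasm'}
print(coverage(fh_concepts, broader, fine_resource))
```

Stopping at the first matched category on each path keeps the mapping as specific as the target resource allows. A coarse resource therefore absorbs detailed FH concepts into broad categories, which is the granularity trade-off the abstract describes.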

Subjects

Narration , Unified Medical Language System , Humans , Databases, Factual , Electronic Health Records , Risk Assessment , Natural Language Processing

18.

Artificial intelligence-assisted clinical decision support for childhood asthma management: A randomized clinical trial.

Seol, Hee Yun; Shrestha, Pragya; Muth, Joy Fladager; Wi, Chung-Il; Sohn, Sunghwan; Ryu, Euijung; Park, Miguel; Ihrke, Kathy; Moon, Sungrim; King, Katherine; Wheeler, Philip; Borah, Bijan; Moriarty, James; Rosedahl, Jordan; Liu, Hongfang; McWilliams, Deborah B; Juhn, Young J.

PLoS One ; 16(8): e0255261, 2021.

Article in English | MEDLINE | ID: mdl-34339438

ABSTRACT

RATIONALE: Clinical decision support (CDS) tools leveraging electronic health records (EHRs) have been one approach to addressing challenges in asthma care, but they remain under-studied in clinical trials. OBJECTIVES: To assess the effectiveness and efficiency of the Asthma-Guidance and Prediction System (A-GPS), an artificial intelligence (AI)-assisted CDS tool, in optimizing asthma management through a randomized clinical trial (RCT). METHODS: This was a single-center, pragmatic RCT with a stratified randomization design, conducted for one year in the primary care pediatric practice of the Mayo Clinic, MN. Children (<18 years) diagnosed with asthma and receiving care at the study site were enrolled along with their 42 primary care providers. Study subjects were stratified into three strata (based on asthma severity, asthma care status, and asthma diagnosis) and were blinded to the assigned groups. MEASUREMENTS: The intervention was a quarterly A-GPS report to clinicians that included relevant clinical information for asthma management from EHRs and a machine learning-based prediction of the risk of asthma exacerbation (AE). The primary endpoint was the occurrence of AE within 1 year; secondary outcomes included the time required for clinicians to review EHRs for asthma management. MAIN RESULTS: Of the 555 participants invited to the study, 184 consented and were randomized (90 to the intervention group and 94 to the control group). The median age of the 184 participants was 8.5 years. Although the proportion of children with AE decreased from baseline in both groups (P = 0.042), AE frequency did not differ between the two groups during the study period (12% for the intervention group vs. 15% for the control group; odds ratio: 0.82; 95% CI 0.374-1.96; P = 0.626). For the secondary endpoints, however, the A-GPS intervention significantly reduced the time needed to review each participant's EHRs for asthma management (median: 3.5 min, IQR: 2-5) compared with usual care without A-GPS (median: 11.3 min, IQR: 6.3-15; p<0.001). The mean change in children's health care costs during the trial (compared with before the trial), with 95% CI, was lower in the intervention group than in the control group (-$1,036 [-$2,177, $44] for the intervention group vs. +$80 [-$841, $1,000] for the control group), although the difference was not significant (p = 0.12). Among those who experienced a first AE during the study period (n = 25), those in the intervention group had timelier follow-up by the clinical care team than those in the control group, but the difference was not significant (HR = 1.93; 95% CI: 0.82-1.45, P = 0.10). There was no difference between the intervention and control groups in the proportion of time during which patients had well-controlled asthma. CONCLUSIONS: While the A-GPS-based intervention showed a reduction in AE events similar to that of usual care, it might reduce clinicians' burden of EHR review, resulting in more efficient asthma management. A larger RCT is needed to further study these findings. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT02865967.
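The abstract states that randomization was stratified but does not describe the allocation mechanism. Purely as an illustration of stratified randomization, the sketch below uses permuted blocks within each stratum. This is a common scheme assumed here, not taken from the trial. The block size, stratum labels, and participant identifiers are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def stratified_block_randomize(
    participants: Iterable[Tuple[str, str]],
    block_size: int = 4,
    seed: Optional[int] = None,
) -> Dict[str, str]:
    """Assign participants 1:1 to 'intervention' or 'control' within each stratum.

    `participants` yields (participant_id, stratum) pairs in enrollment order.
    Each stratum draws from its own shuffled block holding equal numbers of
    both arms; a new block starts when the previous one is used up.
    """
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    pending: Dict[str, List[str]] = defaultdict(list)  # stratum -> unused arms in its block
    assignment: Dict[str, str] = {}
    for pid, stratum in participants:
        if not pending[stratum]:
            block = ["intervention", "control"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        assignment[pid] = pending[stratum].pop()
    return assignment


# Hypothetical enrollment: 8 children in stratum "A", 4 in stratum "B".
enrolled = [(f"a{i}", "A") for i in range(8)] + [(f"b{i}", "B") for i in range(4)]
arms = stratified_block_randomize(enrolled, block_size=4, seed=7)
for prefix in ("a", "b"):
    n_intervention = sum(1 for pid, arm in arms.items() if pid.startswith(prefix) and arm == "intervention")
    print(prefix, n_intervention)  # a 4, then b 2
```

Because each block holds equal numbers of both arms, allocation stays balanced within every stratum whenever its blocks are complete. Incomplete final blocks are one way overall arm sizes can still differ slightly.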

Subjects

Asthma , Artificial Intelligence , Asthma/drug therapy , Child , Decision Support Systems, Clinical , Humans , Male , Primary Health Care

19.

Longitudinal cohorts for harnessing the electronic health record for disease prediction in a US population.

Manemann, Sheila M; St Sauver, Jennifer L; Liu, Hongfang; Larson, Nicholas B; Moon, Sungrim; Takahashi, Paul Y; Olson, Janet E; Rocca, Walter A; Miller, Virginia M; Therneau, Terry M; Ngufor, Che G; Roger, Veronique L; Zhao, Yiqing; Decker, Paul A; Killian, Jill M; Bielinski, Suzette J.

BMJ Open ; 11(6): e044353, 2021 06 08.

Article in English | MEDLINE | ID: mdl-34103314

ABSTRACT

PURPOSE: The depth and breadth of clinical data within electronic health record (EHR) systems, paired with innovative machine learning methods, can be leveraged to identify novel risk factors for complex diseases. However, analysing the EHR is challenging owing to the complexity and quality of the data. Therefore, we developed large electronic population-based cohorts with comprehensive, harmonised and processed EHR data. PARTICIPANTS: All individuals 30 years of age or older who resided in Olmsted County, Minnesota on 1 January 2006 were identified for the discovery cohort. Algorithms to define a variety of patient characteristics were developed and validated, building a comprehensive risk profile for each patient. Patients are followed for incident diseases and ageing-related outcomes. Using the same methods, an independent validation cohort was assembled by identifying all individuals 30 years of age or older who resided in the largely rural 26-county area of southern Minnesota and western Wisconsin on 1 January 2013. FINDINGS TO DATE: For the discovery cohort, 76 255 individuals (median age 49; 53% women) were identified, for whom a total of 9 644 221 laboratory results, 9 513 840 diagnosis codes, 10 924 291 procedure codes, 1 277 231 outpatient drug prescriptions, 966 136 heart rate measurements and 1 159 836 blood pressure (BP) measurements were retrieved during the baseline time period. The most prevalent conditions in this cohort were hyperlipidaemia, hypertension and arthritis. For the validation cohort, 333 460 individuals (median age 54; 52% women) were identified, and to date a total of 19 926 750 diagnosis codes, 10 527 444 heart rate measurements and 7 356 344 BP measurements have been retrieved for the baseline period. FUTURE PLANS: Using advanced machine learning approaches, these electronic cohorts will be used to identify novel sex-specific risk factors for complex diseases. These approaches will allow us to address several challenges in using the EHR.
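The inclusion rule shared by both cohorts, age 30 or older and residence in the catchment area on a fixed index date, can be expressed as a simple filter. In the sketch below, the `Person` record and its residence-interval fields are hypothetical simplifications. The abstract does not describe how residency was actually ascertained.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    person_id: str
    birth_date: date
    residence_start: date                 # hypothetical field: date of moving into the area
    residence_end: Optional[date] = None  # hypothetical field: None means still resident


def age_on(birth_date: date, on: date) -> int:
    """Completed years of age on a given date."""
    years = on.year - birth_date.year
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached that year
    return years


def select_cohort(people: List[Person], index_date: date, min_age: int = 30) -> List[str]:
    """IDs of people aged >= min_age and resident in the area on index_date."""
    return [
        p.person_id
        for p in people
        if age_on(p.birth_date, index_date) >= min_age
        and p.residence_start <= index_date
        and (p.residence_end is None or p.residence_end >= index_date)
    ]


people = [
    Person("a", date(1970, 6, 15), date(2000, 3, 1)),
    Person("b", date(1976, 1, 2), date(1990, 1, 1)),                      # turns 30 one day late
    Person("c", date(1976, 1, 1), date(1990, 1, 1)),                      # exactly 30 on the index date
    Person("d", date(1950, 5, 5), date(2007, 1, 1)),                      # moved in after the index date
    Person("e", date(1950, 5, 5), date(1980, 1, 1), date(2005, 12, 31)),  # left before the index date
]
print(select_cohort(people, date(2006, 1, 1)))  # ['a', 'c']
```

Called with `date(2013, 1, 1)` on records from the 26-county area, the same function would express the validation cohort's criterion.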

Subjects

Electronic Health Records , Machine Learning , Cohort Studies , Female , Humans , Male , Middle Aged , Minnesota/epidemiology , Wisconsin

20.

Clinical concept extraction: A methodology review.

Fu, Sunyang; Chen, David; He, Huan; Liu, Sijia; Moon, Sungrim; Peterson, Kevin J; Shen, Feichen; Wang, Liwei; Wang, Yanshan; Wen, Andrew; Zhao, Yiqing; Sohn, Sunghwan; Liu, Hongfang.

J Biomed Inform ; 109: 103526, 2020 09.

Article in English | MEDLINE | ID: mdl-32768446

ABSTRACT

BACKGROUND: Concept extraction, a subdomain of natural language processing (NLP) focused on extracting concepts of interest, has been adopted to computationally extract clinical information from text for applications ranging from clinical decision support to care quality improvement. OBJECTIVES: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library for electronic health record (EHR)-based information extraction articles written in English and published from January 2009 through June 2019. RESULTS: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. This review discusses the methods used to develop clinical concept extraction applications.
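Dictionary lookup is among the simplest concept extraction methods. The sketch below shows greedy longest-match lookup against a tiny hypothetical lexicon. It is a minimal illustration only, and it ignores negation, context, abbreviations, and the normalization to standard terminologies that practical clinical systems need.

```python
import re
from typing import Dict, List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def extract_concepts(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str]]:
    """Greedy longest-match dictionary lookup.

    `lexicon` maps a lower-cased term, written as space-separated tokens, to a
    concept label. Returns (matched term, concept label) pairs in text order.
    """
    tokens = TOKEN.findall(text.lower())
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "type 2 diabetes" beats "diabetes".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                matches.append((candidate, lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return matches


# Hypothetical lexicon; real systems map terms to standard terminologies.
lexicon = {
    "type 2 diabetes": "Diabetes mellitus, type 2",
    "diabetes": "Diabetes mellitus",
    "hypertension": "Hypertension",
}
print(extract_concepts("History of type 2 diabetes and hypertension.", lexicon))
# [('type 2 diabetes', 'Diabetes mellitus, type 2'), ('hypertension', 'Hypertension')]
```

Preferring the longest match avoids splitting a multi-word term into weaker partial matches. The same structure is a natural place to attach the rule-based or machine learning components that such a methodology review catalogs.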

Subjects

Information Storage and Retrieval , Natural Language Processing , Bibliometrics , Research Design
