Results 1 - 20 of 34
1.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living (ADLs), including basic ADLs (bADL) and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much functional status information is stored in electronic health records (EHRs) in either semi-structured or free-text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across the four healthcare sites. For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-to-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning with the FedFSA framework outperformed non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.
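
The federated setup described in this abstract can be illustrated with a minimal sketch of weighted parameter averaging. The abstract does not state which aggregation strategy was used, so federated averaging is an assumption here, and the function name, parameter names, and numbers below are hypothetical. The sketch uses plain Python lists rather than the Flower or PyTorch APIs.

```python
from typing import Dict, List, Tuple

# A client's parameters: name -> flat list of floats (hypothetical layout).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its example count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one client must report examples")
    averaged: Weights = {}
    for name in updates[0][0].keys():
        size = len(updates[0][0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in updates) / total
            for i in range(size)
        ]
    return averaged


# Hypothetical example: two sites share one small parameter vector.
# Only parameters leave each site; the local notes stay private.
site_a = ({"classifier.bias": [0.0, 1.0]}, 100)
site_b = ({"classifier.bias": [1.0, 3.0]}, 300)
global_weights = federated_average([site_a, site_b])
```

In this toy case the site with more examples pulls the shared parameters toward its own values, which is the usual motivation for weighting by example count.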


Subjects
Activities of Daily Living , Functional Status , Humans , Aged , Learning , Information Storage and Retrieval , Natural Language Processing
2.
JMIR Med Inform ; 11: e48072, 2023 Jun 27.
Article in English | MEDLINE | ID: mdl-37368483

ABSTRACT

BACKGROUND: A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE: In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS: We exploited a transformer-based method to construct an FH lexical resource leveraging a corpus consisting of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS: The resulting lexicon contains 33,603 lexicon entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average number of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. The combination of the rule-based FH system with a state-of-the-art deep learning-based FH system can improve the recall of FH information evaluated using the BioCreative/N2C2 FH challenge data set, with the F1 score varied but comparable. CONCLUSIONS: The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.

3.
PLoS One ; 18(3): e0283800, 2023.
Article in English | MEDLINE | ID: mdl-37000801

ABSTRACT

BACKGROUND: The incorporation of information from clinical narratives is critical for computational phenotyping. The accurate interpretation of clinical terms highly depends on their associated context, especially the corresponding clinical section information. However, the heterogeneity across different Electronic Health Record (EHR) systems poses challenges in utilizing the section information. OBJECTIVES: Leveraging the eMERGE heart failure (HF) phenotyping algorithm, we assessed the heterogeneity quantitatively through the performance comparison of machine learning (ML) classifiers which map clinical sections containing HF-relevant terms across different EHR systems to standard sections in Health Level 7 (HL7) Clinical Document Architecture (CDA). METHODS: We experimented with both random forest models with sentence-embedding features and bidirectional encoder representations from transformers models. We trained MLs using an automated labeled corpus from an EHR system that adopted HL7 CDA standard. We assessed the performance using a blind test set (n = 300) from the same EHR system and a gold standard (n = 900) manually annotated from three other EHR systems. RESULTS: The F-measure of those ML models varied widely (0.00-0.91%), indicating MLs with one tuning parameter set were insufficient to capture sections across different EHR systems. The error analysis indicates that the section does not always comply with the corresponding standardized sections, leading to low performance. CONCLUSIONS: We presented the potential use of ML techniques to map the sections containing HF-relevant terms in multiple EHR systems to standard sections. However, the findings suggested that the quality and heterogeneity of section structure across different EHRs affect applications due to the poor adoption of documentation standards.


Subjects
Electronic Health Records , Heart Failure , Humans , Software , Algorithms , Machine Learning
4.
Clin Transl Sci ; 16(3): 398-411, 2023 03.
Article in English | MEDLINE | ID: mdl-36478394

ABSTRACT

An increasing number of studies have reported using natural language processing (NLP) to assist observational research by extracting clinical information from electronic health records (EHRs). Currently, no standardized reporting guidelines for NLP-assisted observational studies exist. The absence of detailed reporting guidelines may create ambiguity in the use of NLP-derived content, knowledge gaps in the current research reporting practices, and reproducibility challenges. To address these issues, we conducted a scoping review of NLP-assisted observational clinical studies and examined their reporting practices, focusing on NLP methodology and evaluation. Through our investigation, we discovered a high variation regarding the reporting practices, such as inconsistent use of references for measurement studies, variation in the reporting location (reference, appendix, and manuscript), and different granularity of NLP methodology and evaluation details. To promote the wide adoption and utilization of NLP solutions in clinical research, we outline several perspectives that align with the six principles released by the World Health Organization (WHO) that guide the ethical use of artificial intelligence for health.


Subjects
Artificial Intelligence , Natural Language Processing , Humans , Electronic Health Records , Reproducibility of Results , Observational Studies as Topic
5.
AMIA Annu Symp Proc ; 2023: 987-996, 2023.
Article in English | MEDLINE | ID: mdl-38222440

ABSTRACT

Growing digital access accelerates digital transformation of clinical trials where digital solutions (DSs) are increasingly and widely leveraged for improving trial efficiency, effectiveness, and accessibility. Many factors impact DS success including technology barriers, privacy concerns, or user engagement activities. It is unclear how those factors are considered or reported in the literature. Here, we perform a formative feasibility scoping review to identify gaps impacting DS quality and reproducibility in trials. Articles containing digital terms published in English from 2009 to 2022 were collected (n=4,167). 130 articles published between 2016 and 2022 were randomly selected for full-text review. Eligible articles (n=100) were sorted into four identified categories: 16% Education, 59% Intervention, 8% Patient, 17% Treatment. Initial findings about DS trends and reporting practices inform protocol development for a large-scale study urging the generation of fundamental knowledge on reporting standardization, best practice guidelines, and evaluation methodologies related to DS for clinical trials.


Subjects
Feasibility Studies , Humans , Reproducibility of Results , Clinical Trials as Topic
6.
AMIA Annu Symp Proc ; 2023: 1155-1164, 2023.
Article in English | MEDLINE | ID: mdl-38222426

ABSTRACT

The structure and semantics of clinical notes vary considerably across different Electronic Health Record (EHR) systems, sites, and institutions. Such heterogeneity hampers the portability of natural language processing (NLP) models in extracting information from the text for clinical research or practice. In this study, we evaluate the contextual variation of clinical notes by measuring the semantic and syntactic similarity of the notes of two sets of physicians comprising four medical specialties across EHR migrations at two Mayo Clinic sites. We find significant semantic and syntactic variation imposed by the context of the EHR system and between medical specialties whereas only minor variation is caused by variation of spatial context across sites. Our findings suggest that clinical language models need to account for process differences at the specialty sublanguage level to be generalizable.


Subjects
Electronic Health Records , Physicians , Humans , Semantics , Natural Language Processing , Ambulatory Care Facilities
7.
J Allergy Clin Immunol Glob ; 1(4): 233-240, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36466741

ABSTRACT

Background: The distribution and determinants of blood eosinophil counts in the general population are unclear. Furthermore, whether elevated blood eosinophil counts increase risk for cardiovascular disease (CVD) and other chronic diseases, other than atopic conditions, remains uncertain. Objective: We sought to describe the distribution of eosinophil counts in the general population and determine the association of eosinophil count with prevalent chronic disease and incident CVD. Methods: A population-based adult cohort was followed from January 1, 2006, to December 31, 2020. Electronic health record data regarding demographic characteristics, prevalent clinical characteristics, and incident CVD were extracted. Associations between blood eosinophil counts and demographic characteristics, chronic diseases, laboratory values, and risks of incident CVD were assessed using chi-square test, ANOVA, and Cox proportional hazards regression. Results: Blood eosinophil counts increased with age, body mass index, and reported smoking and tobacco use. The prevalence of chronic obstructive pulmonary disease, hypertension, cardiac arrhythmias, hyperlipidemia, diabetes mellitus, chronic kidney disease, and cancer increased as eosinophil counts increased. Eosinophil counts were significantly associated with coronary heart disease (hazard ratio [HR], 1.44; 95% CI, 1.12-1.84) and heart failure (HR, 1.62; 95% CI, 1.30-2.01) in fully adjusted models and with stroke/transient ischemic attack (HR, 1.37; 95% CI, 1.16-1.61) and CVD death (HR, 1.49; 95% CI, 1.10-2.00) in a model adjusting for age, sex, race, and ethnicity. Conclusions: Blood eosinophil counts differ by demographic and clinical characteristics as well as by prevalent chronic disease. Moreover, elevated eosinophil counts are associated with risk of CVD. Further prospective investigations are needed to determine the utility of eosinophil counts as a biomarker for CVD risk.

8.
Front Digit Health ; 4: 958539, 2022.
Article in English | MEDLINE | ID: mdl-36238199

ABSTRACT

The secondary use of electronic health records (EHRs) faces challenges in the form of varying data quality-related issues. To address this, we retrospectively assessed the quality of functional status documentation in the EHRs of persons participating in the Mayo Clinic Study of Aging (MCSA). We used a convergent parallel design to collect quantitative and qualitative data and independently analyzed the findings. We discovered a heterogeneous documentation process, where the care practice teams, institutions, and EHR systems all play an important role in how text data is documented and organized. Four prevalent instrument-assisted documentation (iDoc) expressions were identified based on three distinct instruments: Epic smart form, questionnaire, and occupational therapy and physical therapy templates. We found strong differences in the usage, information quality (intrinsic and contextual), and naturality of language among different types of iDoc expressions. These variations can be caused by different source instruments, information providers, practice settings, care events, and institutions. In addition, iDoc expressions are context specific and should therefore not be viewed and processed uniformly. We recommend conducting data quality assessment of unstructured EHR text prior to using the information.

9.
JCO Clin Cancer Inform ; 6: e2200006, 2022 07.
Article in English | MEDLINE | ID: mdl-35917480

ABSTRACT

PURPOSE: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS: Published literature was searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020 from main literature databases. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data into four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS: A total of 123 publications were finally selected and included in our analysis. We found that cancer research and patient care require some data elements beyond mCODE, as expected. Transparency and reproducibility are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.


Subjects
Natural Language Processing , Neoplasms , Electronic Health Records , Humans , Information Storage and Retrieval , Neoplasms/diagnosis , Neoplasms/therapy , Patient Care
10.
Int J Med Inform ; 165: 104833, 2022 09.
Article in English | MEDLINE | ID: mdl-35868231

ABSTRACT

RATIONALE: We performed a scoping review of informatics core literature about medical practice variation (MPV) as an agile summary of the subject in our field. MATERIALS AND METHODS: The Ovid integrated database was searched between 1946 and 2022 to identify MPV studies published in major informatics journals and conference proceedings. Two reviewers performed relevance screening, with assistance from another independent reviewer for adjudication. The included articles were then thematically analyzed and summarized through discussion among all three reviewers. RESULTS: A total of 43 articles were included and went through the thematic analysis. About half (n = 21) of the included articles were published in conference proceedings. Five articles reported the effect of MPV on patient outcomes. The variation of interest was most frequently in treatment decisions. In terms of the role informatics played (multiple roles allowed), 39 (90.7%) articles pertained to detection of MPV, 5 were about prevention of MPV, and 4 were about learning from MPV. DISCUSSION: MPV remains a critical issue in health care, yet most informatics research has focused on simple tasks such as automating the detection of MPV and assessing compliance with decision-support systems, and less on addressing the causes of variation or supporting learning from variation. CONCLUSION: Our scoping review found that informatics studies have focused on detecting MPV, especially variability in treatments and deviation from practice guidelines. Technological advances should promote more informatics research focused on explaining and learning from MPV.


Subjects
Biomedical Research , Medical Informatics , Delivery of Health Care , Humans
11.
NPJ Digit Med ; 5(1): 77, 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35701544

ABSTRACT

Computational drug repurposing methods adapt Artificial intelligence (AI) algorithms for the discovery of new applications of approved or investigational drugs. Among the heterogeneous datasets, electronic health records (EHRs) datasets provide rich longitudinal and pathophysiological data that facilitate the generation and validation of drug repurposing. Here, we present an appraisal of recently published research on computational drug repurposing utilizing the EHR. Thirty-three research articles, retrieved from Embase, Medline, Scopus, and Web of Science between January 2000 and January 2022, were included in the final review. Four themes, (1) publication venue, (2) data types and sources, (3) method for data processing and prediction, and (4) targeted disease, validation, and released tools were presented. The review summarized the contribution of EHR used in drug repurposing as well as revealed that the utilization is hindered by the validation, accessibility, and understanding of EHRs. These findings can support researchers in the utilization of medical data resources and the development of computational methods for drug repurposing.

12.
J Med Internet Res ; 24(1): e29015, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35089141

ABSTRACT

BACKGROUND: Electronic health records (EHRs) are a rich source of longitudinal patient data. However, missing information due to clinical care that predated the implementation of EHR system(s) or care that occurred at different medical institutions impedes complete ascertainment of a patient's medical history. OBJECTIVE: This study aimed to investigate information discrepancies and to quantify information gaps by comparing the gynecological surgical history extracted from an EHR of a single institution by using natural language processing (NLP) techniques with the manually curated surgical history information through chart review of records from multiple independent regional health care institutions. METHODS: To facilitate high-throughput evaluation, we developed a rule-based NLP algorithm to detect gynecological surgery history from the unstructured narrative of the Mayo Clinic EHR. These results were compared to a gold standard cohort of 3870 women with gynecological surgery status adjudicated using the Rochester Epidemiology Project medical records-linkage system. We quantified and characterized the information gaps observed that led to misclassification of the surgical status. RESULTS: The NLP algorithm achieved precision of 0.85, recall of 0.82, and F1-score of 0.83 in the test set (n=265) relative to outcomes abstracted from the Mayo EHR. This performance attenuated when directly compared to the gold standard (precision 0.79, recall 0.76, and F1-score 0.76), with the majority of misclassifications being false negatives in nature. We then applied the algorithm to the remaining patients (n=3340) and identified 2 types of information gaps through error analysis. First, 6% (199/3340) of women in this study had no recorded surgery information or partial information in the EHR. 
Second, 4.3% (144/3340) of women had inconsistent or inaccurate information within the clinical narrative owing to misinterpreted information, erroneous "copy and paste," or incorrect information provided by patients. Additionally, the NLP algorithm misclassified the surgery status of 3.6% (121/3340) of women. CONCLUSIONS: Although NLP techniques were able to adequately recreate the gynecologic surgical status from the clinical narrative, missing or inaccurately reported and recorded information resulted in much of the misclassification observed. Therefore, alternative approaches to collect or curate surgical history are needed.
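
As a quick check on how the figures in this abstract fit together, the following sketch recomputes the F1-score from the reported precision and recall and the percentages from the reported counts. The helper names are hypothetical. All inputs are taken from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


# Test-set performance relative to the Mayo EHR (precision 0.85, recall 0.82).
test_set_f1 = f1_score(0.85, 0.82)

# Information gaps and NLP misclassifications among the remaining 3340 patients.
missing_or_partial = percent(199, 3340)
inconsistent = percent(144, 3340)
nlp_misclassified = percent(121, 3340)
```

Because the published precision and recall are themselves rounded, an F1-score recomputed this way can differ from a reported value in the last digit.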


Subjects
Electronic Health Records , Natural Language Processing , Algorithms , Cohort Studies , Female , Gynecologic Surgical Procedures , Humans
13.
AMIA Annu Symp Proc ; 2022: 532-541, 2022.
Article in English | MEDLINE | ID: mdl-37128369

ABSTRACT

A gold standard annotated corpus is usually indispensable when developing natural language processing (NLP) systems. Building a high-quality annotated corpus for clinical NLP requires considerable time and domain expertise during the annotation process. Existing annotation tools may provide powerful features to cover various needs of text annotation tasks, but the target end users tend to be trained annotators. It is challenging for clinical research teams to utilize those tools in their projects due to various factors such as the complexity of advanced features and data security concerns. To address those challenges, we developed MedTator, a serverless web-based annotation tool with an intuitive user-centered interface aiming to provide a lightweight solution for the core tasks in corpus development. Moreover, we present three lessons learned from designing and developing MedTator, which will contribute to the research community's knowledge for future open-source tool development.


Subjects
Natural Language Processing , Humans
14.
AMIA Annu Symp Proc ; 2022: 795-804, 2022.
Article in English | MEDLINE | ID: mdl-37128427

ABSTRACT

Family history (FH) is important for disease risk assessment and prevention. However, incorporating FH information derived from electronic health records (EHRs) for downstream analytics is challenging due to the lack of standardization. We aimed to automatically align FH concepts derived from a clinical corpus to commonly used disease category resources, including Clinical Classification System (CCS), Phecode, Comparative Toxicogenomics Database (CTD), Human Phenotype Ontology, and Human Disease Ontology (HDO). Leveraging the Unified Medical Language System (UMLS), we achieved high mapping coverage of FH concepts in those resources, using the parent and broader/alike relations available in the UMLS. Among the five resources, CTD has the best coverage (93%) of FH concepts, HDO has the coarsest granularity of FH disease categories, and CCS has the finest granularity. The study suggests that we can mitigate the challenge of varying degrees of granularity in NLP-derived FH by using these ontological or terminological resources.


Subjects
Narration , Unified Medical Language System , Humans , Databases, Factual , Electronic Health Records , Risk Assessment , Natural Language Processing
15.
PLoS One ; 16(8): e0255261, 2021.
Article in English | MEDLINE | ID: mdl-34339438

ABSTRACT

RATIONALE: Clinical decision support (CDS) tools leveraging electronic health records (EHRs) have been an approach for addressing challenges in asthma care but remain under-studied through clinical trials. OBJECTIVES: To assess the effectiveness and efficiency of Asthma-Guidance and Prediction System (A-GPS), an Artificial Intelligence (AI)-assisted CDS tool, in optimizing asthma management through a randomized clinical trial (RCT). METHODS: This was a single-center pragmatic RCT with a stratified randomization design conducted for one year in the primary care pediatric practice of the Mayo Clinic, MN. Children (<18 years) diagnosed with asthma receiving care at the study site were enrolled along with their 42 primary care providers. Study subjects were stratified into three strata (based on asthma severity, asthma care status, and asthma diagnosis) and were blinded to the assigned groups. MEASUREMENTS: Intervention was a quarterly A-GPS report to clinicians including relevant clinical information for asthma management from EHRs and machine learning-based prediction for risk of asthma exacerbation (AE). Primary endpoint was the occurrence of AE within 1 year and secondary outcomes included time required for clinicians to review EHRs for asthma management. MAIN RESULTS: Out of 555 participants invited to the study, 184 consented for the study and were randomized (90 in intervention and 94 in control group). Median age of 184 participants was 8.5 years. While the proportion of children with AE in both groups decreased from the baseline (P = 0.042), there was no difference in AE frequency between the two groups (12% for the intervention group vs. 15% for the control group, Odds Ratio: 0.82; 95%CI 0.374-1.96; P = 0.626) during the study period. 
For the secondary end points, however, the A-GPS intervention significantly reduced the time for reviewing EHRs for asthma management of each participant (median: 3.5 min, IQR: 2-5) compared to usual care without A-GPS (median: 11.3 min, IQR: 6.3-15; p<0.001). Mean health care costs with 95%CI of children during the trial (compared to before the trial) in the intervention group were lower than those in the control group (-$1,036 [-$2177, $44] for the intervention group vs. +$80 [-$841, $1000] for the control group), though there was no significant difference (p = 0.12). Among those who experienced the first AE during the study period (n = 25), those in the intervention group had timelier follow-up by the clinical care team compared to those in the control group, but no significant difference was found (HR = 1.93; 95% CI: 0.82-1.45, P = 0.10). There was no difference in the proportion of duration when patients had well-controlled asthma during the study period between the intervention and the control groups. CONCLUSIONS: While the A-GPS-based intervention showed a similar reduction in AE events to usual care, it might reduce clinicians' burden for EHR review, resulting in efficient asthma management. A larger RCT is needed to study the findings further. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT02865967.


Subjects
Asthma , Artificial Intelligence , Asthma/drug therapy , Child , Decision Support Systems, Clinical , Humans , Male , Primary Health Care
16.
BMJ Open ; 11(6): e044353, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34103314

ABSTRACT

PURPOSE: The depth and breadth of clinical data within electronic health record (EHR) systems paired with innovative machine learning methods can be leveraged to identify novel risk factors for complex diseases. However, analysing the EHR is challenging due to complexity and quality of the data. Therefore, we developed large electronic population-based cohorts with comprehensive harmonised and processed EHR data. PARTICIPANTS: All individuals 30 years of age or older who resided in Olmsted County, Minnesota on 1 January 2006 were identified for the discovery cohort. Algorithms to define a variety of patient characteristics were developed and validated, thus building a comprehensive risk profile for each patient. Patients are followed for incident diseases and ageing-related outcomes. Using the same methods, an independent validation cohort was assembled by identifying all individuals 30 years of age or older who resided in the largely rural 26-county area of southern Minnesota and western Wisconsin on 1 January 2013. FINDINGS TO DATE: For the discovery cohort, 76 255 individuals (median age 49; 53% women) were identified from which a total of 9 644 221 laboratory results; 9 513 840 diagnosis codes; 10 924 291 procedure codes; 1 277 231 outpatient drug prescriptions; 966 136 heart rate measurements and 1 159 836 blood pressure (BP) measurements were retrieved during the baseline time period. The most prevalent conditions in this cohort were hyperlipidaemia, hypertension and arthritis. For the validation cohort, 333 460 individuals (median age 54; 52% women) were identified and to date, a total of 19 926 750 diagnosis codes, 10 527 444 heart rate measurements and 7 356 344 BP measurements were retrieved during baseline. FUTURE PLANS: Using advanced machine learning approaches, these electronic cohorts will be used to identify novel sex-specific risk factors for complex diseases. These approaches will allow us to address several challenges with the use of EHR.


Subjects
Electronic Health Records , Machine Learning , Cohort Studies , Female , Humans , Male , Middle Aged , Minnesota/epidemiology , Wisconsin
17.
J Biomed Inform ; 109: 103526, 2020 09.
Article in English | MEDLINE | ID: mdl-32768446

ABSTRACT

BACKGROUND: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.


Subjects
Information Storage and Retrieval , Natural Language Processing , Bibliometrics , Research Design
18.
JAMIA Open ; 3(1): 16-20, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32607483

ABSTRACT

OBJECTIVES: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. MATERIALS AND METHODS: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits of different training data and (2) error analysis. RESULTS: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase accuracy by 6%. DISCUSSION: The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. CONCLUSION: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.

19.
NPJ Digit Med ; 2: 130, 2019.
Article in English | MEDLINE | ID: mdl-31872069

ABSTRACT

Data is foundational to high-quality artificial intelligence (AI). Given that a substantial amount of clinically relevant information is embedded in unstructured data, natural language processing (NLP) plays an essential role in extracting valuable information that can benefit decision making, administration reporting, and research. Here, we share several desiderata pertaining to development and usage of NLP systems, derived from two decades of experience implementing clinical NLP at the Mayo Clinic, to inform the healthcare AI community. Using a framework we developed as an example implementation, the desiderata emphasize the importance of a user-friendly platform, efficient collection of domain expert inputs, seamless integration with clinical data, and a highly scalable computing infrastructure.

20.
Int J Med Inform ; 128: 32-38, 2019 08.
Article in English | MEDLINE | ID: mdl-31160009

ABSTRACT

BACKGROUND: The management of hypertrophic cardiomyopathy (HCM) patients requires the knowledge of risk factors associated with sudden cardiac death (SCD). SCD risk factors such as syncope and family history of SCD (FH-SCD) as well as family history of HCM (FH-HCM) are documented in electronic health records (EHRs) as clinical narratives. Automated extraction of risk factors from clinical narratives by natural language processing (NLP) may expedite management workflow of HCM patients. The aim of this study was to develop and deploy NLP algorithms for automated extraction of syncope, FH-SCD, and FH-HCM from clinical narratives. METHODS AND RESULTS: We randomly selected 200 patients from the Mayo HCM registry for development (n = 100) and testing (n = 100) of NLP algorithms for extraction of syncope, FH-SCD as well as FH-HCM from clinical narratives of EHRs. The clinical reference standard was manually abstracted by 2 independent annotators. Performance of NLP algorithms was compared to aggregation and summarization of data entries in the HCM registry for syncope, FH-SCD, and FH-HCM. We also compared the NLP algorithms with billing codes for syncope as well as responses to patient survey questions for FH-SCD and FH-HCM. These analyses demonstrated NLP had superior sensitivity (0.96 vs 0.39, p < 0.001) and comparable specificity (0.90 vs 0.92, p = 0.74) and PPV (0.90 vs 0.83, p = 0.37) compared to billing codes for syncope. For FH-SCD, NLP outperformed survey responses for all parameters (sensitivity: 0.91 vs 0.59, p = 0.002; specificity: 0.98 vs 0.50, p < 0.001; PPV: 0.97 vs 0.38, p < 0.001). NLP also achieved superior sensitivity (0.95 vs 0.24, p < 0.001) with comparable specificity (0.95 vs 1.0, p-value not calculable) and positive predictive value (PPV) (0.92 vs 1.0, p = 0.09) compared to survey responses for FH-HCM. 
CONCLUSIONS: Automated extraction of syncope, FH-SCD and FH-HCM using NLP is feasible and has promise to increase efficiency of workflow for providers managing HCM patients.
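
The sensitivity, specificity, and PPV figures reported above are standard confusion-matrix ratios. The following is a minimal sketch of how such metrics are typically computed when comparing an extraction method against a manually abstracted reference standard; the function name and the counts are hypothetical and for illustration only, and are not taken from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity and PPV from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives against a reference standard.
    A ratio with a zero denominator is returned as NaN.
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else nan  # recall on negatives
    ppv = tp / (tp + fp) if (tp + fp) else nan          # precision
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}


# Hypothetical counts, for illustration only (not the study's data).
example = binary_metrics(tp=45, fp=5, tn=45, fn=5)
print(example)
```

Under these assumed counts, each of the three ratios is 45/50.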


Subjects
Algorithms , Hypertrophic Cardiomyopathy/complications , Sudden Cardiac Death/etiology , Electronic Health Records/statistics & numerical data , Natural Language Processing , Sudden Cardiac Death/prevention & control , Female , Humans , Male , Middle Aged , Prognosis , Risk Factors