Using Artificial Intelligence With Natural Language Processing to Combine Electronic Health Record's Structured and Free Text Data to Identify Nonvalvular Atrial Fibrillation to Decrease Strokes and Death: Evaluation and Case-Control Study.

Elkin, Peter L; Mullin, Sarah; Mardekian, Jack; Crowner, Christopher; Sakilay, Sylvester; Sinha, Shyamashree; Brady, Gary; Wright, Marcia; Nolen, Kimberly; Trainer, JoAnn; Koppel, Ross; Schlegel, Daniel; Kaushik, Sashank; Zhao, Jane; Song, Buer; Anand, Edwin.

J Med Internet Res ; 23(11): e28946, 2021 11 09.

Article in English | MEDLINE | ID: mdl-34751659

ABSTRACT

BACKGROUND: Nonvalvular atrial fibrillation (NVAF) affects almost 6 million Americans and is a major contributor to stroke but is significantly underdiagnosed and undertreated despite explicit guidelines for oral anticoagulation. OBJECTIVE: The aim of this study is to investigate whether semisupervised natural language processing (NLP) of electronic health record (EHR) free-text information, combined with structured EHR data, improves NVAF discovery and treatment and perhaps offers a method to prevent thousands of deaths and save billions of dollars. METHODS: We abstracted 96,681 participants from the University of Buffalo faculty practice's EHR. NLP was used to index the notes and compare the ability to identify NVAF and to compute the congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, stroke or transient ischemic attack, vascular disease, age 65 to 74 years, sex category (CHA2DS2-VASc) and Hypertension, Abnormal liver/renal function, Stroke history, Bleeding history or predisposition, Labile INR, Elderly, Drug/alcohol usage (HAS-BLED) scores using structured data (International Classification of Diseases codes) versus structured and unstructured data from clinical notes. In addition, we analyzed data from 63,296,120 participants in the Optum and Truven databases to determine the NVAF frequency, the rates of CHA2DS2-VASc ≥2 and no contraindications to oral anticoagulants, the rates of stroke and death in the untreated population, and first-year costs after stroke. RESULTS: The structured-plus-unstructured method would have identified 3,976,056 additional true NVAF cases (P<.001) and improved sensitivity for CHA2DS2-VASc and HAS-BLED scores compared with the structured data alone (P=.002 and P<.001, respectively), yielding a 32.1% improvement. For the United States, this method would prevent an estimated 176,537 strokes, save 10,575 lives, and save >US $13.5 billion.
CONCLUSIONS: Artificial intelligence-informed bio-surveillance combining NLP of free-text information with structured EHR data improves data completeness, prevents thousands of strokes, and saves lives and funds. This method is applicable to many disorders with profound public health consequences.
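
To make the scoring comparison concrete, the following is a minimal Python sketch of how a CHA2DS2-VASc score might be computed from structured codes alone versus structured codes plus findings extracted from notes. The point weights are the standard published CHA2DS2-VASc weights and are not stated in this abstract; the function name, finding labels, and example patient are hypothetical and do not represent the authors' implementation.

```python
from typing import Iterable

# Standard CHA2DS2-VASc points for the non-age, non-sex components.
# Assumption: taken from the published score definition, not from this abstract.
CONDITION_POINTS = {
    "chf": 1,               # congestive heart failure
    "hypertension": 1,
    "diabetes": 1,
    "stroke_or_tia": 2,     # prior stroke or transient ischemic attack
    "vascular_disease": 1,
}


def cha2ds2_vasc(age: int, female: bool, findings: Iterable[str]) -> int:
    """Return the CHA2DS2-VASc score for one patient (hypothetical helper)."""
    present = set(findings)
    score = sum(pts for name, pts in CONDITION_POINTS.items() if name in present)
    if age >= 75:
        score += 2          # age >= 75 years
    elif age >= 65:
        score += 1          # age 65 to 74 years
    if female:
        score += 1          # sex category
    return score


# Hypothetical 70-year-old woman: one coded diagnosis, two found only in notes.
structured_only = {"hypertension"}
from_notes = {"diabetes", "stroke_or_tia"}
combined = structured_only | from_notes

print(cha2ds2_vasc(70, True, structured_only))  # 3
print(cha2ds2_vasc(70, True, combined))         # 6
```

In this made-up case, findings recorded only in free text raise the score from 3 to 6, which illustrates why adding note-derived data can change who is flagged for anticoagulation.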

Subject(s)

Atrial Fibrillation , Stroke , Aged , Anticoagulants , Artificial Intelligence , Atrial Fibrillation/drug therapy , Atrial Fibrillation/prevention & control , Case-Control Studies , Electronic Health Records , Humans , Natural Language Processing , Risk Assessment , Risk Factors , Stroke/prevention & control

Rosacea Patients Are at Higher Risk for Obstructive Sleep Apnea: Automated Retrospective Research.

Elkin, Peter L; Mullin, Sarah; Sakilay, Sylvester.

Stud Health Technol Inform ; 270: 1381-1382, 2020 Jun 16.

Article in English | MEDLINE | ID: mdl-32570669

ABSTRACT

Using big data science, we employ NLP and a novel interface, the BMI Investigator, to answer clinically meaningful questions. The use case presented is the association between Rosacea and Obstructive Sleep Apnea.

Subject(s)

Rosacea , Sleep Apnea, Obstructive , Body Mass Index , Humans , Retrospective Studies , Rosacea/complications , Sleep Apnea, Obstructive/etiology

Comparison of Changes in the Number of Included Patients Between Interventional Trials and Observational Studies Published from 1995 to 2014 in Three Leading Journals.

Dezetree, Arnaud; Chazard, Emmanuel; Schlegel, Daniel R; Sakilay, Sylvester; Elkin, Peter L; Ficheur, Grégoire.

Stud Health Technol Inform ; 255: 50-54, 2018.

Article in English | MEDLINE | ID: mdl-30306905

ABSTRACT

INTRODUCTION: Since the late 1990s, research and administrative institutions have been developing health data warehouses and increasingly reusing claims data. The impact of these changes is not yet completely quantified. Our objective was to compare the change in the number of patients included per study between observational and interventional studies over a 20-year period starting in 1995. MATERIALS AND METHODS: We extracted all abstracts from studies published in three leading medical journals over the period 1995-2014 (18,107 studies). We then proceeded in two steps. First, we constructed a support vector machine (SVM)-based predictive model to categorize each abstract as an "observational", "interventional" or "other" study. Second, we built an algorithm based on regular expressions to automatically extract the number of included patients. RESULTS: During the investigated period, the median number of enrolled patients per study increased for interventional studies, from 282 in 1995-1999 to 629 in 2010-2014. Over the same period, the median number of patients increased more for observational studies, from 368 in 1995-1999 to 2078 in 2010-2014. DISCUSSION: The routine storage of an increasing amount of data (from data warehouses or claims data) has had an impact in recent years on the number of patients included in observational studies. The recent development of "randomized registry trials", which combine an intervention with identification of the outcome through data reuse, may also have an impact, over the next decade, on the number of patients included in randomized clinical trials.
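
The second step described above, extracting a patient count with regular expressions, can be sketched in Python as follows. This is only an illustration under assumptions: the pattern, the keyword list, and the function name `extract_patient_count` are hypothetical, and the authors' actual expressions are not given in the abstract.

```python
import re
from typing import Optional

# Hypothetical pattern: an integer (with optional thousands separators)
# immediately followed by a word denoting study subjects.
PATIENT_COUNT = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(?:patients|participants|subjects)\b",
    re.IGNORECASE,
)


def extract_patient_count(abstract: str) -> Optional[int]:
    """Return the first patient count found in the text, or None."""
    match = PATIENT_COUNT.search(abstract)
    if match is None:
        return None
    # Drop thousands separators before converting to int.
    return int(match.group(1).replace(",", ""))


print(extract_patient_count("We enrolled 1,234 patients with heart failure."))  # 1234
print(extract_patient_count("A total of 629 participants were randomized."))    # 629
print(extract_patient_count("We reviewed 18,107 studies."))                     # None
```

A production extractor would need more patterns (for example, counts written as words or phrases such as "n = 500"); this sketch only covers the simplest phrasing.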

Subject(s)

Clinical Trials as Topic , Observational Studies as Topic , Periodicals as Topic , Publishing , Clinical Trials as Topic/statistics & numerical data , Humans , Observational Studies as Topic/statistics & numerical data , Publishing/trends , Randomized Controlled Trials as Topic , Registries

Biomedical Informatics Investigator.

Elkin, Peter L; Mullin, Sarah; Sakilay, Sylvester.

Stud Health Technol Inform ; 255: 195-199, 2018.

Article in English | MEDLINE | ID: mdl-30306935

ABSTRACT

The BMI Investigator is a computer human interface built in .NET which allows simultaneous query of structured data such as demographics, administrative codes, medications (coded in RxNorm), laboratory test results (coded in LOINC) and formerly unstructured data in clinical notes (coded in SNOMED CT). The ontology terms identified using SNOMED CT are all coded as either positive, negative or uncertain assertions. They are then, where applicable, built into compositional expressions and stored in both a graph database and a triple store. The SNOMED CT codes are stored in a NoSQL database, Berkeley DB, and the structured data is stored in SQL using the OMOP/OHDSI format. The BMI Investigator also lets you develop models for cohort selection (data-driven recruitment to clinical trials) and automated retrospective research using genomic criteria, and we are currently adding image feature data to the system. We performed a usability experiment, and the users identified some usability flaws, which were used to improve the software. Overall, the BMI Investigator was felt to be usable by subject matter experts. Next steps for the software are to integrate genomic criteria and image features into the query engine.
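
As a rough illustration of querying coded note findings together with structured data, the Python sketch below models each note finding with a positive, negative or uncertain assertion and builds a cohort from both sources. Everything here is an assumption for illustration: the class names, the `cohort` function, and the placeholder concept labels are hypothetical, and the real system's .NET code, storage layers, and SNOMED CT codes are not shown in the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class Assertion(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class NoteFinding:
    patient_id: int
    concept: str            # placeholder label standing in for a SNOMED CT code
    assertion: Assertion


def cohort(concept: str,
           structured_ids: Iterable[int],
           note_findings: Iterable[NoteFinding]) -> Set[int]:
    """Patients with a structured code OR a positive note assertion."""
    from_notes = {
        f.patient_id
        for f in note_findings
        if f.concept == concept and f.assertion is Assertion.POSITIVE
    }
    return set(structured_ids) | from_notes


# Hypothetical data: patient 1 has a structured code; the rest appear only in notes.
findings = [
    NoteFinding(2, "CONCEPT_X", Assertion.POSITIVE),
    NoteFinding(3, "CONCEPT_X", Assertion.NEGATIVE),   # e.g. "no evidence of ..."
    NoteFinding(4, "CONCEPT_X", Assertion.UNCERTAIN),  # e.g. "possible ..."
    NoteFinding(5, "CONCEPT_Y", Assertion.POSITIVE),
]

print(sorted(cohort("CONCEPT_X", [1], findings)))  # [1, 2]
```

Keeping the assertion status on each finding is what lets negated or uncertain mentions be excluded from a cohort instead of being counted as diagnoses.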

Subject(s)

RxNorm , Software , Systematized Nomenclature of Medicine , Humans , Information Storage and Retrieval , Retrospective Studies , Vocabulary, Controlled
