Pesquisa | Portal Regional da BVS

Comparison of Grouping Methods for Template Extraction from VA Medical Record Text.

Redd, Andrew M; Gundlapalli, Adi V; Divita, Guy; Tran, Le-Thuy; Pettey, Warren B P; Samore, Matthew H.

Stud Health Technol Inform ; 238: 136-139, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-28679906

RESUMO

We investigate options for grouping templates for the purpose of template identification and extraction from electronic medical records. We sampled a corpus of 1000 documents originating from Veterans Health Administration (VA) electronic medical record. We grouped documents through hashing and binning tokens (Hashed) as well as by the top 5% of tokens identified as important through the term frequency inverse document frequency metric (TF-IDF). We then compared the approaches on the number of groups with 3 or more and the resulting longest common subsequences (LCSs) common to all documents in the group. We found that the Hashed method had a higher success rate for finding LCSs, and longer LCSs than the TF-IDF method, however the TF-IDF approach found more groups than the Hashed and subsequently more long sequences, however the average length of LCSs were lower. In conclusion, each algorithm appears to have areas where it appears to be superior.

Assuntos

Algoritmos , Registros Eletrônicos de Saúde , Processamento de Linguagem Natural , Estados Unidos , United States Department of Veterans Affairs , Veteranos

An Innovative Method for Monitoring Food Quality and the Healthfulness of Consumers' Grocery Purchases.

Tran, Le-Thuy T; Brewster, Philip J; Chidambaram, Valliammai; Hurdle, John F.

Nutrients ; 9(5)2017 May 05.

Artigo em Inglês | MEDLINE | ID: mdl-28475153

RESUMO

This study presents a method laying the groundwork for systematically monitoring food quality and the healthfulness of consumers' point-of-sale grocery purchases. The method automates the process of identifying United States Department of Agriculture (USDA) Food Patterns Equivalent Database (FPED) components of grocery food items. The input to the process is the compact abbreviated descriptions of food items that are similar to those appearing on the point-of-sale sales receipts of most food retailers. The FPED components of grocery food items are identified using Natural Language Processing techniques combined with a collection of food concept maps and relationships that are manually built using the USDA Food and Nutrient Database for Dietary Studies, the USDA National Nutrient Database for Standard Reference, the What We Eat In America food categories, and the hierarchical organization of food items used by many grocery stores. We have established the construct validity of the method using data from the National Health and Nutrition Examination Survey, but further evaluation of validity and reliability will require a large-scale reference standard with known grocery food quality measures. Here we evaluate the method's utility in identifying the FPED components of grocery food items available in a large sample of retail grocery sales data (~190 million transaction records).

Assuntos

Comportamento do Consumidor , Qualidade dos Alimentos , Bases de Dados Factuais , Dieta Saudável , Humanos , Marketing , Inquéritos Nutricionais , Reprodutibilidade dos Testes , Estados Unidos , United States Department of Agriculture

General Symptom Extraction from VA Electronic Medical Notes.

Divita, Guy; Luo, Gang; Tran, Le-Thuy T; Workman, T Elizabeth; Gundlapalli, Adi V; Samore, Matthew H.

Stud Health Technol Inform ; 245: 356-360, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-29295115

RESUMO

There is need for cataloging signs and symptoms, but not all are documented in structured data. The text from clinical records are an additional source of signs and symptoms. We describe a Natural Language Processing (NLP) technique to identify symptoms from text. Using a human-annotated reference corpus from VA electronic medical notes we trained and tested an NLP pipeline to identify and categorize symptoms. The technique includes a model created from an automatic machine learning model selection tool. Tested on a hold-out set, its precision at the mention level was 0.80, recall 0.74 and an overall f-score of 0.80. The tool was scaled-up to process a large corpus of 964,105 patient records.

Assuntos

Mineração de Dados , Aprendizado de Máquina , Processamento de Linguagem Natural , Registros Eletrônicos de Saúde , Humanos

A pilot study of a heuristic algorithm for novel template identification from VA electronic medical record text.

Redd, Andrew M; Gundlapalli, Adi V; Divita, Guy; Carter, Marjorie E; Tran, Le-Thuy; Samore, Matthew H.

J Biomed Inform ; 71S: S68-S76, 2017 07.

Artigo em Inglês | MEDLINE | ID: mdl-27497780

RESUMO

RATIONALE: Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. METHODS: The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates and the templates within those documents can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and performs a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes. RESULTS: The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median number of templates per group was 5.9 and 5, respectively. DISCUSSION: The grouping method was successful in finding like documents containing templates. Of the groups of documents containing templates, the LCS module was successful in deciphering text belonging to the template and text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. We demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records in this pilot study and propose methods to improve performance and scaling up.

Assuntos

Algoritmos , Registros Eletrônicos de Saúde , Heurística , Processamento de Linguagem Natural , Humanos , Projetos Piloto

v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text.

Divita, Guy; Carter, Marjorie E; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H; Gundlapalli, Adi V.

EGEMS (Wash DC) ; 4(3): 1228, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-27683667

RESUMO

INTRODUCTION: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of "best-of-breed" functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. BACKGROUND: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale-up these tools up and provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. INNOVATION: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. DISCUSSION: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. CONCLUSION: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records.

Exploiting the UMLS Metathesaurus for extracting and categorizing concepts representing signs and symptoms to anatomically related organ systems.

Tran, Le-Thuy T; Divita, Guy; Carter, Marjorie E; Judd, Joshua; Samore, Matthew H; Gundlapalli, Adi V.

J Biomed Inform ; 58: 19-27, 2015 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-26362345

RESUMO

OBJECTIVE: To develop a method to exploit the UMLS Metathesaurus for extracting and categorizing concepts found in clinical text representing signs and symptoms to anatomically related organ systems. The overarching goal is to classify patient reported symptoms to organ systems for population health and epidemiological analyses. MATERIALS AND METHODS: Using the concepts' semantic types and the inter-concept relationships as guidance, a selective portion of the concepts within the UMLS Metathesaurus was traversed starting from the concepts representing the highest level organ systems. The traversed concepts were chosen, filtered, and reviewed to obtain the concepts representing clinical signs and symptoms by blocking deviations, pruning superfluous concepts, and manual review. The mapping process was applied to signs and symptoms annotated in a corpus of 750 clinical notes. RESULTS: The mapping process yielded a total of 91,000 UMLS concepts (with approximately 300,000 descriptions) possibly representing physical and mental signs and symptoms that were extracted and categorized to the anatomically related organ systems. Of 1864 distinct descriptions of signs and symptoms found in the 750 document corpus, 1635 of these (88%) were successfully mapped to the set of concepts extracted from the UMLS. Of 668 unique concepts mapped, 603 (90%) were correctly categorized to their organ systems. CONCLUSION: We present a process that facilitates mapping of signs and symptoms to their organ systems. By providing a smaller set of UMLS concepts to use for comparing and matching patient records, this method has the potential to increase efficiency of information extraction pipelines.

Assuntos

Anatomia , Formação de Conceito , Unified Medical Language System , Humanos

Towards Measuring the Food Quality of Grocery Purchases: an Estimation Model of the Healthy Eating Index-2010 Using only Food Item Counts.

Tran, Le-Thuy T; Brewster, Philip J; Chidambaram, Valliammai; Hurdle, John F.

Procedia Food Sci ; 4: 148-159, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26998419

RESUMO

Measuring the quality of food consumed by individuals or groups in the U.S. is essential to informed public health surveillance efforts and sound nutrition policymaking. For example, the Healthy Eating Index-2010 (HEI) is an ideal metric to assess the food quality of households, but the traditional methods of collecting the data required to calculate the HEI are expensive and burdensome. We evaluated an alternative source: rather than measuring the quality of the foods consumers eat, we want to estimate the quality of the foods consumers buy. To accomplish that we need a way of estimating the HEI based solely on the count of food items. We developed an estimation model of the HEI, using an augmented set of the What We Eat In America (WWEIA) food categories. Then we mapped ~92,000 grocery food items to it. The model uses an inverse Cumulative Distribution Function sampling technique. Here we describe the model and report reliability metrics based on NHANES data from 2003-2010.

Scaling Out and Evaluation of OBSecAn, an Automated Section Annotator for Semi-Structured Clinical Documents, on a Large VA Clinical Corpus.

Tran, Le-Thuy T; Divita, Guy; Redd, Andrew; Carter, Marjorie E; Samore, Matthew; Gundlapalli, Adi V.

AMIA Annu Symp Proc ; 2015: 1204-13, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26958260

RESUMO

"Identifying and labeling" (annotating) sections improves the effectiveness of extracting information stored in the free text of clinical documents. OBSecAn, an automated ontology-based section annotator, was developed to identify and label sections of semi-structured clinical documents from the Department of Veterans Affairs (VA). In the first step, the algorithm reads and parses the document to obtain and store information regarding sections into a structure that supports the hierarchy of sections. The second stage detects and makes correction to errors in the parsed structure. The third stage produces the section annotation output using the final parsed tree. In this study, we present the OBSecAn method and its scale to a million document corpus and evaluate its performance in identifying family history sections. We identify high yield sections for this use case from note titles such as primary care and demonstrate a median rate of 99% in correctly identifying a family history section.

Assuntos

Algoritmos , Curadoria de Dados , Processamento de Linguagem Natural , Humanos , Obras Médicas de Referência , Estados Unidos , United States Department of Veterans Affairs

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA