Search | Regional Portal of the BVS

1.

Improving drug safety with adverse event detection using natural language processing.

Botsis, Taxiarchis; Kreimeyer, Kory.

Expert Opin Drug Saf ; 22(8): 659-668, 2023.

Article in English | MEDLINE | ID: mdl-37339273

ABSTRACT

INTRODUCTION: Pharmacovigilance (PV) involves monitoring and aggregating adverse event information from a variety of data sources, including health records, biomedical literature, spontaneous adverse event reports, product labels, and patient-generated content like social media posts, but the most pertinent details in these sources are typically available in narrative free-text formats. Natural language processing (NLP) techniques can be used to extract clinically relevant information from PV texts to inform decision-making. AREAS COVERED: We conducted a non-systematic literature review by querying the PubMed database to examine the uses of NLP in drug safety and distilled the findings to present our expert opinion on the topic. EXPERT OPINION: New NLP techniques and approaches continue to be applied for drug safety use cases; however, systems that are fully deployed and in use in a clinical environment remain vanishingly rare. To see high-performing NLP techniques implemented in the real setting will require long-term engagement with end users and other stakeholders and revised workflows in fully formulated business plans for the targeted use cases. Additionally, we found little to no evidence of extracted information placed into standardized data models, which should be a way to make implementations more portable and adaptable.

Subject(s)

Drug-Related Side Effects and Adverse Reactions , Social Media , Humans , Natural Language Processing , Drug-Related Side Effects and Adverse Reactions/prevention & control , Adverse Drug Reaction Reporting Systems , Pharmacovigilance

2.

Precision Oncology Core Data Model to Support Clinical Genomics Decision Making.

Botsis, Taxiarchis; Murray, Joseph C; Ghanem, Paola; Balan, Archana; Kernagis, Alexander; Hardart, Kent; He, Ting; Spiker, Jonathan; Kreimeyer, Kory; Tao, Jessica; Baras, Alexander S; Yegnasubramanian, Srinivasan; Canzoniero, Jenna; Anagnostou, Valsamo.

JCO Clin Cancer Inform ; 7: e2200108, 2023 04.

Article in English | MEDLINE | ID: mdl-37040583

ABSTRACT

PURPOSE: Precision oncology mandates developing standardized common data models (CDMs) to facilitate analyses and enable clinical decision making. Expert-opinion-based precision oncology initiatives are epitomized in Molecular Tumor Boards (MTBs), which process large volumes of clinical-genomic data to match genotypes with molecularly guided therapies. METHODS: We used the Johns Hopkins University MTB as a use case and developed a precision oncology core data model (Precision-DM) to capture key clinical-genomic data elements. We leveraged existing CDMs, building upon the Minimal Common Oncology Data Elements model (mCODE). Our model was defined as a set of profiles with multiple data elements, focusing on next-generation sequencing and variant annotations. Most elements were mapped to terminologies or code sets and the Fast Healthcare Interoperability Resources (FHIR). We subsequently compared our Precision-DM with existing CDMs, including the National Cancer Institute's Genomic Data Commons (NCI GDC), mCODE, OSIRIS, the clinical Genome Data Model (cGDM), and the genomic CDM (gCDM). RESULTS: Precision-DM contained 16 profiles and 355 data elements. 39% of the elements derived values from selected terminologies or code sets, and 61% were mapped to FHIR. Although we used most elements contained in mCODE, we significantly expanded the profiles to include genomic annotations, resulting in a partial overlap of 50.7% between our core model and mCODE. Limited overlap was noted between Precision-DM and OSIRIS (33.2%), NCI GDC (21.4%), cGDM (9.3%), and gCDM (7.9%). Precision-DM covered most of the mCODE elements (87.7%), with less coverage for OSIRIS (35.8%), NCI GDC (11%), cGDM (26%) and gCDM (33.3%). CONCLUSION: Precision-DM supports clinical-genomic data standardization to support the MTB use case and may allow for harmonized data pulls across health care systems, academic institutions, and community medical centers.
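The abstract reports both an "overlap" and a "coverage" figure for each model pair but does not say how either was computed. The sketch below shows one plausible reading, which is an assumption: overlap as shared elements over the union of both models, and coverage as the share of a reference model's elements found in the core model. The element names are invented toy values, not elements of Precision-DM or mCODE.

```python
def overlap_pct(model_a, model_b):
    """Shared elements as a percentage of the union of both models (assumed definition)."""
    a, b = set(model_a), set(model_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


def coverage_pct(core_model, reference_model):
    """Percentage of the reference model's elements also present in the core model (assumed definition)."""
    core, reference = set(core_model), set(reference_model)
    return 100.0 * len(core & reference) / len(reference) if reference else 0.0


# Hypothetical element names used only to illustrate the two measures.
toy_core = {"patient_id", "cancer_type", "gene", "variant", "variant_annotation"}
toy_reference = {"patient_id", "cancer_type", "gene", "stage"}

print(f"overlap:  {overlap_pct(toy_core, toy_reference):.1f}%")
print(f"coverage: {coverage_pct(toy_core, toy_reference):.1f}%")
```

Under these definitions the two numbers can diverge widely, which would be consistent with a model that covers most of a small reference model while adding many elements of its own.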

Subject(s)

Neoplasms , Humans , Neoplasms/therapy , Precision Medicine/methods , Genomics/methods , Clinical Decision-Making , Decision Making

3.

Trends and opportunities in computable clinical phenotyping: A scoping review.

He, Ting; Belouali, Anas; Patricoski, Jessica; Lehmann, Harold; Ball, Robert; Anagnostou, Valsamo; Kreimeyer, Kory; Botsis, Taxiarchis.

J Biomed Inform ; 140: 104335, 2023 04.

Article in English | MEDLINE | ID: mdl-36933631

ABSTRACT

Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
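The percentages quoted in this abstract can be checked against the stated counts and the 139 included records. The short sketch below only repeats that arithmetic; the dictionary labels are shorthand introduced here, not terms from the paper.

```python
TOTAL_STUDIES = 139  # records that satisfied the inclusion criteria

# label -> (count stated in the abstract, percentage stated in the abstract)
reported = {
    "EHR as primary source": (121, 87.1),
    "ICD codes heavily used": (77, 55.4),
    "common data model compliance": (36, 25.9),
}


def pct(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for label, (count, stated) in reported.items():
    computed = pct(count, TOTAL_STUDIES)
    print(f"{label}: {count}/{TOTAL_STUDIES} = {computed}% (stated {stated}%)")
```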

Subject(s)

Algorithms , Electronic Health Records , Machine Learning , Natural Language Processing , Phenotype

4.

Natural Language Processing Approaches for Retrieval of Clinically Relevant Genomic Information in Cancer.

Botsis, Taxiarchis; Murray, Joseph; Leal, Alessandro; Palsgrove, Doreen; Wang, Wei; White, James R; Velculescu, Victor E; Anagnostou, Valsamo.

Stud Health Technol Inform ; 295: 350-353, 2022 Jun 29.

Article in English | MEDLINE | ID: mdl-35773881

ABSTRACT

The accelerating impact of genomic data in clinical decision-making has generated a paradigm shift from treatment based on the anatomic origin of the tumor to the incorporation of key genomic features to guide therapy. Assessing the clinical validity and utility of the genomic background of a patient's cancer represents one of the emerging challenges in oncology practice, demanding the development of automated platforms for extracting clinically relevant genomic information from medical texts. We developed PubMiner, a natural language processing tool to extract and interpret cancer type, therapy, and genomic information from biomedical abstracts. Our initial focus has been the retrieval of gene names, variants, and negations, where PubMiner performed highly in terms of total recall (91.7%) with a precision of 79.7%. Our next steps include developing a web-based interface to promote personalized treatment based on each tumor's unique genomic fingerprints.

Subject(s)

Natural Language Processing , Neoplasms , Clinical Decision-Making , Genomics , Humans , Medical Oncology , Neoplasms/therapy

5.

Overcoming Major Barriers to Build Efficient Decision Support Systems in Pharmacovigilance.

Kreimeyer, Kory; Spiker, Jonathan; Botsis, Taxiarchis.

Stud Health Technol Inform ; 295: 398-401, 2022 Jun 29.

Article in English | MEDLINE | ID: mdl-35773895

ABSTRACT

Many decision support methods and systems in pharmacovigilance are built without explicitly addressing specific challenges that jeopardize their eventual success. We describe two sets of challenges and appropriate strategies to address them. The first are data-related challenges, which include using extensive multi-source data of poor quality, incomplete information integration, and inefficient data visualization. The second are user-related challenges, which encompass users' overall expectations and their engagement in developing automated solutions. Pharmacovigilance decision support systems will need to rely on advanced methods, such as natural language processing and validated mathematical models, to resolve data-related issues and provide properly contextualized data. However, sophisticated approaches will not provide a complete solution if end-users do not actively participate in their development, which will ensure tools that efficiently complement existing processes without creating unnecessary resistance. Our group has already tackled these issues and applied the proposed strategies in multiple projects.

Subject(s)

Decision Support Systems, Clinical/standards , Decision Support Systems, Management/standards , Natural Language Processing , Pharmacovigilance , Data Accuracy , User-Computer Interface

6.

An Evaluation of Pretrained BERT Models for Comparing Semantic Similarity Across Unstructured Clinical Trial Texts.

Patricoski, Jessica; Kreimeyer, Kory; Balan, Archana; Hardart, Kent; Tao, Jessica; Anagnostou, Valsamo; Botsis, Taxiarchis.

Stud Health Technol Inform ; 289: 18-21, 2022 Jan 14.

Article in English | MEDLINE | ID: mdl-35062081

ABSTRACT

Processing unstructured clinical texts is often necessary to support certain tasks in biomedicine, such as matching patients to clinical trials. Among other methods, domain-specific language models have been built to utilize free-text information. This study evaluated the performance of Bidirectional Encoder Representations from Transformers (BERT) models in assessing the similarity between clinical trial texts. We compared an unstructured aggregated summary of clinical trials reviewed at the Johns Hopkins Molecular Tumor Board with the ClinicalTrials.gov records, focusing on the titles and eligibility criteria. Seven pretrained BERT-Based models were used in our analysis. Of the six biomedical-domain-specific models, only SciBERT outperformed the original BERT model by accurately assigning higher similarity scores to matched than mismatched trials. This finding is promising and shows that BERT and, likely, other language models may support patient-trial matching.
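The abstract does not state which similarity measure was applied to the model outputs. Cosine similarity between text embeddings is a common choice, so the sketch below assumes it. The three-dimensional vectors are made-up stand-ins for embeddings that a BERT-style encoder would produce; no model is loaded here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: a tumor board trial summary and two registry records.
summary_vec = [0.9, 0.1, 0.3]
matched_trial_vec = [0.8, 0.2, 0.4]
mismatched_trial_vec = [0.1, 0.9, 0.2]

matched_score = cosine_similarity(summary_vec, matched_trial_vec)
mismatched_score = cosine_similarity(summary_vec, mismatched_trial_vec)
print(f"matched: {matched_score:.3f}, mismatched: {mismatched_score:.3f}")
```

A model would pass the kind of check described in the abstract when matched pairs score consistently higher than mismatched pairs.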

Subject(s)

Natural Language Processing , Semantics , Clinical Trials as Topic , Humans , Language

7.

Feature engineering and machine learning for causality assessment in pharmacovigilance: Lessons learned from application to the FDA Adverse Event Reporting System.

Kreimeyer, Kory; Dang, Oanh; Spiker, Jonathan; Muñoz, Monica A; Rosner, Gary; Ball, Robert; Botsis, Taxiarchis.

Comput Biol Med ; 135: 104517, 2021 08.

Article in English | MEDLINE | ID: mdl-34130003

ABSTRACT

BACKGROUND: Our objective was to support the automated classification of Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) reports for their usefulness in assessing the possibility of a causal relationship between a drug product and an adverse event. METHOD: We used a data set of 326 redacted FAERS reports that was previously annotated using a modified version of the World Health Organization-Uppsala Monitoring Centre criteria for drug causality assessment by a group of safety evaluators (SEs) at the FDA and supported a similar study on the classification of reports using supervised machine learning and text engineering methods. We explored many potential features, including the incorporation of natural language processing on report text and information from external data sources, for supervised learning and developed models for predicting the classification status of reports. We then evaluated the models on a larger data set of previously unseen reports. RESULTS: The best-performing models achieved recall and F1 scores on both data sets above 0.80 for the identification of assessable reports (i.e., those containing enough information to make an informed causality assessment) and above 0.75 for the identification of reports meeting at least a Possible causality threshold. CONCLUSIONS: Causal inference from FAERS reports depends on many components with complex logical relationships that are yet to be made fully computable. Efforts focused on readily addressable tasks, such as quickly eliminating unassessable reports, fit naturally in SEs' thought processes to provide real enhancements for FDA workflows.

Subject(s)

Drug-Related Side Effects and Adverse Reactions , Pharmacovigilance , Adverse Drug Reaction Reporting Systems , Drug-Related Side Effects and Adverse Reactions/epidemiology , Humans , Machine Learning , United States , United States Food and Drug Administration

8.

Can FHIR Support Standardization in Post-Market Safety Surveillance?

Wang, Xingtong; Lehmann, Harold; Botsis, Taxiarchis.

Stud Health Technol Inform ; 281: 33-37, 2021 May 27.

Article in English | MEDLINE | ID: mdl-34042700

ABSTRACT

The Fast Healthcare Interoperability Resources (FHIR) contain multiple data-exchange standards that aim at optimizing healthcare information exchange. One of them, the FHIR AdverseEvent, may support post-market safety surveillance. We examined its readiness using the Food and Drug Administration's (FDA) Adverse Event Reporting System (FAERS). Our methodology focused on mapping the public FAERS data fields to the FHIR AdverseEvent Resource elements and developing a software tool to automate this process. We mapped directly nine and indirectly two of the twenty-six FAERS elements, while six elements were assigned default values. This exploration further revealed opportunities for adding extra elements to the FHIR standard, based on critical FAERS pieces of information reviewed at the FDA. The existing version of the FHIR AdverseEvent Resource may standardize some of the FAERS information but has to be modified and extended to maximize its value in post-market safety surveillance.
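To make the mapping idea concrete, the following is a minimal sketch of turning one flattened FAERS record into an AdverseEvent-shaped dictionary. It is not the authors' tool or their mapping table. The FAERS field names (`primaryid`, `pt`, `event_dt`) and the choice of target elements are assumptions made for illustration, and the sample row is invented.

```python
def faers_to_fhir_adverse_event(row):
    """Build an AdverseEvent-like dict from a flattened FAERS record (illustrative mapping only)."""
    resource = {
        "resourceType": "AdverseEvent",
        "actuality": "actual",  # default value: the source record has no equivalent field
        "identifier": {"value": row["primaryid"]},  # direct mapping (assumed)
        "event": {"text": row["pt"]},  # reaction term carried over as free text (assumed)
    }
    # Indirect mapping: reformat a YYYYMMDD string to an ISO 8601 date.
    event_dt = row.get("event_dt", "")
    if len(event_dt) == 8 and event_dt.isdigit():
        resource["date"] = f"{event_dt[:4]}-{event_dt[4:6]}-{event_dt[6:]}"
    return resource


# Invented example row; values do not come from any real report.
sample_row = {"primaryid": "100000001", "pt": "Nausea", "event_dt": "20200115"}
print(faers_to_fhir_adverse_event(sample_row))
```

The three branches mirror the categories named in the abstract: a field copied directly, a field that needs transformation, and an element filled with a default.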

Subject(s)

Software , Reference Standards

9.

Information Visualization Platform for Postmarket Surveillance Decision Support.

Spiker, Jonathan; Kreimeyer, Kory; Dang, Oanh; Boxwell, Debra; Chan, Vicky; Cheng, Connie; Gish, Paula; Lardieri, Allison; Wu, Eileen; De, Suranjan; Naidoo, Jarushka; Lehmann, Harold; Rosner, Gary L; Ball, Robert; Botsis, Taxiarchis.

Drug Saf ; 43(9): 905-915, 2020 09.

Article in English | MEDLINE | ID: mdl-32445187

ABSTRACT

INTRODUCTION: The US FDA receives more than 2 million postmarket reports each year. Safety Evaluators (SEs) review these reports, as well as external information, to identify potential safety signals. With the increasing number of reports and the size of external information, more efficient solutions for data integration and decision making are needed. OBJECTIVES: The aim of this study was to develop an interactive decision support application for drug safety surveillance that integrates and visualizes information from postmarket reports, product labels, and biomedical literature. METHODS: We conducted multiple meetings with a group of seven SEs at the FDA to collect the requirements for the Information Visualization Platform (InfoViP). Using infographic design principles, we implemented the InfoViP prototype version as a modern web application using the integrated information collected from the FDA Adverse Event Reporting System, the DailyMed repository, and PubMed. The same group of SEs evaluated the InfoViP prototype functionalities using a simple evaluation form and provided input for potential enhancements. RESULTS: The SEs described their workflows and overall expectations around the automation of time-consuming tasks, including the access to the visualization of external information. We developed a set of wireframes, shared them with the SEs, and finalized the InfoViP design. The InfoViP prototype architecture relied on a javascript and a python-based framework, as well as an existing tool for the processing of free-text information in all sources. This natural language processing tool supported multiple functionalities, especially the construction of time plots for individual postmarket reports and groups of reports. Overall, we received positive comments from the SEs during the InfoViP prototype evaluation and addressed their suggestions in the final version. 
CONCLUSIONS: The InfoViP system uses context-driven interactive visualizations and informatics tools to assist FDA SEs in synthesizing data from multiple sources for their case series analyses.

Subject(s)

Decision Support Techniques , Geographic Information Systems , Image Processing, Computer-Assisted , Product Surveillance, Postmarketing , Humans , Natural Language Processing , United States , United States Food and Drug Administration

10.

Visual storytelling enhances knowledge dissemination in biomedical science.

Botsis, Taxiarchis; Fairman, Jennifer E; Moran, Meghan Bridgid; Anagnostou, Valsamo.

J Biomed Inform ; 107: 103458, 2020 07.

Article in English | MEDLINE | ID: mdl-32445856

ABSTRACT

Research findings in biomedical science are often summarized in statistical plots and sophisticated data presentations. Such visualizations are challenging for people who lack the appropriate scientific background or even experts who work in other areas. Scientists have to maximize knowledge dissemination by improving the communication of their findings to the public. To address the need for compelling and successful information visualizations in biomedical science, we propose a new theoretical framework for Visual Storytelling and illustrate its potential application through two visual stories, one on vaccine safety and one on cancer immunotherapy. In both examples, we rely on solid data and combine multiple media (photographs, illustrations, choropleth maps, tables, graphs, and charts) with text to create powerful visual stories for the selected target audiences. If fully validated, the proposed theory may shed light into non-traditional techniques for building visual stories and further the agenda of creating compelling information visualizations.

Subject(s)

Communication , Knowledge , Humans , Information Dissemination

11.

Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes.

Woldaregay, Ashenafi Zebene; Årsand, Eirik; Walderhaug, Ståle; Albers, David; Mamykina, Lena; Botsis, Taxiarchis; Hartvigsen, Gunnar.

Artif Intell Med ; 98: 109-134, 2019 07.

Article in English | MEDLINE | ID: mdl-31383477

ABSTRACT

BACKGROUND: Diabetes mellitus (DM) is a metabolic disorder that causes abnormal blood glucose (BG) regulation that might result in short and long-term health complications and even death if not properly managed. Currently, there is no cure for diabetes. However, self-management of the disease, especially keeping BG in the recommended range, is central to the treatment. This includes actively tracking BG levels and managing physical activity, diet, and insulin intake. The recent advancements in diabetes technologies and self-management applications have made it easier for patients to have more access to relevant data. In this regard, the development of an artificial pancreas (a closed-loop system), personalized decision systems, and BG event alarms are becoming more apparent than ever. Techniques such as predicting BG (modeling of a personalized profile), and modeling BG dynamics are central to the development of these diabetes management technologies. The increased availability of sufficient patient historical data has paved the way for the introduction of machine learning and its application for intelligent and improved systems for diabetes management. The capability of machine learning to solve complex tasks with dynamic environment and knowledge has contributed to its success in diabetes research. MOTIVATION: Recently, machine learning and data mining have become popular, with their expanding application in diabetes research and within BG prediction services in particular. Despite the increasing and expanding popularity of machine learning applications in BG prediction services, updated reviews that map and materialize the current trends in modeling options and strategies are lacking within the context of BG prediction (modeling of personalized profile) in type 1 diabetes. 
OBJECTIVE: The objective of this review is to develop a compact guide regarding modeling options and strategies of machine learning and a hybrid system focusing on the prediction of BG dynamics in type 1 diabetes. The review covers machine learning approaches pertinent to the controller of an artificial pancreas (closed-loop systems), modeling of personalized profiles, personalized decision support systems, and BG alarm event applications. Generally, the review will identify, assess, analyze, and discuss the current trends of machine learning applications within these contexts. METHOD: A rigorous literature review was conducted between August 2017 and February 2018 through various online databases, including Google Scholar, PubMed, ScienceDirect, and others. Additionally, peer-reviewed journals and articles were considered. Relevant studies were first identified by reviewing the title, keywords, and abstracts as preliminary filters with our selection criteria, and then we reviewed the full texts of the articles that were found relevant. Information from the selected literature was extracted based on predefined categories, which were based on previous research and further elaborated through brainstorming among the authors. RESULTS: The initial search was done by analyzing the title, abstract, and keywords. A total of 624 papers were retrieved from DBLP Computer Science (25), Diabetes Technology and Therapeutics (31), Google Scholar (193), IEEE (267), Journal of Diabetes Science and Technology (31), PubMed/Medline (27), and ScienceDirect (50). After removing duplicates from the list, 417 records remained. Then, we independently assessed and screened the articles based on the inclusion and exclusion criteria, which eliminated another 204 papers, leaving 213 relevant papers. After a full-text assessment, 55 articles were left, which were critically analyzed. 
The inter-rater agreement was measured using a Cohen Kappa test, and disagreements were resolved through discussion. CONCLUSION: Due to the complexity of BG dynamics, it remains difficult to achieve a universal model that produces an accurate prediction in every circumstance (i.e., hypo/eu/hyperglycemia events). Recently, machine learning techniques have received wider attention and increased popularity in diabetes research in general and BG prediction in particular, coupled with the ever-growing availability of self-collected health data. The state-of-the-art demonstrates that various machine learning techniques have been tested to predict BG, such as recurrent neural networks, feed-forward neural networks, support vector machines, self-organizing maps, the Gaussian process, genetic algorithm and programs, deep neural networks, and others, using various groups of input parameters and training algorithms. The main limitation of the current approaches is the lack of a well-defined approach to estimate carbohydrate intake, which is mainly done manually by individual users and is prone to errors that can severely affect the predictive performance. Moreover, a universal approach has not been established to estimate and quantify the approximate effect of physical activities, stress, and infections on the BG level. No researchers have assessed model predictive performance during stress and infection incidences in a free-living condition, which should be considered in future studies. Furthermore, little has been done regarding model portability that can capture inter- and intra-variability among patients. It seems that the effect of time lags between the continuous glucose monitoring (CGM) readings and the actual BG levels is not well covered. 
However, in general, we foresee that these developments might foster the advancement of next-generation BG prediction algorithms, which will make a great contribution in the effort to develop the long-awaited, so-called artificial pancreas (a closed-loop system).
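The abstract mentions that inter-rater agreement was measured with Cohen's kappa. For readers unfamiliar with the statistic, here is a minimal sketch of the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance. The screening decisions below are invented toy data, not the reviewers' actual ratings.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical include/exclude screening decisions for eight papers.
reviewer_1 = ["include", "include", "exclude", "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include", "exclude", "include", "exclude"]

print(f"kappa = {cohen_kappa(reviewer_1, reviewer_2):.3f}")
```

In this toy case the raters agree on six of eight papers (75%), yet kappa is well below 0.75 because a sizeable share of that agreement would be expected by chance alone.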

Subject(s)

Blood Glucose/metabolism , Diabetes Mellitus, Type 1/metabolism , Machine Learning , Patient-Specific Modeling , Blood Glucose Self-Monitoring , Data Mining , Diabetes Mellitus, Type 1/drug therapy , Diet , Exercise , Feeding Behavior , Humans , Hypoglycemic Agents/therapeutic use , Insulin/therapeutic use , Mobile Applications , Models, Biological , Stress, Psychological , Wearable Electronic Devices

12.

Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes.

Woldaregay, Ashenafi Zebene; Årsand, Eirik; Botsis, Taxiarchis; Albers, David; Mamykina, Lena; Hartvigsen, Gunnar.

J Med Internet Res ; 21(5): e11030, 2019 05 01.

Article in English | MEDLINE | ID: mdl-31042157

ABSTRACT

BACKGROUND: Diabetes mellitus is a chronic metabolic disorder that results in abnormal blood glucose (BG) regulations. The BG level is preferably maintained close to normality through self-management practices, which involves actively tracking BG levels and taking proper actions including adjusting diet and insulin medications. BG anomalies could be defined as any undesirable reading because of either a precisely known reason (normal cause variation) or an unknown reason (special cause variation) to the patient. Recently, machine-learning applications have been widely introduced within diabetes research in general and BG anomaly detection in particular. However, irrespective of their expanding and increasing popularity, there is a lack of up-to-date reviews that materialize the current trends in modeling options and strategies for BG anomaly classification and detection in people with diabetes. OBJECTIVE: This review aimed to identify, assess, and analyze the state-of-the-art machine-learning strategies and their hybrid systems focusing on BG anomaly classification and detection including glycemic variability (GV), hyperglycemia, and hypoglycemia in type 1 diabetes within the context of personalized decision support systems and BG alarm events applications, which are important constituents for optimal diabetes self-management. METHODS: A rigorous literature search was conducted between September 1 and October 1, 2017, and October 15 and November 5, 2018, through various Web-based databases. Peer-reviewed journals and articles were considered. Information from the selected literature was extracted based on predefined categories, which were based on previous research and further elaborated through brainstorming. RESULTS: The initial results were vetted using the title, abstract, and keywords and retrieved 496 papers. After a thorough assessment and screening, 47 articles remained, which were critically analyzed. 
The interrater agreement was measured using a Cohen kappa test, and disagreements were resolved through discussion. The state-of-the-art classes of machine learning have been developed and tested up to the task and achieved promising performance including artificial neural network, support vector machine, decision tree, genetic algorithm, Gaussian process regression, Bayesian neural network, deep belief network, and others. CONCLUSIONS: Despite the complexity of BG dynamics, there are many attempts to capture hypoglycemia and hyperglycemia incidences and the extent of an individual's GV using different approaches. Recently, the advancement of diabetes technologies and continuous accumulation of self-collected health data have paved the way for popularity of machine learning in these tasks. According to the review, most of the identified studies used a theoretical threshold, which suffers from inter- and intrapatient variation. Therefore, future studies should consider the difference among patients and also track its temporal change over time. Moreover, studies should also give more emphasis on the types of inputs used and their associated time lag. Generally, we foresee that these developments might encourage researchers to further develop and test these systems on a large-scale basis.

Subject(s)

Blood Glucose/metabolism , Diabetes Mellitus, Type 1/classification , Algorithms , Blood Glucose/analysis , Diabetes Mellitus, Type 1/blood , Diabetes Mellitus, Type 1/complications , Female , Humans , Machine Learning , Male

13.

Adverse Event extraction from Structured Product Labels using the Event-based Text-mining of Health Electronic Records (ETHER) system.

Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard; Botsis, Taxiarchis.

Health Informatics J ; 25(4): 1232-1243, 2019 12.

Article in English | MEDLINE | ID: mdl-29359620

ABSTRACT

Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated Event-based Text-mining of Health Electronic Record's ability to extract and encode Adverse Event terms from Structured Product Labels which may potentially support multiple pharmacoepidemiological tasks.
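Because Structured Product Labels are XML documents, isolating the Adverse Reactions section is a natural first step before any term extraction. The sketch below shows that step only, using Python's standard library on a tiny hand-written label; it is not part of ETHER. The namespace `urn:hl7-org:v3` and the section codes `34084-4` (adverse reactions) and `34067-9` (indications) are assumptions about how SPL sections are typically identified and should be verified against real labels.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"  # assumed SPL namespace
NS = {"hl7": HL7_NS}
ADVERSE_REACTIONS_CODE = "34084-4"  # assumed code for the Adverse Reactions section

# Hand-written toy label, far simpler than a real SPL document.
TOY_SPL = """<document xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <code code="34084-4" displayName="ADVERSE REACTIONS SECTION"/>
      <text><paragraph>Nausea and headache were reported.</paragraph></text>
    </section></component>
    <component><section>
      <code code="34067-9" displayName="INDICATIONS AND USAGE SECTION"/>
      <text><paragraph>Indicated for hypertension.</paragraph></text>
    </section></component>
  </structuredBody></component>
</document>"""


def adverse_reaction_text(spl_xml):
    """Return the plain text of every section tagged with the adverse reactions code."""
    root = ET.fromstring(spl_xml)
    chunks = []
    for section in root.iter(f"{{{HL7_NS}}}section"):
        code = section.find("hl7:code", NS)
        if code is None or code.get("code") != ADVERSE_REACTIONS_CODE:
            continue
        text = section.find("hl7:text", NS)
        if text is not None:
            # Flatten nested markup and collapse whitespace.
            chunks.append(" ".join("".join(text.itertext()).split()))
    return chunks


print(adverse_reaction_text(TOY_SPL))
```

The returned text would then be passed to a term extractor and coded to MedDRA Preferred Terms, which is the part the study actually evaluates.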

Subject(s)

Data Mining , Drug Labeling , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Natural Language Processing , United States , United States Food and Drug Administration

14.

Evaluating automated approaches to anaphylaxis case classification using unstructured data from the FDA Sentinel System.

Ball, Robert; Toh, Sengwee; Nolan, Jamie; Haynes, Kevin; Forshee, Richard; Botsis, Taxiarchis.

Pharmacoepidemiol Drug Saf ; 27(10): 1077-1084, 2018 10.

Article in English | MEDLINE | ID: mdl-30152575

ABSTRACT

INTRODUCTION: In May 2008, the Food and Drug Administration launched the Sentinel Initiative, a multi-year program for the establishment of a national electronic monitoring system for medical product safety that led, in 2016, to the launch of the full Sentinel System. Under the Mini-Sentinel pilot, several algorithms for identifying health outcomes of interest, including one for anaphylaxis, were developed and evaluated using data available from the Sentinel common data model. PURPOSE: To evaluate whether features extracted from unstructured narrative data using natural language processing (NLP) could be used to classify anaphylaxis cases. METHODS: Using previously developed methods, we extracted features from unstructured narrative data using NLP and applied rule-based and similarity-based algorithms to identify anaphylaxis among 62 potential cases previously classified by human experts as anaphylaxis (N = 33), not anaphylaxis (N = 27), and unknown (N = 2). RESULTS: The rule-based and similarity-based approaches demonstrated almost equal performance (recall 100% vs 100%, precision 60.3% vs 57.4%, F-measure: 0.753 vs 0.729). Reasons for misclassification included the inability of the algorithms to make the same clinical judgments as human experts about the timing, severity, or presence of alternative explanations; and the identification of terms consistent with anaphylaxis but present in conditions other than anaphylaxis. CONCLUSIONS: Although precision needs to be improved before these algorithms could be used without human review, we demonstrated that applying rule-based and similarity-based algorithms to unstructured narrative information from clinical records can be used for classification of anaphylaxis in the Sentinel System. Further development and assessment of these methods in the Sentinel System are warranted.
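The F-measure values in this abstract follow from the reported precision and recall through the harmonic mean, F = 2PR / (P + R). The sketch below recomputes them from the rounded percentages; because the inputs are already rounded, the results can differ from the published figures by about 0.001.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
rule_based_f = f_measure(precision=0.603, recall=1.0)
similarity_based_f = f_measure(precision=0.574, recall=1.0)

print(f"rule-based:       {rule_based_f:.4f} (reported 0.753)")
print(f"similarity-based: {similarity_based_f:.4f} (reported 0.729)")
```

With recall fixed at 100%, the F-measure is driven entirely by precision, which is why the abstract singles out precision as the quantity to improve.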

Subject(s)

Algorithms , Anaphylaxis/classification , Data Analysis , Sentinel Surveillance , United States Food and Drug Administration/standards , Anaphylaxis/epidemiology , Humans , United States/epidemiology , United States Food and Drug Administration/statistics & numerical data

15.

Evaluation of Natural Language Processing (NLP) systems to annotate drug product labeling with MedDRA terminology.

Ly, Thomas; Pamer, Carol; Dang, Oanh; Brajovic, Sonja; Haider, Shahrukh; Botsis, Taxiarchis; Milward, David; Winter, Andrew; Lu, Susan; Ball, Robert.

J Biomed Inform ; 83: 73-86, 2018 07.

Article in English | MEDLINE | ID: mdl-29860093

ABSTRACT

INTRODUCTION: The FDA Adverse Event Reporting System (FAERS) is a primary data source for identifying unlabeled adverse events (AEs) in a drug or biologic drug product's postmarketing phase. Many AE reports must be reviewed by drug safety experts to identify unlabeled AEs, even if the reported AEs are previously identified, labeled AEs. Integrating the labeling status of drug product AEs into FAERS could increase report triage and review efficiency. Medical Dictionary for Regulatory Activities (MedDRA) is the standard for coding AE terms in FAERS cases. However, drug manufacturers are not required to use MedDRA to describe AEs in product labels. We hypothesized that natural language processing (NLP) tools could assist in automating the extraction and MedDRA mapping of AE terms in drug product labels. MATERIALS AND METHODS: We evaluated the performance of three NLP systems (ETHER, I2E, MetaMap) for their ability to extract AE terms from drug labels and translate the terms to MedDRA Preferred Terms (PTs). Pharmacovigilance-based annotation guidelines for extracting AE terms from drug labels were developed for this study. We compared each system's output to MedDRA PT AE lists, manually mapped by FDA pharmacovigilance experts using the guidelines, for ten drug product labels known as the "gold standard AE list" (GSL) dataset. Strict time and configuration conditions were imposed in order to test each system's capabilities under conditions of no human intervention and minimal system configuration. Each NLP system's output was evaluated for precision, recall and F measure in comparison to the GSL. A qualitative error analysis (QEA) was conducted to categorize a random sample of each NLP system's false positive and false negative errors. RESULTS: A total of 417, 278, and 250 false positive errors occurred in the ETHER, I2E, and MetaMap outputs, respectively. A total of 100, 80, and 187 false negative errors occurred in ETHER, I2E, and MetaMap outputs, respectively. Precision ranged from 64% to 77%, recall from 64% to 83% and F measure from 67% to 79%. I2E had the highest precision (77%), recall (83%) and F measure (79%). ETHER had the lowest precision (64%). MetaMap had the lowest recall (64%). The QEA found that the most prevalent false positive errors were context errors such as "Context error/General term", "Context error/Instructions or monitoring parameters", "Context error/Medical history preexisting condition underlying condition risk factor or contraindication", and "Context error/AE manifestations or secondary complication". The most prevalent false negative errors were in the "Incomplete or missed extraction" error category. Missing AE terms were typically due to long terms, or terms containing non-contiguous words which do not correspond exactly to MedDRA synonyms. MedDRA mapping errors were a minority of errors for ETHER and I2E but were the most prevalent false positive errors for MetaMap. CONCLUSIONS: The results demonstrate that it may be feasible to use NLP tools to extract and map AE terms to MedDRA PTs. However, the NLP tools we tested would need to be modified or reconfigured to lower the error rates to support their use in a regulatory setting. Tools specific for extracting AE terms from drug labels and mapping the terms to MedDRA PTs may need to be developed to support pharmacovigilance. Conducting research using additional NLP systems on a larger, diverse GSL would also be informative.
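The comparison of a system's output with a gold standard list can be illustrated with a small set-based sketch. It assumes each label is reduced to a set of Preferred Term strings and that a match means exact string equality. The abstract does not describe the scoring procedure at this level, so the function `evaluate_against_gold` and the toy terms below are hypothetical and are not data from the study.

```python
def evaluate_against_gold(extracted: set[str], gold: set[str]) -> dict[str, float]:
    """Score one label's extracted terms against its gold standard list."""
    tp = len(extracted & gold)   # terms found by the system and present in the gold list
    fp = len(extracted - gold)   # terms found by the system but absent from the gold list
    fn = len(gold - extracted)   # gold terms the system missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy example, not taken from the study's dataset.
gold_terms = {"Nausea", "Headache", "Rash", "Dizziness"}
system_terms = {"Nausea", "Headache", "Pain"}

scores = evaluate_against_gold(system_terms, gold_terms)
print(scores)
```

In this toy case the system finds two of the four gold terms and adds one term that is not in the gold list, which is the kind of false positive the qualitative error analysis categorizes.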

Subject(s)

Adverse Drug Reaction Reporting Systems , Drug Labeling , Drug-Related Side Effects and Adverse Reactions , Natural Language Processing , Terminology as Topic , Humans , Pharmacovigilance , United States , United States Food and Drug Administration

16.

Generation of an annotated reference standard for vaccine adverse event reports.

Foster, Matthew; Pandey, Abhishek; Kreimeyer, Kory; Botsis, Taxiarchis.

Vaccine ; 36(29): 4325-4330, 2018 07 05.

Article in English | MEDLINE | ID: mdl-29880244

ABSTRACT

As part of a collaborative project between the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention for the development of a web-based natural language processing (NLP) workbench, we created a corpus of 1000 Vaccine Adverse Event Reporting System (VAERS) reports annotated for 36,726 clinical features, 13,365 temporal features, and 22,395 clinical-temporal links. This paper describes the final corpus, as well as the methodology used to create it, so that clinical NLP researchers outside FDA can evaluate the utility of the corpus to aid their own work. The creation of this standard went through four phases: pre-training, pre-production, production-clinical feature annotation, and production-temporal annotation. The pre-production phase used a double annotation followed by adjudication strategy to refine and finalize the annotation model, while the production phases followed a single annotation strategy to maximize the number of reports in the corpus. An analysis of 30 reports randomly selected as part of a quality control assessment yielded accuracies of 0.97, 0.96, and 0.83 for clinical features, temporal features, and clinical-temporal associations, respectively, and speaks to the quality of the corpus.

Subject(s)

Adverse Drug Reaction Reporting Systems/standards , Reference Standards , Vaccines/adverse effects , Centers for Disease Control and Prevention, U.S. , Humans , United States , United States Food and Drug Administration

17.

Monitoring biomedical literature for post-market safety purposes by analyzing networks of text-based coded information.

Botsis, Taxiarchis; Foster, Matthew; Kreimeyer, Kory; Pandey, Abhishek; Forshee, Richard.

AMIA Jt Summits Transl Sci Proc ; 2017: 66-75, 2017.

Article in English | MEDLINE | ID: mdl-28815108

ABSTRACT

Literature review is critical but time-consuming in the post-market surveillance of medical products. We focused on the safety signal of intussusception after the vaccination of infants with the Rotashield Vaccine in 1999 and retrieved all PubMed abstracts for rotavirus vaccines published after January 1, 1998. We used the Event-based Text-mining of Health Electronic Records system, the MetaMap tool, and the National Center for Biomedical Ontologies Annotator to process the abstracts and generate coded terms stamped with the date of publication. Data were analyzed in the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment to evaluate the intussusception-related findings before and after the release of the new rotavirus vaccines in 2006. The tight connection of intussusception with the historical signal in the first period and the absence of any safety concern for the new vaccines in the second period were verified. We demonstrated the feasibility of semi-automated solutions that may assist medical reviewers in monitoring biomedical literature.

18.

Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis.

J Biomed Inform ; 73: 14-29, 2017 09.

Article in English | MEDLINE | ID: mdl-28729030

ABSTRACT

We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.

Subject(s)

Electronic Health Records , Natural Language Processing , Humans

19.

Application of Natural Language Processing and Network Analysis Techniques to Post-market Reports for the Evaluation of Dose-related Anti-Thymocyte Globulin Safety Patterns.

Botsis, Taxiarchis; Foster, Matthew; Arya, Nina; Kreimeyer, Kory; Pandey, Abhishek; Arya, Deepa.

Appl Clin Inform ; 8(2): 396-411, 2017 04 26.

Article in English | MEDLINE | ID: mdl-28447098

ABSTRACT

OBJECTIVE: To evaluate the feasibility of automated dose and adverse event information retrieval in supporting the identification of safety patterns. METHODS: We extracted all rabbit Anti-Thymocyte Globulin (rATG) reports submitted to the United States Food and Drug Administration Adverse Event Reporting System (FAERS) from the product's initial licensure on April 16, 1984 through February 8, 2016. We processed the narratives using the Medication Extraction (MedEx) and the Event-based Text-mining of Health Electronic Records (ETHER) systems and retrieved the appropriate medication, clinical, and temporal information. When necessary, the extracted information was manually curated. This process resulted in a high-quality dataset that was analyzed with the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA) to explore the association of rATG dosing with post-transplant lymphoproliferative disorder (PTLD). RESULTS: Although manual curation was necessary to improve the data quality, MedEx and ETHER supported the extraction of the appropriate information. We created a final dataset of 1,380 cases with complete information for rATG dosing and date of administration. Analysis in PANACEA found that PTLD was associated with cumulative doses of rATG >8 mg/kg, even in periods where most of the submissions to FAERS reported low doses of rATG. CONCLUSION: We demonstrated the feasibility of investigating a dose-related safety pattern for a particular product in FAERS using a set of automated tools.

Subject(s)

Adverse Drug Reaction Reporting Systems , Antilymphocyte Serum/adverse effects , Data Mining/methods , Natural Language Processing , Safety , Dose-Response Relationship, Drug , Feasibility Studies , Humans , Time Factors

20.

Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems.

Kreimeyer, Kory; Menschik, David; Winiecki, Scott; Paul, Wendy; Barash, Faith; Woo, Emily Jane; Alimchandani, Meghna; Arya, Deepa; Zinderman, Craig; Forshee, Richard; Botsis, Taxiarchis.

Drug Saf ; 40(7): 571-582, 2017 07.

Article in English | MEDLINE | ID: mdl-28293864

ABSTRACT

INTRODUCTION: Duplicate case reports in spontaneous adverse event reporting systems pose a challenge for medical reviewers to efficiently perform individual and aggregate safety analyses. Duplicate cases can bias data mining by generating spurious signals of disproportional reporting of product-adverse event pairs. OBJECTIVE: We have developed a probabilistic record linkage algorithm for identifying duplicate cases in the US Vaccine Adverse Event Reporting System (VAERS) and the US Food and Drug Administration Adverse Event Reporting System (FAERS). METHODS: In addition to using structured field data, the algorithm incorporates the non-structured narrative text of adverse event reports by examining clinical and temporal information extracted by the Event-based Text-mining of Health Electronic Records system, a natural language processing tool. The final component of the algorithm is a novel duplicate confidence value that is calculated by a rule-based empirical approach that looks for similarities in a number of criteria between two case reports. RESULTS: For VAERS, the algorithm identified 77% of known duplicate pairs with a precision (or positive predictive value) of 95%. For FAERS, it identified 13% of known duplicate pairs with a precision of 100%. The textual information did not improve the algorithm's automated classification for VAERS or FAERS. The empirical duplicate confidence value increased performance on both VAERS and FAERS, mainly by reducing the occurrence of false-positives. CONCLUSIONS: The algorithm was shown to be effective at identifying pre-linked duplicate VAERS reports. The narrative text was not shown to be a key component in the automated detection evaluation; however, it is essential for supporting the semi-automated approach that is likely to be deployed at the Food and Drug Administration, where medical reviewers will perform some manual review of the most highly ranked reports identified by the algorithm.
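The general idea behind probabilistic record linkage can be sketched with Fellegi-Sunter style agreement weights. This is a generic textbook illustration and not the authors' algorithm: the field names, the m/u probabilities, and the function `match_weight` are all hypothetical, and the sketch covers neither the text-derived features nor the duplicate confidence value described in the abstract.

```python
import math

# Hypothetical per-field probabilities (assumed for illustration only):
#   m = P(field agrees | the two reports are true duplicates)
#   u = P(field agrees | the two reports are unrelated)
FIELD_PROBS = {
    "age": (0.95, 0.05),
    "sex": (0.98, 0.50),
    "onset_date": (0.90, 0.01),
    "product": (0.97, 0.10),
}


def match_weight(report_a: dict, report_b: dict) -> float:
    """Sum of log2 likelihood ratios over the fields present in both reports."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = report_a.get(field), report_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Hypothetical reports, not real VAERS or FAERS data.
r1 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "VaccineX"}
r2 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "VaccineX"}
r3 = {"age": 61, "sex": "M", "onset_date": "2012-11-20", "product": "VaccineY"}

print(match_weight(r1, r2))  # large positive score: likely duplicates
print(match_weight(r1, r3))  # negative score: likely distinct reports
```

Pairs would then be ranked by score, with the highest-ranked pairs passed to medical reviewers, in line with the semi-automated use the abstract anticipates.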

Subject(s)

Adverse Drug Reaction Reporting Systems , Data Interpretation, Statistical , Data Mining , Databases, Factual , Humans , United States
