Search | VHL Regional Portal

Quantifying outcome misclassification in multi-database studies: The case study of pertussis in the ADVANCE project.

Gini, Rosa; Dodd, Caitlin N; Bollaerts, Kaatje; Bartolini, Claudia; Roberto, Giuseppe; Huerta-Alvarez, Consuelo; Martín-Merino, Elisa; Duarte-Salles, Talita; Picelli, Gino; Tramontan, Lara; Danieli, Giorgia; Correa, Ana; McGee, Chris; Becker, Benedikt F H; Switzer, Charlotte; Gandhi-Banga, Sonja; Bauwens, Jorgen; van der Maas, Nicoline A T; Spiteri, Gianfranco; Sdona, Emmanouela; Weibel, Daniel; Sturkenboom, Miriam.

Vaccine ; 38 Suppl 2: B56-B64, 2020 12 22.

Article in English | MEDLINE | ID: mdl-31677950

ABSTRACT

BACKGROUND: The Accelerated Development of VAccine beNefit-risk Collaboration in Europe (ADVANCE) is a public-private collaboration aiming to develop and test a system for rapid benefit-risk (B/R) monitoring of vaccines using European healthcare databases. Event misclassification can result in biased estimates. Using different algorithms for identifying cases of Bordetella pertussis (BorPer) infection as a test case, we aimed to describe a strategy to quantify event misclassification, when manual chart review is not feasible. METHODS: Four participating databases retrieved data from primary care (PC) setting: BIFAP (Spain), THIN and RCGP RSC (UK) and PEDIANET (Italy); SIDIAP (Spain) retrieved data from both PC and hospital settings. BorPer algorithms were defined by healthcare setting, data domain (diagnoses, drugs, or laboratory tests) and concept sets (specific or unspecified pertussis). Algorithm- and database-specific BorPer incidence rates (IRs) were estimated in children aged 0-14 years enrolled in 2012 and 2014 and followed up until the end of each calendar year and compared with IRs of confirmed pertussis from the ECDC surveillance system (TESSy). Novel formulas were used to approximate validity indices, based on a small set of assumptions. They were applied to approximately estimate positive predictive value (PPV) and sensitivity in SIDIAP. RESULTS: The number of cases and the estimated BorPer IRs per 100,000 person-years in PC, using data representing 3,173,268 person-years, were 0 (IR = 0.0), 21 (IR = 4.3), 21 (IR = 5.1), 79 (IR = 5.7), and 2 (IR = 2.3) in BIFAP, SIDIAP, THIN, RCGP RSC and PEDIANET respectively. The IRs for combined specific/unspecified pertussis were higher than TESSy, suggesting that some false positives had been included. In SIDIAP the estimated IR was 45.0 when discharge diagnoses were included. The sensitivity and PPV of combined PC specific and unspecific diagnoses for BorPer cases in SIDIAP were approximately 85% and 72%, respectively. CONCLUSION: Retrieving BorPer cases using only specific concepts has low sensitivity in PC databases, while including cases retrieved by unspecified concepts introduces false positives, which were approximately estimated to be 28% in one database. The share of cases that cannot be retrieved from a PC database because they are only seen in hospital was approximately estimated to be 15% in one database. This study demonstrated that quantifying the impact of different event-finding algorithms across databases and benchmarking with disease surveillance data can provide approximate estimates of algorithm validity.
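
The arithmetic linking the figures in this abstract can be illustrated with a minimal Python sketch. It uses only the standard definitions (an incidence rate per 100,000 person-years, and the complements of PPV and sensitivity); it does not reproduce the paper's novel formulas, which the abstract does not state. The function names are illustrative, and the person-years denominator in the usage lines is a hypothetical value, since the abstract reports only the pooled total.

```python
def incidence_rate_per_100k(cases: int, person_years: float) -> float:
    """Incidence rate expressed per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases / person_years * 100_000


def false_positive_share(ppv: float) -> float:
    """Share of retrieved cases that are false positives (1 - PPV)."""
    return 1.0 - ppv


def missed_share(sensitivity: float) -> float:
    """Share of true cases the algorithm fails to retrieve (1 - sensitivity)."""
    return 1.0 - sensitivity


# Values reported in the abstract for SIDIAP: PPV ~72%, sensitivity ~85%.
fp = round(false_positive_share(0.72), 2)   # share of false positives
missed = round(missed_share(0.85), 2)       # share of cases not retrievable

# Hypothetical denominator, for illustration only (not from the abstract).
example_ir = incidence_rate_per_100k(cases=21, person_years=500_000)
```

With the reported PPV and sensitivity, the two complements correspond to the 28% false positives and the 15% of cases seen only in hospital quoted in the conclusion.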

Subject(s)

Pertussis Vaccine , Whooping Cough , Adolescent , Child , Child, Preschool , Databases, Factual , Electronic Health Records , Europe , Humans , Infant , Infant, Newborn , Italy , Pertussis Vaccine/adverse effects , Spain , Whooping Cough/diagnosis , Whooping Cough/epidemiology , Whooping Cough/prevention & control

CodeMapper: semiautomatic coding of case definitions. A contribution from the ADVANCE project.

Becker, Benedikt F H; Avillach, Paul; Romio, Silvana; van Mulligen, Erik M; Weibel, Daniel; Sturkenboom, Miriam C J M; Kors, Jan A.

Pharmacoepidemiol Drug Saf ; 26(8): 998-1005, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28657162

ABSTRACT

BACKGROUND: Assessment of drug and vaccine effects by combining information from different healthcare databases in the European Union requires extensive efforts in the harmonization of codes as different vocabularies are being used across countries. In this paper, we present a web application called CodeMapper, which assists in the mapping of case definitions to codes from different vocabularies, while keeping a transparent record of the complete mapping process. METHODS: CodeMapper builds upon coding vocabularies contained in the Metathesaurus of the Unified Medical Language System. The mapping approach consists of three phases. First, medical concepts are automatically identified in a free-text case definition. Second, the user revises the set of medical concepts by adding or removing concepts, or expanding them to related concepts that are more general or more specific. Finally, the selected concepts are projected to codes from the targeted coding vocabularies. We evaluated the application by comparing codes that were automatically generated from case definitions by applying CodeMapper's concept identification and successive concept expansion, with reference codes that were manually created in a previous epidemiological study. RESULTS: Automated concept identification alone had a sensitivity of 0.246 and positive predictive value (PPV) of 0.420 for reproducing the reference codes. Three successive steps of concept expansion increased sensitivity to 0.953 and PPV to 0.616. CONCLUSIONS: Automatic concept identification in the case definition alone was insufficient to reproduce the reference codes, but CodeMapper's operations for concept expansion provide an effective, efficient, and transparent way for reproducing the reference codes.
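
The evaluation described in this abstract compares an automatically generated code set with a manually created reference set. A minimal Python sketch of that kind of comparison follows, assuming the usual set-based definitions of sensitivity and PPV; the abstract does not give the exact computation, and the function name and the codes in the usage lines are placeholders rather than data from the study.

```python
def sensitivity_and_ppv(generated: set[str], reference: set[str]) -> tuple[float, float]:
    """Compare a generated code set against a reference code set.

    sensitivity = shared codes / reference codes
    PPV         = shared codes / generated codes
    """
    true_positives = len(generated & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(generated) if generated else 0.0
    return sensitivity, ppv


# Placeholder codes for illustration only (not taken from the study).
reference_codes = {"A37.0", "A37.1", "A37.9", "033.0"}
generated_codes = {"A37.0", "A37.9", "033.0", "J20.9", "R05"}

sens, ppv = sensitivity_and_ppv(generated_codes, reference_codes)
```

In this toy example three of the four reference codes are reproduced and three of the five generated codes are correct, so expanding the concept set would raise sensitivity only if the added codes fall inside the reference set.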

Subject(s)

Databases, Factual/statistics & numerical data , International Classification of Diseases/statistics & numerical data , Medical Records Systems, Computerized/statistics & numerical data , Unified Medical Language System/statistics & numerical data , Europe/epidemiology , Humans

Evaluation of a multinational, multilingual vaccine debate on Twitter.

Becker, Benedikt F H; Larson, Heidi J; Bonhoeffer, Jan; van Mulligen, Erik M; Kors, Jan A; Sturkenboom, Miriam C J M.

Vaccine ; 34(50): 6166-6171, 2016 12 07.

Article in English | MEDLINE | ID: mdl-27840012

ABSTRACT

BACKGROUND: Public confidence in an immunization programme is a pivotal determinant of the programme's success. The mining of social media is increasingly employed to provide insight into the public's sentiment. This research further explores the value of monitoring social media to understand public sentiment about an international vaccination programme. OBJECTIVE: To gain insight into international public discussion on the paediatric pentavalent vaccine (DTP-HepB-Hib) programme by analysing Twitter messages. METHODS: Using a multilingual search, we retrospectively collected all public Twitter messages mentioning the DTP-HepB-Hib vaccine from July 2006 until May 2015. We analysed message characteristics by frequency of referencing other websites, type of websites, and geographic focus of the discussion. In addition, a sample of messages was manually annotated for positive or negative message tone. RESULTS: We retrieved 5771 messages. Only 3.1% of the messages were reactions to other messages, and 86.6% referred to websites, mostly news sites (70.7%), other social media (9.8%), and health-information sites (9.5%). Country mentions were identified in 70.4% of the messages, of which India (35.4%), Indonesia (18.3%), and Vietnam (13.9%) were the most prevalent. In the annotated sample, 63% of the messages showed a positive or neutral sentiment about DTP-HepB-Hib. Peaks in negative and positive messages could be related to country-specific programme events. CONCLUSIONS: Public messages about DTP-HepB-Hib were characterized by little interaction between tweeters, and by frequent referencing of websites and other information links. Twitter messages can indirectly reflect the public's opinion about major events in the debates about the DTP-HepB-Hib vaccine.

Subject(s)

Diphtheria-Tetanus-Pertussis Vaccine/adverse effects , Diphtheria-Tetanus-Pertussis Vaccine/immunology , Haemophilus Vaccines/adverse effects , Haemophilus Vaccines/immunology , Hepatitis B Vaccines/adverse effects , Hepatitis B Vaccines/immunology , Immunization/adverse effects , Immunization/psychology , Public Opinion , Social Media , Diphtheria-Tetanus-Pertussis Vaccine/administration & dosage , Haemophilus Vaccines/administration & dosage , Hepatitis B Vaccines/administration & dosage , Humans , Retrospective Studies

Chemical entity recognition in patents by combining dictionary-based and statistical approaches.

Akhondi, Saber A; Pons, Ewoud; Afzal, Zubair; van Haagen, Herman; Becker, Benedikt F H; Hettne, Kristina M; van Mulligen, Erik M; Kors, Jan A.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27141091

ABSTRACT

We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small. Database URL: http://biosemantics.org/chemdner-patents.

Subject(s)

Data Mining/methods , Databases, Chemical , Machine Learning , Patents as Topic , Models, Statistical , Software

Extraction of chemical-induced diseases using prior knowledge and textual information.

Pons, Ewoud; Becker, Benedikt F H; Akhondi, Saber A; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27081155

ABSTRACT

We describe our approach to the chemical-disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease-named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several optimization steps. For the CID subtask, our system, which we named RELigator, was trained on a rich feature set, comprising features derived from a graph database containing prior knowledge about chemicals and diseases, and linguistic and statistical features derived from the abstracts in the CDR training corpus. We describe the systems that were developed and present evaluation results for both subtasks on the CDR test set. For DNER, our Peregrine system reached an F-score of 0.757. For CID, the system achieved an F-score of 0.526, which ranked second among 18 participating teams. Several post-challenge modifications of the systems resulted in substantially improved F-scores (0.828 for DNER and 0.602 for CID). RELigator is available as a web service at http://biosemantics.org/index.php/software/religator.

Subject(s)

Computational Biology/methods , Data Mining/methods , Databases, Factual , Disease/etiology , Hazardous Substances/toxicity , Humans , Toxicogenetics
