Term-BLAST-like alignment tool for concept recognition in noisy clinical texts.
Groza, Tudor; Wu, Honghan; Dinger, Marcel E; Danis, Daniel; Hilton, Coleman; Bagley, Anita; Davids, Jon R; Luo, Ling; Lu, Zhiyong; Robinson, Peter N.
Affiliation
  • Groza T; Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia.
  • Wu H; Genetics and Rare Diseases Program, Telethon Kids Institute, Nedlands, WA 6009, Australia.
  • Dinger ME; Institute of Health Informatics, University College London, London WC1E 6BT, United Kingdom.
  • Danis D; Pryzm Health, Sydney, NSW 2089, Australia.
  • Hilton C; School of Life and Environmental Sciences, Faculty of Science, University of Sydney, NSW 2006, Australia.
  • Bagley A; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States.
  • Davids JR; Shriners Children's Corporate Headquarters, Tampa, FL 33607, United States.
  • Luo L; Shriners Children's Northern California, Sacramento, CA 95817, United States.
  • Lu Z; Shriners Children's Northern California, Sacramento, CA 95817, United States.
  • Robinson PN; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States.
Bioinformatics; 39(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38001031
ABSTRACT
MOTIVATION:

Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts.

RESULTS:

Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHR records (compared against the next best method), supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches.

AVAILABILITY AND IMPLEMENTATION:

Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
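
The two-stage idea described in the Results (screen by shared k-mers, then score the surviving candidates) can be illustrated with a minimal Python sketch. This is not the TBLAT or Fenominal implementation: the function names, the k-mer length of 3, the `min_shared` threshold, and the example term list are all assumptions made for illustration, and plain Levenshtein distance stands in for the scoring based on spelling-error patterns learned from clinical notes, which the abstract does not specify in detail.

```python
from collections import Counter


def kmers(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers of a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmers(a: str, b: str, k: int = 3) -> int:
    """Number of k-mers two strings have in common (multiset intersection)."""
    return sum((kmers(a, k) & kmers(b, k)).values())


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (assumed stand-in for an error-pattern-aware score)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(span: str, terms: list[str], k: int = 3, min_shared: int = 2):
    """Screen terms by shared k-mer count, then pick the closest by edit distance."""
    # Stage 1: cheap screen, keep only terms sharing enough k-mers with the span.
    candidates = [t for t in terms if shared_kmers(span, t, k) >= min_shared]
    if not candidates:
        return None
    # Stage 2: the more expensive scoring runs only on the screened candidates.
    return min(candidates, key=lambda t: edit_distance(span.lower(), t.lower()))


# Hypothetical ontology labels, for illustration only.
terms = ["Seizure", "Scoliosis", "Hypotonia"]
print(best_match("seizrue", terms))
print(best_match("scolosis", terms))
```

The screening stage keeps the costly alignment step off most of the vocabulary, which is the same trade-off BLAST makes with seed matches before extending an alignment.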
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Language Limit: Humans Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2023 Document type: Article Country of affiliation: Australia
