Stacked ensemble combined with fuzzy matching for biomedical named entity recognition of diseases.

Bhasuran, Balu; Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

Affiliation

Bhasuran B; DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore 641046, India.
Murugesan G; Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore 641046, India.
Abdulkadhar S; Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore 641046, India.
Natarajan J; DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore 641046, India; Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore 641046, India. Electronic address: n.jeyakumar@yahoo.co.in.

J Biomed Inform; 64: 1-9, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27634494

ABSTRACT

Biomedical Named Entity Recognition (Bio-NER) is a crucial first step in information extraction and a major research focus in biomedical text mining. In recent years, several models and methodologies have been proposed for recognizing semantic types such as genes, proteins, chemicals, drugs, and other biologically relevant named entities. In this paper, we implement a stacked ensemble approach combined with fuzzy matching for biomedical named entity recognition of disease names. The underlying concept of stacked generalization is to combine the outputs of base-level classifiers using a second-level meta-classifier in an ensemble. We use Conditional Random Fields (CRF) as the underlying classification method, with a diverse set of features that are mostly domain-specific, orthographic, and morphological. In addition, we use fuzzy string matching to tag rare disease names from our in-house disease dictionary. For fuzzy matching, we incorporate two fuzzy search algorithms, Rabin-Karp and Tuned Boyer-Moore. The proposed approach shows promising results, with F-measures of 94.66%, 89.12%, 84.10%, and 76.71% on the training and testing sets of the NCBI Disease and BioCreative V CDR corpora.
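
As an illustration of the dictionary-tagging step, the following is a minimal sketch of a Rabin-Karp lookup over a disease dictionary. It is not the authors' implementation: the abstract does not describe how matching was made fuzzy, so this sketch shows only the textbook exact-match form of Rabin-Karp, with lowercasing as an assumed normalization. The function names, the hash parameters, and the two dictionary entries are hypothetical.

```python
def rabin_karp_find(text, pattern, base=256, mod=1_000_000_007):
    """Return the start offset of every exact occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for start in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return hits


def tag_diseases(sentence, dictionary):
    """Return sorted (start, end, term) spans for dictionary terms in sentence."""
    lowered = sentence.lower()  # assumption: case-insensitive lookup
    spans = []
    for term in dictionary:
        for start in rabin_karp_find(lowered, term.lower()):
            spans.append((start, start + len(term), term))
    return sorted(spans)


# Hypothetical dictionary entries, for illustration only.
DICTIONARY = ["cystic fibrosis", "fabry disease"]
spans = tag_diseases(
    "Patients with Fabry disease or cystic fibrosis were studied.", DICTIONARY
)
```

Each span gives character offsets into the sentence, which could then be merged with the labels produced by a CRF-based ensemble. How the two sources of labels are reconciled is not stated in the abstract.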

Subjects

Algorithms; Computational Biology; Data Mining; Disease; Classification; Fuzzy Logic; Genes; Humans; Proteins

Keywords

Biomedical named entity recognition; Fuzzy matching; Machine learning; Stacked ensemble; Text mining

Full text: 1 Database: MEDLINE Main subject: Algorithms / Disease / Computational Biology / Data Mining Limits: Humans Language: English Publication year: 2016 Document type: Article
