Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition.
Comput Biol Med
; 108: 122-132, 2019 05.
Article
in En
| MEDLINE
| ID: mdl-31003175
ABSTRACT
BACKGROUND:
Disease named entity recognition (NER) plays an important role in biomedical research. Many challenging issues remain to be addressed; among these, the identification of rare diseases and complex disease names and the problem of tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.
METHODS:
We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary constructed from the Disease Ontology. Furthermore, DABLC constructs a dictionary attention layer by combining the disease dictionary matching method with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) and the dictionary attention layer is proposed to incorporate the disease dictionary into disease NER.
RESULTS:
Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each test is applied to 10 executions of each model, with a 95% confidence interval. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.
CONCLUSION:
DABLC combines the advantages of both external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names; moreover, it reduces the impact of tagging inconsistency. Special disease NER and deep learning models addressing long sentences are noteworthy areas for future examination.
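The dictionary matching step described under METHODS can be illustrated with a minimal sketch. The abstract does not specify which exact string matching algorithm DABLC uses, so the greedy longest-match strategy below, the function names `build_dictionary` and `match_dictionary`, the B/I/O output tags, and the toy dictionary entries are all assumptions made for illustration, not the authors' implementation. A real dictionary would be built from Disease Ontology names rather than a hand-written list.

```python
from typing import Iterable, List, Set, Tuple


def build_dictionary(names: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Lowercase and whitespace-tokenize each disease name (assumed normalization)."""
    return {tuple(name.lower().split()) for name in names if name.strip()}


def match_dictionary(tokens: List[str], dictionary: Set[Tuple[str, ...]]) -> List[str]:
    """Tag each token B, I, or O using greedy longest exact match (an assumption)."""
    max_len = max((len(entry) for entry in dictionary), default=0)
    lowered = [token.lower() for token in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched  # skip past the matched span
        else:
            i += 1
    return tags


# Hypothetical toy entries standing in for Disease Ontology names.
toy_dictionary = build_dictionary(["breast cancer", "cancer", "cystic fibrosis"])
sentence = "Mutations are linked to breast cancer and cystic fibrosis".split()
dictionary_tags = match_dictionary(sentence, toy_dictionary)
```

In a model along the lines the abstract describes, per-token match tags like these would be one possible input to a dictionary attention layer; how DABLC actually encodes the matches is not stated in the abstract.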
Full text:
1
Database:
MEDLINE
Main subject:
Disease
/
Data Mining
/
Deep Learning
/
Language
Language:
En
Publication year:
2019
Document type:
Article