Improving protein coreference resolution by simple semantic classification.
Nguyen, Ngan; Kim, Jin-Dong; Miwa, Makoto; Matsuzaki, Takuya; Tsujii, Junichi.
Affiliation
  • Nguyen N; National Institute of Informatics, Hitotsubashi 2-1-2, Chiyoda-ku, Tokyo, Japan. ngan@nii.ac.jp
BMC Bioinformatics ; 13: 304, 2012 Nov 17.
Article in En | MEDLINE | ID: mdl-23157272
ABSTRACT

BACKGROUND:

Recent research has shown that major difficulties in event extraction for the biomedical domain are traceable to coreference. Coreference resolution is therefore believed to be useful for improving event extraction. To address coreference resolution in molecular biology literature, the Protein Coreference (COREF) task was organized as a supporting task in the BioNLP Shared Task (BioNLP-ST, hereafter) 2011. However, the shared task results indicated that transferring coreference resolution methods developed for other domains to the biological domain was not straightforward, owing to domain differences in coreference phenomena.

RESULTS:

We analyzed the contribution of domain-specific information, including information that indicates the protein type, in a rule-based protein coreference resolution system. In particular, the domain-specific information is encoded into semantic classification modules whose output is used in different components of the coreference resolution system. We compared our system with the top four systems in the BioNLP-ST 2011; surprisingly, we found that the minimal configuration outperformed the best system in the BioNLP-ST 2011. Analysis of the experimental results revealed that semantic classification using protein information contributed an increase in F-score of 2.3% on the test data and 4.0% on the development data.

CONCLUSIONS:

The use of domain-specific information in semantic classification is important for effective coreference resolution. Since it is difficult to transfer domain-specific information across different domains, we need to continue seeking methods to utilize such information in coreference resolution.
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Proteins / Computational Biology Language: En Publication year: 2012 Document type: Article