A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER.
Hu, Yun; He, Hao; Chen, Zhengfei; Zhu, Qingmeng; Zheng, Changwen.
Affiliations
  • Hu Y; Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.
  • He H; University of Chinese Academy of Sciences, Haidian, Beijing 100190, China.
  • Chen Z; Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.
  • Zhu Q; Shenzhen Power Supply Bureau Co., Ltd., Shenzhen 518001, China.
  • Zheng C; Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.
Comput Intell Neurosci ; 2022: 1987829, 2022.
Article in En | MEDLINE | ID: mdl-35676955
ABSTRACT
Named entity recognition (NER) systems are often built with supervised methods that require large amounts of hand-annotated data. When hand-annotated data is limited, distantly supervised (DS) data and cross-domain (CD) data are usually used separately to improve performance. Distantly supervised data provides in-domain dictionary information, while cross-domain data provides hand-annotated information from another domain. These two types of information are complementary. However, two problems must be solved before either can be used directly. First, distantly supervised data may contain a lot of noise. Second, directly using cross-domain data may degrade performance because of distribution mismatch. In this paper, we propose a unified model named PARE (PArtial learning and REinforcement learning). The PARE model can use distantly supervised data and cross-domain data simultaneously as external data. It uses partial learning with a new label strategy to better handle the noise in distantly supervised data, and reinforcement learning to alleviate the distribution mismatch in cross-domain data. Experiments on three datasets show that our model outperforms the baseline models. In addition, our model can be used when no hand-annotated in-domain data is available.
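
As a rough illustration of why distantly supervised data is noisy and why partial labels help, the sketch below tags a sentence by dictionary lookup and leaves every unmatched token with the full set of candidate labels instead of forcing it to "O". This is a generic example, not the PARE label strategy, which the abstract does not specify; the label set, the function name `distant_partial_labels`, and the toy dictionary are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical label set for a single entity type (BIO scheme).
ALL_LABELS = frozenset({"O", "B-ENT", "I-ENT"})


def distant_partial_labels(
    tokens: List[str], dictionary: Set[Tuple[str, ...]]
) -> List[Set[str]]:
    """Return one candidate-label set per token.

    Tokens covered by a dictionary entry get a single label.
    Unmatched tokens keep every label as a candidate, because a
    dictionary miss does not prove that a token is outside an entity.
    """
    n = len(tokens)
    candidates = [set(ALL_LABELS) for _ in tokens]
    max_len = max((len(entry) for entry in dictionary), default=0)

    i = 0
    while i < n:
        matched = 0
        # Greedy longest match starting at position i.
        for length in range(min(max_len, n - i), 0, -1):
            if tuple(tokens[i:i + length]) in dictionary:
                matched = length
                break
        if matched:
            candidates[i] = {"B-ENT"}
            for j in range(i + 1, i + matched):
                candidates[j] = {"I-ENT"}
            i += matched
        else:
            i += 1
    return candidates


if __name__ == "__main__":
    toy_dictionary = {("aspirin",), ("heart", "attack")}
    sentence = ["aspirin", "reduces", "heart", "attack", "risk"]
    for token, labels in zip(sentence, distant_partial_labels(sentence, toy_dictionary)):
        print(token, sorted(labels))
```

A model trained on such data would typically maximize the total probability of all label sequences consistent with the candidate sets, rather than the probability of one fixed sequence.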
Subjects

Full text: 1 | Database: MEDLINE | Main subject: Machine Learning / Learning | Language: En | Publication year: 2022 | Document type: Article
