A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER.

Hu, Yun; He, Hao; Chen, Zhengfei; Zhu, Qingmeng; Zheng, Changwen

Affiliation

Hu Y; Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.
He H; University of Chinese Academy of Sciences, Haidian, Beijing 100190, China.
Chen Z; Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.
Zhu Q; Shenzhen Power Supply Bureau Co., Ltd., Shenzhen 518001, China.
Zheng C; Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.

Comput Intell Neurosci; 2022: 1987829, 2022.

Article in English | MEDLINE | ID: mdl-35676955

ABSTRACT

Named entity recognition (NER) systems are often built with supervised methods that require large amounts of hand-annotated data. When hand-annotated data is limited, distantly supervised (DS) data and cross-domain (CD) data are usually used separately to improve performance. Distantly supervised data provides in-domain dictionary information, while cross-domain data provides hand-annotated information from another domain. These two types of information are complementary. However, two problems must be solved before either source can be used directly. First, distantly supervised data may contain a lot of noise. Second, directly using cross-domain data may degrade performance because of distribution mismatch. In this paper, we propose a unified model named PARE (PArtial learning and REinforcement learning). The PARE model can use distantly supervised data and cross-domain data simultaneously as external data. The model uses partial learning with a new label strategy to better handle the noise in distantly supervised data, and it uses reinforcement learning to alleviate the distribution mismatch in cross-domain data. Experiments on three datasets show that our model outperforms the baseline models. In addition, our model can be used when no hand-annotated in-domain data is available.
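
As a rough illustration of why dictionary-based distant supervision pairs naturally with partial learning, the sketch below tags tokens by greedy longest match against a small dictionary and leaves unmatched tokens as unknown instead of forcing them to `O`. It is a minimal sketch under stated assumptions, not the PARE label strategy, which the abstract does not specify: the `UNKNOWN` marker, the function names `distant_label` and `allowed_tags`, the example dictionary, and the tag set are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

UNKNOWN = "?"  # hypothetical marker: the true tag of this token is not known


def distant_label(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
    max_len: int = 3,
) -> List[str]:
    """Greedy longest-match dictionary tagging (illustrative assumption).

    Matched spans receive B-/I- tags. Unmatched tokens stay UNKNOWN rather
    than 'O', because a dictionary miss does not prove a token is outside
    every entity.
    """
    tags = [UNKNOWN] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in dictionary:
                entity_type = dictionary[span]
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def allowed_tags(tag: str, tagset: List[str]) -> Set[str]:
    """Per-token constraint for a partially labeled sequence.

    A known tag allows only itself; an UNKNOWN token allows every tag, so a
    partial-learning objective can sum over all label paths consistent with
    these constraints.
    """
    return set(tagset) if tag == UNKNOWN else {tag}


# Hypothetical example data, not taken from the paper.
example_tokens = ["Aspirin", "reduces", "heart", "attack", "risk"]
example_dictionary = {("aspirin",): "CHEM", ("heart", "attack"): "DIS"}
example_tagset = ["O", "B-CHEM", "I-CHEM", "B-DIS", "I-DIS"]
example_tags = distant_label(example_tokens, example_dictionary)
```

Here `example_tags` marks "Aspirin" and "heart attack" as entities and leaves the remaining tokens unknown. A sequence model trained on such data would treat each unknown position as free to take any tag, which is one common way to limit the damage from incomplete dictionaries.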

Subjects

Learning; Machine Learning; Recognition, Psychology

Full text: 1 Database: MEDLINE Main subject: Machine Learning / Learning Language: English Publication year: 2022 Document type: Article
