SNER: Semi-supervised Named Entity Recognition for Large Volume of Diabetes Data.
IEEE J Biomed Health Inform
; PP2024 Jun 11.
Article
em En
| MEDLINE
| ID: mdl-38861441
ABSTRACT
The medical literature and records on diabetes provide crucial resources for diabetes prevention and treatment. However, extracting entities from these textual diabetes data is crucial but challenging. Named entity recognition (NER) - an important corner-stone technology of natural language processing - has been studied well in the general medical field. However, there is still a lack of effective NER methods to handle diabetes data. Briefly, there are three challenges in the real world, including 1) the large volume of diabetes-related data to be processed, 2) the lack of labeled data, and 3) the high costs of manual labeling. To mitigate those challenges, this paper proposes a novel NER method based on semi-supervised learning, namely SNER, for diabetes data processing. It utilizes large amounts of unlabeled data to solve the problem of lack of labeled data. Specifically, it filters the predicted labels based on their confidence and uncertainty scores to reduce the noise entering the model and divide them into positive pseudo-labels and negative pseudo-labels. Also, it utilizes negative pseudo-labels reasonably to improve the training effect of pseudo-labels. Experiments on two public diabetes datasets show that SNER achieves the best performance compared with existing state-of-the-art models.
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Idioma:
En
Revista:
IEEE J Biomed Health Inform
Ano de publicação:
2024
Tipo de documento:
Article
País de publicação:
Estados Unidos