A combined recall and rank framework with online negative sampling for Chinese procedure terminology normalization.

Liang, Ming; Xue, Kui; Ye, Qi; Ruan, Tong

Affiliation

Liang M; School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.
Xue K; School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.
Ye Q; School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.
Ruan T; School of Information Science and Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.

Bioinformatics ; 37(20): 3610-3617, 2021 Oct 25.

Article in En | MEDLINE | ID: mdl-34037691

ABSTRACT

MOTIVATION: Medical terminology normalization aims to map clinical mentions to terminologies in a knowledge base. It plays an important role in analyzing electronic health records and in many downstream tasks. In this article, we focus on Chinese procedure terminology normalization. Terminologies are expressed in varied ways, and one medical mention may be linked to multiple terminologies. Existing studies based on learning to rank do not fully consider the quality of negative samples during model training or the importance of keywords in this domain-specific task.

RESULTS: We propose a combined recall and rank framework to solve these problems. A pair-wise BERT model with deep metric learning is used to recall candidates. Previous methods train BERT either in a point-wise way or as a multi-class classification problem, which may cause serious efficiency problems or may not be effective enough. During model training, we design a novel online negative sampling algorithm to enable the pair-wise method. To deal with multi-implication scenarios, in which one mention is linked to several terminologies, we train implication number prediction together with the recall task in a multi-task learning setting, since these two tasks are highly complementary. In the rank step, we propose a keyword-attentive mechanism to focus on domain-specific information such as procedure sites and procedure types. Finally, a fusion block merges the results of the recall and rank models. Detailed experimental analysis shows that the proposed framework remarkably improves both performance and efficiency.

AVAILABILITY AND IMPLEMENTATION: The source code will be available at https://github.com/sxthunder/CMTN upon publication.
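The abstract does not spell out the online negative sampling algorithm, so the following is only a minimal sketch of one common form of the idea, assuming hard-negative mining with a triplet margin loss over cosine similarity. The function names (`cosine`, `hardest_negative`, `triplet_loss`), the margin value, and the toy two-dimensional vectors are all hypothetical; in practice the vectors would come from the pair-wise BERT encoder, and the authors' actual procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hardest_negative(mention_vec, term_vecs, gold_ids):
    """Return the id of the non-gold term most similar to the mention.

    "Online" here means the choice is recomputed from the current
    embeddings at each training step (an assumption of this sketch).
    """
    best_id, best_sim = None, -2.0  # cosine is never below -1
    for term_id, vec in term_vecs.items():
        if term_id in gold_ids:
            continue  # skip positives; a mention may have several gold terms
        sim = cosine(mention_vec, vec)
        if sim > best_sim:
            best_id, best_sim = term_id, sim
    return best_id


def triplet_loss(mention_vec, pos_vec, neg_vec, margin=0.2):
    """Margin loss: the positive should beat the negative by `margin`."""
    return max(0.0, margin - cosine(mention_vec, pos_vec) + cosine(mention_vec, neg_vec))


# Toy data (hypothetical): three candidate terms and one mention.
term_vecs = {
    "T1": [1.0, 0.0],  # gold term
    "T2": [0.9, 0.1],  # near-duplicate of the gold term, a hard negative
    "T3": [0.0, 1.0],  # unrelated term, an easy negative
}
mention = [1.0, 0.05]
gold = {"T1"}

neg_id = hardest_negative(mention, term_vecs, gold)
hard_loss = triplet_loss(mention, term_vecs["T1"], term_vecs[neg_id])
easy_loss = triplet_loss(mention, term_vecs["T1"], term_vecs["T3"])
print(neg_id, hard_loss, easy_loss)
```

In this toy case the easy negative already satisfies the margin and contributes no loss, while the hard negative still produces a training signal, which is the usual motivation for selecting negatives by their current similarity rather than at random.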

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Type of study: Prognostic_studies | Language: En | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2021 | Type: Article | Affiliation country: China
