Cross-lingual Unified Medical Language System entity linking in online health communities.
J Am Med Inform Assoc
; 27(10): 1585-1592, 2020 10 01.
Article
em En
| MEDLINE
| ID: mdl-32910823
OBJECTIVE: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, such as linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities. MATERIALS AND METHODS: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and mapping UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus, and then filtering the candidates to relevant medical terms. RESULTS: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases. CONCLUSIONS: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
Palavras-chave
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Tradução
/
Unified Medical Language System
Limite:
Humans
País/Região como assunto:
Asia
Idioma:
En
Revista:
J Am Med Inform Assoc
Assunto da revista:
INFORMATICA MEDICA
Ano de publicação:
2020
Tipo de documento:
Article
País de afiliação:
Israel
País de publicação:
Reino Unido