Replacing non-biomedical concepts improves embedding of biomedical concepts.
bioRxiv; 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39005436
ABSTRACT
Objectives:
Concept embeddings are low-dimensional vector representations of concepts such as MeSH D009203 (Myocardial Infarction), constructed so that the similarity of two vectors in the embedding space reflects the semantic similarity of the concepts they represent. Here, we test the hypothesis that replacing synonyms of non-biomedical concepts can improve the quality of biomedical concept embeddings.
Materials and methods:
We developed an approach that uses WordNet to replace each set of synonyms with the most common member of that synonym set.
Results:
We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance in the vector space was reduced by 8%. Assuming that homophily of related concepts in the vector space (related concepts lying close together) is desirable, our approach tends to improve the quality of embeddings.
Discussion and Conclusion:
This pilot study shows that non-biomedical synonym replacement tends to improve the quality of biomedical concept embeddings produced by the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
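The two steps summarized in this abstract, collapsing each synonym set to its most common member and scoring a concept set by its mean intra-cluster distance, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the wn2vec implementation: the synonym sets in the hypothetical `TOY_SYNSETS` stand in for WordNet synsets, the corpus and vectors are toy data, the distance is assumed to be mean pairwise cosine distance, and the 8% figure is assumed to be a relative reduction. In a real pipeline the vectors would come from Word2Vec trained on the rewritten corpus.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical synonym sets standing in for WordNet synsets (not real wn2vec data).
TOY_SYNSETS = [
    {"start", "begin", "commence"},
    {"show", "demonstrate"},
]


def build_replacement_map(synsets, counts):
    """Map every member of each synonym set to its most frequent member."""
    mapping = {}
    for synset in synsets:
        # Iterate in sorted order so ties resolve to the alphabetically first word.
        representative = max(sorted(synset), key=lambda w: counts[w])
        for word in synset:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping):
    """Rewrite a tokenized document; words outside any synonym set are kept."""
    return [mapping.get(tok, tok) for tok in tokens]


def cosine_distance(u, v):
    """1 - cosine similarity (an assumed metric; the paper's may differ)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def mean_intra_cluster_distance(vectors):
    """Average cosine distance over all unordered pairs in one concept set."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def relative_reduction(before, after):
    """Fractional drop in distance, e.g. 0.08 for an 8% reduction."""
    return (before - after) / before


# Toy tokenized corpus (hypothetical).
corpus = [
    ["we", "start", "therapy", "after", "myocardial_infarction"],
    ["clinicians", "begin", "therapy", "early"],
    ["we", "start", "monitoring"],
    ["results", "show", "benefit"],
    ["trials", "demonstrate", "benefit"],
    ["data", "show", "risk"],
]

counts = Counter(tok for doc in corpus for tok in doc)
mapping = build_replacement_map(TOY_SYNSETS, counts)
rewritten = [replace_synonyms(doc, mapping) for doc in corpus]
print(rewritten[1])  # ['clinicians', 'start', 'therapy', 'early']

# Hypothetical vectors for one concept set; real ones would come from Word2Vec.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(mean_intra_cluster_distance(toy_vectors))
```

Because "begin" and "start" now share one token, Word2Vec sees a single word in both contexts, which is the intuition for why related biomedical terms appearing nearby might cluster more tightly. Comparing `mean_intra_cluster_distance` for the same concept set before and after rewriting, then applying `relative_reduction`, gives the kind of percentage reported above.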