Replacing non-biomedical concepts improves embedding of biomedical concepts.
Niyonkuru, Enock; Gomez, Mauricio Soto; Casiraghi, Elena; Antogiovanni, Stephan; Blau, Hannah; Reese, Justin T; Valentini, Giorgio; Robinson, Peter N.
Affiliations
  • Niyonkuru E; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
  • Gomez MS; Trinity College, Hartford, CT, USA.
  • Casiraghi E; AnacletoLab, Computer Science Department, Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy.
  • Antogiovanni S; AnacletoLab, Computer Science Department, Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy.
  • Blau H; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
  • Reese JT; Trinity College, Hartford, CT, USA.
  • Valentini G; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
  • Robinson PN; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
bioRxiv; 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39005436
ABSTRACT

Objectives:

Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that non-biomedical concept synonym replacement can improve the quality of biomedical concept embeddings.

Materials and methods:

We developed an approach that leverages WordNet to replace sets of synonyms with the most common representative of the synonym set.
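The replacement step can be illustrated with a minimal sketch. It is not the wn2vec implementation: the synonym sets below are hypothetical stand-ins for WordNet synsets, "most common" is assumed to mean most frequent in the corpus, and the `protected` argument is an assumed way of leaving biomedical terms untouched.

```python
from collections import Counter

# Hypothetical synonym sets standing in for WordNet synsets (assumption).
TOY_SYNSETS = [
    {"big", "large", "huge"},
    {"show", "demonstrate", "exhibit"},
]


def build_replacement_map(tokens, synsets):
    """Map every synonym seen in the corpus to the most frequent member of its set."""
    counts = Counter(tokens)
    mapping = {}
    for synset in synsets:
        # Only consider synonyms that actually occur in the corpus.
        present = sorted(w for w in synset if counts[w] > 0)
        if not present:
            continue
        # Ties are broken alphabetically because `present` is sorted.
        representative = max(present, key=lambda w: counts[w])
        for word in present:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping, protected=frozenset()):
    """Replace tokens by their representative, skipping protected (biomedical) terms."""
    return [t if t in protected else mapping.get(t, t) for t in tokens]


corpus = (
    "a large infarct and a big lesion show that large studies demonstrate risk"
).split()
mapping = build_replacement_map(corpus, TOY_SYNSETS)
replaced = replace_synonyms(corpus, mapping, protected={"infarct"})
print(" ".join(replaced))
```

After replacement, the corpus contains fewer distinct non-biomedical tokens, so an embedding algorithm sees each remaining token in more contexts.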

Results:

We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance was reduced by 8% in the vector space. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings.

Discussion and Conclusion:

This pilot study shows that non-biomedical synonym replacement tends to improve the quality of embeddings of biomedical concepts generated with the Word2Vec algorithm. We have implemented our approach in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
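The evaluation metric named in the Results, mean intra-cluster distance over a concept set, can be sketched as follows. The abstract does not state which distance was used, so cosine distance is an assumption here, and the concept labels and two-dimensional vectors are toy values chosen only for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_intra_cluster_distance(embeddings, concept_set):
    """Average pairwise distance between the embedded members of one concept set."""
    vectors = [embeddings[c] for c in concept_set if c in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two embedded concepts in the set")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def relative_reduction(before, after):
    """Fractional decrease of the distance after synonym replacement."""
    return (before - after) / before


# Toy embeddings (hypothetical values) before and after synonym replacement.
concept_set = ["concept_a", "concept_b"]
emb_before = {"concept_a": (1.0, 0.0), "concept_b": (0.0, 1.0)}
emb_after = {"concept_a": (1.0, 0.0), "concept_b": (1.0, 1.0)}

d_before = mean_intra_cluster_distance(emb_before, concept_set)
d_after = mean_intra_cluster_distance(emb_after, concept_set)
reduction = relative_reduction(d_before, d_after)
print(f"before={d_before:.3f} after={d_after:.3f} reduction={reduction:.1%}")
```

Under the homophily assumption stated in the abstract, a positive reduction for a concept set indicates that its members moved closer together in the embedding space.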

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: BioRxiv Year: 2024 Document type: Article Affiliation country: United States
