VAIV bio-discovery service using transformer model and retrieval augmented generation.
Kim, Seonho; Yoon, Juntae.
Affiliation
  • Kim S; Department of Computer Science, Sogang University, 35, Baekbeom-Ro, Mapo-Gu, Seoul, Korea. shkim.lex@gmail.com.
  • Yoon J; VAIV Company Inc, 97, Dokseodang-Ro, Yongsan-Gu, Seoul, Korea. jtyoon@vaiv.kr.
BMC Bioinformatics ; 25(1): 273, 2024 Aug 21.
Article in En | MEDLINE | ID: mdl-39169321
ABSTRACT

BACKGROUND:

There has been considerable advancement in AI technologies, such as large language models (LLMs) and machine learning, to support biomedical knowledge discovery.

MAIN BODY:

We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compounds/drugs, genes/proteins, diseases, and their interactions (chemical compound/drug-protein/gene, including drug-target, drug-drug, and drug-disease). To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. We employ T5slim_dec, which adapts the autoregressive generation task of T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. The system also assists in interpreting research findings by using Retrieval Augmented Generation (RAG) to summarize the search results retrieved for a given natural language query. The search engine is built with a hybrid method that combines neural search with the probabilistic ranking function BM25.
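The hybrid retrieval idea mentioned above (a lexical BM25 score combined with a neural similarity score) can be illustrated with a minimal sketch. The abstract does not say how the two scores are combined, so the min-max normalization and the weight `alpha` here are assumptions, all function names are hypothetical, and the document vectors are expected to come from some neural encoder that is not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1] so lexical and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Rank document indices by a weighted sum of dense and BM25 scores.

    alpha is an assumed mixing weight: 1.0 is dense only, 0.0 is BM25 only.
    """
    lexical = min_max(bm25_scores(query_tokens, docs_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * d + (1 - alpha) * l for l, d in zip(lexical, dense)]
    return sorted(range(len(docs_tokens)), key=lambda i: combined[i], reverse=True)
```

A caller would pass tokenized documents together with their precomputed embeddings; a weighted sum is only one common way to fuse the two rankings, and rank-based fusion would be an equally plausible choice.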

CONCLUSION:

As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Natural Language Processing | Language: En | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Year: 2024 | Type: Article
