VAIV bio-discovery service using transformer model and retrieval augmented generation.

Kim, Seonho; Yoon, Juntae.

BMC Bioinformatics ; 25(1): 273, 2024 Aug 21.

Article in English | MEDLINE | ID: mdl-39169321

ABSTRACT

BACKGROUND: AI technologies such as large language models (LLMs) and machine learning have advanced considerably in their support for biomedical knowledge discovery.

MAIN BODY: We propose a novel biomedical neural search service, 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search over unstructured text such as PubMed. It mainly handles information about chemical compounds/drugs, genes/proteins, diseases, and their interactions (chemical compound/drug-protein/gene interactions, including drug-target, drug-drug, and drug-disease interactions). To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. We employ T5slim_dec, which adapts the autoregressive generation task of T5 (Text-to-Text Transfer Transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. The service also assists in interpreting research findings by summarizing the search results retrieved for a given natural language query using Retrieval Augmented Generation (RAG). The search engine is built with a hybrid method that combines neural search with BM25, a probabilistic search model.

CONCLUSION: As a result, our system can better capture the context, semantics, and relationships between terms within documents, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service for accessing and discovering relevant knowledge.
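The abstract does not state how the neural and BM25 scores are combined in the hybrid search engine. The sketch below is illustrative only and assumes min-max normalization of each score list followed by a weighted sum with a hypothetical weight `alpha`. Query and document embeddings are taken as given (hypothetical precomputed vectors standing in for a neural encoder), and the names `bm25_scores`, `min_max`, and `hybrid_rank` are our own, not the authors' code.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()  # number of documents containing each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            # Non-negative IDF variant (the form Lucene uses)
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            f = tf[t]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query: Sequence[str], query_vec: Sequence[float],
                docs: Sequence[Sequence[str]], doc_vecs: Sequence[Sequence[float]],
                alpha: float = 0.5) -> List[int]:
    """Rank document indices by alpha * norm(BM25) + (1 - alpha) * norm(cosine).

    ASSUMPTION: the weighted-sum fusion and `alpha` are illustrative; the
    embeddings are hypothetical outputs of some neural encoder.
    """
    sparse = min_max(bm25_scores(query, docs))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

Setting `alpha=1.0` reduces this to pure BM25 ranking and `alpha=0.0` to pure dense ranking. Normalization matters because BM25 scores are unbounded while cosine similarity lies in [-1, 1], so a raw sum would let one signal dominate.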

Subject(s)

Natural Language Processing , Data Mining/methods , Knowledge Discovery/methods , PubMed , Search Engine , Machine Learning , Information Storage and Retrieval/methods , Neural Networks, Computer

Information Retrieval in an Infodemic: The Case of COVID-19 Publications.

Teodoro, Douglas; Ferdowsi, Sohrab; Borissov, Nikolay; Kashani, Elham; Vicente Alvarez, David; Copara, Jenny; Gouareb, Racha; Naderi, Nona; Amini, Poorya.

J Med Internet Res ; 23(9): e30161, 2021 09 17.

Article in English | MEDLINE | ID: mdl-34375298

ABSTRACT

BACKGROUND: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19-related corpora are being created, sometimes containing inaccurate information, at a scale beyond human analysis.

OBJECTIVE: In the context of searching for scientific evidence in the deluge of COVID-19-related literature, we present an information retrieval methodology for the effective identification of relevant sources to answer biomedical queries posed in natural language.

METHODS: Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. The similarity between COVID-19 queries and documents is computed, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and to boost the position of relevant documents.

RESULTS: The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with those of the top-ranking teams in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25 (BM25)-based baseline, retrieving, on average, 83% of relevant documents in the top 20.

CONCLUSIONS: These results indicate that multistage retrieval supported by deep learning could enhance the identification of literature for COVID-19-related questions posed in natural language.
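The abstract describes the multistage pipeline only at a high level. The sketch below shows one way such a pipeline can be assembled, under stated assumptions: a first-stage ranking (for example, from BM25) is truncated to a hypothetical `depth`, reranked by precomputed `neural_scores` (a hypothetical stand-in for a deep neural reranker), and the two orderings are merged with reciprocal rank fusion. RRF is a technique we swap in for illustration; the authors' actual fusion and postprocessing steps are not specified here. `precision_at_k` reads the "83% of relevant documents in the top 20" figure as precision at 20, which is also our assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


def multistage_rank(first_stage: Sequence[str], neural_scores: Dict[str, float],
                    depth: int = 100, k: int = 60) -> List[str]:
    """Rerank the top `depth` first-stage hits and fuse both orderings.

    ASSUMPTION: `neural_scores` is a hypothetical stand-in for a deep neural
    reranker's query-document scores; documents below `depth` keep their
    first-stage order.
    """
    head = list(first_stage[:depth])
    tail = list(first_stage[depth:])
    reranked = sorted(head, key=lambda d: neural_scores.get(d, 0.0), reverse=True)
    return rrf_fuse([head, reranked], k=k) + tail


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k
```

Fusing ranks rather than raw scores avoids calibrating BM25 scores against neural reranker scores, which live on different scales. For example, with a first-stage list `["d1", "d2", "d3", "d4"]`, `depth=3`, and neural scores favoring `d2`, the fused list promotes `d2` to the top while `d4` stays at the end.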

Subject(s)

COVID-19 , Algorithms , Humans , Information Storage and Retrieval , Language , SARS-CoV-2
