Open-source large language models in action: A bioinformatics chatbot for PRIDE database.
Bai, Jingwen; Kamatchinathan, Selvakumar; Kundu, Deepti J; Bandla, Chakradhar; Vizcaíno, Juan Antonio; Perez-Riverol, Yasset.
Affiliation
  • Bai J; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
  • Kamatchinathan S; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
  • Kundu DJ; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
  • Bandla C; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
  • Vizcaíno JA; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
  • Perez-Riverol Y; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Proteomics; e2400005, 2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38556628
ABSTRACT
We present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework uses multiple large language models (LLMs): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), a web interface, and components for indexing and managing vector databases. A benchmark component based on an Elo ranking system is included as well, which makes it possible to evaluate the performance of each LLM and to improve the PRIDE documentation. The chatbot not only allows users to interact with the PRIDE documentation but can also be used to search for PRIDE datasets through an LLM-based recommendation system, improving dataset discoverability. Importantly, although our infrastructure is exemplified through its application to the PRIDE database, the modular and adaptable nature of the approach makes it a valuable tool for improving user experiences across a range of bioinformatics and proteomics tools and resources, among other domains. Together, the integration of advanced LLMs, the vector-based construction, the benchmarking framework, and the optimized documentation form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).
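The abstract mentions an Elo ranking benchmark without giving details. As a rough illustration only, the following is a minimal sketch of a standard Elo update applied to pairwise comparisons between models. The function names, the starting rating of 1000, the K-factor of 32, and the example votes are all assumptions made for illustration; they are not taken from the PRIDE chatbot code base and do not represent results from the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was, 0.5 for a tie.
    k (the K-factor) is an assumed value, not one taken from the paper.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


def rank_models(comparisons, initial: float = 1000.0, k: float = 32.0):
    """Process (model_a, model_b, score_a) tuples in order; return a ranking."""
    ratings = {}
    for model_a, model_b, score_a in comparisons:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(ra, rb, score_a, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up votes, purely to show the call shape (not data from the paper).
    votes = [
        ("llama2", "mixtral", 0.0),
        ("mixtral", "openhermes", 1.0),
        ("chatglm", "llama2", 0.5),
    ]
    for name, rating in rank_models(votes):
        print(f"{name}: {rating:.1f}")
```

In a setup like this, each user judgment of two answers to the same question counts as one comparison, and the ratings converge toward a ranking as more judgments accumulate. Because the updates depend on the order of comparisons, benchmarks of this kind often shuffle or average over several orderings.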

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Proteomics Journal subject: BIOCHEMISTRY Year: 2024 Document type: Article Country of affiliation: United Kingdom