Search | BVS Aleitamento Materno

The ProteomeXchange consortium at 10 years: 2023 update.

Deutsch, Eric W; Bandeira, Nuno; Perez-Riverol, Yasset; Sharma, Vagisha; Carver, Jeremy J; Mendoza, Luis; Kundu, Deepti J; Wang, Shengbo; Bandla, Chakradhar; Kamatchinathan, Selvakumar; Hewapathirana, Suresh; Pullman, Benjamin S; Wertz, Julie; Sun, Zhi; Kawano, Shin; Okuda, Shujiro; Watanabe, Yu; MacLean, Brendan; MacCoss, Michael J; Zhu, Yunping; Ishihama, Yasushi; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 51(D1): D1539-D1548, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36370099

ABSTRACT

Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.
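
The Universal Spectrum Identifiers mentioned above are colon-delimited strings that point to a single spectrum in a public dataset. The following is a minimal parsing sketch, not code from any PX resource. It assumes the basic layout `mzspec:<collection>:<MS run>:<index type>:<index>[:<interpretation>]` and that the run name contains no colon; the `USI` tuple and `parse_usi` function are hypothetical names introduced here, and the example string is illustrative only.

```python
from typing import NamedTuple, Optional


class USI(NamedTuple):
    """Fields of a Universal Spectrum Identifier (hypothetical container)."""
    collection: str                 # dataset accession, e.g. a PXD identifier
    ms_run: str                     # name of the MS run (file) in the dataset
    index_type: str                 # how the spectrum is addressed, e.g. "scan"
    index: str                      # the spectrum index itself
    interpretation: Optional[str]   # optional peptide/charge annotation


def parse_usi(usi: str) -> USI:
    """Split a USI into its fields.

    Simplifying assumption: the MS run name contains no ':' character.
    """
    # At most 6 parts: prefix, collection, run, index type, index, interpretation
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a recognisable USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


if __name__ == "__main__":
    example = ("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09"
               ":scan:17555:VLHPLEGAVVIIFK/2")
    print(parse_usi(example))
```

Keeping the interpretation optional lets the same function handle identifiers that only locate a spectrum and identifiers that also carry a peptide assignment.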

Subjects

Proteomics; Software; Humans; Databases, Protein; Mass Spectrometry; Proteomics/methods; Computational Biology/methods

Open-source large language models in action: A bioinformatics chatbot for PRIDE database.

Bai, Jingwen; Kamatchinathan, Selvakumar; Kundu, Deepti J; Bandla, Chakradhar; Vizcaíno, Juan Antonio; Perez-Riverol, Yasset.

Proteomics ; : e2400005, 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38556628

ABSTRACT

We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLM): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows for evaluating the performance of each LLM and for improving PRIDE documentation. The chatbot not only allows users to interact with PRIDE documentation but can also be used to search and find PRIDE datasets using an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).
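
The abstract does not say how the Elo-based benchmark is parameterised. As a rough illustration only, the sketch below applies the standard Elo update to a pairwise comparison between two models' answers. The scale of 400, the K-factor of 32, the starting rating of 1000 and the function names are assumptions made here, not details of the PRIDE chatbot framework.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was,
    and 0.5 for a tie. K = 32 is an assumed value.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    # B's actual and expected scores are the complements of A's
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two models start level; a user prefers the first model's answer.
    ratings = {"model_a": 1000.0, "model_b": 1000.0}
    ratings["model_a"], ratings["model_b"] = update_elo(
        ratings["model_a"], ratings["model_b"], score_a=1.0)
    print(ratings)
```

Because each update moves the two ratings by equal and opposite amounts, the total rating pool stays constant, which makes rankings comparable across many pairwise votes.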

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.

Perez-Riverol, Yasset; Bai, Jingwen; Bandla, Chakradhar; García-Seisdedos, David; Hewapathirana, Suresh; Kamatchinathan, Selvakumar; Kundu, Deepti J; Prakash, Ananth; Frericks-Zipper, Anika; Eisenacher, Martin; Walzer, Mathias; Wang, Shengbo; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 50(D1): D543-D552, 2022 Jan 07.

Article in English | MEDLINE | ID: mdl-34723319

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.

Subjects

Databases, Protein; Metadata/statistics & numerical data; Molecular Sequence Annotation/statistics & numerical data; Peptides/chemistry; Proteins/chemistry; Software; Amino Acid Sequence; Bibliometrics; Datasets as Topic; Humans; Information Storage and Retrieval; Internet; Mass Spectrometry; Peptides/genetics; Peptides/metabolism; Proteins/genetics; Proteins/metabolism; Proteomics/instrumentation; Proteomics/methods; Sequence Alignment

A proteomics sample metadata representation for multiomics integration and big data analysis.

Dai, Chengxin; Füllgrabe, Anja; Pfeuffer, Julianus; Solovyeva, Elizaveta M; Deng, Jingwen; Moreno, Pablo; Kamatchinathan, Selvakumar; Kundu, Deepti Jaiswal; George, Nancy; Fexova, Silvie; Grüning, Björn; Föll, Melanie Christine; Griss, Johannes; Vaudel, Marc; Audain, Enrique; Locard-Paulet, Marie; Turewicz, Michael; Eisenacher, Martin; Uszkoreit, Julian; Van Den Bossche, Tim; Schwämmle, Veit; Webel, Henry; Schulze, Stefan; Bouyssié, David; Jayaram, Savita; Duggineni, Vinay Kumar; Samaras, Patroklos; Wilhelm, Mathias; Choi, Meena; Wang, Mingxun; Kohlbacher, Oliver; Brazma, Alvis; Papatheodorou, Irene; Bandeira, Nuno; Deutsch, Eric W; Vizcaíno, Juan Antonio; Bai, Mingze; Sachsenberg, Timo; Levitsky, Lev I; Perez-Riverol, Yasset.

Nat Commun ; 12(1): 5854, 2021 Oct 06.

Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
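
To make the idea of linking sample metadata to dataset files concrete, the sketch below reads a small tab-delimited sample table and groups data files by sample. The abstract does not list column names, so the headers used here (`source name`, `characteristics[organism]`, `assay name`, `comment[data file]`) are assumptions modelled on the MAGE-TAB style, and the rows and file names are invented for illustration.

```python
import csv
import io

# Invented example table; headers are assumed, MAGE-TAB-style column names.
SAMPLE_TABLE = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tfile_01.raw\n"
    "sample 1\tHomo sapiens\trun 2\tfile_02.raw\n"
    "sample 2\tHomo sapiens\trun 3\tfile_03.raw\n"
)


def files_by_sample(table_text: str) -> dict[str, list[str]]:
    """Map each sample name to the data files recorded for it."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    mapping: dict[str, list[str]] = {}
    for row in reader:
        # One row per sample-to-file relationship
        mapping.setdefault(row["source name"], []).append(
            row["comment[data file]"])
    return mapping


if __name__ == "__main__":
    for sample, files in files_by_sample(SAMPLE_TABLE).items():
        print(sample, files)
```

A one-row-per-file layout like this is what allows a reanalysis pipeline to recover which raw files belong to which biological sample without consulting the original publication.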

Subjects

Data Analysis; Databases, Protein; Metadata; Proteomics; Big Data; Humans; Reproducibility of Results; Software; Transcriptome
