Search | BVS Regional Portal

Meta-Analysis of Rice Phosphoproteomics Data to Understand Variation in Cell Signaling Across the Rice Pan-Genome.

Ramsbottom, Kerry A; Prakash, Ananth; Perez-Riverol, Yasset; Camacho, Oscar Martin; Sun, Zhi; Kundu, Deepti J; Bowler-Barnett, Emily; Martin, Maria; Fan, Jun; Chebotarov, Dmytro; McNally, Kenneth L; Deutsch, Eric W; Vizcaíno, Juan Antonio; Jones, Andrew R.

J Proteome Res ; 23(7): 2518-2531, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38810119

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and link motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety and clustered the data to identify groups of sites with similar patterns across rice family groups. The data has been loaded into UniProt Knowledge-Base - enabling researchers to visualize sites alongside other data on rice proteins, e.g., structural models from AlphaFold2, PeptideAtlas, and the PRIDE database - enabling visualization of source evidence, including scores and supporting mass spectra.
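The cross-referencing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Saav` record, the `classify_saav` function, the labels it returns, and the default window of five residues are all assumptions made for illustration. The only idea taken from the abstract is that a variant at a phosphosite can remove the phosphorylatable residue, while a variant near the site is flagged as proximal.

```python
from typing import NamedTuple

# Residues that can carry the phosphosites discussed in the abstract.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


class Saav(NamedTuple):
    """Hypothetical record for a single amino acid variation."""
    position: int  # 1-based position in the protein sequence
    ref: str       # reference residue
    alt: str       # variant residue


def classify_saav(sequence: str, site_pos: int, saav: Saav, window: int = 5) -> str:
    """Classify a SAAV relative to one phosphosite (illustrative labels only).

    The window size is an assumed parameter, not a value from the paper.
    """
    if sequence[saav.position - 1] != saav.ref:
        raise ValueError("reference residue does not match the sequence")
    if saav.position == site_pos:
        # A substitution to a non-S/T/Y residue removes the phosphorylatable residue.
        return "site_lost" if saav.alt not in PHOSPHO_RESIDUES else "site_retained"
    if abs(saav.position - site_pos) <= window:
        return "proximal"
    return "distal"


if __name__ == "__main__":
    toy_sequence = "MASPTRK"  # made-up sequence, serine at position 3
    for variant in (Saav(3, "S", "A"), Saav(3, "S", "T"), Saav(5, "T", "A")):
        print(variant, classify_saav(toy_sequence, 3, variant))
```

A substitution between serine, threonine, and tyrosine is labelled "site_retained" here only because the residue remains phosphorylatable; whether the site is still recognised by its kinase is a separate question that this sketch does not address.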

Subjects

Plant Genome, Oryza, Phosphoproteins, Plant Proteins, Proteomics, Signal Transduction, Oryza/genetics, Oryza/metabolism, Oryza/chemistry, Proteomics/methods, Phosphoproteins/metabolism, Phosphoproteins/genetics, Phosphoproteins/chemistry, Phosphoproteins/analysis, Plant Proteins/genetics, Plant Proteins/metabolism, Phosphorylation, Post-Translational Protein Processing, Phosphopeptides/metabolism, Phosphopeptides/analysis, Protein Databases, Amino Acid Motifs, Mass Spectrometry

Open-source large language models in action: A bioinformatics chatbot for PRIDE database.

Bai, Jingwen; Kamatchinathan, Selvakumar; Kundu, Deepti J; Bandla, Chakradhar; Vizcaíno, Juan Antonio; Perez-Riverol, Yasset.

Proteomics ; : e2400005, 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38556628

ABSTRACT

We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLM): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows for evaluating the performance of each LLM and for improving PRIDE documentation. The chatbot not only allows users to interact with PRIDE documentation but can also be used to search and find PRIDE datasets using an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).
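The abstract mentions an Elo-ranking-based benchmark for comparing the LLMs but does not give its details. As a rough illustration, the standard Elo update for one pairwise comparison can be sketched as below. The function names, the starting rating of 1000, and the K-factor of 32 are assumptions chosen for the example and are not taken from the pride-chatbot code.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was, 0.5 for a tie.
    The K-factor of 32 is an assumed default, not a value from the paper.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Model names come from the abstract; the comparison outcomes are invented.
    ratings = {"llama2": 1000.0, "chatglm": 1000.0, "mixtral": 1000.0, "openhermes": 1000.0}
    comparisons = [("mixtral", "llama2", 1.0), ("chatglm", "openhermes", 0.5)]
    for model_a, model_b, score_a in comparisons:
        ratings[model_a], ratings[model_b] = update_elo(
            ratings[model_a], ratings[model_b], score_a
        )
    for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
        print(f"{name}: {rating:.1f}")
```

Because each update moves the two ratings by equal and opposite amounts, the total rating pool stays constant, which makes the resulting ranking easy to compare across benchmark runs.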

A meta-analysis of rice phosphoproteomics data to understand variation in cell signalling across the rice pan-genome.

Ramsbottom, Kerry A; Prakash, Ananth; Perez-Riverol, Yasset; Camacho, Oscar Martin; Sun, Zhi; Kundu, Deepti J; Bowler-Barnett, Emily; Martin, Maria; Fan, Jun; Chebotarov, Dmytro; McNally, Kenneth L; Deutsch, Eric W; Vizcaíno, Juan Antonio; Jones, Andrew R.

bioRxiv ; 2023 Nov 17.

Article in English | MEDLINE | ID: mdl-38014076

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and link motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica, but mostly absent in Aus type rice varieties - known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base - enabling researchers to visualise sites alongside other data on rice proteins e.g. structural models from AlphaFold2, PeptideAtlas and the PRIDE database - enabling visualisation of source evidence, including scores and supporting mass spectra.

The ProteomeXchange consortium at 10 years: 2023 update.

Deutsch, Eric W; Bandeira, Nuno; Perez-Riverol, Yasset; Sharma, Vagisha; Carver, Jeremy J; Mendoza, Luis; Kundu, Deepti J; Wang, Shengbo; Bandla, Chakradhar; Kamatchinathan, Selvakumar; Hewapathirana, Suresh; Pullman, Benjamin S; Wertz, Julie; Sun, Zhi; Kawano, Shin; Okuda, Shujiro; Watanabe, Yu; MacLean, Brendan; MacCoss, Michael J; Zhu, Yunping; Ishihama, Yasushi; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 51(D1): D1539-D1548, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36370099

ABSTRACT

Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.

Subjects

Proteomics, Software, Humans, Protein Databases, Mass Spectrometry, Proteomics/methods, Computational Biology/methods

Expression Atlas update: gene and protein expression in multiple species.

Moreno, Pablo; Fexova, Silvie; George, Nancy; Manning, Jonathan R; Miao, Zhichiao; Mohammed, Suhaib; Muñoz-Pomer, Alfonso; Fullgrabe, Anja; Bi, Yalan; Bush, Natassja; Iqbal, Haider; Kumbham, Upendra; Solovyev, Andrey; Zhao, Lingyun; Prakash, Ananth; García-Seisdedos, David; Kundu, Deepti J; Wang, Shengbo; Walzer, Mathias; Clarke, Laura; Osumi-Sutherland, David; Tello-Ruiz, Marcela Karey; Kumari, Sunita; Ware, Doreen; Eliasova, Jana; Arends, Mark J; Nawijn, Martijn C; Meyer, Kerstin; Burdett, Tony; Marioni, John; Teichmann, Sarah; Vizcaíno, Juan Antonio; Brazma, Alvis; Papatheodorou, Irene.

Nucleic Acids Res ; 50(D1): D129-D140, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34850121

ABSTRACT

The EMBL-EBI Expression Atlas is an added value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy to visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source community developed tools. Each study's metadata are annotated using ontologies. The data are re-analyzed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to Single Cell RNA-Sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and it is available at https://www.ebi.ac.uk/gxa.

Subjects

Genetic Databases, Proteins/genetics, Proteomics, Software, Computational Biology, Gene Expression Profiling, Humans, Proteins/chemistry, RNA-Seq, RNA Sequence Analysis, Single-Cell Analysis

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.

Perez-Riverol, Yasset; Bai, Jingwen; Bandla, Chakradhar; García-Seisdedos, David; Hewapathirana, Suresh; Kamatchinathan, Selvakumar; Kundu, Deepti J; Prakash, Ananth; Frericks-Zipper, Anika; Eisenacher, Martin; Walzer, Mathias; Wang, Shengbo; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 50(D1): D543-D552, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34723319

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.

Subjects

Protein Databases, Metadata/statistics & numerical data, Molecular Sequence Annotation/statistics & numerical data, Peptides/chemistry, Proteins/chemistry, Software, Amino Acid Sequence, Bibliometrics, Datasets as Topic, Humans, Information Storage and Retrieval, Internet, Mass Spectrometry, Peptides/genetics, Peptides/metabolism, Proteins/genetics, Proteins/metabolism, Proteomics/instrumentation, Proteomics/methods, Sequence Alignment

An integrated landscape of protein expression in human cancer.

Jarnuczak, Andrew F; Najgebauer, Hanna; Barzine, Mitra; Kundu, Deepti J; Ghavidel, Fatemeh; Perez-Riverol, Yasset; Papatheodorou, Irene; Brazma, Alvis; Vizcaíno, Juan Antonio.

Sci Data ; 8(1): 115, 2021 04 23.

Article in English | MEDLINE | ID: mdl-33893311

ABSTRACT

Using 11 proteomics datasets, mostly available through the PRIDE database, we assembled a reference expression map for 191 cancer cell lines and 246 clinical tumour samples, across 13 lineages. We found unique peptides identified only in tumour samples despite a much higher coverage in cell lines. These were mainly mapped to proteins related to regulation of signalling receptor activity. Correlations between baseline expression in cell lines and tumours were calculated. We found these to be highly similar across all samples with most similarity found within a given sample type. Integration of proteomics and transcriptomics data showed median correlation across cell lines to be 0.58 (range between 0.43 and 0.66). Additionally, in agreement with previous studies, variation in mRNA levels was often a poor predictor of changes in protein abundance. To our knowledge, this work constitutes the first meta-analysis focusing on cancer-related public proteomics datasets. We therefore also highlight shortcomings and limitations of such studies. All data is available through PRIDE dataset identifier PXD013455 and in Expression Atlas.

Subjects

Neoplasm Proteins/biosynthesis, Neoplasms/metabolism, Tumor Cell Line, Datasets as Topic, Humans, Neoplasm Proteins/genetics, Neoplasms/genetics, Proteomics, Messenger RNA/biosynthesis, Messenger RNA/genetics, Transcriptome

The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics.

Deutsch, Eric W; Bandeira, Nuno; Sharma, Vagisha; Perez-Riverol, Yasset; Carver, Jeremy J; Kundu, Deepti J; García-Seisdedos, David; Jarnuczak, Andrew F; Hewapathirana, Suresh; Pullman, Benjamin S; Wertz, Julie; Sun, Zhi; Kawano, Shin; Okuda, Shujiro; Watanabe, Yu; Hermjakob, Henning; MacLean, Brendan; MacCoss, Michael J; Zhu, Yunping; Ishihama, Yasushi; Vizcaíno, Juan A.

Nucleic Acids Res ; 48(D1): D1145-D1152, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31686107

ABSTRACT

The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX existing members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, and from those, more than 9 500 in just the last three years. In parallel, an unprecedented increase of data re-use activities in the field, including 'big data' approaches, is enabling novel research and new data resources. At last, we also outline some of our future plans for the coming years.

Subjects

Computational Biology/methods, Protein Databases, Proteomics/methods, Big Data, Data Mining, Software, Software Design, Web Browser

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Perez-Riverol, Yasset; Csordas, Attila; Bai, Jingwen; Bernal-Llinares, Manuel; Hewapathirana, Suresh; Kundu, Deepti J; Inuganti, Avinash; Griss, Johannes; Mayer, Gerhard; Eisenacher, Martin; Pérez, Enrique; Uszkoreit, Julian; Pfeuffer, Julianus; Sachsenberg, Timo; Yilmaz, Sule; Tiwary, Shivani; Cox, Jürgen; Audain, Enrique; Walzer, Mathias; Jarnuczak, Andrew F; Ternent, Tobias; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 47(D1): D442-D450, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30395289

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. At last, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

Subjects

Protein Databases, Mass Spectrometry, Proteomics, Peptides/chemistry, Software
