Search | BVS Integralidade em Saúde

1.

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2023.

Thakur, Matthew; Buniello, Annalisa; Brooksbank, Catherine; Gurwitz, Kim T; Hall, Matthew; Hartley, Matthew; Hulcoop, David G; Leach, Andrew R; Marques, Diana; Martin, Maria; Mithani, Aziz; McDonagh, Ellen M; Mutasa-Gottgens, Euphemia; Ochoa, David; Perez-Riverol, Yasset; Stephenson, James; Varadi, Mihaly; Velankar, Sameer; Vizcaino, Juan Antonio; Witham, Rick; McEntyre, Johanna.

Nucleic Acids Res ; 52(D1): D10-D17, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38015445

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.

Subjects

Academies and Institutes; Computational Biology; Computational Biology/organization & administration; Computational Biology/trends; Academies and Institutes/organization & administration; Academies and Institutes/trends; Databases, Nucleic Acid; Europe

2.

Expression Atlas update: insights from sequencing data at both bulk and single cell level.

George, Nancy; Fexova, Silvie; Fuentes, Alfonso Munoz; Madrigal, Pedro; Bi, Yalan; Iqbal, Haider; Kumbham, Upendra; Nolte, Nadja Francesca; Zhao, Lingyun; Thanki, Anil S; Yu, Iris D; Marugan Calles, Jose C; Erdos, Karoly; Vilmovsky, Liora; Kurri, Sandeep R; Vathrakokoili-Pournara, Anna; Osumi-Sutherland, David; Prakash, Ananth; Wang, Shengbo; Tello-Ruiz, Marcela K; Kumari, Sunita; Ware, Doreen; Goutte-Gattat, Damien; Hu, Yanhui; Brown, Nick; Perrimon, Norbert; Vizcaíno, Juan Antonio; Burdett, Tony; Teichmann, Sarah; Brazma, Alvis; Papatheodorou, Irene.

Nucleic Acids Res ; 52(D1): D107-D114, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37992296

ABSTRACT

Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc) are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at single cell level. These resources aim to allow users to investigate their expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users are invited to search for genes or metadata terms across species or biological conditions in a standardised consistent interface. Alongside these data, new features in Single Cell Expression Atlas allow users to query metadata through our new cell type wheel search. At the experiment level data can be explored through two types of dimensionality reduction plots, t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), overlaid with either clustering or metadata information to assist users' understanding. Data are also visualised as marker gene heatmaps identifying genes that help confer cluster identity. For some data, additional visualisations are available as interactive cell level anatomograms and cell type gene expression heatmaps.

Subjects

Databases, Genetic; Gene Expression Profiling; Proteomics; Genotype; Metadata; Single-Cell Analysis; Internet; Humans; Animals

3.

Integrated meta-analysis of colorectal cancer public proteomic datasets for biomarker discovery and validation.

Robles, Javier; Prakash, Ananth; Vizcaíno, Juan Antonio; Casal, J Ignacio.

PLoS Comput Biol ; 20(1): e1011828, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38252632

ABSTRACT

The cancer biomarker field has been an object of thorough investigation in the last decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for patient classification according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information is vastly underused. Here, we attempted to reuse public proteomics datasets with two main objectives: (i) to generate hypotheses (detection of biomarkers) for their downstream validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve CRC public proteomics datasets (mostly from the PRIDE database) were re-analysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. Integrating this data with survival annotation data, we have validated in silico a six-gene signature for CRC classification at the protein level, and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. In conclusion, this proof-of-concept study demonstrates the value of re-using public proteomics datasets as the basis to create a useful resource for biomarker discovery and validation. The protein expression data has been made available in the public resource Expression Atlas.

Subjects

Colorectal Neoplasms; Proteomics; Humans; Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/genetics; Colorectal Neoplasms/metabolism; Biomarkers, Tumor/metabolism; Blood Proteins; Protein Disulfide-Isomerases

4.

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022.

Thakur, Matthew; Bateman, Alex; Brooksbank, Cath; Freeberg, Mallory; Harrison, Melissa; Hartley, Matthew; Keane, Thomas; Kleywegt, Gerard; Leach, Andrew; Levchenko, Mariia; Morgan, Sarah; McDonagh, Ellen M; Orchard, Sandra; Papatheodorou, Irene; Velankar, Sameer; Vizcaino, Juan Antonio; Witham, Rick; Zdrazil, Barbara; McEntyre, Johanna.

Nucleic Acids Res ; 51(D1): D9-D17, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36477213

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.

Subjects

Artificial Intelligence; Computational Biology; Data Management; Databases, Factual; Genome; Internet

5.

The ProteomeXchange consortium at 10 years: 2023 update.

Deutsch, Eric W; Bandeira, Nuno; Perez-Riverol, Yasset; Sharma, Vagisha; Carver, Jeremy J; Mendoza, Luis; Kundu, Deepti J; Wang, Shengbo; Bandla, Chakradhar; Kamatchinathan, Selvakumar; Hewapathirana, Suresh; Pullman, Benjamin S; Wertz, Julie; Sun, Zhi; Kawano, Shin; Okuda, Shujiro; Watanabe, Yu; MacLean, Brendan; MacCoss, Michael J; Zhu, Yunping; Ishihama, Yasushi; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 51(D1): D1539-D1548, 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36370099

ABSTRACT

Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.

Subjects

Proteomics; Software; Humans; Databases, Protein; Mass Spectrometry; Proteomics/methods; Computational Biology/methods

6.

TopDownApp: An open and modular platform for analysis and visualisation of top-down proteomics data.

Walzer, Mathias; Jeong, Kyowon; Tabb, David L; Vizcaíno, Juan Antonio.

Proteomics ; 24(3-4): e2200403, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37787899

ABSTRACT

Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.

Subjects

Proteins; Proteomics; Proteomics/methods; Proteins/analysis; Workflow

7.

Open-source large language models in action: A bioinformatics chatbot for PRIDE database.

Bai, Jingwen; Kamatchinathan, Selvakumar; Kundu, Deepti J; Bandla, Chakradhar; Vizcaíno, Juan Antonio; Perez-Riverol, Yasset.

Proteomics ; : e2400005, 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38556628

ABSTRACT

We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLM): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows for evaluating the performance of each LLM and for improving PRIDE documentation. The chatbot not only allows users to interact with PRIDE documentation but can also be used to search and find PRIDE datasets using an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).
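
The Elo-ranking benchmark mentioned in the abstract above can be illustrated with a minimal sketch of a standard Elo update after one pairwise comparison of two models. This is not the PRIDE chatbot's code: the abstract does not give its parameters, so the K-factor of 32, the 400-point scale and the starting rating of 1000 are conventional Elo defaults assumed here, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was,
    and 0.5 for a tie. k=32 is an assumed, conventional K-factor.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Two models start at the same (assumed) rating; model A's answer is preferred.
ratings = {"llama2": 1000.0, "mixtral": 1000.0}
ratings["llama2"], ratings["mixtral"] = update_elo(
    ratings["llama2"], ratings["mixtral"], score_a=1.0
)
print(ratings)  # {'llama2': 1016.0, 'mixtral': 984.0}
```

Because each update moves the two ratings by equal and opposite amounts, the total rating across models is conserved, so only the relative ordering carries information.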

8.

Integrated Proteomics Analysis of Baseline Protein Expression in Pig Tissues.

Wang, Shengbo; Collins, Andrew; Prakash, Ananth; Fexova, Silvie; Papatheodorou, Irene; Jones, Andrew R; Vizcaíno, Juan Antonio.

J Proteome Res ; 23(6): 1948-1959, 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38717300

ABSTRACT

The availability of an increasingly large amount of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa, a domestic pig, is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples. The analysis involved the quantification of protein abundance in 599 mass spectrometry runs. We compared protein expression patterns among different pig organs and examined the distribution of proteins across these organs. Then, we studied how protein abundances were compared across different data sets and studied the tissue specificity of the detected proteins. Of particular interest, we conducted a comparative analysis of protein expression between pig and human tissues, revealing a high degree of correlation in protein expression among orthologs, particularly in brain, kidney, heart, and liver samples. We have integrated the protein expression results into the Expression Atlas resource for easy access and visualization of the protein expression data individually or alongside gene expression data.

Subjects

Kidney; Proteomics; Animals; Proteomics/methods; Humans; Swine; Kidney/metabolism; Kidney/chemistry; Organ Specificity; Liver/metabolism; Liver/chemistry; Databases, Protein; Brain/metabolism; Myocardium/metabolism; Myocardium/chemistry; Sus scrofa/metabolism; Sus scrofa/genetics; Proteome/metabolism; Proteome/analysis; Mass Spectrometry

9.

Meta-Analysis of Rice Phosphoproteomics Data to Understand Variation in Cell Signaling Across the Rice Pan-Genome.

Ramsbottom, Kerry A; Prakash, Ananth; Perez-Riverol, Yasset; Camacho, Oscar Martin; Sun, Zhi; Kundu, Deepti J; Bowler-Barnett, Emily; Martin, Maria; Fan, Jun; Chebotarov, Dmytro; McNally, Kenneth L; Deutsch, Eric W; Vizcaíno, Juan Antonio; Jones, Andrew R.

J Proteome Res ; 23(7): 2518-2531, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38810119

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety and clustered the data to identify groups of sites with similar patterns across rice family groups. The data has been loaded into UniProt Knowledge-Base, enabling researchers to visualize sites alongside other data on rice proteins, e.g., structural models from AlphaFold2, PeptideAtlas, and the PRIDE database, enabling visualization of source evidence, including scores and supporting mass spectra.

Subjects

Genome, Plant; Oryza; Phosphoproteins; Plant Proteins; Proteomics; Signal Transduction; Oryza/genetics; Oryza/metabolism; Oryza/chemistry; Proteomics/methods; Phosphoproteins/metabolism; Phosphoproteins/genetics; Phosphoproteins/chemistry; Phosphoproteins/analysis; Plant Proteins/genetics; Plant Proteins/metabolism; Phosphorylation; Protein Processing, Post-Translational; Phosphopeptides/metabolism; Phosphopeptides/analysis; Databases, Protein; Amino Acid Motifs; Mass Spectrometry

10.

WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows.

Bouyssié, David; Altiner, Pinar; Capella-Gutierrez, Salvador; Fernández, José M; Hagemeijer, Yanick Paco; Horvatovich, Peter; Hubálek, Martin; Levander, Fredrik; Mauri, Pierluigi; Palmblad, Magnus; Raffelsberger, Wolfgang; Rodríguez-Navas, Laura; Di Silvestre, Dario; Kunkli, Balázs Tibor; Uszkoreit, Julian; Vandenbrouck, Yves; Vizcaíno, Juan Antonio; Winkelhardt, Dirk; Schwämmle, Veit.

J Proteome Res ; 23(1): 418-429, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-38038272

ABSTRACT

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.

Subjects

Benchmarking; Proteomics; Workflow; Software; Proteins; Data Analysis
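
To illustrate the kind of input the abstract above refers to, here is a minimal sketch that reads a small SDRF-Proteomics-style table and groups raw files by sample. It is not WOMBAT-P code. SDRF-Proteomics is a tab-delimited format; the column names used (`source name`, `characteristics[organism]`, `assay name`, `comment[data file]`) follow my understanding of that format and should be checked against the specification, and every sample and file name is invented for the example.

```python
import csv
import io
from collections import defaultdict

# Invented, in-memory example table (tab-delimited, one row per MS run).
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tsample1_f1.raw\n"
    "sample 1\tHomo sapiens\trun 2\tsample1_f2.raw\n"
    "sample 2\tHomo sapiens\trun 3\tsample2_f1.raw\n"
)


def files_by_sample(sdrf_text: str) -> dict:
    """Map each 'source name' to the list of data files recorded for it."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["source name"]].append(row["comment[data file]"])
    return dict(grouped)


for sample, files in files_by_sample(SDRF_TEXT).items():
    print(sample, files)
```

Keeping the sample-to-file relationship in one annotated table is what lets several workflows be run on the same input without per-tool configuration.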

11.

Universal Spectrum Identifier for mass spectra.

Deutsch, Eric W; Perez-Riverol, Yasset; Carver, Jeremy; Kawano, Shin; Mendoza, Luis; Van Den Bossche, Tim; Gabriels, Ralf; Binz, Pierre-Alain; Pullman, Benjamin; Sun, Zhi; Shofstahl, Jim; Bittremieux, Wout; Mak, Tytus D; Klein, Joshua; Zhu, Yunping; Lam, Henry; Vizcaíno, Juan Antonio; Bandeira, Nuno.

Nat Methods ; 18(7): 768-770, 2021 Jul.

Article in English | MEDLINE | ID: mdl-34183830

ABSTRACT

Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.

Subjects

Databases, Protein; Mass Spectrometry/methods; Proteomics/methods; Signal Processing, Computer-Assisted; Software; Algorithms
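
The "virtual path" idea in the abstract above can be sketched as a small parser that splits a USI into its colon-separated parts. This is a simplified illustration, not a reference implementation: the field layout (`mzspec` prefix, dataset collection, MS run name, index type, index, optional interpretation) reflects my understanding of the USI format and should be checked against the PSI specification. It assumes the MS run name contains no colon, which the full specification does not guarantee, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class USI:
    collection: str                       # dataset accession, e.g. a PXD identifier
    ms_run: str                           # MS run (file) name within the dataset
    index_type: str                       # how the spectrum is addressed, e.g. "scan"
    index: str                            # the spectrum index itself
    interpretation: Optional[str] = None  # optional peptide/charge annotation


def parse_usi(usi: str) -> USI:
    """Split a USI string into its parts (simplified: no colons in the run name)."""
    # Limit the split so that colons inside the interpretation are preserved.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a recognisable USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


example = "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
print(parse_usi(example))
```

Reading the parts left to right walks from repository dataset to run file to a single spectrum, which is what lets a publication point at the exact spectral evidence.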

12.

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.

Perez-Riverol, Yasset; Bai, Jingwen; Bandla, Chakradhar; García-Seisdedos, David; Hewapathirana, Suresh; Kamatchinathan, Selvakumar; Kundu, Deepti J; Prakash, Ananth; Frericks-Zipper, Anika; Eisenacher, Martin; Walzer, Mathias; Wang, Shengbo; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 50(D1): D543-D552, 2022 Jan 07.

Article in English | MEDLINE | ID: mdl-34723319

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.

Subjects

Databases, Protein; Metadata/statistics & numerical data; Molecular Sequence Annotation/statistics & numerical data; Peptides/chemistry; Proteins/chemistry; Software; Amino Acid Sequence; Bibliometrics; Datasets as Topic; Humans; Information Storage and Retrieval; Internet; Mass Spectrometry; Peptides/genetics; Peptides/metabolism; Proteins/genetics; Proteins/metabolism; Proteomics/instrumentation; Proteomics/methods; Sequence Alignment

13.

Expression Atlas update: gene and protein expression in multiple species.

Moreno, Pablo; Fexova, Silvie; George, Nancy; Manning, Jonathan R; Miao, Zhichiao; Mohammed, Suhaib; Muñoz-Pomer, Alfonso; Fullgrabe, Anja; Bi, Yalan; Bush, Natassja; Iqbal, Haider; Kumbham, Upendra; Solovyev, Andrey; Zhao, Lingyun; Prakash, Ananth; García-Seisdedos, David; Kundu, Deepti J; Wang, Shengbo; Walzer, Mathias; Clarke, Laura; Osumi-Sutherland, David; Tello-Ruiz, Marcela Karey; Kumari, Sunita; Ware, Doreen; Eliasova, Jana; Arends, Mark J; Nawijn, Martijn C; Meyer, Kerstin; Burdett, Tony; Marioni, John; Teichmann, Sarah; Vizcaíno, Juan Antonio; Brazma, Alvis; Papatheodorou, Irene.

Nucleic Acids Res ; 50(D1): D129-D140, 2022 Jan 07.

Article in English | MEDLINE | ID: mdl-34850121

ABSTRACT

The EMBL-EBI Expression Atlas is an added value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy to visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source community developed tools. Each study's metadata are annotated using ontologies. The data are re-analyzed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to Single Cell RNA-Sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and it is available at https://www.ebi.ac.uk/gxa.

Subjects

Databases, Genetic; Proteins/genetics; Proteomics; Software; Computational Biology; Gene Expression Profiling; Humans; Proteins/chemistry; RNA-Seq; Sequence Analysis, RNA; Single-Cell Analysis

14.

Is DIA proteomics data FAIR? Current data sharing practices, available bioinformatics infrastructure and recommendations for the future.

Jones, Andrew R; Deutsch, Eric W; Vizcaíno, Juan Antonio.

Proteomics ; 23(7-8): e2200014, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36074795

ABSTRACT

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.

Subjects

Proteome; Proteomics; Proteomics/methods; Mass Spectrometry/methods; Computational Biology/methods

15.

Integrated View of Baseline Protein Expression in Human Tissues.

Prakash, Ananth; García-Seisdedos, David; Wang, Shengbo; Kundu, Deepti Jaiswal; Collins, Andrew; George, Nancy; Moreno, Pablo; Papatheodorou, Irene; Jones, Andrew R; Vizcaíno, Juan Antonio.

J Proteome Res ; 22(3): 729-742, 2023 Mar 03.

Article in English | MEDLINE | ID: mdl-36577097

ABSTRACT

The availability of proteomics datasets in the public domain, and in the PRIDE database, in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the resource Expression Atlas, where they can be accessed and visualized either individually or together with gene expression data coming from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible for life scientists.

Subjects

Proteome; Proteomics; Humans; Proteome/analysis; Proteomics/methods; Gene Expression Profiling; Databases, Factual; Mass Spectrometry/methods; Databases, Protein

16.

ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics.

Rehfeldt, Tobias G; Gabriels, Ralf; Bouwmeester, Robbin; Gessulat, Siegfried; Neely, Benjamin A; Palmblad, Magnus; Perez-Riverol, Yasset; Schmidt, Tobias; Vizcaíno, Juan Antonio; Deutsch, Eric W.

J Proteome Res ; 22(2): 632-636, 2023 Feb 03.

Article in English | MEDLINE | ID: mdl-36693629

ABSTRACT

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.

Subjects

Algorithms; Proteomics; Proteomics/methods; Reproducibility of Results; Peptides/analysis; Mass Spectrometry/methods; Software

17.

Toward an Integrated Machine Learning Model of a Proteomics Experiment.

Neely, Benjamin A; Dorfer, Viktoria; Martens, Lennart; Bludau, Isabell; Bouwmeester, Robbin; Degroeve, Sven; Deutsch, Eric W; Gessulat, Siegfried; Käll, Lukas; Palczynski, Pawel; Payne, Samuel H; Rehfeldt, Tobias Greisager; Schmidt, Tobias; Schwämmle, Veit; Uszkoreit, Julian; Vizcaíno, Juan Antonio; Wilhelm, Mathias; Palmblad, Magnus.

J Proteome Res ; 22(3): 681-696, 2023 Mar 03.

Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.

Subjects

Machine Learning , Proteomics , Proteomics/methods , Algorithms , Mass Spectrometry

18.

Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work.

Deutsch, Eric W; Vizcaíno, Juan Antonio; Jones, Andrew R; Binz, Pierre-Alain; Lam, Henry; Klein, Joshua; Bittremieux, Wout; Perez-Riverol, Yasset; Tabb, David L; Walzer, Mathias; Ricard-Blum, Sylvie; Hermjakob, Henning; Neumann, Steffen; Mak, Tytus D; Kawano, Shin; Mendoza, Luis; Van Den Bossche, Tim; Gabriels, Ralf; Bandeira, Nuno; Carver, Jeremy; Pullman, Benjamin; Sun, Zhi; Hoffmann, Nils; Shofstahl, Jim; Zhu, Yunping; Licata, Luana; Quaglia, Federica; Tosatto, Silvio C E; Orchard, Sandra E.

J Proteome Res ; 22(2): 287-301, 2023 02 03.

Article in English | MEDLINE | ID: mdl-36626722

ABSTRACT

The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed is described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.

Subjects

Proteome , Proteomics , Humans , Reference Standards , Vocabulary, Controlled , Mass Spectrometry , Databases, Protein

19.

Integrated view and comparative analysis of baseline protein expression in mouse and rat tissues.

Wang, Shengbo; García-Seisdedos, David; Prakash, Ananth; Kundu, Deepti Jaiswal; Collins, Andrew; George, Nancy; Fexova, Silvie; Moreno, Pablo; Papatheodorou, Irene; Jones, Andrew R; Vizcaíno, Juan Antonio.

PLoS Comput Biol ; 18(6): e1010174, 2022 06.

Article in English | MEDLINE | ID: mdl-35714157

ABSTRACT

The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the resource Expression Atlas for widespread dissemination.

Subjects

Proteins , Proteomics , Animals , Brain/metabolism , Mice , Proteins/metabolism , Rats

20.

Data Management of Sensitive Human Proteomics Data: Current Practices, Recommendations, and Perspectives for the Future.

Bandeira, Nuno; Deutsch, Eric W; Kohlbacher, Oliver; Martens, Lennart; Vizcaíno, Juan Antonio.

Mol Cell Proteomics ; 20: 100071, 2021.

Article in English | MEDLINE | ID: mdl-33711481

ABSTRACT

Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitive, human proteomics data. Both in the United States and in the European Union, there are legal frameworks protecting the privacy of individuals. Implementing privacy standards for publicly released research data in genomics and transcriptomics has led to processes to control who may access the data, so-called "controlled access" data. In parallel with the technological developments in the field, it is clear that the privacy risks of sharing proteomics data need to be properly assessed and managed. In our view, the proteomics community must be proactive in addressing these issues. Yet a careful balance must be kept. On the one hand, neglecting to address the potential of identifiability in human proteomics data could lead to reputational damage of the field, while on the other hand, erecting barriers to open access to clinical proteomics data will inevitably reduce reuse of proteomics data and could substantially delay critical discoveries in biomedical research. In order to balance these apparently conflicting requirements for data privacy and efficient use and reuse of research efforts through the sharing of clinical proteomics data, development efforts will be needed at different levels including bioinformatics infrastructure, policymaking, and mechanisms of oversight.

Subjects

Data Management , Proteomics , Confidentiality , Humans , Information Dissemination
