Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 130
Filtrar
1.
J Proteome Res ; 2024 May 08.
Artículo en Inglés | MEDLINE | ID: mdl-38717300

RESUMEN

The availability of an increasingly large amount of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa, a domestic pig, is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples. The analysis involved the quantification of protein abundance in 599 mass spectrometry runs. We compared protein expression patterns among different pig organs and examined the distribution of proteins across these organs. Then, we studied how protein abundances were compared across different data sets and studied the tissue specificity of the detected proteins. Of particular interest, we conducted a comparative analysis of protein expression between pig and human tissues, revealing a high degree of correlation in protein expression among orthologs, particularly in brain, kidney, heart, and liver samples. We have integrated the protein expression results into the Expression Atlas resource for easy access and visualization of the protein expression data individually or alongside gene expression data.

2.
Proteomics ; : e2400005, 2024 Mar 31.
Artículo en Inglés | MEDLINE | ID: mdl-38556628

RESUMEN

We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLM): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows for evaluating the performance of each LLM and for improving PRIDE documentation. The chatbot not only allows users to interact with PRIDE documentation but can also be used to search and find PRIDE datasets using an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).

3.
PLoS Comput Biol ; 20(1): e1011828, 2024 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-38252632

RESUMEN

The cancer biomarker field has been an object of thorough investigation in the last decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for patient classification according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information is vastly underused. Here, we attempted to reuse public proteomics datasets with two main objectives: i) to generate hypotheses (detection of biomarkers) for their posterior/downstream validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve CRC public proteomics datasets (mostly from the PRIDE database) were re-analysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. Integrating this data with survival annotation data, we have validated in silico a six-gene signature for CRC classification at the protein level, and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. As a conclusion, this proof-of-the-concept study demonstrates the value of re-using public proteomics datasets as the basis to create a useful resource for biomarker discovery and validation. The protein expression data has been made available in the public resource Expression Atlas.


Asunto(s)
Neoplasias Colorrectales , Proteómica , Humanos , Neoplasias Colorrectales/diagnóstico , Neoplasias Colorrectales/genética , Neoplasias Colorrectales/metabolismo , Biomarcadores de Tumor/metabolismo , Proteínas Sanguíneas , Proteína Disulfuro Isomerasas
4.
Nucleic Acids Res ; 52(D1): D10-D17, 2024 Jan 05.
Artículo en Inglés | MEDLINE | ID: mdl-38015445

RESUMEN

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.


Asunto(s)
Academias e Institutos , Biología Computacional , Biología Computacional/organización & administración , Biología Computacional/tendencias , Academias e Institutos/organización & administración , Academias e Institutos/tendencias , Bases de Datos de Ácidos Nucleicos , Europa (Continente)
5.
Proteomics ; 24(3-4): e2200403, 2024 Feb.
Artículo en Inglés | MEDLINE | ID: mdl-37787899

RESUMEN

Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.


Asunto(s)
Proteínas , Proteómica , Proteómica/métodos , Proteínas/análisis , Flujo de Trabajo
6.
J Proteome Res ; 23(1): 418-429, 2024 01 05.
Artículo en Inglés | MEDLINE | ID: mdl-38038272

RESUMEN

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.


Asunto(s)
Benchmarking , Proteómica , Flujo de Trabajo , Programas Informáticos , Proteínas , Análisis de Datos
7.
Nucleic Acids Res ; 52(D1): D107-D114, 2024 Jan 05.
Artículo en Inglés | MEDLINE | ID: mdl-37992296

RESUMEN

Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc) are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at single cell level. These resources aim to allow users to investigate their expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users are invited to search for genes or metadata terms across species or biological conditions in a standardised consistent interface. Alongside these data, new features in Single Cell Expression Atlas allow users to query metadata through our new cell type wheel search. At the experiment level data can be explored through two types of dimensionality reduction plots, t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), overlaid with either clustering or metadata information to assist users' understanding. Data are also visualised as marker gene heatmaps identifying genes that help confer cluster identity. For some data, additional visualisations are available as interactive cell level anatomograms and cell type gene expression heatmaps.


Asunto(s)
Bases de Datos Genéticas , Perfilación de la Expresión Génica , Proteómica , Genotipo , Metadatos , Análisis de la Célula Individual , Internet , Humanos , Animales
8.
bioRxiv ; 2023 Nov 17.
Artículo en Inglés | MEDLINE | ID: mdl-38014076

RESUMEN

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and link motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica, but mostly absent in Aus type rice varieties - known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base - enabling researchers to visualise sites alongside other data on rice proteins e.g. structural models from AlphaFold2, PeptideAtlas and the PRIDE database - enabling visualisation of source evidence, including scores and supporting mass spectra.

9.
Nat Commun ; 14(1): 6743, 2023 10 24.
Artículo en Inglés | MEDLINE | ID: mdl-37875519

RESUMEN

Public proteomics data often lack essential metadata, limiting its potential. To address this, we present lesSDRF, a tool to simplify the process of metadata annotation, thereby ensuring that data leave a lasting, impactful legacy well beyond its initial publication.


Asunto(s)
Metadatos , Proteómica
10.
Open Res Eur ; 3: 59, 2023.
Artículo en Inglés | MEDLINE | ID: mdl-37645494

RESUMEN

With the advent of robust and high-throughput mass spectrometric technologies and bioinformatics tools to analyze large data sets, proteomics has penetrated broadly into basic and translational life sciences research. More than 95% of FDA-approved drugs currently target proteins, and most diagnostic tests are protein-based. The introduction of proteomics to the clinic, for instance to guide patient stratification and treatment, is already ongoing. Importantly, ethical challenges come with this success, which must also be adequately addressed by the proteomics and medical communities. Consortium members of the H2020 European Union-funded proteomics initiative: European Proteomics Infrastructure Consortium-providing access (EPIC-XS) met at the Core Technologies for Life Sciences (CTLS) conference to discuss the emerging role and implementation of proteomics in the clinic. The discussion, involving leaders in the field, focused on the current status, related challenges, and future efforts required to make proteomics a more mainstream technology for translational and clinical research. Here we report on that discussion and provide an expert update concerning the feasibility of clinical proteomics, the ethical implications of generating and analyzing large-scale proteomics clinical data, and recommendations to ensure both ethical and effective implementation in real-world applications.

11.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Artículo en Inglés | MEDLINE | ID: mdl-36744821

RESUMEN

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Asunto(s)
Aprendizaje Automático , Proteómica , Proteómica/métodos , Algoritmos , Espectrometría de Masas
12.
J Proteome Res ; 22(2): 287-301, 2023 02 03.
Artículo en Inglés | MEDLINE | ID: mdl-36626722

RESUMEN

The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.


Asunto(s)
Proteoma , Proteómica , Humanos , Estándares de Referencia , Vocabulario Controlado , Espectrometría de Masas , Bases de Datos de Proteínas
13.
J Proteome Res ; 22(2): 632-636, 2023 02 03.
Artículo en Inglés | MEDLINE | ID: mdl-36693629

RESUMEN

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.


Asunto(s)
Algoritmos , Proteómica , Proteómica/métodos , Reproducibilidad de los Resultados , Péptidos/análisis , Espectrometría de Masas/métodos , Programas Informáticos
14.
Nucleic Acids Res ; 51(D1): D1539-D1548, 2023 01 06.
Artículo en Inglés | MEDLINE | ID: mdl-36370099

RESUMEN

Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.


Asunto(s)
Proteómica , Programas Informáticos , Humanos , Bases de Datos de Proteínas , Espectrometría de Masas , Proteómica/métodos , Biología Computacional/métodos
15.
J Proteome Res ; 22(3): 729-742, 2023 03 03.
Artículo en Inglés | MEDLINE | ID: mdl-36577097

RESUMEN

The availability of proteomics datasets in the public domain, and in the PRIDE database, in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the resource Expression Atlas, where they can be accessed and visualized either individually or together with gene expression data coming from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible for life scientists.


Asunto(s)
Proteoma , Proteómica , Humanos , Proteoma/análisis , Proteómica/métodos , Perfilación de la Expresión Génica , Bases de Datos Factuales , Espectrometría de Masas/métodos , Bases de Datos de Proteínas
16.
Nucleic Acids Res ; 51(D1): D9-D17, 2023 01 06.
Artículo en Inglés | MEDLINE | ID: mdl-36477213

RESUMEN

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases makes them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly availability to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.


Asunto(s)
Inteligencia Artificial , Biología Computacional , Manejo de Datos , Bases de Datos Factuales , Genoma , Internet
17.
Proteomics ; 23(7-8): e2200014, 2023 04.
Artículo en Inglés | MEDLINE | ID: mdl-36074795

RESUMEN

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.


Asunto(s)
Proteoma , Proteómica , Proteómica/métodos , Espectrometría de Masas/métodos , Biología Computacional/métodos
18.
Front Mol Biosci ; 9: 1062031, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-36523653

RESUMEN

Multi-omics approaches including proteomics analyses are becoming an integral component of precision medicine. As clinical proteomics studies gain momentum and their sensitivity increases, research on identifying individuals based on their proteomics data is here examined for risks and ethics-related issues. A great deal of work has already been done on this topic for DNA/RNA sequencing data, but it has yet to be widely studied in other omics fields. The current state-of-the-art for the identification of individuals based solely on proteomics data is explained. Protein sequence variation analysis approaches are covered in more detail, including the available analysis workflows and their limitations. We also outline some previous forensic and omics proteomics studies that are relevant for the identification of individuals. Following that, we discuss the risks of patient reidentification using other proteomics data types such as protein expression abundance and post-translational modification (PTM) profiles. In light of the potential identification of individuals through proteomics data, possible legal and ethical implications are becoming increasingly important in the field.

19.
Sci Data ; 9(1): 335, 2022 06 14.
Artículo en Inglés | MEDLINE | ID: mdl-35701420

RESUMEN

The number of mass spectrometry (MS)-based proteomics datasets in the public domain keeps increasing, particularly those generated by Data Independent Acquisition (DIA) approaches such as SWATH-MS. Unlike Data Dependent Acquisition datasets, the re-use of DIA datasets has been rather limited to date, despite its high potential, due to the technical challenges involved. We introduce a (re-)analysis pipeline for public SWATH-MS datasets which includes a combination of metadata annotation protocols, automated workflows for MS data analysis, statistical analysis, and the integration of the results into the Expression Atlas resource. Automation is orchestrated with Nextflow, using containerised open analysis software tools, rendering the pipeline readily available and reproducible. To demonstrate its utility, we reanalysed 10 public DIA datasets from the PRIDE database, comprising 1,278 SWATH-MS runs. The robustness of the analysis was evaluated, and the results compared to those obtained in the original publications. The final expression values were integrated into Expression Atlas, making SWATH-MS experiments more widely available and combining them with expression data originating from other proteomics and transcriptomics datasets.


Asunto(s)
Proteómica , Programas Informáticos , Análisis de Datos , Bases de Datos de Proteínas , Conjuntos de Datos como Asunto , Espectrometría de Masas/métodos , Proteómica/métodos
20.
PLoS Comput Biol ; 18(6): e1010174, 2022 06.
Artículo en Inglés | MEDLINE | ID: mdl-35714157

RESUMEN

The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains, respectively. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) and 9,715 (liver) in mouse, and from 101 (tendon) and 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the resource Expression Atlas for widespread dissemination.


Asunto(s)
Proteínas , Proteómica , Animales , Encéfalo/metabolismo , Ratones , Proteínas/metabolismo , Ratas
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...