1.
Cell ; 166(3): 766-778, 2016 Jul 28.
Article in English | MEDLINE | ID: mdl-27453469

ABSTRACT

The ability to reliably and reproducibly measure any protein of the human proteome in any tissue or cell type would be transformative for understanding systems-level properties as well as specific pathways in physiology and disease. Here, we describe the generation and verification of a compendium of highly specific assays that enable quantification of 99.7% of the 20,277 annotated human proteins by the widely accessible, sensitive, and robust targeted mass spectrometric method selected reaction monitoring, SRM. This human SRMAtlas provides definitive coordinates that conclusively identify the respective peptide in biological samples. We report data on 166,174 proteotypic peptides providing multiple, independent assays to quantify any human protein and numerous spliced variants, non-synonymous mutations, and post-translational modifications. The data are freely accessible as a resource at http://www.srmatlas.org/, and we demonstrate its utility by examining the network response to inhibition of cholesterol synthesis in liver cells and to docetaxel in prostate cancer lines.
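To make the idea of a proteotypic peptide concrete, here is a minimal Python sketch of one common first step: an in silico tryptic digest followed by a uniqueness filter. The cleavage rule (after K or R, not before P), the 7 to 25 residue length window, and the function names are illustrative assumptions. They are not the SRMAtlas selection criteria, which the abstract does not specify. The sketch also ignores missed cleavages and I/L ambiguity.

```python
import re

def tryptic_peptides(sequence: str, min_len: int = 7, max_len: int = 25) -> list[str]:
    """Fully tryptic peptides with no missed cleavages (assumed rule:
    cleave C-terminal to K or R unless the next residue is P)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]

def unique_peptides(proteome: dict[str, str], **kw) -> dict[str, list[str]]:
    """Keep, for each protein, only peptides produced by no other protein."""
    owners: dict[str, set[str]] = {}
    for acc, seq in proteome.items():
        for pep in tryptic_peptides(seq, **kw):
            owners.setdefault(pep, set()).add(acc)
    result: dict[str, list[str]] = {acc: [] for acc in proteome}
    for pep, accs in owners.items():
        if len(accs) == 1:  # peptide maps to exactly one protein
            result[next(iter(accs))].append(pep)
    return result
```

A peptide shared by two proteins (a hypothetical example: `SHAEDPEPK` below) is dropped from both, which is the basic reason a protein needs at least one unique peptide to be quantifiable by a targeted assay.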


Subject(s)
Databases, Protein , Proteome , Access to Information , Antineoplastic Agents/therapeutic use , Cell Line, Tumor , Cholesterol/biosynthesis , Docetaxel , Female , Humans , Internet , Liver/drug effects , Male , Mutation , Prostatic Neoplasms/drug therapy , RNA Splicing , Taxoids/therapeutic use
2.
Mol Cell ; 83(6): 994-1011.e18, 2023 03 16.
Article in English | MEDLINE | ID: mdl-36806354

ABSTRACT

All species continuously evolve short open reading frames (sORFs) that can be templated for protein synthesis and may provide raw materials for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and had emerged de novo. We additionally identified 221 previously missed sORFs potentially translated into peptides of up to 15 amino acids, all of which are smaller than the smallest human microprotein annotated to date. To investigate the bioactivity of sORF-encoded small peptides and young microproteins, we subjected 266 candidates to a mass-spectrometry-based interactome screen with motif resolution. Based on these interactomes and additional cellular assays, we can associate several candidates with mRNA splicing, translational regulation, and endocytosis. Our work provides insights into the evolutionary origins and interaction potential of young and small proteins, thereby helping to elucidate this underexplored territory of the human proteome.
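As a rough illustration of what scanning a sequence for short ORFs involves, here is a minimal Python sketch. It assumes ATG starts only, the forward strand only, and a hypothetical `max_aa` cutoff. Catalogs such as the one analyzed here rely on ribosome-profiling evidence and may include near-cognate start codons, and this sketch models none of that.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_sorfs(dna: str, max_aa: int = 100) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand
    ORFs that begin at ATG and encode at most max_aa residues (stop excluded).
    Only the first ATG before each stop is used (longest ORF per stop)."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa <= max_aa:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return orfs
```

With `max_aa=15`, the same scan would keep only ORFs in the size range of the 221 newly identified sORFs described above.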


Subject(s)
Peptides , Protein Biosynthesis , Humans , Open Reading Frames , Peptides/genetics , Proteomics , Micropeptides
3.
Plant Physiol ; 194(3): 1411-1430, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-37879112

ABSTRACT

Arabidopsis (Arabidopsis thaliana) ecotype Col-0 has plastid and mitochondrial genomes encoding over 100 proteins. Public databases (e.g. Araport11) have redundancy and discrepancies in gene identifiers for these organelle-encoded proteins. RNA editing results in changes to specific amino acid residues or creation of start and stop codons for many of these proteins, but the impact of RNA editing at the protein level is largely unexplored due to the complexities of detection. Here, we assembled the nonredundant set of identifiers, their correct protein sequences, and 452 predicted nonsynonymous editing sites, of which 56 are edited at lower frequency. We then determined the accumulation of edited and/or unedited proteoforms by searching ∼259 million raw tandem MS spectra from ProteomeXchange as part of the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/). We identified all mitochondrial proteins and all except 3 plastid-encoded proteins (NdhG/Ndh6, PsbM, and Rps16), but none of the proteins predicted from the 4 ORFs. We suggest that Rps16 and 3 of the ORFs are pseudogenes. Detection frequencies for each edit site and type of edit (e.g. S to L/F) were determined at the protein level, cross-referenced against the metadata (e.g. tissue), and evaluated for technical detection challenges. We detected 167 predicted edit sites at the proteome level. Minor frequency sites were edited at low frequency at the protein level except for cytochrome C biogenesis 382 at residue 124 (Ccb382-124). Major frequency sites (>50% editing of RNA) only accumulated in edited form (>98% to 100% edited) at the protein level, with the exception of Rpl5-22. We conclude that RNA editing for major editing sites is required for stable protein accumulation.
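RNA editing in plant organelles is predominantly C-to-U. The following minimal Python sketch shows how a single C-to-U edit can produce each kind of change mentioned above: S to F or L, a new start codon, and a new stop codon. The codon table is deliberately partial and contains only the codons this illustration needs.

```python
# Partial standard codon table (RNA alphabet), limited to this illustration.
CODON_TO_AA = {
    "UCU": "S", "UCA": "S", "UUU": "F", "UUA": "L",
    "ACG": "T", "AUG": "M", "CAA": "Q", "UAA": "*",
}

def edit_c_to_u(codon: str, pos: int) -> str:
    """Apply a C-to-U edit at 0-based position pos within a codon."""
    if codon[pos] != "C":
        raise ValueError("C-to-U editing needs a C at that position")
    return codon[:pos] + "U" + codon[pos + 1:]

for codon, pos in [("UCU", 1), ("UCA", 1), ("ACG", 1), ("CAA", 0)]:
    edited = edit_c_to_u(codon, pos)
    print(f"{codon} ({CODON_TO_AA[codon]}) -> {edited} ({CODON_TO_AA[edited]})")
```

The loop prints the four transitions Ser to Phe, Ser to Leu, Thr to Met (a created start codon), and Gln to stop (a created stop codon).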


Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis/genetics , Arabidopsis/metabolism , Proteome/genetics , Proteome/metabolism , Plastids/genetics , Plastids/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Mitochondria/genetics , Mitochondria/metabolism
4.
Mol Cell Proteomics ; 22(9): 100631, 2023 09.
Article in English | MEDLINE | ID: mdl-37572790

ABSTRACT

Ribosome profiling (Ribo-Seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of noncanonical sites of ribosome translation outside the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7000 noncanonical ORFs are translated, which, at first glance, has the potential to expand the number of human protein CDSs by 30%, from ∼19,500 annotated CDSs to over 26,000 annotated CDSs. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of noncanonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome but searching for guidance on how to proceed. Here, we discuss the current state of noncanonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein coding."


Subject(s)
Protein Biosynthesis , Proteome , Humans , Proteome/metabolism , Proteomics/methods , Ribosome Profiling , Ribosomes/metabolism , Open Reading Frames
5.
Nucleic Acids Res ; 51(D1): D1539-D1548, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36370099

ABSTRACT

Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.


Subject(s)
Proteomics , Software , Humans , Databases, Protein , Mass Spectrometry , Proteomics/methods , Computational Biology/methods
6.
J Proteome Res ; 23(4): 1519-1530, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38538550

ABSTRACT

Most tandem mass spectrometry fragmentation spectra have small calibration errors that can lead to suboptimal interpretation and annotation. We developed SpectiCal, a software tool that can read mzML files from data-dependent acquisition proteomics experiments in parallel, compute m/z calibrations for each file prior to identification analysis based on known low-mass ions, and produce information about frequently observed peaks and their explanations. Using calibration coefficients, the data can be corrected to generate new calibrated mzML files. SpectiCal was tested using five public data sets, creating a table of commonly observed low-mass ions and their identifications. Information about the calibration and individual peaks is written in PDF and TSV files. This includes information for each peak, such as the number of runs in which it appears, the percentage of spectra in which it appears, and a plot of the aggregated region surrounding each peak. SpectiCal can be used to compute MS run calibrations, examine MS runs for artifacts that might hinder downstream analysis, and generate tables of detected low-mass ions for further analysis. SpectiCal is freely available at https://github.com/PlantProteomes/SpectiCal.
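The abstract does not state SpectiCal's calibration model. As an assumption, here is a minimal Python sketch of one plausible approach: fit the m/z error of known reference ions as a linear function of observed m/z by ordinary least squares, then subtract the fitted error from every observed value. The function names are hypothetical, and the reference values used to exercise it are synthetic, not real low-mass ions.

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def calibrate(observed: list[float], theoretical: list[float]):
    """Model the error (observed - theoretical) as linear in observed m/z and
    return a function mapping any observed m/z to a corrected m/z."""
    errors = [o - t for o, t in zip(observed, theoretical)]
    a, b = fit_linear(observed, errors)
    return lambda mz: mz - (a + b * mz)
```

A linear error model is the simplest choice that captures both a constant offset and an m/z-dependent drift; a tool working only from low-mass ions might reasonably restrict itself to an offset or use a different model.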


Subject(s)
Peptides , Software , Calibration , Peptides/analysis , Tandem Mass Spectrometry/methods , Ions
7.
J Proteome Res ; 23(1): 185-214, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38104260

ABSTRACT

This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource (build 2023-10) providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected post-translational modifications (PTMs), and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides and 18,267 proteins at the highest confidence level and 3396 lower confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the "dark" proteome. This dark proteome is highly enriched for E3 ligases, transcription factors, and for certain (e.g., CLE, IDA, PSY) but not other (e.g., THIONIN, CAP) signaling peptide families. A machine learning model trained on RNA expression data and protein properties predicts the probability that proteins will be detected. The model aids in the discovery of proteins with short half-life (e.g., SIG1,3 and ERF-VII TFs) and in developing strategies to identify the missing proteins. PeptideAtlas is linked to TAIR, tracks in JBrowse, and several other community proteomics resources.


Subject(s)
Arabidopsis , Humans , Arabidopsis/genetics , Arabidopsis/metabolism , Proteome/analysis , Tandem Mass Spectrometry/methods , Protein Processing, Post-Translational , Peptides/analysis , Databases, Protein
8.
J Proteome Res ; 23(7): 2518-2531, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38810119

ABSTRACT

Phosphorylation is the most studied post-translational modification and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total, we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety, and clustered the data to identify groups of sites with similar patterns across rice family groups. The data have been loaded into the UniProt Knowledgebase, enabling researchers to visualize sites alongside other data on rice proteins (e.g., structural models from AlphaFold2), and into PeptideAtlas and the PRIDE database, enabling visualization of source evidence, including scores and supporting mass spectra.
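Here is a minimal Python sketch of how a single amino acid variation could be classified relative to a phosphosite. The 1-based positions, the ±5 residue window, the function name, and the category labels are hypothetical choices made for illustration. They are not the thresholds used in the study.

```python
PHOSPHO_ACCEPTORS = set("STY")

def classify_saav(protein: str, site: int, var_pos: int, alt_aa: str,
                  window: int = 5) -> str:
    """Classify a variant (var_pos, alt_aa) relative to a phosphosite.
    Positions are 1-based; `window` is a hypothetical proximity cutoff."""
    if protein[site - 1] not in PHOSPHO_ACCEPTORS:
        raise ValueError(f"residue {site} is not S, T or Y")
    if var_pos == site:
        # The site survives only if the new residue can still be phosphorylated.
        return "site retained" if alt_aa in PHOSPHO_ACCEPTORS else "site lost"
    if abs(var_pos - site) <= window:
        return "proximal"  # may alter the kinase recognition motif
    return "distal"
```

For example, a variant at the +1 proline of an S-P motif would be reported as "proximal", since it may disrupt recognition by proline-directed kinases without removing the phosphorylatable residue itself.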


Subject(s)
Genome, Plant , Oryza , Phosphoproteins , Plant Proteins , Proteomics , Signal Transduction , Oryza/genetics , Oryza/metabolism , Oryza/chemistry , Proteomics/methods , Phosphoproteins/metabolism , Phosphoproteins/genetics , Phosphoproteins/chemistry , Phosphoproteins/analysis , Plant Proteins/genetics , Plant Proteins/metabolism , Phosphorylation , Protein Processing, Post-Translational , Phosphopeptides/metabolism , Phosphopeptides/analysis , Databases, Protein , Amino Acid Motifs , Mass Spectrometry
9.
J Proteome Res ; 23(2): 532-549, 2024 02 02.
Article in English | MEDLINE | ID: mdl-38232391

ABSTRACT

Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of the protein-based methods used. Additionally, there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.
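The counts above are internally consistent, as this short Python check of the arithmetic shows. All numbers are taken from the abstract; the variable names are only labels.

```python
total_predicted = 19_778  # neXtProt release 2023-04-18, PE1-PE4 entries
pe1_ms = 17_453           # PE1 detected by MS under HPP Guidelines
pe1_non_ms = 944          # PE1 detected by non-MS methods

pe1 = pe1_ms + pe1_non_ms        # all credibly detected proteins
missing = total_predicted - pe1  # PE2 + PE3 + PE4 "missing" proteins
coverage = 100 * pe1 / total_predicted

# prints: PE1 = 18,397, missing = 1,381, coverage = 93.0%
print(f"PE1 = {pe1:,}, missing = {missing:,}, coverage = {coverage:.1f}%")
```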


Subject(s)
Antibodies , Proteome , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Proteomics/methods
10.
Nat Methods ; 18(7): 768-770, 2021 07.
Article in English | MEDLINE | ID: mdl-34183830

ABSTRACT

Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.
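A USI has the general shape `mzspec:<collection>:<MS run>:<index type>:<index>[:<interpretation>]`, where the index type is `scan`, `index`, or `nativeId`. The following minimal Python parser is a sketch under simplifying assumptions. It does not validate the collection accession or the interpretation syntax, and it assumes the MS run name contains no colons. The `USI` class and `parse_usi` function are hypothetical names, not part of any repository's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class USI:
    collection: str          # e.g. a ProteomeXchange accession
    ms_run: str              # MS run (file) name without extension
    index_type: str          # "scan", "index" or "nativeId"
    index: str
    interpretation: Optional[str] = None  # optional peptidoform/charge

def parse_usi(usi: str) -> USI:
    # At most 6 fields, so colons inside the interpretation
    # (e.g. a modification written as [UNIMOD:35]) are kept intact.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if index_type not in {"scan", "index", "nativeId"}:
        raise ValueError(f"unknown index type: {index_type!r}")
    return USI(collection, ms_run, index_type, index, rest[0] if rest else None)
```

Because the identifier is a plain string with a fixed field order, it can be cited in a paper and resolved later by any repository that hosts the named dataset.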


Subject(s)
Databases, Protein , Mass Spectrometry/methods , Proteomics/methods , Signal Processing, Computer-Assisted , Software , Algorithms
11.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36752514

ABSTRACT

MOTIVATION: With the rapidly growing volume of knowledge and data in biomedical databases, improved methods for knowledge-graph-based computational reasoning are needed in order to answer translational questions. Previous efforts to solve such challenging computational reasoning problems have contributed tools and approaches, but progress has been hindered by the lack of an expressive analysis workflow language for translational reasoning and by the lack of a reasoning engine, supporting that language, that federates semantically integrated knowledge bases. RESULTS: We introduce ARAX, a new reasoning system for translational biomedicine that provides a web browser user interface and an application programming interface (API). ARAX enables users to encode translational biomedical questions and to integrate knowledge across sources to answer the user's query and facilitate exploration of results. For ARAX, we developed new approaches to query planning, knowledge-gathering, reasoning and result ranking, and we dynamically integrate knowledge providers for answering biomedical questions. To illustrate ARAX's application and utility in specific disease contexts, we present several use-case examples. AVAILABILITY AND IMPLEMENTATION: The source code and technical documentation for building the ARAX server-side software and its built-in knowledge database are freely available online (https://github.com/RTXteam/RTX). We provide a hosted ARAX service with a web browser interface at arax.rtx.ai and a web API endpoint at arax.rtx.ai/api/arax/v1.3/ui/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Knowledge Bases , Software , Databases, Factual , Language , Web Browser
12.
Plant Cell ; 33(11): 3421-3453, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34411258

ABSTRACT

We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acids each), assigned canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative starts, alternative stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS spectra.
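The peptide criterion for canonical proteins stated above (2 non-nested peptides of ≥9 amino acids each) can be written as a small Python predicate. This is a sketch: it treats "nested" as one peptide sequence being contained in the other, and it ignores the FDR threshold and peptide uniqueness, which a real pipeline would also enforce. The function name is hypothetical.

```python
def meets_peptide_criterion(peptides: list[str], min_len: int = 9) -> bool:
    """True if at least two distinct peptides of >= min_len residues exist
    such that neither sequence is contained within the other."""
    long_peps = sorted({p for p in peptides if len(p) >= min_len})
    for i, a in enumerate(long_peps):
        for b in long_peps[i + 1:]:
            if a not in b and b not in a:
                return True
    return False
```

Excluding nested pairs matters because a peptide and a longer peptide that contains it (for example, one with a missed cleavage) cover the same sequence and provide little independent evidence.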


Subject(s)
Arabidopsis/genetics , Peptides/analysis , Plant Proteins/analysis , Proteomics
13.
Proteomics ; 23(7-8): e2200014, 2023 04.
Article in English | MEDLINE | ID: mdl-36074795

ABSTRACT

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, many improvements are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards, since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) developing an open data standard for spectral libraries; (ii) making the availability of the spectral libraries used in DIA experiments mandatory in ProteomeXchange resources; (iii) improving the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improving the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.


Subject(s)
Proteome , Proteomics , Proteomics/methods , Mass Spectrometry/methods , Computational Biology/methods
14.
J Proteome Res ; 22(6): 2079-2091, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37092802

ABSTRACT

A recent paper in Science Advances by Sun et al. claims that intra-chloroplast proteins in the model plant Arabidopsis can be polyubiquitinated and then extracted into the cytosol for subsequent degradation by the proteasome. Most of this conclusion hinges on several sets of mass spectrometry (MS) data. If the proposed results and conclusion are true, this would be a major change in the proteolysis/proteostasis field, breaking the long-standing dogma that there are no polyubiquitination mechanisms within chloroplast organelles (nor in mitochondria). Given its importance, we reanalyzed their raw MS data using both open and closed sequence database searches and encountered many issues not only with the results but also discrepancies between stated methods (e.g., use of alkylating agent iodoacetamide (IAA)) and observed mass modifications. Although there is likely enrichment of ubiquitination signatures in a subset of the data (probably from ubiquitination in the cytosol), we show that runaway alkylation with IAA caused extensive artifactual modifications of N termini and lysines to the point that a large fraction of the desired ubiquitination signatures is indistinguishable from artifactual acetamide signatures, and thus, no intra-chloroplast polyubiquitination conclusions can be drawn from these data. We provide recommendations on how to avoid such perils in future work.
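One way to see how acetamide artifacts can mimic ubiquitination (an illustration, not the authors' analysis): two carbamidomethyl additions from iodoacetamide (C2H3NO each) on a lysine have exactly the same elemental composition, C4H6N2O2, as the Gly-Gly remnant that trypsin leaves on ubiquitinated lysines. The two are therefore isobaric at about 114.0429 Da. This minimal Python check uses standard monoisotopic element masses.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

def mono_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental composition such as {"C": 2, ...}."""
    return sum(MONO[el] * n for el, n in formula.items())

carbamidomethyl = {"C": 2, "H": 3, "N": 1, "O": 1}  # net addition from iodoacetamide
gly_gly = {"C": 4, "H": 6, "N": 2, "O": 2}          # ubiquitin-derived GG remnant

double_cam = 2 * mono_mass(carbamidomethyl)
gg = mono_mass(gly_gly)
print(f"2 x carbamidomethyl = {double_cam:.6f} Da")
print(f"GlyGly remnant      = {gg:.6f} Da")
```

Since the compositions are identical, no mass accuracy can separate the two signatures, which is why the abstract's recommendation to control alkylation conditions matters.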


Subject(s)
Arabidopsis , Chloroplasts , Ubiquitination , Proteolysis , Chloroplasts/metabolism , Proteasome Endopeptidase Complex/metabolism , Arabidopsis/metabolism , Mass Spectrometry
15.
J Proteome Res ; 22(2): 615-624, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36648445

ABSTRACT

The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.


Subject(s)
Proteomics , Software , Proteomics/methods , Mass Spectrometry , Probability , Data Analysis
16.
J Proteome Res ; 22(2): 632-636, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36693629

ABSTRACT

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.


Subject(s)
Algorithms , Proteomics , Proteomics/methods , Reproducibility of Results , Peptides/analysis , Mass Spectrometry/methods , Software
17.
J Proteome Res ; 22(4): 1024-1042, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36318223

ABSTRACT

The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".


Subject(s)
Proteome , Proteomics , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Open Reading Frames , Proteomics/methods
18.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Subject(s)
Machine Learning , Proteomics , Proteomics/methods , Algorithms , Mass Spectrometry
19.
J Proteome Res ; 22(2): 287-301, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36626722

ABSTRACT

The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.


Subject(s)
Proteome , Proteomics , Humans , Reference Standards , Vocabulary, Controlled , Mass Spectrometry , Databases, Protein
20.
Mol Cell Proteomics ; 20: 100071, 2021.
Article in English | MEDLINE | ID: mdl-33711481

ABSTRACT

Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitive, human proteomics data. Both in the United States and in the European Union, there are legal frameworks protecting the privacy of individuals. Implementing privacy standards for publicly released research data in genomics and transcriptomics has led to processes to control who may access the data, so-called "controlled access" data. In parallel with the technological developments in the field, it is clear that the privacy risks of sharing proteomics data need to be properly assessed and managed. In our view, the proteomics community must be proactive in addressing these issues. Yet a careful balance must be kept. On the one hand, neglecting to address the potential of identifiability in human proteomics data could lead to reputational damage of the field, while on the other hand, erecting barriers to open access to clinical proteomics data will inevitably reduce reuse of proteomics data and could substantially delay critical discoveries in biomedical research. In order to balance these apparently conflicting requirements for data privacy and efficient use and reuse of research efforts through the sharing of clinical proteomics data, development efforts will be needed at different levels including bioinformatics infrastructure, policymaking, and mechanisms of oversight.


Subject(s)
Data Management , Proteomics , Confidentiality , Humans , Information Dissemination