Results 1-20 of 132
1.
Proteomics ; : e2300385, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39001627

ABSTRACT

The mzIdentML data format, originally developed by the Proteomics Standards Initiative in 2011, is the open XML data standard for peptide and protein identification results coming from mass spectrometry. We present mzIdentML version 1.3.0, which introduces new functionality and support for additional use cases. First of all, a new mechanism for encoding identifications based on multiple spectra has been introduced. Furthermore, the main mzIdentML specification document can now be supplemented by extension documents which provide further guidance for encoding specific use cases for different proteomics subfields. One extension document has been added, covering additional use cases for the encoding of crosslinked peptide identifications. The ability to add extension documents facilitates keeping the mzIdentML standard up to date with advances in the proteomics field, without having to change the main specification document. The crosslinking extension document provides further explanation of the crosslinking use cases already supported in mzIdentML version 1.2.0, and provides support for encoding additional scenarios that are critical to reflect developments in the crosslinking field and facilitate its integration in structural biology. These are: (i) support for cleavable crosslinkers, (ii) support for internally linked peptides, (iii) support for noncovalently associated peptides, and (iv) improved support for encoding scores and the corresponding thresholds.
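As a rough illustration of how identification results encoded in mzIdentML can be consumed, the Python sketch below lists the peptide identifications that pass the score threshold. It assumes the mzIdentML 1.2 element and attribute names (`SpectrumIdentificationResult`, `SpectrumIdentificationItem`, `spectrumID`, `peptide_ref`, `rank`, `passThreshold`), matches tags by local name so the namespace version does not matter, and does not attempt to handle the multi-spectrum or crosslinking encodings introduced in version 1.3.0. The function names and the trimmed, not schema-valid example document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def passing_identifications(xml_text: str) -> list[tuple[str, str, int]]:
    """Return (spectrumID, peptide_ref, rank) for items whose passThreshold is true.

    Hypothetical helper; assumes mzIdentML 1.2-style element and attribute names.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for result in root.iter():
        if local_name(result.tag) != "SpectrumIdentificationResult":
            continue
        for item in result:
            if local_name(item.tag) != "SpectrumIdentificationItem":
                continue
            # xsd:boolean allows both "true" and "1"
            if item.get("passThreshold") in ("true", "1"):
                hits.append((result.get("spectrumID"), item.get("peptide_ref"), int(item.get("rank"))))
    return hits


# Trimmed toy document: only the elements the sketch reads are present.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2">
  <DataCollection><AnalysisData>
    <SpectrumIdentificationList id="SIL_1">
      <SpectrumIdentificationResult id="SIR_1" spectrumID="index=5" spectraData_ref="SD_1">
        <SpectrumIdentificationItem id="SII_1" rank="1" chargeState="2"
            experimentalMassToCharge="500.27" peptide_ref="PEP_1" passThreshold="true"/>
        <SpectrumIdentificationItem id="SII_2" rank="2" chargeState="2"
            experimentalMassToCharge="500.27" peptide_ref="PEP_2" passThreshold="false"/>
      </SpectrumIdentificationResult>
    </SpectrumIdentificationList>
  </AnalysisData></DataCollection>
</MzIdentML>"""

print(passing_identifications(EXAMPLE))  # [('index=5', 'PEP_1', 1)]
```

Matching on local names keeps the sketch readable across namespace versions; a stricter reader would validate against the published XML schema instead.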

2.
Methods Mol Biol ; 2809: 19-36, 2024.
Article in English | MEDLINE | ID: mdl-38907888

ABSTRACT

The allele frequency net database (AFND, http://www.allelefrequencies.net ) is a web-based repository that contains information on the frequencies of immune-related genes and their corresponding alleles in worldwide human populations. At present, the website contains data from 1784 population samples in more than 14 million individuals from 129 countries on the frequency of genes from different polymorphic regions, including data for the human leukocyte antigen (HLA) system. In addition, over the last four years, AFND has also incorporated genotype raw data from 85,000 individuals comprising 215 population samples from 39 countries. Moreover, more population data sets containing next generation sequencing data spanning >3 million individuals have been added. This resource has been widely used in a variety of contexts, including histocompatibility, immunology, epidemiology, pharmacogenetics, epitope prediction algorithms for population coverage in vaccine development, and population genetics. In this chapter, we present an update of the most used searching mechanisms as described in a previous volume and some of the latest developments included in AFND.


Subject(s)
Genetic Databases , Gene Frequency , Population Genetics , Humans , Population Genetics/methods , HLA Antigens/genetics , Alleles , Computational Biology/methods , Internet , Web Browser , Genotype , High-Throughput Nucleotide Sequencing/methods
3.
J Proteome Res ; 23(6): 1948-1959, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38717300

ABSTRACT

The availability of an increasingly large amount of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa, the domestic pig, is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples. The analysis involved the quantification of protein abundance in 599 mass spectrometry runs. We compared protein expression patterns among different pig organs and examined the distribution of proteins across these organs. Then, we compared protein abundances across different data sets and studied the tissue specificity of the detected proteins. Of particular interest, we conducted a comparative analysis of protein expression between pig and human tissues, revealing a high degree of correlation in protein expression among orthologs, particularly in brain, kidney, heart, and liver samples. We have integrated the protein expression results into the Expression Atlas resource for easy access and visualization of the protein expression data individually or alongside gene expression data.


Subject(s)
Kidney , Proteomics , Animals , Proteomics/methods , Humans , Swine , Kidney/metabolism , Kidney/chemistry , Organ Specificity , Liver/metabolism , Liver/chemistry , Protein Databases , Brain/metabolism , Myocardium/metabolism , Myocardium/chemistry , Sus scrofa/metabolism , Sus scrofa/genetics , Proteome/metabolism , Proteome/analysis , Mass Spectrometry
4.
J Proteome Res ; 23(7): 2518-2531, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38810119

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety, and clustered the data to identify groups of sites with similar patterns across rice family groups. The data has been loaded into UniProt Knowledge-Base (enabling researchers to visualize sites alongside other data on rice proteins, e.g., structural models from AlphaFold2), PeptideAtlas, and the PRIDE database (enabling visualization of source evidence, including scores and supporting mass spectra).


Subject(s)
Plant Genome , Oryza , Phosphoproteins , Plant Proteins , Proteomics , Signal Transduction , Oryza/genetics , Oryza/metabolism , Oryza/chemistry , Proteomics/methods , Phosphoproteins/metabolism , Phosphoproteins/genetics , Phosphoproteins/chemistry , Phosphoproteins/analysis , Plant Proteins/genetics , Plant Proteins/metabolism , Phosphorylation , Post-Translational Protein Processing , Phosphopeptides/metabolism , Phosphopeptides/analysis , Protein Databases , Amino Acid Sequences , Mass Spectrometry
5.
Plants (Basel) ; 12(23)2023 Nov 25.
Article in English | MEDLINE | ID: mdl-38068604

ABSTRACT

Cyanobacteria were among the oldest organisms to undertake oxygenic photosynthesis and have an essential impact on the atmosphere and carbon/nitrogen cycles on the planet. The thylakoid membrane (TM) of cyanobacteria represents an intricate compartment that houses a variety of multi-component (pigment-)protein complexes, assembly factors, and regulators, as well as transporters involved in photosynthetic light reactions and respiratory electron transport. How these protein components are incorporated into membranes during thylakoid formation, and how individual complexes are regulated to construct the functional machinery, remain elusive. Here, we carried out an in-depth statistical analysis of the thylakoid proteome data obtained during light-induced thylakoid membrane biogenesis in the model cyanobacterium Synechococcus elongatus PCC 7942. A total of 1581 proteins were experimentally quantified, among which 457 proteins demonstrated statistically significant variations in abundance at distinct thylakoid biogenesis stages. Gene Ontology and KEGG enrichment analysis revealed that predominantly photosystems, light-harvesting antennae, ABC transporters, and pathway enzymes involved in oxidative stress responses and protein folding exhibited notable alterations in abundance between high light and growth light. Moreover, cluster analysis categorized the 1581 proteins into six distinct clusters with significantly different trajectories of abundance change during thylakoid development. Our study provides insights into the physiological regulation of the membrane integration of protein components and functionally linked complexes during cyanobacterial TM biogenesis. The findings and analytical methodologies developed in this study may be valuable for studying the global responses of TM biogenesis and photosynthetic acclimation in plants and algae.

6.
J Proteome Res ; 22(12): 3754-3772, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37939282

ABSTRACT

Protein tyrosine sulfation (sY) is a post-translational modification (PTM) catalyzed by Golgi-resident tyrosylprotein sulfotransferases (TPSTs). Information on sY in humans is currently limited to ∼50 proteins, with only a handful having verified sites of sulfation. As such, the contribution of sulfation to the regulation of biological processes remains poorly defined. Mass spectrometry (MS)-based proteomics is the method of choice for PTM analysis but has yet to be applied for systematic investigation of the "sulfome", primarily due to issues associated with discrimination of sY-containing from phosphotyrosine (pY)-containing peptides. In this study, we developed an MS-based workflow for sY-peptide characterization, incorporating optimized Zr4+ immobilized metal-ion affinity chromatography (IMAC) and TiO2 enrichment strategies. Extensive characterization of a panel of sY- and pY-peptides using an array of fragmentation regimes (CID, HCD, EThcD, ETciD, UVPD) highlighted differences in the generation of site-determining product ions and allowed us to develop a strategy for differentiating sulfated peptides from nominally isobaric phosphopeptides based on low collision energy-induced neutral loss. Application of our "sulfomics" workflow to a HEK-293 cell extracellular secretome facilitated identification of 21 new sulfotyrosine-containing proteins, several of which we validated enzymatically, and revealed new interplay between enzymes relevant to both protein and glycan sulfation.


Subject(s)
Phosphopeptides , Tyrosine , Humans , Phosphopeptides/analysis , HEK293 Cells , Workflow , Tyrosine/metabolism , Proteins , Phosphotyrosine
7.
bioRxiv ; 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38014076

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica but mostly absent in Aus type rice varieties, which are known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base (enabling researchers to visualise sites alongside other data on rice proteins, e.g. structural models from AlphaFold2), PeptideAtlas and the PRIDE database (enabling visualisation of source evidence, including scores and supporting mass spectra).

8.
Genome Biol ; 24(1): 223, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37798615

ABSTRACT

Crop pangenomes made from individual cultivar assemblies promise easy access to conserved genes, but genome content variability and inconsistent identifiers hamper their exploration. To address this, we define pangenes, which summarize a species' coding potential and link back to original annotations. The protocol get_pangenes performs whole genome alignments (WGA) to call syntenic gene models based on coordinate overlaps. A benchmark with small and large plant genomes shows that pangenes recapitulate phylogeny-based orthologies and produce complete soft-core gene sets. Moreover, WGAs support lift-over and help confirm gene presence-absence variation. Source code and documentation: https://github.com/Ensembl/plant-scripts .
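The idea of calling gene models by coordinate overlap can be sketched as follows. This is not the get_pangenes implementation: it assumes that genes from a second assembly have already been projected onto reference coordinates (the abstract attributes that step to the whole genome alignments), and the "overlap covers at least half of the shorter gene" rule, the `Gene` type and the function names are illustrative assumptions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def length(self) -> int:
        return self.end - self.start + 1


def overlap_bp(a: Gene, b: Gene) -> int:
    """Number of shared base pairs between two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def call_overlapping_genes(ref_genes, projected_genes, min_fraction=0.5):
    """Pair reference genes with query genes already projected onto reference coordinates.

    A pair is kept when the overlap covers at least min_fraction of the shorter gene
    (an illustrative rule, not necessarily the one get_pangenes uses).
    Quadratic scan for clarity; an interval tree would scale better.
    """
    pairs = []
    for ref in ref_genes:
        for query in projected_genes:
            shared = overlap_bp(ref, query)
            if shared and shared / min(ref.length(), query.length()) >= min_fraction:
                pairs.append((ref.gene_id, query.gene_id))
    return pairs


ref = [Gene("r1", "1", 100, 200), Gene("r2", "1", 500, 600)]
query = [Gene("q1", "1", 150, 260), Gene("q2", "1", 590, 700), Gene("q3", "2", 100, 200)]
print(call_overlapping_genes(ref, query))  # [('r1', 'q1')]
```

Here q2 shares only 11 bp with r2, so it is not called at the default threshold; lowering `min_fraction` trades specificity for sensitivity.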


Subject(s)
Plant Genome , Software
9.
Toxicol Sci ; 196(1): 112-125, 2023 10 30.
Article in English | MEDLINE | ID: mdl-37647630

ABSTRACT

To minimize the occurrence of unexpected toxicities in early phase preclinical studies of new drugs, it is vital to understand fundamental similarities and differences between preclinical species and humans. Species differences in sensitivity to acetaminophen (APAP) liver injury have been related to differences in the fraction of the drug that is bioactivated to the reactive metabolite N-acetyl-p-benzoquinoneimine (NAPQI). We have used physiologically based pharmacokinetic modeling to identify oral doses of APAP (300 and 1000 mg/kg in mice and rats, respectively) yielding similar hepatic burdens of NAPQI to enable the comparison of temporal liver tissue responses under conditions of equivalent chemical insult. Despite pharmacokinetic and biochemical verification of the equivalent NAPQI insult, serum biomarker and tissue histopathology analyses revealed that mice still exhibited a greater degree of liver injury than rats. Transcriptomic and proteomic analyses highlighted the stronger activation of stress response pathways (including the Nrf2 oxidative stress response and autophagy) in the livers of rats, indicative of a more robust transcriptional adaptation to the equivalent insult. Components of these pathways were also found to be expressed at a higher basal level in the livers of rats compared with both mice and humans. Our findings exemplify a systems approach to understanding differential species sensitivity to hepatotoxicity. Multiomics analysis indicated that rats possess a greater basal and adaptive capacity for hepatic stress responses than mice and humans, with important implications for species selection and human translation in the safety testing of new drug candidates associated with reactive metabolite formation.


Subject(s)
Acetaminophen , Chemical and Drug Induced Liver Injury , Rats , Mice , Humans , Animals , Acetaminophen/toxicity , Acetaminophen/metabolism , Proteomics , Species Specificity , Chemical and Drug Induced Liver Injury/metabolism , Liver/metabolism , Oxidative Stress , Systems Analysis
10.
J Proteome Res ; 22(6): 1828-1842, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37099386

ABSTRACT

Phosphorylation is a post-translational modification of great interest to researchers due to its relevance in many biological processes. LC-MS/MS techniques have enabled high-throughput data acquisition, with studies claiming identification and localization of thousands of phosphosites. The identification and localization of phosphosites emerge from different analytical pipelines and scoring algorithms, with uncertainty embedded throughout the pipeline. For many pipelines and algorithms, arbitrary thresholding is used, but little is known about the actual global false localization rate in these studies. Recently, it has been suggested to use decoy amino acids to estimate global false localization rates of phosphosites among the peptide-spectrum matches reported. Here, we describe a simple pipeline aiming to maximize the information extracted from these studies by objectively collapsing from the peptide-spectrum match to the peptidoform-site level, as well as combining findings from multiple studies while keeping track of false localization rates. We show that the approach is more effective than current processes that use a simpler mechanism for handling phosphosite identification redundancy within and across studies. In our case study using eight rice phosphoproteomics data sets, 6368 unique sites were confidently identified using our decoy approach compared to 4687 using traditional thresholding, in which false localization rates are unknown.
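The collapsing step described above, from redundant peptide-spectrum match (PSM) level site calls to one record per site across studies, can be approximated as below. This sketch assumes a simple rule (keep the best localization score per protein, position and residue, and count the supporting studies), which may differ from the published pipeline; the tuple layout and function name are hypothetical. The collapsed site scores could then be ranked for decoy-based FLR estimation.

```python
from collections import defaultdict

# Each PSM-level site call: (protein, position, residue, localization_score, study_id)
PsmSite = tuple[str, int, str, float, str]
SiteKey = tuple[str, int, str]


def collapse_to_sites(psm_sites: list[PsmSite]) -> dict[SiteKey, tuple[float, int]]:
    """Collapse redundant PSM-level site calls into one record per site.

    Assumed rule: keep the highest localization score seen for the site and
    count how many distinct studies reported it.
    """
    best: dict[SiteKey, float] = {}
    studies: dict[SiteKey, set[str]] = defaultdict(set)
    for protein, position, residue, score, study in psm_sites:
        key = (protein, position, residue)
        best[key] = max(score, best.get(key, float("-inf")))
        studies[key].add(study)
    return {key: (best[key], len(studies[key])) for key in best}


calls = [
    ("P1", 10, "S", 0.70, "study_A"),
    ("P1", 10, "S", 0.90, "study_B"),
    ("P1", 12, "A", 0.40, "study_A"),  # localization to a decoy residue is kept as its own site
    ("P2", 5, "Y", 0.80, "study_A"),
]
sites = collapse_to_sites(calls)
print(sites[("P1", 10, "S")])  # (0.9, 2)
```

Taking the maximum is the simplest way to avoid counting one site many times; combining scores across PSMs probabilistically is a plausible alternative that this sketch does not attempt.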


Subject(s)
Proteomics , Rivers , Liquid Chromatography , Proteomics/methods , Tandem Mass Spectrometry , Post-Translational Protein Processing , Peptides/chemistry , Algorithms , Protein Databases
11.
J Proteome Res ; 22(2): 287-301, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36626722

ABSTRACT

The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.


Subject(s)
Proteome , Proteomics , Humans , Reference Standards , Controlled Vocabulary , Mass Spectrometry , Protein Databases
12.
PLoS Negl Trop Dis ; 17(1): e0011058, 2023 01.
Article in English | MEDLINE | ID: mdl-36656904

ABSTRACT

Parasitic diseases caused by kinetoplastid parasites are a burden to public health throughout tropical and subtropical regions of the world. TriTrypDB (https://tritrypdb.org) is a free online resource for data mining of genomic and functional data from these kinetoplastid parasites and is part of the VEuPathDB Bioinformatics Resource Center (https://veupathdb.org). As of release 59, TriTrypDB hosts 83 kinetoplastid genomes, nine of which, including Trypanosoma brucei brucei TREU927, Trypanosoma cruzi CL Brener and Leishmania major Friedlin, undergo manual curation by integrating information from scientific publications, high-throughput assays and user-submitted comments. TriTrypDB also integrates transcriptomic, proteomic, epigenomic, population-level and isolate data, functional information from genome-wide RNAi knock-down and fluorescent tagging, and results from automated bioinformatics analysis pipelines. TriTrypDB offers a user-friendly web interface embedded with a genome browser, search strategy system and bioinformatics tools to support custom in silico experiments that leverage integrated data. A Galaxy workspace enables users to analyze their private data (e.g., RNA-sequencing, variant calling, etc.) and explore their results privately in the context of publicly available information in the database. The recent addition of an annotation platform based on Apollo enables users to provide both functional and structural changes that will appear as 'community annotations' immediately and, pending curatorial review, will be integrated into the official genome annotation.


Subject(s)
Kinetoplastida , Software , User-Computer Interface , Proteomics , Genomics/methods , Computational Biology/methods , Genetic Databases , Internet
13.
J Proteome Res ; 22(3): 729-742, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36577097

ABSTRACT

The availability of proteomics datasets in the public domain, and in the PRIDE database, in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the resource Expression Atlas, where they can be accessed and visualized either individually or together with gene expression data coming from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible for life scientists.


Subject(s)
Proteome , Proteomics , Humans , Proteome/analysis , Proteomics/methods , Gene Expression Profiling , Factual Databases , Mass Spectrometry/methods , Protein Databases
14.
Proteomics ; 23(7-8): e2200014, 2023 04.
Article in English | MEDLINE | ID: mdl-36074795

ABSTRACT

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.


Subject(s)
Proteome , Proteomics , Proteomics/methods , Mass Spectrometry/methods , Computational Biology/methods
15.
Matrix Biol ; 113: 61-82, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36152781

ABSTRACT

Muscle stem cells (MuSCs) are indispensable for muscle regeneration. A multitude of extracellular stimuli direct MuSC fate decisions from quiescent progenitors to differentiated myocytes. The activity of these signals is modulated by coreceptors such as syndecan-3 (SDC3). We investigated the global landscape of SDC3-mediated regulation of myogenesis using a phosphoproteomics approach which revealed, with the precision level of individual phosphosites, the large-scale extent of SDC3-mediated regulation of signal transduction in MuSCs. We then focused on INSR/AKT/mTOR as a key pathway regulated by SDC3 during myogenesis and mechanistically dissected SDC3-mediated inhibition of insulin receptor signaling in MuSCs. SDC3 interacts with INSR, ultimately limiting signal transduction via AKT/mTOR. Both knockdown of INSR and inhibition of AKT restore Sdc3-/- MuSC differentiation to wild type levels. Since SDC3 is rapidly downregulated at the onset of differentiation, our study suggests that SDC3 acts as a timekeeper to restrain proliferating MuSC response and prevent premature differentiation.


Subject(s)
Skeletal Muscle , Proto-Oncogene Proteins c-akt , Proto-Oncogene Proteins c-akt/genetics , Proto-Oncogene Proteins c-akt/metabolism , Syndecan-3/genetics , Syndecan-3/metabolism , Cultured Cells , Skeletal Muscle/metabolism , Muscle Development/genetics , TOR Serine-Threonine Kinases/genetics , TOR Serine-Threonine Kinases/metabolism , Cell Differentiation
16.
Essays Biochem ; 66(2): 155-168, 2022 08 05.
Article in English | MEDLINE | ID: mdl-35920279

ABSTRACT

The response to abiotic and biotic stresses in plants and crops is considered a multifaceted process. Due to their sessile nature, plants have evolved unique mechanisms to ensure that developmental plasticity remains during their life cycle. Among these mechanisms, post-translational modifications (PTMs) are crucial components of adaptive responses in plants and transduce environmental stimuli into cellular signalling through the modulation of proteins. SUMOylation is an emerging PTM that has received recent attention due to its dynamic role in protein modification and has quickly come to be considered a significant component of adaptive mechanisms in plants during stress, with great potential for agricultural improvement programs. In the present review, we outline the concept that the small ubiquitin-like modifier (SUMO)-mediated response in plants and crops to abiotic and biotic stresses is a multifaceted process, with each component of the SUMO cycle facilitating tolerance to several different environmental stresses. We also highlight the clear increase in SUMO genes in crops when compared with Arabidopsis thaliana. Given the importance of SUMO for stress responses, the SUMO system is understudied in crops, and for some SUMO genes, the apparent expansion provides new avenues to discover SUMO-conjugated targets that could regulate beneficial agronomical traits.


Subject(s)
Arabidopsis , Ubiquitin , Arabidopsis/genetics , Arabidopsis/metabolism , Agricultural Crops/genetics , Agricultural Crops/metabolism , Physiological Stress , Sumoylation , Ubiquitin/metabolism
17.
PLoS Comput Biol ; 18(6): e1010174, 2022 06.
Article in English | MEDLINE | ID: mdl-35714157

ABSTRACT

The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the resource Expression Atlas for widespread dissemination.


Subject(s)
Proteins , Proteomics , Animals , Brain/metabolism , Mice , Proteins/metabolism , Rats
18.
J Proteome Res ; 21(6): 1510-1524, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35532924

ABSTRACT

Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence, which account for the majority of sites in PSP. We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.


Subject(s)
Phosphopeptides , Proteome , Humans , Mass Spectrometry/methods , Phosphopeptides/metabolism , Phosphoproteins/analysis , Phosphorylation , Proteome/analysis
19.
J Proteome Res ; 21(7): 1603-1615, 2022 07 01.
Article in English | MEDLINE | ID: mdl-35640880

ABSTRACT

Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified or to estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting incorrectly localized phosphosites, due to inadequate statistical control. In this work, we develop the concept of scoring modifications on a decoy amino acid, that is, one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of amino acids, on both synthetic and real data sets, demonstrating that the selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys. We propose the use of a decoy amino acid to control false reporting in the literature and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840.
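A minimal sketch of decoy-based global FLR estimation, under stated assumptions: every site localized to the decoy residue (alanine here, one of the two residues the abstract found most reliable) is counted as a false localization, and the decoy count is multiplied by a user-supplied `target_decoy_ratio` (assumed to be the number of candidate S/T/Y residues divided by the number of candidate decoy residues) to estimate false phosphosites among target residues. The exact correction used in the published method may differ; `SiteCall`, `flr_curve` and `sites_at_flr` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    residue: str  # residue the modification was localized to: "S", "T", "Y" or the decoy "A"
    score: float  # localization score; higher means more confident


def flr_curve(calls, target=frozenset("STY"), decoy=frozenset("A"), target_decoy_ratio=1.0):
    """Return (score cutoff, target sites accepted, estimated global FLR), best score first."""
    n_target = n_decoy = 0
    curve = []
    for call in sorted(calls, key=lambda c: c.score, reverse=True):
        if call.residue in target:
            n_target += 1
        elif call.residue in decoy:
            n_decoy += 1
        # Scale decoy hits to estimate how many accepted target sites are mislocalized
        est_false = n_decoy * target_decoy_ratio
        if n_target:
            flr = min(est_false / n_target, 1.0)
        else:
            flr = 1.0 if n_decoy else 0.0
        curve.append((call.score, n_target, flr))
    return curve


def sites_at_flr(curve, threshold):
    """Largest number of target sites accepted at a cutoff whose estimated FLR <= threshold."""
    return max((n for _, n, flr in curve if flr <= threshold), default=0)


calls = [SiteCall("S", 0.99), SiteCall("T", 0.98), SiteCall("A", 0.95),
         SiteCall("Y", 0.90), SiteCall("S", 0.80)]
curve = flr_curve(calls)
print(sites_at_flr(curve, 0.30))  # 4
print(sites_at_flr(curve, 0.01))  # 2
```

Because this running estimate is not monotonic in the score cutoff, a fuller implementation would likely convert it into a q-value-style monotone estimate; that refinement is omitted here.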


Subject(s)
Amino Acids , Tandem Mass Spectrometry , Protein Databases , Reproducibility of Results , Tandem Mass Spectrometry/methods
20.
Sol Phys ; 296(3)2021.
Article in English | MEDLINE | ID: mdl-34803188

ABSTRACT

In spite of strict limits on outgassing from organic materials, some spacecraft instruments making long-term measurements of solar extreme ultraviolet (EUV) radiation still suffer significant degradation. While such measures have reduced the rate of degradation, they have not completely eliminated it in some cases. For example, in five years, the aluminum filters used in the Extreme Ultraviolet Variability Experiment (EVE) instruments onboard the Solar Dynamics Observatory (SDO) suffered losses exceeding 40% at 30.4 nm. Comparing those losses with the negligible losses of nearby zirconium filters on the same instruments indicated that the problem was not due to carbonization on the Sun-facing side of the filter. To investigate whether the loss was due to carbon deposition on the downstream face of the Al filter, we exposed the backsides of Al and Zr filters to EUV in the presence of a volatile organic solvent in the laboratory and concluded that this could not be the cause. Given that the residual gas composition in the SDO spacecraft likely has water vapor as well as organics, these findings suggest that the transmission loss in the Al filter originated with oxidation caused by UV-activated adsorbed water.
