Results 1 - 20 of 131
1.
J Proteome Res ; 23(6): 1948-1959, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38717300

ABSTRACT

The availability of an increasingly large number of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa, the domestic pig, is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples. The analysis involved the quantification of protein abundance in 599 mass spectrometry runs. We compared protein expression patterns among different pig organs and examined the distribution of proteins across these organs. Then, we studied how protein abundances compared across different data sets and studied the tissue specificity of the detected proteins. Of particular interest, we conducted a comparative analysis of protein expression between pig and human tissues, revealing a high degree of correlation in protein expression among orthologs, particularly in brain, kidney, heart, and liver samples. We have integrated the protein expression results into the Expression Atlas resource for easy access and visualization of the protein expression data individually or alongside gene expression data.


Subject(s)
Kidney, Proteomics, Animals, Proteomics/methods, Humans, Swine, Kidney/metabolism, Kidney/chemistry, Organ Specificity, Liver/metabolism, Liver/chemistry, Protein Databases, Brain/metabolism, Myocardium/metabolism, Myocardium/chemistry, Sus scrofa/metabolism, Sus scrofa/genetics, Proteome/metabolism, Proteome/analysis, Mass Spectrometry
2.
J Proteome Res ; 23(7): 2518-2531, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38810119

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety, and clustered the data to identify groups of sites with similar patterns across rice family groups. The data have been loaded into the UniProt Knowledgebase, enabling researchers to visualize sites alongside other data on rice proteins (e.g., structural models from AlphaFold2), and into PeptideAtlas and the PRIDE database, enabling visualization of source evidence, including scores and supporting mass spectra.


Subject(s)
Plant Genome, Oryza, Phosphoproteins, Plant Proteins, Proteomics, Signal Transduction, Oryza/genetics, Oryza/metabolism, Oryza/chemistry, Proteomics/methods, Phosphoproteins/metabolism, Phosphoproteins/genetics, Phosphoproteins/chemistry, Phosphoproteins/analysis, Plant Proteins/genetics, Plant Proteins/metabolism, Phosphorylation, Post-Translational Protein Processing, Phosphopeptides/metabolism, Phosphopeptides/analysis, Protein Databases, Amino Acid Motifs, Mass Spectrometry
3.
Proteomics ; 23(7-8): e2200014, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36074795

ABSTRACT

Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.


Subject(s)
Proteome, Proteomics, Proteomics/methods, Mass Spectrometry/methods, Computational Biology/methods
4.
J Proteome Res ; 22(6): 1828-1842, 2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37099386

ABSTRACT

Phosphorylation is a post-translational modification of great interest to researchers due to its relevance in many biological processes. LC-MS/MS techniques have enabled high-throughput data acquisition, with studies claiming identification and localization of thousands of phosphosites. The identification and localization of phosphosites emerge from different analytical pipelines and scoring algorithms, with uncertainty embedded throughout the pipeline. For many pipelines and algorithms, arbitrary thresholding is used, but little is known about the actual global false localization rate in these studies. Recently, it has been suggested to use decoy amino acids to estimate global false localization rates of phosphosites, among the peptide-spectrum matches reported. Here, we describe a simple pipeline aiming to maximize the information extracted from these studies by objectively collapsing from peptide-spectrum match to the peptidoform-site level, as well as combining findings from multiple studies while maintaining track of false localization rates. We show that the approach is more effective than current processes that use a simpler mechanism for handling phosphosite identification redundancy within and across studies. In our case study using eight rice phosphoproteomics data sets, 6368 unique sites were confidently identified using our decoy approach compared to 4687 using traditional thresholding in which false localization rates are unknown.


Subject(s)
Proteomics, Rivers, Liquid Chromatography, Proteomics/methods, Tandem Mass Spectrometry, Post-Translational Protein Processing, Peptides/chemistry, Algorithms, Protein Databases
5.
J Proteome Res ; 22(3): 729-742, 2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36577097

ABSTRACT

The availability of proteomics datasets in the public domain, and in the PRIDE database, in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the resource Expression Atlas, where they can be accessed and visualized either individually or together with gene expression data coming from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible for life scientists.


Subject(s)
Proteome, Proteomics, Humans, Proteome/analysis, Proteomics/methods, Gene Expression Profiling, Factual Databases, Mass Spectrometry/methods, Protein Databases
6.
J Proteome Res ; 22(12): 3754-3772, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37939282

ABSTRACT

Protein tyrosine sulfation (sY) is a post-translational modification (PTM) catalyzed by Golgi-resident tyrosylprotein sulfotransferases (TPSTs). Information on sY in humans is currently limited to ∼50 proteins, with only a handful having verified sites of sulfation. As such, the contribution of sulfation to the regulation of biological processes remains poorly defined. Mass spectrometry (MS)-based proteomics is the method of choice for PTM analysis but has yet to be applied for systematic investigation of the "sulfome", primarily due to issues associated with discrimination of sY-containing from phosphotyrosine (pY)-containing peptides. In this study, we developed an MS-based workflow for sY-peptide characterization, incorporating optimized Zr⁴⁺ immobilized metal-ion affinity chromatography (IMAC) and TiO₂ enrichment strategies. Extensive characterization of a panel of sY- and pY-peptides using an array of fragmentation regimes (CID, HCD, EThcD, ETciD, UVPD) highlighted differences in the generation of site-determining product ions and allowed us to develop a strategy for differentiating sulfated peptides from nominally isobaric phosphopeptides based on low collision energy-induced neutral loss. Application of our "sulfomics" workflow to a HEK-293 cell extracellular secretome facilitated identification of 21 new sulfotyrosine-containing proteins, several of which we validate enzymatically, and reveals new interplay between enzymes relevant to both protein and glycan sulfation.


Subject(s)
Phosphopeptides, Tyrosine, Humans, Phosphopeptides/analysis, HEK293 Cells, Workflow, Tyrosine/metabolism, Proteins, Phosphotyrosine
7.
J Proteome Res ; 22(2): 287-301, 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36626722

ABSTRACT

The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.


Subject(s)
Proteome, Proteomics, Humans, Reference Standards, Controlled Vocabulary, Mass Spectrometry, Protein Databases
8.
EMBO J ; 38(21): e100847, 2019 Oct 04.
Article in English | MEDLINE | ID: mdl-31433507

ABSTRACT

Phosphorylation is a key regulator of protein function under (patho)physiological conditions, and defining site-specific phosphorylation is essential to understand basic and disease biology. In vertebrates, the investigative focus has primarily been on serine, threonine and tyrosine phosphorylation, but mounting evidence suggests that phosphorylation of other "non-canonical" amino acids also regulates critical aspects of cell biology. However, standard methods of phosphoprotein characterisation are largely unsuitable for the analysis of non-canonical phosphorylation due to their relative instability under acidic conditions and/or elevated temperature. Consequently, the complete landscape of phosphorylation remains unexplored. Here, we report an unbiased phosphopeptide enrichment strategy based on strong anion exchange (SAX) chromatography (UPAX), which permits identification of histidine (His), arginine (Arg), lysine (Lys), aspartate (Asp), glutamate (Glu) and cysteine (Cys) phosphorylation sites on human proteins by mass spectrometry-based phosphoproteomics. Remarkably, under basal conditions, and having accounted for false site localisation probabilities, the number of unique non-canonical phosphosites is approximately one-third of the number of observed canonical phosphosites. Our resource reveals the previously unappreciated diversity of protein phosphorylation in human cells, and opens up avenues for high-throughput exploration of non-canonical phosphorylation in all organisms.


Subject(s)
Anions/chemistry, Ion Exchange Chromatography/methods, Phosphopeptides/analysis, Phosphoproteins/analysis, Proteome/analysis, Computational Biology, HeLa Cells, Humans, Mass Spectrometry, Phosphorylation
9.
PLoS Comput Biol ; 18(6): e1010174, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35714157

ABSTRACT

The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point, we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the resource Expression Atlas for widespread dissemination.


Subject(s)
Proteins, Proteomics, Animals, Brain/metabolism, Mice, Proteins/metabolism, Rats
10.
J Proteome Res ; 21(6): 1510-1524, 2022 Jun 03.
Article in English | MEDLINE | ID: mdl-35532924

ABSTRACT

Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence, which is the case for the majority of sites in PSP. We estimate that around 62,000 Ser, 8,000 Thr, and 12,000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86,000 Ser, 50,000 Thr, and 26,000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.


Subject(s)
Phosphopeptides, Proteome, Humans, Mass Spectrometry/methods, Phosphopeptides/metabolism, Phosphoproteins/analysis, Phosphorylation, Proteome/analysis
11.
J Proteome Res ; 21(7): 1603-1615, 2022 Jul 01.
Article in English | MEDLINE | ID: mdl-35640880

ABSTRACT

Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified or to estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting incorrectly localized phosphosites, due to inadequate statistical control. In this work, we develop the concept of scoring modifications on a decoy amino acid, that is, one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of amino acids, on both synthetic and real data sets, demonstrating that the selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys. We propose the use of a decoy amino acid to control false reporting in the literature and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840.


Subject(s)
Amino Acids, Tandem Mass Spectrometry, Protein Databases, Reproducibility of Results, Tandem Mass Spectrometry/methods
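
The decoy amino acid idea described in this abstract can be illustrated with a minimal sketch. It assumes each reported site carries a localization score and a flag saying whether the modified residue is the decoy amino acid. The scaling of decoy hits by the ratio of target (S/T/Y) to decoy residue counts is an assumption of this sketch rather than something stated in the abstract, and all function and parameter names are hypothetical.

```python
from typing import List, Tuple


def estimate_flr(
    sites: List[Tuple[float, bool]],
    n_target_residues: int,
    n_decoy_residues: int,
) -> List[Tuple[float, float]]:
    """Return (score, estimated global FLR) per site, best score first.

    sites: (localization score, is_decoy) pairs; higher score = more confident.
    n_target_residues / n_decoy_residues: counts of candidate S/T/Y and decoy
    residues in the identified peptides (assumed inputs, used to scale decoys).
    """
    if n_decoy_residues <= 0:
        raise ValueError("n_decoy_residues must be positive")
    # Assumption: a decoy hit stands for (targets / decoys) false target hits.
    scale = n_target_residues / n_decoy_residues
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)

    decoys = 0
    result = []
    for rank, (score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        targets = rank - decoys
        # Estimated false localizations divided by target sites accepted so far.
        flr = min(1.0, decoys * scale / targets) if targets else 1.0
        result.append((score, flr))
    return result
```

A caller would typically pick the lowest score at which the estimated FLR stays under a chosen limit (for example 0.05) and report only the target sites above it.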
12.
Bioinformatics ; 37(21): 3830-3838, 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34196671

ABSTRACT

MOTIVATION: MHC-peptide binding prediction has been widely used for understanding the immune response of individuals or populations, each carrying different MHC molecules as well as for the development of immunotherapeutics. The results from MHC-peptide binding prediction tools are mostly reported as a predicted binding affinity (IC50) and the percentile rank score, and global thresholds e.g. IC50 value < 500 nM or percentile rank < 2% are generally recommended for distinguishing binding peptides from non-binding peptides. However, it is difficult to evaluate statistically the probability of an individual peptide binding prediction to be true or false solely considering predicted scores. Therefore, statistics describing the overall global false discovery rate (FDR) and local FDR, also called posterior error probability (PEP) are required to give statistical context to the natively produced scores. RESULT: We have developed an algorithm and code implementation, called MHCVision, for estimation of FDR and PEP values for the predicted results of MHC-peptide binding prediction from the NetMHCpan tool. MHCVision performs parameter estimation using a modified expectation maximization framework for a two-component beta mixture model, representing the distribution of true and false scores of the predicted dataset. We can then estimate the PEP of an individual peptide's predicted score, and conversely the probability that it is true. We demonstrate that the use of global FDR and PEP estimation can provide a better trade-off between sensitivity and precision over using currently recommended thresholds from tools. AVAILABILITY AND IMPLEMENTATION: https://github.com/PGB-LIV/MHCVision. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Peptides, Humans, Protein Binding, Peptides/chemistry, Probability
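
The relationship between a two-component beta mixture, the posterior error probability (PEP), and the global FDR mentioned in this abstract can be sketched as follows. This is not the MHCVision implementation: it assumes the mixture parameters have already been estimated (the expectation maximization step is omitted), that scores have been rescaled to the open interval (0, 1) with higher values indicating likelier binders, and all names are hypothetical.

```python
import math
from typing import List, Tuple


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def pep(
    x: float,
    w_false: float,
    false_ab: Tuple[float, float],
    true_ab: Tuple[float, float],
) -> float:
    """Posterior probability that a prediction with score x is false.

    w_false is the mixture weight of the false component; false_ab and
    true_ab are the (a, b) shape parameters of the two beta components.
    """
    f_false = w_false * beta_pdf(x, *false_ab)
    f_true = (1.0 - w_false) * beta_pdf(x, *true_ab)
    return f_false / (f_false + f_true)


def global_fdr(accepted_peps: List[float]) -> float:
    """Global FDR of an accepted set, taken as the mean PEP of its members."""
    if not accepted_peps:
        raise ValueError("accepted_peps must not be empty")
    return sum(accepted_peps) / len(accepted_peps)
```

The probability that an individual prediction is true is then simply 1 minus its PEP, which matches the abstract's description of estimating one value from the other.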
13.
Bioinformatics ; 37(16): 2347-2355, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33560295

ABSTRACT

MOTIVATION: A fundamental problem for disease treatment is that while antibiotics are a powerful counter to bacteria, they are ineffective against viruses. Often, bacterial and viral infections are confused due to their similar symptoms and lack of rapid diagnostics. With many clinicians relying primarily on symptoms for diagnosis, overuse and misuse of modern antibiotics are rife, contributing to the growing pool of antibiotic resistance. To ensure an individual receives optimal treatment given their disease state and to reduce over-prescription of antibiotics, the host response can in theory be measured quickly to distinguish between the two states. To establish a predictive biomarker panel of disease state (viral/bacterial/no-infection), we conducted a meta-analysis of human blood infection studies using machine learning. RESULTS: We focused on publicly available gene expression data from two widely used platforms, Affymetrix and Illumina microarrays as they represented a significant proportion of the available data. We were able to develop multi-class models with high accuracies with our best model predicting 93% of bacterial and 89% viral samples correctly. To compare the selected features in each of the different technologies, we reverse-engineered the underlying molecular regulatory network and explored the neighbourhood of the selected features. The networks highlighted that although on the gene-level the models differed, they contained genes from the same areas of the network. Specifically, this convergence was to pathways including the Type I interferon Signalling Pathway, Chemotaxis, Apoptotic Processes and Inflammatory/Innate Response. AVAILABILITY: Data and code are available on the Gene Expression Omnibus and github. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

14.
Biochem J ; 478(3): 533-551, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33438746

ABSTRACT

Different types of DNA damage can initiate phosphorylation-mediated signalling cascades that result in stimulus specific pro- or anti-apoptotic cellular responses. Amongst its many roles, the NF-κB transcription factor RelA is central to these DNA damage response pathways. However, we still lack understanding of the co-ordinated signalling mechanisms that permit different DNA damaging agents to induce distinct cellular outcomes through RelA. Here, we use label-free quantitative phosphoproteomics to examine the temporal effects of exposure of U2OS cells to either etoposide (ETO) or hydroxyurea (HU) by monitoring the phosphorylation status of RelA and its protein binding partners. Although few stimulus-specific differences were identified in the constituents of phosphorylated RelA interactome after exposure to these DNA damaging agents, we observed subtle, but significant, changes in their phosphorylation states, as a function of both type and duration of treatment. The DNA double strand break (DSB)-inducing ETO invoked more rapid, sustained responses than HU, with regulated targets primarily involved in transcription, cell division and canonical DSB repair. Kinase substrate prediction of ETO-regulated phosphosites suggest abrogation of CDK and ERK1 signalling, in addition to the known induction of ATM/ATR. In contrast, HU-induced replicative stress mediated temporally dynamic regulation, with phosphorylated RelA binding partners having roles in rRNA/mRNA processing and translational initiation, many of which contained a 14-3-3ε binding motif, and were putative substrates of the dual specificity kinase CLK1. Our data thus point to differential regulation of key cellular processes and the involvement of distinct signalling pathways in modulating DNA damage-specific functions of RelA.


Subject(s)
DNA Damage, Post-Translational Protein Processing, Transcription Factor RelA/physiology, Amino Acid Motifs, Amino Acid Sequence, Apoptosis/drug effects, Apoptosis/physiology, Bone Neoplasms/pathology, Tumor Cell Line, Liquid Chromatography, Consensus Sequence, Double-Stranded DNA Breaks, DNA Replication, Neoplasm DNA/drug effects, Neoplasm DNA/metabolism, Etoposide/pharmacology, Humans, Hydroxyurea/pharmacology, Osteosarcoma/pathology, Phosphorylation, Protein Interaction Maps, Protein Kinases/metabolism, Proteomics/methods, Tandem Mass Spectrometry, Time Factors
15.
Nucleic Acids Res ; 48(D1): D783-D788, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31722398

ABSTRACT

The Allele Frequency Net Database (AFND, www.allelefrequencies.net) provides the scientific community with a freely available repository for the storage of frequency data (alleles, genes, haplotypes and genotypes) related to human leukocyte antigens (HLA), killer-cell immunoglobulin-like receptors (KIR), major histocompatibility complex Class I chain related genes (MIC) and a number of cytokine gene polymorphisms in worldwide populations. In the last five years, AFND has become more popular in terms of clinical and scientific usage, with a recent increase in genotyping data as a necessary component of Short Population Report article submissions to another scientific journal. In addition, we have developed a user-friendly desktop application for HLA and KIR genotype/population data submissions. We have also focused on classification of existing and new data into 'gold-silver-bronze' criteria, allowing users to filter and query depending on their needs. Moreover, we have also continued to expand other features, for example focussed on HLA associations with adverse drug reactions. At present, AFND contains >1600 populations from >10 million healthy individuals, making AFND a valuable resource for the analysis of some of the most polymorphic regions in the human genome.


Subject(s)
Cytokines/genetics, Genetic Databases, Gene Frequency/genetics, HLA Antigens/genetics, Histocompatibility Antigens Class I/genetics, KIR Receptors/genetics, Human Genome, Humans, Genetic Polymorphism, User-Computer Interface
16.
J Proteome Res ; 20(4): 1981-1985, 2021 Apr 02.
Article in English | MEDLINE | ID: mdl-33710902

ABSTRACT

Complex biological samples, in particular, in proteomics and metabolomics research, are often analyzed using mass spectrometry paired with liquid chromatography or gas chromatography. The chromatography stage adds a third dimension (retention time) to the usual 2D mass spectrometry output (mass/charge, detected ion counts). Experimental results are often discovered by complex computational analysis, but it is not always possible to know if the data has been correctly interpreted. To perform quality-control checks, it can often be helpful to verify the results by manually examining the raw data, and it is typically easier to understand the data in a graphical, rather than numerical, form. 3D graphics hardware is present in most modern computers but is rarely utilized by bioinformatics software, even when the data to be viewed are naturally 3D. lcmsWorld is new software that uses graphics hardware to quickly and smoothly examine and compare LC-MS data. A preprocessing step allows the software to subsequently access any area of the data instantly at multiple levels of detail. The data can then be freely navigated while the software automatically selects, loads, and displays the most appropriate detail. lcmsWorld is open source. Releases, source code, and example data files are available via https://github.com/PGB-LIV/lcmsWorld.


Subject(s)
Three-Dimensional Imaging, Software, Liquid Chromatography, Gas Chromatography-Mass Spectrometry, Mass Spectrometry
17.
J Proteome Res ; 20(1): 172-183, 2021 Jan 01.
Article in English | MEDLINE | ID: mdl-32864978

ABSTRACT

With ever-increasing amounts of data produced by mass spectrometry (MS) proteomics and metabolomics, and the sheer volume of samples now analyzed, the need for a common open format possessing both file size efficiency and faster read/write speeds has become paramount to drive the next generation of data analysis pipelines. The Proteomics Standards Initiative (PSI) has established a clear and precise extensible markup language (XML) representation for data interchange, mzML, receiving substantial uptake; nevertheless, storage and file access efficiency has not been the main focus. We propose an HDF5 file format "mzMLb" that is optimized for both read/write speed and storage of the raw mass spectrometry data. We provide an extensive validation of the write speed, random read speed, and storage size, demonstrating a flexible format that with or without compression is faster than all existing approaches in virtually all cases, while with compression is comparable in size to proprietary vendor file formats. Since our approach uniquely preserves the XML encoding of the metadata, the format implicitly supports future versions of mzML and is straightforward to implement: mzMLb's design adheres to both HDF5 and NetCDF4 standard implementations, which allows it to be easily utilized by third parties due to their widespread programming language support. A reference implementation within the established ProteoWizard toolkit is provided.


Subject(s)
Programming Languages, Proteomics, Protein Databases, Mass Spectrometry, Metabolomics, Software
18.
Sol Phys ; 296(3), 2021.
Article in English | MEDLINE | ID: mdl-34803188

ABSTRACT

In spite of strict limits on outgassing from organic materials, some spacecraft instruments making long-term measurements of solar extreme ultraviolet (EUV) radiation still suffer significant degradation. While such measures have reduced the rate of degradation, they have not completely eliminated it in some cases. For example, in five years, the aluminum filters used in the Extreme Ultraviolet Variability Experiment (EVE) instruments onboard the Solar Dynamics Observatory (SDO) suffered losses exceeding 40% at 30.4 nm. Comparing those losses with the negligible losses of nearby zirconium filters on the same instruments indicated that the problem was not due to carbonization on the Sun-facing side of the filter. To investigate whether the loss was due to carbon deposition on the downstream face of the Al filter, we exposed the backsides of Al and Zr filters to EUV in the presence of a volatile organic solvent in the laboratory and concluded that this could not be the cause. Given that the residual gas composition in the SDO spacecraft likely has water vapor as well as organics, these findings suggest that the transmission loss in the Al filter originated with oxidation caused by UV-activated adsorbed water.

19.
Mol Cell Proteomics ; 18(1): 86-98, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30293062

ABSTRACT

Rice (Oryza sativa) is one of the most important worldwide crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl plants, and common contaminants in which to search for protein-level evidence. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide spectrum matches from 47K peptides and 8,187 protein groups. 4168 peptides were initially classed as putative novel peptides (not matching official genes). Following a strict filtration scheme to rule out other possible explanations, we discovered 1,584 high confidence novel peptides. The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. 80% of the novel peptides had an ortholog match in the curated protein sequence set from at least one other plant species. For the peptides clustering in intergenic regions (and thus potentially new genes), 101 loci were identified, for which 43 had a high-confidence hit for a protein domain. Our results can be displayed as tracks on the Ensembl genome or other browsers supporting Track Hubs, to support re-annotation of the rice genome.


Subject(s)
Gene Expression Profiling/methods, Oryza/genetics, Proteomics/methods, Cluster Analysis, Protein Databases, Plant Gene Expression Regulation, Molecular Sequence Annotation, Oryza/metabolism, RNA Sequence Analysis
20.
Biochem J ; 477(13): 2451-2475, 2020 Jul 17.
Article in English | MEDLINE | ID: mdl-32501498

ABSTRACT

Polo-like kinase 4 (PLK4) is the master regulator of centriole duplication in metazoan organisms. Catalytic activity and protein turnover of PLK4 are tightly coupled in human cells, since changes in PLK4 concentration and catalysis have profound effects on centriole duplication and supernumerary centrosomes, which are associated with aneuploidy and cancer. Recently, PLK4 has been targeted with a variety of small molecule kinase inhibitors exemplified by centrinone, which rapidly induces inhibitory effects on PLK4 and leads to on-target centrosome depletion. Despite this, relatively few PLK4 substrates have been identified unequivocally in human cells, and PLK4 signalling outside centriolar networks remains poorly characterised. We report an unbiased mass spectrometry (MS)-based quantitative analysis of cellular protein phosphorylation in stable PLK4-expressing U2OS human cells exposed to centrinone. PLK4 phosphorylation was itself sensitive to brief exposure to the compound, resulting in PLK4 stabilisation. Analysing asynchronous cell populations, we report hundreds of centrinone-regulated cellular phosphoproteins, including centrosomal and cell cycle proteins and a variety of likely 'non-canonical' substrates. Surprisingly, sequence interrogation of ∼300 significantly down-regulated phosphoproteins reveals an extensive network of centrinone-sensitive [Ser/Thr]Pro phosphorylation sequence motifs, which based on our analysis might be either direct or indirect targets of PLK4. In addition, we confirm that NMYC and PTPN12 are PLK4 substrates, both in vitro and in human cells. Our findings suggest that PLK4 catalytic output directly controls the phosphorylation of a diverse set of cellular proteins, including Pro-directed targets that are likely to be important in PLK4-mediated cell signalling.


Subject(s)
Protein Serine-Threonine Kinases/metabolism, Pyrimidines/pharmacology, Sulfones/pharmacology, Tumor Cell Line, Flow Cytometry, Fluorometry, Humans, Immunoprecipitation, Leupeptins/pharmacology, Fluorescence Microscopy, Phosphorylation/drug effects, Protein Serine-Threonine Kinases/antagonists & inhibitors, Tandem Mass Spectrometry