Results 1 - 20 of 33
1.
PLoS Biol ; 20(5): e3001636, 2022 05.
Article in English | MEDLINE | ID: mdl-35576205

ABSTRACT

The recent revolution in computational protein structure prediction provides folding models for entire proteomes, which can now be integrated with large-scale experimental data. Mass spectrometry (MS)-based proteomics has identified and quantified tens of thousands of posttranslational modifications (PTMs), most of them of uncertain functional relevance. In this study, we determine the structural context of these PTMs and investigate how this information can be leveraged to pinpoint potential regulatory sites. Our analysis uncovers global patterns of PTM occurrence across folded and intrinsically disordered regions. We found that this information can help to distinguish regulatory PTMs from those marking improperly folded proteins. Interestingly, the human proteome contains thousands of proteins that have large folded domains linked by short, disordered regions that are strongly enriched in regulatory phosphosites. These include well-known kinase activation loops that induce protein conformational changes upon phosphorylation. This regulatory mechanism appears to be widespread in kinases but also occurs in other protein families such as solute carriers. It is not limited to phosphorylation but includes ubiquitination and acetylation sites as well. Furthermore, we performed three-dimensional proximity analysis, which revealed examples of spatial coregulation of different PTM types and potential PTM crosstalk. To enable the community to build upon these first analyses, we provide tools for 3D visualization of proteomics data and PTMs as well as Python libraries for data access and processing.


Subject(s)
Protein Processing, Post-Translational , Proteome , Humans , Mass Spectrometry/methods , Phosphorylation , Proteomics/methods
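The three-dimensional proximity analysis mentioned in the abstract above can be illustrated with a minimal sketch: given 3D coordinates for modified residues (for example C-alpha positions taken from a predicted structure), list every pair of sites within a distance cutoff. The coordinates, the 8 Å cutoff, and the function name `proximal_site_pairs` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def proximal_site_pairs(coords: np.ndarray, cutoff: float) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, of sites whose Euclidean distance is <= cutoff.

    coords: array of shape (n_sites, 3), e.g. C-alpha coordinates in angstroms.
    """
    # Pairwise difference vectors via broadcasting: shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the strict upper triangle so each pair appears once and self-pairs are excluded
    i, j = np.nonzero(np.triu(dist <= cutoff, k=1))
    return list(zip(i.tolist(), j.tolist()))


# Toy example (assumption): sites 0 and 1 are 3 Å apart, site 2 is far away
sites = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
print(proximal_site_pairs(sites, cutoff=8.0))  # [(0, 1)]
```

The all-pairs distance matrix grows quadratically with the number of sites; for proteome-scale data a spatial index such as a k-d tree would likely scale better.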
2.
Mol Cell Proteomics ; 22(7): 100581, 2023 07.
Article in English | MEDLINE | ID: mdl-37225017

ABSTRACT

Recent advances in mass spectrometry-based proteomics enable the acquisition of increasingly large datasets within relatively short times, which exposes bottlenecks in the bioinformatics pipeline. Although peptide identification is already scalable, most label-free quantification (LFQ) algorithms scale quadratically or cubically with the number of samples, which may even preclude the analysis of large-scale data. Here we introduce directLFQ, a ratio-based approach for sample normalization and the calculation of protein intensities. It estimates quantities by aligning samples and ion traces, shifting them on top of each other in logarithmic space. Importantly, directLFQ scales linearly with the number of samples, allowing analyses of large studies to finish in minutes instead of days or months. We quantify 10,000 proteomes in 10 min and 100,000 proteomes in less than 2 h, about 1000-fold faster than some implementations of the popular LFQ algorithm MaxLFQ. In-depth characterization of directLFQ reveals excellent normalization properties and benchmark results, comparing favorably to MaxLFQ for both data-dependent acquisition and data-independent acquisition. In addition, directLFQ provides normalized peptide intensity estimates for peptide-level comparisons. It is an important part of an overall quantitative proteomic pipeline that also needs to include highly sensitive statistical analysis leading to proteoform resolution. Available as an open-source Python package and a graphical user interface with a one-click installer, it can be used in the AlphaPept ecosystem as well as downstream of most common computational proteomics pipelines.


Subject(s)
Proteome , Proteomics , Proteome/analysis , Proteomics/methods , Ecosystem , Peptides/analysis , Mass Spectrometry/methods , Software
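The linear scaling described in the directLFQ abstract comes from aligning samples by shifts in log space rather than comparing all pairs of samples. The minimal sketch below illustrates that idea for sample normalization only: each sample's log2 intensities are shifted by the median log-ratio to one reference sample. Choosing the most complete sample as the reference and using the median are assumptions made here; this is not the directLFQ algorithm itself, and the name `shift_align` is hypothetical.

```python
import numpy as np


def shift_align(log_int: np.ndarray) -> np.ndarray:
    """Align samples by shifting them in log space onto a reference sample.

    log_int: array of shape (n_ions, n_samples) with log2 intensities; NaN marks missing values.
    Returns a copy in which every sample has a median log-ratio of 0 to the reference.
    """
    # Assumption: use the sample with the most quantified ions as the reference
    ref = int(np.argmax(np.sum(~np.isnan(log_int), axis=0)))
    aligned = log_int.copy()
    # One pass over samples: cost grows linearly with the number of samples
    for j in range(log_int.shape[1]):
        log_ratio = log_int[:, j] - log_int[:, ref]
        aligned[:, j] = log_int[:, j] - np.nanmedian(log_ratio)
    return aligned


# Toy data (assumption): sample 1 is offset by +1.0, sample 2 by -0.5, one missing value
x = np.array([[10.0, 11.0, 9.5], [12.0, 13.0, 11.5], [8.0, np.nan, 7.5]])
out = shift_align(x)
print(out)
```

A practical implementation would also have to handle samples that share no ions with the reference (here `np.nanmedian` would return NaN), for example by aligning through intermediate samples.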
3.
Mol Cell Proteomics ; 22(2): 100489, 2023 02.
Article in English | MEDLINE | ID: mdl-36566012

ABSTRACT

Data-independent acquisition (DIA) methods have become increasingly popular in mass spectrometry-based proteomics because they enable continuous acquisition of fragment spectra for all precursors simultaneously. However, these advantages come with the challenge of correctly reconstructing the precursor-fragment relationships in these highly convoluted spectra for reliable identification and quantification. Here, we introduce a scan mode for the combination of trapped ion mobility spectrometry with parallel accumulation-serial fragmentation (PASEF) that seamlessly and continuously follows the natural shape of the ion cloud in ion mobility and peptide precursor mass dimensions. Termed synchro-PASEF, it increases the detected fragment ion current several-fold at sub-second cycle times. Consecutive quadrupole selection windows move synchronously through the mass and ion mobility range. In this process, the quadrupole slices through the peptide precursors, which separates fragment ion signals of each precursor into adjacent synchro-PASEF scans. This precisely defines precursor-fragment relationships in ion mobility and mass dimensions and effectively deconvolutes the DIA fragment space. Importantly, the partitioned parts of the fragment ion transitions provide a further dimension of specificity via a lock-and-key mechanism. This is also advantageous for quantification, where signals from interfering precursors in the DIA selection window do not affect all partitions of the fragment ion, so that only the specific parts can be retained for quantification. Overall, we establish the defining features of synchro-PASEF and explore its potential for proteomic analyses.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Proteome/analysis , Peptides/analysis
4.
Mol Cell Proteomics ; 21(9): 100279, 2022 09.
Article in English | MEDLINE | ID: mdl-35944843

ABSTRACT

Data-independent acquisition (DIA) methods have become increasingly attractive in mass spectrometry-based proteomics because they enable high data completeness and a wide dynamic range. Recently, we combined DIA with parallel accumulation-serial fragmentation (dia-PASEF) on a Bruker trapped ion mobility (IM) separated quadrupole time-of-flight mass spectrometer. This requires alignment of the IM separation with the downstream mass-selective quadrupole, leading to a more complex scheme for dia-PASEF window placement compared with DIA. To achieve high data completeness and deep proteome coverage, here we employ variable isolation windows that are placed optimally depending on precursor density in the m/z and IM plane. This is implemented in the freely available py_diAID (Python package for DIA with an automated isolation design) package. In combination with in-depth project-specific proteomics libraries and the Evosep liquid chromatography system, we reproducibly identified over 7700 proteins in a human cancer cell line in 44 min with quadruplicate single-shot injections at high sensitivity. Even at a throughput of 100 samples per day (11 min liquid chromatography gradients), we consistently quantified more than 6000 proteins in mammalian cell lysates by injecting four replicates. We found that optimal dia-PASEF window placement facilitates in-depth phosphoproteomics with very high sensitivity, quantifying more than 35,000 phosphosites in a human cancer cell line stimulated with epidermal growth factor in triplicate 21 min runs. This covers a substantial part of the regulated phosphoproteome with high sensitivity, opening the way for extensive systems-biological studies.


Subject(s)
Proteome , Tandem Mass Spectrometry , Animals , Chromatography, Liquid/methods , Epidermal Growth Factor , Humans , Mammals/metabolism , Proteome/metabolism , Proteomics/methods , Tandem Mass Spectrometry/methods
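The principle of placing variable isolation windows according to precursor density, as in the py_diAID abstract, can be illustrated in one dimension: putting window boundaries at quantiles of the precursor m/z distribution gives narrow windows where precursors are dense and wide windows where they are sparse, with roughly equal precursor counts per window. This sketch ignores the ion mobility dimension and the instrument constraints that py_diAID handles; the function name `quantile_windows`, the quantile approach, and the toy data are assumptions for illustration.

```python
import numpy as np


def quantile_windows(precursor_mz: np.ndarray, n_windows: int) -> list[tuple[float, float]]:
    """Split the m/z range into n_windows windows holding roughly equal numbers of precursors."""
    # Window edges at evenly spaced quantiles of the observed precursor m/z values
    edges = np.quantile(precursor_mz, np.linspace(0.0, 1.0, n_windows + 1))
    return [(float(lo), float(hi)) for lo, hi in zip(edges[:-1], edges[1:])]


# Toy data (assumption): precursors crowded between 400 and 600 m/z, sparse up to 1200
rng = np.random.default_rng(0)
mz = np.concatenate([rng.uniform(400, 600, 900), rng.uniform(600, 1200, 100)])
windows = quantile_windows(mz, n_windows=5)
for lo, hi in windows:
    print(f"{lo:7.1f} - {hi:7.1f}  width {hi - lo:6.1f}")
```

With this toy distribution the first windows are narrow and fall in the crowded 400 to 600 m/z region, while the last window stretches across the sparse high-m/z range.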
5.
Bioinformatics ; 38(3): 849-852, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34586352

ABSTRACT

SUMMARY: Integrating experimental information across proteomic datasets with the wealth of publicly available sequence annotations is a crucial part of many proteomic studies that currently lacks an automated analysis platform. Here, we present AlphaMap, a Python package that facilitates the visual exploration of peptide-level proteomics data. Identified peptides and post-translational modifications in proteomic datasets are mapped to their corresponding protein sequence and visualized together with prior knowledge from UniProt and with expected proteolytic cleavage sites. The functionality of AlphaMap can be accessed via an intuitive graphical user interface or, more flexibly, as a Python package that allows its integration into common analysis workflows for data visualization. AlphaMap produces publication-quality illustrations and can easily be customized to address a given research question. AVAILABILITY AND IMPLEMENTATION: AlphaMap is implemented in Python and released under an Apache license. The source code and one-click installers are freely available at https://github.com/MannLabs/alphamap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteomics , Software , Peptides , Amino Acid Sequence , Peptide Hydrolases
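As a minimal sketch of the mapping step described in the AlphaMap abstract, the functions below locate identified peptides in a protein sequence (1-based start and end positions) and list expected tryptic cleavage sites using the common simplified rule that trypsin cleaves after K or R unless the next residue is P. The sequence, the peptides, the function names, and the cleavage rule are illustrative assumptions; AlphaMap's own API is not shown here.

```python
import re


def map_peptides(protein_seq: str, peptides: list[str]) -> dict[str, tuple[int, int]]:
    """Map each peptide to its first occurrence in protein_seq as 1-based (start, end)."""
    positions = {}
    for pep in peptides:
        start = protein_seq.find(pep)  # 0-based index, -1 if absent
        if start >= 0:
            positions[pep] = (start + 1, start + len(pep))
    return positions


def tryptic_sites(protein_seq: str) -> list[int]:
    """1-based positions of K/R residues after which trypsin is expected to cleave (not before P)."""
    return [m.start() + 1 for m in re.finditer(r"[KR](?!P)", protein_seq)]


seq = "MKTAYRPLEKGR"
print(map_peptides(seq, ["TAYRPLEK", "GR", "WXYZ"]))  # {'TAYRPLEK': (3, 10), 'GR': (11, 12)}
print(tryptic_sites(seq))  # [2, 10, 12]
```

Only the first occurrence of each peptide is reported; peptides shared by repeated sequence stretches would need all matches.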
6.
Mol Cell Proteomics ; 20: 100149, 2021.
Article in English | MEDLINE | ID: mdl-34543758

ABSTRACT

High-resolution MS-based proteomics generates large amounts of data, even in the standard LC-tandem MS configuration. Adding an ion mobility dimension vastly increases the acquired data volume, challenging both analytical processing pipelines and especially data exploration by scientists. This has necessitated data aggregation, effectively discarding much of the information present in these rich datasets. Taking trapped ion mobility spectrometry (TIMS) on a quadrupole TOF (Q-TOF) platform as an example, we developed an efficient indexing scheme that represents all data points as detector arrival times on scales of minutes (LC), milliseconds (TIMS), and microseconds (TOF). In our open-source AlphaTims package, data are indexed, accessed, and visualized by a combination of tools of the scientific Python ecosystem. We interpret unprocessed data as a sparse four-dimensional matrix and use just-in-time compilation to machine code with Numba, accelerating our computational procedures by several orders of magnitude while keeping to familiar indexing and slicing notations. For samples with more than six billion detector events, a modern laptop can load and index raw data in about a minute. Loading is even faster when AlphaTims has already saved indexed data in an HDF5 file, a portable scientific standard used in extremely large-scale data acquisition. Subsequently, data access along any dimension and interactive visualization happen in milliseconds. We have found AlphaTims to be a key enabling tool to explore high-dimensional LC-TIMS-Q-TOF data and have made it freely available as an open-source Python package with a stand-alone graphical user interface at https://github.com/MannLabs/alphatims or as part of the AlphaPept 'ecosystem'.


Subject(s)
Software , Chromatography, Liquid , HeLa Cells , Humans , Ion Mobility Spectrometry , Mass Spectrometry , Peptides
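The indexing idea in the AlphaTims abstract, where detector events are kept sorted along one dimension and can be sliced directly, resembles a compressed sparse row (CSR) layout: events are stored sorted by LC frame, and an offset array gives the start and end of each frame. The minimal sketch below shows that pattern with NumPy; the array names, the toy events, and the single-level index are assumptions, and AlphaTims' actual data structures and HDF5 layout are not reproduced here.

```python
import numpy as np


def build_offsets(frame_ids: np.ndarray, n_frames: int) -> np.ndarray:
    """Offsets such that events of frame f occupy the slice offsets[f]:offsets[f + 1].

    frame_ids must be sorted in ascending order (events grouped by frame).
    """
    counts = np.bincount(frame_ids, minlength=n_frames)
    offsets = np.zeros(n_frames + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(counts)
    return offsets


# Toy detector events (assumption): frame index, TOF index and intensity per event
frame_ids = np.array([0, 0, 1, 3, 3, 3])
tof_ids = np.array([105, 220, 87, 50, 51, 400])
intensities = np.array([12, 7, 30, 5, 9, 14])

offsets = build_offsets(frame_ids, n_frames=4)
f = 3
sl = slice(offsets[f], offsets[f + 1])  # constant-time lookup of one frame's events
print(tof_ids[sl], intensities[sl])
```

Empty frames (frame 2 here) simply get two equal offsets, so the layout stays compact even when the underlying matrix is very sparse.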
7.
Proteomics ; 20(3-4): e1900306, 2020 02.
Article in English | MEDLINE | ID: mdl-31981311

ABSTRACT

Data-independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which necessitates the use of data-dependent acquisition (DDA) libraries for deep peptide-centric detection. Here, it is shown that DIA can be freed from this dependency by combining predicted fragment intensities and retention times with narrow-window DIA. This eliminates variation in library building and avoids stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter-laboratory comparison.


Subject(s)
Chromatography, Liquid/methods , Data Mining/methods , Mass Spectrometry/methods , Peptides/analysis , Proteome/analysis , Proteomics/methods , Computational Biology/methods , Databases, Protein , HeLa Cells , Humans , Peptide Library , Software
8.
J Proteome Res ; 18(11): 3840-3849, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31429292

ABSTRACT

Mass spectrometry (MS) has become the technique of choice for large-scale analysis of histone post-translational modifications (hPTMs) and their combinatorial patterns, especially in untargeted settings where novel discovery-driven hypotheses are being generated. However, MS-based histone analysis requires a distinct sample preparation, acquisition, and data analysis workflow when compared to traditional MS-based approaches. To this end, sequential window acquisition of all theoretical fragment ion spectra (SWATH) has great potential, as it allows for untargeted accurate identification and quantification of hPTMs. Here, we present a complete SWATH workflow specifically adapted for the untargeted study of histones (hSWATH). We assess its validity on a technical dataset of time-lapse deacetylation of a commercial histone extract using HDAC1, which provides a ground truth: acetylated substrate peptides decrease in intensity. We successfully apply this workflow in a biological setting and subsequently investigate the differential response to HDAC inhibition in different breast cancer cell lines.


Subject(s)
Chromatography, Liquid/methods , Histones/metabolism , Peptides/metabolism , Protein Processing, Post-Translational , Proteomics/methods , Tandem Mass Spectrometry/methods , Acetylation/drug effects , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , Female , Histone Deacetylase Inhibitors/pharmacology , Humans , Peptide Library , Reproducibility of Results
9.
Proteomics ; 18(24): e1800186, 2018 12.
Article in English | MEDLINE | ID: mdl-30387297

ABSTRACT

Sequential window acquisition of all theoretical fragment ion mass spectrometry (SWATH-MS) provides large-scale protein quantification with high accuracy and selectivity. Nevertheless, reliable quantification of low-abundance signals in complex samples remains challenging, as recently illustrated in a multicenter benchmark study of different label-free software tools. Here, the SWATH Replicates Analysis 2.0 template from Sciex is used to highlight that the relationship between the MS2 peak area and the variability can be described by a function. This functional relationship appears to be largely insensitive to variation in samples or acquisition conditions, suggesting a device-intrinsic property. By using a power regression, it is shown that the MS2 peak area can be used to predict the quantification repeatability without relying on replicate injections, thus contributing to high-throughput confident quantification of low-abundance signals with SWATH-MS.


Subject(s)
Mass Spectrometry/methods , Peptide Fragments/analysis , Proteins/analysis , Proteomics/methods , Software , Humans , Reproducibility of Results
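A power regression of the kind described in the SWATH-MS abstract, relating variability to MS2 peak area as CV = a * area^b, can be fitted by ordinary least squares on log-transformed values, since log(CV) = log(a) + b * log(area). The sketch below shows that fit and its use to predict repeatability for a new peak area. Using the coefficient of variation (CV) as the variability measure, the function names, and the synthetic data are assumptions; the fitting details of the original study may differ.

```python
import numpy as np


def fit_power_law(area: np.ndarray, cv: np.ndarray) -> tuple[float, float]:
    """Fit cv = a * area**b by linear least squares on log(cv) = log(a) + b * log(area)."""
    b, log_a = np.polyfit(np.log(area), np.log(cv), deg=1)  # slope first, then intercept
    return float(np.exp(log_a)), float(b)


def predict_cv(area, a: float, b: float):
    """Predicted variability for a given peak area from the fitted power law."""
    return a * np.asarray(area, dtype=float) ** b


# Synthetic data (assumption): CV falls with peak area, with multiplicative noise
rng = np.random.default_rng(1)
area = np.logspace(3, 7, 200)
cv = 2.0 * area ** -0.3 * np.exp(rng.normal(0.0, 0.05, area.size))
a, b = fit_power_law(area, cv)
print(f"a = {a:.2f}, b = {b:.3f}, predicted CV at area 1e4: {predict_cv(1e4, a, b):.3f}")
```

Fitting in log space treats the noise as multiplicative, which suits variability measures that span orders of magnitude; a direct nonlinear fit would weight the high-CV, low-area points differently.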
10.
Proteomics ; 17(15-16)2017 Aug.
Article in English | MEDLINE | ID: mdl-28664598

ABSTRACT

For data-independent acquisition by means of sequential window acquisition of all theoretical fragment ion spectra (SWATH), a reference library of data-dependent acquisition (DDA) runs is typically used to correlate the quantitative data from the fragment ion spectra with peptide identifications. The quality and coverage of such a reference library is therefore essential when processing SWATH data. In general, library sizes can be increased by reducing the impact of DDA precursor selection with replicate runs or fractionation. However, these strategies can affect the match between the library and SWATH measurement, and thus larger library sizes do not necessarily correspond to improved SWATH quantification. Here, three fractionation strategies to increase local library size were compared to standard library building using replicate DDA injection: protein SDS-PAGE fractionation, peptide high-pH RP-HPLC fractionation and MS-acquisition gas phase fractionation. The impact of these libraries on SWATH performance was evaluated in terms of the number of extracted peptides and proteins, the match quality of the peptides and the extraction reproducibility of the transitions. These analyses were conducted using the hydrophilic proteome of differentiating human embryonic stem cells. Our results show that SWATH quantitative results and interpretations are affected by the choice of fractionation technique. Data are available via ProteomeXchange with identifier PXD006190.


Subject(s)
Chemical Fractionation/methods , Embryonic Stem Cells/metabolism , Peptide Library , Proteomics/methods , Software , Chromatography, High Pressure Liquid , Electrophoresis, Polyacrylamide Gel , Embryonic Stem Cells/cytology , Humans , Mass Spectrometry , Proteome/analysis , Reproducibility of Results
11.
J Proteome Res ; 16(2): 655-664, 2017 02 03.
Article in English | MEDLINE | ID: mdl-28152592

ABSTRACT

Epigenetic changes can be studied with an untargeted characterization of histone post-translational modifications (PTMs) by liquid chromatography-mass spectrometry (LC-MS). While prior information about more than 20 types of histone PTMs exists, little is known about histone PTM combinations (PTMCs). Because of the combinatorial explosion, it is intrinsically impossible to consider all potential PTMCs in a database search. Consequently, high-scoring false positives can occur when the correct alternative isobaric PTMCs were not considered. Current quality controls can neither estimate the number of unconsidered alternatives nor flag potential false positives. Here, we propose a conceptual workflow that provides such options. In this workflow, all candidate isoforms with known-to-exist PTMs are modeled in silico. The most frequently occurring PTM sets of these candidate isoforms are determined and used in several database searches. This suppresses the combinatorial explosion while considering as many candidate isoforms as possible. Finally, annotations can be classified as unique or ambiguous, the latter implying false positives. This workflow was evaluated on an LC-MS data set containing 44 histone extracts. We were able to consider 60% of all candidate isoforms. Importantly, 40% of all annotations were classified as ambiguous. This highlights the need for a more thorough evaluation of modified peptide annotations.


Subject(s)
Histones/genetics , Protein Isoforms/genetics , Protein Processing, Post-Translational/genetics , Proteomics , Amino Acid Sequence/genetics , Chromatography, Liquid , Computer Simulation , Epigenesis, Genetic/genetics , Histones/metabolism , Humans , Jurkat Cells , Molecular Sequence Annotation , Protein Isoforms/metabolism , Tandem Mass Spectrometry
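The central steps of the PTMC workflow above, modeling all candidate isoforms from known-to-exist PTMs in silico and then finding the most frequent PTM sets among them, can be sketched with the Python standard library. Positional isomers that carry the same set of PTMs have the same elemental composition and are therefore isobaric, which is why grouping isoforms by PTM set is informative. The three sites, the state labels ('un', 'ac', 'me1'), and the function names are hypothetical; real histone peptides have more sites and more modification types.

```python
from collections import Counter
from itertools import product

# Hypothetical example: three lysine sites, each unmodified ('un'), acetylated ('ac')
# or monomethylated ('me1')
SITE_STATES = [["un", "ac", "me1"] for _ in range(3)]


def candidate_isoforms(site_states: list[list[str]]) -> list[tuple[str, ...]]:
    """All combinations of per-site states (the number grows multiplicatively with sites)."""
    return list(product(*site_states))


def ptm_set_counts(isoforms: list[tuple[str, ...]]) -> Counter:
    """Count isoforms per PTM set, ignoring positions (isoforms in one set are isobaric)."""
    return Counter(tuple(sorted(s for s in iso if s != "un")) for iso in isoforms)


isoforms = candidate_isoforms(SITE_STATES)
counts = ptm_set_counts(isoforms)
print(len(isoforms))  # 27
print(counts.most_common(3))
```

Searching with the most frequent PTM sets rather than with every isoform keeps the number of database searches manageable while still covering many candidate isoforms.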
12.
Proteomics ; 16(14): 1970-4, 2016 07.
Article in English | MEDLINE | ID: mdl-27139031

ABSTRACT

Histone proteins are essential elements for DNA packaging. Moreover, the PTMs that are extremely abundant on these proteins contribute to modeling chromatin structure and recruiting enzymes involved in gene regulation, DNA repair and chromosome condensation. This fundamental aspect, together with the epigenetic inheritance of histone PTMs, underlines the importance of having biochemical techniques for their characterization. Over the past two decades, significant improvements in mass accuracy and resolution of mass spectrometers have made LC-coupled MS the strategy of choice for accurate identification and quantification of protein PTMs. Nevertheless, in previous work we disclosed the limitations and biases of the most widely adopted sample preparation protocols for histone propionylation, required prior to bottom-up MS analysis. In this work, we put forward a new specific and efficient propionylation strategy by means of propionic anhydride. In this method, nonspecific overpropionylation at serine (S), threonine (T) and tyrosine (Y) is reversed by adding hydroxylamine (HA). We recommend using this method for future analysis of histones through bottom-up MS.


Subject(s)
Anhydrides/chemistry , Histones/analysis , Peptide Fragments/analysis , Propionates/chemistry , Protein Processing, Post-Translational , Proteomics/methods , Amino Acid Sequence , Ammonium Hydroxide/chemistry , Anhydrides/metabolism , Arginine/chemistry , Arginine/metabolism , Artifacts , Histone Code , Histones/chemistry , Histones/metabolism , Humans , Hydrogen-Ion Concentration , Hydroxylamine/chemistry , Lysine/chemistry , Lysine/metabolism , Mass Spectrometry/standards , Peptide Mapping , Propionates/metabolism , Solvents/chemistry , Trypsin/chemistry
13.
Proteomics ; 16(23): 2937-2944, 2016 12.
Article in English | MEDLINE | ID: mdl-27718312

ABSTRACT

Extracting histones from cells is the first step in studies that aim to characterize histones and their post-translational modifications (hPTMs) with MS. In the last decade, label-free quantification has increasingly been used for MS-based histone characterization. However, many histone extraction protocols were not specifically designed for label-free MS. While label-free quantification has its advantages, it is also very susceptible to technical variation. Here, we adjust an established histone extraction protocol according to general label-free MS guidelines with a specific focus on minimizing sample handling. These protocols are first evaluated using SDS-PAGE. Subsequently, a selection of extraction protocols was used in a complete histone workflow for label-free MS. All protocols display nearly identical relative quantification of hPTMs. We thus show that, depending on the cell type under investigation and at the cost of some additional contaminating proteins, sample handling during histone isolation can be minimized. This allows analyzing bigger sample batches, leads to reduced technical variation and minimizes the chance of in vitro alterations to the hPTM snapshot. Overall, these results allow researchers to determine the best protocol depending on the resources and goal of their specific study. Data are available via ProteomeXchange with identifier PXD002885.


Subject(s)
Histones/isolation & purification , Mass Spectrometry/methods , Proteomics/methods , Chemical Fractionation/methods , Electrophoresis, Polyacrylamide Gel , Embryonic Stem Cells , Histones/analysis , Histones/metabolism , Humans , Protein Processing, Post-Translational , Reproducibility of Results , Workflow
14.
Nat Commun ; 15(1): 2168, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38461149

ABSTRACT

In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, in recent years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Using Numba for just-in-time compilation on CPU and GPU, AlphaPept achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed, as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and an open GitHub repository for developers.


Subject(s)
Proteomics , Software , Proteomics/methods , Mass Spectrometry/methods , Proteome
15.
BJGP Open ; 7(1)2023 Mar.
Article in English | MEDLINE | ID: mdl-36343966

ABSTRACT

BACKGROUND: Recent studies suggest that ethnic minority students underperform in standardised assessments commonly used to evaluate their progress. This disparity seems to also hold for postgraduate medical students and GP trainees, and may affect the quality of primary health care, which requires an optimally diverse workforce. AIMS: To address the following: 1) to determine to what extent ethnic minority GP trainees are more at risk of being assessed as underperforming than their majority peers; 2) to investigate whether established underperformance appears in specific competence areas; and 3) to explore first- and second-generation ethnic minority trainees' deviations. DESIGN & SETTING: Quantitative retrospective cohort design in Dutch GP specialty training (start years: 2015-2017). METHOD: In 2020-2021, the authors evaluated files on assessed underperformance of 1700 GP trainees at seven Dutch GP specialty training institutes after excluding five opt-outs and 165 incomplete datasets (17.4% ethnic minority trainees). Underperformance was defined as the occurrence of the following, which was prompted by the training institute: 1) preliminary dropout; 2) extension of the educational pathway; and/or 3) mandatory coaching pathways. Statistics Netherlands (CBS) anonymised the files and added data about ethnic group. Thereafter, the authors performed logistic regression for potential underperformance analysis and χ2 tests for competence area analysis. RESULTS: Ethnic minority GP trainees were more likely to face underperformance assessments than the majority group (odds ratio [OR] 2.41, 95% confidence interval [CI] = 1.67 to 3.49). Underperformance was not significantly nested in particular competence areas. First-generation ethnic minority trainees seemed more at risk than their second-generation peers. CONCLUSION: Ethnic minority GP trainees seem more at risk of facing educational barriers than the majority group. Additional qualitative research on underlying factors is essential.

16.
Sci Rep ; 12(1): 1256, 2022 01 24.
Article in English | MEDLINE | ID: mdl-35075221

ABSTRACT

Toxicoepigenetics is an emerging field that studies the toxicological impact of compounds on protein expression through heritable, non-genetic mechanisms, such as histone post-translational modifications (hPTMs). Due to substantial progress in the large-scale study of hPTMs, integration into the field of toxicology is promising and offers the opportunity to gain novel insights into toxicological phenomena. Moreover, there is a growing demand for high-throughput human-based in vitro assays for toxicity testing, especially for developmental toxicity. Consequently, we developed a mass spectrometry-based proof-of-concept to assess a histone code screening assay capable of simultaneously detecting multiple hPTM changes in human embryonic stem cells. We first validated the untargeted workflow with valproic acid (VPA), a histone deacetylase inhibitor. These results demonstrate the capability of mapping the hPTM dynamics, with a general increase in acetylations as an internal control. To illustrate the scalability, a dose-response study was performed on a proof-of-concept library of ten compounds: (1) compounds with a known effect on the hPTMs (BIX-01294, 3-deazaneplanocin A, trichostatin A, and VPA), (2) compounds classified as highly embryotoxic by the European Centre for the Validation of Alternative Methods (ECVAM) (methotrexate and all-trans retinoic acid), (3) a compound classified as non-embryotoxic by ECVAM (penicillin G), and (4) compounds of abuse with a presumed developmental toxicity (ethanol, caffeine, and nicotine).


Subject(s)
Histone Code , Mass Spectrometry , Protein Processing, Post-Translational , Teratogens/analysis , Toxicity Tests/methods , Humans , Proof of Concept Study
17.
Nat Commun ; 13(1): 7238, 2022 11 24.
Article in English | MEDLINE | ID: mdl-36433986

ABSTRACT

Machine learning and in particular deep learning (DL) are increasingly important in mass spectrometry (MS)-based proteomics. Recent DL models can predict the retention time, ion mobility and fragment intensities of a peptide just from the amino acid sequence with good accuracy. However, DL is a very rapidly developing field with new neural network architectures frequently appearing, which are challenging for proteomics researchers to incorporate. Here we introduce AlphaPeptDeep, a modular Python framework built on the PyTorch DL library that learns and predicts the properties of peptides ( https://github.com/MannLabs/alphapeptdeep ). It features a model shop that enables non-specialists to create models in just a few lines of code. AlphaPeptDeep represents post-translational modifications in a generic manner, even if only the chemical composition is known. Extensive use of transfer learning obviates the need for large data sets to refine models for particular experimental conditions. The AlphaPeptDeep models for predicting retention time, collisional cross sections and fragment intensities are at least on par with existing tools. Additional sequence-based properties can also be predicted by AlphaPeptDeep, as demonstrated with an HLA peptide prediction model to improve HLA peptide identification for data-independent acquisition ( https://github.com/MannLabs/PeptDeep-HLA ).


Subject(s)
Deep Learning , Proteomics , Proteomics/methods , Peptides/chemistry , Amino Acid Sequence , Neural Networks, Computer
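One way to represent a modification generically from its chemical composition alone, as the AlphaPeptDeep abstract describes, is as a vector of element counts, from which the monoisotopic mass shift also follows. The sketch below illustrates this idea; the element list, the dictionary format, and the function names are assumptions and do not reproduce AlphaPeptDeep's internal encoding.

```python
import numpy as np

# Element order for the count vector (assumption) and monoisotopic masses in Da
ELEMENTS = ["C", "H", "N", "O", "S", "P"]
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
             "O": 15.99491461956, "S": 31.97207100, "P": 30.97376163}


def composition_vector(composition: dict[str, int]) -> np.ndarray:
    """Element-count vector in ELEMENTS order; negative counts would encode atoms lost."""
    return np.array([composition.get(el, 0) for el in ELEMENTS], dtype=float)


def mass_shift(composition: dict[str, int]) -> float:
    """Monoisotopic mass shift of a modification from its elemental composition."""
    return sum(n * MONO_MASS[el] for el, n in composition.items())


acetyl = {"C": 2, "H": 2, "O": 1}   # acetylation adds C2H2O
phospho = {"H": 1, "O": 3, "P": 1}  # phosphorylation adds HPO3
print(composition_vector(acetyl))   # counts for C, H, N, O, S, P
print(round(mass_shift(acetyl), 4))   # 42.0106
print(round(mass_shift(phospho), 4))  # 79.9663
```

Because any modification with a known formula maps to the same fixed-length vector, a model consuming such features does not need a dedicated embedding for each named modification.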
18.
Sci Data ; 9(1): 126, 2022 03 30.
Article in English | MEDLINE | ID: mdl-35354825

ABSTRACT

In the last decade, a revolution in liquid chromatography-mass spectrometry (LC-MS) based proteomics unfolded with the introduction of dozens of novel instruments that incorporate additional data dimensions through innovative acquisition methodologies, in turn inspiring specialized data analysis pipelines. Simultaneously, a growing number of proteomics datasets have been made publicly available through data repositories such as ProteomeXchange, Zenodo and Skyline Panorama. However, developing algorithms to mine this data and assessing their performance on different platforms is currently hampered by the lack of a single benchmark experimental design. Therefore, we acquired a hybrid proteome mixture on different instrument platforms and in all currently available families of data acquisition. Here, we present a comprehensive Data-Dependent and Data-Independent Acquisition (DDA/DIA) dataset acquired using several of the most commonly used current-day instrumental platforms. The dataset consists of over 700 LC-MS runs, including adequate replicates allowing robust statistics and covering nearly 10 different data formats, including scanning quadrupole and ion mobility enabled acquisitions. Datasets are available via ProteomeXchange (PXD028735).


Subject(s)
Benchmarking , Proteomics , Animals , Chromatography, Liquid/methods , Humans , Mass Spectrometry/methods , Proteome
19.
Proteomes ; 9(2)2021 Apr 21.
Article in English | MEDLINE | ID: mdl-33919160

ABSTRACT

Histone-based chromatin organization enabled eukaryotic genome complexity. This epigenetic control mechanism allowed for stable differential gene expression and thus the very existence of multicellular organisms. This existential role in biology makes histones one of the most complexly modified molecules in the biotic world, which makes these key regulators notoriously hard to analyze. We here provide a roadmap to enable fast and informed selection of a bottom-up mass spectrometry sample preparation protocol that matches a specific research question. We therefore propose a two-step assessment procedure: (i) visualization of the coverage that is attained for a given workflow and (ii) direct alignment between runs to assess potential pitfalls at the ion level. To illustrate the applicability, we compare four different sample preparation protocols while adding a new enzyme to the toolbox, i.e., RgpB (GingisREX®, Genovis, Lund, Sweden), an endoproteinase that selectively and efficiently cleaves at the C-terminal end of arginine residues. Raw data are available via ProteomeXchange with identifier PXD024423.

20.
Mol Omics ; 17(6): 929-938, 2021 12 06.
Article in English | MEDLINE | ID: mdl-34522942

ABSTRACT

Histone-based chromatin organization paved the way for eukaryotic genome complexity. Because of their key role in information management, the histone posttranslational modifications (hPTMs), which mediate their function, have evolved into an alphabet that has more letters than there are amino acids, together making up the "histone code". The resulting combinatorial complexity is many times higher than what is usually encountered in proteomics. Consequently, a considerably bigger part of the acquired MS/MS spectra remains unannotated to date. Adapted search parameters can dig deeper into the dark histone ion space, but the lack of false discovery rate (FDR) control and the high level of ambiguity when searching combinatorial PTMs make it very hard to assess whether the newly assigned ions are informative. Therefore, when studying, for example, acetylation on histones, we propose an easily adoptable time-lapse enzymatic deacetylation (HDAC1) of a commercial histone extract as a quantify-first strategy that allows isolating ion populations of interest that currently remain in the dark. By adapting search parameters to study potential issues in sample preparation, data acquisition and data analysis, we stepwise managed to double the portion of annotated precursors of interest from 10.5% to 21.6%. This strategy is intended to make up for the lack of validated FDR control and has led to several adaptations of our current workflow that will reduce the portion of the dark histone ion space in the future. Finally, this strategy can be applied with any enzyme targeting a modification of interest.


Subject(s)
Histones , Research Design , Histone Code , Histones/metabolism , Protein Processing, Post-Translational , Proteomics