Results 1 - 20 of 106
1.
J Proteome Res; 23(1): 418-429, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38038272

ABSTRACT

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.


Subject(s)
Benchmarking, Proteomics, Workflow, Software, Proteins, Data Analysis
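
WOMBAT-P takes SDRF-Proteomics annotations as input. As a rough, hypothetical illustration of what such an annotation looks like, the sketch below writes and parses a minimal SDRF-like tab-separated table with pandas; the column names follow the characteristics[...]/comment[...] convention as I recall it and the values are invented, so check the SDRF-Proteomics specification before relying on them.

```python
# Minimal sketch of an SDRF-Proteomics-style annotation table (hypothetical values).
# Column names are an assumption; consult the SDRF-Proteomics specification.
import io
import pandas as pd

sdrf_text = "\t".join([
    "source name", "characteristics[organism]", "characteristics[disease]",
    "assay name", "comment[data file]", "comment[label]",
]) + "\n" + "\t".join([
    "sample 1", "Homo sapiens", "normal",
    "run 1", "sample1.raw", "label free sample",
]) + "\n"

sdrf = pd.read_csv(io.StringIO(sdrf_text), sep="\t")
print(sdrf.columns.tolist())
print(sdrf.iloc[0]["comment[data file]"])  # -> sample1.raw
```
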
2.
J Proteome Res; 22(2): 514-519, 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36173614

ABSTRACT

It has long been known that biological species can be identified from mass spectrometry data alone. Ten years ago, we described a method and software tool, compareMS2, for calculating a distance between sets of tandem mass spectra, as routinely collected in proteomics. This method has seen use in species identification and mixture characterization in food and feed products, as well as other applications. Here, we present the first major update of this software, including a new metric, a graphical user interface and additional functionality. The data have been deposited to ProteomeXchange with dataset identifier PXD034932.


Subject(s)
Software, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Proteomics/methods, Algorithms
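
compareMS2 computes a distance between whole sets of tandem mass spectra. The sketch below is not the compareMS2 metric itself, only a toy illustration of the underlying idea: spectra are binned, compared pairwise by cosine similarity, and the fraction of spectra in one set without a good match in the other is reported as a distance. All data are invented.

```python
# Toy set-to-set spectral distance (illustrative only; not the compareMS2 algorithm).
import numpy as np

def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Turn a list of (m/z, intensity) pairs into a unit-length binned vector."""
    vec = np.zeros(int(max_mz / bin_width))
    for mz, inten in peaks:
        idx = int(mz / bin_width)
        if idx < vec.size:
            vec[idx] += inten
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def set_distance(set_a, set_b, match_threshold=0.8):
    """Fraction of spectra in set_a without a cosine match >= threshold in set_b."""
    binned_b = [bin_spectrum(s) for s in set_b]
    unmatched = 0
    for spec in set_a:
        va = bin_spectrum(spec)
        best = max(float(va @ vb) for vb in binned_b)
        if best < match_threshold:
            unmatched += 1
    return unmatched / len(set_a)

# Hypothetical example: two tiny "spectra" per set.
a = [[(500.3, 10.0), (625.4, 5.0)], [(300.2, 3.0), (900.7, 8.0)]]
b = [[(500.3, 9.0), (625.4, 6.0)], [(410.1, 2.0), (777.7, 4.0)]]
print(set_distance(a, b))
```
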
3.
J Proteome Res; 22(2): 632-636, 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36693629

ABSTRACT

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, each lab predicting peptide behavior in LC-MS setups often uses its own unique and complex data processing pipeline to maximize performance, at the cost of accessibility and reproducibility. For this reason, we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and it contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.


Subject(s)
Algorithms, Proteomics, Proteomics/methods, Reproducibility of Results, Peptides/analysis, Mass Spectrometry/methods, Software
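
ProteomicsML targets predictable physicochemical peptide properties such as LC retention time. As a minimal, hypothetical example of the kind of baseline a tutorial might start from (not a model from the resource itself), the sketch below fits retention time from amino acid composition by least squares in NumPy; the retention times are made up.

```python
# Baseline retention-time model from amino acid composition (toy data, toy model).
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """20-dimensional amino acid count vector."""
    return np.array([peptide.count(aa) for aa in AMINO_ACIDS], dtype=float)

# Hypothetical training peptides with made-up retention times (minutes).
peptides = ["LGEYGFQNALIVR", "VPQVSTPTLVEVSR", "AEFVEVTK", "YLYEIAR"]
rts = np.array([62.1, 55.4, 31.0, 40.2])

X = np.vstack([composition(p) for p in peptides])
coef, *_ = np.linalg.lstsq(X, rts, rcond=None)  # least-squares composition weights

print(float(composition("AEFVEVTK") @ coef))  # prediction for a training peptide
```
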
4.
J Proteome Res; 22(3): 681-696, 2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and of inspiring future research.


Subject(s)
Machine Learning, Proteomics, Proteomics/methods, Algorithms, Mass Spectrometry
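
One workshop theme was generating bespoke synthetic data. As a small, hedged illustration (not a method from the paper), the sketch below computes singly charged b- and y-ion m/z values for a peptide from standard monoisotopic residue masses; a synthetic spectrum generator would start from such ion lists and then add intensities, noise, and isotopes.

```python
# Toy b-/y-ion calculator as a starting point for synthetic spectra (illustrative only).
PROTON = 1.007276
WATER = 18.010565
RESIDUE = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def by_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y

b_ions, y_ions = by_ions("PEPTIDEK")
print([round(mz, 3) for mz in y_ions])
```
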
5.
J Proteome Res; 21(4): 1204-1207, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35119864

ABSTRACT

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. Trained models must be described in sufficient detail for others to apply them or to evaluate their performance. Here we look at and interpret the recently published, general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.


Subject(s)
Metabolomics, Proteomics, Machine Learning, Metabolomics/methods, Proteomics/methods, Tandem Mass Spectrometry
6.
Anal Chem; 94(44): 15464-15471, 2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36281827

ABSTRACT

A major obstacle to reusing and integrating existing data is finding the data most relevant in a given context. The primary metadata resource is the scientific literature describing the experiments that produced the data. To stimulate the development of natural language processing methods for extracting this information from articles, we have manually annotated 100 recent open access publications in Analytical Chemistry as semantic graphs. We focused on articles mentioning mass spectrometry in their experimental sections, as we are particularly interested in this topic, which also falls within the domain of several ontologies and controlled vocabularies. The resulting gold standard dataset is publicly available and directly applicable to validating automated methods for retrieving this metadata from the literature. In the process, we also made a number of observations on the structure and description of experiments and on open access publishing in this journal.


Subject(s)
Natural Language Processing, Semantics, Research Design, Analytical Chemistry
7.
Brief Bioinform; 21(5): 1697-1705, 2020 Sep 25.
Article in English | MEDLINE | ID: mdl-31624831

ABSTRACT

The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, from individual developers and projects through to major service providers and research infrastructures, can describe their own bioinformatics resources and share them via bio.tools.


Subject(s)
Community Participation, Computational Biology/methods, Software, Computational Biology/standards, Database Management Systems, Europe, Humans
8.
Bioinformatics; 37(17): 2768-2769, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33538780

ABSTRACT

SUMMARY: In mass spectrometry-based proteomics, accurate peptide masses improve identifications, alignment and quantitation. Getting the most out of any instrument therefore requires proper calibration. Here, we present a new stand-alone software, mzRecal, for universal automatic recalibration of data from all common mass analyzers using standard open formats and based on physical principles. AVAILABILITY AND IMPLEMENTATION: mzRecal is implemented in Go and freely available on https://github.com/524D/mzRecal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
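
mzRecal recalibrates m/z values using analyzer-specific physical models. The sketch below is a much simpler, generic illustration of the recalibration idea only: fit a linear correction from confidently identified calibrants (measured vs. theoretical m/z) and apply it to all peaks. It does not reproduce mzRecal's calibration functions, and all numbers are invented.

```python
# Generic linear m/z recalibration sketch (not mzRecal's physical models).
import numpy as np

# Hypothetical calibrants: (measured m/z, theoretical m/z) from confident IDs.
calibrants = np.array([
    (400.2512, 400.2486),
    (785.8462, 785.8421),
    (1021.5240, 1021.5187),
])

measured, theoretical = calibrants[:, 0], calibrants[:, 1]
# Fit theoretical = a * measured + b by least squares.
a, b = np.polyfit(measured, theoretical, deg=1)

def recalibrate(mz):
    return a * mz + b

peaks = np.array([450.3011, 900.4502, 1500.7321])
print(recalibrate(peaks))
```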

9.
J Proteome Res; 20(6): 3395-3399, 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-33904308

ABSTRACT

While mass spectrometry still dominates proteomics research, alternative and potentially disruptive next-generation technologies are receiving increasing investment and attention. Most of these technologies aim at sequencing single peptide or protein molecules, typically labeling or otherwise distinguishing a subset of the proteinogenic amino acids. This note considers some theoretical aspects of these future technologies from a bottom-up proteomics viewpoint, including the ability to uniquely identify human proteins as a function of which and how many amino acids can be read, enzymatic efficiency, and the maximum read length. This is done through simulations under ideal and non-ideal conditions to set benchmarks for what may be achievable with future single-molecule sequencing technology. The simulations reveal, among other observations, that the best choice of N readable amino acids performs similarly to the average choice of N+1 amino acids, and that the discrimination power of the amino acids scales with their frequency in the proteome. The simulations are agnostic with respect to the next-generation proteomics platform, and the results and conclusions should therefore be applicable to any single-molecule partial peptide sequencing technology.


Subject(s)
Proteome, Proteomics, Amino Acid Sequence, Humans, Mass Spectrometry, Peptides
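
The simulations ask how many proteins remain identifiable when only a subset of amino acids can be read. The sketch below is a stripped-down, hypothetical version of that idea under ideal conditions: tryptic peptides are reduced to a partial read over a chosen amino acid subset and checked for uniqueness against a toy protein set. Sequences and the readable alphabet are invented for illustration.

```python
# Toy version of partial-read uniqueness under ideal conditions (illustrative only).
import re

def tryptic_peptides(protein, min_len=6):
    """Naive tryptic digest: cleave after K/R, not before P, no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if len(p) >= min_len]

def partial_read(peptide, readable):
    """Replace unreadable residues with 'x', keeping positions (the 'read')."""
    return "".join(aa if aa in readable else "x" for aa in peptide)

# Hypothetical mini-proteome: protein accession -> sequence.
proteome = {
    "P1": "MKTAYIAKQRQISFVK",
    "P2": "MNQLEPKWYCDRKTAYIAK",
}
readable = set("KDE")  # suppose only K, D and E can be labeled and read

reads = {}
for acc, seq in proteome.items():
    for pep in tryptic_peptides(seq):
        reads.setdefault(partial_read(pep, readable), set()).add(acc)

unique = {r: accs for r, accs in reads.items() if len(accs) == 1}
print(f"{len(unique)} of {len(reads)} partial reads map to a single protein")
```
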
10.
J Proteome Res; 20(10): 4640-4645, 2021 Oct 01.
Article in English | MEDLINE | ID: mdl-34523928

ABSTRACT

Science is full of overlooked and undervalued research waiting to be rediscovered. Proteomics is no exception. In this perspective, we follow the ripples from a 1960 study by Zuckerkandl, Jones, and Pauling comparing tryptic peptides across animal species. This pioneering work directly led to the molecular clock hypothesis and the ensuing explosion in molecular phylogenetics. In the decades that followed, proteins continued to provide essential clues to evolutionary history. While technology has continued to improve, contemporary proteomics has strayed from this larger biological context, rarely comparing species or asking how protein structure, function, and interactions have evolved. Here we recombine proteomics with molecular phylogenetics, highlighting the value of framing proteomic results in a larger biological context and showing how almost forgotten research, though technologically surpassed, can still generate new ideas and illuminate our work from a different perspective. Though it is infeasible to read all research published on a large topic, looking up older papers can be surprisingly rewarding when rediscovering a "gem" at the end of a long citation chain, aided by digital collections and perpetually helpful librarians. Proper literature study reduces unnecessary repetition and allows research to be more insightful and impactful by truly standing on the shoulders of giants. All data were uploaded to MassIVE (https://massive.ucsd.edu/) as dataset MSV000087993.


Subject(s)
Peptides, Proteomics, Animals, Phylogeny
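
The 1960 comparison of tryptic peptides across species can be rephrased very compactly with modern sequence data. As a hedged, toy illustration (not the analysis in this paper), the sketch below digests two hypothetical homologous protein sequences in silico and reports a Jaccard distance between their peptide sets, the kind of signal a molecular clock argument builds on.

```python
# Toy cross-species comparison of tryptic peptide sets (illustrative only).
import re

def digest(seq):
    """Naive tryptic digest: cleave after K/R, not before P."""
    return {p for p in re.split(r"(?<=[KR])(?!P)", seq) if len(p) >= 5}

# Hypothetical fragments of a homologous protein in two species.
species_a = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTK"
species_b = "MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTK"

pep_a, pep_b = digest(species_a), digest(species_b)
shared = pep_a & pep_b
jaccard_distance = 1 - len(shared) / len(pep_a | pep_b)
print(shared, round(jaccard_distance, 3))
```
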
11.
J Proteome Res; 20(4): 2157-2165, 2021 Apr 02.
Article in English | MEDLINE | ID: mdl-33720735

ABSTRACT

The bio.tools registry is a major catalogue of computational tools in the life sciences. More than 17 000 tools have been registered by the international bioinformatics community. The bio.tools metadata schema includes semantic annotations of tool functions, that is, formal descriptions of tools' data types, formats, and operations with terms from the EDAM bioinformatics ontology. Such annotations enable the automated composition of tools into multistep pipelines or workflows. In this Technical Note, we revisit a previous case study on the automated composition of proteomics workflows. We use the same four workflow scenarios, but instead of using a small set of tools with carefully handcrafted annotations, we explore workflows directly on bio.tools. We use the Automated Pipeline Explorer (APE), a reimplementation and extension of the workflow composition method previously used. Moving "into the wild" opens up an unprecedented wealth of tools and a huge number of alternative workflows. Automated composition tools can be used to explore this space of possibilities systematically. Inevitably, the mixed quality of semantic annotations in bio.tools leads to unintended or erroneous tool combinations. However, our results also show that additional control mechanisms (tool filters, configuration options, and workflow constraints) can effectively guide the exploration toward smaller sets of more meaningful workflows.


Subject(s)
Proteomics, Software, Computational Biology, Registries, Workflow
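
Automated workflow composition rests on typed annotations: a tool is applicable when its declared input matches what an upstream tool produces. The sketch below is a deliberately simplified, hypothetical illustration of that principle, using a handful of invented tool annotations and a breadth-first search for chains; APE's synthesis over real EDAM annotations and constraints is considerably more powerful.

```python
# Toy automated workflow composition over typed tool annotations (hypothetical tools).
from collections import deque

# Invented annotations: tool name -> (input data type, output data type).
TOOLS = {
    "PeakPickerToy": ("raw spectra", "peak list"),
    "SearchEngineToy": ("peak list", "PSM list"),
    "InferenceToy": ("PSM list", "protein list"),
    "QuantToy": ("PSM list", "quant table"),
}

def compose(start_type, goal_type, max_len=4):
    """Breadth-first search for tool chains turning start_type into goal_type."""
    queue = deque([(start_type, [])])
    results = []
    while queue:
        current, chain = queue.popleft()
        if current == goal_type and chain:
            results.append(chain)
            continue
        if len(chain) >= max_len:
            continue
        for name, (inp, out) in TOOLS.items():
            if inp == current and name not in chain:
                queue.append((out, chain + [name]))
    return results

print(compose("raw spectra", "protein list"))
# -> [['PeakPickerToy', 'SearchEngineToy', 'InferenceToy']]
```
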
12.
J Proteome Res; 19(12): 4754-4765, 2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33166149

ABSTRACT

Mass spectrometry has greatly improved the analysis of phosphorylation events in complex biological systems and on a large scale. Despite considerable progress, the correct identification of phosphorylated sites, their quantification, and their interpretation regarding physiological relevance remain challenging. The MS Resource Pillar of the Human Proteome Organization (HUPO) Human Proteome Project (HPP) initiated the Phosphopeptide Challenge as a resource to help the community evaluate methods, learn procedures and data analysis routines, and establish their own workflows by comparing results obtained from a standard set of 94 phosphopeptides (serine, threonine, tyrosine) and their nonphosphorylated counterparts mixed at different ratios in a neat sample and a yeast background. Participants analyzed both samples with their method(s) of choice to report the identification and site localization of these peptides, determine their relative abundances, and enrich for the phosphorylated peptides in the yeast background. We discuss the results from 22 laboratories that used a range of different methods, instruments, and analysis software. We reanalyzed submitted data with a single software pipeline and highlight the successes and challenges in correct phosphosite localization. All of the data from this collaborative endeavor are shared as a resource to encourage the development of even better methods and tools for diverse phosphoproteomic applications. All submitted data and search results were uploaded to MassIVE (https://massive.ucsd.edu/) as data set MSV000085932 with ProteomeXchange identifier PXD020801.


Subject(s)
Phosphopeptides, Proteome, Humans, Mass Spectrometry, Phosphorylation, Proteomics
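
Because the phosphopeptides and their nonphosphorylated counterparts were mixed at known ratios, each submission can be scored by how far its measured ratios deviate from the expected ones. A minimal sketch of that comparison is below; the intensities and expected ratios are invented for illustration and do not come from the challenge data.

```python
# Compare measured phospho:nonphospho ratios against expected ones (invented numbers).
import math

# peptide -> (phospho intensity, nonphospho intensity, expected ratio)
pairs = {
    "pep1": (2.1e6, 1.0e6, 2.0),
    "pep2": (4.8e5, 5.1e5, 1.0),
    "pep3": (9.0e5, 9.5e6, 0.1),
}

for pep, (phospho, nonphospho, expected) in pairs.items():
    observed = phospho / nonphospho
    log2_error = math.log2(observed / expected)
    print(f"{pep}: observed {observed:.2f}, expected {expected}, "
          f"log2 error {log2_error:+.2f}")
```
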
13.
Brief Bioinform; 19(2): 210-218, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-28011752

ABSTRACT

In mass spectrometry-based proteomics, peptides are typically identified from tandem mass spectra using spectrum comparison. A sequence search engine compares experimentally obtained spectra with those predicted from protein sequences, applying enzyme cleavage and fragmentation rules. There are two main alternatives to this approach: spectral libraries and de novo sequencing. The former compares measured spectra with a collection of previously acquired and identified spectra in a library; the latter attempts to derive peptide sequences from the tandem mass spectra alone. Here we present a theoretical framework and a data processing workflow for visualizing and comparing the results of these different types of algorithms. The method considers the three search strategies as different dimensions, identifies distinct agreement classes, and visualizes the complementarity of the search strategies. We have included X! Tandem, SpectraST and PepNovo, as they are in common use and representative of algorithms of each type. Our method allows detailed investigation of how the three search methods perform relative to each other and shows the impact of the decoy sequences currently used for evaluating false discovery rates.


Subject(s)
Computer Graphics, Escherichia coli Proteins/metabolism, Escherichia coli/metabolism, Peptide Fragments/analysis, Proteomics/methods, Protein Sequence Analysis/methods, Software
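
Treating the three strategies as dimensions partitions the identified spectra into agreement classes, essentially the seven regions of a three-set Venn diagram. The sketch below computes those classes from hypothetical per-spectrum identifications; it is an illustration of the bookkeeping only, not the framework in the paper.

```python
# Agreement classes for spectra identified by three search strategies (toy data).
from itertools import combinations

ids = {  # hypothetical spectrum identifiers per strategy
    "sequence search": {"s1", "s2", "s3", "s5"},
    "spectral library": {"s2", "s3", "s4"},
    "de novo": {"s3", "s5", "s6"},
}

strategies = list(ids)
classes = {}
for r in range(1, 4):
    for combo in combinations(strategies, r):
        inside = set.intersection(*(ids[s] for s in combo))
        outside = set.union(*(ids[s] for s in strategies if s not in combo), set())
        classes[" & ".join(combo) + " only"] = inside - outside

for name, spectra in classes.items():
    print(f"{name}: {sorted(spectra)}")
```
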
14.
Bioinformatics; 35(4): 656-664, 2019 Feb 15.
Article in English | MEDLINE | ID: mdl-30060113

ABSTRACT

MOTIVATION: Numerous software utilities operating on mass spectrometry (MS) data are described in the literature and provide specific operations as building blocks for the assembly of purpose-built workflows. Working out which tools and combinations are applicable or optimal in practice is often hard. Researchers thus face difficulties in selecting practical and effective data analysis pipelines for a specific experimental design. RESULTS: We provide a toolkit to support researchers in identifying, comparing and benchmarking multiple workflows built from individual bioinformatics tools. Automated workflow composition is enabled by the tools' semantic annotation in terms of the EDAM ontology. To demonstrate the practical use of our framework, we created and evaluated a number of logically and semantically equivalent workflows for four use cases representing frequent tasks in MS-based proteomics. Indeed, we found that the results computed by the workflows could vary considerably, emphasizing the benefits of a framework that facilitates their systematic exploration. AVAILABILITY AND IMPLEMENTATION: The project files and workflows are available from https://github.com/bio-tools/biotoolsCompose/tree/master/Automatic-Workflow-Composition. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Mass Spectrometry, Proteomics, Workflow, Computational Biology, Software
15.
Beilstein J Org Chem; 16: 3038-3051, 2020.
Article in English | MEDLINE | ID: mdl-33363672

ABSTRACT

Glycoproteomic data are often very complex, reflecting the high structural diversity of the peptide and glycan portions. The use of glycopeptide-centered glycoproteomics by mass spectrometry is rapidly evolving in many research areas, leading to a demand for reliable data analysis tools. In recent years, several bioinformatic tools were developed to facilitate and improve both the identification and quantification of glycopeptides. Here, a selection of these tools was combined and evaluated with the aim of establishing a robust glycopeptide detection and quantification workflow targeting enriched glycoproteins. For this purpose, a tryptic digest from affinity-purified immunoglobulins G and A was analyzed on a nano-reversed-phase liquid chromatography-tandem mass spectrometry platform with a high-resolution mass analyzer and higher-energy collisional dissociation fragmentation. Initial glycopeptide identification based on MS/MS data was aided by the Byonic software. Additional MS1-based glycopeptide identification relying on accurate mass and retention time differences using GlycopeptideGraphMS considerably expanded the set of confidently annotated glycopeptides. For glycopeptide quantification, the performance of LaCyTools was compared to that of Skyline and GlycopeptideGraphMS. All quantification packages resulted in comparable glycosylation profiles but differed in terms of robustness and data quality control. Partial cysteine oxidation was identified as an unexpectedly abundant peptide modification and impaired the automated processing of several IgA glycopeptides. Finally, this study presents a semiautomated workflow for reliable glycoproteomic data analysis that combines software packages for MS/MS- and MS1-based glycopeptide identification and integrates analyte quality control and quantification.
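
GlycopeptideGraphMS extends identifications by linking MS1 features whose mass differences match monosaccharide residues. The sketch below is a bare-bones, hypothetical version of that linking step using standard monosaccharide residue masses and a ppm tolerance; it omits retention-time constraints and everything else the real tool does, and the feature masses are invented.

```python
# Link MS1 features differing by one monosaccharide residue mass (toy sketch).
MONOSACCHARIDES = {  # monoisotopic residue masses (Da)
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}

def within_ppm(value, target, ppm=10.0):
    return abs(value - target) <= target * ppm / 1e6

# Hypothetical neutral masses of glycopeptide features from one LC-MS run.
features = [3500.4321, 3662.4849, 3865.5643, 3646.4270]

links = []
for i, m1 in enumerate(features):
    for j, m2 in enumerate(features):
        if m2 <= m1:
            continue
        for sugar, mass in MONOSACCHARIDES.items():
            if within_ppm(m2 - m1, mass):
                links.append((i, j, sugar))

print(links)  # pairs of feature indices connected by a single monosaccharide
```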

16.
J Proteome Res; 18(10): 3580-3585, 2019 Oct 04.
Article in English | MEDLINE | ID: mdl-31429284

ABSTRACT

Proteomics is a highly dynamic field driven by the frequent introduction of new technological approaches, leading to high demand for new software tools and the concurrent development of many methods for data analysis, processing, and storage. The rapidly changing landscape of proteomics software makes finding a tool fit for a particular purpose a significant challenge. The comparison of software and the selection of tools capable of performing a certain operation on a given type of data rely on their detailed annotation using well-defined descriptors. However, finding accurate information, including tool input/output capabilities, can be challenging and often depends heavily on manual curation efforts. This is further hampered by the rather short half-life of most tools, which demands a continuously maintained resource with up-to-date information. We present here our approach to curating a collection of 189 software tools with detailed information about their functional capabilities. We furthermore describe our efforts to engage the proteomics community, which further increased the catalog to >750 tools, about 70% of the estimated 1097 tools existing for proteomics data analysis. Descriptions of all annotated tools are available at https://proteomics.bio.tools.


Subject(s)
Proteomics/methods, Software, Computational Biology, Data Curation, Internet
17.
Anal Chem; 91(7): 4312-4316, 2019 Apr 02.
Article in English | MEDLINE | ID: mdl-30835438

ABSTRACT

The open-access scientific literature contains a wealth of information for meaningful text mining. However, this information is not always easy to retrieve. This technical note addresses the problem with a new, flexible method that combines existing resources for literature searches, text mining, and large-scale prediction of physicochemical and biological properties in a single workflow. The results are visualized as virtual mass spectra, chromatograms, or images in styles new to text mining but familiar to analytical chemistry. The method is demonstrated on comparisons of analytical-chemistry techniques and on semantically enriched searches for proteins and their activities, but it may also be of general utility in experimental design, drug discovery, chemical syntheses, business intelligence, and historical studies. The method is realized in shareable scientific workflows using only freely available data, services, and software that scale to millions of publications and named chemical entities in the literature.

20.
J Proteome Res; 17(5): 1879-1886, 2018 May 04.
Article in English | MEDLINE | ID: mdl-29631402

ABSTRACT

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems in interpreting the data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins for a given set of peptides, and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins deliberately selected to produce tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content in which the proteins present share peptides with known absent proteins. Here, we present such a standard, based on human protein fragments expressed in E. coli. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate error estimates than methods that include information from shared peptides, while still giving reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins that share tryptic peptides can give a false sense of accuracy for many protein inference methods.


Subject(s)
Algorithms, Benchmarking/methods, Proteomics/methods, Amino Acid Sequence Homology, Benchmarking/standards, Escherichia coli/metabolism, Humans, Peptide Fragments/analysis, Peptides/analysis, Proteins/analysis, Proteins/metabolism, Trypsin/metabolism
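
The benchmark contrasts inference methods that use shared peptides with methods that do not. The sketch below shows the difference on an invented example, not on the standard described in the paper: with only unique peptides, a protein whose every detected peptide is shared receives no support, whereas a naive "any peptide counts" rule reports it, which is exactly the kind of call a standard containing known absent proteins can adjudicate.

```python
# Contrast unique-peptide-only inference with naive shared-peptide inference (toy data).
from collections import defaultdict

# Hypothetical detected peptides and the proteins they map to.
peptide_to_proteins = {
    "AVLDEEK": {"ProtA"},
    "GFLTDWK": {"ProtA", "ProtB"},   # shared peptide
    "SSYLDQK": {"ProtB", "ProtC"},   # shared peptide
}

unique_support = defaultdict(set)
any_support = defaultdict(set)
for pep, prots in peptide_to_proteins.items():
    for prot in prots:
        any_support[prot].add(pep)
    if len(prots) == 1:
        unique_support[next(iter(prots))].add(pep)

print("unique-only inference:", sorted(unique_support))   # ['ProtA']
print("shared-peptide inference:", sorted(any_support))   # ['ProtA', 'ProtB', 'ProtC']
```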