Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 80
Filtrar
1.
Nat Methods ; 2024 Jul 04.
Artigo em Inglês | MEDLINE | ID: mdl-38965444

RESUMO

The volume of public proteomics data is rapidly increasing, causing a computational challenge for large-scale reanalysis. Here, we introduce quantms ( https://quant,ms.org/ ), an open-source cloud-based pipeline for massively parallel proteomics data analysis. We used quantms to reanalyze 83 public ProteomeXchange datasets, comprising 29,354 instrument files from 13,132 human samples, to quantify 16,599 proteins based on 1.03 million unique peptides. quantms is based on standard file formats improving the reproducibility, submission and dissemination of the data to ProteomeXchange.

2.
Nat Methods ; 21(4): 673-679, 2024 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-38438615

RESUMO

Spatial landmarks are crucial in describing histological features between samples or sites, tracking regions of interest in microscopy, and registering tissue samples within a common coordinate framework. Although other studies have explored unsupervised landmark detection, existing methods are not well-suited for histological image data as they often require a large number of images to converge, are unable to handle nonlinear deformations between tissue sections and are ineffective for z-stack alignment, other modalities beyond image data or multimodal data. We address these challenges by introducing effortless landmark detection, a new unsupervised landmark detection and registration method using neural-network-guided thin-plate splines. Our proposed method is evaluated on a diverse range of datasets including histology and spatially resolved transcriptomics, demonstrating superior performance in both accuracy and stability compared to existing approaches.


Assuntos
Aprendizado Profundo , Processamento de Imagem Assistida por Computador/métodos
3.
Nature ; 628(8007): 450-457, 2024 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-38408488

RESUMO

Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs1,2. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.


Assuntos
Microscopia Crioeletrônica , Aprendizado de Máquina , Modelos Moleculares , Proteínas , Sequência de Aminoácidos , Microscopia Crioeletrônica/métodos , Microscopia Crioeletrônica/normas , Cadeias de Markov , Redes Neurais de Computação , Conformação Proteica , Proteínas/química , Proteínas/ultraestrutura , Gráficos por Computador
4.
Bioinformatics ; 40(1)2024 01 02.
Artigo em Inglês | MEDLINE | ID: mdl-38195928

RESUMO

MOTIVATION: In pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. There are many available methods to perform pathway analysis, many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships. RESULTS: Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/statisticalbiotechnology/mipath, or through Python Package Index as "mipathway."


Assuntos
Software , Fenótipo , Análise por Conglomerados
5.
J Biomol Tech ; 34(3)2023 Sep 30.
Artigo em Inglês | MEDLINE | ID: mdl-37969874

RESUMO

Metaproteomics research using mass spectrometry data has emerged as a powerful strategy to understand the mechanisms underlying microbiome dynamics and the interaction of microbiomes with their immediate environment. Recent advances in sample preparation, data acquisition, and bioinformatics workflows have greatly contributed to progress in this field. In 2020, the Association of Biomolecular Research Facilities Proteome Informatics Research Group launched a collaborative study to assess the bioinformatics options available for metaproteomics research. The study was conducted in 2 phases. In the first phase, participants were provided with mass spectrometry data files and were asked to identify the taxonomic composition and relative taxa abundances in the samples without supplying any protein sequence databases. The most challenging question asked of the participants was to postulate the nature of any biological phenomena that may have taken place in the samples, such as interactions among taxonomic species. In the second phase, participants were provided a protein sequence database composed of the species present in the sample and were asked to answer the same set of questions as for phase 1. In this report, we summarize the data processing methods and tools used by participants, including database searching and software tools used for taxonomic and functional analysis. This study provides insights into the status of metaproteomics bioinformatics in participating laboratories and core facilities.


Assuntos
Proteoma , Proteômica , Humanos , Proteômica/métodos , Software , Biologia Computacional , Bases de Dados de Proteínas
6.
Nat Biotechnol ; 2023 Sep 04.
Artigo em Inglês | MEDLINE | ID: mdl-37667091

RESUMO

We present a spatial omics approach that combines histology, mass spectrometry imaging and spatial transcriptomics to facilitate precise measurements of mRNA transcripts and low-molecular-weight metabolites across tissue regions. The workflow is compatible with commercially available Visium glass slides. We demonstrate the potential of our method using mouse and human brain samples in the context of dopamine and Parkinson's disease.

7.
J Proteome Res ; 22(10): 3190-3199, 2023 Oct 06.
Artigo em Inglês | MEDLINE | ID: mdl-37656829

RESUMO

Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.

8.
bioRxiv ; 2023 Oct 17.
Artigo em Inglês | MEDLINE | ID: mdl-37292681

RESUMO

Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention. We present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality as those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy as humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will thus remove bottlenecks and increase objectivity in cryo-EM structure determination.

9.
J Proteome Res ; 22(4): 1359-1366, 2023 04 07.
Artigo em Inglês | MEDLINE | ID: mdl-36988210

RESUMO

A frequent goal, or subgoal, when processing data from a quantitative shotgun proteomics experiment is a list of proteins that are differentially abundant under the examined experimental conditions. Unfortunately, obtaining such a list is a challenging process, as the mass spectrometer analyzes the proteolytic peptides of a protein rather than the proteins themselves. We have previously designed a Bayesian hierarchical probabilistic model, Triqler, for combining peptide identification and quantification errors into probabilities of proteins being differentially abundant. However, the model was developed for data from data-dependent acquisition. Here, we show that Triqler is also compatible with data-independent acquisition data after applying minor alterations for the missing value distribution. Furthermore, we find that it has better performance than a set of compared state-of-the-art protein summarization tools when evaluated on data-independent acquisition data.


Assuntos
Peptídeos , Proteínas , Teorema de Bayes , Proteínas/análise , Peptídeos/análise , Espectrometria de Massas/métodos , Proteômica/métodos
10.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Artigo em Inglês | MEDLINE | ID: mdl-36744821

RESUMO

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Assuntos
Aprendizado de Máquina , Proteômica , Proteômica/métodos , Algoritmos , Espectrometria de Massas
11.
Methods Mol Biol ; 2426: 91-117, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-36308686

RESUMO

Protein quantification for shotgun proteomics is a complicated process where errors can be introduced in each of the steps. Triqler is a Python package that estimates and integrates errors of the different parts of the label-free protein quantification pipeline into a single Bayesian model. Specifically, it weighs the quantitative values by the confidence we have in the correctness of the corresponding PSM. Furthermore, it treats missing values in a way that reflects their uncertainty relative to observed values. Finally, it combines these error estimates in a single differential abundance FDR that not only reflects the errors and uncertainties in quantification but also in identification. In this tutorial, we show how to (1) generate input data for Triqler from quantification packages such as MaxQuant and Quandenser, (2) run Triqler and what the different options are, (3) interpret the results, (4) investigate the posterior distributions of a protein of interest in detail, and (5) verify that the hyperparameter estimations are sensible.


Assuntos
Proteínas , Proteômica , Teorema de Bayes , Incerteza , Proteômica/métodos , Software
12.
J Proteome Res ; 21(6): 1566-1574, 2022 06 03.
Artigo em Inglês | MEDLINE | ID: mdl-35549218

RESUMO

Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.


Assuntos
Proteômica , Espectrometria de Massas em Tandem , Algoritmos , Análise por Conglomerados , Consenso , Bases de Dados de Proteínas , Proteômica/métodos , Software , Espectrometria de Massas em Tandem/métodos
13.
J Proteome Res ; 21(5): 1359-1364, 2022 05 06.
Artigo em Inglês | MEDLINE | ID: mdl-35413196

RESUMO

Machine learning has been an integral part of interpreting data from mass spectrometry (MS)-based proteomics for a long time. Relatively recently, a machine-learning structure appeared successful in other areas of bioinformatics, Transformers. Furthermore, the implementation of Transformers within bioinformatics has become relatively convenient due to transfer learning, i.e., adapting a network trained for other tasks to new functionality. Transfer learning makes these relatively large networks more accessible as it generally requires less data, and the training time improves substantially. We implemented a Transformer based on the pretrained model TAPE to predict MS2 intensities. TAPE is a general model trained to predict missing residues from protein sequences. Despite being trained for a different task, we could modify its behavior by adding a prediction head at the end of the TAPE model and fine-tune it using the spectrum intensity from the training set to the well-known predictor Prosit. We demonstrate that the predictor, which we call Prosit Transformer, outperforms the recurrent neural-network-based predictor Prosit, increasing the median angular similarity on its hold-out set from 0.908 to 0.929. We believe that Transformers will significantly increase prediction accuracy for other types of predictions within MS-based proteomics.


Assuntos
Aprendizado de Máquina , Redes Neurais de Computação , Sequência de Aminoácidos , Espectrometria de Massas , Proteômica
14.
PLoS Comput Biol ; 18(3): e1010020, 2022 03.
Artigo em Inglês | MEDLINE | ID: mdl-35344554

RESUMO

High throughput biology enables the measurements of relative concentrations of thousands of biomolecules from e.g. tissue samples. The process leaves the investigator with the problem of how to best interpret the potentially large number of differences between samples. Many activities in a cell depend on ordered reactions involving multiple biomolecules, often referred to as pathways. It hence makes sense to study differences between samples in terms of altered pathway activity, using so-called pathway analysis. Traditional pathway analysis gives significance to differences in the pathway components' concentrations between sample groups, however, less frequently used methods for estimating individual samples' pathway activities have been suggested. Here we demonstrate that such a method can be used for pathway-based survival analysis. Specifically, we investigate the pathway activities' association with patients' survival time based on the transcription profiles of the METABRIC dataset. Our implementation shows that pathway activities are better prognostic markers for survival time in METABRIC than the individual transcripts. We also demonstrate that we can regress out the effect of individual pathways on other pathways, which allows us to estimate the other pathways' residual pathway activity on survival. Furthermore, we illustrate how one can visualize the often interdependent measures over hierarchical pathway databases using sunburst plots.


Assuntos
Neoplasias da Mama , Neoplasias da Mama/metabolismo , Feminino , Humanos , Prognóstico , Análise de Sobrevida
15.
J Proteome Res ; 21(4): 891-898, 2022 04 01.
Artigo em Inglês | MEDLINE | ID: mdl-35220718

RESUMO

Bottom-up proteomics provides peptide measurements and has been invaluable for moving proteomics into large-scale analyses. Commonly, a single quantitative value is reported for each protein-coding gene by aggregating peptide quantities into protein groups following protein inference or parsimony. However, given the complexity of both RNA splicing and post-translational protein modification, it is overly simplistic to assume that all peptides that map to a singular protein-coding gene will demonstrate the same quantitative response. By assuming that all peptides from a protein-coding sequence are representative of the same protein, we may miss the discovery of important biological differences. To capture the contributions of existing proteoforms, we need to reconsider the practice of aggregating protein values to a single quantity per protein-coding gene.


Assuntos
Proteínas , Proteômica , Peptídeos/genética , Peptídeos/metabolismo , Processamento de Proteína Pós-Traducional , Proteínas/metabolismo , Proteoma/genética , Proteoma/metabolismo
16.
J Proteome Res ; 21(4): 1204-1207, 2022 04 01.
Artigo em Inglês | MEDLINE | ID: mdl-35119864

RESUMO

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.


Assuntos
Metabolômica , Proteômica , Aprendizado de Máquina , Metabolômica/métodos , Proteômica/métodos , Espectrometria de Massas em Tandem
17.
Gigascience ; 122022 12 28.
Artigo em Inglês | MEDLINE | ID: mdl-37919975

RESUMO

BACKGROUND: The nonrandom distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. Consequently, the impact of haplotypic variation on the results of proteomic searches and the discoverability of peptides specific to haplotypes remain unknown. FINDINGS: Here, we study how common genetic haplotypes influence the proteomic search space and investigate the possibility to match peptides containing multiple amino acid substitutions to a publicly available data set of mass spectra. We found that for 12.42% of the discoverable amino acid substitutions encoded by common haplotypes, 2 or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 352 spectra that matched to such multivariant peptides, and out of the 4,582 amino acid substitutions identified, 6.37% were covered by multivariant peptides. However, the evaluation of the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches. CONCLUSIONS: As these procedures become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation, across tissues and time.


Assuntos
Proteínas , Proteômica , Proteômica/métodos , Haplótipos , Reprodutibilidade dos Testes , Proteínas/genética , Peptídeos
18.
J Proteome Res ; 20(4): 2062-2068, 2021 04 02.
Artigo em Inglês | MEDLINE | ID: mdl-33661646

RESUMO

Error estimation for differential protein quantification by label-free shotgun proteomics is challenging due to the multitude of error sources, each contributing uncertainty to the final results. We have previously designed a Bayesian model, Triqler, to combine such error terms into one combined quantification error. Here we present an interface for Triqler that takes MaxQuant results as input, allowing quick reanalysis of already processed data. We demonstrate that Triqler outperforms the original processing for a large set of both engineered and clinical/biological relevant data sets. Triqler and its interface to MaxQuant are available as a Python module under an Apache 2.0 license from https://pypi.org/project/triqler/.


Assuntos
Proteômica , Software , Teorema de Bayes , Proteínas
19.
Sci Adv ; 7(8)2021 02.
Artigo em Inglês | MEDLINE | ID: mdl-33608280

RESUMO

Induction of the one-carbon cycle is an early hallmark of mitochondrial dysfunction and cancer metabolism. Vital intermediary steps are localized to mitochondria, but it remains unclear how one-carbon availability connects to mitochondrial function. Here, we show that the one-carbon metabolite and methyl group donor S-adenosylmethionine (SAM) is pivotal for energy metabolism. A gradual decline in mitochondrial SAM (mitoSAM) causes hierarchical defects in fly and mouse, comprising loss of mitoSAM-dependent metabolites and impaired assembly of the oxidative phosphorylation system. Complex I stability and iron-sulfur cluster biosynthesis are directly controlled by mitoSAM levels, while other protein targets are predominantly methylated outside of the organelle before import. The mitoSAM pool follows its cytosolic production, establishing mitochondria as responsive receivers of one-carbon units. Thus, we demonstrate that cellular methylation potential is required for energy metabolism, with direct relevance for pathophysiology, aging, and cancer.

20.
J Proteome Res ; 20(4): 1849-1854, 2021 04 02.
Artigo em Inglês | MEDLINE | ID: mdl-33529032

RESUMO

Nonparametric statistical tests are an integral part of scientific experiments in a diverse range of fields. When performing such tests, it is standard to sort values; however, this requires Ω(n log(n)) time to sort n values. Thus given enough data, sorting becomes the computational bottleneck, even with very optimized implementations such as the C++ standard library routine, std::sort. Frequently, a nonparametric statistical test is only used to partition values above and below a threshold in the sorted ordering, where the threshold corresponds to a significant statistical result. Linear-time selection and partitioning algorithms cannot be directly used because the selection and partitioning are performed on the transformed statistical significance values rather than on the sorted statistics. Usually, those transformed statistical significance values (e.g., the p value when investigating the family-wise error rate and q values when investigating the false discovery rate (FDR)) can only be computed at a threshold. Because this threshold is unknown, this leads to sorting the data. Layer-ordered heaps, which can be constructed in O(n), only partially sort values and thus can be used to get around the slow runtime required to fully sort. Here we introduce a layer-ordering-based method for selection and partitioning on the transformed values (e.g., p values or q values). We demonstrate the use of this method to partition peptides using an FDR threshold. This approach is applied to speed up Percolator, a postprocessing algorithm used in mass-spectrometry-based proteomics to evaluate the quality of peptide-spectrum matches (PSMs), by >70% on data sets with 100 million PSMs.


Assuntos
Proteômica , Espectrometria de Massas em Tandem , Algoritmos , Bases de Dados de Proteínas , Peptídeos , Software
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA