Results 1 - 20 of 32
1.
bioRxiv ; 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39253518

ABSTRACT

Missing values are a major challenge in the analysis of mass spectrometry proteomics data. Missing values hinder reproducibility, decrease statistical power for identifying differentially expressed (DE) proteins and make it challenging to analyze low-abundance proteins. We present Lupine, a deep learning-based method for imputing, or estimating, missing values in tandem mass tag (TMT) proteomics data. Lupine is, to our knowledge, the first imputation method that is designed to learn jointly from many datasets, and we provide evidence that this approach leads to more accurate predictions. We validated Lupine by applying it to TMT data from >1,000 cancer patient samples spanning ten cancer types from the Clinical Proteomics Tumor Atlas Consortium (CPTAC). Lupine outperforms the state of the art for TMT imputation, identifies more DE proteins than other methods, corrects for TMT batch effects, and learns a meaningful representation of proteins and patient samples. Lupine is implemented as an open source Python package.

2.
bioRxiv ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38617345

ABSTRACT

Membrane-bound particles in plasma are composed of exosomes, microvesicles, and apoptotic bodies and represent ~1-2% of the total protein composition. Proteomic interrogation of this subset of plasma proteins augments the representation of tissue-specific proteins, representing a "liquid biopsy," while enabling the detection of proteins that would otherwise be beyond the dynamic range of liquid chromatography-tandem mass spectrometry of unfractionated plasma. We have developed an enrichment strategy (Mag-Net) using hyper-porous strong-anion exchange magnetic microparticles to sieve membrane-bound particles from plasma. The Mag-Net method is robust, reproducible, inexpensive, and requires <100 µL plasma input. Coupled to a quantitative data-independent mass spectrometry analytical strategy, we demonstrate that we can collect results for >37,000 peptides from >4,000 plasma proteins with high precision. Using this analytical pipeline on a small cohort of patients with neurodegenerative disease and healthy age-matched controls, we discovered 204 proteins that differentiate (q-value < 0.05) patients with Alzheimer's disease dementia (ADD) from those without ADD. Our method also discovered 310 proteins that differed between patients with Parkinson's disease and individuals with either ADD or healthy, cognitively normal status. Using machine learning we were able to distinguish between ADD and not ADD with a mean ROC AUC = 0.98 ± 0.06.
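The ROC AUC reported above summarizes how well a classifier's scores rank ADD samples above non-ADD samples. As an illustration only (not the authors' machine learning pipeline), AUC can be computed as the Mann-Whitney probability that a randomly chosen positive sample outranks a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper: a tie between a positive and a negative score counts 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: one negative (0.8) outranks one positive (0.4).
print(roc_auc([0.9, 0.4, 0.8, 0.3], [True, True, False, False]))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is harmless for a small cohort. A mean ± standard deviation like the one reported would typically come from averaging per-fold or per-resample AUCs; the abstract does not state the resampling scheme.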

3.
J Proteome Res ; 23(6): 1894-1906, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38652578

ABSTRACT

Searching tandem mass spectrometry proteomics data against a database is a well-established method for assigning peptide sequences to observed spectra but typically cannot identify peptides harboring unexpected post-translational modifications (PTMs). Open modification searching aims to address this problem by allowing a spectrum to match a peptide even if the spectrum's precursor mass differs from the peptide mass. However, expanding the search space in this way can lead to a loss of statistical power to detect peptides. We therefore developed a method, called CONGA (combining open and narrow searches with group-wise analysis), that takes into account results from both types of searches (a traditional "narrow window" search and an open modification search) while carrying out rigorous false discovery rate control. The result is an algorithm that provides the best of both worlds: the ability to detect unexpected PTMs without a concomitant loss of power to detect unmodified peptides.
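CONGA's group-wise procedure is not reproduced here. As background, the sketch below shows the standard target-decoy competition (TDC) estimate that FDR procedures in this area commonly build on: spectra are ranked by score, the FDR at each threshold is estimated as (decoys + 1) / targets, and each q-value is the smallest estimated FDR at that threshold or any more permissive one. The function name and the +1 correction are assumptions of this sketch.

```python
from typing import List, Sequence


def tdc_qvalues(scores: Sequence[float], is_decoy: Sequence[bool]) -> List[float]:
    """Target-decoy competition q-values (a common generic form, not CONGA).

    Assumes each spectrum already carries its single best match (target or
    decoy) and that higher scores are better. Score ties are not treated
    specially in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Estimated FDR at each rank: (decoys + 1) / targets at or above that rank.
    fdr_at_rank = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr_at_rank.append((decoys + 1) / max(targets, 1))
    # q-value: minimum FDR over this threshold and all lower (more permissive) ones.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[rank])
        qvalues[order[rank]] = min(running_min, 1.0)
    return qvalues
```

With seven targets scoring above a single decoy, every target receives q = 1/7, since the (decoys + 1) correction keeps the estimate conservative for small result sets.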


Subjects
Algorithms , Databases, Protein , Protein Processing, Post-Translational , Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Peptides/analysis , Peptides/chemistry , Humans , Software , Amino Acid Sequence
4.
J Proteome Res ; 22(11): 3427-3438, 2023 11 03.
Article in English | MEDLINE | ID: mdl-37861703

ABSTRACT

Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
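As a point of reference only, the sketch below implements k-nearest-neighbor (KNN) imputation, a simple and widely used approach. It is not MissForest and is not claimed to match any benchmarked implementation. Rows are assumed to be peptides and columns samples, `None` marks a missing value, and the choice of k = 2 and the root-mean-square distance over shared columns are illustrative assumptions.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the mean of its k nearest rows' values.

    Distance between two rows is the root-mean-square difference over the
    columns both rows observe; rows sharing no observed columns are skipped.
    """

    def distance(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # copy so the input is left unchanged
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observe column j.
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if math.isfinite(d):
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

A point estimate like this carries no information about the uncertainty of the imputed quantification, which is exactly the gap the abstract highlights.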


Subjects
Algorithms , Proteomics , Proteomics/methods , Reproducibility of Results , Tandem Mass Spectrometry , Peptides/analysis
5.
J Proteome Res ; 21(6): 1382-1391, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35549345

ABSTRACT

Advances in library-based methods for peptide detection from data-independent acquisition (DIA) mass spectrometry have made it possible to detect and quantify tens of thousands of peptides in a single mass spectrometry run. However, many of these methods rely on a comprehensive, high-quality spectral library containing information about the expected retention time and fragmentation patterns of peptides in the sample. Empirical spectral libraries are often generated through data-dependent acquisition and may suffer from biases as a result. Spectral libraries can be generated in silico, but these models are not trained to handle all possible post-translational modifications. Here, we propose a false discovery rate-controlled spectrum-centric search workflow to generate spectral libraries directly from gas-phase fractionated DIA tandem mass spectrometry data. We demonstrate that this strategy is able to detect phosphorylated peptides and can be used to generate a spectral library for accurate peptide detection and quantitation in wide-window DIA data. We compare the results of this search workflow to other library-free approaches and demonstrate that our search is competitive in terms of accuracy and sensitivity. These results demonstrate that the proposed workflow has the capacity to generate spectral libraries while avoiding the limitations of other methods.


Subjects
Peptides , Tandem Mass Spectrometry , Peptide Library , Peptides/analysis , Protein Processing, Post-Translational , Proteome/analysis , Tandem Mass Spectrometry/methods , Workflow
6.
Am J Clin Pathol ; 157(5): 748-757, 2022 05 04.
Article in English | MEDLINE | ID: mdl-35512256

ABSTRACT

OBJECTIVES: Standard implementations of amyloid typing by liquid chromatography-tandem mass spectrometry use capabilities unavailable to most clinical laboratories. To improve accessibility of this testing, we explored easier approaches to tissue sampling and data processing. METHODS: We validated a typing method using manual sampling in place of laser microdissection, pairing the technique with a semiquantitative measure of sampling adequacy. In addition, we created an open-source data processing workflow (Crux Pipeline) for clinical users. RESULTS: Cases of amyloidosis spanning the major types were distinguishable with 100% specificity using measurements of individual amyloidogenic proteins or in combination with the ratio of λ and κ constant regions. Crux Pipeline allowed for rapid, batched data processing, integrating the steps of peptide identification, statistical confidence estimation, and label-free protein quantification. CONCLUSIONS: Accurate mass spectrometry-based amyloid typing is possible without laser microdissection. To facilitate entry into solid tissue proteomics, newcomers can leverage manual sampling approaches in combination with Crux Pipeline and related tools.


Subjects
Amyloidosis , Tandem Mass Spectrometry , Amyloid/analysis , Amyloidogenic Proteins , Amyloidosis/diagnosis , Humans , Microdissection , Tandem Mass Spectrometry/methods
7.
Elife ; 10, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34734806

ABSTRACT

A longstanding hypothesis is that chromatin fiber folding mediated by interactions between nearby nucleosomes represses transcription. However, it has been difficult to determine the relationship between local chromatin fiber compaction and transcription in cells. Further, global changes in fiber diameters have not been observed, even between interphase and mitotic chromosomes. We show that an increase in the range of local inter-nucleosomal contacts in quiescent yeast drives the compaction of chromatin fibers genome-wide. Unlike actively dividing cells, inter-nucleosomal interactions in quiescent cells require a basic patch in the histone H4 tail. This quiescence-specific fiber folding globally represses transcription and inhibits chromatin loop extrusion by condensin. These results reveal that global changes in chromatin fiber compaction can occur during cell state transitions, and establish physiological roles for local chromatin fiber folding in regulating transcription and chromatin domain formation.


Subjects
Chromatin Assembly and Disassembly , Chromatin/genetics , Saccharomyces cerevisiae/genetics , Adenosine Triphosphatases , Chromatin/metabolism , DNA-Binding Proteins , Histones/chemistry , Histones/metabolism , Multiprotein Complexes , Nucleosomes/metabolism , Protein Folding , Saccharomyces cerevisiae/growth & development , Transcription, Genetic
8.
J Proteome Res ; 20(8): 4153-4164, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34236864

ABSTRACT

The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant to the hypothesis being investigated. However, in settings where researchers are interested in a subset of peptides, alternative search and FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of "neighbor" peptides, which are defined as irrelevant peptides with a precursor mass and fragmentation spectrum similar to those of a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, "subset-neighbor search" (SNS), that accounts for neighbor peptides. We show evidence that SNS controls the FDR when neighbors are present and that SNS outperforms group-FDR, the only other method that appears to control the FDR relative to a subset of relevant peptides.
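The neighbor concept can be made concrete with a small check. This is a hypothetical simplification, not the SNS implementation: it calls an irrelevant peptide a neighbor of a relevant one when their neutral precursor masses agree within a ppm tolerance and at least a minimum fraction of the relevant peptide's singly charged b and y ion m/z values are matched within a fragment tolerance. The 10 ppm and 0.02 m/z tolerances, the 0.5 threshold, and the restriction to unmodified residues are all illustrative assumptions.

```python
from typing import List

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def neutral_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide: str) -> List[float]:
    """Singly charged b and y ion m/z values (no neutral losses or modifications)."""
    b_ions, y_ions = [], []
    prefix = 0.0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for aa in reversed(peptide[1:]):
        suffix += RESIDUE_MASS[aa]
        y_ions.append(suffix + WATER + PROTON)
    return b_ions + y_ions


def is_neighbor(relevant: str, candidate: str, precursor_ppm: float = 10.0,
                fragment_tol: float = 0.02, min_shared: float = 0.5) -> bool:
    """Hypothetical neighbor test: similar precursor mass AND similar fragments."""
    m_rel, m_cand = neutral_mass(relevant), neutral_mass(candidate)
    if abs(m_rel - m_cand) / m_rel * 1e6 > precursor_ppm:
        return False
    cand_frags = fragment_mzs(candidate)
    rel_frags = fragment_mzs(relevant)
    shared = sum(
        1 for f in rel_frags if any(abs(f - g) <= fragment_tol for g in cand_frags)
    )
    return shared / len(rel_frags) >= min_shared
```

For example, `PEPTLDE` is a neighbor of `PEPTIDE` because isoleucine and leucine are isobaric, whereas the reversed-order `EDITPEP` has the same precursor mass but almost no shared fragments, so it passes the mass filter yet fails the fragment criterion.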


Subjects
Algorithms , Tandem Mass Spectrometry , Databases, Protein , Humans , Peptides , Proteomics
9.
J Proteome Res ; 20(4): 1966-1971, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33596079

ABSTRACT

Proteomics studies rely on the accurate assignment of peptides to the acquired tandem mass spectra, a task where machine learning algorithms have proven invaluable. We describe mokapot, which provides a flexible semisupervised learning algorithm that allows for highly customized analyses. We demonstrate some of the unique features of mokapot by improving the detection of RNA-cross-linked peptides from an analysis of RNA-binding proteins and increasing the consistency of peptide detection in a single-cell proteomics study.


Subjects
Peptides , Proteomics , Algorithms , Databases, Protein , Tandem Mass Spectrometry
10.
J Proteome Res ; 19(3): 1147-1153, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32037841

ABSTRACT

Mass spectrometry is a powerful tool for quantifying protein abundance in complex samples. Advances in sample preparation and the development of data-independent acquisition (DIA) mass spectrometry approaches have increased the number of peptides and proteins measured per sample. Here, we present a series of experiments demonstrating how to assess whether a peptide measurement is quantitative by mass spectrometry. Our results demonstrate that increasing the number of detected peptides in a proteomics experiment does not necessarily result in increased numbers of peptides that can be measured quantitatively.


Subjects
Peptides , Proteomics , Calibration , Mass Spectrometry , Proteins
11.
J Biol Chem ; 294(22): 8760-8772, 2019 05 31.
Article in English | MEDLINE | ID: mdl-31010829

ABSTRACT

The cohesin complex regulates sister chromatid cohesion, chromosome organization, gene expression, and DNA repair. Cohesin is a ring complex composed of four core subunits and seven regulatory subunits. In an effort to comprehensively identify additional cohesin-interacting proteins, we used gene editing to introduce a dual epitope tag into the endogenous allele of each of 11 known components of cohesin in cultured human cells, and we performed MS analyses on dual-affinity purifications. In addition to reciprocally identifying all known components of cohesin, we found that cohesin interacts with a panoply of splicing factors and RNA-binding proteins (RBPs). These included diverse components of the U4/U6.U5 tri-small nuclear ribonucleoprotein complex and several splicing factors that are commonly mutated in cancer. The interaction between cohesin and splicing factors/RBPs was RNA- and DNA-independent, occurred in chromatin, was enhanced during mitosis, and required RAD21. Furthermore, cohesin-interacting splicing factors and RBPs followed the cohesin cycle and prophase pathway of cell cycle-regulated interactions with chromatin. Depletion of cohesin-interacting splicing factors and RBPs resulted in aberrant mitotic progression. These results provide a comprehensive view of the endogenous human cohesin interactome and identify splicing factors and RBPs as functionally significant cohesin-interacting proteins.


Subjects
Cell Cycle Proteins/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Mitosis , Proteomics , RNA Splicing Factors/metabolism , RNA-Binding Proteins/metabolism , Cell Cycle Proteins/antagonists & inhibitors , Cell Cycle Proteins/genetics , Cell Line, Tumor , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/genetics , DNA-Binding Proteins/antagonists & inhibitors , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Humans , Microscopy, Fluorescence , Protein Binding , Protein Interaction Maps , RNA Interference , RNA Splicing Factors/antagonists & inhibitors , RNA Splicing Factors/genetics , RNA, Small Interfering/metabolism , RNA-Binding Proteins/antagonists & inhibitors , RNA-Binding Proteins/genetics , Cohesins
12.
Genome Biol ; 20(1): 57, 2019 03 19.
Article in English | MEDLINE | ID: mdl-30890172

ABSTRACT

BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. RESULTS: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. CONCLUSIONS: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
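Of the quality measures mentioned, the ratio of intra- to interchromosomal interactions is simple enough to show directly. The sketch assumes contacts have already been aggregated into (chromosome, chromosome, count) records; the record layout and function name are hypothetical and are not taken from the 3DChromatin_ReplicateQC code. A low ratio is often read as a sign of many random ligation products, but the abstract gives no cutoff, so none is assumed here.

```python
import math
from typing import Iterable, Tuple

Contact = Tuple[str, str, int]  # (chromosome of end 1, chromosome of end 2, read count)


def intra_inter_ratio(contacts: Iterable[Contact]) -> float:
    """Ratio of intrachromosomal to interchromosomal contact counts."""
    intra = inter = 0
    for chrom1, chrom2, count in contacts:
        if chrom1 == chrom2:
            intra += count
        else:
            inter += count
    return math.inf if inter == 0 else intra / inter


example = [("chr1", "chr1", 90), ("chr1", "chr2", 10),
           ("chr2", "chr2", 50), ("chr2", "chr3", 50)]
print(intra_inter_ratio(example))  # 140 / 60, about 2.33
```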


Subjects
Genomics/standards , High-Throughput Nucleotide Sequencing/standards , Neoplasms/genetics , Quality Control , Software , Humans , Reproducibility of Results , Tumor Cells, Cultured
13.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1168-1181, 2019.
Article in English | MEDLINE | ID: mdl-29993658

ABSTRACT

MOTIVATION: Identification of spectra produced by a shotgun proteomics mass spectrometry experiment is commonly performed by searching the observed spectra against a peptide database. The heart of this search procedure is a score function that evaluates the quality of a hypothesized match between an observed spectrum and a theoretical spectrum corresponding to a particular peptide sequence. Accordingly, the success of a spectrum analysis pipeline depends critically upon this peptide-spectrum score function. We develop peptide-spectrum score functions that compute the maximum value of a submodular function under $m$ matroid constraints. We call this procedure a submodular generalized matching (SGM) since it generalizes bipartite matching. We use a greedy algorithm to compute the maximization, which can achieve a solution whose objective is guaranteed to be at least $\frac{1}{1+m}$ of the true optimum. The advantage of the SGM framework is that known long-range properties of experimental spectra can be modeled by designing suitable submodular functions and matroid constraints. Experiments on four data sets from various organisms and mass spectrometry platforms show that the SGM approach leads to significantly improved performance compared to several state-of-the-art methods. Supplementary information, C++ source code, and data sets can be found at https://melodi-lab.github.io/SGM.
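The greedy procedure can be sketched for the bipartite-matching case, which corresponds to $m = 2$ partition-matroid constraints (each theoretical peak and each observed peak is used at most once), so the greedy solution is guaranteed at least $\frac{1}{3}$ of the optimum. The objective here, a square root of matched weight summed per ion series, is a hypothetical monotone submodular function chosen only to show diminishing returns; it is not the score function used by SGM, and the edge weights (assumed nonnegative) are made up.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]  # (theoretical peak index, observed peak index)


def objective(selected: List[Edge], weight: Dict[Edge, float],
              ion_series: Dict[Edge, str]) -> float:
    """Monotone submodular score: sum over ion series of sqrt(matched weight)."""
    totals: Dict[str, float] = {}
    for edge in selected:
        key = ion_series[edge]
        totals[key] = totals.get(key, 0.0) + weight[edge]
    return sum(math.sqrt(t) for t in totals.values())


def greedy_matching(weight: Dict[Edge, float],
                    ion_series: Dict[Edge, str]) -> Tuple[List[Edge], float]:
    """Greedily add the feasible edge with the largest marginal gain.

    Feasible means the edge's theoretical and observed peaks are both unused,
    i.e. the selection stays independent in both partition matroids (m = 2).
    """
    selected: List[Edge] = []
    used_theoretical: Set[int] = set()
    used_observed: Set[int] = set()
    current = 0.0
    while True:
        best_edge: Optional[Edge] = None
        best_gain = 0.0
        for edge in sorted(weight):  # sorted for deterministic tie-breaking
            t, o = edge
            if t in used_theoretical or o in used_observed:
                continue
            gain = objective(selected + [edge], weight, ion_series) - current
            if gain > best_gain:
                best_edge, best_gain = edge, gain
        if best_edge is None:  # no feasible edge improves the score
            break
        selected.append(best_edge)
        used_theoretical.add(best_edge[0])
        used_observed.add(best_edge[1])
        current = objective(selected, weight, ion_series)
    return selected, current


weights = {(0, 0): 4.0, (1, 1): 4.0, (2, 2): 3.0}
ion_series = {(0, 0): "b", (1, 1): "b", (2, 2): "y"}
print(greedy_matching(weights, ion_series)[0])  # [(0, 0), (2, 2), (1, 1)]
```

In the example, the y-series match (weight 3) is chosen before a second b-series match (weight 4) because the square root gives diminishing returns within a series, the kind of long-range structure a submodular score can encode and a plain sum of matched intensities cannot.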


Subjects
Computational Biology/methods , Peptides/chemistry , Tandem Mass Spectrometry , Algorithms , Animals , Caenorhabditis elegans/chemistry , Calibration , Databases, Protein , Humans , Models, Statistical , Plasmodium falciparum/chemistry , Proteomics/methods , Saccharomyces cerevisiae/chemistry , Software
14.
Sci Rep ; 7(1): 16943, 2017 12 05.
Article in English | MEDLINE | ID: mdl-29208983

ABSTRACT

A comprehensive characterization of tumor genetic heterogeneity is critical for understanding how cancers evolve and escape treatment. Although many algorithms have been developed for capturing tumor heterogeneity, they are designed for analyzing either a single type of genomic aberration or individual biopsies. Here we present THEMIS (Tumor Heterogeneity Extensible Modeling via an Integrative System), which allows for the joint analysis of different types of genomic aberrations from multiple biopsies taken from the same patient, using a dynamic graphical model. Simulation experiments demonstrate that THEMIS is more accurate than its predecessor, TITAN. The heterogeneity analysis results from THEMIS are validated with single-cell DNA sequencing from a clinical tumor biopsy. When THEMIS is used to analyze tumor heterogeneity among multiple biopsies from the same patient, it helps to reveal the mutation accumulation history, track cancer progression, and identify the mutations related to treatment resistance. We implement our model via an extensible modeling platform, which makes our approach open, reproducible, and easy for others to extend.


Subjects
Biopsy/methods , Models, Biological , Neoplasms/pathology , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Algorithms , Bayes Theorem , Clonal Evolution , Computational Biology/methods , DNA Copy Number Variations , Female , Humans , Mutation , Neoplasms/genetics , Reproducibility of Results , Sequence Analysis, DNA , Single-Cell Analysis , Transcriptome , Triple Negative Breast Neoplasms/pathology
15.
PLoS Pathog ; 13(3): e1006256, 2017 03.
Article in English | MEDLINE | ID: mdl-28257516

ABSTRACT

Kaposi's sarcoma-associated herpesvirus (KSHV), an oncogenic human gamma-herpesvirus, is the etiological agent of Kaposi's sarcoma, the most common tumor of AIDS patients worldwide. KSHV is predominantly latent in the main KS tumor cell, the spindle cell, a cell of endothelial origin. KSHV modulates numerous host cell-signaling pathways to activate endothelial cells, including major metabolic pathways involved in lipid metabolism. To identify the underlying cellular mechanisms of KSHV alteration of host signaling and endothelial cell activation, we identified changes in the host proteome, phosphoproteome and transcriptome landscape following KSHV infection of endothelial cells. A Steiner forest algorithm was used to integrate the global data sets and, together with transcriptome-based predicted transcription factor activity, to predict cellular networks altered by latent KSHV. Several interesting pathways were identified, including peroxisome biogenesis. To validate the predictions, we showed that KSHV latent infection increases the number of peroxisomes per cell. Additionally, proteins involved in peroxisomal lipid metabolism of very long chain fatty acids, including ABCD3 and ACOX1, are required for the survival of latently infected cells. In summary, integrated analysis of proteomics and transcriptomics data identified novel cellular pathways altered during herpesvirus latency that could not be predicted by a single systems biology platform; when correlated with our metabolomics data, these results revealed that peroxisome lipid metabolism is essential for KSHV latent infection of endothelial cells.


Subjects
Herpesvirus 8, Human/metabolism , Host-Parasite Interactions/physiology , Lipid Metabolism/physiology , Peroxisomes/metabolism , Virus Activation/physiology , Virus Latency/physiology , Cell Separation , Cells, Cultured , Endothelial Cells/virology , Flow Cytometry , Humans , Mass Spectrometry , Microscopy, Confocal , RNA, Small Interfering , Sarcoma, Kaposi/virology , Systems Biology , Transfection
16.
J Proteome Res ; 16(4): 1817-1824, 2017 04 07.
Article in English | MEDLINE | ID: mdl-28263070

ABSTRACT

In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs; in both cases, the yield of peptides detected at a given false discovery rate is lowered. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool (http://noble.gs.washington.edu/proj/param-medic/) and has been integrated into the Crux proteomics toolkit (http://crux.ms), providing automatic parameter selection for the Comet and Tide search engines.
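The core idea, that repeated measurements of the same ion reveal the instrument's mass error, can be illustrated with a small calculation. This is a hypothetical sketch, not Param-Medic's actual estimator: it takes precursor m/z values from pairs of spectra assumed to come from the same peptide ion, converts their differences to ppm, estimates the spread robustly with the median absolute deviation (MAD), divides by sqrt(2) because each difference combines two independent errors, and suggests a tolerance of k standard deviations. The pairing step, the normality assumption behind the 1.4826 MAD factor, and k = 3 are all assumptions; fragment tolerance is not covered.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def ppm_differences(pairs: Sequence[Tuple[float, float]]) -> List[float]:
    """Differences (in ppm) between two precursor m/z measurements of one ion."""
    return [(a - b) / b * 1e6 for a, b in pairs]


def estimate_sigma_ppm(pairs: Sequence[Tuple[float, float]]) -> float:
    """Robust per-measurement standard deviation of the precursor error (ppm)."""
    diffs = ppm_differences(pairs)
    center = statistics.median(diffs)
    mad = statistics.median([abs(d - center) for d in diffs])
    # 1.4826 * MAD estimates a standard deviation under a normal model; the
    # difference of two independent errors has sqrt(2) times the per-measurement SD.
    return 1.4826 * mad / math.sqrt(2)


def suggest_tolerance_ppm(pairs: Sequence[Tuple[float, float]], k: float = 3.0) -> float:
    """Hypothetical rule of thumb: tolerance = k standard deviations."""
    return k * estimate_sigma_ppm(pairs)
```

Using the MAD rather than the standard deviation keeps a few mispaired spectra (which produce large spurious differences) from inflating the suggested tolerance.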


Subjects
Databases, Protein , Peptides/isolation & purification , Proteomics , Tandem Mass Spectrometry/methods , Algorithms , Peptide Mapping , Peptides/genetics , Search Engine , Software
17.
Nat Methods ; 14(3): 263-266, 2017 03.
Article in English | MEDLINE | ID: mdl-28135255

ABSTRACT

We present single-cell combinatorial indexed Hi-C (sciHi-C), a method that applies combinatorial cellular indexing to chromosome conformation capture. In this proof of concept, we generate and sequence six sciHi-C libraries comprising a total of 10,696 single cells. We use sciHi-C data to separate cells by karyotypic and cell-cycle state differences and identify cell-to-cell heterogeneity in mammalian chromosomal conformation. Our results demonstrate that combinatorial indexing is a generalizable strategy for single-cell genomics.


Subjects
Chromosomes/genetics , DNA/genetics , Genome, Human/genetics , Genomics/methods , Molecular Conformation , Single-Cell Analysis/methods , Cell Cycle/genetics , Cell Line, Tumor , DNA/analysis , Gene Library , HeLa Cells , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods
18.
J Am Soc Mass Spectrom ; 27(11): 1719-1727, 2016 11.
Article in English | MEDLINE | ID: mdl-27572102

ABSTRACT

Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with increasing awareness of the need for reliable statistics at the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method (grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein) in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
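The protein inference rule described above (group proteins that share the same set of theoretical peptides, then score each group by its best-scoring peptide) can be sketched in a few lines. This is an illustration, not Percolator's code: the input dictionaries and function name are hypothetical, and the sketch only assigns scores, leaving out the protein-level q-value estimation that Percolator performs.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def best_peptide_protein_groups(
    protein_peptides: Dict[str, Set[str]],
    peptide_scores: Dict[str, float],
) -> List[Tuple[Tuple[str, ...], float]]:
    """Group proteins with identical theoretical peptide sets; score by best peptide."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    scored = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:  # skip groups with no detected peptide
            scored.append((tuple(sorted(proteins)), max(observed)))
    # Highest-scoring protein groups first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Proteins that cannot be distinguished by their theoretical peptides are reported together rather than arbitrarily split, and using only the best peptide avoids rewarding long proteins simply for having many weak matches.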


Subjects
Databases, Protein , Proteomics , Software , Algorithms , Humans , Peptides/chemistry , Tandem Mass Spectrometry
19.
J Proteome Res ; 15(8): 2697-705, 2016 08 05.
Article in English | MEDLINE | ID: mdl-27396978

ABSTRACT

In principle, tandem mass spectrometry can be used to detect and quantify the peptides present in a microbiome sample, enabling functional and taxonomic insight into microbiome metabolic activity. However, the phylogenetic diversity constituting a particular microbiome is often unknown, and many of the organisms present may not have assembled genomes. In ocean microbiome samples, with particularly diverse and uncultured bacterial communities, it is difficult to construct protein databases that contain the bulk of the peptides in the sample without losing detection sensitivity due to the overwhelming number of candidate peptides for each tandem mass spectrum. We describe a method for deriving "metapeptides" (short amino acid sequences that may be represented in multiple organisms) from shotgun metagenomic sequencing of microbiome samples. In two ocean microbiome samples, we constructed site-specific metapeptide databases to detect more than one and a half times as many peptides as by searching against predicted genes from an assembled metagenome and roughly three times as many peptides as by searching against the NCBI environmental proteome database. The increased peptide yield has the potential to enrich the taxonomic and functional characterization of sample metaproteomes.
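Building a peptide database from sequence data usually involves an in silico digestion step. The sketch below is a hypothetical simplification, not the metapeptide pipeline from the paper: it assumes metagenomic sequences have already been translated into amino-acid sequences, digests them with the common trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), and records which source sequences contain each peptide, so that a peptide shared by several organisms is stored once. The length limits and names are assumptions; splitting on a zero-width regex requires Python 3.7 or later.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Zero-width split point after K or R, unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str, min_len: int = 6, max_len: int = 50) -> List[str]:
    """Fully tryptic peptides with no missed cleavages, filtered by length."""
    return [p for p in TRYPSIN_SITE.split(protein) if min_len <= len(p) <= max_len]


def build_peptide_database(translated: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each peptide to the set of source sequences that contain it."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    for name, sequence in translated.items():
        for peptide in tryptic_peptides(sequence):
            sources[peptide].add(name)
    return dict(sources)
```

Keying the database by peptide rather than by protein keeps its size proportional to the number of distinct peptides, which matters when, as the abstract notes, the number of candidate peptides per spectrum threatens detection sensitivity.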


Subjects
Aquatic Organisms/chemistry , Metagenomics/methods , Microbiota , Peptides/analysis , Proteomics/methods , Aquatic Organisms/genetics , Biodiversity , Databases, Protein , Microbiota/genetics , Sequence Analysis, DNA , Specimen Handling , Tandem Mass Spectrometry
20.
J Proteome Res ; 15(8): 2749-59, 2016 08 05.
Article in English | MEDLINE | ID: mdl-27397138

ABSTRACT

A central problem in mass spectrometry analysis involves identifying, for each observed tandem mass spectrum, the corresponding generating peptide. We present a dynamic Bayesian network (DBN) toolkit that addresses this problem by using a machine learning approach. At the heart of this toolkit is a DBN for Rapid Identification (DRIP), which can be trained from collections of high-confidence peptide-spectrum matches (PSMs). DRIP's score function considers fragment ion matches using Gaussians rather than fixed fragment-ion tolerances and also finds the optimal alignment between the theoretical and observed spectrum by considering all possible alignments, up to a threshold that is controlled using a beam-pruning algorithm. This function not only yields state-of-the-art database search accuracy but also can be used to generate features that significantly boost the performance of the Percolator postprocessor. The DRIP software is built upon a general-purpose DBN toolkit (GMTK), thereby allowing a wide variety of options for user-specific inference tasks as well as facilitating easy modifications to the DRIP model in future work. DRIP is implemented in Python and C++ and is available under Apache license at http://melodi-lab.github.io/dripToolkit.
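The difference between a fixed fragment tolerance and Gaussian fragment matching can be shown with a toy scorer. This sketch is not DRIP's dynamic Bayesian network and performs no alignment; it only contrasts a hard window, where a peak counts fully or not at all, with a Gaussian weight that decays smoothly with m/z error. The tolerance, sigma, and peak lists are hypothetical.

```python
import math
from typing import Sequence


def fixed_tolerance_score(theoretical: Sequence[float], observed: Sequence[float],
                          tol: float = 0.5) -> int:
    """Count theoretical peaks with any observed peak within +/- tol (hard window)."""
    return sum(1 for t in theoretical if any(abs(t - o) <= tol for o in observed))


def gaussian_score(theoretical: Sequence[float], observed: Sequence[float],
                   sigma: float = 0.2) -> float:
    """Sum over theoretical peaks of the best weight exp(-err^2 / (2 * sigma^2))."""
    total = 0.0
    for t in theoretical:
        total += max((math.exp(-0.5 * ((t - o) / sigma) ** 2) for o in observed),
                     default=0.0)
    return total


theoretical = [100.0, 200.0]
observed = [100.0, 200.45]  # second peak is off by 0.45 m/z
print(fixed_tolerance_score(theoretical, observed))     # 2
print(round(gaussian_score(theoretical, observed), 3))  # 1.08
```

Under the hard window the 0.45 m/z miss counts as much as the exact match, while the Gaussian weight gives it only about 0.08, so the score degrades gracefully instead of jumping at the tolerance boundary.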


Subjects
Machine Learning , Peptides/analysis , Proteomics/methods , Bayes Theorem , Databases, Protein , Software , Tandem Mass Spectrometry