1.
Proteomics ; 24(5): e2300145, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37726251

ABSTRACT

Exact p-value (XPV)-based methods for dot product-like score functions (such as the XCorr score implemented in Tide, SEQUEST, and Comet, or the shared peak count-based scoring in MSGF+ and ASPV) provide fairly good calibration for peptide-spectrum match (PSM) scoring in database search-based identification of MS/MS spectra. Unfortunately, in practice, standard XPV methods cannot handle the high-resolution fragmentation data produced by state-of-the-art mass spectrometers, because smaller bins increase the number of fragment matches that are assigned to incorrect bins and scored improperly. In this article, we present an extension of the XPV method, called the high-resolution exact p-value (HR-XPV) method, which can be used to calibrate PSM scores of high-resolution MS/MS spectra obtained with dot product-like scoring functions such as XCorr. HR-XPV carries remainder masses throughout the fragmentation, which greatly increases the number of fragments assigned to the correct bin and thus takes advantage of high-resolution data. Using four mass spectrometry data sets, our experimental results demonstrate that HR-XPV produces well-calibrated scores, which in turn result in more trustworthy spectrum annotations at any false discovery rate level.
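The binning problem behind HR-XPV can be illustrated with a minimal sketch. It shows why rounding each residue mass to a bin independently misplaces prefix fragments when bins are narrow, and how carrying the rounding remainder from one residue to the next keeps every prefix in the bin of its exact cumulative mass. It assumes monoisotopic residue masses, a hypothetical bin width of 0.02 Da, and no proton mass or bin offset. It illustrates the idea named above and is not the HR-XPV dynamic program itself.

```python
# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "V": 99.06841, "L": 113.08406,
}

def naive_prefix_bins(peptide: str, width: float) -> list[int]:
    """Round each residue mass to a bin independently, then add the integers.
    Rounding errors accumulate along the fragment ladder."""
    bins, total = [], 0
    for aa in peptide:
        total += round(RESIDUE_MASS[aa] / width)
        bins.append(total)
    return bins

def carried_prefix_bins(peptide: str, width: float) -> list[int]:
    """Carry the fractional remainder of each rounding step to the next residue,
    so each prefix lands in the bin of its exact cumulative mass."""
    bins, current, remainder = [], 0, 0.0
    for aa in peptide:
        exact = RESIDUE_MASS[aa] / width + remainder
        step = round(exact)
        remainder = exact - step      # fractional part carried forward
        current += step
        bins.append(current)
    return bins

print(naive_prefix_bins("GASP", 0.02))    # [2851, 6403, 10755, 15608]
print(carried_prefix_bins("GASP", 0.02))  # [2851, 6403, 10755, 15607]
```

The exact prefix mass of GASP is 312.14336 Da, which is bin 15607.168 at this width. Only the carried version places that fragment in bin 15607.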


Subjects
Algorithms, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Software, Peptides/chemistry, Calibration, Protein Databases
2.
J Proteome Res ; 22(2): 577-584, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36633229

ABSTRACT

The first step in the analysis of protein tandem mass spectrometry data typically involves searching the observed spectra against a protein database. During database search, the search engine must digest the proteins in the database into peptides, subject to digestion rules that are under user control. The choice of these digestion parameters, as well as the selection of post-translational modifications (PTMs), can dramatically affect the size of the search space and hence the statistical power of the search. The Tide search engine separates the creation of the peptide index from the database search step, thereby saving time by allowing a peptide index to be reused in multiple searches. Here we describe an improved implementation of the indexing component of Tide that consumes about a quarter of the resources (CPU and RAM) of the previous version and can generate arbitrarily large peptide databases, limited only by the amount of available disk space. We use this improved implementation to explore the relationship between database size and the parameters controlling digestion and PTMs, as well as between database size and statistical power. Our results can help guide practitioners in the proper selection of these important parameters.
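The in-silico digestion step that precedes indexing can be sketched as follows, to show how user-controlled digestion parameters drive database size. This is a simplified illustration and not Tide's code. It assumes the common trypsin rule (cleave after K or R, but not before P). The parameter names `max_missed`, `min_len` and `max_len` are hypothetical stand-ins for the digestion settings a search engine exposes.

```python
def tryptic_peptides(protein: str, max_missed: int = 0,
                     min_len: int = 6, max_len: int = 50) -> set[str]:
    """Digest `protein` with a simplified trypsin rule and return the peptides."""
    # Cleavage sites: after K or R, unless the next residue is P.
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = set()
    for start in range(len(sites) - 1):
        # A peptide may span up to `max_missed` internal cleavage sites.
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(sites):
                break
            peptide = protein[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides

print(sorted(tryptic_peptides("MAKPEKLRGR", max_missed=1, min_len=1)))
```

Raising `max_missed` or lowering `min_len` enlarges the peptide set. This is the kind of search-space growth the study relates to database size and statistical power.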


Subjects
Algorithms, Peptides, Peptides/chemistry, Proteins/metabolism, Search Engine, Protein Databases, Software
3.
J Proteome Res ; 22(2): 561-569, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36598107

ABSTRACT

The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments since 2014. We begin with empirical results demonstrating our recently implemented speedups to the Tide search engine. Other new features include a new score function in Tide, two new confidence estimation procedures, as well as three new tools: Param-medic for estimating search parameters directly from mass spectrometry data, Kojak for searching cross-linked mass spectra, and DIAmeter for searching data independent acquisition data against a sequence database.


Subjects
Software, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Proteomics/methods, Protein Databases, Algorithms
4.
J Proteome Res ; 20(10): 4708-4717, 2021 Oct 01.
Article in English | MEDLINE | ID: mdl-34449232

ABSTRACT

Spectrum annotation is a challenging task due to the presence of unexpected peptide fragmentation ions as well as the inaccuracy of spectrometer detectors. We present a deep convolutional neural network, called Slider, which learns an optimal feature extraction in its kernels for scoring tandem mass spectrometry (MS/MS) spectra, in order to increase the number of high-confidence spectrum annotations. Experimental results using publicly available data sets show that Slider can annotate slightly more spectra than the state-of-the-art methods (BoltzMatch, Res-EV, Prosit) while being 2-10 times faster. More interestingly, using low-resolution fragmentation information, Slider provides only 2-4% fewer spectrum annotations than other methods using high-resolution information. This means that Slider can extract nearly as much information from the context of low-resolution spectrum peaks as high-resolution fragmentation information provides to other scoring methods. Thus, Slider can be an optimal choice for practitioners using older spectrometers with low-resolution detectors.


Subjects
Neural Networks (Computer), Tandem Mass Spectrometry, Algorithms, Factual Databases, Peptides
6.
Bioinformatics ; 36(12): 3781-3787, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32207518

ABSTRACT

MOTIVATION: The ability of score functions to discriminate correct from incorrect peptide-spectrum matches in database search-based spectrum identification is hindered by many superfluous peaks belonging to unexpected fragmentation ions, or by missing peaks of anticipated fragmentation ions. RESULTS: Here, we present a new method, called BoltzMatch, which learns score functions using a particular type of stochastic neural network, the restricted Boltzmann machine, in order to enhance their discrimination ability. BoltzMatch learns chemically explainable patterns among peak pairs in the spectrum data, and during its internal scoring it can augment peaks depending on their semantic context or even reconstruct missing peaks of expected ions. As a result, BoltzMatch achieved 50% and 33% more annotations than XCorr on high- and low-resolution MS2 data, respectively, at a 0.1% false discovery rate in our benchmark; conversely, XCorr yielded the same number of spectrum annotations as BoltzMatch, albeit with 4-6 times more errors. In addition, BoltzMatch alone yields 14% more annotations than Prosit (which runs with Percolator), and BoltzMatch with Percolator yields 32% more annotations than Prosit at the 0.1% FDR level in our benchmark. AVAILABILITY AND IMPLEMENTATION: BoltzMatch is freely available at: https://github.com/kfattila/BoltzMatch. CONTACT: akerteszfarkas@hse.ru. SUPPORTING INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteomics, Tandem Mass Spectrometry, Algorithms, Neural Networks (Computer), Software
7.
Sci Rep ; 10(1): 5068, 2020 03 19.
Article in English | MEDLINE | ID: mdl-32193485

ABSTRACT

Recent advancements in deep learning have revolutionized the way microscopy images of cells are processed. Deep learning network architectures have a large number of parameters; thus, in order to reach high accuracy, they require a massive amount of annotated data. A common way of improving accuracy is to artificially enlarge the training set using various augmentation techniques. A less common way relies on test-time augmentation (TTA), which creates transformed versions of the image for prediction and then merges the results. In this paper we describe how we have incorporated the test-time augmentation prediction method into two major segmentation approaches used in the single-cell analysis of microscopy images. These approaches are semantic segmentation based on the U-Net model and instance segmentation based on the Mask R-CNN model. Our findings show that even if only simple test-time augmentations (such as rotation or flipping, combined with proper merging methods) are applied, TTA can significantly improve prediction accuracy. We have used images of tissue and cell cultures from the Data Science Bowl (DSB) 2018 nuclei segmentation competition and other sources. Additionally, by boosting the highest-scoring method of the DSB with TTA, we could further improve prediction accuracy, and our method reached an all-time best score on the DSB.
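The TTA loop described above (transform the input, predict, undo the transform on the prediction, then merge) can be sketched in plain Python. The `predict` argument is a hypothetical per-pixel stand-in for a U-Net or Mask R-CNN model. Averaging is only one of the possible merging methods and is an assumption here. The augmentation set (identity, horizontal flip, 90-degree rotation) is likewise illustrative.

```python
from typing import Callable

Image = list[list[float]]

def identity(img: Image) -> Image:
    return [row[:] for row in img]

def hflip(img: Image) -> Image:
    return [row[::-1] for row in img]

def rot90_cw(img: Image) -> Image:
    return [list(r) for r in zip(*img[::-1])]

def rot90_ccw(img: Image) -> Image:
    return [list(r) for r in zip(*img)][::-1]

# (forward transform, inverse transform) pairs; hflip is its own inverse.
AUGMENTATIONS = [(identity, identity), (hflip, hflip), (rot90_cw, rot90_ccw)]

def tta_predict(img: Image, predict: Callable[[Image], Image]) -> Image:
    """Average the de-augmented predictions over all augmentations."""
    h, w = len(img), len(img[0])
    merged = [[0.0] * w for _ in range(h)]
    for forward, inverse in AUGMENTATIONS:
        pred = inverse(predict(forward(img)))   # map prediction back to input frame
        for i in range(h):
            for j in range(w):
                merged[i][j] += pred[i][j] / len(AUGMENTATIONS)
    return merged
```

Each prediction is mapped back to the original orientation before merging. Without that step, the merged masks would be averaged pixel by pixel across misaligned frames.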

8.
J Proteome Res ; 19(4): 1481-1490, 2020 04 03.
Article in English | MEDLINE | ID: mdl-32175744

ABSTRACT

Peptide-spectrum match (PSM) scores used in database searching are calibrated to spectrum- or spectrum-peptide-specific null distributions. Some calibration methods rely on specific assumptions and use analytical models (e.g., binomial distributions), whereas other methods use exact empirical null distributions. The former may be inaccurate because of unjustified assumptions, while the latter are accurate, albeit computationally expensive. Here, we introduce a novel, nonparametric, heuristic PSM score calibration method, called Tailor, which calibrates PSM scores by dividing them by the top 100-quantile of the empirical, spectrum-specific null distribution (i.e., the score with an associated p-value of 0.01 at the tail, hence the name) observed during database searching. Tailor does not require any optimization steps or long calculations, and it does not rely on any assumption about the form of the score distribution (e.g., whether it is binomial); however, it does rely on our empirical observation that the mean and the variance of the null distributions are correlated. In our benchmark, we re-calibrated the match scores of XCorr from Crux, the HyperScore scores from X!Tandem, and the p-values from OMSSA with the Tailor method and obtained more spectrum annotations than with raw scores at any false discovery rate level. Moreover, Tailor provided slightly more annotations than the E-values of X!Tandem and OMSSA, and on spectrum data sets containing low-resolution fragmentation information (MS2) it approached the performance of the computationally expensive exact p-value method for XCorr while being around 20-150 times faster. On high-resolution MS2 data sets, the Tailor method with XCorr achieved state-of-the-art performance and produced more annotations than the well-calibrated residue-evidence (Res-ev) score while being around 50-80 times faster.
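The Tailor calibration as described (divide each PSM score by the score at the top 1% tail of that spectrum's empirical null distribution) can be sketched in a few lines. Two simplifying assumptions apply. All candidate scores observed for the spectrum during the search serve as the null sample, and the tail score is taken as the k-th highest score with k = round(0.01 n). The authors' exact estimator may differ.

```python
def tailor_calibrate(candidate_scores: list[float], tail: float = 0.01) -> list[float]:
    """Divide each candidate score of one spectrum by the empirical score whose
    upper-tail probability is `tail` (0.01 = the top 100-quantile)."""
    if not candidate_scores:
        return []
    n = len(candidate_scores)
    k = max(1, round(tail * n))                # how many scores lie in the upper tail
    tail_score = sorted(candidate_scores)[-k]  # k-th highest score
    if tail_score <= 0:
        raise ValueError("tail score must be positive to serve as a divisor")
    return [s / tail_score for s in candidate_scores]
```

The calibrated score of the best PSM is its raw score divided by the spectrum's own tail score. This puts spectra with wide and narrow null distributions on a comparable scale. The benefit depends on the mean/variance correlation noted above.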


Subjects
Algorithms, Proteomics, Calibration, Protein Databases, Peptides
9.
J Proteome Res ; 18(5): 2354-2358, 2019 05 03.
Article in English | MEDLINE | ID: mdl-30983355

ABSTRACT

Accurate target-decoy-based false discovery rate (FDR) control of peptide identification from tandem mass-spectrometry data relies on an important but often neglected assumption that incorrect spectrum annotations are equally likely to receive either target or decoy peptides. Here we argue that this assumption is often violated in practice, even by popular methods. Preference can be given to target peptides by biased scoring functions, which result in liberal FDR estimations, or to decoy peptides by correlated spectra, which result in conservative estimations.
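Target-decoy competition (TDC), whose central assumption is discussed above, can be sketched as follows. Each spectrum keeps only its better-scoring match, target or decoy, and the FDR among accepted targets at a threshold is estimated as the ratio of decoy winners to target winners above it. The input format (best target and best decoy score per spectrum) and the tie-breaking rule are assumptions. The optional +1 in the numerator is a common conservative variant.

```python
def tdc_fdr(target_scores: list[float], decoy_scores: list[float],
            threshold: float, plus_one: bool = False) -> float:
    """Estimate the FDR among target winners scoring >= threshold.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    scores for spectrum i (hypothetical input format)."""
    targets = decoys = 0
    for t, d in zip(target_scores, decoy_scores):
        if max(t, d) < threshold:
            continue
        if t > d:          # ties go to the decoy here (an assumption)
            targets += 1
        else:
            decoys += 1
    if targets == 0:
        return 0.0
    return min(1.0, (decoys + int(plus_one)) / targets)
```

If a biased score systematically favours target peptides for incorrect matches, the decoy count underestimates the number of incorrect target winners, and the estimate becomes liberal. Conversely, correlated spectra that favour decoys make it conservative.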


Subjects
Artifacts, Peptides/isolation & purification, Proteomics/standards, Tandem Mass Spectrometry/standards, Amino Acid Sequence, Bias, Humans, Plasmodium falciparum/chemistry, Proteomics/methods, Tandem Mass Spectrometry/methods
10.
Sensors (Basel) ; 18(12)2018 Nov 26.
Article in English | MEDLINE | ID: mdl-30486308

ABSTRACT

Several studies have analyzed human gait data obtained from inertial gyroscope and accelerometer sensors mounted on different parts of the body. In this article, we take a step further in gait analysis and provide a methodology for predicting the movements of the legs, which can be applied in prostheses to imitate the missing part of the leg during walking. In particular, we propose a method, called GaIn, to control non-invasive, robotic, prosthetic legs. GaIn can infer the movements of both missing shanks and feet for people with double trans-femoral amputation using biologically inspired recurrent neural networks. Predictions are made for casual walking-related activities, such as walking, taking stairs, and running, based on thigh movement. In our experimental tests, GaIn achieved an average prediction error of 4.55 for shank movements. However, a patient's intention to stand up or sit down cannot be inferred from thigh movements. In fact, intention causes thigh movements while the shanks and feet remain roughly still. The GaIn system can be triggered by thigh muscle activity measured with electromyography (EMG) sensors to make robotic prosthetic legs perform standing-up and sitting-down actions. The GaIn system has low prediction latency and is fast and computationally inexpensive enough to be deployed on mobile platforms and portable devices.


Subjects
Surgical Amputation, Gait/physiology, Artificial Limbs, Biosensing Techniques, Electromyography, Humans, Machine Learning, Wearable Electronic Devices
11.
Bioinformatics ; 34(19): 3281-3288, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29741583

ABSTRACT

Motivation: Bioinformatics studies often rely on similarity measures between sequence pairs, which can become a bottleneck in large-scale sequence analysis. Results: Here, we present a new convolutional kernel function for protein sequences called the Lempel-Ziv-Welch (LZW)-Kernel. It is based on code words identified with the LZW universal text compressor. The LZW-Kernel is an alignment-free method; it is always symmetric and positive, always gives 1.0 for self-similarity, and can be used directly with Support Vector Machines (SVMs) in classification problems, unlike the normalized compression distance, which often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly suitable for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and the SVM-pairwise method combined with Smith-Waterman (SW) scoring at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with Basic Local Alignment Search Tool (BLAST) scores, which indicates that LZW code words might be a better basis for similarity measures than the local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms, among others, n-gram-based mismatch kernels, the hidden Markov model-based SAM and Fisher kernel, and the protein family-based PSI-BLAST. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed: in our tests it was three times faster than BLAST and several orders of magnitude faster than SW or LAK. Availability and implementation: The LZW-Kernel is implemented as standalone C code; it is a free, open-source program distributed under the GPLv3 license and can be downloaded from https://github.com/kfattila/LZW-Kernel.
Supplementary information: Supplementary data are available at Bioinformatics Online.
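The two ingredients named above, LZW code words and a normalized similarity over them, can be sketched as follows. The LZW parse itself is the standard algorithm. The cosine normalization over code-word counts is an assumption chosen because it is symmetric and gives exactly 1.0 for self-similarity. The published LZW-Kernel formula may differ, and this is not the authors' C implementation.

```python
import math
from collections import Counter

def lzw_code_words(seq: str) -> Counter:
    """Return the phrases emitted by an LZW parse of `seq`, with counts."""
    dictionary = set(seq)       # start with all single characters
    words = Counter()
    w = ""
    for c in seq:
        wc = w + c
        if wc in dictionary:
            w = wc              # keep extending the current phrase
        else:
            words[w] += 1       # emit the longest known phrase
            dictionary.add(wc)  # learn the new phrase
            w = c
    if w:
        words[w] += 1
    return words

def lzw_similarity(a: str, b: str) -> float:
    """Cosine similarity of LZW code-word count vectors (assumed normalization)."""
    ca, cb = lzw_code_words(a), lzw_code_words(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

Each sequence is parsed in a single pass, consistent with the one-pass property described above. Comparing two sequences then reduces to a sparse dot product.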


Subjects
Algorithms, Amino Acid Sequence, Proteins/chemistry, Protein Sequence Analysis/methods, Software, Computational Biology, Support Vector Machine
12.
Gene ; 660: 8-12, 2018 Jun 20.
Article in English | MEDLINE | ID: mdl-29574188

ABSTRACT

Type II restriction endonucleases and DNA modification methyltransferases are key instruments of genetic engineering. The number of proteins assigned to this group has recently exceeded 8500. Subtype IIC comprises bifunctional endonuclease-methyltransferase enzymes and currently consists of 16 described members. Here we present a phylogenetic tree of 22 new potential bifunctional endonucleases. The majority of them are thought to be fusions of a restriction nuclease with a DNA methyltransferase and a target recognition subunit of type I restriction-modification systems (R-M-S structure). PcoI, an RM.AloI isoschizomer from Prevotella copri DSM-18205, has been cloned and purified, and its REase activity has been demonstrated. It cuts DNA in a magnesium-dependent manner and shows high affinity for DNA, which probably reflects its mechanism of action. This work provides additional evidence that gene fusion might play an important role in the evolution of restriction-modification systems and other DNA-modifying proteins.


Subjects
Bacterial Proteins/chemistry, DNA Modification Methylases/chemistry, Type II Site-Specific Deoxyribonucleases/chemistry, Prevotella/enzymology
14.
J Proteome Res ; 14(8): 3148-61, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26152888

ABSTRACT

Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.


Subjects
Algorithms, Computational Biology, Peptides/metabolism, Proteomics/methods, Tandem Mass Spectrometry/methods, Amino Acid Sequence, Animals, Caenorhabditis elegans Proteins/metabolism, Computational Biology/methods, Protein Databases, False Positive Reactions, Plasmodium falciparum/metabolism, Proteomics/standards, Protozoan Proteins/metabolism, Reproducibility of Results, Saccharomyces cerevisiae Proteins/metabolism, Tandem Mass Spectrometry/standards
15.
J Proteome Res ; 14(8): 3027-38, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26084232

ABSTRACT

Accurate assignment of peptide sequences to observed fragmentation spectra is hindered by the large number of hypotheses that must be considered for each observed spectrum. A high score assigned to a particular peptide-spectrum match (PSM) may not end up being statistically significant after multiple testing correction. Researchers can mitigate this problem by controlling the hypothesis space in various ways: considering only peptides resulting from enzymatic cleavages, ignoring possible post-translational modifications or single nucleotide variants, etc. However, these strategies sacrifice identifications of spectra generated by rarer types of peptides. In this work, we introduce a statistical testing framework, cascade search, that directly addresses this problem. The method requires the user to specify a priori a statistical confidence threshold as well as a series of peptide databases. For instance, such a cascade of databases could include fully tryptic, semitryptic, and nonenzymatic peptides, or peptides with increasing numbers of modifications. Cascade search then gradually expands the list of candidate peptides from more likely peptides toward rare peptides, sequestering at each stage any spectrum that is identified with the specified statistical confidence. We compare cascade search to a standard procedure that lumps all of the peptides into a single database, as well as to a previously described group FDR procedure that computes the FDR separately within each database. We demonstrate, using simulated and real data, that cascade search identifies more spectra at a fixed FDR threshold than either the ungrouped or the grouped approach. Cascade search thus provides a general method for maximizing the number of identified spectra in a statistically rigorous fashion.
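The cascade logic can be sketched as follows: search databases in order, accept spectra that pass the threshold at each stage, and sequester them from later stages. The per-stage acceptance rule here is plain target-decoy competition on each spectrum's best match. That rule and the input format (one dictionary per database mapping a spectrum ID to its best score and a decoy flag) are assumptions for illustration, not the published procedure.

```python
def tdc_accept(psms: dict, threshold: float) -> set:
    """Target spectrum IDs accepted at the given FDR via target-decoy competition.

    psms: spectrum_id -> (best score, is_decoy) for the best match of each spectrum."""
    ranked = sorted(psms.items(), key=lambda item: item[1][0], reverse=True)
    targets = decoys = 0
    best_cut = 0
    for i, (_, (_, is_decoy)) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= threshold:
            best_cut = i      # deepest cutoff whose estimated FDR is acceptable
    return {sid for sid, (_, is_decoy) in ranked[:best_cut] if not is_decoy}

def cascade_search(stages: list, threshold: float) -> list:
    """stages: one dict per database, most likely peptides first."""
    remaining = set().union(*(stage.keys() for stage in stages))
    accepted = []
    for stage in stages:
        candidates = {sid: m for sid, m in stage.items() if sid in remaining}
        hits = tdc_accept(candidates, threshold)
        accepted.append(hits)
        remaining -= hits     # sequester identified spectra from later stages
    return accepted
```

A spectrum accepted against the tryptic database is never re-scored against the larger semitryptic or nonenzymatic databases. Those later stages therefore face fewer spectra and pay a smaller multiple-testing penalty.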


Subjects
Algorithms, Peptides/analysis, Proteomics/methods, Tandem Mass Spectrometry/methods, Cell Line, Computer Simulation, Protein Databases, Humans, Peptides/metabolism, Protein Isoforms/analysis, Protein Isoforms/metabolism, Post-Translational Protein Processing, Reproducibility of Results, Saccharomyces cerevisiae Proteins/analysis, Saccharomyces cerevisiae Proteins/metabolism
16.
Nature ; 521(7551): 227-31, 2015 May 14.
Article in English | MEDLINE | ID: mdl-25731161

ABSTRACT

Long-standing evidence indicates that human immunodeficiency virus type 1 (HIV-1) preferentially integrates into a subset of transcriptionally active genes of the host cell genome. However, the reason why the virus selects only certain genes among all transcriptionally active regions in a target cell remains largely unknown. Here we show that HIV-1 integration occurs in the outer shell of the nucleus in close correspondence with the nuclear pore. This region contains a series of cellular genes, which are preferentially targeted by the virus, and characterized by the presence of active transcription chromatin marks before viral infection. In contrast, the virus strongly disfavours the heterochromatic regions in the nuclear lamin-associated domains and other transcriptionally active regions located centrally in the nucleus. Functional viral integrase and the presence of the cellular Nup153 and LEDGF/p75 integration cofactors are indispensable for the peripheral integration of the virus. Once integrated at the nuclear pore, the HIV-1 DNA makes contact with various nucleoporins; this association takes part in the transcriptional regulation of the viral genome. These results indicate that nuclear topography is an essential determinant of the HIV-1 life cycle.


Subjects
Cell Nucleus/genetics, Cell Nucleus/metabolism, Chromosome Positioning/genetics, Genetic Loci/genetics, HIV-1/genetics, HIV-1/physiology, Virus Integration/genetics, Signal Transducing Adaptor Proteins/metabolism, CD4-Positive T-Lymphocytes/cytology, CD4-Positive T-Lymphocytes/metabolism, Cultured Cells, Chromatin/genetics, Chromatin/metabolism, HIV Integrase/metabolism, Half-Life, Humans, Nuclear Pore/genetics, Nuclear Pore/metabolism, Nuclear Pore Complex Proteins/metabolism, Transcription Factors/metabolism, Transcriptional Activation/genetics
17.
J Proteome Res ; 13(10): 4488-91, 2014 Oct 03.
Article in English | MEDLINE | ID: mdl-25182276

ABSTRACT

Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data.


Subjects
Proteins/chemistry, Tandem Mass Spectrometry/methods, Protein Databases, Internet
18.
PLoS One ; 9(4): e95511, 2014.
Article in English | MEDLINE | ID: mdl-24755769

ABSTRACT

Multispecies bacterial communities, such as the microbiota of the gastrointestinal tract, can be remarkably stable and resilient even though they consist of cells and species that compete for resources and also produce a large number of antimicrobial agents. Computational modeling suggests that horizontal transfer of resistance genes may greatly contribute to the formation of stable and diverse communities capable of protecting themselves with a battery of antimicrobial agents while preserving a varied metabolic repertoire of the constituent species. In other words, horizontal transfer of resistance genes makes a community compatible in terms of exoproducts and capable of maintaining a varied and mature metagenome. The same property may allow the microbiota to protect a host organism or, if used as a microbial therapy, to purge pathogens and restore a protective environment.


Subjects
Bacteria/genetics, Bacteria/immunology, Horizontal Gene Transfer, Microbiota/genetics, Computer Simulation, Genetic Variation, Genotype, Biological Models
19.
Bioinformatics ; 30(2): 234-41, 2014 Jan 15.
Article in English | MEDLINE | ID: mdl-24215026

ABSTRACT

MOTIVATION: Tandem mass spectrometry has become a standard tool for identifying post-translational modifications (PTMs) of proteins. Algorithmic searches for PTMs in tandem mass spectrum (MS/MS) data tend to be hampered by noisy data as well as by a combinatorial explosion of the search space. This leads to high uncertainty and long search-execution times. RESULTS: To address this issue, we present PTMTreeSearch, a new algorithm that uses a large database of known PTMs to identify PTMs from MS/MS data. For a given peptide sequence, PTMTreeSearch builds a computational tree wherein each path from the root to the leaves is labeled with the amino acids of a peptide sequence. Branches then represent PTMs. Various empirical tree-pruning rules have been designed to decrease the search-execution time by eliminating biologically unlikely solutions. PTMTreeSearch first identifies a relatively small set of high-confidence PTM types and, in a second stage, performs a more exhaustive search on this restricted set using relaxed search parameter settings. An analysis of experimental data shows that, using the same false discovery criteria, PTMTreeSearch annotates more peptides than current state-of-the-art methods and PTM identification algorithms, and achieves this at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring function in the X!Tandem search engine. AVAILABILITY: The source code of PTMTreeSearch and a demo server application can be found at http://net.icgeb.org/ptmtreesearch
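The tree idea can be sketched as a depth-first walk whose root-to-leaf paths spell the peptide. At each residue, the walk branches into an unmodified child plus one child per allowed PTM. This is a minimal illustration, not PTMTreeSearch's code. The PTM table is hypothetical, though the mass shifts are the standard monoisotopic values for phosphorylation and methionine oxidation. The single pruning rule (a cap on PTMs per peptide) stands in for the paper's empirical pruning rules.

```python
# Hypothetical PTM table: residue -> list of (PTM name, monoisotopic mass shift in Da).
PTMS = {
    "S": [("Phospho", 79.96633)],
    "T": [("Phospho", 79.96633)],
    "Y": [("Phospho", 79.96633)],
    "M": [("Oxidation", 15.99491)],
}

def enumerate_ptm_paths(peptide: str, max_mods: int = 2) -> list:
    """Return every root-to-leaf path as a tuple of (position, PTM, shift)."""
    results = []

    def walk(pos: int, mods: list) -> None:
        if pos == len(peptide):              # reached a leaf
            results.append(tuple(mods))
            return
        walk(pos + 1, mods)                  # unmodified branch
        if len(mods) < max_mods:             # pruning rule: cap PTMs per path
            for name, shift in PTMS.get(peptide[pos], []):
                mods.append((pos, name, shift))
                walk(pos + 1, mods)          # modified branch
                mods.pop()

    walk(0, [])
    return results

for path in enumerate_ptm_paths("MSK"):
    print(path)
```

Without pruning, the number of leaves grows multiplicatively with each modifiable residue. Cutting branches early is what keeps the search-execution time manageable.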


Subjects
Algorithms, Peptide Fragments/analysis, Post-Translational Protein Processing, Proteins/chemistry, Software, Tandem Mass Spectrometry/methods, Computational Biology, Humans, Proteins/metabolism
20.
Protein Pept Lett ; 21(8): 858-63, 2014.
Article in English | MEDLINE | ID: mdl-23855660

ABSTRACT

Identification and elimination of noise peaks in mass spectra from large proteomics data streams simultaneously improves the accuracy of peptide identification and significantly decreases the size of the data. There are a number of peak filtering strategies that can achieve this goal. Here we present a simple algorithm wherein the number of highest intensity peaks retained for further analysis is proportional to the mass of the precursor ion. We show that this technique provides an improvement over other intensity based strategies, especially for low mass precursors.
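The filtering rule described above keeps a number of the most intense peaks that is proportional to the precursor mass. It can be sketched as follows. The proportionality constant `peaks_per_da` is a hypothetical parameter, not a value from the paper, and rounding k up is an assumption.

```python
import math

def filter_peaks(peaks: list[tuple[float, float]], precursor_mass: float,
                 peaks_per_da: float = 0.05) -> list[tuple[float, float]]:
    """Keep the k most intense peaks, with k proportional to the precursor mass.

    peaks: list of (m/z, intensity) pairs."""
    k = max(1, math.ceil(peaks_per_da * precursor_mass))
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    return sorted(top)   # restore m/z order
```

Because k scales with precursor mass, a low-mass precursor keeps only a few peaks rather than a fixed top-N. That is where a fixed-count intensity filter would retain the most noise relative to signal.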


Subjects
Mass Spectrometry/methods, Proteins/chemistry, Proteins/metabolism, Proteomics/methods, Statistics as Topic/methods, Algorithms, Humans, Jurkat Cells, Molecular Weight