Search | VHL Regional Portal

1.

OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data.

Pfeuffer, Julianus; Bielow, Chris; Wein, Samuel; Jeong, Kyowon; Netz, Eugen; Walter, Axel; Alka, Oliver; Nilse, Lars; Colaianni, Pasquale Domenico; McCloskey, Douglas; Kim, Jihyung; Rosenberger, George; Bichmann, Leon; Walzer, Mathias; Veit, Johannes; Boudaud, Bertrand; Bernt, Matthias; Patikas, Nikolaos; Pilz, Matteo; Startek, Michal Piotr; Kutuzova, Svetlana; Heumos, Lukas; Charkow, Joshua; Sing, Justin Cyril; Feroz, Ayesha; Siraj, Arslan; Weisser, Hendrik; Dijkstra, Tjeerd M H; Perez-Riverol, Yasset; Röst, Hannes; Kohlbacher, Oliver; Sachsenberg, Timo.

Nat Methods ; 21(3): 365-367, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38366242

Subjects

Software, Mass Spectrometry/methods, Spectrum Analysis

2.

TopDownApp: An open and modular platform for analysis and visualisation of top-down proteomics data.

Walzer, Mathias; Jeong, Kyowon; Tabb, David L; Vizcaíno, Juan Antonio.

Proteomics ; 24(3-4): e2200403, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37787899

ABSTRACT

Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.

Subjects

Proteins, Proteomics, Proteomics/methods, Proteins/analysis, Workflow

3.

Precursor deconvolution error estimation: The missing puzzle piece in false discovery rate in top-down proteomics.

Jeong, Kyowon; Kaulich, Philipp T; Jung, Wonhyeuk; Kim, Jihyung; Tholey, Andreas; Kohlbacher, Oliver.

Proteomics ; 24(3-4): e2300068, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37997224

ABSTRACT

Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics (BUP) that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which has primarily been established for BUP. We present evidence that the TDA-based FDR estimation may not work at the proteoform-level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact protein-level FDR rather than proteoform-level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform-level FDR bias by combining TDA-based FDR and precursor deconvolution error rate.

Subjects

Peptides, Proteomics, DNA-Binding Proteins

4.

MASH Native: a unified solution for native top-down proteomics data processing.

Larson, Eli J; Pergande, Melissa R; Moss, Michelle E; Rossler, Kalina J; Wenger, R Kent; Krichel, Boris; Josyer, Harini; Melby, Jake A; Roberts, David S; Pike, Kyndalanne; Shi, Zhuoxin; Chan, Hsin-Ju; Knight, Bridget; Rogers, Holden T; Brown, Kyle A; Ong, Irene M; Jeong, Kyowon; Marty, Michael T; McIlwain, Sean J; Ge, Ying.

Bioinformatics ; 39(6), 2023 06 01.

Article in English | MEDLINE | ID: mdl-37294807

ABSTRACT

MOTIVATION: Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software developments, a unified and user-friendly software package for analysis of nTDP data remains lacking. RESULTS: We have developed MASH Native to provide a unified solution for nTDP to process complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a "one-stop shop" for characterizing both native protein complexes and proteoforms. AVAILABILITY AND IMPLEMENTATION: The MASH Native app, video tutorials, written tutorials, and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHSoftware.php. All data files shown in user tutorials are included with the MASH Native software in the download .zip file.

Subjects

Proteomics, Software, Factual Databases, DNA-Binding Proteins, Mass Spectrometry, Proteomics/methods

5.

Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection.

Tabb, David L; Jeong, Kyowon; Druart, Karen; Gant, Megan S; Brown, Kyle A; Nicora, Carrie; Zhou, Mowei; Couvillion, Sneha; Nakayasu, Ernesto; Williams, Janet E; Peterson, Haley K; McGuire, Michelle K; McGuire, Mark A; Metz, Thomas O; Chamot-Rooke, Julia.

J Proteome Res ; 22(7): 2199-2217, 2023 07 07.

Article in English | MEDLINE | ID: mdl-37235544

ABSTRACT

Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.

Subjects

Proteome, Tandem Mass Spectrometry, Humans, Proteome/genetics, Proteomics, Software, Post-Translational Protein Processing

6.

MASH Native: A Unified Solution for Native Top-Down Proteomics Data Processing.

Larson, Eli J; Pergande, Melissa R; Moss, Michelle E; Rossler, Kalina J; Wenger, R Kent; Krichel, Boris; Josyer, Harini; Melby, Jake A; Roberts, David S; Pike, Kyndalanne; Shi, Zhuoxin; Chan, Hsin-Ju; Knight, Bridget; Rogers, Holden T; Brown, Kyle A; Ong, Irene M; Jeong, Kyowon; Marty, Michael; McIlwain, Sean J; Ge, Ying.

bioRxiv ; 2023 Jan 03.

Article in English | MEDLINE | ID: mdl-36711733

ABSTRACT

Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software developments, a unified and user-friendly software package for analysis of nTDP data remains lacking. Herein, we have developed MASH Native to provide a unified solution for nTDP to process complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a one-stop shop for characterizing both native protein complexes and proteoforms. The MASH Native app, video tutorials, written tutorials and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHNativeSoftware.php. All data files shown in user tutorials are included with the MASH Native software in the download .zip file.

7.

Native metabolomics identifies the rivulariapeptolide family of protease inhibitors.

Reher, Raphael; Aron, Allegra T; Fajtová, Pavla; Stincone, Paolo; Wagner, Berenike; Pérez-Lorente, Alicia I; Liu, Chenxi; Shalom, Ido Y Ben; Bittremieux, Wout; Wang, Mingxun; Jeong, Kyowon; Matos-Hernandez, Marie L; Alexander, Kelsey L; Caro-Diaz, Eduardo J; Naman, C Benjamin; Scanlan, J H William; Hochban, Phil M M; Diederich, Wibke E; Molina-Santiago, Carlos; Romero, Diego; Selim, Khaled A; Sass, Peter; Brötz-Oesterhelt, Heike; Hughes, Chambers C; Dorrestein, Pieter C; O'Donoghue, Anthony J; Gerwick, William H; Petras, Daniel.

Nat Commun ; 13(1): 4619, 2022 08 08.

Article in English | MEDLINE | ID: mdl-35941113

ABSTRACT

The identity and biological activity of most metabolites still remain unknown. A bottleneck in the exploration of metabolite structures and pharmaceutical activities is the compound purification needed for bioactivity assignments and downstream structure elucidation. To enable bioactivity-focused compound identification from complex mixtures, we develop a scalable native metabolomics approach that integrates non-targeted liquid chromatography tandem mass spectrometry and detection of protein binding via native mass spectrometry. A native metabolomics screen for protease inhibitors from an environmental cyanobacteria community reveals 30 chymotrypsin-binding cyclodepsipeptides. Guided by the native metabolomics results, we select and purify five of these compounds for full structure elucidation via tandem mass spectrometry, chemical derivatization, and nuclear magnetic resonance spectroscopy as well as evaluation of their biological activities. These results identify rivulariapeptolides as a family of serine protease inhibitors with nanomolar potency, highlighting native metabolomics as a promising approach for drug discovery, chemical ecology, and chemical biology studies.

Subjects

Metabolomics, Protease Inhibitors, Liquid Chromatography/methods, Magnetic Resonance Spectroscopy/methods, Metabolomics/methods, Protease Inhibitors/pharmacology, Tandem Mass Spectrometry/methods

8.

FLASHIda enables intelligent data acquisition for top-down proteomics to boost proteoform identification counts.

Jeong, Kyowon; Babovic, Masa; Gorshkov, Vladimir; Kim, Jihyung; Jensen, Ole N; Kohlbacher, Oliver.

Nat Commun ; 13(1): 4407, 2022 07 29.

Article in English | MEDLINE | ID: mdl-35906205

ABSTRACT

The detailed analysis and structural characterization of proteoforms by top-down proteomics (TDP) has gained a lot of interest in biomedical research. Data-dependent acquisition (DDA) of intact proteins is non-trivial due to the diversity and complexity of proteoforms. Dedicated acquisition methods thus have the potential to greatly improve TDP. Here, we present FLASHIda, an intelligent online data acquisition algorithm for TDP that ensures the real-time selection of high-quality precursors of diverse proteoforms. FLASHIda combines fast charge deconvolution algorithms and machine learning-based quality assessment for optimal precursor selection. In an analysis of E. coli lysate, FLASHIda increases the number of unique proteoform level identifications from 800 to 1500 or generates a near-identical number of identifications in one third of the instrument time when compared to standard DDA mode. Furthermore, FLASHIda enables sensitive mapping of post-translational modifications and detection of chemical adducts. As a software extension module to the instrument, FLASHIda can be readily adopted for TDP studies of complex samples to enhance proteoform identification rates.

Subjects

Proteome, Proteomics, Sarcoplasmic Reticulum Calcium-Transporting ATPases/antagonists & inhibitors, DNA-Binding Proteins/metabolism, Escherichia coli/metabolism, Heart, Peptides, Post-Translational Protein Processing, Proteome/metabolism, Proteomics/methods

9.

Mass Deconvolution of Top-Down Mass Spectrometry Datasets by FLASHDeconv.

Jeong, Kyowon; Kim, Jihyung; Kohlbacher, Oliver.

Methods Mol Biol ; 2500: 145-157, 2022.

Article in English | MEDLINE | ID: mdl-35657592

ABSTRACT

Mass deconvolution, the determination of proteoform precursor and fragment masses, is crucial for top-down proteomics data analysis. Here we describe the detailed procedure to run FLASHDeconv, an ultrafast, high-quality mass deconvolution tool. Both spectrum- and feature-level deconvolution results are obtainable in various output formats by FLASHDeconv. FLASHDeconv is runnable in different environments such as the command line and OpenMS workflows.

Subjects

Data Analysis, Proteomics, Mass Spectrometry, Proteomics/methods

10.

MS1-Level Proteome Quantification Platform Allowing Maximally Increased Multiplexity for SILAC and In Vitro Chemical Labeling.

Choi, Yeon; Jeong, Kyowon; Shin, Sanghee; Lee, Joon Won; Lee, Young-Suk; Kim, Sangtae; Kim, Sun Ah; Jung, Jaehun; Kim, Kwang Pyo; Kim, V Narry; Kim, Jong-Seo.

Anal Chem ; 92(7): 4980-4989, 2020 04 07.

Article in English | MEDLINE | ID: mdl-32167278

ABSTRACT

Quantitative proteomic platforms based on precursor intensity in mass spectrometry (MS1-level) uniquely support in vivo metabolic labeling with superior quantification accuracy but suffer from limited multiplexity (≤3-plex) and frequent missing quantities. Here we present a new MS1-level quantification platform that allows maximal multiplexing with high quantification accuracy and precision for the given labeling scheme. The platform currently comprises 6-plex in vivo SILAC or in vitro diethylation labeling with a dedicated algorithm and is also expandable to higher multiplexity (e.g., nine-plex for SILAC). For complex samples with broad dynamic ranges such as total cell lysates, our platform performs highly accurately and free of missing quantities. Furthermore, we successfully applied our method to measure protein synthesis rate under heat shock response in human cells by 6-plex pulsed SILAC experiments, demonstrating the unique biological merits of our in vivo platform to disclose translational regulations for cellular response to stress.

Subjects

Neoplasm Proteins/analysis, Proteome/analysis, HeLa Cells, Humans, Mass Spectrometry, Cultured Tumor Cells

11.

FLASHDeconv: Ultrafast, High-Quality Feature Deconvolution for Top-Down Proteomics.

Jeong, Kyowon; Kim, Jihyung; Gaikwad, Manasi; Hidayah, Siti Nurul; Heikaus, Laura; Schlüter, Hartmut; Kohlbacher, Oliver.

Cell Syst ; 10(2): 213-218.e6, 2020 02 26.

Article in English | MEDLINE | ID: mdl-32078799

ABSTRACT

Top-down mass spectrometry (TD-MS)-based proteomics analyzes intact proteoforms and thus preserves information about individual protein species. The MS signal of these high-mass analytes is complex and challenges the accurate determination of proteoform masses. Fast and accurate feature deconvolution (i.e., the determination of intact proteoform masses) is, therefore, an essential step for TD data analysis. Here, we present FLASHDeconv, an algorithm achieving higher deconvolution quality, with an execution speed two orders of magnitude faster than existing approaches. FLASHDeconv transforms peak positions (m/z) within spectra into log m/z space. This simple transformation turns the deconvolution problem into a search for constant patterns, thereby greatly accelerating the process. In both simple and complex samples, FLASHDeconv reports more genuine feature masses and substantially fewer artifacts than other existing methods. FLASHDeconv is freely available for download here: https://www.openms.org/flashdeconv/. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.

Subjects

Proteomics/methods, Algorithms, Humans
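
The FLASHDeconv abstract above says that moving peak positions into log m/z space turns deconvolution into a search for constant patterns. The following is a minimal sketch of that idea only, not the FLASHDeconv implementation. It assumes protonated ions and subtracts the proton mass before taking the logarithm (an assumption made here so that the pattern is exact); the function names and the example masses are hypothetical.

```python
import math

PROTON_MASS = 1.007276  # Da; assumed charge carrier (protonation)


def charge_series_mz(mass, charges):
    """Return the m/z value of a molecule of neutral mass `mass` at each charge."""
    return [(mass + z * PROTON_MASS) / z for z in charges]


def log_offsets(mass, charges):
    """Return log(m/z - proton mass) per charge, relative to the first charge.

    Since m/z - proton = mass / z, the log value is log(mass) - log(z),
    so the offsets depend only on the charges, not on the mass.
    """
    logs = [math.log(mz - PROTON_MASS) for mz in charge_series_mz(mass, charges)]
    return [value - logs[0] for value in logs]


charges = list(range(1, 6))
small = log_offsets(10_000.0, charges)  # hypothetical 10 kDa proteoform
large = log_offsets(50_000.0, charges)  # hypothetical 50 kDa proteoform

# Both masses give the same offset pattern: 0, -log 2, -log 3, ...
for z, a, b in zip(charges, small, large):
    print(z, round(a, 6), round(b, 6))
```

Because the offsets are identical for every mass, one fixed template of charge offsets can be slid along the log-transformed spectrum, which is the kind of constant-pattern search the abstract refers to.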

12.

Deuterium-Free, Three-Plexed Peptide Diethylation for Highly Accurate Quantitative Proteomics.

Jung, Jaehun; Jeong, Kyowon; Choi, Yeon; Kim, Sun Ah; Kim, Hyunjoon; Lee, Joon Won; Kim, V Narry; Kim, Kwang Pyo; Kim, Jong-Seo.

J Proteome Res ; 18(3): 1078-1087, 2019 03 01.

Article in English | MEDLINE | ID: mdl-30638020

ABSTRACT

The deuterium, a frequently used stable isotope in isotopic labeling for quantitative proteomics, could deteriorate the accuracy and precision of proteome quantification owing to the retention time shift of deuterated peptides from the hydrogenated counterpart. We introduce a novel three-plexed peptide "diethylation" using only 13C isotopologues of acetaldehyde and demonstrate that the accuracy and precision of our method in proteome quantification are significantly superior to the conventional deuterium-based dimethylation labeling in both a single-shot and multidimensional LC-MS/MS analysis of the HeLa proteome. Furthermore, in time-resolved profiling of Xenopus laevis early embryogenesis, our 3-plexed diethylation outperformed isobaric labeling approaches in terms of the quantification accuracy or the number of protein identifications, generating more than two times more differentially expressed proteins. Our cost-effective and highly accurate 3-plexed diethylation method could contribute to various types of quantitative proteomics applications in which three of multiplexity would be sufficient.

Subjects

Embryonic Development/genetics, Proteome/genetics, Proteomics/methods, Xenopus laevis/genetics, Animals, Liquid Chromatography, Deuterium/chemistry, Developmental Gene Expression Regulation/genetics, HeLa Cells, Humans, Isotope Labeling, Tandem Mass Spectrometry, Xenopus laevis/growth & development

13.

Genome-wide Mapping of DROSHA Cleavage Sites on Primary MicroRNAs and Noncanonical Substrates.

Kim, Baekgyu; Jeong, Kyowon; Kim, V Narry.

Mol Cell ; 66(2): 258-269.e5, 2017 Apr 20.

Article in English | MEDLINE | ID: mdl-28431232

ABSTRACT

MicroRNA (miRNA) maturation is initiated by DROSHA, a double-stranded RNA (dsRNA)-specific RNase III enzyme. By cleaving primary miRNAs (pri-miRNAs) at specific positions, DROSHA serves as a main determinant of miRNA sequences and a highly selective gatekeeper for the canonical miRNA pathway. However, the sites of DROSHA-mediated processing have not been annotated, and it remains unclear to what extent DROSHA functions outside the miRNA pathway. Here, we establish a protocol termed "formaldehyde crosslinking, immunoprecipitation, and sequencing (fCLIP-seq)," which allows identification of DROSHA cleavage sites at single-nucleotide resolution. fCLIP identifies numerous processing sites, suggesting widespread end modifications during miRNA maturation. fCLIP also finds many pri-miRNAs that undergo alternative processing, yielding multiple miRNA isoforms. Moreover, we discovered dozens of DROSHA substrates on non-miRNA loci, which may serve as cis-elements for DROSHA-mediated gene regulation. We anticipate that fCLIP-seq could be a general tool for investigating interactions between dsRNA-binding proteins and structured RNAs.

Subjects

High-Throughput Nucleotide Sequencing, MicroRNAs/metabolism, Post-Transcriptional RNA Processing, Ribonuclease III/metabolism, RNA Sequence Analysis/methods, Base Sequence, Binding Sites, Cross-Linking Reagents/chemistry, Formaldehyde/chemistry, HEK293 Cells, HeLa Cells, Humans, Immunoprecipitation, MicroRNAs/chemistry, MicroRNAs/genetics, Nucleic Acid Conformation, Protein Binding, RNA Interference, Ribonuclease III/chemistry, Ribonuclease III/genetics, Structure-Activity Relationship, Substrate Specificity, Transfection

14.

Virmid: accurate detection of somatic mutations with sample impurity inference.

Kim, Sangwoo; Jeong, Kyowon; Bhutani, Kunal; Lee, Jeong; Patel, Anand; Scott, Eric; Nam, Hojung; Lee, Hayan; Gleeson, Joseph G; Bafna, Vineet.

Genome Biol ; 14(8): R90, 2013 Aug 29.

Article in English | MEDLINE | ID: mdl-23987214

ABSTRACT

Detection of somatic variation using sequence from disease-control matched data sets is a critical first step. In many cases including cancer, however, it is hard to isolate pure disease tissue, and the impurity hinders accurate mutation analysis by disrupting overall allele frequencies. Here, we propose a new method, Virmid, that explicitly determines the level of impurity in the sample, and uses it for improved detection of somatic variation. Extensive tests on simulated and real sequencing data from breast cancer and hemimegalencephaly demonstrate the power of our model. A software implementation of our method is available at http://sourceforge.net/projects/virmid/.

Subjects

Breast Neoplasms/genetics, Hemimegalencephaly/genetics, Mutation, Neoplasm Proteins/genetics, Software, Tumor Microenvironment/genetics, Alleles, Breast Neoplasms/diagnosis, Exome, Female, Gene Frequency, Hemimegalencephaly/diagnosis, High-Throughput Nucleotide Sequencing, Humans, Likelihood Functions
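
The Virmid abstract above notes that sample impurity disrupts overall allele frequencies. A simplified illustration of why, under assumptions that are not taken from the paper: a diploid genome, no copy-number change, and an impurity level `alpha` defined as the fraction of control (non-disease) cells in the disease sample. The function name is hypothetical and this is not Virmid's statistical model.

```python
def expected_variant_fraction(alpha, variant_copies=1, ploidy=2):
    """Expected fraction of reads carrying a somatic variant.

    alpha          -- assumed fraction of control cells in the disease sample
    variant_copies -- copies of the variant allele in each disease cell
    ploidy         -- total allele copies per cell (diploid by assumption)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # Only the (1 - alpha) share of disease cells contributes variant reads.
    return (1.0 - alpha) * variant_copies / ploidy


for alpha in (0.0, 0.2, 0.4, 0.8):
    print(alpha, expected_variant_fraction(alpha))
```

In this simplified picture a heterozygous somatic variant that would appear at a fraction of 0.5 in a pure sample shrinks toward zero as impurity grows, which is why estimating the impurity level first can help a caller separate weak true variants from sequencing noise.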

15.

UniNovo: a universal tool for de novo peptide sequencing.

Jeong, Kyowon; Kim, Sangtae; Pevzner, Pavel A.

Bioinformatics ; 29(16): 1953-62, 2013 Aug 15.

Article in English | MEDLINE | ID: mdl-23766417

ABSTRACT

MOTIVATION: Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms to analyze tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD), Higher-energy Collisional Dissociation (HCD) spectra or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectra, we develop a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra or even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependences between different ion types, where such dependencies are learned automatically using a modified offset frequency function. RESULTS: The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that the performance of UniNovo is superior to other tools for ETD spectra and superior or comparable with others for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides). AVAILABILITY: UniNovo is implemented in JAVA and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual.

Subjects

Peptides/chemistry, Protein Sequence Analysis/methods, Software, Algorithms, Tandem Mass Spectrometry

16.

Wessim: a whole-exome sequencing simulator based on in silico exome capture.

Kim, Sangwoo; Jeong, Kyowon; Bafna, Vineet.

Bioinformatics ; 29(8): 1076-7, 2013 Apr 15.

Article in English | MEDLINE | ID: mdl-23413434

ABSTRACT

SUMMARY: We propose a targeted re-sequencing simulator Wessim that generates synthetic exome sequencing reads from a given sample genome. Wessim emulates conventional exome capture technologies, including Agilent's SureSelect and NimbleGen's SeqCap, to generate DNA fragments from genomic target regions. The target regions can be either specified by genomic coordinates or inferred from in silico probe hybridization. Coupled with existing next-generation sequencing simulators, Wessim generates a realistic artificial exome sequencing data, which is essential for developing and evaluating exome-targeted variant callers. AVAILABILITY: Source code and the packaged version of Wessim with manuals are available at http://sak042.github.com/Wessim/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Exome, DNA Sequence Analysis/methods, Software, Computer Simulation, Human Genome, Genomics/methods, Humans

17.

False discovery rates in spectral identification.

Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno.

BMC Bioinformatics ; 13 Suppl 16: S2, 2012.

Article in English | MEDLINE | ID: mdl-23176207

ABSTRACT

Automated database search engines are one of the fundamental engines of high-throughput proteomics enabling daily identifications of hundreds of thousands of peptides and proteins from tandem mass (MS/MS) spectrometry data. Nevertheless, this automation also makes it humanly impossible to manually validate the vast lists of resulting identifications from such high-throughput searches. This challenge is usually addressed by using a Target-Decoy Approach (TDA) to impose an empirical False Discovery Rate (FDR) at a pre-determined threshold x% with the expectation that at most x% of the returned identifications would be false positives. But despite the fundamental importance of FDR estimates in ensuring the utility of large lists of identifications, there is surprisingly little consensus on exactly how TDA should be applied to minimize the chances of biased FDR estimates. In fact, since less rigorous TDA/FDR estimates tend to result in more identifications (at higher 'true' FDR), there is often little incentive to enforce strict TDA/FDR procedures in studies where the major metric of success is the size of the list of identifications and there are no follow up studies imposing hard cost constraints on the number of reported false positives. Here we address the problem of the accuracy of TDA estimates of empirical FDR. Using MS/MS spectra from samples where we were able to define a factual FDR estimator of 'true' FDR we evaluate several popular variants of the TDA procedure in a variety of database search contexts. We show that the fraction of false identifications can sometimes be over 10× higher than reported and may be unavoidably high for certain types of searches. In addition, we further report that the two-pass search strategy seems the most promising database search strategy. 
While unavoidably constrained by the particulars of any specific evaluation dataset, our observations support a series of recommendations towards maximizing the number of resulting identifications while controlling database searches with robust and reproducible TDA estimation of empirical FDR.

Subjects

Protein Databases/statistics & numerical data, Proteomics/statistics & numerical data, Search Engine/methods, Tandem Mass Spectrometry/statistics & numerical data, Algorithms, Peptides/chemistry, Proteins/chemistry
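
The abstract above describes using a Target-Decoy Approach (TDA) to impose an empirical FDR at a chosen threshold. The sketch below shows one common, simple variant (decoy hits divided by target hits above a score cutoff). The abstract stresses that there is little consensus on how TDA should be applied, so this is an illustrative assumption rather than the procedure the paper recommends; the function names and the example scores are hypothetical.

```python
def tda_fdr(psms, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) at or above `threshold`.

    psms -- list of (score, is_decoy) pairs, higher score meaning a better match
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def score_threshold_for_fdr(psms, max_fdr):
    """Return the lowest score cutoff whose estimated FDR is at most `max_fdr`."""
    best = None
    for cutoff in sorted({score for score, _ in psms}, reverse=True):
        if tda_fdr(psms, cutoff) <= max_fdr:
            best = cutoff  # keep lowering the cutoff while the estimate allows it
    return best


# Hypothetical search results: (score, is_decoy)
psms = [(9.0, False), (8.5, False), (8.0, False), (7.0, True),
        (6.5, False), (6.0, False), (5.0, True), (4.0, False)]

print(tda_fdr(psms, 6.0))                  # 1 decoy / 5 targets
print(score_threshold_for_fdr(psms, 0.2))  # lowest cutoff with estimate <= 0.2
```

The estimate is not monotonic in the cutoff, so the search checks every distinct score instead of stopping at the first failure.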

18.

Gapped spectral dictionaries and their applications for database searches of tandem mass spectra.

Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.

Mol Cell Proteomics ; 10(6): M110.002220, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21444829

ABSTRACT

Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-Gapped-Dictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-Gapped-Dictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.

Subjects

Protein Databases, Protein Sequence Analysis/methods, Tandem Mass Spectrometry/methods, Algorithms, Bacterial Proteins/chemistry, Bacterial Proteins/metabolism, HEK293 Cells, Humans, Shewanella/metabolism
