Search | BVS - Ministry of Health

1.

Aguilar, Eduardo J; Barbosa, Valmir C.

PLoS One ; 18(5): e0286312, 2023.

Article in English | MEDLINE | ID: mdl-37235568

ABSTRACT

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called "midrange" distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.
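The baseline that the abstract calls the workhorse, dividing the data by the standard deviation along each dimension, can be sketched as follows. This is only a minimal illustration of that conventional preprocessing step, not the shape-complexity method of the paper; the function name `scale_by_std` and the toy data are hypothetical, and the population form of the standard deviation is an assumption.

```python
import statistics


def scale_by_std(samples):
    """Divide every dimension by its (population) standard deviation.

    `samples` is a list of equal-length rows. Returns the scaled rows and
    the per-dimension scaling factors that were applied.
    """
    columns = list(zip(*samples))
    stds = [statistics.pstdev(col) for col in columns]
    # Assumption: constant dimensions are left unscaled to avoid dividing by zero.
    factors = [1.0 / s if s > 0 else 1.0 for s in stds]
    scaled = [[x * f for x, f in zip(row, factors)] for row in samples]
    return scaled, factors


# Toy data: the second dimension would dominate Euclidean distances unscaled.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled, factors = scale_by_std(data)
```

After this step every dimension has unit standard deviation, so a distance-based method such as k-means weighs the dimensions equally; the abstract's point is that other choices of scaling factors may partition the data better.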

Subjects

Algorithms, Cluster Analysis

2.

PatternLab V Handles Multiplex Spectra in Shotgun Proteomic Searches and Increases Identification.

Clasen, Milan A; Santos, Marlon D M; Kurt, Louise Ulrich; Fischer, Juliana; Camillo-Andrade, Amanda C; Sales, Lucas Albuquerque; de Arruda Campos Brasil de Souza, Tatiana; Lima, Diogo Borges; Gozzo, Fabio C; Valente, Richard Hemmi; Duran, Rosario; Barbosa, Valmir C; Carvalho, Paulo C.

J Am Soc Mass Spectrom ; 34(4): 794-796, 2023 Apr 05.

Article in English | MEDLINE | ID: mdl-36947430

ABSTRACT

Complex protein mixtures typically generate many tandem mass spectra produced by different peptides coisolated in the gas phase. Widely adopted proteomic data analysis environments usually fail to identify most of these spectra, succeeding at best in identifying only one of the multiple cofragmenting peptides. We present PatternLab V (PLV), an updated version of PatternLab that integrates the YADA 3 deconvolution algorithm to handle such cases efficiently. In general, we expect an increase of 10% in spectral identifications when dealing with complex proteomic samples. PLV is freely available at http://patternlabforproteomics.org.

Subjects

Peptides, Proteomics, Peptides/analysis, Proteins/analysis, Algorithms, Tandem Mass Spectrometry, Protein Databases, Software

3.

DiagnoMass: A proteomics hub for pinpointing discriminative spectral clusters.

Santos, Marlon D M; Camillo-Andrade, Amanda C; Lima, Diogo B; Souza, Tatiana A C B; Fischer, Juliana de S da G; Valente, Richard H; Gozzo, Fabio C; Barbosa, Valmir C; Batthyany, Carlos; Chamot-Rooke, Julia; Duran, Rosario; Carvalho, Paulo C.

J Proteomics ; 277: 104853, 2023 04 15.

Article in English | MEDLINE | ID: mdl-36804625

ABSTRACT

MOTIVATION: There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins using shotgun proteomic data; examples are peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, although some discriminative peptides remain unidentified. RESULTS: We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists those clusters that are discriminative for biological conditions. DiagnoMass then communicates with proteomic tools to attempt the identification of such clusters. We demonstrate the effectiveness of DiagnoMass by analyzing proteomic data from Escherichia coli, Salmonella, and Shigella, listing many high-quality discriminative spectral clusters that had thus far remained unidentified by widely adopted proteomic tools. DiagnoMass can also classify proteomic profiles. We anticipate the use of DiagnoMass as a vital tool for pinpointing biomarkers. AVAILABILITY: DiagnoMass and related documentation, including a usage protocol, are available at http://www.diagnomass.com.

Subjects

Proteomics, Software, Proteomics/methods, Proteins/chemistry, Peptides/chemistry, Escherichia coli, Algorithms, Protein Databases

4.

Increasing confidence in proteomic spectral deconvolution through mass defect.

Clasen, Milan A; Kurt, Louise U; Santos, Marlon D M; Lima, Diogo B; Liu, Fan; Gozzo, Fabio C; Barbosa, Valmir C; Carvalho, Paulo C.

Bioinformatics ; 38(22): 5119-5120, 2022 11 15.

Article in English | MEDLINE | ID: mdl-36130273

ABSTRACT

MOTIVATION: Confident deconvolution of proteomic spectra is critical for several applications such as de novo sequencing, cross-linking mass spectrometry and handling chimeric mass spectra. RESULTS: In general, all deconvolution algorithms may eventually report mass peaks that are not compatible with the chemical formula of any peptide. We show how to remove these artifacts by considering their mass defects. We introduce Y.A.D.A. 3.0, a fast deconvolution algorithm that can remove peaks with unacceptable mass defects. Our approach is effective for polypeptides with less than 10 kDa, and its essence can be easily incorporated into any deconvolution algorithm. AVAILABILITY AND IMPLEMENTATION: Y.A.D.A. 3.0 is freely available for academic use at http://patternlabforproteomics.org/yada3. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.

Subjects

Algorithms, Proteomics, Peptides, Mass Spectrometry/methods, Software

5.

Simple, efficient and thorough shotgun proteomic analysis with PatternLab V.

Santos, Marlon D M; Lima, Diogo B; Fischer, Juliana S G; Clasen, Milan A; Kurt, Louise U; Camillo-Andrade, Amanda Caroline; Monteiro, Leandro C; de Aquino, Priscila F; Neves-Ferreira, Ana G C; Valente, Richard H; Trugilho, Monique R O; Brunoro, Giselle V F; Souza, Tatiana A C B; Santos, Renata M; Batista, Michel; Gozzo, Fabio C; Durán, Rosario; Yates, John R; Barbosa, Valmir C; Carvalho, Paulo C.

Nat Protoc ; 17(7): 1553-1578, 2022 07.

Article in English | MEDLINE | ID: mdl-35411045

ABSTRACT

Shotgun proteomics aims to identify and quantify the thousands of proteins in complex mixtures such as cell and tissue lysates and biological fluids. This approach uses liquid chromatography coupled with tandem mass spectrometry and typically generates hundreds of thousands of mass spectra that require specialized computational environments for data analysis. PatternLab for proteomics is a unified computational environment for analyzing shotgun proteomic data. PatternLab V (PLV) is the most comprehensive and crucial update so far, the result of intensive interaction with the proteomics community over several years. All PLV modules have been optimized and its graphical user interface has been completely updated for improved user experience. Major improvements were made to all aspects of the software, ranging from boosting the number of protein identifications to faster extraction of ion chromatograms. PLV provides modules for preparing sequence databases, protein identification, statistical filtering and in-depth result browsing for both labeled and label-free quantitation. The PepExplorer module can even pinpoint de novo sequenced peptides not already present in the database. PLV is of broad applicability and therefore suitable for challenging experimental setups, such as time-course experiments and data handling from unsequenced organisms. PLV interfaces with widely adopted software and community initiatives, e.g., Comet, Skyline, PEAKS and PRIDE. It is freely available at http://www.patternlabforproteomics.org.

Subjects

Proteomics, Software, Protein Databases, Proteins/chemistry, Proteomics/methods, Tandem Mass Spectrometry

6.

Leveraging the partition selection bias to achieve a high-quality clustering of mass spectra.

Silva, André R F; Lima, Diogo B; Kurt, Louise U; Dupré, Mathieu; Chamot-Rooke, Julia; Santos, Marlon D M; Nicolau, Carolina Alves; Valente, Richard Hemmi; Barbosa, Valmir C; Carvalho, Paulo C.

J Proteomics ; 245: 104282, 2021 08 15.

Article in English | MEDLINE | ID: mdl-34089898

ABSTRACT

In proteomics, the identification of peptides from mass spectral data can be mathematically described as the partitioning of mass spectra into clusters (i.e., groups of spectra derived from the same peptide). The way partitions are validated is just as important, having evolved side by side with the clustering algorithms themselves and given rise to many partition assessment measures. An assessment measure is said to have a selection bias if, and only if, the probability that a randomly chosen partition scores a high value depends on the number of clusters in the partition. In the context of clustering mass spectra, this might mislead the validation process to favor clustering algorithms that generate too many (or few) spectral clusters, regardless of the underlying peptide sequence. A selection bias toward the number of peptides is desirable for proteomics as it estimates the number of peptides in a complex protein mixture. Here, we introduce an assessment measure that is purposely biased toward the number of peptide ion species. We also introduce a partition assessment framework for proteomics, called the Partition Assessment Tool, and demonstrate its importance by evaluating the performance of eight clustering algorithms on seven proteomics datasets while discussing the trade-offs involved. SIGNIFICANCE: Clustering algorithms are widely adopted in proteomics for undertaking several tasks such as speeding up search engines, generating consensus mass spectra, and aiding in the classification of proteomic profiles. Choosing which algorithm is most fit for the task at hand is not simple as each algorithm has advantages and disadvantages; furthermore, specifying clustering parameters is also a necessary and fundamental step, for example, deciding whether to generate "pure clusters" or fewer clusters while accepting noise. With this as motivation, we verify the performance of several widely adopted algorithms on proteomic datasets and introduce a theoretical framework for drawing conclusions on which approach is suitable for the task at hand.

Subjects

Proteomics, Software, Algorithms, Cluster Analysis, Protein Databases, Selection Bias, Tandem Mass Spectrometry

7.

Interspecies evolutionary dynamics mediated by public goods in bacterial quorum sensing.

Aguilar, Eduardo J; Barbosa, Valmir C; Donangelo, Raul; Souza, Sergio R.

Phys Rev E ; 103(1-1): 012403, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33601496

ABSTRACT

Bacterial quorum sensing is the communication that takes place between bacteria as they secrete certain molecules into the intercellular medium that later get absorbed by the secreting cells themselves and by others. Depending on cell density, this uptake has the potential to alter gene expression and thereby affect global properties of the community. We consider the case of multiple bacterial species coexisting, referring to each one of them as a genotype and adopting the usual denomination of the molecules they collectively secrete as public goods. A crucial problem in this setting is characterizing the coevolution of genotypes as some of them secrete public goods (and pay the associated metabolic costs) while others do not but may nevertheless benefit from the available public goods. We introduce a network model to describe genotype interaction and evolution when genotype fitness depends on the production and uptake of public goods. The model comprises a random graph to summarize the possible evolutionary pathways the genotypes may take as they interact genetically with one another, and a system of coupled differential equations to characterize the behavior of genotype abundance in time. We study some simple variations of the model analytically and more complex variations computationally. Our results point to a simple trade-off affecting the long-term survival of those genotypes that do produce public goods. This trade-off involves, on the producer side, the impact of producing and that of absorbing the public good. On the nonproducer side, it involves the impact of absorbing the public good as well, now compounded by the molecular compatibility between the producer and the nonproducer. Depending on how these factors turn out, producers may or may not survive.

Subjects

Bacteria/cytology, Biological Evolution, Quorum Sensing, Bacteria/genetics, Biological Models

8.

Mixed-Data Acquisition: Next-Generation Quantitative Proteomics Data Acquisition.

Santos, Marlon D M; Camillo-Andrade, Amanda Caroline; Kurt, Louise U; Clasen, Milan A; Lyra, Eduardo; Gozzo, Fabio C; Batista, Michel; Valente, Richard H; Brunoro, Giselle V F; Barbosa, Valmir C; Fischer, Juliana S G; Carvalho, Paulo C.

J Proteomics ; 222: 103803, 2020 06 30.

Article in English | MEDLINE | ID: mdl-32387712

ABSTRACT

We present the Mixed-Data Acquisition (MDA) strategy for mass spectrometry data acquisition. MDA combines Data-Dependent Acquisition (DDA) and Data-Independent Acquisition (DIA) in the same run, thus doing away with the requirements for separate DDA spectral libraries. MDA is a natural result from advances in mass spectrometry, such as high scan rates and multiple analyzers, and is tailored toward exploiting these features. We demonstrate MDA's effectiveness on a yeast proteome analysis by overcoming a common bottleneck for XIC-based label-free quantitation; namely, the coelution of precursors when m/z values cannot be distinguished. We anticipate that MDA will become the next mainstream data generation approach for proteomics. MDA can also serve as an orthogonal validation approach for DDA experiments. Specialized software for MDA data analysis is made available on the project's website.

Subjects

Proteome, Proteomics, Mass Spectrometry, Software

9.

A quantitation module for isotope-labeled peptides integrated into PatternLab for proteomics.

Santos, Marlon D M; Lima, Diogo B; Silva, André R F; Kurt, Louise U; Clasen, Milan A; Pinto, Antônio F M; Moresco, James J; Yates, John R; Aquino, Priscila; Barbosa, Valmir C; Fischer, Juliana S G; Carvalho, Paulo C.

J Proteomics ; 202: 103371, 2019 06 30.

Article in English | MEDLINE | ID: mdl-31034900

ABSTRACT

We present a new module integrated into the widely adopted PatternLab for proteomics to enable analysis of isotope-labeled peptides produced using dimethyl or SILAC. The accurate quantitation of proteins lies at the heart of proteomics; dimethylation has been shown to be reliable, inexpensive, and applicable to any sample type. We validate our algorithm using an M. tuberculosis dataset obtained from two biological conditions; we used three dimethyl labels, one serving as an internal control for labeling a mixture of samples from both biological conditions. This internal control certified the proper functioning of our software. Availability: http://patternlabforproteomics.org, freely available for academic use.

Subjects

Algorithms, Bacterial Proteins/metabolism, Protein Databases, Isotope Labeling, Mycobacterium tuberculosis/metabolism, Peptides/chemistry, Proteomics/standards, Bacterial Proteins/chemistry, Peptides/metabolism

10.

Differential proteomic comparison of breast cancer secretome using a quantitative paired analysis workflow.

Brunoro, Giselle Villa Flor; Carvalho, Paulo Costa; Barbosa, Valmir C; Pagnoncelli, Dante; De Moura Gallo, Claudia Vitória; Perales, Jonas; Zahedi, René Peiman; Valente, Richard Hemmi; Neves-Ferreira, Ana Gisele da Costa.

BMC Cancer ; 19(1): 365, 2019 Apr 18.

Article in English | MEDLINE | ID: mdl-30999875

ABSTRACT

BACKGROUND: Worldwide, breast cancer is the main cause of cancer mortality in women. Most cases originate in mammary ductal cells that produce the nipple aspirate fluid (NAF). In cancer patients, this secretome contains proteins associated with the tumor microenvironment. NAF studies are challenging because of inter-individual variability. We introduced a paired-proteomic shotgun strategy that relies on NAF analysis from both breasts of patients with unilateral breast cancer and extended PatternLab for Proteomics software to take advantage of this setup. METHODS: The software is based on a peptide-centric approach and uses the binomial distribution to attribute a probability for each peptide as being linked to the disease; these probabilities are propagated to a final protein p-value according to Stouffer's Z-score method. RESULTS: A total of 1227 proteins were identified and quantified, of which 87 were differentially abundant, being mainly involved in glycolysis (Warburg effect) and immune system activation (activated stroma). Additionally, in the estrogen receptor-positive subgroup, proteins related to the regulation of insulin-like growth factor transport and platelet degranulation displayed higher abundance, confirming the presence of a proliferative microenvironment. CONCLUSIONS: We debuted a differential bioinformatics workflow for the proteomic analysis of NAF, validating this secretome as a treasure-trove for studying a paired-organ cancer type.
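The two statistical ingredients named in METHODS, a binomial probability per peptide and Stouffer's Z-score method for combining peptide probabilities into a protein p-value, can be sketched in general form. The abstract does not give the exact formulation used in PatternLab, so everything specific here is an assumption for illustration: a one-sided test with null probability 0.5, unweighted combination, and the hypothetical function names `binomial_upper_tail` and `stouffer`.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    Assumed use: k of n paired patients show higher peptide abundance in
    the affected breast, tested against a null of p = 0.5.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def stouffer(p_values, eps=1e-12):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method."""
    nd = NormalDist()
    # Clamp so that inv_cdf never receives exactly 0 or 1.
    zs = [nd.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values]
    z = sum(zs) / sqrt(len(zs))
    return 1.0 - nd.cdf(z)


# Hypothetical protein with three peptides observed in 8 paired patients.
peptide_p = [binomial_upper_tail(k, 8) for k in (7, 6, 8)]
protein_p = stouffer(peptide_p)
```

The combined p-value is smaller than any single moderate peptide p-value when the peptides agree, which is the sense in which peptide-level evidence is propagated to the protein.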

Subjects

Tumor Biomarkers/metabolism, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Nipple Aspirate Fluid/metabolism, Proteome/analysis, Proteomics/methods, Tumor Microenvironment, Aged, Aged 80 and over, Case-Control Studies, Female, Follow-Up Studies, Humans, Middle Aged, Prognosis, Workflow

11.

Top-Down Garbage Collector: a tool for selecting high-quality top-down proteomics mass spectra.

Lima, Diogo B; Silva, André R F; Dupré, Mathieu; Santos, Marlon D M; Clasen, Milan A; Kurt, Louise U; Aquino, Priscila F; Barbosa, Valmir C; Carvalho, Paulo C; Chamot-Rooke, Julia.

Bioinformatics ; 35(18): 3489-3490, 2019 09 15.

Article in English | MEDLINE | ID: mdl-30715205

ABSTRACT

MOTIVATION: We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates. RESULTS: We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable. AVAILABILITY AND IMPLEMENTATION: http://patternlabforproteomics.org/tdgc, freely available for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Proteomics, Escherichia coli, Software, Tandem Mass Spectrometry

12.

Information-theoretic signatures of biodiversity in the barcoding gene.

Barbosa, Valmir C.

J Theor Biol ; 451: 111-116, 2018 08 14.

Article in English | MEDLINE | ID: mdl-29750998

ABSTRACT

Analyzing the information content of DNA, though holding the promise to help quantify how the processes of evolution have led to information gain throughout the ages, has remained an elusive goal. Paradoxically, one of the main reasons for this has been precisely the great diversity of life on the planet: if on the one hand this diversity is a rich source of data for information-content analysis, on the other hand there is so much variation as to make the task unmanageable. During the past decade or so, however, succinct fragments of the COI mitochondrial gene, which is present in all animal phyla and in a few others, have been shown to be useful for species identification through DNA barcoding. A few million such fragments are now publicly available through the BOLD systems initiative, thus providing an unprecedented opportunity for relatively comprehensive information-theoretic analyses of DNA to be attempted. Here we show how a generalized form of total correlation can yield distinctive information-theoretic descriptors of the phyla represented in those fragments. In order to illustrate the potential of this analysis to provide new insight into the evolution of species, we performed principal component analysis on standardized versions of the said descriptors for 23 phyla. Surprisingly, we found that, though based solely on the species represented in the data, the first principal component correlates strongly with the natural logarithm of the number of all known living species for those phyla. The new descriptors thus constitute clear information-theoretic signatures of the processes whereby evolution has given rise to current biodiversity, which suggests their potential usefulness in further related studies.
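For orientation, the standard (non-generalized) total correlation of a set of aligned sequences is the sum of the per-position entropies minus the entropy of the whole sequences. The sketch below computes that standard quantity from empirical frequencies; the abstract's generalized form is not specified there and is not reproduced here, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Empirical Shannon entropy (in bits) of a collection of hashable items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def total_correlation(sequences):
    """Sum of per-position entropies minus the joint entropy.

    `sequences` is a list of equal-length strings, each treated as one
    sample of the joint variable (position 1, ..., position n).
    """
    columns = list(zip(*sequences))
    marginal = sum(entropy(col) for col in columns)
    joint = entropy(sequences)
    return marginal - joint


# Two perfectly coupled positions share one bit of information.
coupled = total_correlation(["AA", "CC"])
# Two independent positions share none.
independent = total_correlation(["AA", "AC", "CA", "CC"])
```

Total correlation is zero when positions vary independently and grows as positions become statistically dependent, which is what makes it a candidate descriptor of structure in sequence data.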

Subjects

Biodiversity, Taxonomic DNA Barcoding/methods, Animals, Biological Evolution, Mitochondrial DNA/genetics, Electron Transport Complex IV/genetics, Phylogeny, Principal Component Analysis

13.

Characterization of homodimer interfaces with cross-linking mass spectrometry and isotopically labeled proteins.

Lima, Diogo B; Melchior, John T; Morris, Jamie; Barbosa, Valmir C; Chamot-Rooke, Julia; Fioramonte, Mariana; Souza, Tatiana A C B; Fischer, Juliana S G; Gozzo, Fabio C; Carvalho, Paulo C; Davidson, W Sean.

Nat Protoc ; 13(3): 431-458, 2018 03.

Article in English | MEDLINE | ID: mdl-29388937

ABSTRACT

Cross-linking coupled with mass spectrometry (XL-MS) has emerged as a powerful strategy for the identification of protein-protein interactions, characterization of interaction regions, and obtainment of structural information on proteins and protein complexes. In XL-MS, proteins or complexes are covalently stabilized with cross-linkers and digested, followed by identification of the cross-linked peptides by tandem mass spectrometry (MS/MS). This provides spatial constraints that enable modeling of protein (complex) structures and regions of interaction. However, most XL-MS approaches are not capable of differentiating intramolecular from intermolecular links in multimeric complexes, and therefore they cannot be used to study homodimer interfaces. We have recently developed an approach that overcomes this limitation by stable isotope-labeling of one of the two monomers, thereby creating a homodimer with one 'light' and one 'heavy' monomer. Here, we describe a step-by-step protocol for stable isotope-labeling, followed by controlled denaturation and refolding in the presence of the wild-type protein. The resulting light-heavy dimers are cross-linked, digested, and analyzed by mass spectrometry. We show how to quantitatively analyze the corresponding data with SIM-XL, an XL-MS software with a module tailored toward the MS/MS data from homodimers. In addition, we provide a video tutorial of the data analysis with this protocol. This protocol can be performed in ~14 d, and requires basic biochemical and mass spectrometry skills.

Subjects

Isotope Labeling/methods, Tandem Mass Spectrometry/methods, Amino Acid Sequence, Cross-Linking Reagents, Peptides, Protein Conformation, Proteins, Software

14.

A multi-protease, multi-dissociation, bottom-up-to-top-down proteomic view of the Loxosceles intermedia venom.

Trevisan-Silva, Dilza; Bednaski, Aline V; Fischer, Juliana S G; Veiga, Silvio S; Bandeira, Nuno; Guthals, Adrian; Marchini, Fabricio K; Leprevost, Felipe V; Barbosa, Valmir C; Senff-Ribeiro, Andrea; Carvalho, Paulo C.

Sci Data ; 4: 170090, 2017 07 11.

Article in English | MEDLINE | ID: mdl-28696408

ABSTRACT

Venoms are a rich source for the discovery of molecules with biotechnological applications, but their analysis is challenging even for state-of-the-art proteomics. Here we report on a large-scale proteomic assessment of the venom of Loxosceles intermedia, the so-called brown spider. Venom was extracted from 200 spiders and fractioned into two aliquots relative to a 10 kDa cutoff mass. Each of these was further fractioned and digested with trypsin (4 h), trypsin (18 h), pepsin (18 h), and chymotrypsin (18 h), then analyzed by MudPIT on an LTQ-Orbitrap XL ETD mass spectrometer fragmenting precursors by CID, HCD, and ETD. Aliquots of undigested samples were also analyzed. Our experimental design allowed us to apply spectral networks, thus enabling us to obtain meta-contig assemblies, and consequently de novo sequencing of practically complete proteins, culminating in a deep proteome assessment of the venom. Data are available via ProteomeXchange, with identifier PXD005523.

Subjects

Proteome, Spider Venoms/chemistry, Spiders, Animals, Mass Spectrometry, Peptide Hydrolases, Proteomics

15.

DiagnoProt: a tool for discovery of new molecules by mass spectrometry.

Silva, André R F; Lima, Diogo B; Leyva, Alejandro; Duran, Rosario; Batthyany, Carlos; Aquino, Priscila F; Leal, Juliana C; Rodriguez, Jimmy E; Domont, Gilberto B; Santos, Marlon D M; Chamot-Rooke, Julia; Barbosa, Valmir C; Carvalho, Paulo C.

Bioinformatics ; 33(12): 1883-1885, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28186229

ABSTRACT

MOTIVATION: Around 75% of all mass spectra remain unidentified by widely adopted proteomic strategies. We present DiagnoProt, an integrated computational environment that can efficiently cluster millions of spectra and use machine learning to shortlist high-quality unidentified mass spectra that are discriminative of different biological conditions. RESULTS: We exemplify the use of DiagnoProt by shortlisting 4366 high-quality unidentified tandem mass spectra that are discriminative of different types of the Aspergillus fungus. AVAILABILITY AND IMPLEMENTATION: DiagnoProt, a demonstration video and a user tutorial are available at http://patternlabforproteomics.org/diagnoprot. CONTACT: andrerfsilva@gmail.com or paulo@pcarvalho.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Machine Learning, Proteomics/methods, Protein Sequence Analysis/methods, Software, Tandem Mass Spectrometry/methods, Aspergillus/metabolism, Fungal Proteins/analysis

16.

Integrated analysis of shotgun proteomic data with PatternLab for proteomics 4.0.

Carvalho, Paulo C; Lima, Diogo B; Leprevost, Felipe V; Santos, Marlon D M; Fischer, Juliana S G; Aquino, Priscila F; Moresco, James J; Yates, John R; Barbosa, Valmir C.

Nat Protoc ; 11(1): 102-17, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26658470

ABSTRACT

PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for the analysis of shotgun proteomic data. The contained modules allow for formatting of sequence databases, peptide spectrum matching, statistical filtering and data organization, extracting quantitative information from label-free and chemically labeled data, and analyzing statistics for differential proteomics. PatternLab also has modules to perform similarity-driven studies with de novo sequencing data, to evaluate time-course experiments and to highlight the biological significance of data with regard to the Gene Ontology database. The PatternLab for proteomics 4.0 package brings together all of these modules in a self-contained software environment, which allows for complete proteomic data analysis and the display of results in a variety of graphical formats. All updates to PatternLab, including new features, have been previously tested on millions of mass spectra. PatternLab is easy to install, and it is freely available from http://patternlabforproteomics.org.

Subjects

Proteomics/methods, Software, Systems Integration, Protein Databases, Humans, Peptides/chemistry, Peptides/metabolism, Post-Translational Protein Processing, Tandem Mass Spectrometry, Time Factors

17.

Using PepExplorer to Filter and Organize De Novo Peptide Sequencing Results.

da Veiga Leprevost, Felipe; Barbosa, Valmir C; Carvalho, Paulo Costa.

Curr Protoc Bioinformatics ; 51: 13.27.1-13.27.9, 2015 Sep 03.

Article in English | MEDLINE | ID: mdl-26334921

ABSTRACT

PepExplorer aids in the biological interpretation of de novo sequencing results; this is accomplished by assembling a list of homolog proteins obtained by aligning results from widely adopted de novo sequencing tools against a target-decoy sequence database. Our tool relies on pattern recognition to ensure that the results satisfy a user-given false-discovery rate (FDR). For this, it employs a radial basis function neural network that considers the precursor charge states, de novo sequencing scores, the peptide lengths, and alignment scores. PepExplorer is recommended for studies addressing organisms with no genomic sequence available. PepExplorer is integrated into the PatternLab for proteomics environment, which makes available various tools for downstream data analysis, including the resources for quantitative and differential proteomics.
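The general target-decoy idea behind meeting a user-given FDR can be sketched with a single score per hit. This is not PepExplorer's procedure, which according to the abstract relies on a radial basis function neural network over several features; the sketch only shows the common estimate FDR ≈ decoys / targets among accepted hits, and the function name, the tuple layout, and the assumption of distinct scores are all hypothetical.

```python
def threshold_for_fdr(hits, max_fdr):
    """Return the lowest score cutoff whose accepted hits meet `max_fdr`.

    `hits` is a list of (score, is_decoy) pairs; higher scores are better
    and scores are assumed distinct. FDR is estimated as decoys / targets
    among hits scoring at or above the cutoff. Returns None if no cutoff
    satisfies the bound.
    """
    best = None
    targets = decoys = 0
    for score, is_decoy in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score  # keep extending while the bound still holds
    return best


# Hypothetical alignment hits: (score, matched a decoy sequence?)
hits = [(9.0, False), (8.0, False), (7.0, True),
        (6.0, False), (5.0, False), (4.0, True)]
cutoff = threshold_for_fdr(hits, max_fdr=0.25)
```

With these toy hits the cutoff falls at 5.0, where one decoy among four targets gives an estimated FDR of 0.25; a stricter bound of 0 would stop at 8.0, before the first decoy.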

Subjects

Algorithms, Peptides/chemistry, Sequence Alignment/methods, Protein Sequence Analysis/methods, Software, Amino Acid Sequence, Data Mining/methods, Protein Databases, Molecular Sequence Data, Peptides/genetics

18.

SIM-XL: A powerful and user-friendly tool for peptide cross-linking analysis.

Lima, Diogo B; de Lima, Tatiani B; Balbuena, Tiago S; Neves-Ferreira, Ana Gisele C; Barbosa, Valmir C; Gozzo, Fábio C; Carvalho, Paulo C.

J Proteomics ; 129: 51-55, 2015 Nov 03.

Article in English | MEDLINE | ID: mdl-25638023

ABSTRACT

Chemical cross-linking has emerged as a powerful approach for the structural characterization of proteins and protein complexes. However, the correct identification of covalently linked (cross-linked or XL) peptides analyzed by tandem mass spectrometry is still an open challenge. Here we present SIM-XL, a software tool that can analyze data generated through commonly used cross-linkers (e.g., BS3/DSS). Our software introduces a new paradigm for search-space reduction, which ultimately accounts for its increase in speed and sensitivity. Moreover, our search engine is the first to capitalize on reporter ions for selecting tandem mass spectra derived from cross-linked peptides. It also makes available a 2D interaction map and a spectrum-annotation tool unmatched by any of its kind. We show SIM-XL to be more sensitive and faster than a competing tool when analyzing a data set obtained from the human HSP90. The software is freely available for academic use at http://patternlabforproteomics.org/sim-xl. A video demonstrating the tool is available at http://patternlabforproteomics.org/sim-xl/video. SIM-XL is the first tool to support XL data in the mzIdentML format; all data are thus available from the ProteomeXchange consortium (identifier PXD001677). This article is part of a Special Issue entitled: Computational Proteomics.

Subjects

Algorithms, Cross-Linking Reagents/chemistry, Peptides/chemistry, Protein Interaction Mapping/methods, Protein Sequence Analysis/methods, Software, Amino Acid Sequence, Binding Sites, Molecular Sequence Data, Automated Pattern Recognition/methods, Protein Binding, Tandem Mass Spectrometry/methods, User-Computer Interface

19.

A scoring model for phosphopeptide site localization and its impact on the question of whether to use MSA.

Fischer, Juliana de S da G; Dos Santos, Marlon D M; Marchini, Fabricio K; Barbosa, Valmir C; Carvalho, Paulo C; Zanchin, Nilson I T.

J Proteomics ; 129: 42-50, 2015 Nov 03.

Article in English | MEDLINE | ID: mdl-25623781

ABSTRACT

The production of structurally significant product ions during the dissociation of phosphopeptides is key to the successful determination of phosphorylation sites. These diagnostic ions can be generated using the widely adopted MS/MS approach, MS3 (Data Dependent Neutral Loss - DDNL), or by multistage activation (MSA). The main purpose of this work is to introduce a false-localization rate (FLR) probabilistic model to enable unbiased phosphoproteomics studies. Briefly, our algorithm infers a probabilistic function from the distribution of the identified phosphopeptides' XCorr Delta scores (XD-Scores) in the current experiment. Our module infers p-values by relying on Gaussian mixture models and a logistic function. We demonstrate the usefulness of our probabilistic model by revisiting the "to MSA, or not to MSA" dilemma. For this, we used human leukemia-derived cells (K562) as a study model and enriched for phosphopeptides using hydroxyapatite (HAP) chromatography. The aliquots were analyzed with and without MSA on an Orbitrap-XL. Our XD-Scoring analysis revealed that the MS/MS approach provides more identifications because of its faster scan rate, but that for the same given scan rate higher-confidence spectra can be achieved with MSA. Our software is integrated into PatternLab for proteomics, freely available to the academic community at http://www.patternlabforproteomics.org. Biological significance: Assigning statistical confidence to phosphorylation sites is necessary for proper phosphoproteomic assessment. Here we present a rigorous statistical model, based on Gaussian mixture models and a logistic function, which overcomes shortcomings of previous tools. The algorithm described herein is made readily available to the scientific community by integrating it into the widely adopted PatternLab for proteomics. This article is part of a Special Issue entitled: Computational Proteomics.

Subjects

Mass Spectrometry/methods; Models, Statistical; Phosphopeptides/chemistry; Position-Specific Scoring Matrices; Protein Interaction Mapping/methods; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Binding Sites; Computer Simulation; Molecular Sequence Data; Phosphorylation; Protein Binding; Proteome/chemistry

20.

Early detection of epilepsy seizures based on a weightless neural network.

de Aguiar, Kleber; Franca, Felipe M G; Barbosa, Valmir C; Teixeira, Cesar A D.

Annu Int Conf IEEE Eng Med Biol Soc ; 2015: 4470-4, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26737287

ABSTRACT

This work introduces a new methodology for the early detection of epileptic seizures based on the WiSARD weightless neural network model and a new approach to preprocessing the electroencephalogram (EEG) data. WiSARD has, among other advantages, the capacity to perform the training phase very quickly. This speed in training is due to the fact that WiSARD's neurons work like Random Access Memories (RAM) addressed by input patterns. Promising results were obtained in the anticipation of seizure onsets in four representative patients from the European Database on Epilepsy (EPILEPSIAE). The proposed seizure early detection WNN architecture was explored by varying the detection anticipation (δ) in the 2 to 30 seconds interval, and by adopting 2 and 3 seconds as the width of the Sliding Observation Window (SOW) input. While in the most challenging patient (A) one obtained accuracies from 99.57% (δ=2s; SOW=3s) to 72.56% (δ=30s; SOW=2s), patient D seizures could be detected in the 99.77% (δ=2s; SOW=2s) to 99.93% (δ=30s; SOW=3s) accuracy interval.

Subjects

Epilepsy; Early Diagnosis; Electroencephalography; Humans; Neural Networks, Computer; Seizures
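
The abstract of record 20 attributes WiSARD's fast training to neurons that behave like RAMs addressed by input patterns. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes the input has already been binarized (the paper's EEG preprocessing is not reproduced here), and the class names, the `tuple_size` parameter, and the labels "seizure" and "normal" are hypothetical choices made for illustration.

```python
import random


class Discriminator:
    """One class detector: a bank of RAM nodes, each addressed by a few input bits."""

    def __init__(self, input_bits, tuple_size, rng):
        # Fixed pseudo-random assignment of input bit positions to RAM nodes.
        order = list(range(input_bits))
        rng.shuffle(order)
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, input_bits, tuple_size)]
        # Each RAM is modeled as the set of addresses that were written to.
        self.rams = [set() for _ in self.tuples]

    def _addresses(self, bits):
        # The bits read at a node's positions form that node's address.
        for positions in self.tuples:
            yield tuple(bits[p] for p in positions)

    def train(self, bits):
        # Training is a single pass of memory writes, hence its speed.
        for ram, addr in zip(self.rams, self._addresses(bits)):
            ram.add(addr)

    def score(self, bits):
        # Count the RAM nodes that have seen this address before.
        return sum(addr in ram
                   for ram, addr in zip(self.rams, self._addresses(bits)))


class WiSARD:
    """One discriminator per label; the highest score wins."""

    def __init__(self, input_bits, tuple_size, seed=0):
        self.input_bits = input_bits
        self.tuple_size = tuple_size
        self.seed = seed  # same seed, so all discriminators share one mapping
        self.discriminators = {}

    def train(self, bits, label):
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(
                self.input_bits, self.tuple_size, random.Random(self.seed))
        self.discriminators[label].train(bits)

    def classify(self, bits):
        return max(self.discriminators,
                   key=lambda label: self.discriminators[label].score(bits))


# Hypothetical 8-bit patterns standing in for binarized EEG windows.
net = WiSARD(input_bits=8, tuple_size=2)
net.train([1, 1, 1, 1, 0, 0, 0, 0], "seizure")
net.train([0, 0, 0, 0, 1, 1, 1, 1], "normal")
print(net.classify([1, 1, 1, 0, 0, 0, 0, 0]))  # one bit flipped from the "seizure" example
```

Because each training example only writes one address per RAM node, training cost is linear in the number of nodes, with no iterative weight updates.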
