1.
Bioinformatics ; 38(22): 5119-5120, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36130273

ABSTRACT

MOTIVATION: Confident deconvolution of proteomic spectra is critical for several applications such as de novo sequencing, cross-linking mass spectrometry and handling chimeric mass spectra. RESULTS: In general, all deconvolution algorithms may eventually report mass peaks that are not compatible with the chemical formula of any peptide. We show how to remove these artifacts by considering their mass defects. We introduce Y.A.D.A. 3.0, a fast deconvolution algorithm that can remove peaks with unacceptable mass defects. Our approach is effective for polypeptides smaller than 10 kDa, and its essence can be easily incorporated into any deconvolution algorithm. AVAILABILITY AND IMPLEMENTATION: Y.A.D.A. 3.0 is freely available for academic use at http://patternlabforproteomics.org/yada3. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
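
The mass-defect filtering idea described in this abstract can be sketched roughly as follows. This is an illustration only, not the Y.A.D.A. 3.0 implementation: it assumes the common rule of thumb that peptide monoisotopic masses exceed their nominal (integer) mass by roughly 0.0005 Da per Da, and both the slope (EXCESS_PER_DA) and the acceptance band (TOLERANCE) are hypothetical values chosen for the example.

```python
# Hypothetical sketch of mass-defect filtering; not the Y.A.D.A. 3.0 code.
# EXCESS_PER_DA and TOLERANCE are illustrative assumptions.

EXCESS_PER_DA = 0.0005  # assumed average mass excess of peptides per Da
TOLERANCE = 0.2         # assumed half-width (in Da) of the acceptable band


def defect_deviation(mono_mass: float) -> float:
    """Distance between the rescaled mass and the nearest integer.

    If mass ~= N * (1 + EXCESS_PER_DA) for some integer N, then
    mass / (1 + EXCESS_PER_DA) should lie close to an integer.
    """
    rescaled = mono_mass / (1.0 + EXCESS_PER_DA)
    return abs(rescaled - round(rescaled))


def is_plausible_peptide_mass(mono_mass: float, tolerance: float = TOLERANCE) -> bool:
    """True when the mass defect falls inside the assumed peptide band."""
    return defect_deviation(mono_mass) <= tolerance


def filter_peaks(masses):
    """Keep only the deconvoluted masses with a peptide-like mass defect."""
    return [m for m in masses if is_plausible_peptide_mass(m)]
```

For example, filter_peaks([1045.5345, 1045.0]) keeps the first mass and drops the second, whose fractional part is far from what a peptide of that size would show under these assumptions.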


Subject(s)
Algorithms , Proteomics , Peptides , Mass Spectrometry/methods , Software
2.
Bioinformatics ; 35(18): 3489-3490, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30715205

ABSTRACT

MOTIVATION: We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates. RESULTS: We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable. AVAILABILITY AND IMPLEMENTATION: http://patternlabforproteomics.org/tdgc, freely available for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteomics , Escherichia coli , Software , Tandem Mass Spectrometry
3.
BMC Cancer ; 19(1): 365, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999875

ABSTRACT

BACKGROUND: Worldwide, breast cancer is the main cause of cancer mortality in women. Most cases originate in mammary ductal cells that produce the nipple aspirate fluid (NAF). In cancer patients, this secretome contains proteins associated with the tumor microenvironment. NAF studies are challenging because of inter-individual variability. We introduced a paired-proteomic shotgun strategy that relies on NAF analysis from both breasts of patients with unilateral breast cancer and extended PatternLab for Proteomics software to take advantage of this setup. METHODS: The software is based on a peptide-centric approach and uses the binomial distribution to attribute a probability for each peptide as being linked to the disease; these probabilities are propagated to a final protein p-value according to Stouffer's Z-score method. RESULTS: A total of 1227 proteins were identified and quantified, of which 87 were differentially abundant, being mainly involved in glycolysis (Warburg effect) and immune system activation (activated stroma). Additionally, in the estrogen receptor-positive subgroup, proteins related to the regulation of insulin-like growth factor transport and platelet degranulation displayed higher abundance, confirming the presence of a proliferative microenvironment. CONCLUSIONS: We debuted a differential bioinformatics workflow for the proteomic analysis of NAF, validating this secretome as a treasure-trove for studying a paired-organ cancer type.
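
The peptide-to-protein propagation described in the METHODS can be sketched as below. This is a minimal illustration under stated assumptions, not the PatternLab code: each peptide probability is assumed to be a one-sided binomial tail (number of patients in which the peptide is more abundant in the affected breast, with a null probability of 0.5), and the combination is assumed to be the unweighted form of Stouffer's method. The function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def stouffer(p_values, eps: float = 1e-12) -> float:
    """Combine one-sided p-values with the unweighted Stouffer Z-score method."""
    norm = NormalDist()
    # Clamp to the open interval (0, 1) so the inverse CDF is defined.
    clamped = [min(max(p, eps), 1 - eps) for p in p_values]
    z = sum(norm.inv_cdf(1 - p) for p in clamped) / sqrt(len(clamped))
    return 1 - norm.cdf(z)


# Usage: three peptides of one protein, each observed in 10 paired patients.
peptide_p = [binomial_tail(k, 10) for k in (8, 9, 7)]
protein_p = stouffer(peptide_p)
```

Several peptides that each point weakly in the same direction yield a protein p-value smaller than any of the individual peptide probabilities, which is the motivation for combining them.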


Subject(s)
Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Nipple Aspirate Fluid/metabolism , Proteome/analysis , Proteomics/methods , Tumor Microenvironment , Aged , Aged, 80 and over , Case-Control Studies , Female , Follow-Up Studies , Humans , Middle Aged , Prognosis , Workflow
4.
Bioinformatics ; 33(12): 1883-1885, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28186229

ABSTRACT

MOTIVATION: Around 75% of all mass spectra remain unidentified by widely adopted proteomic strategies. We present DiagnoProt, an integrated computational environment that can efficiently cluster millions of spectra and use machine learning to shortlist high-quality unidentified mass spectra that are discriminative of different biological conditions. RESULTS: We exemplify the use of DiagnoProt by shortlisting 4366 high-quality unidentified tandem mass spectra that are discriminative of different types of the Aspergillus fungus. AVAILABILITY AND IMPLEMENTATION: DiagnoProt, a demonstration video and a user tutorial are available at http://patternlabforproteomics.org/diagnoprot . CONTACT: andrerfsilva@gmail.com or paulo@pcarvalho.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Proteomics/methods , Sequence Analysis, Protein/methods , Software , Tandem Mass Spectrometry/methods , Aspergillus/metabolism , Fungal Proteins/analysis
5.
J Theor Biol ; 451: 111-116, 2018 08 14.
Article in English | MEDLINE | ID: mdl-29750998

ABSTRACT

Analyzing the information content of DNA, though holding the promise to help quantify how the processes of evolution have led to information gain throughout the ages, has remained an elusive goal. Paradoxically, one of the main reasons for this has been precisely the great diversity of life on the planet: if on the one hand this diversity is a rich source of data for information-content analysis, on the other hand there is so much variation as to make the task unmanageable. During the past decade or so, however, succinct fragments of the COI mitochondrial gene, which is present in all animal phyla and in a few others, have been shown to be useful for species identification through DNA barcoding. A few million such fragments are now publicly available through the BOLD systems initiative, thus providing an unprecedented opportunity for relatively comprehensive information-theoretic analyses of DNA to be attempted. Here we show how a generalized form of total correlation can yield distinctive information-theoretic descriptors of the phyla represented in those fragments. In order to illustrate the potential of this analysis to provide new insight into the evolution of species, we performed principal component analysis on standardized versions of the said descriptors for 23 phyla. Surprisingly, we found that, though based solely on the species represented in the data, the first principal component correlates strongly with the natural logarithm of the number of all known living species for those phyla. The new descriptors thus constitute clear information-theoretic signatures of the processes whereby evolution has given rise to current biodiversity, which suggests their potential usefulness in further related studies.
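
For orientation, the standard (non-generalized) total correlation of a set of aligned sequences can be computed as in the sketch below. The abstract does not specify the generalized form used by the authors, so this shows only the textbook quantity, the sum of per-position entropies minus the joint entropy, with each alignment position treated as one random variable. That treatment and the function names are assumptions of this example.

```python
from collections import Counter
from math import log2


def entropy(samples) -> float:
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def total_correlation(sequences) -> float:
    """Sum of per-position entropies minus the joint entropy.

    sequences: equal-length strings; position i is treated as variable X_i.
    """
    sequences = list(sequences)
    columns = list(zip(*sequences))
    marginal = sum(entropy(col) for col in columns)
    joint = entropy(sequences)
    return marginal - joint
```

Two perfectly coupled positions, as in ["AA", "CC"], give one bit of total correlation, whereas four equally frequent combinations of two independent positions give zero.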


Subject(s)
Biodiversity , DNA Barcoding, Taxonomic/methods , Animals , Biological Evolution , DNA, Mitochondrial/genetics , Electron Transport Complex IV/genetics , Phylogeny , Principal Component Analysis
6.
Mol Cell Proteomics ; 13(9): 2480-9, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24878498

ABSTRACT

Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we were able to recover most of the identifications at a 1% false-discovery rate. 
Finally, we employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
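
As background for the alignment step, the sketch below computes the score of a standard Smith-Waterman local alignment. It is not PepExplorer's modified algorithm, whose changes the abstract does not detail; the match, mismatch and gap values are arbitrary assumptions, and a real peptide aligner would more likely use a substitution matrix.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the alignment is local, a de novo peptide such as "PEPTIDE" scores the same against "XXPEPTIDEYY" as against itself, which is the property that lets short sequence tags be located inside longer homologous protein sequences.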


Subject(s)
Algorithms , Databases, Protein , Sequence Analysis, Protein/methods , Amino Acid Sequence , Animals , Archaeal Proteins/metabolism , Bothrops/metabolism , Mass Spectrometry , Plasma/metabolism , Proteomics , Pyrococcus furiosus/metabolism , Sequence Alignment
7.
J Proteome Res ; 13(1): 314-20, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24283986

ABSTRACT

Accessing localized proteomic profiles has emerged as a fundamental strategy to understand the biology of diseases, as recently demonstrated, for example, in the context of determining cancer resection margins with improved precision. Here, we analyze a gastric cancer biopsy sectioned into 10 parts, each one subjected to MudPIT analysis. We introduce a software tool, named Shotgun Imaging Analyzer and inspired by MALDI imaging, to enable the overlaying of a protein's expression heat map on a tissue picture. The software is tightly integrated with the NeXtProt database, so it enables the browsing of identified proteins according to chromosomes, quickly listing human proteins never identified by mass spectrometry (i.e., the so-called missing proteins), and the automatic search for proteins that are more expressed over a specific region of interest on the biopsy, all of which constitute goals that are clearly well-aligned with those of the C-HPP. Our software has been able to highlight an intense expression of proteins previously known to be correlated with cancers (e.g., glutathione S-transferase Mu 3), and in particular, we draw attention to Gastrokine-2, a "missing protein" identified in this work, for which our approach clearly delineated the tumoral region from the "healthy" one. Data are available via ProteomeXchange with identifier PXD000584.


Subject(s)
Neoplasm Proteins/metabolism , Proteomics , Stomach Neoplasms/metabolism , Biopsy , Chromatography, Liquid , Humans , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Stomach Neoplasms/pathology , Tandem Mass Spectrometry
8.
Bioinformatics ; 29(10): 1343-4, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23446294

ABSTRACT

SUMMARY: Protein identification by mass spectrometry is commonly accomplished using a peptide sequence matching search algorithm, whose sensitivity varies inversely with the size of the sequence database and the number of post-translational modifications considered. We present the Spectrum Identification Machine, a peptide sequence matching tool that capitalizes on the high-intensity b1-fragment ion of tandem mass spectra of peptides coupled in solution with phenylisothiocyanate to confidently sequence the first amino acid and ultimately reduce the search space. We demonstrate that in complex search spaces, a gain of some 120% in sensitivity can be achieved. AVAILABILITY: All data generated and the software are freely available for academic use at http://proteomics.fiocruz.br/software/sim. CONTACT: paulo@pcarvalho.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Escherichia coli Proteins/analysis , Escherichia coli/chemistry , Peptides/analysis , Proteomics/methods , Amino Acid Sequence , Escherichia coli Proteins/chemistry , Mass Spectrometry , Peptides/chemistry , Protein Processing, Post-Translational , Software
9.
Bioinformatics ; 28(12): 1652-4, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22539673

ABSTRACT

UNLABELLED: We present an updated version of the TFold software for pinpointing differentially expressed proteins in shotgun proteomics experiments. Given an FDR bound, the updated approach uses a theoretical FDR estimator to maximize the number of identifications that satisfy both a fold-change cutoff that varies with the t-test P-value as a power law and a stringency criterion that aims to detect lowly abundant proteins. The new version has yielded significant improvements in sensitivity over the previous one. AVAILABILITY: Freely available for academic use at http://pcarvalho.com/patternlab.


Subject(s)
Proteins/analysis , Proteomics/methods , Software , Algorithms , Cell Line, Tumor , Computational Biology/methods , Data Interpretation, Statistical , Humans , Sequence Analysis, Protein , User-Computer Interface
10.
PLoS One ; 18(5): e0286312, 2023.
Article in English | MEDLINE | ID: mdl-37235568

ABSTRACT

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called "midrange" distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.
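
The baseline this abstract contrasts itself with, dividing the data by the standard deviation along each dimension, can be sketched as follows. The shape-complexity method itself is not shown, since the abstract does not give its formula. Using the population standard deviation and leaving zero-spread columns unchanged are assumptions of this example.

```python
from statistics import pstdev


def scale_by_std(rows):
    """Divide every column by its population standard deviation.

    rows: list of equal-length numeric lists (one list per sample).
    Columns with zero spread are left unchanged (an assumption of this sketch).
    """
    columns = list(zip(*rows))
    factors = [pstdev(col) or 1.0 for col in columns]
    return [[value / f for value, f in zip(row, factors)] for row in rows]
```

After scaling, a dimension measured in large units no longer dominates the Euclidean distances that a method such as k-means relies on: scale_by_std([[1, 10], [3, 30]]) places both dimensions on the same footing.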


Subject(s)
Algorithms , Cluster Analysis
11.
J Am Soc Mass Spectrom ; 34(4): 794-796, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-36947430

ABSTRACT

Complex protein mixtures typically generate many tandem mass spectra produced by different peptides coisolated in the gas phase. Widely adopted proteomic data analysis environments usually fail to identify most of these spectra, succeeding at best in identifying only one of the multiple cofragmenting peptides. We present PatternLab V (PLV), an updated version of PatternLab that integrates the YADA 3 deconvolution algorithm to handle such cases efficiently. In general, we expect an increase of 10% in spectral identifications when dealing with complex proteomic samples. PLV is freely available at http://patternlabforproteomics.org.


Subject(s)
Peptides , Proteomics , Peptides/analysis , Proteins/analysis , Algorithms , Tandem Mass Spectrometry , Databases, Protein , Software
12.
J Proteomics ; 277: 104853, 2023 04 15.
Article in English | MEDLINE | ID: mdl-36804625

ABSTRACT

MOTIVATION: There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins using shotgun proteomic data; examples are peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, albeit some unidentified discriminative peptides remain. RESULTS: We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists those clusters that are discriminative for biological conditions. DiagnoMass then communicates with proteomic tools to attempt the identification of such clusters. We demonstrate the effectiveness of DiagnoMass by analyzing proteomic data from Escherichia coli, Salmonella, and Shigella, listing many high-quality discriminative spectral clusters that had thus far remained unidentified by widely adopted proteomic tools. DiagnoMass can also classify proteomic profiles. We anticipate the use of DiagnoMass as a vital tool for pinpointing biomarkers. AVAILABILITY: DiagnoMass and related documentation, including a usage protocol, are available at http://www.diagnomass.com.


Subject(s)
Proteomics , Software , Proteomics/methods , Proteins/chemistry , Peptides/chemistry , Escherichia coli , Algorithms , Databases, Protein
13.
Proteomics ; 12(7): 944-9, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22311825

ABSTRACT

The search engine processor (SEPro) is a tool for filtering, organizing, sharing, and displaying peptide spectrum matches. It employs a novel three-tier Bayesian approach that uses layers of spectrum, peptide, and protein logic to lead the data to converge to a single list of reliable protein identifications. SEPro is integrated into the PatternLab for proteomics environment, where an arsenal of tools for analyzing shotgun proteomic data is provided. By using the semi-labeled decoy approach for benchmarking, we show that SEPro significantly outperforms a commercially available competitor.


Subject(s)
Algorithms , Databases, Protein , Peptide Fragments/chemistry , Proteomics/methods , Software , Animals , Bayes Theorem , Chromatography, Liquid , Database Management Systems , Mice , Proteins/chemistry , Proteins/classification , Tandem Mass Spectrometry , User-Computer Interface
14.
J Proteome Res ; 11(12): 5836-42, 2012 Dec 07.
Article in English | MEDLINE | ID: mdl-23145836

ABSTRACT

A strategy for treating cancer is to surgically remove the tumor together with a portion of apparently healthy tissue surrounding it, the so-called "resection margin", to minimize recurrence. Here, we investigate whether the proteomic profiles from biopsies of gastric cancer resection margins are indeed more similar to those from healthy tissue than to those from cancer biopsies. To this end, we analyzed biopsies using an offline MudPIT shotgun proteomic approach and performed label-free quantitation through a distributed normalized spectral abundance factor approach adapted for extracted ion chromatograms (XICs). A multidimensional scaling analysis revealed that these tissue types are very distinct from one another. The resection margin presented several proteins previously correlated with cancer, but also other overexpressed proteins that may be related to tumor nourishment and metastasis, such as collagen alpha-1, ceruloplasmin, calpastatin, and E-cadherin. We argue that the resection margin plays a key role in Paget's "soil to seed" hypothesis, that is, that cancer cells require a special microenvironment to nourish and that understanding it could ultimately lead to more effective treatments.
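
For readers unfamiliar with the quantitation measure mentioned above, the plain normalized spectral abundance factor (NSAF) can be sketched as below. The distributed variant (which apportions shared peptides among proteins) and the adaptation to XICs used in the study are not shown; the input layout and function name are assumptions of this example.

```python
def nsaf(proteins):
    """Plain NSAF: (count / length) normalized over all proteins.

    proteins: dict mapping protein name -> (spectral_count, length_in_residues).
    Returns a dict of NSAF values that sum to 1.
    """
    saf = {name: count / length for name, (count, length) in proteins.items()}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}
```

Dividing by length compensates for longer proteins yielding more peptides: with equal counts, nsaf({"A": (10, 100), "B": (10, 300)}) assigns the shorter protein A three times the abundance of B.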


Subject(s)
Biomarkers, Tumor/analysis , Proteome/analysis , Software , Stomach Neoplasms/metabolism , Biomarkers, Tumor/metabolism , Biopsy , Cadherins/metabolism , Case-Control Studies , Ceruloplasmin/metabolism , Chromatography, Ion Exchange/methods , Collagen Type XI/metabolism , Databases, Protein , Female , Humans , Male , Neoplasm Metastasis/diagnosis , Neoplasm Proteins/metabolism , Prognosis , Proteomics/methods , Pyloric Antrum/metabolism , Pyloric Antrum/pathology , Stomach Neoplasms/diagnosis , Stomach Neoplasms/pathology
15.
Bioinformatics ; 27(2): 275-6, 2011 Jan 15.
Article in English | MEDLINE | ID: mdl-21075743

ABSTRACT

SUMMARY: We present an approach to statistically pinpoint differentially expressed proteins that have quantitation values near the quantitation threshold and are not identified in all replicates (marginal cases). Our method uses a Bayesian strategy to combine parametric statistics with an empirical distribution built from the reproducibility quality of the technical replicates. AVAILABILITY: The software is freely available for academic use at http://pcarvalho.com/patternlab.


Subject(s)
Proteins/metabolism , Proteomics/methods , Bayes Theorem , Software
16.
J Theor Biol ; 312: 114-9, 2012 Nov 07.
Article in English | MEDLINE | ID: mdl-22898555

ABSTRACT

A quasispecies is a set of interrelated genotypes that have reached a stationary state while evolving according to the usual Darwinian principles of selection and mutation. Quasispecies studies invariably assume that it is possible for any genotype to mutate into any other, but recent findings indicate that this assumption is not necessarily true. Here we revisit the traditional quasispecies theory by adopting a network structure to constrain the occurrence of mutations. Such structure is governed by a random-graph model, whose single parameter (a probability p) controls both the graph's density and the dynamics of mutation. We contribute two further modifications to the theory, one to account for the fact that different loci in a genotype may be differently susceptible to the occurrence of mutations, the other to allow for a more plausible description of the transition from adaptation to degeneracy of the quasispecies as p is increased. We give analytical and simulation results for the usual case of binary genotypes, assuming the fitness landscape in which a genotype's fitness decays exponentially with its Hamming distance to the wild type. These results support the theory's assertions regarding the adaptation of the quasispecies to the fitness landscape and also its possible demise as a function of p.
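
The fitness landscape mentioned in this abstract, in which fitness decays exponentially with Hamming distance to the wild type, can be illustrated with the sketch below. The decay rate and the normalization (wild-type fitness equal to 1) are assumptions of this example, and the random-graph mutation model is not shown.

```python
from math import exp


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def fitness(genotype: str, wild_type: str, decay: float = 1.0) -> float:
    """Fitness exp(-decay * d), where d is the Hamming distance to the wild type.

    decay is a hypothetical rate parameter; the wild type itself has fitness 1.
    """
    return exp(-decay * hamming(genotype, wild_type))
```

For binary genotypes of length 4 with wild type "0000", the genotype "1100" lies at distance 2 and therefore has fitness exp(-2) under the default decay rate.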


Subject(s)
Evolution, Molecular , Models, Biological
17.
Nat Protoc ; 17(7): 1553-1578, 2022 07.
Article in English | MEDLINE | ID: mdl-35411045

ABSTRACT

Shotgun proteomics aims to identify and quantify the thousands of proteins in complex mixtures such as cell and tissue lysates and biological fluids. This approach uses liquid chromatography coupled with tandem mass spectrometry and typically generates hundreds of thousands of mass spectra that require specialized computational environments for data analysis. PatternLab for proteomics is a unified computational environment for analyzing shotgun proteomic data. PatternLab V (PLV) is the most comprehensive and crucial update so far, the result of intensive interaction with the proteomics community over several years. All PLV modules have been optimized and its graphical user interface has been completely updated for improved user experience. Major improvements were made to all aspects of the software, ranging from boosting the number of protein identifications to faster extraction of ion chromatograms. PLV provides modules for preparing sequence databases, protein identification, statistical filtering and in-depth result browsing for both labeled and label-free quantitation. The PepExplorer module can even pinpoint de novo sequenced peptides not already present in the database. PLV is of broad applicability and therefore suitable for challenging experimental setups, such as time-course experiments and data handling from unsequenced organisms. PLV interfaces with widely adopted software and community initiatives, e.g., Comet, Skyline, PEAKS and PRIDE. It is freely available at http://www.patternlabforproteomics.org .


Subject(s)
Proteomics , Software , Databases, Protein , Proteins/chemistry , Proteomics/methods , Tandem Mass Spectrometry
18.
Proteomics ; 11(20): 4105-8, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21834134

ABSTRACT

The decoy-database approach is currently the gold standard for assessing the confidence of identifications in shotgun proteomic experiments. Here, we demonstrate that what might appear to be a good result under the decoy-database approach for a given false-discovery rate could be, in fact, the product of overfitting. This problem has been overlooked until now and could lead to obtaining boosted identification numbers whose reliability does not correspond to the expected false-discovery rate. To overcome this, we are introducing a modified version of the method, termed a semi-labeled decoy approach, which enables the statistical determination of an overfitted result.
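
The conventional decoy-database estimate that this abstract takes as its starting point can be sketched as follows. This shows only the standard calculation (decoy hits divided by target hits above a score threshold), not the semi-labeled decoy approach, whose details the abstract does not give. The data layout and function names are assumptions of this example.

```python
def fdr_at_threshold(psms, threshold: float) -> float:
    """Estimated FDR = decoys / targets among PSMs scoring >= threshold.

    psms: list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold(psms, max_fdr: float = 0.01):
    """Most permissive score threshold whose estimated FDR stays within max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            return threshold
    return None
```

Tuning many filtering parameters against this same estimate is the kind of procedure in which the overfitting discussed in the abstract could arise, since the decoys then serve both to fit the filter and to assess it.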


Subject(s)
Computational Biology , Proteomics/standards , Drug Discovery/standards
19.
Bioinformatics ; 26(6): 847-8, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20106817

ABSTRACT

SUMMARY: XDIA is a computational strategy for analyzing multiplexed spectra acquired using electron transfer dissociation and collision-activated dissociation; it significantly increases identified spectra (approximately 250%) and unique peptides (approximately 30%) when compared with the data-dependent ETCaD analysis on middle-down, single-phase shotgun proteomic analysis. Increasing identified spectra and peptides improves quantitation statistics confidence and protein coverage, respectively. AVAILABILITY: The software and data produced in this work are freely available for academic use at http://fields.scripps.edu/XDIA CONTACT: paulo@pcarvalho.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteomics/methods , Software , Algorithms , Databases, Factual
20.
Phys Rev E ; 103(1-1): 012403, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33601496

ABSTRACT

Bacterial quorum sensing is the communication that takes place between bacteria as they secrete certain molecules into the intercellular medium that later get absorbed by the secreting cells themselves and by others. Depending on cell density, this uptake has the potential to alter gene expression and thereby affect global properties of the community. We consider the case of multiple bacterial species coexisting, referring to each one of them as a genotype and adopting the usual denomination of the molecules they collectively secrete as public goods. A crucial problem in this setting is characterizing the coevolution of genotypes as some of them secrete public goods (and pay the associated metabolic costs) while others do not but may nevertheless benefit from the available public goods. We introduce a network model to describe genotype interaction and evolution when genotype fitness depends on the production and uptake of public goods. The model comprises a random graph to summarize the possible evolutionary pathways the genotypes may take as they interact genetically with one another, and a system of coupled differential equations to characterize the behavior of genotype abundance in time. We study some simple variations of the model analytically and more complex variations computationally. Our results point to a simple trade-off affecting the long-term survival of those genotypes that do produce public goods. This trade-off involves, on the producer side, the impact of producing and that of absorbing the public good. On the nonproducer side, it involves the impact of absorbing the public good as well, now compounded by the molecular compatibility between the producer and the nonproducer. Depending on how these factors turn out, producers may or may not survive.


Subject(s)
Bacteria/cytology , Biological Evolution , Quorum Sensing , Bacteria/genetics , Models, Biological