An automated proteogenomic method uses mass spectrometry to reveal novel genes in Zea mays.

Castellana, Natalie E; Shen, Zhouxin; He, Yupeng; Walley, Justin W; Cassidy, California Jack; Briggs, Steven P; Bafna, Vineet.

Mol Cell Proteomics ; 13(1): 157-67, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24142994

ABSTRACT

New technologies in genomics and proteomics have influenced the emergence of proteogenomics, a field at the confluence of genomics, transcriptomics, and proteomics. First generation proteogenomic toolkits employ peptide mass spectrometry to identify novel protein coding regions. We extend first generation proteogenomic tools to achieve greater accuracy and enable the analysis of large, complex genomes. We apply our pipeline to Zea mays, which has a genome comparable in size to human. Our pipeline begins with the comparison of mass spectra to a putative translation of the genome. We select novel peptides, those that match a region of the genome that was not previously known to be protein coding, for grouping into refinement events. We present a novel, probabilistic framework for evaluating the accuracy of each event. Our calculated event probability, or eventProb, considers the number of supporting peptides and spectra, and the quality of each supporting peptide-spectrum match. Our pipeline predicts 165 novel protein-coding genes and proposes updated models for 741 additional genes.

Subject(s)

Genomics , Proteomics , Zea mays/genetics , Genome, Plant , Humans , Mass Spectrometry , Open Reading Frames

MORPH-PRO: a novel algorithm and web server for protein morphing.

Castellana, Natalie E; Lushnikov, Andrey; Rotkiewicz, Piotr; Sefcovic, Natasha; Pevzner, Pavel A; Godzik, Adam; Vyatkina, Kira.

Algorithms Mol Biol ; 8(1): 19, 2013 Jul 11.

Article in English | MEDLINE | ID: mdl-23844614

ABSTRACT

BACKGROUND: Proteins are known to be dynamic in nature, changing from one conformation to another while performing vital cellular tasks. It is important to understand these movements in order to better understand protein function. At the same time, experimental techniques provide us with only single snapshots of the whole ensemble of available conformations. Computational protein morphing provides a visualization of a protein structure transitioning from one conformation to another by producing a series of intermediate conformations. RESULTS: We present a novel, efficient morphing algorithm, Morph-Pro, based on linear interpolation. We also show that apart from visualization, morphing can be used to provide plausible intermediate structures. We test this by using the intermediate structures of a c-Jun N-terminal kinase (JNK1) conformational change in a virtual docking experiment. The structures are shown to dock with higher scores to known JNK1-binding ligands than structures solved using X-ray crystallography. This experiment demonstrates the potential applications of the intermediate structures in modeling or virtual screening efforts. CONCLUSIONS: Visualization of protein conformational changes is important for characterization of protein function. Furthermore, the intermediate structures produced by our algorithm are good approximations to true structures. We believe there is great potential for these computationally predicted structures in protein-ligand docking experiments and virtual screening. The Morph-Pro web server can be accessed at http://morph-pro.bioinf.spbau.ru.
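
The abstract above says only that Morph-Pro is "based on linear interpolation" and does not say which representation it interpolates. As a rough illustration of the general idea, the sketch below linearly interpolates Cartesian atom coordinates between two conformations. It assumes the two structures are already superimposed and list the same atoms in the same order. The function name `interpolate_conformations` and the coordinate layout are hypothetical, and this is not Morph-Pro's actual algorithm.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def interpolate_conformations(
    start: List[Coord], end: List[Coord], n_intermediate: int
) -> List[List[Coord]]:
    """Return start, n_intermediate evenly spaced intermediates, and end.

    Assumption: both conformations are pre-aligned and atom-matched.
    This is naive Cartesian interpolation, not the Morph-Pro method.
    """
    if len(start) != len(end):
        raise ValueError("conformations must contain the same number of atoms")
    if n_intermediate < 0:
        raise ValueError("n_intermediate must be non-negative")

    steps = n_intermediate + 1  # number of intervals between start and end
    frames = []
    for k in range(steps + 1):
        t = k / steps  # interpolation parameter in [0, 1]
        frame = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 2.0, 0.0), (1.0, 2.0, 4.0)]
    for frame in interpolate_conformations(a, b, n_intermediate=1):
        print(frame)
```

Plain Cartesian interpolation can distort bond lengths and angles in the intermediate frames, so a practical morphing tool would need some way of keeping the geometry physically reasonable.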

Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody.

Castellana, Natalie E; McCutcheon, Krista; Pham, Victoria C; Harden, Kristin; Nguyen, Allen; Young, Judy; Adams, Camellia; Schroeder, Kurt; Arnott, David; Bafna, Vineet; Grogan, Jane L; Lill, Jennie R.

Proteomics ; 11(3): 395-405, 2011 Feb.

Article in English | MEDLINE | ID: mdl-21268269

ABSTRACT

A mouse hybridoma antibody directed against a member of the tumour necrosis factor (TNF)-superfamily, lymphotoxin-alpha (LT-α), was isolated from stored mouse ascites and purified to homogeneity. After more than a decade of storage the genetic material was not available for cloning; however, biochemical assays with the ascites showed this antibody against LT-α (LT-3F12) to be a preclinical candidate for the treatment of several inflammatory pathologies. We have successfully rescued the LT-3F12 antibody by performing MS analysis, primary amino acid sequence determination by template proteogenomics, and synthesis of the corresponding recombinant DNA by reverse engineering. The resurrected antibody was expressed, purified and shown to demonstrate the desired specificity and binding properties in a panel of immuno-biochemical tests. The work described herein demonstrates the powerful combination of high-throughput informatic proteomic de novo sequencing with reverse engineering to reestablish monoclonal antibody-expressing cells from an archived protein sample, exemplifying the development of novel therapeutics from cryptic protein sources.

Subject(s)

Antibodies, Anti-Idiotypic/metabolism , Antibodies, Monoclonal/metabolism , Genetic Engineering , Genomics , Lymphotoxin-alpha/metabolism , Proteomics , Recombinant Proteins/metabolism , Amino Acid Sequence , Animals , Antibodies, Anti-Idiotypic/genetics , Antibodies, Anti-Idiotypic/immunology , Antibodies, Monoclonal/genetics , Antibodies, Monoclonal/immunology , Cells, Cultured , Endothelium, Vascular/cytology , Endothelium, Vascular/metabolism , Hybridomas , Lymphotoxin-alpha/genetics , Lymphotoxin-alpha/immunology , Mice , Molecular Sequence Data , Recombinant Proteins/genetics , Recombinant Proteins/immunology , Sequence Homology, Amino Acid , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Umbilical Veins/cytology , Umbilical Veins/metabolism

Template proteogenomics: sequencing whole proteins using an imperfect database.

Castellana, Natalie E; Pham, Victoria; Arnott, David; Lill, Jennie R; Bafna, Vineet.

Mol Cell Proteomics ; 9(6): 1260-70, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20164058

ABSTRACT

Database search algorithms are the primary workhorses for the identification of tandem mass spectra. However, these methods are limited to the identification of spectra for which peptides are present in the database, preventing the identification of peptides from mutated or alternatively spliced sequences. A variety of methods has been developed to search a spectrum against a sequence allowing for variations. Some tools determine the sequence of the homologous protein in the related species but do not report the peptide in the target organism. Other tools consider variations, including modifications and mutations, in reconstructing the target sequence. However, these tools will not work if the template (homologous peptide) is missing in the database, and they do not attempt to reconstruct the entire protein target sequence. De novo identification of peptide sequences is another possibility, because it does not require a protein database. However, the lack of a database reduces accuracy. We present a novel proteogenomic approach, GenoMS, that draws on the strengths of database and de novo peptide identification methods. Protein sequence templates (i.e. proteins or genomic sequences that are similar to the target protein) are identified using the database search tool InsPecT. The templates are then used to recruit, align, and de novo sequence regions of the target protein that have diverged from the database or are missing. We used GenoMS to reconstruct the full sequence of an antibody by using spectra acquired from multiple digests using different proteases. Antibodies are a prime example of proteins that confound standard database identification techniques. The mature antibody genes result from large-scale genome rearrangements with flexible fusion boundaries and somatic hypermutation. Using GenoMS we automatically reconstruct the complete sequences of two immunoglobulin chains with accuracy greater than 98% using a diverged protein database. Using the genome as the template, we achieve accuracy exceeding 97%.

Subject(s)

Databases, Protein , Proteomics/methods , Sequence Analysis, Protein/methods , Templates, Genetic , Algorithms , Amino Acid Sequence , Animals , Immunoglobulins/biosynthesis , Immunoglobulins/chemistry , Markov Chains , Mice , Receptors, Immunologic/chemistry , Receptors, Immunologic/metabolism , Sequence Alignment , Tandem Mass Spectrometry

Discovery and revision of Arabidopsis genes by proteogenomics.

Castellana, Natalie E; Payne, Samuel H; Shen, Zhouxin; Stanke, Mario; Bafna, Vineet; Briggs, Steven P.

Proc Natl Acad Sci U S A ; 105(52): 21034-8, 2008 Dec 30.

Article in English | MEDLINE | ID: mdl-19098097

ABSTRACT

Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.

Subject(s)

Arabidopsis Proteins/genetics , Arabidopsis/genetics , Genome, Plant/genetics , Proteome/genetics , Proteomics , Software , Models, Genetic , Proteomics/methods
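
The Arabidopsis abstract above names a 6-frame translation of the genome as one of the three search spaces. As a minimal sketch of what that term generally means, the code below translates a DNA string in the three forward frames and the three frames of its reverse complement, using the standard genetic code. The function names and the frame labels (`+1` to `-3`) are illustrative assumptions; the authors' actual pipeline is not described in the abstract.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, offset: int) -> str:
    """Translate seq starting at offset; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(seq, offset)
        frames[f"-{offset + 1}"] = translate_frame(rev, offset)
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons appear as `*` in the output. A peptide search would typically split each frame at stops and match spectra against the resulting open stretches, though that step is omitted here.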
