Subject(s)
Brain Neoplasms , Formaldehyde , Glioblastoma , Whole Genome Sequencing , Humans , Glioblastoma/genetics , Glioblastoma/diagnosis , Whole Genome Sequencing/methods , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Brain Neoplasms/diagnosis , Tissue Fixation/methods , DNA, Neoplasm/genetics , Male , Female
ABSTRACT
New approach methodologies (NAM), including omics and in vitro approaches, are contributing to the implementation of 3R (reduction, refinement and replacement) strategies in regulatory science and risk assessment. In this study, we present an integrative transcriptomics and proteomics analysis workflow for the validation and revision of complex fish genomes and demonstrate how proteogenomics expression matrices can be used to support multi-level omics data integration in non-model species in vivo and in vitro. Using Atlantic salmon as an example, we constructed proteogenomic databases from publicly available transcriptomic data and in-house generated RNA-Seq and LC-MS/MS data. Our analysis identified ~80,000 peptides, providing direct evidence of translation for over 40,000 RefSeq structures. The data also highlighted 183 co-located peptide groups that supported a single transcript each, and in each case, either corrected a previous annotation, supported Ensembl annotations not present in RefSeq, or identified novel previously unannotated genes. Proteogenomics data-derived expression matrices revealed distinct profiles for the different tissue types analyzed. Focusing on proteins involved in defense against xenobiotics, we detected distinct expression patterns across different salmon tissues and observed homology in the expression of chemical defense proteins between in vivo and in vitro liver systems. Our study demonstrates the potential of proteogenomic analyses in extending our understanding of complex fish genomes and provides an advanced bioinformatic toolkit to support the further development of NAMs and their application in regulatory science and (eco)toxicological studies of non-model species.
Subject(s)
Proteogenomics , Animals , Proteogenomics/methods , Molecular Sequence Annotation , Chromatography, Liquid , Tandem Mass Spectrometry , Proteomics/methods , Peptides/analysis , Peptides/genetics , Peptides/metabolism
ABSTRACT
Previous studies have implicated the novel peptide antibiotic human beta-defensin 1 (hBD-1) in the pathogenesis of cystic fibrosis. We describe in this report the isolation and characterization of the second member of this defensin family, human beta-defensin 2 (hBD-2). A cDNA for hBD-2 was identified by homology to hBD-1. hBD-2 is expressed diffusely throughout epithelia of many organs, including the lung, where it is found in the surface epithelia and serous cells of the submucosal glands. A specific antibody made of recombinant peptide detected hBD-2 in airway surface fluid of human lung. The fully processed peptide has broad antibacterial activity against many organisms, which is salt sensitive and synergistic with lysozyme and lactoferrin. These data suggest the existence of a family of beta-defensin molecules on mucosal surfaces that in the aggregate contributes to normal host defense.
Subject(s)
Anti-Bacterial Agents/chemistry , Blood Proteins/chemistry , Proteins/chemistry , Salts/pharmacology , beta-Defensins , Amino Acid Sequence , Anti-Bacterial Agents/pharmacology , Anti-Infective Agents/chemistry , Blood Proteins/urine , Bronchoalveolar Lavage , Cloning, Molecular , Cystic Fibrosis/physiopathology , Defensins , Humans , In Situ Hybridization, Fluorescence , Lactoferrin/pharmacology , Lung/chemistry , Molecular Sequence Data , Muramidase/pharmacology , Peptide Fragments/chemistry , RNA, Messenger/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/pharmacology , Sequence Analysis, DNA
ABSTRACT
This paper introduces a novel class of tree comparison problems strongly motivated by an important and cost-intensive step in the drug discovery pipeline, viz., mapping cell-bound receptors to the ligands they bind to and vice versa. Tree comparison studies motivated by problems such as virus-host tree comparison, gene-species tree comparison and the consensus tree problem have been reported. None of these studies is applicable in our context because in all these problems, there is a well-defined mapping of the nodes the trees are built on across the set of trees being compared. A new class of tree comparison problems arises in cases where finding the correspondence among the nodes of the trees being compared is itself the problem. The problem arises while trying to find the interclass correspondence between the members of a pair of coevolving classes, e.g., cell-bound receptors and their ligands. Given the evolution of the two classes, the combinatorial problem is to find a mapping among the leaves of the two trees that optimizes a given cost function. In this work we formulate, for the first time, various combinatorial optimization problems motivated by the aforementioned biological problem. We present hardness results, give an efficient algorithm for a restriction of the problem and demonstrate its applicability.
Subject(s)
Receptors, Chemokine/metabolism , Algorithms , Biological Evolution , Biometry , Chemokines/genetics , Chemokines/metabolism , Drug Design , Ligands , Receptors, Chemokine/genetics
ABSTRACT
We have employed recently developed blind modification search techniques to generate the most comprehensive map of post-translational modifications (PTMs) in human lens constructed to date. Three aged lenses, two of which had moderate cataract, and one young control lens were analyzed using multidimensional liquid chromatography mass spectrometry. In total, 491 modification sites in lens proteins were identified. There were 155 in vivo PTM sites in crystallins: 77 previously reported sites and 78 newly detected PTM sites. Several of these sites had modifications previously undetected by mass spectrometry in lens including carboxymethyl lysine (+58 Da), carboxyethyl lysine (+72 Da), and an arginine modification of +55 Da with yet unknown chemical structure. These new modifications were observed in all three aged lenses but were not found in the young lens. Several new sites of cysteine methylation were identified indicating this modification is more extensive in lens than previously thought. The results were used to estimate the extent of modification at specific sites by spectral counting. We tested the long-standing hypothesis that PTMs contribute to age-related loss of crystallin solubility by comparing spectral counts between the water-soluble and water-insoluble fractions of the aged lenses and found that the extent of deamidation was significantly increased in the water-insoluble fractions. On the basis of spectral counting, the most abundant PTMs in aged lenses were deamidations and methylated cysteines with other PTMs present at lower levels.
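The spectral-counting estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes the extent of modification at a site is approximated as the share of spectra covering that site that carry the modification; the abstract does not give the exact formula used, and the function name and the counts in the usage lines are hypothetical, not data from the study.

```python
def modification_extent(modified_spectra: int, unmodified_spectra: int) -> float:
    """Estimate the fraction of a site that is modified from spectral counts.

    Assumption (not from the abstract): extent = modified / (modified + unmodified),
    counting only spectra whose peptide covers the site in question.
    """
    if modified_spectra < 0 or unmodified_spectra < 0:
        raise ValueError("spectral counts must be non-negative")
    total = modified_spectra + unmodified_spectra
    if total == 0:
        raise ValueError("no spectra cover this site")
    return modified_spectra / total


# Hypothetical counts for one deamidation site in two fractions of the same lens.
soluble = modification_extent(modified_spectra=3, unmodified_spectra=9)
insoluble = modification_extent(modified_spectra=6, unmodified_spectra=6)
print(f"water-soluble: {soluble:.2f}, water-insoluble: {insoluble:.2f}")
```

Comparing the two values per site is one simple way to see whether a modification is enriched in the water-insoluble fraction; a real analysis would also need a significance test across sites and samples.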
Subject(s)
Amides/analysis , Crystallins/analysis , Lens, Crystalline/chemistry , Protein Processing, Post-Translational , Age Factors , Aged , Aged, 80 and over , Amino Acid Sequence , Cysteine/analysis , Humans , Infant, Newborn , Male , Methylation , Molecular Sequence Data , Peptides/analysis , Solubility
ABSTRACT
Biological signals, such as the start of protein translation in eukaryotic mRNA, are stretches of nucleotides recognized by cellular machinery. There are a variety of techniques for modeling and identifying them. Most of these techniques either assume that the base pairs at each position of the signal are independently distributed, or they allow for limited dependencies among different positions. In previous work, we provided a statistical model that generalizes earlier methods and captures all significant high-order dependencies among different base positions. In this paper, we use a set of experimentally verified translation initiation (TI) sites (provided by Amos Bairoch) from eukaryotic sequences to train a range of methods, and then compare these methods. None of the methods is effective in predicting TI sites. We take advantage of the ribosome scanning model (Cigan et al., 1988) to significantly improve the prediction accuracy for full-length mRNAs. The ribosome scanning model suggests scanning from the 5' end of the capped mRNA and initiating translation at the first AUG in good context. This reduces the search space dramatically and accounts for its effectiveness. The success of this approach illustrates how biological ideas can illuminate and help solve challenging problems in computational biology.
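The scanning rule described in this abstract (start at the 5' end and initiate at the first AUG in good context) can be sketched as follows. This is only an illustration: the abstract does not define "good context", so the sketch assumes a simple Kozak-style rule (a purine at position -3 or a G at position +4), and the function name is hypothetical. It does not reproduce the authors' statistical model.

```python
from typing import Optional


def first_aug_in_good_context(mrna: str) -> Optional[int]:
    """Scan 5'->3' and return the 0-based index of the first AUG in 'good' context.

    Assumption (not from the abstract): context is 'good' when there is a
    purine (A/G) three bases upstream of the A, or a G right after the codon.
    Returns None if no such AUG exists.
    """
    seq = mrna.upper().replace("T", "U")  # accept cDNA-style input as well
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "AUG":
            continue
        purine_at_minus3 = i >= 3 and seq[i - 3] in "AG"
        g_at_plus4 = i + 3 < len(seq) and seq[i + 3] == "G"
        if purine_at_minus3 or g_at_plus4:
            return i
        # Weak context: the scanning ribosome is assumed to bypass this AUG.
    return None


# The first AUG (index 5) has C at -3 and C at +4, so it is skipped;
# the second AUG (index 14) has A at -3 and is selected.
print(first_aug_in_good_context("GGCUUAUGCCCACCAUGGCUUAA"))
```

Because only AUGs reached from the 5' end are considered, the candidate set shrinks from every AUG in the transcript to the first acceptable one, which is the search-space reduction the abstract refers to.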
Subject(s)
DNA, Complementary/genetics , Models, Genetic , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism , Artificial Intelligence , Biometry , Codon, Initiator/genetics , Databases, Factual , Expressed Sequence Tags , Peptide Chain Initiation, Translational/genetics , Proteins/genetics , RNA, Messenger/genetics
ABSTRACT
Proteomics, or the direct analysis of the expressed protein components of a cell, is critical to our understanding of cellular biological processes in normal and diseased tissue. A key requirement for its success is the ability to identify proteins in complex mixtures. Recent technological advances in tandem mass spectrometry have made it the method of choice for high-throughput identification of proteins. Unfortunately, the software for unambiguously identifying peptide sequences has not kept pace with the recent hardware improvements in mass spectrometry instruments. Critical for reliable high-throughput protein identification, scoring functions evaluate the quality of a match between experimental spectra and a database peptide. Current scoring function technology relies heavily on ad hoc parameterization and manual curation by experienced mass spectrometrists. In this work, we propose a two-stage stochastic model for the observed MS/MS spectrum, given a peptide. Our model explicitly incorporates fragment ion probabilities, noisy spectra, and instrument measurement error. We describe how to compute this probability-based score efficiently, using a dynamic programming technique. A prototype implementation demonstrates the effectiveness of the model.
Subject(s)
Databases, Protein , Mass Spectrometry/statistics & numerical data , Models, Statistical , Peptides/chemistry , Amino Acid Sequence , Computational Biology , Molecular Sequence Data , Peptide Fragments/chemistry , Peptide Fragments/genetics , Peptide Fragments/isolation & purification , Peptides/genetics , Peptides/isolation & purification , Proteome , Software , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/statistics & numerical data , Stochastic Processes
ABSTRACT
A new approach to gene finding, called the "Conserved Exon Method" (CEM), is introduced. It is based on the idea of looking for conserved protein sequences by comparing pairs of DNA sequences, identifying putative exon pairs based on conserved regions and splice junction signals, and then chaining pairs of putative exons together. It simultaneously predicts gene structures in both human and mouse genomic sequences (or in other pairs of sequences at the appropriate evolutionary distance). Experimental results indicate the potential usefulness of this approach.