Results 1 - 8 of 8
1.
J Proteome Res ; 9(5): 2496-507, 2010 May 07.
Article in English | MEDLINE | ID: mdl-20192274

ABSTRACT

Sequenced genomes often reveal interrupted coding sequences that complicate the annotation process and the subsequent functional characterization of the genes. In the past, interrupted genes were generally considered to be the result of sequencing errors or to be pseudogenes, that is, gene remnants with little or no biological importance. However, recent lines of evidence support the hypothesis that these coding sequences can be functional; thus, it is crucial to understand whether interrupted genes are expressed in vivo. We addressed this issue by experimentally demonstrating the existence of functional disrupted genes in archaeal genomes. We discovered previously unknown disrupted genes that have interrupted homologues in distantly related species of archaea. The combination of an RT-PCR strategy with shotgun proteomics demonstrates that interrupted genes in the archaeon Sulfolobus solfataricus are expressed in vivo. In addition, the peptide sequences determined by LC-MS/MS, together with in vitro translation experiments, allowed us to identify a gene expressed by programmed -1 frameshifting. Our findings will enable an accurate reinterpretation of archaeal interrupted genes, shedding light on their function and on archaeal genome evolution.
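The abstract reports a gene expressed by programmed -1 frameshifting. The following is a minimal Python sketch, assuming the standard genetic code and an already known shift site, of how a -1 shift lets translation bypass a stop codon that interrupts the annotated frame. The function names and the toy sequence are hypothetical, and the model deliberately ignores the mechanics of real slippery sites.

```python
# Standard genetic code; the bacterial/archaeal table 11 differs only in which
# codons can initiate, so elongation uses the same codon-to-amino-acid map.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate seq in frame 0, stopping at the first stop codon.

    Trailing bases that do not form a full codon are ignored.
    """
    seq = seq.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_minus1_shift(seq: str, shift_pos: int) -> str:
    """Hypothetical, simplified model of a programmed -1 frameshift.

    Codons are read in frame 0 up to nucleotide offset shift_pos (a codon
    boundary); reading then resumes one nucleotide upstream, so the base at
    shift_pos - 1 is read twice.
    """
    if shift_pos < 3 or shift_pos % 3 != 0:
        raise ValueError("shift_pos must be a positive multiple of 3")
    upstream = translate(seq[:shift_pos])
    if len(upstream) < shift_pos // 3:
        # A stop codon before the shift site terminates translation.
        return upstream
    downstream = translate(seq[shift_pos - 1:])
    return upstream + downstream


mrna = "ATGAAATTTTAGGGTAA"
print(translate(mrna))                       # MKF (TAG interrupts frame 0)
print(translate_with_minus1_shift(mrna, 9))  # MKFLG
```

In this toy sequence the annotated frame stops at TAG after three residues, whereas the shifted reading continues into the -1 frame, which is the kind of product that peptide evidence from LC-MS/MS could distinguish.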


Subject(s)
Archaeal Proteins/chemistry , Genes, Archaeal , High-Throughput Screening Assays/methods , Proteome/analysis , Proteomics/methods , Sulfolobus solfataricus/genetics , Amino Acid Sequence , Archaeal Proteins/genetics , Base Sequence , Chromatography, Liquid , Molecular Sequence Data , Peptide Mapping , Pseudogenes , Reverse Transcriptase Polymerase Chain Reaction , Tandem Mass Spectrometry , Transketolase/chemistry , Transketolase/genetics
2.
BMC Genomics ; 11: 634, 2010 Nov 16.
Article in English | MEDLINE | ID: mdl-21080938

ABSTRACT

BACKGROUND: Alvinella pompejana is a representative of Annelids, a key phylum for evo-devo studies that is still poorly studied at the sequence level. A. pompejana inhabits deep-sea hydrothermal vents and is currently known as one of the most thermotolerant Eukaryotes in marine environments, withstanding the largest known chemical and thermal ranges (from 5 to 105°C). This tube-dwelling worm forms dense colonies on the surface of hydrothermal chimneys and can withstand long periods of hypo/anoxia and long phases of exposure to hydrogen sulphides. A. pompejana specifically inhabits chimney walls of hydrothermal vents on the East Pacific Rise. To survive, Alvinella has developed numerous adaptations at the physiological and molecular levels, such as an increase in the thermostability of proteins and protein complexes. It represents an outstanding model organism for studying adaptation to harsh physicochemical conditions and for isolating stable macromolecules resistant to high temperatures. RESULTS: We have constructed four full-length-enriched cDNA libraries to investigate the biology and evolution of this intriguing animal. Analysis of more than 75,000 high-quality reads led to the identification of 15,858 transcripts and 9,221 putative protein sequences. Our annotation reveals a good coverage of most animal pathways and networks with a prevalence of transcripts involved in oxidative stress resistance, detoxification, anti-bacterial defence, and heat shock protection. Alvinella proteins seem to show a slow evolutionary rate and a higher similarity with proteins from Vertebrates compared to proteins from Arthropods or Nematodes. Their composition shows enrichment in positively charged amino acids that might contribute to their thermostability.
The gene content of Alvinella reveals that an important pool of genes previously considered to be specific to Deuterostomes were in fact already present in the last common ancestor of the Bilaterian animals, but were secondarily lost in model invertebrates. This pool is enriched in glycoproteins that play a key role in intercellular communication, hormonal regulation and immunity. CONCLUSIONS: Our study starts to unravel the gene content and sequence evolution of a deep-sea annelid, revealing key features in eukaryote adaptation to extreme environmental conditions and highlighting the proximity of Annelids and Vertebrates.
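The enrichment in positively charged amino acids mentioned above is a compositional measurement. A minimal Python sketch of such a measurement is shown below; it is not the authors' pipeline, the function names are hypothetical, and which residues count as "positively charged" is an assumption exposed as a parameter (Lys and Arg by default, with His optional because it is only partially protonated at neutral pH).

```python
def positive_fraction(protein: str, basic: str = "KR") -> float:
    """Fraction of residues in `protein` that belong to `basic`.

    Non-letter symbols such as '*' (stop) are ignored.
    """
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        return 0.0
    basic_set = set(basic.upper())
    return sum(aa in basic_set for aa in residues) / len(residues)


def mean_positive_fraction(proteins: list[str], basic: str = "KR") -> float:
    """Average per-protein fraction, e.g. for comparing two proteomes."""
    if not proteins:
        return 0.0
    return sum(positive_fraction(p, basic) for p in proteins) / len(proteins)


print(positive_fraction("MKRAA"))         # 0.4
print(positive_fraction("MKRAH", "KRH"))  # 0.6
```

Comparing `mean_positive_fraction` between orthologous protein sets from two species would be one simple way to express such an enrichment; a real analysis would also need an appropriate statistical test.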


Subject(s)
DNA, Complementary/genetics , Evolution, Molecular , Phylogeny , Polychaeta/genetics , Adaptation, Physiological/drug effects , Adaptation, Physiological/genetics , Amino Acids/genetics , Animals , Base Composition/genetics , Bayes Theorem , Databases, Genetic , Expressed Sequence Tags , Gene Expression Regulation , Gene Library , Internet , Metals, Heavy/toxicity , Molecular Sequence Annotation , Molecular Sequence Data , Oxidative Stress/drug effects , Oxidative Stress/genetics , Polychaeta/drug effects , Protein Structure, Tertiary , Ribosomes/genetics , Temperature , Vertebrates/genetics
3.
BMC Bioinformatics ; 9: 213, 2008 Apr 25.
Article in English | MEDLINE | ID: mdl-18439277

ABSTRACT

BACKGROUND: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection that depend strongly on motif-conservation scores in homologous proteins are now being developed. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs, which are specifically optimised to align globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs. RESULTS: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
CONCLUSION: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.


Subject(s)
Amino Acid Motifs , Sequence Alignment/standards , Software Validation , User-Computer Interface , Artificial Intelligence , Pattern Recognition, Automated/methods , Pattern Recognition, Automated/standards , Proteins/analysis , Proteins/ultrastructure , Proteomics/methods , Quality Control , Reproducibility of Results , Sequence Alignment/methods , Sequence Homology, Amino Acid
4.
BMC Evol Biol ; 8: 78, 2008 Mar 06.
Article in English | MEDLINE | ID: mdl-18325090

ABSTRACT

BACKGROUND: Computer-assisted analyses have shown that all bacterial genomes contain a small percentage of open reading frames with a frameshift or in-frame stop codon. We report here a comparative analysis of these interrupted coding sequences (ICDSs) in six isolates of M. tuberculosis, two of M. bovis and one of M. africanum, and question their phenotypic impact and evolutionary significance. RESULTS: ICDSs were classified as "common to all strains" or "strain-specific". Common ICDSs are believed to result from mutations acquired before the divergence of the species, whereas strain-specific ICDSs were acquired after this divergence. Comparative analyses of these ICDSs therefore define the molecular signature of a particular strain, phylogenetic lineage or species, which may be useful for inferring phenotypic traits such as virulence and molecular relationships. For instance, in silico analysis shows that the W-Beijing lineage of M. tuberculosis, an emergent family involved in several outbreaks, is readily distinguishable from other lineages by its smaller number of common ICDSs, including at least one known to be associated with virulence. Our observation was confirmed through the sequencing analysis of ICDSs in a panel of 21 clinical M. tuberculosis strains. This analysis further illustrates the divergence of the W-Beijing lineage from other lineages in terms of the number of full-length ORFs not containing a frameshift. We further show that ICDS formation is not associated with the presence of a mutated promoter, and suggest that promoter extinction is not the main cause of pseudogene formation. CONCLUSION: The correlation between ICDSs, function and phenotypes could have important evolutionary implications. This study provides population geneticists with a list of targets which could undergo selective pressure and thus alter relationships between the various lineages of M. tuberculosis strains and their host.
This approach could be applied to any closely related bacterial strains or species for which several genome sequences are available.
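The abstract defines ICDSs as open reading frames carrying a frameshift or an in-frame stop codon, and classifies them as common to all strains or strain-specific. The Python sketch below illustrates both ideas under stated assumptions: the sequence check is a crude, sequence-only heuristic (the cited studies detect ICDSs by similarity with homologues, which this does not do), the standard stop codons are assumed, and all function names and gene identifiers are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code; TGA is not a stop in, e.g., Mycoplasma


def internal_stop_offsets(cds: str) -> list[int]:
    """0-based offsets of in-frame stop codons before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return [pos for pos in range(0, usable - 3, 3)
            if cds[pos:pos + 3] in STOP_CODONS]


def looks_interrupted(cds: str) -> bool:
    """Crude flag: a length that is not a multiple of 3 hints at a frameshift,
    and any internal in-frame stop codon marks the CDS as interrupted."""
    return len(cds) % 3 != 0 or bool(internal_stop_offsets(cds))


def classify_icds(icds_by_strain: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split ICDS gene identifiers into those common to all strains and those
    found in a single strain only (genes shared by a subset of strains fall
    into neither category)."""
    if not icds_by_strain:
        return set(), {}
    common = set.intersection(*icds_by_strain.values())
    specific = {}
    for strain, genes in icds_by_strain.items():
        others = set().union(*(g for s, g in icds_by_strain.items() if s != strain))
        specific[strain] = genes - others
    return common, specific


print(internal_stop_offsets("ATGAAATAGGGCTAA"))  # [6]
print(looks_interrupted("ATGAAAGGCTAA"))         # False
print(classify_icds({"strainA": {"g1", "g2"}, "strainB": {"g1", "g3"}}))
```

Genes flagged in some but not all strains would correspond to lineage-level signatures, which is where the comparison across the W-Beijing lineage and other lineages becomes informative.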


Subject(s)
DNA, Bacterial/genetics , Evolution, Molecular , Mycobacterium bovis/genetics , Mycobacterium tuberculosis/genetics , Open Reading Frames , Bacterial Typing Techniques , Frameshift Mutation , Genome, Bacterial , Mycobacterium bovis/classification , Mycobacterium tuberculosis/classification , Phylogeny , Sequence Analysis, DNA , Species Specificity
5.
Nucleic Acids Res ; 34(Database issue): D338-43, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381882

ABSTRACT

Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequences (ICDSs) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDSs detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDSs can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided, as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDSs, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from newly sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDSs with the results obtained by comparing the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDSs in the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.
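The reported specificity (95%) and sensitivity (82%) summarize how predictions compare with a known truth set, such as the artificial-frameshift benchmark. The Python sketch below shows the usual formulas, assuming counts of true positives (TP), false positives (FP) and false negatives (FN). Note an assumption: in gene-prediction work "specificity" often means TP / (TP + FP), whereas clinical statistics use TN / (TN + FP); the abstract does not state which definition was used. The counts in the example are invented for illustration and are not the paper's data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true ICDSs that the predictor recovers: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Gene-prediction convention: fraction of predictions that are real,
    TP / (TP + FP). Clinical statistics instead use TN / (TN + FP)."""
    return tp / (tp + fp)


# Hypothetical counts, for illustration only.
print(sensitivity(tp=82, fn=18))  # 0.82
print(specificity(tp=19, fp=1))   # 0.95
```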


Subject(s)
Codon, Terminator , Databases, Genetic , Frameshift Mutation , Genome, Archaeal , Genome, Bacterial , Bacillus/genetics , Genomics , Internet , Mycobacterium smegmatis/genetics , Sequence Homology, Amino Acid , User-Computer Interface
6.
Genome Res ; 19(1): 128-35, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18955433

ABSTRACT

Progress in sequencing technologies is providing biology with an ever-increasing number of genome sequences. In most cases, the gene repertoire is predicted in silico and conceptually translated into proteins. As recently highlighted, the predicted genes exhibit frequent errors, particularly in start codons, with a serious impact on subsequent biological studies. A new "ortho-proteogenomic" approach is presented here for the annotation refinement of multiple genomes at once. It combines comparative genomics with an original proteomic protocol that allows the characterization of both N-terminal and internal peptides in a single experiment. This strategy was applied to the Mycobacterium genus with Mycobacterium smegmatis as the reference, and identified 946 distinct proteins, including 443 characterized N termini. These experimental data allowed the correction of 19% of the characterized start codons, the identification of 29 proteins missed during the annotation process, and the curation, thanks to comparative genomics, of 4328 sequences of 16 other Mycobacterium proteomes.
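Correcting a start codon from an observed N-terminal peptide amounts to finding which in-frame start codon yields a product beginning with that peptide. The Python sketch below illustrates only that matching step, not the authors' proteomic protocol. It assumes the ORF is given in frame from an upstream in-frame stop to the annotated stop, that ATG, GTG and TTG are the candidate bacterial starts (all decoded as Met at initiation), and that the initiator Met may have been removed; `assign_start` and the toy sequence are hypothetical.

```python
from typing import Optional

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
START_CODONS = ("ATG", "GTG", "TTG")  # assumed candidate start codons


def translate(seq: str) -> str:
    """Translate in frame 0 until the first stop codon."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def assign_start(orf: str, n_term_peptide: str) -> Optional[int]:
    """Offset of the most upstream in-frame start codon whose product begins
    with the observed N-terminal peptide (with or without the initiator Met).
    Returns None if no candidate matches."""
    orf = orf.upper()
    peptide = n_term_peptide.upper()
    for pos in range(0, len(orf) - 2, 3):
        if orf[pos:pos + 3] not in START_CODONS:
            continue
        # The initiator codon, including GTG/TTG, is decoded as Met.
        protein = "M" + translate(orf[pos + 3:])
        if protein.startswith(peptide) or protein[1:].startswith(peptide):
            return pos
    return None


orf = "TTGAAAATGGCTGCTTAA"
print(assign_start(orf, "MKM"))  # 0: the upstream TTG start is supported
print(assign_start(orf, "MAA"))  # 6: the downstream ATG start is supported
print(assign_start(orf, "AA"))   # 6: same start, initiator Met removed
```

Once a start is confirmed in the reference genome, the comparative step described in the abstract would transfer the correction to orthologous genes in related genomes.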


Subject(s)
Genomics/methods , Proteomics/methods , Bacterial Proteins/genetics , Bacterial Proteins/isolation & purification , Codon, Initiator/genetics , Genome, Bacterial , Mass Spectrometry , Mycobacterium/chemistry , Mycobacterium/genetics , Mycobacterium smegmatis/chemistry , Mycobacterium smegmatis/genetics , Peptide Fragments/genetics , Peptide Fragments/isolation & purification , Proteome , RNA, Bacterial/genetics , Sequence Alignment , Species Specificity
7.
Genome Biol ; 8(2): R20, 2007.
Article in English | MEDLINE | ID: mdl-17295914

ABSTRACT

BACKGROUND: In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may really be present in the organism or may result from misannotation based on sequencing errors. Whether these sequences are real has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects. RESULTS: We show here, using Mycobacterium smegmatis as a model species, that a significant proportion of these ICDSs result from sequencing errors. We used a resequencing procedure and mass spectrometry analysis to determine the nature of a number of ICDSs in this organism. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors. CONCLUSION: The correction of these errors results in modification of the predicted amino acid sequences of the corresponding proteins and changes in annotation. We suggest that each bacterial ICDS should be investigated individually, to determine its true status and to ensure that the genome sequence is appropriate for comparative genomics analyses.


Subject(s)
Frameshift Mutation/genetics , Genomics/methods , Mycobacterium smegmatis/genetics , Research Design , Sequence Analysis, DNA/methods , Chromatography, Liquid , Computational Biology , Electrophoresis, Gel, Two-Dimensional , Tandem Mass Spectrometry
8.
EMBO Rep ; 3(12): 1195-200, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12446570

ABSTRACT

Blood cells play a crucial role in both morphogenetic and immunological processes in Drosophila, yet the factors regulating their proliferation remain largely unknown. In order to address this question, we raised antibodies against a tumorous blood cell line and identified an antigenic determinant that marks the surface of prohemocytes and also circulating plasmatocytes in larvae. This antigen was identified as a Drosophila homolog of the mammalian receptor for platelet-derived growth factor (PDGF)/vascular endothelial growth factor (VEGF). The Drosophila receptor controls cell proliferation in vitro. By overexpressing in vivo one of its putative ligands, PVF2, we induced a dramatic increase in circulating hemocytes. These results identify the PDGF/VEGF receptor homolog and one of its ligands as important players in Drosophila hematopoiesis.


Subject(s)
Cell Differentiation/physiology , Drosophila/metabolism , Hemocytes/physiology , Larva/metabolism , Receptors, Platelet-Derived Growth Factor/physiology , Receptors, Vascular Endothelial Growth Factor/physiology , Animals , Antibodies/immunology , Blotting, Western , Cell Differentiation/immunology , Drosophila/growth & development , Drosophila/immunology , Hemocytes/immunology , Immunohistochemistry , Larva/growth & development , Larva/immunology , Ligands