Results 1-20 of 39
1.
J Mol Microbiol Biotechnol ; 16(1-2): 81-90, 2009.
Article in English | MEDLINE | ID: mdl-18957864

ABSTRACT

Anaerobranca gottschalkii strain LBS3T is an extremophile living at high temperature (up to 65 degrees C) and in alkaline environments (up to pH 10.5). An assembly of 696 DNA contigs representing about 96% of the 2.26-Mbp genome of A. gottschalkii has been generated with a low-sequence-coverage shotgun-sequencing strategy. The chosen sequencing strategy provided rapid and economical access to genes encoding key enzymes of the mono- and polysaccharide metabolism, without diverting scarce resources to extensive sequencing of genes lacking potential economic value. Five amylolytic enzymes of considerable commercial interest for biotechnological applications have been expressed and characterized in more detail after identification of their genes in the partial genome sequence: type I pullulanase, cyclodextrin glycosyltransferase (CGTase), two alpha-amylases (AmyA and AmyB), and an alpha-1,4-glucan-branching enzyme.


Subjects
Biotechnology , Enzymes/genetics , Bacterial Genes/genetics , Bacterial Genome/genetics , Gram-Positive Bacteria/enzymology , Gram-Positive Bacteria/genetics , alpha-Amylases/chemistry , alpha-Amylases/genetics , alpha-Amylases/isolation & purification , alpha-Amylases/metabolism
2.
Nucleic Acids Res ; 36(Database issue): D196-201, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18158298

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).


Subjects
Protein Databases , Fungal Proteins/chemistry , Fungal Proteins/genetics , Plant Proteins/chemistry , Plant Proteins/genetics , Fungal Proteins/metabolism , Fungal Genome , Plant Genome , Genomics , Internet , Plant Proteins/metabolism , Protein Interaction Mapping , Protein Sequence Analysis , Software , User-Computer Interface
3.
Nucleic Acids Res ; 34(Database issue): D169-72, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381839

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system is maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene-centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines: (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks (Genome Research Environment System), (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).


Subjects
Genetic Databases , Genomics , Proteins/genetics , Animals , Computational Biology/methods , Molecular Evolution , Internet , Mice , Genetic Models , Protein Interaction Mapping , User-Computer Interface
4.
BMC Bioinformatics ; 6: 266, 2005 Nov 07.
Article in English | MEDLINE | ID: mdl-16274476

ABSTRACT

BACKGROUND: Alternative splicing is a major mechanism of generating protein diversity in higher eukaryotes. Although at least half, and probably more, of mammalian genes are alternatively spliced, it was not clear whether the frequency of alternative splicing is the same in different functional categories. The problem is obscured by uneven coverage of genes by ESTs and a large number of artifacts in the EST data. RESULTS: We have developed a method that generates possible mRNA isoforms for human genes contained in the EDAS database, taking into account the effects of nonsense-mediated decay and translation initiation rules, and a procedure for offsetting the effects of uneven EST coverage. We then computed the number of mRNA isoforms for genes from different functional categories. Genes encoding ribosomal proteins and genes in the category "Small GTPase-mediated signal transduction" tend to have fewer isoforms than the average, whereas the genes in the category "DNA replication and chromosome cycle" have more isoforms than the average. Genes encoding proteins involved in protein-protein interactions tend to be alternatively spliced more often than genes encoding non-interacting proteins, although there is no significant difference in the number of isoforms of alternatively spliced genes. CONCLUSION: Filtering for functional isoforms satisfying biological constraints and accounting for uneven EST coverage allowed us to describe differences in alternative splicing of genes from different functional categories. The observations seem to be consistent with expectations based on current biological knowledge: fewer isoforms for ribosomal and signal transduction proteins, and more alternative splicing of interacting and cell cycle proteins.


Subjects
Algorithms , Alternative Splicing/physiology , Chromosome Mapping/methods , Initiator Codon , Molecular Computers , Humans , Protein Biosynthesis , Protein Isoforms/classification , Messenger RNA/chemistry , Messenger RNA/classification , Software
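The abstract above says candidate isoforms were filtered for the effects of nonsense-mediated decay (NMD) but does not state the rule used. The Python below is a minimal sketch that assumes the widely used "50-nucleotide rule": a transcript is flagged as an NMD target when its stop codon begins more than 50 nt upstream of the last exon-exon junction. The function names, the (name, exon lengths, stop position) tuple layout, and the threshold are illustrative assumptions, not the EDAS procedure.

```python
from typing import List, Tuple

# Hypothetical isoform record: (name, exon lengths in transcript order, 0-based stop codon start)
Isoform = Tuple[str, List[int], int]

def exon_junctions(exon_lengths: List[int]) -> List[int]:
    """Return 0-based transcript coordinates of every exon-exon junction."""
    junctions = []
    position = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        position += length
        junctions.append(position)
    return junctions

def is_nmd_target(exon_lengths: List[int], stop_codon_start: int, threshold: int = 50) -> bool:
    """Assumed 50-nt rule: True if the stop codon starts more than `threshold`
    nucleotides upstream of the last exon-exon junction."""
    junctions = exon_junctions(exon_lengths)
    if not junctions:
        return False  # single-exon transcript: the junction-based rule does not apply
    return junctions[-1] - stop_codon_start > threshold

def filter_isoforms(isoforms: List[Isoform]) -> List[str]:
    """Keep the names of isoforms that are not predicted NMD targets."""
    return [name for name, exons, stop in isoforms if not is_nmd_target(exons, stop)]
```

For example, with exons of 100, 200 and 300 nt the last junction sits at position 300, so an isoform whose stop codon starts at 120 is discarded while one whose stop codon starts at 450 (inside the last exon) is kept. Published variants of the rule use 50 to 55 nt and measure from different ends of the stop codon, so the threshold is deliberately a parameter.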
5.
Nucleic Acids Res ; 32(Database issue): D41-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681354

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).


Subjects
Protein Databases , Genome , Proteomics , Animals , Computational Biology , Complementary DNA/genetics , Fungi/genetics , Humans , Internet , Biological Models , Protein Binding , Sequence Homology
6.
Nucleic Acids Res ; 30(1): 31-4, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752246

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project-specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).


Subjects
Genetic Databases , Protein Databases , Genome , Amino Acid Sequence , Arabidopsis/genetics , Base Sequence , Expressed Sequence Tags , Fungal Genome , Human Genome , Plant Genome , Germany , Humans , Internet , Mitochondrial Proteins/genetics , Neurospora crassa/genetics , Yeasts/genetics
7.
J Mol Biol ; 311(4): 639-56, 2001 Aug 24.
Article in English | MEDLINE | ID: mdl-11518521

ABSTRACT

We describe a computational approach for finding genes that are functionally related but do not possess any noticeable sequence similarity. Our method, which we call SNAP (similarity-neighborhood approach), reveals the conservation of gene order on bacterial chromosomes based on both cross-genome comparison and context information. The novel feature of this method is that it does not rely on detection of conserved colinear gene strings. Instead, we introduce the notion of a similarity-neighborhood graph (SN-graph), which is constructed from the chains of similarity and neighborhood relationships between orthologous genes in different genomes and adjacent genes in the same genome, respectively. An SN-cycle is defined as a closed path on the SN-graph and is postulated to preferentially join functionally related gene products that participate in the same biochemical or regulatory process. We demonstrate the substantial non-randomness and functional significance of SN-cycles derived from real genome data and estimate the prediction accuracy of SNAP in assigning broad function to uncharacterized proteins. Examples of practical application of SNAP for improving the quality of genome annotation are described.


Subjects
Bacteria/genetics , Gene Order/genetics , Bacterial Genes/genetics , Bacterial Genome , Genomics/methods , Algorithms , Bacteria/metabolism , Computational Biology/methods , Conserved Sequence/genetics , Databases as Topic , Multigene Family/genetics
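The SN-graph in the SNAP abstract combines two kinds of edges: similarity edges between orthologous genes in different genomes, and neighborhood edges between adjacent genes in the same genome. The Python below builds such a graph and enumerates short closed paths on it. It assumes, as a simplification the abstract does not confirm, that an SN-cycle alternates strictly between similarity and neighborhood edges. The gene encoding, the `max_len` bound and the function names are hypothetical, and SNAP's statistical assessment of cycles is not reproduced.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Gene = Tuple[str, int]  # hypothetical encoding: (genome name, gene position on the chromosome)
Graph = Dict[Gene, Set[Tuple[Gene, str]]]

def build_sn_graph(genomes: Dict[str, int], orthologs: List[Tuple[Gene, Gene]]) -> Graph:
    """Undirected graph with 'N' (neighborhood) edges between adjacent genes in
    the same genome and 'S' (similarity) edges between ortholog pairs."""
    graph: Graph = defaultdict(set)
    for name, n_genes in genomes.items():
        for i in range(n_genes - 1):
            a, b = (name, i), (name, i + 1)
            graph[a].add((b, "N"))
            graph[b].add((a, "N"))
    for a, b in orthologs:
        graph[a].add((b, "S"))
        graph[b].add((a, "S"))
    return graph

def alternating_cycles(graph: Graph, max_len: int = 6) -> Set[FrozenSet[Gene]]:
    """Closed paths of at most `max_len` genes whose edges alternate S, N, S, N, ...
    Each cycle is reported once, as the frozenset of the genes it visits."""
    found: Set[FrozenSet[Gene]] = set()

    def dfs(start: Gene, node: Gene, path: List[Gene], last_type: str) -> None:
        for nxt, etype in graph[node]:
            if etype == last_type:
                continue  # two consecutive edges of the same type would break alternation
            if nxt == start:
                # Every search leaves `start` on an 'S' edge, so it must close on an 'N' edge.
                if etype == "N" and len(path) >= 4:
                    found.add(frozenset(path))
            elif nxt not in path and len(path) < max_len:
                dfs(start, nxt, path + [nxt], etype)

    for gene in list(graph):
        dfs(gene, gene, [gene], "N")  # pretend the previous edge was 'N' so the first is 'S'
    return found

if __name__ == "__main__":
    # Adjacent genes A0/A1 have adjacent orthologs B1/B0: a conserved gene pair.
    g = build_sn_graph({"A": 3, "B": 3}, [(("A", 0), ("B", 1)), (("A", 1), ("B", 0))])
    print(len(alternating_cycles(g)))  # prints 1
```

In the toy example, the only alternating cycle is A0, B1, B0, A1, which is the situation the abstract describes as conserved gene neighborhood. Reporting cycles as gene sets deduplicates the same cycle found from different starting genes and directions. The trade-off is that two distinct cycles over the same genes would be merged.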
8.
Bioinformatics ; 17(1): 44-57, 2001 Jan.
Article in English | MEDLINE | ID: mdl-11222261

ABSTRACT

MOTIVATION: Enormous demand for fast and accurate analysis of biological sequences is fuelled by the pace of genome analysis efforts. There is also an acute need for reliable up-to-date genomic databases integrating both functional and structural information. Here we describe the current status of the PEDANT software system for high-throughput analysis of large biological sequence sets and the genome analysis server associated with it. RESULTS: The principal features of PEDANT are: (i) completely automatic processing of data using a wide range of bioinformatics methods, (ii) manual refinement of annotation, (iii) automatic and manual assignment of gene products to a number of functional and structural categories, (iv) extensive hyperlinked protein reports, and (v) advanced DNA and protein viewers. The system is easily extensible and allows custom methods, databases, and categories to be included with minimal or no programming effort. PEDANT is actively used as a collaborative environment to support several ongoing genome sequencing projects. The main purpose of the PEDANT genome database is to quickly disseminate well-organized information on completely sequenced and unfinished genomes. It currently includes 80 genomic sequences and in many cases serves as the only source of exhaustive information on a given genome. The database also acts as a vehicle for a number of research projects in bioinformatics. Using SQL queries, it is possible to correlate a large variety of pre-computed properties of gene products encoded in complete genomes with each other and compare them with data sets of special scientific interest. In particular, the availability of structural predictions for over 300 000 genomic proteins makes PEDANT the most extensive structural genomics resource available on the web.


Subjects
Genomics , Software , Arabidopsis/genetics , Chaperonin 60/metabolism , Computational Biology , Factual Databases , Expressed Sequence Tags , Fungal Genome , Humans , Internet , Proteins/genetics , Saccharomyces cerevisiae/genetics , DNA Sequence Analysis/methods , DNA Sequence Analysis/statistics & numerical data , Protein Sequence Analysis/methods , Protein Sequence Analysis/statistics & numerical data
9.
Nature ; 407(6803): 508-13, 2000 Sep 28.
Article in English | MEDLINE | ID: mdl-11029001

ABSTRACT

Thermoplasma acidophilum is a thermoacidophilic archaeon, isolated from self-heating coal refuse piles and solfatara fields, that thrives at 59 degrees C and pH 2. Species of the genus Thermoplasma do not possess a rigid cell wall but are delimited only by a plasma membrane. Many macromolecular assemblies from Thermoplasma, primarily proteases and chaperones, have been pivotal in elucidating the structure and function of their more complex eukaryotic homologues. Our interest in protein folding and degradation led us to seek a more complete representation of the proteins involved in these pathways by determining the genome sequence of the organism. Here we have sequenced the 1,564,905-base-pair genome in just 7,855 sequencing reactions by using a new strategy. The 1,509 open reading frames identify Thermoplasma as a typical euryarchaeon with a substantial complement of bacteria-related genes; however, evidence indicates that there has been much lateral gene transfer between Thermoplasma and Sulfolobus solfataricus, a phylogenetically distant crenarchaeon inhabiting the same environment. At least 252 open reading frames, including a complete protein degradation pathway and various transport proteins, resemble Sulfolobus proteins most closely.


Subjects
Archaeal Genome , Thermoplasma/genetics , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Base Sequence , Archaeal DNA , Endopeptidases/metabolism , Energy Metabolism , Molecular Sequence Data , Open Reading Frames , Genetic Recombination , Sulfolobus/genetics , Thermoplasma/metabolism , Ubiquitins/metabolism
10.
Pac Symp Biocomput ; (12): 3-5, 2000.
Article in English | MEDLINE | ID: mdl-10902151
11.
Nucleic Acids Res ; 28(1): 37-40, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592176

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition of developing and maintaining high-quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www.mips.biochem.mpg.de).


Subjects
Factual Databases , Genome , Proteins/genetics , Arabidopsis/genetics , Humans , Internet , Neurospora crassa/genetics , Proteins/chemistry , Saccharomyces cerevisiae/genetics
12.
Prog Biophys Mol Biol ; 72(1): 1-17, 1999.
Article in English | MEDLINE | ID: mdl-10446500

ABSTRACT

Spectacular achievements in whole genome sequencing open up new possibilities for structural research. Protein structures can now be studied in their natural genomic context. On the other hand, structure prediction algorithms can be improved using species-specific tendencies in folding patterns. Finally, efficient strategies to select targets for structure determination can be devised. In this review we consider new computational approaches and results in protein structure analysis stemming from the availability of complete genomes.


Subjects
Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Bacteria/genetics , Fungi/genetics , Internet , Molecular Models , Molecular Structure , Protein Folding , Sequence Analysis
13.
Gene ; 234(2): 257-65, 1999 Jul 08.
Article in English | MEDLINE | ID: mdl-10395898

ABSTRACT

Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites without a learning sample. This algorithm is particularly useful for analysis of genomes with few or no experimentally mapped genes. There is a clear correlation between the ribosomal binding site (RBS) properties of a given genome and the potential gene start prediction accuracy. This correlation is of considerable predictive power and may be useful for estimating the expected success of future genome analysis efforts. We also demonstrate that the RBS properties depend on the phylogenetic position of a genome.


Subjects
Bacterial Genes/genetics , Algorithms , Base Sequence , Binding Sites , Initiator Codon/genetics , Bacterial DNA/genetics , Bacterial DNA/metabolism , Molecular Evolution , Phylogeny , Bacterial RNA/genetics , Ribosomal RNA/genetics , Reproducibility of Results , Ribosomes/metabolism , Sequence Alignment , Software
15.
Nucleic Acids Res ; 27(1): 44-8, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847138

ABSTRACT

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome-oriented databases. It is commonplace that the amount of available sequence data increases rapidly, whereas the capacity for qualified manual annotation at the sequence databases does not. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above-mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database.


Subjects
Amino Acid Sequence , Factual Databases , Genome , Proteins/chemistry , Animals , Arabidopsis/genetics , Expressed Sequence Tags , Fungal Genome , Human Genome , Plant Genome , Germany , Humans , Information Storage and Retrieval , Multigene Family , Proteins/genetics , Yeasts/genetics
16.
Nature ; 402(6758): 147-54, 1999 Nov 11.
Article in English | MEDLINE | ID: mdl-10647006

ABSTRACT

The chaperonin GroEL has an essential role in mediating protein folding in the cytosol of Escherichia coli. Here we show that GroEL interacts strongly with a well-defined set of approximately 300 newly translated polypeptides, including essential components of the transcription/translation machinery and metabolic enzymes. About one third of these proteins are structurally unstable and repeatedly return to GroEL for conformational maintenance. GroEL substrates consist preferentially of two or more domains with alphabeta-folds, which contain alpha-helices and buried beta-sheets with extensive hydrophobic surfaces. These proteins are expected to fold slowly and be prone to aggregation. The hydrophobic binding regions of GroEL may be well adapted to interact with the non-native states of alphabeta-domain proteins.


Subjects
Chaperonin 60/metabolism , Escherichia coli/metabolism , Protein Secondary Structure , Substrate Specificity
17.
Yeast ; 14(14): 1327-32, 1998 Oct.
Article in English | MEDLINE | ID: mdl-9802211

ABSTRACT

Single-read sequences from both ends of 415 genomic DNA fragments of Candida albicans, with an average size of 3 kb, were compared with the complete sequence data of Saccharomyces cerevisiae. Comparison at the protein level (translated DNA against protein sequences) revealed 138 sequence tags with clear similarity to S. cerevisiae proteins or open reading frames. One case of synteny was found for the open reading frames of RAD16 and LYS2, which are adjacent to each other in S. cerevisiae and C. albicans.


Subjects
Adenosine Triphosphatases , Candida albicans/genetics , Chromosome Mapping/methods , Fungal Genome , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae/genetics , Fungal Proteins , DNA Sequence Analysis , Sequence Tagged Sites , Species Specificity
18.
Bioinformatics ; 14(7): 551-61, 1998.
Article in English | MEDLINE | ID: mdl-9730920

ABSTRACT

MOTIVATION: It is only a matter of time until a user will see not many but one integrated database of information for molecular biology. Is this true? Is it a good thing? Why will it happen? Where are we now? What developments are fostering and what developments are impeding progress towards this end? SUPPLEMENTARY INFORMATION: A list of WWW resources devoted to database issues in molecular biology is available at http://www.mips.biochem.mpg.de CONTACT: frishman@mips.biochem.mpg.de


Subjects
Computational Biology , Databases as Topic , Database Management Systems , Internet , Quality Control
19.
Eur J Biochem ; 254(2): 230-7, 1998 Jun 01.
Article in English | MEDLINE | ID: mdl-9660175

ABSTRACT

Iron-regulatory protein-1 (IRP-1) plays a dual role as a regulatory RNA-binding protein and as a cytoplasmic aconitase. When bound to iron-responsive elements (IRE), IRP-1 post-transcriptionally regulates the expression of mRNAs involved in iron metabolism. IRP have been cloned from several vertebrate species. Using a degenerate-primer PCR strategy and the screening of databases, we now identify the homologues of IRP-1 in two invertebrate species, Drosophila melanogaster and Caenorhabditis elegans. Comparative sequence analysis shows that these invertebrate IRP are closely related to vertebrate IRP, and that the amino acid residues that have been implicated in aconitase function are particularly highly conserved, suggesting that invertebrate IRP may function as cytoplasmic aconitases. Antibodies raised against recombinant human IRP-1 immunoprecipitate the Drosophila homologue expressed from the cloned cDNA. In contrast to vertebrates, two IRP-1 homologues (Drosophila IRP-1A and Drosophila IRP-1B), displaying 86% identity to each other, are expressed in D. melanogaster. Both of these homologues are distinct from vertebrate IRP-2. In contrast to the mammalian system where the two IRP (IRP-1 and IRP-2) are differentially expressed, Drosophila IRP-1A and Drosophila IRP-1B are not preferentially expressed in specific organs. The localization of Drosophila IRP-1A to position 94C1-8 and of Drosophila IRP-1B to position 86B3-6 on the right arm of chromosome 3 and the availability of an IRP-1 cDNA from C. elegans will facilitate a genetic analysis of the IRE/IRP system, thus opening a new avenue to explore this regulatory network.


Subjects
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Iron-Sulfur Proteins/genetics , RNA-Binding Proteins/genetics , Amino Acid Sequence , Animals , Base Sequence , Chromosome Mapping , Molecular Cloning , Conserved Sequence , DNA Primers/genetics , Complementary DNA/genetics , Drosophila melanogaster/embryology , Drosophila melanogaster/metabolism , Molecular Evolution , Developmental Gene Expression Regulation , Humans , Iron Regulatory Protein 1 , Iron Regulatory Protein 2 , Iron-Regulatory Proteins , Molecular Sequence Data , Messenger RNA/genetics , Messenger RNA/metabolism
20.
Nucleic Acids Res ; 26(12): 2941-7, 1998 Jun 15.
Article in English | MEDLINE | ID: mdl-9611239

ABSTRACT

Analysis of a newly sequenced bacterial genome starts with identification of protein-coding genes. Functional assignment of proteins requires the exact knowledge of protein N-termini. We present a new program, ORPHEUS, that identifies candidate genes and accurately predicts gene starts. The analysis starts with a database similarity search and identification of reliable gene fragments. The latter are used to derive statistical characteristics of protein-coding regions and ribosome-binding sites and to predict the complete set of genes in the analyzed genome. In a test on Bacillus subtilis and Escherichia coli genomes, the program correctly identified 93.3% (resp. 96.3%) of experimentally annotated genes longer than 100 codons described in the PIR-International database, and for these genes 96.3% (83.9%) of starts were predicted exactly. Furthermore, 98.9% (99.1%) of genes longer than 100 codons annotated in GenBank were found, and 92.9% (75.7%) of predicted starts coincided with the feature table description. Finally, for the complete gene complements of B. subtilis and E. coli, including genes shorter than 100 codons, gene prediction accuracy was 88.9 and 87.1%, respectively, with 94.2 and 76.7% starts coinciding with the existing annotation.


Subjects
Algorithms , Bacterial Genome , Sequence Alignment/methods , Software , Bacillus subtilis/genetics , Bacterial Proteins/genetics , Initiator Codon , Factual Databases , Escherichia coli/genetics , Open Reading Frames , DNA Sequence Analysis
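According to its abstract, ORPHEUS places gene starts using coding-region and ribosome-binding-site statistics derived from reliable gene fragments. Those models are not given there, so the Python below illustrates only a simpler step that gene finders of this kind commonly build on. It scans the forward strand for open reading frames of at least a minimum number of codons and takes the most upstream in-frame start codon as a provisional start. This is a generic sketch, not ORPHEUS itself. The start-codon set (ATG, GTG, TTG), the 100-codon default and the function name are assumptions.

```python
from typing import Iterable, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100,
              starts: Iterable[str] = ("ATG", "GTG", "TTG")) -> List[Tuple[int, int]]:
    """Return sorted (start, end) 0-based, half-open coordinates of forward-strand ORFs.

    For each in-frame stop codon, the most upstream start codon since the previous
    in-frame stop is used (the longest candidate). `end` includes the stop codon,
    and an ORF is kept when it has at least `min_codons` codons before the stop.
    """
    seq = seq.upper()
    start_set = set(starts)
    orfs = []
    for frame in range(3):
        first_start = None  # most upstream start seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if first_start is not None and (i - first_start) // 3 >= min_codons:
                    orfs.append((first_start, i + 3))
                first_start = None  # a stop closes any open candidate in this frame
            elif first_start is None and codon in start_set:
                first_start = i
    return sorted(orfs)
```

Called as `find_orfs("ATG" + "AAA" * 5 + "TAA", min_codons=5)`, the function reports a single ORF spanning positions 0 to 21. Taking the most upstream start deliberately overestimates gene length. Choosing among alternative in-frame starts is the harder problem the abstract addresses with ribosome-binding-site statistics, and the reverse strand would need the same scan on the reverse complement.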