Results 1 - 20 of 31
1.
Science ; 293(5537): 2040-4, 2001 Sep 14.
Article in English | MEDLINE | ID: mdl-11557880

ABSTRACT

A pathway database (DB) is a DB that describes biochemical pathways, reactions, and enzymes. The EcoCyc pathway DB (see http://ecocyc.org) describes the metabolic, transport, and genetic-regulatory networks of Escherichia coli. EcoCyc is an example of a computational symbolic theory, which is a DB that structures a scientific theory within a formal ontology so that it is available for computational analysis. It is argued that by encoding scientific theories in symbolic form, we open new realms of analysis and understanding for theories that would otherwise be too large and complex for scientists to reason with effectively.


Subjects
Computational Biology , Databases, Factual , Escherichia coli/genetics , Escherichia coli/metabolism , Genome, Bacterial , Artificial Intelligence , Culture Media , Escherichia coli/enzymology , Escherichia coli/growth & development , Internet , Software
2.
Science ; 294(5550): 2317-23, 2001 Dec 14.
Article in English | MEDLINE | ID: mdl-11743193

ABSTRACT

The 5.67-megabase genome of the plant pathogen Agrobacterium tumefaciens C58 consists of a circular chromosome, a linear chromosome, and two plasmids. Extensive orthology and nucleotide colinearity between the genomes of A. tumefaciens and the plant symbiont Sinorhizobium meliloti suggest a recent evolutionary divergence. Their similarities include metabolic, transport, and regulatory systems that promote survival in the highly competitive rhizosphere; differences are apparent in their genome structure and virulence gene complement. Availability of the A. tumefaciens sequence will facilitate investigations into the molecular basis of pathogenesis and the evolutionary divergence of pathogenic and symbiotic lifestyles.


Subjects
Agrobacterium tumefaciens/genetics , Genome, Bacterial , Sequence Analysis, DNA , Agrobacterium tumefaciens/classification , Agrobacterium tumefaciens/pathogenicity , Agrobacterium tumefaciens/physiology , Bacterial Adhesion/genetics , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Carrier Proteins/genetics , Carrier Proteins/metabolism , Chromosomes, Bacterial/genetics , Conjugation, Genetic , DNA Replication , Genes, Bacterial , Genes, Regulator , Membrane Proteins/genetics , Membrane Proteins/metabolism , Molecular Sequence Data , Phylogeny , Plants/microbiology , Plasmids , Replicon , Rhizobiaceae/genetics , Rhizobiaceae/physiology , Sinorhizobium meliloti/genetics , Sinorhizobium meliloti/physiology , Symbiosis , Virulence/genetics
3.
Nucleic Acids Res ; 34(13): 3687-97, 2006.
Article in English | MEDLINE | ID: mdl-16893953

ABSTRACT

Different biological notions of pathways are used in different pathway databases. Those pathway ontologies significantly impact pathway computations. Computational users of pathway databases will obtain different results depending on the pathway ontology used by the databases they employ, and different pathway ontologies are preferable for different end uses. We explore differences in pathway ontologies by comparing the BioCyc and KEGG ontologies. The BioCyc ontology defines a pathway as a conserved, atomic module of the metabolic network of a single organism, i.e. often regulated as a unit, whose boundaries are defined at high-connectivity stable metabolites. KEGG pathways are on average 4.2 times larger than BioCyc pathways, and combine multiple biological processes from different organisms to produce a substrate-centered reaction mosaic. We compared KEGG and BioCyc pathways using genome context methods, which determine the functional relatedness of pairs of genes. For each method we employed, a pair of genes randomly selected from a BioCyc pathway is more likely to be related by that method than is a pair of genes randomly selected from a KEGG pathway, supporting the conclusion that the BioCyc pathway conceptualization is closer to a single conserved biological process than is that of KEGG.


Subjects
Computational Biology , Databases, Genetic , Metabolism/genetics , Genomics/methods , Vocabulary, Controlled
4.
Nucleic Acids Res ; 33(13): 4035-9, 2005.
Article in English | MEDLINE | ID: mdl-16034025

ABSTRACT

We report on a new type of systematic annotation error in genome and pathway databases that results from the misinterpretation of partial Enzyme Commission (EC) numbers such as '1.1.1.-'. This error results in the assignment of genes annotated with a partial EC number to many or all biochemical reactions that are annotated with the same partial EC number. That inference is faulty because of the ambiguous nature of partial EC numbers. We have observed this type of error in multiple databases, including KEGG, VIMSS and IMG, all of which assign genes to KEGG pathways. The Escherichia coli subset of the KEGG database exhibits this error for 6.8% of its gene-reaction assignments. For example, KEGG contains 17 reactions that are annotated with EC 1.1.1.-. A group of three E.coli genes, b1580 [putative dehydrogenase, NAD(P)-binding, starvation-sensing protein], b3787 (UDP-N-acetyl-D-mannosaminuronic acid dehydrogenase) and b0207 (2,5-diketo-D-gluconate reductase B), is assigned to 15 of those reactions, despite experimental evidence indicating different single functions for two of the three genes. Furthermore, the databases (DBs) are internally inconsistent in that the description of gene functions for genes with partial EC numbers is inconsistent with the activities implied by reactions to which the genes were assigned. We infer that these inconsistencies result from the processing used to match gene products to reactions within KEGG's metabolic pathways. These errors affect scientists who use these DBs as online encyclopedias and they affect bioinformaticists who use these DBs to train and validate newly developed algorithms.


Subjects
Databases, Genetic , Enzymes/genetics , Genomics , Vocabulary, Controlled , Base Sequence , Escherichia coli/enzymology , Escherichia coli/genetics , Humans , Molecular Sequence Data , Reproducibility of Results
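
The partial-EC pitfall described in the abstract of entry 4 can be illustrated with a small sketch. It is not the procedure used by KEGG or any other database named there; the gene and reaction identifiers are made up, and the policy of leaving partially annotated genes unassigned is one possible conservative choice, assumed here for illustration.

```python
def is_partial_ec(ec):
    """Return True when any field of the EC number is the '-' placeholder."""
    return "-" in ec.split(".")


def assign_genes(gene_ec, reaction_ec):
    """Map each gene to the reactions that share its EC number.

    Genes with a partial EC number are left unassigned, because a code
    such as '1.1.1.-' names a whole class of activities, not one reaction.
    """
    assignments = {}
    for gene, ec in gene_ec.items():
        if is_partial_ec(ec):
            assignments[gene] = []  # ambiguous annotation: do not guess
            continue
        assignments[gene] = sorted(
            rxn for rxn, rxn_ec in reaction_ec.items() if rxn_ec == ec
        )
    return assignments


# Hypothetical identifiers, for illustration only.
genes = {"geneA": "1.1.1.-", "geneB": "1.1.1.1"}
reactions = {"rxn1": "1.1.1.-", "rxn2": "1.1.1.-", "rxn3": "1.1.1.1"}
result = assign_genes(genes, reactions)
print(result)
```

A naive string comparison would have linked geneA to both rxn1 and rxn2, which is the kind of over-assignment the abstract reports.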
5.
Trends Biotechnol ; 14(8): 273-9, 1996 Aug.
Article in English | MEDLINE | ID: mdl-8987457

ABSTRACT

Several techniques are being introduced into the bioinformatics community to permit interoperation between molecular biology databases (DBs). The common factor to these approaches is the creation of links between entities in different DBs. Links can connect pieces of information about a single protein that are partitioned across multiple DBs, and can also encode relationships between different biological entities, such as relationships between an enzyme, its gene and its catalytic activity. This article provides an overview of the DB-interoperation problem, and offers several solutions. It discusses how links are used in molecular biology DBs, and describes the potential stumbling blocks when DB links are created and used.


Subjects
Computational Biology/trends , Databases, Factual , Biotechnology/trends , Computer Communication Networks , Semantics
6.
Trends Biotechnol ; 17(7): 275-81, 1999 Jul.
Article in English | MEDLINE | ID: mdl-10370234

ABSTRACT

Integrated pathway-genome databases describe the genes and genome of an organism, as well as its predicted pathways, reactions, enzymes and metabolites. In conjunction with visualization and analysis software, these databases provide a framework for improved understanding of microbial physiology and for antimicrobial drug discovery. We describe pathway-based analyses of the genomes of a number of medically relevant microorganisms and a novel software tool that visualizes gene-expression data on a diagram showing the whole metabolic network of the microorganism.


Subjects
Anti-Infective Agents/pharmacology , Database Management Systems , Genome, Bacterial , Genome, Fungal , Systems Integration , Anti-Bacterial Agents , Anti-Infective Agents/chemical synthesis , Bacteria/drug effects , Bacteria/genetics , Bacteria/metabolism , Drug Design , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Species Specificity
7.
Gene ; 172(1): GC43-50, 1996 Jun 12.
Article in English | MEDLINE | ID: mdl-8654966

ABSTRACT

The World Wide Web (WWW) offers the potential to deliver specialized information to an audience of unprecedented size. Along with this exciting new opportunity comes a challenge for software developers: instead of rewriting our software applications to operate over the WWW, how can we maximize software reuse by retrofitting existing applications? We have developed a Web server tool, written in Common Lisp, that allows existing graphical user interface applications written using the Common Lisp Interface Manager (CLIM) to hook easily into the WWW. This tool, CWEST (CLIM-WEb Server Tool, pronounced "quest"), was developed to operate with EcoCyc, an electronic encyclopedia of the genes and metabolism of the bacterium E. coli. EcoCyc consists of a database of objects relevant to E. coli biochemistry and a user interface, implemented in CLIM, that runs on the X-window system and generates graphical displays appropriate to biological objects. Each query to the EcoCyc WWW server is treated as a command to the EcoCyc program, which dynamically generates an appropriate CLIM drawing. CWEST translates that drawing, which can be a mixture of text and graphics, into the HyperText Markup Language (HTML) and/or the Graphics Interchange Format (GIF), which are returned to the client. Sensitive regions embedded in the CLIM drawing are converted to hyperlinks with Universal Resource Locators (URLs) that generate further EcoCyc queries. This tight coupling of CLIM output with Web output makes CLIM a powerful high-level programming tool for Web applications. The flexibility of Common Lisp and CLIM made implementation of the server tool surprisingly easy, requiring few changes to the existing EcoCyc program. The results can be seen at URL http://www.ai.sri.com/ecocyc/browser.html. We have made CWEST available to the CLIM community at large, with the hope that it will spur other software developers to make their CLIM applications available over the WWW.


Subjects
Computer Communication Networks , Databases, Factual , Software , User-Computer Interface
8.
J Comput Biol ; 2(4): 573-86, 1995.
Article in English | MEDLINE | ID: mdl-8634909

ABSTRACT

To realize the full potential of biological databases (DBs) requires more than the interactive, hypertext flavor of database interoperation that is now so popular in the bioinformatics community. Interoperation based on declarative queries to multiple network-accessible databases will support analyses and investigations that are orders of magnitude faster and more powerful than what can be accomplished through interactive navigation. I present a vision of the capabilities that a query-based interoperation infrastructure should provide, and identify assumptions underlying, and requirements of, this vision. I then propose an architecture for query-based interoperation that includes a number of novel components of an information infrastructure for molecular biology. These components include a knowledge base that describes relationships among the conceptualizations used in different biological databases, a module that can determine the DBs that are relevant to a particular query, a module that can translate a query and its results from one conceptualization to another, a collection of DB drivers that provide uniform physical access to different database management systems, a suite of translators that can interconvert among different database schema languages, and a database that describes the network location and access methods for biological databases. A number of the components are translators that bridge the heterogeneities that exist between biological DBs at several different levels, including the conceptual level, the data model, the query language, and data formats.


Subjects
Databases, Factual , Molecular Biology , Artificial Intelligence , Computer Communication Networks , Computer Systems , Database Management Systems , Information Systems
9.
J Comput Biol ; 3(1): 191-212, 1996.
Article in English | MEDLINE | ID: mdl-8697237

ABSTRACT

The EcoCyc system consists of a knowledge base (KB) that describes the genes and intermediary metabolism of Escherichia coli, and a graphical user interface (GUI) for accessing that knowledge. This paper addresses two problems: How can we create a GUI that provides integrated access to metabolic and genomic data? We describe the design and implementation of visual presentations that closely mimic those found in the biology literature, and that offer hypertext navigation among related entities, and multiple views of the same entity. We employ a frame knowledge representation system (FRS) called HyperTHEO to manage the EcoCyc knowledge base. Among the advantages of FRSs are an expressive data model for capturing the complexities of biological information, and schema-evolution capabilities that facilitate the constant schema changes that biological databases tend to undergo. HyperTHEO also includes rule-based inference facilities that are the foundation of expert systems, a constraint language for maintaining data integrity, and a declarative query language. A graphic KB editor and browser allow the EcoCyc developers to interactively inspect and modify this evolving KB.


Subjects
Artificial Intelligence , Database Management Systems , Escherichia coli/genetics , Escherichia coli/metabolism , Genome, Bacterial , Computer Communication Networks , Computer Graphics , Computers , Programming Languages , Systems Integration , User-Computer Interface
11.
Comput Appl Biosci ; 8(4): 347-57, 1992 Aug.
Article in English | MEDLINE | ID: mdl-1498690

ABSTRACT

This paper describes a publicly available knowledge base of the chemical compounds involved in intermediary metabolism. We consider the motivations for constructing a knowledge base of metabolic compounds, the methodology by which it was constructed, and the information that it currently contains. Currently the knowledge base describes 981 compounds, listing for each: synonyms for its name, a systematic name, CAS registry number, chemical formula, molecular weight, chemical structure and two-dimensional display coordinates for the structure. The Compound Knowledge Base (CompoundKB) illustrates several methodological principles that should guide the development of biological knowledge bases. I argue that biological datasets should be made available in multiple representations to increase their accessibility to end users, and I present multiple representations of the CompoundKB (knowledge base, relational data base and ASN.1 representations). I also analyze the general characteristics of these representations to provide an understanding of their relative advantages and disadvantages. Another principle is that the error rate of biological data bases should be estimated and documented; this analysis is performed for the CompoundKB.


Subjects
Biochemistry , Databases, Factual , Metabolism , Artificial Intelligence , Bias , Biochemical Phenomena , Database Management Systems
12.
Comp Funct Genomics ; 2(1): 25-7, 2001.
Article in English | MEDLINE | ID: mdl-18628940

ABSTRACT

A survey of Genbank entries for complete microbial genomes reveals that the majority do not conform to the Genbank standard. Typical deviations from the Genbank standard include records with information in incorrect fields, addition of extraneous and confusing information within a field, and omission of useful fields. This situation results from two principal causes: genome centres do not submit Genbank records in the proper form and the Genbank, EMBL and DDBJ staffs do not enforce the database standards that they have defined.

13.
Comput Appl Biosci ; 7(3): 301-8, 1991 Jul.
Article in English | MEDLINE | ID: mdl-1913210

ABSTRACT

This article describes artificial intelligence methods for representing theories in molecular biology, and for improving the predictive power of these theories using experimental data. A program called GENSIM provides a framework for representing theories that includes descriptions of classes of biological objects (genes, enzymes, etc.), and processes that specify potential interactions among these objects (such as enzymatic reactions). GENSIM can employ a theory specified within this framework to predict the outcomes of biological experiments. A program called HYPGENE comes into play when the observed outcome of an experiment does not match the outcome predicted by GENSIM. HYPGENE works backward from the error in GENSIM's prediction to postulate changes to both the theory embodied by GENSIM, and the presumed initial conditions of the experiment. I view HYPGENE's hypothesis generation task as a design problem, and I have adapted AI methods developed for design and planning to this task. These techniques were developed in conjunction with an in-depth study of the discovery of the gene regulation mechanism of attenuation in the E. coli tryptophan operon. Both GENSIM and HYPGENE have been tested on sample problems from the history of attenuation, and produced many of the same solutions as biologists did.


Subjects
Artificial Intelligence , Computer Simulation , Models, Biological , Molecular Biology , Escherichia coli/genetics , Gene Expression Regulation, Bacterial/physiology , Methods , Operon/genetics , Software , Software Design , Tryptophan/genetics
14.
Pac Symp Biocomput ; : 438-45, 1996.
Article in English | MEDLINE | ID: mdl-9390249

ABSTRACT

The bioinformatics community is becoming increasingly reliant on the creation of links among biological databases (DBs) as a foundation for DB interoperability. For example, a link might be created from a protein in one DB (such as PIR), to a gene in another DB (such as GDB), by storing the unique identifier (id) of the gene object within an attribute of the protein object. User interfaces can then support navigation from the protein to the gene, and multiDB queries can join the protein with the gene. The unique id of the gene is serving as a foreign key. However, a variety of factors, such as changes in the underlying biology, can cause object ids to become invalid, thus producing invalid links among DBs. Invalid links are a violation of multidatabase referential integrity. We propose a network protocol whereby a database administrator can provide information about changes to the identifiers of objects in their database via Internet, to allow other databases to maintain referential integrity. We request comments from the bioinformatics community for the purpose of building a consensus on the proposed protocol.


Subjects
Computational Biology/methods , Databases as Topic , Proteins/chemistry , Computational Biology/standards , Computer Communication Networks , Enzymes/chemistry , Enzymes/metabolism , Genes , Proteins/genetics , Quality Control , Reproducibility of Results
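
The foreign-key links and identifier changes discussed in the abstract of entry 14 can be sketched as follows. This is not the protocol proposed in that paper, whose message format is not given here; the identifiers and the shape of the change log (old id mapped to a new id, or to None when the object was deleted) are assumptions made for illustration.

```python
def find_invalid_links(links, valid_ids):
    """Return the links whose target id no longer exists in the other DB."""
    return {src: tgt for src, tgt in links.items() if tgt not in valid_ids}


def apply_id_changes(links, changes):
    """Rewrite link targets using a change log published by the target DB.

    `changes` maps an old id to its replacement, or to None if the object
    was deleted, in which case the link is dropped.
    """
    updated = {}
    for src, tgt in links.items():
        if tgt in changes:
            new_id = changes[tgt]
            if new_id is not None:
                updated[src] = new_id
        else:
            updated[src] = tgt
    return updated


# Hypothetical protein -> gene links and the gene DB's current ids.
protein_to_gene = {"P1": "G100", "P2": "G200", "P3": "G300"}
current_gene_ids = {"G100", "G201"}
change_log = {"G200": "G201", "G300": None}

invalid = find_invalid_links(protein_to_gene, current_gene_ids)
repaired = apply_id_changes(protein_to_gene, change_log)
print(invalid, repaired)
```

After the change log is applied, every remaining link points at an id that exists in the gene database, which is the referential-integrity condition the abstract describes.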
15.
Bioinformatics ; 16(3): 269-85, 2000 Mar.
Article in English | MEDLINE | ID: mdl-10869020

ABSTRACT

MOTIVATIONS: A number of important bioinformatics computations involve computing with function: executing computational operations whose inputs or outputs are descriptions of the functions of biomolecules. Examples include performing functional queries to sequence and pathway databases, and determining functional equality to evaluate algorithms that predict function from sequence. A prerequisite to computing with function is the existence of an ontology that provides a structured semantic encoding of function. Functional bioinformatics is an emerging subfield of bioinformatics that is concerned with developing ontologies and algorithms for computing with biological function. RESULTS: The article explores the notion of computing with function, and explains the importance of ontologies of function to bioinformatics. The functional ontology developed for the EcoCyc database is presented. This ontology can encode a diverse array of biochemical processes, including enzymatic reactions involving small-molecule substrates and macromolecular substrates, signal-transduction processes, transport events, and mechanisms of regulation of gene expression. The ontology is validated through its use to express complex functional queries for the EcoCyc DB. CONTACT: pkarp@ai.sri.com


Subjects
Computational Biology , Proteins/physiology , Computing Methodologies , Databases, Factual , Proteins/metabolism
16.
Bioinformatics ; 20(5): 709-17, 2004 Mar 22.
Article in English | MEDLINE | ID: mdl-14751985

ABSTRACT

MOTIVATION: The prediction of transcription units (TUs, which are similar to operons) is an important problem that has been tackled using many different approaches. The availability of complete microbial genomes has made genome-wide TU predictions possible. Pathway-genome databases (PGDBs) add metabolic and other organizational (i.e. protein complexes) information to the annotated genome, and are able to capture TU organization information. These characteristics of PGDBs make them a suitable framework for the development and implementation of TU predictors. RESULTS: We implemented a TU predictor that uses only intergenic distance and functional classification of genes to predict TU boundaries, and applied it to EcoCyc, our PGDB of Escherichia coli. To this original predictor, we added information on metabolic pathways, protein complexes and transporters, all readily available in EcoCyc, in order to generate an enhanced predictor. The enhanced predictor correctly predicted 80% of the known E.coli TUs (69% of the known operons), a moderate improvement over the original predictor's performance (75% of TUs and 65% of operons correctly predicted), demonstrating that the extra information available in the PGDB does indeed improve prediction performance. Performance of this E.coli-based predictor on a genome other than that of E.coli was tested on BsubCyc, our computationally generated PGDB for Bacillus subtilis, for which a set of 100 known operons is available. Prediction accuracy decreased substantially (46% of the known operons correctly predicted). This was due in part to missing information in BsubCyc, which prevented full use of the predictor's features. The augmented predictor has been implemented as part of our Pathway Tools software suite, and can be used to populate a PGDB with predicted TUs. AVAILABILITY: The TU predictor is included in version 7.0 of the Pathway Tools software suite. 
Pathway Tools 7.0 is available free of charge to academic institutions and for a fee to commercial enterprises. It runs on Sun Solaris 8, Linux and Windows. TUs predicted on the Caulobacter crescentus and Mycobacterium tuberculosis (H37Rv) genomes are available in our CauloCyc and MtbrvCyc databases, available at the BioCyc web site (http://biocyc.org). To obtain version 7.0 of Pathway Tools, follow the directions in our web site, http://biocyc.org/download.shtml.


Subjects
Chromosome Mapping/methods , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Regulation, Bacterial/physiology , Genome, Bacterial , Transcription Factors/genetics , Transcription Factors/metabolism , Algorithms , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Information Storage and Retrieval/methods , Signal Transduction/physiology , Software , Structure-Activity Relationship
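
The abstract of entry 16 mentions intergenic distance as one input to the transcription-unit (TU) predictor. The sketch below shows only that one idea in a deliberately simplified form: adjacent genes on the same strand are grouped when the gap between them is small. It is not the Pathway Tools predictor; the fixed 50-base threshold, the `Gene` record and the gene coordinates are assumptions, and the functional-class, pathway and protein-complex evidence described in the abstract is omitted.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def predict_transcription_units(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bases."""
    units = []
    for gene in sorted(genes, key=lambda g: g.start):
        previous = units[-1][-1] if units else None
        same_unit = (
            previous is not None
            and previous.strand == gene.strand
            and gene.start - previous.end <= max_gap
        )
        if same_unit:
            units[-1].append(gene)
        else:
            units.append([gene])
    return [[g.name for g in unit] for unit in units]


# Hypothetical coordinates, for illustration only.
example = [
    Gene("a", 1, 100, "+"),
    Gene("b", 120, 300, "+"),   # 20-base gap to a: same unit
    Gene("c", 500, 700, "+"),   # 200-base gap to b: new unit
    Gene("d", 710, 900, "-"),   # strand change: new unit
]
units = predict_transcription_units(example)
print(units)
```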
17.
Article in English | MEDLINE | ID: mdl-7584392

ABSTRACT

The automatic generation of drawings of metabolic pathways is a challenging problem that depends intimately on exactly what information has been recorded for each pathway, and on how that information is encoded. The chief contributions of the paper are a minimized representation for biochemical pathways called the predecessor list, and inference procedures for converting the predecessor list into a pathway-graph representation that can serve as input to a pathway-drawing algorithm. The predecessor list has several advantages over the pathway graph, including its compactness and its lack of redundancy. The conversion between the two representations can be formulated as both a constraint-satisfaction problem and a logical inference problem, whose goal is to assign directions to reactions, and to determine which are the main chemical compounds in the reaction. We describe a set of production rules that solves this inference problem. We also present heuristics for inferring whether the exterior compounds that are substrates of reactions at the periphery of a pathway are side or main compounds. These techniques were evaluated on 18 metabolic pathways from the EcoCyc knowledge base.


Subjects
Computer Simulation , Signal Transduction , Animals , Humans
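
The abstract of entry 17 describes converting a predecessor list into a pathway graph. A minimal sketch of the purely structural part of that conversion is given below. The encoding of the predecessor list as (reaction, predecessor) pairs, with None marking a reaction that has no predecessor, is an assumption; the paper's inference of reaction directions and of main versus side compounds is not attempted here.

```python
from collections import defaultdict


def predecessors_to_graph(pred_list):
    """Build a successor map and the list of start reactions.

    pred_list holds (reaction, predecessor) pairs; predecessor is None
    for a reaction that begins the pathway.
    """
    successors = defaultdict(list)
    reactions = set()
    with_predecessor = set()
    for reaction, predecessor in pred_list:
        reactions.add(reaction)
        if predecessor is not None:
            reactions.add(predecessor)
            successors[predecessor].append(reaction)
            with_predecessor.add(reaction)
    graph = {r: sorted(successors[r]) for r in sorted(reactions)}
    starts = sorted(reactions - with_predecessor)
    return graph, starts


# Hypothetical branched pathway: R1 -> R2 -> {R3, R4}.
pred_list = [("R1", None), ("R2", "R1"), ("R3", "R2"), ("R4", "R2")]
graph, starts = predecessors_to_graph(pred_list)
print(graph, starts)
```

The pair list stores each connection once, which reflects the compactness the abstract attributes to the predecessor list.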
18.
Article in English | MEDLINE | ID: mdl-7584337

ABSTRACT

Construction of electronic repositories of metabolic information is an increasingly active area of research. Encoding detailed knowledge of a complex biological domain requires finely honed representations. We survey representations used for several metabolic databases, including EcoCyc, and reach the following conclusions. Representation of the metabolism must distinguish enzyme classes from individual enzymes, because there is not a one-to-one mapping from enzymes to the reactions they catalyze. Individual enzymes must be represented explicitly as proteins, e.g., by encoding their subunit structure. The species variation of metabolism must be represented. So must the substrate specificity of enzymes, which may be treated in several ways.


Subjects
Databases, Factual , Metabolism , Enzymes/metabolism , Evaluation Studies as Topic , Proteins/metabolism , Species Specificity , Substrate Specificity
19.
Genome Res ; 10(4): 568-76, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10779499

ABSTRACT

The EcoCyc database characterizes the known network of Escherichia coli small-molecule metabolism. Here we present a computational analysis of the global properties of that network, which consists of 744 reactions that are catalyzed by 607 enzymes. The reactions are organized into 131 pathways. Of the metabolic enzymes, 100 are multifunctional, and 68 of the reactions are catalyzed by >1 enzyme. The network contains 791 chemical substrates. Other properties considered by the analysis include the distribution of enzyme subunit organization, and the distribution of modulators of enzyme activity and of enzyme cofactors. The dimensions chosen for this analysis can be employed for comparative functional analysis of complete genomes.


Subjects
Escherichia coli/metabolism , Catalysis , Computational Biology/methods , Databases, Factual , Enzyme Activation/genetics , Escherichia coli/enzymology , Escherichia coli/genetics , Genome, Bacterial , Multienzyme Complexes/genetics
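
Counts of the kind reported in the abstract of entry 19 (enzymes, reactions, multifunctional enzymes, reactions catalyzed by more than one enzyme) can be derived from a list of enzyme-reaction pairs. The sketch below uses made-up pairs rather than EcoCyc data, and it assumes that "multifunctional" means an enzyme that catalyzes more than one reaction.

```python
from collections import defaultdict


def network_summary(catalysis):
    """Summarize a list of (enzyme, reaction) catalysis pairs."""
    reactions_by_enzyme = defaultdict(set)
    enzymes_by_reaction = defaultdict(set)
    for enzyme, reaction in catalysis:
        reactions_by_enzyme[enzyme].add(reaction)
        enzymes_by_reaction[reaction].add(enzyme)
    return {
        "enzymes": len(reactions_by_enzyme),
        "reactions": len(enzymes_by_reaction),
        # Enzymes that catalyze more than one reaction.
        "multifunctional_enzymes": sum(
            1 for rxns in reactions_by_enzyme.values() if len(rxns) > 1
        ),
        # Reactions catalyzed by more than one enzyme.
        "multi_enzyme_reactions": sum(
            1 for enz in enzymes_by_reaction.values() if len(enz) > 1
        ),
    }


# Hypothetical pairs, for illustration only.
pairs = [("E1", "R1"), ("E1", "R2"), ("E2", "R2"), ("E3", "R3")]
summary = network_summary(pairs)
print(summary)
```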
20.
Bioinformatics ; 17(6): 526-32; discussion 533-4, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11395429

ABSTRACT

PROBLEM STATEMENT: We have studied the relationships among SWISS-PROT, TrEMBL, and GenBank with two goals. First is to determine whether users can reliably identify those proteins in SWISS-PROT whose functions were determined experimentally, as opposed to proteins whose functions were predicted computationally. If this information was present in reasonable quantities, it would allow researchers to decrease the propagation of incorrect function predictions during sequence annotation, and to assemble training sets for developing the next generation of sequence-analysis algorithms. Second is to assess the consistency between translated GenBank sequences and sequences in SWISS-PROT and TrEMBL. RESULTS: (1) Contrary to claims by the SWISS-PROT authors, we conclude that SWISS-PROT does not identify a significant number of experimentally characterized proteins. (2) SWISS-PROT is more incomplete than we expected in that version 38.0 from July 1999 lacks many proteins from the full genomes of important organisms that were sequenced years earlier. (3) Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset. (4) In many cases, translated GenBank genes do not exactly match the corresponding SWISS-PROT sequences, for reasons that include missing or removed methionines, differing translation start positions, individual amino-acid differences, and inclusion of sequence data from multiple sequencing projects. For example, results show that for Escherichia coli, 80.6% of the proteins in the GenBank entry for the complete genome have identical sequence matches with SWISS-PROT/TrEMBL sequences, 13.4% have exact substring matches, and matches for 4.1% can be found using BLAST search; the remaining 2.0% of E.coli protein sequences (most of which are ORFs) have no clear matches to SWISS-PROT/TrEMBL. 
Although many of these differences can be explained by the complexity of the DB, and by the curation processes used to create it, the scale of the differences is notable.


Subjects
Algorithms , Databases, Factual/standards , Gene Library , Human Genome Project , Base Sequence , Data Interpretation, Statistical , Escherichia coli/classification , Escherichia coli/genetics , Haemophilus influenzae/genetics , Helicobacter pylori/genetics , Open Reading Frames/genetics , Protein Biosynthesis/genetics , Species Specificity
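
The match categories reported in the abstract of entry 20 (identical sequence, exact substring, no clear match) can be illustrated with a small classifier. The sketch is not the authors' pipeline: the sequences are made up, a substring match in either direction is accepted as an assumption, and the BLAST-based category is left out because it needs an external tool.

```python
def classify_match(query, references):
    """Classify a protein sequence against a collection of reference sequences."""
    if query in set(references):
        return "identical"
    # Accept containment in either direction (an assumption of this sketch).
    if any(query in ref or ref in query for ref in references):
        return "substring"
    return "unmatched"


# Hypothetical reference set and translated sequences.
reference_seqs = ["MKTAYIAK", "MSEQNNTE"]
labels = {
    seq: classify_match(seq, reference_seqs)
    for seq in ["MKTAYIAK", "KTAYIAK", "MAAAA"]
}
print(labels)
```

The second query differs from a reference only by a missing initial methionine, one of the causes of inexact matches listed in the abstract.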