Results 1 - 20 of 21
1.
Science ; 278(5338): 631-7, 1997 Oct 24.
Article in English | MEDLINE | ID: mdl-9381173

ABSTRACT

In order to extract the maximum amount of information from the rapidly accumulating genome sequences, all conserved genes need to be classified according to their homologous relationships. Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs). Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages. Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG. This relation automatically yields a number of functional predictions for poorly characterized genomes. The COGs comprise a framework for functional and evolutionary genome analysis.


Subjects
Archaeal Genes , Bacterial Genes , Fungal Genes , Multigene Family , Phylogeny , Proteins/genetics , Amino Acid Sequence , Archaeal Proteins/chemistry , Archaeal Proteins/classification , Archaeal Proteins/genetics , Archaeal Proteins/physiology , Bacteria/chemistry , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Conserved Sequence , Molecular Evolution , Fungal Proteins/chemistry , Fungal Proteins/classification , Fungal Proteins/genetics , Fungal Proteins/physiology , Methanococcus/chemistry , Methanococcus/genetics , Proteins/chemistry , Proteins/classification , Proteins/physiology , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics , Species Specificity
2.
Science ; 282(5389): 754-9, 1998 Oct 23.
Article in English | MEDLINE | ID: mdl-9784136

ABSTRACT

Analysis of the 1,042,519-base pair Chlamydia trachomatis genome revealed unexpected features related to the complex biology of chlamydiae. Although chlamydiae lack many biosynthetic capabilities, they retain functions for performing key steps and interconversions of metabolites obtained from their mammalian host cells. Numerous potential virulence-associated proteins also were characterized. Several eukaryotic chromatin-associated domain proteins were identified, suggesting a eukaryotic-like mechanism for chlamydial nucleoid condensation and decondensation. The phylogenetic mosaic of chlamydial genes, including a large number of genes with phylogenetic origins from eukaryotes, implies a complex evolution for adaptation to obligate intracellular parasitism.


Subjects
Chlamydia trachomatis/genetics , Bacterial Genome , DNA Sequence Analysis , Aerobiosis , Amino Acid Sequence , Amino Acids/biosynthesis , Bacterial Outer Membrane Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Biological Evolution , Chlamydia trachomatis/classification , Chlamydia trachomatis/metabolism , Chlamydia trachomatis/physiology , DNA Repair , Energy Metabolism , Enzymes/chemistry , Enzymes/genetics , Humans , Lipids/biosynthesis , Molecular Sequence Data , Peptidoglycan/biosynthesis , Peptidoglycan/genetics , Phylogeny , Protein Biosynthesis , Genetic Recombination , Genetic Transcription , Bacterial Transformation , Virulence
3.
Microbiol Mol Biol Rev ; 65(1): 44-79, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11238985

ABSTRACT

The bacterium Deinococcus radiodurans shows remarkable resistance to a range of damage caused by ionizing radiation, desiccation, UV radiation, oxidizing agents, and electrophilic mutagens. D. radiodurans is best known for its extreme resistance to ionizing radiation; not only can it grow continuously in the presence of chronic radiation (6 kilorads/h), but also it can survive acute exposures to gamma radiation exceeding 1,500 kilorads without dying or undergoing induced mutation. These characteristics were the impetus for sequencing the genome of D. radiodurans and the ongoing development of its use for bioremediation of radioactive wastes. Although it is known that these multiple resistance phenotypes stem from efficient DNA repair processes, the mechanisms underlying these extraordinary repair capabilities remain poorly understood. In this work we present an extensive comparative sequence analysis of the Deinococcus genome. Deinococcus is the first representative with a completely sequenced genome from a distinct bacterial lineage of extremophiles, the Thermus-Deinococcus group. Phylogenetic tree analysis, combined with the identification of several synapomorphies between Thermus and Deinococcus, supports the hypothesis that it is an ancient group with no clear affinities to any of the other known bacterial lineages. Distinctive features of the Deinococcus genome as well as features shared with other free-living bacteria were revealed by comparison of its proteome to the collection of clusters of orthologous groups of proteins. Analysis of paralogs in Deinococcus has revealed several unique protein families. In addition, specific expansions of several other families including phosphatases, proteases, acyltransferases, and Nudix family pyrophosphohydrolases were detected. Genes that potentially affect DNA repair and recombination and stress responses were investigated in detail. 
Some proteins appear to have been horizontally transferred from eukaryotes and are not present in other bacteria. For example, three proteins homologous to plant desiccation resistance proteins were identified, and these are particularly interesting because of the correlation between desiccation and radiation resistance. Compared to other bacteria, the D. radiodurans genome is enriched in repetitive sequences, namely, IS-like transposons and small intergenic repeats. In combination, these observations suggest that several different biological mechanisms contribute to the multiple DNA repair-dependent phenotypes of this organism.


Subjects
DNA Damage/radiation effects , Bacterial Genome , Gram-Positive Cocci/genetics , Amino Acid Sequence , Biological Evolution , Carbohydrate Metabolism , DNA Repair/physiology , DNA Replication , Energy Metabolism , Bacterial Gene Expression Regulation , Horizontal Gene Transfer , Genomics/methods , Gram-Positive Cocci/radiation effects , Molecular Sequence Data , Protein Biosynthesis , Signal Transduction
4.
Curr Biol ; 6(3): 279-91, 1996 Mar 01.
Article in English | MEDLINE | ID: mdl-8805245

ABSTRACT

BACKGROUND: The 1.83 Megabase (Mb) sequence of the Haemophilus influenzae chromosome, the first completed genome sequence of a cellular life form, has been recently reported. Approximately 75 % of the 4.7 Mb genome sequence of Escherichia coli is also available. The life styles of the two bacteria are very different - H. influenzae is an obligate parasite that lives in human upper respiratory mucosa and can be cultivated only on rich media, whereas E. coli is a saprophyte that can grow on minimal media. A detailed comparison of the protein products encoded by these two genomes is expected to provide valuable insights into bacterial cell physiology and genome evolution. RESULTS: We describe the results of computer analysis of the amino-acid sequences of 1703 putative proteins encoded by the complete genome of H. influenzae. We detected sequence similarity to proteins in current databases for 92 % of the H. influenzae protein sequences, and at least a general functional prediction was possible for 83 %. A comparison of the H. influenzae protein sequences with those of 3010 proteins encoded by the sequenced 75 % of the E. coli genome revealed 1128 pairs of apparent orthologs, with an average of 59 % identity. In contrast to the high similarity between orthologs, the genome organization and the functional repertoire of genes in the two bacteria were remarkably different. The smaller genome size of H. influenzae is explained, to a large extent, by a reduction in the number of paralogous genes. There was no long range colinearity between the E. coli and H. influenzae gene orders, but over 70 % of the orthologous genes were found in short conserved strings, only about half of which were operons in E. coli. Superposition of the H. influenzae enzyme repertoire upon the known E. coli metabolic pathways allowed us to reconstruct similar and alternative pathways in H. influenzae and provides an explanation for the known nutritional requirements. 
CONCLUSIONS: By comparing proteins encoded by the two bacterial genomes, we have shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.


Subjects
Bacterial Proteins/metabolism , Escherichia coli/genetics , Bacterial Genome , Haemophilus influenzae/genetics , Haemophilus influenzae/metabolism , Bacterial Proteins/chemistry , Biological Evolution , Conserved Sequence , Bacterial DNA , Molecular Sequence Data
5.
Curr Opin Struct Biol ; 8(3): 355-63, 1998 Jun.
Article in English | MEDLINE | ID: mdl-9666332

ABSTRACT

Computer analysis of complete prokaryotic genomes shows that microbial proteins are in general highly conserved--approximately 70% of them contain ancient conserved regions. This allows us to delineate families of orthologs across a wide phylogenetic range and, in many cases, predict protein functions with considerable precision. Sequence database searches using newly developed, sensitive algorithms result in the unification of such orthologous families into larger superfamilies sharing common sequence motifs. For many of these superfamilies, prediction of the structural fold and specific amino acid residues involved in enzymatic catalysis is possible. Taken together, sequence and structure comparisons provide a powerful methodology that can successfully complement traditional experimental approaches.


Subjects
DNA/chemistry , DNA/genetics , Genome , Animals , Bacteria/genetics , Computer Simulation , Molecular Evolution , Genetic Variation , Helicobacter pylori/enzymology , Helicobacter pylori/genetics , Humans , Genetic Models , Proteins/chemistry , Proteins/classification , Proteins/genetics
6.
Nucleic Acids Res ; 29(1): 22-8, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125040

ABSTRACT

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.


Subjects
Factual Databases , Proteins , Animals , Archaea/genetics , Bacteria/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Genome , Information Storage and Retrieval , Internet , Phylogeny , Proteins/classification , Proteins/genetics , Saccharomyces cerevisiae/genetics , Sequence Alignment
7.
J Mol Biol ; 244(1): 125-32, 1994 Nov 18.
Article in English | MEDLINE | ID: mdl-7966317

ABSTRACT

Using an iterative approach to sequence database search that combines scanning with individual amino acid sequences and with alignment blocks, we show that bacterial haloacid dehalogenases (HADs) belong to a large superfamily of hydrolases with diverse substrate specificity. The superfamily also includes epoxide hydrolases, different types of phosphatases, and numerous uncharacterized proteins from eubacteria, eukaryotes, and Archaea. Nine putative proteins of the HAD superfamily with functions unknown, in addition to two known enzymes, were found in Escherichia coli alone, making it one of the largest groups of enzymes and indicating that a variety of hydrolytic enzyme activities remain to be described. Many of the proteins with known enzymatic activities in the HAD superfamily are involved in detoxification of xenobiotics or metabolic by-products. All the proteins in the superfamily contain three conserved sequence motifs. Along with the conservation of the predicted secondary structure, motifs I, II, and III include a conserved aspartic acid residue, a lysine, and a nucleophile, namely aspartic acid or serine, respectively. A specific role in the catalysis of the hydrolysis of carbon-halogen and other bonds is assigned to each of these residues.


Subjects
Bacteria/enzymology , Hydrolases/classification , Hydrolases/genetics , Multigene Family , Sequence Alignment/methods , Amino Acid Sequence , Binding Sites , Environmental Biodegradation , Biological Evolution , Carboxylic Acids/metabolism , Computers , Conserved Sequence , Factual Databases , Epoxide Hydrolases/genetics , Halogenated Hydrocarbons/metabolism , Biological Models , Chemical Models , Molecular Sequence Data , Amino Acid Sequence Homology , Statistics as Topic , Substrate Specificity , Xenobiotics/metabolism
8.
Protein Sci ; 3(11): 2045-54, 1994 Nov.
Article in English | MEDLINE | ID: mdl-7703850

ABSTRACT

Using computer methods for multiple alignment, sequence motif search, and tertiary structure modeling, we show that eukaryotic translation elongation factor 1 gamma (EF1 gamma) contains an N-terminal domain related to class theta glutathione S-transferases (GST). GST-like proteins related to class theta comprise a large group including, in addition to typical GSTs and EF1 gamma, stress-induced proteins from bacteria and plants, bacterial reductive dehalogenases and beta-etherases, and several uncharacterized proteins. These proteins share 2 conserved sequence motifs with GSTs of other classes (alpha, mu, and pi). Tertiary structure modeling showed that in spite of the relatively low sequence similarity, the GST-related domain of EF1 gamma is likely to form a fold very similar to that in the known structures of class alpha, mu, and pi GSTs. One of the conserved motifs is implicated in glutathione binding, whereas the other motif probably is involved in maintaining the proper conformation of the GST domain. We predict that the GST-like domain in EF1 gamma is enzymatically active and that to exhibit GST activity, EF1 gamma has to form homodimers. The GST activity may be involved in the regulation of the assembly of multisubunit complexes containing EF1 and aminoacyl-tRNA synthetases by shifting the balance between glutathione, disulfide glutathione, thiol groups of cysteines, and protein disulfide bonds. The GST domain is a widespread, conserved enzymatic module that may be covalently or noncovalently complexed with other proteins. Regulation of protein assembly and folding may be 1 of the functions of GST.


Subjects
Glutathione Transferase/chemistry , Peptide Elongation Factors/chemistry , Amino Acid Sequence , Animals , Binding Sites/genetics , Binding Sites/physiology , Biological Evolution , Computer Graphics , Conserved Sequence/genetics , Glutathione Transferase/metabolism , Humans , Molecular Models , Molecular Sequence Data , Peptide Elongation Factor 1 , Peptide Elongation Factors/metabolism , Protein Secondary Structure , Protein Tertiary Structure , Sequence Alignment
9.
BMC Evol Biol ; 1: 8, 2001 Oct 20.
Article in English | MEDLINE | ID: mdl-11734060

ABSTRACT

BACKGROUND: The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. RESULTS: Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria. 
The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota. CONCLUSIONS: We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.


Subjects
Bacteria/classification , Bacteria/genetics , Molecular Evolution , Bacterial Genome , Genomics/methods , Phylogeny , Conserved Sequence/genetics , Gene Order/genetics , Horizontal Gene Transfer , Archaeal Genes/genetics , Bacterial Genes/genetics , Archaeal Genome , Likelihood Functions , Prokaryotic Cells/metabolism , Ribosomal Proteins/genetics , Sequence Alignment , Species Specificity
10.
Methods Enzymol ; 266: 131-41, 1996.
Article in English | MEDLINE | ID: mdl-8743682

ABSTRACT

The sequence databases continue to grow at an extraordinary rate. Contributions come from both small laboratories and large-scale projects, such as the Merck EST project. This growth has placed new demands on computational sequence comparison tools such as BLAST. Even now it is no longer practical to evaluate some BLAST reports manually; it is necessary to filter the output by, for example, organism, source, or degree of annotation. The new network BLAST service makes such tools possible. It is also possible to present BLAST output in different formats, such as BLANCE. Perhaps most important of all, it becomes simple to call BLAST from another application, making it one step within an integrated system. This makes the automated preparation of sequence evaluations that include BLAST runs possible. In the near future we expect to see a number of applications that use the network BLAST interface to help molecular biologists search against a database that is growing not only in size but in biological richness.


Subjects
Amino Acid Sequence , Base Sequence , Factual Databases , Proteins/chemistry , Software , Algorithms , Animals , Base Composition , Escherichia coli , Humans , Molecular Sequence Data , Repetitive Nucleic Acid Sequences , Saccharomyces cerevisiae
11.
Methods Enzymol ; 266: 295-322, 1996.
Article in English | MEDLINE | ID: mdl-8743691

ABSTRACT

An adequate set of computer procedures tailored to address the task of genome-scale analysis of protein sequences will greatly increase the beneficial impact of the genome sequencing projects on the progress of biological research. This is especially pertinent given the fact that, for model organisms, one-half or more of the putative gene products have not been functionally characterized. Here we describe several programs that may comprise the core of such a set and their application to the analysis of about 3000 proteins comprising 75% of the E. coli gene products. We find that the protein sequences encoded in this model genome are a rich source of information, with biologically relevant similarities detected for more than 80% of them. In the majority of cases, these similarities become evident directly from the results of BLAST searches. However, methods for motif analysis provide for a significant increase in search sensitivity and are particularly important for the detection of ancient conserved regions. As a result of sequence similarity analysis, generalized functional predictions can be made for the majority of uncharacterized ORF products, allowing efficient focusing of experimental effort. Clustering of the E. coli proteins on the basis of sequence similarity shows that almost one-half of the bacterial proteins have at least one paralog and that the likelihood that a protein belongs to a small or a large cluster depends on the function of this particular protein.


Subjects
Amino Acid Sequence , Bacterial Proteins/chemistry , Factual Databases , Escherichia coli/genetics , Bacterial Genome , Amino Acid Sequence Homology , Software , Algorithms , Bacterial Proteins/genetics , Bacteriophage T4/genetics , Conserved Sequence , Molecular Sequence Data , Open Reading Frames
14.
Proc Natl Acad Sci U S A ; 92(25): 11921-5, 1995 Dec 05.
Article in English | MEDLINE | ID: mdl-8524875

ABSTRACT

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.


Subjects
Amino Acid Sequence , Bacterial Proteins/genetics , Biological Evolution , Conserved Sequence , Escherichia coli/genetics , Algorithms , Archaea/genetics , Bacterial Proteins/classification , Non-Histone Chromosomal Proteins/genetics , Non-Histone Chromosomal Proteins/metabolism , Factual Databases , Eukaryotic Cells , Forecasting , Humans , Methyltransferases/genetics , Methyltransferases/metabolism , Molecular Sequence Data , Ribosomal RNA/metabolism , S-Adenosylmethionine/metabolism , Sequence Alignment , Sequence Analysis , Amino Acid Sequence Homology , Structure-Activity Relationship
15.
Proc Natl Acad Sci U S A ; 91(25): 12091-5, 1994 Dec 06.
Article in English | MEDLINE | ID: mdl-7991589

ABSTRACT

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
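
The iterative scan described in this abstract can be illustrated with a minimal sketch. It is not the authors' program: it assumes a uniform background and simple background-proportional pseudocounts in place of the Dirichlet-mixture priors, takes the cutoff as a fixed user-supplied number rather than deriving it from the score distribution under a random model, and always retains the seed segments. All function names and the toy sequences are hypothetical.

```python
import math
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies (real searches use database composition).
BACKGROUND = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}


def log_odds_matrix(block: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a position-dependent log-odds matrix from equal-length aligned segments."""
    matrix = []
    for pos in range(len(block[0])):
        # Simple pseudocounts stand in for the Dirichlet-mixture priors of the abstract.
        counts = {aa: pseudocount * BACKGROUND[aa] for aa in ALPHABET}
        for segment in block:
            counts[segment[pos]] += 1.0
        total = sum(counts.values())
        matrix.append({aa: math.log((counts[aa] / total) / BACKGROUND[aa]) for aa in ALPHABET})
    return matrix


def score(matrix: List[Dict[str, float]], segment: str) -> float:
    """Sum the per-position log-odds scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(matrix, segment))


def iterate_block(seed: List[str], database: List[str], cutoff: float, max_iter: int = 20) -> List[str]:
    """Rescan the database with an evolving matrix until the block stops changing."""
    block = sorted(set(seed))
    for _ in range(max_iter):
        matrix = log_odds_matrix(block)
        width = len(matrix)
        hits = set(seed)  # design choice of this sketch: the seed is never dropped
        for sequence in database:
            for start in range(len(sequence) - width + 1):
                window = sequence[start:start + width]
                if score(matrix, window) >= cutoff:
                    hits.add(window)
        new_block = sorted(hits)
        if new_block == block:  # converged: no segment gained or lost
            break
        block = new_block
    return block


# Toy database, invented for illustration.
toy_database = ["GGACDEGG", "KKACDFKK", "WWWWWWWW"]
result = iterate_block(["ACDE"], toy_database, cutoff=4.0)
```

On this toy input the block grows from the single seed to two segments and then stops changing, which mirrors the convergence behaviour the abstract reports for sufficiently high cutoffs.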


Subjects
Amino Acid Sequence , Conserved Sequence , Factual Databases , Proteins/chemistry , Proteins/genetics , Bacteria/enzymology , Bacteria/genetics , Biological Evolution , Consensus Sequence , Type I DNA Topoisomerases/chemistry , Type I DNA Topoisomerases/genetics , Theoretical Models , Molecular Sequence Data , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Statistics as Topic
16.
Genetica ; 108(1): 9-17, 2000.
Article in English | MEDLINE | ID: mdl-11145426

ABSTRACT

A complete understanding of the biology of an organism necessarily starts with knowledge of its genetic makeup. Proteins encoded in a genome must be identified and characterized, and the presence or absence of specific sets of proteins must be noted in order to determine the possible biochemical pathways or functional systems utilized by that organism. The COG database presents a set of tools suited to these purposes, including the ability to select protein families (COGs) that contain proteins from a specified set of species. The selection is based upon a phylogenetic pattern, which is a shorthand representation of the presence or absence of a particular species in a COG. Here we present the use of phylogenetic patterns as a means to perform targeted searches for undetected protein-coding genes in complete genomes.
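
A phylogenetic pattern, as described in this abstract, can be sketched as a fixed-order string of species codes. The snippet below is only an illustration under stated assumptions: the one-letter codes, the COG identifiers and the memberships are invented, and the helper names are hypothetical rather than taken from the COG database tools.

```python
from typing import Dict, List, Set

# Hypothetical one-letter species codes in a fixed order, chosen for illustration.
SPECIES_ORDER = ["E", "H", "M", "Y"]


def phylogenetic_pattern(species_in_cog: Set[str], order: List[str] = SPECIES_ORDER) -> str:
    """Write the species letter where the species is present and '-' where it is absent."""
    return "".join(code if code in species_in_cog else "-" for code in order)


def select_cogs(cogs: Dict[str, Set[str]], required: Set[str], forbidden: Set[str]) -> List[str]:
    """Return IDs of COGs containing every required species and no forbidden species."""
    return sorted(
        cog_id
        for cog_id, members in cogs.items()
        if required <= members and not (forbidden & members)
    )


# Toy data: identifiers and memberships are invented.
toy_cogs = {
    "COG_A": {"E", "H", "M", "Y"},
    "COG_B": {"E", "M", "Y"},
    "COG_C": {"E", "H"},
}

patterns = {cog_id: phylogenetic_pattern(members) for cog_id, members in toy_cogs.items()}
# COGs present in E, M and Y but missing from H point to a gene that may be undetected in H.
candidates = select_cogs(toy_cogs, required={"E", "M", "Y"}, forbidden={"H"})
```

A COG whose pattern lacks exactly one species from an otherwise complete set is the kind of target the abstract proposes for a focused search of that species' genome.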


Subjects
Computational Biology/methods , Factual Databases , Archaeal Genome , Bacterial Genome , Multigene Family/genetics , Algorithms , Bacterial Proteins/genetics , Fungal Proteins/genetics , Molecular Sequence Data , Phylogeny , Saccharomyces cerevisiae/genetics , Amino Acid Sequence Homology , Species Specificity
17.
Nucleic Acids Res ; 28(1): 33-6, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592175

ABSTRACT

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
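
One way to picture the criterion of consistent genome-specific best hits is as triangles of best hits among three genomes that are merged when they share a side. The sketch below rests on simplifying assumptions that the abstract does not state: best hits are precomputed, every side of a triangle must be a reciprocal best hit, and paralogs are not handled. The data layout, function names and toy proteins are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Protein = Tuple[str, str]  # (genome, protein identifier)
BestHits = Dict[Protein, Dict[str, Protein]]  # protein -> {other genome: best hit there}


def is_mutual(best: BestHits, a: Protein, b: Protein) -> bool:
    """True when a and b are each other's best hit in the other's genome."""
    return best.get(a, {}).get(b[0]) == b and best.get(b, {}).get(a[0]) == a


def find_triangles(best: BestHits) -> List[FrozenSet[Protein]]:
    """List triples of proteins from three genomes linked by reciprocal best hits."""
    triangles = []
    for a, b, c in combinations(sorted(best), 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # a triangle needs three different genomes
        if is_mutual(best, a, b) and is_mutual(best, b, c) and is_mutual(best, a, c):
            triangles.append(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles: List[FrozenSet[Protein]]) -> List[Set[Protein]]:
    """Merge triangles that share a side (two proteins) using union-find."""
    parent = list(range(len(triangles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    groups: Dict[int, Set[Protein]] = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return list(groups.values())


# Toy best-hit table, invented for illustration; a2 has only a one-way hit.
a1, b1, c1, d1 = ("A", "a1"), ("B", "b1"), ("C", "c1"), ("D", "d1")
toy_best: BestHits = {
    a1: {"B": b1, "C": c1, "D": d1},
    b1: {"A": a1, "C": c1, "D": d1},
    c1: {"A": a1, "B": b1, "D": d1},
    d1: {"A": a1, "B": b1, "C": c1},
    ("A", "a2"): {"B": b1},
}

cogs = merge_triangles(find_triangles(toy_best))
```

In this toy case the four reciprocal proteins form four triangles that collapse into a single cluster, while the protein with a one-way hit stays outside it.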


Subjects
Factual Databases , Molecular Evolution , Archaeal Genome , Fungal Genome , Proteins/genetics , Database Management Systems , Internet , Phylogeny , Proteins/physiology
18.
Genome Biol ; 2(12): RESEARCH0053, 2001.
Article in English | MEDLINE | ID: mdl-11790256

ABSTRACT

BACKGROUND: Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat). RESULTS: Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviates from this pattern and instead shows variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment. CONCLUSIONS: Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.


Subjects
Molecular Evolution , Proteins/genetics , Proteins/physiology , Animals , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Archaeal Proteins/physiology , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Eukaryotic Cells/metabolism , Forecasting , Archaeal Genome , Bacterial Genome , Human Genome , Humans , Mice , Mutation , Phylogeny , Protein Tertiary Structure , Proteins/chemistry , Rats , Protein Sequence Analysis
19.
Genome Res ; 9(7): 608-28, 1999 Jul.
Article in English | MEDLINE | ID: mdl-10413400

ABSTRACT

Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%-35% of the gene complement and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected. In terms of their apparent phylogenetic affinities, the archaeal protein families split into bacterial and eukaryotic families. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes whose evolution should have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to the bacterial homologs, which might suggest a significant role of horizontal gene transfer from bacteria in the evolution of the euryarchaeota.
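The split between a conserved core and a more mobile component described in this abstract can be illustrated with a minimal sketch. It assumes each orthologous set is represented simply as the list of species in which it occurs. The function name, the set identifiers, and the species abbreviations are hypothetical and are not taken from the paper, and the toy data do not reproduce the reported counts.

```python
def partition_ortholog_sets(presence, n_species):
    """Split orthologous sets into 'core' (present in every species)
    and 'variable' (present in only some species).

    presence: dict mapping a set identifier to an iterable of species names.
    n_species: total number of species being compared.
    """
    core, variable = [], []
    for set_id, species in presence.items():
        # Count distinct species so paralogs within one species are not double-counted.
        if len(set(species)) == n_species:
            core.append(set_id)
        else:
            variable.append(set_id)
    return core, variable


# Hypothetical toy input: abbreviations stand for the four euryarchaeal species.
toy = {
    "set_A": ["Mj", "Mt", "Af", "Ph"],        # in all four species
    "set_B": ["Mj", "Af"],                    # in two species
    "set_C": ["Mt", "Af", "Ph", "Ph"],        # three species, one with a paralog
}

core, variable = partition_ortholog_sets(toy, n_species=4)
print("core:", core)
print("variable:", variable)
```

Under these assumptions, the "core" list corresponds to sets represented in all compared species, and the "variable" list to those conserved in only two or three species.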


Subjects
Euryarchaeota/genetics , Genome , Amino Acid Sequence , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Conserved Sequence , Eukaryotic Cells/metabolism , Molecular Evolution , Archaeal Genes/genetics , Genetic Variation , Phylogeny , Sequence Alignment , Amino Acid Sequence Homology
20.
J Bacteriol ; 183(16): 4823-38, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11466286

ABSTRACT

The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange in the evolution of the unique metabolic profile of the bacterium. Many of the sporulation genes identified in B. subtilis are missing in C. acetobutylicum, which suggests major differences in the sporulation process. Thus, comparative analysis reveals both significant conservation of the genome organization and pronounced differences in many systems that reflect unique adaptive strategies of the two gram-positive bacteria.


Subjects
Clostridium/genetics , Bacterial Genome , Amino Acid Sequence , Bacterial Proteins/genetics , Base Sequence , Bacterial Chromosomes/genetics , Clostridium/metabolism , Conserved Sequence , Enzymes/genetics , Bacterial Genes , Biological Models , Molecular Sequence Data , Operon , Phylogeny , Plasmids , Sequence Alignment , Amino Acid Sequence Homology , Solvents/metabolism