1.
OMICS ; 7(2): 171-5, 2003.
Article in English | MEDLINE | ID: mdl-14506846

ABSTRACT

As more and more complete bacterial genome sequences become available, the genome annotation of previously sequenced genomes may become quickly outdated. This is primarily due to the discovery and functional characterization of new genes. We have reannotated the recently published genome of Shewanella oneidensis with the following results: 51 new genes have been identified, and functional annotation has been added to 97 genes, including 15 new and 82 existing ones with previously unassigned function. The identification of new genes was achieved by predicting the protein coding regions using the HMM-based program GeneMark.hmm. Subsequent comparison of the predicted gene products to the non-redundant protein database using BLAST and the COG (Clusters of Orthologous Groups) database using COGNITOR provided the functional annotation.


Subject(s)
Bacterial Proteins/genetics , Genome, Bacterial , Shewanella/genetics , Algorithms , Bacterial Proteins/physiology , Computational Biology/methods , Genes, Bacterial/genetics , Genomics , Molecular Sequence Data , Open Reading Frames/genetics , Sequence Alignment/methods , Software
2.
BMC Evol Biol ; 1: 8, 2001 Oct 20.
Article in English | MEDLINE | ID: mdl-11734060

ABSTRACT

BACKGROUND: The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. RESULTS: Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria.
The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota. CONCLUSIONS: We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.
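The first approach listed in this abstract, presence-absence of genomes in clusters of orthologous genes, can be illustrated with a minimal Python sketch. The abstract does not say which distance measure was used, so the Jaccard distance below is an assumption, and the genome names and COG identifiers are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical toy data: for each genome, the set of orthologous clusters
# in which that genome is represented.
cog_membership = {
    "genome_A": {"COG1", "COG2", "COG3", "COG4"},
    "genome_B": {"COG1", "COG2", "COG3"},
    "genome_C": {"COG1", "COG5"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(membership):
    """Pairwise distances keyed by (genome, genome) in sorted name order."""
    return {
        (g1, g2): jaccard_distance(membership[g1], membership[g2])
        for g1, g2 in combinations(sorted(membership), 2)
    }


distances = distance_matrix(cog_membership)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A matrix of this kind could be passed to any distance-based tree-building method. As the abstract notes, trees built this way appear to be strongly affected by gene loss and horizontal gene transfer.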


Subject(s)
Bacteria/classification , Bacteria/genetics , Evolution, Molecular , Genome, Bacterial , Genomics/methods , Phylogeny , Conserved Sequence/genetics , Gene Order/genetics , Gene Transfer, Horizontal , Genes, Archaeal/genetics , Genes, Bacterial/genetics , Genome, Archaeal , Likelihood Functions , Prokaryotic Cells/metabolism , Ribosomal Proteins/genetics , Sequence Alignment , Species Specificity
3.
J Bacteriol ; 183(16): 4823-38, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11466286

ABSTRACT

The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange in the evolution of the unique metabolic profile of the bacterium. Many of the sporulation genes identified in B. subtilis are missing in C. acetobutylicum, which suggests major differences in the sporulation process. Thus, comparative analysis reveals both significant conservation of the genome organization and pronounced differences in many systems that reflect unique adaptive strategies of the two gram-positive bacteria.


Subject(s)
Clostridium/genetics , Genome, Bacterial , Amino Acid Sequence , Bacterial Proteins/genetics , Base Sequence , Chromosomes, Bacterial/genetics , Clostridium/metabolism , Conserved Sequence , Enzymes/genetics , Genes, Bacterial , Models, Biological , Molecular Sequence Data , Operon , Phylogeny , Plasmids , Sequence Alignment , Sequence Homology, Amino Acid , Solvents/metabolism
5.
Microbiol Mol Biol Rev ; 65(1): 44-79, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11238985

ABSTRACT

The bacterium Deinococcus radiodurans shows remarkable resistance to a range of damage caused by ionizing radiation, desiccation, UV radiation, oxidizing agents, and electrophilic mutagens. D. radiodurans is best known for its extreme resistance to ionizing radiation; not only can it grow continuously in the presence of chronic radiation (6 kilorads/h), but also it can survive acute exposures to gamma radiation exceeding 1,500 kilorads without dying or undergoing induced mutation. These characteristics were the impetus for sequencing the genome of D. radiodurans and the ongoing development of its use for bioremediation of radioactive wastes. Although it is known that these multiple resistance phenotypes stem from efficient DNA repair processes, the mechanisms underlying these extraordinary repair capabilities remain poorly understood. In this work we present an extensive comparative sequence analysis of the Deinococcus genome. Deinococcus is the first representative with a completely sequenced genome from a distinct bacterial lineage of extremophiles, the Thermus-Deinococcus group. Phylogenetic tree analysis, combined with the identification of several synapomorphies between Thermus and Deinococcus, supports the hypothesis that it is an ancient group with no clear affinities to any of the other known bacterial lineages. Distinctive features of the Deinococcus genome as well as features shared with other free-living bacteria were revealed by comparison of its proteome to the collection of clusters of orthologous groups of proteins. Analysis of paralogs in Deinococcus has revealed several unique protein families. In addition, specific expansions of several other families including phosphatases, proteases, acyltransferases, and Nudix family pyrophosphohydrolases were detected. Genes that potentially affect DNA repair and recombination and stress responses were investigated in detail. 
Some proteins appear to have been horizontally transferred from eukaryotes and are not present in other bacteria. For example, three proteins homologous to plant desiccation resistance proteins were identified, and these are particularly interesting because of the correlation between desiccation and radiation resistance. Compared to other bacteria, the D. radiodurans genome is enriched in repetitive sequences, namely, IS-like transposons and small intergenic repeats. In combination, these observations suggest that several different biological mechanisms contribute to the multiple DNA repair-dependent phenotypes of this organism.


Subject(s)
DNA Damage/radiation effects , Genome, Bacterial , Gram-Positive Cocci/genetics , Amino Acid Sequence , Biological Evolution , Carbohydrate Metabolism , DNA Repair/physiology , DNA Replication , Energy Metabolism , Gene Expression Regulation, Bacterial , Gene Transfer, Horizontal , Genomics/methods , Gram-Positive Cocci/radiation effects , Molecular Sequence Data , Protein Biosynthesis , Signal Transduction
6.
Nucleic Acids Res ; 29(1): 22-8, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125040

ABSTRACT

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.


Subject(s)
Databases, Factual , Proteins , Animals , Archaea/genetics , Bacteria/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Genome , Information Storage and Retrieval , Internet , Phylogeny , Proteins/classification , Proteins/genetics , Saccharomyces cerevisiae/genetics , Sequence Alignment
7.
Genome Biol ; 2(12): RESEARCH0053, 2001.
Article in English | MEDLINE | ID: mdl-11790256

ABSTRACT

BACKGROUND: Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat). RESULTS: Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviates from this pattern and instead shows variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment. CONCLUSIONS: Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.


Subject(s)
Evolution, Molecular , Proteins/genetics , Proteins/physiology , Animals , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Archaeal Proteins/physiology , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Eukaryotic Cells/metabolism , Forecasting , Genome, Archaeal , Genome, Bacterial , Genome, Human , Humans , Mice , Mutation , Phylogeny , Protein Structure, Tertiary , Proteins/chemistry , Rats , Sequence Analysis, Protein
8.
Genome Res ; 10(10): 1643-7, 2000 Oct.
Article in English | MEDLINE | ID: mdl-11042161

ABSTRACT

We describe a genome annotation service provided by the Entrez browser, http://www.ncbi.nlm.nih.gov/entrez. All protein products identified in fully sequenced microbial genomes have been compared with proteins with known 3-D structure by use of the BLAST sequence comparison algorithm. For the approximately 20% of genome proteins in which unambiguous sequence similarity is detected, Entrez provides a link from the gene product to its predicted structure. The service uses the Cn3D molecular graphics viewer to present a 3-D view of the known structure, together with an alignment display mapping conserved residues from the genome protein onto the known structure. Using an example from Aeropyrum pernix, we illustrate how mapping to a 3-D structure can confirm predictions of biological function.


Subject(s)
Databases, Factual , Genome , Algorithms , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/physiology , Computer Graphics , Data Collection/instrumentation , Data Collection/methods , Databases, Factual/supply & distribution , Databases, Factual/trends , Genome, Bacterial , Models, Molecular , Molecular Sequence Data , Protein Structure, Quaternary
9.
Genetica ; 108(1): 9-17, 2000.
Article in English | MEDLINE | ID: mdl-11145426

ABSTRACT

A complete understanding of the biology of an organism necessarily starts with knowledge of its genetic makeup. Proteins encoded in a genome must be identified and characterized, and the presence or absence of specific sets of proteins must be noted in order to determine the possible biochemical pathways or functional systems utilized by that organism. The COG database presents a set of tools suited to these purposes, including the ability to select protein families (COGs) that contain proteins from a specified set of species. The selection is based upon a phylogenetic pattern, which is a shorthand representation of the presence or absence of a particular species in a COG. Here we present the use of phylogenetic patterns as a means to perform targeted searches for undetected protein-coding genes in complete genomes.
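The phylogenetic pattern described in this abstract, a shorthand for which species are present in or absent from a COG, can be sketched in a few lines of Python. The one-letter species codes, the `-` and `*` query symbols, and the toy COGs below are hypothetical and are not taken from the COG database itself.

```python
# Hypothetical species order and one-letter codes (illustrative only).
SPECIES = ["E", "H", "M", "Y"]

# Hypothetical toy COGs: identifier -> set of species represented in it.
cogs = {
    "COG_a": {"E", "H", "M", "Y"},
    "COG_b": {"E", "H"},
    "COG_c": {"E", "H", "Y"},
    "COG_d": {"M"},
}


def pattern(members):
    """Encode presence as the species letter and absence as '-'."""
    return "".join(s if s in members else "-" for s in SPECIES)


def select(cog_table, query):
    """Return the COGs whose pattern matches the query string.

    In the query, a letter requires presence, '-' requires absence,
    and '*' accepts either state.
    """
    hits = []
    for cog_id, members in sorted(cog_table.items()):
        p = pattern(members)
        if all(q == "*" or q == c for q, c in zip(query, p)):
            hits.append(cog_id)
    return hits


# COGs present in E, H and Y but lacking M: candidates for a targeted
# search for an undetected gene in the genome of species M.
candidates = select(cogs, "EH-Y")
print(candidates)
```

A COG that is missing only one species is a natural place to look for a gene that the original annotation of that species overlooked, which is the kind of targeted search the abstract describes.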


Subject(s)
Computational Biology/methods , Databases, Factual , Genome, Archaeal , Genome, Bacterial , Multigene Family/genetics , Algorithms , Bacterial Proteins/genetics , Fungal Proteins/genetics , Molecular Sequence Data , Phylogeny , Saccharomyces cerevisiae/genetics , Sequence Homology, Amino Acid , Species Specificity
10.
Nucleic Acids Res ; 28(1): 33-6, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592175

ABSTRACT

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
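The criterion of consistency of genome-specific best hits can be illustrated with a small Python sketch. It assumes one simple reading of the criterion: three proteins from three different genomes are grouped when each is the best hit of the others in the respective genome. The best-hit table and the "genome:id" protein names are hypothetical, and the sketch does not reproduce the full COG construction procedure.

```python
from itertools import combinations

# Hypothetical best-hit table:
# best_hit[(protein, target_genome)] = best-scoring protein in that genome.
best_hit = {
    ("A:p1", "B"): "B:p1", ("A:p1", "C"): "C:p1",
    ("B:p1", "A"): "A:p1", ("B:p1", "C"): "C:p1",
    ("C:p1", "A"): "A:p1", ("C:p1", "B"): "B:p1",
    ("A:p2", "B"): "B:p1", ("A:p2", "C"): "C:p9",
    ("C:p9", "A"): "A:p7",
}


def genome_of(protein):
    """Extract the genome label from a 'genome:id' protein name."""
    return protein.split(":")[0]


def is_symmetric_best_hit(p, q):
    """True if q is p's best hit in q's genome and p is q's best hit in p's genome."""
    return (best_hit.get((p, genome_of(q))) == q
            and best_hit.get((q, genome_of(p))) == p)


def consistent_triangles(proteins):
    """Protein triples from three different genomes with pairwise symmetric best hits."""
    found = []
    for trio in combinations(sorted(proteins), 3):
        if len({genome_of(p) for p in trio}) < 3:
            continue  # need three distinct genomes
        if all(is_symmetric_best_hit(p, q) for p, q in combinations(trio, 2)):
            found.append(trio)
    return found


proteins = {p for p, _ in best_hit}
triangles = consistent_triangles(proteins)
print(triangles)
```

In this toy table, A:p2 points to B:p1, but B:p1 points back to A:p1, so A:p2 is left out of the group. Filtering out such one-sided hits is the purpose of a consistency criterion.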


Subject(s)
Databases, Factual , Evolution, Molecular , Genome, Archaeal , Genome, Fungal , Proteins/genetics , Database Management Systems , Internet , Phylogeny , Proteins/physiology
11.
Genome Res ; 9(7): 608-28, 1999 Jul.
Article in English | MEDLINE | ID: mdl-10413400

ABSTRACT

Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%-35% of the gene complement and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected. In terms of their apparent phylogenetic affinities, the archaeal protein families split into bacterial and eukaryotic families. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes whose evolution should have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to the bacterial homologs, which might suggest a significant role of horizontal gene transfer from bacteria in the evolution of the euryarchaeota.


Subject(s)
Euryarchaeota/genetics , Genome , Amino Acid Sequence , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Conserved Sequence , Eukaryotic Cells/metabolism , Evolution, Molecular , Genes, Archaeal/genetics , Genetic Variation , Phylogeny , Sequence Alignment , Sequence Homology, Amino Acid
13.
Science ; 282(5389): 754-9, 1998 Oct 23.
Article in English | MEDLINE | ID: mdl-9784136

ABSTRACT

Analysis of the 1,042,519-base pair Chlamydia trachomatis genome revealed unexpected features related to the complex biology of chlamydiae. Although chlamydiae lack many biosynthetic capabilities, they retain functions for performing key steps and interconversions of metabolites obtained from their mammalian host cells. Numerous potential virulence-associated proteins also were characterized. Several eukaryotic chromatin-associated domain proteins were identified, suggesting a eukaryotic-like mechanism for chlamydial nucleoid condensation and decondensation. The phylogenetic mosaic of chlamydial genes, including a large number of genes with phylogenetic origins from eukaryotes, implies a complex evolution for adaptation to obligate intracellular parasitism.


Subject(s)
Chlamydia trachomatis/genetics , Genome, Bacterial , Sequence Analysis, DNA , Aerobiosis , Amino Acid Sequence , Amino Acids/biosynthesis , Bacterial Outer Membrane Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Biological Evolution , Chlamydia trachomatis/classification , Chlamydia trachomatis/metabolism , Chlamydia trachomatis/physiology , DNA Repair , Energy Metabolism , Enzymes/chemistry , Enzymes/genetics , Humans , Lipids/biosynthesis , Molecular Sequence Data , Peptidoglycan/biosynthesis , Peptidoglycan/genetics , Phylogeny , Protein Biosynthesis , Recombination, Genetic , Transcription, Genetic , Transformation, Bacterial , Virulence
14.
Curr Opin Struct Biol ; 8(3): 355-63, 1998 Jun.
Article in English | MEDLINE | ID: mdl-9666332

ABSTRACT

Computer analysis of complete prokaryotic genomes shows that microbial proteins are in general highly conserved--approximately 70% of them contain ancient conserved regions. This allows us to delineate families of orthologs across a wide phylogenetic range and, in many cases, predict protein functions with considerable precision. Sequence database searches using newly developed, sensitive algorithms result in the unification of such orthologous families into larger superfamilies sharing common sequence motifs. For many of these superfamilies, prediction of the structural fold and specific amino acid residues involved in enzymatic catalysis is possible. Taken together, sequence and structure comparisons provide a powerful methodology that can successfully complement traditional experimental approaches.


Subject(s)
DNA/chemistry , DNA/genetics , Genome , Animals , Bacteria/genetics , Computer Simulation , Evolution, Molecular , Genetic Variation , Helicobacter pylori/enzymology , Helicobacter pylori/genetics , Humans , Models, Genetic , Proteins/chemistry , Proteins/classification , Proteins/genetics
15.
Science ; 278(5338): 631-7, 1997 Oct 24.
Article in English | MEDLINE | ID: mdl-9381173

ABSTRACT

In order to extract the maximum amount of information from the rapidly accumulating genome sequences, all conserved genes need to be classified according to their homologous relationships. Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs). Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages. Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG. This relation automatically yields a number of functional predictions for poorly characterized genomes. The COGs comprise a framework for functional and evolutionary genome analysis.


Subject(s)
Genes, Archaeal , Genes, Bacterial , Genes, Fungal , Multigene Family , Phylogeny , Proteins/genetics , Amino Acid Sequence , Archaeal Proteins/chemistry , Archaeal Proteins/classification , Archaeal Proteins/genetics , Archaeal Proteins/physiology , Bacteria/chemistry , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Conserved Sequence , Evolution, Molecular , Fungal Proteins/chemistry , Fungal Proteins/classification , Fungal Proteins/genetics , Fungal Proteins/physiology , Methanococcus/chemistry , Methanococcus/genetics , Proteins/chemistry , Proteins/classification , Proteins/physiology , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics , Species Specificity
16.
Curr Biol ; 6(3): 279-91, 1996 Mar 01.
Article in English | MEDLINE | ID: mdl-8805245

ABSTRACT

BACKGROUND: The 1.83 Megabase (Mb) sequence of the Haemophilus influenzae chromosome, the first completed genome sequence of a cellular life form, has been recently reported. Approximately 75% of the 4.7 Mb genome sequence of Escherichia coli is also available. The life styles of the two bacteria are very different - H. influenzae is an obligate parasite that lives in human upper respiratory mucosa and can be cultivated only on rich media, whereas E. coli is a saprophyte that can grow on minimal media. A detailed comparison of the protein products encoded by these two genomes is expected to provide valuable insights into bacterial cell physiology and genome evolution. RESULTS: We describe the results of computer analysis of the amino-acid sequences of 1703 putative proteins encoded by the complete genome of H. influenzae. We detected sequence similarity to proteins in current databases for 92% of the H. influenzae protein sequences, and at least a general functional prediction was possible for 83%. A comparison of the H. influenzae protein sequences with those of 3010 proteins encoded by the sequenced 75% of the E. coli genome revealed 1128 pairs of apparent orthologs, with an average of 59% identity. In contrast to the high similarity between orthologs, the genome organization and the functional repertoire of genes in the two bacteria were remarkably different. The smaller genome size of H. influenzae is explained, to a large extent, by a reduction in the number of paralogous genes. There was no long-range colinearity between the E. coli and H. influenzae gene orders, but over 70% of the orthologous genes were found in short conserved strings, only about half of which were operons in E. coli. Superposition of the H. influenzae enzyme repertoire upon the known E. coli metabolic pathways allowed us to reconstruct similar and alternative pathways in H. influenzae and provides an explanation for the known nutritional requirements.
CONCLUSIONS: By comparing proteins encoded by the two bacterial genomes, we have shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.


Subject(s)
Bacterial Proteins/metabolism , Escherichia coli/genetics , Genome, Bacterial , Haemophilus influenzae/genetics , Haemophilus influenzae/metabolism , Bacterial Proteins/chemistry , Biological Evolution , Conserved Sequence , DNA, Bacterial , Molecular Sequence Data
17.
Methods Enzymol ; 266: 131-41, 1996.
Article in English | MEDLINE | ID: mdl-8743682

ABSTRACT

The sequence databases continue to grow at an extraordinary rate. Contributions come from both small laboratories and large-scale projects, such as the Merck EST project. This growth has placed new demands on computational sequence comparison tools such as BLAST. Even now it is no longer practical to evaluate some BLAST reports manually; it is necessary to filter the output by, for example, organism, source, or degree of annotation. The new network BLAST service makes such tools possible. It is also possible to present BLAST output in different formats, such as BLANCE. Perhaps most important of all, it becomes simple to call BLAST from another application, making it one step within an integrated system. This makes the automated preparation of sequence evaluations that include BLAST runs possible. In the near future we expect to see a number of applications that use the network BLAST interface to help molecular biologists search against a database that is growing not only in size but in biological richness.


Subject(s)
Amino Acid Sequence , Base Sequence , Databases, Factual , Proteins/chemistry , Software , Algorithms , Animals , Base Composition , Escherichia coli , Humans , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Saccharomyces cerevisiae
18.
Methods Enzymol ; 266: 295-322, 1996.
Article in English | MEDLINE | ID: mdl-8743691

ABSTRACT

An adequate set of computer procedures tailored to address the task of genome-scale analysis of protein sequences will greatly increase the beneficial impact of the genome sequencing projects on the progress of biological research. This is especially pertinent given the fact that, for model organisms, one-half or more of the putative gene products have not been functionally characterized. Here we describe several programs that may comprise the core of such a set and their application to the analysis of about 3000 proteins comprising 75% of the E. coli gene products. We find that the protein sequences encoded in this model genome are a rich source of information, with biologically relevant similarities detected for more than 80% of them. In the majority of cases, these similarities become evident directly from the results of BLAST searches. However, methods for motif analysis provide for a significant increase in search sensitivity and are particularly important for the detection of ancient conserved regions. As a result of sequence similarity analysis, generalized functional predictions can be made for the majority of uncharacterized ORF products, allowing efficient focusing of experimental effort. Clustering of the E. coli proteins on the basis of sequence similarity shows that almost one-half of the bacterial proteins have at least one paralog and that the likelihood that a protein belongs to a small or a large cluster depends on the function of this particular protein.


Subject(s)
Amino Acid Sequence , Bacterial Proteins/chemistry , Databases, Factual , Escherichia coli/genetics , Genome, Bacterial , Sequence Homology, Amino Acid , Software , Algorithms , Bacterial Proteins/genetics , Bacteriophage T4/genetics , Conserved Sequence , Molecular Sequence Data , Open Reading Frames
19.
Proc Natl Acad Sci U S A ; 92(25): 11921-5, 1995 Dec 05.
Article in English | MEDLINE | ID: mdl-8524875

ABSTRACT

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.


Subject(s)
Amino Acid Sequence , Bacterial Proteins/genetics , Biological Evolution , Conserved Sequence , Escherichia coli/genetics , Algorithms , Archaea/genetics , Bacterial Proteins/classification , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , Databases, Factual , Eukaryotic Cells , Forecasting , Humans , Methyltransferases/genetics , Methyltransferases/metabolism , Molecular Sequence Data , RNA, Ribosomal/metabolism , S-Adenosylmethionine/metabolism , Sequence Alignment , Sequence Analysis , Sequence Homology, Amino Acid , Structure-Activity Relationship
20.
Proc Natl Acad Sci U S A ; 91(25): 12091-5, 1994 Dec 06.
Article in English | MEDLINE | ID: mdl-7991589

ABSTRACT

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
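The iterative scan described in this abstract can be sketched roughly in Python under simplifying assumptions. A flat pseudocount stands in for the Dirichlet-mixture priors, the cutoff is a fixed number rather than one derived from the expected score distribution under a random model, and the seed segments are always kept in the block. The function names and toy segments are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_matrix(block, background, pseudocount=1.0):
    """Position-specific log-odds matrix from equal-length aligned segments.

    Assumption: a flat pseudocount replaces the Dirichlet-mixture priors.
    """
    width = len(block[0])
    total = len(block) + pseudocount * len(ALPHABET)
    matrix = []
    for i in range(width):
        counts = Counter(seg[i] for seg in block)
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in ALPHABET
        })
    return matrix


def score(matrix, segment):
    """Sum of the per-position log-odds scores for one segment."""
    return sum(column[aa] for column, aa in zip(matrix, segment))


def iterate(seed_block, database_segments, background, cutoff, max_iter=20):
    """Rebuild the matrix and rescan until the accepted segment set is stable."""
    block = sorted(set(seed_block))
    for _ in range(max_iter):
        matrix = log_odds_matrix(block, background)
        accepted = {s for s in database_segments if score(matrix, s) >= cutoff}
        new_block = sorted(accepted | set(seed_block))  # seeds are always kept
        if new_block == block:
            break  # converged: no segment gained or lost
        block = new_block
    return block


# Toy run with a uniform background and a hypothetical fixed cutoff.
background = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}
result = iterate(["GKS", "GKT"], ["GKS", "GKT", "GRS", "WWW"], background, cutoff=1.0)
print(result)
```

In the toy run the related segment GRS is recruited in the first pass and the unrelated WWW never reaches the cutoff, so the block stops changing after the second pass.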


Subject(s)
Amino Acid Sequence , Conserved Sequence , Databases, Factual , Proteins/chemistry , Proteins/genetics , Bacteria/enzymology , Bacteria/genetics , Biological Evolution , Consensus Sequence , DNA Topoisomerases, Type I/chemistry , DNA Topoisomerases, Type I/genetics , Models, Theoretical , Molecular Sequence Data , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Statistics as Topic