1.
BMC Bioinformatics ; 10: 355, 2009 Oct 27.
Article in English | MEDLINE | ID: mdl-19860884

ABSTRACT

BACKGROUND: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.


Subjects
Computational Biology/methods; Genome; Genomics/methods; Phylogeny; Evolution, Molecular; Metagenomics
2.
BMC Evol Biol ; 9: 28, 2009 Feb 03.
Article in English | MEDLINE | ID: mdl-19192293

ABSTRACT

BACKGROUND: The question of how genomic processes, such as gene duplication, give rise to co-ordinated organismal properties, such as emergence of new body plans, organs and lifestyles, is of importance in developmental and evolutionary biology. Herein, we focus on the diversification of the transforming growth factor-beta (TGF-beta) pathway -- one of the fundamental and versatile metazoan signal transduction engines. RESULTS: After an investigation of 33 genomes, we show that the emergence of the TGF-beta pathway coincided with appearance of the first known animal species. The primordial pathway repertoire consisted of four Smads and four receptors, similar to those observed in the extant genome of the early diverging tablet animal (Trichoplax adhaerens). We subsequently retrace duplications in ancestral genomes on the lineage leading to humans, as well as lineage-specific duplications, such as those which gave rise to novel Smads and receptors in teleost fishes. We conclude that the diversification of the TGF-beta pathway can be parsimoniously explained according to the 2R model, with additional rounds of duplications in teleost fishes. Finally, we investigate duplications followed by accelerated evolution which gave rise to an atypical TGF-beta pathway in free-living bacterial feeding nematodes of the genus Rhabditis. CONCLUSION: Our results challenge the view of well-conserved developmental pathways. The TGF-beta signal transduction engine has expanded through gene duplication, continually adopting new functions, as animals grew in anatomical complexity, colonized new environments, and developed an active immune system.


Subjects
Evolution, Molecular; Multigene Family; Transforming Growth Factor beta/genetics; Animals; Bayes Theorem; Gene Duplication; Genome; Humans; Likelihood Functions; Phylogeny; Sequence Alignment; Sequence Homology, Amino Acid; Signal Transduction/genetics
3.
BMC Evol Biol ; 8: 247, 2008 Sep 09.
Article in English | MEDLINE | ID: mdl-18782449

ABSTRACT

BACKGROUND: We describe a function-driven approach to the analysis of metabolism which takes into account the phylogenetic origin of biochemical reactions to reveal subtle lineage-specific metabolic innovations, undetectable by more traditional methods based on sequence comparison. The origins of reactions and thus entire pathways are inferred using a simple taxonomic classification scheme that describes the evolutionary course of events towards the lineage of interest. We investigate the evolutionary history of the human metabolic network extracted from a metabolic database, construct a network of interconnected pathways and classify this network according to the taxonomic categories representing eukaryotes, metazoa and vertebrates. RESULTS: It is demonstrated that lineage-specific innovations correspond to reactions and pathways associated with key phenotypic changes during evolution, such as the emergence of cellular organelles in eukaryotes, cell adhesion cascades in metazoa and the biosynthesis of complex cell-specific biomolecules in vertebrates. CONCLUSION: This phylogenetic view of metabolic networks puts gene innovations within an evolutionary context, demonstrating how the emergence of a phenotype in a lineage provides a platform for the development of specialized traits.


Subjects
Evolution, Molecular; Metabolic Networks and Pathways; Models, Genetic; Phylogeny; Cholesterol/metabolism; Computational Biology/methods; Databases, Genetic; Glycosphingolipids/metabolism; Glycosylation; Humans
4.
PLoS Comput Biol ; 3(10): 2032-42, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17967053

ABSTRACT

Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express(3D).
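
The network construction described in this abstract (transcripts as nodes, joined when their expression profiles are similar across conditions) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cut-off is used, so the Pearson correlation, the 0.9 threshold and the function names below are assumptions chosen for illustration and are not taken from BioLayout Express(3D).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no similarity signal
    return cov / sqrt(var_x * var_y)


def build_coexpression_edges(profiles, threshold=0.9):
    """Connect transcripts whose profiles correlate at or above the threshold.

    profiles: dict mapping a transcript id to a list of expression values,
    one per condition (hypothetical input layout).
    Returns a list of (transcript_a, transcript_b, correlation) edges.
    """
    edges = []
    for (t1, p1), (t2, p2) in combinations(sorted(profiles.items()), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            edges.append((t1, t2, r))
    return edges


# Toy data: t1 and t2 rise together across four conditions, t3 falls.
toy_profiles = {
    "t1": [1.0, 2.0, 3.0, 4.0],
    "t2": [2.0, 4.0, 6.0, 8.0],
    "t3": [4.0, 3.0, 2.0, 1.0],
}
toy_edges = build_coexpression_edges(toy_profiles)
```

With the toy data only the pair t1 and t2 is connected. Comparing every pair of profiles is quadratic in the number of transcripts, so a genome-wide dataset would need a vectorised or indexed implementation rather than this loop.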


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Gene Expression Regulation; Oligonucleotide Array Sequence Analysis/methods; Transcription, Genetic; Algorithms; Animals; Cluster Analysis; Gene Expression; Gene Regulatory Networks; Imaging, Three-Dimensional; Mice; Pattern Recognition, Automated; Software
5.
BMC Bioinformatics ; 8 Suppl 4: S3, 2007 May 22.
Article in English | MEDLINE | ID: mdl-17570146

ABSTRACT

Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/.
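
The counts quoted in this abstract can be checked against one another. The short sketch below uses only the four numbers stated in the text (827 classes, 755 correct, 44 with false positives, 28 not represented); the helper name `breakdown` is hypothetical.

```python
# Counts quoted in the abstract above.
TOTAL_EC_CLASSES = 827
CORRECT = 755
WITH_FALSE_POSITIVES = 44
NOT_REPRESENTED = 28


def breakdown(total, **counts):
    """Return each category's share of the total as a percentage."""
    if sum(counts.values()) != total:
        raise ValueError("categories do not add up to the total")
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = breakdown(
    TOTAL_EC_CLASSES,
    correct=CORRECT,
    with_false_positives=WITH_FALSE_POSITIVES,
    not_represented=NOT_REPRESENTED,
)
```

The three categories add up to 827, and the shares come to roughly 91.3%, 5.3% and 3.4%, which is consistent with the 91% figure reported.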


Subjects
Algorithms; Enzymes/chemistry; Enzymes/metabolism; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Amino Acid Sequence; Confidence Intervals; Data Interpretation, Statistical; Enzymes/classification; Molecular Sequence Data; Sensitivity and Specificity; Sequence Homology, Amino Acid
6.
BMC Genomics ; 8: 460, 2007 Dec 14.
Article in English | MEDLINE | ID: mdl-18081932

ABSTRACT

BACKGROUND: Gene fusion detection - also known as the 'Rosetta Stone' method - involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between its un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes. RESULTS: In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions. CONCLUSION: We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function.
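
The core idea of the 'Rosetta Stone' method summarised above can be sketched as follows: two distinct query proteins that align to non-overlapping regions of the same reference protein are reported as a candidate interacting pair. This is a simplified illustration, not the protocol used in the paper. The input layout, the function name and the `max_partners` filter are all assumptions; in particular, the filter drops any protein with too many predicted partners, which is only a crude stand-in for the graph-component threshold the abstract mentions for handling promiscuous domains.

```python
from collections import defaultdict
from itertools import combinations


def predict_fusion_pairs(hits, max_partners=None):
    """Predict interacting pairs from similarity hits against reference proteins.

    hits: iterable of (query_id, reference_id, ref_start, ref_end) tuples,
    where start/end are the aligned region on the reference protein
    (hypothetical input layout).
    max_partners: optional cap on partners per protein (illustrative filter).
    Returns a set of frozensets, each holding two query ids.
    """
    by_reference = defaultdict(list)
    for query_id, ref_id, start, end in hits:
        by_reference[ref_id].append((query_id, start, end))

    pairs = set()
    for regions in by_reference.values():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Components must cover separate parts of the composite protein.
            if e1 < s2 or e2 < s1:
                pairs.add(frozenset((q1, q2)))

    if max_partners is not None:
        degree = defaultdict(int)
        for pair in pairs:
            for protein in pair:
                degree[protein] += 1
        pairs = {p for p in pairs if all(degree[x] <= max_partners for x in p)}
    return pairs


example_hits = [
    ("geneA", "fusedX", 1, 150),    # geneA matches the first half of fusedX
    ("geneB", "fusedX", 160, 320),  # geneB matches the second half
    ("geneC", "refY", 1, 100),
    ("geneD", "refY", 50, 200),     # overlaps geneC's region: not a fusion signal
]
example_pairs = predict_fusion_pairs(example_hits)
```

In the example only geneA and geneB are paired, because geneC and geneD cover overlapping regions of the same reference protein.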


Subjects
Gene Fusion; Gene Regulatory Networks; Arabidopsis/genetics; Bacterial Proteins/metabolism; Chlamydia/genetics; Genetic Variation; Genome; Phylogeny; Plant Proteins/metabolism; Protein Binding; Reproducibility of Results
7.
Nucleic Acids Res ; 33(2): 616-21, 2005.
Article in English | MEDLINE | ID: mdl-15681613

ABSTRACT

Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distances between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation represents a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms exhibits a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility for a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server: .


Subjects
Computational Biology/methods; Genomics/methods; Phylogeny; Bacteria/classification; Bacteria/genetics; Evolution, Molecular; Genome, Bacterial; Proteobacteria/classification; Proteobacteria/genetics
8.
Nucleic Acids Res ; 33(19): 6083-9, 2005.
Article in English | MEDLINE | ID: mdl-16246909

ABSTRACT

The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of number of pathways present in these organisms, and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.


Subjects
Databases, Genetic; Genome; Animals; Computational Biology; Genome, Archaeal; Genome, Bacterial; Genomics; Humans; Metabolism/genetics
9.
Res Microbiol ; 157(1): 57-68, 2006.
Article in English | MEDLINE | ID: mdl-16431085

ABSTRACT

Using an algorithm for ancestral state inference of gene content, given a large number of extant genome sequences and a phylogenetic tree, we aim to reconstruct the gene content of the last universal common ancestor (LUCA), a hypothetical life form that presumably was the progenitor of the three domains of life. The method allows for gene loss, previously found to be a major factor in shaping gene content, and thus the estimate of LUCA's gene content appears to be substantially higher than that proposed previously, with a typical number of over 1000 gene families, of which more than 90% are also functionally characterized. More precisely, when only prokaryotes are considered, the number varies between 1006 and 1189 gene families while when eukaryotes are also included, this number increases to between 1344 and 1529 families depending on the underlying phylogenetic tree. Therefore, the common belief that the hypothetical genome of LUCA should resemble those of the smallest extant genomes of obligate parasites is not supported by recent advances in computational genomics. Instead, a fairly complex genome similar to those of free-living prokaryotes, with a variety of functional capabilities including metabolic transformation, information processing, membrane/transport proteins and complex regulation, shared between the three domains of life, emerges as the most likely progenitor of life on Earth, with profound repercussions for planetary exploration and exobiology.


Subjects
Earth (Planet); Evolution, Molecular; Exobiology; Genome; Phylogeny; Algorithms; Gene Transfer, Horizontal
10.
Appl Bioinformatics ; 4(1): 71-4, 2005.
Article in English | MEDLINE | ID: mdl-16000016

ABSTRACT

Visualisation of biological networks is becoming a common task for the analysis of high-throughput data. These networks correspond to a wide variety of biological relationships, such as sequence similarity, metabolic pathways, gene regulatory cascades and protein interactions. We present a general approach for the representation and analysis of networks of variable type, size and complexity. The application is based on the original BioLayout program (C-language implementation of the Fruchterman-Reingold layout algorithm), entirely re-written in Java to guarantee portability across platforms. BioLayout(Java) provides broader functionality, various analysis techniques, extensions for better visualisation and a new user interface. Examples of analysis of biological networks using BioLayout(Java) are presented.


Subjects
Computer Graphics; Gene Expression Regulation/physiology; Proteome/chemistry; Proteome/metabolism; Signal Transduction/physiology; Software; User-Computer Interface; Programming Languages; Structure-Activity Relationship
11.
Genome Biol ; 7(10): R89, 2006.
Article in English | MEDLINE | ID: mdl-17029626

ABSTRACT

BACKGROUND: Gene duplications have been hypothesized to be a major factor in enabling the evolution of tissue differentiation. Analyses of the expression profiles of duplicate genes in mammalian tissues have indicated that, with time, the expression patterns of duplicate genes diverge and become more tissue specific. We explored the relationship between duplication events, the time at which they took place, and both the expression breadth of the duplicated genes and the cumulative expression breadth of the gene family to which they belong. RESULTS: We show that only duplicates that arose through post-multicellularity duplication events show a tendency to become more specifically expressed, whereas such a tendency is not observed for duplicates that arose in a unicellular ancestor. Unlike the narrow expression profile of the duplicated genes, the overall expression of gene families tends to maintain a global expression pattern. CONCLUSION: The work presented here supports the view suggested by the subfunctionalization model, namely that expression divergence in different tissues, following gene duplication, promotes the retention of a gene in the genome of multicellular species. The global expression profile of the gene families suggests division of expression between family members, whose expression becomes specialized. Because specialization of expression is coupled with an increased rate of sequence divergence, it can facilitate the evolution of new, tissue-specific functions.


Subjects
Evolution, Molecular; Gene Duplication; Gene Expression Regulation; Proteins/genetics; Animals; Cell Differentiation; Genes, Duplicate; Kinetics; Mice; Sequence Homology, Amino Acid; Species Specificity
12.
Genome Res ; 15(7): 954-9, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15965028

ABSTRACT

It has previously been suggested that the phylogeny of microbial species might be better described as a network containing vertical and horizontal gene transfer (HGT) events. Yet, all phylogenetic reconstructions so far have presented microbial trees rather than networks. Here, we present a first attempt to reconstruct such an evolutionary network, which we term the "net of life". We use available tree reconstruction methods to infer vertical inheritance, and use an ancestral state inference algorithm to map HGT events on the tree. We also describe a weighting scheme used to estimate the number of genes exchanged between pairs of organisms. We demonstrate that vertical inheritance constitutes the bulk of gene transfer on the tree of life. We term the bulk of horizontal gene flow between tree nodes as "vines", and demonstrate that multiple but mostly tiny vines interconnect the tree. Our results strongly suggest that the HGT network is a scale-free graph, a finding with important implications for genome evolution. We propose that genes might propagate extremely rapidly across microbial species through the HGT network, using certain organisms as hubs.


Subjects
Archaea/genetics; Bacteria/genetics; Gene Transfer, Horizontal; Phylogeny; Algorithms; Computational Biology; Evolution, Molecular; Genome, Bacterial; Models, Genetic
13.
Bioinformatics ; 21(16): 3429-30, 2005 Aug 15.
Article in English | MEDLINE | ID: mdl-15961438

ABSTRACT

MOTIVATION: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values. SUMMARY: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases. The program, called MagicMatch, is able to cross-link any of the major sequence databases within a few seconds on a modest desktop computer.
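
The fingerprinting idea in this abstract (hash each sequence with MD5, then join databases on the hash instead of running similarity searches) can be sketched in a few lines. This is an illustration under stated assumptions, not MagicMatch itself: the function names, the dictionary-based database layout, and the normalisation step (upper-casing and stripping whitespace before hashing) are hypothetical and not described in the abstract.

```python
import hashlib


def sequence_fingerprint(sequence):
    """Return the MD5 hex digest of a sequence.

    Whitespace is removed and letters are upper-cased first, so that
    formatting differences between databases do not change the hash
    (an assumed normalisation step).
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def cross_link(db_a, db_b):
    """Map each identifier in db_a to the db_b identifiers with the same sequence.

    db_a, db_b: dicts mapping an identifier to a sequence string
    (hypothetical in-memory stand-ins for two sequence databases).
    """
    index_b = {}
    for ident, seq in db_b.items():
        index_b.setdefault(sequence_fingerprint(seq), []).append(ident)
    return {
        ident: index_b.get(sequence_fingerprint(seq), [])
        for ident, seq in db_a.items()
    }


db_one = {"A1": "MKTAYIAK", "A2": "MSSS"}
db_two = {"B9": "mktay iak", "B7": "MXXX"}
links = cross_link(db_one, db_two)
```

Here `links` maps "A1" to ["B9"] and "A2" to an empty list. Hash-based lookup only finds exact matches: sequences that differ by a single residue get unrelated fingerprints, which is the trade-off against similarity searching.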


Subjects
Algorithms; Database Management Systems; Databases, Protein; Information Storage and Retrieval/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Amino Acid Sequence; Molecular Sequence Data; Proteins/analysis; Proteins/classification
14.
Bioinformatics ; 21(19): 3806-10, 2005 Oct 01.
Article in English | MEDLINE | ID: mdl-16216832

ABSTRACT

MOTIVATION: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility. DESCRIPTION: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction. CONCLUSION: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses as well as manual browsing.


Subjects
Chromosome Mapping/methods; Computer Graphics; Database Management Systems; Databases, Genetic; Genomics/methods; Sequence Analysis/methods; User-Computer Interface; Computational Biology/methods; Information Storage and Retrieval/methods; Software; Systems Integration
15.
Bioinformatics ; 19(11): 1451-2, 2003 Jul 22.
Article in English | MEDLINE | ID: mdl-12874064

ABSTRACT

SUMMARY: We present a database of fully sequenced and published genomes to facilitate the re-distribution of data and ensure reproducibility of results in the field of computational genomics. For its design we have implemented an extremely simple yet powerful schema to allow linking of genome sequence data to other resources. AVAILABILITY: http://maine.ebi.ac.uk:8000/services/cogent/


Subjects
Database Management Systems; Databases, Genetic; Documentation; Genomics/methods; Information Storage and Retrieval/methods; Sequence Analysis, DNA/methods; Computational Biology/methods; Internet
16.
Genome Biol ; 4(5): 402, 2003.
Article in English | MEDLINE | ID: mdl-12734008

ABSTRACT

By the end of 2002, we witnessed the landmark submission of the 100th complete genome sequence in the databases. An overview of these genomes reveals certain interesting trends and provides valuable insights into possible future developments.


Subjects
Genome; Animals; Computational Biology/methods; Computational Biology/trends; Humans; Phylogeny; Proteins/genetics