Search | VHL Regional Portal

OMA standalone: orthology inference among public and custom genomes and transcriptomes.

Altenhoff, Adrian M; Levy, Jeremy; Zarowiecki, Magdalena; Tomiczek, Bartlomiej; Warwick Vesztrocy, Alex; Dalquen, Daniel A; Müller, Steven; Telford, Maximilian J; Glover, Natasha M; Dylus, David; Dessimoz, Christophe.

Genome Res ; 29(7): 1152-1163, 2019 07.

Article in English | MEDLINE | ID: mdl-31235654

ABSTRACT

Genomes and transcriptomes are now typically sequenced by individual laboratories but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families having undergone duplications/losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.

Subjects

Databases, Genetic; Genome; Invertebrates/classification; Software; Transcriptome; Animals; Invertebrates/genetics; Phylogeny

Maximum Likelihood Implementation of an Isolation-with-Migration Model for Three Species.

Dalquen, Daniel A; Zhu, Tianqi; Yang, Ziheng.

Syst Biol ; 66(3): 379-398, 2017 May 01.

Article in English | MEDLINE | ID: mdl-27486180

ABSTRACT

We develop a maximum likelihood (ML) method for estimating migration rates between species using genomic sequence data. A species tree is used to accommodate the phylogenetic relationships among three species, allowing for migration between the two sister species, while the third species is used as an out-group. A Markov chain characterization of the genealogical process of coalescence and migration is used to integrate out the migration histories at each locus analytically, whereas Gaussian quadrature is used to integrate over the coalescent times on each genealogical tree numerically. This is an extension of our early implementation of the symmetrical isolation-with-migration model for three species to accommodate arbitrary loci with two or three sequences per locus and to allow asymmetrical migration rates. Our implementation can accommodate tens of thousands of loci, making it feasible to analyze genome-scale data sets to test for gene flow. We calculate the posterior probabilities of gene trees at individual loci to identify genomic regions that are likely to have been transferred between species due to gene flow. We conduct a simulation study to examine the statistical properties of the likelihood ratio test for gene flow between the two in-group species and of the ML estimates of model parameters such as the migration rate. Inclusion of data from a third out-group species is found to increase dramatically the power of the test and the precision of parameter estimation. We compiled and analyzed several genomic data sets from the Drosophila fruit flies. Our analyses suggest no migration from D. melanogaster to D. simulans, and a significant amount of gene flow from D. simulans to D. melanogaster, at the rate of ~0.02 migrant individuals per generation. We discuss the utility of the multispecies coalescent model for species tree estimation, accounting for incomplete lineage sorting and migration.

Subjects

Classification/methods; Models, Biological; Phylogeny; Animal Migration; Animals; Computer Simulation; Drosophila/classification; Drosophila/genetics; Gene Flow; Likelihood Functions

Standardized benchmarking in the quest for orthologs.

Altenhoff, Adrian M; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A; DeLuca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P; Schreiber, Fabian; da Silva, Alan Sousa; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Jensen, Lars Juhl; Martin, Maria J; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E; Thomas, Paul D; Sonnhammer, Erik; Dessimoz, Christophe.

Nat Methods ; 13(5): 425-30, 2016 05.

Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.

Subjects

Computational Biology/standards; Genomics/standards; Phylogeny; Proteomics/standards; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Eukaryota/classification; Eukaryota/genetics; Gene Ontology; Genomics/methods; Models, Genetic; Proteomics/methods; Sequence Analysis, Protein; Sequence Homology; Species Specificity
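
The precision-recall trade-off mentioned in the abstract above can be made concrete with a small sketch. This is a generic illustration, not the benchmark service's actual procedure: the function name `precision_recall`, the toy gene identifiers, and the use of a fully known reference set of ortholog pairs are all assumptions made for the example.

```python
def precision_recall(predicted, reference):
    """Compare predicted ortholog pairs against a reference set.

    Pairs are treated as unordered, so ("a1", "b1") equals ("b1", "a1").
    Returns (precision, recall); each is 0.0 when its denominator is empty.
    """
    pred = {frozenset(p) for p in predicted}
    ref = {frozenset(p) for p in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical toy data: three predicted pairs, four reference pairs.
predicted = [("a1", "b1"), ("a3", "b2"), ("a4", "b9")]
reference = [("b1", "a1"), ("a2", "b2"), ("a3", "b2"), ("a5", "b5")]

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A method tuned to report only high-confidence pairs tends to raise precision at the cost of recall, which is why a single score rarely suits every application.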

Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals.

Dalquen, Daniel A; Dessimoz, Christophe.

Genome Biol Evol ; 5(10): 1800-6, 2013.

Article in English | MEDLINE | ID: mdl-24013106

ABSTRACT

Bidirectional best hits (BBH), which entails identifying the pairs of genes in two different genomes that are more similar to each other than either is to any other gene in the other genome, is a simple and widely used method to infer orthology. A recent study has analyzed the link between BBH and orthology in bacteria and archaea and concluded that, given the very high consistency in BBH they observed among triplets of neighboring genes, a high proportion of BBH are likely to be bona fide orthologs. However, limited by their analysis setup, the previous study could not easily test the reverse question: which proportion of orthologs are BBH? In this follow-up study, we consider this question in theory and answer it based on conceptual arguments, simulated data, and real biological data from all three domains of life. Our analyses corroborate the findings of the previous study, but also show that because of the high rate of gene duplication in plants and animals, as much as 60% of orthologous relations are missed by the BBH criterion.

Subjects

Evolution, Molecular; Gene Duplication; Plants/genetics; Algorithms; Animals; Conserved Sequence/genetics; Phylogeny
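
The BBH criterion defined in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity scores, gene names, and the helper names `best_hits` and `bidirectional_best_hits` are hypothetical, and ties are broken by keeping the first hit encountered.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring gene in the other genome.

    `scores` maps (query, target) pairs to a similarity score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: a2 and a3 are duplicates that both match b2.
a_vs_b = {("a1", "b1"): 950, ("a1", "b2"): 400,
          ("a2", "b2"): 700, ("a3", "b2"): 820}
b_vs_a = {("b1", "a1"): 950, ("b2", "a1"): 400,
          ("b2", "a2"): 700, ("b2", "a3"): 820}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy case the pair (a2, b2) is dropped because b2's best hit is a3, which shows how a duplication on one side can hide an orthologous relation from the BBH criterion.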

The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study.

Dalquen, Daniel A; Altenhoff, Adrian M; Gonnet, Gaston H; Dessimoz, Christophe.

PLoS One ; 8(2): e56925, 2013.

Article in English | MEDLINE | ID: mdl-23451112

ABSTRACT

The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.

Subjects

Gene Duplication/genetics; Gene Transfer, Horizontal/genetics; Mutagenesis, Insertional/genetics; Genomics/methods

Gene copy number variation and its significance in cyanobacterial phylogeny.

Schirrmeister, Bettina E; Dalquen, Daniel A; Anisimova, Maria; Bagheri, Homayoun C.

BMC Microbiol ; 12: 177, 2012 Aug 15.

Article in English | MEDLINE | ID: mdl-22894826

ABSTRACT

BACKGROUND: In eukaryotes, variation in gene copy numbers is often associated with deleterious effects, but may also have positive effects. For prokaryotes, studies on gene copy number variation are rare. Previous studies have suggested that high numbers of rRNA gene copies can be advantageous in environments with changing resource availability, but further associations of gene copies and phenotypic traits are not documented. We used one of the morphologically most diverse prokaryotic phyla to test whether numbers of gene copies are associated with levels of cell differentiation. RESULTS: We implemented a search algorithm that identified 44 genes with highly conserved copies across 22 fully sequenced cyanobacterial taxa. For two very basal cyanobacterial species, Gloeobacter violaceus and a thermophilic Synechococcus species, distinct phylogenetic positions previously found were supported by identical protein coding gene copy numbers. Furthermore, we found that increased ribosomal gene copy numbers showed a strong correlation to cyanobacteria capable of terminal cell differentiation. Additionally, we detected extremely low variation of 16S rRNA sequence copies within the cyanobacteria. We compared our results for 16S rRNA to three other eubacterial phyla (Chloroflexi, Spirochaetes and Bacteroidetes). Based on Bayesian phylogenetic inference and the comparisons of genetic distances, we could confirm that cyanobacterial 16S rRNA paralogs and orthologs show significantly stronger conservation than found in other eubacterial phyla. CONCLUSIONS: A higher number of ribosomal operons could potentially provide an advantage to terminally differentiated cyanobacteria. Furthermore, we suggest that 16S rRNA gene copies in cyanobacteria are homogenized by both concerted evolution and purifying selection. In addition, the small ribosomal subunit in cyanobacteria appears to evolve at extraordinarily slow evolutionary rates, an observation that has been made previously for morphological characteristics of cyanobacteria.

Subjects

Cyanobacteria/classification; Cyanobacteria/cytology; Gene Dosage; Genes, rRNA; Phylogeny; Cyanobacteria/genetics; Operon

ALF--a simulation framework for genome evolution.

Dalquen, Daniel A; Anisimova, Maria; Gonnet, Gaston H; Dessimoz, Christophe.

Mol Biol Evol ; 29(4): 1115-23, 2012 Apr.

Article in English | MEDLINE | ID: mdl-22160766

ABSTRACT

In computational evolutionary biology, verification and benchmarking is a challenging task because the evolutionary history of studied biological entities is usually not known. Computer programs for simulating sequence evolution in silico have been shown to be viable test beds for the verification of newly developed methods and for comparing different algorithms. However, current simulation packages tend to focus either on gene-level aspects of genome evolution such as character substitutions and insertions and deletions (indels) or on genome-level aspects such as genome rearrangement and speciation events. Here, we introduce Artificial Life Framework (ALF), which aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. The other distinctive feature of ALF is its user-friendly yet powerful web interface. We illustrate the utility of ALF with two possible applications: 1) we reanalyze data from a study of selection after globin gene duplication and test the statistical significance of the original conclusions and 2) we demonstrate that LGT can dramatically decrease the accuracy of two well-established orthology inference methods. ALF is available as a stand-alone application or via a web interface at http://www.cbrg.ethz.ch/alf.

Subjects

Computational Biology/methods; Evolution, Molecular; Genome; Models, Genetic; Software; Algorithms; Base Composition; Computer Simulation; Gene Transfer, Horizontal; Genetic Speciation; Mutagenesis
