Pesquisa | Portal de Pesquisa da BVS

1.

Comparing genomes with rearrangements and segmental duplications.

Shao, Mingfu; Moret, Bernard M E.

Bioinformatics ; 31(12): i329-38, 2015 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-26072500

RESUMO

MOTIVATION: Large-scale evolutionary events such as genomic rearrange.ments and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability. RESULTS: We study the comparison of two genomes under a model including general rearrangements (through double-cut-and-join) and segmental duplications. We formulate the comparison as an optimization problem and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the integer linear program (ILP) formulation yields a practical and exact algorithm to solve the problem. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications) and compare its performance with that of the state-of-the-art method MSOAR, using both simulations and real data. On simulated datasets, our method outperforms MSOAR by a significant margin, and on five well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons. AVAILABILITY AND IMPLEMENTATION: http://lcbb.epfl.ch/softwares/coser.

Assuntos

Algoritmos , Evolução Molecular , Genômica/métodos , Duplicações Segmentares Genômicas , Animais , Cromossomos , Humanos , Camundongos , Programação Linear , Ratos

2.

Evaluating synteny for improved comparative studies.

Ghiurcuta, Cristina G; Moret, Bernard M E.

Bioinformatics ; 30(12): i9-18, 2014 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-24932010

RESUMO

MOTIVATION: Comparative genomics aims to understand the structure and function of genomes by translating knowledge gained about some genomes to the object of study. Early approaches used pairwise comparisons, but today researchers are attempting to leverage the larger potential of multi-way comparisons. Comparative genomics relies on the structuring of genomes into syntenic blocks: blocks of sequence that exhibit conserved features across the genomes. Syntenic blocs are required for complex computations to scale to the billions of nucleotides present in many genomes; they enable comparisons across broad ranges of genomes because they filter out much of the individual variability; they highlight candidate regions for in-depth studies; and they facilitate whole-genome comparisons through visualization tools. However, the concept of syntenic block remains loosely defined. Tools for the identification of syntenic blocks yield quite different results, thereby preventing a systematic assessment of the next steps in an analysis. Current tools do not include measurable quality objectives and thus cannot be benchmarked against themselves. Comparisons among tools have also been neglected-what few results are given use superficial measures unrelated to quality or consistency. RESULTS: We present a theoretical model as well as an experimental basis for comparing syntenic blocks and thus also for improving or designing tools for the identification of syntenic blocks. We illustrate the application of the model and the measures by applying them to syntenic blocks produced by three different contemporary tools (DRIMM-Synteny, i-ADHoRe and Cyntenator) on a dataset of eight yeast genomes. Our findings highlight the need for a well founded, systematic approach to the decomposition of genomes into syntenic blocks. Our experiments demonstrate widely divergent results among these tools, throwing into question the robustness of the basic approach in comparative genomics. We have taken the first step towards a formal approach to the construction of syntenic blocks by developing a simple quality criterion based on sound evolutionary principles.

Assuntos

Sintenia , Genoma Fúngico , Genômica/métodos , Alinhamento de Sequência , Software , Leveduras/genética

3.

Probabilistic partitioning methods to find significant patterns in ChIP-Seq data.

Nair, Nishanth Ulhas; Kumar, Sunil; Moret, Bernard M E; Bucher, Philipp.

Bioinformatics ; 30(17): 2406-13, 2014 Sep 01.

Artigo em Inglês | MEDLINE | ID: mdl-24812341

RESUMO

MOTIVATION: We have witnessed an enormous increase in ChIP-Seq data for histone modifications in the past few years. Discovering significant patterns in these data is an important problem for understanding biological mechanisms. RESULTS: We propose probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. Our methods take into account signal magnitude, shape, strand orientation and shifts. We compare our methods with some current methods and demonstrate significant improvements, especially with sparse data. Besides pattern discovery and classification, probabilistic partitioning can serve other purposes in ChIP-Seq data analysis. Specifically, we exemplify its merits in the context of peak finding and partitioning of nucleosome positioning patterns in human promoters. AVAILABILITY AND IMPLEMENTATION: The software and code are available in the supplementary material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Imunoprecipitação da Cromatina/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Histonas/metabolismo , Análise de Sequência de DNA/métodos , Algoritmos , Humanos , Nucleossomos/metabolismo , Probabilidade , Regiões Promotoras Genéticas , Software

4.

Study of cell differentiation by phylogenetic analysis using histone modification data.

Nair, Nishanth Ulhas; Lin, Yu; Manasovska, Ana; Antic, Jelena; Grnarova, Paulina; Sahu, Avinash Das; Bucher, Philipp; Moret, Bernard M E.

BMC Bioinformatics ; 15: 269, 2014 Aug 08.

Artigo em Inglês | MEDLINE | ID: mdl-25104072

RESUMO

BACKGROUND: In cell differentiation, a cell of a less specialized type becomes one of a more specialized type, even though all cells have the same genome. Transcription factors and epigenetic marks like histone modifications can play a significant role in the differentiation process. RESULTS: In this paper, we present a simple analysis of cell types and differentiation paths using phylogenetic inference based on ChIP-Seq histone modification data. We precisely defined the notion of cell-type trees and provided a procedure of building such trees. We propose new data representation techniques and distance measures for ChIP-Seq data and use these together with standard phylogenetic inference methods to build biologically meaningful cell-type trees that indicate how diverse types of cells are related. We demonstrate our approach on various kinds of histone modifications for various cell types, also using the datasets to explore various issues surrounding replicate data, variability between cells of the same type, and robustness. We use the results to get some interesting biological findings like important patterns of histone modification changes during cell differentiation process. CONCLUSIONS: We introduced and studied the novel problem of inferring cell type trees from histone modification data. The promising results we obtain point the way to a new approach to the study of cell differentiation. We also discuss how cell-type trees can be used to study the evolution of cell types.

Assuntos

Diferenciação Celular/genética , Epigenômica/métodos , Histonas/metabolismo , Filogenia , Imunoprecipitação da Cromatina , Histonas/genética , Humanos , Análise de Sequência de DNA , Fatores de Transcrição/genética , Fatores de Transcrição/metabolismo

5.

TIBA: a tool for phylogeny inference from rearrangement data with bootstrap analysis.

Lin, Yu; Rajan, Vaibhav; Moret, Bernard M E.

Bioinformatics ; 28(24): 3324-5, 2012 Dec 15.

Artigo em Inglês | MEDLINE | ID: mdl-23060619

RESUMO

TIBA is a tool to reconstruct phylogenetic trees from rearrangement data that consist of ordered lists of synteny blocks (or genes), where each synteny block is shared with all of its homologues in the input genomes. The evolution of these synteny blocks, through rearrangement operations, is modelled by the uniform Double-Cut-and-Join model. Using a true distance estimate under this model and simple distance-based methods, TIBA reconstructs a phylogeny of the input genomes. Unlike any previous tool for inferring phylogenies from rearrangement data, TIBA uses novel methods of robustness estimation to provide support values for the edges in the inferred tree.

Assuntos

Filogenia , Software , Evolução Molecular , Genoma , Sintenia

6.

Inferring transcript phylogenies.

Christinat, Yann; Moret, Bernard M E.

BMC Bioinformatics ; 13 Suppl 9: S1, 2012 Jun 11.

Artigo em Inglês | MEDLINE | ID: mdl-22831154

RESUMO

Alternative splicing, an unknown mechanism 20 years ago, is now recognized as a major mechanism for proteome and transcriptome diversity, particularly in mammalssome researchers conjecture that up to 90% of human genes are alternatively spliced. Despite much research on exon and intron evolution, little is known about the evolution of transcripts. In this paper, we present a model of transcript evolution and an associated algorithm to reconstruct transcript phylogenies. The evolution of the gene structureexons and intronsis used as basis for the reconstruction of transcript phylogenies. We apply our model and reconstruction algorithm on two well-studied genes, MAG and PAX6, obtaining results consistent with current knowledge and thereby providing evidence that a phylogenetic analysis of transcripts is feasible and likely to be informative.

Assuntos

Algoritmos , Processamento Alternativo , Modelos Genéticos , Filogenia , Animais , Evolução Molecular , Éxons , Proteínas do Olho/genética , Proteínas de Homeodomínio/genética , Humanos , Íntrons , Glicoproteína Associada a Mielina/genética , Fator de Transcrição PAX6 , Fatores de Transcrição Box Pareados/genética , Proteínas Repressoras/genética

7.

FluReF, an automated flu virus reassortment finder based on phylogenetic trees.

Yurovsky, Alisa; Moret, Bernard M E.

BMC Genomics ; 12 Suppl 2: S3, 2011.

Artigo em Inglês | MEDLINE | ID: mdl-21989112

RESUMO

BACKGROUND: Reassortments are events in the evolution of the genome of influenza (flu), whereby segments of the genome are exchanged between different strains. As reassortments have been implicated in major human pandemics of the last century, their identification has become a health priority. While such identification can be done "by hand" on a small dataset, researchers and health authorities are building up enormous databases of genomic sequences for every flu strain, so that it is imperative to develop automated identification methods. However, current methods are limited to pairwise segment comparisons. RESULTS: We present FluReF, a fully automated flu virus reassortment finder. FluReF is inspired by the visual approach to reassortment identification and uses the reconstructed phylogenetic trees of the individual segments and of the full genome. We also present a simple flu evolution simulator, based on the current, source-sink, hypothesis for flu cycles. On synthetic datasets produced by our simulator, FluReF, tuned for a 0% false positive rate, yielded false negative rates of less than 10%. FluReF corroborated two new reassortments identified by visual analysis of 75 Human H3N2 New York flu strains from 2005-2008 and gave partial verification of reassortments found using another bioinformatics method. METHODS: FluReF finds reassortments by a bottom-up search of the full-genome and segment-based phylogenetic trees for candidate clades--groups of one or more sampled viruses that are separated from the other variants from the same season. Candidate clades in each tree are tested to guarantee confidence values, using the lengths of key edges as well as other tree parameters; clades with reassortments must have validated incongruencies among segment trees. CONCLUSIONS: FluReF demonstrates robustness of prediction for geographically and temporally expanded datasets, and is not limited to finding reassortments with previously collected sequences. The complete source code is available from http://lcbb.epfl.ch/software.html.

Assuntos

Algoritmos , Genoma Viral , Vírus da Influenza A Subtipo H3N2/classificação , Filogenia , Vírus Reordenados/classificação , Software , Evolução Molecular , Vírus da Influenza A Subtipo H3N2/genética , Modelos Estatísticos , Mutação Puntual , Vírus Reordenados/genética , Alinhamento de Sequência

8.

Estimating true evolutionary distances under rearrangements, duplications, and losses.

Lin, Yu; Rajan, Vaibhav; Swenson, Krister M; Moret, Bernard M E.

BMC Bioinformatics ; 11 Suppl 1: S54, 2010 Jan 18.

Artigo em Inglês | MEDLINE | ID: mdl-20122229

RESUMO

BACKGROUND: The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes). RESULTS: To handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired from the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator. CONCLUSION: We give the first robust, statistically based, estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates in true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.

Assuntos

Evolução Molecular , Duplicação Gênica , Rearranjo Gênico/genética , Genômica/métodos , Genoma , Filogenia

9.

Heuristics for the inversion median problem.

Rajan, Vaibhav; Xu, Andrew Wei; Lin, Yu; Swenson, Krister M; Moret, Bernard M E.

BMC Bioinformatics ; 11 Suppl 1: S30, 2010 Jan 18.

Artigo em Inglês | MEDLINE | ID: mdl-20122203

RESUMO

BACKGROUND: The study of genome rearrangements has become a mainstay of phylogenetics and comparative genomics. Fundamental in such a study is the median problem: given three genomes find a fourth that minimizes the sum of the evolutionary distances between itself and the given three. Many exact algorithms and heuristics have been developed for the inversion median problem, of which the best known is MGR. RESULTS: We present a unifying framework for median heuristics, which enables us to clarify existing strategies and to place them in a partial ordering. Analysis of this framework leads to a new insight: the best strategies continue to refer to the input data rather than reducing the problem to smaller instances. Using this insight, we develop a new heuristic for inversion medians that uses input data to the end of its computation and leverages our previous work with DCJ medians. Finally, we present the results of extensive experimentation showing that our new heuristic outperforms all others in accuracy and, especially, in running time: the heuristic typically returns solutions within 1% of optimal and runs in seconds to minutes even on genomes with 25'000 genes--in contrast, MGR can take days on instances of 200 genes and cannot be used beyond 1'000 genes. CONCLUSION: Finding good rearrangement medians, in particular inversion medians, had long been regarded as the computational bottleneck in whole-genome studies. Our new heuristic for inversion medians, ASM, which dominates all others in our framework, puts that issue to rest by providing near-optimal solutions within seconds to minutes on even the largest genomes.

Assuntos

Algoritmos , Genoma , Genômica/métodos , Evolução Molecular , Rearranjo Gênico , Filogenia

10.

Inversion-based genomic signatures.

Swenson, Krister M; Moret, Bernard M E.

BMC Bioinformatics ; 10 Suppl 1: S7, 2009 Jan 30.

Artigo em Inglês | MEDLINE | ID: mdl-19208174

RESUMO

BACKGROUND: Reconstructing complete ancestral genomes (at least in terms of their gene inventory and arrangement) is attracting much interest due to the rapidly increasing availability of whole genome sequences. While modest successes have been reported for mammalian and even vertebrate genomes, more divergent groups continue to pose a stiff challenge, mostly because current models of genomic evolution support too many choices. RESULTS: We describe a novel type of genomic signature based on rearrangements that characterizes evolutionary changes that must be common to all minimal rearrangement scenarios; by focusing on global patterns of rearrangements, such signatures bypass individual variations and sharply restrict the search space. We present the results of extensive simulation studies demonstrating that these signatures can be used to reconstruct accurate ancestral genomes and phylogenies even for widely divergent collections. CONCLUSION: Focusing on genome triples rather than genomes pairs unleashes the full power of evolutionary analysis. Our genomic signature captures shared evolutionary events and thus can form the basis of a robust analysis and reconstruction of evolutionary history.

Assuntos

Inversão Cromossômica/genética , Evolução Molecular , Genoma , Animais , Rearranjo Gênico , Humanos

11.

Maximum independent sets of commuting and noninterfering inversions.

Swenson, Krister M; To, Yokuki; Tang, Jijun; Moret, Bernard M E.

BMC Bioinformatics ; 10 Suppl 1: S6, 2009 Jan 30.

Artigo em Inglês | MEDLINE | ID: mdl-19208163

RESUMO

BACKGROUND: Given three signed permutations, an inversion median is a fourth permutation that minimizes the sum of the pairwise inversion distances between it and the three others. This problem is NP-hard as well as hard to approximate. Yet median-based approaches to phylogenetic reconstruction have been shown to be among the most accurate, especially in the presence of long branches. Most existing approaches have used heuristics that attempt to find a longest sequence of inversions from one of the three permutations that, at each step in the sequence, moves closer to the other two permutations; yet very little is known about the quality of solutions returned by such approaches. RESULTS: Recently, Arndt and Tang took a step towards finding longer such sequences by using sets of commuting inversions. In this paper, we formalize the problem of finding such sequences of inversions with what we call signatures and provide algorithms to find maximum cardinality sets of commuting and noninterfering inversions. CONCLUSION: Our results offer a framework in which to study the inversion median problem, faster algorithms to obtain good medians, and an approach to study characteristic events along an evolutionary path.

Assuntos

Algoritmos , Filogenia , Inversão Cromossômica , Biologia Computacional/métodos , Evolução Molecular

12.

Estimating true evolutionary distances under the DCJ model.

Lin, Yu; Moret, Bernard M E.

Bioinformatics ; 24(13): i114-22, 2008 Jul 01.

Artigo em Inglês | MEDLINE | ID: mdl-18586703

RESUMO

MOTIVATION: Modern techniques can yield the ordering and strandedness of genes on each chromosome of a genome; such data already exists for hundreds of organisms. The evolutionary mechanisms through which the set of the genes of an organism is altered and reordered are of great interest to systematists, evolutionary biologists, comparative genomicists and biomedical researchers. Perhaps the most basic concept in this area is that of evolutionary distance between two genomes: under a given model of genomic evolution, how many events most likely took place to account for the difference between the two genomes? RESULTS: We present a method to estimate the true evolutionary distance between two genomes under the 'double-cut-and-join' (DCJ) model of genome rearrangement, a model under which a single multichromosomal operation accounts for all genomic rearrangement events: inversion, transposition, translocation, block interchange and chromosomal fusion and fission. Our method relies on a simple structural characterization of a genome pair and is both analytically and computationally tractable. We provide analytical results to describe the asymptotic behavior of genomes under the DCJ model, as well as experimental results on a wide variety of genome structures to exemplify the very high accuracy (and low variance) of our estimator. Our results provide a tool for accurate phylogenetic reconstruction from multichromosomal gene rearrangement data as well as a theoretical basis for refinements of the DCJ model to account for biological constraints. AVAILABILITY: All of our software is available in source form under GPL at http://lcbb.epfl.ch.

Assuntos

Mapeamento Cromossômico/métodos , Evolução Molecular , Genoma/genética , Desequilíbrio de Ligação/genética , Modelos Genéticos , Alinhamento de Sequência/métodos , Análise de Sequência de DNA/métodos , Sequência de Bases , Simulação por Computador , Dados de Sequência Molecular

13.

Gene rearrangement analysis and ancestral order inference from chloroplast genomes with inverted repeat.

Yue, Feng; Cui, Liying; dePamphilis, Claude W; Moret, Bernard M E; Tang, Jijun.

BMC Genomics ; 9 Suppl 1: S25, 2008.

Artigo em Inglês | MEDLINE | ID: mdl-18366615

RESUMO

BACKGROUND: Genome evolution is shaped not only by nucleotide substitutions, but also by structural changes including gene and genome duplications, insertions, deletions and gene order rearrangements. The most popular methods for reconstructing phylogeny from genome rearrangements include GRAPPA and MGR. However these methods are limited to cases where equal gene content or few deletions can be assumed. Since conserved duplicated regions are present in many chloroplast genomes, the inference of inverted repeats is needed in chloroplast phylogeny analysis and ancestral genome reconstruction. RESULTS: We extend GRAPPA and develop a new method GRAPPA-IR to handle chloroplast genomes. A test of GRAPPA-IR using divergent chloroplast genomes from land plants and green algae recovers the phylogeny congruent with prior studies, while analysis that do not consider IR structure fail to obtain the accepted topology. Our extensive simulation study also confirms that GRAPPA has better accuracy then the existing methods. CONCLUSIONS: Tests on a biological and simulated dataset show GRAPPA-IR can accurately recover the genome phylogeny as well as ancestral gene orders. Close analysis of the ancestral genome structure suggests that genome rearrangement in chloroplasts is probably limited by inverted repeats with a conserved core region. In addition, the boundaries of inverted repeats are hot spots for gene duplications or deletions. The new GRAPPA-IR is available from http://phylo.cse.sc.edu.

Assuntos

Algoritmos , DNA de Cloroplastos/genética , Evolução Molecular , Rearranjo Gênico/genética , Gossypium/genética , Filogenia , Classificação/métodos , Simulação por Computador , Sequências Repetitivas de Ácido Nucleico/genética

14.

Sequence-Based Synteny Analysis of Multiple Large Genomes.

Doerr, Daniel; Moret, Bernard M E.

Methods Mol Biol ; 1704: 317-329, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-29277871

RESUMO

Current methods for synteny analysis provide only limited support to study large genomes at the sequence level. In this chapter, we describe a pipeline based on existing tools that, applied in a suitable fashion, enables synteny analysis of large genomic datasets. We give a hands-on description of each step of the pipeline using four avian genomes for data. We also provide integration scripts that simplify the conversion and setup of data between the different tools in the pipeline.

Assuntos

Aves/genética , Genoma , Software , Sintenia , Algoritmos , Animais , Aves/classificação , Biologia Computacional , Marcadores Genéticos , Genômica/métodos , Análise de Sequência de DNA

15.

Efficiently computing the Robinson-Foulds metric.

Pattengale, Nicholas D; Gottlieb, Eric J; Moret, Bernard M E.

J Comput Biol ; 14(6): 724-35, 2007.

Artigo em Inglês | MEDLINE | ID: mdl-17691890

RESUMO

The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, in sublinear time and with high probability, a (1 + epsilon) approximation of the true RF metric. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We complement our algorithm by presenting an efficient embedding procedure, thereby resolving an open issue from the preliminary version of this paper. We have also improved the performance of Day's (exact) algorithm in practice by using techniques discovered while implementing our approximation scheme. Indeed, we give a unified framework for edge-based tree algorithms in which implementation tradeoffs are clear. Finally, we present detailed experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach. Our new implementation, FastRF, is available as an open-source tool for phylogenetic analysis.

Assuntos

Algoritmos , Biologia Computacional , Evolução Molecular , Filogenia , Proteínas/genética , Bases de Dados de Proteínas , Proteínas/química , Proteínas/classificação , Software

16.

On Computing Breakpoint Distances for Genomes with Duplicate Genes.

Shao, Mingfu; Moret, Bernard M E.

J Comput Biol ; 24(6): 571-580, 2017 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-27788022

RESUMO

A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.

Assuntos

Algoritmos , Genes Duplicados , Genoma , Genômica/métodos , Mamíferos/genética , Modelos Genéticos , Animais , Evolução Biológica

17.

New Genome Similarity Measures based on Conserved Gene Adjacencies.

Doerr, Daniel; Kowada, Luis Antonio B; Araujo, Eloi; Deshpande, Shachi; Dantas, Simone; Moret, Bernard M E; Stoye, Jens.

J Comput Biol ; 24(6): 616-634, 2017 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-28590847

RESUMO

Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful-but also most complex-models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases, the computational costs to the general family-free case are the same, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that have different expression powers. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.

Assuntos

Algoritmos , Biologia Computacional/métodos , Ordem dos Genes , Genes de Plantas , Genoma de Planta , Genômica/métodos , Modelos Genéticos , Família Multigênica , Filogenia

18.

NEMo: An Evolutionary Model With Modularity for PPI Networks.

Ye, Min; Zhang, Xiuwei; Racz, Gabriela C; Jiang, Qijia; Moret, Bernard M E.

IEEE Trans Nanobioscience ; 16(2): 131-139, 2017 03.

Artigo em Inglês | MEDLINE | ID: mdl-28113347

RESUMO

Modeling the evolution of biological networks is a major challenge. Biological networks are usually represented as graphs; evolutionary events not only include addition and removal of vertices and edges but also duplication of vertices and their associated edges. Since duplication is viewed as a primary driver of genomic evolution, recent work has focused on duplication-based models. Missing from these models is any embodiment of modularity, a widely accepted attribute of biological networks. Some models spontaneously generate modular structures, but none is known to maintain and evolve them. We describe network evolution with modularity (NEMo), a new model that embodies modularity. NEMo allows modules to appear and disappear and to fission and to merge, all driven by the underlying edge-level events using a duplication-based process. We also introduce measures to compare biological networks in terms of their modular structure; we present comparisons between NEMo and existing duplication-based models and run our measuring tools on both generated and published networks.

Assuntos

Biologia Computacional/métodos , Modelos Genéticos , Mapeamento de Interação de Proteínas/métodos , Proteínas/metabolismo , Evolução Molecular , Humanos , Proteínas/genética

19.

A Fast and Exact Algorithm for the Exemplar Breakpoint Distance.

Shao, Mingfu; Moret, Bernard M E.

J Comput Biol ; 23(5): 337-46, 2016 05.

Artigo em Inglês | MEDLINE | ID: mdl-26953781

RESUMO

A fundamental problem in comparative genomics is to compute the distance between two genomes. For two genomes without duplicate genes, we can easily compute a variety of distance measures in linear time, but the problem is NP-hard under most models when genomes contain duplicate genes. Sankoff proposed the use of exemplars to tackle the problem of duplicate genes and gene families: each gene family is represented by a single gene (the exemplar for that family), chosen so as to optimize some metric. Unfortunately, choosing exemplars is itself an NP-hard problem. In this article, we propose a very fast and exact algorithm to compute the exemplar breakpoint distance, based on new insights in the underlying structure of genome rearrangements and exemplars. We evaluate the performance of our algorithm on simulation data and compare its performance to the best effort to date (a divide-and-conquer approach), showing that our algorithm runs much faster and scales much better. We also devise a new algorithm for the intermediate breakpoint distance problem, which can then be applied to assign orthologs. We compare our algorithm with the state-of-the-art method MSOAR by assigning orthologs among five well annotated mammalian genomes, showing that our algorithm runs much faster and is slightly more accurate than MSOAR.

Assuntos

Genômica/métodos , Mamíferos/genética , Algoritmos , Animais , Genoma , Modelos Genéticos , Família Multigênica , Software

20.

Advances in phylogeny reconstruction from gene order and content data.

Moret, Bernard M E; Warnow, Tandy.

Methods Enzymol ; 395: 673-700, 2005.

Artigo em Inglês | MEDLINE | ID: mdl-15865990

RESUMO

Genomes can be viewed in terms of their gene content and the order in which the genes appear along each chromosome. Evolutionary events that affect the gene order or content are "rare genomic events" (rarer than events that affect the composition of the nucleotide sequences) and have been advocated by systematists for inferring deep evolutionary histories. This chapter surveys recent developments in the reconstruction of phylogenies from gene order and content, focusing on their performance under various stochastic models of evolution. Because such methods are quite restricted in the type of data they can analyze, we also present research aimed at handling the full range of whole-genome data.

Assuntos

Ordem dos Genes , Genômica/métodos , Filogenia , Algoritmos , Cloroplastos/genética , Cromossomos/genética , Criptófitas/genética , Evolução Molecular , Funções Verossimilhança , Modelos Genéticos , Processos Estocásticos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA