Search | BVS Integralidade em Saúde

1.

RecPhyloXML: a format for reconciled gene trees.

Duchemin, Wandrille; Gence, Guillaume; Arigon Chifolleau, Anne-Muriel; Arvestad, Lars; Bansal, Mukul S; Berry, Vincent; Boussau, Bastien; Chevenet, François; Comte, Nicolas; Davín, Adrián A; Dessimoz, Christophe; Dylus, David; Hasic, Damir; Mallo, Diego; Planel, Rémi; Posada, David; Scornavacca, Celine; Szöllosi, Gergely; Zhang, Louxin; Tannier, Éric; Daubin, Vincent.

Bioinformatics ; 34(21): 3646-3652, 2018 11 01.

Article in English | MEDLINE | ID: mdl-29762653

ABSTRACT

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, or loss), along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but often in different reconciliation formats, which differ in the types of events considered or in whether the species tree is dated. This complicates comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.

Subjects

Molecular Evolution, Gene Duplication, Algorithms, Phylogeny, Software

2.

Diversity and evolution of chitin synthases in oomycetes (Straminipila: Oomycota).

Klinter, Stefan; Bulone, Vincent; Arvestad, Lars.

Mol Phylogenet Evol ; 139: 106558, 2019 10.

Article in English | MEDLINE | ID: mdl-31288106

ABSTRACT

The oomycetes are filamentous eukaryotic microorganisms, distinct from true fungi, many of which act as crop or fish pathogens that cause devastating losses in agriculture and aquaculture. Chitin is present in all true fungi, but it occurs in only small amounts in some Saprolegniomycetes and it is absent in Peronosporomycetes. However, the growth of several oomycetes is severely impacted by competitive chitin synthase (CHS) inhibitors. Here, we shed light on the diversity, evolution and function of oomycete CHS proteins. We show by phylogenetic analysis of 93 putative CHSs from 48 highly diverse oomycetes, including the early diverging Eurychasma dicksonii, that all available oomycete genomes contain at least one putative CHS gene. All gene products contain conserved CHS motifs essential for enzymatic activity and form two Peronosporomycete-specific and six Saprolegniale-specific clades. Proteins of all clades, except one, contain an N-terminal microtubule interacting and trafficking (MIT) domain as predicted by protein domain databases or manual analysis, which is supported by homology modelling and comparison of conserved structural features from sequence logos. We identified at least three groups of CHSs conserved among all oomycete lineages and used phylogenetic reconciliation analysis to infer the dynamic evolution of CHSs in oomycetes. The evolutionary aspects of CHS diversity in modern-day oomycetes are discussed. In addition, we observed hyphal tip rupture in Phytophthora infestans upon treatment with the CHS inhibitor nikkomycin Z. Combining data on phylogeny, gene expression, and response to CHS inhibitors, we propose the association of different CHS clades with certain developmental stages.

Subjects

Chitin Synthase/genetics, Molecular Evolution, Genetic Variation, Oomycetes/enzymology, Oomycetes/genetics, Amino Acid Sequence, Chitin Synthase/chemistry, Conserved Sequence/genetics, Likelihood Functions, Phylogeny, Protein Domains

3.

The Norway spruce genome sequence and conifer genome evolution.

Nystedt, Björn; Street, Nathaniel R; Wetterbom, Anna; Zuccolo, Andrea; Lin, Yao-Cheng; Scofield, Douglas G; Vezzi, Francesco; Delhomme, Nicolas; Giacomello, Stefania; Alexeyenko, Andrey; Vicedomini, Riccardo; Sahlin, Kristoffer; Sherwood, Ellen; Elfstrand, Malin; Gramzow, Lydia; Holmberg, Kristina; Hällman, Jimmie; Keech, Olivier; Klasson, Lisa; Koriabine, Maxim; Kucukoglu, Melis; Käller, Max; Luthman, Johannes; Lysholm, Fredrik; Niittylä, Totte; Olson, Ake; Rilakovic, Nemanja; Ritland, Carol; Rosselló, Josep A; Sena, Juliana; Svensson, Thomas; Talavera-López, Carlos; Theißen, Günter; Tuominen, Hannele; Vanneste, Kevin; Wu, Zhi-Qiang; Zhang, Bo; Zerbe, Philipp; Arvestad, Lars; Bhalerao, Rishikesh; Bohlmann, Joerg; Bousquet, Jean; Garcia Gil, Rosario; Hvidsten, Torgeir R; de Jong, Pieter; MacKay, John; Morgante, Michele; Ritland, Kermit; Sundberg, Björn; Thompson, Stacey Lee.

Nature ; 497(7451): 579-84, 2013 May 30.

Article in English | MEDLINE | ID: mdl-23698360

ABSTRACT

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.

Subjects

Molecular Evolution, Plant Genome/genetics, Picea/genetics, Conserved Sequence/genetics, DNA Transposable Elements/genetics, Gene Silencing, Plant Genes/genetics, Genomics, Internet, Introns/genetics, Phenotype, Untranslated RNA/genetics, DNA Sequence Analysis, Terminal Repeat Sequences/genetics, Genetic Transcription/genetics

4.

VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces.

Ali, Raja H; Bark, Mikael; Miró, Jorge; Muhammad, Sayyed A; Sjöstrand, Joel; Zubair, Syed M; Abbas, Raja M; Arvestad, Lars.

BMC Bioinformatics ; 18(1): 97, 2017 Feb 10.

Article in English | MEDLINE | ID: mdl-28187712

ABSTRACT

BACKGROUND: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence assessment, mixing assessment and interactive exploration of both continuous and tree parameters. RESULTS: We have written a software tool called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines. CONCLUSIONS: VMCMC is free software available under the New BSD License. Executable jar files, tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

Subjects

Software, Markov Chains, Monte Carlo Method, Phylogeny

5.

Assembly scaffolding with PE-contaminated mate-pair libraries.

Sahlin, Kristoffer; Chikhi, Rayan; Arvestad, Lars.

Bioinformatics ; 32(13): 1925-32, 2016 07 01.

Article in English | MEDLINE | ID: mdl-27153683

ABSTRACT

MOTIVATION: Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than expected. This contamination has been discussed before, in relation to integrated scaffolders, but solutions rely on the orientation being observable, e.g. by finding the junction adapter sequence in the reads. This is not always possible, making orientation and insert size of a read pair stochastic. To our knowledge, there is neither previous work on modeling PE-contamination, nor a study on the effect PE-contamination has on scaffolding quality. RESULTS: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding and inflated assembly sizes. AVAILABILITY AND IMPLEMENTATION: The model is implemented in BESST. Source code and usage instructions are found at https://github.com/ksahlin/BESST. BESST can also be downloaded using PyPI. CONTACT: ksahlin@kth.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Library, Genomics/methods, DNA Sequence Analysis/methods, Algorithms, Bacteria/genetics, Humans, Linear Programming, Software

6.

Probabilistic inference of lateral gene transfer events.

Khan, Mehmood Alam; Mahmudi, Owais; Ullah, Ikram; Arvestad, Lars; Lagergren, Jens.

BMC Bioinformatics ; 17(Suppl 14): 431, 2016 Nov 11.

Article in English | MEDLINE | ID: mdl-28185583

ABSTRACT

BACKGROUND: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge. RESULTS: In this paper, we propose probabilistic methods that sample reconciliations of the gene tree with a dated species tree and compute maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify "highways" of LGT. CONCLUSIONS: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.

Subjects

Horizontal Gene Transfer/genetics, Genetic Models, Biological Evolution, Entomoplasmataceae/classification, Entomoplasmataceae/genetics, Phylogeny

7.

GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm.

Ali, Raja H; Muhammad, Sayyed A; Arvestad, Lars.

BMC Evol Biol ; 16(1): 120, 2016 06 04.

Article in English | MEDLINE | ID: mdl-27260514

ABSTRACT

BACKGROUND: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity. RESULTS: In this study, we validate GenFamClust by comparing it to well known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms on homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs. CONCLUSIONS: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.

Subjects

Algorithms, Nucleic Acid Sequence Homology, Synteny, Animals, Cluster Analysis, Genetic Databases, Fungi/genetics, Humans, Mice, Phylogeny, Species Specificity, Statistics as Topic

8.

Gene-pseudogene evolution: a probabilistic approach.

Mahmudi, Owais; Sennblad, Bengt; Arvestad, Lars; Nowick, Katja; Lagergren, Jens.

BMC Genomics ; 16 Suppl 10: S12, 2015.

Article in English | MEDLINE | ID: mdl-26449131

ABSTRACT

Over the last decade, methods have been developed for the reconstruction of gene trees that take into account the species tree. Many of these methods have been based on the probabilistic duplication-loss model, which describes how a gene tree evolves over a species tree with respect to duplications and losses, as well as extensions of this model, e.g., the DLRS (Duplication, Loss, Rate and Sequence evolution) model, which also includes sequence evolution under a relaxed molecular clock. A disjoint, almost as recent, and very important line of research has focused on non-protein-coding yet functional DNA. For instance, DNA sequences that are pseudogenes in the sense that they are not translated may still be transcribed, and the RNA thereby produced may be functional.

Subjects

DNA/genetics, Molecular Evolution, Phylogeny, Pseudogenes/genetics, Gene Duplication

9.

A Bayesian method for analyzing lateral gene transfer.

Sjöstrand, Joel; Tofigh, Ali; Daubin, Vincent; Arvestad, Lars; Sennblad, Bengt; Lagergren, Jens.

Syst Biol ; 63(3): 409-20, 2014 May.

Article in English | MEDLINE | ID: mdl-24562812

ABSTRACT

Lateral gene transfer (LGT), which transfers DNA between two non-vertically related individuals belonging to the same or different species, is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes. The challenge LGT poses to the standard tree-based conception of evolution is also being debated. Studies of LGT have, however, been severely limited by a lack of computational tools. The best currently available LGT algorithms are parsimony-based phylogenetic methods, which require a pre-computed gene tree and cannot choose between sometimes wildly differing most parsimonious solutions. Moreover, in many studies, simple heuristics are applied that can only handle putative orthologs and completely disregard gene duplications (GDs). Consequently, proposed LGT among specific gene families, and the rate of LGT in general, remain debated. We present a Bayesian Markov-chain Monte Carlo-based method that integrates GD, gene loss, LGT, and sequence evolution, and apply the method in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria. Our analyses show that although the LGT rate between distant species is high, the net combined rate of duplication and close-species LGT is on average higher. We also show that the common practice of disregarding reconcilability in gene tree inference overestimates the number of LGT and duplication events.

Subjects

Classification/methods, Horizontal Gene Transfer, Bayes Theorem, Cyanobacteria/classification, Cyanobacteria/genetics, Molecular Evolution, Theoretical Models, Phylogeny, Tenericutes/classification, Tenericutes/genetics

10.

BESST--efficient scaffolding of large fragmented assemblies.

Sahlin, Kristoffer; Vezzi, Francesco; Nystedt, Björn; Lundeberg, Joakim; Arvestad, Lars.

BMC Bioinformatics ; 15: 281, 2014 Aug 15.

Article in English | MEDLINE | ID: mdl-25128196

ABSTRACT

BACKGROUND: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software's general performance. RESULTS: We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST compares favorably with the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when the library insert size distribution is wide. CONCLUSION: We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding.

Subjects

Algorithms, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Gene Library, Humans, Reproducibility of Results

11.

GenPhyloData: realistic simulation of gene family evolution.

Sjöstrand, Joel; Arvestad, Lars; Lagergren, Jens; Sennblad, Bengt.

BMC Bioinformatics ; 14: 209, 2013 Jun 27.

Article in English | MEDLINE | ID: mdl-23803001

ABSTRACT

BACKGROUND: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock. RESULT: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data. CONCLUSION: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.
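The birth-death process mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from PrIME-GenPhyloData: the function name, parameters and the event-by-event (Gillespie-style) simulation are assumptions chosen for illustration, and the sketch tracks only the number of surviving lineages rather than building a tree.

```python
import random


def simulate_lineage_count(birth_rate, death_rate, t_max, seed=None):
    """Simulate a linear birth-death process starting from one lineage.

    Hypothetical helper for illustration only. Each lineage splits at
    rate `birth_rate` and dies at rate `death_rate`; the function returns
    the number of lineages alive at time `t_max`.
    """
    rng = random.Random(seed)
    n, t = 1, 0.0
    rate_per_lineage = birth_rate + death_rate
    while n > 0 and rate_per_lineage > 0:
        # With n lineages, the waiting time to the next event is
        # exponentially distributed with rate n * (birth + death).
        t += rng.expovariate(n * rate_per_lineage)
        if t > t_max:
            break
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth_rate / rate_per_lineage:
            n += 1
        else:
            n -= 1
    return n
```

Calling `simulate_lineage_count(1.0, 0.5, 3.0, seed=42)` gives one realization; repeating the call with different seeds gives a sample of family sizes under the assumed rates.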

Subjects

Gene Duplication/genetics, Multigene Family/genetics, Phylogeny, Biological Clocks/genetics, Computer Simulation, Molecular Evolution, Gene Transfer Techniques, Humans, Biological Models, Species Specificity

12.

Quantitative synteny scoring improves homology inference and partitioning of gene families.

Ali, Raja Hashim; Muhammad, Sayyed; Khan, Mehmood; Arvestad, Lars.

BMC Bioinformatics ; 14 Suppl 15: S12, 2013.

Article in English | MEDLINE | ID: mdl-24564516

ABSTRACT

BACKGROUND: Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential. RESULTS: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data. CONCLUSIONS: The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

Subjects

Base Sequence, Nucleic Acid Sequence Homology, Synteny, Algorithms, Animals, Chromosome Mapping, Cluster Analysis, Humans, Mice, Proteins/genetics

13.

GAM-NGS: genomic assemblies merger for next generation sequencing.

Vicedomini, Riccardo; Vezzi, Francesco; Scalabrin, Simone; Arvestad, Lars; Policriti, Alberto.

BMC Bioinformatics ; 14 Suppl 7: S6, 2013.

Article in English | MEDLINE | ID: mdl-23815503

ABSTRACT

BACKGROUND: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through reads' alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph that allows an optimal resolution of local problematic regions. RESULTS: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show how GAM-NGS is a tool able to output an improved reliable set of sequences. GAM-NGS is also a very efficient tool able to merge assemblies using substantially less computational resources than comparable tools. In order to achieve such goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among other assembly reconciliation tools. CONCLUSIONS: The difficulty to obtain correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects. 
With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the one that is most likely to be correct.

Subjects

High-Throughput Nucleotide Sequencing, DNA Sequence Analysis/methods, Algorithms, Chromosomes/genetics, Bacterial Genome, Human Genome, Humans, Rhodobacter sphaeroides/genetics, Software, Staphylococcus aureus/genetics

14.

Fastphylo: fast tools for phylogenetics.

Khan, Mehmood Alam; Elias, Isaac; Sjölund, Erik; Nylander, Kristina; Guimera, Roman Valls; Schobesberger, Richard; Schmitzberger, Peter; Lagergren, Jens; Arvestad, Lars.

BMC Bioinformatics ; 14: 334, 2013 Nov 20.

Article in English | MEDLINE | ID: mdl-24255987

ABSTRACT

BACKGROUND: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances. RESULTS: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency. CONCLUSIONS: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
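The first of the two problems named in this abstract, estimating pairwise sequence distances, can be sketched as follows. The abstract does not say which substitution models fastphylo implements, so the Jukes-Cantor (JC69) correction used here is an assumption, and all function names are hypothetical. The resulting matrix is the kind of input a neighbor-joining step would take.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Symmetric matrix of JC69 distances for a list of aligned sequences."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = jc69_distance(seqs[i], seqs[j])
    return matrix
```

This naive version compares every pair in full, so it scales quadratically in the number of sequences; it illustrates the computation only and makes no claim about the efficiency techniques the package itself uses.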

Subjects

Computational Biology/instrumentation, Computational Biology/methods, Phylogeny, Algorithms, Amino Acid Sequence, Biological Evolution, Language, Memory, Multigene Family, Software

15.

Improved gap size estimation for scaffolding algorithms.

Sahlin, Kristoffer; Street, Nathaniel; Lundeberg, Joakim; Arvestad, Lars.

Bioinformatics ; 28(17): 2215-22, 2012 Sep 01.

Article in English | MEDLINE | ID: mdl-22923455

ABSTRACT

MOTIVATION: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance. RESULTS: In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe what different cases of bias we are facing. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, structural variation detection and library insert-size estimation as is commonly performed by read aligners. AVAILABILITY: A reference implementation is provided at https://github.com/SciLifeLab/gapest. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Contig Mapping/methods, Genomics/methods, Genetic Models, Gene Library, Likelihood Functions, Probability, Regression Analysis, DNA Sequence Analysis/methods, Software

16.

DLRS: gene tree evolution in light of a species tree.

Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens.

Bioinformatics ; 28(22): 2994-5, 2012 Nov 15.

Article in English | MEDLINE | ID: mdl-22982573

ABSTRACT

SUMMARY: PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. AVAILABILITY AND IMPLEMENTATION: PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. CONTACT: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. SUPPLEMENTARY INFORMATION: PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).

Subjects

Molecular Evolution, Phylogeny, Software, Algorithms, Animals, Bayes Theorem, Statistical Models, Programming Languages, Sequence Alignment

17.

Simultaneous Bayesian gene tree reconstruction and reconciliation analysis.

Akerborg, Orjan; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens.

Proc Natl Acad Sci U S A ; 106(14): 5714-9, 2009 Apr 07.

Article in English | MEDLINE | ID: mdl-19299507

ABSTRACT

We present GSR, a probabilistic model integrating gene duplication, sequence evolution, and a relaxed molecular clock for substitution rates, that enables genomewide analysis of gene families. The gene duplication and loss process is a major cause for incongruence between gene and species tree, and deterministic methods have been developed to explain such differences through tree reconciliations. Although probabilistic methods for phylogenetic inference have been around for decades, probabilistic reconciliation methods are far less established. Based on our model, we have implemented a Bayesian analysis tool, PrIME-GSR, for gene tree inference that takes a known species tree into account. Our implementation is sound and we demonstrate its utility for genomewide gene-family analysis by applying it to recently presented yeast data. We validate PrIME-GSR by comparing with previous analyses of these data that take advantage of gene order information. In a case study we apply our method to the ADH gene family and are able to draw biologically relevant conclusions concerning gene duplications creating key yeast phenotypes. On a higher level this shows the biological relevance of our method. The obtained results demonstrate the value of a relaxed molecular clock. Our good performance will extend to species where gene order conservation is insufficient.

Subjects

Bayes Theorem, Genetic Models, Phylogeny, Gene Duplication, Fungal Genome, Kinetics, Mutation, Yeasts/genetics

18.

Classification of DNA sequences using Bloom filters.

Stranneheim, Henrik; Käller, Max; Allander, Tobias; Andersson, Björn; Arvestad, Lars; Lundeberg, Joakim.

Bioinformatics ; 26(13): 1595-600, 2010 Jul 01.

Article in English | MEDLINE | ID: mdl-20472541

ABSTRACT

MOTIVATION: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. RESULTS: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. AVAILABILITY: Source code for FACS, Bloom filters and the MetaSim dataset used is available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/. CONTACTS: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
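The general idea of classifying reads against a reference with a Bloom filter can be sketched as follows. This is an illustrative Python sketch only, not the FACS implementation: the k-mer decomposition, the filter size, the number of hash functions, the SHA-256 double-hashing scheme and the match threshold are all assumptions chosen for the example, and every function name here is hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative; parameters are assumptions)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every added item; may rarely be True for items never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_reference_filter(reference, k, num_bits=1 << 20, num_hashes=4):
    """Insert every k-mer of the reference sequence into a new Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def match_fraction(read, bf, k):
    """Fraction of the read's k-mers reported as present in the filter."""
    total = hits = 0
    for kmer in kmers(read, k):
        total += 1
        hits += kmer in bf
    return hits / total if total else 0.0


def classify(read, bf, k, threshold=0.8):
    """Label a read as belonging to the reference if enough k-mers match."""
    return match_fraction(read, bf, k) >= threshold
```

A Bloom filter never reports a stored k-mer as absent, so a read taken verbatim from the reference always reaches a match fraction of 1.0; false positives are possible, which is why the sketch classifies on a fraction of matching k-mers instead of a single hit.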

Subjects

Algorithms , Metagenome , DNA Sequence Analysis , Mitochondrial Genome , Humans , Respiratory System/microbiology , Sensitivity and Specificity

19.

Evolution and human tissue expression of the Cres/Testatin subgroup genes, a reproductive tissue specific subgroup of the type 2 cystatins.

Frygelius, Jessica; Arvestad, Lars; Wedell, Anna; Töhönen, Virpi.

Evol Dev ; 12(3): 329-42, 2010.

Article in English | MEDLINE | ID: mdl-20565543

ABSTRACT

The cystatin family comprises a group of generally broadly expressed protease inhibitors. The Cres/Testatin subgroup (CTES) genes within the type 2 cystatins differ from the classical type 2 cystatins in having a strikingly reproductive tissue-specific expression, and putative functions in reproduction have therefore been discussed. We have performed evolutionary studies of the CTES genes based on gene searches in genomes from 11 species. Ancestors of the cystatin family can be traced back to plants. We have localized the evolutionary origin of the CTES genes to the split of marsupial and placental mammals. A model for the evolution of these genes illustrates that they constitute a dynamic group of genes, which has undergone several gene expansions, and we find indications of a high degree of positive selection, in striking contrast to what is seen for the classical cystatin C. We show with phylogenetic relations that the CTES genes are clustered into three original groups: a testatin, a Cres, and a CstL1 group. We have further characterized the expression patterns of all human members of the subfamily. Of a total of nine identified human genes, four express putative functional transcripts with a predominant expression in the male reproductive system. Our results are compatible with a function of this gene family in reproduction.

Subjects

Cystatins/genetics , Molecular Evolution , Base Sequence , DNA Primers , Complementary DNA , Humans , Messenger RNA/genetics , Reverse Transcriptase Polymerase Chain Reaction

20.

The Mitogenome of Norway Spruce and a Reappraisal of Mitochondrial Recombination in Plants.

Sullivan, Alexis R; Eldfjell, Yrin; Schiffthaler, Bastian; Delhomme, Nicolas; Asp, Torben; Hebelstrup, Kim H; Keech, Olivier; Öberg, Lisa; Møller, Ian Max; Arvestad, Lars; Street, Nathaniel R; Wang, Xiao-Ru.

Genome Biol Evol ; 12(1): 3586-3598, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31774499

ABSTRACT

Plant mitogenomes can be difficult to assemble because they are structurally dynamic and prone to intergenomic DNA transfers, leading to the unusual situation where an organelle genome is far outnumbered by its nuclear counterparts. As a result, comparative mitogenome studies are in their infancy and some key aspects of genome evolution are still known mainly from pregenomic, qualitative methods. To help address these limitations, we combined machine learning and in silico enrichment of mitochondrial-like long reads to assemble the bacterial-sized mitogenome of Norway spruce (Pinaceae: Picea abies). We conducted comparative analyses of repeat abundance, intergenomic transfers, substitution and rearrangement rates, and estimated repeat-by-repeat homologous recombination rates. Prompted by our discovery of highly recombinogenic small repeats in P. abies, we assessed the genomic support for the prevailing hypothesis that intramolecular recombination is predominantly driven by repeat length, with larger repeats facilitating DNA exchange more readily. Overall, we found mixed support for this view: recombination dynamics were heterogeneous across vascular plants and highly active small repeats (ca. 200 bp) were present in about one-third of studied mitogenomes. As in previous studies, we did not observe any robust relationships among commonly studied genome attributes, but we identify variation in recombination rates as an underinvestigated source of plant mitogenome diversity.

Subjects

Mitochondrial Genome , Picea/genetics , Genetic Recombination , Computer Simulation , Cycadopsida/genetics , Plant DNA/chemistry , Plant Genes , Genetic Variation , Nucleic Acid Repetitive Sequences , Support Vector Machine
