Search | BVS Bolivia

Insertions and deletions as phylogenetic signal in an alignment-free context.

Birth, Niklas; Dencker, Thomas; Morgenstern, Burkhard.

PLoS Comput Biol ; 18(8): e1010303, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35939516

ABSTRACT

Most methods for phylogenetic tree reconstruction are based on sequence alignments; they infer phylogenies from substitutions that may have occurred at the aligned sequence positions. Gaps in alignments are usually not employed as phylogenetic signal. In this paper, we explore an alignment-free approach that uses insertions and deletions (indels) as an additional source of information for phylogeny inference. For a set of four or more input sequences, we generate so-called quartet blocks of four putative homologous segments each. For pairs of such quartet blocks involving the same four sequences, we compare the distances between the two blocks in these sequences, to obtain hints about indels that may have happened between the blocks since the respective four sequences have evolved from their last common ancestor. A prototype implementation that we call Gap-SpaM is presented to infer phylogenetic trees from these data, using a quartet-tree approach or, alternatively, under the maximum-parsimony paradigm. This approach should not be regarded as an alternative to established methods, but rather as a complementary source of phylogenetic information. Interestingly, however, our software is able to produce phylogenetic trees from putative indels alone that are comparable to trees obtained with existing alignment-free methods.
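The comparison of block distances described in this abstract can be illustrated with a minimal Python sketch. It assumes that each quartet block is given as four start positions, one per sequence, listed in the same sequence order for both blocks. The function name `indel_split`, the input layout, and the decision rule (two pairs of sequences whose distances agree within each pair but differ between the pairs) are illustrative assumptions and not the Gap-SpaM implementation.

```python
from typing import Optional, Sequence, Tuple

# A split groups the four sequence indices into two pairs, e.g. ((0, 1), (2, 3)).
Split = Tuple[Tuple[int, int], Tuple[int, int]]

# The three possible ways to split four sequences into two pairs.
CANDIDATE_SPLITS: Tuple[Split, ...] = (
    ((0, 1), (2, 3)),
    ((0, 2), (1, 3)),
    ((0, 3), (1, 2)),
)


def indel_split(starts_a: Sequence[int], starts_b: Sequence[int]) -> Optional[Split]:
    """Return the split hinted at by a putative indel between two blocks.

    starts_a[i] and starts_b[i] are the (hypothetical) start positions of
    block A and block B in sequence i. Returns None if the distances carry
    no unambiguous hint under the assumed rule.
    """
    if len(starts_a) != 4 or len(starts_b) != 4:
        raise ValueError("each quartet block needs exactly four positions")

    # Distance between the two blocks, measured separately in each sequence.
    dist = [b - a for a, b in zip(starts_a, starts_b)]

    for (i, j), (k, l) in CANDIDATE_SPLITS:
        # Equal distances within both pairs, different distances between
        # the pairs: consistent with one indel on the branch separating them.
        if dist[i] == dist[j] and dist[k] == dist[l] and dist[i] != dist[k]:
            return ((i, j), (k, l))
    return None
```

With distances of 50, 50, 53 and 53 in the four sequences, the sketch reports the split of sequences 0 and 1 against sequences 2 and 3; identical distances in all four sequences give no hint.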

Subject(s)

INDEL Mutation, Software, Algorithms, INDEL Mutation/genetics, Phylogeny, Sequence Alignment

The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances.

Röhling, Sophie; Linne, Alexander; Schellhorn, Jendrik; Hosseini, Morteza; Dencker, Thomas; Morgenstern, Burkhard.

PLoS One ; 15(2): e0228070, 2020.

Article in English | MEDLINE | ID: mdl-32040534

ABSTRACT

We study the number N_k of length-k word matches between pairs of evolutionarily related DNA sequences, as a function of k. We show that the Jukes-Cantor distance between two genome sequences, i.e. the number of substitutions per site that occurred since they evolved from their last common ancestor, can be estimated from the slope of a function F that depends on N_k and that is affine-linear within a certain range of k. Integers k_min and k_max can be calculated depending on the length of the input sequences, such that the slope of F in the relevant range can be estimated from the values F(k_min) and F(k_max). This approach can be generalized to so-called Spaced-word Matches (SpaM), where mismatches are allowed at positions specified by a user-defined binary pattern. Based on these theoretical results, we implemented a prototype software program for alignment-free sequence comparison called Slope-SpaM. Test runs on real and simulated sequence data show that Slope-SpaM can accurately estimate phylogenetic distances for distances up to around 0.5 substitutions per position. The statistical stability of our results is improved if spaced words are used instead of contiguous words. Unlike previous alignment-free methods that are based on the number of (spaced) word matches, Slope-SpaM produces accurate results, even if sequences share only local homologies.
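The quantities named in this abstract can be sketched in a few lines of Python. The abstract does not define the function F or the rule for choosing the two values of k, so the sketch takes F as a caller-supplied function and only shows three generic pieces: counting contiguous k-mer matches, a two-point slope estimate, and the standard Jukes-Cantor formula for a given per-site match probability. All function names are hypothetical, and treating the slope as the logarithm of a match probability is an assumption that the abstract does not state.

```python
import math
from collections import Counter
from typing import Callable


def count_kmer_matches(seq1: str, seq2: str, k: int) -> int:
    """Count pairs (i, j) where the k-mers at seq1[i:] and seq2[j:] are equal."""
    words1 = Counter(seq1[i:i + k] for i in range(len(seq1) - k + 1))
    words2 = Counter(seq2[j:j + k] for j in range(len(seq2) - k + 1))
    # Each occurrence in seq1 matches every occurrence of the same word in seq2.
    return sum(n * words2[word] for word, n in words1.items())


def two_point_slope(f: Callable[[int], float], k_low: int, k_high: int) -> float:
    """Estimate the slope of f from its values at two integers k_low < k_high."""
    if k_high <= k_low:
        raise ValueError("k_high must be larger than k_low")
    return (f(k_high) - f(k_low)) / (k_high - k_low)


def jukes_cantor_distance(match_prob: float) -> float:
    """Standard Jukes-Cantor distance for a per-site match probability.

    Valid only for match_prob > 0.25; below that the logarithm is undefined.
    """
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * (1.0 - match_prob))
```

Under the stated assumption, a caller would pass its own definition of F to `two_point_slope`, convert the resulting slope `s` into a match probability with `math.exp(s)`, and feed that value to `jukes_cantor_distance`.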

Subject(s)

Phylogeny, DNA Sequence Analysis, Sequence Alignment, Nucleic Acid Sequence Homology

'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees.

Dencker, Thomas; Leimeister, Chris-André; Gerth, Michael; Bleidorn, Christoph; Snir, Sagi; Morgenstern, Burkhard.

NAR Genom Bioinform ; 2(1): lqz013, 2020 Mar.

Article in English | MEDLINE | ID: mdl-33575565

ABSTRACT

Word-based or 'alignment-free' methods for phylogeny inference have become popular in recent years. These methods are much faster than traditional, alignment-based approaches, but they are generally less accurate. Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining. In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'. Our algorithm first samples small, gap-free alignments involving four taxa each. For each of these alignments, it then calculates a quartet tree and, finally, the program 'Quartet MaxCut' is used to infer a super tree for the full set of input taxa from the calculated quartet trees. Experimental results show that trees produced with our approach are of high quality.
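The step of deriving a quartet tree from a small gap-free alignment of four taxa can be illustrated with a short Python sketch. Note that it deliberately swaps in a simple parsimony count for the maximum-likelihood calculation named in the abstract, so it shows the shape of the step and not the Multi-SpaM method itself. The function name `parsimony_quartet` and the tie handling are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Split = Tuple[Tuple[int, int], Tuple[int, int]]

# The three unrooted topologies for four taxa, written as pair splits.
SPLITS: Tuple[Split, ...] = (
    ((0, 1), (2, 3)),
    ((0, 2), (1, 3)),
    ((0, 3), (1, 2)),
)


def parsimony_quartet(block: Sequence[str]) -> Optional[Split]:
    """Pick the quartet topology supported by the most informative columns.

    block holds four equally long, gap-free rows. A column supports a split
    when both taxa in each pair share a character and the pairs differ.
    Returns None when no column is informative or the best two splits tie.
    """
    if len(block) != 4 or len({len(row) for row in block}) != 1:
        raise ValueError("expected four rows of equal length")

    support: Counter = Counter()
    for column in zip(*block):
        for (a, b), (c, d) in SPLITS:
            if column[a] == column[b] and column[c] == column[d] and column[a] != column[c]:
                support[((a, b), (c, d))] += 1

    ranked = support.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: the two best topologies are equally supported
    return ranked[0][0]
```

The quartet splits collected from many such blocks would then be handed to a supertree program such as Quartet MaxCut, as the abstract describes; that step is not sketched here.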

Benchmarking of alignment-free sequence comparison methods.

Zielezinski, Andrzej; Girgis, Hani Z; Bernard, Guillaume; Leimeister, Chris-Andre; Tang, Kujin; Dencker, Thomas; Lau, Anna Katharina; Röhling, Sophie; Choi, Jae Jin; Waterman, Michael S; Comin, Matteo; Kim, Sung-Hou; Vinga, Susana; Almeida, Jonas S; Chan, Cheong Xin; James, Benjamin T; Sun, Fengzhu; Morgenstern, Burkhard; Karlowski, Wojciech M.

Genome Biol ; 20(1): 144, 2019 Jul 25.

Article in English | MEDLINE | ID: mdl-31345254

ABSTRACT

BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.

Subject(s)

Sequence Analysis, Benchmarking, Horizontal Gene Transfer, Internet, Phylogeny, Nucleic Acid Regulatory Sequences, Sequence Alignment, Protein Sequence Analysis, Software

Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.

Leimeister, Chris-André; Dencker, Thomas; Morgenstern, Burkhard.

Bioinformatics ; 35(2): 211-218, 2019 Jan 15.

Article in English | MEDLINE | ID: mdl-29992260

ABSTRACT

Motivation: Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods. Results: In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points. Availability and implementation: http://spacedanchor.gobics.de. Supplementary information: Supplementary data are available at Bioinformatics online.
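The notion of a spaced-word match defined in this abstract can be made concrete with a minimal Python sketch. It assumes the binary pattern is a string in which `1` marks a match position and `0` a don't-care position. The function names and the scoring scheme (+1 for a match and -1 for a mismatch at don't-care positions) are illustrative assumptions; the abstract does not specify the similarity score, and the X-drop extension step is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def spaced_word_matches(seq1: str, seq2: str, pattern: str) -> List[Tuple[int, int]]:
    """Return all (i, j) where seq1 and seq2 agree at every match position.

    The window at seq1[i:] and seq2[j:] has the length of the pattern;
    characters under '0' (don't-care) positions are ignored here.
    """
    match_positions = [m for m, flag in enumerate(pattern) if flag == "1"]
    span = len(pattern)

    # Index seq2 by the characters found at the match positions.
    index: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for j in range(len(seq2) - span + 1):
        index[tuple(seq2[j + m] for m in match_positions)].append(j)

    hits: List[Tuple[int, int]] = []
    for i in range(len(seq1) - span + 1):
        key = tuple(seq1[i + m] for m in match_positions)
        for j in index.get(key, ()):
            hits.append((i, j))
    return hits


def dont_care_score(seq1: str, seq2: str, i: int, j: int, pattern: str) -> int:
    """Score one spaced-word match on its don't-care positions (assumed +1/-1)."""
    return sum(
        1 if seq1[i + m] == seq2[j + m] else -1
        for m, flag in enumerate(pattern)
        if flag == "0"
    )
```

In a filtering step of the kind the abstract describes, only hits whose score exceeds some threshold would be kept and passed on for extension.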

Subject(s)

Algorithms, Genome, Sequence Alignment/methods, Software, Computational Biology
