Results 1 - 12 of 12
1.
Bull Math Biol ; 86(1): 10, 2023 12 20.
Article in English | MEDLINE | ID: mdl-38117376

ABSTRACT

Phylogenetic networks are an extension of phylogenetic trees that allow for the representation of reticulate evolution events. One of the classes of networks that has gained the attention of the scientific community in recent years is the class of orchard networks, which generalizes tree-child networks, one of the most studied classes of networks. In this paper we focus on the combinatorial and algorithmic problem of the generation of binary orchard networks, and also of binary tree-child networks. To this end, we use the fact that these networks are defined as those that can be recovered by reversing a certain reduction process. Then, we show how to choose a "minimum" reduction process among all those that can be applied to a network, and hence we get a unique representation of the network that, in fact, can be given in terms of sequences of pairs of integers, whose length is related to the number of leaves and reticulations of the network. Therefore, the generation of networks is reduced to the generation of such sequences of pairs. Our main result is a recursive method for the efficient generation of all minimum sequences, and hence of all orchard (or tree-child) networks with a given number of leaves and reticulations. An implementation in C of the algorithms described in this paper, along with some computational experiments, can be downloaded from the public repository https://github.com/gerardet46/OrchardGenerator. Using this implementation, we have computed the number of binary orchard networks with at most 6 leaves and 8 reticulations.


Subjects
Mathematical Concepts , Biological Models , Humans , Phylogeny , Algorithms , Plant Leaves
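
The abstract of entry 1 says that a network can be recovered by reversing a reduction process encoded as a sequence of pairs of integers. The following is a minimal sketch of that idea, not the paper's C implementation. It assumes one common convention for cherry and reticulated-cherry reductions; the function names `build_network` and `count_reticulations`, the arc-set representation, and the example sequence are all hypothetical, and the paper's "minimum" sequences may be encoded differently.

```python
from collections import Counter
from itertools import count


def build_network(pairs):
    """Rebuild a network, as a set of arcs (parent, child), from reduction pairs.

    Assumed convention (hypothetical, not taken from the paper): each pair
    (x, y) names two leaves, and the pairs are listed in the order in which
    they would be reduced, so the network is rebuilt by reading them backwards.
    """
    fresh = count()

    def new_node():
        return f"v{next(fresh)}"

    last_leaf = pairs[-1][1]          # the leaf left after the final reduction
    arcs = {(new_node(), last_leaf)}  # start from a root arc ending in one leaf
    leaves = {last_leaf}

    def subdivide(leaf):
        # A leaf has exactly one incoming arc; split it with a new node.
        (parent,) = [u for (u, v) in arcs if v == leaf]
        mid = new_node()
        arcs.remove((parent, leaf))
        arcs.update({(parent, mid), (mid, leaf)})
        return mid

    for x, y in reversed(pairs):
        if x == y or y not in leaves:
            raise ValueError(f"pair ({x}, {y}) cannot be added")
        above_y = subdivide(y)
        if x not in leaves:
            # Undo a cherry reduction: x becomes a new sibling of y.
            arcs.add((above_y, x))
            leaves.add(x)
        else:
            # Undo a reticulated-cherry reduction: the new parent of x
            # receives a second incoming arc and becomes a reticulation.
            above_x = subdivide(x)
            arcs.add((above_y, above_x))
    return arcs


def count_reticulations(arcs):
    """Count nodes with two or more incoming arcs."""
    indegree = Counter(child for _, child in arcs)
    return sum(1 for d in indegree.values() if d >= 2)


pairs = [(1, 2), (1, 3), (2, 3)]   # illustrative sequence, not from the paper
network = build_network(pairs)
print(len(network), count_reticulations(network))
```

Under this convention each pair either adds one leaf or one reticulation, so a network with n leaves and r reticulations corresponds to a sequence of n - 1 + r pairs; the example has 3 leaves, 1 reticulation, and 3 pairs.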
2.
Bioinformatics ; 37(13): 1805-1813, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33471063

ABSTRACT

MOTIVATION: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets. RESULTS: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class. VPF-Class is a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class on 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% to classify dsDNA, ssDNA and retroviruses, at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. In the prophages dataset, the accuracy in host prediction was 86% considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified. AVAILABILITY AND IMPLEMENTATION: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.
J Math Biol ; 84(6): 47, 2022 05 03.
Article in English | MEDLINE | ID: mdl-35503141

ABSTRACT

The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.


Subjects
Horizontal Gene Transfer , Genetic Hybridization , Algorithms , Biological Evolution , Genetic Models , Phylogeny
4.
Nucleic Acids Res ; 47(D1): D678-D686, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407573

ABSTRACT

The Integrated Microbial Genome/Virus (IMG/VR) system v.2.0 (https://img.jgi.doe.gov/vr/) is the largest publicly available data management and analysis platform dedicated to viral genomics. Since the last report, published in the 2016 NAR Database Issue, the data has tripled in size and currently contains genomes of 8389 cultivated reference viruses, 12 498 previously published curated prophages derived from cultivated microbial isolates, and 735 112 viral genomic fragments computationally predicted from assembled shotgun metagenomes. Nearly 60% of the viral genomes and genome fragments are clustered into 110 384 viral Operational Taxonomic Units (vOTUs) with two or more members. To improve data quality and predictions of host specificity, IMG/VR v.2.0 now separates prokaryotic and eukaryotic viruses, utilizes known prophage sequences to improve taxonomic assignments, and provides viral genome quality scores based on the estimated genome completeness. New features also include enhanced BLAST search capabilities for external queries. Finally, geographic map visualization to locate user-selected viral genomes or genome fragments has been implemented and download options have been extended. All of these features make IMG/VR v.2.0 a key resource for the study of viruses.


Subjects
Data Management/methods , Viral Genome , Genomics/methods , Software
5.
PLoS Comput Biol ; 15(9): e1007347, 2019 09.
Article in English | MEDLINE | ID: mdl-31509525

ABSTRACT

Phylogenetic networks generalize phylogenetic trees by allowing the modeling of events of reticulate evolution. Among the different kinds of phylogenetic networks that have been proposed in the literature, the subclass of binary tree-child networks is one of the most studied. However, very little is known about the combinatorial structure of these networks. In this paper we address the problem of generating all possible binary tree-child (BTC) networks with a given number of leaves in an efficient way via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees, and are biologically relevant. Since our solution is recursive, this also provides us with a recurrence relation giving an upper bound on the number of such networks. We also show how the operations introduced in this paper can be employed to extend the evolutionary history of a set of sequences, represented by a BTC network, to include a new sequence. An implementation in Python of the algorithms described in this paper, along with some computational experiments, can be downloaded from https://github.com/bielcardona/TCGenerators.


Subjects
Computational Biology/methods , Genetic Models , Phylogeny , Algorithms , Computer Simulation
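
Entry 5 concerns binary tree-child networks. As a small illustration of the class itself (not of the paper's generation algorithm or of the TCGenerators code), the sketch below checks the usual tree-child condition: every non-leaf node has at least one child that is not a reticulation. The function name `is_tree_child`, the dictionary representation, and both example networks are assumptions made for this sketch.

```python
from collections import Counter


def is_tree_child(children):
    """Check the tree-child condition on a rooted network.

    `children` maps each non-leaf node to the list of its children.
    A reticulation is taken to be any node with two or more parents.
    """
    indegree = Counter(c for kids in children.values() for c in kids)
    reticulations = {v for v, d in indegree.items() if d >= 2}
    return all(
        any(c not in reticulations for c in kids)
        for kids in children.values()
        if kids  # tolerate leaves listed with an empty child list
    )


# Hypothetical network with one reticulation "r" above leaf 1.
tree_child = {
    "root": ["a"], "a": ["b", "c"],
    "b": [3, "r"], "c": [2, "r"], "r": [1],
}

# Hypothetical network where both children of "b" are reticulations.
not_tree_child = {
    "root": ["a"], "a": ["b", "c"],
    "b": ["r1", "r2"], "c": ["r1", "r2"],
    "r1": [1], "r2": [2],
}

print(is_tree_child(tree_child), is_tree_child(not_tree_child))
```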
6.
PLoS Comput Biol ; 15(10): e1007440, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31596844

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pcbi.1007347.].

7.
J Math Biol ; 78(4): 899-918, 2019 03.
Article in English | MEDLINE | ID: mdl-30283985

ABSTRACT

Phylogenetic networks generalise phylogenetic trees and allow for the accurate representation of the evolutionary history of a set of present-day species whose past includes reticulate events such as hybridisation and lateral gene transfer. One way to obtain such a network is by starting with a (rooted) phylogenetic tree T, called a base tree, and adding arcs between arcs of T. The class of phylogenetic networks that can be obtained in this way is called tree-based networks and includes the prominent classes of tree-child and reticulation-visible networks. Initially defined for binary phylogenetic networks, tree-based networks naturally extend to arbitrary phylogenetic networks. In this paper, we generalise recent tree-based characterisations and associated proximity measures for binary phylogenetic networks to arbitrary phylogenetic networks. These characterisations are in terms of matchings in bipartite graphs, path partitions, and antichains. Some of the generalisations are straightforward to establish using the original approach, while others require a very different approach. Furthermore, for an arbitrary tree-based network N, we characterise the support trees of N, that is, the tree-based embeddings of N. We use this characterisation to give an explicit formula for the number of support trees of N when N is binary. This formula is written in terms of the components of a bipartite graph.


Subjects
Biological Evolution , Phylogeny , Animals , Computational Biology , Mathematical Concepts , Genetic Models
8.
J Math Biol ; 75(6-7): 1669-1692, 2017 12.
Article in English | MEDLINE | ID: mdl-28451760

ABSTRACT

Phylogenetic networks have gained attention from the scientific community due to the evidence of the existence of evolutionary events that cannot be represented using trees. A variant of phylogenetic networks, called LGT networks, specifically models lateral gene transfer events, which cannot be properly represented with generic phylogenetic networks. In this paper we treat the problem of the reconstruction of LGT networks from substructures induced by three leaves, which we call tri-LGT-nets. We first restrict ourselves to a class of LGT networks that are both mathematically tractable and biologically significant, called BAN-LGT networks. Then, we study the decomposition of such networks into subnetworks with three leaves and ask whether or not this decomposition determines the network. The answer to this question is negative, but if we further impose time-consistency (species involved in a lateral gene transfer must coexist) the answer is affirmative, up to some redundancy that can never be recovered but is fully characterized.


Subjects
Horizontal Gene Transfer , Genetic Models , Phylogeny , Computer Simulation , Molecular Evolution , Gene Regulatory Networks , Mathematical Concepts
9.
Article in English | MEDLINE | ID: mdl-38300780

ABSTRACT

Phylogenetic networks generalize phylogenetic trees in order to model reticulation events. Although the comparison of phylogenetic trees is well studied, and there are multiple ways to do it in an efficient way, the situation is much different for phylogenetic networks. Some classes of phylogenetic networks, mainly tree-child networks, are known to be classified efficiently by their µ-representation, which essentially counts, for every node, the number of paths to each leaf. In this article, we introduce the extended µ-representation of networks, where the number of paths to reticulations is also taken into account. This modification allows us to distinguish orchard networks and to define a metric on the space of such networks that can, moreover, be computed efficiently. The class of orchard networks, as well as being one of the classes with biological significance (one such network can be interpreted as a tree with extra arcs involving coexisting organisms), is one of the most generic ones (in mathematical terms) for which such a representation can (conjecturally) exist, since a slight relaxation of the definition leads to a problem that is Graph Isomorphism Complete.


Subjects
Algorithms , Computational Biology , Phylogeny , Computational Biology/methods , Genetic Models
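
Entry 9 describes the µ-representation as counting, for every node, the number of paths to each leaf. A minimal sketch of that classic (non-extended) representation follows. The function name `mu_representation`, the dictionary representation, and the example network are assumptions made here; the extended version from the article, which also accounts for paths to reticulations, is not implemented.

```python
from functools import lru_cache


def mu_representation(children, leaves):
    """Return {node: mu-vector}; entry i counts paths from the node to leaves[i].

    `children` maps each non-leaf node to the list of its children, and the
    network is assumed to be a directed acyclic graph.
    """
    position = {leaf: i for i, leaf in enumerate(leaves)}

    @lru_cache(maxsize=None)
    def mu(node):
        if node in position:
            # A leaf reaches only itself, by the trivial path.
            return tuple(int(i == position[node]) for i in range(len(leaves)))
        # Paths from an internal node split according to the first arc taken.
        vectors = [mu(child) for child in children[node]]
        return tuple(sum(column) for column in zip(*vectors))

    nodes = set(children) | set(leaves)
    return {node: mu(node) for node in nodes}


# Hypothetical network with one reticulation "r" above leaf 1.
children = {
    "root": ["a"], "a": ["b", "c"],
    "b": [3, "r"], "c": [2, "r"], "r": [1],
}
mu = mu_representation(children, leaves=[1, 2, 3])
for node in ["root", "a", "b", "c", "r"]:
    print(node, mu[node])
```

In this example node "a" reaches leaf 1 by two different paths (through "b" and through "c"), which is what the first component of its vector records.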
10.
PLoS One ; 17(5): e0268181, 2022.
Article in English | MEDLINE | ID: mdl-35594308

ABSTRACT

Invariants for complicated objects such as those arising in phylogenetics, whether they are invariants as matrices, polynomials, or other mathematical structures, are important tools for distinguishing and working with such objects. In this paper, we generalize a complete polynomial invariant on trees to a class of phylogenetic networks called separable networks, which will include orchard networks. Networks are becoming increasingly important for their ability to represent reticulation events, such as hybridization, in evolutionary history. We provide a function from the space of internally multi-labelled phylogenetic networks, a more generic graph structure than phylogenetic networks where the reticulations are also labelled, to a polynomial ring. We prove that the separability condition allows us to characterize, via the polynomial, the phylogenetic networks with the same number of leaves and same number of reticulations by considering their internally labelled versions. While the invariant for trees is a polynomial in [Formula: see text] where n is the number of leaves, the invariant for internally multi-labelled phylogenetic networks is an element of [Formula: see text], where r is the number of reticulations in the network. When the networks are considered without leaf labels the number of variables reduces to r + 2.


Subjects
Mathematical Concepts , Genetic Models , Algorithms , Biological Evolution , Phylogeny
11.
Article in English | MEDLINE | ID: mdl-30703035

ABSTRACT

Phylogenetic networks provide a mathematical model to represent the evolution of a set of species where, apart from speciation, reticulate evolutionary events have to be taken into account. Among these events, lateral gene transfers need special consideration due to the asymmetry in the roles of the species involved in such an event. To take this asymmetry into account, LGT networks were introduced. In contrast to the case of phylogenetic trees, the combinatorial structure of phylogenetic networks is much less well understood and more difficult to describe. One of the approaches in the literature is to classify them according to their level and find generators of the given level that can be used to recursively generate all networks. In this paper, we adapt the concept of generators to the case of LGT networks. We show how these generators, classified by their level, give rise to simple LGT networks of the specified level, and how any LGT network can be obtained from these simple networks, which act as building blocks of the generic structure. The stochastic models of evolution of phylogenetic networks are also much less studied than those for phylogenetic trees. In this setting, we introduce a novel two-parameter model that generates LGT networks. Finally, we present some computer simulations using this model in order to investigate the complexity of the generated networks, depending on the parameters of the model.


Subjects
Computational Biology/methods , Molecular Evolution , Horizontal Gene Transfer/genetics , Genetic Models , Phylogeny , Computer Simulation
12.
Algorithms Mol Biol ; 10: 28, 2015.
Article in English | MEDLINE | ID: mdl-26691555

ABSTRACT

BACKGROUND: Lateral, or Horizontal, Gene Transfers are a type of asymmetric evolutionary event where genetic material is transferred from one species to another. In this paper we consider LGT networks, a general model of phylogenetic networks with lateral gene transfers which consist, roughly, of a principal rooted tree with its leaves labelled on a set of taxa, and a set of extra secondary arcs between nodes in this tree representing lateral gene transfers. An LGT network gives rise in a natural way to a principal phylogenetic subtree and a set of secondary phylogenetic subtrees, which, roughly, represent, respectively, the main line of evolution of most genes and the secondary lines of evolution through lateral gene transfers. RESULTS: We introduce a set of simple conditions on an LGT network that guarantee that its principal and secondary phylogenetic subtrees are pairwise different and that these subtrees determine, up to isomorphism, the LGT network. We then give an algorithm that, given a set of pairwise different phylogenetic trees [Formula: see text] on the same set of taxa, outputs, when it exists, the LGT network that satisfies these conditions and such that its principal phylogenetic tree is [Formula: see text] and its secondary phylogenetic trees are [Formula: see text].
