Results 1 - 20 of 125
1.
Ann Comb ; 28(1): 1-32, 2024.
Article in English | MEDLINE | ID: mdl-38433929

ABSTRACT

An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space, which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.

2.
J Math Biol ; 88(2): 17, 2024 01 18.
Article in English | MEDLINE | ID: mdl-38238584

ABSTRACT

Convergent evolution is an important process in which independent species evolve similar features usually over a long period of time. It occurs with many different species across the tree of life, and is often caused by the fact that species have to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions it follows that we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model's parameters for when it is still possible to recover the underlying tree and also its height, even in case the modified distance is not a tree metric.


Subject(s)
Evolution, Molecular , Models, Genetic , Phylogeny
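
The distinction drawn in this abstract between an ultrametric distance and a tree metric can be checked mechanically. The following is a minimal sketch, not the paper's model: it tests the standard three-point (ultrametric) and four-point (tree metric) conditions on a small distance table. The taxa names, the distances, and the shortened distance between b and c are hypothetical values chosen only for illustration.

```python
from itertools import combinations

def make_dist(pairs):
    """Build a symmetric lookup d[x][y] from a {(x, y): value} table."""
    d = {}
    for (x, y), v in pairs.items():
        d.setdefault(x, {})[y] = v
        d.setdefault(y, {})[x] = v
    return d

def is_ultrametric(d, points, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(points, 3):
        a, b, c = sorted([d[x][y], d[x][z], d[y][z]])
        if c - b > tol:
            return False
    return True

def satisfies_four_point(d, points, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for w, x, y, z in combinations(points, 4):
        s = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if s[2] - s[1] > tol:
            return False
    return True

taxa = ["a", "b", "c", "d"]

# Hypothetical clock-like distances for the tree ((a,b),(c,d)).
clock = make_dist({("a", "b"): 2, ("c", "d"): 2,
                   ("a", "c"): 4, ("a", "d"): 4,
                   ("b", "c"): 4, ("b", "d"): 4})

# Hypothetical modification: b and c are closer than the clock predicts.
modified = make_dist({("a", "b"): 2, ("c", "d"): 2,
                      ("a", "c"): 4, ("a", "d"): 4,
                      ("b", "c"): 3, ("b", "d"): 4})

print(is_ultrametric(clock, taxa), satisfies_four_point(clock, taxa))
print(is_ultrametric(modified, taxa), satisfies_four_point(modified, taxa))
```

In this toy case the unmodified table passes both checks, while the modified one fails both, which is the kind of situation in which the abstract asks whether the underlying tree can still be recovered.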
3.
Discrete Comput Geom ; 70(4): 1862-1883, 2023.
Article in English | MEDLINE | ID: mdl-38022897

ABSTRACT

The generalized circumradius of a set of points A ⊆ R^d with respect to a convex body K equals the minimum value of λ ≥ 0 such that a translate of λK contains A. Each choice of K gives a different function on the set of bounded subsets of R^d; we characterize which functions can arise in this way. Our characterization draws on the theory of diversities, a recently introduced generalization of metrics from functions on pairs to functions on finite subsets. We additionally investigate functions which arise by restricting the generalized circumradius to a finite subset of R^d. We obtain elegant characterizations in the case that K is a simplex or parallelotope.
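
As an illustration of the definition, consider the simplest parallelotope, the cube K = [-1, 1]^d. A translate of λK is an axis-aligned box with side 2λ, so the smallest admissible λ is half the largest coordinate range of the points. The sketch below assumes this particular K; the function name and the sample points are hypothetical and are not taken from the paper.

```python
def cube_circumradius(points):
    """Smallest lam >= 0 such that some translate of lam * [-1, 1]^d contains all points."""
    dims = range(len(points[0]))
    # Along each axis the box must span the full range of the points, i.e. 2 * lam >= range.
    return max(
        (max(p[i] for p in points) - min(p[i] for p in points)) / 2
        for i in dims
    )

A = [(0, 0), (4, 1), (1, 3)]
print(cube_circumradius(A))         # coordinate ranges are 4 and 3
print(cube_circumradius([(5, 7)]))  # a single point needs lam = 0
```

The function vanishes on single points and cannot decrease when points are added, in line with the abstract's view of the circumradius as a function on finite subsets rather than on pairs.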

4.
Data Brief ; 47: 108990, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36879606

ABSTRACT

This article presents metagenome-assembled genomes (MAGs) for both eukaryotic and prokaryotic organisms originating from the Arctic and Atlantic oceans, along with gene prediction and functional annotation for MAGs from both domains. Eleven samples from the chlorophyll-a maximum layer of the surface ocean were collected during two cruises in 2012; six from the Arctic in June-July on ARK-XXVII/1 (PS80), and five from the Atlantic in November on ANT-XXIX/1 (PS81). Sequencing and assembly were carried out by the Joint Genome Institute (JGI), who provide annotation of the assembled sequences, and 122 MAGs for prokaryotic organisms. A subsequent binning process identified 21 MAGs for eukaryotic organisms, mostly identified as Mamiellophyceae or Bacillariophyceae. The data for each MAG include sequences in FASTA format, and tables of functional annotation of genes. For eukaryotic MAGs, transcript and protein sequences for predicted genes are available. A spreadsheet is provided summarising quality measures and taxonomic classifications for each MAG. These data provide draft genomes for uncultured marine microbes, including some of the first MAGs for polar eukaryotes, and can provide reference genetic data for these environments, or be used in genomics-based comparisons between environments.

5.
Genes (Basel) ; 14(2), 2023 01 30.
Article in English | MEDLINE | ID: mdl-36833289

ABSTRACT

Ice-binding proteins (IBPs) are a group of ecologically and biotechnologically relevant enzymes produced by psychrophilic organisms. Although putative IBPs containing the domain of unknown function (DUF) 3494 have been identified in many taxa of polar microbes, our knowledge of their genetic and structural diversity in natural microbial communities is limited. Here, we used samples from sea ice and sea water collected in the central Arctic Ocean as part of the MOSAiC expedition for metagenome sequencing and the subsequent analyses of metagenome-assembled genomes (MAGs). By linking structurally diverse IBPs to particular environments and potential functions, we reveal that IBP sequences are enriched in interior ice, have diverse genomic contexts and cluster taxonomically. Their diverse protein structures may be a consequence of domain shuffling, leading to variable combinations of protein domains in IBPs and probably reflecting the functional versatility required to thrive in the extreme and variable environment of the central Arctic Ocean.


Subject(s)
Carrier Proteins , Prokaryotic Cells , Protein Domains , Seawater , Oceans and Seas
6.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1289-1297, 2023.
Article in English | MEDLINE | ID: mdl-35737633

ABSTRACT

A rooted phylogenetic network is a directed acyclic graph with a single root, whose sinks correspond to a set of species. As such networks are useful for representing the evolution of species that have undergone reticulate evolution, there has been great interest in developing the theory behind and algorithms for constructing them. However, unlike evolutionary trees, these networks can be highly non-planar, which can make them difficult to visualise and interpret. Here we investigate properties of planar rooted phylogenetic networks and algorithms for deciding whether or not rooted networks have certain special planarity properties. In particular, we introduce three natural subclasses of planar rooted phylogenetic networks and show that they form a hierarchy. In addition, for the well-known level-k networks, we show that level-1, -2, -3 networks are always outer, terminal, and upward planar, respectively, and that level-4 networks are not necessarily planar. Finally, we show that a regular network is terminal planar if and only if it is pyramidal. Our results make use of the highly developed field of planar digraphs, and we believe that the link between phylogenetic networks and planar graphs should prove useful in future for developing new approaches to both construct and visualise phylogenetic networks.


Subject(s)
Algorithms , Phylogeny
7.
Microbiome ; 10(1): 67, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484634

ABSTRACT

BACKGROUND: Phytoplankton communities significantly contribute to global biogeochemical cycles of elements and underpin marine food webs. Although their uncultured genomic diversity has been estimated by planetary-scale metagenome sequencing and subsequent reconstruction of metagenome-assembled genomes (MAGs), this approach has yet to be applied for complex phytoplankton microbiomes from polar and non-polar oceans consisting of microbial eukaryotes and their associated prokaryotes. RESULTS: Here, we have assembled MAGs from chlorophyll a maximum layers in the surface of the Arctic and Atlantic Oceans enriched for species associations (microbiomes) with a focus on pico- and nanophytoplankton and their associated heterotrophic prokaryotes. From 679 Gbp and an estimated 50 million genes in total, we recovered 143 MAGs of medium to high quality. Although there was a strict demarcation between Arctic and Atlantic MAGs, adjacent sampling stations in each ocean had 51-88% MAGs in common with most species associations between Prasinophytes and Proteobacteria. Phylogenetic placement revealed eukaryotic MAGs to be more diverse in the Arctic whereas prokaryotic MAGs were more diverse in the Atlantic Ocean. Approximately 70% of protein families were shared between Arctic and Atlantic MAGs for both prokaryotes and eukaryotes. However, eukaryotic MAGs had more protein families unique to the Arctic whereas prokaryotic MAGs had more families unique to the Atlantic. CONCLUSION: Our study provides a genomic context to complex phytoplankton microbiomes to reveal that their community structure was likely driven by significant differences in environmental conditions between the polar Arctic and warm surface waters of the tropical and subtropical Atlantic Ocean.


Subject(s)
Metagenome , Microbiota , Atlantic Ocean , Chlorophyll A , Eukaryota/genetics , Metagenome/genetics , Microbiota/genetics , Phylogeny , Phytoplankton/genetics
8.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3374-3383, 2022.
Article in English | MEDLINE | ID: mdl-34559659

ABSTRACT

Current microRNA (miRNA) prediction methods are generally based on annotation criteria that tend to miss potential functional miRNAs. Recently, new miRNA annotation criteria have been proposed that could lead to improvements in miRNA prediction methods in plants. Here, we investigate the effect of the new criteria on miRNA prediction in Arabidopsis thaliana and present a new degradome-assisted functional miRNA prediction approach. We investigated the effect by applying the new criteria, and a more permissive set of criteria, to miRNA prediction using existing miRNA prediction tools. We also developed an approach to miRNA prediction that is assisted by the functional information extracted from the analysis of degradome sequencing. We demonstrate the improved performance of degradome-assisted miRNA prediction compared to unassisted prediction and evaluate the approach using miRNA differential expression analysis. We observe how the miRNA predictions fit under the different criteria and show a potential novel miRNA that has been missed within Arabidopsis thaliana. Additionally, we introduce a freely available software tool, 'PAREfirst', that employs the degradome-assisted approach. The study shows that some miRNAs could be missed due to the stringency of the former annotation criteria, and combining a degradome-assisted approach with more permissive miRNA criteria can expand confident miRNA predictions.


Subject(s)
Arabidopsis , MicroRNAs , MicroRNAs/genetics , MicroRNAs/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Plants/genetics , Software , Sequence Analysis, RNA , Gene Expression Regulation, Plant , High-Throughput Nucleotide Sequencing
9.
Nat Commun ; 12(1): 5483, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34531387

ABSTRACT

Eukaryotic phytoplankton are responsible for at least 20% of annual global carbon fixation. Their diversity and activity are shaped by interactions with prokaryotes as part of complex microbiomes. Although differences in their local species diversity have been estimated, we still have a limited understanding of environmental conditions responsible for compositional differences between local species communities on a large scale from pole to pole. Here, we show, based on pole-to-pole phytoplankton metatranscriptomes and microbial rDNA sequencing, that environmental differences between polar and non-polar upper oceans most strongly impact the large-scale spatial pattern of biodiversity and gene activity in algal microbiomes. The geographic differentiation of co-occurring microbes in algal microbiomes can be well explained by the latitudinal temperature gradient and associated break points in their beta diversity, with an average breakpoint at 14 °C ± 4.3, separating cold and warm upper oceans. As global warming impacts upper ocean temperatures, we project that break points of beta diversity move markedly pole-wards. Hence, abrupt regime shifts in algal microbiomes could be caused by anthropogenic climate change.


Subject(s)
Genetic Variation , Microalgae/genetics , Microbiota/genetics , Phytoplankton/genetics , Transcriptome/genetics , Antarctic Regions , Arctic Regions , Biodiversity , Carbon Cycle , Climate Change , Gene Ontology , Geography , Global Warming , Microalgae/classification , Microalgae/growth & development , Oceans and Seas , Phytoplankton/classification , Phytoplankton/growth & development , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 18S/genetics , Sequence Analysis, DNA/methods , Species Specificity , Temperature
10.
J Math Biol ; 82(5): 40, 2021 03 26.
Article in English | MEDLINE | ID: mdl-33770290

ABSTRACT

Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers; the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal-hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n.


Subject(s)
Models, Genetic , Phylogeny , Algorithms , Humans , Hybridization, Genetic
11.
Genes (Basel) ; 11(7), 2020 07 16.
Article in English | MEDLINE | ID: mdl-32708551

ABSTRACT

The highly heterogeneous clinical course of human prostate cancer has prompted the development of multiple RNA biomarkers and diagnostic tools to predict outcome for individual patients. Biomarker discovery is often unstable with, for example, small changes in discovery dataset configuration resulting in large alterations in biomarker composition. Our hypothesis, which forms the basis of this current study, is that highly significant overlaps occurring between gene signatures obtained using entirely different approaches indicate genes fundamental for controlling cancer progression. For prostate cancer, we found two sets of signatures that had significant overlaps suggesting important genes (p < 10^-34 for paired overlaps, hypergeometric test). These overlapping signatures defined a core set of genes linking hormone signalling (HES6-AR), cell cycle progression (Prolaris) and a molecular subgroup of patients (PCS1) derived by Non-Negative Matrix Factorization (NNMF) of control pathways, together designated as SIG-HES6. The second set (designated SIG-DESNT) consisted of the DESNT diagnostic signature and a second NNMF signature PCS3. Stratifications using SIG-HES6 (HES6, PCS1, Prolaris) and SIG-DESNT (DESNT) classifiers frequently detected the same individual high-risk cancers, indicating that the underlying mechanisms associated with SIG-HES6 and SIG-DESNT may act together to promote aggressive cancer development. We show that the use of combinations of a SIG-HES6 signature together with DESNT substantially increases the ability to predict poor outcome, and we propose a model for prostate cancer development involving co-operation between the SIG-HES6 and SIG-DESNT pathways that has implications for therapeutic design.


Subject(s)
Biomarkers, Tumor/genetics , Prostatic Neoplasms , Transcriptome , Biomarkers, Tumor/analysis , Cohort Studies , Datasets as Topic/statistics & numerical data , Disease Progression , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Male , Microarray Analysis , Neoplasm Invasiveness , Prognosis , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology
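
The overlap significance quoted in this abstract comes from a hypergeometric test. As a rough sketch of how such a p-value is computed, the function below returns the probability of seeing at least the observed overlap when two gene sets of the given sizes are drawn at random from a common universe. The function name and all numbers in the example are hypothetical and are not the study's data.

```python
from math import comb

def overlap_pvalue(universe, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of genes shared by a fixed set of size_a genes and a
    random set of size_b genes, both drawn from `universe` genes.
    """
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    hits = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total

# Hypothetical toy case: 10 genes in total, two signatures of 3 genes, all 3 shared.
print(overlap_pvalue(10, 3, 3, 3))
# Hypothetical larger case: 20000 genes, signatures of 100 and 150 genes, 30 shared.
print(overlap_pvalue(20000, 100, 150, 30))
```

Exact integer arithmetic via `math.comb` is used here so that very small tail probabilities are not lost to floating-point underflow before the final division.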
12.
Algorithms Mol Biol ; 15: 12, 2020.
Article in English | MEDLINE | ID: mdl-32508979

ABSTRACT

The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree, to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.

13.
Nucleic Acids Res ; 48(12): 6481-6490, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32463462

ABSTRACT

Natural antisense transcript-derived small interfering RNAs (nat-siRNAs) are a class of functional small RNA (sRNA) that have been found in both the plant and animal kingdoms. In plants, these sRNAs have been shown to suppress the translation of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex (RISC) to their sequence-specific mRNA target(s). Current computational tools for classification of nat-siRNAs are limited in number and can be computationally infeasible to use. In addition, current methods do not provide any indication of the function of the predicted nat-siRNAs. Here, we present a new software pipeline, called NATpare, for prediction and functional analysis of nat-siRNAs using sRNA and degradome sequencing data. Based on our benchmarking in multiple plant species, NATpare substantially reduces the time required to perform prediction with minimal resource requirements allowing for comprehensive analysis of nat-siRNAs in larger and more complex organisms for the first time. We then exemplify the use of NATpare by identifying tissue and stress specific nat-siRNAs in multiple Arabidopsis thaliana datasets.


Subject(s)
RNA, Plant/genetics , RNA, Small Interfering/chemistry , Sequence Analysis, RNA/methods , Software , Arabidopsis , RNA Interference , RNA, Plant/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism
14.
Br J Cancer ; 122(10): 1467-1476, 2020 05.
Article in English | MEDLINE | ID: mdl-32203215

ABSTRACT

BACKGROUND: Unsupervised learning methods, such as Hierarchical Cluster Analysis, are commonly used for the analysis of genomic platform data. Unfortunately, such approaches ignore the well-documented heterogeneous composition of prostate cancer samples. Our aim is to use more sophisticated analytical approaches to deconvolute the structure of prostate cancer transcriptome data, providing novel clinically actionable information for this disease. METHODS: We apply an unsupervised model called Latent Process Decomposition (LPD), which can handle heterogeneity within individual cancer samples, to genome-wide expression data from eight prostate cancer clinical series, including 1,785 malignant samples with the clinical endpoints of PSA failure and metastasis. RESULTS: We show that PSA failure is correlated with the level of an expression signature called DESNT (HR = 1.52, 95% CI = [1.36, 1.7], P = 9.0 × 10^-14, Cox model), and that patients with a majority DESNT signature have an increased metastatic risk (χ² test, P = 0.0017, and P = 0.0019). In addition, we develop a stratification framework that incorporates DESNT and identifies three novel molecular subtypes of prostate cancer. CONCLUSIONS: These results highlight the importance of using more complex approaches for the analysis of genomic data, may assist drug targeting, and have allowed the construction of a nomogram combining DESNT with other clinical factors for use in clinical management.


Subject(s)
Biomarkers, Tumor/blood , Gene Expression Profiling/statistics & numerical data , Prostatic Neoplasms/genetics , Transcriptome/genetics , Gene Expression Regulation, Neoplastic/genetics , Genomics/statistics & numerical data , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Prognosis , Progression-Free Survival , Proportional Hazards Models , Prostate-Specific Antigen/blood , Prostatic Neoplasms/blood , Prostatic Neoplasms/pathology , Risk Assessment , Risk Factors
15.
Nucleic Acids Res ; 48(5): 2258-2270, 2020 03 18.
Article in English | MEDLINE | ID: mdl-31943065

ABSTRACT

MicroRNAs (miRNAs) are short, non-coding RNAs that modulate the translation-rate of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex to sequence-specific targets. In plants, this typically results in cleavage and subsequent degradation of the mRNA. Degradome sequencing is a high-throughput technique developed to capture cleaved mRNA fragments and thus can be used to support miRNA target prediction. The current criteria used for miRNA target prediction were inferred on a limited number of experimentally validated A. thaliana interactions and were adapted to fit these specific interactions; thus, these fixed criteria may not be optimal across all datasets (organisms, tissues or treatments). We present a new tool, PAREameters, for inferring targeting criteria from small RNA and degradome sequencing datasets. We evaluate its performance using a more extensive set of experimentally validated interactions in multiple A. thaliana datasets. We also perform comprehensive analyses to highlight and quantify the differences between subsets of miRNA-mRNA interactions in model and non-model organisms. Our results show increased sensitivity in A. thaliana when using the PAREameters inferred criteria and that using data-driven criteria enables the identification of additional interactions that further our understanding of the RNA silencing pathway in both model and non-model organisms.


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Gene Expression Regulation, Plant , MicroRNAs/genetics , RNA, Messenger/genetics , RNA, Plant/genetics , Software , Arabidopsis/metabolism , Base Sequence , Datasets as Topic , Flowers/genetics , Flowers/metabolism , High-Throughput Nucleotide Sequencing , MicroRNAs/metabolism , Plant Leaves/genetics , Plant Leaves/metabolism , RNA Cleavage , RNA, Messenger/metabolism , RNA, Plant/metabolism , Sensitivity and Specificity , Sequence Analysis, RNA , Transcriptome
16.
J Ind Microbiol Biotechnol ; 47(1): 1-20, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31691030

ABSTRACT

Denitrification is one of the key processes of the global nitrogen (N) cycle driven by bacteria. It has been widely known for more than 100 years as a process by which the biogeochemical N-cycle is balanced. To study this process, we develop an individual-based model called INDISIM-Denitrification. The model embeds a thermodynamic model for bacterial yield prediction inside the individual-based model INDISIM and is designed to simulate the cell growth kinetics of denitrifying bacteria under aerobic and anaerobic conditions. INDISIM-Denitrification simulates a bioreactor that contains a culture medium with succinate as a carbon source, ammonium as a nitrogen source and various electron acceptors. To implement INDISIM-Denitrification, the individual-based model INDISIM was used to give sub-models for nutrient uptake, stirring and reproduction cycle. Using a thermodynamic approach, the denitrification pathway, cellular maintenance and individual mass degradation were modeled using microbial metabolic reactions. These equations are the basis of the sub-models for metabolic maintenance, individual mass synthesis and reducing internal cytotoxic products. The model was implemented in the open-access platform NetLogo. INDISIM-Denitrification is validated using a set of experimental data of two denitrifying bacteria in two different experimental conditions. This provides an interactive tool to study the denitrification process carried out by any denitrifying bacterium since INDISIM-Denitrification allows changes in the microbial empirical formula and in the energy-transfer-efficiency used to represent the metabolic pathways involved in the denitrification process. The simulator can be obtained from the authors on request.


Subject(s)
Denitrification , Ammonium Compounds/metabolism , Bacteria/metabolism , Bioreactors/microbiology , Carbon/metabolism , Nitrogen/metabolism , Thermodynamics
17.
J Math Biol ; 79(5): 1885-1925, 2019 10.
Article in English | MEDLINE | ID: mdl-31410552

ABSTRACT

Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.


Subject(s)
Evolution, Molecular , Gene Regulatory Networks , Models, Genetic , Phylogeny , Algorithms , Gene Duplication , Genetic Speciation , Mathematical Concepts
18.
Bull Math Biol ; 81(10): 3823-3863, 2019 10.
Article in English | MEDLINE | ID: mdl-31297691

ABSTRACT

Network reconstruction lies at the heart of phylogenetic research. Two well-studied classes of phylogenetic networks include tree-child networks and level-k networks. In a tree-child network, every non-leaf node has a child that is a tree node or a leaf. In a level-k network, the maximum number of reticulations contained in a biconnected component is k. Here, we show that level-k tree-child networks are encoded by their reticulate-edge-deleted subnetworks, which are subnetworks obtained by deleting a single reticulation edge, if [Formula: see text]. Following this, we provide a polynomial-time algorithm for uniquely reconstructing such networks from their reticulate-edge-deleted subnetworks. Moreover, we show that this can even be done when considering subnetworks obtained by deleting one reticulation edge from each biconnected component with k reticulations.


Subject(s)
Algorithms , Phylogeny , Computational Biology , Evolution, Molecular , Mathematical Concepts , Models, Genetic
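
The tree-child property defined in this abstract (every non-leaf node has a child that is a tree node or a leaf) is easy to test on a small network. The following is a minimal sketch under stated assumptions: the network is given as a hypothetical dictionary from each node to its list of children, and a reticulation is taken to be any node with in-degree at least 2. It illustrates the definition only and is not the reconstruction algorithm of the paper.

```python
def is_tree_child(children):
    """Return True if every non-leaf node has a child that is a tree node or a leaf.

    `children` maps each node to the list of its children; leaves map to [].
    """
    indegree = {v: 0 for v in children}
    for kids in children.values():
        for c in kids:
            indegree[c] = indegree.get(c, 0) + 1

    for kids in children.values():
        if not kids:
            continue  # leaves impose no condition
        # A child qualifies if it is a leaf or has at most one parent (a tree node).
        if not any(indegree[c] <= 1 or not children.get(c) for c in kids):
            return False
    return True

# Hypothetical network with one reticulation h below the tree nodes u and v.
one_reticulation = {
    "r": ["u", "v"], "u": ["a", "h"], "v": ["h", "b"],
    "h": ["c"], "a": [], "b": [], "c": [],
}

# Hypothetical network in which both children of u (and of v) are reticulations.
stacked = {
    "r": ["u", "v"], "u": ["h1", "h2"], "v": ["h1", "h2"],
    "h1": ["a"], "h2": ["b"], "a": [], "b": [],
}

print(is_tree_child(one_reticulation))
print(is_tree_child(stacked))
```

The first example passes because each internal node keeps at least one child outside the reticulations; the second fails at u and v, whose children are all reticulations.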
19.
Syst Biol ; 68(5): 717-729, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30668824

ABSTRACT

Introgression is an evolutionary process which provides an important source of innovation for evolution. Although various methods have been used to detect introgression, very few methods are currently available for constructing evolutionary histories involving introgression. In this article, we propose a new method for constructing such evolutionary histories whose starting point is a species forest (consisting of a collection of lineage trees, usually arising as a collection of clades or monophyletic groups in a species tree), and a gene tree for a specific allele of interest, or allele tree for short. Our method is based on representing introgression in terms of a certain "overlay" of the allele tree over the lineage trees, called an overlaid species forest (OSF). OSFs are similar to phylogenetic networks although a key difference is that they typically have multiple roots because each monophyletic group in the species tree has a different point of origin. Employing a new model for introgression, we derive an efficient algorithm for building OSFs called OSF-Builder that is guaranteed to return an optimal OSF in the sense that the number of potential introgression events is minimized. As well as using simulations to assess the performance of OSF-Builder, we illustrate its use on a butterfly data set in which introgression has been previously inferred. The OSF-Builder software is available for download from https://www.uea.ac.uk/computing/software/OSF-Builder.


Subject(s)
Biological Evolution , Classification/methods , Software
20.
Bull Math Biol ; 81(2): 598-617, 2019 02.
Article in English | MEDLINE | ID: mdl-29589255

ABSTRACT

Given a collection [Formula: see text] of subsets of a finite set X, we say that [Formula: see text] is phylogenetically flexible if, for any collection R of rooted phylogenetic trees whose leaf sets comprise the collection [Formula: see text], R is compatible (i.e. there is a rooted phylogenetic X-tree that displays each tree in R). We show that [Formula: see text] is phylogenetically flexible if and only if it satisfies a Hall-type inequality condition of being 'slim'. Using submodularity arguments, we show that there is a polynomial-time algorithm for determining whether or not [Formula: see text] is slim. This 'slim' condition reduces to a simpler inequality in the case where all of the sets in [Formula: see text] have size 3, a property we call 'thin'. Thin sets were recently shown to be equivalent to the existence of an (unrooted) tree for which the median function provides an injective mapping to its vertex set; we show here that the unrooted tree in this representation can always be chosen to be a caterpillar tree. We also characterise when a collection [Formula: see text] of subsets of size 2 is thin (in terms of the flexibility of total orders rather than phylogenies) and show that this holds if and only if an associated bipartite graph is a forest. The significance of our results for phylogenetics is in providing precise and efficiently verifiable conditions under which supertree methods that require consistent inputs of trees can be applied to any input trees on given subsets of species.


Subject(s)
Models, Genetic , Phylogeny , Algorithms , Computational Biology , Evolution, Molecular , Genomics/statistics & numerical data , Mathematical Concepts , Models, Statistical