Results 1 - 20 of 125
1.
J Math Biol ; 88(2): 17, 2024 01 18.
Article in English | MEDLINE | ID: mdl-38238584

ABSTRACT

Convergent evolution is an important process in which independent species evolve similar features usually over a long period of time. It occurs with many different species across the tree of life, and is often caused by the fact that species have to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions it follows that we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model's parameters for when it is still possible to recover the underlying tree and also its height, even in case the modified distance is not a tree metric.


Subject(s)
Evolution, Molecular , Models, Genetic , Phylogeny
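
The abstract of entry 1 rests on two notions, an ultrametric distance and a tree metric, and both can be tested directly on a distance matrix. The sketch below is only an illustration and not the authors' model: the function names `is_ultrametric` and `is_tree_metric` and both example matrices are hypothetical. It assumes a symmetric matrix with zero diagonal and uses the standard three-point condition (in every triple, the two largest distances are equal) and the four-point condition, checked over all quadruples, repeats included, so that the triangle inequality is covered as well.

```python
from itertools import combinations, product

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances coincide."""
    for x, y, z in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[x][y], d[x][z], d[y][z]))
        if largest - middle > tol:
            return False
    return True

def is_tree_metric(d, tol=1e-9):
    """Four-point condition over all quadruples (repeated points allowed)."""
    for w, x, y, z in product(range(len(d)), repeat=4):
        bound = max(d[w][y] + d[x][z], d[w][z] + d[x][y])
        if d[w][x] + d[y][z] > bound + tol:
            return False
    return True

# Hypothetical equidistant tree ((a,b),(c,d)); taxa are indexed a=0, b=1, c=2, d=3.
clock_like = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]

# Same matrix with the a-c distance shortened by hand (illustrative only,
# not generated by the model described in the paper).
shortened = [row[:] for row in clock_like]
shortened[0][2] = shortened[2][0] = 3

print(is_ultrametric(clock_like), is_tree_metric(clock_like))
print(is_ultrametric(shortened), is_tree_metric(shortened))
```

The quadruple loop is O(n^4), which is acceptable for a small illustration but not for large collections of species.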
2.
Ann Comb ; 28(1): 1-32, 2024.
Article in English | MEDLINE | ID: mdl-38433929

ABSTRACT

An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X, that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.

3.
Nature ; 541(7638): 536-540, 2017 01 26.
Article in English | MEDLINE | ID: mdl-28092920

ABSTRACT

The Southern Ocean houses a diverse and productive community of organisms. Unicellular eukaryotic diatoms are the main primary producers in this environment, where photosynthesis is limited by low concentrations of dissolved iron and large seasonal fluctuations in light, temperature and the extent of sea ice. How diatoms have adapted to this extreme environment is largely unknown. Here we present insights into the genome evolution of a cold-adapted diatom from the Southern Ocean, Fragilariopsis cylindrus, based on a comparison with temperate diatoms. We find that approximately 24.7 per cent of the diploid F. cylindrus genome consists of genetic loci with alleles that are highly divergent (15.1 megabases of the total genome size of 61.1 megabases). These divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2. Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation. Divergent alleles may be involved in adaptation to environmental fluctuations in the Southern Ocean.


Subject(s)
Acclimatization/genetics , Cold Temperature , Diatoms/genetics , Evolution, Molecular , Genome/genetics , Genomics , Alleles , Carbon Dioxide/metabolism , Darkness , Diatoms/metabolism , Freezing , Gene Expression Profiling , Genetic Drift , Ice Cover , Iron/metabolism , Mutation Rate , Oceans and Seas , Phylogeny , Recombination, Genetic , Transcriptome/genetics
4.
Nucleic Acids Res ; 48(12): 6481-6490, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32463462

ABSTRACT

Natural antisense transcript-derived small interfering RNAs (nat-siRNAs) are a class of functional small RNA (sRNA) that have been found in both the plant and animal kingdoms. In plants, these sRNAs have been shown to suppress the translation of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex (RISC) to their sequence-specific mRNA target(s). Current computational tools for classification of nat-siRNAs are limited in number and can be computationally infeasible to use. In addition, current methods do not provide any indication of the function of the predicted nat-siRNAs. Here, we present a new software pipeline, called NATpare, for prediction and functional analysis of nat-siRNAs using sRNA and degradome sequencing data. Based on our benchmarking in multiple plant species, NATpare substantially reduces the time required to perform prediction with minimal resource requirements, allowing for comprehensive analysis of nat-siRNAs in larger and more complex organisms for the first time. We then exemplify the use of NATpare by identifying tissue- and stress-specific nat-siRNAs in multiple Arabidopsis thaliana datasets.


Subject(s)
RNA, Plant/genetics , RNA, Small Interfering/chemistry , Sequence Analysis, RNA/methods , Software , Arabidopsis , RNA Interference , RNA, Plant/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism
5.
Nucleic Acids Res ; 48(5): 2258-2270, 2020 03 18.
Article in English | MEDLINE | ID: mdl-31943065

ABSTRACT

MicroRNAs (miRNAs) are short, non-coding RNAs that modulate the translation-rate of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex to sequence-specific targets. In plants, this typically results in cleavage and subsequent degradation of the mRNA. Degradome sequencing is a high-throughput technique developed to capture cleaved mRNA fragments and thus can be used to support miRNA target prediction. The current criteria used for miRNA target prediction were inferred on a limited number of experimentally validated A. thaliana interactions and were adapted to fit these specific interactions; thus, these fixed criteria may not be optimal across all datasets (organisms, tissues or treatments). We present a new tool, PAREameters, for inferring targeting criteria from small RNA and degradome sequencing datasets. We evaluate its performance using a more extensive set of experimentally validated interactions in multiple A. thaliana datasets. We also perform comprehensive analyses to highlight and quantify the differences between subsets of miRNA-mRNA interactions in model and non-model organisms. Our results show increased sensitivity in A. thaliana when using the PAREameters inferred criteria and that using data-driven criteria enables the identification of additional interactions that further our understanding of the RNA silencing pathway in both model and non-model organisms.


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Gene Expression Regulation, Plant , MicroRNAs/genetics , RNA, Messenger/genetics , RNA, Plant/genetics , Software , Arabidopsis/metabolism , Base Sequence , Datasets as Topic , Flowers/genetics , Flowers/metabolism , High-Throughput Nucleotide Sequencing , MicroRNAs/metabolism , Plant Leaves/genetics , Plant Leaves/metabolism , RNA Cleavage , RNA, Messenger/metabolism , RNA, Plant/metabolism , Sensitivity and Specificity , Sequence Analysis, RNA , Transcriptome
6.
J Math Biol ; 82(5): 40, 2021 03 26.
Article in English | MEDLINE | ID: mdl-33770290

ABSTRACT

Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers; the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal-hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n.


Subject(s)
Models, Genetic , Phylogeny , Algorithms , Humans , Hybridization, Genetic
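
Entry 6 relates its fork-picking sequences to the well-studied cherry-picking sequences, whose basic ingredient is the cherry: a pair of leaves sharing a parent. The sketch below only illustrates that ingredient and is not the paper's fork-picking construction. The nested-tuple tree encoding, the function name `cherries`, and the two example trees are assumptions made for illustration.

```python
def cherries(tree):
    """Return the cherries of a rooted tree as a set of frozensets of leaf labels.

    A tree is a nested tuple; any non-tuple value is a leaf label.
    """
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return  # a leaf has no children to inspect
        # A cherry is an internal node with exactly two children, both leaves.
        if len(node) == 2 and not any(isinstance(c, tuple) for c in node):
            found.add(frozenset(node))
        for child in node:
            visit(child)

    visit(tree)
    return found

# Two hypothetical trees on the same leaf set {a, b, c, d, e}.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))

# Cherries shared by both trees.
common = cherries(t1) & cherries(t2)
print(common)
```

Here only the cherry {d, e} is common to both trees; a cherry-picking procedure would typically reduce such a shared cherry in both trees before continuing.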
7.
Br J Cancer ; 122(10): 1467-1476, 2020 05.
Article in English | MEDLINE | ID: mdl-32203215

ABSTRACT

BACKGROUND: Unsupervised learning methods, such as Hierarchical Cluster Analysis, are commonly used for the analysis of genomic platform data. Unfortunately, such approaches ignore the well-documented heterogeneous composition of prostate cancer samples. Our aim is to use more sophisticated analytical approaches to deconvolute the structure of prostate cancer transcriptome data, providing novel clinically actionable information for this disease. METHODS: We apply an unsupervised model called Latent Process Decomposition (LPD), which can handle heterogeneity within individual cancer samples, to genome-wide expression data from eight prostate cancer clinical series, including 1,785 malignant samples with the clinical endpoints of PSA failure and metastasis. RESULTS: We show that PSA failure is correlated with the level of an expression signature called DESNT (HR = 1.52, 95% CI = [1.36, 1.7], P = 9.0 × 10⁻¹⁴, Cox model), and that patients with a majority DESNT signature have an increased metastatic risk (χ² test, P = 0.0017, and P = 0.0019). In addition, we develop a stratification framework that incorporates DESNT and identifies three novel molecular subtypes of prostate cancer. CONCLUSIONS: These results highlight the importance of using more complex approaches for the analysis of genomic data, may assist drug targeting, and have allowed the construction of a nomogram combining DESNT with other clinical factors for use in clinical management.


Subject(s)
Biomarkers, Tumor/blood , Gene Expression Profiling/statistics & numerical data , Prostatic Neoplasms/genetics , Transcriptome/genetics , Gene Expression Regulation, Neoplastic/genetics , Genomics/statistics & numerical data , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Prognosis , Progression-Free Survival , Proportional Hazards Models , Prostate-Specific Antigen/blood , Prostatic Neoplasms/blood , Prostatic Neoplasms/pathology , Risk Assessment , Risk Factors
8.
Syst Biol ; 68(4): 607-618, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30418649

ABSTRACT

Tree reconciliation is the mathematical tool that is used to investigate the coevolution of organisms, such as hosts and parasites. A common approach to tree reconciliation involves specifying a model that assigns costs to certain events, such as cospeciation, and then tries to find a mapping between two specified phylogenetic trees which minimizes the total cost of the implied events. For such models, it has been shown that there may be a huge number of optimal solutions, or at least solutions that are close to optimal. It is therefore of interest to be able to systematically compare and visualize whole collections of reconciliations between a specified pair of trees. In this article, we consider various metrics on the set of all possible reconciliations between a pair of trees, some that have been defined before but also new metrics that we shall propose. We show that the diameter for the resulting spaces of reconciliations can in some cases be determined theoretically, information that we use to normalize and compare properties of the metrics. We also implement the metrics and compare their behavior on several host parasite data sets, including the shapes of their distributions. In addition, we show that in combination with multidimensional scaling, the metrics can be useful for visualizing large collections of reconciliations, much in the same way as phylogenetic tree metrics can be used to explore collections of phylogenetic trees. Implementations of the metrics can be downloaded from: https://team.inria.fr/erable/en/team-members/blerina-sinaimeri/reconciliation-distances/.


Subject(s)
Classification/methods , Host-Parasite Interactions/physiology , Phylogeny , Models, Biological
9.
Syst Biol ; 68(5): 717-729, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30668824

ABSTRACT

Introgression is an evolutionary process which provides an important source of innovation for evolution. Although various methods have been used to detect introgression, very few methods are currently available for constructing evolutionary histories involving introgression. In this article, we propose a new method for constructing such evolutionary histories whose starting point is a species forest (consisting of a collection of lineage trees, usually arising as a collection of clades or monophyletic groups in a species tree), and a gene tree for a specific allele of interest, or allele tree for short. Our method is based on representing introgression in terms of a certain "overlay" of the allele tree over the lineage trees, called an overlaid species forest (OSF). OSFs are similar to phylogenetic networks although a key difference is that they typically have multiple roots because each monophyletic group in the species tree has a different point of origin. Employing a new model for introgression, we derive an efficient algorithm for building OSFs called OSF-Builder that is guaranteed to return an optimal OSF in the sense that the number of potential introgression events is minimized. As well as using simulations to assess the performance of OSF-Builder, we illustrate its use on a butterfly data set in which introgression has been previously inferred. The OSF-Builder software is available for download from https://www.uea.ac.uk/computing/software/OSF-Builder.


Subject(s)
Biological Evolution , Classification/methods , Software
10.
Nucleic Acids Res ; 46(17): 8730-8739, 2018 09 28.
Article in English | MEDLINE | ID: mdl-30007348

ABSTRACT

Small RNAs (sRNAs) are short, non-coding RNAs that play critical roles in many important biological pathways. They suppress the translation of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex to their sequence-specific mRNA target(s). In plants, this typically results in mRNA cleavage and subsequent degradation of the mRNA. The resulting mRNA fragments, or degradome, provide evidence for these interactions, and thus degradome analysis has become an important tool for sRNA target prediction. Even so, with the continuing advances in sequencing technologies, not only are larger and more complex genomes being sequenced, but also degradome and associated datasets are growing both in number and read count. As a result, existing degradome analysis tools are unable to process the volume of data being produced without imposing huge resource and time requirements. Moreover, these tools use stringent, non-configurable targeting rules, which reduces their flexibility. Here, we present a new and user configurable software tool for degradome analysis, which employs a novel search algorithm and sequence encoding technique to reduce the search space during analysis. The tool significantly reduces the time and resources required to perform degradome analysis, in some cases providing more than two orders of magnitude speed-up over current methods.


Subject(s)
Computational Biology/methods , RNA Stability , RNA, Messenger/metabolism , RNA, Plant/metabolism , RNA, Small Interfering/metabolism , Software , Algorithms , Arabidopsis/genetics , Base Sequence , Benchmarking , Datasets as Topic , Gene Library , High-Throughput Nucleotide Sequencing/methods , RNA Interference , Sequence Alignment
11.
J Ind Microbiol Biotechnol ; 47(1): 1-20, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31691030

ABSTRACT

Denitrification is one of the key processes of the global nitrogen (N) cycle driven by bacteria. It has been widely known for more than 100 years as a process by which the biogeochemical N-cycle is balanced. To study this process, we develop an individual-based model called INDISIM-Denitrification. The model embeds a thermodynamic model for bacterial yield prediction inside the individual-based model INDISIM and is designed to simulate in aerobic and anaerobic conditions the cell growth kinetics of denitrifying bacteria. INDISIM-Denitrification simulates a bioreactor that contains a culture medium with succinate as a carbon source, ammonium as nitrogen source and various electron acceptors. To implement INDISIM-Denitrification, the individual-based model INDISIM was used to give sub-models for nutrient uptake, stirring and reproduction cycle. Using a thermodynamic approach, the denitrification pathway, cellular maintenance and individual mass degradation were modeled using microbial metabolic reactions. These equations are the basis of the sub-models for metabolic maintenance, individual mass synthesis and reducing internal cytotoxic products. The model was implemented in the open-access platform NetLogo. INDISIM-Denitrification is validated using a set of experimental data of two denitrifying bacteria in two different experimental conditions. This provides an interactive tool to study the denitrification process carried out by any denitrifying bacterium since INDISIM-Denitrification allows changes in the microbial empirical formula and in the energy-transfer-efficiency used to represent the metabolic pathways involved in the denitrification process. The simulator can be obtained from the authors on request.


Subject(s)
Denitrification , Ammonium Compounds/metabolism , Bacteria/metabolism , Bioreactors/microbiology , Carbon/metabolism , Nitrogen/metabolism , Thermodynamics
12.
RNA ; 23(6): 823-835, 2017 06.
Article in English | MEDLINE | ID: mdl-28289155

ABSTRACT

Recently, high-throughput sequencing (HTS) has revealed compelling details about the small RNA (sRNA) population in eukaryotes. These 20 to 25 nt noncoding RNAs can influence gene expression by acting as guides for the sequence-specific regulatory mechanism known as RNA silencing. The increase in sequencing depth and number of samples per project enables a better understanding of the role sRNAs play by facilitating the study of expression patterns. However, the intricacy of the biological hypotheses coupled with a lack of appropriate tools often leads to inadequate mining of the available data and thus, an incomplete description of the biological mechanisms involved. To enable a comprehensive study of differential expression in sRNA data sets, we present a new interactive pipeline that guides researchers through the various stages of data preprocessing and analysis. This includes various tools, some of which we specifically developed for sRNA analysis, for quality checking and normalization of sRNA samples as well as tools for the detection of differentially expressed sRNAs and identification of the resulting expression patterns. The pipeline is available within the UEA sRNA Workbench, a user-friendly software package for the processing of sRNA data sets. We demonstrate the use of the pipeline on a H. sapiens data set; additional examples on a B. terrestris data set and on an A. thaliana data set are described in the Supplemental Information. A comparison with existing approaches is also included, which exemplifies some of the issues that need to be addressed for sRNA analysis and how the new pipeline may be used to do this.


Subject(s)
Computational Biology , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , RNA, Small Untranslated , Sequence Analysis, RNA , Software , Computational Biology/methods , Computational Biology/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Reproducibility of Results , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Workflow
13.
Bioinformatics ; 34(6): 1056-1057, 2018 03 15.
Article in English | MEDLINE | ID: mdl-29186450

ABSTRACT

Summary: Split-networks are a generalization of phylogenetic trees that have proven to be a powerful tool in phylogenetics. Various ways have been developed for computing such networks, including split-decomposition, NeighborNet, QNet and FlatNJ. Some of these approaches are implemented in the user-friendly SplitsTree software package. However, to give the user the option to adjust and extend these approaches and to facilitate their integration into analysis pipelines, there is a need for robust, open-source implementations of associated data structures and algorithms. Here, we present SPECTRE, a readily available, open-source library of data structures written in Java, that comes complete with new implementations of several pre-published algorithms and a basic interactive graphical interface for visualizing planar split networks. SPECTRE also supports the use of longer running algorithms by providing command line interfaces, which can be executed on servers or in High Performance Computing environments. Availability and implementation: Full source code is available under the GPLv3 license at: https://github.com/maplesond/SPECTRE. SPECTRE's core library is available from Maven Central at: https://mvnrepository.com/artifact/uk.ac.uea.cmp.spectre/core. Documentation is available at: http://spectre-suite-of-phylogenetic-tools-for-reticulate-evolution.readthedocs.io/en/latest/. Contact: sarah.bastkowski@earlham.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Phylogeny , Algorithms , Gene Library , Software
14.
Bioinformatics ; 34(19): 3382-3384, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29722807

ABSTRACT

Motivation: RNA interference, a highly conserved regulatory mechanism, is mediated via small RNAs (sRNA). Recent technical advances enabled the analysis of larger, complex datasets and the investigation of microRNAs and the less known small interfering RNAs. However, the size and intricacy of current data require a comprehensive set of tools able to discriminate the patterns from the low-level, noise-like variation. Numerous and varied suggestions from the community represent an invaluable source of ideas for future tools, so the ability of the community to contribute to this software is essential. Results: We present a new version of the UEA sRNA Workbench, reconfigured to allow an easy insertion of new tools/workflows. In its released form, it comprises a suite of tools in a user-friendly environment, with enhanced capabilities for comprehensive processing of sRNA-seq data, e.g. tools for accurate prediction of sRNA loci (CoLIde) and miRNA loci (miRCat2), as well as workflows to guide users through common steps such as quality checking of the input data, normalization of abundances and detection of differential expression, which represent the first steps in sRNA-seq analyses. Availability and implementation: The UEA sRNA Workbench is available at: http://srna-workbench.cmp.uea.ac.uk. The source code is available at: https://github.com/sRNAworkbenchuea/UEA_sRNA_Workbench. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
MicroRNAs/genetics , RNA, Small Interfering/genetics , Sequence Analysis, RNA/methods , Software , RNA Interference , Workflow
15.
Bull Math Biol ; 81(2): 598-617, 2019 02.
Article in English | MEDLINE | ID: mdl-29589255

ABSTRACT

Given a collection [Formula: see text] of subsets of a finite set X, we say that [Formula: see text] is phylogenetically flexible if, for any collection R of rooted phylogenetic trees whose leaf sets comprise the collection [Formula: see text], R is compatible (i.e. there is a rooted phylogenetic X-tree that displays each tree in R). We show that [Formula: see text] is phylogenetically flexible if and only if it satisfies a Hall-type inequality condition of being 'slim'. Using submodularity arguments, we show that there is a polynomial-time algorithm for determining whether or not [Formula: see text] is slim. This 'slim' condition reduces to a simpler inequality in the case where all of the sets in [Formula: see text] have size 3, a property we call 'thin'. Thin sets were recently shown to be equivalent to the existence of an (unrooted) tree for which the median function provides an injective mapping to its vertex set; we show here that the unrooted tree in this representation can always be chosen to be a caterpillar tree. We also characterise when a collection [Formula: see text] of subsets of size 2 is thin (in terms of the flexibility of total orders rather than phylogenies) and show that this holds if and only if an associated bipartite graph is a forest. The significance of our results for phylogenetics is in providing precise and efficiently verifiable conditions under which supertree methods that require consistent inputs of trees can be applied to any input trees on given subsets of species.


Subject(s)
Models, Genetic , Phylogeny , Algorithms , Computational Biology , Evolution, Molecular , Genomics/statistics & numerical data , Mathematical Concepts , Models, Statistical
16.
Bull Math Biol ; 81(10): 3823-3863, 2019 10.
Article in English | MEDLINE | ID: mdl-31297691

ABSTRACT

Network reconstruction lies at the heart of phylogenetic research. Two well-studied classes of phylogenetic networks include tree-child networks and level-k networks. In a tree-child network, every non-leaf node has a child that is a tree node or a leaf. In a level-k network, the maximum number of reticulations contained in a biconnected component is k. Here, we show that level-k tree-child networks are encoded by their reticulate-edge-deleted subnetworks, which are subnetworks obtained by deleting a single reticulation edge, if [Formula: see text]. Following this, we provide a polynomial-time algorithm for uniquely reconstructing such networks from their reticulate-edge-deleted subnetworks. Moreover, we show that this can even be done when considering subnetworks obtained by deleting one reticulation edge from each biconnected component with k reticulations.


Subject(s)
Algorithms , Phylogeny , Computational Biology , Evolution, Molecular , Mathematical Concepts , Models, Genetic
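
The tree-child property defined in entry 16 (every non-leaf node has a child that is a tree node or a leaf) can be checked with a single pass over a network. The sketch below is a minimal illustration of that definition only, not the reconstruction algorithm from the paper. The adjacency-dictionary encoding, the function name `is_tree_child`, and both example networks are assumptions; a reticulation is taken to be any node with at least two parents.

```python
from collections import Counter

def is_tree_child(children):
    """Check the tree-child property of a rooted network.

    `children` maps each non-leaf node to the list of its children;
    leaves may be omitted from the keys.
    """
    # In-degree of every node that appears as a child of some node.
    indegree = Counter(c for kids in children.values() for c in kids)
    for node, kids in children.items():
        # Violation: a non-leaf node all of whose children are reticulations.
        if kids and all(indegree[c] >= 2 for c in kids):
            return False
    return True

# Hypothetical network with one reticulation h; every internal node keeps
# at least one child that is a leaf or a tree node.
one_reticulation = {
    "root": ["u", "v"],
    "u": ["a", "h"],
    "v": ["h", "b"],
    "h": ["c"],
}

# Hypothetical network in which u and v have only reticulation children.
stacked = {
    "root": ["u", "v"],
    "u": ["h1", "h2"],
    "v": ["h1", "h2"],
    "h1": ["a"],
    "h2": ["b"],
}

print(is_tree_child(one_reticulation), is_tree_child(stacked))
```

The check runs in time linear in the number of arcs, since each arc is visited once when counting in-degrees and once when inspecting children.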
17.
J Math Biol ; 79(5): 1885-1925, 2019 10.
Article in English | MEDLINE | ID: mdl-31410552

ABSTRACT

Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.


Subject(s)
Evolution, Molecular , Gene Regulatory Networks , Models, Genetic , Phylogeny , Algorithms , Gene Duplication , Genetic Speciation , Mathematical Concepts
18.
Bioinformatics ; 33(16): 2446-2454, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28407097

ABSTRACT

MOTIVATION: MicroRNAs are a class of ∼21-22 nt small RNAs which are excised from a stable hairpin-like secondary structure. They have important gene regulatory functions and are involved in many pathways including developmental timing, organogenesis and development in eukaryotes. There are several computational tools for miRNA detection from next-generation sequencing datasets. However, many of these tools suffer from high false positive and false negative rates. Here we present a novel miRNA prediction algorithm, miRCat2. miRCat2 incorporates a new entropy-based approach to detect miRNA loci, which is designed to cope with the high sequencing depth of current next-generation sequencing datasets. It has a user-friendly interface and produces graphical representations of the hairpin structure and plots depicting the alignment of sequences on the secondary structure. RESULTS: We test miRCat2 on a number of animal and plant datasets and present a comparative analysis with miRCat, miRDeep2, miRPlant and miReap. We also use mutants in the miRNA biogenesis pathway to evaluate the predictions of these tools. Results indicate that miRCat2 has an improved accuracy compared with other methods tested. Moreover, miRCat2 predicts several new miRNAs that are differentially expressed in wild-type versus mutants in the miRNA biogenesis pathway. AVAILABILITY AND IMPLEMENTATION: miRCat2 is part of the UEA small RNA Workbench and is freely available from http://srna-workbench.cmp.uea.ac.uk/. CONTACT: v.moulton@uea.ac.uk or s.moxon@uea.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Genetic Loci , High-Throughput Nucleotide Sequencing/methods , MicroRNAs/genetics , Software , Algorithms , Animals , Entropy , Plants/genetics , Plants/metabolism , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
19.
J Theor Biol ; 446: 160-167, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29548737

ABSTRACT

Phylogenetic networks are an extension of phylogenetic trees which are used to represent evolutionary histories in which reticulation events (such as recombination and hybridization) have occurred. A central question for such networks is that of identifiability, which essentially asks under what circumstances can we reliably identify the phylogenetic network that gave rise to the observed data? Recently, identifiability results have appeared for networks relative to a model of sequence evolution that generalizes the standard Markov models used for phylogenetic trees. However, these results are quite limited in terms of the complexity of the networks that are considered. In this paper, by introducing an alternative probabilistic model for evolution along a network that is based on some ground-breaking work by Thatte for pedigrees, we are able to obtain an identifiability result for a much larger class of phylogenetic networks (essentially the class of so-called tree-child networks). To prove our main theorem, we derive some new results for identifying tree-child networks combinatorially, and then adapt some techniques developed by Thatte for pedigrees to show that our combinatorial results imply identifiability in the probabilistic setting. We hope that the introduction of our new model for networks could lead to new approaches to reliably construct phylogenetic networks.


Subject(s)
Algorithms , Evolution, Molecular , Models, Genetic , Mutation , Phylogeny , Recombination, Genetic
20.
Bull Math Biol ; 80(8): 2137-2153, 2018 08.
Article in English | MEDLINE | ID: mdl-29869043

ABSTRACT

An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set X of species from a collection of trees, each having leaf-set some subset of X. In the 1980s, Colonius and Schulze gave certain inference rules for deciding when a collection of 4-leaved trees, one for each 4-element subset of X, can be simultaneously displayed by a single supertree with leaf-set X. Recently, it has become of interest to extend this and related results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has recently been shown that a certain type of phylogenetic network, called an (unrooted) level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here, we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of X, can all be simultaneously displayed by a level-1 network with leaf-set X. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of X from orderings on subsets of X of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.


Subject(s)
Models, Biological , Phylogeny , Biological Evolution , Genetic Speciation , Mathematical Concepts