Results 1 - 10 of 10
1.
Bull Math Biol; 81(2): 431-451, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29392644

ABSTRACT

Distances between sequences based on their k-mer frequency counts can be used to reconstruct phylogenies without first computing a sequence alignment. Past work has shown that effective use of k-mer methods depends on (1) model-based corrections to distances based on k-mers and (2) breaking long sequences into blocks to obtain repeated trials from the sequence-generating process. Good performance of such methods depends on having many high-quality blocks with many homologous sites, which can be difficult to guarantee a priori. Nature provides a natural division of sequences into homologous regions, namely the genes. However, directly using past work in this setting is problematic because of possible discordance between different gene trees and the underlying species tree. Using the multispecies coalescent model as a basis, we derive model-based moment formulas that involve the species divergence times and the coalescent parameters. From this setting, we prove identifiability results for the tree and branch length parameters under the Jukes-Cantor model of sequence mutations.


Subjects
Genetic Models, Phylogeny, Algorithms, Computational Biology, Molecular Evolution, Markov Chains, Mathematical Concepts, Statistical Models, Mutation, Probability, Sequence Alignment/statistics & numerical data, Stochastic Processes
2.
J Comput Biol; 24(2): 153-171, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27387364

ABSTRACT

Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
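The uncorrected distance discussed in this abstract can be illustrated with a minimal sketch. It only computes k-mer count vectors and their squared Euclidean distance, which is the quantity the abstract reports can be statistically inconsistent as a tree-metric approximation; the model-based correction derived in the paper is not reproduced here. The function names, the alphabet ordering, and the toy sequences are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k, alphabet="ACGT"):
    """Return the k-mer count vector of seq, in lexicographic k-mer order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(kmer)] for kmer in product(alphabet, repeat=k)]


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length count vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Toy sequences (hypothetical data, not taken from the paper).
x = kmer_counts("ACGTACGT", 2)
y = kmer_counts("ACGTTTTT", 2)
d = squared_euclidean(x, y)
```

Each vector has 4^k entries, and a sequence of length n contributes n - k + 1 overlapping k-mers, so no alignment is needed to compare two sequences.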


Subjects
Algorithms, Genetic Models, Phylogeny, Base Sequence, Computer Simulation, Molecular Evolution, Sequence Alignment, DNA Sequence Analysis
3.
Bull Math Biol; 77(8): 1620-51, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26337290

ABSTRACT

Identifiability concerns finding which unknown parameters of a model can be estimated, uniquely or otherwise, from given input-output data. If some subset of the parameters of a model cannot be determined given input-output data, then we say the model is unidentifiable. In this work, we study linear compartment models, which are a class of biological models commonly used in pharmacokinetics, physiology, and ecology. In past work, we used commutative algebra and graph theory to identify a class of linear compartment models that we call identifiable cycle models, which are unidentifiable but have the simplest possible identifiable functions (so-called monomial cycles). Here we show how to modify identifiable cycle models by adding inputs, adding outputs, or removing leaks, in such a way that we obtain an identifiable model. We also prove a constructive result on how to combine identifiable models, each corresponding to strongly connected graphs, into a larger identifiable model. We apply these theoretical results to several real-world biological models from physiology, cell biology, and ecology.
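For readers unfamiliar with linear compartment models, the following sketch shows how the system matrix of such a model might be assembled from a directed graph with transfer rates and leaks. It assumes the usual form x'(t) = A x(t) + B u(t) and does not implement any identifiability test from the paper; the function name, the 0-based compartment numbering, and the rate values are hypothetical.

```python
def compartmental_matrix(n, edges, leaks):
    """Build the n x n matrix A of the linear system x'(t) = A x(t) + B u(t).

    edges: dict mapping (source, target) -> transfer rate from source to target
    leaks: dict mapping compartment -> leak rate out of the system
    Compartments are numbered 0..n-1 (an illustrative convention).
    """
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in edges.items():
        A[dst][src] += rate   # inflow into dst from src
        A[src][src] -= rate   # matching outflow from src
    for comp, rate in leaks.items():
        A[comp][comp] -= rate  # loss to the environment
    return A


# Hypothetical 3-compartment cycle 0 -> 1 -> 2 -> 0 with a leak from compartment 0.
A = compartmental_matrix(3, {(0, 1): 2.0, (1, 2): 3.0, (2, 0): 5.0}, {0: 1.0})
column_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]
```

Each column sum equals the negative of that compartment's leak rate, so a column summing to zero marks a compartment without a leak. This makes the operations named in the abstract (adding inputs or outputs, removing leaks) concrete as changes to the model's data.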


Subjects
Linear Models, Biological Models, Animals, Endosomes/metabolism, Humans, Manganese/pharmacokinetics, Mathematical Concepts, Rats
4.
PLoS One; 9(2): e86411, 2014.
Article in English | MEDLINE | ID: mdl-24523860

ABSTRACT

We solve the local and global structural identifiability problems for viscoelastic mechanical models represented by networks of springs and dashpots. We propose a very simple characterization of both local and global structural identifiability based on identifiability tables, with the purpose of providing a guideline for constructing arbitrarily complex, identifiable spring-dashpot networks. We illustrate how to use our results in a number of examples and point to some applications in cardiovascular modeling.


Subjects
Cardiovascular Models, Algorithms, Arteries/pathology, Blood Pressure, Computer Simulation, Elasticity, Vascular Endothelium/pathology, Humans, Mechanical Stress, Viscosity
5.
Article in English | MEDLINE | ID: mdl-26355780

ABSTRACT

Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
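The distinction between an arbitrary dissimilarity map and a tree metric can be made concrete with the classical four-point condition. This is a minimal sketch, not the polyhedral analysis of the paper: it tests a necessary condition only and assumes the input is already a symmetric metric with zero diagonal. The function name, the tolerance, and the example matrices are illustrative assumptions.

```python
from itertools import combinations


def satisfies_four_point(d, tol=1e-9):
    """Check the four-point condition on a symmetric dissimilarity matrix d.

    For every four taxa i, j, k, l, the two largest of the three sums
    d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k] must be equal.
    """
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Quartet tree ab|cd with all edge lengths 1 (hypothetical example).
quartet = [[0, 2, 3, 3],
           [2, 0, 3, 3],
           [3, 3, 0, 2],
           [3, 3, 2, 0]]

# Star tree on four taxa: a single polytomy, all three sums coincide.
star = [[0, 2, 2, 2],
        [2, 0, 2, 2],
        [2, 2, 0, 2],
        [2, 2, 2, 0]]

# Perturbing one entry of the quartet metric leaves the space of tree metrics.
perturbed = [row[:] for row in quartet]
perturbed[0][2] = perturbed[2][0] = 4
```

In the star example all three sums are equal, which matches the abstract's description of tree metrics with polytomies lying where maximal cones of the fan meet.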


Subjects
Computational Biology/methods, Genetic Models, Phylogeny, Algorithms, Molecular Evolution
6.
Syst Biol; 61(6): 1049-59, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-22798332

ABSTRACT

Phylogenetic mixture models, in which the sites in sequences undergo different substitution processes along the same or different trees, allow the description of heterogeneous evolutionary processes. As data sets consisting of longer sequences become available, it is important to understand such models, for both theoretical insights and use in statistical analyses. Some recent articles have highlighted disturbing "mimicking" behavior in which a distribution from a mixture model is identical to one arising on a different tree or trees. Other works have indicated such problems are unlikely to occur in practice, as they require very special parameter choices. After surveying some of these works on mixture models, we give several new results. In general, if the number of components in a generating mixture is not too large and we disallow zero or infinite branch lengths, then it cannot mimic the behavior of a nonmixture on a different tree. On the other hand, if the mixture model is locally overparameterized, it is possible for a phylogenetic mixture model to mimic distributions of another tree model. Although theoretical questions remain, these sorts of results can serve as a guide to when the use of mixture models in either maximum likelihood or Bayesian frameworks is likely to lead to statistically consistent inference, and when mimicking due to heterogeneity should be considered a realistic possibility. [Phylogenetic mixture models; parameter identifiability; heterogeneous sequence evolution.]


Subjects
Biological Models, Phylogeny, Algorithms, Computer Simulation, Statistical Data Interpretation
7.
Bull Math Biol; 74(1): 212-31, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21717285

ABSTRACT

Phylogenetic mixture models are statistical models of character evolution allowing for heterogeneity. Each of the classes in some unknown partition of the characters may evolve by different processes, or even along different trees. Such models are of increasing interest for data analysis, as they can capture the variety of evolutionary processes that may be occurring across long sequences of DNA or proteins. The fundamental question of whether parameters of such a model are identifiable is difficult to address, due to the complexity of the parameterization. Identifiability is, however, essential to their use for statistical inference. We analyze mixture models on large trees, with many mixture components, showing that both numerical and tree parameters are indeed identifiable in these models when all trees are the same. This provides a theoretical justification for some current empirical studies, and indicates that extensions to even more mixture components should be theoretically well behaved. We also extend our results to certain mixtures on different trees, using the same algebraic techniques.


Subjects
Molecular Evolution, Genetic Models, Statistical Models, Phylogeny
8.
Article in English | MEDLINE | ID: mdl-20733238

ABSTRACT

Phylogenetic data arising on two possibly different tree topologies might be mixed through several biological mechanisms, including incomplete lineage sorting or horizontal gene transfer in the case of different topologies, or simply different substitution processes on characters in the case of the same topology. Recent work on a 2-state symmetric model of character change showed that for 4 taxa, such a mixture model has nonidentifiable parameters, and thus, it is theoretically impossible to determine the two tree topologies from any amount of data under such circumstances. Here, the question of identifiability is investigated for two-tree mixtures of the 4-state group-based models, which are more relevant to DNA sequence data. Using algebraic techniques, we show that the tree parameters are identifiable for the JC and K2P models. We also prove that generic substitution parameters for the JC mixture models are identifiable, and for the K2P and K3P models obtain generic identifiability results for mixtures on the same tree. This indicates that the full phylogenetic signal remains in such mixtures, and the 2-state symmetric result is thus a misleading guide to the behavior of other models.


Subjects
Algorithms, Computational Biology/methods, Genetic Models, Statistical Models, Phylogeny, DNA/genetics, Nucleic Acid Databases, Markov Chains
9.
J Comput Biol; 12(4): 457-81, 2005 May.
Article in English | MEDLINE | ID: mdl-15882142

ABSTRACT

Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an Abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.


Subjects
Computational Biology/methods, Computational Biology/statistics & numerical data, Statistical Models, Phylogeny, Molecular Evolution
10.
J Comput Biol; 12(2): 204-28, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15767777

ABSTRACT

Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.


Subjects
Computational Biology/statistics & numerical data, Genetic Models, Phylogeny, Animals, Statistical Data Interpretation, Humans