Results 1 - 20 of 79
1.
PLoS Genet ; 20(5): e1011274, 2024 May.
Article in English | MEDLINE | ID: mdl-38768268

ABSTRACT

Molecular dissection of meiotic recombination in mammals, combined with population-genetic and comparative studies, has revealed a complex evolutionary dynamic characterized by short-lived recombination hotspots. Hotspots are chromosome positions containing DNA sequences where the protein PRDM9 can bind and cause crossing-over. To explain this fast evolutionary dynamic, a so-called intra-genomic Red Queen model has been proposed, based on the interplay between two antagonistic forces: biased gene conversion, mediated by double-strand breaks, resulting in hotspot extinction (the hotspot conversion paradox), followed by positive selection favoring mutant PRDM9 alleles recognizing new sequence motifs. Although this model predicts many empirical observations, the exact causes of the positive selection acting on new PRDM9 alleles are still not well understood. In this direction, experiments on mouse hybrids have suggested that, in addition to targeting double-strand breaks, PRDM9 has another role during meiosis. Specifically, PRDM9 symmetric binding (simultaneous binding at the same site on both homologues) would facilitate homology search and, as a result, the pairing of the homologues. Although discovered in hybrids, this second function of PRDM9 could also be involved in the evolutionary dynamic observed within populations. To address this point, here, we present a theoretical model of the evolutionary dynamic of meiotic recombination integrating current knowledge about the molecular function of PRDM9. Our modeling work gives important insights into the selective forces driving the turnover of recombination hotspots. Specifically, the reduced symmetrical binding of PRDM9 caused by the loss of high-affinity binding sites induces a net positive selection eliciting new PRDM9 alleles recognizing new targets. The model also offers new insights about the influence of the gene dosage of PRDM9, which can paradoxically result in negative selection on new PRDM9 alleles entering the population, driving their eviction and thus reducing standing variation at this locus.


Subjects
Molecular Evolution, Histone-Lysine N-Methyltransferase, Meiosis, Histone-Lysine N-Methyltransferase/genetics, Histone-Lysine N-Methyltransferase/metabolism, Meiosis/genetics, Animals, Mice, Gene Conversion, Double-Stranded DNA Breaks, Alleles, Genetic Models, Humans, Genetic Recombination
2.
Proc Natl Acad Sci U S A ; 120(11): e2214977120, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36897968

ABSTRACT

Adaptation in protein-coding sequences can be detected from multiple sequence alignments across species or alternatively by leveraging polymorphism data within a population. Across species, quantification of the adaptive rate relies on phylogenetic codon models, classically formulated in terms of the ratio of nonsynonymous over synonymous substitution rates. Evidence of an accelerated nonsynonymous substitution rate is considered a signature of pervasive adaptation. However, because of the background of purifying selection, these models are potentially limited in their sensitivity. Recent developments have led to more sophisticated mutation-selection codon models aimed at making a more detailed quantitative assessment of the interplay between mutation, purifying, and positive selection. In this study, we conducted a large-scale exome-wide analysis of placental mammals with mutation-selection models, assessing their performance at detecting proteins and sites under adaptation. Importantly, mutation-selection codon models are based on a population-genetic formalism and thus are directly comparable to the McDonald and Kreitman test at the population level to quantify adaptation. Taking advantage of this relationship between phylogenetic and population genetics analyses, we integrated divergence and polymorphism data across the entire exome for 29 populations across 7 genera and showed that proteins and sites detected to be under adaptation at the phylogenetic scale are also under adaptation at the population-genetic scale. Altogether, our exome-wide analysis shows that phylogenetic mutation-selection codon models and the population-genetic test of adaptation can be reconciled and are congruent, paving the way for integrative models and analyses across individuals and populations.


Subjects
Molecular Evolution, Genetic Selection, Humans, Female, Pregnancy, Animals, Phylogeny, Placenta, Population Genetics, Codon, Genetic Models, Mammals/genetics
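
The abstract of record 2 compares phylogenetic estimates of adaptation with the McDonald and Kreitman test. As a minimal sketch of that population-genetic side, the snippet below computes the commonly used estimator alpha = 1 - (Ds * Pn) / (Dn * Ps), the proportion of nonsynonymous substitutions attributed to adaptation. The function name `mk_alpha` and the counts are hypothetical illustrations; the abstract does not say which estimator or counts the study used.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimate the proportion of adaptive nonsynonymous substitutions.

    dn, ds: nonsynonymous / synonymous divergence counts (between species)
    pn, ps: nonsynonymous / synonymous polymorphism counts (within a population)
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive for alpha to be defined")
    # Under strict neutrality Dn/Ds equals Pn/Ps, which gives alpha = 0.
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for one gene (not taken from the study).
alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(f"alpha = {alpha:.2f}")
```

A positive value indicates an excess of nonsynonymous divergence relative to polymorphism; segregating weakly deleterious variants can bias this simple estimator downward.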
3.
Syst Biol ; 72(3): 616-638, 2023 06 17.
Article in English | MEDLINE | ID: mdl-36810802

ABSTRACT

There is still no consensus as to how to select models in Bayesian phylogenetics, and more generally in applied Bayesian statistics. Bayes factors are often presented as the method of choice, yet other approaches have been proposed, such as cross-validation or information criteria. Each of these paradigms raises specific computational challenges, but they also differ in their statistical meaning, being motivated by different objectives: either testing hypotheses or finding the best-approximating model. These alternative goals entail different compromises, and as a result, Bayes factors, cross-validation, and information criteria may be valid for addressing different questions. Here, the question of Bayesian model selection is revisited, with a focus on the problem of finding the best-approximating model. Several model selection approaches were re-implemented, numerically assessed and compared: Bayes factors, cross-validation (CV), in its different forms (k-fold or leave-one-out), and the widely applicable information criterion (wAIC), which is asymptotically equivalent to leave-one-out cross-validation (LOO-CV). Using a combination of analytical results and empirical and simulation analyses, it is shown that Bayes factors are unduly conservative. In contrast, CV represents a more adequate formalism for selecting the model returning the best approximation of the data-generating process and the most accurate estimates of the parameters of interest. Among alternative CV schemes, LOO-CV and its asymptotic equivalent represented by the wAIC, stand out as the best choices, conceptually and computationally, given that both can be simultaneously computed based on standard Markov chain Monte Carlo runs under the posterior distribution. [Bayes factor; cross-validation; marginal likelihood; model comparison; wAIC.].


Subjects
Phylogeny, Bayes Theorem, Computer Simulation, Probability, Markov Chains, Monte Carlo Method
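
The abstract of record 3 notes that the wAIC can be computed from standard MCMC runs under the posterior. A minimal sketch of that computation follows, assuming a matrix of pointwise log-likelihoods (one row per posterior draw, one column per site) has already been saved. The function names, the deviance scale (factor of -2), and the use of the sample variance for the penalty term are conventions assumed here, not details taken from the paper.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(loglik):
    """Compute the wAIC on the deviance scale.

    loglik[s][i] is log p(y_i | theta_s) for posterior draw s and site i.
    """
    n_sites = len(loglik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_sites):
        column = [row[i] for row in loglik]
        lppd += log_mean_exp(column)
        p_waic += variance(column)  # variance of the log-likelihood across draws
    return -2.0 * (lppd - p_waic)


# Hypothetical log-likelihoods: 3 posterior draws, 2 sites.
draws = [[-1.2, -0.7], [-1.0, -0.9], [-1.1, -0.8]]
print(waic(draws))
```

Lower values indicate better expected predictive fit when two models are scored on the same alignment.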
4.
Syst Biol ; 72(4): 767-780, 2023 08 07.
Article in English | MEDLINE | ID: mdl-36946562

ABSTRACT

Accurate phylogenies are fundamental to our understanding of the pattern and process of evolution. Yet, phylogenies at deep evolutionary timescales, with correspondingly long branches, have been fraught with controversy resulting from conflicting estimates from models with varying complexity and goodness of fit. Analyses of historical as well as current empirical datasets, such as alignments including Microsporidia, Nematoda, or Platyhelminthes, have demonstrated that inadequate modeling of across-site compositional heterogeneity, which is the result of biochemical constraints that lead to varying patterns of accepted amino acids along sequences, can lead to erroneous topologies that are strongly supported. Unfortunately, models that adequately account for across-site compositional heterogeneity remain computationally challenging or intractable for an increasing fraction of contemporary datasets. Here, we introduce "compositional constraint analysis," a method to investigate the effect of site-specific constraints on amino acid composition on phylogenetic inference. We show that more constrained sites with lower diversity and less constrained sites with higher diversity exhibit ostensibly conflicting signals under models ignoring across-site compositional heterogeneity that lead to long-branch attraction artifacts and demonstrate that more complex models accounting for across-site compositional heterogeneity can ameliorate this bias. We present CAT-posterior mean site frequencies (PMSF), a pipeline for diagnosing and resolving phylogenetic bias resulting from inadequate modeling of across-site compositional heterogeneity based on the CAT model. CAT-PMSF is robust against long-branch attraction in all alignments we have examined. We suggest using CAT-PMSF when convergence of the CAT model cannot be assured. We find evidence that compositionally constrained sites are driving long-branch attraction in two metazoan datasets and recover evidence for Porifera as the sister group to all other animals. [Animal phylogeny; cross-site heterogeneity; long-branch attraction; phylogenomics.].


Subjects
Microsporidia, Animals, Phylogeny, Bias, Genetic Models
5.
Mol Biol Evol ; 39(2)2022 02 03.
Article in English | MEDLINE | ID: mdl-35021218

ABSTRACT

Phylogenetic codon models are routinely used to characterize selective regimes in coding sequences. Their parametric design, however, is still a matter of debate, in particular concerning the question of how to account for differing nucleotide frequencies and substitution rates. This problem relates to the fact that nucleotide composition in protein-coding sequences is the result of the interactions between mutation and selection. In particular, because of the structure of the genetic code, the nucleotide composition differs between the three coding positions, with the third position showing a more extreme composition. Yet, phylogenetic codon models do not correctly capture this phenomenon and instead predict that the nucleotide composition should be the same for all three positions. Alternatively, some models allow for different nucleotide rates at the three positions, an approach conflating the effects of mutation and selection on nucleotide composition. In practice, it results in inaccurate estimation of the strength of selection. Conceptually, the problem comes from the fact that phylogenetic codon models do not correctly capture the fixation bias acting against the mutational pressure at the mutation-selection equilibrium. To address this problem and to more accurately identify mutation rates and selection strength, we present an improved codon modeling approach where the fixation rate is not seen as a scalar, but as a tensor. This approach gives an accurate representation of how mutation and selection oppose each other at equilibrium and yields a reliable estimate of the mutational process, while disentangling the mean fixation probabilities prevailing in different mutational directions.


Subjects
Genetic Code, Genetic Selection, Codon/genetics, Genetic Models, Mutation, Phylogeny
6.
Mol Biol Evol ; 38(10): 4573-4587, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34191010

ABSTRACT

Mutation-selection phylogenetic codon models are grounded on population genetics first principles and represent a principled approach for investigating the intricate interplay between mutation, selection, and drift. In their current form, mutation-selection codon models are entirely characterized by the collection of site-specific amino-acid fitness profiles. However, thus far, they have relied on the assumption of a constant genetic drift, translating into a unique effective population size (Ne) across the phylogeny, clearly an unrealistic assumption. This assumption can be alleviated by introducing variation in Ne between lineages. In addition to Ne, the mutation rate (µ) is susceptible to vary between lineages, and both should covary with life-history traits (LHTs). This suggests that the model should more globally account for the joint evolutionary process followed by all of these lineage-specific variables (Ne, µ, and LHTs). In this direction, we introduce an extended mutation-selection model jointly reconstructing in a Bayesian Monte Carlo framework the fitness landscape across sites and long-term trends in Ne, µ, and LHTs along the phylogeny, from an alignment of DNA coding sequences and a matrix of observed LHTs in extant species. The model was tested against simulated data and applied to empirical data in mammals, isopods, and primates. The reconstructed history of Ne in these groups appears to correlate with LHTs or ecological variables in a way that suggests that the reconstruction is reasonable, at least in its global trends. On the other hand, the range of variation in Ne inferred across species is surprisingly narrow. This last point suggests that some of the assumptions of the model, in particular concerning the assumed absence of epistatic interactions between sites, are potentially problematic.


Subjects
Genetic Models, Genetic Selection, Animals, Bayes Theorem, Molecular Evolution, Mammals, Mutation, Phylogeny, Population Density
7.
Mol Biol Evol ; 38(3): 1199-1208, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33045094

ABSTRACT

In recent years, codon substitution models based on the mutation-selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have either focused on detecting global signals of adaptive regimes (across the entire gene) or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation-selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.


Subjects
Biological Evolution, Genetic Techniques, Genetic Models, Mutation, Genetic Selection, Bayes Theorem
8.
Mol Biol Evol ; 37(12): 3616-3631, 2020 12 16.
Article in English | MEDLINE | ID: mdl-32877529

ABSTRACT

Biochemical demands constrain the range of amino acids acceptable at specific sites, resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (C10-C60 models). Here, we present a new, scalable method, EDCluster, for finding empirical distribution mixture models involving a simple cluster analysis. The cluster analysis utilizes specific coordinate transformations which allow the detection of specialized amino acid distributions either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases in order to provide universal distribution mixture (UDM) models comprising up to 4,096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared with the C10-C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, Phylobayes, and RevBayes).


Subjects
Amino Acid Substitution, Genetic Techniques, Genetic Models, Phylogeny, Software, Cluster Analysis
9.
Mol Biol Evol ; 35(11): 2819-2834, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30203003

ABSTRACT

A key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation-selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences), or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino-acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; and finally, we show that CpG hypermutability is an important evolutionary factor in rendering relative synonymous codon usage. All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).


Subjects
Molecular Evolution, Genetic Techniques, Genetic Models, Mutation, Genetic Selection, Animals, Bayes Theorem, Humans, Mammals/genetics, Monte Carlo Method
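
The abstract of record 9 builds on Approximate Bayesian Computation, which replaces an intractable likelihood with simulation. The sketch below shows only the plain rejection form of ABC on a toy problem. It is not the Conditional ABC (CABC) method of the paper, and every name in it (`abc_rejection`, `simulate_successes`, the toy data) is a hypothetical illustration.

```python
import random


def abc_rejection(observed, simulate, sample_prior, distance, tolerance, n_draws, rng):
    """Keep prior draws whose simulated data fall within `tolerance` of the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)            # draw a parameter from the prior
        simulated = simulate(theta, rng)     # simulate data under that parameter
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)           # approximate posterior sample
    return accepted


def simulate_successes(theta, rng, n_trials=100):
    """Toy simulator: number of successes in n_trials with success probability theta."""
    return sum(1 for _ in range(n_trials) if rng.random() < theta)


# Toy problem (hypothetical): 30 successes observed in 100 trials, uniform prior.
rng = random.Random(1)
posterior = abc_rejection(
    observed=30,
    simulate=simulate_successes,
    sample_prior=lambda r: r.random(),
    distance=lambda a, b: abs(a - b),
    tolerance=2,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), sum(posterior) / max(len(posterior), 1))
```

Shrinking the tolerance improves the approximation but lowers the acceptance rate, which is the inefficiency that MCMC-based variants aim to reduce.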
10.
Mol Biol Evol ; 35(12): 2900-2912, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30247705

ABSTRACT

The rate of molecular evolution varies widely among species. Life history traits (LHTs) have been proposed as a major driver of these variations. However, the relative contribution of each trait is poorly understood. Here, we test the influence of metabolic rate (MR), longevity, and generation time (GT) on the nuclear and mitochondrial synonymous substitution rates using a group of isopod species that have made multiple independent transitions to subterranean environments. Subterranean species have repeatedly evolved a lower MR, a longer lifespan and a longer GT. We assembled the nuclear transcriptomes and the mitochondrial genomes of 13 pairs of closely related isopods, each pair composed of one surface and one subterranean species. We found that subterranean species have a lower rate of nuclear synonymous substitution than surface species whereas the mitochondrial rate remained unchanged. We propose that this decoupling between nuclear and mitochondrial rates comes from different DNA replication processes in these two compartments. In isopods, the nuclear rate is probably tightly controlled by GT alone. In contrast, mitochondrial genomes appear to replicate and mutate at a rate independent of LHTs. These results are incongruent with previous studies, which were mostly devoted to vertebrates. We suggest that this incongruence can be explained by developmental differences between animal clades, with a quiescent period during female gametogenesis in mammals and birds which imposes a nuclear and mitochondrial rate coupling, as opposed to the continuous gametogenesis observed in most arthropods.


Subjects
Molecular Evolution, Mitochondrial Genome, Isopoda/genetics, Life History Traits, Animals, DNA Replication, Ecosystem, Electron Transport, Isopoda/metabolism, Isopoda/radiation effects, Protein Biosynthesis, Genetic Selection
11.
Mol Biol Evol ; 34(1): 204-214, 2017 01.
Article in English | MEDLINE | ID: mdl-27744408

ABSTRACT

Codon substitution models have traditionally attempted to uncover signatures of adaptation within protein-coding genes by contrasting the rates of synonymous and non-synonymous substitutions. Another modeling approach, known as the mutation-selection framework, attempts to explicitly account for selective patterns at the amino acid level, with some approaches allowing for heterogeneity in these patterns across codon sites. Under such a model, substitutions at a given position occur at the neutral or nearly neutral rate when they are synonymous, or when they correspond to replacements between amino acids of similar fitness; substitutions from high to low (low to high) fitness amino acids have comparatively low (high) rates. Here, we study the use of such a mutation-selection framework as a null model for the detection of adaptation. Following previous works in this direction, we include a deviation parameter that has the effect of capturing the surplus, or deficit, in non-synonymous rates, relative to what would be expected under a mutation-selection modeling framework that includes a Dirichlet process approach to account for across-codon-site variation in amino acid fitness profiles. We use simulations, along with a few real data sets, to study the behavior of the approach, and find it to have good power with a low false-positive rate. Altogether, we emphasize the potential of recent mutation-selection models in the detection of adaptation, calling for further model refinements as well as large-scale applications.


Subjects
Biological Adaptation/genetics, Amino Acid Substitution, Codon, Genetic Models, Genetic Selection/genetics, Amino Acids/genetics, Bayes Theorem, Computer Simulation, Genetic Epistasis, Molecular Evolution, Genetic Heterogeneity, Mutation, Mutation Rate, Phylogeny
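
The abstract of record 11 describes how substitution rates depend on fitness: near-neutral between amino acids of similar fitness, low from high to low fitness, high in the opposite direction. This pattern corresponds to the classical relative fixation factor S / (1 - exp(-S)), where S is a scaled selection coefficient. The sketch below assumes S is the difference in log scaled fitness between the target and source amino acids; the function name and the numeric values are hypothetical, and the paper's exact parameterization may differ.

```python
import math


def fixation_factor(log_fitness_from: float, log_fitness_to: float) -> float:
    """Fixation rate relative to a neutral mutation, S / (1 - exp(-S))."""
    s = log_fitness_to - log_fitness_from  # scaled selection coefficient
    if abs(s) < 1e-8:
        return 1.0 + s / 2.0               # first-order expansion around neutrality
    return s / -math.expm1(-s)             # -expm1(-s) equals 1 - exp(-s)


# Hypothetical log-fitness values for two amino acids at one site.
print(fixation_factor(0.0, 0.0))   # equal fitness: neutral rate
print(fixation_factor(0.0, 2.0))   # low to high fitness: above the neutral rate
print(fixation_factor(2.0, 0.0))   # high to low fitness: below the neutral rate
```

Multiplying this factor by the mutation rate between two codons gives the nonsynonymous substitution rate in such a model; synonymous changes keep the factor at 1.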
12.
Proc Natl Acad Sci U S A ; 112(50): 15402-7, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26621703

ABSTRACT

Understanding how complex traits, such as epithelia, nervous systems, muscles, or guts, originated depends on a well-supported hypothesis about the phylogenetic relationships among major animal lineages. Traditionally, sponges (Porifera) have been interpreted as the sister group to the remaining animals, a hypothesis consistent with the conventional view that the last common animal ancestor was relatively simple and more complex body plans arose later in evolution. However, this premise has recently been challenged by analyses of the genomes of comb jellies (Ctenophora), which, instead, found ctenophores as the sister group to the remaining animals (the "Ctenophora-sister" hypothesis). Because ctenophores are morphologically complex predators with true epithelia, nervous systems, muscles, and guts, this scenario implies these traits were either present in the last common ancestor of all animals and were lost secondarily in sponges and placozoans (Trichoplax) or, alternatively, evolved convergently in comb jellies. Here, we analyze representative datasets from recent studies supporting Ctenophora-sister, including genome-scale alignments of concatenated protein sequences, as well as a genomic gene content dataset. We found no support for Ctenophora-sister and conclude it is an artifact resulting from inadequate methodology, especially the use of simplistic evolutionary models and inappropriate choice of species to root the metazoan tree. Our results reinforce a traditional scenario for the evolution of complexity in animals, and indicate that inferences about the evolution of Metazoa based on the Ctenophora-sister hypothesis are not supported by the currently available data.


Subjects
Ctenophora/classification, Ctenophora/genetics, Genetic Databases, Genome, Animals, Bayes Theorem, Bias, Likelihood Functions, Genetic Models, Phylogeny, Reproducibility of Results, Genetic Selection
13.
BMC Evol Biol ; 17(1): 147, 2017 06 23.
Article in English | MEDLINE | ID: mdl-28645318

ABSTRACT

BACKGROUND: Phylogenetic codon models are often used to characterize the selective regimes acting on protein-coding sequences. Recent methodological developments have led to models explicitly accounting for the interplay between mutation and selection, by modeling the amino acid fitness landscape along the sequence. However, thus far, most of these models have assumed that the fitness landscape is constant over time. Fluctuations of the fitness landscape may often be random or depend on complex and unknown factors. However, some organisms may be subject to systematic changes in selective pressure, resulting in reproducible molecular adaptations across independent lineages subject to similar conditions. RESULTS: Here, we introduce a codon-based differential selection model, which aims to detect and quantify the fine-grained consistent patterns of adaptation at the protein-coding level, as a function of external conditions experienced by the organism under investigation. The model parameterizes the global mutational pressure, as well as the site- and condition-specific amino acid selective preferences. This phylogenetic model is implemented in a Bayesian MCMC framework. After validation with simulations, we applied our method to a dataset of HIV sequences from patients with known HLA genetic background. Our differential selection model detects and characterizes differentially selected coding positions specifically associated with two different HLA alleles. CONCLUSION: Our differential selection model is able to identify consistent molecular adaptations as a function of repeated changes in the environment of the organism. These models can be applied to many other problems, ranging from viral adaptation to evolution of life-history strategies in plants or animals.


Subjects
Bayes Theorem, Codon, HIV/genetics, Genetic Models, Amino Acid Sequence, Molecular Evolution, HIV/physiology, Humans, Mutation, Open Reading Frames, Phylogeny, Genetic Selection, Human Immunodeficiency Virus gag Gene Products/chemistry, Human Immunodeficiency Virus gag Gene Products/genetics
15.
Syst Biol ; 65(4): 726-36, 2016 07.
Article in English | MEDLINE | ID: mdl-27235697

ABSTRACT

Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.].


Subjects
Classification/methods, Biological Models, Phylogeny, Software, Bayes Theorem
16.
PLoS Genet ; 10(11): e1004790, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25393762

ABSTRACT

Recombination is an essential process in eukaryotes, which increases diversity by disrupting genetic linkage between loci and ensures the proper segregation of chromosomes during meiosis. In the human genome, recombination events are clustered in hotspots, whose location is determined by the PRDM9 protein. There is evidence that the location of hotspots evolves rapidly, as a consequence of changes in the PRDM9 DNA-binding domain. However, the reasons for these changes and the rate at which they occur are not known. In this study, we investigated the evolution of human hotspot loci and of PRDM9 target motifs, both in modern and archaic human lineages (Denisovan), to quantify the dynamic of hotspot turnover during the recent period of human evolution. We show that present-day human hotspots are young: they have been active only during the last 10% of the time since the divergence from chimpanzee, becoming active shortly before the split between Denisovans and modern humans. Surprisingly, however, our analyses indicate that Denisovan recombination hotspots did not overlap with modern human ones, despite sharing similar PRDM9 target motifs. We further show that high-affinity PRDM9 target motifs are subject to a strong self-destructive drive, known as biased gene conversion (BGC), which should lead to the loss of the majority of them in the next 3 MYR. This depletion of PRDM9 genomic targets is expected to decrease fitness, and thereby to favor new PRDM9 alleles binding different motifs. Our refined estimates of the age and life expectancy of human hotspots provide empirical evidence in support of the Red Queen hypothesis of recombination hotspot evolution.


Subjects
Genetic Crossing Over, Molecular Evolution, Histone-Lysine N-Methyltransferase/genetics, Genetic Recombination, Animals, Chromosomes/genetics, DNA-Binding Proteins, Gene Conversion, Human Genome, Humans, Meiosis/genetics, Pan troglodytes
18.
Bioinformatics ; 30(4): 488-96, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24318999

ABSTRACT

MOTIVATION: Correlation between life history or ecological traits and genomic features such as nucleotide or amino acid composition can be used for reconstructing the evolutionary history of the traits of interest along phylogenies. Thus far, however, such ancestral reconstructions have been done using simple linear regression approaches that do not account for phylogenetic inertia. These reconstructions could instead be seen as a genuine comparative regression problem, such as formalized by classical generalized least-square comparative methods, in which the trait of interest and the molecular predictor are represented as correlated Brownian characters coevolving along the phylogeny. RESULTS: Here, a Bayesian sampler is introduced, representing an alternative and more efficient algorithmic solution to this comparative regression problem, compared with currently existing generalized least-square approaches. Technically, ancestral trait reconstruction based on a molecular predictor is shown to be formally equivalent to a phylogenetic Kalman filter problem, for which backward and forward recursions are developed and implemented in the context of a Markov chain Monte Carlo sampler. The comparative regression method results in more accurate reconstructions and a more faithful representation of uncertainty, compared with simple linear regression. Application to the reconstruction of the evolution of optimal growth temperature in Archaea, using GC composition in ribosomal RNA stems and amino acid composition of a sample of protein-coding genes, confirms previous findings, in particular, pointing to a hyperthermophilic ancestor for the kingdom. AVAILABILITY AND IMPLEMENTATION: The program is freely available at www.phylobayes.org.


Subjects
Algorithms, Archaea/genetics, Bayes Theorem, Biological Evolution, Phylogeny, Archaea/growth & development, Base Composition, Statistical Data Interpretation, Linear Models, Markov Chains, Biological Models, Monte Carlo Method, Phenotype, Ribosomal RNA/genetics, Temperature
19.
Bioinformatics ; 30(21): 3020-8, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25053744

ABSTRACT

MOTIVATION: Brownian models have been introduced in phylogenetics for describing variation in substitution rates through time, with applications to molecular dating or to the comparative analysis of variation in substitution patterns among lineages. Thus far, however, the Monte Carlo implementations of these models have relied on crude approximations, in which the Brownian process is sampled only at the internal nodes of the phylogeny or at the midpoints along each branch, and the unknown trajectory between these sampled points is summarized by simple branchwise average substitution rates. RESULTS: A more accurate Monte Carlo approach is introduced, explicitly sampling a fine-grained discretization of the trajectory of the (potentially multivariate) Brownian process along the phylogeny. Generic Monte Carlo resampling algorithms are proposed for updating the Brownian paths along and across branches. Specific computational strategies are developed for efficient integration of the finite-time substitution probabilities across branches induced by the Brownian trajectory. The mixing properties and the computational complexity of the resulting Markov chain Monte Carlo sampler scale reasonably with the discretization level, allowing practical applications with up to a few hundred discretization points along the entire depth of the tree. The method can be generalized to other Markovian stochastic processes, making it possible to implement a wide range of time-dependent substitution models with well-controlled computational precision. AVAILABILITY: The program is freely available at www.phylobayes.org.


Subjects
Algorithms, Phylogeny, Markov Chains, Genetic Models, Monte Carlo Method
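
The abstract of record 19 contrasts sampling a Brownian process only at nodes or branch midpoints with sampling a fine-grained discretization along each branch. The sketch below illustrates the discretization idea on a single branch, assuming the Brownian variable is a log substitution rate and summarizing the path by a trapezoidal average of the rate. The function names and parameter values are hypothetical, and the paper's actual sampler integrates finite-time substitution probabilities rather than a simple mean rate.

```python
import math
import random


def brownian_path(x0, sigma2, branch_length, n_steps, rng):
    """Sample a Brownian trajectory at n_steps + 1 equally spaced points along a branch."""
    dt = branch_length / n_steps
    path = [x0]
    for _ in range(n_steps):
        # Each increment is Gaussian with variance sigma2 * dt.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(sigma2 * dt)))
    return path


def mean_rate(path):
    """Trapezoidal average of exp(x) over the equally spaced points of the path."""
    rates = [math.exp(x) for x in path]
    n_intervals = len(rates) - 1
    return (0.5 * rates[0] + sum(rates[1:-1]) + 0.5 * rates[-1]) / n_intervals


rng = random.Random(42)
path = brownian_path(x0=0.0, sigma2=0.5, branch_length=1.0, n_steps=100, rng=rng)
coarse = 0.5 * (math.exp(path[0]) + math.exp(path[-1]))  # endpoints only
print(mean_rate(path), coarse)
```

The gap between the two printed values shows, for one simulated trajectory, how much an endpoint-only summary can differ from an average taken along the whole discretized path.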
20.
Bioinformatics ; 30(7): 1020-1, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24351710

ABSTRACT

MOTIVATION: In recent years, there has been an increasing interest in the potential of codon substitution models for a variety of applications. However, the computational demands of these models have sometimes led to the adoption of oversimplified assumptions, questionable statistical methods or a limited focus on small data sets. RESULTS: Here, we offer a scalable, message-passing-interface-based Bayesian implementation of site-heterogeneous codon models in the mutation-selection framework. Our software jointly infers the global mutational parameters at the nucleotide level, the branch lengths of the tree and a Dirichlet process governing across-site variation at the amino acid level. We focus on an example estimation of the distribution of selection coefficients from an alignment of several hundred sequences of the influenza PB2 gene, and highlight the site-specific characterization enabled by such a modeling approach. Finally, we discuss future potential applications of the software for conducting evolutionary inferences. AVAILABILITY AND IMPLEMENTATION: The models are implemented within the PhyloBayes-MPI package (available at phylobayes.org), along with usage details in the accompanying manual.


Subjects
Mutation, Phylogeny, Bayes Theorem, Codon, Genetic Models, Software