RESUMO
Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.
Assuntos
Algoritmos , Fluxo Gênico , Animais , Filogenia , Simulação por Computador , Teorema de Bayes , Funções Verossimilhança , Modelos GenéticosRESUMO
Epidemiology has been transformed by the advent of Bayesian phylodynamic models that allow researchers to infer the geographic history of pathogen dispersal over a set of discrete geographic areas [1, 2]. These models provide powerful tools for understanding the spatial dynamics of disease outbreaks, but contain many parameters that are inferred from minimal geographic information (i.e., the single area in which each pathogen was sampled). Consequently, inferences under these models are inherently sensitive to our prior assumptions about the model parameters. Here, we demonstrate that the default priors used in empirical phylodynamic studies make strong and biologically unrealistic assumptions about the underlying geographic process. We provide empirical evidence that these unrealistic priors strongly (and adversely) impact commonly reported aspects of epidemiological studies, including: 1) the relative rates of dispersal between areas; 2) the importance of dispersal routes for the spread of pathogens among areas; 3) the number of dispersal events between areas, and; 4) the ancestral area in which a given outbreak originated. We offer strategies to avoid these problems, and develop tools to help researchers specify more biologically reasonable prior models that will realize the full potential of phylodynamic methods to elucidate pathogen biology and, ultimately, inform surveillance and monitoring policies to mitigate the impacts of disease outbreaks.
Assuntos
Surtos de Doenças , Filogenia , Teorema de BayesRESUMO
Ancient DNA (aDNA) is increasingly being used to investigate questions such as the phylogenetic relationships and divergence times of extant and extinct species. If aDNA samples are sufficiently old, expected branch lengths (in units of nucleotide substitutions) are reduced relative to contemporary samples. This can be accounted for by incorporating sample ages into phylogenetic analyses. Existing methods that use tip (sample) dates infer gene trees rather than species trees, which can lead to incorrect or biased inferences of the species tree. Methods using a multispecies coalescent (MSC) model overcome these issues. We developed an MSC model with tip dates and implemented it in the program bpp. The method performed well for a range of biologically realistic scenarios, estimating calibrated divergence times and mutation rates precisely. Simulations suggest that estimation precision can be best improved by prioritizing sampling of many loci and more ancient samples. Incorrectly treating ancient samples as contemporary in analyzing simulated data, mimicking a common practice of empirical analyses, led to large systematic biases in model parameters, including divergence times. Two genomic datasets of mammoths and elephants were analyzed, demonstrating the method's empirical utility.
RESUMO
SUMMARY: Phylodynamic methods are central to studies of the geographic and demographic history of disease outbreaks. Inference under discrete-geographic phylodynamic models-which involve many parameters that must be inferred from minimal information-is inherently sensitive to our prior beliefs about the model parameters. We present an interactive utility, PrioriTree, to help researchers identify and accommodate prior sensitivity in discrete-geographic inferences. Specifically, PrioriTree provides a suite of functions to generate input files for-and summarize output from-BEAST analyses for performing robust Bayesian inference, data-cloning analyses and assessing the relative and absolute fit of candidate discrete-geographic (prior) models to empirical datasets. AVAILABILITY AND IMPLEMENTATION: PrioriTree is distributed as an R package available at https://github.com/jsigao/prioritree, with a comprehensive user manual provided at https://bookdown.org/jsigao/prioritree_manual/.
Assuntos
Surtos de Doenças , Software , Teorema de BayesRESUMO
Cross-species introgression can have significant impacts on phylogenomic reconstruction of species divergence events. Here, we used simulations to show how the presence of even a small amount of introgression can bias divergence time estimates when gene flow is ignored in the analysis. Using advances in analytical methods under the multispecies coalescent (MSC) model, we demonstrate that by accounting for incomplete lineage sorting and introgression using large phylogenomic data sets this problem can be avoided. The multispecies-coalescent-with-introgression (MSci) model is capable of accurately estimating both divergence times and ancestral effective population sizes, even when only a single diploid individual per species is sampled. We characterize some general expectations for biases in divergence time estimation under three different scenarios: 1) introgression between sister species, 2) introgression between non-sister species, and 3) introgression from an unsampled (i.e., ghost) outgroup lineage. We also conducted simulations under the isolation-with-migration (IM) model and found that the MSci model assuming episodic gene flow was able to accurately estimate species divergence times despite high levels of continuous gene flow. We estimated divergence times under the MSC and MSci models from two published empirical datasets with previous evidence of introgression, one of 372 target-enrichment loci from baobabs (Adansonia), and another of 1000 transcriptome loci from 14 species of the tomato relative, Jaltomata. The empirical analyses not only confirm our findings from simulations, demonstrating that the MSci model can reliably estimate divergence times but also show that divergence time estimation under the MSC can be robust to the presence of small amounts of introgression in empirical datasets with extensive taxon sampling. [divergence time; gene flow; hybridization; introgression; MSci model; multispecies coalescent].
Assuntos
Fluxo Gênico , Hibridização Genética , Filogenia , Modelos GenéticosRESUMO
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes-Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
Assuntos
Modelos Genéticos , Teorema de Bayes , Simulação por Computador , Cadeias de Markov , Método de Monte Carlo , FilogeniaRESUMO
Phylodynamic methods reveal the spatial and temporal dynamics of viral geographic spread, and have featured prominently in studies of the COVID-19 pandemic. Virtually all such studies are based on phylodynamic models that assume-despite direct and compelling evidence to the contrary-that rates of viral geographic dispersal are constant through time. Here, we: (1) extend phylodynamic models to allow both the average and relative rates of viral dispersal to vary independently between pre-specified time intervals; (2) implement methods to infer the number and timing of viral dispersal events between areas; and (3) develop statistics to assess the absolute fit of discrete-geographic phylodynamic models to empirical datasets. We first validate our new methods using simulations, and then apply them to a SARS-CoV-2 dataset from the early phase of the COVID-19 pandemic. We show that: (1) under simulation, failure to accommodate interval-specific variation in the study data will severely bias parameter estimates; (2) in practice, our interval-specific discrete-geographic phylodynamic models can significantly improve the relative and absolute fit to empirical data; and (3) the increased realism of our interval-specific models provides qualitatively different inferences regarding key aspects of the COVID-19 pandemic-revealing significant temporal variation in global viral dispersal rates, viral dispersal routes, and the number of viral dispersal events between areas-and alters interpretations regarding the efficacy of intervention measures to mitigate the pandemic.
Assuntos
COVID-19 , Pandemias , COVID-19/epidemiologia , Humanos , Filogenia , Filogeografia , SARS-CoV-2/genéticaRESUMO
Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.
Assuntos
Introgressão Genética , Modelos Genéticos , Filogenia , Animais , Anopheles/genética , Teorema de Bayes , Picea/genética , Saccharomycetales/genéticaRESUMO
Recent analyses of genomic sequence data suggest cross-species gene flow is common in both plants and animals, posing challenges to species tree estimation. We examine the levels of gene flow needed to mislead species tree estimation with three species and either episodic introgressive hybridization or continuous migration between an outgroup and one ingroup species. Several species tree estimation methods are examined, including the majority-vote method based on the most common gene tree topology (with either the true or reconstructed gene trees used), the UPGMA method based on the average sequence distances (or average coalescent times) between species, and the full-likelihood method based on multilocus sequence data. Our results suggest that the majority-vote method based on gene tree topologies is more robust to gene flow than the UPGMA method based on coalescent times and both are more robust than likelihood assuming a multispecies coalescent (MSC) model with no cross-species gene flow. Comparison of the continuous migration model with the episodic introgression model suggests that a small amount of gene flow per generation can cause drastic changes to the genetic history of the species and mislead species tree methods, especially if the species diverged through radiative speciation events. Estimates of parameters under the MSC with gene flow suggest that African mosquito species in the Anopheles gambiae species complex constitute such an example of extreme impact of gene flow on species phylogeny. [IM; introgression; migration; MSci; multispecies coalescent; species tree.].
Assuntos
Classificação/métodos , Fluxo Gênico , Modelos Biológicos , Filogenia , Migração Animal , Animais , Anopheles/classificação , Anopheles/genéticaRESUMO
BACKGROUND: The BRCA1 c.3331_3334delCAAG founder mutation has been reported in hereditary breast and ovarian cancer families from multiple Hispanic groups. We aimed to evaluate BRCA1 c.3331_3334delCAAG haplotype diversity in cases of European, African, and Latin American ancestry. METHODS: BC mutation carrier cases from Colombia (n = 32), Spain (n = 13), Portugal (n = 2), Chile (n = 10), Africa (n = 1), and Brazil (n = 2) were genotyped with the genome-wide single nucleotide polymorphism (SNP) arrays to evaluate haplotype diversity around BRCA1 c.3331_3334delCAAG. Additional Portuguese (n = 13) and Brazilian (n = 18) BC mutation carriers were genotyped for 15 informative SNPs surrounding BRCA1. Data were phased using SHAPEIT2, and identical by descent regions were determined using BEAGLE and GERMLINE. DMLE+ was used to date the mutation in Colombia and Iberia. RESULTS: The haplotype reconstruction revealed a shared 264.4-kb region among carriers from all six countries. The estimated mutation age was ~ 100 generations in Iberia and that it was introduced to South America early during the European colonization period. CONCLUSIONS: Our results suggest that this mutation originated in Iberia and later introduced to Colombia and South America at the time of Spanish colonization during the early 1500s. We also found that the Colombian mutation carriers had higher European ancestry, at the BRCA1 gene harboring chromosome 17, than controls, which further supported the European origin of the mutation. Understanding founder mutations in diverse populations has implications in implementing cost-effective, ancestry-informed screening.
Assuntos
Proteína BRCA1/genética , Neoplasias da Mama/epidemiologia , Neoplasias da Mama/genética , Predisposição Genética para Doença , Mutação em Linhagem Germinativa , Haplótipos , Polimorfismo de Nucleotídeo Único , África/epidemiologia , Brasil/epidemiologia , Chile/epidemiologia , Cromossomos Humanos Par 17/genética , Colômbia/epidemiologia , Feminino , Efeito Fundador , Estudo de Associação Genômica Ampla/métodos , Humanos , Portugal/epidemiologia , Espanha/epidemiologiaRESUMO
Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the bpp program have suggested that bpp may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here, we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model (PSM) has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the PSM is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in bpp tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in bpp provide much more reliable inference under the gdi than the approximate method phrapl. We distinguish between Bayesian model selection and parameter estimation and suggest that the model selection approach is useful for identifying sympatric cryptic species, while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.
Assuntos
Classificação/métodos , Especiação Genética , Modelos Biológicos , Teorema de Bayes , Simulação por ComputadorRESUMO
The multispecies coalescent provides a natural framework for accommodating ancestral genetic polymorphism and coalescent processes that can cause different genomic regions to have different genealogical histories. The Bayesian program BPP includes a full-likelihood implementation of the multispecies coalescent, using transmodel Markov chain Monte Carlo to calculate the posterior probabilities of different species trees. BPP is suitable for analyzing multilocus sequence data sets and it accommodates the heterogeneity of gene trees (both the topology and branch lengths) among loci and gene tree uncertainties due to limited phylogenetic information at each locus. Here, we provide a practical guide to the use of BPP in species tree estimation. BPP is a command-line program that runs on linux, macosx, and windows. This protocol shows how to use both BPP 3.4 (http://abacus.gene.ucl.ac.uk/software/) and BPP 4.0 (https://github.com/bpp/).
Assuntos
Técnicas Genéticas , Filogenia , Software , Animais , Teorema de Bayes , Humanos , RanidaeRESUMO
Bayesian analysis of macroevolutionary mixtures (BAMM) has recently taken the study of lineage diversification by storm. BAMM estimates the diversification-rate parameters (speciation and extinction) for every branch of a study phylogeny and infers the number and location of diversification-rate shifts across branches of a tree. Our evaluation of BAMM reveals two major theoretical errors: (i) the likelihood function (which estimates the model parameters from the data) is incorrect, and (ii) the compound Poisson process prior model (which describes the prior distribution of diversification-rate shifts across branches) is incoherent. Using simulation, we demonstrate that these theoretical issues cause statistical pathologies; posterior estimates of the number of diversification-rate shifts are strongly influenced by the assumed prior, and estimates of diversification-rate parameters are unreliable. Moreover, the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM.
Assuntos
Coevolução Biológica , Extinção Biológica , Especiação Genética , Filogenia , Baleias/classificação , Animais , Teorema de Bayes , Biodiversidade , Funções Verossimilhança , Distribuição de Poisson , Baleias/genéticaRESUMO
We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. The method was found to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci were included in the dataset. The prior on species trees has some impact, particularly for small numbers of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based method is statistically more efficient than heuristic methods based on summary statistics, and that our implementation is computationally more efficient than alternative full-likelihood methods under the MSC. Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies for estimating posterior probabilities for species trees. [Bayes factor; Bayesian inference; MCMC; multispecies coalescent; nodeslider; species tree; SPR.].
Assuntos
Classificação/métodos , Modelos Biológicos , Filogenia , Algoritmos , Animais , Teorema de Bayes , Simulação por Computador , Crotalus/classificação , Crotalus/genética , Musaranhos/classificação , Musaranhos/genéticaRESUMO
Phylogenies are important for addressing various biological questions such as relationships among species or genes, the origin and spread of viral infection and the demographic changes and migration patterns of species. The advancement of sequencing technologies has taken phylogenetic analysis to a new height. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages that are now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use.
Assuntos
Filogenia , Sequência de Bases , Feminino , Humanos , Masculino , Modelos Biológicos , Modelos Estatísticos , Dados de Sequência Molecular , SoftwareRESUMO
DNA barcoding methods use a single locus (usually the mitochondrial COI gene) to assign unidentified specimens to known species in a library based on a genetic distance threshold that distinguishes between-species divergence from within-species diversity. Recently developed species delimitation methods based on the multispecies coalescent (MSC) model offer an alternative approach to individual assignment using either single-locus or multiloci sequence data. Here, we use simulations to demonstrate three features of an MSC method implemented in the program bpp. First, we show that with one locus, MSC can accurately assign individuals to species without the need for arbitrarily determined distance thresholds (as required for barcoding methods). We provide an example in which no single threshold or barcoding gap exists that can be used to assign all specimens without incurring high error rates. Second, we show that bpp can identify cryptic species that may be misidentified as a single species within the library, potentially improving the accuracy of barcoding libraries. Third, we show that taxon rarity does not present any particular problems for species assignments using bpp and that accurate assignments can be achieved even when only one or a few loci are available. Thus, concerns that have been raised that MSC methods may have problems analysing rare taxa (singletons) are unfounded. Currently, barcoding methods enjoy a huge computational advantage over MSC methods and may be the only approach feasible for massively large data sets, but MSC methods may offer a more stringent test for species that are tentatively assigned by barcoding.
Assuntos
Código de Barras de DNA Taxonômico , Modelos Genéticos , Teorema de Bayes , Simulação por Computador , Biblioteca Gênica , Genes MitocondriaisRESUMO
A method was developed for simultaneous Bayesian inference of species delimitation and species phylogeny using the multispecies coalescent model. The method eliminates the need for a user-specified guide tree in species delimitation and incorporates phylogenetic uncertainty in a Bayesian framework. The nearest-neighbor interchange algorithm was adapted to propose changes to the species tree, with the gene trees for multiple loci altered in the proposal to avoid conflicts with the newly proposed species tree. We also modify our previous scheme for specifying priors for species delimitation models to construct joint priors for models of species delimitation and species phylogeny. As in our earlier method, the modified algorithm integrates over gene trees, taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. We conducted a simulation study to examine the statistical properties of the method using six populations (two sequences each) and a true number of three species, with values of divergence times and ancestral population sizes that are realistic for recently diverged species. The results suggest that the method tends to be conservative with high posterior probabilities being a confident indicator of species status. Simulation results also indicate that the power of the method to delimit species increases with an increase of the divergence times in the species tree, and with an increased number of gene loci. Reanalyses of two data sets of cavefish and coast horned lizards suggest considerable phylogenetic uncertainty even though the data are informative about species delimitation. We discuss the impact of the prior on models of species delimitation and species phylogeny and of the prior on population size parameters (θ) on Bayesian species delimitation.
Assuntos
Modelos Genéticos , Tipagem de Sequências Multilocus , Algoritmos , Animais , Teorema de Bayes , Simulação por Computador , Peixes/genética , Lagartos/genética , Cadeias de Markov , Método de Monte Carlo , FilogeniaRESUMO
Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow on species trees remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, such as the isolation-migration model, the n-island model, and gene flow between non-sister species or involving ancestral species, and species boundaries crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times.
Assuntos
Classificação/métodos , Simulação por Computador , Fluxo Gênico , Modelos Genéticos , Filogenia , Alelos , Teorema de Bayes , Especiação GenéticaRESUMO
As demonstrated by the SARS-CoV-2 pandemic, the emergence of novel viral strains with increased transmission rates poses a serious threat to global health. Statistical models of genome sequence evolution may provide a critical tool for early detection of these strains. Using a novel stochastic model that links transmission rates to the entire viral genome sequence, we study the utility of phylogenetic methods that use a phylogenetic tree relating viral samples versus count-based methods that use case counts of variants over time exclusively to detect increased transmission rates and identify candidate causative mutations. We find that phylogenies in particular can detect novel transmission-enhancing variants very soon after their origin and may facilitate the development of early detection systems for outbreak surveillance.
Assuntos
COVID-19 , Genoma Viral , Filogenia , SARS-CoV-2 , SARS-CoV-2/genética , SARS-CoV-2/classificação , SARS-CoV-2/isolamento & purificação , Humanos , COVID-19/virologia , COVID-19/transmissão , COVID-19/epidemiologia , Mutação , Genômica/métodos , Evolução MolecularRESUMO
Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.