1.
Mol Phylogenet Evol ; 179: 107650, 2023 02.
Article in English | MEDLINE | ID: mdl-36441104

ABSTRACT

The effect of selection acting on regions of the genome on the accuracy of species-level phylogenetic inference using methods that do not explicitly model selection is an open question that is relevant to most, if not all, phylogenomic studies. To address this, we derive a mathematical approximation to the Wright-Fisher model with mutation and selection in the limit as the population size becomes large. In contrast to previous approximations based on diffusion processes, our approximation can be used to study the distribution of coalescent times for an arbitrary number of lineages, allowing calculation of the probability distribution of gene genealogies under the coalescent model. We use these calculations to show that direct selection at strengths typically encountered in practice has only a small effect on the distribution of coalescent times, and hence on the distribution of gene trees. This implies that many coalescent-based methods for estimating the species tree topology will be robust to the presence of selection in a subset of the underlying genes. Selection will, however, bias the estimation of speciation times, causing them to underestimate the true speciation times. Our model captures the effects of selection on the genealogies that generate the observed sequence data, but does not model selective pressures that act only on the subsequent sequences or that negatively impact gene tree estimation.


Subjects
Genetic Speciation; Models, Genetic; Phylogeny; Probability; Mutation
2.
Syst Biol ; 70(5): 891-907, 2021 08 11.
Article in English | MEDLINE | ID: mdl-33404632

ABSTRACT

Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in nature. The availability of large genome scale data sets has revolutionized hybridization studies to shift from the observation of the presence or absence of hybrids to the investigation of the genomic constitution of hybrids and their genome-specific evolutionary dynamics. Although a handful of methods have been proposed in an attempt to identify hybrids, accurate detection of hybridization from genomic data remains a challenging task. In addition to methods that infer phylogenetic networks or that utilize pairwise divergence, site pattern frequency based and population genetic clustering approaches are popularly used in practice, though the performance of these methods under different hybridization scenarios has not been extensively examined. Here, we use simulated data to comparatively evaluate the performance of four tools that are commonly used to infer hybridization events: the site pattern frequency based methods HyDe and the $D$-statistic (i.e., the ABBA-BABA test) and the population clustering approaches structure and ADMIXTURE. We consider single hybridization scenarios that vary in the time of hybridization and the amount of incomplete lineage sorting (ILS) for different proportions of parental contributions ($\gamma$); introgressive hybridization; multiple hybridization scenarios; and a mixture of ancestral and recent hybridization scenarios. We focus on the statistical power to detect hybridization and the false discovery rate (FDR) for comparisons of the $D$-statistic and HyDe, and the accuracy of the estimates of $\gamma$ as measured by the mean squared error for HyDe, structure, and ADMIXTURE. Both HyDe and the $D$-statistic are powerful for detecting hybridization in all scenarios except those with high ILS, although the $D$-statistic often has an unacceptably high FDR. 
The estimates of $\gamma$ in HyDe are impressively robust and accurate whereas structure and ADMIXTURE sometimes fail to identify hybrids, particularly when the proportional parental contributions are asymmetric (i.e., when $\gamma$ is close to 0). Moreover, the posterior distribution estimated using structure exhibits multimodality in many scenarios, making interpretation difficult. Our results provide guidance in selecting appropriate methods for identifying hybrid populations from genomic data. [ABBA-BABA test; ADMIXTURE; hybridization; HyDe; introgression; Patterson's $D$-statistic; Structure.].
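
To make the site pattern idea concrete, here is a minimal sketch of how Patterson's $D$-statistic can be computed from ABBA and BABA counts, $D = (n_{ABBA} - n_{BABA}) / (n_{ABBA} + n_{BABA})$. It assumes one aligned sequence per taxon in the order P1, P2, P3, outgroup, and it only counts biallelic sites. The function names are hypothetical and are not taken from HyDe or any other package named in the abstract, and no significance test is included.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns in four aligned sequences.

    The outgroup allele is treated as ancestral ("A"); only sites with
    exactly two distinct characters are considered.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # skip invariant and multi-allelic sites
        if a == o and b == c:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(p1, p2, p3, outgroup):
    """Return (ABBA - BABA) / (ABBA + BABA), or 0.0 if neither pattern occurs."""
    abba, baba = count_abba_baba(p1, p2, p3, outgroup)
    total = abba + baba
    if total == 0:
        return 0.0
    return (abba - baba) / total


if __name__ == "__main__":
    # Toy alignment: two ABBA sites, one BABA site, three uninformative sites.
    print(d_statistic("AAAACG", "GAGACA", "GGGACG", "AAAATA"))
```

In the absence of gene flow the two patterns are expected to occur equally often, so values of $D$ far from zero point toward introgression. Judging how far is far enough requires a resampling or $z$-score step that this sketch omits.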


Subjects
Genome; Hybridization, Genetic; Genetics, Population; Genomics; Phylogeny
3.
BMC Evol Biol ; 19(1): 112, 2019 05 30.
Article in English | MEDLINE | ID: mdl-31146685

ABSTRACT

BACKGROUND: Coalescent-based species tree inference has become widely used in the analysis of genome-scale multilocus and SNP datasets when the goal is inference of a species-level phylogeny. However, numerous evolutionary processes are known to violate the assumptions of a coalescence-only model and complicate inference of the species tree. One such process is hybrid speciation, in which a species shares its ancestry with two distinct species. Although many methods have been proposed to detect hybrid speciation, only a few have considered both hybridization and coalescence in a unified framework, and these are generally limited to the setting in which putative hybrid species must be identified in advance. RESULTS: Here we propose a method that can examine genome-scale data for a large number of taxa and detect those taxa that may have arisen via hybridization, as well as their potential "parental" taxa. The method is based on a model that considers both coalescence and hybridization together, and uses phylogenetic invariants to construct a test that scales well in terms of computational time for both the number of taxa and the amount of sequence data. We test the method using simulated data for up to 20 taxa and 100,000 bp, and find that the method accurately identifies both recent and ancient hybrid species in less than 30 s. We apply the method to two empirical datasets, one composed of Sistrurus rattlesnakes for which hybrid speciation is not supported by previous work, and one consisting of several species of Heliconius butterflies for which some evidence of hybrid speciation has been previously found. CONCLUSIONS: The proposed method is powerful for detecting hybridization for both recent and ancient hybridization events. The computations required can be carried out rapidly for a large number of sequences using genome-scale data, and the method is appropriate for both SNP and multilocus data.


Subjects
Databases, Genetic; Genomics; Hybridization, Genetic; Models, Genetic; Animals; Butterflies/genetics; Computer Simulation; Crotalus/genetics; Genetic Speciation; Phylogeny; Species Specificity
4.
Bioinformatics ; 34(3): 407-415, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29028881

ABSTRACT

Motivation: Genotyping and parameter estimation using high throughput sequencing data are everyday tasks for population geneticists, but methods developed for diploids are typically not applicable to polyploid taxa. This is due to their duplicated chromosomes, as well as the complex patterns of allelic exchange that often accompany whole genome duplication (WGD) events. For WGDs within a single lineage (autopolyploids), inbreeding can result from mixed mating and/or double reduction. For WGDs that involve hybridization (allopolyploids), alleles are typically inherited through independently segregating subgenomes. Results: We present two new models for estimating genotypes and population genetic parameters from genotype likelihoods for auto- and allopolyploids. We then use simulations to compare these models to existing approaches at varying depths of sequencing coverage and ploidy levels. These simulations show that our models typically have lower levels of estimation error for genotype and parameter estimates, especially when sequencing coverage is low. Finally, we also apply these models to two empirical datasets from the literature. Overall, we show that the use of genotype likelihoods to model non-standard inheritance patterns is a promising approach for conducting population genomic inferences in polyploids. Availability and implementation: A C++ program, EBG, is provided to perform inference using the models we describe. It is available under the GNU GPLv3 on GitHub: https://github.com/pblischak/polyploid-genotyping. Contact: blischak.4@osu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genotyping Techniques/methods; Inbreeding; Polymorphism, Single Nucleotide; Polyploidy; Sequence Analysis, DNA/methods; Software; Alleles; Animals; Eukaryota/genetics; Genetics, Population/methods; High-Throughput Nucleotide Sequencing/methods
5.
Syst Biol ; 67(5): 821-829, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29562307

ABSTRACT

The analysis of hybridization and gene flow among closely related taxa is a common goal for researchers studying speciation and phylogeography. Many methods for hybridization detection use simple site pattern frequencies from observed genomic data and compare them to null models that predict an absence of gene flow. The theory underlying the detection of hybridization using these site pattern probabilities exploits the relationship between the coalescent process for gene trees within population trees and the process of mutation along the branches of the gene trees. For certain models, site patterns are predicted to occur in equal frequency (i.e., their difference is 0), producing a set of functions called phylogenetic invariants. In this article, we introduce HyDe, a software package for detecting hybridization using phylogenetic invariants arising under the coalescent model with hybridization. HyDe is written in Python and can be used interactively or through the command line using pre-packaged scripts. We demonstrate the use of HyDe on simulated data, as well as on two empirical data sets from the literature. We focus in particular on identifying individual hybrids within population samples and on distinguishing between hybrid speciation and gene flow. HyDe is freely available as an open source Python package under the GNU GPL v3 on both GitHub (https://github.com/pblischak/HyDe) and the Python Package Index (PyPI: https://pypi.python.org/pypi/phyde).


Subjects
Computational Biology/methods; Gene Flow; Genetic Speciation; Hybridization, Genetic; Software
6.
Stat Appl Genet Mol Biol ; 17(3)2018 06 06.
Article in English | MEDLINE | ID: mdl-29874197

ABSTRACT

The increasing availability of population-level allele frequency data across one or more related populations necessitates the development of methods that can efficiently estimate population genetics parameters, such as the strength of selection acting on the population(s), from such data. Existing methods for this problem in the setting of the Wright-Fisher diffusion model are primarily likelihood-based, and rely on numerical approximation for likelihood computation and on bootstrapping for assessment of variability in the resulting estimates, requiring extensive computation. Recent work has provided a method for obtaining exact samples from general Wright-Fisher diffusion processes, enabling the development of methods for Bayesian estimation in this setting. We develop and implement a Bayesian method for estimating the strength of selection based on the Wright-Fisher diffusion for data sampled at a single time point. The method utilizes the latest algorithms for exact sampling to devise a Markov chain Monte Carlo procedure to draw samples from the joint posterior distribution of the selection coefficient and the allele frequencies. We demonstrate that when assumptions about the initial allele frequencies are accurate the method performs well for both simulated data and for an empirical data set on hypoxia in flies, where we find evidence for strong positive selection in a region of chromosome 2L previously identified. We discuss possible extensions of our method to the more general settings commonly encountered in practice, highlighting the advantages of Bayesian approaches to inference in this setting.


Subjects
Bayes Theorem; Gene Frequency; Genetics, Population; Models, Genetic; Algorithms; Animals; Drosophila melanogaster/genetics; Hypoxia/genetics; Likelihood Functions; Markov Chains; Monte Carlo Method; Polymorphism, Single Nucleotide
7.
Syst Biol ; 66(4): 620-636, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28123114

ABSTRACT

Detecting variation in the evolutionary process along chromosomes is increasingly important as whole-genome data become more widely available. For example, factors such as incomplete lineage sorting, horizontal gene transfer, and chromosomal inversion are expected to result in changes in the underlying gene trees along a chromosome, while changes in selective pressure and mutational rates for different genomic regions may lead to shifts in the underlying mutational process. We propose the split score as a general method for quantifying support for a particular phylogenetic relationship within a genomic data set. Because the split score is based on algebraic properties of a matrix of site pattern frequencies, it can be rapidly computed, even for data sets that are large in the number of taxa and/or in the length of the alignment, providing an advantage over other methods (e.g., maximum likelihood) that are often used to assess such support. Using simulation, we explore the properties of the split score, including its dependence on sequence length, branch length, size of a split and its ability to detect true splits in the underlying tree. Using a sliding window analysis, we show that split scores can be used to detect changes in the underlying evolutionary process for genome-scale data from primates, mosquitoes, and viruses in a computationally efficient manner. Computation of the split score has been implemented in the software package SplitSup.


Subjects
Classification/methods; Phylogeny; Animals; Culicidae/classification; Culicidae/genetics; Evolution, Molecular; Gene Transfer, Horizontal; Genome/genetics; Primates/classification; Primates/genetics; Software; Viruses/classification; Viruses/genetics
8.
Mol Phylogenet Evol ; 106: 144-150, 2017 01.
Article in English | MEDLINE | ID: mdl-27693467

ABSTRACT

Although it is widely appreciated that gene trees may differ from the overall species tree and from one another due to various evolutionary processes (e.g., incomplete lineage sorting (ILS), horizontal gene transfer, etc.), the extent of this incongruence is rarely quantified and discussed. Here we consider the expected amount of incongruence arising from ILS, as modeled by the coalescent process. In particular, we compute the probability that two gene trees randomly sampled from the same species tree agree with one another as well as the distribution of the Robinson-Foulds distance between them, for species trees with three to eight taxa. We demonstrate that, as expected under the coalescent model, the amount of discordance is affected by species tree-specific factors such as speciation times and effective population sizes for the species under consideration. Our results highlight the fact that substantial discordance may occur, even when the number of species is very small, which has implications both for larger taxon samples and for any method that uses estimated gene trees as the basis for further statistical inference. The amount of incongruence is substantial enough that such methods may need to be modified to account for variability in the underlying gene trees.
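
For intuition only, the smallest case can be worked by hand. The sketch below assumes a three-taxon species tree whose single internal branch has length $t$ in coalescent units (a symbol introduced here, not taken from the abstract). It uses the standard coalescent result that the concordant rooted gene tree topology has probability $1 - \tfrac{2}{3}e^{-t}$ and each of the two discordant topologies has probability $\tfrac{1}{3}e^{-t}$. Two gene trees drawn independently then agree when both fall on the same topology:

```latex
\begin{align*}
P(\text{two gene trees agree})
  &= \left(1 - \tfrac{2}{3}e^{-t}\right)^{2} + 2\left(\tfrac{1}{3}e^{-t}\right)^{2} \\
  &= 1 - \tfrac{4}{3}e^{-t} + \tfrac{2}{3}e^{-2t}.
\end{align*}
```

This tends to $1/3$ as $t \to 0$ and to $1$ as $t \to \infty$, and it is roughly $0.60$ at $t = 1$, so even with three species two loci can disagree a large fraction of the time when the internal branch is short. The larger trees treated in the paper require enumerating many more topologies and are not covered by this calculation.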


Subjects
Models, Genetic; Genetic Loci; Genetic Speciation; Phylogeny; Recombination, Genetic
9.
Mol Phylogenet Evol ; 105: 177-192, 2016 12.
Article in English | MEDLINE | ID: mdl-27614251

ABSTRACT

We propose a coalescent model for three species that allows gene flow between both pairs of sister populations. The model is designed for multilocus genomic sequence alignments, with one sequence sampled from each of the three species, and is formulated using a Markov chain representation that allows use of matrix exponentiation to compute analytical expressions for the probability density of coalescent histories. The coalescent history distribution as well as the gene tree topology distribution under this coalescent model with gene flow are then calculated via numerical integration. We analyze the model to compare the distributions of gene tree topologies and coalescent histories for species trees with differing effective population sizes and gene flow rates. Our results suggest conditions under which the species tree and associated parameters are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the coalescent history distribution may identify the species tree and associated parameters. Thus, the coalescent history distribution can be used to infer parameters such as the ancestral effective population sizes and the rates of gene flow in a maximum likelihood (ML) framework. We conduct computer simulations to evaluate the performance of our method in estimating these parameters, and we apply our method to an Afrotropical mosquito data set (Fontaine et al., 2015).


Subjects
Gene Flow; Models, Genetic; Computer Simulation; Phylogeny; Probability
10.
Mol Phylogenet Evol ; 70: 63-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24055603

ABSTRACT

Multi-locus phylogenetic inference is commonly carried out via models that incorporate the coalescent process to model the possibility that incomplete lineage sorting leads to incongruence between gene trees and the species tree. An interesting question that arises in this context is whether data "fit" the coalescent model. Previous work (Rosenfeld et al., 2012) has suggested that rooting of gene trees may account for variation in empirical data that has been previously attributed to the coalescent process. We examine this possibility using simulated data. We show that, in the case of four taxa, the distribution of gene trees observed from rooting estimated gene trees with either the molecular clock or with outgroup rooting can be closely matched by the distribution predicted by the coalescent model with specific choices of species tree branch lengths. We apply commonly-used coalescent-based methods of species tree inference to assess their performance in these situations.


Subjects
Phylogeny; Models, Genetic; Probability; Sequence Analysis, DNA
11.
BMC Bioinformatics ; 14: 200, 2013 Jun 20.
Article in English | MEDLINE | ID: mdl-23786262

ABSTRACT

BACKGROUND: In mammalian genetics, many quantitative traits, such as blood pressure, are thought to be influenced by specific genes, but are also affected by environmental factors, making the associated genes difficult to identify and locate from genetic data alone. In particular, the application of classical statistical methods to single nucleotide polymorphism (SNP) data collected in genome-wide association studies has been especially challenging. We propose a coalescent approach to search for SNPs associated with quantitative traits in genome-wide association study (GWAS) data by taking into account the evolutionary history among SNPs. RESULTS: We evaluate the performance of the new method using simulated data, and find that it performs at least as well as existing methods with an increase in performance in the case of population structure. Application of the methodology to a real data set consisting of high-density lipoprotein cholesterol measurements in mice shows the method performs well for empirical data, as well. CONCLUSIONS: By combining methods from stochastic processes and phylogenetics, this work provides an innovative avenue for the development of new statistical methodology in the analysis of GWAS data.


Subjects
Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Animals; Cholesterol, HDL/blood; Mice; Phenotype
12.
Mol Phylogenet Evol ; 69(3): 1057-62, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23769751

ABSTRACT

With recent advances in genomic sequencing, the importance of taking the effects of the processes that can cause discord between the speciation history and the individual gene histories into account has become evident. For multilocus datasets, it is difficult to achieve complete coverage of all sampled loci across all sample specimens, a problem that also arises when combining incompletely overlapping datasets. Here we examine how missing data affects the accuracy of species tree reconstruction. In our study, 10- and 100-locus sequence datasets were simulated under the coalescent model from shallow and deep speciation histories, and species trees were estimated using the maximum likelihood and Bayesian frameworks (with STEM and (*)BEAST, respectively). The accuracy of the estimated species trees was evaluated using the symmetric difference and the SPR distance. We examine the effects of sampling more than one individual per species, as well as the effects of different patterns of missing data (i.e., different amounts of missing data, which is represented among random taxa as opposed to being concentrated in specific taxa, as is often the case for empirical studies). Our general conclusion is that the species tree estimates are remarkably resilient to the effects of missing data. We find that for datasets with more limited numbers of loci, sampling more than one individual per species has the strongest effect on improving species tree accuracy when there is missing data, especially at higher degrees of missing data. For larger multilocus datasets (e.g., 25-100 loci), the amount of missing data has a negligible effect on species tree reconstruction, even at 50% missing data and a single sampled individual per species.


Subjects
Genetic Speciation; Models, Genetic; Phylogeny; Sequence Analysis, DNA/methods; Bayes Theorem; Computer Simulation; Likelihood Functions
13.
Syst Biol ; 60(4): 393-409, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21389297

ABSTRACT

Phylogenetic relationships and taxonomic distinctiveness of closely related species and subspecies are most accurately inferred from data derived from multiple independent loci. Here, we apply several approaches for understanding species-level relationships using data from 18 nuclear DNA loci and 1 mitochondrial DNA locus within currently described species and subspecies of Sistrurus rattlesnakes. Collectively, these methods provide evidence that a currently described species, the massasauga rattlesnake (Sistrurus catenatus), consists of two well-supported clades, one composed of the two western subspecies (S. c. tergeminus and S. c. edwardsii) and the other the eastern subspecies (S. c. catenatus). Within pigmy rattlesnakes (S. miliarius), however, there is not strong support across methods for any particular grouping at the subspecific level. Monophyly based tests for taxonomic distinctiveness show evidence for distinctiveness of all subspecies but this support is strongest by far for the S. c. catenatus clade. Because support for the distinctiveness of S. c. catenatus is both strong and consistent across methods, and due to its morphological distinctiveness and allopatric distribution, we suggest that this subspecies be elevated to full species status, which has significant conservation implications. Finally, most divergence time estimates based upon a fossil-calibrated species tree are > 50% younger than those from a concatenated gene tree analysis and suggest that an active period of speciation within Sistrurus occurred within the late Pliocene/Pleistocene eras.


Subjects
Crotalus/classification; Phylogeny; Animals; Crotalus/genetics; DNA/chemistry; DNA, Mitochondrial/chemistry; Genetic Speciation; Recombination, Genetic; Sequence Analysis, DNA; Species Specificity
14.
Science ; 376(6589): 156-162, 2022 04 08.
Article in English | MEDLINE | ID: mdl-35389782

ABSTRACT

Whereas DNA viruses are known to be abundant, diverse, and commonly key ecosystem players, RNA viruses are insufficiently studied outside disease settings. In this study, we analyzed ≈28 terabases of Global Ocean RNA sequences to expand Earth's RNA virus catalogs and their taxonomy, investigate their evolutionary origins, and assess their marine biogeography from pole to pole. Using new approaches to optimize discovery and classification, we identified RNA viruses that necessitate substantive revisions of taxonomy (doubling phyla and adding >50% new classes) and evolutionary understanding. "Species"-rank abundance determination revealed that viruses of the new phyla "Taraviricota," a missing link in early RNA virus evolution, and "Arctiviricota" are widespread and dominant in the oceans. These efforts provide foundational knowledge critical to integrating RNA viruses into ecological and epidemiological models.


Subjects
Genome, Viral; RNA Viruses; Viruses; Biological Evolution; Ecosystem; Oceans and Seas; Phylogeny; RNA; RNA Viruses/genetics; Virome/genetics; Viruses/genetics
15.
BMC Evol Biol ; 11: 77, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21435245

ABSTRACT

BACKGROUND: Colobine monkeys constitute a diverse group of primates with major radiations in Africa and Asia. However, phylogenetic relationships among genera are under debate, and recent molecular studies with incomplete taxon-sampling revealed discordant gene trees. To solve the evolutionary history of colobine genera and to determine causes for possible gene tree incongruences, we combined presence/absence analysis of mobile elements with autosomal, X chromosomal, Y chromosomal and mitochondrial sequence data from all recognized colobine genera. RESULTS: Gene tree topologies and divergence age estimates derived from different markers were similar, but differed in placing Piliocolobus/Procolobus and langur genera among colobines. Although insufficient data, homoplasy and incomplete lineage sorting might all have contributed to the discordance among gene trees, hybridization is favored as the main cause of the observed discordance. We propose that African colobines are paraphyletic, but might later have experienced female introgression from Piliocolobus/Procolobus into Colobus. In the late Miocene, colobines invaded Eurasia and diversified into several lineages. Among Asian colobines, Semnopithecus diverged first, indicating langur paraphyly. However, unidirectional gene flow from Semnopithecus into Trachypithecus via male introgression followed by nuclear swamping might have occurred until the earliest Pleistocene. CONCLUSIONS: Overall, our study provides the most comprehensive view on colobine evolution to date and emphasizes that analyses of various molecular markers, such as mobile elements and sequence data from multiple loci, are crucial to better understand evolutionary relationships and to trace hybridization events. Our results also suggest that sex-specific dispersal patterns, promoted by a respective social organization of the species involved, can result in different hybridization scenarios.


Subjects
Biological Evolution; Cell Nucleus/genetics; Colobinae/genetics; DNA, Mitochondrial/genetics; Hybridization, Genetic; Phylogeny; Alu Elements; Animals; Chromosome Mapping; Colobinae/classification; Female; Male; Sequence Analysis, DNA; X Chromosome/genetics; Y Chromosome/genetics
16.
Mol Phylogenet Evol ; 59(2): 354-63, 2011 May.
Article in English | MEDLINE | ID: mdl-21397706

ABSTRACT

Development of methods for estimating species trees from multilocus data is a current challenge in evolutionary biology. We propose a method for estimating the species tree topology and branch lengths using approximate Bayesian computation (ABC). The method takes as data a sample of observed rooted gene tree topologies, and then iterates through the following sequence of steps: First, a randomly selected species tree is used to compute the distribution of rooted gene tree topologies. This distribution is then compared to the observed gene topology frequencies, and if the fit between the observed and the predicted distributions is close enough, the proposed species tree is retained. Repeating this many times leads to a collection of retained species trees that are then used to form the estimate of the overall species tree. We test the performance of the method, which we call ST-ABC, using both simulated and empirical data. The simulation study examines both symmetric and asymmetric species trees over a range of branch lengths and sample sizes. The results from the simulation study show that the model performs very well, giving accurate estimates for both the topology and the branch lengths across the conditions studied, and that a sample size of 25 loci appears to be adequate for the method. Further, we apply the method to two empirical cases: a 4-taxon data set for primates and a 7-taxon data set for yeast. In both cases, we find that estimates obtained with ST-ABC agree with previous studies. The method provides efficient estimation of the species tree, and does not require sequence data, but rather the observed distribution of rooted gene topologies without branch lengths. Therefore, this method is a useful alternative to other currently available methods for species tree estimation.
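
The propose, predict, compare, and retain loop described above can be sketched as a plain rejection sampler. This is a toy restricted to three taxa, not the ST-ABC implementation. It assumes the standard three-taxon coalescent probabilities (concordant topology $1 - \tfrac{2}{3}e^{-t}$, each discordant topology $\tfrac{1}{3}e^{-t}$), a uniform prior on the internal branch length, and a maximum absolute difference as the distance. All function names, the prior, and the tolerance are illustrative choices rather than details from the paper.

```python
import math
import random


def gene_tree_distribution(species_topology, t):
    """Probabilities of the three rooted gene tree topologies for three taxa.

    species_topology is 0, 1, or 2 and indexes the topology that matches the
    species tree; t is the internal branch length in coalescent units.
    """
    discordant = math.exp(-t) / 3.0
    probs = [discordant, discordant, discordant]
    probs[species_topology] = 1.0 - 2.0 * discordant
    return probs


def abc_rejection(observed_freqs, n_proposals=5000, tolerance=0.05,
                  max_t=3.0, seed=1):
    """Retain (topology, t) proposals whose predicted topology frequencies
    lie within `tolerance` of the observed frequencies."""
    rng = random.Random(seed)
    retained = []
    for _ in range(n_proposals):
        topology = rng.randrange(3)    # propose a species tree topology
        t = rng.uniform(0.0, max_t)    # propose an internal branch length
        predicted = gene_tree_distribution(topology, t)
        distance = max(abs(o - p) for o, p in zip(observed_freqs, predicted))
        if distance < tolerance:
            retained.append((topology, t))
    return retained


if __name__ == "__main__":
    # Observed topology frequencies from a hypothetical sample of loci.
    kept = abc_rejection([0.8, 0.1, 0.1])
    mean_t = sum(t for _, t in kept) / len(kept)
    print(len(kept), "proposals retained; mean branch length", round(mean_t, 2))
```

The retained proposals act as an approximate posterior sample: the most frequent topology among them serves as the topology estimate, and the retained branch lengths summarize uncertainty in $t$. Tightening the tolerance trades acceptance rate for accuracy.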


Subjects
Algorithms; Bayes Theorem; Classification/methods; Models, Genetic; Phylogeny; Animals; Computer Simulation; Humans; Primates/genetics; Yeasts
17.
Syst Biol ; 59(5): 573-83, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20833951

ABSTRACT

Discord in the estimated gene trees among loci can be attributed to both the process of mutation and incomplete lineage sorting. Effectively modeling these two sources of variation--mutational and coalescent variance--provides two distinct challenges for phylogenetic studies. Despite extensive investigation on mutational models for gene-tree estimation over the past two decades and recent attention to modeling of the coalescent process for phylogenetic estimation, the effects of these two variances have yet to be evaluated simultaneously. Here, we partition the effects of mutational and coalescent processes on phylogenetic accuracy by comparing the accuracy of species trees estimated from gene trees (i.e., the actual coalescent genealogies) with that of species trees estimated from estimated gene trees (i.e., trees estimated from nucleotide sequences, which contain both coalescent and mutational variance). Not only is there a significant contribution of both mutational and coalescent variance to errors in species-tree estimates, but the relative magnitude of the effects on the accuracy of species-tree estimation also differs systematically depending on 1) the timing of divergence, 2) the sampling design, and 3) the method used for species-tree estimation. These findings explain why using more information contained in gene trees (e.g., topology and branch lengths as opposed to just topology) does not necessarily translate into pronounced gains in accuracy, highlighting the strengths and limits of different methods for species-tree estimation. Differences in accuracy scores between methods for different sampling regimes also emphasize that it would be a mistake to assume that more computationally intensive species-tree estimation procedures will always provide better estimates of species trees. To the contrary, the performance of a method depends not only on the method per se but also on the compatibilities between the input genetic data and the method as determined by the relative impact of mutational and coalescent variance.


Subjects
Models, Genetic; Mutation; Phylogeny; Research Design; Analysis of Variance; Genetic Speciation
18.
Bioinformatics ; 25(7): 971-3, 2009 Apr 01.
Article in English | MEDLINE | ID: mdl-19211573

ABSTRACT

SUMMARY: STEM is a software package written in the C language to obtain maximum likelihood (ML) estimates for phylogenetic species trees given a sample of gene trees under the coalescent model. It includes options to compute the ML species tree, search the space of all species trees for the k trees of highest likelihood and compute ML branch lengths for a user-input species tree. AVAILABILITY: The STEM package, including source code, is freely available at http://www.stat.osu.edu/~lkubatko/software/STEM/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Phylogeny; Software; Algorithms; Computer Simulation; Genes; Likelihood Functions
20.
Mol Ecol Resour ; 16(3): 742-54, 2016 May.
Article in English | MEDLINE | ID: mdl-26607217

ABSTRACT

Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, the use of high-throughput sequencing to study populations of polyploids has seen little application. This is due in large part to problems associated with determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty-ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction enzyme based techniques [e.g. GBS, RADseq]) to allele frequency estimation in a unified inferential framework using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences beyond the estimation of allele frequencies for autopolyploids by providing assessments of model adequacy and estimates of heterozygosity.


Subjects
Biostatistics/methods; Gene Frequency; Genetics, Population/methods; Genotype; Polyploidy; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA