Effects of missing data on species tree estimation under the coalescent.
Hovmöller, Rasmus; Knowles, L Lacey; Kubatko, Laura S.
Affiliation
  • Hovmöller R; Department of Statistics, The Ohio State University, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, United States.
Mol Phylogenet Evol ; 69(3): 1057-62, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23769751
ABSTRACT
With recent advances in genomic sequencing, the importance of accounting for the processes that can cause discord between the speciation history and individual gene histories has become evident. For multilocus datasets, it is difficult to achieve complete coverage of all sampled loci across all sampled specimens, a problem that also arises when combining incompletely overlapping datasets. Here we examine how missing data affect the accuracy of species tree reconstruction. In our study, 10- and 100-locus sequence datasets were simulated under the coalescent model from shallow and deep speciation histories, and species trees were estimated in maximum likelihood and Bayesian frameworks (with STEM and *BEAST, respectively). The accuracy of the estimated species trees was evaluated using the symmetric difference and the SPR distance. We examine the effects of sampling more than one individual per species, as well as the effects of different patterns of missing data (i.e., different amounts of missing data, distributed among random taxa rather than concentrated in specific taxa, as is often the case in empirical studies). Our general conclusion is that species tree estimates are remarkably resilient to the effects of missing data. We find that for datasets with a more limited number of loci, sampling more than one individual per species has the strongest effect on improving species tree accuracy when there are missing data, especially at higher proportions of missing data. For larger multilocus datasets (e.g., 25-100 loci), the amount of missing data has a negligible effect on species tree reconstruction, even at 50% missing data and a single sampled individual per species.
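The abstract scores accuracy with the symmetric difference between the true and estimated species trees. As a rough illustration only, the sketch below computes a rooted-cluster version of that metric: each tree is reduced to its set of non-trivial clades, and the distance is the number of clades found in exactly one tree. The nested-tuple tree encoding and the function names `clusters` and `symmetric_difference` are hypothetical; the abstract does not say whether the study compared rooted clusters or unrooted bipartitions, or whether values were normalized. The SPR distance is not sketched here.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a rooted tree as nested tuples of taxon names,
# e.g. (("A", "B"), ("C", "D")). Not the format used by STEM or *BEAST.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial clusters (clades) of a rooted tree.

    Single-taxon clusters and the full taxon set are dropped, because
    every tree on the same taxa shares them.
    """
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        # A leaf contributes just its own name.
        if isinstance(node, str):
            return frozenset([node])
        # An internal node's cluster is the union of its children's taxa.
        taxa = frozenset().union(*(walk(child) for child in node))
        found.add(taxa)
        return taxa

    all_taxa = walk(tree)
    found.discard(all_taxa)
    return {c for c in found if len(c) > 1}


def symmetric_difference(t1: Tree, t2: Tree) -> int:
    """Count clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


# Assumed example: the estimate swaps B and C relative to the true tree.
true_tree = ((("A", "B"), "C"), ("D", "E"))
estimated = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(true_tree, estimated))  # 2
```

Here the clade {A, B} appears only in the true tree and {A, C} only in the estimate, so the distance is 2; identical trees give 0.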
Subjects

Full text: 1 Database: MEDLINE Main subject: Phylogeny / Sequence Analysis, DNA / Genetic Speciation / Models, Genetic Study type: Prognostic_studies Language: English Year of publication: 2013 Document type: Article

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Filogenia / Análise de Sequência de DNA / Especiação Genética / Modelos Genéticos Tipo de estudo: Prognostic_studies Idioma: En Ano de publicação: 2013 Tipo de documento: Article