Reference genome choice and filtering thresholds jointly influence phylogenomic analyses.
Rick, Jessica A; Brock, Chad D; Lewanski, Alexander L; Golcher-Benavides, Jimena; Wagner, Catherine E.
  • Rick JA; School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA.
  • Brock CD; Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA.
  • Lewanski AL; Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA.
  • Golcher-Benavides J; Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA.
  • Wagner CE; Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA.
Syst Biol; 2023 Oct 26.
Article in En | MEDLINE | ID: mdl-37881861
ABSTRACT
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant-calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate to what extent the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find greatest topological accuracy when filtering sites for minor allele count >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with minor allele count >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. 
These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short read genomic data for phylogenetic inference.
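
As an illustration of the two filter types discussed in the abstract, the following is a minimal sketch of a minor allele count (MAC) filter combined with a missing-data filter. It is not the authors' pipeline. It assumes diploid, biallelic sites encoded as the number of alternate-allele copies per sample (0, 1, or 2, with None for a missing call), and the names `minor_allele_count`, `missing_fraction`, `filter_sites`, `mac_threshold`, and `max_missing` are hypothetical. The strict "greater than" comparison follows the abstract's "minor allele count >3-4" wording.

```python
from typing import List, Optional

# Assumed encoding: alternate-allele copies per diploid sample; None = missing call.
Genotype = Optional[int]
Site = List[Genotype]


def minor_allele_count(site: Site) -> int:
    """Count copies of the rarer allele among the called genotypes at one site."""
    called = [g for g in site if g is not None]
    alt_copies = sum(called)
    ref_copies = 2 * len(called) - alt_copies
    return min(alt_copies, ref_copies)


def missing_fraction(site: Site) -> float:
    """Fraction of samples without a genotype call at one site."""
    if not site:
        return 0.0
    return sum(g is None for g in site) / len(site)


def filter_sites(sites: List[Site], mac_threshold: int, max_missing: float) -> List[Site]:
    """Keep sites with MAC > mac_threshold and missingness <= max_missing."""
    return [
        site
        for site in sites
        if minor_allele_count(site) > mac_threshold
        and missing_fraction(site) <= max_missing
    ]


# Toy data: three sites scored in six samples (values are made up).
sites = [
    [0, 0, 1, 0, None, 0],        # MAC 1, one missing call
    [1, 2, 0, 1, 1, 0],           # MAC 5, no missing calls
    [2, 2, 2, None, None, 1],     # MAC 1, two missing calls
]
kept = filter_sites(sites, mac_threshold=2, max_missing=0.2)
print(len(kept))
```

In this toy example only the second site passes both filters. Raising `mac_threshold` removes progressively more low-frequency variants, which is the kind of filtering stringency the abstract reports as biasing inferred trees.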

Full text: 1 Database: MEDLINE Language: En Year: 2023 Document type: Article
