ABSTRACT
Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model's ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.
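The genotype-then-convolution step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it conditions on a single population allele frequency `p`, whereas the full expected SFS would also average over the distribution of `p` implied by the demographic model. The function names are hypothetical. The closed-form genotype probabilities used here are those of a beta-binomial distribution with two trials and parameters alpha = p(1 - F)/F and beta = (1 - p)(1 - F)/F, which is an assumed parameterization that the abstract does not spell out.

```python
import numpy as np

def genotype_probs(p, F):
    """Probabilities of carrying 0, 1, or 2 derived alleles in one diploid.

    Assumed parameterization: beta-binomial with 2 trials, written in its
    closed form so that F = 0 (random mating) and F = 1 are also handled.
    """
    shared = F * p * (1.0 - p)             # excess homozygosity due to inbreeding
    return np.array([
        (1.0 - p) ** 2 + shared,           # homozygous ancestral
        2.0 * p * (1.0 - p) * (1.0 - F),   # heterozygous
        p ** 2 + shared,                   # homozygous derived
    ])

def sample_count_probs(p, F, n_ind):
    """Distribution of the derived-allele count in n_ind diploids (0..2*n_ind).

    Individuals are treated as independent given p, so the distribution of the
    summed count is the repeated convolution of the per-individual genotype
    distribution.
    """
    dist = np.array([1.0])
    for _ in range(n_ind):
        dist = np.convolve(dist, genotype_probs(p, F))
    return dist

if __name__ == "__main__":
    # With F = 0 the result is Binomial(2n, p); larger F shifts mass to even counts.
    print(sample_count_probs(0.3, 0.0, 3))
    print(sample_count_probs(0.3, 0.9, 3))
```

With F = 0 the sketch reduces to the usual binomial sampling of 2n chromosomes, which is the random-mating assumption that the abstract says most SFS methods make.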
Subjects
Inbreeding, Genetic Models, Animals, Brassica/genetics, Computer Simulation, Single Nucleotide Polymorphism, Population Dynamics, Puma/genetics
ABSTRACT
Many crops are polyploid or have a polyploid ancestry. Recent phylogenetic analyses have found that polyploidy often preceded the domestication of crop plants. One explanation for this observation is that increased genetic diversity following polyploidy may have been important during the strong artificial selection that occurs during domestication. To test the connection between domestication and polyploidy, we identified and examined candidate genes associated with the domestication of the diverse crop varieties of Brassica rapa. Like all 'diploid' flowering plants, B. rapa has a diploidized paleopolyploid genome and experienced many rounds of whole genome duplication (WGD). We analyzed transcriptome data of more than 100 cultivated B. rapa accessions. Using a combination of approaches, we identified > 3000 candidate genes associated with the domestication of four major B. rapa crop varieties. Consistent with our expectation, we found that the candidate genes were significantly enriched with genes derived from the Brassiceae mesohexaploidy. We also observed that paleologs were significantly more diverse than non-paleologs. Our analyses provide evidence that genetic diversity derived from ancient polyploidy played a key role in the domestication of B. rapa and support its importance in the success of modern agriculture.
Subjects
Brassica rapa, Domestication, Brassica rapa/genetics, Plant Genome/genetics, Phylogeny, Polyploidy
ABSTRACT
PREMISE: Whole-genome duplications (WGDs) are prevalent throughout the evolutionary history of plants. For example, dozens of WGDs have been phylogenetically localized across the order Brassicales, specifically, within the family Brassicaceae. A WGD event has also been identified in the Cleomaceae, the sister family to Brassicaceae, yet its placement, as well as that of WGDs in other families in the order, remains unclear. METHODS: Phylo-transcriptomic data were generated and used to infer a nuclear phylogeny for 74 Brassicales taxa. Genome survey sequencing was also performed on 66 of those taxa to infer a chloroplast phylogeny. These phylogenies were used to assess and confirm relationships among the major families of the Brassicales and within Brassicaceae. Multiple WGD inference methods were then used to assess the placement of WGDs on the nuclear phylogeny. RESULTS: Well-supported chloroplast and nuclear phylogenies for the Brassicales and the putative placement of the Cleomaceae-specific WGD event Th-α are presented. This work also provides evidence for previously hypothesized WGDs, including a well-supported event shared by at least two members of the Resedaceae family, and a possible event within the Capparaceae. CONCLUSIONS: Phylogenetics and the placement of WGDs within highly polyploid lineages continue to be a major challenge. This study adds to the conversation on WGD inference difficulties by demonstrating that sampling is especially important for WGD identification and phylogenetic placement. Given its economic importance and genomic resources, the Brassicales continues to be an ideal group for assessing WGD inference methods.
Subjects
Gene Duplication, Magnoliopsida/genetics, Molecular Evolution, Genome, Plant Genome/genetics, Humans, Phylogeny, Polyploidy
ABSTRACT
Motivation: Genotyping and parameter estimation using high throughput sequencing data are everyday tasks for population geneticists, but methods developed for diploids are typically not applicable to polyploid taxa. This is due to their duplicated chromosomes, as well as the complex patterns of allelic exchange that often accompany whole genome duplication (WGD) events. For WGDs within a single lineage (autopolyploids), inbreeding can result from mixed mating and/or double reduction. For WGDs that involve hybridization (allopolyploids), alleles are typically inherited through independently segregating subgenomes. Results: We present two new models for estimating genotypes and population genetic parameters from genotype likelihoods for auto- and allopolyploids. We then use simulations to compare these models to existing approaches at varying depths of sequencing coverage and ploidy levels. These simulations show that our models typically have lower levels of estimation error for genotype and parameter estimates, especially when sequencing coverage is low. Finally, we also apply these models to two empirical datasets from the literature. Overall, we show that the use of genotype likelihoods to model non-standard inheritance patterns is a promising approach for conducting population genomic inferences in polyploids. Availability and implementation: A C++ program, EBG, is provided to perform inference using the models we describe. It is available under the GNU GPLv3 on GitHub: https://github.com/pblischak/polyploid-genotyping. Contact: blischak.4@osu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Genotyping Techniques/methods, Inbreeding, Single Nucleotide Polymorphism, Polyploidy, DNA Sequence Analysis/methods, Software, Alleles, Animals, Eukaryota/genetics, Population Genetics/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Sympatric diversification is recognized to have played an important role in the evolution of biodiversity. However, an in situ sympatric origin for codistributed taxa is difficult to demonstrate because different evolutionary processes can lead to similar biogeographic outcomes, especially in ecosystems that can readily facilitate secondary contact due to a lack of hard barriers to dispersal. Here we use a genomic (ddRADseq), model-based approach to delimit a species complex of tropical sea anemones that are codistributed on coral reefs throughout the Tropical Western Atlantic. We use coalescent simulations in fastsimcoal2 and ordinary differential equations in Moments to test competing diversification scenarios that span the allopatric-sympatric continuum. Our results suggest that the corkscrew sea anemone Bartholomea annulata is a cryptic species complex whose members are codistributed throughout their range. Simulation and model selection analyses from both approaches suggest these lineages experienced historical and contemporary gene flow, supporting a sympatric origin, but an alternative secondary contact model receives appreciable model support in fastsimcoal2. Leveraging the genome of the closely related Exaiptasia diaphana, we identify five loci under divergent selection between cryptic B. annulata lineages that fall within mRNA transcripts or CDS regions. Our study provides a rare empirical, genomic example of sympatric speciation in a tropical anthozoan and the first range-wide molecular study of a tropical sea anemone, underscoring that anemone diversity is under-described in the tropics, and highlighting the need for additional systematic studies into these ecologically and economically important species.
Subjects
Gene Flow, Genetic Speciation, Genomics, Sea Anemones/genetics, Sympatry/genetics, Animals, Atlantic Ocean, Cluster Analysis, Discriminant Analysis, Gene Ontology, Genetic Loci, Population Genetics, Geography, Genetic Models, Genetic Selection, Species Specificity
ABSTRACT
The analysis of hybridization and gene flow among closely related taxa is a common goal for researchers studying speciation and phylogeography. Many methods for hybridization detection use simple site pattern frequencies from observed genomic data and compare them to null models that predict an absence of gene flow. The theory underlying the detection of hybridization using these site pattern probabilities exploits the relationship between the coalescent process for gene trees within population trees and the process of mutation along the branches of the gene trees. For certain models, site patterns are predicted to occur in equal frequency (i.e., their difference is 0), producing a set of functions called phylogenetic invariants. In this article, we introduce HyDe, a software package for detecting hybridization using phylogenetic invariants arising under the coalescent model with hybridization. HyDe is written in Python and can be used interactively or through the command line using pre-packaged scripts. We demonstrate the use of HyDe on simulated data, as well as on two empirical data sets from the literature. We focus in particular on identifying individual hybrids within population samples and on distinguishing between hybrid speciation and gene flow. HyDe is freely available as an open source Python package under the GNU GPL v3 on both GitHub (https://github.com/pblischak/HyDe) and the Python Package Index (PyPI: https://pypi.python.org/pypi/phyde).
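The site-pattern idea above can be illustrated by tallying the three parsimony-informative biallelic patterns in a four-taxon alignment. This is a minimal sketch and does not use HyDe's own code or API. The function name, the taxon ordering, and the uppercase A/C/G/T input are assumptions made for illustration. Under a model in which two of these patterns are expected at equal frequency, their observed difference should be close to 0, which is the kind of invariant the abstract refers to.

```python
from collections import Counter

VALID = set("ACGT")

def site_pattern_counts(seq1, seq2, seq3, seq4):
    """Count the patterns iijj, ijij, and ijji across aligned sites.

    The four sequences must be aligned, equal in length, and uppercase.
    Sites with gaps or ambiguity codes, invariant sites, and sites with
    more than two states are skipped.
    """
    counts = Counter({"iijj": 0, "ijij": 0, "ijji": 0})
    for a, b, c, d in zip(seq1, seq2, seq3, seq4):
        if not {a, b, c, d} <= VALID:
            continue                      # skip missing or ambiguous data
        if a == b and c == d and a != c:
            counts["iijj"] += 1           # taxa 1+2 share one state, 3+4 the other
        elif a == c and b == d and a != b:
            counts["ijij"] += 1           # taxa 1+3 versus 2+4
        elif a == d and b == c and a != b:
            counts["ijji"] += 1           # taxa 1+4 versus 2+3
    return counts

if __name__ == "__main__":
    counts = site_pattern_counts("AAAAAN", "ACCACA", "CACAGC", "CCAATC")
    print(dict(counts))
    # Difference between two patterns; near 0 when they are equally frequent.
    print(counts["ijij"] - counts["ijji"])
```

Turning such counts into a formal test requires the expected pattern probabilities and a variance estimate, which this sketch deliberately leaves out.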
Subjects
Computational Biology/methods, Gene Flow, Genetic Speciation, Genetic Hybridization, Software
ABSTRACT
Polyploidy is an important generator of evolutionary novelty across diverse groups in the Tree of Life, including many crops. However, the impact of whole-genome duplication depends on the mode of formation: doubling within a single lineage (autopolyploidy) versus doubling after hybridization between two different lineages (allopolyploidy). Researchers have historically treated these two scenarios as completely separate cases based on patterns of chromosome pairing, but these cases represent ideals on a continuum of chromosomal interactions among duplicated genomes. Understanding the history of polyploid species thus demands quantitative inferences of demographic history and rates of exchange between subgenomes. To meet this need, we developed diffusion models for genetic variation in polyploids with subgenomes that cannot be bioinformatically separated and with potentially variable inheritance patterns, implementing them in the dadi software. We validated our models using forward SLiM simulations and found that our inference approach is able to accurately infer evolutionary parameters (timing, bottleneck size) involved with the formation of auto- and allotetraploids, as well as exchange rates in segmental allotetraploids. We then applied our models to empirical data for allotetraploid shepherd's purse (Capsella bursa-pastoris), finding evidence for allelic exchange between the subgenomes. Taken together, our model provides a foundation for demographic modeling in polyploids using diffusion equations, which will help increase our understanding of the impact of demography and selection in polyploid lineages.
Subjects
Capsella, Polyploidy, Biological Evolution, Genetic Hybridization, Capsella/genetics, Demography
ABSTRACT
Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their nonindependence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here, we use CNNs for selecting among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (dXY) calculated in windows across the genome. Using coalescent simulations to train and independently test a neural network showed that our method, HyDe-CNN, was able to accurately perform model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies and compared it to phylogeny-based introgression statistics. Given the flexibility of our approach, the dropping cost of long-read sequencing and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.
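The windowed divergence statistic mentioned above can be sketched as follows. This assumes the common definition of dXY as the mean per-site number of differences over all pairs consisting of one sequence from each population. The abstract does not state which variant of the statistic is used, and the function names and the use of non-overlapping windows are illustrative assumptions.

```python
import numpy as np

def dxy(pop_x, pop_y):
    """Mean per-site difference over all cross-population sequence pairs.

    pop_x and pop_y are integer arrays of shape (n_sequences, n_sites)
    holding allele codes for the same aligned sites.
    """
    # Broadcasting compares every sequence in X with every sequence in Y.
    diffs = pop_x[:, None, :] != pop_y[None, :, :]
    return diffs.mean()

def windowed_dxy(pop_x, pop_y, window):
    """dXY in consecutive non-overlapping windows of `window` sites.

    Trailing sites that do not fill a complete window are ignored.
    """
    n_sites = pop_x.shape[1]
    starts = range(0, n_sites - window + 1, window)
    return np.array([dxy(pop_x[:, s:s + window], pop_y[:, s:s + window])
                     for s in starts])

if __name__ == "__main__":
    x = np.array([[0, 0, 0, 0],
                  [0, 0, 1, 1]])
    y = np.array([[0, 0, 0, 0]])
    print(windowed_dxy(x, y, 2))
```

Computing this vector for each pair of populations and stacking the results would give a matrix of windows by population pairs, which is one plausible way to form an image-like input for a CNN.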
Subjects
Butterflies, Animals, Butterflies/genetics, Chromosomes, Genetic Speciation, Genetic Hybridization, Neural Networks (Computer), Phylogeny
ABSTRACT
Most land plants are now known to be ancient polyploids that have rediploidized. Diploidization involves many changes in genome organization that ultimately restore bivalent chromosome pairing and disomic inheritance, and resolve dosage and other issues caused by genome duplication. In this review, we discuss the nature of polyploidy and its impact on chromosome pairing behavior. We also provide an overview of two major and largely independent processes of diploidization: cytological diploidization and genic diploidization/fractionation. Finally, we compare variation in gene fractionation across land plants and highlight the differences in diploidization between plants and animals. Altogether, we demonstrate recent advancements in our understanding of variation in the patterns and processes of diploidization in land plants and provide a road map for future research to unlock the mysteries of diploidization and eukaryotic genome evolution.
Subjects
Embryophyta, Plant Genome, Animals, Molecular Evolution, Plants/genetics, Polyploidy
ABSTRACT
Despite early domestication around 3000 BC, the evolutionary history of the ancient allotetraploid species Brassica juncea (L.) Czern & Coss remains uncertain. Here, we report a chromosome-scale de novo assembly of a yellow-seeded B. juncea genome by integrating long-read and short-read sequencing, optical mapping and Hi-C technologies. Nuclear and organelle phylogenies of 480 accessions worldwide indicated that B. juncea most likely had a single origin in West Asia, 8,000-14,000 years ago, via natural interspecific hybridization. Subsequently, new crop types evolved through spontaneous gene mutations and introgressions along three independent routes of eastward expansion. Selective sweeps, genome-wide trait associations and tissue-specific RNA-sequencing analysis shed light on the domestication history of flowering time and seed weight, and on human selection for morphological diversification in this versatile species. Our data provide a comprehensive insight into the origin and domestication and a foundation for genomics-based breeding of B. juncea.
Subjects
Biological Evolution, Plant Chromosomes/genetics, Domestication, Mustard Plant/genetics, Plant Breeding, Plant Genome/genetics, Genetic Hybridization/genetics, Heritable Quantitative Trait
ABSTRACT
PREMISE: Environmentally controlled facilities, such as growth chambers, are essential tools for experimental research. Automated, low-cost, remote-monitoring hardware can greatly improve both reproducibility and maintenance. METHODS AND RESULTS: Using a Raspberry Pi computer, open-source software, environmental sensors, and a camera, we developed Growth Monitor pi (GMpi), a cost-effective system for monitoring growth chamber conditions. Coupled with our software, GMPi_Pack, our setup automates sensor readings, photography, and alerts when conditions fall out of range. CONCLUSIONS: GMpi offers access to environmental data logging, improving reproducibility of experiments and reinforcing the stability of controlled environmental facilities. The device is also flexible and scalable, allowing researchers the ability to customize and expand GMpi for their own needs.
ABSTRACT
PREMISE OF THE STUDY: Targeted enrichment strategies for phylogenomic inference are a time- and cost-efficient way to collect DNA sequence data for large numbers of individuals at multiple, independent loci. Automated and reproducible processing of these data is a crucial step for researchers conducting phylogenetic studies. METHODS AND RESULTS: We present Fluidigm2PURC, an open source Python utility for processing paired-end Illumina data from double-barcoded PCR amplicons. In combination with the program PURC (Pipeline for Untangling Reticulate Complexes), our scripts process raw FASTQ files for analysis with PURC and use its output to infer haplotypes for diploids, polyploids, and samples with unknown ploidy. We demonstrate the use of the pipeline with an example data set from the genus Thalictrum (Ranunculaceae). CONCLUSIONS: Fluidigm2PURC is freely available for Unix-like operating systems on GitHub (https://github.com/pblischak/fluidigm2purc) and for all operating systems through Docker (https://hub.docker.com/r/pblischak/fluidigm2purc).
ABSTRACT
PREMISE OF THE STUDY: We developed primers targeting nuclear loci in Castilleja with the goal of reconstructing the evolutionary history of this challenging clade. These primers were tested across other major clades in Orobanchaceae to assess their broader utility. METHODS AND RESULTS: We assembled low-coverage genomes for three taxa in Castilleja and developed primer combinations for the single-copy conserved ortholog set (COSII) and the pentatricopeptide repeat (PPR) gene family. These primer combinations were designed to take advantage of the Fluidigm microfluidic PCR platform and are well suited for high-throughput sequencing applications. Eighty-seven primers were designed for Castilleja, and 27 were found to have broader utility in Orobanchaceae. CONCLUSIONS: These results demonstrate the utility of these primers, not only across Castilleja, but for other lineages within Orobanchaceae as well. This expanded molecular toolkit will be an asset to future phylogenetic studies in Castilleja and throughout Orobanchaceae.
ABSTRACT
Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, the use of high-throughput sequencing to study populations of polyploids has seen little application. This is due in large part to problems associated with determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction-enzyme-based techniques [e.g., GBS, RADseq]) to allele frequency estimation in a unified inferential framework using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences beyond the estimation of allele frequencies for autopolyploids by providing assessments of model adequacy and estimates of heterozygosity.
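The idea of summing over genotype uncertainty when estimating an allele frequency from read counts can be sketched for a single site as follows. This is a simplified maximum-likelihood grid search, not the hierarchical Bayesian sampler implemented in polyfreqs. The function names, the fixed sequencing error rate `eps`, the binomial genotype prior, and the grid resolution are all assumptions made for illustration.

```python
from math import comb, log

def site_log_lik(p, allele_reads, total_reads, ploidy, eps=0.01):
    """Log-likelihood of allele frequency p at one biallelic site.

    For each individual the unknown dosage g (0..ploidy) is summed out:
    the prior on g is Binomial(ploidy, p), and the reads carrying the focal
    allele are Binomial(total, q) with q = (g/ploidy)(1-eps) + (1-g/ploidy)eps.
    """
    ll = 0.0
    for r, t in zip(allele_reads, total_reads):
        marginal = 0.0
        for g in range(ploidy + 1):
            prior = comb(ploidy, g) * p ** g * (1.0 - p) ** (ploidy - g)
            frac = g / ploidy
            q = frac * (1.0 - eps) + (1.0 - frac) * eps
            marginal += prior * comb(t, r) * q ** r * (1.0 - q) ** (t - r)
        ll += log(marginal)
    return ll

def estimate_freq(allele_reads, total_reads, ploidy, grid=99):
    """Grid value of p in (0, 1) that maximizes the site log-likelihood."""
    candidates = [(i + 1) / (grid + 1) for i in range(grid)]
    return max(candidates,
               key=lambda p: site_log_lik(p, allele_reads, total_reads, ploidy))

if __name__ == "__main__":
    # Three tetraploid individuals with 10 reads each at one site.
    print(estimate_freq([3, 5, 8], [10, 10, 10], ploidy=4))
```

Because each individual contributes only through its marginal likelihood, no hard genotype call is needed, which is the property that makes this kind of model usable at low coverage.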
Subjects
Biostatistics/methods, Gene Frequency, Population Genetics/methods, Genotype, Polyploidy, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis
ABSTRACT
PREMISE OF THE STUDY: Penstemon (Plantaginaceae) is a large and diverse genus endemic to North America. However, determining the phylogenetic relationships among its 280 species has been difficult due to its recent evolutionary radiation. The development of a large, multilocus data set can help to resolve this challenge. METHODS: Using both previously sequenced genomic libraries and our own low-coverage whole-genome shotgun sequencing libraries, we used the MAKER2 Annotation Pipeline to identify gene regions for the development of sequencing loci from six extremely low-coverage Penstemon genomes (~0.005×-0.007×). We also compared this approach to BLAST searches, and conducted analyses to characterize sequence divergence across the species sequenced. RESULTS: Annotations and gene predictions were successfully added to more than 10,000 contigs for potential use in downstream primer design. Primers were then designed for chloroplast, mitochondrial, and nuclear loci from these annotated sequences. MAKER2 identified longer gene regions in all six Penstemon genomes when compared with BLASTN and BLASTX searches. The average level of sequence divergence among the six species was 7.14%. DISCUSSION: Combining bioinformatics tools into a workflow that produces annotations can be useful for creating potential phylogenetic markers from thousands of sequences even when genome coverage is extremely low and reference data are only available from distant relatives. Furthermore, the output from MAKER2 contains information about important gene features, such as exon boundaries, and can be easily integrated with visualization tools to facilitate the process of marker development.