ABSTRACT
Crop production is becoming an increasing challenge as the global population grows and the climate changes. Modern cultivated crop species are selected for productivity under optimal growth environments and have often lost genetic variants that could allow them to adapt to diverse, and now rapidly changing, environments. These genetic variants are often present in their closest wild relatives, but so are less desirable traits. How to preserve and effectively utilize the rich genetic resources that crop wild relatives offer while avoiding detrimental variants and maladaptive genetic contributions is a central challenge for ongoing crop improvement. This Essay explores this challenge and potential paths that could lead to a solution.
Subjects
Agricultural Crops, Diamond, Plant Genome, Phenotype, Physiological Adaptation
ABSTRACT
Fusarium oxysporum f. sp. fragariae (Fof) race 1 is avirulent on cultivars with the dominant resistance gene FW1, while Fof race 2 is virulent on FW1-resistant cultivars. We hypothesized there was a gene-for-gene interaction between a gene at the FW1 locus and an avirulence gene (AvrFW1) in Fof race 1. To identify a candidate AvrFW1, we compared genomes of 24 Fof race 1 and three Fof race 2 isolates. We found one candidate gene that was present in race 1, was absent in race 2, was highly expressed in planta, and was homologous to a known effector, secreted in xylem 6 (SIX6). We knocked out SIX6 in two Fof race 1 isolates by homologous recombination. All SIX6 knockout transformants (ΔSIX6) gained virulence on FW1/fw1 cultivars, whereas ectopic transformants and the wild-type isolates remained avirulent. ΔSIX6 isolates were quantitatively less virulent on FW1/fw1 cultivars Fronteras and San Andreas than on fw1/fw1 cultivars. Seedlings from an FW1/fw1 × fw1/fw1 population were genotyped for FW1 and tested for susceptibility to a SIX6 knockout isolate. Results suggested that additional minor-effect quantitative resistance genes could be present at the FW1 locus. This work demonstrates that SIX6 acts as an avirulence factor interacting with a resistance gene at the FW1 locus. The identification of AvrFW1 enables surveillance for Fof race 2 and provides insight into the mechanisms of FW1-mediated resistance. Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
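The first filtering step described above (present in race 1, absent in race 2) can be sketched as a set operation. This is only an illustration: the isolate names, the gene names other than SIX6, and the presence/absence calls are hypothetical, and the expression and homology filters mentioned in the abstract are not modeled.

```python
# Hypothetical presence/absence calls per isolate (illustrative data only).
presence = {
    "race1_isolate_A": {"geneX", "SIX6", "geneY"},
    "race1_isolate_B": {"SIX6", "geneY"},
    "race2_isolate_A": {"geneY"},
    "race2_isolate_B": {"geneX", "geneY"},
}
# Hypothetical race assignment for each isolate.
race = {
    "race1_isolate_A": 1,
    "race1_isolate_B": 1,
    "race2_isolate_A": 2,
    "race2_isolate_B": 2,
}


def candidate_avirulence_genes(presence, race):
    """Return genes found in every race 1 isolate and in no race 2 isolate."""
    race1_sets = [genes for iso, genes in presence.items() if race[iso] == 1]
    race2_sets = [genes for iso, genes in presence.items() if race[iso] == 2]
    in_all_race1 = set.intersection(*race1_sets)
    in_any_race2 = set.union(*race2_sets)
    return in_all_race1 - in_any_race2


candidates = candidate_avirulence_genes(presence, race)
```

In a real comparison the presence/absence calls would come from genome alignments or annotation, and the surviving candidates would still need the expression and homology evidence the abstract describes.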
Subjects
Disease Resistance, Fragaria, Fusarium, Plant Diseases, Fusarium/pathogenicity, Fusarium/genetics, Plant Diseases/microbiology, Virulence, Fragaria/microbiology, Disease Resistance/genetics, Fungal Proteins/genetics, Fungal Proteins/metabolism, Plant Proteins/genetics, Plant Proteins/metabolism, Xylem/microbiology
ABSTRACT
The development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations has been driven by advances in high-throughput genotyping. For many complex traits, the underlying genetic variation is caused by the segregation of one or more 'large-effect' loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance (p) and heritability ([Formula: see text]) for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate p and [Formula: see text]. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. Algebraic solutions for bias-correcting estimates of p and [Formula: see text] were found that only depend on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have heretofore not been used in complex trait analyses, yielded unbiased estimates of p and [Formula: see text], in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of p and [Formula: see text]. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.
Subjects
Forecasting/methods, Multifactorial Inheritance/genetics, Quantitative Trait Loci/genetics, Alleles, Animals, Genetic Markers/genetics, Genetic Variation/genetics, Genomics/methods, Genotype, Humans, Genetic Models, Theoretical Models, Phenotype, Single Nucleotide Polymorphism/genetics
ABSTRACT
Cultivated strawberry (Fragaria × ananassa) is one of our youngest domesticates, originating in early eighteenth-century Europe from spontaneous hybrids between wild allo-octoploid species (Fragaria chiloensis and Fragaria virginiana). The improvement of horticultural traits by 300 years of breeding has enabled the global expansion of strawberry production. Here, we describe the genomic history of strawberry domestication from the earliest hybrids to modern cultivars. We observed a significant increase in heterozygosity among interspecific hybrids and a decrease in heterozygosity among domesticated descendants of those hybrids. Selective sweeps were found across the genome in early and modern phases of domestication; 59-76% of the selectively swept genes originated in the three less dominant ancestral subgenomes. Contrary to the tenet that genetic diversity is limited in cultivated strawberry, we found that the octoploid species harbor massive allelic diversity and that F. × ananassa harbors as much allelic diversity as either wild founder. We identified 41.8 M subgenome-specific DNA variants among resequenced wild and domesticated individuals. Strikingly, 98% of common alleles and 73% of total alleles were shared between wild and domesticated populations. Moreover, genome-wide estimates of nucleotide diversity were virtually identical in F. chiloensis, F. virginiana, and F. × ananassa (π = 0.0059-0.0060). We found, however, that nucleotide diversity and heterozygosity were significantly lower in modern F. × ananassa populations that have experienced significant genetic gains and have produced numerous agriculturally important cultivars.
Subjects
Domestication, Fragaria/genetics, Genetic Variation, Plant Genome, Genetic Hybridization, Plant Chromosomes, Linkage Disequilibrium, Polyploidy, Genetic Selection
ABSTRACT
KEY MESSAGE: Several Fusarium wilt resistance genes were discovered, genetically and physically mapped, and rapidly deployed via marker-assisted selection to develop cultivars resistant to Fusarium oxysporum f. sp. fragariae, a devastating soil-borne pathogen of strawberry. Fusarium wilt, a soilborne disease caused by Fusarium oxysporum f. sp. fragariae, poses a significant threat to strawberry (Fragaria × ananassa) production in many parts of the world. This pathogen causes wilting, collapse, and death in susceptible genotypes. We previously identified a dominant gene (FW1) on chromosome 2B that confers resistance to race 1 of the pathogen, and hypothesized that gene-for-gene resistance to Fusarium wilt was widespread in strawberry. To explore this, a genetically diverse collection of heirloom and modern cultivars and octoploid ecotypes were screened for resistance to Fusarium wilt races 1 and 2. Here, we show that resistance to both races is widespread in natural and domesticated populations and that resistance to race 1 is conferred by partially to completely dominant alleles among loci (FW1, FW2, FW3, FW4, and FW5) found on three non-homoeologous chromosomes (1A, 2B, and 6B). The underlying genes have not yet been cloned and functionally characterized; however, plausible candidates were identified that encode pattern recognition receptors or other proteins known to confer gene-for-gene resistance in plants. High-throughput genotyping assays for SNPs in linkage disequilibrium with FW1-FW5 were developed to facilitate marker-assisted selection and accelerate the development of race 1 resistant cultivars. This study laid the foundation for identifying the genes encoded by FW1-FW5, in addition to exploring the genetics of resistance to race 2 and other races of the pathogen, as a precaution to averting a Fusarium wilt pandemic.
Subjects
Fragaria, Fusarium, Chromosomes, Fragaria/genetics, Plant Diseases/genetics
ABSTRACT
The annual production of strawberry has increased by one million tonnes in the US and 8.4 million tonnes worldwide since 1960. Here we show that the US expansion was driven by genetic gains from Green Revolution breeding and production advances that increased yields by 2,755%. Using a California population with a century-long breeding history and phenotypes of hybrids observed in coastal California environments, we estimate that breeding has increased fruit yields by 2,974-6,636%, counts by 1,454-3,940%, weights by 228-504%, and firmness by 239-769%. Using genomic prediction approaches, we pinpoint the origin of the Green Revolution to the early 1950s and uncover significant increases in additive genetic variation caused by transgressive segregation and phenotypic diversification. Lastly, we show that the most consequential Green Revolution breeding breakthrough was the introduction of photoperiod-insensitive, PERPETUAL FLOWERING hybrids in the 1970s that doubled yields and drove the dramatic expansion of strawberry production in California.
Subjects
Fragaria, Fragaria/genetics, Plant Breeding, Phenotype, Environment, Genomics
ABSTRACT
Verticillium wilt (VW), a devastating vascular wilt disease of strawberry (Fragaria × ananassa), has caused economic losses for nearly a century. This disease is caused by the soil-borne pathogen Verticillium dahliae, which occurs nearly worldwide and causes disease in numerous agriculturally important plants. The development of VW-resistant cultivars is critically important for the sustainability of strawberry production. We previously showed that a preponderance of the genetic resources (asexually propagated hybrid individuals) preserved in public germplasm collections were moderately to highly susceptible and that genetic gains for increased resistance to VW have been negligible over the last 60 years. To more fully understand the challenges associated with breeding for increased quantitative resistance to this pathogen, we developed and phenotyped a training population of hybrids (n = 564) among elite parents with a wide range of resistance phenotypes. When these data were combined with training data from a population of elite and exotic hybrids (n = 386), genomic prediction accuracies of 0.47-0.48 were achieved and were predicted to explain 70%-75% of the additive genetic variance for resistance. We concluded that breeding values for resistance to VW can be predicted with sufficient accuracy for effective genomic selection with routine updating of training populations.
Subjects
Fragaria, Verticillium, Humans, Fragaria/genetics, Plant Diseases/genetics, Plant Breeding, Phenotype
ABSTRACT
Heterosis was the catalyst for the domestication of cultivated strawberry (Fragaria × ananassa), an interspecific hybrid species that originated in the 1700s. The hybrid origin was discovered because the phenotypes of spontaneous hybrids transgressed those of their parent species. The transgressions included fruit yield increases and other genetic gains in the twentieth century that sparked the global expansion of strawberry production. The importance of heterosis to the agricultural success of the hybrid species, however, has remained a mystery. Here we show that heterosis has disappeared (become fixed) among improved hybrids within a population (the California population) that has been under long-term selection for increased fruit yield, weight, and firmness. We found that the highest yielding hybrids are among the most highly inbred (59-79%), which seems counterintuitive for a highly heterozygous outbreeder carrying heavy genetic loads. Although faint remnants of heterosis were discovered, the between-parent allele frequency differences and dispersed favorable dominant alleles necessary for heterosis have decreased nearly genome-wide within the California population. Conversely, heterosis was prevalent and significant among wide hybrids, especially for fruit count, a significant driver of genetic gains for fruit yield. We attributed the disappearance (fixation) of heterosis within the California population to increased homozygosity of favorable dominant alleles and inbreeding associated with selection, random genetic drift, and selective sweeps. Despite historical inbreeding, the highest yielding hybrids reported to date are estimated to be heterozygous for 20,370-44,280 of 97,000-108,000 genes in the octoploid genome, the equivalent of an entire diploid genome or more.
ABSTRACT
Two decades have passed since the strawberry (Fragaria × ananassa) disease caused by Macrophomina phaseolina, a necrotrophic soilborne fungal pathogen, began surfacing in California, Florida, and elsewhere. This disease has since become one of the most common causes of plant death and yield losses in strawberry. The Macrophomina problem emerged and expanded in the wake of the global phase-out of soil fumigation with methyl bromide and appears to have been aggravated by an increase in climate change-associated abiotic stresses. Here we show that sources of resistance to this pathogen are rare in gene banks and that the favorable alleles they carry are phenotypically unobvious. The latter were exposed by transgressive segregation and selection in populations phenotyped for resistance to Macrophomina under heat and drought stress. The genetic gains were immediate and dramatic. The frequency of highly resistant individuals increased from 1% in selection cycle 0 to 74% in selection cycle 2. Using GWAS and survival analysis, we found that phenotypic selection had increased the frequencies of favorable alleles among 10 loci associated with resistance and that favorable alleles had to be accumulated among four or more of these loci for an individual to acquire resistance. An unexpectedly straightforward solution to the Macrophomina disease resistance breeding problem emerged from our studies, which showed that highly resistant cultivars can be developed by genomic selection per se or marker-assisted stacking of favorable alleles among a comparatively small number of large-effect loci.
ABSTRACT
Large-effect loci (those statistically significant loci discovered by genome-wide association studies or linkage mapping) associated with key traits segregate amidst a background of minor, often undetectable, genetic effects in wild and domesticated plants and animals. Accurately attributing mean differences and variance explained to the correct components in the linear mixed model analysis is vital for selecting superior progeny and parents in plant and animal breeding, gene therapy, and medical genetics in humans. Marker-assisted prediction and its successor, genomic prediction, have many advantages for selecting superior individuals and understanding disease risk. However, these two approaches are less often integrated to study complex traits with different genetic architectures. This simulation study demonstrates that the average semivariance can be applied to models incorporating Mendelian, oligogenic, and polygenic terms simultaneously and yields accurate estimates of the variance explained for all relevant variables. Our previous research focused on large-effect loci and polygenic variance separately. This work aims to synthesize and expand the average semivariance framework to various genetic architectures and the corresponding mixed models. This framework independently accounts for the effects of large-effect loci and the polygenic genetic background and is universally applicable to genetics studies in humans, plants, animals, and microbes.
Subjects
Genome-Wide Association Study, Multifactorial Inheritance, Humans, Animals, Multifactorial Inheritance/genetics, Chromosome Mapping, Genome, Phenotype, Genetic Models, Single Nucleotide Polymorphism
ABSTRACT
The development of strawberry (Fragaria × ananassa Duchesne ex Rozier) cultivars resistant to Phytophthora crown rot (PhCR), a devastating disease caused by the soil-borne pathogen Phytophthora cactorum (Lebert & Cohn) J. Schröt., has been challenging partly because the resistance phenotypes are quantitative and only moderately heritable. To develop deeper insights into the genetics of resistance and build the foundation for applying genomic selection, a genetically diverse training population was screened for resistance to California isolates of the pathogen. Here we show that genetic gains in breeding for resistance to PhCR have been negligible (3% of the cultivars tested were highly resistant and none surpassed early 20th century cultivars). Narrow-sense genomic heritability for PhCR resistance ranged from 0.41 to 0.75 among training population individuals. Using multivariate genome-wide association studies (GWAS), we identified a large-effect locus (predicted to be RPc2) that explained 43.6-51.6% of the genetic variance, was necessary but not sufficient for resistance, and was associated with calcium channel and other candidate genes with known plant defense functions. The addition of underutilized gene bank resources to our training population doubled additive genetic variance, increased the accuracy of genomic selection, and enabled the discovery of individuals carrying favorable alleles that are either rare or not present in modern cultivars. The incorporation of an RPc2-associated single-nucleotide polymorphism (SNP) as a fixed effect increased genomic prediction accuracy from 0.40 to 0.55. Finally, we show that parent selection using genomic-estimated breeding values, genetic variances, and cross usefulness holds promise for enhancing resistance to PhCR in strawberry.
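Fitting a large-effect SNP as a fixed effect alongside genome-wide markers can be sketched as a ridge regression in which the intercept and the SNP are left unpenalized while the remaining marker effects are shrunk. This is a minimal illustration, not the authors' pipeline: the function name, the shrinkage parameter `lam` (treated as known rather than estimated), and the toy data are all assumptions.

```python
import numpy as np


def ridge_with_fixed_snp(snp, markers, y, lam):
    """Solve a penalized least-squares problem with an unpenalized
    intercept and SNP covariate plus ridge-penalized marker effects.

    lam is an assumed, user-supplied shrinkage parameter (in a mixed
    model it would correspond to a ratio of variance components).
    """
    n = len(y)
    fixed = np.column_stack([np.ones(n), snp])        # intercept + fixed SNP
    design = np.hstack([fixed, markers])
    q = fixed.shape[1]
    penalty = np.diag([0.0] * q + [lam] * markers.shape[1])
    coef = np.linalg.solve(design.T @ design + penalty, design.T @ y)
    return coef[:q], coef[q:]                          # fixed effects, marker effects


# Toy data (hypothetical): the phenotype depends only on the fixed SNP.
rng = np.random.default_rng(0)
snp = np.array([0, 1, 2, 0, 1, 2, 0, 1], dtype=float)
markers = rng.integers(0, 3, size=(8, 5)).astype(float)
y = 1.0 + 2.0 * snp
fixed_effects, marker_effects = ridge_with_fixed_snp(snp, markers, y, lam=1.0)
```

Because the toy phenotype is fully explained by the intercept and the SNP, the fitted marker effects shrink to zero; with real data the penalized markers would absorb the polygenic background.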
Subjects
Fragaria, Phytophthora, Fragaria/genetics, Phytophthora/genetics, Genome-Wide Association Study, Plant Breeding, Genomics
ABSTRACT
Genomic prediction in breeding populations containing hundreds to thousands of parents and seedlings is prohibitively expensive with current high-density genetic marker platforms designed for strawberry. We developed mid-density panels of molecular inversion probes (MIPs) to be deployed with the "DArTag" marker platform to provide a low-cost, high-throughput genotyping solution for strawberry genomic prediction. In total, 7742 target single nucleotide polymorphism (SNP) regions were used to generate MIP assays that were tested with a screening panel of 376 octoploid Fragaria accessions. We evaluated the performance of DArTag assays based on genotype segregation, amplicon coverage, and their ability to produce subgenome-specific amplicon alignments to the FaRR1 assembly and subsequent alignment-based variant calls with strong concordance to DArT's alignment-free, count-based genotype reports. We used a combination of marker performance metrics and physical distribution in the FaRR1 assembly to select 3K and 5K production panels for genotyping of large strawberry populations. We show that the 3K and 5K DArTag panels are able to target and amplify homologous alleles within subgenomic sequences with low-amplification bias between reference and alternate alleles, supporting accurate genotype calling while producing marker genotypes that can be treated as functionally diploid for quantitative genetic analysis. The 3K and 5K target SNPs show high levels of polymorphism in diverse F. × ananassa germplasm and UC Davis cultivars, with mean pairwise diversity (π) estimates of 0.40 and 0.32 and mean heterozygous genotype frequencies of 0.35 and 0.33, respectively.
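The two summary statistics reported at the end of this abstract can be computed from a dosage-coded genotype matrix roughly as follows. The abstract does not state which estimator was used, so this sketch assumes a common per-SNP estimator of pairwise diversity for diploid-coded genotypes (expected heterozygosity with a small-sample correction); the function name and example data are hypothetical.

```python
import numpy as np


def snp_diversity(dosages):
    """Return (mean pairwise diversity, mean heterozygous genotype frequency).

    dosages: individuals x SNPs array of alternate-allele counts (0, 1, 2),
    treating each marker as functionally diploid.
    """
    g = np.asarray(dosages, dtype=float)
    n = g.shape[0]                       # number of individuals
    p = g.mean(axis=0) / 2.0             # alternate-allele frequency per SNP
    # Expected heterozygosity 2pq, corrected for sampling 2n alleles.
    pi_per_snp = (2 * n / (2 * n - 1)) * 2 * p * (1 - p)
    het_per_snp = (g == 1).mean(axis=0)  # observed heterozygote frequency
    return pi_per_snp.mean(), het_per_snp.mean()


# Four individuals genotyped at two SNPs (illustrative values only).
example = [[0, 1], [1, 1], [2, 1], [1, 1]]
mean_pi, mean_het = snp_diversity(example)
```

For the example matrix both SNPs have an allele frequency of 0.5, so the mean diversity is 4/7 and the mean heterozygous genotype frequency is 0.75.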
Subjects
Fragaria, Chromosome Mapping, Fragaria/genetics, Genotype, Plant Breeding, Single Nucleotide Polymorphism
ABSTRACT
Anthracnose fruit rot (AFR), caused by the fungal pathogen Colletotrichum fioriniae, is among the most destructive and widespread fruit diseases of blueberry, impacting both yield and overall fruit quality. Blueberry cultivars have highly variable resistance against AFR. To date, this pathogen is largely controlled by applying various fungicides; thus, a more cost-effective and environmentally conscious solution for AFR is needed. Here we report three quantitative trait loci associated with AFR resistance in northern highbush blueberry (Vaccinium corymbosum). Candidate genes within these genomic regions are associated with the biosynthesis of flavonoids (e.g. anthocyanins) and resistance against pathogens. Furthermore, we examined gene expression changes in fruits following inoculation with Colletotrichum in a resistant cultivar, which revealed an enrichment of significantly differentially expressed genes associated with certain specialized metabolic pathways (e.g. flavonol biosynthesis) and pathogen resistance. Using non-targeted metabolite profiling, we identified a flavonol glycoside with properties consistent with a quercetin rhamnoside as a compound exhibiting significant abundance differences among the most resistant and susceptible individuals from the genetic mapping population. Further analysis revealed that this compound exhibits significant abundance differences among the most resistant and susceptible individuals when analyzed as two groups. However, individuals within each group displayed considerable overlapping variation in this compound, suggesting that its abundance may only be partially associated with resistance against C. fioriniae. These findings should serve as a powerful resource that will enable breeding programs to more easily develop new cultivars with superior resistance to AFR and as the basis of future research studies.
ABSTRACT
Many important traits in plants, animals, and microbes are polygenic and challenging to improve through traditional marker-assisted selection. Genomic prediction addresses this by incorporating all genetic data in a mixed model framework. The primary method for predicting breeding values is genomic best linear unbiased prediction, which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. Genomic relationship matrices share information among entries to estimate the observed entries' genetic values and predict unobserved entries' genetic values. Two of the main parameters of such models are genomic variance ($\sigma_g^2$), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms, and genomic heritability ($h_g^2$); however, the seminal papers introducing different forms of K often do not discuss their effects on the model estimated variance components despite their importance in genetic research and breeding. Here, we discuss the effect of several standard methods for calculating the genomic relationship matrix on estimates of $\sigma_g^2$ and $h_g^2$. With current approaches, we found that the genomic variance tends to be either overestimated or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Using the average semivariance, we propose a new matrix, $K_{ASV}$, that directly yields accurate estimates of $\sigma_g^2$ and $h_g^2$ in the observed population and produces best linear unbiased predictors equivalent to routine methods in plants and animals.
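To make the scaling issue concrete, the sketch below builds a VanRaden-style relationship matrix from centered marker dosages, computes its average semivariance (half the mean variance of pairwise differences, which equals tr(KP)/(n - 1) with P the centering matrix), and rescales K so that this quantity equals one. This illustrates the general idea only; it is not claimed to reproduce the exact definition of the proposed matrix, and the function names and toy dosages are assumptions.

```python
import numpy as np


def vanraden_k(dosages):
    """Relationship matrix from 0/1/2 dosages, centered by sample allele frequencies."""
    m = np.asarray(dosages, dtype=float)
    p = m.mean(axis=0) / 2.0
    z = m - 2.0 * p                                   # column-centered marker matrix
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def average_semivariance(k):
    """tr(K P) / (n - 1), where P = I - 11'/n is the centering matrix."""
    n = k.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(k @ centering) / (n - 1)


def rescale_to_unit_asv(k):
    """Scale K so that its average semivariance equals one."""
    return k / average_semivariance(k)


# Toy dosages for four individuals and three markers (illustrative only).
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
k = vanraden_k(dosages)
k_scaled = rescale_to_unit_asv(k)
```

The rescaling changes only the scale of the variance component estimated with K; predictions from a mixed model are unaffected by a constant rescaling of the relationship matrix because the estimated variance absorbs the factor.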
Subjects
Genetic Models, Multifactorial Inheritance, Alleles, Animals, Genomics/methods, Genotype, Phenotype, Plant Breeding, Single Nucleotide Polymorphism
ABSTRACT
Population structure (also called genetic structure and population stratification) is the presence of a systematic difference in allele frequencies between subpopulations in a population as a result of nonrandom mating between individuals. It can be informative of genetic ancestry, and in the context of medical genetics, it is an important confounding variable in genome-wide association studies. Recently, many nonlinear dimensionality reduction techniques have been proposed for the population structure visualization task. However, an objective comparison of these techniques has so far been missing from the literature. In this article, we discuss the previously proposed nonlinear techniques and some of their potential weaknesses. We then propose a novel quantitative evaluation methodology for comparing these nonlinear techniques, based on populations for which pedigree is known a priori either through artificial selection or simulation. Based on this evaluation metric, we find graph-based algorithms such as t-SNE and UMAP to be superior to principal component analysis, while neural network-based methods fall behind.
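As a point of reference for the comparison described above, the linear baseline (principal component analysis) can be sketched on simulated genotypes from two subpopulations with different allele frequencies. Everything here is assumed for illustration: the allele frequencies, sample sizes, and function name are hypothetical, and the nonlinear methods named in the abstract are not implemented.

```python
import numpy as np


def pca_scores(dosages, n_components=2):
    """Project individuals onto the leading principal components of the
    column-centered genotype matrix using a singular value decomposition."""
    m = np.asarray(dosages, dtype=float)
    centered = m - m.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulate two subpopulations with strongly different allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200
pop_a = rng.binomial(2, np.full(n_snps, 0.1), size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, np.full(n_snps, 0.9), size=(n_per_pop, n_snps))
scores = pca_scores(np.vstack([pop_a, pop_b]))

# The first component should place the two subpopulations on opposite sides.
pc1_a, pc1_b = scores[:n_per_pop, 0], scores[n_per_pop:, 0]
```

With allele frequencies this far apart the first component separates the groups completely; the evaluation problem the abstract addresses arises when structure is subtler and is defined by known pedigree rather than by two well-separated groups.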
Subjects
Algorithms, Genome-Wide Association Study, Computer Simulation, Gene Frequency, Population Genetics, Genome-Wide Association Study/methods, Humans, Principal Component Analysis
ABSTRACT
Gray mold, a disease of strawberry (Fragaria × ananassa) caused by the ubiquitous necrotroph Botrytis cinerea, renders fruit unmarketable and causes economic losses in the postharvest supply chain. To explore the feasibility of selecting for increased resistance to gray mold, we undertook genetic and genomic prediction studies in strawberry populations segregating for fruit quality and shelf life traits hypothesized to pleiotropically affect susceptibility. As predicted, resistance to gray mold was heritable but quantitative and genetically complex. While every individual was susceptible, the speed of symptom progression and severity differed. Narrow-sense heritability ranged from 0.38 to 0.71 for lesion diameter (LD) and 0.39 to 0.44 for speed of emergence of external mycelium (EM). Even though significant additive genetic variation was observed for LD and EM, the phenotypic ranges were comparatively narrow and genome-wide analyses did not identify any large-effect loci. Genomic selection (GS) accuracy ranged from 0.28 to 0.59 for LD and 0.37 to 0.47 for EM. Additive genetic correlations between fruit quality and gray mold resistance traits were consistent with prevailing hypotheses: LD decreased as titratable acidity increased, whereas EM increased as soluble solid content decreased and firmness increased. We concluded that phenotypic and GS could be effective for reducing LD and increasing EM, especially in long shelf life populations, but that a significant fraction of the genetic variation for resistance to gray mold was caused by the pleiotropic effects of fruit quality traits that differ among market and shelf life classes.
Subjects
Fragaria, Botrytis, Fragaria/genetics, Fragaria/microbiology, Fruit/genetics, Genome-Wide Association Study, Genomics, Plant Diseases/genetics, Plant Diseases/microbiology
ABSTRACT
The widely recounted story of the origin of cultivated strawberry (Fragaria × ananassa) oversimplifies the complex interspecific hybrid ancestry of the highly admixed populations from which heirloom and modern cultivars have emerged. To develop deeper insights into the three-century-long domestication history of strawberry, we reconstructed the genealogy as deeply as possible: pedigree records were assembled for 8,851 individuals, including 2,656 cultivars developed since 1775. The parents of individuals with unverified or missing pedigree records were accurately identified by applying an exclusion analysis to array-genotyped single-nucleotide polymorphisms. We identified 187 wild octoploid and 1,171 F. × ananassa founders in the genealogy, from the earliest hybrids to modern cultivars. The pedigree networks for cultivated strawberry are exceedingly complex labyrinths of ancestral interconnections formed by diverse hybrid ancestry, directional selection, migration, admixture, bottlenecks, overlapping generations, and recurrent hybridization with common ancestors that have unequally contributed allelic diversity to heirloom and modern cultivars. Fifteen to 333 ancestors were predicted to have transmitted 90% of the alleles found in country-, region-, and continent-specific populations. Using parent-offspring edges in the global pedigree network, we found that selection cycle lengths over the past 200 years of breeding have been extraordinarily long (16.0-16.9 years/generation), but decreased to a present-day range of 6.0-10.0 years/generation. Our analyses uncovered conspicuous differences in the ancestry and structure of North American and European populations, and shed light on forces that have shaped phenotypic diversity in F. × ananassa.
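One plausible way to turn parent-offspring edges into a selection cycle length is to average the difference in year of origin across edges. The abstract does not spell out the exact calculation, so this is a sketch under that assumption; the individuals, years, and function name are hypothetical.

```python
from statistics import mean

# Hypothetical year of origin for each individual in the pedigree.
year = {"P1": 1900, "P2": 1910, "C1": 1920, "C2": 1926}
# Hypothetical parent -> offspring edges.
edges = [("P1", "C1"), ("P2", "C1"), ("C1", "C2")]


def mean_cycle_length(edges, year):
    """Mean offspring-minus-parent year difference over edges whose
    endpoints both have a recorded year of origin."""
    gaps = [
        year[child] - year[parent]
        for parent, child in edges
        if parent in year and child in year
    ]
    return mean(gaps)


cycle_length = mean_cycle_length(edges, year)
```

Here the three edges span 20, 10, and 6 years, giving a mean of 12 years per generation. Restricting the edge list to a time window would give period-specific estimates of the kind the abstract reports.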
Subjects
Domestication, Fragaria, Fragaria/genetics, Genetic Hybridization, Plant Breeding
ABSTRACT
BACKGROUND: Shape is a critical element of the visual appeal of strawberry fruit and is influenced by both genetic and non-genetic determinants. Current fruit phenotyping approaches for external characteristics in strawberry often rely on the human eye to make categorical assessments. However, fruit shape is an inherently multi-dimensional, continuously variable trait and not adequately described by a single categorical or quantitative feature. Morphometric approaches enable the study of complex, multi-dimensional forms but are often abstract and difficult to interpret. In this study, we developed a mathematical approach for transforming fruit shape classifications from digital images onto an ordinal scale called the Principal Progression of k Clusters (PPKC). We use these human-recognizable shape categories to select quantitative features extracted from multiple morphometric analyses that are best fit for genetic dissection and analysis. RESULTS: We transformed images of strawberry fruit into human-recognizable categories using unsupervised machine learning, discovered 4 principal shape categories, and inferred progression using PPKC. We extracted 68 quantitative features from digital images of strawberries using a suite of morphometric analyses and multivariate statistical approaches. These analyses defined informative feature sets that effectively captured quantitative differences between shape classes. Classification accuracy ranged from 68% to 99% for the newly created phenotypic variables for describing a shape. CONCLUSIONS: Our results demonstrated that strawberry fruit shapes could be robustly quantified, accurately classified, and empirically ordered using image analyses, machine learning, and PPKC. We generated a dictionary of quantitative traits for studying and predicting shape classes and identifying genetic factors underlying phenotypic variability for fruit shape in strawberry. 
The methods and approaches that we applied in strawberry should apply to other fruits, vegetables, and specialty crops.
Subjects
Fragaria/genetics, Fruit/genetics, Genetic Association Studies, Machine Learning, Phenotype, Algorithms, Genetic Association Studies/methods, Inheritance Patterns, Theoretical Models
ABSTRACT
Allo-octoploid cultivated strawberry (Fragaria × ananassa) originated through a combination of polyploid and homoploid hybridization, domestication of an interspecific hybrid lineage, and continued admixture of wild species over the last 300 years. While genes appear to flow freely between the octoploid progenitors, the genome structures and diversity of the octoploid species remain poorly understood. The complexity and absence of an octoploid genome frustrated early efforts to study chromosome evolution, resolve subgenomic structure, and develop a single coherent linkage group nomenclature. Here, we show that octoploid Fragaria species harbor millions of subgenome-specific DNA variants. Their diversity was sufficient to distinguish duplicated (homoeologous and paralogous) DNA sequences and develop 50K and 850K SNP genotyping arrays populated with co-dominant, disomic SNP markers distributed throughout the octoploid genome. Whole-genome shotgun genotyping of an interspecific segregating population yielded 1.9M genetically mapped subgenome variants in 5,521 haploblocks spanning 3,394 cM in F. chiloensis subsp. lucida, and 1.6M genetically mapped subgenome variants in 3,179 haploblocks spanning 2,017 cM in F. × ananassa. These studies provide a dense genomic framework of subgenome-specific DNA markers for seamlessly cross-referencing genetic and physical mapping information and unifying existing chromosome nomenclatures. Using comparative genomics, we show that geographically diverse wild octoploids are effectively diploidized, nearly completely collinear, and retain strong macro-synteny with diploid progenitor species. The preservation of genome structure among allo-octoploid taxa is a critical factor in the unique history of garden strawberry, where unimpeded gene flow supported its origin and domestication through repeated cycles of interspecific hybridization.