1.
G3 (Bethesda) ; 13(4)2023 04 11.
Article in English | MEDLINE | ID: mdl-36625555

ABSTRACT

Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of 2 model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find deep learning and best linear unbiased predictor (BLUP) models with interactions had the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including a reduction of the importance scores for timepoints expected to have a limited physiological basis for influencing yield, namely those at the extreme end of the season, nearly 200 days post planting. Based on these results, deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.


Subject(s)
Deep Learning , Neural Networks, Computer , Machine Learning , Genotype , Multifactorial Inheritance
2.
G3 (Bethesda) ; 12(11)2022 11 04.
Article in English | MEDLINE | ID: mdl-36124944

ABSTRACT

We introduce the R package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package can incorporate weather data from field weather stations or retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated over specific periods of time based on naive (for instance, nonoverlapping 10-day windows) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient-boosted decision trees, random forests, stacked ensemble models, and multilayer perceptrons. These prediction models can be evaluated in a user-friendly way via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with multi-environment trial experimental data. The package is published under an MIT license and accessible on GitHub.
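The "naive" aggregation mentioned in the abstract (nonoverlapping 10-day windows) can be illustrated with a minimal Python sketch. This is not learnMET's API, which is an R package; the function name `aggregate_windows`, the choice to drop a trailing partial window, and the temperature series are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_windows(daily_values, window=10):
    """Average a daily series over consecutive, nonoverlapping windows.

    Assumption: a trailing partial window (fewer than `window` days) is
    dropped so that every feature summarizes the same number of days.
    """
    n_full = len(daily_values) // window
    return [
        mean(daily_values[i * window:(i + 1) * window])
        for i in range(n_full)
    ]


# Hypothetical daily mean temperatures (deg C) for 25 days after planting.
daily_temp = [15 + 0.2 * day for day in range(25)]

# Two complete 10-day windows; the last 5 days are dropped.
features = aggregate_windows(daily_temp, window=10)
```

Each element of `features` would then serve as one environmental predictor per trial; a phenological approach would replace the fixed window length with growth-stage boundaries.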


Subject(s)
Genomics , Machine Learning , Genomics/methods , Neural Networks, Computer
3.
Front Plant Sci ; 12: 699589, 2021.
Article in English | MEDLINE | ID: mdl-34880880

ABSTRACT

The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.

4.
Plant Cell Physiol ; 62(7): 1199-1214, 2021 Oct 29.
Article in English | MEDLINE | ID: mdl-34015110

ABSTRACT

The strength of the stalk rind, measured as rind penetrometer resistance (RPR), is an important contributor to stalk lodging resistance. To enhance the genetic architecture of RPR, we combined selection mapping on populations developed by 15 cycles of divergent selection for high and low RPR with time-course transcriptomic and metabolic analyses of the stalks. Divergent selection significantly altered allele frequencies of 3,656 and 3,412 single-nucleotide polymorphisms (SNPs) in the high and low RPR populations, respectively. Surprisingly, only 110 (1.56%) SNPs under selection were common in both populations, while the majority (98.4%) were unique to each population. This result indicated that high and low RPR phenotypes are produced by biologically distinct mechanisms. Remarkably, regions harboring lignin and polysaccharide genes were preferentially selected in high and low RPR populations, respectively. The preferential selection was manifested as higher lignification and increased saccharification of the high and low RPR stalks, respectively. The evolution of distinct gene classes according to the direction of selection was unexpected in the context of parallel evolution and demonstrated that selection for a trait, albeit in different directions, does not necessarily act on the same genes. Tricin, a grass-specific monolignol that initiates the incorporation of lignin in the cell walls, emerged as a key determinant of RPR. Integration of selection mapping and transcriptomic analyses with published genetic studies of RPR identified several candidate genes including ZmMYB31, ZmNAC25, ZmMADS1, ZmEXPA2, ZmIAA41 and hk5. These findings provide a foundation for an enhanced understanding of RPR and the improvement of stalk lodging resistance.


Subject(s)
Zea mays/genetics , Cell Wall/metabolism , Evolution, Molecular , Gene Expression Profiling , Gene Frequency , Metabolomics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable , Zea mays/anatomy & histology
5.
G3 (Bethesda) ; 10(11): 4227-4239, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32978264

ABSTRACT

Plant growth, development, and nutritional quality depend upon amino acid homeostasis, especially in seeds. However, our understanding of the underlying genetics influencing amino acid content and composition remains limited, with only a few candidate genes and quantitative trait loci identified to date. Improved knowledge of the genetics and biological processes that determine amino acid levels will enable researchers to use this information for plant breeding and biological discovery. Toward this goal, we used genomic prediction to identify biological processes that are associated with, and therefore potentially influence, free amino acid (FAA) composition in seeds of the model plant Arabidopsis thaliana. Markers were split into categories based on metabolic pathway annotations and fit using a genomic partitioning model to evaluate the influence of each pathway on heritability explained, model fit, and predictive ability. Selected pathways included processes known to influence FAA composition, albeit to an unknown degree, and spanned four categories: amino acid, core, specialized, and protein metabolism. Using this approach, we identified associations for pathways containing known variants for FAA traits, in addition to finding new trait-pathway associations. Markers related to amino acid metabolism, which are directly involved in FAA regulation, improved predictive ability for branched chain amino acids and histidine. The use of genomic partitioning also revealed patterns across biochemical families, in which serine-derived FAAs were associated with protein related annotations and aromatic FAAs were associated with specialized metabolic pathways. Taken together, these findings provide evidence that genomic partitioning is a viable strategy to uncover the relative contributions of biological processes to FAA traits in seeds, offering a promising framework to guide hypothesis testing and narrow the search space for candidate genes.


Subject(s)
Arabidopsis , Biological Phenomena , Amino Acids , Arabidopsis/genetics , Genomics , Humans , Plant Breeding , Seeds/genetics
6.
Curr Opin Plant Biol ; 54: 93-100, 2020 04.
Article in English | MEDLINE | ID: mdl-32325397

ABSTRACT

Crop domestication is a fascinating area of study, as shown by a multitude of recent reviews. Coupled with the increasing availability of genomic and phenomic resources in numerous crop species, insights from evolutionary biology will enable a deeper understanding of the genetic architecture and short-term evolution of complex traits, which can be used to inform selection strategies. Future advances in crop improvement will rely on the integration of population genetics with plant breeding methodology, and the development of community resources to support research in a variety of crop life histories and reproductive strategies. We highlight recent advances related to the role of selective sweeps and demographic history in shaping genetic architecture, how these breakthroughs can inform selection strategies, and the application of precision gene editing to leverage these connections.


Subject(s)
Domestication , Plant Breeding , Breeding , Gene Editing , Plants/genetics
7.
BMC Plant Biol ; 19(1): 412, 2019 Oct 08.
Article in English | MEDLINE | ID: mdl-31590656

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) are a powerful tool for identifying quantitative trait loci (QTL) and causal single nucleotide polymorphisms (SNPs)/genes associated with various important traits in crop species. Typically, GWAS in crops are performed using a panel of inbred lines, where multiple replicates of the same inbred are measured and the average phenotype is taken as the response variable. Here we describe and evaluate single plant GWAS (sp-GWAS) for performing a GWAS on individual plants, which does not require an association panel of inbreds. Instead, sp-GWAS relies on the phenotypes and genotypes from individual plants sampled from a randomly mating population. Importantly, we demonstrate how sp-GWAS can be efficiently combined with a bulk segregant analysis (BSA) experiment to rapidly corroborate evidence for significant SNPs. RESULTS: In this study we used the Shoepeg maize landrace, collected as an open pollinating variety from a farm in Southern Missouri in the 1960s, to evaluate whether sp-GWAS coupled with BSA can be used efficiently and powerfully to detect significant association of SNPs for plant height (PH). Plants were grown in 8 locations across two years and in total 768 individuals were genotyped and phenotyped for sp-GWAS. A total of 306 k polymorphic markers in 768 individuals evaluated via association analysis detected 25 significant SNPs (P ≤ 0.00001) for PH. The results from our single-plant GWAS were further validated by bulk segregant analysis (BSA) for PH. BSA sequencing was performed on the same population by selecting tall and short plants as separate bulks. This approach identified 37 genomic regions for plant height. Of the 25 significant SNPs from GWAS, the three most significant SNPs co-localize with regions identified by BSA. CONCLUSION: Overall, this study demonstrates that sp-GWAS coupled with BSA can be a useful tool for detecting significant SNPs and identifying candidate genes. This result is particularly useful for species/populations where association panels are not readily available.


Subject(s)
Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Zea mays/genetics , Chromosomes, Plant/genetics , Genome, Plant/genetics , Linkage Disequilibrium/genetics , Quantitative Trait Loci/genetics
8.
Plant Cell ; 31(9): 1968-1989, 2019 09.
Article in English | MEDLINE | ID: mdl-31239390

ABSTRACT

Premature senescence in annual crops reduces yield, while delayed senescence, termed stay-green, imposes positive and negative impacts on yield and nutrition quality. Despite its importance, scant information is available on the genetic architecture of senescence in maize (Zea mays) and other cereals. We combined a systematic characterization of natural diversity for senescence in maize and coexpression networks derived from transcriptome analysis of normally senescing and stay-green lines. Sixty-four candidate genes were identified by genome-wide association study (GWAS), and 14 of these genes are supported by additional evidence for involvement in senescence-related processes including proteolysis, sugar transport and signaling, and sink activity. Eight of the GWAS candidates, independently supported by a coexpression network underlying stay-green, include a trehalose-6-phosphate synthase, a NAC transcription factor, and two xylan biosynthetic enzymes. Source-sink communication and the activity of cell walls as a secondary sink emerge as key determinants of stay-green. Mutant analysis supports the role of a candidate encoding Cys protease in stay-green in Arabidopsis (Arabidopsis thaliana), and analysis of natural alleles suggests a similar role in maize. This study provides a foundation for enhanced understanding and manipulation of senescence for increasing carbon yield, nutritional quality, and stress tolerance of maize and other cereals.


Subject(s)
Aging/genetics , Gene Expression Regulation, Plant , Gene Regulatory Networks , Genes, Plant/genetics , Zea mays/genetics , Arabidopsis/genetics , Gene Expression Profiling , Genome-Wide Association Study , Glucosyltransferases/genetics , Plant Leaves , Polymorphism, Single Nucleotide , Transcription Factors/genetics , Transcriptome
9.
Genome Biol ; 18(1): 215, 2017 11 13.
Article in English | MEDLINE | ID: mdl-29132403

ABSTRACT

BACKGROUND: The history of maize has been characterized by major demographic events, including population size changes associated with domestication and range expansion, and gene flow with wild relatives. The interplay between demographic history and selection has shaped diversity across maize populations and genomes. RESULTS: We investigate these processes using high-depth resequencing data from 31 maize landraces spanning the pre-Columbian distribution of maize, and four wild teosinte individuals (Zea mays ssp. parviglumis). Genome-wide demographic analyses reveal that maize experienced pronounced declines in effective population size due to both a protracted domestication bottleneck and serial founder effects during post-domestication spread, while parviglumis in the Balsas River Valley experienced population growth. The domestication bottleneck and subsequent spread led to an increase in deleterious alleles in the domesticate compared to the wild progenitor. This cost is particularly pronounced in Andean maize, which has experienced a more dramatic founder event compared to other maize populations. Additionally, we detect introgression from the wild teosinte Zea mays ssp. mexicana into maize in the highlands of Mexico, Guatemala, and the southwestern USA, which reduces the prevalence of deleterious alleles likely due to the higher long-term effective population size of teosinte. CONCLUSIONS: These findings underscore the strong interaction between historical demography and the efficiency of selection and illustrate how domesticated species are particularly useful for understanding these processes. The landscape of deleterious alleles and therefore evolutionary potential is clearly influenced by recent demography, a factor that could bear importantly on many species that have experienced recent demographic shifts.


Subject(s)
Domestication , Selection, Genetic , Zea mays/growth & development , Zea mays/genetics , Alleles , Inbreeding , Mutation/genetics , Population Density
10.
Plant Methods ; 13: 8, 2017.
Article in English | MEDLINE | ID: mdl-28250803

ABSTRACT

BACKGROUND: High-density marker panels and/or whole-genome sequencing, coupled with advanced phenotyping pipelines and sophisticated statistical methods, have dramatically increased our ability to generate lists of candidate genes or regions that are putatively associated with phenotypes or processes of interest. However, the speed with which we can validate genes, or even make reasonable biological interpretations about the principles underlying them, has not kept pace. A promising approach that runs parallel to explicitly validating individual genes is analyzing a set of genes together and assessing the biological similarities among them. This is often achieved via gene ontology analysis, a powerful tool that involves evaluating publicly available gene annotations. However, additional resources such as Medical Subject Headings (MeSH) can also be used to evaluate sets of genes to make biological interpretations. RESULTS: In this manuscript, we describe utilizing MeSH terms to make biological interpretations in maize. MeSH terms are assigned to PubMed-indexed manuscripts by the National Library of Medicine, and can be directly mapped to genes to develop gene annotations. Once mapped, these terms can be evaluated for enrichment in sets of genes or similarity between gene sets to provide biological insights. Here, we implement MeSH analyses in five maize datasets to demonstrate how MeSH can be leveraged by the maize and broader crop-genomics community. CONCLUSIONS: We demonstrate that MeSH terms can be effectively leveraged to generate hypotheses and make biological interpretations in maize, and we provide a pipeline that enables the use of MeSH terms in other plant species.
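The enrichment step described in the abstract (testing whether a MeSH term is over-represented in a gene set) is commonly framed as a one-sided hypergeometric test. The sketch below illustrates that framing in plain Python; it is an assumption about the general technique, not the authors' pipeline, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for term enrichment.

    Returns P(X >= n_overlap), where X is the number of term-annotated genes
    among n_selected genes drawn without replacement from n_universe genes,
    of which n_annotated carry the term.
    """
    max_k = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in the universe, 5 annotated with the term,
# 4 genes in the candidate list, 2 of which carry the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
```

In practice one p-value is computed per term, followed by a multiple-testing correction across all terms tested.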

11.
Nat Plants ; 2: 16084, 2016 06 13.
Article in English | MEDLINE | ID: mdl-27294617

ABSTRACT

Genetic diversity is shaped by the interaction of drift and selection, but the details of this interaction are not well understood. The impact of genetic drift in a population is largely determined by its demographic history, typically summarized by its long-term effective population size (Ne). Rapidly changing population demographics complicate this relationship, however. To better understand how changing demography impacts selection, we used whole-genome sequencing data to investigate patterns of linked selection in domesticated and wild maize (teosinte). We produce the first whole-genome estimate of the demography of maize domestication, showing that maize was reduced to approximately 5% the population size of teosinte before it experienced rapid expansion post-domestication to population sizes much larger than its ancestor. Evaluation of patterns of nucleotide diversity in and near genes shows little evidence of selection on beneficial amino acid substitutions, and that the domestication bottleneck led to a decline in the efficiency of purifying selection in maize. Young alleles, however, show evidence of much stronger purifying selection in maize, reflecting the much larger effective size of present day populations. Our results demonstrate that recent demographic change, a hallmark of many species including both humans and crops, can have immediate and wide-ranging impacts on diversity that conflict with expectations based on long-term Ne alone.


Subject(s)
Evolution, Molecular , Genome, Plant , Selection, Genetic , Zea mays/genetics , Crops, Agricultural/genetics , Domestication , Whole Genome Sequencing
12.
G3 (Bethesda) ; 6(8): 2447-53, 2016 08 09.
Article in English | MEDLINE | ID: mdl-27261003

ABSTRACT

Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO), and more recently by Medical Subject Headings (MeSH). Here, we report a suite of MeSH packages for chicken in Bioconductor, and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets that were employed represent (i) differentially expressed genes, and (ii) candidate genes under selective sweep or epistatic selection. The comparison of MeSH with GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis, but also that MeSH is able to further enrich the representation of biological knowledge and often provide more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among vocabularies, as well as semantic similarities among selected genes. These yielded the similarity levels between significant functional terms, and the annotation of each gene yielded the measures of gene similarity. Our findings show the benefits of using MeSH as an alternative choice of annotation in order to draw biological inferences from a list of genes of interest. We argue that the use of MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.


Subject(s)
Chickens/genetics , Gene Ontology , Medical Subject Headings , Semantics , Animals , Chickens/classification , Databases, Genetic , Humans , Terminology as Topic , Vocabulary, Controlled
13.
Genet Sel Evol ; 47: 30, 2015 Apr 17.
Article in English | MEDLINE | ID: mdl-25928167

ABSTRACT

BACKGROUND: High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome. RESULTS: Simulations applying this method were performed to identify selection signatures from pooled sequencing FST data, for which allele frequencies were estimated from a pool of individuals. The relative ratio of true to false positives was twice that generated by existing techniques. A comparison of the approach to a previous study that involved pooled sequencing FST data from maize suggested that outlying windows were more clearly separated from their neighbors than when using a standard sliding window approach. CONCLUSIONS: We have developed a novel technique to identify window boundaries for subsequent analysis protocols. When applied to selection studies based on FST data, this method provides a high discovery rate and minimizes false positives. The method is implemented in the R package GenWin, which is publicly available from CRAN.
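The idea of placing window boundaries at inflection points of a smoothed curve can be sketched in a few lines of Python. To stay dependency-free, this sketch swaps in a centered moving average for the cubic smoothing spline described in the abstract and detects inflections as sign changes of the discrete second difference. It is not the GenWin implementation; all function names are hypothetical.

```python
def moving_average(values, half_width):
    """Centered moving average; the window shrinks near the ends.

    Stand-in smoother (assumption): the abstract's method fits a cubic
    smoothing spline instead.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def inflection_indices(curve):
    """Indices where the discrete second difference changes sign.

    Zero second differences are skipped, so each reported index is the
    first point at which the new curvature sign is observed.
    """
    breaks = []
    prev_sign = 0
    for i in range(1, len(curve) - 1):
        d2 = curve[i - 1] - 2 * curve[i] + curve[i + 1]
        sign = (d2 > 0) - (d2 < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                breaks.append(i)
            prev_sign = sign
    return breaks


def windows_from_breaks(n_points, breaks):
    """Turn breakpoints into half-open (start, end) index windows."""
    edges = [0] + breaks + [n_points]
    return [(edges[k], edges[k + 1]) for k in range(len(edges) - 1)]


# Hypothetical per-marker statistic along a chromosome (for example FST).
stat = [0.02, 0.03, 0.05, 0.20, 0.45, 0.50, 0.40, 0.15, 0.04, 0.03, 0.02]
smooth = moving_average(stat, half_width=1)
windows = windows_from_breaks(len(stat), inflection_indices(smooth))
```

Because the boundaries follow the shape of the smoothed signal, window sizes vary along the sequence rather than being fixed in advance, which is the property the abstract emphasizes.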


Subject(s)
Genomics/methods , Data Interpretation, Statistical , Gene Frequency , Zea mays/genetics
14.
G3 (Bethesda) ; 5(4): 541-9, 2015 Feb 02.
Article in English | MEDLINE | ID: mdl-25645532

ABSTRACT

Maize silage is forage of high quality and yield, and represents the second most important use of maize in the United States. The Wisconsin Quality Synthetic (WQS) maize population has undergone five cycles of recurrent selection for silage yield and composition, resulting in a genetically improved population. The application of high-density molecular markers allows breeders and geneticists to identify important loci through association analysis and selection mapping, as well as to monitor changes in the distribution of genetic diversity across the genome. The objectives of this study were to identify loci controlling variation for maize silage traits through association analysis and the assessment of selection signatures and to describe changes in the genomic distribution of gene diversity through selection and genetic drift in the WQS recurrent selection program. We failed to find any significant marker-trait associations using the historical phenotypic data from WQS breeding trials combined with 17,719 high-quality, informative single nucleotide polymorphisms. Likewise, no strong genomic signatures were left by selection on silage yield and quality in the WQS despite genetic gain for these traits. These results could be due to the genetic complexity underlying these traits, or the role of selection on standing genetic variation. Variation in loss of diversity through drift was observed across the genome. Some large regions experienced much greater loss in diversity than what is expected, suggesting limited recombination combined with small populations in recurrent selection programs could easily lead to fixation of large swaths of the genome.


Subject(s)
Genetic Variation , Genome, Plant , Zea mays/genetics , Genetics, Population , Genotype , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic , Silage
15.
Genetics ; 198(1): 409-21, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25037958

ABSTRACT

Grain produced from cereal crops is a primary source of human food and animal feed worldwide. To understand the genetic basis of seed-size variation, a grain yield component, we conducted a genome-wide scan to detect evidence of selection in the maize Krug Yellow Dent long-term divergent seed-size selection experiment. Previous studies have documented significant phenotypic divergence between the populations. Allele frequencies for ∼3 million single nucleotide polymorphisms (SNPs) in the base population and selected populations were estimated from pooled whole-genome resequencing of 48 individuals per population. Using FST values across sliding windows, 94 divergent regions with a median of six genes per region were identified. Additionally, 2729 SNPs that reached fixation in both selected populations with opposing fixed alleles were identified, many of which clustered in two regions of the genome. Copy-number variation was highly prevalent between the selected populations, with 532 total regions identified on the basis of read-depth variation and comparative genome hybridization. Regions important for seed weight in natural variation were identified in the maize nested association mapping population. However, the number of regions that overlapped with the long-term selection experiment did not exceed that expected by chance, possibly indicating unique sources of variation between the two populations. The results of this study provide insights into the genetic elements underlying seed-size variation in maize and could also have applications for other cereal crops.


Subject(s)
Seeds/genetics , Selection, Genetic , Zea mays/genetics , DNA Copy Number Variations , Gene Frequency , Genome, Plant , Models, Genetic , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Seeds/anatomy & histology , Zea mays/growth & development
16.
Genetics ; 196(3): 829-40, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24381334

ABSTRACT

A genome-wide scan to detect evidence of selection was conducted in the Golden Glow maize long-term selection population. The population had been subjected to selection for increased number of ears per plant for 30 generations, with an empirically estimated effective population size ranging from 384 to 667 individuals and an increase of more than threefold in the number of ears per plant. Allele frequencies at >1.2 million single-nucleotide polymorphism loci were estimated from pooled whole-genome resequencing data, and FST values across sliding windows were employed to assess divergence between the population preselection and the population postselection. Twenty-eight highly divergent regions were identified, with half of these regions providing gene-level resolution on potentially selected variants. Approximately 93% of the divergent regions do not demonstrate a significant decrease in heterozygosity, which suggests that they are not approaching fixation. Also, most regions display a pattern consistent with a soft-sweep model as opposed to a hard-sweep model, suggesting that selection mostly operated on standing genetic variation. For at least 25% of the regions, results suggest that selection operated on variants located outside of currently annotated coding regions. These results provide insights into the underlying genetic effects of long-term artificial selection and identification of putative genetic elements underlying number of ears per plant in maize.
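The sliding-window FST scan described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which FST estimator or window definition was used, so the simple heterozygosity-based form FST = (H_T - H_S) / H_T with equal population weights, the windows defined by SNP count, and the allele frequencies below are all assumptions.

```python
def fst_two_pops(p1, p2):
    """FST = (H_T - H_S) / H_T for one biallelic SNP in two populations.

    Assumption: populations are weighted equally and no sample-size
    correction is applied.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic for the same allele in both pools
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def sliding_window_means(values, size, step):
    """Mean of `values` in windows of `size` items, advancing by `step`."""
    return [
        sum(values[start:start + size]) / size
        for start in range(0, len(values) - size + 1, step)
    ]


# Hypothetical allele frequencies at six adjacent SNPs, estimated from
# pooled sequencing before and after selection.
pre = [0.10, 0.50, 0.30, 0.20, 0.40, 0.60]
post = [0.12, 0.95, 0.85, 0.25, 0.38, 0.62]

per_snp_fst = [fst_two_pops(a, b) for a, b in zip(pre, post)]
window_fst = sliding_window_means(per_snp_fst, size=3, step=1)
```

Windows whose mean FST falls in the extreme tail of the genome-wide distribution would then be flagged as candidate divergent regions.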


Subject(s)
Genome, Plant , Selection, Genetic , Zea mays/genetics , Chromosomes, Plant , Gene Frequency , Genetic Heterogeneity , Genetic Variation , Phenotype , Polymorphism, Single Nucleotide
17.
Genetics ; 193(4): 1073-81, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23410831

ABSTRACT

Genotyping-by-sequencing (GBS) approaches provide low-cost, high-density genotype information. However, GBS has unique technical considerations, including a substantial amount of missing data and a nonuniform distribution of sequence reads. The goal of this study was to characterize technical variation using this method and to develop methods to optimize read depth to obtain desired marker coverage. To empirically assess the distribution of fragments produced using GBS, ∼8.69 Gb of GBS data were generated on the Zea mays reference inbred B73, utilizing ApeKI for genome reduction and single-end reads between 75 and 81 bp in length. We observed wide variation in sequence coverage across sites. Approximately 76% of potentially observable cut site-adjacent sequence fragments had no sequencing reads whereas a portion had substantially greater read depth than expected, up to 2369 times the expected mean. The methods described in this article facilitate determination of sequencing depth in the context of empirically defined read depth to achieve desired marker density for genetic mapping studies.
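As a point of comparison for the uneven coverage reported in the abstract, the sketch below computes what idealized uniform sampling would predict: if reads fell on sites as a Poisson process with a given mean depth, the fraction of sites with zero reads would be exp(-mean depth). This baseline is an assumption introduced for illustration, not the empirical method the article describes, and the function names are hypothetical.

```python
from math import exp, log


def poisson_zero_fraction(mean_depth):
    """Expected fraction of sites with no reads under Poisson sampling."""
    return exp(-mean_depth)


def depth_for_coverage(target_fraction):
    """Mean depth at which the expected covered fraction reaches the target.

    Solves 1 - exp(-depth) = target_fraction; valid for 0 <= target < 1.
    """
    return -log(1 - target_fraction)


# Under the uniform baseline, a mean depth of 2 reads per site would leave
# roughly 13.5% of sites uncovered.
uncovered_at_depth_2 = poisson_zero_fraction(2.0)

# Mean depth the baseline would require for 90% of sites to be covered.
depth_for_90_percent = depth_for_coverage(0.90)
```

The abstract's observation that about 76% of potential fragments had no reads while some sites had far more than the mean shows how strongly real GBS data depart from this baseline, which is why depth planning there relies on an empirically defined read-depth distribution.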


Subject(s)
Genotyping Techniques/methods , Sequence Analysis, DNA/methods , Analysis of Variance , Genetic Markers , Genome, Plant , Zea mays/genetics
18.
Genet Sel Evol ; 44: 29, 2012 Sep 25.
Article in English | MEDLINE | ID: mdl-23009363

ABSTRACT

BACKGROUND: Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. RESULTS: Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains. Some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. CONCLUSIONS: Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models: they not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
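
The multiple-chain approach mentioned in this abstract can be sketched as follows: several independent chains are started from different random seeds and their post-burn-in samples are pooled. This is a toy example under stated assumptions, not the authors' implementation: the sampler is a plain random-walk Metropolis algorithm, the target is a standard normal distribution, and the names `run_chain`, `run_chains`, and `pooled_mean` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def log_target(x: float) -> float:
    """Log density of a standard normal, up to an additive constant (toy target)."""
    return -0.5 * x * x


def run_chain(seed: int, n_steps: int = 5000, step_size: float = 1.0) -> List[float]:
    """One random-walk Metropolis chain with its own seeded generator."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def run_chains(seeds: Sequence[int], n_steps: int = 5000, executor=None) -> List[List[float]]:
    """Run one chain per seed; chains are independent, so they can run in parallel.

    `executor` may be any object with a `map` method (for example a
    concurrent.futures executor); with None the chains run serially.
    """
    mapper = executor.map if executor is not None else map
    return list(mapper(run_chain, seeds, [n_steps] * len(seeds)))


def pooled_mean(chains: Sequence[Sequence[float]], burn_in: int) -> float:
    """Posterior mean estimate from all chains after discarding burn-in."""
    kept = [x for chain in chains for x in chain[burn_in:]]
    return sum(kept) / len(kept)
```

Because each chain owns its random generator, results do not depend on how the chains are scheduled. Passing a `concurrent.futures.ProcessPoolExecutor` as `executor` (created under an `if __name__ == "__main__":` guard) distributes the chains across processes. Each chain still pays its own burn-in cost, so the speedup applies to the sampling phase only.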


Subject(s)
Animals, Domestic/genetics , Breeding/methods , Models, Genetic , Animals , Bayes Theorem , Markov Chains , Monte Carlo Method
19.
Front Genet ; 2: 4, 2011.
Article in English | MEDLINE | ID: mdl-22303303

ABSTRACT

High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin-Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans.
