Results 1 - 20 of 121
1.
Nature ; 606(7914): 527-534, 2022 06.
Article in English | MEDLINE | ID: mdl-35676474

ABSTRACT

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
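The 24% figure is the relative gain of the graph-pangenome estimate over the linear-reference estimate. A minimal check of that arithmetic, using only the two averages quoted above (the function name is illustrative, not from the paper):

```python
def relative_increase(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, expressed as a fraction."""
    return (new - baseline) / baseline


# Average heritability estimates quoted in the abstract.
h2_graph = 0.41   # graph pangenome
h2_linear = 0.33  # single linear reference genome

gain = relative_increase(h2_graph, h2_linear)
print(f"Relative increase: {gain:.1%}")  # about 24%, matching the reported figure
```

The gain is computed relative to the baseline (0.08 / 0.33), which is why a 0.08 absolute difference corresponds to roughly a 24% increase rather than 8%.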


Subject(s)
Genetic Variation , Genome, Plant , Genome-Wide Association Study , Plant Breeding , Solanum lycopersicum , Alleles , Crops, Agricultural/genetics , Genome, Plant/genetics , Linkage Disequilibrium , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism
2.
Plant J ; 120(2): 833-850, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39259496

ABSTRACT

Genome-wide association study (GWAS) with single nucleotide polymorphisms (SNPs) has been widely used to explore genetic controls of phenotypic traits. Alternatively, GWAS can use counts of substrings of length k from longer sequencing reads, k-mers, as genotyping data. Using maize cob and kernel color traits, we demonstrated that k-mer GWAS can effectively identify associated k-mers. Co-expression analysis of kernel color k-mers and genes directly found k-mers from known causal genes. Analyzing complex traits of kernel oil and leaf angle resulted in k-mers from both known and candidate genes. A gene encoding a MADS transcription factor was functionally validated by showing that ectopic expression of the gene led to less upright leaves. Evolution analysis revealed most k-mers positively correlated with kernel oil were strongly selected against in maize populations, while most k-mers for upright leaf angle were positively selected. In addition, genomic prediction of kernel oil, leaf angle, and flowering time using k-mer data resulted in a similarly high prediction accuracy to the standard SNP-based method. Collectively, we showed k-mer GWAS is a powerful approach for identifying trait-associated genetic elements. Further, our results demonstrated the bridging role of k-mers for data integration and functional gene discovery.
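The k-mer genotyping idea above starts from counting every substring of length k in the reads. A minimal sketch of that counting step, assuming plain DNA strings; collapsing each k-mer with its reverse complement ("canonical" k-mers) is a common convention and an assumption here, not necessarily the authors' pipeline. The function names are hypothetical, and building the per-sample k-mer matrix used for association testing is not shown.

```python
from collections import Counter
from typing import Iterable

# Maps each base to its complement; used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int, use_canonical: bool = True) -> Counter:
    """Count every length-k substring across a collection of sequencing reads."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if "N" in kmer:  # skip windows containing ambiguous base calls
                continue
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts


reads = ["ACGTAC", "GTACGT"]
print(count_kmers(reads, k=3, use_canonical=False))
print(count_kmers(reads, k=3))
```

With canonical counting, ACG and its reverse complement CGT share one key, as do GTA and TAC, so the same sequence read from either strand contributes to the same genotype column.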


Subject(s)
Genome-Wide Association Study , Phenotype , Polymorphism, Single Nucleotide , Zea mays , Zea mays/genetics , Quantitative Trait Loci/genetics , Plant Leaves/genetics , Genotype , Genome, Plant/genetics
3.
BMC Genomics ; 25(1): 497, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773372

ABSTRACT

BACKGROUND: Alfalfa (Medicago sativa L.) is the most cultivated forage legume around the world. Under a variety of growing conditions, forage yield in alfalfa is stymied by biotic and abiotic stresses including heat, salt, drought, and disease. Because plants are sessile, they use strategies including, but not limited to, differential gene expression to respond to environmental cues. Transcription factors control the expression of genes that contribute to or enable tolerance and survival during periods of stress. Basic-leucine zipper (bZIP) transcription factors have been demonstrated to play a critical role in regulating plant growth and development as well as in mediating the responses to abiotic stress in several species, including Arabidopsis thaliana, Oryza sativa, Lotus japonicus and Medicago truncatula. However, there is little information about bZIP transcription factors in cultivated alfalfa. RESULTS: In the present study, 237 bZIP genes were identified in alfalfa from publicly available sequencing data. Multiple sequence alignments showed the presence of intact bZIP motifs in the identified sequences. Based on previous phylogenetic analyses in A. thaliana, alfalfa bZIPs were similarly divided and fell into 10 groups. The physico-chemical properties, motif analysis and phylogenetic study of the alfalfa bZIPs revealed high specificity within groups. The differential expression of alfalfa bZIPs in a suite of tissues indicates that bZIP genes are specifically expressed at different developmental stages in alfalfa. Similarly, expression analysis in response to ABA, cold, drought and salt stresses indicates that a subset of bZIP genes are also differentially expressed and likely play a role in abiotic stress signaling and/or tolerance. RT-qPCR analysis on selected genes further verified these differential expression patterns. 
CONCLUSIONS: Taken together, this work provides a framework for the future study of bZIPs in alfalfa and presents candidate bZIPs involved in stress-response signaling.


Subject(s)
Basic-Leucine Zipper Transcription Factors , Gene Expression Regulation, Plant , Medicago sativa , Phylogeny , Stress, Physiological , Medicago sativa/genetics , Basic-Leucine Zipper Transcription Factors/genetics , Basic-Leucine Zipper Transcription Factors/metabolism , Stress, Physiological/genetics , Plant Proteins/genetics , Plant Proteins/metabolism , Computer Simulation , Gene Expression Profiling , Computational Biology/methods
4.
J Nanobiotechnology ; 22(1): 355, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38902678

ABSTRACT

BACKGROUND: Cancer recurrence following surgical resection is a major cause of treatment failure. Finding effective methods to prevent postoperative recurrence and wound infection is an important component of successful surgery. With the development of new nanotechnology, more treatment options have been provided for postoperative adjuvant therapy. This study presents an innovative hydrogel system that stimulates tumoricidal immunity after surgical resection of non-small cell lung cancer (NSCLC) and prevents cancer relapse. RESULTS: The hydrogel system is based on the excellent photothermal conversion performance of single-atom platinum (CN-Pt) along with the delivery and release of the chemotherapy drug, gemcitabine (GEM). The system is coated onto the wound surface after tumor removal with subsequent near-infrared (NIR) photothermal therapy, which efficiently induces necroptosis of residual cancer cells, amplifies the levels of damage-associated molecular patterns (DAMPs), and increases the number of M1 macrophages. The significantly higher levels of phagocytic macrophages enhance tumor immunogenicity and sensitize cancer cells to CD8+ T-cell immunity to control postoperative recurrence, which has been verified using an animal model of postoperative lung cancer recurrence. The CN-Pt-GEM-hydrogel with NIR can also inhibit postoperative wound infection. CONCLUSIONS: These findings introduce an alternative strategy for supplementing antitumor immunity in patients undergoing resection of NSCLC tumors. The CN-Pt-GEM-hydrogel with the NIR system also exhibits good biosafety and may be adaptable for clinical application in relation to tumor resection surgery, wound tissue filling, infection prevention, and recurrence prevention.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Deoxycytidine , Gemcitabine , Hydrogels , Lung Neoplasms , Necroptosis , Animals , Mice , Deoxycytidine/analogs & derivatives , Deoxycytidine/pharmacology , Deoxycytidine/therapeutic use , Hydrogels/chemistry , Humans , Necroptosis/drug effects , Neoplasm Recurrence, Local , Cell Line, Tumor , Immunotherapy/methods , Photothermal Therapy/methods , Wound Infection/prevention & control , Wound Infection/drug therapy , Macrophages/drug effects , Mice, Inbred C57BL , CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/drug effects
5.
Bioinformatics ; 37(9): 1324-1326, 2021 06 09.
Article in English | MEDLINE | ID: mdl-32960944

ABSTRACT

Accurately predicting phenotypes from genotypes holds great promise to improve health management in humans and animals, and breeding efficiency in animals and plants. Although many prediction methods have been developed, the optimal method differs across datasets due to multiple factors, including species, environments, populations and traits of interest. Studies have demonstrated that the number of genes underlying a trait and its heritability are the two key factors that determine which method fits the trait the best. In many cases, however, these two factors are unknown for the traits of interest. We developed a cloud computing platform for Mining the Maximum Accuracy of Predicting phenotypes from genotypes (MMAP) using unsupervised learning on publicly available real data and simulated data. MMAP provides a user interface to upload input data, manage projects and analyses and download the output results. The platform is free for the public to conduct computations for predicting phenotypes and genetic merit using the best prediction method optimized from many available ones, including Ridge Regression, gBLUP, compressed BLUP, Bayesian LASSO, Bayes A, B, Cpi and many more. Users can also use the platform to conduct data analyses with any methods of their choice. It is expected that extensive usage of MMAP would enrich the training data, which in turn results in continual improvement of the identification of the best method for use with particular traits. AVAILABILITY AND IMPLEMENTATION: The MMAP user manual, tutorials and example datasets are available at http://zzlab.net/MMAP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Cloud Computing , Models, Genetic , Animals , Bayes Theorem , Genomics , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide
6.
Plant Physiol ; 186(4): 2239-2252, 2021 08 03.
Article in English | MEDLINE | ID: mdl-34618106

ABSTRACT

Grain characteristics, including kernel length, kernel width, and thousand kernel weight, are critical component traits for grain yield. Manual measurements and counting are expensive, forming the bottleneck for dissecting these traits' genetic architectures toward ultimate yield improvement. High-throughput phenotyping methods have been developed by analyzing images of kernels. However, segmenting kernels from the image background and noise artifacts or from other kernels positioned in close proximity remains a challenge. In this study, we developed a software package, named GridFree, to overcome these challenges. GridFree uses an unsupervised machine learning approach, K-Means, to segment kernels from the background by using principal component analysis on both raw image channels and their color indices. GridFree incorporates users' experiences as a dynamic criterion to set thresholds for a divide-and-combine strategy that effectively segments adjacent kernels. When multiple adjacent kernels are incorrectly segmented as a single object, they form an outlier on the distribution plot of kernel area, length, and width. GridFree uses the dynamic threshold settings for splitting and merging. In addition to counting, GridFree measures kernel length, width, and area with the option of scaling with a reference object. Evaluations against existing software programs demonstrated that GridFree had the smallest error on counting seeds for multiple crop species. GridFree was implemented in Python with a friendly graphical user interface to allow users to easily visualize the outcomes and make decisions, which ultimately eliminates time-consuming and repetitive manual labor. GridFree is freely available at the GridFree website (https://zzlab.net/GridFree).
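The core segmentation step described above is K-Means clustering of pixels into kernel and background groups. A minimal sketch of plain K-Means (Lloyd's algorithm) on RGB pixel values, for illustration only: it clusters raw channels directly, omitting the PCA over channels and color indices and the divide-and-combine splitting of touching kernels that GridFree performs, and the deterministic farthest-first seeding is an assumption. The function names are hypothetical, not GridFree's API.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def farthest_first_init(points: Sequence[Pixel], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: Sequence[Pixel], k: int = 2, iters: int = 20) -> List[int]:
    """Lloyd's K-Means; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its member pixels.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(ch) / len(members) for ch in zip(*members)]
    return labels


# Toy image flattened to pixels: dark background and tan kernel pixels.
pixels = [(12, 10, 8), (15, 14, 10), (10, 12, 9),
          (205, 160, 95), (198, 155, 90), (210, 170, 100)]
print(kmeans(pixels, k=2))
```

Because the cluster labels are arbitrary, a real pipeline still has to decide which cluster is "kernel", for example by comparing it with a known background color.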


Subject(s)
Botany/methods , Crop Production/methods , Edible Grain/anatomy & histology , Image Processing, Computer-Assisted/instrumentation , Software , Botany/instrumentation , Crop Production/instrumentation , Seeds/anatomy & histology
7.
Mol Breed ; 42(4): 18, 2022 Apr.
Article in English | MEDLINE | ID: mdl-37309459

ABSTRACT

Using imbalanced historical yield data to predict performance and select new lines is an arduous breeding task. Genome-wide association studies (GWAS) and high throughput genotyping based on sequencing techniques can increase prediction accuracy. An association mapping panel of 227 Texas elite (TXE) wheat breeding lines was used for GWAS and a training population to develop prediction models for grain yield selection. An imbalanced set of yield data collected from 102 environments (year-by-location) over 10 years, through testing yield in 40-66 lines each year at 6-14 locations with 38-41 lines repeated in the test in any two consecutive years, was used. Based on correlations among data from different environments within two adjacent years and heritability estimated in each environment, yield data from 87 environments were selected and assigned to two correlation-based groups. The yield best linear unbiased estimation (BLUE) from each group, along with reaction to greenbug and Hessian fly in each line, was used for GWAS to reveal genomic regions associated with yield and insect resistance. A total of 74 genomic regions were associated with grain yield and two of them were commonly detected in both correlation-based groups. Greenbug resistance in TXE lines was mainly controlled by Gb3 on chromosome 7DL in addition to two novel regions on 3DL and 6DS, and Hessian fly resistance was conferred by the region on 1AS. Genomic prediction models developed in two correlation-based groups were validated using a set of 105 new advanced breeding lines and the model from correlation-based group G2 was more reliable for prediction. This research not only identified genomic regions associated with yield and insect resistance but also established the method of using historical imbalanced breeding data to develop a genomic prediction model for crop improvement. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-022-01287-8.

8.
Spinal Cord ; 60(2): 129-134, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34326463

ABSTRACT

STUDY DESIGN: A retrospective study of incomplete cervical spinal cord injury (SCI) treated with and without hyperbaric oxygen (HBO) therapy after operation. OBJECTIVE: To investigate the effects of hyperbaric oxygen therapy on patients' postoperative recovery after incomplete cervical spinal cord injury. SETTING: Shulan Hangzhou Hospital, Hangzhou, China. METHODS: We analyzed the clinical data of 78 patients admitted to the Orthopedic Department of our hospital from June 2014 to June 2016 due to trauma-induced incomplete cervical spinal cord injury. All study subjects underwent nerve decompression and internal fixation procedures within 2 weeks of injury. The patients were divided into a hyperbaric oxygen therapy (HBO) group (n = 40) and a non-hyperbaric oxygen therapy (NHBO) group (n = 38) according to the chosen treatment option. The NHBO group received only the conventional treatment regimen, while the HBO group received a combination of conventional treatment and hyperbaric oxygen therapy. The subsequent changes in spinal functions and activities of daily living (ADL) were assessed by the American Spinal Injury Association (ASIA) scale and the Barthel Index at different time points (pretreatment, 1 month and 3 months of treatment, as well as 6 months, 1 year, 2 years, and 3 years after the surgical procedure). RESULTS: There were no significant differences in age, gender, injury site, and disease condition between patients (p > 0.05). The results showed a significant difference in treatment total effectiveness rate between the HBO and NHBO groups (p < 0.05) (90% and 78.9%, respectively). Analyses of the ASIA scores and Barthel indices between the two groups indicated significant differences at the 1-month and 3-month treatment time points, as well as 6 months and 1 year after the initial operation (p < 0.05). Subjects in the HBO group had a better recovery than their NHBO counterparts, with the 1-month treatment time point being the most significant. 
In addition, the results indicated significant improvements in Barthel Index scores as well as ASIA sensory and motor function scores in both groups after a 1-month treatment, with the HBO group faring significantly better than the NHBO group (p < 0.01). CONCLUSIONS: Our results not only showed that hyperbaric oxygen therapy is safe and effective for the treatment of incomplete cervical spinal cord injury but also indicated that the longer the treatment lasts (therapy initiation within 3 months after the surgical operation), the better the effects. In addition, appropriate hyperbaric oxygen therapy leads to a peak in recovery within the first 3 postoperative months and can effectively promote spinal cord functions, reduce disability, and improve patients' quality of life.


Subject(s)
Cervical Cord , Hyperbaric Oxygenation , Spinal Cord Injuries , Activities of Daily Living , Humans , Hyperbaric Oxygenation/methods , Quality of Life , Retrospective Studies , Spinal Cord
9.
Compr Rev Food Sci Food Saf ; 21(3): 2105-2117, 2022 05.
Article in English | MEDLINE | ID: mdl-35411636

ABSTRACT

This review examines the application, limitations, and potential alternatives to the Hagberg-Perten falling number (FN) method used in the global wheat industry for detecting the risk of poor end-product quality mainly due to starch degradation by the enzyme α-amylase. By viscometry, the FN test indirectly detects the presence of α-amylase, the primary enzyme that digests starch. Elevated α-amylase results in low FN and damages wheat product quality, resulting in cakes that fall and in sticky bread and noodles. Low FN can occur from preharvest sprouting (PHS) and late maturity α-amylase (LMA). Moist or rainy conditions before harvest cause PHS on the mother plant. Continuously cool or fluctuating temperatures during the grain filling stage cause LMA. Due to the expression of additional hydrolytic enzymes, PHS has a stronger negative impact than LMA. Wheat grain with low FN/high α-amylase results in serious losses for farmers, traders, millers, and bakers worldwide. Although blending of low FN grain with sound wheat may be used as a means of moving affected grain through the marketplace, care must be taken to prevent grain lots from falling below contract-specified FN. A large amount of sound wheat can be ruined if mixed with a small amount of sprouted wheat. The FN method is widely employed to detect α-amylase after harvest. However, it has several limitations, including sampling variability, high cost, labor intensiveness, the destructive nature of the test, and an inability to differentiate between LMA and PHS. Faster, cheaper, and more accurate alternatives could improve breeding for resistance to PHS and LMA and could preserve the value of wheat grain by avoiding inadvertent mixing of high- and low-FN grain by enabling testing at more stages of the value stream including at harvest, delivery, transport, storage, and milling. 
Alternatives to the FN method explored here include the Rapid Visco Analyzer, enzyme assays, immunoassays, near-infrared spectroscopy, and hyperspectral imaging.


Subject(s)
Seeds , Triticum , Bread , Edible Grain , Starch/chemistry , Triticum/chemistry , alpha-Amylases/metabolism
10.
Heredity (Edinb) ; 126(6): 929-941, 2021 06.
Article in English | MEDLINE | ID: mdl-33888874

ABSTRACT

Domesticates are an excellent model for understanding biological consequences of rapid climate change. Maize (Zea mays ssp. mays) was domesticated from a tropical grass yet is widespread across temperate regions today. We investigate the biological basis of temperate adaptation in diverse structured nested association mapping (NAM) populations from China, Europe (Dent and Flint) and the United States as well as in the Ames inbred diversity panel, using days to flowering as a proxy. Using cross-population prediction, where high prediction accuracy derives from overall genomic relatedness, shared genetic architecture, and sufficient diversity in the training population, we identify patterns in predictive ability across the five populations. To identify the source of temperate adapted alleles in these populations, we predict top associated genome-wide association study (GWAS) identified loci in a Random Forest Classifier using independent temperate-tropical North American populations based on lines selected from Hapmap3 as predictors. We find that North American populations are well predicted (AUC equals 0.89 and 0.85 for Ames and USNAM, respectively), European populations somewhat well predicted (AUC equals 0.59 and 0.67 for the Dent and Flint panels, respectively) and that the Chinese population is not predicted well at all (AUC is 0.47), suggesting an independent adaptation process for early flowering in China. Multiple adaptations for the complex trait days to flowering in maize provide hope for similar natural systems under climate change.
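The AUC values above are easiest to read against the chance level: an uninformative classifier scores 0.5, so 0.47 for the Chinese population amounts to no predictive power. A minimal sketch of AUC in its pairwise (Mann-Whitney) form, i.e. the probability that a randomly chosen positive is scored above a randomly chosen negative; the temperate = 1 / tropical = 0 labelling and the scores are hypothetical, and the study's Random Forest classifier is not reproduced here.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a randomly chosen positive outscores a
    randomly chosen negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for lines labelled temperate (1) or tropical (0).
print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0: perfect separation
print(auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5: uninformative scores
```

The double loop is quadratic in sample size; a rank-based formulation gives the same value in O(n log n) for large panels.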


Subject(s)
Adaptation, Physiological , Flowers/physiology , Zea mays , Adaptation, Physiological/genetics , Alleles , Genetic Association Studies , Zea mays/genetics , Zea mays/physiology
11.
Plant Biotechnol J ; 18(2): 389-401, 2020 02.
Article in English | MEDLINE | ID: mdl-31278885

ABSTRACT

Landraces often contain genetic diversity that has been lost in modern cultivars, including alleles that confer enhanced local adaptation. To comprehensively identify loci associated with adaptive traits in soya bean landraces, for example flowering time, a population of 1938 diverse landraces and 97 accessions of the wild progenitor of cultivated soya bean, Glycine soja, was genotyped using tGBS®. Based on 99 085 high-quality SNPs, landraces were classified into three sub-populations which exhibit geographical genetic differentiation. Clustering was inferred from STRUCTURE, principal component analyses and neighbour-joining tree analyses. Using phenotypic data collected at two locations separated by 10 degrees of latitude, 17 trait-associated SNPs (TASs) for flowering time were identified, including a stable locus Chr12:5914898 and previously undetected candidate QTL/genes for flowering time in the vicinity of the previously cloned flowering genes, E1 and E2. Using passport data associated with the collection sites of the landraces, 27 SNPs associated with adaptation to three bioclimatic variables (temperature, daylength, and precipitation) were identified. A series of candidate flowering genes were detected within linkage disequilibrium (LD) blocks surrounding 12 bioclimatic TASs. Nine of these TASs exhibit significant differences in flowering time between alleles within one or more of the three individual sub-populations. Signals of selection during domestication and/or subsequent landrace diversification and adaptation were detected at 38 of the 44 flowering and bioclimatic TASs. Hence, this study lays the groundwork to begin breeding for novel environments predicted to arise following global climate change.


Subject(s)
Adaptation, Physiological , Genes, Plant , Genome-Wide Association Study , Glycine max , Adaptation, Physiological/genetics , Alleles , Genes, Plant/genetics , Genotype , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Glycine max/genetics
12.
Plant Dis ; 104(8): 2181-2192, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32511046

ABSTRACT

Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), poses a major threat to wheat production worldwide, especially in the United States. To identify loci for effective stripe rust resistance in U.S. wheat, a genome-wide association study (GWAS) was conducted using a panel of 616 spring wheat cultivars and breeding lines. The accessions in this panel were phenotyped for stripe rust response in the greenhouse at seedling stage with five predominant and highly virulent races of Pst and in different field environments at adult-plant stage in 2017 and 2018. In total, 2,029 single-nucleotide polymorphism markers that cover the whole genome were generated with genotyping by multiplexed sequencing and used in GWAS. In addition, 23 markers of previously reported resistance genes or quantitative trait loci (QTLs) were used to genotype the population. This spring panel was grouped into three subpopulations based on principal component analysis. A total of 37 genes or QTLs including 10 potentially new QTLs for resistance to stripe rust were detected by GWAS and linked marker tests. The frequencies of the resistance genes or QTLs in various nurseries were determined, indicating different intensities of these genes or QTLs used in breeding programs of different regions. These resistance loci and the information on their markers, effectiveness, and distributions should be useful for improving stripe rust resistance in wheat cultivars.


Subject(s)
Genome-Wide Association Study , Triticum/genetics , Breeding , Disease Resistance/genetics , Humans , Plant Diseases , United States
13.
BMC Genomics ; 20(1): 827, 2019 Nov 08.
Article in English | MEDLINE | ID: mdl-31703627

ABSTRACT

BACKGROUND: Dual-purpose cattle are more adaptive to environmental challenges than single-purpose dairy or beef cattle. Balance among milk, reproductive, and mastitis resistance traits in breeding programs is therefore more critical for dual-purpose cattle to increase net income and maintain well-being. With dual-purpose Xinjiang Brown cattle adapted to the Xinjiang Region in northwestern China, we conducted genome-wide association studies (GWAS) to dissect the genetic architecture related to milk, reproductive, and mastitis resistance traits. Phenotypic data were collected for 2410 individuals measured during 1995-2017. By adding another 445 ancestors, a total of 2855 related individuals were used to derive estimated breeding values for all individuals, including the 2410 individuals with phenotypes. Among phenotyped individuals, we genotyped 403 cows with the Illumina 150 K Bovine BeadChip. RESULTS: GWAS were conducted with the FarmCPU (Fixed and random model circulating probability unification) method. We identified 12 markers significantly associated with six of the 10 traits under the threshold of 5% after a Bonferroni multiple test correction. Seven of these SNPs were in QTL regions previously identified to be associated with related traits. One identified SNP, BovineHD1600006691, was significantly associated with both age at first service and age at first calving. This SNP directly overlapped a QTL previously reported to be associated with calving ease. Within 160 kb upstream and downstream of each significant SNP identified, we proposed candidate genes based on their function. Four of the SNPs were located within four candidate genes, including CDH2, which is linked to milk fat percentage, and GABRG2, which is associated with milk protein yield. 
CONCLUSIONS: These findings are beneficial not only for breeding through marker-assisted selection, but also for genome editing underlying the related traits to enhance the overall performance of dual-purpose cattle.
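The "threshold of 5% after a Bonferroni multiple test correction" divides the 5% family-wise error level by the number of markers tested. A minimal sketch of that cutoff; the number of markers actually tested after quality control is not reported above, so the 100,000 used here is a hypothetical placeholder, as are the marker names in the test.

```python
import math
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def significant_markers(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Names of markers whose p-value is at or below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p <= cutoff]


n_markers = 100_000  # hypothetical number of markers tested; not reported above
cutoff = bonferroni_threshold(0.05, n_markers)
print(f"p <= {cutoff:.1e}  (-log10 p >= {-math.log10(cutoff):.2f})")
```

GWAS Manhattan plots usually draw this cutoff on the -log10 scale, so a cutoff of 5e-7 corresponds to a horizontal line at about 6.3.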


Subject(s)
Cattle/genetics , Cattle/physiology , Genome-Wide Association Study , Milk/metabolism , Reproduction/genetics , Animals , Cattle/metabolism , Disease Resistance/genetics , Female , Mastitis/genetics , Phenotype
14.
Brief Bioinform ; 18(5): 744-753, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27436121

ABSTRACT

Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula.
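The two formulas differ only in when the correlation is taken: per fold and then averaged (Instant), or once over the pooled predictions (Hold). A minimal sketch of both, with hypothetical function names. As an illustration not taken from the paper, it also pairs Hold accuracy with a no-information predictor that predicts each held-out value by the mean of the other folds: pooled this way, the correlation comes out below zero, and with one sample per fold it is exactly -1, even though the predictor carries no genetic signal at all. The paper's modified formula for correcting Instant accuracy is not reproduced here.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def instant_accuracy(obs: Sequence[float], pred: Sequence[float],
                     folds: Sequence[int]) -> float:
    """Correlate observed and predicted values within each fold, then average."""
    rs = []
    for f in sorted(set(folds)):
        idx = [i for i, g in enumerate(folds) if g == f]
        rs.append(pearson([obs[i] for i in idx], [pred[i] for i in idx]))
    return sum(rs) / len(rs)


def hold_accuracy(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pool every fold's predictions, then compute one correlation at the end."""
    return pearson(obs, pred)


def training_mean_predictions(obs: Sequence[float], folds: Sequence[int]) -> List[float]:
    """No-information predictor: each held-out value is predicted by the mean
    phenotype of the samples outside its fold."""
    total, n = sum(obs), len(obs)
    preds = []
    for f in folds:
        in_fold = [o for o, g in zip(obs, folds) if g == f]
        preds.append((total - sum(in_fold)) / (n - len(in_fold)))
    return preds


random.seed(0)
obs = [random.gauss(0.0, 1.0) for _ in range(100)]
folds = [i % 5 for i in range(100)]  # five folds of 20 samples

# A predictor with real signal: observed value plus noise.
noisy = [o + random.gauss(0.0, 1.0) for o in obs]
print("instant:", instant_accuracy(obs, noisy, folds))
print("hold:   ", hold_accuracy(obs, noisy))

# Pooled across folds, the no-information predictor correlates negatively.
null_pred = training_mean_predictions(obs, folds)
print("hold (no-information predictor):", hold_accuracy(obs, null_pred))
```

Instant accuracy is undefined for the no-information predictor, because its predictions are constant within each fold and the per-fold correlation divides by zero; that is why only Hold accuracy is shown for it.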


Subject(s)
Genomics , Genome , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide
15.
Bioinformatics ; 34(11): 1925-1927, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29342241

ABSTRACT

Summary: The ultimate goal of genomic research is to effectively predict phenotypes from genotypes so that medical management can improve human health and molecular breeding can increase agricultural production. Genomic prediction or selection (GS) plays a complementary role to genome-wide association studies (GWAS), which is the primary method to identify genes underlying phenotypes. Unfortunately, most computing tools cannot perform data analyses for both GWAS and GS. Furthermore, the majority of these tools are executed through a command-line interface (CLI), which requires programming skills. Non-programmers struggle to use them efficiently because of the steep learning curves and zero tolerance for data formats and mistakes when inputting keywords and parameters. To address these problems, this study developed a software package, named the Intelligent Prediction and Association Tool (iPat), with a user-friendly graphical user interface. With iPat, GWAS or GS can be performed using a pointing device to simply drag and/or click on graphical elements to specify input data files, choose input parameters and select analytical models. Models available to users include those implemented in third party CLI packages such as GAPIT, PLINK, FarmCPU, BLINK, rrBLUP and BGLR. Users can choose any data format and conduct analyses with any of these packages. File conversions are automatically conducted for specified input data and selected packages. A GWAS-assisted genomic prediction method was implemented to perform genomic prediction using any GWAS method such as FarmCPU. iPat was written in Java for adaptation to multiple operating systems including Windows, Mac and Linux. Availability and implementation: The iPat executable file, user manual, tutorials and example datasets are freely available at http://zzlab.net/iPat. Contact: zhiwu.zhang@wsu.edu.


Subject(s)
Genome-Wide Association Study/methods , Phenotype , Software , Genomics/methods , Genotype , Humans
16.
Plant Cell ; 28(10): 2651-2665, 2016 10.
Article in English | MEDLINE | ID: mdl-27662898

ABSTRACT

Plant volatiles not only have multiple defense functions against herbivores, fungi, and bacteria, but also have been implicated in signaling within the plant and toward other organisms. Elucidating the function of individual plant volatiles will require more knowledge of their biosynthesis and regulation in response to external stimuli. By exploiting the variation of herbivore-induced volatiles among 26 maize (Zea mays) inbred lines, we conducted a nested association mapping and genome-wide association study (GWAS) to identify a set of quantitative trait loci (QTLs) for investigating the pathways of volatile terpene production. The most significant QTL identified affects the emission of (E)-nerolidol, linalool, and the two homoterpenes (E)-3,8-dimethyl-1,4,7-nonatriene (DMNT) and (E,E)-4,8,12-trimethyltrideca-1,3,7,11-tetraene (TMTT). GWAS associated a single nucleotide polymorphism in the promoter of the gene encoding the terpene synthase TPS2 with this QTL. Biochemical characterization of TPS2 verified that this plastid-localized enzyme forms linalool, (E)-nerolidol, and (E,E)-geranyllinalool. The subsequent conversion of (E)-nerolidol into DMNT maps to a P450 monooxygenase, CYP92C5, which is capable of converting nerolidol into DMNT by oxidative degradation. A QTL influencing TMTT accumulation corresponds to a similar monooxygenase, CYP92C6, which is specific for the conversion of (E,E)-geranyllinalool to TMTT. The DMNT biosynthetic pathway and both monooxygenases are distinct from those previously characterized for DMNT and TMTT synthesis in Arabidopsis thaliana, suggesting independent evolution of these enzymatic activities.


Subject(s)
Arabidopsis/metabolism , Acyclic Monoterpenes , Arabidopsis/genetics , Arabidopsis Proteins/metabolism , Genome-Wide Association Study , Monoterpenes/metabolism , Quantitative Trait Loci/genetics , Sesquiterpenes/metabolism
17.
PLoS Genet ; 12(2): e1005767, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26828793

ABSTRACT

False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid the model over-fitting problem in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. Now, a dataset with half a million individuals and half a million markers can be analyzed within three days.
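The FEM step above, which tests one marker at a time with the associated markers (pseudo-QTNs) as fixed covariates, can be sketched as an ordinary least-squares scan. This is a minimal illustration under stated assumptions, not the FarmCPU implementation: the REM step that selects pseudo-QTNs through the kinship they define is omitted, population-structure covariates are left out, pseudo-QTN indices are supplied by hand, and the P value uses a normal approximation to the t distribution. The names `fem_test`, `fem_scan` and `solve` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fem_test(y, covariates, marker):
    """Fit y ~ 1 + covariates + marker by OLS; return (effect, t, p) for the marker."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in covariates] + [list(marker)]
    k = len(cols)
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(c[t] * y[t] for t in range(n)) for c in cols]
    beta = solve(xtx, xty)
    resid = [y[t] - sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    # Last diagonal element of (X'X)^-1, obtained by solving against a unit vector.
    inv_last = solve(xtx, [0.0] * (k - 1) + [1.0])[-1]
    t = beta[-1] / math.sqrt(inv_last * sigma2)
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation (assumption)
    return beta[-1], t, p


def fem_scan(y, genotypes, pseudo_qtns):
    """Test every marker column, using the pseudo-QTN columns as fixed covariates.

    A marker that is itself a pseudo-QTN is dropped from its own covariate set
    to avoid a singular design (a simplification of FarmCPU's P value unification).
    """
    pvals = []
    for i, marker in enumerate(genotypes):
        covs = [genotypes[j] for j in pseudo_qtns if j != i]
        pvals.append(fem_test(y, covs, marker)[2])
    return pvals
```

With `y` built as twice the first marker plus small noise, `fem_scan(y, [m0, m1], [0])` gives a near-zero P value for the first marker and a much larger one for the second, which is tested with the first marker as a covariate.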


Subject(s)
Genome-Wide Association Study , Models, Genetic , Arabidopsis/genetics , Flowers/genetics , Flowers/physiology , Genes, Plant , Genetic Loci , Humans , Quantitative Trait, Heritable , Software , Species Specificity
18.
Theor Appl Genet ; 131(6): 1273-1285, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29478186

ABSTRACT

KEY MESSAGE: We report a repertoire of diverse aneuploids harbored by a newly synthesized segmental allotetraploid rice population with fully sequenced sub-genomes and demonstrate their retention features and phenotypic consequences. Aneuploidy, defined as unequal numbers of different chromosomes, is a large-effect genetic variant and may produce diverse cellular and organismal phenotypes. Polyploids are more permissive to chromosomal content imbalance than their diploid and haploid counterparts and, therefore, may enable more in-depth investigation of the phenotypic consequences of aneuploidy. Based on whole-genome resequencing, we find that ca. 40% of the 312 selfed individual plants sampled from an early-generation rice segmental allotetraploid population are constitutive aneuploids harboring 55 distinct aneuploid karyotypes. We document that gain of a chromosome is more prevalent than loss of a chromosome, and that the 12 rice chromosomes have distinct tendencies to be in an aneuploid state. These properties of aneuploidy are constrained by multiple factors, including the number of genes residing on the chromosome and its predicted functional connectivity with other chromosomes. Two broad categories of aneuploidy-associated phenotypes are recognized: those shared by different aneuploids, and those associated with aneuploidy of a specific chromosome. A repertoire of diverse aneuploids in the context of a segmental allotetraploid rice genome with fully sequenced sub-genomes provides a tractable resource to explore the roles of aneuploidy in nascent polyploid genome evolution and helps to decipher the mechanisms conferring karyotypic stabilization on the path to polyploid speciation and towards artificial construction of novel polyploid crops.


Subject(s)
Aneuploidy , Oryza/genetics , Plant Breeding , Polyploidy , Genome, Plant , Karyotype , Phenotype
19.
Heredity (Edinb) ; 121(6): 648-662, 2018 12.
Article in English | MEDLINE | ID: mdl-29765161

ABSTRACT

Improvement of statistical methods is crucial for realizing the potential of increasingly dense genetic markers. Bayesian methods treat all markers as random effects, exhibit an advantage on dense markers, and offer the flexibility of using different priors. In contrast, genomic best linear unbiased prediction (gBLUP) is superior in computing speed, but superior in prediction accuracy only for extremely complex traits. Currently, the existing variants of the BLUP method are insufficient for adapting to new sequencing technologies and traits with different genetic architectures. In this study, we found two ways to change the kinship derivation in the BLUP method that improve prediction accuracy while maintaining the computational advantage. First, using the settlement under progressively exclusive relationship (SUPER) algorithm, we substituted all available markers with estimated quantitative trait nucleotides (QTNs) to derive kinship. Second, we compressed individuals into groups based on kinship, and then used the groups as random effects instead of individuals. The two methods were named SUPER BLUP (sBLUP) and compressed BLUP (cBLUP). Analyses of both simulated and real data demonstrated that these two methods offer flexibility for evaluating a variety of traits, covering a broader range of genetic architectures. For traits controlled by small numbers of genes, sBLUP outperforms Bayesian LASSO (least absolute shrinkage and selection operator). For traits with low heritability, cBLUP outperforms both gBLUP and Bayesian LASSO methods. We implemented these new BLUP alphabet series methods in an R package, Genome Association and Prediction Integrated Tool (GAPIT), available at http://zzlab.net/GAPIT.
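The two kinship changes can be illustrated with a small sketch. It assumes genotypes coded 0/1/2 and uses VanRaden's first method for marker-based kinship, which the abstract does not specify. Passing a subset of marker indices (standing in for the QTNs that SUPER would estimate) mimics the sBLUP idea, and averaging kinship within and between pre-assigned groups mimics the cBLUP idea of using groups instead of individuals. How the groups are formed is not shown; the grouping is assumed to be done already, for example by clustering on the kinship matrix. The group labels and the names `vanraden_kinship` and `group_kinship` are hypothetical.

```python
def vanraden_kinship(genotypes, marker_idx=None):
    """Kinship from an n x m genotype matrix coded 0/1/2 (VanRaden method 1).

    marker_idx restricts the calculation to a subset of marker columns,
    e.g. estimated QTNs as in the sBLUP idea; None uses all markers.
    """
    n = len(genotypes)
    cols = list(range(len(genotypes[0]))) if marker_idx is None else list(marker_idx)
    # Frequency of the allele counted by the 0/1/2 code, per selected marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in cols]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all selected markers are monomorphic")
    # Center each marker by twice its allele frequency.
    z = [[row[j] - 2 * p for j, p in zip(cols, freqs)] for row in genotypes]
    # K = Z Z' / (2 * sum p(1 - p))
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]


def group_kinship(kinship, groups):
    """Average individual kinship within and between groups (cBLUP-style compression).

    groups[i] is the group label of individual i; returns (labels, matrix),
    where matrix[a][b] is the mean kinship between members of labels[a] and labels[b].
    Averaging is one possible summary; it is an assumption here.
    """
    labels = sorted(set(groups))
    members = {g: [i for i, gi in enumerate(groups) if gi == g] for g in labels}
    matrix = []
    for ga in labels:
        row = []
        for gb in labels:
            vals = [kinship[i][k] for i in members[ga] for k in members[gb]]
            row.append(sum(vals) / len(vals))
        matrix.append(row)
    return labels, matrix
```

Both functions only change the relationship matrix passed to BLUP, so the mixed-model solve itself stays the same; a grouped matrix is smaller than the individual one, which is where compression saves computation.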


Subject(s)
Genome , Quantitative Trait Loci , Animals , Arabidopsis/genetics , Bayes Theorem , Mice , Oryza/genetics , Zea mays/genetics
20.
Plant J ; 86(5): 391-402, 2016 06.
Article in English | MEDLINE | ID: mdl-27012534

ABSTRACT

Flowering time is one of the major adaptive traits in maize domestication and an important selection criterion in breeding. To detect more maize flowering time variants, we evaluated flowering time traits using an extremely large multi-genetic-background population that contained more than 8000 lines grown in multiple environments in China and the United States. The population included two nested association mapping (NAM) panels and a natural association panel. Nearly 1 million single-nucleotide polymorphisms (SNPs) were used in the analyses. Through parallel linkage analysis of the two NAM panels, both common and unique flowering time regions were detected. Genome-wide, a total of 90 flowering time regions were identified. One-third of these regions were connected to traits associated with the environmental sensitivity of maize flowering time. The genome-wide association study of the three panels identified nearly 1000 flowering time-associated SNPs, mainly distributed around 220 candidate genes (within a distance of 1 Mb). Interestingly, two types of regions were significantly enriched for these associated SNPs: the candidate gene regions themselves, and regions approximately 5 kb away from the candidate genes. Moreover, the associated SNPs exhibited high accuracy for predicting flowering time.


Subject(s)
Genetic Variation , Genome-Wide Association Study , Zea mays/genetics , Breeding , Flowers/genetics , Flowers/physiology , Genetic Background , Genetic Linkage , Phenotype , Polymorphism, Single Nucleotide , Time Factors , Zea mays/physiology