Search | VHL CLAP/SMR-PAHO/WHO

1.

Graph pangenome captures missing heritability and empowers tomato breeding.

Zhou, Yao; Zhang, Zhiyang; Bao, Zhigui; Li, Hongbo; Lyu, Yaqing; Zan, Yanjun; Wu, Yaoyao; Cheng, Lin; Fang, Yuhan; Wu, Kun; Zhang, Jinzhe; Lyu, Hongjun; Lin, Tao; Gao, Qiang; Saha, Surya; Mueller, Lukas; Fei, Zhangjun; Städler, Thomas; Xu, Shizhong; Zhang, Zhiwu; Speed, Doug; Huang, Sanwen.

Nature ; 606(7914): 527-534, 2022 06.

Article in English | MEDLINE | ID: mdl-35676474

ABSTRACT

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.

Subjects

Genetic Variation; Genome, Plant; Genome-Wide Association Study; Plant Breeding; Solanum lycopersicum; Alleles; Crops, Agricultural/genetics; Genome, Plant/genetics; Linkage Disequilibrium; Solanum lycopersicum/genetics; Solanum lycopersicum/metabolism
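
Note on the 24% figure in the abstract above: it is the relative gain in average estimated heritability, not the absolute difference of 0.08. A quick check from the two averages reported in the abstract:

```latex
\frac{0.41 - 0.33}{0.33} \approx 0.242 \quad (\text{about } 24\%)
```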

2.

Genome-Wide Association Study for Maize Hybrid Performance in a Typical Breeder Population.

Dong, Yuan; Li, Guoliang; Zhang, Xinghua; Feng, Zhiqian; Li, Ting; Li, Zhoushuai; Xu, Shizhong; Xu, Shutu; Liu, Wenxin; Xue, Jiquan.

Int J Mol Sci ; 25(2)2024 Jan 18.

Article in English | MEDLINE | ID: mdl-38256265

ABSTRACT

Maize is one of the major crops that has demonstrated success in the utilization of heterosis. Developing high-yield hybrids is a crucial part of plant breeding to meet global food demand. In this study, we conducted a genome-wide association study (GWAS) for 10 agronomic traits using a typical breeder population comprising 442 single-cross hybrids by evaluating additive, dominance, and epistatic effects. A total of 49 significant single nucleotide polymorphisms (SNPs) and 69 significant pairs of epistasis were identified, explaining 26.2% to 64.3% of the phenotypic variation across the 10 traits. The enrichment of favorable genotypes is significantly correlated with the corresponding phenotype. In the confidence region of the associated site, 532 protein-coding genes were discovered. Among these genes, the Zm00001d044211 candidate gene was found to negatively regulate starch synthesis and potentially impact yield. This typical breeding population provided a valuable resource for dissecting the genetic architecture of yield-related traits. We proposed a novel mating strategy to increase the GWAS efficiency without utilizing more resources. Finally, we analyzed the enrichment of favorable alleles in the Shaan A and Shaan B groups, as well as in each inbred line. Our breeding practice led to consistent results. Not only does this study demonstrate the feasibility of GWAS in F1 hybrid populations, it also provides a valuable basis for further molecular biology and breeding research.

Subjects

Genome-Wide Association Study; Zea mays; Zea mays/genetics; Plant Breeding; Agriculture; Crops, Agricultural

3.

Estimating genetic variance contributed by a quantitative trait locus: A random model approach.

Wang, Shibo; Xie, Fangjie; Xu, Shizhong.

PLoS Comput Biol ; 18(3): e1009923, 2022 03.

Article in English | MEDLINE | ID: mdl-35275920

ABSTRACT

Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, there are issues associated with estimated QTL variances and such issues have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards due to estimation being associated with significance tests. The phenomenon is called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias is due to the fact that QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation failed to incorporate the errors of estimated QTL effects into the transformation. The consequence is biases in estimated QTL variances. To correct the biases, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population.

Subjects

Genome-Wide Association Study; Quantitative Trait Loci; Animals; Chromosome Mapping/methods; Mice; Models, Genetic; Monte Carlo Method; Quantitative Trait Loci/genetics
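
Illustration for the abstract above: the upward bias from squaring an estimated effect follows from the standard identity E[â²] = a² + Var(â). The sketch below is a small Monte Carlo check of that identity and of the simplest moment-style correction (subtracting the squared standard error). It is not the authors' code; the function name, the normal sampling model, and the parameter values are assumptions chosen for illustration only.

```python
import random
import statistics


def simulate_qtl_variance_bias(true_effect=0.5, std_error=0.3,
                               n_rep=20000, seed=1):
    """Compare naive and bias-adjusted estimates of the squared QTL effect.

    Hypothetical setup: each replicate draws an estimated effect from
    Normal(true_effect, std_error), with std_error treated as known.
    """
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(n_rep):
        est = rng.gauss(true_effect, std_error)
        naive.append(est ** 2)                       # squared estimate
        corrected.append(est ** 2 - std_error ** 2)  # subtract error variance
    return statistics.fmean(naive), statistics.fmean(corrected)


naive_mean, corrected_mean = simulate_qtl_variance_bias()
print(f"true a^2 = {0.5 ** 2:.3f}")
print(f"naive mean = {naive_mean:.3f}, corrected mean = {corrected_mean:.3f}")
```

With these assumed values the naive average should sit near a² + se² = 0.34 and the adjusted average near the true 0.25. Note that the adjusted estimate can be negative in a single replicate, which is one reason a random-effect (variance component) formulation may be preferred.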

4.

Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice.

Xu, Yang; Zhao, Yue; Wang, Xin; Ma, Ying; Li, Pengcheng; Yang, Zefeng; Zhang, Xuecai; Xu, Chenwu; Xu, Shizhong.

Plant Biotechnol J ; 19(2): 261-272, 2021 02.

Article in English | MEDLINE | ID: mdl-32738177

ABSTRACT

Hybrid breeding has been shown to effectively increase rice productivity. However, identifying desirable hybrids out of numerous potential combinations is a daunting challenge. Genomic selection holds great promise for accelerating hybrid breeding by enabling early selection before phenotypes are measured. With the recent advances in multi-omic technologies, hybrid prediction based on transcriptomic and metabolomic data has received increasing attention. However, the current omic-based hybrid prediction has ignored parental phenotypic information, which is of fundamental importance in plant breeding. In this study, we integrated parental phenotypic information into various multi-omic prediction models applied in hybrid breeding of rice and compared the predictabilities of 15 combinations from four sets of predictors from the parents, that is, genome, transcriptome, metabolome and phenome. The predictability for each combination was evaluated using the best linear unbiased prediction and a modified fast HAT method. We found significant interactions between predictors and traits in predictability, but joint prediction with various combinations of the predictors significantly improved predictability relative to prediction of any single source omic data for each trait investigated. Incorporation of parental phenotypic data into various omic predictors increased the predictability, on average by 13.6%, 54.5%, 19.9% and 8.3%, for grain yield, number of tillers per plant, number of grains per panicle and 1000 grain weight, respectively. Among nine models of incorporating parental traits, the AD-All model was the most effective one. This novel strategy of incorporating parental phenotypic data into multi-omic prediction is expected to improve hybrid breeding progress, especially with the development of high-throughput phenotyping technologies.

Subjects

Oryza; Hybridization, Genetic; Models, Genetic; Oryza/genetics; Phenotype; Plant Breeding

5.

Deshrinking ridge regression for genome-wide association studies.

Wang, Meiyue; Li, Ruidong; Xu, Shizhong.

Bioinformatics ; 36(14): 4154-4162, 2020 08 15.

Article in English | MEDLINE | ID: mdl-32379866

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) are still the primary steps toward gene discovery. The urgency is more obvious in the big data era when GWAS are conducted simultaneously for thousands of traits, e.g. transcriptomic and metabolomic traits. Efficient mixed model association (EMMA) and genome-wide efficient mixed model association (GEMMA) are the widely used methods for GWAS. An algorithm with high computational efficiency is badly needed. It is interesting to note that the test statistics of the ordinary ridge regression (ORR) have the same patterns across the genome as those obtained from the EMMA method. However, ORR has never been used for GWAS due to its severe shrinkage on the estimated effects and the test statistics. RESULTS: We introduce a degree of freedom for each marker effect obtained from ORR and use it to deshrink both the estimated effect and the standard error so that the Wald test of ORR is brought back to the same level as that of EMMA. The new method is called deshrinking ridge regression (DRR). By evaluating the methods under three different model sizes (small, medium and large), we demonstrate that DRR is more generalized for all model sizes than EMMA, which only works for medium and large models. Furthermore, DRR detects all markers in a simultaneous manner instead of scanning one marker at a time. As a result, the computational time complexity of DRR is much simpler than EMMA and about m (number of genetic variants) times simpler than that of GEMMA when the sample size is much smaller than the number of markers. CONTACT: shizhong.xu@ucr.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome-Wide Association Study; Polymorphism, Single Nucleotide; Algorithms; Phenotype; Sample Size
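
Illustration for the abstract above: the shrinkage that the abstract says makes ordinary ridge regression unsuitable for GWAS can be seen in a single-marker toy case, where the ridge estimate equals the least-squares estimate multiplied by a factor d = x'x / (x'x + λ). The sketch below only demonstrates that factor and that dividing by it recovers the unshrunken estimate. It is not the DRR algorithm: the abstract does not give the paper's per-marker degree-of-freedom formula, and the function name and data here are hypothetical.

```python
def ridge_shrinkage_demo(x, y, lam):
    """Single-marker toy example of ridge shrinkage (not the published DRR).

    x   : centered genotype codes for one marker
    y   : centered phenotypes
    lam : ridge penalty (assumed fixed and known here)
    """
    xx = sum(v * v for v in x)
    xy = sum(a * b for a, b in zip(x, y))
    ols = xy / xx             # unshrunken least-squares estimate
    ridge = xy / (xx + lam)   # ridge estimate, pulled toward zero
    d = xx / (xx + lam)       # shrinkage factor, between 0 and 1
    deshrunk = ridge / d      # dividing by d undoes the shrinkage
    return ols, ridge, d, deshrunk


genotypes = [-1, 0, 1, -1, 0, 1]
phenotypes = [-2.0, 0.0, 2.0, -2.0, 0.0, 2.0]
print(ridge_shrinkage_demo(genotypes, phenotypes, lam=4.0))
```

In a genome-wide model with many correlated markers the shrinkage is no longer a simple scalar of this form, which is presumably why the authors define a marker-specific degree of freedom.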

6.

Rapid epistatic mixed-model association studies by controlling multiple polygenic effects.

Wang, Dan; Tang, Hui; Liu, Jian-Feng; Xu, Shizhong; Zhang, Qin; Ning, Chao.

Bioinformatics ; 36(19): 4833-4837, 2020 12 08.

Article in English | MEDLINE | ID: mdl-32614415

ABSTRACT

SUMMARY: We have developed a rapid mixed model algorithm for exhaustive genome-wide epistatic association analysis by controlling multiple polygenic effects. Our model can simultaneously handle additive by additive epistasis, dominance by dominance epistasis and additive by dominance epistasis, and account for intrasubject fluctuations due to individuals with repeated records. Furthermore, we suggest a simple but efficient approximate algorithm, which allows the examination of all pairwise interactions remarkably fast, with computing time linear in population size. Simulation studies are performed to investigate the properties of REMMAX. Application to publicly available yeast and human data has shown that our mixed model-based method performs similarly to a simple linear model in computational efficiency. It took less than 40 h for the pairwise analysis of 5000 individuals genotyped with roughly 350 000 SNPs with five threads on Intel Xeon E5 2.6 GHz CPU. AVAILABILITY AND IMPLEMENTATION: Source codes are freely available at https://github.com/chaoning/GMAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Epistasis, Genetic; Multifactorial Inheritance; Algorithms; Genome-Wide Association Study; Humans; Multifactorial Inheritance/genetics; Software

7.

Accurate prediction of maize grain yield using its contributing genes for gene-based breeding.

Zhang, Meiping; Cui, Yanru; Liu, Yun-Hua; Xu, Wenwei; Sze, Sing-Hoi; Murray, Seth C; Xu, Shizhong; Zhang, Hong-Bin.

Genomics ; 112(1): 225-236, 2020 01.

Article in English | MEDLINE | ID: mdl-30826444

ABSTRACT

Accurately predicting the phenotypes of complex traits is crucial to enhanced breeding in plants and livestock, and to enhanced medicine in humans. Here we report the first study accurately predicting complex traits using their contributing genes, especially their number of favorable alleles (NFAs), genotypes and transcript expressions, with the grain yield of maize, Zea mays L. When the NFAs or genotypes of only 27 SNP/InDel-containing grain yield genes were used, a prediction accuracy of r = 0.52 or 0.49 was obtained. When the expressions of grain yield gene transcripts were used, a plateaued prediction accuracy of r = 0.84 was achieved. When the phenotypes predicted with two or three of the genic datasets were used for progeny selection, the selected lines were completely consistent with those selected by phenotypic selection. Therefore, the genes controlling complex traits enable accurately predicting their phenotypes, thus desirable for gene-based breeding in crop plants.

Subjects

Edible Grain/genetics; Genes, Plant; Plant Breeding/methods; Zea mays/genetics; Alleles; Gene Expression; Genotype; Multifactorial Inheritance; Phenotype

8.

Hybrid breeding of rice via genomic selection.

Cui, Yanru; Li, Ruidong; Li, Guangwei; Zhang, Fan; Zhu, Tiantian; Zhang, Qifa; Ali, Jauhar; Li, Zhikang; Xu, Shizhong.

Plant Biotechnol J ; 18(1): 57-67, 2020 01.

Article in English | MEDLINE | ID: mdl-31124256

ABSTRACT

Hybrid breeding is the main strategy for improving productivity in many crops, especially in rice and maize. Genomic hybrid breeding is a technology that uses whole-genome markers to predict future hybrids. Predicted superior hybrids are then field evaluated and released as new hybrid cultivars after their superior performances are confirmed. This will increase the opportunity of selecting true superior hybrids with minimum costs. Here, we used genomic best linear unbiased prediction to perform hybrid performance prediction using an existing rice population of 1495 hybrids. Replicated 10-fold cross-validations showed that the prediction abilities on ten agronomic traits ranged from 0.35 to 0.92. Using the 1495 rice hybrids as a training sample, we predicted six agronomic traits of 100 hybrids derived from half diallel crosses involving 21 parents that are different from the parents of the hybrids in the training sample. The prediction abilities were relatively high, varying from 0.54 (yield) to 0.92 (grain length). We concluded that the current population of 1495 hybrids can be used to predict hybrids from seemingly unrelated parents. Eventually, we used this training population to predict all potential hybrids of cytoplasm male sterile lines from 3000 rice varieties from the 3K Rice Genome Project. Using a breeding index combining 10 traits, we identified the top and bottom 200 predicted hybrids. SNP genotypes of the training population and parameters estimated from this training population are available for general uses and further validation in genomic hybrid prediction of all potential hybrids generated from all varieties of rice.

Subjects

Hybridization, Genetic; Oryza/genetics; Plant Breeding; Crops, Agricultural/genetics; Genome, Plant; Genomics; Models, Genetic; Polymorphism, Single Nucleotide

9.

A coordinate descent approach for sparse Bayesian learning in high dimensional QTL mapping and genome-wide association studies.

Wang, Meiyue; Xu, Shizhong.

Bioinformatics ; 35(21): 4327-4335, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31081037

ABSTRACT

MOTIVATION: Genomic scanning approaches that detect one locus at a time are subject to many problems in genome-wide association studies and quantitative trait locus mapping. The problems include large matrix inversion, over-conservativeness for tests after Bonferroni correction and difficulty in evaluation of the total genetic contribution to a trait's variance. Targeting these problems, we take a further step and investigate a multiple locus model that detects all markers simultaneously in a single model. RESULTS: We developed a sparse Bayesian learning (SBL) method for quantitative trait locus mapping and genome-wide association studies. This new method adopts a coordinate descent algorithm to estimate parameters (marker effects) by updating one parameter at a time conditional on current values of all other parameters. It uses an L2 type of penalty that allows the method to handle extremely large sample sizes (>100 000). Simulation studies show that SBL often has higher statistical powers and the simulated true loci are often detected with extremely small P-values, indicating that SBL is insensitive to stringent thresholds in significance testing. AVAILABILITY AND IMPLEMENTATION: An R package (sbl) is available on the comprehensive R archive network (CRAN) and https://github.com/MeiyueComputBio/sbl/tree/master/R%20packge. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome-Wide Association Study; Genomics; Bayes Theorem; Chromosome Mapping; Models, Genetic; Phenotype
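
Illustration for the abstract above: "updating one parameter at a time conditional on current values of all other parameters" is the defining step of coordinate descent. The sketch below applies it to plain ridge regression with a single fixed L2 penalty, which is an assumption made for brevity; the published SBL method presumably does more than this (the abstract does not give the details), and the authors' implementation is the `sbl` R package. The function name and toy data are hypothetical.

```python
def ridge_coordinate_descent(X, y, lam=1.0, n_iter=200):
    """Coordinate descent for ridge regression (a sketch, not the sbl package).

    X   : list of rows, one per individual; columns are markers
    y   : list of phenotypes
    lam : fixed L2 penalty shared by all markers (an assumption)
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            # Add marker j's current contribution back into the residual.
            partial = [resid[i] + X[i][j] * beta[j] for i in range(n)]
            # Closed-form update of beta_j given all other effects.
            new_bj = sum(X[i][j] * partial[i] for i in range(n)) / (col_ss[j] + lam)
            resid = [partial[i] - X[i][j] * new_bj for i in range(n)]
            beta[j] = new_bj
    return beta


X_toy = [[1, 0], [1, 1], [0, 1], [1, 1]]
y_toy = [1.0, 2.0, 3.0, 4.0]
print(ridge_coordinate_descent(X_toy, y_toy, lam=1.0))
```

Each update needs only one marker column and the current residual, so no large matrix is ever inverted, which is the first of the problems with one-locus scans that the abstract lists. For the toy data the result should agree with the closed-form ridge solution (X'X + λI)⁻¹X'y.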

10.

Efficient multivariate analysis algorithms for longitudinal genome-wide association studies.

Ning, Chao; Wang, Dan; Zhou, Lei; Wei, Julong; Liu, Yuanxin; Kang, Huimin; Zhang, Shengli; Zhou, Xiang; Xu, Shizhong; Liu, Jian-Feng.

Bioinformatics ; 35(23): 4879-4885, 2019 12 01.

Article in English | MEDLINE | ID: mdl-31070732

ABSTRACT

MOTIVATION: Current dynamic phenotyping systems introduce time as an extra dimension to genome-wide association studies (GWAS), which helps to explore the mechanism of dynamical genetic control for complex longitudinal traits. However, existing methods for longitudinal GWAS either ignore the covariance among observations of different time points or encounter computational efficiency issues. RESULTS: We herein developed efficient genome-wide multivariate association algorithms for longitudinal data. In contrast to existing univariate linear mixed model analyses, the proposed method has improved statistical power for association detection and computational speed. In addition, the new method can analyze unbalanced longitudinal data with thousands of individuals and more than ten thousand records within a few hours. The corresponding time for balanced longitudinal data is just a few minutes. AVAILABILITY AND IMPLEMENTATION: A software package to implement the efficient algorithm named GMA (https://github.com/chaoning/GMA) is available freely for interested users in relevant fields. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Genome-Wide Association Study; Genome; Humans; Multivariate Analysis; Software

11.

GWASpro: a high-performance genome-wide association analysis server.

Kim, Bongsong; Dai, Xinbin; Zhang, Wenchao; Zhuang, Zhaohong; Sanchez, Darlene L; Lübberstedt, Thomas; Kang, Yun; Udvardi, Michael K; Beavis, William D; Xu, Shizhong; Zhao, Patrick X.

Bioinformatics ; 35(14): 2512-2514, 2019 07 15.

Article in English | MEDLINE | ID: mdl-30508039

ABSTRACT

SUMMARY: We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. AVAILABILITY AND IMPLEMENTATION: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome-Wide Association Study; Software; Computers

12.

Genome-wide association studies using binned genotypes.

An, Bingxing; Gao, Xue; Chang, Tianpeng; Xia, Jiangwei; Wang, Xiaoqiao; Miao, Jian; Xu, Lingyang; Zhang, Lupei; Chen, Yan; Li, Junya; Xu, Shizhong; Gao, Huijiang.

Heredity (Edinb) ; 124(2): 288-298, 2020 02.

Article in English | MEDLINE | ID: mdl-31641238

ABSTRACT

Linear mixed models (LMM) that test trait association one marker at a time have been the most popular methods for genome-wide association studies. However, this approach has potential pitfalls: over-conservativeness after Bonferroni correction, neglect of linkage disequilibrium (LD) between neighboring markers, and power reduction due to overfitting SNP effects. So, multiple locus models that can simultaneously estimate and test all markers in the genome are more appropriate. Based on the multiple locus models, we proposed a bin model that combines markers into bins based on their LD relationships. A bin is treated as a new synthetic marker and we detect the associations between bins and traits. Since the number of bins can be substantially smaller than the number of markers, a penalized multiple regression method can be adopted by fitting all bins to a single model. We developed an innovative method to bin the neighboring markers and used the least absolute shrinkage and selection operator (LASSO) method. We compared BIN-Lasso with SNP-Lasso and Q + K-LMM in a simulation experiment, and showed that the new method is more powerful with less Type I error than the other two methods. We also applied the bin model to a Chinese Simmental beef cattle population for bone weight association study. The new method identified more significant associations than the classical LMM. The bin model is a new dimension reduction technique that takes advantage of biological information (i.e., LD). The new method will be a significant breakthrough in associative genomics in the big data era.

Subjects

Cattle/genetics; Genetic Association Studies/veterinary; Genomics/methods; Models, Genetic; Animals; Computer Simulation; Genotype; Linear Models; Linkage Disequilibrium; Polymorphism, Single Nucleotide
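
Illustration for the abstract above: the general idea of combining neighboring markers into bins according to LD can be sketched with a simple greedy rule that starts a new bin whenever the squared correlation (r²) between adjacent markers drops below a threshold. This rule, the 0.8 threshold, and the function names are assumptions for illustration; the abstract does not describe the authors' actual binning procedure, which may differ substantially.

```python
def pearson_r(a, b):
    """Pearson correlation of two equal-length genotype vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic marker: treat as uncorrelated
    return cov / (var_a * var_b) ** 0.5


def bin_markers_by_ld(markers, r2_threshold=0.8):
    """Greedy adjacent-marker binning (hypothetical rule, not the paper's).

    markers : list of genotype vectors (0/1/2 codes), in map order
    returns : list of bins, each a list of marker indices
    """
    if not markers:
        return []
    bins = [[0]]
    for j in range(1, len(markers)):
        r = pearson_r(markers[j - 1], markers[j])
        if r * r >= r2_threshold:
            bins[-1].append(j)   # high LD with previous marker: same bin
        else:
            bins.append([j])     # LD breaks down: start a new bin
    return bins


toy_markers = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 2, 0],
    [2, 0, 1, 1, 2, 0],
]
print(bin_markers_by_ld(toy_markers))
```

Each resulting bin would then be summarized as one synthetic predictor (for example, the mean genotype code of its markers, again an assumption) before being passed to a penalized regression such as LASSO.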

13.

A multi-parent advanced generation inter-cross (MAGIC) population for genetic analysis and improvement of cowpea (Vigna unguiculata L. Walp.).

Huynh, Bao-Lam; Ehlers, Jeffrey D; Huang, Bevan Emma; Muñoz-Amatriaín, María; Lonardi, Stefano; Santos, Jansen R P; Ndeve, Arsenio; Batieno, Benoit J; Boukar, Ousmane; Cisse, Ndiaga; Drabo, Issa; Fatokun, Christian; Kusi, Francis; Agyare, Richard Y; Guo, Yi-Ning; Herniter, Ira; Lo, Sassoum; Wanamaker, Steve I; Xu, Shizhong; Close, Timothy J; Roberts, Philip A.

Plant J ; 93(6): 1129-1142, 2018 03.

Article in English | MEDLINE | ID: mdl-29356213

ABSTRACT

Multi-parent advanced generation inter-cross (MAGIC) populations are an emerging type of resource for dissecting the genetic structure of traits and improving breeding populations. We developed a MAGIC population for cowpea (Vigna unguiculata L. Walp.) from eight founder parents. These founders were genetically diverse and carried many abiotic and biotic stress resistance, seed quality and agronomic traits relevant to cowpea improvement in the United States and sub-Saharan Africa, where cowpea is vitally important in the human diet and local economies. The eight parents were inter-crossed using structured matings to ensure that the population would have balanced representation from each parent, followed by single-seed descent, resulting in 305 F8 recombinant inbred lines each carrying a mosaic of genome blocks contributed by all founders. This was confirmed by single nucleotide polymorphism genotyping with the Illumina Cowpea Consortium Array. These lines were on average 99.74% homozygous but also diverse in agronomic traits across environments. Quantitative trait loci (QTLs) were identified for several parental traits. Loci with major effects on photoperiod sensitivity and seed size were also verified by biparental genetic mapping. The recombination events were concentrated in telomeric regions. Due to its broad genetic base, this cowpea MAGIC population promises breakthroughs in genetic gain, QTL and gene discovery, enhancement of breeding populations and, for some lines, direct releases as new varieties.

Subjects

Genes, Plant/genetics; Plant Breeding/methods; Quantitative Trait Loci/genetics; Vigna/genetics; Chromosome Mapping; Chromosomes, Plant/genetics; Crosses, Genetic; Genetics, Population; Genome, Plant/genetics; Genotype; Phylogeny; Polymorphism, Single Nucleotide; Seeds/genetics; Species Specificity; Vigna/classification

14.

A directed learning strategy integrating multiple omic data improves genomic prediction.

Hu, Xuehai; Xie, Weibo; Wu, Chengchao; Xu, Shizhong.

Plant Biotechnol J ; 17(10): 2011-2020, 2019 10.

Article in English | MEDLINE | ID: mdl-30950198

ABSTRACT

Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome-wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic predictions ignored genomic information. Here, we designed a novel strategy of GP called multilayered least absolute shrinkage and selection operator (MLLASSO) by integrating multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by observed transcriptome and metabolome. Significantly, MLLASSO learns higher order information of gene interactions, which enables us to achieve a significant improvement of predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs) as their expressions were accurately predicted with genetic markers. Interestingly, we made three dramatic discoveries for the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait-related transcriptional factor families are enriched in GPGs. These findings support the notion that learned GFs not only are good predictors for traits but also have specific biological implications regarding regulation of gene expressions. To differentiate the new method from conventional GP models, we called MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.

Subjects

Genomics/methods; Oryza/genetics; Supervised Machine Learning; Genetic Markers; Metabolome; Models, Genetic; Models, Statistical; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Transcriptome

15.

A rapid epistatic mixed-model association analysis by linear retransformations of genomic estimated values.

Ning, Chao; Wang, Dan; Kang, Huimin; Mrode, Raphael; Zhou, Lei; Xu, Shizhong; Liu, Jian-Feng.

Bioinformatics ; 34(11): 1817-1825, 2018 06 01.

Article in English | MEDLINE | ID: mdl-29342229

ABSTRACT

Motivation: Epistasis provides a feasible way for probing potential genetic mechanism of complex traits. However, time-consuming computation challenges successful detection of interaction in practice, especially when linear mixed model (LMM) is used to control type I error in the presence of population structure and cryptic relatedness. Results: A rapid epistatic mixed-model association analysis (REMMA) method was developed to overcome computational limitation. This method first estimates individuals' epistatic effects by an extended genomic best linear unbiased prediction (EG-BLUP) model with additive and epistatic kinship matrix, then pairwise interaction effects are obtained by linear retransformations of individuals' epistatic effects. Simulation studies showed that REMMA could control type I error and increase statistical power in detecting epistatic QTNs in comparison with existing LMM-based FaST-LMM. We applied REMMA to two real datasets, a mouse dataset and the Wellcome Trust Case Control Consortium (WTCCC) data. Application to the mouse data further confirmed the performance of REMMA in controlling type I error. For the WTCCC data, we found most epistatic QTNs for type 1 diabetes (T1D) located in a major histocompatibility complex (MHC) region, from which a large interacting network with 12 hub genes (interacting with ten or more genes) was established. Availability and implementation: Our REMMA method can be freely accessed at https://github.com/chaoning/REMMA. Contact: liujf@cau.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Epistasis, Genetic; Genome-Wide Association Study/methods; Models, Genetic; Polymorphism, Single Nucleotide; Animals; Genomics/methods; Humans; Mice

16.

JRmGRN: joint reconstruction of multiple gene regulatory networks with common hub genes using data from multiple tissues or conditions.

Deng, Wenping; Zhang, Kui; Liu, Sanzhen; Zhao, Patrick X; Xu, Shizhong; Wei, Hairong.

Bioinformatics ; 34(20): 3470-3478, 2018 10 15.

Article in English | MEDLINE | ID: mdl-29718177

ABSTRACT

Motivation: Joint reconstruction of multiple gene regulatory networks (GRNs) using gene expression data from multiple tissues/conditions is very important for understanding common and tissue/condition-specific regulation. However, there are currently no computational models and methods available for directly constructing such multiple GRNs that not only share some common hub genes but also possess tissue/condition-specific regulatory edges. Results: In this paper, we proposed a new graphic Gaussian model for joint reconstruction of multiple gene regulatory networks (JRmGRN), which highlighted hub genes, using gene expression data from several tissues/conditions. Under the framework of Gaussian graphical model, JRmGRN method constructs the GRNs through maximizing a penalized log likelihood function. We formulated it as a convex optimization problem, and then solved it with an alternating direction method of multipliers (ADMM) algorithm. The performance of JRmGRN was first evaluated with synthetic data and the results showed that JRmGRN outperformed several other methods for reconstruction of GRNs. We also applied our method to real Arabidopsis thaliana RNA-seq data from two light regime conditions in comparison with other methods, and both common hub genes and some conditions-specific hub genes were identified with higher accuracy and precision. Availability and implementation: JRmGRN is available as a R program from: https://github.com/wenpingd. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Gene Regulatory Networks; Algorithms; Likelihood Functions; Normal Distribution; Software

17.

A genome-wide association and meta-analysis reveal regions associated with seed size in cowpea [Vigna unguiculata (L.) Walp].

Lo, Sassoum; Muñoz-Amatriaín, María; Hokin, Samuel A; Cisse, Ndiaga; Roberts, Philip A; Farmer, Andrew D; Xu, Shizhong; Close, Timothy J.

Theor Appl Genet ; 132(11): 3079-3087, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31367839

ABSTRACT

KEY MESSAGE: This paper combined GWAS, meta-analysis and sequence homology comparison with common bean to identify regions associated with seed size variation in domesticated cowpea. Seed size is an important trait for yield and commercial value in dry-grain cowpea. Seed size varies widely among different cowpea accessions, and the genetic basis of such variation is not yet well understood. To better decipher the genetic basis of seed size, a genome-wide association study (GWAS) and meta-analysis were conducted on a panel of 368 diverse cowpea accessions from 51 countries. Four traits, including seed weight, length, width and density, were evaluated across three locations. Using 51,128 single nucleotide polymorphisms covering the cowpea genome, 17 loci were identified for these traits. One locus was common to weight, width and length, suggesting pleiotropy. By integrating synteny-based analysis with common bean, six candidate genes (Vigun05g036000, Vigun05g039600, Vigun05g204200, Vigun08g217000, Vigun11g187000, and Vigun11g191300), which are implicated in multiple functional categories related to seed size such as endosperm development, embryo development, and cell elongation, were identified. These results suggest that a combination of GWAS meta-analysis with synteny comparison in a related plant is an efficient approach to identify candidate gene(s) for complex traits in cowpea. The identified loci and candidate genes provide useful information for improving cowpea varieties and for molecular investigation of seed size.

Subjects

Seeds/physiology , Vigna/genetics , Chromosome Mapping , Genes, Plant , Genetic Association Studies , Genotype , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Vigna/physiology

18.

Statistical power in genome-wide association studies and quantitative trait locus mapping.

Wang, Meiyue; Xu, Shizhong.

Heredity (Edinb) ; 123(3): 287-306, 2019 09.

Article in English | MEDLINE | ID: mdl-30858595

ABSTRACT

Power calculation prior to a genetic experiment can help investigators choose the optimal sample size to detect a quantitative trait locus (QTL). Without the guidance of power analysis, an experiment may be underpowered or overpowered. Either way, resources are wasted. QTL mapping and genome-wide association studies (GWAS) are often conducted using a linear mixed model (LMM) that controls for population structure and polygenic background using markers of the whole genome. Power analysis for such a mixed model is often conducted via Monte Carlo simulations. In this study, we derived a non-centrality parameter for the Wald test statistic for association, which allows analytical power analysis. We show that large samples are not necessary to detect a biologically meaningful QTL, say one explaining 5% of the phenotypic variance. Several R functions are provided so that users can perform power analysis to determine the minimum sample size required to detect a given QTL with a certain statistical power, or calculate the statistical power for a given sample size and known values of other population parameters.
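
As a rough illustration of how a non-centrality parameter turns into an analytical power value, the sketch below uses a simplified setting that is an assumption here, not the expression derived in the paper: a single-marker Wald test with 1 degree of freedom in an ordinary fixed-effect regression, with no polygenic or population-structure terms, where the non-centrality parameter is approximated as n * h2 / (1 - h2) for a QTL explaining a proportion h2 of the phenotypic variance. The function names `wald_power` and `min_sample_size` are hypothetical, and the paper's own R functions are not reproduced.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def wald_power(n, h2_qtl, alpha=0.05):
    """Approximate power of a 1-df Wald (chi-square) test.

    Assumption: simple fixed-effect regression, so the non-centrality
    parameter is n * h2_qtl / (1 - h2_qtl). This is NOT the mixed-model
    expression from the paper.
    """
    if not 0.0 <= h2_qtl < 1.0:
        raise ValueError("h2_qtl must be in [0, 1)")
    ncp = n * h2_qtl / (1.0 - h2_qtl)
    # Square root of the central chi-square(1) critical value.
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # A non-central chi-square with 1 df is (Z + shift)^2 with Z ~ N(0, 1),
    # so the rejection probability is the sum of two normal tail areas.
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def min_sample_size(h2_qtl, target_power=0.8, alpha=0.05, n_max=1_000_000):
    """Smallest n whose approximate power reaches target_power."""
    for n in range(2, n_max + 1):
        if wald_power(n, h2_qtl, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    print(wald_power(n=200, h2_qtl=0.05))
    print(min_sample_size(h2_qtl=0.05))
```

For a genome-wide scan, `alpha` would be replaced by a much stricter per-marker threshold, which raises the required sample size under the same assumptions.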

Subjects

Genome-Wide Association Study/statistics & numerical data , Genome , Models, Statistical , Quantitative Trait Loci , Quantitative Trait, Heritable , Genetic Markers , Genotype , Humans , Monte Carlo Method , Oryza/genetics , Phenotype , Sample Size

19.

Statistics of Mendelian segregation-A mixture model.

Wang, Meiyue; Xu, Shizhong.

J Anim Breed Genet ; 136(5): 341-350, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31038229

ABSTRACT

Mendel's law of segregation explains why genetic variation can be maintained over time. In diploid organisms, an offspring receives one allele from each parent, not just half of the blended genetic material of the parents. Which of the two alleles is received is purely random. This stochastic process generates genetic variation among members of the same family, called Mendelian segregation variance or within-family variance. In statistics, the genetic value of a quantitative trait for an offspring follows a mixture distribution consisting of the four alleles of the two parents, guided by a Mendelian variable from each parent. The mixture model allows us to partition the total genetic variance into between-family and within-family variances. In the absence of inbreeding, the genetic variance splits half to the between-family variance and half to the within-family variance. With inbreeding, however, the between-family variance is increased at the cost of the within-family variance, leading to a net increase of the total genetic variance. This study defines multiple Mendelian variables and develops a mixture model of quantitative genetics. The phenomenon that allelic variance is maintained over time is guided by "the law of conservation of allelic variance" in biology, which is comparable to "the law of conservation of mass" in physics.
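
The half-and-half split of genetic variance in the absence of inbreeding can be checked with a small Monte Carlo sketch. The setup is an illustrative assumption rather than the paper's mixture model: parents are unrelated and non-inbred, each carries two alleles with independent normally distributed effects, and an offspring's genetic value is the sum of one randomly chosen allele from each parent. The function name `simulate_variance_partition` and its parameters are hypothetical.

```python
import random
from statistics import fmean, pvariance, variance


def simulate_variance_partition(n_families=2000, n_offspring=20,
                                allele_sd=1.0, seed=1):
    """Monte Carlo split of genetic variance into between- and within-family parts.

    Assumptions (illustrative only): unrelated, non-inbred parents; allele
    effects are independent N(0, allele_sd^2); genetic value is additive.
    """
    rng = random.Random(seed)
    family_means = []
    within_vars = []
    for _ in range(n_families):
        sire = (rng.gauss(0.0, allele_sd), rng.gauss(0.0, allele_sd))
        dam = (rng.gauss(0.0, allele_sd), rng.gauss(0.0, allele_sd))
        # Mendelian variables: which of its two alleles each parent transmits.
        offspring = [rng.choice(sire) + rng.choice(dam)
                     for _ in range(n_offspring)]
        # Expected offspring value given the parents (the family mean).
        family_means.append(fmean(sire) + fmean(dam))
        # Sample variance among full sibs (Mendelian segregation variance).
        within_vars.append(variance(offspring))
    between = pvariance(family_means)
    within = fmean(within_vars)
    return between, within


if __name__ == "__main__":
    between, within = simulate_variance_partition()
    print("between-family:", between)
    print("within-family:", within)
    print("within share:", within / (between + within))
```

Under these assumptions both components should be close to the allelic variance, so the within-family share should be near one half; the inbreeding case described in the abstract is not simulated here.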

Subjects

Models, Genetic , Models, Statistical , Animals , Biological Evolution , Diploidy , Humans , Inbreeding

20.

An alternative derivation of Harville's restricted log likelihood function for variance component estimation.

Xu, Shizhong.

Biom J ; 61(1): 157-161, 2019 01.

Article in English | MEDLINE | ID: mdl-30387166

ABSTRACT

Estimation of variance components in linear mixed models is important in clinical trials and longitudinal data analysis. It is also important in animal and plant breeding for accurately partitioning total phenotypic variance into genetic and environmental variances. The restricted maximum likelihood (REML) method is often preferred over the maximum likelihood (ML) method for variance component estimation because REML takes into account the degrees of freedom lost in estimating the fixed effects. The original restricted likelihood function involves a linear transformation of the original response variable (a collection of error contrasts). Harville's final form of the restricted likelihood function does not involve the transformation and thus is much easier to manipulate than the original restricted likelihood function. There are several different ways to show that the two forms of the restricted likelihood are equivalent. In this study, I present a much simpler way to derive Harville's restricted likelihood function. I first treat the fixed effects as random effects and call such a mixed model a pseudo random model (PDRM). I then construct a likelihood function for the PDRM. Finally, I let the variance of the pseudo random effects be infinity and show that the limit of the likelihood function of the PDRM is the restricted likelihood function.
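
The limiting argument can be checked numerically in the simplest special case, sketched below under stated assumptions: an intercept-only model y_i = mu + e_i with e_i ~ N(0, s2), so that X is a column of ones and V = s2 * I. The closed forms come from the matrix determinant lemma and the Sherman-Morrison formula, not from the paper. Additive constants are dropped, and the sketch adds back a 0.5 * ln(k) term (which does not depend on s2) before taking k large; how the paper treats that term is not stated in the abstract. The function names and the data values are hypothetical.

```python
import math


def restricted_loglik(y, s2):
    """Harville-form restricted log likelihood, constants dropped.

    Intercept-only model: ln|V| = n ln(s2), X'V^-1 X = n / s2, and the
    quadratic form is the residual sum of squares divided by s2.
    """
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((v - y_bar) ** 2 for v in y)
    return -0.5 * (n * math.log(s2) + math.log(n / s2) + rss / s2)


def pseudo_random_loglik(y, s2, k):
    """Log likelihood (constants dropped) when mu is treated as random
    with variance k, so that y ~ N(0, s2 * I + k * 11')."""
    n = len(y)
    sum_y = sum(y)
    sum_y2 = sum(v * v for v in y)
    log_det = n * math.log(s2) + math.log1p(k * n / s2)
    quad = (sum_y2 - k * sum_y ** 2 / (s2 + k * n)) / s2
    return -0.5 * (log_det + quad)


def gap(y, s2, k):
    """Rescaled pseudo-random log likelihood minus the restricted one."""
    return (pseudo_random_loglik(y, s2, k) + 0.5 * math.log(k)
            - restricted_loglik(y, s2))


if __name__ == "__main__":
    y = [4.1, 5.3, 3.8, 6.0, 5.1]  # made-up data for illustration
    for k in (1e1, 1e3, 1e6, 1e9):
        print(k, gap(y, s2=1.5, k=k))
```

The printed gap should shrink toward zero as k grows, which is the behavior the abstract describes for the general model.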

Subjects

Statistics as Topic/methods , Analysis of Variance , Likelihood Functions
