1.
Plant Methods ; 20(1): 121, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39127715

ABSTRACT

BACKGROUND: Structural genomic variants (SVs) are prevalent in plant genomes and have played an important role in evolution and domestication, as they constitute a significant source of genomic and phenotypic variability. Nevertheless, most quantitative genetics methods for crop improvement, such as genomic prediction, consider only single nucleotide polymorphisms (SNPs). Deep learning (DL) is a promising strategy for genomic prediction, but its performance using SVs and SNPs as genetic markers remains unknown. RESULTS: We used rice to investigate whether combining SVs and SNPs improves trait prediction over SNPs alone, and to examine the potential advantage of DL networks over Bayesian linear models. Specifically, the performances of BayesC (considering additive effects) and a Bayesian reproducing kernel Hilbert space (RKHS) regression (considering both additive and non-additive effects) were compared with those of two DL architectures, the multilayer perceptron and the convolutional neural network, to explore their prediction ability under various marker input strategies. We found that exploiting structural and nucleotide variation slightly improved prediction ability on complex traits in 87% of the cases. DL models outperformed Bayesian models in 75% of the studied cases, considering the four traits and the two validation strategies used. Finally, DL systematically improved prediction ability for binary traits relative to the Bayesian models. CONCLUSIONS: Our study reveals that the use of structural genomic variants can improve trait prediction in rice, independently of the methodology used. Also, our results suggest that DL networks can perform better than Bayesian models in the prediction of binary traits, and of quantitative traits when the training and target sets are not closely related. This highlights the potential of DL to enhance crop improvement in specific scenarios and the importance of considering SVs in addition to SNPs in genomic selection.

2.
Front Plant Sci ; 15: 1393965, 2024.
Article in English | MEDLINE | ID: mdl-39139722

ABSTRACT

Introduction: Predicting the performance (yield or other integrative traits) of cultivated plants is complex because it involves estimating not only the genetic value of the selection candidates and the interactions between genotype and environment (GxE), but also the epistatic interactions between genomic regions for a given trait and the interactions between the traits contributing to the integrative trait. Classical Genomic Prediction (GP) models mostly account for additive effects and are not suitable for estimating non-additive effects such as epistasis. Therefore, the use of machine learning and deep learning methods has previously been proposed to model those non-linear effects. Methods: In this study, we propose a type of Artificial Neural Network (ANN) called a Convolutional Neural Network (CNN) and compare it with two classical GP regression methods for their ability to predict an integrative trait of sorghum: aboveground fresh weight accumulation. We also suggest that the use of a crop growth model (CGM) can enhance predictions of integrative traits by decomposing them into more heritable intermediate traits. Results: The results show that the CNN outperformed both the LASSO and Bayes C methods in accuracy, suggesting that CNNs are better suited to predicting integrative traits. Furthermore, the predictive ability of the combined CGM-GP approach surpassed that of GP without CGM integration, irrespective of the regression method used. Discussion: These results are consistent with recent work aiming to develop Genome-to-Phenotype models, and they support the use of non-linear prediction methods and of combined CGM-GP to enhance the prediction of crop performance.

3.
Front Plant Sci ; 15: 1386837, 2024.
Article in English | MEDLINE | ID: mdl-39139728

ABSTRACT

Cultivated potato, Solanum tuberosum L., is considered an autotetraploid with 12 chromosomes, each with four homologous phases. However, recent evidence indicates that, due to frequent large phase deletions, gene ploidy is not constant across the genome. The elite cultivar "Otava" was found to have an average gene copy number of 3.2 across all loci. Breeding programs for elite potato cultivars rely increasingly on genomic prediction tools for selection and for elucidating the quantitative trait loci underpinning trait genetic variance. These tools are typically based on anonymous single nucleotide polymorphism (SNP) markers, which are usually called from SNP array or sequencing data, for example, using a tetraploid model. In this study, we analyzed the impact of using whole-genome markers genotyped either as tetraploid or as observed allele frequencies from genotyping-by-sequencing data on single-trait additive genomic best linear unbiased prediction (GBLUP) genomic prediction (GP) models and on single-marker regression genome-wide association studies of potato, to evaluate the implications of capturing varying ploidy for the statistical models employed in genomic breeding. A panel of 762 offspring of a diallel cross of 18 parents of elite breeding material was used for modeling. These were genotyped by sequencing and phenotyped for five key performance traits: chipping quality, length/width ratio, senescence, dry matter content, and yield. We also estimated, from simulated data, the read coverage required to confidently discriminate between a heterozygous triploid and a tetraploid state. Using a tetraploid model neither impaired nor improved genomic predictions compared with using the observed allele frequencies that account for true marker ploidy. In genome-wide association studies (GWAS), very minor variations in both signal amplitude and the number of SNPs supporting minor and major quantitative trait loci (QTLs) were observed between the two data sets. However, all major QTLs were reproducible using both data sets.

4.
Front Plant Sci ; 15: 1400000, 2024.
Article in English | MEDLINE | ID: mdl-39109055

ABSTRACT

Sugarcane is a crucial crop for sugar and bioenergy production. Saccharose content and total weight are the two key commercial traits that make up sugarcane yield. These traits are under complex genetic control, and their response patterns are influenced by genotype-by-environment (G×E) interaction. Efficient sugarcane breeding demands an accurate assessment of genotype stability through multi-environment trials (METs), in which genotypes are evaluated across different environments. However, phenotyping all genotype-in-environment combinations is often impractical due to cost and the limited availability of propagation material. This study introduces sparse testing designs as a viable alternative, leveraging genomic information to predict unobserved combinations through genomic prediction models. This approach was applied to a dataset comprising 186 genotypes across six environments (6×186=1,116 phenotypes). Our study employed three predictive models, including environment, genotype, and genomic markers as main effects, as well as G×E, to predict saccharose accumulation (SA) and tons of cane per hectare (TCH). Calibration sets ranging in size from 72 (6.5%) to 186 (16.7%) of the total number of phenotypes were composed to predict the remaining 930 (83.3%). Additionally, we explored the optimal number of common genotypes across environments for G×E pattern prediction. Results demonstrate that maximum accuracy for SA (ρ = 0.611) and for TCH (ρ = 0.341) was achieved with training sets that had few (3) to no (0) genotypes in common across environments, maximizing the number of different genotypes tested only once. Significantly, we show that reducing the phenotypic records used for model calibration has minimal impact on predictive ability, with sets of 12 non-overlapping genotypes per environment (72=12×6) being the most convenient cost-benefit combination.
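
The set sizes and percentages quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sketch: the function name `calibration_share` is illustrative, and the figure of 31 genotypes per environment is an assumption (186 / 6 with equal, non-overlapping allocation), not something the abstract states.

```python
# Worked check of the sparse-testing arithmetic quoted in the abstract.
N_GENOTYPES = 186
N_ENVIRONMENTS = 6
TOTAL = N_GENOTYPES * N_ENVIRONMENTS  # every genotype-in-environment combination


def calibration_share(per_env):
    """Return (calibration set size, share of all phenotypes).

    Assumes the same number of non-overlapping genotypes is
    phenotyped in each environment (an illustrative assumption).
    """
    size = per_env * N_ENVIRONMENTS
    return size, size / TOTAL


small_size, small_share = calibration_share(12)  # 12 genotypes per environment
large_size, large_share = calibration_share(31)  # assumed 31 per environment
remaining = TOTAL - large_size                   # combinations left to predict

print(TOTAL, small_size, round(100 * small_share, 1))
print(large_size, round(100 * large_share, 1), remaining)
```

Under these assumptions the smallest design phenotypes 72 of the 1,116 combinations and the largest phenotypes 186, leaving 930 to be predicted from genomic information.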

5.
G3 (Bethesda) ; 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39129203

ABSTRACT

Striga hermonthica (Del.) Benth., a parasitic weed, causes substantial yield losses in maize production in sub-Saharan Africa (SSA). Breeding for Striga resistance in maize is constrained by limited genetic diversity for Striga resistance within the elite germplasm and by limited phenotyping capacity under artificial Striga infestation. Genomics-enabled approaches have the potential to accelerate the identification of Striga-resistant lines for hybrid development. The objectives of this study were to evaluate the accuracy of genomic selection for traits associated with Striga resistance and grain yield (GY) and to predict genetic values of tested and untested doubled haploid (DH) maize lines. We genotyped 606 DH lines with 8,439 rAmpSeq markers. A training set of 116 DH lines crossed to two testers was phenotyped under artificial Striga infestation at three locations in Kenya. Heritability for Striga resistance parameters ranged from 0.38 to 0.65, while that for GY was 0.54. The prediction accuracies for Striga resistance-associated traits across locations, as determined by cross-validation (CV), ranged from 0.24 to 0.53 for CV0 and from 0.20 to 0.37 for CV2. For GY, the prediction accuracies were 0.59 and 0.56 for CV0 and CV2, respectively. The results revealed 300 DH lines with desirable genomic estimated breeding values (GEBVs) for reduced number of emerged Striga plants (STR) at 8, 10, and 12 weeks after planting. The GEBVs of DH lines for Striga resistance-associated traits in the training and testing sets were similar in magnitude. These results highlight the potential application of genomic selection in breeding for Striga resistance in maize. The integration of genomic-assisted strategies and DH technology for line development, coupled with forward breeding for major adaptive traits, will enhance genetic gains in breeding for Striga resistance in maize.

6.
Front Genet ; 15: 1415249, 2024.
Article in English | MEDLINE | ID: mdl-38948357

ABSTRACT

In modern breeding practice, genomic prediction (GP) uses high-density single nucleotide polymorphism (SNP) markers to predict genomic estimated breeding values (GEBVs) for crucial phenotypes, thereby speeding up the selection process and shortening generation intervals. However, because genotype data typically contain far fewer samples than SNP markers, overfitting commonly arises during model training. To address this, the present study builds upon the Least Squares Twin Support Vector Regression (LSTSVR) model by incorporating a Lasso regularization term, yielding a model named ILSTSVR. Because parameter tuning is complex and differs between datasets, the subtraction-average-based optimizer (SABO) is further introduced to optimize ILSTSVR, giving the GP model named SABO-ILSTSVR. Experiments conducted on four different crop datasets demonstrate that SABO-ILSTSVR outperforms or is equivalent in efficiency to widely used genomic prediction methods. Source code and data are available at: https://github.com/MLBreeding/SABO-ILSTSVR.

7.
Front Plant Sci ; 15: 1337388, 2024.
Article in English | MEDLINE | ID: mdl-38978519

ABSTRACT

Introduction: In plant breeding, we often aim to improve multiple traits at once. However, without knowing the economic value of each trait, it is hard to decide which traits to focus on. This is where "desired gain selection indices" are useful: they can yield optimal gains in each trait based on the breeder's prioritisation of desired improvements when economic weights are not available. However, they lack the ability to maximise the selection response and to determine the correlation between the index and net genetic merit. Methods: Here, we report the development of an iterative desired gain selection index method that optimises the sampling of the desired gain values to achieve a targeted, user-specified selection response for multiple traits. This targeted selection response can be constrained or unconstrained for either a subset of the studied traits or all of them. Results: We tested the method using genomic estimated breeding values (GEBVs) for seven traits in a bread wheat (Triticum aestivum) reference breeding population comprising 3,331 lines and achieved prediction accuracies ranging between 0.29 and 0.47 across the seven traits. The indices were validated using 3,005 doubled haploid lines derived from crosses between parents selected from the reference population. We tested three user-specified response scenarios: a constrained equal weight (INDEX1), a constrained yield-dominant weight (INDEX2), and an unconstrained weight (INDEX3). Our method achieved a response equivalent to the user-specified selection response when constraining a set of traits, and this response was much better than that of the traditional desired gain selection index method without iteration. Interestingly, when using the unconstrained weight, our iterative method maximised the selection response and shifted the average GEBVs of the selection candidates in the desired direction. Discussion: Our results show that the method is an optimal choice not only when economic weights are unavailable, but also when constraining the selection response is an unfavourable option.

8.
G3 (Bethesda) ; 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39028116

ABSTRACT

Switchgrass is a potential crop for bioenergy or carbon capture schemes, but further yield improvements through selective breeding are needed to encourage commercialization. To identify promising switchgrass germplasm for future breeding efforts, we conducted multi-site and multi-trait genomic prediction with a diversity panel of 630 genotypes from 4 switchgrass subpopulations (Gulf, Midwest, Coastal, and Texas), which were measured for spaced-plant biomass yield across 10 sites. Our study focused on the use of genomic prediction to share information among traits and environments. Specifically, we evaluated the predictive ability of cross-validation (CV) schemes using only genetic data and the training set (cross-validation 1: CV1), a subset of the sites (cross-validation 2: CV2), and/or two yield surrogates (flowering time and fall plant height). We found that genotype-by-environment interactions were largely due to the north-south distribution of sites. The genetic correlations between yield surrogates and biomass yield were generally positive (mean height r=0.85; mean flowering time r=0.45) and did not vary with subpopulation or growing region (North, Middle, South). Genomic prediction models had cross-validation predictive abilities of -0.02 for individuals using only genetic data (CV1) but 0.55, 0.69, 0.76, 0.81, and 0.84 for individuals with biomass performance data from one, two, three, four, and five sites included in the training data (CV2), respectively. To simulate a resource-limited breeding program, we determined the predictive ability of models provided with: one site observation of flowering time (0.39), one site observation of flowering time and fall height (0.51), one site observation of fall height (0.52), one site observation of biomass (0.55), and five site observations of biomass yield (0.84). The ability to share information at a regional scale is very encouraging, but further research is required to accurately translate spaced-plant biomass to commercial-scale sward biomass performance.

9.
Yi Chuan ; 46(7): 560-569, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39016089

ABSTRACT

Genomic prediction has emerged as a pivotal technology for the genetic evaluation of livestock and crops and for predicting human disease risk. However, classical genomic prediction methods face challenges in incorporating prior biological information such as the genetic regulation mechanisms of traits. This study introduces a novel approach that integrates mRNA transcript information to predict complex trait phenotypes. To evaluate the accuracy of the new method, we used a Drosophila population that is widely employed in quantitative genetics research worldwide. Results indicate that integrating mRNA transcript data can significantly enhance genomic prediction accuracy for certain traits, though it does not improve phenotype prediction accuracy for all traits. Compared with GBLUP, the prediction accuracy for olfactory response to dCarvone in male Drosophila increased from 0.256 to 0.274. Similarly, the accuracy for cafe in male Drosophila rose from 0.355 to 0.401, and the accuracy for survival_paraquat in male Drosophila improved from 0.101 to 0.138. In female Drosophila, the accuracy for olfactory response to 1hexanol increased from 0.147 to 0.210. In conclusion, integrating mRNA transcripts can substantially improve the genomic prediction accuracy of certain traits, with relative gains ranging from 7% to 43%. Furthermore, for some traits, considering interaction effects along with mRNA transcript integration can lead to even higher prediction accuracy.


Subjects
Drosophila, Genomics, Messenger RNA, Animals, Messenger RNA/genetics, Male, Genomics/methods, Female, Drosophila/genetics, Phenotype

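
The 7% to 43% range in the Drosophila abstract above appears to be the relative change between the GBLUP accuracy and the accuracy after transcript integration. A minimal sketch of that calculation follows, assuming relative gain is defined as (new − baseline) / baseline; the dictionary keys are shortened labels chosen here for illustration.

```python
# Accuracy pairs (GBLUP baseline, with mRNA transcripts) as quoted in the abstract.
accuracies = {
    "olfactory_dCarvone_male": (0.256, 0.274),
    "cafe_male": (0.355, 0.401),
    "survival_paraquat_male": (0.101, 0.138),
    "olfactory_1hexanol_female": (0.147, 0.210),
}


def relative_gain_percent(baseline, improved):
    """Relative improvement over the baseline, in percent (assumed definition)."""
    return 100.0 * (improved - baseline) / baseline


gains = {
    trait: relative_gain_percent(base, new)
    for trait, (base, new) in accuracies.items()
}

for trait, gain in gains.items():
    print(f"{trait}: {gain:.1f}%")
```

With this definition the smallest and largest gains among the four quoted traits round to 7% and 43%, which matches the range given in the abstract.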
10.
Vet Anim Sci ; 25: 100373, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39036417

ABSTRACT

Mating in animal communities must be managed in a way that ensures improved performance in the progeny without increasing the rate of inbreeding. It is now possible to identify millions of single nucleotide polymorphisms (SNPs) and feasible to select animals based on genome-wide marker profiles. This study aimed to evaluate the impact of five mating designs (random, positive assortative, negative assortative, minimized inbreeding, and maximized inbreeding) on genomic prediction accuracy. The choice of these five mating designs allows a thorough analysis of how genetic diversity, relatedness, inbreeding, and biological conditions influence the accuracy of genomic predictions. Using a stochastic simulation technique, various marker and quantitative trait loci (QTL) densities were taken into account. The heritabilities of the simulated trait were 0.05, 0.30, and 0.60. Each simulation scenario included a reference population with both genotypic and phenotypic records and a validation population with genotypic records only. Prediction accuracy was calculated as the correlation between estimated and true breeding values. Prediction bias was examined through the regression of true genomic breeding value on estimated genomic breeding value. The scenario with a positive assortative mating design had the highest accuracy of genomic prediction (0.733 ± 0.003 to 0.966 ± 0.001). With negative assortative mating, the accuracy of genomic evaluation was lowest (0.680 ± 0.011 to 0.899 ± 0.003). Applying the positive assortative mating design resulted in unbiased regression coefficients of true genomic breeding value on estimated genomic breeding value. Based on the current results, implementing positive assortative mating in genomic evaluation programs is suggested to obtain unbiased genomic predictions with greater accuracy. This study implies that animal breeding programs can improve offspring performance without compromising genetic health by carefully managing mating strategies based on genetic diversity, relatedness, and inbreeding levels. To maximize breeding results and ensure long-term genetic improvement in animal populations, this study highlights the importance of considering different mating designs when evaluating genomic information. When incorporating positive assortative mating or other mating schemes into genomic evaluation programs, it is critical to consider the complex relationship between gene interactions, environmental influences, and genetic drift to ensure the stability and effectiveness of breeding efforts. Further research and comprehensive analyses are needed to fully understand the impact of these factors and their possible complex interactions on the accuracy of genomic prediction and to develop strategies that optimize breeding outcomes in animal populations.
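
The two evaluation metrics named in the abstract above, accuracy as the correlation between estimated and true breeding values and bias as the slope of the regression of true on estimated values, can be sketched as follows. The function name and the toy numbers are hypothetical and are not taken from the study; a slope of 1 is conventionally read as unbiased.

```python
def accuracy_and_bias(true_bv, est_bv):
    """Return (Pearson correlation, slope of true regressed on estimated).

    Illustrative helper, not the authors' code. A slope below 1 means the
    estimates are over-dispersed (inflated); above 1, under-dispersed.
    """
    n = len(true_bv)
    mean_t = sum(true_bv) / n
    mean_e = sum(est_bv) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_bv, est_bv))
    var_t = sum((t - mean_t) ** 2 for t in true_bv)
    var_e = sum((e - mean_e) ** 2 for e in est_bv)
    accuracy = cov / (var_t * var_e) ** 0.5
    slope = cov / var_e
    return accuracy, slope


# Toy values: estimates perfectly ranked but inflated twofold.
true_values = [1.0, 2.0, 3.0, 4.0]
inflated_estimates = [2.0, 4.0, 6.0, 8.0]
acc, slope = accuracy_and_bias(true_values, inflated_estimates)
print(acc, slope)
```

The toy case shows why both metrics are reported: the ranking is perfect, so the correlation is 1, yet the slope of 0.5 reveals that the estimates are inflated.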

11.
G3 (Bethesda) ; 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39001867

ABSTRACT

Intermediate wheatgrass (IWG) is a perennial grass that produces nutritious grain while offering substantial ecosystem services. Commercial varieties of this crop are mostly synthetic panmictic populations developed by intermating a few selected individuals. Because the development and generation advancement of these synthetic populations is a multi-year process, earlier synthetic generations are tested by breeders and subsequent generations are released to growers. A comparison of generations within IWG synthetic cultivars is currently lacking. In this study, we used simulation models and genomic prediction to analyze population differences and trends in genetic variance in four synthetic generations of MN-Clearwater, a commercial cultivar released by the University of Minnesota. Little to no difference was observed among the four generations in population genetic parameters, genetic kinship, or genome-wide marker relationships measured via linkage disequilibrium. A reduction in genetic variance was observed when 7 parents were used to generate synthetic populations, while using 20 led to the best possible outcome in determining population variance. Genomic prediction of plant height, free threshing ability, seed mass, and grain yield among the four synthetic generations showed a few significant differences among the generations, yet the differences in values were negligible. Based on these observations, we draw two major conclusions: 1) the earlier and later synthetic generations of IWG are mostly similar to each other, with minimal differences; and 2) using 20 genotypes to create synthetic populations is recommended to sustain ample genetic variance and trait expression among all synthetic generations.

12.
G3 (Bethesda) ; 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39052988

ABSTRACT

Blueberry (Vaccinium spp.) is among the most-consumed soft fruits and has been recognized as an important source of health-promoting compounds. Because blueberries are highly perishable and susceptible to rapid spoilage from fruit softening and decay during postharvest storage, modern breeding programs aim to maximize quality and extend the market life of the fresh fruit. However, the extent to which postharvest quality traits are under genetic control in blueberries is uncertain. This study aimed to investigate the prediction ability and genetic basis of the main fruit quality traits affected during blueberry postharvest storage, in order to create breeding strategies for developing cultivars with an extended shelf life. To achieve this goal, we carried out targeted genotyping in a breeding population of 588 individuals and evaluated them for several fruit quality traits after one day, one week, three weeks, and seven weeks of postharvest storage at 1 °C. Using longitudinal genome-based methods, we estimated genetic parameters and predicted unobserved phenotypes. Our results showed large diversity, moderate heritability, and consistent predictive accuracies throughout postharvest storage for most of the traits. Regarding fruit quality, firmness showed the largest variation during postharvest storage, with a surprising number of genotypes maintaining or increasing their firmness even after seven weeks of cold storage. Our results suggest that we can effectively improve blueberry postharvest quality through breeding and use genomic prediction to maximize genetic gains in the long term. We also emphasize the potential of using longitudinal genomic prediction models to predict fruit quality at extended postharvest periods by integrating known phenotypic data from harvest.

13.
Front Plant Sci ; 15: 1407609, 2024.
Article in English | MEDLINE | ID: mdl-38916032

ABSTRACT

Genomic prediction has mostly been used in single-environment contexts, largely ignoring genotype x environment interaction, which greatly affects the performance of plants. However, in the last decade, prediction models including marker x environment (MxE) interaction have been developed. We evaluated the potential of genomic prediction in red clover (Trifolium pratense L.) using field trial data from five European locations, obtained in the Horizon 2020 EUCLEG project. Three models were compared: (1) single environment (SingleEnv), (2) across environment (AcrossEnv), and (3) marker x environment interaction (MxE). Annual dry matter yield (DMY) gave the highest predictive ability (PA). PAs from joint analyses of DMY from years 1 and 2 at each location ranged from 0.87 in Britain and Switzerland in year 1 to 0.40 in Serbia in year 2. Overall, crude protein (CP) was predicted poorly. PAs for date of flowering (DOF), however, ranged from 0.87 for Britain to 0.67 for Switzerland. Across the three traits, the MxE model performed best and the AcrossEnv model worst, demonstrating that including marker x environment effects can improve genomic prediction in red clover. Leaving out accessions from specific regions or from specific breeders' material in the cross-validation tended to reduce PA, but the magnitude of the reduction depended on trait, region, and breeders' material, indicating that population structure contributed to the high PAs observed for DMY and DOF. Testing the genomic estimated breeding values on new phenotypic data from Sweden showed that DMY training data from Britain gave high PAs in both years (0.43-0.76), while DMY training data from Switzerland gave high PAs only for year 1 (0.70-0.87). The genomic predictions we report here underline the potential benefits of incorporating MxE interaction in multi-environment trials and could open perspectives for identifying markers with effects that are stable across environments and markers with environment-specific effects.

14.
Genes (Basel) ; 15(6)2024 May 26.
Article in English | MEDLINE | ID: mdl-38927626

ABSTRACT

Genomic prediction plays an increasingly important role in modern animal breeding, with predictive accuracy being a crucial aspect. The classical linear mixed model is increasingly unable to accommodate the growing number of target traits and the increasingly intricate genetic regulatory patterns. Hence, novel approaches are necessary for future genomic prediction. In this study, we used an Illumina 50K SNP chip to genotype 4190 egg-type female Rhode Island Red chickens. Machine learning (ML) and classical bioinformatics methods were integrated to fit genotypes to 10 economic traits in chickens. We evaluated the effectiveness of ML methods using Pearson correlation coefficients and the RMSE between predicted and actual phenotypic values and compared them with rrBLUP and BayesA. Our results indicated that ML algorithms perform significantly better than rrBLUP and BayesA in predicting body weight and eggshell strength traits. Conversely, rrBLUP and BayesA demonstrated 2-58% higher predictive accuracy in predicting egg numbers. Additionally, incorporating suggestively significant SNPs obtained through GWAS into the ML models increased predictive accuracy by 0.1-27% across nearly all traits. These findings suggest the potential of combining classical bioinformatics methods with ML techniques to improve genomic prediction in the future.


Subjects
Chickens, Computational Biology, Genomics, Machine Learning, Single Nucleotide Polymorphism, Animals, Chickens/genetics, Genomics/methods, Computational Biology/methods, Female, Genome-Wide Association Study/methods, Phenotype, Genotype, Breeding/methods, Quantitative Trait Loci

15.
Genomics ; 116(4): 110874, 2024 07.
Article in English | MEDLINE | ID: mdl-38839024

ABSTRACT

Low-coverage whole-genome sequencing (LCS) offers a cost-effective alternative for sturgeon breeding, especially given the lack of SNP chips and the high costs associated with whole-genome sequencing. In this study, the efficiency of LCS for genotype imputation and genomic prediction was assessed in 643 sequenced Russian sturgeons (∼13.68×). The results showed that using BaseVar+STITCH at a sequencing depth of 2× with a sample size larger than 300 resulted in the highest genotyping accuracy. In addition, when the sequencing depth reached 0.5× and SNP density was reduced to 50 K through linkage disequilibrium pruning, the prediction accuracy was comparable to that of whole sequencing depth. Furthermore, an incremental feature selection method has the potential to improve prediction accuracy. This study suggests that the combination of LCS and imputation can be a cost-effective strategy, contributing to the genetic improvement of economic traits and promoting genetic gains in aquaculture species.


Subjects
Fishes, Single Nucleotide Polymorphism, Fishes/genetics, Animals, Whole Genome Sequencing/economics, Whole Genome Sequencing/methods, Genomics/methods, Genomics/economics, Cost-Benefit Analysis, Linkage Disequilibrium

16.
Sci Rep ; 14(1): 13188, 2024 06 08.
Article in English | MEDLINE | ID: mdl-38851759

ABSTRACT

Genome interpretation (GI) encompasses the computational attempts to model the relationship between genotype and phenotype, with the goal of understanding how the former leads to the latter. While traditional approaches have focused on sub-problems such as predicting the effect of single nucleotide variants or finding genetic associations, recent advances in neural networks (NNs) have made it possible to develop end-to-end GI models that take genomic data as input and predict phenotypes as output. However, technical and modeling issues still need to be resolved for these models to be effective, including the widespread underdetermination of genomic datasets, which makes them unsuitable for training large, overfitting-prone NNs. Here we propose novel GI models to address this issue, exploring the use of two types of transfer learning approaches and proposing a novel Biologically Meaningful Sparse NN layer specifically designed for end-to-end GI. Our models predict the leaf and seed ionome in A. thaliana, obtaining results comparable to those of our previous over-parameterized model while reducing the number of parameters 8.8-fold. We also investigate how population stratification influences the evaluation of performance, highlighting how it leads to (1) an instance of Simpson's paradox and (2) limitations in model generalization.


Subjects
Arabidopsis, Plant Genome, Plant Leaves, Seeds, Arabidopsis/genetics, Plant Leaves/genetics, Plant Leaves/metabolism, Seeds/genetics, Seeds/metabolism, Neural Networks (Computer), Genomics/methods, Phenotype, Genetic Models, Genotype

17.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38856170

ABSTRACT

In applications of genomic prediction (GP), it is common that GP needs to be conducted in multiple populations. A common way to handle multi-population GP is simply to combine the populations into a single population. However, since these populations may be subject to different environments, genotype-environment interactions may exist and affect the accuracy of genomic prediction. In this study, we demonstrated that multi-trait genomic best linear unbiased prediction (MTGBLUP) can be used for multi-population genomic prediction: the performances of a trait in different populations are regarded as different traits, so that multi-population prediction becomes multi-trait prediction that exploits the between-population genetic correlation. Using real datasets, we showed that MTGBLUP outperformed the conventional multi-population model that simply combines different populations. We further proposed that MTGBLUP can be improved by partitioning the global between-population genetic correlation into local genetic correlations (LGC). We suggested two LGC models, LGC-model-1 and LGC-model-2, which partition the genome into regions with and without significant LGC (LGC-model-1) or regions with and without strong LGC (LGC-model-2). In analyses of real datasets, we demonstrated that the LGC models could universally increase prediction accuracy, with the relative improvement over MTGBLUP reaching up to 163.86% (25.64% on average).


Subjects
Genomics , Models, Genetic , Genomics/methods , Genetics, Population/methods , Quantitative Trait Loci , Humans , Algorithms , Genotype
18.
J Anim Sci Biotechnol ; 15(1): 87, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38945998

ABSTRACT

BACKGROUND: Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, their application in genomic prediction has yet to be explored. RESULTS: This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and Bayesian methods. Specifically, the BANN_100kb demonstrated superior accuracy and the BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and Bayesian methods.
CONCLUSION: Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
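
The abstract above reports five replicates of five-fold cross-validation. The following is a minimal sketch of how such train/validation splits could be generated. It is not the authors' code: the function name `repeated_kfold`, the seed, and the shuffling scheme are assumptions made for illustration only.

```python
import random

def repeated_kfold(n, k=5, replicates=5, seed=0):
    """Yield (replicate, fold, train_idx, valid_idx) for repeated k-fold CV.

    Hypothetical helper: within each replicate the n individuals are
    reshuffled and dealt into k folds; each fold serves once as the
    validation set while the remaining folds form the training set.
    """
    rng = random.Random(seed)
    for rep in range(replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k nearly equal-sized folds
        for f in range(k):
            valid = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield rep, f, train, valid

# Usage: 5 x 5 = 25 splits for a toy population of 10 individuals.
for rep, fold, train, valid in repeated_kfold(n=10):
    print(rep, fold, len(train), len(valid))
```

Each model would then be fitted on `train` and evaluated on `valid`, with the accuracy averaged over all 25 splits.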

19.
Genetics ; 227(4)2024 Aug 07.
Article in English | MEDLINE | ID: mdl-38913695

ABSTRACT

Increasing SNP density by incorporating sequence information only marginally increases prediction accuracies of breeding values in livestock. To find out why, we used statistical models and simulations to investigate the shape of the distribution of estimated SNP effects (a profile) around quantitative trait nucleotides (QTNs) in populations with a small effective population size (Ne). A QTN profile created by averaging SNP effects around each QTN was similar to the shape of expected pairwise linkage disequilibrium (PLD) based on Ne and genetic distance between SNPs, with a distinct peak for the QTN. Populations with smaller Ne showed lower but wider QTN profiles. However, adding more genotyped individuals with phenotypes dragged the profile closer to the QTN. The QTN profile was higher and narrower for populations with larger compared to smaller Ne. Assuming the PLD curve for the QTN profile, 80% of the additive genetic variance explained by each QTN was contained in a ±1/Ne Morgan interval around the QTN, corresponding to 2 Mb in cattle and 5 Mb in pigs and chickens. With such large intervals, identifying QTNs is difficult even if all of them are in the data and the assumed genetic architecture is simplistic. Additional complexity in QTN detection arises from confounding of QTN profiles with signals due to relationships, overlapping profiles with closely spaced QTNs, and spurious signals. However, small Ne allows for accurate predictions with large data even without QTN identification because QTNs are accounted for by QTN profiles if SNP density is sufficient to saturate the segments.
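
To make the ±1/Ne Morgan interval concrete, the sketch below converts it to megabases and evaluates an expected-LD curve. Several inputs are assumptions not stated in the abstract: the LD curve is taken to be Sved's approximation E[r²] = 1/(1 + 4·Ne·c), the map is assumed to be 1 cM per Mb (100 Mb per Morgan), and the Ne values of 100 and 40 are illustrative choices that happen to reproduce the 2 Mb and 5 Mb figures quoted above.

```python
def expected_r2(ne, c_morgan):
    """Expected pairwise LD (r^2) under Sved's approximation (assumed form)."""
    return 1.0 / (1.0 + 4.0 * ne * c_morgan)

def qtn_interval_mb(ne, mb_per_morgan=100.0):
    """Total width in Mb of the +/- 1/Ne Morgan interval around a QTN.

    mb_per_morgan=100 encodes the assumed 1 cM ~ 1 Mb conversion.
    """
    return 2.0 * (1.0 / ne) * mb_per_morgan

# Illustrative Ne values (assumptions, not taken from the abstract).
for label, ne in (("Ne = 100", 100), ("Ne = 40", 40)):
    width = qtn_interval_mb(ne)
    r2_at_edge = expected_r2(ne, 1.0 / ne)  # LD at the interval boundary
    print(f"{label}: interval ~ {width:.1f} Mb, expected r^2 at edge = {r2_at_edge:.2f}")
```

Under this approximation the expected r² at the interval boundary is the same for every Ne, while the interval itself widens as Ne shrinks, consistent with the "lower but wider" profiles described for small populations.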


Subjects
Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Animals , Cattle/genetics , Chromosome Mapping/methods , Population Density , Chickens/genetics , Swine/genetics , Genomics/methods , Quantitative Trait, Heritable
20.
Plant Methods ; 20(1): 85, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844940

ABSTRACT

The selection of highly productive genotypes with stable performance across environments is a major challenge of plant breeding programs due to genotype-by-environment (GE) interactions. Over the years, different metrics have been proposed that aim at characterizing the superiority and/or stability of genotype performance across environments. However, these metrics are traditionally estimated using phenotypic values only and are not well suited to an unbalanced design in which genotypes are not observed in all environments. The objective of this research was to propose and evaluate new estimators of the following GE metrics: Ecovalence, Environmental Variance, Finlay-Wilkinson regression coefficient, and Lin-Binns superiority measure. Drawing from a multi-environment genomic prediction model, we derived the best linear unbiased prediction for each GE metric. These derivations included both a squared expectation and a variance term. To assess the effectiveness of our new estimators, we conducted simulations that varied in traits and environment parameters. In our results, the new estimators consistently outperformed traditional phenotype-based estimators in terms of accuracy. By incorporating a variance term into our new estimators, in addition to the squared expectation term, we were able to improve the precision of our estimates, particularly for Ecovalence in situations where heritability was low and/or sparseness was high. All methods are implemented in a new R package: GEmetrics. These genomic-based estimators enable estimating GE metrics in unbalanced designs and predicting GE metrics for new genotypes, which should help improve the selection efficiency of high-performance and stable genotypes across environments.
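
For orientation, the four GE metrics named above can be sketched in their classical phenotype-based form, that is, the "traditional" estimators the abstract contrasts with its genomic ones. This is not the GEmetrics package or its API: the function name `ge_metrics`, the textbook formulas chosen, and the requirement of a complete (balanced) genotype-by-environment table are assumptions for illustration.

```python
from statistics import mean, variance

def ge_metrics(y):
    """Classical phenotype-based GE metrics for a complete table.

    y[i][j] is the phenotype of genotype i in environment j.
    Assumes a balanced design with at least two environments whose
    means differ (otherwise the regression slope is undefined).
    """
    n_env = len(y[0])
    g_mean = [mean(row) for row in y]          # genotype means
    e_mean = [mean(col) for col in zip(*y)]    # environment means
    grand = mean(g_mean)                       # overall mean (balanced data)
    e_index = [m - grand for m in e_mean]      # environmental index
    e_best = [max(col) for col in zip(*y)]     # best genotype per environment
    ss_index = sum(v * v for v in e_index)
    out = []
    for row, gm in zip(y, g_mean):
        out.append({
            # Ecovalence: sum of squared GE interaction residuals
            "ecovalence": sum((v - gm - em + grand) ** 2
                              for v, em in zip(row, e_mean)),
            # Environmental variance: variance of a genotype across environments
            "env_variance": variance(row),
            # Finlay-Wilkinson: slope of phenotype on the environmental index
            "fw_slope": sum(v * ix for v, ix in zip(row, e_index)) / ss_index,
            # Lin-Binns: mean squared distance to the best genotype, halved
            "superiority": sum((v - b) ** 2
                               for v, b in zip(row, e_best)) / (2 * n_env),
        })
    return out

# Usage: two genotypes, three environments, no GE interaction.
for metrics in ge_metrics([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]):
    print(metrics)
```

Because every cell of the table must be observed, this form illustrates the limitation the abstract raises for unbalanced designs.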
