Results 1 - 20 of 115
1.
J Anim Breed Genet ; 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38853664

ABSTRACT

This study used Bayesian inference in a genome-wide association study (GWAS) to identify genetic markers associated with traits relevant to the adaptation of Hereford and Braford cattle breeds. We focused on eye pigmentation (EP), weaning hair coat (WHC), yearling hair coat (YHC), and breeding standard (BS). Our dataset comprised 126,290 animals in the pedigree. Of these, 233 sires were genotyped using high-density (HD) chips, and 3750 animals were genotyped using medium-density (50 K) single-nucleotide polymorphism (SNP) chips. Employing the Bayes B method with a prior probability of π = 0.99, we identified and tagged single-nucleotide polymorphisms (Tag SNPs), ranging from 18 to 117 SNPs depending on the trait. These Tag SNPs facilitated the construction of reduced SNP panels. We then evaluated the predictive accuracy of these panels in comparison to traditional medium-density SNP chips. The accuracy of genomic predictions using these reduced panels varied significantly depending on the clustering method, ranging from 0.13 to 0.65. Additionally, we conducted a functional enrichment analysis that identified genes associated with the most informative SNP markers in the current study, thereby providing biological insights into the genomic basis of these traits.

2.
Front Plant Sci ; 15: 1349569, 2024.
Article in English | MEDLINE | ID: mdl-38812738

ABSTRACT

Introduction: Because genomic selection (GS) is a predictive methodology, it needs to guarantee high prediction accuracies for practical implementation. However, since many factors affect the prediction performance of this methodology, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve its prediction performance. Methods: When environmental covariates are incorporated as inputs in genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that, across data sets, feature engineering helps reduce prediction error relative to including the environmental covariates without feature engineering by 761.625% across predictors. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy to incorporate the environmental covariates.

3.
Genes (Basel) ; 15(4)2024 03 27.
Article in English | MEDLINE | ID: mdl-38674352

ABSTRACT

Genomic prediction relates a set of markers to variability in observed phenotypes of cultivars and allows for the prediction of phenotypes or breeding values of genotypes on unobserved individuals. Most genomic prediction approaches predict breeding values based solely on additive effects. However, the economic value of wheat lines is not only influenced by their additive component but also encompasses a non-additive part (e.g., additive × additive epistasis interaction). In this study, genomic prediction models were implemented in three target populations of environments (TPE) in South Asia. Four models that incorporate genotype × environment interaction (G × E) and genotype × genotype interaction (GG) were tested: Factor Analytic (FA), FA with genomic relationship matrix (FA + G), FA with epistatic relationship matrix (FA + GG), and FA with both genomic and epistatic relationship matrices (FA + G + GG). Results show that the FA + G and FA + G + GG models displayed the best, and similar, performance across all tests, leading us to infer that the FA + G model effectively captures certain epistatic effects. The wheat lines tested in sites in different TPE were predicted with different precisions depending on the cross-validation employed. In general, the best prediction accuracy was obtained when some lines were observed in some sites of particular TPEs, and the worst genomic prediction was observed when wheat lines were never observed in any site of one TPE.


Subject(s)
Epistasis, Genetic , Gene-Environment Interaction , Genome, Plant , Genomics , Models, Genetic , Plant Breeding , Triticum , Triticum/genetics , Plant Breeding/methods , Genomics/methods , Genotype , Phenotype
4.
Genes (Basel) ; 15(3)2024 02 24.
Article in English | MEDLINE | ID: mdl-38540344

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since many factors affect its accuracy. For this reason, this research explores data augmentation with the goal of improving the accuracy of GS. Deep neural networks with data augmentation (DA) generate synthetic data from the original training set to enlarge the training set and to improve the prediction performance of any statistical or machine learning algorithm. There is much empirical evidence of their success in many computer vision applications. For this reason, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool to improve prediction accuracy, since we improved the prediction accuracy of the top lines in the 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the conventional method in the top 20% of lines in the testing set was 108.4% in terms of the NRMSE and 107.4% in terms of the MAAPE, but a worse performance was observed on the whole testing set. We encourage more empirical evaluations to support our findings.
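The two error metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state how the NRMSE is normalized, so dividing the RMSE by the mean of the observed values is an assumption, and the function names (`nrmse`, `maape`, `top_fraction_mask`) are hypothetical. MAAPE is taken in its usual form, the mean of arctan(|(observed - predicted) / observed|).

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (normalization is an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(y_true)

def maape(y_true, y_pred):
    """Mean arctangent absolute percentage error, in radians.

    Observed values equal to zero would produce a division by zero,
    so this sketch assumes all observed values are non-zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.arctan(np.abs((y_true - y_pred) / y_true)))

def top_fraction_mask(y_true, fraction=0.2):
    """Boolean mask selecting the top `fraction` of lines by observed value."""
    y_true = np.asarray(y_true, dtype=float)
    k = max(1, int(round(fraction * y_true.size)))
    mask = np.zeros(y_true.size, dtype=bool)
    mask[np.argsort(y_true)[-k:]] = True
    return mask
```

Evaluating both metrics on `y_true[mask]` and `y_pred[mask]` and again on the full vectors would reproduce the kind of contrast the abstract reports between the top 20% of lines and the whole testing set.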


Subject(s)
Genome, Plant , Genomics , Phenotype , Machine Learning , Neural Networks, Computer
5.
J Anim Breed Genet ; 141(3): 291-303, 2024 May.
Article in English | MEDLINE | ID: mdl-38062881

ABSTRACT

Feed efficiency plays a major role in the overall profitability and sustainability of the beef cattle industry, as it is directly related to the reduction of the animal demand for input and methane emissions. Traditionally, the average daily feed intake and weight gain are used to calculate feed efficiency traits. However, feed efficiency traits can be analysed longitudinally using random regression models (RRMs), which allow fitting random genetic and environmental effects over time by considering the covariance pattern between the daily records. Therefore, the objectives of this study were to: (1) propose genomic evaluations for dry matter intake (DMI), body weight gain (BWG), residual feed intake (RFI) and residual weight gain (RWG) data collected during an 84-day feedlot test period via RRMs; (2) compare the goodness-of-fit of RRM using Legendre polynomials (LP) and B-spline functions; (3) evaluate the genetic parameters behaviour for feed efficiency traits and their implication for new selection strategies. The datasets were provided by the EMBRAPA-GENEPLUS beef cattle breeding program and included 2920 records for DMI, 2696 records for BWG and 4675 genotyped animals. Genetic parameters and genomic breeding values (GEBVs) were estimated by RRMs under ssGBLUP for Nellore cattle using orthogonal LPs and B-spline. Models were compared based on the deviance information criterion (DIC). The ranking of the average GEBV of each test week and the overall GEBV average were compared by the percentage of individuals in common and the Spearman correlation coefficient (top 1%, 5%, 10% and 100%). The highest goodness-of-fit was obtained with linear B-Spline function considering heterogeneous residual variance. The heritability estimates across the test period for DMI, BWG, RFI and RWG ranged from 0.06 to 0.21, 0.11 to 0.30, 0.03 to 0.26 and 0.07 to 0.27, respectively. 
DMI and RFI presented within-trait genetic correlations ranging from low to high magnitude across different performance test days. In contrast, BWG and RWG presented negative genetic correlations between the first 3 weeks and the other days of performance tests. DMI and RFI presented a high ranking similarity between the GEBV average of week eight and the overall GEBV average, with Spearman correlations and percentages of individuals selected in common ranging from 0.95 to 1.00 and 93 to 100, respectively. Week 11 presented the highest Spearman correlations (ranging from 0.94 to 0.98) and percentages of individuals selected in common (ranging from 85 to 94) of BWG and RWG with the average GEBV of the entire period of the test. In conclusion, the RRM using linear B-splines is a feasible alternative for the genomic evaluation of feed efficiency. Heritability estimates of DMI, RFI, BWG and RWG indicate enough additive genetic variance to achieve a moderate response to selection. A new selection strategy can be adopted by reducing the performance test to 56 days for DMI and RFI selection and 77 days for BWG and RWG selection.
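The ranking comparison described in this abstract (Spearman correlation and percentage of individuals selected in common between a weekly GEBV average and the overall GEBV average) can be sketched as below. The function names are hypothetical, the Spearman computation assumes no tied values, and treating the highest GEBVs as the selected animals is an assumption (for traits such as RFI the lowest values may be the ones selected; negate the inputs in that case).

```python
import numpy as np

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def percent_in_common(gebv_week, gebv_overall, top=0.10):
    """Percentage of animals shared by the top `top` fraction of both rankings."""
    gebv_week = np.asarray(gebv_week, dtype=float)
    gebv_overall = np.asarray(gebv_overall, dtype=float)
    k = max(1, int(round(top * gebv_week.size)))
    # Indices of the k animals with the highest GEBV in each ranking.
    top_week = set(np.argsort(gebv_week)[-k:].tolist())
    top_overall = set(np.argsort(gebv_overall)[-k:].tolist())
    return 100.0 * len(top_week & top_overall) / k
```

Calling `percent_in_common` with `top` set to 0.01, 0.05, 0.10 and 1.0 mirrors the selection intensities listed in the abstract.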


Subject(s)
Genome , Genomics , Humans , Cattle/genetics , Animals , Phenotype , Weight Gain/genetics , Genotype , Eating/genetics , Animal Feed
6.
Int J Mol Sci ; 24(24)2023 Dec 18.
Article in English | MEDLINE | ID: mdl-38139451

ABSTRACT

Nitrogen (N), the most important macro-nutrient for plant growth and development, is a key factor that determines crop yield. Yet its excessive applications pollute the environment and are expensive. Hence, studying nitrogen use efficiency (NUE) in crops is fundamental for sustainable agriculture. Here, an association panel consisting of 123 flax accessions was evaluated for 21 NUE-related traits at the seedling stage under optimum N (N+) and N deficiency (N-) treatments to dissect the genetic architecture of NUE-related traits using a multi-omics approach integrating genome-wide association studies (GWAS), transcriptome analysis and genomic selection (GS). Root traits exhibited significant and positive correlations with NUE under N- conditions (r = 0.33 to 0.43, p < 0.05). A total of 359 QTLs were identified, accounting for 0.11% to 23.1% of the phenotypic variation in NUE-related traits. Transcriptomic analysis identified 1034 differentially expressed genes (DEGs) under contrasting N conditions. DEGs involved in N metabolism, root development, amino acid transport and catabolism and others, were found near the QTLs. GS models to predict NUE stress tolerance index (NUE_STI) trait were tested using a random genome-wide SNP dataset and a GWAS-derived QTLs dataset. The latter produced superior prediction accuracy (r = 0.62 to 0.79) compared to the genome-wide SNP marker dataset (r = 0.11) for NUE_STI. Our results provide insights into the QTL architecture of NUE-related traits, identify candidate genes for further studies, and propose genomic breeding tools to achieve superior NUE in flax under low N input.


Subject(s)
Flax , Nitrogen , Flax/genetics , Flax/metabolism , Genome-Wide Association Study , Genomics , Nitrogen/metabolism , Plant Breeding , RNA-Seq , Seedlings/metabolism
7.
Front Plant Sci ; 14: 1252504, 2023.
Article in English | MEDLINE | ID: mdl-37965018

ABSTRACT

Introduction: Genomic selection (GS) experiments in forest trees have largely reported estimates of predictive abilities from cross-validation among individuals in the same breeding generation. In such conditions, no effects of recombination, selection, drift, and environmental changes are accounted for. Here, we assessed the effectively realized predictive ability (RPA) for volume growth at harvest age by GS across generations in an operational reciprocal recurrent selection (RRS) program of hybrid Eucalyptus. Methods: Genomic best linear unbiased prediction with additive (GBLUP_G), additive plus dominance (GBLUP_G+D), and additive single-step (HBLUP) models were trained with different combinations of growth data of hybrids and pure species individuals (N = 17,462) of the G1 generation, 1,944 of which were genotyped with ~16,000 SNPs from SNP arrays. The hybrid G2 progeny trial (HPT267) was the GS target, with 1,400 selection candidates, 197 of which were genotyped still at the seedling stage, and genomically predicted for their breeding and genotypic values at the operational harvest age (6 years). Seedlings were then grown to harvest and measured, and their pedigree-based breeding and genotypic values were compared to their originally predicted genomic counterparts. Results: Genomic RPAs ≥0.80 were obtained as the genetic relatedness between G1 and G2 increased, especially when the direct parents of selection candidates were used in training. GBLUP_G+D reached RPAs ≥0.70 only when hybrid or pure species data of G1 were included in training. HBLUP was only marginally better than GBLUP. Correlations ≥0.80 were obtained between pedigree and genomic individual ranks. Rank coincidence of the top 2.5% selections was the highest for GBLUP_G (45% to 60%) compared to GBLUP_G+D. 
To advance the pure species RRS populations, GS models performed better when trained on pure species data than on hybrid data, and HBLUP yielded ~20% higher predictive abilities than GBLUP, but was not better than ABLUP for ungenotyped trees. Discussion: We demonstrate that genomic data effectively enable accurate ranking of eucalypt hybrid seedlings for their yet-to-be observed volume growth at harvest age. Our results support a two-stage GS approach involving family selection by average genomic breeding value, followed by within-top-families individual GS, significantly increasing selection intensity, optimizing genotyping costs, and accelerating RRS breeding.

8.
Front Genet ; 14: 1209275, 2023.
Article in English | MEDLINE | ID: mdl-37554404

ABSTRACT

Genomic selection (GS) is transforming plant and animal breeding, but its practical implementation for complex traits and multi-environmental trials remains challenging. To address this issue, this study investigates the integration of environmental information with genotypic information in GS. The study proposes the use of two feature selection methods (Pearson's correlation and Boruta) for the integration of environmental information. Results indicate that the simple incorporation of environmental covariates may increase or decrease prediction accuracy depending on the case. However, optimal incorporation of environmental covariates using feature selection significantly improves prediction accuracy in four out of six datasets, by between 14.25% and 218.71% under a leave-one-environment-out cross-validation scenario in terms of Normalized Root Mean Squared Error, but no relevant gain was observed in terms of Pearson's correlation. In two datasets where environmental covariates are unrelated to the response variable, feature selection is unable to enhance prediction accuracy. Therefore, the study provides empirical evidence supporting the use of feature selection to improve the prediction power of GS.
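A correlation-based screen of environmental covariates inside a leave-one-environment-out loop might look like the sketch below. It is only an illustration under stated assumptions: the abstract does not give the selection rule, so the absolute-correlation cutoff (`threshold=0.5`) and the function names are hypothetical. The key point it shows is that the covariates are screened on the training environments only, so the held-out environment does not leak into the selection.

```python
import numpy as np

def select_covariates(ec_train, y_train, threshold=0.5):
    """Indices of covariates whose |Pearson r| with the response meets the cutoff."""
    ec_train = np.asarray(ec_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    keep = []
    for j in range(ec_train.shape[1]):
        column = ec_train[:, j]
        if np.std(column) == 0:
            continue  # a constant covariate has no defined correlation
        r = np.corrcoef(column, y_train)[0, 1]
        if abs(r) >= threshold:
            keep.append(j)
    return keep

def leave_one_environment_out(env_labels):
    """Yield (held_out_environment, train_indices, test_indices) for each environment."""
    env_labels = np.asarray(env_labels)
    for env in np.unique(env_labels):
        test_idx = np.flatnonzero(env_labels == env)
        train_idx = np.flatnonzero(env_labels != env)
        yield env, train_idx, test_idx
```

Inside each fold, `select_covariates(ec[train_idx], y[train_idx])` would be called before fitting whichever prediction model is in use.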

9.
Int J Mol Sci ; 24(13)2023 Jun 22.
Article in English | MEDLINE | ID: mdl-37445683

ABSTRACT

Genomic prediction combines molecular and phenotypic data in a training population to predict the breeding values of individuals that have only been genotyped. The use of genomic information in breeding programs helps to increase the frequency of favorable alleles in the populations of interest. This study evaluated the performance of BLUP (Best Linear Unbiased Prediction) in predicting resistance to tan spot, spot blotch and Septoria nodorum blotch in synthetic hexaploid wheat. BLUP was implemented in single-trait and multi-trait models with three variations: (1) the pedigree relationship matrix (A-BLUP), (2) the genomic relationship matrix (G-BLUP), and (3) a combination of the two matrices (A+G BLUP). In all three diseases, the A-BLUP model had a lower performance, and the G-BLUP and A+G BLUP were statistically similar (p ≥ 0.05). The prediction accuracy with the single trait was statistically similar (p ≥ 0.05) to the multi-trait accuracy, possibly due to the low correlation of severity between the diseases.
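As a rough illustration of what G-BLUP computes, the sketch below builds a genomic relationship matrix from 0/1/2 marker codes (VanRaden's first method) and predicts untested individuals from tested ones. It is a simplified sketch, not the models fitted in the study: the variance ratio `lam` (residual variance divided by additive genetic variance) is assumed known rather than estimated, the overall mean is taken as the simple training average, and the function names are hypothetical.

```python
import numpy as np

def vanraden_g(markers):
    """Genomic relationship matrix from an n x m matrix of 0/1/2 allele counts."""
    M = np.asarray(markers, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency per marker
    Z = M - 2.0 * p                   # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train_idx, test_idx, lam=1.0):
    """Predict test individuals from training phenotypes using G.

    lam is the assumed ratio of residual to additive genetic variance.
    """
    y = np.asarray(y, dtype=float)
    mu = y[train_idx].mean()
    G_train = G[np.ix_(train_idx, train_idx)]
    G_test_train = G[np.ix_(test_idx, train_idx)]
    weights = np.linalg.solve(G_train + lam * np.eye(len(train_idx)),
                              y[train_idx] - mu)
    return mu + G_test_train @ weights
```

Replacing `G` with a pedigree relationship matrix gives the A-BLUP analogue of the same calculation; how the study combined the two matrices for A+G BLUP is not stated in the abstract.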


Subject(s)
Plant Diseases , Triticum , Humans , Triticum/genetics , Plant Diseases/genetics , Plant Breeding , Genome , Genomics , Phenotype , Genotype , Models, Genetic
10.
Genes (Basel) ; 14(5)2023 04 28.
Article in English | MEDLINE | ID: mdl-37239363

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. This methodology uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions of candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists of related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, it is possible for these professionals to appropriately apply any state-of-the-art statistical machine-learning method to their collected data without the need for an exhaustive understanding of statistical machine-learning methods and programming. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement seven statistical machine-learning methods that are available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, feed-forward artificial neural networks). This guide includes details of the functions required to implement each of the methods, as well as others for easily implementing different tuning strategies, cross-validation strategies, and metrics to evaluate the prediction performance and different summary functions that compute it. A toy dataset illustrates how to implement statistical machine-learning methods and facilitates their use by professionals who do not possess a strong background in machine learning and programming.


Subject(s)
Plant Breeding , Software , Bayes Theorem , Genomics/methods , Machine Learning
11.
Anim Biosci ; 36(7): 1003-1009, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36915917

ABSTRACT

OBJECTIVE: The objective was to compare (pedigree-based) best linear unbiased prediction (BLUP), genomic BLUP (GBLUP), and single-step GBLUP (ssGBLUP) methods for genomic evaluation of growth traits in a Mexican Braunvieh cattle population. METHODS: Birth (BW), weaning (WW), and yearling weight (YW) data of a Mexican Braunvieh cattle population were analyzed with BLUP, GBLUP, and ssGBLUP methods. These methods differ in the additive genetic relationship matrix included in the model and in the animals under evaluation. The predictive ability of the model was evaluated using random partitions of the data into training and testing sets, consistently predicting about 20% of genotyped animals on all occasions. For each partition, the Pearson correlation coefficient between phenotypes adjusted for fixed effects and non-genetic random effects and the estimated breeding values (EBV) was computed. RESULTS: The random contemporary group (CG) effect explained about 50%, 45%, and 35% of the phenotypic variance in BW, WW, and YW, respectively. For the three methods, the CG effect explained the highest proportion of the phenotypic variances (except for YW-GBLUP). The heritability estimate obtained with GBLUP was the lowest for BW, while the highest heritability was obtained with BLUP. For WW, the highest heritability estimate was obtained with BLUP, while the estimates obtained with GBLUP and ssGBLUP were similar. For YW, the heritability estimates obtained with GBLUP and BLUP were similar, and the lowest heritability was obtained with ssGBLUP. Pearson correlation coefficients between phenotypes adjusted for non-genetic effects and EBVs were the highest for BLUP, followed by ssGBLUP and GBLUP. CONCLUSION: The successful implementation of genetic evaluations that include genotyped and non-genotyped animals in our study indicates a promising method for use in genetic improvement programs of Braunvieh cattle.
Our findings showed that simultaneous evaluation of genotyped and non-genotyped animals improved prediction accuracy for growth traits even with a limited number of genotyped animals.

12.
Trop Anim Health Prod ; 55(2): 95, 2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36810697

ABSTRACT

The aim of this work was to evaluate the impact of applying genomic information in pedigree uncertainty situations on genetic evaluations for growth- and cow productivity-related traits in Nelore commercial herds. Records for accumulated cow productivity (ACP) and adjusted weight at 450 days of age (W450) were used, as well as genotypes of registered and commercial herd animals, genotyped with the Clarifide Nelore 3.1 panel (~29,000 SNPs). The genetic values for commercial and registered populations were estimated using different approaches that included (ssGBLUP) or did not include genomic information (BLUP), with different pedigree structures. Different scenarios were tested, varying the proportion of young animals with unknown sires (0, 25, 50, 75, and 100%), and unknown maternal grandsires (0, 25, 50, 75, and 100%). The prediction accuracies and abilities were calculated. The estimated breeding value accuracies decreased as the proportion of unknown sires and maternal grandsires increased. The genomic estimated breeding value accuracy using the ssGBLUP was higher in scenarios with a lower proportion of known pedigree when compared to the BLUP methodology. The results obtained with the ssGBLUP showed that it is possible to obtain reliable direct and indirect predictions for young animals from commercial herds without pedigree structure.


Subject(s)
Genome , Models, Genetic , Female , Cattle , Animals , Pedigree , Genomics/methods , Genotype , Phenotype
13.
J Appl Genet ; 64(1): 159-167, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36376720

ABSTRACT

This study aimed to estimate prediction ability and genetic parameters for residual feed intake (RFI) calculated using a regression equation for each test (RFItest) and for the whole population (RFIpop) in Nellore beef cattle. It also aimed to evaluate the correlations between RFIpop and RFItest with growth, reproductive, and carcass traits. Genotypic and phenotypic records from 8354 animals were used. An analysis of variance (ANOVA) was performed to verify the adequacy of the regression equations applied to estimate the RFItest and RFIpop. The (co)variance components were obtained using the single-step genomic best linear unbiased prediction under single and two-trait animal model analyses. The genetic and phenotypic correlations between RFItest and RFIpop with dry matter intake, frame, growth, reproduction, and carcass-related traits were evaluated. The prediction ability and bias were estimated to compare the RFItest and RFIpop genomic breeding values (GEBV). The RFIpop ANOVA showed a higher significance level (p < 0.0001) than did the RFItest for the fixed effects. The RFIpop displayed a higher additive genetic variance estimate than the RFItest, although the RFIpop and RFItest displayed similar heritabilities. Overall, the RFItest showed higher residual correlations with growth, reproductive, and carcass traits, while the RFIpop displayed higher genetic correlations with such traits. The GEBV for the RFItest was slightly more biased than the GEBV for the RFIpop. The approach used to calculate the RFI influenced the decomposition and estimation of variance components and genomic prediction for RFI. The application of RFIpop would be more appropriate for genetic evaluation purposes, to adjust or correct for non-genetic effects and to decrease the prediction bias for RFI.
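The difference between RFItest and RFIpop can be made concrete with a small sketch. The abstract says only that RFI comes from "a regression equation", so the regressors used here are an assumption: the conventional form regresses dry matter intake on average daily gain and metabolic body weight and takes the residual. The function names are hypothetical, and any additional fixed effects the authors may have fitted are not represented.

```python
import numpy as np

def rfi(dmi, adg, mbw):
    """Residual of an ordinary least squares regression of DMI on ADG and MBW."""
    dmi = np.asarray(dmi, dtype=float)
    X = np.column_stack([np.ones(dmi.size),
                         np.asarray(adg, dtype=float),
                         np.asarray(mbw, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, dmi, rcond=None)
    return dmi - X @ beta

def rfi_by_test(dmi, adg, mbw, test_id):
    """RFItest analogue: fit a separate regression within each feeding test."""
    dmi = np.asarray(dmi, dtype=float)
    adg = np.asarray(adg, dtype=float)
    mbw = np.asarray(mbw, dtype=float)
    test_id = np.asarray(test_id)
    out = np.empty_like(dmi)
    for t in np.unique(test_id):
        in_test = test_id == t
        out[in_test] = rfi(dmi[in_test], adg[in_test], mbw[in_test])
    return out
```

Calling `rfi` on all animals at once gives the RFIpop analogue. Residuals from `rfi_by_test` average zero within every test, whereas pooled residuals average zero only across the whole population, which is one way to see why the two definitions partition variance differently.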


Subject(s)
Eating , Genome , Cattle/genetics , Animals , Eating/genetics , Phenotype , Genomics , Reproduction/genetics , Animal Feed
14.
Metabolites ; 14(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38276296

ABSTRACT

The meat market has enormous importance for the world economy, and the quality of the product offered to the consumer is fundamental for the success of the sector. In this study, we analyzed a database which contained information on 2470 animals from a commercial farm in the state of São Paulo, Brazil. Of this total, 2181 animals were genotyped, using 777,962 single-nucleotide polymorphisms (SNPs). After quality control analysis, 468,321 SNPs provided information on the number of genotyped animals. Genome-wide association analyses (GWAS) were performed for the characteristics of the rib eye area (REA), subcutaneous fat thickness (SFT), shear force at 7 days' ageing (SF7), and intramuscular fat (IMF), with the aid of the single-step genomic best linear unbiased prediction (ssGBLUP) method, with the purpose of identifying possible genomic windows (~1 Mb) responsible for explaining at least 0.5% of the genetic variance of the traits under analysis (≥0.5%). These genomic regions were used in a gene search and enrichment analyses using MeSH terms. The distributed heritability coefficients were 0.14, 0.20, 0.18, and 0.21 for REA, SFT, SF7, and IMF, respectively. The GWAS results indicated significant genomic windows for the traits of interest in a total of 17 chromosomes. Enrichment analyses showed the following significant terms (FDR ≤ 0.05) associated with the characteristics under study: for the REA, heat stress disorders and life cycle stages; for SFT, insulin and nonesterified fatty acids; for SF7, apoptosis and heat shock proteins (HSP27); and for IMF, metalloproteinase 2. In addition, KEGG (Kyoto encyclopedia of genes and genomes) enrichment analysis allowed us to highlight important metabolic pathways related to the studied phenotypes, such as the growth hormone synthesis, insulin-signaling, fatty acid metabolism, and ABC transporter pathways. 
The results obtained provide a better understanding of the molecular processes involved in the expression of the studied characteristics and may contribute to the design of selection strategies and future studies aimed at improving the productivity of Nellore cattle.

15.
Ciênc. rural (Online) ; 53(10): e20220327, 2023. tab, graf
Article in English | VETINDEX | ID: biblio-1418792

ABSTRACT

Quantile Random Forest (QRF) is a non-parametric methodology that combines the advantages of Random Forest (RF) and Quantile Regression (QR). Specifically, this approach can explore non-linear functions, determining the probability distribution of a response variable and extracting information from different quantiles instead of just predicting the mean. This study evaluated the performance of QRF in genomic prediction for traits with non-additive genetic architecture (epistasis and dominance). In addition, the accuracies obtained were compared with those derived from G-BLUP. The simulation created an F2 population of 1,000 individuals genotyped for 4,010 SNP markers. In addition, twelve traits were simulated from a model considering additive and non-additive effects, with QTL (quantitative trait loci) numbers ranging from eight to 120 and heritability of 0.3, 0.5, or 0.8. For training and validation, the 5-fold cross-validation approach was used. For each fold, the accuracies of all the proposed models were calculated: QRF at five different quantiles and three G-BLUP models (additive effects; additive and epistatic effects; additive and dominance effects). Finally, the predictive performance of these methodologies was compared. In all scenarios, the QRF accuracies were equal to or greater than those of the other methodologies evaluated, and QRF proved to be an alternative tool to predict genetic values in complex traits.


Quantile Random Forest (QRF) is a non-parametric methodology that combines the advantages of Random Forest (RF) and Quantile Regression (QR). Specifically, this approach can explore non-linear functions, determining the probability distribution of a response variable and extracting information from different quantiles instead of only predicting the mean. The objective of this work was to evaluate the performance of QRF in predicting genomic breeding values for traits with non-additive genetic architecture (epistasis and dominance). Additionally, the accuracies obtained were compared with those from G-BLUP. The simulation created an F2 population of 1,000 individuals genotyped for 4,010 SNP markers. In addition, twelve traits were simulated from a model considering additive and non-additive effects, with the number of QTL (quantitative trait loci) ranging from eight to 120 and heritability of 0.3, 0.5, or 0.8. The 5-fold cross-validation approach was used for training and validation. For each fold, the accuracies of all proposed models were calculated: QRF at five different quantiles and three G-BLUP models (additive effects; additive and epistatic effects; additive and dominance effects). Finally, the predictive performance of these methodologies was compared. In all scenarios, the QRF accuracies were equal to or higher than those of the other methodologies evaluated, and QRF proved to be an alternative tool for predicting genetic values in complex traits.


Subject(s)
Selection, Genetic , Genome , Genomics , Epistasis, Genetic , Random Forest
16.
Ciênc. rural (Online) ; 53(10): e20220327, 2023. tab, graf
Article in English | LILACS-Express | VETINDEX | ID: biblio-1430203

ABSTRACT

ABSTRACT: Quantile Random Forest (QRF) is a non-parametric methodology that combines the advantages of Random Forest (RF) and Quantile Regression (QR). Specifically, this approach can explore non-linear functions, determining the probability distribution of a response variable and extracting information from different quantiles instead of just predicting the mean. This study evaluated the performance of QRF in genomic prediction for traits with non-additive genetic architecture (epistasis and dominance). In addition, the accuracies obtained were compared with those derived from G-BLUP. The simulation created an F2 population of 1,000 individuals genotyped for 4,010 SNP markers. In addition, twelve traits were simulated from a model considering additive and non-additive effects, with QTL (quantitative trait loci) numbers ranging from eight to 120 and heritability of 0.3, 0.5, or 0.8. For training and validation, the 5-fold cross-validation approach was used. For each fold, the accuracies of all the proposed models were calculated: QRF at five different quantiles and three G-BLUP models (additive effects; additive and epistatic effects; additive and dominance effects). Finally, the predictive performance of these methodologies was compared. In all scenarios, the QRF accuracies were equal to or greater than those of the other methodologies evaluated, and QRF proved to be an alternative tool to predict genetic values in complex traits.


RESUMO: Quantile Random Forest (QRF) is a non-parametric methodology that combines the advantages of Random Forest (RF) and Quantile Regression (QR). Specifically, this approach can explore non-linear functions, determining the probability distribution of a response variable and extracting information from different quantiles instead of only predicting the mean. The objective of this work was to evaluate the performance of QRF in predicting genomic breeding values for traits with non-additive genetic architecture (epistasis and dominance). Additionally, the accuracies obtained were compared with those from G-BLUP. The simulation created an F2 population of 1,000 individuals genotyped for 4,010 SNP markers. In addition, twelve traits were simulated from a model considering additive and non-additive effects, with the number of QTL (quantitative trait loci) ranging from eight to 120 and heritability of 0.3, 0.5, or 0.8. The 5-fold cross-validation approach was used for training and validation. For each fold, the accuracies of all proposed models were calculated: QRF at five different quantiles and three G-BLUP models (additive effects; additive and epistatic effects; additive and dominance effects). Finally, the predictive performance of these methodologies was compared. In all scenarios, the QRF accuracies were equal to or higher than those of the other methodologies evaluated, and QRF proved to be an alternative tool for predicting genetic values in complex traits.

17.
BAG, J. basic appl. genet. (Online) ; 33(2): 45-53, Dec. 2022. graf
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1420296

ABSTRACT

Canine hip dysplasia (CHD) is a progressive and disabling disorder in large dog breeds, such as the German Shepherd. Selecting breeding animals free of dysplasia is the only way to reduce its incidence. Several diagnostic methods based on radiographic examination have been developed, and breeding animals are selected on that basis. CHD has a polygenic hereditary basis with environmental influence and a moderate to low heritability (approximately 0.20 to 0.40), so progress from phenotypic selection has been slow. In Argentina, the prevalence of dysplasia in the breed remains high (> 25%), and it is impossible to predict its incidence in the offspring of the breeding stock. Some countries have implemented selection based on estimated breeding values, achieving important progress. Genome-wide association studies have revealed numerous CHD-associated markers, and several candidate genes have been found, pointing to the possibility of implementing genomic selection in the near future.
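
The link between low heritability and slow progress under phenotypic selection can be illustrated with the standard breeder's equation. The abstract does not state this equation, so it is offered only as a sketch. Here $R$ is the response to selection per generation, $S$ is the selection differential, and $h^{2}$ is assumed to be the narrow-sense heritability.

```latex
R = h^{2} S, \qquad 0.20 \le h^{2} \le 0.40 \;\Rightarrow\; 0.20\,S \le R \le 0.40\,S
```

Under these assumptions, only about 20% to 40% of the phenotypic superiority of the selected parents would be expected to appear in their offspring.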

18.
Animals (Basel) ; 12(21)2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36359100

ABSTRACT

Tenderness is one of the main characteristics of meat because it determines its price and acceptability. This is the first bibliometric study on research trends concerning the role of genes in meat tenderness. A total of 175 original English-language articles published up to 2021 were retrieved from Scopus. The bibliometric analysis was carried out with VOSviewer (version 1.6.18, Eck and Waltman, Leiden, Netherlands) and complemented with the Analyze search results service from Scopus. Erroneous and duplicate data were eliminated, and incomplete information was added to standardize the results. Scientific production was evaluated by means of quantity, quality and structure indicators. At first glance, 8.816% of authors have published more than 50% of the papers, mainly related to genes encoding the calpain (CAPN)-calpastatin (CAST) system and single nucleotide polymorphisms (SNPs). Among other findings, a strong link was found between the contribution of the main countries (led by the United States with) and their institutions (led by the USDA Agricultural Research Service with) to their gross domestic product. Most studies on the topic are published in the Journal of Animal Science and in other journals with high impact according to the number of citations and different metrics. Finally, evaluation of the most cited articles and of the occurrence and association of the main keywords confirmed that research is focused on the role of CAPN and CAST genes and of SNPs in beef tenderness. A shift in the field was also highlighted: although marker-assisted selection is still used, genes have an infinitesimal effect on complex traits. Therefore, since about 2010, new research groups have adopted genomic selection to evaluate dense panels of SNPs and better explain genetic variation in meat tenderness.

19.
Genes (Basel) ; 13(8)2022 08 21.
Article in English | MEDLINE | ID: mdl-36011405

ABSTRACT

Genomic selection (GS) has changed the way plant breeders select genotypes. GS uses phenotypic and genotypic information to train a statistical machine learning model, which is then used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Many statistical machine learning methods have therefore been proposed for this task. Multi-trait (MT) genomic prediction models take advantage of correlated traits to improve prediction accuracy, so some multivariate statistical machine learning methods are popular for GS. In this paper, we compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), the MT partial least squares (PLS) and the MT random forest (RF) methods. Benchmarking was performed with six real datasets. We found that the three investigated methods produce similar results, but under predictors with environment (E) and genotype (G), that is, E + G, the MT GBLUP achieved superior performance, whereas under predictors E + G + genotype × environment (GE) and G + GE, random forest achieved the best results. We also found that the best predictions were achieved under the predictors E + G and E + G + GE. We also provide the R code for the implementation of these three statistical machine learning methods in the sparse kernel method (SKM) library, which offers not only options for single-trait prediction with various statistical machine learning methods but also some options for MT predictions that can help to better capture the complex patterns that are common in genomic selection datasets.
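
The predictor sets named in this abstract (E + G, E + G + GE, G + GE) can be pictured as different column blocks of a design matrix. The sketch below is only an illustration under stated assumptions: it is not the SKM library's R interface, the function names `one_hot` and `build_predictor` are hypothetical, and the GE term is represented as explicit products of environment indicators and marker codes, whereas kernel methods such as GBLUP usually encode the same terms through covariance matrices.

```python
def one_hot(labels):
    """Encode environment labels as indicator columns (sorted level order)."""
    levels = sorted(set(labels))
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def build_predictor(env_labels, markers, terms=("E", "G", "GE")):
    """Assemble one design-matrix row per observation from the chosen terms."""
    rows = []
    for e_row, g_row in zip(one_hot(env_labels), markers):
        row = []
        if "E" in terms:
            row += e_row                      # environment main effects
        if "G" in terms:
            row += list(g_row)                # marker (genotype) main effects
        if "GE" in terms:
            # interaction: each marker column within each environment
            row += [e * g for e in e_row for g in g_row]
        rows.append(row)
    return rows


# Toy data (assumption): three observations, two environments, two markers.
env_labels = ["env1", "env1", "env2"]
markers = [[0, 1], [2, 1], [1, 0]]

X_full = build_predictor(env_labels, markers)                    # E + G + GE
X_eg = build_predictor(env_labels, markers, terms=("E", "G"))    # E + G
print(len(X_full[0]), len(X_eg[0]))
```

With two environments and two markers, E + G gives 4 columns and adding GE gives 8, which shows how quickly the interaction block grows with marker number.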


Subject(s)
Genome , Genomics , Genomics/methods , Machine Learning , Phenotype , Plant Breeding/methods
20.
Anim Genet ; 53(5): 570-582, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35811456

ABSTRACT

This study aimed to integrate analyses of structural variations and differentially expressed genes (DEGs) associated with the beef fatty acid (FA) profile in Nellore cattle. Copy number variation (CNV) detection was performed using the PennCNV algorithm and CNVRuler software in 3794 animals genotyped with the High-Density Bovine BeadChip. To perform the genome-wide association study (GWAS), a total of 963 genotyped animals were selected to obtain the intramuscular lipid concentration and quantify the beef FA profile. A total of 48 animals belonging to the same farm and management lot were drawn from the 963 genotyped and phenotyped animals to carry out the transcriptomic and differentially expressed gene analyses. The GWAS with extreme groups of FA profiles was performed using a logistic model. A total of 43, 42, 66 and 35 significant CNV regions (p < 0.05) were identified for saturated, monounsaturated, polyunsaturated, and omega 3 and 6 fatty acids, respectively. Paired-end sequencing of the 48 samples was performed using the Illumina HiSeq2500 platform. Real-time quantitative PCR was used to validate the DEGs identified by RNA-seq analysis. The results showed several DEGs associated with the FA profile of Longissimus thoracis, such as BSCL2 and SAMD8. Enriched terms such as cellular response to corticosteroid stimulus (GO:0071384) and cellular response to glucocorticoid stimulus (GO:0071385) could be highlighted. The identification of structural variations harboring candidate genes for beef FA should contribute to elucidating the genetic basis of the FA composition of intramuscular fat in Nellore cattle. Our results will contribute to the identification of potential biomarkers for complex phenotypes, such as the FA profile, and to improving the reliability of genomic predictions by including pre-selected variants with differentiated weighting in the genomic models.


Subject(s)
Fatty Acids , Animals , Cattle/genetics , Fatty Acids/analysis , Gene Expression , Genotype , Phenotype , Reproducibility of Results