Abstract
Among the multi-trait models used to study several traits and environments jointly, the Bayesian framework has been a preferred tool for constructing more complex and biologically realistic models. Most studies using the Bayesian approach adopt non-informative prior distributions. However, the Bayesian approach yields more accurate estimates when informative prior distributions are used. The present study evaluated the efficiency and applicability of multi-trait multi-environment (MTME) models within a Bayesian framework, using a strategy for eliciting informative prior distributions from previous data on rice. The study involved data on rice (Oryza sativa L.) genotypes in three environments and five crop seasons (2010/2011 to 2014/2015) for the following traits: grain yield (GY), days to flowering (FLOR), and plant height (PH). Variance components and genetic and non-genetic parameters were estimated using the Bayesian method. In general, informative prior distributions in Bayesian MTME models provided higher estimates of individual narrow-sense heritability and variance components, as well as shorter highest probability density (HPD) intervals, compared with the corresponding analyses using non-informative prior distributions. More informative prior distributions made it possible to detect genetic correlations between traits, which could not be achieved with non-informative prior distributions. Therefore, the mechanism presented here for updating knowledge to elicit an informative prior distribution can be efficiently applied in rice breeding programs.
Subjects
Oryza/growth & development, Genetically Modified Food/statistics & numerical data
Abstract
Quantile Random Forest (QRF) is a non-parametric methodology that combines the advantages of Random Forest (RF) and Quantile Regression (QR). Specifically, this approach can explore non-linear functions, determine the probability distribution of a response variable, and extract information from different quantiles instead of predicting only the mean. This work aimed to evaluate the performance of QRF in predicting genomic breeding values for traits with non-additive genetic architecture (epistasis and dominance). In addition, the accuracies obtained were compared with those from G-BLUP. The simulation created an F2 population of 1,000 individuals genotyped for 4,010 SNP markers. Twelve traits were simulated from a model considering additive and non-additive effects, with the number of QTL (quantitative trait loci) ranging from eight to 120 and heritability of 0.3, 0.5, or 0.8. A 5-fold cross-validation approach was used for training and validation. For each fold, the accuracies of all proposed models were calculated: QRF at five different quantiles and three G-BLUP models (additive effects; additive and epistatic effects; additive and dominance effects). Finally, the predictive performance of these methodologies was compared. In all scenarios, the QRF accuracies were equal to or greater than those of the other methodologies evaluated, and QRF proved to be an alternative tool for predicting genetic values in complex traits.
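As an illustration of the validation scheme described in this abstract, a minimal Python sketch of 5-fold cross-validation follows. It assumes that accuracy is measured as the Pearson correlation between predicted and reference values in each held-out fold, which the abstract does not state. The names `k_fold_indices`, `cross_validated_accuracy`, `fit_line` and `predict_line` are hypothetical, and the one-predictor least-squares line is only a stand-in for a real predictor; it is neither QRF nor G-BLUP.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(x, y, fit, predict, k=5, seed=0):
    """Return one accuracy (assumed here: Pearson correlation) per fold."""
    accuracies = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        predictions = [predict(model, x[i]) for i in fold]
        accuracies.append(pearson(predictions, [y[i] for i in fold]))
    return accuracies


def fit_line(xs, ys):
    """Hypothetical stand-in model: simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope


def predict_line(model, x_value):
    intercept, slope = model
    return intercept + slope * x_value


# Hypothetical noise-free toy data, used only to exercise the functions.
x = [float(i) for i in range(100)]
y = [2.0 * v + 1.0 for v in x]
print(cross_validated_accuracy(x, y, fit_line, predict_line))
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the model, so the same loop could be reused for each method being compared.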
Subjects
Genetic Selection, Genome, Genomics, Genetic Epistasis, Random Forest Algorithm
Abstract
The development of efficient methods for genome-wide association studies (GWAS) between quantitative trait loci (QTL) and genetic values is extremely important to animal and plant breeding programs. Bayesian approaches that aim to select regions of single nucleotide polymorphisms (SNPs) have proved efficient, indicating genes with important effects. The selection criteria for SNPs or regions include the percentage of variance explained by genomic regions (%var), selection of tag SNPs, and selection based on the window posterior probability of association (WPPA). To also detect potentially associated regions, we proposed the posterior probability of the interval (PPint), which aims to select regions based on the markers of greatest effect. Therefore, the objective of this work was to evaluate these approaches in terms of their efficiency in selecting and identifying markers or regions located within or close to genes associated with traits. This study also aimed to compare these methodologies with single-marker analyses. To accomplish this, simulated data were used in six scenarios, with SNPs allocated to non-overlapping genomic regions. For traits with oligogenic inheritance, the WPPA criterion, followed by the %var and PPint criteria, was superior, presenting higher detection power and capturing higher percentages of genetic variance and larger areas. For traits with polygenic inheritance, the PPint and WPPA criteria were considered superior. Single-marker analyses identified associated SNPs only in oligogenic inheritance scenarios and performed worse than the other criteria.
Subjects
Genetic Variation, Bayes Theorem, Genetic Improvement/methods, Quantitative Trait Loci/genetics, Methods as Topic
Abstract
Principal component regression (PCR) and independent component regression (ICR) are dimensionality reduction methods of great importance in genomic prediction. These methods require choosing the number of components to include in the model. For PCR, formal criteria exist; for ICR, however, the criterion usually adopted selects the number of independent components (ICs) associated with the highest accuracy, which requires considerable computational time. In this study, seven criteria based on the number of principal components (PCs) and on variable selection methods are proposed to guide this choice in ICR and are evaluated on simulated and real data. For both datasets, the most efficient criterion, which also drastically reduced computational time, set the number of ICs equal to the number of PCs to reach a higher accuracy. In addition, the criteria did not recover the simulated heritability and generated biased genomic values.
Subjects
Oryza/genetics, Plant Breeding/methods, Regression Analysis, Forecasting/methods
Abstract
Empirical patterns of linkage disequilibrium (LD) can be used to increase the statistical power of genetic mapping. This study aimed to verify the efficacy of factor analysis (FA) applied to SNP molecular marker data sets in order to identify linkage groups and haplotype blocks. The SNP data set was derived from a simulated F2 population and contained 2,000 markers with information on 500 individuals. The FA factor loadings were estimated in two ways: from the matrix of distances between markers (A) and from the correlation matrix (R). The number of factors (k) was established from the scree plot and from the proportion of total variance explained. Results indicated that matrices A and R lead to similar results. Based on the scree plot, k was set to 10, and the factors were interpreted as representing linkage groups. The second criterion led to 50 factors, which were interpreted as representing haplotype blocks. This shows the potential of the technique, which yields results applicable to any type of population and helps corroborate the interpretation of genomic studies. The study demonstrated that FA was able to identify patterns of association between markers, identifying subgroups of markers that reflect linkage groups and also linkage disequilibrium groups.
Subjects
Genetic Techniques, Genetic Markers
Abstract
Genomic selection (GS) emphasizes the simultaneous prediction of the genetic effects of thousands of markers scattered over the genome. Several statistical methodologies have been used in GS to predict genetic merit. In general, such methodologies require certain assumptions about the data, such as normality of the distribution of phenotypic values. To circumvent non-normality of phenotypic values, the literature suggests the use of Bayesian Generalized Linear Regression (GBLASSO). Another alternative is machine learning models, represented by methodologies such as Artificial Neural Networks (ANN), Decision Trees (DT), and refinements of DT such as Bagging, Random Forest, and Boosting. This study aimed to use DT and its refinements to predict resistance to orange rust in Arabica coffee. Additionally, DT and its refinements were used to identify the importance of markers related to the trait of interest. The results were compared with those from GBLASSO and ANN. Data on coffee rust resistance of 245 Arabica coffee plants genotyped for 137 markers were used. The DT refinements presented Apparent Error Rate values equal to or lower than those obtained by DT, GBLASSO, and ANN. Moreover, the DT refinements were able to identify important markers for the trait of interest. Of the 14 most important markers analyzed in each methodology, 9.3 on average were in regions of quantitative trait loci (QTLs) related to disease resistance reported in the literature.
Subjects
Coffea/genetics, Coffea/parasitology, Fungi/growth & development, Fungi/pathogenicity, Artificial Intelligence
Abstract
The objective of this study was to fit nonlinear quantile regression models to describe total dry matter accumulation in garlic plants over time, and to compare them with models fitted by the ordinary least squares method. The total dry matter of nine garlic accessions belonging to the Vegetable Germplasm Bank of Universidade Federal de Viçosa (BGH/UFV) was measured at four stages (60, 90, 120 and 150 days after planting), and these values were used to fit the nonlinear regression models. For each accession, one quantile regression model (τ = 0.5) and one least squares model were fitted. The nonlinear regression model fitted was the logistic model. The Akaike Information Criterion was used to evaluate the goodness of fit of the models. Accessions were grouped using the UPGMA algorithm, with the estimates of the biologically interpretable parameters as variables. Nonlinear quantile regression was efficient for fitting models of dry matter accumulation in garlic plants over time. The estimated parameters were more uniform and robust in the presence of asymmetric data distributions, heterogeneous variances, and outliers.
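To make the contrast between the two fitting criteria in this abstract concrete, the following Python sketch evaluates a logistic curve under the quantile (check) loss and under squared error. The abstract does not give the parameterization of the logistic model, so the form `asymptote / (1 + exp(-rate * (t - midpoint)))` is an assumption, as are all function names, the dry matter values, and the parameter values. Only the four sampling days come from the abstract. No optimizer is included; the sketch shows only the objective functions that a fitting routine would minimize.

```python
import math


def logistic(t, asymptote, rate, midpoint):
    """Assumed logistic growth curve; the abstract does not specify the form."""
    return asymptote / (1.0 + math.exp(-rate * (t - midpoint)))


def check_loss(residual, tau):
    """Quantile-regression check loss: tau * r if r >= 0, (tau - 1) * r otherwise."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_objective(params, times, values, tau=0.5):
    """Sum of check losses; at tau = 0.5 this is half the sum of absolute residuals."""
    return sum(check_loss(y - logistic(t, *params), tau)
               for t, y in zip(times, values))


def least_squares_objective(params, times, values):
    """Sum of squared residuals, the ordinary least squares criterion."""
    return sum((y - logistic(t, *params)) ** 2 for t, y in zip(times, values))


days = [60, 90, 120, 150]             # days after planting, from the abstract
dry_matter = [2.0, 9.0, 21.0, 27.0]   # hypothetical values for illustration
params = (30.0, 0.05, 100.0)          # hypothetical asymptote, rate, midpoint

print(quantile_objective(params, days, dry_matter, tau=0.5))
print(least_squares_objective(params, days, dry_matter))
```

Because the check loss grows linearly rather than quadratically with the residual, a single outlying observation influences the quantile objective less than the least squares objective, which is consistent with the robustness reported in the abstract.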
Subjects
Regression Analysis, Garlic
Abstract
Genome-wide selection (GWS) is based on a large number of markers widely distributed throughout the genome. It allows the effect of each molecular marker on the phenotype to be estimated, thereby capturing all genes affecting the quantitative traits of interest. The main statistical tools applied to GWS are based on random regression or dimensionality reduction methods. In this study, a new non-parametric method called Delta-p was proposed and compared with the Genomic Best Linear Unbiased Predictor (G-BLUP) method. Furthermore, a new selection index combining the genetic values obtained by G-BLUP and Delta-p, named the Delta-p/G-BLUP index, was proposed. The efficiency of the proposed methods was evaluated through both simulated and real data. The simulated data consisted of eight scenarios combining two levels of heritability, two genetic architectures, and two dominance statuses (absence and complete dominance). Each scenario was simulated ten times. All methods were applied to a real dataset of Asian rice (Oryza sativa) with the aim of increasing the efficiency of a current breeding program. The methods were compared with regard to prediction accuracy (simulated data) or predictive ability (real dataset), bias, and recovery of the true genomic heritability. The results indicated that the proposed Delta-p/G-BLUP index outperformed the other methods in both prediction accuracy and predictive ability.
Abstract
The aim of this study was to use quantile regression (QR) to characterize the effect of the adaptability parameter throughout the distribution of the productivity variable in black bean cultivars released by different national research institutions over the last 50 years. For this purpose, 40 cultivars developed by Brazilian breeding programs between 1959 and 2013 were used. Initially, QR models were fitted considering three quantiles (τ = 0.2, 0.5 and 0.8). Subsequently, the confidence intervals showed that the quantile models for τ = 0.2 and 0.8 (QR0.2 and QR0.8) differed in the adaptability parameter and in average productivity. Finally, by grouping the cultivars into one of the two groups defined by QR0.2 and QR0.8, it was found that the newer cultivars were associated with the quantile τ = 0.8, i.e., those with higher yields and greater responsiveness to environmental conditions, indicating that breeding over the last 50 years increased both the productivity and the adaptability of the cultivars.
Abstract
Hypocotyl length has been highlighted as a potential descriptor of the soybean crop. However, no information is available in the published literature on its behavior over several planting seasons. The present study aimed to identify soybean cultivars with stable and predictable hypocotyl length through neural networks and traditional adaptability and stability methodologies. Sixteen soybean cultivars were analyzed in six planting seasons under greenhouse conditions. In each season, a randomized block design with four replications was adopted. The experimental unit consisted of three plants, and the plot mean was used in the analysis. Hypocotyl length data were analyzed by analysis of variance and Tukey's test. Analyses were then carried out using the Traditional Method, Plaisted and Peterson, Wricke, Eberhart and Russell, and Artificial Neural Networks. Significant effects (p<0.01 by the F test) were identified for the Cultivar x Planting Season interaction, Planting Seasons, and Cultivars. Cultivars BRS810C, BRSMG760SRR, TMG1175RR, and BMX Tornado RR showed lower means, high stability, and general adaptability for hypocotyl length, whereas cultivar BG4272 presented a higher mean, high stability, and general adaptability. Identifying soybean cultivars with predictable and stable hypocotyl length contributes to soybean breeding, as it furthers knowledge of this potential descriptor and opens the possibility of increasing the number of descriptors.
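Of the stability methods named in this abstract, Wricke's ecovalence has a simple closed form, sketched below in Python as an illustration. For cultivar i it sums, over environments j, the squared interaction residual (Y_ij - mean of cultivar i - mean of environment j + grand mean). The function name `wricke_ecovalence` and the example table are hypothetical. The sketch works on a cultivar-by-season table of means and omits the scaling by the number of replications that some presentations apply, so it is not necessarily the exact computation used in the study.

```python
def wricke_ecovalence(table):
    """Unscaled Wricke ecovalence per row of a cultivar-by-environment table of means.

    table[i][j] is the mean of cultivar i in environment j.
    A smaller value indicates a smaller contribution to the interaction,
    i.e. a more stable cultivar.
    """
    n_cultivars = len(table)
    n_envs = len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_cultivars * n_envs)
    cultivar_means = [sum(row) / n_envs for row in table]
    env_means = [sum(table[i][j] for i in range(n_cultivars)) / n_cultivars
                 for j in range(n_envs)]
    return [
        sum((table[i][j] - cultivar_means[i] - env_means[j] + grand_mean) ** 2
            for j in range(n_envs))
        for i in range(n_cultivars)
    ]


# Hypothetical hypocotyl-length means: 3 cultivars x 3 planting seasons.
means = [
    [10.0, 12.0, 14.0],
    [11.0, 13.0, 15.0],
    [16.0, 12.0, 11.0],
]
print(wricke_ecovalence(means))
```

In this made-up table the first two cultivars respond to the seasons in parallel while the third does not, so the third receives the largest ecovalence.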
Abstract
Rice cultivation has great national and global importance: rice is one of the most produced and consumed cereals in the world and the primary food for more than half of the world's population. Because of its importance as food, developing efficient methods to predict and select genetically superior individuals for plant traits is extremely important for breeding programs. The objective of this research was to evaluate and compare the efficiency of Delta-p, G-BLUP (Genomic Best Linear Unbiased Predictor), BayesCpi, BLASSO (Bayesian Least Absolute Shrinkage and Selection Operator), the Delta-p/G-BLUP index, the Delta-p/BayesCpi index, and the Delta-p/BLASSO index in estimating genomic values and the effects of single nucleotide polymorphism (SNP) markers on phenotypic data associated with rice traits. The use of molecular markers allows high selective efficiency and increased genetic gain per unit time. The Delta-p method uses the concept of change in allelic frequency caused by selection and the theoretical concept of genetic gain. The index is based on the principle of combined selection, jointly using the additive genomic values predicted via G-BLUP, BayesCpi, or BLASSO and via Delta-p. These methods were applied and compared for genomic prediction using nine rice (Oryza sativa) traits: flag leaf length, flag leaf width, number of panicles per plant, primary panicle branch number, seed length, seed width, amylose content, protein content, and blast resistance. The Delta-p/G-BLUP index had the highest predictive abilities for the traits studied, except for amylose content, for which BayesCpi had the highest predictive ability, approximately 3% greater than that of the Delta-p/G-BLUP index.
Subjects
Oryza/genetics, Oryza/growth & development, Genetic Improvement/methods, Genome Components, Single Nucleotide Polymorphism, Genetically Modified Plants
Abstract
We aimed to apply genomic information based on SNP (single nucleotide polymorphism) markers to the genetic evaluation of the traits stay-green (SG), plant architecture (PA), grain aspect (GA), and grain yield (GY) in common bean through Bayesian models. These models were compared in terms of prediction accuracy and ability to estimate heritability for each trait. A total of 80 cultivars were genotyped for 377 SNP markers, whose allele substitution effects were estimated by five different Bayesian models: Bayes A (BA), Bayes B (BB), Bayes C (BC), Bayesian LASSO (BL), and Bayesian ridge regression (BRR). Although the prediction accuracies calculated by cross-validation were similar within each trait, the BB model stood out for SG, whereas BRR was indicated for the remaining traits. The heritability estimates for SG, PA, GA, and GY were 0.61, 0.28, 0.32, and 0.29, respectively. In summary, the Bayesian methods applied here were effective and easy to implement. The SNP markers used can help in the early selection of promising genotypes, since incorporating genomic information increases the prediction accuracy of the estimated genetic merit.
Assuntos
Phaseolus/crescimento & desenvolvimento , Phaseolus/genética , Polimorfismo de Nucleotídeo Único , Genoma , Teorema de Bayes
Resumo
Plant growth analyses are important because they generate information on the demands and care required at each development stage of a plant. Nonlinear regression models are appropriate for describing growth curves, since they include parameters with practical biological interpretation. However, these models present information in terms of the conditional mean, and they are subject to fitting problems caused by possible outliers or asymmetry in the distribution of the data. Quantile regression can overcome these problems, and it allows the estimation of different quantiles, generating more complete and robust results. The objective of this research was to fit a nonlinear quantile regression model to study dry matter accumulation in garlic plants (Allium sativum L.) over time, estimating parameters at three different quantiles and classifying each garlic accession according to its growth rate and asymptotic weight. The nonlinear regression model fitted was a Logistic model, and 30 garlic accessions were evaluated. These 30 accessions were grouped according to the quantile curve whose estimates were closest to their own; 12 accessions were classified as of lesser interest for planting, 6 were classified as intermediate, and 12 were classified as of greater interest for planting.(AU)
Análises de crescimento de plantas são importantes, pois geram informações sobre a demanda e os cuidados necessários para cada etapa de seu desenvolvimento. Modelos de regressão não linear são apropriados para descrever curvas de crescimento por apresentarem parâmetros com interpretação prática biológica. Entretanto, estes modelos apresentam informações em termos médios, e estão sujeitos a problemas no ajuste proporcionados por possíveis valores extremos ou assimetria na distribuição dos dados. A regressão quantílica pode contornar estes problemas, e ainda permite estimativas de diferentes quantis, gerando resultados mais completos e robustos. Assim, o objetivo deste trabalho foi ajustar um modelo de regressão quantílica não linear para o estudo do acúmulo de matéria seca em plantas de alho (Allium sativum L.) ao longo do tempo, estimando seus parâmetros em três diferentes quantis e classificando cada acesso de alho de acordo com sua taxa de crescimento e peso assintótico. O modelo de regressão não linear ajustado foi o Logístico, e foram utilizados 30 acessos de alho. Estes foram divididos de acordo com a curva do quantil de estimativas mais próximas, sendo classificados 12 acessos como de baixo interesse para o plantio, 6 de interesse intermediário e 12 como de alto interesse.(AU)
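The idea of fitting a Logistic growth curve at several quantiles can be illustrated with a minimal sketch. It is not the authors' procedure: the parameterization of `logistic`, the data values, the fixed `beta` and `k`, and the crude grid search over the asymptote are all assumptions made for illustration. The only standard ingredient is the check (pinball) loss, which quantile regression minimizes in place of squared error.

```python
import math

def logistic(t, alpha, beta, k):
    """Logistic growth curve (assumed form): alpha = asymptotic weight, k = growth rate."""
    return alpha / (1.0 + math.exp(beta - k * t))

def pinball_loss(observed, fitted, tau):
    """Mean check (pinball) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, f in zip(observed, fitted):
        r = y - f
        # Positive residuals are weighted by tau, negative ones by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(observed)

# Hypothetical data: days after planting and dry matter (g) of one accession.
days = [30, 60, 90, 120, 150]
dry_matter = [1.0, 4.0, 11.0, 17.0, 19.0]

def loss_for(alpha, tau):
    """Loss of the curve with asymptote alpha; beta and k are fixed for simplicity."""
    fitted = [logistic(t, alpha, 6.0, 0.07) for t in days]
    return pinball_loss(dry_matter, fitted, tau)

# Crude grid search over the asymptote only (10.0 to 30.0 in steps of 0.5).
grid = [10.0 + 0.5 * i for i in range(41)]
alpha_by_tau = {
    tau: min(grid, key=lambda a, tau=tau: loss_for(a, tau))
    for tau in (0.25, 0.5, 0.75)
}
print(alpha_by_tau)
```

A real analysis would estimate all three parameters jointly with a numerical optimizer; the grid search is used here only to keep the sketch dependency-free.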
Assuntos
Alho/crescimento & desenvolvimento , Produtos Agrícolas/crescimento & desenvolvimento , Análise de Regressão , Modelos Logísticos , Desenvolvimento Vegetal
Resumo
The aim of this research was to evaluate the dimensional reduction of additive direct genetic covariance matrices in genetic evaluations of growth traits (weights from 100 to 730 days of age) in Simmental cattle using principal components, as well as to estimate (co)variance components and genetic parameters. Principal component analyses were conducted for five different models: one full-rank multi-trait model and four reduced-rank models. Models were compared using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Variance components and genetic parameters were estimated by restricted maximum likelihood (REML). The AIC and BIC values were similar among models, indicating that more parsimonious models could be used in genetic evaluations of Simmental cattle. The first principal component explained more than 96% of the total additive direct genetic variance in both models. Heritability estimates were higher at more advanced ages and varied from 0.05 (100 days) to 0.30 (730 days). Genetic correlation estimates were similar across models regardless of the number of principal components adopted. The first principal component alone was sufficient to explain almost all of the genetic variance. Furthermore, the similarity of the genetic parameters and the lower computational requirements support the use of parsimonious models in genetic evaluations of growth traits in Simmental cattle.(AU)
Objetivou-se estudar a efetividade da redução da dimensão da matriz de covariância do efeito genético direto na avaliação genética do crescimento (pesos dos 100 aos 730 dias de idade) de bovinos Simental, por meio da análise de componentes principais, e estimar componentes de (co)variância e parâmetros genéticos. A análise de componentes principais foi realizada ajustando-se cinco diferentes modelos: um modelo multicaracterístico padrão, de posto completo, e quatro modelos de posto reduzido. Os modelos foram comparados via informação de Akaike (AIC) e informação Bayesiana de Schwarz (BIC). Os componentes de variância e parâmetros genéticos foram obtidos via REML. Os valores de AIC e BIC para os modelos testados foram similares, indicando a possibilidade da escolha de um modelo mais parcimonioso na avaliação genética da raça Simental. O primeiro componente principal explicou mais de 96% de toda variação genética aditiva direta em ambos os modelos. Os valores de herdabilidades foram maiores em idades mais avançadas e variaram de 0,05 (peso aos 100 dias) a 0,30 (peso aos 730 dias). As estimativas de correlações genéticas foram similares em todos os modelos e apresentaram mesma magnitude e comportamento independentemente do número de componentes principais adotado. Diante dos resultados, pode-se afirmar que apenas o primeiro componente principal foi suficiente para explicar quase que na totalidade a variação genética aditiva direta existente. Além disso, a similaridade dos parâmetros genéticos estimados e a menor demanda computacional são indicativos da possibilidade da utilização de modelos mais parcimoniosos na avaliação genética de bovinos Simental.(AU)
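The statement that the first principal component explains most of the genetic variance can be made concrete with a small sketch. The share of variance attributed to the first component is the largest eigenvalue of the covariance matrix divided by its trace. The matrix `G` below is hypothetical (it is not taken from the study), and power iteration is used only to avoid external dependencies; a numerical library's symmetric eigenvalue routine would normally be preferred.

```python
import math

def leading_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix.

    Uses power iteration from a vector of equal entries; this assumes the
    matrix is non-zero and the start vector is not orthogonal to the
    leading eigenvector.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient of the (unit-length) converged vector.
    return sum(v[i] * av[i] for i in range(n))

def first_pc_share(matrix):
    """Proportion of total variance (the trace) explained by the first principal component."""
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    return leading_eigenvalue(matrix) / trace

# Hypothetical additive genetic covariance matrix for weights at three ages.
G = [
    [4.0, 7.0, 9.0],
    [7.0, 16.0, 19.0],
    [9.0, 19.0, 25.0],
]
print(round(first_pc_share(G), 3))
```

When the traits are highly correlated, as with repeated weights of the same animal, this share is close to 1, which is what makes a reduced-rank model attractive.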
Assuntos
Animais , Bovinos , Bovinos/crescimento & desenvolvimento , Bovinos/genética , Aumento de Peso/genética
Resumo
Analysis using Artificial Neural Networks has been described as an approach to decision-making that, although incipient, has been reported as having high potential for use in animal and plant breeding. In this study, we introduce the procedure of using an expanded data set for training the network. We also propose using statistical parameters, in addition to the mean phenotypic value, to estimate the breeding value of genotypes in simulated scenarios with a feed-forward back-propagation multilayer perceptron network. After evaluating artificial neural network configurations, our results showed its superiority to estimates based on linear models, as well as its applicability in the genetic value prediction process. The results further indicated the good generalization performance of the neural network model in several additional validation experiments.
Assuntos
Melhoramento Vegetal/métodos , Moldes Genéticos , Redes Neurais de Computação , Simulação por Computador