Search | BVS (Virtual Health Library) Research Portal

Marker effects and heritability estimates using additive-dominance genomic architectures via artificial neural networks in Coffea canephora.

Coelho de Sousa, Ithalo; Nascimento, Moysés; de Castro Sant'anna, Isabela; Teixeira Caixeta, Eveline; Ferreira Azevedo, Camila; Damião Cruz, Cosme; Lopes da Silva, Felipe; Ruas Alkimim, Emilly; Campana Nascimento, Ana Carolina; Vergara Lopes Serão, Nick.

PLoS One ; 17(1): e0262055, 2022.

Article in English | MEDLINE | ID: mdl-35081139

ABSTRACT

Many methodologies are used to predict genetic merit in animals and plants, but some of them require a priori assumptions that may increase the complexity of the model. Artificial neural networks (ANNs) have the advantage of not requiring a priori assumptions about the relationships between the inputs and the output, which allows great flexibility in handling different types of complex non-additive effects, such as dominance and epistasis. Despite this advantage, the biological interpretability of ANNs is still limited. The aim of this research was to estimate the heritability and marker effects for two traits in Coffea canephora using an additive-dominance architecture ANN and to compare the results with genomic best linear unbiased prediction (GBLUP). The data consist of 51 clones of the C. canephora varietal group Conilon, 32 of the varietal group Robusta, and 82 intervarietal hybrids. These 165 phenotyped individuals were genotyped for 14,387 SNPs. Due to the high computational cost of ANNs, we used a bagging decision tree to reduce the dimensionality of the data, selecting the markers that accumulated 70% of the total importance. ANNs with three hidden layers were fitted, with each layer varying from 1 to 40 neurons, for a total of 64,000 network architectures. The architectures with the best predictive ability were selected. The best architectures had 4, 15, and 33 neurons in the first, second, and third hidden layers, respectively, for yield, and 13, 20, and 24 neurons, respectively, for rust resistance. Predictive ability was greater with the three-hidden-layer ANN than with the one-hidden-layer ANN or GBLUP, reaching 0.72 and 0.88 for yield and coffee leaf rust resistance, respectively. The concordance rate (CR) of the 10% largest marker effects among the methods varied between 10% and 13.8% for additive effects and between 5.4% and 11.9% for dominance effects. The narrow-sense and dominance-only heritability estimates were 0.25 and 0.06, respectively, for yield, and 0.67 and 0.03, respectively, for rust resistance. The ANN was able to estimate heritabilities from an additive-dominance genomic architecture, and the ANN with three hidden layers achieved better predictive ability than GBLUP and the ANN with one hidden layer.
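Three of the steps described in this abstract are simple enough to sketch in a few lines: keeping the markers that accumulate 70% of the total importance, enumerating the 64,000 three-layer architectures, and computing a concordance rate between two sets of marker effects. The sketch below is illustrative only. The function names are hypothetical, the importance scores are assumed to come from a bagging decision tree fitted elsewhere, and the concordance definition used here (the share of markers that fall in the top 10% by absolute effect under both methods) is an assumption rather than the authors' stated formula.

```python
from itertools import product


def select_by_cumulative_importance(importances, threshold=0.70):
    """Return indices of the most important markers, taken in decreasing
    order of importance until they reach `threshold` of the total.
    (Hypothetical helper; importances are assumed to be precomputed.)"""
    total = sum(importances)
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    selected, running = [], 0.0
    for i in order:
        if running >= threshold * total:
            break
        selected.append(i)
        running += importances[i]
    return selected


def concordance_rate(effects_a, effects_b, top_fraction=0.10):
    """Share of markers ranked in the top fraction by absolute effect
    under both methods (assumed definition, for illustration)."""
    if len(effects_a) != len(effects_b):
        raise ValueError("both methods must score the same markers")
    k = max(1, int(len(effects_a) * top_fraction))

    def top(effects):
        ranked = sorted(range(len(effects)),
                        key=lambda i: abs(effects[i]), reverse=True)
        return set(ranked[:k])

    return len(top(effects_a) & top(effects_b)) / k


# Every combination of 1..40 neurons in three hidden layers: 40**3 = 64,000.
architectures = list(product(range(1, 41), repeat=3))

# Toy importances (made-up values): markers 1 and 3 together hold 75%.
toy_importances = [10, 50, 15, 25]
kept = select_by_cumulative_importance(toy_importances)  # [1, 3]
```

Sorting before accumulating means the smallest possible set of markers reaches the threshold, which is the point of using importance for dimensionality reduction.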

Subjects

Genomics

Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data.

Nascimento, Moysés; Silva, Fabyano Fonseca E; Sáfadi, Thelma; Nascimento, Ana Carolina Campana; Ferreira, Talles Eduardo Maciel; Barroso, Laís Mayara Azevedo; Ferreira Azevedo, Camila; Guimarães, Simone Eliza Faccione; Serão, Nick Vergara Lopes.

PLoS One ; 12(7): e0181195, 2017.

Article in English | MEDLINE | ID: mdl-28715507

ABSTRACT

Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Because of the large number of genes evaluated in GETS analysis, a useful strategy for summarizing biological functional processes and regulatory mechanisms is to cluster genes that present similar expression patterns over time. Traditional clustering methods usually ignore the challenges in GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering of short series of distribution-free data while accounting for the dependence between subsequent time points. Using simulated and real temporal RNA-seq data sets (the real data being four libraries of two pig breeds at 21, 40, 70, and 90 days of gestation), we present a methodology (ICAclust) that combines ICA with a hierarchical method for clustering GETS. We compare ICAclust results with those obtained with K-means clustering. ICAclust presented, on average, an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared to K-means, a traditional method used for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together.
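The second step of the two-step method, hierarchical clustering of genes by their temporal profiles, can be illustrated with a small agglomerative sketch. This is not the ICAclust implementation: the ICA transformation is assumed to have been applied upstream and is not shown, Euclidean distance and average linkage are assumptions chosen for simplicity, and the profiles are made-up values at four time points.

```python
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def hierarchical_clusters(points, n_clusters):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two closest clusters until `n_clusters` remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy profiles at four time points (hypothetical values, not from the paper).
profiles = [
    (1.0, 2.0, 3.0, 4.0),  # rising
    (1.1, 2.1, 2.9, 4.2),  # rising
    (4.0, 3.0, 2.0, 1.0),  # falling
    (3.9, 3.1, 2.2, 0.9),  # falling
    (2.0, 2.0, 2.0, 2.0),  # flat
]
groups = hierarchical_clusters(profiles, n_clusters=3)
```

With these toy values the two rising profiles, the two falling profiles, and the flat profile end up in separate groups. In practice the number of clusters would be chosen from the data rather than fixed in advance.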

Subjects

Algorithms; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; RNA; Animals; Cluster Analysis; Computer Simulation; Gene Expression Profiling/methods; Gene Expression Regulation, Developmental; Models, Molecular; Muscle, Skeletal/embryology; Muscle, Skeletal/metabolism; Oligonucleotide Array Sequence Analysis/methods; RNA/metabolism; Swine; Time Factors
