1.
Cereb Cortex ; 33(8): 4829-4843, 2023 04 04.
Article in English | MEDLINE | ID: mdl-36190430

ABSTRACT

Functional magnetic resonance imaging has been used to identify complex brain networks by examining the correlation of blood-oxygen-level-dependent signals between brain regions during the resting state. Many of the brain networks identified in adults are detectable at birth, but genetic and environmental influences governing connectivity within and between these networks in early infancy have yet to be explored. We investigated genetic influences on neonatal resting-state connectivity phenotypes by generating intraclass correlations and performing mixed effects modeling to estimate narrow-sense heritability on measures of within-network and between-network connectivity in a large cohort of neonate twins. We also used backwards elimination regression and mixed linear modeling to identify specific demographic and medical history variables influencing within- and between-network connectivity in a large cohort of typically developing twins and singletons. Of the 36 connectivity phenotypes examined, only 6 showed narrow-sense heritability estimates greater than 0.10, with none being statistically significant. Demographic and obstetric history variables contributed to between- and within-network connectivity. Our results suggest that in early infancy, genetic factors minimally influence brain connectivity. However, specific demographic and medical history variables, such as gestational age at birth and maternal psychiatric history, may influence resting-state connectivity measures.


Subject(s)
Brain Mapping , Brain , Pregnancy , Female , Humans , Brain/diagnostic imaging , Phenotype , Rest , Magnetic Resonance Imaging , Neural Pathways/diagnostic imaging
2.
J Dairy Sci ; 107(3): 1561-1576, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37806624

ABSTRACT

Information on dry matter intake (DMI) and energy balance (EB) at the animal and herd level is important for management and breeding decisions. However, routine recording of these traits at commercial farms can be challenging and costly. Fourier-transform mid-infrared (FT-MIR) spectroscopy is a noninvasive technique applicable to a large cohort of animals that is routinely used to analyze milk components and is convenient for predicting complex phenotypes that are typically difficult and expensive to obtain on a large scale. We aimed to develop prediction models for EB and use the predicted phenotypes for genetic analysis. First, we assessed prediction equations using 4,485 phenotypic records from 167 Holstein cows from an experimental station. The phenotypes available were body weight (BW), milk yield (MY) and milk components, weekly-averaged DMI, and FT-MIR data from all milk samples available. We implemented mixed models with Bayesian approaches and assessed them through 50 randomized replicates of a 5-fold cross-validation. Second, we used the best prediction models to obtain predicted phenotypes of EB (EBp) and DMI (DMIp) on 5 commercial farms with 2,365 phenotypic records of MY, milk components and FT-MIR data, and BW from 1,441 Holstein cows. Third, we performed a GWAS and estimated heritability and genetic correlations for energy content in milk (EnM), BW, DMIp, and EBp using the genomic information available on the cows from commercial farms. The highest correlation between the predicted and observed phenotype ($r_{y,\hat{y}}$) was obtained with DMI (0.88) and EB (0.86), while predicting BW was, as anticipated, more challenging (0.69). In our study, models that included FT-MIR information performed better than models without spectra information in the 3 traits analyzed, with increments in prediction correlation ranging from 5% to 10%.
For the predicted phenotypes calculated by the prediction equations and data from the commercial farms, the heritability ranged between 0.11 and 0.16 for EnM, DMIp and EBp, and was 0.42 for BW. The genetic correlation of EnM with BW was -0.17, with DMIp 0.40, and with EBp -0.39. From the GWAS, we detected one significant QTL region for EnM, and 3 for BW, but none for DMIp and EBp. The results obtained in our study support previous evidence that FT-MIR information from milk samples contributes to improving the prediction equations for DMI, BW, and EB, and these predicted phenotypes may be used for herd management and contribute to the breeding strategy for improving cow performance.


Subject(s)
Breeding , Milk , Humans , Female , Animals , Cattle , Bayes Theorem , Body Weight , Farms
3.
Plant Cell ; 32(1): 139-151, 2020 01.
Article in English | MEDLINE | ID: mdl-31641024

ABSTRACT

The ability to predict traits from genome-wide sequence information (i.e., genomic prediction) has improved our understanding of the genetic basis of complex traits and transformed breeding practices. Transcriptome data may also be useful for genomic prediction. However, it remains unclear how well transcript levels can predict traits, particularly when traits are scored at different development stages. Using maize (Zea mays) genetic markers and transcript levels from seedlings to predict mature plant traits, we found that transcript and genetic marker models have similar performance. When the transcripts and genetic markers with the greatest weights (i.e., the most important) in those models were used in one joint model, performance increased. Furthermore, genetic markers important for predictions were not close to or identified as regulatory variants for important transcripts. These findings demonstrate that transcript levels are useful for predicting traits and that their predictive power is not simply due to genetic variation in the transcribed genomic regions. Finally, genetic marker models identified only 1 of 14 benchmark flowering-time genes, while transcript models identified 5. These data highlight that, in addition to being useful for genomic prediction, transcriptome data can provide a link between traits and variation that cannot be readily captured at the sequence level.


Subject(s)
Plant Genome/genetics , Multifactorial Inheritance , Transcriptome , Zea mays/genetics , Genetic Markers , Genetic Variation , Genome-Wide Association Study , Genomics , Genetic Models , Phenotype
4.
Genet Sel Evol ; 55(1): 57, 2023 Aug 07.
Article in English | MEDLINE | ID: mdl-37550618

ABSTRACT

BACKGROUND: Most genomic prediction applications in animal breeding use genotypes with tens of thousands of single nucleotide polymorphisms (SNPs). However, modern sequencing technologies and imputation algorithms can generate ultra-high-density genotypes (including millions of SNPs) at an affordable cost. Empirical studies have not produced clear evidence that using ultra-high-density genotypes can significantly improve prediction accuracy. However, (whole-genome) prediction accuracy is not very informative about the ability of a model to capture the genetic signals from specific genomic regions. To address this problem, we propose a simple methodology that detects chromosome regions for which a specific model (e.g., single-step genomic best linear unbiased prediction (ssGBLUP)) may fail to fully capture the genetic signal present in such segments, a phenomenon that we refer to as signal leakage. We propose to detect regions with evidence of signal leakage by testing the association of residuals from a pedigree or a genomic model with SNP genotypes. We discuss how this approach can be used to map regions with signals that are poorly captured by a model and to identify strategies to fix those problems (e.g., using a different prior or increasing marker density). Finally, we explored the proposed approach to scan for signal leakage of different models (pedigree-based, ssGBLUP, and various Bayesian models) applied to growth-related phenotypes (average daily gain and backfat thickness) in pigs. RESULTS: We report widespread evidence of signal leakage for pedigree-based models. Including a percentage of animals with SNP data in ssGBLUP reduced the extent of signal leakage. However, local peaks of missed signals remained in some regions, even when all animals were genotyped. Using variable selection priors solves leakage points that are caused by excessive shrinkage of marker effects.
Nevertheless, these models still miss signals in some regions due to low linkage disequilibrium between the SNPs on the array used and causal variants. Thus, we discuss how such problems could be addressed by adding sequence SNPs from those regions to the prediction model. CONCLUSIONS: Residual single-marker regression analysis is a simple approach that can be used to detect regional genomic signals that are poorly captured by a model and to indicate ways to fix such problems.


Subject(s)
Genome , Genomics , Animals , Swine , Bayes Theorem , Genomics/methods , Genotype , Phenotype , Single Nucleotide Polymorphism , Pedigree , Genetic Models
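
The residual single-marker regression described in entry 4 can be illustrated with a small sketch. This is not the authors' implementation: it assumes the residuals of some already-fitted pedigree or genomic model are available as a plain list, and the function name `residual_marker_scan` and the toy data are hypothetical. Each SNP is tested by regressing the residuals on its allele counts; a large statistic flags a marker whose signal the fitted model failed to capture.

```python
import math
import random


def residual_marker_scan(residuals, genotypes):
    """Regress model residuals on each SNP separately (hypothetical helper).

    residuals: n residuals from a previously fitted model.
    genotypes: n rows, each holding m allele counts (0, 1 or 2).
    Returns m t-like statistics (slope divided by its standard error).
    """
    n = len(residuals)
    m = len(genotypes[0])
    mean_r = sum(residuals) / n
    stats = []
    for j in range(m):
        x = [row[j] for row in genotypes]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            stats.append(0.0)  # monomorphic SNP carries no information
            continue
        sxy = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, residuals))
        slope = sxy / sxx
        # Residual sum of squares of the simple regression of residuals on SNP j.
        sse = sum(
            (ri - mean_r - slope * (xi - mean_x)) ** 2
            for xi, ri in zip(x, residuals)
        )
        se = math.sqrt(sse / (n - 2) / sxx)
        stats.append(slope / se if se > 0 else 0.0)
    return stats


# Toy data (assumption): the signal at SNP index 3 was missed by the fitted
# model, so it remains in the residuals.
random.seed(1)
n, m, leaked = 200, 10, 3
genotypes = [[random.choice((0, 1, 2)) for _ in range(m)] for _ in range(n)]
residuals = [1.0 * row[leaked] + random.gauss(0, 1) for row in genotypes]

stats = residual_marker_scan(residuals, genotypes)
top = max(range(m), key=lambda j: abs(stats[j]))
print("SNP with the strongest residual association:", top)
```

In a real analysis the statistics would be converted to p-values and corrected for multiple testing; the sketch stops at the per-SNP statistic.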
5.
BMC Infect Dis ; 22(1): 311, 2022 Mar 29.
Article in English | MEDLINE | ID: mdl-35351016

ABSTRACT

BACKGROUND: Knowing the age-specific rates at which individuals infected with SARS-CoV-2 develop severe and critical disease is essential for designing public policy, for infectious disease modeling, and for individual risk evaluation. METHODS: In this study, we present the first estimates of these rates using multi-country serology studies, and public data on hospital admissions and mortality from early to mid-2020. We combine these under a Bayesian framework that accounts for the high heterogeneity between data sources and their respective uncertainties. We also validate our results using an indirect method based on infection fatality rates and hospital mortality data. RESULTS: Our results show that the risk of severe and critical disease increases exponentially with age, but much less steeply than the risk of fatal illness. We also show that our results are consistent across several robustness checks. CONCLUSION: A complete evaluation of the risks of SARS-CoV-2 for health must take non-fatal disease outcomes into account, particularly in young populations where they can be 2 orders of magnitude more frequent than deaths.


Subject(s)
COVID-19 , Age Factors , Bayes Theorem , COVID-19/epidemiology , Humans , SARS-CoV-2 , Seroepidemiologic Studies
6.
Trends Genet ; 34(10): 746-754, 2018 10.
Article in English | MEDLINE | ID: mdl-30139641

ABSTRACT

Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.


Subject(s)
Big Data , Genome-Wide Association Study/trends , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Genomics , Genotype , Humans , Genetic Models , Single Nucleotide Polymorphism/genetics
7.
Heredity (Edinb) ; 127(5): 423-432, 2021 11.
Article in English | MEDLINE | ID: mdl-34564692

ABSTRACT

Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) led to 5-17% increases in accuracy, relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.


Subject(s)
Genetic Models , Zea mays , Genome , Genomics , Phenotype , Single Nucleotide Polymorphism , Zea mays/genetics
8.
Genet Sel Evol ; 53(1): 65, 2021 Aug 06.
Article in English | MEDLINE | ID: mdl-34362312

ABSTRACT

BACKGROUND: Analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: how useful can the microbiome be for complex trait prediction? Are estimates of microbiability reliable? Can the underlying biological links between the host's genome, microbiome, and phenome be recovered? METHODS: Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as inputs, and (ii) using variance-component approaches (Bayesian Reproducing Kernel Hilbert Space (RKHS) and Bayesian variable selection methods (Bayes C)) to quantify the proportion of phenotypic variance explained by the genome and the microbiome. The proposed simulation approach can mimic genetic links between the microbiome and genotype data by a permutation procedure that retains the distributional properties of the data. RESULTS: Using real genotype and rumen microbiota abundances from dairy cattle, simulation results suggest that microbiome data can significantly improve the accuracy of phenotype predictions, regardless of whether some microbiota abundances are under direct genetic control by the host or not. This improvement depends logically on the microbiome being stable over time. Overall, random-effects linear methods appear robust for variance components estimation, in spite of the typically highly leptokurtic distribution of microbiota abundances. The predictive performance of Bayes C was higher but more sensitive to the number of causative effects than RKHS. Accuracy with Bayes C depended, in part, on the number of microorganisms' taxa that influence the phenotype. 
CONCLUSIONS: While we conclude that, overall, genome-microbiome-links can be characterized using variance component estimates, we are less optimistic about the possibility of identifying the causative host genetic effects that affect microbiota abundances, which would require much larger sample sizes than are typically available for genome-microbiome-phenome studies. The R code to replicate the analyses is in https://github.com/miguelperezenciso/simubiome .


Subject(s)
Cattle/genetics , Gastrointestinal Microbiome , Genome-Wide Association Study/methods , Genome , Multifactorial Inheritance , Animals , Bayes Theorem , Cattle/microbiology , Computer Simulation , Phenotype
9.
PLoS Genet ; 11(5): e1005048, 2015 May.
Article in English | MEDLINE | ID: mdl-25942577

ABSTRACT

Whole-genome regression methods are being increasingly used for the analysis and prediction of complex traits and diseases. In human genetics, these methods are commonly used for inferences about genetic parameters, such as the amount of genetic variance among individuals or the proportion of phenotypic variance that can be explained by regression on molecular markers. This is so even though some of the assumptions commonly adopted for data analysis are at odds with important quantitative genetic concepts. In this article we develop theory that leads to a precise definition of parameters arising in high dimensional genomic regressions; we focus on the so-called genomic heritability: the proportion of variance of a trait that can be explained (in the population) by a linear regression on a set of markers. We propose a definition of this parameter that is framed within the classical quantitative genetics theory and show that the genomic heritability and the trait heritability parameters are equal only when all causal variants are typed. Further, we discuss how the genomic variance and genomic heritability, defined as quantitative genetic parameters, relate to parameters of statistical models commonly used for inferences, and indicate potential inferential problems that are assessed further using simulations. When a large proportion of the markers used in the analysis are in LE with QTL the likelihood function can be misspecified. This can induce a sizable finite-sample bias and, possibly, lack of consistency of likelihood (or Bayesian) estimates. This situation can be encountered if the individuals in the sample are distantly related and linkage disequilibrium spans over short regions. This bias does not negate the use of whole-genome regression models as predictive machines; however, our results indicate that caution is needed when using marker-based regressions for inferences about population parameters such as the genomic heritability.


Subject(s)
Genomics/methods , Genetic Models , Heritable Quantitative Trait , Bayes Theorem , Genetic Markers , Humans , Likelihood Functions , Linear Models , Linkage Disequilibrium , Statistical Models , Quantitative Trait Loci
10.
J Dairy Sci ; 101(10): 9135-9153, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30055916

ABSTRACT

The relationship of the estrous cycle to milk composition and milk physical properties was assessed on Holstein (n = 10,696), Brown Swiss (n = 20,501), Simmental (n = 17,837), and Alpine Grey (n = 8,595) cows reared in northeastern Italy. The first insemination after calving for each cow was chosen to be the day of estrus and insemination. Test days surrounding the insemination date (from 10 d before to 10 d after the day of the estrus) were selected and categorized in phases relative to estrus as diestrus high-progesterone, proestrus, estrus, metestrus, and diestrus increasing-progesterone phases. Milk components and physical properties were predicted on the basis of Fourier-transform infrared spectra of milk samples and were analyzed using a linear mixed model, which included the random effects of herd, the fixed classification effects of year-month, parity number, breed, estrous cycle phase, day nested within the estrous cycle phase, conception, partial regressions on linear and quadratic effects of days in milk nested within parity number, as well as the interactions between conception outcome with estrous cycle phase and breed with estrous cycle phase. Milk composition, particularly fat, protein, and lactose, showed clear differences among the estrous cycle phases. Fat increased by 0.14% from diestrus high-progesterone to estrous phase, whereas protein concomitantly decreased by 0.03%. Lactose appeared to remain relatively constant over diestrus high-progesterone, rising 1 d before the day of estrus followed by a gradual reduction over the subsequent phases. Specific fatty acids were also affected across the estrous cycle phases: C14:0 and C16:0 decreased (-0.34 and -0.48%) from proestrus to estrus with a concomitant increase in C18:0 and C18:1 cis-9 (0.40 and 0.73%). 
More general categories of fatty acids showed a similar behavior; that is, unsaturated fatty acids, monounsaturated fatty acids, polyunsaturated fatty acids, trans fatty acids, and long-chain fatty acids increased, whereas the saturated fatty acids, medium-chain fatty acids, and short-chain fatty acids decreased during the estrous phase. Finally, urea, somatic cell score, freezing point, pH, and homogenization index were also affected indicating variation associated with the hormonal and behavioral changes of cows in standing estrus. Hence, the variation in milk profiles of cows showing estrus should potentially be taken into account for precision dairy farming management.


Subject(s)
Cattle/physiology , Estrous Cycle/metabolism , Fatty Acids/analysis , Milk/chemistry , Animals , Female , Italy , Lactation , Pregnancy
11.
J Dairy Sci ; 101(3): 2496-2505, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29290427

ABSTRACT

Data on Holstein (16,890), Brown Swiss (31,441), Simmental (25,845), and Alpine Grey (12,535) cows reared in northeastern Italy were used to assess the ability of milk components (fat, protein, casein, and lactose) and Fourier transform infrared (FTIR) spectral data to diagnose pregnancy. Pregnancy status was defined as whether a pregnancy was confirmed by a subsequent calving and no other subsequent inseminations within 90 d of the breeding of specific interest. Milk samples were analyzed for components and FTIR full-spectrum data using a MilkoScan FT+ 6000 (Foss Electric, Hillerød, Denmark). The spectrum covered 1,060 wavenumbers (wn) from 5,010 to 925 cm⁻¹. Pregnancy status was predicted using generalized linear models with fat, protein, lactose, casein, and individual FTIR spectral bands or wavelengths as predictors. We also fitted a generalized linear model as a simultaneous function of all wavelengths (1,060 wn) with a Bayesian variable selection model using the BGLR R-package (https://r-forge.r-project.org/projects/bglr/). Prediction accuracy was determined using the area under a receiver operating characteristic curve based on a 10-fold cross-validation (CV-AUC) assessment based on sensitivities and specificities of phenotypic predictions. Overall, the best prediction accuracies were obtained for the model that included the complete FTIR spectral data. We observed similar patterns across breeds with small differences in prediction accuracy. The highest CV-AUC value was obtained for Alpine Grey cows (CV-AUC = 0.645), whereas Brown Swiss and Simmental cows had similar performance (CV-AUC = 0.630 and 0.628, respectively), followed by Holsteins (CV-AUC = 0.607).
For single-wavelength analyses, important peaks were detected at wn 2,973 to 2,872 cm⁻¹ where Fat-B (C-H stretch) is usually filtered, wn 1,773 cm⁻¹ where Fat-A (C=O stretch) is filtered, wn 1,546 cm⁻¹ where protein is filtered, wn 1,468 cm⁻¹ associated with urea and fat, wn 1,399 and 1,245 cm⁻¹ associated with acetone, and wn 1,025 to 1,013 cm⁻¹ where lactose is filtered. In conclusion, this research provides new insight into alternative strategies for pregnancy screening of dairy cows.


Subject(s)
Milk/chemistry , Animal Pregnancy , Fourier Transform Infrared Spectroscopy/veterinary , Animals , Caseins/analysis , Cattle , Female , Glycolipids/analysis , Glycoproteins/analysis , Italy , Lactose/analysis , Lipid Droplets , Milk Proteins/analysis , Pregnancy
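
The CV-AUC criterion used in entry 11 can be sketched as follows. The abstract does not say whether the AUC was averaged across folds or computed on pooled out-of-fold predictions; this sketch assumes the per-fold average. The helper names `auc` and `cv_auc` and the toy labels, scores, and fold assignments are hypothetical. The AUC is computed as the share of (pregnant, non-pregnant) pairs in which the pregnant record receives the higher predicted score, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cv_auc(labels, scores, fold_ids):
    """Mean per-fold AUC; `scores` are assumed to be out-of-fold predictions."""
    values = []
    for fold in sorted(set(fold_ids)):
        idx = [i for i, f in enumerate(fold_ids) if f == fold]
        values.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
    return sum(values) / len(values)


# Toy example (assumption): 8 records split into 2 folds.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5]
fold_ids = [0, 0, 0, 0, 1, 1, 1, 1]
print(cv_auc(labels, scores, fold_ids))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of roughly 0.61 to 0.65 in context.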
12.
New Phytol ; 213(2): 799-811, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27596807

ABSTRACT

Genome-wide association studies (GWAS) have been used extensively to dissect the genetic regulation of complex traits in plants. These studies have focused largely on the analysis of common genetic variants despite the abundance of rare polymorphisms in several species, and their potential role in trait variation. Here, we conducted the first GWAS in Populus deltoides, a genetically diverse keystone forest species in North America and an important short rotation woody crop for the bioenergy industry. We searched for associations between eight growth and wood composition traits, and common and low-frequency single-nucleotide polymorphisms detected by targeted resequencing of 18 153 genes in a population of 391 unrelated individuals. To increase power to detect associations with low-frequency variants, multiple-marker association tests were used in combination with single-marker association tests. Significant associations were discovered for all phenotypes and are indicative that low-frequency polymorphisms contribute to phenotypic variance of several bioenergy traits. Our results suggest that both common and low-frequency variants need to be considered for a comprehensive understanding of the genetic regulation of complex traits, particularly in species that carry large numbers of rare polymorphisms. These polymorphisms may be critical for the development of specialized plant feedstocks for bioenergy.


Subject(s)
Energy Metabolism/genetics , Genome-Wide Association Study , Populus/genetics , Heritable Quantitative Trait , Amino Acid Sequence , Plant Genes , Genetic Loci , Genetic Markers , Plant Proteins/chemistry , Plant Proteins/genetics , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis
13.
Theor Appl Genet ; 130(7): 1431-1440, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28401254

ABSTRACT

KEY MESSAGE: A new genomic model that incorporates genotype × environment interaction gave increased prediction accuracy of untested hybrid response for traits such as percent starch content, percent dry matter content and silage yield of maize hybrids. The prediction of hybrid performance (HP) is very important in agricultural breeding programs. In plant breeding, multi-environment trials play an important role in the selection of important traits, such as stability across environments, grain yield and pest resistance. Environmental conditions modulate gene expression causing genotype × environment interaction (G × E), such that the estimated genetic correlations of the performance of individual lines across environments summarize the joint action of genes and environmental conditions. This article proposes a genomic statistical model that incorporates G × E for general and specific combining ability for predicting the performance of hybrids in environments. The proposed model can also be applied to any other hybrid species with distinct parental pools. In this study, we evaluated the predictive ability of two HP prediction models using a cross-validation approach applied in extensive maize hybrid data, comprising 2724 hybrids derived from 507 dent lines and 24 flint lines, which were evaluated for three traits in 58 environments over 12 years; analyses were performed for each year. On average, genomic models that include the interaction of general and specific combining ability with environments have greater predictive ability than genomic models without interaction with environments (ranging from 12 to 22%, depending on the trait). We concluded that including G × E in the prediction of untested maize hybrids increases the accuracy of genomic models.


Subject(s)
Gene-Environment Interaction , Genomics/methods , Genetic Models , Zea mays/genetics , Environment , Plant Genome , Genotype , Genetic Hybridization , Statistical Models , Phenotype , Plant Breeding , Single Nucleotide Polymorphism
14.
Nat Rev Genet ; 11(12): 880-6, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21045869

ABSTRACT

Although genome-wide association studies have identified markers that are associated with various human traits and diseases, our ability to predict such phenotypes remains limited. A perhaps overlooked explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. We propose that alternative approaches, which are largely borrowed from animal breeding, provide potential for advances. We review selected methods and discuss the challenges and opportunities ahead.


Subject(s)
Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study , Disease/genetics , Humans
16.
PLoS Genet ; 9(7): e1003608, 2013.
Article in English | MEDLINE | ID: mdl-23874214

ABSTRACT

Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared ($R^2$) of G-BLUP reaches trait-heritability, asymptotically. However, under imperfect LD between markers and QTL, prediction $R^2$ based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by $(1-b)^2$, where $b$ is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore $b$ is close to one, inducing a small decrease in $R^2$. However, with distantly related individuals $b$ reaches very low values, imposing a very low upper bound on prediction $R^2$. Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on $R^2$ may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.


Subject(s)
Genome-Wide Association Study , Theoretical Models , Quantitative Trait Loci , Regression Analysis , Breeding , Genome , Humans , Linkage Disequilibrium , Phenotype , Single Nucleotide Polymorphism , Genetic Selection
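
The quantity b in entry 16 is described as the regression of marker-derived genomic relationships on those realized at causal loci, with the minimum decrease in prediction accuracy given by (1-b) squared. A minimal sketch of that arithmetic follows, assuming the pairwise relationships are already available as two equal-length lists. The function names and the toy values are hypothetical, and in practice the relationships at causal loci are unobserved, so this is only feasible in simulations.

```python
def regression_slope(y, x):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def min_accuracy_decrease(b):
    """The (1 - b)^2 term named in the abstract."""
    return (1.0 - b) ** 2


# Toy pairwise relationships (assumption): markers recover only half of the
# similarity realized at causal loci, as might happen with distant relatives.
causal_rel = [0.00, 0.10, 0.20, 0.30]
marker_rel = [0.5 * g for g in causal_rel]

b = regression_slope(marker_rel, causal_rel)
print(b, min_accuracy_decrease(b))
```

With b close to one, as the abstract reports for related individuals, the term approaches zero; as b falls toward zero it approaches one.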
17.
Ann Hum Genet ; 79(2): 122-35, 2015 03.
Article in English | MEDLINE | ID: mdl-25600682

ABSTRACT

Genome-wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS-significant single nucleotide polymorphisms has usually been small. This brought interest in the use of whole-genome regression (WGR) methods. However, there has been limited research on the factors that affect prediction accuracy (PA) of WGRs when applied to human data of distantly related individuals. Here, we examine, using real human genotypes and simulated phenotypes, how trait complexity, marker-quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability is dependent on the extent of marker-QTL LD. However, this parameter was not greatly affected by trait complexity. Regarding PA, our results indicated that: (a) under perfect marker-QTL LD, WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures; and (b) under imperfect marker-QTL LD, variable selection methods can achieve reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorated as trait complexity increased, and with highly complex traits variable selection and shrinkage methods both performed poorly. This was confirmed with an analysis of human height.


Subject(s)
Disease/genetics , Genome, Human , Models, Genetic , Quantitative Trait Loci , Computer Simulation , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Regression Analysis
18.
Theor Appl Genet ; 127(3): 595-607, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24337101

ABSTRACT

New methods that incorporate the main and interaction effects of high-dimensional markers and of high-dimensional environmental covariates gave increased prediction accuracy of grain yield in wheat across and within environments. In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow genomes to be characterized in great detail, and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of a (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16%) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17-34%) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods like the one presented here, which can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.
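
The abstract states that interactions between markers and ECs are modeled through covariance functions rather than by listing every marker-by-EC term. One construction that achieves this, assumed here for illustration and not spelled out in the abstract, is the elementwise (Hadamard) product of a marker-derived and an EC-derived covariance matrix: if every marker-by-EC product had its own independent effect with a common variance, the covariance among records would be proportional to that product. The sketch below uses hypothetical names and toy dimensions.

```python
import numpy as np


def linear_kernel(M):
    """Centered linear covariance function: Z Z' / number of columns."""
    Z = M - M.mean(axis=0)
    return Z @ Z.T / Z.shape[1]


def interaction_covariance(markers, ecs, line_idx, env_idx):
    """Record-level covariance of marker-by-EC interaction effects.

    markers: lines x SNPs, ecs: environments x covariates.
    line_idx / env_idx give the line and environment of each yield record.
    """
    G = linear_kernel(markers)[np.ix_(line_idx, line_idx)]
    W = linear_kernel(ecs)[np.ix_(env_idx, env_idx)]
    return G * W  # Hadamard (elementwise) product


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    markers = rng.normal(size=(5, 30))     # hypothetical: 5 lines, 30 markers
    ecs = rng.normal(size=(3, 8))          # hypothetical: 3 environments, 8 ECs
    line_idx = np.repeat(np.arange(5), 3)  # every line in every environment
    env_idx = np.tile(np.arange(3), 5)
    K = interaction_covariance(markers, ecs, line_idx, env_idx)
    print(K.shape)
```

The resulting matrix has one row and column per record, so its size does not grow with the number of markers or ECs, which is what keeps the interaction model tractable.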


Subject(s)
Genome, Plant , Models, Genetic , Triticum/genetics , Breeding , France , Gene-Environment Interaction , Genomics , Genotype , Phenotype , Quantitative Trait Loci , Selection, Genetic
20.
PLoS Genet ; 7(4): e1002051, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21552331

ABSTRACT

Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the "missing heritability" for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (~0.80), substantial room for improvement remains.
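
The gap between training and validation R² reported in the abstract can be reproduced qualitatively on simulated data. The sketch below is a toy simulation with unrelated individuals and does not use the Framingham data or the models from the paper. The sample sizes, marker count, heritability `h2`, and variance ratio `lam` are all assumed values.

```python
import numpy as np


def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)


def simulate_and_evaluate(n_train=300, n_valid=200, p=1000,
                          h2=0.5, lam=1.0, seed=0):
    """Return (training R², validation R²) for a ridge-type genomic model."""
    rng = np.random.default_rng(seed)
    n = n_train + n_valid
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # toy genotypes

    # Standardize markers with training-set means and standard deviations.
    mu = X[:n_train].mean(axis=0)
    sd = X[:n_train].std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd

    # Simulate a phenotype whose genetic part has variance h2.
    g = Z @ rng.normal(size=p)
    g = g / g.std() * np.sqrt(h2)
    y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

    Zt, Zv = Z[:n_train], Z[n_train:]
    yt, yv = y[:n_train], y[n_train:]

    # Fit in relationship-matrix form, then predict both samples.
    G = Zt @ Zt.T / p
    alpha = np.linalg.solve(G + lam * np.eye(n_train), yt - yt.mean())
    fit_train = yt.mean() + G @ alpha
    pred_valid = yt.mean() + (Zv @ Zt.T / p) @ alpha
    return r_squared(yt, fit_train), r_squared(yv, pred_valid)


if __name__ == "__main__":
    r2_train, r2_valid = simulate_and_evaluate()
    print(f"training R2 = {r2_train:.2f}, validation R2 = {r2_valid:.2f}")
```

With many more markers than subjects, the model can fit much of the training noise, so the in-sample value says little about how well new phenotypes will be predicted.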


Subject(s)
Body Height/genetics , Genome, Human , Quantitative Trait, Heritable , Bayes Theorem , Genome-Wide Association Study , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide