ABSTRACT
Linkage studies have successfully mapped loci underlying monogenic disorders, but mostly failed when applied to common diseases. Conversely, genome-wide association studies (GWASs) have identified replicable associations between thousands of SNPs and complex traits, yet capture less than half of the total heritability. In the present study we reconcile these two approaches by showing that linkage signals of height and body mass index (BMI) from 119,000 sibling pairs colocalize with GWAS-identified loci. Concordant with polygenicity, we observed the following: a genome-wide inflation of linkage test statistics; that GWAS results predict linkage signals; and that adjusting phenotypes for polygenic scores reduces linkage signals. Finally, we developed a method using recombination rate-stratified, identity-by-descent sharing between siblings to unbiasedly estimate heritability of height (0.76 ± 0.05) and BMI (0.55 ± 0.07). Our results imply that substantial heritability remains unaccounted for by GWAS-identified loci and this residual genetic variation is polygenic and enriched near these loci.
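The idea of estimating heritability from identity-by-descent (IBD) sharing between siblings can be illustrated with a classical Haseman-Elston-style regression. This is a minimal sketch, not the recombination rate-stratified method the abstract describes: it assumes standardized phenotypes, so that the expected cross-product of a sibling pair is a constant plus the pair's genome-wide IBD proportion times the heritability. The function name and the toy numbers are hypothetical.

```python
from statistics import mean


def haseman_elston_slope(ibd_sharing, cross_products):
    """OLS slope of sib-pair phenotype cross-products on IBD sharing.

    Under the assumed model E[y1 * y2 | pi] = c + pi * h2, the slope
    is an estimate of h2 (phenotypes standardized to variance 1).
    """
    if len(ibd_sharing) != len(cross_products):
        raise ValueError("inputs must have the same length")
    pi_bar = mean(ibd_sharing)
    z_bar = mean(cross_products)
    sxy = sum((p - pi_bar) * (z - z_bar)
              for p, z in zip(ibd_sharing, cross_products))
    sxx = sum((p - pi_bar) ** 2 for p in ibd_sharing)
    return sxy / sxx


# Hypothetical, noise-free toy data: cross-product = 0.10 + 0.80 * pi,
# where 0.10 stands in for shared-environment covariance.
ibd = [0.35, 0.42, 0.50, 0.55, 0.63]
cross = [0.10 + 0.80 * p for p in ibd]
h2_hat = haseman_elston_slope(ibd, cross)
print(round(h2_hat, 3))
```

Because realized IBD sharing between full siblings varies only narrowly around 0.5, real estimates of this slope are noisy, which is why very large numbers of sibling pairs are needed.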
ABSTRACT
The causes of temporal fluctuations in adult traits are poorly understood. Here, we investigate the genetic determinants of within-person trait variability of 8 repeatedly measured anthropometric traits in 50,117 individuals from the UK Biobank. We found that within-person (non-directional) variability had a SNP-based heritability of 2-5% for height, sitting height, body mass index (BMI) and weight (P ≤ 2.4 × 10⁻³). We also analysed longitudinal trait change and show a loss of both average height and weight beyond about 70 years of age. A variant tracking the Alzheimer's risk APOE-ε4 allele (rs429358) was significantly associated with weight loss (β = -0.047 kg per yr, s.e. 0.007, P = 2.2 × 10⁻¹¹), and using 2-sample Mendelian Randomisation we detected a relationship consistent with causality between decreased lumbar spine bone mineral density and height loss (b_xy = 0.011, s.e. 0.003, P = 3.5 × 10⁻⁴). Finally, population-level variance quantitative trait loci (vQTL) were consistent with within-person variability for several traits, indicating an overlap between trait variability assessed at the population or individual level. Our findings help elucidate the genetic influence on trait-change within an individual and highlight disease risks associated with these changes.
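A two-sample Mendelian randomisation estimate of the kind reported above can be sketched with the inverse-variance weighted (IVW) estimator. The abstract does not say which estimator was used, so IVW is an assumption here, chosen because it is a common default. The function name and the summary statistics below are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    b_xy = sum(bx * by / se_y^2) / sum(bx^2 / se_y^2)
    se   = sqrt(1 / sum(bx^2 / se_y^2))
    Uncertainty in the SNP-exposure effects is ignored (first-order weights).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by
              for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return num / den, (1.0 / den) ** 0.5


# Hypothetical instruments: every SNP implies the same ratio by / bx = 0.5.
bx = [0.10, 0.20, 0.15]   # SNP effects on the exposure
by = [0.05, 0.10, 0.075]  # SNP effects on the outcome
se_y = [0.01, 0.02, 0.01]  # standard errors of the outcome effects
b_xy, se_xy = ivw_estimate(bx, by, se_y)
print(round(b_xy, 3), round(se_xy, 4))
```

IVW assumes that the instruments affect the outcome only through the exposure; estimators that are robust to pleiotropy relax that assumption at some cost in precision.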
Subject(s)
Apolipoproteins E , Body Height , Body Mass Index , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Adult , Aged , Female , Humans , Male , Middle Aged , Alleles , Alzheimer Disease/genetics , Anthropometry , Apolipoproteins E/genetics , Body Height/genetics , Body Weight/genetics , Bone Density/genetics , Genome-Wide Association Study , Longitudinal Studies , Lumbar Vertebrae , Mendelian Randomization Analysis , UK Biobank , United Kingdom
ABSTRACT
We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ~7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.
Subject(s)
Genome-Wide Association Study , Molecular Sequence Annotation , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Humans , Molecular Sequence Annotation/methods , Genomics/methods , Genome, Human , Models, Genetic
ABSTRACT
Complement components have been linked to schizophrenia and autoimmune disorders. We examined the association between neonatal circulating C3 and C4 protein concentrations in 68,768 neonates and the risk of six mental disorders. We completed genome-wide association studies (GWASs) for C3 and C4 and applied the summary statistics in Mendelian randomization and phenome-wide association studies related to mental and autoimmune disorders. The GWASs for C3 and C4 protein concentrations identified 15 and 36 independent loci, respectively. We found no associations between neonatal C3 and C4 concentrations and mental disorders in the total sample (both sexes combined); however, post-hoc analyses found that a higher C3 concentration was associated with a reduced risk of schizophrenia in females. Mendelian randomization based on C4 summary statistics found an altered risk of five types of autoimmune disorders. Our study adds to our understanding of the associations between C3 and C4 concentrations and subsequent mental and autoimmune disorders.
ABSTRACT
Genome-wide association studies (GWASs) have been mostly conducted in populations of European ancestry, which currently limits the transferability of their findings to other populations. Here, we show, through theory, simulations and applications to real data, that adjustment of GWAS analyses for polygenic scores (PGSs) increases the statistical power for discovery across all ancestries. We applied this method to analyze seven traits available in three large biobanks with participants of East Asian ancestry (n = 340,000 in total) and report 139 additional associations across traits. We also present a two-stage meta-analysis strategy whereby, in contributing cohorts, a PGS-adjusted GWAS is rerun using PGSs derived from a first round of a standard meta-analysis. On average, across traits, this approach yields a 1.26-fold increase in the number of detected associations (range 1.07- to 1.76-fold increase). Altogether, our study demonstrates the value of using PGSs to increase the power of GWASs in underrepresented populations and promotes such an analytical strategy for future GWAS meta-analyses.
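The power gain from adjusting a GWAS for a polygenic score can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' pipeline: the PGS is assumed to be independent of the tested SNP (for example, built from other chromosomes), the adjustment is done by residualizing the phenotype on the PGS, and all effect sizes and the helper `ols` are hypothetical.

```python
import random
from statistics import mean


def ols(x, y):
    """Simple regression of y on x with intercept: slope, its SE, residuals."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    se = (sum(r * r for r in resid) / (n - 2) / sxx) ** 0.5
    return slope, se, resid


random.seed(1)
n = 2000
# Genotype dosages with probabilities 1/4, 1/2, 1/4 (allele frequency 0.5).
geno = [random.choice([0, 1, 1, 2]) for _ in range(n)]
# Hypothetical PGS capturing the rest of the genome, independent of the SNP.
pgs = [random.gauss(0, 1) for _ in range(n)]
pheno = [0.05 * g + 0.6 * s + random.gauss(0, 0.8)
         for g, s in zip(geno, pgs)]

# Standard association test.
beta_raw, se_raw, _ = ols(geno, pheno)
# PGS-adjusted test: remove the PGS from the phenotype, then test the SNP.
_, _, pheno_adj = ols(pgs, pheno)
beta_adj, se_adj, _ = ols(geno, pheno_adj)

print(f"unadjusted SE = {se_raw:.4f}, PGS-adjusted SE = {se_adj:.4f}")
```

The adjusted standard error is smaller because the PGS absorbs phenotypic variance that would otherwise count as residual noise, so the same SNP effect yields a larger test statistic.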
Subject(s)
East Asian People , Genome-Wide Association Study , Multifactorial Inheritance , Humans , Genome-Wide Association Study/methods , Phenotype , Polymorphism, Single Nucleotide , East Asian People/genetics
ABSTRACT
The genetic regulation of post-prandial glucose levels is poorly understood. Here, we characterise the genetic architecture of blood glucose variably measured within 0 and 24 h of fasting in 368,000 European ancestry participants of the UK Biobank. We found a near-linear increase in the heritability of non-fasting glucose levels over time, which plateaus to its fasting state value after 5 h post meal (h² = 11%; standard error: 1%). The genetic correlation between different fasting times is > 0.77, suggesting that the genetic control of glucose is largely constant across fasting durations. Accounting for heritability differences between fasting times leads to a ~16% improvement in the discovery of genetic variants associated with glucose. Newly detected variants improve the prediction of fasting glucose and type 2 diabetes in independent samples. Finally, we meta-analysed summary statistics from genome-wide association studies of random and fasting glucose (N = 518,615) and identified 156 independent SNPs explaining 3% of fasting glucose variance. Altogether, our study demonstrates the utility of random glucose measures to improve the discovery of genetic variants associated with glucose homeostasis, even in fasting conditions.
Subject(s)
Blood Glucose , Diabetes Mellitus, Type 2 , Humans , Blood Glucose/analysis , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study , Glucose , Fasting , Polymorphism, Single Nucleotide
ABSTRACT
Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes¹. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel²) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
Subject(s)
Body Height , Chromosome Mapping , Polymorphism, Single Nucleotide , Humans , Body Height/genetics , Gene Frequency/genetics , Genome, Human/genetics , Genome-Wide Association Study , Haplotypes/genetics , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Europe/ethnology , Sample Size , Phenotype
ABSTRACT
Purpose: Covariance between gray-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by confounding factors (e.g., age, head size, and scanner), which could lead to lower mapping precision (increased size of associated clusters) and create distal false-positive associations in mass-univariate vertexwise analyses. Approach: We evaluated this concern by performing state-of-the-art mass-univariate analyses (general linear model, GLM) on traits simulated from real vertex-wise gray matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies. Results: We showed that when performed on a large sample (N = 8662, UK Biobank), GLMs yielded a greatly inflated false positive rate (cluster false discovery rate > 0.6). We showed that LMMs resulted in more parsimonious results: smaller clusters and a reduced false positive rate, but at a cost of increased computation. Next, we performed mass-univariate association analyses on five real UKB traits (age, sex, BMI, fluid intelligence, and smoking status) and LMM yielded fewer and more localized associations. We identified 19 significant clusters displaying small associations with age, sex, and BMI, which suggest a complex architecture of at least dozens of associated areas with those phenotypes. Conclusions: The published literature could contain a large proportion of redundant (possibly confounded) associations that are largely prevented using LMMs. The parsimony of LMMs results from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching significance.
ABSTRACT
Schizophrenia has a heritability of 60-80%¹, much of which is attributable to common risk alleles. Here, in a two-stage genome-wide association study of up to 76,755 individuals with schizophrenia and 243,649 control individuals, we report common variant associations at 287 distinct genomic loci. Associations were concentrated in genes that are expressed in excitatory and inhibitory neurons of the central nervous system, but not in other tissues or cell types. Using fine-mapping and functional genomic data, we identify 120 genes (106 protein-coding) that are likely to underpin associations at some of these loci, including 16 genes with credible causal non-synonymous or untranslated region variation. We also implicate fundamental processes related to neuronal function, including synaptic organization, differentiation and transmission. Fine-mapped candidates were enriched for genes associated with rare disruptive coding variants in people with schizophrenia, including the glutamate receptor subunit GRIN2A and transcription factor SP4, and were also enriched for genes implicated by such variants in neurodevelopmental disorders. We identify biological processes relevant to schizophrenia pathophysiology; show convergence of common and rare variant associations in schizophrenia and neurodevelopmental disorders; and provide a resource of prioritized genes and variants to advance mechanistic studies.
Subject(s)
Genome-Wide Association Study , Schizophrenia , Alleles , Genetic Predisposition to Disease/genetics , Genomics , Humans , Polymorphism, Single Nucleotide/genetics , Schizophrenia/genetics
ABSTRACT
We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12-16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI's magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.
Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics
ABSTRACT
We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, 32-44% to introns, and 22-28% to distal 10-500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.
Subject(s)
Genome-Wide Association Study , Genomics , Multifactorial Inheritance/genetics , Bayes Theorem , Body Height , Body Mass Index , Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Genetic Techniques , Genetic Variation , Genotype , Humans , Introns , Models, Statistical , Open Reading Frames , Phenotype , Software
ABSTRACT
This study investigates whether genetic factors could contribute to the high rate of mood disorders reported in a U.S. community known to have a restricted early founder population (confirmed here through runs of homozygosity analysis). Polygenic scores (PGSs) for eight common diseases, disorders, or traits, including psychiatric disorders, were calculated in 274 participants (125 mood disorder cases) who each reported three or four grandparents born in the community. Ancestry-matched controls were selected from the UK Biobank (UKB; three sets of N = 1,822 each). The mean PGSs were significantly higher in the community for major depression (p = 2.1 × 10⁻¹⁹, 0.56 SD units), bipolar disorder (p = 2.5 × 10⁻¹⁵, 0.56 SD units), and schizophrenia (p = 3.8 × 10⁻²¹, 0.64 SD units). The PGSs were not significantly different between the community participants and UKB controls for the traits of body mass index, Type 2 diabetes, coronary artery disease, and chronotype. The mean PGSs for height were significantly lower in the community sample compared to controls (-0.21 SD units, p = 1.2 × 10⁻⁵). The results are consistent with enrichment of polygenic risk factors for psychiatric disorders in this community.
Subject(s)
Bipolar Disorder , Depressive Disorder, Major , Diabetes Mellitus, Type 2 , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics
ABSTRACT
Non-additive genetic variance for complex traits is traditionally estimated from data on relatives. It is notoriously difficult to estimate without bias in non-laboratory species, including humans, because of possible confounding with environmental covariance among relatives. In principle, non-additive variance attributable to common DNA variants can be estimated from a random sample of unrelated individuals with genome-wide SNP data. Here, we jointly estimate the proportion of variance explained by additive ($h^2_{\mathrm{SNP}}$), dominance ($\delta^2_{\mathrm{SNP}}$) and additive-by-additive ($\eta^2_{\mathrm{SNP}}$) genetic variance in a single analysis model. We first show by simulations that our model leads to unbiased estimates and provide a new theory to predict standard errors estimated using either least-squares or maximum likelihood. We then apply the model to 70 complex traits using 254,679 unrelated individuals from the UK Biobank and 1.1 M genotyped and imputed SNPs. We found strong evidence for additive variance (average across traits $\bar{h}^2_{\mathrm{SNP}} = 0.208$). In contrast, the average estimate of $\bar{\delta}^2_{\mathrm{SNP}}$ across traits was 0.001, implying negligible dominance variance at causal variants tagged by common SNPs. The average epistatic variance $\bar{\eta}^2_{\mathrm{SNP}}$ across the traits was 0.055, not significantly different from zero because of the large sampling variance. Our results provide new evidence that genetic variance for complex traits is predominantly additive and that sample sizes of many millions of unrelated individuals are needed to estimate epistatic variance with sufficient precision.
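Separating additive from dominance variance relies on genotype codings that are uncorrelated with each other. The sketch below uses one orthogonal parameterization that is common in SNP-based dominance analyses; whether it matches the coding used in this study is an assumption. It checks numerically that, under Hardy-Weinberg equilibrium, the two codings have zero covariance and unit variance. All function names are hypothetical.

```python
def additive_code(g, p):
    """Standardized additive coding; g counts the allele with frequency p."""
    return (g - 2 * p) / (2 * p * (1 - p)) ** 0.5


def dominance_code(g, p):
    """Standardized dominance deviation coding, orthogonal to additive."""
    raw = {0: 0.0, 1: 2 * p, 2: 4 * p - 2}[g]
    return (raw - 2 * p * p) / (2 * p * (1 - p))


def hwe_frequencies(p):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


p = 0.3  # hypothetical allele frequency
freqs = hwe_frequencies(p)
mean_d = sum(f * dominance_code(g, p) for g, f in freqs.items())
var_d = sum(f * dominance_code(g, p) ** 2 for g, f in freqs.items())
# Both codings have mean zero, so E[x_a * x_d] is their covariance.
cov_ad = sum(f * additive_code(g, p) * dominance_code(g, p)
             for g, f in freqs.items())
print(round(mean_d, 12), round(var_d, 12), round(cov_ad, 12))
```

With orthogonal codings, the additive and dominance variance components can be estimated jointly without one absorbing the other.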
Subject(s)
Datasets as Topic , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Biological Specimen Banks , Epistasis, Genetic , Female , Genotype , Humans , Male , Models, Genetic , Phenotype , Reproducibility of Results , United Kingdom
ABSTRACT
Genetic factors are recognized to contribute to peptic ulcer disease (PUD) and other gastrointestinal diseases, such as gastro-oesophageal reflux disease (GORD), irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD). Here, genome-wide association study (GWAS) analyses based on 456,327 UK Biobank (UKB) individuals identify 8 independent and significant loci for PUD at, or near, genes MUC1, MUC6, FUT2, PSCA, ABO, CDX2, GAST and CCKBR. There are previously established roles in susceptibility to Helicobacter pylori infection, response to counteract infection-related damage, gastric acid secretion or gastrointestinal motility for these genes. Only two associations have been previously reported for duodenal ulcer, here replicated trans-ancestrally. The results highlight the role of host genetic susceptibility to infection. Post-GWAS analyses for PUD, GORD, IBS and IBD add insights into relationships between these gastrointestinal diseases and their relationships with depression, a commonly comorbid disorder.
Subject(s)
Depression , Gastrointestinal Diseases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Helicobacter Infections/genetics , Helicobacter pylori/genetics , Peptic Ulcer/genetics , ABO Blood-Group System/genetics , Antigens, Neoplasm/genetics , CDX2 Transcription Factor/genetics , Duodenal Ulcer , Female , Fucosyltransferases/genetics , GPI-Linked Proteins , Galactosyltransferases , Gastroesophageal Reflux , Helicobacter Infections/complications , Humans , Inflammatory Bowel Diseases , Male , Mucin-1/genetics , Mucin-6/genetics , Neoplasm Proteins , Peptic Ulcer/complications , Galactoside 2-alpha-L-fucosyltransferase
ABSTRACT
Genetic association studies have identified 44 common genome-wide significant risk loci for late-onset Alzheimer's disease (LOAD). However, LOAD genetic architecture and prediction are unclear. Here we estimate the optimal P-threshold (Poptimal) of a genetic risk score (GRS) for prediction of LOAD in three independent datasets comprising 676 cases and 35,675 family history proxy cases. We show that the discriminative ability of GRS in LOAD prediction is maximised when selecting a small number of SNPs. Both simulation results and direct estimation indicate that the number of causal common SNPs for LOAD may be less than 100, suggesting LOAD is more oligogenic than polygenic. The best GRS explains approximately 75% of SNP-heritability, and individuals in the top decile of GRS have ten-fold increased odds when compared to those in the bottom decile. In addition, 14 variants are identified that contribute to both LOAD risk and age at onset of LOAD.
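Choosing an optimal P-threshold for a genetic risk score can be sketched as a search over candidate thresholds, scoring each by its discriminative ability (AUC). This is an illustration of the general approach rather than the authors' procedure: the effect sizes, P values, dosages and helper names below are all hypothetical, and in practice the threshold would be tuned in data independent of the discovery GWAS.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def grs(dosages, betas, pvals, threshold):
    """Weighted allele-count score using only SNPs with P <= threshold."""
    return [sum(b * d for b, pv, d in zip(betas, pvals, person)
                if pv <= threshold)
            for person in dosages]


def best_threshold(dosages, labels, betas, pvals, thresholds):
    """Threshold whose score gives the highest AUC."""
    return max(thresholds,
               key=lambda t: auc(grs(dosages, betas, pvals, t), labels))


# Hypothetical GWAS weights: two real signals and one noisy SNP.
betas = [0.5, 0.2, -0.4]
pvals = [1e-9, 1e-4, 0.2]
# Hypothetical dosages (rows = people) and case/control labels.
dosages = [(2, 1, 0), (1, 2, 2), (1, 1, 0), (0, 1, 0), (1, 0, 2), (0, 0, 1)]
labels = [1, 1, 1, 0, 0, 0]

thresholds = [5e-8, 1e-3, 1.0]
p_optimal = best_threshold(dosages, labels, betas, pvals, thresholds)
print(p_optimal, auc(grs(dosages, betas, pvals, p_optimal), labels))
```

In this toy example the intermediate threshold wins: the strictest threshold drops a real signal, while the most lenient one lets in a noisy SNP that misranks a case. A sparse optimum of this kind is what the abstract interprets as evidence for an oligogenic architecture.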
Subject(s)
Alzheimer Disease/genetics , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Adult , Age of Onset , Aged , Female , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
Vitamin D deficiency is a candidate risk factor for a range of adverse health outcomes. In a genome-wide association study of 25-hydroxyvitamin D (25OHD) concentration in 417,580 Europeans we identify 143 independent loci in 112 1-Mb regions, providing insights into the physiology of vitamin D and implicating genes involved in lipid and lipoprotein metabolism, dermal tissue properties, and the sulphonation and glucuronidation of 25OHD. Mendelian randomization models find no robust evidence that 25OHD concentration has causal effects on candidate phenotypes (e.g. BMI, psychiatric disorders), but many phenotypes have (direct or indirect) causal effects on 25OHD concentration, clarifying the epidemiological relationship between 25OHD status and the health outcomes examined in this study.
Subject(s)
Vitamin D Deficiency/genetics , Vitamin D/analogs & derivatives , Adult , Aged , Female , Genome-Wide Association Study , Humans , Male , Mendelian Randomization Analysis , Middle Aged , Polymorphism, Single Nucleotide , United Kingdom , Vitamin D/blood , Vitamin D Deficiency/blood , White People/genetics
ABSTRACT
Motivated by observational studies that report associations between schizophrenia and traits such as poor diet, increased body mass index and metabolic disease, we investigated the genetic contribution to dietary intake in a sample of 335,576 individuals from the UK Biobank study. A principal component analysis applied to diet question item responses generated two components: Diet Component 1 (DC1) represented a meat-related diet and Diet Component 2 (DC2) a fish and plant-related diet. Genome-wide association analysis identified 29 independent single-nucleotide polymorphisms (SNPs) associated with DC1 and 63 SNPs with DC2. Estimated from over 35,000 3rd-degree relative pairs that are unlikely to share close family environments, heritabilities for both DC1 and DC2 were 0.16 (standard error (s.e.) = 0.05). SNP-based heritability was 0.06 (s.e. = 0.003) for DC1 and 0.08 (s.e. = 0.004) for DC2. We estimated significant genetic correlations between both DCs and schizophrenia, and several other traits. Mendelian randomisation analyses indicated a negative uni-directional relationship between liability to schizophrenia and tendency towards selecting a meat-based diet (which could be direct or via unidentified correlated variables), but a bi-directional relationship between liability to schizophrenia and tendency towards selecting a fish and plant-based diet consistent with genetic pleiotropy.
Subject(s)
Genome-Wide Association Study , Schizophrenia , Biological Specimen Banks , Eating , Humans , Polymorphism, Single Nucleotide , Schizophrenia/genetics , United Kingdom
ABSTRACT
INTRODUCTION: Celiac disease is an autoimmune disorder where intestinal immunopathology arises after gluten consumption. Previous studies suggested that hookworm infection restores gluten tolerance; however, these studies were small (n = 12) and not placebo controlled. METHODS: We undertook a randomized, placebo-controlled trial of hookworm infection in 54 people with celiac disease. The 94-week study involved treatment with either 20 or 40 Necator americanus third-stage larvae (L3-20 or L3-40) or placebo, followed by escalating gluten consumption (50 mg/d for 12 weeks, 1 g intermittent twice weekly for 12 weeks, 2 g/d sustained for 6 weeks, liberal diet for 1 year). RESULTS: Successful study completion rates at week 42 (primary outcome) were similar in each group (placebo: 57%, L3-20: 37%, and L3-40: 44%; P = 0.61); however, gluten-related adverse events were significantly reduced in hookworm-treated participants. Median (range) adverse events per participant were as follows: placebo, 4 (1-9); L3-20, 1 (0-9); and L3-40, 0 (0-3) (P = 0.019). Duodenal villous height:crypt depth deteriorated similarly compared with their enrolment values in each group (mean change [95% confidence interval]: placebo, -0.6 [-1.3 to 0.2]; L3-20, -0.5 [-0.8 to 0.2]; and L3-40, -1.1 [-1.8 to 0.4]; P = 0.12). A retrospective analysis revealed that 9 of the 40 L3-treated participants failed to establish hookworm infections. Although week 42 completion rates were similar in hookworm-positive vs hookworm-negative participants (48% vs 44%, P = 0.43), quality of life symptom scores were lower in hookworm-positive participants after intermittent gluten challenge (mean [95% confidence interval]: 38.9 [33.9-44] vs 45.9 [39.2-52.6]). DISCUSSION: Hookworm infection does not restore tolerance to sustained moderate consumption of gluten (2 g/d) but was associated with improved symptom scores after consumption of lower, intermittent gluten doses.
Subject(s)
Celiac Disease/therapy , Glutens/immunology , Larva/metabolism , Necator americanus/metabolism , Therapy with Helminths/methods , Adolescent , Adult , Aged , Aged, 80 and over , Animals , Celiac Disease/immunology , Double-Blind Method , Female , Glutens/administration & dosage , Glutens/metabolism , Humans , Male , Middle Aged , Quality of Life , Therapy with Helminths/adverse effects , Treatment Outcome , Young Adult
ABSTRACT
Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
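Clumping and p value thresholding, the simplest baseline named above, can be sketched as a greedy selection over SNPs. This is an illustration of the baseline only, not of SBayesR. The function name, the default thresholds and the toy LD matrix are hypothetical, and real implementations also restrict clumping to a physical window around each index SNP.

```python
def clump_and_threshold(pvals, ld_r2, p_max=5e-8, r2_max=0.1):
    """Greedy clumping plus thresholding.

    Keeps SNPs with P <= p_max in order of significance, skipping any SNP
    whose LD r^2 with an already kept SNP exceeds r2_max.
    Returns the indices of the retained SNPs.
    """
    candidates = [i for i, p in enumerate(pvals) if p <= p_max]
    candidates.sort(key=lambda i: pvals[i])
    kept = []
    for i in candidates:
        if all(ld_r2[i][j] <= r2_max for j in kept):
            kept.append(i)
    return kept


# Hypothetical summary statistics for four SNPs.
pvals = [1e-12, 1e-10, 1e-9, 0.03]
# Hypothetical pairwise LD (r^2): SNP 0 and SNP 1 tag the same signal.
ld_r2 = [
    [1.00, 0.80, 0.00, 0.00],
    [0.80, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.05],
    [0.00, 0.00, 0.05, 1.00],
]
kept = clump_and_threshold(pvals, ld_r2)
print(kept)
```

The retained SNPs are then weighted by their marginal GWAS effects to form the predictor. Methods that model all SNPs jointly avoid both the hard cut-off and the double counting of correlated signals.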