Results 1 - 20 of 31
2.
Cell Genom ; 3(12): 100457, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38116117

ABSTRACT

Complement components have been linked to schizophrenia and autoimmune disorders. We examined the association between neonatal circulating C3 and C4 protein concentrations in 68,768 neonates and the risk of six mental disorders. We completed genome-wide association studies (GWASs) for C3 and C4 and applied the summary statistics in Mendelian randomization and phenome-wide association studies related to mental and autoimmune disorders. The GWASs for C3 and C4 protein concentrations identified 15 and 36 independent loci, respectively. We found no associations between neonatal C3 and C4 concentrations and mental disorders in the total sample (both sexes combined); however, post-hoc analyses found that a higher C3 concentration was associated with a reduced risk of schizophrenia in females. Mendelian randomization based on C4 summary statistics found an altered risk of five types of autoimmune disorders. Our study adds to our understanding of the associations between C3 and C4 concentrations and subsequent mental and autoimmune disorders.

3.
Am J Hum Genet ; 110(12): 2042-2055, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-37944514

ABSTRACT

LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters of the LDpred model, the SNP heritability h² and the polygenicity p, so it does not require an additional validation dataset to choose the best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r² of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended for use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Bayes Theorem , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics
4.
Nat Commun ; 14(1): 5553, 2023 09 09.
Article in English | MEDLINE | ID: mdl-37689771

ABSTRACT

Proportional hazards models have been proposed to analyse time-to-event phenotypes in genome-wide association studies (GWAS). However, little is known about the ability of proportional hazards models to identify genetic associations under different generative models and when ascertainment is present. Here we propose the age-dependent liability threshold (ADuLT) model as an alternative to a Cox regression-based GWAS, here represented by SPACox. We compare ADuLT, SPACox, and standard case-control GWAS in simulations under two generative models and with varying degrees of ascertainment, as well as in the iPSYCH cohort. We find Cox regression GWAS to be underpowered when cases are strongly ascertained (cases oversampled by a factor of 5), regardless of the generative model used. ADuLT is robust to ascertainment in all simulated scenarios. We then analyse four psychiatric disorders in iPSYCH (ADHD, autism, depression, and schizophrenia), all with strong case ascertainment. Across these psychiatric disorders, ADuLT identifies 20 independent genome-wide significant associations, case-control GWAS finds 17, and SPACox finds 8, which is consistent with the simulation results. As more genetic data are linked to electronic health records, robust GWAS methods that can make use of age-of-onset information will help increase power in analyses of common health outcomes.


Subject(s)
Autistic Disorder , Genome-Wide Association Study , Humans , Computer Simulation , Electronic Health Records , Factor V
5.
Nat Commun ; 14(1): 4702, 2023 08 05.
Article in English | MEDLINE | ID: mdl-37543680

ABSTRACT

The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction R² increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.


Subject(s)
Attention Deficit Disorder with Hyperactivity , Genome-Wide Association Study , Humans , Attention Deficit Disorder with Hyperactivity/genetics , Phenotype , Multifactorial Inheritance/genetics
6.
Nature ; 618(7966): 774-781, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37198491

ABSTRACT

Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: the Pearson correlation between GD and PGS accuracy, averaged across 84 traits, is -0.95. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries shows similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.


Subject(s)
Multifactorial Inheritance , Racial Groups , Humans , Europe/ethnology , Hispanic or Latino/genetics , Multifactorial Inheritance/genetics , Racial Groups/genetics , United Kingdom , White People/genetics , European People/genetics , Los Angeles , Databases, Genetic
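The abstract describes genetic distance (GD) only as a continuous measure of distance from the PGS training data. A minimal sketch, assuming GD is computed as the Euclidean distance in principal-component (PC) space between each individual and the mean PC vector of the training individuals, followed by binning individuals into GD deciles; the function names and toy inputs are hypothetical.

```python
import math


def genetic_distance(pcs, training_pcs):
    """Euclidean distance from each individual's PC coordinates to the
    centroid (mean PC vector) of the training individuals (assumed GD definition)."""
    k = len(training_pcs[0])
    centroid = [sum(row[d] for row in training_pcs) / len(training_pcs) for d in range(k)]
    return [math.dist(row, centroid) for row in pcs]


def decile_of(values):
    """Assign each value to a decile by rank (0 = closest, 9 = furthest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    deciles = [0] * len(values)
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // len(values)
    return deciles


# Toy usage: training centroid is (0, 0), so the test points have GD 0 and 5.
training = [[1.0, 1.0], [-1.0, -1.0]]
gd = genetic_distance([[0.0, 0.0], [3.0, 4.0]], training)
```

PGS accuracy can then be summarised per GD decile and correlated with GD, which is the kind of comparison the abstract reports.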
7.
Nat Commun ; 14(1): 852, 2023 02 15.
Article in English | MEDLINE | ID: mdl-36792583

ABSTRACT

The vitamin D binding protein (DBP), encoded by the group-specific component (GC) gene, is a component of the vitamin D system. In a genome-wide association study of DBP concentration in 65,589 neonates we identify 26 independent loci, 17 of which are in or close to the GC gene, with fine-mapping identifying 2 missense variants on chromosomes 12 and 17 (within SH2B3 and GSDMA, respectively). When adjusted for GC haplotypes, we find 15 independent loci distributed over 10 chromosomes. Mendelian randomization analyses identify a unidirectional effect of higher DBP concentration on (a) higher 25-hydroxyvitamin D concentration and (b) reduced risk of multiple sclerosis and rheumatoid arthritis. A phenome-wide association study confirms that higher DBP concentration is associated with a reduced risk of vitamin D deficiency. Our findings provide valuable insights into the influence of DBP on vitamin D status and a range of health outcomes.


Subject(s)
Genome-Wide Association Study , Vitamin D-Binding Protein , Infant, Newborn , Humans , Vitamin D-Binding Protein/genetics , Vitamin D/genetics , Calcifediol , Vitamins , Polymorphism, Single Nucleotide , Pore Forming Cytotoxic Proteins/genetics
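The abstract does not state which Mendelian randomization estimator was used. As an illustration of how such an analysis combines per-variant summary statistics, here is a sketch of the fixed-effect inverse-variance weighted (IVW) estimator, one common choice; the inputs are hypothetical per-variant effects on the exposure (for example, DBP concentration) and on the outcome.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted Mendelian randomization.

    Equivalent to a weighted regression of beta_outcome on beta_exposure through
    the origin with weights 1 / se_outcome**2. Returns (estimate, standard error).
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Toy usage: every variant implies a causal effect of 0.5.
bx = [0.1, 0.2, 0.3]
by = [0.5 * b for b in bx]
est, se = ivw_estimate(bx, by, [0.01, 0.01, 0.01])
```

Testing directionality, as the abstract does, would additionally require the reverse analysis (outcome variants on the exposure), which this sketch does not cover.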
8.
J Clin Endocrinol Metab ; 108(5): e89-e97, 2023 04 13.
Article in English | MEDLINE | ID: mdl-36413496

ABSTRACT

BACKGROUND: Resource trade-off theory suggests that increased performance on a given trait comes at the cost of decreased performance on other traits. METHODS: Growth data from 1889 subjects (996 girls) were used from the GrowUp1974 Gothenburg study. Energy Trade-Off (ETO) between height and weight for individuals with extreme body types was characterized using a novel ETO-Score (ETOS). Four extreme body types were defined based on height and ETOI at early adulthood: tall-slender, short-stout, short-slender, and tall-stout; their growth trajectories were assessed from ages 0.5-17.5 years. A GWAS using UK Biobank data was conducted to identify gene variants associated with height, BMI, and, for the first time, ETOS. RESULTS: Height and ETOS trajectories show a two-hit pattern with profound changes during early infancy and at puberty for tall-slender and short-stout body types. Several loci (including FTO, ADCY3, and GDF5) and pathways were identified by GWAS as being highly associated with ETOS. The most strongly associated pathways were related to "extracellular matrix," "signal transduction," "chromatin organization," and "energy metabolism." CONCLUSIONS: ETOS represents a novel anthropometric trait with utility in describing body types. We discovered multiple genomic loci and pathways probably involved in energy trade-off.


Subject(s)
Puberty , Somatotypes , Female , Humans , Adult , Infant , Child, Preschool , Child , Adolescent , Phenotype , Anthropometry , Energy Metabolism/genetics , Body Height/genetics , Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics
9.
Biol Psychiatry ; 93(1): 29-36, 2023 01 01.
Article in English | MEDLINE | ID: mdl-35973856

ABSTRACT

BACKGROUND: Single nucleotide polymorphism-based heritability is a fundamental quantity in the genetic analysis of complex traits. For case-control phenotypes, for which the continuous distribution of risk in the population is unobserved, observed-scale heritability estimates must be transformed to the more interpretable liability scale. This article describes how the field-standard approach performs the liability correction incorrectly, in that it does not appropriately account for variation in the proportion of cases across the cohorts comprising the meta-analysis. We propose a simple solution that incorporates cohort-specific ascertainment using the summation of effective sample sizes across cohorts. This solution is applied at the stage of single nucleotide polymorphism-based heritability estimation and does not require generating updated meta-analytic genome-wide association study summary statistics. METHODS: We began by performing a series of simulations to examine the ability of the standard approach and our proposed approach to recapture liability-scale heritability in the population. We went on to examine the differences in estimates obtained from these 2 approaches for real data for 12 major case-control genome-wide association studies of psychiatric and neurologic traits. RESULTS: We found that the field-standard approach for performing the liability conversion can downwardly bias estimates by as much as approximately 50% in simulation and approximately 30% in real data. CONCLUSIONS: Prior estimates of liability-scale heritability for genome-wide association study meta-analysis may be drastically underestimated. Accordingly, we strongly recommend using our proposed approach of summing the effective sample sizes across contributing cohorts to obtain unbiased estimates.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Polymorphism, Single Nucleotide/genetics , Phenotype , Case-Control Studies
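The abstract gives no formulas. A minimal sketch of the two ingredients, assuming the widely used observed-to-liability transformation of Lee et al. (2011) and the per-cohort effective sample size 4 / (1/N_cases + 1/N_controls). Because effective sample sizes correspond to a balanced design, the sketch assumes a sample prevalence of 0.5 in the conversion when N is set to the summed effective sample size; the cohort counts and heritability values are hypothetical.

```python
from statistics import NormalDist


def effective_n(n_cases, n_controls):
    """Effective sample size of one case-control cohort."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def liability_h2(h2_obs, pop_prev, sample_prev):
    """Convert observed-scale SNP heritability to the liability scale
    (Lee et al. 2011 transformation, assumed here)."""
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - pop_prev)  # liability threshold for prevalence K
    z = std_normal.pdf(t)                   # standard normal density at the threshold
    k, p = pop_prev, sample_prev
    return h2_obs * k**2 * (1 - k) ** 2 / (p * (1 - p) * z**2)


# Hypothetical meta-analysis with a balanced and an unbalanced cohort.
cohorts = [(1000, 1000), (500, 4500)]
total_neff = sum(effective_n(ca, co) for ca, co in cohorts)  # use as N for h2 estimation
h2_liab = liability_h2(0.10, pop_prev=0.01, sample_prev=0.5)
```

Summing effective sample sizes gives each cohort weight according to its case/control balance, which is what lets a single sample prevalence of 0.5 be used for the whole meta-analysis.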
10.
HGG Adv ; 3(4): 100136, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36105883

ABSTRACT

Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
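The abstract says that allele frequencies and per-variant sample sizes can be used to detect misspecified summary statistics. One way to do this, sketched here under stated assumptions, is to compare the genotype standard deviation implied by the allele frequency with the one implied by the effect size, its standard error and N. The formula below assumes a binary trait analysed with effective sample sizes (outcome SD taken as 2), and the flagging thresholds are illustrative values, not taken from the paper.

```python
import math


def sd_from_af(af, info=1.0):
    """Genotype SD implied by the allele frequency (and imputation INFO score)."""
    return math.sqrt(2 * af * (1 - af) * info)


def sd_from_sumstats(beta, se, n_eff, sd_y=2.0):
    """Genotype SD implied by GWAS summary statistics.
    sd_y = 2.0 is an assumption for binary traits on the log-odds scale."""
    return sd_y / math.sqrt(n_eff * se**2 + beta**2)


def is_suspicious(af, beta, se, n_eff):
    """Flag a variant whose two SD estimates disagree (illustrative thresholds)."""
    sd_af = sd_from_af(af)
    sd_ss = sd_from_sumstats(beta, se, n_eff)
    return sd_ss < 0.5 * sd_af or sd_ss > sd_af + 0.1 or sd_ss < 0.1 or sd_af < 0.05


# Consistent variant vs. one whose standard error is 10x too large
# (for example, because the per-variant sample size was misreported).
ok = is_suspicious(0.5, 0.0, math.sqrt(0.001), 8000)
bad = is_suspicious(0.5, 0.0, 10 * math.sqrt(0.001), 8000)
```

A systematic gap between the two SD estimates across many variants usually points to a misspecified N or trait scale rather than to individual bad variants.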

11.
Bioinformatics ; 38(13): 3477-3480, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35604078

ABSTRACT

MOTIVATION: Measuring genetic diversity is an important problem because increasing genetic diversity is key to making new genetic discoveries, while also being a major source of confounding to be aware of in genetics studies. RESULTS: Using the UK Biobank data, a prospective cohort study with deep genetic and phenotypic data collected on almost 500 000 individuals from across the UK, we carefully define 21 distinct ancestry groups from all four corners of the world. These ancestry groups can serve as a global reference of worldwide populations, with a handful of applications. Here, we develop a method that uses allele frequencies and principal components derived from these ancestry groups to effectively measure ancestry proportions from allele frequencies of any genetic dataset. AVAILABILITY AND IMPLEMENTATION: This method is implemented in function snp_ancestry_summary of R package bigsnpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Genotype , Prospective Studies , United Kingdom , Polymorphism, Single Nucleotide
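The abstract describes estimating ancestry proportions from the allele frequencies of a dataset. With only two reference groups, the idea reduces to a one-parameter least-squares fit of the target frequencies as a mixture of the reference frequencies. The sketch below shows only that simplified case; it is not the snp_ancestry_summary implementation, which uses 21 reference groups and principal components, and the function name is hypothetical.

```python
def admixture_proportion(target_af, ref1_af, ref2_af):
    """Least-squares estimate of alpha in target ≈ alpha * ref1 + (1 - alpha) * ref2,
    clipped to [0, 1] so it remains a valid proportion."""
    num = sum((y - f2) * (f1 - f2) for y, f1, f2 in zip(target_af, ref1_af, ref2_af))
    den = sum((f1 - f2) ** 2 for f1, f2 in zip(ref1_af, ref2_af))
    return min(1.0, max(0.0, num / den))


# Toy usage: a dataset whose frequencies are a 30/70 mix of two references.
ref1 = [0.1, 0.5, 0.9]
ref2 = [0.6, 0.2, 0.3]
target = [0.3 * a + 0.7 * b for a, b in zip(ref1, ref2)]
alpha = admixture_proportion(target, ref1, ref2)
```

With more than two groups the same idea becomes a least-squares problem constrained to the simplex (non-negative proportions summing to 1), which needs a constrained solver rather than this closed form.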
12.
Am J Hum Genet ; 109(3): 417-432, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35139346

ABSTRACT

Genome-wide association studies (GWASs) have revolutionized human genetics, allowing researchers to identify thousands of disease-related genes and possible drug targets. However, case-control status does not account for the fact that not all controls may have lived through their period of risk for the disorder of interest. This can be quantified by examining the age-of-onset distribution and the age of the controls or the age of onset for cases. The age-of-onset distribution may also depend on information such as sex and birth year. In addition, family history is not routinely included in the assessment of control status. Here, we present LT-FH++, an extension of the liability threshold model conditioned on family history (LT-FH), which jointly accounts for age of onset and sex as well as family history. Using simulations, we show that, when family history and the age-of-onset distribution are available, the proposed approach yields statistically significant power gains over LT-FH and large power gains over genome-wide association study by proxy (GWAX). We applied our method to four psychiatric disorders available in the iPSYCH data and to mortality in the UK Biobank and found 20 genome-wide significant associations with LT-FH++, compared to ten for LT-FH and eight for a standard case-control GWAS. As more genetic data with linked electronic health records become available to researchers, we expect methods that account for additional health information, such as LT-FH++, to become even more beneficial.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Age of Onset , Case-Control Studies , Genome-Wide Association Study/methods , Humans , Medical History Taking
14.
J Clin Psychiatry ; 83(1)2022 01 04.
Article in English | MEDLINE | ID: mdl-34985833

ABSTRACT

Objective: To estimate the phenotypic and familial association between early-life injuries and attention-deficit/hyperactivity disorder (ADHD), and the genetic contribution to this association, using a polygenic risk score for ADHD (PRS-ADHD) and genetic correlation analyses. Methods: Children born in Denmark between 1995 and 2010 (n = 786,543) were followed from age 5 years until a median age of 14 years (interquartile range: 10-18 years). Using ICD-10 diagnoses, we estimated hazard ratios (HRs) and absolute risks of ADHD by number of hospital/emergency ward-treated injuries by age 5. In a subset of ADHD cases and controls born 1995 to 2005 who had genetic data available (n = 16,580), we estimated incidence rate ratios (IRRs) for the association between PRS-ADHD and number of injuries before age 5, and the genetic correlation between ADHD and any injury before age 5. Results: Injuries were associated with ADHD (HR = 1.61; 95% CI, 1.55-1.66) in males (HR = 1.59; 1.53-1.65) and females (HR = 1.65; 1.54-1.77), with a dose-response relationship with number of injuries. The absolute ADHD risk by age 15 was 8.4% (3+ injuries) vs 3.1% (no injuries). ADHD was also associated with injuries in relatives, with a stronger association in first- than second-degree relatives. PRS-ADHD was marginally associated with the number of injuries in the general population (IRR = 1.06; 1.00-1.14), with a genetic correlation of 0.53 (0.21-0.85). Conclusions: Early-life injuries in individuals and their relatives were associated with a diagnosis of ADHD. However, even in children with the most injuries, more than 90% were not diagnosed with ADHD by age 15. Despite a low positive predictive value and the unclear impact of unmeasured factors such as parental behavior, the results indicate that the association is partly explained by genetics, suggesting that early-life injuries may represent or herald early behavioral manifestations of ADHD.


Subject(s)
Attention Deficit Disorder with Hyperactivity/epidemiology , Wounds and Injuries/epidemiology , Adolescent , Attention Deficit Disorder with Hyperactivity/genetics , Child , Child, Preschool , Cohort Studies , Denmark/epidemiology , Female , Humans , Male , Phenotype , Proportional Hazards Models , Risk Factors
15.
Am J Hum Genet ; 109(1): 12-23, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34995502

ABSTRACT

The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.


Subject(s)
Genetic Association Studies/methods , Genetic Predisposition to Disease , Genetics, Population/methods , Multifactorial Inheritance , Algorithms , Alleles , Biological Specimen Banks , Genetic Variation , Genome-Wide Association Study , Genotype , Humans , Models, Genetic , Phenotype , Reproducibility of Results , United Kingdom
16.
Nat Genet ; 54(1): 30-39, 2022 01.
Article in English | MEDLINE | ID: mdl-34931067

ABSTRACT

Although the cohort-level accuracy of polygenic risk scores (PRSs), estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates, which impact the interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.


Subject(s)
Genetic Predisposition to Disease , Multifactorial Inheritance , Risk Assessment , Uncertainty , Genetic Association Studies , Genome-Wide Association Study , Humans , Models, Genetic , Models, Statistical
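The top-decile statistic in the abstract can be computed directly from posterior samples: take each individual's point estimate, find the top-decile threshold of those estimates, and check whether each top-decile individual's 95% credible interval lies above that threshold. A minimal sketch, assuming posterior means as point estimates and equal-tailed intervals; the simulated "posterior samples" (true value plus Gaussian noise) and all function names are hypothetical stand-ins for output from a Bayesian PRS method.

```python
import random


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def top_decile_certainty(posterior_samples):
    """Fraction of top-decile individuals (by posterior mean) whose 95% credible
    interval lies entirely within the top decile."""
    means = [sum(s) / len(s) for s in posterior_samples]
    threshold = percentile(sorted(means), 0.9)
    in_top = [i for i, m in enumerate(means) if m >= threshold]
    # The interval is inside the top decile if its lower bound is above the threshold.
    confident = sum(
        1 for i in in_top if percentile(sorted(posterior_samples[i]), 0.025) >= threshold
    )
    return confident / len(in_top)


def simulate(noise_sd, n=1000, n_samples=200, seed=1):
    """Toy posterior samples: each individual's genetic value plus Gaussian noise."""
    rng = random.Random(seed)
    values = [rng.gauss(0, 1) for _ in range(n)]
    return [[g + rng.gauss(0, noise_sd) for _ in range(n_samples)] for g in values]


f_noisy = top_decile_certainty(simulate(1.0))
f_precise = top_decile_certainty(simulate(0.01))
```

With large posterior spread almost no top-decile individual is confidently in the top decile, which mirrors the pattern reported in the abstract.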
17.
Bioinformatics ; 38(1): 255-256, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34260708

ABSTRACT

MOTIVATION: A few algorithms have been developed for splitting the genome into nearly independent blocks of linkage disequilibrium. Due to the complexity of this problem, these algorithms rely on heuristics, which makes them suboptimal. RESULTS: Here, we develop an optimal solution for this problem using dynamic programming. AVAILABILITY: This is now implemented as function snp_ldsplit as part of R package bigsnpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Linkage Disequilibrium , Software , Humans , Genome, Human , Computational Biology
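The key property behind a dynamic-programming solution is that the total linkage disequilibrium between blocks can be accumulated block by block. A minimal sketch, assuming the objective is to minimise the summed r² between variants in different blocks with a maximum block size; it is not the snp_ldsplit source and omits its other options, and `ld_split` is a hypothetical name.

```python
def ld_split(r2, max_size):
    """Split ordered variants into contiguous blocks of at most max_size variants,
    minimising the summed r2 between variants that fall in different blocks.

    r2 is a symmetric n x n list of lists of squared correlations.
    Returns (cost, boundaries); block k covers boundaries[k] .. boundaries[k + 1] - 1.
    """
    n = len(r2)
    best = [0.0] + [float("inf")] * n  # best[j]: minimal cost for variants 0..j-1
    prev = [0] * (n + 1)               # prev[j]: start of the last block in that optimum
    for j in range(1, n + 1):
        for i in range(max(0, j - max_size), j):
            # Pairs (a, b) with a before the new block [i, j) and b inside it
            # become between-block pairs; each such pair is counted exactly once.
            cross = sum(r2[a][b] for a in range(i) for b in range(i, j))
            if best[i] + cross < best[j]:
                best[j] = best[i] + cross
                prev[j] = i
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(prev[bounds[-1]])
    return best[n], bounds[::-1]


# Toy usage: two LD blocks of three variants with no LD between them.
labels = [0, 0, 0, 1, 1, 1]
r2 = [[1.0 if a == b else (0.5 if labels[a] == labels[b] else 0.0) for b in range(6)]
      for a in range(6)]
cost, bounds = ld_split(r2, max_size=3)
```

This naive version recomputes each between-block sum and is cubic or worse in the number of variants; a practical implementation would precompute cumulative sums of the LD matrix.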
18.
Nat Commun ; 12(1): 4192, 2021 07 07.
Article in English | MEDLINE | ID: mdl-34234142

ABSTRACT

Most existing tools for constructing genetic prediction models begin with the assumption that all genetic variants contribute equally towards the phenotype. However, this represents a suboptimal model for how heritability is distributed across the genome. Therefore, we develop prediction tools that allow the user to specify the heritability model. We compare individual-level data prediction tools using 14 UK Biobank phenotypes; our new tool LDAK-Bolt-Predict outperforms the existing tools Lasso, BLUP, Bolt-LMM and BayesR for all 14 phenotypes. We compare summary statistic prediction tools using 225 UK Biobank phenotypes; our new tool LDAK-BayesR-SS outperforms the existing tools lassosum, sBLUP, LDpred and SBayesR for 223 of the 225 phenotypes. When we improve the heritability model, the proportion of phenotypic variance explained increases by 14% on average, which is equivalent to increasing the sample size by a quarter.


Subject(s)
Forecasting/methods , Models, Genetic , Multifactorial Inheritance , Precision Medicine/methods , Quantitative Trait, Heritable , Case-Control Studies , Datasets as Topic , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Sample Size , Software
19.
Am J Hum Genet ; 108(6): 1001-1011, 2021 06 03.
Article in English | MEDLINE | ID: mdl-33964208

ABSTRACT

The accuracy of polygenic risk scores (PRSs) for predicting complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWASs). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using 12 real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.


Subject(s)
Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Models, Statistical , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Case-Control Studies , Humans , Phenotype
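The meta-PRS in the abstract is a linear combination of PRSs whose weights are learned on individual-level data. A minimal sketch for two PRSs (for example, one from external summary statistics and one from an internal GWAS) and a quantitative outcome, using ordinary least squares; a case-control outcome would more typically use logistic regression, the weights would normally be fit on a held-out subset, and `fit_meta_prs` is a hypothetical name.

```python
def fit_meta_prs(prs_a, prs_b, y):
    """Ordinary least squares for y ≈ b0 + w_a * prs_a + w_b * prs_b.
    Returns (b0, w_a, w_b)."""
    n = len(y)
    ma, mb, my = sum(prs_a) / n, sum(prs_b) / n, sum(y) / n
    a = [x - ma for x in prs_a]
    b = [x - mb for x in prs_b]
    c = [v - my for v in y]
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * z for x, z in zip(a, b))
    say = sum(x * v for x, v in zip(a, c))
    sby = sum(x * v for x, v in zip(b, c))
    # Solve the 2 x 2 normal equations [saa sab; sab sbb] [w_a; w_b] = [say; sby].
    det = saa * sbb - sab * sab
    w_a = (sbb * say - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return my - w_a * ma - w_b * mb, w_a, w_b


# Toy usage: an outcome that is exactly 1 + 0.5 * PRS_a + 0.2 * PRS_b.
prs_a = [0.0, 1.0, 2.0, 3.0, 4.0]
prs_b = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 0.5 * a + 0.2 * b for a, b in zip(prs_a, prs_b)]
b0, w_a, w_b = fit_meta_prs(prs_a, prs_b, y)
```

The combined score for a new individual is then b0 + w_a * PRS_a + w_b * PRS_b, so the two data sources are merged at the score level rather than at the GWAS level.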