Results 1 - 20 of 1,652
1.
Transl Vis Sci Technol ; 13(5): 13, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38767906

ABSTRACT

Purpose: The purpose of this study was to conduct a large-scale genome-wide association study (GWAS) and construct a polygenic risk score (PRS) for risk stratification in patients with dry eye disease (DED) using the Taiwan Biobank (TWB) databases. Methods: This retrospective case-control study involved 40,112 subjects of Han Chinese ancestry, sourced from the publicly available TWB. Cases were patients with DED (n = 14,185), and controls were individuals without DED (n = 25,927). The patients with DED were further divided into 8072 young (<60 years old) and 6113 old participants (≥60 years old). Using PLINK (version 1.9) software, quality control was carried out, followed by logistic regression analysis with adjustments for sex, age, body mass index, depression, and manic episodes as covariates. We also built PRS prediction models using the standard clumping and thresholding method and evaluated their performance (area under the curve [AUC]) through five-fold cross-validation. Results: Eleven independent risk loci were identified for these patients with DED at the genome-wide significance level, including DNAJB6, MAML3, LINC02267, DCHS1, SIRPB3P, HULC, MUC16, GAS2L3, and ZFPM2. Among these, MUC16 encodes a mucin family protein. The PRS model incorporated 932 and 740 genetic loci for young and old populations, respectively. A higher PRS indicated a greater DED risk, with the top 5% of PRS individuals having a 10-fold higher risk. After integrating these covariates into the PRS model, the area under the receiver operating curve (AUROC) increased from 0.509 and 0.537 to 0.600 and 0.648 for young and old populations, respectively, demonstrating the genetic-environmental interaction. Conclusions: Our study prompts potential candidates for the mechanism of DED and paves the way for more personalized medication in the future. Translational Relevance: Our study identified genes related to DED and constructed a PRS model to improve DED prediction.
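
The thresholding and scoring steps of a clumping-and-thresholding PRS, as named in the abstract above, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes that LD clumping has already been done upstream (for example in PLINK), and the function names, the dictionary layout, and the variant identifiers are all hypothetical.

```python
from typing import Dict, List


def threshold_variants(summary_stats: List[dict], p_threshold: float) -> Dict[str, float]:
    """Keep already-clumped variants whose GWAS p-value is at or below the threshold.

    Each entry in summary_stats is a hypothetical record with keys
    "id" (variant ID), "beta" (log odds ratio) and "p" (p-value).
    Returns a mapping from variant ID to its effect-size weight.
    """
    return {v["id"]: v["beta"] for v in summary_stats if v["p"] <= p_threshold}


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2) over the selected variants.

    Variants missing from an individual's dosages are treated as dosage 0,
    which is a simplifying assumption of this sketch.
    """
    return sum(beta * dosages.get(vid, 0) for vid, beta in weights.items())


# Hypothetical summary statistics and one hypothetical individual.
example_stats = [
    {"id": "rsA", "beta": 0.20, "p": 1e-9},
    {"id": "rsB", "beta": -0.10, "p": 0.03},
    {"id": "rsC", "beta": 0.05, "p": 1e-4},
]
example_dosages = {"rsA": 2, "rsB": 1, "rsC": 1}
example_weights = threshold_variants(example_stats, p_threshold=1e-3)
example_score = polygenic_score(example_dosages, example_weights)
```

In practice several p-value thresholds would be tried and the one with the best cross-validated AUC kept; that selection loop is omitted here.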


Subject(s)
Dry Eye Syndromes , Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Humans , Female , Male , Middle Aged , Retrospective Studies , Dry Eye Syndromes/genetics , Dry Eye Syndromes/epidemiology , Case-Control Studies , Genetic Predisposition to Disease/genetics , Adult , Multifactorial Inheritance/genetics , Aged , Risk Factors , Risk Assessment/methods , Polymorphism, Single Nucleotide , Taiwan/epidemiology , Genetic Risk Score
2.
Int J Mol Sci ; 25(9), 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38731822

ABSTRACT

Our understanding of rare disease genetics has been shaped by a monogenic disease model. While the traditional monogenic disease model has been successful in identifying numerous disease-associated genes and significantly enlarged our knowledge in the field of human genetics, it has limitations in explaining phenomena like phenotypic variability and reduced penetrance. Widening the perspective beyond Mendelian inheritance has the potential to enable a better understanding of disease complexity in rare disorders. Digenic inheritance is the simplest instance of a non-Mendelian disorder, characterized by the functional interplay of variants in two disease-contributing genes. Known digenic disease causes show a range of pathomechanisms underlying digenic interplay, including direct and indirect gene product interactions as well as epigenetic modifications. This review aims to systematically explore the background of digenic inheritance in rare disorders, the approaches and challenges when investigating digenic inheritance, and the current evidence for digenic inheritance in mitochondrial disorders.


Subject(s)
Mitochondrial Diseases , Rare Diseases , Humans , Mitochondrial Diseases/genetics , Rare Diseases/genetics , Genetic Predisposition to Disease , Epigenesis, Genetic , Multifactorial Inheritance/genetics , Animals
3.
Nat Commun ; 15(1): 4260, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38769300

ABSTRACT

Transcriptome-wide association study (TWAS) is a popular approach to dissect the functional consequence of disease associated non-coding variants. Most existing TWAS use bulk tissues and may not have the resolution to reveal cell-type specific target genes. Single-cell expression quantitative trait loci (sc-eQTL) datasets are emerging. The largest bulk- and sc-eQTL datasets are most conveniently available as summary statistics, but have not been broadly utilized in TWAS. Here, we present a new method EXPRESSO (EXpression PREdiction with Summary Statistics Only), to analyze sc-eQTL summary statistics, which also integrates 3D genomic data and epigenomic annotation to prioritize causal variants. EXPRESSO substantially improves existing methods. We apply EXPRESSO to analyze multi-ancestry GWAS datasets for 14 autoimmune diseases. EXPRESSO uniquely identifies 958 novel gene x trait associations, which is 26% more than the second-best method. Among them, 492 are unique to cell type level analysis and missed by TWAS using whole blood. We also develop a cell type aware drug repurposing pipeline, which leverages EXPRESSO results to identify drug compounds that can reverse disease gene expressions in relevant cell types. Our results point to multiple drugs with therapeutic potentials, including metformin for type 1 diabetes, and vitamin K for ulcerative colitis.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Transcriptome/genetics , Autoimmune Diseases/genetics , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Gene Expression Profiling/methods
4.
Sci Rep ; 14(1): 11632, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773257

ABSTRACT

In recent years, the utility of polygenic risk scores (PRS) in forecasting disease susceptibility from genome-wide association studies (GWAS) results has been widely recognised. Yet, these models face limitations due to overfitting and the potential overestimation of effect sizes in correlated variants. To surmount these obstacles, we devised the Stacked Neural Network Polygenic Risk Score (SNPRS). This novel approach synthesises outputs from multiple neural network models, each calibrated using genetic variants chosen based on diverse p-value thresholds. By doing so, SNPRS captures a broader array of genetic variants, enabling a more nuanced interpretation of the combined effects of these variants. We assessed the efficacy of SNPRS using the UK Biobank data, focusing on the genetic risks associated with breast and prostate cancers, as well as quantitative traits like height and BMI. We also extended our analysis to the Korea Genome and Epidemiology Study (KoGES) dataset. Impressively, our results indicate that SNPRS surpasses traditional PRS models and an isolated deep neural network in terms of accuracy, highlighting its promise in refining the efficacy and relevance of PRS in genetic studies.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Neural Networks, Computer , Polymorphism, Single Nucleotide , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Female , Male , Prostatic Neoplasms/genetics , Breast Neoplasms/genetics , Risk Factors , Genetic Risk Score
5.
Nat Commun ; 15(1): 3346, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38693125

ABSTRACT

Endurance exercise training is known to reduce risk for a range of complex diseases. However, the molecular basis of this effect has been challenging to study and largely restricted to analyses of either few or easily biopsied tissues. Extensive transcriptome data collected across 15 tissues during exercise training in rats as part of the Molecular Transducers of Physical Activity Consortium has provided a unique opportunity to clarify how exercise can affect tissue-specific gene expression and further suggest how exercise adaptation may impact complex disease-associated genes. To build this map, we integrate this multi-tissue atlas of gene expression changes with gene-disease targets, genetic regulation of expression, and trait relationship data in humans. Consensus from multiple approaches prioritizes specific tissues and genes where endurance exercise impacts disease-relevant gene expression. Specifically, we identify a total of 5523 trait-tissue-gene triplets to serve as a valuable starting point for future investigations [Exercise; Transcription; Human Phenotypic Variation].


Subject(s)
Gene Expression Regulation , Physical Conditioning, Animal , Animals , Humans , Rats , Transcriptome/genetics , Multifactorial Inheritance/genetics , Exercise/physiology , Male , Phenotype , Quantitative Trait Loci , Gene Expression Profiling
6.
Nat Genet ; 56(5): 838-845, 2024 May.
Article in English | MEDLINE | ID: mdl-38741015

ABSTRACT

Autoimmune and inflammatory diseases are polygenic disorders of the immune system. Many genomic loci harbor risk alleles for several diseases, but the limited resolution of genetic mapping prevents determining whether the same allele is responsible, indicating a shared underlying mechanism. Here, using a collection of 129,058 cases and controls across 6 diseases, we show that ~40% of overlapping associations are due to the same allele. We improve fine-mapping resolution for shared alleles twofold by combining cases and controls across diseases, allowing us to identify more expression quantitative trait loci driven by the shared alleles. The patterns indicate widespread sharing of pathogenic mechanisms but not a single global autoimmune mechanism. Our approach can be applied to any set of traits and is particularly valuable as sample collections become depleted.


Subject(s)
Alleles , Autoimmune Diseases , Chromosome Mapping , Genetic Predisposition to Disease , Quantitative Trait Loci , Humans , Autoimmune Diseases/genetics , Polymorphism, Single Nucleotide , Genome-Wide Association Study , Case-Control Studies , Multifactorial Inheritance/genetics
7.
Nat Genet ; 56(5): 819-826, 2024 May.
Article in English | MEDLINE | ID: mdl-38741014

ABSTRACT

We performed genome-wide association studies of breast cancer including 18,034 cases and 22,104 controls of African ancestry. Genetic variants at 12 loci were associated with breast cancer risk (P < 5 × 10⁻⁸), including associations of a low-frequency missense variant rs61751053 in ARHGEF38 with overall breast cancer (odds ratio (OR) = 1.48) and a common variant rs76664032 at chromosome 2q14.2 with triple-negative breast cancer (TNBC) (OR = 1.30). Approximately 15.4% of cases with TNBC carried six risk alleles in three genome-wide association study-identified TNBC risk variants, with an OR of 4.21 (95% confidence interval = 2.66-7.03) compared with those carrying fewer than two risk alleles. A polygenic risk score (PRS) showed an area under the receiver operating characteristic curve of 0.60 for the prediction of breast cancer risk, which outperformed PRS derived using data from females of European ancestry. Our study markedly increases the population diversity in genetic studies for breast cancer and demonstrates the utility of PRS for risk prediction in females of African ancestry.
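
The area under the ROC curve reported for the PRS above has a simple rank interpretation: it is the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way. It is an illustration only; the function name is hypothetical and the quadratic pairwise loop is chosen for clarity, not for cohort-scale data.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case-control pairs ranked correctly.

    A pair counts 1 if the case scores higher than the control,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, which is why values such as 0.60 are usually described as modest discrimination.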


Subject(s)
Black People , Breast Neoplasms , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Female , Genome-Wide Association Study/methods , Breast Neoplasms/genetics , Black People/genetics , Case-Control Studies , Risk Factors , Triple Negative Breast Neoplasms/genetics , Alleles , Multifactorial Inheritance/genetics , Middle Aged , Genetic Loci , White People/genetics
8.
PLoS One ; 19(5): e0303610, 2024.
Article in English | MEDLINE | ID: mdl-38758931

ABSTRACT

We have previously shown that polygenic risk scores (PRS) can improve risk stratification of peripheral artery disease (PAD) in a large, retrospective cohort. Here, we evaluate the potential of PRS in improving the detection of PAD and prediction of major adverse cardiovascular and cerebrovascular events (MACCE) and adverse events (AE) in an institutional patient cohort. We created a cohort of 278 patients (52 cases and 226 controls) and fit a PAD-specific PRS based on the weighted sum of risk alleles. We built traditional clinical risk models and machine learning (ML) models using clinical and genetic variables to detect PAD, MACCE, and AE. The models' performances were measured using the area under the curve (AUC), net reclassification index (NRI), integrated discrimination improvement (IDI), and Brier score. We also evaluated the clinical utility of our PAD model using decision curve analysis (DCA). We found a modest, but not statistically significant, improvement in the PAD detection model's performance with the inclusion of PRS, from 0.902 (95% CI: 0.846-0.957) (clinical variables only) to 0.909 (95% CI: 0.856-0.961) (clinical variables with PRS). The PRS inclusion significantly improved risk re-classification of PAD with an NRI of 0.07 (95% CI: 0.002-0.137), p = 0.04. For our ML model predicting MACCE, the addition of PRS did not significantly improve the AUC; however, NRI analysis demonstrated significant improvement in risk re-classification (p = 2e-05). Decision curve analysis showed higher net benefit of our combined PRS-clinical model across all thresholds of PAD detection. Adding PRS to a clinical PAD-risk model was associated with improvement in risk stratification and clinical utility, although we did not see a significant change in AUC. This result underscores the potential clinical utility of incorporating PRS data into clinical risk models for prevalent PAD and the need for use of evaluation metrics that can discern the clinical impact of using new biomarkers in smaller populations.
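
Two of the metrics named in this abstract, the Brier score and the net reclassification index, can be sketched as below. The abstract does not say whether a categorical or a category-free NRI was used, so the category-free (continuous) form here is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def continuous_nri(old_probs: Sequence[float], new_probs: Sequence[float],
                   outcomes: Sequence[int]) -> float:
    """Category-free NRI comparing a new risk model against an old one.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)],
    where "up" means the new model assigns a higher risk than the old model.
    """
    event_up = event_down = nonevent_up = nonevent_down = 0
    n_events = n_nonevents = 0
    for old, new, y in zip(old_probs, new_probs, outcomes):
        if y == 1:
            n_events += 1
            event_up += new > old
            event_down += new < old
        else:
            n_nonevents += 1
            nonevent_up += new > old
            nonevent_down += new < old
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    return (event_up - event_down) / n_events + (nonevent_down - nonevent_up) / n_nonevents
```

Because the NRI counts the direction of each individual's risk change, it can move when the AUC barely does, which is consistent with the pattern the abstract reports.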


Subject(s)
Peripheral Arterial Disease , Humans , Peripheral Arterial Disease/genetics , Peripheral Arterial Disease/diagnosis , Female , Male , Aged , Middle Aged , Risk Assessment/methods , Risk Factors , Machine Learning , Cardiovascular Diseases/genetics , Cardiovascular Diseases/diagnosis , Retrospective Studies , Multifactorial Inheritance/genetics , Case-Control Studies , Area Under Curve , Genetic Risk Score
9.
Am J Hum Genet ; 111(5): 833-840, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38701744

ABSTRACT

Some commercial firms currently sell polygenic indexes (PGIs) to individual consumers, despite their relatively low predictive power. It might be tempting to assume that because the predictive power of many PGIs is so modest, other sorts of firms (such as those selling insurance and financial services) will not be interested in using PGIs for their own purposes. We argue to the contrary. We build this argument in two ways. First, we offer a very simple model, rooted in economic theory, of a profit-maximizing firm that can gain information about a single consumer's genome. We use the model to show that, depending on the specific economic environment, a firm would be willing to pay for statistically noisy PGIs, even if they allow for only a small reduction in uncertainty. Second, we describe two plausible scenarios in which these different kinds of firms could conceivably use PGIs to maximize profits. Finally, we briefly discuss some of the associated ethics and policy issues. They deserve more attention, which is unlikely to be given until it is first recognized that firms whose services affect a large swath of the public will indeed have incentives to use PGIs.


Subject(s)
Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genetic Testing/ethics , Genetic Testing/economics
10.
Nat Commun ; 15(1): 4230, 2024 May 18.
Article in English | MEDLINE | ID: mdl-38762475

ABSTRACT

Type 2 diabetes (T2D) presents a formidable global health challenge, highlighted by its escalating prevalence, underscoring the critical need for precision health strategies and early detection initiatives. Leveraging artificial intelligence, particularly eXtreme Gradient Boosting (XGBoost), we devise robust risk assessment models for T2D. Drawing upon comprehensive genetic and medical imaging datasets from 68,911 individuals in the Taiwan Biobank, our models integrate Polygenic Risk Scores (PRS), Multi-image Risk Scores (MRS), and demographic variables, such as age, sex, and T2D family history. Here, we show that our model achieves an Area Under the Receiver Operating Curve (AUC) of 0.94, effectively identifying high-risk T2D subgroups. A streamlined model featuring eight key variables also maintains a high AUC of 0.939. This high accuracy for T2D risk assessment promises to catalyze early detection and preventive strategies. Moreover, we introduce an accessible online risk assessment tool for T2D, facilitating broader applicability and dissemination of our findings.


Subject(s)
Artificial Intelligence , Diabetes Mellitus, Type 2 , Diabetes Mellitus, Type 2/genetics , Humans , Risk Assessment/methods , Female , Male , Middle Aged , Taiwan/epidemiology , Genetic Predisposition to Disease , Adult , Diagnostic Imaging/methods , Aged , Risk Factors , ROC Curve , Multifactorial Inheritance/genetics
11.
PLoS Biol ; 22(4): e3002511, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38603516

ABSTRACT

A central aim of genome-wide association studies (GWASs) is to estimate direct genetic effects: the causal effects on an individual's phenotype of the alleles that they carry. However, estimates of direct effects can be subject to genetic and environmental confounding and can also absorb the "indirect" genetic effects of relatives' genotypes. Recently, an important development in controlling for these confounds has been the use of within-family GWASs, which, because of the randomness of mendelian segregation within pedigrees, are often interpreted as producing unbiased estimates of direct effects. Here, we present a general theoretical analysis of the influence of confounding in standard population-based and within-family GWASs. We show that, contrary to common interpretation, family-based estimates of direct effects can be biased by genetic confounding. In humans, such biases will often be small per-locus, but can be compounded when effect-size estimates are used in polygenic scores (PGSs). We illustrate the influence of genetic confounding on population- and family-based estimates of direct effects using models of assortative mating, population stratification, and stabilizing selection on GWAS traits. We further show how family-based estimates of indirect genetic effects, based on comparisons of parentally transmitted and untransmitted alleles, can suffer substantial genetic confounding. We conclude that, while family-based studies have placed GWAS estimation on a more rigorous footing, they carry subtle issues of interpretation that arise from confounding.
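
One common within-family design is the sibling-difference regression, in which differences in phenotype between siblings are regressed on differences in genotype so that anything shared by the pair cancels out. The sketch below illustrates that idea under simplifying assumptions (one locus, one sibling pair per family, no intercept); it is not the theoretical framework of the paper, and the function name and data layout are hypothetical.

```python
from typing import Sequence, Tuple

# Each pair is (genotype_sib1, phenotype_sib1, genotype_sib2, phenotype_sib2).
SibPair = Tuple[float, float, float, float]


def sibling_difference_effect(pairs: Sequence[SibPair]) -> float:
    """Least-squares slope of phenotype differences on genotype differences.

    beta = sum(dg * dy) / sum(dg ** 2), a regression through the origin.
    Effects shared within a family drop out of the differences.
    """
    numerator = 0.0
    denominator = 0.0
    for g1, y1, g2, y2 in pairs:
        dg = g1 - g2
        dy = y1 - y2
        numerator += dg * dy
        denominator += dg * dg
    if denominator == 0.0:
        raise ValueError("no genotype differences between siblings")
    return numerator / denominator
```

As the abstract stresses, removing shared family effects in this way does not by itself guarantee an unbiased estimate of the direct effect, since genetic confounding can remain.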


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Genotype , Phenotype , Multifactorial Inheritance/genetics , Alleles , Polymorphism, Single Nucleotide/genetics
12.
Cell Genom ; 4(4): 100539, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38604127

ABSTRACT

Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, MUSSEL has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, ultimately a combination of methods may be needed to generate the most robust PRSs across diverse populations.


Subject(s)
Bivalvia , Multifactorial Inheritance , Humans , Animals , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Phenotype , Genetic Risk Score
14.
PLoS Genet ; 20(4): e1011249, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38669290

ABSTRACT

Polygenic scores (PGS) are measures of genetic risk, derived from the results of genome-wide association studies (GWAS). Previous work has proposed the coefficient of determination (R²) as an appropriate measure by which to compare PGS performance in a validation dataset. Here we propose correlation-based methods for evaluating PGS performance by adapting previous work which produced a statistical framework and robust test statistics for the comparison of multiple correlation measures in multiple populations. This flexible framework can be extended to a wider variety of hypothesis tests than currently available methods. We assess our proposed method in simulation and demonstrate its utility with two examples, assessing previously developed PGS for low-density lipoprotein cholesterol and height in multiple populations in the All of Us cohort. Finally, we provide an R package 'coranova' with both parametric and nonparametric implementations of the described methods.
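
The link between the two performance measures mentioned above is that, for a single predictor in a simple linear regression, the coefficient of determination equals the squared Pearson correlation. The sketch below shows only that relationship; it does not reproduce the 'coranova' test statistics, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between a polygenic score x and a phenotype y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared_single_predictor(x: Sequence[float], y: Sequence[float]) -> float:
    """R² of the simple linear regression of y on x, which equals r squared."""
    return pearson_r(x, y) ** 2
```

Working on the correlation scale keeps the sign of the association, which R² discards.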


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Cholesterol, LDL/blood , Cholesterol, LDL/genetics , Genetic Predisposition to Disease , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Body Height/genetics , Computer Simulation , Genetics, Population/methods
15.
Med ; 5(5): 459-468.e3, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38642556

ABSTRACT

BACKGROUND: The extent to which the relationships between clinical risk factors and coronary artery disease (CAD) are altered by CAD polygenic risk score (PRS) is not well understood. Here, we determine whether the interactions between clinical risk factors and CAD PRS further explain risk for incident CAD. METHODS: Participants were of European ancestry from the UK Biobank without prevalent CAD. An externally trained genome-wide CAD PRS was generated and then applied. Clinical risk factors were ascertained at baseline. Cox proportional hazards models were fitted to examine the incident CAD effects of CAD PRS, risk factors, and their interactions. Next, the PRS and risk factors were stratified to investigate the attributable risk of clinical risk factors. FINDINGS: A total of 357,144 individuals of European ancestry without prevalent CAD were included. During a median of 11.1 years of follow-up (interquartile range 10.4-14.1 years), CAD PRS was associated with 1.35-fold (95% confidence interval [CI] 1.332-1.368) risk per SD for incident CAD. The prognostic relevance of the following risk factors was relatively diminished for those with high CAD PRS on a continuous scale: type 2 diabetes (hazard ratio for interaction [HR_interaction] 0.91, 95% CI_interaction 0.88-0.94), increased body mass index (HR_interaction 0.97, 95% CI_interaction 0.96-0.98), and increased C-reactive protein (HR_interaction 0.98, 95% CI_interaction 0.96-0.99). However, a high CAD PRS yielded joint risk increases with low-density lipoprotein cholesterol (HR_interaction 1.05, 95% CI_interaction 1.04-1.06) and total cholesterol (HR_interaction 1.05, 95% CI_interaction 1.03-1.06). CONCLUSION: The CAD PRS is associated with incident CAD, and its application improves the prognostic relevance of several clinical risk factors. FUNDING: P.N. (R01HL127564, R01HL151152, and U01HG011719) is supported by the National Institutes of Health.
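
To see how an interaction hazard ratio below 1 diminishes a risk factor's prognostic relevance at high PRS, the sketch below evaluates a Cox-style linear predictor with a multiplicative interaction term. The per-SD PRS hazard ratio of 1.35 and the interaction hazard ratio of 0.91 come from the abstract; the main-effect hazard ratio of 2.0 for the risk factor is a purely hypothetical value, as are the function names.

```python
def relative_hazard(prs_sd: float, has_risk_factor: int,
                    hr_prs: float, hr_risk_factor: float, hr_interaction: float) -> float:
    """Hazard relative to a person with average PRS and no risk factor.

    Corresponds to exp(b_prs * z + b_rf * x + b_int * z * x), written with
    hazard ratios (the exponentiated coefficients) instead of the b values.
    """
    return (hr_prs ** prs_sd
            * hr_risk_factor ** has_risk_factor
            * hr_interaction ** (prs_sd * has_risk_factor))


def risk_factor_hr_at_prs(prs_sd: float, hr_risk_factor: float, hr_interaction: float) -> float:
    """Hazard ratio attributable to the risk factor at a given PRS (in SD units)."""
    return hr_risk_factor * hr_interaction ** prs_sd


HR_PRS_PER_SD = 1.35        # reported in the abstract
HR_INTERACTION_T2D = 0.91   # reported in the abstract
HR_T2D_HYPOTHETICAL = 2.0   # assumption for illustration only
```

With these inputs the risk factor's hazard ratio falls from 2.0 at average PRS to about 1.66 at two SD above average, since 2.0 × 0.91² = 1.6562.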


Subject(s)
Coronary Artery Disease , Humans , Coronary Artery Disease/genetics , Coronary Artery Disease/epidemiology , Male , Female , Middle Aged , Risk Factors , United Kingdom/epidemiology , Proportional Hazards Models , Aged , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Adult , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/epidemiology , White People/genetics , Incidence , Risk Assessment , Heart Disease Risk Factors , Genetic Risk Score
16.
Nat Genet ; 56(5): 827-837, 2024 May.
Article in English | MEDLINE | ID: mdl-38632349

ABSTRACT

We report a multi-ancestry genome-wide association study on liver cirrhosis and its associated endophenotypes, alanine aminotransferase (ALT) and γ-glutamyl transferase. Using data from 12 cohorts, including 18,265 cases with cirrhosis, 1,782,047 controls, up to 1 million individuals with liver function tests and a validation cohort of 21,689 cases and 617,729 controls, we identify and validate 14 risk associations for cirrhosis. Many variants are located near genes involved in hepatic lipid metabolism. One of these, PNPLA3 p.Ile148Met, interacts with alcohol intake, obesity and diabetes on the risk of cirrhosis and hepatocellular carcinoma (HCC). We develop a polygenic risk score that associates with the progression from cirrhosis to HCC. By focusing on prioritized genes from common variant analyses, we find that rare coding variants in GPAM associate with lower ALT, supporting GPAM as a potential target for therapeutic inhibition. In conclusion, this study provides insights into the genetic underpinnings of cirrhosis.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Liver Cirrhosis , Humans , Liver Cirrhosis/genetics , Liver Neoplasms/genetics , Carcinoma, Hepatocellular/genetics , Alanine Transaminase/blood , Polymorphism, Single Nucleotide , Male , Lipase/genetics , Female , gamma-Glutamyltransferase/genetics , Membrane Proteins/genetics , Cohort Studies , Case-Control Studies , Multifactorial Inheritance/genetics , Risk Factors , Genetic Variation
17.
Nat Genet ; 56(5): 861-868, 2024 May.
Article in English | MEDLINE | ID: mdl-38637616

ABSTRACT

Rare damaging variants in a large number of genes are known to cause monogenic developmental disorders (DDs) and have also been shown to cause milder subclinical phenotypes in population cohorts. Here, we show that carrying multiple (2-5) rare damaging variants across 599 dominant DD genes has an additive adverse effect on numerous cognitive and socioeconomic traits in UK Biobank, which can be partially counterbalanced by a higher educational attainment polygenic score (EA-PGS). Phenotypic deviators from expected EA-PGS could be partly explained by the enrichment or depletion of rare DD variants. Among carriers of rare DD variants, those with a DD-related clinical diagnosis had a substantially lower EA-PGS and more severe phenotype than those without a clinical diagnosis. Our results suggest that the overall burden of both rare and common variants can modify the expressivity of a phenotype, which may then influence whether an individual reaches the threshold for clinical disease.


Subject(s)
Developmental Disabilities , Multifactorial Inheritance , Phenotype , Humans , Multifactorial Inheritance/genetics , Developmental Disabilities/genetics , Female , Male , Genetic Predisposition to Disease , Genetic Variation , United Kingdom , Genes, Modifier , Middle Aged , Genome-Wide Association Study
18.
Nat Genet ; 56(5): 767-777, 2024 May.
Article in English | MEDLINE | ID: mdl-38689000

ABSTRACT

We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.


Subject(s)
Genome-Wide Association Study , Molecular Sequence Annotation , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Humans , Molecular Sequence Annotation/methods , Genomics/methods , Genome, Human , Models, Genetic
19.
Elife ; 12, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38639992

ABSTRACT

We propose a new framework for human genetic association studies: at each locus, a deep learning model (in this study, Sei) is used to calculate the functional genomic activity score for two haplotypes per individual. This score, defined as the Haplotype Function Score (HFS), replaces the original genotype in association studies. Applying the HFS framework to 14 complex traits in the UK Biobank, we identified 3619 independent HFS-trait associations with a significance of p < 5 × 10⁻⁸. Fine-mapping revealed 2699 causal associations, corresponding to a median increase of 63 causal findings per trait compared with single-nucleotide polymorphism (SNP)-based analysis. HFS-based enrichment analysis uncovered 727 pathway-trait associations and 153 tissue-trait associations with strong biological interpretability, including 'circadian pathway-chronotype' and 'arachidonic acid-intelligence'. Lastly, we applied least absolute shrinkage and selection operator (LASSO) regression to integrate HFS prediction score with SNP-based polygenic risk scores, which showed an improvement of 16.1-39.8% in cross-ancestry polygenic prediction. We concluded that HFS is a promising strategy for understanding the genetic basis of human complex traits.


Scattered throughout the human genome are variations in the genetic code that make individuals more or less likely to develop certain traits. To identify these variants, scientists carry out genome-wide association studies (GWAS), which compare the DNA variants of large groups of people with and without the trait of interest. This method has been able to find the underlying genes for many human diseases, but it has limitations. For instance, some variations are linked together due to where they are positioned within DNA, which can result in GWAS falsely reporting associations between genetic variants and traits. This phenomenon, known as linkage disequilibrium, can be avoided by analyzing functional genomics, which looks at the multiple ways a gene's activity can be influenced by a variation: for instance, how the gene is copied and decoded into proteins and RNA molecules, and the rate at which these products are generated. Researchers can now use an artificial intelligence technique called deep learning to generate functional genomic data from a particular DNA sequence. Here, Song et al. used one of these deep learning models to calculate the functional genomics of haplotypes, groups of genetic variants inherited from one parent. The approach was applied to DNA samples from over 350 thousand individuals included in the UK Biobank. An activity score, defined as the haplotype function score (or HFS for short), was calculated for at least two haplotypes per individual, and then compared to various complex traits like height or bone density. Song et al. found that the HFS framework was better at finding links between genes and specific traits than existing methods. It also provided more information on the biology that may be underpinning these outcomes. Although more work is needed to reduce the computer processing times required to calculate the HFS, Song et al. believe that their new method has the potential to improve the way researchers identify links between genes and human traits.


Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , Haplotypes , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Phenotype
20.
PLoS Comput Biol ; 20(4): e1011990, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38598551

ABSTRACT

Prostate cancer is a heritable disease with ancestry-biased incidence and mortality. Polygenic risk scores (PRSs) offer promising advancements in predicting disease risk, including prostate cancer. While their accuracy continues to improve, research aimed at enhancing their effectiveness within African and Asian populations remains key for equitable use. Recent algorithmic developments for PRS derivation have resulted in improved pan-ancestral risk prediction for several diseases. In this study, we benchmark the predictive power of six widely used PRS derivation algorithms, including four of which adjust for ancestry, against prostate cancer cases and controls from the UK Biobank and All of Us cohorts. We find modest improvement in discriminatory ability when compared with a simple method that prioritizes variants, clumping, and published polygenic risk scores. Our findings underscore the importance of improving upon risk prediction algorithms and the sampling of diverse cohorts.


Subject(s)
Algorithms , Benchmarking , Genetic Predisposition to Disease , Multifactorial Inheritance , Prostatic Neoplasms , Humans , Prostatic Neoplasms/genetics , Male , Benchmarking/methods , Genetic Predisposition to Disease/genetics , Multifactorial Inheritance/genetics , Cohort Studies , Risk Factors , Polymorphism, Single Nucleotide/genetics , Genome-Wide Association Study/methods , Computational Biology/methods , Risk Assessment/methods , Case-Control Studies , Genetic Risk Score