Results 1 - 20 of 97
1.
Nat Commun ; 15(1): 5007, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866767

ABSTRACT

Polygenic scores (PGSs) offer the ability to predict genetic risk for complex diseases across the life course; a key benefit over short-term prediction models. To produce risk estimates relevant to clinical and public health decision-making, it is important to account for varying effects due to age and sex. Here, we develop a novel framework to estimate country-, age-, and sex-specific estimates of cumulative incidence stratified by PGS for 18 high-burden diseases. We integrate PGS associations from seven studies in four countries (N = 1,197,129) with disease incidences from the Global Burden of Disease. PGS has a significant sex-specific effect for asthma, hip osteoarthritis, gout, coronary heart disease and type 2 diabetes (T2D), with all but T2D exhibiting a larger effect in men. PGS has a larger effect in younger individuals for 13 diseases, with effects decreasing linearly with age. We show for breast cancer that, relative to individuals in the bottom 20% of polygenic risk, the top 5% attain an absolute risk for screening eligibility 16.3 years earlier. Our framework increases the generalizability of results from biobank studies and the accuracy of absolute risk estimates by appropriately accounting for age- and sex-specific PGS effects. Our results highlight the potential of PGS as a screening tool which may assist in the early prevention of common diseases.


Subject(s)
Genetic Predisposition to Disease , Multifactorial Inheritance , Humans , Male , Female , Multifactorial Inheritance/genetics , Incidence , Middle Aged , Adult , Aged , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/epidemiology , Risk Factors , Risk Assessment/methods , Global Burden of Disease , Sex Factors , Age Factors
3.
bioRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38766054

ABSTRACT

Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we measured the activity of 221,412 trait-associated variants that had been statistically fine-mapped using a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.

4.
medRxiv ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38798318

ABSTRACT

Understanding the genetic basis of gene expression can help us understand the molecular underpinnings of human traits and disease. Expression quantitative trait locus (eQTL) mapping can help in studying this relationship, but eQTLs have been shown to be very cell-type specific, motivating the use of single-cell RNA sequencing and single-cell eQTLs to obtain a more granular view of genetic regulation. Current methods for single-cell eQTL mapping either rely on the "pseudobulk" approach and traditional pipelines for bulk transcriptomics or do not scale well to large datasets. Here, we propose SAIGE-QTL, a robust and scalable tool that can directly map eQTLs using single-cell profiles without needing aggregation at the pseudobulk level. Additionally, SAIGE-QTL allows for testing the effects of less frequent/rare genetic variation, which is traditionally excluded from eQTL mapping studies, through set-based tests. We evaluate the performance of SAIGE-QTL on both real and simulated data and demonstrate the improved power for eQTL mapping over existing pipelines.

5.
medRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798542

ABSTRACT

Leveraging data from multiple ancestries can greatly improve fine-mapping power due to differences in linkage disequilibrium and allele frequencies. We propose MultiSuSiE, an extension of the sum of single effects model (SuSiE) to multiple ancestries that allows causal effect sizes to vary across ancestries based on a multivariate normal prior informed by empirical data. We evaluated MultiSuSiE via simulations and analyses of 14 quantitative traits leveraging whole-genome sequencing data in 47k African-ancestry and 94k European-ancestry individuals from All of Us. In simulations, MultiSuSiE applied to Afr47k+Eur47k was well-calibrated and attained higher power than SuSiE applied to Eur94k; interestingly, higher causal variant PIPs in Afr47k compared to Eur47k were entirely explained by differences in the extent of LD quantified by LD 4th moments. Compared to very recently proposed multi-ancestry fine-mapping methods, MultiSuSiE attained higher power and/or much lower computational costs, making the analysis of large-scale All of Us data feasible. In real trait analyses, MultiSuSiE applied to Afr47k+Eur94k identified 579 fine-mapped variants with PIP > 0.5, and MultiSuSiE applied to Afr47k+Eur47k identified 44% more fine-mapped variants with PIP > 0.5 than SuSiE applied to Eur94k. We validated MultiSuSiE results for real traits via functional enrichment of fine-mapped variants. We highlight several examples where MultiSuSiE implicates well-studied or biologically plausible fine-mapped variants that were not implicated by other methods.

6.
Nat Genet ; 56(4): 615-626, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38594305

ABSTRACT

Translating genome-wide association study (GWAS) loci into causal variants and genes requires accurate cell-type-specific enhancer-gene maps from disease-relevant tissues. Building enhancer-gene maps is essential but challenging with current experimental methods in primary human tissues. Here we developed a nonparametric statistical method, SCENT (single-cell enhancer target gene mapping), that models association between enhancer chromatin accessibility and gene expression in single-cell or nucleus multimodal RNA sequencing and ATAC sequencing data. We applied SCENT to 9 multimodal datasets including >120,000 single cells or nuclei and created 23 cell-type-specific enhancer-gene maps. These maps were highly enriched for causal variants in expression quantitative trait loci and GWAS for 1,143 diseases and traits. We identified likely causal genes for both common and rare diseases and linked somatic mutation hotspots to target genes. We demonstrate that application of SCENT to multimodal data from disease-relevant human tissue enables the scalable construction of accurate cell-type-specific enhancer-gene maps, essential for defining noncoding variant function.


Subject(s)
Genome-Wide Association Study , Regulatory Sequences, Nucleic Acid , Humans , Alleles , Genome-Wide Association Study/methods , Chromosome Mapping , Phenotype , Chromatin/genetics , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease/genetics
7.
Aliment Pharmacol Ther ; 59(11): 1402-1412, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38497224

ABSTRACT

BACKGROUND AND AIMS: The European Association for the Study of the Liver introduced a clinical pathway (EASL CP) for screening significant/advanced fibrosis in people at risk of steatotic liver disease (SLD). We assessed the performance of the first-step FIB4 EASL CP in the general population across different SLD risk groups (MASLD, MetALD and ALD) and various age classes. METHODS: We analysed a total of 3372 individuals at risk of SLD from the 2017-2018 National Health and Nutrition Examination Survey (NHANES17-18), projected to 152.3 million U.S. adults, 300,329 from the UK Biobank (UKBB) and 57,644 from the Biobank Japan (BBJ). We assessed liver stiffness measurement (LSM) ≥8 kPa and liver-related events occurring within 3 and 10 years (3/10 year-LREs) as outcomes. We defined MASLD, MetALD, and ALD according to recent international recommendations. RESULTS: FIB4 sensitivity for LSM ≥ 8 kPa was low (27.7%), but it ranged from approximately 80% to 90% for 3-year LREs. Using FIB4, 22%-57% of subjects across the three cohorts were identified as candidates for vibration-controlled transient elastography (VCTE), which was mostly avoidable (positive predictive value of FIB4 ≥ 1.3 for LSM ≥ 8 kPa ranging from 9.5% to 13% across different SLD categories). Sensitivity for LSM ≥ 8 kPa and LREs increased with increasing alcohol intake (ALD>MetALD>MASLD) and with age class. For individuals aged ≥65 years, using the recommended age-adjusted FIB4 cut-off (≥2) substantially reduced sensitivity for LSM ≥ 8 kPa and LREs. CONCLUSIONS: The first-step FIB4 EASL CP has poor accuracy and feasibility for individuals at risk of SLD in the general population. It is crucial to enhance the screening strategy with a first-step approach able to reduce unnecessary VCTEs and optimise their yield.


Subject(s)
Fatty Liver , Mass Screening , Adult , Aged , Female , Humans , Male , Middle Aged , Elasticity Imaging Techniques , Fatty Liver/diagnostic imaging , Japan , Liver Cirrhosis , Mass Screening/methods , Non-alcoholic Fatty Liver Disease , Nutrition Surveys , Risk Assessment/methods , Risk Factors , Sensitivity and Specificity , United States
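The screening metrics reported in the entry above (sensitivity and positive predictive value of FIB4 ≥ 1.3 for LSM ≥ 8 kPa) can be illustrated with a minimal sketch. The abstract does not state the FIB-4 formula; the code assumes the commonly used definition, (age × AST) / (platelet count × √ALT). The function names and the toy values in the usage note are hypothetical and are not taken from the study.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index, assuming the standard definition:
    (age x AST) / (platelets x sqrt(ALT)). Not stated in the abstract."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def screening_metrics(fib4_values, lsm_kpa, fib4_cutoff=1.3, lsm_cutoff=8.0):
    """Sensitivity and PPV of a FIB-4 cut-off against an LSM reference.

    Cut-offs default to the values quoted in the abstract
    (FIB4 >= 1.3, LSM >= 8 kPa).
    """
    tp = fp = fn = 0
    for score, lsm in zip(fib4_values, lsm_kpa):
        flagged = score >= fib4_cutoff   # referred for elastography
        diseased = lsm >= lsm_cutoff     # reference-standard positive
        if flagged and diseased:
            tp += 1
        elif flagged and not diseased:
            fp += 1
        elif diseased:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"sensitivity": sensitivity, "ppv": ppv}
```

For example, `screening_metrics([1.5, 1.0, 2.0, 0.9], [9.0, 8.5, 6.0, 5.0])` treats the first and third individuals as flagged and the first and second as reference positives. A low PPV in this setting means that most flagged individuals would undergo elastography without having LSM ≥ 8 kPa.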
8.
Cell Rep Med ; 5(2): 101430, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38382466

ABSTRACT

Primary open-angle glaucoma (POAG), a leading cause of irreversible blindness globally, shows disparity in prevalence and manifestations across ancestries. We perform meta-analysis across 15 biobanks (of the Global Biobank Meta-analysis Initiative) (n = 1,487,441: cases = 26,848) and merge with previous multi-ancestry studies, with the combined dataset representing the largest and most diverse POAG study to date (n = 1,478,037: cases = 46,325) and identify 17 novel significant loci, 5 of which were ancestry specific. Gene-enrichment and transcriptome-wide association analyses implicate vascular and cancer genes, a fifth of which are primary ciliary related. We perform an extensive statistical analysis of SIX6 and CDKN2B-AS1 loci in human GTEx data and across large electronic health records showing interaction between SIX6 gene and causal variants in the chr9p21.3 locus, with expression effect on CDKN2A/B. Our results suggest that some POAG risk variants may be ancestry specific, sex specific, or both, and support the contribution of genes involved in programmed cell death in POAG pathogenesis.


Subject(s)
Genetic Predisposition to Disease , Glaucoma, Open-Angle , Male , Female , Humans , Genetic Predisposition to Disease/genetics , Glaucoma, Open-Angle/genetics , Glaucoma, Open-Angle/epidemiology , Polymorphism, Single Nucleotide , Cell Proliferation , Biology
9.
Nat Genet ; 56(2): 222-233, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38177345

ABSTRACT

Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 cohorts with 88,316 MD cases and 902,757 controls to previously reported data. This analysis used a range of measures to define MD and included samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latin American participants (32%). The multi-ancestry GWAS identified 53 significantly associated novel loci. For loci from GWAS in European ancestry samples, fewer than expected were transferable to other ancestry groups. Fine mapping benefited from additional sample diversity. A transcriptome-wide association study identified 205 significantly associated novel genes. These findings suggest that, for MD, increasing ancestral and global diversity in genetic studies may be particularly important to ensure discovery of core genes and inform about transferability of findings.


Subject(s)
Depressive Disorder, Major , Genome-Wide Association Study , Humans , Genetic Predisposition to Disease , Depressive Disorder, Major/genetics , Depression , Chromosome Mapping , Polymorphism, Single Nucleotide/genetics
11.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
12.
Nat Genet ; 56(1): 162-169, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036779

ABSTRACT

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Bayes Theorem , Multifactorial Inheritance , Algorithms
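The replication failure rate (RFR) described in the entry above compares fine-mapping output on a downsampled dataset with output on the full dataset. The abstract does not give the exact definition, so the sketch below rests on assumptions: a variant counts as confidently fine-mapped in the downsampled run if its posterior inclusion probability (PIP) exceeds 0.9, and as failing to replicate if its full-sample PIP is below 0.1. Both thresholds, the function name and the variant identifiers in the usage note are hypothetical.

```python
def replication_failure_rate(pip_down, pip_full, high=0.9, low=0.1):
    """Fraction of variants confidently fine-mapped in the downsampled run
    that fall to a low PIP in the full-sample run.

    pip_down, pip_full: dicts mapping variant ID -> PIP.
    high, low: illustrative thresholds (assumptions, not from the abstract).
    """
    # Variants called with high confidence in the smaller sample.
    confident = [v for v, pip in pip_down.items() if pip > high]
    if not confident:
        return float("nan")
    # Assumption: a variant missing from the full-sample output is
    # treated as PIP 0, i.e. as a replication failure.
    failed = sum(1 for v in confident if pip_full.get(v, 0.0) < low)
    return failed / len(confident)
```

With `pip_down = {"rs1": 0.95, "rs2": 0.99, "rs3": 0.5, "rs4": 0.92}` and `pip_full = {"rs1": 0.97, "rs2": 0.05, "rs3": 0.6, "rs4": 0.5}`, three variants are confident in the downsampled run and one of them (`rs2`) fails to replicate. For a well-calibrated method, few variants with PIP above 0.9 should be non-causal, so a rate well above that expectation would suggest overconfidence.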
13.
Cell Genom ; 3(10): 100408, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868036

ABSTRACT

Polygenic risk scores (PRSs) developed from multi-ancestry genome-wide association studies (GWASs), PRSmulti, hold promise for improving PRS accuracy and generalizability across populations. To establish best practices for leveraging the increasing diversity of genomic studies, we investigated how various factors affect the performance of PRSmulti compared with PRSs constructed from single-ancestry GWASs (PRSsingle). Through extensive simulations and empirical analyses, we showed that PRSmulti overall outperformed PRSsingle in understudied populations, except when the understudied population represented a small proportion of the multi-ancestry GWAS. Furthermore, integrating PRSs based on local ancestry-informed GWASs and large-scale, European-based PRSs improved predictive performance in understudied African populations, especially for less polygenic traits with large-effect ancestry-enriched variants. Our work highlights the importance of diversifying genomic studies to achieve equitable PRS performance across ancestral populations and provides guidance for developing PRSs from multiple studies.

14.
PLoS Genet ; 19(9): e1010932, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37721944

ABSTRACT

The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.


Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , Quantitative Trait Loci/genetics , Genotype , Base Sequence , Genome-Wide Association Study , Polymorphism, Single Nucleotide
15.
Nature ; 620(7975): 839-848, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37587338

ABSTRACT

Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation [1]. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching [2,3]. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in-trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.


Subject(s)
Cell Nucleus , DNA Copy Number Variations , DNA, Mitochondrial , Heteroplasmy , Mitochondria , Aged , Humans , DNA Copy Number Variations/genetics , DNA, Mitochondrial/genetics , Genome-Wide Association Study , Heteroplasmy/genetics , Mitochondria/genetics , Cell Nucleus/genetics , Alleles , Polymorphism, Single Nucleotide , INDEL Mutation , G-Quadruplexes
16.
iScience ; 26(7): 107051, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37426350

ABSTRACT

Angiogenesis is a sequential process to extend new blood vessels from preexisting ones by sprouting and branching. During angiogenesis, endothelial cells (ECs) exhibit inhomogeneous multicellular behaviors referred to as "cell mixing," in which ECs repetitively exchange their relative positions, but the underlying mechanism remains elusive. Here we identified the coordinated linear and rotational movements potentiated by cell-cell contact as drivers of sprouting angiogenesis using in vitro and in silico approaches. VE-cadherin confers the coordinated linear motility that facilitated forward sprout elongation, although it is dispensable for rotational movement, which was synchronous without VE-cadherin. Mathematical modeling recapitulated the EC motility in the two-cell state and angiogenic morphogenesis with the effects of VE-cadherin-knockout. Finally, we found that VE-cadherin-dependent EC compartmentalization potentiated branch elongations, and confirmed this by mathematical simulation. Collectively, we propose a way to understand angiogenesis, based on unique EC behavioral properties that are partially dependent on VE-cadherin function.

17.
Nat Med ; 29(7): 1611-1612, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37464034
18.
Nat Genet ; 55(8): 1267-1276, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37443254

ABSTRACT

Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but observe the best overall performance by combining PoPS with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene-trait pairs across 113 complex traits and diseases with high precision, finding not only well-established gene-trait relationships but nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.


Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
20.
bioRxiv ; 2023 Apr 07.
Article in English | MEDLINE | ID: mdl-37066341

ABSTRACT

Splicing quantitative trait loci (QTLs) have been implicated as a common mechanism underlying complex trait associations. However, utilising splicing QTLs in target discovery and prioritisation has been challenging due to extensive data normalisation which often renders the direction of the genetic effect as well as its magnitude difficult to interpret. This is further complicated by the fact that strong expression QTLs often manifest as weak splicing QTLs and vice versa, making it difficult to uniquely identify the underlying molecular mechanism at each locus. We find that these ambiguities can be mitigated by visualising the association between the genotype and average RNA sequencing read coverage in the region. Here, we generate these QTL coverage plots for 1.7 million molecular QTL associations in the eQTL Catalogue identified with five quantification methods. We illustrate the utility of these QTL coverage plots by performing colocalisation between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. We find that while visually confirmed splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases. All our association summary statistics and QTL coverage plots are freely available at https://www.ebi.ac.uk/eqtl/.
