1.
bioRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38766054

ABSTRACT

Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we measured the activity of 221,412 trait-associated variants that had been statistically fine-mapped using a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.

2.
Nat Genet ; 56(4): 615-626, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38594305

ABSTRACT

Translating genome-wide association study (GWAS) loci into causal variants and genes requires accurate cell-type-specific enhancer-gene maps from disease-relevant tissues. Building enhancer-gene maps is essential but challenging with current experimental methods in primary human tissues. Here we developed a nonparametric statistical method, SCENT (single-cell enhancer target gene mapping), that models association between enhancer chromatin accessibility and gene expression in single-cell or nucleus multimodal RNA sequencing and ATAC sequencing data. We applied SCENT to 9 multimodal datasets including >120,000 single cells or nuclei and created 23 cell-type-specific enhancer-gene maps. These maps were highly enriched for causal variants in expression quantitative trait loci and GWAS for 1,143 diseases and traits. We identified likely causal genes for both common and rare diseases and linked somatic mutation hotspots to target genes. We demonstrate that application of SCENT to multimodal data from disease-relevant human tissue enables the scalable construction of accurate cell-type-specific enhancer-gene maps, essential for defining noncoding variant function.


Subject(s)
Genome-Wide Association Study , Regulatory Sequences, Nucleic Acid , Humans , Alleles , Genome-Wide Association Study/methods , Chromosome Mapping , Phenotype , Chromatin/genetics , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease/genetics
3.
Aliment Pharmacol Ther ; 59(11): 1402-1412, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38497224

ABSTRACT

BACKGROUND AND AIMS: The European Association for the Study of the Liver introduced a clinical pathway (EASL CP) for screening significant/advanced fibrosis in people at risk of steatotic liver disease (SLD). We assessed the performance of the first-step FIB4 EASL CP in the general population across different SLD risk groups (MASLD, MetALD and ALD) and various age classes. METHODS: We analysed a total of 3372 individuals at risk of SLD from the 2017-2018 National Health and Nutrition Examination Survey (NHANES17-18), projected to 152.3 million U.S. adults, 300,329 from the UK Biobank (UKBB) and 57,644 from the Biobank Japan (BBJ). We assessed liver stiffness measurement (LSM) ≥8 kPa and liver-related events occurring within 3 and 10 years (3/10 year-LREs) as outcomes. We defined MASLD, MetALD, and ALD according to recent international recommendations. RESULTS: FIB4 sensitivity for LSM ≥ 8 kPa was low (27.7%), but it ranged from approximately 80% to 90% for 3-year LREs. Using FIB4, 22%-57% of subjects across the three cohorts were identified as candidates for vibration-controlled transient elastography (VCTE), which was mostly avoidable (positive predictive value of FIB4 ≥ 1.3 for LSM ≥ 8 kPa ranging from 9.5% to 13% across different SLD categories). Sensitivity for LSM ≥ 8 kPa and LREs increased with increasing alcohol intake (ALD>MetALD>MASLD) and age classes. For individuals aged ≥65 years, using the recommended age-adjusted FIB4 cut-off (≥2) substantially reduced sensitivity for LSM ≥ 8 kPa and LREs. CONCLUSIONS: The first-step FIB4 EASL CP has poor accuracy and feasibility for individuals at risk of SLD in the general population. It is crucial to enhance the screening strategy with a first-step approach able to reduce unnecessary VCTEs and optimise their yield.
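
The abstract reports sensitivity and positive predictive value for a FIB4 cut-off without showing how these quantities are obtained. The sketch below is illustrative only: it assumes the commonly cited FIB-4 formula (age × AST / (platelets × √ALT)) and the two cut-offs named in the abstract (≥1.3, and ≥2 for people aged ≥65 years). The function names and the example counts are hypothetical and are not study data.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 index, assuming the commonly cited formula:
    age * AST / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def refer_for_vcte(score, age_years):
    """First-step rule as described in the abstract: cut-off 1.3,
    or the age-adjusted cut-off 2.0 for people aged 65 or older."""
    cutoff = 2.0 if age_years >= 65 else 1.3
    return score >= cutoff


def sensitivity(true_pos, false_neg):
    """Share of people with the outcome who test positive."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Share of test-positive people who truly have the outcome."""
    return true_pos / (true_pos + false_pos)


# Hypothetical example values, not taken from the study.
example_score = fib4(age_years=50, ast_u_l=40, alt_u_l=25, platelets_1e9_l=200)
example_ppv = positive_predictive_value(true_pos=1, false_pos=9)
```

A low positive predictive value, as in the hypothetical example above, means most people referred on the basis of the score would not have the outcome, which is the sense in which the abstract calls many VCTE referrals avoidable.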


Subject(s)
Elasticity Imaging Techniques , Humans , Male , Middle Aged , Female , Adult , Aged , Elasticity Imaging Techniques/methods , Nutrition Surveys , Risk Assessment/methods , Fatty Liver , Risk Factors , Non-alcoholic Fatty Liver Disease , Mass Screening/methods , United States/epidemiology , Liver Cirrhosis , Sensitivity and Specificity , Japan/epidemiology
4.
Cell Rep Med ; 5(2): 101430, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38382466

ABSTRACT

Primary open-angle glaucoma (POAG), a leading cause of irreversible blindness globally, shows disparity in prevalence and manifestations across ancestries. We perform a meta-analysis across 15 biobanks of the Global Biobank Meta-analysis Initiative (n = 1,487,441; cases = 26,848) and merge the results with previous multi-ancestry studies. The combined dataset represents the largest and most diverse POAG study to date (n = 1,478,037; cases = 46,325), and in it we identify 17 novel significant loci, 5 of which are ancestry specific. Gene-enrichment and transcriptome-wide association analyses implicate vascular and cancer genes, a fifth of which are primary ciliary related. We perform an extensive statistical analysis of the SIX6 and CDKN2B-AS1 loci in human GTEx data and across large electronic health records, showing an interaction between the SIX6 gene and causal variants in the chr9p21.3 locus, with an expression effect on CDKN2A/B. Our results suggest that some POAG risk variants may be ancestry specific, sex specific, or both, and support the contribution of genes involved in programmed cell death in POAG pathogenesis.


Subject(s)
Genetic Predisposition to Disease , Glaucoma, Open-Angle , Male , Female , Humans , Genetic Predisposition to Disease/genetics , Glaucoma, Open-Angle/genetics , Glaucoma, Open-Angle/epidemiology , Polymorphism, Single Nucleotide , Cell Proliferation , Biology
6.
Nat Genet ; 56(2): 222-233, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38177345

ABSTRACT

Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 cohorts with 88,316 MD cases and 902,757 controls to previously reported data. This analysis used a range of measures to define MD and included samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latin American participants (32%). The multi-ancestry GWAS identified 53 significantly associated novel loci. For loci from GWAS in European ancestry samples, fewer than expected were transferable to other ancestry groups. Fine mapping benefited from additional sample diversity. A transcriptome-wide association study identified 205 significantly associated novel genes. These findings suggest that, for MD, increasing ancestral and global diversity in genetic studies may be particularly important to ensure discovery of core genes and inform about transferability of findings.


Subject(s)
Depressive Disorder, Major , Genome-Wide Association Study , Humans , Genetic Predisposition to Disease , Depressive Disorder, Major/genetics , Depression , Chromosome Mapping , Polymorphism, Single Nucleotide/genetics
7.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
8.
Nat Genet ; 56(1): 162-169, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036779

ABSTRACT

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Bayes Theorem , Multifactorial Inheritance , Algorithms
9.
Cell Genom ; 3(10): 100408, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868036

ABSTRACT

Polygenic risk scores (PRSs) developed from multi-ancestry genome-wide association studies (GWASs), PRSmulti, hold promise for improving PRS accuracy and generalizability across populations. To establish best practices for leveraging the increasing diversity of genomic studies, we investigated how various factors affect the performance of PRSmulti compared with PRSs constructed from single-ancestry GWASs (PRSsingle). Through extensive simulations and empirical analyses, we showed that PRSmulti overall outperformed PRSsingle in understudied populations, except when the understudied population represented a small proportion of the multi-ancestry GWAS. Furthermore, integrating PRSs based on local ancestry-informed GWASs and large-scale, European-based PRSs improved predictive performance in understudied African populations, especially for less polygenic traits with large-effect ancestry-enriched variants. Our work highlights the importance of diversifying genomic studies to achieve equitable PRS performance across ancestral populations and provides guidance for developing PRSs from multiple studies.

10.
PLoS Genet ; 19(9): e1010932, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37721944

ABSTRACT

The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.


Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , Quantitative Trait Loci/genetics , Genotype , Base Sequence , Genome-Wide Association Study , Polymorphism, Single Nucleotide
11.
Nature ; 620(7975): 839-848, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37587338

ABSTRACT

Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation [1]. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching [2,3]. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.


Subject(s)
Cell Nucleus , DNA Copy Number Variations , DNA, Mitochondrial , Heteroplasmy , Mitochondria , Aged , Humans , DNA Copy Number Variations/genetics , DNA, Mitochondrial/genetics , Genome-Wide Association Study , Heteroplasmy/genetics , Mitochondria/genetics , Cell Nucleus/genetics , Alleles , Polymorphism, Single Nucleotide , INDEL Mutation , G-Quadruplexes
12.
iScience ; 26(7): 107051, 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37426350

ABSTRACT

Angiogenesis is a sequential process to extend new blood vessels from preexisting ones by sprouting and branching. During angiogenesis, endothelial cells (ECs) exhibit inhomogeneous multicellular behaviors referred to as "cell mixing," in which ECs repetitively exchange their relative positions, but the underlying mechanism remains elusive. Here we identified the coordinated linear and rotational movements potentiated by cell-cell contact as drivers of sprouting angiogenesis using in vitro and in silico approaches. VE-cadherin confers the coordinated linear motility that facilitated forward sprout elongation, although it is dispensable for rotational movement, which was synchronous without VE-cadherin. Mathematical modeling recapitulated the EC motility in the two-cell state and angiogenic morphogenesis with the effects of VE-cadherin-knockout. Finally, we found that VE-cadherin-dependent EC compartmentalization potentiated branch elongations, and confirmed this by mathematical simulation. Collectively, we propose a way to understand angiogenesis, based on unique EC behavioral properties that are partially dependent on VE-cadherin function.

13.
Nat Genet ; 55(8): 1267-1276, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37443254

ABSTRACT

Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but observe the best overall performance by combining PoPS with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene-trait pairs across 113 complex traits and diseases with high precision, finding not only well-established gene-trait relationships but nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.


Subject(s)
Multifactorial Inheritance , Quantitative Trait Loci , Humans , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
14.
Nat Med ; 29(7): 1611-1612, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37464034
16.
bioRxiv ; 2023 Apr 07.
Article in English | MEDLINE | ID: mdl-37066341

ABSTRACT

Splicing quantitative trait loci (QTLs) have been implicated as a common mechanism underlying complex trait associations. However, utilising splicing QTLs in target discovery and prioritisation has been challenging due to extensive data normalisation which often renders the direction of the genetic effect as well as its magnitude difficult to interpret. This is further complicated by the fact that strong expression QTLs often manifest as weak splicing QTLs and vice versa, making it difficult to uniquely identify the underlying molecular mechanism at each locus. We find that these ambiguities can be mitigated by visualising the association between the genotype and average RNA sequencing read coverage in the region. Here, we generate these QTL coverage plots for 1.7 million molecular QTL associations in the eQTL Catalogue identified with five quantification methods. We illustrate the utility of these QTL coverage plots by performing colocalisation between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. We find that while visually confirmed splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases. All our association summary statistics and QTL coverage plots are freely available at https://www.ebi.ac.uk/eqtl/.

17.
Science ; 379(6639): 1341-1348, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-36996212

ABSTRACT

Classical statistical genetics theory defines dominance as any deviation from a purely additive, or dosage, effect of a genotype on a trait, which is known as the dominance deviation. Dominance is well documented in plant and animal breeding. Outside of rare monogenic traits, however, evidence in humans is limited. We systematically examined common genetic variation across 1060 traits in a large population cohort (UK Biobank, N = 361,194 samples analyzed) for evidence of dominance effects. We then developed a computationally efficient method to rapidly assess the aggregate contribution of dominance deviations to heritability. Lastly, observing that dominance associations are inherently less correlated between sites at a genomic locus than their additive counterparts, we explored whether they may be leveraged to identify causal variants more confidently.
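
The definition of dominance deviation above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the method developed in the paper: `dominance_deviation` measures how far the heterozygote mean lies from the additive midpoint, and `dominance_coding` is one possible genotype coding (assumed here for illustration) that is uncorrelated with the additive dosage when genotypes follow Hardy-Weinberg proportions. All function names are hypothetical.

```python
def dominance_deviation(mean_hom_ref, mean_het, mean_hom_alt):
    """Heterozygote trait mean minus the additive expectation
    (the midpoint of the two homozygote means)."""
    return mean_het - (mean_hom_ref + mean_hom_alt) / 2


def hwe_frequencies(p):
    """Genotype frequencies for 0, 1, 2 copies of an allele with
    frequency p, assuming Hardy-Weinberg proportions."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)


def dominance_coding(p):
    """Mean-centred dominance codes for 0, 1, 2 copies of the counted
    allele (an assumed coding, chosen to be orthogonal to dosage)."""
    centre = 2 * p * p
    return (0.0 - centre, 2 * p - centre, 4 * p - 2 - centre)


def covariance_with_dosage(p):
    """Covariance between additive dosage (0, 1, 2) and the dominance
    coding under Hardy-Weinberg genotype frequencies."""
    freqs = hwe_frequencies(p)
    dosage = (0.0, 1.0, 2.0)
    dom = dominance_coding(p)
    mean_a = sum(f * a for f, a in zip(freqs, dosage))
    mean_d = sum(f * d for f, d in zip(freqs, dom))
    return sum(f * (a - mean_a) * (d - mean_d)
               for f, a, d in zip(freqs, dosage, dom))
```

A purely additive trait has a dominance deviation of zero. Because the covariance with dosage is zero under these assumptions, a dominance term coded this way can be estimated separately from the additive effect.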


Subject(s)
Biological Specimen Banks , Genes, Dominant , Genetic Variation , Multifactorial Inheritance , Animals , Humans , Breeding , Genotype , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , United Kingdom
19.
Cell Genom ; 3(1): 100241, 2023 Jan 11.
Article in English | MEDLINE | ID: mdl-36777179

ABSTRACT

Polygenic risk scores (PRSs) have been widely explored in precision medicine. However, few studies have thoroughly investigated their best practices in global populations across different diseases. Here we used data from the Global Biobank Meta-analysis Initiative (GBMI) to explore methodological considerations and PRS performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRSs using pruning and thresholding (P + T) and PRS-continuous shrinkage (CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or higher prediction accuracy compared with several other non-European-based panels. PRS-CS overall outperformed the classic P + T method, especially for endpoints with higher SNP-based heritability. Notably, prediction accuracy is heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using GBMI resources and highlight the importance of best practices for PRS in the biobank-scale genomics era.
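
As a rough illustration of the P + T idea mentioned above, the sketch below keeps variants that pass a p-value threshold, greedily drops any variant in high LD with a more significant variant already kept, and scores an individual as the weighted sum of allele dosages. It is a simplified sketch, not the pipeline used in the study: the variant identifiers, thresholds, effect sizes and the pairwise r² lookup are all hypothetical.

```python
def prune_and_threshold(variants, r2, p_max=5e-8, r2_max=0.1):
    """Greedy P + T selection.

    variants: list of dicts with keys "id", "p" (p-value), "beta".
    r2: dict mapping frozenset({id_a, id_b}) to pairwise LD r-squared;
        pairs that are absent are treated as unlinked (r2 = 0).
    """
    kept = []
    for variant in sorted(variants, key=lambda v: v["p"]):
        if variant["p"] > p_max:
            break  # remaining variants are even less significant
        in_ld = any(
            r2.get(frozenset((variant["id"], k["id"])), 0.0) >= r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(variant)
    return kept


def polygenic_score(dosages, kept):
    """Weighted sum of allele dosages (0, 1 or 2) over kept variants."""
    return sum(k["beta"] * dosages.get(k["id"], 0.0) for k in kept)


# Hypothetical summary statistics and LD, for illustration only.
variants = [
    {"id": "v1", "p": 1e-10, "beta": 0.2},
    {"id": "v2", "p": 1e-9, "beta": 0.1},   # in high LD with v1
    {"id": "v3", "p": 1e-8, "beta": -0.3},
    {"id": "v4", "p": 0.01, "beta": 0.5},   # fails the p-value threshold
]
r2 = {frozenset(("v1", "v2")): 0.8}

kept = prune_and_threshold(variants, r2)
score = polygenic_score({"v1": 2, "v2": 1, "v3": 1}, kept)
```

The choice of LD reference panel matters because it determines the r² values used in the pruning step, which is one of the design choices the abstract compares across panels.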
