Results 1 - 20 of 66
1.
bioRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38766054

ABSTRACT

Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we measured the activity of 221,412 trait-associated variants that had been statistically fine-mapped using a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.

2.
Nature ; 626(8000): 799-807, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38326615

ABSTRACT

Linking variants from genome-wide association studies (GWAS) to underlying mechanisms of disease remains a challenge [1-3]. For some diseases, a successful strategy has been to look for cases in which multiple GWAS loci contain genes that act in the same biological pathway [1-6]. However, our knowledge of which genes act in which pathways is incomplete, particularly for cell-type-specific pathways or understudied genes. Here we introduce a method to connect GWAS variants to functions. This method links variants to genes using epigenomics data, links genes to pathways de novo using Perturb-seq and integrates these data to identify convergence of GWAS loci onto pathways. We apply this approach to study the role of endothelial cells in genetic risk for coronary artery disease (CAD), and discover 43 CAD GWAS signals that converge on the cerebral cavernous malformation (CCM) signalling pathway. Two regulators of this pathway, CCM2 and TLNRD1, are each linked to a CAD risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. These results suggest a model whereby CAD risk is driven in part by the convergence of causal genes onto a particular transcriptional pathway in endothelial cells. They highlight shared genes between common and rare vascular diseases (CAD and CCM), and identify TLNRD1 as a new, previously uncharacterized member of the CCM signalling pathway. This approach will be widely useful for linking variants to functions for other common polygenic diseases.


Subject(s)
Coronary Artery Disease, Endothelial Cells, Genome-Wide Association Study, Cavernous Hemangioma of the Central Nervous System, Humans, Coronary Artery Disease/genetics, Coronary Artery Disease/pathology, Endothelial Cells/metabolism, Endothelial Cells/pathology, Genetic Predisposition to Disease/genetics, Cavernous Hemangioma of the Central Nervous System/genetics, Cavernous Hemangioma of the Central Nervous System/pathology, Single Nucleotide Polymorphism, Epigenomics, Signal Transduction/genetics, Multifactorial Inheritance
3.
Nat Genet ; 56(1): 162-169, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036779

ABSTRACT

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.


Subject(s)
Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Bayes Theorem, Multifactorial Inheritance, Algorithms
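The replication failure rate described in the abstract above can be illustrated with a minimal sketch. It assumes RFR is the fraction of variants that are high-confidence in the downsampled analysis but low-confidence in the full-sample analysis; the posterior inclusion probability (PIP) thresholds of 0.9 and 0.1, the function name and the variant IDs are illustrative assumptions, not values taken from this record.

```python
def replication_failure_rate(pip_down, pip_full, high=0.9, low=0.1):
    """Fraction of high-PIP variants in the downsampled analysis that
    have a low PIP in the full-sample analysis.

    pip_down, pip_full: dicts mapping variant ID -> PIP.
    high, low: assumed confidence thresholds (hypothetical defaults).
    Returns None when no variant is confident in the downsampled data.
    """
    # Only variants present in both analyses can be compared.
    confident = [v for v, p in pip_down.items()
                 if p > high and v in pip_full]
    if not confident:
        return None
    failed = [v for v in confident if pip_full[v] < low]
    return len(failed) / len(confident)


# Toy PIPs (made up for illustration only).
pip_down = {"rs1": 0.95, "rs2": 0.97, "rs3": 0.40, "rs4": 0.99, "rs5": 0.92}
pip_full = {"rs1": 0.98, "rs2": 0.03, "rs3": 0.50, "rs4": 0.91, "rs5": 0.45}

rfr = replication_failure_rate(pip_down, pip_full)
print(rfr)  # 1 of 4 confident variants fails to replicate -> 0.25
```

A well-calibrated method would be expected to give an RFR close to the rate implied by its own PIPs, so a high RFR suggests overconfidence.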
4.
bioRxiv ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-38014075

ABSTRACT

Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.

5.
Nat Commun ; 14(1): 7659, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036535

ABSTRACT

Many of the Alzheimer's disease (AD) risk genes are specifically expressed in microglia and astrocytes, but how and when the genetic risk localizing to these cell types contributes to AD pathophysiology remains unclear. Here, we derive cell-type-specific AD polygenic risk scores (ADPRS) from two extensively characterized datasets and uncover the impact of cell-type-specific genetic risk on AD endophenotypes. In an autopsy dataset spanning all stages of AD (n = 1457), the astrocytic ADPRS affected diffuse and neuritic plaques (amyloid-β), while microglial ADPRS affected neuritic plaques, microglial activation, neurofibrillary tangles (tau), and cognitive decline. In an independent neuroimaging dataset of cognitively unimpaired elderly (n = 2921), astrocytic ADPRS was associated with amyloid-β, and microglial ADPRS was associated with amyloid-β and tau, connecting cell-type-specific genetic risk with AD pathology even before symptom onset. Together, our study provides human genetic evidence implicating multiple glial cell types in AD pathophysiology, starting from the preclinical stage.


Subject(s)
Alzheimer Disease, Humans, Aged, Alzheimer Disease/metabolism, Amyloid Plaque/metabolism, tau Proteins/genetics, tau Proteins/metabolism, Amyloid beta-Peptides/metabolism, Neurofibrillary Tangles/genetics, Neurofibrillary Tangles/metabolism, Risk Factors
6.
PLoS Genet ; 19(9): e1010932, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37721944

ABSTRACT

The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.


Subject(s)
Multifactorial Inheritance, Quantitative Trait Loci, Humans, Quantitative Trait Loci/genetics, Genotype, Base Sequence, Genome-Wide Association Study, Single Nucleotide Polymorphism
7.
Nat Genet ; 55(8): 1267-1276, 2023 08.
Article in English | MEDLINE | ID: mdl-37443254

ABSTRACT

Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but observe the best overall performance by combining PoPS with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene-trait pairs across 113 complex traits and diseases with high precision, finding not only well-established gene-trait relationships but nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.


Subject(s)
Multifactorial Inheritance, Quantitative Trait Loci, Humans, Multifactorial Inheritance/genetics, Quantitative Trait Loci/genetics, Genome-Wide Association Study/methods, Genetic Predisposition to Disease/genetics, Phenotype, Single Nucleotide Polymorphism/genetics
8.
medRxiv ; 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37333223

ABSTRACT

Alzheimer's disease (AD) heritability is enriched in glial genes, but how and when cell-type-specific genetic risk contributes to AD remains unclear. Here, we derive cell-type-specific AD polygenic risk scores (ADPRS) from two extensively characterized datasets. In an autopsy dataset spanning all stages of AD (n=1,457), astrocytic (Ast) ADPRS was associated with both diffuse and neuritic Aβ plaques, while microglial (Mic) ADPRS was associated with neuritic Aβ plaques, microglial activation, tau, and cognitive decline. Causal modeling analyses further clarified these relationships. In an independent neuroimaging dataset of cognitively unimpaired elderly (n=2,921), Ast-ADPRS was associated with Aβ, and Mic-ADPRS was associated with Aβ and tau, showing a consistent pattern with the autopsy dataset. Oligodendrocytic and excitatory neuronal ADPRSs were associated with tau, but only in the autopsy dataset including symptomatic AD cases. Together, our study provides human genetic evidence implicating multiple glial cell types in AD pathophysiology, starting from the preclinical stage.

9.
bioRxiv ; 2023 Apr 07.
Article in English | MEDLINE | ID: mdl-37066341

ABSTRACT

Splicing quantitative trait loci (QTLs) have been implicated as a common mechanism underlying complex trait associations. However, utilising splicing QTLs in target discovery and prioritisation has been challenging due to extensive data normalisation which often renders the direction of the genetic effect as well as its magnitude difficult to interpret. This is further complicated by the fact that strong expression QTLs often manifest as weak splicing QTLs and vice versa, making it difficult to uniquely identify the underlying molecular mechanism at each locus. We find that these ambiguities can be mitigated by visualising the association between the genotype and average RNA sequencing read coverage in the region. Here, we generate these QTL coverage plots for 1.7 million molecular QTL associations in the eQTL Catalogue identified with five quantification methods. We illustrate the utility of these QTL coverage plots by performing colocalisation between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. We find that while visually confirmed splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases. All our association summary statistics and QTL coverage plots are freely available at https://www.ebi.ac.uk/eqtl/.

10.
Cell Metab ; 35(5): 887-905.e11, 2023 05 02.
Article in English | MEDLINE | ID: mdl-37075753

ABSTRACT

Cellular exposure to free fatty acids (FFAs) is implicated in the pathogenesis of obesity-associated diseases. However, there are no scalable approaches to comprehensively assess the diverse FFAs circulating in human plasma. Furthermore, assessing how FFA-mediated processes interact with genetic risk for disease remains elusive. Here, we report the design and implementation of fatty acid library for comprehensive ontologies (FALCON), an unbiased, scalable, and multimodal interrogation of 61 structurally diverse FFAs. We identified a subset of lipotoxic monounsaturated fatty acids associated with decreased membrane fluidity. Furthermore, we prioritized genes that reflect the combined effects of harmful FFA exposure and genetic risk for type 2 diabetes (T2D). We found that c-MAF-inducing protein (CMIP) protects cells from FFA exposure by modulating Akt signaling. In sum, FALCON empowers the study of fundamental FFA biology and offers an integrative approach to identify much needed targets for diverse diseases associated with disordered FFA metabolism.


Subject(s)
Type 2 Diabetes Mellitus, Nonesterified Fatty Acids, Humans, Nonesterified Fatty Acids/metabolism, Fatty Acids, Signal Transduction, Biology
11.
bioRxiv ; 2023 Feb 20.
Article in English | MEDLINE | ID: mdl-36865221

ABSTRACT

Cellular exposure to free fatty acids (FFA) is implicated in the pathogenesis of obesity-associated diseases. However, studies to date have assumed that a few select FFAs are representative of broad structural categories, and there are no scalable approaches to comprehensively assess the biological processes induced by exposure to diverse FFAs circulating in human plasma. Furthermore, assessing how these FFA-mediated processes interact with genetic risk for disease remains elusive. Here we report the design and implementation of FALCON (Fatty Acid Library for Comprehensive ONtologies) as an unbiased, scalable and multimodal interrogation of 61 structurally diverse FFAs. We identified a subset of lipotoxic monounsaturated fatty acids (MUFAs) with a distinct lipidomic profile associated with decreased membrane fluidity. Furthermore, we developed a new approach to prioritize genes that reflect the combined effects of exposure to harmful FFAs and genetic risk for type 2 diabetes (T2D). Importantly, we found that c-MAF inducing protein (CMIP) protects cells from exposure to FFAs by modulating Akt signaling and we validated the role of CMIP in human pancreatic beta cells. In sum, FALCON empowers the study of fundamental FFA biology and offers an integrative approach to identify much needed targets for diverse diseases associated with disordered FFA metabolism. Highlights: FALCON (Fatty Acid Library for Comprehensive ONtologies) enables multimodal profiling of 61 free fatty acids (FFAs) to reveal 5 FFA clusters with distinct biological effects. FALCON is applicable to many and diverse cell types. A subset of monounsaturated FAs (MUFAs) equally or more toxic than canonical lipotoxic saturated FAs (SFAs) leads to decreased membrane fluidity. New approach prioritizes genes that represent the combined effects of environmental (FFA) exposure and genetic risk for disease. C-Maf inducing protein (CMIP) is identified as a suppressor of FFA-induced lipotoxicity via Akt-mediated signaling.

12.
Cell ; 185(16): 3041-3055.e25, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.


Subject(s)
DNA Copy Number Variations, Human Genome, DNA Copy Number Variations/genetics, Gene Dosage, Haploinsufficiency/genetics, Humans
13.
Sci Adv ; 8(16): eabl4602, 2022 04 22.
Article in English | MEDLINE | ID: mdl-35452290

ABSTRACT

Coronary artery disease (CAD) remains the leading cause of death despite scientific advances. Elucidating shared CAD/pneumonia pathways may reveal novel insights regarding CAD pathways. We performed genome-wide pleiotropy analyses of CAD and pneumonia, examined the causal effects of the expression of genes near independently replicated SNPs and interacting genes with CAD and pneumonia, and tested interactions between disruptive coding mutations of each pleiotropic gene and smoking status on CAD and pneumonia risks. Identified pleiotropic SNPs were annotated to ADAMTS7 and IL6R. Increased ADAMTS7 expression across tissues consistently showed decreased risk for CAD and increased risk for pneumonia; increased IL6R expression showed increased risk for CAD and decreased risk for pneumonia. We similarly observed opposing CAD/pneumonia effects for NLRP3. Reduced ADAMTS7 expression conferred a reduced CAD risk without increased pneumonia risk only among never-smokers. Genetic immune-inflammatory axes of CAD linked to respiratory infections implicate ADAMTS7 and IL6R, and related genes.


Subject(s)
Coronary Artery Disease, Genetic Pleiotropy, Pneumonia, ADAMTS7 Protein/genetics, Coronary Artery Disease/genetics, Coronary Artery Disease/immunology, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Pneumonia/genetics, Pneumonia/immunology, Single Nucleotide Polymorphism, Interleukin-6 Receptors/genetics
14.
Nat Genet ; 54(4): 450-458, 2022 04.
Article in English | MEDLINE | ID: mdl-35393596

ABSTRACT

Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.


Subject(s)
Genome-Wide Association Study, Multifactorial Inheritance, Humans, Linkage Disequilibrium, Multifactorial Inheritance/genetics, Single Nucleotide Polymorphism/genetics, Risk Factors
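The idea of combining two predictors, as in the PolyPred abstract above, can be sketched as fitting mixing weights by least squares in a small training sample from the target population. This is an illustrative assumption about how such a combination might be fit, not the published implementation: the function names are hypothetical, and details such as an intercept or non-negativity constraints are omitted.

```python
def fit_mixing_weights(pred_a, pred_b, phenotype):
    """Least-squares weights (w_a, w_b) such that
    phenotype ~ w_a * pred_a + w_b * pred_b (no intercept).

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s_aa = sum(a * a for a in pred_a)
    s_bb = sum(b * b for b in pred_b)
    s_ab = sum(a * b for a, b in zip(pred_a, pred_b))
    s_ay = sum(a * y for a, y in zip(pred_a, phenotype))
    s_by = sum(b * y for b, y in zip(pred_b, phenotype))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; weights are not unique")
    w_a = (s_ay * s_bb - s_ab * s_by) / det
    w_b = (s_aa * s_by - s_ab * s_ay) / det
    return w_a, w_b


def combine(pred_a, pred_b, w_a, w_b):
    """Apply fitted weights to produce the combined score per individual."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Toy scores for four individuals (made up for illustration only).
fine_mapping_score = [1.0, 0.0, 2.0, -1.0]
tagging_score = [0.0, 1.0, 1.0, 2.0]
trait = [0.3, 0.7, 1.3, 1.1]  # constructed as 0.3 * first + 0.7 * second

w_a, w_b = fit_mixing_weights(fine_mapping_score, tagging_score, trait)
combined = combine(fine_mapping_score, tagging_score, w_a, w_b)
```

Fitting only two weights keeps the number of parameters estimated in the target population small, which matters when that training sample is limited.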
15.
Cell Genom ; 2(12)2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36643910

ABSTRACT

Meta-analysis is pervasively used to combine multiple genome-wide association studies (GWASs). Fine-mapping of meta-analysis studies is typically performed as in a single-cohort study. Here, we first demonstrate that heterogeneity (e.g., of sample size, phenotyping, imputation) hurts calibration of meta-analysis fine-mapping. We propose a summary statistics-based quality-control (QC) method, suspicious loci analysis of meta-analysis summary statistics (SLALOM), that identifies suspicious loci for meta-analysis fine-mapping by detecting outliers in association statistics. We validate SLALOM in simulations and the GWAS Catalog. Applying SLALOM to 14 meta-analyses from the Global Biobank Meta-analysis Initiative (GBMI), we find that 67% of loci show suspicious patterns that call into question fine-mapping accuracy. These predicted suspicious loci are significantly depleted for having nonsynonymous variants as lead variant (2.7×; Fisher's exact p = 7.3 × 10⁻⁴). We find limited evidence of fine-mapping improvement in the GBMI meta-analyses compared with individual biobanks. We urge extreme caution when interpreting fine-mapping results from meta-analysis of heterogeneous cohorts.
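The depletion statistic quoted above is a Fisher's exact test on a 2×2 table. The sketch below shows how such a test and its odds ratio can be computed from first principles; the counts are hypothetical toy values, not the study's data, and the function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite if b*c is zero."""
    return (a * d) / (b * c) if b * c else float("inf")


# Hypothetical counts: rows = two groups of loci,
# columns = lead variant is nonsynonymous / is not.
table = (8, 2, 1, 5)
p_value = fisher_exact_two_sided(*table)
ratio = odds_ratio(*table)
```

For these toy counts the three tables at least as extreme as the observed one account for 400 of the 11,440 equally weighted arrangements, giving p ≈ 0.035.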

17.
Cell ; 184(20): 5247-5260.e19, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34534445

ABSTRACT

3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.


Subject(s)
3' Untranslated Regions/genetics, Biological Evolution, Disease/genetics, Genome-Wide Association Study, Algorithms, Alleles, Gene Expression Regulation, Reporter Genes, Genetic Variation, Humans, Phenotype, Single Nucleotide Polymorphism/genetics, Polyribosomes/metabolism, Quantitative Trait Loci/genetics, RNA/genetics
19.
Nat Genet ; 53(8): 1166-1176, 2021 08.
Article in English | MEDLINE | ID: mdl-34326544

ABSTRACT

Effective interpretation of genome function and genetic variation requires a shift from epigenetic mapping of cis-regulatory elements (CREs) to characterization of endogenous function. We developed hybridization chain reaction fluorescence in situ hybridization coupled with flow cytometry (HCR-FlowFISH), a broadly applicable approach to characterize CRISPR-perturbed CREs via accurate quantification of native transcripts, alongside CRISPR activity screen analysis (CASA), a hierarchical Bayesian model to quantify CRE activity. Across >325,000 perturbations, we provide evidence that CREs can regulate multiple genes, skip over the nearest gene and display activating and/or silencing effects. At the cholesterol-level-associated FADS locus, we combine endogenous screens with reporter assays to exhaustively characterize multiple genome-wide association signals, functionally nominate causal variants and, importantly, identify their target genes.


Subject(s)
Fluorescence In Situ Hybridization/methods, Nucleic Acid Regulatory Sequences, Signal Transducing Adaptor Proteins/genetics, Bayes Theorem, Clustered Regularly Interspaced Short Palindromic Repeats, Delta-5 Fatty Acid Desaturase, Deoxyribonuclease I/genetics, Deoxyribonuclease I/metabolism, Fatty Acid Desaturases/genetics, Flow Cytometry, GATA1 Transcription Factor/genetics, Humans, K562 Cells, LIM Domain Proteins/genetics, Genetic Models, Single Nucleotide Polymorphism, Proto-Oncogene Proteins/genetics, Quantitative Trait Loci, Kinetoplastida Guide RNA
20.
Nat Commun ; 12(1): 3394, 2021 06 07.
Article in English | MEDLINE | ID: mdl-34099641

ABSTRACT

The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effect on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors thus have lacked large gold standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.


Subject(s)
Chromosome Mapping/methods, Computational Biology/methods, Quantitative Trait Loci, Supervised Machine Learning, Adult, Cohort Studies, Datasets as Topic, Gene Expression Profiling, Humans, Single Nucleotide Polymorphism