1.
Nature ; 606(7912): 120-128, 2022 06.
Article in English | MEDLINE | ID: mdl-35545678

ABSTRACT

Non-coding genetic variants may cause disease by modulating gene expression. However, identifying these expression quantitative trait loci (eQTLs) is complicated by differences in gene regulation across fluid functional cell states within cell types. These states (for example, neurotransmitter-driven programs in astrocytes or perivascular fibroblast differentiation) are obscured in eQTL studies that aggregate cells [1,2]. Here we modelled eQTLs at single-cell resolution in one complex cell type: memory T cells. Using more than 500,000 unstimulated memory T cells from 259 Peruvian individuals, we show that around one-third of 6,511 cis-eQTLs had effects that were mediated by continuous multimodally defined cell states, such as cytotoxicity and regulatory capacity. In some loci, independent eQTL variants had opposing cell-state relationships. Autoimmune variants were enriched in cell-state-dependent eQTLs, including risk variants for rheumatoid arthritis near ORMDL3 and CTLA4; this indicates that cell-state context is crucial to understanding potential eQTL pathogenicity. Moreover, continuous cell states explained more variation in eQTLs than did conventional discrete categories, such as CD4+ versus CD8+, suggesting that modelling eQTLs and cell states at single-cell resolution can expand insight into gene regulation in functionally heterogeneous cell types.


Subjects
Genetic Predisposition to Disease; Memory T Cells; Quantitative Trait Loci; Gene Expression Regulation; Genetic Predisposition to Disease/genetics; Humans; Memory T Cells/immunology; Memory T Cells/metabolism; Peru; Quantitative Trait Loci/genetics
2.
Nature ; 593(7858): 238-243, 2021 05.
Article in English | MEDLINE | ID: mdl-33828297

ABSTRACT

Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease [1]. Many of the underlying causal variants may affect enhancers [2,3], but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types [4]. Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.


Subjects
Enhancer Elements, Genetic/genetics; Genetic Predisposition to Disease; Genetic Variation/genetics; Genome, Human/genetics; Genome-Wide Association Study; Inflammatory Bowel Diseases/genetics; Cell Line; Chromosomes, Human, Pair 10/genetics; Cyclophilins/genetics; Dendritic Cells; Female; Humans; Macrophages/metabolism; Male; Mitochondria/metabolism; Organ Specificity/genetics; Phenotype
3.
Am J Hum Genet ; 109(3): 393-404, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35108496

ABSTRACT

Identifying gene sets that are associated with disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWASs) detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, causal disease genes at TWAS-associated loci generally remain unknown due to gene co-regulation, which leads to correlations across genes in predicted expression. We developed a method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by predicted expression. GCSC regresses TWAS chi-square statistics on gene co-regulation scores reflecting correlations in predicted gene expression; a gene set is enriched for heritability if genes with high co-regulation to the set have higher TWAS chi-square statistics than genes with low co-regulation to the set, beyond what is expected based on co-regulation to all genes. We verified via simulations that GCSC is well calibrated and well powered. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits, analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched sets, recapitulating known biology. For Alzheimer disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify enriched gene sets.


Subjects
Genome-Wide Association Study; Quantitative Trait Loci; Genetic Predisposition to Disease; Humans; Multifactorial Inheritance; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Transcriptome
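
The regression step described in this abstract can be illustrated with a minimal sketch. It is not the authors' GCSC software: the function name `gcsc_like_regression`, the simulated scores and the coefficients 0.5 and 0.1 are assumptions made for illustration, and the sketch omits any weighting or standard-error estimation that a real analysis would need. It regresses per-gene chi-square statistics on a co-regulation score to the gene set and a co-regulation score to all genes; a positive coefficient on the set score corresponds to what the abstract calls enrichment.

```python
import numpy as np

def gcsc_like_regression(chisq, coreg_set, coreg_all):
    """Ordinary least squares of chi-square statistics on two co-regulation scores.

    Hypothetical helper for illustration only, not the published implementation.
    """
    design = np.column_stack([np.ones_like(coreg_set), coreg_set, coreg_all])
    coef, *_ = np.linalg.lstsq(design, chisq, rcond=None)
    return {"intercept": coef[0], "tau_set": coef[1], "tau_all": coef[2]}

# Simulated inputs (assumed values, not data from the paper).
rng = np.random.default_rng(0)
n_genes = 5000
coreg_all = rng.uniform(1.0, 20.0, n_genes)              # co-regulation to all genes
coreg_set = rng.uniform(0.0, 1.0, n_genes) * coreg_all   # co-regulation to the gene set
expected_chisq = 1.0 + 0.5 * coreg_set + 0.1 * coreg_all
# A squared normal statistic with variance v is v times a 1-df chi-square draw.
chisq = expected_chisq * rng.chisquare(df=1, size=n_genes)

fit = gcsc_like_regression(chisq, coreg_set, coreg_all)
print({name: round(float(value), 3) for name, value in fit.items()})
```

Including the all-genes score as a second regressor is what lets the set coefficient capture signal "beyond what is expected based on co-regulation to all genes".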
4.
Nature ; 559(7714): 350-355, 2018 07.
Article in English | MEDLINE | ID: mdl-29995854

ABSTRACT

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Subjects
Chromosome Aberrations; Clone Cells/cytology; Clone Cells/metabolism; Hematopoiesis/genetics; Mosaicism; Adult; Aged; Alleles; Biological Specimen Banks; Chromosome Breakage; Chromosome Fragile Sites/genetics; Chromosomes, Human, Pair 10/genetics; Female; Health; Hematologic Neoplasms/genetics; Hematologic Neoplasms/mortality; Humans; Male; Middle Aged; Penetrance; United Kingdom
5.
Hum Mol Genet ; 30(16): 1521-1534, 2021 07 28.
Article in English | MEDLINE | ID: mdl-33987664

ABSTRACT

It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogeneous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association study samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively. Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.


Subjects
Genetics, Population; Genome-Wide Association Study/statistics & numerical data; Linkage Disequilibrium/genetics; Multifactorial Inheritance/genetics; Genotyping Techniques/statistics & numerical data; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait, Heritable
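
For orientation, the unadjusted LD score regression that this abstract compares against can be sketched as follows. Under the standard LDSC model without confounding, the expected chi-square statistic of SNP j is approximately 1 + N·h²·ℓ_j / M, where ℓ_j is the LD score, N the sample size and M the number of SNPs, so the regression slope times M/N estimates SNP-heritability. The function name `ldsc_h2` and all simulated values are assumptions for illustration; the covariate adjustment of the LD scores that defines cov-LDSC is not shown.

```python
import numpy as np

def ldsc_h2(chisq, ld_scores, n_samples, n_snps):
    """Estimate SNP-heritability from the slope of chi-square on LD score.

    Unweighted least squares for illustration; hypothetical helper name.
    """
    slope, intercept = np.polyfit(ld_scores, chisq, deg=1)
    return slope * n_snps / n_samples, intercept

# Simulated summary statistics (assumed values, not data from the paper).
rng = np.random.default_rng(0)
n_samples, n_snps, true_h2 = 50_000, 1_000_000, 0.4
ld_scores = rng.uniform(1.0, 200.0, size=20_000)
expected_chisq = 1.0 + n_samples * true_h2 * ld_scores / n_snps
chisq = expected_chisq * rng.chisquare(df=1, size=ld_scores.size)

h2_hat, intercept_hat = ldsc_h2(chisq, ld_scores, n_samples, n_snps)
print(f"estimated h2 = {h2_hat:.2f}, intercept = {intercept_hat:.2f}")
```

If the LD scores used in the regression do not match the LD actually present in the sample, the slope is distorted, which is consistent with the abstract's emphasis on in-sample, covariate-adjusted LD estimates for admixed cohorts.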
6.
Nat Rev Genet ; 18(2): 117-127, 2017 02.
Article in English | MEDLINE | ID: mdl-27840428

ABSTRACT

During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.


Subjects
Genetic Variation/genetics; Genome-Wide Association Study/statistics & numerical data; Models, Statistical; Quantitative Trait Loci; Quantitative Trait, Heritable; Computer Simulation; Genotype; Humans; Models, Genetic; Phenotype
7.
Hum Mol Genet ; 29(7): 1057-1067, 2020 05 08.
Article in English | MEDLINE | ID: mdl-31595288

ABSTRACT

Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.


Subjects
Chromatin/genetics; Genetic Diseases, Inborn/genetics; Molecular Sequence Annotation; Transcription Factors/genetics; Binding Sites/genetics; Computational Biology; Gene Expression Regulation/genetics; Genetic Diseases, Inborn/classification; Genetic Diseases, Inborn/pathology; Humans; Linkage Disequilibrium/genetics; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Protein Binding/genetics
8.
Am J Hum Genet ; 104(4): 611-624, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30905396

ABSTRACT

Regulatory elements, e.g., enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated heritability enrichment by applying stratified LD score regression to summary statistics from 41 independent diseases and complex traits (average N = 320K) and meta-analyzing results across traits. Enrichment of human putative enhancers and promoters was larger in elements with older sequence age, assessed via alignment with other species irrespective of conserved functionality: putative enhancer elements with ancient sequence age (older than the split between marsupial and placental mammals) were 8.8× enriched (versus 2.5× for all putative enhancers; p = 3e-14), and promoter elements with ancient sequence age were 13.5× enriched (versus 5.1× for all promoters; p = 5e-16). Enrichment of human putative enhancers and promoters was also larger in elements whose regulatory function was conserved across species, e.g., human putative enhancers that were enhancers in ≥5 of 9 other mammals were 4.6× enriched (p = 5e-12 versus all putative enhancers). Enrichment of human promoters was larger in promoters of loss-of-function intolerant genes: 12.0× enrichment (p = 8e-15 versus all promoters). The mean value of several measures of negative selection within these genomic annotations mirrored all of these findings. Notably, the annotations with these excess heritability enrichments were jointly significant conditional on each other and on our baseline-LD model, which includes a broad set of coding, conserved, regulatory, and LD-related annotations.


Subjects
Enhancer Elements, Genetic; Genetic Diseases, Inborn/genetics; Promoter Regions, Genetic; Animals; Conserved Sequence; Genome-Wide Association Study; Genomics; Humans; Linkage Disequilibrium; Mammals/genetics; Marsupialia/genetics; Phenotype; Polymorphism, Single Nucleotide; Species Specificity
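
As a reading aid for figures such as "8.8× enriched", heritability enrichment in stratified LD score regression is conventionally defined as the proportion of heritability explained by an annotation divided by the proportion of SNPs in that annotation. The sketch below applies that definition; the function name and every input number are hypothetical and were chosen only so that the result matches the order of magnitude quoted in the abstract.

```python
def heritability_enrichment(h2_in_annotation, h2_total, snps_in_annotation, snps_total):
    """Proportion of heritability divided by proportion of SNPs (hypothetical helper)."""
    proportion_h2 = h2_in_annotation / h2_total
    proportion_snps = snps_in_annotation / snps_total
    return proportion_h2 / proportion_snps

# Assumed example: an annotation holding 2% of SNPs and 17.6% of heritability.
enrichment = heritability_enrichment(
    h2_in_annotation=0.088, h2_total=0.5,
    snps_in_annotation=20_000, snps_total=1_000_000,
)
print(f"enrichment = {enrichment:.1f}x")
```

An enrichment of 1 would mean the annotation carries exactly its proportional share of heritability.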
9.
Am J Hum Genet ; 105(3): 456-476, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31402091

ABSTRACT

Complex traits and common diseases are extremely polygenic, their heritability spread across thousands of loci. One possible explanation is that thousands of genes and loci have similarly important biological effects when mutated. However, we hypothesize that for most complex traits, relatively few genes and loci are critical, and negative selection, by purging large-effect mutations in these regions, leaves behind common-variant associations in thousands of less critical regions instead. We refer to this phenomenon as flattening. To quantify its effects, we introduce a mathematical definition of polygenicity, the effective number of independently associated SNPs ($M_e$), which describes how evenly the heritability of a trait is spread across the genome. We developed a method, stratified LD fourth moments regression (S-LD4M), to estimate $M_e$, validating that it produces robust estimates in simulations. Analyzing 33 complex traits (average N = 361k), we determined that heritability is spread ∼4× more evenly among common SNPs than among low-frequency SNPs. This difference, together with evolutionary modeling of new mutations, suggests that complex traits would be orders of magnitude less polygenic if not for the influence of negative selection. We also determined that heritability is spread more evenly within functionally important regions in proportion to their heritability enrichment; functionally important regions do not harbor common SNPs with greatly increased causal effect sizes, due to selective constraint. Our results suggest that for most complex traits, the genes and loci with the most critical biological effects often differ from those with the strongest common-variant associations.


Subjects
Multifactorial Inheritance; Selection, Genetic; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide
10.
Am J Hum Genet ; 104(1): 65-75, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30595370

ABSTRACT

Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N = 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N = 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.


Subjects
Genome-Wide Association Study; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Calibration; Databases, Genetic; Datasets as Topic; False Positive Reactions; Humans; Probability; Reproducibility of Results; United Kingdom
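
The contrast this abstract draws with "unweighted raw p values" can be illustrated with generic p-value weighting, in which each p value is divided by a positive weight and the weights are normalized to average 1 so that the overall significance threshold is unchanged. This is a sketch of the general weighting idea only, not of FINDOR: how FINDOR derives its weights from functional annotations is not reproduced here, and the function name, the weights and the p values below are all hypothetical.

```python
def weighted_p_values(p_values, weights):
    """Divide each p value by its weight after normalizing weights to mean 1.

    Hypothetical helper; weights must be positive.
    """
    mean_weight = sum(weights) / len(weights)
    normalized = [w / mean_weight for w in weights]
    return [min(1.0, p / w) for p, w in zip(p_values, normalized)]

# Assumed example: the first SNP lies in a functionally enriched category.
raw = [8e-8, 1e-9, 0.5]
prior_weights = [4.0, 1.0, 1.0]
threshold = 5e-8  # conventional genome-wide significance level

weighted = weighted_p_values(raw, prior_weights)
raw_hits = [p < threshold for p in raw]
weighted_hits = [p < threshold for p in weighted]
print(raw_hits, weighted_hits)
```

Because the weights average 1, up-weighting some SNPs necessarily down-weights others, so the gain depends on the weights being informative.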
11.
Am J Hum Genet ; 104(5): 879-895, 2019 05 02.
Article in English | MEDLINE | ID: mdl-31006511

ABSTRACT

Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures where specific transcription factors (TFs) are bound. To link these two features, we introduce IMPACT, a genome annotation strategy that identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT distinguishes between bound and unbound TF motif sites with high accuracy (average AUPRC 0.81, SE 0.07; across 8 tested TFs) and outperforms state-of-the-art TF binding prediction methods, MocapG, MocapS, and Virtual ChIP-seq. Second, in eight tested cell types, RNA polymerase II IMPACT annotations capture more cis-eQTL variation than sequence-based annotations, such as promoters and TSS windows (25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N = 38,242) and East Asian (N = 22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% of RA $h^2$, the most comprehensive explanation for RA $h^2$ to date. In comparison, the average RA $h^2$ captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Lastly, we find that IMPACT may be used in many different cell types to identify complex trait associated regulatory elements.


Subjects
Arthritis, Rheumatoid/metabolism; Epigenome; Epigenomics/methods; Genome, Human; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism; Arthritis, Rheumatoid/genetics; Chromatin/genetics; Chromatin/metabolism; Computational Biology/methods; Histones/genetics; Histones/metabolism; Humans; Promoter Regions, Genetic; Protein Binding; Transcription Factors/genetics
12.
Am J Hum Genet ; 104(5): 896-913, 2019 05 02.
Article in English | MEDLINE | ID: mdl-31051114

ABSTRACT

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.


Subjects
Computational Biology/methods; Gene Regulatory Networks; Genes/genetics; Genetic Diseases, Inborn/genetics; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide; Quantitative Trait, Heritable; Humans; Molecular Sequence Annotation; Phenotype; Software
13.
Genet Epidemiol ; 43(2): 180-188, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30474154

ABSTRACT

Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated $\rho_g$, the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of $\rho_g$ depends both on the cross-population correlation of true causal effect sizes ($\rho_b$) and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio $\rho_g / \rho_b$ as a function of LD in each population. By applying existing methods to obtain estimates of $\rho_g$, we can use this ratio to estimate $\rho_b$. Our estimates of $\rho_b$ were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.


Subjects
Genetics, Population; Adult; Aging/genetics; Arthritis, Rheumatoid/genetics; Biological Specimen Banks; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Genotype; Humans; Phenotype; Quantitative Trait, Heritable; United Kingdom
14.
Am J Hum Genet ; 100(1): 31-39, 2017 Jan 05.
Article in English | MEDLINE | ID: mdl-28017371

ABSTRACT

Mixed models have become the tool of choice for genetic association studies; however, standard mixed model methods may be poorly calibrated or underpowered under family sampling bias and/or case-control ascertainment. Previously, we introduced a liability threshold-based mixed model association statistic (LTMLM) to address case-control ascertainment in unrelated samples. Here, we consider family-biased case-control ascertainment, where case and control subjects are ascertained non-randomly with respect to family relatedness. Previous work has shown that this type of ascertainment can severely bias heritability estimates; we show here that it also impacts mixed model association statistics. We introduce a family-based association statistic (LT-Fam) that is robust to this problem. Similar to LTMLM, LT-Fam is computed from posterior mean liabilities (PML) under a liability threshold model; however, LT-Fam uses published narrow-sense heritability estimates to avoid the problem of biased heritability estimation, enabling correct calibration. In simulations with family-biased case-control ascertainment, LT-Fam was correctly calibrated (average χ² = 1.00-1.02 for null SNPs), whereas the Armitage trend test (ATT), standard mixed model association (MLM), and case-control retrospective association test (CARAT) were mis-calibrated (e.g., average χ² = 0.50-1.22 for MLM, 0.89-2.65 for CARAT). LT-Fam also attained higher power than other methods in some settings. In 1,259 type 2 diabetes-affected case subjects and 5,765 control subjects from the CARe cohort, downsampled to induce family-biased ascertainment, LT-Fam was correctly calibrated whereas ATT, MLM, and CARAT were again mis-calibrated. Our results highlight the importance of modeling family sampling bias in case-control datasets with related samples.


Subjects
Family; Genetic Association Studies/methods; Models, Genetic; Bias; Calibration; Diabetes Mellitus, Type 2/genetics; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Retrospective Studies
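
The "posterior mean liabilities under a liability threshold model" mentioned in this abstract have a simple closed form in the most basic setting, sketched below. Assuming a standard normal liability, disease prevalence K and threshold t chosen so that P(liability > t) = K, the mean liability of a case is φ(t)/K and that of a control is −φ(t)/(1 − K). This covers only a single unrelated individual conditioned on their own status; conditioning on relatives and on heritability, as LT-Fam does, is not shown, and the function name and the prevalence value are assumptions.

```python
from statistics import NormalDist

def posterior_mean_liability(prevalence):
    """Return (case mean, control mean) of a standard normal liability.

    Hypothetical helper for one unrelated individual, conditioning only on
    that individual's own case-control status.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    density = std_normal.pdf(threshold)
    return density / prevalence, -density / (1.0 - prevalence)

# Assumed prevalence of 8%, chosen for illustration only.
case_pml, control_pml = posterior_mean_liability(0.08)
print(f"case PML = {case_pml:.3f}, control PML = {control_pml:.3f}")
```

Weighting the two means by K and 1 − K recovers the population mean liability of zero, which is a quick sanity check on any implementation.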
15.
Am J Hum Genet ; 100(4): 605-616, 2017 Apr 06.
Article in English | MEDLINE | ID: mdl-28343628

ABSTRACT

Genetic variants that modulate gene expression levels play an important role in the etiology of human diseases and complex traits. Although large-scale eQTL mapping studies routinely identify many local eQTLs, the molecular mechanisms by which genetic variants regulate expression remain unclear, particularly for distal eQTLs, which these studies are not well powered to detect. Here, we leveraged all variants (not just those that pass stringent significance thresholds) to analyze the functional architecture of local and distal regulation of gene expression in 15 human tissues by employing an extension of stratified LD-score regression that produces robust results in simulations. The top enriched functional categories in local regulation of peripheral-blood gene expression included coding regions (11.41×), conserved regions (4.67×), and four histone marks (p < 5 × 10⁻⁵ for all enrichments); local enrichments were similar across the 15 tissues. We also observed substantial enrichments for distal regulation of peripheral-blood gene expression: coding regions (4.47×), conserved regions (4.51×), and two histone marks (p < 3 × 10⁻⁷ for all enrichments). Analyses of the genetic correlation of gene expression across tissues confirmed that local regulation of gene expression is largely shared across tissues but that distal regulation is highly tissue specific. Our results elucidate the functional components of the genetic architecture of local and distal regulation of gene expression.


Subjects
Gene Expression Regulation; Anxiety/genetics; Computer Simulation; Depression/genetics; Humans; Linkage Disequilibrium; Organ Specificity; Quantitative Trait Loci; Regression Analysis; Twins/genetics
16.
Am J Hum Genet ; 99(1): 76-88, 2016 07 07.
Article in English | MEDLINE | ID: mdl-27321947

ABSTRACT

The increasing number of genetic association studies conducted in multiple populations provides an unprecedented opportunity to study how the genetic architecture of complex phenotypes varies between populations, a problem important for both medical and population genetics. Here, we have developed a method for estimating the transethnic genetic correlation: the correlation of causal-variant effect sizes at SNPs common in populations. This method takes advantage of the entire spectrum of SNP associations and uses only summary-level data from genome-wide association studies. This avoids the computational costs and privacy concerns associated with genotype-level information while remaining scalable to hundreds of thousands of individuals and millions of SNPs. We applied our method to data on gene expression, rheumatoid arthritis, and type 2 diabetes and overwhelmingly found that the genetic correlation was significantly less than 1. Our method is implemented in a Python package called Popcorn.


Subjects
Arthritis, Rheumatoid/genetics; Diabetes Mellitus, Type 2/genetics; Ethnicity/genetics; Genome-Wide Association Study/methods; Software; Body Height; Body Mass Index; Genotype; Humans; Likelihood Functions; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide/genetics; Sample Size
17.
Am J Hum Genet ; 99(5): 1130-1139, 2016 Nov 03.
Article in English | MEDLINE | ID: mdl-27773431

ABSTRACT

Analyzing genetic differences between closely related populations can be a powerful way to detect recent adaptation. The very large sample size of the UK Biobank is ideal for using population differentiation to detect selection and enables an analysis of the UK population structure at fine resolution. In this study, analyses of 113,851 UK Biobank samples showed that population structure in the UK is dominated by five principal components (PCs) spanning six clusters: Northern Ireland, Scotland, northern England, southern England, and two Welsh clusters. Analyses of ancient Eurasians revealed that populations in the northern UK have higher levels of Steppe ancestry and that UK population structure cannot be explained as a simple mixture of Celts and Saxons. A scan for unusual population differentiation along the top PCs identified a genome-wide-significant signal of selection at the coding variant rs601338 in FUT2 (p = 9.16 × 10⁻⁹). In addition, by combining evidence of unusual differentiation within the UK with evidence from ancient Eurasians, we identified genome-wide-significant (p = 5 × 10⁻⁸) signals of recent selection at two additional loci: CYP1A2-CSK and F12. We detected strong associations between diastolic blood pressure in the UK Biobank and both the variants with selection signals at CYP1A2-CSK (p = 1.10 × 10⁻¹⁹) and the variants with ancient Eurasian selection signals at the ATXN2-SH2B3 locus (p = 8.00 × 10⁻³³), implicating recent adaptation related to blood pressure.


Subjects
Biological Specimen Banks/organization & administration , Blood Pressure/genetics , Physiological Adaptation/genetics , Genetic Loci , Population Genetics , Human Genome , Humans , Multigene Family , Phylogeography , Genetic Selection , United Kingdom , White People/genetics
18.
Am J Hum Genet ; 98(3): 456-472, 2016 Mar 03.
Article in English | MEDLINE | ID: mdl-26924531

ABSTRACT

Searching for genetic variants with unusual differentiation between subpopulations is an established approach for identifying signals of natural selection. However, existing methods generally require discrete subpopulations. We introduce a method that infers selection using principal components (PCs) by identifying variants whose differentiation along top PCs is significantly greater than the null distribution of genetic drift. To enable the application of this method to large datasets, we developed the FastPCA software, which employs recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using the PC-based test for natural selection, we replicate previously known selected loci and identify three new genome-wide significant signals of selection, including selection in Europeans at ADH1B. The coding variant rs1229984*T has previously been associated with a decreased risk of alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents. We also detect selection signals at IGFBP3 and IGH, which have also previously been associated with human disease.
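
The abstract above describes approximating the top PCs at a cost linear in the number of individuals. As a rough illustration of that general idea, the following is a minimal sketch of a generic randomized subspace iteration in NumPy. It is not the FastPCA code; the function name `randomized_top_pcs` and the parameters `oversample` and `n_iter` are hypothetical, and the input is assumed to be a dense individuals-by-variants matrix that only needs column centering.

```python
import numpy as np


def randomized_top_pcs(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k left singular vectors and singular values
    of the column-centered matrix X (rows = individuals, columns = variants).

    Illustrative randomized subspace iteration; every step costs
    O(n * m * l) with l = k + oversample, i.e. linear in the number of rows n.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    Xc = X - X.mean(axis=0)                # center each variant
    l = min(k + oversample, n, m)          # size of the random subspace

    # Project onto a random subspace, then refine it by power iterations.
    Y = Xc @ rng.standard_normal((m, l))   # n x l
    for _ in range(n_iter):
        Y, _ = np.linalg.qr(Y)             # re-orthonormalize for stability
        Z, _ = np.linalg.qr(Xc.T @ Y)      # m x l
        Y = Xc @ Z                         # n x l

    Q, _ = np.linalg.qr(Y)                 # orthonormal basis of the subspace
    B = Q.T @ Xc                           # small l x m matrix
    U_small, S, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                        # lift back to n dimensions
    return U[:, :k], S[:k]
```

The exact decomposition is only ever computed on the small `l x m` matrix `B`, which is why the cost does not grow quadratically with the number of individuals in this sketch.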


Subjects
Alcohol Dehydrogenase/genetics , Asian People/genetics , Molecular Evolution , Principal Component Analysis , White People/genetics , Computational Biology , Genetic Databases , Europe , East Asia , Genetic Loci , Population Genetics , Genome-Wide Association Study , Humans , Insulin-Like Growth Factor Binding Protein 3/genetics , Genetic Models , Phylogeny , Single Nucleotide Polymorphism , Genetic Selection
19.
Nat Rev Genet ; 14(7): 507-15, 2013 07.
Article in English | MEDLINE | ID: mdl-23774735

ABSTRACT

The success of genome-wide association studies (GWASs) has led to increasing interest in making predictions of complex trait phenotypes, including disease, from genotype data. Rigorous assessment of the value of predictors is crucial before implementation. Here we discuss some of the limitations and pitfalls of prediction analysis and show how naive implementations can lead to severe bias and misinterpretation of results.


Subjects
Genome-Wide Association Study , Phenotype , Single Nucleotide Polymorphism , Genetic Markers/genetics , Genetic Variation , Genomics , Genotype , Humans , Genetic Models , Statistical Models , Reproducibility of Results , Risk
20.
Genet Epidemiol ; 41(8): 811-823, 2017 12.
Article in English | MEDLINE | ID: mdl-29110330

ABSTRACT

Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sample size or training data from the target population in small sample size, but not both. Here, we introduce a multiethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics in large sample size (Neff = 40k) and Latino training data in small sample size (Neff = 8k). Here, we attained a >70% relative improvement in prediction accuracy (from R² = 0.027 to 0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. We predict T2D in a South Asian UK Biobank cohort using European (Neff = 40k) and South Asian (Neff = 16k) training data and attained a >70% relative improvement in prediction accuracy, and application to predict height in an African UK Biobank cohort using European (N = 113k) and African (N = 2k) training data attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations.
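
To make the numbers in the abstract above concrete, here is a small Python sketch. The first function reproduces the reported relative improvement from the two R² values. The second shows one simple way two risk scores could be combined, as a weighted sum with least-squares mixing weights; the abstract does not state how the two sources are combined, so that weighted-sum form, the function names, and the toy data are all assumptions made for illustration.

```python
def relative_improvement(r2_baseline, r2_new):
    """Relative gain in prediction accuracy of r2_new over r2_baseline."""
    return (r2_new - r2_baseline) / r2_baseline


def fit_mixing_weights(score_a, score_b, phenotype):
    """Least-squares weights (w_a, w_b) for phenotype ~ w_a*score_a + w_b*score_b.

    Assumed illustration only: inputs are taken to be mean-centered,
    so no intercept is fitted. Solves the 2x2 normal equations directly.
    """
    saa = sum(a * a for a in score_a)
    sbb = sum(b * b for b in score_b)
    sab = sum(a * b for a, b in zip(score_a, score_b))
    say = sum(a * y for a, y in zip(score_a, phenotype))
    sby = sum(b * y for b, y in zip(score_b, phenotype))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("scores are collinear; weights are not identifiable")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# R² values quoted in the abstract: 0.027 (single source) vs 0.047 (combined).
print(f"{relative_improvement(0.027, 0.047):.0%}")  # prints 74%
```

The printed value is consistent with the ">70% relative improvement" stated in the abstract. In practice the mixing weights would need to be fitted on data separate from the data used to evaluate accuracy, to avoid overfitting.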


Subjects
Type 2 Diabetes Mellitus/genetics , Genetic Models , Alleles , Cohort Studies , Type 2 Diabetes Mellitus/pathology , Ethnicity/genetics , Genome-Wide Association Study , Genotype , Hispanic or Latino/genetics , Humans , Multifactorial Inheritance , Phenotype , Single Nucleotide Polymorphism , Risk Factors