Results 1 - 20 of 933
1.
Nat Commun ; 11(1): 3981, 2020 08 07.
Article in English | MEDLINE | ID: mdl-32769997

ABSTRACT

Thyroid stimulating hormone (TSH) is critical for normal development and metabolism. To better understand the genetic contribution to TSH levels, we conduct a GWAS meta-analysis at 22.4 million genetic markers in up to 119,715 individuals and identify 74 genome-wide significant loci for TSH, of which 28 are previously unreported. Functional experiments show that the thyroglobulin protein-altering variants P118L and G67S impact thyroglobulin secretion. Phenome-wide association analysis in the UK Biobank demonstrates the pleiotropic effects of TSH-associated variants and a polygenic score for higher TSH levels is associated with a reduced risk of thyroid cancer in the UK Biobank and three other independent studies. Two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter. Our findings highlight the pleiotropic effects of TSH-associated variants on thyroid function and growth of malignant and benign thyroid tumors.
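
As an illustration of the two-sample Mendelian randomization step mentioned above, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate, one common estimator for this design. The abstract does not state which estimator the authors used, and the function name `ivw_estimate` and all input values are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. TSH level)
    beta_outcome:  variant effects on the outcome (e.g. log-odds of disease)
    se_outcome:    standard errors of the outcome effects
    """
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    std_error = sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three instrument variants.
estimate, std_error = ivw_estimate(
    [0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01]
)
```

A negative estimate on the log-odds scale would correspond to the kind of protective effect described in the abstract; the toy inputs here are chosen only so that the arithmetic is easy to follow.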


Subjects
Genetic Pleiotropy , Genome-Wide Association Study , Thyroid Neoplasms/genetics , Thyrotropin/genetics , Genetic Loci , Genetic Predisposition to Disease , Goiter/genetics , Humans , Mendelian Randomization Analysis , Multifactorial Inheritance/genetics , Mutation, Missense/genetics , Phenotype , Physical Chromosome Mapping , Prevalence , Risk Factors , Thyroglobulin/genetics , Thyroid Neoplasms/epidemiology
2.
PLoS One ; 15(8): e0237657, 2020.
Article in English | MEDLINE | ID: mdl-32817676

ABSTRACT

The majority of genome-wide association study (GWAS) loci are not annotated to known genes in the human genome, which renders biological interpretations difficult. Transcriptome-wide association studies (TWAS) associate complex traits with genotype-based prediction of gene expression deriving from expression quantitative trait loci (eQTL) studies, thus improving the interpretability of GWAS findings. However, these results can sometimes suffer from a high false positive rate, because predicted expression of different genes may be highly correlated due to linkage disequilibrium between eQTL. We propose a novel statistical method, Gene Score Regression (GSR), to detect causal gene sets for complex traits while accounting for gene-to-gene correlations. We reason that non-causal genes that are highly correlated with the causal genes will also exhibit a high marginal association with the complex trait. Consequently, by regressing on the marginal associations of complex traits with the sum of the gene-to-gene correlations in each gene set, we can assess the amount of variance of the complex traits explained by the predicted expression of the genes in each gene set and identify plausible causal gene sets. GSR can operate either on GWAS summary statistics or observed gene expression. Therefore, it may be widely applied to annotate GWAS results and identify the underlying biological pathways. We demonstrate the high accuracy and computational efficiency of GSR compared to state-of-the-art methods through simulations and real data applications. GSR is openly available at https://github.com/li-lab-mcgill/GSR.


Subjects
Genome-Wide Association Study/statistics & numerical data , Molecular Sequence Annotation , Multifactorial Inheritance/genetics , Transcriptome/genetics , Gene Expression Regulation/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Genotype , Humans , Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci
3.
Am J Hum Genet ; 107(3): 418-431, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32758451

ABSTRACT

While genome-wide association studies have identified susceptibility variants for numerous traits, their combined utility for predicting broad measures of health, such as mortality, remains poorly understood. We used data from the UK Biobank to combine polygenic risk scores (PRS) for 13 diseases and 12 mortality risk factors into sex-specific composite PRS (cPRS). These cPRS were moderately associated with all-cause mortality in independent data within the UK Biobank: the estimated hazard ratios per standard deviation were 1.10 (95% confidence interval: 1.05, 1.16) and 1.15 (1.10, 1.19) for women and men, respectively. Differences in life expectancy between the top and bottom 5% of the cPRS were estimated to be 4.79 (1.76, 7.81) years and 6.75 (4.16, 9.35) years for women and men, respectively. These associations were substantially attenuated after adjusting for non-genetic mortality risk factors measured at study entry (i.e., middle age for most participants). The cPRS may be useful in counseling younger individuals at higher genetic risk of mortality on modification of non-genetic factors.
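
A polygenic risk score of the kind combined here is, in its simplest form, a weighted sum of risk-allele dosages. The sketch below assumes per-variant weights taken from GWAS effect sizes and rescales scores to unit standard deviation, which is the scale on which a hazard ratio "per standard deviation" is reported. The function names and all numbers are hypothetical, and the paper's construction of the composite score is not reproduced.

```python
from math import sqrt


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Rescale scores to mean 0 and (population) standard deviation 1."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    sd = sqrt(variance)
    return [(s - mean) / sd for s in scores]


# Hypothetical effect-size weights for three variants.
weights = [0.1, -0.2, 0.3]

# Hypothetical dosages for three individuals.
cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]

raw_scores = [polygenic_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
```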


Subjects
Genetic Diseases, Inborn/mortality , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Risk Assessment/statistics & numerical data , Biological Specimen Banks , Female , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/pathology , Genome-Wide Association Study , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Proportional Hazards Models , Risk Factors , United Kingdom
4.
Am J Hum Genet ; 107(3): 432-444, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32758450

ABSTRACT

Accurate colorectal cancer (CRC) risk prediction models are critical for identifying individuals at low and high risk of developing CRC, as they can then be offered targeted screening and interventions to address their risks of developing disease (if they are in a high-risk group) and avoid unnecessary screening and interventions (if they are in a low-risk group). As it is likely that thousands of genetic variants contribute to CRC risk, it is clinically important to investigate whether these genetic variants can be used jointly for CRC risk prediction. In this paper, we derived and compared different approaches to generating predictive polygenic risk scores (PRS) from genome-wide association studies (GWASs) including 55,105 CRC-affected case subjects and 65,079 control subjects of European ancestry. We built the PRS in three ways, using (1) 140 previously identified and validated CRC loci; (2) SNP selection based on linkage disequilibrium (LD) clumping followed by machine-learning approaches; and (3) LDpred, a Bayesian approach for genome-wide risk prediction. We tested the PRS in an independent cohort of 101,987 individuals with 1,699 CRC-affected case subjects. The discriminatory accuracy, calculated by the age- and sex-adjusted area under the receiver operating characteristics curve (AUC), was highest for the LDpred-derived PRS (AUC = 0.654) including nearly 1.2 M genetic variants (the proportion of causal genetic variants for CRC assumed to be 0.003), whereas the PRS of the 140 known variants identified from GWASs had the lowest AUC (AUC = 0.629). Based on the LDpred-derived PRS, we are able to identify 30% of individuals without a family history as having risk for CRC similar to those with a family history of CRC, whereas the PRS based on known GWAS variants identified only the top 10% as having a similar relative risk. About 90% of these individuals have no family history and would have been considered average risk under current screening guidelines, but might benefit from earlier screening. The developed PRS offers a way for risk-stratified CRC screening and other targeted interventions.
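
The AUC reported above can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control. The sketch below computes that quantity directly by comparing every case with every control, counting ties as one half. It is unadjusted, whereas the abstract reports an age- and sex-adjusted AUC, and the function name and scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise case/control comparisons."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # tie counts as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical polygenic scores for three cases and three controls.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

The pairwise loop is quadratic in sample size, so for a cohort of the size described in the abstract a rank-based computation would be the practical choice; the loop is used here only because it mirrors the definition.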


Subjects
Colorectal Neoplasms/epidemiology , Genetic Predisposition to Disease , Genome, Human/genetics , Risk Assessment , Aged , Asian Continental Ancestry Group/genetics , Bayes Theorem , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Risk Factors
5.
Am J Hum Genet ; 107(3): 461-472, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32781045

ABSTRACT

RNA sequencing (RNA-seq) is a powerful technology for studying human transcriptome variation. We introduce PAIRADISE (Paired Replicate Analysis of Allelic Differential Splicing Events), a method for detecting allele-specific alternative splicing (ASAS) from RNA-seq data. Unlike conventional approaches that detect ASAS events one sample at a time, PAIRADISE aggregates ASAS signals across multiple individuals in a population. By treating the two alleles of an individual as paired, and multiple individuals sharing a heterozygous SNP as replicates, we formulate ASAS detection using PAIRADISE as a statistical problem for identifying differential alternative splicing from RNA-seq data with paired replicates. PAIRADISE outperforms alternative statistical models in simulation studies. Applying PAIRADISE to replicate RNA-seq data of a single individual and to population-scale RNA-seq data across many individuals, we detect ASAS events associated with genome-wide association study (GWAS) signals of complex traits or diseases. Additionally, PAIRADISE ASAS analysis detects the effects of rare variants on alternative splicing. PAIRADISE provides a useful computational tool for elucidating the genetic variation and phenotypic association of alternative splicing in populations.


Subjects
Alternative Splicing/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Transcriptome/genetics , Alleles , Female , Gene Expression Profiling , Genetics, Population/methods , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Male , Models, Statistical , RNA-Seq , Whole Exome Sequencing
6.
Nat Commun ; 11(1): 4020, 2020 08 11.
Article in English | MEDLINE | ID: mdl-32782262

ABSTRACT

While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: capable of estimating one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD) consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.


Subjects
Genetic Variation/genetics , Genome, Human/genetics , Models, Genetic , Alleles , Gene Frequency , Genotype , Humans , Linkage Disequilibrium , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable
7.
Nat Commun ; 11(1): 3635, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32820175

ABSTRACT

Genetic variation can predispose to disease both through (i) monogenic risk variants that disrupt a physiologic pathway with large effect on disease and (ii) polygenic risk that involves many variants of small effect in different pathways. Few studies have explored the interplay between monogenic and polygenic risk. Here, we study 80,928 individuals to examine whether polygenic background can modify penetrance of disease in tier 1 genomic conditions - familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome. Among carriers of a monogenic risk variant, we estimate substantial gradients in disease risk based on polygenic background - the probability of disease by age 75 years ranged from 17% to 78% for coronary artery disease, 13% to 76% for breast cancer, and 11% to 80% for colon cancer. We propose that accounting for polygenic background is likely to increase accuracy of risk estimation for individuals who inherit a monogenic risk variant.


Subjects
Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Penetrance , Aged , Breast Neoplasms/genetics , Case-Control Studies , Colorectal Neoplasms/genetics , Coronary Artery Disease/genetics , Female , Genome, Human , Humans , Male , Middle Aged , Odds Ratio , Risk Factors
8.
Am J Hum Genet ; 107(1): 60-71, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32533944

ABSTRACT

Adult height is one of the earliest putative examples of polygenic adaptation in humans. However, this conclusion was recently challenged because residual uncorrected stratification from large-scale consortium studies was considered responsible for the previously noted genetic difference. It thus remains an open question whether height loci exhibit signals of polygenic adaptation in any human population. We re-examined this question, focusing on one of the shortest European populations, the Sardinians, in addition to mainland European populations. We utilized height-associated loci from the Biobank Japan (BBJ) dataset to further alleviate concerns of biased ascertainment of GWAS loci and showed that the Sardinians remain significantly shorter than expected under neutrality (∼0.22 standard deviation shorter than Utah residents with ancestry from northern and western Europe [CEU] on the basis of polygenic height scores, p = 3.89 × 10⁻⁴). We also found the trajectory of polygenic height scores between the Sardinian and the British populations diverged over at least the last 10,000 years (p = 0.0082), consistent with a signature of polygenic adaptation driven primarily by the Sardinian population. Although the polygenic score-based analysis showed a much subtler signature in mainland European populations, we found a clear and robust adaptive signature in the UK population by using a haplotype-based statistic, the trait singleton density score (tSDS), driven by the height-increasing alleles (p = 9.1 × 10⁻⁴). In summary, by ascertaining height loci in a distant East Asian population, we further supported the evidence of polygenic adaptation at height-associated loci among the Sardinians. In mainland Europeans, the adaptive signature was detected in haplotype-based analysis but not in polygenic score-based analysis.


Subjects
Adaptation, Physiological/genetics , Body Height/genetics , Multifactorial Inheritance/genetics , Alleles , Asian Continental Ancestry Group/genetics , Biological Specimen Banks , European Continental Ancestry Group/genetics , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study/methods , Haplotypes/genetics , Humans , Italy , Japan , Phenotype , Polymorphism, Single Nucleotide/genetics , Selection, Genetic/genetics
9.
PLoS Med ; 17(6): e1003137, 2020 06.
Article in English | MEDLINE | ID: mdl-32479557

ABSTRACT

BACKGROUND: Identifying causal risk factors for self-harm is essential to inform preventive interventions. Epidemiological studies have identified risk factors associated with self-harm, but these associations can be subject to confounding. By implementing genetically informed methods to better account for confounding, this study aimed to better identify plausible causal risk factors for self-harm. METHODS AND FINDINGS: Using summary statistics from 24 genome-wide association studies (GWASs) comprising 16,067 to 322,154 individuals, polygenic scores (PSs) were generated to index 24 possible individual risk factors for self-harm (i.e., mental health vulnerabilities, substance use, cognitive traits, personality traits, and physical traits) among a subset of UK Biobank participants (N = 125,925, 56.2% female) who completed an online mental health questionnaire in the period from 13 July 2016 to 27 July 2017. In total, 5,520 (4.4%) of these participants reported having self-harmed in their lifetime. In binomial regression models, PSs indexing 6 risk factors (major depressive disorder [MDD], attention deficit/hyperactivity disorder [ADHD], bipolar disorder, schizophrenia, alcohol dependence disorder, and lifetime cannabis use) predicted self-harm, with effect sizes ranging from odds ratio (OR) = 1.05 (95% CI 1.02 to 1.07, q = 0.008) for lifetime cannabis use to OR = 1.20 (95% CI 1.16 to 1.23, q = 1.33 × 10⁻³⁵) for MDD. No systematic differences emerged between suicidal and non-suicidal self-harm. To further probe causal relationships, two-sample Mendelian randomisation (MR) analyses were conducted, with MDD, ADHD, and schizophrenia emerging as the most plausible causal risk factors for self-harm. The genetic liabilities for MDD and schizophrenia were associated with self-harm independently of diagnosis and medication. Main limitations include the lack of representativeness of the UK Biobank sample, that self-harm was self-reported, and the limited power of some of the included GWASs, potentially leading to possible type II error. CONCLUSIONS: In addition to confirming the role of MDD, we demonstrate that ADHD and schizophrenia likely play a role in the aetiology of self-harm using multivariate genetic designs for causal inference. Among the many individual risk factors we simultaneously considered, our findings suggest that systematic detection and treatment of core psychiatric symptoms, including psychotic and impulsivity symptoms, may be beneficial among people at risk for self-harm.
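
Odds ratios with 95% confidence intervals, as quoted in this abstract, are conventionally obtained by exponentiating a regression coefficient and its Wald interval on the log-odds scale. The sketch below shows that conversion under the assumption of a logistic model; the function name and the coefficient and standard error used are hypothetical and are not taken from the paper.

```python
from math import exp


def odds_ratio_ci(log_odds_coef, std_error, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with a Wald confidence interval (z = 1.96 gives ~95%)."""
    odds_ratio = exp(log_odds_coef)
    lower = exp(log_odds_coef - z * std_error)
    upper = exp(log_odds_coef + z * std_error)
    return odds_ratio, lower, upper


# Hypothetical coefficient per one standard deviation of a polygenic score.
odds_ratio, lower, upper = odds_ratio_ci(0.18, 0.015)
```

Because the interval is built on the log scale, it is symmetric around the coefficient but not around the odds ratio itself.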


Subjects
Self-Injurious Behavior/genetics , Aged , Aged, 80 and over , Attention Deficit Disorder with Hyperactivity/complications , Attention Deficit Disorder with Hyperactivity/genetics , Bipolar Disorder/complications , Bipolar Disorder/genetics , Databases as Topic , Depressive Disorder, Major/complications , Depressive Disorder, Major/genetics , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Logistic Models , Male , Mendelian Randomization Analysis , Middle Aged , Multifactorial Inheritance/genetics , Risk Factors , Schizophrenia , Self-Injurious Behavior/epidemiology , Self-Injurious Behavior/etiology , Substance-Related Disorders/complications , Substance-Related Disorders/genetics , Surveys and Questionnaires , United Kingdom/epidemiology
10.
PLoS Genet ; 16(6): e1008855, 2020 06.
Article in English | MEDLINE | ID: mdl-32542026

ABSTRACT

Traditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that enrichment studies on the gene-level are improved when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving greater than a 90% true positive rate at 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits using European-ancestry individuals in the UK Biobank, and identify enriched genes that are in biologically relevant pathways.


Subjects
Genome-Wide Association Study/statistics & numerical data , Models, Genetic , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , European Continental Ancestry Group/genetics , Humans , United Kingdom
11.
Nature ; 584(7819): 136-141, 2020 08.
Article in English | MEDLINE | ID: mdl-32581363

ABSTRACT

Clonally expanded blood cells that contain somatic mutations (clonal haematopoiesis) are commonly acquired with age and increase the risk of blood cancer. The blood clones identified so far contain diverse large-scale mosaic chromosomal alterations (deletions, duplications and copy-neutral loss of heterozygosity (CN-LOH)) on all chromosomes, but the sources of selective advantage that drive the expansion of most clones remain unknown. Here, to identify genes, mutations and biological processes that give selective advantage to mutant clones, we analysed genotyping data from the blood-derived DNA of 482,789 participants from the UK Biobank. We identified 19,632 autosomal mosaic chromosomal alterations and analysed these for relationships to inherited genetic variation. We found 52 inherited, rare, large-effect coding or splice variants in 7 genes that were associated with greatly increased vulnerability to clonal haematopoiesis with specific acquired CN-LOH mutations. Acquired mutations systematically replaced the inherited risk alleles (at MPL) or duplicated them to the homologous chromosome (at FH, NBN, MRE11, ATM, SH2B3 and TM2D3). Three of the genes (MRE11, NBN and ATM) encode components of the MRN-ATM pathway, which limits cell division after DNA damage and telomere attrition; another two (MPL and SH2B3) encode proteins that regulate the self-renewal of stem cells. In addition, we found that CN-LOH mutations across the genome tended to cause chromosomal segments with alleles that promote the expansion of haematopoietic cells to replace their homologous (allelic) counterparts, increasing polygenic drive for blood-cell proliferation traits. Readily acquired mutations that replace chromosomal segments with their homologous counterparts seem to interact with pervasive inherited variation to create a challenge for lifelong cytopoiesis.


Subjects
Clonal Evolution/genetics , Clone Cells/metabolism , Hematopoiesis/genetics , Multifactorial Inheritance/genetics , Adult , Aged , Alleles , Cardiovascular Diseases/genetics , Cardiovascular Diseases/pathology , Cell Division/genetics , Chromosome Aberrations , Clone Cells/cytology , Clone Cells/pathology , Female , Genetic Predisposition to Disease , Hematologic Neoplasms/genetics , Hematologic Neoplasms/pathology , Humans , Loss of Heterozygosity/genetics , Male , Middle Aged , Mosaicism , United Kingdom
12.
Am J Hum Genet ; 106(6): 805-817, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32442408

ABSTRACT

Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slight upward-bias with out-of-sample LD. We analyze nine complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8× enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWASs due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.


Subjects
Ethnic Groups/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Bayes Theorem , Europe/ethnology , Far East/ethnology , Gene Frequency , Humans , Linkage Disequilibrium , Organ Specificity/genetics
13.
Genet Sel Evol ; 52(1): 26, 2020 May 15.
Article in English | MEDLINE | ID: mdl-32414320

ABSTRACT

BACKGROUND: Estimating the genetic component of a complex phenotype is a complicated problem, mainly because there are many allele effects to estimate from a limited number of phenotypes. In spite of this difficulty, linear methods with variable selection have been able to give good predictions of additive effects of individuals. However, prediction of non-additive genetic effects is challenging with the usual prediction methods. In machine learning, non-additive relations between inputs can be modeled with neural networks. We developed a novel method (NetSparse) that uses Bayesian neural networks with variable selection for the prediction of genotypic values of individuals, including non-additive genetic effects. RESULTS: We simulated several populations with different phenotypic models and compared NetSparse to genomic best linear unbiased prediction (GBLUP), BayesB, their dominance variants, and an additive by additive method. We found that when the number of QTL was relatively small (10 or 100), NetSparse had 2 to 28 percentage points higher accuracy than the reference methods. For scenarios that included dominance or epistatic effects, NetSparse had 0.0 to 3.9 percentage points higher accuracy for predicting phenotypes than the reference methods, except in scenarios with extreme overdominance, for which reference methods that explicitly model dominance had 6 percentage points higher accuracy than NetSparse. CONCLUSIONS: Bayesian neural networks with variable selection are promising for prediction of the genetic component of complex traits in animal breeding, and their performance is robust across different genetic models. However, their large computational costs can hinder their use in practice.


Subjects
Forecasting/methods , Multifactorial Inheritance/genetics , Phenotype , Algorithms , Alleles , Animals , Bayes Theorem , Gene Frequency/genetics , Genetics, Population/methods , Genomics/methods , Genotype , Humans , Models, Genetic , Neural Networks, Computer , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Selection, Genetic/genetics
15.
Am J Hum Genet ; 107(1): 46-59, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32470373

ABSTRACT

In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.


Subjects
Multifactorial Inheritance/genetics , Aged , Cohort Studies , Diabetes Mellitus, Type 2/genetics , Female , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium/genetics , Male , Middle Aged , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
17.
PLoS Genet ; 16(5): e1008766, 2020 05.
Article in English | MEDLINE | ID: mdl-32365090

ABSTRACT

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects' relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM with a single random effect called ggmix for simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models compared to the two-stage approach or principal component adjustment with better prediction accuracy. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=ggmix).
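
To give a sense of what coordinate descent looks like in a penalized regression, the sketch below implements the textbook cyclic update for an ordinary lasso, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁ one coefficient at a time with soft-thresholding. This is not the ggmix algorithm: it has no random effect, no kinship adjustment, and no automatic tuning, and it assumes no column of X is all zeros. All names and data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Cyclic coordinate descent for the lasso (X is a list of rows)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale each update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes
            # column j's current contribution.
            rho = 0.0
            for i in range(n):
                fitted_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fitted_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, penalty) / col_sq[j]
    return beta


# Hypothetical design with two orthogonal predictors; only the first
# predictor truly affects the response.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0, 2.0, 0.0]
beta = lasso_coordinate_descent(X, y, penalty=0.1)
```

In this toy case the second coefficient is set exactly to zero while the first is shrunk slightly below its true value, which is the variable-selection behavior that makes penalized models attractive when predictors outnumber samples.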


Subjects
Algorithms , Genome-Wide Association Study/methods , Models, Genetic , Polymorphism, Single Nucleotide , Animals , Computer Simulation , Crosses, Genetic , Genetics, Population/methods , Genome-Wide Association Study/statistics & numerical data , Humans , Leishmania tropica/genetics , Leishmaniasis, Cutaneous/genetics , Linear Models , Mice , Mice, Inbred Strains , Multifactorial Inheritance/genetics , Mycobacterium bovis , Population Dynamics , Sample Size , Software , Tuberculosis/genetics , Tuberculosis/pathology
18.
Am J Hum Genet ; 106(5): 707-716, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32386537

ABSTRACT

Because polygenic risk scores (PRSs) for coronary heart disease (CHD) are derived from mainly European ancestry (EA) cohorts, their validity in African ancestry (AA) and Hispanic ethnicity (HE) individuals is unclear. We investigated associations of "restricted" and genome-wide PRSs with CHD in three major racial and ethnic groups in the U.S. The eMERGE cohort (mean age 48 ± 14 years, 58% female) included 45,645 EA, 7,597 AA, and 2,493 HE individuals. We assessed two restricted PRSs (PRS_Tikkanen and PRS_Tada; 28 and 50 variants, respectively) and two genome-wide PRSs (PRS_metaGRS and PRS_LDPred; 1.7 M and 6.6 M variants, respectively) derived from EA cohorts. Over a median follow-up of 11.1 years, 2,652 incident CHD events occurred. Hazard and odds ratios for the association of PRSs with CHD were similar in EA and HE cohorts but lower in AA cohorts. Genome-wide PRSs were more strongly associated with CHD than restricted PRSs were. PRS_metaGRS, the best performing PRS, was associated with CHD in all three cohorts; hazard ratios (95% CI) per 1 SD increase were 1.53 (1.46-1.60), 1.53 (1.23-1.90), and 1.27 (1.13-1.43) for incident CHD in EA, HE, and AA individuals, respectively. The hazard ratios were comparable in the EA and HE cohorts (p_interaction = 0.77) but were significantly attenuated in AA individuals (p_interaction = 2.9 × 10⁻³). These results highlight the potential clinical utility of PRSs for CHD as well as the need to assemble diverse cohorts to generate ancestry- and ethnicity-specific PRSs.


Subjects
African Americans/genetics , Coronary Disease/genetics , European Continental Ancestry Group/genetics , Genetic Predisposition to Disease , Hispanic Americans/genetics , Multifactorial Inheritance/genetics , Cohort Studies , Female , Humans , Male , Middle Aged , Odds Ratio
19.
Hum Genet ; 139(5): 647-655, 2020 May.
Article in English | MEDLINE | ID: mdl-32232557

ABSTRACT

Gene prioritization is the process of determining which variants and genes identified in genetic analyses are likely to cause a disease or a variation in a phenotype. For many genes, neither in vitro nor in vivo testing is available, so assessing their pathogenic role can be challenging and can lead to false-positive or false-negative results. In this paper, we propose an innovative score of gene prioritization based on the population of interest. We introduce the concept of singleton-cohort variants (SC variants): variants that have an allele count equal to one in the cohort under study. The difference between the normalized count of SC variants in the coding region and the normalized count of SC variants in the non-coding region should give a hint regarding the level of constraint for that gene in a specific population. This score is negative when there are constraints that allow the presence of SC variants only in the non-coding region; on the contrary, it is positive when there are no constraints. A complementary score is the sum of the normalized counts of SC variants in the coding and non-coding regions, which could be used as a proxy of positive or strong purifying selection in a specific population. Our methodology showed a high level of constraint for genes such as USP34 in all subpopulations tested (1000G dataset). In contrast, some genes showed a high negative score only in specific populations, e.g., MYT1L in Europeans, UBR5 in East Asians, and FBXO11 in Africans.
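The two scores described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state how the counts are normalized, so the sketch assumes normalization per kilobase of region length, and the names `Variant`, `sc_scores`, and `per_kb` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Variant:
    allele_count: int  # copies of the alternate allele observed in the cohort
    coding: bool       # True if the variant lies in the gene's coding region


def sc_scores(variants: Iterable[Variant], coding_bp: int, noncoding_bp: int,
              per_kb: int = 1000) -> Tuple[float, float]:
    """Return (difference score, sum score) for one gene in one cohort."""
    # A singleton-cohort (SC) variant has an allele count of exactly one.
    singletons = [v for v in variants if v.allele_count == 1]
    n_coding = sum(1 for v in singletons if v.coding)
    n_noncoding = len(singletons) - n_coding
    # ASSUMPTION: normalize by region length (SC variants per kilobase).
    coding_rate = n_coding * per_kb / coding_bp
    noncoding_rate = n_noncoding * per_kb / noncoding_bp
    return coding_rate - noncoding_rate, coding_rate + noncoding_rate


# Toy gene: every singleton falls in the non-coding region.
toy = [Variant(1, False)] * 10 + [Variant(5, True)] * 3
difference, total = sc_scores(toy, coding_bp=2000, noncoding_bp=10000)
print(difference, total)
```

In the toy gene all singletons sit in the non-coding region, so the difference score is negative, which corresponds to the constrained case described in the abstract.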


Subjects
Ethnic Groups/genetics , Genetic Markers , Genetic Variation , Genetics, Population , Models, Theoretical , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Haplotypes , Humans , Phenotype , Signal Transduction
20.
Mol Genet Genomics ; 295(4): 843-853, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32227305

ABSTRACT

Genome-wide association studies (GWAS) have revealed that the genetic contribution to certain complex diseases is well-described by Fisher's infinitesimal model in which a vast number of polymorphisms each confer a small effect. Under Fisher's model, variants have additive effects both across loci and within loci. However, the latter assumption is at odds with the common observation of dominant or recessive rare alleles responsible for monogenic disorders. Here, we searched for evidence of non-additive (dominant or recessive) effects for GWAS variants known to confer susceptibility to the highly heritable quantitative trait, refractive error. Of 146 GWAS variants examined in a discovery sample of 228,423 individuals whose refractive error phenotype was inferred from their age-of-onset of spectacle wear, only 8 had even nominal evidence (p < 0.05) of non-additive effects. In a replication sample of 73,577 individuals who underwent direct assessment of refractive error, 1 of these 8 variants had robust independent evidence of non-additive effects (rs7829127 within ZMAT4, p = 4.76E-05) while a further 2 had suggestive evidence (rs35337422 in RD3L, p = 7.21E-03 and rs12193446 in LAMA2, p = 2.57E-02). Accounting for non-additive effects had minimal impact on the accuracy of a polygenic risk score for refractive error (R² = 6.04% vs. 6.01%). Our findings demonstrate that very few GWAS variants for refractive error show evidence of a departure from an additive mode of action and that accounting for non-additive risk variants offers little scope to improve the accuracy of polygenic risk scores for myopia.
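The distinction between additive, dominant, and recessive modes of action can be made concrete through how a genotype is coded before regression. The Python sketch below shows the conventional codings for a biallelic variant. It is illustrative only: the function name `encode_genotype` is hypothetical, and the abstract does not specify the statistical test the authors used.

```python
def encode_genotype(risk_allele_copies: int, mode: str) -> int:
    """Code a biallelic genotype (0, 1, or 2 risk-allele copies) for regression."""
    if risk_allele_copies not in (0, 1, 2):
        raise ValueError("expected 0, 1, or 2 copies of the risk allele")
    if mode == "additive":
        # Each extra copy shifts the trait by the same amount.
        return risk_allele_copies
    if mode == "dominant":
        # One copy is enough to produce the full effect.
        return int(risk_allele_copies >= 1)
    if mode == "recessive":
        # Both copies are needed to produce any effect.
        return int(risk_allele_copies == 2)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("additive", "dominant", "recessive"):
    print(mode, [encode_genotype(g, mode) for g in (0, 1, 2)])
```

Under the additive coding the heterozygote sits midway between the two homozygotes, while the dominant and recessive codings move it to one end or the other.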


Subjects
Genome-Wide Association Study , Myopia/genetics , Quantitative Trait, Heritable , Refractive Errors/genetics , Adult , Aged , Biological Specimen Banks , Female , Genes, Dominant/genetics , Genetic Predisposition to Disease , Genetic Variation/genetics , Humans , Laminin/genetics , Male , Middle Aged , Multifactorial Inheritance/genetics , Myopia/pathology , Polymorphism, Single Nucleotide/genetics , Refractive Errors/pathology