RESUMO
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City1. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Assuntos
Sequenciamento do Exoma , Genoma Humano , Genótipo , Hispânico ou Latino , Adulto , Humanos , África/etnologia , América/etnologia , Europa (Continente)/etnologia , Frequência do Gene/genética , Genética Populacional , Genoma Humano/genética , Técnicas de Genotipagem , Hispânico ou Latino/genética , Homozigoto , Mutação com Perda de Função/genética , México , Estudos ProspectivosRESUMO
Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes1-5. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes6, our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.
Assuntos
COVID-19 , Doenças Cardiovasculares , Humanos , Hematopoiese Clonal/genética , Doenças Cardiovasculares/epidemiologia , Doenças Cardiovasculares/genéticaRESUMO
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Assuntos
Pleiotropia Genética , Humanos , Estudo de Associação Genômica Ampla/métodos , Fenótipo , Expressão Gênica/genética , Simulação por Computador , Modelos Genéticos , Locos de Características Quantitativas , Polimorfismo de Nucleotídeo ÚnicoRESUMO
Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores are calculated and tested across a range of annotation classes and frequency bins. Correlation between these tests can complicate the multiple testing correction and hamper interpretation of the results. We introduce a method called the sparse burden association test (SBAT) that tests the joint set of burden scores under the assumption that causal burden scores act in the same effect direction. The method simultaneously assesses the significance of the model fit and selects the set of burden scores that best explain the association at the same time. Using simulated data, we show that the method is well calibrated and highlight scenarios where the test outperforms existing gene-based tests. We apply the method to 73 quantitative traits from the UK Biobank, showing that SBAT is a valuable additional gene-based test when combined with other existing approaches. This test is implemented in the REGENIE software.
Assuntos
Estudo de Associação Genômica Ampla , Humanos , Estudo de Associação Genômica Ampla/métodos , Análise dos Mínimos Quadrados , Software , Modelos Genéticos , Exoma/genética , Variação Genética , Simulação por ComputadorRESUMO
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing1 to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study2. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10-11. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Assuntos
Bancos de Espécimes Biológicos , Bases de Dados Genéticas , Sequenciamento do Exoma , Exoma/genética , África/etnologia , Ásia/etnologia , Asma/genética , Diabetes Mellitus/genética , Europa (Continente)/etnologia , Oftalmopatias/genética , Feminino , Predisposição Genética para Doença/genética , Variação Genética , Estudo de Associação Genômica Ampla , Humanos , Hipertensão/genética , Hepatopatias/genética , Masculino , Mutação , Neoplasias/genética , Característica Quantitativa Herdável , Reino UnidoRESUMO
In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the distribution of the test statistic is unknown or not well-approximated. This commonly arises, e.g, in tests of gene-set, pathway or genome-wide significance, or when the statistic is formed by machine learning or data adaptive methods. Existing applications include eQTL mapping, association testing with rare variants, inclusion of admixed individuals in genetic association analysis, and epistasis detection among many others. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary traits, for use in association mapping in structured samples. In addition to modeling structure in the sample, BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. In simulation studies, we compare BRASS to other permutation and resampling-based methods in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, we demonstrate the superior control of type 1 error by BRASS compared to the other 6 methods considered. We apply BRASS to assess genome-wide significance for association analyses in domestic dog for elbow dysplasia (ED) and idiopathic epilepsy (IE). For both traits we detect previously identified associations, and in addition, for ED, we detect significant association with a SNP on chromosome 35 that was not detected by previous analyses, demonstrating the potential of the method.
Assuntos
Testes Genéticos , Modelos Genéticos , Animais , Cães , Fenótipo , Estudos de Associação Genética , Simulação por Computador , Estudo de Associação Genômica Ampla , Polimorfismo de Nucleotídeo Único/genéticaRESUMO
BACKGROUND: Exome sequencing in hundreds of thousands of persons may enable the identification of rare protein-coding genetic variants associated with protection from human diseases like liver cirrhosis, providing a strategy for the discovery of new therapeutic targets. METHODS: We performed a multistage exome sequencing and genetic association analysis to identify genes in which rare protein-coding variants were associated with liver phenotypes. We conducted in vitro experiments to further characterize associations. RESULTS: The multistage analysis involved 542,904 persons with available data on liver aminotransferase levels, 24,944 patients with various types of liver disease, and 490,636 controls without liver disease. We found that rare coding variants in APOB, ABCB4, SLC30A10, and TM6SF2 were associated with increased aminotransferase levels and an increased risk of liver disease. We also found that variants in CIDEB, which encodes a structural protein found in hepatic lipid droplets, had a protective effect. The burden of rare predicted loss-of-function variants plus missense variants in CIDEB (combined carrier frequency, 0.7%) was associated with decreased alanine aminotransferase levels (beta per allele, -1.24 U per liter; 95% confidence interval [CI], -1.66 to -0.83; P = 4.8×10-9) and with 33% lower odds of liver disease of any cause (odds ratio per allele, 0.67; 95% CI, 0.57 to 0.79; P = 9.9×10-7). Rare coding variants in CIDEB were associated with a decreased risk of liver disease across different underlying causes and different degrees of severity, including cirrhosis of any cause (odds ratio per allele, 0.50; 95% CI, 0.36 to 0.70). Among 3599 patients who had undergone bariatric surgery, rare coding variants in CIDEB were associated with a decreased nonalcoholic fatty liver disease activity score (beta per allele in score units, -0.98; 95% CI, -1.54 to -0.41 [scores range from 0 to 8, with higher scores indicating more severe disease]). In human hepatoma cell lines challenged with oleate, CIDEB small interfering RNA knockdown prevented the buildup of large lipid droplets. CONCLUSIONS: Rare germline mutations in CIDEB conferred substantial protection from liver disease. (Funded by Regeneron Pharmaceuticals.).
Assuntos
Proteínas Reguladoras de Apoptose , Mutação em Linhagem Germinativa , Hepatopatias , Proteínas Reguladoras de Apoptose/genética , Proteínas Reguladoras de Apoptose/metabolismo , Predisposição Genética para Doença/genética , Predisposição Genética para Doença/prevenção & controle , Humanos , Fígado/metabolismo , Hepatopatias/genética , Hepatopatias/metabolismo , Hepatopatias/prevenção & controle , Transaminases/genética , Sequenciamento do ExomaRESUMO
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.
Assuntos
COVID-19/diagnóstico , COVID-19/genética , Sequenciamento do Exoma , Exoma/genética , Predisposição Genética para Doença , Hospitalização/estatística & dados numéricos , COVID-19/imunologia , COVID-19/terapia , Feminino , Humanos , Interferons/genética , Masculino , Prognóstico , SARS-CoV-2 , Tamanho da AmostraRESUMO
Case-control genetic association analysis is an extremely common tool in human complex trait mapping. From a statistical point of view, the analysis of binary traits poses somewhat different challenges from the analysis of quantitative traits. Desirable features of a binary trait mapping approach would include (1) phenotype modeled as binary, with appropriate dependence between the mean and variance; (2) appropriate correction for relevant covariates; (3) appropriate correction for sample structure of various types, including related individuals, admixture and other types of population structure; (4) both fast and accurate computations; (5) robustness to ascertainment and other types of phenotype model misspecification, and (6) ability to leverage partially missing data to increase power. We review these challenges and argue, both theoretically and in simulations, for the value of retrospective association analysis as a way to overcome some of the limitations of the phenotype model, including model misspecification due to ascertainment. We give an overview of two recent retrospective methods, CARAT and CERAMIC, that are designed to meet criteria 1-6.
Assuntos
Modelos Genéticos , Herança Multifatorial , Simulação por Computador , Humanos , Fenótipo , Característica Quantitativa Herdável , Estudos RetrospectivosRESUMO
Coronavirus disease 2019 (COVID-19) and influenza are respiratory illnesses caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses, respectively. Both diseases share symptoms and clinical risk factors1, but the extent to which these conditions have a common genetic etiology is unknown. This is partly because host genetic risk factors are well characterized for COVID-19 but not for influenza, with the largest published genome-wide association studies for these conditions including >2 million individuals2 and about 1,000 individuals3-6, respectively. Shared genetic risk factors could point to targets to prevent or treat both infections. Through a genetic study of 18,334 cases with a positive test for influenza and 276,295 controls, we show that published COVID-19 risk variants are not associated with influenza. Furthermore, we discovered and replicated an association between influenza infection and noncoding variants in B3GALT5 and ST6GAL1, neither of which was associated with COVID-19. In vitro small interfering RNA knockdown of ST6GAL1-an enzyme that adds sialic acid to the cell surface, which is used for viral entry-reduced influenza infectivity by 57%. These results mirror the observation that variants that downregulate ACE2, the SARS-CoV-2 receptor, protect against COVID-19 (ref. 7). Collectively, these findings highlight downregulation of key cell surface receptors used for viral entry as treatment opportunities to prevent COVID-19 and influenza.
Assuntos
COVID-19 , Predisposição Genética para Doença , Estudo de Associação Genômica Ampla , Influenza Humana , SARS-CoV-2 , Humanos , Influenza Humana/genética , Influenza Humana/epidemiologia , Influenza Humana/virologia , COVID-19/genética , COVID-19/virologia , Fatores de Risco , SARS-CoV-2/genética , Masculino , Feminino , Polimorfismo de Nucleotídeo Único , Estudos de Casos e Controles , Pessoa de Meia-IdadeRESUMO
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
RESUMO
Human genetic studies of smoking behavior have been thus far largely limited to common variants. Studying rare coding variants has the potential to identify drug targets. We performed an exome-wide association study of smoking phenotypes in up to 749,459 individuals and discovered a protective association in CHRNB2, encoding the ß2 subunit of the α4ß2 nicotine acetylcholine receptor. Rare predicted loss-of-function and likely deleterious missense variants in CHRNB2 in aggregate were associated with a 35% decreased odds for smoking heavily (odds ratio (OR) = 0.65, confidence interval (CI) = 0.56-0.76, P = 1.9 × 10-8). An independent common variant association in the protective direction ( rs2072659 ; OR = 0.96; CI = 0.94-0.98; P = 5.3 × 10-6) was also evident, suggesting an allelic series. Our findings in humans align with decades-old experimental observations in mice that ß2 loss abolishes nicotine-mediated neuronal responses and attenuates nicotine self-administration. Our genetic discovery will inspire future drug designs targeting CHRNB2 in the brain for the treatment of nicotine addiction.
Assuntos
Nicotina , Tabagismo , Humanos , Animais , Camundongos , Fumar/genética , Tabagismo/genética , Fenótipo , Razão de ChancesRESUMO
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters human host cells via angiotensin-converting enzyme 2 (ACE2) and causes coronavirus disease 2019 (COVID-19). Here, through a genome-wide association study, we identify a variant (rs190509934, minor allele frequency 0.2-2%) that downregulates ACE2 expression by 37% (P = 2.7 × 10-8) and reduces the risk of SARS-CoV-2 infection by 40% (odds ratio = 0.60, P = 4.5 × 10-13), providing human genetic evidence that ACE2 expression levels influence COVID-19 risk. We also replicate the associations of six previously reported risk variants, of which four were further associated with worse outcomes in individuals infected with the virus (in/near LZTFL1, MHC, DPP9 and IFNAR2). Lastly, we show that common variants define a risk score that is strongly associated with severe disease among cases and modestly improves the prediction of disease severity relative to demographic and clinical factors alone.
Assuntos
COVID-19 , Enzima de Conversão de Angiotensina 2/genética , COVID-19/genética , Estudo de Associação Genômica Ampla , Humanos , Fatores de Risco , SARS-CoV-2/genéticaRESUMO
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
Assuntos
Biologia Computacional , Estudo de Associação Genômica Ampla , Genômica , Estudos de Casos e Controles , Biologia Computacional/métodos , Estudo de Associação Genômica Ampla/métodos , Genômica/métodos , Genótipo , Humanos , Modelos Logísticos , Aprendizado de Máquina , Fenótipo , Reprodutibilidade dos TestesRESUMO
Large-scale human exome sequencing can identify rare protein-coding variants with a large impact on complex traits such as body adiposity. We sequenced the exomes of 645,626 individuals from the United Kingdom, the United States, and Mexico and estimated associations of rare coding variants with body mass index (BMI). We identified 16 genes with an exome-wide significant association with BMI, including those encoding five brain-expressed G protein-coupled receptors (CALCR, MC4R, GIPR, GPR151, and GPR75). Protein-truncating variants in GPR75 were observed in ~4/10,000 sequenced individuals and were associated with 1.8 kilograms per square meter lower BMI and 54% lower odds of obesity in the heterozygous state. Knock out of Gpr75 in mice resulted in resistance to weight gain and improved glycemic control in a high-fat diet model. Inhibition of GPR75 may provide a therapeutic strategy for obesity.