ABSTRACT
Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one-variant, one-phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP recovers known signals, such as the association between PCSK9 and LDL cholesterol levels. We additionally find MRP effective for meta-analysis of exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting: after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes yields substantial power gains for gene-trait associations, such as for TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features of widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
Subject(s)
Genetic Variation , Genome-Wide Association Study , Models, Genetic , Bayes Theorem , Female , Humans , Male , Phenotype
ABSTRACT
The high prevalence of sickle haemoglobin in Africa shows that malaria has been a major force for human evolutionary selection, but surprisingly few other polymorphisms have been proven to confer resistance to malaria in large epidemiological studies. To address this problem, we conducted a multi-centre genome-wide association study (GWAS) of life-threatening Plasmodium falciparum infection (severe malaria) in over 11,000 African children, with replication data in a further 14,000 individuals. Here we report a novel malaria resistance locus close to a cluster of genes encoding glycophorins that are receptors for erythrocyte invasion by P. falciparum. We identify a haplotype at this locus that provides 33% protection against severe malaria (odds ratio = 0.67, 95% confidence interval = 0.60-0.76, P value = 9.5 × 10(-11)) and is linked to polymorphisms that have previously been shown to have features of ancient balancing selection, on the basis of haplotype sharing between humans and chimpanzees. Taken together with previous observations on the malaria-protective role of blood group O, these data reveal that two of the strongest GWAS signals for severe malaria lie in or close to genes encoding the glycosylated surface coat of the erythrocyte cell membrane, both within regions of the genome where it appears that evolution has maintained diversity for millions of years. These findings provide new insights into the host-parasite interactions that are critical in determining the outcome of malaria infection.
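The "33% protection" figure is the relative reduction in the odds of severe malaria, 1 − OR = 1 − 0.67. The short Python sketch below makes that arithmetic explicit. It also backs out an approximate standard error of log(OR) from the reported 95% confidence interval. That step assumes the interval was computed symmetrically on the log-odds scale, which is a common convention but is not stated in the abstract.

```python
import math


def protection_from_or(odds_ratio: float) -> float:
    """Relative reduction in odds (1 - OR); read as 'protection' when OR < 1."""
    return 1.0 - odds_ratio


def se_log_or_from_ci(lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Approximate SE of log(OR), assuming a 95% CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


protection = protection_from_or(0.67)
se = se_log_or_from_ci(0.60, 0.76)
print(f"protection = {protection:.0%}, approx. SE(log OR) = {se:.3f}")
```

Because the published bounds are rounded, the recovered standard error is only approximate.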
Subject(s)
Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Malaria, Falciparum/genetics , Selection, Genetic/genetics , ABO Blood-Group System , Africa/epidemiology , Animals , Child , Conserved Sequence/genetics , Erythrocyte Membrane/metabolism , Erythrocytes/metabolism , Erythrocytes/parasitology , Evolution, Molecular , Extracellular Matrix Proteins/genetics , Female , Glycophorins/genetics , Haplotypes/genetics , Host-Parasite Interactions/genetics , Humans , Malaria, Falciparum/epidemiology , Malaria, Falciparum/parasitology , Male , Pan troglodytes/genetics , Plasmodium falciparum/physiology , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase the power by 50% compared with standard frequentist tests when only a subset of studies have a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS that were part of the Wellcome Trust Case Control Consortium 2. The novelty of the approach is its ability to explore and assess the evidence for a range of possible true patterns of association across studies in a computationally efficient framework.
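One way to weigh the evidence for different patterns of true association across studies is a multivariate normal approximate Bayes factor. The Python sketch below is not MetABF's implementation. It assumes independent studies with estimated effects β̂ and standard errors, and a normal prior on the true effects. The prior standard deviation (`prior_sd`) and the between-study correlation (`prior_corr`) are hypothetical choices. Each on/off pattern of studies is compared with the model of no effect in any study using BF = N(β̂; 0, V + W) / N(β̂; 0, V), where V is the sampling covariance and W is the prior covariance of the true effects under that pattern.

```python
import itertools

import numpy as np


def log_mvn_density(x, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)


def pattern_log10_bfs(beta_hat, se, prior_sd=0.2, prior_corr=0.9):
    """log10 Bayes factor (vs. no effect anywhere) for each pattern of studies with a true effect.

    Illustrative assumptions, not MetABF's published defaults: independent
    studies, a normal prior with SD prior_sd on each true effect, and
    correlation prior_corr between the effects of 'active' studies.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    V = np.diag(np.asarray(se, dtype=float) ** 2)  # sampling covariance
    null_ll = log_mvn_density(beta_hat, V)
    bfs = {}
    for pattern in itertools.product((0, 1), repeat=beta_hat.size):
        if not any(pattern):
            continue  # all-null pattern is the reference model (BF = 1)
        on = np.array(pattern, dtype=float)
        # Prior covariance of true effects: nonzero only among active studies
        W = prior_sd**2 * (prior_corr * np.outer(on, on) + (1 - prior_corr) * np.diag(on))
        bfs[pattern] = (log_mvn_density(beta_hat, V + W) - null_ll) / np.log(10)
    return bfs


bfs = pattern_log10_bfs([0.15, 0.14, 0.0], [0.03, 0.03, 0.03])
for pattern, log10_bf in sorted(bfs.items(), key=lambda kv: -kv[1]):
    print(pattern, round(float(log10_bf), 2))
```

Enumerating all 2^k − 1 patterns is only practical for a modest number of studies. Larger meta-analyses would need a restricted set of patterns.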
Subject(s)
Genome-Wide Association Study , Bayes Theorem , Case-Control Studies , Computer Simulation , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
New directly acting antivirals (DAAs) provide very high cure rates in most patients infected with hepatitis C virus (HCV). However, some patient groups, including those with cirrhosis or infected with HCV genotype 3, have been relatively harder to treat. In the recent BOSON trial, patients with HCV genotype 3 and cirrhosis who received a 16-week course of sofosbuvir and ribavirin had a sustained virological response (SVR) rate of around 50%. In patients with cirrhosis, the interferon lambda 4 (IFNL4) CC genotype was significantly associated with SVR. This genotype was also associated with a lower interferon-stimulated gene (ISG) signature in peripheral blood and in liver at baseline. Unexpectedly, patients with the CC genotype showed a dynamic increase in ISG expression between weeks 4 and 16 of DAA therapy, whereas the reverse was true for non-CC patients. Conclusion: These data provide an important dynamic link between host genotype and phenotype in HCV therapy that is also potentially relevant to naturally acquired infection. (Hepatology 2018; 00:000-000).
Subject(s)
Antiviral Agents/therapeutic use , Hepatitis C/drug therapy , Interleukins/genetics , Ribavirin/therapeutic use , Sofosbuvir/therapeutic use , Gene Expression Profiling , Gene Expression Regulation , Genotype , Hepatitis C/blood , Hepatitis C/genetics , Humans , Liver/metabolism , Liver Cirrhosis/virology , Sustained Virologic Response
ABSTRACT
MOTIVATION: The goal of fine-mapping in genomic regions associated with complex diseases and traits is to identify causal variants that point to molecular mechanisms behind the associations. Recent fine-mapping methods using summary data from genome-wide association studies rely on exhaustive search through all possible causal configurations, which is computationally expensive. RESULTS: We introduce FINEMAP, a software package to efficiently explore a set of the most important causal configurations of the region via a shotgun stochastic search algorithm. We show that FINEMAP produces accurate results in a fraction of the processing time of existing approaches and is therefore a promising tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing projects. AVAILABILITY AND IMPLEMENTATION: FINEMAP v1.0 is freely available for Mac OS X and Linux at http://www.christianbenner.com CONTACT: christian.benner@helsinki.fi or matti.pirinen@helsinki.fi.
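A shotgun stochastic search works as a loop. At each step it scores every configuration in the neighbourhood of the current one, meaning configurations reached by adding, deleting, or swapping one causal variant. It keeps every score it has seen, then moves to a neighbour sampled in proportion to its posterior score. The minimal Python sketch below illustrates that loop; it is not FINEMAP's code. It makes three assumptions. First, the Bayes factor for a causal set c has the form N(z_c; 0, R_cc + s R_cc R_cc) / N(z_c; 0, R_cc), given z-scores z and LD matrix R. Second, s (`prior_var`, a hypothetical default) is a prior variance on the z-scale. Third, the prior is binomial with per-variant causal probability 1/m. Posterior inclusion probabilities are normalised over the configurations visited.

```python
import numpy as np


def log_bf(z, R, config, prior_var):
    """log BF of causal set `config` vs. no causal variant (assumed model).

    Alternative: z_c ~ N(0, R_cc + prior_var * R_cc R_cc); null: z_c ~ N(0, R_cc).
    """
    if not config:
        return 0.0
    idx = sorted(config)
    zc, Rcc = z[idx], R[np.ix_(idx, idx)]

    def logpdf(cov):  # the 2*pi constants cancel in the ratio
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + zc @ np.linalg.solve(cov, zc))

    return logpdf(Rcc + prior_var * Rcc @ Rcc) - logpdf(Rcc)


def neighbours(config, m, max_causal):
    """All configurations one add, delete, or swap away from `config`."""
    out = set()
    for j in range(m):
        if j in config:
            out.add(config - {j})          # delete
        elif len(config) < max_causal:
            out.add(config | {j})          # add
    for i in config:                        # swap one causal for one non-causal
        for j in range(m):
            if j not in config:
                out.add((config - {i}) | {j})
    return out


def shotgun_search(z, R, prior_var=25.0, max_causal=2, n_iter=50, seed=0):
    """Posterior inclusion probabilities from a simple shotgun stochastic search."""
    z, R = np.asarray(z, dtype=float), np.asarray(R, dtype=float)
    m = z.size
    p = 1.0 / m  # assumed prior probability that any one variant is causal
    rng = np.random.default_rng(seed)

    def score(c):  # unnormalised log posterior: log BF + log binomial prior
        k = len(c)
        return log_bf(z, R, c, prior_var) + k * np.log(p) + (m - k) * np.log1p(-p)

    current = frozenset()
    visited = {current: score(current)}
    for _ in range(n_iter):
        nbrs = list(neighbours(current, m, max_causal))
        for c in nbrs:
            if c not in visited:
                visited[c] = score(c)
        s = np.array([visited[c] for c in nbrs])
        w = np.exp(s - s.max())
        current = nbrs[rng.choice(len(nbrs), p=w / w.sum())]

    # Normalise over all configurations ever scored, then marginalise per variant
    configs = list(visited)
    s = np.array([visited[c] for c in configs])
    post = np.exp(s - s.max())
    post /= post.sum()
    pip = np.zeros(m)
    for c, weight in zip(configs, post):
        for j in c:
            pip[j] += weight
    return pip


pip = shotgun_search(np.array([0.5, -0.3, 8.0, 0.2, 0.1]), np.eye(5))
print(np.round(pip, 3))
```

Caching every score in `visited` is the key saving over exhaustive enumeration. Only configurations near high-posterior regions are ever evaluated.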
Subject(s)
Genome-Wide Association Study , Algorithms , Genome , Genomics , Polymorphism, Single Nucleotide , Software
ABSTRACT
Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.
Subject(s)
Genetic Predisposition to Disease/genetics , Immunity, Cellular/immunology , Multiple Sclerosis/genetics , Multiple Sclerosis/immunology , Alleles , Cell Differentiation/immunology , Europe/ethnology , Genome, Human/genetics , Genome-Wide Association Study , HLA-A Antigens/genetics , HLA-DR Antigens/genetics , HLA-DRB1 Chains , Humans , Immunity, Cellular/genetics , Major Histocompatibility Complex/genetics , Polymorphism, Single Nucleotide/genetics , Sample Size , T-Lymphocytes, Helper-Inducer/cytology , T-Lymphocytes, Helper-Inducer/immunology
ABSTRACT
We carried out a genome-wide association study of schizophrenia (479 cases, 2,937 controls) and tested loci with P < 10(-5) in up to 16,726 additional subjects. Of 12 loci followed up, 3 had strong independent support (P < 5 x 10(-4)), and the overall pattern of replication was unlikely to occur by chance (P = 9 x 10(-8)). Meta-analysis provided the strongest evidence for association around ZNF804A (P = 1.61 x 10(-7)), and this strengthened when the affected phenotype included bipolar disorder (P = 9.96 x 10(-9)).
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Kruppel-Like Transcription Factors/genetics , Schizophrenia/genetics , Bipolar Disorder/genetics , Case-Control Studies , Chromosome Mapping , Follow-Up Studies , Humans , Polymorphism, Single Nucleotide
ABSTRACT
Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations, we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed-effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles.
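For reference, the standard fixed-effects approach combines per-study estimates by inverse-variance weighting. The minimal Python sketch below assumes per-study effect estimates and standard errors on the same scale, for example log odds ratios for the same allele. It includes Cochran's Q because the abstract's concern is heterogeneity across African populations. It does not implement the Bayesian heterogeneity models or the region-based test described in the abstract.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis plus Cochran's Q heterogeneity statistic."""
    weights = [1.0 / se**2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    # Q is approximately chi-square with k - 1 df if all studies share one true effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


print(fixed_effects_meta([0.20, 0.10, 0.25], [0.08, 0.10, 0.12]))
```

A large Q relative to its degrees of freedom signals between-study heterogeneity. That is the situation in which the fixed-effect assumption, and the pooled estimate, become questionable.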
Subject(s)
Black People/genetics , Genome-Wide Association Study , Hemoglobin, Sickle/genetics , Malaria/genetics , Africa , Bayes Theorem , Chromosome Mapping , Genetic Heterogeneity , Genetic Predisposition to Disease , Genetic Variation , Genetics, Population , Genome, Human , Haplotypes , Humans , Linkage Disequilibrium , Malaria/epidemiology , Malaria/pathology , Polymorphism, Single Nucleotide
ABSTRACT
Susceptibility to common human diseases is influenced by both genetic and environmental factors. The explosive growth of genetic data, and the knowledge that it is generating, are transforming our biological understanding of these diseases. In this review, we describe the technological and analytical advances that have enabled genome-wide association studies to be successful in identifying a large number of genetic variants robustly associated with common disease. We examine the biological insights that these genetic associations are beginning to produce, from functional mechanisms involving individual genes to biological pathways linking associated genes, and the identification of functional annotations, some of which are cell-type-specific, enriched in disease associations. Although most efforts have focused on identifying and interpreting genetic variants that are irrefutably associated with disease, it is increasingly clear that--even at large sample sizes--these represent only the tip of the iceberg of genetic signal, motivating polygenic analyses that consider the effects of genetic variants throughout the genome, including modest effects that are not individually statistically significant. As data from an increasingly large number of diseases and traits are analysed, pleiotropic effects (defined as genetic loci affecting multiple phenotypes) can help integrate our biological understanding. Looking forward, the next generation of population-scale data resources, linking genomic information with health outcomes, will lead to another step-change in our ability to understand, and treat, common diseases.
Subject(s)
Disease/genetics , Genetic Predisposition to Disease , Genome, Human , Genetic Pleiotropy , Genetic Variation , Genome-Wide Association Study , Humans , Phenotype
ABSTRACT
We performed a genome-wide association study (GWAS) in 1705 Parkinson's disease (PD) UK patients and 5175 UK controls, the largest sample size so far for a PD GWAS. Replication was attempted in an additional cohort of 1039 French PD cases and 1984 controls for the 27 regions showing the strongest evidence of association (P < 10(-4)). We replicated published associations in the 4q22/SNCA and 17q21/MAPT chromosome regions (P < 10(-10)) and found evidence for an additional independent association in 4q22/SNCA. A detailed analysis of the haplotype structure at 17q21 showed that there are three separate risk groups within this region. We found weak but consistent evidence of association for common variants located in three previously published associated regions (4p15/BST1, 4p16/GAK and 1q32/PARK16). We found no support for the previously reported SNP association in 12q12/LRRK2. We also found an association of the two SNPs in 4q22/SNCA with the age of onset of the disease.
Subject(s)
Chromosomes, Human, Pair 17/genetics , Genetic Predisposition to Disease , Parkinson Disease/genetics , alpha-Synuclein/genetics , Age of Onset , Case-Control Studies , Genome-Wide Association Study , Haplotypes , Humans , Polymorphism, Single Nucleotide , Sample Size , White People
ABSTRACT
SUMMARY: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. AVAILABILITY: The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer CONTACT: chris.spencer@well.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
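The general idea of flagging samples whose genome-wide summaries are atypical can be illustrated with robust z-scores based on the median and MAD. The Python sketch below is not the authors' R algorithm. The choice of summaries (heterozygosity and call rate), the MAD scaling, and the threshold of 5 are all illustrative assumptions.

```python
import numpy as np


def robust_z(x):
    """Median/MAD z-scores; MAD is scaled by 1.4826 to match the SD under normality."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return (x - med) / mad


def flag_outlier_samples(summaries, threshold=5.0):
    """Flag samples (rows) whose robust z-score exceeds `threshold` in any summary (column)."""
    summaries = np.asarray(summaries, dtype=float)
    z = np.column_stack([robust_z(col) for col in summaries.T])
    return np.any(np.abs(z) > threshold, axis=1)


# Hypothetical per-sample summaries: column 0 = heterozygosity, column 1 = call rate
rng = np.random.default_rng(0)
summaries = np.column_stack([rng.normal(0.30, 0.005, 500), rng.normal(0.99, 0.002, 500)])
summaries[0, 0] = 0.40  # e.g. a contaminated sample with excess heterozygosity
flags = flag_outlier_samples(summaries)
print(np.nonzero(flags)[0])
```

The median and MAD are used instead of the mean and SD so that the outliers being sought do not inflate the scale estimate and mask themselves.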
Subject(s)
Algorithms , Cluster Analysis , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Cohort Studies , Female , Humans , Male , Oligonucleotide Array Sequence Analysis
ABSTRACT
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we use it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, to look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and to examine how performance compares with a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap.
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
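The simulation approach to power can be illustrated in its simplest form. The Python sketch below is a deliberate simplification of what the abstract describes: it simulates a single causal SNP, with no LD, tag SNPs, or imputation. It assumes Hardy-Weinberg genotype frequencies, a multiplicative (per-allele) relative risk, and a rare disease. Power is estimated as the fraction of replicates whose allelic chi-square P value falls below `alpha`, which defaults to the conventional genome-wide threshold of 5 × 10(-8).

```python
from math import erfc, sqrt

import numpy as np


def simulate_power(maf, allelic_rr, n_cases, n_controls, alpha=5e-8, n_reps=1000, seed=1):
    """Estimate power of an allelic chi-square test at one causal SNP by simulation.

    Simplifying assumptions: one biallelic SNP (no LD or tagging), Hardy-Weinberg
    frequencies used for controls, a multiplicative risk model, and a rare disease,
    so case genotype frequencies are proportional to frequency x relative risk.
    """
    rng = np.random.default_rng(seed)
    q = maf
    control_freq = np.array([(1 - q) ** 2, 2 * q * (1 - q), q**2])
    case_freq = control_freq * np.array([1.0, allelic_rr, allelic_rr**2])
    case_freq /= case_freq.sum()

    # Genotype counts (0, 1, 2 risk alleles) per replicate; floats avoid integer overflow
    cases = rng.multinomial(n_cases, case_freq, size=n_reps).astype(float)
    ctrls = rng.multinomial(n_controls, control_freq, size=n_reps).astype(float)
    a1 = 2 * cases[:, 2] + cases[:, 1]  # risk alleles in cases
    b1 = 2 * n_cases - a1               # other alleles in cases
    a0 = 2 * ctrls[:, 2] + ctrls[:, 1]  # risk alleles in controls
    b0 = 2 * n_controls - a0            # other alleles in controls

    # Pearson chi-square for the 2x2 allele-count table (1 df)
    n = a1 + b1 + a0 + b0
    with np.errstate(divide="ignore", invalid="ignore"):
        chi2 = n * (a1 * b0 - b1 * a0) ** 2 / ((a1 + b1) * (a0 + b0) * (a1 + a0) * (b1 + b0))
    pvals = np.array([erfc(sqrt(c / 2.0)) if np.isfinite(c) else 1.0 for c in chi2])
    return float(np.mean(pvals < alpha))


print(simulate_power(maf=0.3, allelic_rr=1.2, n_cases=2000, n_controls=2000))
```

Extending the sketch toward the paper's setting would mean simulating haplotypes with realistic LD and testing the SNPs on a given chip rather than the causal SNP itself. That is where coverage and power can diverge.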
Subject(s)
Genome-Wide Association Study/methods , Genotype , Genome, Human , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Sample Size , Software
ABSTRACT
BACKGROUND: There is considerable interest in whether genetic data can be used to improve standard cardiovascular disease risk calculators, as the latter are routinely used in clinical practice to manage preventative treatment. METHODS: Using the UK Biobank resource, we developed our own polygenic risk score for coronary artery disease (CAD). We used an additional 60 000 UK Biobank individuals to develop an integrated risk tool (IRT) that combined our polygenic risk score with established risk tools (either the American Heart Association/American College of Cardiology pooled cohort equations [PCE] or UK QRISK3), and we tested our IRT in an additional, independent set of 186 451 UK Biobank individuals. RESULTS: The novel CAD polygenic risk score shows superior predictive power for CAD events, compared with other published polygenic risk scores, and is largely uncorrelated with PCE and QRISK3. When combined with PCE into an IRT, it has superior predictive accuracy. Overall, 10.4% of incident CAD cases were misclassified as low risk by PCE and correctly classified as high risk by the IRT, compared with 4.4% misclassified by the IRT and correctly classified by PCE. The overall net reclassification improvement for the IRT was 5.9% (95% CI, 4.7-7.0). When individuals were stratified into age-by-sex subgroups, the improvement was larger for all subgroups (range, 8.3%-15.4%), with the best performance in 40- to 54-year-old men (15.4% [95% CI, 11.6-19.3]). Comparable results were found using a different risk tool (QRISK3) and also a broader definition of cardiovascular disease. Use of the IRT is estimated to avoid up to 12 000 deaths in the United States over a 5-year period. CONCLUSIONS: An IRT that includes polygenic risk outperforms current risk stratification tools and offers greater opportunity for early interventions. Given the plummeting costs of genetic tests, future iterations of CAD risk tools would be enhanced with the addition of a person's polygenic risk.
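Net reclassification improvement (NRI) counts how often a new model moves people into the correct risk category. Under the usual categorical definition, NRI = [P(up | event) − P(down | event)] + [P(down | non-event) − P(up | non-event)]. With the figures above, the event component alone would be 10.4% − 4.4% = 6.0 percentage points. The Python sketch below assumes a two-category (low/high) classification and binary outcomes. It is a generic illustration, not the study's analysis code, and the toy data are hypothetical.

```python
import numpy as np


def categorical_nri(old_high, new_high, event):
    """Two-category NRI: returns (event component, non-event component, total)."""
    old_high = np.asarray(old_high, dtype=bool)
    new_high = np.asarray(new_high, dtype=bool)
    event = np.asarray(event, dtype=bool)
    up = new_high & ~old_high    # reclassified low -> high by the new model
    down = old_high & ~new_high  # reclassified high -> low by the new model
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[~event].mean() - up[~event].mean()
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Hypothetical toy data: 5 people with events followed by 5 without
old = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
new = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(categorical_nri(old, new, events))
```

Reporting the event and non-event components separately, as the function does, shows whether an overall improvement comes from catching more future cases or from down-classifying people who stay event-free.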
Subject(s)
Coronary Artery Disease/diagnosis , Adult , Aged , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Databases, Genetic , Female , Genetic Predisposition to Disease , Humans , Incidence , Male , Middle Aged , Proportional Hazards Models , Risk Factors
ABSTRACT
The search for adaptive evolution in the human genome has reached a new era with the advent of genome-wide surveys of genetic variation. However, making sense, let alone use, of such experiments is far from straightforward. Key problems include the way in which the data have been collected, the need to control for factors such as population history and variable recombination rates, which influence the discovery rates for both true and false positives, and the inherent difficulty of falsification. Nevertheless, recent work has shown that genome scans can be used to identify both functional polymorphisms underlying selected traits and entire classes of genes enriched for signals of adaptation.
Subject(s)
Adaptation, Biological/genetics , Evolution, Molecular , Genetic Testing , Genetic Variation , Genome, Human , Humans , Selection, Genetic
ABSTRACT
In humans, the rate of recombination, as measured on the megabase scale, is positively associated with the level of genetic variation, as measured at the genic scale. Despite considerable debate, it is not clear whether these factors are causally linked or, if they are, whether this is driven by the repeated action of adaptive evolution or molecular processes such as double-strand break formation and mismatch repair. We introduce three innovations to the analysis of recombination and diversity: fine-scale genetic maps estimated from genotype experiments that identify recombination hotspots at the kilobase scale, analysis of an entire human chromosome, and the use of wavelet techniques to identify correlations acting at different scales. We show that recombination influences genetic diversity only at the level of recombination hotspots. Hotspots are also associated with local increases in GC content and the relative frequency of GC-increasing mutations but have no effect on substitution rates. Broad-scale association between recombination and diversity is explained through covariance of both factors with base composition. To our knowledge, these results are the first evidence of a direct and local influence of recombination hotspots on genetic variation and the fate of individual mutations. However, that hotspots have no influence on substitution rates suggests that they are too ephemeral on an evolutionary time scale to have a strong influence on broader scale patterns of base composition and long-term molecular evolution.
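The scale-by-scale correlation idea can be sketched with a discrete wavelet transform. Two signals (for example recombination rate and diversity along a chromosome, binned into windows) are decomposed into detail coefficients at successive scales, and the coefficients are then correlated scale by scale. The Python sketch below assumes an orthonormal Haar wavelet and a series length that is a power of two. The abstract does not say which wavelet or estimator the authors used.

```python
import numpy as np


def haar_details(x):
    """Orthonormal Haar DWT: list of detail-coefficient arrays, finest scale first."""
    x = np.asarray(x, dtype=float)
    if x.size & (x.size - 1):
        raise ValueError("length must be a power of two")
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2))  # local differences at this scale
        x = (even + odd) / np.sqrt(2)              # smooth part passed to the next scale
    return details


def scale_correlations(x, y):
    """Pearson correlation of x and y detail coefficients at each scale with >= 3 coefficients."""
    out = []
    for dx, dy in zip(haar_details(x), haar_details(y)):
        if dx.size >= 3:
            out.append(float(np.corrcoef(dx, dy)[0, 1]))
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=256)
print(scale_correlations(x, x + 0.5 * rng.normal(size=256)))
```

Correlating coefficients within each scale separates fine-scale association (for example at hotspots) from broad-scale covariation that might instead reflect a shared third factor such as base composition.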
Subject(s)
Genetic Variation/genetics , Mutagenesis/genetics , Recombination, Genetic/genetics , Animals , Base Composition/genetics , Base Pairing/genetics , Gene Frequency/genetics , Genome, Human/genetics , Humans , Pan troglodytes/genetics
ABSTRACT
Hepatitis C virus (HCV) genotype 3 is very prevalent in Europe and Asia and is associated with worse outcomes than other genotypes. Genetic factors have been associated with HCV infection; however, no extensive genome-wide study has been performed among HCV genotype 3 patients. In this study, using a large cohort of 1,759 patients infected with HCV genotype 3, we explore the role of genetic variants in the response to interferon (IFN) and direct-acting antiviral (DAA) regimens and in viremia in a combined candidate gene and genome-wide analysis. We show that genetic variants within the IFN lambda 4 (IFNL4) locus are the major factors associated with the studied traits, in accordance with observations in other HCV genotypes and with comparable effect sizes. In particular, the functional dinucleotide polymorphism rs368234815 was associated with IFN-based sustained virologic response (SVR) [odds ratio (OR) = 1.5, P = 2.3 × 10(-7)], viremia (beta = -0.23, P = 8.8 × 10(-10)), and also DAA-based SVR (OR = 1.7; P = 4.2 × 10(-4)). Our results provide evidence for a role of genetic variants in HCV viremia and SVR, notably DAA-based SVR, in patients infected with HCV genotype 3.
Subject(s)
Genetic Loci , Genotype , Hepacivirus , Interleukins/genetics , Polymorphism, Genetic , Viremia/genetics , Female , Genome-Wide Association Study , Humans , Interleukins/metabolism , Male , Middle Aged , Viremia/drug therapy , Viremia/metabolism
ABSTRACT
The completion of the International HapMap Project marks the start of a new phase in human genetics. The aim of the project was to provide a resource that facilitates the design of efficient genome-wide association studies, through characterising patterns of genetic variation and linkage disequilibrium in a sample of 270 individuals across four geographical populations. In total, over one million SNPs have been typed across these genomes, providing an unprecedented view of human genetic diversity. In this review we focus on what the HapMap Project has taught us about the structure of human genetic variation and the fundamental molecular and evolutionary processes that shape it.
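Linkage disequilibrium between two biallelic SNPs is commonly summarised by D, D′ and r². The minimal Python sketch below computes them from phased haplotypes coded 0/1; that input format is an assumption. It returns a signed D′, which is D divided by its maximum attainable magnitude given the allele frequencies, and it requires both SNPs to be polymorphic.

```python
import numpy as np


def ld_stats(hap_a, hap_b):
    """D, signed D' and r^2 between two biallelic SNPs from phased 0/1 haplotypes."""
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    pa, pb = a.mean(), b.mean()  # frequencies of allele 1 at each SNP
    pab = np.mean(a * b)         # frequency of the 1-1 haplotype
    d = pab - pa * pb
    # Largest |D| attainable given the allele frequencies (depends on the sign of D)
    d_max = min(pa * (1 - pb), (1 - pa) * pb) if d >= 0 else min(pa * pb, (1 - pa) * (1 - pb))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d**2 / (pa * (1 - pa) * pb * (1 - pb))  # undefined if either SNP is monomorphic
    return d, d_prime, r2


print(ld_stats([0, 0, 1, 1, 1, 0], [0, 0, 1, 1, 0, 0]))
```

r² is the quantity most relevant to association-study design, because the power to detect a causal SNP through a tag SNP scales roughly with their r².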
Subject(s)
Genetic Variation , Genome, Human , Alleles , Chromosome Mapping , Evolution, Molecular , Genetics, Population , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide
ABSTRACT
Nontyphoidal Salmonella (NTS) is a major cause of bacteraemia in Africa. The disease typically affects HIV-infected individuals and young children, causing substantial morbidity and mortality. Here we present a genome-wide association study (180 cases, 2677 controls) and replication analysis of NTS bacteraemia in Kenyan and Malawian children. We identify a locus in STAT4, rs13390936, associated with NTS bacteraemia. rs13390936 is a context-specific expression quantitative trait locus for STAT4 RNA expression, and individuals carrying the NTS-risk genotype demonstrate decreased interferon-γ (IFNγ) production in stimulated natural killer cells, and decreased circulating IFNγ concentrations during acute NTS bacteraemia. The NTS-risk allele at rs13390936 is associated with protection against a range of autoimmune diseases. These data implicate interleukin-12-dependent IFNγ-mediated immunity as a determinant of invasive NTS disease in African children, and highlight the shared genetic architecture of infectious and autoimmune disease.
Subject(s)
Autoimmune Diseases/genetics , Bacteremia/epidemiology , Genetic Predisposition to Disease , STAT4 Transcription Factor/genetics , Salmonella Infections/epidemiology , Salmonella/pathogenicity , Adolescent , Alleles , Autoimmune Diseases/epidemiology , Autoimmune Diseases/immunology , Autoimmune Diseases/microbiology , Bacteremia/genetics , Bacteremia/immunology , Bacteremia/microbiology , Case-Control Studies , Child , Child, Preschool , Female , Follow-Up Studies , Genome-Wide Association Study , Genotype , Humans , Immunity, Cellular/genetics , Infant , Infant, Newborn , Interferon-gamma/blood , Interferon-gamma/immunology , Interferon-gamma/metabolism , Interleukin-12/immunology , Interleukin-12/metabolism , Kenya/epidemiology , Killer Cells, Natural/immunology , Killer Cells, Natural/metabolism , Malawi/epidemiology , Male , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Risk Factors , Salmonella/isolation & purification , Salmonella Infections/genetics , Salmonella Infections/immunology , Salmonella Infections/microbiology
ABSTRACT
Phenome-wide association studies (PheWAS) have been proposed as a possible aid in drug development through elucidating mechanisms of action, identifying alternative indications, or predicting adverse drug events (ADEs). Here, we select 25 single nucleotide polymorphisms (SNPs) linked through genome-wide association studies (GWAS) to 19 candidate drug targets for common disease indications. We interrogate these SNPs by PheWAS in four large cohorts with extensive health information (23andMe, UK Biobank, FINRISK, CHOP) for association with 1683 binary endpoints in up to 697,815 individuals and conduct meta-analyses for 145 mapped disease endpoints. Our analyses replicate 75% of known GWAS associations (P < 0.05) and identify nine study-wide significant novel associations (of 71 with FDR < 0.1). We describe associations that may predict ADEs, e.g., acne, high cholesterol, gout, and gallstones with rs738409 (p.I148M) in PNPLA3 and asthma with rs1990760 (p.T946A) in IFIH1. Our results demonstrate PheWAS as a powerful addition to the toolkit for drug discovery.
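The Benjamini-Hochberg step-up procedure is a standard way to control the false discovery rate at a level such as the FDR < 0.1 quoted above. The minimal Python sketch below implements it. The abstract does not state which FDR procedure the authors used, so this choice is an illustrative assumption.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.1):
    """Boolean mask of hypotheses rejected by the BH step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passes = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True    # reject every hypothesis up to that rank
    return reject


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2, 0.5], q=0.1))
```

Because the procedure is step-up, a hypothesis can be rejected even if its own P value exceeds its rank threshold, provided a larger P value further down the ranking passes.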