1.
Nature ; 629(8012): 624-629, 2024 May.
Article in English | MEDLINE | ID: mdl-38632401

ABSTRACT

The cost of drug discovery and development is driven primarily by failure¹, with only about 10% of clinical programmes eventually receiving approval²⁻⁴. We previously estimated that human genetic evidence doubles the success rate from clinical development to approval⁵. In this study we leverage the growth in genetic evidence over the past decade to better understand the characteristics that distinguish clinical success and failure. We estimate the probability of success for drug mechanisms with genetic support is 2.6 times greater than those without. This relative success varies among therapy areas and development phases, and improves with increasing confidence in the causal gene, but is largely unaffected by genetic effect size, minor allele frequency or year of discovery. These results indicate we are far from reaching peak genetic insights to aid the discovery of targets for more effective drugs.
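The "relative success" figure in this abstract is a ratio of two success probabilities. A minimal sketch of that arithmetic follows, assuming simple counts of approved and total programmes in each group; the function name and the counts are hypothetical and are not taken from the study, which may use a different estimator.

```python
def relative_success(supported_approved, supported_total,
                     unsupported_approved, unsupported_total):
    """Ratio of the success probability with genetic support
    to the success probability without it."""
    if supported_total <= 0 or unsupported_total <= 0:
        raise ValueError("group totals must be positive")
    if unsupported_approved == 0:
        raise ValueError("ratio is undefined when the unsupported group has no successes")
    p_supported = supported_approved / supported_total
    p_unsupported = unsupported_approved / unsupported_total
    return p_supported / p_unsupported


# Hypothetical counts chosen only to illustrate the calculation.
ratio = relative_success(26, 100, 10, 100)
print(round(ratio, 2))
```

A ratio above 1 means the genetically supported group succeeded more often; the sketch says nothing about uncertainty, which would need a confidence interval.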


Subject(s)
Clinical Trials as Topic , Drug Approval , Drug Discovery , Treatment Outcome , Humans , Alleles , Clinical Trials as Topic/economics , Clinical Trials as Topic/statistics & numerical data , Drug Approval/economics , Drug Discovery/economics , Drug Discovery/methods , Drug Discovery/statistics & numerical data , Drug Discovery/trends , Gene Frequency , Genetic Predisposition to Disease , Molecular Targeted Therapy , Probability , Time Factors , Treatment Failure
2.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world¹. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
3.
Nat Rev Genet ; 17(4): 197-206, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26972588

ABSTRACT

Lack of sufficient efficacy is the most common cause of attrition in late-phase drug development. It has long been envisioned that genetics could drive stratified drug development by identifying those patient subgroups that are most likely to respond. However, this vision has not been realized as only a small proportion of drugs have been found to have germline genetic predictors of efficacy with clinically meaningful effects, and so far all but one were found after drug approval. With the exception of oncology, systematic application of efficacy pharmacogenetics has not been integrated into drug discovery and development across the industry. Here, we argue for routine, early and cumulative screening for genetic predictors of efficacy, as an integrated component of clinical trial analysis. Such a strategy would identify clinically relevant predictors that may exist at the earliest possible opportunity, allow these predictors to be integrated into subsequent clinical development and provide mechanistic insights into drug disposition and patient-specific factors that influence response, therefore paving the way towards more personalized medicine.


Subject(s)
Drug Discovery , Pharmacogenetics , Biomarkers, Pharmacological/analysis , Drug Discovery/trends , Genotype , Humans , Pharmacogenetics/trends , Precision Medicine/trends , Treatment Outcome
4.
J Allergy Clin Immunol ; 147(5): 1830-1837.e15, 2021 05.
Article in English | MEDLINE | ID: mdl-33058932

ABSTRACT

BACKGROUND: β-lactam antibiotics are associated with a variety of immune-mediated or hypersensitivity reactions, including immediate (type I) reactions mediated by antigen-specific IgE. OBJECTIVE: We sought to identify genetic predisposing factors for immediate reactions to β-lactam antibiotics. METHODS: Patients with a clinical history of immediate hypersensitivity reactions to either penicillins or cephalosporins, which were immunologically confirmed, were recruited from allergy clinics. A genome-wide association study was conducted on 662 patients (the discovery cohort) with a diagnosis of immediate hypersensitivity and the main finding was replicated in a cohort of 98 Spanish cases, recruited using the same diagnostic criteria as the discovery cohort. RESULTS: Genome-wide association study identified rs71542416 within the Class II HLA region as the top hit (P = 2 × 10⁻¹⁴); this was in linkage disequilibrium with HLA-DRB1*10:01 (odds ratio, 2.93; P = 5.4 × 10⁻⁷) and HLA-DQA1*01:05 (odds ratio, 2.93; P = 5.4 × 10⁻⁷). Haplotype analysis identified that HLA-DRB1*10:01 was a risk factor even without the HLA-DQA1*01:05 allele. The association with HLA-DRB1*10:01 was replicated in another cohort, with the meta-analysis of the discovery and replication cohorts showing that HLA-DRB1*10:01 increased the risk of immediate hypersensitivity at a genome-wide level (odds ratio, 2.96; P = 4.1 × 10⁻⁹). No association with HLA-DRB1*10:01 was identified in 268 patients with delayed hypersensitivity reactions to β-lactams. CONCLUSIONS: HLA-DRB1*10:01 predisposed to immediate hypersensitivity reactions to penicillins. Further work to identify other predisposing HLA and non-HLA loci is required.


Subject(s)
Anti-Bacterial Agents/adverse effects , Cephalosporins/adverse effects , Drug Hypersensitivity/genetics , Hypersensitivity, Immediate/chemically induced , Hypersensitivity, Immediate/genetics , Penicillins/adverse effects , Adult , Cohort Studies , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , HLA-DQ alpha-Chains/genetics , HLA-DRB1 Chains/genetics , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide
5.
Gastroenterology ; 156(6): 1707-1716.e2, 2019 05.
Article in English | MEDLINE | ID: mdl-30664875

ABSTRACT

BACKGROUND & AIMS: We performed genetic analyses of a multiethnic cohort of patients with idiosyncratic drug-induced liver injury (DILI) to identify variants associated with susceptibility. METHODS: We performed a genome-wide association study of 2048 individuals with DILI (cases) and 12,429 individuals without (controls). Our analysis included subjects of European (1806 cases and 10,397 controls), African American (133 cases and 1,314 controls), and Hispanic (109 cases and 718 controls) ancestry. We analyzed DNA from 113 Icelandic cases and 239,304 controls to validate our findings. RESULTS: We associated idiosyncratic DILI with rs2476601, a nonsynonymous polymorphism that encodes a substitution of tryptophan with arginine in the protein tyrosine phosphatase, nonreceptor type 22 gene (PTPN22) (odds ratio [OR] 1.44; 95% confidence interval [CI] 1.28-1.62; P = 1.2 × 10⁻⁹) and replicated the finding in the validation set (OR 1.48; 95% CI 1.09-1.99; P = .01). The minor allele frequency showed the same effect size (OR > 1) among ethnic groups. The strongest association was with amoxicillin and clavulanate-associated DILI in persons of European ancestry (OR 1.62; 95% CI 1.32-1.98; P = 4.0 × 10⁻⁶; allele frequency = 13.3%), but the polymorphism was associated with DILI of other causes (OR 1.37; 95% CI 1.21-1.56; P = 1.5 × 10⁻⁶; allele frequency = 11.5%). Among amoxicillin- and clavulanate-associated cases of European ancestry, rs2476601 doubled the risk for DILI among those with the HLA risk alleles A*02:01 and DRB1*15:01. CONCLUSIONS: In a genome-wide association study, we identified rs2476601 in PTPN22 as a non-HLA variant that associates with risk of liver injury caused by multiple drugs and validated our finding in a separate cohort. This variant has been associated with increased risk of autoimmune diseases, providing support for the concept that alterations in immune regulation contribute to idiosyncratic DILI.


Subject(s)
Black or African American/genetics , Chemical and Drug Induced Liver Injury/genetics , Hispanic or Latino/genetics , Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics , White People/genetics , Adult , Amoxicillin/adverse effects , Anti-Bacterial Agents/adverse effects , Case-Control Studies , Clavulanic Acid/adverse effects , Female , Gene Frequency , Genome-Wide Association Study , HLA-A2 Antigen/genetics , HLA-DRB1 Chains/genetics , Humans , Male , Middle Aged , Mutation, Missense , Polymorphism, Single Nucleotide , Risk Factors
6.
BMC Bioinformatics ; 20(1): 69, 2019 Feb 08.
Article in English | MEDLINE | ID: mdl-30736745

ABSTRACT

BACKGROUND: Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long time frames and high expenses of drug development. With current informatics technology and machine learning algorithms, it is now possible to computationally discover therapeutic hypotheses by predicting clinically promising drug targets based on the evidence associating drug targets with disease indications. We have collected this evidence from Open Targets and additional databases that cover 17 sources of evidence for target-indication association and represented the data as a tensor of 21,437 × 2211 × 17. RESULTS: As a proof-of-concept, we identified examples of successes and failures of target-indication pairs in clinical trials across 875 targets and 574 disease indications to build a gold-standard data set of 6140 known clinical outcomes. We designed and executed three benchmarking strategies to examine the performance of multiple machine learning models: Logistic Regression, LASSO, Random Forest, Tensor Factorization and Gradient Boosting Machine. With 10-fold cross-validation, tensor factorization achieved AUROC = 0.82 ± 0.02 and AUPRC = 0.71 ± 0.03. Across multiple validation schemes, this was comparable or better than other methods. CONCLUSION: In this work, we benchmarked a machine learning technique called tensor factorization for the problem of predicting clinical outcomes of therapeutic hypotheses. Results have shown that this method can achieve equal or better prediction performance compared with a variety of baseline models. We demonstrate one application of the method to predict outcomes of trials on novel indications of approved drug targets. This work can be expanded to targets and indications that have never been clinically tested and proposing novel target-indication hypotheses. Our proposed biologically-motivated cross-validation schemes provide insight into the robustness of the prediction performance. This has significant implications for all future methods that try to address this seminal problem in drug discovery.
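The AUROC reported in this abstract can be read as the probability that a randomly chosen successful target-indication pair is scored above a randomly chosen failed one. The sketch below computes that quantity directly by pairwise comparison; the function name, labels and scores are hypothetical illustrations, not the authors' benchmarking code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive outcome, 0 for a negative outcome.
    scores: model scores, higher meaning more likely positive.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and scores for four target-indication pairs.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of examples, which is fine for a few thousand outcomes; a rank-based formulation would be the usual choice for larger sets.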


Subject(s)
Algorithms , Drug Discovery , Models, Theoretical , Bayes Theorem , Benchmarking , Clinical Trials as Topic , Drug Delivery Systems/methods , Humans , Logistic Models
7.
Gastroenterology ; 152(5): 1078-1089, 2017 04.
Article in English | MEDLINE | ID: mdl-28043905

ABSTRACT

BACKGROUND & AIMS: We performed a genome-wide association study (GWAS) to identify genetic risk factors for drug-induced liver injury (DILI) from licensed drugs without previously reported genetic risk factors. METHODS: We performed a GWAS of 862 persons with DILI and 10,588 population-matched controls. The first set of cases was recruited before May 2009 in Europe (n = 137) and the United States (n = 274). The second set of cases was identified from May 2009 through May 2013 from international collaborative studies performed in Europe, the United States, and South America. For the GWAS, we included only cases with patients of European ancestry associated with a particular drug (but not flucloxacillin or amoxicillin-clavulanate). We used DNA samples from all subjects to analyze HLA genes and single nucleotide polymorphisms. After the discovery analysis was concluded, we validated our findings using data from 283 European patients with diagnosis of DILI associated with various drugs. RESULTS: We associated DILI with rs114577328 (a proxy for A*33:01, an HLA class I allele; odds ratio [OR], 2.7; 95% confidence interval [CI], 1.9-3.8; P = 2.4 × 10⁻⁸) and with rs72631567 on chromosome 2 (OR, 2.0; 95% CI, 1.6-2.5; P = 9.7 × 10⁻⁹). The association with A*33:01 was mediated by large effects for terbinafine-, fenofibrate-, and ticlopidine-related DILI. The variant on chromosome 2 was associated with DILI from a variety of drugs. Further phenotypic analysis indicated that the association between DILI and A*33:01 was significant genome wide for cholestatic and mixed DILI, but not for hepatocellular DILI; the polymorphism on chromosome 2 was associated with cholestatic and mixed DILI as well as hepatocellular DILI. We identified an association between rs28521457 (within the lipopolysaccharide-responsive vesicle trafficking, beach and anchor containing gene) and only hepatocellular DILI (OR, 2.1; 95% CI, 1.6-2.7; P = 4.8 × 10⁻⁹). We did not associate any specific drug classes with genetic polymorphisms, except for statin-associated DILI, which was associated with rs116561224 on chromosome 18 (OR, 5.4; 95% CI, 3.0-9.5; P = 7.1 × 10⁻⁹). We validated the association between A*33:01 and terbinafine- and sertraline-induced DILI. We could not validate the association between DILI and rs72631567, rs28521457, or rs116561224. CONCLUSIONS: In a GWAS of persons of European descent with DILI, we associated HLA-A*33:01 with DILI due to terbinafine and possibly fenofibrate and ticlopidine. We identified polymorphisms that appear to be associated with DILI from statins, as well as 2 non-drug-specific risk factors.


Subject(s)
Chemical and Drug Induced Liver Injury/genetics , Chromosomes, Human, Pair 2/genetics , HLA-A Antigens/genetics , Alleles , Antidepressive Agents/adverse effects , Antifungal Agents/adverse effects , Chemical and Drug Induced Liver Injury/etiology , Female , Fenofibrate/adverse effects , Genes, MHC Class I/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Hypolipidemic Agents/adverse effects , Male , Middle Aged , Naphthalenes/adverse effects , Odds Ratio , Phenotype , Platelet Aggregation Inhibitors/adverse effects , Polymorphism, Single Nucleotide , Sertraline/adverse effects , Terbinafine , Ticlopidine/adverse effects , White People/genetics
8.
Bioinformatics ; 33(17): 2784-2786, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28472345

ABSTRACT

SUMMARY: We developed the STOPGAP (Systematic Target OPportunity assessment by Genetic Association Predictions) database, an extensive catalog of human genetic associations mapped to effector gene candidates. STOPGAP draws on a variety of publicly available GWAS associations, linkage disequilibrium (LD) measures, functional genomic and variant annotation sources. Algorithms were developed to merge the association data, partition associations into non-overlapping LD clusters, map variants to genes and produce a variant-to-gene score used to rank the relative confidence among potential effector genes. This database can be used for a multitude of investigations into the genes and genetic mechanisms underlying inter-individual variation in human traits, as well as supporting drug discovery applications. AVAILABILITY AND IMPLEMENTATION: Shell, R, Perl and Python scripts and STOPGAP R data files (version 2.5.1 at publication) are available at https://github.com/StatGenPRD/STOPGAP. Some of the most useful STOPGAP fields can be queried through an R Shiny web application at http://stopgapwebapp.com. CONTACT: matthew.r.nelson@gsk.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Factual , Genetic Association Studies/methods , Genetic Variation , Linkage Disequilibrium , Algorithms , Humans , Sequence Analysis, DNA/methods
9.
Nucleic Acids Res ; 44(D1): D869-76, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26615194

ABSTRACT

Genome-wide association studies (GWASs), now as a routine approach to study single-nucleotide polymorphism (SNP)-trait association, have uncovered over ten thousand significant trait/disease associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb) which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0 × 10⁻³) from each original publication, and generated a total of 252,530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive function annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of the website.


Subject(s)
Databases, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Biological Ontologies , Disease/genetics , Genes , Humans , Molecular Sequence Annotation
10.
Pharmacogenet Genomics ; 27(3): 89-100, 2017 03.
Article in English | MEDLINE | ID: mdl-27984508

ABSTRACT

OBJECTIVE: Proteins involving absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing. MATERIALS AND METHODS: Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project. RESULTS: Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one 'knockout' allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10) and showed significantly greater levels of population differentiation (P=7.6×10). Out of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies. CONCLUSION: Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Population Groups/genetics , Sequence Analysis, DNA/methods , Exome , Genetic Variation , Genetics, Population , Humans , Male , Polymorphism, Single Nucleotide , Population Groups/ethnology , Principal Component Analysis
11.
J Antimicrob Chemother ; 72(4): 1152-1162, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28062682

ABSTRACT

Background: The antiretroviral nevirapine is associated with hypersensitivity reactions in 6%-10% of patients, including hepatotoxicity, maculopapular exanthema, Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN). Objectives: To undertake a genome-wide association study (GWAS) to identify genetic predisposing factors for the different clinical phenotypes associated with nevirapine hypersensitivity. Methods: A GWAS was undertaken in a discovery cohort of 151 nevirapine-hypersensitive and 182 tolerant, HIV-infected Malawian adults. Replication of signals was determined in a cohort of 116 cases and 68 controls obtained from Malawi, Uganda and Mozambique. Interaction with ERAP genes was determined in patients positive for HLA-C*04:01. In silico docking studies were also performed for HLA-C*04:01. Results: Fifteen SNPs demonstrated nominal significance (P < 1 × 10⁻⁵) with one or more of the hypersensitivity phenotypes. The most promising signal was seen in SJS/TEN, where rs5010528 (HLA-C locus) approached genome-wide significance (P < 8.5 × 10⁻⁸) and was below HLA-wide significance (P < 2.5 × 10⁻⁴) in the meta-analysis of discovery and replication cohorts [OR 4.84 (95% CI 2.71-8.61)]. rs5010528 is a strong proxy for HLA-C*04:01 carriage: in silico docking showed that two residues (33 and 123) in the B pocket were the most likely nevirapine interactors. There was no interaction between HLA-C*04:01 and ERAP1, but there is a potential protective effect with ERAP2 [P = 0.019, OR 0.43 (95% CI 0.21-0.87)]. Conclusions: HLA-C*04:01 predisposes to nevirapine-induced SJS/TEN in sub-Saharan Africans, but not to other hypersensitivity phenotypes. This is likely to be mediated via binding to the B pocket of the HLA-C peptide. Whether this risk is modulated by ERAP2 variants requires further study.


Subject(s)
Anti-HIV Agents/adverse effects , Drug Hypersensitivity/genetics , HIV Infections/drug therapy , HLA-C Antigens/genetics , Nevirapine/adverse effects , Polymorphism, Single Nucleotide , Adult , Africa South of the Sahara/epidemiology , Aged , Anti-HIV Agents/therapeutic use , Biomarkers/analysis , Black People , Case-Control Studies , Female , Genome-Wide Association Study , Genotype , HIV Infections/epidemiology , HIV Infections/genetics , HIV Infections/virology , Humans , Male , Middle Aged , Nevirapine/therapeutic use , Stevens-Johnson Syndrome/etiology , Young Adult
12.
Pharmacogenet Genomics ; 26(5): 218-24, 2016 May.
Article in English | MEDLINE | ID: mdl-26959717

ABSTRACT

OBJECTIVE: Flupirtine is a nonopioid analgesic with regulatory approval in a number of European countries. Because of the risk of serious liver injury, its use is now limited to short-term pain management. We aimed to identify genetic risk factors for flupirtine-related drug-induced liver injury (DILI) as these are unknown. MATERIALS AND METHODS: Six flupirtine-related DILI patients from Germany were included in a genome-wide association study (GWAS) involving a further 614 European cases of DILI because of other drugs and 10,588 population controls. DILI was diagnosed by causality assessment and expert review. Human leucocyte antigen (HLA) and single nucleotide polymorphism genotypes were imputed from the GWAS data, with direct HLA typing performed on selected cases to validate HLA predictions. Four replication cases that were unavailable for the GWAS were genotyped by direct HLA typing, yielding an overall total of 10 flupirtine DILI cases. RESULTS: In the six flupirtine DILI cases included in the GWAS, we found a significant enrichment of the DRB1*16:01-DQB1*05:02 haplotype compared with the controls (minor allele frequency cases 0.25 and minor allele frequency controls 0.013; P = 1.4 × 10⁻⁵). We estimated an odds ratio for haplotype carriers of 18.7 (95% confidence interval 2.5-140.5, P = 0.002) using population-specific HLA control data. The result was replicated in four additional cases, also with a haplotype frequency of 0.25. In the combined cohort (six GWAS plus four replication cases), the haplotype was also significant (odds ratio 18.7, 95% confidence interval 4.31-81.42, P = 6.7 × 10⁻⁵). CONCLUSION: We identified a novel HLA class II association for DILI, confirming the important contribution of HLA genotype towards the risk of DILI generally.
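Several abstracts in this list report a carrier odds ratio with a 95% confidence interval. As a rough illustration of how such a pair of numbers can arise from a 2 × 2 table, here is a sketch using the standard log (Woolf) interval. The function name and the counts are hypothetical, and the abstract does not state which interval method the authors used; with case numbers this small an exact method would usually be preferred.

```python
import math


def odds_ratio_ci(carrier_cases, noncarrier_cases,
                  carrier_controls, noncarrier_controls, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2 x 2 table.

    z=1.96 gives an approximate 95% interval. All four cells must be non-zero.
    """
    cells = (carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls)
    if min(cells) <= 0:
        raise ValueError("Woolf interval requires all cells to be positive")
    odds_ratio = (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 5 of 10 cases and 50 of 1000 controls carry the haplotype.
or_, lo, hi = odds_ratio_ci(5, 5, 50, 950)
print(f"OR {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

The very wide intervals that small case counts produce are visible immediately: the reciprocal of each small cell dominates the standard error.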


Subject(s)
Aminopyridines/adverse effects , Chemical and Drug Induced Liver Injury/genetics , HLA-DQ beta-Chains/genetics , HLA-DRB1 Chains/genetics , Adult , Aged , Chemical and Drug Induced Liver Injury/etiology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide
13.
Am J Hum Genet ; 92(4): 547-57, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23541341

ABSTRACT

Clinical trials for preventative therapies are complex and costly endeavors focused on individuals likely to develop disease in a short time frame, randomizing them to treatment groups, and following them over time. In such trials, statistical power is governed by the rate of disease events in each group and cost is determined by randomization, treatment, and follow-up. Strategies that increase the rate of disease events by enrolling individuals with high risk of disease can significantly reduce study size, duration, and cost. Comprehensive study of common, complex diseases has resulted in a growing list of robustly associated genetic markers. Here, we evaluate the utility--in terms of trial size, duration, and cost--of enriching prevention trial samples by combining clinical information with genetic risk scores to identify individuals at greater risk of disease. We also describe a framework for utilizing genetic risk scores in these trials and evaluating the associated cost and time savings. With type 1 diabetes (T1D), type 2 diabetes (T2D), myocardial infarction (MI), and advanced age-related macular degeneration (AMD) as examples, we illustrate the potential and limitations of using genetic data for prevention trial design. We illustrate settings where incorporating genetic information could reduce trial cost or duration considerably, as well as settings where potential savings are negligible. Results are strongly dependent on the genetic architecture of the disease, but we also show that these benefits should increase as the list of robustly associated markers for each disease grows and as large samples of genotyped individuals become available.
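The link this abstract draws between event rate and trial size can be illustrated with the textbook sample-size approximation for comparing two proportions. This is a sketch under stated assumptions, not the framework described by the authors: the function name, the default significance level and power, and the example rates are all hypothetical.

```python
import math
from statistics import NormalDist


def participants_per_arm(control_event_rate, relative_risk, alpha=0.05, power=0.8):
    """Approximate participants per arm for a two-arm prevention trial.

    Uses the normal approximation for a two-sided test of two proportions.
    relative_risk is the assumed treated-to-control ratio of event rates.
    """
    p_control = control_event_rate
    p_treated = control_event_rate * relative_risk
    if not (0 < p_control < 1 and 0 < p_treated < 1) or p_control == p_treated:
        raise ValueError("event rates must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical rates: an unselected cohort versus a cohort enriched for high risk.
print(participants_per_arm(0.02, 0.5))
print(participants_per_arm(0.10, 0.5))
```

With the same assumed treatment effect, the enriched cohort needs far fewer participants per arm, which is the mechanism behind the cost and duration savings the abstract discusses. The sketch ignores screening cost, which is part of the trade-off.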


Subject(s)
Diabetes Mellitus, Type 1/prevention & control , Diabetes Mellitus, Type 2/prevention & control , Genetic Testing/statistics & numerical data , Genetic Variation/genetics , Genotype , Macular Degeneration/prevention & control , Myocardial Infarction/prevention & control , Research Design , Clinical Trials as Topic , Cost-Benefit Analysis , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Humans , Macular Degeneration/genetics , Models, Statistical , Myocardial Infarction/genetics , Phenotype , Risk Factors
14.
Genome Res ; 23(12): 1974-84, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23990608

ABSTRACT

Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.


Subject(s)
Genetic Variation , Genome, Human , Point Mutation , Animals , Base Composition , Evolution, Molecular , Gene Conversion , Gene Frequency , Genomics , Humans , Logistic Models , Models, Genetic , Mutation Rate , Pan troglodytes/genetics , Phylogeny , Recombination, Genetic , Selection, Genetic
15.
BMC Bioinformatics ; 16: 75, 2015 Mar 07.
Article in English | MEDLINE | ID: mdl-25884587

ABSTRACT

BACKGROUND: Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing. RESULTS: Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals. CONCLUSIONS: We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants.


Subject(s)
Algorithms , Biomarkers/analysis , Disease/genetics , Exons/genetics , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Software , Genetics, Population , Genome, Human , Genotype , Haplotypes/genetics , Humans , Linkage Disequilibrium
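
The two genotype-quality measures reported in the abstract above (the proportion of missing genotypes and the concordance with array-based genotypes) can be illustrated with a minimal sketch. The encoding (alternate-allele counts 0, 1, 2, with `None` for a missing call) and the function names `missing_rate` and `concordance` are assumptions made for illustration; they are not taken from the study, which may define concordance differently.

```python
from typing import Optional, Sequence

# Assumed encoding: alternate-allele count (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotypes for which the caller produced no call."""
    if not calls:
        raise ValueError("no genotypes supplied")
    return sum(g is None for g in calls) / len(calls)


def concordance(calls: Sequence[Genotype], reference: Sequence[Genotype]) -> float:
    """Fraction of agreeing genotypes among sites called in both sets."""
    if len(calls) != len(reference):
        raise ValueError("call sets must have the same length")
    # Only sites with a call in both sets can be compared.
    pairs = [(c, r) for c, r in zip(calls, reference)
             if c is not None and r is not None]
    if not pairs:
        raise ValueError("no sites are called in both sets")
    return sum(c == r for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    sequenced = [0, 1, None, 2, 1]   # hypothetical sequencing calls
    array = [0, 1, 1, 2, 2]          # hypothetical array genotypes
    print(missing_rate(sequenced), concordance(sequenced, array))
```

Excluding missing calls from the concordance denominator keeps the two measures separate, so a caller is not counted as discordant at sites where it declined to call.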
16.
PLoS Comput Biol ; 9(2): e1002877, 2013.
Article in English | MEDLINE | ID: mdl-23459081

ABSTRACT

Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy on diverse European panels of 97% (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework.


Subject(s)
Computational Biology/methods , Genetics, Population/methods , HLA Antigens/genetics , Models, Genetic , Models, Immunological , Haplotypes , Humans , Polymorphism, Single Nucleotide , Principal Component Analysis , Racial Groups , Reproducibility of Results , Software
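
The abstract above reports imputation accuracy alongside a call rate. One common way to relate the two, sketched here, is to call an allele only when its posterior probability reaches a threshold and to measure accuracy among the called samples. This is not the HLA*IMP:02 algorithm; the threshold value, the data layout and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def call_allele(posteriors: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable allele, or None if it falls below the threshold."""
    allele, prob = max(posteriors.items(), key=lambda item: item[1])
    return allele if prob >= threshold else None


def call_rate_and_accuracy(
    predictions: List[Dict[str, float]],
    truth: List[str],
    threshold: float = 0.7,  # assumed cut-off, chosen only for this example
) -> Tuple[float, float]:
    """Call rate over all samples and accuracy among the called samples."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    calls = [call_allele(p, threshold) for p in predictions]
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    if not called:
        raise ValueError("no sample passed the threshold")
    call_rate = len(called) / len(calls)
    accuracy = sum(c == t for c, t in called) / len(called)
    return call_rate, accuracy


if __name__ == "__main__":
    # Hypothetical posterior probabilities for four samples at one locus.
    preds = [
        {"A*01:01": 0.90, "A*02:01": 0.10},
        {"A*01:01": 0.55, "A*02:01": 0.45},
        {"A*02:01": 0.80, "A*03:01": 0.20},
        {"A*01:01": 0.75, "A*03:01": 0.25},
    ]
    typed = ["A*01:01", "A*02:01", "A*02:01", "A*03:01"]
    print(call_rate_and_accuracy(preds, typed))
```

Raising the threshold generally trades call rate for accuracy, which is why the two figures are reported together.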
17.
Nature ; 456(7218): 98-101, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18758442

ABSTRACT

Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual's DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.


Subject(s)
Genetic Variation/genetics , Genetics, Population , Geography , Emigration and Immigration , Europe/ethnology , Genome, Human/genetics , Genome-Wide Association Study , Genotype , Humans , Phylogeny , Polymorphism, Single Nucleotide , Principal Component Analysis , Quantitative Trait, Heritable , Sample Size
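
The record above lists principal component analysis among its subject terms, and the low-dimensional summary of genetic variation it describes can be illustrated with a toy sketch. The code below centres each SNP and finds the first principal component by power iteration on the individual-by-individual Gram matrix. It is a generic textbook procedure in pure Python, not the study's pipeline; the function name `first_pc`, the starting vector and the tiny genotype matrix are assumptions made for illustration.

```python
import math
from typing import List


def first_pc(genotypes: List[List[float]], iterations: int = 200) -> List[float]:
    """Unit-length first principal component scores, one per individual.

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at SNP j.
    The overall sign of the returned vector is arbitrary.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP (column) on its mean across individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix: inner products between every pair of individuals.
    gram = [[sum(a * b for a, b in zip(centred[i], centred[k]))
             for k in range(n)] for i in range(n)]
    # Power iteration from a fixed, non-constant starting vector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("genotype matrix has no variation")
        vec = [x / norm for x in nxt]
    return vec


if __name__ == "__main__":
    # Hypothetical data: three individuals from each of two populations.
    data = [
        [0, 0, 1, 2], [0, 1, 0, 2], [0, 0, 0, 2],
        [2, 2, 1, 0], [2, 1, 2, 0], [2, 2, 2, 0],
    ]
    print(first_pc(data))
```

In this toy matrix the first component separates the two groups of individuals by sign. Real analyses of hundreds of thousands of SNPs would use an optimized linear-algebra library and typically also scale each SNP by its expected variance.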
18.
Proc Natl Acad Sci U S A ; 107(2): 786-91, 2010 Jan 12.
Article in English | MEDLINE | ID: mdl-20080753

ABSTRACT

Quantifying patterns of population structure in Africans and African Americans illuminates the history of human populations and is critical for undertaking medical genomic studies on a global scale. To obtain a fine-scale genome-wide perspective of ancestry, we analyze Affymetrix GeneChip 500K genotype data from African Americans (n = 365) and individuals with ancestry from West Africa (n = 203 from 12 populations) and Europe (n = 400 from 42 countries). We find that population structure within the West African sample reflects primarily language and secondarily geographical distance, echoing the Bantu expansion. Among African Americans, analysis of genomic admixture by a principal component-based approach indicates that the median proportion of European ancestry is 18.5% (25th-75th percentiles: 11.6-27.7%), with very large variation among individuals. In the African-American sample as a whole, few autosomal regions showed exceptionally high or low mean African ancestry, but the X chromosome showed elevated levels of African ancestry, consistent with a sex-biased pattern of gene flow with an excess of European male and African female ancestry. We also find that genomic profiles of individual African Americans afford personalized ancestry reconstructions differentiating ancient vs. recent European and African ancestry. Finally, patterns of genetic similarity among inferred African segments of African-American genomes and genomes of contemporary African populations included in this study suggest African ancestry is most similar to non-Bantu Niger-Kordofanian-speaking populations, consistent with historical documents of the African Diaspora and trans-Atlantic slave trade.


Subject(s)
Black People/genetics , Genome-Wide Association Study/methods , Africa South of the Sahara , Africa, Western , Algorithms , Ethnicity/genetics , Europe , Female , Genetic Markers , Genetic Variation , Geography , Humans , Language , Male , United States
19.
Curr Opin Struct Biol ; 80: 102568, 2023 06.
Article in English | MEDLINE | ID: mdl-36963162

ABSTRACT

Evidence from human genetics supporting the therapeutic hypothesis increases the likelihood that a drug will succeed in clinical trials. Rare and common disease genetics yield a wide array of alleles with a range of effect sizes that can proxy for the effect of a drug in disease. Recent advances in large scale population collections and whole genome sequencing approaches have provided a rich resource of human genetic evidence to support drug target selection. As the range of phenotypes profiled increases and ever more alleles are discovered across world-wide populations, these approaches will increasingly influence multiple stages across the lifespan of a drug discovery programme.


Subject(s)
Drug Discovery , Genomics , Humans , Phenotype , Human Genetics
20.
Nat Rev Drug Discov ; 22(2): 145-162, 2023 02.
Article in English | MEDLINE | ID: mdl-36261593

ABSTRACT

Human genetics research has discovered thousands of proteins associated with complex and rare diseases. Genome-wide association studies (GWAS) and studies of Mendelian disease have resulted in an increased understanding of the role of gene function and regulation in human conditions. Although the application of human genetics has been explored primarily as a method to identify potential drug targets and support their relevance to disease in humans, there is increasing interest in using genetic data to identify potential safety liabilities of modulating a given target. Human genetic variants can be used as a model to anticipate the effect of lifelong modulation of therapeutic targets and identify the potential risk for on-target adverse events. This approach is particularly useful for non-clinical safety evaluation of novel therapeutics that lack pharmacologically relevant animal models and can contribute to the intrinsic safety profile of a drug target. This Review illustrates applications of human genetics to safety studies during drug discovery and development, including assessing the potential for on- and off-target associated adverse events, carcinogenicity risk assessment, and guiding translational safety study designs and monitoring strategies. A summary of available human genetic resources and recommended best practices is provided. The challenges and future perspectives of translating human genetic information to identify risks for potential drug effects in preclinical and clinical development are discussed.


Subject(s)
Genome-Wide Association Study , Human Genetics , Animals , Humans