ABSTRACT
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world (ref. 1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
ABSTRACT
The hope for precision medicine has long been on the drug discovery horizon, well before the Human Genome Project gave it promise at the turn of the 21st century. In oncology, the concept has finally been realized and is now firmly embedded in ongoing drug discovery programs, with many recent therapies involving some level of patient/disease stratification, including some highly personalized treatments. In addition, several drugs for rare diseases have recently been approved or are in late-stage clinical development, and new delivery modalities in cell and gene therapy and oligonucleotide approaches are yielding exciting new medicines for rare diseases of unmet need. For common complex diseases, however, the GWAS-driven advances in annotation of the genetic architecture over the past decade have not led to a concomitant shift in refined treatments. Similarly, attempts to disentangle treatment responders from non-responders via genetic predictors in pharmacogenetics studies have not met their anticipated success. It is possible that common diseases are simply lagging behind because of the time lag inherent in drug discovery, but it is also possible that their multifactorial nature and their etiological and clinical heterogeneity will prove more resistant to refined treatment paradigms. The emergence of population-based resources in electronic health records, coupled with the rapid expansion of mobile devices and digital health, may help to refine the measurement of phenotypic outcomes to match the exquisite detail emerging at the molecular level.
ABSTRACT
OBJECTIVE: Proteins involved in absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing. MATERIALS AND METHODS: Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project. RESULTS: Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one 'knockout' allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10) and showed significantly greater levels of population differentiation (P=7.6×10). Of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies. CONCLUSION: Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Population Groups/genetics , Sequence Analysis, DNA/methods , Exome , Genetic Variation , Genetics, Population , Humans , Male , Polymorphism, Single Nucleotide , Population Groups/ethnology , Principal Component Analysis
ABSTRACT
Malaria is a devastating infection caused by protozoa of the genus Plasmodium. Drug resistance is widespread, no new chemical class of antimalarials has been introduced into clinical practice since 1996 and there is a recent rise of parasite strains with reduced sensitivity to the newest drugs. We screened nearly 2 million compounds in GlaxoSmithKline's chemical library for inhibitors of P. falciparum, of which 13,533 were confirmed to inhibit parasite growth by at least 80% at 2 µM concentration. More than 8,000 also showed potent activity against the multidrug resistant strain Dd2. Most (82%) compounds originate from internal company projects and are new to the malaria community. Analyses using historic assay data suggest several novel mechanisms of antimalarial action, such as inhibition of protein kinases and host-pathogen interaction related targets. Chemical structures and associated data are hereby made public to encourage additional drug lead identification efforts and further research into this disease.
Subject(s)
Antimalarials/analysis , Antimalarials/pharmacology , Drug Discovery , Malaria, Falciparum/drug therapy , Plasmodium falciparum/drug effects , Small Molecule Libraries/analysis , Small Molecule Libraries/pharmacology , Animals , Antimalarials/chemistry , Antimalarials/toxicity , Cell Line, Tumor , Drug Resistance, Multiple/drug effects , Humans , Malaria, Falciparum/parasitology , Models, Biological , Phylogeny , Plasmodium falciparum/enzymology , Plasmodium falciparum/genetics , Plasmodium falciparum/growth & development , Small Molecule Libraries/chemistry , Small Molecule Libraries/toxicity
ABSTRACT
We tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls. Outside the HLA region, the most significant finding (rs13119723; P = 2.0 × 10⁻⁷) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block. We independently confirmed association in two further collections (strongest association at rs6822844, 24 kb 5' of IL21; meta-analysis P = 1.3 × 10⁻¹⁴, odds ratio = 0.63), suggesting that genetic variation in this region predisposes to celiac disease.
Subject(s)
Celiac Disease/genetics , Genetic Predisposition to Disease , Genetic Variation , Genome, Human , Interleukin-2/genetics , Interleukins/genetics , Animals , Chromosomes, Human, Pair 4/genetics , Humans , Linkage Disequilibrium , Mice , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
A genome-wide association scan in individuals with Crohn's disease by the Wellcome Trust Case Control Consortium detected strong association at four novel loci. We tested 37 SNPs from these and other loci for association in an independent case-control sample. We obtained replication for the autophagy-inducing IRGM gene on chromosome 5q33.1 (replication P = 6.6 × 10⁻⁴, combined P = 2.1 × 10⁻¹⁰) and for nine other loci, including NKX2-3, PTPN2 and gene deserts on chromosomes 1q and 5p13.
Subject(s)
Autophagy/genetics , Crohn Disease/genetics , GTP-Binding Proteins/genetics , Genetic Predisposition to Disease , Genetic Variation , Animals , Case-Control Studies , Humans , Mice , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
ABSTRACT
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, 'missing' heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease/genetics , Genetics, Medical/methods , Genetics, Medical/trends , Genome-Wide Association Study/methods , Genome-Wide Association Study/trends , Humans , Inheritance Patterns/genetics , Pedigree
ABSTRACT
Genome-wide association studies involving hundreds of thousands of SNPs in thousands of cases and controls are now underway. The first of many analytical challenges in these studies involves the choice of SNPs to genotype. It is not practical to construct a different panel of tag SNPs for each study, so the first generation of genome-wide scans will use predefined, commercially available marker panels, which will in part dictate their success or failure. We compare different approaches in use today, and show that although many of them provide substantial coverage of common variation in non-African populations, the precise extent is strongly dependent on the frequencies of alleles of interest and on specific considerations of study design. Overall, despite substantial differences in genotyping technologies, marker selection strategies and number of markers assayed, the first-generation high-throughput platforms all offer similar levels of genome coverage.
Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , Case-Control Studies , Haplotypes , Humans , Likelihood Functions , Linkage Disequilibrium
ABSTRACT
The past year has witnessed substantial advances in understanding the genetic basis of many common phenotypes of biomedical importance. These advances have been the result of systematic, well-powered, genome-wide surveys exploring the relationships between common sequence variation and disease predisposition. This approach has revealed over 50 disease-susceptibility loci and has provided insights into the allelic architecture of multifactorial traits. At the same time, much has been learned about the successful prosecution of association studies on such a scale. This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.
Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genetic Variation , Genome, Human , Quantitative Trait Loci , Quantitative Trait, Heritable , Alleles , Animals , Humans
ABSTRACT
After nearly 10 years of intense academic and commercial research effort, large genome-wide association studies for common complex diseases are now imminent. Although these conditions involve a complex relationship between genotype and phenotype, including interactions between unlinked loci, the prevailing strategies for analysis of such studies focus on the locus-by-locus paradigm. Here we consider analytical methods that explicitly look for statistical interactions between loci. We show first that they are computationally feasible, even for studies of hundreds of thousands of loci, and second that even with a conservative correction for multiple testing, they can be more powerful than traditional analyses under a range of models for interlocus interactions. We also show that plausible variations across populations in allele frequencies among interacting loci can markedly affect the power to detect their marginal effects, which may account in part for the well-known difficulties in replicating association results. These results suggest that searching for interactions among genetic loci can be fruitfully incorporated into analysis strategies for genome-wide association studies.
Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Alleles , Genetic Linkage , Genetic Markers , Genetics, Population , Humans , Models, Genetic
ABSTRACT
A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.
Subject(s)
Chromosome Mapping , Databases, Nucleic Acid , Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease , Genome, Human/genetics , Polymorphism, Single Nucleotide , Gene Frequency , Humans , Linkage Disequilibrium , Sample Size
ABSTRACT
Efforts to find disease genes using high-density single-nucleotide polymorphism (SNP) maps will produce data sets that exceed the limitations of current computational tools. Here we describe a new, efficient method for the analysis of dense genetic maps in pedigree data that provides extremely fast solutions to common problems such as allele-sharing analyses and haplotyping. We show that sparse binary trees represent patterns of gene flow in general pedigrees in a parsimonious manner, and derive a family of related algorithms for pedigree traversal. With these trees, exact likelihood calculations can be carried out efficiently for single markers or for multiple linked markers. Using an approximate multipoint calculation that ignores the unlikely possibility of a large number of recombinants further improves speed and provides accurate solutions in dense maps with thousands of markers. Our multipoint engine for rapid likelihood inference (Merlin) is a computer program that uses sparse inheritance trees for pedigree analysis; it performs rapid haplotyping, genotype error detection and affected pair linkage analyses and can handle more markers than other pedigree analysis packages.
Subject(s)
Algorithms , Genetic Linkage , Likelihood Functions , Software , Female , Genotype , Haplotypes , Humans , Male , Meiosis , Pedigree , Polymorphism, Genetic
ABSTRACT
Large-scale association studies hold substantial promise for unraveling the genetic basis of common human diseases. A well-known problem with such studies is the presence of undetected population structure, which can lead to both false positive results and failures to detect genuine associations. Here we examine approximately 15,000 genome-wide single-nucleotide polymorphisms typed in three population groups to assess the consequences of population structure on the coming generation of association studies. The consequences of population structure on association outcomes increase markedly with sample size. For the size of study needed to detect typical genetic effects in common diseases, even the modest levels of population structure within population groups cannot safely be ignored. We also examine one method for correcting for population structure (Genomic Control). Although it often performs well, it may not correct for structure if too few loci are used and may overcorrect in other settings, leading to substantial loss of power. The results of our analysis can guide the design of large-scale association studies.
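As a rough illustration of the Genomic Control correction discussed in this abstract, the sketch below estimates an inflation factor as the median of per-marker 1-df chi-square statistics divided by the median of the chi-square distribution with 1 df (about 0.4549), then rescales the statistics by that factor. The function names are hypothetical, the toy statistics are made up, and flooring the factor at 1 is a common convention assumed here rather than something the abstract specifies.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Estimate the inflation factor from per-marker 1-df chi-square statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def apply_genomic_control(chi2_stats):
    """Rescale statistics by the inflation factor (floored at 1, an assumed convention)."""
    lam = max(1.0, genomic_control_lambda(chi2_stats))
    return [stat / lam for stat in chi2_stats]


# Toy, made-up statistics whose median is exactly twice the null median.
toy_stats = [0.2, 0.5, 2 * CHI2_1DF_MEDIAN, 1.5, 3.0]
print(genomic_control_lambda(toy_stats))
print(apply_genomic_control(toy_stats))
```

With only a handful of markers the median is a noisy estimate, which is consistent with the abstract's caution that the method may not correct for structure if too few loci are used.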
Subject(s)
Genetic Markers , Genetic Predisposition to Disease , Genetics, Population , Polymorphism, Single Nucleotide/genetics , Genetic Variation , Humans , Linkage Disequilibrium , Models, Genetic , Quantitative Trait, Heritable
ABSTRACT
Developmental dyslexia is defined as a specific and significant impairment in reading ability that cannot be explained by deficits in intelligence, learning opportunity, motivation or sensory acuity. It is one of the most frequently diagnosed disorders in childhood, representing a major educational and social problem. It is well established that dyslexia is a significantly heritable trait with a neurobiological basis. The etiological mechanisms remain elusive, however, despite being the focus of intensive multidisciplinary research. All attempts to map quantitative-trait loci (QTLs) influencing dyslexia susceptibility have targeted specific chromosomal regions, so that inferences regarding genetic etiology have been made on the basis of very limited information. Here we present the first two complete QTL-based genome-wide scans for this trait, in large samples of families from the United Kingdom and United States. Using single-point analysis, linkage to marker D18S53 was independently identified as being one of the most significant results of the genome in each scan (P ≤ 0.0004 for single word-reading ability in each family sample). Multipoint analysis gave increased evidence of 18p11.2 linkage for single-word reading, yielding top empirical P values of 0.00001 (UK) and 0.0004 (US). Measures related to phonological and orthographic processing also showed linkage at this locus. We replicated linkage to 18p11.2 in a third independent sample of families (from the UK), in which the strongest evidence came from a phoneme-awareness measure (most significant P value = 0.00004). A combined analysis of all UK families confirmed that this newly discovered 18p QTL is probably a general risk factor for dyslexia, influencing several reading-related processes. This is the first report of QTL-based genome-wide scanning for a human cognitive trait.
Subject(s)
Chromosome Mapping/methods , Chromosomes, Human, Pair 18/genetics , Dyslexia/genetics , Quantitative Trait, Heritable , Child , Chromosomes, Human, Pair 6/genetics , Diseases in Twins/genetics , Female , Genetic Heterogeneity , Genetic Linkage , Genetic Markers , Genotype , Humans , Lod Score , Male , Psychological Tests , United Kingdom , United States
ABSTRACT
Atopic or immunoglobulin E (IgE)-mediated diseases include the common disorders of asthma, atopic dermatitis and allergic rhinitis. Chromosome 13q14 shows consistent linkage to atopy and the total serum IgE concentration. We previously identified association between total serum IgE levels and a novel 13q14 microsatellite (USAT24G1; ref. 7) and have now localized the underlying quantitative-trait locus (QTL) in a comprehensive single-nucleotide polymorphism (SNP) map. We found replicated association to IgE levels that was attributed to several alleles in a single gene, PHF11. We also found association with these variants to severe clinical asthma. The gene product (PHF11) contains two PHD zinc fingers and probably regulates transcription. Distinctive splice variants were expressed in immune tissues and cells.
Subject(s)
Asthma/genetics , Chromosomes, Human, Pair 13/genetics , Immunoglobulin E/blood , Quantitative Trait Loci , Adult , Alleles , Alternative Splicing , Case-Control Studies , Child , Female , Haplotypes , Humans , Male , Molecular Sequence Data , Polymorphism, Single Nucleotide , Tissue Distribution , Zinc Fingers/genetics
ABSTRACT
Genetic variation in LRRK2 predisposes to Parkinson disease (PD), which underpins its development as a therapeutic target. Here, we aimed to identify novel genotype-phenotype associations that might support developing LRRK2 therapies for other conditions. We sequenced the 51 exons of LRRK2 in cases comprising 12 common diseases (n = 9,582), and in 4,420 population controls. We identified 739 single-nucleotide variants, 62% of which were observed in only one person, including 316 novel exonic variants. We found evidence of purifying selection for the LRRK2 gene and a trend suggesting that this is more pronounced in the central (ROC-COR-kinase) core protein domains of LRRK2 than the flanking domains. Population genetic analyses revealed that LRRK2 is not especially polymorphic or differentiated in comparison to 201 other drug target genes. Among Europeans, we identified 17 carriers (0.13%) of pathogenic LRRK2 mutations that were not significantly enriched within any disease or in those reporting a family history of PD. Analysis of pathogenic mutations within Europe reveals that the p.Arg1628Pro (c.4883G>C) mutation arose independently in Europe and Asia. Taken together, these findings demonstrate how targeted deep sequencing can help to reveal fundamental characteristics of clinically important loci.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Protein Serine-Threonine Kinases/genetics , Europe , Genetic Predisposition to Disease , Genetics, Population , Humans , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2 , Mutation , Parkinson Disease/genetics , White People/genetics
ABSTRACT
Genes in the major histocompatibility complex (MHC) encode proteins important in activating antigen-specific immune responses. Alleles at adjacent MHC loci are often in strong linkage disequilibrium; however, little is known about the mechanisms responsible for this linkage disequilibrium. Here we report that the human MHC HLA-DR2 haplotype, which predisposes to multiple sclerosis, shows more extensive linkage disequilibrium than other common caucasian HLA haplotypes in the DR region and thus seems likely to have been maintained through positive selection. Characterization of two multiple-sclerosis-associated HLA-DR alleles at separate loci by a functional assay in humanized mice indicates that the linkage disequilibrium between the two alleles may be due to a functional epistatic interaction, whereby one allele modifies the T-cell response activated by the second allele through activation-induced cell death. This functional epistasis is associated with a milder form of multiple-sclerosis-like disease. Such epistatic interaction might prove to be an important general mechanism for modifying exuberant immune responses that are deleterious to the host and could also help to explain the strong linkage disequilibrium in this and perhaps other HLA haplotypes.
Subject(s)
Epistasis, Genetic , HLA-DR2 Antigen/genetics , Haplotypes/genetics , Multiple Sclerosis/genetics , Alleles , Animals , CD4-Positive T-Lymphocytes/immunology , Disease Models, Animal , Encephalomyelitis, Autoimmune, Experimental/genetics , Encephalomyelitis, Autoimmune, Experimental/pathology , Humans , Linkage Disequilibrium/genetics , Mice , Multiple Sclerosis/pathology
ABSTRACT
Laboratory safety data are routinely collected in clinical studies for safety monitoring and assessment. We have developed a truncated robust multivariate outlier detection method for identifying subjects with clinically relevant abnormal laboratory measurements. The proposed method can be applied to historical clinical data to establish a multivariate decision boundary that can then be used for future clinical trial laboratory safety data monitoring and assessment. Simulations demonstrate that the proposed method has the ability to detect relevant outliers while automatically excluding irrelevant outliers. Two examples from actual clinical studies are used to illustrate the use of this method for identifying clinically relevant outliers.
Subject(s)
Clinical Trials as Topic/statistics & numerical data , Data Interpretation, Statistical , Drug Monitoring/statistics & numerical data , Models, Biological , Models, Statistical , Multivariate Analysis , Biomarkers/blood , Computer Simulation , Drug-Related Side Effects and Adverse Reactions , Humans , Lipoproteins, LDL/blood , Liver Function Tests , Safety/statistics & numerical data , Triglycerides/blood
ABSTRACT
Significant allele flipping, where associations for the same disease occur at opposite alleles of the same bi-allelic locus, is increasing. But when is a significant allele flip genuine? We address the statistical issues of claiming and observing genuine allele flips in actual samples. We show that unless an allele flip is genuine, the probability of observing a significant allele flip in samples ascertained similarly from a common population is negligible. We derive expressions for the expected values of commonly used measures of association, which confirm previous findings that the underlying mechanism of a genuine allele flip is variation in the haplotype frequencies and show further how this variation interacts with variation in the genetic effects to impact allele flipping. We show that for association testing at proxy SNPs, common in genome-wide association studies, variation in haplotype frequencies must coincide with a reversal in the sign of linkage disequilibrium (LD) to trigger genuine allele flips. Using HapMap data and r, rather than r², to highlight previously unobserved effects, we show that unless genetic effects are large, variation in LD is unlikely to cause genuine allele flips in samples drawn from the same population. However, as populations diverge, it is an increasingly viable cause of a genuine allele flip for sufficiently large genetic effect and/or sample sizes. We conclude that evidence of variation in local patterns of LD, ancestral composition of study samples, and environmental exposures between study populations can provide compelling practical evidence in defense of a genuine allele flip.
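To make the distinction between r and its square concrete, here is a minimal sketch that computes the signed LD correlation from haplotype and allele frequencies using the standard definitions D = p_AB - p_A p_B and r = D / sqrt(p_A (1 - p_A) p_B (1 - p_B)). The function name and the two sets of frequencies are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def ld_r(p_ab, p_a, p_b):
    """Signed LD correlation between allele A (locus 1) and allele B (locus 2).

    p_ab is the frequency of the A-B haplotype; p_a and p_b are allele frequencies.
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    return d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


# Two hypothetical populations with identical allele frequencies
# but different A-B haplotype frequencies.
r_pop1 = ld_r(0.40, 0.5, 0.5)
r_pop2 = ld_r(0.10, 0.5, 0.5)

print(f"population 1: r = {r_pop1:+.2f}, r^2 = {r_pop1 ** 2:.2f}")
print(f"population 2: r = {r_pop2:+.2f}, r^2 = {r_pop2 ** 2:.2f}")
```

In this toy case the squared value is the same in both populations while the sign of r is reversed, which is the kind of LD reversal that the squared measure alone cannot show.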
Subject(s)
Alleles , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Gene Frequency , Genetic Predisposition to Disease , Haplotypes , Humans , Linkage Disequilibrium , Models, Genetic , Molecular Epidemiology/methods , Probability
ABSTRACT
Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousands of individuals. Large-scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of genetic and phenotypic information, aim to facilitate discovery of modest-effect genes by making genome-wide data publicly available, allowing information to be combined for the purpose of pooled analysis. In principle, disease or control samples from these studies could be used to increase the power of any GWA study via judicious use as "genetically matched controls" for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false-positive error rate in the presence of population structure. As a remedy, we make use of genome-wide data and model selection techniques to identify "axes" of genetic variation which are associated with disease. These axes are then included as covariates in association analysis to correct for population structure, which can result in increases in power over standard analysis of genetic information from the samples in the original GWA study.