Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 18 de 18
Filter
1.
Nature ; 631(8021): 583-592, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that substantially affect function provide insights into the biology of a gene1-3. However, ascertaining the frequency of such variants requires large sample sizes4-8. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.


Subject(s)
Exome , Genetic Variation , Proteins , Humans , Alleles , Exome/genetics , Exome Sequencing , Gene Frequency , Genetic Variation/genetics , Heterozygote , Loss of Function Mutation/genetics , Mutation, Missense/genetics , Open Reading Frames/genetics , Proteins/genetics , RNA Splice Sites/genetics , Precision Medicine
2.
Am J Hum Genet ; 111(10): 2139-2149, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39366334

ABSTRACT

Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores are calculated and tested across a range of annotation classes and frequency bins. Correlation between these tests can complicate the multiple testing correction and hamper interpretation of the results. We introduce a method called the sparse burden association test (SBAT) that tests the joint set of burden scores under the assumption that causal burden scores act in the same effect direction. The method simultaneously assesses the significance of the model fit and selects the set of burden scores that best explain the association at the same time. Using simulated data, we show that the method is well calibrated and highlight scenarios where the test outperforms existing gene-based tests. We apply the method to 73 quantitative traits from the UK Biobank, showing that SBAT is a valuable additional gene-based test when combined with other existing approaches. This test is implemented in the REGENIE software.


Subject(s)
Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Least-Squares Analysis , Software , Models, Genetic , Exome/genetics , Genetic Variation , Computer Simulation
3.
Nature ; 599(7886): 628-634, 2021 11.
Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing1 to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study2. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10-11. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.


Subject(s)
Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom
4.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world1. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subject(s)
Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics
5.
Nature ; 570(7759): 71-76, 2019 06.
Article in English | MEDLINE | ID: mdl-31118516

ABSTRACT

Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10-3) and candidate genes from knockout mice (P = 5.2 × 10-3). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000-185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Exome Sequencing , Exome/genetics , Animals , Case-Control Studies , Decision Support Techniques , Female , Gene Frequency , Genome-Wide Association Study , Humans , Male , Mice , Mice, Knockout
6.
Am J Hum Genet ; 108(7): 1350-1355, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34115965

ABSTRACT

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.


Subject(s)
COVID-19/diagnosis , COVID-19/genetics , Exome Sequencing , Exome/genetics , Genetic Predisposition to Disease , Hospitalization/statistics & numerical data , COVID-19/immunology , COVID-19/therapy , Female , Humans , Interferons/genetics , Male , Prognosis , SARS-CoV-2 , Sample Size
7.
Genet Epidemiol ; 45(6): 664-681, 2021 09.
Article in English | MEDLINE | ID: mdl-34184762

ABSTRACT

Serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) are biomarkers for liver health. Here we report the largest genome-wide association analysis to date of serum ALT and AST levels in over 388k people of European ancestry from UK biobank and DiscovEHR. Eleven million imputed markers with a minor allele frequency (MAF) ≥ 0.5% were analyzed. Overall, 300 ALT and 336 AST independent genome-wide significant associations were identified. Among them, 81 ALT and 61 AST associations are reported for the first time. Genome-wide interaction study identified 9 ALT and 12 AST independent associations significantly modified by body mass index (BMI), including several previously reported potential liver disease therapeutic targets, for example, PNPLA3, HSD17B13, and MARC1. While further work is necessary to understand the effect of ALT and AST-associated variants on liver disease, the weighted burden of significant BMI-modified signals is significantly associated with liver disease outcomes. In summary, this study identifies genetic associations which offer an important step forward in understanding the genetic architecture of serum ALT and AST levels. Significant interactions between BMI and genetic loci not only highlight the important role of adiposity in liver damage but also shed light on the genetic etiology of liver disease in obese individuals.


Subject(s)
Alanine Transaminase/blood , Aspartate Aminotransferases/blood , Body Mass Index , Genome-Wide Association Study , Humans
8.
Am J Hum Genet ; 102(1): 103-115, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29290336

ABSTRACT

Atrial fibrillation (AF) is a common cardiac arrhythmia and a major risk factor for stroke, heart failure, and premature death. The pathogenesis of AF remains poorly understood, which contributes to the current lack of highly effective treatments. To understand the genetic variation and biology underlying AF, we undertook a genome-wide association study (GWAS) of 6,337 AF individuals and 61,607 AF-free individuals from Norway, including replication in an additional 30,679 AF individuals and 278,895 AF-free individuals. Through genotyping and dense imputation mapping from whole-genome sequencing, we tested almost nine million genetic variants across the genome and identified seven risk loci, including two novel loci. One novel locus (lead single-nucleotide variant [SNV] rs12614435; p = 6.76 × 10-18) comprised intronic and several highly correlated missense variants situated in the I-, A-, and M-bands of titin, which is the largest protein in humans and responsible for the passive elasticity of heart and skeletal muscle. The other novel locus (lead SNV rs56202902; p = 1.54 × 10-11) covered a large, gene-dense chromosome 1 region that has previously been linked to cardiac conduction. Pathway and functional enrichment analyses suggested that many AF-associated genetic variants act through a mechanism of impaired muscle cell differentiation and tissue formation during fetal heart development.


Subject(s)
Atrial Fibrillation/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Heart/embryology , Regulatory Sequences, Nucleic Acid/genetics , Humans , Inheritance Patterns/genetics , Multifactorial Inheritance/genetics , Organ Specificity/genetics , Physical Chromosome Mapping , Quantitative Trait Loci/genetics , Reproducibility of Results , Risk Factors
9.
Genome Res ; 28(7): 1039-1052, 2018 07.
Article in English | MEDLINE | ID: mdl-29773658

ABSTRACT

Current approaches to detect and characterize mosaic chromosomal aneuploidy are limited by sensitivity, efficiency, cost, or the need to culture cells. We describe the mosaic aneuploidy detection by massively parallel sequencing (MAD-seq) capture assay and the MADSEQ analytical approach that allow low (<10%) levels of mosaicism for chromosomal aneuploidy or regional loss of heterozygosity to be detected, assigned to a meiotic or mitotic origin, and quantified as a proportion of the cells in the sample. We show results from a multi-ethnic MAD-seq (meMAD-seq) capture design that works equally well in populations of diverse racial and ethnic origins and how the MADSEQ analytical approach can be applied to exome or whole-genome sequencing data, revealing previously unrecognized aneuploidy or copy number neutral loss of heterozygosity in samples studied by the 1000 Genomes Project, cell lines from public repositories, and one of the Illumina Platinum Genomes samples. We have made the meMAD-seq capture design and MADSEQ analytical software open for unrestricted use, with the goal that they can be applied in clinical samples to allow new insights into the unrecognized prevalence of mosaic chromosomal aneuploidy in humans and its phenotypic associations.


Subject(s)
Chromosomes/genetics , High-Throughput Nucleotide Sequencing/methods , Aneuploidy , Exome/genetics , Female , Genome/genetics , Humans , Male , Mosaicism , Software
10.
Nat Genet ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39322778

ABSTRACT

Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yield. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically results in savings of lab and computational time and resources expended per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes rather than contemporary short-read WGS alternatives will maximize the number of discoveries in genetic association studies.

11.
Nat Genet ; 56(8): 1592-1596, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39103650

ABSTRACT

Coronavirus disease 2019 (COVID-19) and influenza are respiratory illnesses caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses, respectively. Both diseases share symptoms and clinical risk factors1, but the extent to which these conditions have a common genetic etiology is unknown. This is partly because host genetic risk factors are well characterized for COVID-19 but not for influenza, with the largest published genome-wide association studies for these conditions including >2 million individuals2 and about 1,000 individuals3-6, respectively. Shared genetic risk factors could point to targets to prevent or treat both infections. Through a genetic study of 18,334 cases with a positive test for influenza and 276,295 controls, we show that published COVID-19 risk variants are not associated with influenza. Furthermore, we discovered and replicated an association between influenza infection and noncoding variants in B3GALT5 and ST6GAL1, neither of which was associated with COVID-19. In vitro small interfering RNA knockdown of ST6GAL1-an enzyme that adds sialic acid to the cell surface, which is used for viral entry-reduced influenza infectivity by 57%. These results mirror the observation that variants that downregulate ACE2, the SARS-CoV-2 receptor, protect against COVID-19 (ref. 7). Collectively, these findings highlight downregulation of key cell surface receptors used for viral entry as treatment opportunities to prevent COVID-19 and influenza.


Subject(s)
COVID-19 , Genetic Predisposition to Disease , Genome-Wide Association Study , Influenza, Human , SARS-CoV-2 , Humans , Influenza, Human/genetics , Influenza, Human/epidemiology , Influenza, Human/virology , COVID-19/genetics , COVID-19/virology , Risk Factors , SARS-CoV-2/genetics , Male , Female , Polymorphism, Single Nucleotide , Case-Control Studies , Middle Aged
12.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37214792

ABSTRACT

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.

13.
Nat Genet ; 54(4): 382-392, 2022 04.
Article in English | MEDLINE | ID: mdl-35241825

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters human host cells via angiotensin-converting enzyme 2 (ACE2) and causes coronavirus disease 2019 (COVID-19). Here, through a genome-wide association study, we identify a variant (rs190509934, minor allele frequency 0.2-2%) that downregulates ACE2 expression by 37% (P = 2.7 × 10-8) and reduces the risk of SARS-CoV-2 infection by 40% (odds ratio = 0.60, P = 4.5 × 10-13), providing human genetic evidence that ACE2 expression levels influence COVID-19 risk. We also replicate the associations of six previously reported risk variants, of which four were further associated with worse outcomes in individuals infected with the virus (in/near LZTFL1, MHC, DPP9 and IFNAR2). Lastly, we show that common variants define a risk score that is strongly associated with severe disease among cases and modestly improves the prediction of disease severity relative to demographic and clinical factors alone.


Subject(s)
COVID-19 , Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genome-Wide Association Study , Humans , Risk Factors , SARS-CoV-2/genetics
14.
HGG Adv ; 2(3): 100039, 2021 Jul 08.
Article in English | MEDLINE | ID: mdl-35047837

ABSTRACT

Parent-of-origin (PoO) effects refer to the differential phenotypic impacts of genetic variants dependent on their parental inheritance due to imprinting. While PoO effects can influence complex traits, they may be poorly captured by models that do not differentiate the parental origin of the variant. The aim of this study was to conduct a genome-wide screen for PoO effects on a broad range of clinical traits derived from electronic health records (EHR) in the DiscovEHR study enriched with familial relationships. Using pairwise kinship estimates from genetic data and demographic data, we identified 22,051 offspring among 134,049 individuals in the DiscovEHR study. PoO of ~9 million variants was assigned in the offspring by comparing offspring and parental genotypes and haplotypes. We then performed genome-wide PoO association analyses across 154 quantitative and 611 binary traits extracted from EHR. Of the 732 significant PoO associations identified (p < 5 × 10-8), we attempted to replicate 274 PoO associations in the UK Biobank study with 5,015 offspring and replicated 9 PoO associations (p < 0.05). In summary, our study implements a bioinformatic and statistical approach to examine PoO effects genome-wide in a large population study enriched with familial relationships and systematically characterizes PoO effects on hundreds of clinical traits derived from EHR. Our results suggest that, while the statistical power to detect PoO effects remains modest yet, accurately modeling PoO effects has the potential to find new associations that may have been missed by the standard additive model, further enhancing the mechanistic understanding of genetic influence on complex traits.

15.
Nat Genet ; 53(7): 1097-1103, 2021 07.
Article in English | MEDLINE | ID: mdl-34017140

ABSTRACT

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.


Subject(s)
Computational Biology , Genome-Wide Association Study , Genomics , Case-Control Studies , Computational Biology/methods , Genome-Wide Association Study/methods , Genomics/methods , Genotype , Humans , Logistic Models , Machine Learning , Phenotype , Reproducibility of Results
16.
Nat Commun ; 9(1): 2252, 2018 06 13.
Article in English | MEDLINE | ID: mdl-29899519

ABSTRACT

Angiopoietin-like 4 (ANGPTL4) is an endogenous inhibitor of lipoprotein lipase that modulates lipid levels, coronary atherosclerosis risk, and nutrient partitioning. We hypothesize that loss of ANGPTL4 function might improve glucose homeostasis and decrease risk of type 2 diabetes (T2D). We investigate protein-altering variants in ANGPTL4 among 58,124 participants in the DiscovEHR human genetics study, with follow-up studies in 82,766 T2D cases and 498,761 controls. Carriers of p.E40K, a variant that abolishes ANGPTL4 ability to inhibit lipoprotein lipase, have lower odds of T2D (odds ratio 0.89, 95% confidence interval 0.85-0.92, p = 6.3 × 10-10), lower fasting glucose, and greater insulin sensitivity. Predicted loss-of-function variants are associated with lower odds of T2D among 32,015 cases and 84,006 controls (odds ratio 0.71, 95% confidence interval 0.49-0.99, p = 0.041). Functional studies in Angptl4-deficient mice confirm improved insulin sensitivity and glucose homeostasis. In conclusion, genetic inactivation of ANGPTL4 is associated with improved glucose homeostasis and reduced risk of T2D.


Subject(s)
Angiopoietin-Like Protein 4/deficiency , Angiopoietin-Like Protein 4/genetics , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Amino Acid Substitution , Angiopoietin-Like Protein 4/metabolism , Animals , Blood Glucose/metabolism , Case-Control Studies , Diabetes Mellitus, Type 2/etiology , Female , Gene Silencing , Genetic Association Studies , Genetic Variation , Heterozygote , Homeostasis , Humans , Insulin Resistance/genetics , Lipoprotein Lipase/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , Risk Factors , Exome Sequencing
17.
Nat Genet ; 48(6): 593-9, 2016 06.
Article in English | MEDLINE | ID: mdl-27111036

ABSTRACT

We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.


Subject(s)
Chromosomes, Human, Y , Demography , Haplotypes , Humans , Male , Mutation , Phylogeny , Polymorphism, Single Nucleotide
18.
Science ; 354(6319)2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28008009

ABSTRACT

The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System couples high-throughput sequencing to an integrated health care system using longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in a loss of gene function. Linking these data to EHR-derived clinical phenotypes, we find clinical associations supporting therapeutic targets, including genes encoding drug targets for lipid lowering, and identify previously unidentified rare alleles associated with lipid levels and other blood level traits. About 3.5% of individuals harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data set provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery.


Subject(s)
Delivery of Health Care, Integrated , Disease/genetics , Electronic Health Records , Exome/genetics , High-Throughput Nucleotide Sequencing , Adult , Drug Design , Gene Frequency , Genomics , Humans , Hypolipidemic Agents/pharmacology , INDEL Mutation , Lipids/blood , Molecular Targeted Therapy , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
SELECTION OF CITATIONS
SEARCH DETAIL