Results 1 - 20 of 236
1.
Cell ; 185(16): 3041-3055.e25, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.


Subject(s)
DNA Copy Number Variations , Genome, Human , DNA Copy Number Variations/genetics , Gene Dosage , Haploinsufficiency/genetics , Humans
2.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
3.
Nature ; 614(7948): 492-499, 2023 02.
Article in English | MEDLINE | ID: mdl-36755099

ABSTRACT

Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes. Rare coding variants (allele frequency < 1 × 10⁻³) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average, much less than common variants, and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10⁻⁵). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.


Subject(s)
Exome , Gene Frequency , Genetic Variation , Multifactorial Inheritance , Humans , Exome/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Risk Factors , United Kingdom , Genetic Loci/genetics , Schizophrenia/genetics , Bipolar Disorder/genetics
4.
Nature ; 620(7975): 839-848, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37587338

ABSTRACT

Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in-trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.


Subject(s)
Cell Nucleus , DNA Copy Number Variations , DNA, Mitochondrial , Heteroplasmy , Mitochondria , Aged , Humans , DNA Copy Number Variations/genetics , DNA, Mitochondrial/genetics , Genome-Wide Association Study , Heteroplasmy/genetics , Mitochondria/genetics , Cell Nucleus/genetics , Alleles , Polymorphism, Single Nucleotide , INDEL Mutation , G-Quadruplexes
5.
Am J Hum Genet ; 111(10): 2129-2138, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39270648

ABSTRACT

Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries and the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on the genotype. This method provides a convenient tool for estimating HWE in the presence of population structure and missing self-reported race and ethnicities in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method on the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.


Subject(s)
Supervised Machine Learning , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genome, Human , Genetics, Population/methods , Ethnicity/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genotype
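
The subset-testing idea in the abstract above (testing HWE within genetically homogeneous groups instead of on the pooled dataset) can be illustrated with a minimal sketch. It assumes a biallelic variant, genotypes coded as alternate-allele counts (0, 1, 2), group labels that are already known, and a textbook 1-df chi-square test; the function names `hwe_chi2` and `hwe_by_group` are hypothetical, and this is not the authors' semi-supervised method or their aggregation rule, neither of which the abstract specifies.

```python
import math
from collections import defaultdict


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium at one biallelic site.

    Returns (chi2, p_value). Hypothetical helper for illustration only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 0.0, 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def hwe_by_group(genotypes, groups):
    """Run the HWE test separately within each (assumed known) ancestry group.

    genotypes: alternate-allele counts (0, 1, 2) or None for missing calls.
    groups: one group label per sample, in the same order.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for g, label in zip(genotypes, groups):
        if g is not None:
            counts[label][g] += 1
    return {label: hwe_chi2(*c) for label, c in counts.items()}


# Two groups, each in HWE but with very different allele frequencies.
genotypes = [0] * 81 + [1] * 18 + [2] * 1 + [0] * 1 + [1] * 18 + [2] * 81
groups = ["A"] * 100 + ["B"] * 100

pooled = hwe_chi2(82, 36, 82)          # pooled samples look far from HWE
per_group = hwe_by_group(genotypes, groups)  # each group is consistent with HWE
```

In this toy input the pooled test shows a large heterozygote deficit that comes only from mixing the two groups, while each group on its own shows no deviation. How per-group results should be combined at each variant is left open here.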
6.
Genome Res ; 34(5): 796-809, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.


Subject(s)
Databases, Genetic , Genome, Human , Humans , Human Genome Project , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genomics/methods
7.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.


Subject(s)
DNA , Trout , Humans , Animals , Sequence Analysis, DNA/methods , Genotype , Homozygote , High-Throughput Nucleotide Sequencing/methods , Software
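
The reasoning behind the metric in the abstract above can be sketched under a simple model: at a truly homozygous-alternate site, any reference reads come from contaminating DNA, and a contaminating read carries the reference allele with probability roughly equal to the population reference-allele frequency. The sketch below is an assumption-laden illustration of that reasoning, not the published CHARR estimator (the abstract gives no formula); the function name, input layout, and frequency thresholds are all hypothetical.

```python
def ref_read_contamination(sites, min_ref_af=0.1, max_ref_af=0.9):
    """Rough contamination estimate from reference reads at hom-alt calls.

    sites: iterable of (ref_reads, alt_reads, ref_af) tuples, one per
           homozygous-alternate genotype call, where ref_af is the
           population frequency of the reference allele.
    min_ref_af, max_ref_af: hypothetical bounds that drop sites where the
           reference allele is too rare or too common to be informative.
    """
    ratios = []
    for ref_reads, alt_reads, ref_af in sites:
        depth = ref_reads + alt_reads
        if depth == 0 or not (min_ref_af <= ref_af <= max_ref_af):
            continue
        # Observed reference-read fraction, scaled by the chance that a
        # contaminating read carries the reference allele.
        ratios.append((ref_reads / depth) / ref_af)
    if not ratios:
        raise ValueError("no usable homozygous-alternate sites")
    return sum(ratios) / len(ratios)


# Toy input: two hom-alt calls at sites with reference-allele frequency 0.5.
estimate = ref_read_contamination([(1, 99, 0.5), (2, 98, 0.5)])
```

Only per-variant allele depths and an allele-frequency annotation are needed for this kind of calculation, which is consistent with the abstract's point that the metric can be computed from variant-level files without the original reads.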
8.
Genome Res ; 33(6): 999-1005, 2023 06.
Article in English | MEDLINE | ID: mdl-37253541

ABSTRACT

Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.


Subject(s)
Exome , Genetics, Population , Genotype , Heterozygote , Phenotype , Polymorphism, Single Nucleotide
9.
Nature ; 583(7814): 83-89, 2020 07.
Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Subject(s)
Genetic Variation , Genome, Human/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Epigenesis, Genetic , Female , Gene Dosage/genetics , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software
10.
Nature ; 586(7831): 769-775, 2020 10.
Article in English | MEDLINE | ID: mdl-33057200

ABSTRACT

Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10⁻⁸), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.


Subject(s)
Genetic Predisposition to Disease/genetics , Hematopoietic Stem Cells/pathology , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Cell Lineage/genetics , Cell Self Renewal , Checkpoint Kinase 2/genetics , Female , Humans , Leukocytes/pathology , Male , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Risk , Telomere Homeostasis
11.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA and can have profound consequences in evolution and human disease. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral in the interpretation of single-nucleotide variants (SNVs). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings. This SV resource is freely distributed via the gnomAD browser and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subject(s)
Disease/genetics , Genetic Variation , Genetics, Medical/standards , Genetics, Population/standards , Genome, Human/genetics , Female , Genetic Testing , Genotyping Techniques , Humans , Male , Middle Aged , Mutation , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Reference Standards , Selection, Genetic , Whole Genome Sequencing
12.
Am J Hum Genet ; 109(12): 2110-2125, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36400022

ABSTRACT

The use of population descriptors such as race, ethnicity, and ancestry in science, medicine, and public health has a long, complicated, and at times dark history, particularly for genetics, given the field's perceived importance for understanding between-group differences. The historical and potential harms that come with irresponsible use of these categories suggest a clear need for definitive guidance about when and how they can be used appropriately. However, while many prior authors have provided such guidance, no established consensus exists, and the extant literature has not been examined for implied consensus and sources of disagreement. Here, we present the results of a scoping review of published normative recommendations regarding the use of population categories, particularly in genetics research. Following PRISMA guidelines, we extracted recommendations from n = 121 articles matching inclusion criteria. Articles were published consistently throughout the time period examined and in a broad range of journals, demonstrating an ongoing and interdisciplinary perceived need for guidance. Examined recommendations fall under one of eight themes identified during analysis. Seven are characterized by broad agreement across articles; one, "appropriate definitions of population categories and contexts for use," revealed substantial fundamental disagreement among articles. Additionally, while many articles focus on the inappropriate use of race, none fundamentally problematize ancestry. This work can be a resource to researchers looking for normative guidance on the use of population descriptors and can orient authors of future guidelines to this complex field, thereby contributing to the development of more effective future guidelines for genetics research.


Subject(s)
Ethnicity , Problem Behavior , Humans , Asian People , Consensus , Ethnicity/genetics , Research Personnel
13.
Am J Hum Genet ; 109(9): 1667-1679, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055213

ABSTRACT

African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence for the rate of language transmission in matrilineal groups having been higher than that for patrilineal ones. We highlight both the diversity of variation within Africa as well as how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.


Subject(s)
Genetic Variation , Genetics, Population , Africa, Southern , Black People/genetics , Genetic Structures , Genetic Variation/genetics , Humans
16.
Am J Hum Genet ; 108(4): 656-668, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.


Subject(s)
DNA Mutational Analysis/economics , DNA Mutational Analysis/standards , Genetic Variation/genetics , Genetics, Population/economics , Africa , DNA Mutational Analysis/methods , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study , Health Equity , Humans , Microbiota , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards
17.
N Engl J Med ; 385(1): 78-86, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34192436

ABSTRACT

Companies have recently begun to sell a new service to patients considering in vitro fertilization: embryo selection based on polygenic scores (ESPS). These scores represent individualized predictions of health and other outcomes derived from genomewide association studies in adults to partially predict these outcomes. This article includes a discussion of many factors that lower the predictive power of polygenic scores in the context of embryo selection and quantifies these effects for a variety of clinical and nonclinical traits. Also discussed are potential unintended consequences of ESPS (including selecting for adverse traits, altering population demographics, exacerbating inequalities in society, and devaluing certain traits). Recommendations for the responsible communication about ESPS by practitioners are provided, and a call for a society-wide conversation about this technology is made. (Funded by the National Institute on Aging and others.).


Subject(s)
Embryo, Mammalian , Fertilization in Vitro , Genetic Testing , Genetic Variation , Multifactorial Inheritance/genetics , Phenotype , Preimplantation Diagnosis , Educational Status , Gene-Environment Interaction , Genome-Wide Association Study , Humans , Predictive Value of Tests
18.
Hum Mol Genet ; 30(16): 1521-1534, 2021 07 28.
Article in English | MEDLINE | ID: mdl-33987664

ABSTRACT

It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogenous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association studies samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively. Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.


Subject(s)
Genetics, Population , Genome-Wide Association Study/statistics & numerical data , Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , Genotyping Techniques/statistics & numerical data , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable
19.
Am J Hum Genet ; 107(1): 46-59, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32470373

ABSTRACT

In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.


Subject(s)
Multifactorial Inheritance/genetics , Aged , Cohort Studies , Diabetes Mellitus, Type 2/genetics , Female , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium/genetics , Male , Middle Aged , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
20.
Annu Rev Neurosci ; 38: 47-68, 2015 Jul 08.
Article in English | MEDLINE | ID: mdl-25840007

ABSTRACT

Next-generation sequencing, which allows genome-wide detection of rare and de novo mutations, is transforming neuropsychiatric disease genetics through identifying on an unprecedented scale genes and protein-coding mutations that confer risk. Although understanding how regulatory variants influence risk remains a challenge, we are likely transitioning into a phase of neuropsychiatric disease genetics in which the rate-limiting step may no longer be gene discovery. Instead, the future will concentrate more on the biological and clinical translation of the torrent of specific risk mutations identified through next-generation sequencing. Here, we review the recent progress that resulted specifically from exome sequencing and emphasize the need for rigorous statistical evaluation of the expanding data sets, as well as expanded functional analysis of implicated proteins and mutations. Then, we introduce some of the expected opportunities and challenges investigators face when moving beyond the exome. Finally, we briefly highlight the challenge of deriving translational benefit from the progress in genetics.


Subject(s)
Exome/genetics , Genetic Predisposition to Disease/genetics , Mental Disorders/genetics , Nervous System Diseases/genetics , Genome-Wide Association Study/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation