Search | VHL CLAP/WR-PAHO/WHO

1.

A cross-disorder dosage sensitivity map of the human genome.

Collins, Ryan L; Glessner, Joseph T; Porcu, Eleonora; Lepamets, Maarja; Brandon, Rhonda; Lauricella, Christopher; Han, Lide; Morley, Theodore; Niestroj, Lisa-Marie; Ulirsch, Jacob; Everett, Selin; Howrigan, Daniel P; Boone, Philip M; Fu, Jack; Karczewski, Konrad J; Kellaris, Georgios; Lowther, Chelsea; Lucente, Diane; Mohajeri, Kiana; Nõukas, Margit; Nuttle, Xander; Samocha, Kaitlin E; Trinh, Mi; Ullah, Farid; Võsa, Urmo; Hurles, Matthew E; Aradhya, Swaroop; Davis, Erica E; Finucane, Hilary; Gusella, James F; Janze, Aura; Katsanis, Nicholas; Matyakhina, Ludmila; Neale, Benjamin M; Sanders, David; Warren, Stephanie; Hodge, Jennelle C; Lal, Dennis; Ruderfer, Douglas M; Meck, Jeanne; Mägi, Reedik; Esko, Tõnu; Reymond, Alexandre; Kutalik, Zoltán; Hakonarson, Hakon; Sunyaev, Shamil; Brand, Harrison; Talkowski, Michael E.

Cell ; 185(16): 3041-3055.e25, 2022 08 04.

Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.

Subject(s)

DNA Copy Number Variations , Genome, Human , DNA Copy Number Variations/genetics , Gene Dosage , Haploinsufficiency/genetics , Humans

2.

A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 625(7993): 92-100, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.

Subject(s)

Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic

3.

Polygenic architecture of rare coding variation across 394,783 exomes.

Weiner, Daniel J; Nadig, Ajay; Jagadeesh, Karthik A; Dey, Kushal K; Neale, Benjamin M; Robinson, Elise B; Karczewski, Konrad J; O'Connor, Luke J.

Nature ; 614(7948): 492-499, 2023 02.

Article in English | MEDLINE | ID: mdl-36755099

ABSTRACT

Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes [1-3]. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear [4]. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes [5]. Rare coding variants (allele frequency < 1 × 10⁻³) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average, much less than common variants, and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10⁻⁵). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder [6,7] is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.

Subject(s)

Exome , Gene Frequency , Genetic Variation , Multifactorial Inheritance , Humans , Exome/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Risk Factors , United Kingdom , Genetic Loci/genetics , Schizophrenia/genetics , Bipolar Disorder/genetics

4.

Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Gupta, Rahul; Kanai, Masahiro; Durham, Timothy J; Tsuo, Kristin; McCoy, Jason G; Kotrys, Anna V; Zhou, Wei; Chinnery, Patrick F; Karczewski, Konrad J; Calvo, Sarah E; Neale, Benjamin M; Mootha, Vamsi K.

Nature ; 620(7975): 839-848, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37587338

ABSTRACT

Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation [1]. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching [2,3]. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in-trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.

Subject(s)

Cell Nucleus , DNA Copy Number Variations , DNA, Mitochondrial , Heteroplasmy , Mitochondria , Aged , Humans , DNA Copy Number Variations/genetics , DNA, Mitochondrial/genetics , Genome-Wide Association Study , Heteroplasmy/genetics , Mitochondria/genetics , Cell Nucleus/genetics , Alleles , Polymorphism, Single Nucleotide , INDEL Mutation , G-Quadruplexes

5.

A harmonized public resource of deeply sequenced diverse human genomes.

Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R.

Genome Res ; 34(5): 796-809, 2024 06 25.

Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

Subject(s)

Databases, Genetic , Genome, Human , Humans , Human Genome Project , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genomics/methods

6.

Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries.

Talkowski, Michael E; Rosenfeld, Jill A; Blumenthal, Ian; Pillalamarri, Vamsee; Chiang, Colby; Heilbut, Adrian; Ernst, Carl; Hanscom, Carrie; Rossin, Elizabeth; Lindgren, Amelia M; Pereira, Shahrin; Ruderfer, Douglas; Kirby, Andrew; Ripke, Stephan; Harris, David J; Lee, Ji-Hyun; Ha, Kyungsoo; Kim, Hyung-Goo; Solomon, Benjamin D; Gropman, Andrea L; Lucente, Diane; Sims, Katherine; Ohsumi, Toshiro K; Borowsky, Mark L; Loranger, Stephanie; Quade, Bradley; Lage, Kasper; Miles, Judith; Wu, Bai-Lin; Shen, Yiping; Neale, Benjamin; Shaffer, Lisa G; Daly, Mark J; Morton, Cynthia C; Gusella, James F.

Cell ; 149(3): 525-37, 2012 Apr 27.

Article in English | MEDLINE | ID: mdl-22521361

ABSTRACT

Balanced chromosomal abnormalities (BCAs) represent a relatively untapped reservoir of single-gene disruptions in neurodevelopmental disorders (NDDs). We sequenced BCAs in patients with autism or related NDDs, revealing disruption of 33 loci in four general categories: (1) genes previously associated with abnormal neurodevelopment (e.g., AUTS2, FOXP1, and CDKL5), (2) single-gene contributors to microdeletion syndromes (MBD5, SATB2, EHMT1, and SNURF-SNRPN), (3) novel risk loci (e.g., CHD8, KIRREL3, and ZNF507), and (4) genes associated with later-onset psychiatric disorders (e.g., TCF4, ZNF804A, PDE10A, GRIN2B, and ANK3). We also discovered among neurodevelopmental cases a profoundly increased burden of copy-number variants from these 33 loci and a significant enrichment of polygenic risk alleles from genome-wide association studies of autism and schizophrenia. Our findings suggest a polygenic risk model of autism and reveal that some neurodevelopmental genes are sensitive to perturbation by multiple mutational mechanisms, leading to variable phenotypic outcomes that manifest at different life stages.

Subject(s)

Child Development Disorders, Pervasive/genetics , Chromosome Aberrations , Autistic Disorder/diagnosis , Autistic Disorder/genetics , Child , Child Development Disorders, Pervasive/diagnosis , Chromosome Breakage , Chromosome Deletion , DNA Copy Number Variations , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Nervous System/growth & development , Schizophrenia/genetics , Sequence Analysis, DNA , Signal Transduction

7.

CHARR efficiently estimates contamination from DNA sequencing data.

Lu, Wenhan; Gauthier, Laura D; Poterba, Timothy; Giacopuzzi, Edoardo; Goodrich, Julia K; Stevens, Christine R; King, Daniel; Daly, Mark J; Neale, Benjamin M; Karczewski, Konrad J.

Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.

Subject(s)

DNA , Trout , Humans , Animals , Sequence Analysis, DNA/methods , Genotype , Homozygote , High-Throughput Nucleotide Sequencing/methods , Software

8.

Discordant calls across genotype discovery approaches elucidate variants with systematic errors.

Atkinson, Elizabeth G; Artomov, Mykyta; Loboda, Alexander A; Rehm, Heidi L; MacArthur, Daniel G; Karczewski, Konrad J; Neale, Benjamin M; Daly, Mark J.

Genome Res ; 33(6): 999-1005, 2023 06.

Article in English | MEDLINE | ID: mdl-37253541

ABSTRACT

Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.

Subject(s)

Exome , Genetics, Population , Genotype , Heterozygote , Phenotype , Polymorphism, Single Nucleotide

9.

Mapping and characterization of structural variation in 17,795 human genomes.

Abel, Haley J; Larson, David E; Regier, Allison A; Chiang, Colby; Das, Indraniel; Kanchi, Krishna L; Layer, Ryan M; Neale, Benjamin M; Salerno, William J; Reeves, Catherine; Buyske, Steven; Matise, Tara C; Muzny, Donna M; Zody, Michael C; Lander, Eric S; Dutcher, Susan K; Stitziel, Nathan O; Hall, Ira M.

Nature ; 583(7814): 83-89, 2020 07.

Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline [1] to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.

Subject(s)

Genetic Variation , Genome, Human/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Epigenesis, Genetic , Female , Gene Dosage/genetics , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software

10.

Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells.

Bao, Erik L; Nandakumar, Satish K; Liao, Xiaotian; Bick, Alexander G; Karjalainen, Juha; Tabaka, Marcin; Gan, Olga I; Havulinna, Aki S; Kiiskinen, Tuomo T J; Lareau, Caleb A; de Lapuente Portilla, Aitzkoa L; Li, Bo; Emdin, Connor; Codd, Veryan; Nelson, Christopher P; Walker, Christopher J; Churchhouse, Claire; de la Chapelle, Albert; Klein, Daryl E; Nilsson, Björn; Wilson, Peter W F; Cho, Kelly; Pyarajan, Saiju; Gaziano, J Michael; Samani, Nilesh J; Regev, Aviv; Palotie, Aarno; Neale, Benjamin M; Dick, John E; Natarajan, Pradeep; O'Donnell, Christopher J; Daly, Mark J; Milyavsky, Michael; Kathiresan, Sekar; Sankaran, Vijay G.

Nature ; 586(7831): 769-775, 2020 10.

Article in English | MEDLINE | ID: mdl-33057200

ABSTRACT

Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers [1]. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10⁻⁸), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.

Subject(s)

Genetic Predisposition to Disease/genetics , Hematopoietic Stem Cells/pathology , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Cell Lineage/genetics , Cell Self Renewal , Checkpoint Kinase 2/genetics , Female , Humans , Leukocytes/pathology , Male , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Risk , Telomere Homeostasis

11.

A structural variation reference for medical and population genetics.

Collins, Ryan L; Brand, Harrison; Karczewski, Konrad J; Zhao, Xuefang; Alföldi, Jessica; Francioli, Laurent C; Khera, Amit V; Lowther, Chelsea; Gauthier, Laura D; Wang, Harold; Watts, Nicholas A; Solomonson, Matthew; O'Donnell-Luria, Anne; Baumann, Alexander; Munshi, Ruchi; Walker, Mark; Whelan, Christopher W; Huang, Yongqing; Brookings, Ted; Sharpe, Ted; Stone, Matthew R; Valkanas, Elise; Fu, Jack; Tiao, Grace; Laricchia, Kristen M; Ruano-Rubio, Valentin; Stevens, Christine; Gupta, Namrata; Cusick, Caroline; Margolin, Lauren; Taylor, Kent D; Lin, Henry J; Rich, Stephen S; Post, Wendy S; Chen, Yii-Der Ida; Rotter, Jerome I; Nusbaum, Chad; Philippakis, Anthony; Lander, Eric; Gabriel, Stacey; Neale, Benjamin M; Kathiresan, Sekar; Daly, Mark J; Banks, Eric; MacArthur, Daniel G; Talkowski, Michael E.

Nature ; 581(7809): 444-451, 2020 05.

Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

Subject(s)

Disease/genetics , Genetic Variation , Genetics, Medical/standards , Genetics, Population/standards , Genome, Human/genetics , Female , Genetic Testing , Genotyping Techniques , Humans , Male , Middle Aged , Mutation , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Reference Standards , Selection, Genetic , Whole Genome Sequencing

12.

A scoping review of guidelines for the use of race, ethnicity, and ancestry reveals widespread consensus but also points of ongoing disagreement.

Mauro, Madelyn; Allen, Danielle S; Dauda, Bege; Molina, Santiago J; Neale, Benjamin M; Lewis, Anna C F.

Am J Hum Genet ; 109(12): 2110-2125, 2022 12 01.

Article in English | MEDLINE | ID: mdl-36400022

ABSTRACT

The use of population descriptors such as race, ethnicity, and ancestry in science, medicine, and public health has a long, complicated, and at times dark history, particularly for genetics, given the field's perceived importance for understanding between-group differences. The historical and potential harms that come with irresponsible use of these categories suggest a clear need for definitive guidance about when and how they can be used appropriately. However, while many prior authors have provided such guidance, no established consensus exists, and the extant literature has not been examined for implied consensus and sources of disagreement. Here, we present the results of a scoping review of published normative recommendations regarding the use of population categories, particularly in genetics research. Following PRISMA guidelines, we extracted recommendations from n = 121 articles matching inclusion criteria. Articles were published consistently throughout the time period examined and in a broad range of journals, demonstrating an ongoing and interdisciplinary perceived need for guidance. Examined recommendations fall under one of eight themes identified during analysis. Seven are characterized by broad agreement across articles; one, "appropriate definitions of population categories and contexts for use," revealed substantial fundamental disagreement among articles. Additionally, while many articles focus on the inappropriate use of race, none fundamentally problematize ancestry. This work can be a resource to researchers looking for normative guidance on the use of population descriptors and can orient authors of future guidelines to this complex field, thereby contributing to the development of more effective future guidelines for genetics research.

Subject(s)

Ethnicity , Problem Behavior , Humans , Asian People , Consensus , Ethnicity/genetics , Research Personnel

13.

Genetic structure correlates with ethnolinguistic diversity in eastern and southern Africa.

Atkinson, Elizabeth G; Dalvie, Shareefa; Pichkar, Yakov; Kalungi, Allan; Majara, Lerato; Stevenson, Anne; Abebe, Tamrat; Akena, Dickens; Alemayehu, Melkam; Ashaba, Fred K; Atwoli, Lukoye; Baker, Mark; Chibnik, Lori B; Creanza, Nicole; Daly, Mark J; Fekadu, Abebaw; Gelaye, Bizu; Gichuru, Stella; Injera, Wilfred E; James, Roxanne; Kariuki, Symon M; Kigen, Gabriel; Koen, Nastassja; Koenen, Karestan C; Koenig, Zan; Kwobah, Edith; Kyebuzibwa, Joseph; Musinguzi, Henry; Mwema, Rehema M; Neale, Benjamin M; Newman, Carter P; Newton, Charles R J C; Ongeri, Linnet; Ramachandran, Sohini; Ramesar, Raj; Shiferaw, Welelta; Stein, Dan J; Stroud, Rocky E; Teferra, Solomon; Yohannes, Mary T; Zingela, Zukiswa; Martin, Alicia R.

Am J Hum Genet ; 109(9): 1667-1679, 2022 09 01.

Article in English | MEDLINE | ID: mdl-36055213

ABSTRACT

African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence for the rate of language transmission in matrilineal groups having been higher than that for patrilineal ones. We highlight both the diversity of variation within Africa as well as how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.

Subject(s)

Genetic Variation , Genetics, Population , Africa, Southern , Black People/genetics , Genetic Structures , Genetic Variation/genetics , Humans

14.

Author Correction: Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Gupta, Rahul; Kanai, Masahiro; Durham, Timothy J; Tsuo, Kristin; McCoy, Jason G; Kotrys, Anna V; Zhou, Wei; Chinnery, Patrick F; Karczewski, Konrad J; Calvo, Sarah E; Neale, Benjamin M; Mootha, Vamsi K.

Nature ; 630(8017): E10, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38831054

15.

Author Correction: A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 626(7997): E1, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38225470

16.

FAVOR: functional annotation of variants online resource and annotator for variation across the human genome.

Zhou, Hufeng; Arapoglou, Theodore; Li, Xihao; Li, Zilin; Zheng, Xiuwen; Moore, Jill; Asok, Abhijith; Kumar, Sushant; Blue, Elizabeth E; Buyske, Steven; Cox, Nancy; Felsenfeld, Adam; Gerstein, Mark; Kenny, Eimear; Li, Bingshan; Matise, Tara; Philippakis, Anthony; Rehm, Heidi L; Sofia, Heidi J; Snyder, Grace; Weng, Zhiping; Neale, Benjamin; Sunyaev, Shamil R; Lin, Xihong.

Nucleic Acids Res ; 51(D1): D1300-D1311, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350676

ABSTRACT

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.

Subject(s)

Genome, Human , Software , Humans , Molecular Sequence Annotation , Genomics , Genotype , Genetic Variation

17.

A data harmonization pipeline to leverage external controls and boost power in GWAS.

Chen, Danfeng; Tashman, Katherine; Palmer, Duncan S; Neale, Benjamin; Roeder, Kathryn; Bloemendal, Alex; Churchhouse, Claire; Ke, Zheng Tracy.

Hum Mol Genet ; 31(3): 481-489, 2022 02 03.

Article in English | MEDLINE | ID: mdl-34508597

ABSTRACT

The use of external controls in genome-wide association study (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27 517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn's disease. We demonstrate a boost in power over using the cohort samples alone, and that our procedure results in summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual level genotypes, rather than summary statistics, are required.

Subject(s)

Genome-Wide Association Study , Polymorphism, Single Nucleotide , Cohort Studies , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Quality Control

18.

Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations.

Martin, Alicia R; Atkinson, Elizabeth G; Chapman, Sinéad B; Stevenson, Anne; Stroud, Rocky E; Abebe, Tamrat; Akena, Dickens; Alemayehu, Melkam; Ashaba, Fred K; Atwoli, Lukoye; Bowers, Tera; Chibnik, Lori B; Daly, Mark J; DeSmet, Timothy; Dodge, Sheila; Fekadu, Abebaw; Ferriera, Steven; Gelaye, Bizu; Gichuru, Stella; Injera, Wilfred E; James, Roxanne; Kariuki, Symon M; Kigen, Gabriel; Koenen, Karestan C; Kwobah, Edith; Kyebuzibwa, Joseph; Majara, Lerato; Musinguzi, Henry; Mwema, Rehema M; Neale, Benjamin M; Newman, Carter P; Newton, Charles R J C; Pickrell, Joseph K; Ramesar, Raj; Shiferaw, Welelta; Stein, Dan J; Teferra, Solomon; van der Merwe, Celia; Zingela, Zukiswa.

Am J Hum Genet ; 108(4): 656-668, 2021 04 01.

Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.

Subject(s)

DNA Mutational Analysis/economics , DNA Mutational Analysis/standards , Genetic Variation/genetics , Genetics, Population/economics , Africa , DNA Mutational Analysis/methods , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study , Health Equity , Humans , Microbiota , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards

19.

Problems with Using Polygenic Scores to Select Embryos.

Turley, Patrick; Meyer, Michelle N; Wang, Nancy; Cesarini, David; Hammonds, Evelynn; Martin, Alicia R; Neale, Benjamin M; Rehm, Heidi L; Wilkins-Haug, Louise; Benjamin, Daniel J; Hyman, Steven; Laibson, David; Visscher, Peter M.

N Engl J Med ; 385(1): 78-86, 2021 07 01.

Article in English | MEDLINE | ID: mdl-34192436

ABSTRACT

Companies have recently begun to sell a new service to patients considering in vitro fertilization: embryo selection based on polygenic scores (ESPS). These scores represent individualized predictions of health and other outcomes derived from genomewide association studies in adults to partially predict these outcomes. This article includes a discussion of many factors that lower the predictive power of polygenic scores in the context of embryo selection and quantifies these effects for a variety of clinical and nonclinical traits. Also discussed are potential unintended consequences of ESPS (including selecting for adverse traits, altering population demographics, exacerbating inequalities in society, and devaluing certain traits). Recommendations for the responsible communication about ESPS by practitioners are provided, and a call for a society-wide conversation about this technology is made. (Funded by the National Institute on Aging and others.).

Subject(s)

Embryo, Mammalian , Fertilization in Vitro , Genetic Testing , Genetic Variation , Multifactorial Inheritance/genetics , Phenotype , Preimplantation Diagnosis , Educational Status , Gene-Environment Interaction , Genome-Wide Association Study , Humans , Predictive Value of Tests

20.

Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations.

Luo, Yang; Li, Xinyi; Wang, Xin; Gazal, Steven; Mercader, Josep Maria; Neale, Benjamin M; Florez, Jose C; Auton, Adam; Price, Alkes L; Finucane, Hilary K; Raychaudhuri, Soumya.

Hum Mol Genet ; 30(16): 1521-1534, 2021 07 28.

Article in English | MEDLINE | ID: mdl-33987664

ABSTRACT

It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogenous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association studies samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively. Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.

Subject(s)

Genetics, Population , Genome-Wide Association Study/statistics & numerical data , Linkage Disequilibrium/genetics , Multifactorial Inheritance/genetics , Genotyping Techniques/statistics & numerical data , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable
