Search | BVS (Virtual Health Library) Search Portal

1.

A cross-disorder dosage sensitivity map of the human genome.

Collins, Ryan L; Glessner, Joseph T; Porcu, Eleonora; Lepamets, Maarja; Brandon, Rhonda; Lauricella, Christopher; Han, Lide; Morley, Theodore; Niestroj, Lisa-Marie; Ulirsch, Jacob; Everett, Selin; Howrigan, Daniel P; Boone, Philip M; Fu, Jack; Karczewski, Konrad J; Kellaris, Georgios; Lowther, Chelsea; Lucente, Diane; Mohajeri, Kiana; Nõukas, Margit; Nuttle, Xander; Samocha, Kaitlin E; Trinh, Mi; Ullah, Farid; Võsa, Urmo; Hurles, Matthew E; Aradhya, Swaroop; Davis, Erica E; Finucane, Hilary; Gusella, James F; Janze, Aura; Katsanis, Nicholas; Matyakhina, Ludmila; Neale, Benjamin M; Sanders, David; Warren, Stephanie; Hodge, Jennelle C; Lal, Dennis; Ruderfer, Douglas M; Meck, Jeanne; Mägi, Reedik; Esko, Tõnu; Reymond, Alexandre; Kutalik, Zoltán; Hakonarson, Hakon; Sunyaev, Shamil; Brand, Harrison; Talkowski, Michael E.

Cell ; 185(16): 3041-3055.e25, 2022 08 04.

Article in English | MEDLINE | ID: mdl-35917817

ABSTRACT

Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage sensitive segments associated with at least one disorder. These segments were typically gene dense and often harbored dominant dosage sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo & pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.

Subject(s)

DNA Copy Number Variations; Genome, Human; DNA Copy Number Variations/genetics; Gene Dosage; Haploinsufficiency/genetics; Humans

2.

A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 625(7993): 92-100, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.

Subject(s)

Genome, Human; Genomics; Models, Genetic; Mutation; Humans; Access to Information; Databases, Genetic; Datasets as Topic; Gene Frequency; Genome, Human/genetics; Mutation/genetics; Selection, Genetic

3.

Polygenic architecture of rare coding variation across 394,783 exomes.

Weiner, Daniel J; Nadig, Ajay; Jagadeesh, Karthik A; Dey, Kushal K; Neale, Benjamin M; Robinson, Elise B; Karczewski, Konrad J; O'Connor, Luke J.

Nature ; 614(7948): 492-499, 2023 02.

Article in English | MEDLINE | ID: mdl-36755099

ABSTRACT

Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes [1-3]. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear [4]. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes [5]. Rare coding variants (allele frequency < 1 × 10^-3) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average, much less than common variants, and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10^-5). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder [6,7] is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.

Subject(s)

Exome; Gene Frequency; Genetic Variation; Multifactorial Inheritance; Humans; Exome/genetics; Genetic Variation/genetics; Genome-Wide Association Study; Multifactorial Inheritance/genetics; Risk Factors; United Kingdom; Genetic Loci/genetics; Schizophrenia/genetics; Bipolar Disorder/genetics

4.

Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Gupta, Rahul; Kanai, Masahiro; Durham, Timothy J; Tsuo, Kristin; McCoy, Jason G; Kotrys, Anna V; Zhou, Wei; Chinnery, Patrick F; Karczewski, Konrad J; Calvo, Sarah E; Neale, Benjamin M; Mootha, Vamsi K.

Nature ; 620(7975): 839-848, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37587338

ABSTRACT

Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation [1]. Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching [2,3]. We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in-trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.

Subject(s)

Cell Nucleus; DNA Copy Number Variations; DNA, Mitochondrial; Heteroplasmy; Mitochondria; Aged; Humans; DNA Copy Number Variations/genetics; DNA, Mitochondrial/genetics; Genome-Wide Association Study; Heteroplasmy/genetics; Mitochondria/genetics; Cell Nucleus/genetics; Alleles; Polymorphism, Single Nucleotide; INDEL Mutation; G-Quadruplexes

5.

A harmonized public resource of deeply sequenced diverse human genomes.

Koenig, Zan; Yohannes, Mary T; Nkambule, Lethukuthula L; Zhao, Xuefang; Goodrich, Julia K; Kim, Heesu Ally; Wilson, Michael W; Tiao, Grace; Hao, Stephanie P; Sahakian, Nareh; Chao, Katherine R; Walker, Mark A; Lyu, Yunfei; Rehm, Heidi L; Neale, Benjamin M; Talkowski, Michael E; Daly, Mark J; Brand, Harrison; Karczewski, Konrad J; Atkinson, Elizabeth G; Martin, Alicia R.

Genome Res ; 34(5): 796-809, 2024 06 25.

Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

Subject(s)

Databases, Genetic; Genome, Human; Humans; Human Genome Project; High-Throughput Nucleotide Sequencing/methods; Genetic Variation; Genomics/methods

6.

Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries.

Talkowski, Michael E; Rosenfeld, Jill A; Blumenthal, Ian; Pillalamarri, Vamsee; Chiang, Colby; Heilbut, Adrian; Ernst, Carl; Hanscom, Carrie; Rossin, Elizabeth; Lindgren, Amelia M; Pereira, Shahrin; Ruderfer, Douglas; Kirby, Andrew; Ripke, Stephan; Harris, David J; Lee, Ji-Hyun; Ha, Kyungsoo; Kim, Hyung-Goo; Solomon, Benjamin D; Gropman, Andrea L; Lucente, Diane; Sims, Katherine; Ohsumi, Toshiro K; Borowsky, Mark L; Loranger, Stephanie; Quade, Bradley; Lage, Kasper; Miles, Judith; Wu, Bai-Lin; Shen, Yiping; Neale, Benjamin; Shaffer, Lisa G; Daly, Mark J; Morton, Cynthia C; Gusella, James F.

Cell ; 149(3): 525-37, 2012 Apr 27.

Article in English | MEDLINE | ID: mdl-22521361

ABSTRACT

Balanced chromosomal abnormalities (BCAs) represent a relatively untapped reservoir of single-gene disruptions in neurodevelopmental disorders (NDDs). We sequenced BCAs in patients with autism or related NDDs, revealing disruption of 33 loci in four general categories: (1) genes previously associated with abnormal neurodevelopment (e.g., AUTS2, FOXP1, and CDKL5), (2) single-gene contributors to microdeletion syndromes (MBD5, SATB2, EHMT1, and SNURF-SNRPN), (3) novel risk loci (e.g., CHD8, KIRREL3, and ZNF507), and (4) genes associated with later-onset psychiatric disorders (e.g., TCF4, ZNF804A, PDE10A, GRIN2B, and ANK3). We also discovered among neurodevelopmental cases a profoundly increased burden of copy-number variants from these 33 loci and a significant enrichment of polygenic risk alleles from genome-wide association studies of autism and schizophrenia. Our findings suggest a polygenic risk model of autism and reveal that some neurodevelopmental genes are sensitive to perturbation by multiple mutational mechanisms, leading to variable phenotypic outcomes that manifest at different life stages.

Subject(s)

Child Development Disorders, Pervasive/genetics; Chromosome Aberrations; Autistic Disorder/diagnosis; Autistic Disorder/genetics; Child; Child Development Disorders, Pervasive/diagnosis; Chromosome Breakage; Chromosome Deletion; DNA Copy Number Variations; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Nervous System/growth & development; Schizophrenia/genetics; Sequence Analysis, DNA; Signal Transduction

7.

CHARR efficiently estimates contamination from DNA sequencing data.

Lu, Wenhan; Gauthier, Laura D; Poterba, Timothy; Giacopuzzi, Edoardo; Goodrich, Julia K; Stevens, Christine R; King, Daniel; Daly, Mark J; Neale, Benjamin M; Karczewski, Konrad J.

Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.
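The idea behind a metric of this kind can be illustrated with a small Python sketch. It is not the published CHARR implementation: the class `HomAltCall`, the function `charr_like_score`, the depth and frequency filters, and the normalization by population reference allele frequency are all assumptions made for illustration. The sketch averages, over homozygous alternate calls, the fraction of reads that unexpectedly support the reference allele, scaled by how often a contaminating genome would be expected to carry that reference allele.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HomAltCall:
    """One homozygous alternate genotype call (hypothetical record layout)."""
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternate allele
    ref_af: float    # assumed population frequency of the reference allele


def charr_like_score(
    calls: Iterable[HomAltCall],
    min_depth: int = 10,        # illustrative filter, not from the abstract
    min_ref_af: float = 0.1,    # illustrative filter, not from the abstract
    max_ref_af: float = 0.9,    # illustrative filter, not from the abstract
) -> Optional[float]:
    """Mean reference-read fraction at hom-alt sites, scaled by reference AF.

    Returns None when no call passes the filters.
    """
    terms = []
    for call in calls:
        depth = call.ref_reads + call.alt_reads
        if depth < min_depth:
            continue  # too few reads to estimate a read fraction
        if not (min_ref_af <= call.ref_af <= max_ref_af):
            continue  # skip sites where the reference allele is very rare or very common
        # A clean hom-alt site should show ~0 reference reads; contamination
        # adds reference reads roughly in proportion to ref_af.
        terms.append(call.ref_reads / (call.ref_af * depth))
    if not terms:
        return None
    return sum(terms) / len(terms)


example_calls = [
    HomAltCall(ref_reads=0, alt_reads=20, ref_af=0.5),
    HomAltCall(ref_reads=1, alt_reads=19, ref_af=0.5),
]
example_score = charr_like_score(example_calls)
```

Because the sketch needs only per-site allele depths and an allele frequency, it reflects the abstract's point that such an estimate can be computed from variant-level data without access to BAM/CRAM files.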

Subject(s)

DNA; Trout; Humans; Animals; Sequence Analysis, DNA/methods; Genotype; Homozygote; High-Throughput Nucleotide Sequencing/methods; Software

8.

Discordant calls across genotype discovery approaches elucidate variants with systematic errors.

Atkinson, Elizabeth G; Artomov, Mykyta; Loboda, Alexander A; Rehm, Heidi L; MacArthur, Daniel G; Karczewski, Konrad J; Neale, Benjamin M; Daly, Mark J.

Genome Res ; 33(6): 999-1005, 2023 06.

Article in English | MEDLINE | ID: mdl-37253541

ABSTRACT

Large-scale high-throughput sequencing data sets have been transformative for informing clinical variant interpretation and for use as reference panels for statistical and population genetic efforts. Although such resources are often treated as ground truth, we find that in widely used reference data sets such as the Genome Aggregation Database (gnomAD), some variants pass gold-standard filters, yet are systematically different in their genotype calls across genotype discovery approaches. The inclusion of such discordant sites in study designs involving multiple genotype discovery strategies could bias results and lead to false-positive hits in association studies owing to technological artifacts rather than a true relationship to the phenotype. Here, we describe this phenomenon of discordant genotype calls across genotype discovery approaches, characterize the error mode of wrong calls, provide a list of discordant sites identified in gnomAD that should be treated with caution in analyses, and present a metric and machine learning classifier trained on gnomAD data to identify likely discordant variants in other data sets. We find that different genotype discovery approaches have different sets of variants at which this problem occurs, but there are characteristic variant features that can be used to predict discordant behavior. Discordant sites are largely shared across ancestry groups, although different populations are powered for the discovery of different variants. We find that the most common error mode is that of a variant being heterozygous for one approach and homozygous for the other, with heterozygous in the genomes and homozygous reference in the exomes making up the majority of miscalls.

Subject(s)

Exome; Genetics, Population; Genotype; Heterozygote; Phenotype; Polymorphism, Single Nucleotide

9.

Mapping and characterization of structural variation in 17,795 human genomes.

Abel, Haley J; Larson, David E; Regier, Allison A; Chiang, Colby; Das, Indraniel; Kanchi, Krishna L; Layer, Ryan M; Neale, Benjamin M; Salerno, William J; Reeves, Catherine; Buyske, Steven; Matise, Tara C; Muzny, Donna M; Zody, Michael C; Lander, Eric S; Dutcher, Susan K; Stitziel, Nathan O; Hall, Ira M.

Nature ; 583(7814): 83-89, 2020 07.

Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline [1] to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.

Subject(s)

Genetic Variation; Genome, Human/genetics; Whole Genome Sequencing; Alleles; Case-Control Studies; Epigenesis, Genetic; Female; Gene Dosage/genetics; Genetics, Population; High-Throughput Nucleotide Sequencing; Humans; Male; Molecular Sequence Annotation; Quantitative Trait Loci; Racial Groups/genetics; Software

10.

Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells.

Bao, Erik L; Nandakumar, Satish K; Liao, Xiaotian; Bick, Alexander G; Karjalainen, Juha; Tabaka, Marcin; Gan, Olga I; Havulinna, Aki S; Kiiskinen, Tuomo T J; Lareau, Caleb A; de Lapuente Portilla, Aitzkoa L; Li, Bo; Emdin, Connor; Codd, Veryan; Nelson, Christopher P; Walker, Christopher J; Churchhouse, Claire; de la Chapelle, Albert; Klein, Daryl E; Nilsson, Björn; Wilson, Peter W F; Cho, Kelly; Pyarajan, Saiju; Gaziano, J Michael; Samani, Nilesh J; Regev, Aviv; Palotie, Aarno; Neale, Benjamin M; Dick, John E; Natarajan, Pradeep; O'Donnell, Christopher J; Daly, Mark J; Milyavsky, Michael; Kathiresan, Sekar; Sankaran, Vijay G.

Nature ; 586(7831): 769-775, 2020 10.

Article in English | MEDLINE | ID: mdl-33057200

ABSTRACT

Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers [1]. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10^-8), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.

Subject(s)

Genetic Predisposition to Disease/genetics; Hematopoietic Stem Cells/pathology; Myeloproliferative Disorders/genetics; Myeloproliferative Disorders/pathology; Neoplasms/genetics; Neoplasms/pathology; Cell Lineage/genetics; Cell Self Renewal; Checkpoint Kinase 2/genetics; Female; Humans; Leukocytes/pathology; Male; Proto-Oncogene Proteins/genetics; Repressor Proteins/genetics; Risk; Telomere Homeostasis

11.

A structural variation reference for medical and population genetics.

Collins, Ryan L; Brand, Harrison; Karczewski, Konrad J; Zhao, Xuefang; Alföldi, Jessica; Francioli, Laurent C; Khera, Amit V; Lowther, Chelsea; Gauthier, Laura D; Wang, Harold; Watts, Nicholas A; Solomonson, Matthew; O'Donnell-Luria, Anne; Baumann, Alexander; Munshi, Ruchi; Walker, Mark; Whelan, Christopher W; Huang, Yongqing; Brookings, Ted; Sharpe, Ted; Stone, Matthew R; Valkanas, Elise; Fu, Jack; Tiao, Grace; Laricchia, Kristen M; Ruano-Rubio, Valentin; Stevens, Christine; Gupta, Namrata; Cusick, Caroline; Margolin, Lauren; Taylor, Kent D; Lin, Henry J; Rich, Stephen S; Post, Wendy S; Chen, Yii-Der Ida; Rotter, Jerome I; Nusbaum, Chad; Philippakis, Anthony; Lander, Eric; Gabriel, Stacey; Neale, Benjamin M; Kathiresan, Sekar; Daly, Mark J; Banks, Eric; MacArthur, Daniel G; Talkowski, Michael E.

Nature ; 581(7809): 444-451, 2020 05.

Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

Subject(s)

Disease/genetics; Genetic Variation; Genetics, Medical/standards; Genetics, Population/standards; Genome, Human/genetics; Female; Genetic Testing; Genotyping Techniques; Humans; Male; Middle Aged; Mutation; Polymorphism, Single Nucleotide/genetics; Racial Groups/genetics; Reference Standards; Selection, Genetic; Whole Genome Sequencing

12.

A scoping review of guidelines for the use of race, ethnicity, and ancestry reveals widespread consensus but also points of ongoing disagreement.

Mauro, Madelyn; Allen, Danielle S; Dauda, Bege; Molina, Santiago J; Neale, Benjamin M; Lewis, Anna C F.

Am J Hum Genet ; 109(12): 2110-2125, 2022 12 01.

Article in English | MEDLINE | ID: mdl-36400022

ABSTRACT

The use of population descriptors such as race, ethnicity, and ancestry in science, medicine, and public health has a long, complicated, and at times dark history, particularly for genetics, given the field's perceived importance for understanding between-group differences. The historical and potential harms that come with irresponsible use of these categories suggest a clear need for definitive guidance about when and how they can be used appropriately. However, while many prior authors have provided such guidance, no established consensus exists, and the extant literature has not been examined for implied consensus and sources of disagreement. Here, we present the results of a scoping review of published normative recommendations regarding the use of population categories, particularly in genetics research. Following PRISMA guidelines, we extracted recommendations from n = 121 articles matching inclusion criteria. Articles were published consistently throughout the time period examined and in a broad range of journals, demonstrating an ongoing and interdisciplinary perceived need for guidance. Examined recommendations fall under one of eight themes identified during analysis. Seven are characterized by broad agreement across articles; one, "appropriate definitions of population categories and contexts for use," revealed substantial fundamental disagreement among articles. Additionally, while many articles focus on the inappropriate use of race, none fundamentally problematize ancestry. This work can be a resource to researchers looking for normative guidance on the use of population descriptors and can orient authors of future guidelines to this complex field, thereby contributing to the development of more effective future guidelines for genetics research.

Subject(s)

Ethnicity; Problem Behavior; Humans; Asian People; Consensus; Ethnicity/genetics; Research Personnel

13.

Genetic structure correlates with ethnolinguistic diversity in eastern and southern Africa.

Atkinson, Elizabeth G; Dalvie, Shareefa; Pichkar, Yakov; Kalungi, Allan; Majara, Lerato; Stevenson, Anne; Abebe, Tamrat; Akena, Dickens; Alemayehu, Melkam; Ashaba, Fred K; Atwoli, Lukoye; Baker, Mark; Chibnik, Lori B; Creanza, Nicole; Daly, Mark J; Fekadu, Abebaw; Gelaye, Bizu; Gichuru, Stella; Injera, Wilfred E; James, Roxanne; Kariuki, Symon M; Kigen, Gabriel; Koen, Nastassja; Koenen, Karestan C; Koenig, Zan; Kwobah, Edith; Kyebuzibwa, Joseph; Musinguzi, Henry; Mwema, Rehema M; Neale, Benjamin M; Newman, Carter P; Newton, Charles R J C; Ongeri, Linnet; Ramachandran, Sohini; Ramesar, Raj; Shiferaw, Welelta; Stein, Dan J; Stroud, Rocky E; Teferra, Solomon; Yohannes, Mary T; Zingela, Zukiswa; Martin, Alicia R.

Am J Hum Genet ; 109(9): 1667-1679, 2022 09 01.

Article in English | MEDLINE | ID: mdl-36055213

ABSTRACT

African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence for the rate of language transmission in matrilineal groups having been higher than that for patrilineal ones. We highlight both the diversity of variation within Africa as well as how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.

Subject(s)

Genetic Variation; Genetics, Population; Africa, Southern; Black People/genetics; Genetic Structures; Genetic Variation/genetics; Humans

14.

Author Correction: Nuclear genetic control of mtDNA copy number and heteroplasmy in humans.

Gupta, Rahul; Kanai, Masahiro; Durham, Timothy J; Tsuo, Kristin; McCoy, Jason G; Kotrys, Anna V; Zhou, Wei; Chinnery, Patrick F; Karczewski, Konrad J; Calvo, Sarah E; Neale, Benjamin M; Mootha, Vamsi K.

Nature ; 630(8017): E10, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38831054

15.

Author Correction: A genomic mutational constraint map using variation in 76,156 human genomes.

Chen, Siwei; Francioli, Laurent C; Goodrich, Julia K; Collins, Ryan L; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A; Vittal, Christopher; Gauthier, Laura D; Poterba, Timothy; Wilson, Michael W; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O'Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R; Talkowski, Michael E; Rehm, Heidi L; Daly, Mark J; Tiao, Grace; Neale, Benjamin M; MacArthur, Daniel G; Karczewski, Konrad J.

Nature ; 626(7997): E1, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38225470

16.

FAVOR: functional annotation of variants online resource and annotator for variation across the human genome.

Zhou, Hufeng; Arapoglou, Theodore; Li, Xihao; Li, Zilin; Zheng, Xiuwen; Moore, Jill; Asok, Abhijith; Kumar, Sushant; Blue, Elizabeth E; Buyske, Steven; Cox, Nancy; Felsenfeld, Adam; Gerstein, Mark; Kenny, Eimear; Li, Bingshan; Matise, Tara; Philippakis, Anthony; Rehm, Heidi L; Sofia, Heidi J; Snyder, Grace; Weng, Zhiping; Neale, Benjamin; Sunyaev, Shamil R; Lin, Xihong.

Nucleic Acids Res ; 51(D1): D1300-D1311, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350676

ABSTRACT

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.

Subject(s)

Genome, Human; Software; Humans; Molecular Sequence Annotation; Genomics; Genotype; Genetic Variation

17.

A data harmonization pipeline to leverage external controls and boost power in GWAS.

Chen, Danfeng; Tashman, Katherine; Palmer, Duncan S; Neale, Benjamin; Roeder, Kathryn; Bloemendal, Alex; Churchhouse, Claire; Ke, Zheng Tracy.

Hum Mol Genet ; 31(3): 481-489, 2022 02 03.

Article in English | MEDLINE | ID: mdl-34508597

ABSTRACT

The use of external controls in genome-wide association studies (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27 517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn's disease. We demonstrate a boost in power over using the cohort samples alone, and that our procedure results in summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual level genotypes, rather than summary statistics, are required.

Subject(s)

Genome-Wide Association Study; Polymorphism, Single Nucleotide; Cohort Studies; Genotype; Humans; Polymorphism, Single Nucleotide/genetics; Quality Control

18.

Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations.

Martin, Alicia R; Atkinson, Elizabeth G; Chapman, Sinéad B; Stevenson, Anne; Stroud, Rocky E; Abebe, Tamrat; Akena, Dickens; Alemayehu, Melkam; Ashaba, Fred K; Atwoli, Lukoye; Bowers, Tera; Chibnik, Lori B; Daly, Mark J; DeSmet, Timothy; Dodge, Sheila; Fekadu, Abebaw; Ferriera, Steven; Gelaye, Bizu; Gichuru, Stella; Injera, Wilfred E; James, Roxanne; Kariuki, Symon M; Kigen, Gabriel; Koenen, Karestan C; Kwobah, Edith; Kyebuzibwa, Joseph; Majara, Lerato; Musinguzi, Henry; Mwema, Rehema M; Neale, Benjamin M; Newman, Carter P; Newton, Charles R J C; Pickrell, Joseph K; Ramesar, Raj; Shiferaw, Welelta; Stein, Dan J; Teferra, Solomon; van der Merwe, Celia; Zingela, Zukiswa.

Am J Hum Genet ; 108(4): 656-668, 2021 04 01.

Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
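
The abstract above evaluates each strategy by the concordance of its imputed variants with deep whole-genome sequencing calls, but it does not state the exact metric. The sketch below is only one plausible reading, assumed here for illustration: a non-reference concordance over genotypes coded as alternate-allele counts (0, 1, 2). The function name and the toy genotypes are hypothetical and do not come from the paper.

```python
from typing import Sequence


def non_reference_concordance(truth: Sequence[int], imputed: Sequence[int]) -> float:
    """Share of agreeing calls among sites where either call is non-reference.

    Genotypes are alternate-allele counts (0, 1 or 2). Sites where both the
    truth and the imputed call are homozygous reference are skipped, so the
    score is not inflated by the many trivially matching reference sites.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    considered = [(t, i) for t, i in zip(truth, imputed) if t != 0 or i != 0]
    if not considered:
        return float("nan")
    matches = sum(1 for t, i in considered if t == i)
    return matches / len(considered)


# Toy data (hypothetical): deep-sequencing calls vs. imputed calls at six sites.
truth = [0, 1, 2, 0, 1, 0]
imputed = [0, 1, 1, 0, 1, 1]
concordance = non_reference_concordance(truth, imputed)
print(f"non-reference concordance: {concordance:.2f}")
```

In practice such a score would typically be computed per allele-frequency bin, since the abstract compares accuracy across variants of all frequencies.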

Subject(s)

DNA Mutational Analysis/economics; DNA Mutational Analysis/standards; Genetic Variation/genetics; Genetics, Population/economics; Africa; DNA Mutational Analysis/methods; Genetics, Population/methods; Genome, Human/genetics; Genome-Wide Association Study; Health Equity; Humans; Microbiota; Whole Genome Sequencing/economics; Whole Genome Sequencing/standards

19.

Problems with Using Polygenic Scores to Select Embryos.

Turley, Patrick; Meyer, Michelle N; Wang, Nancy; Cesarini, David; Hammonds, Evelynn; Martin, Alicia R; Neale, Benjamin M; Rehm, Heidi L; Wilkins-Haug, Louise; Benjamin, Daniel J; Hyman, Steven; Laibson, David; Visscher, Peter M.

N Engl J Med ; 385(1): 78-86, 2021 07 01.

Article in English | MEDLINE | ID: mdl-34192436

ABSTRACT

Companies have recently begun to sell a new service to patients considering in vitro fertilization: embryo selection based on polygenic scores (ESPS). These scores represent individualized predictions of health and other outcomes derived from genomewide association studies in adults to partially predict these outcomes. This article includes a discussion of many factors that lower the predictive power of polygenic scores in the context of embryo selection and quantifies these effects for a variety of clinical and nonclinical traits. Also discussed are potential unintended consequences of ESPS (including selecting for adverse traits, altering population demographics, exacerbating inequalities in society, and devaluing certain traits). Recommendations for the responsible communication about ESPS by practitioners are provided, and a call for a society-wide conversation about this technology is made. (Funded by the National Institute on Aging and others.).

Subject(s)

Embryo, Mammalian; Fertilization in Vitro; Genetic Testing; Genetic Variation; Multifactorial Inheritance/genetics; Phenotype; Preimplantation Diagnosis; Educational Status; Gene-Environment Interaction; Genome-Wide Association Study; Humans; Predictive Value of Tests

20.

Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations.

Luo, Yang; Li, Xinyi; Wang, Xin; Gazal, Steven; Mercader, Josep Maria; Neale, Benjamin M; Florez, Jose C; Auton, Adam; Price, Alkes L; Finucane, Hilary K; Raychaudhuri, Soumya.

Hum Mol Genet ; 30(16): 1521-1534, 2021 07 28.

Article in English | MEDLINE | ID: mdl-33987664

ABSTRACT

It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogenous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association studies samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively. Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.
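
For readers unfamiliar with the regression that cov-LDSC builds on, the sketch below illustrates only the basic, unadjusted LD score regression idea: per-SNP association chi-square statistics are regressed on LD scores, and the slope is rescaled by the number of SNPs over the sample size to give a heritability estimate. It is a noise-free toy, not the authors' implementation: it uses unweighted least squares, omits the covariate adjustment of LD scores that defines cov-LDSC, and all names and numbers are hypothetical.

```python
from typing import Sequence, Tuple


def ldsc_h2(chi2: Sequence[float], ld_scores: Sequence[float],
            n_samples: int, n_snps: int) -> Tuple[float, float]:
    """Estimate SNP-heritability from the slope of chi-square on LD score.

    Assumes the standard LD score regression expectation
        E[chi2_j] = (N * h2 / M) * l_j + intercept,
    where N is the sample size, M the number of SNPs and l_j the LD score.
    Returns (h2_estimate, intercept) from ordinary least squares.
    """
    n = len(chi2)
    if n != len(ld_scores) or n < 2:
        raise ValueError("need at least two SNPs with matching inputs")
    mean_l = sum(ld_scores) / n
    mean_c = sum(chi2) / n
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_snps / n_samples, intercept


# Toy, noise-free inputs (hypothetical): built with h2 = 0.4, N = 50 000,
# M = 1 000 000, so the slope N * h2 / M equals 0.02 and the intercept is 1.
ld_scores = [10.0, 50.0, 100.0, 200.0]
chi2 = [1.0 + 0.02 * l for l in ld_scores]
h2_hat, intercept_hat = ldsc_h2(chi2, ld_scores, n_samples=50_000, n_snps=1_000_000)
print(f"h2 estimate: {h2_hat:.3f}, intercept: {intercept_hat:.3f}")
```

According to the abstract, the adjustment in cov-LDSC comes from estimating LD in-sample with covariates accounted for; that step is not reproduced here.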

Subject(s)

Genetics, Population; Genome-Wide Association Study/statistics & numerical data; Linkage Disequilibrium/genetics; Multifactorial Inheritance/genetics; Genotyping Techniques/statistics & numerical data; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Quantitative Trait, Heritable
