Search | Biblioteca Virtual em Saúde (Virtual Health Library)

A deep catalogue of protein-coding variation in 983,578 individuals.

Sun, Kathie Y; Bai, Xiaodong; Chen, Siying; Bao, Suying; Zhang, Chuanyi; Kapoor, Manav; Backman, Joshua; Joseph, Tyler; Maxwell, Evan; Mitra, George; Gorovits, Alexander; Mansfield, Adam; Boutkov, Boris; Gokhale, Sujit; Habegger, Lukas; Marcketta, Anthony; Locke, Adam E; Ganel, Liron; Hawes, Alicia; Kessler, Michael D; Sharma, Deepika; Staples, Jeffrey; Bovijn, Jonas; Gelfman, Sahar; Di Gioia, Alessandro; Rajagopal, Veera M; Lopez, Alexander; Varela, Jennifer Rico; Alegre, Jesus; Berumen, Jaime; Tapia-Conyer, Roberto; Kuri-Morales, Pablo; Torres, Jason; Emberson, Jonathan; Collins, Rory; Cantor, Michael; Thornton, Timothy; Kang, Hyun Min; Overton, John D; Shuldiner, Alan R; Cremona, M Laura; Nafde, Mona; Baras, Aris; Abecasis, Goncalo; Marchini, Jonathan; Reid, Jeffrey G; Salerno, William; Balasubramanian, Suganthi.

Nature ; 2024 May 20.

Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.
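To make the sample-size argument concrete, the sketch below computes how likely an allele of a given population frequency is to appear at least once in a cohort. It is a back-of-envelope model, not the authors' method: it assumes the 2N sampled chromosomes are independent draws (no relatedness or population structure, perfect genotyping), and the function names and the example frequency of one copy per million chromosomes are hypothetical.

```python
import math


def prob_observed(allele_freq: float, n_individuals: int) -> float:
    """Probability that an allele with population frequency `allele_freq`
    is seen at least once among `n_individuals` diploid people.

    Assumes the 2 * n_individuals sampled chromosomes are independent draws.
    """
    n_chrom = 2 * n_individuals
    # P(no copies) = (1 - f)^(2N); log1p/expm1 keep precision for tiny f.
    return -math.expm1(n_chrom * math.log1p(-allele_freq))


def expected_allele_count(allele_freq: float, n_individuals: int) -> float:
    """Expected number of copies of the allele among the 2N chromosomes."""
    return 2 * n_individuals * allele_freq


f = 1e-6  # hypothetical allele frequency: one copy per million chromosomes
for n in (10_000, 100_000, 983_578):
    print(n, expected_allele_count(f, n), round(prob_observed(f, n), 3))
```

Under these assumptions, such an allele is seen with probability of roughly 2% in 10,000 people, 18% in 100,000 people, and 86% in 983,578 people, which is why cohorts approaching a million exomes are needed to catalogue variants this rare.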

A deep catalog of protein-coding variation in 985,830 individuals.

Sun, Kathie Y; Bai, Xiaodong; Chen, Siying; Bao, Suying; Kapoor, Manav; Zhang, Chuanyi; Backman, Joshua; Joseph, Tyler; Maxwell, Evan; Mitra, George; Gorovits, Alexander; Mansfield, Adam; Boutkov, Boris; Gokhale, Sujit; Habegger, Lukas; Marcketta, Anthony; Locke, Adam; Kessler, Michael D; Sharma, Deepika; Staples, Jeffrey; Bovijn, Jonas; Gelfman, Sahar; Gioia, Alessandro Di; Rajagopal, Veera; Lopez, Alexander; Varela, Jennifer Rico; Alegre, Jesus; Berumen, Jaime; Tapia-Conyer, Roberto; Kuri-Morales, Pablo; Torres, Jason; Emberson, Jonathan; Collins, Rory; Cantor, Michael; Thornton, Timothy; Kang, Hyun Min; Overton, John; Shuldiner, Alan R; Cremona, M Laura; Nafde, Mona; Baras, Aris; Abecasis, Goncalo; Marchini, Jonathan; Reid, Jeffrey G; Salerno, William; Balasubramanian, Suganthi.

bioRxiv ; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37214792

ABSTRACT

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
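Why finding pLOF homozygotes depends so strongly on cohort size can be sketched with Hardy-Weinberg proportions. The snippet below is an illustrative assumption, not the RGC-ME analysis: it treats `q` as the aggregate frequency of haplotypes carrying at least one pLOF allele in a gene, so `q**2` counts homozygotes and compound heterozygotes in trans, and it ignores consanguinity and population structure. The function names and the example value of `q` are hypothetical.

```python
import math


def expected_biallelic(cum_plof_freq: float, n_individuals: int) -> float:
    """Expected number of people with pLOF alleles on both haplotypes of a gene."""
    # Under Hardy-Weinberg equilibrium, P(both haplotypes carry pLOF) = q^2.
    return n_individuals * cum_plof_freq ** 2


def prob_any_biallelic(cum_plof_freq: float, n_individuals: int) -> float:
    """Probability that at least one such person is present in the cohort."""
    q2 = cum_plof_freq ** 2
    # 1 - (1 - q^2)^N, computed stably for small q^2.
    return -math.expm1(n_individuals * math.log1p(-q2))


q = 1e-3  # hypothetical aggregate pLOF frequency for one gene
for n in (100_000, 985_830):
    print(n, expected_biallelic(q, n), round(prob_any_biallelic(q, n), 3))
```

With `q = 0.001`, a cohort of 100,000 is expected to contain about 0.1 such individuals (roughly a 10% chance of seeing any), whereas 985,830 individuals give about one expected case (roughly a 63% chance).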

Computationally efficient whole-genome regression for quantitative and binary traits.

Mbatchou, Joelle; Barnard, Leland; Backman, Joshua; Marcketta, Anthony; Kosmicki, Jack A; Ziyatdinov, Andrey; Benner, Christian; O'Dushlaine, Colm; Barber, Mathew; Boutkov, Boris; Habegger, Lukas; Ferreira, Manuel; Baras, Aris; Reid, Jeffrey; Abecasis, Goncalo; Maxwell, Evan; Marchini, Jonathan.

Nat Genet ; 53(7): 1097-1103, 2021 07.

Article in English | MEDLINE | ID: mdl-34017140

ABSTRACT

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
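For unbalanced case-control phenotypes and rare variants, ordinary logistic regression can fail to converge because of (quasi-)separation; Firth's penalized-likelihood correction keeps estimates finite. The sketch below fits standard, exact Firth logistic regression with the usual modified-score iteration. It is not REGENIE's fast approximate Firth test, and the function name `firth_logistic` and its parameters are hypothetical. For brevity it omits the step-halving and likelihood-ratio testing that production code would add.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression fitted by a Newton-type iteration.

    X: (n, p) design matrix, including an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns the (p,) coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        w = pi * (1.0 - pi)                      # diagonal of W
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X^T W X)^-1
        # Leverages: diagonal of W^1/2 X (X^T W X)^-1 X^T W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: X^T (y - pi + h * (1/2 - pi))
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Completely separated toy data: ordinary maximum likelihood has no finite
# solution here, but the Firth estimate is finite.
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(firth_logistic(X, y))
```

For a saturated two-group design like this toy example, the Firth estimate coincides with adding 0.5 to each cell of the 2x2 table, giving an intercept of log(0.5/2.5) ≈ -1.609 and a slope of 2 log 5 ≈ 3.219.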

Subjects

Computational Biology, Genome-Wide Association Study, Genomics, Case-Control Studies, Computational Biology/methods, Genome-Wide Association Study/methods, Genomics/methods, Genotype, Humans, Logistic Models, Machine Learning, Phenotype, Reproducibility of Results
