Search | BVS Integralidade em Saúde

Genome-wide prediction of disease variant effects with a deep protein language model.

Brandes, Nadav; Goldman, Grant; Wang, Charlotte H; Ye, Chun Jimmie; Ntranos, Vasilis.

Nat Genet; 55(9): 1512-1522, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37563329

ABSTRACT

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects.
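The abstract does not say how a language model's output becomes a variant score. One common approach with masked protein language models is a log-likelihood ratio (LLR) between the mutant and wild-type residue at each position. The sketch below illustrates that idea only. It assumes the model's per-position amino-acid probabilities are already available; `llr_scores`, `toy_probs`, and the probability values are hypothetical stand-ins, not ESM1b output or the authors' workflow.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def llr_scores(wild_type, position_probs):
    """Score every possible missense substitution of `wild_type`.

    position_probs[i] maps each amino acid to the probability a model
    assigns to it at position i (assumed input, not computed here).
    Returns {"M1A": score, ...}; more negative means less likely than
    the wild-type residue under the model.
    """
    scores = {}
    for i, (wt, probs) in enumerate(zip(wild_type, position_probs)):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # synonymous at the protein level, not a missense variant
            scores[f"{wt}{i + 1}{aa}"] = math.log(probs[aa]) - math.log(probs[wt])
    return scores


def toy_probs(favoured, p_favoured=0.81):
    """Hypothetical distribution: most mass on one residue, rest uniform."""
    rest = (1.0 - p_favoured) / (len(AMINO_ACIDS) - 1)
    return {aa: (p_favoured if aa == favoured else rest) for aa in AMINO_ACIDS}


sequence = "MKT"  # made-up three-residue peptide
probs = [toy_probs(aa) for aa in sequence]
scores = llr_scores(sequence, probs)
```

Each position contributes 19 substitutions, so a protein of length L yields 19 × L scores; applying the same enumeration to every isoform separately is what makes isoform-specific annotations possible.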

Subjects

Computational Biology; Software; Humans; Computational Biology/methods; Mutation, Missense/genetics; Proteins/genetics; Genome, Human/genetics

Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals.

Okbay, Aysu; Wu, Yeda; Wang, Nancy; Jayashankar, Hariharan; Bennett, Michael; Nehzati, Seyed Moeen; Sidorenko, Julia; Kweon, Hyeokmoon; Goldman, Grant; Gjorgjieva, Tamara; Jiang, Yunxuan; Hicks, Barry; Tian, Chao; Hinds, David A; Ahlskog, Rafael; Magnusson, Patrik K E; Oskarsson, Sven; Hayward, Caroline; Campbell, Archie; Porteous, David J; Freese, Jeremy; Herd, Pamela; Watson, Chelsea; Jala, Jonathan; Conley, Dalton; Koellinger, Philipp D; Johannesson, Magnus; Laibson, David; Meyer, Michelle N; Lee, James J; Kong, Augustine; Yengo, Loic; Cesarini, David; Turley, Patrick; Visscher, Peter M; Beauchamp, Jonathan P; Benjamin, Daniel J; Young, Alexander I.

Nat Genet; 54(4): 437-449, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35361970

ABSTRACT

We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12-16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI's magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.
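As a rough illustration of what a polygenic index is, the sketch below builds one as a weighted sum of allele counts and reports the share of phenotype variance it explains (squared correlation). Everything in it is simulated under assumed parameters: the sample size, number of SNPs, effect sizes, and noise levels are hypothetical and unrelated to the study's data, and the function names are introduced here for illustration.

```python
import numpy as np


def polygenic_index(genotypes, weights):
    """Weighted sum of allele counts (0/1/2) per individual."""
    return genotypes @ weights


def variance_explained(pgi, phenotype):
    """R^2 of a univariate regression of phenotype on the PGI."""
    r = np.corrcoef(pgi, phenotype)[0, 1]
    return r ** 2


rng = np.random.default_rng(0)
n, m = 5000, 200                                # individuals, SNPs (assumed)
freqs = rng.uniform(0.1, 0.9, size=m)           # allele frequencies
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)

true_effects = rng.normal(0, 0.05, size=m)      # simulated causal effects
genetic = genotypes @ true_effects
phenotype = genetic + rng.normal(0, 2 * genetic.std(), size=n)

# GWAS weights are estimates, so add estimation noise to the true effects.
estimated = true_effects + rng.normal(0, 0.02, size=m)
pgi = polygenic_index(genotypes, estimated)
r2 = variance_explained(pgi, phenotype)
print(f"share of variance explained: {r2:.3f}")
```

Because the weights are estimated with error, the PGI explains less variance than the underlying genetic component does; larger GWAS samples shrink that gap.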

Subjects

Genome-Wide Association Study; Multifactorial Inheritance; Humans; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics

Resource profile and user guide of the Polygenic Index Repository.

Becker, Joel; Burik, Casper A P; Goldman, Grant; Wang, Nancy; Jayashankar, Hariharan; Bennett, Michael; Belsky, Daniel W; Karlsson Linnér, Richard; Ahlskog, Rafael; Kleinman, Aaron; Hinds, David A; Caspi, Avshalom; Corcoran, David L; Moffitt, Terrie E; Poulton, Richie; Sugden, Karen; Williams, Benjamin S; Harris, Kathleen Mullan; Steptoe, Andrew; Ajnakina, Olesya; Milani, Lili; Esko, Tõnu; Iacono, William G; McGue, Matt; Magnusson, Patrik K E; Mallard, Travis T; Harden, K Paige; Tucker-Drob, Elliot M; Herd, Pamela; Freese, Jeremy; Young, Alexander; Beauchamp, Jonathan P; Koellinger, Philipp D; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Meyer, Michelle N; Laibson, David; Cesarini, David; Benjamin, Daniel J; Turley, Patrick; Okbay, Aysu.

Nat Hum Behav; 5(12): 1744-1758, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34140656

ABSTRACT

Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is growing rapidly. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs' prediction accuracies, we constructed them using genome-wide association studies (some not previously published) from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the 'additive SNP factor'. Regressions in which the true regressor is this factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.
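The errors-in-variables bias described above can be seen in a small simulation. The sketch below uses the classical textbook correction (dividing the naive slope by the proxy's reliability), which is a stand-in chosen for illustration and not the estimator or the Python tool from the paper. The latent factor, noise variances, and true slope are assumed values, and the reliability is known only because the data are simulated.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a univariate least-squares regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


rng = np.random.default_rng(1)
n = 200_000
true_slope = 0.5                               # assumed effect of the latent factor

factor = rng.normal(0, 1, n)                   # latent regressor (unobserved in practice)
pgi = factor + rng.normal(0, 1, n)             # unbiased but noisy proxy
outcome = true_slope * factor + rng.normal(0, 1, n)

naive = ols_slope(pgi, outcome)                # attenuated toward zero

# Reliability = var(factor) / var(pgi); known here by construction.
reliability = 1.0 / (1.0 + 1.0)
corrected = naive / reliability

print(f"naive slope:     {naive:.3f}")
print(f"corrected slope: {corrected:.3f}")
```

With equal signal and noise variance the naive slope lands near half the true value, and rescaling by the reliability recovers it. In real data the reliability is not known and has to be estimated, which is the harder part of the problem.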

Subjects

Databases, Genetic; Multifactorial Inheritance; Polymorphism, Single Nucleotide; Data Analysis; Genome-Wide Association Study; Humans
