Search | BVS España Search Portal

Genome-wide prediction of disease variant effects with a deep protein language model.

Brandes, Nadav; Goldman, Grant; Wang, Charlotte H; Ye, Chun Jimmie; Ntranos, Vasilis.

Nat Genet; 55(9): 1512-1522, 2023 09.

Article in English | MEDLINE | ID: mdl-37563329

ABSTRACT

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects.

Subject(s)

Computational Biology; Software; Humans; Computational Biology/methods; Mutation, Missense/genetics; Proteins/genetics; Genome, Human/genetics
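
A common way to score missense variants with a protein language model is the log-likelihood ratio (LLR) between the mutant and the wild-type residue at a position. The sketch below illustrates that idea for the abstract above. It is a minimal example under stated assumptions: the function name `llr_scores` and the toy log-probability tables are hypothetical stand-ins, and no real model or ESM1b API is called.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def llr_scores(log_probs, wild_type):
    """Score every possible missense substitution in a sequence.

    log_probs[i][aa] is assumed to hold a model's log-probability of
    amino acid `aa` at 0-based position i (hypothetical input format).
    Returns {(1-based position, mutant residue): LLR}.
    """
    scores = {}
    for i, wt in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the wild-type residue itself
            scores[(i + 1, aa)] = log_probs[i][aa] - log_probs[i][wt]
    return scores

# Toy two-residue "protein" with made-up probabilities (not model output).
wild_type = "MK"
uniform = {aa: math.log(1 / 20) for aa in AMINO_ACIDS}  # no preference
peaked = {aa: math.log(0.01) for aa in AMINO_ACIDS}     # strong preference for K
peaked["K"] = math.log(0.81)

scores = llr_scores([uniform, peaked], wild_type)
```

Under this convention, a strongly negative LLR means the model finds the mutant residue much less plausible than the wild type, while a score near zero means it has no preference.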

Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals.

Okbay, Aysu; Wu, Yeda; Wang, Nancy; Jayashankar, Hariharan; Bennett, Michael; Nehzati, Seyed Moeen; Sidorenko, Julia; Kweon, Hyeokmoon; Goldman, Grant; Gjorgjieva, Tamara; Jiang, Yunxuan; Hicks, Barry; Tian, Chao; Hinds, David A; Ahlskog, Rafael; Magnusson, Patrik K E; Oskarsson, Sven; Hayward, Caroline; Campbell, Archie; Porteous, David J; Freese, Jeremy; Herd, Pamela; Watson, Chelsea; Jala, Jonathan; Conley, Dalton; Koellinger, Philipp D; Johannesson, Magnus; Laibson, David; Meyer, Michelle N; Lee, James J; Kong, Augustine; Yengo, Loic; Cesarini, David; Turley, Patrick; Visscher, Peter M; Beauchamp, Jonathan P; Benjamin, Daniel J; Young, Alexander I.

Nat Genet; 54(4): 437-449, 2022 04.

Article in English | MEDLINE | ID: mdl-35361970

ABSTRACT

We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12-16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI's magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.

Subject(s)

Genome-Wide Association Study; Multifactorial Inheritance; Humans; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics
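
A polygenic index is, at its core, a weighted sum of a person's effect-allele counts, with weights derived from GWAS effect estimates. The sketch below shows only that arithmetic for the abstract above. The function name `polygenic_index`, the weights, and the genotypes are all made-up illustrative values, and the sketch does not reproduce how the study derived its weights.

```python
def polygenic_index(genotypes, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2) for one individual."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(g * w for g, w in zip(genotypes, weights))

# Hypothetical per-SNP weights and genotypes for two individuals.
weights = [0.02, -0.01, 0.03]
people = {"a": [0, 1, 2], "b": [2, 2, 0]}

pgi = {name: polygenic_index(g, weights) for name, g in people.items()}
```

In practice the index is usually standardized within a sample before it is used as a regressor.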

Resource profile and user guide of the Polygenic Index Repository.

Becker, Joel; Burik, Casper A P; Goldman, Grant; Wang, Nancy; Jayashankar, Hariharan; Bennett, Michael; Belsky, Daniel W; Karlsson Linnér, Richard; Ahlskog, Rafael; Kleinman, Aaron; Hinds, David A; Caspi, Avshalom; Corcoran, David L; Moffitt, Terrie E; Poulton, Richie; Sugden, Karen; Williams, Benjamin S; Harris, Kathleen Mullan; Steptoe, Andrew; Ajnakina, Olesya; Milani, Lili; Esko, Tõnu; Iacono, William G; McGue, Matt; Magnusson, Patrik K E; Mallard, Travis T; Harden, K Paige; Tucker-Drob, Elliot M; Herd, Pamela; Freese, Jeremy; Young, Alexander; Beauchamp, Jonathan P; Koellinger, Philipp D; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Meyer, Michelle N; Laibson, David; Cesarini, David; Benjamin, Daniel J; Turley, Patrick; Okbay, Aysu.

Nat Hum Behav; 5(12): 1744-1758, 2021 12.

Article in English | MEDLINE | ID: mdl-34140656

ABSTRACT

Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is growing rapidly. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs' prediction accuracies, we constructed them using genome-wide association studies (some not previously published) from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the 'additive SNP factor'. Regressions in which the true regressor is this factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.

Subject(s)

Databases, Genetic; Multifactorial Inheritance; Polymorphism, Single Nucleotide; Data Analysis; Genome-Wide Association Study; Humans
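
The errors-in-variables bias mentioned in the abstract above can be illustrated with a small simulation. The abstract does not specify the authors' estimator, so the sketch uses the textbook attenuation correction for a single noisy regressor: divide the naive slope by the reliability, var(latent) / var(proxy). All names and parameter values are hypothetical, and the reliability is known here only because the data are simulated.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

rng = random.Random(0)   # fixed seed for reproducibility
n = 20000
true_beta = 0.5          # assumed effect of the latent factor (toy value)
reliability = 0.5        # var(latent) / var(proxy), known by construction

latent = [rng.gauss(0, 1) for _ in range(n)]
# The proxy adds unit-variance noise, so var(proxy) = 2 and reliability = 0.5.
proxy = [x + rng.gauss(0, 1) for x in latent]
outcome = [true_beta * x + rng.gauss(0, 0.5) for x in latent]

naive = ols_slope(proxy, outcome)   # attenuated toward zero
corrected = naive / reliability     # textbook attenuation correction
```

With these settings the naive slope should land near `true_beta * reliability`, and dividing by the reliability recovers a value near `true_beta`. With real data the reliability has to be estimated rather than assumed.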
