Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases.

Weeks, Elle M; Ulirsch, Jacob C; Cheng, Nathan Y; Trippe, Brian L; Fine, Rebecca S; Miao, Jenkai; Patwardhan, Tejal A; Kanai, Masahiro; Nasser, Joseph; Fulco, Charles P; Tashman, Katherine C; Aguet, Francois; Li, Taibo; Ordovas-Montanes, Jose; Smillie, Christopher S; Biton, Moshe; Shalek, Alex K; Ananthakrishnan, Ashwin N; Xavier, Ramnik J; Regev, Aviv; Gupta, Rajat M; Lage, Kasper; Ardlie, Kristin G; Hirschhorn, Joel N; Lander, Eric S; Engreitz, Jesse M; Finucane, Hilary K.

Nat Genet ; 55(8): 1267-1276, 2023 Aug.

Article En | MEDLINE | ID: mdl-37443254

Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but we observe the best overall performance when PoPS is combined with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene-trait pairs across 113 complex traits and diseases with high precision, not only recovering well-established gene-trait relationships but also nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.

Multifactorial Inheritance; Quantitative Trait Loci; Humans; Multifactorial Inheritance/genetics; Quantitative Trait Loci/genetics; Genome-Wide Association Study/methods; Genetic Predisposition to Disease/genetics; Phenotype; Polymorphism, Single Nucleotide/genetics
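
The abstract above reports that combining PoPS with an orthogonal method such as the closest gene gave the best performance. One simple way to combine two such signals can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the intersection rule, the function `prioritize`, and the record fields `locus`, `gene`, `score` and `distance` are all hypothetical, and the gene names and numbers are made up.

```python
from collections import defaultdict


def prioritize(genes):
    """Return {locus: gene} for loci where two rankings agree.

    `genes` is a list of dicts with hypothetical keys:
    locus, gene, score (higher is better), distance (lower is better).
    """
    by_locus = defaultdict(list)
    for record in genes:
        by_locus[record["locus"]].append(record)

    picks = {}
    for locus, members in by_locus.items():
        # Gene with the highest feature-based score at this locus.
        top_scored = max(members, key=lambda r: r["score"])
        # Gene closest to the lead variant at this locus.
        closest = min(members, key=lambda r: r["distance"])
        # Assumed combination rule: keep the gene only if both agree.
        if top_scored["gene"] == closest["gene"]:
            picks[locus] = top_scored["gene"]
    return picks


# Made-up example data, for illustration only.
example = [
    {"locus": "locus1", "gene": "GENE_A", "score": 1.8, "distance": 12_000},
    {"locus": "locus1", "gene": "GENE_B", "score": 0.3, "distance": 85_000},
    {"locus": "locus2", "gene": "GENE_C", "score": 0.9, "distance": 40_000},
    {"locus": "locus2", "gene": "GENE_D", "score": 0.2, "distance": 5_000},
]
result = prioritize(example)
```

Requiring agreement trades recall for precision: in this toy input, `locus2` is left unresolved because the two rankings disagree.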

De novo design of protein structure and function with RFdiffusion.

Watson, Joseph L; Juergens, David; Bennett, Nathaniel R; Trippe, Brian L; Yim, Jason; Eisenach, Helen E; Ahern, Woody; Borst, Andrew J; Ragotte, Robert J; Milles, Lukas F; Wicky, Basile I M; Hanikel, Nikita; Pellock, Samuel J; Courbet, Alexis; Sheffler, William; Wang, Jue; Venkatesh, Preetham; Sappington, Isaac; Torres, Susana Vázquez; Lauko, Anna; De Bortoli, Valentin; Mathieu, Emile; Ovchinnikov, Sergey; Barzilay, Regina; Jaakkola, Tommi S; DiMaio, Frank; Baek, Minkyung; Baker, David.

Nature ; 620(7976): 1089-1100, 2023 Aug.

Article En | MEDLINE | ID: mdl-37433327

There has been considerable recent progress in designing new proteins using deep-learning methods [1-9]. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.

Deep Learning; Proteins; Catalytic Domain; Cryoelectron Microscopy; Hemagglutinin Glycoproteins, Influenza Virus/chemistry; Hemagglutinin Glycoproteins, Influenza Virus/metabolism; Hemagglutinin Glycoproteins, Influenza Virus/ultrastructure; Protein Binding; Proteins/chemistry; Proteins/metabolism; Proteins/ultrastructure
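
The abstract above describes training on "protein structure denoising tasks". The noising half of such a task can be sketched with a generic Gaussian diffusion forward process applied to 3D coordinates. This is a textbook-style illustration under stated assumptions, not the RFdiffusion code: the linear noise schedule, the functions `make_alpha_bar` and `add_noise`, and the toy coordinates are all hypothetical, and the handling of residue orientations is omitted entirely.

```python
import math
import random


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.05):
    """Cumulative signal fraction for an assumed linear noise schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(coords, t, alpha_bar, rng):
    """Sample x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps for each coordinate."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    eps = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in coords]
    noisy = [
        [signal * x + sigma * e for x, e in zip(atom, atom_eps)]
        for atom, atom_eps in zip(coords, eps)
    ]
    return noisy, eps


rng = random.Random(0)
# Toy "backbone": ten points spaced 3.8 units apart along a line.
coords = [[3.8 * i, 0.0, 0.0] for i in range(10)]
alpha_bar = make_alpha_bar()
noisy, eps = add_noise(coords, 50, alpha_bar, rng)
```

A denoising network would be trained to recover the clean coordinates (or the noise `eps`) from `noisy` and the step index; generation then runs that learned denoiser from pure noise back towards a structure.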