Results 1 - 20 of 85
1.
Cell ; 2024 May 02.
Article in English | MEDLINE | ID: mdl-38723628

ABSTRACT

Hepatitis B virus (HBV) is a small double-stranded DNA virus that chronically infects 296 million people. Over half of its compact genome encodes proteins in two overlapping reading frames, and during evolution, multiple selective pressures can act on shared nucleotides. This study combines an RNA-based HBV cell culture system with deep mutational scanning (DMS) to uncouple cis- and trans-acting sequence requirements in the HBV genome. The results support a leaky ribosome scanning model for polymerase translation, provide a fitness map of the HBV polymerase at single-nucleotide resolution, and identify conserved prolines adjacent to the HBV polymerase termination codon that stall ribosomes. Further experiments indicated that stalled ribosomes tether the nascent polymerase to its template RNA, ensuring cis-preferential RNA packaging and reverse transcription of the HBV genome.

2.
ArXiv ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38699161

ABSTRACT

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.

3.
Nat Struct Mol Biol ; 31(4): 667-677, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38326651

ABSTRACT

The orphan G protein-coupled receptor (GPCR) GPR161 plays a central role in development by suppressing Hedgehog signaling. The fundamental basis of how GPR161 is activated remains unclear. Here, we determined a cryogenic-electron microscopy structure of active human GPR161 bound to heterotrimeric Gs. This structure revealed an extracellular loop 2 that occupies the canonical GPCR orthosteric ligand pocket. Furthermore, a sterol that binds adjacent to transmembrane helices 6 and 7 stabilizes a GPR161 conformation required for Gs coupling. Mutations that prevent sterol binding to GPR161 suppress Gs-mediated signaling. These mutants retain the ability to suppress GLI2 transcription factor accumulation in primary cilia, a key function of ciliary GPR161. By contrast, a protein kinase A-binding site in the GPR161 C terminus is critical in suppressing GLI2 ciliary accumulation. Our work highlights how structural features of GPR161 interface with the Hedgehog pathway and sets a foundation to understand the role of GPR161 function in other signaling pathways.


Subject(s)
Hedgehog Proteins , Signal Transduction , Humans , Hedgehog Proteins/genetics , Receptors, G-Protein-Coupled/metabolism , Mutation , Cilia/metabolism
4.
Nat Commun ; 15(1): 1639, 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38388493

ABSTRACT

Recent developments in protein design rely on large neural networks with up to 100s of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
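The site-independent idea in this abstract can be illustrated with a minimal sketch: the effect of a multi-mutation variant is approximated as the sum of per-residue effects, with no interaction terms. The function name, the `(position, amino_acid)` encoding, and the effect values are hypothetical and chosen only for illustration; this is not the CoVES implementation.

```python
def additive_effect(variant, site_effects):
    """Approximate a combinatorial variant's effect as a sum of per-site effects.

    variant: iterable of (position, amino_acid) tuples (hypothetical encoding).
    site_effects: dict mapping (position, amino_acid) -> single-site effect.
    No interaction (epistasis) terms are modeled.
    """
    return sum(site_effects[mutation] for mutation in variant)


# Made-up single-site effects, for illustration only.
example_effects = {(10, "A"): -0.5, (42, "W"): 1.25}
double_mutant_score = additive_effect([(10, "A"), (42, "W")], example_effects)
```

In practice the per-site effects would be fitted to measured variant data; here they are simply given.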


Subject(s)
Neural Networks, Computer , Proteins , Proteins/metabolism , Amino Acids/chemistry , Mutation
5.
Res Sq ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

6.
Nat Methods ; 21(3): 531-540, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38279009

ABSTRACT

Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
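As a rough illustration of the E-distance mentioned in this abstract, the sketch below computes the classical energy distance between two sets of points: twice the mean between-group distance minus the two mean within-group distances. It assumes plain Euclidean distance on small lists of coordinate tuples; the exact definition, preprocessing, and embedding used by the scPerturb software may differ, and the function names here are hypothetical.

```python
import math


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (p, q) with p in a and q in b."""
    total = 0.0
    for p in a:
        for q in b:
            total += math.dist(p, q)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Classical energy distance between point sets x and y.

    E(x, y) = 2 * d(x, y) - d(x, x) - d(y, y), where d is the mean
    pairwise Euclidean distance (self-pairs included in the within terms).
    """
    return (
        2.0 * mean_pairwise_distance(x, y)
        - mean_pairwise_distance(x, x)
        - mean_pairwise_distance(y, y)
    )
```

The value is zero when both sets are identical and grows as the two groups of profiles separate, which is why it can serve as a general distance between a perturbed and a control population.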


Subject(s)
Proteomics , Software , Gene Expression Profiling/methods , Epigenomics , Single-Cell Analysis
7.
medRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads available at pop.evemodel.org.

8.
bioRxiv ; 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38106034

ABSTRACT

Protein design holds immense potential for optimizing naturally occurring proteins, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with significant challenges, such as an expansive design space, sparse functional regions, and a scarcity of available labels. These issues are further exacerbated in practice by the fact most real-life design scenarios necessitate the simultaneous optimization of multiple properties. In this work, we introduce ProteinNPT, a non-parametric transformer variant tailored to protein sequences and particularly suited to label-scarce and multi-task learning settings. We first focus on the supervised fitness prediction setting and develop several cross-validation schemes which support robust performance assessment. We subsequently reimplement prior top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of the protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design across extensive in silico Bayesian optimization and conditional sampling experiments.

9.
bioRxiv ; 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106144

ABSTRACT

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) in a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, model predictions and develop a user-friendly website that facilitates data access and analysis.

10.
bioRxiv ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014077

ABSTRACT

When nature maintains or evolves a gene's function over millions of years at scale, it produces a diversity of homologous sequences whose patterns of conservation and change contain rich structural, functional, and historical information about the gene. However, natural gene diversity likely excludes vast regions of functional sequence space and includes phylogenetic and evolutionary eccentricities, limiting what information we can extract. We introduce an accessible experimental approach for compressing long-term gene evolution to laboratory timescales, allowing for the direct observation of extensive adaptation and divergence followed by inference of structural, functional, and environmental constraints for any selectable gene. To enable this approach, we developed a new orthogonal DNA replication (OrthoRep) system that durably hypermutates chosen genes at a rate of >10⁻⁴ substitutions per base in vivo. When OrthoRep was used to evolve a conditionally essential maladapted enzyme, we obtained thousands of unique multi-mutation sequences with many pairs >60 amino acids apart (>15% divergence), revealing known and new factors influencing enzyme adaptation. The fitness of evolved sequences was not predictable by advanced machine learning models trained on natural variation. We suggest that OrthoRep supports the prospective and systematic discovery of constraints shaping gene evolution, uncovering of new regions in fitness landscapes, and general applications in biomolecular engineering.

11.
Res Sq ; 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37886540

ABSTRACT

As genetic testing has become more accessible and affordable, variants of uncertain significance (VUS) are increasingly identified, and determining whether these variants play causal roles in disease is a major challenge. The known disease-associated Annexin A11 (ANXA11) mutations result in ANXA11 aggregation, alterations in lysosomal-RNA granule co-trafficking, and TDP-43 mis-localization and present as amyotrophic lateral sclerosis or frontotemporal dementia. We identified a novel VUS in ANXA11 (P93S) in a kindred with corticobasal syndrome and unique radiographic features that segregated with disease. We then queried neurodegenerative disorder clinic databases to identify the phenotypic spread of ANXA11 mutations. Multi-modal computational analysis of this variant was performed and the effect of this VUS on ANXA11 function and TDP-43 biology was characterized in iPSC-derived neurons. Single-cell sequencing and proteomic analysis of iPSC-derived neurons and microglia were used to determine the multiomic signature of this VUS. Mutations in ANXA11 were found in association with clinically diagnosed corticobasal syndrome, thereby establishing corticobasal syndrome as part of the ANXA11 clinical spectrum. In iPSC-derived neurons expressing mutant ANXA11, we found decreased colocalization of lysosomes and decreased neuritic RNA as well as decreased nuclear TDP-43 and increased formation of cryptic exons compared to controls. Multiomic assessment of the P93S variant in iPSC-derived neurons and microglia indicates that the pathogenic omic signature in neurons is modest compared to microglia. Additionally, omic studies reveal that immune dysregulation and interferon signaling pathways in microglia are central to disease. Collectively, these findings identify a new pathogenic variant in ANXA11, expand the range of clinical syndromes caused by ANXA11 mutations, and implicate both neuronal and microglial dysfunction in ANXA11 pathophysiology. This work illustrates the potential for iPSC-derived cellular models to revolutionize the variant annotation process and provides a generalizable approach to determining causality of novel variants across genes.

12.
Nature ; 622(7984): 818-825, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821700

ABSTRACT

Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction are not available early in a pandemic: experimental approaches require host polyclonal antibodies to test against [1-16], and existing computational methods draw heavily from current strain prevalence to make reliable predictions of variants of concern [17-19]. To address this, we developed EVEscape, a generalizable modular framework that combines fitness predictions from a deep learning model of historical sequences with biophysical and structural information. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans or three-dimensional structures of antibody complexes are available. We demonstrate that EVEscape, trained on sequences available before 2020, is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and is generalizable to other viruses including influenza, HIV and understudied viruses with pandemic potential such as Lassa and Nipah. We provide continually revised escape scores for all current strains of SARS-CoV-2 and predict probable further mutations to forecast emerging strains as a tool for continuing vaccine development (evescape.org).


Subject(s)
Evolution, Molecular , Forecasting , Immune Evasion , Mutation , Pandemics , Viruses , Humans , Drug Design , HIV Infections , Immune Evasion/genetics , Immune Evasion/immunology , Influenza, Human , Lassa virus , Nipah Virus , SARS-CoV-2/genetics , SARS-CoV-2/immunology , Viral Vaccines/immunology , Viruses/genetics , Viruses/immunology
14.
Genome Biol ; 24(1): 147, 2023 07 03.
Article in English | MEDLINE | ID: mdl-37394429

ABSTRACT

Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.


Subject(s)
Genetic Variation , Genomics , Humans , Genome, Human , High-Throughput Nucleotide Sequencing , Precision Medicine
15.
bioRxiv ; 2023 May 24.
Article in English | MEDLINE | ID: mdl-37292845

ABSTRACT

The orphan G protein-coupled receptor (GPCR) GPR161 is enriched in primary cilia, where it plays a central role in suppressing Hedgehog signaling [1]. GPR161 mutations lead to developmental defects and cancers [2-4]. The fundamental basis of how GPR161 is activated, including potential endogenous activators and pathway-relevant signal transducers, remains unclear. To elucidate GPR161 function, we determined a cryogenic-electron microscopy structure of active GPR161 bound to the heterotrimeric G protein complex Gs. This structure revealed an extracellular loop 2 that occupies the canonical GPCR orthosteric ligand pocket. Furthermore, we identify a sterol that binds to a conserved extrahelical site adjacent to transmembrane helices 6 and 7 and stabilizes a GPR161 conformation required for Gs coupling. Mutations that prevent sterol binding to GPR161 suppress cAMP pathway activation. Surprisingly, these mutants retain the ability to suppress GLI2 transcription factor accumulation in cilia, a key function of ciliary GPR161 in Hedgehog pathway suppression. By contrast, a protein kinase A-binding site in the GPR161 C-terminus is critical in suppressing GLI2 ciliary accumulation. Our work highlights how unique structural features of GPR161 interface with the Hedgehog pathway and sets a foundation to understand the broader role of GPR161 function in other signaling pathways.

16.
Nat Med ; 29(5): 1113-1122, 2023 05.
Article in English | MEDLINE | ID: mdl-37156936

ABSTRACT

Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection. In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet). For cancer occurrence within 36 months, the performance of the best DNPR model has area under the receiver operating characteristic (AUROC) curve = 0.88 and decreases to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis are excluded from training, with an estimated relative risk of 59 for 1,000 highest-risk patients older than age 50 years. Cross-application of the Danish model to US-VA data had lower performance (AUROC = 0.71), and retraining was needed to improve performance (AUROC = 0.78, AUROC (3m) = 0.76). These results improve the ability to design realistic surveillance programs for patients at elevated risk, potentially benefiting lifespan and quality of life by early detection of this aggressive cancer.
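The AUROC values quoted in this abstract can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. A minimal sketch of that pairwise definition follows; the labels and scores are made-up toy values, the function name is hypothetical, and this is not the CancerRiskNet evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: sequence of 0 (non-case) or 1 (case).
    scores: predicted risk scores, same length as labels.
    Ties between a case and a non-case count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: three of the four case/non-case pairs are ranked correctly.
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The quadratic loop is fine for a toy example; for cohorts of millions of patients a rank-based computation would be used instead.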


Subject(s)
Deep Learning , Pancreatic Neoplasms , Humans , Middle Aged , Artificial Intelligence , Quality of Life , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/epidemiology , Algorithms , Pancreatic Neoplasms
17.
bioRxiv ; 2023 May 09.
Article in English | MEDLINE | ID: mdl-37214973

ABSTRACT

Designing optimized proteins is important for a range of practical applications. Protein design is a rapidly developing field that would benefit from approaches that enable many changes in the amino acid primary sequence, rather than a small number of mutations, while maintaining structure and enhancing function. Homologous protein sequences contain extensive information about various protein properties and activities that have emerged over billions of years of evolution. Evolutionary models of sequence co-variation, derived from a set of homologous sequences, have proven effective in a range of applications including structure determination and mutation effect prediction. In this work we apply one of these models (EVcouplings) to computationally design highly divergent variants of the model protein TEM-1 β-lactamase, and characterize these designs experimentally using multiple biochemical and biophysical assays. Nearly all designed variants were functional, including one with 84 mutations from the nearest natural homolog. Surprisingly, all functional designs had large increases in thermostability and most had a broadening of available substrates. These property enhancements occurred while maintaining a nearly identical structure to the wild type enzyme. Collectively, this work demonstrates that evolutionary models of sequence co-variation (1) are able to capture complex epistatic interactions that successfully guide large sequence departures from natural contexts, and (2) can be applied to generate functional diversity useful for many applications in protein design.

18.
Nat Chem Biol ; 19(8): 1013-1021, 2023 08.
Article in English | MEDLINE | ID: mdl-37081311

ABSTRACT

The relaxin family peptide receptor 1 (RXFP1) is the receptor for relaxin-2, an important regulator of reproductive and cardiovascular physiology. RXFP1 is a multi-domain G protein-coupled receptor (GPCR) with an ectodomain consisting of a low-density lipoprotein receptor class A (LDLa) module and leucine-rich repeats. The mechanism of RXFP1 signal transduction is clearly distinct from that of other GPCRs, but remains very poorly understood. In the present study, we determine the cryo-electron microscopy structure of active-state human RXFP1, bound to a single-chain version of the endogenous agonist relaxin-2 and the heterotrimeric Gs protein. Evolutionary coupling analysis and structure-guided functional experiments reveal that RXFP1 signals through a mechanism of autoinhibition. Our results explain how an unusual GPCR family functions, providing a path to rational drug development targeting the relaxin receptors.


Subject(s)
Relaxin , Humans , Relaxin/chemistry , Relaxin/metabolism , Cryoelectron Microscopy , Receptors, G-Protein-Coupled/metabolism , Receptors, Peptide/chemistry
19.
Nature ; 615(7951): 300-304, 2023 03.
Article in English | MEDLINE | ID: mdl-36859542

ABSTRACT

Gram-negative bacteria surround their cytoplasmic membrane with a peptidoglycan (PG) cell wall and an outer membrane (OM) with an outer leaflet composed of lipopolysaccharide (LPS) [1]. This complex envelope presents a formidable barrier to drug entry and is a major determinant of the intrinsic antibiotic resistance of these organisms [2]. The biogenesis pathways that build the surface are also targets of many of our most effective antibacterial therapies [3]. Understanding the molecular mechanisms underlying the assembly of the Gram-negative envelope therefore promises to aid the development of new treatments effective against the growing problem of drug-resistant infections. Although the individual pathways for PG and OM synthesis and assembly are well characterized, almost nothing is known about how the biogenesis of these essential surface layers is coordinated. Here we report the discovery of a regulatory interaction between the committed enzymes for the PG and LPS synthesis pathways in the Gram-negative pathogen Pseudomonas aeruginosa. We show that the PG synthesis enzyme MurA interacts directly and specifically with the LPS synthesis enzyme LpxC. Moreover, MurA was shown to stimulate LpxC activity in cells and in a purified system. Our results support a model in which the assembly of the PG and OM layers in many proteobacterial species is coordinated by linking the activities of the committed enzymes in their respective synthesis pathways.


Subject(s)
Bacterial Outer Membrane , Cell Wall , Pseudomonas aeruginosa , Cell Wall/metabolism , Lipopolysaccharides/metabolism , Bacterial Outer Membrane/chemistry , Bacterial Outer Membrane/metabolism , Pseudomonas aeruginosa/cytology , Pseudomonas aeruginosa/enzymology , Pseudomonas aeruginosa/metabolism , Peptidoglycan/biosynthesis , Peptidoglycan/metabolism
20.
Nat Commun ; 13(1): 7554, 2022 12 07.
Article in English | MEDLINE | ID: mdl-36477674

ABSTRACT

Antibodies are essential biological research tools and important therapeutic agents, but some exhibit non-specific binding to off-target proteins and other biomolecules. Such polyreactive antibodies compromise screening pipelines, lead to incorrect and irreproducible experimental results, and are generally intractable for clinical development. Here, we design a set of experiments using a diverse naïve synthetic camelid antibody fragment (nanobody) library to enable machine learning models to accurately assess polyreactivity from protein sequence (AUC > 0.8). Moreover, our models provide quantitative scoring metrics that predict the effect of amino acid substitutions on polyreactivity. We experimentally test our models' performance on three independent nanobody scaffolds, where over 90% of predicted substitutions successfully reduced polyreactivity. Importantly, the models allow us to diminish the polyreactivity of an angiotensin II type I receptor antagonist nanobody, without compromising its functional properties. We provide a companion web-server that offers a straightforward means of predicting polyreactivity and polyreactivity-reducing mutations for any given nanobody sequence.


Subject(s)
Immunoglobulin Fragments