1.
Cell ; 175(6): 1449-1451, 2018 11 29.
Article in English | MEDLINE | ID: mdl-30500528

ABSTRACT

This year, the Nobel Prize in Chemistry was awarded to three pioneering scientists who applied laboratory evolution for protein engineering: Frances Arnold, George P. Smith, and Sir Gregory P. Winter. This approach has had major impact in various applications and inspires the search for the general principles of design through evolution.


Subject(s)
Directed Molecular Evolution , Nobel Prize , Protein Engineering , Directed Molecular Evolution/methods , Directed Molecular Evolution/trends , Humans , Protein Engineering/methods , Protein Engineering/trends
2.
Cell ; 166(2): 468-480, 2016 Jul 14.
Article in English | MEDLINE | ID: mdl-27321669

ABSTRACT

Proteins display the capacity for adaptation to new functions, a property critical for evolvability. But what structural principles underlie the capacity for adaptation? Here, we show that adaptation to a physiologically distinct class of ligand specificity in a PSD95, DLG1, ZO-1 (PDZ) domain preferentially occurs through class-bridging intermediate mutations located distant from the ligand-binding site. These mutations provide a functional link between ligand classes and demonstrate the principle of "conditional neutrality" in mediating evolutionary adaptation. Structures show that class-bridging mutations work allosterically to open up conformational plasticity at the active site, permitting novel functions while retaining existing function. More generally, the class-bridging phenotype arises from mutations in an evolutionarily conserved network of coevolving amino acids in the PDZ family (the sector) that connects the active site to distant surface sites. These findings introduce the concept that allostery in proteins could have its origins not in protein function but in the capacity to adapt.


Subject(s)
Evolution, Molecular , Intracellular Signaling Peptides and Proteins/chemistry , Intracellular Signaling Peptides and Proteins/genetics , Membrane Proteins/chemistry , Membrane Proteins/genetics , Allosteric Regulation , Animals , Disks Large Homolog 4 Protein , Intracellular Signaling Peptides and Proteins/metabolism , Membrane Proteins/metabolism , Models, Molecular , Mutation , Protein Domains , Protein Engineering , Rats
3.
Cell ; 160(5): 882-892, 2015 Feb 26.
Article in English | MEDLINE | ID: mdl-25723163

ABSTRACT

Evolvability, the capacity to generate beneficial heritable variation, is a central property of biological systems. However, its origins and modulation by environmental factors have not been examined systematically. Here, we analyze the fitness effects of all single mutations in TEM-1 β-lactamase (4,997 variants) under selection for the wild-type function (ampicillin resistance) and for a new function (cefotaxime resistance). Tolerance to mutation in this enzyme is bimodal and dependent on the strength of purifying selection in vivo, a result that derives from a steep non-linear ampicillin-dependent relationship between biochemical activity and fitness. Interestingly, cefotaxime resistance emerges from mutations that are neutral at low levels of ampicillin but deleterious at high levels; thus the capacity to evolve new function also depends on the strength of selection. The key property controlling evolvability is an excess of enzymatic activity relative to the strength of selection, suggesting that fluctuating environments might select for high-activity enzymes.


Subject(s)
Ampicillin Resistance , Cefotaxime/pharmacology , Directed Molecular Evolution , Escherichia coli/drug effects , Escherichia coli/genetics , beta-Lactamases/genetics , Ampicillin/pharmacology , Escherichia coli/enzymology , Genetic Fitness , Mutation , beta-Lactam Resistance , beta-Lactamases/chemistry
4.
Cell ; 147(7): 1564-75, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22196731

ABSTRACT

Recent work indicates a general architecture for proteins in which sparse networks of physically contiguous and coevolving amino acids underlie basic aspects of structure and function. These networks, termed sectors, are spatially organized such that active sites are linked to many surface sites distributed throughout the structure. Using the metabolic enzyme dihydrofolate reductase as a model system, we show that: (1) the sector is strongly correlated to a network of residues undergoing millisecond conformational fluctuations associated with enzyme catalysis, and (2) sector-connected surface sites are statistically preferred locations for the emergence of allosteric control in vivo. Thus, sectors represent an evolutionarily conserved "wiring" mechanism that can enable perturbations at specific surface positions to rapidly initiate conformational control over protein function. These findings suggest that sectors enable the evolution of intermolecular communication and regulation.


Subject(s)
Allosteric Regulation , Escherichia coli/enzymology , Models, Molecular , Proteins/chemistry , Escherichia coli/metabolism , Evolution, Molecular , PDZ Domains , Proteins/genetics , Proteins/metabolism , Tetrahydrofolate Dehydrogenase/chemistry , Tetrahydrofolate Dehydrogenase/genetics , Tetrahydrofolate Dehydrogenase/metabolism
5.
Cell ; 138(4): 774-86, 2009 Aug 21.
Article in English | MEDLINE | ID: mdl-19703402

ABSTRACT

Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term "protein sectors." Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories.


Subject(s)
Evolution, Molecular , Serine Endopeptidases/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Amino Acids/genetics , Amino Acids/metabolism , Animals , Conserved Sequence , Enzyme Stability , Humans , Models, Molecular , Rats , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism
6.
Lancet Oncol ; 24(1): 22-32, 2023 01.
Article in English | MEDLINE | ID: mdl-36603919

ABSTRACT

BACKGROUND: Population-based cancer survival is a key measurement of cancer control performance linked to diagnosis and treatment, but benchmarking studies that include lower-income settings and that link results to health systems and human development are scarce. SURVCAN-3 is an international collaboration of population-based cancer registries that aims to benchmark timely and comparable cancer survival estimates in Africa, central and south America, and Asia. METHODS: In SURVCAN-3, population-based cancer registries from Africa, central and south America, and Asia were invited to contribute data. Quality control and data checks were carried out in collaboration with population-based cancer registries and, where applicable, active follow-up was performed at the registry. Patient-level data (sex, age at diagnosis, date of diagnosis, morphology and topography, stage, vital status, and date of death or last contact) were included, comprising patients diagnosed between Jan 1, 2008, and Dec 31, 2012, and followed up for at least 2 years (until Dec 31, 2014). Age-standardised net survival (survival where cancer was the only possible cause of death), with 95% CIs, at 1 year, 3 years, and 5 years after diagnosis were calculated using Pohar-Perme estimators for 15 major cancers. 1-year, 3-year, and 5-year net survival estimates were stratified by countries within continents (Africa, central and south America, and Asia), and countries according to the four-tier Human Development Index (HDI; low, medium, high, and very high). FINDINGS: 1 400 435 cancer cases from 68 population-based cancer registries in 32 countries were included. Net survival varied substantially between countries and world regions, with estimates steadily rising with increasing levels of the HDI. 
Across the included cancer types, countries within the lowest HDI category (eg, Côte d'Ivoire) had a maximum 3-year net survival of 54·6% (95% CI 33·3-71·6; prostate cancer), whereas those within the highest HDI categories (eg, Israel) had a maximum survival of 96·8% (96·1-97·3; prostate cancer). Three distinct groups with varying outcomes by country and HDI dependent on cancer type were identified: cancers with low median 3-year net survival (<30%) and small differences by HDI category (eg, lung and stomach), cancers with intermediate median 3-year net survival (30-79%) and moderate difference by HDI (eg, cervix and colorectum), and cancers with high median 3-year net survival (≥80%) and large difference by HDI (eg, breast and prostate). INTERPRETATION: Disparities in cancer survival across countries were linked to a country's developmental position, and the availability and efficiency of health services. These data can inform policy makers on priorities in cancer control to reduce apparent inequality in cancer outcome. FUNDING: Tata Memorial Hospital, the Martin-Luther-University Halle-Wittenberg, and the International Agency for Research on Cancer.


Subject(s)
Benchmarking , Prostatic Neoplasms , Male , Female , Humans , Breast , Income , Africa, Central , Registries
7.
Proc Natl Acad Sci U S A ; 117(33): 19879-19887, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32747536

ABSTRACT

The ribosome translates the genetic code into proteins in all domains of life. Its size and complexity demand long-range interactions that regulate ribosome function. These interactions are largely unknown. Here, we apply a global coevolution method, statistical coupling analysis (SCA), to identify coevolving residue networks (sectors) within the 23S ribosomal RNA (rRNA) of the large ribosomal subunit. As in proteins, SCA reveals a hierarchical organization of evolutionary constraints with near-independent groups of nucleotides forming physically contiguous networks within the three-dimensional structure. Using a quantitative, continuous-culture-with-deep-sequencing assay, we confirm that the top two SCA-predicted sectors contribute to ribosome function. These sectors map to distinct ribosome activities, and their origins trace to phylogenetic divergences across all domains of life. These findings provide a foundation to map ribosome allostery, explore ribosome biogenesis, and engineer ribosomes for new functions. Despite differences in chemical structure, protein and RNA enzymes appear to share a common internal logic of interaction and assembly.


Subject(s)
Escherichia coli/genetics , RNA, Bacterial/chemistry , RNA, Ribosomal, 23S/chemistry , Ribosomes/genetics , Escherichia coli/chemistry , Escherichia coli/metabolism , Evolution, Molecular , Nucleic Acid Conformation , Phylogeny , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Ribosomal, 23S/genetics , RNA, Ribosomal, 23S/metabolism , Ribosomes/chemistry , Ribosomes/metabolism
8.
Nature ; 540(7633): 400-405, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27926732

ABSTRACT

The internal mechanics of proteins (the coordinated motions of amino acids and the pattern of forces constraining these motions) connects protein structure to function. Here we describe a new method combining the application of strong electric field pulses to protein crystals with time-resolved X-ray crystallography to observe conformational changes in spatial and temporal detail. Using a human PDZ domain (LNX2PDZ2) as a model system, we show that protein crystals tolerate electric field pulses strong enough to drive concerted motions on the sub-microsecond timescale. The induced motions are subtle, involve diverse physical mechanisms, and occur throughout the protein structure. The global pattern of electric-field-induced motions is consistent with both local and allosteric conformational changes naturally induced by ligand binding, including at conserved functional sites in the PDZ domain family. This work lays the foundation for comprehensive experimental study of the mechanical basis of protein function.


Subject(s)
Crystallography, X-Ray/methods , Electricity , Movement , PDZ Domains , Proteins/chemistry , Proteins/metabolism , Allosteric Regulation , Biomechanical Phenomena , Humans , Ligands , Models, Molecular , Structure-Activity Relationship
9.
Phys Biol ; 18(4)2021 05 17.
Article in English | MEDLINE | ID: mdl-33477124

ABSTRACT

Biological organisms experience constantly changing environments, from sudden changes in physiology brought about by feeding, to the regular rising and setting of the Sun, to ecological changes over evolutionary timescales. Living organisms have evolved to thrive in this changing world but the general principles by which organisms shape and are shaped by time varying environments remain elusive. Our understanding is particularly poor in the intermediate regime with no separation of timescales, where the environment changes on the same timescale as the physiological or evolutionary response. Experiments to systematically characterize the response to dynamic environments are challenging since such environments are inherently high dimensional. This roadmap deals with the unique role played by time varying environments in biological phenomena across scales, from physiology to evolution, seeking to emphasize the commonalities and the challenges faced in this emerging area of research.


Subject(s)
Biological Evolution , Environment , Physiological Phenomena , Time Factors
10.
Pediatr Blood Cancer ; 67(9): e28532, 2020 09.
Article in English | MEDLINE | ID: mdl-32568452

ABSTRACT

BACKGROUND: Breakthrough chemotherapy-induced vomiting (CIV) is defined as CIV occurring after adequate antiemetic prophylaxis. Olanzapine and metoclopramide are two drugs recommended for the treatment of breakthrough CIV in children, without adequate evidence. We conducted an open-label, single-center, phase 3 randomized controlled trial comparing the safety and efficacy of olanzapine and metoclopramide for treating breakthrough CIV. PROCEDURE: Children aged 5-18 years who developed breakthrough CIV after receiving highly emetogenic chemotherapy or moderately emetogenic chemotherapy were randomly assigned to the metoclopramide or olanzapine arm. The primary objective of the study was to compare the complete response (CR) rates between patients receiving olanzapine or metoclopramide for treating breakthrough CIV during 72 hours after the administration of the study drug. Secondary objectives were to compare CR rates for nausea and toxicities between the two arms. RESULTS: Eighty patients were analyzed (39 in the olanzapine arm and 41 in the metoclopramide arm). CR rates were significantly higher in the olanzapine arm compared with the metoclopramide arm for vomiting (72% vs 39%, P = 0.003) and nausea (59% vs 34%, P = 0.026). Seven patients in the metoclopramide arm crossed over to the olanzapine arm and none crossed over in the olanzapine arm (P < 0.001). The mean nausea score in the olanzapine arm was significantly lower than the metoclopramide arm after the initiation of the rescue antiemetic (P = 0.01). Hyperglycemia and drowsiness were more commonly seen in the olanzapine arm. CONCLUSION: Olanzapine is superior to metoclopramide for the treatment of breakthrough CIV in children. Drowsiness and hyperglycemia need to be monitored closely in children receiving olanzapine for breakthrough CIV.


Subject(s)
Antiemetics/therapeutic use , Antineoplastic Combined Chemotherapy Protocols/adverse effects , Metoclopramide/therapeutic use , Neoplasms/drug therapy , Olanzapine/therapeutic use , Vomiting/drug therapy , Adolescent , Child , Child, Preschool , Female , Follow-Up Studies , Humans , Male , Neoplasms/pathology , Prognosis , Vomiting/chemically induced , Vomiting/pathology
11.
Nature ; 491(7422): 138-42, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23041932

ABSTRACT

Statistical analysis of protein evolution suggests a design for natural proteins in which sparse networks of coevolving amino acids (termed sectors) comprise the essence of three-dimensional structure and function. However, proteins are also subject to pressures deriving from the dynamics of the evolutionary process itself: the ability to tolerate mutation and to be adaptive to changing selection pressures. To understand the relationship of the sector architecture to these properties, we developed a high-throughput quantitative method for a comprehensive single-mutation study in which every position is substituted individually to every other amino acid. Using a PDZ domain (PSD95(pdz3)) model system, we show that sector positions are functionally sensitive to mutation, whereas non-sector positions are more tolerant to substitution. In addition, we find that adaptation to a new binding specificity initiates exclusively through variation within sector residues. A combination of just two sector mutations located near and away from the ligand-binding site suffices to switch the binding specificity of PSD95(pdz3) quantitatively towards a class-switching ligand. The localization of functional constraint and adaptive variation within the sector has important implications for understanding and engineering proteins.


Subject(s)
Adaptation, Physiological , Amino Acid Substitution , Mutant Proteins/chemistry , PDZ Domains/genetics , PDZ Domains/physiology , Proteins/chemistry , Proteins/metabolism , Adaptation, Physiological/genetics , Adaptation, Physiological/physiology , Amino Acid Sequence , Binding Sites/genetics , Evolution, Molecular , Ligands , Models, Molecular , Molecular Sequence Data , Mutant Proteins/genetics , Mutant Proteins/metabolism , Mutation , Proteins/genetics
12.
PLoS Comput Biol ; 12(6): e1004817, 2016 06.
Article in English | MEDLINE | ID: mdl-27254668

ABSTRACT

The essential biological properties of proteins (folding, biochemical activities, and the capacity to adapt) arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment, a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation.


Subject(s)
Evolution, Molecular , GTP-Binding Proteins/chemistry , GTP-Binding Proteins/chemical synthesis , Models, Chemical , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods , Algorithms , Binding Sites , Computer Simulation , GTP-Binding Proteins/ultrastructure , Protein Binding , Sequence Alignment/methods
13.
bioRxiv ; 2024 Mar 24.
Article in English | MEDLINE | ID: mdl-37461732

ABSTRACT

Proteins are molecular machines and to understand how they work, we need to understand how they move. New pump-probe time-resolved X-ray diffraction methods open up ways to initiate and observe protein motions with atomistic detail in crystals on biologically relevant timescales. However, practical limitations of these experiments demand parallel development of effective molecular dynamics approaches to accelerate progress and extract meaning. Here, we establish robust and accurate methods for simulating dynamics in protein crystals, a nontrivial process requiring careful attention to equilibration, environmental composition, and choice of force fields. With more than seven milliseconds of sampling of a single chain, we identify critical factors controlling agreement between simulation and experiments and show that simulated motions recapitulate ligand-induced conformational changes. This work enables a virtuous cycle between simulation and experiments for visualizing and understanding the basic functional motions of proteins.

14.
Nat Commun ; 15(1): 3244, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622111

ABSTRACT

Proteins are molecular machines and to understand how they work, we need to understand how they move. New pump-probe time-resolved X-ray diffraction methods open up ways to initiate and observe protein motions with atomistic detail in crystals on biologically relevant timescales. However, practical limitations of these experiments demand parallel development of effective molecular dynamics approaches to accelerate progress and extract meaning. Here, we establish robust and accurate methods for simulating dynamics in protein crystals, a nontrivial process requiring careful attention to equilibration, environmental composition, and choice of force fields. With more than seven milliseconds of sampling of a single chain, we identify critical factors controlling agreement between simulation and experiments and show that simulated motions recapitulate ligand-induced conformational changes. This work enables a virtuous cycle between simulation and experiments for visualizing and understanding the basic functional motions of proteins.


Subject(s)
Molecular Dynamics Simulation , Proteins , Proteins/metabolism , X-Ray Diffraction , Protein Conformation
15.
Struct Dyn ; 11(1): 014301, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38304444

ABSTRACT

A major goal in biomedical science is to move beyond static images of proteins and other biological macromolecules to the internal dynamics underlying their function. This level of study is necessary to understand how these molecules work and to engineer new functions and modulators of function. Stemming from a visionary commitment to this problem by Keith Moffat decades ago, a community of structural biologists has now enabled a set of x-ray scattering technologies for observing intramolecular dynamics in biological macromolecules at atomic resolution and over the broad range of timescales over which motions are functionally relevant. Many of these techniques are provided by BioCARS, a cutting-edge synchrotron radiation facility built under Moffat's leadership and located at the Advanced Photon Source at Argonne National Laboratory. BioCARS enables experimental studies of molecular dynamics with time resolutions spanning from 100 ps to seconds and provides both time-resolved x-ray crystallography and small- and wide-angle x-ray scattering. Structural changes can be initiated by several methods: UV/Vis pumping with tunable picosecond and nanosecond laser pulses, substrate diffusion, and global perturbations, such as electric field and temperature jumps. Studies of dynamics typically involve subtle perturbations to molecular structures, requiring specialized computational techniques for data processing and interpretation. In this review, we present the challenges in experimental macromolecular dynamics and describe the current state of experimental capabilities at this facility. As Moffat imagined years ago, BioCARS is now positioned to catalyze the scientific community to make fundamental advances in understanding proteins and other complex biological macromolecules.

16.
Vaccines (Basel) ; 12(2)2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38400096

ABSTRACT

Autologous dendritic cell (DC)-based immunotherapy is a cell-based advanced therapy medicinal product (ATMP) that was first introduced more than three decades ago. In the current study, our objective was to establish a harmonized protocol using two varied antigenic sources and a good manufacturing practice (GMP)-compliant, manual method for generating clinical-grade DCs in a limited-resource academic setting. After obtaining ethical committee-approved informed consent, the recruited patients underwent leukapheresis, and single-batch DC production was carried out. Using responder-independent flow cytometric assays as quality control (QC) criteria, we propose a differentiation and maturation index (DI and MI, respectively), calculated with the QC cut-off and actual scores of each batch for comparison. Changes during cryopreservation and personnel variation were assessed periodically for up to two to three years. Using our harmonized batch production protocol, the average DI was 1.39 and MI was 1.25. Allogenic responder proliferation was observed in all patients, while IFN-gamma secretion, evaluated using flow cytometry, was detected in 10/36 patients and significantly correlated with CD8+ T cell proliferation (p = 0.0002). Tracking the viability and phenotype of cryopreserved MDCs showed a >90% viability for up to three years, while a mature DC phenotype was retained for up to one year. Our results confirm that the manual/semi-automated protocol was simple, consistent, and cost-effective, without the requirement for expensive equipment and without compromising on the quality of the final product.

17.
Nat Cell Biol ; 8(6): 571-80, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16699502

ABSTRACT

Cellular information processing requires the coordinated activity of a large network of intracellular signalling pathways. Cross-talk between pathways provides for complex non-linear responses to combinations of stimuli, but little is known about the density of these interactions in any specific cell. Here, we have analysed a large-scale survey of pathway interactions carried out by the Alliance for Cellular Signalling (AfCS) in RAW 264.7 macrophages. Twenty-two receptor-specific ligands were studied, both alone and in all pairwise combinations, for Ca2+ mobilization, cAMP synthesis, phosphorylation of many signalling proteins and for cytokine production. A large number of non-additive interactions are evident that are consistent with known mechanisms of cross-talk between pathways, but many novel interactions are also revealed. A global analysis of cross-talk suggests that many external stimuli converge on a relatively small number of interaction mechanisms to provide for context-dependent signalling.


Subject(s)
Receptor Cross-Talk , Signal Transduction , Animals , Calcium Signaling , Cluster Analysis , Cyclic AMP/biosynthesis , Cytokines/biosynthesis , Ligands , Macrophages , Mice , Phosphorylation
18.
Cell Syst ; 14(3): 210-219.e7, 2023 03 15.
Article in English | MEDLINE | ID: mdl-36693377

ABSTRACT

Protein structure, function, and evolution depend on local and collective epistatic interactions between amino acids. A powerful approach to defining these interactions is to construct models of couplings between amino acids that reproduce the empirical statistics (frequencies and correlations) observed in sequences comprising a protein family. The top couplings are then interpreted. Here, we show that as currently implemented, this inference unequally represents epistatic interactions, a problem that fundamentally arises from limited sampling of sequences in the context of distinct scales at which epistasis occurs in proteins. We show that these issues explain the ability of current approaches to predict tertiary contacts between amino acids and the inability to obviously expose larger networks of functionally relevant, collectively evolving residues called sectors. This work provides a necessary foundation for more deeply understanding and improving evolution-based models of proteins.


Subject(s)
Amino Acids , Proteins , Proteins/metabolism
19.
ACS Synth Biol ; 12(12): 3544-3561, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37988083

ABSTRACT

Deep generative models (DGMs) have shown great success in the understanding and data-driven design of proteins. Variational autoencoders (VAEs) are a popular DGM approach that can learn the correlated patterns of amino acid mutations within a multiple sequence alignment (MSA) of protein sequences and distill this information into a low-dimensional latent space to expose phylogenetic and functional relationships and guide generative protein design. Autoregressive (AR) models are another popular DGM approach that typically lacks a low-dimensional latent embedding but does not require training sequences to be aligned into an MSA and enables the design of variable length proteins. In this work, we propose ProtWave-VAE as a novel and lightweight DGM, employing an information maximizing VAE with a dilated convolution encoder and an autoregressive WaveNet decoder. This architecture blends the strengths of the VAE and AR paradigms in enabling training over unaligned sequence data and the conditional generative design of variable length sequences from an interpretable, low-dimensional learned latent space. We evaluated the model's ability to infer patterns and design rules within alignment-free homologous protein family sequences and to design novel synthetic proteins in four diverse protein families. We show that our model can infer meaningful functional and phylogenetic embeddings within latent spaces and make highly accurate predictions within semisupervised downstream fitness prediction tasks. In an application to the C-terminal SH3 domain in the Sho1 transmembrane osmosensing receptor in baker's yeast, we subject ProtWave-VAE-designed sequences to experimental gene synthesis and select-seq assays for the osmosensing function to show that the model enables synthetic protein design, conditional C-terminus diversification, and engineering of the osmosensing function into SH3 paralogues.


Subject(s)
Genetic Techniques , Proteins , Phylogeny , Mutation , Amino Acid Sequence