Results 1 - 20 of 69
1.
J Mol Biol ; : 168791, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39260686

ABSTRACT

The vastness of the unexplored protein fold universe remains a significant question. Through systematic de novo design of proteins with novel αβ-folds, we demonstrated that nature has explored only a tiny portion of the possible folds. Numerous possible protein folds are still untouched by nature. This review outlines this study and discusses the prospects for the design of functional proteins with novel folds.

2.
Curr Issues Mol Biol ; 46(9): 10590-10605, 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39329979

ABSTRACT

Phage display has been widely used to identify peptides binding to a variety of biological targets. In the current work, we planned to select novel peptides targeting CD4 through screening of a commercial phage display library (New England Biolabs Ph.D.™-7). After three rounds of biopanning, 57 phage clones were Sanger-sequenced. These clones represented 30 unique peptide sequences, which were subjected to phage ELISA, resulting in the identification of two potential target binders. Following peptide synthesis, downstream characterization was conducted using a fluorescence plate-based assay, flow cytometry, SPR, and confocal microscopy. The results revealed that neither of the peptides identified in the Sanger-based phage display selection exhibited specific binding toward CD4. The naïve library and the phage pool recovered from the third round of biopanning were then subjected to next-generation sequencing (NGS). The results of NGS indicated corruption of the selection output by a phage already known as a fast-propagating clone whose target-unrelated enrichment can shed light on the misidentification of target-binding peptides through phage display. This work provides an in-depth insight into some of the challenges encountered in peptide phage display selection. Furthermore, our data highlight that NGS, by exploring a broader sequence space and providing a more precise picture of the composition of biopanning output, can be used to refine the selection protocol and avoid misleading the process of ligand identification. We hope that these findings can describe some of the complexities of phage display selection and offer help to fellow researchers who have faced similar situations.

3.
Genome Biol Evol ; 16(8), 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39212966

ABSTRACT

During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine learning based tools and found that most de novo proteins are indeed different from conserved proteins both in their structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.


Subject(s)
Drosophila Proteins , Animals , Drosophila Proteins/genetics , Drosophila Proteins/chemistry , Drosophila Proteins/metabolism , Evolution, Molecular , Machine Learning , Drosophila/genetics , Drosophila melanogaster/genetics , Protein Folding , Biomolecular Condensates/metabolism , Biomolecular Condensates/chemistry
4.
bioRxiv ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38798671

ABSTRACT

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
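The idea of a gauge freedom can be illustrated with the simplest case mentioned in the abstract, an additive model. The sketch below is not taken from the paper: the four-letter alphabet, the parameter values, and the helper names `predict` and `fix_zero_sum_gauge` are all assumptions made for illustration. It shifts each position's parameters so they sum to zero (one commonly used gauge) and checks that predictions are unchanged.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet, chosen for illustration only


def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per (position, character)."""
    return theta0 + sum(theta[pos][ch] for pos, ch in enumerate(seq))


def fix_zero_sum_gauge(theta0, theta):
    """Move each position's mean into the constant term so every position sums to zero."""
    new_theta0 = theta0
    new_theta = []
    for row in theta:
        mean = sum(row.values()) / len(row)
        new_theta0 += mean
        new_theta.append({ch: value - mean for ch, value in row.items()})
    return new_theta0, new_theta


# Hypothetical parameter values for a length-2 sequence.
theta0 = 0.5
theta = [
    {"A": 1.0, "C": 0.0, "G": -1.0, "T": 2.0},
    {"A": 0.3, "C": 0.3, "G": 0.9, "T": -0.5},
]

fixed0, fixed = fix_zero_sum_gauge(theta0, theta)

# Largest change in prediction over all 16 sequences; should be zero up to rounding.
max_diff = max(
    abs(predict(theta0, theta, s) - predict(fixed0, fixed, s))
    for s in product(ALPHABET, repeat=2)
)
print(max_diff)
```

The parameter values change, but every prediction stays the same, which is why the raw values cannot be interpreted until a gauge has been chosen.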

5.
Elife ; 12, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767330

ABSTRACT

A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.


Subject(s)
Epistasis, Genetic , Evolution, Molecular , Transcription Factors/metabolism , Transcription Factors/genetics , DNA/genetics , DNA/metabolism , Mutation , Protein Binding
6.
bioRxiv ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38798625

ABSTRACT

Quantitative models that describe how biological sequences encode functional activities are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e., directions in parameter space that do not affect model predictions. In physics, gauge freedoms arise when physical theories are formulated in ways that respect fundamental symmetries. However, the connections that gauge freedoms in models of sequence-function relationships have to the symmetries of sequence space have yet to be systematically studied. Here we study the gauge freedoms of models that respect a specific symmetry of sequence space: the group of position-specific character permutations. We find that gauge freedoms arise when model parameters transform under redundant irreducible matrix representations of this group. Based on this finding, we describe an "embedding distillation" procedure that enables analytic calculation of the number of independent gauge freedoms, as well as efficient computation of a sparse basis for the space of gauge freedoms. We also study how parameter transformation behavior affects parameter interpretability. We find that in many (and possibly all) nontrivial models, the ability to interpret individual model parameters as quantifying intrinsic allelic effects requires that gauge freedoms be present. This finding establishes an incompatibility between two distinct notions of parameter interpretability. Our work thus advances the understanding of symmetries, gauge freedoms, and parameter interpretability in sequence-function relationships.
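For the familiar one-hot additive and pairwise models, the number of gauge freedoms can be obtained by simple counting: the number of parameters minus the dimension of the space of functions the model can express. The sketch below uses that counting argument, not the paper's "embedding distillation" procedure; the function names and the example values (3 positions, 4 characters) are assumptions for illustration.

```python
from math import comb


def additive_gauge_freedoms(length, alphabet_size):
    """Parameters minus function-space dimension for a one-hot additive model."""
    n_params = 1 + length * alphabet_size
    n_independent = 1 + length * (alphabet_size - 1)
    return n_params - n_independent


def pairwise_gauge_freedoms(length, alphabet_size):
    """Same count for a one-hot model with constant, additive, and pairwise terms."""
    pairs = comb(length, 2)
    n_params = 1 + length * alphabet_size + pairs * alphabet_size**2
    n_independent = (
        1
        + length * (alphabet_size - 1)
        + pairs * (alphabet_size - 1) ** 2
    )
    return n_params - n_independent


# Hypothetical example: 3 positions over a 4-letter alphabet.
print(additive_gauge_freedoms(3, 4))
print(pairwise_gauge_freedoms(3, 4))
```

In the additive case the count equals the number of positions, since each position's parameters can be shifted by a constant that is absorbed into the constant term.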

7.
Appl Environ Microbiol ; 90(1): e0167623, 2024 01 24.
Article in English | MEDLINE | ID: mdl-38179968

ABSTRACT

FAD-dependent pyranose oxidase (POx) and C-glycoside-3-oxidase (CGOx) are both members of the glucose-methanol-choline superfamily of oxidoreductases and belong to the same sequence space. Pyranose oxidases had been studied for their oxidation of monosaccharides such as D-glucose, but recently, a bacterial C-glycoside-3-oxidase that is phylogenetically related to POx and that reacts with C-glycosides such as carminic acid, mangiferin or puerarin has been described. Since these actinobacterial CGOx enzymes belong to the same sequence space as bacterial POx, they must have evolved from the same ancestor. Here, we performed a phylogenetic analysis of actinobacterial sequences and resurrected seven ancestral enzymes of the POx/CGOx sequence space to study the evolutionary trajectory of substrate preferences for monosaccharides and C-glycosides. Clade I, with its dimeric member POx from Kitasatospora aureofaciens, shows strict preference for monosaccharides (D-glucose and D-xylose) and does not react with any of the glycosides tested. No extant member of clade II has been studied to date. The two extant members of clades III and IV, monomeric POx/CGOx from Pseudoarthrobacter siccitolerans and Streptomyces canus, oxidized both monosaccharides and various C-glycosides (homoorientin, isovitexin, mangiferin, and puerarin). Steady-state kinetic parameters of several clades III and IV ancestral enzymes indicate that the generalist ancestor N35 slowly evolved to present-day enzymes with a much higher preference for C-glycosides than monosaccharides. Based on structural predictions of ancestors, we hypothesize that the strict specificity of bacterial clade I POx (and also fungal POx) is the result of oligomerization, which in turn results from the evolution of protein segments that were shown to be important for oligomerization, the arm and the head domain. IMPORTANCE: C-Glycosides often form active compounds in various plants. Breakage of the C-C bond in these glycosides to release the aglycone is challenging and proceeds via a two-step reaction, the oxidation of the sugar and subsequent cleavage of the C-C bond. Recently, an enzyme from a soil bacterium, FAD-dependent C-glycoside-3-oxidase (CGOx), was shown to catalyze the initial oxidation reaction. Here, we show that CGOx belongs to the same sequence space as pyranose oxidase (POx), and that an actinobacterial ancestor of the POx/CGOx family evolved into four clades, two of which show a high preference for C-glycosides.


Subject(s)
Glycosides , Oxidoreductases , Oxidoreductases/metabolism , Phylogeny , Monosaccharides , Glucose/metabolism
8.
Comput Struct Biotechnol J ; 21: 4488-4496, 2023.
Article in English | MEDLINE | ID: mdl-37736300

ABSTRACT

Enzymes are potent catalysts with high specificity and selectivity. To leverage nature's synthetic potential for industrial applications, various protein engineering techniques have emerged that allow the catalytic, biophysical, and molecular recognition properties of enzymes to be tailored. However, the many possible ways a protein can be altered force researchers to carefully balance the exhaustiveness of an enzyme screening campaign against the required resources. Consequently, the optimal engineering strategy is often defined on a case-by-case basis. Strikingly, while predicting mutations that lead to an improved target function is challenging, here we show that the prediction and exclusion of deleterious mutations is a much more straightforward task as analyzed for an engineered carbonic acid anhydrase, a transaminase, a squalene-hopene cyclase and a Kemp eliminase. Combining such a pre-selection of allowed residues with advanced gene synthesis methods opens a path toward an efficient and generalizable library construction approach for protein engineering. To give researchers easy access to this methodology, we provide the website LibGENiE containing the bioinformatic tools for the library design workflow.

9.
J Comput Chem ; 44(22): 1836-1844, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37177839

ABSTRACT

Discovery of target-binding molecules, such as aptamers and peptides, is usually performed with the use of high-throughput experimental screening methods. These methods typically generate large datasets of sequences of target-binding molecules, which can be enriched with high affinity binders. However, the identification of the highest affinity binders from these large datasets often requires additional low-throughput experiments or other approaches. Bioinformatics-based analyses could be helpful to better understand these large datasets and identify the parts of the sequence space enriched with high affinity binders. BinderSpace is an open-source Python package that performs motif analysis, sequence space visualization, clustering analyses, and sequence extraction from clusters of interest. The motif analysis, resulting in text-based and visual output of motifs, can also provide heat maps of previously measured user-defined functional properties for all the motif-containing molecules. Users can also run principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) analyses on whole datasets and on motif-related subsets of the data. Functionally important sequences can also be highlighted in the resulting PCA and t-SNE maps. If points (sequences) in two-dimensional maps in PCA or t-SNE space form clusters, users can perform clustering analyses on their data, and extract sequences from clusters of interest. We demonstrate the use of BinderSpace on a dataset of oligonucleotides binding to single-wall carbon nanotubes in the presence and absence of a bioanalyte, and on a dataset of cyclic peptidomimetics binding to bovine carbonic anhydrase protein. BinderSpace is openly accessible to the public via the GitHub website: https://github.com/vukoviclab/BinderSpace.


Subject(s)
Nanotubes, Carbon , Oligonucleotides , Animals , Cattle , Peptides , Computational Biology , Sequence Analysis , Algorithms
10.
Curr Opin Microbiol ; 74: 102320, 2023 08.
Article in English | MEDLINE | ID: mdl-37075547

ABSTRACT

Viruses are locked in an evolutionary arms race with their hosts. What ultimately determines viral evolvability, or capacity for adaptive evolution, is their ability to efficiently explore and expand sequence space while under the selective regime imposed by their ecology, which includes innate and adaptive host defenses. Viral genomes have significantly higher evolutionary rates than their host counterparts and should have advantages relative to their slower-evolving hosts. However, functional constraints on virus evolutionary landscapes along with the modularity and mutational tolerance of host defense proteins may help offset the advantage conferred to viruses by high evolutionary rates. Additionally, cellular life forms from all domains of life possess many highly complex defense mechanisms that act as hurdles to viral replication. Consequently, viruses constantly probe sequence space through mutation and genetic exchange and are under pressure to optimize diverse counter-defense strategies.


Subject(s)
Evolution, Molecular , Genome, Viral , Genome, Viral/genetics
11.
Viruses ; 15(1), 2023 01 12.
Article in English | MEDLINE | ID: mdl-36680256

ABSTRACT

In the human gut, temperate bacteriophages interact with bacteria through predation and horizontal gene transfer. Relying on taxonomic data, metagenomic studies have associated shifts in phage abundance with a number of human diseases. The temperate bacteriophage VEsP-1 with siphovirus morphology was isolated from a sample of river water using Enterococcus faecalis as a host. Starting from the whole genome sequence of VEsP-1, we retrieved related phage genomes in blastp searches of the tail protein and large terminase sequences, and blastn searches of the whole genome sequences, with matches compiled from several different databases, and visualized a part of viral dark matter sequence space. The genome network and phylogenomic analyses resulted in the proposal of a novel genus "Vespunovirus", consisting of temperate, mainly metagenomic phages infecting Enterococcus spp.


Subject(s)
Bacteriophages , Humans , Enterococcus/genetics , Genome, Viral , Sequence Analysis, DNA , Phylogeny , Myoviridae/genetics
12.
Microbiol Spectr ; 11(1): e0292122, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36625643

ABSTRACT

Recently, a new strategy for attenuating RNA viruses by redirecting their evolution in sequence space was confirmed for Enterovirus and Influenza viruses. Using avian flavivirus as a model, the 69 serine and 53 leucine codons on the E-NS1 genes were modified to change the evolutionary direction of the virus in sequence space. This means that all codons encoding serine or leucine residues were substituted with codons that are only one base different from the three stop codons, resulting in the initial position of the virus genome in sequence space being closer to the detrimental areas to achieve attenuation by reducing viral adaptability. The growth curve and plaque size of CQW1-one-to-stop (CQW1-OTS) were similar to those of CQW1-wild type (CQW1-WT) in vitro, but attenuated proliferation was detected when treated with a mutagenic reagent (ribavirin). However, comparably high CQW1-OTS and CQW1-WT lethality rates were detected in 9-day-old duck embryos and 5-day-old ducklings, suggesting that this strategy works but with limitations. With that in mind, homologous hosts in nonsensitive age (25-day-old ducks) and heterologous hosts (3-week-old Kunming mice) were employed to investigate if CQW1-OTS was attenuated under host selection pressure. Minimal attenuation of CQW1-OTS in older ducks and apparent attenuation in mice were reported, providing reduced viral titers, mild clinical signs, and lower specific infectivity. Collectively, we experimentally demonstrate that the attenuation strategy of redirecting virus evolution in sequence space works for flavivirus. The redirected virus is attenuated only under some outside pressure, such as heterologous hosts or antiviral drug treatment, limiting its usage in flaviviruses. IMPORTANCE: Flaviviruses are medically important arboviruses that threaten public health, but no approved treatments are currently available. Vaccines prevent flavivirus infection. We employed duck Tembusu virus (TMUV), a mosquito-borne flavivirus, to evaluate virus redirection. TMUV is native to birds and could infect mice by intracerebral injection, making it an experimental animal model to study flavivirus characteristics in vivo. The 69 serine and 53 leucine codons on the E-NS1 proteins of CQW1 were synonymously substituted to change the evolutionary direction of the virus in sequence space. In vitro mutagen reagent treatment suppressed CQW1-OTS viral multiplication, but in vivo attenuation depended on host selective pressure. CQW1-OTS viral attenuation was observed in older ducks but not sensitive ducklings; considerable attenuation was also observed in a heterologous host (mice), which provides more selective pressure on viruses. Collectively, these data indicated that important preconditions apply when evaluating whether this strategy has application prospects in novel flavivirus vaccine development.
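The "one-to-stop" substitution described above relies on synonymous codons that sit a single base change away from a stop codon. As a minimal sketch (using the standard genetic code in DNA notation; the function and variable names are assumptions, not the authors' code), the following enumerates which serine and leucine codons have that property:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code, DNA notation.
SERINE_CODONS = ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]
LEUCINE_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]


def single_base_neighbors(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for alt in "ACGT":
            if alt != base:
                yield codon[:i] + alt + codon[i + 1:]


def is_one_to_stop(codon):
    """True if a single point mutation can turn this codon into a stop codon."""
    return any(n in STOP_CODONS for n in single_base_neighbors(codon))


ser_ots = [c for c in SERINE_CODONS if is_one_to_stop(c)]
leu_ots = [c for c in LEUCINE_CODONS if is_one_to_stop(c)]

print(ser_ots)  # ['TCA', 'TCG']
print(leu_ots)  # ['TTA', 'TTG']
```

Recoding serine and leucine positions to these codons leaves the protein sequence unchanged while placing the genome closer to nonsense mutations in sequence space.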


Subject(s)
Flavivirus , Poultry Diseases , Mice , Animals , Leucine/metabolism , Serine/metabolism , Flavivirus/genetics , Mutation , Ducks , Codon/genetics , Codon/metabolism
13.
Curr Top Microbiol Immunol ; 439: 1-94, 2023.
Article in English | MEDLINE | ID: mdl-36592242

ABSTRACT

The landscape paradigm is revisited in the light of evolution in simple systems. A brief overview of different classes of fitness landscapes is followed by a more detailed discussion of the RNA model, which is currently the only evolutionary model that allows for a comprehensive molecular analysis of a fitness landscape. Neutral networks of genotypes are indispensable for the success of evolution. Important insights into the evolutionary mechanism are gained by considering the topology of sequence and shape spaces. The dynamic concept of molecular quasispecies is viewed in the light of the landscape paradigm. The distribution of fitness values in state space is mirrored by the population structures of mutant distributions. Two classes of thresholds for replication error or mutations are important: (i) the conventional genotypic error threshold, which separates ordered replication from random drift on neutral networks, and (ii) a phenotypic error threshold above which the molecular phenotype is lost. Empirical landscapes are reviewed and finally, the implications of the landscape concept for virus evolution are discussed.


Subject(s)
Models, Genetic , Viruses , Genotype , Phenotype , Mutation , RNA/chemistry , RNA/genetics , Viruses/genetics , Evolution, Molecular , Genetic Fitness , Biological Evolution
14.
Elife ; 11, 2022 09 13.
Article in English | MEDLINE | ID: mdl-36098382

ABSTRACT

Low complexity regions (LCRs) play a role in a variety of important biological processes, yet we lack a unified view of their sequences, features, relationships, and functions. Here, we use dotplots and dimensionality reduction to systematically define LCR type/copy relationships and create a map of LCR sequence space capable of integrating LCR features and functions. By defining LCR relationships across the proteome, we provide insight into how LCR type and copy number contribute to higher order assemblies, such as the importance of K-rich LCR copy number for assembly of the nucleolar protein RPA43 in vivo and in vitro. With LCR maps, we reveal the underlying structure of LCR sequence space, and relate differential occupancy in this space to the conservation and emergence of higher order assemblies, including the metazoan extracellular matrix and plant cell wall. Together, LCR relationships and maps uncover and identify scaffold-client relationships among E-rich LCR-containing proteins in the nucleolus, and reveal previously undescribed regions of LCR sequence space with signatures of higher order assemblies, including a teleost-specific T/H-rich sequence space. Thus, this unified view of LCRs enables discovery of how LCRs encode higher order assemblies of organisms.


Subject(s)
Species Specificity , Animals , Humans
15.
Int J Biol Macromol ; 217: 492-505, 2022 Sep 30.
Article in English | MEDLINE | ID: mdl-35841961

ABSTRACT

Conventional drug development strategies typically use pockets in protein structures as drug-target sites. They overlook the plausible effects of protein evolvability and resistant mutations on protein structure, which in turn may impair protein-drug interaction. In this study, we used an integrated evolution and structure guided strategy to develop potential evolutionary-escape resistant therapeutics using the receptor binding domain (RBD) of SARS-CoV-2 spike-protein/S-protein as a model. Deploying an ensemble of sequence space exploratory tools, including co-evolutionary analysis and deep mutational scans, we provide a quantitative insight into the evolutionarily constrained subspace of the RBD sequence-space. Guided by molecular simulation and structure network analysis, we highlight regions inside the RBD that are critical for providing structural integrity and conformational flexibility. Using fuzzy C-means clustering, we combined evolutionary and structural features of RBD and identified a critical region. Subsequently, we used computational drug screening using a library of 1615 small molecules and identified one lead molecule, which is expected to target the identified region, critical for evolvability and structural stability of RBD. This integrated evolution-structure guided strategy to develop evolutionary-escape resistant lead molecules has potential general applications beyond SARS-CoV-2.


Subject(s)
COVID-19 , SARS-CoV-2 , Angiotensin-Converting Enzyme 2 , Binding Sites , Humans , Mutation , Peptidyl-Dipeptidase A/metabolism , Protein Binding , Spike Glycoprotein, Coronavirus/chemistry
16.
Open Biol ; 12(6): 220040, 2022 06.
Article in English | MEDLINE | ID: mdl-35728622

ABSTRACT

The earliest proteins had to rely on amino acids available on early Earth before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, a significant fraction of the 'late' amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) belong to essential catalytic and structure-stabilizing residues. How (or if) early proteins could sustain an early biosphere has been a major puzzle. Here, we analysed two combinatorial protein libraries representing proxies of the available sequence space at two different evolutionary stages. The first is composed of the entire alphabet of 20 amino acids while the second one consists of only 10 residues (ASDGLIPTEV) representing a consensus view of plausibly available amino acids through prebiotic chemistry. We show that compact conformations resistant to proteolysis are surprisingly similarly abundant in both libraries. In addition, the early alphabet proteins are inherently more soluble and refoldable, independent of the general Hsp70 chaperone activity. By contrast, chaperones significantly increase the otherwise poor solubility of the modern alphabet proteins suggesting their coevolution with the amino acid repertoire. Our work indicates that while both early and modern amino acids are predisposed to supporting protein structure, they do so with different biophysical properties and via different mechanisms.


Subject(s)
Amino Acids , Prebiotics , Amino Acids/chemistry , Protein Folding , Proteins/chemistry
17.
Int J Mol Sci ; 23(8), 2022 Apr 11.
Article in English | MEDLINE | ID: mdl-35457045

ABSTRACT

Aminoacyl-tRNA synthetase (aaRS)/tRNA cognate pairs translate the genetic code by synthesizing specific aminoacyl-tRNAs that are assembled on messenger RNA by the ribosome. Deconstruction of the two distinct aaRS superfamilies (Classes) has provided conceptual and experimental models for their early evolution. Urzymes, containing ~120-130 amino acids excerpted from regions where genetic coding sequence complementarities have been identified, are key experimental models motivated by the proposal of a single bidirectional ancestral gene. Previous reports that Class I and Class II urzymes accelerate both amino acid activation and tRNA aminoacylation have not been extended to other synthetases. We describe a third urzyme (LeuAC) prepared from the Class IA Pyrococcus horikoshii leucyl-tRNA synthetase. We adduce multiple lines of evidence for the authenticity of its catalysis of both canonical reactions, amino acid activation and tRNA(Leu) aminoacylation. Mutation of the three active-site lysine residues to alanine causes a significant but modest reduction in both amino acid activation and aminoacylation. LeuAC also catalyzes production of ADP, a non-canonical enzymatic function that has been overlooked since it first was described for several full-length aaRS in the 1970s. Structural data suggest that the LeuAC active site accommodates two ATP conformations that are prominent in water but rarely seen bound to proteins, accounting for successive, in situ phosphorylation of the bound leucyl-5'AMP phosphate, accounting for ADP production. This unusual ATP consumption regenerates the transition state for amino acid activation and suggests, in turn, that in the absence of the editing and anticodon-binding domains, LeuAC releases leu-5'AMP unusually slowly, relative to the two phosphorylation reactions.


Subject(s)
Amino Acyl-tRNA Synthetases , Leucine-tRNA Ligase , Adenosine Diphosphate/metabolism , Adenosine Monophosphate/metabolism , Adenosine Triphosphate/metabolism , Amino Acids/metabolism , Amino Acyl-tRNA Synthetases/metabolism , Leucine-tRNA Ligase/genetics , Leucine-tRNA Ligase/metabolism , Phosphorylation
18.
Mol Biol Evol ; 39(1), 2022 01 07.
Article in English | MEDLINE | ID: mdl-34751386

ABSTRACT

During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.


Subject(s)
Epistasis, Genetic , Space Flight , Evolution, Molecular , Genetic Fitness , Models, Genetic , Mutation , Proteins/genetics
19.
Annu Rev Virol ; 8(1): 51-72, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34586874

ABSTRACT

Viral quasispecies are dynamic distributions of nonidentical but closely related mutant and recombinant viral genomes subjected to a continuous process of genetic variation, competition, and selection that may act as a unit of selection. The quasispecies concept owes its theoretical origins to a model for the origin of life as a collection of mutant RNA replicators. Independently, experimental evidence for the quasispecies concept was obtained from sampling of bacteriophage clones, which revealed that the viral populations consisted of many mutant genomes whose frequency varied with time of replication. Similar findings were made in animal and plant RNA viruses. Quasispecies became a theoretical framework to understand viral population dynamics and adaptability. The evidence came at a time when mutations were considered rare events in genetics, a perception that was to change dramatically in subsequent decades. Indeed, viral quasispecies was the conceptual forefront of a remarkable degree of biological diversity, now evident for cell populations and organisms, not only for viruses. Quasispecies dynamics unveiled complexities in the behavior of viral populations, with consequences for disease mechanisms and control strategies. This review addresses the origin of the quasispecies concept, its major implications on both viral evolution and antiviral strategies, and current and future prospects.


Subject(s)
RNA Viruses , Viruses , Animals , Antiviral Agents , Genome, Viral , Quasispecies/genetics , RNA Viruses/genetics , Viruses/genetics
20.
Bioessays ; 43(8): e2100052, 2021 08.
Article in English | MEDLINE | ID: mdl-34263468

ABSTRACT

Enzyme engineering allows the exploration of sequence diversity in search of new properties. The scientific literature is populated with methods to create enzyme libraries for engineering purposes; however, choosing a suitable method for the creation of mutant libraries can be daunting, in particular for novices. Here, we address both novices and experts: how can one enter the arena of enzyme library design and what guidelines can advanced users apply to select strategies best suited to their purpose? Section I is dedicated to the novices and presents an overview of established and standard methods for library creation, as well as available commercial solutions. The expert will discover an up-to-date tool to freshen up their repertoire (Section I) and learn of the newest methods that are likely to become a mainstay (Section II). We focus primarily on in vitro methods, presenting the advantages of each method. Our ultimate aim is to offer a selection of methods/strategies that we believe to be most useful to the enzyme engineer, whether a first-timer or a seasoned user.


Subject(s)
Enzymes/genetics , Genetic Variation , Learning