ABSTRACT
Mutation Q345F in sucrose phosphorylase from Bifidobacterium adolescentis (BaSP) has been shown to allow efficient (+)-catechin glucosylation, yielding a regioisomeric mixture of (+)-catechin-3'-O-α-D-glucopyranoside, (+)-catechin-5-O-α-D-glucopyranoside and (+)-catechin-3',5-O-α-D-diglucopyranoside in a ratio of 51 : 25 : 24. Here, we improved the control of (+)-catechin glucosylation regioselectivity with a new variant, Q345F/P134D. The same products were obtained in a ratio of 82 : 9 : 9. Using bioinformatics models, we explain why the P134D mutation favours glucosylation at the OH-3' position.
Subject(s)
Bifidobacterium adolescentis , Catechin , Bifidobacterium adolescentis/genetics , Glucosyltransferases/genetics , Mutation
ABSTRACT
Enzymes are biological catalysts with many industrial applications, but natural enzymes are usually unsuitable for industrial processes because they are not optimized for the process conditions. The properties of enzymes can be improved by directed evolution, which involves multiple rounds of mutagenesis and screening. By using mathematical models to predict the structure-activity relationship of an enzyme, and by defining the optimal combination of mutations in silico, we can significantly reduce the number of bench experiments needed, and hence the time and investment required to develop an optimized product. Here, we applied our innovative sequence-activity relationship methodology (innov'SAR) to improve glucose oxidase activity in the presence of different mediators across a range of pH values. Using this machine learning approach, a predictive model was developed and the optimal combination of mutations was determined, leading to a glucose oxidase mutant (P1) with greater specificity for the mediators ferrocene-methanol (12-fold) and nitrosoaniline (8-fold) than the wild-type enzyme, and better performance in three pH-adjusted buffers. The kcat/KM ratio of P1 increased by up to 121-fold compared to the wild-type enzyme at pH 5.5 in the presence of ferrocene-methanol.
Subject(s)
Directed Molecular Evolution/methods , Glucose Oxidase , Machine Learning , Mutagenesis, Site-Directed/methods , Mutation , Amino Acid Sequence , Ferrous Compounds/metabolism , Glucose/metabolism , Glucose Oxidase/chemistry , Glucose Oxidase/genetics , Glucose Oxidase/metabolism , Hydrogen-Ion Concentration , Kinetics , Models, Statistical , Nitrosamines/metabolism
ABSTRACT
The repertoire of RNA-binding proteins (RBPs) in bacteria plays a crucial role in their survival and in their interactions with the host machinery, but RBPs remain poorly recorded and characterised in bacterial genomes. As a first step towards addressing this, we chose the bacterial model system Escherichia coli and organised all RBPs in this organism into a comprehensive database named EcRBPome. It contains RBPs recorded from 614 complete E. coli proteomes available in the RefSeq database (as of October 2018). The database provides various features of the E. coli RBPs, such as their domain architectures, PDB structures, and GO and EC annotations. It provides the assembly, bioproject and biosample details of each strain, as well as a cross-strain comparison of the occurrence of various RNA-binding domains (RBDs). The percentage of RBPs and the abundance of the various RBDs harboured by each strain are represented graphically in the database and are available, alongside other files, for user download. To the best of our knowledge, this is the first database of its kind, and we hope that it will be of great use to the biological community.
Subject(s)
Computational Biology/methods , Databases, Factual , Escherichia coli Proteins/metabolism , Escherichia coli/metabolism , RNA, Bacterial/metabolism , RNA-Binding Proteins/metabolism , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Proteome , RNA, Bacterial/genetics , RNA-Binding Proteins/genetics
ABSTRACT
The GC-rich Binding Factor 2/Leucine Rich Repeat in the Flightless 1 Interaction Protein 1 gene (GCF2/LRRFIP1) is predicted to be alternatively spliced into five different isoforms. Although important peptide sequence differences are expected to result from this alternative splicing, to date only the gene transcription regulator properties of LRRFIP1-Iso5 have been characterised. Based on molecular, cellular and biochemical data, we show here that the five isoforms define two molecular entities that differ in their expression profiles in human tissues, subcellular localizations, oligomerization properties and transcription enhancer properties in the canonical Wnt pathway. We demonstrated that LRRFIP1-Iso3, -4 and -5, which share over 80% sequence identity, are primarily located in the cell cytoplasm and form homo- and hetero-multimers with each other. In contrast, LRRFIP1-Iso1 and -2 are primarily located in the cell nucleus, in part thanks to their shared C-terminal domain. Furthermore, we showed that LRRFIP1-Iso1 is preferentially expressed in the myocardium and skeletal muscle. Using the in vitro Topflash reporter assay, we revealed that, among LRRFIP1 isoforms, LRRFIP1-Iso1 is the strongest enhancer of the β-catenin Wnt canonical transcription pathway thanks to a specific N-terminal domain harboring two critical tryptophan residues (W76 and W82). In addition, we showed that the Wnt enhancer properties of LRRFIP1-Iso1 depend on its homo-dimerisation, which is governed by its specific coiled-coil domain. Together, our study identified LRRFIP1-Iso1 as a critical regulator of the Wnt canonical pathway with a potential role in myocyte differentiation and myogenesis.
Subject(s)
RNA-Binding Proteins/metabolism , Wnt Signaling Pathway , Alternative Splicing , Animals , Cells, Cultured , HEK293 Cells , Humans , Male , Mice , Muscle, Skeletal/metabolism , Myocardium/metabolism , Protein Domains , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , Rats , Rats, Sprague-Dawley , Wnt Proteins/genetics , Wnt Proteins/metabolism , beta Catenin/genetics , beta Catenin/metabolism
ABSTRACT
Genotyping is the process of determining differences in the genetic make-up of an individual and comparing it to that of another individual. Focus on the family of chemosensory proteins (CSPs) in insects reveals differences at the genomic level across various strains and biotypes, but none at the level of individuals, which could be extremely useful in the biotyping of insect pest species necessary for the agricultural, medical and veterinary industries. Proposed methods of genotyping CSPs include not only restriction enzymatic cleavage and amplification of cleaved polymorphic sequences, but also detection of retroposons in some specific regions of the insect chromosome. Design of biosensors using CSPs addresses tissue-specific RNA mutations in a particular subtype of the protein, which could be used as a marker of specific physiological conditions. Additionally, we refer to the binding properties of CSP proteins tuned to lipids and xenobiotic insecticides for the development of a new generation of biosensor chips, monitoring lipid blood concentration and chemical environmental pollution.
Subject(s)
Insect Proteins/genetics , Animals , Genotype , Insects , Phylogeny , Receptors, Odorant
ABSTRACT
Sucrose phosphorylases are interesting enzymes that, through transglycosylation reactions, can regioselectively transfer glucose from sucrose, the donor substrate, onto acceptors such as flavonoids to form glycoconjugates and hence modulate their solubility and bioactivity. Here, we report for the first time the structure of sucrose phosphorylase from the marine bacterium Alteromonas mediterranea (AmSP) and its enzymatic properties. The kinetics of sucrose hydrolysis and the transglucosylation capacity on (+)-catechin were investigated. The wild-type enzyme (AmSP-WT) displayed high hydrolytic activity on sucrose and was devoid of transglucosylation activity on (+)-catechin. Two variants, AmSP-Q353F and AmSP-P140D, catalysed the regiospecific transglucosylation of (+)-catechin: 89 % of a novel compound, (+)-catechin-4'-O-α-d-glucopyranoside (CAT-4'), for AmSP-P140D and 92 % of (+)-catechin-3'-O-α-d-glucopyranoside (CAT-3') for AmSP-Q353F. The compound CAT-4' was fully characterized by NMR and mass spectrometry. Molecular docking simulations provided an explanation for this difference in regiospecificity at the atomic level: AmSP-P140D was found to preferentially bind (+)-catechin in a mode that favours glucosylation of its hydroxyl group at position 4', while the binding mode in AmSP-Q353F favoured glucosylation of its hydroxyl group at position 3'.
Subject(s)
Catechin , Glucosyltransferases , Glucosyltransferases/metabolism , Glucosyltransferases/chemistry , Catechin/metabolism , Catechin/chemistry , Glycosylation , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Bacterial Proteins/genetics , Substrate Specificity , Molecular Docking Simulation , Kinetics , Hydrolysis
ABSTRACT
Insect Odorant Binding Proteins (OBPs) are important components of the insect olfactory apparatus, as they are essential for odor recognition. OBPs undergo conformational changes upon pH change, altering their interactions with odorants. Moreover, they can form heterodimers with novel binding characteristics. Anopheles gambiae OBP1 and OBP4 were found to be capable of forming heterodimers possibly involved in the specific perception of the attractant indole. In order to understand how these OBPs interact in the presence of indole and to investigate the likelihood of a pH-dependent heterodimerization mechanism, the crystal structures of OBP4 at pH 4.6 and 8.5 were determined. Structural comparison with each other and with the OBP4-indole complex (3Q8I, pH 6.85) revealed a flexible N-terminus and conformational changes in the α4-loop-α5 region at acidic pH. Fluorescence competition assays showed weak binding of indole to OBP4 that becomes further impaired at acidic pH. Additional molecular dynamics and differential scanning calorimetry studies showed that the influence of pH on OBP4 stability is significant compared to the modest effect of indole. Furthermore, OBP1-OBP4 heterodimeric models were generated at pH 4.5, 6.5, and 8.5, and compared with respect to their interface energy and cross-correlated motions in the absence and presence of indole. The results indicate that an increase in pH may stabilize OBP4 by increasing its helicity, thereby enabling indole binding at neutral pH, which further stabilizes the protein and possibly promotes the creation of a binding site for OBP1. A decrease in interface stability and loss of correlated motions upon transition to acidic pH may provoke dissociation of the heterodimer, allowing indole release. Finally, we propose a potential OBP1-OBP4 heterodimer formation/disruption mechanism induced by pH change and indole binding.
Subject(s)
Anopheles , Receptors, Odorant , Animals , Odorants , Anopheles/chemistry , Anopheles/metabolism , Receptors, Odorant/chemistry , Binding Sites , Indoles/chemistry , Hydrogen-Ion Concentration , Insect Proteins/metabolism
ABSTRACT
Accurate functional annotation of protein sequences is hampered by important factors such as the failure of sequence search methods to identify relationships and the inherent diversity in function of proteins related at low sequence similarity. Earlier, we employed an intermediate sequence search approach to establish new domain relationships in the unassigned regions of gene products at the whole-genome level, taking Mycoplasma gallisepticum as a specific example. In this paper, we report a detailed comparison of the conservation status of the domains and domain architectures of the gene products that bear our newly predicted domains amongst 14 other Mycoplasma genomes, and report the probable implications for the organisms. Some of the domain associations observed in Mycoplasma that afflict humans and other non-human primates are involved in the regulation of solute transport and DNA binding, suggesting specific modes of host-pathogen interaction.
ABSTRACT
The interaction between two proteins may involve local movements, such as the repositioning of small side chains, or more global allosteric movements, such as domain rearrangement. We studied how a precise and detailed protein-protein interface can be built using existing protein-protein docking methods, and how the initial structures can be enhanced using molecular dynamics simulations and data-driven human inspection. We present how this strategy was applied to the modeling of the RHOA-ARHGEF1 interaction, using similar complexes of RHOA bound to other members of the Rho guanine nucleotide exchange factor family for comparative assessment. In parallel, a cruder approach based on structural superimposition and molecular replacement was also assessed. Both models were then successfully refined using molecular dynamics simulations, leading to protein structures in which the major data from the scientific literature could be recovered. We expect that the detailed strategy used in this work will prove useful for the design of other protein-protein interfaces. The RHOA-ARHGEF1 interface modeled here will be extremely useful for the design of inhibitors targeting this protein-protein interaction (PPI).
ABSTRACT
We studied the expression profile and ontogeny (from the egg stage, through the larval and pupal stages, to the elderly adult) of four OBPs from the silkworm moth Bombyx mori. We first showed that male responsiveness to female sex pheromone in B. mori does not depend on age, whereas the expression of BmorPBP1, BmorPBP2, BmorGOBP1, and BmorGOBP2 varies with age. The expression profile analysis revealed that the studied OBPs are expressed in non-olfactory tissues at different developmental stages. In addition, we tested the effect of insecticide exposure on the expression of the four OBPs studied. Exposure to abamectin, a toxic macrolide insecticide and endectocide, modulated the expression of all four genes in different tissues. Higher expression of OBPs was detected in metabolic tissues, such as the thorax, gut, and fat body. All these data strongly suggest that these proteins have functions other than olfaction. Finally, we carried out ligand docking studies and report that PBP1 and GOBP2 have the capacity to bind vitamin K1 and multiple other vitamins.
ABSTRACT
CP12, a small intrinsically unstructured protein, plays an important role in the regulation of the Calvin cycle by forming a complex with phosphoribulokinase (PRK) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH). An extensive search in databases revealed 129 protein sequences from higher plants, mosses and liverworts, different groups of eukaryotic algae, and cyanobacteria. CP12 was identified throughout the Plantae, apart from in the Prasinophyceae. Within the Chromalveolata, two putative CP12 proteins were found in the genomes of the diatom Thalassiosira pseudonana and the haptophyte Emiliania huxleyi, but specific searches in further chromalveolate genomes or EST datasets did not reveal any CP12 sequences in other Prymnesiophyceae, Dinophyceae or Pelagophyceae. A species from the Euglenophyceae within the Excavata also appeared to lack CP12. Phylogenetic analysis showed a clear separation into a number of higher taxonomic clades and among different forms of CP12 in higher plants. Cyanobacteria, Chlorophyceae, Rhodophyta and Glaucophyceae, Bryophyta, and the CP12-3 forms in higher plants all form separate clades. The degree of disorder of CP12 was higher in higher plants than in the eukaryotic algae and cyanobacteria, apart from the green algal class Mesostigmatophyceae, which is ancestral to the streptophytes. This suggests that CP12 has evolved to become more flexible and possibly to take on more general roles. Different features of the CP12 sequences in the different taxonomic groups and their potential functions and interactions in the Calvin cycle are discussed.
Subject(s)
Photosynthesis , Plant Proteins/chemistry , Plants/metabolism , Sequence Analysis, Protein , Algal Proteins/chemistry , Amino Acid Sequence , Eukaryota/genetics , Expressed Sequence Tags , Genome/genetics , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid
ABSTRACT
Prediction of protein structures using computational approaches has been explored for over two decades, paving the way for more focused research and development of algorithms in comparative modelling, ab initio modelling and structure refinement protocols. Template-based modelling protocols have seen tremendous success, whereas strategies that involve template-free modelling still lag behind, specifically for larger proteins (>150 a.a.). Various improvements have been observed in ab initio protein structure prediction methodologies over time, with recent ones attributed to the use of deep learning approaches to construct the protein backbone structure from its amino acid sequence. This review highlights the major strategies undertaken for template-free modelling of protein structures and discusses a few tools developed under each strategy. It also briefly comments on the progress observed in the field of ab initio modelling of proteins over time, as seen through the evolution of the CASP platform.
Subject(s)
Algorithms , Computational Biology , Databases, Protein , Models, Molecular , Protein Folding , Proteins/chemistry , Protein Conformation , Proteins/genetics
ABSTRACT
In this review we present the developmental, histological, evolutionary and functional properties of insect chemosensory proteins (CSPs). CSPs are small globular proteins folded like a prism and known for their complex and arguably obscure function(s), particularly in pheromone olfaction. Here, we focus on the direct consequences of duplication, expression and RNA editing for protein function. The result of our analysis is important for understanding the significance of RNA editing for the functionality of CSP genes, particularly in brain tissue.
Subject(s)
Biological Evolution , Insect Proteins/metabolism , Insects/metabolism , Phylogeny , Receptors, Odorant/metabolism , Sensory Receptor Cells/metabolism , Animals , Insect Proteins/genetics , Insects/genetics , Receptors, Odorant/genetics
ABSTRACT
Understanding the structural plasticity of proteins is key to understanding the intricacies of their functions and mechanistic basis. In the current study, we analyzed the structural differences among the available multiple crystal structures of the same protein. For this purpose we used a previously established abstraction of protein structures referred to as Protein Blocks (PBs). We also characterized the nature of the structural variations for a few proteins using molecular dynamics simulations. In both cases, the structural variations were summarized in the form of substitution matrices of PBs. We show that certain conformational states are preferentially replaced by other specific conformational states. Interestingly, these structural variations are highly similar to those previously observed across structures of homologous proteins (r² = 0.923) or across the ensemble of conformations from NMR data (r² = 0.919). Thus our study quantitatively shows that the overall trends of structural changes in a given protein are nearly identical to the trends of structural differences that occur at topologically equivalent positions in homologous proteins. Specific case studies are used to illustrate the nature of these structural variations.
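As an illustration of how structural variations can be summarized as PB substitution counts, the following is a minimal sketch, not the authors' implementation. It assumes each structure has already been encoded as a string over the 16-letter Protein Blocks alphabet (a to p) and that the two strings are aligned position by position; the function name and the example sequences are hypothetical.

```python
from collections import Counter

# The 16 Protein Blocks are conventionally labelled 'a' through 'p'.
PB_ALPHABET = "abcdefghijklmnop"


def pb_substitution_counts(pb_seq_a, pb_seq_b):
    """Count PB pairs observed at equivalent positions of two structures.

    Both inputs are PB strings of equal length for the same protein.
    Positions holding any other character (for example an undefined
    block) are skipped.
    """
    if len(pb_seq_a) != len(pb_seq_b):
        raise ValueError("PB sequences must have the same length")
    counts = Counter()
    for block_a, block_b in zip(pb_seq_a, pb_seq_b):
        if block_a in PB_ALPHABET and block_b in PB_ALPHABET:
            counts[(block_a, block_b)] += 1
    return counts


# Hypothetical PB encodings of two crystal structures of one protein.
example_counts = pb_substitution_counts("mmmnop", "mmmnab")
```

Counts accumulated over many structure pairs could then be normalized into a substitution matrix; the normalization scheme is left open here because the abstract does not specify it.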
Subject(s)
Protein Domains , Proteins/chemistry , Structural Homology, Protein , Animals , Bacteria/metabolism , Databases, Protein , Datasets as Topic , Humans , Mice , Molecular Dynamics Simulation
ABSTRACT
Selecting the optimal enzyme concentrations in multienzyme cascade reactions to obtain the highest product yield is, in practice, a very expensive and time-consuming process. Modelling biological pathways is difficult because of the complexity of the system. Mathematical modelling of the system using an analytical approach depends on many enzyme parameters, which in turn rely on tedious and expensive experiments. The artificial neural network (ANN) method has been successfully applied in different fields of science to perform complex functions. In this study, ANN models were trained to predict the flux through the upper part of glycolysis, as inferred from NADH consumption, using the concentrations of four enzymes: phosphoglucoisomerase, phosphofructokinase, fructose-bisphosphate-aldolase and triose-phosphate-isomerase. Of three ANN algorithms, the neuralnet package was implemented with two activation functions, "logistic" and "tanh". The prediction of the flux was very efficient: using a cross-validation procedure, RMSE and R² were 0.847 and 0.93 for the logistic function, and 0.804 and 0.94 for the tanh function, respectively. This study showed that a systemic approach such as ANN could be used for accurate prediction of the flux through the metabolic pathway. This could help to save a lot of time and cost, particularly from an industrial perspective. The R code is available at: https://github.com/DSIMB/ANN-Glycolysis-Flux-Prediction.
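The two reported metrics can be computed as in the minimal sketch below. It is written in Python for illustration and is not taken from the authors' R code; the function names and sample values are hypothetical, and R² is computed here as the coefficient of determination, which is an assumption because the abstract does not state which definition was used.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted fluxes."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical held-out fluxes and model predictions from one fold.
observed_flux = [1.0, 2.0, 3.0]
predicted_flux = [2.0, 2.0, 2.0]
fold_rmse = rmse(observed_flux, predicted_flux)
fold_r2 = r_squared(observed_flux, predicted_flux)
```

In a cross-validation setting, these values would typically be computed on each held-out fold and then averaged.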
Subject(s)
Glycolysis , Metabolic Flux Analysis , Neural Networks, Computer , Algorithms , Metabolic Flux Analysis/methods , Metabolic Networks and Pathways , NAD/metabolism
ABSTRACT
Understanding the relationship between the immune repertoire and the physiopathological status of individuals is essential to comprehend the genesis and evolution of numerous pathologies. Nevertheless, the methodological approaches needed to understand these complex interactions are challenging. We performed a study evaluating the diversity harbored by different immune repertoires as a function of their physiopathological status. Our analysis is based on a previously described murine scFv library representing four different immune repertoires: i) healthy and naïve, ii) healthy and immunized, iii) autoimmune-prone and naïve, and iv) autoimmune-prone and immunized. This library, 2.6 × 10⁹ in size, was subjected to high-throughput sequencing (Next Generation Sequencing, NGS) in order to analyze the gene subgroups encoding immunoglobulins. A comparative study of the distribution of immunoglobulin gene subgroups present in the four libraries revealed shifts in the B cell repertoire originating from differences in the genetic background and immunological status of the mice.
Subject(s)
B-Lymphocytes/immunology , Genetic Background , Mice/genetics , Single-Chain Antibodies/immunology , Animals , Autoimmunity , B-Lymphocytes/metabolism , Gene Library , Immunization , Immunogenetic Phenomena , Mice/immunology , Mice, Inbred BALB C , Single-Chain Antibodies/genetics
ABSTRACT
BACKGROUND: Protein domains are the structural and functional units of proteins. The ability to parse proteins into different domains is important for effective classification and for understanding protein structure, function, and evolution, and is hence biologically relevant. Several computational methods are available to identify domains in a sequence. Domain-finding algorithms often employ stringent thresholds to recognize sequence domains. Identification of additional domains can be tedious, involving intense computation and manual intervention, but can lead to a better understanding of overall biological function. In this context, the problem of identifying new domains in the unassigned regions of a protein sequence assumes crucial importance. RESULTS: We had earlier demonstrated that accumulating domain information from sequence homologues can substantially aid the prediction of new domains. In this paper, we propose a computationally intensive, multi-step bioinformatics protocol, implemented as a web server named PURE (Prediction of Unassigned REgions in proteins), for the detailed examination of stretches of unassigned regions in proteins. The query sequence is processed using different automated filtering steps based on length, presence of coiled-coil regions, transmembrane regions, homologous sequences and percentage of secondary structure content. The filtered sequence segments and their sequence homologues are then fed to PSI-BLAST, cd-hit and Hmmpfam. Data from the various programs are integrated, and information regarding the probable domains predicted from the sequence is reported. CONCLUSION: We have implemented the PURE protocol as a web server for rapid and comprehensive analysis of unassigned regions in proteins. This server integrates data from different programs and provides information about the domains encoded in the unassigned regions.
Subject(s)
Computational Biology/methods , Protein Structure, Tertiary , Software , Adenylyl Cyclases/ultrastructure , Amino Acid Motifs , Animals , Cluster Analysis , Databases, Protein , Humans , Mycoplasma gallisepticum/genetics , Pattern Recognition, Automated/methods , Protein Structure, Tertiary/physiology , Sequence Analysis, Protein , Structural Homology, Protein
ABSTRACT
BACKGROUND: Distantly related proteins adopt and retain similar structural scaffolds despite length variations that can be as much as two-fold in some protein superfamilies. In this paper, we describe an analysis of indel regions that accommodate length variations amongst related proteins. We have developed an algorithm, CUSP, to examine multi-membered PASS2 superfamily alignments and identify indel regions in an automated manner. Further, we have used the method to characterize the length, structural type and biochemical features of indels in related protein domains. RESULTS: CUSP examines protein domain structural alignments to distinguish regions of conserved structure common to related proteins from structurally unconserved regions that vary in length and type of structure. On a non-redundant dataset of 353 domain superfamily alignments from PASS2, we find that 'length-deviant' protein superfamilies show > 30% length variation from their average domain length. 60% of the additional lengths that occur in indels are short structures (< 5 residues), while 6% of indels are > 15 residues in length. Structural types in indels also show class-specific trends. CONCLUSION: The extent of length variation differs across superfamilies, and indels show class-specific trends for preferred lengths and structural types. Such indels of different lengths, even within a single protein domain superfamily, could have structural and functional consequences that drive their selection, underlining their importance in similarity detection and computational modelling. The availability of systematic algorithms, like CUSP, should enable decision making in a domain superfamily-specific manner.
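The "> 30% length variation from the average domain length" criterion can be illustrated with the minimal sketch below. It is not the CUSP algorithm: the abstract does not define how the variation is measured, so the sketch assumes it is the largest relative deviation of any member's length from the superfamily mean. The function names and domain lengths are hypothetical.

```python
def max_relative_length_deviation(domain_lengths):
    """Largest deviation from the mean length, as a fraction of the mean."""
    if not domain_lengths:
        raise ValueError("at least one domain length is required")
    mean_length = sum(domain_lengths) / len(domain_lengths)
    return max(abs(n - mean_length) for n in domain_lengths) / mean_length


def is_length_deviant(domain_lengths, threshold=0.30):
    """Flag a superfamily whose length variation exceeds the threshold.

    The 0.30 default mirrors the 30% figure quoted in the abstract; the
    exact definition used by the authors is an assumption here.
    """
    return max_relative_length_deviation(domain_lengths) > threshold


# Hypothetical domain lengths (in residues) for two superfamilies.
compact_family = [100, 110, 120]
variable_family = [100, 200]
flags = [is_length_deviant(compact_family), is_length_deviant(variable_family)]
```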
Subject(s)
Algorithms , Conserved Sequence/genetics , Models, Molecular , Protein Structure, Tertiary , Proteins/genetics , Sequence Alignment/methods , Protein Conformation
ABSTRACT
BACKGROUND: Disulphide bridges are well known to play key roles in the stability, folding and function of proteins. Introduction or deletion of disulphides by site-directed mutagenesis has produced varying effects on stability and folding, depending upon the protein and the location of the disulphide in the 3-D structure. Given the lack of complete understanding, it is worthwhile to learn from an analysis of the extent of conservation of disulphides in homologous proteins. We have also addressed the question of which structural interactions in one homologue replace a disulphide present in another. RESULTS: Using a dataset involving 34,752 pairwise comparisons of homologous protein domains corresponding to 300 protein domain families of known 3-D structure, we provide a comprehensive analysis of the extent of conservation of disulphide bridges and their structural features. We report that only 54% of all the disulphide bonds compared between the homologous pairs are conserved, even though a small fraction of the non-conserved disulphides do include cytoplasmic proteins. Also, only about one fourth of the distinct disulphides are conserved in all the members of protein families. We note that while conservation of disulphides is common in many families, disulphide bond mutations are quite prevalent. Interestingly, we note that there is no clear relationship between the sequence identity of two homologous proteins and disulphide bond conservation. Our analysis of structural features at the sites where cysteines forming a disulphide in one homologue are replaced by non-Cys residues shows that the elimination of a disulphide in a homologue need not always result in stabilizing interactions between equivalent residues. CONCLUSION: We observe that in homologous proteins, disulphide bonds are conserved only to a modest extent. Very interestingly, we note that the extent of conservation of disulphides in homologous proteins is unrelated to the overall sequence identity between homologues. The non-conserved disulphides are often associated with variable structural features that were recruited to be associated with differentiation or specialisation of protein function.
Subject(s)
Disulfides/chemistry , Proteins/chemistry , Structural Homology, Protein , Conserved Sequence , Cystine/chemistry , Databases, Protein , Protein Conformation , Protein Structure, Tertiary , Sequence Alignment , Solvents/chemistry
ABSTRACT
Directed evolution is an important research activity in synthetic biology and biotechnology. Numerous reports describe the application of tedious mutation/screening cycles for the improvement of proteins. Recently, knowledge-based approaches have facilitated the prediction of protein properties and the identification of improved mutants. However, epistatic phenomena constitute an obstacle which can impair the predictions in protein engineering. We present an innovative sequence-activity relationship (innov'SAR) methodology based on digital signal processing, combining wet-lab experimentation and computational protein design. In our machine learning approach, a predictive model is developed to find the resulting property of the protein when the n single point mutations are combined (2ⁿ combinations). The originality of our approach is that only sequence information and the fitness of mutants measured in the wet-lab are needed to build models. We illustrate the application of the approach in the case of improving the enantioselectivity of an epoxide hydrolase from Aspergillus niger. n = 9 single point mutants of the enzyme were experimentally assessed for their enantioselectivity and used as a learning dataset to build a model. Based on the combinations of the 9 single point mutations (2⁹), the enantioselectivity of these 512 variants was predicted, and candidates were experimentally checked: better mutants with higher enantioselectivity were indeed found.
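The size of the combinatorial space mentioned above follows from each of the n mutations being either present or absent in a variant. The minimal sketch below enumerates that space for n = 9; it only illustrates the counting and is not part of the innov'SAR method. The mutation labels M1 to M9 are hypothetical placeholders, not the mutations studied.

```python
from itertools import combinations


def enumerate_variants(mutations):
    """Return every subset of the given single point mutations.

    The empty tuple stands for the wild type (no mutation), and the
    full tuple for the variant carrying all mutations.
    """
    variants = []
    for size in range(len(mutations) + 1):
        variants.extend(combinations(mutations, size))
    return variants


# Hypothetical labels for the nine single point mutations.
mutation_labels = [f"M{i}" for i in range(1, 10)]
all_variants = enumerate_variants(mutation_labels)
variant_count = len(all_variants)  # 2**9 == 512
```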