Results 1 - 20 of 35
1.
Proteins ; 92(2): 265-281, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37855235

ABSTRACT

Amyloids, protein and peptide assemblies found in various organisms, are crucial in physiological and pathological processes. Their intricate structures, however, present significant challenges, limiting our understanding of their functions, regulatory mechanisms, and potential applications in biomedicine and technology. This study evaluated the AlphaFold2 ColabFold method's structure predictions for antimicrobial amyloids, using eight antimicrobial peptides (AMPs), including those with experimentally determined structures and AMPs known for their distinct amyloidogenic morphological features. Additionally, two well-known human amyloids, amyloid-β and islet amyloid polypeptide, were included in the analysis due to their disease relevance, short sequences, and antimicrobial properties. Amyloids typically exhibit tightly mated β-strand sheets forming a cross-β configuration. However, certain amphipathic α-helical subunits can also form amyloid fibrils adopting a cross-α structure. Some AMPs in the study exhibited a combination of cross-α and cross-β amyloid fibrils, adding complexity to structure prediction. The results showed that the AlphaFold2 ColabFold models favored α-helical structures in the tested amyloids, successfully predicting the presence of α-helical mated sheets and a hydrophobic core resembling the cross-α configuration. This implies that the AI-based algorithms prefer assemblies of the monomeric state, which was frequently predicted as helical, or capture an α-helical membrane-active form of toxic peptides, which is triggered upon interaction with lipid membranes.


Subject(s)
Amyloid; Anti-Infective Agents; Humans; Amyloid/chemistry; Amyloid beta-Peptides/chemistry; Anti-Infective Agents/pharmacology; Islet Amyloid Polypeptide/metabolism; Protein Conformation, alpha-Helical
2.
Proc Natl Acad Sci U S A ; 118(31), 2021 08 03.
Article in English | MEDLINE | ID: mdl-34330833

ABSTRACT

Outer-membrane beta barrels (OMBBs) are found in the outer membrane of gram-negative bacteria and eukaryotic organelles. OMBBs fold as antiparallel β-sheets that close onto themselves, forming pores that traverse the membrane. Currently known structures include only one barrel, of 8 to 36 strands, per chain. The lack of multi-OMBB chains is surprising, as most OMBBs form oligomers, and some function only in this state. Using a combination of sensitive sequence comparison methods and coevolutionary analysis tools, we identify many proteins combining multiple beta barrels within a single chain; combinations that include eight-stranded barrels prevail. These multibarrels seem to be the result of independent, lineage-specific fusion and amplification events. The absence of multibarrels that are universally conserved in bacteria with an outer membrane, coupled with their frequent de novo genesis, suggests that their functions are not essential but rather beneficial in specific environments. Adjacent barrels of complementary function within the same chain may allow for functions beyond those of the individual barrels.


Subject(s)
Bacterial Outer Membrane Proteins/chemistry; Gammaproteobacteria/metabolism; Bacterial Outer Membrane Proteins/classification; Bacterial Outer Membrane Proteins/metabolism; Computer Simulation; Markov Chains; Models, Molecular; Protein Conformation; Protein Domains
3.
PLoS Comput Biol ; 18(2): e1009833, 2022 02.
Article in English | MEDLINE | ID: mdl-35157697

ABSTRACT

As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both 'β-trefoil bridging themes' (evolutionarily-related sequence segments) and 'β-trefoil-like motifs' (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself: namely, that it is a derived fold formed by 'budding' from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the evolution of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature's sewing table.


Subject(s)
Lotus; Immunoglobulin G; Models, Molecular; Peptides/chemistry; Protein Folding
4.
Proc Natl Acad Sci U S A ; 117(9): 4701-4709, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32079721

ABSTRACT

Proteins' interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein-adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein-adenine interactions in the Watson-Crick edge of adenine and shows that all of adenine's edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments ("themes") that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.


Subject(s)
Adenine/metabolism; Evolution, Molecular; Molecular Docking Simulation/methods; Protein Conformation; Adenine/chemistry; Binding Sites; Hydrogen Bonding; Protein Binding; Sequence Analysis, Protein/methods; Software
5.
Mol Biol Evol ; 38(6): 2191-2208, 2021 05 19.
Article in English | MEDLINE | ID: mdl-33502503

ABSTRACT

The vast majority of theoretically possible polypeptide chains do not fold, let alone confer function. Hence, protein evolution from preexisting building blocks has clear potential advantages over ab initio emergence from random sequences. In support of this view, sequence similarities between different proteins are generally indicative of common ancestry, and we collectively refer to such homologous sequences as "themes." At the domain level, sequence homology is routinely detected. However, short themes, which are segments or fragments of intact domains, are particularly interesting because they may provide hints about the emergence of domains, as opposed to divergence of preexisting domains, or their mixing-and-matching to form multi-domain proteins. Here we identified 525 representative short themes, comprising 20-80 residues, that are unexpectedly shared between domains considered to have emerged independently. Among these "bridging themes" are ones shared between the most ancient domains, for example, Rossmann, P-loop NTPase, TIM-barrel, flavodoxin, and ferredoxin-like. We elaborate on several particularly interesting cases, where the bridging themes mediate ligand binding. Ligand binding may have contributed to the stability and the plasticity of these building blocks, and to their ability to invade preexisting domains or serve as starting points for completely new domains.


Subject(s)
Evolution, Molecular; Peptides/genetics; Protein Domains/genetics; Proteins/genetics; Sequence Homology, Amino Acid
6.
Clin Infect Dis ; 73(7): e2444-e2449, 2021 10 05.
Article in English | MEDLINE | ID: mdl-32797228

ABSTRACT

BACKGROUND: Coronavirus disease 2019 (COVID-19) and dengue fever are difficult to distinguish given shared clinical and laboratory features. Failing to consider COVID-19 due to false-positive dengue serology can have serious implications. We aimed to assess this possible cross-reactivity. METHODS: We analyzed clinical data and serum samples from 55 individuals with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. To assess dengue serology status, we used dengue-specific antibodies by means of lateral-flow rapid test, as well as enzyme-linked immunosorbent assay (ELISA). Additionally, we tested SARS-CoV-2 serology status in patients with dengue and performed in-silico protein structural analysis to identify epitope similarities. RESULTS: Using the dengue lateral-flow rapid test we detected 12 positive cases out of the 55 (21.8%) COVID-19 patients versus zero positive cases in a control group of 70 healthy individuals (P = 2.5E-5). This includes 9 cases of positive immunoglobulin M (IgM), 2 cases of positive immunoglobulin G (IgG), and 1 case of positive IgM as well as IgG antibodies. ELISA testing for dengue was positive in 2 additional subjects using envelope protein directed antibodies. Out of 95 samples obtained from patients diagnosed with dengue before September 2019, SARS-CoV-2 serology targeting the S protein was positive/equivocal in 21 (22%) (16 IgA, 5 IgG) versus 4 positives/equivocal in 102 controls (4%) (P = 1.6E-4). Subsequent in-silico analysis revealed possible similarities between SARS-CoV-2 epitopes in the HR2 domain of the spike protein and the dengue envelope protein. CONCLUSIONS: Our findings support possible cross-reactivity between dengue virus and SARS-CoV-2, which can lead to false-positive dengue serology among COVID-19 patients and vice versa. This can have serious consequences for both patient care and public health.


Subject(s)
COVID-19; Dengue Virus; Antibodies, Viral; Cross Reactions; Humans; SARS-CoV-2
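
The first P value in the abstract above (12 positives among 55 COVID-19 patients versus 0 among 70 healthy controls) can be checked with a short calculation. The abstract does not name the statistical test, so this sketch assumes a two-sided Fisher's exact test; the function name `fisher_exact_two_sided` is illustrative and not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed test: the abstract reports P values but not the method.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positives falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: COVID-19 patients, healthy controls; columns: positive, negative.
p_value = fisher_exact_two_sided(12, 43, 0, 70)
print(f"P = {p_value:.2e}")
```

Under that assumption the result is on the order of 2.5E-5, in line with the reported figure.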
7.
Proc Natl Acad Sci U S A ; 114(44): 11703-11708, 2017 10 31.
Article in English | MEDLINE | ID: mdl-29078314

ABSTRACT

Proteins share similar segments with one another. Such "reused parts," which have been successfully incorporated into other proteins, are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment "reuse" across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for "themes" (segments of at least 35 residues of similar sequence and structure) reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.


Subject(s)
Evolution, Molecular; Proteins/chemistry; Proteins/genetics; Amino Acid Sequence; Computational Biology/methods; Databases, Protein; Models, Genetic; Protein Conformation
8.
Int J Mol Sci ; 21(16), 2020 Aug 14.
Article in English | MEDLINE | ID: mdl-32824094

ABSTRACT

Classical congenital adrenal hyperplasia (CAH) caused by pathogenic variants in the steroid 21-hydroxylase gene (CYP21A2) is a severe life-threatening condition. We present a detailed investigation of the molecular and functional characteristics of a novel pathogenic variant in this gene. The patient, a 46,XX newborn, was diagnosed with classical salt-wasting CAH in the neonatal period after initially presenting with ambiguous genitalia. Multiplex ligation-dependent probe analysis demonstrated a full deletion of the paternal CYP21A2 gene, and Sanger sequencing revealed a novel de novo CYP21A2 variant c.694-696del (E232del) in the other allele. This variant resulted in the deletion of a non-conserved single amino acid, and its functional relevance was initially undetermined. We used both in silico and in vitro methods to determine the mechanistic significance of this mutation. Computational analysis relied on the solved structure of the protein (Protein Data Bank ID 4Y8W), structure prediction of the mutated protein, evolutionary analysis, and manual inspection. We predicted impaired stability and functionality of the protein due to a rotatory disposition of amino acids in positions downstream of the deletion. In vitro biochemical evaluation of enzymatic activity supported these predictions, demonstrating protein levels reduced to 22% of the wild-type form and hydroxylase activity decreased to 1-4%. This case demonstrates the potential of combining in silico analysis based on evolutionary information and structure prediction with biochemical studies. This approach can be used to investigate other genetic variants to understand their potential effects.


Subject(s)
Computer Simulation; Mutation/genetics; Steroid 21-Hydroxylase/chemistry; Steroid 21-Hydroxylase/genetics; Child, Preschool; Evolution, Molecular; Female; Humans; Infant; Infant, Newborn
10.
Proc Natl Acad Sci U S A ; 111(32): 11691-6, 2014 Aug 12.
Article in English | MEDLINE | ID: mdl-25071170

ABSTRACT

To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the "domain network," nodes represent domains, and edges connect domains that share "motifs," i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as "continuous" versus "discrete." The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the "motif network," in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.


Subject(s)
Proteins/chemistry; Amino Acid Motifs; Biophysical Phenomena; Databases, Protein; Evolution, Molecular; Models, Molecular; Protein Structure, Tertiary; Proteins/genetics; Sequence Alignment; Structural Homology, Protein
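
The threshold behavior described in the abstract above (the network falls apart at strict thresholds and becomes fully connected at lax ones) can be illustrated with a toy graph. This is a minimal sketch with made-up domain names and similarity scores; the abstract does not give the scoring scheme or the actual thresholds, and the helper names are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, compressing the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    # Largest component first, then alphabetical, for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


def components_at(nodes, scores, threshold):
    """Keep only edges whose (hypothetical) similarity score passes the threshold."""
    edges = [pair for pair, score in scores.items() if score >= threshold]
    return connected_components(nodes, edges)


# Toy data: five invented domains and invented pairwise similarity scores.
domains = ["d1", "d2", "d3", "d4", "d5"]
scores = {("d1", "d2"): 0.9, ("d2", "d3"): 0.6, ("d3", "d4"): 0.3, ("d4", "d5"): 0.8}

for threshold in (0.95, 0.5, 0.2):
    print(threshold, components_at(domains, scores, threshold))
```

With these toy scores, the strictest threshold leaves five isolated domains, the intermediate one yields a larger component plus a separate island, and the most permissive one joins everything into a single component.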
11.
Bioinformatics ; 30(16): 2295-301, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24771517

ABSTRACT

MOTIVATION: Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families. RESULTS: In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology.


Subject(s)
Protein Conformation; Amino Acids/chemistry; Data Mining; Databases, Protein; Proteins/chemistry
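
The idea of redundancy weighting in the abstract above (every structure is kept, with a weight inversely proportional to its number of homologs) can be sketched as follows. The abstract mentions three weighting schemes without specifying them; this example assumes the simplest one, where each structure in a family of size n gets weight 1/n. The family assignments and feature labels are invented for illustration.

```python
from collections import Counter, defaultdict


def redundancy_weights(family_of):
    """Weight each structure by 1 / (size of its family); an assumed scheme."""
    sizes = Counter(family_of.values())
    return {structure: 1.0 / sizes[family] for structure, family in family_of.items()}


def weighted_distribution(feature_of, weights):
    """Normalized distribution of a structural feature under the given weights."""
    totals = defaultdict(float)
    for structure, feature in feature_of.items():
        totals[feature] += weights[structure]
    norm = sum(totals.values())
    return {feature: total / norm for feature, total in totals.items()}


# Toy data: family A is over-represented with three entries, family B has one.
family_of = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB"}
feature_of = {"s1": "helix", "s2": "helix", "s3": "strand", "s4": "strand"}

weights = redundancy_weights(family_of)
uniform = {structure: 1.0 for structure in family_of}

print("unweighted:", weighted_distribution(feature_of, uniform))
print("redundancy-weighted:", weighted_distribution(feature_of, weights))
```

Unlike a non-redundant subset, which would keep a single representative of family A and discard the rest, the weighted version still uses all three of its entries while giving each family the same total influence.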
12.
Proc Natl Acad Sci U S A ; 108(30): 12301-6, 2011 Jul 26.
Article in English | MEDLINE | ID: mdl-21737750

ABSTRACT

To study the protein structure-function relationship, we propose a method to efficiently create three-dimensional maps of structure space using a very large dataset of >30,000 Structural Classification of Proteins (SCOP) domains. In our maps, each domain is represented by a point, and the distance between any two points approximates the structural distance between their corresponding domains. We use these maps to study the spatial distributions of properties of proteins, and in particular those of local vicinities in structure space such as structural density and functional diversity. These maps provide a unique broad view of protein space and thus reveal previously undescribed fundamental properties thereof. At the same time, the maps are consistent with previous knowledge (e.g., domains cluster by their SCOP class) and organize previous observations concerning specific protein folds in a unified, coherent representation. To investigate the function-structure relationship, we measure the functional diversity (using the Gene Ontology controlled vocabulary) in local structural vicinities. Our most striking finding is that functional diversity varies considerably across structure space: The space has a highly diverse region, and diversity abates when moving away from it. Interestingly, the domains in this region are mostly alpha/beta structures, which are known to be the most ancient proteins. We believe that our unique perspective of structure space will open previously undescribed ways of studying proteins, their evolution, and the relationship between their structure and function.


Subject(s)
Proteins/chemistry; Proteins/physiology; Biophysical Phenomena; Databases, Protein; Models, Biological; Models, Chemical; Peptide Mapping; Protein Structure, Tertiary
13.
Proc Natl Acad Sci U S A ; 107(8): 3481-6, 2010 Feb 23.
Article in English | MEDLINE | ID: mdl-20133727

ABSTRACT

Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag, an ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: the SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.


Subject(s)
Sequence Alignment/methods; Sequence Analysis, Protein/methods; Databases, Protein; Protein Conformation
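
The bag-of-fragments representation described in the FragBag abstract above can be sketched in a few lines. This is a simplified illustration, not the published implementation: fragments are matched to the library by comparing internal distances (a superposition-free stand-in chosen here for brevity), vectors are compared by cosine similarity (one possible choice; the abstract says only "similarity between their vectors"), and the two-fragment library and toy coordinates are invented.

```python
import math
from itertools import combinations


def internal_distances(fragment):
    """All pairwise distances between the points of a fragment."""
    return [math.dist(p, q) for p, q in combinations(fragment, 2)]


def fragment_distance(a, b):
    """RMS difference of internal distances (assumed stand-in for fragment RMSD)."""
    da, db = internal_distances(a), internal_distances(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def bag_of_fragments(coords, library, k):
    """Count, per library fragment, how many overlapping k-mers it best matches."""
    counts = [0] * len(library)
    for i in range(len(coords) - k + 1):
        segment = coords[i:i + k]
        best = min(range(len(library)),
                   key=lambda j: fragment_distance(segment, library[j]))
        counts[best] += 1
    return counts


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical library of two 3-point fragments: one straight, one bent.
library = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
]
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
zigzag = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (2, 2, 0)]

bag_straight = bag_of_fragments(straight, library, 3)
bag_zigzag = bag_of_fragments(zigzag, library, 3)
print(bag_straight, bag_zigzag, cosine_similarity(bag_straight, bag_zigzag))
```

Because each structure is reduced to a fixed-length count vector, comparing two structures costs a single vector operation rather than a full structural alignment, which is the source of the speed the abstract emphasizes.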
14.
Structure ; 30(8): 1047-1049, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35931059

ABSTRACT

Accurate protein structure predictors use clusters of homologues, which disregard sequence-specific effects. In this issue of Structure, Weißenow and colleagues report a deep learning-based tool, EMBER2, that efficiently predicts the distances in a protein structure from its amino acid sequence only. This approach should enable the analysis of mutation effects.


Subject(s)
Computational Biology; Deep Learning; Amino Acid Sequence; Language; Proteins/chemistry
15.
Protein Sci ; 31(9): e4407, 2022 09.
Article in English | MEDLINE | ID: mdl-36040261

ABSTRACT

The emergence of novel proteins, beyond those that can be readily made by duplication and recombination of preexisting domains, is elusive. De novo emergence from random sequences is unlikely because the vast majority of random chains would not even fold, let alone function. An alternative explanation is that novel proteins emerge by duplication and fusion of preexisting polypeptide segments. In this case, traces of such ancient events may remain within contemporary proteins in the form of reused segments. Together with the late Dan Tawfik, we detected such similar segments, far shorter than intact protein domains, which are found in different environments. The detection of these "bridging themes" was based on a unique search strategy, where in addition to searching for similarity of shared fragments, so-called "themes," we also explicitly searched for cases in which the sequence segments before and after the theme are dissimilar (both in sequence and structure). Here, using a similar strategy, we further expanded the search and discovered almost 500 additional "bridging themes," linking domains that are often from ancient folds. The themes, of 20 residues or more (average 53), do not retain their structure despite sharing 37% sequence identity on average. Indeed, conformational flexibility may confer an evolutionary advantage, in that it fits in multiple environments. We elaborate on two interesting themes, shared between Rossmann/Trefoil-Plexin-like domains and a β-propeller-like domain. FOR A BROAD AUDIENCE: A fundamental question in molecular evolution is how protein domains emerged. Similar segments shared between domains of seemingly distinct origins may offer clues, as these may be remnants of the evolutionary process through which these domains emerged. However, finding such cases is difficult. Here, we expand the set of such cases which we curated previously, adding segments shared between domains that are considered ancient.


Subject(s)
Evolution, Molecular; Proteins; Amino Acid Sequence; Peptides/chemistry; Protein Domains; Proteins/chemistry; Proteins/genetics
16.
J Mol Biol ; 434(7): 167462, 2022 04 15.
Article in English | MEDLINE | ID: mdl-35104498

ABSTRACT

Understanding how proteins evolved not only resolves mysteries of the past, but also helps address challenges of the future, particularly those relating to the design and engineering of new protein functions. Here we review the work of Dan S. Tawfik, one of the pioneers of this area, highlighting his seminal contributions in diverse fields such as protein design, high-throughput screening, protein stability, fundamental enzyme-catalyzed reactions and promiscuity, that underpin biology and the origins of life. We discuss the influence of his work on how our models of enzyme and protein function have developed and how the main driving forces of molecular evolution were elucidated. The discovery of the rugged routes of evolution has enabled many practical applications, some of which are now widely used.


Subject(s)
Enzymes; Evolution, Molecular; Proteins; Catalysis; Directed Molecular Evolution; High-Throughput Screening Assays
17.
BMC Struct Biol ; 11(1): 20, 2011 May 04.
Article in English | MEDLINE | ID: mdl-21542935

ABSTRACT

BACKGROUND: Protein surfaces serve as an interface with the molecular environment and are thus tightly bound to protein function. On the surface, geometric and chemical complementarity to other molecules provides interaction specificity for ligand binding, docking of bio-macromolecules, and enzymatic catalysis. As of today, there is no accepted general scheme to represent protein surfaces. Furthermore, most of the research on protein surface focuses on regions of specific interest such as interaction, ligand binding, and docking sites. We present a first step toward a general purpose representation of protein surfaces: a novel surface patch library that represents most surface patches (~98%) in a data set regardless of their functional roles. RESULTS: Surface patches, in this work, are small fractions of the protein surface. Using a measure of inter-patch distance, we clustered patches extracted from a data set of high quality, non-redundant, proteins. The surface patch library is the collection of all the cluster centroids; thus, each of the data set patches is close to one of the elements in the library. We demonstrate the biological significance of our method through the ability of the library to capture surface characteristics of native protein structures as opposed to those of decoy sets generated by state-of-the-art protein structure prediction methods. The patches of the decoys are significantly less compatible with the library than their corresponding native structures, allowing us to reliably distinguish native models from models generated by servers. This trend, however, does not extend to the decoys themselves, as their similarity to the native structures does not correlate with compatibility with the library. CONCLUSIONS: We expect that this high-quality, generic surface patch library will add a new perspective to the description of protein structures and improve our ability to predict them. In particular, we expect that it will help improve the prediction of surface features that are apparently neglected by current techniques. The surface patch libraries are publicly available at http://www.cs.bgu.ac.il/~keasar/patchLibrary.


Subject(s)
Computational Biology/methods; Databases, Protein; Proteins/chemistry; Algorithms; Cluster Analysis; Models, Molecular; Peptide Fragments/chemistry; Protein Conformation; Surface Properties
18.
Curr Opin Struct Biol ; 68: 105-112, 2021 06.
Article in English | MEDLINE | ID: mdl-33476896

ABSTRACT

Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.


Subject(s)
Evolution, Molecular; Proteins; Protein Domains; Proteins/genetics
19.
J Phys Chem B ; 125(24): 6440-6450, 2021 06 24.
Article in English | MEDLINE | ID: mdl-34105961

ABSTRACT

The deep learning revolution introduced a new and efficacious way to address computational challenges in a wide range of fields, relying on large data sets and powerful computational resources. In protein engineering, we consider the challenge of computationally predicting properties of a protein and designing sequences with these properties. Indeed, accurate and fast deep network oracles for different properties of proteins have been developed. These learn to predict a property from an amino acid sequence by training on large sets of proteins that have this property. In particular, deep networks can learn from the set of all known protein sequences to identify ones that are protein-like. A fundamental challenge when engineering sequences that are both protein-like and satisfy a desired property is that these are rare instances within the vast space of all possible ones. When searching for these very rare instances, one would like to use good sampling procedures. Sampling approaches that are decoupled from the prediction of the property or in which the predictor uses only post-sampling to identify good instances are less efficient. The alternative is to use sampling methods that are geared to generate sequences satisfying and/or optimizing the predictor's desired properties. Deep learning has a class of architectures, denoted as generative models, which offer the capability of sampling from the learned distribution of a predicted property. Here, we review the use of deep learning tools to find good sequences for protein engineering, including developing oracles/predictors of a property of the proteins and methods that sample from a distribution of protein-like sequences to optimize the desired property.


Subject(s)
Deep Learning; Amino Acid Sequence; Protein Engineering; Proteins/genetics
20.
Elife ; 9, 2020 12 09.
Article in English | MEDLINE | ID: mdl-33295875

ABSTRACT

This article is dedicated to the memory of Michael G. Rossmann. Dating back to the last universal common ancestor, P-loop NTPases and Rossmanns comprise the most ubiquitous and diverse enzyme lineages. Despite similarities in their overall architecture and phosphate binding motif, a lack of sequence identity and some fundamental structural differences currently designates them as independent emergences. We systematically searched for structure and sequence elements shared by both lineages. We detected homologous segments that span the first βαβ motif of both lineages, including the phosphate binding loop and a conserved aspartate at the tip of β2. The latter ligates the catalytic metal in P-loop NTPases, while in Rossmanns it binds the nucleotide's ribose moiety. Tubulin, a Rossmann GTPase, demonstrates the potential of the β2-Asp to take either one of these two roles. While convergence cannot be completely ruled out, we show that both lineages likely emerged from a common βαβ segment that comprises the core of these enzyme families to this very day.


Subject(s)
AAA Proteins/metabolism; AAA Proteins/chemistry; AAA Proteins/genetics; Binding Sites; Evolution, Molecular; Protein Structure, Tertiary; Sequence Alignment