ABSTRACT
Carbohydrate binding modules (CBMs) are protein domains that typically reside near catalytic domains, increasing substrate-protein proximity by constraining the conformational space of carbohydrates. Due to the flexibility and variability of glycans, the molecular details of how these protein regions recognize their target molecules are not always fully understood. Computational methods, including molecular docking and molecular dynamics simulations, have been employed to investigate lectin-carbohydrate interactions. In this study, we introduce a novel approach that integrates multiple computational techniques to identify the critical amino acids involved in the interaction between a CBM located at the tip of bacteriophage J-1's tail and its carbohydrate counterparts. Our results highlight three amino acids that play a significant role in binding, a finding we confirmed through in vitro experiments. By presenting this approach, we offer an intriguing alternative for pinpointing amino acids that contribute to protein-sugar interactions, leading to a more thorough comprehension of the molecular determinants of protein-carbohydrate interactions.
Subjects
Amino Acids, Computational Biology, Amino Acids/chemistry, Amino Acids/metabolism, Molecular Dynamics Simulation, Carbohydrates/chemistry, Molecular Docking Simulation, Protein Binding, Binding Sites, Viral Proteins/chemistry, Viral Proteins/metabolism, Viral Proteins/genetics
ABSTRACT
According to the Principle of Minimal Frustration, folded proteins can have only a minimal number of strong energetic conflicts in their native states. However, not all interactions are energetically optimized for folding; some remain in energetic conflict, i.e. they are highly frustrated. This residual local energetic frustration has been shown to be statistically correlated with distinct functional aspects such as protein-protein interaction sites, allostery and catalysis. Fuelled by recent breakthroughs in efficient protein structure prediction, which have made good-quality models available for most proteins, we have developed a strategy to calculate local energetic frustration within large protein families and quantify its conservation over evolutionary time. Based on this evolutionary information, we can identify how stability and functional constraints appeared in the common ancestor of the family and have been maintained over the course of evolution. Here, we present FrustraEvo, a web server tool to calculate and quantify the conservation of local energetic frustration in protein families.
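The idea of quantifying how conserved a frustration state is across a family can be illustrated with a per-position information-content score. The sketch below is our own illustration, not FrustraEvo's implementation. It takes one frustration class per family member at each aligned position (the labels `min`, `neu` and `high` are hypothetical) and reports log2(3) minus the Shannon entropy of those classes. Under this scoring, positions where every member shares the same frustration state score highest.

```python
import math
from collections import Counter

# Hypothetical labels for the three local frustration classes.
STATES = ("min", "neu", "high")
MAX_ENTROPY = math.log2(len(STATES))  # entropy of a uniform distribution over classes

def frustration_ic(column):
    """Information content (bits) of frustration classes at one aligned position.

    `column` holds one frustration class per family member. Higher values mean
    the frustration state is more conserved across the family (illustrative metric).
    """
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return MAX_ENTROPY - entropy

# Hypothetical alignment: rows are family members, columns are aligned positions.
alignment = [
    ["min", "high", "neu"],
    ["min", "high", "min"],
    ["min", "neu", "high"],
    ["min", "high", "neu"],
]
for position, column in enumerate(zip(*alignment)):
    print(position, round(frustration_ic(column), 3))
```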
Subjects
Internet, Protein Folding, Proteins, Software, Proteins/chemistry, Thermodynamics, Protein Conformation, Molecular Evolution, Molecular Models
ABSTRACT
Wnt proteins are secreted hydrophobic glycoproteins that act over long distances through poorly understood mechanisms. We discovered that Wnt7a is secreted on extracellular vesicles (EVs) following muscle injury. Structural analysis identified the motif responsible for Wnt7a secretion on EVs, which we term the Exosome Binding Peptide (EBP). Addition of the EBP to an unrelated protein directed secretion on EVs. Disruption of palmitoylation, knockdown of WLS, or deletion of the N-terminal signal peptide did not affect Wnt7a secretion on purified EVs. Bio-ID analysis identified Coatomer proteins as candidates responsible for loading Wnt7a onto EVs. The crystal structure of EBP bound to the COPB2 coatomer subunit, the binding thermodynamics, and mutagenesis experiments together demonstrate that a dilysine motif in the EBP mediates binding to COPB2. Other Wnts contain functionally analogous structural motifs. Mutation of the EBP significantly impairs the ability of Wnt7a to stimulate regeneration, indicating that secretion of Wnt7a on exosomes is critical for normal regeneration in vivo. Our studies have defined the structural mechanism that mediates binding of Wnt7a to exosomes and elucidated the singularity of long-range Wnt signalling.
ABSTRACT
Codon usage influences gene expression differently depending on the cellular context. Yet, the importance of codon bias in the simultaneous turnover of specific groups of protein-coding genes remains to be investigated. Here, we find that genes enriched in A/T-ending codons are expressed more coordinately, both in general and across tissues and development, than those enriched in G/C-ending codons. tRNA abundance measurements indicate that this coordination is linked to expression changes of the tRNA isoacceptors that read A/T-ending codons. Genes with similar codon composition are more likely to be part of the same protein complex, especially genes with A/T-ending codons. The codon preferences of genes with A/T-ending codons are conserved among mammals and other vertebrates. We suggest that this orchestration contributes to tissue-specific and ontogenetic-specific expression, which can facilitate, for instance, timely protein complex formation.
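Classifying genes by codon ending starts from a per-gene statistic. The sketch below assumes a clean, in-frame coding sequence. It computes the fraction of codons whose third position is A or T. It skips codons without synonymous alternatives (ATG, TGG) and stop codons. This exclusion rule is our simplification; the study's exact codon-usage metric is not described in the abstract.

```python
# Assumed exclusion rule: Met (ATG) and Trp (TGG) have no synonymous
# alternatives, and stop codons do not encode amino acids.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def at3_fraction(cds):
    """Fraction of synonymous-choice codons in `cds` that end in A or T."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    informative = [c for c in codons if c not in EXCLUDED]
    if not informative:
        raise ValueError("no codons with synonymous alternatives")
    # Count codons whose third (wobble) base is A or T.
    return sum(c[2] in "AT" for c in informative) / len(informative)

# Hypothetical toy genes.
gene_at = "ATGAAAGCTTTATAA"  # AAA, GCT, TTA all end in A/T
gene_gc = "ATGAAGGCCCTGTAA"  # AAG, GCC, CTG all end in G/C
print(at3_fraction(gene_at), at3_fraction(gene_gc))
```

Genes with a high value would fall into the A/T-ending group, and genes with a low value into the G/C-ending group. Any cut-off between the two groups would be an additional, analysis-specific choice.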
Subjects
Mammals, Vertebrates, Animals, Codon/genetics, Mammals/genetics, Vertebrates/genetics, Transfer RNA/genetics, Codon Usage
ABSTRACT
BACKGROUND: Codon usage and nucleotide composition of coding sequences have profound effects on protein expression. However, while it is recognized that different tissues have distinct tRNA profiles and codon usages in their transcriptomes, the effect of tissue-specific codon optimality on protein synthesis remains elusive. RESULTS: We leverage existing state-of-the-art transcriptomics and proteomics datasets from the GTEx project and the Human Protein Atlas to compute the protein-to-mRNA ratios of 36 human tissues. Using this as a proxy of translational efficiency, we build a machine learning model that identifies codons enriched or depleted in specific tissues. We detect two clusters of tissues with opposite patterns of codon preferences. We then use these patterns to develop CUSTOM, a codon optimizer algorithm that suggests a synonymous codon design to optimize protein production in a tissue-specific manner. In human cell-line models, we provide evidence that codon optimization should take into account the particularities of the translational machinery of the tissues in which the target proteins are expressed, and that our approach can design genes with tissue-optimized expression profiles. CONCLUSIONS: We provide proof-of-concept evidence that codon preferences exist in tissue-specific protein synthesis and demonstrate their application to synthetic gene design. We show that CUSTOM can benefit biological and biotechnological applications, such as the design of tissue-targeted therapies and vaccines.
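The protein-to-mRNA ratio used as a translational-efficiency proxy can be sketched per gene. The following assumptions are ours, not necessarily the study's:

- abundances are already normalized within each dataset;
- a pseudocount of 1 guards against zeros;
- ratios are centred on the gene's median across tissues, so positive values mark tissues where the gene is relatively efficiently translated.

Tissue names and values are hypothetical.

```python
import math
import statistics

def log2_ptr(protein, mrna, pseudocount=1.0):
    """log2 protein-to-mRNA ratio; the pseudocount avoids division by zero (assumed)."""
    return math.log2((protein + pseudocount) / (mrna + pseudocount))

def tissue_ptrs(protein_by_tissue, mrna_by_tissue):
    """Per-tissue log2 PTR for one gene, centred on its median across tissues."""
    raw = {t: log2_ptr(protein_by_tissue[t], mrna_by_tissue[t]) for t in protein_by_tissue}
    median = statistics.median(raw.values())
    return {t: value - median for t, value in raw.items()}

# Hypothetical normalized abundances for a single gene.
protein = {"liver": 63.0, "muscle": 15.0, "brain": 31.0}
mrna = {"liver": 7.0, "muscle": 7.0, "brain": 7.0}
ptrs = tissue_ptrs(protein, mrna)
print(ptrs)
```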
Subjects
Protein Biosynthesis, Proteins, Humans, Messenger RNA/genetics, Codon, Proteins/genetics, Codon Usage
ABSTRACT
SUMMARY: Recent years have seen an increase in the number of available structures, not only for new proteins but also for the same protein crystallized with different molecules and proteins. While protein design software has proven successful in designing and modifying proteins, it can also be overly sensitive to small conformational differences between structures of the same protein. To cope with this, we introduce pyFoldX, a Python library that allows the integrative analysis of structures of the same protein using FoldX, an established force field and modelling software. The library offers new functionalities for handling different structures of the same protein, an improved molecular parametrization module and easy integration with the data analysis ecosystem of the Python programming language. AVAILABILITY AND IMPLEMENTATION: pyFoldX relies on the FoldX software for energy calculations and modelling. FoldX can be downloaded upon registration at http://foldxsuite.crg.eu/, and its licence is free of charge for academics. The pyFoldX library is open-source. Full details on installation, tutorials covering the library functionality and the scripts used to generate the data and figures presented in this paper are available at https://github.com/leandroradusky/pyFoldX. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Ecosystem, Software, Programming Languages, Proteins, Gene Library
ABSTRACT
SUMMARY: Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do, however, contain regions where energetic conflicts remain after folding, i.e. they are highly frustrated. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins such as protein-protein interactions, small ligand recognition, catalytic sites and allostery. Here, we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large-scale analysis of local frustration, point mutants and molecular dynamics (MD) trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/proteinphysiologylab/frustratometeR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Molecular Dynamics Simulation, Proteins, Catalytic Domain, Software
ABSTRACT
SUMMARY: Accurate 3D modelling of protein-protein interactions (PPI) is essential to compensate for the absence of experimentally determined complex structures. Here, we present a new set of commands within the ModelX toolsuite capable of generating atomic-level protein complexes suitable for interface design. Among these commands, the new tool ProteinFishing proposes known and/or putative alternative 3D PPI for a given protein complex. The algorithm exploits backbone compatibility of protein fragments to generate mutually exclusive protein interfaces that are quickly evaluated with a knowledge-based statistical force field. Using interleukin-10-R2 co-crystallized with interferon-lambda-3 and a database of X-ray structures containing interleukin-10, this algorithm generated interleukin-10-R2/interleukin-10 structural models in agreement with experimental data. AVAILABILITY AND IMPLEMENTATION: ProteinFishing is a portable command-line tool included in the ModelX toolsuite. It is written in C++ and uses an SQL relational database (tested with MySQL and MariaDB) delivered with a template SQL dump called FishXDB. FishXDB contains the empty tables of ModelX fragments and the data used by the embedded statistical force field. ProteinFishing is compiled for Linux-64bit, MacOS-64bit and Windows-32bit operating systems. The software is proprietary and is distributed as an executable with its corresponding database dumps. It can be downloaded publicly at http://modelx.crg.es/. Licenses are freely available for academic users after registration on the website and are available under a commercial license for for-profit organizations or companies. CONTACT: javier.delgado@crg.eu or luis.serrano@crg.eu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms, Software, Proteins
ABSTRACT
The COVID-19 pandemic has posed, and continues to pose, enormous societal and health challenges worldwide. The research community has mobilized to develop novel projects to find a cure or a vaccine, as well as to contribute to mass testing, which has been a critical measure to contain the infection in several countries. In this article, we share our experiences and learnings as a group of volunteers at the Centre for Genomic Regulation (CRG) in Barcelona, Spain. As members of the ORFEU project, an initiative by the Government of Catalonia to achieve mass testing of people at risk and contain the epidemic in Spain, we share our motivations, challenges and the key lessons learnt, which we feel will help the global society be better prepared to address similar situations in the future.
Subjects
COVID-19, COVID-19 Testing, Genomics, Humans, Pandemics, SARS-CoV-2, Volunteers
ABSTRACT
RNA-protein interactions are crucial for such key biological processes as regulation of transcription, splicing, translation, and gene silencing, among many others. Knowing where an RNA molecule interacts with a target protein and/or engineering an RNA molecule to specifically bind to a protein could allow for rational interference with these cellular processes and the design of novel therapies. Here we present a robust RNA-protein fragment pair-based method, termed RnaX, to predict RNA-binding sites. This methodology, which is integrated into the ModelX tool suite (http://modelx.crg.es), takes advantage of the structural information present in all released RNA-protein complexes. This information is used to create an exhaustive database for docking and a statistical forcefield for fast discrimination of true backbone-compatible interactions. RnaX, together with the protein design forcefield FoldX, enables us to predict RNA-protein interfaces and, when sufficient crystallographic information is available, to reengineer the interface at the sequence-specificity level by mimicking the conformational changes that occur upon protein and RNA mutagenesis. These results, obtained at just a fraction of the computational cost of methods that simulate conformational dynamics, open up perspectives for the engineering of RNA-protein interfaces.
Subjects
Molecular Docking Simulation/methods, Proteins/metabolism, RNA/metabolism, Algorithms, Binding Sites, Computational Biology/methods, Protein Conformation, Proteins/chemistry, RNA/chemistry, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/metabolism, ROC Curve, Software
ABSTRACT
SUMMARY: A new version of FoldX has been released. Its main new features allow classic FoldX commands to be run on structures containing RNA molecules, and it includes a module (ParamX) for parametrizing ligands or small molecules that were not recognized by previous versions. An extended FoldX graphical user interface has also been developed (available as a Python plugin for the YASARA molecular viewer), allowing user-friendly parametrization of new custom user molecules encoded in JSON format. AVAILABILITY AND IMPLEMENTATION: http://foldxsuite.crg.eu/.
Subjects
Software, User-Computer Interface, Ligands, RNA
ABSTRACT
Understanding the functional effect of single amino acid substitutions (SAS) arising from single nucleotide variants (SNVs), and their relation to disease development, is a major issue in clinical genomics. Several bioinformatic algorithms and servers predict whether a SAS is pathogenic. However, they give little or no information on the reasons for the pathogenicity prediction or on the actual predicted effect of the SAS on protein function. Moreover, few current methods take structural information into account, when available, for automated analysis. Furthermore, many of these algorithms predict an effect that does not necessarily translate directly into pathogenicity. VarQ is a bioinformatic pipeline that incorporates structural information for the detailed analysis and prediction of SAS effects on protein function. It is an online tool that takes a UniProt ID and automatically analyzes known and user-provided SAS for their effect on protein activity, folding, aggregation and protein interactions, among others. We show that structural information, when available, can improve SAS pathogenicity diagnosis and, more importantly, explain its causes. We also show that VarQ correctly reproduces a previous analysis of RASopathy-related mutations, saving extensive and time-consuming manual curation. VarQ was assessed on a set of previously manually curated RASopathy-related variants (RASopathies are diseases that affect the RAS/MAPK signaling pathway), showing its ability to correctly predict the phenotypic outcome and its underlying cause. This resource is available online at http://varq.qb.fcen.uba.ar/. Supporting information and tutorials can be found on the tool's webpage.
ABSTRACT
The speed at which new genomes are being sequenced highlights the need for genome-wide methods capable of predicting protein-DNA interactions. Here, we present PADA1, a generic algorithm that accurately models structural complexes and predicts the DNA-binding regions of resolved protein structures. PADA1 relies on a library of protein and double-stranded DNA fragment pairs obtained from a training set of 2103 DNA-protein complexes. It includes a fast statistical force field computed from atom-atom distances to evaluate and filter the 3D docking models. Using published benchmark validation sets and 212 DNA-protein structures published after 2016, we predicted the DNA-binding regions with an RMSD of <1.8 Å per residue in >95% of the cases. We show that the docked templates are of sufficient quality for the FoldX protein design tool suite to identify the crystallized DNA sequence as the most energetically favorable in 80% of the cases. We highlight the biological potential of PADA1 by reconstituting DNA and protein conformational changes upon protein mutagenesis of a meganuclease and its variants, and by predicting DNA-binding regions and nucleotide sequences in proteins crystallized without DNA. These results open up new perspectives for the engineering of DNA-protein interfaces.
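Knowledge-based force fields built from atom-atom distances are commonly derived by inverse Boltzmann statistics: E(bin) = -kT ln(P_obs / P_ref). In this formula, P_obs is the frequency of a distance bin in native complexes and P_ref is its frequency in a reference state. The sketch below illustrates that general technique only. PADA1's actual atom typing, bin width and reference state are not given in the abstract. Here we assume a single atom-type pair, 1 Å bins and kT at about 298 K.

```python
import math
from collections import Counter

KT = 0.593  # RT in kcal/mol at ~298 K (assumed temperature)

def distance_bin(d, width=1.0):
    """Map an atom-atom distance (Å) to an integer bin index."""
    return int(d // width)

def train_potential(observed, reference, width=1.0, kt=KT):
    """Inverse-Boltzmann potential: E(bin) = -kT ln(P_obs(bin) / P_ref(bin)).

    observed:  distances seen in native complexes for one atom-type pair
    reference: distances from a reference (background) state
    Bins never seen in the reference set are left out of the potential.
    """
    obs = Counter(distance_bin(d, width) for d in observed)
    ref = Counter(distance_bin(d, width) for d in reference)
    n_obs, n_ref = sum(obs.values()), sum(ref.values())
    potential = {}
    for b, count in obs.items():
        if ref[b]:
            potential[b] = -kt * math.log((count / n_obs) / (ref[b] / n_ref))
    return potential

def score_model(distances, potential, width=1.0):
    """Sum the potential over one docking model's contacts; unknown bins add 0."""
    return sum(potential.get(distance_bin(d, width), 0.0) for d in distances)

# Hypothetical training data: native contacts cluster around 3-4 Å.
pot = train_potential([3.2, 3.5, 3.7, 5.1], [3.1, 4.2, 5.5, 6.5])
print(pot, score_model([3.3, 3.6], pot))
```

More negative totals indicate more native-like models. This cheap per-contact lookup is what makes such force fields fast enough to filter many docking poses.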
Subjects
Algorithms, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/metabolism, DNA/chemistry, DNA/metabolism, Binding Sites, Computational Biology/methods, Computer Simulation, DNA-Binding Proteins/genetics, Nucleic Acid Databases/statistics & numerical data, Protein Databases/statistics & numerical data, Knowledge Bases, Molecular Models, Molecular Docking Simulation, Protein Binding, Protein Conformation, Protein Engineering, Software
ABSTRACT
Diphtheria, an acute and highly infectious disease previously regarded as endemic but vaccine-preventable, is caused by Corynebacterium diphtheriae (Cd). In this work, we applied an in silico approach to the 13 complete genome sequences of C. diphtheriae, followed by a computational assessment of the structural information of binding sites, to characterize the "pocketome druggability." To this end, we first computed the "modelome" (3D structures of a complete genome) of a randomly selected reference strain, Cd NCTC13129. This strain had 13,763 open reading frames (ORFs), which yielded 1,253 (~9%) structure models. The amino acid sequences of these modeled structures were compared with the remaining 12 genomes, yielding 438 conserved protein sequences. The RCSB-PDB database was consulted to check the template structures for these conserved proteins, and as a result, 401 adequate 3D models were obtained. We subsequently predicted the protein pockets for this set of models and kept only the conserved pockets with highly druggable (HD) values (137 across all strains). Next, an off-target host homology analysis was performed against the human proteome using the NCBI database. Finally, a gene essentiality analysis gave a final set of 10 conserved targets possessing highly druggable protein pockets. To check the robustness of the pipeline's target identification, we cross-checked the final target list with another in-house target identification approach for C. diphtheriae. This yielded three common targets: hisE (phosphoribosyl-ATP pyrophosphatase), glpX (fructose 1,6-bisphosphatase II), and rpsH (30S ribosomal protein S8). Our results suggest that this in silico approach could aid experimental polypharmacological target determination in C. diphtheriae and other pathogens, and might thereby complement existing and new drug-discovery pipelines.
ABSTRACT
Available genomic data for pathogens have created new opportunities for drug discovery and development against them, including against new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there is no online resource that consolidates genome-wide data from diverse sources together with thorough bioinformatic analysis enabling easy filtering and scoring for fast target selection in drug discovery. Here, we present the Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource to facilitate the identification and prioritization of candidate drug targets in pathogens. It allows the integration and weighting of protein information such as function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request.
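The "integration and weighting" of protein properties for target prioritization can be pictured as a weighted sum over boolean criteria. The criteria names, weights and score threshold below are hypothetical. They are chosen only to show the ranking idea and do not reflect Target-Pathogen's actual scoring scheme.

```python
# Hypothetical criteria and weights, for illustration only.
WEIGHTS = {
    "essential": 3.0,          # gene essentiality evidence
    "druggable_pocket": 2.0,   # structure has a druggable pocket
    "no_human_homolog": 2.0,   # low off-target risk in the host
    "expressed_in_host": 1.0,  # omics evidence of expression during infection
}

def target_score(protein):
    """Weighted sum of the criteria a protein satisfies (missing criteria = False)."""
    return sum(w for criterion, w in WEIGHTS.items() if protein.get(criterion, False))

def prioritize(proteins, min_score=0.0):
    """Return protein IDs ranked by score, keeping those at or above min_score."""
    ranked = sorted(proteins, key=target_score, reverse=True)
    return [p["id"] for p in ranked if target_score(p) >= min_score]

candidates = [
    {"id": "P1", "essential": True, "druggable_pocket": True, "no_human_homolog": True},
    {"id": "P2", "druggable_pocket": True},
    {"id": "P3", "essential": True, "no_human_homolog": True},
]
print(prioritize(candidates, min_score=4.0))  # ['P1', 'P3']
```

Exposing the weights as user-adjustable values is one way such a resource could let different users prioritize, for example, essentiality over druggability.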
Subjects
Anti-Infective Agents/chemistry, Computational Biology/methods, Factual Databases, Bacterial Genome, Fungal Genome, Helminth Genome, Protozoan Genome, Amino Acid Sequence, Anti-Infective Agents/pharmacology, Binding Sites, Communicable Diseases/drug therapy, Drug Discovery, Humans, Internet, Metabolic Networks and Pathways/drug effects, Metabolic Networks and Pathways/genetics, Molecular Models, Molecular Targeted Therapy, Protein Binding, alpha-Helical Protein Conformation, beta-Strand Protein Conformation, Protein Interaction Domains and Motifs, Sequence Alignment, Amino Acid Sequence Homology, Software
ABSTRACT
Virtual screening is a powerful methodology for finding new small-molecule inhibitors of a desired molecular target. Usually, it involves evaluating thousands of compounds (derived from large databases) to select a set of potential binders that will be tested in the wet lab. The number of tested compounds is directly proportional to the cost. The best possible set of ligands is therefore the one with the highest number of true binders for the smallest possible compound set size. Therefore, methods that can trim down large universal data sets while enriching them in potential binders are highly appreciated. Here we present LigQ, a free webserver that is able to (i) determine the best structure and ligand-binding pocket for a desired protein, (ii) find known binders, as well as potential ligands known to bind to similar protein domains, (iii) most importantly, select a small set of commercial compounds enriched in potential binders, and (iv) prepare them for virtual screening. LigQ was tested with several proteins, showing an impressive capacity to retrieve true ligands from large data sets, achieving enrichment factors of over 10%. LigQ is available at http://ligq.qb.fcen.uba.ar/.
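Enrichment in virtual screening is often reported as an enrichment factor (EF). The EF is the hit rate among the top-ranked fraction of a compound list divided by the hit rate of the whole list. The sketch below uses that common definition. The abstract does not state which exact formula underlies LigQ's reported figure, so treat this as an illustration of the metric rather than LigQ's evaluation code.

```python
def enrichment_factor(ranked_is_active, fraction):
    """EF at the top `fraction` of a ranked compound list (common definition).

    ranked_is_active: booleans ordered from best to worst score, True = known binder.
    EF = (actives in top / size of top) / (all actives / list size).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(fraction * n_total))  # round() avoids float ceil artefacts
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)

# Hypothetical screen: 100 compounds, 5 true binders, 4 of them ranked in the top 10.
ranked = [True] * 4 + [False] * 6 + [True] + [False] * 89
print(enrichment_factor(ranked, 0.10))
```

An EF of 1 means no better than random selection. Higher values mean the top of the list is richer in binders than the library as a whole.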
Subjects
Preclinical Drug Evaluation/methods, Internet, Proteins/metabolism, Software, Binding Sites, Pharmaceutical Databases, Ligands, Protein Binding, Proteins/chemistry, User-Computer Interface
ABSTRACT
The protein frustratometer is an algorithm inspired by energy landscape theory that aims to localize and quantify the energetic frustration present in protein molecules. Frustration is a useful concept for analyzing the biological behavior of proteins. The algorithm compares the energy of the native state with the energy distribution of structural decoys. The network of minimally frustrated interactions encompasses the folding core of the molecule. Sites of high local frustration often correlate with functional regions such as binding sites and regions involved in allosteric transitions. We present here an upgraded version of a webserver that measures local frustration. The new implementation allows the inclusion of electrostatic energy terms, which are important for interactions with nucleic acids. It is significantly faster than the previous version, enabling the analysis of large macromolecular complexes within a user-friendly interface. The webserver is freely available at http://frustratometer.qb.fcen.uba.ar.
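The comparison of a native contact against structural decoys can be sketched as a Z-score-like index. The energy model that produces the native and decoy energies is not reproduced here; the energies in the usage example are hypothetical. We adopt our own sign convention: lower energy is more favorable, and positive index values mean the native contact is more favorable than typical decoys. The cut-offs of 0.78 (minimally frustrated) and -1 (highly frustrated) are assumed, based on values commonly reported in the frustratometer literature.

```python
import statistics

def frustration_index(native_energy, decoy_energies):
    """Z-score-like local frustration index of one contact.

    Convention used here: lower energy = more favorable. Positive values mean the
    native contact is more favorable than typical decoys; negative values mean it
    is less favorable (frustrated).
    """
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mean - native_energy) / spread

def classify(index, minimal_cutoff=0.78, high_cutoff=-1.0):
    """Assign a frustration class using the assumed cut-offs."""
    if index > minimal_cutoff:
        return "minimally frustrated"
    if index < high_cutoff:
        return "highly frustrated"
    return "neutral"

# Hypothetical decoy energies for one contact and two possible native energies.
decoys = [-1.0, 0.0, 1.0]
print(classify(frustration_index(-2.0, decoys)))  # native much better than decoys
print(classify(frustration_index(1.5, decoys)))   # native worse than decoys
```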
Subjects
Algorithms, Nuclear Proteins/chemistry, Nucleic Acids/chemistry, Nucleosomes/chemistry, User-Computer Interface, Amino Acid Sequence, Computer Graphics, Humans, Internet, Molecular Dynamics Simulation, Nuclear Proteins/genetics, Nucleic Acids/genetics, Nucleosomes/genetics, Protein Folding, Protein Interaction Domains and Motifs, Protein Secondary Structure, Protein Sequence Analysis, Static Electricity, Thermodynamics
ABSTRACT
Predicting function from sequence is an important goal in current biological research. Broad functional assignment is possible when a protein is assigned to a family, but predicting functional specificity accurately is not straightforward. If function is provided by key structural properties and the relevant properties can be computed starting from the sequence, it should in principle be possible to predict function in detail. The truncated hemoglobin family is an interesting benchmark because of its ubiquity, its sequence diversity in the context of a conserved fold, and the number of characterized members. Their functions are tightly related to O2 affinity and reactivity, as determined by the association and dissociation rate constants, both of which can be predicted and analyzed using in silico tools. In the present work, we applied a strategy that combines homology modeling with molecular-based energy calculations to predict and analyze the function of all known truncated hemoglobins in an evolutionary context. Our results show that truncated hemoglobins present conserved family features, but that their structure is flexible enough to allow a switch from high to low affinity in a few evolutionary steps. Most proteins display moderate to high oxygen affinities and multiple ligand migration paths. Apart from some minor trends, these properties show heterogeneous distributions throughout the phylogenetic tree, again suggesting fast functional adaptation. Our data not only deepen our understanding of the structural basis governing ligand affinity but also highlight some interesting functional evolutionary trends.
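The link between the rate constants and O2 affinity mentioned above is the equilibrium relation K_D = k_off / k_on. A lower K_D means higher affinity. The sketch below assumes simple, non-cooperative 1:1 binding for the saturation curve. The rate values are hypothetical and are not taken from the study.

```python
def dissociation_constant(k_on, k_off):
    """K_D (M) from k_on (M^-1 s^-1) and k_off (s^-1); lower K_D = higher affinity."""
    return k_off / k_on

def fractional_saturation(o2, k_d):
    """Fraction of sites bound at free [O2] = o2 (M), assuming non-cooperative 1:1 binding."""
    return o2 / (o2 + k_d)

# Hypothetical variants: same association rate, dissociation rates 1000-fold apart.
k_d_high = dissociation_constant(1e7, 0.01)  # high-affinity variant
k_d_low = dissociation_constant(1e7, 10.0)   # low-affinity variant
o2 = 1e-6  # hypothetical free O2 concentration (M)
print(fractional_saturation(o2, k_d_high), fractional_saturation(o2, k_d_low))
```

Because affinity depends on the ratio of the two rates, a change in a single rate constant can move a protein from high to low affinity. This is consistent with the idea that small structural changes can switch affinity in a few evolutionary steps.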
Subjects
Truncated Hemoglobins, Amino Acid Sequence, Computational Biology, Molecular Evolution, Linear Models, Molecular Models, Molecular Sequence Data, Oxygen/metabolism, Phylogeny, Sequence Alignment, Truncated Hemoglobins/chemistry, Truncated Hemoglobins/genetics, Truncated Hemoglobins/physiology
ABSTRACT
Current tuberculosis (TB) treatment is long and expensive, faces the increasing burden of MDR/XDR strains, and is ineffective against the latent form, resulting in an urgent need for new anti-TB drugs. Key to TB biology is its capacity to fight the host's RNOS-mediated attack. RNOS are known to display concentration-dependent mycobactericidal activity, which leads to the following hypothesis: "if we know which proteins are targeted by RNOS and kill TB, we might be able to inhibit them with drugs, resulting in a synergistic bactericidal effect". Based on this idea, we performed a whole-proteome analysis of the Mtb metabolic network to find potential RNOS-sensitive and relevant targets, including target druggability and essentiality criteria. Our results, available at http://tuberq.proteinq.com.ar, yield new potential TB targets, such as I3PS. They also provide an updated view of previous proposals, making this resource an important tool for researchers looking for new ways of killing TB.