Results 1 - 15 of 15
1.
Brief Bioinform ; 23(4), 2022 07 18.
Article in English | MEDLINE | ID: mdl-35641150

ABSTRACT

Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root mean square deviation (RMSD) lower between AlphaFold and RoseTTAFold models for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Usage of models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.


Subjects
Missense Mutation , Proteins , Protein Databases , Humans , Molecular Models , Mutation , Proteins/chemistry , Proteins/genetics
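The RMSD comparison between AlphaFold and RoseTTAFold models mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the two models have already been superimposed and reduced to matched lists of atom coordinates (for example, one C-alpha atom per residue); the function name and input layout are hypothetical and are not taken from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched coordinate lists.

    Each argument is a list of (x, y, z) tuples in angstroms. The lists are
    assumed to be pre-superimposed and in the same atom order; this function
    does not perform any structural alignment itself.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    model_1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    model_2 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(rmsd(model_1, model_2))
```

A lower value indicates that the two predicted models agree more closely, which is the sense in which the abstract uses RMSD as an agreement measure.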
2.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36648327

ABSTRACT

MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on the 1773 largest and the 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available on https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed on https://zenodo.org/record/6327572. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Humans , Amino Acid Sequence Homology , Proteins/chemistry , Protein Databases
3.
Brief Bioinform ; 22(2): 742-768, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33348379

ABSTRACT

SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies resorting to structure-based drug design for COVID-19 are plethoric and show good promise. Structural biology provides key insights into 3D structures, critical residues/mutations in SARS-CoV-2 proteins, implicated in infectivity, molecular recognition and susceptibility to a broad range of host species. The detailed understanding of viral proteins and their complexes with host receptors and candidate epitope/lead compounds is the key to developing a structure-guided therapeutic design. Since the discovery of SARS-CoV-2, several structures of its proteins have been determined experimentally at an unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, impact of variants/mutations and molecular therapeutics. Here, we provide an overview of ongoing efforts on developing structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies, to understand various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding differences between SARS-CoV-2 and SARS-CoV, leading to increased infectivity of SARS-CoV-2, (ii) deciphering key residues in the SARS-CoV-2 involved in receptor-antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.


Subjects
Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Computational Biology , SARS-CoV-2/isolation & purification , COVID-19/virology , Humans , Protein Conformation , Viral Proteins/chemistry
4.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subjects
Computational Biology/statistics & numerical data , Protein Databases/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Protein Sequence Analysis/methods , Amino Acid Sequence Homology , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
5.
Bioinformatics ; 37(8): 1099-1106, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135053

ABSTRACT

MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning , Proteins , Amino Acid Sequence , Biological Evolution , Humans , Proteins/genetics
6.
J Chem Inf Model ; 59(4): 1529-1546, 2019 04 22.
Article in English | MEDLINE | ID: mdl-30794402

ABSTRACT

Small molecule drugs bind to a pocket in disease-causing target proteins based on complementarity in shape and physicochemical properties. There is a likelihood that other proteins could have binding sites that are structurally similar to that of the target protein. Binding to these other proteins could alter their activities, leading to off-target effects of the drug. One such small molecule drug, Nutlin, binds the protein MDM2, which is upregulated in several types of cancer and is a negative regulator of the tumor suppressor protein p53. To investigate the off-target effects of Nutlin, we present here a shape-based data mining effort. We extracted the binding pocket of Nutlin from the crystal structure of Nutlin-bound MDM2. We next mined the protein structural database (PDB) for putative binding pockets in other human protein structures that were similar in shape to the Nutlin pocket in MDM2 using our topology-independent structural superimposition tool CLICK. We detected 49 proteins which have binding pockets that were structurally similar to the Nutlin binding site of MDM2. All of the potential complexes were evaluated using molecular mechanics and AutoDock Vina docking scores. Further, molecular dynamics simulations were carried out on four of the predicted Nutlin-protein complexes. The binding of Nutlin to one of these proteins, gamma glutamyl hydrolase, was also experimentally validated by a thermal shift assay. These findings provide a platform for identifying potential off-target effects of existing/new drugs and also open up possibilities for repurposing drugs/ligands.


Subjects
Imidazoles/pharmacology , Molecular Targeted Therapy , Tumor Suppressor Protein p53/metabolism , Binding Sites , Molecular Dynamics Simulation , Protein Conformation , Proto-Oncogene Proteins c-mdm2/metabolism , Temperature , Tumor Suppressor Protein p53/chemistry
7.
Methods ; 131: 33-65, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28958951

ABSTRACT

It has been twenty years since the first rationally designed small molecule drug was introduced into the market. Since then, we have progressed from designing small molecules to designing biotherapeutics. This class of therapeutics includes designed proteins, peptides and nucleic acids that could more effectively combat drug resistance and even act in cases where the disease is caused by a molecular deficiency. Computational methods are crucial in this design exercise and this review discusses the various elements of designing biotherapeutic proteins and peptides. Many of the techniques discussed here, such as the deterministic and stochastic design methods, are generally used in protein design. We have devoted special attention to the design of antibodies and vaccines. In addition to the methods for designing these molecules, we have included a comprehensive list of all biotherapeutics approved for clinical use. Also included is an overview of methods that predict the binding affinity, cell penetration ability, half-life, solubility, immunogenicity and toxicity of the designed therapeutics. Biotherapeutics are only going to grow in clinical importance and are set to herald a new generation of disease management and cure.


Subjects
Biological Products/chemistry , Computational Biology/methods , Drug Design , Peptides/chemistry , Proteins/chemistry , Biological Products/immunology , Biological Products/pharmacology , Drug Therapy/methods , Half-Life , Vaccine Immunogenicity , Peptides/immunology , Peptides/pharmacology , Protein Engineering/methods , Proteins/immunology , Proteins/pharmacology , Software , Solubility , Vaccines/chemistry , Vaccines/immunology , Vaccines/pharmacology
8.
Sci Rep ; 14(1): 14208, 2024 06 20.
Article in English | MEDLINE | ID: mdl-38902252

ABSTRACT

The COVID-19 disease is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed up entry and replication into the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2-human protein 3D complexes, from the experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein-protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human-viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key immunity-associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies.


Subjects
COVID-19 , Missense Mutation , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/immunology , COVID-19/genetics , COVID-19/virology , COVID-19/immunology , Drug Repositioning , Viral Proteins/genetics , Viral Proteins/metabolism , Protein Binding , Genetic Predisposition to Disease , Disease Susceptibility , COVID-19 Drug Treatment
9.
Curr Opin Struct Biol ; 81: 102640, 2023 08.
Article in English | MEDLINE | ID: mdl-37354790

ABSTRACT

Proteins provide the basis for cellular function. Having multiple versions of the same protein within a single organism provides a way of regulating its activity or developing novel functions. Post-translational modifications of proteins, by means of adding/removing chemical groups to amino acids, allow for a well-regulated and controlled way of generating functionally distinct protein species. Alternative splicing is another method with which organisms possibly generate new isoforms. Additionally, gene duplication events throughout evolution generate multiple paralogs of the same genes, resulting in multiple versions of the same protein within an organism. In this review, we discuss recent advancements in the study of these three methods of protein diversification and provide illustrative examples of how they affect protein structure and function.


Subjects
Alternative Splicing , Gene Duplication , Molecular Evolution , Protein Isoforms/genetics , Post-Translational Protein Processing
10.
Commun Biol ; 6(1): 160, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.


Subjects
Furylfuramide , Proteins , Humans , Protein Databases , Proteins/chemistry
11.
Curr Opin Struct Biol ; 70: 108-122, 2021 10.
Article in English | MEDLINE | ID: mdl-34225010

ABSTRACT

Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.


Subjects
Biological Evolution , Proteins , Binding Sites , Catalysis , Computational Biology , Humans , Machine Learning , Protein Engineering , Proteins/genetics
12.
Methods Mol Biol ; 2305: 53-80, 2021.
Article in English | MEDLINE | ID: mdl-33950384

ABSTRACT

Biological processes are often mediated by complexes formed between proteins and various biomolecules. The 3D structures of such protein-biomolecule complexes provide insights into the molecular mechanism of their action. The structure of these complexes can be predicted by various computational methods. Choosing an appropriate method for modelling depends on the category of biomolecule that a protein interacts with and the availability of structural information about the protein and its interacting partner. We intend for the contents of this chapter to serve as a guide as to what software would be the most appropriate for the type of data at hand and the kind of 3D complex structure required. In particular, we have dealt with protein-small molecule ligand, protein-peptide, protein-protein, and protein-nucleic acid interactions. Most, if not all, model building protocols perform some sampling and scoring. Typically, several alternate conformations and configurations of the interactors are sampled. Each such sample is then scored for optimization. To boost the confidence in these predicted models, their assessment using other independent scoring schemes besides the inbuilt/default ones would prove to be helpful. This chapter also lists such software and serves as a guide to gauge the fidelity of modelled structures of biomolecular complexes.


Subjects
Molecular Docking Simulation/methods , Multiprotein Complexes/chemistry , Protein Conformation , Algorithms , Computational Biology , Computer Simulation , Ligands , Nucleic Acids/chemistry , Peptides/chemistry , Protein Binding , Protein Interaction Domains and Motifs , Software
13.
PLoS Negl Trop Dis ; 13(12): e0007419, 2019 12.
Article in English | MEDLINE | ID: mdl-31830030

ABSTRACT

Despite Nipah virus outbreaks having high mortality rates (>70% in Southeast Asia), there are no licensed drugs against it. In this study, we have considered all 9 Nipah proteins as potential therapeutic targets and computationally identified 4 putative peptide inhibitors (against G, F and M proteins) and 146 small molecule inhibitors (against F, G, M, N, and P proteins). The computations include extensive homology/ab initio modeling, peptide design and small molecule docking. An important contribution of this study is the increased structural characterization of Nipah proteins by approximately 90% of what is deposited in the PDB. In addition, we have carried out molecular dynamics simulations on all the designed protein-peptide complexes and on 13 of the top shortlisted small molecule ligands to check for stability and to estimate binding strengths. Details, including atomic coordinates of all the proteins and their ligand-bound complexes, can be accessed at http://cospi.iiserpune.ac.in/Nipah. Our strategy was to tackle the development of therapeutics on a proteome-wide scale, and the lead compounds identified could be attractive starting points for drug development. To counter the threat of drug resistance, we have analysed the sequences of the viral strains from different outbreaks to check whether they would be sensitive to the binding of the proposed inhibitors.


Subjects
Antiviral Agents/isolation & purification , Antiviral Agents/pharmacology , Nipah Virus/drug effects , Viral Proteins/antagonists & inhibitors , Antiviral Agents/chemistry , Antiviral Agents/metabolism , Molecular Dynamics Simulation , Protein Binding , Protein Conformation , Viral Proteins/chemistry
14.
Structure ; 26(7): 1015-1024.e2, 2018 07 03.
Article in English | MEDLINE | ID: mdl-29804821

ABSTRACT

Modeling macromolecular assemblies with restraints from crosslinking mass spectrometry (XL-MS) tends to focus solely on distance violation. Recently, we identified three different modeling features inherent in crosslink data: (1) expected distance between crosslinked residues; (2) violation of the crosslinker's maximum bound; and (3) solvent accessibility of crosslinked residues. Here, we implement these features in a scoring function, cMNXL, and demonstrate that it outperforms the commonly used crosslink distance violation. We compare the different methods of calculating the distance between crosslinked residues, which shows no significant change in performance when using Euclidean distance compared with the solvent-accessible surface distance. Finally, we create a combined score that incorporates information from 3D electron microscopy maps as well as crosslinking. This achieves, on average, better results than either information type alone and demonstrates the potential of integrative modeling with XL-MS and low-resolution cryoelectron microscopy.


Subjects
Macromolecular Substances/chemistry , Cross-Linking Reagents , Cryoelectron Microscopy , Mass Spectrometry , Molecular Models , Protein Conformation
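The baseline that the abstract above compares against, crosslink distance violation, can be sketched as follows. This is not the cMNXL scoring function itself, whose details are not given here; it only illustrates the simpler check of whether the Euclidean distance between two crosslinked residues exceeds the crosslinker's maximum bound. The function name, the input layout and the 30 angstrom bound used in the example are assumptions made for illustration.

```python
import math


def crosslink_violations(ca_coords, crosslinks, max_distance):
    """List crosslinks whose Euclidean distance exceeds the maximum bound.

    ca_coords: dict mapping a residue identifier to an (x, y, z) tuple,
        for example the C-alpha position in angstroms (assumed layout).
    crosslinks: iterable of (residue_i, residue_j) pairs observed by XL-MS.
    max_distance: upper bound for the crosslinker in angstroms
        (a hypothetical value must be supplied by the caller).
    """
    violated = []
    for res_i, res_j in crosslinks:
        distance = math.dist(ca_coords[res_i], ca_coords[res_j])
        if distance > max_distance:
            violated.append((res_i, res_j, distance))
    return violated


if __name__ == "__main__":
    # Toy model with three residues on a line, invented for illustration.
    coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (40.0, 0.0, 0.0)}
    links = [(1, 2), (1, 3)]
    for res_i, res_j, distance in crosslink_violations(coords, links, 30.0):
        print(f"crosslink {res_i}-{res_j} violated at {distance:.1f} angstroms")
```

A model with fewer violated crosslinks is more consistent with the XL-MS data under this baseline; the abstract reports that combining this with expected distance and solvent accessibility performs better.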
15.
Prog Biophys Mol Biol ; 128: 14-23, 2017 09.
Article in English | MEDLINE | ID: mdl-28212855

ABSTRACT

The 20 naturally occurring amino acids have different environmental preferences of where they are likely to occur in protein structures. Environments in a protein can be classified by their proximity to solvent by the residue depth measure. Since the frequencies of amino acids are different at various depth levels, the substitution frequencies should vary according to depth. To quantify these substitution frequencies, we built depth-dependent substitution matrices. The dataset used for creation of the matrices consisted of 3696 high-quality, non-redundant pairwise protein structural alignments. One of the applications of these matrices is to predict the tolerance of mutations in different protein environments. Using these substitution scores, deleterious mutations were predicted for 3500 mutations in T4 lysozyme and CcdB. The accuracy of the technique in terms of the Matthews Correlation Coefficient (MCC) is 0.48 on the CcdB testing set, while the best of the other tested methods has an MCC of 0.40. Further developments in these substitution matrices could help in improving structure-sequence alignment for protein 3D structure modeling.


Subjects
Amino Acid Substitution , Computational Biology , Point Mutation , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bacteriophage T4/enzymology , Molecular Models , Muramidase/chemistry , Muramidase/genetics , Muramidase/metabolism , Protein Conformation
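The abstract above reports accuracy as a Matthews Correlation Coefficient (MCC). As a reminder of what that number measures, here is a minimal sketch of the standard MCC formula computed from a binary confusion matrix. The function name is hypothetical and the counts in the example are invented; they are not the study's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives. Returns a value in [-1, 1];
    by convention 0.0 is returned when the denominator is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Invented counts for a deleterious-versus-tolerated mutation classifier.
    print(matthews_corrcoef(tp=40, tn=30, fp=10, fn=20))
```

Unlike plain accuracy, MCC stays informative when the two classes are imbalanced, which is one common reason to report it for mutation-effect predictors.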