Results 1 - 20 of 49
1.
Bioinformatics ; 36(9): 2750-2754, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32044951

ABSTRACT

SUMMARY: Structural biology relies on specific file formats to convey information about macromolecular structures. Traditionally this has been the PDB format, but increasingly newer formats, such as PDBML, mmCIF and MMTF are being used. Here we present atomium, a modern, lightweight, Python library for parsing, manipulating and saving PDB, mmCIF and MMTF file formats. In addition, we provide a web service, pdb2json, which uses atomium to give a consistent JSON representation to the entire Protein Data Bank. AVAILABILITY AND IMPLEMENTATION: atomium is implemented in Python and its performance is equivalent to the existing library BioPython. However, it has significant advantages in features and API design. atomium is available from atomium.bioinf.org.uk and pdb2json can be accessed at pdb2json.bioinf.org.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
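
The idea of giving a structure file a consistent JSON representation can be sketched with the standard library alone. The following is a minimal illustration, not atomium's or pdb2json's actual API: the function names and the JSON layout are assumptions made for this example, and only the fixed-width column positions come from the PDB format itself.

```python
import json

def format_atom_line(serial, name, residue, chain, number, x, y, z):
    """Write one ATOM record using the PDB fixed-width column layout.

    Simplification: the atom name is left-justified in its 4-column field.
    """
    return (f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{number:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atom_line(line):
    """Parse one ATOM/HETATM record into a plain dict (hypothetical schema)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }

def pdb_to_json(text):
    """Convert the coordinate records of a PDB-format string to JSON."""
    atoms = [parse_atom_line(line) for line in text.splitlines()
             if line.startswith(("ATOM", "HETATM"))]
    return json.dumps({"atoms": atoms}, indent=2)

if __name__ == "__main__":
    record = format_atom_line(1, "CA", "MET", "A", 1, 27.34, 24.43, 2.614)
    print(pdb_to_json(record))
```

A real parser also has to handle alternate locations, insertion codes, multiple models and the mmCIF and MMTF formats, which is the work a library such as atomium takes on.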


Subjects
Software; Databases, Protein; Molecular Structure
2.
Molecules ; 26(4), 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33673040

ABSTRACT

Background: Zinc binding proteins make up a significant proportion of the proteomes of most organisms and, within those proteins, zinc performs rôles in catalysis and structure stabilisation. Identifying the ability to bind zinc in a novel protein can offer insights into its functions and the mechanism by which it carries out those functions. Computational means of doing so are faster than spectroscopic means, allowing for searching at much greater speeds and scales, and thereby guiding complementary experimental approaches. Typically, computational models of zinc binding predict zinc binding for individual residues rather than as a single binding site, and typically do not distinguish between different classes of binding site, thereby missing crucial properties indicative of zinc binding. Methods: Previously, we created ZincBindDB, a continuously updated database of known zinc binding sites, categorised by family (the set of liganding residues). Here, we use this dataset to create ZincBindPredict, a set of machine learning methods to predict the most common zinc binding site families for both structure and sequence. Results: The models all achieve an MCC ≥ 0.88, recall ≥ 0.93 and precision ≥ 0.91 for the structural models (mean MCC = 0.97), while the sequence models have MCC ≥ 0.64, recall ≥ 0.80 and precision ≥ 0.83 (mean MCC = 0.87), with the models for binding sites containing four liganding residues performing much better than this. Conclusions: The predictors outperform competing zinc binding site predictors and are available online via a web interface and a GraphQL API.
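
The three figures quoted in the Results (MCC, recall and precision) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name and the convention of returning 0.0 for an undefined ratio are choices made for this example and are not taken from the paper.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return precision, recall and Matthews correlation coefficient (MCC).

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives. Undefined ratios are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}

if __name__ == "__main__":
    # Illustrative counts only; they are not results from the paper.
    print(classification_metrics(tp=90, fp=10, tn=95, fn=5))
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when positive sites are much rarer than negative ones.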


Subjects
Computational Biology; Proteins/chemistry; Software; Zinc/chemistry; Algorithms; Binding Sites/genetics; Databases, Protein; Ligands; Machine Learning; Protein Binding/genetics; Proteins/genetics; Support Vector Machine
3.
Bioinformatics ; 34(2): 223-229, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968673

ABSTRACT

MOTIVATION: Protein-protein interactions are vital for protein function with the average protein having between three and ten interacting partners. Knowledge of precise protein-protein interfaces comes from crystal structures deposited in the Protein Data Bank (PDB), but only 50% of structures in the PDB are complexes. There is therefore a need to predict protein-protein interfaces in silico and various methods exist for this purpose. Here we explore the use of a predictor that is based on structural features and exploits random forest machine learning, comparing its performance with a number of popular established methods. RESULTS: On an independent test set of obligate and transient complexes, our IntPred predictor performs well (MCC = 0.370, ACC = 0.811, SPEC = 0.916, SENS = 0.411) and compares favourably with other methods. Overall, IntPred ranks second of six methods tested with SPPIDER having slightly better overall performance (MCC = 0.410, ACC = 0.759, SPEC = 0.783, SENS = 0.676), but considerably worse specificity than IntPred. As with SPPIDER, using an independent test set of obligate complexes enhanced performance (MCC = 0.381) while performance is somewhat reduced on a dataset of transient complexes (MCC = 0.303). The trade-off between sensitivity and specificity compared with SPPIDER suggests that the choice of the appropriate tool is application-dependent. AVAILABILITY AND IMPLEMENTATION: IntPred is implemented in Perl and may be downloaded for local use or run via a web server at www.bioinf.org.uk/intpred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Bioinformatics ; 32(19): 2947-55, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27318203

ABSTRACT

MOTIVATION: High-throughput sequencing platforms are increasingly used to screen patients with genetic disease for pathogenic mutations, but prediction of the effects of mutations remains challenging. Previously we developed SAAPdap (Single Amino Acid Polymorphism Data Analysis Pipeline) and SAAPpred (Single Amino Acid Polymorphism Predictor) that use a combination of rule-based structural measures to predict whether a missense genetic variant is pathogenic. Here we investigate whether the same methodology can be used to develop a differential phenotype predictor, which, once a mutation has been predicted as pathogenic, is able to distinguish between phenotypes, in this case the two major clinical phenotypes (hypertrophic cardiomyopathy, HCM, and dilated cardiomyopathy, DCM) associated with mutations in the beta-myosin heavy chain (MYH7) gene product (Myosin-7). RESULTS: A random forest predictor trained on rule-based structural analyses together with structural clustering data gave a Matthews' correlation coefficient (MCC) of 0.53 (accuracy, 75%). A post hoc removal of machine learning models that performed particularly badly increased the performance (MCC = 0.61, Acc = 79%). This proof of concept suggests that methods used for pathogenicity prediction can be extended for use in differential phenotype prediction. AVAILABILITY AND IMPLEMENTATION: Analyses were implemented in Perl and C and used the Java-based Weka machine learning environment. Please contact the authors for availability. CONTACTS: andrew@bioinf.org.uk or andrew.martin@ucl.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Mutation; Myosin Heavy Chains; Cardiomyopathies/genetics; Cardiomyopathies/physiopathology; Cluster Analysis; Humans; Ventricular Myosins
5.
Bioinformatics ; 31(24): 4017-9, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26323716

ABSTRACT

UNLABELLED: We describe BiopLib, a mature C programming library for manipulating protein structure, and BiopTools, a set of command-line tools which exploit BiopLib. The library also provides a small number of functions for handling protein sequence and general purpose programming and mathematics. BiopLib transparently handles PDBML (XML) format and standard PDB files. BiopTools provides facilities ranging from renumbering atoms and residues to calculation of solvent accessibility. AVAILABILITY AND IMPLEMENTATION: BiopLib and BiopTools are implemented in standard ANSI C. The core of the BiopLib library is a reliable PDB parser that handles alternate occupancies and deals with compressed PDB files and PDBML files automatically. The library is designed to be as flexible as possible, allowing users to handle PDB data as a simple list of atoms, or in a structured form using chains, residues and atoms. Many of the BiopTools command-line tools act as filters, taking a PDB (or PDBML) file as input and producing a PDB (or PDBML) file as output. All code is open source and documented using Doxygen. It is provided under the GNU Public Licence and is available from the authors' web site or from GitHub.
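
The filter pattern described above (PDB in, PDB out) can be illustrated with a short atom-renumbering sketch. It is written in Python for illustration only and is not BiopTools code, which the abstract says is ANSI C. The function names are hypothetical, and the sketch assumes plain ATOM/HETATM records without updating CONECT records that refer to the old serial numbers.

```python
import sys

def renumber_atoms(lines, start=1):
    """Yield PDB lines with ATOM/HETATM serial numbers rewritten sequentially.

    The serial number occupies columns 7-11 of a coordinate record; every
    other line is passed through unchanged.
    """
    serial = start
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            yield f"{line[:6]}{serial:>5}{line[11:]}"
            serial += 1
        else:
            yield line

def main():
    """Act as a Unix-style filter: read PDB on stdin, write PDB on stdout."""
    sys.stdout.writelines(renumber_atoms(sys.stdin))
```

Calling `main()` from a script entry point makes this usable in a pipeline, for example `python renumber.py < in.pdb > out.pdb`, where the script name is again only an example.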


Subjects
Protein Conformation; Sequence Analysis, Protein; Software; Databases, Protein
6.
Biochem Soc Trans ; 42(6): 1704-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25399593

ABSTRACT

Protein moonlighting is the property of a number of proteins to have more than one function. However, the definition of moonlighting is somewhat imprecise with different interpretations of the phenomenon. True moonlighting occurs when an individual evolutionary protein domain has one well-accepted role and a secondary unrelated function. The 'function' of a protein domain can be defined at different levels. For example, although the function of an antibody variable fragment (Fv) could be described as 'binding', a more detailed definition would also specify the molecule to which the Fv region binds. Using this detailed definition, antibodies as a family are consummate moonlighters. However, individual antibodies do not moonlight; the multiple functions they exhibit (first binding a molecule and second triggering the immune response) are encoded in different domains and, in any case, are related in the sense that they are a part of what an antibody needs to do. Nonetheless, antibodies provide interesting lessons on the ability of proteins to evolve binding functions. Remarkably similar antibody sequences can bind completely different antigens, suggesting that evolving the ability to bind a protein can result from very subtle sequence changes.


Subjects
Antibodies/physiology; Amino Acid Sequence; Antibodies/chemistry; Molecular Sequence Data; Protein Conformation; Sequence Homology, Amino Acid
7.
Biochem Soc Trans ; 42(6): 1671-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25399588

ABSTRACT

The phenomenon of protein moonlighting was discovered in the 1980s and 1990s, and the current definition of what constitutes a moonlighting protein was provided at the end of the 1990s. Since this time, several hundred moonlighting proteins have been identified in all three domains of life, and the rate of discovery is accelerating as the importance of protein moonlighting in biology and medicine becomes apparent. The recent re-evaluation of the number of protein-coding genes in the human genome (approximately 19000) is one reason for believing that protein moonlighting may be a more general phenomenon than the current number of moonlighting proteins would suggest, and preliminary studies of the proportion of proteins that moonlight would concur with this hypothesis. Protein moonlighting could be one way of explaining the seemingly small number of proteins that are encoded in the human genome. It is emerging that moonlighting proteins can exhibit novel biological functions, thus extending the range of the human functional proteome. The several hundred moonlighting proteins so far discovered play important roles in many aspects of biology. For example, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), heat-shock protein 60 (Hsp60) and tRNA synthetases play a wide range of biological roles in eukaryotic cells, and a growing number of eukaryotic moonlighting proteins are recognized to play important roles in physiological processes such as sperm capacitation, implantation, immune regulation in pregnancy, blood coagulation, vascular regeneration and control of inflammation. The dark side of protein moonlighting finds a range of moonlighting proteins playing roles in various human diseases including cancer, cardiovascular disease, HIV and cystic fibrosis. However, some moonlighting proteins are being tested for their therapeutic potential, including immunoglobulin heavy-chain-binding protein (BiP), for rheumatoid arthritis, and Hsp90 for wound healing. 
In addition, it has emerged over the last 20 years that a large number of bacterial moonlighting proteins play important roles in bacteria-host interactions as virulence factors and are therefore potential therapeutic targets in bacterial infections. So as we progress in the 21st Century, it is likely that moonlighting proteins will be seen to play an increasingly important role in biology and medicine. It is hoped that some of the major unanswered questions, such as the mechanism of evolution of protein moonlighting, the structural biology of moonlighting proteins and their role in the systems biology of cellular systems can be addressed during this period.


Subjects
Proteins/physiology; Cell Biology; Humans; Protein Binding
8.
Sci Rep ; 14(1): 8136, 2024 04 07.
Article in English | MEDLINE | ID: mdl-38584172

ABSTRACT

Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach, VariPred (Variant impact Predictor), outperforms current state-of-the-art methods by using an end-to-end model that only requires the protein sequence as input. Using one of the best-performing protein language models (ESM-1b), we establish a robust classifier that requires no calculation of structural features or multiple sequence alignments. We compare the performance of VariPred with other representative models including 3Cnet, Polyphen-2, REVEL, MetaLR, FATHMM and ESM variant. VariPred performs as well as, or in most cases better than, these other predictors using six variant impact prediction benchmarks despite requiring only sequence data and no pre-processing of the data.


Subjects
Mutation, Missense; Proteins; Virulence; Proteins/genetics; Amino Acid Sequence; Computational Biology/methods
9.
MAbs ; 16(1): 2322533, 2024.
Article in English | MEDLINE | ID: mdl-38477253

ABSTRACT

Antibodies have increasingly been developed as drugs with over 100 now licensed in the US or EU. During development, it is often necessary to increase or reduce the affinity of an antibody and rational attempts to do so rely on having a structure of the antibody-antigen complex often obtained by modeling. The antigen-binding site consists primarily of six loops known as complementarity-determining regions (CDRs), and an open question has been whether these loops change their conformation when they bind to an antigen. Existing surveys of antibody-antigen complex structures have only examined CDR conformational change in case studies or small-scale surveys. With an increasing number of antibodies where both free and complexed structures have been deposited in the Protein Data Bank, a large-scale survey of CDR conformational change during binding is now possible. To this end, we built a dataset, AbAgDb, that currently includes 177 antibodies with high-quality CDRs, each of which has at least one bound and one unbound structure. We analyzed the conformational change of the Cα backbone of each CDR upon binding and found that, in most cases, the CDRs (other than CDR-H3) show minimal movement, while 70.6% and 87% of CDR-H3s showed global Cα RMSD ≤ 1.0 Å and ≤ 2.0 Å, respectively. We also compared bound CDR conformations with the conformational space of unbound CDRs and found most of the bound conformations are included in the unbound conformational space. In future, our results will contribute to developing insights into antibodies and new methods for modeling and docking.
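
The Cα RMSD thresholds quoted above rest on a simple calculation, sketched here under stated assumptions: the two coordinate lists hold the same Cα atoms in the same order, and the structures have already been superposed (the fitting step is deliberately omitted). The function names are illustrative and are not taken from the AbAgDb work.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z).

    Assumes both lists contain the same C-alpha atoms in the same order
    and that the structures are already superposed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def fraction_within(rmsds, cutoff):
    """Fraction of RMSD values at or below a cutoff such as 1.0 or 2.0 Å."""
    return sum(1 for value in rmsds if value <= cutoff) / len(rmsds)

if __name__ == "__main__":
    unbound = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    bound = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
    print(ca_rmsd(unbound, bound))
```

In practice the superposition would typically be done first with a least-squares fit, and the choice of which atoms to fit on (the whole Fv framework or the loop alone) changes what the resulting RMSD measures.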


Subjects
Antigens; Complementarity Determining Regions; Amino Acid Sequence; Models, Molecular; Protein Conformation; Complementarity Determining Regions/chemistry; Antigen-Antibody Complex/chemistry; Binding Sites, Antibody
10.
BMC Genomics ; 14 Suppl 3: S4, 2013.
Article in English | MEDLINE | ID: mdl-23819919

ABSTRACT

BACKGROUND: Understanding and predicting the effects of mutations on protein structure and phenotype is an increasingly important area. Genes for many genetically linked diseases are now routinely sequenced in the clinic. Previously we focused on understanding the structural effects of mutations, creating the SAAPdb resource. RESULTS: We have updated SAAPdb to include 41% more SNPs and 36% more PDs. Introducing a hydrophobic residue on the surface, or a hydrophilic residue in the core, no longer shows significant differences between SNPs and PDs. We have improved some of the analyses, significantly enhancing the analysis of clashes and of mutations to-proline and from-glycine. A new web interface has been developed, allowing users to analyze their own mutations. Finally, we have developed a machine learning method which gives a cross-validated accuracy of 0.846, considerably out-performing well-known methods including SIFT and PolyPhen2, which give accuracies between 0.690 and 0.785. CONCLUSIONS: We have updated SAAPdb and improved its analyses, but with the increasing rate with which mutation data are generated, we have created a new analysis pipeline and web interface. Results of machine learning using the structural analysis results to predict pathogenicity considerably outperform other methods.


Subjects
Computational Biology/methods; Genetic Diseases, Inborn/genetics; Mutation/genetics; Phenotype; Protein Conformation; Proteins/genetics; Software; Amino Acid Substitution/genetics; Artificial Intelligence; Humans; Internet; Polymorphism, Single Nucleotide/genetics
11.
Protein Eng Des Sel ; 36, 2023 Jan 21.
Article in English | MEDLINE | ID: mdl-38015984

ABSTRACT

The Fv region of the antibody (comprising VH and VL domains) is the area responsible for target binding and thus the antibody's specificity. The orientation, or packing, of these two domains relative to each other influences the topography of the Fv region, and therefore can influence the antibody's binding affinity. We present abYpap, an improved method for predicting the packing angle between the VH and VL domains. With the large data set now available, we were able to expand greatly the number of features that could be used compared with our previous work. The machine-learning model was tuned for improved performance using 37 selected residues (previously 13) and also by including the lengths of the most variable 'complementarity determining regions' (CDR-L1, CDR-L2 and CDR-H3). Our method shows large improvements from the previous version, and also against other modeling approaches, when predicting the packing angle.


Subjects
Complementarity Determining Regions; Immunoglobulin Heavy Chains; Immunoglobulin Heavy Chains/chemistry; Models, Molecular; Complementarity Determining Regions/chemistry; Antibodies; Immunoglobulin Light Chains/chemistry
12.
Ann Hum Genet ; 76(5): 387-401, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22881376

ABSTRACT

Familial hypercholesterolemia (FH) is caused predominantly by variants in the low-density lipoprotein receptor gene (LDLR). We report here an update of the UCL LDLR variant database to include variants reported in the literature and in-house between 2008 and 2010, transfer of the database to the LOVD v.2.0 platform (https://grenada.lumc.nl/LOVD2/UCL-Heart/home.php?select_db=LDLR) and pathogenicity analysis. The database now contains over 1288 different variants reported in FH patients: 55% exonic substitutions, 22% exonic small rearrangements (<100 bp), 11% large rearrangements (>100 bp), 2% promoter variants, 10% intronic variants and 1 variant in the 3' untranslated sequence. The distribution and type of newly reported variants closely match those of the 2008 database, and we have used these variants (n = 223) as a representative sample to assess the utility of standard open access software (PolyPhen, SIFT, refined SIFT, Neural Network Splice Site Prediction Tool, SplicePort and NetGene2) and additional analyses (Single Amino Acid Polymorphism database, analysis of conservation and structure and Mutation Taster) for pathogenicity prediction. In combination, these techniques have enabled us to assign with confidence pathogenic predictions to 8/8 in-frame small rearrangements and 8/9 missense substitutions with previously discordant results from PolyPhen and SIFT analysis. Overall, we conclude that 79% of the reported variants are likely to be disease causing.


Subjects
Databases as Topic; Genetic Variation; Hyperlipoproteinemia Type II/genetics; Receptors, LDL/genetics; Humans; Mutation; Protein Isoforms
13.
Nucleic Acids Res ; 38(12): 4040-51, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20197319

ABSTRACT

Spt5 is the only known RNA polymerase-associated factor that is conserved in all three domains of life. We have solved the structure of the Methanococcus jannaschii Spt4/5 complex by X-ray crystallography, and characterized its function and interaction with the archaeal RNAP in a wholly recombinant in vitro transcription system. Archaeal Spt4 and Spt5 form a stable complex that associates with RNAP independently of the DNA-RNA scaffold of the elongation complex. The association of Spt4/5 with RNAP results in a stimulation of transcription processivity, both in the absence and the presence of the non-template strand. A domain deletion analysis reveals the molecular anatomy of Spt4/5: the Spt5 Nus-G N-terminal (NGN) domain is the effector domain of the complex that both mediates the interaction with RNAP and is essential for its elongation activity. Using a mutagenesis approach, we have identified a hydrophobic pocket on the Spt5 NGN domain as the binding site for RNAP, and reciprocally the RNAP clamp coiled-coil motif as the binding site for Spt4/5.


Subjects
Archaeal Proteins/chemistry; Chromosomal Proteins, Non-Histone/chemistry; DNA-Directed RNA Polymerases/metabolism; Transcription, Genetic; Transcriptional Elongation Factors/chemistry; Amino Acid Motifs; Amino Acid Sequence; Archaeal Proteins/metabolism; Binding Sites; Chromosomal Proteins, Non-Histone/metabolism; Conserved Sequence; Crystallography, X-Ray; Hydrophobic and Hydrophilic Interactions; Methanococcus; Models, Molecular; Molecular Sequence Data; Protein Structure, Tertiary; Transcriptional Elongation Factors/metabolism
14.
MAbs ; 14(1): 2101183, 2022.
Article in English | MEDLINE | ID: mdl-35838549

ABSTRACT

As interest in antibody-based drug development continues to increase, the biopharmaceutical industry has begun to focus on complex multi-specific antibodies (MsAbs) as an up-and-coming class of biologic that differ from natural monoclonal antibodies through their ability to bind to more than one type of antigen. As techniques to generate such molecules have diversified, so have their formats and the need for standard notation. Previous efforts to develop a notation language for macromolecule drugs have been insufficient, or too complex, for MsAbs. Here, we present Antibody Markup Language (AbML), a new notation language specifically for antibody formats that overcomes the limitations of existing languages and can annotate all current antibody formats, including fusions, fragments, standard antibodies and MsAbs, as well as all currently conceivable future formats. AbML V1.1 also provides explicit support for T-cell receptor domains. To assist users of this language we have also developed a tool, abYdraw, that can draw antibody schematics from AbML strings or generate an AbML string from a drawn antibody schematic. AbML has the potential to become a standardized notation for describing new MsAb formats entering clinical trials.
Abbreviations: AbML: Antibody Markup Language; ADC: Antibody-drug conjugate; CAS: Chemical Abstracts Service; CH: Constant heavy; CL: Constant light; Fv: Variable fragment; HELM: Hierarchical Editing Language for Macromolecules; HSA: Human serum albumin; INN: International Nonproprietary Names; KIH: Knobs-into-holes; mAbs: Monoclonal antibodies; MsAb: Multi-specific antibody; WHO: World Health Organization; PEG: Poly-ethylene glycol; scFv: Single-chain variable fragment; SMILES: Simplified Molecular-Input Line-Entry System; VH: Variable heavy; VHH: Single-domain (Camelid) variable heavy; VL: Variable light.


Subjects
Language; Single-Chain Antibodies; Antibodies, Monoclonal; Humans; Single-Chain Antibodies/chemistry; Software
15.
MAbs ; 14(1): 2075078, 2022.
Article in English | MEDLINE | ID: mdl-35584276

ABSTRACT

Appropriate nomenclature for all pharmaceutical substances is important for clinical development, licensing, prescribing, pharmacovigilance, and identification of counterfeits. Nonproprietary names that are unique and globally recognized for all pharmaceutical substances are assigned by the International Nonproprietary Names (INN) Programme of the World Health Organization (WHO). In 1991, the INN Programme implemented the first nomenclature scheme for monoclonal antibodies. To accompany biotechnological development, this nomenclature scheme has evolved over the years; however, since the scheme was introduced, all pharmacological substances that contained an immunoglobulin variable domain were coined with the stem -mab. To date, there are 879 INN with the stem -mab. Owing to this high number of names ending in -mab, devising new and distinguishable INN has become a challenge. The WHO INN Expert Group therefore decided to revise the system to ease this situation. The revised system was approved and adopted by the WHO at the 73rd INN Consultation held in October 2021, and the radical decision was made to discontinue the use of the well-known stem -mab in naming new antibody-based drugs and, going forward, to replace it with four new stems: -tug, -bart, -mig, and -ment.


Subjects
Antibodies, Monoclonal; Pharmaceutical Preparations; World Health Organization
16.
MAbs ; 14(1): 2020082, 2022.
Article in English | MEDLINE | ID: mdl-35104168

ABSTRACT

Therapeutic monoclonal antibodies and their derivatives are key components of clinical pipelines in the global biopharmaceutical industry. The availability of large datasets of antibody sequences, structures, and biophysical properties is increasingly enabling the development of predictive models and computational tools for the "developability assessment" of antibody drug candidates. Here, we provide an overview of the antibody informatics tools applicable to the prediction of developability issues such as stability, aggregation, immunogenicity, and chemical degradation. We further evaluate the opportunities and challenges of using biopharmaceutical informatics for drug discovery and optimization. Finally, we discuss the potential of developability guidelines based on in silico metrics that can be used for the assessment of antibody stability and manufacturability.


Subjects
Antibodies, Monoclonal; Biological Products; Computer Simulation; Drug Discovery; Humans
17.
BMC Bioinformatics ; 12 Suppl 4: S1, 2011.
Article in English | MEDLINE | ID: mdl-21992016

ABSTRACT

BACKGROUND: Protein Kinases are a superfamily of proteins involved in crucial cellular processes such as cell cycle regulation and signal transduction. Accordingly, they play an important role in cancer biology. To contribute to the study of the relation between kinases and disease we compared pathogenic mutations to neutral mutations as an extension to our previous analysis of cancer somatic mutations. First, we analyzed native and mutant proteins in terms of amino acid composition. Secondly, mutations were characterized according to their potential structural effects and finally, we assessed the location of the different classes of polymorphisms with respect to kinase-relevant positions in terms of subfamily specificity, conservation, accessibility and functional sites. RESULTS: Pathogenic Protein Kinase mutations perturb essential aspects of protein function, including disruption of substrate binding and/or effector recognition at family-specific positions. Interestingly, these mutations in Protein Kinases display a tendency to avoid structurally relevant positions, which represents a significant difference with respect to the average distribution of pathogenic mutations in other protein families. CONCLUSIONS: Disease-associated mutations display sound differences with respect to neutral mutations: several amino acids are specific to each mutation type, different structural properties characterize each class and the distribution of pathogenic mutations within the consensus structure of the Protein Kinase domain is substantially different to that for non-pathogenic mutations. This preferential distribution confirms previous observations about the functional and structural distribution of the controversial cancer driver and passenger somatic mutations and their use as a proxy for the study of the involvement of somatic mutations in cancer development.


Subjects
Germ-Line Mutation; Point Mutation; Protein Kinases/genetics; Humans; Models, Molecular; Neoplasms/genetics; Nuclear Receptor Subfamily 4, Group A, Member 2; Protein Binding; Protein Kinases/chemistry; Protein Kinases/metabolism; Protein Structure, Tertiary; Signal Transduction
18.
Bioinform Adv ; 1(1): vbab023, 2021.
Article in English | MEDLINE | ID: mdl-35585947

ABSTRACT

Motivation: Many bioinformatics resources are provided as 'web services', with large databases and analysis software stored on a central server, and clients interacting with them using the hypertext transfer protocol (HTTP). While some provide only a visual HTML interface, requiring a web browser to use them, many provide programmatic access using a web application programming interface (API) which returns XML, JSON or plain text that computer programs can interpret more easily. This allows access to be automated. Initially, many bioinformatics APIs used the 'simple object access protocol' (SOAP) and, more recently, representational state transfer (REST). Results: GraphQL is a novel alternative to REST and SOAP that represents the available data in the form of a graph to which any conceivable query can be submitted, and which is seeing increasing adoption in industry. Here, we review the principles of GraphQL, outline its particular suitability to the delivery of bioinformatics resources and describe its implementation in our ZincBind resource. Availability and implementation: https://api.zincbind.net. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
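
To make the contrast with REST concrete, the sketch below shows how a client might package a GraphQL query as a single HTTP POST carrying a JSON body. The endpoint URL is the one given in the abstract; the query text, its field names (`pdb`, `id`, `title`) and the helper function names are assumptions made for illustration and may not match the real ZincBind schema.

```python
import json
import urllib.request

API_URL = "https://api.zincbind.net"  # endpoint quoted in the abstract

# Hypothetical query: the field names are illustrative, not the real schema.
QUERY = """
query ($id: String!) {
  pdb(id: $id) { id title }
}
"""

def build_graphql_request(query, variables=None, url=API_URL):
    """Build a POST request whose JSON body carries the query and variables."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_query(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The client names exactly the fields it wants in the query, so one request can replace several REST calls; with the assumed schema, `run_query(build_graphql_request(QUERY, {"id": "1TON"}))` would return only the requested `id` and `title`.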

19.
BMC Bioinformatics ; 10 Suppl 8: S5, 2009 Aug 27.
Article in English | MEDLINE | ID: mdl-19758469

ABSTRACT

BACKGROUND: The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. In order better to understand the mechanisms behind known mutant phenotypes, and predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized. RESULTS: Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space. CONCLUSION: The server is publicly available at http://3DSim.bioinfo.cnio.es/. 
In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request and most of the functionality is available through programmatic web service access.


Subjects
Amino Acid Substitution; Information Storage and Retrieval/methods; Mutation; Proteins/genetics; Databases, Protein; Internet; Models, Molecular; Phenotype; Proteins/chemistry
20.
Hum Mutat ; 30(4): 616-24, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19191322

ABSTRACT

The Single Amino Acid Polymorphism database (SAAPdb) is a new resource for the analysis and visualization of the structural effects of mutations. Our analytical approach is to map single nucleotide polymorphisms (SNPs) and pathogenic deviations (PDs) to protein structural data held within the Protein Data Bank. By mapping mutations onto protein structures, we can hypothesize whether the mutant residues will have any local structural effect that may "explain" a deleterious phenotype. Our prior work used a similar approach to analyze mutations within a single protein. An analysis of the contents of SAAPdb indicates that there are clear differences in the sequence and structural characteristics of SNPs and PDs, and that PDs are more often explained by our structural analysis. This mapping and analysis is a useful resource for the mutation community and is publicly available at http://www.bioinf.org.uk/saap/db/.


Subjects
Amino Acids/genetics; Databases, Protein; Polymorphism, Single Nucleotide; Proteins/genetics; Amino Acids/chemistry; Humans; Hydrophobic and Hydrophilic Interactions; Internet; Mutation, Missense; Protein Binding; Protein Stability; Protein Structure, Quaternary; Protein Structure, Tertiary; Proteins/chemistry