1.
BMC Bioinformatics ; 14: 249, 2013 Aug 16.
Article in English | MEDLINE | ID: mdl-23947436

ABSTRACT

BACKGROUND: Candidate disease gene prediction is a rapidly developing area of bioinformatics research with the potential to deliver great benefits to human health. As experimental studies detecting associations between genetic intervals and disease proliferate, better bioinformatic techniques that can expand and exploit the data are required. DESCRIPTION: Gentrepid is a web resource which predicts and prioritizes candidate disease genes for both Mendelian and complex diseases. The system can take input from linkage analysis of single genetic intervals or multiple marker loci from genome-wide association studies. The underlying database of the Gentrepid tool sources data from numerous gene and protein resources, taking advantage of the wealth of biological information available. Using known disease gene information from OMIM, the system predicts and prioritizes disease gene candidates that participate in the same protein pathways or share similar protein domains. Alternatively, using an ab initio approach, the system can detect enrichment of these protein annotations without prior knowledge of the phenotype. CONCLUSIONS: The system aims to integrate the wealth of protein information currently available with known and novel phenotype/genotype information to acquire knowledge of biological mechanisms underpinning disease. We have updated the system to facilitate analysis of GWAS data and the study of complex diseases. An application of the system to hypertension GWAS data from the ICBP is provided as an example. An interesting prediction is a ZIP transporter in addition to the one found by the ICBP analysis. The webserver URL is https://www.gentrepid.org/.


Subjects
Computational Biology/methods; Databases, Genetic; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study; Internet; Humans; Phenotype
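The ab initio mode described in this abstract looks for protein annotations shared by genes that sit in different disease loci. The sketch below shows only that co-occurrence step, assuming each locus is given as a set of gene symbols and each gene maps to a set of pathway or domain identifiers. The gene and annotation names are hypothetical, and no significance test is applied, so this is an illustration of the idea rather than Gentrepid's actual scoring.

```python
from collections import defaultdict


def shared_annotations(loci, annotations, min_loci=2):
    """Find annotations carried by genes from at least `min_loci` distinct loci.

    loci: dict mapping locus name -> set of gene symbols in that interval
    annotations: dict mapping gene symbol -> set of pathway/domain IDs
    Returns: dict annotation -> {locus: sorted list of supporting genes}
    """
    # annotation -> locus -> genes in that locus carrying the annotation
    hits = defaultdict(lambda: defaultdict(set))
    for locus, genes in loci.items():
        for gene in genes:
            for ann in annotations.get(gene, ()):
                hits[ann][locus].add(gene)
    # Keep only annotations observed in enough distinct loci
    return {
        ann: {locus: sorted(genes) for locus, genes in by_locus.items()}
        for ann, by_locus in hits.items()
        if len(by_locus) >= min_loci
    }


# Hypothetical toy input: three loci and the annotations of their genes
loci = {
    "locus1": {"GENE_A", "GENE_B"},
    "locus2": {"GENE_C"},
    "locus3": {"GENE_D"},
}
annotations = {
    "GENE_A": {"pathway_X"},
    "GENE_B": {"pathway_Z"},
    "GENE_C": {"pathway_X", "domain_Y"},
    "GENE_D": {"domain_Y"},
}
result = shared_annotations(loci, annotations)
candidates = sorted({g for by_locus in result.values() for gs in by_locus.values() for g in gs})
print(candidates)  # ['GENE_A', 'GENE_C', 'GENE_D']
```

Here pathway_Z is dropped because only one locus contributes it, while pathway_X and domain_Y each link genes across two loci.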
2.
Nucleic Acids Res ; 36(2): 578-88, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18056079

ABSTRACT

Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and, therefore, can identify previously unknown domains.


Subjects
Algorithms; Protein Structure, Tertiary; Sequence Analysis, Protein/methods; Databases, Protein; Hydrophobic and Hydrophilic Interactions; Models, Molecular; Protein Folding; Proteins/chemistry; Sequence Homology, Amino Acid
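Scooby-Domain frames domain finding as a search over ways to cut a sequence into globular segments scored by length and hydrophobicity. The sketch below is a toy A* search in that spirit, not the published method: it assumes the Kyte-Doolittle hydropathy scale and invents the length bounds, the length penalty and the hydropathy threshold (MIN_LEN, MAX_LEN, MEAN_LEN, SD_LEN, MIN_MEAN_HYDRO) in place of statistics learned from proteins of known structure.

```python
import heapq
import math

# Kyte-Doolittle hydropathy scale (an assumed choice of scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical parameters standing in for learned length/hydropathy statistics
MIN_LEN, MAX_LEN = 40, 250
MEAN_LEN, SD_LEN = 120.0, 50.0
MIN_MEAN_HYDRO = -0.5


def astar_segment(seq):
    """Split `seq` into contiguous segments minimising a toy domain-likeness cost.

    Returns (list of half-open (start, end) segments, total cost),
    or (None, inf) if no segmentation satisfies the length bounds.
    """
    n = len(seq)
    # Prefix sums make each segment's mean hydropathy an O(1) lookup
    prefix = [0.0]
    for aa in seq:
        prefix.append(prefix[-1] + KD.get(aa, 0.0))

    def cost(i, j):
        # Every segment costs at least 1.0; deviations add to that
        length = j - i
        mean_h = (prefix[j] - prefix[i]) / length
        length_term = 0.5 * ((length - MEAN_LEN) / SD_LEN) ** 2
        hydro_term = max(0.0, MIN_MEAN_HYDRO - mean_h)  # penalise polar stretches
        return 1.0 + length_term + hydro_term

    def heuristic(pos):
        # At least ceil(remaining / MAX_LEN) more segments are needed, each
        # costing >= 1.0, so this never overestimates (admissible, consistent)
        return math.ceil((n - pos) / MAX_LEN)

    best_g = {0: 0.0}
    parent = {0: None}
    open_heap = [(heuristic(0), 0.0, 0)]
    while open_heap:
        _, g, pos = heapq.heappop(open_heap)
        if g > best_g[pos]:
            continue  # stale heap entry
        if pos == n:
            segments = []
            while parent[pos] is not None:
                segments.append((parent[pos], pos))
                pos = parent[pos]
            return segments[::-1], g
        for end in range(pos + MIN_LEN, min(pos + MAX_LEN, n) + 1):
            g_new = g + cost(pos, end)
            if g_new < best_g.get(end, math.inf):
                best_g[end] = g_new
                parent[end] = pos
                heapq.heappush(open_heap, (g_new + heuristic(end), g_new, end))
    return None, math.inf


# Toy sequence: two hydrophobic blocks separated by a short polar stretch
segments, total = astar_segment("L" * 100 + "D" * 20 + "V" * 100)
print(segments, round(total, 2))  # [(0, 110), (110, 220)] 2.04
```

Because every segment costs at least 1.0 and no segment is longer than MAX_LEN, the heuristic is a lower bound on the remaining cost, so the first time the end of the sequence is popped the segmentation is optimal under this toy cost.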
3.
Nucleic Acids Res ; 34(19): e130, 2006.
Article in English | MEDLINE | ID: mdl-17020920

ABSTRACT

Linkage analysis is a successful procedure to associate diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes the experimental methods employed to identify the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein-protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms use two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.


Subjects
Genetic Predisposition to Disease; Protein Interaction Mapping; Sequence Analysis, Protein/methods; Algorithms; Computational Biology; Databases, Protein; Genes; Humans; Phenotype; Protein Structure, Tertiary; Proteins/genetics
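A CPS-style filter seeded with known disease genes can be pictured as keeping locus genes that share a pathway or complex with one of those genes, and the reported figures follow the usual definitions of sensitivity, specificity and fold reduction. The sketch assumes pathways are given as sets of gene symbols; all gene and pathway names are hypothetical, and the published CPS scoring is not reproduced.

```python
def common_pathway_scan(locus_genes, known_disease_genes, pathways):
    """Keep locus genes that share at least one pathway with a known disease gene.

    pathways: dict pathway/complex ID -> set of member gene symbols
    """
    seeded = [members for members in pathways.values() if members & known_disease_genes]
    return {gene for gene in locus_genes if any(gene in members for members in seeded)}


def evaluate(predicted, true_genes, all_genes):
    """Sensitivity, specificity and fold reduction of a candidate list."""
    tp = len(predicted & true_genes)
    fn = len(true_genes - predicted)
    fp = len(predicted - true_genes)
    tn = len(all_genes - predicted - true_genes)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # How many times shorter the candidate list is than the full interval
    fold_reduction = len(all_genes) / len(predicted) if predicted else float("inf")
    return sensitivity, specificity, fold_reduction


# Hypothetical ten-gene interval; G3 plays the role of the true disease gene
locus = {f"G{i}" for i in range(1, 11)}
pathways = {"P1": {"K1", "G3", "G7"}, "P2": {"G5", "G6"}}
predicted = common_pathway_scan(locus, {"K1"}, pathways)
sens, spec, fold = evaluate(predicted, {"G3"}, locus)
print(sorted(predicted), sens, round(spec, 3), fold)  # ['G3', 'G7'] 1.0 0.889 5.0
```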
4.
Nucleic Acids Res ; 33(Web Server issue): W160-3, 2005 Jul 01.
Article in English | MEDLINE | ID: mdl-15980446

ABSTRACT

Scooby-domain (sequence hydrophobicity predicts domains) is a fast and simple method to identify globular domains in protein sequences, based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method successfully identifies sequence regions that will form a globular structure and those that are likely to be unstructured. The method does not rely on homology searches and, therefore, can identify previously unknown domains for structural elucidation. Scooby-domain is available as a Java applet at http://ibivu.cs.vu.nl/programs/scoobywww. It may be used to visualize local properties within a protein sequence, such as average hydrophobicity, secondary structure propensity and domain boundaries, as well as being a method for fast domain assignment of large sequence sets.


Subjects
Protein Structure, Tertiary; Sequence Analysis, Protein/methods; Software; Hydrophobic and Hydrophilic Interactions; Internet; Proteins/chemistry
5.
J Mol Biol ; 347(2): 415-36, 2005 Mar 25.
Article in English | MEDLINE | ID: mdl-15740750

ABSTRACT

As enzymes evolve and diverge from common ancestor sequences, they often keep their overall reaction chemistry but specialize in the binding of different cognate ligands. This study borrows methods for the computational assessment of 2D similarity of small molecules from the field of chemoinformatics, to examine the extent of structure conservation of cognate ligands binding to similar proteins. Proteins from 87 structural superfamilies from Escherichia coli form the core dataset, which is extended using homologues with functional assignments from any organism. We find that correlation of the substrate similarity with protein similarity (measured by either sequence-based or structure-based scores) can only be clearly established for very similar proteins. At low sequence identities, the superfamily to which a protein belongs can give helpful clues to its function, and more importantly, the confidence attached to such clues is superfamily-dependent. Our data indicate that only a few superfamilies show great substrate diversity, and that most exhibit conservation of at least part of the structural scaffold of the substrate.


Subjects
Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Escherichia coli/chemistry; Escherichia coli/genetics; Evolution, Molecular; Databases, Protein; Escherichia coli/metabolism; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/classification; Ligands; Models, Molecular; Molecular Structure; Protein Conformation; Substrate Specificity
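The 2D similarity measures borrowed from chemoinformatics are commonly fingerprint comparisons scored with the Tanimoto coefficient. The sketch assumes each ligand fingerprint is already available as a set of "on" bit positions (computing real fingerprints needs a cheminformatics toolkit, which is not used here) and pairs it with a plain Pearson correlation between protein similarity and ligand similarity. The fingerprints and identity values are made up, and the study's exact similarity measures are not reproduced.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    # Convention assumed here: two empty fingerprints score 0.0
    return len(fp_a & fp_b) / union if union else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up fingerprints for the cognate ligands of three related enzymes
ligand_fp = {"enzA": {1, 2, 3, 8}, "enzB": {1, 2, 3, 9}, "enzC": {4, 5, 6}}
# Made-up pairwise sequence identities (%) between the enzymes
identity = {("enzA", "enzB"): 85.0, ("enzA", "enzC"): 22.0, ("enzB", "enzC"): 25.0}

pairs = [(pid, tanimoto(ligand_fp[a], ligand_fp[b])) for (a, b), pid in identity.items()]
print([round(s, 2) for _, s in pairs])  # [0.6, 0.0, 0.0]
print(round(pearson([p for p, _ in pairs], [s for _, s in pairs]), 3))
```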
6.
J Mol Biol ; 349(4): 745-63, 2005 Jun 17.
Article in English | MEDLINE | ID: mdl-15896806

ABSTRACT

We present here a comprehensive analysis of the complement of enzymes in a large variety of species. As enzymes are a relatively conserved group, there are several classification systems available that are common to all species and link a protein sequence to an enzymatic function. Enzymes are therefore an ideal functional group to study the relationship between sequence expansion, functional divergence and phenotypic changes. By using information retrieved from the well-annotated SWISS-PROT database together with sequence information from a variety of fully sequenced genomes and information from the EC functional scheme, we aimed to estimate the fraction of enzymes in genomes, to determine the extent of their functional redundancy in different domains of life and to identify functional innovations and lineage-specific expansions in the metazoan lineage. We found that prokaryote and eukaryote species differ both in the fraction of enzymes in their genomes and in the pattern of expansion of their enzymatic sets. We observe an increase in functional redundancy accompanying an increase in species complexity. A quantitative assessment was performed in order to determine the degree of functional redundancy in different species. Finally, we report a massive expansion in the number of mammalian enzymes involved in signalling and degradation.


Subjects
Computational Biology; Enzymes/metabolism; Proteome/metabolism; Proteomics; Animals; Databases, Protein; Enzymes/genetics; Eukaryotic Cells/metabolism; Genome; Humans; Mammals/classification; Mammals/genetics; Mammals/metabolism; Phylogeny; Prokaryotic Cells/metabolism; Proteome/genetics; Species Specificity
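The enzyme fraction and functional redundancy discussed in this abstract can be computed directly from per-gene EC annotations. The sketch assumes a mapping from each gene to its set of EC numbers (empty for non-enzymes) and uses one simple redundancy measure, the mean number of genes per distinct EC number; the paper's own redundancy measure may differ, and the gene names are hypothetical.

```python
from collections import Counter


def enzyme_summary(ec_by_gene):
    """Return (enzyme fraction, genes per EC number, mean genes per EC number).

    ec_by_gene: dict gene -> set of EC numbers (empty set for non-enzymes)
    """
    enzymes = {gene: ecs for gene, ecs in ec_by_gene.items() if ecs}
    fraction = len(enzymes) / len(ec_by_gene) if ec_by_gene else 0.0
    # Count how many genes are annotated with each EC number
    genes_per_ec = Counter(ec for ecs in enzymes.values() for ec in ecs)
    redundancy = sum(genes_per_ec.values()) / len(genes_per_ec) if genes_per_ec else 0.0
    return fraction, genes_per_ec, redundancy


# Hypothetical five-gene "genome"
toy_genome = {
    "gene1": {"2.7.11.1"},
    "gene2": {"2.7.11.1"},
    "gene3": {"3.4.21.4"},
    "gene4": set(),
    "gene5": set(),
}
fraction, per_ec, redundancy = enzyme_summary(toy_genome)
print(fraction, dict(per_ec), redundancy)  # 0.6 {'2.7.11.1': 2, '3.4.21.4': 1} 1.5
```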
7.
J Mol Biol ; 316(3): 839-51, 2002 Feb 22.
Article in English | MEDLINE | ID: mdl-11866536

ABSTRACT

We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4 %. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9 % for proteins comprising continuous domains only, and 35.4 % for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8 %. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation.


Subjects
Computational Biology/methods; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Software; Algorithms; Databases, Protein; Hydrophobic and Hydrophilic Interactions; Models, Molecular; Protein Folding; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis; Statistics as Topic
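The accuracy figures in this abstract measure two different things: whether the number of domains is right, and whether predicted boundaries fall on the linkers observed in the tertiary structure. The sketch below is one possible way to score both; the tolerance window around each linker and the example numbers are assumptions, not SnapDRAGON's published scoring protocol.

```python
def domain_count_accuracy(count_pairs):
    """Fraction of proteins whose predicted domain count equals the true count.

    count_pairs: list of (predicted_count, true_count)
    """
    return sum(p == t for p, t in count_pairs) / len(count_pairs)


def boundary_accuracy(predicted_positions, true_linkers, tolerance=10):
    """Fraction of true linkers hit by at least one predicted boundary.

    true_linkers: list of (start, end) residue ranges of observed linkers
    tolerance: residues by which each linker is widened (an assumed window)
    """
    hit = 0
    for start, end in true_linkers:
        if any(start - tolerance <= pos <= end + tolerance for pos in predicted_positions):
            hit += 1
    return hit / len(true_linkers)


print(domain_count_accuracy([(2, 2), (3, 2), (1, 1), (4, 4)]))  # 0.75
print(boundary_accuracy([95, 250], [(100, 110), (200, 205)]))   # 0.5
```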
8.
Bioinformatics ; 20 Suppl 1: i130-6, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262791

ABSTRACT

MOTIVATION: Domains are the units of protein structure, function and evolution. It is therefore essential to utilize knowledge of domains when studying the evolution of function, or when assigning function to genome sequence data. For this purpose, we have developed a database of catalytic domains, SCOPEC, by combining structural domain information from SCOP, full-length sequence information from Swiss-Prot, and verified functional information from the Enzyme Classification (EC) database. Two major problems need to be overcome to create a database of domain-function relationships: (1) for sequences, EC numbers are typically assigned to whole sequences rather than the functional unit, and (2) Protein Data Bank (PDB) structures elucidated from a larger multi-domain protein will often have EC annotation although the relevant catalytic domain may lie elsewhere. RESULTS: SCOPEC entries have high-quality enzyme assignments, having passed both computational and manual checks. SCOPEC currently contains entries for 75% of all EC annotations in the PDB. Overall, EC number is fairly well conserved within a superfamily, even when the proteins are distantly related. Initial analysis is encouraging, suggesting that there is a 50:50 chance of conserved function in distant homologues first detected by a third-iteration PSI-BLAST search. Therefore, we envisage that a knowledge-based approach to function assignment using the domain-EC relationships in SCOPEC will achieve a marked improvement over this baseline. AVAILABILITY: The SCOPEC database is a valuable resource in the analysis and prediction of protein structure and function. It can be obtained or queried at our website http://www.enzome.com


Subjects
Catalysis; Databases, Protein; Information Storage and Retrieval/methods; Models, Chemical; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Computer Simulation; Database Management Systems; Protein Structure, Tertiary
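Asking whether EC number is conserved within a superfamily comes down to comparing EC numbers level by level. The sketch below counts how many leading EC levels two annotations share (treating "-" as unassigned) and the fraction of within-superfamily pairs that agree at all four levels. The EC lists are illustrative and the pairwise rule is an assumption, not SCOPEC's own procedure.

```python
from itertools import combinations


def shared_ec_levels(ec_a, ec_b):
    """Number of leading EC levels (0-4) two EC numbers have in common."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b or a == "-":
            break  # stop at the first disagreement or unassigned level
        shared += 1
    return shared


def full_ec_conservation(ec_numbers):
    """Fraction of domain pairs in a superfamily with identical 4-level EC numbers."""
    pairs = list(combinations(ec_numbers, 2))
    if not pairs:
        return 0.0
    return sum(shared_ec_levels(a, b) == 4 for a, b in pairs) / len(pairs)


print(shared_ec_levels("1.1.1.1", "1.1.1.2"))   # 3
print(shared_ec_levels("3.4.21.4", "3.4.-.-"))  # 2
print(full_ec_conservation(["1.1.1.1", "1.1.1.1", "1.1.1.2"]))
```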
9.
BMC Med Genomics ; 8 Suppl 2: S1, 2015.
Article in English | MEDLINE | ID: mdl-26044129

ABSTRACT

BACKGROUND: Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with the predicted candidate genes can potentially identify novel therapeutics suitable for repositioning towards treatment of CAD. METHODS: Previously, we were able to predict 264 candidate genes and 104 potential therapeutic targets for CAD using Gentrepid (http://www.gentrepid.org), a candidate gene prediction platform with two bioinformatic modules, to reanalyze Wellcome Trust Case-Control Consortium GWAS data. In an expanded study, using five bioinformatic modules on the same data, Gentrepid predicted 647 candidate genes and successfully replicated 55% of the candidate genes identified by the more powerful CARDIoGRAMplusC4D consortium meta-analysis. Hence, Gentrepid was capable of enhancing lower-quality genotype-phenotype data using an independent knowledgebase of existing biological data. Here, we used our methodology to integrate drug data from three drug databases (the Therapeutic Target Database, PharmGKB and DrugBank) with the 647 candidate gene predictions from Gentrepid. We used known CAD targets, the scientific literature, existing drug data and the CARDIoGRAMplusC4D meta-analysis study as benchmarks to validate Gentrepid predictions for CAD. RESULTS: Our analysis identified a total of 184 predicted candidate genes as novel therapeutic targets for CAD, and 981 novel therapeutics feasible for repositioning in clinical trials towards treatment of CAD. The benchmarks based on known CAD targets and the scientific literature showed that our results were significant (p < 0.05). CONCLUSIONS: We have demonstrated that available drugs may potentially be repositioned as novel therapeutics for the treatment of CAD. Drug repositioning can save valuable time and money spent on preclinical and phase I clinical studies.


Subjects
Coronary Artery Disease/genetics; Coronary Artery Disease/therapy; Genome-Wide Association Study; Case-Control Studies; Clinical Trials as Topic; Databases as Topic; Humans; Molecular Targeted Therapy; Reproducibility of Results; Software
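The integration step in this abstract can be pictured as set operations: drop predicted genes that are already known targets for the disease, then collect drugs that hit the remaining candidates. The sketch assumes drug-target pairs have already been extracted into a dict (no drug database is queried), and all gene and drug names are placeholders rather than real entries.

```python
def reposition(candidates, known_targets, drug_targets, known_drugs=frozenset()):
    """Return (novel druggable targets, repositioning candidate drugs).

    candidates: set of predicted candidate genes
    known_targets: genes already established as targets for the disease
    drug_targets: dict drug -> set of target genes
    known_drugs: drugs already used for the disease (excluded from the output)
    """
    novel = candidates - known_targets
    # Drugs that hit at least one novel target and are not already in use
    drugs = {
        drug for drug, targets in drug_targets.items()
        if targets & novel and drug not in known_drugs
    }
    # Novel targets reached by at least one of those drugs
    druggable = {gene for gene in novel if any(gene in drug_targets[d] for d in drugs)}
    return druggable, drugs


targets, drugs = reposition(
    candidates={"GENE1", "GENE2", "GENE3"},
    known_targets={"GENE1"},
    drug_targets={"drugA": {"GENE1"}, "drugB": {"GENE2"}, "drugC": {"GENE2", "GENE9"}},
    known_drugs={"drugA"},
)
print(sorted(targets), sorted(drugs))  # ['GENE2'] ['drugB', 'drugC']
```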
10.
Proteins ; 48(4): 672-81, 2002 Sep 01.
Article in English | MEDLINE | ID: mdl-12211035

ABSTRACT

Protein sequences containing more than one structural domain are problematic when used in homology searches, where they can either stop an iterative database search prematurely or cause the search to explode into hits to common domains. We describe a method, DOMAINATION, that infers domains and their boundaries in a query sequence from local gapped alignments generated using PSI-BLAST. Through a new technique to recognize domain insertions and permutations, DOMAINATION submits delineated domains as successive database queries in further iterative steps. Assessed over a set of 452 multidomain proteins, the method predicts structural domain boundaries with an overall accuracy of 50% and improves finding distant homologies by 14% compared with PSI-BLAST. DOMAINATION is available as a web-based tool at http://mathbio.nimr.mrc.ac.uk, and the source code is available from the authors upon request.


Subjects
Databases, Protein; Protein Structure, Tertiary; Proteins/chemistry; Sequence Analysis, Protein/methods; Animals; Computational Biology/methods; Repetitive Sequences, Amino Acid; Reproducibility of Results; Sequence Alignment; Sequence Homology, Amino Acid
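One intuition behind inferring domains from local alignments is that, for a multidomain query, many hits start or stop near the same residue. The sketch below clusters internal alignment endpoints and reports the low median of each well-supported cluster as a putative boundary. The margin, window and support thresholds are assumptions, the hits are made up, and this is not DOMAINATION's own boundary rule (which also handles insertions and permutations).

```python
from statistics import median_low


def infer_boundaries(hits, query_length, margin=20, window=10, min_support=3):
    """Putative domain boundaries from (start, end) query ranges of local alignments.

    Endpoints within `margin` residues of either terminus are ignored, nearby
    endpoints (gaps <= `window`) are clustered, and clusters with at least
    `min_support` endpoints are reported by their low median position.
    """
    points = sorted(
        [s for s, _ in hits if s > 1 + margin]
        + [e for _, e in hits if e < query_length - margin]
    )
    boundaries, cluster = [], []
    for p in points:
        if cluster and p - cluster[-1] > window:
            if len(cluster) >= min_support:
                boundaries.append(median_low(cluster))
            cluster = []
        cluster.append(p)
    if len(cluster) >= min_support:
        boundaries.append(median_low(cluster))
    return boundaries


# Made-up PSI-BLAST-style hits on a 250-residue query
hits = [(1, 100), (1, 98), (101, 250), (99, 250), (3, 240)]
print(infer_boundaries(hits, 250))  # [99]
```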
11.
BMC Med Genomics ; 7 Suppl 1: S8, 2014.
Article in English | MEDLINE | ID: mdl-25077696

ABSTRACT

BACKGROUND: Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate these data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes that may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on preclinical studies and phase I clinical trials. METHODS: We previously used Gentrepid (http://www.gentrepid.org) as a platform to predict 1,497 candidate genes for the seven complex diseases considered in the Wellcome Trust Case-Control Consortium genome-wide association study, namely Type 2 Diabetes, Bipolar Disorder, Crohn's Disease, Hypertension, Type 1 Diabetes, Coronary Artery Disease and Rheumatoid Arthritis. Here, we adopted a simple approach to integrate drug data from three publicly available drug databases (the Therapeutic Target Database, the Pharmacogenomics Knowledgebase and DrugBank) with candidate gene predictions from Gentrepid at the systems level. RESULTS: Using the publicly available drug databases as sources of drug-target association data, we identified a total of 428 candidate genes as novel therapeutic targets for the seven phenotypes of interest, and 2,130 drugs feasible for repositioning against the predicted novel targets. CONCLUSIONS: By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate ground-breaking results in genetics to clinical treatments.


Subjects
Disease/genetics; Genome-Wide Association Study; Molecular Targeted Therapy/methods; Databases, Pharmaceutical; Drug Approval; Drug Discovery; Feasibility Studies; Genetic Loci/genetics; Humans; United States; United States Food and Drug Administration
12.
Protein Sci ; 18(8): 1745-65, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19598234

ABSTRACT

Disulfides are conventionally viewed as structurally stabilizing elements in proteins, but emerging evidence suggests that two disulfide subproteomes exist. One group mediates the well-known role of structural stabilization. A second, redox-active group is best known for its catalytic functions but is increasingly being recognized for its roles in the regulation of protein function. Redox-active disulfides are, by their very nature, more susceptible to reduction than structural disulfides; conversely, the Cys pairs that form them are more susceptible to oxidation. In this study, we searched for potentially redox-active Cys pairs by scanning the Protein Data Bank for structures of proteins in alternate redox states. The PDB contains over 1134 unique redox pairs of proteins, many of which exhibit conformational differences between alternate redox states. Several classes of structural change were observed, in proteins that exhibit: disulfide oxidation following expulsion of metals such as zinc; major reorganisation of the polypeptide backbone in association with disulfide redox activity; order/disorder transitions; and changes in quaternary structure. Based on the evidence gathered supporting disulfide redox activity, we propose that disulfides present in alternate redox states are likely to have physiologically relevant redox activity.


Subjects
Disulfides/metabolism; Metals/metabolism; Proteins/chemistry; Computational Biology; Databases, Protein; Disulfides/chemistry; Oxidation-Reduction; Protein Conformation; Protein Structure, Tertiary/physiology; Proteins/metabolism
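Finding proteins captured in alternate redox states amounts to comparing which Cys pairs are disulfide-bonded in two structures of the same protein. The sketch assumes the Cys SG coordinates have already been parsed from each structure and share residue numbering; the 2.5 Å cutoff is an assumed threshold (a typical S-S bond is about 2.05 Å), and the coordinates are made up.

```python
import math

SS_CUTOFF = 2.5  # Å; assumed cutoff for calling an SG-SG contact a disulfide


def disulfides(sg_coords, cutoff=SS_CUTOFF):
    """Set of (res_i, res_j) Cys pairs whose SG atoms lie within `cutoff` Å.

    sg_coords: dict residue number -> (x, y, z) of that Cys's SG atom
    """
    residues = sorted(sg_coords)
    return {
        (a, b)
        for i, a in enumerate(residues)
        for b in residues[i + 1:]
        if math.dist(sg_coords[a], sg_coords[b]) <= cutoff
    }


def redox_active_candidates(structure_1, structure_2):
    """Cys pairs bonded in one structure but not the other (symmetric difference)."""
    return disulfides(structure_1) ^ disulfides(structure_2)


# Made-up SG coordinates for one protein in two redox states
oxidised = {10: (0.0, 0.0, 0.0), 40: (2.0, 0.0, 0.0), 80: (10.0, 0.0, 0.0)}
reduced = {10: (0.0, 0.0, 0.0), 40: (4.0, 0.0, 0.0), 80: (10.0, 0.0, 0.0)}
print(redox_active_candidates(oxidised, reduced))  # {(10, 40)}
```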
13.
Proc Natl Acad Sci U S A ; 102(35): 12299-304, 2005 Aug 30.
Article in English | MEDLINE | ID: mdl-16037208

ABSTRACT

Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well-annotated, close relationship will often facilitate sufficient annotation, this is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well-annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-D-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.


Subjects
Proteins/chemistry; Amino Acid Sequence; Animals; Catalytic Domain; Conserved Sequence; Databases, Protein; Enzymes/chemistry; Enzymes/genetics; Enzymes/metabolism; Genomics; Haptoglobins/chemistry; Haptoglobins/genetics; Haptoglobins/metabolism; Humans; Molecular Sequence Data; Proteins/genetics; Proteins/metabolism; Proteomics; Sequence Homology, Amino Acid; Trypsin/chemistry; Trypsin/genetics; Trypsin/metabolism
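One ingredient of separating functionally similar from functionally divergent relatives is checking whether a homologue keeps the reference enzyme's catalytic residues at the aligned positions. The sketch below scores only that, assuming a precomputed pairwise alignment and known catalytic columns; the sequences and column indices are hypothetical, and the approach described in this abstract involves more than this single check.

```python
def catalytic_site_identity(query_aln, reference_aln, site_columns):
    """Fraction of the reference's catalytic residues conserved in the query.

    query_aln, reference_aln: aligned sequences of equal length ('-' = gap)
    site_columns: 0-based alignment columns holding the catalytic residues
    """
    if len(query_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have equal length")
    conserved = sum(
        1 for c in site_columns
        if reference_aln[c] != "-" and query_aln[c] == reference_aln[c]
    )
    return conserved / len(site_columns)


# Hypothetical alignment: the query keeps H and D but replaces the S
score = catalytic_site_identity("AHGDAA", "AHGDSA", [1, 3, 4])
print(round(score, 3), "likely same function" if score == 1.0 else "function may differ")
```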
14.
Protein Eng ; 15(11): 871-9, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12538906

ABSTRACT

Recent advances in protein engineering have come from creating multi-functional chimeric proteins containing modules from various proteins. These modules are typically joined via an oligopeptide linker, the correct design of which is crucial for the desired function of the chimeric protein. Here we analyse the properties of naturally occurring inter-domain linkers with the aim of designing linkers for domain fusion. Two main types of linker were identified: helical and non-helical. Helical linkers are thought to act as rigid spacers separating two domains. Non-helical linkers are rich in prolines, which also leads to structural rigidity and isolation of the linker from the attached domains. This means that both linker types are likely to act as a scaffold to prevent unfavourable interactions between folding domains. Based on these results, we have constructed a linker database intended for the rational design of linkers for domain fusion, which can be accessed via the Internet at http://mathbio.nimr.mrc.ac.uk.


Subjects
Protein Folding; Protein Structure, Tertiary; Chi-Square Distribution; Hydrophobic and Hydrophilic Interactions; Protein Conformation; Sequence Analysis, Protein
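The helical/non-helical split and the proline-richness observation in this abstract suggest a simple per-linker summary. The sketch assumes each linker comes with a DSSP-style secondary-structure string in which "H" marks alpha-helix (and "-" marks coil in the toy input); the 0.5 helix-fraction cutoff is an assumed threshold, not the paper's classification rule, and the sequences are illustrative.

```python
def classify_linker(sequence, secondary_structure, helix_cutoff=0.5):
    """Label a linker helical or non-helical and report its proline fraction.

    secondary_structure: DSSP-style string, same length as `sequence`
    """
    if not sequence or len(sequence) != len(secondary_structure):
        raise ValueError("need a non-empty sequence with matching secondary structure")
    helix_fraction = secondary_structure.count("H") / len(secondary_structure)
    proline_fraction = sequence.count("P") / len(sequence)
    kind = "helical" if helix_fraction >= helix_cutoff else "non-helical"
    return kind, proline_fraction


print(classify_linker("AEAAAKEAAAKA", "HHHHHHHHHHHH"))  # ('helical', 0.0)
print(classify_linker("PAPAP", "-----"))                # ('non-helical', 0.6)
```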