Results 1 - 19 of 19
1.
Nucleic Acids Res ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38747351

ABSTRACT

The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine-learning-based analyses for characterizing protein structure and function. In this paper we provide an update on recent additions and developments to the web server, with a focus on new deep-learning-based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.

2.
Methods Mol Biol ; 2165: 27-67, 2020.
Article in English | MEDLINE | ID: mdl-32621218

ABSTRACT

The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal providing both predicted models and annotations of proteins in model organisms, using several resources developed by these labs such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare structural classifications of protein domains in the CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses led to the identification of a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms, including Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Genome3D thus serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions. In doing so, it helps researchers broaden their perspective on structure/function predictions for their query protein of interest in selected model organisms.


Subject(s)
Genomics/organization & administration; Knowledge Bases; Molecular Sequence Annotation/methods; Proteome/chemistry; Animals; Arabidopsis; Genome; Genomics/methods; Humans; Information Dissemination; Sequence Alignment/methods; United Kingdom; Yeasts
3.
Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no longer having to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.


Subject(s)
Proteins/chemistry; Databases, Protein; Proteins/classification; Proteins/genetics; User-Computer Interface
4.
Proteins ; 88(4): 616-624, 2020 04.
Article in English | MEDLINE | ID: mdl-31703152

ABSTRACT

In this paper, using Word2vec, a widely used natural language processing method, we demonstrate that protein domains may have a learnable implicit semantic "meaning" in the context of their functional contributions to the multi-domain proteins in which they are found. Word2vec is a group of models which can be used to produce semantically meaningful embeddings of words or tokens in a fixed-dimension vector space. In this work, we treat multi-domain proteins as "sentences" where domain identifiers are tokens which may be considered as "words." Using all InterPro (Finn et al. 2017) Pfam domain assignments, we observe that the embedding could be used to suggest putative GO assignments for Pfam (Finn et al. 2016) domains of unknown function.
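
To make the "proteins as sentences" idea concrete, the sketch below builds the (target, context) pairs that a skip-gram style Word2vec model would be trained on. It is only an illustration under stated assumptions: the domain identifiers (`dom_A` and so on), the three toy architectures, and the window size of 2 are all hypothetical, and the abstract does not say which Word2vec variant or window the authors used.

```python
from collections import Counter

# Hypothetical multi-domain proteins: each "sentence" is the ordered list of
# domain identifiers along one protein. The identifiers are placeholders.
proteins = [
    ["dom_A", "dom_B", "dom_C"],
    ["dom_A", "dom_C"],
    ["dom_C", "dom_D", "dom_B"],
]


def skipgram_pairs(sentence, window=2):
    """Return (target, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs


# Co-occurrence counts over all proteins; an embedding model learns vectors
# such that domains sharing many contexts end up close together.
pair_counts = Counter(p for s in proteins for p in skipgram_pairs(s))
print(pair_counts.most_common(3))
```

Domains of unknown function that share contexts with annotated domains would receive similar vectors, which is the basis for suggesting putative GO terms by proximity in the embedding space.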


Subject(s)
Molecular Sequence Annotation/methods; Natural Language Processing; Proteins/chemistry; Semantics; Databases, Protein; Datasets as Topic; Gene Ontology; Humans; Protein Domains; Proteins/physiology
5.
Nucleic Acids Res ; 47(W1): W402-W407, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31251384

ABSTRACT

The PSIPRED Workbench is a web server that has offered a range of predictive methods to the bioscience community for 20 years. Here, we present the work we have completed to update the PSIPRED Protein Analysis Workbench and make it ready for the next 20 years. The main focus of our recent website upgrade has been the acceleration of analyses in the face of increasing protein sequence database size. We additionally discuss new software, the new hardware infrastructure, our web services and website. Lastly, we survey updates to some of the key predictive algorithms available through our website.


Subject(s)
Gene Ontology/trends; Molecular Sequence Annotation/methods; Proteins/chemistry; Software/history; Amino Acid Sequence; Binding Sites; Gene Ontology/history; History, 21st Century; Internet; Models, Molecular; Molecular Sequence Annotation/history; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Proteins/history; Sequence Alignment; Sequence Homology, Amino Acid
6.
Proteins ; 86 Suppl 1: 78-83, 2018 03.
Article in English | MEDLINE | ID: mdl-28901583

ABSTRACT

In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. In the following paper, we discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method.


Subject(s)
Computational Biology/methods; Internet; Machine Learning; Models, Molecular; Neural Networks, Computer; Protein Conformation; Proteins/chemistry; Algorithms; Crystallography, X-Ray; Humans; Protein Interaction Domains and Motifs; Software
7.
Sci Rep ; 7(1): 6999, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28765603

ABSTRACT

Intrinsically disordered proteins (IDPs) are a prevalent phenomenon, with over 30% of human proteins estimated to have long disordered regions. Computational methods are widely used to study IDPs; however, nearly all treat disorder in a binary fashion, not accounting for the structural heterogeneity present in disordered regions. Here, we present a new de novo method, FRAGFOLD-IDP, which addresses this problem. Using 200 protein structural ensembles derived from NMR, we show that FRAGFOLD-IDP achieves superior results compared to methods which can predict related data (NMR order parameter, or crystallographic B-factor). FRAGFOLD-IDP produces very good predictions for 33.5% of cases and helps to get a better insight into the dynamics of the disordered ensembles. The results also show it is not necessary to predict the correct fold of the protein to reliably predict per-residue fluctuations. This implies that disorder is a local property and does not depend on the fold. Our results are orthogonal to DynaMine, the only other method significantly better than the naïve prediction. We therefore combine these two using a neural network. FRAGFOLD-IDP enables better insight into backbone dynamics in IDPs and opens exciting possibilities for the design of disordered ensembles, disorder-to-order transitions, or design for protein dynamics.


Subject(s)
Computational Biology/methods; Intrinsically Disordered Proteins/chemistry; Molecular Biology/methods; Crystallography, X-Ray; Magnetic Resonance Spectroscopy; Models, Molecular; Neural Networks, Computer
8.
Bioinformatics ; 33(17): 2684-2690, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28419258

ABSTRACT

MOTIVATION: Protein fold recognition is often trivial when appropriate, evolutionarily related structural templates can be identified, and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. RESULTS: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology-based fold recognition methods. AVAILABILITY AND IMPLEMENTATION: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. CONTACT: d.t.jones@ucl.ac.uk.
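
As a rough illustration of what "resolving the principal eigenvector" of a contact map means, the sketch below applies power iteration to a tiny symmetric contact matrix. This is not EigenTHREADER's implementation: power iteration is swapped in here as a simple stand-in for a full eigendecomposition, it recovers only the single leading eigenvector, and the 3-residue map (with self-contacts on the diagonal) is a hypothetical toy input.

```python
import math


def principal_eigenvector(matrix, iterations=200):
    """Estimate the dominant eigenvalue and eigenvector by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # uniform, unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        # Rayleigh quotient v^T A v for the current unit vector
        eigenvalue = sum(
            v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
        )
    return eigenvalue, v


# Toy contact map: residue 2 touches residues 1 and 3; the diagonal marks
# self-contacts, which also keeps the dominant eigenvalue unique.
contact_map = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

eigenvalue, vector = principal_eigenvector(contact_map)
print(round(eigenvalue, 4), [round(x, 4) for x in vector])
```

Each entry of the resulting vector is a per-residue value, so two proteins' vectors can be treated as one-dimensional profiles and aligned by dynamic programming, as the abstract describes.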


Subject(s)
Computational Biology/methods; Models, Molecular; Protein Folding; Sequence Analysis, Protein/methods; Software; Algorithms
9.
Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.


Subject(s)
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Algorithms; Genomics; Internet; Models, Molecular; Protein Structure, Tertiary/genetics; Sequence Analysis, Protein
10.
Nucleic Acids Res ; 41(Web Server issue): W349-57, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23748958

ABSTRACT

Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.


Subject(s)
Protein Conformation; Software; Animals; Internet; Membrane Proteins/chemistry; Mice; Proteins/chemistry; Sequence Analysis, Protein; Structural Homology, Protein
11.
BMC Bioinformatics ; 14 Suppl 3: S1, 2013.
Article in English | MEDLINE | ID: mdl-23514099

ABSTRACT

BACKGROUND: Accurate protein function annotation is a severe bottleneck when utilizing the deluge of high-throughput, next-generation sequencing data. Keeping database annotations up to date has become a major scientific challenge that requires the development of reliable automatic predictors of protein function. The CAFA experiment provided a unique opportunity to undertake comprehensive 'blind testing' of many diverse approaches for automated function prediction. We report on the methodology we used for this challenge and on the lessons we learnt. METHODS: Our method integrates into a single framework a wide variety of biological information sources, encompassing sequence, gene expression and protein-protein interaction data, as well as annotations in UniProt entries. The methodology transfers functional categories based on the results from complementary homology-based and feature-based analyses. We generated the final molecular function and biological process assignments by combining the initial predictions in a probabilistic manner, which takes into account the Gene Ontology hierarchical structure. RESULTS: We propose a novel scoring function called COmbined Graph-Information Content similarity (COGIC) score for the comparison of predicted functional categories and benchmark data. We demonstrate that our integrative approach provides increased scope and accuracy over both the component methods and the naïve predictors. In line with previous studies, we find that molecular function predictions are more accurate than biological process assignments. CONCLUSIONS: Overall, the results indicate that there is considerable room for improvement in the field. It still remains for the community to invest a great deal of effort to make automated function prediction a useful and routine component in the toolbox of life scientists. As already witnessed in other areas, community-wide blind testing experiments will be pivotal in establishing standards for the evaluation of prediction accuracy, in fostering advancements and new ideas, and ultimately in recording progress.


Subject(s)
Proteins/physiology; Computational Biology/methods; Databases, Protein; Evolution, Molecular; Gene Expression; Molecular Sequence Annotation; Protein Interaction Mapping; Proteins/chemistry; Proteins/genetics; Sequence Analysis
12.
Nat Methods ; 10(3): 221-7, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.


Subject(s)
Computational Biology/methods; Molecular Biology/methods; Molecular Sequence Annotation; Proteins/physiology; Algorithms; Animals; Databases, Protein; Exoribonucleases/classification; Exoribonucleases/genetics; Exoribonucleases/physiology; Forecasting; Humans; Proteins/chemistry; Proteins/classification; Proteins/genetics; Species Specificity
13.
Nucleic Acids Res ; 41(Database issue): D499-507, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203986

ABSTRACT

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).


Subject(s)
Databases, Protein; Protein Structure, Tertiary; Genomics; Humans; Internet; Molecular Sequence Annotation; Proteins/chemistry; Proteins/classification; Proteins/genetics; Software
14.
Bioinformatics ; 28(2): 184-90, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22101153

ABSTRACT

MOTIVATION: The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. RESULTS: PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. AVAILABILITY: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
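
The L/5 long-range precision quoted in the RESULTS can be computed along the following lines. This is a minimal sketch, not the PSICOV evaluation code: the function name, the input layout, and the toy predictions are assumptions, and when fewer than L/5 long-range predictions exist the sketch divides by the number actually evaluated, which is one possible convention.

```python
def top_l5_precision(predictions, true_contacts, length, min_separation=24):
    """Precision of the top-L/5 scoring long-range contact predictions.

    predictions   : list of (i, j, score) residue pairs
    true_contacts : set of (i, j) pairs with i < j observed in the structure
    length        : protein length L
    min_separation: smallest |i - j| counted as long-range (separation > 23)
    """
    long_range = [p for p in predictions if abs(p[0] - p[1]) >= min_separation]
    long_range.sort(key=lambda p: p[2], reverse=True)
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Hypothetical scores for a 30-residue protein, so L/5 = 6 predictions count.
toy_predictions = [
    (10, 12, 0.99),  # short-range, ignored
    (1, 25, 0.95), (2, 27, 0.90), (3, 28, 0.85),
    (4, 30, 0.80), (1, 30, 0.75), (5, 29, 0.70),
    (2, 30, 0.10),   # long-range but ranked below the top six
]
toy_true = {(1, 25), (3, 28), (4, 30), (5, 29), (2, 30)}

precision = top_l5_precision(toy_predictions, toy_true, length=30)
print(round(precision, 3))
```

Under this convention, a target counts toward the "118 out of 150" figure when the returned value is at least 0.5.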


Subject(s)
Algorithms; Proteins/chemistry; Sequence Alignment/methods; Bayes Theorem; Mutation; Phylogeny; Proteins/genetics
15.
J Mol Biol ; 336(4): 871-87, 2004 Feb 27.
Article in English | MEDLINE | ID: mdl-15095866

ABSTRACT

We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that are distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.
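
The size-dependent versus size-independent split described above rests on correlating each superfamily's occurrence profile with genome size. The sketch below shows one way to do that with a Pearson correlation. The 0.7 cut-off, the function names, and all numbers are hypothetical; the abstract does not state the threshold or the exact statistic the authors applied.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def classify_superfamily(occurrences, genome_sizes, threshold=0.7):
    """Label a superfamily by how strongly its copy number tracks genome size."""
    r = pearson(occurrences, genome_sizes)
    return "size-dependent" if r >= threshold else "size-independent"


# Invented gene counts for four genomes and two superfamily occurrence profiles.
genome_sizes = [1000, 2000, 3000, 4000]
metabolic_like = [2, 4, 6, 8]      # grows with genome size
translation_like = [3, 3, 3, 3]    # same copy number everywhere

print(classify_superfamily(metabolic_like, genome_sizes))
print(classify_superfamily(translation_like, genome_sizes))
```

Separating linear from non-linear growth among the size-dependent superfamilies would need a further step, for example comparing fits of different curve shapes, which this sketch does not attempt.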


Subject(s)
Evolution, Molecular; Genome, Bacterial; Proteins/classification; Proteins/genetics; Databases, Protein; Open Reading Frames; Protein Conformation; Proteins/chemistry; Statistics as Topic
16.
Nucleic Acids Res ; 31(1): 469-73, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520054

ABSTRACT

The Gene3D database (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) provides structural assignments for genes within complete genomes. These are available via the internet from either the World Wide Web or FTP. Assignments are made using PSI-BLAST and subsequently processed using the DRange protocol. The DRange protocol is an empirically benchmarked method for assessing the validity of structural assignments made using sequence searching methods, where appropriate assignment statistics are collected and made available. Gene3D links assignments to their appropriate entries in relevant structural and classification resources (PDBsum, CATH database and the Dictionary of Homologous Superfamilies). Release 2.0 of Gene3D includes 62 genomes, 2 eukaryotes, 10 archaea and 40 bacteria. Currently, structural assignments can be made for between 30 and 40 percent of any given genome. In any genome, around half of those genes assigned a structural domain are assigned a single domain and the other half are assigned multiple structural domains. Gene3D is linked to the CATH database and is updated with each new update of CATH.


Subject(s)
Databases, Genetic; Genome; Protein Structure, Tertiary; Proteins/chemistry; Animals; Computational Biology; Genome, Archaeal; Genome, Bacterial; Imaging, Three-Dimensional; Internet; Proteins/physiology; Structural Homology, Protein
17.
Genome Res ; 12(3): 503-14, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11875040

ABSTRACT

We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies.


Subject(s)
Databases, Genetic; Genes/genetics; Genome; Software; Animals; Archaeal Proteins/genetics; Bacterial Proteins/genetics; Databases, Genetic/statistics & numerical data; Databases, Protein; Genes, Archaeal/genetics; Genes, Bacterial/genetics; Genome, Archaeal; Genome, Bacterial; Internet; Protein Structure, Tertiary; Proteins/genetics; Sequence Homology, Nucleic Acid; Software/statistics & numerical data
18.
Protein Sci ; 11(2): 233-44, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11790833

ABSTRACT

An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.


Subject(s)
Genome; Proteins/chemistry; Algorithms; Biological Evolution; Databases, Factual; Databases, Protein; Protein Conformation; Protein Folding; Protein Structure, Tertiary; Proteins/genetics; Structure-Activity Relationship
19.
Proteomics ; 2(1): 11-21, 2002 Jan.
Article in English | MEDLINE | ID: mdl-11788987

ABSTRACT

Over the last decade, there have been huge increases in the numbers of protein sequences and structures determined. In parallel, many methods have been developed for recognising similarities between these proteins, arising from their common evolutionary background, and for clustering such relatives into protein families. Here we review some of the protein family resources available to the biologist and describe how these can be used to provide structural and functional annotations for newly determined sequences. In particular we describe recent developments to the CATH domain database of protein structural families which have facilitated genome annotation and which have also revealed important caveats that must be considered when transferring functional data between homologous proteins.


Subject(s)
Databases, Protein; Genome; Protein Conformation