Results 1 - 17 of 17
1.
Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Subject(s)
Protein Databases , Protein Tertiary Structure , Proteins/classification , Proteins/physiology , Protein Sequence Analysis , Software , Terminology as Topic , User-Computer Interface
2.
Bioinformatics ; 26(5): 596-602, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-20130034

ABSTRACT

MOTIVATION: Some first order methods for protein sequence analysis inherently treat each position as independent. We develop a general framework for introducing longer range interactions. We then demonstrate the power of our approach by applying it to secondary structure prediction; under the independence assumption, sequences produced by existing methods can produce features that are not protein like, an extreme example being a helix of length 1. Our goal was to make the predictions from state of the art methods more realistic, without loss of performance by other measures. RESULTS: Our framework for longer range interactions is described as a k-mer order model. We succeeded in applying our model to the specific problem of secondary structure prediction, to be used as an additional layer on top of existing methods. We achieved our goal of making the predictions more realistic and protein like, and remarkably this also improved the overall performance. We improve the Segment OVerlap (SOV) score by 1.8%, but more importantly we radically improve the probability of the real sequence given a prediction from an average of 0.271 per residue to 0.385. Crucially, this improvement is obtained using no additional information. AVAILABILITY: http://supfam.cs.bris.ac.uk/kmer


Subject(s)
Computational Biology/methods , Protein Secondary Structure , Proteins/chemistry , Amino Acid Sequence , Protein Databases , Molecular Models , Molecular Sequence Data , Sequence Alignment , Protein Sequence Analysis
3.
Nucleic Acids Res ; 37(Database issue): D380-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19036790

ABSTRACT

SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.


Subject(s)
Protein Databases , Protein Tertiary Structure , Proteins/genetics , Animals , Computer Graphics , Genomics , Humans , Phylogeny , Protein Tertiary Structure/genetics , Proteins/classification , DNA Sequence Analysis , Protein Sequence Analysis
4.
Nucleic Acids Res ; 37(Database issue): D211-5, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940856

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).


Subject(s)
Protein Databases , Protein Sequence Analysis , Proteins/chemistry , Proteins/classification , Systems Integration
5.
Bioinformatics ; 24(22): 2630-1, 2008 Nov 15.
Article in English | MEDLINE | ID: mdl-18845584

ABSTRACT

UNLABELLED: Profile Comparer (PRC) is a stand-alone program for scoring and aligning profile hidden Markov models (HMMs) of protein families. PRC can read models produced by SAM and HMMER, two popular profile HMM packages, as well as PSI-BLAST checkpoint files. This application note provides a brief description of the profile-profile algorithm used by PRC. AVAILABILITY: The C source code licensed under the GNU General Public Licence and Linux and Mac OS X binaries can be downloaded from http://supfam.org/PRC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Markov Chains , Biological Models , Software , Algorithms
6.
Nucleic Acids Res ; 35(Database issue): D308-13, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17098927

ABSTRACT

The SUPERFAMILY database provides protein domain assignments, at the SCOP 'superfamily' level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from http://supfam.org. The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.


Subject(s)
Protein Databases , Protein Tertiary Structure , Genomics , Internet , Protein Tertiary Structure/genetics , Protein Tertiary Structure/physiology , Proteins/classification , User-Computer Interface
7.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Protein Databases , Internet , Protein Tertiary Structure , Proteins/chemistry , Proteins/classification , Proteins/physiology , Protein Sequence Analysis , Systems Integration , User-Computer Interface
8.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subject(s)
Protein Databases , Proteins/chemistry , Proteins/classification , Protein Sequence Analysis , Protein Databases/trends , Humans , Protein Tertiary Structure , Sequence Alignment , Systems Integration
9.
RNA Biol ; 3(1): 40-8, 2006.
Article in English | MEDLINE | ID: mdl-17114936

ABSTRACT

Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and non-coding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of non-coding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as much as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually non-coding transcripts.


Subject(s)
Biochemistry/methods , Genetic Techniques , Messenger RNA/chemistry , Untranslated RNA/chemistry , Algorithms , Animals , Computational Biology , Complementary DNA/metabolism , Statistical Data Interpretation , Protein Databases , Expressed Sequence Tags , Mice , Open Reading Frames , Protein Tertiary Structure , Proteins/chemistry , Messenger RNA/genetics , Untranslated RNA/genetics
10.
Nucleic Acids Res ; 30(19): 4321-8, 2002 Oct 01.
Article in English | MEDLINE | ID: mdl-12364612

ABSTRACT

Profile hidden Markov models (HMMs) are amongst the most successful procedures for detecting remote homology between proteins. There are two popular profile HMM programs, HMMER and SAM. Little is known about their performance relative to each other and to the recently improved version of PSI-BLAST. Here we compare the two programs to each other and to non-HMM methods, to determine their relative performance and the features that are important for their success. The quality of the multiple sequence alignments used to build models was the most important factor affecting the overall performance of profile HMMs. The SAM T99 procedure is needed to produce high quality alignments automatically, and the lack of an equivalent component in HMMER makes it less complete as a package. Using the default options and parameters as would be expected of an inexpert user, it was found that from identical alignments SAM consistently produces better models than HMMER and that the relative performance of the model-scoring components varies. On average, HMMER was found to be between one and three times faster than SAM when searching databases larger than 2000 sequences, SAM being faster on smaller ones. Both methods were shown to have effective low complexity and repeat sequence masking using their null models, and the accuracy of their E-values was comparable. It was found that the SAM T99 iterative database search procedure performs better than the most recent version of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times faster than scoring of SAM models.


Subject(s)
Computational Biology/methods , Markov Chains , Sequence Alignment/methods , Amino Acid Sequence , Molecular Sequence Data , Proteins/genetics , Reproducibility of Results
11.
Nucleic Acids Res ; 32(Database issue): D235-9, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681402

ABSTRACT

The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.


Subject(s)
Computational Biology , Protein Databases , Proteins/chemistry , Proteins/classification , Animals , Genomics , Humans , Information Storage and Retrieval , Internet , Licensure , Markov Chains , Protein Tertiary Structure , Software
12.
J Mol Biol ; 342(1): 131-43, 2004 Sep 03.
Article in English | MEDLINE | ID: mdl-15313612

ABSTRACT

We have examined the mouse genome sequence to determine its VH gene segment repertoire. In all, 141 segments are mapped to a 3 Mb region of chromosome 12. There is evidence that 92 of these are functional in the mouse strain used for the genome sequence, C57BL/6J; 12 are functional in other mouse strains, and 37 are pseudogenes. The mouse VH gene segment repertoire is therefore twice the size of that in humans. The mouse and human loci bear no large-scale similarity to each other. The 104 functional segments belong to one of the 15 known sequence subgroups, which have been further clustered into eight sets here. Seven of these sets, comprising 101 sequences, are related to five of the human VH families and have the same canonical structures in their hypervariable regions. Duplication of members of one set in the distal half of the locus is mainly responsible for the larger size of the mouse repertoire. Phylogenetic analysis of the VH segments indicates that most of the sequences in the human and mouse VH loci have arisen subsequent to the divergence of the two organisms from their common ancestor.


Subject(s)
Immunoglobulin Genes , Genome , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Amino Acid Sequence , Animals , Molecular Evolution , Genetic Variation , Humans , Immunoglobulin Heavy Chains/classification , Immunoglobulin Variable Region/classification , Likelihood Functions , Mice , Molecular Sequence Data , Multigene Family , Phylogeny , Sequence Alignment
13.
J Chromatogr A ; 997(1-2): 285-90, 2003 May 16.
Article in English | MEDLINE | ID: mdl-12830903

ABSTRACT

In water-based heating and cooling circuits monoethylene glycol is frequently used as an anti-freezing agent. For corrosion protection inhibitors based on nitrite, molybdate or amines are commonly added. The determination of nitrite is usually performed by ion chromatography (IC) using an IonPac AS14 analytical column for the anion separation and a suppressed conductivity detection. Local overheating in some circuits causes degradation of ethylene glycol and leads to the formation of some organic acids. Under such chemical conditions the correct quantification of nitrite becomes a more complex analytical task due to the interference of the organic acids. This problem was solved using the IonPac AS9-HC separation column. In heat transfer systems, where nitrite is not stable, molybdate can be used as an inhibitor for corrosion protection. In these cases photometric methods are recommended for monitoring the molybdate concentration. However, due to the dark brown colour and turbidity of aged glycol solutions photometric methods were not applicable. Thus the use of IC offered a reliable alternative for the determination of molybdate, also in aged glycol solutions, using IonPac AS9-HC or AS14 columns for separation.


Subject(s)
Ion Exchange Chromatography/methods , Hot Temperature , Molybdenum/analysis , Nitrites/analysis , Corrosion , Ethylene Glycol/chemistry , Glycols/chemistry , Indicators and Reagents , Photometry , Quality Control , Solutions
14.
J Mol Biol ; 403(3): 480-93, 2010 Oct 29.
Article in English | MEDLINE | ID: mdl-20813113

ABSTRACT

Coiled coils are α-helical interactions found in many natural proteins. Various sequence-based coiled-coil predictors are available, but key issues remain: oligomeric state and protein-protein interface prediction and extension to all genomes. We present SpiriCoil (http://supfam.org/SUPERFAMILY/spiricoil), which is based on a novel approach to the coiled-coil prediction problem for coiled coils that fall into known superfamilies: hundreds of hidden Markov models representing coiled-coil-containing domain families. Using whole domains gives the advantage that sequences flanking the coiled coils help. SpiriCoil performs at least as well as existing methods at detecting coiled coils and significantly advances the state of the art for oligomer state prediction. SpiriCoil has been run on over 16 million sequences, including all completely sequenced genomes (more than 1200), and a resulting Web interface supplies data downloads, alignments, scores, oligomeric state classifications, three-dimensional homology models and visualisation. This has allowed, for the first time, a genomewide analysis of coiled-coil evolution. We found that coiled coils have arisen independently de novo well over a hundred times, and these are observed in 16 different oligomeric states. Coiled coils in almost all oligomeric states were present in the last universal common ancestor of life. The vast majority of occasions that individual coiled coils have arisen de novo were before the last universal common ancestor of life; we do, however, observe scattered instances throughout subsequent evolutionary history, mostly in the formation of the eukaryote superkingdom. Coiled coils do not change their oligomeric state over evolution and did not evolve from the rearrangement of existing helices in proteins; coiled coils were forged in unison with the fold of the whole protein.


Subject(s)
Molecular Evolution , Genome , Proteins/chemistry , Software , Computational Biology , Databases as Topic , Molecular Models , Protein Conformation , Protein Multimerization , Proteins/genetics , Proteins/metabolism
16.
J Bacteriol ; 188(8): 2761-73, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16585737

ABSTRACT

Lipid modification of the N-terminal Cys residue (N-acyl-S-diacylglyceryl-Cys) has been found to be an essential, ubiquitous, and unique bacterial posttranslational modification. Such a modification allows even highly hydrophilic proteins to be anchored to the membrane, where they carry out a variety of functions important for bacteria, including pathogenesis. Hence, being able to identify such proteins is of great value. To this end, we have created a comprehensive database of bacterial lipoproteins, called DOLOP, which contains information and links to molecular details for about 278 distinct lipoproteins and predicted lipoproteins from 234 completely sequenced bacterial genomes. The website also features a tool that applies a predictive algorithm to identify the presence or absence of the lipoprotein signal sequence in a user-given sequence. The experimentally verified lipoproteins have been classified into different functional classes and, more importantly, functional domain assignments using hidden Markov models from the SUPERFAMILY database have been provided for the predicted lipoproteins. Other features include the following: primary sequence analysis, signal sequence analysis, a search facility and an information exchange facility to allow researchers to exchange results on newly characterized lipoproteins. The website, along with additional information on the biosynthetic pathway, statistics on predicted lipoproteins, and related figures, is available at http://www.mrc-lmb.cam.ac.uk/genomes/dolop/.


Subject(s)
Bacterial Proteins , Computational Biology , Protein Databases , Lipoproteins , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Bacterial Genes , Bacterial Genome , Lipoproteins/chemistry , Lipoproteins/genetics , Lipoproteins/physiology , Protein Sorting Signals/genetics , Protein Tertiary Structure/genetics
17.
Genome Res ; 13(8): 1787-99, 2003 Aug.
Article in English | MEDLINE | ID: mdl-12869580

ABSTRACT

The apicomplexan Cryptosporidium parvum is one of the most prevalent protozoan parasites of humans. We report the physical mapping of the genome of the Iowa isolate, sequencing and analysis of chromosome 6, and approximately 0.9 Mbp of sequence sampled from the remainder of the genome. To construct a robust physical map, we devised a novel and general strategy, enabling accurate placement of clones regardless of clone artefacts. Analysis reveals a compact genome, unusually rich in membrane proteins. As in Plasmodium falciparum, the mean size of the predicted proteins is larger than that in other sequenced eukaryotes. We find several predicted proteins of interest as potential therapeutic targets, including one exhibiting similarity to the chloroquine resistance protein of Plasmodium. Coding sequence analysis argues against the conventional phylogenetic position of Cryptosporidium and supports an earlier suggestion that this genus arose from an early branching within the Apicomplexa. In agreement with this, we find no significant synteny and surprisingly little protein similarity with Plasmodium. Finally, we find two unusual and abundant repeats throughout the genome. Among sequenced genomes, one motif is abundant only in C. parvum, whereas the other is shared with (but has previously gone unnoticed in) all known genomes of the Coccidia and Haemosporida. These motifs appear to be unique in their structure, distribution and sequences.


Subject(s)
Cryptosporidium parvum/genetics , Physical Chromosome Mapping/methods , DNA Sequence Analysis/methods , Animals , Base Composition/genetics , Centromere/genetics , Cryptosporidiosis/diagnosis , Cryptosporidiosis/microbiology , Cryptosporidiosis/therapy , Cryptosporidium parvum/isolation & purification , Cryptosporidium parvum/pathogenicity , Protozoan DNA/analysis , Gene Dosage , Genetic Therapy , Protozoan Genome , Molecular Sequence Data , Phylogeny , Genetic Polymorphism/genetics , Single Nucleotide Polymorphism/genetics , Tandem Repeat Sequences/genetics , Telomere/genetics