Results 1-14 of 14
1.
Nucleic Acids Res ; 39(Database issue): D539-45, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20935045

ABSTRACT

The Protein Ontology (PRO) provides a formal, logically based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.


Subjects
Databases, Protein; Proteins/classification; Animals; Escherichia coli Proteins/chemistry; Humans; Mice; Multiprotein Complexes/chemistry; Multiprotein Complexes/classification; Protein Isoforms/chemistry; Protein Isoforms/classification; Proteins/chemistry; Proteins/genetics; User-Computer Interface; Vocabulary, Controlled
2.
BMC Bioinformatics ; 10 Suppl 5: S3, 2009 May 06.
Article in English | MEDLINE | ID: mdl-19426460

ABSTRACT

BACKGROUND: The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO.
RESULTS: PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins wherein equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationships to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. Therefore, PRO provides both a representation of protein entities and a resource for describing the associated data. This makes PRO useful both for proteomics studies where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications.
CONCLUSION: PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.
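To make the layered representation concrete (a gene-level protein class, its isoforms, and modified forms of each isoform, each more specific term linked to the one above it by an is_a relation), here is a minimal Python sketch of such a hierarchy. The `ProTerm` class, the level strings, and the SMAD2-based labels are hypothetical illustrations drawn from the TGF-beta pathway the abstract mentions; they are not PRO identifiers, and PRO itself is distributed as an ontology file rather than as objects like these.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProTerm:
    """Hypothetical node in a PRO-like hierarchy (not the real PRO schema)."""
    label: str
    level: str  # illustrative level name, e.g. "gene", "isoform", "modified form"
    # Exclude the parent link from repr/eq to avoid parent <-> child recursion.
    parent: Optional["ProTerm"] = field(default=None, repr=False, compare=False)
    children: List["ProTerm"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, label: str, level: str) -> "ProTerm":
        # Create a more specific term linked to this one by an is_a edge.
        child = ProTerm(label, level, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        # Walk is_a edges up to the root, most specific term first.
        node, path = self, []
        while node is not None:
            path.append(node.label)
            node = node.parent
        return path


# Hypothetical example: one gene-level class, one isoform, one modified form.
gene = ProTerm("SMAD2 (gene-level class)", "gene")
iso = gene.add_child("SMAD2 isoform 1", "isoform")
phos = iso.add_child("SMAD2 isoform 1, phosphorylated", "modified form")

print(" is_a ".join(phos.lineage()))
```

Walking `lineage()` from the most specific term back to the root mirrors how a statement about a modified form can be generalized to its isoform or gene-level class.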


Subjects
Information Storage and Retrieval/methods; Intracellular Signaling Peptides and Proteins/classification; Transforming Growth Factor beta/chemistry; Computational Biology/methods; Databases, Genetic; Databases, Protein; Humans; Intracellular Signaling Peptides and Proteins/genetics; Transforming Growth Factor beta/classification; User-Computer Interface
3.
Virus Genes ; 39(1): 1-9, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19283462

ABSTRACT

We analyzed the envelope proteins in pathogenic flaviviruses to determine whether there are sequence signatures associated with the tendency of viruses to produce hemorrhagic disease (H-viruses) or encephalitis (E-viruses). We found that, at the position corresponding to the glycosylated Asn-67 in dengue virus, asparagine (Asn) occurs in all seven viral species that cause hemorrhagic disease in humans. Furthermore, Asn was extremely rare at position 67 in six flaviviruses that cause encephalitis, being replaced by Asp in four of them. Of the 3,246 sequences from H- and E-viruses, we found that 2,916 sequences (90%) contained Asn in position 67 for H-viruses or Asp in position 67 for E-viruses. The change from Asn-67 that is prevalent in H-viruses to Asp-67 (common in E-viruses) contributes to a stronger electrostatically negative surface in the E-viruses as compared to the H-viruses. These findings should help in predicting the disease potential of emerging and re-emerging flaviviruses and in understanding the relationship between protein structure and disease outcome.
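The 90% figure can be checked directly (2,916 / 3,246 ≈ 0.898), and the signature itself amounts to inspecting one residue per sequence. The sketch below assumes every sequence has already been aligned to dengue envelope numbering, so that position 67 is the 67th character; the function names and toy sequences are hypothetical, and real use would require a proper multiple alignment first.

```python
from typing import Iterable, Tuple

# Class-specific residue at envelope position 67, as reported in the abstract:
# Asn (N) for hemorrhagic (H) viruses, Asp (D) for encephalitic (E) viruses.
EXPECTED_RESIDUE = {"H": "N", "E": "D"}
POSITION = 67  # 1-based; assumes sequences are pre-aligned to dengue E numbering


def matches_signature(seq: str, disease_class: str) -> bool:
    """True if residue 67 (1-based) equals the class-specific residue."""
    if len(seq) < POSITION:
        return False
    return seq[POSITION - 1] == EXPECTED_RESIDUE[disease_class]


def signature_fraction(records: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (sequence, class) records carrying the expected residue."""
    records = list(records)
    hits = sum(matches_signature(seq, cls) for seq, cls in records)
    return hits / len(records)


# Reported totals: 2,916 of 3,246 sequences matched.
print(f"{2916 / 3246:.1%}")  # 89.8%

# Hypothetical toy records: 66 filler residues followed by the residue at 67.
toy = [("A" * 66 + "N", "H"), ("A" * 66 + "D", "E"), ("A" * 66 + "D", "H")]
print(signature_fraction(toy))  # two of the three toy records match
```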


Subjects
Flavivirus/genetics; Flavivirus/pathogenicity; Viral Envelope Proteins/genetics; Virulence Factors/genetics; Amino Acid Sequence; Asparagine/genetics; Aspartic Acid/genetics; Encephalitis, Viral/virology; Hemorrhage/virology; Humans; Models, Molecular; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Homology, Amino Acid
4.
Nucleic Acids Res ; 34(Database issue): D187-91, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381842

ABSTRACT

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
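The UniRef idea of collapsing sequences above an identity threshold can be illustrated with a greedy, seed-based clustering. This is a toy sketch, not UniProt's procedure: identity here is simply the fraction of matching positions between ungapped sequences of equal length, whereas real clustering relies on alignments and additional coverage criteria. All sequences and names are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Crude identity: fraction of matching positions between two ungapped
    sequences of equal length (a stand-in for a real alignment score)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Assign each sequence to the first seed it matches at >= threshold identity;
    otherwise it becomes a new seed. Longer sequences are tried as seeds first."""
    clusters: Dict[str, List[str]] = {}
    for name in sorted(seqs, key=lambda n: -len(seqs[n])):
        for seed in clusters:
            if identity(seqs[name], seqs[seed]) >= threshold:
                clusters[seed].append(name)
                break
        else:
            # No existing seed is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


toy = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "MKTAYIAKQL",  # 9 of 10 positions match P1
    "P4": "GGGGGGGGGG",  # unrelated
}
for t in (1.0, 0.9, 0.5):
    print(t, greedy_cluster(toy, t))
```

Lowering the threshold from 1.0 to 0.9 folds the near-identical P3 into the P1 cluster, which loosely mirrors how UniRef90 clusters are coarser than UniRef100 clusters.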


Subjects
Databases, Protein; Internet; Proteins/chemistry; Proteins/classification; Proteins/physiology; Proteome/chemistry; Sequence Analysis, Protein; Systems Integration; User-Computer Interface
5.
BMC Bioinformatics ; 8 Suppl 9: S1, 2007 Nov 27.
Article in English | MEDLINE | ID: mdl-18047702

ABSTRACT

Biomedical ontologies are emerging as critical tools in genomic and proteomic research, where complex data in disparate resources need to be integrated. A number of ontologies describe properties that can be attributed to proteins. For example, protein functions are described by the Gene Ontology (GO) and human diseases by SNOMED CT or ICD10. There is, however, a gap in the current set of ontologies - one that describes the protein entities themselves and their relationships. We have designed the PRotein Ontology (PRO) to facilitate protein annotation and to guide new experiments. The components of PRO extend from the classification of proteins on the basis of evolutionary relationships to the representation of the multiple protein forms of a gene (products generated by genetic variation, alternative splicing, proteolytic cleavage, and other post-translational modifications). PRO will allow the specification of relationships between PRO, GO and other ontologies in the OBO Foundry. Here we describe the initial development of PRO, illustrated using human and mouse proteins involved in the transforming growth factor-beta and bone morphogenetic protein signaling pathways.


Subjects
Database Management Systems; Databases, Protein; Evolution, Molecular; Information Storage and Retrieval/methods; Proteins; Sequence Analysis/methods; User-Computer Interface; Proteins/chemistry; Proteins/classification; Proteins/genetics; Proteins/metabolism; Sequence Alignment/methods
6.
Nucleic Acids Res ; 33(Database issue): D154-9, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608167

ABSTRACT

The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records were manually annotated or updated; we introduced a new comment line topic, TOXIC DOSE, to store information on the acute toxicity of a toxin; the UniProt keyword list was augmented with additional keywords; we improved the documentation of the keywords; and we are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file listing strains and their synonyms. Many new database cross-references were introduced, and we started to make use of Digital Object Identifiers. In collaboration with the Macromolecular Structure Database group at the EBI, we also improved integration with structural databases through residue-level mapping of sequences from Protein Data Bank entries onto the corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.


Subjects
Databases, Protein; Proteins/chemistry; Amino Acid Sequence; Proteins/physiology; Systems Integration; User-Computer Interface
7.
Nucleic Acids Res ; 31(1): 390-2, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520030

ABSTRACT

The iProClass database provides comprehensive, value-added descriptions of proteins and serves as a framework for data integration in a distributed networking environment. The protein information in iProClass includes family relationships as well as structural and functional classifications and features. The current version consists of about 830 000 non-redundant PIR-PSD, SWISS-PROT, and TrEMBL proteins organized with more than 36 000 PIR superfamilies, 145 000 families, 4000 domains, 1300 motifs and 550 000 FASTA similarity clusters. It provides rich links to over 50 databases of protein sequences, families, functions and pathways, protein-protein interactions, post-translational modifications, protein expressions, structures and structural classifications, genes and genomes, ontologies, literature and taxonomy. Protein and superfamily summary reports present extensive annotation information and include membership statistics and graphical display of domains and motifs. iProClass employs an open and modular architecture for interoperability and scalability. It is implemented in the Oracle object-relational database system and is updated biweekly. The database is freely accessible from the web site at http://pir.georgetown.edu/iproclass/ and searchable by sequence or text string. The data integration in iProClass supports exploration of protein relationships. Such knowledge is fundamental to the understanding of protein evolution, structure and function and crucial to functional genomic and proteomic research.


Subjects
Databases, Protein; Proteins; Amino Acid Motifs; Animals; Humans; Information Storage and Retrieval; Protein Structure, Tertiary; Proteins/chemistry; Proteins/classification; Proteins/physiology; Sequence Homology, Amino Acid
8.
Nucleic Acids Res ; 31(1): 345-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520019

ABSTRACT

The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.


Subjects
Databases, Protein; Proteins/chemistry; Proteins/classification; Amino Acid Sequence; Animals; Databases, Bibliographic; Internet; Proteins/genetics
9.
Nucleic Acids Res ; 32(Database issue): D115-9, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681372

ABSTRACT

To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.


Subjects
Computational Biology; Databases, Protein; Proteins/chemistry; Proteins/metabolism; Animals; Humans; Internet; Protein Conformation; Proteins/classification; Proteome; Proteomics; Terminology as Topic
10.
Nucleic Acids Res ; 30(1): 35-7, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752247

ABSTRACT

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system has been developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).
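The core of a non-redundant reference set with source attribution, as described for PIR-NREF, can be sketched as grouping records by identical sequence while keeping every source accession and name. In this hypothetical sketch the accessions are placeholders, the MD5 digest is an arbitrary choice of sequence key, and the "composite name" is simply the distinct source names joined together; PIR's actual merging rules are not reproduced here.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Input record: (source database, placeholder accession, protein name, sequence).
Record = Tuple[str, str, str, str]


def build_nonredundant(records: List[Record]) -> Dict[str, dict]:
    """Merge records with identical sequences into one entry keyed by a sequence
    digest, keeping every source accession and distinct name (source attribution)."""
    merged: Dict[str, dict] = defaultdict(
        lambda: {"sequence": "", "sources": [], "names": []}
    )
    for source, accession, name, seq in records:
        normalized = seq.upper()
        key = hashlib.md5(normalized.encode("ascii")).hexdigest()
        entry = merged[key]
        entry["sequence"] = normalized
        entry["sources"].append(f"{source}:{accession}")
        if name not in entry["names"]:
            entry["names"].append(name)
    return dict(merged)


# Hypothetical records; the accessions are placeholders, not real entries.
records = [
    ("PIR-PSD", "A00001", "toy protein alpha", "MKTAYIAKQR"),
    ("SWISS-PROT", "A00002", "alpha subunit, toy", "MKTAYIAKQR"),
    ("RefSeq", "A00003", "toy protein beta", "MSTNPKPQRK"),
]
for entry in build_nonredundant(records).values():
    # Composite name: all distinct source names for the same sequence.
    print(entry["sources"], "|", "; ".join(entry["names"]))
```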


Subjects
Databases, Protein; Amino Acid Sequence; Animals; Humans; Information Storage and Retrieval; International Agencies; Internet; Proteins/classification; Proteins/genetics; Systems Integration
11.
Nucleic Acids Res ; 32(Database issue): D112-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681371

ABSTRACT

The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions. The system adopts a network structure for protein classification from superfamily to subfamily levels. Protein family members are homologous (sharing common ancestry) and homeomorphic (sharing full-length sequence similarity with common domain architecture). The PIRSF database consists of two data sets, preliminary clusters and curated families. The curated families include family name, protein membership, parent-child relationship, domain architecture, and optional description and bibliography. PIRSF is accessible from the website at http://pir.georgetown.edu/pirsf/ for report retrieval and sequence classification. The report presents family annotation, membership statistics, cross-references to other databases, graphical display of domain architecture, and links to multiple sequence alignments and phylogenetic trees for curated families. PIRSF can be utilized to analyze phylogenetic profiles, to reveal functional convergence and divergence, and to identify interesting relationships between homeomorphic families, domains and structural classes.
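The two membership criteria for a PIRSF family (homologous and homeomorphic) can be expressed as a simple predicate. In this sketch homology is supplied as an input, and the homeomorphic test is approximated by an identical ordered domain architecture plus a minimum full-length ratio; the `Protein` type, the domain names, and the 0.8 threshold are hypothetical, not PIRSF curation parameters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Protein:
    name: str
    length: int
    domains: Tuple[str, ...]  # ordered domain architecture, N- to C-terminus


def homeomorphic(a: Protein, b: Protein, min_length_ratio: float = 0.8) -> bool:
    """Hypothetical test: identical ordered domain architecture and comparable
    full length (shorter / longer >= min_length_ratio)."""
    same_architecture = a.domains == b.domains
    ratio = min(a.length, b.length) / max(a.length, b.length)
    return same_architecture and ratio >= min_length_ratio


def same_homeomorphic_family(a: Protein, b: Protein, homologous: bool) -> bool:
    # Homology (common ancestry) is taken as an input here, e.g. from a
    # separate similarity search; this function does not infer it.
    return homologous and homeomorphic(a, b)


p1 = Protein("p1", 400, ("DomainA", "DomainB"))
p2 = Protein("p2", 380, ("DomainA", "DomainB"))
p3 = Protein("p3", 390, ("DomainB", "DomainA"))  # same domains, different order
print(same_homeomorphic_family(p1, p2, homologous=True))  # True
print(same_homeomorphic_family(p1, p3, homologous=True))  # False
```

Comparing the domain tuples, rather than sets, makes domain order part of the architecture, so p1 and p3 are kept apart even though they contain the same domains.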


Subjects
Computational Biology; Databases, Protein; Proteins/chemistry; Proteins/classification; Amino Acid Motifs; Animals; Evolution, Molecular; Humans; Information Storage and Retrieval; Internet; Protein Structure, Tertiary
12.
Comput Biol Chem ; 28(1): 87-96, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15022647

ABSTRACT

Increasingly, scientists have begun to tackle gene functions and other complex regulatory processes by studying organisms at a global scale across various levels of biological organization, ranging from genomes to metabolomes and physiomes. Meanwhile, new bioinformatics methods have been developed for inferring protein function using associative analysis of functional properties to complement the traditional sequence homology-based methods. Fully exploiting the value of high-throughput systems biology data and facilitating protein functional studies require bioinformatics infrastructures that support both data integration and associative analysis. The iProClass database, designed to serve as a framework for data integration in a distributed networking environment, provides comprehensive descriptions of all proteins, with rich links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy. In particular, the database is organized with PIRSF family classification and maps to other family, function, and structure classification schemes. Coupled with the underlying taxonomic information for complete genomes, the iProClass system (http://pir.georgetown.edu/iproclass/) supports associative studies of protein family, domain, function, and structure. A case study of the phosphoglycerate mutases illustrates a systematic approach for protein family and phylogenetic analysis. Such studies may serve as a basis for further analysis of protein functional evolution, and its relationship to the co-evolution of metabolic pathways, cellular networks, and organisms.


Subjects
Databases, Factual; Genome, Human; Proteins/metabolism; Amino Acid Sequence; Computational Biology; Humans; Molecular Biology/methods; Molecular Sequence Data; Phosphoglycerate Mutase/chemistry; Phosphoglycerate Mutase/genetics; Phosphoglycerate Mutase/metabolism; Phylogeny; Proteins/chemistry; Proteins/genetics
13.
Comput Biol Chem ; 27(1): 37-47, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12798038

ABSTRACT

With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein sequences. The annotation problems are addressed by a classification-driven and rule-based method with evidence attribution, coupled with an integrated knowledge base system being developed. The approach allows sensitive identification, consistent and rich annotation, and systematic detection of annotation errors, as well as distinction of experimentally verified and computationally predicted features. The knowledge base consists of two new databases, sequence analysis tools, and graphical interfaces. PIR-NREF, a non-redundant reference database, provides a timely and comprehensive collection of all protein sequences, totaling more than 1,000,000 entries. iProClass, an integrated database of protein family, function, and structure information, provides extensive value-added features for about 830,000 proteins with rich links to over 50 molecular databases. This paper describes our approach to protein functional annotation with case studies and examines common identification errors. It also illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.


Subjects
Computational Biology; Proteins/classification; Proteins/physiology; Amino Acid Motifs/physiology; Amino Acid Sequence; Computational Biology/methods; Computational Biology/standards; Databases, Protein/classification; Databases, Protein/standards; Molecular Sequence Data; Protein Structure, Tertiary/physiology; Terminology as Topic
14.
Evol Bioinform Online ; 2: 197-209, 2007 Feb 10.
Article in English | MEDLINE | ID: mdl-19455212

ABSTRACT

The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
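The use of taxonomic distribution to separate lineage-specific from broadly conserved families can be sketched as a coarse rule over the superkingdoms in which family members occur. The three-bin scheme and the labels are assumptions for illustration; a patchy pattern is only a hint that calls for further analysis (for example, of horizontal gene transfer or gene loss), not evidence on its own.

```python
from typing import Iterable

# Coarse taxonomic bins for this illustration (an assumption; real analyses
# typically use finer lineages than the three superkingdoms).
SUPERKINGDOMS = frozenset({"Archaea", "Bacteria", "Eukaryota"})


def distribution_label(member_superkingdoms: Iterable[str]) -> str:
    """Label a family by the superkingdoms in which its members occur."""
    present = set(member_superkingdoms) & SUPERKINGDOMS
    if not present:
        return "no data"
    if present == SUPERKINGDOMS:
        return "broadly conserved"
    if len(present) == 1:
        return "lineage-specific (" + next(iter(present)) + ")"
    return "patchy (check for horizontal transfer or gene loss)"


print(distribution_label(["Bacteria", "Archaea", "Eukaryota", "Bacteria"]))
print(distribution_label(["Eukaryota", "Eukaryota"]))
print(distribution_label(["Bacteria", "Eukaryota"]))
```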
