Results 1 - 9 of 9
1.
Nucleic Acids Res ; 42(Database issue): D600-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24165880

ABSTRACT

Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.


Subjects
Genetic Databases, Metagenomics, Gene Expression Profiling, Internet, Metabolomics, Proteomics, Software
2.
Bioinformatics ; 30(9): 1236-40, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24451626

ABSTRACT

MOTIVATION: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code.


Subjects
Genome, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Proteins/analysis, Arabidopsis/chemistry, Arabidopsis/genetics, Cluster Analysis, Programming Languages, Proteins/genetics, Proteins/metabolism, Software
3.
Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Subjects
Protein Databases, Protein Tertiary Structure, Proteins/classification, Proteins/physiology, Protein Sequence Analysis, Software, Terminology as Topic, User-Computer Interface
4.
Nucleic Acids Res ; 37(Database issue): D211-5, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940856

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have also started to display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).


Subjects
Protein Databases, Protein Sequence Analysis, Proteins/chemistry, Proteins/classification, Systems Integration
5.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subjects
Protein Databases, Internet, Protein Tertiary Structure, Proteins/chemistry, Proteins/classification, Proteins/physiology, Protein Sequence Analysis, Systems Integration, User-Computer Interface
6.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subjects
Protein Databases, Proteins/chemistry, Proteins/classification, Protein Sequence Analysis, Protein Databases/trends, Humans, Protein Tertiary Structure, Sequence Alignment, Systems Integration
7.
Nucleic Acids Res ; 32(Database issue): D262-6, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681408

ABSTRACT

The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.


Subjects
Genetic Databases, Protein Databases, Genes, Terminology as Topic, Animals, Computational Biology, Humans, Information Storage and Retrieval, Internet, Proteome/chemistry, Proteome/genetics, Proteome/metabolism, Proteomics
8.
BMC Bioinformatics ; 6 Suppl 1: S17, 2005.
Article in English | MEDLINE | ID: mdl-15960829

ABSTRACT

BACKGROUND: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality supplementary GO annotation to proteins in the UniProt Knowledgebase. Like many other biological databases, GOA gathers much of its content from the careful manual curation of literature. However, as the volume of both literature and proteins requiring characterization increases, the manual processing capability can become overloaded. Consequently, semi-automated aids are often employed to expedite the curation process. Traditionally, electronic techniques in GOA depend largely on exploiting the knowledge in existing resources such as InterPro. However, in recent years, text mining has been hailed as a potentially useful tool to aid the curation process. To encourage the development of such tools, the GOA team at EBI agreed to take part in the functional annotation task of the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge. BioCreAtIvE task 2 was an experiment to test if automatically derived classification using information retrieval and extraction could assist expert biologists in the annotation of the GO vocabulary to the proteins in the UniProt Knowledgebase. GOA provided the training corpus of over 9000 manual GO annotations extracted from the literature. For the test set, we provided a corpus of 200 new Journal of Biological Chemistry articles used to annotate 286 human proteins with GO terms. A team of experts manually evaluated the results of 9 participating groups, each of which provided highlighted sentences to support their GO and protein annotation predictions. Here, we give a biological perspective on the evaluation, explain how we annotate GO using literature and offer some suggestions to improve the precision of future text-retrieval and extraction techniques. Finally, we provide the results of the first inter-annotator agreement study for manual GO curation, as well as an assessment of our current electronic GO annotation strategies. RESULTS: The GOA database currently extracts GO annotation from the literature with 91 to 100% precision, and at least 72% recall. This creates a particularly high threshold for text mining systems, which in the initial results of BioCreAtIvE task 2 (GO annotation extraction and retrieval) precisely predicted GO terms only 10 to 20% of the time. CONCLUSION: Improvements in the performance and accuracy of text mining for GO terms should be expected in the next BioCreAtIvE challenge. In the meantime, the manual and electronic GO annotation strategies already employed by GOA will provide high-quality annotations.


Subjects
Computational Biology/methods, Genetic Databases/classification, Genes, Information Storage and Retrieval/methods, Automated Pattern Recognition/methods, Proteins/genetics, Animals, Computational Biology/standards, Genetic Databases/standards, Information Storage and Retrieval/standards, Automated Pattern Recognition/standards
9.
Genome Res ; 13(4): 662-72, 2003 Apr.
Article in English | MEDLINE | ID: mdl-12654719

ABSTRACT

Gene Ontology Annotation (GOA) is a project run by the European Bioinformatics Institute (EBI) that aims to provide assignments of terms from the Gene Ontology (GO) resource to gene products in a number of its databases (http://www.ebi.ac.uk/GOA). In the first stage of this project, GO assignments have been applied to a data set representing the complete human proteome by a combination of electronic mappings and manual curation. This vocabulary has also been applied to the nonredundant proteome sets for all other completely sequenced organisms as well as to proteins from a wide range of organisms where the proteome is not yet complete.


Subjects
Computational Biology/methods, Protein Databases/classification, Genomics, Proteomics, Controlled Vocabulary, Computational Biology/trends, Database Management Systems/trends, Protein Databases/trends, Human Genome, Genomics/trends, Humans, Proteome/classification, Proteome/genetics, Proteomics/trends