Results 1 - 9 of 9
1.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
2.
Nucleic Acids Res ; 46(D1): D726-D735, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069476

ABSTRACT

EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.


Subject(s)
Databases, Genetic , Metagenomics , Microbiota , Algorithms , Base Sequence , Classification/methods , Datasets as Topic , Metagenomics/methods , RNA, Archaeal/genetics , RNA, Bacterial/genetics , RNA, Viral/genetics , Ribotyping , Software , Transcriptome , User-Computer Interface , Web Browser , Workflow
3.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
4.
Nucleic Acids Res ; 44(D1): D595-603, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26582919

ABSTRACT

EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with a more than five-fold increase in the number of processed samples, and consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web-based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher-profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans.


Subject(s)
Databases, Nucleic Acid , Metagenomics , Gene Expression Profiling , Internet , Oceans and Seas , Software
5.
Nucleic Acids Res ; 43(Database issue): D213-21, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428371

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures) since 2012.


Subject(s)
Databases, Protein , Proteins/classification , Bacteria/metabolism , Gene Ontology , Protein Structure, Tertiary , Proteins/genetics , Sequence Analysis, Protein , Software
6.
Bioinformatics ; 30(9): 1236-40, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24451626

ABSTRACT

MOTIVATION: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code.


Subject(s)
Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Proteins/analysis , Arabidopsis/chemistry , Arabidopsis/genetics , Cluster Analysis , Programming Languages , Proteins/genetics , Proteins/metabolism , Software
7.
Nucleic Acids Res ; 42(Database issue): D600-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24165880

ABSTRACT

Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.


Subject(s)
Databases, Genetic , Metagenomics , Gene Expression Profiling , Internet , Metabolomics , Proteomics , Software
8.
Brief Bioinform ; 13(6): 743-6, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22962339

ABSTRACT

Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the forthcoming years as the uptake of high-throughput sequencing technologies continues. Hand-in-hand with this data bonanza comes the computationally overwhelming task of analysis. Herein, we describe some of the bioinformatic approaches currently used by metagenomics researchers to analyze their data, the issues they face and the steps that could be taken to help overcome these challenges.


Subject(s)
Metagenome , Databases, Factual , Metagenomics , Sequence Analysis, DNA
9.
Nucleic Acids Res ; 40(Database issue): D306-12, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096229

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Software , Terminology as Topic , User-Computer Interface