Results 1 - 8 of 8
1.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subject(s)
Protein Databases , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
2.
Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Subject(s)
Computational Biology/statistics & numerical data , Protein Databases , Proteins/metabolism , Proteome/metabolism , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Models , Protein Tertiary Structure , Proteins/chemistry , Proteins/genetics , Proteome/classification , Proteome/genetics , Amino Acid Repetitive Sequences/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology , Protein Sequence Analysis/methods
3.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Protein Databases , Molecular Sequence Annotation , Animals , Genetic Databases , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Amino Acid Sequence Homology , Software , User-Computer Interface
4.
Nucleic Acids Res ; 47(D1): D427-D432, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.


Subject(s)
Protein Databases , Proteins/classification , Molecular Sequence Annotation , Protein Domains , Proteins/chemistry , Amino Acid Repetitive Sequences
5.
Nucleic Acids Res ; 46(D1): D726-D735, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069476

ABSTRACT

EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.


Subject(s)
Genetic Databases , Metagenomics , Microbiota , Algorithms , Base Sequence , Classification/methods , Datasets as Topic , Metagenomics/methods , Archaeal RNA/genetics , Bacterial RNA/genetics , Viral RNA/genetics , Ribotyping , Software , Transcriptome , User-Computer Interface , Web Browser , Workflow
6.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subject(s)
Protein Databases , Proteins/classification , Proteome/chemistry , Sequence Alignment , Protein Sequence Analysis , Molecular Sequence Annotation
7.
BMC Genomics ; 9: 133, 2008 Mar 20.
Article in English | MEDLINE | ID: mdl-18366695

ABSTRACT

BACKGROUND: The MIAME and MAGE-OM standards defined by the MGED society provide a specification and implementation of a software infrastructure to facilitate the submission and sharing of data from microarray studies via public repositories. However, although the MAGE object model is flexible enough to support different annotation strategies, the annotation of array descriptions can be complex. RESULTS: We have developed a graphical Java-based application (Adamant) to assist with submission of microarray designs to public repositories. Output of the application is fully compliant with the standards prescribed by the various public data repositories. CONCLUSION: Adamant will allow researchers to annotate and submit their own array designs to public repositories without requiring programming expertise, knowledge of the MAGE-OM or XML. The application has been used to submit a number of ArrayDesigns to the ArrayExpress database.


Subject(s)
Genetic Models , Oligonucleotide Array Sequence Analysis/methods , Software , Gene Library
8.
Mol Biochem Parasitol ; 144(2): 177-86, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16174539

ABSTRACT

Microarray-based comparative genomic hybridization (CGH) provides a powerful tool for whole genome analyses and the rapid detection of genomic variation that underlies virulence and disease. In the field of Plasmodium research, many of the parasite genomes that one might wish to study in a high throughput manner are not laboratory clones, but clinical isolates. One of the key limitations to the use of clinical samples in CGH, however, is the minuscule amounts of genomic DNA available. Here we describe the successful application of multiple displacement amplification (MDA), a non-PCR-based amplification method that exhibits clear advantages over all other currently available methods. Using MDA, CGH was performed on a panel of NF54 and IT/FCR3 clones, identifying previously published deletions on chromosomes 2 and 9 as well as polymorphism in genes associated with disease pathology.


Subject(s)
Protozoan Genome/genetics , Nucleic Acid Amplification Techniques/methods , Plasmodium falciparum/genetics , Animals , Human Chromosome Pair 2/genetics , Human Chromosome Pair 9/genetics , Gene Deletion , Humans , Falciparum Malaria/parasitology , Microarray Analysis , Polymerase Chain Reaction , Genetic Polymorphism