Results 1-20 of 151
1.
Cell ; 184(4): 1098-1109.e9, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33606979

ABSTRACT

Bacteriophages drive evolutionary change in bacterial communities by creating gene flow networks that fuel ecological adaptations. However, the extent of viral diversity and its prevalence in the human gut remains largely unknown. Here, we introduce the Gut Phage Database, a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mining a dataset of 28,060 globally distributed human gut metagenomes and 2,898 reference genomes of cultured gut bacteria. Host assignment revealed that viral diversity is highest in the Firmicutes phylum and that ∼36% of viral clusters (VCs) are not restricted to a single species, creating gene flow networks across phylogenetically distinct bacterial species. Epidemiological analysis uncovered 280 globally distributed VCs found on at least 5 continents and a highly prevalent phage clade with features reminiscent of p-crAssphage. This high-quality, large-scale catalog of phage genomes will improve future virome studies and enable ecological and evolutionary analysis of human gut bacteriophages.

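The ∼36% figure above is, at its core, a set computation over host assignments: group predicted hosts by viral cluster and count the clusters whose hosts span more than one species. A minimal Python sketch, assuming host predictions are available as hypothetical `(vc_id, host_species)` pairs; the toy data are invented for illustration and are not taken from the Gut Phage Database.

```python
from collections import defaultdict


def fraction_multi_species(assignments):
    """Fraction of viral clusters (VCs) whose predicted hosts span more than one species.

    assignments: iterable of (vc_id, host_species) pairs (hypothetical input format).
    """
    hosts = defaultdict(set)
    for vc_id, species in assignments:
        hosts[vc_id].add(species)
    if not hosts:
        return 0.0
    multi = sum(1 for species in hosts.values() if len(species) > 1)
    return multi / len(hosts)


# Invented toy assignments, not data from the Gut Phage Database.
toy = [
    ("VC1", "Bacteroides fragilis"),
    ("VC1", "Bacteroides ovatus"),
    ("VC2", "Faecalibacterium prausnitzii"),
    ("VC3", "Escherichia coli"),
    ("VC3", "Escherichia coli"),  # repeated hits to one species count once
]
print(f"{fraction_multi_species(toy):.0%} of VCs are assigned to more than one host species")
```

Using a set per cluster means repeated predictions of the same host species do not inflate the count.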

Subjects
Bacteriophages/genetics, Biodiversity, Gastrointestinal Microbiome, Nucleic Acid Databases, Host Specificity, Humans, Phylogeography
2.
Nucleic Acids Res ; 52(D1): D777-D783, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37897342

ABSTRACT

Meta'omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial data modalities across habitats, geography and phylogeny. SPIRE encompasses 99 146 metagenomic samples from 739 studies covering a wide array of microbial environments and augmented with manually-curated contextual data. Across a total metagenomic assembly of 16 Tbp, SPIRE comprises 35 billion predicted protein sequences and 1.16 million newly constructed metagenome-assembled genomes (MAGs) of medium or high quality. Beyond mapping to the high-quality genome reference provided by proGenomes3 (http://progenomes.embl.de), these novel MAGs form 92 134 novel species-level clusters, the majority of which are unclassified at species level using current tools. SPIRE enables taxonomic profiling of these species clusters via an updated, custom mOTUs database (https://motu-tool.org/) and includes several layers of functional annotation, as well as crosslinks to several (micro-)biological databases. The resource is accessible, searchable and browsable via http://spire.embl.de.


Subjects
Factual Databases, Metagenome, Microbiota, Metagenomics, Microbiota/genetics
3.
Bioinformatics ; 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39298479

ABSTRACT

MOTIVATION: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be mischaracterized by current pangenome approaches. RESULTS: Here, we present CELEBRIMBOR, a Snakemake pangenome analysis pipeline which uses a measure of genome completeness to automatically adjust the frequency threshold at which core genes are identified, enabling accurate core gene identification in MAGs and SAGs. AVAILABILITY: CELEBRIMBOR is published under open source Apache 2.0 licence at https://github.com/bacpop/CELEBRIMBOR and is available as a Docker container from this repository. Supplementary material is available in the online version of the article. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

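To see why genome completeness matters for core-gene calls, consider one way such an adjustment could work. The sketch below is our own illustration, not CELEBRIMBOR's implementation. It assumes that genome i independently recovers a truly core gene with probability equal to its completeness c_i, so the number of genomes in which that gene is observed follows a Poisson-binomial distribution. It then picks the highest count threshold at which a truly core gene is missed with probability at most `alpha`, a hypothetical parameter chosen here as 0.05.

```python
def poisson_binomial_pmf(probs):
    """P(X = k) for X = sum of independent Bernoulli(p_i) trials, via dynamic programming."""
    pmf = [1.0]  # with zero genomes, X = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # gene missed in this genome
            nxt[k + 1] += mass * p      # gene recovered in this genome
        pmf = nxt
    return pmf


def core_count_threshold(completeness, alpha=0.05):
    """Largest k such that a truly core gene is seen in fewer than k genomes with probability <= alpha.

    completeness: per-genome completeness as fractions in [0, 1] (assumed input format).
    A gene observed in at least k genomes would then be called core.
    """
    pmf = poisson_binomial_pmf(completeness)
    k, below = 0, 0.0  # below = P(X < k)
    while k < len(completeness) and below + pmf[k] <= alpha:
        below += pmf[k]
        k += 1
    return k


complete = [1.0] * 10
incomplete = [0.9] * 10
print(core_count_threshold(complete))    # → 10: every genome must carry the gene
print(core_count_threshold(incomplete))  # → 7: threshold relaxed for 90%-complete genomes
```

For comparison, a fixed 95% cutoff would demand the gene in all 10 of the 90%-complete genomes (ceil(0.95 × 10) = 10), which a truly core gene would satisfy only about 35% of the time (0.9^10 ≈ 0.35) under this model.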
4.
Nature ; 568(7753): 499-504, 2019 04.
Article in English | MEDLINE | ID: mdl-30745586

ABSTRACT

The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota.


Subjects
Bacteria/classification, Bacteria/genetics, Gastrointestinal Microbiome/genetics, Bacterial Genome/genetics, Genomics, Metagenome/genetics, Bacteria/isolation & purification, Bacteria/metabolism, Humans, Multigene Family, Phylogeny, Species Specificity
5.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subjects
Microbiota, Sequence Analysis, Genomics/methods, Metagenome, Metagenomics/methods, Microbiota/genetics, Software, Sequence Analysis/methods
6.
PLoS Comput Biol ; 19(8): e1011422, 2023 08.
Article in English | MEDLINE | ID: mdl-37639475

ABSTRACT

The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.


Subjects
Eukaryota, Microbiota, Humans, Eukaryotic Cells, Viral Genome/genetics, Metagenome/genetics
7.
Oecologia ; 204(2): 365-376, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38356033

ABSTRACT

A conflict of interest occurs when parasites manipulate the behavior of their host in contradictory ways to achieve different goals. In grass shrimp (Palaemonetes pugio), trematode parasites that use shrimp as an intermediate host cause the shrimp to be more active than usual around predators, whereas bopyrid isopod parasites that use shrimp as a final host elicit the opposite response. Since these parasites are altering the host's behavior in opposing directions, a conflict of interest would occur in co-infected shrimp. Natural selection should favor attempts to resolve this conflict through avoidance, killing, or sabotage. In a field survey of shrimp populations in four tidal creeks in the Cape Fear River, we found a significant negative association between the two parasites. Parasite abundance was negatively correlated in differently sized hosts, suggesting avoidance as a mechanism. Subsequent mortality experiments showed no evidence of early death of co-infected hosts. In behavior trials, co-infected shrimp did not show significantly different behavior from singly infected or uninfected shrimp, suggesting that neither parasite sabotages the manipulation of the other. Taken together, our results suggest that rather than sabotaging or killing one another, bopyrid and trematode parasites tend to infect differently sized hosts, thus avoiding a conflict and confirming the importance of testing assumptions in natural contexts.


Subjects
Life History Traits, Parasites, Animals, Conflict of Interest, Crustacea, Rivers
8.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subjects
COVID-19/virology, Genetic Databases, SARS-CoV-2/genetics, Web Browser, Coronaviridae/genetics, Genetic Variation, Viral Genome, Humans, Molecular Sequence Annotation
9.
Brief Bioinform ; 22(2): 642-663, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.


Subjects
COVID-19/prevention & control, Computational Biology, SARS-CoV-2/isolation & purification, Biomedical Research, COVID-19/epidemiology, COVID-19/virology, Viral Genome, Humans, Pandemics, SARS-CoV-2/genetics
10.
Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Subjects
Computational Biology/statistics & numerical data, Protein Databases, Proteins/metabolism, Proteome/metabolism, Animals, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, Computational Biology/methods, Epidemics, Humans, Internet, Molecular Models, Protein Tertiary Structure, Proteins/chemistry, Proteins/genetics, Proteome/classification, Proteome/genetics, Repetitive Amino Acid Sequences/genetics, SARS-CoV-2/genetics, SARS-CoV-2/physiology, Protein Sequence Analysis/methods
11.
Nucleic Acids Res ; 49(D1): D192-D200, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33211869

ABSTRACT

Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.

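Covariance models score paired alignment columns jointly, so compensatory substitutions that preserve a base pair (for example G-C in one sequence and A-U in another) count as support rather than as mismatches. The Python sketch below is neither a covariance model nor Rfam code; it only illustrates that paired-column idea on an invented toy alignment with a consensus structure in dot-bracket notation, treating Watson-Crick and G-U wobble pairs as canonical (an assumption of this sketch). It handles only nested structures, so pseudoknots, which need additional bracket types, are out of scope here.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) base-pair indices from nested dot-bracket notation."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pair_support(alignment, structure):
    """For each consensus pair, report the fraction of sequences forming a canonical pair
    and the set of distinct canonical pair types seen (more than one type signals covariation)."""
    report = {}
    for i, j in pairs_from_dot_bracket(structure):
        observed = [(seq[i], seq[j]) for seq in alignment]
        canonical = [p for p in observed if p in CANONICAL]
        report[(i, j)] = (len(canonical) / len(alignment), {a + b for a, b in canonical})
    return report


toy_alignment = ["GCAAGC", "ACAAGU", "GGAACC"]  # invented, ungapped toy alignment
for pair, (frac, types) in pair_support(toy_alignment, "((..))").items():
    print(pair, frac, sorted(types))
```

Both pairs in the toy alignment are fully supported, yet each shows two different pair types across sequences: the sequences differ at those positions while still base-pairing, which is the signal that a pairwise-aware model can exploit and a position-by-position profile cannot.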

Subjects
Nucleic Acid Databases, Metagenome, MicroRNAs/genetics, Bacterial RNA/genetics, Untranslated RNA/genetics, Viral RNA/genetics, Bacteria/genetics, Bacteria/metabolism, Base Pairing, Base Sequence, Humans, Internet, MicroRNAs/classification, MicroRNAs/metabolism, Molecular Sequence Annotation, Nucleic Acid Conformation, Bacterial RNA/classification, Bacterial RNA/metabolism, Untranslated RNA/classification, Untranslated RNA/metabolism, Viral RNA/classification, Viral RNA/metabolism, Sequence Alignment, RNA Sequence Analysis, Software, Viruses/genetics, Viruses/metabolism
12.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

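The REST API mentioned above can be queried with the Python standard library alone. The sketch below assumes the `https://www.ebi.ac.uk/interpro/api` base URL and an `entry/<source database>/<accession>` path layout; both are assumptions to check against the live API documentation, since paths may change between releases. `PF00069` is used only as an example Pfam accession, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

INTERPRO_API = "https://www.ebi.ac.uk/interpro/api"  # assumed base URL


def entry_url(source_db, accession):
    """Build an entry URL such as .../entry/pfam/PF00069 (assumed path layout)."""
    return f"{INTERPRO_API}/entry/{quote(source_db.lower())}/{quote(accession)}"


def fetch_json(url, timeout=30):
    """GET a URL and decode its JSON body (performs a network request)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


url = entry_url("Pfam", "PF00069")
print(url)
# To query the live service, uncomment:
# data = fetch_json(url)
# print(sorted(data))  # top-level keys of the returned JSON document
```

Keeping URL construction separate from the network call makes the path logic testable offline and leaves retries or pagination to the caller.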

Subjects
Protein Databases, Proteins/chemistry, Amino Acid Sequence, COVID-19/metabolism, Internet, Molecular Sequence Annotation, Protein Domains, Protein Interaction Maps, SARS-CoV-2/metabolism, Sequence Alignment
13.
Genomics ; 114(1): 9-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34798282

ABSTRACT

Genomic knowledge of the tree of life is biased toward specific groups of organisms. For example, only six full genomes are currently available in the Rhizaria clade. Here, we have applied metagenomic techniques enabling the assembly of the genome of the Polymyxa betae (Rhizaria, Plasmodiophorida) isolate RES F41 from an unpurified zoospore holobiont and its comparison with the A26-41 isolate. Furthermore, the first P. betae mitochondrial genome was assembled. The two P. betae nuclear genomes were highly similar, each with just ~10.2 k predicted protein coding genes, ~3% of which were unique to each isolate. Extending genomic comparisons revealed a greater overlap with Spongospora subterranea than with Plasmodiophora brassicae, including orthologs of the mammalian cation channel sperm-associated proteins, raising some intriguing questions about zoospore physiology. This work validates our metagenomics pipeline for eukaryote genome assembly from unpurified samples and enriches plasmodiophorid genomics, providing the first full annotation of the P. betae genome.


Subjects
Mitochondrial Genome, Plasmodiophorida, Genomics, Metagenomics, Plasmodiophorida/genetics
14.
J Chem Phys ; 157(24): 244705, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36586983

ABSTRACT

Light emitters based on the semiconductor alloy aluminum gallium nitride [(Al,Ga)N] have gained significant attention in recent years due to their potential for a wide range of applications in the ultraviolet (UV) spectral window. However, current state-of-the-art (Al,Ga)N light emitters exhibit very low internal quantum efficiencies (IQEs). Therefore, understanding the fundamental electronic and optical properties of (Al,Ga)N-based quantum wells is key to improving the IQE. Here, we target the electronic and optical properties of c-plane AlxGa1-xN/AlN quantum wells by means of an empirical atomistic tight-binding model. Special attention is paid to the impact of random alloy fluctuations on the results as well as the Al content x in the well. We find that across the studied Al content range (from 10% to 75% Al), strong hole wave function localization effects are observed. Additionally, with increasing Al content, electron wave functions may also start to exhibit carrier localization features. Overall, our investigations of the electronic structure of c-plane AlxGa1-xN/AlN quantum wells reveal that random alloy fluctuations alone are sufficient to lead to (strong) carrier localization effects. Furthermore, our results indicate that random alloy fluctuations impact the degree of optical polarization in c-plane AlxGa1-xN quantum wells. We find that the switching from transverse electric to transverse magnetic light polarization occurs at higher Al contents in the atomistic calculation, which accounts for random alloy fluctuations, compared to the widely used virtual crystal approximation approach. This observation is important for light extraction efficiencies in (Al,Ga)N-based light emitting diodes operating in the deep UV.

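The contrast between the atomistic calculation and the virtual crystal approximation (VCA) can be illustrated at the level of cation on-site energies alone. In the VCA every cation site carries the composition-weighted average x·E_Al + (1 - x)·E_Ga, whereas a random-alloy model assigns each site the full Al or Ga value, so the local composition fluctuates from region to region. The Python sketch below shows only this difference: the energies are arbitrary placeholder values, and it is not the authors' tight-binding model, which also involves inter-site coupling terms omitted here.

```python
import random
from statistics import mean

E_AL, E_GA = 1.0, 0.0  # placeholder cation on-site energies (arbitrary units)


def vca_onsite(x, e_al=E_AL, e_ga=E_GA):
    """VCA: every cation site gets the composition-weighted average energy."""
    return x * e_al + (1.0 - x) * e_ga


def random_alloy_onsite(n_sites, x, e_al=E_AL, e_ga=E_GA, seed=0):
    """Random alloy: each cation site is Al with probability x, otherwise Ga."""
    rng = random.Random(seed)
    return [e_al if rng.random() < x else e_ga for _ in range(n_sites)]


def local_compositions(sites, window, e_al=E_AL):
    """Al fraction in consecutive, non-overlapping windows of `window` sites."""
    return [
        sum(s == e_al for s in sites[k:k + window]) / window
        for k in range(0, len(sites) - window + 1, window)
    ]


x = 0.3
sites = random_alloy_onsite(100_000, x)
local = local_compositions(sites, window=20)
print(vca_onsite(x))           # a single value shared by every site in the VCA
print(round(mean(sites), 3))   # the random alloy matches x on average...
print(min(local), max(local))  # ...but 20-site windows scatter widely around x
```

The global average is the same in both pictures; what the averaged description discards is the site-to-site and region-to-region variation, which is the ingredient a site-resolved model needs in order to show localization effects of the kind reported above.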
15.
Nucleic Acids Res ; 48(D1): D570-D578, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31696235

ABSTRACT

MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.


Subjects
Metagenome, Microbiota, Phylogeny, Software, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Ribosomal Spacer DNA/genetics, Genetic Databases, Metagenomics/methods
16.
Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and removing the need to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.


Subjects
Proteins/chemistry, Protein Databases, Proteins/classification, Proteins/genetics, User-Computer Interface
17.
Nucleic Acids Res ; 47(D1): D564-D572, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30364992

ABSTRACT

Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries. To increase the scope of coverage, we have migrated GPs to function as a companion resource utilizing InterPro entries. Having introduced GPs-specific versioned releases, we provide software and data via a GitHub repository, and have developed a new web interface to GPs (available at https://www.ebi.ac.uk/interpro/genomeproperties). In addition to exploring each of the 1286 GPs, the website contains GPs pre-calculated for a representative set of proteomes; these results can be used to profile GPs phylogenetically via an interactive viewer. Users can upload novel data to the viewer for comparison with the pre-calculated results. Over the last year, we have added ∼700 new GPs, increasing the coverage of eukaryotic systems, as well as increasing general coverage through automatic generation of GPs from related resources. All data are freely available via the website and the GitHub repository.

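Because a Genome Property is defined above as a series of steps backed by InterPro entries, evaluating it against a proteome reduces to checking which steps have at least one matching annotation. The Python sketch below shows one simplified way to do this, returning YES, PARTIAL, or NO. The three-state outcome and the rule used here (all required steps present for YES, at least one for PARTIAL, optional steps ignored) are simplifying assumptions, not the resource's published algorithm, and the step names and accessions are hypothetical placeholders rather than real InterPro entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    evidence: frozenset  # accessions, any one of which satisfies the step
    required: bool = True


def evaluate_property(steps, annotations):
    """Return 'YES', 'PARTIAL' or 'NO' for one property (simplified rule, see lead-in)."""
    required = [s for s in steps if s.required]
    found = sum(1 for s in required if s.evidence & annotations)
    if required and found == len(required):
        return "YES"
    return "PARTIAL" if found else "NO"


# Hypothetical property with placeholder accessions (not real InterPro entries).
toy_property = [
    Step("step 1", frozenset({"IPR_A"})),
    Step("step 2", frozenset({"IPR_B", "IPR_C"})),
    Step("accessory step", frozenset({"IPR_D"}), required=False),
]

for proteome in ({"IPR_A", "IPR_C"}, {"IPR_A"}, set()):
    print(sorted(proteome), evaluate_property(toy_property, proteome))
```

Allowing several accessions per step captures the case where different protein families can fill the same functional role in different organisms.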

Subjects
Protein Databases, Genome, Proteins/genetics, Microbial Genome, Metabolic Networks and Pathways/genetics, Multiprotein Complexes/genetics, Proteins/metabolism, Proteome
18.
Nucleic Acids Res ; 47(W1): W636-W641, 2019 07 02.
Article in English | MEDLINE | ID: mdl-30976793

ABSTRACT

The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine with powerful cross-referencing and data retrieval capabilities. Access to these services is provided via user-friendly web interfaces and via established RESTful and SOAP Web Services APIs (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). Both systems have been developed with the same core principles that allow them to integrate an ever-increasing volume of biological data, making them an integral part of many popular data resources provided at the EMBL-EBI. Here, we describe the latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability.


Subjects
Sequence Analysis, Software, Nucleic Acid Databases, Protein Databases, Sequence Alignment, Protein Sequence Analysis
19.
Nucleic Acids Res ; 47(D1): D427-D432, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.


Subjects
Protein Databases, Proteins/classification, Molecular Sequence Annotation, Protein Domains, Proteins/chemistry, Repetitive Amino Acid Sequences
20.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subjects
Protein Databases, Molecular Sequence Annotation, Animals, Genetic Databases, Gene Ontology, Humans, Internet, Multigene Family, Protein Domains/genetics, Amino Acid Sequence Homology, Software, User-Computer Interface