Results 1 - 20 of 148
1.
Cell ; 184(4): 1098-1109.e9, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33606979

ABSTRACT

Bacteriophages drive evolutionary change in bacterial communities by creating gene flow networks that fuel ecological adaptations. However, the extent of viral diversity and its prevalence in the human gut remain largely unknown. Here, we introduce the Gut Phage Database, a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mining a dataset of 28,060 globally distributed human gut metagenomes and 2,898 reference genomes of cultured gut bacteria. Host assignment revealed that viral diversity is highest in the Firmicutes phylum and that ∼36% of viral clusters (VCs) are not restricted to a single species, creating gene flow networks across phylogenetically distinct bacterial species. Epidemiological analysis uncovered 280 globally distributed VCs found on at least 5 continents and a highly prevalent phage clade with features reminiscent of p-crAssphage. This high-quality, large-scale catalog of phage genomes will improve future virome studies and enable ecological and evolutionary analysis of human gut bacteriophages.


Subject(s)
Bacteriophages/genetics , Biodiversity , Gastrointestinal Microbiome , Databases, Nucleic Acid , Host Specificity , Humans , Phylogeography
2.
Nucleic Acids Res ; 52(D1): D777-D783, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37897342

ABSTRACT

Meta'omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial data modalities across habitats, geography and phylogeny. SPIRE encompasses 99 146 metagenomic samples from 739 studies covering a wide array of microbial environments and augmented with manually-curated contextual data. Across a total metagenomic assembly of 16 Tbp, SPIRE comprises 35 billion predicted protein sequences and 1.16 million newly constructed metagenome-assembled genomes (MAGs) of medium or high quality. Beyond mapping to the high-quality genome reference provided by proGenomes3 (http://progenomes.embl.de), these novel MAGs form 92 134 novel species-level clusters, the majority of which are unclassified at species level using current tools. SPIRE enables taxonomic profiling of these species clusters via an updated, custom mOTUs database (https://motu-tool.org/) and includes several layers of functional annotation, as well as crosslinks to several (micro-)biological databases. The resource is accessible, searchable and browsable via http://spire.embl.de.


Subject(s)
Databases, Factual , Metagenome , Microbiota , Metagenomics , Microbiota/genetics
3.
Nature ; 568(7753): 499-504, 2019 04.
Article in English | MEDLINE | ID: mdl-30745586

ABSTRACT

The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota.


Subject(s)
Bacteria/classification , Bacteria/genetics , Gastrointestinal Microbiome/genetics , Genome, Bacterial/genetics , Genomics , Metagenome/genetics , Bacteria/isolation & purification , Bacteria/metabolism , Humans , Multigene Family , Phylogeny , Species Specificity
4.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subject(s)
Microbiota , Sequence Analysis , Genomics/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Sequence Analysis/methods
5.
PLoS Comput Biol ; 19(8): e1011422, 2023 08.
Article in English | MEDLINE | ID: mdl-37639475

ABSTRACT

The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.


Subject(s)
Eukaryota , Microbiota , Humans , Eukaryotic Cells , Genome, Viral/genetics , Metagenome/genetics
6.
Oecologia ; 204(2): 365-376, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38356033

ABSTRACT

A conflict of interest occurs when parasites manipulate the behavior of their host in contradictory ways to achieve different goals. In grass shrimp (Palaemonetes pugio), trematode parasites that use shrimp as an intermediate host cause the shrimp to be more active than usual around predators, whereas bopyrid isopod parasites that use shrimp as a final host elicit the opposite response. Since these parasites are altering the host's behavior in opposing directions, a conflict of interest would occur in co-infected shrimp. Natural selection should favor attempts to resolve this conflict through avoidance, killing, or sabotage. In a field survey of shrimp populations in four tidal creeks in the Cape Fear River, we found a significant negative association between the two parasites. Parasite abundance was negatively correlated in differently sized hosts, suggesting avoidance as a mechanism. Subsequent mortality experiments showed no evidence of early death of co-infected hosts. In behavior trials, co-infected shrimp did not show significantly different behavior from singly infected or uninfected shrimp, suggesting that neither parasite sabotages the manipulation of the other. Taken together, our results suggest that rather than sabotaging or killing one another, bopyrid and trematode parasites tend to infect differently sized hosts, thus avoiding a conflict and confirming the importance of testing assumptions in natural contexts.


Subject(s)
Life History Traits , Parasites , Animals , Conflict of Interest , Crustacea , Rivers
7.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser; SARS-CoV-2 is the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence Annotation
8.
Brief Bioinform ; 22(2): 642-663, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.


Subject(s)
COVID-19/prevention & control , Computational Biology , SARS-CoV-2/isolation & purification , Biomedical Research , COVID-19/epidemiology , COVID-19/virology , Genome, Viral , Humans , Pandemics , SARS-CoV-2/genetics
9.
Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on clustering with the MMseqs2 software. We have compared all of the regions in RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Proteome/classification , Proteome/genetics , Repetitive Sequences, Amino Acid/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods
10.
Nucleic Acids Res ; 49(D1): D192-D200, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33211869

ABSTRACT

Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.


Subject(s)
Databases, Nucleic Acid , Metagenome , MicroRNAs/genetics , RNA, Bacterial/genetics , RNA, Untranslated/genetics , RNA, Viral/genetics , Bacteria/genetics , Bacteria/metabolism , Base Pairing , Base Sequence , Humans , Internet , MicroRNAs/classification , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA, Bacterial/classification , RNA, Bacterial/metabolism , RNA, Untranslated/classification , RNA, Untranslated/metabolism , RNA, Viral/classification , RNA, Viral/metabolism , Sequence Alignment , Sequence Analysis, RNA , Software , Viruses/genetics , Viruses/metabolism
11.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
12.
Genomics ; 114(1): 9-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34798282

ABSTRACT

Genomic knowledge of the tree of life is biased toward specific groups of organisms. For example, only six full genomes are currently available in the Rhizaria clade. Here, we applied metagenomic techniques to assemble the genome of the Polymyxa betae (Rhizaria, Plasmodiophorida) RES F41 isolate from an unpurified zoospore holobiont and compared it with the A26-41 isolate. Furthermore, the first P. betae mitochondrial genome was assembled. The two P. betae nuclear genomes were highly similar, each with just ~10.2 k predicted protein-coding genes, ~3% of which were unique to each isolate. Extended genomic comparisons revealed a greater overlap with Spongospora subterranea than with Plasmodiophora brassicae, including orthologs of the mammalian cation channel sperm-associated proteins, raising some intriguing questions about zoospore physiology. This work validates our metagenomics pipeline for eukaryote genome assembly from unpurified samples and enriches plasmodiophorid genomics, providing the first full annotation of the P. betae genome.


Subject(s)
Genome, Mitochondrial , Plasmodiophorida , Genomics , Metagenomics , Plasmodiophorida/genetics
13.
J Chem Phys ; 157(24): 244705, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36586983

ABSTRACT

Light emitters based on the semiconductor alloy aluminum gallium nitride [(Al,Ga)N] have gained significant attention in recent years due to their potential for a wide range of applications in the ultraviolet (UV) spectral window. However, current state-of-the-art (Al,Ga)N light emitters exhibit very low internal quantum efficiencies (IQEs). Therefore, understanding the fundamental electronic and optical properties of (Al,Ga)N-based quantum wells is key to improving the IQE. Here, we target the electronic and optical properties of c-plane AlₓGa₁₋ₓN/AlN quantum wells by means of an empirical atomistic tight-binding model. Special attention is paid to the impact of random alloy fluctuations on the results, as well as to the Al content x in the well. We find that across the studied Al content range (from 10% to 75% Al), strong hole wave function localization effects are observed. Additionally, with increasing Al content, electron wave functions may also start to exhibit carrier localization features. Overall, our investigations of the electronic structure of c-plane AlₓGa₁₋ₓN/AlN quantum wells reveal that random alloy fluctuations alone are sufficient to lead to (strong) carrier localization effects. Furthermore, our results indicate that random alloy fluctuations impact the degree of optical polarization in c-plane AlₓGa₁₋ₓN quantum wells. We find that the switch from transverse electric to transverse magnetic light polarization occurs at higher Al contents in the atomistic calculation, which accounts for random alloy fluctuations, than in the widely used virtual crystal approximation approach. This observation is important for light extraction efficiencies in (Al,Ga)N-based light emitting diodes operating in the deep UV.

14.
Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and removing the need to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements make it easier to open Genome3D up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have nearly doubled the number of annotations in Genome3D since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.


Subject(s)
Proteins/chemistry , Databases, Protein , Proteins/classification , Proteins/genetics , User-Computer Interface
15.
Nucleic Acids Res ; 48(D1): D570-D578, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31696235

ABSTRACT

MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.


Subject(s)
Metagenome , Microbiota , Phylogeny , Software , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , DNA, Ribosomal Spacer/genetics , Databases, Genetic , Metagenomics/methods
16.
Nucleic Acids Res ; 47(D1): D564-D572, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30364992

ABSTRACT

Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries. To increase the scope of coverage, we have migrated GPs to function as a companion resource utilizing InterPro entries. Having introduced GPs-specific versioned releases, we provide software and data via a GitHub repository, and have developed a new web interface to GPs (available at https://www.ebi.ac.uk/interpro/genomeproperties). In addition to allowing users to explore each of the 1286 GPs, the website contains GPs pre-calculated for a representative set of proteomes; these results can be used to profile GPs phylogenetically via an interactive viewer. Users can upload novel data to the viewer for comparison with the pre-calculated results. Over the last year, we have added ∼700 new GPs, increasing the coverage of eukaryotic systems, as well as increasing general coverage through automatic generation of GPs from related resources. All data are freely available via the website and the GitHub repository.


Subject(s)
Databases, Protein , Genome , Proteins/genetics , Genome, Microbial , Metabolic Networks and Pathways/genetics , Multiprotein Complexes/genetics , Proteins/metabolism , Proteome
17.
Nucleic Acids Res ; 47(W1): W636-W641, 2019 07 02.
Article in English | MEDLINE | ID: mdl-30976793

ABSTRACT

The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine with powerful cross-referencing and data retrieval capabilities. Access to these services is provided via user-friendly web interfaces and via established RESTful and SOAP Web Services APIs (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). Both systems have been developed with the same core principles that allow them to integrate an ever-increasing volume of biological data, making them an integral part of many popular data resources provided at the EMBL-EBI. Here, we describe the latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability.


Subject(s)
Sequence Analysis , Software , Databases, Nucleic Acid , Databases, Protein , Sequence Alignment , Sequence Analysis, Protein
18.
Nucleic Acids Res ; 47(D1): D427-D432, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison with the structural classification database Evolutionary Classification of Protein Domains (ECOD), which led to the creation of 825 new families based on ECOD's set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently linked the authorship of all Pfam entries to the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link it to their ORCID record.


Subject(s)
Databases, Protein , Proteins/classification , Molecular Sequence Annotation , Protein Domains , Proteins/chemistry , Repetitive Sequences, Amino Acid
19.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
20.
BMC Genomics ; 21(1): 408, 2020 Jun 17.
Article in English | MEDLINE | ID: mdl-32552739

ABSTRACT

BACKGROUND: The metabolic capacity, stress response and evolution of uncultured environmental Tenericutes have remained elusive, since previous studies have largely focused on pathogenic species. In this study, we expanded analyses of Tenericutes lineages that inhabit various environments using a collection of 840 genomes. RESULTS: Several environmental lineages were discovered inhabiting the human gut, groundwater, bioreactors and a hypersaline lake, spanning the Haloplasmatales and Mycoplasmatales orders. A phylogenomic analysis of Bacilli and Tenericutes genomes revealed that some uncultured Tenericutes are affiliated with novel clades in Bacilli, such as RF39, RFN20 and ML615. Erysipelotrichales and two major gut lineages, RF39 and RFN20, were found to be neighboring clades of Mycoplasmatales. We detected habitat-specific functional patterns among pathogenic, gut and environmental Tenericutes: genes involved in carbohydrate storage, carbon fixation, mutation repair, environmental response and amino acid cleavage are overrepresented in the genomes of environmental lineages, perhaps as a result of environmental adaptation. We hypothesize that the two major gut lineages, RF39 and RFN20, are acetate and hydrogen producers. Furthermore, a deteriorating capacity for bactoprenol synthesis (needed for the secretion of cell wall peptidoglycan precursors) is a potential adaptive strategy employed by these lineages in response to the gut environment. CONCLUSIONS: This study uncovers the characteristic functions of environmental Tenericutes and their relationships with Bacilli, shedding new light on the pathogenicity and evolutionary processes of Mycoplasmatales.


Subject(s)
Bacillus/classification , Tenericutes/classification , Tenericutes/pathogenicity , Acetates/metabolism , Adaptation, Physiological , Bacillus/genetics , Bacillus/metabolism , Bioreactors/microbiology , DNA, Bacterial/genetics , Gastrointestinal Microbiome , Groundwater/microbiology , Humans , Hydrogen/metabolism , Phylogeny , RNA, Ribosomal, 16S/genetics , Tenericutes/genetics , Tenericutes/metabolism