Results 1 - 15 of 15
1.
Nature ; 568(7753): 499-504, 2019 04.
Article in English | MEDLINE | ID: mdl-30745586

ABSTRACT

The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota.


Subject(s)
Bacteria/classification , Bacteria/genetics , Gastrointestinal Microbiome/genetics , Genome, Bacterial/genetics , Genomics , Metagenome/genetics , Bacteria/isolation & purification , Bacteria/metabolism , Humans , Multigene Family , Phylogeny , Species Specificity
2.
Nucleic Acids Res ; 48(D1): D570-D578, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31696235

ABSTRACT

MGnify (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.


Subject(s)
Metagenome , Microbiota , Phylogeny , Software , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , DNA, Ribosomal Spacer/genetics , Databases, Genetic , Metagenomics/methods
3.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
4.
Nucleic Acids Res ; 46(D1): D726-D735, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069476

ABSTRACT

EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include both new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.


Subject(s)
Databases, Genetic , Metagenomics , Microbiota , Algorithms , Base Sequence , Classification/methods , Datasets as Topic , Metagenomics/methods , RNA, Archaeal/genetics , RNA, Bacterial/genetics , RNA, Viral/genetics , Ribotyping , Software , Transcriptome , User-Computer Interface , Web Browser , Workflow
5.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
6.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subject(s)
Databases, Protein , Proteins/classification , Proteome/chemistry , Sequence Alignment , Sequence Analysis, Protein , Molecular Sequence Annotation
7.
Commun Biol ; 5(1): 1217, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36400841

ABSTRACT

Understanding the myriad pathways by which antimicrobial-resistance genes (ARGs) spread across biomes is necessary to counteract the global menace of antimicrobial resistance. We screened 17939 assembled metagenomic samples covering 21 biomes, differing in sequencing quality and depth, unevenly across 46 countries, 6 continents, and 14 years (2005-2019) for clinically crucial ARGs, mobile colistin resistance (mcr), carbapenem resistance (CR), and (extended-spectrum) beta-lactamase (ESBL and BL) genes. These ARGs were most frequent in human gut, oral and skin biomes, followed by anthropogenic (wastewater, bioreactor, compost, food), and natural biomes (freshwater, marine, sediment). Mcr-9 was the most prevalent mcr gene, spatially and temporally; blaOXA-233 and blaTEM-1 were the most prevalent CR and BL/ESBL genes, but blaGES-2 and blaTEM-116 showed the widest distribution. Redundancy analysis and Bayesian analysis showed ARG distribution was non-random and best explained by potential host genera and biomes, followed by collection year, anthropogenic factors and collection countries. Preferential ARG occurrence, and potential transmission, between characteristically similar biomes indicate strong ecological boundaries. Our results provide a high-resolution global map of ARG distribution and, importantly, identify checkpoint biomes wherein interventions aimed at disrupting ARG dissemination are likely to be most effective in reducing dissemination and, in the long term, the global ARG burden.


Subject(s)
Anti-Bacterial Agents , Microbiota , Humans , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/genetics , Bayes Theorem , Microbiota/genetics , Genes, Bacterial
8.
Genome Biol ; 21(1): 244, 2020 09 10.
Article in English | MEDLINE | ID: mdl-32912302

ABSTRACT

Microbial eukaryotes constitute a significant fraction of biodiversity and have recently gained more attention, but the recovery of high-quality metagenomic assembled eukaryotic genomes is limited by the current availability of tools. To help address this, we have developed EukCC, a tool for estimating the quality of eukaryotic genomes based on the automated dynamic selection of single copy marker gene sets. We demonstrate that our method outperforms current genome quality estimators, particularly for estimating contamination, and have applied EukCC to datasets derived from two different environments to enable the identification of novel eukaryote genomes, including one from the human skin.
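
The idea of scoring a genome against a set of single-copy marker genes can be illustrated with a minimal sketch. This is a generic marker-counting heuristic, not EukCC's actual algorithm: the function name `marker_quality`, the marker identifiers, and the exact formulas (completeness as the share of markers seen at least once, contamination as surplus copies per marker) are assumptions made for illustration only.

```python
from collections import Counter


def marker_quality(marker_set, hits):
    """Estimate completeness and contamination (both in percent).

    marker_set: set of expected single-copy marker identifiers (hypothetical).
    hits: list of marker identifiers detected in the genome; repeats allowed.
    """
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    # Count only hits that belong to the chosen marker set.
    counts = Counter(h for h in hits if h in marker_set)
    n = len(marker_set)
    # Completeness: share of expected markers found at least once.
    found = sum(1 for m in marker_set if counts[m] >= 1)
    # Contamination: surplus copies of markers that should occur once.
    surplus = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    return 100.0 * found / n, 100.0 * surplus / n


markers = {"m1", "m2", "m3", "m4"}
completeness, contamination = marker_quality(markers, ["m1", "m2", "m2", "m3", "x9"])
print(completeness, contamination)
```

In this toy input, three of four markers are present and one marker appears twice, so a missing marker lowers completeness while a duplicated marker raises contamination.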


Subject(s)
Genome, Fungal , Metagenomics/methods , Software , Eukaryota , Skin/microbiology
9.
J Food Sci ; 85(2): 455-464, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31957879

ABSTRACT

Kombucha, a fermented tea generated from the co-culture of yeasts and bacteria, has gained worldwide popularity in recent years due to its potential benefits to human health. As a result, many studies have attempted to characterize both its biochemical properties and microbial composition. Here, we have applied a combination of whole metagenome sequencing (WMS) and amplicon (16S rRNA and Internal Transcribed Spacer 1 [ITS1]) sequencing to investigate the microbial communities of homemade Kombucha fermentations from day 3 to day 15. We identified the dominant bacterial genus as Komagataeibacter and dominant fungal genus as Zygosaccharomyces in all samples at all time points. Furthermore, we recovered three near-complete Komagataeibacter genomes and one Zygosaccharomyces bailii genome and then predicted their functional properties. Also, we determined the broad taxonomic and functional profile of plasmids found within the Kombucha microbial communities. Overall, this study provides a detailed description of the taxonomic and functional systems of the Kombucha microbial community. Based on this, we conjecture that the functional complementarity enables metabolic cross-talk between Komagataeibacter species and Z. bailii, which helps establish the sustained, relatively low-diversity ecosystem in Kombucha.


Subject(s)
Bacteria/isolation & purification , Beverages/microbiology , Fermented Foods/microbiology , Microbiota , Yeasts/isolation & purification , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Fermentation , Metagenome , Metagenomics , Sequence Analysis, DNA , Yeasts/classification , Yeasts/genetics , Yeasts/metabolism
10.
Microbiome ; 7(1): 78, 2019 05 22.
Article in English | MEDLINE | ID: mdl-31118083

ABSTRACT

BACKGROUND: The emergence of antibiotic-resistant pathogens has created an urgent need for novel antimicrobial treatments. Advances in next-generation sequencing have opened new frontiers for discovery programmes for natural products, allowing the exploitation of a larger fraction of the microbial community. Polyketide (PK) and non-ribosomal peptide (NRP) natural products have been reported to be related to compounds with antimicrobial and anticancer activities. We report here a new culture-independent approach to explore bacterial biosynthetic diversity and determine bacterial phyla in the microbial community associated with PK and NRP diversity in selected soils.
RESULTS: Through amplicon sequencing, we explored the microbial diversity (16S rRNA gene) of 13 soils from Antarctica, Africa, Europe and a Caribbean island and correlated this with the amplicon diversity of the adenylation (A) and ketosynthase (KS) domains within functional genes coding for non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), which are involved in the production of NRP and PK, respectively. Mantel and Procrustes correlation analyses with microbial taxonomic data identified not only the well-studied phyla Actinobacteria and Proteobacteria, but also, interestingly, the less biotechnologically exploited phyla Verrucomicrobia and Bacteroidetes, as potential sources harbouring diverse A and KS domains. Some soils, notably that from Antarctica, provided evidence of endemic diversity, whilst others, such as those from Europe, clustered together. In particular, the majority of the domain reads from Antarctica remained unmatched to known sequences, suggesting they could encode enzymes for potentially novel PK and NRP.
CONCLUSIONS: The approach presented here highlights potential sources of metabolic novelty in the environment which will be a useful precursor to metagenomic biosynthetic gene cluster mining for PKs and NRPs which could provide leads for new antimicrobial metabolites.


Subject(s)
Bacteria/classification , Genetic Variation , Microbiota , Peptide Biosynthesis, Nucleic Acid-Independent , Polyketide Synthases/genetics , Soil Microbiology , Africa , Antarctic Regions , Bacteria/enzymology , Caribbean Region , Europe , Multigene Family , Phylogeny , RNA, Ribosomal, 16S/genetics
11.
Nat Biotechnol ; 37(2): 186-192, 2019 02.
Article in English | MEDLINE | ID: mdl-30718869

ABSTRACT

Understanding gut microbiome functions requires cultivated bacteria for experimental validation and reference bacterial genome sequences to interpret metagenome datasets and guide functional analyses. We present the Human Gastrointestinal Bacteria Culture Collection (HBC), a comprehensive set of 737 whole-genome-sequenced bacterial isolates, representing 273 species (105 novel species) from 31 families found in the human gastrointestinal microbiota. The HBC increases the number of bacterial genomes derived from human gastrointestinal microbiota by 37%. The resulting global Human Gastrointestinal Bacteria Genome Collection (HGG) classifies 83% of genera by abundance across 13,490 shotgun-sequenced metagenomic samples, improves taxonomic classification by 61% compared to the Human Microbiome Project (HMP) genome collection and achieves subspecies-level classification for almost 50% of sequences. The improved resource of gastrointestinal bacterial reference sequences circumvents dependence on de novo assembly of metagenomes and enables accurate and cost-effective shotgun metagenomic analyses of human gastrointestinal microbiota.


Subject(s)
Genome, Bacterial , Metagenome , Metagenomics , Bacteria/classification , Computational Biology/methods , Contig Mapping , Gastrointestinal Microbiome , Genome, Human , Humans , Phylogeny , RNA, Ribosomal, 16S/metabolism , Sequence Analysis, DNA , Species Specificity
12.
Nat Commun ; 10(1): 1014, 2019 03 04.
Article in English | MEDLINE | ID: mdl-30833550

ABSTRACT

Metagenomic sequencing has greatly improved our ability to profile the composition of environmental and host-associated microbial communities. However, the dependency of most methods on reference genomes, which are currently unavailable for a substantial fraction of microbial species, introduces estimation biases. We present an updated and functionally extended tool based on universal (i.e., reference-independent), phylogenetic marker gene (MG)-based operational taxonomic units (mOTUs) enabling the profiling of >7700 microbial species. As more than 30% of them could not previously be quantified at this taxonomic resolution, relative abundance estimates based on mOTUs are more accurate compared to other methods. As a new feature, we show that mOTUs, which are based on essential housekeeping genes, are demonstrably well-suited for quantification of basal transcriptional activity of community members. Furthermore, single nucleotide variation profiles estimated using mOTUs reflect those from whole genomes, which allows for comparing microbial strain populations (e.g., across different human body sites).


Subject(s)
Metagenomics , Microbiota/genetics , Phylogeny , Algorithms , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling , Genes, Essential , Genetic Markers , Genome , Host Microbial Interactions , Humans , Molecular Sequence Annotation , Sequence Alignment , Sequence Analysis, DNA
13.
Gigascience ; 7(5)2018 05 01.
Article in English | MEDLINE | ID: mdl-29762668

ABSTRACT

Background: Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using synthetic simulated datasets comprised of some of the most abundant genera found in the human gut, ocean, and soil environments. We evaluate their accuracy when paired with both different reference databases and variable sub-regions of the 16S rRNA gene. Findings: We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently <2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher than MAPseq, respectively. Using the SILVA database generally yielded a higher recall than using Greengenes, while assignment results of different 16S rRNA variable sub-regions varied up to 40% between samples analysed with the same pipeline. Conclusions: Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision, and/or computational performance required.
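
The precision, recall and F-score terms used above can be made concrete with a short sketch. The abstract does not state the exact definitions used in the study, so this assumes the standard ones applied to sets of genus labels; the function name `precision_recall_f1` and the example genera are hypothetical.

```python
def precision_recall_f1(expected, observed):
    """Score predicted taxa against the taxa known to be in a simulated sample.

    expected: set of genus names truly present (the simulated composition).
    observed: set of genus names reported by a classifier.
    """
    tp = len(expected & observed)  # correctly reported genera
    fp = len(observed - expected)  # reported but not actually present
    fn = len(expected - observed)  # present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


expected = {"Bacteroides", "Prevotella", "Faecalibacterium", "Roseburia"}
observed = {"Bacteroides", "Prevotella", "Escherichia"}
p, r, f1 = precision_recall_f1(expected, observed)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

A classifier that reports few but correct genera scores high precision and low recall, which is the trade-off the abstract describes between the tools compared.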


Subject(s)
Bacteria/classification , Bacteria/genetics , Environmental Microbiology , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , Biodiversity , Databases, Genetic , Gastrointestinal Microbiome/genetics , Humans , Oceans and Seas , Phylogeny , Principal Component Analysis , Soil
14.
Article in English | MEDLINE | ID: mdl-26994912

ABSTRACT

The removal of annotation from biological databases is often perceived as an indicator of erroneous annotation. As a corollary, annotation stability is considered to be a measure of reliability. However, diverse data-driven events can affect the stability of annotations in both primary protein sequence databases and the protein family databases that are built upon the sequence databases and used to help annotate them. Here, we describe some of these events and their consequences for the InterPro database, and demonstrate that annotation removal or reassignment is not always linked to incorrect annotation by the curator. Database URL: http://www.ebi.ac.uk/interpro.


Subject(s)
Databases, Genetic , Gene Ontology , Molecular Sequence Annotation , Databases, Protein , Knowledge
15.
Database (Oxford) ; 2012: bas019, 2012.
Article in English | MEDLINE | ID: mdl-22508994

ABSTRACT

The PRINTS database, now in its 21st year, houses a collection of diagnostic protein family 'fingerprints'. Fingerprints are groups of conserved motifs, evident in multiple sequence alignments, whose unique inter-relationships provide distinctive signatures for particular protein families and structural/functional domains. As such, they may be used to assign uncharacterized sequences to known families, and hence to infer tentative functional, structural and/or evolutionary relationships. The February 2012 release (version 42.0) includes 2156 fingerprints, encoding 12 444 individual motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. Here, we report the current status of the database, and introduce a number of recent developments that help both to render a variety of our annotation and analysis tools easier to use and to make them more widely available. Database URL: www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Humans , Sequence Alignment , User-Computer Interface