ABSTRACT
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
Subject(s)
Bacteria/genetics, Bacteria/metabolism, Bacterial Genome, Genomics/methods, Metabolomics/methods, Computational Biology/methods, Software, Web Browser
ABSTRACT
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.
Subject(s)
Computational Biology/methods, Metagenome, Metagenomics/methods, Microbiota/genetics, Software, Web Browser
ABSTRACT
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.
Subject(s)
DNA Viruses/genetics, Genetic Databases, Viral Genome, Genomics/methods, Metagenomics/methods, Retroviridae/genetics, Software, Environmental Microbiology, Host-Pathogen Interactions, Metagenome, DNA Sequence Analysis
ABSTRACT
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
Subject(s)
Archaeal Genome/genetics, Bacterial Genome/genetics, Genomics, DNA Sequence Analysis, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Genetic Databases, Phylogeny
ABSTRACT
BACKGROUND: The exponential growth of genomic data from next generation technologies renders traditional manual expert curation effort unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles in issues such as usability, authorship recognition, information reliability and incentive for community participation. RESULTS: Here, we present a different approach, relying on a tightly integrated method rather than a "Wiki-based" method, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use the existing IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze/reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process to overcome the user incentive and authorship recognition problems, thus fostering collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. CONCLUSION: By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system that can lead to publications and community knowledge sharing as shown in the case studies.
Subject(s)
Computational Biology/methods, Microbial Genome, Genomics/methods, Molecular Sequence Annotation/methods, Software, Cooperative Behavior, Data Accuracy, Information Dissemination, Internet, User-Computer Interface
ABSTRACT
IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M's data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M's database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp).
Subject(s)
Genetic Databases, Metagenome, Gene Expression Profiling, Archaeal Genome, Bacterial Genome, Viral Genome, Internet, Metagenomics/standards, Plasmids/genetics, Reference Standards, Protein Sequence Analysis, Software, Systems Integration
ABSTRACT
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).
Subject(s)
Genetic Databases, Microbial Genome, Biosynthetic Pathways/genetics, Gene Expression Profiling, Archaeal Genome, Bacterial Genome, Viral Genome, Genomics, Internet, Molecular Sequence Annotation, Plasmids/genetics, Proteomics, Software, Systems Integration
ABSTRACT
The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond.
Subject(s)
Genetic Databases, Genomics, Metagenomics, Phylogeny, User-Computer Interface
ABSTRACT
The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp).
Subject(s)
Genetic Databases, Archaeal Genome, Bacterial Genome, Viral Genome, Genomics, Eukaryota/genetics, Phenotype, Plasmids/genetics, Proteomics, Software, Systems Integration
ABSTRACT
The integrated microbial genomes and metagenomes (IMG/M) system provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in a comprehensive integrated context. IMG/M integrates metagenome data sets with isolate microbial genomes from the IMG system. IMG/M's data content and analytical capabilities have been extended through regular updates since its first release in 2007. IMG/M is available at http://img.jgi.doe.gov/m. A companion IMG/M system provides support for annotation and expert review of unpublished metagenomic data sets (IMG/M ER: http://img.jgi.doe.gov/mer).
Subject(s)
Genetic Databases, Metagenome, Metagenomics, Database Management Systems, Eukaryota/genetics, Archaeal Genome, Bacterial Genome, Viral Genome, Plasmids/genetics, Systems Integration
ABSTRACT
Desulfosporosinus species are sulfate-reducing bacteria belonging to the Firmicutes. Their genomes will give insights into the genetic repertoire and evolution of sulfate reducers typically thriving in terrestrial environments and able to degrade toluene (Desulfosporosinus youngiae), to reduce Fe(III) (Desulfosporosinus meridiei, Desulfosporosinus orientis), and to grow under acidic conditions (Desulfosporosinus acidiphilus).
Subject(s)
Bacterial Genome, Peptococcaceae/classification, Peptococcaceae/genetics, Bacterial DNA/genetics, Molecular Sequence Data, Species Specificity
ABSTRACT
The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/
Subject(s)
Computational Biology/methods, Genetic Databases, Nucleic Acid Databases, Genome, Genomics, Animals, Computational Biology/trends, Protein Databases, Archaeal Genome, Bacterial Genome, Humans, Information Storage and Retrieval/methods, Internet, Tertiary Protein Structure, Software, User-Computer Interface
ABSTRACT
The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at http://img.jgi.doe.gov.
Subject(s)
Computational Biology/methods, Genetic Databases, Nucleic Acid Databases, Protein Databases, Computational Biology/trends, Archaeal Genome, Bacterial Genome, Viral Genome, Information Storage and Retrieval/methods, Internet, Plasmids/genetics, Tertiary Protein Structure, Software, User-Computer Interface
ABSTRACT
Micrococcus luteus (NCTC2665, "Fleming strain") has one of the smallest genomes of free-living actinobacteria sequenced to date, comprising a single circular chromosome of 2,501,097 bp (G+C content, 73%) predicted to encode 2,403 proteins. The genome shows extensive synteny with that of the closely related organism, Kocuria rhizophila, from which it was taxonomically separated relatively recently. Despite its small size, the genome harbors 73 insertion sequence (IS) elements, almost all of which are closely related to elements found in other actinobacteria. An IS element is inserted into the rrs gene of one of only two rrn operons found in M. luteus. The genome encodes only four sigma factors and 14 response regulators, a finding indicative of adaptation to a rather strict ecological niche (mammalian skin). The high sensitivity of M. luteus to beta-lactam antibiotics may result from the presence of a reduced set of penicillin-binding proteins and the absence of a wblC gene, which plays an important role in the antibiotic resistance in other actinobacteria. Consistent with the restricted range of compounds it can use as a sole source of carbon for energy and growth, M. luteus has a minimal complement of genes concerned with carbohydrate transport and metabolism and its inability to utilize glucose as a sole carbon source may be due to the apparent absence of a gene encoding glucokinase. Uniquely among characterized bacteria, M. luteus appears to be able to metabolize glycogen only via trehalose and to make trehalose only via glycogen. It has very few genes associated with secondary metabolism. In contrast to most other actinobacteria, M. luteus encodes only one resuscitation-promoting factor (Rpf) required for emergence from dormancy, and its complement of other dormancy-related proteins is also much reduced. M. luteus is capable of long-chain alkene biosynthesis, which is of interest for advanced biofuel production; a three-gene cluster essential for this metabolism has been identified in the genome.
Subject(s)
Actinobacteria/genetics, Bacterial Genome/genetics, Micrococcus luteus/genetics, Bacterial Gene Expression Regulation/genetics, Bacterial Gene Expression Regulation/physiology, Genetic Models
ABSTRACT
Methanohalophilus mahii is the type species of the genus Methanohalophilus, which currently comprises three distinct species with validly published names. Mhp. mahii represents moderately halophilic methanogenic archaea with a strictly methylotrophic metabolism. The type strain SLP(T) was isolated from hypersaline sediments collected from the southern arm of Great Salt Lake, Utah. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,012,424 bp genome is a single replicon with 2032 protein-coding and 63 RNA genes and part of the Genomic Encyclopedia of Bacteria and Archaea project. A comparison of the reconstructed energy metabolism in the halophilic species Mhp. mahii with other representatives of the Methanosarcinaceae reveals some interesting differences to freshwater species.
Subject(s)
Archaeal DNA/genetics, Archaeal Genome, Geologic Sediments/microbiology, Metabolic Networks and Pathways/genetics, Methanosarcinaceae/genetics, DNA Sequence Analysis, Archaeal DNA/chemistry, Energy Metabolism/genetics, Methanosarcinaceae/isolation & purification, Molecular Sequence Data, Utah
ABSTRACT
MOTIVATION: A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. RESULTS: We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.
Subject(s)
Computational Biology/methods, Database Management Systems, Bacterial Genome/genetics, Bacterial Proteins/genetics, Enzymes/genetics, Bacterial Genes
ABSTRACT
IMG/M is a data management and analysis system for microbial community genomes (metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis tools. IMG/M is available at http://img.jgi.doe.gov/m.
Subject(s)
Genetic Databases, Environmental Microbiology, Archaeal Genome, Bacterial Genome, Database Management Systems, Genomics, Internet, Software
ABSTRACT
The integrated microbial genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and annotating genomes, genes and functions, individually or in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through quarterly releases. IMG is provided by the DOE-Joint Genome Institute (JGI) and is available from http://img.jgi.doe.gov.
Subject(s)
Genetic Databases, Archaeal Genome, Bacterial Genome, Genomics, Viral Genome, Internet, Plasmids/genetics, Proteins/chemistry, Proteins/genetics, Software, Systems Integration
ABSTRACT
MOTIVATION: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparisons in terms of relative abundances of various protein families. The latter requires assignment of individual reads to protein families, which is hindered by the fact that short reads contain only a fragment, usually small, of a protein. RESULTS: We have considered the assignment of pyrosequencing reads to protein families directly using RPS-BLAST against COG and Pfam databases and indirectly via proxygenes that are identified using BLASTx searches against protein sequence databases. Using simulated metagenome datasets as benchmarks, we show that the proxygene method is more accurate than the direct assignment. We introduce a clustering method which significantly reduces the size of a metagenome dataset while maintaining a faithful representation of its functional and taxonomic content.
Subject(s)
Bacterial Proteins/genetics, Chromosome Mapping/methods, Open Reading Frames/genetics, Proteome/genetics, Sequence Alignment/methods, DNA Sequence Analysis/methods, Base Sequence, Cluster Analysis, Molecular Sequence Data
ABSTRACT
There is an urgent need to capture metadata on the rapidly growing number of genomic, metagenomic and related sequences, such as 16S ribosomal genes. This need is a major focus within the Genomic Standards Consortium (GSC), and Habitat is a key metadata descriptor in the proposed "Minimum Information about a Genome Sequence" (MIGS) specification. The goal of the work described here is to provide a light-weight, easy-to-use (small) set of terms ("Habitat-Lite") that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO). Our motivation for building Habitat-Lite is to meet the needs of multiple users, such as annotators curating these data, database providers hosting the data, and biologists and bioinformaticians alike who need to search and employ such data in comparative analyses. Here, we report a case study based on semiautomated identification of terms from GenBank and GOLD. We estimate that the terms in the initial version of Habitat-Lite would provide useful labels for over 60% of the kinds of information found in the GenBank isolation_source field, and around 85% of the terms in the GOLD habitat field. We present a revised version of Habitat-Lite defined within the EnvO Environmental Ontology through a new category, EnvO-Lite-GSC. We invite the community's feedback on its further development to provide a minimum list of terms to capture high-level habitat information and to provide classification bins needed for future studies.