RESUMO
The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute (DOE-JGI) continues to maintain its role as one of the flagship genomic metadata repositories of the world. The ever-increasing number of projects and metadata are freely available to the user community world-wide. GOLD's metadata is consumed by scientists and remains an important source for large-scale comparative genomics analysis initiatives. Encouraged by this active user engagement and growth, GOLD has continued to add new components and capabilities. The new features such as a public Application Programming Interface (API) and Ecosystem landing page as well as the growth of different entities in this current GOLD v.9 edition are described in detail in this manuscript.
Assuntos
Bases de Dados Genéticas , Genômica , Genoma , SoftwareRESUMO
The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) is a manually curated, daily updated collection of genome projects and their metadata accumulated from around the world. The current version of the database includes over 1.17 million entries organized broadly into Studies (45 770), Organisms (387 382) or Biosamples (101 207), Sequencing Projects (355 364) and Analysis Projects (283 481). These four levels contain over 600 metadata fields, which includes 76 controlled vocabulary (CV) tables containing 3873 terms. GOLD provides an interactive web user interface for browsing and searching by a wide range of project and metadata fields. Users can enter details about their own projects in GOLD, which acts as a gatekeeper to ensure that metadata is accurately documented before submitting sequence information to the Integrated Microbial Genomes (IMG) system for analysis. In order to maintain a reference dataset for use by members of the scientific community, GOLD also imports projects from public repositories such as GenBank and SRA. The current status of the database, along with recent updates and improvements are described in this manuscript.
Assuntos
Bases de Dados Genéticas , Genoma , Ecossistema , Ontologia Genética , Ferramenta de Busca , Análise de Sequência de DNARESUMO
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is an open online resource, which maintains an up-to-date catalog of genome and metagenome projects in the context of a comprehensive list of associated metadata. Information in GOLD is organized into four levels: Study, Biosample/Organism, Sequencing Project and Analysis Project. Currently GOLD hosts information on 33 415 Studies, 49 826 Biosamples, 313 324 Organisms, 215 881 Sequencing Projects and 174 454 Analysis Projects with a total of 541 metadata fields, of which 80 are based on controlled vocabulary (CV) terms. GOLD provides a user-friendly web interface to browse sequencing projects and launch advanced search tools across four classification levels. Users submit metadata on a wide range of Sequencing and Analysis Projects in GOLD before depositing sequence data to the Integrated Microbial Genomes (IMG) system for analysis. GOLD conforms with and supports the rules set by the Genomic Standards Consortium (GSC) Minimum Information standards. The current version of GOLD (v.7) has seen the number of projects and associated metadata increase exponentially over the years. This paper provides an update on the current status of GOLD and highlights the new features added over the last two years.
Assuntos
Bases de Dados Genéticas/normas , Genômica/métodos , Software/normas , Ontologia GenéticaRESUMO
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years.
Assuntos
Biologia Computacional/métodos , Bases de Dados de Ácidos Nucleicos , Genoma , Genômica/métodos , Mineração de Dados , Metagenoma , Metagenômica/métodos , Software , Interface Usuário-ComputadorRESUMO
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
Assuntos
Bases de Dados de Ácidos Nucleicos , Genômica , Metagenômica , InternetRESUMO
The power of next-generation sequencing has resulted in an explosive growth in the number of projects aiming to understand the metagenomic diversity of complex microbial environments. The interdisciplinary nature of this microbiome research community, along with the absence of reporting standards for microbiome data and samples, poses a significant challenge for follow-up studies. Commonly used names of metagenomes and metatranscriptomes in public databases currently lack the essential information necessary to accurately describe and classify the underlying samples, which makes a comparative analysis difficult to conduct and often results in misclassified sequences in data repositories. The Genomes OnLine Database (GOLD) (https:// gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute has been at the forefront of addressing this challenge by developing a standardized nomenclature system for naming microbiome samples. GOLD, currently in its twenty-fifth anniversary, continues to enrich the research community with hundreds of thousands of metagenomes and metatranscriptomes with well-curated and easy-to-understand names. Through this manuscript, we describe the overall naming process that can be easily adopted by researchers worldwide. Additionally, we propose the use of this naming system as a best practice for the scientific community to facilitate better interoperability and reusability of microbiome data.