Results 1 - 19 of 19
1.
Gigascience ; 11, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35701374

ABSTRACT

The increasingly multidisciplinary nature of scientific research creates a need for Open Data repositories that can archive data in support of publications in scientific journals. Recognising this need, GigaDB was already in place and accepting data a year before GigaScience launched in 2012 (making it 11 years old this year). Since GigaDB launched, there has been consistent growth in this resource in terms of data volume, data discoverability and data re-use. In this commentary, we provide a retrospective of key changes over the last decade, and the role of Data Curation in enhancing the user experience. Furthermore, we explore a much-needed emphasis on enabling researchers to interact with and explore datasets prior to data download.


Subject(s)
Data Curation , Retrospective Studies
2.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30753480

ABSTRACT

With a large increase in the volume and type of data archived in GigaScience Database (GigaDB) since its launch in 2011, we have studied the metrics and user patterns to assess the important aspects needed to best suit current and future use. This has led to new front-end developments and enhanced interactivity and functionality that greatly improve user experience. In this article, we present an overview of the current practices including the Biocurational role of the GigaDB staff, the broad usage metrics of GigaDB datasets and an update on how the GigaDB platform has been overhauled and enhanced to improve the stability and functionality of the codebase. Finally, we report on future directions for the GigaDB resource.


Subject(s)
Databases, Factual , Data Curation , Databases, Genetic , Internet , Time Factors
3.
Methods Mol Biol ; 1757: 399-470, 2018.
Article in English | MEDLINE | ID: mdl-29761466

ABSTRACT

WormBase ( www.wormbase.org ) provides the nematode research community with a centralized database for information pertaining to nematode genes and genomes. As more nematode genome sequences are becoming available and as richer data sets are published, WormBase strives to maintain updated information, displays, and services to facilitate efficient access to and understanding of the knowledge generated by the published nematode genetics literature. This chapter aims to provide an explanation of how to use basic features of WormBase, new features, and some commonly used tools and data queries. Explanations of the curated data and step-by-step instructions of how to access the data via the WormBase website and available data mining tools are provided.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Animals , Computational Biology/methods , Data Mining/methods , Epistasis, Genetic , Gene Ontology , Genes, Helminth , Genomics/methods , Humans , Phenotype , Proteome , Search Engine , Software , Transcriptome , User-Computer Interface , Web Browser
4.
WormBook ; 2018: 1-14, 2018 08 08.
Article in English | MEDLINE | ID: mdl-29722207

ABSTRACT

Genetic nomenclature for Caenorhabditis species and other nematodes is supervised by WormBase in collaboration with the Caenorhabditis Genetics Center (CGC) and with essential input from the community of scientists working on C. elegans and other nematodes.


Subject(s)
Caenorhabditis , Terminology as Topic , Animals , Caenorhabditis elegans Proteins , Genetic Variation , Polymorphism, Genetic , Species Specificity
5.
Nucleic Acids Res ; 46(D1): D869-D874, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069413

ABSTRACT

WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine, Gene Enrichment Analysis; new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, have begun to collaborate formally as the Alliance of Genome Resources.


Subject(s)
Databases, Genetic , Genome , Nematoda/genetics , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/genetics , Data Curation , Data Mining , Datasets as Topic , Disease Models, Animal , Forecasting , Gene Ontology , Humans , Information Storage and Retrieval , Platyhelminths/genetics , Publishing , RNA Interference , Sequence Alignment , User-Computer Interface , Web Browser
6.
Nucleic Acids Res ; 44(D1): D774-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578572

ABSTRACT

WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Genomics , Nematoda/genetics , Animals , Genes, Helminth , Molecular Sequence Annotation , Platyhelminths/genetics , Software
7.
Nucleic Acids Res ; 42(Database issue): D789-93, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24194605

ABSTRACT

WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Animals , Internet , Molecular Sequence Annotation , Nematoda/genetics
8.
Nucleic Acids Res ; 42(Database issue): D546-52, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24163254

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.


Subject(s)
Databases, Genetic , Genome , Animals , Edible Grain/genetics , Genome, Bacterial , Genome, Fungal , Genome, Plant , Genomics , Internet , Molecular Sequence Annotation , Software
9.
BMC Bioinformatics ; 13: 16, 2012 Jan 26.
Article in English | MEDLINE | ID: mdl-22280404

ABSTRACT

BACKGROUND: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.

RESULTS: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.

CONCLUSIONS: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
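
To make the paper-triage idea in this abstract concrete, the following is a minimal sketch of a linear SVM that flags papers likely to contain one data type (RNAi is used here). It is not the authors' implementation: the abstract does not specify features or solver, so the bag-of-words features, the Pegasos-style subgradient training loop, the function names (`featurize`, `train_linear_svm`, `predict`) and the four-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words term counts, L2-normalised (an assumed feature scheme)."""
    counts = Counter(w for w in text.lower().split() if w.isalpha())
    norm = sum(v * v for v in counts.values()) ** 0.5 or 1.0
    return {w: v / norm for w, v in counts.items()}


def train_linear_svm(texts, labels, lam=0.01, epochs=50):
    """Train a linear SVM by subgradient descent on the hinge loss.

    Pegasos-style updates, cycling through the examples in order.
    labels must be +1 (paper contains the data type) or -1 (it does not).
    """
    examples = [featurize(t) for t in texts]
    weights = {}
    step = 0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = y * sum(weights.get(f, 0.0) * v for f, v in x.items())
            # Regularisation step: shrink every weight.
            shrink = 1.0 - eta * lam
            for f in weights:
                weights[f] *= shrink
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


def predict(weights, text):
    """Return +1 if the paper is flagged for curation, else -1."""
    x = featurize(text)
    score = sum(weights.get(f, 0.0) * v for f, v in x.items())
    return 1 if score > 0 else -1


# Hypothetical toy training set: +1 = reports RNAi results, -1 = does not.
train_texts = [
    "rnai knockdown of the gene caused embryonic lethality",
    "we review nematode taxonomy and field sampling",
    "feeding rnai reduced expression",
    "conference report on laboratory funding",
]
train_labels = [1, -1, 1, -1]

model = train_linear_svm(train_texts, train_labels)
```

Calling `predict(model, "rnai knockdown caused lethality")` then returns the label for a new paper. The cross-corpus procedure mentioned in the abstract would correspond, in this sketch, to concatenating labelled papers from a second body of literature onto `train_texts` and `train_labels` before training; a production system would more likely rely on an established SVM library and far larger training sets, as the abstract's "several hundred to a few thousand documents" suggests.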


Subject(s)
Artificial Intelligence , Databases, Factual , Databases, Genetic , Animals , Automation , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Genomics , Mice/genetics , Publications , Support Vector Machine
10.
Worm ; 1(1): 15-21, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-24058818

ABSTRACT

WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

11.
Nucleic Acids Res ; 38(Database issue): D463-7, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19910365

ABSTRACT

WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.


Subject(s)
Caenorhabditis elegans/genetics , Caenorhabditis/genetics , Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Alleles , Animals , Computational Biology/trends , Databases, Protein , Information Storage and Retrieval/methods , Internet , Phenotype , Protein Structure, Tertiary , Software , Transcription Factors
12.
Nucleic Acids Res ; 36(Database issue): D612-7, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17991679

ABSTRACT

WormBase (www.wormbase.org) is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes the genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of other nematode species genome sequences.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Animals , Caenorhabditis elegans/metabolism , Chromosome Mapping , Gene Expression , Gene Regulatory Networks , Genes, Helminth , Genomics , Internet , Mass Spectrometry , Peptides/chemistry , Phenotype , User-Computer Interface
13.
Nucleic Acids Res ; 35(Database issue): D506-10, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17099234

ABSTRACT

WormBase (http://wormbase.org), a model organism database for Caenorhabditis elegans and other related nematodes, continues to evolve and expand. Over the past year WormBase has added new data on C. elegans, including data on classical genetics, cell biology and functional genomics; expanded the annotation of closely related nematodes with a new genome browser for Caenorhabditis remanei; and deployed new hardware for stronger performance. Several existing datasets including phenotype descriptions and RNAi experiments have seen a large increase in new content. New datasets such as the C. remanei draft assembly and annotations, the Vancouver Fosmid library and TEC-RED 5' end sites are now available as well. Access to and searching of WormBase have become more dependable and flexible via multiple mirror sites and indexing through Google.


Subject(s)
Caenorhabditis elegans/genetics , Caenorhabditis/genetics , Databases, Genetic , Animals , Genes, Helminth , Genome, Helminth , Genomics , Internet , Oligonucleotide Array Sequence Analysis , Phenotype , RNA Interference , User-Computer Interface
14.
Nucleic Acids Res ; 34(Database issue): D475-8, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381915

ABSTRACT

WormBase (http://wormbase.org), the public database for genomics and biology of Caenorhabditis elegans, has been restructured for stronger performance and expanded for richer biological content. Performance was improved by accelerating the loading of central data pages such as the omnibus Gene page, by rationalizing internal data structures and software for greater portability, and by making the Genome Browser highly customizable in how it views and exports genomic subsequences. Arbitrarily complex, user-specified queries are now possible through Textpresso (for all available literature) and through WormMart (for most genomic data). Biological content was enriched by reconciling all available cDNA and expressed sequence tag data with gene predictions, clarifying single nucleotide polymorphism and RNAi sites, and summarizing known functions for most genes studied in this organism.


Subject(s)
Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Databases, Genetic , Software , Animals , Caenorhabditis elegans/physiology , DNA, Complementary/chemistry , Expressed Sequence Tags/chemistry , Genome, Helminth , Genomics , Internet , Polymorphism, Single Nucleotide , RNA Interference , User-Computer Interface
15.
Nucleic Acids Res ; 33(Database issue): D29-33, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608199

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.


Subject(s)
Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface
16.
Nucleic Acids Res ; 32(Database issue): D27-30, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681351

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.


Subject(s)
Databases, Nucleic Acid , Animals , Europe , Genomics , Humans , Information Storage and Retrieval , Internet
17.
Nucleic Acids Res ; 31(1): 17-22, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519939

ABSTRACT

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.


Subject(s)
Databases, Nucleic Acid , Animals , Base Sequence , Data Collection , Databases, Nucleic Acid/trends , Genomics , Information Storage and Retrieval , Internet , Sequence Analysis, DNA
18.
Nucleic Acids Res ; 31(1): 43-50, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519944

ABSTRACT

As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk.


Subject(s)
Computational Biology , Databases, Genetic , Animals , Cooperative Behavior , Data Collection , Databases, Protein , Europe , Genomics , Humans , Information Storage and Retrieval , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/chemistry , Proteins/physiology , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Transcription, Genetic , Vocabulary, Controlled
19.
Nucleic Acids Res ; 30(1): 21-6, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752244

ABSTRACT

The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.


Subject(s)
Databases, Nucleic Acid , Animals , Base Sequence , Confidentiality , Data Collection , Database Management Systems , Databases, Protein , Europe , Expressed Sequence Tags , Genome , Genome, Human , Humans , Information Storage and Retrieval , Internet , Patents as Topic , Sequence Alignment , Sequence Analysis , Systems Integration