1.
Nucleic Acids Res ; 40(Database issue): D735-41, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22067452

ABSTRACT

Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community.


Subject(s)
Caenorhabditis elegans/genetics , Databases, Genetic , Genome, Helminth , Nematoda/genetics , Animals , Caenorhabditis/genetics , Caenorhabditis elegans/anatomy & histology , Computer Graphics , Gene Expression Profiling , Genomics , Internet , Molecular Sequence Annotation , Phenotype
2.
BMC Bioinformatics ; 12: 175, 2011 May 19.
Article in English | MEDLINE | ID: mdl-21595960

ABSTRACT

BACKGROUND: Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is crucial to making text markup a successful venture.
RESULTS: We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand, ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases.
CONCLUSIONS: Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.


Subject(s)
Databases, Factual , Periodicals as Topic , Animals , Biology/methods , Biology/trends , Caenorhabditis elegans/genetics , Databases, Genetic , Internet , Quality Control
3.
Nucleic Acids Res ; 38(Database issue): D463-7, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19910365

ABSTRACT

WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.


Subject(s)
Caenorhabditis elegans/genetics , Caenorhabditis/genetics , Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Alleles , Animals , Computational Biology/trends , Databases, Protein , Information Storage and Retrieval/methods , Internet , Phenotype , Protein Structure, Tertiary , Software , Transcription Factors
4.
Neuroinformatics ; 6(3): 195-204, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18949581

ABSTRACT

Textpresso is a text-mining system for scientific literature. Its two major features are access to the full text of research papers and the development and use of categories of biological concepts as well as categories that describe or relate objects. A search engine enables the user to search for one or a combination of these categories and/or keywords within an entire literature. Here we describe Textpresso for Neuroscience, part of the core Neuroscience Information Framework (NIF). The Textpresso site currently consists of 67,500 full text papers and 131,300 abstracts. We show that using categories in literature can make a pure keyword query more refined and meaningful. We also show how semantic queries can be formulated with categories only. We explain the build and content of the database and describe the main features of the web pages and the advanced search options. We also give detailed illustrations of the web service developed to provide programmatic access to Textpresso. This web service is used by the NIF interface to access Textpresso. The standalone website of Textpresso for Neuroscience can be accessed at http://www.textpresso.org/neuroscience/.


Subject(s)
Computational Biology/methods , Databases as Topic , Neurosciences/methods , Periodicals as Topic , Access to Information , Animals , Computational Biology/organization & administration , Computational Biology/trends , Databases as Topic/organization & administration , Databases as Topic/trends , Humans , Information Storage and Retrieval/methods , Information Storage and Retrieval/trends , Internet/organization & administration , Internet/trends , Neurosciences/organization & administration , Neurosciences/trends , Periodicals as Topic/trends , Publishing/trends , Semantics , Software
5.
Neuroinformatics ; 6(3): 205-17, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18958629

ABSTRACT

The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop-shop for Neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources by simple keyword queries. Although the user would provide only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should also find records containing synonyms of that term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard), constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources including relational databases, web sites, XML documents and full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov.


Subject(s)
Computational Biology/methods , Databases as Topic , Neurosciences/methods , Access to Information , Animals , Computational Biology/trends , Databases as Topic/trends , Humans , Information Storage and Retrieval/methods , Information Storage and Retrieval/trends , Internet/organization & administration , Internet/trends , Meta-Analysis as Topic , Neurosciences/standards , Software/standards , Software/trends
6.
Nucleic Acids Res ; 35(Database issue): D506-10, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17099234

ABSTRACT

WormBase (http://wormbase.org), a model organism database for Caenorhabditis elegans and other related nematodes, continues to evolve and expand. Over the past year WormBase has added new data on C. elegans, including data on classical genetics, cell biology and functional genomics; expanded the annotation of closely related nematodes with a new genome browser for Caenorhabditis remanei; and deployed new hardware for stronger performance. Several existing datasets including phenotype descriptions and RNAi experiments have seen a large increase in new content. New datasets such as the C. remanei draft assembly and annotations, the Vancouver Fosmid library and TEC-RED 5' end sites are now available as well. Accessing and searching WormBase has become more dependable and flexible via multiple mirror sites and indexing through Google.


Subject(s)
Caenorhabditis elegans/genetics , Caenorhabditis/genetics , Databases, Genetic , Animals , Genes, Helminth , Genome, Helminth , Genomics , Internet , Oligonucleotide Array Sequence Analysis , Phenotype , RNA Interference , User-Computer Interface