1.
Plant Direct ; 4(12): e00293, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33392435

ABSTRACT

We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non-plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated with individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
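
The abstract above notes that speciation and duplication events labeled on a gene tree distinguish orthologs from paralogs. A minimal sketch of that standard rule follows: two genes are treated as orthologs when their last common ancestor is a speciation node, and as paralogs when it is a duplication node. The `Node` class, the `classify_pairs` helper, and the gene names are hypothetical illustrations, not part of PhyloGenes.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A gene-tree node: leaves carry a gene name, internal nodes an event label."""
    event: str = ""   # "speciation" or "duplication" (internal nodes only)
    gene: str = ""    # gene identifier (leaves only)
    children: list = field(default_factory=list)


def leaves(node):
    """Return the gene names found under a node."""
    if not node.children:
        return [node.gene]
    return [g for child in node.children for g in leaves(child)]


def classify_pairs(node, out=None):
    """Label each gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if not node.children:
        return out
    label = "ortholog" if node.event == "speciation" else "paralog"
    # Genes in different child subtrees have this node as their last common ancestor.
    for left, right in combinations(node.children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = label
    for child in node.children:
        classify_pairs(child, out)
    return out


# Hypothetical family: one ancient duplication, then a speciation in each copy.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node(gene="AtA1"), Node(gene="OsA1")]),
    Node(event="speciation", children=[Node(gene="AtA2"), Node(gene="OsA2")]),
])
pairs = classify_pairs(tree)
for pair, label in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), label)
```

In this toy tree, AtA1 and OsA1 are orthologs, while AtA1 and OsA2 are paralogs even though they come from different species, which is the case where plain sequence similarity can mislead functional inference.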

2.
Curr Protoc Bioinformatics ; 60: 1.11.1-1.11.45, 2017 Dec 08.
Article in English | MEDLINE | ID: mdl-29220077

ABSTRACT

The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive Web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene function, orthologs, gene expression, mutant phenotypes, biological materials such as clones and seed stocks, genetic markers, genetic and physical maps, genome organization, images of mutant plants, protein sub-cellular localizations, publications, and the research community. The various data types are extensively interconnected and can be accessed through a variety of Web-based search and display tools. This unit primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes and genome. Additionally, we describe how members of the community can share data using TAIR's Online Annotation Submission Tool (TOAST), in order to make their published research more accessible and visible. © 2017 by John Wiley & Sons, Inc.


Subjects
Arabidopsis/genetics; Databases, Genetic; Genes, Plant; Arabidopsis Proteins/genetics; Computational Biology; DNA, Plant/genetics; Data Mining; Databases, Genetic/statistics & numerical data; Gene Ontology; Genome, Plant; Internet; Molecular Sequence Annotation
3.
Nucleic Acids Res ; 45(D1): D128-D134, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27794554

ABSTRACT

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/.


Subjects
Databases, Nucleic Acid; RNA, Untranslated/chemistry; Animals; Genomics; Humans; Nucleotides/chemistry; Sequence Analysis, RNA; Species Specificity
4.
Article in English | MEDLINE | ID: mdl-26989150

ABSTRACT

Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long-term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR) founded the nonprofit organization Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding, but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org.


Subjects
Arabidopsis/genetics; Data Curation/economics; Models, Theoretical; Research Support as Topic/economics; Databases, Genetic; Software
5.
PeerJ ; 3: e1470, 2015.
Article in English | MEDLINE | ID: mdl-26713234

ABSTRACT

Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the extensible observation ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.

6.
Genesis ; 53(8): 474-85, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26201819

ABSTRACT

The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabidopsis thaliana that provides a global research community with centralized access to data for over 30,000 Arabidopsis genes. TAIR's biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high-throughput datasets to present a high-quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips, and other experimental resources. TAIR actively engages with its users, who contribute expertise and data that augment the work of the curatorial staff. Within an extensive and evolving ecosystem of online resources for plant biology, TAIR focuses on the critically important role of extracting experimentally based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinformatics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription-based funding while still keeping its data relatively open and accessible.


Subjects
Arabidopsis/genetics; Data Curation/methods; Data Curation/standards; Databases, Genetic/standards; Genome, Plant; Alleles; Arabidopsis Proteins/genetics; Genetic Association Studies
7.
Plant Methods ; 11: 10, 2015.
Article in English | MEDLINE | ID: mdl-25774204

ABSTRACT

BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.
RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.

8.
PLoS Biol ; 13(1): e1002033, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25562316

ABSTRACT

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.


Subjects
Genetic Association Studies; Animals; Computational Biology; Data Curation; Databases, Factual/standards; Gene-Environment Interaction; Genomics; Humans; Phenotype; Reference Standards; Reproducibility of Results; Terminology as Topic
9.
Methods Mol Biol ; 1062: 65-96, 2014.
Article in English | MEDLINE | ID: mdl-24057361

ABSTRACT

The volume of Arabidopsis information has increased enormously in recent years as a result of the sequencing of the reference genome and other large-scale functional genomics projects. Much of the data is stored in public databases, where data are organized, analyzed, and made freely accessible to the research community. These databases are resources that researchers can utilize for making predictions and developing testable hypotheses. The methods in this chapter describe ways to access and utilize Arabidopsis data and genomic resources found in databases and stock centers.


Subjects
Arabidopsis/genetics; Databases, Genetic; Amino Acid Sequence; Arabidopsis/metabolism; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/genetics; Data Mining; Gene Ontology; Genes, Plant; Metabolic Networks and Pathways; Molecular Sequence Annotation; Protein Conformation; Seeds/genetics
10.
Plant Cell Physiol ; 54(2): e1, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23220694

ABSTRACT

The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.


Subjects
Genome, Plant; Genomics/methods; Plants/anatomy & histology; Plants/genetics; Software; Alkyl and Aryl Transferases/genetics; Databases, Genetic; Flowers/genetics; Internet; Molecular Sequence Annotation; Multigene Family; Phenotype; Plant Leaves/anatomy & histology; Plant Proteins/genetics
11.
Database (Oxford) ; 2012: bas047, 2012.
Article in English | MEDLINE | ID: mdl-23221298

ABSTRACT

TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free-text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org


Subjects
Arabidopsis/genetics; Data Mining/methods; Databases, Genetic; Periodicals as Topic; Workflow; Genes, Plant/genetics; Molecular Sequence Annotation; Phenotype; Polymorphism, Genetic; Search Engine; Vocabulary, Controlled
12.
Database (Oxford) ; 2012: bas041, 2012.
Article in English | MEDLINE | ID: mdl-23160414

ABSTRACT

Today's biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/


Subjects
Abstracting and Indexing; Data Mining/methods; Databases, Factual; Genes; Periodicals as Topic; PubMed; User-Computer Interface; Feedback; Humans
13.
Database (Oxford) ; 2012: bas040, 2012.
Article in English | MEDLINE | ID: mdl-23160413

ABSTRACT

WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids, and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.


Subjects
Arabidopsis/genetics; Caenorhabditis elegans/genetics; Data Mining/methods; Databases, Genetic; Dictyostelium/genetics; Periodicals as Topic; Workflow; Access to Information; Animals; Molecular Sequence Annotation
14.
Database (Oxford) ; 2012: bas030, 2012.
Article in English | MEDLINE | ID: mdl-22859749

ABSTRACT

As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.


Subjects
Databases, Factual; Internet; Molecular Sequence Annotation/methods; Periodicals as Topic; Vocabulary, Controlled; Abstracting and Indexing; Arabidopsis; Computational Biology/methods; Models, Theoretical; User-Computer Interface
15.
Nucleic Acids Res ; 40(19): e152, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22790981

ABSTRACT

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagating functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.


Subjects
Molecular Sequence Annotation; Multigene Family; Protein Structure, Tertiary; Proteins/classification; Software; Algorithms; Amino Acid Sequence; Animals; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/classification; Arabidopsis Proteins/genetics; Consensus Sequence; Genomics/methods; Mice; Protein Structure, Tertiary/genetics; Proteins/chemistry; Proteins/genetics; Sequence Analysis, Protein
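
The GFam abstract above describes greedily chaining domain hits into one architecture per sequence and then grouping sequences that share an architecture. The sketch below illustrates that general idea only and is not GFam's implementation. It assumes a longest-hit-first rule with no overlaps allowed, and the function names, protein names, and domain labels (`IPR_X` and so on) are hypothetical placeholders rather than real InterPro accessions.

```python
from collections import defaultdict


def greedy_architecture(hits):
    """Pick non-overlapping domain hits greedily, longest first (assumed rule).

    hits: list of (start, end, domain_id) with inclusive coordinates.
    Returns the chosen domain ids ordered by position on the sequence.
    """
    chosen = []
    for start, end, domain in sorted(hits, key=lambda h: h[1] - h[0], reverse=True):
        # Keep the hit only if it overlaps none of the hits chosen so far.
        if all(end < s or start > e for s, e, _ in chosen):
            chosen.append((start, end, domain))
    return tuple(domain for _, _, domain in sorted(chosen))


def group_by_architecture(hits_by_protein):
    """Group proteins into families keyed by their domain architecture."""
    families = defaultdict(list)
    for protein, hits in hits_by_protein.items():
        families[greedy_architecture(hits)].append(protein)
    return dict(families)


# Hypothetical domain hits for three proteins.
hits_by_protein = {
    "protA": [(10, 120, "IPR_X"), (100, 150, "IPR_Y"), (200, 300, "IPR_Z")],
    "protB": [(5, 110, "IPR_X"), (180, 290, "IPR_Z")],
    "protC": [(1, 90, "IPR_Y")],
}
families = group_by_architecture(hits_by_protein)
for architecture, members in families.items():
    print(architecture, members)
```

Here `protA` loses its shorter `IPR_Y` hit because it overlaps the longer `IPR_X` hit, so `protA` and `protB` end up in the same family. The connected component analysis of un-annotated regions mentioned in the abstract is not covered by this sketch.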
16.
Database (Oxford) ; 2012: bas020, 2012.
Article in English | MEDLINE | ID: mdl-22513129

ABSTRACT

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-)automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed considerable variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.


Subjects
Biomedical Research; Data Mining; Natural Language Processing; Workflow; Animals; Databases, Factual; Humans
17.
Nucleic Acids Res ; 40(Database issue): D1202-10, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22140109

ABSTRACT

The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is a genome database for Arabidopsis thaliana, an important reference organism for many fundamental aspects of biology as well as basic and applied plant biology research. TAIR serves as a central access point for Arabidopsis data, annotates gene function and expression patterns using controlled vocabulary terms, and maintains and updates the A. thaliana genome assembly and annotation. TAIR also provides researchers with an extensive set of visualization and analysis tools. Recent developments include several new genome releases (TAIR8, TAIR9 and TAIR10) in which the A. thaliana assembly was updated, pseudogenes and transposon genes were re-annotated, and new data from proteomics and next generation transcriptome sequencing were incorporated into gene models and splice variants. Other highlights include progress on functional annotation of the genome and the release of several new tools including Textpresso for Arabidopsis which provides the capability to carry out full text searches on a large body of research literature.


Subjects
Arabidopsis/genetics; Databases, Genetic; Genes, Plant; Molecular Sequence Annotation; Arabidopsis/metabolism; Arabidopsis Proteins/genetics; Genome, Plant; Software
18.
Stand Genomic Sci ; 6(3): 440-3, 2012 Jul 30.
Article in English | MEDLINE | ID: mdl-23409218

ABSTRACT

Representing phenotype in a way that can be linked to thousands of molecular genetic and environmental databases is an unresolved research challenge. A recent meeting of the Phenotype Research Coordination Network (RCN) aimed to coordinate and leverage current efforts. The three-day summit meeting was hosted by NESCent (The National Evolutionary Synthesis Center) in Durham, North Carolina on the 23rd-25th of February, 2012.

19.
BMC Bioinformatics ; 12 Suppl 8: S4, 2011 Oct 03.
Article in English | MEDLINE | ID: mdl-22151968

ABSTRACT

BACKGROUND: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.
RESULTS: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task, leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. The document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.
DISCUSSION: The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.


Subjects
Data Mining/methods; Genes; Animals; Computational Biology/methods; Periodicals as Topic; Plants/genetics; Plants/metabolism
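
The BioCreative III abstract above measures task performance by precision and recall. As a minimal sketch of how those two measures could be computed for gene normalization on one article, the example below compares a system's gene identifiers against a curator's reference set. The function name and the identifiers are hypothetical and do not come from the IAT evaluation.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted identifiers against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of the system's identifiers that the curator also chose.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the curator's identifiers that the system recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers for a single article.
predicted_ids = ["GENE:1", "GENE:2", "GENE:3", "GENE:4"]
gold_ids = ["GENE:2", "GENE:3", "GENE:5"]
p, r = precision_recall(predicted_ids, gold_ids)
print(f"precision={p:.2f} recall={r:.2f}")
```

A long candidate list of the kind the abstract describes tends to lower precision, and recall also falls when the list omits the relevant genes.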