Search | VHL Regional Portal

BioCreative III interactive task: an overview.

Arighi, Cecilia N; Roberts, Phoebe M; Agarwal, Shashank; Bhattacharya, Sanmitra; Cesareni, Gianni; Chatr-Aryamontri, Andrew; Clematide, Simon; Gaudet, Pascale; Giglio, Michelle Gwinn; Harrow, Ian; Huala, Eva; Krallinger, Martin; Leser, Ulf; Li, Donghui; Liu, Feifan; Lu, Zhiyong; Maltais, Lois J; Okazaki, Naoaki; Perfetto, Livia; Rinaldi, Fabio; Sætre, Rune; Salgado, David; Srinivasan, Padmini; Thomas, Philippe E; Toldo, Luca; Hirschman, Lynette; Wu, Cathy H.

BMC Bioinformatics ; 12 Suppl 8: S4, 2011 Oct 03.

Article in English | MEDLINE | ID: mdl-22151968

ABSTRACT

BACKGROUND: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested. RESULTS: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. 
Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task, leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. Document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation. DISCUSSION: The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
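The abstract measures task performance by precision and recall. As a minimal sketch of how those two figures could be computed for gene normalization on a single article, assuming the system output and the curator's answer are both plain sets of gene identifiers (the function name and the identifiers below are hypothetical and do not come from the paper):

```python
def precision_recall(predicted_ids, gold_ids):
    """Return (precision, recall) for one article's gene identifiers."""
    predicted, gold = set(predicted_ids), set(gold_ids)
    true_positives = len(predicted & gold)
    # Precision: share of returned identifiers that the curator accepted.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the curator's identifiers that the system returned.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
system_output = ["gene:1", "gene:2", "gene:3", "gene:4"]
curator_answer = ["gene:1", "gene:2", "gene:5"]
precision, recall = precision_recall(system_output, curator_answer)
```

A long candidate list that misses the relevant gene, the situation the users reported, lowers both values at once: the extra identifiers reduce precision and the missing gene reduces recall.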

Subjects

Data Mining/methods , Genes , Animals , Computational Biology/methods , Periodicals as Topic , Plants/genetics , Plants/metabolism

The gene normalization task in BioCreative III.

Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan; Huang, Minlie; Liu, Jingchen; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Dai, Hong-Jie; Okazaki, Naoaki; Cho, Han-Cheol; Gerner, Martin; Solt, Illes; Agarwal, Shashank; Liu, Feifan; Vishnyakova, Dina; Ruch, Patrick; Romacker, Martin; Rinaldi, Fabio; Bhattacharya, Sanmitra; Srinivasan, Padmini; Liu, Hongfang; Torii, Manabu; Matos, Sergio; Campos, David; Verspoor, Karin; Livingston, Kevin M; Wilbur, W John.

BMC Bioinformatics ; 12 Suppl 8: S2, 2011 Oct 03.

Article in English | MEDLINE | ID: mdl-22151901

ABSTRACT

BACKGROUND: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). RESULTS: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. CONCLUSIONS: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. 
By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
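The reported improvements of the composite system follow directly from the TAP-k scores quoted in the abstract. A short check, assuming the percentages are relative gains over the best single-team score at each k (the variable and function names are illustrative):

```python
# TAP-k scores on the gold standard, as quoted in the abstract.
best_team = {5: 0.3297, 10: 0.3538, 20: 0.3535}
composite = {5: 0.3707, 10: 0.4311, 20: 0.4477}


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


improvements = {
    k: round(relative_improvement(best_team[k], composite[k]), 1)
    for k in best_team
}
print(improvements)  # {5: 12.4, 10: 21.8, 20: 26.6}
```

The rounded values match the 12.4%, 21.8%, and 26.6% stated in the abstract.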

Subjects

Algorithms , Data Mining/methods , Genes , Animals , Data Mining/standards , Humans , National Library of Medicine (U.S.) , Periodicals as Topic , United States

Building a high-quality sense inventory for improved abbreviation disambiguation.

Okazaki, Naoaki; Ananiadou, Sophia; Tsujii, Jun'ichi.

Bioinformatics ; 26(9): 1246-53, 2010 May 01.

Article in English | MEDLINE | ID: mdl-20360059

ABSTRACT

MOTIVATION: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions of biomedical texts. However, expanded forms extracted by abbreviation recognition are mixtures of concepts/senses and their term variations. Consequently, a list of expanded forms should be structured into a sense inventory, which provides possible concepts or senses for abbreviation disambiguation. RESULTS: A sense inventory is a key to robust management of abbreviations. Therefore, we present a supervised approach for clustering expanded forms. The experiment yields an F1 score of 0.915 in clustering expanded forms. We then investigate the possibility of conflicts of protein and gene names with abbreviations. Finally, an abbreviation disambiguation experiment on the sense inventory yielded 0.984 accuracy and 0.986 F1 score using the dataset obtained from MEDLINE abstracts. AVAILABILITY: The sense inventory and disambiguator of abbreviations are accessible at http://www.nactem.ac.uk/software/acromine/ and http://www.nactem.ac.uk/software/acromine_disambiguation/.

Subjects

Computational Biology/methods , Algorithms , Cluster Analysis , Databases, Bibliographic , Dictionaries as Topic , MEDLINE , Models, Statistical , Natural Language Processing , Polymerase Chain Reaction/methods , Reproducibility of Results , Software , Tomography, X-Ray Computed/methods

Secoiridoid glucosides and unusual recyclized secoiridoid aglycones from Ligustrum vulgare.

Tanahashi, Takao; Takenaka, Yukiko; Okazaki, Naoaki; Koge, Megumi; Nagakura, Naotaka; Nishi, Toyoyuki.

Phytochemistry ; 70(17-18): 2072-7, 2009 Dec.

Article in English | MEDLINE | ID: mdl-19833363

ABSTRACT

Phytochemical investigation of the dried leaves and twigs of Ligustrum vulgare has led to the isolation of the secoiridoid glucosides, (2''R)- and (2''S)-10-hydroxy-2''-methoxyoleuropeins (1 and 2), and the secoiridoid aglycones, ligustrohemiacetals A (3) and B (4). Their structures were elucidated by spectroscopic and chemical means. Enzymatic hydrolysis of 10-hydroxyoleuropein to the analog of ligustrohemiacetals A and B led to the structural revision of jasmolactones.

Subjects

Ligustrum/chemistry , Plant Extracts/chemistry , Hydrolysis , Iridoids/chemistry , Iridoids/isolation & purification , Molecular Structure , Plant Leaves , Plant Stems

Building an abbreviation dictionary using a term recognition approach.

Okazaki, Naoaki; Ananiadou, Sophia.

Bioinformatics ; 22(24): 3089-95, 2006 Dec 15.

Article in English | MEDLINE | ID: mdl-17050571

ABSTRACT

MOTIVATION: Acronyms result from a highly productive type of term variation and trigger the need for an acronym dictionary to establish associations between acronyms and their expanded forms. RESULTS: We propose a novel method for recognizing acronym definitions in a text collection. Assuming a word sequence co-occurring frequently with a parenthetical expression to be a potential expanded form, our method identifies acronym definitions in a similar manner to the statistical term recognition task. Applied to the whole MEDLINE (7 811 582 abstracts), the implemented system extracted 886 755 acronym candidates and recognized 300 954 expanded forms in reasonable time. Our method outperformed baseline systems, achieving 99% precision and 82-95% recall on our evaluation corpus that roughly emulates the whole MEDLINE. AVAILABILITY AND SUPPLEMENTARY INFORMATION: The implementations and supplementary information are available at our web site: http://www.chokkan.org/research/acromine/
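The core assumption in the abstract, that a word sequence frequently co-occurring with a parenthetical expression is a potential expanded form, can be illustrated with a simple counting pass. The sketch below is a simplified illustration and not the authors' implementation: the regular expression, the six-word window, and the name `candidate_counts` are all assumptions made for this example, and the term-recognition scoring that the paper applies on top of such counts is not reproduced.

```python
import re
from collections import Counter, defaultdict

# Up to six words followed by a parenthesized, capitalized short token.
# Both the window size and the acronym shape are illustrative assumptions.
PAREN = re.compile(r"\b((?:[\w-]+\s+){1,6})\(([A-Z][A-Za-z0-9-]{1,9})\)")


def candidate_counts(sentences):
    """Count word sequences that immediately precede each '(ACRONYM)'."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for match in PAREN.finditer(sentence):
            words = match.group(1).split()
            acronym = match.group(2)
            # Every suffix of the preceding words is a candidate expansion.
            for n in range(1, len(words) + 1):
                counts[acronym][" ".join(words[-n:]).lower()] += 1
    return counts


sentences = [
    "Levels of tumor necrosis factor (TNF) were elevated.",
    "The tumor necrosis factor (TNF) pathway was studied.",
]
counts = candidate_counts(sentences)
print(counts["TNF"]["tumor necrosis factor"])      # 2
print(counts["TNF"]["the tumor necrosis factor"])  # 1
```

Raw counts alone cannot separate the full expansion from its shorter suffixes ("factor" also occurs twice here), which is why a scoring step in the spirit of statistical term recognition is needed to select among the candidates.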

Subjects

Abbreviations as Topic , Abstracting and Indexing/methods , Dictionaries as Topic , Documentation/methods , MEDLINE , Natural Language Processing , Pattern Recognition, Automated/methods , Artificial Intelligence , Database Management Systems

Secoiridoid and iridoid glucosides from Syringa afghanica.

Takenaka, Yukiko; Okazaki, Naoaki; Tanahashi, Takao; Nagakura, Naotaka; Nishi, Toyoyuki.

Phytochemistry ; 59(7): 779-87, 2002 Apr.

Article in English | MEDLINE | ID: mdl-11909635

ABSTRACT

Phytochemical investigation of the dried leaves of Syringa afghanica has led to the isolation of nine secoiridoid glucosides, safghanosides A-H and 2"-epi-frameroside, as well as an iridoid glucoside, syringafghanoside, along with nineteen known compounds. The structures were elucidated by spectroscopic and chemical means.

Subjects

Glucosides/chemistry , Oleaceae/chemistry , Pyrans/chemistry , Glucosides/isolation & purification , Iridoids , Magnetic Resonance Spectroscopy , Plant Leaves/chemistry , Pyrans/isolation & purification
