
1.

Integrating heterogeneous knowledge graphs into drug-drug interaction extraction from the literature.

Asada, Masaki; Miwa, Makoto; Sasaki, Yutaka.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36416141

ABSTRACT

MOTIVATION: Most of the conventional deep neural network-based methods for drug-drug interaction (DDI) extraction consider only context information around drug mentions in the text. However, human experts use heterogeneous background knowledge about drugs to comprehend pharmaceutical papers and extract relationships between drugs. Therefore, we propose a novel method that simultaneously considers various heterogeneous information for DDI extraction from the literature. RESULTS: We first construct drug representations by conducting the link prediction task on a heterogeneous pharmaceutical knowledge graph (KG) dataset. We then effectively combine the text information of input sentences in the corpus and the information on drugs in the heterogeneous KG (HKG) dataset. Finally, we evaluate our DDI extraction method on the DDIExtraction-2013 shared task dataset. In the experiment, integrating heterogeneous drug information significantly improves the DDI extraction performance, achieving a state-of-the-art F-score of 85.40%. We also evaluated our method on the DrugProt dataset and improved the performance significantly, achieving an F-score of 77.9%. Further analysis showed that each type of node in the HKG contributes to the performance improvement of DDI extraction, indicating the importance of considering multiple pieces of information. AVAILABILITY AND IMPLEMENTATION: Our code is available at https://github.com/tticoin/HKG-DDIE.git.

Subject(s)

Data Mining , Pattern Recognition, Automated , Humans , Pattern Recognition, Automated/methods , Data Mining/methods , Drug Interactions , Neural Networks, Computer , Pharmaceutical Preparations

2.

BioVAE: a pre-trained latent variable language model for biomedical text mining.

Trieu, Hai-Long; Miwa, Makoto; Ananiadou, Sophia.

Bioinformatics ; 38(3): 872-874, 2022 01 12.

Article in English | MEDLINE | ID: mdl-34636886

ABSTRACT

SUMMARY: Large-scale pre-trained language models (PLMs) have advanced state-of-the-art (SOTA) performance on various biomedical text mining tasks. The power of such PLMs can be combined with the advantages of deep generative models. There are examples of these combinations. However, they are trained only on general domain text, and biomedical models are still missing. In this work, we describe BioVAE, the first large-scale pre-trained latent variable language model for the biomedical domain, which uses the OPTIMUS framework to train on large volumes of biomedical text. The model shows SOTA performance on several biomedical text mining tasks when compared to existing publicly available biomedical PLMs. In addition, our model can generate more accurate biomedical sentences than the original OPTIMUS output. AVAILABILITY AND IMPLEMENTATION: Our source code and pre-trained models are freely available: https://github.com/aistairc/BioVAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Mining , Language , Software , Natural Language Processing

3.

Large-scale neural biomedical entity linking with layer overwriting.

Tsujimura, Tomoki; Miwa, Makoto; Sasaki, Yutaka.

J Biomed Inform ; 143: 104433, 2023 07.

Article in English | MEDLINE | ID: mdl-37385326

ABSTRACT

MOTIVATION: Entity linking is the task of linking entity mentions to the database entries corresponding to the entity mentions. Entity linking enables the treatment of superficially different but semantically identical mentions as the same entity. Since millions of concepts are listed in biomedical databases, selecting the correct database entry for each targeted entity is challenging. Simple string matching between the word and each synonym in biomedical databases is insufficient to handle a wide variety of variants of biomedical entities appearing in the biomedical literature. Recent progress in neural approaches is promising for entity linking. Still, existing neural methods require sufficient data, which is difficult to prepare in biomedical entity linking that deals with millions of biomedical concepts. Therefore, we need to develop a new neural method to train entity-linking models over the sparse training data covering a very limited part of the biomedical concepts. RESULTS: We have devised a pure neural model that classifies biomedical entity mentions into millions of biomedical concepts. The classifier employs (1) the layer overwriting that breaks through the performance ceiling during training, (2) training data augmentation using database entries that compensate for the problem of insufficient training data, and (3) the cosine similarity-based loss function that helps distinguish the millions of biomedical concepts. Our system using the proposed classifier was ranked first in the official run of the National NLP Clinical Challenges (n2c2) 2019 Track 3, which targeted linking medical/clinical entity mentions to 434,056 Concept Unique Identifier (CUI) entries. We also applied our system to the MedMentions dataset, which has 3.2M candidate concepts. Experimental results confirmed the same advantages of our proposed method. We further evaluated our system on the NLM-CHEM corpus with 350K candidate concepts, and our system achieved a new state-of-the-art performance on the corpus. AVAILABILITY: https://github.com/tti-coin/bio-linking CONTACT: makoto.miwa@toyota-ti.ac.jp.

Subject(s)

Data Mining , Semantics , Data Mining/methods , Databases, Factual

4.

Contextualized medication event extraction with striding NER and multi-turn QA.

Tsujimura, Tomoki; Yamada, Koshi; Ida, Ryuki; Miwa, Makoto; Sasaki, Yutaka.

J Biomed Inform ; 144: 104416, 2023 08.

Article in English | MEDLINE | ID: mdl-37321443

ABSTRACT

This paper describes contextualized medication event extraction for automatically identifying medication change events with their contexts from clinical notes. The striding named entity recognition (NER) model extracts medication name spans from an input text sequence using a sliding-window approach. Specifically, the striding NER model separates the input sequence into a set of overlapping subsequences of 512 tokens with 128 tokens of stride, processing each subsequence using a large pre-trained language model and aggregating the outputs from the subsequences. The event and context classification has been done with multi-turn question-answering (QA) and span-based models. The span-based model classifies the span of each medication name using the span representation of the language model. In the QA model, event classification is augmented with questions in classifying the change events of each medication name and the context of the change events, while the model architecture is a classification style that is the same as the span-based model. We evaluated our extraction system on the n2c2 2022 Track 1 dataset, which is annotated for medication extraction (ME), event classification (EC), and context classification (CC) from clinical notes. Our system is a pipeline of the striding NER model for ME and the ensemble of the span-based and QA-based models for EC and CC. Our system achieved a combined F-score of 66.47% for the end-to-end contextualized medication event extraction (Release 1), which is the highest score among the participants of the n2c2 2022 Track 1.
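The sliding-window step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it reads "stride" as the step between window starts (some tokenizer libraries use the word for the overlap instead), and because the abstract does not say how outputs are aggregated, it assumes a simple average of per-token scores over all windows covering a token. The function names `striding_windows` and `aggregate_scores` are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]


def striding_windows(n_tokens: int, window: int = 512, stride: int = 128) -> List[Span]:
    """Return (start, end) offsets of overlapping windows covering n_tokens.

    Assumption: `stride` is the step between consecutive window starts.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans: List[Span] = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans


def aggregate_scores(
    n_tokens: int, window_outputs: Sequence[Tuple[Span, Sequence[float]]]
) -> List[float]:
    """Average per-token scores over every window that covers the token.

    Assumption: averaging is one plausible aggregation; the abstract does
    not specify the rule actually used.
    """
    totals = [0.0] * n_tokens
    counts = [0] * n_tokens
    for (start, _end), scores in window_outputs:
        for offset, score in enumerate(scores):
            totals[start + offset] += score
            counts[start + offset] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

In a real pipeline each window would be passed to the language model and the per-token scores would be label logits rather than single floats; the windowing and averaging logic stays the same.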

Subject(s)

Medication Systems , Natural Language Processing , Humans , Language , Data Mining , Electronic Health Records

5.

Contextualized medication event extraction with levitated markers.

Vasilakes, Jake; Georgiadis, Panagiotis; Nguyen, Nhung T H; Miwa, Makoto; Ananiadou, Sophia.

J Biomed Inform ; 141: 104347, 2023 05.

Article in English | MEDLINE | ID: mdl-37030658

ABSTRACT

Automatic extraction of patient medication histories from free-text clinical notes can increase the amount of relevant information available to clinicians for developing treatment plans. In addition to detecting medication events, clinical text mining systems must also be able to predict event context, such as negation, uncertainty, and time of occurrence, in order to construct accurate patient timelines. Towards this goal, we introduce Levitated Context Markers (LCMs), a novel transformer-based model for contextualized event extraction. LCMs are an adaptation of levitated markers (originally developed for relation extraction) that allow pretrained transformer models to utilize global input representations while also focusing on event-related subspans using a sparse attention mechanism. In addition to outperforming a strong baseline model on the Contextualized Medication Event Dataset, we show that LCMs' sparse attention can provide interpretable predictions by detecting relevant context cues in an unsupervised manner.

Subject(s)

Data Mining , Records , Humans , Natural Language Processing

6.

Comparing neural models for nested and overlapping biomedical event detection.

Espinosa, Kurt; Georgiadis, Panagiotis; Christopoulou, Fenia; Ju, Meizhi; Miwa, Makoto; Ananiadou, Sophia.

BMC Bioinformatics ; 23(1): 211, 2022 Jun 02.

Article in English | MEDLINE | ID: mdl-35655127

ABSTRACT

BACKGROUND: Nested and overlapping events are particularly frequent and informative structures in biomedical event extraction. However, state-of-the-art neural models either neglect those structures during learning or use syntactic features and external tools to detect them. To overcome these limitations, this paper presents and compares two neural models: a novel EXhaustive Neural Network (EXNN) and a Search-Based Neural Network (SBNN) for detection of nested and overlapping events. RESULTS: We evaluate the proposed models as an event detection component in isolation and within a pipeline setting. Evaluation in several annotated biomedical event extraction datasets shows that both EXNN and SBNN achieve higher performance in detecting nested and overlapping events, compared to the state-of-the-art model Turku Event Extraction System (TEES). CONCLUSIONS: The experimental results reveal that both EXNN and SBNN are effective for biomedical event extraction. Furthermore, results on a pipeline setting indicate that our models improve detection of events compared to models that use either gold or predicted named entities.

Subject(s)

Models, Biological , Neural Networks, Computer

7.

Using drug descriptions and molecular structures for drug-drug interaction extraction from literature.

Asada, Masaki; Miwa, Makoto; Sasaki, Yutaka.

Bioinformatics ; 37(12): 1739-1746, 2021 07 19.

Article in English | MEDLINE | ID: mdl-33098410

ABSTRACT

MOTIVATION: Neural methods to extract drug-drug interactions (DDIs) from literature require a large number of annotations. In this study, we propose a novel method to effectively utilize external drug database information as well as information from large-scale plain text for DDI extraction. Specifically, we focus on drug description and molecular structure information as the drug database information. RESULTS: We evaluated our approach on the DDIExtraction 2013 shared task dataset. We obtained the following results. First, large-scale raw text information can greatly improve the performance of extracting DDIs when combined with the existing model, and it shows state-of-the-art performance. Second, drug description and molecular structure information are each helpful in further improving the DDI performance for some specific DDI types. Finally, the simultaneous use of the drug description and molecular structure information can significantly improve the performance on all the DDI types. We showed that the plain text, the drug description information and molecular structure information are complementary and their effective combination is essential for the improvement. AVAILABILITY AND IMPLEMENTATION: Our code is available at https://github.com/tticoin/DESC_MOL-DDIE.

Subject(s)

Data Mining , Pharmaceutical Preparations , Drug Interactions , Molecular Structure , Publications

8.

DeepEventMine: end-to-end neural nested event extraction from biomedical texts.

Trieu, Hai-Long; Tran, Thy Thy; Duong, Khoa N A; Nguyen, Anh; Miwa, Makoto; Ananiadou, Sophia.

Bioinformatics ; 36(19): 4910-4917, 2020 12 08.

Article in English | MEDLINE | ID: mdl-33141147

ABSTRACT

MOTIVATION: Recent neural approaches to event extraction from text mainly focus on flat events in the general domain, while there are fewer attempts to detect nested and overlapping events. These existing systems are built on given entities and they depend on external syntactic tools. RESULTS: We propose an end-to-end neural nested event extraction model named DeepEventMine that extracts multiple overlapping directed acyclic graph structures from a raw sentence. On top of the bidirectional encoder representations from transformers model, our model detects nested entities and triggers, roles, nested events and their modifications in an end-to-end manner without any syntactic tools. Our DeepEventMine model achieves new state-of-the-art performance on seven biomedical nested event extraction tasks. Even when gold entities are unavailable, our model can detect events from raw text with promising performance. AVAILABILITY AND IMPLEMENTATION: Our code and models to reproduce the results are available at: https://github.com/aistairc/DeepEventMine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Language , Research Design

9.

Syntactically-informed word representations from graph neural network.

Tran, Thy Thy; Miwa, Makoto; Ananiadou, Sophia.

Neurocomputing (Amst) ; 413: 431-443, 2020 Nov 06.

Article in English | MEDLINE | ID: mdl-33162674

ABSTRACT

Most deep language understanding models depend only on word representations, which are mainly based on language modelling derived from a large amount of raw text. These models encode distributional knowledge without considering syntactic structural information, although several studies have shown benefits of including such information. Therefore, we propose new syntactically-informed word representations (SIWRs), which allow us to enrich the pre-trained word representations with syntactic information without training language models from scratch. To obtain SIWRs, a graph-based neural model is built on top of either static or contextualised word representations such as GloVe, ELMo and BERT. The model is first pre-trained with only a relatively modest amount of task-independent data that are automatically annotated using existing syntactic tools. SIWRs are then obtained by applying the model to downstream task data and extracting the intermediate word representations. We finally replace word representations in downstream models with SIWRs for applications. We evaluate SIWRs on three information extraction tasks, namely nested named entity recognition (NER), binary and n-ary relation extractions (REs). The results demonstrate that our SIWRs yield performance gains over the base representations in these NLP tasks with 3-9% relative error reduction. Our SIWRs also perform better than fine-tuning BERT in binary RE. We also conduct extensive experiments to analyse the proposed method.

10.

Topic detection using paragraph vectors to support active learning in systematic reviews.

Hashimoto, Kazuma; Kontonatsios, Georgios; Miwa, Makoto; Ananiadou, Sophia.

J Biomed Inform ; 62: 59-65, 2016 08.

Article in English | MEDLINE | ID: mdl-27293211

ABSTRACT

Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space, and cluster the documents into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). Results obtained demonstrate that our method is able to achieve high sensitivity for eligible studies and a significantly reduced manual annotation cost when compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/.
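The cluster-then-mix idea in the abstract above can be illustrated with a small sketch. It is not the authors' tool: it assumes document vectors are already available, uses a plain k-means with a naive deterministic initialisation (the first k vectors), and, because the abstract does not define the mixture, assumes a softmax over negative squared distances to the centroids. The names `kmeans` and `topic_mixture` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Sequence[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain k-means; the returned centroids play the role of latent topics."""
    # Assumption: naive initialisation from the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def topic_mixture(vector: Vector, centroids: Sequence[Vector]) -> List[float]:
    """Represent a document as weights over topics (centroids).

    Assumption: softmax over negative squared distances, shifted by the
    minimum distance for numerical stability.
    """
    dists = [sq_dist(vector, c) for c in centroids]
    nearest = min(dists)
    weights = [math.exp(-(d - nearest)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


docs = [[0, 0], [0, 1], [10, 10], [10, 11]]
topics = kmeans(docs, k=2)
features = [topic_mixture(d, topics) for d in docs]
```

The resulting `features` rows sum to one and could be fed to a classifier in place of, or alongside, bag-of-words features.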

Subject(s)

Machine Learning , Semantics , Classification , Humans , Review Literature as Topic , Support Vector Machine

11.

Photochlorination of Polycyclic Aromatic Hydrocarbons in Acidic Brine Solution.

Ohura, Takeshi; Miwa, Makoto.

Bull Environ Contam Toxicol ; 96(4): 524-9, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26728279

ABSTRACT

The potential for the formation of chlorinated polycyclic aromatic hydrocarbons via photochlorination of PAHs has been investigated in milli-Q water/synthetic water containing NaCl and PAHs with either UV or visible light. The photochlorination of pyrene occurred under acidic conditions in the presence of both UV and visible light, resulting in 1-chloropyrene as the main product. Benzo[a]pyrene yielded 6-chlorobenzo[a]pyrene following visible light irradiation; however, the reaction was dependent upon solution pH. The photochlorination of PAHs was proposed to proceed via a consecutive reaction model. The rate constants associated with the photochlorination and photodecay processes were determined, with the observed and theoretical values displaying similar trends, although the observed values were approximately 50-1000 times lower than the theoretical values. The lower observed values could be due to PAHs undergoing photodecay rather than photochlorination. Therefore, as photochlorination of PAHs appears to be significantly affected by solution pH, this information may allow for minimizing the impact on the environment.

Subject(s)

Hydrocarbons, Chlorinated/analysis , Light , Polycyclic Aromatic Hydrocarbons/chemistry , Water Pollutants, Chemical/chemistry , Hydrogen-Ion Concentration , Models, Theoretical , Photolysis , Polycyclic Aromatic Hydrocarbons/radiation effects , Salts , Solutions , Ultraviolet Rays , Water Pollutants, Chemical/radiation effects

12.

Adaptable, high recall, event extraction system with minimal configuration.

Miwa, Makoto; Ananiadou, Sophia.

BMC Bioinformatics ; 16 Suppl 10: S7, 2015.

Article in English | MEDLINE | ID: mdl-26201408

ABSTRACT

BACKGROUND: Biomedical event extraction has been a major focus of biomedical natural language processing (BioNLP) research since the first BioNLP shared task was held in 2009. Accordingly, a large number of event extraction systems have been developed. Most such systems, however, have been developed for specific tasks and/or incorporated task-specific settings, making their application to new corpora and tasks problematic without modification of the systems themselves. There is thus a need for event extraction systems that can achieve high levels of accuracy when applied to corpora in new domains, without the need for exhaustive tuning or modification, whilst retaining competitive levels of performance. RESULTS: We have enhanced our state-of-the-art event extraction system, EventMine, to alleviate the need for task-specific tuning. Task-specific details are specified in a configuration file, while extensive task-specific parameter tuning is avoided through the integration of a weighting method, a covariate shift method, and their combination. The task-specific configuration and weighting method have been employed within the context of two different sub-tasks of BioNLP shared task 2013, i.e. Cancer Genetics (CG) and Pathway Curation (PC), removing the need to modify the system specifically for each task. With minimal task-specific configuration and tuning, EventMine achieved 1st place in the PC task and 2nd place in the CG task, achieving the highest recall for both tasks. The system has been further enhanced following the shared task by incorporating the covariate shift method and entity generalisations based on the task definitions, leading to further performance improvements. CONCLUSIONS: We have shown that it is possible to apply a state-of-the-art event extraction system to new tasks with high levels of performance, without having to modify the system internally. Both covariate shift and weighting methods are useful in facilitating the production of high recall systems. These methods and their combination can adapt a model to the target data with no deep tuning and little manual configuration.

Subject(s)

Gene Regulatory Networks , Genes , Information Storage and Retrieval , Models, Theoretical , Natural Language Processing , Neoplasms/genetics , Neoplasms/pathology , Humans , Knowledge Bases

13.

Wide-coverage relation extraction from MEDLINE using deep syntax.

Nguyen, Nhung T H; Miwa, Makoto; Tsuruoka, Yoshimasa; Chikayama, Takashi; Tojo, Satoshi.

BMC Bioinformatics ; 16: 107, 2015 Apr 01.

Article in English | MEDLINE | ID: mdl-25887686

ABSTRACT

BACKGROUND: Relation extraction is a fundamental technology in biomedical text mining. Most of the previous studies on relation extraction from biomedical literature have focused on specific or predefined types of relations, which inherently limits the types of the extracted relations. With the aim of fully leveraging the knowledge described in the literature, we address much broader types of semantic relations using a single extraction framework. RESULTS: Our system, which we name PASMED, extracts diverse types of binary relations from biomedical literature using deep syntactic patterns. Our experimental results demonstrate that it achieves a level of recall considerably higher than the state of the art, while maintaining reasonable precision. We then applied PASMED to the whole MEDLINE corpus and extracted more than 137 million semantic relations. The extracted relations provide a quantitative understanding of what kinds of semantic relations are actually described in MEDLINE and can be ultimately extracted by (possibly type-specific) relation extraction systems. CONCLUSION: PASMED extracts a large number of relations that have previously been missed by existing text mining systems. The entire collection of the relations extracted from MEDLINE is publicly available in machine-readable form, so that it can serve as a potential knowledge base for high-level text-mining applications.

Subject(s)

Data Mining/methods , MEDLINE , Semantics

14.

Identifying synonymy between relational phrases using word embeddings.

Nguyen, Nhung T H; Miwa, Makoto; Tsuruoka, Yoshimasa; Tojo, Satoshi.

J Biomed Inform ; 56: 94-102, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26004792

ABSTRACT

Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm on top of the distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models including latent Dirichlet allocation and Markov logic networks.

Subject(s)

Data Mining/methods , Natural Language Processing , Vocabulary, Controlled , Algorithms , Cluster Analysis , Databases, Factual , False Positive Reactions , Fuzzy Logic , MEDLINE , Markov Chains , Medical Informatics/methods , Models, Statistical , Probability , Reproducibility of Results , Semantics

15.

A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text.

Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B; Pyysalo, Sampo; Ananiadou, Sophia.

Bioinformatics ; 29(13): i44-52, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23813008

ABSTRACT

MOTIVATION: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. METHOD: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. RESULTS: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText. AVAILABILITY: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Biochemical Phenomena , Data Mining/methods , Algorithms , Artificial Intelligence , MEDLINE , Support Vector Machine

16.

Reducing systematic review workload through certainty-based screening.

Miwa, Makoto; Thomas, James; O'Mara-Eves, Alison; Ananiadou, Sophia.

J Biomed Inform ; 51: 242-53, 2014 Oct.

Article in English | MEDLINE | ID: mdl-24954015

ABSTRACT

In systematic reviews, the growing number of published studies imposes a significant screening workload on reviewers. Active learning is a promising approach to reduce the workload by automating some of the screening decisions, but it has been evaluated for a limited number of disciplines. The suitability of applying active learning to complex topics in disciplines such as social science has not been studied, and the selection of useful criteria and enhancements to address the data imbalance problem in systematic reviews remains an open problem. We applied active learning with two criteria (certainty and uncertainty) and several enhancements in both clinical medicine and social science (specifically, public health) areas, and compared the results in both. The results show that the certainty criterion is useful for finding relevant documents, and that weighting positive instances is a promising way to overcome the data imbalance problem in both data sets. Latent Dirichlet allocation (LDA) is also shown to be promising when little manually-assigned information is available. Active learning is effective in complex topics, although its efficiency is limited due to the difficulties in text classification. The most promising criterion and weighting method are the same regardless of the review topic, and unsupervised techniques like LDA have the potential to boost the performance of active learning without manual annotation.
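The two selection criteria named in the abstract above can be contrasted in a few lines. This sketch is an illustration under stated assumptions rather than the paper's procedure: it takes a classifier's predicted probability of relevance for each unscreened citation as given, treats "certainty" as picking the citations most confidently predicted relevant, and treats "uncertainty" as picking those closest to the 0.5 decision boundary. The function name `select_for_screening` is hypothetical.

```python
from typing import List, Sequence


def select_for_screening(
    probabilities: Sequence[float], batch_size: int, criterion: str = "certainty"
) -> List[int]:
    """Return indices of the next citations to show to the reviewer.

    probabilities[i] is the predicted probability that citation i is relevant.
    """
    indices = range(len(probabilities))
    if criterion == "certainty":
        # Most confidently relevant first.
        ranked = sorted(indices, key=lambda i: -probabilities[i])
    elif criterion == "uncertainty":
        # Closest to the decision boundary first.
        ranked = sorted(indices, key=lambda i: abs(probabilities[i] - 0.5))
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return ranked[:batch_size]


probs = [0.9, 0.1, 0.55, 0.4, 0.95]
certain_batch = select_for_screening(probs, 2, "certainty")
uncertain_batch = select_for_screening(probs, 2, "uncertainty")
```

After the reviewer labels the selected batch, the classifier would be retrained and the remaining citations re-scored; that loop is omitted here.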

Subject(s)

Abstracting and Indexing , Algorithms , Artificial Intelligence , Databases, Bibliographic , Natural Language Processing , Systematic Reviews as Topic , Workload , Abstracting and Indexing/methods , Databases, Bibliographic/classification , Manuscripts as Topic , Peer Review, Research/methods , Semantics

17.

Wide coverage biomedical event extraction using multiple partially overlapping corpora.

Miwa, Makoto; Pyysalo, Sampo; Ohta, Tomoko; Ananiadou, Sophia.

BMC Bioinformatics ; 14: 175, 2013 Jun 03.

Article in English | MEDLINE | ID: mdl-23731785

ABSTRACT

BACKGROUND: Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods require manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes. RESULTS: We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011. CONCLUSIONS: The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora.

Subject(s)

Data Mining/methods , Humans , Models, Theoretical , Semantics

18.

Boosting automatic event extraction from the literature using domain adaptation and coreference resolution.

Miwa, Makoto; Thompson, Paul; Ananiadou, Sophia.

Bioinformatics ; 28(13): 1759-65, 2012 Jul 01.

Article in English | MEDLINE | ID: mdl-22539668

ABSTRACT

MOTIVATION: In recent years, several biomedical event extraction (EE) systems have been developed. However, the nature of the annotated training corpora, as well as the training process itself, can limit the performance levels of the trained EE systems. In particular, most event-annotated corpora do not deal adequately with coreference. This affects the trained systems' ability to recognize biomedical entities, and thus their performance in extracting events accurately. Additionally, the fact that most EE systems are trained on a single annotated corpus further restricts their coverage. RESULTS: We have enhanced our existing EE system, EventMine, in two ways. First, we developed a new coreference resolution (CR) system and integrated it with EventMine. The standalone performance of our CR system in resolving anaphoric references to proteins is considerably higher than the best ranked system in the COREF subtask of the BioNLP'11 Shared Task. Second, the improved EventMine incorporates domain adaptation (DA) methods, which extend EE coverage by allowing several different annotated corpora to be used during training. Combined with a novel set of methods to increase the generality and efficiency of EventMine, the integration of both CR and DA has resulted in significant improvements in EE, ranging between 0.5% and 3.4% F-Score. The enhanced EventMine outperforms the highest ranked systems from the BioNLP'09 shared task, and from the GENIA and Infectious Diseases subtasks of the BioNLP'11 shared task. AVAILABILITY: The improved version of EventMine, incorporating the CR system and DA methods, is available at: http://www.nactem.ac.uk/EventMine/.

Subject(s)

Data Mining/methods , Software , Genes , Proteins/metabolism

19.

Event extraction across multiple levels of biological organization.

Pyysalo, Sampo; Ohta, Tomoko; Miwa, Makoto; Cho, Han-Cheol; Tsujii, Jun'ichi; Ananiadou, Sophia.

Bioinformatics ; 28(18): i575-i581, 2012 Sep 15.

Article in English | MEDLINE | ID: mdl-22962484

ABSTRACT

MOTIVATION: Event extraction using expressive structured representations has been a significant focus of recent efforts in biomedical information extraction. However, event extraction resources and methods have so far focused almost exclusively on molecular-level entities and processes, limiting their applicability. RESULTS: We extend the event extraction approach to biomedical information extraction to encompass all levels of biological organization from the molecular to the whole organism. We present the ontological foundations, target types and guidelines for entity and event annotation and introduce the new multi-level event extraction (MLEE) corpus, manually annotated using a structured representation for event extraction. We further adapt and evaluate named entity and event extraction methods for the new task, demonstrating that both can be achieved with performance broadly comparable with that for established molecular entity and event extraction tasks. AVAILABILITY: The resources and methods introduced in this study are available from http://nactem.ac.uk/MLEE/. CONTACT: pyysalos@cs.man.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Mining/methods , Humans , Neoplasms

20.

Indirect prediction of surface ozone concentration by plant growth responses in East Asia using mini-open top chambers.

Kohno, Yoshihisa; Matsumura, Hideyuki; Miwa, Makoto; Yonekura, Tetsushi; Aihara, Keiji; Umponstira, Chanin; Le, Vo Thanh; Ngoc, Nguyen Thuy; Viet, Phanm Hung; Wei, Ma.

Environ Monit Assess ; 185(3): 2755-65, 2013 Mar.

Article in English | MEDLINE | ID: mdl-22752963

ABSTRACT

We developed small and mobile open top chambers (mini-OTC) measuring 0.6 m (W) × 0.6 m (D) × 1.2 m (H) with an air duct of 0.6 m (W) × 0.23 m (D) × 1.2 m (H). The air duct can be filled with activated charcoal to blow charcoal-filtered air (CF) into the chamber, as opposed to non-filtered ambient air (NF). Ozone-sensitive radish Raphanus sativus cv. Red Chime and rosette pakchoi Brassica campestris var. rosularis cv. ATU171 were exposed to NF and CF in mini-OTCs at different locations in East Asia. A total of 29 exposure experiments were conducted at nine locations: Shanghai, China; Ha Noi, Vietnam; Lampang, Phitsanulok and Pathumtani, Thailand; and Hiratsuka, Kisai, Abiko and Akagi, Japan. Although no significant relationships between the mean concentrations of ambient O₃ during the experimental period and the growth responses were observed for either species, multiple linear regression analysis suggested a good relationship between the biomass responses in each species and the O₃ concentration, temperature, and relative humidity. The cumulative daily mean O₃ (ppb/day) could be indirectly predicted by NF/CF based on the dry weight ratio of biomass, mean air temperature, and relative air humidity.

Subject(s)

Air Pollutants/analysis , Air Pollution/statistics & numerical data , Ozone/analysis , Plant Physiological Phenomena/drug effects , Air Pollutants/toxicity , Charcoal , China , Dose-Response Relationship, Drug , Japan , Ozone/toxicity , Thailand , Vietnam
