Search | VHL Regional Portal

Semi-supervised learning of causal relations in biomedical scientific discourse.

Mihaila, Claudiu; Ananiadou, Sophia.

Biomed Eng Online ; 13 Suppl 2: S1, 2014.

Article in English | MEDLINE | ID: mdl-25559746

ABSTRACT

BACKGROUND: The number of articles published daily in the biomedical domain has become too large for humans to handle on their own. As a result, bio-text mining technologies have been developed to reduce their workload by automatically analysing the text and extracting important knowledge. Specific bio-entities, the bio-events between them, and facts can now be recognised with sufficient accuracy and are widely used by biomedical researchers. However, understanding how the extracted facts are connected in text is an extremely difficult task, which cannot be easily tackled by machinery. RESULTS: In this article, we describe our method to recognise causal triggers and their arguments in biomedical scientific discourse. We introduce new features and show that a self-learning approach improves the performance obtained by supervised machine learners to 83.47% for causal triggers. Furthermore, the spans of causal arguments can be recognised to a slightly higher level than by using the supervised or rule-based methods that have been employed before. CONCLUSION: Exploiting the large amount of unlabelled data that is already available can help improve the performance of recognising causal discourse relations in the biomedical domain. This improvement will further benefit the development of multiple tasks, such as hypothesis generation for experimental laboratories, contradiction detection, and the creation of causal networks.
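
The abstract above reports that a self-learning approach improves on purely supervised learners, but it does not describe the learner, the features, or the confidence criterion. The following is therefore only a minimal sketch of generic self-training, not the authors' method. The toy one-dimensional nearest-centroid learner, the `threshold` and `max_rounds` parameters, and all function names are assumptions made for illustration.

```python
from statistics import mean


def fit(points, labels):
    """Toy learner (assumption): store the mean feature value of each class."""
    return {c: mean(p for p, l in zip(points, labels) if l == c)
            for c in set(labels)}


def predict(model, x):
    """Return (label, confidence) for one point.

    Confidence is 1 - (distance to nearest centroid / sum of all distances).
    """
    dist = {c: abs(x - m) for c, m in model.items()}
    best = min(dist, key=dist.get)
    total = sum(dist.values())
    conf = 1.0 if total == 0 else 1.0 - dist[best] / total
    return best, conf


def self_train(points, labels, unlabelled, threshold=0.8, max_rounds=10):
    """Generic self-training loop.

    1. Train on the labelled data.
    2. Label the unlabelled pool with the current model.
    3. Move predictions at or above `threshold` into the training set.
    4. Retrain; stop when nothing new is confident or rounds run out.
    """
    points, labels, pool = list(points), list(labels), list(unlabelled)
    model = fit(points, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break
        for x, label in confident:
            points.append(x)
            labels.append(label)
        pool = remaining
        model = fit(points, labels)
    return model, pool


# Hypothetical data: one numeric feature per token, two seed labels.
model, leftover = self_train([0.0, 1.0], ["O", "TRIGGER"],
                             [0.1, 0.5, 0.9, 0.45])
```

In this toy run the two points near the seed examples are absorbed into the training set, while the two ambiguous points stay in the unlabelled pool. A real system would replace `fit` and `predict` with a proper sequence labeller or classifier over lexical, syntactic, and semantic features.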

Subject(s)

Artificial Intelligence , Biomedical Research/classification , Communicable Diseases/classification , Natural Language Processing , Pattern Recognition, Automated/methods , Vocabulary, Controlled , Data Mining/methods , Humans , Periodicals as Topic , Software

Recognising discourse causality triggers in the biomedical domain.

Mihaila, Claudiu; Ananiadou, Sophia.

J Bioinform Comput Biol ; 11(6): 1343008, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24372037

ABSTRACT

Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with and compare various parameter settings for three algorithms, i.e. Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). We also evaluate the impact of lexical, syntactic, and semantic features on each of the algorithms, showing that semantics improves the performance in all cases. We test our comprehensive feature set on two corpora containing gold standard annotations of causal relations, and demonstrate the need for more gold standard data. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.

Subject(s)

Artificial Intelligence , Biomedical Research/methods , Data Mining , Support Vector Machine , Semantics

BioCause: Annotating and analysing causality in the biomedical domain.

Mihaila, Claudiu; Ohta, Tomoko; Pyysalo, Sampo; Ananiadou, Sophia.

BMC Bioinformatics ; 14: 2, 2013 Jan 16.

Article in English | MEDLINE | ID: mdl-23323613

ABSTRACT

BACKGROUND: Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining. RESULTS: We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This scheme has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems. CONCLUSION: Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new hypotheses for experimental work.
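
The abstract above contrasts agreement under an exact match constraint with a relaxed match setting, without defining either precisely. As a rough illustration only, the sketch below assumes that annotations are half-open character spans `(start, end)`, that exact match means identical boundaries, and that relaxed match means any overlap. The function names and the simple one-directional agreement ratio are hypothetical and may differ from the measure used for BioCause.

```python
def exact_match(a, b):
    """Spans agree only if both boundaries are identical."""
    return a == b


def relaxed_match(a, b):
    """Spans agree if they overlap by at least one character (assumption)."""
    return a[0] < b[1] and b[0] < a[1]


def agreement(spans_a, spans_b, match):
    """Fraction of annotator A's spans that have a matching span from B."""
    if not spans_a:
        return 0.0
    matched = sum(1 for a in spans_a if any(match(a, b) for b in spans_b))
    return matched / len(spans_a)


# Hypothetical spans from two annotators over the same text.
annotator_a = [(0, 5), (10, 20), (30, 35)]
annotator_b = [(0, 5), (12, 20), (40, 45)]

exact_rate = agreement(annotator_a, annotator_b, exact_match)
relaxed_rate = agreement(annotator_a, annotator_b, relaxed_match)
```

Here the second pair of spans differs only in its start offset, so it counts as a disagreement under exact matching but as an agreement under relaxed matching, which is why the relaxed rate is the higher of the two.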

Subject(s)

Data Mining/methods , Data Interpretation, Statistical , Software
