Results 1 - 20 of 24
1.
Lang Speech Hear Serv Sch ; 55(2): 276-302, 2024 04 11.
Article in English | MEDLINE | ID: mdl-38266231

ABSTRACT

PURPOSE: The purpose of this study was to explore the experiences of I'm Determined youth leaders with learning disability who have enrolled in higher education within 1 year of graduating high school to better understand if and how their experience participating in the I'm Determined project led to their participation in their Individualized Education Program (IEP) meetings. METHOD: The intent of the narrative inquiry methodology applied to this study was to create a unified story of collective experiences that described or explained the factors leading to participation in their IEP meeting. Although each of the eight narratives is unique to the individual, common themes emerged that were reflected in the literature and consistent across the time continuum of life before and life during participation in I'm Determined. RESULTS: One experience that was consistent was the importance of participating in and leading their IEP meeting. A narrative timeline led to our findings presented here within a continuum of experiences before and during participation in I'm Determined. We made the decision to present the findings in such a way that highlights common themes specific to IEP participation across moments in time while honoring individual narratives through supportive text from the data. This is a study of people's perceptions of their experiences best told by direct quotes from the participants. The IEP experience is just one component of the self-determination experience. CONCLUSIONS: This study provided insight into the educational experiences of the eight I'm Determined youth leader participants and examined the importance of both their participation in I'm Determined and the development of self-determination skills deemed essential to participate and lead their IEP meeting. Their unique perspective documented in this study served to both inform and push the field forward.


Subject(s)
Narration , Personal Autonomy , Adolescent , Humans , Research Design , Schools , Students
2.
Neuroinformatics ; 17(3): 391-406, 2019 07.
Article in English | MEDLINE | ID: mdl-30443819

ABSTRACT

The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those being deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about modelled entities is a painstaking and low-reward task. Text mining can be used to help a curator extract relevant information from this literature in a systematic way. We propose the application of text mining methods for the neuroscience literature. Specifically, two computational neuroscientists annotated a corpus of entities pertinent to neuroscience using active learning techniques to enable swift, targeted annotation. We then trained machine learning models to recognise the entities that have been identified. The entities covered are Neuron Types, Brain Regions, Experimental Values, Units, Ion Currents, Channels, and Conductances and Model organisms. We tested a traditional rule-based approach, a conditional random field and a model using deep learning named entity recognition, finding that the deep learning model was superior. Our final results show that we can detect a range of named entities of interest to the neuroscientist with a macro average precision, recall and F1 score of 0.866, 0.817 and 0.837 respectively. The contributions of this work are as follows: 1) We provide a set of Named Entity Recognition (NER) tools that are capable of detecting neuroscience entities with performance above or similar to prior work. 2) We propose a methodology for training NER tools for neuroscience that requires very little training data to get strong performance. This can be adapted for any sub-domain within neuroscience. 3) We provide a small corpus with annotations for multiple entity types, as well as annotation guidelines to help others reproduce our experiments.


Subject(s)
Data Mining/methods , Deep Learning , Neurosciences/methods
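The macro-averaged precision, recall and F1 quoted in this abstract can be made concrete with a short sketch. The entity types are taken from the abstract, but the counts below are invented for illustration; they are not the paper's corpus figures.

```python
# Macro-averaged precision/recall/F1 over entity types, as used to report
# NER performance. The per-type counts are illustrative, not the paper's data.
from collections import namedtuple

Counts = namedtuple("Counts", "tp fp fn")

def prf1(c):
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro_average(per_type):
    # Macro = unweighted mean over entity types, so rare types count equally.
    scores = [prf1(c) for c in per_type.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Hypothetical counts for three of the paper's entity types
counts = {
    "NeuronType":  Counts(tp=80, fp=10, fn=20),
    "BrainRegion": Counts(tp=90, fp=5,  fn=10),
    "IonCurrent":  Counts(tp=40, fp=20, fn=10),
}
macro_p, macro_r, macro_f = macro_average(counts)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f, 3))  # → 0.834 0.833 0.831
```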
3.
Res Synth Methods ; 9(3): 470-488, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29956486

ABSTRACT

Screening references is a time-consuming step necessary for systematic reviews and guideline development. Previous studies have shown that human effort can be reduced by using machine learning software to prioritise large reference collections such that most of the relevant references are identified before screening is completed. We describe and evaluate RobotAnalyst, a Web-based software system that combines text-mining and machine learning algorithms for organising references by their content and actively prioritising them based on a relevancy classification model trained and updated throughout the process. We report an evaluation over 22 reference collections (most are related to public health topics) screened using RobotAnalyst with a total of 43 610 abstract-level decisions. The number of references that needed to be screened to identify 95% of the abstract-level inclusions for the evidence review was reduced on 19 of the 22 collections. Significant gains over random sampling were achieved for all reviews conducted with active prioritisation, as compared with only two of five when prioritisation was not used. RobotAnalyst's descriptive clustering and topic modelling functionalities were also evaluated by public health analysts. Descriptive clustering provided more coherent organisation than topic modelling, and the content of the clusters was apparent to the users across a varying number of clusters. This is the first large-scale study using technology-assisted screening to perform new reviews, and the positive results provide empirical evidence that RobotAnalyst can accelerate the identification of relevant studies. The results also highlight the issue of user complacency and the need for a stopping criterion to realise the work savings.


Subject(s)
Data Interpretation, Statistical , Data Mining/methods , Machine Learning , Review Literature as Topic , Software , Algorithms , Cluster Analysis , Humans , Reproducibility of Results , Retrospective Studies , Switzerland
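The headline measurement in evaluations like this one — how many references must be screened, in the order the model ranks them, before 95% of the relevant ones are found — can be sketched as follows. The ranking and labels are invented toy data; the function is not part of RobotAnalyst.

```python
# Given references in the order a prioritisation model ranks them
# (True = relevant), find how many must be screened to reach 95% of the
# inclusions, and the work saved versus screening everything.
def screened_for_recall(ranked_labels, target_recall=0.95):
    total_relevant = sum(ranked_labels)
    needed = target_recall * total_relevant
    found = 0
    for i, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        if found >= needed:
            return i
    return len(ranked_labels)

# A toy ranking: a good model pushes most relevant items to the front.
ranking = [True] * 18 + [False] * 10 + [True] * 2 + [False] * 70
n = screened_for_recall(ranking)
work_saved = 1 - n / len(ranking)
print(n, round(work_saved, 2))  # → 29 0.71
```

This is why a stopping criterion matters: the screener has no way of knowing they have reached item 29 unless something tells them the remaining items are unlikely to be relevant.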
4.
BMC Med Inform Decis Mak ; 18(1): 46, 2018 06 25.
Article in English | MEDLINE | ID: mdl-29940927

ABSTRACT

BACKGROUND: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions. METHODS: We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated. RESULTS: We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state of the art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836).
CONCLUSION: We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.


Subject(s)
Biomedical Research/methods , Data Mining/methods , Knowledge , Research Design , Support Vector Machine , Humans
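The core idea of the method — feeding previously extracted simple meta-knowledge dimensions, alongside linguistic features, into a classifier — can be sketched as a featurisation step. The dimension names, value sets and cue words below are simplified stand-ins for the paper's scheme, and the downstream random forest is omitted.

```python
# One-hot encode an event's simple meta-knowledge dimensions so they can
# be combined with linguistic features for a downstream classifier (a
# random forest in the paper). Names and values are illustrative stand-ins.
DIMENSIONS = {
    "knowledge_type": ["Fact", "Observation", "Analysis", "Investigation"],
    "certainty":      ["Certain", "Probable", "Doubtful"],
    "polarity":       ["Positive", "Negated"],
}

def featurise(event):
    vec = []
    for dim, values in DIMENSIONS.items():
        vec.extend(1 if event.get(dim) == v else 0 for v in values)
    # One simple linguistic feature: does a hypothesis cue word appear?
    cue_words = {"hypothesize", "investigate", "whether"}
    vec.append(1 if cue_words & set(event.get("tokens", [])) else 0)
    return vec

event = {
    "knowledge_type": "Investigation",
    "certainty": "Probable",
    "polarity": "Positive",
    "tokens": ["we", "investigate", "whether", "X", "binds", "Y"],
}
print(featurise(event))  # → [0, 0, 0, 1, 0, 1, 0, 1, 0, 1]
```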
5.
Case Rep Obstet Gynecol ; 2018: 8190805, 2018.
Article in English | MEDLINE | ID: mdl-30693121

ABSTRACT

Ovarian pregnancy is a rare subtype of ectopic pregnancy with an increased incidence after assisted conception. We report the case of a 31-year-old gestational carrier who presented with suprapubic and pelvic pain at 6 weeks and 2 days' gestation. An ultrasound scan demonstrated an empty uterus and a complex mass in the left adnexa. Operative laparoscopy was performed and an ovarian pregnancy was found and treated. We believe this to be the first report of ovarian pregnancy after IVF in a gestational carrier. Appropriate counselling of surrogate mothers is of utmost importance as the risk of ectopic pregnancy is increased by using assisted reproduction technology. Although ovarian pregnancy remains a rare event, the possibility of this condition should always be considered.

6.
J Biomed Inform ; 72: 67-76, 2017 08.
Article in English | MEDLINE | ID: mdl-28648605

ABSTRACT

Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation, to designate whether a citation is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant amount of manually labelled citations for the text classification to achieve a robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations to improve the classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should be included also). To calculate the similarity between labelled and unlabelled citations we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from clinical and public health. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews.


Subject(s)
Review Literature as Topic , Automation , Data Curation , Humans , Natural Language Processing
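The propagation step described above — copying the label of a manually screened citation to unlabelled citations that are sufficiently similar in a bag-of-words space — can be sketched in a few lines. This is a deliberately simplified nearest-neighbour version of the idea, with invented citations; it omits the spectral embedding and the active learning loop.

```python
# Propagate labels from manually screened citations to unlabelled ones
# whose bag-of-words vectors are most similar (cosine similarity).
# A simplified sketch of label propagation; the data is invented.
import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def propagate(labelled, unlabelled, threshold=0.5):
    """labelled: list of (text, label) pairs; returns one inferred label
    (or None, if no neighbour is similar enough) per unlabelled text."""
    inferred = []
    for text in unlabelled:
        v = bow(text)
        best = max(labelled, key=lambda tl: cosine(v, bow(tl[0])))
        sim = cosine(v, bow(best[0]))
        inferred.append(best[1] if sim >= threshold else None)
    return inferred

labelled = [
    ("randomised trial of statin therapy", "include"),
    ("qualitative interview study of nurses", "exclude"),
]
unlabelled = [
    "randomised controlled trial of statin dosing",
    "economics of hospital parking",
]
print(propagate(labelled, unlabelled))  # → ['include', None]
```

The automatically labelled first citation would then join the training set; the second stays unlabelled because no screened citation is similar enough.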
7.
Article in English | MEDLINE | ID: mdl-27888231

ABSTRACT

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable: those that have the crucial ability to share information, enabling smooth integration and reusability.


Subject(s)
Data Mining/methods , Databases, Factual , Humans
8.
PLoS One ; 11(1): e0144717, 2016.
Article in English | MEDLINE | ID: mdl-26734936

ABSTRACT

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and changes in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.


Subject(s)
Data Mining , History of Medicine , History, 19th Century , Semantics
9.
Reprod Biomed Online ; 31(6): 732-8, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26602106

ABSTRACT

The aim of this comparative randomized embryology trial was to determine if an intravaginal culture device (IVC) can provide acceptable embryo development compared with conventional IVF. Ten women between the ages of 27 and 37 years with an indication for IVF treatment were included in this study. After ovarian stimulation, oocytes were randomized to fertilization in the IVC device or using conventional IVF. Fertilization rates were higher in the IVF group than with the IVC device (68.7% ± 36% versus 40.7% ± 27%), whereas cleavage rates were similar (93% ± 1.5% versus 97% ± 6%) for both groups. A significantly lower number of embryos of suitable quality for transfer was obtained from the IVC device compared with conventional IVF (OR, 0.47; 95% CI, 0.26 to 0.87). The clinical pregnancy rate from transfer of IVC device embryos was 30%. Satisfaction questionnaires were also completed by all participants. Most women (70%) placed high importance on having had fertilization and embryo development occur while carrying the device. Overall, the IVC device produced reasonable pregnancy rates, suggesting this technology may have a place under certain circumstances. Cost-benefit analysis, psychological factors and future studies must be considered.


Subject(s)
Embryo Culture Techniques/instrumentation , Embryo Culture Techniques/methods , Fertilization in Vitro/methods , Vagina/cytology , Adult , Cleavage Stage, Ovum , Cost-Benefit Analysis , Embryo Culture Techniques/economics , Embryo Transfer , Embryonic Development , Equipment and Supplies , Female , Fertilization in Vitro/economics , Fertilization in Vitro/instrumentation , Humans , Patient Satisfaction , Pilot Projects , Pregnancy , Pregnancy Rate , Surveys and Questionnaires
11.
Syst Rev ; 4: 5, 2015 Jan 14.
Article in English | MEDLINE | ID: mdl-25588314

ABSTRACT

BACKGROUND: The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time-consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. METHODS: Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. RESULTS: The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most studies suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). CONCLUSIONS: Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews.
The use of text mining as a 'second screener' may also be used cautiously. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.


Subject(s)
Computational Biology , Data Mining/methods , Computational Biology/methods , Computational Biology/trends , Data Mining/trends , Databases, Factual , Evidence-Based Medicine , Humans , Information Storage and Retrieval/trends , Publications
12.
Brief Funct Genomics ; 14(3): 213-30, 2015 May.
Article in English | MEDLINE | ID: mdl-24907365

ABSTRACT

The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of 'events', i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research.


Subject(s)
Biomedical Research , Computational Biology/methods , Data Mining/methods , Genomics , Humans , Phenotype , Semantics
13.
BMC Bioinformatics ; 13: 108, 2012 May 23.
Article in English | MEDLINE | ID: mdl-22621266

ABSTRACT

BACKGROUND: Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. RESULTS: Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. CONCLUSIONS: We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. 
The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.


Subject(s)
Artificial Intelligence , Data Mining/methods , MEDLINE , Computational Biology/methods , Semantics
14.
BMC Bioinformatics ; 12: 393, 2011 Oct 10.
Article in English | MEDLINE | ID: mdl-21985429

ABSTRACT

BACKGROUND: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event. RESULTS: We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme and its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa.
CONCLUSION: By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regards to the diversity of meta-knowledge aspects annotated for each event.


Subject(s)
Data Mining , Information Storage and Retrieval , Database Management Systems , Humans , Information Systems , Knowledge , Semantics
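The 0.84-0.93 Kappa figures reported above are inter-annotator agreement scores; Cohen's kappa for one categorical meta-knowledge dimension can be computed as below. The annotator labels are toy data, not drawn from the GENIA-MK corpus.

```python
# Cohen's kappa: agreement between two annotators on a categorical
# dimension, corrected for chance agreement. Labels are illustrative.
from collections import Counter

def cohens_kappa(a, b):
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["Fact", "Fact", "Analysis", "Observation", "Fact", "Analysis"]
ann2 = ["Fact", "Analysis", "Analysis", "Observation", "Fact", "Analysis"]
print(round(cohens_kappa(ann1, ann2), 3))  # → 0.739
```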
15.
BMC Bioinformatics ; 12: 397, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21992002

ABSTRACT

BACKGROUND: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. RESULTS: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. CONCLUSIONS: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.


Subject(s)
Data Mining , Vocabulary, Controlled , Computational Biology , Databases, Factual , Humans , Semantics
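The central service a lexical resource like the BioLexicon offers to text mining tools — mapping the many written variants of a term to one canonical entry, and expanding a query back out to all known variants — can be sketched with a plain dictionary. The entries below are invented examples, not BioLexicon content.

```python
# Variant-to-canonical lookup and query expansion, the basic operations a
# biomedical lexicon supports. The entries here are invented examples.
VARIANTS = {
    "p53": "TP53", "tp-53": "TP53", "tumor protein 53": "TP53",
    "il-2": "IL2", "interleukin 2": "IL2", "interleukin-2": "IL2",
}

def normalise(term):
    """Map a surface form to its canonical entry (or return it unchanged)."""
    return VARIANTS.get(term.lower(), term)

def expand_query(term):
    """Return every known variant of the term's canonical entry."""
    canon = normalise(term)
    return sorted({v for v, c in VARIANTS.items() if c == canon} | {canon.lower()})

print(normalise("Interleukin 2"))  # → IL2
print(expand_query("p53"))         # → ['p53', 'tp-53', 'tp53', 'tumor protein 53']
```

A real lexicon adds part-of-speech, verb-pattern and semantic information on top of this lookup, but variant normalisation is the entry point for the tagging and entity-recognition uses the abstract describes.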
16.
Nucleic Acids Res ; 39(Database issue): D58-65, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062818

ABSTRACT

UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which, in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains 'Cited By' information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched.


Subject(s)
PubMed , Data Mining , Internet , Software , United Kingdom
17.
Res Synth Methods ; 2(1): 1-14, 2011 Mar.
Article in English | MEDLINE | ID: mdl-26061596

ABSTRACT

Systematic reviews are a widely accepted research method. However, it is increasingly difficult to conduct them to fit with policy and practice timescales, particularly in areas which do not have well indexed, comprehensive bibliographic databases. Text mining technologies offer one possible way forward in reducing the amount of time systematic reviews take to conduct. They can facilitate the identification of relevant literature, its rapid description or categorization, and its summarization. In this paper, we describe the application of four text mining technologies, namely, automatic term recognition, document clustering, classification and summarization, which support the identification of relevant studies in systematic reviews. The contributions of text mining technologies to improve reviewing efficiency are considered and their strengths and weaknesses explored. We conclude that these technologies do have the potential to assist at various stages of the review process. However, they are relatively unknown in the systematic reviewing community, and substantial evaluation and methods development are required before their possible impact can be fully assessed. Copyright © 2011 John Wiley & Sons, Ltd.

18.
Philos Trans A Math Phys Eng Sci ; 368(1925): 3829-44, 2010 Aug 28.
Article in English | MEDLINE | ID: mdl-20643679

ABSTRACT

The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500,000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents.
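The "automatic clustering of search results according to similar document content" mentioned above can be sketched with a greedy single-link grouping over shared vocabulary. This is a minimal illustration under assumed choices (token-set Jaccard similarity, a hand-picked threshold); it is not the ASSIST project's actual clustering method.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_results(docs, threshold=0.2):
    """Greedy single-link clustering: each hit joins the first cluster whose
    seed document shares enough vocabulary with it, else starts a new cluster.
    Returns clusters as lists of document indices."""
    token_sets = [set(d.lower().split()) for d in docs]
    clusters = []
    for i, toks in enumerate(token_sets):
        for cluster in clusters:
            if jaccard(toks, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Grouping hits this way lets a user skim one representative per cluster instead of a flat list of hundreds of results, which is the drill-down experience the portal aims for.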

19.
J Bioinform Comput Biol ; 8(1): 147-61, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20183880

ABSTRACT

This paper demonstrates that a large-scale lexicon tailored for the biology domain is effective in improving question analysis for genomics Question Answering (QA). We use the TREC Genomics Track data to evaluate the performance of different question analysis methods. It is hard to process textual information in biology, especially in molecular biology, due to a huge number of technical terms which rarely appear in general English documents and dictionaries. To support biological Text Mining, we have developed a domain-specific resource, the BioLexicon. Started in 2006 from scratch, this lexicon currently includes more than four million biomedical terms consisting of newly curated terms and terms collected from existing biomedical databases. While conventional genomics QA systems provide query expansion based on thesauri and dictionaries, it is not clear to what extent a biology-oriented lexical resource is effective for question pre-processing for genomics QA. Experiments on the genomics QA data set show that question analysis using the BioLexicon performs slightly better than that using n-grams and the UMLS Specialist Lexicon.


Subject(s)
Genomics/statistics & numerical data , Computational Biology , Data Mining/statistics & numerical data , Databases, Genetic/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data
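The core use of a domain lexicon in question pre-processing — recognizing multi-word technical terms that a general dictionary would miss — can be sketched as a greedy longest-match lookup. The lexicon below is a tiny hypothetical stand-in, not the BioLexicon itself, and the matching strategy is an assumption for illustration.

```python
def find_lexicon_terms(question, lexicon):
    """Greedy longest-match lookup of (possibly multi-word) lexicon terms
    in a question; returns matched terms in order of appearance."""
    tokens = question.lower().replace("?", "").split()
    max_len = max(len(term.split()) for term in lexicon)
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest window first so "tumor necrosis factor" beats "tumor".
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + span])
            if candidate in lexicon:
                matches.append(candidate)
                i += span
                break
        else:
            i += 1
    return matches
```

Terms recognized this way can then anchor query expansion or retrieval, which is the role the BioLexicon plays in the question-analysis experiments described above.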
20.
BMC Bioinformatics ; 10: 349, 2009 Oct 23.
Article in English | MEDLINE | ID: mdl-19852798

ABSTRACT

BACKGROUND: Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. RESULTS: We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66%-90%. CONCLUSION: The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes.


Subject(s)
Computational Biology/methods , Databases, Factual , Information Storage and Retrieval/methods , Natural Language Processing , Vocabulary, Controlled
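The inter-annotator agreement figures quoted above can be illustrated with two standard measures: raw percentage agreement and chance-corrected Cohen's kappa. This is a generic sketch of how such numbers are computed over paired role labels; it is not the paper's own multi-facet evaluation method, and the example labels are invented.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assigned the same label."""
    assert len(labels_a) == len(labels_b)
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for chance label co-occurrence."""
    n = len(labels_a)
    po = percent_agreement(labels_a, labels_b)
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Expected chance agreement from each annotator's label distribution.
    pe = sum(ca[l] * cb[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    return (po - pe) / (1 - pe) if pe < 1 else 1.0
```

With 13 possible roles, chance agreement is low, so raw agreement in the 66%-90% range generally corresponds to substantial chance-corrected agreement as well.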