Results 1 - 8 of 8
1.
Bioinformatics ; 28(12): i292-300, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689774

ABSTRACT

MOTIVATION: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies have limited coverage of their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level. RESULTS: We developed an approach to extend ontologies by discovering new terms that are in a sibling relationship to existing terms of an ontology. For this purpose, we combined two approaches that retrieve new terms from the web. The first approach extracts siblings by exploiting the structure of HTML documents, whereas the second uses text mining techniques to extract siblings from unstructured text. Our evaluation against MeSH (Medical Subject Headings) shows that our sibling discovery method can suggest first-class ontology terms and can serve as an initial step towards assessing the completeness of ontologies. The evaluation yields a recall of 80% at a precision of 61%, with the two independent approaches complementing each other. For MeSH in particular, we show that it can be considered complete in its medical focus area. We integrated the work into DOG4DAG, an ontology generation plugin for the editors OBO-Edit and Protégé, making it the first plugin that supports on-the-fly sibling discovery. AVAILABILITY: Sibling discovery for ontologies is available as part of DOG4DAG (www.biotec.tu-dresden.de/research/schroeder/dog4dag) for both Protégé 4.1 and OBO-Edit 2.1.


Subjects
Data Mining/methods, Internet, Medical Subject Headings, Algorithms, Terminology as Topic
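Record 1 reports recall and precision of suggested sibling terms against MeSH and notes that the HTML-based and text-based approaches complement each other. A minimal Python sketch of that kind of evaluation, assuming suggestions and reference siblings are plain sets of term strings; the function name and toy terms are hypothetical, and the numbers it prints are not the paper's results:

```python
from __future__ import annotations


def precision_recall(suggested: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of suggested sibling terms against a reference set."""
    if not suggested or not gold:
        return 0.0, 0.0
    hits = len(suggested & gold)  # suggestions that are true siblings in the reference
    return hits / len(suggested), hits / len(gold)


# Hypothetical toy data: siblings found via HTML lists vs. via free-text patterns.
html_siblings = {"influenza", "measles", "mumps", "website"}
text_siblings = {"measles", "rubella", "chickenpox"}
gold = {"influenza", "measles", "mumps", "rubella", "tetanus"}

for name, found in [("HTML", html_siblings), ("text", text_siblings),
                    ("combined", html_siblings | text_siblings)]:
    p, r = precision_recall(found, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the HTML source alone gives precision 0.75 and recall 0.60, the text source 0.67 and 0.40, and their union 0.67 and 0.80: merging two partly overlapping suggestion sets raises recall at some cost in precision.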
2.
Bioinformatics ; 26(12): i88-96, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529942

ABSTRACT

MOTIVATION: Ontologies and taxonomies have proven highly beneficial for biocuration. The Open Biomedical Ontology (OBO) Foundry alone lists over 90 ontologies mainly built with OBO-Edit. Creating and maintaining such ontologies is a labour-intensive, difficult, manual process. Automating parts of it is of great importance for the further development of ontologies and for biocuration. RESULTS: We have developed the Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG), a system which supports the creation and extension of OBO ontologies by semi-automatically generating terms, definitions and parent-child relations from text in PubMed, the web and PDF repositories. DOG4DAG is seamlessly integrated into OBO-Edit. It generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations it employs pattern-based web searches. We systematically evaluate each generation step using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Up to 78% of definitions are valid and up to 54% of child-ancestor relations can be retrieved. There is no other validated system that achieves comparable results. By combining the prediction of high-quality terms, definitions and parent-child relations with the ontology editor OBO-Edit we contribute a thoroughly validated tool for all OBO ontology engineers. AVAILABILITY: DOG4DAG is available within OBO-Edit 2.1 at http://www.oboedit.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods, Software, Controlled Vocabulary, Database Management Systems, Information Storage and Retrieval/methods, Internet, Programming Languages, User-Computer Interface
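Record 2 says DOG4DAG generates terms by identifying statistically significant noun phrases, without naming the test. As a stand-in, the following sketch uses Dunning's log-likelihood ratio (G²), a common keyness statistic, to compare each phrase's frequency in a domain corpus with its frequency in a background corpus. This is a swapped-in technique, not necessarily the one DOG4DAG uses; noun-phrase extraction is assumed to have happened already, and the function names and data are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def g2(a: int, b: int, c: int, d: int) -> float:
    """Dunning log-likelihood (G^2) for a phrase seen a times among c domain phrases
    and b times among d background phrases."""
    e1 = c * (a + b) / (c + d)  # expected count in the domain corpus
    e2 = d * (a + b) / (c + d)  # expected count in the background corpus
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2.0 * total


def significant_phrases(domain: list[str], background: list[str],
                        threshold: float = 3.84) -> list[tuple[str, float]]:
    """Rank noun phrases over-represented in the domain corpus.
    3.84 is the chi-square critical value for p = 0.05 with one degree of freedom."""
    dom, bg = Counter(domain), Counter(background)
    c, d = sum(dom.values()), sum(bg.values())
    if c == 0 or d == 0:
        return []
    ranked = []
    for phrase, a in dom.items():
        b = bg[phrase]  # Counter returns 0 for phrases absent from the background
        if a / c > b / d:  # keep only phrases relatively more frequent in the domain
            score = g2(a, b, c, d)
            if score >= threshold:
                ranked.append((phrase, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical noun phrases, assumed to be already extracted by a chunker.
domain = ["cell cycle"] * 10 + ["the study"] * 5 + ["mitotic spindle"] * 4
background = ["the study"] * 50 + ["cell cycle"] + ["next year"] * 49
print(significant_phrases(domain, background))
```

Here "cell cycle" and "mitotic spindle" pass the test, while the generic "the study" is discarded because it is relatively more common in the background text.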
3.
Regul Toxicol Pharmacol ; 59(1): 47-52, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20850491

ABSTRACT

The risk assessment of nano-sized materials (NM) currently suffers from great uncertainty regarding their putative toxicity to humans and the environment. A large body of original research literature has to be evaluated before targeted, hypothesis-driven Environmental and Health Safety research can be planned. Furthermore, to comply with European animal protection legislation, in vitro testing has to be preferred whenever possible. Against this background, producers of NM and risk assessors need tools that enable fast and comprehensive data retrieval, thereby linking the 3Rs principle to the hazard identification of NM. Here we report on the development of a knowledge-based search engine tailored to the particular needs of risk assessors in the area of NM. Comprehensive retrieval, from the PubMed database, of data from studies using in vitro as well as in vivo methods is illustrated with a titanium dioxide case study. Fast, relevant and reliable information retrieval is of paramount importance for the scientific community dedicated to developing safe NM in various product areas, and for risk assessors obliged to identify data gaps, to define additional data requirements for the approval of NM and to create strategies for integrated testing using alternative methods.


Subjects
Data Mining, Factual Databases, Knowledge Bases, Nanostructures/toxicity, Nanotechnology/methods, Search Engine, Titanium/toxicity, Toxicology/methods, Animals, Artificial Intelligence, Humans, Internet, Risk Assessment, Terminology as Topic
4.
Brief Bioinform ; 9(6): 466-78, 2008 Nov.
Article in English | MEDLINE | ID: mdl-19060303

ABSTRACT

The biomedical literature can be seen as a large, integrated but unstructured data repository. Extracting facts from the literature and making them accessible is approached from two directions. Manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time-consuming, and does not scale with the ever-increasing growth of the literature. Text mining, as a high-throughput computational technique, scales well but is error-prone due to the complexity of natural language. How can the two be combined to achieve both scalability and accuracy? Here, we review the state-of-the-art text mining approaches relevant to annotation and discuss available online services that analyse biomedical literature with text mining techniques, which annotation projects could also use. We then examine to what extent text mining has already been used in existing annotation projects, and conclude by outlining how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale up high-quality manual curation.


Subjects
Computational Biology/methods, Genetic Databases, Genes, Information Storage and Retrieval/methods, Abstracting and Indexing, Animals, Bibliographic Databases, Humans, Knowledge, Semantics
5.
BMC Bioinformatics ; 10: 28, 2009 Jan 21.
Article in English | MEDLINE | ID: mdl-19159460

ABSTRACT

BACKGROUND: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Metadata are another useful source of information for disambiguation. Here, we systematically compare three approaches to word sense disambiguation that use ontologies and metadata, respectively. RESULTS: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path from co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but it requires training data, which the other methods do not. To evaluate these approaches, we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. Across all conditions, the approaches achieve an average success rate of 80%. The 'MetaData' approach performed best, with 96%, when trained on high-quality data; its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), because MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves an average success rate of 80%. CONCLUSION: Metadata are valuable for disambiguation, but require high-quality training data. Closest Sense requires no training, but needs a large, consistently modelled ontology; these are two opposing conditions. Term Cooc achieves a success rate greater than 90% given a consistently modelled ontology. Overall, the results show that well-structured ontologies can play a very important role in improving disambiguation. AVAILABILITY: The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1.


Subjects
Computational Biology/methods, Controlled Vocabulary, Algorithms, Information Storage and Retrieval, Medical Informatics/methods, Medical Subject Headings, Automated Pattern Recognition, Unified Medical Language System
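Record 5 describes the 'Closest Sense' method only as computing shortest paths from co-occurring terms to candidate senses in the ontology. A minimal sketch under stated assumptions: ontology links are treated as undirected edges, path length is a plain hop count found by breadth-first search, and the sense with the smallest mean distance to the reachable context terms wins. The aggregation rule, the helper names and the toy ontology are hypothetical, not taken from the paper.

```python
from __future__ import annotations

import math
from collections import defaultdict, deque


def undirected(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (child, parent) ontology links."""
    edges: dict[str, set[str]] = defaultdict(set)
    for child, parent in pairs:
        edges[child].add(parent)
        edges[parent].add(child)
    return edges


def hops_from(source: str, edges: dict[str, set[str]]) -> dict[str, int]:
    """Breadth-first search: shortest hop count from source to every reachable term."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_sense(senses: list[str], context: set[str],
                  edges: dict[str, set[str]]) -> str | None:
    """Return the sense with the smallest mean path length to the co-occurring terms.
    Unreachable terms are ignored; returns None if no sense reaches any term."""
    best, best_cost = None, math.inf
    for sense in senses:
        dist = hops_from(sense, edges)
        lengths = [dist[t] for t in context if t in dist]
        if lengths and (cost := sum(lengths) / len(lengths)) < best_cost:
            best, best_cost = sense, cost
    return best


# Hypothetical toy ontology with two senses of the label "nucleus".
pairs = [
    ("cell nucleus", "organelle"), ("mitochondrion", "organelle"),
    ("organelle", "cellular component"), ("chromatin", "cell nucleus"),
    ("atomic nucleus", "atom"), ("proton", "atomic nucleus"), ("electron", "atom"),
]
edges = undirected(pairs)
senses = ["cell nucleus", "atomic nucleus"]
print(closest_sense(senses, {"chromatin", "mitochondrion"}, edges))  # cell nucleus
print(closest_sense(senses, {"proton", "electron"}, edges))          # atomic nucleus
```

Averaging rather than summing keeps a sense from being penalised merely because more of the context terms happen to be reachable from it; either choice is an assumption here.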
6.
BMC Bioinformatics ; 9 Suppl 4: S2, 2008 Apr 25.
Article in English | MEDLINE | ID: mdl-18460175

ABSTRACT

BACKGROUND: The engineering of ontologies, especially with a view to text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many ontology design steps remain manual and are based on personal experience and intuition. However, there have been a few efforts towards automatic construction of ontologies in the form of extracted lists of terms and relations between them. RESULTS: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text mining. We compare the manually created ontology terms with the terminology automatically derived by four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms, the best method still generates 51% relevant terms. In a corpus of 3066 documents, 53% of LMO terms occur, and 38% can be generated by one of the methods. CONCLUSIONS: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed semi-automatically, taking ATR results as input and following the guidelines we described. AVAILABILITY: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl.


Subjects
Database Management Systems, Lipoproteins/classification, Lipoproteins/metabolism, Natural Language Processing, Periodicals as Topic, Terminology as Topic, Algorithms, Factual Databases, Semantics, Software
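Record 6 names TF-IDF as one of its automatic term recognition methods and reports the share of relevant terms among the top-ranked predictions. The sketch below assumes candidate terms are already extracted per document and scores each term as its total corpus frequency times log(N / document frequency); the web service's actual weighting is not described in the abstract, and the helper names and data are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_rank(docs: list[list[str]]) -> list[tuple[str, float]]:
    """Rank candidate terms by corpus-wide frequency times inverse document frequency."""
    n = len(docs)
    tf = Counter(term for doc in docs for term in doc)       # total occurrences
    df = Counter(term for doc in docs for term in set(doc))  # documents containing the term
    scores = {term: tf[term] * math.log(n / df[term]) for term in tf}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def precision_at_k(ranked: list[tuple[str, float]], reference: set[str], k: int) -> float:
    """Fraction of the top-k ranked terms that appear in a reference ontology."""
    top = [term for term, _ in ranked[:k]]
    return sum(term in reference for term in top) / len(top) if top else 0.0


# Hypothetical candidate terms per abstract, assumed already extracted.
docs = [
    ["lipoprotein lipase", "ldl receptor", "patient"],
    ["ldl receptor", "cholesterol efflux", "patient"],
    ["lipoprotein lipase", "lipoprotein lipase", "patient"],
]
reference = {"lipoprotein lipase", "ldl receptor", "cholesterol efflux"}
ranked = tfidf_rank(docs)
print([term for term, _ in ranked])
print(precision_at_k(ranked, reference, 4))  # 0.75
```

The generic term "patient" occurs in every document, so its IDF is log(1) = 0 and it falls to the bottom of the ranking, which is the effect that makes TF-IDF useful for separating domain vocabulary from general words.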
7.
Toxicol In Vitro ; 28(4): 571-87, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24389116

ABSTRACT

The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks.


Subjects
Animal Testing Alternatives/methods, Factual Databases, Hazardous Substances/toxicity, Search Engine, Terminology as Topic, Animal Welfare, Animals, Biomedical Research/methods, Documentation, Research Design
8.
ALTEX ; 26(1): 17-31, 2009.
Article in English | MEDLINE | ID: mdl-19326030

ABSTRACT

Consideration and incorporation of all available scientific information is an important part of planning any scientific project. As regards research with sentient animals, EU Directive 86/609/EEC for the protection of laboratory animals requires scientists, before performing an experiment, to consider whether it can be replaced by other scientifically satisfactory methods that do not entail the use of animals, or that entail fewer animals or less animal suffering. Collecting relevant information is therefore indispensable to meet this legal obligation. However, no standard procedures or services exist to provide convenient access to the information required to reliably determine whether a planned animal experiment can be replaced, reduced or refined in accordance with the 3Rs principle. The search engine Go3R, available free of charge at http://Go3R.org, aims to become such a standard service. Go3R is the world's first search engine on alternative methods built on new semantic technologies that use an expert-knowledge-based ontology to identify relevant documents. Owing to its concept and design, the search engine can be used without lengthy instructions. It enables all those involved in the planning, authorisation and performance of animal experiments to determine the availability of non-animal methodologies in a fast, comprehensive and transparent manner. In this way, Go3R strives to contribute significantly to the avoidance and replacement of animal experiments.


Subjects
Animal Testing Alternatives, Internet, Software, Animal Welfare, Animals, Biomedical Research/methods, Documentation, Laboratory Animal Science, Terminology as Topic, User-Computer Interface